Voice Recognition and Analysis System
VRAS System Architecture
VRAS is a Windows-based solution built around a server that hosts the speech processing algorithms. The system architecture allows parallel access to these algorithms. Users access the voice recognition and analysis functionality from their laptop computers. The client program lets users log in to the system, upload audio files and run voice processing functions such as speech enhancement, pause removal, speaker identification and keyword spotting.

User Interface

System Features

  • Audio conversion and standardization of the incoming audio signal
  • Rule-based speech activity detection that uses a number of speech features related to spectral structure, pitch, energy and speech-pause interaction, and is effective in noisy as well as clean environments
  • Effective and efficient algorithms for quick speech enhancement
  • State-of-the-art GMM-UBM (Gaussian mixture model-universal background model) based speaker identification technique
  • SVM (support vector machine) based supervector classifier for high-quality speaker identification
  • Hybrid word spotting system that combines a phoneme-recognition-based approach with a language-modeling-based approach
Speaker Enrollment

Speech Detection

  • Speech enhancement for noise suppression
  • Speech activity detection for the pause removal feature
  • Audio editing features such as delete, cut, copy, paste and merge, including annotation and extraction of portions of the recorded data into another file to build a compiled data file
  • Speech detection runs at a ratio of 1:15, i.e. a signal of 15 minutes' duration is fully processed in one minute
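
As a rough illustration of how speech activity detection can drive pause removal, the sketch below drops low-energy frames from a signal. It is a simplified, energy-only stand-in and not the VRAS rule-based detector, which also uses spectral structure, pitch and speech-pause interaction. The function names, the frame length of 160 samples and the threshold of 0.01 are assumptions chosen for the example.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping, full-length frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def remove_pauses(samples, frame_len=160, threshold=0.01):
    """Keep only the frames whose energy reaches the (hypothetical) threshold.

    Trailing samples that do not fill a whole frame are discarded.
    """
    kept = []
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= threshold:
            start = i * frame_len
            kept.extend(samples[start:start + frame_len])
    return kept


# Synthetic test signal at 8 kHz: two frames of tone, two of silence, one of tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(320)]
silence = [0.0] * 320
signal = tone + silence + tone[:160]

speech_only = remove_pauses(signal)
print(len(signal), len(speech_only))
```

A single fixed energy threshold tends to fail in noisy recordings, which is one reason a detector would combine several features as described in the system features.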
Speaker/Keyword Search Output

Speaker Identification (SID)

  • Speaker enrollment for building speaker voice biometric models
  • Speaker verification for confirming/rejecting identity of a speaker
  • User-configurable threshold level (confidence level)
  • SID runs at a ratio of 1:8, i.e. a signal of eight minutes' duration is fully processed in one minute
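
The role of the user-configurable threshold can be sketched as a simple decision rule: a score produced by the speaker model is compared against the threshold to accept or reject a claimed identity, or to decide whether the best-matching enrolled speaker is reported at all. The score range of 0 to 1, the function names and the speaker labels below are assumptions for illustration; the source does not describe how VRAS scores or reports matches.

```python
def verify_speaker(score, threshold):
    """Accept the claimed identity when the score reaches the threshold."""
    return "accept" if score >= threshold else "reject"


def identify_speaker(scores, threshold):
    """Return the best-scoring enrolled speaker, or None if below threshold.

    `scores` maps a speaker label to a hypothetical similarity score in [0, 1].
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical scores for one audio file against three enrolled speakers.
scores = {"speaker_a": 0.42, "speaker_b": 0.87, "speaker_c": 0.15}

print(verify_speaker(scores["speaker_b"], threshold=0.8))
print(identify_speaker(scores, threshold=0.8))
print(identify_speaker(scores, threshold=0.9))
```

Raising the threshold reduces false acceptances at the cost of more missed matches, so the appropriate setting depends on how the results are used.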

Keyword Spotting (KWS)

  • Phoneme recognition for converting audio into speech sound units
  • Search algorithms for spotting words in the audio file
  • User-configurable threshold level (confidence level)
  • KWS runs at a ratio of 1:3, i.e. a signal of three minutes' duration is fully processed in one minute
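
The two KWS steps above (phoneme recognition, then search) can be illustrated with a toy search over an already-recognized phoneme sequence. This is a minimal sketch under stated assumptions, not the VRAS hybrid algorithm: the phoneme labels, the sliding-window match and the confidence measure (fraction of matching phonemes) are all hypothetical choices made for the example.

```python
def spot_keyword(phonemes, keyword, threshold=0.8):
    """Slide the keyword over the phoneme sequence and report likely hits.

    Returns a list of (start_index, confidence) pairs, where confidence is
    the fraction of keyword phonemes matched at that position.
    """
    hits = []
    n = len(keyword)
    if n == 0:
        return hits
    for start in range(len(phonemes) - n + 1):
        window = phonemes[start:start + n]
        matches = sum(1 for a, b in zip(window, keyword) if a == b)
        confidence = matches / n
        if confidence >= threshold:
            hits.append((start, confidence))
    return hits


# Hypothetical recognizer output for the phrase "hello world".
recognized = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
keyword = ["W", "ER", "L", "D"]

print(spot_keyword(recognized, keyword))
# A lower threshold tolerates one misrecognized phoneme in the keyword.
print(spot_keyword(recognized, ["W", "ER", "L", "T"], threshold=0.75))
```

Lowering the threshold lets the search tolerate recognition errors but also admits more false alarms, which is the trade-off the configurable confidence level exposes.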