Astricon 2016: Speaker Bhagvan Kommadi

Architect Corner update:
Astricon 2016, September 27-29, 2016, in Glendale, Arizona

https://astricon2016.sched.org/event/7Zkk/voice-to-text-intelligent-knowledge-assistant


Speech processing consists of speech coding, synthesis, recognition and speaker recognition modules. Speech types can be isolated words, connected words, continuous speech and spontaneous speech. Speaker models can be speaker-dependent or speaker-independent. Speaker-independent models recognise the speech patterns of a large group of people, while speaker-dependent models are more accurate for the particular speaker they were trained on.

Speech recognition involves recording audio and converting the speech to text. Audio recordings are archived in a file system or a database. Recording and conversion might happen offline, with online sync and updates taking place when connectivity exists. Accuracy of conversion is important; pronunciation and accents are known challenges.
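Offline recording with later sync is a store-and-forward pattern. A minimal sketch, assuming an in-memory queue in place of the file system or database and a hypothetical `upload` callback that returns `True` on success:

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Hypothetical record type: (recording_id, transcript text)
Record = Tuple[str, str]


class OfflineOutbox:
    """Queues recordings/transcripts locally and syncs them when online."""

    def __init__(self) -> None:
        self.pending: Deque[Record] = deque()
        self.synced: List[str] = []

    def save(self, recording_id: str, transcript: str) -> None:
        # Always store locally first, so nothing is lost while offline.
        self.pending.append((recording_id, transcript))

    def sync(self, connected: bool, upload: Callable[[Record], bool]) -> int:
        """Upload queued records in order, stopping at the first failure.

        Returns the number of records uploaded in this call.
        """
        if not connected:
            return 0
        count = 0
        while self.pending:
            record = self.pending[0]
            if not upload(record):
                break  # leave the record queued and retry on the next sync
            self.pending.popleft()
            self.synced.append(record[0])
            count += 1
        return count


if __name__ == "__main__":
    outbox = OfflineOutbox()
    outbox.save("rec-1", "hello world")
    outbox.save("rec-2", "good morning")
    print(outbox.sync(False, lambda r: True))  # offline: 0 uploaded
    print(outbox.sync(True, lambda r: True))   # online: 2 uploaded
```

Keeping records queued until the upload succeeds means a dropped connection only delays the sync rather than losing recordings.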

Speech Recognition

Recorded audio needs to be classified into silence, unvoiced sounds and voiced sounds. Stop consonants are identified, and end points are detected for isolated utterances. Audio recorded in a noisy environment will contain unwanted signals and background noise.
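One classic way to make the silence/unvoiced/voiced decision uses short-time energy and zero-crossing rate. A minimal sketch, assuming fixed frames and hand-picked thresholds (both illustrative, not taken from the talk); end points are then the first and last non-silent frames:

```python
import math
from typing import List, Optional, Sequence, Tuple


def frames(signal: Sequence[float], size: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into fixed-size frames, advancing by `hop` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def classify_frame(frame: Sequence[float], energy_thresh: float = 0.01,
                   zcr_thresh: float = 0.3) -> str:
    """Label a frame 'silence', 'unvoiced' or 'voiced'.

    Assumed heuristic: low energy means silence; otherwise a high
    zero-crossing rate suggests unvoiced (noise-like) sound and a low
    one suggests voiced (periodic) sound.
    """
    if short_time_energy(frame) < energy_thresh:
        return "silence"
    return "unvoiced" if zero_crossing_rate(frame) > zcr_thresh else "voiced"


def endpoints(labels: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (first, last) indices of non-silent frames, or None."""
    active = [i for i, lab in enumerate(labels) if lab != "silence"]
    return (active[0], active[-1]) if active else None


if __name__ == "__main__":
    silence = [0.0] * 400
    voiced = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
    # Crude high-frequency stand-in for fricative noise.
    unvoiced = [0.3 if n % 2 == 0 else -0.3 for n in range(400)]
    labels = [classify_frame(f) for f in frames(silence + voiced + unvoiced + silence, 200, 200)]
    print(labels)
    print(endpoints(labels))
```

In noisy recordings the fixed energy threshold would usually need to adapt to the measured background level.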


The speech communicated in the audio can be classified into words, phrases and sentences by applying grammar, and the converted text can then be interpreted as words and phrases. In India, the spoken language can be a mix of different dialects/native languages; there are 1,652 dialects/native languages in total.


Speaker diarisation is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. The readability of an automatic speech transcription is improved when the speaker's true identity is provided.
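Diarisation output is typically a list of (start, end, speaker) segments. A minimal sketch of that final partitioning step, assuming some upstream model (not described in the post) has already assigned a hypothetical speaker label to each fixed-length frame:

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, str]


def to_segments(frame_labels: Sequence[str], frame_sec: float = 0.5) -> List[Segment]:
    """Merge runs of identical per-frame speaker labels into segments.

    frame_labels: hypothetical classifier output, one speaker label per frame.
    Returns (start_time, end_time, speaker) tuples in seconds.
    """
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of input or when the speaker changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((start * frame_sec, i * frame_sec, frame_labels[start]))
            start = i
    return segments


if __name__ == "__main__":
    for start, end, speaker in to_segments(["A", "A", "B", "B", "B", "A"]):
        print(f"{start:.1f}-{end:.1f}s speaker {speaker}")
```

Attaching these segments to the transcript is what lets each line be shown with its speaker.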

The vocabulary is classified as small, medium, large, very large or out-of-vocabulary. A small vocabulary has tens of words, a medium vocabulary hundreds of words, a large vocabulary thousands of words and a very large vocabulary tens of thousands of words. Out-of-vocabulary words are mapped to an unknown-word entry.
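A minimal sketch of both ideas, assuming power-of-ten cut-offs for the size classes and a hypothetical `<unk>` token for out-of-vocabulary words:

```python
from typing import Iterable, List, Set

UNK = "<unk>"  # hypothetical unknown-word token


def vocabulary_class(size: int) -> str:
    """Bucket a vocabulary by size: tens, hundreds, thousands, tens of thousands.

    The exact boundaries (100, 1,000, 10,000) are assumptions.
    """
    if size < 100:
        return "small"
    if size < 1_000:
        return "medium"
    if size < 10_000:
        return "large"
    return "very large"


def map_oov(words: Iterable[str], vocabulary: Set[str]) -> List[str]:
    """Replace every word missing from the vocabulary with the unknown-word token."""
    return [w if w in vocabulary else UNK for w in words]


if __name__ == "__main__":
    print(vocabulary_class(5_000))
    print(map_oov(["call", "the", "doctor"], {"call", "doctor"}))
```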

Speech systems have other important characteristics, such as environment variability, channel variability, speaker style, sex, age and speed of speech.

Speech perception consists of message understanding, language translation and feature extraction. Speaker recognition is performed in four stages: analysis, feature extraction, modelling and testing.
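The four speaker recognition stages can be illustrated with a toy pipeline. This is a sketch under loud assumptions: the features are just log energy and zero-crossing rate (real systems use richer features such as MFCCs), the "model" is a mean feature vector per speaker, and testing picks the nearest enrolled template:

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def analyse(signal: Sequence[float], size: int = 160) -> List[Sequence[float]]:
    """Stage 1, analysis: cut the signal into non-overlapping frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def extract_features(frame: Sequence[float]) -> Vector:
    """Stage 2, feature extraction: [log energy, zero-crossing rate] (toy features)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]


def build_template(signal: Sequence[float]) -> Vector:
    """Stage 3, modelling: average the frame features into one template."""
    feats = [extract_features(f) for f in analyse(signal)]
    return [sum(col) / len(feats) for col in zip(*feats)]


def identify(signal: Sequence[float], models: Dict[str, Vector]) -> str:
    """Stage 4, testing: choose the enrolled speaker with the nearest template."""
    probe = build_template(signal)
    return min(models, key=lambda name: math.dist(probe, models[name]))


def tone(freq: float, amp: float, n: int = 1600, sr: int = 8000) -> List[float]:
    """Synthetic test signal standing in for a speaker's voice."""
    return [amp * math.sin(2 * math.pi * freq * i / sr) for i in range(n)]


if __name__ == "__main__":
    models = {"alice": build_template(tone(120, 0.5)), "bob": build_template(tone(1000, 0.1))}
    print(identify(tone(130, 0.45), models))
```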

The system will have a dictionary based on dialect and language. Words with correct pronunciation will be identified from the set of words available in the dictionary. The dictionary will be created with adjectives, pronouns, nouns and verbs, and it will be updated on a regular basis.
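A minimal sketch of such a dictionary, assuming entries keyed by (language, dialect), a phoneme-string pronunciation (the ARPAbet-style notation below is an assumption) and a part-of-speech tag. The class and method names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    pronunciation: str   # phoneme string; notation is an assumption
    part_of_speech: str  # "noun", "verb", "adjective" or "pronoun"


class PronunciationDictionary:
    """Hypothetical word dictionary keyed by (language, dialect)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Dict[str, Entry]] = {}

    def update(self, language: str, dialect: str, word: str, entry: Entry) -> None:
        """Add or replace a word, supporting regular dictionary updates."""
        self._entries.setdefault((language, dialect), {})[word.lower()] = entry

    def lookup(self, language: str, dialect: str, word: str) -> Optional[Entry]:
        return self._entries.get((language, dialect), {}).get(word.lower())

    def words_for(self, language: str, dialect: str, pronunciation: str) -> List[str]:
        """Return the dictionary words whose stored pronunciation matches."""
        entries = self._entries.get((language, dialect), {})
        return sorted(w for w, e in entries.items() if e.pronunciation == pronunciation)


if __name__ == "__main__":
    d = PronunciationDictionary()
    d.update("English", "Indian", "read", Entry("R IY D", "verb"))
    d.update("English", "Indian", "reed", Entry("R IY D", "noun"))
    print(d.words_for("English", "Indian", "R IY D"))
```

Homophones such as "read" and "reed" both match the same pronunciation, which is one reason grammar is still needed to choose between candidates.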

Template-based isolated word recognition, continuous speech recognition and neural networks are other approaches applied to speech recognition.
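Template-based isolated word recognition is commonly done with dynamic time warping (DTW), which aligns an utterance with each stored word template despite differences in speaking speed. A minimal sketch, assuming each template is a sequence of feature vectors (toy 1-D features here):

```python
from typing import Dict, Sequence

Frame = Sequence[float]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """DTW distance between two feature sequences using Euclidean frame distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of: diagonal match, step in a, step in b.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def recognise(utterance: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


if __name__ == "__main__":
    templates = {"yes": [[1.0], [2.0], [3.0]], "no": [[3.0], [2.0], [1.0]]}
    # A time-stretched "yes" still matches its template.
    print(recognise([[1.0], [1.0], [2.0], [2.0], [3.0]], templates))
```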

Correct pronunciation and grammar in phrases and sentences are the key challenge when matching the words in the audio against the dictionary of words. Accuracy depends on whether the speech is read or spontaneous. Adverse conditions during recording affect both the quality of the recorded audio and the accuracy of the conversion.
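The post does not name an accuracy metric; word error rate (WER) is a common choice, so this sketch assumes it. WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the decoded text, divided by the number of reference words:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            curr[j] = min(substitution, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Computing WER separately on read and spontaneous test sets would make the accuracy difference between them measurable.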

On-screen editing, with suggested corrections, will be provided for the decoded audio after the words are identified from the dictionary of words.
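Correction suggestions can be generated by fuzzy-matching each unrecognised decoded word against the dictionary. A minimal sketch using the standard library's `difflib.get_close_matches`; the function name and the `n`/`cutoff` defaults are illustrative assumptions:

```python
import difflib
from typing import Dict, List, Sequence


def suggest_corrections(transcript: str, dictionary: Sequence[str],
                        n: int = 3, cutoff: float = 0.6) -> Dict[str, List[str]]:
    """For each decoded word not in the dictionary, propose up to n close
    dictionary words (best match first) for an on-screen editor to display."""
    known = set(dictionary)
    suggestions: Dict[str, List[str]] = {}
    for word in transcript.lower().split():
        if word not in known and word not in suggestions:
            suggestions[word] = difflib.get_close_matches(word, dictionary, n=n, cutoff=cutoff)
    return suggestions


if __name__ == "__main__":
    vocab = ["appointment", "doctor", "tomorrow", "book", "with", "a"]
    print(suggest_corrections("Book a doctr apointment tomorow", vocab))
```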

Speech verification checks the correctness of pronounced speech: the speech expected to be pronounced is compared against the decoded speech, and the correctness of the speech is measured.
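A minimal sketch of the comparison, assuming word-level alignment with `difflib.SequenceMatcher` and a score defined (as an assumption) as the fraction of expected words matched in order:

```python
import difflib
from typing import List, Tuple


def verify(expected: str, decoded: str) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Align expected and decoded word sequences.

    Returns (score, errors): score is the fraction of expected words matched
    in order; errors lists (operation, expected words, decoded words) spans.
    """
    exp, dec = expected.lower().split(), decoded.lower().split()
    matcher = difflib.SequenceMatcher(a=exp, b=dec, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    errors = [
        (op, " ".join(exp[i1:i2]), " ".join(dec[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
    score = matched / len(exp) if exp else 1.0
    return score, errors


if __name__ == "__main__":
    print(verify("The quick brown fox", "the quik brown fox"))
```

The error spans point the speaker at exactly which words were mispronounced, rather than giving only an overall score.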

Speech analytics are measured based on the topics being discussed, the emotional character of the audio, the locations of speech versus non-speech, and periods of silence.
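Topic and emotion analysis need trained models, but speech versus non-speech locations and silence periods can be derived from frame energies. A minimal sketch, assuming per-frame energies are already computed and that the threshold and minimum silence length are tuning parameters:

```python
from typing import Dict, List, Optional, Sequence, Tuple


def silence_report(energies: Sequence[float], frame_sec: float = 0.01,
                   threshold: float = 0.01, min_silence_sec: float = 0.2) -> Dict[str, object]:
    """Locate non-speech regions from per-frame energies.

    Frames below `threshold` count as non-speech; runs lasting at least
    `min_silence_sec` are reported as (start, end) silence periods in seconds.
    """
    min_frames = max(1, round(min_silence_sec / frame_sec))  # compare in whole frames
    periods: List[Tuple[float, float]] = []
    run_start: Optional[int] = None
    # An infinite sentinel energy closes any silence run at the end of the audio.
    for i, e in enumerate(list(energies) + [float("inf")]):
        if e < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                periods.append((run_start * frame_sec, i * frame_sec))
            run_start = None
    speech_frames = sum(1 for e in energies if e >= threshold)
    return {
        "silence_periods": periods,
        "speech_ratio": speech_frames / len(energies) if energies else 0.0,
    }


if __name__ == "__main__":
    print(silence_report([0.5, 0.5, 0, 0, 0, 0.5, 0, 0.5], frame_sec=0.5, min_silence_sec=1.0))
```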
