
SPEECH PROCESSING

BINIT MOHANTY binit.mohanty@gmail.com

Why Speech?
No visual contact required
No special equipment required
Can be done while doing other things
Telephones: AT&T
Mobile Phones (1G and 2G)

Speech Processing
Speech Coding
Speech Synthesis
Speech Recognition
Speaker Recognition/Verification
Dyslexia and Auditory Problems
Audio Engineering

Speech Coding
Compress a Speech File
Why not use standard compression techniques?
MP3 Format
Perceptual Coding: exploits sensory organ biases
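
As an illustration of exploiting a sensory bias, the sketch below uses mu-law companding, the logarithmic quantization used in 8-bit telephone speech coding (ITU-T G.711). It is not the MP3 scheme, and the slides do not name it; it is chosen here only because it is short. The function names are hypothetical, and the continuous formula is used rather than the piecewise-linear tables of the actual standard. The idea is that hearing is roughly logarithmic in loudness, so quiet samples get finer quantization steps than loud ones.

```python
import math

MU = 255  # companding constant of 8-bit mu-law telephony (assumed here)

def mulaw_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] to an 8-bit code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Logarithmic compression: small amplitudes are stretched out.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Uniformly quantize the compressed value to 8 bits.
    return int(round((y + 1) / 2 * 255))

def mulaw_decode(code: int, mu: int = MU) -> float:
    """Expand an 8-bit code back to an approximate sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

if __name__ == "__main__":
    for sample in (0.01, -0.5, 1.0):
        code = mulaw_encode(sample)
        print(sample, code, mulaw_decode(code))
```

A quiet sample such as 0.01 comes back with a much smaller error than uniform 8-bit quantization would give, at the cost of coarser steps for loud samples, where the ear is less sensitive to the difference.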

Speech Synthesis
Construct Speech waveform from words
Speaker Quality and Accent
Prosody?

http://www.research.att.com/~ttsweb/tts/demo.php

Speech Recognition
Convert a sound waveform to words
The most relevant and important task in the industry
90% accuracy in lab conditions, much lower in factory conditions
Sphinx by CMU, ViaVoice by IBM & SDK by Microsoft
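
Recognition quality is commonly reported as word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal version, assuming whitespace-separated words and assuming the 90% figure above refers to word accuracy, which the slide does not state. The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting all i reference words
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many extra words.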

Speaker Recognition
Concerned with Biometrics
Acceptable as a verification technique
How would this be different from Speech recognition?
  Speaker Quality
  Prosody
  Pitch, Accent etc.

Dyslexia & Auditory Problems


Study Voice and Ear defects
Detect and correct Speech Disfluencies (CMU)
Development of better Ear substitutes
Cochlear Implants

Audio Engineering
Adding effects to sound
Clarity of reproduction
A big industry with players like Dolby, Bose, Philips etc.
Voice Morphing!
[Audio samples: SOURCE, TARGET, CONV 1, CONV 2]

Courtesy: Hui Ye & Steve Young, Cambridge

Automatic Speech Recognition


Most Important Task
Hardest Task
  Co-articulation: neighbouring sounds alter each other's pronunciation
  Overlapping speech: two speakers speaking at the same time
  Speaker Variation
  Spontaneity
  Language Modeling
  Noise Robustness

ASR: Problems

James Glass, MIT

ASR: Method

James Glass, MIT

ASR: Application

James Glass, MIT

Automatic Speech Recognition

James Glass, MIT

Speech Production
