casey chesnut
brains-N-brawn.com
http://blogs.msdn.com/anandis_thoughts/archive/2006/
Speech Synthesis
• Text to Speech
– Dynamic
– Prompt database
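The two approaches above are often combined: play a prerecorded prompt when the exact phrase exists in the database, and fall back to dynamic synthesis for arbitrary text. This is a minimal sketch under stated assumptions; `PROMPT_DB`, the file names, and `synthesize` are hypothetical placeholders, not a real speech API.

```python
from typing import Callable, Dict

# Hypothetical prompt database: normalized phrase -> prerecorded audio file.
PROMPT_DB: Dict[str, str] = {
    "welcome": "prompts/welcome.wav",
    "goodbye": "prompts/goodbye.wav",
}


def synthesize(text: str) -> str:
    """Hypothetical stand-in for a dynamic TTS engine; returns a label, not audio."""
    return f"<tts:{text}>"


def speak(text: str,
          prompts: Dict[str, str] = PROMPT_DB,
          tts: Callable[[str], str] = synthesize) -> str:
    """Prefer a recorded prompt; fall back to dynamic synthesis otherwise."""
    key = text.strip().lower()
    if key in prompts:
        return prompts[key]  # fixed phrase: natural-sounding recording
    return tts(text)         # arbitrary text: generated on the fly
```

The trade-off: recorded prompts sound natural but only cover phrases recorded in advance, while dynamic synthesis covers any text at the cost of naturalness.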
How Synthesis Works
• Text parsing
– Sentences, numbers, symbols, pauses
• Natural language processing
– Part of speech, tense
• Phonemes are looked up in a lexicon or sounded out with letter-to-sound rules
• Diphones are appended together
• Post-process audio to add emphasis
• Play speech audio
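The steps above can be sketched as a toy front end: normalize text, look words up in a lexicon (or sound them out), pair adjacent phonemes into diphones, then append audio units. This is an illustrative sketch, not how any particular engine works; the tiny `LEXICON`, the one-letter-per-phoneme fallback, the digit-by-digit number expansion, and the linear crossfade in `concatenate` are all simplifying assumptions.

```python
import re
from typing import Dict, List, Tuple

# Tiny illustrative lexicon (ARPAbet-style symbols); real lexicons are far larger.
LEXICON: Dict[str, List[str]] = {
    "hi": ["HH", "AY"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text: str) -> List[str]:
    """Text parsing: lowercase words, spell out digits one at a time, drop punctuation."""
    words = []
    for token in re.findall(r"[a-zA-Z]+|\d", text):
        words.append(DIGITS[int(token)] if token.isdigit() else token.lower())
    return words


def to_phonemes(word: str) -> List[str]:
    """Look the word up; otherwise 'sound it out' (crude: one letter = one phoneme)."""
    if word in LEXICON:
        return LEXICON[word]
    return [c.upper() for c in word if c.isalpha()]


def to_diphones(phonemes: List[str]) -> List[Tuple[str, str]]:
    """Pair each phoneme with the next, padding with silence ('sil') at both ends."""
    padded = ["sil"] + phonemes + ["sil"]
    return list(zip(padded, padded[1:]))


def front_end(text: str) -> List[Tuple[str, str]]:
    """Text -> words -> phonemes -> diphone sequence."""
    phonemes: List[str] = []
    for word in normalize(text):
        phonemes.extend(to_phonemes(word))
    return to_diphones(phonemes)


def concatenate(units: List[List[float]], overlap: int = 2) -> List[float]:
    """Append audio units, linearly crossfading `overlap` samples at each joint."""
    out: List[float] = []
    for unit in units:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit grows across the joint
            j = len(out) - n + i
            out[j] = out[j] * (1 - w) + unit[i] * w
        out.extend(unit[n:])
    return out
```

For example, `front_end("Hi 2 cats!")` starts with `("sil", "HH")` and ends with `("S", "sil")`. In a real diphone synthesizer each pair would index a recorded audio unit before concatenation and prosody post-processing.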
How Synthesis Works
• Demo
– /xnaSynth app
• Article
– http://www.brains-N-brawn.com/ttSpeech/
– http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)
Speech Recognition
• Speech to Text
– Dictation
– Command and Control
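The difference between the two modes can be sketched as follows: command and control accepts only utterances a grammar allows, while dictation accepts any word sequence. This is a hypothetical sketch; `GRAMMAR` and its rules are invented for illustration, and real engines apply the grammar during decoding to constrain the search, whereas filtering the final text here is a simplification.

```python
import re
from typing import Dict, Optional

# Hypothetical command-and-control grammar: each rule is a regex over the
# recognized words, with named groups acting as slots.
GRAMMAR: Dict[str, str] = {
    "open": r"open (?P<app>notepad|calculator)",
    "scroll": r"scroll (?P<direction>up|down)",
    "click": r"click (?P<target>\w+)",
}


def command_and_control(utterance: str) -> Optional[dict]:
    """Return the matched command and its slots, or None if the grammar rejects it."""
    text = utterance.strip().lower()
    for name, pattern in GRAMMAR.items():
        match = re.fullmatch(pattern, text)
        if match:
            return {"command": name, **match.groupdict()}
    return None


def dictation(utterance: str) -> str:
    """Dictation has no grammar constraint: any word sequence passes through."""
    return utterance.strip()
```

A small grammar is why command and control tends to be more accurate than dictation: the recognizer chooses among a handful of phrases instead of an open vocabulary.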
How Recognition Works
• Audio signal is processed
• Look for signals which might be speech
• Phonemes are found in audio signals
• Phonemes are mapped to a dictionary of words
– Dictation or grammar-based
• Apply natural language processing
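The second step above, finding signals that might be speech, can be approximated with simple energy-based detection: split the audio into frames, compute each frame's RMS energy, and keep runs above a threshold. This is a minimal sketch under stated assumptions; the 160-sample frame (10 ms at an assumed 16 kHz sample rate) and the fixed `threshold` are illustrative values, and real recognizers use more robust detectors.

```python
import math
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies


def find_speech(samples: List[float], frame_len: int = 160,
                threshold: float = 0.02) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges whose frame energy exceeds the threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i * frame_len                   # speech begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))  # speech ends
            start = None
    if start is not None:                            # speech runs to the last full frame
        segments.append((start, len(energies) * frame_len))
    return segments
```

Only the detected segments would then be passed on to phoneme recognition, which saves work and avoids matching words against background silence.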
How Recognition Works
• Demo
– /wavReader app
• Article
– http://www.brains-N-brawn.com/noReco/
– http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)
Outline: Vista Speech Recognizer
• Built into Vista’s shell
• Microphone bar
• Language support
• Can be trained to improve accuracy
• Command and control, plus dictation
• Automagic application support
• Horrible Office integration
• UAC problems
Demo
• Say what you see
• Show numbers
• Correct
• Spell it
• Mouse grid
http://www.istartedsomething.com/20060808/vis
High-Risk Demo
Hack
http://news.bbc.co.uk/1/hi/technology/6320865.s