
The Speech Speech

casey chesnut
brains-N-brawn.com

Madison .NET March 2007


Powerpoint
• Page Up
• Page Down
brains-N-brawn.com
• Pervasive Computing
– Tablet PC (MVP 03)
– Compact Framework (MVP 04)
– Advanced Web Services (MVP 05)
– Media Center (MVP 06)
– Speech
– Location Based Services
– Artificial Intelligence
– 3D
Outline
• Speech Overview
• Vista Speech Recognition
• SAPI 5.3 / System.Speech
• Speech Server 2007
Outline : Speech Overview
• Voice User Interface
• How does it work?
– Synthesis (TTS)
– Recognition (SR)
Overview
• Speech is just another presentation system
– Synthesis = Output to user
– Recognition = User input
• Voice User Interface (VUI)
VUI Modes
• Applications
– Multi-modal
– Voice-only
VUI Tips
• Don't replicate the touch-tone-based menu system
• Restrict options on the main (opening) menu to 4 or fewer
• Make sure your opening greeting is short
• Don't design the app solely for the new user
• Focus on task completion above all
• What can I say?

http://blogs.msdn.com/anandis_thoughts/archive/2006/
Speech Synthesis
• Text to Speech
– Dynamic
– Prompt database
How Synthesis Works
• Text parsing
– Sentences, numbers, symbols, pauses
• Natural language processing
– Part of speech, tense
• Phonemes are looked up or sounded out
• Diphones are appended together
• Post process audio to add emphasis
• Play speech audio
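The text-parsing step above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not how any particular TTS engine does it: the helper names (`split_sentences`, `number_to_words`, `normalize`) and the tiny symbol table are assumptions made up for this example. It only covers sentence splitting and spelling out numbers and a couple of symbols, the work that happens before phonemes are looked up.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical symbol table; a real engine has a far larger one.
SYMBOLS = {"%": "percent", "&": "and"}


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(sentence):
    """Turn one sentence into a list of speakable lowercase words."""
    words = []
    for tok in re.findall(r"\d+|[A-Za-z']+|[%&]", sentence):
        if tok.isdigit():
            n = int(tok)
            if n < 1000:
                words.append(number_to_words(n))
            else:
                # Larger numbers are simply read digit by digit here.
                words.append(" ".join(ONES[int(d)] for d in tok))
        elif tok in SYMBOLS:
            words.append(SYMBOLS[tok])
        else:
            words.append(tok.lower())
    return words
```

For example, `normalize("Battery at 42% & charging.")` yields the word list a later phoneme-lookup stage would consume.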
How Synthesis Works
• Demo
– /xnaSynth app
• Article
– http://www.brains-N-brawn.com/ttSpeech/
– http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)
Speech Recognition
• Speech to Text
– Dictation
– Command and Control
How Recognition Works
• Audio signal is processed
• Look for signals which might be speech
• Phonemes are found in audio signals
• Phonemes are mapped to a dictionary of words
– Dictation or grammar-based
• Apply natural language processing
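The "look for signals which might be speech" step is often approximated with a simple energy threshold. The sketch below is a toy version in Python under stated assumptions: samples are floats in the range -1.0..1.0, the frame size of 160 samples (20 ms at 8 kHz) and the threshold of 0.01 are arbitrary values picked for illustration, and the function names are hypothetical. Real recognizers use far more robust detectors.

```python
import math


def frame_energies(samples, frame_size):
    """Mean squared amplitude of each complete frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def speech_frames(samples, frame_size=160, threshold=0.01):
    """Return (start, end) frame-index ranges whose energy exceeds the threshold.

    The end index is exclusive, like a Python slice.
    """
    energies = frame_energies(samples, frame_size)
    segments = []
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx                      # a loud run begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx))    # the loud run ended
            start = None
    if start is not None:                    # audio ended while still loud
        segments.append((start, len(energies)))
    return segments


def demo_signal():
    """Two frames of silence, three frames of a 440 Hz tone, two of silence."""
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(480)]
    return [0.0] * 320 + tone + [0.0] * 320
```

Calling `speech_frames(demo_signal())` marks only the frames that contain the tone; the frames it returns are the ones a recognizer would go on to search for phonemes.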
How Recognition Works
• Demo
– /wavReader app
• Article
– http://www.brains-N-brawn.com/noReco/
– http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)
Outline : Vista Speech Recognizer
• Built into Vista’s shell
• Microphone bar
• Language support
• Can be trained to improve accuracy
• Command-and-control, also Dictation
• Automagic application support
• Horrible Office integration
• UAC problems
Demo
• Say what you see
• Show numbers
• Correct
• Spell it
• Mouse grid

http://www.istartedsomething.com/20060808/vis
High Risk Demo
Hack
http://news.bbc.co.uk/1/hi/technology/6320865.s

• /micBarExtend – tap and talk


Narrator
• Vista’s screen reader
Outline : SAPI 5.3 / System.Speech
• Desktop applications
– SAPI 5.3
– System.Speech
SAPI 5.3
• COM based
• Native applications
• Managed apps which need more control
System.Speech
• Part of .NET 3.0 WPF
• Managed wrapper built on SAPI 5.3
• Simple API
• Standards support (SSML, SRGS)
• Language support
• Vista Speech Recognition integration
• Does not work in XBAP
System.Speech.Synthesis
• SpeechSynthesizer
• SSML
• PromptBuilder
• Voices
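SSML is a W3C XML format, so a prompt can be assembled with any XML library. The sketch below builds a small SSML 1.0 document with Python's standard library. It is not the System.Speech `PromptBuilder` API: the `build_ssml` helper and its `(kind, value)` input format are assumptions invented for this illustration. Only the `speak`, `emphasis`, and `break` elements from the SSML specification are used.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(parts, lang="en-US"):
    """Build an SSML string from (kind, value) pairs.

    kind is "text", "emphasis", or "break" (value is a duration such as "500ms").
    """
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang}
    )
    last = None  # most recently added child element, if any
    for kind, value in parts:
        if kind == "text":
            # Plain text goes before the first child or after the latest one.
            if last is None:
                speak.text = (speak.text or "") + value
            else:
                last.tail = (last.tail or "") + value
        elif kind == "emphasis":
            last = ET.SubElement(speak, "emphasis")
            last.text = value
        elif kind == "break":
            last = ET.SubElement(speak, "break", {"time": value})
        else:
            raise ValueError("unknown part kind: " + kind)
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml([
    ("text", "Welcome to "),
    ("emphasis", "Madison"),
    ("break", "500ms"),
    ("text", "Let's begin."),
])
```

The resulting string is the kind of markup an SSML-aware synthesizer accepts in place of plain text.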
System.Speech.Synthesis
• Demo
– /speechSamples - /speechSynth
System.Speech.Recognition
• SpeechRecognizer / SpeechRecognitionEngine
• SRGS
• GrammarBuilder
• Advanced users
– Deep-link functionality
– Mixed initiative
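A command-and-control grammar in SRGS XML is essentially a sequence of alternatives. The sketch below generates a small "verb then object" grammar with Python's standard library. It illustrates the SRGS format only and is not the System.Speech `GrammarBuilder` API: the `build_command_grammar` and `accepted_phrases` helpers and the rule id `command` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_command_grammar(verbs, objects, lang="en-US"):
    """Return SRGS XML whose root rule matches one verb followed by one object."""
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xmlns": SRGS_NS,
        "xml:lang": lang,
        "root": "command",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "command", "scope": "public"})
    # Two <one-of> blocks in sequence: pick a verb, then pick an object.
    for alternatives in (verbs, objects):
        one_of = ET.SubElement(rule, "one-of")
        for phrase in alternatives:
            ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


def accepted_phrases(verbs, objects):
    """Enumerate every phrase the grammar above would accept."""
    return [v + " " + o for v in verbs for o in objects]


srgs = build_command_grammar(["open", "close"], ["mail", "calendar"])
```

Restricting the recognizer to a grammar like this is what makes command-and-control more reliable than open dictation: only the four enumerated phrases are candidates.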
System.Speech.Recognition
• Demo
– /speechSamples - /speechReco
System.Speech
• Demo
– /micBarExtend
– /mceSapiMcpl
• Article
– http://www.brains-N-brawn.com/speechSamples/
– http://www.brains-N-brawn.com/micBarExtend/
– http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)
What about Mobile Devices
• OEMs can add VoiceCommand
– VoiceCommand is not accessible to developers
• WindowsMobile has the SAPI API, but no engines
• PlatformBuilder is supposed to have engines
• There are 3rd party engines for purchase
Outline : Speech Server 2007
Speech Server 2007
• Telephony Applications
• Outgoing calls
• Speaker Independent
Speech Server 2007
• VOIP
• Language support
• VoiceXML / SALT
• Workflow development model
• Reports
• Still in beta
Speech Server 2007
• Speech Synthesis
– Inline
– PromptBuilder
– SSML
– Prompt databases
• Speech Recognition
– Inline
– Dynamic Grammar
– SRGS
– Conversational Grammar Builder
– DTMF
VoiceXML
• Declarative language
• Article
– http://www.brains-N-brawn.com/vxml/
– http://www.brains-N-brawn.com/myVoices/
– http://www.brains-N-brawn.com/voiceBio/
SALT
• Yet another declarative language
• Multimodal support has been dropped
• Article
– http://www.brains-N-brawn.com/noHands/
– http://www.brains-N-brawn.com/speechMulti/
– http://www.brains-N-brawn.com/tabletWeb/
– http://www.brains-N-brawn.com/mceSalt/
Speech Workflow
• Speech Sequence Workflow designer
• Speech activities
– Statement
– QuestionAnswer
• Debugging tools
Speech Workflow
• Demo
– /speechTextAdv
– /speakerVerify
– /mobileRecord
• Article
– http://www.brains-N-brawn.com/speechTextAdv/
– http://www.brains-N-brawn.com/speakerVerify/
Where
• Accessibility
• Telephony
• Telematics
• Home automation
• Mobile Devices / Tablets
• Gaming
• Warehouses
• …
Possible Future
• Telematics
• Service Pack for Office Support
• Exchange Server 2007
• Speech Server 2007 release
• Rumors that WindowsMobile will get a public API
• Dictation has room to improve
• Hope that System.Speech will ultimately work in XBAP
Questions
