Real-Time Speech Recognition

Thang Pham
Advisor: Shane Cotter
Background
Types of speech recognition systems:
Word recognition
Connected speech recognition
Speech understanding systems
Simplest: user-dependent, limited-vocabulary systems
Any system is hard to design because of:
Variations in speech, e.g. amplitude, duration, and signal-to-noise ratio
Background noise
Reverberation noise

Used in banking and telephone systems, among others
Example: IBM ViaVoice

Project Outline
Design a user-dependent speech recognition system to control
the movement of a small remote-control car

Limited vocabulary: Backward, Forward, Left, and Right
Trained to my voice

Different speech recognition algorithms were examined to
understand the advantages and disadvantages of each

Linear Predictive Coding
Cepstrum Coefficients
Mel-frequency Cepstrum Coefficients
System Design
Microphone
TI 6713 DSP Board
Sample word at 8 kHz
Segment word into time frames
Find mel-cepstrum coefficients for each frame
Compare input word to a codebook of defined words using dynamic time warping
Recognized word
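A minimal C sketch of the "segment word into time frames" step follows. The frame length (256 samples, 32 ms at 8 kHz) and hop (128 samples, 50% overlap) are hypothetical choices, since the slides do not give the values used, and `count_frames` and `get_frame` are illustrative names rather than the project's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical framing parameters (not given in the slides):
   256 samples = 32 ms at 8 kHz; a 128-sample hop gives 50% overlap. */
#define FRAME_LEN 256
#define FRAME_HOP 128

/* Number of complete frames that fit in a signal of n_samples samples. */
size_t count_frames(size_t n_samples)
{
    if (n_samples < FRAME_LEN)
        return 0;
    return 1 + (n_samples - FRAME_LEN) / FRAME_HOP;
}

/* Copy frame number `index` of `signal` into `frame` (FRAME_LEN samples).
   Returns 0 on success, or -1 if the frame would run past the end. */
int get_frame(const float *signal, size_t n_samples, size_t index, float *frame)
{
    assert(signal != NULL && frame != NULL);
    size_t start = index * FRAME_HOP;
    if (start + FRAME_LEN > n_samples)
        return -1;
    for (size_t i = 0; i < FRAME_LEN; i++)
        frame[i] = signal[start + i];
    return 0;
}
```

With these parameters, a one-second word (8000 samples) yields 61 frames. A window such as a Hamming window is often applied to each frame before computing coefficients; that step is omitted here.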
Components List
Texas Instruments TMS320C6713 DSP Board

Audio-Technica ATR35S omnidirectional microphone

Two stepper motors

Linear Predictive Coding
Provides a good model of the speech signal.
Can approximate a speech sample at time n from the p past samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$

where $a_1, a_2, \ldots, a_p$ are coefficients that weight each past sample.
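The prediction formula translates directly into C. This sketch assumes the coefficients a_1..a_p are already known and stored as `a[0]..a[p-1]`; estimating them (for example with the autocorrelation method) is not shown, and `lpc_predict` and `lpc_error` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Predict s(n) from the p previous samples:
   s(n) ~ a_1 s(n-1) + a_2 s(n-2) + ... + a_p s(n-p).
   The coefficient a_k is stored in a[k - 1]. Requires n >= p. */
double lpc_predict(const double *s, size_t n, const double *a, size_t p)
{
    assert(n >= p); /* every past sample s(n-1)..s(n-p) must exist */
    double pred = 0.0;
    for (size_t k = 1; k <= p; k++)
        pred += a[k - 1] * s[n - k];
    return pred;
}

/* Prediction error e(n) = s(n) - predicted s(n). */
double lpc_error(const double *s, size_t n, const double *a, size_t p)
{
    return s[n] - lpc_predict(s, n, a, p);
}
```

For example, with p = 2, a = {2, -1} (straight-line extrapolation) and s = {1, 2, 3, 4}, `lpc_predict(s, 3, a, 2)` returns 2·3 − 1·2 = 4, so the prediction error at n = 3 is zero.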
Mel-frequency Cepstrum Coefficients
Research has shown mel-frequency cepstrum coefficients to give better
recognition performance than cepstrum coefficients and LPC
Modeled on the human auditory system (the ear)

$$c_n = \sum_{k=1}^{M} \log(S_k) \cos\left[\frac{n\,(k - 0.5)\,\pi}{M}\right]$$

where $c_n$ is the $n$th-order mel-frequency cepstrum coefficient, $S_k$ is the power of the $k$th mel filter, and $M$ is the number of mel filters.

12 mel-frequency cepstrum coefficients characterize each time frame
Dynamic Time Warping
The mel-frequency coefficients of each frame are arranged into a vector

Dynamic time warping (DTW) is used to find the best match

DTW can compare words that are uttered at different speeds, i.e. over different lengths of time:
A reference word: the word the system is listening for

A sampled word: the word just spoken

Goal: compare the sampled and reference words and decide whether they match

Compare mel-frequency cepstrum coefficients for
each frame of speech

Dynamic Time Warping
Example of DTW:


Dynamic Time Warping
Solution:


Results
Word        Recognition rate
Backward    50%
Forward     70%
Left        90%
Right       40%
Sources of error:
1. Noise, e.g. a computer fan or fluorescent lighting
2. Voice changes: a word spoken one day might not sound the same the next day
3. Each word was trained with only one template
Problems Encountered
Warping the frequency domain into the mel-frequency scale:

$$F_{\text{mel}} = 2595 \log_{10}\left(1 + \frac{F_{\text{Hz}}}{700}\right)$$

Translating the MATLAB code into C, e.g. dynamic arrays and the debugging process
Dynamic time warping: both the theory and the algorithm
Future Work
The C implementation of this system is being developed.
The implementation will be uploaded onto the TI 6713 DSP
Board once it is completed.

The code will be modified to allow the recognition system
to operate in real-time.

More comprehensive testing of the system will be
performed under a variety of noise conditions.
That is all.
