
Speech Recognition

Using HTK

Speech
Sensation of air-pressure vibrations produced
by air exhaled from the lungs, then modulated
and shaped by the vibrating vocal cords and by
the resonances of the vocal tract as the air is
pushed out through the lips and nose.
An immensely information-rich signal
(it carries information about the words, speaker
identity, accent, expression, emotion and the
state of health of the speaker).
Human Communication
A. Speech Formulation
B. Human Vocal Mechanism
C. Acoustic Wave in Air
D. Perception of the Ear
E. Speech Comprehension
Schematic diagram of the speech production/perception process
Cocktail Party effect
Anatomy of Speech Production
It consists of three cavities: the pharyngeal,
oral and nasal cavities.
Air is expelled from the lungs through the
trachea, which causes the vocal cords to
vibrate. By positioning the articulators,
different sounds can be produced.
The time between successive vocal fold openings
is called the fundamental period T_0, while the rate
of vibration is called the fundamental frequency of
the vocal folds, F_0 = 1/T_0.
Pitch is often used interchangeably with
fundamental frequency.
For men, pitch usually lies somewhere between
50 and 250 Hz, while for women it usually lies
somewhere between 120 and 500 Hz.
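
To make the relation between the fundamental period and the fundamental frequency concrete, here is a minimal sketch that estimates the pitch of a voiced frame by autocorrelation. It is an illustration only, not something taken from the slides or from HTK. The function name `estimate_f0`, the 8 kHz sample rate and the synthetic 125 Hz test tone are all assumptions made for the example. The search is limited to 50-500 Hz, which spans the male and female ranges quoted above.

```python
import math

def estimate_f0(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    Picks the lag, within the allowed pitch range, at which the frame
    best matches a shifted copy of itself (autocorrelation peak).
    """
    min_lag = int(sample_rate / f_max)   # shortest period considered
    max_lag = int(sample_rate / f_min)   # longest period considered
    if max_lag >= len(signal):
        raise ValueError("frame too short for the requested f_min")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation sum. Longer lags overlap fewer
        # samples, which mildly favours the shorter period and helps
        # avoid picking a multiple of the true period.
        n = len(signal) - lag
        score = sum(signal[i] * signal[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score

    t0 = best_lag / sample_rate          # fundamental period T_0 in seconds
    return 1.0 / t0                      # F_0 = 1 / T_0

# Hypothetical test signal: a 125 Hz fundamental plus two weaker harmonics.
sample_rate = 8000
true_f0 = 125.0
frame = [
    math.sin(2 * math.pi * true_f0 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * true_f0 * i / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 3 * true_f0 * i / sample_rate)
    for i in range(768)
]

f0 = estimate_f0(frame, sample_rate)
print(f"Estimated F0: {f0:.1f} Hz (period {1000.0 / f0:.2f} ms)")
```

Real speech is noisier than this synthetic tone, so practical pitch trackers usually add voicing detection and smoothing across frames; the sketch leaves those out.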
