Speech Recognition
Punit Kumar Sharma, Dr. B.R. Lakshmikantha and K. Shanmukha Sundar
I. INTRODUCTION
Speech is one of the natural forms of communication. Recent developments have made it possible to use it in security systems. In speaker recognition, the task is to use a speech sample to select the identity of the person who produced the speech from among a population of speakers. This paper uses the speaker's voice to verify their identity and to control access to a DC motor. The approach can be applied to voice dialling, telephone banking, driving electric vehicles, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers [1,3]. The MFCC approach used here for speech recognition is considered more accurate than approaches based on Linear Predictive Coding (LPC) and Hidden Markov Models (HMM). The external DC motor is connected through an interface between the computer and a hardware circuit. The hardware circuit consists mainly of an 8051 microcontroller, a MAX232 IC, and an L293D driver IC.
$$\mathrm{mel}(f) = 2595\,\log_{10}\left(1 + \frac{f}{700}\right) \qquad (1)$$
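Equation (1) maps a frequency f in hertz to the mel scale. As a minimal sketch (in Python rather than the MATLAB used in this paper; the function names are hypothetical), the mapping and its inverse, obtained by solving (1) for f, can be written as:

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Equation (1): mel(f) = 2595 * log10(1 + f / 700), with f in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of equation (1), obtained by solving it for f (result in Hz)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With these constants, `hz_to_mel(1000.0)` comes out very close to 1000 mel, and `hz_to_mel(700.0)` equals 2595 log10 2, about 781 mel.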
C. Cepstrum
The log mel spectrum (the logarithm of the filter-bank outputs spaced on the mel scale of equation (1)) has to be converted back into time. The result is called the mel frequency cepstral coefficients (MFCCs). The conversion to the time domain is done with the Discrete Cosine Transform (DCT), and the MFCCs may be calculated using the following equation [6,8].
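$$C_n = \sum_{k=1}^{K} \left(\log S_k\right) \cos\left[n\left(k - \frac{1}{2}\right)\frac{\pi}{K}\right] \qquad (2)$$
where S_k is the output of the k-th mel filter and n = 1, 2, 3, ..., K.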
The number of mel cepstrum coefficients, K, is typically chosen as 16. The first component, C0, is excluded from the DCT since it represents the mean value of the input signal, which carries little speaker-specific information [1,3].
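Equation (2) can be transcribed directly. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the K mel filter-bank outputs S_k as a list of positive numbers, uses the natural logarithm (equation (2) does not fix the base), and returns C_1 to C_K, leaving out C_0 as described above. `mel_cepstrum` is a hypothetical name.

```python
import math
from typing import List, Sequence


def mel_cepstrum(mel_energies: Sequence[float]) -> List[float]:
    """Return C_1..C_K from K mel filter-bank outputs S_1..S_K, per equation (2).

    Assumptions: every S_k > 0 and "log" is the natural logarithm.
    C_0 (the n = 0 term) is deliberately not computed.
    """
    K = len(mel_energies)
    if K == 0:
        raise ValueError("need at least one mel filter output")
    log_s = [math.log(s) for s in mel_energies]  # log S_k for k = 1..K
    coeffs = []
    for n in range(1, K + 1):
        # C_n = sum_{k=1}^{K} log(S_k) * cos[n (k - 1/2) pi / K]
        c_n = sum(log_s[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
                  for k in range(1, K + 1))
        coeffs.append(c_n)
    return coeffs
```

For n = 1, ..., K the cosine terms sum to zero over k, so multiplying every S_k by the same constant (for example, a change in microphone gain) shifts every log S_k equally and leaves C_1 to C_K unchanged; that overall level would appear only in C_0. With K = 16, `mel_cepstrum([2.0] * 16)` returns sixteen values that are zero up to rounding.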
IV. FEATURE MATCHING
Feature matching techniques used in speaker recognition include Dynamic Time Warping (DTW), Hidden Markov Modeling (HMM), and Vector Quantization (VQ). The VQ approach is used here for its ease of implementation and high accuracy.
A. Vector Quantization
Vector quantization (VQ) is a lossy data compression method based on the principle of block coding [9]. It is a fixed-to-fixed length algorithm. VQ may be thought of as an approximator. Fig. 2 shows an example of a two-dimensional VQ.
Here, every pair of numbers falling in a particular region
is approximated by a star associated with that region. In Fig.
2, the stars are called codevectors and the regions defined by
the borders are called encoding regions. The set of all
codevectors is called the codebook.
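Encoding with a codebook such as the one in Fig. 2 assigns each vector to the codevector of its encoding region, and the experimental setup names the LBG algorithm for training the codebook. The following is a generic textbook LBG sketch in plain Python, not the authors' MATLAB implementation: it starts from the centroid of the training vectors, splits each codevector y into y(1+eps) and y(1-eps), and refines with nearest-neighbour assignment and centroid updates until the relative drop in average distortion falls below tol. Squared Euclidean distance, eps = 0.01, tol = 0.001, the power-of-two codebook size, and all function names are assumptions.

```python
import math
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codevector whose encoding region contains v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def lbg(training: Sequence[Sequence[float]], size: int,
        eps: float = 0.01, tol: float = 1e-3) -> List[List[float]]:
    """Train a codebook of `size` codevectors (a power of two) by LBG splitting."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [centroid(training)]
    while len(codebook) < size:
        # Split each codevector y into y*(1+eps) and y*(1-eps).
        codebook = [[x * (1 + s) for x in y] for y in codebook for s in (eps, -eps)]
        prev = math.inf
        while True:
            # Nearest-neighbour step: partition training vectors into encoding regions.
            regions = [[] for _ in codebook]
            for v in training:
                regions[nearest(v, codebook)].append(v)
            # Centroid step: move each codevector to the mean of its region
            # (a codevector whose region is empty is kept unchanged).
            codebook = [centroid(r) if r else y for r, y in zip(regions, codebook)]
            # Average distortion of the training set with the updated codebook.
            dist = sum(sq_dist(v, codebook[nearest(v, codebook)])
                       for v in training) / len(training)
            if prev - dist <= tol * dist:
                break
            prev = dist
    return codebook
```

The codebook grows by doubling, which is why `size` must be a power of two in this sketch. In VQ-based speaker recognition, one such codebook is typically trained per enrolled speaker from that speaker's MFCC vectors.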
[Figure: block diagram of the system. Blocks: Microphone, Audio signal, MFCC, Vector Quantization, Euclidean Distance, PC, Serial Port Interface, Microcontroller, Driver Unit, DC Motor.]
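The Euclidean distance block in the block diagram is where an unknown utterance is compared with the stored codebooks. One common way to do this, assumed here rather than taken from the paper, is to compute, for every stored codebook, the average Euclidean distance from each MFCC frame to its nearest codevector, and to pick the codebook with the smallest value. The labels and function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Mapping, Sequence


def avg_distortion(frames: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance from each frame to its nearest codevector."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify(frames: Sequence[Sequence[float]],
             codebooks: Mapping[str, Sequence[Sequence[float]]]) -> str:
    """Label of the stored codebook that gives the smallest average distortion."""
    return min(codebooks, key=lambda label: avg_distortion(frames, codebooks[label]))


# Hypothetical example: two stored 2-D codebooks and an unknown utterance.
codebooks = {
    "speaker_1": [[0.0, 0.0], [1.0, 0.0]],
    "speaker_2": [[10.0, 10.0]],
}
unknown = [[0.1, 0.0], [0.9, 0.1]]
print(identify(unknown, codebooks))  # speaker_1
```

A practical system would usually also reject the utterance when even the smallest average distortion exceeds a threshold; the sketch omits this for brevity.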
V. EXPERIMENTAL SETUP
In the experimental setup of the DC motor drive through speech recognition, the speech signal is captured by a microphone connected to a computer. Software written in MATLAB 7.5 calculates the MFCCs and performs VQ (using the LBG algorithm) to recognize the input speech taken from the microphone. On the hardware side, an 8051 microcontroller, programmed in Embedded C, makes the DC motor respond to the recognized speech. The interfacing between the computer and the microcontroller is done over RS-232, and the L293D driver IC is used to drive the DC motor. Fig. 6 shows the basic speech recognition block diagram.
VI. RESULTS
VII. CONCLUSION
In this paper, MFCC and VQ techniques are used in speech recognition to control a DC motor drive. The code developed in MATLAB using MFCC and VQ could also be used to control and drive other motors, such as stepper motors and servo motors. The developed speech algorithm can be used for navigation, for driving electric vehicles, and in security applications (such as banking, unmanned vehicles, and remote access to computers, where speech can be used as a password).
VIII. REFERENCES
[1] W. Han, C.-F. Chan, C.-S. Choy, and K.-P. Pun, "An Efficient MFCC Extraction Method in Speech Recognition," Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, IEEE, 2006.
[2] Wang Chen and Miao Zhenjiang, "Differential MFCC and Vector Quantization Used for Real-Time Speaker Recognition System," 2008 IEEE Congress on Image and Signal Processing, Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.
[3] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, and Md. Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral Coefficient," Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, 3rd International Conference on Electrical & Computer Engineering (ICECE 2004), Dhaka, Bangladesh, 28-30 December 2004.
[4] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[5] Yuan Zhong-Xuan, Xu Bo-Ling, and Yu Chong-Zhi, "Binary Quantization of Feature Vectors for Robust Text-Independent Speaker Identification," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, January 1999, IEEE, New York, NY, USA.
[6] F. Soong, A. E. Rosenberg, B. Juang, and L. Rabiner, "A Vector Quantization Approach to Speaker Recognition," AT&T Technical Journal, vol. 66, March/April 1987, pp. 14-26.
[7] comp.speech Frequently Asked Questions WWW site, http://svr-www.eng.cam.ac.uk/comp.speech/
[8] J. R. Deller Jr., J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd ed., IEEE Press, New York, 2000.
[9] R. M. Gray, "Vector Quantization," IEEE ASSP Magazine, pp. 4-29, April 1984.
IX. BIOGRAPHIES
Punit Kumar Sharma was born in 1985 in Rajasthan, India. He received his B.E. (Electrical Engineering) in 2007 from the University of Rajasthan and is pursuing his M.Tech in Power Electronics at Visvesvaraya Technological University, Belgaum (Karnataka). His areas of interest include speech recognition, artificial intelligence, embedded systems, and electrical drives.