
Real Time Control of DC Motor Drive using

Speech Recognition
Punit Kumar Sharma, Dr. B.R. Lakshmikantha and K. Shanmukha Sundar

Abstract- This paper introduces a new approach to control
and drive a DC motor using speech recognition. The speech
signal is provided through a microphone connected to a
computer. A DC motor connected through a microcontroller
can be driven in the forward or reverse direction at different
speeds, and can also be stopped, by giving speech commands.
In this paper, Mel Frequency Cepstral Coefficients (MFCC)
are used to recognize the user's speech, and vector
quantization (VQ) is used to increase speech recognition
accuracy. The MFCC and VQ algorithms for speech
recognition have been implemented in MATLAB 7.5 on the
Windows Vista platform, and the supporting hardware setup
has been implemented.
Keywords- Speech Recognition, MFCC, Vector Quantization.

I. INTRODUCTION
Speech is one of the most natural forms of communication.
Recent developments have made it possible to use it in
security systems. In speech recognition, the task is to use a
speech sample to select the identity of the person who
produced the speech from among a population of speakers.
This paper makes it possible to use the speaker's voice to
verify their identity and control access to a DC motor. This
approach can be used in applications such as voice dialling,
banking by telephone, driving electrical vehicles, telephone
shopping, database access services, information services,
voice mail, security control for confidential information
areas, and remote access to computers [1,3]. The MFCC
algorithm for speech recognition is more accurate than
Linear Prediction Coding (LPC) and the Hidden Markov
Model (HMM). The external DC motor is connected through
an interface between the computer and the hardware circuit.
The hardware circuit mainly consists of a microcontroller
(8051), a MAX 232 IC, and a driver IC (L293D).

Punit Kumar Sharma, Department of Electrical & Electronics Engineering,
Dayananda Sagar College of Engineering, Bangalore, India
(e-mail: pkselectrical@gmail.com)
Dr. B.R. Lakshmikantha, Department of Electrical & Electronics
Engineering, Dayananda Sagar College of Engineering, Bangalore, India
(e-mail: lkantha@indiatimes.com)
K. Shanmukha Sundar, Department of Electrical & Electronics
Engineering, Dayananda Sagar College of Engineering, Bangalore, India
(e-mail: bonniekhs@gmail.com)

978-1-4244-7882-8/11/$26.00 2011 IEEE

II. PRINCIPLES OF SPEECH RECOGNITION


Speaker recognition methods can be divided into text-independent
and text-dependent methods. In a text-independent system,
speaker models capture characteristics of somebody's speech
which show up irrespective of what one is saying [4]. In a
text-dependent system, the recognition of the speaker's
identity is based on the user speaking one or more specific
phrases, like passwords, card numbers, PIN codes, etc. Each
speaker recognition technology, identification and
verification, whether text-independent or text-dependent, has
its own advantages and disadvantages and may require
different treatments and techniques. The choice of which
technology to use is application-specific. At the highest
level, all speaker recognition systems contain two main
modules: feature extraction and feature matching [5, 6].
III. METHODOLOGY
The purpose of this module is to convert the speech
waveform to some type of parametric representation at a
considerably lower information rate. The speech signal is a
slowly time-varying signal. When examined over a
sufficiently short period of time (between 5 and 100 ms), its
characteristics are fairly stationary. However, over long
periods of time (on the order of 0.2 s or more) the signal
characteristics change to reflect the different speech sounds
being spoken. Such a slowly time-varying speech signal is
called quasi-stationary. A number of methods are available
for parametrically representing the speech signal for the
speaker recognition task, such as Linear Prediction Coding
(LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and
others [3]. MFCC is perhaps the best known and most
popular. MFCCs are based on the known variation of the
human ear's critical bandwidths with frequency. The MFCC
technique makes use of two types of filter, namely, linearly
spaced filters and logarithmically spaced filters.
A. MFCC Processor
Fig. 1 shows the block diagram of an MFCC processor. The
speech input is recorded at a sampling frequency of
12500 Hz, chosen to minimize the effects of aliasing in the
analog-to-digital conversion process.
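As a rough sketch of this front end, the recorded signal is typically cut into short overlapping frames, and each frame is multiplied by a Hamming window before further spectral analysis (as in Figs. 11 and 12). The paper does not state the frame length or hop, so the values below are assumptions for illustration:

```python
import math

FS = 12500   # sampling frequency used in the paper (Hz)
FRAME = 250  # 20 ms frame length in samples (assumed; not given in the paper)
HOP = 100    # 8 ms hop between frames (assumed)

def hamming(N):
    # Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def windowed_frames(signal):
    """Cut the signal into overlapping frames and apply the Hamming window."""
    w = hamming(FRAME)
    return [[signal[i + n] * w[n] for n in range(FRAME)]
            for i in range(0, len(signal) - FRAME + 1, HOP)]
```

Each windowed frame would then be passed through spectral analysis and the mel filter bank described next.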
B. Mel Frequency Wrapping
The speech signal consists of tones with different
frequencies. For each tone with an actual frequency f,
measured in Hz, a subjective pitch is measured on the mel
scale. The mel-frequency scale has linear frequency spacing
below 1000 Hz and logarithmic spacing above 1000 Hz. As
a reference point, the pitch of a 1 kHz tone, 40 dB above the
perceptual hearing threshold, is defined as 1000 mels.
The following formula can be used to compute the mels for
a given frequency f, in Hz [8]:

mel(f) = 2595 * log10(1 + f/700) ............ (1)
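Equation (1) can be checked numerically with a small sketch; the inverse function is not given in the paper but follows algebraically:

```python
import math

def hz_to_mel(f):
    """Eq. (1): subjective pitch in mels for a frequency f in Hz."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # algebraic inverse of eq. (1)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Consistent with the reference point above, hz_to_mel(1000) evaluates to approximately 1000 mels.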

Fig 2. An example of a 2-dimensional VQ

B. LBG Design Algorithm

The LBG VQ design algorithm, proposed by Y. Linde, A.
Buzo, and R. Gray, is an iterative algorithm that alternately
solves the optimality criteria [10].

Fig 1. Block diagram of MFCC processor

C. Cepstrum
The log mel spectrum obtained from equation (1) has to be
converted back to the time domain. The result is called the
mel frequency cepstrum coefficients (MFCCs). The
conversion is done using the Discrete Cosine Transform
(DCT). The MFCCs may be calculated using the following
equation [6,8]:

Cn = Σ(k=1 to K) (log Sk) * cos[ n*(k - 1/2)*π/K ] ......(2)

where n = 1, 2, ..., K and Sk denotes the output of the k-th
mel filter.
The number of mel cepstrum coefficients, K, is typically
chosen as 16. The first component, C0, is excluded from the
DCT since it represents the mean value of the input signal,
which carries little speaker-specific information [1,3].
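Equation (2) can be sketched directly, assuming the log mel filter-bank outputs (log Sk) are already available as a list:

```python
import math

def mel_cepstrum(log_S):
    """Eq. (2): C_n = sum over k of (log S_k) * cos(n*(k - 1/2)*pi/K), n = 1..K."""
    K = len(log_S)
    return [sum(log_S[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
                for k in range(1, K + 1))
            for n in range(1, K + 1)]
```

Note that log_S holds log Sk values, so the logarithm is applied before calling. A constant log spectrum yields coefficients that are all numerically zero, reflecting that the mean of the signal is carried by the excluded C0 term.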
IV. FEATURE MATCHING
Feature matching techniques used in speaker recognition
include Dynamic Time Warping (DTW), Hidden Markov
Modeling (HMM), and Vector Quantization (VQ). The VQ
approach has been used here for its ease of implementation
and high accuracy.
A. Vector Quantization
Vector quantization (VQ) is a lossy data compression
method based on the principle of block coding [9]. It is a
fixed-to-fixed length algorithm. VQ may be thought of as an
approximator. Fig. 2 shows an example of a two-dimensional
VQ. Here, every pair of numbers falling in a particular
region is approximated by the star associated with that
region. In Fig. 2, the stars are called codevectors and the
regions defined by the borders are called encoding regions.
The set of all codevectors is called the codebook.
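This nearest-star mapping can be sketched with a small hypothetical two-dimensional codebook (the codevectors below are made up for illustration):

```python
import math

# hypothetical codebook of four codevectors ("stars")
CODEBOOK = [(0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)]

def quantize(point):
    """Approximate a 2-D point by the codevector of its encoding region."""
    return min(CODEBOOK, key=lambda c: math.dist(point, c))
```

Any pair of numbers closer to (0, 0) than to the other three stars is approximated by (0, 0), and likewise for the other encoding regions.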

Fig 3. Flowchart of VQ-LBG algorithm

The algorithm requires an initial codebook, which is
obtained by the splitting method. In this method, an initial
codevector is set as the average of the entire training
sequence. This codevector is then split into two. The
iterative algorithm is run with these two vectors as the
initial codebook. The final two codevectors are split into
four, and the process is repeated until the desired number of
codevectors is obtained. The algorithm is summarized in the
flowchart of Fig. 3. In Fig. 4, the VQ is shown for two
speakers: the circles refer to speaker 1 and the triangles to
speaker 2 [5]. In the training phase, a speaker-specific VQ
codebook is generated for each known speaker. Fig. 5 shows
the use of different numbers of centroids for the same data
field. After the MFCCs and the VQ codebooks are computed,
the Euclidean distance is calculated to find the nearest
speech match.
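The splitting procedure above can be sketched as follows. This is an illustrative reduction of the LBG algorithm [10], not the authors' MATLAB code; the split factor and iteration count are assumptions:

```python
import math

def lbg(training, n_codevectors, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting and centroid updates (LBG sketch)."""
    dim = len(training[0])
    # initial codevector: average of the entire training sequence
    codebook = [[sum(v[d] for v in training) / len(training) for d in range(dim)]]
    while len(codebook) < n_codevectors:
        # split every codevector into two slightly perturbed copies
        codebook = [[c * (1 + s * eps) for c in cv] for cv in codebook for s in (1, -1)]
        for _ in range(iters):
            # assign each training vector to its nearest codevector
            cells = [[] for _ in codebook]
            for v in training:
                i = min(range(len(codebook)), key=lambda j: math.dist(v, codebook[j]))
                cells[i].append(v)
            # move each codevector to the centroid of its cell (keep it if the cell is empty)
            codebook = [[sum(v[d] for v in cell) / len(cell) for d in range(dim)] if cell else cv
                        for cv, cell in zip(codebook, cells)]
    return codebook
```

Run on two well-separated clusters of training vectors, the sketch converges to one codevector near each cluster centroid.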

Audio signal → MFCC → Vector Quantization → Euclidean Distance → Recognized speaker output

Fig 6. Basic speech recognition block diagram
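The final matching stage of this pipeline, assuming per-speaker codebooks have already been trained, can be sketched as:

```python
import math

def avg_distortion(mfcc_vectors, codebook):
    """Average Euclidean distance from each MFCC vector to its nearest codevector."""
    return sum(min(math.dist(v, c) for c in codebook) for v in mfcc_vectors) / len(mfcc_vectors)

def recognize(mfcc_vectors, codebooks):
    """Return the speaker ID whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda sid: avg_distortion(mfcc_vectors, codebooks[sid]))
```

The speaker IDs and codebook shapes here are placeholders; in the paper, the codebooks come from the VQ-LBG training phase.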

Fig. 7 below shows the complete experimental setup for the
DC motor drive through speech recognition; PC denotes the
personal computer.

Fig 4. Conceptual diagram that explains the VQ process

Microphone → PC → Serial port interface → Microcontroller → Driver unit → DC motor

Fig 7. Complete experimental setup
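On the PC side of this setup, a recognized command must be sent to the 8051 over the serial link. The paper does not specify the byte-level protocol, so the encoding below (direction in the low bits, speed level in the high bits) is purely a hypothetical illustration:

```python
# Hypothetical one-byte command protocol; the actual firmware protocol
# used with the 8051 is not given in the paper.
COMMANDS = {"forward": 0x01, "reverse": 0x02, "stop": 0x00}

def encode_command(word, speed=0):
    """Pack a recognized word and a speed level (0-7) into one command byte."""
    if word not in COMMANDS:
        raise ValueError("unrecognized command: %r" % word)
    if not 0 <= speed <= 7:
        raise ValueError("speed must be in 0..7")
    return COMMANDS[word] | (speed << 4)

# The byte would then be written to the RS-232 port, e.g. with the pyserial
# package: serial.Serial("COM1", 9600).write(bytes([encode_command("forward", 3)]))
```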

Fig 5. Pictorial view of a codebook with 15 centroids

V. EXPERIMENTAL SETUP
In the experimental setup of the DC motor drive through
speech recognition, the speech signal is taken by a
microphone connected to the computer. On the software
side, MATLAB 7.5 is used to calculate the MFCCs and the
VQ codebooks (LBG algorithm) and to recognize the input
speech taken from the microphone. On the hardware side, a
microcontroller (8051), programmed in Embedded C, makes
the DC motor respond to the recognized commands. The
interfacing between the computer and the microcontroller is
done over RS-232. The driver IC L293D is used to drive the
DC motor. Fig. 6 shows the basic speech recognition block
diagram.

VI. RESULTS

The code has been developed using the MFCC and VQ
algorithms, in MATLAB 7.5 on the Windows Vista
platform, and the supporting hardware has also been
implemented. As an example, the speech database consists
of 10 speech samples; Fig. 8 shows a speech sample being
added to the database, and Fig. 9 shows the recognized
speaker ID 2.
The interfacing between hardware and software is done
using an RS-232 cable (MAX-232 IC). The external DC
motor can be driven in the forward or reverse direction, and
can also be stopped, by giving speech commands. Several
graphs were obtained while calculating the MFCCs for the
database during speech recognition; these are shown below.
Fig. 10 shows the graph of the speaker voice database
(example), Fig. 11 shows the Hamming window, Fig. 12
shows the Hamming window multiplied by the input signal,
and Fig. 13 shows the application of the filter bank.

Fig 8. Sound added to database in MATLAB

Fig 9. Recognized speaker ID 2

Fig 10. Graph of speaker voice database

Fig 11. Graph for hamming window

Fig 12. Hamming window multiplied by input signal

Fig 13. Applied filter banks

VII. CONCLUSION
In this paper, MFCC and VQ techniques are used in speech
recognition to control the DC motor drive. The code
developed in MATLAB using MFCC and VQ can also be
used to control and drive stepper motors, servo motors, etc.
The developed speech algorithm can be used for navigation
purposes, to drive electric vehicles, and in security areas
(such as banking, unmanned vehicles, and remote access to
computers, where speech can be used as a password).
VIII. REFERENCES
[1] Wei Han, Cheong-Fat Chan, Chiu-Sing Choy, and Kong-Pang Pun, "An
Efficient MFCC Extraction Method in Speech Recognition," Department of
Electronic Engineering, The Chinese University of Hong Kong, IEEE, 2006.
[2] Wang Chen and Miao Zhenjiang, "Differential MFCC and Vector
Quantization Used for Real-Time Speaker Recognition System," 2008 IEEE
Congress on Image and Signal Processing, Institute of Information Science,
Beijing Jiaotong University, Beijing 100044, China.
[3] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, and Md.
Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral
Coefficients," Electrical and Electronic Engineering, Bangladesh University
of Engineering and Technology, 3rd International Conference on Electrical
& Computer Engineering (ICECE 2004), 28-30 December 2004, Dhaka,
Bangladesh.
[4] Lawrence Rabiner and Biing-Hwang Juang, Fundamental of Speech
Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
[5] Zhong-Xuan, Yuan & Bo-Ling, Xu & Chong-Zhi, Yu. (1999). Binary
Quantization of Feature Vectors for Robust Text-Independent Speaker
Identification in IEEE Transactions on Speech and Audio Processing, Vol. 7,
No. 1, January 1999. IEEE, New York, NY, U.S.A.
[6] F. Soong, E. Rosenberg, B. Juang, and L. Rabiner, "A Vector
Quantization Approach to Speaker Recognition", AT&T Technical Journal,
vol. 66, March/April 1987, pp. 14-26.
[7] Comp.speech Frequently Asked Questions WWW site,
http://svr-www.eng.cam.ac.uk/comp.speech/
[8] J. R. Deller Jr., J. H. L. Hansen, and J. G. Proakis, Discrete-Time
Processing of Speech Signals, second ed. IEEE Press, New York, 2000.
[9] R. M. Gray, "Vector Quantization," IEEE ASSP Magazine, pp. 4-29,
April 1984.

[10] Y. Linde, A. Buzo, and R. Gray, "An algorithm for vector quantizer
design," IEEE Transactions on Communications, Vol. 28, pp. 84-95, 1980.
[11] Minghai Yao, Jing Hu, and Qinlong Gu, "A Mixed Parameter Method
Based on MFCC and Fractal Dimension for Speech Recognition," College of
Information Engineering, Zhejiang University of Technology, Hangzhou,
310032, China, Proceedings of the 2006 IEEE International Conference on
Information Acquisition, August 20-23, 2006, Weihai, Shandong, China.

IX. BIOGRAPHIES
Punit Kumar Sharma was born in 1985 in Rajasthan,
India. He received his B.E. (Electrical Engineering)
in 2007 from the University of Rajasthan and is
pursuing his M.Tech in Power Electronics from
Visvesvaraya Technological University, Belgaum
(Karnataka). His areas of interest are speech
recognition, artificial intelligence, embedded
systems, electrical drives, etc.

Dr. B.R. Lakshmikantha obtained his B.E.
(Electrical Engineering) in 1979 from Bangalore
University, M.E. (Power Systems) in 1981 from
Visvesvaraya Technological University, and Ph.D. in
Power System Stability from Visvesvaraya
Technological University. He is working as Dean of
Academics and HOD of the EEE Dept. at Dayananda
Sagar College of Engineering, Bangalore-78. His
areas of interest are FACTS devices.

K. Shanmukha Sundar obtained his B.E.
(Electrical Engineering) degree in 1990 and his
M.Tech. (Power Systems) in 1994 from Mysore and
Mangalore Universities, respectively. Presently he is
working as an associate professor in the Department
of Electrical & Electronics Engineering, Dayananda
Sagar College of Engineering, Bangalore-78. His
areas of interest are power system optimization,
FACTS controllers, electrical drives, etc.
