
Real Time Control of DC Motor Drive using

Speech Recognition
Punit Kumar Sharma, Dr. B.R. Lakshmikantha and K. Shanmukha Sundar

Abstract- This paper introduces a new approach to control
and drive a DC motor using speech recognition. The speech
signal is provided through a microphone connected to a
computer. A DC motor connected through a microcontroller
can be driven in the forward or reverse direction at different
speeds, and can also be stopped, by giving speech commands.
In this paper, Mel Frequency Cepstral Coefficients (MFCC)
are used to recognize the user's speech, and vector
quantization (VQ) is used to increase speech recognition
accuracy. The MFCC and VQ algorithms for speech
recognition have been implemented in MATLAB 7.5 on the
Windows Vista platform, and the supporting hardware setup
has been implemented.
Keywords- Speech Recognition, MFCC, Vector Quantization.

I. INTRODUCTION
Speech is one of the most natural forms of communication.
Recent developments have made it possible to use it in
security systems. In speech recognition, the task is to use a
speech sample to select the identity of the person who
produced the speech from among a population of speakers.
This paper makes it possible to use the speaker's voice to
verify their identity and control access to a DC motor. This
approach can be used in applications such as voice dialling,
banking by telephone, driving electrical vehicles, telephone
shopping, database access services, information services,
voice mail, security control for confidential information
areas, and remote access to computers [1,3]. The MFCC
algorithm for speech recognition is more accurate than
Linear Prediction Coding (LPC) and the Hidden Markov
Model (HMM). The external DC motor is connected through
an interface between the computer and the hardware circuit.
The hardware circuit mainly consists of a microcontroller
(8051), a MAX 232 IC, and a driver IC (L293D).

Punit Kumar Sharma, Department of Electrical & Electronics Engineering,
Dayananda Sagar College of Engineering, Bangalore, India
(e-mail: pkselectrical@gmail.com)
Dr. B.R. Lakshmikantha, Department of Electrical & Electronics
Engineering, Dayananda Sagar College of Engineering, Bangalore, India
(e-mail: lkantha@indiatimes.com)
K. Shanmukha Sundar, Department of Electrical & Electronics
Engineering, Dayananda Sagar College of Engineering, Bangalore, India
(e-mail: bonniekhs@gmail.com)

978-1-4244-7882-8/11/$26.00 2011 IEEE

II. PRINCIPLES OF SPEECH RECOGNITION


Speaker recognition methods can be divided into text-independent
and text-dependent methods. In a text-independent system,
speaker models capture characteristics of somebody's speech
which show up irrespective of what one is saying [4]. In a
text-dependent system, the recognition of the speaker's
identity is based on the user speaking one or more specific
phrases, like passwords, card numbers, PIN codes, etc. Each
speaker recognition technology, identification and
verification, whether text-independent or text-dependent, has
its own advantages and disadvantages and may require
different treatments and techniques. The choice of which
technology to use is application-specific. At the highest
level, all speaker recognition systems contain two main
modules: feature extraction and feature matching [5, 6].
III. METHODOLOGY
The purpose of this module is to convert the speech
waveform to some type of parametric representation at a
considerably lower information rate. The speech signal is a
slowly time-varying signal. When examined over a
sufficiently short period of time (between 5 and 100 ms), its
characteristics are fairly stationary. However, over long
periods of time (on the order of 0.2 s or more) the signal
characteristics change to reflect the different speech sounds
being spoken. Such a slowly time-varying speech signal is
called quasi-stationary. A number of methods are available
for parametrically representing the speech signal for the
speaker recognition task, such as Linear Prediction Coding
(LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and
others [3]. MFCC is perhaps the best known and most
popular. MFCCs are based on the known variation of the
human ear's critical bandwidths with frequency. The MFCC
technique makes use of two types of filter, namely, linearly
spaced filters and logarithmically spaced filters.
A. MFCC Processor
Fig. 1 shows the block diagram of an MFCC processor. The
speech input is recorded at a sampling frequency of
12500 Hz, chosen to minimize the effects of aliasing in the
analog-to-digital conversion process.
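As a rough sketch of this front end, the recorded signal is typically cut into short overlapping frames, and each frame is multiplied by a Hamming window before further spectral analysis (as in Figs. 11 and 12). The paper does not state the frame length or hop, so the values below are assumptions for illustration:

```python
import math

FS = 12500   # sampling frequency used in the paper (Hz)
FRAME = 250  # 20 ms frame length in samples (assumed; not given in the paper)
HOP = 100    # 8 ms hop between frames (assumed)

def hamming(N):
    # Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def windowed_frames(signal):
    """Cut the signal into overlapping frames and apply the Hamming window."""
    w = hamming(FRAME)
    return [[signal[i + n] * w[n] for n in range(FRAME)]
            for i in range(0, len(signal) - FRAME + 1, HOP)]
```

Each windowed frame would then be passed through spectral analysis and the mel filter bank described next.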
B. Mel Frequency Wrapping
The speech signal consists of tones with different
frequencies. For each tone with an actual frequency f,
measured in Hz, a subjective pitch is measured on the mel
scale. The mel-frequency scale has linear frequency spacing
below 1000 Hz and logarithmic spacing above 1000 Hz. As
a reference point, the pitch of a 1 kHz tone, 40 dB above the
perceptual hearing threshold, is defined as 1000 mels.
The following formula can be used to compute the mels for
a given frequency f, in Hz [8]:

mel(f) = 2595 * log10(1 + f/700) ............ (1)
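Equation (1) can be checked numerically with a small sketch; the inverse function is not given in the paper but follows algebraically:

```python
import math

def hz_to_mel(f):
    """Eq. (1): subjective pitch in mels for a frequency f in Hz."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # algebraic inverse of eq. (1)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Consistent with the reference point above, hz_to_mel(1000) evaluates to approximately 1000 mels.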

Fig 2. An example of a 2-dimensional VQ

B. LBG Design Algorithm

The LBG VQ design algorithm, proposed by Y. Linde, A.
Buzo, and R. Gray, is an iterative algorithm that alternately
solves the optimality criteria [10].

Fig 1. Block diagram of MFCC processor

C. Cepstrum
The log mel spectrum obtained from equation (1) has to be
converted back to the time domain. The result is called the
mel frequency cepstrum coefficients (MFCCs). The
conversion is done using the Discrete Cosine Transform
(DCT). The MFCCs may be calculated using the following
equation [6,8]:

Cn = Σ(k=1 to K) (log Sk) * cos[ n*(k - 1/2)*π/K ] ......(2)

where n = 1, 2, ..., K and Sk denotes the output of the k-th
mel filter.
The number of mel cepstrum coefficients, K, is typically
chosen as 16. The first component, C0, is excluded from the
DCT since it represents the mean value of the input signal,
which carries little speaker-specific information [1,3].
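Equation (2) can be sketched directly, assuming the log mel filter-bank outputs (log Sk) are already available as a list:

```python
import math

def mel_cepstrum(log_S):
    """Eq. (2): C_n = sum over k of (log S_k) * cos(n*(k - 1/2)*pi/K), n = 1..K."""
    K = len(log_S)
    return [sum(log_S[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
                for k in range(1, K + 1))
            for n in range(1, K + 1)]
```

Note that log_S holds log Sk values, so the logarithm is applied before calling. A constant log spectrum yields coefficients that are all numerically zero, reflecting that the mean of the signal is carried by the excluded C0 term.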
IV. FEATURE MATCHING
Feature matching techniques used in speaker recognition
include Dynamic Time Warping (DTW), Hidden Markov
Modeling (HMM), and Vector Quantization (VQ). The VQ
approach has been used here for its ease of implementation
and high accuracy.
A. Vector Quantization
Vector quantization (VQ) is a lossy data compression
method based on the principle of block coding [9]. It is a
fixed-to-fixed length algorithm. VQ may be thought of as an
approximator. Fig. 2 shows an example of a two-dimensional
VQ. Here, every pair of numbers falling in a particular
region is approximated by the star associated with that
region. In Fig. 2, the stars are called codevectors and the
regions defined by the borders are called encoding regions.
The set of all codevectors is called the codebook.
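This nearest-star mapping can be sketched with a small hypothetical two-dimensional codebook (the codevectors below are made up for illustration):

```python
import math

# hypothetical codebook of four codevectors ("stars")
CODEBOOK = [(0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)]

def quantize(point):
    """Approximate a 2-D point by the codevector of its encoding region."""
    return min(CODEBOOK, key=lambda c: math.dist(point, c))
```

Any pair of numbers closer to (0, 0) than to the other three stars is approximated by (0, 0), and likewise for the other encoding regions.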

Fig 3. Flowchart of VQ-LBG algorithm

The algorithm requires an initial codebook, which is
obtained by the splitting method. In this method, an initial
codevector is set as the average of the entire training
sequence. This codevector is then split into two. The
iterative algorithm is run with these two vectors as the
initial codebook. The final two codevectors are split into
four, and the process is repeated until the desired number of
codevectors is obtained. The algorithm is summarized in the
flowchart of Fig. 3. In Fig. 4, the VQ is shown for two
speakers: the circles refer to speaker 1 and the triangles to
speaker 2 [5]. In the training phase, a speaker-specific VQ
codebook is generated for each known speaker. Fig. 5 shows
the use of different numbers of centroids for the same data
field. After the MFCCs and the VQ codebooks are computed,
the Euclidean distance is calculated to find the nearest
speech match.
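The splitting procedure above can be sketched as follows. This is an illustrative reduction of the LBG algorithm [10], not the authors' MATLAB code; the split factor and iteration count are assumptions:

```python
import math

def lbg(training, n_codevectors, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting and centroid updates (LBG sketch)."""
    dim = len(training[0])
    # initial codevector: average of the entire training sequence
    codebook = [[sum(v[d] for v in training) / len(training) for d in range(dim)]]
    while len(codebook) < n_codevectors:
        # split every codevector into two slightly perturbed copies
        codebook = [[c * (1 + s * eps) for c in cv] for cv in codebook for s in (1, -1)]
        for _ in range(iters):
            # assign each training vector to its nearest codevector
            cells = [[] for _ in codebook]
            for v in training:
                i = min(range(len(codebook)), key=lambda j: math.dist(v, codebook[j]))
                cells[i].append(v)
            # move each codevector to the centroid of its cell (keep it if the cell is empty)
            codebook = [[sum(v[d] for v in cell) / len(cell) for d in range(dim)] if cell else cv
                        for cv, cell in zip(codebook, cells)]
    return codebook
```

Run on two well-separated clusters of training vectors, the sketch converges to one codevector near each cluster centroid.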

Audio signal → MFCC → Vector Quantization → Euclidean Distance → Recognized speaker output

Fig 6. Basic speech recognition block diagram
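The final matching stage of this pipeline, assuming per-speaker codebooks have already been trained, can be sketched as:

```python
import math

def avg_distortion(mfcc_vectors, codebook):
    """Average Euclidean distance from each MFCC vector to its nearest codevector."""
    return sum(min(math.dist(v, c) for c in codebook) for v in mfcc_vectors) / len(mfcc_vectors)

def recognize(mfcc_vectors, codebooks):
    """Return the speaker ID whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda sid: avg_distortion(mfcc_vectors, codebooks[sid]))
```

The speaker IDs and codebook shapes here are placeholders; in the paper, the codebooks come from the VQ-LBG training phase.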

Fig. 7 below shows the complete experimental setup for the
DC motor drive through speech recognition; PC denotes the
personal computer.

Fig 4. Conceptual diagram that explains the VQ process

Microphone → PC → Serial port interface → Microcontroller → Driver unit → DC motor

Fig 7. Complete experimental setup
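On the PC side of this setup, a recognized command must be sent to the 8051 over the serial link. The paper does not specify the byte-level protocol, so the encoding below (direction in the low bits, speed level in the high bits) is purely a hypothetical illustration:

```python
# Hypothetical one-byte command protocol; the actual firmware protocol
# used with the 8051 is not given in the paper.
COMMANDS = {"forward": 0x01, "reverse": 0x02, "stop": 0x00}

def encode_command(word, speed=0):
    """Pack a recognized word and a speed level (0-7) into one command byte."""
    if word not in COMMANDS:
        raise ValueError("unrecognized command: %r" % word)
    if not 0 <= speed <= 7:
        raise ValueError("speed must be in 0..7")
    return COMMANDS[word] | (speed << 4)

# The byte would then be written to the RS-232 port, e.g. with the pyserial
# package: serial.Serial("COM1", 9600).write(bytes([encode_command("forward", 3)]))
```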

Fig 5. Pictorial view of a codebook with 15 centroids

V. EXPERIMENTAL SETUP
In the experimental setup of the DC motor drive through
speech recognition, the speech signal is taken by a
microphone connected to the computer. On the software
side, MATLAB 7.5 is used to calculate the MFCCs and the
VQ codebooks (LBG algorithm) and to recognize the input
speech taken from the microphone. On the hardware side, a
microcontroller (8051), programmed in Embedded C, makes
the DC motor respond to the recognized commands. The
interfacing between the computer and the microcontroller is
done over RS-232. The driver IC L293D is used to drive the
DC motor. Fig. 6 shows the basic speech recognition block
diagram.

VI. RESULTS

The code has been developed using the MFCC and VQ
algorithms, in MATLAB 7.5 on the Windows Vista
platform, and the supporting hardware has also been
implemented. As an example, the speech database consists
of 10 speech samples; Fig. 8 shows a speech sample being
added to the database, and Fig. 9 shows the recognized
speaker ID 2.
The interfacing between hardware and software is done
using an RS-232 cable (MAX-232 IC). The external DC
motor can be driven in the forward or reverse direction, and
can also be stopped, by giving speech commands. Several
graphs were obtained while calculating the MFCCs for the
database during speech recognition; these are shown below.
Fig. 10 shows the graph of the speaker voice database
(example), Fig. 11 shows the Hamming window, Fig. 12
shows the Hamming window multiplied by the input signal,
and Fig. 13 shows the application of the filter bank.

Fig 8. Sound added to database in MATLAB

Fig 9. Recognized speaker ID 2

Fig 10. Graph of speaker voice database

Fig 11. Graph for hamming window

Fig 12. Hamming window multiplied by input signal

Fig 13. Applied filter banks

VII. CONCLUSION
In this paper, MFCC and VQ techniques are used in speech
recognition to control the DC motor drive. The code
developed in MATLAB using MFCC and VQ can also be
used to control and drive stepper motors, servo motors, etc.
The developed speech algorithm can be used for navigation
purposes, to drive electric vehicles, and in security areas
(such as banking, unmanned vehicles, and remote access to
computers, where speech can be used as a password).
VIII. REFERENCES
[1] Wei Han, Cheong-Fat Chan, Chiu-Sing Choy, and Kong-Pang Pun, "An
Efficient MFCC Extraction Method in Speech Recognition," Department of
Electronic Engineering, The Chinese University of Hong Kong, IEEE, 2006.
[2] Wang Chen and Miao Zhenjiang, "Differential MFCC and Vector
Quantization Used for Real-Time Speaker Recognition System," 2008 IEEE
Congress on Image and Signal Processing, Institute of Information Science,
Beijing Jiaotong University, Beijing 100044, China.
[3] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, and Md.
Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral
Coefficients," Electrical and Electronic Engineering, Bangladesh University
of Engineering and Technology, 3rd International Conference on Electrical
& Computer Engineering (ICECE 2004), 28-30 December 2004, Dhaka,
Bangladesh.
[4] Lawrence Rabiner and Biing-Hwang Juang, Fundamental of Speech
Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
[5] Zhong-Xuan, Yuan & Bo-Ling, Xu & Chong-Zhi, Yu. (1999). Binary
Quantization of Feature Vectors for Robust Text-Independent Speaker
Identification in IEEE Transactions on Speech and Audio Processing, Vol. 7,
No. 1, January 1999. IEEE, New York, NY, U.S.A.
[6] F. Soong, E. Rosenberg, B. Juang, and L. Rabiner, "A Vector
Quantization Approach to Speaker Recognition", AT&T Technical Journal,
vol. 66, March/April 1987, pp. 14-26.
[7] Comp.speech Frequently Asked Questions WWW site,
http://svr-www.eng.cam.ac.uk/comp.speech/
[8] J. R. Deller Jr., J. H. L. Hansen, and J. G. Proakis, Discrete-Time
Processing of Speech Signals, second ed. IEEE Press, New York, 2000.
[9] R. M. Gray, "Vector Quantization," IEEE ASSP Magazine, pp. 4-29,
April 1984.

[10] Y. Linde, A. Buzo, and R. Gray, "An algorithm for vector quantizer
design," IEEE Transactions on Communications, Vol. 28, pp. 84-95, 1980.
[11] Minghai Yao, Jing Hu, and Qinlong Gu, "A Mixed Parameter Method
Based on MFCC and Fractal Dimension for Speech Recognition," College of
Information Engineering, Zhejiang University of Technology, Hangzhou,
310032, China, Proceedings of the 2006 IEEE International Conference on
Information Acquisition, August 20-23, 2006, Weihai, Shandong, China.

IX. BIOGRAPHIES
Punit Kumar Sharma was born in 1985 in Rajasthan,
India. He received his B.E. (Electrical Engineering)
in 2007 from the University of Rajasthan and is
pursuing his M.Tech in Power Electronics from
Visvesvaraya Technological University, Belgaum
(Karnataka). His areas of interest are speech
recognition, artificial intelligence, embedded
systems, electrical drives, etc.

Dr. B.R. Lakshmikantha obtained his B.E.
(Electrical Engineering) in 1979 from Bangalore
University, M.E. (Power Systems) in 1981 from
Visvesvaraya Technological University, and Ph.D. in
Power System Stability from Visvesvaraya
Technological University. He is working as Dean of
Academics and HOD of the EEE Dept. at Dayananda
Sagar College of Engineering, Bangalore-78. His
areas of interest are FACTS devices.

K. Shanmukha Sundar obtained his B.E.
(Electrical Engineering) degree in 1990 and his
M.Tech. (Power Systems) in 1994 from Mysore and
Mangalore Universities, respectively. Presently he is
working as an associate professor in the Department
of Electrical & Electronics Engineering, Dayananda
Sagar College of Engineering, Bangalore-78. His
areas of interest are power system optimization,
FACTS controllers, electrical drives, etc.
