
CERTIFICATE

This is to certify that the work titled SPEECH ANALYSIS, submitted by ADITYA KHER in
partial fulfillment of the requirements for the award of the degree of B.Tech of Jaypee Institute
of Information Technology, Noida, has been carried out under my supervision. This work has not
been submitted partially or wholly to any other University or Institute for the award of this or
any other degree or diploma.

Signature of Supervisor: ..................

Name of Supervisor: ..................

Designation: ..................

Date: ..................

ACKNOWLEDGEMENT

Words are inadequate to express my overwhelming sense of gratitude and humble regards to my
supervisor, Mr. Manish Kumar, Department of Electronics and Communication Engineering, for
his constant motivation, support, expert guidance, supervision, and constructive suggestions
during the preparation of this progress report on Speech Recognition.

I express my gratitude to Mr. Manish Kumar, Electronics and Communication Engineering for
his invaluable suggestions and constant encouragement all through the work. I also thank all the
teaching and non-teaching staff for their nice cooperation to the students.

This report would have been impossible if not for the perpetual moral support from my family
members, and friends. I would like to thank them all.

INTRODUCTION
First of all, why do we need a transform, or what is a transform anyway?

Mathematical transformations are applied to signals to obtain further information from the
signal that is not readily available in the raw signal. In the following discussion, we will assume
a time-domain signal to be a raw signal, and a signal that has been "transformed" by any of the
available mathematical transformations to be a processed signal.

There are a number of transformations that can be applied, among which the Fourier transforms
are probably by far the most popular.

Most signals in practice are TIME-DOMAIN signals in their raw format. That is, whatever the
signal is measuring is a function of time. In other words, when we plot the signal, one of the
axes is time (the independent variable), and the other (the dependent variable) is usually the
amplitude. When we plot time-domain signals, we obtain a time-amplitude representation of the
signal. This representation is not always the best representation of the signal for most signal
processing applications. In many cases, the most distinguishing information is hidden in the
frequency content of the signal. The frequency SPECTRUM of a signal is basically the
frequency (spectral) components of that signal. The frequency spectrum of a signal shows what
frequencies exist in the signal.

Intuitively, we all know that frequency has something to do with the rate of change of
something. If something (a mathematical or physical variable would be the technically correct
term) changes rapidly, we say that it is of high frequency, whereas if this variable does not
change rapidly, i.e., it changes smoothly, we say that it is of low frequency. If this variable does
not change at all, then we say it has zero frequency, or no frequency. For example, the
publication frequency of a daily newspaper is higher than that of a monthly magazine (it is
published more frequently).

Frequency is measured in cycles per second, or with a more common name, in "Hertz" (Hz). For
example, the electric power we use in our daily life in the US is at 60 Hz (50 Hz elsewhere in
the world). This means that if you plot the electric current, it will be a sine wave passing
through the same point 60 times (or 50 times, respectively) in one second. Now, look at the
following figures. The first one is a sine wave at 3 Hz, the second one at 10 Hz, and the third
one at 50 Hz. Compare them.

THE FOURIER TRANSFORM

So how do we measure frequency, or how do we find the frequency content of a signal? The
answer is the FOURIER TRANSFORM (FT).

If the FT of a time-domain signal is taken, the frequency-amplitude representation of that
signal is obtained. In other words, we now have a plot with one axis being the frequency and the
other being the amplitude. This plot tells us how much of each frequency exists in our signal.

The frequency axis starts from zero and goes up to infinity. For every frequency, we have an
amplitude value. For example, if we take the FT of a 50 Hz mains current, we will have one
spike at 50 Hz and nothing elsewhere, since that signal has only a 50 Hz frequency component.
No other signal, however, has an FT which is this simple. For most practical purposes, signals
contain more than one frequency component. The following shows the FT of the 50 Hz signal:

Figure 1.4 and 1.5: Fourier transform of 50 Hz signal

One word of caution is in order at this point. Note that two plots are given in Figures 1.4 and
1.5. The bottom one plots only the first half of the top one. Due to reasons that are not crucial to know at
this time, the frequency spectrum of a real valued signal is always symmetric. The top plot
illustrates this point. However, since the symmetric part is exactly a mirror image of the first
part, it provides no additional information, and therefore, this symmetric second part is usually
not shown. In most of the following figures corresponding to FT, I will only show the first half
of this symmetric spectrum.
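As a quick numerical illustration (a NumPy sketch; the 1 kHz sampling rate and one-second duration are arbitrary choices, not taken from the text), we can compute the FT of a 50 Hz sine and confirm both the single spike at 50 Hz and the mirror symmetry of the second half of the spectrum:

```python
import numpy as np

fs = 1000                          # sampling rate in Hz (an arbitrary choice)
t = np.arange(fs) / fs             # one second of signal
x = np.sin(2 * np.pi * 50 * t)     # the 50 Hz sine wave

X = np.abs(np.fft.fft(x)) / len(x)         # amplitude spectrum
freqs = np.arange(len(x)) * fs / len(x)    # frequency of each FFT bin

half = len(x) // 2                         # keep the first, non-redundant half
peak_freq = freqs[:half][np.argmax(X[:half])]
print(peak_freq)                           # 50.0: a single spike at 50 Hz
# The second half mirrors the first: X[50] and X[1000 - 50] are equal.
```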

WHY DO WE NEED FREQUENCY INFORMATION?


Oftentimes, the information that cannot be readily seen in the time domain can be seen in the
frequency domain.

Let's give an example from biological signals. Suppose we are looking at an ECG signal
(electrocardiogram, a graphical recording of the heart's electrical activity). The typical shape of
a healthy ECG signal is well known to cardiologists. Any significant deviation from that shape
is usually considered to be a symptom of a pathological condition.
This pathological condition, however, may not always be quite obvious in the original
time-domain signal. Cardiologists usually use the time-domain ECG signals, recorded on
strip charts, to analyze ECG signals. Recently, the new computerized ECG
recorders/analyzers also utilize the frequency information to decide whether a pathological
condition exists. A pathological condition can sometimes be diagnosed more easily when the
frequency content of the signal is analyzed.

This, of course, is only one simple example of why frequency content might be useful.
Today Fourier transforms are used in many different areas, including all branches of engineering.

Although the FT is probably the most popular transform in use (especially in electrical
engineering), it is not the only one. There are many other transforms that are used quite often by
engineers and mathematicians. The Hilbert transform, the short-time Fourier transform (more about this
later), Wigner distributions, the Radon transform, and of course our featured transformation,
the wavelet transform, constitute only a small portion of the huge list of transforms that are
available at the engineer's and mathematician's disposal. Every transformation technique has its own
area of application, with advantages and disadvantages, and the wavelet transform (WT) is no
exception.

For a better understanding of the need for the WT, let's look at the FT more closely. The FT
(as well as the WT) is a reversible transform; that is, it allows us to go back and forth between the
raw and processed (transformed) signals. However, only one of them is available at any given
time. That is, no frequency information is available in the time-domain signal, and no time
information is available in the Fourier-transformed signal. The natural question that comes to
mind is: is it necessary to have both the time and the frequency information at the same time?

As we will see soon, the answer depends on the particular application and the nature of
the signal at hand. Recall that the FT gives the frequency information of the signal, which means
that it tells us how much of each frequency exists in the signal, but it does not tell us when in
time these frequency components exist. This information is not required when the signal is
so-called stationary.

Let's take a closer look at this stationarity concept, since it is of paramount
importance in signal analysis. Signals whose frequency content does not change in time are
called stationary signals. In this case, one does not need to know at what times the frequency
components exist, since all frequency components exist at all times!

For example, the signal

x(t) = cos(2π·10t) + cos(2π·25t) + cos(2π·50t) + cos(2π·100t)

is a stationary signal, because it has frequencies of 10, 25, 50, and 100 Hz at any given
time instant. This signal is plotted below:

Figure 1.6: Stationary Signal


And the following is its FT:

Figure 1.7: Fourier Transform of Stationary Signal

The top plot in Figure 1.7 is the (half of the symmetric) frequency spectrum of the signal
in Figure 1.6. The bottom plot is a zoomed version of the top plot, showing only the range of
frequencies that are of interest to us. Note the four spectral components corresponding to the
frequencies 10, 25, 50 and 100 Hz.
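The stationary signal and its spectrum can be sketched numerically (again assuming a 1 kHz sampling rate over one second, an illustrative choice); the four largest spectral lines land exactly at 10, 25, 50 and 100 Hz:

```python
import numpy as np

fs = 1000                          # assumed sampling rate, as before
t = np.arange(fs) / fs
# All four components are present at every instant: a stationary signal.
x = sum(np.cos(2 * np.pi * f * t) for f in (10, 25, 50, 100))

X = np.abs(np.fft.fft(x))[: fs // 2]       # first half of the spectrum
# With this choice of fs and duration, bin index == frequency in Hz.
peak_bins = sorted(np.argsort(X)[-4:].tolist())   # four largest spectral lines
print(peak_bins)                                  # [10, 25, 50, 100]
```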
Contrary to the signal in Figure 1.6, the following signal is not stationary. Figure 1.8 plots a
signal whose frequency constantly changes in time. This signal is known as the "chirp" signal.
This is a non-stationary signal.

Figure 1.8: Non Stationary Signal

Let's look at another example. Figure 1.9 plots a signal with four different frequency components
at four different time intervals, hence a non-stationary signal. The interval 0 to 300 ms has a 100
Hz sinusoid, the interval 300 to 600 ms has a 50 Hz sinusoid, the interval 600 to 800 ms has a 25
Hz sinusoid, and finally the interval 800 to 1000 ms has a 10 Hz sinusoid.


Figure 1.9: Signal with 4 different frequency components

And the following is its FT:


Figure 1.10: Fourier Transform of signal in fig 1.9


Do not worry about the little ripples at this time; they are due to the sudden changes from one
frequency component to another, which have no significance in this text. Note that the
amplitudes of the higher frequency components are larger than those of the lower frequency
ones. This is due to the fact that the higher frequencies last longer (300 ms each) than the lower
frequency components (200 ms each). (The exact values of the amplitudes are not important.)
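The four-interval signal of Figure 1.9 can be reconstructed numerically (assuming unit amplitudes and a 1 kHz sampling rate, both illustrative choices). Aside from the ripples, its spectrum is concentrated at the same four frequencies as the stationary signal, with nothing in between:

```python
import numpy as np

fs = 1000                                     # assumed rate: 1 sample per ms
t = np.arange(fs) / fs
x = np.zeros_like(t)
# 0-300 ms: 100 Hz, 300-600 ms: 50 Hz, 600-800 ms: 25 Hz, 800-1000 ms: 10 Hz
for (lo, hi), f in zip([(0, 300), (300, 600), (600, 800), (800, 1000)],
                       (100, 50, 25, 10)):
    x[lo:hi] = np.sin(2 * np.pi * f * t[lo:hi])

X = np.abs(np.fft.fft(x))[: fs // 2]          # bin index == frequency in Hz here

# Strong spectral lines sit at 10, 25, 50 and 100 Hz; in between (e.g. at
# 75 Hz or 150 Hz) there are only small ripples.  Nothing in this spectrum
# says *when* each component was present.
print([round(float(X[k])) for k in (10, 25, 50, 100)])
print([round(float(X[k])) for k in (75, 150)])
```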

Other than those ripples, everything seems to be right. The FT has four peaks,
corresponding to four frequencies with reasonable amplitudes... Right?

WRONG (!)

Well, not exactly wrong, but not exactly right either...


Here is why:

For the first signal, plotted in Figure 1.6, consider the following question:

At what times (or time intervals), do these frequency components occur?


Answer:
At all times! Remember that in stationary signals, all frequency components that exist in
the signal, exist throughout the entire duration of the signal. There is 10 Hz at all times, there is
50 Hz at all times, and there is 100 Hz at all times.

Now, consider the same question for the non-stationary signal in Figure 1.8 or in Figure
1.9.

At what times do these frequency components occur?


For the signal in Figure 1.9, we know that in the first interval we have the highest frequency
component, and in the last interval we have the lowest frequency component. For the signal in
Figure 1.8, the frequency components change continuously. Therefore, for these signals the
frequency components do not appear at all times!

Now, compare Figures 1.7 and 1.10. The similarity between these two spectra should
be apparent. Both of them show four spectral components at exactly the same frequencies, i.e., at
10, 25, 50, and 100 Hz. Other than the ripples and the difference in amplitude (which can always
be normalized), the two spectra are almost identical, although the corresponding time-domain
signals are not even close to each other. Both of the signals involve the same frequency
components, but the first one has these frequencies at all times, while the second one has them
at different intervals. So, how come the spectra of two entirely different signals look so much
alike? Recall that the FT gives the spectral content of the signal, but it gives no information
regarding where in time those spectral components appear. Therefore, the FT is not a suitable
technique for non-stationary signals, with one exception:

The FT can be used for non-stationary signals if we are only interested in what spectral
components exist in the signal, but not in where these occur. However, if this information is
needed, i.e., if we want to know what spectral components occur at what time (interval), then
the Fourier transform is not the right transform to use.

For practical purposes it is difficult to make this separation, since there are plenty of
practical stationary signals as well as non-stationary ones. Almost all biological signals, for
example, are non-stationary. Some of the most famous ones are the ECG (electrical activity of
the heart, electrocardiogram), EEG (electrical activity of the brain, electroencephalogram), and
EMG (electrical activity of the muscles, electromyogram).

Once again, please note that the FT gives only what frequency components (spectral
components) exist in the signal. Nothing more, nothing less.
When the time localization of the spectral components is needed, a transform giving the
TIME-FREQUENCY REPRESENTATION of the signal is required.

THE ULTIMATE SOLUTION: THE WAVELET TRANSFORM

The wavelet transform is a transform of this type. It provides the time-frequency
representation. (There are other transforms which give this information too, such as the short-time
Fourier transform, Wigner distributions, etc.)

Oftentimes a particular spectral component occurring at a particular instant can be of special
interest. In these cases it may be very beneficial to know the time intervals in which these
particular spectral components occur. For example, in EEGs, the latency of an event-related
potential is of particular interest. (An event-related potential is the response of the brain to a
specific stimulus such as a flash of light; the latency of this response is the amount of time
elapsed between the onset of the stimulus and the response.)

The wavelet transform is capable of providing the time and frequency information
simultaneously, hence giving a time-frequency representation of the signal.

To make a long story short, we pass the time-domain signal through various highpass
and lowpass filters, which filter out either the high-frequency or the low-frequency portions of
the signal. This procedure is repeated, each time removing some portion of the signal
corresponding to some frequencies.

Here is how this works: Suppose we have a signal which has frequencies up to 1000 Hz.
In the first stage we split the signal into two parts by passing it through a highpass and a
lowpass filter (the filters should satisfy certain conditions, the so-called admissibility condition),
which results in two different versions of the same signal: the portion of the signal
corresponding to 0-500 Hz (the lowpass portion), and the portion corresponding to 500-1000 Hz
(the highpass portion).

Then, we take either portion (usually the lowpass portion) or both, and do the same thing
again. This operation is called decomposition.

Assuming that we have taken the lowpass portion, we now have 3 sets of data, each
corresponding to the same signal at frequencies 0-250 Hz, 250-500 Hz, and 500-1000 Hz.

Then we take the lowpass portion again and pass it through lowpass and highpass filters; we
now have 4 sets of signals corresponding to 0-125 Hz, 125-250 Hz, 250-500 Hz, and 500-1000
Hz. We continue like this until we have decomposed the signal down to a certain pre-defined
level. We then have a bunch of signals which actually represent the same signal, but each
corresponding to a different frequency band. We know which signal corresponds to which
frequency band, and if we put all of them together and plot them on a 3-D graph, we will have
time on one axis, frequency on the second, and amplitude on the third. This will show us which
frequencies exist at which times. (There is an issue, called the "uncertainty principle", which
states that we cannot exactly know what frequency exists at what time instant; we can only
know what frequency bands exist at what time intervals.)
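The repeated two-band splitting described above can be sketched with the simplest possible filters (Haar averaging/differencing, an illustrative assumption; practical decompositions use longer filters satisfying the admissibility condition). Assuming a 2000 Hz sampling rate so that the signal content reaches up to 1000 Hz, three splits of the lowpass branch reproduce the bands from the text:

```python
import numpy as np

def haar_split(x):
    """One decomposition stage: lowpass and highpass halves, downsampled by 2."""
    x = x[: len(x) // 2 * 2]                  # ensure even length
    low = (x[0::2] + x[1::2]) / np.sqrt(2)    # scaled average -> lower half-band
    high = (x[0::2] - x[1::2]) / np.sqrt(2)   # scaled difference -> upper half-band
    return low, high

fs = 2000                                     # content up to fs/2 = 1000 Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 40 * t)                # a low-frequency test signal

bands = {}
approx = x
lo_edge, hi_edge = 0, fs / 2                  # current band held by `approx`
for _ in range(3):                            # decompose the lowpass part 3 times
    approx, detail = haar_split(approx)
    mid = (lo_edge + hi_edge) / 2
    bands[f"{mid:.0f}-{hi_edge:.0f} Hz"] = detail
    hi_edge = mid
bands[f"{lo_edge:.0f}-{hi_edge:.0f} Hz"] = approx

print(list(bands))   # ['500-1000 Hz', '250-500 Hz', '125-250 Hz', '0-125 Hz']
```

Since the test signal is a 40 Hz sine, almost all of its energy should end up in the final 0-125 Hz approximation, with only small amounts leaking into the detail bands.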

However, I would still like to explain it briefly:

The uncertainty principle, originally formulated by Heisenberg, states that the
momentum and the position of a moving particle cannot be known simultaneously. This applies
to our subject as follows: the frequency and time information of a signal at some certain point in
the time-frequency plane cannot be known exactly. In other words, we cannot know what
spectral component exists at any given time instant; the best we can do is to investigate what
spectral components exist at any given interval of time.


WAVELET APPLICATIONS

Wavelets have scale aspects and time aspects; consequently, every application has scale and
time aspects. To clarify them, we try to untangle the two aspects, somewhat arbitrarily.
For scale aspects, we present one idea around the notion of local regularity. For time aspects,
we present a list of domains. When the decomposition is taken as a whole, the de-noising and
compression processes are the central points.

Scale Aspects:
As a complement to the spectral signal analysis, new signal forms appear. They are less regular
signals than the usual ones.
The cusp signal presents a very quick local variation. Its equation is

f(t) = |t|^r

with t close to 0 and 0 < r < 1. The lower r is, the sharper the signal.


To illustrate this notion physically, imagine you take a piece of aluminum foil; the surface is
very smooth, very regular. You first crush it into a ball, and then you spread it out so that it
looks like a surface again. The asperities are clearly visible; each one represents a
two-dimensional analog of the one-dimensional cusp. If you crush the foil again, more tightly,
into a more compact ball, then when you spread it out, the roughness increases and the
regularity decreases.
Several domains use the wavelet techniques of regularity study:

Biology, for cell membrane recognition, to distinguish normal from pathological
membranes

Metallurgy, for the characterization of rough surfaces

Finance (more surprisingly), for detecting the properties of quick variation of
values

Internet traffic description, for designing the size of services


Time Aspects:
Let's switch to time aspects. The main goals are:

Rupture and edge detection

Study of short-time phenomena such as transient processes

As domain applications, we get:

Industrial supervision of gear wheels

Checking undue noises in cracked or dented wheels, and more generally in nondestructive
quality-control processes

Detection of short pathological events such as epileptic crises, or normal ones such as evoked
potentials in EEG (medicine)

SAR imagery

Automatic target recognition

Intermittence in physics


Wavelet Decomposition

Many applications use the wavelet decomposition taken as a whole. The common goals concern
the signal or image clearance and simplification, which are parts of de-noising or compression.
We find many published papers in oceanography and earth studies.
One of the most popular successes of the wavelets is the compression of FBI fingerprints.
When trying to classify the applications by domain, it is almost impossible to sum up several
thousand papers written within the last 15 years. Moreover, it is difficult to get information on
real-world industrial applications from companies. They understandably protect their own
information.
Some domains are very productive. Medicine is one of them. We can find studies on
micro-potential extraction in EKGs, on time localization of His bundle electrical heart activity,
and on ECG noise removal. In EEGs, a quick transitory signal is drowned in the usual one.
Wavelets are able to determine whether such a quick signal exists, and if so, can localize it.
There are attempts to enhance mammograms to discriminate tumors from calcifications.
Another prototypical application is a classification of Magnetic Resonance Spectra. The study
concerns the influence of the fat we eat on our body fat. The type of feeding is the basic
information and the study is intended to avoid taking a sample of the body fat. Each Fourier
spectrum is encoded by some of its wavelet coefficients. A few of them are enough to code the
most interesting features of the spectrum. The classification is performed on the coded vectors.


Fourier Analysis:
Signal analysts already have at their disposal an impressive arsenal of tools. Perhaps the most
well known of these is Fourier analysis, which breaks down a signal into constituent sinusoids of
different frequencies. Another way to think of Fourier analysis is as a mathematical technique for
transforming our view of the signal from time-based to frequency-based.

For many signals, Fourier analysis is extremely useful because the signal's frequency content is
of great importance. So why do we need other techniques, like wavelet analysis?
Fourier analysis has a serious drawback. In transforming to the frequency domain, time
information is lost. When looking at a Fourier transform of a signal, it is impossible to tell when
a particular event took place.
If the signal properties do not change much over time -- that is, if it is what is called a stationary
signal -- this drawback isn't very important. However, most interesting signals contain numerous
nonstationary or transitory characteristics: drift, trends, abrupt changes, and beginnings and ends
of events. These characteristics are often the most important part of the signal, and Fourier
analysis is not suited to detecting them.


Short Time Fourier Analysis:

In an effort to correct this deficiency, Dennis Gabor (1946) adapted the Fourier transform to
analyze only a small section of the signal at a time -- a technique called windowing the signal.
Gabor's adaptation, called the Short-Time Fourier Transform (STFT), maps a signal into a
two-dimensional function of time and frequency.

The STFT represents a sort of compromise between the time- and frequency-based views of a
signal. It provides some information about both when and at what frequencies a signal event
occurs. However, you can only obtain this information with limited precision, and that precision
is determined by the size of the window.
While the STFT compromise between time and frequency information can be useful, the
drawback is that once you choose a particular size for the time window, that window is the same
for all frequencies. Many signals require a more flexible approach -- one where we can vary the
window size to determine more accurately either time or frequency.
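A minimal windowed-FFT sketch (the 100-sample window and Hanning taper are arbitrary choices) shows the coarse time localization the STFT recovers. Applied to the four-interval signal from the introduction, the dominant frequency per window tracks the change from 100 Hz down to 10 Hz:

```python
import numpy as np

fs = 1000
t = np.arange(fs) / fs
x = np.zeros_like(t)
# The four-interval non-stationary signal from the introduction.
for (lo, hi), f in zip([(0, 300), (300, 600), (600, 800), (800, 1000)],
                       (100, 50, 25, 10)):
    x[lo:hi] = np.sin(2 * np.pi * f * t[lo:hi])

win = 100                       # one fixed window length: the STFT's compromise
dominant = []
for start in range(0, len(x), win):
    seg = x[start:start + win] * np.hanning(win)      # windowed section
    spec = np.abs(np.fft.rfft(seg))
    dominant.append(float(np.argmax(spec) * fs / win))  # resolution: fs/win = 10 Hz

# Early windows are dominated by ~100 Hz, late windows by ~10 Hz.
print(dominant)
```

Note the fixed trade-off: with a 100 ms window, time is known only to within 100 ms and frequency only to within 10 Hz, and both resolutions are the same at every frequency.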


Wavelet Analysis:

Wavelet analysis represents the next logical step: a windowing technique with variable-sized
regions. Wavelet analysis allows the use of long time intervals where we want more precise
low-frequency information, and shorter regions where we want high-frequency information.

Here's what this looks like in contrast with the time-based, frequency-based, and STFT views of
a signal:

You may have noticed that wavelet analysis does not use a time-frequency region, but rather a
time-scale region.


What Can Wavelet Analysis Do?


One major advantage afforded by wavelets is the ability to perform local analysis -- that is, to
analyze a localized area of a larger signal.
Consider a sinusoidal signal with a small discontinuity -- one so tiny as to be barely visible.
Such a signal could easily be generated in the real world, perhaps by a power fluctuation or a
noisy switch.

A plot of the Fourier coefficients (as provided by the fft command) of this signal shows nothing
particularly interesting: a flat spectrum with two peaks representing a single frequency.
However, a plot of wavelet coefficients clearly shows the exact location in time of the
discontinuity.

Wavelet analysis is capable of revealing aspects of data that other signal analysis techniques
miss: aspects like trends, breakdown points, discontinuities in higher derivatives, and
self-similarity. Furthermore, because it affords a different view of data than that presented by
traditional techniques, wavelet analysis can often compress or de-noise a signal without
appreciable degradation.
Indeed, in their brief history within the signal processing field, wavelets have already proven
themselves to be an indispensable addition to the analyst's collection of tools and continue to
enjoy a burgeoning popularity today.

What is Wavelet Analysis?

Now that we know some situations when wavelet analysis is useful, it is worthwhile asking
"What is wavelet analysis?" and even more fundamentally, "What is a wavelet?"
A wavelet is a waveform of effectively limited duration that has an average value of zero.
Compare wavelets with sine waves, which are the basis of Fourier analysis. Sinusoids do not
have limited duration -- they extend from minus to plus infinity. And where sinusoids are smooth
and predictable, wavelets tend to be irregular and asymmetric.

Fourier analysis consists of breaking up a signal into sine waves of various frequencies.
Similarly, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the
original (or mother) wavelet.
Just looking at pictures of wavelets and sine waves, you can see intuitively that signals with
sharp changes might be better analyzed with an irregular wavelet than with a smooth sinusoid,
just as some foods are better handled with a fork than a spoon.


It also makes sense that local features can be described better with wavelets that have local
extent.

Number of Dimensions:
Thus far, we've discussed only one-dimensional data, which encompasses most ordinary signals.
However, wavelet analysis can be applied to two-dimensional data (images) and, in principle, to
higher dimensional data.
This toolbox uses only one- and two-dimensional analysis techniques.

The Continuous Wavelet Transform:


Mathematically, the process of Fourier analysis is represented by the Fourier transform:

F(ω) = ∫ f(t) e^(-jωt) dt   (integrated over all time)

which is the sum over all time of the signal f(t) multiplied by a complex exponential. (Recall
that a complex exponential can be broken down into real and imaginary sinusoidal
components.) The results of the transform are the Fourier coefficients F(ω), which, when
multiplied by a sinusoid of frequency ω, yield the constituent sinusoidal components of the
original signal.

Graphically, the process looks like


Similarly, the continuous wavelet transform (CWT) is defined as the sum over all time of the
signal multiplied by scaled, shifted versions of the wavelet function ψ:

C(scale, position) = ∫ f(t) ψ(scale, position, t) dt   (integrated over all time)

The results of the CWT are many wavelet coefficients C, which are a function of scale and
position.
Multiplying each coefficient by the appropriately scaled and shifted wavelet yields the
constituent wavelets of the original signal:

Scaling:
We've already alluded to the fact that wavelet analysis produces a time-scale view of a signal,
and now we're talking about scaling and shifting wavelets. What exactly do we mean by scale in
this context?
Scaling a wavelet simply means stretching (or compressing) it.
To go beyond colloquial descriptions such as "stretching," we introduce the scale factor, often
denoted by the letter a. If we're talking about sinusoids, for example, the effect of the scale factor
is very easy to see:


The scale factor works exactly the same with wavelets. The smaller the scale factor, the more
"compressed" the wavelet.

It is clear from the diagrams that, for a sinusoid sin(ωt), the scale factor a is related
(inversely) to the radian frequency ω. Similarly, with wavelet analysis, the scale is related to the
frequency of the signal. We'll return to this topic later.


Shifting:
Shifting a wavelet simply means delaying (or hastening) its onset. Mathematically, delaying a
function f(t) by k is represented by f(t-k):

FIVE EASY STEPS FOR THE CONTINUOUS WAVELET TRANSFORM:


The continuous wavelet transform is the sum over all time of the signal multiplied by scaled,
shifted versions of the wavelet. This process produces wavelet coefficients that are a function of
scale and position.
It's really a very simple process. In fact, here are the five steps of an easy recipe for creating a
CWT:
1. Take a wavelet and compare it to a section at the start of the original signal.
2. Calculate a number, C, that represents how closely correlated the wavelet is with this
section of the signal. The higher C is, the more the similarity. More precisely, if the
signal energy and the wavelet energy are equal to one, C may be interpreted as a
correlation coefficient.
Note that the results will depend on the shape of the wavelet you choose.


3. Shift the wavelet to the right and repeat steps 1 and 2 until you've covered the whole
signal.

4. Scale (stretch) the wavelet and repeat steps 1 through 3.

5. Repeat steps 1 through 4 for all scales.


When you're done, you'll have the coefficients produced at different scales by different sections
of the signal. The coefficients constitute the results of a regression of the original signal
performed on the wavelets.
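The five-step recipe can be written out directly as a naive sketch (the Mexican-hat wavelet, its 64-sample length, and the test signal with a single sharp bump are all illustrative choices, not from the text):

```python
import numpy as np

def ricker(length, scale):
    """Mexican-hat wavelet, sampled at `length` points and stretched by `scale`."""
    t = (np.arange(length) - length / 2) / scale
    return (1 - t**2) * np.exp(-t**2 / 2)

def naive_cwt(x, scales, wavelet_len=64):
    """Steps 1-5: compare, slide along the signal, rescale, repeat."""
    coeffs = np.zeros((len(scales), len(x) - wavelet_len + 1))
    for i, s in enumerate(scales):                    # steps 4-5: one row per scale
        w = ricker(wavelet_len, s)
        w = w / np.sqrt(np.sum(w**2))                 # unit energy: C ~ correlation
        for k in range(coeffs.shape[1]):              # step 3: shift to the right
            coeffs[i, k] = np.sum(x[k:k + wavelet_len] * w)   # step 2: the number C
    return coeffs

# A signal with a sharp bump: large |C| should appear at the bump's position.
x = np.zeros(256)
x[128] = 1.0
C = naive_cwt(x, scales=[2, 4, 8])
print(int(np.argmax(np.abs(C[0]))))   # 96 = 128 - 64 // 2: C peaks at the bump
```

Plotting |C| with position on the x-axis and scale on the y-axis gives exactly the coefficient plot described next.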
How to make sense of all these coefficients? You could make a plot on which the x-axis
represents position along the signal (time), the y-axis represents scale, and the color at each x-y
point represents the magnitude of the wavelet coefficient C. These are the coefficient plots
generated by the graphical tools.

These coefficient plots resemble a bumpy surface viewed from above. If you could look at the
same surface from the side, you might see something like this:

The continuous wavelet transform coefficient plots are precisely the time-scale view of the signal
we referred to earlier. It is a different view of signal data from the time-frequency Fourier view,
but it is not unrelated.


Scale and Frequency:


Notice that the scales in the coefficient plot (shown as y-axis labels) run from 1 to 31. Recall
that the higher scales correspond to the most "stretched" wavelets. The more stretched the
wavelet, the longer the portion of the signal with which it is being compared, and thus the
coarser the signal features being measured by the wavelet coefficients.

Thus, there is a correspondence between wavelet scales and frequency as revealed by wavelet
analysis:

Low scale a => Compressed wavelet => Rapidly changing details => High frequency

High scale a => Stretched wavelet => Slowly changing, coarse features => Low frequency

What is Continuous about the Continuous Wavelet Transform?


Any signal processing performed on a computer using real-world data must be performed on a
discrete signal -- that is, on a signal that has been measured at discrete times. So what exactly is
"continuous" about it?
What's "continuous" about the CWT, and what distinguishes it from the discrete wavelet
transform (to be discussed in the following section), is the set of scales and positions at which it
operates.
Unlike the discrete wavelet transform, the CWT can operate at every scale, from that of the
original signal up to some maximum scale that you determine by trading off your need for
detailed analysis with available computational horsepower.
The CWT is also continuous in terms of shifting: during computation, the analyzing wavelet is
shifted smoothly over the full domain of the analyzed function.


The Discrete Wavelet Transform:

Calculating wavelet coefficients at every possible scale is a fair amount of work, and it generates
an awful lot of data. What if we choose only a subset of scales and positions at which to make
our calculations?
It turns out, rather remarkably, that if we choose scales and positions based on powers of two --
so-called dyadic scales and positions -- then our analysis will be much more efficient and just as
accurate. We obtain such an analysis from the discrete wavelet transform (DWT).
An efficient way to implement this scheme using filters was developed in 1988 by Mallat (see
[Mal89] in References). The Mallat algorithm is in fact a classical scheme known in the signal
processing community as a two-channel subband coder. This very practical filtering algorithm
yields a fast wavelet transform -- a box into which a signal passes, and out of which wavelet
coefficients quickly emerge. Let's examine this in more depth.

One-Stage Filtering:


For many signals, the low-frequency content is the most important part. It is what gives the
signal its identity. The high-frequency content, on the other hand, imparts flavor or nuance.
Consider the human voice. If you remove the high-frequency components, the voice sounds
different, but you can still tell what's being said. However, if you remove enough of the
low-frequency components, you hear gibberish.
In wavelet analysis, we often speak of approximations and details. The approximations are the
high-scale, low-frequency components of the signal. The details are the low-scale,
high-frequency components.

The filtering process, at its most basic level, looks like this:

The original signal, S, passes through two complementary filters and emerges as two signals.
Unfortunately, if we actually perform this operation on a real digital signal, we wind up with
twice as much data as we started with. Suppose, for instance, that the original signal S consists of
1000 samples of data. Then the resulting signals will each have 1000 samples, for a total of 2000.
These signals A and D are interesting, but we get 2000 values instead of the 1000 we had. There
is a more subtle way to perform the decomposition using wavelets: by looking carefully at
the computation, we may keep only one point out of two in each of the two 1000-sample signals
and still retain the complete information. This is the notion of downsampling. We produce two
sequences called cA and cD.
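The filter-then-downsample step can be sketched in a few lines. The report's code is MATLAB, but the idea is language-independent; this pure-Python version uses the two-tap Haar filter pair as an illustrative stand-in for the toolbox's filters:

```python
def one_stage_dwt(s):
    # Haar analysis filters: low-pass (averaging) and high-pass (differencing)
    h = [1 / 2 ** 0.5, 1 / 2 ** 0.5]    # low-pass
    g = [1 / 2 ** 0.5, -1 / 2 ** 0.5]   # high-pass

    def conv(x, f):
        # full convolution of signal x with filter f
        n, m = len(x), len(f)
        return [sum(f[j] * x[i - j] for j in range(m) if 0 <= i - j < n)
                for i in range(n + m - 1)]

    A = conv(s, h)      # full-rate approximation signal
    D = conv(s, g)      # full-rate detail signal
    cA = A[1::2]        # keep one point out of two: downsampling
    cD = D[1::2]
    return cA, cD

s = [float(i % 4) for i in range(1000)]
cA, cD = one_stage_dwt(s)
# Before downsampling, A and D together hold about twice as many values as s;
# after downsampling, cA and cD together hold about as many values as s.
```

Longer filters (such as db2) follow exactly the same pattern, just with more taps.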

The process that includes the downsampling step produces the DWT coefficients.
To gain a better appreciation of this process, let's perform a one-stage discrete wavelet transform
of a signal. Our signal will be a pure sinusoid with high-frequency noise added to it.

Here is our schematic diagram with real signals inserted into it:

The MATLAB code needed to generate s, cD, and cA is

s = sin(20.*linspace(0,pi,1000)) + 0.5.*rand(1,1000);

[cA,cD] = dwt(s,'db2');

where db2 is the name of the wavelet we want to use for the analysis.
Notice that the detail coefficients cD are small and consist mainly of high-frequency noise,
while the approximation coefficients cA contain much less noise than does the original signal.

[length(cA) length(cD)]

ans = 501 501

You may observe that the actual lengths of the detail and approximation coefficient vectors are
slightly more than half the length of the original signal. This has to do with the filtering process,
which is implemented by convolving the signal with a filter. The convolution "smears" the
signal, introducing several extra samples into the result.
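The arithmetic behind that observation: full convolution of an n-sample signal with an L-tap filter produces n + L - 1 samples, and downsampling keeps half of them. A quick sanity check in Python, under one common length convention (the filter length of 4 for db2 is a standard property of the Daubechies-2 filters):

```python
def dwt_length(n, filter_len):
    # Samples after full convolution (n + filter_len - 1), then
    # downsampling by two.
    return (n + filter_len - 1) // 2

# db2 has 4 filter coefficients, so a 1000-sample signal gives
print(dwt_length(1000, 4))  # -> 501, matching [length(cA) length(cD)]
```

The extra sample beyond n/2 is exactly the "smearing" from the convolution.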


Multiple Level Decomposition:


The decomposition process can be iterated, with successive approximations being decomposed in
turn, so that one signal is broken down into many lower-resolution components. This is called the
wavelet decomposition tree.

Looking at a signal's wavelet decomposition tree can yield valuable information.
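The iteration itself is simple: apply one analysis step, keep the details, and feed the approximation back in. A minimal Haar-based sketch in Python (again an illustrative stand-in for the toolbox's multilevel decomposition, not the report's actual code):

```python
def haar_step(s):
    # One Haar analysis step, zero-padding if the length is odd
    if len(s) % 2:
        s = s + [0.0]
    r = 2 ** 0.5
    cA = [(s[i] + s[i + 1]) / r for i in range(0, len(s), 2)]
    cD = [(s[i] - s[i + 1]) / r for i in range(0, len(s), 2)]
    return cA, cD

def wavedec(s, levels):
    # Iterate on successive approximations: s -> cA1 -> cA2 -> ...
    # collecting the detail coefficients shed at each level.
    details = []
    approx = list(s)
    for _ in range(levels):
        approx, cD = haar_step(approx)
        details.append(cD)
    return approx, details

sig = [float(k % 8) for k in range(64)]
cA3, (cD1, cD2, cD3) = wavedec(sig, 3)
# Each level halves the resolution: 64 -> 32 -> 16 -> 8 samples.
```

The tree structure falls out of the iteration: cD1, cD2, cD3 are the branches and cA3 is the coarsest trunk.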


USER INTERFACE:

OUTPUT:
I would like to explain the analysis with the help of an example:

We record a speech signal: "my name is aditya".

On recording the signal, we observe that it contains 4 different and distinct words.

We first plot the respective time and frequency graphs for the entire sentence, i.e. "my
name is aditya".

Then we plot the time and frequency graphs for the individual words in the sentence,
such as "my", "aditya", etc.

Then we compare the values of time and frequency from the respective plots for the
individual words.

The respective plots are shown below. It can be easily seen that the values of frequency and
time for an individual word match the values obtained when these words are part of a larger
sentence. We can easily match frequencies using this method.


SOUND FILE: MY NAME IS ADITYA

SOUND FILE: MY


SOUND FILE: NAME

SOUND FILE: IS


SOUND FILE: ADITYA

Now, we note that when the same sentence is recorded by a different person, we still obtain
similar time and frequency characteristics, i.e. on comparing this graph with our earlier results,
we can easily match the frequencies of individual words.


RESULT:
We have observed that the time and frequency plots for individual words and for those same
words contained in larger sentences are very similar. There is also a high degree of similarity
when a different user speaks those same words.

Hence, we conclude that we can obtain the frequency-time plots (wavelet analyses) of speech
signals with minimal error. Therefore, our speech analysis is successful.


REFERENCES:
1. Stéphane Mallat, A Wavelet Tour of Signal Processing, Third Edition, Academic Press, 1999.

2. L. Fugal, Conceptual Wavelets, Fourth Edition, pages 331-371.

3. Barbara Burke Hubbard, The World According to Wavelets, A K Peters Ltd, 1998.

4. Robi Polikar, The Wavelet Tutorial, Part 1: Fundamental Concepts and an Overview of the Wavelet Theory.

5. Robi Polikar, The Wavelet Tutorial, Part 2: The Fourier Transform and the Short Term Fourier Transform.

6. Robi Polikar, The Wavelet Tutorial, Part 3: Multiresolution Analysis and the Continuous Wavelet Transform.

7. Robi Polikar, The Wavelet Tutorial, Part 4: Multiresolution Analysis and the Discrete Wavelet Transform.

8. Gilbert Strang, "Wavelets," American Scientist 82 (1994), pp. 250-255.

9. Gerald Kaiser, A Friendly Guide to Wavelets, Birkhäuser, 1994.

10. Paul S. Addison, The Illustrated Wavelet Transform Handbook, Institute of Physics, 2002.

11. Ramazan Gençay, Faruk Selçuk and Brandon Whitcher, An Introduction to Wavelets and Other Filtering Methods in Finance and Economics, Academic Press, 2001.

12. [Mal89] S. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 1989, pp. 674-693.

