
Optimization of Feedforward Neural Network for

Audio Classification Systems



Onkar Singh
Department of Electronics and Communication Engineering
Rayat Institute of Engineering & Information Technology
SBS Nagar, India
aujlaonkarsingh@gmail.com

Neeru Singla
Department of Electronics and Communication Engineering
Rayat Institute of Engineering & Information Technology
SBS Nagar, India
neerusingla99@gmail.com

Manish Dev Sharma
Department of Physics, Panjab University
Chandigarh, India
mds@pu.ac.in


Abstract—In this paper we classify audio signals using a feedforward neural network and measure its suitability in terms of classification accuracy and classification time. We investigate and analyze the system to optimize the neural network, determining which numbers of layers and neurons are most suitable for classifying audio wave files. An accuracy above 99% is reported.

I. INTRODUCTION
Classification of audio signals according to their content has been a major concern in recent years. There have been many studies on audio content analysis, using different features and different methods. It is a well-known fact that audio signals are baseband, one-dimensional signals. General audio consists of a wide range of sound phenomena such as music, sound effects, environmental sounds, and speech and non-speech signals.
The classification of audio, as a first step, requires the extraction of certain features from the input sound sample, which may include the root-mean-square amplitude envelope, constant-Q transform frequency spectrum, Multidimensional Scaling trajectories, cepstral coefficients, spectral centroid, and presence of vibrato [1]. There are two main approaches to this problem of content-based classification from previously extracted features [2]: the first uses deterministic methods, and the second utilizes probabilistic techniques. Despite many research efforts, high-accuracy audio classification has been achieved only for simple cases such as speech/music discrimination. Previous works have presented a theoretical framework and application of automatic audio content analysis using perceptual features, and an audio classifier based on simple features such as zero-crossing rate and short-time energy for radio broadcast [3].
Researchers have conducted many experiments with different classification models, including GMM (Gaussian Mixture Model) [4], BP-ANN (Back-Propagation Artificial Neural Network), and KNN (K-Nearest Neighbour) [5]. Many other works have sought to enhance audio classification algorithms, such as pre-classification of audio recordings into speech, silence, laughter, and non-speech sounds in order to segment discussion recordings in meetings [6]. The usage of taxonomic structures also helps to enhance classification performance. Pitch-tracking methods have also been introduced to discriminate audio recordings into more classes, such as songs and speech over music, with a heuristic-based model.
Figure 1 below is a block diagram of the classification system. An audio file stored in WAV format is passed to a feature extraction function, which calculates numerical features that characterize the sample. When training the system, this feature extraction process is performed on many different input WAV files to create a matrix of column feature vectors [7]. This matrix is then preprocessed to reduce the number of inputs to the neural network and then sent to the neural network for training. After training, single column vectors can be fed to the preprocessing block, which processes them in the same manner as the training vectors, and then classified by the neural network.

Onkar Singh* et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES
Vol No. 7, Issue No. 1, 098 - 102
ISSN: 2230-7818 @ 2011 http://www.ijaest.iserp.org. All rights Reserved. Page 98

Figure 1. Block diagram of the audio classification system


II. SYSTEM SETUP

This section describes the setup of the digital audio classification system. The system is composed primarily of the blocks above and was developed in the Matlab environment. Matlab code can be provided upon request.

Data for training and testing the system was taken from ten compact discs. The tracks on each of these CDs were extracted, converted to WAV format, and then divided into segments. WAV files are taken as the input files. MP3 files can also be taken as input instead of WAV files if desired: the system presented in this paper can easily be converted to take MP3 input by prepending an MP3-to-WAV converter.

A. Feature extraction

Discriminative features contribute greatly to the audio classification task. In order to improve the accuracy of classification and segmentation for an audio sequence, it is important to choose features that properly represent its temporal and spectral characteristics. In our system, we select the mel-frequency cepstral coefficients (MFCC), which have been proved effective for speech and music discrimination [7].
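The MFCC computation itself is not spelled out in the paper. As a hedged illustration of the standard procedure (framing, windowing, a triangular mel filterbank, then a DCT of the log filterbank energies), a NumPy sketch might look like the following; all parameter values (sample rate, frame length, filter and coefficient counts) are assumptions, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    # Frame the signal and apply a Hamming window to each frame.
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filterbank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Log filterbank energies, then a type-II DCT to decorrelate them.
    energies = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return energies @ dct.T  # shape: (num_frames, n_ceps)
```

One MFCC vector is produced per frame; a per-file feature vector can then be formed by stacking or averaging frame coefficients.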
B. Data preprocessing

The feature vectors returned by the feature extraction block were first preprocessed before being input to the neural network. Two types of preprocessing were performed: one to scale the data to fall within the range of -1 to 1, and one to reduce the length of the input vector. The data was divided into three sets: one for training, one for validation, and one for testing. The preprocessing parameters were determined using the matrix containing all feature vectors used for training and validation. For testing, these same parameters were used to preprocess test feature vectors before passing them to the trained neural network. The first preprocessing function used was premnmx, which preprocesses the data so that the minimum and maximum of each feature across all training and validation feature vectors are -1 and 1. Premnmx returns two parameters, minp and maxp, which were used with the function tramnmx for preprocessing the test feature vectors. The second preprocessing function used was prepca, which performs principal component analysis on the training and validation feature vectors. Principal component analysis is used to reduce the dimensionality of the feature vectors from a length of 124 to a length more manageable by the neural network. It does this by orthogonalizing the features across all feature vectors, ordering the features so that those with the most variation come first, and then removing those that contribute least to the variation. Prepca was used with a value of 0.001 so that only those features that contribute to 99.9% of the variation were retained. This procedure reduced the length of the feature vectors by one half. Prepca returns the matrix transMat, which is used with the function trapca to perform the same principal component analysis procedure on the test feature vectors as was performed on the training and validation feature vectors. This was done before passing the test feature vectors to the trained neural network.
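The paper's implementation uses Matlab's premnmx/tramnmx and prepca/trapca. As an illustrative NumPy equivalent (a sketch, not the authors' code), following the same convention of feature vectors stored as columns, and assuming the rows are roughly centered as Matlab's prepca expects of preprocessed input:

```python
import numpy as np

def premnmx(p):
    """Scale each feature (row) of p into [-1, 1], like Matlab's premnmx."""
    minp, maxp = p.min(axis=1, keepdims=True), p.max(axis=1, keepdims=True)
    span = np.where(maxp - minp == 0, 1.0, maxp - minp)  # guard constant rows
    return 2 * (p - minp) / span - 1, minp, maxp

def tramnmx(p, minp, maxp):
    """Apply saved scaling parameters to new (test) feature vectors."""
    span = np.where(maxp - minp == 0, 1.0, maxp - minp)
    return 2 * (p - minp) / span - 1

def prepca(pn, min_frac=0.001):
    """PCA via SVD: drop components contributing < min_frac of total variance."""
    u, s, _ = np.linalg.svd(pn, full_matrices=False)
    var = s ** 2
    keep = var / var.sum() >= min_frac
    trans_mat = u[:, keep].T  # rows are the retained principal directions
    return trans_mat @ pn, trans_mat

def trapca(p, trans_mat):
    """Project new feature vectors onto the saved principal directions."""
    return trans_mat @ p
```

As in the paper, the parameters (minp, maxp, trans_mat) are computed once from the training and validation vectors and then reused unchanged on the test vectors.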

C. Neural Network

A three-layer feedforward backpropagation neural network, shown in the figure, was used for classifying the feature vectors [6]. By trial and error, an architecture consisting of 20 adalines in the input layer, 10 adalines in the middle layer, and 3 adalines in the output layer was found to provide good performance. The transfer function used for all adalines was a tangent sigmoid, 'tansig'. The Levenberg-Marquardt backpropagation algorithm, 'trainlm', was used to train the neural network.
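As an illustrative sketch of the 20-10-3 tanh architecture, the following NumPy code implements the forward pass and one backpropagation update. Note the paper trains with Levenberg-Marquardt ('trainlm'); plain batch gradient descent is substituted here for brevity, so this is not the authors' training algorithm, and the input size and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, sizes=(20, 10, 3)):
    """Layer sizes from the paper: 20, 10, and 3 units, all with tansig (tanh)."""
    params, prev = [], n_in
    for n in sizes:
        params.append((rng.normal(0, 0.5, (n, prev)), np.zeros((n, 1))))
        prev = n
    return params

def forward(params, x):
    """Propagate column vectors x through the network, keeping activations."""
    a, acts = x, [x]
    for w, b in params:
        a = np.tanh(w @ a + b)  # tansig transfer function
        acts.append(a)
    return acts

def train_step(params, x, t, lr=0.05):
    """One batch gradient-descent step on mean squared error; returns the loss."""
    acts = forward(params, x)
    delta = (acts[-1] - t) * (1 - acts[-1] ** 2)  # error signal at the output
    for i in range(len(params) - 1, -1, -1):
        w, b = params[i]
        grad_w = delta @ acts[i].T / x.shape[1]
        grad_b = delta.mean(axis=1, keepdims=True)
        if i > 0:  # backpropagate through tanh of the previous layer
            delta = (w.T @ delta) * (1 - acts[i] ** 2)
        params[i] = (w - lr * grad_w, b - lr * grad_b)
    return 0.5 * np.mean((acts[-1] - t) ** 2)
```

The three output units encode the audio classes; at test time a vector is assigned to the class whose output unit responds most strongly.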


III. EXPERIMENTAL RESULTS

This section will discuss the results of training and testing the classification system.


A. Classification

Here we aim to build a classifier that can predict the correct classification of an audio WAV file based on features extracted from the file. Neural networks have proved themselves proficient classifiers and are particularly well suited to addressing non-linear problems. Given the non-linear nature of real-world phenomena, such as predicting the success of audio classification, a neural network is certainly a good candidate for solving the problem [6]. The parameters act as inputs to the neural network, and the prediction of classification success is the target. Given an input, which constitutes the measured values of the parameters of the WAV file, the neural network is expected to identify whether the audio classification is correct or not. This is achieved by presenting previously recorded parameters of WAV files to a neural network and then tuning it to produce the desired target outputs. This process is called neural network training. The samples are divided into three units:
1. Training set
2. Validation set
3. Test set
The training set is used to teach the network. Training continues as long as the network continues improving on the validation set. The test set provides a completely independent measure of network accuracy. The trained neural network is tested with the testing samples. The network response is compared against the desired target response to build the classification matrix, which provides a comprehensive picture of system performance.
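The three-way split above can be sketched as follows; the split fractions are assumptions, since the paper does not state them, and the column-vector layout matches the feature matrix described in the system setup:

```python
import numpy as np

def split_data(features, targets, frac=(0.7, 0.15, 0.15), seed=0):
    """Shuffle column feature vectors and split into train/validation/test sets.

    frac gives the (assumed) proportions for each set.
    """
    n = features.shape[1]
    idx = np.random.default_rng(seed).permutation(n)
    a = int(frac[0] * n)
    b = int((frac[0] + frac[1]) * n)
    tr, va, te = idx[:a], idx[a:b], idx[b:]
    return ((features[:, tr], targets[:, tr]),
            (features[:, va], targets[:, va]),
            (features[:, te], targets[:, te]))
```

The validation set is consulted after each training epoch: when its error stops improving, training halts (the "validation stop" reported in the results below), and the untouched test set measures final accuracy.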
The next step was to create the neural network discussed above in the system setup section. The training function used was the feedforward backpropagation algorithm 'trainlm'. Using these parameters we can classify the audio files given as inputs. The plots of some audio file classifications are shown in the figures, which show performance against epochs. After training, the system was tested using the data set reserved for testing. Before passing the test feature vectors to the trained neural network, data preprocessing was performed using the saved parameters from the preprocessing of the training data. Figure 3 shows the plot of performance against epochs with 97.34% efficiency; the performance reached 0.0040275 before a validation stop occurred. Figures 4, 5, 6, and 7 also show plots of performance against epochs, with efficiencies of 95.5%, 97%, 98.2%, and 99.15% respectively; in these figures the performance reached 0.000785268, 5.51221e-05, 0.0032053, and 0.00202006 respectively.














B. Final result

For the final result we plotted the graph of accuracy against the number of neurons. Accuracy is the ratio of correct detections to all detections [5].
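The accuracy measure, together with the classification (confusion) matrix mentioned in the previous section, can be sketched as:

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    # Accuracy = correct detections / all detections (diagonal over total).
    return np.trace(cm) / cm.sum()
```

For example, with true labels [0, 1, 2, 1] and predictions [0, 1, 1, 1], three of four detections are correct, giving an accuracy of 0.75; the off-diagonal entry shows class 2 being confused with class 1.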

The figure shows the graph of percentage accuracy against the number of neurons. We note that accuracy is above 99% with 3 and 6 neurons, and that 3 neurons give the maximum accuracy. This result was obtained by varying the number of neurons to find where the result is most accurate. Thus, in our project, we show that by varying the number of neurons, rather than using one fixed network, we obtain a more accurate result.





IV. CONCLUSIONS AND FUTURE WORK

The classifier we have built provides excellent and robust discrimination among speech signals. We extracted features from the audio content and built the feature vectors, then applied the neural network, trained with the feedforward training procedure, to classify the audio; we report 99% accuracy. A particular, specific number of neurons is used to make the classification result more accurate. There are many interesting directions that can be explored in the future. To achieve this goal, we need to explore more audio features that can be used to characterize the audio system. A second direction is to improve the computational efficiency of the neural network.

REFERENCES

[1] Yu Song, Wen-Hong Wang, and Feng-Juan Guo, "Feature extraction and classification for audio information in news video," in Proc. International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR 2009), pp. 43-46, 12-15 July 2009.

[2] V. Mitra and C.-J. Wang, "A neural network based audio content classification," in Proc. International Joint Conference on Neural Networks (IJCNN 2007), pp. 1494-1499, 12-17 Aug. 2007.

[3] Xi Shao, Changsheng Xu, and M. S. Kankanhalli, "Applying neural network on the content-based audio classification," in Proc. 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, vol. 3, pp. 1821-1825, 15-18 Dec. 2003.

[4] Jae-Young Kim and Dong-Chul Park, "Application of Bhattacharyya kernel-based Centroid Neural Network to the classification of audio signals," in Proc. International Joint Conference on Neural Networks (IJCNN 2009), pp. 1606-1610, 14-19 June 2009.

[5] Yu Song, Wen-Hong Wang, and Feng-Juan Guo, "Feature extraction and classification for audio information in news video," in Proc. International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR 2009), pp. 43-46, 12-15 July 2009.

[6] L. Ballan, A. Bazzica, M. Bertini, A. Del Bimbo, and G. Serra, "Deep networks for audio event classification in soccer videos," in Proc. IEEE International Conference on Multimedia and Expo (ICME 2009), pp. 474-477, June-July 2009.

[7] Xin He, Yingchun Shi, Fuming Peng, and Xianzhong Zhou, "A method based on general model and rough set for audio classification," in Proc. Chinese Conference on Pattern Recognition (CCPR 2009), pp. 1-5, 4-6 Nov. 2009.

