
Genetic algorithms for feature selection in machine condition monitoring with vibration signals

L.B. Jack and A.K. Nandi

Abstract: Artificial neural networks (ANNs) can be used successfully to detect faults in rotating machinery, using statistical estimates of the vibration signal as input features. In any given scenario, there are many different possible features that may be used as inputs for the ANN. One of the main problems facing the use of ANNs is the selection of the best inputs to the ANN, allowing the creation of compact, highly accurate networks that require comparatively little preprocessing. The paper examines the use of a genetic algorithm (GA) to select the most significant input features from a large set of possible features in machine condition monitoring contexts. Using a GA, a subset of six input features is selected from a set of 66, giving a classification accuracy of 99.8%, compared with an accuracy of 81.2% using an ANN without feature selection and all 66 inputs. From a larger set of 156 different features, the GA is able to select a set of six features to give 100% recognition accuracy.

1 Introduction

Modern day production plants are expected to run for 24 hours a day, seven days a week. Lost production due to unexpected failure of machinery is regarded as a serious and high cost problem. To ensure that production runs successfully, industry has created a demand for techniques that are capable of recognising both the development and severity of a fault condition within a machine system. Machine condition monitoring was developed to meet this need. Traditional methods have been based around the use of two basic approaches: the use of single feature threshold performance measures that give a very general indication of the existence of a fault but no indication of the nature of the fault, or alternatively the use of frequency derived indicators which, while extremely reliable, can be time-consuming to compute and require a detailed knowledge of the internal components of a machine and their relative speeds to make a good classification of the fault. Reliability of detection is the most important criterion for the success of a condition monitoring system, and the challenge is to develop algorithms and systems that are able to diagnose fault conditions more accurately than those available at present. Research has been carried out into the use of artificial neural networks for fault diagnosis, and the results are promising [1-5]; however, many of the input features used require a significant computational effort to calculate. To make operation faster, and also to increase the accuracy of the classification, a feature selection process using genetic algorithms is used in this paper to isolate those features providing the most significant information for the neural network, whilst cutting down the number of inputs required for the network.

2 The problem

ANNs show promise for their application in condition monitoring. The advent of small silicon based accelerometers and velocimeters combined with the classification abilities of ANNs lend themselves to the creation of small integrated chips that may be incorporated directly into the fabric of a machine, giving it the ability to return information to either an internal control system or an external monitoring system. However, one of the main problems facing the use of ANNs lies in the selection of suitable inputs for this type of application. To be feasible, the network must be relatively compact, so that a small, cost effective processing unit may be used.

To maintain the accuracy of the network whilst reducing its physical size, some form of feature selection that is capable of selecting the most significant features of a feature set must be used. This raises the question of what constitutes a significant feature for the network. It is difficult to tell from raw values what will be a significant feature to the ANN. Thus, the method used must be capable of rating the importance of different input features, and selecting a subset of inputs which will allow the ANN to perform classification accurately.

The work presented in this paper is based around experimental results performed on vibration data taken from a small test rig (see Fig. 1) which can be fitted with a number of interchangeable faulty roller bearings. This is used to simulate the type of problems that can commonly occur in rotating machinery. Rolling elements, or ball bearings, are one of the most common components in modern rotating machinery (see Fig. 2); being able to detect accurately the existence and severity of a fault in a machine can be of prime importance in certain areas of industry, as in many cases the machine may be safety or emergency related.
[Fig. 1: vibration test rig (bearing block with accelerometer attached, signal conditioning amplifier, pulses, card in PC)]

[Fig. 2: rolling element bearing (rolling elements, outer race)]

Six different conditions are used within the experiments conducted for this paper. Two normal conditions exist: a bearing in brand new condition (NO), and a bearing in slightly worn (NW) condition. There are four fault conditions: inner race fault (IR), outer race (OR) fault, rolling element (RE) fault and cage fault (CA).

3 Justification

There is some justification for using GA based feature selection over some other methods available, such as principal component analysis (PCA), which can be much less computationally intensive than a GA based approach. The downside to PCA is that all the available features are required for the transformation matrix to create the rotated feature space. However, it must be remembered that the drive behind the feature selection process is to create a small system that requires as little processing as possible, whilst maintaining a high level of accuracy. Using PCA will still require the calculation of all the available features before the transformation matrix can be applied, and hence it requires a larger computing power on-board the hypothetical smart sensor than would be needed by using a GA based feature selection that selects only the best features. The computational cost of the GA will be much higher than using a system like PCA during training and feature selection; however, this will be offset by the lower computing power required on a sensor, and hence the lower cost in manufacture.

Another alternative for feature selection would be to use forward selection [6]. Forward selection first trains a single input network for each feature available. The process is then repeated for a two-input network, using the best feature from the first stage and each of the remaining features, and the combination of two features that gave the best result is kept, and the process repeated, until the required number of inputs is reached. One problem with forward selection is in the case where two features acting individually are relatively poor, but when used together give a much better result than the two best features achieved through forward selection. The algorithm has no provision to deal with this type of problem, and may suffer as a result of it. Additionally, to take account of the sizing of the hidden layer, it is necessary to train several networks for each size of input. To mirror the GA, which can choose a hidden layer of between 2 and 15 neurons, the forward selection algorithm will have to exhaustively test networks with sizes of between 2 and 15 neurons. The use of a GA has no such problem, as the features are selected as a unit, and the interaction between the different features as a group is tested, rather than as individual features.

Computationally, as the number of features that the system is expected to deal with increases, the GA based approach becomes more computationally efficient. Take a hypothetical case where one has two feature sets, one with 18 features, and another with 156 features, and requires a network with five inputs. Running a GA with a population size of ten members for 60 generations will mean the training of 600 networks (as will be seen in the Results section, this was enough to allow the training algorithm to run to effective convergence). The number of inputs to the network can be any user defined size, so to create a network with five inputs will require 60 x 10 = 600 networks to be trained. Using forward selection, the number of networks to be trained is given by (18 + 17 + 16 + 15 + 14) x 14 = 1120 networks for the 18 member feature set and (156 + 155 + 154 + 153 + 152) x 14 = 10780 networks for the 156 member feature set. As the number of features to be considered with forward selection increases, so does the computational load on the machine. This will also be the case for the GA based approach; however, the increase in computation time (and the number of networks to be evaluated) will be much less severe. For cases where there are several hundreds or thousands of potential inputs (which can often be the case in large monitoring systems), the GA becomes the more effective tool. Thus, for searching within high dimensional feature space, the GA becomes a practical option, as a reasonably efficient method of searching through the feature space. Forward selection is by no means the only alternative feature selection and network sizing algorithm available. Recently, approaches using support vector machines [7] and neurofuzzy networks [8] have also been put forward, and these may be considered as alternatives in the future.

4 Neural networks

The nonlinear properties of ANNs make them ideal for applications such as machine condition monitoring, where the training data are very often relatively sparse, and the network will have to generalise to a certain extent. Several applications have demonstrated that a neural network has successfully recognised and classified different faults in a variety of different condition monitoring applications [9]. A good general introduction to neural networks is provided by Haykin [10] and also Rojas [11].

The multilayer perceptron (MLP) has been used for classification purposes in this experiment. The MLP used in this case consists of one hidden layer and an output layer.
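A forward pass of such a one-hidden-layer MLP can be sketched in Python (a minimal illustration, not the authors' C++ implementation; the function names and weights here are invented). The hidden units use the logistic activation described in eqn. 1 below and the output units the linear activation of eqn. 2:

```python
import math
import random

def logistic(v):
    """Hidden-layer activation (logistic function, eqn. 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def mlp_forward(x, w_hidden, w_out):
    """Forward pass: logistic hidden layer, then linear output layer.

    w_hidden[j] holds the input weights of hidden neuron j;
    w_out[k] holds the hidden-to-output weights of output neuron k."""
    h = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_out]

# Tiny example: 2 inputs, 3 hidden neurons, 6 outputs (one per condition).
random.seed(0)
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
w_o = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
out = mlp_forward([0.5, -0.2], w_h, w_o)
print(len(out))  # 6
```

The six outputs correspond to the six bearing conditions; in the paper the hidden-layer size itself is left to the GA rather than fixed as here.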
IEE Proc.-Vis. Image Signal Process., Vol. 147, No. 3, June 2000
The hidden layer has a logistic activation function, eqn. 1, whilst the output layer uses a linear activation function, eqn. 2:

phi(v) = 1 / (1 + exp(-v))   (1)

phi(v) = v   (2)

where v is the sum of the weighted outputs. The size of the hidden layer is determined by the genetic algorithm itself during training. This allows training to proceed at a faster rate than an exhaustive training process that checks different sizes of the first layer. The size of the output layer, determined by the number of outputs required, is set at six neurons for this particular application.

Training of the network is carried out using a standard back-propagation algorithm with adaptive learning and momentum, and the network is trained for 350 epochs, using 40% of the data set as training data, and the remaining 60% as the test and validation set. The training time is limited to 350 epochs to keep computational time with the GA to reasonable lengths.

One of the ideals of a good monitoring system in a mass production context is that the system should be small and easy to compute. To this end, it is desirable to find a network that uses the minimum number of inputs for a given level of accuracy. The smaller the preprocessing requirements of the system, the lower is the level of raw computing power required, and therefore the better are the chances of a successful device being produced. Additionally, reducing the number of inputs to the network may improve the accuracy of classification, as the input information can be adjusted to use only the significant features of the particular application without having to calculate all the available values. Genetic algorithms are ideally suited to determine which features of the original data should be used, as the evolutionary process that the algorithm mimics is extremely good at feature selection [12, 13].

5 Genetic algorithms

Genetic algorithms have been gaining popularity in a variety of applications which require global optimisation of a solution. A good general introduction to genetic algorithms is given in [14, 15]. The prime component of a genetic algorithm is the genome, which mimics the properties of DNA in a cell. The genome is an encoded set of instructions which the genetic algorithm will use to construct a new model or function (in this case the inputs to a neural network). The encoding for the genome will vary depending on the type of application that is required; in some cases a binary encoding (with ones and zeros representing the presence or absence of a component) will be sufficient, whilst in others an encoding composed of real or floating point numbers may be used instead. The best type of encoding is very much problem dependent, and may require some form of combination of two or more encoding types to get the optimum results.

The genetic algorithm is allowed to select subsets of various sizes to determine the optimum combination and number of inputs to the network. The emphasis in using the genetic algorithm for feature selection is to reduce the computational load on the training system while still allowing near optimal results to be found relatively quickly.

While any successful application of genetic algorithms to a problem is greatly dependent on finding a suitable method for encoding, the creation of a fitness function to rank the performance of a particular genome is important for the success of the training process. The genetic algorithm will rate its own performance around that of the fitness function; consequently, if the fitness function does not adequately take account of the desired performance features, the genetic algorithm will be unable to meet the requirements of the user.

6 Data preparation

6.1 Sampling
The raw vibration data were collected from experiments with a small vibration test rig. Accelerometers were mounted vertically and horizontally on a bearing housing, so as to allow the monitoring of movement in a polar form rather than solely vertical movement. The output from the accelerometers was passed through charge amplifiers, and then to the analogue to digital converter. The signal passed through a lowpass filter with a cutoff frequency of 18.3 kHz, and was then sampled at a rate of 48 kHz, giving a slight oversampling of the data. This operation was repeated ten times at each of 16 different speeds. With a total of six different conditions, this gives a total data set of 960 cases, with 160 cases for each condition.

6.2 Feature extraction
Having sampled the data, a number of different forms of preprocessing were used as shown below.

6.2.1 Plain statistics: A number of different statistical features were taken based on the moments and cumulants of the vibration data. Higher order spectra have been found to be useful in the identification of different problems in condition monitoring [16]. A good introduction is given in [17-19]. For each of the basic preprocessed signals, a set of 18 different moments and cumulants were calculated. The format of the values is shown in eqn. 7. The cumulants are defined in eqns. 3-6. As two dimensional information exists for vertical and horizontal movement, it was decided to calculate values based on the equivalent polar positions, and values calculated on this basis are given the subscript z, defined in eqn. 8:

c_1 = m_1   (3)

c_2 = m_2 - (m_1)^2   (4)

c_3 = m_3 - 3 m_1 m_2 + 2 (m_1)^3   (5)

c_4 = m_4 - 3 (m_2)^2 - 4 m_1 m_3 + 12 (m_1)^2 m_2 - 6 (m_1)^4   (6)

f = [m_1,x, m_2,x, m_3,x, m_4,x, c_3,x, c_4,x, m_1,y, m_2,y, m_3,y, m_4,y, c_3,y, c_4,y, m_1,z, m_2,z, m_3,z, m_4,z, c_3,z, c_4,z]   (7)

z = sqrt(x^2 + y^2)   (8)

This was applied to the sampled vibration data, and the results were saved in an 18 x 960 matrix.
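The moment and cumulant features of eqns. 3-6 and the polar signal of eqn. 8 can be sketched as follows (a minimal illustration; the function names are not from the paper):

```python
import math

def moments(x):
    """Raw moments m1..m4 of a sequence, as used in eqns. 3-6."""
    n = len(x)
    return [sum(v**k for v in x) / n for k in (1, 2, 3, 4)]

def cumulants(x):
    """First four cumulants computed from the raw moments (eqns. 3-6)."""
    m1, m2, m3, m4 = moments(x)
    c1 = m1
    c2 = m2 - m1**2
    c3 = m3 - 3*m1*m2 + 2*m1**3
    c4 = m4 - 3*m2**2 - 4*m1*m3 + 12*m1**2*m2 - 6*m1**4
    return c1, c2, c3, c4

def polar(xs, ys):
    """Polar magnitude signal z of eqn. 8 from the two channels."""
    return [math.sqrt(x*x + y*y) for x, y in zip(xs, ys)]

# For a symmetric signal the odd cumulants vanish:
print(cumulants([-1.0, 1.0]))  # (0.0, 1.0, 0.0, -2.0)
```

Stacking such values for the x, y and z signals gives one 18-element column of the 18 x 960 feature matrix described above.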
6.2.2 Signal differences and sums: In an attempt to highlight both high and low frequency features, it was decided to apply simple calculations that would emphasise the high and low frequency content of the raw signals. The differences will increase in value wherever high frequency changes take place, while the sums of the signal will help to emphasise the low frequency content of the signal. Taking the original vibration signals, the derivative of each vibration signal was calculated according to eqn. 9, and then the statistical parameters given in eqn. 7 were calculated from the modified signals, and saved in another 18 x 960 matrix. This process was then repeated using the integral of the vibration signal (calculated according to eqn. 10, where m_x represents the mean value of the sequence x(n)), creating yet another 18 x 960 matrix:

d(n) = x(n) - x(n - 1)   (9)

s(n) = x(n) + x(n - 1) - 2 m_x   (10)

6.2.3 High- and lowpass filtering: The signals were passed through an eighth order Butterworth IIR highpass filter with a cutoff frequency of 129 Hz; the statistical values of eqn. 7 were then computed, and the results saved in an 18 x 960 matrix. The process was then repeated using a lowpass filter with the same cutoff frequency, creating another 18 x 960 matrix.

6.2.4 Spectral data: For each of the two channels sampled, a 64-point FFT of the raw data was carried out, and 33 values were obtained for each channel. These were then stored as a column vector of 66 values, which was used as the input dataset for the given data sample. The full input dataset formed a 66 x 960 dataset.

6.2.5 Target data: For each given vector in the input datasets, a corresponding vector was created in a second matrix containing the target information used during the neural network training. This information reflected the actual condition of the machine. This then gave, for the six conditions, a 6 x 960 matrix containing the target data. This is all the input information assembled to train the networks.

6.3 Normalisation
Prior experience (for example, see [9]) with training neural networks had indicated the significance of normalisation to both the speed and success of training. Prior to commencing the training run for the neural network, all the data in the input data set was normalised on a row by row basis. Rows were normalised according to the formula

x'(n) = (x(n) - m_x) / s_x   (11)

where m_x is the mean value of the row vector x, and s_x is the standard deviation of the row vector x.

7 Feature selection and encoding

Feature selection of the GA is controlled through the values contained within the genome generated by the GA. On being passed a genome with N + 1 values to be tested, the first N values are used to determine which rows are selected as a subset from the input feature set matrix. Rows corresponding to the numbers contained within the genome are copied into a new matrix containing N rows. The last value of the genome determines the number of neurons present in the hidden layer.

For this particular application, a simple integer based genome string was used. For a training run requiring N different inputs to be selected as a subset of Q possible inputs, the genome string would consist of N + 1 real numbers. Each of the first N numbers x in the genome is constrained as 0 <= x <= Q - 1, whilst the last number x is constrained to lie within the bounds 2 <= x <= S, where S is the maximum number of neurons permissible in the first layer. This means that any mutation that occurs will be bounded within the limits set at the definition of the genome. This arrangement is shown in eqn. 12:

g = [x_1, ..., x_N, x_(N+1)],  0 <= x_i <= Q - 1 for i <= N,  2 <= x_(N+1) <= S   (12)

Having prepared the input data matrix to be used in the training run, the datasets were split 40/60 into two portions. The 40% portion was used to train the network, whilst the remaining 60% was used to validate the network after training. After training had proceeded for 350 epochs, the network was tested with the full dataset, and the results were returned as a measure of performance of the neural network.

The GA used in this experiment is a simple GA, as proposed by Goldberg [12]. The simulations are programmed in C++, using the GAlib library [20] available over the Internet. The genome class used to implement the form above is the GARealAlleleSetArray, with only integer steps allowed. This was carried out in this way as the real genome had all the desired properties required for our application. Limiting the step size to integers allows the genome to behave for all intents and purposes as an integer based genome, but allows the use of the most desirable features of the GARealAlleleSetArray without excessive reprogramming.

The GA uses a population size of ten individuals, starting with randomly generated genomes. This ensured that the interchange between different genomes within the population was relatively high; however, this also helps to reduce the likelihood of convergence within the population. An elitist population model is used, meaning that the best individual in the previous population is kept in the next population, preventing the performance of the GA worsening as the number of generations increases.

7.1 Mutation operation
The mutation operator used is real Gaussian mutation, with a probability of mutation equal to 0.2. For an allele string a, and using a probability of mutation p_m in [0, 1], the mutation operation can be expressed as

a'_i = a_i + k_i if x_i <= p_m, a_i otherwise, bounded so that a_min <= a'_i <= a_max   (13)

where a'_i represents the modified allele member, x_i in [0, 1] is a randomly sampled uniformly distributed variable, and k_i is a randomly sampled Gaussian distributed variable with 0 mean and a standard deviation of 1. a_max and a_min represent the bounding limits of the allele values: if a value exceeds one of the limits, then it will be set to the maximum or minimum, as relevant.
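The bounded real Gaussian mutation described in Section 7.1 can be sketched as follows (an illustrative reimplementation, not the GAlib routine used in the paper; the parameter defaults mirror the bounds discussed above):

```python
import random

def gaussian_mutate(allele, p_m=0.2, a_min=0.0, a_max=15.0, rng=random):
    """Real Gaussian mutation (eqn. 13): with probability p_m, add a
    zero-mean, unit-variance Gaussian step to each allele member, then
    clip the result to the [a_min, a_max] bounds fixed when the genome
    was defined."""
    out = []
    for a in allele:
        if rng.random() <= p_m:          # x_i <= p_m: mutate this member
            a = a + rng.gauss(0.0, 1.0)  # k_i ~ N(0, 1)
        out.append(min(max(a, a_min), a_max))
    return out

random.seed(0)
mutated = gaussian_mutate([1.0, 7.0, 14.9])
print(all(0.0 <= a <= 15.0 for a in mutated))  # True: bounds are enforced
```

In the paper the steps are additionally restricted to integer values by the allele set; that detail is omitted here for brevity.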
7.2 Crossover
The crossover operator used is the uniform crossover, with the probability of crossover set to 0.75. If there are two strings, a_1,i and a_2,i, and given a probability of crossover p_c in [0, 1], then the crossover operation for each individual element a_i may be expressed as

(a'_1,i, a'_2,i) = (a_2,i, a_1,i) if y_i <= p_c, (a_1,i, a_2,i) otherwise   (14)

where the y_i in [0, 1] are randomly sampled uniformly distributed variables.

8 Fitness function

The fitness function used in the GA simply returns the number of correct classifications made over the whole dataset. No direct penalisation is made for incorrect classification here, just that the classification score will be correspondingly lower. The whole dataset is used to provide some form of evaluation of the generalisation properties of the network. While the performance could be rated on only the performance of the validation set, it was felt that this would give an inadequate measure of the performance of the GA on the training set. Owing to the nature of the machine system, it is not always easy to choose training data that are completely representative of likely characteristics of the fault data, and so the performance of the GA/ANN is measured over both datasets to allow the GA to be more representative in its selection of features.

More complex forms of fitness function, either involving incorrect classifications or sum squared error or other factors, could be used to determine the performance of each network trained; however, the performance achieved using this comparatively simple fitness function shows that they are not needed to achieve good results in this application.

9 Training and simulation

Training was carried out using six datasets. Four of the datasets were statistically based, using the plain statistics feature set (18 features), signal differencing and summing (36 features), and high- and lowpass filtering (36 features), the fourth feature set comprising all the statistically based features (90 features). The set of 66 spectral features was used as an individual case, and this dataset was combined with all the statistical feature sets to form an input feature set of 156 inputs. Each feature set contained a total of 960 cases. Using the genetic algorithm running for a total of 20 generations, each containing ten members (meaning the training of 200 neural networks), eight separate cases were tested using various numbers of inputs, varying from five to 12. The results for each number of inputs were then recorded. The set of operations was repeated for runs of 40 and 60 generations. As a comparison, a neural network was trained using each feature set. These were trained for a total of 350 epochs, and allowed to choose the best size of intermediate layer between two and 15 neurons.

10 Results

10.1 Results: ANN
Table 1 shows a summary of results for all six feature sets used. The 'no. of neurons' quoted in the second column is the number of neurons used in the hidden layer of the best network in each training run. 'Classification success' represents the percentage success rate of the ANN using the complete dataset, which includes both training and test data. The 'false alarm' rate details the percentage of 'normal' conditions that were misclassified as alarm conditions. The 'fault not recognised' category details the percentage of fault conditions that were classified as normal.

As can be seen, all the feature sets have a performance greater than 80%, with two cases in excess of 90%; the spectral feature set gives the best performance, at 97% for the overall dataset. While this is a comparatively small feature set containing 66 features, the spectral content of the data is ideally suited to recognising several of the periodic type faults that are generated by the different conditions. The aggregate of the false alarm rate and fault not recognised rates is also the lowest of all the different feature sets. This would perhaps suggest that breaking the signal down into a number of different frequency 'bins' allows the ANN to classify the different features comparatively easily.

The poorest performance is given by the feature set which used high- and lowpass filtering. Again, this is a comparatively small feature set, containing only 36 features; however, it is not the smallest feature set, and it may be the case that the filtering process is removing certain data concerning the speed of rotation, etc. that the ANN requires for accurate recognition.

Generally speaking, those datasets containing little or no spectrally based information have not performed as well as those that do. Examining the actual classification results for the case using the spectral feature set (Table 2), the breakdown of the ANN classification can be seen. Each column of the table shows the relative classifications made by the ANN for a given condition. Each row in the column vector shows what the ANN perceived them as, expressed as a percentage of the total number of cases for that condition. As can be seen, two categories manage to achieve 100% accuracy, while most others are in excess of 98% accurate.

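Before turning to the per-condition breakdowns, the uniform crossover of Section 7.2 can be sketched as follows (an illustrative Python reimplementation rather than the GAlib routine used in the paper; p_c follows the 0.75 setting given there):

```python
import random

def uniform_crossover(parent1, parent2, p_c=0.75, rng=random):
    """Uniform crossover (eqn. 14): each element pair is exchanged
    between the two parent strings when the uniform sample y_i falls
    at or below p_c."""
    child1, child2 = list(parent1), list(parent2)
    for i in range(len(child1)):
        if rng.random() <= p_c:          # y_i <= p_c: swap elements
            child1[i], child2[i] = child2[i], child1[i]
    return child1, child2

random.seed(1)
p1, p2 = [1, 2, 3, 4], [5, 6, 7, 8]
c1, c2 = uniform_crossover(p1, p2)
# Each position still holds exactly one value from each parent:
print(all({c1[i], c2[i]} == {p1[i], p2[i]} for i in range(4)))  # True
```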
Table 1: Performance of straight ANNs using different feature sets

Dataset               No. of inputs  No. of neurons  Classification success (%)  False alarm rate (%)  Fault not recognised (%)
Statistics only       18             14              84.4                        2.5                   2.6
Highpass/lowpass      36             7               83.1                        2.6                   2.2
Differencing/summing  36             12              89.2                        0                     8.1
All statistics        90             7               88.1                        0.2                   7.4
Spectral data         66             8               97.0                        0.2                   2.2
All data              156            6               91.1                        0.1                   6.0

Table 2: Classification success (%) for straight ANN, 'normal' and as such arc 1101rcgarded iis fatiil conditions.
using spectral data only Thcrc still rcniiiins a rairly high proportion o f misclassifi-
cations hcing made, with the rcmiiindcr bciiig conliiscd
Perceived Actual condition hclwccn outcr race (OK) and cage Tauits (CA). Of lliesc
condition NO NW IR OR RE CA
two categories, tlic cage Iiault i s tlic inorc coininon classi-
NO 98.8 0 0 0 0 12.5 lication, a1 15.6% 'l'his i s cc~nsistcntwith the niisclassifi-
cations that wcrc seen with tlic spectral dalasc~,in that it
NW 0 100 0 0 0.6 0
would seein that tlic two inorniiil cases and the cage Tault
IR 0 0 98.8 0 0 0 appcar similar to the ANN.
OR 12 0 1.2 100 0 2.5
RE 0 0 0 0 99.4 0
CA 0 0 0 0 0 85.0 10.2 Genetic algorithm with ANN after 20
generations
Table 4 shows a coiiq~arisunhclween the results generated
after a run of 20 gciicrations using tlic combined genetic
cage (CA) Tault, which the ANN has a lot o r Irouble algorithm ANN prograiii (GANN), and the stand-alone
classifying, consislently confusing the f i u l t conditi(in A N N program. The tahlc shows specific i-csulls fool- cach
(85%) with the norinal condition (12.5"/0), and also lo a ANN considercd i n Table I, and compares tliesc against
lesser extent, the outer race (OR) Can11 (2.5%). This i s the best rcsull achieved using Ilic C A N N systcm. The third
consistent w i l h the fail1 condition, which can he erratic a t subset of columns show Lhc iiicaii perforinancc and the
times and, depending on the bearing loading, can be range of pcrTormauccs achieved, exprcsscd as perccntagc
difficult lo detect. It would perhaps suggest that Ihe acciiracies.
resolution nT the FFTs that were taken i s not sufficiently On cxaminiiig tlie rcstilts on a like Tor l i k e basis, it can
high lo detect the hull; alternatively, due to the intermittent be secn that all cases havc a success sale in cxccss of
nature that the fault exhibited under the experimental load conditions, it may be that the amplitude of the fault signal is very similar to that of the normal conditions, and the ANN has trouble distinguishing between them.

Examining the classification table (Table 3) for the worst case (the high/lowpass filtered data), the extent of the misclassification that is occurring can be seen. Three categories in particular are very poor; the normal (NO) case is the worst of these, with a classification accuracy of 22.5%. The poor performance is not as bad as first impressions may give; it can be seen that a further 61.9% of the cases have been classified as worn normal (NW) bearings, which, while incorrectly classified, are still recognised as normal rather than fault conditions.

Table 3: Classification success (%) for straight ANN, using highpass/lowpass data

Perceived    Actual condition
condition    NO     NW     IR     OR     RE     CA
NO           22.5   0      0      0      0      0.6
NW           61.9   100    1.3    0      1.3    10
IR           0      0      88.8   0      0      0
OR           0      0      2.5    100    0.6    0
RE           0      0      0.6    0      98.1   0
CA           15.6   0      6.8    0      0      89.4

92.0%, and the best GANN solution has a markedly higher classification rate than the equivalent 'straight' ANN. The number of inputs selected in each case is significantly smaller than the complete feature sets used in each set; the selected features are subsets of the original datasets, and thus any information available to the small GANN is also available to the larger one.

Examining the results for the mean performance, it can be seen that every single result achieved a higher performance using the GANN than the stand-alone ANN. This is also borne out by the performance range figures, which show that in all cases the inputs selected by the GANN provide a superior performance to an ANN using all available inputs. As can be seen, the spectral dataset, which had the highest performance, improves from 97.0% (using 66 inputs) to 99.7% (using only eight inputs). This means that, out of the total of 960 examples, only three have been misclassified, using only one-eighth of the inputs. Interestingly, the high/lowpass filtered data have undergone the largest improvement in performance, jumping from 83.1% to 97.7%. This compares well with the other results. The fact that the high/lowpass data are able to reach a value in excess of 97% implies that the information contained within the feature set is sufficient to allow accurate classification, without the need for extraneous information. Additionally, the number of inputs required to achieve this level of accuracy is one-quarter of that used by the straight ANN.
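The classification tables in this section (Tables 3, 5 and 6) are column-normalised confusion matrices: each column gives the percentage of examples of one actual condition assigned to each perceived condition, so every column sums to 100. As an illustrative sketch (not the authors' code), such a table can be computed as:

```python
def confusion_percentages(actual, predicted, n_classes):
    """Column-normalised confusion matrix: entry [p][a] is the percentage
    of examples of actual class a assigned to perceived class p, so each
    column sums to 100 (the layout used in Tables 3, 5 and 6)."""
    counts = [[0.0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        counts[p][a] += 1
    for a in range(n_classes):
        total = sum(counts[p][a] for p in range(n_classes)) or 1
        for p in range(n_classes):
            counts[p][a] = 100.0 * counts[p][a] / total
    return counts

# Hypothetical labels for two conditions (0 = NO, 1 = NW)
actual = [0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 1, 1, 1]
table = confusion_percentages(actual, predicted, 2)  # table[0][0] -> 75.0
```

In this toy example one of the four NO cases is misread as NW, giving 75% in the NO/NO cell and 25% in the NW/NO cell, the same reading as the NO column of Table 3.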

Table 4: Comparison between stand-alone ANN and GA with ANN after 20 generations, for all six datasets

                        Straight ANN             GA with best ANN         GA with ANN
Dataset                 No. of  No.     Perf.    No. of  No.     Perf.    Mean     Perf.
                        inputs  hidden  %        inputs  hidden  %        perf. %  range
Statistics only         18      14      84.4     6       9       95.8     92.0     90.2-95.8
Highpass/lowpass        36      7       83.1     9       8       97.7     96.0     89.2-97.7
Differencing/summing    36      12      89.3     9       7       97.0     95.6     94.5-97.0
All statistics          90      7       83.1     5       5       97.6     95.0     89.4-97.6
Spectral data           66      8       97.0     8       11      99.7     98.7     96.7-99.7
All data                156     6       91.1     11      12      99.7     96.8     94.6-99.7

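The search summarised in Table 4 — a GA looking for a small input subset whose trained ANN scores highest — can be sketched in outline as follows. This is a toy illustration under stated assumptions: the chromosome encoding, the operator choices and the stand-in fitness function below are ours, not the paper's; in the actual system each candidate subset is scored by training an ANN on those features and measuring its classification accuracy.

```python
import random

def ga_select(n_features, subset_size, fitness, generations=40,
              pop_size=20, seed=0):
    """Toy GA for feature-subset selection: each chromosome is a list of
    subset_size distinct feature indices; rank selection keeps the top
    half of the population, and children are produced by single-point
    crossover with an occasional point mutation."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n_features), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, subset_size)
            # genes from a up to the cut, then distinct genes from b
            child = (a[:cut] + [g for g in b if g not in a[:cut]])[:subset_size]
            if rng.random() < 0.1:  # point mutation
                i, g = rng.randrange(subset_size), rng.randrange(n_features)
                if g not in child:
                    child[i] = g
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Stand-in fitness (ours, for illustration): pretend features 0-5 carry
# the diagnostic information; the real fitness would be ANN accuracy.
informative = set(range(6))
best = ga_select(66, 6, lambda s: len(informative & set(s)))
```

In the paper's setting the expensive step is the fitness evaluation, since every candidate subset requires an ANN to be trained and tested; Section 10.4 also notes that a penalty term for network size could usefully be added to such a fitness function.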
210                                    IEE Proc.-Vis. Image Signal Process., Vol. 147, No. 3, June 2000
Examining the classifications of the spectral feature set after 20 generations (Table 5), a number of things can be seen. The classification of the network has improved markedly in that there are only a few mistakes. All faults are now classified as faults, and all normal conditions are classified correctly. While the actual classification between some of the fault conditions is not yet entirely right, the fact that there are no errors being made in the classification of faults as normal, and vice versa, is perhaps the most helpful criterion in the selection of condition monitoring systems.

Table 5: Classification success (%) for GANN after 20 generations, using spectral data

Perceived    Actual condition
condition    NO     NW     IR     OR     RE     CA
NO           100    0      0      0      0      0
NW           0      100    0      0      0      0
IR           0      0      98.9   0      1.1    0
OR           0      0      0      100    0      0
RE           0      0      1.1    0      98.9   0
CA           0      0      0      0      0      100

Table 6: Classification success (%) for GANN after 20 generations, using high/lowpass data

Perceived    Actual condition
condition    NO     NW     IR     OR     RE     CA
NO           100    0      0      0      0      4.4
NW           0      100    0      0      0      6.9
IR           0      0      100    0      1.9    0
OR           0      0      0      100    0      0.6
RE           0      0      0      0      98.1   0
CA           0      0      0      0      0      88.1

Looking at the results for the high- and lowpass filtering (Table 6), the improvement in the classification from the straight ANN (Table 3) is very evident. Four of the six conditions achieve 100% accuracy, while one of the other two categories (rolling element fault, RE) has remained as accurate, and the other (cage fault, CA) has deteriorated slightly. The only fault condition which is being confused with the 'normal' conditions is the cage fault, and this is the same problem as previously existed. It may be that the spectral information by itself is insufficiently detailed to allow discrimination between these conditions at the lower speeds of rotation.

10.3 Genetic algorithm with ANN after 40 generations

Table 7 shows the performance of the different feature sets after running under the GA for 40 generations. The results are again an improvement over the previous set (Table 4). The mean performance of every set has increased from the values achieved after 20 generations. Five of the six datasets have their best performance in excess of 97.5%. Again the feature set using all the available training data has managed to achieve an accuracy of 100%, indicating accurate classification. This is achieved using only six inputs out of the possible 156. Using nine neurons in the hidden layer, a relatively small network has been created that fulfils the criteria set earlier on. A network of this size would be ideal for a real-time implementation on a small chip or microcontroller.

10.4 Genetic algorithm with ANN after 60 generations

Table 8 shows the results after running simulations through to 60 generations, and these still show a degree of

Table 7: Comparison between stand-alone ANN and GA with ANN after 40 generations, for all six datasets

                        Straight ANN             GA with best ANN         GA with ANN
Dataset                 No. of  No.     Perf.    No. of  No.     Perf.    Mean     Perf.
                        inputs  hidden  %        inputs  hidden  %        perf. %  range
Statistics only         18      14      84.4     12      7       95.4     93.9     91.7-95.4
Highpass/lowpass        36      7       83.1     6       10      97.8     97.6     97.4-97.8
Differencing/summing    36      12      89.3     6       10      97.5     98.8     96.1-97.5
All statistics          90      7       83.1     7       14      97.7     97.1     96.0-97.7
Spectral data           66      8       97.0     6       11      99.8     99.2     97.3-99.8
All data                156     6       91.1     6       9       100      98.3     95.0-100

Table 8: Comparison between stand-alone ANN and GA with ANN after 60 generations, for all six datasets

                        Straight ANN             GA with best ANN         GA with ANN
Dataset                 No. of  No.     Perf.    No. of  No.     Perf.    Mean     Perf.
                        inputs  hidden  %        inputs  hidden  %        perf. %  range
Statistics only         18      14      84.4     9       5       97.0     95.5     94.2-97.0
Highpass/lowpass        36      7       83.1     8       10      97.7     97.4     97.0-97.7
Differencing/summing    36      12      89.3     6       8       97.6     96.6     96.1-97.6
All statistics          90      7       83.1     7       9       97.4     96.6     95.7-97.4
Spectral data           66      8       97.0     7       10      99.7     99.4     98.9-99.7
All data                156     6       91.1     7       15      100      98.7     96.4-100
improvement over the shorter runs. In almost all cases, the algorithms had converged by 60 generations. Comparing the results for 60 generations, it can be seen that the performance of the best networks for each dataset remained similar, except that it improved for the statistics only case.

In four cases (statistics only, differencing/summing, spectral and all data), the mean performance of the feature sets improved, while in the other two (all statistics and highpass/lowpass) it fell slightly. This may be due to an inadequacy of the fitness function, in that there is no penalty term for larger sizes of hidden layer. Incorporating this into the fitness function would allow better results in this respect.

11 Conclusions

The work that has been carried out so far appears very promising. The performance of networks trained using the feature selection was consistently higher than those trained without feature selection. The use of the genetic algorithm allows feature selection to be carried out in an automatic manner, and this makes it attractive in a development cycle, as training and optimisation can be carried out automatically. It has been shown in one set of experiments that the genetic algorithm is capable of selecting a subset of six inputs from a set of 156 features that allow the ANN to perform with 100% accuracy. Additionally, on a smaller feature-set consisting of 66 spectral inputs, the GA was able to select a subset of eight inputs, giving 99.8% accuracy. In this respect, reducing the number of inputs required for correct classification is a useful and desirable approach. The smaller the number of inputs and computation required to carry out classification successfully, the less complex the processing unit used for classification will have to be, with the consequent benefits in costs and size, making an ANN based system suitable for mounting on a chip or board in an integrated package.

12 Acknowledgments

Thanks must be expressed to Weir Pumps for the loan of the machine set used in the experiments. Financial support was provided by Weir Pumps, Solartron Instruments and the University of Liverpool.
