
Difference Histograms: A new tool for time series analysis applied to bearing fault diagnosis

Barend J. van Wyk a,*, Michal A. van Wyk b, Guoyuan Qi a

a French South African Technical Institute in Electronics (FSATIE) at the Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa
b School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
ARTICLE INFO

Article history:
Received 3 May 2007
Received in revised form 24 December 2008
Available online 9 January 2009
Communicated by R.C. Guido

Keywords:
Time series classification
Feature extraction
Bearing fault diagnosis
Pattern spectra
Vibration analysis
Difference Histograms

ABSTRACT

A powerful tool for bearing time series feature extraction and classification is introduced that is computationally inexpensive, easy to implement and suitable for real-time applications. In this paper the proposed technique is applied to two rolling element bearing time series classification problems, and it is shown that in some cases no data pre-processing, artificial neural network or nearest neighbour approaches are required. From the results obtained it is clear that for the specific applications considered, the proposed method performed as well as or better than alternative approaches based on conventional feature extraction.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction
The concept of a Difference Histogram, a new tool for time series feature extraction, is introduced in this paper and applied to two rolling element bearing time series classification problems. Since rolling element bearing failures are one of the foremost causes of failures in rotating machinery, condition monitoring is important for system maintenance and process automation. In many cases the simplest approach is to directly measure the vibration of the rotating machine using an accelerometer. The presence of noise and the wide variety of possible faults complicate diagnostic procedures. Very often fault diagnosis relies on expert experience, statistical analysis, or classical time and frequency domain analysis techniques.
During the past decade various signal processing and pattern recognition approaches were added to the arsenal of available diagnostic tools: Nikolaou and Antoniadis (2002) introduced an effective demodulation method based on the use of complex shifted Morlet wavelets, Chen and Mo (2004) used wavelet transform techniques in combination with a function approximation approach to extract fault features which were used with a neural network, Lou and Loparo (2004) introduced a scheme based on the wavelet transform and a neuro-fuzzy classification strategy, Altman and Mathew (2001) used discrete wavelet packet analysis to enhance the detection and diagnosis of low-speed rolling element bearing faults, Zhang et al. (2005) introduced an approach based on localised wavelet packet bases of vibration signals, Sun and Tang (2002) applied the wavelet transform to detect abrupt changes in vibration signals, and Prabhakar et al. (2002) also showed that the discrete wavelet transform can be used for improved detection of bearing faults.
Subrahmanyam and Sujatha (1997) demonstrated that a multi-layered feedforward network and an ART-2 network can be used for the automatic detection and diagnosis of localised ball bearing defects, Kowalski and Orlowska-Kowalska (2003) showed that Kohonen networks can be used as an introductory step before a neural detector for initial classification, Spoerre (1997) applied the cascade correlation algorithm to bearing fault classification problems, Gelle and Colas (2001) used blind source separation as a pre-processing step for rotating machinery fault detection and diagnosis, Zhang et al. (2005) used a genetic programming approach, and Samanta et al. (2003) used a support vector machine in conjunction with a genetic algorithm.
Results obtained using the proposed Difference Histograms for two rolling element bearing time series feature extraction and classification problems are compared to the work of Samanta and Al-Balushi (2003) and Kith et al. (2006), both based on conventional time domain feature extraction and supervised learning. The feature extraction methodology described in (Lou and Loparo, 2004) is also explored for purposes of comparison. The Difference Histogram algorithm is introduced in Section 2, the data sets, feature extraction and classification methodologies and results are described in Sections 3 and 4, and Section 5 concludes the paper.

Pattern Recognition Letters 30 (2009) 595–599. doi:10.1016/j.patrec.2008.12.012
* Corresponding author. Tel.: +27 12 382 4191; fax: +27 12 382 5294. E-mail address: vanwykb@gmail.com (B.J. van Wyk).
2. Difference Histograms

The idea of a Difference Histogram is summarised by the following three definitions:

Definition 1. A Difference Histogram, X, is defined as a scaled representation of the number of occurrences of the lengths of Segments of Increase in a block of N samples of a discrete time series, U(n).

Definition 2. A Segment of Increase is a group of consecutive samples in a discrete time series U(n) such that U(n + 1) − U(n) > U(n) − U(n − 1) − ε, where ε is a Tolerance Parameter.

Definition 3. The Tolerance Parameter, ε, is defined as a positive real number chosen to maximise some distance measure between the X_i, where X_i, i = 1, …, C, are the Difference Histograms obtained from time series belonging to C different classes.

As is evident from Algorithm 1, which operates directly on consecutive blocks of N samples of a discrete time series U, a Difference Histogram is extremely easy to implement and has a complexity of only O(N), which makes it ideal for real-time applications:
Algorithm 1
1:  initialise: ε, k ← 0, D′ ← 0, X ← 0
2:  for n = 2 : N
3:    D ← U(n) − U(n − 1)
4:    if D > D′ − ε
5:      increment k
6:    else
7:      increment X(k)
8:      k ← 0
9:    end if
10:   D′ ← D
11: end for
12: scale X
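Algorithm 1 translates almost line-for-line into code. The following is a minimal Python sketch (our own illustration, not the authors' implementation); the bin count `n_bins` and the function name are assumptions:

```python
# Minimal sketch of Algorithm 1 (our illustration; n_bins is an assumption).
def difference_histogram(phi, eps, n_bins=10):
    """Count the lengths of Segments of Increase in one block of N samples."""
    N = len(phi)
    hist = [0.0] * (n_bins + 1)  # hist[k] counts segments of length k
    k = 0                        # length of the current Segment of Increase
    d_prev = 0.0                 # difference D' from the previous iteration
    for n in range(1, N):
        d = phi[n] - phi[n - 1]
        if d > d_prev - eps:     # still inside a Segment of Increase
            k += 1
        else:                    # segment has ended: record its length
            if k <= n_bins:
                hist[k] += 1
            k = 0
        d_prev = d
    # Scale by N/100 so the histogram is independent of the block size N.
    return [h / (N / 100.0) for h in hist]
```

For a short sawtooth block such as `[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]` with `eps = 0.1`, every rising run of two differences ends in bin 2, so the scaled histogram concentrates all its mass there.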
Scaling is needed to keep X independent of the block size N. For our implementation we divided each histogram bin, X(k), by N/100, where 100 was chosen simply to keep the values of X(k) in a convenient range. Since in many cases only selected histogram bins indexed by k are calculated instead of the full Difference Histogram, conventional normalisation is not recommended. It should be noted that the algorithm processes blocks of data where the size of each block is defined by N, and therefore even if X(k) is increased at every one of the N − 1 iterations of the loop, the maximum value X(k) can have is N − 1. Singularity will therefore only be an issue if N tends to infinity. A Difference Histogram can obviously also be defined using Segments of Decrease: a Segment of Decrease is defined as a group of consecutive samples in a discrete time series U, indexed by n, such that U(n + 1) − U(n) < U(n) − U(n − 1) + ε. It should be observed that ε creates a tolerance region around D′ (where D′ is the value of D at iteration n − 1). How to select ε for a specific application is illustrated in Section 3.
In Sections 3 and 4 the simplicity and power of the Difference Histogram for time series classification are illustrated using two well-known datasets. In this paper we only consider the two-class case, for which the procedure is summarised by Algorithm 2:
Algorithm 2
1: Use training data and Algorithm 1 to compute ‖X_1 − X_2‖ for increasing values of ε, with N set sufficiently large or equal to the total number of training samples available.
2: Determine the optimal value for ε, i.e. the value that maximises the separation between X_1 and X_2.
3: Given an optimal ε, calculate X using training data to repeatedly train a classifier such as a neural network (described in Section 4) or some distance-based method (described in Section 3) for increasing values of N, starting with N sufficiently small.
4: Choose the smallest N giving acceptable training results.
5: Perform classification using the chosen ε and N.
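Steps 1 and 2 of Algorithm 2 amount to a one-dimensional sweep over ε. The sketch below (our own illustration, with synthetic stand-in series and an assumed grid of candidate ε values) repeats the Difference Histogram computation for each candidate and keeps the ε that maximises the Euclidean separation between the two class histograms:

```python
# Hedged sketch of steps 1-2 of Algorithm 2 (our illustration, not the
# authors' code). The two input series stand in for the per-class training data.
import math

def difference_histogram(phi, eps, n_bins=10):
    """Scaled Difference Histogram of one block (see Algorithm 1)."""
    N, hist, k, d_prev = len(phi), [0.0] * (n_bins + 1), 0, 0.0
    for n in range(1, N):
        d = phi[n] - phi[n - 1]
        if d > d_prev - eps:
            k += 1
        else:
            if k <= n_bins:
                hist[k] += 1
            k = 0
        d_prev = d
    return [h / (N / 100.0) for h in hist]

def optimal_tolerance(series_1, series_2, eps_grid):
    """Return (eps, separation) maximising ||X1 - X2|| over the grid."""
    best_eps, best_sep = None, -1.0
    for eps in eps_grid:
        x1 = difference_histogram(series_1, eps)
        x2 = difference_histogram(series_2, eps)
        sep = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
        if sep > best_sep:
            best_eps, best_sep = eps, sep
    return best_eps, best_sep
```

With real data, `series_1` and `series_2` would be the normal-condition and fault-condition training recordings, and the grid would span a range like the 0–0.05 interval examined for the Landustrie sensors.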
In the multi-class case an approach can be adopted reminiscent of the linear machine approach described in (Duda et al., 2000): for C different classes this means that C two-class classifiers are trained, with the kth classifier separating X_k from the Difference Histograms of the remaining classes. Finally, the multi-class classifier yields the class associated with the two-class classifier whose output is the maximum.
3. Landustrie dataset

The data used in this case study are measurements from accelerometers on a submersible pump driven by an electric motor, acquired in the Delft Machine Diagnostics by Neural Networks project with the assistance of Landustrie B.V., The Netherlands, and can be freely downloaded from http://www.aypma.nl/PhD/pump_sets.html. Separate measurements were obtained for a normal bearing and a bearing with an outer race defect at the upper end. The sensors were placed at five different positions and sampled at 51.2 kHz while the pump was rotating at 1123 rpm. A total of 20,480 samples were recorded for each sensor, under normal conditions and when the bearing had an outer race defect. For each sensor the first 10,240 samples were used for training and the
remaining samples for testing. Fig. 1 shows the scaled histograms obtained using the training time series from sensor 4 with ε = 0.018. The difference between a normal and a faulty bearing is clearly visible. In general, if the time series data (whether from a defective device or not) is stationary, then the variance of the histogram, as a feature, will decrease asymptotically as the block size N (the number of samples used to calculate it) increases and, in the limit, will converge. However, if the data is non-stationary then the histogram, as a feature, may or may not converge as the block size N tends to infinity.

[Fig. 1. Difference Histograms extracted from the training sets for normal and faulty bearings for sensor 4. Axes: Bin Number (1–10) vs. Scaled Magnitude; legend: Normal Bearing, Faulty Bearing.]
Steps 1 and 2 of Algorithm 2: for each sensor, the best choice for ε, the Tolerance Parameter, can be determined by calculating X_1, the Difference Histogram obtained using the 10,240 training samples recorded under normal conditions, and X_2, the Difference Histogram obtained using the 10,240 training samples recorded under bearing fault conditions, and computing ‖X_1 − X_2‖² for increasing values of ε. Fig. 2 illustrates the result and shows that the optimal Tolerance Parameter for sensor 4 is 0.018 (corresponding to the maximum histogram separation value in Fig. 2). Table 1 was obtained by repeating the process for all five sensors.
Step 3 of Algorithm 2: once the optimal Tolerance Parameters for each sensor have been determined, the associated Difference Histograms (or selected bins from suitable sensors) can be used as the input to a classifier such as a neural network for training and classification. However, for the two-class application considered in this section, it was found that using the l_1 norm as a distance measure, i.e. using only the most discriminative histogram bin, together with a hard threshold, proved more than sufficient. From Table 1 it is clear that recordings from sensor 4 are the most suitable for classification since this sensor has the largest histogram separation, followed by sensor 1 as a second choice. The next step is to find the most discriminative histogram bin associated with sensor 4. By calculating the differences between corresponding bins of the Difference Histograms X_1 and X_2, derived from the training data for the normal and faulty bearings, it is possible to determine the most discriminative bins, which are listed in Table 2 (histogram bins 1, 2 and 3, cf. Fig. 1). For the most discriminative bin k, a Bin Threshold given by (X_1(k) + X_2(k))/2 can be calculated. Classifying a signal as belonging to a normal or faulty bearing then boils down to comparing the value of the most discriminative bin of the Difference Histogram obtained from the testing set to the Bin Threshold of that bin. As shown in Table 2, for sensor 4 the most discriminative bin is k = 1, with an associated Bin Threshold of 13.07. As shown in Fig. 1, the value of bin 1 for the faulty bearing class exceeds the value of bin 1 for the normal bearing class. Therefore, if X(1) exceeds 13.07 the signal is classified as belonging to the faulty bearing class; otherwise it is classified as belonging to the normal bearing class.
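The resulting decision rule is a one-line comparison. A sketch using the sensor-4 values from Table 2 (our own illustration; the function name is an assumption):

```python
# Hedged sketch of the single-bin hard-threshold rule (our illustration).
# bin_value is X(1), the most discriminative bin of the Difference Histogram
# computed from a test block; 13.07 is the sensor-4 Bin Threshold of Table 2.
def classify_block(bin_value, threshold=13.07):
    # For sensor 4 the faulty-bearing class lies above the threshold (Fig. 1).
    return "faulty" if bin_value > threshold else "normal"
```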
Step 4 of Algorithm 2: the optimal block size, N, must now be determined. The results of the Difference Histogram classification experiment using training data and the most discriminative histogram bin associated with sensor 4 are shown in Fig. 3. For comparison the results for sensors 1 and 5 are also shown in Figs. 4 and 5. These figures show the percentage of correct classifications, from a total of ⌊10,240/N⌋ classifications, using the test time series for the normal and faulty bearing classes, respectively, where the block size N is the size of the batch of samples processed before a classification decision is made. Fig. 3 shows that for sensor 4 a block size of N > 400 is sufficient for a 100% correct classification rate. Fig. 5 shows that sensor 5 is not suitable. Sensor 1 gave similar results to sensor 4 for N = 600 and, although not optimal, sensors 2 and 3 can also be used provided that N is chosen large enough.

Step 5 of Algorithm 2: using the test data from sensor 4 and a block size of N = 600 gave a 100% correct classification rate. Alternatively, using the test data from sensor 1 and a block size of N = 600 also gave a 100% correct classification rate.
For this dataset the Difference Histogram method was compared to nearest neighbour approaches using the same features proposed by Samanta and Al-Balushi (2003), who experimented with the same dataset: the root mean square, rms = √(Σₙ U²(n)/N), the variance, σ² = E{U²(n)}, the normalised third central moment, γ₃ = E{U³(n)}/σ³, the normalised fourth central moment, γ₄ = E{U⁴(n)}/σ⁴, and the normalised sixth central moment, γ₆ = E{U⁶(n)}/σ⁶, where U(n) ← U(n) − μ and μ = E{U(n)}. As in (Samanta and Al-Balushi, 2003) the testing and training time series were each divided into 20 non-overlapping blocks of 1024 samples. Each block was processed to extract these five features.
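These five time-domain features can be sketched in a few lines of plain Python (our own illustration, not the authors' code). One ambiguity is hedged here: the central moments are computed after the mean removal U(n) ← U(n) − μ, while the rms is assumed to be taken on the raw block:

```python
# Sketch of the five time-domain features of Samanta and Al-Balushi (2003)
# as used in this comparison (our illustration; the rms-on-raw-signal choice
# is an assumption).
import math

def time_domain_features(u):
    """Return (rms, variance, gamma3, gamma4, gamma6) for one block."""
    N = len(u)
    mu = sum(u) / N                              # mean, mu = E{U(n)}
    x = [s - mu for s in u]                      # mean-removed signal
    rms = math.sqrt(sum(s * s for s in u) / N)   # root mean square (raw block)
    var = sum(s * s for s in x) / N              # sigma^2 = E{U^2(n)}
    sigma = math.sqrt(var)
    moment = lambda p: sum(s ** p for s in x) / N
    gamma3 = moment(3) / sigma ** 3              # normalised 3rd central moment
    gamma4 = moment(4) / sigma ** 4              # normalised 4th central moment
    gamma6 = moment(6) / sigma ** 6              # normalised 6th central moment
    return rms, var, gamma3, gamma4, gamma6
```

For a zero-mean square wave such as `[1, -1, 1, -1]` every normalised even moment equals 1 and the odd moment vanishes, which is a quick sanity check on the normalisation.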
[Fig. 2. Influence of the Tolerance Parameter on the separation of the histograms for data from sensor 4. Axes: Tolerance Parameter (0–0.05) vs. Histogram Separation.]
Table 1
Determining the optimal Tolerance Parameter.

Sensor                          1      2      3      4      5
Optimal Tolerance Parameter     0.011  0.011  0.007  0.018  0.000
Maximum histogram separation    8.42   2.97   2.35   10.83  2.85
Table 2
Determining the optimal histogram bins for sensor 4.

Histogram bin    1      2     3
Bin difference   10.62  1.63  0.76
Bin threshold    13.07  3.06  3.54
[Fig. 3. Classification results for sensor 4: percentage of correct block classifications vs. block size (0–1000) for the normal and faulty bearing test data.]
Since both a normal bearing and a faulty bearing recording are available for each sensor, there were 40 feature vectors available per sensor (i.e. 20 feature vectors for normal bearing data and 20 feature vectors for faulty bearing data). As in (Samanta and Al-Balushi, 2003), we divided these feature vectors, for each sensor, into two groups: one for training (consisting of the first 12 feature vectors for normal bearing data and the first 12 feature vectors for faulty bearing data) and one for testing (consisting of the 16 remaining feature vectors).
All five features were used to represent the sensor signals. All sensor signals were tested both individually and in groups. The objective of this experiment was to demonstrate the diagnostic capability of the Nearest Neighbour (NN) and the Variable-kernel Similarity Metric (VSM) learning approaches for different sensor signals. The diagnostic capability of the NN compared against the VSM for training and testing is reported in Tables 3 and 4. The VSM approach, introduced by Lowe (1995), attaches more importance to closer neighbours by determining the weight assigned to each neighbour through learning the optimal parameters of a kernel function.
Table 3 (for training) and Table 4 (for testing) report the results of the experiment. These results show that the VSM performed better than the NN, as expected. From these two tables we observe that whenever sensor 1 or sensor 5 was used as input (individually or grouped), the success rate for training and testing is worse than when using other input signals. The effects of using different signal feature combinations for training and testing were also investigated, but no significant improvement in performance was observed. The reader may consult Kith et al. (2006) for more detail. The results in Tables 3 and 4 are similar to those obtained by Samanta and Al-Balushi (2003) using their artificial neural network approach without pre-filtering the sensor signals. The reader is referred to their work for more information on the structure and details of the feedforward neural network used. Samanta and Al-Balushi (2003) also studied the effects of various pre-processing techniques such as band-pass and high-pass filtering, envelope detection and wavelet transform processing, achieving 100% training and test success in some cases where more than one feature or more than one sensor signal was used for training and testing. It is therefore significant to note that a 100% success rate can be achieved using only a single Difference Histogram bin from an individual sensor, without using a nearest neighbour or artificial neural network approach.
4. Case Western dataset

The dataset used in this section was acquired by the Case Western Reserve University (CWRU) Bearing Data Center with help from Rockwell Science, CVX, and the Office of Naval Research, and can be freely downloaded from http://www.eecs.case.edu/laboratory/bearing. The test setup consisted of a motor whose shaft was supported by bearings, a torque transducer and a dynamometer. Single point faults of sizes 7 mils, 14 mils and 21 mils were introduced to the outer raceway, ball and inner raceway of the front end and drive end bearings respectively. For each fault
[Fig. 4. Classification results for sensor 1: percentage of correct block classifications vs. block size (0–1000) for the normal and faulty bearing test data.]
[Fig. 5. Classification results for sensor 5: percentage of correct block classifications vs. block size (0–2000) for the normal and faulty bearing test data.]
Table 3
Effects of input signals from different sensors on identification of machine condition with five features (rms, σ², γ₃, γ₄, γ₆). Results for training.

Sensor(s)     NN (%)   VSM (%)
1             29       70
2             95       100
3             91       100
4             100      100
5             29       62
2,3           97       100
2,3,4         97       100
1,2,3,4       83       89
1,2,3,4,5     75       81
Table 4
Effects of input signals from different sensors on identification of machine condition with five features (rms, σ², γ₃, γ₄, γ₆). Results for testing.

Sensor(s)     NN (%)   VSM (%)
1             43       62
2             87       100
3             87       100
4             100      100
5             43       50
2,3           93       93
2,3,4         91       95
1,2,3,4       81       89
1,2,3,4,5     81       78
introduced, the motor load was varied from 0 hp to 3 hp in 1 hp increments. As for the data used in Section 3, discriminating between a normal and a faulty bearing using data from an accelerometer mounted on the base plate was in this case found to be trivial using the Difference Histogram method and a single histogram bin. What was found more challenging was trying to discriminate between outer raceway, ball and inner raceway faults using only drive end bearing fault data from a single accelerometer sampled at 48 kHz. All the available time series for each motor load and fault size were used for the inner raceway and ball faults, resulting in 12 time series for testing and training for each of these two fault types. More than 12 time series are available for outer raceway faults, but for consistency only the 12 corresponding to faults located orthogonal to the load zone were used, giving a total of 36 time series available for testing and training. The first half of each time series was divided into four blocks and used for training, and the remaining half of each time series was also divided into four blocks and used for testing. In total we therefore had 144 data blocks available for training and 144 data blocks available for testing. Due to the varying length of each time series, and to aid comparison with a Discrete Wavelet Transform (DWT) approach, each data block was limited to 30,000 samples for consistency (i.e. N in both steps 1 and 4 of Algorithm 2 was fixed at 30,000).

For each of the 144 data blocks only the first 6 histogram bins were calculated. The Tolerance Parameter, calculated using the methodology described in Section 3, was found to be 0.02. With these six histogram bins as inputs, a feedforward neural network with six input layer neurons, six hidden layer neurons and three output layer neurons (i.e. one for each fault class) was trained using the Levenberg-Marquardt algorithm. The results obtained are reported in Table 5. Increasing the number of hidden layer neurons did not result in increased performance during testing. Decreasing the number of hidden layer neurons resulted in decreased performance both during training and testing.
The Difference Histogram method was compared to a wavelet feature extraction method similar to that described in (Lou and Loparo, 2004). For each of the 144 data blocks a Discrete Wavelet Transform (DWT) was performed. The Daubechies-2 wavelet was used in the decomposition to obtain a five level DWT. For each of these DWTs the variance of each of the six wavelet bins was calculated and used as inputs to the same neural network configuration as used for the Difference Histogram. The reason for not altering the neural network configuration was to enable a fair comparison of the two feature extraction methods. Refer to Table 5 for a comparison of the results. It is observed that the results obtained using the same neural network architecture are very similar.
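The wavelet feature extraction can be sketched as follows: a periodised Daubechies-2 (db2) decomposition iterated five times, taking the variance of the final approximation band plus the five detail bands to give six features. This is our own pure-Python sketch in the spirit of (Lou and Loparo, 2004), not the authors' implementation; in practice a wavelet library would normally be used instead of the hand-rolled filter bank below:

```python
# Hedged sketch of the six wavelet-variance features (our illustration).
import math

SQRT3, DENOM = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
# Daubechies-2 low-pass filter taps and the matching high-pass taps.
LO = [(1 + SQRT3) / DENOM, (3 + SQRT3) / DENOM,
      (3 - SQRT3) / DENOM, (1 - SQRT3) / DENOM]
HI = [LO[3], -LO[2], LO[1], -LO[0]]

def dwt_step(x):
    """One level of a periodised db2 DWT: (approximation, detail)."""
    n = len(x)
    approx, detail = [], []
    for i in range(0, n, 2):   # convolve and downsample by 2
        approx.append(sum(LO[k] * x[(i + k) % n] for k in range(4)))
        detail.append(sum(HI[k] * x[(i + k) % n] for k in range(4)))
    return approx, detail

def wavelet_variance_features(x, levels=5):
    """Variance of [cA_L, cD_L, ..., cD_1]: levels + 1 features."""
    cur, details = list(x), []
    for _ in range(levels):
        cur, d = dwt_step(cur)
        details.append(d)
    def var(v):
        mu = sum(v) / len(v)
        return sum((s - mu) ** 2 for s in v) / len(v)
    return [var(b) for b in [cur] + details[::-1]]
```

The block length should be divisible by 2^levels (true for the 30,000-sample blocks used here at five levels) so that every downsampling step sees an even-length input.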
5. Conclusion

A powerful tool for bearing time series feature extraction and classification was introduced which is computationally inexpensive, easy to implement and suitable for real-time applications. Its power was demonstrated using two rolling element bearing time series classification problems without using data pre-processing. From the results obtained it is clear that for the specific applications considered, the proposed method performed as well as or better than alternative approaches. Work in progress includes extending the concept to image analysis applications.
References

Altman, J., Mathew, J., 2001. Multiple band-pass autoregressive demodulation for rolling-element bearing fault diagnosis. Mech. Systems Signal Process. 15 (5), 963–977.
Chen, C., Mo, C., 2004. A method for intelligent fault diagnosis of rotating machinery. Digital Signal Process. 14, 203–217.
Duda, R.O., Hart, P.E., Stork, D.G., 2000. Pattern Classification, second ed. Wiley-Interscience.
Gelle, G., Colas, M., 2001. Blind source separation: A tool for rotating machine monitoring by vibration analysis. J. Sound Vib. 248 (5), 865–885.
Kith, K., Van Wyk, B.J., Van Wyk, M.A., 2006. A variable kernel classifier using ALOPEX optimization and its application to bearing fault diagnosis. In: Proc. IASTED Internat. Conf. on Modeling and Simulation, Gabarone, Botswana, pp. 56–61.
Kowalski, C.T., Orlowska-Kowalska, T., 2003. Neural networks application for induction motor fault diagnosis. Math. Comput. Simul. 63, 435–448.
Lou, X., Loparo, K.A., 2004. Bearing fault diagnosis based on wavelet transform and fuzzy inference. Mech. Systems Signal Process. 18, 1077–1095.
Lowe, D.G., 1995. Similarity metric learning for a variable-kernel classifier. Neural Comput. 7 (1), 72–85.
Nikolaou, N.G., Antoniadis, I.A., 2002. Demodulation of vibration signals generated by defects in rolling element bearings using complex shifted Morlet wavelets. Mech. Systems Signal Process. 16 (4), 677–694.
Prabhakar, S., Mohanty, A.R., Sekhar, A.S., 2002. Application of discrete wavelet transform for detection of ball bearing race faults. Tribol. Int. 35, 793–800.
Samanta, B., Al-Balushi, K.R., 2003. Artificial neural network based fault diagnostics of rolling element bearings using time-domain features. Mech. Systems Signal Process. 17 (2), 238–317.
Samanta, B., Al-Balushi, K.R., Al-Araimi, S.A., 2003. Artificial neural networks and support vector machines with genetic algorithm for bearing fault detection. Eng. Appl. Artif. Intell. 16, 657–665.
Spoerre, J.K., 1997. Application of the cascade correlation algorithm (CCA) to bearing fault classification problems. Comput. Ind. 32, 295–304.
Subrahmanyam, M., Sujatha, C., 1997. Using neural networks for the diagnosis of localized defects in ball bearings. Tribol. Int. 30 (10), 739–752.
Sun, Q., Tang, Y., 2002. Singularity analysis using continuous wavelet transform for bearing fault diagnosis. Mech. Systems Signal Process. 16 (6), 1025–1041.
Zhang, L., Jack, L.B., Nandi, A.K., 2005. Fault detection using genetic programming. Mech. Systems Signal Process. 19, 271–289.
Zhang, S., Mathew, J., Ma, L., Sun, Y., 2005. Best basis-based intelligent machine fault diagnostics. Mech. Systems Signal Process. 19, 357–370.
Table 5
Results for training and testing using Difference Histogram and wavelet features.

Input features          Training success (%)   Test success (%)
Difference Histogram    95                     92
Wavelet                 96                     91