Diagnosis of artificially created surface damage levels of planet gear teeth using ordinal ranking

Xiaomin Zhao¹, Ming J. Zuo¹*, Zhiliang Liu¹,² and Mohammad R. Hoseini¹

¹ Department of Mechanical Engineering, University of Alberta, Edmonton, Canada (ming.zuo@ualberta.ca)
² School of Automation Engineering, University of Electronic Science and Technology, Chengdu, China
Effective diagnosis of damage levels is important for condition-based preventive maintenance of gearboxes. One special characteristic of damage levels is the inherent ordinal information among different levels. Retaining the ordinal information is therefore important for diagnosing damage levels. Classification, a machine learning technique, has been widely adopted for automated diagnosis of gear faults. However, classification cannot keep the ordinal information because the damage levels are treated as nominal variables. This paper employs ordinal ranking, another machine learning technique, to preserve the ordinal information in the automated diagnosis of damage levels. Feature selection is important for ordinal ranking; however, most existing feature selection methods were developed for classification and are not suitable for ordinal ranking. This paper designs a feature selection method for ordinal ranking based on correlation coefficients. A diagnosis approach based on ordinal ranking and the proposed feature selection method is then introduced. The approach is tested on the diagnosis of artificially created surface damage levels of planet gear teeth in a planetary gearbox. Experimental results show the effectiveness of the proposed diagnosis approach. The advantages of using ordinal ranking for diagnosing gear damage levels are also demonstrated.

Keywords: ordinal ranking; damage level; planetary gearbox; correlation coefficient


1. Introduction
Gearboxes are widely employed in automotive, aerospace, and various other industries to
provide speed and torque conversions. Effective diagnosis of gear faults is critical to the
reliable operation of a mechanical system. The majority of the reported investigations on gear fault diagnosis focus on detecting the presence of a fault [1] and identifying the fault mode, such as pitting [1], crack [2] and tooth breakage [2]. Fault propagation, however, has not been studied as much.
Information on fault propagation reveals the severity of a fault, so it is very helpful in
scheduling preventive maintenance or other actions to avoid a failure. Feng et al. [3]
reported that the regularization dimension of gearbox vibration signals increased
monotonically with the gear fault severity. Ozturk et al. [4] stated that the scalogram,
especially its mean frequency variation, provided early indications of the presence and
progression of gear pitting. These methods find an indicator that monotonically varies with
fault levels. The fault level can then be estimated by checking the value of this indicator.
However, such methods need expertise from a diagnostician to apply them successfully and
cannot distinguish fault levels automatically. To alleviate this problem, researchers have
used classifiers to automatically classify different fault levels [5, 6]. For example, Lei et al.
[6] proposed a weighted KNN classification method for gear crack level identification. In
these methods, the fault level is regarded as a nominal variable, and the problem of
diagnosing fault levels is treated as a classification problem. As a result, the ordinal
information among fault levels is ignored [7]. For example, a moderate fault is worse than a slight fault but better than a severe fault; however, in classification, the severe, moderate and slight faults are parallel to each other and cannot be compared using the '>' and '<' operations. Having ordinal information is the main characteristic of fault levels, which makes the diagnosis of fault levels more complicated than the diagnosis of fault modes.
How to utilize and retain the ordinal information is therefore an important issue.
A planetary gearbox is a type of gearbox whose planet gears are mounted on a carrier which itself may rotate relative to the sun gear. It has many advantages over the traditional (fixed-shaft) gearbox, e.g. high power output, small volume, and multiple kinematic combinations [8]. Nevertheless, planetary gearboxes are structurally more complicated and exhibit several unique behaviours that are not found in fixed-shaft gearboxes. For instance, the gear meshing frequencies of planetary gearboxes are often completely suppressed, and the sidebands are not as symmetric as those of fixed-shaft gearboxes [8]. Barszcz and Randall [9] found that conventional analysis methods did not detect a tooth crack in a planet gear. They then proposed a diagnosis method based on spectral kurtosis and achieved good results. Bartelmus and Zimroz [10] found that a planetary gearbox with a fault was more susceptible to load than a healthy gearbox. Based on this observation, they proposed a new feature for monitoring a planetary gearbox in non-stationary operating conditions in [11, 12].
The objective of this paper is to develop an intelligent method for diagnosing damage levels of planet gears in a planetary gearbox. In order to preserve the ordinal information in damage levels, ordinal ranking will be employed. Ordinal ranking [13, 14] (also called ordinal classification [15] and ordinal regression [16]) is a supervised machine learning technique that uses an ordinal variable as its label. Details on ordinal ranking will be introduced in Section 2. Feature selection is a necessary procedure for ordinal ranking, particularly because it can enhance accuracy and improve the efficiency of training [17]. Most existing feature selection methods were actually developed for classification problems, which makes them less effective in ordinal ranking problems. For example, Mukras et al. [18] found that the standard feature selection method using information gain failed to identify discriminatory features in an ordinal ranking problem. The reported work on feature selection dedicated to ordinal ranking is very limited. Baccianella et al. [19] proposed feature selection methods based on information gain for ranking in text-related applications. Geng et al. [17] proposed a feature selection method based on mean average precision (MAP) and Kendall's tau correlation coefficient for ranking in information retrieval. In this paper, a feature selection method based on correlation coefficients will be designed.
The rest of the paper is organized as follows. Section 2 introduces the concept of ordinal
ranking and describes a reported ordinal ranking algorithm. Section 3 proposes a feature
selection method for ordinal ranking. Section 4 describes the planetary gearbox test rig for
experimental data collection. Section 5 explains the feature calculation and extraction for
the planetary gearbox. Section 6 presents an approach for diagnosis of damage levels based
on ordinal ranking. Section 7 tests the diagnosis approach with experimental data and
discusses the results. Finally, conclusions are drawn in Section 8.
2. Ordinal ranking
2.1. Review on ordinal ranking
Stevens [20] divided the scales of measurement into four types: nominal, ordinal, interval, and ratio. The nominal scale is for variables that have two or more categories but do not have an intrinsic order, e.g. types of fruits. When only two categories are involved, the variable is called binary, e.g. gender. The ordinal scale is rank-ordered but does not carry metric information, e.g. grades of students (A+, A, A-, B+, ..., F). The interval scale and the ratio scale are for continuous variables which have metric information. In supervised machine learning, a training set in the form of T = [X, y] is given, where X is the input set (the feature set) and y is the output set (the label). According to the label's measurement scale, supervised machine learning problems can be grouped into three categories: if the label is a nominal variable, the problem is called classification; if the label is a continuous variable, the problem is called regression; and if the label is an ordinal variable, the problem is called ordinal ranking [13]. In this paper, the labels of ordinal ranking are called ranks.
Ordinal ranking is similar to classification in the sense that the rank y takes values from a finite set. Nevertheless, besides representing categories as classification labels do, the ranks of ordinal ranking also carry ordinal information. That is, two ranks in y can be compared by the '<' (better) or '>' (worse) operation. Ordinal ranking is also similar to regression, in the sense that ordinal information is similarly contained in y. However, unlike the real-valued regression labels, the discrete ranks do not carry metric information. That is, we can say rank A is better than rank B, but it is hard to say quantitatively how much better rank A is.
A commonly used idea to conduct ordinal ranking is to transform the ranking problem into a set of binary classification problems, or to add additional constraints to traditional classification formulations. Herbrich et al. [16] proposed a loss function between pairs of ranks, and then applied the principle of structural risk minimization to solve the ordinal ranking problem. However, because there are O(N²) pairwise comparisons among N training samples, the computational complexity is high when N is large. Crammer and Singer [21] generalized the online perceptron algorithm with multiple thresholds to predict r ranks. The feature space was divided into r parallel, equally-ranked regions, where each region stood for one rank. With this approach, the loss function was calculated pointwise and the quadratic expansion problem was avoided. Following the same idea, Shashua and Levin [14] generalized the support vector machine (SVM) to ordinal ranking by finding r-1 thresholds that divide the real line into r consecutive intervals for the r ranks. Chu and Keerthi [22] improved the approach in [14] and proposed two new approaches by imposing the ordinal inequality constraints on the thresholds explicitly in the first approach and implicitly in the second one. Among the above methods, SVM-based methods have shown great promise in ordinal ranking. The algorithm proposed in [22] (called support vector ordinal regression, SVOR), which implicitly adds constraints to SVM, is straightforward and easy to interpret, and will therefore be adopted in this paper.
2.2. A reported ordinal ranking algorithm [22]

The concept of the ordinal ranking algorithm (SVOR) [22] is briefed as follows. First, the original feature space (X) is mapped into a high-dimensional feature space through $\phi(\mathbf{x})$. In this feature space, an optimal projection direction ($\mathbf{w}$) and r-1 thresholds ($\mathbf{b}$), which define r-1 parallel discriminant hyperplanes for the r ranks correspondingly, are found, as illustrated in Fig. 1. A sample $\mathbf{x}$ satisfying $b_{i-1} < \mathbf{w} \cdot \phi(\mathbf{x}) < b_i$ is assigned rank i. The ranking model is thus

$$ z\,(\mathrm{rank}) = i, \quad \text{if } b_{i-1} < \mathbf{w} \cdot \phi(\mathbf{x}) < b_i . \tag{1} $$



[Fig. 1: the projection $f(\mathbf{x}) = \mathbf{w} \cdot \phi(\mathbf{x})$ is divided by the thresholds $b_1 < b_2 < \dots < b_{r-1}$ into consecutive intervals corresponding to rank 1, rank 2, ..., rank i, ..., rank r.]

Fig. 1 Schematic drawing of the ordinal ranking algorithm (SVOR)

For a threshold $b_j$, the function values of the samples from all the lower ranks should be less than the lower margin $b_j - 1$. If a sample $\mathbf{x}_i^k$ violates this requirement, then

$$ \xi_{ki}^{j} = \mathbf{w} \cdot \phi(\mathbf{x}_i^k) - (b_j - 1) \tag{2} $$

is taken as the error associated with the sample $\mathbf{x}_i^k$ for $b_j$, where k is the rank of $\mathbf{x}_i^k$ and $k \le j$. Similarly, the function values of the samples from the upper ranks should be greater than the upper margin $b_j + 1$. Otherwise,

$$ \xi_{ki}^{j*} = (b_j + 1) - \mathbf{w} \cdot \phi(\mathbf{x}_i^k) \tag{3} $$

is the error associated with the sample $\mathbf{x}_i^k$ for $b_j$, where $k > j$. Considering the error terms associated with all r-1 thresholds, the primal problem to find the optimal $\mathbf{w}$ and thresholds $\mathbf{b}$ is defined as follows:

$$
\begin{aligned}
\min_{\mathbf{w},\,\mathbf{b},\,\xi,\,\xi^{*}} \quad & \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C \sum_{j=1}^{r-1} \left( \sum_{k=1}^{j} \sum_{i=1}^{n_k} \xi_{ki}^{j} + \sum_{k=j+1}^{r} \sum_{i=1}^{n_k} \xi_{ki}^{j*} \right) \\
\text{s.t.} \quad & \mathbf{w} \cdot \phi(\mathbf{x}_i^k) - b_j \le -1 + \xi_{ki}^{j}, \quad \xi_{ki}^{j} \ge 0, \quad \text{for } k = 1, 2, \dots, j;\; i = 1, 2, \dots, n_k \\
& \mathbf{w} \cdot \phi(\mathbf{x}_i^k) - b_j \ge +1 - \xi_{ki}^{j*}, \quad \xi_{ki}^{j*} \ge 0, \quad \text{for } k = j+1, \dots, r;\; i = 1, 2, \dots, n_k
\end{aligned}
\tag{4}
$$

where j runs over 1, 2, ..., r-1 and $n_k$ is the number of samples with rank k. By solving this optimization problem, the optimal $\mathbf{w}$ and $\mathbf{b}$ are found, and thus the ranking model (Eq. (1)) is built.
3. Feature selection method for ordinal ranking
In fault diagnosis, the number of features is usually large because many features can be extracted to capture fault information using signal processing techniques. A large number of features may reduce the performance of ordinal ranking [17]. Thus feature selection is needed. Feature selection can be achieved by finding a feature subset that has maximum relevance to the rank (label) and minimum redundancy among the features themselves [17, 23, 24]. Measures are needed to evaluate the relevance between a feature and the label (i.e. feature-label relevance) and the redundancy among features (i.e. feature-feature redundancy), respectively. However, most existing measures were proposed for classification. The measures of feature-feature redundancy can still work in an ordinal ranking problem if the types of features are the same as in a classification problem. Nevertheless, the measures of feature-label relevance used in a classification problem are not suitable for an ordinal ranking problem [17, 18], because the label of ordinal ranking is an ordinal variable whereas the label of classification is a nominal variable, as stated in Section 2.1. In this paper, correlation coefficients are used for these measures, because they are conceptually simple and practically effective.

3.1. Correlation coefficients

Correlation coefficients evaluate the relevance between two variables. Depending upon the types of the variables, several correlation coefficients are defined, some of which are listed in Table 1 [25, 26].

Table 1 List of correlation coefficients [25, 26]

Types of variables  | Nominal (binary) | Ordinal                                 | Continuous
Nominal (binary)    | Phi              | Rank-biserial                           | Point-biserial
Ordinal             | Rank-biserial    | Polychoric, Spearman rank, Kendall rank | Polyserial
Continuous          | Point-biserial   | Polyserial                              | Pearson

In fault diagnosis of machinery, features are usually continuous variables (e.g. kurtosis) [6,
27] and/or nominal variables (e.g. the status of a valve (on/off)) [27]. In a classification
problem (the label is nominal), the feature-label relevance can be evaluated using the Point-
biserial correlation coefficient (if the feature is continuous) and the Phi correlation
coefficient (if the feature is nominal). The feature-feature redundancy can be evaluated
using the Point-biserial correlation coefficient (if one feature is continuous and the other is
nominal) and the Pearson correlation coefficient (if both features are continuous). The
Point-biserial and the Phi correlation coefficients are mathematically equal to the Pearson
correlation coefficient. That is why the Pearson correlation coefficient is said to be widely
used in classification problems [23, 28].
In an ordinal ranking problem, the feature-label relevance can be evaluated by the Polyserial correlation coefficient (if the feature is continuous), the Rank-biserial correlation coefficient (if the feature is nominal), and the Polychoric, Spearman rank or Kendall rank correlation coefficient (if the feature is ordinal). The feature-feature redundancy can be evaluated using the Pearson correlation coefficient (if both features are continuous), the Point-biserial correlation coefficient (if one feature is continuous and the other is nominal), and the Rank-biserial correlation coefficient (if one feature is ordinal and the other is nominal).
In the planetary gearbox experiments, the calculated features (to be described in Section 5) are continuous variables. So the Polyserial correlation coefficient and the Pearson correlation coefficient are used in this paper as the measures of feature-label relevance and feature-feature redundancy, respectively. The mathematical definitions of the Pearson and the Polyserial correlation coefficients are given in the following.
The Pearson correlation coefficient, $\rho_{xy}$, is defined in Eq. (5), where x and y are two continuous variables.

$$ \rho_{xy} = \frac{\operatorname{cov}(\mathbf{x}, \mathbf{y})}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{5} $$
In some cases, the continuous variable (y) can only be measured in rough categories like low, medium, high, etc., using an ordinal variable z (shown in Eq. (6)). Here $z_1, z_2, \dots, z_r$ ($r \ge 2$) are known increasing ranks (i.e. $z_i < z_{i+1}$) and $\mathbf{b} = (b_0, b_1, \dots, b_r)$ is a vector of known thresholds with $b_0 = -\infty$ and $b_r = +\infty$.

$$ z = z_j, \quad \text{if } b_{j-1} < y < b_j, \quad j = 1, 2, \dots, r \tag{6} $$


The Polyserial correlation coefficient between the observed data (x and z) is actually an estimate of the Pearson correlation coefficient between x and y. Without loss of generality, we assume $x \sim N(\mu_x, \sigma_x)$, $y \sim N(0, 1)$, and that the joint distribution of x and y follows the bivariate normal distribution,

$$ p(x, y) = \frac{1}{2\pi \sigma_x \sqrt{1 - \rho_{xy}^2}} \exp\!\left[ -\frac{1}{2(1 - \rho_{xy}^2)} \left( \frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho_{xy}(x - \mu_x)\,y}{\sigma_x} + y^2 \right) \right]. \tag{7} $$

In Eq. (7), $\rho_{xy}$ is called the Polyserial correlation coefficient between x and z. The value of $\rho_{xy}$ can be obtained by a maximum-likelihood estimator or a two-step estimator [29]. For details, refer to Ref. [29].
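As a rough illustration only, the sketch below follows the commonly described two-step (ad hoc) style of estimate associated with [29]: normal thresholds are implied by the marginal proportions of z, and the Pearson correlation between x and the integer-coded ranks is rescaled. The exact formula is an assumption of this sketch, not a reproduction of the estimator used in this paper.

import numpy as np
from scipy.stats import norm

def polyserial_two_step(x, z):
    """Rough two-step estimate of the polyserial correlation between a
    continuous feature x and ordinal ranks z (cf. Olsson et al. [29])."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    r_xz = np.corrcoef(x, z)[0, 1]               # ordinary Pearson correlation
    _, counts = np.unique(z, return_counts=True)
    tau = norm.ppf(np.cumsum(counts)[:-1] / n)   # thresholds from marginal proportions
    s_z = z.std()                                 # population standard deviation of ranks
    return r_xz * s_z / norm.pdf(tau).sum()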
3.2. The proposed feature selection method for ordinal ranking
The value of a correlation coefficient varies from -1 to 1. A correlation coefficient of 1 (-1) means that the two variables are perfectly (perfectly inversely) correlated, and 0 means that the two variables are uncorrelated. The closer the absolute value of the correlation coefficient is to 0, the less correlated the two variables are.

A feature with a larger absolute value of the Polyserial correlation coefficient carries more information on the rank, whereas a feature with a smaller absolute value contributes less information on the rank. Similarly, two features with a larger absolute value of the Pearson correlation coefficient share more redundant information. The proposed feature selection method follows the maximum relevance and minimum redundancy feature selection scheme. It selects a feature subset, V, based on Eq. (8), where $I(\mathbf{x}_i, \mathbf{z})$ is the relevance between feature $\mathbf{x}_i$ and rank $\mathbf{z}$, $I(\mathbf{x}_i, \mathbf{x}_j)$ is the redundancy between two features $\mathbf{x}_i$ and $\mathbf{x}_j$, and $t_1$ and $t_2$ are two thresholds. $t_1$ is selected to avoid features with little or negative information being included in V. $t_2$ is selected to ensure that the redundancy between any two arbitrary features is below a certain level. The values of $t_1$ and $t_2$ are determined by the specific application.

$$
\begin{aligned}
\max_{\mathbf{V}} \quad & D = \sum_{\mathbf{x}_i \in \mathbf{V}} I(\mathbf{x}_i, \mathbf{z}) \\
\text{s.t.} \quad & I(\mathbf{x}_i, \mathbf{z}) > t_1, \\
& I(\mathbf{x}_i, \mathbf{x}_j) < t_2, \quad \forall\, \mathbf{x}_i, \mathbf{x}_j \in \mathbf{V},\ i \neq j
\end{aligned}
\tag{8}
$$
A sequential forward search strategy can be used to find the solution to Eq. (8). The proposed feature selection method is described in Table 2. The raw feature set is denoted by the matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m]_{n \times m}$ (m is the total number of features and n is the total number of samples). The vector $\mathbf{z} = [z_i]_{n \times 1}$ represents the ranks of the n samples.
Table 2 The proposed feature selection method for ordinal ranking

Input:  T = $[\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m, \mathbf{z}]_{n \times (m+1)}$   // input data set
        $t_1$, $t_2$ ($1 > t_1, t_2 > 0$)   // thresholds
Output: V   // the selected feature subset

Step 1. Set V = ∅.
Step 2. Calculate the relevance vector $\mathbf{p} = [p_j]_{1 \times m}$, whose element $p_j$ is the absolute value of the Polyserial correlation coefficient between the j-th feature ($\mathbf{x}_j$) and the rank vector ($\mathbf{z}$), j = 1, 2, ..., m. Calculate the redundancy matrix $\mathbf{S} = [s_{ij}]_{m \times m}$, whose element $s_{ij}$ is the absolute value of the Pearson correlation coefficient between the i-th feature ($\mathbf{x}_i$) and the j-th feature ($\mathbf{x}_j$), i, j = 1, 2, ..., m.
Step 3. Find the largest element in $\mathbf{p}$, i.e. $p_r = \max(\mathbf{p})$, then put the corresponding feature $\mathbf{x}_r$ into V (i.e. V = [V, $\mathbf{x}_r$]) and set $p_r = 0$.
Step 4. Find the features whose redundancy with feature $\mathbf{x}_r$ is not smaller than $t_2$, i.e. $\{h \mid s_{hr} \ge t_2\}$, then set their relevance values to zero (i.e. $p_h = 0$) so that these features won't be selected in future steps.
Step 5. Check the elements in $\mathbf{p}$. If $p_j \le t_1$ for all j = 1, 2, ..., m, go to Step 6; otherwise, go to Step 3.
Step 6. Return V.

Note that in Table 2, the Polyserial and the Pearson correlation coefficients are used for the evaluation of feature-label relevance and feature-feature redundancy respectively, because the features involved in our experiments (to be explained in Section 5) are continuous. In other applications, a proper type of correlation coefficient can be chosen according to the types of the features (Table 1) if the features are not continuous.
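To make the forward search in Table 2 concrete, the following sketch implements the same steps under thresholds t1 and t2. The feature-label relevance function passed in is an assumption (e.g. the absolute Polyserial correlation, as in the earlier sketch); this is an illustration, not the authors' original implementation.

import numpy as np

def select_features(X, z, t1, t2, relevance_fn):
    """Sequential forward search of Table 2.

    X            : (n, m) matrix of continuous features
    z            : (n,) ordinal ranks
    t1, t2       : relevance and redundancy thresholds, 0 < t1, t2 < 1
    relevance_fn : callable returning |feature-label relevance| (assumed here)
    Returns the list of selected feature indices (the set V).
    """
    n, m = X.shape
    # Step 2: relevance vector p and redundancy matrix S (absolute Pearson)
    p = np.array([relevance_fn(X[:, j], z) for j in range(m)])
    S = np.abs(np.corrcoef(X, rowvar=False))

    V = []
    # Step 5 condition: stop when no remaining feature exceeds t1
    while np.any(p > t1):
        r = int(np.argmax(p))        # Step 3: most relevant remaining feature
        V.append(r)
        p[r] = 0.0
        # Step 4: drop features too redundant with the newly selected one
        p[S[:, r] >= t2] = 0.0
    return V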
4. Experimental data collection
The planetary gearbox test rig [30] shown in Fig. 2 was utilized to collect vibration data for
developing a reliable fault diagnosis system. It includes a 20 HP drive motor, a bevel
gearbox, two planetary gearboxes, two speed-up gearboxes and a 40 HP load motor. The
load was applied through the drive motor. A torque sensor was installed at the output shaft
of the second stage planetary gearbox. The transmission ratio of each gearbox is listed in
Table 3. Fig. 3 shows the schematic diagram of the two planetary gearboxes. Four
accelerometers, namely LS1, HS1, LS2 and HS2, are located on the housing of the
planetary gearboxes. LS1 and LS2 are two identical low-sensitivity accelerometers. HS1
and HS2 are two identical high-sensitivity accelerometers.

Table 3 Specification of the planetary gearbox test rig

Gearbox                    Gears                  No. of teeth   Ratio
Bevel                      Input gear             18             4.000
                           Output gear            72
The 1st stage planetary    Sun gear               28             6.429
                           Three planet gears     62
                           Ring gear              152
The 2nd stage planetary    Sun gear               19             5.263
                           Four planet gears      31
                           Ring gear              81
The 1st stage speed-up     Input gear             72             0.133
                           Middle gear 1          32
                           Middle gear 2          80
                           Output gear            24
The 2nd stage speed-up     Input gear             48             0.141
                           Middle gear 1          18
                           Middle gear 2          64
                           Output gear            24

[Fig. 3: schematic showing the 1st and 2nd stage sun gears, planet gears, carriers and ring gears, shafts #1–#3, and the locations of accelerometers LS1, HS1, HS2 and LS2 on the gearbox housings.]

Fig. 3 Schematic diagram of the two-stage planetary gearbox

We artificially created the gear tooth surface damage at several severity levels following Refs. [4] and [31]. Circular holes with a diameter of 3 mm and a depth of 0.1 mm were created on the teeth of one of the four planet gears in the second stage planetary gearbox.

The number of holes and the number of teeth with holes were varied to mimic slight, moderate and severe damage, as described below. Fig. 4 shows the four gears with different damage levels. Details on the damage creation are reported in [30]. For slight damage, 3 holes
on one tooth and 1 hole on each of the two neighbouring teeth were created. The damage
area accounted for 2.65%, 7.95%, and 2.65% of the surface of each of these three teeth,
respectively. For moderate damage, 10 holes on one tooth, 3 holes on each of the two
immediate neighbouring teeth, and 1 hole on each of the next neighbouring teeth on
symmetric sides were created. The damage areas of these five teeth were 2.65%, 7.95%,
26.5%, 7.95% and 2.65% of the tooth surface, respectively. For severe damage, 24 holes
on one tooth, 10 holes on each of the two immediate neighbouring teeth, and 3 holes on
each of the next neighbouring teeth on symmetric sides were created. The damage areas of
these five teeth were 7.95%, 26.5%, 63.6%, 26.5% and 7.95% of the tooth surface,
respectively.
For each gear, vibration data with a length of 10 minutes were collected from the four accelerometers with a sampling frequency of 10 kHz at each combination of the following conditions: four drive motor speed conditions (i.e. 300, 600, 900 and 1200 revolutions per minute (RPM)) and two load conditions (i.e. low load (191.9 ~ 643.6 N-m) and high load (812.9 ~ 1455.2 N-m)). At the low load condition, the load motor was off, but there was friction in the two speed-up gearboxes and the rotor of the load motor was also rotating. According to the readings of the torque sensor, at this low load condition, the load applied at the output shaft of the planetary gearboxes ranged from 191.9 N-m to 643.6 N-m. The high load condition was selected based on the gear materials and a stress calculation. We adjusted the loading applied by the load motor to reach an average of 1130 N-m as displayed by the torque sensor, so that the system would run with a comfortable safety margin. The actual readings of the torque sensor fluctuated from 812.9 N-m to 1455.2 N-m. The 10-minute data were evenly split into 20 samples, so 160 samples (i.e. 20 samples × 2 loads × 4 speeds) were collected for each gear. In total, 640 samples (i.e. 160 samples × 4 damage levels) are available.



a) No damage b) Slight damage


c) Moderate damage d) Severe damage
Fig. 4 Planet gears with artificially created damage at different severity levels

5. Feature calculation and extraction for planetary gearboxes
The traditional techniques for vibration-based gear fault diagnosis are typically based on
statistical measurements of the collected vibration signals [32, 33]. Many statistical features
have been proposed and studied for fixed-shaft gearboxes; however, some are not suitable
for planetary gearboxes.

In a fixed-shaft gearbox, damage to an individual gear tooth modulates the vibration on the housing at the shaft frequency. In the frequency domain, the damage appears in the form of symmetric sidebands around the gear meshing frequency, which is the dominant component. In a planetary gearbox, the dominant frequency component usually does not appear at the gear meshing frequency because the planet gears are usually not in phase. In fact, the gear mesh frequencies are often completely suppressed, and the sidebands are no longer symmetric around the gear meshing frequency [33]. For convenience of description, $f_{m,n}$ is used to denote the frequency $(m Z_r + n) f_c$, where $Z_r$ is the number of ring gear teeth, $f_c$ is the carrier frequency, and m (m > 0) and n are integers. In an ideal planetary gearbox, only frequency components that appear at sidebands where $m Z_r + n = k N_p$ ($N_p$ is the number of planets) survive in a vibration signal. Keller and Grabill [33] referred to the surviving sidebands with two different names: dominant sideband and apparent sideband. For each group of sidebands with the same value of m, there is one dominant sideband (denoted by $RMC^{d}_{m,n}$), which is the one closest to the m-th harmonic of the gear meshing frequency. Other surviving sidebands in this group are called apparent sidebands (denoted by $RMC_{m,n}$). Let $RMC_s$ denote the shaft frequency and its harmonics, and $RMC^{d}_{m,n\pm 1}$ denote the first-order sidebands of $RMC^{d}_{m,n}$. The regular (reg), difference (d), residual (r) and envelope (e) signals for a planetary gearbox are then defined in Eq. (9) [33]. In Eq. (9), x is the vibration signal in the time waveform, $F^{-1}$ is the inverse Fourier transform, bp is the signal band-pass filtered about the dominant sideband of the gear meshing frequency ($RMC^{d}_{1,n}$), and H(bp) is the Hilbert transform of bp.

$$
\begin{aligned}
\mathbf{reg} &= F^{-1}\!\left[ RMC_s + RMC^{d}_{m,n} + RMC^{d}_{m,n-1} + RMC^{d}_{m,n+1} \right] \\
\mathbf{d} &= \mathbf{x} - \mathbf{reg} \\
\mathbf{r} &= \mathbf{x} - F^{-1}\!\left[ RMC_s + RMC^{d}_{m,n} + RMC_{m,n} \right] \\
\mathbf{e} &= \left| \mathbf{bp} + iH(\mathbf{bp}) \right|
\end{aligned}
\tag{9}
$$
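As one concrete step of Eq. (9), the envelope signal e can be obtained by band-pass filtering the raw vibration and taking the magnitude of the analytic signal. The sketch below assumes SciPy's hilbert and butter/filtfilt routines and hypothetical band edges bracketing the dominant sideband; it is an illustration, not the authors' processing code.

import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope_signal(x, fs, f_lo, f_hi, order=4):
    """Envelope signal e of Eq. (9): |bp + iH(bp)|.

    fs           : sampling frequency in Hz (10 kHz in this study)
    f_lo, f_hi   : band edges assumed to bracket the dominant sideband RMC^d_{1,n}
    """
    b, a = butter(order, [f_lo, f_hi], btype="bandpass", fs=fs)
    bp = filtfilt(b, a, x)         # band-passed signal bp
    return np.abs(hilbert(bp))     # magnitude of the analytic signal bp + iH(bp)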
With signals reg, d, r and e defined, features can now be calculated for planetary gearboxes.
Sixty-three features are extracted from each vibration signal: 18 features are from the time-
domain, 30 features are from the frequency-domain, and 15 features are specifically
designed for gear fault diagnosis. Table 4 lists the definitions of these features.
Table 4 List of feature names and definitions [33, 34]

Time-domain features:
F1 maximum: $x\_max = \max(\mathbf{x})$
F2 minimum: $x\_min = \min(\mathbf{x})$
F3 average absolute value: $x\_abs = \frac{1}{N}\sum_{i=1}^{N} |x_i|$
F4 peak to peak: $x\_p = x\_max - x\_min$
F5 mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
F6 RMS: $x\_rms = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$
F7 delta RMS: $x\_drms = x\_rms^{\,j} - x\_rms^{\,j-1}$, where j is the current segment of the time record and j-1 is the previous segment
F8 variance: $\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$
F9 standard deviation: $\sigma_x = \sqrt{\sigma_x^2}$
F10 skewness: $x\_sk = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^3}{\sigma_x^3}$
F11 kurtosis: $x\_kur = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^4}{\sigma_x^4}$
F12 crest factor: $x\_cf = \frac{x\_max}{x\_rms}$
F13 clearance factor: $x\_clf = \frac{\max(|\mathbf{x}|)}{(x\_rms)^2}$
F14 impulse factor: $x\_if = \frac{\max(|\mathbf{x}|)}{x\_abs}$
F15 shape factor: $x\_sf = \frac{x\_rms}{x\_abs}$
F16 coefficient of variation: $x\_cv = \frac{\sigma_x}{\bar{x}}$
F17 coefficient of skewness: $x\_cs = \frac{\frac{1}{N}\sum_{i=1}^{N} x_i^3}{\sigma_x^3}$
F18 coefficient of kurtosis: $x\_ck = \frac{\frac{1}{N}\sum_{i=1}^{N} x_i^4}{\sigma_x^4}$

Frequency-domain features:
F19 mean frequency: $mf = \frac{1}{K}\sum_{k=1}^{K} X_k$
F20 frequency center: $fc = \frac{\sum_{k=1}^{K} f_k X_k}{\sum_{k=1}^{K} X_k}$
F21 root mean square frequency: $rmsf = \sqrt{\frac{\sum_{k=1}^{K} f_k^2 X_k}{\sum_{k=1}^{K} X_k}}$
F22 standard deviation frequency: $stdf = \sqrt{\frac{\sum_{k=1}^{K} (f_k - fc)^2 X_k}{\sum_{k=1}^{K} X_k}}$
F23-F35 amplitudes at characteristic frequencies of the 1st stage planetary gearbox: amplitudes at the frequencies $f^{1}_{1,n} = (Z_r^1 + n) f_c^1$, where n = -6, -5, ..., 6
F36-F48 amplitudes at characteristic frequencies of the 2nd stage planetary gearbox: amplitudes at the frequencies $f^{2}_{1,n} = (Z_r^2 + n) f_c^2$, where n = -6, -5, ..., 6

Features specifically designed for gear fault diagnosis:
F49 energy ratio: $er = \frac{RMS(\mathbf{d})}{RMS(\mathbf{r})}$
F50 energy operator: $eo = kurtosis(\mathbf{y})$, where $y_i = x_i^2 - x_{i-1} x_{i+1}$
F51 FM4: $FM4 = kurtosis(\mathbf{d})$
F52 M6A: $M6A = \frac{\frac{1}{N}\sum_{i=1}^{N} d_i^6}{\left(\frac{1}{N}\sum_{i=1}^{N} d_i^2\right)^3}$
F53 M8A: $M8A = \frac{\frac{1}{N}\sum_{i=1}^{N} d_i^8}{\left(\frac{1}{N}\sum_{i=1}^{N} d_i^2\right)^4}$
F54 NA4: $NA4 = \frac{\frac{1}{N}\sum_{i=1}^{N} r_i^4}{\left(\frac{1}{M}\sum_{j=1}^{M}\frac{1}{N}\sum_{i=1}^{N} r_{ij}^2\right)^2}$
F55 NB4: $NB4 = \frac{\frac{1}{N}\sum_{i=1}^{N} e_i^4}{\left(\frac{1}{M}\sum_{j=1}^{M}\frac{1}{N}\sum_{i=1}^{N} e_{ij}^2\right)^2}$
F56 FM4*: $FM4^{*} = \frac{\frac{1}{N}\sum_{i=1}^{N} d_i^4}{\left(\frac{1}{M'}\sum_{j=1}^{M'}\frac{1}{N}\sum_{i=1}^{N} d_{ij}^2\right)^2}$
F57 M6A*: $M6A^{*} = \frac{\frac{1}{N}\sum_{i=1}^{N} d_i^6}{\left(\frac{1}{M'}\sum_{j=1}^{M'}\frac{1}{N}\sum_{i=1}^{N} d_{ij}^2\right)^3}$
F58 M8A*: $M8A^{*} = \frac{\frac{1}{N}\sum_{i=1}^{N} d_i^8}{\left(\frac{1}{M'}\sum_{j=1}^{M'}\frac{1}{N}\sum_{i=1}^{N} d_{ij}^2\right)^4}$
F59 NA4*: $NA4^{*} = \frac{\frac{1}{N}\sum_{i=1}^{N} r_i^4}{\left(\frac{1}{M'}\sum_{j=1}^{M'}\frac{1}{N}\sum_{i=1}^{N} r_{ij}^2\right)^2}$
F60 NB4*: $NB4^{*} = \frac{\frac{1}{N}\sum_{i=1}^{N} e_i^4}{\left(\frac{1}{M'}\sum_{j=1}^{M'}\frac{1}{N}\sum_{i=1}^{N} e_{ij}^2\right)^2}$
F61 FM0: $FM0 = \frac{\max(\mathbf{x}) - \min(\mathbf{x})}{\sum_{m=1}^{p} RMC^{d}_{m,n}}$, where p is the total number of harmonics considered
F62 sideband level factor: $slf = \frac{RMC^{d}_{m,n-1} + RMC^{d}_{m,n+1}}{\sigma_x}$
F63 sideband index: $si = \frac{RMC^{d}_{m,n-1} + RMC^{d}_{m,n+1}}{2}$

Note: X is the Fourier transform of x. N is the length of signal x. K is the length of signal X. M represents the total number of segments up to the present. M' represents the total number of segments in which the gearbox is healthy.

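For illustration, a few of the Table 4 time-domain features can be computed directly from a vibration sample as in the minimal sketch below; it is only a sample calculation following the definitions above, not the authors' feature-extraction code.

import numpy as np

def basic_time_features(x):
    """Illustrative computation of selected Table 4 time-domain features
    (F6 RMS, F11 kurtosis, F12 crest factor) for one vibration sample x."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))                        # F6
    sigma = x.std()                                        # F9 (population std)
    kurtosis = np.mean((x - x.mean()) ** 4) / sigma ** 4   # F11
    crest = x.max() / rms                                  # F12
    return {"rms": rms, "kurtosis": kurtosis, "crest_factor": crest}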

6. Diagnosis of damage levels using ordinal ranking
Fig. 5 shows the flow chart of the proposed method for diagnosing gear tooth surface damage levels using ordinal ranking. Firstly, the sixty-three features described in Table 4 are calculated for each of the four sensors. Secondly, the features from all the sensors are combined, making the total number of features 252 (i.e. 63 × 4) and the whole data set 640 samples (i.e. 160 samples/level × 4 levels) by 252 features. Thirdly, feature selection is conducted using the method proposed in Section 3.2. Finally, the selected feature subset is imported into the ordinal ranking algorithm (SVOR) described in Section 2.2 to diagnose the damage levels and output the diagnosis results.
For the convenience of description, we will use ranks 1, 2, 3, and 4 to denote the
baseline, slight damage, moderate damage and severe damage in subsequent sections. The
diagnosis results will be quantitatively evaluated using two metrics [22]: mean absolute
(MA) error (Eq. 10) and mean zero-one (MZ) error (Eq. (11)). MA error is affected by how
wrongly a sample is diagnosed. The further the diagnosed rank is from the true rank, the
larger the MA error is. If more ordinal information is preserved in the trained ranking
model, the MA error is more likely to be smaller. MZ error, commonly used in
classification problems, is affected only by whether a sample is wrongly diagnosed or not.
If each rank is more clearly separated from others, the MZ error is more likely to be
smaller. The smaller MA and MZ errors mean a better ranking model. In the two equations,
N is the total number of samples, $z'_i$ is the diagnosed rank, and $z_i$ is the true rank.

Mean absolute error (MA error):

$$ \text{MA error} = \frac{1}{N}\sum_{i=1}^{N} |z'_i - z_i| \tag{10} $$

Mean zero-one error (MZ error):

$$ \text{MZ error} = \frac{1}{N}\sum_{i=1}^{N} t_i, \quad \text{where } t_i = \begin{cases} 1 & \text{if } z'_i \neq z_i \\ 0 & \text{if } z'_i = z_i \end{cases} \tag{11} $$
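For clarity, the two metrics can be computed as in the short sketch below (assuming the predicted and true ranks are stored as integer arrays); this is simply Eqs. (10) and (11) in code form.

import numpy as np

def ma_error(z_pred, z_true):
    """Mean absolute error of Eq. (10): average rank distance."""
    return np.mean(np.abs(np.asarray(z_pred) - np.asarray(z_true)))

def mz_error(z_pred, z_true):
    """Mean zero-one error of Eq. (11): fraction of wrongly diagnosed samples."""
    return np.mean(np.asarray(z_pred) != np.asarray(z_true))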

[Fig. 5 flow chart: vibration data → frequency spectrum, difference/residual signals and envelope signal → 18 time-domain features, 30 frequency-domain features and 15 features specifically designed for gearbox damage diagnosis → combine features from all sensors → select sensitive features using the proposed method → diagnose damage level using ordinal ranking → output diagnosis results.]

Fig. 5 The proposed diagnosis approach for damage levels

7. Results and discussion
Vibration data collected from the planetary gearbox test rig are used to demonstrate the
effectiveness of the proposed diagnosis approach. Among the total 252 features, features #1
~ # 63 are from sensor LS1 following the order in Table 4; features #64 ~ #126, #127 ~
#139, and #140~ #252 are from LS2, HS1, and HS2, respectively. The feature-label
relevance (i.e. the absolute value of the Polyserial correlation coefficient) between each
individual feature and ranks (damage levels) are shown in Fig. 6. It can be seen that

22

different features have different relevance values, some of which are very small. A
threshold (t
1
) is employed to determine whether a feature has positive contribution to the
ranks. If t
1
is large, only a few really important features will be kept; if t
1
is small, most
features will be kept and some might be useless. In this paper, we choose t
1
=0.5 so that
more than half information contained in an individual feature is related to the ranks. The
largest value in Fig. 6 is 0.765 (feature #94), followed by 0.762 (feature #31), 0.762 (feature
#157), and 0.752 (feature #220). These features (#94, #31, #157 and #220) are the
amplitudes at sideband
1
1
( 2)
c
r
Z f + from sensors LS2, LS1, HS1 and HS2, respectively.
[Fig. 6: bar plot of the absolute Polyserial correlation coefficient (0 to 0.8) versus feature number (#1 to #252); features #31, #94, #157 and #220 stand out.]

Fig. 6 Feature-label relevance between damage levels and each of the 252 features

The feature-feature redundancy (i.e. the absolute value of the Pearson correlation coefficient) between feature #94 and each of the 252 features is shown in Fig. 7. It can be seen that some features (e.g. #31, #157 and #220) are highly correlated with feature #94; this means that a large amount of the information in those features is also contained in feature #94. A threshold ($t_2$) is chosen to limit the redundancy among the selected features. Features whose redundancy values with already selected features are higher than $t_2$ will be omitted. If $t_2$ is large, only a few features will be omitted and finally most features will be selected; if $t_2$ is small, most features will be omitted and finally only a few features will be selected. By checking Fig. 7, we choose $t_2 = 0.8$ so that the highly correlated features (i.e. features #31, #157 and #220) are omitted and the others can be further considered in the subsequent feature selection steps. After $t_1$ and $t_2$ are chosen, the proposed feature selection method (Section 3.2) is applied and 11 features are finally selected (Table 5).
[Fig. 7: bar plot of the absolute Pearson correlation coefficient (0 to 1) between feature #94 and each of the 252 features; features #31, #157 and #220 show values close to 1.]

Fig. 7 Feature-feature redundancy between feature #94 and each of the 252 features
Table 5 Eleven features selected by the proposed feature selection method

List   Feature #   Physical meaning                       Sensor
1      94          Amplitude at $(Z_r^1 + 2) f_c^1$       LS2
2      10          Skewness                               LS1
3      93          Amplitude at $(Z_r^1 + 1) f_c^1$       LS2
4      11          Kurtosis                               LS1
5      172         Amplitude at $(Z_r^2 + 4) f_c^2$       HS1
6      124         FM0                                    LS2
7      4           Peak to peak                           LS1
8      89          Amplitude at $(Z_r^1 - 3) f_c^1$       LS2
9      192         Average absolute value                 HS2
10     22          Standard deviation frequency           LS1
11     29          Amplitude at $Z_r^1 f_c^1$             LS1

To test the diagnostic ability of ordinal ranking, the whole data set is split into two subsets:
the training set and the test set. The training set is for training a ranking model. The test set
is for testing the diagnostic ability of the trained ranking model. In this paper, three separate
scenarios are considered for splitting the whole data set, as listed in Table 6. The algorithm
SVOR introduced in Section 2.2 is employed to train and test a ranking model. The 2nd-degree polynomial kernel was used as the kernel function.
Table 6 Distribution of the training set and the test set

Scenario     Training set                    Test set
             # of samples   Ranks            # of samples   Ranks
Scenario 1   320            1, 2, 3, 4       320            1, 2, 3, 4
Scenario 2   480            1, 3, 4          160            2
Scenario 3   480            1, 2, 3          160            4

- In Scenario 1, the whole data set was randomly split into two equal-sized subsets,
one for training and the other for testing. In both subsets, samples from ranks 1, 2, 3
and 4 are included. This scenario tests the performance of the ranking model when the
training set covers the whole rank range.
- In Scenario 2, samples from only ranks 1, 3 and 4 are included in the training set, and samples from only rank 2 are included in the test set. In practice, it might occur that data of some damage levels are missing from the training set. This scenario represents the case when data of the slight damage level are not collected for training. It tests the interpolation ability of the ranking model.
- In Scenario 3, samples from only ranks 1, 2 and 3 are included in the training
set; and samples from only rank 4 are included in the test set. Similar to Scenario 2, this
scenario examines the case when data of severe damage level are not collected for training.
It tests the extrapolation ability of the ranking model.

7.1. Effect of feature selection
To check the performance of the proposed feature selection method, five feature subsets
are employed for analyzing the same data in each scenario: (1) all 252 features; (2) top 38
relevant features; (3) 11 features in Table 5 (the proposed method); (4) randomly selected
11 features; (5) 11 features selected by the Pearson correlation coefficient.
Details on how and why these feature subsets are obtained are explained as follows. Feature subset (1) does not involve feature selection. Feature subset (2) follows the feature selection scheme that uses the top features without considering the relationships among features [6]. Under this scheme, 38 features having feature-label relevance values larger than 0.5 are obtained. Comparison of feature subsets (1) and (2) shows the influence of irrelevant features. Feature subset (3) is generated by the proposed method. Comparison of feature subsets (2) and (3) demonstrates the influence of redundant features. Feature subset (4) contains 11 randomly chosen features. Comparison of feature subsets (3) and (4) further emphasizes the importance of proper feature selection. Feature subset (5) is generated using the feature-label evaluation measure employed in [23] (i.e. the Pearson correlation coefficient, more precisely, the Point-biserial correlation coefficient). The generation process for feature subset (5) is the same as in the proposed method except that the Pearson correlation coefficient is used in the evaluation of the feature-label relevance (in the proposed method, the Polyserial correlation coefficient is used). Comparison of feature subsets (3) and (5) indicates the proper measure for feature-label relevance in ordinal ranking problems.


Each of the five feature subsets is imported into SVOR to diagnose damage levels in each
scenario. In Scenario 1, the training set and the test set are randomly generated. To reduce
the impact of randomness on the test results, 30 runs are conducted. The mean and the
standard deviation of the 30 test errors are provided in Table 7. Using all 252 features, the
mean values of MA error and MZ error are both 0.099. Using the 38 relevant features, the
mean values of MA error and MZ error are reduced to 0.078 and 0.077, respectively. This
shows that irrelevant features have adverse effects on the ranking model. Using the
proposed method, some redundant features are further deleted from the 38 features,
keeping only 11 features. As a result, the mean values of MA error and MZ error are
further reduced to 0.073 and 0.072, respectively. This shows that the redundant
information can reduce the performance of the ranking model, and thus needs to be
excluded. Using the randomly selected 11 features, the mean MA error and the mean MZ
error are 0.229 and 0.220 respectively, which are relatively high. The reason is that not
enough relevant information is adopted in these features and there might be redundant
information as well. Using the 11 features selected by the Pearson correlation coefficient,
the mean MA error and the mean MZ error are 0.083 and 0.082, respectively. Compared with the results of the proposed method, it can be seen that the Pearson correlation coefficient works less effectively than the Polyserial correlation coefficient used in the proposed method. The reason is that the Pearson correlation coefficient cannot properly reflect the relevance between a continuous feature and an ordinal rank. As a result, relevant features are not correctly selected. In Scenario 1, the proposed method gives the lowest MA error and MZ error.


Table 7 Results of Scenario 1 (320 training samples (ranks 1, 2, 3, 4) and 320 test samples (ranks 1, 2, 3, 4))

Feature subset                                                       MA error (mean ± std)   MZ error (mean ± std)
(1) all 252 features                                                 0.099 ± 0.022           0.099 ± 0.022
(2) top 38 features                                                  0.078 ± 0.016           0.077 ± 0.016
(3) 11 features (the proposed method)                                0.073 ± 0.012           0.072 ± 0.012
(4) randomly selected 11 features                                    0.229 ± 0.025           0.220 ± 0.024
(5) 11 features selected using the Pearson correlation coefficient   0.083 ± 0.020           0.082 ± 0.019

In Scenario 2, the training samples are from ranks 1, 3 and 4 only. The test samples (rank 2) are predicted to be one of the three ranks (i.e. 1, 3, and 4). Because rank 2 is never predicted, the MZ error is always 1. We will therefore compare MA errors only. In the perfect case, the test samples are all diagnosed to be either rank 1 or rank 3, which are the two closest ranks to the true rank (2); in this case, the MA error is 1. In the worst case, the test samples are all diagnosed to be rank 4, making an MA error of 2. The diagnosis results are shown in Table 8. With all 252 features, 124 samples are diagnosed to be rank 3 and the rest rank 4, giving an MA error of 1.225. Using the top 38 features, 149 samples are ranked 3 and the rest are ranked 4, resulting in an MA error of 1.069. It can be seen that after deleting the irrelevant features, the MA error is reduced, meaning that the interpolation ability of the ranking model is improved. With the proposed method, eight samples are ranked 4 and the others are ranked either 1 or 3, generating an MA error of 1.050. This indicates that deleting redundant features improves the interpolation ability of the ranking model. With the randomly selected 11 features, 137 samples are ranked 4, giving a high MA error (1.856). This is because randomly selected features contain irrelevant and redundant information. Using the 11 features selected by the Pearson correlation coefficient, 89 samples are ranked 4 and an MA error of 1.556 is generated, which means that the interpolation ability of this learned ranking model is poor. In Scenario 2, the features selected by the proposed method demonstrate the best interpolation ability.
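For instance, for feature subset (3) in Table 8, the MA error follows directly from Eq. (10):

$$ \text{MA error} = \frac{21 \cdot |1-2| + 131 \cdot |3-2| + 8 \cdot |4-2|}{160} = \frac{168}{160} = 1.050 . $$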

Table 8 Results of Scenario 2 (480 training samples (ranks 1, 3, 4) and 160 test samples (rank 2))

Feature subset                                                       # of samples in predicted ranks    MA error   MZ error
                                                                     1      3      4
(1) all 252 features                                                 0      124    36                   1.225      1
(2) top 38 features                                                  0      149    11                   1.069      1
(3) 11 features (the proposed method)                                21     131    8                    1.050      1
(4) randomly selected 11 features                                    0      23     137                  1.856      1
(5) 11 features selected using the Pearson correlation coefficient   0      71     89                   1.556      1

In Scenario 3, the training samples are from ranks 1, 2 and 3 only. The test samples (rank 4) are predicted to be one of the three ranks (i.e. 1, 2, and 3). As in Scenario 2, the MZ error is always 1 because rank 4 is never predicted, so we compare MA errors only. In the perfect case, the test samples are all diagnosed to be rank 3, which is the closest rank to the true rank (4); in this case, the MA error is 1. In the worst case, the test samples are all diagnosed as rank 1, making an MA error of 3. Table 9 shows the detailed results. With all 252 features, around half of the test samples (76 samples) are ranked 2 and the other half are ranked 3, making an MA error of 1.475. Using the top 38 features, 34 samples are put in rank 2 and the others in rank 3, reducing the MA error to 1.215. This demonstrates that irrelevant features should be excluded in order to improve the extrapolation ability of the ranking model. The features selected by the proposed method further reduce the MA error to 1.075 by eliminating the redundant information. The randomly selected features put most samples into rank 2, giving an MA error of 1.569. The 11 features selected using the Pearson correlation coefficient give an MA error of 1.294, indicating a worse extrapolation ability of the ranking model than that of the proposed method (1.075). In this scenario, the proposed method generates the lowest MA error, and thus produces the ranking model with the best extrapolation ability.
The comparisons in the three scenarios prove the benefits of deleting irrelevant features and redundant features. Moreover, comparisons between the results of the proposed method and the results of the features selected using the Pearson correlation coefficient show the effectiveness of the Polyserial correlation coefficient in evaluating the feature-label relevance for ordinal ranking problems. Using the Pearson (more precisely, Point-biserial) correlation coefficient, the rank is regarded as a nominal variable. That is why the Pearson (Point-biserial) correlation coefficient works well for classification problems but not for ordinal ranking problems. In all three scenarios, the proposed method gives the lowest error, proving its effectiveness in selecting features for ordinal ranking.

Table 9 Results of Scenario 3 (480 training samples (ranks 1, 2, 3) and 160 test samples (rank 4))

Feature subset                                                       # of samples in predicted ranks    MA error   MZ error
                                                                     1      2      3
(1) all 252 features                                                 0      76     84                   1.475      1
(2) top 38 features                                                  0      34     126                  1.215      1
(3) 11 features (the proposed method)                                0      12     148                  1.075      1
(4) randomly selected 11 features                                    0      91     69                   1.569      1
(5) 11 features selected using the Pearson correlation coefficient   0      47     113                  1.294      1
7.2. Comparison of ordinal ranking and classification

For comparison purposes, the traditional diagnosis approach [5, 6], which uses a multi-class classifier to diagnose the damage levels, is also applied to each scenario. To avoid the influence of the learning machine, the support vector machine (SVM) is adopted as the classifier, since the ordinal ranking algorithm SVOR is based on SVM. The same kernel function (2nd-degree polynomial kernel) was used. The diagnosis procedure is the same as that described in Section 6 except that the ordinal ranking algorithm is replaced by the classification algorithm. Results are listed in Table 10.
Table 10 Comparison of the proposed approach and the traditional approach

Scenario 1 (MA error and MZ error, mean ± std):
  Proposed approach (ordinal ranking)        MA 0.073 ± 0.012    MZ 0.072 ± 0.012
  Traditional approach (classification)      MA 0.088 ± 0.017    MZ 0.066 ± 0.012

Scenario 2 (# of samples in predicted ranks 1 / 3 / 4; MA error; MZ error):
  Proposed approach (ordinal ranking)        21 / 131 / 8        MA 1.050    MZ 1
  Traditional approach (classification)      0 / 80 / 80         MA 1.500    MZ 1

Scenario 3 (# of samples in predicted ranks 1 / 2 / 3; MA error; MZ error):
  Proposed approach (ordinal ranking)        0 / 12 / 148        MA 1.075    MZ 1
  Traditional approach (classification)      25 / 19 / 116       MA 1.431    MZ 1

In Scenario 1, the MA error of ordinal ranking (0.073 ± 0.012) is smaller than that of classification (0.088 ± 0.017), whereas the MZ error (0.072 ± 0.012) is larger than that of classification (0.066 ± 0.012). This is explained as follows. The MZ error treats wrongly ranked samples equally, and its value is not influenced by how well the ordinal information is kept. The more separately each rank is classified, the more likely the MZ error is low. The aim of classification is to classify each rank as separately as possible; therefore classification gives a lower MZ error. However, the MA error is influenced by how well the ordinal information is kept. It penalizes wrongly ranked samples according to how far a sample is ranked from its true rank. The more ordinal information is kept in the ranking model, the more likely the MA error becomes small. Classification does not guarantee that the ordinal information is kept. Ordinal ranking, on the other hand, aims to express the ordinal information by searching for a monotonic trend in the feature space, and therefore the ordinal information is largely preserved. That is why ordinal ranking produces a smaller MA error than classification. The above arguments are also supported by the results in Scenarios 2 and 3. In Scenario 2, the MA error is 1.500 for classification and 1.050 for ordinal ranking. In Scenario 3, the MA error is 1.431 for classification and 1.075 for ordinal ranking.
The above comparisons show that the ordinal ranking results in a lower MA error, and
classification generates a lower MZ error. For diagnosis of damage levels, a low MA error
is more important than a low MZ error. The reason is explained as follows. A low MA error
means that the diagnosed damage level of a new sample is close to its true level. A low MZ
error, however, cannot ensure a closer distance between the diagnosed damage level and
true level. In this sense, ordinal ranking is more suitable for diagnosis of damage levels than
classification. The advantage of ordinal ranking is more obvious when data of some damage
level are missing in the training process, as can be seen from Scenarios 2 and 3 in Table 10.
8. Conclusion
Diagnosis of damage levels is an important task in fault diagnosis of machinery. One key characteristic of damage levels is the inherent ordinal information. Thus keeping the ordinal information is important in the diagnosis process. This paper proposes to preserve the ordinal information by using ordinal ranking techniques. Experimental results on the diagnosis of artificially created surface damage levels of planet gear teeth show that ordinal ranking has advantages over classification in terms of a lower mean absolute error and better interpolation and extrapolation abilities.

A feature selection method based on correlation coefficients is proposed to improve the diagnosis accuracy of ordinal ranking. The proposed method selects features that are relevant to the ranks, and meanwhile ensures that the redundant information is limited to a certain level. Experimental results show that the proposed feature selection method efficiently reduces the diagnosis errors and improves the interpolation and extrapolation abilities of the ranking model.

Correlation coefficients reflect only the linear relationship between two variables. Feature selection methods that consider nonlinearity when evaluating the feature-label relevance and feature-feature redundancy will be studied in our future work. Furthermore, the experimental data used in this paper are from laboratory experiments, not from industrial field applications; the effectiveness of the proposed method in real industrial applications needs to be tested in the future.
ACKNOWLEDGMENT
The research was supported by the Natural Sciences and Engineering Research Council of Canada, Syncrude Canada Ltd., and the China Scholarship Council. Critical comments and constructive suggestions from the reviewers and the editor are very much appreciated.
REFERENCES
[1] B. Samanta, Gear fault detection using artificial neural networks and support vector machines with
genetic algorithms, Mechanical Systems and Signal Processing, 18 (2004), 625-644.
[2] N. Saravanan, S. Cholairajan, and K. I. Ramachandran, Vibration-based fault diagnosis of spur
bevel gear box using fuzzy technique, Expert Systems with Applications, 36 (2009), 3119-3135.
[3] Z. P. Feng, M. J. Zuo, and F. L. Chu, Application of regularization dimension to gear damage
assessment, Mechanical Systems and Signal Processing, 24 (2010), 1081-1098.
[4] H. Ozturk, I. Yesilyurt, and M. Sabuncu, Detection and advancement monitoring of distributed
pitting failure in gears, Journal of Nondestructive Evaluation, 29 (2010), 63-73.
[5] Y. Lei, M. J. Zuo, Z. J. He, and Y. Y. Zi, A multidimensional hybrid intelligent method for gear
fault diagnosis, Expert Systems with Applications, 37 (2010), 1419-1430.
[6] Y. Lei and M. J. Zuo, Gear crack level identification based on weighted K nearest neighbor
classification algorithm, Mechanical Systems and Signal Processing, 23 (2009), 1535-1547.
[7] X. Zhao, M. J. Zuo, and Z. Liu, Diagnosis of pitting damage levels of planet gears based on ordinal
ranking, in IEEE International Conference on Prognostics and Health management, Denver, U.S.,
2011.
[8] M. Inalpolat and A. Kahraman, A theoretical and experimental investigation of modulation
sidebands of planetary gear sets, Journal of Sound and Vibration, 323 (2009), 677-696.
[9] T. Barszcz and R. B. Randall, Application of spectral kurtosis for detection of a tooth crack in the
planetary gear of a wind turbine, Mechanical Systems and Signal Processing, 23 (2009), 1352-1365.
[10] W. Bartelmus and R. Zimroz, Vibration condition monitoring of planetary gearbox under varying
external load, Mechanical Systems and Signal Processing, 23 (2009), 246-257.


[11] W. Bartelmus and R. Zimroz, A new feature for monitoring the condition of gearboxes in non-
stationary operating conditions, Mechanical Systems and Signal Processing, 23 (2009), 1528-1534.
[12] W. Bartelmus, F. Chaari, R. Zimroz, and M. Haddar, Modelling of gearbox dynamics under time-
varying nonstationary load for distributed fault detection and diagnosis, European Journal of
Mechanics - A/Solids, 29 (2010), 637-646.
[13] H.-T. Lin, "From ordinal ranking to binary classification," Doctor of Philosophy, California Institute
of Technology, Pasadena, United States, 2008.
[14] A. Shashua and A. Levin, Ranking with large margin principle: two approaches, in Proceedings of
Advances in Neural Information Processing Systems, 2002, pp. 937-944.
[15] E. Frank and M. Hall, A simple approach to ordinal classification, in Machine Learning: ECML
2001, L. De Raedt and P. Flach, Eds., ed: Springer Berlin Heidelberg, 2001, pp. 145-156.
[16] R. Herbrich, T. Graepel, and K. Obermayer, Large margin rank boundaries for ordinal regression, in
Advances in Large Margin Classifiers, ed: MIT Press, 2000, pp. 115-132.
[17] X. Geng, T. Y. Liu, T. Qin, and H. Li, Feature selection for ranking, in 30th Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, 2007,
pp. 407-414.
[18] Rahman Mukras, Nirmalie Wiratunga, Robert Lothian, Sutanu Chakraborti, and D. Harper,
Information gain feature selection for ordinal text classification using probability redistribution, in
in Proceedings of the IJCAI'07 Workshop on Text Mining and Link Analysis, Hyderabad, India, 2007.
[19] S. Baccianella, A. Esuli, and F. Sebastiani, Feature selection for ordinal regression, in Proceedings
of the 2010 ACM Symposium on Applied Computing, New York, 2010, pp. 1748-1754.
[20] S. S. Stevens, On the theory of scales of measurement, Science, 103 (1946), 677-680.
[21] K. Crammer and Y. Singer, Pranking with ranking, Advances in Neural Information Processing
Systems 14 (2001), 641-647.
[22] W. Chu and S. S. Keerthi, Support vector ordinal regression, Neural Computation, 19 (2007), 792-
815.
[23] Y. Lei and L. Huan, Efficient feature selection via analysis of relevance and redundancy, The
Journal of Machine Learning Research, 5 (2004), 1205-1224.
[24] H. C. Peng, Long, F., and Ding, C., Feature selection based on mutual information: criteria of max-
dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 27 (2005), 1226-1238.
[25] R. E. Schumacker and R. G. Lomax, A beginner's guide to structural equation modeling, Second ed.
New Jersey: Lawrence Erlbaum Associates, Inc., 2004.
[26] D. Muijs, Doing quantitative research in education with SPSS, Second ed. London: Sage
Publications Ltd, 2010.
[27] X. Zhao, Q. Hu, Y. Lei, and M. J. Zuo, Vibration-based fault diagnosis of slurry pump impellers
using neighbourhood rough set models, Proceedings of the Institution of Mechanical Engineers, Part
C: Journal of Mechanical Engineering Science, 224 (2010), 995-1006.
[28] I. Guyon, Practical feature selection: from correlation to causality., in Mining Massive Data Sets for
Security, ed: IOS Press, 2008.
[29] U. Olsson, F. Drasgow, and N. Dorans, The polyserial correlation coefficient, Psychometrika, 47
(1982), 337-347.
[30] Mohammad Hoseini, Yaguo Lei, Do Van Tuan, Tejas Patel, and M. J. Zuo, Experiment Design of
Four Types of Experiments: Pitting Experiments, Run-To-Failure Experiments, Various Load and
Speed Experiments, and Crack Experiments, University of Alberta, Edmonton, Canada, January 31,
2011.
[31] American Society for Metals, Friction, Lubrication, and Wear Technology Handbook, 10th ed., vol. 18, ASM International, 1992.
[32] P. D. Samuel and D. J. Pines, A review of vibration-based techniques for helicopter transmission
diagnostics, Journal of Sound and Vibration, 282 (2005), 475-508.
[33] J. Keller and P. Grabill, Vibration monitoring of a UH-60A transmission planetary carrier fault, in
The American Helicopter Society 59th Annual Forum, Phoenix, U.S., 2003.


[34] A. S. Sait and Y. I. Sharaf-Eldeen, A review of gearbox condition monitoring based on vibration
analysis techniques diagnostics and prognostics, in Rotating Machinery, Structural Health
Monitoring, Shock and Vibration, Volume 5, 2011, pp. 307-324.



The highlights of this paper are as follows:
- A feature selection method is designed for ordinal ranking.
- A diagnosis approach is proposed for diagnosing damage levels using ordinal ranking.
- The proposed approach is applied to diagnosis of surface damage levels of planet gear teeth.
- The effectiveness of the designed feature selection method is demonstrated.
- The advantage of the proposed approach over the traditional approach is discussed.
