
Mechanical Systems and Signal Processing 60-61 (2015) 316–325


Classifying machinery condition using oil samples and binary logistic regression

J. Phillips a, E. Cripps b, John W. Lau b, M.R. Hodkiewicz a,*
a School of Mechanical and Chemical Engineering, University of Western Australia, Perth 6009, WA, Australia
b School of Mathematics and Statistics, University of Western Australia, Perth 6009, WA, Australia
* Corresponding author. Tel.: +61 8 6488 7911. E-mail address: Melinda.hodkiewicz@uwa.edu.au (M.R. Hodkiewicz).

Article info

Article history:
Received 21 June 2013
Received in revised form 14 November 2014
Accepted 27 December 2014
Available online 24 February 2015

Keywords:
Logistic regression
Classification
Oil analysis
Mining trucks
Machine health
Neural networks
Support vector machine
Receiver operating characteristic curve

Abstract

The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding of, and hence some trust in, the classification scheme for those who use the analysis to initiate maintenance tasks. Typical "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) are difficult to interpret. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight into the drivers of the human classification process and into the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on the predictive performance of logistic regression, ANN and SVM. A real-world oil analysis data set from engines on mining trucks is presented and, using cross-validation, we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

This article argues the advantages of using logistic regression (LR) for the binary classification problem of machine health,
with an emphasis on a particular industry application that employs an oil analysis based maintenance strategy. Many
organisations have embraced oil analysis sampling as part of their condition based maintenance strategy and collect
hundreds of samples a month. This process of collection, analysis, and then classification of oil samples by expert analysts is
expensive. As one would expect, many of the samples presented to the oil analysts indicate a healthy machine. The samples
that require further analysis by the expert are those that indicate a machine is not healthy. Such a classification does not
necessarily imply maintenance is required immediately but that extra attention should be directed toward that sample/
machine. The ability to automatically classify healthy/not-healthy machines early in the process is attractive so that further
analyst effort is not wasted examining the results of samples which show no sign of degradation.
To be suitable for use in industry, a binary classification process should satisfy the following criteria: (1) the procedure
should accurately classify the probability of a machine being healthy/not healthy, (2) the procedure should be easy to use
and straightforward to update as new condition data becomes available, (3) it should be clear to the analyst, engineer and

maintenance planner how changes in explanatory variables affect the probability of the machine being healthy/not healthy
and these relationships should make sense in terms of the failure modes experienced by the machine, (4) there should be an
ability to assess consequences of misclassification—the cost of catastrophic failure, for example, of these engines can exceed
$100,000.
Artificial neural networks (ANN) and support vector machines (SVM) have been thoroughly examined for machine
condition classification and are demonstrably capable of addressing points (1), (2) and (4) above. However, point (3) is less
accessible. Although labelling such classification schemes as "black boxes” due to their complex structure is perhaps overly
strong [1], they definitely present a greater challenge to non-specialists who wish to obtain more insight from the analysis
than simply whether a machine is healthy/not healthy and which explanatory variables are the most important. In contrast, LR
is well equipped to provide additional insights. The parametric form of LR not only permits the classification probabilities to be estimated at given levels of the explanatory variables, but also shows how changes in the input variables affect the classification probability. Graphical representations of these relationships are trivial to obtain and, we found, quite informative for the oil analysts with whom we interacted. Also, quantifying misclassification costs via sensitivity, specificity and receiver operating characteristic curves provides a mechanism with which to tune the appropriate probability classification level. In light
of this, it is surprising that LR has relatively few published examples in machine condition classification. For example, in this
journal there have been only two papers [2,3] focussing specifically on logistic regression models, compared to over 20 papers
each on ANN and SVM.
Binary LR models have been used for health assessment of an elevator door system [4], light-emitting diodes [5], machinery
and cutting tools [2,6], wind turbine bearings [7], helical gear box [8], and simple process systems [9]. Raza et al. [10] used LR
to assess the health of a strainer located at the suction side of a pump and compared the results with ANN and support vector
machine techniques. Spezzaferro [11] applied LR to aircraft maintenance data to determine maintenance inspection interval lengths. Liao et al. [12] compared LR to a proportional hazards model for bearing degradation. LR has been used to
estimate the degradation of units (bearings) and more complex facilities and, in combination with a vector machine model, to
predict failure probabilities [3,13]. Liu et al. [14] used a combination of a number of methods including LR to assess the
performance of a centrifugal compressor turbine. In more recent work Rai [15] compared the classification performance of
multinomial LR with decision trees and random forests to predict drill-bit breakage and found that LR had the lowest
proportion of classification errors. Also Pandya et al. [16] determined that multinomial LR is more effective than ANN and SVM
on classifying bearing faults. With the exception of the work of [10,15,16] none of the other papers compared the performance
of their LR models with other binary classification techniques.
Over the last decade ANN models for classification of machine health and fault diagnosis have received significant
attention due to their ability to learn from examples, handle incomplete and noisy data, and deal with nonlinear problems
[17,18]. There are different types of ANN models used for supervised and unsupervised learning [19,20]. The feed-forward
neural network (FFNN) structure is the most commonly used supervised neural network for machine fault diagnostics, and the FFNN with a back-propagation training algorithm is commonly used for pattern recognition and classification [21]. Although back-
propagation has produced acceptable results for classifying machine condition [22–29], a significant amount of effort is
required to determine the architecture and number of nodes. As a result the cascade-correlation neural network (CCNN) was
developed [30]. This algorithm does not require initial determination of the network structure and the number of nodes and
the efficiency of the network is not compromised by an excessive number of nodes as can occur in back-propagation [31].
Examples of the use of CCNN for machine condition and fault detection are found in [32,33].
SVMs are linear classifiers that make classification decisions based on a linear combination of the features of the data points
[34]. While SVMs and LR both calculate a set of weights for variables based on transformation of the feature space, LR models
the probability of outcomes explicitly whereas standard SVMs search for the optimal dividing hyper-plane. Perceived
advantages of SVM are the ability to manage high dimensional data and that they do not assume a parametric relationship
between model predictors and outcomes. One of the disadvantages is the need to select an appropriate kernel function for
transforming the covariates; this is non-trivial for a particular classification task, as in practice several alternatives need to be
considered and compared by means of cross-validation or other methods [35]. Examples of SVM in machine condition
classification include [36–41].
In addition to the scarcity of LR models in the condition monitoring literature is the limited number of comparison
studies between LR, ANN and SVM. Table 1 shows that most of the focus on comparing the performance of these binary classifiers in machine condition monitoring has been on ANN and SVM [36,37,42,43], with only a few papers comparing LR, ANN and SVM [10,16,44]. In comparison, outside the machine condition monitoring sector there are many more examples comparing LR
with other binary classification models as shown in the review papers in [45,46]. We present a cross-validation study based
on our data obtained from oil analysts in industry to classify condition of engines on mining trucks and conclude that in this
case LR outperforms an ANN and SVM model in terms of predictive performance. Considering the additional insights
provided by LR, and the foremost importance of predictive capability in classification techniques for decision making, we
argue that LR deserves greater recognition of its potential in industrial condition monitoring.
The paper proceeds as follows. Section 2 describes the oil condition data for mining trucks with which we demonstrate
LR and perform the cross-validation comparison study. These data sets are obtained directly from industry and are typical of
those used for condition-based maintenance strategies in the mining industry. Section 3 describes the LR model,
Section 4 outlines the CCNN, a form of neural network model, to which we compare the LR model and Section 5 does the
same for SVM. Section 6 compares the predictive performances of LR, CCNN and SVM and describes the interpretability of the LR models. Section 7 addresses the issue of specificity and sensitivity and describes how to use the results to address misclassification and to tune optimal probability classification levels in LR. Section 8 concludes the paper.

Table 1
Examples of comparisons of LR, ANN and SVM models.

Classifiers compared    Field                Reference    Year
LR, ANN & SVM           Machine condition    [16]         2014
LR, ANN & SVM           Machine condition    [10]         2010
LR, ANN & SVM           Machine condition    [44]         2010
ANN & SVM               Machine condition    [28]         2015
ANN & SVM               Machine condition    [47]         2012
ANN & SVM               Machine condition    [36]         2005
ANN & SVM               Machine condition    [42]         2005
ANN & SVM               Machine condition    [37]         2004
ANN & SVM               Machine condition    [43]         2002

2. Data sets

Data for testing the LR, CCNN and SVM models is provided by a commercial oil analysis facility focussed on servicing
the heavy mobile equipment market. The laboratory processes over 400,000 oil samples a year providing data from a range
of tests aimed at determining the condition of the oil and providing information on the sub-system's condition. Oil samples
in this study are taken from trucks used in the mining industry and specifically from one sub-system, the diesel engine.
Analysis of the oil samples produces data for about 30 oil condition variables.
All sample test results are reviewed by experienced analysts with the assistance of a software decision support
tool based on statistical process control methods. The analysts manually classify the samples into one of four classes from ‘A’ to
‘X’. ‘A’ samples have oil properties that are within acceptable limits. These are designated “good” samples and are indicative of
a “healthy” sub-system. ‘X’ samples have clear contamination needing immediate diagnostic and corrective action to prevent
possible failure. In between there are ‘B’ and ‘C’ which represent deterioration stages between ‘A’ and ‘X’. ‘B’, ‘C’, and ‘X’
samples are collectively described in this paper as representing a “not good” oil sample and are indicative of deteriorating, or
in the case of ‘X’ severely deteriorated, sub-systems. Sub-systems with ‘X’ samples should be removed from service.
A set of explanatory variables deemed important were selected based on expert knowledge of the oil analysts, in
conjunction with some exploratory data analysis on our behalf. The final variables were a consensus between us and the oil
analysts. Table 2 provides a summary of the six oil analysis variables in the explanatory variable set and their relationship
with engine failure modes and degradation processes. For the analysis described below variables Na, Si and Cu were highly
skewed to the right and were therefore transformed into categorical variables indicating whether or not the value lay above
or below the median [48]. Also included in the data set are two explanatory variables that account for the age of the oil in
the sub-system and the previous health classification of sub-system described as prevB, prevC and prevX. When comparing
model performance later in the paper, the same explanatory variables are used in each of the three classifiers.
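For illustration, the median-split transformation described above can be coded as in the following R sketch. The data frame and column names here are hypothetical (the raw data file is not distributed with the paper):

```r
# Hypothetical example of the median-split transformation of the skewed
# variables Na, Si and Cu into two-level categorical indicators.
oil <- read.csv("engine_oil_samples.csv")   # assumed file name and layout

for (v in c("Na", "Si", "Cu")) {
  med <- median(oil[[v]], na.rm = TRUE)
  oil[[paste0(v, "_bin")]] <- factor(oil[[v]] > med,
                                     levels = c(FALSE, TRUE),
                                     labels = c("belowMedian", "aboveMedian"))
}
```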
The data set chosen is from the engine sub-system on a fleet of 60 off-highway trucks at a mine site. There are 1332
samples, of these 15.8% (211) are ‘healthy’ (A) and the remainder are ‘not healthy’ (Not-A). The aim of this project is to see if
the classifier can correctly determine the ‘A’ and ‘Not-A’ samples. Only the ‘Not-A’ samples need to be passed on to the
analysts for further evaluation. This manual classification by the analysts is necessary for samples indicating significant deterioration because differences in truck sub-system wear rates, oil types, sampling times and methods, environmental conditions and the way different machines are used on different sites may influence the exact classification. Correctly classified ‘A’ samples should not need to be presented to the analyst for inspection, thereby redirecting the analysts' workload to the more pressing ‘Not-A’ samples.

3. Binary logistic regression model

Logistic regression is used to model the probability of classification into an ‘A’ or ‘Not-A’ oil sample. Let $Y_i$ indicate the classification of the $i$th oil sample such that $Y_i = 1$ if the oil sample is classified as ‘A’ and $Y_i = 0$ otherwise. Then let $\pi_i = P(Y_i = 1 \mid X_i)$, where $X_i$ is a $1 \times (p+1)$ vector with first element equal to 1 and the remaining elements corresponding to the $p$ oil characteristics for oil sample $i$. The LR model relates $\pi_i$ to the oil sample characteristics by the logit function:

$$\mathrm{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = X_i \beta \qquad (1)$$

where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$ is the vector of regression coefficients. A simple and common predictive classification procedure is to allocate an unobserved response to ‘A’ if $\pi_i > 0.5$ and to ‘Not-A’ otherwise.

Table 2
Engine failure modes and degradation processes for key condition variables.

Variable         Problem areas and causes (engines)
Iron (Fe)        Associated with wear of cylinder liners, pistons, crankshaft, and valves
Sodium (Na)      Can indicate coolant contamination, which is indicative of problems with cylinder head gaskets, oil cooler cores, or internal coolant passages
Silicon (Si)     Associated with dirt and dust contamination; can cause wear problems due to breakdown in lubrication
Lead (Pb)        Indicative of bearing wear
Copper (Cu)      Indicative of wear in bushings, washers, rocker arm
Oxidation (OXI)  Results from the oxidation that can occur in engine oils; this leads to changes in viscosity, acid generation and deposit formation
The regression coefficients of the LR model are also easily interpreted, as follows. Define the odds of observing an ‘A’ as $\pi/(1-\pi)$. Now, say we are only concerned with the oil characteristics Fe and Si. Then, for a given level of Si, a one unit increase in Fe will multiply the odds of observing an ‘A’ by $\exp(\beta_{\mathrm{Fe}})$, where $\beta_{\mathrm{Fe}}$ is the regression coefficient for Fe. A positive/negative regression coefficient implies that a one unit rise in Fe will result in an increase/decrease in the probability of observing an ‘A’. In practice, an increase in Fe or Si is expected to coincide with a decrease in the probability of observing an ‘A’ sample.
In the analysis that follows, we estimate the LR model parameters using maximum likelihood and the open source
statistical software R [49]. We develop graphs illustrating estimated effects of the oil characteristics on the probability of an
‘A’ and the estimated effects on the odds ratio. We also compare the LR classification method with the CCNN and SVM
procedures using an identical data set.
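As a concrete illustration (a sketch only, not the scripts used in this study), the model can be fitted in R with glm(). The data frame oil and its columns continue the hypothetical naming of the sketch in Section 2; prev is assumed to be a factor with levels 'A', 'B', 'C' and 'X', so that R creates the prevB/prevC/prevX dummy variables reported in Table 3:

```r
# Fit the binary LR model by maximum likelihood and classify at a 0.5 cut-off.
oil$classA <- as.integer(oil$class == "A")    # 1 = 'A', 0 = 'Not-A'

fit <- glm(classA ~ Fe + Pb + OXI + Oilhours + Cu_bin + Si_bin + Na_bin + prev,
           data = oil, family = binomial(link = "logit"))

summary(fit)       # coefficients and p-values (cf. Table 3)
exp(coef(fit))     # odds ratios

p_hat <- predict(fit, type = "response")      # estimated probabilities of 'A'
pred  <- ifelse(p_hat > 0.5, "A", "Not-A")    # simple 0.5 cut-off rule
```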

4. Cascade-correlation neural network approach

We use a supervised neural network learning algorithm based on cascade-correlation [30]. Instead of adjusting the
weights in a network of fixed topology, cascade-correlation begins with a minimal network, then automatically trains and
adds new hidden nodes one by one, creating a multi-layer structure. This layer of connections is trained to minimise the
error $E$ between $y_{op}$, the observed value of output $o$ for training pattern $p$, and the target output $t_{op}$ [32]:

$$E = \frac{1}{2}\sum_{o,p}\left(y_{op} - t_{op}\right)^2 \qquad (2)$$

The error $E$ is minimised by gradient descent using

$$e_{op} = \left(y_{op} - t_{op}\right) f'_p \qquad (3)$$

and

$$\frac{\partial E}{\partial \omega_{oi}} = \sum_p e_{op} I_{ip}, \qquad (4)$$

where $e_{op}$ is the output error at $o$ for training pattern $p$, $f'_p$ is the derivative of the sigmoid activation function of the output for pattern $p$, $I_{ip}$ is the value of input $i$ and $\omega_{oi}$ is the weight connecting input $i$ to output $o$.
To create a new hidden node, a pool of candidate nodes is created. Each candidate node receives weighted connections
from the network's inputs and from any hidden nodes already present in the network. The outputs of these candidate nodes
are not connected to the active network. Multiple passes are run with the training set and each candidate node adjusts its
incoming weights to maximise the correlation $C$ between the candidate node's output $y_p$ and the residual error $e_{op}$ in the output of the active network. This correlation is computed over all training patterns $p$. $C$ is defined as

$$C = \sum_o \left| \sum_p \left(y_p - \bar{y}\right)\left(e_{op} - \bar{e}_o\right) \right| \qquad (5)$$

where $\bar{y}$ and $\bar{e}_o$ are the averages of $y_p$ and $e_{op}$ over all patterns $p$. The best candidate is selected when the correlation scores stop improving. This candidate is then added to the network, with its associated weights, as a permanent node. The entire network is then retrained with the new node until the error is minimised [50]. For the engine oil analysis 80 hidden nodes gave the best classification accuracy. This high number of nodes indicates that the data set contains multiple relationships between the parameters and that higher order transfer functions are required for the hidden nodes to correctly convey the input data. The CCNN analysis is performed with the NeuroShell Classifier [51].
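The quantities in Eqs. (2)-(5) can be expressed compactly as R functions. The sketch below is purely illustrative of the definitions; it is not the NeuroShell implementation and omits the full cascade-correlation training loop:

```r
# y, targ:  network outputs and targets, (patterns p) x (outputs o) matrices
# fprime:   derivative of the output activation, same dimensions as y
# I:        input values, (patterns p) x (inputs i) matrix
# yc:       candidate node output, vector of length p
# e:        residual output errors, (patterns p) x (outputs o) matrix

network_error <- function(y, targ) {
  0.5 * sum((y - targ)^2)                           # Eq. (2)
}

output_gradient <- function(y, targ, fprime, I) {
  e <- (y - targ) * fprime                          # Eq. (3)
  t(e) %*% I                                        # Eq. (4): (outputs o) x (inputs i)
}

candidate_correlation <- function(yc, e) {
  # Eq. (5): summed absolute covariance between candidate output and residual errors
  sum(abs(colSums((yc - mean(yc)) * sweep(e, 2, colMeans(e)))))
}
```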

5. Support vector machine approach

In classification, support vector machines separate the different classes of data by a hyperplane

$$\langle w, \Phi(x) \rangle + b = 0 \qquad (6)$$

corresponding to the decision function

$$f(x) = \operatorname{sign}\left(\langle w, \Phi(x) \rangle + b\right) \qquad (7)$$

The hyper-plane is constructed by solving a constrained quadratic optimization problem whose solution $w$ has an expansion $w = \sum_i \alpha_i \Phi(x_i)$ in terms of a subset of training patterns that lie on the margin. These training patterns, called support vectors, carry all relevant information about the classification problem. The main difference between LR and SVM is that, although LR models the probability of outcomes explicitly, the SVM tries to find the best dividing hyper-plane regardless of the probability of class membership [52]. This latter property is also a disadvantage of SVMs, as the classification result is dichotomous and no probability of class membership is available. The SVM analysis is performed in R [49].
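For completeness, a hedged sketch of an SVM fit in R is shown below. The paper does not specify the package or kernel used for its SVM analysis, so the e1071 package with a radial basis kernel is assumed here, and the data frame continues the earlier hypothetical sketches:

```r
# Assumed setup: e1071::svm() with an RBF kernel on the same explanatory variables.
library(e1071)

oil$class2 <- factor(ifelse(oil$classA == 1, "A", "NotA"))
svm_fit <- svm(class2 ~ Fe + Pb + OXI + Oilhours + Cu_bin + Si_bin + Na_bin + prev,
               data = oil, kernel = "radial", cost = 1)

svm_pred <- predict(svm_fit, newdata = oil)   # class labels only, no probabilities
table(svm_pred, oil$class2)
```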

6. Results

This section first describes the results of the LR model estimated from the data described in Section 2 and the interpretation this model is able to provide to the oil analysts. Afterwards, the comparative study of the predictive performance of the LR, CCNN and SVM approaches is presented. Ten variables are considered for the model comparison, based on the expert opinion of the oil analysts, who identified the variables shown in Table 2, the oil life (oil hours) and the results of the previous analysis (prevB/C/X) as being the most important.

6.1. LR classification model and performance

Table 3 presents a summary of the binary LR model. For each variable this includes the coefficient, odds ratio and p-value
(Pr(>|z|)). Note that the explanatory variable indicating the previous classification of an oil sample is a factor recording ‘A’,
‘B’, ‘C’ or ‘X’. This results in three dummy variables included in the model indicating whether the previous classification was
‘B’ (prevB), ‘C’ (prevC) or ‘X’ (prevX). The previous classification ‘A’ is absorbed into the intercept. Similarly, the transformed
variables Na, Si and Cu are dummy variables with the lower category absorbed into the intercept.
The p-values indicate that nine of the ten variables significantly predict the classification outcome at a 5% level. The
results show that the most significant predictors of a ‘Not-A’ classification are the prevB/C/X dummy variables: a previous classification of ‘B’, ‘C’ or ‘X’ (relative to ‘A’) increases the probability that the current sample will be ‘Not-A’. After these, Si, Cu and Na are the most influential variables.
Table 3 implies that the estimated logit model is:

$$\begin{aligned}
\mathrm{logit}(\pi_i) = {} & 4.3712 - 0.2520\,\mathrm{Fe} - 0.4496\,\mathrm{Pb} - 0.1033\,\mathrm{OXI} + 0.0044\,\mathrm{Oilhours} \\
& - 0.7511\,\mathrm{Cu} - 0.8569\,\mathrm{Si} - 0.7030\,\mathrm{Na} - 1.4813\,\mathrm{prevB} \\
& - 2.0613\,\mathrm{prevC} - 2.0148\,\mathrm{prevX} \qquad (8)
\end{aligned}$$

Table 3
Binary LR model—engine.

Variable     Coefficient    Odds ratio    p-Value
Intercept     4.3712        NA            0.0000
Fe           -0.2520        0.7772        0.0000
Pb           -0.4496        0.6379        0.0005
OXI          -0.1033        0.9019        0.1310
Oilhours      0.0044        1.0044        0.0000
Cu           -0.7511        0.4719        0.0008
Si           -0.8569        0.4245        0.0002
Na           -0.7030        0.4950        0.0011
prevB        -1.4813        0.2274        0.0000
prevC        -2.0613        0.1273        0.0000
prevX        -2.0148        0.1333        0.0021

Fig. 1. Effect of Fe and Si on the probability of the occurrence of an ‘A’ sample in the engine.

6.2. Relationship between explanatory variables and oil condition

The variable coefficients are illustrative in explaining the relationship that specific oil conditions have with the
classification outcome. For the engine sub-system the odds ratio for Pb is 0.64, which implies that a 1 unit (ppm) increase in
Pb decreases the odds of an oil sample being classified as ‘A’ by 36% (or a multiplicative factor of 0.64).
Fig. 1 below is a plot illustrating information attainable from the LR model. It shows the probability of a sample being
classified as an ‘A’ for a given level of Fe and Si. Si and Fe are used for illustrative purposes due to their relationship with
critical failure modes in engines. Assume for the moment that the engine is new, such that Oilhours = 0 and the previous classification is A, and that Cu and Na belong to their lower categories and the OXI and Pb levels are zero. Then, if an oil sample
has 7.5 ppm Fe and the level of Si is less than the median (Bin 1) then the estimated probability of it being classified as an A
sample is 0.92. For the same level of Fe, if the Si value is above the median level (Bin 2) then the estimated probability of it
being classified as an A sample is 0.83.
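These figures can be checked directly against the fitted model in Eq. (8). With all other terms at their baseline values, the Bin 1 case reduces to (small rounding differences aside):

$$\mathrm{logit}(\pi) = 4.3712 - 0.2520 \times 7.5 \approx 2.48, \qquad \pi = \frac{1}{1 + e^{-2.48}} \approx 0.92$$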
We may extract other information from Fig. 1. An often cited cut-off probability for classification is 0.5—above 0.5 we classify as
‘A’ and ‘Not-A’ otherwise. We can see from Fig. 1 that when Si is below its median the cut-off probability for classification occurs at a
Fe level of 17.33 ppm. When Si is above its median value the cut-off probability for classification occurs at an Fe level of 13.93 ppm.
Also, the greatest difference in the probability of ‘A’ between the two levels of Si occurs at a Fe level of 17.76 ppm.
Similar figures can be easily developed for other explanatory variable combinations to allow the effects that selected
elements have on the oil classification to be more easily understood. Fig. 2, for example, highlights the effect of the previous classification on the current classification. If an oil sample has a Fe level of 15 ppm and the sub-system was
classified A on the previous occasion, then the estimated probability of it being classified as an A sample is 0.64. For the same
level of Fe, if the sub-system was classified C on the last occasion, then the estimated probability of it being an A sample is 0.19.
From a classification point of view this suggests that knowing the oil sample history is necessary when interpreting this oil data
set. This observation aligns with the experience of the oil analysts which helps to build confidence in the use of the model.

6.3. Comparison between binary LR, CCNN and SVM models

Fig. 3 compares the predictive performance of the LR, CCNN and SVM models. By predictive performance we mean the
percentage of correctly classified samples in an unobserved or hold-out data set. For LR we classify as a 1 those hold-out
observations whose estimated probabilities are greater than a 0.5 cut-off and as a zero if the estimated probabilities are less than 0.5.
Each data set is randomly divided into a training and a test set: 80% of the samples are used to fit the model, and the fitted model is then used to predict the remaining 20% of the data. The 20% of data in the test set is used only for model testing and is separate from the training set. This was performed on each data set over 50 repetitions to obtain a distribution of the
predictive performance for each model. These distributions are shown in the boxplot below. This process of multiple training-test
set repetition is important as all too often results are shown for only one training and one test set, whereas one can see from the
distributions that there is quite a spread of performance available when multiple training/test sets are used.
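A minimal R sketch of this repeated 80/20 hold-out procedure for the LR model is given below (illustrative only; the seed is arbitrary, the column names continue the earlier hypothetical sketches, and the CCNN and SVM fits would replace the glm() call):

```r
# Repeated 80/20 training/test splits; accuracy = proportion of the hold-out
# set classified correctly at the 0.5 cut-off.
set.seed(1)
accuracy <- replicate(50, {
  idx   <- sample(nrow(oil), size = round(0.8 * nrow(oil)))
  train <- oil[idx, ]
  test  <- oil[-idx, ]

  fit_cv <- glm(classA ~ Fe + Pb + OXI + Oilhours + Cu_bin + Si_bin + Na_bin + prev,
                data = train, family = binomial)
  p_test <- predict(fit_cv, newdata = test, type = "response")

  mean(as.integer(p_test > 0.5) == test$classA)
})
boxplot(accuracy, ylab = "Proportion of test samples correctly classified")
```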
The same training and test data sets were presented to each model and assessed using the same explanatory variables as
listed in Table 3. Using the measure of the percentage of all samples classified correctly (% ‘A’ correctly classified
as ‘A’ and the % ‘Not-A’ correctly classified as ‘Not-A’), Fig. 3 shows that the LR outperforms both the CCNN and SVM models.

Fig. 2. Effect of previous history (prevA, B, C, X) and Fe levels on the probability of the occurrence of an ‘A’ sample in the engine.

Fig. 3. Comparison of the predictive performance of the LR, CCNN and SVM models in terms of the percentage of all samples classified correctly.

An additional factor that emerged in this work was the response of industry partners to the results. The black box nature
of both the CCNN and SVM processes was a concern to them. They felt that the processes of developing the CCNN and SVM
required too many steps and a high level of technical support, for example in choosing which ANN algorithm to use and, for the SVM, which kernel. They did not want to be reliant on external people when it came to updating the model and checking its accuracy on an ongoing basis. In comparison, the logistic regression model was simple to explain, and they could reconcile the weights of the explanatory variables with their understanding of failure modes for these engines. This need for consumer acceptance of the classification method in an
industry setting has not been well explored in the literature.

7. Specificity and sensitivity

Table 4 is the classification table for the engine's binary LR model using a cut-off probability of 0.5. Of the 1332 oil samples, the LR model classified 144 samples as ‘A’ and 1188 samples as ‘Not-A’. Of the 144 samples classified as ‘A’ by the LR model, 38 (26.4%) are actually ‘Not-A’, that is, 38/(106 + 38) × 100 = 26.4%. The percentage of actual ‘A’ samples correctly classified is the sensitivity (106/211 = 50.2%) and the percentage of ‘Not-A’ samples correctly classified is the specificity (1083/1121 = 96.6%). Of most significance when implementing a model in an automated classification system is the number of false positives, that is, ‘Not-A’ samples misclassified as ‘A’. For example, the misclassification of an ‘X’ sample as an ‘A’ sample could contribute to an unplanned machine failure. Therefore, as a priority, an effective model must correctly classify ‘Not-A’ samples. Although smaller, there is also a cost in classifying an ‘A’ sample as ‘Not-A’, as the sample will be passed on to an analyst who then has to spend time analysing it.

Table 4
Classification table for the engine binary LR model—cut-off level of 0.5.

                      Actual ‘A’    Actual ‘Not-A’
Classified ‘A’        106           38
Classified ‘Not-A’    105           1083

Fig. 4. Classification cut-off curve for a weighted error function assuming an α value of 1/11.
In order to account for the different costs associated with misclassification, a weighted error function with weightings for the different errors is introduced [53]:

$$\mathrm{Error} = \alpha\,(1 - \mathrm{Sensitivity}) + (1 - \alpha)\,(1 - \mathrm{Specificity}) \qquad (9)$$

The α value determines the final weightings of sensitivity and specificity and is therefore subjective. It depends on the
costs assigned to the error of classifying an ‘A’ sample as ‘Not-A’ and the error of classifying a ‘Not-A’ sample as an ‘A’ sample
[54] and a decision on the α value is usually made by a specialist in the field. For mining trucks this decision is influenced by
the type of sub-system being analysed, the consequence of failure, and the oil sampling interval.
Minimising the error function leads to a more risk-averse approach, resulting in fewer misclassifications that have a critical outcome. As an example, if the cost of classifying a ‘Not-A’ sample as an ‘A’ is 10 times greater than that of classifying an ‘A’ sample as ‘Not-A’, then the value of α would be equal to 1/11. Fig. 4 shows the error function, assuming an α value of 1/11, plotted over a range of cut-off values. The optimum cut-off value to minimise this error function corresponds to a level of 0.58. This compares to the cut-off of 0.5 used in our analysis.
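This tuning step can be sketched in R as follows (continuing the earlier hypothetical sketches, with p_hat the fitted LR probabilities): sweep the cut-off and minimise the weighted error of Eq. (9) with α = 1/11.

```r
# Find the cut-off that minimises the weighted error of Eq. (9).
alpha   <- 1 / 11
cutoffs <- seq(0.01, 0.99, by = 0.01)

weighted_error <- sapply(cutoffs, function(cut) {
  pred_A      <- as.integer(p_hat > cut)              # 1 = classified 'A'
  sensitivity <- mean(pred_A[oil$classA == 1] == 1)   # actual 'A' correctly classified
  specificity <- mean(pred_A[oil$classA == 0] == 0)   # actual 'Not-A' correctly classified
  alpha * (1 - sensitivity) + (1 - alpha) * (1 - specificity)
})

cutoffs[which.min(weighted_error)]   # optimum cut-off (about 0.58 for these data)
plot(cutoffs, weighted_error, type = "l",
     xlab = "Cut-off probability", ylab = "Weighted error")   # cf. Fig. 4
```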
Receiver operating characteristic curves (ROC) are commonly used in diagnostic problems to determine an optimum cut-
off value for binary classification systems [55]. Fig. 5 is the ROC curve for the LR model. It is created by plotting the true
positive rate (sensitivity) against the false positive rate (one minus the specificity) for a range of different cut-off values. The point
on the curve that is nearest to the upper left corner corresponds to the cut-off value that will maximise the sensitivity and
specificity of the classification [48]. The area under the ROC curve (AUC) can be used to provide a measure of an LR model's
discrimination [56]. In this study, discrimination is the model's ability to distinguish between oil samples that are ‘A’
samples and those that are ‘Not-A’ samples. An area of 0.5 indicates no discrimination between the two outcomes and an
area greater than 0.8 is considered excellent discrimination [48]. The AUC for the LR model is 0.915.
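The ROC curve and AUC can be reproduced along the following lines; the paper does not name the software used for this step, so the pROC package is assumed in this sketch:

```r
# ROC curve and AUC for the fitted LR model (p_hat from the earlier sketch).
library(pROC)

roc_obj <- roc(response = oil$classA, predictor = p_hat)
plot(roc_obj)    # cf. Fig. 5
auc(roc_obj)     # the paper reports an AUC of 0.915
```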
We suggest that when presenting classification models an understanding of how the classification performance is
affected by cut-off values and a discussion on the trade-offs between model sensitivity and specificity will enhance the
reader’s understanding of the model’s potential for their application.

Fig. 5. Receiver operating characteristic (ROC) curve for the LR model.

8. Conclusion

The performance of LR, CCNN and SVM models has been evaluated for classification of oil sample data from mining
trucks. Identical data sets and explanatory variables are used in all three models. In evaluating the results we consider total
accuracy of the classification, ease of the model building and updating process, and transparency to the analyst of how changes in the variables affect the model output.
The LR model demonstrated an ability to classify approximately 89% of the oil samples correctly. Using a cross-validation
study with the data set analysed in the article, LR outperformed both the CCNN and SVM models in terms of predictive
performance. Furthermore, the LR model offers a parametric framework to estimate the probability of the health of a
machine, conditioned on explanatory variables whose effects on the classification are easily interpreted in terms of
regression coefficients or the odds ratio. For example, we can see in the LR model the importance of past history on future classification. This is consistent with the analysts' own observations of the importance of this information. In addition, how changes in the values of individual variables, e.g. Fe, affect the probability of classification as ‘A’ or ‘Not-A’ can also be explored. This ability to determine the effect of individual explanatory variables on the classification is not easy to see using the CCNN and SVM approaches. Transparency in how the classification works is important in building trust in the diagnosis approach.
The ability of industry to perform the modelling and distribute the outputs to users via existing business systems is
dependent on skilled personnel (in-house or contracted) and computing infrastructure [57]. It is relatively straightforward
to build an LR model and to update the model as new data arrives and no specialist commercial software is required. LR
models are likely to be an accessible concept for working engineers, unlike ANN and SVM which often require specialist
training. This ability to manage the modelling in-house by personnel working for the oil analysis company is an important
factor when considering automation of the analysis.

References

[1] J.M. Benitez, J.L. Castro, I. Requena, Are artificial neural networks black boxes? IEEE Trans. Neural Networks 8 (5) (1997) 1156–1164.
[2] B. Chen, X. Chen, B. Li, Z. He, H. Cao, G. Cai, Reliability estimation for cutting tools based on logistic regression model using vibration signals, Mech.
Syst. Sig. Process. 25 (7) (2011) 2526–2537.
[3] W. Caesarendra, A. Widodo, B.-S. Yang, Application of relevance vector machine and logistic regression for machine degradation assessment, Mech.
Syst. Sig. Process. 24 (4) (2010) 1161–1171.
[4] J. Yan, J. Lee, Degradation assessment and fault modes classification using logistic regression, J. Manuf. Sci. Eng. 127 (4) (2005) 912–914.
[5] X. Di, Z. Wenbiao, Reliability Prediction Using Multivariate Degradation Data, Annual Reliability and Maintainability Symposium (RAMS), Alexandria,
VA, 2005, 337–341.
[6] J. Zhang, H. Nie, Experimental study and logistic regression modeling for machine condition monitoring through microcontroller-based data
acquisition system, J. Adv. Manuf. Syst. 8 (2) (2009) 177–192.
[7] F.-T. Wu, C.-C. Wang, J.-H. Liu, C.-M. Chang, Y.-P. Lee, Construction of wind turbine bearing vibration monitoring and performance assessment system,
J. Signal Inf. Process. 4 (2013) 430–438.
[8] A. Aggarwal, V. Sugumaran, M. Amarnath, H. Kumar, Fault diagnosis of helical gear box using logistic function and REP Tree, Int. J. Res. Mech. Eng. 2 (1)
(2014) 26–32.
[9] K.S. Park, Condition-based predictive maintenance by multiple logistic function, IEEE Trans. Reliab. 42 (4) (1993) 556–560.
[10] J. Raza, J.P. Liyanage, H. Al Atat, J. Lee, A comparative study of maintenance data classification based on neural networks, logistic regression and support vector machines, J. Qual. Maintenance Eng. 16 (3) (2010) 303–318.

[11] K.E. Spezzaferro, Applying logistic regression to maintenance data to establish inspection intervals, in: Annual Reliability and Maintainability
Symposium (RAMS) 1996, pp. 296-300.
[12] H. Liao, W. Zhao, H. Guo, Predicting remaining useful life of an individual unit using proportional hazards model and logistic regression model, in:
Annual Reliability and Maintainability Symposium (RAMS), Newport Beach, CA, 2006, pp. 127-132.
[13] X. Cao, P. Jiang, G. Zhou, Facility health maintenance through SVR-driven degradation prediction, Int. J. Mater. Prod. Technol. 33 (1/2) (2008) 185–193.
[14] W. Liu, X. Zhong, J. Lee, L. Laio, M. Zhou, Application of a novel method for machine performance degradation assessment based on gaussian mixture
model and logistic regression, Chin. J. Mech. Eng. 24 (5) (2011) 879.
[15] B. Rai, A study of classification models to predict drill-bit breakage using degradation signals, Int. J. Soc. Manage. Econ. Bus. Eng. 8 (8) (2014)
2326–2329.
[16] D.H. Pandya, S.H. Upadhyay, S.P. Harsha, Fault diagnosis of rolling element bearing by using multinomial logistic regression and wavelet packet
transform, Soft Comput. 18 (2014) 255–266.
[17] N.B. Jones, Y.-H. Li, A review of condition monitoring and fault diagnosis for diesel engines, Tribotest 6 (3) (2000) 267–291.
[18] S.A. Kalogirou, Artificial intelligence for the modeling and control of combustion processes: a review, Prog. Energy Combust. Sci. 29 (6) (2003) 515–566.
[19] R.P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Mag. (1987).
[20] K.J. Anil, M. Jianchang, K.M. Mohiuddin, Artificial neural networks: a tutorial, IEEE Comput. Mag. (1996) 31–44.
[21] A.K.S. Jardine, D. Lin, D. Banjevic, A review of machinery diagnostics and prognostics implementing condition-based maintenance, Mech. Syst. Sig.
Process. 20 (2006) 1483–1510.
[22] B.A. Paya, I.I. Esat, M.N.M. Badi, Artificial neural network based fault diagnostics of rotating machinery using wavelet transforms as preprocessor,
Mech. Syst. Sig. Process. 11 (5) (1997) 751–765.
[23] J.-D. Wu, C.-H. Liu, Investigation of engine fault diagnosis using discrete wavelet transform and neural network, Expert Syst. Appl. 35 (2008) 1200–1213.
[24] A. Malhi, R. Gao, PCA-based feature selection scheme for machine defect classification, IEEE Trans. Instrum. Meas. 53 (6) (2004) 1517–1525.
[25] J. Zhu, Marine diesel engine condition monitoring by use of BP neural network, in: Proceedings of the International MultiConference of Engineers and
Computer Scientists, Hong Kong, 2009, pp. 1–4.
[26] A.C. McCormick, A.K. Nandi, Classification of the rotating machine condition using artificial neural networks, Proc. Inst. Mech. Eng. Part C J. Mech. Eng.
Sci. 211 (1997) 439–449.
[27] J. Porteiro, J. Collazo, D. Patiño, J.L. Míguez, Diesel engine condition monitoring using a multi-net neural network system with nonintrusive sensors,
Appl. Therm. Eng. 31 (17–18) (2011) 4097–4105.
[28] W. Li, Z. Zhu, F. Jiang, G. Zhou, G. Chen, Fault diagnosis of rotating machinery with a novel statistical feature extraction and evaluation method, Mech.
Syst. Sig. Process. 50-51 (2015) 414–426.
[29] B. Murugabatham, M.A. Sanjith, B. Krishnakumar, S.A.V. Satya Murty, Roller element bearing fault diagnosis using singular spectrum analysis, Mech.
Syst. Sig. Process. 35 (2013) 150–166.
[30] S.E. Fahlman, C. Lebiere, The cascade-correlation learning architecture, in: D.S. Touretzky (Ed.), Advances in Neural Information Processing Systems 2,
Morgan-Kaufmann: Los Altos, CA, 1990.
[31] D.S. Phatak, Connectivity and performance trade-offs in the cascade correlation learning architecture, IEEE Trans. Neural Networks 5 (6) (1994)
930–935.
[32] J.K. Spoerre, Application of the cascade correlation algorithm (CCA) to bearing fault classification problems, Comput. Ind. 32 (1997) 295–304.
[33] W.-M. Lin, C.-H. Wu, C.-H. Lin, K.-P. Tu, Multiple harmonic source detection of power system with cascade correlation network, in: IEEE International
Conference on Electric Utility Deregulation, Restructuring and Power Technologies (DRPT2004), Hong Kong, 2004, pp. 746–751.
[34] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed. John Wiley & Sons, New York, 2000.
[35] D. Westreich, J. Lessler, M.J. Funk, Propensity score estimation: neural network, support vector machines, decision trees (CART), and meta classifiers as
alternatives to logistic regression, J. Clin. Epidemiol. 63 (2010) 826–833.
[36] B.-S. Yang, W.-W. Hwang, D.-J. Kim, A.C. Tan, Condition classification of small reciprocating compressor for refrigerators using artificial neural
networks and support vector machines, Mech. Syst. Sig. Process. 19 (2) (2005) 371–390.
[37] B. Samanta, Gear fault detection using artificial neural networks and support vector machines with genetic algorithms, Mech. Syst. Sig. Process. 18 (3)
(2004) 625–644.
[38] M. Kang, J.-M. Kim, Singular value decomposition based feature extraction approaches for classifying faults of induction motors, Mech. Syst. Sig.
Process. 41 (2013) 348–356.
[39] R. Jegadeeshwaran, V. Sugumaran, Fault diagnosis of automobile hydraulic brake system using statistical features and support vector machines, Mech.
Syst. Sig. Process. 52-53 (2015) 436–446.
[40] N. Li, R. Zhou, Q. Hu, Z. Liu, Mechanical fault diagnosis based on redundant second generation wavelet packet transform, neighbourhood rough set and
support vector machine, Mech. Syst. Sig. Process. 28 (2012) 608–621.
[41] X. Zhang, J. Zhou, Multi-fault diagnosis for rolling element bearings based on ensemble empirical mode decomposition and optimized support vector machines, Mech. Syst. Sig. Process. 41 (2013) 127–140.
[42] D. Thukaram, H.P. Khincha, H.P. Vijaynarasimha, Artificial neural network and support vector machine approach for locating faults in radial
distribution systems, IEEE Trans. Power Delivery 20 (2) (2005) 710–721.
[43] L.B. Jack, A.K. Nandi, Fault detection using support vector machines and artificial neural networks, augmented by genetic algorithms, Mech. Syst. Sig.
Process. 16 (2) (2002) 373–390.
[44] Y. Cai, M.-Y. Chow, W. Lu, L. Li, Evaluation of distribution fault diagnosis algorithms using ROC curves, in: IEEE Power and Energy Society General Meeting, Minneapolis, MN, 2010, pp. 1–6.
[45] J.V. Tu, Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes, J. Clin. Epidemiol.
49 (11) (1996) 1225–1231.
[46] M. Paliwal, U.A. Kumar, Neural networks and statistical techniques: A review of applications, Expert Syst. Appl. 36 (2009) 2–17.
[47] T. Marwala, Condition Monitoring Using Computational Intelligence Methods—Applications in Mechanical and Electrical Systems, Springer-Verlag,
London, 2012.
[48] D.W. Hosmer, S. Lemeshow, R.X. Sturdivant, Applied Logistic Regression, third ed. John Wiley & Sons, Inc., New York, 2013.
[49] R Development Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria, 2012.
[50] M. Hoehfeld, S.E. Fahlman, Learning with limited numerical precision using the cascade-correlation algorithm, IEEE Trans. Neural Networks 3 (4)
(1992) 602–611.
[51] Neuroshell Classifier Software, 2013.
[52] J. Zhu, T. Hastie, Kernel logistic regression and the import vector machine, J. Comput. Graph. Stat. 14 (2005) 185–205.
[53] W. Krzanowski, D. Hand, ROC Curves for Continuous Data, CRC Press Taylor & Francis Group, Florida, 2009.
[54] J. Hilden, The area under the ROC curve and its competitors, Med. Decis. Making 11 (2) (1991) 95–101.
[55] C.E. Metz, Basic principles of ROC analysis, Semin. Nucl. Med. 8 (4) (1978) 283–298.
[56] J.A. Swets, Measuring the accuracy of diagnostic systems, Science 240 (1988) 1285–1293.
[57] J.Z. Sikorska, M. Hodkiewicz, L. Ma, Prognostic modelling options for remaining useful life estimation by industry, Mech. Syst. Sig. Process. 25 (5)
(2011) 1803–1836.
