Please cite this article in press as: A. Ebrahimzadeh, et al., Recognition of control chart patterns using an intelligent technique, Appl. Soft Comput. J. (2012), doi:10.1016/j.asoc.2012.02.019

ARTICLE IN PRESS
G Model ASOC-1491; No. of Pages 11
Applied Soft Computing xxx (2012) xxx–xxx
Contents lists available at SciVerse ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc
Recognition of control chart patterns using an intelligent technique
Ata Ebrahimzadeh, Jalil Addeh, Vahid Ranaee
Faculty of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran
a r t i c l e i n f o
Article history:
Received 18 September 2011
Received in revised form
25 December 2011
Accepted 28 February 2012
Available online xxx
Keywords:
Control chart patterns
Neural networks
Shape features
Statistical feature
Support vector machine
a b s t r a c t
Control chart patterns (CCPs) are important statistical process control tools for determining whether a process is running in its intended mode or in the presence of unnatural patterns. Automatic recognition of abnormal patterns in control charts is in increasing demand in modern manufacturing processes. This paper presents a novel hybrid intelligent method for recognition of the common types of CCP. The proposed method includes three main modules: the feature extraction module, the classifier module and the optimization module. In the feature extraction module, a proper set of shape features and statistical features is proposed as an efficient characterization of the patterns. In the classifier module, a multilayer perceptron neural network and a support vector machine (SVM) are investigated. In SVM training, the hyper-parameters play a very important role in recognition accuracy. Therefore, in the optimization module, an improved bees algorithm is proposed for selecting appropriate parameters of the classifier. Simulation results show that the proposed algorithm has very high recognition accuracy.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
Control charts have been widely used in modern industrial and service organizations. In recent years, various kinds of control charts have been developed according to different quality attributes and control targets. Monitoring process fluctuation with control charts was first proposed by Shewhart in 1924. It is believed that process fluctuation involves abnormal changes due to assignable causes and normal changes due to non-assignable causes. Therefore, automatically recognizing control chart patterns (CCPs) is an essential issue for identifying process fluctuation effectively. CCPs can exhibit six common types of pattern: normal (NOR), cyclic (CYC), increasing trend (IT), decreasing trend (DT), upward shift (US), and downward shift (DS). Except for the normal pattern, all other patterns indicate that the process being monitored is not functioning correctly and requires adjustment. Fig. 1 shows these six types of patterns [1].
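The six pattern classes above are commonly simulated with simple generative equations. As a rough illustration (the slope, shift and cycle magnitudes below are assumptions, not the paper's simulation settings), they can be sketched as:

```python
import numpy as np

def generate_ccp(kind, n=60, rng=None):
    """Generate one synthetic control chart pattern of length n.

    Uses the standard Monte-Carlo formulations from the CCP literature;
    the slope/shift/cycle magnitudes are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    t = np.arange(n)
    noise = rng.normal(0.0, 1.0, n)            # common-cause variation
    if kind == "NOR":                           # normal
        return noise
    if kind == "CYC":                           # cyclic, amplitude 2, period 8
        return noise + 2.0 * np.sin(2 * np.pi * t / 8)
    if kind == "IT":                            # increasing trend, slope 0.1
        return noise + 0.1 * t
    if kind == "DT":                            # decreasing trend
        return noise - 0.1 * t
    if kind == "US":                            # upward shift of 2 at t = 30
        return noise + 2.0 * (t >= 30)
    if kind == "DS":                            # downward shift of 2 at t = 30
        return noise - 2.0 * (t >= 30)
    raise ValueError(kind)
```

Each call returns one 60-point observation window, e.g. `x = generate_ccp("US", rng=0)`.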
In recent years, several studies have been performed on recognition of the unnatural patterns. Some researchers used expert systems [2,3]. The advantage of an expert system or rule-based system is that it contains the information explicitly. If required, the rules can be modified and updated easily. However, the use of rules based on statistical properties has the difficulty that similar statistical properties may be derived for patterns of different classes, which may create problems of incorrect

∗ Corresponding author.
E-mail addresses: abrahamzadeh@gmail.com (A. Ebrahimzadeh), vranaei@yahoo.com (V. Ranaee).
recognition. Also, ANNs have been widely applied as classifiers. ANNs can be simply categorized into two groups: supervised and unsupervised. Most researchers [4–6] have used supervised ANNs, such as the multilayer perceptron (MLP), radial basis function (RBF), and learning vector quantization (LVQ) networks, to classify different types of CCPs. Furthermore, unsupervised methods, e.g. self-organizing maps (SOM) and adaptive resonance theory (ART), have been applied to fulfill the same objective in other studies [7,8]. The advantage of a neural network is that it is capable of handling noisy measurements, requiring no assumption about the statistical distribution of the monitored data. It learns to recognize patterns directly from typical example patterns during a training phase. One disadvantage of a neural network is the difficulty in understanding how a particular classification decision has been reached, and in determining how closely a given pattern resembles a particular class. In addition, there is no systematic way to select the topology and architecture of a neural network. In general, this has to be found empirically, which can be time consuming.
Most of the existing techniques use the unprocessed data as the input of the CCP recognition system. The use of unprocessed CCP data has several problems; for example, the amount of data to be processed is large. On the other hand, approaches that use features are more flexible for dealing with a complex process problem, especially when no prior information is available. If the features represent the characteristics of the patterns explicitly, and if their components are reproducible with the process conditions, the classifier recognition accuracy will increase [9]. Further, if a feature is amenable to reasoning, it will help in understanding how a particular decision was made and thus makes recognition a transparent process. Features can be obtained in various
1568-4946/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2012.02.019
Fig. 1. Six basic control chart patterns: (a) normal pattern, (b) cyclic pattern, (c) upward trend, (d) downward trend, (e) upward shift, (f) downward shift.
forms, including principal component analysis, shape features [10,11], multi-resolution wavelet analysis [12,13] and statistical features [14]. Pham and Wani [10] introduced feature-based control chart pattern recognition. Nine geometric features were proposed: slope, number of mean crossings, number of least-square line crossings, cyclic membership, average slope of the line segments, slope difference, and three different measures of area. The scheme was aimed at improving the performance of the pattern recognizer by presenting a smaller input vector (features). Gauri and Chakraborty [11] also presented a set of the seven most useful features, selected from a large number of potentially useful features using a CART-based systematic approach. Based on these selected features, the eight most commonly observed CCPs were recognized using heuristic and ANN techniques. Chen et al. [12] presented a hybrid approach integrating the wavelet method and neural networks for on-line recognition of concurrent CCPs. In the hybrid system, concurrent CCPs are first preprocessed by a wavelet transform to decompose the concurrent patterns into different levels or patterns, and then the corresponding features are fed into back-propagation ANN classifiers for pattern recognition. Ranaee and Ebrahimzadeh [13] used the wavelet transform as the input of an SVM for CCP recognition; a genetic algorithm was used to improve the recognition performance of the SVM. Hassan et al. [14] conducted an experimental study using BPNs to identify six types of basic SPC patterns, in which the performances of two BPN recognizers using statistical features and raw data as inputs, respectively, were compared. The results indicated that the BPN using statistical features as input vectors performs better than the BPN using raw data as input vectors.
Based on the published papers, there exist some important issues in the design of automatic CCP recognition systems which, if suitably addressed, lead to the development of more efficient recognizers. One of these issues is the extraction of the features. In this paper, to obtain a compact set of features that captures the prominent characteristics of the CCPs in a relatively small number of components, statistical and shape features are applied. These features are presented in Section 2.
Another issue is related to the choice of the classification approach to be adopted. In this paper, a multilayer perceptron neural network and a support vector machine are investigated. The structure of the proposed classifier is shown in Fig. 2. It can be seen that this system is composed of two major decision layers. The first layer is a neural network classifier (Classifier 1). A literature review shows that systems using artificial neural networks (ANNs) as classifiers achieve high performance. The multilayer perceptron (MLP) neural network is simple and ideally suited for pattern recognition tasks [15], and therefore an MLP is used here as the pattern recognizer. In this layer, because a proper statistical feature (the mean) is used, we face a classification problem that neural networks can solve despite the aforementioned problem (local minima). The main problems lie with the second, third and fourth groups, which neural networks are not able to solve. Therefore, in the second layer a support vector machine is proposed to solve the problem. The proposed method is presented with more explanation in Section 5.
Using SVMs is a method that has been receiving increasing attention, with remarkable recent results [16]. The main difference between ANNs and SVMs is the principle of risk minimization. An ANN implements empirical risk minimization to minimize the error on the training data, whereas an SVM implements the principle of structural risk minimization in place of empirical risk minimization, which gives it excellent generalization ability when only a small sample is available. The proposed method is also shown in Fig. 3. The largest problems encountered in setting up the SVM model are how to select the kernel function and its parameter values. The parameters that should be optimized include the penalty parameter (C) and the kernel function parameters, such as the value of gamma (γ) for the radial basis function (RBF) kernel. Turning back to CCP recognition systems, it can be found that the selection of the best free parameters of the adopted classifier is
Fig. 2. Structure of the proposed classifier.
Fig. 3. Proposed method (flowchart: initialization of bees → evaluate fitness of bees → parameters selection → train MLP-SVM → classification; if the stopping criterion is met, obtain the optimized parameters and re-train the MLP-SVM, otherwise the IBA iterates).
generally done empirically. On the other hand, using an SVM has some difficulties, namely how to select the optimal kernel function type and the most appropriate hyper-parameter values for the SVM training and testing stages. Therefore, in this study we used an efficient optimizer called IBA for finding the optimum values of the hyper-parameters, i.e., the kernel parameter and the classifier parameters.
The rest of the paper is organized as follows. Section 2 explains the feature extraction. Section 3 presents the needed concepts, including the MLP neural network, the SVM and the original bees algorithm. Section 4 describes the proposed method, including the improved bees algorithm and the classifier. Section 5 shows simulation results, and finally Section 6 concludes the paper.
2. Feature extraction
Features represent the format of the CCPs. As we know, different types of CCP have different properties; therefore, finding suitable features in order to identify them is a difficult task. In the signal recognition area, choosing good features not only enables the classifier to distinguish more CCPs, but also helps reduce the complexity of the classifier. In this paper, for the feature extraction module we have used a suitable set of features that contains both shape and statistical information of the CCPs. Figs. 4–10 show these features. These features are briefly described as follows.
2.1. Statistical feature
Common statistical features are the mean, standard deviation, skewness, kurtosis, and autocorrelation. In this paper we have used the mean as Feature 1.
Fig. 4. APML of normal and cyclic patterns.
Fig. 5. APSL of upward-trend and upward-shift patterns.
Fig. 6. ASS of upward-trend and upward-shift patterns.
Feature 1 (mean): its mathematical form is shown below:

mean = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{1}

where X_i represents the input (reference) vector and n is the total length of the observation window.
Fig. 7. MVSASTI of upward-trend and upward-shift patterns.
Fig. 8. APSL of downward-trend and downward-shift patterns.
Fig. 9. ASS of downward-trend and downward-shift patterns.
2.2. Shape features
In [13], the authors introduced nine shape features for discrimination of the CCPs. In this paper, three of these features are considered as Features 2, 3 and 4. Feature 5 is proposed in this paper.
Feature 2 (APML): the area between the pattern and the mean line. The APML is lowest for a normal pattern; thus, this feature differentiates between normal and other patterns.
Feature 3 (APSL): the area between the pattern and its least-square line. Cyclic and shift patterns have a higher APSL value than
Fig. 10. MVSASTI of downward-trend and downward-shift patterns.
normal and trend patterns, and therefore the APSL can be used to differentiate cyclic and shift patterns from normal and trend patterns.
Feature 4 (ASS): the area between the least-square line and the line segments. The value of this feature is approximately zero for a trend pattern and is higher for a shift pattern. This feature thus differentiates trend patterns from shift patterns.
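Features 2–4 are defined geometrically in Ref. [13], whose exact computation is not reproduced in the text above. The following is therefore a rough sketch only: sums of absolute deviations stand in for the "areas", and a simple two-segment least-squares fit stands in for the line segments.

```python
import numpy as np

def shape_features(x):
    """Approximate Features 2-4 (APML, APSL, ASS).

    The exact definitions come from Ref. [13]; this is a plain
    absolute-deviation approximation, not the authors' exact code.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    # APML: area between the pattern and its mean line
    apml = np.abs(x - x.mean()).sum()
    # least-square line fitted to the whole pattern
    slope, intercept = np.polyfit(t, x, 1)
    ls_line = slope * t + intercept
    # APSL: area between the pattern and its least-square line
    apsl = np.abs(x - ls_line).sum()
    # ASS: area between the least-square line and line segments fitted
    # to the two halves of the pattern (a simple segmentation assumption)
    mid = n // 2
    seg = np.empty(n)
    for lo, hi in ((0, mid), (mid, n)):
        s, b = np.polyfit(t[lo:hi], x[lo:hi], 1)
        seg[lo:hi] = s * t[lo:hi] + b
    ass = np.abs(seg - ls_line).sum()
    return apml, apsl, ass
```

On a pure trend both APSL and ASS vanish, while a shift pattern keeps a large ASS, which is exactly the discrimination described above.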
Feature 5 (MVSASTI): the maximum value of variation in signal amplitude in a short time interval (the proposed shape feature). As depicted in Fig. 1, the maximum variation of signal amplitude over the full 60-second interval is approximately identical in shift and trend patterns. But in a shift pattern there is a large amount of variation in signal amplitude within one part of the signal, over a short time interval, which does not exist in a trend pattern. This difference can be used to separate these two patterns. So, the maximum of signal variation is calculated in a short time interval of length T: beginning from t = 0, the maximum value of signal variation is calculated from t to t + T. This process is repeated for t = 1, ..., 60 − T. Finally, the maximum variation over the different time intervals is selected as the desired feature. T can be any number between 5 and 10. This shape feature is new in this area. In this study T = 7 s is selected.
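A minimal sketch of Feature 5, assuming a sliding window of T samples over a 60-point pattern (the exact window convention is not fully specified in the text, so the window length here is an assumption):

```python
import numpy as np

def mvsasti(x, T=7):
    """Feature 5 (MVSASTI): maximum variation of the signal amplitude
    within any window of T samples (T = 7 as selected in the paper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return max(x[t:t + T].max() - x[t:t + T].min()
               for t in range(n - T + 1))
```

For a trend of slope 0.1 the windowed variation stays near 0.6, while a shift of magnitude 2 produces a window variation of about 2, so the feature separates the two as intended.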
3. Needed concepts
3.1. Multi-layer perceptron (MLP) neural network
An MLP neural network consists of an input layer (of source nodes), one or more hidden layers (of computation nodes) and an output layer. Recognition basically consists of two phases: training and testing. In the training stage, weights are calculated according to the chosen learning algorithm. The learning algorithm and its speed are very important for the MLP model. In this study, the resilient back-propagation (RPROP) algorithm is used.
RPROP considers the sign of the derivative as the indication of the direction of the weight update [17]. In doing so, the size of the partial derivative does not influence the weight step. The following equation shows the adaptation of the update values \Delta_{ij}(t) (weight changes) for the RPROP algorithm. For initialization, all \Delta_{ij} are set to small positive values:

\Delta_{ij}(t) =
\begin{cases}
\eta^{+} \, \Delta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial W_{ij}}(t-1) \cdot \dfrac{\partial E}{\partial W_{ij}}(t) > 0 \\[4pt]
\eta^{-} \, \Delta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial W_{ij}}(t-1) \cdot \dfrac{\partial E}{\partial W_{ij}}(t) < 0 \\[4pt]
\eta^{0} \, \Delta_{ij}(t-1), & \text{otherwise}
\end{cases}
\tag{2}

where \eta^{0} = 1 and 0 < \eta^{-} < 1 < \eta^{+}; \eta^{-}, \eta^{0}, \eta^{+} are known as the update factors.
Whenever the derivative of the corresponding weight changes its sign, this implies that the previous update value was too large and it has skipped a minimum; therefore, the update value is reduced (η−), as shown above. However, if the derivative retains its sign, the update value is increased (η+). This helps to accelerate convergence in shallow areas. To avoid over-acceleration, in the epoch following the application of (η+), the new update value is neither increased nor decreased (η0) from the previous one. Note that the values of Δij remain non-negative in every epoch. This update-value adaptation process is then followed by the actual weight update process, which is governed by the following equation:
update process, which is governed by the following equations:
W
ij
(t) =

ij
; if
E
W
ij
(t) >0
+
ij
; if
E
W
ij
(t) <0
0; otherwise
(3)
The values of the training parameters adopted for the algorithms were determined empirically.
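Equations (2) and (3) can be sketched as follows. The values eta_minus = 0.5 and eta_plus = 1.2 are the usual textbook choices, and the delta_min/delta_max clipping bounds are standard additions not stated in the text above, so treat this as an illustrative sketch rather than the authors' implementation.

```python
import numpy as np

def rprop_step(delta, w, grad, grad_prev,
               eta_minus=0.5, eta_plus=1.2,
               delta_min=1e-6, delta_max=50.0):
    """One RPROP update for an array of weights, following Eqs. (2)-(3)."""
    sign = grad * grad_prev
    # Eq. (2): adapt per-weight step sizes from the product of gradient signs
    delta = np.where(sign > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign < 0, np.maximum(delta * eta_minus, delta_min), delta)
    # after a sign change, the gradient is ignored for one step (the eta^0 case)
    grad = np.where(sign < 0, 0.0, grad)
    # Eq. (3): move each weight against the sign of its gradient
    w = w - np.sign(grad) * delta
    return delta, w, grad

# toy use: minimise f(w) = sum(w**2), whose gradient is 2w
w = np.array([3.0, -2.0])
delta = np.full_like(w, 0.1)
grad_prev = np.zeros_like(w)
for _ in range(100):
    grad = 2 * w
    delta, w, grad_prev = rprop_step(delta, w, grad, grad_prev)
```

Because only the gradient sign is used, the step size is decoupled from the gradient magnitude, which is the point of RPROP.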
3.2. Support vector machine (SVM)
SVM performs classification tasks by constructing an optimal separating hyper-plane (OSH). The OSH maximizes the margin between the nearest data points belonging to the two separate classes. Suppose the training set (x_i, y_i), i = 1, 2, \ldots, l, x_i \in \mathbb{R}^d, y_i \in \{-1, +1\}, can be separated by the hyper-plane W^T x + b = 0, where W is the weight vector and b is the bias. If this hyper-plane maximizes the margin, then the following inequality is valid for all input data:

y_i (W^T x_i + b) \ge 1, \quad i = 1, 2, \ldots, l \tag{4}
The margin of the hyper-plane is 2/\|W\|. Thus, the problem is to maximize the margin by minimizing \|W\| subject to (4). This is a convex quadratic programming (QP) problem, and Lagrange multipliers (\alpha_i, i = 1, 2, \ldots, l; \alpha_i \ge 0) are used to solve it:

L_p = \frac{1}{2} \|W\|^2 - \sum_{i=1}^{l} \alpha_i \left[ y_i (W^T x_i + b) - 1 \right] \tag{5}
After minimizing L_p with respect to W and b, the optimal weights are given by:

W^{*} = \sum_{i=1}^{l} \alpha_i y_i x_i \tag{6}
The dual of the problem is given by [18]:

L_d = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j x_i^T x_j \tag{7}
To find the OSH, L_d must be maximized under the constraint \sum_{i=1}^{l} \alpha_i y_i = 0. The Lagrange multipliers are only non-zero (\alpha_i > 0) when y_i (W^T x_i + b) = 1. Those training points for which the equality in (4) holds are called support vectors (SVs). The optimal bias is given by:

b^{*} = y_i - W^{*T} x_i \tag{8}
for any support vector x_i. The optimal decision function (ODF) is then given by:

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i^{*} \, x^T x_i + b^{*} \right) \tag{9}
where the \alpha_i^{*} are the optimal Lagrange multipliers. For input data with a high noise level, SVM uses soft margins, which can be expressed as follows with the introduction of the non-negative slack variables \xi_i \ge 0, i = 1, 2, \ldots, l:

y_i (W^T x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, l \tag{10}
To obtain the OSH, one minimizes

\Phi = \frac{1}{2} \|W\|^2 + C \sum_{i=1}^{l} \xi_i^{k}

subject to (10), where C is the penalty parameter, which controls the trade-off between the complexity of the decision function and the number of misclassified training examples. In nonlinearly separable cases, the SVM maps the training points nonlinearly to a high-dimensional feature space using a kernel function K(x_i, x_j), where linear separation may be possible. There are several kernel functions:

Linear:
K(x_i, x_j) = x_i \cdot x_j \tag{11}

Gaussian radial basis function (GRBF):
K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\gamma^2} \right) \tag{12}

Polynomial:
K(x_i, x_j) = (x_i \cdot x_j + 1)^d \tag{13}

Sigmoid:
K(x_i, x_j) = \tanh(\beta \, x_i \cdot x_j + \theta) \tag{14}

where \gamma, d, \beta and \theta are the parameters of the kernel functions.
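Transcribed directly, Eqs. (11)–(14) can be written as below; the default parameter values are placeholders, not values from the paper.

```python
import numpy as np

def linear(xi, xj):                        # Eq. (11)
    return xi @ xj

def grbf(xi, xj, gamma=1.0):               # Eq. (12)
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * gamma ** 2))

def poly(xi, xj, d=2):                     # Eq. (13)
    return (xi @ xj + 1) ** d

def sigmoid(xi, xj, beta=1.0, theta=0.0):  # Eq. (14)
    return np.tanh(beta * (xi @ xj) + theta)
```

Note that grbf(x, x) = 1 for any x, which is the usual sanity check for an RBF kernel.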
After a kernel function is selected, the QP problem becomes:

L_d = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{15}
and the \alpha_i are derived by:

\alpha^{*} = \arg\max_{\alpha} L_d, \quad \text{subject to } 0 \le \alpha_i \le C, \ i = 1, 2, \ldots, l, \ \text{and } \sum_{i=1}^{l} \alpha_i y_i = 0 \tag{16}
After training, the decision function becomes:

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i^{*} K(x, x_i) + b^{*} \right) \tag{17}
The performance of the SVM can be controlled through the term C and the kernel parameter, which are called hyper-parameters. These parameters influence the number of support vectors and the margin of the SVM. Suitable selection of the SVM parameters plays an important role in classification performance. In this paper, the improved bees algorithm is applied to select the parameters of the SVM.
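As an illustration of how C and γ enter in practice, the sketch below trains an RBF-kernel SVM on toy data (not the authors' experiment). One caveat: scikit-learn's `gamma` multiplies \|x_i − x_j\|^2 directly, i.e. it corresponds to 1/(2γ^2) in the notation of Eq. (12).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy stand-in for the shape-feature vectors of two pattern groups
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(3, 1, (40, 3))])
y = np.array([0] * 40 + [1] * 40)

# C and gamma are exactly the hyper-parameters the IBA would search over
clf = SVC(kernel="rbf", C=10.0, gamma=0.5).fit(X, y)
train_acc = clf.score(X, y)
```

Sweeping (C, gamma) and scoring a held-out set is the manual counterpart of what the IBA automates in the optimization module.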
3.3. Original bees algorithm
The bees algorithm is an optimization algorithm inspired by the natural foraging behavior of honey bees to find the optimal solution [19,20]. Fig. 11 shows the pseudo code for the algorithm in its
1. Initialise the solution population.
2. Evaluate the fitness of the population.
3. While (stopping criterion is not met)
   // Forming new population.
4.   Select sites for neighbourhood search.
5.   Recruit bees for selected sites (more bees for the best e sites) and evaluate fitnesses.
6.   Select the fittest bee from each site.
7.   Assign remaining bees to search randomly and evaluate their fitnesses.
8. End While
Fig. 11. Pseudo code of bees algorithm.
simplest form. The algorithm requires a number of parameters to be set, namely: the number of scout bees (n), the number of sites selected out of the n visited sites (m), the number of best sites out of the m selected sites (e), the number of bees recruited for the best e sites (nep), the number of bees recruited for the other (m − e) selected sites (nsp), the initial size of the patches (ngh), which includes a site and its neighbourhood, and the stopping criterion. The algorithm starts with the n scout bees being placed randomly in the search space. The fitnesses of the sites visited by the scout bees are evaluated in step 2.
In step 4, the bees that have the highest fitnesses are chosen as "selected bees", and the sites visited by them are chosen for neighbourhood search. Then, in steps 5 and 6, the algorithm conducts searches in the neighbourhood of the selected sites, assigning more bees to search near the best e sites. The bees can be chosen directly according to the fitnesses associated with the sites they are visiting. Alternatively, the fitness values are used to determine the probability of the bees being selected. Searches in the neighbourhood of the best e sites, which represent the more promising solutions, are made more detailed by recruiting more bees to follow them than the other selected bees. Together with scouting, this differential recruitment is a key operation of the bees algorithm.
However, in step 6, for each patch only the bee with the highest fitness is selected to form the next bee population. In nature there is no such restriction; it is introduced here to reduce the number of points to be explored. In step 7, the remaining bees in the population are assigned randomly around the search space, scouting for new potential solutions. These steps are repeated until a stopping criterion is met. At the end of each iteration, the colony has two parts to its new population: representatives from each selected patch, and other scout bees assigned to conduct random searches [20].
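The steps of Fig. 11 can be sketched as follows. The parameter names follow the text (n, m, e, nep, nsp, ngh), but the numeric defaults and the sphere test function are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def bees_algorithm(f, dim, bounds, n=30, m=10, e=3, nep=7, nsp=3,
                   ngh=1.0, iters=200, rng=None):
    """Minimal bees algorithm (Fig. 11) minimising f on [lo, hi]^dim."""
    rng = np.random.default_rng(rng)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (n, dim))             # step 1: scatter scouts
    for _ in range(iters):                           # step 3
        pop = pop[np.argsort([f(x) for x in pop])]   # steps 2/4: rank sites
        new_pop = []
        for i in range(m):                           # steps 5-6
            bees = nep if i < e else nsp             # more bees for best e sites
            patch = pop[i] + rng.uniform(-ngh, ngh, (bees, dim))
            patch = np.clip(patch, lo, hi)
            cand = np.vstack([patch, pop[i][None]])  # keep the site itself
            new_pop.append(min(cand, key=f))         # fittest bee per site
        rest = rng.uniform(lo, hi, (n - m, dim))     # step 7: random scouts
        pop = np.vstack([new_pop, rest])
    return min(pop, key=f)

best = bees_algorithm(lambda x: np.sum(x ** 2), dim=3,
                      bounds=(-10, 10), rng=1)
```

Keeping the current site among the candidates makes each site's fitness monotonically non-increasing, which is the "select the fittest bee from each site" rule of step 6.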
4. Proposed method
This section describes the design of the proposed method for CCP recognition. The proposed method is an IBA-based MLP-SVM classification technique; its structure is shown in Fig. 3. This approach selects the kernel function parameters and the soft-margin penalty parameter C of the support vector machine (SVM) classifiers.
4.1. Improved bees algorithm (IBA)
In order to improve the convergence velocity and accuracy of the BA, this article recommends an improved bees algorithm (IBA). In the bees algorithm, ngh defines the initial size of the neighbourhood in which follower bees are placed. For example, if X_i is the position of an elite bee in the ith dimension, follower bees will be placed randomly in the interval X_i ± ngh in that dimension at the beginning of the optimization process. As the optimization advances, the size of the search neighbourhood gradually decreases to facilitate fine tuning of the solution. For each of the m selected sites, the recruited bees are randomly placed, with uniform probability, in a neighbourhood of the high-fitness location marked by the scout bee. This neighbourhood (flower patch) is defined as an n-dimensional hyper-box of sides a_1, ..., a_n centred on the scout bee. For each flower patch, the fitness of the locations visited by the recruited bees is evaluated. If one of the recruited bees lands in a position of higher fitness than the scout bee, that recruited bee is chosen as the new scout bee. At the end, only the fittest bee of each patch is retained. The fittest solution visited so far is thus taken as the representative of the whole flower patch. This bee becomes the dancer once back at the hive.
In the BA, the size of a patch is kept unchanged as long as the local search procedure yields points of higher fitness. If the local search fails to bring any improvement in fitness, the size a is decreased. The updating of the neighbourhood size follows the heuristic formula
patchsize(t + 1) = 0.8 \times ngh(t) \tag{18}

where t denotes the tth iteration of the bees algorithm main loop.
Following this strategy, the local search is initially defined over a large neighbourhood and has a largely explorative character. As the algorithm progresses, a more detailed search is needed to refine the current local optimum; hence, the search is made increasingly exploitative, and the area around the optimum is searched more thoroughly. Since the search process of the BA is nonlinear and highly complicated, linearly or nonlinearly decreasing the patch size with no feedback taken from the elite bees' fitnesses cannot truly reflect the actual search process. At the beginning of the search process, the bees are far away from the optimum point, and hence a big patch size is needed to search the solution space globally. Conversely, when the best solution found by the population improves greatly after some iterations, i.e., the bees find a near-optimum solution, only small movements are needed and the patch size must be set to small values. Based on this, in this study we propose the improved bees algorithm (IBA), in which the patch size is set as a function of elite-bee fitness during the search process of the BA, as follows:
Patchsize_i^t = \frac{1}{t} \cdot \frac{ngh}{1 + \exp(-F(elite_i^t))} \tag{19}

where F(elite_i^t) is the fitness of the ith elite bee in the tth iteration and Patchsize_i^t is the ith elite bee's neighbourhood size in the tth iteration. In this case, the patch size changes according to the rate of elite-bee fitness improvement. According to Eq. (19), during the search of the IBA, while the fitness of an elite bee is far away from the real global optimum, the value of the patch size will be large, resulting in strong global search ability and the location of promising search areas. Meanwhile, when the fitness of an elite bee gets near the real global optimum, the patch size will be set small, depending on the nearness of its fitness to the optimal value, to facilitate finer local exploration and hence accelerate convergence. So, the main difference between the BA and the IBA is in the patch size definition. First, in the IBA the patch size is associated with the fitness value (Eq. (19)). Second, the patch size is the same for all elite bees in the BA, whereas every elite bee has its own patch size in the IBA.
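Eq. (19) itself is a one-liner. It is written here with exp(−F), so that a poor (large) fitness yields a large patch and a near-zero fitness a small one, matching the behaviour described above; minimisation with the optimum at fitness 0 is assumed.

```python
import numpy as np

def iba_patch_size(t, fitness, ngh=1.0):
    """Eq. (19): fitness-dependent patch size of one elite bee.

    t       -- iteration number (t >= 1)
    fitness -- F(elite_i^t), the elite bee's objective value (minimisation)
    ngh     -- initial neighbourhood size
    """
    return (1.0 / t) * ngh / (1.0 + np.exp(-fitness))

# patch shrinks both over iterations and as the elite fitness improves
early = iba_patch_size(t=1, fitness=50.0)    # far from the optimum
late = iba_patch_size(t=100, fitness=0.01)   # near the optimum
```

The 1/t factor gives the global shrinking schedule, while the logistic term gives each elite bee its own fitness-dependent patch, the two differences from the BA listed above.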
Several mathematical functions were optimized to illustrate the performance of the IBA. The first function is the Rosenbrock function, described by

f_1(x) = \sum_{i=1}^{D-1} \left[ 100 \, (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right] \tag{20}

where x = [x_1, x_2, \ldots, x_D], the initial range of x is [-50, 50]^D, and D denotes the dimension of the solution space. The minimum solution of the Rosenbrock function is x^{*} = [1, 1, \ldots, 1], with f_1(x^{*}) = 0.
The second function is the sphere function, described by

f_2(x) = \sum_{i=1}^{D} x_i^2 \tag{21}

where the initial range of x is [-100, 100]^D. The minimum solution of the sphere function is x^{*} = [0, 0, \ldots, 0], with f_2(x^{*}) = 0.
The third function is the Rastrigin function, described by

f_3(x) = \sum_{i=1}^{D} \left( x_i^2 - 10 \cos(2\pi x_i) + 10 \right) \tag{22}

where the initial range of x is [-5.12, 5.12]^D. The minimum solution of the Rastrigin function is x^{*} = [0, 0, \ldots, 0], with f_3(x^{*}) = 0.
The optimization results for the Rosenbrock, sphere and Rastrigin functions are shown in Table 1. Also,
Fig. 12. Convergence curves of IBA and BA for the Rastrigin function (D = 5); fitness (log scale, 10^{-15} to 10^{5}) versus iteration (0 to 5000).
Table 1
Optimization of the benchmark functions by BA and IBA.

Algorithm | Function   | D = 3    | D = 5
BA        | Rosenbrock | 7.34e−4  | 3.19e−3
IBA       | Rosenbrock | 4.32e−6  | 7.42e−4
BA        | Sphere     | 7.34e−34 | 3.19e−32
IBA       | Sphere     | 4.32e−39 | 7.42e−35
BA        | Rastrigin  | 6.84e−11 | 1.49e−7
IBA       | Rastrigin  | 5.62e−14 | 8.12e−13
we have drawn the convergence curves of the BA and IBA algorithms to show the progress of the mean best values, presented in Fig. 12. As depicted in this figure, the IBA has better performance and speed of convergence than the BA.
4.2. Classifier
The structure of the proposed classifier is shown in Fig. 2. It can be seen that this system is composed of two major decision layers. The first layer is a neural network classifier (Classifier 1). A literature review shows that systems using artificial neural networks (ANNs) as classifiers achieve high performance. The multilayer perceptron (MLP) neural network is simple and ideally suited for pattern recognition tasks [15]. Local minima are one of the problems of neural networks. In the proposed method, the first layer is used only for a coarse classification: the six control chart patterns are divided into three groups of two patterns each. This layer uses the statistical feature and divides the input patterns into three binary groups. As indicated in Fig. 13, the values of the statistical feature are close to each other for the members of each group. These binary groups are: normal and cyclic patterns (Group 1, blue lines), upward-shift and upward-trend patterns (Group 2, green lines), and downward-shift and downward-trend patterns (Group 3, red lines).
In this layer, because a proper statistical feature (the mean) is used, we face a classification problem that neural networks can solve despite the aforementioned problem (local minima). The main problems lie with the second, third and fourth groups, which neural networks are not able to solve. Therefore, a support vector machine is proposed to solve the problem.
Support vector machines (SVMs), based on statistical learning theory, are gaining applications in the area of pattern recognition because of their excellent generalization capability [18]. The use of SVMs is receiving increasing attention, with remarkable recent results [18]. The main difference between ANNs and SVMs is the principle of risk minimization. An ANN implements empirical risk minimization to minimize the error on the training data, whereas an SVM implements the principle of structural risk minimization, which gives it excellent generalization ability when only a small sample is available. In Fig. 2, each sub-classifier (Classifiers 2-4) is an SVM classifier. Classifier 2 is trained to separate the normal and cyclic patterns. According to Fig. 4, separating these patterns is very easy, because the values of Feature 2 are completely different for the normal and the cyclic patterns. For separating the members of Group 2, i.e., the upward shift and upward trend patterns, Classifier 3 uses Feature 3, Feature 4 and Feature 5. For separating the members of Group 3, Classifier 4 uses the same three features. The largest problems encountered in setting up the SVM model are how to select the kernel function and its parameter values. The parameters that should be optimized include the penalty parameter (C) and the kernel function parameters, such as the value of gamma (γ) for the radial basis function (RBF) kernel.
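For reference, the RBF kernel whose width γ is being tuned has a simple closed form. The sketch below is a direct transcription of K(x, z) = exp(−γ‖x − z‖²) and assumes nothing beyond that standard definition:

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).

    gamma sets the kernel width: a large gamma makes the decision boundary
    hug the training points, a small gamma smooths it out. Together with
    the penalty C (which trades margin width against training error),
    this is the hyper-parameter pair the optimization module searches.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

For identical inputs the kernel evaluates to 1, and it decays toward 0 as the squared distance or γ grows, which is why a poorly chosen γ can make every pattern look either identical or unique to the SVM.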
Turning back to CCP recognition systems, it can be found that the selection of the best free parameters of the adopted classifier is generally done empirically. Moreover, using an SVM involves some difficulties, namely selecting the optimal kernel function type and the most appropriate hyper-parameter values for the SVM training and testing stages. Therefore, in this study, we used an IBA for finding the optimum values of the hyper-parameters, i.e., the kernel parameter γ and the penalty parameter C.
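The search just described can be sketched as a bees-algorithm-style loop over (C, γ). This is a simplified illustration, not the authors' IBA (which adds further enhancements over the basic BA); the parameter names mirror Table 3, the Gaussian patch size is an assumption, and in practice the fitness function would be the cross-validated recognition accuracy of the SVM rather than the toy function used in the usage note.

```python
import random

def bees_search(fitness, bounds, n=20, m=10, e=5, nep=2, nsp=1,
                iters=100, seed=0):
    """Simplified bees-algorithm search over continuous parameters.

    fitness: maps a parameter vector (e.g. [C, gamma]) to a score
             (higher is better).
    bounds:  list of (low, high) per dimension.
    n, m, e, nep, nsp, iters correspond to the roles listed in Table 3.
    """
    rng = random.Random(seed)
    rand_point = lambda: [rng.uniform(lo, hi) for lo, hi in bounds]
    pop = [rand_point() for _ in range(n)]   # initial scout bees
    best = max(pop, key=fitness)
    for _ in range(iters):
        pop.sort(key=fitness, reverse=True)
        new_pop = []
        for i, site in enumerate(pop[:m]):   # selected sites
            recruits = nep if i < e else nsp  # more bees for elite sites
            patch = [site] + [
                [min(hi, max(lo, x + rng.gauss(0, 0.05 * (hi - lo))))
                 for x, (lo, hi) in zip(site, bounds)]
                for _ in range(recruits)
            ]
            new_pop.append(max(patch, key=fitness))  # best bee per site
        new_pop += [rand_point() for _ in range(n - m)]  # fresh scouts
        pop = new_pop
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):
            best = cand
    return best
```

On a toy fitness such as `lambda p: -((p[0] - 3.0) ** 2 + (p[1] - 5.0) ** 2)` over bounds `[(0.0, 10.0), (0.0, 10.0)]`, the returned point lands close to the optimum (3, 5), mirroring how the real search homes in on a good (C, γ) pair.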
5. Simulation results
In this section we evaluate the performance of the proposed recognizer. For this purpose we have used practical, real-world data [21]. This dataset contains 600 examples of control charts. For this study, we have used 60% of the data for training the classifier and the rest for testing. The easiest way to assess the performance rate is to choose a test set independent of the training and validation sets, classify its examples, count the examples that have been correctly classified, and divide by the size of the test set. The proportion
[Fig. 13 appears here: the mean feature ("Feature 1") plotted for all 600 patterns; x-axis "Patterns" (1-600), y-axis "Value of features" (10-50), one curve per class (NR, CC, UT, DT, US, DS).]
Fig. 13. Mean of CCPs (patterns 1-100: normal, 101-200: cyclic, 201-300: up-trend, 301-400: down-trend, 401-500: up-shift, 501-600: down-shift).
Table 2
Recognition accuracy of the recognizer without optimization.
First layer classifier   Second layer classifier   RA (%)
MLP (RPROP)              MLP (RPROP)               96.28
MLP (RPROP)              SVM (linear)              97.15
MLP (RPROP)              SVM (poly)                97.22
MLP (RPROP)              SVM (GRBF)                97.54
of test-set examples that are classified correctly to the total samples estimates the performance of the recognizer for each pattern. In order to obtain the recognition accuracy (RA) of the system, one needs to compute the average value of the per-pattern performances of the CCPs.
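The evaluation protocol just described, a 60/40 hold-out split followed by the proportion of correct test-set decisions, amounts to no more than the following sketch (the sequential split is an assumption; the paper does not state how the 360 training examples were drawn):

```python
def split_dataset(data, train_frac=0.6):
    """Hold out the last 40% of the examples for testing."""
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

def recognition_accuracy(predictions, labels):
    """Percentage of test-set examples classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

# For the 600-example dataset [21] this yields 360 training and
# 240 test examples.
train, test = split_dataset(list(range(600)))
```

The reported RA figures in Tables 2 and 4 are averages of this per-pattern proportion over the six CCP classes.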
The MLP classifiers were tested with various numbers of neurons in a single hidden layer and the best network was selected. Also, based on extensive simulations, it was found that the SVM with the GRBF kernel gives better results than other kernels such as the linear and polynomial ones. Therefore, in this study, the GRBF was adopted as the kernel function for the SVM classifier.
5.1. Performance of recognizer without optimization
First, we evaluated the performance of the recognizer without optimization. Table 2 shows the recognition accuracy (RA) of the different configurations. As reported in Table 2, using an MLP neural network in both layers, 96.28% recognition accuracy is achieved. The accuracy of the OAO method with an MLP neural network in the first layer and an SVM classifier based on the Gaussian kernel (SVM-RBF) in the second layer was 97.54% on the test set. This result was better than those achieved by the SVM-linear and SVM-poly classifiers, which reached 97.15% and 97.22%, respectively.
5.2. Performance of recognizer with optimization
Next, we apply the IBA to find the optimum parameters of the SVM. Table 3 shows the IBA parameters; these values were obtained after several experiments. Table 4 shows the RA of the optimized system. From Tables 2 and 4 it can be seen that the RA increases from 97.54% (in the case of the non-optimized system) to 99.51%; the optimization thus improves the performance of the recognizer significantly. In order to show the details of the recognition of each pattern, the confusion matrix of the recognizer is given in Table 5.
Table 3
Parameters used in the bees algorithm.
Number of scout bees, n                                             20
Number of sites selected for neighborhood search, m                 10
Number of best elite sites out of m selected sites, e                5
Number of bees recruited for best e sites, nep                       2
Number of bees recruited for the other (m-e) selected sites, nsp     2
Number of iterations, R                                            100
Table 4
Recognition accuracy of the recognizer with optimization.
First layer classifier   Second layer classifier   RA (%)
MLP                      SVM                       99.51
As can be seen from Table 5, separating the up-trend (UT) and up-shift (US) patterns, as well as the downward trend (DT) and downward shift (DS) patterns, is difficult due to the similarity between them; these are also the most overlapped classes according to Figs. 5-10 (Section 2). In addition, the confusion matrix provides reference classification accuracies with which to quantify the capability of the proposed classification system and to further improve these results. The values on the diagonal of the confusion matrix show the correct performance of the recognizer for each pattern; in other words, they show how many patterns of each class are recognized correctly by the system. The off-diagonal values show the mistakes of the system. For example, in the third row of the matrix, the value 98.53% is the percentage of correctly recognized upward trend patterns, and the value 1.47% is the fraction of this class wrongly recognized as the upward shift pattern. In order to obtain the recognition accuracy (RA) of the system, one computes the average of the values on the diagonal.
Table 5
Confusion matrix for the best result.
        Nor    Cyc    Up-tr   Do-tr   Up-sh   Do-sh
Nor     100%   0      0       0       0       0
Cyc     0      100%   0       0       0       0
Up-tr   0      0      98.53%  0       1.47%   0
Do-tr   0      0      0       98.53%  0       1.47%
Up-sh   0      0      0       0       100%    0
Do-sh   0      0      0       0       0       100%
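Computed this way from the diagonal of Table 5, the overall RA reproduces the reported value. A minimal sketch, with the matrix entries taken directly from the row-normalized percentages of Table 5:

```python
# Row-normalized confusion matrix of Table 5 (percent); rows and columns
# are in the order Nor, Cyc, Up-tr, Do-tr, Up-sh, Do-sh.
CONFUSION = [
    [100.0,   0.0,   0.0,   0.0,   0.0,   0.0],
    [  0.0, 100.0,   0.0,   0.0,   0.0,   0.0],
    [  0.0,   0.0, 98.53,   0.0,  1.47,   0.0],
    [  0.0,   0.0,   0.0, 98.53,   0.0,  1.47],
    [  0.0,   0.0,   0.0,   0.0, 100.0,   0.0],
    [  0.0,   0.0,   0.0,   0.0,   0.0, 100.0],
]

def ra_from_confusion(matrix):
    """Overall recognition accuracy: mean of the diagonal entries."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)
```

Here `ra_from_confusion(CONFUSION)` evaluates to 99.51 (to two decimals), matching the RA reported in Table 4.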
Table 6
Comparison between the performances of IBA-MLP-SVM and other recognition techniques.
Recognition technique              Recognition accuracy (%)
ABC-MLP-SVM                        99.15
GA-MLP-SVM                         98.64
PSO-MLP-SVM                        98.76
ICA-MLP-SVM                        97.88
BA-MLP-SVM                         99.30
IBA-MLP-SVM (proposed method)      99.51
In order to compare the performance of the improved bees algorithm (IBA) with other nature-inspired algorithms, we have applied several such algorithms, namely the artificial bee colony (ABC) [22], the genetic algorithm (GA) [23], particle swarm optimization (PSO) [24] and the imperialist competitive algorithm (ICA) [25], to evolve the proposed method. According to the results in Table 6, the best accuracy obtained on the test set, 99.51%, is achieved by IBA-MLP-SVM. The success rate of the IBA is thus higher than that of the other nature-inspired algorithms.
5.3. Performance evaluation with optimization in different runs
In this experiment, for evaluating the performance of the pro-
posed algorithm, ve different runs have been performed. The
improved bees algorithm nds the best combination of the free
parameters of SVMclassier to gain the tness function maximum.
Table 7
The best parameters of the SVM classifier with the GRBF kernel function and the estimated fitness for different runs.
Run   C        γ      Best fitness
1     237.95   6.44   99.51
2     146.32   7.70   99.51
3     365.52   0.59   99.51
4     139.51   8.33   99.51
5     1.83     5.90   99.51
Table 8
A summary of different classification algorithms together with their reported accuracies.
Ref. no.    Classifier    RA (%)
[26]        MLP (RSFM)    97.46
[27]        MLP           94.30
[28]        PNN           95.58
[29]        MLP           93.73
[30]        LVQ           97.70
[31]        MLP (SPA)     96.38
This work   MLP-SVM       99.51
The optimal values of the SVM classifier parameters, i.e., the γ parameter and the C parameter, estimated by the proposed algorithm are shown in Table 7. Good agreement is observed between the estimated and preselected parameters. The proposed algorithm successfully finds the global optimum within just 100
[Fig. 14 appears here: recognition accuracy (95-100%) versus iteration (0-100), one curve per run (runs 1-5).]
Fig. 14. Evolution of fitness function for different runs.
[Fig. 15 appears here: recognition accuracy (95.5-100%) versus iteration (0-100) for the IBA and the BA.]
Fig. 15. Performance comparison of the BA and the IBA.
generations. Fig. 14 shows the typical increase of the fitness (classification accuracy) of the best individual of the population obtained from the hybrid intelligent method for the different runs. As indicated in Fig. 14, the fitness curves gradually improved from iteration 0 to 100 and exhibited no significant improvement after iteration 70 for the five different runs. The optimal stopping iteration, giving the highest validation accuracy, was around iterations 50-70 for the five runs. In Fig. 15, the accuracy and the convergence speed of the bees algorithm and the improved bees algorithm are compared. The plotted curves show the mean of five different runs for both algorithms. As depicted in this figure, the improved bees algorithm has higher accuracy and faster convergence than the bees algorithm.
5.4. Comparison of the proposed method with other methods in
the literature
Several researchers have addressed the control chart pattern recognition problem in the past. Direct comparison with other works is difficult for this problem, mainly because there is no single unified dataset available. Different setups of patterns (in terms of the number of training and testing samples and the number of pattern classes) lead to different performance. Besides, many different kinds of benchmarking systems are used to assess system quality. This makes direct numerical comparison difficult. A summary of different methods together with their reported accuracies is given in Table 8.
As for neural-network-based CCP recognizers, Le et al. [26] introduced a new ANN model, and their numerical simulations showed that this model achieves a recognition accuracy of about 97.46% for six types of CCP. Pham and Oztemel [27] reported a generalization rate of 94.30%. Cheng and Ma [28] obtained a recognition accuracy (RA) of about 95.58%; however, the performance for some patterns is reported to be less than 90%. The method proposed in [29] reached a classification accuracy of about 93.73% for six types of CCPs. In [30], the authors used an LVQ neural network and achieved an RA of about 97.70%. In [31], Guh and Tannock proposed a sequential pattern analysis method and reported a classification rate of about 96.38%. Compared with these papers, the system proposed in the current work provides better accuracy over a wider range of CCP types (six different classes).
6. Conclusion
Control chart patterns (CCPs) are important statistical process control tools for determining whether a process is run in its intended mode or in the presence of unnatural patterns. Unnatural CCPs provide clues to potential quality problems at an early stage, allowing defects to be eliminated before they are produced. This paper has presented a novel hybrid intelligent method for recognition of the common types of CCP. The proposed method includes three main modules: the feature extraction module, the classifier module and the optimization module. In the feature extraction module, a proper set of eight shape and statistical features is presented that is useful for recognition of control chart patterns. Extraction of these features does not require the experience and skill of the users, and thus a CCP recognizer developed on the basis of these features will be truly automated. Further, the use of this feature set requires less training effort and results in better recall performance. These results confirm the expectation that a feature-based input representation yields better recognizer performance. In the classifier module, a hybrid learning-based model, which integrates the ANN and SVM learning techniques, was proposed for the comprehensive recognition of CCPs. The proposed classifier is composed of two major decision layers. The first layer is a neural network classifier (Classifier 1). Local minima are one of the problems of neural networks. In this layer, the six control chart patterns are divided into three groups of two patterns: normal and cyclic patterns (Group 1), upward shift and upward trend patterns (Group 2), and downward shift and downward trend patterns (Group 3). The main problems are in Group 2 and Group 3, which the neural network is not able to solve; therefore, a support vector machine is used for them. In the optimization module, the improved bees algorithm is proposed to improve the generalization performance of the recognizer: the SVM classifier design is optimized by searching for the best values of the parameters that tune its discriminant function (kernel parameter selection). The results showed that the proposed model is effective in finding the parameters of the SVM and that it improves classification accuracy. We evaluated the proposed model using a real dataset and compared it with other models. The simulation results indicate that the proposed method achieves a high classification accuracy (99.51%).
References
[1] D.C. Montgomery, Introduction to Statistical Quality Control, 5th ed., John Wiley, Hoboken, NJ, USA, 2005.
[2] J.A. Swift, J.H. Mize, Out-of-control pattern recognition and analysis for quality control charts using lisp-based systems, Computers and Industrial Engineering 28 (1) (1995) 81-91.
[3] J.R. Evans, W.M. Lindsay, A framework for expert system development in statistical quality control, Computers and Industrial Engineering 14 (3) (1988) 335-343.
[4] A. Ebrahimzadeh, V. Ranaee, Control chart pattern recognition using an optimized neural network and efficient features, ISA Transactions 49 (2010) 387-393.
[5] V. Ranaee, A. Ebrahimzadeh, Control chart pattern recognition using neural networks and efficient features: a comparative study, Pattern Analysis & Applications (2011), doi:10.1007/s10044-011-0246-6.
[6] M.S. Yang, J.H. Yang, A fuzzy-soft learning vector quantization for control chart pattern recognition, International Journal of Production Research 40 (12) (2002) 2721-2731.
[7] R.S. Guh, Y.R. Shiue, Online identification of control chart patterns using self-organized approaches, International Journal of Production Research 43 (2005) 1225-1254.
[8] C.H. Wang, W. Kuo, H. Qi, An integrated approach for process monitoring using wavelet analysis and competitive neural network, International Journal of Production Research 45 (1) (2007) 227-244.
[9] V. Ranaee, A. Ebrahimzadeh, R. Ghaderi, Application of the PSO-SVM model for recognition of control chart patterns, ISA Transactions 49 (4) (2010) 577-586.
[10] D.T. Pham, M.A. Wani, Feature-based control chart pattern recognition, International Journal of Production Research 35 (7) (1997) 1875-1890.
[11] S.K. Gauri, S. Chakraborty, Improved recognition of control chart patterns using artificial neural networks, International Journal of Advanced Manufacturing Technology 36 (2008) 1191-1201.
[12] Z. Chen, S. Lu, S. Lam, A hybrid system for SPC concurrent pattern recognition, Advanced Engineering Informatics 21 (2007) 303-310.
[13] V. Ranaee, A. Ebrahimzadeh, Control chart pattern recognition using a novel hybrid intelligent method, Applied Soft Computing 11 (2) (2011) 2676-2686.
[14] A. Hassan, M.S. Nabi Baksh, A.M. Shaharoun, H. Jamaluddin, Improved SPC chart pattern recognition using statistical features, International Journal of Production Research 41 (7) (2003) 1587-1603.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Englewood Cliffs, NJ, 1999.
[16] A. Ebrahimzadeh, V. Ranaee, High efficient method for control chart patterns recognition, Acta Technica 56 (2011) 89-101.
[17] M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, in: Proc. ICNN, 1993, pp. 586-591.
[18] C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (1998) 121-167.
[19] T.D. Seeley, The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies, Harvard University Press, Cambridge, MA, 1996.
[20] D.T. Pham, A. Ghanbarzadeh, E. Koç, S. Otri, S. Rahim, M. Zaidi, The bees algorithm, a novel tool for complex optimisation problems, in: Intelligent Production Machines and Systems, 2006, pp. 454-459.
[21] http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data.html
[22] D. Karaboga, B. Basturk, On the performance of artificial bee colony (ABC) algorithm, Applied Soft Computing 8 (2008) 687-697.
[23] K.S. Tang, K.F. Man, S. Kwong, Q. He, Genetic algorithms and their applications, IEEE Signal Processing Magazine 13 (1996) 22-37.
[24] J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942-1948.
[25] E. Atashpaz-Gargari, C. Lucas, Imperialist competitive algorithm: an algorithm for optimization inspired by imperialistic competition, in: Proceedings of the IEEE Congress on Evolutionary Computation, Singapore, 2007, pp. 4661-4667.
[26] Q. Le, X. Goal, L. Teng, M. Zhu, A new ANN model and its application in pattern recognition of control charts, in: Proc. IEEE WCICA, 2008, pp. 1807-1811.
[27] D.T. Pham, E. Oztemel, Control chart pattern recognition using neural networks, Journal of Systems Engineering 2 (1992) 256-262.
[28] Z. Cheng, Y. Ma, A research about pattern recognition of control chart using probability neural network, in: Proc. ISECS, 2008, pp. 140-145.
[29] S. Sagiroujlu, E. Besdoc, M. Erler, Control chart pattern recognition using artificial neural networks, Turkish Journal of Electrical Engineering 8 (2000) 137-147.
[30] D.T. Pham, E. Oztemel, Control chart pattern recognition using learning vector quantization networks, International Journal of Production Research (1994) 256-262.
[31] R.S. Guh, J.D.T. Tannock, A neural network approach to characterize pattern parameters in process control charts, Journal of Intelligent Manufacturing 10 (1999) 449-462.