Article info

Keywords: Support vector machines; Genetic algorithm; Model selection; Wavelet kernel function; Partial seizure; ECG

Abstract

Support vector machines (SVMs) have in recent years been used successfully in various pattern recognition applications. Based on statistical learning theory, this paradigm promises strong robustness to noise and good generalization to unseen data. As in any classification technique, an appropriate choice of kernels and input features plays an important role in SVM performance. In this study, an evolutionary scheme searches for optimal kernel types and parameters for automated seizure detection. We consider the Lyapunov exponent, fractal dimension and wavelet entropy for possible feature extraction. The classification accuracy of this approach is examined by applying it to the MIT (Massachusetts Institute of Technology) dataset and comparing the results with those of the SVM. The MIT-BIH dataset contains electrocardiographic (ECG) changes in patients with partial epilepsy, with two types of ECG beats (partial epilepsy and normal). A comparison of the results shows that the evolutionary scheme outperforms the support vector machine. In the best case, the proposed approach reaches 100% specificity and 96.29% sensitivity.
© 2011 Elsevier Ltd. All rights reserved.
0957-4174/$ - see front matter
doi:10.1016/j.eswa.2011.01.087
10752 M. Zavar et al. / Expert Systems with Applications 38 (2011) 10751–10758
manifestations. Because of the possible contribution of the cardiovascular system to sudden unexplained death (SUD), it has received the most attention. According to research in recent years, the incidence of SUD falls from 1/100 in patients with severe intractable epilepsy to 1/1000 in patients with well-controlled epilepsy. In addition, the ECG signal has been playing an increasingly important role in seizure detection in patients with partial epilepsy. Severe cardiac rhythm and conduction abnormalities have been reported, such as severe bradycardia, bundle branch blocks, and ST-segment and QT-interval abnormalities (Hirsch & C.O.a.L., 2002; Leutmezer, Schernthaner, Lurger, Pötzelberger, & Baumgartner, 2003; Leung, Kwan, & Elger, 2006; Hitiris et al., 2007).

Autonomic nervous system changes are common in epileptic and non-epileptic seizures and may cause sudden death in epileptic patients. These autonomic changes are not clearly understood, but may be the result of increased motor activity and other factors. Several studies have demonstrated that the ECG commonly changes during seizures. In this paper we propose an automated decision system for partial epilepsy seizure detection. Ordinary methods of monitoring and diagnosing electrocardiographic changes rely on a human observer detecting the presence of particular signal features. Due to the large number of patients in intensive care units and the need for continuous observation of such conditions, there has been a growing interest in recent years in automating this process. Techniques from the domains of nonlinear analysis and chaos theory are applied to experimental time series such as ECG signals, and techniques have been developed for the automated detection of electrocardiographic changes. Many methods for detecting ECG changes, such as the wavelet transform (WT), length and energy transforms, artificial neural networks and support vector machines, are reported in the literature (Foo, Stuart, Harvey, & Meyer-Baese, 2002; Güler & Übeyli, 2004; Güler & Übeyli, 2005; Saxena, Kumar, & Hamde, 2002; Stollberger & Finsterer, 2004; Übeyli & Güler, 2004).

This paper is organized as follows. In Section 2, the epileptic seizure data are described. In Section 3, the features of the ECG signals are extracted with the Lyapunov exponent, wavelet entropy and fractal dimension. SVM is reviewed in Section 4. In Section 5, the model selection based on a genetic algorithm is explained. In Section 6, experimental results are presented, and finally in Section 7, conclusions are given.

of data from 11 partial seizures recorded in five women patients during continuous electroencephalographic/electrocardiographic/video monitoring. The patients ranged in age from 31 to 48 years, were without clinical evidence of cardiac disease, and had partial seizures with or without secondary generalization from frontal or temporal foci. Recordings were made under a protocol approved by Beth Israel Deaconess Medical Center’s Committee on Clinical Investigations. Data were analyzed off-line using customized software. Onset and offset of seizures were visually identified to the nearest 0.1 s by an experienced electroencephalographer (DLS) blinded with respect to the heart rate variability analysis. Continuous single-lead ECG signals were sampled at 200 Hz. From the digitized ECG recording, a heartbeat annotation file (a list of the type and time of occurrence of each heartbeat) was obtained using a version of the commercially available arrhythmia analysis software.

The seizures may be related to cardiac arrhythmias and sudden death. In the five women patients, 11 seizures were recorded, with durations ranging from 15 to 110 s. Post-ictal low-frequency heart rate oscillations (i.e. intervals of 2–6 min) were observed in all five patients, with spectral content in the range 0.01–0.1 Hz. No oscillations were observed in the pre-ictal period.

Overall, the significant characteristics of post-ictal heart rate oscillations are as follows:

- These oscillations differ from high-frequency oscillations (physiological oscillations at 0.2–0.4 Hz related to breathing).
- Post-ictal oscillations have high amplitudes.
- These oscillations are distinct from short-lived physiological changes related to posture or body activity.

3. Feature extraction

The nonlinear features Lyapunov exponent, fractal dimension and wavelet entropy are used in the feature extraction stage of this approach.

3.1. Lyapunov exponent
sequence for another x value, say $x_j$, that is close to $x_i$, then the sequence of differences $d_0 = |x_j - x_i|$, $d_1 = |x_{j+1} - x_{i+1}|$, ..., $d_n = |x_{j+n} - x_{i+n}|$ is assumed to increase exponentially, at least on average, as n increases. More formally, we assume that

$$d_n = d_0 e^{\lambda n} \tag{2}$$

or, after taking logarithms,

$$\lambda = \frac{1}{n} \ln \frac{d_n}{d_0} \tag{3}$$

In practice, Eq. (3) is taken as the definition of the Lyapunov exponent $\lambda$. If $\lambda$ is positive, the behavior is chaotic. In this method, two nearby trajectory points in state space are located, and the differences between the two trajectories that follow from each of these "initial" points are then tracked.

To characterize the attractor, an average value of $\lambda$ is usually required, obtained by averaging over initial points distributed over the attractor. The average Lyapunov exponent for the attractor is then found from

$$\lambda = \frac{1}{N} \sum_{i=1}^{N} \lambda(x_i) \tag{4}$$

Here $\lambda(x_i)$ indicates that the value of $\lambda$ may depend on the value of $x_i$ chosen as the initial value. Positive values of the Lyapunov exponent indicate that the behavior of the system is chaotic; conversely, negative values reflect periodic behavior of the system. Fig. 1(a) and (b) show the Lyapunov exponents for a normal [...] exponents, which confirm the chaotic nature of the ECG signals.

As seen from Fig. 1, 250 Lyapunov exponent coefficients were extracted from every epoch. The high dimension of the feature vectors increases computational complexity. In order to reduce the dimensionality of the extracted feature vectors, statistics over the set of Lyapunov exponents are used. These statistical features are the maximum, minimum, mean and standard deviation of the Lyapunov exponents in each epoch.

3.2. Wavelet entropy

Information-theoretic entropy was introduced by Claude Shannon in 1948 as a measure of uncertainty. For a process with n possible outcomes, the uncertainty of the outcome is related to a probability distribution $P = (p_1, p_2, p_3, \ldots, p_n)$.

Transient signals have characteristics such as high frequency and abrupt breaks.

Given a discrete signal x(n) that is transformed by the fast wavelet transform at instant k and scale j, it has a high-frequency component coefficient $D_j(k)$ and a low-frequency component coefficient $A_j(k)$. The frequency bands of the information contained in the signal components $D_j(k)$ and $A_j(k)$, obtained by reconstruction, are as follows:

$$\begin{cases} D_j(k): \ [2^{-(j+1)} f_s,\ 2^{-j} f_s] \\ A_j(k): \ [0,\ 2^{-(j+1)} f_s] \end{cases} \qquad j = 1, \ldots, m \tag{5}$$

where $f_s$ is the sampling frequency.

The original sequence X(n) can be represented by the sum of all components as follows:

$$X(n) = \sum_{j=1}^{J} D_j(k) + A_J(k) \tag{6}$$

Various wavelet entropy measures were defined in X.

Overall, wavelet entropy is computed on signal energy levels, which can be regarded as wavelet energy when the energy is used in the following formulas. The energy at each resolution level is the energy of the detail signal:

$$E_j = \sum_k |D_j(k)|^2 \tag{7}$$

Consequently, the total energy can be obtained by

$$E_{tot} = \sum_j \sum_k |D_j(k)|^2 = \sum_j E_j \tag{9}$$

Then the normalized values, which represent the relative wavelet energy,

$$P_j = \frac{E_j}{E_{tot}} \tag{10}$$

define, for each resolution level (scale), the probability distribution of the energy. Clearly, $\sum_j P_j = 1$, and the distribution $\{P_j\}$ can be considered a time-scale density. This gives a suitable tool for detecting and characterizing specific phenomena in the time and frequency planes.
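As an illustration of the relative wavelet energy in Eqs. (7), (9) and (10), the following minimal sketch computes $E_j$, $E_{tot}$ and $P_j$ from detail coefficients. The paper does not state which mother wavelet was used for this step, so the orthonormal Haar transform here is an assumption chosen only because it needs no external library; the function names `haar_details`, `relative_wavelet_energy` and `shannon_entropy` are hypothetical.

```python
import math


def haar_details(signal, levels):
    """Detail coefficients D_1..D_levels of an orthonormal Haar DWT (assumed wavelet)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = len(approx) // 2  # a trailing odd sample is dropped
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        details.append(detail)
    return details


def relative_wavelet_energy(details):
    """Return P_j = E_j / E_tot for each resolution level j."""
    energies = [sum(d * d for d in level) for level in details]  # Eq. (7)
    total = sum(energies)                                        # Eq. (9)
    if total == 0:
        raise ValueError("all detail coefficients are zero")
    return [e / total for e in energies]                         # Eq. (10)


def shannon_entropy(probabilities):
    """Shannon entropy of a distribution, using the convention 0 * ln(0) = 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    p = relative_wavelet_energy(haar_details([4.0, 2.0, 6.0, 8.0], levels=2))
    print(p, shannon_entropy(p))
```

Because the $P_j$ sum to one, they can be passed directly to an entropy function; a signal whose energy sits in a single level gives an entropy of zero.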
Fig. 1. Lyapunov exponents of the ECG beats. (a) Normal beat and (b) partial epilepsy beat.
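The Lyapunov exponent definition of Eq. (3) and the four per-epoch statistics used as features in Section 3.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math
import statistics


def lyapunov_from_divergence(d0, dn, n):
    """Eq. (3): lambda = (1/n) * ln(d_n / d_0) for two initially close trajectories."""
    if d0 <= 0 or dn <= 0 or n <= 0:
        raise ValueError("d0, dn and n must be positive")
    return math.log(dn / d0) / n


def summarize_exponents(exponents):
    """Reduce the exponents of one epoch to max, min, mean and standard deviation."""
    return {
        "max": max(exponents),
        "min": min(exponents),
        "mean": statistics.fmean(exponents),
        "std": statistics.pstdev(exponents),  # population form: an assumption
    }


if __name__ == "__main__":
    # A separation that grows as d0 * exp(0.5 * n) recovers lambda = 0.5.
    print(lyapunov_from_divergence(1e-6, 1e-6 * math.exp(0.5 * 10), 10))
    print(summarize_exponents([1.0, 2.0, 3.0]))
```

With 250 exponents per epoch, this reduction turns each epoch into a four-element feature vector.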
Shannon entropy measures the predictability of future amplitude values of the ECG based on the probability distribution of amplitude values. It quantifies the probability density function of the distribution of values. The total wavelet entropy is therefore defined as (Rosso et al., 2000; Safty & A.E.-Z., 2008):

$$w_S(p) = -\sum_j P_j \ln(P_j) \tag{11}$$

Following these principles of wavelet entropy, different entropy criteria have been used, as described in the MATLAB 7 documentation. In the following expressions, s is the signal and $(s_i)_i$ are the coefficients of s in an orthogonal basis.

The entropy E must be an additive cost function such that E(0) = 0 and $E(s) = \sum_i E(s_i)$.

Normalized Shannon entropy

$$E1(s) = -\sum_i s_i^2 \log(s_i^2) \quad \text{with the convention } 0 \log(0) = 0 \tag{12}$$

Concentration in norm entropy with $1 \le p$

$$E2(s) = \sum_i |s_i|^p = \|s\|_p^p \tag{13}$$

Here, p plays the role of a power.

Log energy entropy
The logarithm of "energy", defined as the sum over all samples

$$E3(s) = \sum_i \log(s_i^2) \quad \text{with the convention } \log(0) = 0 \tag{14}$$

Threshold entropy
$E4(s_i) = 1$ if $|s_i| > p$ and 0 elsewhere, so $E4(s) = \#\{i \text{ such that } |s_i| > p\}$ is the number of samples for which the absolute value of the signal exceeds a threshold p.

SURE entropy
A threshold-based method in which the threshold equals

$$p = \sqrt{2 \log_e(n \log_2(n))} \tag{15}$$

where n is the number of samples in the signal.

$$E5(s) = n + \sum_i \min(s_i^2, p^2) \tag{16}$$

3.3. Fractal dimension with Higuchi’s algorithm

Let x(1), x(2), ..., x(N) be the time sequence to be analyzed. Construct k new time series $x_m^k$ as $x_m^k = \{x(m), x(m + k), \ldots, x(m + [(N - m)/k]k)\}$ for m = 1, 2, ..., k, where m indicates the initial time value, k indicates the discrete time interval between points (delay) and [a] means the integer part of a. For each of the curves or time series $x_m^k$ constructed, the average length $L_m(k)$ is computed as

$$L_m(k) = \frac{\left(\sum_{i=1}^{[(N-m)/k]} |x(m + ik) - x(m + (i - 1)k)|\right)(N - 1)}{\left[\frac{N - m}{k}\right] k} \tag{17}$$

where N is the total length of the data sequence and $(N - 1)/([(N - m)/k]k)$ is a normalization factor. An average length is computed for all time series having the same delay (or scale) k, as the mean of the k lengths $L_m(k)$ for m = 1, 2, ..., k. The procedure is repeated for each k ranging from 1 to $k_{max}$, yielding a sum of average lengths L(k) for each k as indicated in

$$L(k) = \sum_{m=1}^{k} L_m(k) \tag{18}$$

The total average length for scale k, L(k), is proportional to $k^{-D}$, where D is the fractal dimension² by Higuchi’s method. In the curve of ln(L(k)) versus ln(1/k), the slope of the least-squares linear best fit is the estimate of the FD. Higuchi’s algorithm requires choosing the values $K_{min}$ and $K_{max}$, that is, the initial and final values of the scaling factor. However, no criterion for selecting these values is given, which makes the results less effective. Higuchi fixed the minimum value at K = 2 and the maximum value at K = 8 for his algorithm. The FD of a curve can be defined as

$$D = \frac{\log_{10} L}{\log_{10} d} \tag{19}$$

Here L is the total length of the curve, or the sum of distances between successive points, and d is the diameter, estimated as the distance between the first point of the sequence and the point of the sequence that lies farthest from it. Mathematically, d can be expressed as

$$d = \max(\mathrm{distance}(1, i)) \tag{20}$$

Considering the distance between each point of the sequence and the first, point i is the one that maximizes the distance with respect to the first point. The FD compares the actual number of units that compose a curve with the minimum number of units required to reproduce a pattern of the same spatial extent (Paramanathan & Uthayakumar, 2008).

4. Support vector machine

The SVM proposed by Vapnik (1995) has been studied comprehensively for classification, regression, and density estimation. Fig. 2 shows the architecture of the SVM. The SVM maps the input patterns into a higher-dimensional feature space through some nonlinear mapping chosen a priori. Then, a linear decision surface is constructed in this high-dimensional feature space. Hence, the SVM is a linear classifier in the parameter space, but it can perform nonlinear classification through the nonlinear mapping of the input space to the high-dimensional feature space (Burges, 1998; Übeyli and Güler, 2008).

Training the SVM is a quadratic optimization problem. The construction of a hyperplane $w^T x + b = 0$ (w is the vector of hyperplane coefficients and b is a bias term), such that the margin between the hyperplane and the nearest point is maximized, can be posed as a quadratic optimization problem. SVM has been shown to provide high generalization ability. For a two-class problem, assuming the optimal hyperplane in the feature space has been generated, Eq. (21) shows the classification decision for new (test) data x:

$$f(x) = \sum_{i=1}^{N} y_i \alpha_i K(x_i, x) + b \tag{21}$$

where $\alpha_i \ge 0$, i = 1, 2, ..., N, are non-negative Lagrange multipliers that satisfy $y_i(\langle w, \varphi(x_i) \rangle + b) = 1$. The support vectors therefore lie on the hyperplane with $\alpha \ge 0$, and $K(x_i, x)$ is a kernel function that expresses an inner product in the feature space. So, f(x) is a linear combination of the kernels. Fig. 3 shows training data.

SVM uses kernel functions such as $k(x_i, x) = \langle x_i, x \rangle$ (linear SVM); $k(x_i, x) = (\langle x_i, x \rangle + 1)^d$ (polynomial SVM of degree d); and $k(x_i, x) = e^{-\|x_i - x\|^2 / 2\sigma^2}$ (radial basis function, RBF SVM), where d and $\sigma$ are constants. However, the proper kernel function for a given problem depends on the specific data, and so far there is no good method for choosing a kernel function. In this study, the choice of the kernel functions was studied empirically and optimal

² FD: fractal dimension.
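Higuchi's procedure from Section 3.3 can be sketched in a few lines. This is a hedged illustration, not the authors' code: it assumes the widely used form of the method in which each $L_m(k)$ carries an additional factor 1/k and L(k) is the mean over m (the text mentions both a mean and a sum), and it scans k from 1 to `k_max`. The function name `higuchi_fd` and the default `k_max=8` (the upper value attributed to Higuchi in the text) are choices made for this example.

```python
import math


def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of sequence x by Higuchi's method (assumed standard form)."""
    n = len(x)
    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):                # 1-based start index, as in the text
            steps = (n - m) // k                 # integer part [(N - m) / k]
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k - 1] - x[m + (i - 1) * k - 1])
                       for i in range(1, steps + 1))
            norm = (n - 1) / (steps * k)         # normalization factor from the text
            lengths.append(dist * norm / k)      # extra 1/k: assumption (standard form)
        if not lengths:
            continue
        mean_length = sum(lengths) / len(lengths)
        if mean_length <= 0:
            raise ValueError("constant signal: curve length is zero")
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(mean_length))
    if len(log_inv_k) < 2:
        raise ValueError("need at least two usable scales")
    # Least-squares slope of ln L(k) versus ln(1/k) is the FD estimate.
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_length) / len(log_length)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_length))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den


if __name__ == "__main__":
    print(higuchi_fd([float(i) for i in range(100)]))  # a straight line has FD close to 1
```

A smooth curve gives an estimate near 1, while an uncorrelated noise sequence gives an estimate near 2, which is why the FD is informative as a complexity feature.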
Table 1
WSVM using the Morlet and Mexican hat wavelet kernel.
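To make the kernel choices concrete, the sketch below implements the polynomial and RBF kernels of Section 4, a translation-invariant wavelet kernel built as a product of a mother wavelet over the input dimensions, and the kernel decision value of Eq. (21). The mother-wavelet formulas are commonly used forms and are assumptions here, not values taken from Table 1; the Morlet frequency `omega0=1.75`, the single shared dilation `a` and all function names are likewise hypothetical.

```python
import math


def polynomial_kernel(x, z, degree):
    """(<x, z> + 1)^d, the polynomial kernel of degree d."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** degree


def rbf_kernel(x, z, sigma):
    """exp(-||x - z||^2 / (2 sigma^2)), the Gaussian RBF kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mexican_hat(u):
    """Mexican hat mother wavelet (assumed form)."""
    return (1.0 - u * u) * math.exp(-u * u / 2.0)


def morlet(u, omega0=1.75):
    """Real Morlet mother wavelet (assumed form and assumed omega0)."""
    return math.cos(omega0 * u) * math.exp(-u * u / 2.0)


def wavelet_kernel(x, z, mother, a=1.0):
    """Product over dimensions of mother((x_i - z_i) / a), with one shared dilation a."""
    value = 1.0
    for xi, zi in zip(x, z):
        value *= mother((xi - zi) / a)
    return value


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """Eq. (21): f(x) = sum_i y_i * alpha_i * K(x_i, x) + b."""
    return sum(y * alpha * kernel(sv, x)
               for sv, y, alpha in zip(support_vectors, labels, alphas)) + b


if __name__ == "__main__":
    svs, ys, alphas = [[0.0, 0.0], [2.0, 2.0]], [-1, 1], [1.0, 1.0]
    k = lambda u, v: wavelet_kernel(u, v, mexican_hat, a=2.0)
    print(decision_value([1.9, 2.1], svs, ys, alphas, 0.0, k))
```

Passing the kernel as a function keeps the decision rule unchanged when a wavelet kernel is swapped in for the RBF kernel, which is the only change a wavelet SVM makes to the classifier.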
Recently, based on wavelet decomposition, some wavelet support vector machines (WSVMs) have been proposed and used successfully in many fields such as classification and nonlinear function estimation (Zhang, Zhou, & Jiao, 2004; Yang & Wang, 2008).

After verifying that the kernel satisfies Mercer's theorem, wavelet kernels for the SVM can be constructed. Translation-invariant wavelet kernels that satisfy the translation-invariant kernel theorem are:

$$K(x, x') = \prod_{i=1}^{N} \psi\left(\frac{x_i - x'_i}{a_i}\right) \tag{27}$$

The WSVM using the Morlet and Mexican hat wavelet kernels is superior to the SVM using the Gaussian radial basis function kernel. The mother wavelet functions and wavelet kernel functions for Morlet and Mexican hat are shown in Table 1 (Cui, 2006).

The goal in this formulation is to find the optimal wavelet coefficients in the space spanned by the multidimensional wavelet basis. Therefore, we can obtain the optimal decision function as below:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{t} \alpha_i y_i \prod_{j=1}^{N} \psi\left(\frac{x^j - x_i^j}{a_i}\right) + b\right) \tag{28}$$

6. Model selection

parameter and optimal soft margin constant C parameter of support vector machine classifiers.

The structure of the chromosomes is given in Table 2. The initial population of the genetic algorithm is composed of 20 chromosomes, each constructed of 9 genes (bits), which are divided into 4 segments. The first segment of a chromosome indicates the feature type; 2 bits are enough because 4 feature types are used here (Lyapunov exponent, wavelet entropy, fractal dimension, Lyapunov exponent & FD). The second segment represents the value of the C parameter, which is between 0.1 and 1,000,000. The third segment indicates the kernel type of the SVM (Polynomial, RBF, Mexican hat
Fig. 4. Fitness of 3 generation.
Table 2
Structure of a chromosome.
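The 9-bit, 4-segment chromosome described in the model selection section can be decoded as in the sketch below. Only the total length, the 2-bit feature segment and the C range of 0.1 to 1,000,000 come from the text; the remaining layout is an assumption made for illustration: 3 bits selecting one of eight powers of ten for C, 2 bits for the kernel type (with Morlet as the fourth kernel, as listed in Table 4) and 2 bits for a kernel-parameter index. All names are hypothetical.

```python
import random

FEATURES = ["Lyapunov exponent", "wavelet entropy", "fractal dimension",
            "Lyapunov exponent & FD"]
KERNELS = ["polynomial", "rbf", "mexican_hat", "morlet"]
C_VALUES = [10.0 ** e for e in range(-1, 7)]  # 0.1, 1, ..., 1,000,000 (assumed grid)


def bits_to_int(bits):
    """Interpret a list of 0/1 genes as an unsigned integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


def decode_chromosome(bits):
    """Decode a 9-bit chromosome under the assumed 2 + 3 + 2 + 2 segment layout."""
    if len(bits) != 9 or any(b not in (0, 1) for b in bits):
        raise ValueError("chromosome must be 9 genes, each 0 or 1")
    return {
        "feature": FEATURES[bits_to_int(bits[0:2])],
        "C": C_VALUES[bits_to_int(bits[2:5])],
        "kernel": KERNELS[bits_to_int(bits[5:7])],
        "kernel_param_index": bits_to_int(bits[7:9]),  # mapping to values is unspecified
    }


def initial_population(size=20, length=9, seed=None):
    """Random initial population of `size` chromosomes, as described in the text."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]


if __name__ == "__main__":
    for chromosome in initial_population(seed=1)[:3]:
        print(chromosome, decode_chromosome(chromosome))
```

In a full genetic algorithm, each decoded configuration would be used to train a classifier whose validation accuracy serves as the fitness of the chromosome.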
Table 4
Accuracy with GA_WSVM classifier.

Kernel function  Feature selection       Kernel parameter  C parameter  Nsv  Margin     Specificity  Sensitivity  Accuracy (%)
Rbf              Lyapunov exponent       σ = 10            10,000       33   37.8721    100          88.89        94.45
Rbf              Fractal dimension       σ = 10            100          39   29.2818    91.44        86.32        88.88
Rbf              Lyapunov exponent & FD  σ = 10            0.1          35   30.4322    91.44        90.5         90.97
Poly             Lyapunov exponent       d = 3             100,000      38   36.8593    95.6         81.48        88.54
Poly             Wavelet entropy         d = 3             0.1          36   31.4373    91.44        85.18        88.31
Mexican hat      Wavelet entropy         -                 1            28   37.58760   96.29        88.89        92.59
Mexican hat      Lyapunov exponent & FD  -                 100,000      23   37.86413   97.45        91.44        94.45
Mexican hat      Wavelet entropy         -                 1,000,000    24   36.53829   94.87        96.29        95.08
Morlet           Lyapunov exponent & FD  x = 10            100,000      22   38.802874  100          92.59        96.29
Morlet           Fractal Dimension       x = 1             10           30   33.09635   92.59        93.5         93.04
Morlet           Wavelet entropy         x = 5             100,000      25   36.7492    100          85.18        92.59
Morlet           Lyapunov exponent       x = 3             1            30   35.78114   92.59        88.89        90.75
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121–167.
Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20(3), 273–297.
Cui, W. Z. (2006). Wavelet support vector machine with universal approximation and its application. In IEEE information theory workshop (pp. 340–364).
Foo, S. Y., Stuart, G., Harvey, B., & Meyer-Baese, A. (2002). Neural network-based EKG pattern recognition. Engineering Applications of Artificial Intelligence, 15, 253–260.
Güler, I., & Übeyli, E. D. (2004). Application of adaptive neuro-fuzzy inference system for detection of electrocardiographic changes in patients with partial epilepsy using feature extraction. Expert Systems with Applications, 27(3), 323–330.
Güler, I., & Übeyli, E. D. (2005). An expert system for detection of electrocardiographic changes in patients with partial epilepsy using wavelet-based neural networks. Expert Systems with Application, 22(2), 62–71.
Hirsch, J., & C.O.a.L. (2002). Ictal heart rate differentiates epileptic from non-epileptic seizures. Neurology, 58, 636–638.
Hitiris, N., Suratman, S., Kelly, K., Stephen, L. J., Sills, G. J., & Brodie, M. J. (2007). Sudden unexpected death in epilepsy: A search for risk factors. Epilepsy & Behavior, 10, 138–141.
Leung, H., Kwan, P., & Elger, C. E. (2006). Finding the missing link between ictal bradyarrhythmia, ictal asystole, and sudden unexpected death in epilepsy. Epilepsy & Behavior, 13, 19–30.
Leutmezer, F., Schernthaner, C., Lurger, S., Pötzelberger, K., & Baumgartner, C. (2003). Electrocardiographic changes at the onset of epileptic seizures. Epilepsia, 44(3), 348–354.
Paramanathan, P., & Uthayakumar, R. (2008). Application of fractal theory in analysis of human electroencephalographic signals. Computers in Biology and Medicine, 38, 372–378.
Rosso, O. A., Blanco, S., Yordanova, J., Kolev, V., Figliola, A., Schurmann, M., et al. (2000). Wavelet entropy: A new tool for analysis of short duration brain electrical signals. Journal of Neuroscience Methods, 105, 65–75.
Safty, S. EL., & A.E.-Z. (2008). Applying wavelet entropy principle in fault classification. Expert Systems with Applications, 30, 1307–6884.
Saxena, S. C., Kumar, V., & Hamde, S. T. (2002). Feature extraction from ECG signals using wavelet transforms for disease diagnostics. International Journal of Systems Science, 33(13), 1073–1085.
Stollberger, C., & Finsterer, J. (2004). Cardiorespiratory findings in sudden unexplained/unexpected death in epilepsy (SUDEP). Epilepsy Research, 59, 51–60.
Übeyli, E. D., & Güler, I. (2004). Detection of electrocardiographic changes in partial epileptic patients using Lyapunov exponents with multilayer perceptron neural networks. Engineering Applications of Artificial Intelligence, 17(6), 567–576.
Übeyli, E. D., & Güler, I. (2008). Support vector machines for detection of electrocardiographic changes in partial epileptic patients. Expert Systems with Applications, 51, 1–8.
Vapnik, V. (1995). The nature of statistical learning theory (pp. 33–45). New York: Springer.
Widodo, A., & Yang, B.-S. (2007). Wavelet support vector machine for induction machine fault diagnosis based on transient current signal. Expert Systems with Applications, 1–10.
Yang, M. H., & Wang, R.-C. (2008). DDoS detection based on wavelet kernel support vector machine. The Journal of China Universities of Posts and Telecommunications, 15(3), 59–63.
Zhang, X., Ren, S.-J., Xu, J.-H., & Zhu, Z.-C. (2005). Robust multiwavelets support vector regression network. In International conference on control and automation (ICCA2005) WE-2.5 (pp. 1220–1224).
Zhang, L., Zhou, W., & Jiao, L. (2004). Wavelet support vector machine. IEEE, 34, 34–39.