
Expert Systems with Applications 38 (2011) 10751–10758


Evolutionary model selection in a wavelet-based support vector machine for automated seizure detection

M. Zavar a,⇑, S. Rahati b, M.-R. Akbarzadeh-T c, H. Ghasemifard d

a Sama Technical and Vocational Training School, Islamic Azad University, Quchan Branch, Quchan, Iran
b Islamic Azad University, Mashhad Branch, Mashhad, Iran
c Center for Applied Research on Soft Computing and Intelligent Systems, Ferdowsi University of Mashhad, Mashhad, Iran
d Dr. Shiekh Pediatric Hospital, Mashhad University of Medical Sciences, Mashhad, Iran

Keywords: Support vector machines; Genetic algorithm; Model selection; Wavelet kernel function; Partial seizure; ECG

Abstract

Support vector machines (SVM) have in recent years been gainfully used in various pattern recognition applications. Based on statistical learning theory, this paradigm promises strong robustness to noise and generalization to unseen data. As in any classification technique, the appropriate choice of kernels and input features plays an important role in SVM performance. In this study, an evolutionary scheme searches for optimal kernel types and parameters for automated seizure detection. We consider the Lyapunov exponent, fractal dimension and wavelet entropy for possible feature extraction. The classification accuracy of this approach is examined by applying it to the MIT (Massachusetts Institute of Technology) dataset and comparing the results with the SVM. The MIT-BIH dataset contains the electrocardiographic (ECG) changes in patients with partial epilepsy, with two types of ECG beats (partial epilepsy and normal). A comparison of results shows that the evolutionary scheme outperforms the support vector machine. In the best condition, the proposed approach reaches 100% specificity and 96.29% sensitivity.
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Support vector machines (SVMs) are a machine learning method based on statistical learning theory, so strong robustness to noise and good generalization to unseen data are expected from them; the lack of these properties is the most important drawback of artificial neural networks (ANNs). The idea of the SVM was first proposed by Vapnik in the mid-1970s, but it has been widely used in articles only since 1995. The SVM is now considered a powerful tool in processing and classification, and development is ongoing. An SVM uses kernels to map the data from the input space to a high-dimensional feature space in which the problem becomes linearly separable. The resulting decision function depends on the admissible kernel, the number of support vectors (SVs) and their weights. Many kinds of kernels can be used, such as radial basis functions (RBF), polynomial kernels and wavelets. Wavelets, a powerful tool for non-stationary signal processing, have more recently been applied as kernel functions in SVM classification and regression. Here, wavelet functions are first used to construct admissible kernels for the SVM according to Mercer theory (Burges, 1998; Widodo & Yang, 2007).

In this study, a wavelet-based support vector machine (WSVM) is applied to the detection of epilepsy instances from ECG signals. The approach aims to benefit from robust wavelet-based kernels and from evolutionary selection of SVM settings, such as kernel types and parameters, to minimize detection error with fewer data items (higher sparsity).

Kernels play an important role in the SVM, but an SVM cannot choose optimal kernel types, kernel parameters and feature subsets by itself. An evolutionary search scheme can be used to solve this problem. In some recent studies, genetic algorithms have been used to select the optimum features for an SVM classifier, but many of these studies do not optimize the SVM parameters. In this paper, by contrast, the optimal kernel type and kernel parameters, the feature subset and the C parameter of the SVM are assessed simultaneously. Experiments show the feasibility and validity of GA_WSVM in classification.

The electrocardiogram (ECG) signal records the bioelectrical and biomechanical activities of the cardiac system. It provides useful information about the functional aspects of the cardiovascular system. Epileptic seizures are mainly associated with several changes in the autonomic nervous system, which may cause cardiovascular, gastrointestinal, respiratory, cutaneous and urinary

⇑ Corresponding author.
E-mail addresses: m.zavar@gmail.com (M. Zavar), Rahati@mshdiau.ac.ir (S. Rahati), akbarzadeh@ieee.org (M.-R. Akbarzadeh-T), Ghasemifard.hadi@gmail.com (H. Ghasemifard).

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.01.087

manifestations. Because of the possible contribution of the cardiovascular system to sudden unexplained death (SUD), it has received the most attention. According to research in recent years, the incidence of SUD falls from 1/100 in patients with severe intractable epilepsy to 1/1000 in patients with well-controlled epilepsy. In addition, the ECG signal has been playing an increasingly important role in seizure detection in patients with partial epilepsy. Severe cardiac rhythm and conduction abnormalities have been reported, such as severe bradycardia, bundle branch blocks, and ST-segment and QT-interval abnormalities (Hirsch & C.O.a.L., 2002; Leutmezer, Schernthaner, Lurger, Pötzelberger, & Baumgartner, 2003; Leung, Kwan, & Elger, 2006; Hitiris et al., 2007).

Autonomic nervous system changes are common in epileptic and non-epileptic seizures and may cause sudden death in epileptic patients. These autonomic changes are not clearly understood, but may be the result of increased motor activity and other factors. Several studies have demonstrated that the ECG commonly changes during seizures. In this paper we propose an automated decision system for partial epilepsy seizure detection. Ordinary methods of monitoring and diagnosing electrocardiographic changes rely on a human observer detecting the presence of particular signal features. Because of the large number of patients in intensive care units and the need for continuous observation of such conditions, there has been growing interest in recent years in automating this process. Techniques from nonlinear analysis and chaos theory are applied to experimental time series such as ECG signals, and techniques have been developed for the automated detection of electrocardiographic changes. Many ECG change detection methods, such as the wavelet transform (WT), length and energy transforms, artificial neural networks and support vector machines, are reported in the literature (Foo, Stuart, Harvey, & Meyer-Baese, 2002; Güler & Übeyli, 2004; Güler & Übeyli, 2005; Saxena, Kumar, & Hamde, 2002; Stollberger & Finsterer, 2004; Übeyli & Güler, 2004).

This paper is organized as follows. In Section 2, the epileptic seizure data are described. In Section 3, features of the ECG signals are extracted with the Lyapunov exponent, wavelet entropy and fractal dimension. The SVM is reviewed in Section 4 and the wavelet SVM in Section 5. In Section 6, the model selection based on a genetic algorithm is explained. In Section 7, experimental results are presented, and finally in Section 8, conclusions are given.

2. Data acquisition

In this paper, the GA_WSVM approach is presented for the detection of electrocardiographic (ECG) changes in patients with partial epilepsy, using recordings obtained from the MIT-BIH database (Al-Aweel et al., 1999).

The MIT-BIH database is a large and growing archive of well-characterized digital recordings of physiologic signals and related data for use by the biomedical research community. MIT-BIH currently includes databases of multi-parameter cardiopulmonary, neural and other biomedical signals from healthy subjects and patients with a variety of conditions with major public health implications, including sudden cardiac death, congestive heart failure, epilepsy, gait disorders, sleep apnea, and aging. The database of heart rate oscillations in partial epilepsy is studied in the present work. Two types of ECG beats (partial epilepsy and normal) were obtained from the MIT-BIH database. Post-ictal heart rate oscillations were reported in a heterogeneous group of patients with partial epilepsy. This pattern is marked by the appearance of transient but prominent low-frequency heart rate oscillations (0.01–0.1 Hz) immediately following five of 11 seizures recorded in five patients. This finding may be a marker of neuroautonomic instability and may therefore have implications for understanding perturbations of heart rate control associated with partial seizures. The preliminary report was based upon analysis of data from 11 partial seizures recorded in five women patients during continuous electroencephalographic/electrocardiographic/video monitoring. The patients ranged in age from 31 to 48 years, were without clinical evidence of cardiac disease, and had partial seizures with or without secondary generalization from frontal or temporal foci. Recordings were made under a protocol approved by Beth Israel Deaconess Medical Center's Committee on Clinical Investigations. Data were analyzed off-line using customized software. Onset and offset of seizures were visually identified to the nearest 0.1 s by an experienced electroencephalographer (DLS) blinded with respect to the heart rate variability analysis. Continuous single-lead ECG signals were sampled at 200 Hz. From the digitized ECG recording, a heartbeat annotation file (a list of the type and time of occurrence of each heartbeat) was obtained using a version of the commercially available arrhythmia analysis software.

The seizures may be related to cardiac arrhythmias and sudden death. In the five women patients, 11 seizures were recorded, lasting between 15 and 110 s. Low-frequency post-ictal heart rate oscillations (i.e. intervals of 2–6 min) were observed in all five patients, with a frequency spectrum in the range of 0.01–0.1 Hz. No oscillations were observed in the pre-ictal period.

Overall, the significant characteristics of post-ictal heart rate oscillations are as follows:

• These oscillations are different from high-frequency oscillations (physiological oscillations of 0.2–0.4 Hz related to breathing).
• Post-ictal oscillations have high amplitudes.
• These oscillations can be discriminated from short physiological changes that are related to posture or body activity.

3. Feature extraction

Nonlinear features, namely the Lyapunov exponent, fractal dimension and wavelet entropy, are used in the feature extraction stage of this approach.

3.1. Lyapunov exponent

The Lyapunov exponent is a measure of the divergence of nearby trajectories. The exponential divergence of nearby trajectories in state space is conceptually the most basic indicator of deterministic chaos and can be estimated using the largest Lyapunov exponent $L_{max}$. The technique relies on visualizing the dynamical behavior of the multivariate system, that is, on generating the phase space portrait of the system. A phase space portrait is created by treating each time-dependent variable of the system as a component of a vector in a multidimensional space, usually called the state or phase space of the system. Each vector in the phase space represents an instantaneous state of the system. These time-dependent vectors are plotted sequentially in the phase space to represent the evolution of the state of the system over time.

A system's behavior is chaotic if its average Lyapunov exponent is a positive number. To calculate the Lyapunov exponent from a one-dimensional time series, label the series $x(t_0), x(t_1), x(t_2), \ldots$ as $x_0, x_1, x_2, \ldots$. For the sake of simplicity, we will assume, as is usually the case, that the time intervals between samples are all equal; therefore, we can write (Übeyli & Güler, 2004):

$$t_n - t_0 = n\tau \tag{1}$$

where $\tau$ is the time interval between samples.

If the system is behaving chaotically, the divergence of nearby trajectories will manifest itself in the following way: if we select some value from the sequence of $x$ values, say $x_i$, and then search the

sequence for another $x$ value, say $x_j$, that is close to $x_i$, then the sequence of differences $d_0 = |x_j - x_i|$, $d_1 = |x_{j+1} - x_{i+1}|$, ..., $d_n = |x_{j+n} - x_{i+n}|$ is assumed to increase exponentially, at least on average, as $n$ increases. More formally, we assume that

$$d_n = d_0 e^{\lambda n} \tag{2}$$

or, after taking logarithms,

$$\lambda = \frac{1}{n}\ln\frac{d_n}{d_0} \tag{3}$$

In practice, Eq. (3) is taken as the definition of the Lyapunov exponent $\lambda$. If $\lambda$ is positive, the behavior is chaotic. This method essentially locates two nearby trajectory points in state space and then follows the difference between the two trajectories that start from these "initial" points.

To characterize the attractor, an average value of $\lambda$ is usually required. It is obtained from $N$ initial points distributed over the attractor. The average Lyapunov exponent of the attractor is then found from

$$\lambda = \frac{1}{N}\sum_{i=1}^{N}\lambda(x_i) \tag{4}$$

Here $\lambda(x_i)$ indicates that the value of $\lambda$ may depend on the value $x_i$ chosen as the initial value. Positive values of the Lyapunov exponent indicate that the behavior of the system is chaotic; conversely, negative values reflect periodic behavior. Fig. 1(a) and (b) shows the Lyapunov exponents for a normal beat and an epileptic seizure beat.

The Lyapunov exponents of the two types of ECG beats differ clearly from each other, so they can be used to represent the ECG signals. As can be seen, there are positive Lyapunov exponents, which confirms the chaotic nature of the ECG signals.

As seen from Fig. 1, 250 Lyapunov exponents are extracted from every epoch. The high dimension of the feature vectors increases the computational complexity. To reduce the dimensionality of the extracted feature vectors, statistics over the set of Lyapunov exponents are used. These statistical features are the maximum, minimum, mean and standard deviation of the Lyapunov exponents in each epoch.

3.2. Wavelet entropy

The information-theoretic entropy was introduced by Claude Shannon in 1948 as a measure of uncertainty. For a process with $n$ possible outcomes, the uncertainty of the outcome is related to the probability distribution $P = (p_1, p_2, p_3, \ldots, p_n)$.

Transient signals have characteristics such as high frequency and abrupt breaks. Given a discrete signal $x(n)$ that is fast-transformed at instant $k$ and scale $j$, it has a high-frequency component coefficient $D_j(k)$ and a low-frequency component coefficient $A_j(k)$. The frequency bands of the information contained in the signal components $D_j(k)$ and $A_j(k)$, obtained by reconstruction, are as follows:

$$\begin{cases} D_j(k): [2^{-(j+1)} f_s,\ 2^{-j} f_s] \\ A_j(k): [0,\ 2^{-(j+1)} f_s] \end{cases} \qquad j = 1, \ldots, m \tag{5}$$

where $f_s$ is the sampling frequency.

The original sequence $x(n)$ can be represented by the sum of all components as follows:

$$x(n) = \sum_{j=1}^{J} D_j(k) + A_J(k) \tag{6}$$

Various wavelet entropy measures have been defined on $x(n)$. Overall, wavelet entropy is computed on signal energy levels, which can be regarded as wavelet energy when the energy is used in the formulas as follows. The energy at each resolution level $j$ is the energy of the detail signal:

$$E_j = \sum_k |D_j(k)|^2 \tag{7}$$

and the energy at each sampled time $k$ is

$$E_k = \sum_j |D_j(k)|^2 \tag{8}$$

Consequently, the total energy is obtained by

$$E_{tot} = \sum_j \sum_k |D_j(k)|^2 = \sum_j E_j \tag{9}$$

The normalized values, which represent the relative wavelet energy,

$$P_j = \frac{E_j}{E_{tot}} \tag{10}$$

define, over the resolution levels (scales), the probability distribution of the energy. Clearly, $\sum_j P_j = 1$, and the distribution $\{P_j\}$ can be considered a time-scale density. This gives a suitable tool for detecting and characterizing specific phenomena in the time and frequency planes.
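As a rough illustration of Eqs. (7), (9) and (10), the following sketch computes the per-level detail energies and the relative wavelet energy. It assumes an orthonormal Haar decomposition implemented by hand, because the text does not state which wavelet was used; the function names `haar_details` and `relative_wavelet_energy` are hypothetical.

```python
import math


def haar_details(signal, levels):
    """Return detail coefficients D_1..D_levels of an orthonormal Haar DWT.

    Assumption: Haar is used only for illustration; the source does not
    name the wavelet. len(signal) must be divisible by 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        d = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        a = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.append(d)
        approx = a
    return details


def relative_wavelet_energy(details):
    """Compute P_j = E_j / E_tot from the detail coefficients."""
    energies = [sum(c * c for c in d) for d in details]  # Eq. (7)
    total = sum(energies)                                # Eq. (9)
    if total == 0:
        raise ValueError("signal has no detail energy")
    return [e / total for e in energies]                 # Eq. (10)


if __name__ == "__main__":
    x = [math.sin(0.3 * n) + 0.5 * math.sin(2.5 * n) for n in range(64)]
    p = relative_wavelet_energy(haar_details(x, levels=3))
    print(p, sum(p))
```

The returned list plays the role of the distribution $\{P_j\}$; by construction its entries sum to 1.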

Fig. 1. Lyapunov exponents of the ECG beats. (a) Normal beat and (b) partial epilepsy beat.
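The divergence-based estimate of Eqs. (2)–(4), whose values Fig. 1 plots, can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it works directly on the scalar series without phase-space embedding, pairs each starting sample with its nearest neighbour in value, and averages Eq. (3) over all starting points. The function names and the parameters `n_steps` and `min_separation` are hypothetical.

```python
import math


def lyapunov_estimate(x, n_steps=5, min_separation=10):
    """Average of lambda = (1/n) ln(d_n / d_0) over starting points (Eqs. 3-4)."""
    last = len(x) - n_steps  # start indices that can be followed n_steps ahead
    values = []
    for i in range(last):
        # Nearest neighbour x_j of x_i, skipping temporally close samples.
        best_j, best_d0 = None, float("inf")
        for j in range(last):
            if abs(i - j) < min_separation:
                continue
            d0 = abs(x[j] - x[i])
            if 0.0 < d0 < best_d0:
                best_j, best_d0 = j, d0
        if best_j is None:
            continue
        dn = abs(x[best_j + n_steps] - x[i + n_steps])
        if dn > 0.0:
            values.append(math.log(dn / best_d0) / n_steps)  # Eq. (3)
    if not values:
        raise ValueError("no usable pairs of nearby points")
    return sum(values) / len(values)                         # Eq. (4)


def logistic_series(r, x0, length):
    """Logistic map x -> r x (1 - x), used here only as a test signal."""
    out = [x0]
    for _ in range(length - 1):
        out.append(r * out[-1] * (1.0 - out[-1]))
    return out


if __name__ == "__main__":
    print(lyapunov_estimate(logistic_series(4.0, 0.2, 300)))   # chaotic test signal
    print(lyapunov_estimate([0.5 ** k for k in range(60)]))    # contracting test signal
```

A positive result indicates diverging neighbours (chaotic behavior), a negative one indicates converging neighbours, in line with the interpretation given in Section 3.1.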

Shannon entropy measures the predictability of future amplitude values of the ECG based on the probability distribution of amplitude values. It quantifies the probability density function of the distribution of values. The total wavelet entropy is therefore defined as (Rosso et al., 2000; Safty & A.E.-Z., 2008):

$$WS(p) = -\sum_j P_j \ln(P_j) \tag{11}$$

Following the principles of wavelet entropy described above, different entropy criteria have been used, as given in MATLAB 7. In the following expressions, as in MATLAB, $s$ is the signal and $(s_i)_i$ are the coefficients of $s$ in an orthogonal basis.

The entropy $E$ must be an additive cost function such that $E(0) = 0$ and $E(s) = \sum_i E(s_i)$.

• Shannon entropy (non-normalized)

$$E1(s) = -\sum_i s_i^2 \log(s_i^2) \quad \text{with the convention } 0\log(0) = 0 \tag{12}$$

• Concentration in norm entropy with $1 \le p$

$$E2(s) = \sum_i |s_i|^p = \|s\|_p^p \tag{13}$$

Here, $p$ plays the role of the power.

• Log energy entropy
The logarithm of the "energy", defined as the sum over all samples

$$E3(s) = \sum_i \log(s_i^2) \quad \text{with the convention } \log(0) = 0 \tag{14}$$

• Threshold entropy
$E4(s_i) = 1$ if $|s_i| > p$ and 0 elsewhere, so $E4(s) = \#\{i \text{ such that } |s_i| > p\}$ is the number of samples for which the absolute value of the signal exceeds a threshold $p$.

• SURE entropy
A threshold-based method in which the threshold equals

$$p = \sqrt{2\log_e(n\log_2(n))} \tag{15}$$

where $n$ is the number of samples in the signal, and

$$E5(s) = n - \#\{i \text{ such that } |s_i| \le p\} + \sum_i \min(s_i^2, p^2) \tag{16}$$

3.3. Fractal dimension with Higuchi's algorithm

Let $x(1), x(2), \ldots, x(N)$ be the time sequence to be analyzed. Construct $k$ new time series $x_m^k$ as $x_m^k = \{x(m), x(m+k), x(m+2k), \ldots, x(m + [(N-m)/k]k)\}$ for $m = 1, 2, \ldots, k$, where $m$ indicates the initial time value, $k$ indicates the discrete time interval between points (delay) and $[a]$ means the integer part of $a$. For each of the curves or time series $x_m^k$ constructed, the average length $L_m(k)$ is computed as

$$L_m(k) = \frac{\left(\sum_{i=1}^{[(N-m)/k]} |x(m+ik) - x(m+(i-1)k)|\right)(N-1)}{[(N-m)/k]\,k} \tag{17}$$

where $N$ is the total length of the data sequence and $(N-1)/([(N-m)/k]k)$ is a normalization factor. An average length is computed for all time series having the same delay (or scale) $k$, as the mean of the $k$ lengths $L_m(k)$ for $m = 1, 2, \ldots, k$. The procedure is repeated for each $k$ ranging from 1 to $k_{max}$, yielding a sum of average lengths $L(k)$ for each $k$ as indicated in

$$L(k) = \sum_{m=1}^{k} L_m(k) \tag{18}$$

The total average length for scale $k$, $L(k)$, is proportional to $k^{-D}$, where $D$ is the fractal dimension (FD) by Higuchi's method. In the curve of $\ln(L(k))$ versus $\ln(1/k)$, the slope of the least-squares linear best fit is the estimate of the FD. Higuchi's algorithm requires choosing the values $k_{min}$ and $k_{max}$, that is, the initial and final values of the scaling factor. No criterion for selecting these values is given, which makes the results less effective. Higuchi fixed the minimum value at $k = 2$ and the maximum value at $k = 8$ for his algorithm. The FD of a curve can be defined as

$$D = \frac{\log_{10} L}{\log_{10} d} \tag{19}$$

where $L$ is the total length of the curve, or the sum of distances between successive points, and $d$ is the diameter, estimated as the distance between the first point of the sequence and the point of the sequence that provides the farthest distance. Mathematically, $d$ can be expressed as

$$d = \max_i(\mathrm{distance}(1, i)) \tag{20}$$

Considering the distance between each point of the sequence and the first, point $i$ is the one that maximizes the distance with respect to the first point. The FD compares the actual number of units that compose a curve with the minimum number of units required to reproduce a pattern of the same spatial extent (Paramanathan & Uthayakumar, 2008).

4. Support vector machine

The SVM proposed by Vapnik (1995) has been studied comprehensively for classification, regression, and density estimation. Fig. 2 shows the architecture of the SVM. The SVM maps the input patterns into a higher-dimensional feature space through some nonlinear mapping chosen a priori. Then, a linear decision surface is constructed in this high-dimensional feature space. Hence, the SVM is a linear classifier in the parameter space, but it can perform nonlinear classification through the nonlinear mapping of the input space to the high-dimensional feature space (Burges, 1998; Übeyli and Güler, 2008).

Training the SVM is a quadratic optimization problem. The construction of a hyperplane $w^T x + b = 0$ ($w$ is the vector of hyperplane coefficients and $b$ is a bias term) such that the margin between the hyperplane and the nearest point is maximized can be posed as a quadratic optimization problem. The SVM has been shown to provide high generalization ability. For a two-class problem, assuming the optimal hyperplane in the feature space has been generated, Eq. (21) shows the classification decision for new (test) data $x$:

$$f(x) = \sum_{i=1}^{N} y_i \alpha_i K(x_i, x) + b \tag{21}$$

where $\alpha_i \ge 0$, $i = 1, 2, \ldots, N$, are non-negative Lagrange multipliers. The support vectors are the training points with $\alpha_i > 0$; they lie on the margin and satisfy $y_i(\langle w, \varphi(x_i)\rangle + b) = 1$. $K(x_i, x)$ is a kernel function that expresses an inner product in the feature space, so $f(x)$ is a linear combination of the kernels. Fig. 3 shows the training data.

The SVM uses kernel functions such as $k(x_i, x) = \langle x_i, x\rangle$ (linear SVM); $k(x_i, x) = (\langle x_i, x\rangle + 1)^d$ (polynomial SVM of degree $d$); and $k(x_i, x) = e^{-\|x_i - x\|^2 / 2\sigma^2}$ (radial basis function, RBF SVM), where $d$ and $\sigma$ are constants. However, the proper kernel function for a given problem depends on the specific data, and so far there is no good method for choosing a kernel function. In this study, the choice of the kernel functions was studied empirically and optimal

results were achieved using the RBF kernel function (Cortes & Vapnik, 1995; Vapnik, 1995; Yang & Wang, 2008).

Fig. 2. Architecture of the SVM.

Fig. 3. SVM: mapping the training data non-linearly into a higher dimensional feature space.

5. Wavelet SVM (WSVM)

5.1. Admissible support vector kernel

The kernel function of an SVM is an admissible support vector kernel, which has received much attention in machine learning because it can extend linear learning to nonlinear learning. It satisfies the Mercer positive-definiteness condition.

A kernel function $K: R^N \times R^N \to R$ satisfies the Mercer condition if and only if the matrix $[K(x_i, x_j)]_{i,j} \in R^{n \times n}$ is positive semi-definite for all points $x_1, x_2, \ldots, x_n$, with $n = 1, 2, \ldots$.

The following theorem makes it convenient to use translation-invariant kernels when proving the positive-definiteness condition of a Mercer kernel. The translation-invariant kernel $k(x, x') = k(x - x')$ is a Mercer kernel if and only if its Fourier transform is non-negative (Zhang, Ren, Xu, & Zhu, 2005). That is,

$$F[K](\omega) = (2\pi)^{-N/2}\int_{R^N} K(x)\, e^{-j(\omega \cdot x)}\, dx \ge 0 \tag{22}$$

5.2. Wavelet analysis

In wavelet analysis theory, the multiresolution analysis method applies dilation factors $a$ and translation factors $c$ to a base wavelet function in spaces of different dimension to obtain a family of wavelet functions. Given $\psi \in L^2(R)$, $a > 1$, $c > 0$, suppose $\{\psi_{m,n}\}_{m,n \in Z} = \{D_{a^m} T_{nc}\psi\}_{m,n \in Z}$; then this family of functions generates a wavelet frame in $L^2(R^N)$, where $\psi$ is known as the mother wavelet and $a$, $c$ denote the frame parameters. The family generated from the mother wavelet $\psi(x)$ can be expressed as:

$$\psi_{a,c}(x) = |a|^{-1/2}\,\psi\!\left(\frac{x - c}{a}\right) \tag{23}$$

where $x, a, c \in R$, $a$ is a dilation factor, and $c$ is a translation factor. Thus, the wavelet transform of a function $f(x) \in L^2(R)$ can be written as:

$$W_{a,c}(f) = \langle f(x), \psi_{a,c}(x)\rangle \tag{24}$$

that is, the dot product in $L^2(R)$. Eq. (24) indicates the decomposition of a function on a wavelet basis. Some mother wavelet functions can generate a wavelet frame, and a frame satisfying the Mercer condition can be used to construct kernel functions. The Mercer condition is that

$$\iint_{L_2 \otimes L_2} K(x, x') f(x) f(x')\, dx\, dx' \ge 0 \tag{25}$$

is satisfied for all $f \in L^2(R)$; then $K(x, x')$ can be expressed as a dot product in the feature space, namely $K(x, x') = k(\langle x, x'\rangle)$. Therefore, in $L^2(R)$, if $F = \{\psi_i\}$ is a frame and

$$K(x, x') = \sum_i \lambda_i \psi_i(x)\psi_i(x') \tag{26}$$

where $\lambda_i > 0$ and $\lambda_i < \lambda_{i+1}$, we regard the wavelet kernel as a multidimensional wavelet function. Generally, most researchers treat wavelets as a preprocessing tool for feature extraction or data reduction, and some good mother wavelet functions have not been applied, for example, to network intrusion detection. In fact, they can serve as SVM kernel functions.
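Eqs. (23) and (24) can be illustrated numerically. The sketch below assumes the Mexican hat mother wavelet listed in Table 1 and the usual $|a|^{-1/2}$ normalization, and approximates the inner product by a midpoint Riemann sum on a finite grid. The function names and grid settings are hypothetical choices made for illustration.

```python
import math


def mexican_hat(y):
    """Mother wavelet psi(y) = (1 - y^2) exp(-y^2 / 2)."""
    return (1.0 - y * y) * math.exp(-y * y / 2.0)


def dilated_translated(psi, a, c):
    """psi_{a,c}(x) = |a|^(-1/2) psi((x - c) / a); normalization is assumed."""
    if a == 0:
        raise ValueError("dilation factor a must be non-zero")
    scale = 1.0 / math.sqrt(abs(a))
    return lambda x: scale * psi((x - c) / a)


def inner_product(f, g, lo=-40.0, hi=40.0, steps=40000):
    """Midpoint Riemann-sum approximation of <f, g> over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * g(x)
    return h * total


def wavelet_coefficient(f, psi, a, c):
    """W_{a,c}(f) = <f, psi_{a,c}> as in Eq. (24)."""
    return inner_product(f, dilated_translated(psi, a, c))


if __name__ == "__main__":
    signal = lambda x: math.exp(-x * x)  # toy signal, not ECG data
    for a in (0.5, 1.0, 2.0):
        print(a, wavelet_coefficient(signal, mexican_hat, a, c=0.0))
```

With this normalization, dilation and translation leave the $L^2$ norm of the wavelet unchanged, which is a convenient sanity check on an implementation.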

Table 1
WSVM using the Morlet and Mexican hat wavelet kernels.

Mother wavelet                                      Wavelet kernel function
Morlet: $\psi(y) = \cos(\omega_0 y)\exp(-y^2/2)$          $K(x, x') = \prod_{i=1}^{d} \cos\left(\omega_0 \frac{x_i - x'_i}{a_i}\right)\exp\left(-\frac{\|x_i - x'_i\|^2}{2a_i^2}\right)$
Mexican hat: $\psi(y) = (1 - y^2)\exp(-y^2/2)$       $K(x, x') = \prod_{i=1}^{d} \left(1 - \frac{(x_i - x'_i)^2}{a_i^2}\right)\exp\left(-\frac{\|x_i - x'_i\|^2}{2a_i^2}\right)$
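The two kernels in Table 1 can be written down directly. The sketch below is an illustration under stated assumptions: inputs are plain Python sequences, a scalar dilation `a` is shared across all dimensions unless a list is passed, and the default `omega0=5.0` is only a placeholder. All function names are hypothetical.

```python
import math


def _dilations(a, dim):
    """Expand a scalar dilation to one value per dimension (assumed convention)."""
    if isinstance(a, (int, float)):
        return [float(a)] * dim
    return list(a)


def morlet_kernel(x, z, a=1.0, omega0=5.0):
    """K(x, z) = prod_i cos(omega0 * u_i) * exp(-u_i^2 / 2), u_i = (x_i - z_i) / a_i."""
    k = 1.0
    for xi, zi, ai in zip(x, z, _dilations(a, len(x))):
        u = (xi - zi) / ai
        k *= math.cos(omega0 * u) * math.exp(-u * u / 2.0)
    return k


def mexican_hat_kernel(x, z, a=1.0):
    """K(x, z) = prod_i (1 - u_i^2) * exp(-u_i^2 / 2), u_i = (x_i - z_i) / a_i."""
    k = 1.0
    for xi, zi, ai in zip(x, z, _dilations(a, len(x))):
        u = (xi - zi) / ai
        k *= (1.0 - u * u) * math.exp(-u * u / 2.0)
    return k


def gram_matrix(kernel, points, **params):
    """Kernel matrix [K(x_i, x_j)] for a list of points."""
    return [[kernel(p, q, **params) for q in points] for p in points]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.5, -0.2], [1.0, 1.0]]  # toy points, not ECG features
    for row in gram_matrix(morlet_kernel, pts, a=2.0, omega0=1.0):
        print(row)
```

Both kernels depend only on the difference x − z, so the Gram matrix is symmetric with ones on the diagonal.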

Recently, based on wavelet decomposition, some wavelet support vector machines (WSVMs) have been proposed and used successfully in many fields such as classification and nonlinear function estimation (Zhang, Zhou, & Jiao, 2004; Yang & Wang, 2008).

After checking that the kernel satisfies the Mercer theorem, we construct wavelet kernels for the SVM. Translation-invariant wavelet kernels that satisfy the translation-invariant kernel theorem are:

$$K(x, x') = \prod_{i=1}^{N} \psi\!\left(\frac{x_i - x'_i}{a_i}\right) \tag{27}$$

The WSVM using the Morlet or Mexican hat wavelet kernel is superior in modeling to the SVM using the Gaussian radial basis function kernel. The mother wavelet functions and wavelet kernel functions for Morlet and Mexican hat are shown in Table 1 (Cui, 2006).

The goal of this formulation is to find the optimal wavelet coefficients in the space spanned by the multidimensional wavelet basis. Therefore, we can obtain the optimal decision function as below:

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{t} \alpha_i y_i \prod_{j=1}^{N} \psi\!\left(\frac{x^j - x_i^j}{a_i}\right) + b\right) \tag{28}$$

where $x_i^j$ is the $j$th component of the $i$th training example.

6. Model selection

In this section, model selection for the SVM with an evolutionary search scheme is described. The genetic algorithm (GA) is a good evolutionary computing method for most types of problems. A GA has several starting points, so it can explore the search space from several directions.

A GA proceeds in several stages. At the start, a random population of $n$ chromosomes is generated. A fitness function assesses the fitness $f(x)$ of each chromosome $x$ in the population. The GA generates a new population by repeating the following steps until the new population is complete. Two parent chromosomes are selected from the population according to their fitness. Crossover is performed between the parents, with a crossover probability, to form new offspring. Mutation is applied to the new offspring with a mutation probability. The new offspring are placed in the new population. The newly generated population is used for a further run of the algorithm (Engin, 2008).

The structure and algorithm of the GA_WSVM approach are used in this study for ECG signals. Using the GA, this model selects the feature vector, the optimal kernel function type, the optimal kernel function parameter and the optimal soft margin constant C of the support vector machine classifier.

The structure of a chromosome is shown in Table 2.

Table 2
Structure of a chromosome.

Feature selection   C parameter   Kernel type   ω parameter for Morlet kernel
00                  000           00            00

The initial population of the genetic algorithm is composed of 20 chromosomes. Each chromosome consists of 9 genes (bits), organized into 4 segments. The first segment of a chromosome indicates the feature type; 2 bits are enough because 4 feature types are used here (Lyapunov exponent, wavelet entropy, fractal dimension, and Lyapunov exponent & FD). The second segment represents the value of the C parameter, which is between 0.1 and 1,000,000. The third segment gives the kernel type of the SVM (Polynomial, RBF, Mexican hat

Fig. 4. Fitness of 3 generations.

Table 3
Accuracy with SVM classifier.

Kernel function   Parameter   Nsv   Margin      Accuracy (%)
RBF               σ = 10      32    37.27183    93.034
Poly              d = 3       38    35.78114    87.045
Spline            d = 1       44    32.490018   58.06
Bspline           d = 1       40    35.379529   69.256
Fourier           d = 2       50    30.494018   50.7143
Mexican hat       –           24    38.487660   94.52
Morlet            ω = 5       25    38.53692    95.11

Fig. 5. Graphic schema of Table 3.
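The GA stages and the 9-bit chromosome of Table 2 can be sketched as follows, using the settings reported in Section 6 (population of 20, 3 generations, crossover rate 0.5, mutation rate 0.001, roulette-wheel selection). This is a minimal illustration, not the authors' implementation: the segment order follows Table 2, but the mapping from bit patterns to concrete values (for example C = 10^(v−1) for the 3-bit value v) is an assumption, and `evaluate` is a hypothetical stand-in for training a WSVM and returning its test accuracy.

```python
import random

FEATURES = ["Lyapunov exponent", "Wavelet entropy", "Fractal dimension", "Lyapunov exponent & FD"]
KERNELS = ["Polynomial", "RBF", "Mexican hat", "Morlet"]
OMEGAS = [1, 3, 5, 10]
N_BITS = 9  # 2 (feature) + 3 (C) + 2 (kernel) + 2 (omega)


def _to_int(bits):
    """Interpret a list of 0/1 genes as an unsigned integer."""
    return int("".join(str(b) for b in bits), 2)


def decode(chrom):
    """Map a 9-bit chromosome to a model description (bit-to-value mapping is assumed)."""
    return {
        "feature": FEATURES[_to_int(chrom[0:2])],
        "C": 10.0 ** (_to_int(chrom[2:5]) - 1),  # 0.1, 1, ..., 1,000,000
        "kernel": KERNELS[_to_int(chrom[5:7])],
        "omega": OMEGAS[_to_int(chrom[7:9])],
    }


def roulette(population, fitnesses, rng):
    """Roulette-wheel selection: probability proportional to fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if acc >= pick:
            return chrom
    return population[-1]


def evolve(evaluate, pop_size=20, generations=3, p_cross=0.5, p_mut=0.001, seed=0):
    """Return the best decoded model seen among the evaluated generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fitnesses = [evaluate(decode(c)) for c in population]
        for c, f in zip(population, fitnesses):
            if f > best_fit:
                best, best_fit = list(c), f
        new_pop = []
        while len(new_pop) < pop_size:
            p1 = roulette(population, fitnesses, rng)
            p2 = roulette(population, fitnesses, rng)
            c1, c2 = list(p1), list(p2)
            if rng.random() < p_cross:            # single-point crossover
                cut = rng.randint(1, N_BITS - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(N_BITS):           # bit-flip mutation
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                new_pop.append(child)
        population = new_pop[:pop_size]
    return decode(best), best_fit


if __name__ == "__main__":
    # Toy fitness for demonstration only; a real run would train and test a WSVM.
    toy = lambda m: 0.9 if m["kernel"] == "Morlet" else 0.6
    print(evolve(toy))
```

In a real experiment the `evaluate` callback would dominate the run time, since each call trains and tests one classifier.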

Table 4
Accuracy with GA_WSVM classifier.

Kernel function   Feature selection         Kernel parameter   C parameter   Nsv   Margin      Specificity   Sensitivity   Accuracy (%)
RBF               Lyapunov exponent         σ = 10             10,000        33    37.8721     100           88.89         94.45
RBF               Fractal dimension         σ = 10             100           39    29.2818     91.44         86.32         88.88
RBF               Lyapunov exponent & FD    σ = 10             0.1           35    30.4322     91.44         90.5          90.97
Poly              Lyapunov exponent         d = 3              100,000       38    36.8593     95.6          81.48         88.54
Poly              Wavelet entropy           d = 3              0.1           36    31.4373     91.44         85.18         88.31
Mexican hat       Wavelet entropy           –                  1             28    37.58760    96.29         88.89         92.59
Mexican hat       Lyapunov exponent & FD    –                  100,000       23    37.86413    97.45         91.44         94.45
Mexican hat       Wavelet entropy           –                  1,000,000     24    36.53829    94.87         96.29         95.08
Morlet            Lyapunov exponent & FD    ω = 10             100,000       22    38.802874   100           92.59         96.29
Morlet            Fractal dimension         ω = 1              10            30    33.09635    92.59         93.5          93.04
Morlet            Wavelet entropy           ω = 5              100,000       25    36.7492     100           85.18         92.59
Morlet            Lyapunov exponent         ω = 3              1             30    35.78114    92.59         88.89         90.75

wavelet function and Morlet wavelet function). Finally, the fourth segment of the chromosome encodes the Morlet kernel parameter (x), which may be 1, 3, 5, or 10.

After the population is initialized, the fitness of each of the 20 chromosomes is calculated. The fitness function used in this paper is the classification accuracy on the test data. We used roulette wheel selection together with random single-point (one-point) crossover and mutation operators. In this paper, 3 generations are run to reach the best fitness (Fig. 4). The population size is 20, the crossover rate is 0.5 and the mutation rate is 0.001.

7. Experimental results

For both classifiers, a total of 160 signals from the MIT dataset were used. Hence, the dimensions of the feature matrix are 160 × 10. Each epoch is 2.5 s long (each 2.5 s contains about 2 ECG beats). 70% of the data were used for training both classifiers, and the remaining 30% were assigned for testing.

In this section, the accuracy of SVM classifiers whose kernel function type, kernel function parameter, and C parameter have constant values is compared with the performance of the GA_WSVM approach. Implementing an SVM requires a kernel function. Seven kernel functions were used for classification; in this case the number of features is 10, the C parameter is 1000, and the other parameters have arbitrary values. The support vectors (SVs) describe the data features of the whole training dataset, yet they make up only a small portion of it. For example, if the training data number 160, the SVs account for only 32 of them (20% of n). The classification accuracies of SVM with constant values are given in Table 3.

According to the results in Table 3, the Morlet kernel function performs best of all the kernel function types used in this study for seizure detection. Fig. 5 presents the results of Table 3 graphically.

As Fig. 5 shows, the SVM with a wavelet kernel achieves the best accuracy; in addition, its number of support vectors is low and its margin is larger than that of the others. These results indicate that the wavelet kernel is suitable for seizure detection.

In the GA_WSVM method, the feature subset, the kernel function type, the kernel function parameters and the C parameter are all estimated by the genetic algorithm. This experiment used 4 kernels: 2 wavelet kernel functions (Mexican hat and Morlet), plus the RBF and polynomial kernels, which obtained good results in Table 3. The parameters for the RBF and polynomial kernels are held constant (r = 10, d = 3) (Table 4).

The results in Table 5 show that the C parameter, the features and the kernel parameters play an important role in classification accuracy. As shown in both Tables 4 and 5, the classification accuracies obtained with the GA_WSVM approach are higher than those of the SVM. Finally, the classification accuracies obtained with the SVM, WSVM and GA_WSVM approaches under their best conditions are given in Table 5.

Table 5
Classification results.

Classifier       Number of features   Accuracy (%)
SVM (RBF)        10                   93.034
WSVM (Morlet)    10                   95.11
GA_WSVM          6                    96.29

8. Conclusion

In this study, the GA_WSVM approach is proposed for ECG signals, in which the wavelet technique is combined with SVMs to construct WSVMs. Our wavelet kernel is a kind of multidimensional wavelet function that can be employed for classification. Because the wavelet kernel is orthonormal (or approximately orthonormal), the data can be classified more accurately and faster than with the other kernels.

Although kernels and features play an important role in SVM performance, an SVM cannot by itself choose the optimal kernel type, kernel parameters and feature subset. An evolutionary search scheme can be used to solve this problem.

The GA_WSVM approach optimizes the feature selection, kernel function type, kernel function parameters, and C parameter simultaneously. The feature subset in ECG classification is generated from the Lyapunov exponent, wavelet entropy and fractal dimension, in order to increase the classification capability of SVM classifiers. Hence, the GA_WSVM approach can be used in many areas, such as medical data classification and speech recognition.

Generally speaking, the GA_WSVM approach was presented for the detection of electrocardiographic (ECG) changes in epileptic patients. From the results it can be concluded that changes in the neural system (epileptic discharges) have a significant effect on the cardiovascular system. There are clear reasons that support this claim, such as the presence of neuronal intercommunication between epileptogenic foci and the centers of heart control. Moreover, because an aura occurs before some seizures and at the onset of abnormal discharges in the brain, it may be a reason for ECG changes before the onset of convulsive behavior.
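The genetic-algorithm settings reported before the experimental results (population of 20, roulette wheel selection, one-point crossover with rate 0.5, mutation rate 0.001, 3 generations) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bit-string encoding, the chromosome length `CHROM_LEN`, and the placeholder `toy_fitness` are assumptions made only so the example runs on its own. In the paper the fitness is the classification accuracy on the test data.

```python
import random

POP_SIZE = 20          # population size reported in the text
CROSSOVER_RATE = 0.5   # crossover rate reported in the text
MUTATION_RATE = 0.001  # mutation rate reported in the text (applied per bit here)
GENERATIONS = 3        # number of generations reported in the text
CHROM_LEN = 16         # hypothetical chromosome length (assumption)


def toy_fitness(chrom):
    """Placeholder fitness (fraction of 1-bits); the paper uses test accuracy."""
    return sum(chrom) / len(chrom)


def roulette_select(pop, fits, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fits)
    if total == 0:
        return rng.choice(pop)
    pick = rng.uniform(0, total)
    running = 0.0
    for chrom, fit in zip(pop, fits):
        running += fit
        if running > pick:
            return chrom
    return pop[-1]


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at one random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(chrom, rng):
    """Flip each bit independently with probability MUTATION_RATE."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in chrom]


def run_ga(fitness=toy_fitness, seed=0):
    """Run the GA and return the best chromosome seen and its fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
    best = max(pop, key=fitness)
    for _ in range(GENERATIONS):
        fits = [fitness(c) for c in pop]
        children = []
        while len(children) < POP_SIZE:
            p1 = roulette_select(pop, fits, rng)
            p2 = roulette_select(pop, fits, rng)
            if rng.random() < CROSSOVER_RATE:
                c1, c2 = one_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            children.extend([mutate(c1, rng), mutate(c2, rng)])
        pop = children[:POP_SIZE]
        # Bookkeeping only: remember the best chromosome seen so far.
        best = max(pop + [best], key=fitness)
    return best, fitness(best)
```

To move this toward the setting described in the text, `toy_fitness` would be replaced by a function that decodes the chromosome into a feature subset, kernel type, kernel parameter and C value, trains an SVM, and returns its test accuracy.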

Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Rosso, O. A., Blanco, S., Yordanova, J., Kolev, V., Figliola, A., Schurmann, M., et al.
Data Mining and Knowledge Discovery, 2, 121–167. (2000). Wavelet entropy: A new tool for analysis of short duration brain
Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20(3), electrical signals. Journal of Neuroscience Methods, 105, 65–75.
273–297. Safty, S. EL., & A.E.-Z. (2008). Applying wavelet entropy principle in fault
Cui, W. Z. (2006). Wavelet support vector machine with universal approximation classification. Expert Systems with Applications, 30, 1307–6884.
and its application. In IEEE information theory workshop (pp. 340–364). Saxena, S. C., Kumar, V., & Hamde, S. T. (2002). Feature extraction from ECG signals
Foo, S. Y., Stuart, G., Harvey, B., & Meyer-Baese, A. (2002). Neural network-based using wavelet transforms for disease diagnostics. International Journal of
EKG pattern recognition. Engineering Applications of Artificial Intelligence, 15, Systems Science, 33(13), 1073–1085.
253–260. Stollberger, C., & Finsterer, J. (2004). Cardiorespiratory findings in sudden
Güler, I., & Übeyli, E. D. (2004). Application of adaptive neuro-fuzzy inference unexplained/unexpected death in epilepsy (SUDEP). Epilepsy Research, 59,
system for detection of electrocardiographic changes in patients with partial 51–60.
epilepsy using feature extraction. Expert Systems with Applications, 27(3), Übeyli, E. D., & Güler, I. (2004). Detection of electrocardiographic changes in partial
323–330. epileptic patients using Lyapunov exponents with multilayer perceptron neural
Güler, I., & Übeyli, E. D. (2005). An expert system for detection of networks. Engineering Applications of Artificial Intelligence, 17(6), 567–576.
electrocardiographic changes in patients with partial epilepsy using wavelet- Übeyli, E. D., & Güler, I. (2008). Support vector machines for detection of
based neural networks. Expert Systems with Application, 22(2), 62–71. electrocardiographic changes in partial epileptic patients. Expert Systems with
Hirsch, J., & C.O.a.L. (2002). Ictal heart rate differentiates epileptic from non- Applications, 51, 1–8.
epileptic seizures. Neurology, 58, 636–638. Vapnik, V. (1995). The nature of statistical learning theory (pp. 33–45). New York:
Hitiris, N., Suratman, S., Kelly, K., Stephen, L. J., Sills, G. J., & Brodie, M. J. (2007). Springer.
Sudden unexpected death in epilepsy: A search for risk factors. Epilepsy & Widodo, A., & Yang, B.-S. (2007). Wavelet support vector machine for induction
Behavior, 10, 138–141. machine fault diagnosis based on transient current signal. Expert Systems with
Leung, H., Kwan, P., & Elger, C. E. (2006). Finding the missing link between ictal Applications, 1–10.
bradyarrhythmia, ictal asystole, and sudden unexpected death in epilepsy. Yang, M. H., & Wang, R.-C. (2008). DDoS detection based on wavelet kernel support
Epilepsy & Behavior, 13, 19–30. vector machine. The Journal of China Universities of Posts and
Leutmezer, F., Schernthaner, C., Lurger, S., Pötzelberger, K., & Baumgartner, C. Telecommunications, 15(3), 59–63.
(2003). Electrocardiographic changes at the onset of epileptic seizures. Epilepsia, Zhang, X., Ren, S.-J., Xu, J.-H., & Zhu, Z.-C. (2005). Robust multiwavelets support
44(3), 348–354. vector regression network. In International conference on control and automation
Paramanathan, P., & Uthayakumar, R. (2008). Application of fractal theory in (ICCA2005) WE-2.5 (pp. 1220–1224).
analysis of human electroencephalographic signals. Computers in Biology and Zhang, L., Zhou, W., & Jiao, L. (2004). Wavelet support vector machine. IEEE, 34,
Medicine, 38, 372–378. 34–39.

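The role of the Morlet kernel parameter (taking the values 1, 3, 5 or 10 in the text) can be illustrated with a small Python sketch of a translation-invariant wavelet kernel. The exact kernel formula used in the paper is not visible in this section, so the form below, a product over features of cos(omega * u) * exp(-u^2 / 2) with u = (x_i - z_i) / a, is an assumption based on a commonly used Morlet wavelet kernel. The names `morlet_mother`, `morlet_kernel`, `omega` and the dilation `a` are hypothetical.

```python
import math


def morlet_mother(u, omega):
    """Assumed Morlet mother wavelet: cos(omega * u) * exp(-u^2 / 2)."""
    return math.cos(omega * u) * math.exp(-0.5 * u * u)


def morlet_kernel(x, z, omega=3.0, a=1.0):
    """Assumed translation-invariant wavelet kernel: product over features."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    k = 1.0
    for xi, zi in zip(x, z):
        k *= morlet_mother((xi - zi) / a, omega)
    return k


# Compare the candidate parameter values listed in the text on two toy vectors.
x = [0.2, 0.5, 0.1]
z = [0.3, 0.4, 0.0]
values = {omega: morlet_kernel(x, z, omega=omega) for omega in (1, 3, 5, 10)}
```

Under this assumed form, a larger `omega` makes the kernel oscillate faster as two feature vectors move apart, which is one way to read why the parameter is left to the genetic search rather than fixed by hand.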