
IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 12, NO. 5, SEPTEMBER 2008 667

Classification of Electrocardiogram Signals With Support Vector Machines and Particle Swarm Optimization
Farid Melgani, Senior Member, IEEE, and Yakoub Bazi, Member, IEEE

Abstract—The aim of this paper is twofold. First, we present a thorough experimental study to show the superiority of the generalization capability of the support vector machine (SVM) approach in the automatic classification of electrocardiogram (ECG) beats. Second, we propose a novel classification system based on particle swarm optimization (PSO) to improve the generalization performance of the SVM classifier. For this purpose, we have optimized the SVM classifier design by searching for the best value of the parameters that tune its discriminant function, and upstream by looking for the best subset of features that feed the classifier. The experiments were conducted on the basis of ECG data from the Massachusetts Institute of Technology–Beth Israel Hospital (MIT–BIH) arrhythmia database to classify five kinds of abnormal waveforms and normal beats. In particular, they were organized so as to test the sensitivity of the SVM classifier and that of two reference classifiers used for comparison, i.e., the k-nearest neighbor (kNN) classifier and the radial basis function (RBF) neural network classifier, with respect to the curse of dimensionality and the number of available training beats. The obtained results clearly confirm the superiority of the SVM approach as compared to traditional classifiers, and suggest that further substantial improvements in terms of classification accuracy can be achieved by the proposed PSO–SVM classification system. On an average, over three experiments making use of a different total number of training beats (250, 500, and 750, respectively), the PSO–SVM yielded an overall accuracy of 89.72% on 40438 test beats selected from 20 patient records against 85.98%, 83.70%, and 82.34% for the SVM, the kNN, and the RBF classifiers, respectively.

Index Terms—Electrocardiogram (ECG) signal classification, feature detection, feature reduction, generalization capability, model selection issue, particle swarm optimization (PSO), support vector machine (SVM).

Manuscript received June 21, 2007; revised December 31, 2007 and March 4, 2008. First published April 11, 2008; current version published September 4, 2008.
F. Melgani is with the Department of Information Engineering and Computer Science, University of Trento, I-38050 Trento, Italy (e-mail: melgani@disi.unitn.it).
Y. Bazi is with the College of Engineering, Al Jouf University, Al Jouf 2014, Saudi Arabia (e-mail: yakoub.bazi@ju.edu.sa).
Digital Object Identifier 10.1109/TITB.2008.923147
1089-7771/$25.00 © 2008 IEEE

I. INTRODUCTION

For several years, the automatic classification of electrocardiogram (ECG) signals has received great attention from the biomedical engineering community. This is mainly due to the fact that ECG provides cardiologists with useful information about the rhythm and functioning of the heart. Therefore, its analysis represents an efficient way to detect and treat different kinds of cardiac diseases.

In the literature, several methods have been proposed for the automatic classification of ECG signals. Among the most recently published works are those presented in [1]–[10]. In greater detail, the method presented in [1] is based on a hybrid fuzzy neural network that consists of a fuzzy self-organizing subnetwork connected in a cascade with a multilayer perceptron. The authors proposed to use high-order statistics (i.e., cumulants of the second, third, and fourth orders) as input features for feeding their classifier. In [2], a neuro-fuzzy approach for the ECG-based classification of heart rhythms is described. Here, the QRS complex signal is characterized by Hermite polynomials, whose coefficients feed the neuro-fuzzy classifier. In [3], the authors implemented two classification systems based on the support vector machine (SVM) approach. The first exploits features based on high-order statistics, while the second uses the coefficients of Hermite polynomials. For improved performance, the authors propose to combine the two classifiers by means of a weighting mechanism, whose weights are determined according to a least square estimation method. Detection of premature ventricular contractions (PVCs) by means of a fuzzy-neural network classifier with features derived from a quadratic spline wavelet transform is proposed in [4]. In [5], different classification systems based on linear discriminant classifiers are explored, together with different morphological and timing features obtained from single and multiple ECG leads. In [6], a high-order spectral analysis method is proposed for the analysis and classification of cardiac arrhythmias, based on bispectral analysis techniques. In particular, the bispectrum is estimated using an autoregressive model, and the frequency support of the bispectrum is extracted as a quantitative measure to classify atrial and ventricular tachyarrhythmias. In [7], an automatic online beat segmentation and classification system based on a Markovian approach is proposed. The system carries out ECG signal analysis through two processing layers. In the first, the ECG signal is segmented into beat waveforms by means of a robust and precise waveform modeling with hidden Markov models (HMMs). In the second, the system identifies premature ventricular contraction beats using a simple set of rules. In [8], a rule-based rough-set decision system is presented for the development of an inference engine for disease identification using time-domain features. In [9], a patient-adapting heartbeat classifier system based on linear discriminants is proposed. The classification system processes an incoming recording with a global-classifier to produce the first set of beat annotations. Then, an expert validates, and, if necessary, corrects a fraction of the beats of the recording. The system then adapts by first
training a local classifier using the newly annotated beats, and combines both local and global classifiers to form an adapted classification system. Finally, in [10], the authors present an approach for classifying beats of a large dataset by training a neural network classifier using wavelet and timing features. The authors found that the fourth scale of a dyadic wavelet transform with a quadratic spline wavelet together with the pre/post RR-interval ratio is very effective in distinguishing normal and PVC from other beats.

From these works, it appears clear that research in the field of automatic ECG classification has reached a good level of maturation. However, in the design of an ECG classification system, there are still some open issues, which, if suitably addressed, may lead to the development of more robust and efficient classifiers. One of these issues is related to the choice of the classification approach to be adopted. In particular, we think that despite its great potential, the SVM approach has not received the attention it deserves in the ECG classification literature as compared to other research fields. Indeed, the SVM classifier exhibits a promising generalization capability, thanks to the maximal margin principle (MMP) it is based on [11]. Another important property is that it is less sensitive to the curse of dimensionality than traditional classification approaches. This is explained by the fact that the MMP makes it unnecessary to estimate explicitly the statistical distributions of classes in the hyperdimensional feature space in order to carry out the classification task. Thanks to these interesting properties, the SVM classifier has proved successful in a number of different application fields, such as 3-D object recognition [12], biomedical imaging [13], image compression [14], and remote sensing [15], [16]. Turning back to ECG classification, other issues that need to be addressed are the following: 1) feature selection is not performed in a completely automatic way and 2) the selection of the best free parameters of the adopted classifier is generally done empirically (model selection issue).

In this paper, in order to address the aforementioned issues, in a first step, we present a thorough experimental exploration of the SVM capabilities for ECG classification. In a second step, we propose to optimize further the performances of the SVM approach in terms of classification accuracy: 1) by automatically detecting the best discriminating features from the whole considered feature space and 2) by solving the model selection issue. Unlike traditional feature selection methods, where the user has to specify the number of desired features, the proposed system allows to carry out what we term "feature detection." Feature selection and feature detection have the common characteristic of searching for the best discriminative features. The latter, however, has the advantage of determining their number automatically. In other words, feature detection does not require the desired number of most discriminative features from the user a priori. The detection process is implemented through a particle swarm optimization (PSO) framework that exploits a criterion intrinsically related to SVM classifier properties, namely, the number of support vectors (SVs). This framework is formulated in such a way that it also solves the model selection issue, i.e., to estimate the best values of the SVM classifier parameters, which are the regularization and kernel parameters.

The rest of the paper is organized as follows. The basic mathematical formulation of SVMs for solving binary and multiclass classification problems is recalled in Section II. The main concepts and principles of PSO are introduced in Section III. The proposed PSO–SVM classification system is described in Section IV. The experimental results obtained on ECG data from the Massachusetts Institute of Technology–Beth Israel Hospital (MIT–BIH) arrhythmia database [17] are reported in Sections V and VI. Finally, conclusions are drawn in Section VII.

II. SUPPORT VECTOR MACHINES

Let us first consider, for simplicity, a supervised binary classification problem. Let us assume that the training set consists of $N$ vectors $x_i \in \mathbb{R}^{d}$ ($i = 1, 2, \ldots, N$) from the $d$-dimensional feature space $X$. To each vector $x_i$, we associate a target $y_i \in \{-1, +1\}$. The linear SVM classification approach consists of looking for a separation between the two classes in $X$ by means of an optimal hyperplane that maximizes the separating margin [11]. In the nonlinear case, which is the most commonly used as data are often linearly nonseparable, the two classes are first mapped with a kernel method in a higher dimensional feature space, i.e., $\Phi(X) \in \mathbb{R}^{d'}$ ($d' > d$). The membership decision rule is based on the function $\mathrm{sign}[f(x)]$, where $f(x)$ represents the discriminant function associated with the hyperplane in the transformed space and is defined as

$$f(x) = w^{*} \cdot \Phi(x) + b^{*}. \tag{1}$$

The optimal hyperplane defined by the weight vector $w^{*} \in \mathbb{R}^{d'}$ and the bias $b^{*} \in \mathbb{R}$ is the one that minimizes a cost function that expresses a combination of two criteria: margin maximization and error minimization. It is expressed as [11]

$$\Psi(w, \xi) = \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{N} \xi_i. \tag{2}$$

This cost function minimization is subject to the following constraints:

$$y_i \left( w \cdot \Phi(x_i) + b \right) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, N \tag{3}$$

and

$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N \tag{4}$$

where the $\xi_i$'s are slack variables introduced to account for nonseparable data. The constant $C$ represents a regularization parameter that allows to control the shape of the discriminant function. The aforementioned optimization problem can be reformulated through a Lagrange functional, for which the Lagrange multipliers can be found by means of a dual optimization leading to a quadratic programming (QP) solution [11], i.e.,

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{5}$$

under the constraints

$$\alpha_i \ge 0, \quad \text{for } i = 1, 2, \ldots, N \tag{6}$$
and

$$\sum_{i=1}^{N} \alpha_i y_i = 0 \tag{7}$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]$ is the vector of Lagrange multipliers and $K(\cdot, \cdot)$ is a kernel function. The final result is a discriminant function conveniently expressed as a function of the data in the original (lower) dimensional feature space $X$

$$f(x) = \sum_{i \in S} \alpha_i^{*} y_i K(x_i, x) + b^{*}. \tag{8}$$

The set $S$ is a subset of the indexes $\{1, 2, \ldots, N\}$ corresponding to the nonzero Lagrange multipliers $\alpha_i$'s, which define the so-called SVs. The kernel $K(\cdot, \cdot)$ must satisfy the condition stated in Mercer's theorem so as to correspond to some type of inner product in the transformed (higher) dimensional feature space $\Phi(X)$ [11]. A typical example of such kernels is represented by the following Gaussian function:

$$K(x_i, x) = \exp\left(-\gamma \|x_i - x\|^{2}\right) \tag{9}$$

where $\gamma$ represents a parameter inversely proportional to the width of the Gaussian kernel.

As described before, SVMs are intrinsically binary classifiers. But, the classification of ECG signals often involves the simultaneous discrimination of numerous information classes. In order to face this issue, a number of multiclass classification strategies can be adopted [15], [18]. The most popular ones are the one-against-all (OAA) and the one-against-one (OAO) strategies. The former involves a reduced number of binary decompositions (and thus, of SVMs), which are, however, more complex. The latter requires a shorter training time, but may incur conflicts between classes due to the nature of the score function used for decision. Both strategies generally lead to similar results in terms of classification accuracy. In this paper, we shall consider the OAA strategy. Briefly, this strategy is based on the following procedure. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_T\}$ be the set of $T$ possible labels (information classes) associated with the ECG beats that we desire to classify. First, an ensemble of $T$ (parallel) SVM classifiers is trained. Each classifier aims at solving a binary classification problem defined by the discrimination between one information class $\omega_i$ ($i = 1, 2, \ldots, T$) against all others (i.e., $\Omega - \{\omega_i\}$). Then, in the classification phase, the "winner-takes-all" rule is used to decide which label to assign to each beat. This means that the winning class is the one that corresponds to the SVM classifier of the ensemble that shows the highest output (discriminant function value).

III. PARTICLE SWARM OPTIMIZATION

PSO is a stochastic optimization technique introduced recently by Kennedy and Eberhart, inspired by the social behavior of bird flocking and fish schooling [19]. Similar to other evolutionary computation algorithms, such as genetic algorithms (GAs) [16], PSO is a population-based search method that exploits the concept of social sharing of information. This means that each individual (called particle) of a given population (called swarm) can benefit from the previous experiences of all other individuals in the same population. During the iterative search process in the $d$-dimensional solution space, each particle (i.e., candidate solution) will adjust its flying velocity and position according to its own flying experience as well as those of the other companion particles in the swarm. PSO has proved promising in solving a number of engineering problems such as automatic control [20], antenna design [21], and inverse problems [22]. In the following, we will briefly describe the main concepts of the basic PSO algorithm.

Let us consider a swarm of size $S$. Each particle $P_i$ ($i = 1, 2, \ldots, S$) in the swarm is characterized by: 1) its current position $p_i(t) \in \mathbb{R}^{d}$, which refers to a candidate solution of the optimization problem at iteration $t$; 2) its velocity $v_i(t) \in \mathbb{R}^{d}$; and 3) the best position $p_{bi}(t) \in \mathbb{R}^{d}$ identified during its past trajectory. Let $p_g(t) \in \mathbb{R}^{d}$ be the best global position found over all trajectories traveled by the particles of the swarm. Position optimality is measured by means of one or more fitness functions defined in relation to the considered optimization problem. During the search process, the particles move according to the following equations:

$$v_i(t+1) = w\, v_i(t) + c_1 r_1(t) \left( p_{bi}(t) - p_i(t) \right) + c_2 r_2(t) \left( p_g(t) - p_i(t) \right) \tag{10}$$

$$p_i(t+1) = p_i(t) + v_i(t) \tag{11}$$

where $r_1(\cdot)$ and $r_2(\cdot)$ are random variables drawn from a uniform distribution in the range [0,1] so as to provide a stochastic weighting of the different components participating in the particle velocity definition. $c_1$ and $c_2$ are two acceleration constants regulating the relative velocities with respect to the best local and global positions, respectively. In greater detail, these parameters are considered as scaling factors that determine the relative pull of the best position of the particle and the global best position. They are sometimes referred to as the cognitive and social rates, respectively. They are factors determining how much the particle is influenced by the memory of its best location and by the rest of the swarm, respectively. The inertia weight $w$ is used as a tradeoff between the global and local exploration capabilities of the swarm. Large values of this parameter permit better global exploration, while small values lead to a fine search in the solution space. Equation (10) allows the computation of the velocity at iteration $t + 1$ for each particle in the swarm by combining linearly its current velocity (at iteration $t$) and the distances that separate the current particle position from its best previous position and the best global position, respectively. The particle position is updated with (11). Both (10) and (11) are iterated until convergence of the search process is reached. Typical convergence criteria are based on the iterative behavior of the best value of the adopted fitness function(s) or/and simply on a user-defined maximum number of iterations.

IV. PROPOSED PSO–SVM CLASSIFICATION SYSTEM

In this section, we describe the proposed SVM system for the classification of ECG signals. As mentioned in the Introduction, the aim of this system is to optimize the SVM classifier
accuracy by automatically: 1) detecting the subset of the best discriminative features (without requiring a user-defined number of desired features) and 2) solving the SVM model selection issue (i.e., estimating the best values of the regularization and kernel parameters). In order to attain this, the system is derived from an optimization framework based on PSO.

A. PSO Setup

Because of the good performances generally achieved with the nonlinear SVM classifier based on the Gaussian kernel [15], in the following, we shall describe the proposed classification system with this particular kernel. We note that the same reasoning holds for any kind of kernel that satisfies the Mercer's condition mentioned in Section II.

The position $p_i \in \mathbb{R}^{d+2}$ of each particle $P_i$ from the swarm is viewed as a vector encoding: 1) a candidate subset $f$ of features among the $d$ available input features and 2) the value of the two SVM classifier parameters, which are the regularization and the kernel parameters $C$ and $\gamma$, respectively. Since the first part of the position vector implements a feature detection task, each component (coordinate) of this part will assume either a "0" (feature discarded) or a "1" (feature selected) value. The conversion from real to binary values will be made by a simple thresholding operation at the 0.5 value.

Let $f(i)$ be the fitness function value associated with the $i$th particle $P_i$. The choice of the fitness function is important because it is on this basis that the PSO evaluates the goodness of each candidate solution $p_i$ for designing our SVM classification system. A possible choice is to adopt the class of criteria that estimates the leave-one-out error bound that exhibits the interesting property of representing an unbiased estimation of the generalization performance of classifiers. In particular, for SVM classifiers, different measures of this error bound have been derived, such as the radius-margin bound and the simple SV count [11]. In this paper, we shall explore the simple SV count as a fitness criterion in the PSO optimization framework because of its simplicity and effectiveness, as shown in the classification of hyperspectral remote sensing images [16].

B. SVM Classification With PSO: Case of Binary Classification Problems

The procedure describing the proposed SVM classification system for the basic discrimination case between two classes is as follows.

1) Initialization
Step 1) Generate randomly an initial swarm of size $S$.
Step 2) Set to zero the velocity vectors $v_i$ ($i = 1, 2, \ldots, S$) associated with the $S$ particles.
Step 3) For each position $p_i \in \mathbb{R}^{d+2}$ of the particle $P_i$ ($i = 1, 2, \ldots, S$) from the swarm, train an SVM classifier and compute the corresponding fitness function $f(i)$ (i.e., the SV measure).
Step 4) Set the best position of each particle with its initial position, i.e.,

$$p_{bi} = p_i, \quad (i = 1, 2, \ldots, S). \tag{12}$$

2) Search process
Step 5) Detect the best global position $p_g$ in the swarm exhibiting the minimal value of the considered fitness function over all explored trajectories.
Step 6) Update the speed of each particle using (10).
Step 7) Update the position of each particle using (11). If a particle goes beyond the predefined boundaries of the search space, truncate the updating by setting the position of the particle at the space boundary and reverse its search direction (i.e., multiply its speed vector by –1). This will stop the particles from further attempting to go out of the allowed search space.
Step 8) For each candidate particle $p_i$ ($i = 1, 2, \ldots, S$), train an SVM classifier and compute the corresponding fitness function $f(i)$.
Step 9) Update the best position $p_{bi}$ of each particle if its current position $p_i$ ($i = 1, 2, \ldots, S$) has a smaller fitness function.

3) Convergence
Step 10) If the maximum number of iterations is not yet reached, return to step 5).

4) Classification
Step 11) Select the best global position $p_g^{*}$ in the swarm and train an SVM classifier fed with the subset of detected features mapped by $p_g^{*}$ and modeled with the values of the two parameters $C$ and $\gamma$ encoded in the same position.
Step 12) Classify the ECG signals with the trained SVM classifier.

C. Extension to Multiclass Classification Problems

Extension of the proposed system to address multiclass classification problems is made by adopting the OAA strategy described in Section II. For a problem with a set of $T$ classes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_T\}$, this means that an ensemble of $T$ binary SVM classifiers should be optimized according to the PSO procedure described in the previous section. Therefore, the proposed approach will automatically detect the features and determine the two model parameter values for each binary SVM classifier defined to discriminate between class $\omega_i$ ($i = 1, 2, \ldots, T$) and all others (i.e., $\Omega - \{\omega_i\}$). During the classification phase, the "winner-takes-all" rule is used to produce the final decision. Note that, though we adopted the OAA strategy as a multiclass strategy, other strategies could also be considered, thanks to the general nature of the proposed PSO–SVM classification system.

V. EXPERIMENTAL DESIGN

A. Dataset Description

Our experiments were conducted on the basis of ECG data from the MIT–BIH arrhythmia database [17]. In particular, the considered beats refer to the following classes: normal sinus rhythm (N), atrial premature beat (A), ventricular premature beat (V), right bundle branch block (RB), left bundle branch block (LB), and paced beat (/). The beats were selected from
the recordings of 20 patients, which correspond to the following files: 100, 102, 104, 105, 106, 107, 118, 119, 200, 201, 202, 203, 205, 208, 209, 212, 213, 214, 215, and 217. In order to feed the classification process, in this study, we adopted the two following kinds of features: 1) ECG morphology features and 2) three ECG temporal features, i.e., the QRS complex duration, the RR interval (the time span between two consecutive R points representing the distance between the QRS peaks of the present and previous beats), and the RR interval averaged over the ten last beats [9]. In order to extract these features, first we performed the QRS detection and ECG wave boundary recognition tasks by means of the well-known ecgpuwave software available on http://www.physionet.org/physiotools/ecgpuwave/src/. Then, after extracting the three temporal features of interest, we normalized to the same periodic length the duration of the segmented ECG cycles according to the procedure reported in [23]. To this purpose, the mean beat period was chosen as the normalized periodic length, which was represented by 300 uniformly distributed samples. Consequently, the total number of morphology and temporal features equals 303 for each beat. Fig. 1 illustrates the distribution of the six considered classes drawn by means of 25 samples randomly selected for each class and the two best features according to the principal component analysis (PCA) algorithm [24]. From this figure, one can expect that the discrimination task will not be straightforward due to the apparently strong overlap between classes.

Fig. 1. Two-dimensional distribution of the six considered classes in the subspace formed by the best couple of features obtained with the PCA algorithm. For better visualization, just 25 samples were randomly selected for each class.

In order to obtain reliable assessments of the classification accuracy of the investigated classifiers, in all the following experiments, we carried out three different trials, each with a new set of randomly selected training beats, while the test set was kept unchanged. The results of these three trials obtained on the test set were thus averaged. The detailed numbers of training and test beats are reported for each class in Table I. Classification performance was evaluated in terms of four measures, which are: 1) the overall accuracy (OA), which is the percentage of correctly classified beats among all the beats considered (independently of the classes they belong to); 2) the accuracy of each class that is the percentage of correctly classified beats among the beats of the considered class; 3) the average accuracy (AA), which is the average over the classification accuracies obtained for the different classes; 4) the McNemar's test that gives the statistical significance of differences between the accuracies achieved by the different classification approaches. This test is based on the standardized normal test statistic [25]

$$Z_{ij} = \frac{f_{ij} - f_{ji}}{\sqrt{f_{ij} + f_{ji}}} \tag{13}$$

where $Z_{ij}$ measures the pairwise statistical significance of the difference between the accuracies of the $i$th and $j$th classifiers. $f_{ij}$ stands for the number of beats classified correctly and wrongly by the $i$th and $j$th classifiers, respectively. Accordingly, $f_{ij}$ and $f_{ji}$ are the counts of classified beats on which the considered $i$th and $j$th classifiers disagree. At the commonly used 5% level of significance, the difference of accuracies between the $i$th and $j$th classifiers is said to be statistically significant if $|Z_{ij}| > 1.96$.

B. Experimental Scheme

The proposed experimental framework was articulated around the following five main experiments.
1) The first experiment aimed at assessing the effectiveness of the SVM approach in classifying ECG signals directly in the whole original hyperdimensional feature space (i.e., by means of all the 303 available features). The total number of training beats was fixed to 500, as reported in Table I. For comparison purpose, we implemented two other reference nonparametric classification approaches, namely, the k-nearest neighbor (kNN) and the radial basis function (RBF) neural network classifiers [24].
2) In the second experiment, it was desired to explore the behavior of the SVM classifier (compared to the two reference classifiers) when integrated within a standard classification scheme based on a PCA feature reduction. In particular, the number of features was varied from 10 to 50 with a step of 10 so as to test this classifier in small as well as high-dimensional feature subspaces.
3) The third experimental part had for objective to assess the capability of the proposed PSO–SVM classification system to boost further the accuracy of the SVM classifier, thanks to its automatic feature detection and model selection-oriented optimization process.
4) The fourth experiment was devoted to analyze the generalization capability of the SVM, the kNN, and the RBF classifiers with and without feature reduction, and of the PSO–SVM classification system by decreasing/increasing the number of available training beats. This analysis was done through two experimental scenarios, which consisted in passing from 500 to 250 and 750 training beats, respectively.
5) Finally, in the fifth experiment, we analyzed the sensitivity of the PSO–SVM classification system with respect to the three parameters that govern the PSO optimizer, namely, the inertia weight $w$ and the two acceleration constants $c_1$ and $c_2$.
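To make the role of the three PSO parameters probed in the fifth experiment more concrete, the search procedure of Section IV-B (steps 1 to 10) can be sketched as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `decode` and `pso_minimize` are hypothetical, `fitness` is a placeholder callable standing in for "train an SVM and count its SVs", the random weights are drawn once per particle and iteration, a coordinate equal to 0.5 is treated as "selected", and the position is moved with the freshly updated velocity (a common convention, whereas (11) as printed uses the velocity at iteration t).

```python
import random


def decode(position, d):
    """Split a (d+2)-dimensional position into a feature mask and (C, gamma)."""
    # Assumption: a coordinate exactly equal to 0.5 counts as "selected".
    mask = [1 if value >= 0.5 else 0 for value in position[:d]]
    return mask, position[d], position[d + 1]


def pso_minimize(fitness, lower, upper, swarm_size=40, iterations=40,
                 w=0.4, c1=1.0, c2=1.0, seed=0):
    """Minimize `fitness` over the box [lower, upper].

    Returns (best_position, best_value, history of the global best value).
    """
    rng = random.Random(seed)
    dim = len(lower)
    # Steps 1-2: random initial swarm, zero velocities.
    pos = [[rng.uniform(lower[k], upper[k]) for k in range(dim)]
           for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    # Steps 3-4: evaluate each particle; its best position is its initial one (12).
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    history = []
    for _ in range(iterations):
        # Step 5: best global position over all explored trajectories.
        g = min(range(swarm_size), key=pbest_fit.__getitem__)
        gbest = pbest[g][:]
        history.append(pbest_fit[g])
        for i in range(swarm_size):
            r1, r2 = rng.random(), rng.random()
            out_of_bounds = False
            for k in range(dim):
                # Step 6: velocity update, as in (10).
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                # Step 7: move, then truncate at the search-space boundary.
                pos[i][k] += vel[i][k]
                if pos[i][k] < lower[k] or pos[i][k] > upper[k]:
                    pos[i][k] = min(max(pos[i][k], lower[k]), upper[k])
                    out_of_bounds = True
            if out_of_bounds:
                # Step 7 (continued): reverse the whole speed vector.
                vel[i] = [-v for v in vel[i]]
            # Steps 8-9: re-evaluate and keep the personal best if it improved.
            value = fitness(pos[i])
            if value < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], value
    g = min(range(swarm_size), key=pbest_fit.__getitem__)
    history.append(pbest_fit[g])
    return pbest[g], pbest_fit[g], history
```

In an actual PSO–SVM run, `fitness` would decode the particle with `decode(position, d)`, train a Gaussian-kernel SVM on the selected features with the decoded C and γ, and return the SV count; with d features, the bounds would be a [0, 1] box (an assumption) for the first d coordinates followed by the ranges chosen for C and γ.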

TABLE I
NUMBERS OF TRAINING AND TEST BEATS USED IN THE EXPERIMENTS
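With the six beat classes of Table I, the OAA strategy of Section II amounts to evaluating one Gaussian-kernel discriminant function per class and keeping the largest output. The sketch below illustrates (8), (9), and the "winner-takes-all" rule on already-trained models. It is a minimal sketch under assumptions: the function names and the tuple layout `(support_vectors, alphas, labels, b, gamma)` are hypothetical, and no training (QP solution) is performed here.

```python
import math


def gaussian_kernel(xi, x, gamma):
    """Gaussian kernel of (9): exp(-gamma * ||xi - x||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * squared_distance)


def discriminant(x, support_vectors, alphas, labels, b, gamma):
    """Discriminant function of (8), summed over the support vectors only."""
    return sum(alpha * y * gaussian_kernel(sv, x, gamma)
               for sv, alpha, y in zip(support_vectors, alphas, labels)) + b


def winner_takes_all(x, models):
    """Return the class whose binary SVM gives the highest output, plus all scores.

    `models` maps a class label to (support_vectors, alphas, labels, b, gamma).
    """
    scores = {label: discriminant(x, *model) for label, model in models.items()}
    return max(scores, key=scores.get), scores
```

Each entry of `models` plays the role of one "class versus all others" classifier, so a six-class problem would hold six entries.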

TABLE II
OVERALL (OA), AVERAGE (AA), AND CLASS PERCENTAGE ACCURACIES ACHIEVED ON THE TEST BEATS WITH
THE DIFFERENT INVESTIGATED CLASSIFIERS WITH A TOTAL NUMBER OF 500 TRAINING BEATS
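The accuracy measures reported in Table II, together with the McNemar statistic of (13), follow directly from the true and predicted beat labels. The sketch below shows one way to compute them. The function names are hypothetical, and returning zero when the two classifiers never disagree is a convention assumed here, since (13) is undefined in that case.

```python
import math


def accuracies(y_true, y_pred):
    """Return (OA, AA, per-class accuracy), all in percent."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    overall = 100.0 * correct / len(y_true)
    per_class = {}
    for label in sorted(set(y_true)):
        indexes = [k for k, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[k] == label for k in indexes)
        per_class[label] = 100.0 * hits / len(indexes)
    average = sum(per_class.values()) / len(per_class)
    return overall, average, per_class


def mcnemar_z(y_true, pred_i, pred_j):
    """Standardized McNemar statistic Z_ij of (13)."""
    # f_ij: beats classified correctly by classifier i and wrongly by classifier j.
    f_ij = sum(a == t and b != t for t, a, b in zip(y_true, pred_i, pred_j))
    # f_ji: beats classified wrongly by classifier i and correctly by classifier j.
    f_ji = sum(a != t and b == t for t, a, b in zip(y_true, pred_i, pred_j))
    if f_ij + f_ji == 0:
        return 0.0  # assumed convention: no disagreement, no measurable difference
    return (f_ij - f_ji) / math.sqrt(f_ij + f_ji)
```

A value with |Z| above 1.96 would then indicate a statistically significant difference at the 5% level, as stated in Section V-A.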

C. Experiment Settings

In the experiments, we considered the nonlinear SVM based on the popular Gaussian kernel (referred to as SVM-RBF or simply SVM). The related parameters $C$ and $\gamma$ for this kernel were varied in the arbitrarily fixed ranges $[10^{-3}, 200]$ and $[10^{-3}, 2]$ so as to cover high and small regularization of the classification model, and fat as well as thin kernels, respectively. In addition, for comparison purpose, we implemented, in the first experiment, the SVM classifier with two other kernels, which are the linear and the polynomial kernels, leading thus to two other SVM classifiers termed as SVM-linear and SVM-poly, respectively. The degree $d$ of the polynomial kernel was varied in the range [2,5] in order to span polynomials with low and high flexibility. The $K$ value and the number of hidden nodes ($h$) of the kNN and the RBF classifiers were tuned in the arbitrarily fixed intervals [1,15] and [10,60], respectively. The other RBF parameters, which include the center and the width of each RBF (kernel), were computed by applying the K-means clustering algorithm separately to each class [26]. Concerning the PSO algorithm, we considered the following standard parameters: swarm size $S = 40$, inertia weight $w = 0.4$, acceleration constants $c_1$ and $c_2$ equal to the unity, and maximum number of iterations fixed at 40.

VI. EXPERIMENTAL RESULTS

A. Experiment 1: Classification in the Whole Original Hyperdimensional Feature Space

As mentioned earlier, in this experiment, we applied the SVM classifier directly on the entire original hyperdimensional feature space, which is made up of 303 features. During the training phase, the SVM parameters were selected according to a m-fold cross-validation (CV) procedure [27], first by randomly splitting the 500 training beats into $m$ mutually exclusive subsets (folds) of equal size, and then, by training $m$ times an SVM classifier modeled with predefined values: $C$ for the linear kernel [...] polynomial kernel. Each time we left one of the subsets out of the training, and only used it to obtain an estimate of the classification accuracy. From $m$ times of training and accuracy computation, the AA yielded a prediction of the classification accuracy of the considered SVM classifier. We chose the best SVM classifier parameter values to maximize this prediction. In all experiments reported in this paper, we adopted a fivefold CV. The same procedure was adopted to find the best parameters for the kNN and RBF classifiers. We recall that this empirical parameter estimation procedure and all the classification experiments were repeated three times, each with one of the three different training sets generated randomly.

As reported in Table II, the OA and AA accuracies achieved with the SVM classifier based on the Gaussian kernel (SVM-RBF) on the test set were equal to 87.76% and 87.48%, respectively. These results were better than those achieved by the SVM-linear, the SVM-poly, the RBF, and the kNN classifiers. Indeed, the OA (and AA) accuracies were equal to 80.55% (78.90%) for the SVM-linear classifier, 85.25% (85.75%) for the SVM-poly classifier, 82.74% (82.07%) for the RBF classifier, and 81.36% (80.70%) for the kNN classifier. Note that depending on the classifier, the most difficult classes to discriminate were the paced beat (/), the ventricular premature beat (V), and the atrial premature beat (A) classes, which are also the most overlapped ones according to Fig. 1.

This experiment appears to confirm what was observed in other application fields, i.e., the superiority of SVM based on the Gaussian kernel as compared to traditional classifiers when dealing with feature spaces of very high dimensionality. In addition, it provides reference classification accuracies in order to quantify the capability of the proposed PSO–SVM classification system to further improve these interesting results.

B. Experiment 2: Classification Based on Feature Reduction

In this experiment, we trained the SVM classifier based on the Gaussian kernel, which proved in the previous experiments to
nel, (C and γ) for the Gaussian kernel, and (C and d) for the be the most appropriate kernel for ECG signal classification, in
MELGANI AND BAZI: CLASSIFICATION OF ECG SIGNALS WITH SVMs AND PSO 673

feature subspaces of various dimensionalities. The desired num-


ber of features varied from 10 to 50 with a step of 10, namely,
from small to high-dimensional feature subspaces. Feature re-
duction was achieved by the traditional PCA algorithm, com-
monly used in ECG signal classification. It is based on the idea
to select the first component (i.e., the direction of maximum
variance), then the second component (direction of second max-
imum variance), and so on, up to the desired number of compo-
nents, which will compose the considered feature subspace.
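
The component-by-component selection just described can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain power iteration with deflation on the sample covariance matrix, uses only the Python standard library, and the function names (`pca`, `top_component`) and the toy data are hypothetical.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500, seed=0):
    """Power iteration: direction of maximum variance and that variance."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance


def pca(rows, k):
    """Return the first k components, their variances, and the projected data."""
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    comps, variances = [], []
    for _ in range(k):
        v, lam = top_component(cov)
        comps.append(v)
        variances.append(lam)
        # Deflation: remove the variance already explained by v, so the next
        # power iteration finds the direction of next-largest variance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in comps]
                 for r in centered]
    return comps, variances, projected


if __name__ == "__main__":
    # Hypothetical 2-D toy data lying close to the line y = 2x.
    toy = [[0, 0.1], [1, 1.9], [2, 4.2], [3, 5.8], [4, 8.1], [5, 9.9]]
    components, variances, _ = pca(toy, 2)
    print(components, variances)
```

In the setting of this experiment, `k` would range over 10, 20, ..., 50 and `rows` would hold the 303-feature beat vectors; a production implementation would more likely rely on a numerical library's eigendecomposition or SVD.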
Fig. 2(a) depicts the results obtained in terms of OA by the
three considered classifiers combined with the PCA algorithm,
namely, the PCA–SVM, the PCA–RBF, and the PCA–kNN clas-
sifiers. In particular, it can be seen that for all feature subspace
dimensionalities except the lowest (i.e., 10 features), the PCA–
SVM classifier maintains a clear superiority over the other two.
Its best accuracy was found using a feature subspace made up of
the first 30 components. The corresponding OA and AA accu-
racies were 87.57% and 88.92%, respectively. Comparing these
results with those achieved with the SVM classifier based on
the Gaussian kernel in the original feature space (i.e., without
feature reduction), a slight decrease of 0.19% in terms of OA
and an increase of 1.44% in terms of AA were obtained (see
Table II). As regards the PCA–kNN and the PCA–RBF classi-
fiers, the best empirical numbers of features were found to be
20 and 30, respectively. The corresponding OA and AA accu-
racies were 84% and 82.83% for the PCA–kNN classifier, and
83.54% and 83.01% for the PCA–RBF, respectively. Note from
Table II that the PCA–kNN classifier behaves much better with
20 features than in the original hyperdimensional feature space.
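
The comparison above, where OA drops slightly while AA rises, is easier to read with the two measures side by side. The sketch below assumes the usual definitions (OA as the percentage of correctly classified beats over all beats, AA as the mean of the per-class accuracies); the function name and the confusion-matrix counts are hypothetical and are not taken from the paper's tables.

```python
def overall_and_average_accuracy(confusion):
    """confusion[i][j] = number of beats of true class i assigned to class j.

    Returns (OA, AA, per-class accuracies), all in percent.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = 100.0 * correct / total
    class_acc = [100.0 * confusion[i][i] / sum(confusion[i])
                 for i in range(n_classes)]
    aa = sum(class_acc) / n_classes
    return oa, aa, class_acc


if __name__ == "__main__":
    # Hypothetical two-class counts: a large class (900 beats) and a small
    # one (100 beats). The second classifier trades a few errors on the
    # large class for a better score on the small class.
    first = [[880, 20], [30, 70]]
    second = [[870, 30], [22, 78]]
    print(overall_and_average_accuracy(first))
    print(overall_and_average_accuracy(second))
```

Because AA weights every class equally, a gain on a small class can raise AA even when the total number of correctly classified beats, and hence OA, goes down.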
From this experiment, we can make three observations: 1) the
SVM classifier shows a relatively low sensitivity to the curse
of dimensionality as compared to the kNN and the RBF clas-
sifiers [see Table III(a)]; 2) the SVM classifier still preserves its
superiority when integrated in a feature reduction-based classi-
fication scheme; and 3) though the SVM performs well in the
whole original feature space, its accuracy can still be improved
provided that a subspace of higher generalization capability can
be found.
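
All classifiers compared in Experiments 1 and 2 had their free parameters chosen by the fivefold CV procedure of Experiment 1. As a hedged illustration of that procedure, the sketch below selects the K of a kNN classifier in the interval [1,15] used in the paper. The kNN code, the fold construction, and the toy data are assumptions for illustration only, and the mean held-out accuracy over the folds is used here as the accuracy prediction.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, k, m=5, seed=0):
    """m-fold CV: each fold is held out once and used only for scoring."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::m] for f in range(m)]  # m mutually exclusive subsets
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in idx if i not in held_out]
        train_y = [ys[i] for i in idx if i not in held_out]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                   for i in fold)
        accuracies.append(hits / len(fold))
    return sum(accuracies) / m


def select_k(xs, ys, candidates=range(1, 16), m=5):
    """Return the candidate K with the highest CV accuracy, plus all scores."""
    scores = {k: cross_validated_accuracy(xs, ys, k, m) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated 2-D clusters.
    rng = random.Random(1)
    xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    xs += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)]
    ys = [0] * 50 + [1] * 50
    print(select_k(xs, ys))
```

For the SVM classifiers the same loop would run over a grid of (C, γ) or (C, d) values instead of K.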

Fig. 2. Overall percentage accuracy (OA) versus number of selected features (first principal components) achieved on the test beats with the PCA–SVM, the PCA–kNN, and the PCA–RBF classifiers with a training set made up of (a) 500, (b) 250, and (c) 750 beats. The horizontal line refers to the OA achieved with the proposed PSO–SVM classification approach.

C. Experiment 3: Classification With PSO–SVM

As described in Section IV, the proposed PSO–SVM classification system aims at enhancing the SVM classification process from two different viewpoints: 1) by automatically detecting a feature subspace of higher generalization capability in order to deal in a more effective way with the curse of dimensionality, instead of reducing the dimension of the original feature space based on PCA or simply on feature sampling as done in the literature [1], [3], [5], [10] and 2) by passing from an empirical tuning of the value of the two SVM parameters to their automatic optimization. This experiment is aimed at assessing the effectiveness of this methodological enhancement. For this purpose, we applied the PSO–SVM classifier to the available training beats. Note that each particle of the swarm was defined by position and velocity vectors of a dimension of 305. At convergence of the optimization process, we assessed the PSO–SVM classifier accuracy on the test samples. The achieved overall and average accuracies were 90.52% and 92.34%, corresponding to substantial accuracy gains as compared to what was yielded either by the SVM classifier (with the Gaussian kernel) applied to all available features (+2.76% and +4.86%, respectively) or by the PCA–SVM classifier (+2.95% and +3.42%, respectively) [see Table II and Fig. 2(a)]. The superiority of the PSO–SVM
is statistically significant, as shown by the McNemar's statistical test values reported in Table III(a), which, in absolute value, are all above the 1.96 threshold. Its worst class accuracy was obtained for the normal beat (N) class (89.12%), while that of the SVM and the PCA–SVM classifiers was obtained for the ventricular premature beat (V) class (81.48% and 83.62%, respectively). This shows the capability of the PSO–SVM classifier to reduce the gap between the worst and the best class accuracies (6.19% versus 9.69% and 14.5% for the PCA–SVM and the SVM classifiers, respectively) while keeping OA at a high level. Table IV shows the number of features detected automatically to discriminate each class from the others. The average number of features required by the PSO–SVM classifier is 46, while the minimum and maximum numbers of features were obtained for the ventricular premature (V) and normal (N) classes with 35 and 63 features, respectively.

TABLE III
STATISTICAL SIGNIFICANCE OF DIFFERENCES IN CLASSIFICATION ACCURACY BETWEEN THE NINE INVESTIGATED CLASSIFIERS EXPRESSED BY MEANS OF THE MCNEMAR'S TEST WITH A TOTAL OF (a) 500, (b) 250, AND (c) 750 TRAINING BEATS

TABLE IV
NUMBER OF FEATURES DETECTED FOR EACH CLASS WITH THE PSO–SVM CLASSIFICATION SYSTEM TRAINED ON 500 BEATS

D. Experiment 4: Sensitivity to the Number of Training Beats

In this experiment, we repeated the previous three experiments while decreasing and increasing the training set size by 50%. In particular, we considered two experimental scenarios characterized by a total number of 250 and 750 training beats, respectively. Table V(a) and (b) shows the results achieved with all nine investigated classifiers (SVM-linear, SVM-poly, SVM–RBF, kNN, RBF, PCA–SVM, PCA–kNN, PCA–RBF, and PSO–SVM) for these two scenarios, i.e., for 250 and 750 training beats, respectively. Similarly, Fig. 2(b) and (c) shows the trend of the OA provided by the PCA–SVM, the PCA–kNN, and the PCA–RBF classifiers on varying the number of features from 10 to 50.

In general, as could be expected, reducing the number of training beats involved a more or less significant decrease in accuracy depending on the classifier. In terms of OA, the
decrease in accuracy was 3.55%, 3.93%, 4.54%, 4.63%, 5.13%, 5.15%, 5.76%, 5.82%, and 6.16% for the PSO–SVM, the PCA–kNN, the RBF, the kNN, the PCA–RBF, the SVM–RBF, the PCA–SVM, the SVM-poly, and the SVM-linear classifiers, respectively. The PSO–SVM classifier thus shows the greatest robustness to a decrease in training beats. Though it exploited only 250 training beats, it yielded the best OA and AA values, which are comparable to those of the SVM classifier trained with double the number of training beats (500 beats) [see Tables II and V(a)]. Its worst class accuracy was 86.47% versus 80.23%, 68.14%, 61.67%, 57.14%, 50.22%, and 49.41% for the PCA–SVM, the SVM(-RBF), the PCA–kNN, the kNN, the RBF, and the PCA–RBF classifiers, respectively.

When increasing the number of training beats from 500 to 750, the classification accuracies increase and the differences between the classifiers appear less pronounced. In particular, the classifier that benefited most from the additional training beats was the PCA–kNN classifier with a gain of +3.04% and +4.10% in terms of OA and AA, respectively. Still, in this classification scenario, the PSO–SVM classifier maintained a clear superiority in terms of both OA and AA.

For both scenarios, the PSO–SVM showed statistically significant differences of accuracy with respect to the other classifiers according to the McNemar's test [see Table III(b) and (c)].

TABLE V
OVERALL (OA), AVERAGE (AA), AND CLASS PERCENTAGE ACCURACIES ACHIEVED ON THE TEST BEATS WITH THE DIFFERENT EXPLORED CLASSIFIERS WITH A TOTAL NUMBER OF (a) 250 AND (b) 750 TRAINING BEATS

TABLE VI
OVERALL (OA) AND AVERAGE (AA) ACCURACIES ACHIEVED ON THE TEST BEATS BY THE PSO–SVM CLASSIFICATION SYSTEM FOR DIFFERENT VALUES OF (a) INERTIA WEIGHT w VARIED IN THE RANGE [0,1] (c1 AND c2 WERE SET TO 1) AND (b) ACCELERATION CONSTANTS c1 AND c2 TUNED IN THE RANGE [1,2] [w WAS FIXED AT 0.6, WHICH IS THE BEST VALUE FOUND IN (a)]

E. Experiment 5: Sensitivity to the Inertia Weight and Acceleration Parameters

As mentioned previously, in this experiment, we wanted to analyze the sensitivity of the PSO–SVM classification system with respect to the inertia weight w and the two acceleration constants c1 and c2, which control the behavior, and thus, the goodness of the PSO search process. In the first step, we fixed c1 and c2 to 1 and we varied w in the range [0,1] (according to [19]). From Table VI(a), the best and the worst classification accuracies were obtained for w = 0.6 and w = 0.8, respectively. The corresponding OA (and AA) accuracies achieved on the test set were 90.88% (92.70%) and 88.87% (91.90%), respectively.
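
To make the role of w, c1, and c2 concrete, the following is a minimal sketch of a PSO loop with the standard inertia-weight velocity update, using the swarm size and iteration count quoted in the experiment settings as defaults. It is an illustration under stated assumptions, not the authors' system: the objective is a hypothetical toy function (`sphere`) standing in for the SVM-based fitness, particles are plain real vectors, and positions are clipped to an assumed search box.

```python
import random


def sphere(x):
    """Hypothetical toy objective to minimize; stands in for the SVM fitness."""
    return sum(v * v for v in x)


def pso_minimize(objective, dim, swarm_size=40, iterations=40,
                 w=0.4, c1=1.0, c2=1.0, bounds=(-5.0, 5.0), seed=0):
    """Standard inertia-weight PSO; returns (best position, best value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward the particle's own best
                # + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


if __name__ == "__main__":
    # Sweep the inertia weight with c1 = c2 = 1, as in the first step above.
    for w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        _, value, _ = pso_minimize(sphere, dim=5, w=w)
        print(w, value)
```

On a toy objective like this one the ranking of w values need not match Table VI(a); the sketch only shows where each parameter enters the update.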
In the second step, we fixed w = 0.6 (corresponding to the best obtained accuracy) and we varied c1 and c2 in the range [1,2] (according to [19]). In this case, the OA and AA accuracies were less affected by the variation of these parameters. Indeed, they fluctuated from 90.18% (91.65%) for c1 = c2 = 1.2 to 90.88% (92.70%) for c1 = c2 = 1.

As this empirical analysis shows, the PSO optimizer appears more sensitive to the inertia weight parameter than to the two other parameters. However, even when nonstandard parameter values are adopted, the achieved accuracies still remain above those yielded by the reference classifiers.

VII. CONCLUSION

From the obtained experimental results, we can strongly recommend the use of the SVM approach for classifying ECG signals on account of its superior generalization capability as compared to traditional classification techniques. This capability generally provides SVM classifiers with higher classification accuracies and a lower sensitivity to the curse of dimensionality.

The main novelty of this paper is in the proposed PSO-based approach, which aims at optimizing the performances of SVM classifiers in terms of classification accuracy by detecting the best subset of available features and solving the tricky model selection issue. The fact that it is entirely automatic makes it particularly useful and attractive. The results confirm that the PSO–SVM classification system substantially boosts the generalization capability achievable with the SVM classifier, and its robustness against the problem of limited training beat availability, which may characterize pathologies of rare occurrence. Another advantage of the PSO–SVM approach can be found in its high sparseness, which is explained by the fact that the adopted optimization criterion is based on minimizing the number of SVs. This criterion favors the definition of compact discriminant functions, which are thus easy to implement on a hardware platform. For this purpose, the PSO–SVM classifier should first be run on a PC for determining the best features for each class and the discrimination model (SVs and related weights) of the corresponding SVM.

Finally, it is noteworthy that, thanks to its general nature, the proposed PSO–SVM system is applicable not only to morphology and temporal features, but also to other types of features such as those based on wavelets and high-order statistics. Furthermore, other optimization criteria could be considered as well, individually or jointly depending on the application requirements.

ACKNOWLEDGMENT

The authors would like to thank Dr. C.-C. Chang and Dr. C.-J. Lin for supplying the software LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm) used in this research.

REFERENCES

[1] S. Osowski and T. H. Linh, “ECG beat recognition using fuzzy hybrid neural network,” IEEE Trans. Biomed. Eng., vol. 48, no. 11, pp. 1265–1271, Nov. 2001.
[2] T. H. Linh, S. Osowski, and M. L. Stodoloski, “On-line heart beat recognition using Hermite polynomials and neuron-fuzzy network,” IEEE Trans. Instrum. Meas., vol. 52, no. 4, pp. 1224–1231, Aug. 2003.
[3] S. Osowski, T. H. Linh, and T. Markiewicz, “Support vector machine-based expert system for reliable heart beat recognition,” IEEE Trans. Biomed. Eng., vol. 51, no. 4, pp. 582–589, Apr. 2004.
[4] L. Y. Shyu, Y. H. Wu, and W. Hu, “Using wavelet transform and fuzzy neural network for VPC detection from the Holter ECG,” IEEE Trans. Biomed. Eng., vol. 51, no. 7, pp. 1269–1273, Jul. 2004.
[5] F. de Chazal, M. O’Dwyer, and R. B. Reilly, “Automatic classification of ECG heartbeats using ECG morphology and heartbeat interval features,” IEEE Trans. Biomed. Eng., vol. 51, no. 7, pp. 1196–1206, Jul. 2004.
[6] L. Khadra, A. S. Al-Fahoum, and S. Binajjaj, “A quantitative analysis approach for cardiac arrhythmia classification using higher order spectral techniques,” IEEE Trans. Biomed. Eng., vol. 52, no. 11, pp. 1840–1845, Nov. 2005.
[7] R. V. Andreao, B. Dorizzi, and J. Boudy, “ECG signal analysis through hidden Markov models,” IEEE Trans. Biomed. Eng., vol. 53, no. 8, pp. 1541–1549, Aug. 2006.
[8] S. Mitra, M. Mitra, and B. B. Chaudhuri, “A rough set-based inference engine for ECG classification,” IEEE Trans. Instrum. Meas., vol. 55, no. 6, pp. 2198–2206, Dec. 2006.
[9] F. de Chazal and R. B. Reilly, “A patient adapting heart beat classifier using ECG morphology and heartbeat interval features,” IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp. 2535–2543, Dec. 2006.
[10] T. Inan, L. Giovangrandi, and J. T. A. Kovacs, “Robust neural network based classification of premature ventricular contractions using wavelet transform and timing interval features,” IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp. 2507–2515, Dec. 2006.
[11] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[12] M. Pontil and A. Verri, “Support vector machines for 3D object recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 6, pp. 637–646, Jun. 1998.
[13] Y. Y. El-Naqa, M. N. Wernick, N. P. Galatsanos, and R. M. Nishikawa, “A support vector machine approach for detection of microcalcifications,” IEEE Trans. Med. Imag., vol. 21, no. 12, pp. 1552–1563, Dec. 2002.
[14] J. Robinson and V. Kecman, “Combining support vector machine learning with the discrete cosine transform in image compression,” IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 950–958, Jul. 2003.
[15] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machine,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[16] Y. Bazi and F. Melgani, “Toward an optimal SVM classification system for hyperspectral remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3374–3385, Nov. 2006.
[17] R. Mark and G. Moody, MIT-BIH Arrhythmia Database, 1997. [Online]. Available: http://ecg.mit.edu/dbinfo.html
[18] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.
[19] J. Kennedy and R. C. Eberhart, Swarm Intelligence. San Mateo, CA: Morgan Kaufmann, 2001.
[20] Z. L. Gaing, “A particle swarm optimization approach for optimum design of PID controller in AVR system,” IEEE Trans. Energy Convers., vol. 19, no. 2, pp. 384–391, Jun. 2004.
[21] M. Donelli, R. Azzaro, F. G. B. De Natale, and A. Massa, “An innovative computational approach based on a particle swarm strategy for adaptive phased-arrays control,” IEEE Trans. Antennas Propag., vol. 54, no. 3, pp. 888–898, Mar. 2006.
[22] W. H. Slade, H. W. Ressom, M. T. Musavi, and R. L. Miller, “Inversion of ocean color observations using particle swarm optimization,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 9, pp. 1915–1923, Sep. 2004.
[23] J. J. Wei, C. J. Chang, N. K. Shou, and G. J. Jan, “ECG data compression using truncated singular value decomposition,” IEEE Trans. Biomed. Eng., vol. 5, no. 4, pp. 290–299, Dec. 2001.
[24] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[25] A. Agresti, Categorical Data Analysis, 2nd ed. New York: Wiley, 2002.
[26] L. Bruzzone and D. Prieto, “A technique for the selection of kernel function parameters in RBF neural networks for classification of remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 2, pp. 1179–1184, Mar. 1999.
[27] M. Stone, “Cross-validatory choice and assessment of statistical predictions,” J. R. Statist. Soc. B, vol. 36, pp. 111–147, 1974.

Farid Melgani (M’04–SM’06) received the State Engineer degree in electronics from the University of Batna, Batna, Algeria, in 1994, the M.Sc. degree in electrical engineering from the University of Baghdad, Baghdad, Iraq, in 1999, and the Ph.D. degree in electronic and computer engineering from the University of Genoa, Genoa, Italy, in 2003.
From 1999 to 2002, he was with the Signal Processing and Telecommunications Group, Department of Biophysical and Electronic Engineering, University of Genoa. Since 2002, he has been with the University of Trento, Trento, Italy, where he is an Assistant Professor of Telecommunications, and currently, the Head of the Intelligent Information Processing (I2P) Laboratory, Department of Information Engineering and Computer Science. His current research interests include processing, pattern recognition, and machine learning techniques applied to remote sensing and biomedical signals/images (classification, regression, multitemporal analysis, and data fusion). He is the author or coauthor of more than 80 scientific papers and is a referee for several international journals.
Dr. Melgani was on the scientific committees of several international conferences and is an Associate Editor of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS.

Yakoub Bazi (S’05–M’07) received the State Engineer and M.Sc. degrees in electronics from the University of Batna, Batna, Algeria, in 1994 and 2000, respectively, and the Ph.D. degree in information and communication technology from the University of Trento, Trento, Italy, in 2005.
From 2000 to 2002, he was a Lecturer at the University of M’sila, M’sila, Algeria. From January 2006 to June 2006, he was a Postdoctoral Researcher at the University of Trento. He is currently an Assistant Professor in the College of Engineering, Al Jouf University, Al Jouf, Saudi Arabia. His current research interests include pattern recognition and evolutionary computation methodologies applied to remote sensing images and biomedical signal/images (change detection, classification, and semisupervised learning).
Dr. Bazi is a Referee for several international journals.
