
Cardiac Abnormalities Detection from Compressed ECG in Wireless Telemonitoring using Principal Components Analysis (PCA)

Ayman Ibaida #1, Ibrahim Khalil #2 and Fahim Sufi #3


Distributed Systems & Networking School of Computer Science & IT RMIT University, Melbourne, Australia
1 ayman.ibaida@student.rmit.edu.au, 2 ibrahimk@cs.rmit.edu.au, 3 fahim.sufi@student.rmit.edu.au

Abstract: In wireless telecardiology applications, the ECG signal is compressed before transmission to deliver data faster and to reduce bandwidth consumption. However, most ECG analysis and diagnosis algorithms process the original ECG signal. The compressed ECG data must therefore be decompressed before existing algorithms and tools can be applied to detect cardiovascular abnormalities. Decompression causes delay on the doctor's mobile device and in the wireless nodes responsible for detecting and prioritizing abnormal data for faster processing. This is undesirable in body sensor networks (BSNs), because the heavy processing involved in decompression wastes valuable energy in resource- and power-constrained sensor nodes. To diagnose cardiac abnormalities such as ventricular tachycardia, this paper applies a novel system that analyses and classifies the compressed ECG signal directly. It uses PCA for feature extraction and k-means to cluster normal and abnormal ECG signals.

I. INTRODUCTION

Cardiac diseases are the number one killer in the modern world, and many people die of sudden heart attacks. A large number of people also die because their cardiac diseases are diagnosed late or incorrectly. Cardiac specialists use the electrocardiogram (ECG) signal intensively to diagnose cardiovascular diseases [1]. Researchers have proposed various methods for classifying cardiac anomalies, including digital signal processing, filtering methods, data mining tools and neural networks [1]. ECG can also be used for continuous patient monitoring and in biometric authentication techniques [2], [3], [4].

A typical ECG signal, as shown in Fig. 1, contains characteristic waves such as the P and T waves and the QRS complex. To diagnose various types of abnormal cardiac symptoms, cardiologists examine each of these waves and complexes. They also examine features such as the RR interval, PR interval, PR segment, ST interval and ST segment [5]. However, extracting these features accurately with the many available signal processing techniques [6], [7] can be complex and difficult. Recently, a variety of researchers have proposed wavelet-based QRS detectors [8]. These methods include a post-processing phase that removes redundant R waves or noise peaks. Other researchers used template matching with neural networks to classify the ECG signal [9]. All of these techniques operate on the original ECG signal, and its very large number of samples makes processing both time- and resource-consuming.

[Fig. 1 plot: ECG amplitude against time (samples), with the P, R, S and T waves labelled]
Fig. 1. ECG Waves

Fig. 2 shows a typical wireless telemonitoring scenario. A patient wears wireless sensors that read ECG samples, may compress and diagnose them, and then send them wirelessly, with the diagnosis result, to a central server and to e-doctors (for example, doctors roaming with mobile devices). These recipients can then take quick action according to the data's priority [10]. However, wireless nodes such as sensor nodes in body sensor networks (BSNs) or a roaming doctor's smartphone are power- and resource-constrained. As a result, neither the existing techniques [11], [12], [13], [14] nor the diagnosis techniques mentioned above are suitable for body sensor networks or other resource-constrained wireless environments.

In wireless telemonitoring scenarios, digitized ECG data must be transferred as fast as possible over mobile technologies such as MMS, GPRS, HSDPA or ZigBee. However, these technologies cannot provide high-speed communication [15], so the data must first be compressed to make transmission energy efficient. This paper therefore proposes a novel technique that analyses abnormalities in compressed ECG data without decompressing it.

The abnormal cardiac condition considered in this paper is ventricular tachycardia. It is a life-threatening condition in which a rapid rhythm originates in the lower chambers of the heart. The rapid rate prevents the heart from filling adequately with blood, so less blood is pumped through the body. To detect this condition, we first applied the lossless compression method described in [15] before transmission. We then analysed the compressed ECG signal directly and used Principal Component Analysis (PCA) to extract its important features from the compressed data. The k-means algorithm [16], [17] then classifies the extracted features as normal or abnormal. This research answers two questions:

How can normal and abnormal ECG data be classified and detected directly from the compressed ECG signal, without decompression?
How can an attribute selection technique based on PCA be implemented and applied to extract features from compressed ECGs?

The rest of the paper is organized as follows. Section II briefly discusses our previously proposed compression algorithm, which is used in this paper. Section III presents the basic system and analyses feature subset selection from compressed ECGs using PCA. Section IV presents the results of PCA and of the simple k-means algorithm used to cluster the data into abnormal and normal segments. Finally, Section V concludes the paper.

[Fig. 2 diagram labels: Roaming Patients; ECG Sensor; Wireless Sensor Patch (remote aged care facility or battlefield); Bluetooth Link; zigbee 802.15; Gateway Node; Base Station; Bandwidth Constrained Wireless Link; Hospital; Roaming Doctors; Power Constrained ECG Sensor Nodes diagnose from compressed ECGs; Doctors Diagnosing on Resource Constrained Mobile Nodes; Diagnosis from Compressed ECGs]
Fig. 2. A typical wireless telemonitoring scenario. Compression saves energy on power-hungry Bluetooth devices, resource-constrained wireless sensor nodes and doctors' smartphones. It also lets data be transmitted faster over bandwidth-constrained wireless links. Diseases can be diagnosed on mobile nodes directly from the compressed data.

978-1-4244-3518-0/09/$25.00 ©2009 IEEE
ISSNIP 2009

II. BACKGROUND: THE COMPRESSION ALGORITHM

The ECG encoding algorithm is a symbol-substitution technique preceded by several mathematical transformations.
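The first transformations of the encoder (the normalization, differencing and sign encoding stages described in this section) can be illustrated with a minimal Python sketch. The sketch rests on three assumptions that are not taken from [15]. First, normalization is taken to mean scaling each sample by a fixed power of ten and rounding to an integer; the `decimals` parameter is hypothetical. Second, differencing keeps the first sample and stores successive differences. Third, sign encoding simply separates signs from magnitudes. The value encoding, permutation and ASCII substitution stages are not reproduced. The sketch shows why such transformations can be lossless: as long as the samples carry at most `decimals` decimal places, every step can be inverted exactly.

```python
from itertools import accumulate

def normalize(signal, decimals=3):
    """Scale real-valued samples to integers.

    Assumption (hypothetical parameter): samples carry at most `decimals`
    decimal places, so rounding loses no information.
    """
    scale = 10 ** decimals
    return [round(x * scale) for x in signal]

def difference(values):
    """Keep the first sample, then store successive differences.

    For a slowly varying signal the differences are much smaller in
    magnitude than the samples themselves (lower amplitude).
    """
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def undifference(diffs):
    """Invert difference() with a running sum."""
    return list(accumulate(diffs))

def split_signs(diffs):
    """Separate signs (True = negative) from unsigned magnitudes."""
    return [d < 0 for d in diffs], [abs(d) for d in diffs]

def join_signs(signs, magnitudes):
    """Invert split_signs()."""
    return [-m if negative else m for negative, m in zip(signs, magnitudes)]

# Toy ECG fragment in millivolts (illustrative values, not from the paper).
ecg = [-0.145, -0.140, -0.120, -0.100, -0.105, -0.110]
ints = normalize(ecg)           # [-145, -140, -120, -100, -105, -110]
diffs = difference(ints)        # [-145, 5, 20, 20, -5, -5]
signs, mags = split_signs(diffs)
assert undifference(join_signs(signs, mags)) == ints  # lossless round trip
```

Apart from the first sample, the differenced magnitudes (at most 20 here) are far smaller than the raw integer samples (up to 145). The later encoding stages exploit this reduced amplitude.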

According to our previous experiments [15], our encoding algorithm achieved up to 95% compression without any loss of information (lossless compression) when applied jointly with existing compression and encryption algorithms. This corresponds to a compression ratio of 20, since the compressed ECG occupies 5% of the original size. To our knowledge, this was the highest compression ratio reported in the literature for the publicly available ECGs of the MIT-BIH Arrhythmia Database. Fig. 3 shows the character set used for substitution. Applying the algorithm with this character set generates compressed ECGs such as those shown in Fig. 5.

Apart from providing the highest compression ratio, the algorithm also preserves the features needed for cardiovascular diagnosis directly from the compressed ECG. According to the literature [18], and to the best of our knowledge, multiple diseases were diagnosed directly from compressed MIT-BIH ECGs there for the first time. Diagnosis from compressed ECG brings substantial benefits. A compressed ECG contains fewer characters, so diagnosis from it (using the techniques shown in [18]) requires fewer read (I/O) operations. Most importantly, in telecardiology applications where ECG is transmitted and stored in compressed format, cardiovascular diagnosis is possible without decompression, saving processing power, resources and time. Minimizing diagnostic delays can save patients' lives. The selected algorithm also encrypts the data, which ensures secure transmission.

The compression algorithm consists of the following stages:

1) Normalization stage: rescales the ECG signal and converts the samples to the smallest integer values.
2) Differencing stage: lowers the amplitude of the signal.
3) Value encoding: encodes the unsigned normalized differences.
4) Sign encoding: encodes the signs of the values.
5) Decimal value permutation stage: acts as a mapping function.
6) Substitution stage: substitutes ASCII characters for the ASCII character codes.

Fig. 3. Character Set for the compressed ECG signal

As Fig. 5 shows, even careful observation reveals no significant visible difference between normal and abnormal compressed signals. Special data mining algorithms are needed to detect abnormalities in the compressed ECG data.

III. THE METHODOLOGY

A. Analysis of Compressed ECG Signal

The compressed ECG data contain the character set shown in Fig. 3. Compression is performed in a wireless body-sensor node carried by the patient. Before compressed ECGs are transmitted over the wireless network, the node's data mining module must be trained with normal samples of compressed ECGs. As shown in Fig. 7, a character frequency calculation is performed for each compressed ECG segment, giving a frequency count for each character. The character set has 148 characters, and each character is treated as an attribute, so every segment is described by 148 attributes. This is too many attributes for clustering segments into normal and abnormal. We therefore applied an attribute selection technique, Principal Component Analysis (PCA), for dimensionality reduction.
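The character frequency step of Fig. 7 can be sketched in a few lines of Python. The 148-character set of Fig. 3 is not reproduced here, so the sketch takes the alphabet as a parameter. The four-letter alphabet in the example is purely hypothetical. Each compressed segment becomes one row of counts in a fixed alphabet order. With 12 segments and a 148-character alphabet, this yields the 12 × 148 attribute matrix that is passed on to PCA.

```python
from collections import Counter

def frequency_vector(segment, alphabet):
    """Count each alphabet character in one compressed ECG segment, in alphabet order."""
    counts = Counter(segment)
    unknown = set(counts) - set(alphabet)
    if unknown:
        # A character outside the substitution alphabet means a corrupted segment.
        raise ValueError(f"characters outside the alphabet: {sorted(unknown)}")
    return [counts[ch] for ch in alphabet]

def frequency_matrix(segments, alphabet):
    """One row per segment, one column (attribute) per character."""
    return [frequency_vector(seg, alphabet) for seg in segments]

# Hypothetical 4-character alphabet standing in for the 148-character set of Fig. 3.
alphabet = "abcd"
rows = frequency_matrix(["abca", "ddcb"], alphabet)
print(rows)  # [[2, 1, 1, 0], [0, 1, 1, 2]]
```

The column order is fixed by the alphabet, not by the order in which characters occur. This keeps the attributes aligned across segments, which PCA requires.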
[Fig. 7 block diagram: ECG Compression → Frequency Count → Attribute Selection (PCA) → Clustering of Normal/Abnormal (k-mean) → Disease Detection]
Fig. 7. Block diagram of the proposed ECG detection system

B. Attribute Subset Selection

Preprocessing the data with an attribute selection algorithm is critical in data mining, because training with a large number of attributes is difficult and inaccurate. Many attributes also make the system more complicated and increase the processing time. This paper adopts PCA, which is appropriate when a set of samples has a large number of variables (attributes). PCA generates a new, small set of artificial variables called principal components, which can be selected and fed to the clustering system. As an example, we first prepared the data set for patient CU01. For this experiment we took 12 samples, 6 normal and 6 abnormal. We then compressed each sample and calculated the character frequencies to derive the final data set shown in Fig. 6. To apply PCA to this data set, we first compute the covariance matrix of the data. Next, we derive the eigenvectors and eigenvalues of the covariance matrix. We then arrange the eigenvectors into a new matrix, ordered from the eigenvector with the highest eigenvalue downwards. This matrix is therefore m × m, where m is the number of variables (here m = 148). After this, we calculate the score matrix, an n × m matrix where n is the number of samples (here n = 12). Equation 1 shows

[Fig. 4 plots: ECG signal amplitude against sample index (0 to 2500) for (a) and (b)]
Fig. 4. (a) Normal ECG sample for patient CU01. (b) Abnormal ECG sample for patient CU01.

Sensor nodes with limited power must transmit a huge amount of ECG data over bandwidth-constrained wireless networks. Sending large amounts of data consumes power and shortens the lifetime of body sensor networks. Compressing ECGs and diagnosing abnormalities directly from the compressed ECGs will therefore play a key role in extending that lifetime. This paper uses the compression algorithm proposed in [15] because it is lossless. We analysed the resulting compressed ECG and used data mining tools to classify it as normal or abnormal. Fig. 4(a) shows a normal ECG sample for patient CU01 from the CU Ventricular Tachyarrhythmia Database, and Fig. 4(b) shows an abnormal sample for the same patient. To classify normal and abnormal cases, we use only the compressed ECGs, as shown in Fig. 5.

Fig. 5. Compressed ECG samples for patient CU01: (a) abnormal ECG of Fig. 4(b) in compressed format; (b) normal ECG of Fig. 4(a) in compressed format.

[Fig. 6 plots: twelve panels showing the character-frequency data of patient CU01]
Fig. 6. Data set for patient CU01: the first six plots show abnormal samples and the second six plots show normal samples.

the general form used to calculate the score for the first principal component:

$$C_1 = b_{11}X_1 + b_{21}X_2 + \cdots + b_{p1}X_p \qquad (1)$$

where $C_1$ is the sample's score on principal component 1, $b_{p1}$ is the regression coefficient (or weight) for observed variable $p$, and $X_p$ is the sample's value of variable $p$. The other principal components (PC2, PC3, PC4 and so on) are calculated in the same way. The next question is how many principal components to keep. A simple calculation shows that the first few components account for a large share of the total variance. Table I lists the eigenvalues and the proportion of the total that each eigenvalue contributes. The first and second eigenvalues together account for approximately 70% of the total (0.56552 + 0.13732 = 0.70284). Each proportion in the table is obtained by dividing the eigenvalue by the sum of all eigenvalues, as in Equation 2:

$$P_i = \frac{e_i}{\sum_{k=1}^{m} e_k} \qquad (2)$$
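The PCA steps above (covariance matrix, eigenvectors ordered by eigenvalue, scores as in Equation 1, proportions as in Equation 2) can be sketched in plain Python. The sketch makes three substitutions, so it should not be read as the authors' implementation:

1) Instead of a full eigendecomposition, it uses power iteration with deflation to obtain only the leading components. A library routine such as `numpy.linalg.eigh` would normally be used.
2) The denominator of Equation 2 is taken as the trace of the covariance matrix, which equals the sum of all its eigenvalues.
3) Scores are computed on mean-centered data. This is the usual convention, but the paper does not state it.

```python
import math

def mean_center(rows):
    """Subtract each variable's mean from every sample (assumed pre-step for the scores)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - mu for x, mu in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix (m x m) for n samples of m variables."""
    centered = mean_center(rows)
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def power_iteration(mat, iters=300):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    m = len(mat)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v (v has unit length).
    eigenvalue = sum(v[i] * mat[i][j] * v[j] for i in range(m) for j in range(m))
    return eigenvalue, v

def pca(rows, n_components=2):
    """Return leading eigenvalues, their Equation 2 proportions and Equation 1 scores."""
    cov = covariance(rows)
    m = len(cov)
    total = sum(cov[i][i] for i in range(m))  # trace = sum of all eigenvalues
    work = [row[:] for row in cov]
    eigenvalues, components = [], []
    for _ in range(n_components):
        lam, v = power_iteration(work)
        eigenvalues.append(lam)
        components.append(v)
        # Deflation: remove the found component so the next one becomes dominant.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    centered = mean_center(rows)
    # Equation 1: score = sum over variables of weight * (centered) value.
    scores = [[sum(b * x for b, x in zip(v, row)) for v in components] for row in centered]
    proportions = [lam / total for lam in eigenvalues]
    return eigenvalues, proportions, scores

# Tiny illustrative data set (not from the paper): variance 8/3 along x, 2/3 along y.
rows = [[2, 0], [-2, 0], [0, 1], [0, -1]]
eigenvalues, proportions, scores = pca(rows, n_components=2)
print([round(p, 6) for p in proportions])  # [0.8, 0.2]
```

Applied to the 12 × 148 character-frequency matrix described above, `pca(rows, n_components=2)` would return the PC1 and PC2 scores that are fed to the clustering stage. The signs of the eigenvectors, and hence of the scores, are arbitrary. For 148 variables this pure-Python loop is slow, which is another reason to prefer a linear algebra library in practice.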


[Fig. 8 plots: three scatter panels (CU01, CU03, CU09) of PC2 against PC1, with normal and abnormal samples marked]

Fig. 8. Class distribution for Principal Component 1 (PC1) and Principal Component 2 (PC2) for patients CU01, CU03 and CU09, respectively.

TABLE I
EIGENVALUES FOR VARIOUS PRINCIPAL COMPONENTS OF PATIENT CU01

| Principal Component | Eigenvalue | Proportion |
|---|---|---|
| Principal Component 1 (PC1) | 82.56614 | 0.56552 |
| Principal Component 2 (PC2) | 20.04823 | 0.13732 |
| Principal Component 3 (PC3) | 7.47957 | 0.05123 |
| Principal Component 4 (PC4) | 6.37829 | 0.04369 |
| Principal Component 5 (PC5) | 5.9081 | 0.04047 |
| Principal Component 6 (PC6) | 5.45423 | 0.03736 |
| Principal Component 7 (PC7) | 5.16522 | 0.03538 |
| Principal Component 8 (PC8) | 3.76667 | 0.0258 |
| Principal Component 9 (PC9) | 3.6941 | 0.0253 |

TABLE II
SCORES FOR PC1 AND PC2 (COMPONENTS 1 & 2) OF CU01

| Sample Number | Scores for PC1 | Scores for PC2 | Class |
|---|---|---|---|
| 1 | -1.18113 | -2.54187 | abnormal |
| 2 | -1.26634 | -2.64893 | abnormal |
| 3 | -0.88636 | -2.97021 | abnormal |
| 4 | -0.12991 | -1.99849 | abnormal |
| 5 | 3.246313 | 1.579908 | abnormal |
| 6 | 2.38746 | 0.798551 | abnormal |
| 7 | -4.92702 | 0.419675 | normal |
| 8 | -5.10988 | -0.03882 | normal |
| 9 | -4.82858 | 0.275417 | normal |
| 10 | -4.99787 | -0.12245 | normal |
| 11 | -5.02603 | 0.221886 | normal |
| 12 | -4.94025 | 0.02097 | normal |

TABLE III
K-MEANS RESULTS FOR CU01

| Sample Number | Distance from class 1 | Distance from class 2 | Class |
|---|---|---|---|
| 1 | 21.50359 | 3.930326 | 2 (abnormal) |
| 2 | 21.44839 | 4.478564 | 2 (abnormal) |
| 3 | 26.29707 | 4.35777 | 2 (abnormal) |
| 4 | 27.9701 | 0.733969 | 2 (abnormal) |
| 5 | 69.63799 | 16.59684 | 2 (abnormal) |
| 6 | 54.60353 | 8.494487 | 2 (abnormal) |
| 7 | 0.086221 | 30.91665 | 1 (normal) |
| 8 | 0.047432 | 31.52044 | 1 (normal) |
| 9 | 0.041762 | 29.41073 | 1 (normal) |
| 10 | 0.064142 | 30.10386 | 1 (normal) |
| 11 | 0.011507 | 31.33386 | 1 (normal) |
| 12 | 0.01275 | 29.84699 | 1 (normal) |

In Equation 2, $P_i$ is the proportion of the $i$-th eigenvalue, $e_i$ is the $i$-th eigenvalue, and $m$ is the number of eigenvalues, which equals the number of variables. Because the first two components account for about 70% of the total, our work keeps only the first two principal components, and their scores are used as the input to the clustering stage. The same procedure was repeated for all other patients of the CU Ventricular Tachyarrhythmia Database to determine their eigenvalues and principal components.
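The choice of two components can be checked against Table I in a few lines of Python. The sketch uses only the values printed in Table I. Rearranging Equation 2 gives the sum of all eigenvalues as $e_i / P_i$. This comes out at about 146 for every row, which confirms that the proportions are taken over all eigenvalues and not just the nine listed. The sketch then accumulates the proportions and returns the smallest number of components that reaches a target share. The 0.70 target is an illustrative threshold, not a rule stated in the paper.

```python
from itertools import accumulate

# (eigenvalue, proportion) pairs for PC1..PC9 of patient CU01, copied from Table I.
TABLE_I = [
    (82.56614, 0.56552), (20.04823, 0.13732), (7.47957, 0.05123),
    (6.37829, 0.04369), (5.9081, 0.04047), (5.45423, 0.03736),
    (5.16522, 0.03538), (3.76667, 0.0258), (3.6941, 0.0253),
]

def implied_totals(table):
    """Equation 2 rearranged: sum of all eigenvalues = e_i / P_i."""
    return [e / p for e, p in table]

def components_needed(proportions, target):
    """Smallest k whose cumulative proportion reaches `target` (None if never reached)."""
    for k, share in enumerate(accumulate(proportions), start=1):
        if share >= target:
            return k
    return None

proportions = [p for _, p in TABLE_I]
print([round(t, 1) for t in implied_totals(TABLE_I)])  # every value is close to 146.0
print(components_needed(proportions, 0.70))            # 2, since 0.56552 + 0.13732 = 0.70284
```

The nine listed proportions sum to about 0.962. A target above that level cannot be met from Table I alone, and the function returns `None`.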

IV. RESULTS AND DISCUSSION

Using the procedure discussed earlier, we can now derive Table II, which shows the first two principal component scores for every normal and abnormal ECG segment. We tested 6 normal and 6 abnormal ECG segments for every patient, so each patient has 12 sets of principal component scores. Table II corresponds to patient CU01 of the CU Ventricular Tachyarrhythmia Database, and scores can be derived in the same way for the other patients in the database. Fig. 8 shows the class distribution of principal components 1 and 2 for patients CU01, CU03 and CU09, respectively. In this distribution, abnormal ECGs are easily separated from normal ones. Similar tests on all other patients from the database followed the same trend. This confirms that abnormal ECGs can be distinguished from normal ECGs when PCA is applied to the patients' compressed ECGs.

With only two variables (PC1 and PC2), the scores can easily be fed to k-means clustering to classify ECG segments as abnormal or normal, which further validates these observations. Table III shows the results of the k-means algorithm applied to the data in Table II. Samples 1-6 have a small distance from class 2 (abnormal) and a large distance from class 1 (normal), so they are classified as class 2 (abnormal). Similarly, samples 7-12 have a small distance from class 1 and a large distance from class 2, so they are classified as class 1 (normal). The same separation is visible in Fig. 8, which plots the scores for three patients.
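The clustering step can be reproduced from the Table II scores. The sketch below is a plain Lloyd's k-means with k = 2. On such a small, unevenly spread data set, k-means can settle in a poor local optimum; for example, it can separate samples 5 and 6 from everything else. The sketch therefore restarts from every pair of samples and keeps the solution with the lowest within-cluster sum of squares. This exhaustive-restart strategy is a choice made for the illustration and is not taken from the paper. The values in Table III appear to be squared Euclidean distances to the two final cluster means, so the sketch reports distances in the same form.

```python
from itertools import combinations

# (PC1, PC2) scores for samples 1-12 of patient CU01, copied from Table II.
SCORES = [
    (-1.18113, -2.54187), (-1.26634, -2.64893), (-0.88636, -2.97021),
    (-0.12991, -1.99849), (3.246313, 1.579908), (2.38746, 0.798551),
    (-4.92702, 0.419675), (-5.10988, -0.03882), (-4.82858, 0.275417),
    (-4.99787, -0.12245), (-5.02603, 0.221886), (-4.94025, 0.02097),
]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centroids, max_iter=100):
    """Alternate nearest-centroid assignment and mean update until labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else old)
        centroids = updated
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

def kmeans(points, k=2):
    """Restart Lloyd's algorithm from every k-subset of samples; keep the lowest inertia."""
    best = None
    for init in combinations(range(len(points)), k):
        result = lloyd(points, [points[i] for i in init])
        if best is None or result[2] < best[2]:
            best = result
    return best

labels, centroids, inertia = kmeans(SCORES)
for i, p in enumerate(SCORES, start=1):
    print(i, labels[i - 1], [round(sq_dist(p, c), 6) for c in centroids])
```

Samples 1-6 end up in one cluster and samples 7-12 in the other, matching the Class column of Table III. The cluster indices themselves (0 or 1) depend on which restart wins. For sample 4, the squared distance to its own cluster mean is about 0.734 and to the other cluster mean about 27.97, in line with Table III.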


V. CONCLUSION AND FUTURE WORK

Because the ECG signal is enormous in size [19], compression algorithms are needed to make the whole telecardiology process faster and more efficient. Speed is crucial for the diagnosis and treatment of cardiovascular diseases. ECG compression enables faster transmission, but decompression introduces a delay in the processing phase. Existing methods process the original ECG signal rather than the compressed one, so this decompression time can be long enough to threaten a patient's life. In wireless telemonitoring, decompression also causes delay on doctors' mobile devices. It is equally undesirable in body sensor networks, where the extra processing wastes valuable energy. To avoid the decompression delay and make body sensor networks energy efficient, this paper implemented ECG analysis and a data mining solution on the compressed ECG signal, using PCA for feature extraction and k-means for clustering. A compressed ECG signal is fast to transmit, and we have shown that it can also be classified and analysed to detect cardiac abnormalities. Encouraged by these results, we intend to develop a neural network model that can be trained in a body sensor network node to classify more diseases.

REFERENCES
[1] G. Clifford, F. Azuaje, and P. McSharry, Advanced methods and tools for ECG data analysis. Artech House.
[2] I. Khalil and F. Sufi, "Legendre Polynomials based biometric authentication using QRS complex of ECG," in Intelligent Sensors, Sensor Networks and Information Processing, 2008. ISSNIP 2008. International Conference on, 2008, pp. 297-302.
[3] F. Sufi and I. Khalil, "An automated patient authentication system for remote telecardiology," in Intelligent Sensors, Sensor Networks and Information Processing, 2008. ISSNIP 2008. International Conference on, 2008, pp. 279-284.
[4] F. Sufi, I. Khalil, and I. Habib, "Polynomial distance measurement for ECG based biometric authentication." John Wiley & Sons, Ltd. Chichester, UK, 2008.
[5] Y. Suzuki, "Self-organizing QRS-wave recognition in ECG using neural networks," IEEE Transactions on Neural Networks, vol. 6, no. 6, pp. 1469-1477, 1995.
[6] S. Mahmoodabadi, A. Ahmadian, and M. Abolhasani, "ECG feature extraction using Daubechies wavelets," in Proceedings of the Fifth IASTED International Conference, Visualization, Imaging, and Image Processing, Benidorm, Spain, 2005.
[7] K. Minami, H. Nakajima, and T. Toyoshima, "Real-time discrimination of ventricular tachyarrhythmia with Fourier-transform neural network," IEEE Transactions on Biomedical Engineering, vol. 46, no. 2, pp. 179-185, 1999.
[8] A. Ghaffari, H. Golbayani, and M. Ghasemi, "A new mathematical based QRS detector using continuous wavelet transform," Computers and Electrical Engineering, vol. 34, no. 2, pp. 81-91, 2008.
[9] S. Barro, M. Fernandez-Delgado, J. Vila-Sobrino, C. Regueiro, and E. Sanchez, "Classifying multichannel ECG patterns with an adaptive neural network," IEEE Engineering in Medicine and Biology Magazine, vol. 17, no. 1, pp. 45-55, 1998.
[10] B. Lo, S. Thiemjarus, R. King, and G. Yang, "Body sensor network - a wireless sensor platform for pervasive healthcare monitoring," in The 3rd International Conference on Pervasive Computing. Citeseer, 2005.
[11] J. De Vos and M. Blanckenberg, "Automated pediatric cardiac auscultation," IEEE Transactions on Biomedical Engineering, vol. 54, no. 2, pp. 244-252, 2007.
[12] E. Ubeyli, "Eigenvector Methods for Automated Detection of Electrocardiographic Changes in Partial Epileptic Patients."
[13] W. Jiang and S. Kong, "Block-based neural networks for personalized ECG signal classification," IEEE Transactions on Neural Networks, vol. 18, no. 6, 2007.
[14] F. Melgani and Y. Bazi, "Classification of Electrocardiogram Signals With Support Vector Machines and Particle Swarm Optimization," IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 5, pp. 667-677, 2008.
[15] F. Sufi and I. Khalil, "Enforcing secured ECG transmission for real time telemonitoring: A joint encoding, compression, encryption mechanism," Security and Communication Networks, vol. 1, no. 5, 2008.
[16] I. Jolliffe, Principal component analysis. Springer Verlag, 2002.
[17] M. Dunham, Data mining: introductory and advanced topics. Prentice Hall/Pearson Education, 2003.
[18] F. Sufi, Q. Fang, I. Khalil, and S. S. Mahmoud, "Novel methods of faster cardiovascular diagnosis in wireless telecardiology," IEEE Journal on Selected Areas in Communications, vol. 27, no. 4, May 2009.
[19] F. Sufi, I. Khalil, Q. Fang, and I. Cosic, "A mobile web grid based physiological signal monitoring system," in Technology and Applications in Biomedicine, 2008. ITAB 2008. International Conference on, 2008, pp. 252-255.
