Professional Documents
Culture Documents
Abstract Medical Diagnosis Systems play a vital role in medical and adaptation of a suitable model are some issues that a
practice and are used by medical practitioners for diagnosis and medical system should take into consideration. Medical
treatment. In this paper, a medical diagnosis system is presented progress is always supported by data analysis which improves
for predicting the risk of cardiovascular disease. This system is the skill of medical experts and establishes the treatment
built by combining the relative advantages of genetic algorithm
technique for diseases. The purpose of medical diagnosis
and neural network. Multilayered feed forward neural networks
are particularly suited to complex classification problems. The system is to assist physicians in defining the risk level of an
weights of the neural network are determined using genetic individual patient. The heart disease dataset found in
algorithm because it finds acceptably good set of weights in less University of California, Irvine Machine Learning Repository
number of iterations. The dataset provided by University of is used for training and testing the system [10]. The purpose of
California, Irvine (UCI) machine learning repository is used for using this dataset is to provide a complex, real world data
training and testing. It consists of 303 instances of heart disease example where the relationships between the features are not
data each having 14 attributes including the class label. First, the easily discovered by casual inspection.
dataset is preprocessed in order to make them suitable for In this proposed system, the advantages of genetic
training. Genetic based neural network is used for training the
system. The final weights of the neural network are stored in the
algorithm and neural network are combined to predict the risk
weight base and are used for predicting the risk of cardiovascular of cardiovascular disease. Genetic algorithm is an
disease. The classification accuracy obtained using this approach optimization algorithm that mimics the principles of natural
is 94.17%. genetics. It finds acceptably good solutions to problems
acceptably quickly. In many applications, knowledge that
describes desired system behavior is contained in datasets.
Keywords- Genetic Algorithm; Neural Network, Backpropagation When datasets contain knowledge about the system to be
Algorithm; Cardiovascular Disease; Prediction Engine designed, a neural network promises a solution because it can
train itself from the datasets. Neural networks are adaptive
I. INTRODUCTION
models for data analysis particularly suitable for handling
Cardiovascular disease refers to the class of diseases that nonlinear functions. By combining the optimization technique
involve the heart or blood vessels. Cardiovascular disease of genetic algorithm with the learning power of neural
technically refers to any disease that affects the cardiovascular network, a model with better predictive accuracy can be
system; it is usually used to refer to those related to derived.
atherosclerosis. Cardiovascular diseases include coronary
heart disease, cerebrovascular disease, raised blood pressure, II. RELATED WORK
peripheral artery disease, rheumatic heart disease, congenital
heart disease and heart failure. In practice, cardiovascular A number of approaches have been used to predict the risk of
disease is treated by cardiologists, thoracic surgeons, vascular cardiovascular diseases. Hai, Hussain, and Xin (2008), in their
surgeons, neurologists, and interventional radiologists, work proposed neural based learning classifier system for
depending on the organ system that is being treated. The heart classifying data mining tasks. They conducted experiments on
is the organ that pumps blood to all tissues of the body. If the 13 different datasets from the University of California, Irvine
pumping action of the heart becomes inefficient, vital organs repository and one artificial dataset. They showed that neural
like the brain and kidney suffer and if the heart stops working, based learning classifier system performs equivalently to
death occurs within minutes. The World Health Organization supervise learning classifier system on five datasets,
(WHO) has estimated that 12 million deaths occur worldwide, significantly good performance on six datasets and
every year due to heart disease [2]. significantly poor performance on three datasets [3].
Medical diagnosis is an important yet complicated task Shantakumar and Kumaraswamy (2009), in their
that needs to be done accurately and efficiently. The work proposed an intelligent and effective heart attack
automation of this system is very much needed to help the prediction system using data mining and artificial neural
physicians to do better diagnosis and treatment. The network. They also proposed extracting significant patterns for
representation of medical knowledge, decision making, choice heart disease prediction. They used K-means clustering to
extract the data appropriate to heart attack from the
warehouse.They used MAFIA algorithm to mine the frequent
patterns[6][7].
Shanthi, Sahoo, and Saravanan (2009), in their work
proposed a decision support system using evolving connection
weights of artificial neural networks with genetic algorithms
with the application to predict stroke disease. The data for
their work have been collected from 150 patients who have the
symptoms of stroke disease. The result shows that this hybrid
approach is better compared to traditional artificial neural
network and feature selection using genetic algorithm [8].
Niti, Anil, and Navin (2007), in their work proposed
a decision support system for heart disease diagnosis using
neural network. They trained their system with 78 patient
records and the errors made by humans are avoided in this
system [5].
Anbarasi, Anupriya, and Iyengar (2010), in their
work proposed an enhanced prediction of heart disease with
feature subset selection using genetic algorithm. They
predicted more accurately the presence of heart disease with
reduced number of attributes. They used Nave Bayes,
Clustering, and Decision Tree methods to predict the diagnosis
of patients with the same accuracy as obtained before the
reduction of attributes. They concluded that the decision tree
method outperforms the other two methods [1].
Latha and Subramanian (2007), in their work
proposed an intelligent heart disease prediction system using
CANFIS and genetic algorithm. They simulated their work
using Neurosolution software. Cleveland heart disease dataset Figure 1 . System Architecture
is used for analysis. The mean square error obtained is only
0.000842 [4]. A.Cardiac Database
Yung-Keun Kwon and Byungo-Ro Moon (2007), in The Cleveland heart disease data provided by the University
their work proposed a hybrid Neuro-Genetic system for stock of California, Irvine Machine Learning Repository [10] is used
trading. They tested the system with 36 companies in NYSE for analysis of this work. The dataset has 13 numeric input
and NASDAQ for 13 years from 1992 to 2004. They used a attributes namely age, sex, chest pain type, cholesterol, fasting
recurrent neural network as the prediction model. Genetic blood sugar, resting ecg, maximum heart rate, exercise
algorithm was used to optimize the weights of the neural induced angina, old peak, slope, number of vessels colored
network. The hybrid system showed notable improvement on and thal. It also has the predicted attribute ie) the class label.
the average over the buy and hold strategy [9]. The description of the dataset is tabulated in Table 1.
Comparing to the works discussed above, the work
discussed in this paper is different by using genetic algorithm B. Preprocessing Engine
and neural network for predicting the risk of cardiovascular Preprocessing is an important step in the knowledge discovery
disease. Genetic based neural network is used to train the process, as real world data tend to be incomplete, noisy, and
system. The weights of the neural network are optimized using inconsistent. Data preprocessing includes data cleaning, data
genetic algorithm. integration, data transformation, and data reduction. Data
cleaning routines attempt to fill in missing values, smooth out
III. SYSTEM ARCHITECTURE noise while identifying outliers and correct inconsistencies in
the data. In this proposed work, the most probable value is
The architecture of the proposed system is illustrated in Figure used to fill in the missing values. Data transformation routines
1.The major components of this system are Cardiac Database, convert the data into appropriate forms for mining.
Preprocessing Engine, Weight Optimization Engine, Training Normalization is useful for classification purpose. By
Engine, Weight Base, and Prediction Engine. normalizing the input values for each attribute measured in the
training tuples will speed up the learning process. In this work,
the normalization technique used is min-max normalization.
The min-max normalization given in equation (1) discussed in
[12] is defined as follows:
The following is the methodology for weight
optimization used in this work as discussed in [11]:
(1) Step 1: Get the input-output pairs (Ii,Ti), i=1, 2, ..,N where Ii =
In this work, the following attributes are normalized: age, (I1i, I2i, , Ini ) and Ti=(T1i,T2i,,Tni) of the neural network,
Chromosome Ci, i=1,2,,p belonging to the current
trestbps, chol, thalach, and oldpeak.
population Pi whose size is p.
Step 2: Extract weights Wi from Ci using the actual weight wk
TABLE 1. SUMMARY OF THE HEART DISEASE DATASET which is given in equation (2):