I. INTRODUCTION
IEEE-ICRTIT 2011
disease and to identify the lifetime of a patient. The input
dataset is in WEKA ARFF or .csv file format. The hepatitis
disease dataset has 20 attributes, 14 of which are linear valued
and relevant. There are 281 instances and 2 classes. The
hepatitis patient dataset is run against the CART decision tree
algorithm. Some values in the dataset are missing: an instance
with a missing value is probabilistically assigned a possible
value according to the distribution of values for that attribute
in the training data, using the CART algorithm. Figure 1 shows
the original hepatitis patient dataset.
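The distribution-based filling of missing values described above can be sketched as follows. This is a minimal, hypothetical Python sketch, assuming missing entries are marked with `?` (as in ARFF files) and that `impute_column` receives one attribute's training values; it illustrates the sampling idea in the text and does not reproduce the authors' implementation. CART implementations commonly handle missing values with surrogate splits instead, so treat this only as an illustration of sampling from an attribute's value distribution.

```python
import random
from collections import Counter

def impute_column(values, rng, missing="?"):
    """Return a copy of `values` in which every `missing` entry is replaced by
    a value drawn at random from the observed (non-missing) entries, with
    probability proportional to how often each observed value occurs.
    The "?" marker is an assumption (the ARFF convention)."""
    observed = [v for v in values if v != missing]
    if not observed:
        raise ValueError("column has no observed values to sample from")
    counts = Counter(observed)
    choices = list(counts)                   # distinct observed values
    weights = [counts[c] for c in choices]   # their frequencies
    return [rng.choices(choices, weights=weights, k=1)[0] if v == missing else v
            for v in values]

# Hypothetical "Steroid" training column with two missing entries.
rng = random.Random(0)  # fixed seed so the run is reproducible
steroid = ["yes", "no", "?", "no", "?", "no"]
print(impute_column(steroid, rng))
```

Because the weights are the observed frequencies, a value such as "no" that occurs three times as often as "yes" is three times as likely to be drawn for each gap.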
A. Experimental Data
The hepatitis disease dataset of 473 patients is used in this
experiment [1]. Relevant attributes are identified and selected.
The dataset contains 19 attributes and a class variable with two
possible values, as shown in Table I. Six attributes (Age,
Bilirubin, Alk Phosphate, SGOT, Albumin, Protime) take
continuous values. The other attributes take two nominal values
each (see Table I).
TABLE I
DESCRIPTION OF THE FEATURES IN THE HEPATITIS PATIENT DATASET

No.  Attribute         Values
1    Class             DIE, LIVE
2    Age               continuous
3    Sex               male, female
4    Steroid           no, yes
5    Antivirals        no, yes
6    Fatigue           no, yes
7    Malaise           no, yes
8    Anorexia          no, yes
9    Liver Big         no, yes
10   Liver Firm        no, yes
11   Spleen Palpable   no, yes
12   Spiders           no, yes
13   Ascites           no, yes
14   Varices           no, yes
15   Bilirubin         continuous
16   Alk Phosphate     continuous
17   SGOT              continuous
18   Albumin           continuous
19   Protime           continuous
20   Histology         no, yes
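Six attributes in Table I are continuous, so a CART-style tree has to choose a numeric threshold for each split on them. The Python sketch below assumes the Gini index as the impurity measure (the criterion CART uses; ID3 and C4.5 rank splits by information gain and gain ratio instead) and tries the midpoints between consecutive sorted values. The function names and the bilirubin readings are hypothetical illustrations, not taken from the paper or its dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Search binary splits `value <= t` on one continuous attribute and
    return (t, weighted Gini impurity) of the best one. Candidate thresholds
    are midpoints between consecutive distinct sorted values; t is None if
    no split lowers the impurity of the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, gini(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:  # equal values cannot be separated by a threshold
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = (lo + hi) / 2, score
    return best_t, best_score

# Hypothetical bilirubin readings and outcomes (not from the dataset).
bilirubin = [0.5, 1.0, 1.5, 2.5, 3.0, 4.0]
outcome = ["LIVE", "LIVE", "LIVE", "DIE", "DIE", "DIE"]
print(best_threshold(bilirubin, outcome))  # → (2.0, 0.0)
```

In this toy case the split at 2.0 separates the two outcomes perfectly, so its weighted impurity drops from 0.5 for the unsplit node to 0.0.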
We implemented the ID3, C4.5 [10] and CART [15] algorithms
and tested them on our experimental dataset. The accuracy of
each algorithm can be examined through the confusion matrix it
produces. A confusion matrix contains information about the
actual and predicted classifications made by a classification
system, and the performance of such a system is commonly
evaluated using the data in the matrix. Tables II, III and IV
show the per-class measures derived from the confusion matrices
of the three classifiers.
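The per-class columns in Tables II to IV can be derived from a confusion matrix as sketched below. This minimal Python sketch assumes each class is scored one-vs-rest; the helper names and the five labelled patients are hypothetical and are not the data behind the tables.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """matrix[a][p] = number of instances whose actual class is a and
    whose predicted class is p."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in classes} for a in classes}

def per_class_metrics(matrix, positive):
    """TP rate (= recall), FP rate, precision and F-measure for the class
    `positive`, treating every other class as negative (one-vs-rest)."""
    others = [c for c in matrix if c != positive]
    tp = matrix[positive][positive]
    fn = sum(matrix[positive][p] for p in others)
    fp = sum(matrix[a][positive] for a in others)
    tn = sum(matrix[a][p] for a in others for p in others)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * tp_rate / (precision + tp_rate)
                 if precision + tp_rate else 0.0)
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": tp_rate, "f_measure": f_measure}

# Hypothetical labels for five test patients (not the paper's data).
actual    = ["LIVE", "LIVE", "LIVE", "DIE", "DIE"]
predicted = ["LIVE", "LIVE", "DIE",  "DIE", "LIVE"]
m = confusion_matrix(actual, predicted, ["LIVE", "DIE"])
for cls in ("LIVE", "DIE"):
    print(cls, per_class_metrics(m, cls))
```

Scoring each class against the rest yields one row per class, matching the layout of the tables; in a two-class problem the FP rate of one class is always 1 minus the TP rate of the other.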
TABLE II
CONFUSION MATRIX OF ID3 ALGORITHM

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.686    0.281    0.66       0.686   0.673      0.68      No
0.719    0.314    0.742      0.719   0.73       0.719     Yes
TABLE III
CONFUSION MATRIX OF C4.5 ALGORITHM

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.97     0.615    0.89       0.97    0.929      0.669     Live
0.385    0.03     0.714      0.385   0.5        0.669     Die
TABLE IV
CONFUSION MATRIX OF CART ALGORITHM

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.91     0.769    0.859      0.91    0.884      0.541     Live
0.231    0.09     0.333      0.231   0.273      0.541     Die
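Where,
1) The recall or true positive rate (TP rate) is the proportion of
positive cases that were correctly identified, calculated using
the equation:

   Recall = TP / (TP + FN)

where TP is the number of true positives and FN is the number of
false negatives.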
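The other per-class columns in Tables II to IV are assumed here to follow the standard definitions below, where TP, FP, TN and FN are counts for the class treated as positive. The last two lines are a worked check against the Live row of Table III: in a two-class problem the FP rate of one class equals 1 minus the TP rate of the other, and the F-measure is the harmonic mean of precision and recall.

```latex
\begin{aligned}
\text{FP rate} &= \frac{FP}{FP + TN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \\
F &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
\text{FP rate}_{\text{Live}} &= 1 - \text{TP rate}_{\text{Die}} = 1 - 0.385 = 0.615, \\
F_{\text{Live}} &= \frac{2 \times 0.89 \times 0.97}{0.89 + 0.97} = \frac{1.7266}{1.86} \approx 0.928.
\end{aligned}
```

The computed 0.928 agrees with the tabulated 0.929 up to rounding of the published precision and recall values.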
TABLE V
PREDICTION ACCURACY TABLE

S.No.  Name of Algorithm  Accuracy (%)
1      CART Algorithm     83.2
2      ID3 Algorithm      64.8
3      C4.5 Algorithm     71.4
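Assuming the usual definition, the accuracy figures in Table V are the share of test instances that lie on the diagonal of the two-class confusion matrix, expressed as a percentage:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
```

Under this definition, CART's 83.2% means that about 83 of every 100 test instances were classified correctly.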
CONCLUSION

REFERENCES