B. Feature Selection
One of the significant steps in many artificial intelligence and pattern recognition problems is feature selection [17]. Feature selection is a preprocessing method used to identify suitable features, and it plays an important role in classification. Various feature selection approaches, together with various search methods, are available to produce a reduced data set. Filter, wrapper and hybrid methods are generally regarded as the main feature selection methods, and the results obtained from these methods differ in accuracy as well as in time.

In the feature selection process, a candidate feature subset is first produced from the original dataset; the candidate subset is then assessed by means of assessment functions such as classifier error rate, dependency, information, consistency and distance [18]. Generally, a relevancy value is produced using these functions, and these values are further used as a termination condition to conclude whether the selected feature subset is optimal.
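As a rough illustration of this generate-and-evaluate loop, the sketch below performs a greedy forward (stepwise) search scored by a CFS-style merit function that rewards feature-class correlation and penalizes feature-feature correlation. This is a toy stand-in, not WEKA's implementation; the data layout and function names are invented for the example:

```python
import math

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def merit(subset, features, labels):
    """CFS-style merit: high average feature-class correlation,
    low average feature-feature correlation."""
    k = len(subset)
    rcf = sum(abs(correlation(features[f], labels)) for f in subset) / k
    if k == 1:
        return rcf
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    rff = sum(abs(correlation(features[f], features[g])) for f, g in pairs) / len(pairs)
    return (k * rcf) / math.sqrt(k + k * (k - 1) * rff)

def greedy_stepwise(features, labels):
    """Forward selection: repeatedly add the feature that most improves
    the merit; stop when no addition helps (the termination condition)."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        top, f = max((merit(selected + [f], features, labels), f)
                     for f in candidates)
        if top <= best:
            break
        selected, best = selected + [f], top
    return selected

# Tiny synthetic demo: feature "f1" mirrors the class, "f2" is noise.
data = {"f1": [0, 0, 1, 1, 0, 1], "f2": [1, 0, 0, 1, 1, 0]}
labels = [0, 0, 1, 1, 0, 1]
print(greedy_stepwise(data, labels))  # -> ['f1']
```

The stopping rule mirrors the termination condition described above: the search ends as soon as the candidate subset's relevancy score stops improving.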
Feature selection methods detect dependencies between features. In this study we use the correlation feature selection - subset evaluation (CFS) method with a greedy stepwise search for the selection of important attributes. CFS is a dimension reduction method that computes the association (correlation) between features and classes and discards the features which are not suitable. Seven features (AC, DS, DP, ASTV, MSTV, ALTV and Mean) were obtained using CFS.

C. Bagging (Bootstrap Aggregation)

In supervised machine learning, ensemble methods are very popular because of their ability to accurately predict class labels. An ensemble method uses more than one classifier to achieve better overall accuracy. Classical ensemble methods, such as bagging and boosting, have good predictive capability. The bagging method was proposed by Breiman [13].

In the bagging algorithm, N different samples called bootstrap samples [19] S1, S2, ……, SN are generated from the original training data set. A classifier Cn is built against each bootstrap sample Sn. From the classifiers C1, C2, ……, CN, a final classifier CL is built whose output is the class predicted most often by its sub-classifiers. The bagging process is shown in Fig. 1.

Bagging algorithm for multiple classification [20]:

Let T be the training set
for i = 1 to N
    Create a new set Si (bootstrap sample) by randomly selecting examples from the training set T; the size of Si must be equal to the size of T.
    Learn the classifier Ci on training set Si using the chosen machine learning algorithm.
Create a final model by combining all the classifiers (C1, C2, …, CN) through majority voting.

Figure 1. Model of Bagging Process

WEKA (Waikato Environment for Knowledge Analysis) is a tool widely used for data mining tasks [21]. In this study, using the WEKA software, three different learning algorithms, random forest, REPTree and J48, are used as base classifiers in the bagging-based classifiers.

D. Random Forest

Random forest is a combination of different decision trees, used to classify data samples into classes. It is a commonly used statistical technique for classification. The worth of each distinct tree is not essential; the purpose of a random forest is to reduce the error rate of the whole forest. The error rate depends upon two factors: the correlation between trees and the strength of the individual trees.

Two important parameters are associated with random forest. The first is the total number of trees in the forest; the second is the number of predictive variables used to split the nodes of a tree. In order to minimize the overall error rate, these two parameters should be optimized. Random forest is also an efficient method for estimating the important variables in the classification. The algorithm to construct each tree in a random forest is as follows:

If the original training data set comprises S different cases, then select S samples randomly (with replacement). These S samples are the training set for constructing the tree.

For a set of T input variables, select a distinct number t such that t < T. At each node, select t variables randomly out of T; the best split on these t variables is used for node splitting. During the construction of the forest, the value of t is kept constant.

Each tree is grown to the largest possible level without pruning.

As the error rate in a random forest depends upon correlation and strength, reducing t decreases both correlation and strength.
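The bagging pseudocode above can be made concrete with a minimal Python sketch. The majority-class "stump" is a deliberately trivial base learner standing in for J48, REPTree or random forest; all names and the toy data are invented for illustration:

```python
import random
from collections import Counter

def train_stump(sample):
    """A deliberately weak base learner: always predicts the majority
    class of its training sample (stands in for a real decision tree)."""
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda x: majority

def bagging_fit(training_set, n_classifiers, learner, rng):
    """Build N classifiers C1..CN, each trained on a bootstrap sample
    (drawn with replacement, same size as the training set T)."""
    models = []
    for _ in range(n_classifiers):
        bootstrap = [rng.choice(training_set) for _ in training_set]
        models.append(learner(bootstrap))
    return models

def bagging_predict(models, x):
    """Final classifier CL: the class predicted most often by C1..CN."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data: 8 "normal" and 3 "pathological" examples (feature, label).
rng = random.Random(0)
data = ([((i,), "normal") for i in range(8)]
        + [((i,), "pathological") for i in range(3)])
models = bagging_fit(data, n_classifiers=11, learner=train_stump, rng=rng)
print(bagging_predict(models, (0,)))  # majority vote over the 11 models
```

An odd number of classifiers avoids tied votes; in WEKA the same effect is obtained by wrapping a tree learner in the bagging meta-classifier.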
E. Reduced Error Pruning Tree (REPTree)

The REPTree method was proposed by Quinlan [22]. The REPTree algorithm generates a decision tree by calculating the information gain using entropy. It helps to decrease the complexity of the decision tree model through the "reduced error pruning" method and also reduces the error which arises from variance [23]. The information gain [24] is a criterion that uses entropy as a measure and selects the attributes having maximum information gain. Let T be a set of examples containing m elements belonging to class X and n elements belonging to class Y. The information required for deciding whether a random example from T belongs to X or Y is defined as

I(m, n) = -(m/(m+n)) log2(m/(m+n)) - (n/(m+n)) log2(n/(m+n))    (1)
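Equation (1) is the standard binary entropy. A few illustrative lines of Python confirm the properties that attribute selection relies on:

```python
import math

def info(m, n):
    """I(m, n) from Eq. (1): expected bits needed to decide whether a
    random example from T belongs to class X (m cases) or Y (n cases)."""
    total = m + n
    result = 0.0
    for count in (m, n):
        p = count / total
        if p > 0:                      # lim p->0 of -p*log2(p) is 0
            result -= p * math.log2(p)
    return result

# A 50/50 split is maximally uncertain: one full bit.
assert abs(info(8, 8) - 1.0) < 1e-12
# A pure node needs no further information.
assert info(10, 0) == 0.0
# Skewed splits fall strictly between 0 and 1.
assert 0.0 < info(2, 8) < 1.0
```

Information gain for a candidate attribute is then the drop in this quantity after splitting T on that attribute, and REPTree picks the attribute with the largest drop.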
III. RESULTS AND DISCUSSION

50% of deaths were the result of unawareness about abnormal FHR patterns. Now CTG data is analyzed using data mining techniques, and these techniques are a helping hand in avoiding human mistakes and taking correct decisions.

In this study, the classification results of bagging-based random forest, REPTree and J48 are compared. Tenfold cross validation is used to avoid overfitting in the classification of normal and pathological subjects. In cross validation, 9/10 of the data is used for training the algorithm and the remainder is used for testing, repeating this step 10 times. In Table II the results of the three classifiers for correct/incorrect classification using the full (21) features and the most relevant (07) features are presented.

TABLE II: COMPARISON OF DECISION TREES CLASSIFIERS USING BAGGING APPROACH FOR THE CTG DATA.

Classifiers     Classification   21 (Complete features)   07 (Selected features)
Random Forest   Accuracy         94.73%                   93.93%
                Error            5.26%                    6.07%
REPTree         Accuracy         93.98%                   93.84%
                Error            6.02%                    6.14%
J48             Accuracy         93.56%                   93.46%
                Error            6.44%                    6.54%

TABLE III: COMPARISON OF DECISION TREES CLASSIFIERS USING BAGGING APPROACH IN TERMS OF PERFORMANCE MEASURES.

CTG Data with all features (21)
Classifiers     Performance Measures   Normal   Suspicious   Pathological
Random Forest   Precision              0.957    0.883        0.946
                Recall                 0.984    0.769        0.898
                F-Measure              0.971    0.822        0.921
REPTree         Precision              0.951    0.888        0.903
                Recall                 0.977    0.753        0.903
                F-Measure              0.964    0.815        0.903
J48             Precision              0.951    0.850        0.909
                Recall                 0.972    0.749        0.909
                F-Measure              0.961    0.796        0.909

CTG Data with selected features (07)
Classifiers     Performance Measures   Normal   Suspicious   Pathological
Random Forest   Precision              0.955    0.852        0.915
                Recall                 0.973    0.763        0.920
                F-Measure              0.964    0.805        0.918
REPTree         Precision              0.951    0.842        0.913
                Recall                 0.970    0.756        0.898
                F-Measure              0.961    0.796        0.905
J48             Precision              0.954    0.865        0.899
                Recall                 0.973    0.763        0.909
                F-Measure              0.963    0.811        0.904
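The per-class measures in Table III are the usual quantities derived from a confusion matrix. The sketch below uses a made-up three-class matrix (not the paper's actual counts) to show the computation:

```python
def per_class_metrics(confusion, classes):
    """confusion[i][j] = count of examples of true class i predicted as
    class j. Returns {class: (precision, recall, f_measure)}."""
    metrics = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        predicted = sum(row[k] for row in confusion)   # column sum
        actual = sum(confusion[k])                     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        metrics[name] = (precision, recall, f)
    return metrics

classes = ("Normal", "Suspicious", "Pathological")
confusion = [
    [90, 8, 2],    # true Normal
    [5, 20, 3],    # true Suspicious
    [1, 2, 15],    # true Pathological
]
m = per_class_metrics(confusion, classes)
for name, (p, r, f) in m.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f={f:.3f}")
```

Accuracy in Table II is simply the trace of this matrix divided by the total count, which is why the two tables move together.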
Higher values of these performance measures correspond to a better classification rate. It is clear from Table III that random forest using the bagging approach provides higher values of F-Measure, Recall and Precision as compared to REPTree and J48. In the case of the full feature space, the average values of all three performance measures for random forest are (0.946, 0.947, 0.946), whereas in the case of the relevant features the averages for random forest are (0.938, 0.939, 0.938). For the complete features, the average values of all three performance measures for REPTree are (0.938, 0.940, 0.938), whereas in the case of the relevant features the averages for REPTree are (0.937, 0.938, 0.937). For all features, the average values of all three performance measures for J48 are (0.934, 0.936, 0.934), whereas in the case of the relevant features the averages for J48 are (0.933, 0.935, 0.933).

Figure 4. Comparison of Decision Trees Algorithms using Bagging Approach (accuracy bar chart for Random Forest, REPTree and J48, full features vs. selected features)

IV. CONCLUSION

In this paper, the performance of three decision tree based algorithms, namely random forest, REPTree and J48, with
the bagging approach was evaluated for the classification of CTG data. For the analysis, data sets with the complete features and with the reduced features have been used. All three classifiers have shown almost similar classification accuracies on the full feature set. Random forest performed slightly better (94.7%). The correlation feature selection - subset evaluation (CFS) method was used for the selection of relevant features. The classification accuracies of random forest and the other classifiers under the proposed bagging methodology are only negligibly degraded in the reduced feature space. It may be concluded that AC, DS, DP, ASTV, MSTV, ALTV and Mean are the most relevant features in the analysis and classification of cardiotocograms. Moreover, the random forest classifier using the bagging approach with the above mentioned seven features can be used efficiently for the classification of CTG data.

The major limitation of the study is that the bagging approach in combination with decision tree algorithms was applied to a publicly available secondary database. The authenticity of the proposed technique can be verified by using primary data for the classification of healthy and pathological subjects. Moreover, classifiers other than decision tree algorithms can also be used with the bagging approach.

REFERENCES

[1] E.M. Karabulut, and T. Ibrikci, “Analysis of Cardiotocogram Data for Fetal Distress Determination by Decision Tree Based Adaptive Boosting Approach,” Journal of Computer and Communications, vol.2, pp.32-37, 2014.
[2] G. Georgoulas, J. Spilka, P. Karvelis, V. Chudacek, C. Stylios, and L. Lhotska, “A three class treatment of the FHR classification problem using latent class analysis labeling,” in Proceedings of the 36th Annual International Conference of the IEEE Engineering in Medicine and Biology, Chicago, USA, pp.46-49, August 2014.
[3] P.J. Steer, “Has Electronic Fetal Heart Rate Monitoring Made a Difference?,” Seminars in Fetal and Neonatal Medicine, vol.13, WB Saunders, pp.2-7, 2008.
[4] T. Peterek, K. Jana, D. Pavel, and G. Petr, “Classification of cardiotocography records by random forest,” 36th International Conference on Telecommunications and Signal Processing, TSP, pp.620-623, 2013.
[5] A. Zarko, D. Devane, and G.ML. Gyte, “Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour,” Cochrane Database of Systematic Reviews, 2006.
[6] M. Huang, and Y. Hsu, “Fetal Distress Prediction Using Discriminant Analysis, Decision Tree, and Artificial Neural Network,” Journal of Biomedical Science & Engineering, vol.5, pp.526-533, 2012.
[7] C. Sundar, M. Chitradevi, and G. Geetharamani, “Classification of Cardiotocogram Data Using Neural Network Based Machine Learning Technique,” International Journal of Computer Applications, vol.47, pp.19-25, 2012.
[8] E. Yılmaz, and Ç. Kılıkçıer, “Determination of Fetal State from Cardiotocogram Using LS-SVM with Particle Swarm Optimization and Binary Decision Tree,” Computational and Mathematical Methods in Medicine, 2013.
[9] H.W. Jongsma, and J.G. Nijhuis, “Classification of fetal and neonatal heart rate patterns in relation to behavioural states,” Eur. J. Obstet. Gynecol. Reprod. Biol., vol.21, pp.293-299, 1986.
[10] G. Georgoulas, C.D. Stylios, G. Nokas, and P.P. Groumpos, “Classification of fetal heart rate during labour using hidden Markov models,” Proc. IEEE Int. Joint Conf. Neural Networks, vol.3, pp.2471-2474, 2004.
[11] J. Spilka, V. Chudacek, M. Koucky, L. Lhotska, M. Huptych, P. Janku, G. Georgoulas, and C.D. Stylios, “Using nonlinear features for fetal heart rate classification,” Biomed. Signal Process. Control, vol.7, pp.350-357, 2012.
[12] T.G. Dietterich, “Ensemble Methods in Machine Learning,” in J. Kittler and F. Roli (Eds.), First International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, New York: Springer-Verlag, pp.1-15, 2000.
[13] L. Breiman, “Bagging predictors,” Machine Learning, vol.24, pp.123-140, 1996.
[14] A. Frank, and A. Asuncion, “UCI Machine Learning Repository,” [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science, 2000.
[15] D.J. Newman, S. Heittech, C.L. Blake, and C.J. Merz, “UCI Repository of Machine Learning Databases,” University of California Irvine, Department of Information and Computer Science, 1998.
[16] D.A. de Campos, J. Bernardes, A. Garrido, J. Marques-de-Sá, and L. Pereira-Leite, “SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms,” J. Matern. Fetal Med., vol.5, pp.311-318, 2000.
[17] J.G. Zhang, and H.W. Deng, “Gene selection for classification of microarray data based on the Bayes error,” BMC Bioinformatics, vol.8, p.370, 2007.
[18] M.A. Hall, and A.S. Lloyd, “Feature Subset Selection: A Correlation Based Filter Approach,” International Conference on Neural Information Processing and Intelligent Information Systems, pp.855-858, 1997.
[19] B. Efron, and R. Tibshirani, “An Introduction to the Bootstrap,” Chapman & Hall, 1993.
[20] L. Breiman, “Bagging predictors,” Technical Report 421, Department of Statistics, University of California at Berkeley, 1994.
[21] M.A. Hall, F. Eibe, H. Geoffrey, P. Bernhard, R. Peter, and H.W. Ian, “The WEKA Data Mining Software: An Update,” SIGKDD Explorations, vol.11, issue 1, 2009.
[22] J.R. Quinlan, “Simplifying decision trees,” International Journal of Man-Machine Studies, vol.27, pp.221-234, 1987.
[23] I.H. Witten, and E. Frank, “Data mining: practical machine learning tools and techniques,” 2nd ed., Morgan Kaufmann Series in Data Management Systems, 2005.
[24] W. Peng, C. Juhua, and Z. Haiping, “An Implementation of ID3 Decision Tree Learning Algorithm,” School of Computer Science & Engineering, University of New South Wales, Sydney, Australia.
[25] L. Rokach, and M. Oded, “Top-Down Induction of Decision Trees Classifiers - A Survey,” IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, vol.35, 2005.
[26] E. Alpaydın, “Introduction to Machine Learning,” The MIT Press, 2004.
[27] J.R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol.1, pp.81-106, 1986.
[28] J.R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Publishers, 1993.