International Journal of Applied Engineering Research, ISSN 0973-4562 Vol. 10 No.55 (2015)
Research India Publications; http://www.ripublication.com/ijaer.htm
Department of Computer Science and Engineering, Bharathidasan Institute of Technology, Anna University, Tiruchirappalli 620 024, India.
3 Department of Electronics and Communication Engineering, Bharathidasan Institute of Technology, Anna University, Tiruchirappalli, India.
1 E-mail: asirantony@gmail.com
Abstract
Attribute reduction plays a significant role in data
preprocessing and is used to improve the performance of
machine learning algorithms. It is the process of selecting
the most relevant attributes from high-dimensional data by
eliminating redundant and irrelevant attributes, which
improves the predictive accuracy of the learning algorithm.
This paper proposes an ant colony optimization (ACO)-based
attribute reduction technique for improving the performance
of machine learning algorithms. The proposed algorithm is
tested on a benchmark dataset and is observed to yield
better predictive accuracy on the training dataset than the
other approaches considered.
Keywords: Attribute reduction, ant colony optimization,
disease diagnostic system
1. Introduction
Attribute reduction is the process of reducing
dimensionality by removing irrelevant and redundant
attributes (variables) from a given dataset before a
classification or predictive model is constructed.
Real-world datasets often contain irrelevant and redundant
features. Redundant attributes carry similar information to
one another, whereas irrelevant attributes do not carry the
information needed to perform the prediction or
classification. Feature selection techniques provide four
main benefits when constructing predictive models: improved
model interpretability, shorter training times, enhanced
generalization, and reduced runtime.
Feature selection has been an active research area in
pattern recognition, machine learning, statistics and data
mining. Its main objective is to choose a subset of input
variables by eliminating features that are irrelevant or
carry no predictive information, in order to improve the
performance of the classification algorithm.
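One common way to quantify whether a feature carries predictive information is information gain, the measure named as the heuristic function in the Conclusion. The following is a minimal sketch for discrete attributes only. The helper names `entropy` and `information_gain`, the toy rows, and the attribute meanings are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction of `labels` after splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (not from the paper): attribute 0 determines the class,
# attribute 1 is unrelated to it.
rows = [("high", "h"), ("high", "t"), ("low", "h"), ("low", "t")]
labels = ["sick", "sick", "healthy", "healthy"]

gains = [information_gain([r[i] for r in rows], labels) for i in range(2)]
# Attribute indices ordered from most to least informative.
ranking = sorted(range(2), key=lambda i: gains[i], reverse=True)
print(gains, ranking)
```

In this toy case the first attribute removes all uncertainty about the class and the second removes none, so a ranking by gain places the irrelevant attribute last.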
Data mining is the process of using automated data
analysis techniques to uncover previously unseen
relationships among data items. It often involves the
examination of data stored in a database. There are three
major data mining techniques, namely regression,
classification, and clustering. Data mining is the process that
2. Related works
Classification is the task of assigning predefined class
labels to unseen (test) data. For this purpose, a set of
labeled data is used to train a classifier, which is then
used to classify unseen data. Classification is a form of
supervised learning.
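The train-then-classify cycle described above can be sketched with a small categorical Naive Bayes classifier, one of the two classifiers evaluated later in the paper. This is a simplified illustration with Laplace smoothing, not the implementation used by the authors. The function names and the toy attribute values are assumptions.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training data: (glucose level, weight class) -> diagnosis.
train_rows = [("high", "obese"), ("high", "normal"),
              ("low", "normal"), ("low", "normal")]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(train_rows, train_labels)
# Classify a row that does not occur in the training data.
prediction = predict(model, ("low", "obese"))
print(prediction)
```

The smoothing term keeps an attribute value that was never seen with a given class from forcing that class's probability to zero.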
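[Figure: block diagram with the labels "Diseases dataset", "Test dataset", "Predictive model building", and "Accuracy on disease prediction".]
[Equations (1) and (3): symbols not recoverable. The surviving fragments are indexed by (i, j) and evaluated at (t 0), (t 1) and (t).]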
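For orientation, a generic ant colony optimization loop for attribute subset selection can be sketched as follows. It uses the standard evaporate-and-deposit pheromone scheme from the ACO literature and is not a reconstruction of this paper's equations. The function `aco_select`, its parameters (`alpha`, `beta`, `rho`), the heuristic values, and the toy evaluation function are all assumptions made for illustration.

```python
import random


def aco_select(heuristic, evaluate, subset_size, n_ants=10, n_iters=20,
               alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Generic ACO attribute-subset search (illustrative sketch only)."""
    rng = random.Random(seed)
    n = len(heuristic)
    pheromone = [1.0] * n  # uniform initial pheromone on every attribute
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iters):
        iteration = []
        for _ in range(n_ants):
            candidates = list(range(n))
            subset = []
            while len(subset) < subset_size:
                # Selection weight combines pheromone and heuristic desirability.
                weights = [pheromone[a] ** alpha * heuristic[a] ** beta
                           for a in candidates]
                chosen = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(chosen)
                subset.append(chosen)
            score = evaluate(subset)
            iteration.append((subset, score))
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate, then let each ant deposit pheromone proportional to its score.
        pheromone = [(1.0 - rho) * p for p in pheromone]
        for subset, score in iteration:
            for a in subset:
                pheromone[a] += score
    return best_subset, best_score


# Hypothetical setup: attributes 0 and 2 are the relevant ones, and the heuristic
# values (for example, information gain) are strictly positive.
heuristic = [0.9, 0.1, 0.8, 0.05]


def evaluate(subset):
    """Toy stand-in for classifier accuracy: share of relevant attributes found."""
    return len(set(subset) & {0, 2}) / 2


best_subset, best_score = aco_select(heuristic, evaluate, subset_size=2)
print(best_subset, best_score)
```

In a real setting `evaluate` would train and score a classifier such as Naive Bayes or J48 on the candidate subset. The heuristic values must be positive so that the selection weights never sum to zero.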
No. of       Heuristic function calculated between    Accuracy of         Accuracy
attributes   class attribute and other attributes     Naive Bayes (NB)    of J48
             (incremental search)                     (%)                 (%)
1            {1}                                      75.4                73.04
2            {1,2}                                    77                  73.47
3            {1,2,3}                                  76.6                74.60
4            {1,2,3,4}                                76.5                74.34
5            {1,2,3,4,5}                              76.3                73.69
6            {1,2,3,4,5,6}                            76.8                74.21
7            {1,2,3,4,5,6,7}                          76.4                73.82
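As a quick reading aid for the incremental-search results above, the following sketch locates the subset size with the highest accuracy for each classifier. The values are transcribed from the table. The helper `best_size` is a hypothetical name introduced here.

```python
# Accuracies (%) transcribed from the incremental-search table above.
sizes = [1, 2, 3, 4, 5, 6, 7]
naive_bayes = [75.4, 77.0, 76.6, 76.5, 76.3, 76.8, 76.4]
j48 = [73.04, 73.47, 74.60, 74.34, 73.69, 74.21, 73.82]


def best_size(accuracies):
    """Return (subset size, accuracy) of the best row; ties keep the smallest size."""
    best = max(range(len(sizes)), key=lambda i: accuracies[i])
    return sizes[best], accuracies[best]


print(best_size(naive_bayes))  # (2, 77.0)
print(best_size(j48))          # (3, 74.6)
```

On these figures Naive Bayes peaks with two attributes and J48 with three, so neither classifier needs the full set of seven attributes to reach its best reported accuracy.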
No. of       Without heuristic function    Accuracy of        Accuracy
attributes   (random search)               Naive Bayes (%)    of J48 (%)
1            {1}                           65                 63.8
2            {1,2}                         66.6               65.6
3            {1,2,3}                       73.8               74.8
4            {1,2,3,4}                     76.3               73.8
5            {1,2,3,4,5}                   76.3               73.8
6            {1,2,3,4,5,6}                 76.3               74.8
7            {1,2,3,4,5,6,7}               76.3               74.69
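The two sets of results can be compared row by row with a short calculation. The sketch below subtracts the random-search accuracy from the incremental-search accuracy at each subset size. It assumes that, in the last two random-search rows, the first value is the Naive Bayes accuracy and the second is the J48 accuracy. The helper name `gains` is hypothetical.

```python
# Accuracies (%) transcribed from the two tables above, indexed by subset size 1..7.
heuristic_nb = [75.4, 77.0, 76.6, 76.5, 76.3, 76.8, 76.4]
random_nb = [65.0, 66.6, 73.8, 76.3, 76.3, 76.3, 76.3]
heuristic_j48 = [73.04, 73.47, 74.60, 74.34, 73.69, 74.21, 73.82]
# Assumption: the last two rows are read as Naive Bayes first, J48 second.
random_j48 = [63.8, 65.6, 74.8, 73.8, 73.8, 74.8, 74.69]


def gains(with_heuristic, without_heuristic):
    """Percentage-point difference per subset size, rounded to two decimals."""
    return [round(h - r, 2) for h, r in zip(with_heuristic, without_heuristic)]


nb_gain = gains(heuristic_nb, random_nb)
j48_gain = gains(heuristic_j48, random_j48)
print(nb_gain)
print(j48_gain)
```

On these figures the largest differences occur for subsets of one and two attributes. For larger subsets the two search strategies are within about one percentage point of each other.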
Conclusion
This paper proposed an ant colony optimization (ACO)-based
attribute reduction algorithm for improving the performance
of supervised learning algorithms, in terms of reduced
runtime and improved accuracy, for a Disease Diagnosis
System (DDS). The work focused on feature reduction on the
training disease dataset from which the predictive model
for the DDS is learnt. The diabetes dataset was used to
evaluate the predictive accuracy of the DDS, using an
information gain (IG)-based heuristic function with
incremental search for attribute subset generation and
reduction. Results were obtained with the heuristic
function using the Naive Bayes and J48 classifiers in terms
of classification accuracy. It is observed that the
heuristic function produces better results than
classification without the heuristic function. In future,
this work can be extended with other heuristic functions
and search strategies for attribute reduction.