
2012 IEEE International Conference on Advanced Communication Control and Computing Technologies (ICACCCT)

Towards Higher Accuracy in Supervised Learning and Dimensionality Reduction by Attribute Subset Selection - A Pragmatic Analysis
D. Asir Antony Gnana Singh*, S. Appavu Alias Balamurugan**, E. Jebamalar Leavline***
*Department of CSE, M.I.E.T Engineering College, Tiruchirappalli - 7, India
**Department of IT, Thiagarajar College of Engineering, Madurai - 15, India
***Department of ECE, Anna University of Technology, Tiruchirappalli - 24, India
asirantony@gmail.com*, app_s@yahoo.com**, jebilee@gmail.com***


Abstract - This paper presents a pragmatic study on feature subset selection evaluators. In data mining, dimensionality reduction in data preprocessing plays a vital role in improving the performance of machine learning algorithms, and many techniques have been proposed by researchers to achieve dimensionality reduction. Besides contributing to dimensionality reduction, feature subset selection gives a significant improvement in accuracy, reduces the false prediction ratio and reduces the time complexity of building the learning model, as a result of removing redundant and irrelevant attributes from the original dataset. This study analyzes the performance of the Cfs, Consistency and Filtered attribute subset evaluators in view of dimensionality reduction, with a wide range of test datasets and learning algorithms, namely the probability-based Naive Bayes, the tree-based C4.5 (J48) and the instance-based IB1.

Keywords - Data preprocessing; Dimensionality Reduction; Feature subset selection; Classification Accuracy; Data mining; Machine learning; Classifier.

I. INTRODUCTION

Dimensionality reduction for the classification process has received significant attention in both pattern recognition and machine learning. A high dimensional data space may increase the computational cost and reduce the prediction accuracy of classifiers [1]-[3]. The classification process, known as supervised learning, builds the classifier by learning from training data sets [6], and it is observed that, when the features of the training data sets exceed a particular range of the sample space, the accuracy of the classifier decreases [1], [2]. There are two ways to achieve dimensionality reduction: feature extraction and feature selection [5], [3].
In feature extraction problems [12], [13], [26], the original features in the measurement space are initially transformed into a new dimension-reduced space via some specified transformation, and significant features are then determined in the new space. Although the significant variables determined in the new space are related to the original variables, the physical interpretation in terms of the original variables may be lost. In addition, though the dimensionality may be greatly reduced using some feature extraction methods, such as principal component analysis (PCA) [14], the transformed variables usually involve all the original variables. Often, the original variables may be redundant when forming the transformed variables. In many cases, it is desirable to reduce not only the dimensionality in the transformed space, but also the number of variables that need to be considered or measured [15], [16], [26]. Unlike feature extraction, feature selection aims to seek optimal or suboptimal subsets of the original features [16]-[19], [31], [32], preserving the significant information carried by the complete collected data, to facilitate feature analysis for high-dimensional problems [22]-[24], [26].

This study mainly focuses on analyzing the performance of the Cfs, Consistency and Filtered attribute subset evaluators in view of dimensionality reduction, with a wide range of test datasets and learning algorithms, namely the probability-based learner Naive Bayes, the tree-based learner C4.5 (J48) and the lazy instance-based learner IB1.

A. Feature subset selection

Feature subset selection is a process that removes irrelevant and redundant features from the dataset to improve the prediction accuracy of the learning algorithm. Irrelevant features reduce the predictive accuracy, while redundant features deteriorate the performance of the learners and require more computation time and other resources for training and testing the data [9], [8]. Feature subset selection methods can be classified into three categories, namely wrapper, filter and hybrid [10], [11]. In the wrapper approach, a predetermined learning model is assumed, and features are selected that justify the learning performance of that particular learning model [27], [11], whereas the filter approach requires only a statistical analysis of the feature set, without utilizing any learning model [28]. The hybrid approach is the combination of the filter and wrapper methods.
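To make the distinction concrete, the following Python sketch contrasts a filter-style criterion with a wrapper-style greedy search. It is an illustration only, not the evaluators studied in this paper; the use of scikit-learn, the Iris data and the mutual-information score are assumptions made for the example.

```python
# A minimal sketch contrasting a filter criterion with a wrapper search.
# Assumes scikit-learn is available; the dataset and classifier are only
# placeholders for the idea, not the setup used in this paper.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Filter: score each feature from the data alone (no learning model involved).
scores = mutual_info_classif(X, y, random_state=0)
filter_subset = np.argsort(scores)[::-1][:2]          # keep the 2 best-scored features
print("filter choice:", filter_subset)

# Wrapper: grow a subset greedily, judging each candidate by the accuracy
# of the learner that will finally be used (here Naive Bayes).
selected, remaining = [], list(range(X.shape[1]))
while remaining:
    best_feat, best_acc = None, -1.0
    for f in remaining:
        acc = cross_val_score(GaussianNB(), X[:, selected + [f]], y, cv=5).mean()
        if acc > best_acc:
            best_feat, best_acc = f, acc
    base = cross_val_score(GaussianNB(), X[:, selected], y, cv=5).mean() if selected else 0.0
    if best_acc <= base:                               # stop when no candidate helps
        break
    selected.append(best_feat)
    remaining.remove(best_feat)
print("wrapper choice:", selected)
```

The wrapper repeatedly re-trains the learner it is meant to serve, which is why wrapper methods are usually far costlier than filters.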
1) Subset Generation and Searching methods: A search strategy and a criterion function are needed for subset selection. The search algorithm generates and compares possible feature-selection solutions by calculating their criterion function values as a measure of the effectiveness of each considered feature subset. The feature subset with the best criterion function value is given as the output of the feature-selection algorithm [12], [5], [11], [4]. In general, many searching strategies are followed to generate the feature subset [30]-[33], as follows. The first method, sequential forward search, starts the searching process with an empty set and adds the most effective features one at a time. The second method, sequential backward search, starts with the full set of attributes and at each step removes the worst attribute remaining in the set. The third method, bidirectional selection, starts from both ends, adding and removing features concurrently. In the fourth method, the searching process is done on a randomly selected subset using a sequential or bidirectional strategy. The fifth method, complete search, searches the subsets exhaustively so that it gives the best solution, but this is not feasible when the number of features is large [11], [30]-[33].
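The forward and backward strategies described above can be sketched as a small search skeleton with the criterion function left pluggable. The evaluate() callable and the toy criterion below are assumptions for illustration; in practice the criterion would be a subset evaluator such as Cfs or the consistency measure of Section II.

```python
# Sketch of the sequential forward and backward search skeletons, with the
# criterion function left pluggable. evaluate() is a stand-in for a subset
# evaluator; the toy criterion at the bottom is invented for the example.
from typing import Callable, FrozenSet, List

def forward_search(n_features: int, evaluate: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Start from the empty set and add the feature that improves the criterion most."""
    selected: set = set()
    best = evaluate(frozenset(selected))
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            if f in selected:
                continue
            score = evaluate(frozenset(selected | {f}))
            if score > best:
                best, best_f, improved = score, f, True
        if improved:
            selected.add(best_f)
    return sorted(selected)

def backward_search(n_features: int, evaluate: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Start from the full set and repeatedly drop the worst remaining feature."""
    selected = set(range(n_features))
    best = evaluate(frozenset(selected))
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for f in list(selected):
            score = evaluate(frozenset(selected - {f}))
            if score >= best:
                best, worst_f, improved = score, f, True
        if improved:
            selected.remove(worst_f)
    return sorted(selected)

if __name__ == "__main__":
    # Toy criterion: prefer subsets containing features 0 and 2, penalise size.
    toy = lambda s: len(s & {0, 2}) - 0.1 * len(s)
    print(forward_search(5, toy))   # -> [0, 2]
    print(backward_search(5, toy))  # -> [0, 2]
```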
B. Supervised learning algorithms

Classification algorithms can be employed for evaluating the performance of feature subset evaluators. Several algorithms are available, each with its own strengths and pitfalls, and research shows that no single learning algorithm works best on all supervised learning problems. The most familiar algorithms, namely Naive Bayes, C4.5 (J48) and IB1, have been chosen to build an experimental setup to carry out this study. This setup is strengthened by the most highlighted characteristic of the Naive Bayes classifier, that it estimates the parameters (mean and variance of the variables) necessary for classification from a minimal amount of training data; the advantages of the C4.5 (J48) decision tree, which is simple to understand and interpret, requires little data preparation, is robust and performs well on large data in a short time; and the benefit of IB1, that it is able to learn quickly from a very small dataset.

1) Naive Bayes: In this classifier, the classification is achieved by the principle of the basic Bayes' theorem, and it gives relatively good performance on classification tasks [47]. In general, Naive Bayes (NB) learns with the assumption that the features are independent given the class variable. More formally, this classifier is defined by the discriminant function

f_i(X) = ∏_{j=1}^{N} P(x_j | c_i) P(c_i)                (1)

where X = (x_1, x_2, ..., x_N) denotes a feature vector and c_i denotes a possible class label. The training phase for learning a classifier consists of estimating the conditional probabilities P(x_j | c_i) and the prior probabilities P(c_i). Here, P(c_i) is estimated by counting the training instances that fall into class c_i and dividing the resulting count by the size of the training set. Similarly, the conditional probabilities are estimated by observing the frequency distribution of feature x_j within the training subset that is labeled as class c_i. To classify a class-unknown test vector, the posterior probability of each class is calculated, given the feature values present in the test vector, and the test vector is assigned to the class with the highest probability [49].
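A minimal sketch of the discriminant in equation (1) for discrete features is given below. The add-one smoothing and the tiny toy dataset are assumptions made for the example; this is not the Weka NaiveBayes implementation.

```python
# Minimal Naive Bayes for discrete features, following the discriminant in
# equation (1): f_i(X) = prod_j P(x_j | c_i) * P(c_i). The tiny weather-style
# dataset below is made up purely for illustration.
from collections import Counter, defaultdict

def train_nb(rows, labels):
    classes = Counter(labels)                       # class counts -> P(c_i)
    cond = defaultdict(Counter)                     # (class, feature index) -> value counts
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            cond[(c, j)][v] += 1
    return classes, cond, len(rows)

def classify_nb(x, classes, cond, n):
    best_c, best_score = None, -1.0
    for c, cc in classes.items():
        score = cc / n                              # prior P(c_i)
        for j, v in enumerate(x):
            score *= (cond[(c, j)][v] + 1) / (cc + 2)   # smoothed estimate of P(x_j | c_i)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(classify_nb(("rain", "hot"), *model))          # -> "yes"
```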
2) C4.5 (J48): Many techniques are followed to build a decision tree. In all these techniques the given data are formed into a tree structure whose branches represent the association between feature values and the class label. C4.5 (J48) is a familiar and superior technique among these [48]. It partitions the training dataset in a recursive fashion, derived from examining the potential of the feature values in separating the classes. The decision tree learns from a set of training data through an iterative process of choosing a feature and splitting the given dataset based on the values of that feature. The entropy, or information gain, is used to select the most representative features for classification; the selected features have the lowest entropy and the highest information gain. This learning algorithm works in the following steps: first, compute the entropy measure for each feature; secondly, partition the dataset according to the possible values of the feature that has the lowest entropy; and thirdly, estimate probabilities in a way exactly the same as the Naive Bayes approach [49]. Although features are chosen one at a time in a greedy manner, each choice depends on the results of the previous tests.
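The split-selection step described above can be illustrated as follows. The sketch computes entropy and information gain only; it is not the full C4.5/J48 algorithm, and the toy dataset is invented for the example.

```python
# Entropy / information gain as used to pick the splitting feature in a
# decision tree of the C4.5 family. A sketch of the split-selection step only,
# not the full J48 algorithm; the toy dataset is invented for the example.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, j):
    """Gain of splitting on feature j = H(labels) - weighted H(labels | feature value)."""
    base = entropy(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[j], []).append(y)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in by_value.values())
    return base - remainder

rows = [("sunny", "hot"), ("sunny", "mild"), ("overcast", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
gains = {j: information_gain(rows, labels, j) for j in range(2)}
best = max(gains, key=gains.get)
print(gains, "split on feature", best)   # feature 0 (outlook) has the higher gain
```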
3) IB1: This classifier works on the nearest neighbour classification principle. In this approach, the distance between each training instance and the given test instance is calculated with the Euclidean distance measure. If more than one instance has the smallest distance to the test instance, the first instance found is used. Nearest neighbour is one of the most significant learning algorithms and can be adapted to solve a wide range of problems [46]. To classify an unclassified vector X, this algorithm ranks the neighbours of X amongst a given set of N data points (X_i, c_i), i = 1, 2, ..., N, and uses the class labels c_j (j = 1, 2, ..., K) of the K most similar neighbours to predict the class of the new vector X. In particular, the classes of these neighbours are weighted using the similarity between X and each of its neighbours, where similarity is measured by the Euclidean distance metric. Then, X is assigned the class label with the greatest number of votes among the K nearest class labels. This learner works on the premise that the classification of an instance is likely to be most similar to the classification of other instances that are nearby in the vector space. It does not depend on prior probabilities, unlike other learning algorithms such as NB. It is computationally cost effective when the dataset is small, but the calculation of the distances becomes quite expensive when the dataset is large; the computational cost can be reduced by adopting PCA- or information-gain-based feature ranking for dimensionality reduction.
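A compact sketch of the nearest-neighbour rule described above is shown below, with the Euclidean distance and the first-found tie-breaking; the training points are made up for the example.

```python
# 1-nearest-neighbour (IB1-style) classification with the Euclidean distance,
# as described above: ties on distance fall back to the first instance found.
# The training points are invented for the example.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ib1_classify(train, test_point):
    """train is a list of (feature_vector, class_label) pairs."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        d = euclidean(features, test_point)
        if d < best_dist:                # strict '<' keeps the first instance on ties
            best_label, best_dist = label, d
    return best_label

train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((5.0, 5.1), "b"), ((4.8, 5.3), "b")]
print(ib1_classify(train, (4.9, 5.0)))   # -> "b"
```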
The rest of this article is organized as follows. In Section II, the related works are reviewed. Section III elucidates the proposed work. Section IV presents the experimental results with a discussion. In Section V, the conclusion is drawn with future directions.


II. RELATED WORK

Many feature subset evaluator algorithms have been proposed for choosing the most relevant feature subsets from the datasets. In this section, the attribute subset selection algorithms, namely the Cfs, Consistency and Filtered subset evaluators, are discussed.
A. Cfs

In the Cfs (Correlation-based Feature Selection) method, subsets of attributes are evaluated rather than individual attributes [38], [39], [43]. The kernel of this heuristic principle is that it evaluates the effectiveness of individual features based on the degree of inter-correlation among the features in predicting the class. The goodness of a feature subset is determined by the heuristic equation (2), on the basis that the attributes present in the subset should have a high correlation with the class and a low inter-correlation with each other:

Merit_S = k · r_cf / √(k + k(k−1) · r_ff)                (2)

where S is a subset containing k features, r_cf is the average feature-class correlation, and r_ff the average feature-feature inter-correlation. The numerator can be thought of as giving an indication of how predictive a group of features is, and the denominator of how much redundancy there is among them. This heuristic handles irrelevant features, as they will be poor predictors of the class; redundant attributes are discriminated against, as they will be highly correlated with one or more of the other features. Since attributes are treated independently, Cfs cannot identify strongly interacting features, such as in a parity problem. However, it has been shown that it can identify useful attributes under moderate levels of interaction [38], [43].
In order to apply equation (2), it is necessary to compute the correlation (dependence) between attributes. Cfs first discretizes numeric features using the technique discussed in [42], [43] and then uses the symmetrical uncertainty to estimate the degree of association between discrete features X and Y:

SU = 2.0 × [H(X) + H(Y) − H(X,Y)] / [H(X) + H(Y)]                (3)

After computing a correlation matrix, Cfs applies a heuristic search strategy to find a good subset of features according to equation (2). As mentioned at the beginning of this section, the modified forward selection search is used, which produces a list of attributes ranked according to their contribution to the goodness of the set.
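Equations (2) and (3) can be sketched directly for already-discretised features, as follows. This is an illustration of the formulas, not the Weka CfsSubsetEval code, and the small arrays are invented for the example.

```python
# A sketch of symmetrical uncertainty (equation (3)) and the Cfs merit
# (equation (2)) for already-discretised features; the tiny arrays are made up
# for illustration only.
import math
from collections import Counter
from itertools import combinations

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def joint_entropy(a, b):
    return entropy(list(zip(a, b)))

def symmetrical_uncertainty(a, b):
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0
    return 2.0 * (h_a + h_b - joint_entropy(a, b)) / (h_a + h_b)

def cfs_merit(features, cls):
    """features: list of discrete columns, cls: class column."""
    k = len(features)
    r_cf = sum(symmetrical_uncertainty(f, cls) for f in features) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = list(combinations(features, 2))
        r_ff = sum(symmetrical_uncertainty(a, b) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

f1 = [0, 0, 1, 1, 0, 1]          # predictive of the class
f2 = [0, 0, 1, 1, 0, 1]          # redundant copy of f1
f3 = [1, 0, 1, 0, 1, 0]          # irrelevant
cls = [0, 0, 1, 1, 0, 1]
print(cfs_merit([f1], cls))       # 1.0: one perfectly predictive feature
print(cfs_merit([f1, f2], cls))   # adding a redundant copy gives no gain (still 1.0)
print(cfs_merit([f1, f3], cls))   # an irrelevant feature drags the merit down (~0.74)
```

The third call shows how an irrelevant feature lowers the merit, which is exactly the behaviour the heuristic search exploits.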

B. Consistency

In consistency-based subset evaluation, several approaches to attribute subset selection use class consistency as an evaluation metric [40], [41]. These methods look for combinations of attributes whose values divide the data into subsets containing a strong single-class majority. Usually the search is biased in favour of small feature subsets with high class consistency. Our consistency-based subset evaluator uses the consistency metric of [41]:

Consistency_S = 1 − [ Σ_{i=1}^{J} (|D_i| − |M_i|) ] / N                (4)

where S is an attribute subset, J is the number of distinct combinations of attribute values for S, |D_i| is the number of occurrences of the ith attribute value combination, |M_i| is the cardinality of the majority class for the ith attribute value combination and N is the total number of instances in the data set. Data sets with numeric attributes are first discretized using the methods found in [42], [43]. The modified forward selection search described earlier in this section is used to produce a list of attributes, ranked according to their overall contribution to the consistency of the attribute set.
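Equation (4) can be computed with a single pass that groups instances by their attribute-value combinations, as in the sketch below; the toy data are an assumption for the example.

```python
# A sketch of the consistency measure in equation (4): group the instances by
# their attribute-value combination on the candidate subset, subtract the
# majority-class count of each group, and normalise by the number of instances.
# The toy data are invented for the example.
from collections import Counter, defaultdict

def consistency(rows, labels, subset):
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        key = tuple(row[j] for j in subset)       # the i-th attribute-value combination
        groups[key].append(y)
    inconsistent = sum(len(g) - max(Counter(g).values()) for g in groups.values())
    return 1.0 - inconsistent / len(rows)         # Consistency_S = 1 - sum(|D_i|-|M_i|)/N

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "b", "b", "a"]
print(consistency(rows, labels, [0]))      # 0.5: feature 0 alone leaves mixed groups
print(consistency(rows, labels, [0, 1]))   # 1.0: both features separate the classes
```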
C. Filter

This approach is most suitable for reducing only the dimensionality of the data, rather than for training a classifier. The subset selection takes advantage of the Cfs subset evaluator and the Greedy Stepwise search algorithm, and it reduces the dimensionality as much as possible in high-dimensional data [44].

III. PROPOSED WORK

For this proposed work, the experimental setup was constructed with three attribute subset selection techniques and ten standard machine-learning data sets from the Weka data set collection [44]. These data sets range in size from a minimum of 5 to a maximum of 270 attributes, with the number of instances ranging from a minimum of 14 to a maximum of 4627. The experiment was carried out with the Weka data mining tool [44]. The three attribute subset evaluators were evaluated on these datasets with the well-known classifier algorithms: the probability-based Naive Bayes, the tree-based C4.5 (J48) and the instance-based IB1.
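A pipeline analogous to the experimental setup described above can be sketched as follows. It is only an approximation of the study: scikit-learn stands in for the Weka tool, a mutual-information filter stands in for the subset evaluators, and only one of the ten data sets is used.

```python
# An analogous pipeline to the experiment described above, sketched with
# scikit-learn rather than Weka (an assumption for illustration): pick a reduced
# attribute subset with a filter score, then compare the cross-validated
# accuracy of a Naive Bayes, a decision-tree and a 1-nearest-neighbour learner
# on the full and the reduced data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)
reduced = np.argsort(scores)[::-1][:2]                     # keep the two best-scored attributes

learners = {
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),   # stand-in for C4.5 (J48)
    "1-NN": KNeighborsClassifier(n_neighbors=1),               # stand-in for IB1
}
for name, clf in learners.items():
    full = cross_val_score(clf, X, y, cv=10).mean()
    red = cross_val_score(clf, X[:, reduced], y, cv=10).mean()
    print(f"{name}: full={full:.3f}  reduced({len(reduced)} attrs)={red:.3f}")
```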

IV. RESULTS AND DISCUSSION

The summary of the data sets used in the experiments and the experimental results, derived from the analyses of dimensionality reduction and of the improvement in the learning accuracy of the classifiers, are shown in Table I, Table II and Table III, respectively.
TABLE I. SUMMARY OF DATA SETS

S.No.  Dataset         Instances  Features  Classes
1      Contact Lenses  24         5         3
2      Diabetes        768        9         2
3      Glass           214        10        7
4      Ionosphere      351        35        2
5      Iris            150        5         3
6      Labor           57         17        2
7      Soybean         683        36        19
8      Super Market    4627       217       2
9      Vote            435        17        2
10     Weather         14         5         2

TABLE II. COMPARISON OF REDUCED FEATURE SUBSETS BY FEATURE SUBSET SELECTION EVALUATORS

S.No.  Dataset         Cfs  Consistency  Filtered
1      Contact Lenses  —    —            —
2      Diabetes        —    —            —
3      Glass           8    7            5
4      Ionosphere      14   7            14
5      Iris            2    2            2
6      Labor           7    4            4
7      Soybean         22   13           7
8      Super Market    —    32           15
9      Vote            4    10           1
10     Weather         2    2            1

Fig. 1: Performance comparison in dimensionality reduction of the attribute subset evaluators.

In the dimensionality analysis, it is observed that the subset evaluators Filter, Consistency and Cfs each reduce the dimensionality considerably, as shown in Figure 1. The performance of Filter is superior to that of Consistency and Cfs in terms of dimensionality reduction, regardless of the classifier, while the performance of Consistency is better than Cfs but poorer than Filter. In the accuracy analysis, it is observed that the performance of Cfs is superior for the instance-based IB1 and the probability-based Naive Bayes (NB), and inferior for the tree-based C4.5 (J48). The Filtered subset evaluator achieves better performance than Consistency and lower performance than Cfs for NB and IB1, whereas Consistency attains better performance than Filtered and Cfs for C4.5 (J48).

Fig. 2: Comparison of the accuracy of the classifiers with the feature subset evaluators.

V. CONCLUSION

This paper presents an experimental analysis of dimensionality reduction and of the improvement in supervised learning accuracy based on the Cfs, Consistency and Filter attribute subset evaluators. From the experimental study, it is observed that Filter outperforms Cfs and Consistency in dimensionality reduction, while Cfs gives higher accuracy than Filter and Consistency even though it is inferior in dimensionality reduction. As a future enhancement of this study, attribute rankers may be included to improve the classification accuracy.


TABLE III. SUMMARY OF CLASSIFIERS' ACCURACY WITH RESPECT TO THE ATTRIBUTE SUBSET EVALUATORS

                         Accuracy of NB on the         Accuracy of C4.5 (J48) on the   Accuracy of IB1 on the
                         reduced feature subsets       reduced feature subsets         reduced feature subsets
S.No.  Dataset           Cfs     Consistency  Filter   Cfs     Consistency  Filter     Cfs     Consistency  Filter
1      Contact Lenses    70.83   83.33        70.83    66.66   62.50        66.66      70.83   70.83        70.83
2      Diabetes          74.86   74.86        74.60    68.35   68.35        70.18      77.47   77.47        76.43
3      Glass             68.69   70.09        65.88    71.02   70.09        71.02      47.66   44.39        44.39
4      Ionosphere        92.02   87.17        92.02    90.59   87.46        90.59      88.88   87.74        88.88
5      Iris              96.00   96.00        96.00    96.00   96.00        96.00      96.66   96.66        96.66
6      Labor             91.22   87.71        87.71    77.19   82.45        80.70      84.21   87.71        87.71
7      Soybean           87.11   81.69        83.30    85.65   83.74        82.86      83.89   76.57        79.94
8      Super Market      63.71   63.71        63.71    63.71   63.71        63.71      57.20   43.00        53.07
9      Vote              96.09   92.41        95.63    96.09   96.32        95.63      94.02   93.33        91.72
10     Weather           57.14   57.14        50.00    42.85   42.85        50.00      78.57   78.57        64.28
       Average           77.92   75.85        76.00    76.649  78.08        77.08      78.95   76.45        77.018

REFERENCES

[1] C.-I Chang and S. Wang, "Constrained band selection for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1575-1585, Jun. 2006.
[2] A. Plaza, P. Martinez, J. Plaza, and R. Perez, "Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466-479, Mar. 2005.
[3] L. Zhang, Y. Zhong, B. Huang, J. Gong, and P. Li, "Dimensionality reduction based on clonal selection for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, Dec. 2007.
[4] J. Wang and C.-I Chang, "Independent component analysis-based dimensionality reduction with applications in hyperspectral image analysis," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1586-1600, Jun. 2006.
[5] A. Jain and D. Zongker, "Feature selection: Evaluation, application, and small sample performance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 2, pp. 153-158, Feb. 1997.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Elsevier, 2006.
[7] X. Wang and K. K. Paliwal, "Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition," Pattern Recognition, vol. 36, pp. 2429-2439, 2003.
[8] Q. Song, J. Ni, and G. Wang, "A fast clustering-based feature subset selection algorithm for high dimensional data," IEEE Trans. Knowledge and Data Engineering, 2011.
[9] G. H. John, R. Kohavi, and K. Pfleger, "Irrelevant features and the subset selection problem," in Proc. 11th International Conference on Machine Learning, 1994, pp. 121-129.
[10] I. Oh, J. Lee, and B. Moon, "Hybrid genetic algorithms for feature selection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1424-1437, 2004.
[11] Md. M. Kabir, Md. Shahjahan, and K. Murase, "A new hybrid ant colony optimization algorithm for feature selection," Expert Systems with Applications, vol. 39, pp. 3747-3763, 2012.
[12] A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000.
[13] A. R. Webb, Statistical Pattern Recognition, 2nd ed., Wiley, 2002.
[14] I. T. Jolliffe, Principal Component Analysis, 2nd ed., Springer, 2002.
[15] G. P. McCabe, "Principal variables," Technometrics, vol. 26, pp. 137-144, May 1984.
[16] W. J. Krzanowski, "Selection of variables to preserve multivariate data structure using principal components," Applied Statistics, vol. 36, no. 1, pp. 22-33, 1987.
[17] P. Mitra, C. A. Murthy, and S. K. Pal, "Unsupervised feature selection using feature similarity," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 301-312, Mar. 2002.
[18] B. Krishnapuram, A. J. Hartemink, L. Carin, and M. A. T. Figueiredo, "A Bayesian approach to joint feature selection and classifier design," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1105-1111, Sept. 2004.
[19] M. H. C. Law, M. A. T. Figueiredo, and A. K. Jain, "Simultaneous feature selection and clustering using mixture models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1154-1166, Sept. 2004.
[20] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, Dec. 1997.
[21] A. J. Miller, Subset Selection in Regression, Chapman and Hall, 1990.
[22] P. Pudil, J. Novovicova, and J. Kittler, "Floating search methods in feature selection," Pattern Recognition Letters, vol. 15, no. 11, pp. 1119-1125, Nov. 1994.
[23] S. K. Pal, R. K. De, and J. Basak, "Unsupervised feature evaluation: A neuro-fuzzy approach," IEEE Trans. Neural Networks, vol. 11, no. 2, pp. 366-376, Mar. 2000.
[24] K. Z. Mao, "Identifying critical variables of principal components for unsupervised feature selection," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 35, pp. 339-344, 2005.
[25] I. T. Jolliffe, "Discarding variables in a principal component analysis - I: Artificial data," Applied Statistics, vol. 21, no. 2, pp. 160-173, 1972.
[26] H.-L. Wei and S. A. Billings, "Feature subset selection and ranking for data dimensionality reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 1, Jan. 2007.
[27] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[28] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, pp. 131-156, 1997.
[29] J. Huang, Y. Cai, and X. Su, "A hybrid genetic algorithm for feature selection wrapper based on mutual information," Pattern Recognition Letters, vol. 28, pp. 1825-1844, 2007.
[30] S. Guan, J. Liu, and Y. Qi, "An incremental approach to contribution-based feature selection," Journal of Intelligent Systems, vol. 13, no. 1, 2004.
[31] H. Peng, F. Long, and C. Ding, "Overfitting in making comparisons between variable selection methods," Journal of Machine Learning Research, vol. 3, pp. 1371-1382, 2003.
[32] E. Gasca, J. S. Sanchez, and R. Alonso, "Eliminating redundancy and irrelevance using a new MLP-based feature selection method," Pattern Recognition, vol. 39, pp. 313-315, 2006.
[33] C. Hsu, H. Huang, and D. Schuschel, "The ANNIGMA-wrapper approach to fast feature selection for neural nets," IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 32, no. 2, pp. 207-212, 2002.
[34] R. Caruana and D. Freitag, "Greedy attribute selection," in Proc. 11th International Conference on Machine Learning, Morgan Kaufmann, 1994.
[35] C. Lai, M. J. T. Reinders, and L. Wessels, "Random subspace method for multivariate feature selection," Pattern Recognition Letters, vol. 27, pp. 1067-1076, 2006.
[36] D. J. Stracuzzi and P. E. Utgoff, "Randomized variable elimination," Journal of Machine Learning Research, vol. 5, pp. 1331-1335, 2004.
[37] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 4, pp. 491-502, 2005.
[38] M. A. Hall, "Correlation-based feature selection for machine learning," Ph.D. thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand, 1998.
[39] M. A. Hall, "Correlation-based feature selection for discrete and numeric class machine learning," in Proc. 17th International Conference on Machine Learning (ICML 2000), 2000.
[40] H. Almuallim and T. G. Dietterich, "Learning with many irrelevant features," in Proc. Ninth National Conference on Artificial Intelligence, AAAI Press, 1991, pp. 547-552.
[41] H. Liu and R. Setiono, "A probabilistic approach to feature selection: A filter solution," in Proc. 13th International Conference on Machine Learning, Morgan Kaufmann, 1996, pp. 319-327.
[42] U. M. Fayyad and K. B. Irani, "Multi-interval discretisation of continuous-valued attributes," in Proc. 13th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1993, pp. 1022-1027.
[43] M. A. Hall and G. Holmes, "Benchmarking attribute selection techniques for discrete class data mining," IEEE Trans. Knowledge and Data Engineering, vol. 15, no. 3, May/June 2003.
[44] R. R. Bouckaert, E. Frank, M. Hall, R. Kirkby, P. Reutemann, A. Seewald, and D. Scuse, "WEKA Manual for Version 3-6-6," University of Waikato, Hamilton, New Zealand, October 2011.
[45] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proc. Tenth National Conference on Artificial Intelligence, San Jose, CA, AAAI Press, 1992, pp. 223-228.
[46] M. Kuramochi and G. Karypis, "Gene classification using expression profiles: a feasibility study," International Journal on Artificial Intelligence Tools, vol. 14, no. 4, pp. 641-660, 2005.
[47] P. Domingos and M. Pazzani, "Feature selection and transduction for prediction of molecular bioactivity for drug design," Machine Learning, vol. 29, pp. 103-130, 1997.
[48] E. P. Xing, M. I. Jordan, and R. M. Karp, "Feature selection for high-dimensional genomic microarray data," in Proc. 18th International Conference on Machine Learning, 2001, pp. 601-608.
[49] J. Novakovic, P. Strbac, and D. Bulatovic, "Toward optimal feature selection using ranking methods and classification algorithms," Yugoslav Journal of Operations Research, vol. 21, no. 1, pp. 119-135, 2011.

