x̄ = (Σᵢ xᵢ) / n,   for i = 1, 2, ..., n

Cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1),   for i = 1, 2, ..., n

Covariance matrix =
[ Cov(x, x)   Cov(x, y) ]
[ Cov(y, x)   Cov(y, y) ]

for x and y as attributes of the dataset.
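As a brief illustration, the sample covariance and the 2 x 2 covariance matrix above can be computed directly with NumPy. This is only a sketch; the two hypothetical attribute columns x and y are made-up values, not data from the paper.

import numpy as np

# Two hypothetical attribute columns (x and y) of a small dataset.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()                  # sample means

# Sample covariance with the (n - 1) denominator, as in the formula above.
cov_xy = np.sum((x - x_bar) * (y - y_bar)) / (n - 1)

# 2 x 2 covariance matrix [[Cov(x,x), Cov(x,y)], [Cov(y,x), Cov(y,y)]];
# np.cov uses the (n - 1) denominator by default.
cov_matrix = np.cov(x, y)

print(cov_xy)
print(cov_matrix)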
ALGORITHM: SWIFT
Inputs: D(F1, F2, ..., Fm, C) - the given data set
        θ - the T-Relevance threshold.
Output: S - selected feature subset.

//==== Part 1: Irrelevant Feature Removal ====
1  for i = 1 to m do
2      T-Relevance = SU(Fi, C)
3      if T-Relevance > θ then
4          S = S ∪ {Fi};

//==== Part 2: Minimum Spanning Tree Construction ====
5  G = NULL;  // G is a complete graph
6  for each pair of features {Fi, Fj} ⊆ S do
7      F-Correlation = SU(Fi, Fj)
8      Add Fi and/or Fj to G with F-Correlation as the weight of the corresponding edge;
9  minSpanTree = Prim(G);  // using Prim's algorithm to generate the minimum spanning tree

//==== Part 3: Tree Partition and Representative Feature Selection ====
10 Forest = minSpanTree
11 for each edge Eij ∈ Forest do
12     if SU(Fi, Fj) < SU(Fi, C) ∧ SU(Fi, Fj) < SU(Fj, C) then
13         Forest = Forest − {Eij}
14 S = ∅
15 for each tree Ti ∈ Forest do
16     FjR = argmax_{Fk ∈ Ti} SU(Fk, C)
17     S = S ∪ {FjR};
18 return S
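As a rough illustration only, the pseudocode above maps onto the following Python sketch. It assumes the features have already been discretized, and the helper functions it calls (strong_t_relevance, f_correlation_graph, prim_mst, select_representatives) are illustrative names sketched after the corresponding stage descriptions below; they are not part of the original algorithm listing.

def swift(features, labels, theta):
    """features: dict {name: 1-D array of discrete values}; labels: class column C;
    theta: the T-Relevance threshold."""
    # Part 1: irrelevant feature removal (keep features with SU(Fi, C) > theta).
    selected = strong_t_relevance(features, labels, theta)
    # Part 2: complete graph weighted by F-Correlation, then its minimum spanning tree.
    weights = f_correlation_graph(features, selected)
    mst_edges = prim_mst(weights)
    # Part 3: tree partition and representative feature selection.
    return select_representatives(features, labels, selected, weights, mst_edges)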
STAGE 1: IRRELEVANT FEATURE REMOVAL
Symmetric uncertainty (SU) is the measure of uncertainty used here. It is defined as follows:
SU(X, Y) = 2 × [Gain(X|Y) / (E(X) + E(Y))]
where E(X) is the entropy of a discrete random variable X. Suppose p(x) gives the prior probabilities for all values of X; then E(X) is defined by
E(X) = −Σ_{x∈X} p(x) log₂ p(x)
Gain(X|Y) is the amount by which the entropy of Y decreases. It reflects the additional information about Y provided by X and is called the information gain, given by
Gain(X|Y) = E(X) − E(X|Y)
          = E(Y) − E(Y|X)
where E(X|Y) is the conditional entropy, which quantifies the remaining entropy (i.e. uncertainty) of a random variable X given that the value of another random variable Y is known. Suppose p(x) gives the prior probabilities for all values of X and p(x|y) the posterior probabilities of X given the values of Y; then E(X|Y) is defined by
E(X|Y) = −Σ_{y∈Y} p(y) Σ_{x∈X} p(x|y) log₂ p(x|y)
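As an illustrative sketch (not part of the original text), the entropy, conditional entropy and information gain of already-discretized value arrays can be computed as follows; the function names are ours.

import numpy as np

def entropy(x):
    # E(X) = -sum over x of p(x) * log2 p(x), estimated from value frequencies.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(x, y):
    # E(X|Y) = -sum over y of p(y) * sum over x of p(x|y) * log2 p(x|y).
    x, y = np.asarray(x), np.asarray(y)
    y_vals, y_counts = np.unique(y, return_counts=True)
    p_y = y_counts / y_counts.sum()
    return float(sum(p * entropy(x[y == v]) for v, p in zip(y_vals, p_y)))

def information_gain(x, y):
    # Gain(X|Y) = E(X) - E(X|Y); the same value as E(Y) - E(Y|X).
    return entropy(x) - conditional_entropy(x, y)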
Information gain is a symmetric measure: the amount of information gained about X after observing Y equals the amount of information gained about Y after observing X. This ensures that the order of the two variables (e.g., (X, Y) or (Y, X)) does not affect the value of the measure. Symmetric uncertainty treats a pair of variables symmetrically; it compensates for information gain's bias toward variables with more values and normalizes its value to the range [0, 1]. A value of 1 for SU(X, Y) indicates that knowledge of the value of either variable completely predicts the value of the other, while a value of 0 indicates that X and Y are independent. Although entropy-based measures handle nominal or discrete variables, they can deal with continuous features as well, provided the values are discretized properly in advance [14]. Given SU(X, Y), the symmetric uncertainty of variables X and Y, the relevance (T-Relevance) between a feature and the target concept C is defined as follows.
Definition 1 (T-Relevance): The relevance between a feature Fi ∈ F and the target concept C is referred to as the T-Relevance of Fi and C, and is denoted by SU(Fi, C). If SU(Fi, C) is greater than a predetermined threshold θ, we say that Fi is a strong T-Relevance feature.
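Building on the entropy helpers sketched earlier, symmetric uncertainty and the T-Relevance filter of Part 1 could look as follows; the threshold value in the example call is made up.

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * Gain(X|Y) / (E(X) + E(Y)), normalized to [0, 1].
    denom = entropy(x) + entropy(y)
    return 2.0 * information_gain(x, y) / denom if denom > 0 else 0.0

def strong_t_relevance(features, labels, theta):
    # Keep only the strong T-Relevance features: SU(Fi, C) > theta.
    return [name for name, col in features.items()
            if symmetric_uncertainty(col, labels) > theta]

# Example: selected = strong_t_relevance(features, labels, theta=0.1)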
STAGE 2: MINIMUM SPANNING TREE
CONSTRUCTION
Given SU(X, Y), the symmetric uncertainty of variables X and Y, the correlation (F-Correlation) between a pair of features can be defined as follows.
Definition 2 (F-Correlation): The correlation between any pair of features Fi and Fj (Fi, Fj ∈ F, i ≠ j) is called the F-Correlation of Fi and Fj, and is denoted by SU(Fi, Fj).
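For illustration, the F-Correlation values over the retained features give the edge weights of the complete graph G built in Part 2. This is a minimal sketch, assuming the symmetric_uncertainty helper above.

def f_correlation_graph(features, selected):
    # Complete graph over the selected features: weights[i][j] = SU(Fi, Fj).
    n = len(selected)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = symmetric_uncertainty(features[selected[i]], features[selected[j]])
            weights[i][j] = weights[j][i] = w
    return weights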
PRIM'S ALGORITHM:
A minimum spanning tree of an undirected connected weighted graph is a spanning tree of minimum weight among all spanning trees.
Growing an MST:
Start by picking any vertex to be the root of the tree.
While the tree does not contain all vertices in the graph, find the shortest edge leaving the tree and add it to the tree.
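A minimal Prim's-algorithm sketch over the dense weight matrix from the previous step; it follows the grow-a-tree description above, repeatedly adding the shortest edge that leaves the current tree. The function name and representation are ours.

def prim_mst(weights):
    # Prim's algorithm on a dense weight matrix; returns the MST edges as (i, j) pairs.
    n = len(weights)
    in_tree = [False] * n
    in_tree[0] = True                      # pick vertex 0 as the root of the tree
    edges = []
    while len(edges) < n - 1:              # until the tree contains all vertices
        best = None
        for i in range(n):
            if not in_tree[i]:
                continue
            for j in range(n):
                if in_tree[j]:
                    continue
                if best is None or weights[i][j] < weights[best[0]][best[1]]:
                    best = (i, j)          # shortest edge leaving the tree so far
        edges.append(best)
        in_tree[best[1]] = True
    return edges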
STAGE 3: SELECTING REPRESENTATIVE FEATURES
The feature redundancy F-Redundancy and the
representative feature R-Feature of a feature cluster can be
defined as follows.
Definition 3 (F-Redundancy): Let S = {F1, F2, ..., Fi, ..., Fk} (k < |F|) be a cluster of features. If there exists Fj ∈ S such that SU(Fj, C) ≥ SU(Fi, C) ∧ SU(Fi, Fj) > SU(Fi, C) always holds for each Fi ∈ S (i ≠ j), then the features Fi are redundant with respect to the given Fj (i.e. each Fi is an F-Redundancy).
Definition 4 (R-Feature): A feature Fi ∈ S = {F1, F2, ..., Fk} (k < |F|) is a representative feature of the cluster S (i.e. Fi is an R-Feature) if and only if Fi = argmax_{Fj ∈ S} SU(Fj, C). This means the feature with the strongest T-Relevance can act as the R-Feature for all the features in the cluster.
According to the above definitions, feature subset selection can be seen as the process that identifies and retains the strong T-Relevance features and selects R-Features from feature clusters.
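Putting Definitions 3 and 4 together, Part 3 of the algorithm can be sketched as follows: remove every MST edge whose F-Correlation is below both endpoint T-Relevances, split the resulting forest into trees, and keep the R-Feature of each tree. The helpers from the earlier sketches are assumed; this is an illustration, not the original implementation.

def select_representatives(features, labels, selected, weights, mst_edges):
    # T-Relevance SU(Fi, C) of every retained feature, indexed by position.
    relevance = {i: symmetric_uncertainty(features[name], labels)
                 for i, name in enumerate(selected)}

    # Drop an edge when its F-Correlation is below both endpoint T-Relevances.
    kept = [(i, j) for (i, j) in mst_edges
            if not (weights[i][j] < relevance[i] and weights[i][j] < relevance[j])]

    # Partition the remaining forest into trees with a small union-find.
    parent = list(range(len(selected)))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j in kept:
        parent[find(i)] = find(j)

    # From each tree, keep the feature with the strongest T-Relevance (the R-Feature).
    trees = {}
    for i in range(len(selected)):
        trees.setdefault(find(i), []).append(i)
    return [selected[max(tree, key=relevance.get)] for tree in trees.values()]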
FLOW DIAGRAM
IV. DATASET DESCRIPTION
We have used two datasets from the UCI repository. The first is the LUNG CANCER dataset, with 32 instances and 57 attributes (1 class attribute, 56 predictive). All predictive attributes are nominal, taking integer values 0-3, and the class distribution is 9 observations of class 1, 13 of class 2 and 10 of class 3. The second is the LIBRAS Movement database, with 360 instances (24 in each of fifteen classes), 90 numeric (double) attributes plus 1 class attribute (integer), and a class distribution of 6.66% for each of the 15 classes.
V. RESULTS AND DISCUSSION
The proposed SWIFT clustering algorithm and PCA were implemented in the NetBeans IDE. The two datasets described above were used for evaluation, and the outputs are tabulated in Table 1.
Table 1: Selected attributes

DATASET         LUNG CANCER      LIBRAS MOVEMENT
ATTRIBUTES      56               90
INSTANCES       32               360
SWIFT OUTPUT    7 attributes     2 attributes
PCA OUTPUT      21 attributes    9 attributes
VI. CONCLUSION
Feature selection is an efficient way to improve classifier accuracy, reduce dimensionality, and remove both irrelevant and redundant data. As shown in Table 1, the SWIFT algorithm selects fewer, more relevant features than PCA, which adds to classifier accuracy. For future work, we plan to explore different types of correlation measures and study some formal properties of feature space.
REFERENCES
[1] UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/.
[2] A fast clustering-based feature subset selection algorithm for high-dimensional data, IEEE Transactions on Knowledge and Data Engineering, 25(1), 2013.
[3] Arauzo-Azofra A., Benitez J.M. and Castro J.L., A feature set measure based on relief, In Proceedings of the Fifth International Conference on Recent Advances in Soft Computing, pp 104-109, 2004.
[4] Baker L.D. and McCallum A.K., Distributional clustering of words for text classification, In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 96-103, 1998.
[5] Bell D.A. and Wang H., A formalism for relevance and its application in feature subset selection, Machine Learning, 41(2), pp 175-195, 2000.
[6] Butterworth R., Piatetsky-Shapiro G. and Simovici D.A., On Feature Selection through Clustering, In Proceedings of the Fifth IEEE International Conference on Data Mining, pp 581-584, 2005.
[7] Chikhi S. and Benhammada S., ReliefMSS: a variation on a feature ranking ReliefF algorithm, Int. J. Bus. Intell. Data Min., 4(3/4), pp 375-390, 2009.
[8] Dash M. and Liu H., Feature Selection for Classification, Intelligent Data Analysis, 1(3), pp 131-156, 1997.
[9] Dash M., Liu H. and Motoda H., Consistency based feature selection, In Proceedings of the Fourth Pacific Asia Conference on Knowledge Discovery and Data Mining, pp 98-109, 2000.
[10] Das S., Filters, wrappers and a boosting-based hybrid for feature selection, In Proceedings of the Eighteenth International Conference on Machine Learning, pp 74-81, 2001.
[11] Dash M. and Liu H., Consistency-based search in feature selection, Artificial Intelligence, 151(1-2), pp 155-176, 2003.
[12] Demsar J., Statistical comparison of classifiers over multiple data sets, J. Mach. Learn. Res., 7, pp 1-30, 2006.
[13] Dhillon I.S., Mallela S. and Kumar R., A divisive information theoretic feature clustering algorithm for text classification, J. Mach. Learn. Res., 3, pp 1265-1287, 2003.
[14] Fayyad U. and Irani K., Multi-interval discretization of continuous-valued attributes for classification learning, In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp 1022-1027, 1993.
[15] Fleuret F., Fast binary feature selection with conditional mutual information, Journal of Machine Learning Research, 5, pp 1531-1555, 2004.
[16] Forman G., An extensive empirical study of feature selection metrics for text classification, Journal of Machine Learning Research, 3, pp 1289-1305, 2003.
[17] Garcia S. and Herrera F., An extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all pairwise comparisons, J. Mach. Learn. Res., 9, pp 2677-2694, 2008.
[18] Golub T.R., Slonim D.K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J.P., Coller H., Loh M.L., Downing J.R. and Caligiuri M.A., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286(5439), pp 531-537, 1999.
[19] Guyon I. and Elisseeff A., An introduction to variable and feature selection, Journal of Machine Learning Research, 3, pp 1157-1182, 2003.
[20] John G.H., Kohavi R. and Pfleger K., Irrelevant Features and the Subset Selection Problem, In Proceedings of the Eleventh International Conference on Machine Learning, pp 121-129.
[21] Kohavi R. and John G.H., Wrappers for feature subset selection, Artif. Intell., 97(1-2), pp 273-324, 1997.
[22] Yu L. and Liu H., Feature selection for high-dimensional data: a fast correlation-based filter solution, In Proceedings of the 20th International Conference on Machine Learning, 20(2), pp 856-863, 2003.