● Input features: class column (fully surrendered, not surrendered) and other columns
● Filter model of feature selection
● F-correlation measure calculation (to identify relevant and irrelevant features)
● Graph based clustering method
● Adjacency matrix (Ad_matrix) created
● Prim’s algorithm used to generate the minimum spanning tree
● Minimum-weight edges selected
● Each edge connects 2 attributes
● Compare the edge weight with the weights of its 2 attributes
● If the edge weight is not less than both of these weights then
● Add the edge to the selected edges set
● End if
● For each edge in the selected edges set
● If either feature is already present in an existing cluster then
● Add the other feature of the edge to that existing group of attributes
● Else
● Add both features of the edge to a new group
● End if
● End for
● For each group
● Add the first attribute of the group to the selected attribute set
● End for
● Display the final selected attribute set
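The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the adjacency matrix holds pairwise F-correlation weights, the "weight" of each attribute is assumed to be its relevance to the class column (as in FAST-style methods), "minimum edges" is read as the edges of the minimum spanning tree, and the case where both endpoints are already grouped (not covered by the steps) is assumed to leave the groups unchanged. The names `prim_mst`, `filter_edges`, `group_features`, `select_features` and the example numbers are hypothetical.

```python
import math


def prim_mst(ad_matrix):
    """Prim's algorithm on a dense, symmetric adjacency matrix.

    Returns the minimum spanning tree as (u, v, weight) tuples,
    in the order the vertices are added to the tree.
    """
    n = len(ad_matrix)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest edge weight linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0           # grow the tree from feature 0
    mst = []
    for _ in range(n):
        # pick the cheapest vertex not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            mst.append((parent[u], u, ad_matrix[parent[u]][u]))
        # relax the edges from u to vertices still outside the tree
        for v in range(n):
            if not in_tree[v] and ad_matrix[u][v] < best[v]:
                best[v] = ad_matrix[u][v]
                parent[v] = u
    return mst


def filter_edges(mst_edges, attr_weight):
    """Keep an edge only if its weight is not less than the weight of either attribute."""
    return [(u, v, w) for u, v, w in mst_edges
            if w >= attr_weight[u] and w >= attr_weight[v]]


def group_features(selected_edges):
    """Merge edge endpoints into groups, visiting edges in order."""
    groups = []
    for u, v, _ in selected_edges:
        gu = next((g for g in groups if u in g), None)
        gv = next((g for g in groups if v in g), None)
        if gu is not None and gv is None:
            gu.append(v)            # u already grouped: add v to u's group
        elif gv is not None and gu is None:
            gv.append(u)            # v already grouped: add u to v's group
        elif gu is None and gv is None:
            groups.append([u, v])   # neither grouped: start a new group
        # assumption: if both endpoints are already grouped, nothing changes
    return groups


def select_features(groups):
    """The first attribute of each group represents that group."""
    return [g[0] for g in groups]


# Hypothetical example: 4 features with made-up weights.
ad_matrix = [
    [0.0, 0.3, 0.8, 0.9],
    [0.3, 0.0, 0.4, 0.85],
    [0.8, 0.4, 0.0, 0.5],
    [0.9, 0.85, 0.5, 0.0],
]
attr_weight = [0.2, 0.25, 0.45, 0.35]

mst = prim_mst(ad_matrix)
kept = filter_edges(mst, attr_weight)
groups = group_features(kept)
print(select_features(groups))  # [0, 2]
```

In this example the tree edge (1, 2) is dropped because its weight 0.4 is less than the weight 0.45 of attribute 2, which splits the features into the groups [0, 1] and [2, 3].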
F-correlation measure
● Symmetric uncertainty is defined as follows:
$$SU(X, Y) = \frac{2 \times Gain(X \mid Y)}{H(X) + H(Y)} \qquad (1)$$
Where,
1) H(X) is the entropy of a discrete random variable X. Suppose p(x) is the prior probability of each value x of X; then H(X) is defined by
$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x) \qquad (2)$$
2) Gain(X | Y) is the amount by which the entropy of Y decreases. It reflects the additional information about Y provided by X and is called the information gain, which is given by
$$Gain(X \mid Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \qquad (3)$$
Here H(X | Y) is the conditional entropy of X given Y:
$$H(X \mid Y) = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x \mid y) \log_2 p(x \mid y) \qquad (4)$$
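Equations (1) to (4) can be checked numerically with a short Python sketch. It assumes the probabilities are estimated from value frequencies in two equal-length discrete columns; the function names and the example columns are hypothetical, and returning 0 when both entropies are zero is an assumption, since equation (1) is undefined in that case.

```python
import math
from collections import Counter


def entropy(xs):
    """H(X) = -sum p(x) log2 p(x), with p(x) estimated from frequencies (eq. 2)."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = -sum_y p(y) sum_x p(x|y) log2 p(x|y) (eq. 4)."""
    n = len(ys)
    total = 0.0
    for y, cy in Counter(ys).items():
        xs_given_y = [x for x, yy in zip(xs, ys) if yy == y]
        total += (cy / n) * entropy(xs_given_y)   # p(y) * H(X | Y = y)
    return total


def information_gain(xs, ys):
    """Gain(X|Y) = H(X) - H(X|Y) (eq. 3)."""
    return entropy(xs) - conditional_entropy(xs, ys)


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * Gain(X|Y) / (H(X) + H(Y)) (eq. 1)."""
    denom = entropy(xs) + entropy(ys)
    if denom == 0:
        return 0.0   # assumption: two constant columns share no information
    return 2.0 * information_gain(xs, ys) / denom


# Hypothetical columns: a class column and two discrete features.
cls = ["fully", "fully", "not", "not"]
feat_a = ["a", "a", "b", "b"]   # determines the class completely
feat_b = ["a", "b", "a", "b"]   # independent of the class
print(symmetric_uncertainty(feat_a, cls))  # 1.0
print(symmetric_uncertainty(feat_b, cls))  # 0.0
```

SU is normalised to the range [0, 1]: 1 means one variable fully predicts the other and 0 means they are independent, which is why it can be compared across features with different numbers of values.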
Proposed
● Input features: class column (fully surrendered, not surrendered) and other columns
● Filter model of feature selection
● Entropy correlation coefficient calculation with information gain calculation (to identify relevant and irrelevant features)
● Graph based clustering method
● Kruskal’s algorithm used to generate the minimum spanning tree
● Minimum-weight edges selected
● Each edge connects 2 attributes
● Compare the edge weight with the weights of its 2 attributes
● If the edge weight is not less than both of these weights then
● Add the edge to the selected edges set
● End if
● For each edge in the selected edges set
● If either feature is already present in an existing cluster then
● Add the other feature of the edge to that existing group of attributes
● Else
● Add both features of the edge to a new group
● End if
● End for
● For each group
● Add the first attribute of the group to the selected attribute set
● End for
● Display the final selected attribute set
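The step that changes in the proposed system is the spanning-tree construction. A minimal Python sketch of Kruskal's algorithm with a union-find structure, followed by the same edge-filtering rule, is shown below. It assumes the pairwise edge weights (here, entropy correlation coefficients) and the per-attribute weights have already been computed into a symmetric matrix and a list; the function names and example numbers are hypothetical.

```python
def complete_graph_edges(ad_matrix):
    """List every feature pair (u, v, weight) once from a symmetric adjacency matrix."""
    n = len(ad_matrix)
    return [(u, v, ad_matrix[u][v]) for u in range(n) for v in range(u + 1, n)]


def kruskal_mst(n, edges):
    """Kruskal's algorithm with union-find; returns MST edges in weight order."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                # edge joins two different components: no cycle
            parent[ru] = rv
            mst.append((u, v, w))
            if len(mst) == n - 1:   # a spanning tree on n vertices has n - 1 edges
                break
    return mst


def filter_edges(mst_edges, attr_weight):
    """Keep an edge only if its weight is not less than the weight of either attribute."""
    return [(u, v, w) for u, v, w in mst_edges
            if w >= attr_weight[u] and w >= attr_weight[v]]


# Hypothetical example: 4 features with made-up weights.
ad_matrix = [
    [0.0, 0.3, 0.8, 0.9],
    [0.3, 0.0, 0.4, 0.85],
    [0.8, 0.4, 0.0, 0.5],
    [0.9, 0.85, 0.5, 0.0],
]
attr_weight = [0.2, 0.25, 0.45, 0.35]

mst = kruskal_mst(len(ad_matrix), complete_graph_edges(ad_matrix))
kept = filter_edges(mst, attr_weight)
print(kept)  # [(0, 1, 0.3), (2, 3, 0.5)]
```

Prim's and Kruskal's algorithms produce a spanning tree of the same minimum total weight, but Kruskal's visits edges in weight order rather than tree-growth order. Since the grouping step processes edges one at a time, the order in which edges are visited may affect the resulting groups.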
Entropy correlation coefficient calculation (Proposed system)
The following formula is used:
where I is the mean dependence information, given by: