Abstract:
Data classification is one of the important research areas in the field of data mining. Machine learning
algorithms such as Naive Bayes, neural networks, and support vector machines are most regularly used for performing
the classification task. Supervised learning is one such kind, where the datasets contain class labels and the
machine learning classifiers are first trained on them. It is to be noted that feature selection plays a vital role in
improving the classification accuracy of supervised machine learning classifiers. This research work aims at
proposing an improved genetic algorithm based feature selection strategy combined with a five-layered artificial neural
network classifier. Around 20 datasets are collected from the UCI repository. Implementations are carried out using
the MATLAB tool. Performance metrics such as prediction efficiency and time taken for prediction are taken into
account to conduct the performance evaluation of the proposed classifier. Simulation results portray that the
proposed IGA-FLANN classifier outperforms the existing classifiers.
Keywords — Machine learning, boosting, neural network, genetic algorithm, feature selection, data mining,
MATLAB.
method that creates simpler models that are easier to analyze.

Online decision trees from data streams are usually unable to handle concept drift. Blanco et al., 2016 proposed the Incremental Algorithm Driven by Error Margins (IADEM-3), which mainly carries out two actions in response to a concept drift. At first, IADEM-3 resets the variables affected by the change and keeps the structure of the tree intact, which allows for changes in which the ensuing target functions are very similar. After that, IADEM-3 creates alternative models that replace parts of the main tree when they decidedly improve the accuracy of the model, thereby rebuilding the main tree if needed. An online change detector and a non-parametric statistical test based on Hoeffding's bounds are used to assess that significance. A new pruning method is also incorporated in IADEM-3, making sure that all split tests previously installed in decision nodes are useful. Their learning model can also be viewed as an ensemble of classifiers, and the predictions of the main and alternative models are combined to classify unlabeled examples. IADEM-3 was empirically compared with various well-known decision tree induction algorithms for concept drift detection. The authors show that their new algorithm often reaches higher levels of accuracy with smaller decision tree models while keeping the processing time bounded, irrespective of the number of instances processed.

Predicting learning styles in conversational intelligent tutoring systems using fuzzy decision trees has been proposed by Crockett et al., 2017. Prediction of learning style is carried out by capturing independent behaviour variables during the tutoring conversation with the highest value variable. A weakness of their approach is that it does not take into consideration the interaction between behaviour variables and, due to the uncertainty inherently present in modelling learning styles, small differences in behaviour can lead to incorrect predictions. Subsequently, the learner is presented with tutoring material not suited to their learning style. Because of the above-mentioned challenges, a new method was proposed that uses fuzzy decision trees to build a series of fuzzy predictive models connecting these variables for all dimensions of the Felder-Silverman Learning Styles model. Results using live data reported by the authors showed that the fuzzy models elevated the predictive certainty across four learning style dimensions and promoted the discovery of some interesting relationships amongst behaviour variables.

2.2. Recent Works on Support Vector Machine (SVM)

Motivated by the KNN trick presented in the weighted twin support vector machine with local information (WLTSVM), Pan et al., 2015 proposed a novel K-nearest neighbour based structural twin support vector machine (KNN-STSVM). By applying the intra-class KNN method, different weights are assigned to the samples in one class to build up the structural information. For the other class, the superfluous constraints are deleted by the inter-class KNN method to speed up the training process. For large scale problems, a fast clipping algorithm is further introduced to accelerate training. Comprehensive experimental results on twenty-two datasets demonstrate the effectiveness of KNN-STSVM.
both resulting from boosting converging on incorrect training data: the first is filtering of subsequent functions when the training data contains dangerous areas and/or label noise, and the second is overfitting in subsequent functions that are forced to learn on all the incorrect instances. The authors demonstrated the capability of CBB through extensive empirical results on 20 UCI benchmark datasets and reported that CBB achieves predictive accuracy superior to selective boosting without clusters.

III. PROPOSED WORK

The contributions of the proposed work are three-fold. First, an improved genetic algorithm based feature selection strategy is portrayed. Once the features are selected, in the next stage, decision consolidation is performed. At the third stage, a five-layered artificial neural network is proposed.

3.1. Improved Genetic Algorithm based Feature Selection Strategy

Let $P = \{P_1, P_2, \dots, P_n\}$ be the feature set, where $n$ is the number of features, and let $U = \{O_1, O_2, \dots, O_m\} \in R^{m \times n}$ be a given dataset with $m$ objects, where $m \le n$. The feature selection method can be viewed as a mapping $g(P, U, D) \rightarrow \{P'\}$, where $g(\cdot)$ is the feature selection method, $D$ is the decision attribute representing the class labels, and $P' \subset P$ with $|P'| = k$, the number of selected features, $k \ll n$. The goal of the proposed method is to find a $P'$ that is highly relevant to $D$ while its features are less related to each other.

To select the most feasible and compact feature subset, the improved GA with a mutation pool is proposed in this research work. The search for the feature subset is based on $\delta$ and a conditional mutual information measure. Every dataset is first fed into the proposed theoretical search based feature selection algorithm to obtain a subset of features, which are evaluated; finally, after convergence, the non-dominated feature subset is selected.

As GA is a population based stochastic search algorithm, the initial population for the proposed improved GA is created at random. The dimension of the population set is $p \times n$, where $p$ is the defined population size and $n$ is the number of features in the dataset. A binary string representation of the chromosome is chosen in this work. The length of the string equals the total number of features present in the dataset. Each gene takes one of only two values, '1' or '0', which implies that the indexed feature is present or absent, respectively, in the current subset. So, a chromosome $C_1$ of length $K$ is represented as a binary string of $K$ such genes.

In the proposed improved GA, every population member has an equal probability of being a parent. A single point crossover approach is used to obtain offspring from parent chromosomes. The main reason behind the application of mutation is to bring diversity into the population. The method uses a mutation pool of different mutation strategies. At any moment, for any current offspring, the mutation strategy to be applied is chosen dynamically from the pool. The reason behind selecting mutation strategies in this way is to maximize the quality of the population members by utilizing the effectiveness of those individual mutation strategies.
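To make the search procedure concrete, the following is a minimal Python sketch of the improved GA with a mutation pool as described above. The fitness function is only a placeholder for the $\delta$ / conditional mutual information criterion, and the three pooled mutation strategies (bit-flip, swap, scramble) as well as the elitist survivor rule are illustrative assumptions, since the text does not enumerate them.

import random

# The paper specifies a pool of mutation strategies chosen dynamically,
# but does not enumerate them; the three strategies below are assumptions.
def bit_flip(ch):
    i = random.randrange(len(ch))
    ch[i] ^= 1                      # toggle presence of one feature
    return ch

def swap(ch):
    i, j = random.sample(range(len(ch)), 2)
    ch[i], ch[j] = ch[j], ch[i]     # exchange two gene positions
    return ch

def scramble(ch):
    i, j = sorted(random.sample(range(len(ch)), 2))
    seg = ch[i:j]
    random.shuffle(seg)             # reshuffle a random segment
    ch[i:j] = seg
    return ch

MUTATION_POOL = [bit_flip, swap, scramble]

def fitness(ch):
    """Placeholder for the paper's delta / conditional mutual
    information criterion; scores a candidate feature subset."""
    return -sum(ch)  # stand-in only: prefers compact subsets

def improved_ga(n_features, pop_size=20, generations=50):
    # Binary chromosomes of length n_features: gene value 1 = feature kept.
    pop = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size // 2):
            # Every population member has equal probability of parenthood.
            pa, pb = random.sample(pop, 2)
            # Single point crossover.
            cut = random.randrange(1, n_features)
            c1, c2 = pa[:cut] + pb[cut:], pb[:cut] + pa[cut:]
            # Mutation strategy picked dynamically from the pool.
            offspring.append(random.choice(MUTATION_POOL)(c1))
            offspring.append(random.choice(MUTATION_POOL)(c2))
        # Elitist survivor selection (an assumption; the survivor rule
        # is not stated in the text).
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

For instance, improved_ga(30) returns a length-30 binary mask over the features of a 30-attribute dataset.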
3.2. Decision Consolidation Strategy

$R$ is obtained by combining all such reducts, where $R = Reduct_1 \cup Reduct_2 \cup \dots \cup Reduct_E$. Now each pair of reducts $x$ and $y$ in the reduct set $R$ is compared based on their objective functions, and if all objective function values for $x$ are optimal compared to those of $y$, we say that $x$ dominates $y$ and denote it as $x > y$. For each reduct in $R$, a dominating factor is initially set to zero and is increased by one whenever that reduct dominates another reduct in $R$. Finally, a reduct is selected into set $F$ if it dominates all the other reducts, i.e., if its dominating factor is $|R| - 1$. If multiple such reducts are present in $F$, then one is chosen at random as the final reduct.

Algorithm 2: Find optimal features by decision consolidation
1: Input: all generated reducts $R = Reduct_1 \cup Reduct_2 \cup \dots \cup Reduct_E$
2: Begin: $F = \phi$
   ⋮
8: Return $F$
9: End
10: Output: set $F$ of the optimal feature subset
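The dominance test and the dominating factor bookkeeping of Algorithm 2 can be sketched in Python as follows. The assumption that larger objective values are better (a Pareto-style reading of "all objective function values for x are optimal compared to those of y") and the fallback when no reduct dominates all others are choices of this sketch, not the paper's.

from dataclasses import dataclass
from typing import List, Tuple
import random

@dataclass
class Reduct:
    features: Tuple[int, ...]      # indices of the selected features
    objectives: Tuple[float, ...]  # objective values; larger = better (assumed)

def dominates(x: Reduct, y: Reduct) -> bool:
    # Pareto-style reading: x is at least as good on every objective
    # and strictly better on at least one.
    return (all(a >= b for a, b in zip(x.objectives, y.objectives))
            and any(a > b for a, b in zip(x.objectives, y.objectives)))

def consolidate(reducts: List[Reduct]) -> Reduct:
    # Dominating factor: starts at zero, +1 per dominated reduct.
    factors = [sum(dominates(x, y) for y in reducts if y is not x)
               for x in reducts]
    # F holds reducts dominating all others (factor == |R| - 1).
    F = [x for x, f in zip(reducts, factors) if f == len(reducts) - 1]
    if not F:
        # Fallback (assumption, not stated in the paper): take the
        # reduct with the highest dominating factor.
        return reducts[max(range(len(reducts)), key=lambda i: factors[i])]
    # If several qualify, one is chosen at random as the final reduct.
    return random.choice(F)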
3.3. Five Layered ANN (FL-ANN) Classifier

Once the feature selection is carried out, FL-ANN is used to classify. FL-ANN is a five-layered RBF based neural network classifier that makes use of a gradient descent approach and regression based classification. It optimizes the flattening parameter of the RBF kernel through the gradient descent approach.

An applied input vector $x$ is transmitted to the pattern layer through the input layer. The pattern layer includes one neuron, equipped with an RBF kernel, for each training datum. The squared Euclidean distance between the input vector $x$ and a training data vector $t_j$ is calculated as in (1), where $p$ denotes the total number of training data at the pattern layer:

$dist(j) = \| x - t_j \|^2, \quad 1 \le j \le p$    ... (1)

Each pattern neuron applies its RBF kernel, governed by the flattening parameter, to this distance to produce the pattern layer output $r(j)$ used in (5) and (6). Target outputs $y(j,i)$ are assigned as in (3):

$y(j,i) = \begin{cases} 0.9, & t_j \text{ belongs to the } i\text{th class}, \ 1 \le i \le N \\ 0.1, & \text{otherwise} \end{cases}, \quad 1 \le j \le p$    ... (3)

$N + 1$ neurons are placed at the summation layer, where $N$ is the total number of classes; the one extra neuron obtains the denominator. FL-ANN uses a diverge effect term at the summation layer to increase the distances among classes. The diverge effect term is calculated as in (4), where $d(j,i)$ denotes the diverge effect term of the $j$th training datum and the $i$th class. $y_{max}$, which denotes the maximum value of $y(j,i)$, is initialized to 0.9 and is updated with the maximum value of the output layer after each iteration of the optimization. The diverge effect term is calculated by the $N$ class neurons of the summation layer, and the calculation includes an aggressive, exponential form of $y(j,i) - y_{max}$ to increase the effect of $y(j,i)$:

$d(j,i) = e^{(y(j,i) - y_{max})} \cdot y(j,i)$    ... (4)

The diverge effect term is used in calculating the numerator values at the summation layer as in (5); moreover, the denominator value is also calculated at this layer, as in (6):

$u_i = \sum_{j=1}^{p} d(j,i) \cdot r(j), \quad 1 \le i \le N$    ... (5)

While the $N$ neurons represented by $u_i$ calculate the numerator values by summing the products of the diverge effect terms and the pattern layer outputs, the one remaining neuron calculates the denominator value, represented by $D$:

$D = \sum_{j=1}^{p} r(j)$    ... (6)

Each class is represented by a neuron at the normalization layer. These neurons divide the corresponding numerator value by the denominator value calculated at the summation layer, according to (7), where $c_i$ denotes the normalized output of the $i$th class:

$c_i = u_i / D, \quad 1 \le i \le N$    ... (7)

The class of the input vector is determined at the output layer through the winner decision mechanism given in (8), where $c$ is the output vector of the normalization layer, and $c_{id}$ and $id$ denote the winner neuron value and the index of the winning class, respectively:

$[c_{id}, id] = \max(c)$    ... (8)

Gradient descent based collective learning is utilized in FL-ANN for obtaining the optimized flattening parameter value. Each training datum at the pattern layer is sequentially applied to the neural network, and three steps are executed until the maximum iteration limit is exceeded. First, the squared error $e$ is calculated for each input as in (9), where $y(z, id)$ represents the target value of the $z$th training input datum for the $id$th class and $c_{id}$ is the value of the winner class:

$e = (y(z, id) - c_{id})^2$    ... (9)
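The forward pass of equations (1) and (3) through (8), together with a gradient descent loop driven by the squared error of (9), can be summarised in the NumPy sketch below. Three points are assumptions of the sketch rather than the authors' exact formulation: the Gaussian kernel $r(j) = e^{-dist(j)/2\sigma^2}$ stands in for equation (2), which does not appear in the text; a numerical gradient replaces the unstated closed-form update of the flattening parameter; and the per-iteration refresh of $y_{max}$ is omitted for brevity.

import numpy as np

def fl_ann_predict(x, T, labels, n_classes, sigma=1.0, y_max=0.9):
    """One forward pass of FL-ANN for input vector x.
    T: (p, n) training matrix (one pattern neuron per row).
    labels: length-p integer array of class indices in [0, n_classes).
    sigma: flattening parameter of the (assumed Gaussian) RBF kernel.
    """
    p = len(T)
    # (1) squared Euclidean distances at the pattern layer
    dist = np.sum((T - x) ** 2, axis=1)
    # assumed kernel r(j) = exp(-dist(j) / (2 sigma^2)); eq. (2) is
    # not shown in the text
    r = np.exp(-dist / (2.0 * sigma ** 2))
    # (3) target values: 0.9 for the own class, 0.1 otherwise
    y = np.full((p, n_classes), 0.1)
    y[np.arange(p), labels] = 0.9
    # (4) diverge effect term d(j, i) = exp(y(j,i) - y_max) * y(j,i)
    d = np.exp(y - y_max) * y
    # (5) numerators u_i and (6) denominator D at the summation layer
    u = d.T @ r
    D = np.sum(r)
    # (7) normalization layer
    c = u / D
    # (8) winner decision at the output layer
    idx = int(np.argmax(c))
    return idx, c[idx], c

def train_sigma(T, labels, n_classes, sigma=1.0, lr=0.01, iters=20, h=1e-4):
    """Gradient descent on the flattening parameter, driven by the
    squared error of eq. (9); a numerical gradient stands in for the
    unstated closed-form update."""
    labels = np.asarray(labels)
    y = np.full((len(T), n_classes), 0.1)
    y[np.arange(len(T)), labels] = 0.9
    def total_error(s):
        err = 0.0
        for z in range(len(T)):
            idx, c_id, _ = fl_ann_predict(T[z], T, labels, n_classes, sigma=s)
            err += (y[z, idx] - c_id) ** 2   # eq. (9)
        return err
    for _ in range(iters):
        grad = (total_error(sigma + h) - total_error(sigma - h)) / (2 * h)
        sigma -= lr * grad
    return sigma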
Table: Description of the datasets collected from the UCI repository.

Dataset        Instances   Features
Blood          748         5
Bupa           345         7
Car            1728        6
Contraceptive  1473        9
Credit         30000       24
Diagnostic     569         32
Ecoli          336         8
Ionosphere     351         34
Mammography    961         6
monks-1        432         7
monks-2        432         7
monks-3        432         7
Parkinsons     197         23
Pima           768         8
Prognostic     198         34
Sonar          208         60
Spect          267         22
Vert           310         6
Yeast          1484        8
Table: Prediction efficiency of the algorithms for each dataset.

Dataset   MLP   SVM   TREE   RBF   IGA-FLANN
monks-3   1     1     1      1     1
Fig. 4. Time taken for classification: comparison of the algorithms on datasets 1 to 10.
Fig. 5. Time taken for classification: comparison of the algorithms on datasets 11 to 20.