Professional Documents
Culture Documents
ABSTRACT
Micro-array gene expression data processing has become a major area of research due its applications in the medical field. The
key uses of a microarray database are to store the measurement data, manage a searchable index, and make the data available to
other applications for analysis and interpretation applications. However, the microarray dataset suffers from the curse of
dimensionality, the limited number of samples, and the irrelevant and noise genes, all of which make the classification task for a
given sample more challenging. In the proposed methodology, the gene selection process is improved further by utilizing the
Spearmans rank correlation (SC) along with MI for feature selection. The ABC algorithm is replaced by a more efficient
Firefly algorithm to form the proposed methodology SC-mRMR-FA. The proposed SC-mRMR-FA considers SC values
between the genes in addition to the MI values for the predictive gene selection. The selection of genes is carried out using the
Firefly algorithm with high accuracy. Finally the classification is carried out based on the FA selected genes using the SVM
classifier. The evaluation results obtained on cancer dataset prove that the proposed SC-mRMR-FA Algorithm provides highly
accurate cancer classification then the mRMR-ABC algorithm
Keywords: - Micro array, Gene Expression, Spearman co-relation, Firefly Algorithm.
The preprocessing helps us to improve data SVMs are supervised learners that construct a
efficiency and remove the noisy data which helps to model from available training data with known
identify the survival of the fittest. The preprocessing is to classification. In order to obtain accurate class predictions
clean the original data and extract the useful information SVMs provide a number of free parameters that have to be
from the data set. Various data sets have been taken for tuned to reflect the requirements of the given task. The
preprocessing such that MRMR filter the response of the SVM can be characterized as a supervised learning
data set has been tested under various classification algorithm capable of solving linear and non-linear binary
algorithms. The classification algorithm which gives best classification problems. Given a training set with
accuracy for the data set is taken as input to the genetic patterns where X is an input vector
algorithm for the next process of identifying the survival of and its corresponding binary class label, the
the fittest. The authors employ data preprocessing step to idea of support vector classification is to separate examples
apply piecewise regression as a predictive data mining by means of a maximal margin hyperplane. Therefore, the
technique that fits a data model which will be used for algorithm strives to maximize the distance between genes
prediction. The MRMR method aims at selecting that are closest to the decision surface.
maximally relevant and minimally redundant set of genes The fitness function is an important factor for
for discriminating tissue classes. In MRMR method, gene evaluation and evolution of SVMs providing satisfactory
ranking is performed by optimizing the ratio of the and stable results in real-world applications. The fitness
relevancy of a gene to the redundancy of function guides the superordinated evolutionary learning
Where is the mutual information between process determining the probability that an individual can
class labels and gene where the summation is taken hand down genetic information to the subsequent
over the space of gene expression values. The redundancy population. Therefore, it should express the users
of a gene subset is determined by the mutual information objective and should favor SVMs with satisfactory
among the genes. The redundancy of gene i with the other generalization ability in order to select useful classifiers
genes in the subset is given by systematically instead of accidentally. Consequently, the
In the proposed algorithm the fitness function is fitness function effectively conducts model selection and it
used to indicate how good or bad a gene solution is. The can incorporate arbitrary model selection criteria as fitness
way of selecting the fitness function is a very significant measure. Whereas the fitness function selects solutions for
matter in designing the proposed feature selection and light intensity and absorption coefficient. The selection is
classification algorithm, since the solution optimization as implemented as tournament selection with a tournament
well as the performance of the algorithm count mainly on size of two. Furthermore, an elitist mechanism is applied in
this fitness function. Thus the solutions will be ordered in order to ensure that the best SVM is member of the next
ascending way after measuring their fitness function based generation.
on their fitness value. In the proposed clustering algorithm The idea of SC-mRMR-FA based SVM model
(a firefly (brighter one) that have minimum fitness value) selection one chooses the individual with maximum overall
for each iteration will has the ability to affect and influence fitness for future use on unseen data. To simulate this
in the movements of the other fireflies. Therefore when scenario, it assessed the performance by means of precision,
comparing between two fireflies a and b, if b is brighter recall, f-measure and accuracy metrics. The important gene
than firefly a, than firefly a will move toward firefly b. The selection data is used to increase the cancer classification
proposed algorithm is designed to enhance the performance accuracy using SC mRMR-FA algorithm. optimization
of the SVM in order to obtain more accurate classification algorithm is used to select the most significant genes from
process. the given cancer datasets.
To implement the proposed system and generate various Precision can be seen as a measure of exactness
results the scenario use MATLAB in this environment. The
scenario has been selected Leukemia 1, dataset.
100
90
Leukemia 1 dataset 80
70
60
Accuracy can be calculated from formula given as Precision
50
follows 40
30
20
Accuracy =
10
0
mRMR-ABC SC-mRMR-FA
Methods
90
REFERENCES
80
60
0
Machine Learning Research, vol. 3, pp.1157
mRMR-ABC SC-mRMR-FA
Methods 1182, 2003.
[3] C. Ambroise, G. Mclachlan, Selection bias in gene
Fig 3.1 Accuracy comparison extraction on the basis of microarray
geneexpression data, Proc. Natl. Acad. Sci. 99
(10) (2002) 6562 6566.
From the above fig 5.1 it can observe that the
[4] R. Breitling, P. Armengaud, A. Amtmann, P.
comparison of existing and proposed system in terms of
Herzyk, Rank products: a simple, yet powerful,
accuracy metric. In x axis plot the types and in y axis plot
new method to detect differentially regulated
genes in replicated microarray experiments, methods, Expert Systems with Applications, vol.
FEBS Lett. 573 (13) (2004) 83 92. 39, no. 8, pp. 7270 7280, 2012.
[5] Carlos J. Alonso-Gonzlez, Q. Isaac Moro- [9] H.-L. Huang and F.-L. Chang, Esvm:
Sancho, Microarray gene expression Evolutionary support vector machine for
classification with few genes: Criteria to combine automatic feature selection and classification of
attribute selection and classification methods, an microarray data, Biosystems, vol. 90, no. 2, pp.
international journal Expert Systems with 516528, 2007.
Applications 39 (2012) 72707280 [10] Guyon and A. Elissee, An introduction to
[6] A. Abderrahim, E. Talbi, and M. Khaled, variable and feature selection, Journal of
Hybridization of genetic and quantum algorithm Machine Learning Research, pages: 1157-1182,
for gene selection and classification of 2003.
microarray data, in Proc. IEEE International [11] C.Deisy, B.Subbulakshmi, S.Baskar, and
Symposium In Parallel Distributed Processing, , R.Ramaraj , Efficient dimensionality reduction
pp. 18, 2009. approaches for feature selection , Proceedings of
[7] E. Alba,J. Garcia-Nieto et al., Gene selection in the International Conference on Computational
cancer classification using pso/svm and ga/svm Intelligence and Multimedia Applications, pages:
hybrid algorithms, Evolutionary Computation, 121-
pp. 284290, 2007. 127, 2007.
[8] C. Alonso, I. Moro-Sancho, A. Simon-Hurtado,
and R. Varela-Arrabal, Microarray gene
expression classification with few genes: Criteria
to combine attribute selection and classification