Chapter 9
CLASSIFYING MICROARRAY DATA USING
SUPPORT VECTOR MACHINES
Sayan Mukherjee
Postdoctoral Fellow, MIT/Whitehead Institute for Genome Research and Center for
Biological and Computational Learning at MIT
e-mail: sayan@mit.edu
1. INTRODUCTION
Over the last few years the routine use of DNA microarrays has made
possible the creation of large datasets of molecular information
characterizing complex biological systems. Molecular classification
approaches based on machine learning algorithms applied to DNA
microarray data have been shown to have statistical and clinical relevance
for a variety of tumor types: Leukemia (Golub et al., 1999), Lymphoma
(Shipp et al., 2001), Brain cancer (Pomeroy et al., 2002), Lung cancer
(Bhattacharjee et al., 2001), and the classification of multiple primary tumors
(Ramaswamy et al., 2001).
One particular machine learning algorithm, Support Vector Machines
(SVMs), has shown promise in a variety of biological classification tasks,
including gene expression microarrays (Brown et al., 2000, Mukherjee et al.,
1999). SVMs are powerful classification systems based on regularization
techniques with excellent performance in many practical classification
problems (Vapnik, 1998, Evgeniou et al., 2000).
This chapter serves as an introduction to the use of SVMs in analyzing
DNA microarray data. An informal theoretical motivation of SVMs, both
from a geometric and an algorithmic perspective, followed by an application to
leukemia classification, is described in Section 2. The problem of gene
selection is described in Section 3. Section 4 states some results on a variety
of cancer morphology and treatment outcome problems. Multiclass
classification is described in Section 5. Section 6 lists software sources and
rules of thumb that a user may find helpful. The chapter concludes with a
brief discussion of SVMs with respect to some other algorithms.
2. SUPPORT VECTOR MACHINES
In binary microarray classification problems, we are given ℓ experiments
{(x_1, y_1), ..., (x_ℓ, y_ℓ)}. This is called the training set, where x_i is a vector
corresponding to the expression measurements of the i-th experiment or
sample (this vector has n components; each component is the expression
measurement for a gene or EST) and y_i is a binary class label, which will be
±1. We want to estimate a multivariate function from the training set that
will accurately label a new sample, that is, f(x_new) = y_new. This problem of
learning a classification boundary given positive and negative examples is a
particular case of the problem of approximating a multivariate function from
sparse data. The problem of approximating a function from sparse data is ill-
posed. Regularization is a classical approach to making the problem well-
posed (Tikhonov and Arsenin, 1977).
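Concretely, the training set can be held as a matrix of expression measurements and a vector of labels. The following sketch uses arbitrary, made-up sample and gene counts purely for illustration:

```python
import numpy as np

# Illustrative sizes only: l samples (experiments), n genes/ESTs per sample.
rng = np.random.default_rng(0)
l, n = 38, 7129
X = rng.normal(size=(l, n))      # row X[i] is the expression vector x_i
y = rng.choice([-1, 1], size=l)  # y[i] is the binary class label, +1 or -1

# A trained classifier f should map a new expression vector to a label in {-1, +1}.
def predict(f, x_new):
    return int(np.sign(f(x_new)))
```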
A problem is well-posed if it has a solution, the solution is unique, and
the solution is stable with respect to perturbations of the data (small
perturbations of the data result in small perturbations of the solution). This
last property is very important in the domain of analyzing microarray data
where the number of variables, genes and ESTs measured, is much larger
than the number of samples. In statistics, when the number of variables is
much larger than the number of samples, one is said to be facing the curse of
dimensionality, and the function estimated may very accurately fit the
samples in the training set but be very inaccurate in assigning the label of a
new sample; this is referred to as over-fitting. The imposition of stability on
the solution mitigates the above problem by making sure that the function is
smooth, so new samples similar to those in the training set will be labeled
similarly; this is often referred to as the blessing of smoothness.
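A small numerical sketch (synthetic data, arbitrary sizes) makes this failure mode concrete: with many more variables than samples, an unregularized linear fit labels the training set perfectly even when the labels are pure noise, yet is at chance level on new samples:

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 20, 1000                      # far more variables than samples
X = rng.normal(size=(l, n))
y = rng.choice([-1.0, 1.0], size=l)  # labels are pure noise

# Minimum-norm least-squares fit: with n >> l it interpolates the
# training labels exactly, even though they carry no signal.
w = np.linalg.lstsq(X, y, rcond=None)[0]
train_acc = np.mean(np.sign(X @ w) == y)         # perfect on training data

X_new = rng.normal(size=(1000, n))               # fresh samples
y_new = rng.choice([-1.0, 1.0], size=1000)
test_acc = np.mean(np.sign(X_new @ w) == y_new)  # roughly chance level
```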
2.1 Mathematical Background of SVMs
We start with a geometric intuition of SVMs and then give a more general
mathematical formulation.
2.1.1 A Geometrical Interpretation
A geometric interpretation of the SVM illustrates how this idea of
smoothness or stability gives rise to a geometric quantity called the margin,
which is a measure of how well separated the two classes can be. We start by
assuming that the classification function is linear:
f(x) = w·x = Σ_i w_i x_i    (9.1)

where w_i and x_i are the i-th components of the vectors w and x, respectively.
The operation w·x is called a dot product. The label of a new point x_new is
the sign of the above function, y_new = sign[f(x_new)]. The classification
boundary, all values of x for which f(x) = 0, is a hyperplane¹ defined by its
normal vector w (see Figure 9.1).
Figure 9.1. The hyperplane separating two classes. The circles and the triangles designate the
members of the two classes. The normal vector to the hyperplane is the vector w.
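For intuition, here is a minimal sketch (with a made-up weight vector w) of the linear decision rule just described: the label of a new point is the sign of f(x) = w·x, and points with f(x) = 0 lie on the separating hyperplane:

```python
import numpy as np

w = np.array([3.0, 4.0])          # normal vector defining the hyperplane w·x = 0

def f(x):
    return float(np.dot(w, x))    # the linear function f(x) = w·x of Eq. 9.1

def label(x):
    return int(np.sign(f(x)))     # y_new = sign[f(x_new)]

print(label(np.array([1.0, 1.0])))    # w·x = 7 > 0, so the label is +1
print(label(np.array([-1.0, -1.0])))  # w·x = -7 < 0, so the label is -1
print(f(np.array([4.0, -3.0])))       # w·x = 0: this point is on the boundary
```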
Assume we have points from two classes that can be separated by a
hyperplane, and x is the closest data point to the hyperplane; define x_0 to be
the closest point on the hyperplane to x. This is the closest point to x that
satisfies w·x = 0 (see Figure 9.2). We then have the following two
equations:

w·x = k for some k, and
w·x_0 = 0.

Subtracting these two equations, we obtain w·(x − x_0) = k.
Dividing by the norm of w (the norm of w is the length of the vector w), we
obtain²

(w/||w||)·(x − x_0) = k/||w||.
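This distance computation can be checked numerically; the vectors below are made up for illustration:

```python
import numpy as np

w = np.array([3.0, 4.0])           # normal vector; the hyperplane is w·x = 0
x = np.array([2.0, 1.0])           # a data point off the hyperplane

k = np.dot(w, x)                   # w·x = k
dist = abs(k) / np.linalg.norm(w)  # |k| / ||w||, distance from x to the hyperplane

# Cross-check: project x onto the hyperplane and measure the gap directly.
x0 = x - (k / np.dot(w, w)) * w    # closest point on the hyperplane to x
assert np.isclose(np.dot(w, x0), 0.0)            # x0 lies on the hyperplane
assert np.isclose(np.linalg.norm(x - x0), dist)  # its distance to x is |k|/||w||
```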
¹ A hyperplane is the extension of the two or three dimensional concepts of lines and planes
to higher dimensions. Note that in an n-dimensional space, a hyperplane is an n − 1
dimensional object, in the same way that a plane is a two dimensional object in a three
dimensional space. This is why the normal or perpendicular to the hyperplane is always a
vector.
² The notation "|a|" refers to the absolute value of the variable a. The notation "||a||" refers to
the length of the vector a.
