Introduction
This document contains supplementary material for my STAT 6601 SVM project. In
addition to all of the code used to prepare and run the examples in the
PowerPoint presentation, it contains:
The analysis of the Crabs data set
Examples showing how SVMs may be used for tasks other than classification
Multiple runs of the Microarray example under different conditions
A list of references
A list of websites concerned with support vector machines
A list of websites containing test data sets
JBR 5/16/2019 1
STAT 6601 Project Joseph Rickert
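The code that fit the crabs model does not appear in this excerpt; a plausible reconstruction from the Call line below, assuming lcrabs holds the log-transformed body measurements (the names crabs.svm and lcrabs come from the output; the log transform itself is an assumption):

```r
library(e1071)  # svm()
library(MASS)   # crabs data set: 200 crabs, two species (B, O)
data(crabs)
lcrabs <- log(crabs[, 4:8])  # assumed: log of the five body measurements
crabs.svm <- svm(crabs$sp ~ ., data = lcrabs, cost = 100, gamma = 1)
table(fitted(crabs.svm), crabs$sp)  # training confusion matrix (the table below)
```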
     true
        B   O
  B   100   0
  O     0 100
print(crabs.svm)
Call:
svm(formula = crabs$sp ~ ., data = lcrabs, cost = 100, gamma = 1)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 100
gamma: 1
summary(crabs.svm)
Call:
svm(formula = crabs$sp ~ ., data = lcrabs, cost = 100, gamma = 1)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 100
gamma: 1
plot(cmdscale(dist(crabs[, 4:8])),
     col = as.integer(crabs[, 1]),
     pch = c("o", "+")[1:200 %in% crabs.svm$index + 1])
[Figure: two-dimensional MDS plot of the crabs data; x-axis cmdscale(dist(crabs[, 4:8]))[,1], y-axis cmdscale(dist(crabs[, 4:8]))[,2]; support vectors plotted as "+", other points as "o".]
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.2
summary(model.crabs)
Call:
svm(formula = sp ~ ., data = lcrabs)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.2
data(iris)
attach(iris)
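The model fit itself is not shown on this page; from the Call line below it was presumably the following (y, used later in table(pred, y), is assumed to be the true species labels):

```r
library(e1071)
model <- svm(Species ~ ., data = iris)  # defaults: C-classification, radial kernel
y <- iris$Species                       # assumed: true labels for the accuracy check
```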
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.25
summary(model)
Call:
svm(formula = Species ~ ., data = iris)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.25
Number of Support Vectors: 51 ( 8 22 21 )
Number of Classes: 3
Levels:
 setosa versicolor virginica
pred <- predict(model, iris[, -5])
# (same as:)
pred <- fitted(model)
# Check accuracy:
table(pred, y)
y
pred setosa versicolor virginica
setosa 50 0 0
versicolor 0 48 2
virginica 0 2 48
[Figure: MDS plot of the full iris data; x-axis cmdscale(dist(iris[, -5]))[,1], y-axis cmdscale(dist(iris[, -5]))[,2]; support vectors plotted as "+", other points as "o".]
Second analysis of the iris data, splitting the data set into training and test sets
# Split iris data set into training and testing data sets
# First build the vector columns
SL.TR <- c(Sepal.Length[1:25],Sepal.Length[51:75],Sepal.Length[101:125])
SL.TE <- c(Sepal.Length[26:50],Sepal.Length[76:100],Sepal.Length[126:150])
SW.TR <- c(Sepal.Width[1:25],Sepal.Width[51:75],Sepal.Width[101:125])
SW.TE <- c(Sepal.Width[26:50],Sepal.Width[76:100],Sepal.Width[126:150])
PL.TR <- c(Petal.Length[1:25],Petal.Length[51:75],Petal.Length[101:125])
PL.TE <- c(Petal.Length[26:50],Petal.Length[76:100],Petal.Length[126:150])
PW.TR <- c(Petal.Width[1:25],Petal.Width[51:75],Petal.Width[101:125])
PW.TE <- c(Petal.Width[26:50],Petal.Width[76:100],Petal.Width[126:150])
# Next build the factor columns
S.TR <- c(Species[1:25],Species[51:75],Species[101:125])
fS.TR <- factor(S.TR,levels=1:3)
levels(fS.TR) <- c("setosa","versicolor","virginica")
S.TE <- c(Species[26:50],Species[76:100],Species[126:150])
fS.TE <- factor(S.TE,levels=1:3)
levels(fS.TE) <- c("setosa","versicolor","virginica")
# Make two new data frames
iris.train <-data.frame(SL.TR,SW.TR,PL.TR,PW.TR,fS.TR)
iris.test <-data.frame(SL.TE,SW.TE,PL.TE,PW.TE,fS.TE)
attach(iris.train)
attach(iris.test)
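As an aside, the split above can be written more compactly with row indexing; a sketch equivalent to the code above, keeping the original iris column names rather than the .TR/.TE names (the data frame names here are hypothetical, not from the original):

```r
train.idx <- c(1:25, 51:75, 101:125)  # first 25 rows of each species
iris.train2 <- iris[train.idx, ]      # training rows, original column names
iris.test2  <- iris[-train.idx, ]     # remaining 75 rows for testing
```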
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.25
Number of Support Vectors: 32 ( 7 13 12 )
Number of Classes: 3
Levels:
setosa versicolor virginica
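The fit and prediction that produced pred.2 and y.2 are not shown in this excerpt; a plausible sketch consistent with the Parameters block above (every name besides pred.2 and y.2 is an assumption):

```r
model.2 <- svm(fS.TR ~ ., data = iris.train)  # default cost = 1, gamma = 0.25
test.x <- iris.test[, 1:4]
names(test.x) <- names(iris.train)[1:4]  # predict() expects the training column names
pred.2 <- predict(model.2, test.x)
y.2 <- iris.test$fS.TE                   # true test labels
table(pred.2, y.2)
```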
# Check accuracy:
table(pred.2, y.2)
y.2
pred.2       setosa versicolor virginica
  setosa         25          0         0
  versicolor      0         25         0
  virginica       0          0        25
[Figure: MDS plot of the iris training data; x-axis cmdscale(dist(iris.train[, -5]))[,1], y-axis cmdscale(dist(iris.train[, -5]))[,2]; support vectors plotted as "+", other points as "o".]
[Figure: MDS plot of the iris test data; x-axis cmdscale(dist(iris.test[, -5]))[,1], y-axis cmdscale(dist(iris.test[, -5]))[,2]; support vectors plotted as "+", other points as "o".]
summary(i2$Species)
    setosa versicolor
        50        100
wts
    setosa versicolor
         2          1
Call:
svm(formula = Species ~ ., data = i2, class.weights = wts)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.25
Number of Classes: 2
Levels:
setosa versicolor
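The construction of i2 and wts is not shown here; this output matches the class-weighting example in the e1071 documentation, which builds them as:

```r
i2 <- iris
levels(i2$Species)[3] <- "versicolor"  # relabel virginica, giving 100 versicolor rows
summary(i2$Species)                    # setosa 50, versicolor 100
wts <- 100 / table(i2$Species)         # setosa 2, versicolor 1
m <- svm(Species ~ ., data = i2, class.weights = wts)
```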
# or:
m <- svm(~ a + b, gamma = 0.1)
# test:
newdata <- data.frame(a = c(0, 4), b = c(0, 4))
predict(m, newdata)
# visualize:
plot(X, col = 1:1000 %in% m$index + 1, xlim = c(-5,5), ylim=c(-5,5))
points(newdata, pch = "+", col = 2, cex = 5)
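The data frame X (columns a and b) used above is not defined in this excerpt; in the corresponding e1071 novelty-detection example it is simply standard normal noise:

```r
library(e1071)
X <- data.frame(a = rnorm(1000), b = rnorm(1000))
attach(X)
m <- svm(X, gamma = 0.1)  # no labels supplied, so svm() runs one-class classification
```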
[Figure: scatterplot of X on axes a and b (both -5 to 5); support vectors of the one-class model shown in red, the two newdata points marked with large "+".]
Supplementary fig. 2. Expression levels of predictive genes in independent dataset. The expression levels of the 50 genes
most highly correlated with the ALL-AML distinction in the initial dataset were determined in the independent dataset.
Each row corresponds to a gene, with the columns corresponding to expression levels in different samples. The
expression level of each gene in the independent dataset is shown relative to the mean of expression levels for that gene
in the initial dataset. Expression levels greater than the mean are shaded in red, and those below the mean are shaded in
blue. The scale indicates standard deviations above or below the mean. The top panel shows genes highly expressed in
ALL, the bottom panel shows genes more highly expressed in AML.
Call:
svm(formula = fy ~ ., data = fmat.train[, 1:3000])
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.0003333333
Number of Support Vectors: 37 ( 26 11 )
Number of Classes: 2
Levels:
ALL AML
x2 <- fmat.test[,1:3000]
pred <- predict(model.ma, x2)
# Check accuracy:
table(pred, fy2)
fy2
pred ALL AML
ALL 20 13
AML 0 1
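The matrix pc used in the plots that follow is never defined in this excerpt; it is presumably a two-dimensional projection of the expression data. One way it might have been computed (purely an assumption, including the object name expr):

```r
expr <- fmat.train[, 1:3000]  # hypothetical numeric expression matrix
pc <- prcomp(expr)$x[, 1:2]   # scores on the first two principal components
```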
[Figure: scatterplot of the training samples on pc[,1] vs pc[,2]; support vectors plotted as "+", other points as "o".]
[Figure: scatterplot of samples on pc[,1] vs pc[,2]; support vectors plotted as "+", other points as "o".]
Call:
svm(formula = fy ~ ., data = df.mat.data[, 1:300])
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.003333333
fy
pred ALL AML
ALL 27 0
AML 0 11
[Figure: scatterplot of samples on pc[,1] vs pc[,2] for the 300-gene model; support vectors plotted as "+", other points as "o".]
Call:
svm(formula = fy ~ ., data = df.mat.data[, 1:30])
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.03333333
Number of Support Vectors: 33
( 22 11 )
Number of Classes: 2
Levels:
ALL AML
fy
pred ALL AML
ALL 27 0
AML 0 11
plot(pc,
     col = as.integer(fy),
     pch = c("o", "+"), main = "ALL 'Black' and AML 'Red' Classes")
[Figure: plot of pc[,1] vs pc[,2], titled "ALL 'Black' and AML 'Red' Classes"; points plotted as "o" and "+".]
Call:
svm(formula = fy ~ ., data = fmat.train[, 1:3000], gamma = 1e-07)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 1e-07
Number of Support Vectors: 24
( 13 11 )
Number of Classes: 2
Levels:
ALL AML
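In these runs gamma was varied by hand; e1071's tune.svm() can automate the search over cost and gamma (a sketch, not part of the original analysis):

```r
tuned <- tune.svm(fy ~ ., data = fmat.train[, 1:3000],
                  gamma = 10^(-7:-4), cost = 10^(0:2))
summary(tuned)  # cross-validation error for each (gamma, cost) pair
```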
References
Cristianini, Nello, and John Shawe-Taylor. An Introduction to Support Vector
Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge UP, 2000.
Venables, W.N., and B.D. Ripley. Modern Applied Statistics with S. 4th ed. New York:
Springer, 2002.
Websites
www.kernel-machines.org
www.support-vector.net
www.csie.ntu.edu.tw/~cjlin/libsvm
http://cran.stat.ucla.edu/doc/packages/e1071.pdf