Classification
SVM: General Philosophy
[Figure: a small-margin separator vs. a large-margin separator; the points lying on the margin are the support vectors]
Separating hyperplane: w·x + b = 0, i.e., w0 + w1·x1 + w2·x2 = 0
H1: w0 + w1·x1 + w2·x2 ≥ +1 for yi = +1
H2: w0 + w1·x1 + w2·x2 ≤ −1 for yi = −1
SVM: Linearly Separable Case
Distance between a point xi and the hyperplane: |xi·w + b| / ||w||
Therefore, the margin is 2/||w||.
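The margin formula can be checked numerically: a point on the hyperplane H1 (where w·x + b = 1) sits at distance 1/||w|| from the separator, so the full margin is 2/||w||. A minimal sketch with illustrative values (the weights below are made up for the example, not from the slides):

```python
import math

# Hyperplane w.x + b = 0 in 2D; these values are illustrative only.
w = [3.0, 4.0]          # ||w|| = 5
b = -2.0

def distance(point, w, b):
    """Distance from a point to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

# This point satisfies w.x + b = 3*1 + 4*0 - 2 = 1, so it lies on H1.
x_on_h1 = [1.0, 0.0]
margin = 2 * distance(x_on_h1, w, b)
print(margin)           # 2/||w|| = 2/5 = 0.4
```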
There are infinitely many hyperplanes separating the two classes, but we want to find the best one: the one that minimizes classification error on unseen data.
[Figure: maximum-margin hyperplane; the support vectors lie on the margin]
Minimize (1/2) wᵀw
subject to yi(w·xi + b) ≥ 1 for all i
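On a toy 1D data set the solution of this optimization can be found by brute force (a real solver would use quadratic programming; this grid search is only a sketch to make the constraints concrete). With points x = −1 (y = −1) and x = +1 (y = +1), the constraints force w ≥ 1, so the minimizer is w = 1, b = 0:

```python
# Brute-force check of the primal problem on a toy 1D data set.
X = [-1.0, 1.0]
Y = [-1, 1]

def feasible(w, b):
    """All constraints y_i (w x_i + b) >= 1 must hold."""
    return all(y * (w * x + b) >= 1 for x, y in zip(X, Y))

best = None
grid = [i / 100 for i in range(-300, 301)]   # search w, b in [-3, 3]
for w in grid:
    for b in grid:
        if feasible(w, b):
            obj = 0.5 * w * w                # (1/2) w^T w in 1D
            if best is None or obj < best[0]:
                best = (obj, w, b)

obj, w_star, b_star = best
print(w_star, b_star)   # w = 1.0, b = 0.0, so the margin is 2/||w|| = 2
```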
The learned weight vector is a combination of the support vectors:
w = Σi αi yi xi
Classification function:
w·x + b = Σi αi yi (xi·x) + b
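The expansion means classification needs only inner products with the support vectors. A self-contained sketch on the 1D toy problem above, where α = 0.5 for both support vectors solves the dual (values verified by hand for this tiny example):

```python
# Support vectors, labels, and dual coefficients for the toy 1D problem
# x = -1 (y = -1) and x = +1 (y = +1); alpha_i = 0.5 solves the dual here.
support = [-1.0, 1.0]
labels = [-1, 1]
alphas = [0.5, 0.5]
b = 0.0

def decision(x):
    """f(x) = sum_i alpha_i y_i <x_i, x> + b  -- inner products only."""
    return sum(a * y * (sv * x) for a, y, sv in zip(alphas, labels, support)) + b

print(decision(2.0))    # positive value -> class +1
print(decision(-0.5))   # negative value -> class -1
```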
Notice the inner product between the test
point x and the support vectors xi used as
a measure of similarity.
Thus, an SVM with a small number of support vectors can have good generalization, even when the dimensionality of the data is high.
Nonlinear SVMs
[Figure: one-dimensional data that is not linearly separable becomes separable after mapping to a higher-dimensional space]
Idea: map the data to a higher-dimensional feature space: x → φ(x)
y(x) = Σi αi yi φ(xi)·φ(x) + b = Σi αi yi K(xi, x) + b
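The kernel trick replaces each inner product φ(xi)·φ(x) with a kernel evaluation K(xi, x), so φ is never computed explicitly. A hedged sketch: the polynomial kernel and the dual coefficients below are illustrative choices, not a fitted model:

```python
# Kernelized decision function: <phi(x_i), phi(x)> is replaced by K(x_i, x).
def K(x, y):
    # Illustrative polynomial kernel choice, K(x, y) = (x*y + 1)^2.
    return (x * y + 1) ** 2

support = [-1.0, 1.0]
labels = [-1, 1]
alphas = [0.25, 0.25]   # made-up dual coefficients for illustration
b = 0.0

def decision(x):
    return sum(a * yi * K(sv, x) for a, yi, sv in zip(alphas, labels, support)) + b

print(decision(3.0))    # evaluated without ever forming phi explicitly
```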
Example: consider the mapping φ(x) = (x, x²). Then
φ(x)·φ(y) = (x, x²)·(y, y²) = xy + x²y²
so the corresponding kernel is K(x, y) = xy + x²y².
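This equivalence is easy to verify numerically: the explicit mapping φ(x) = (x, x²) reproduces the kernel through an ordinary inner product.

```python
# Check that phi(x) . phi(y) equals K(x, y) = xy + x^2 y^2.
def phi(x):
    return (x, x ** 2)

def K(x, y):
    return x * y + x ** 2 * y ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

for x, y in [(1.0, 2.0), (-3.0, 0.5), (2.5, 4.0)]:
    assert dot(phi(x), phi(y)) == K(x, y)
print("phi(x).phi(y) == K(x, y) for all tested pairs")
```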
Examples of kernels:
- Histogram intersection kernel: I(h1, h2) = Σi min(h1(i), h2(i))
- Generalized Gaussian kernel: K(h1, h2) = exp(−(1/A) D(h1, h2)), where D is a distance between the histograms and A is a scaling parameter
- Sigmoid kernel
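The histogram intersection kernel is simple enough to state directly in code; it sums, bin by bin, the smaller of the two histogram values:

```python
# Histogram intersection kernel: I(h1, h2) = sum_i min(h1[i], h2[i]).
def intersection_kernel(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two normalized 3-bin histograms (illustrative values).
h1 = [0.2, 0.5, 0.3]
h2 = [0.4, 0.4, 0.2]
print(intersection_kernel(h1, h2))   # min-sums: 0.2 + 0.4 + 0.2
```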
Read the data set once, construct a statistical summary of the data
(i.e., hierarchical clusters) given a limited amount of memory
Micro-clustering: Hierarchical indexing structure
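The micro-clustering idea above can be sketched in a one-pass routine. This is an assumed simplification for illustration, not the actual hierarchical indexing structure: each incoming point either joins the nearest existing micro-cluster or starts a new one while the memory budget allows.

```python
# One-pass micro-clustering sketch (assumed simplification of the slide's idea):
# keep at most `max_clusters` summaries; each cluster stores [sum, count].
def micro_cluster(points, max_clusters, radius):
    clusters = []
    for p in points:
        best = None
        for c in clusters:
            centroid = c[0] / c[1]
            d = abs(p - centroid)
            if best is None or d < best[1]:
                best = (c, d)
        if best is not None and (best[1] <= radius or len(clusters) >= max_clusters):
            best[0][0] += p          # absorb the point into the nearest cluster
            best[0][1] += 1
        else:
            clusters.append([p, 1])  # memory budget allows a new micro-cluster
    return [(s / n, n) for s, n in clusters]

summary = micro_cluster([1.0, 1.1, 0.9, 5.0, 5.2, 9.8], max_clusters=3, radius=1.0)
print(summary)   # three micro-clusters, near 1, 5, and 9.8
```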
SVM vs. Neural Network

SVM:
- Deterministic algorithm
- Nice generalization properties
- Hard to learn: learned in batch mode using quadratic programming techniques
- Using kernels, can learn very complex functions

Neural Network:
- Nondeterministic algorithm
- Generalizes well but doesn't have a strong mathematical foundation
- Can easily be learned in incremental fashion
- To learn complex functions, use a multilayer perceptron (nontrivial)
Representative implementations
- LIBSVM: an efficient implementation of SVM; supports multiclass classification, nu-SVM, and one-class SVM, and includes interfaces for Java, Python, etc.
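In practice LIBSVM is often used through scikit-learn, whose SVC class is built on it. A minimal usage sketch (assumes scikit-learn is installed; the data is a made-up 1D toy set):

```python
# scikit-learn's SVC wraps the LIBSVM library.
from sklearn.svm import SVC

# Toy 1D training data: negatives on the left, positives on the right.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print(clf.predict([[3.0], [-0.5]]))   # predicts class +1, then class -1
```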