INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 13:
KERNEL MACHINES
Kernel Machines
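\[ \mathcal{X} = \{x^t, r^t\}_t \quad \text{where} \quad r^t = \begin{cases} +1 & \text{if } x^t \in C_1 \\ -1 & \text{if } x^t \in C_2 \end{cases} \]
find w and w0 such that
\[ w^T x^t + w_0 \ge +1 \quad \text{for } r^t = +1 \]
\[ w^T x^t + w_0 \le -1 \quad \text{for } r^t = -1 \]
which can be rewritten as
\[ r^t (w^T x^t + w_0) \ge +1 \]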
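The distance of \(x^t\) to the hyperplane is \( |w^T x^t + w_0| / \|w\| \).
We require
\[ \frac{r^t (w^T x^t + w_0)}{\|w\|} \ge \rho, \quad \forall t \]
For a unique solution, fix \( \rho \|w\| = 1 \); maximizing the margin then amounts to minimizing \( \|w\| \).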
Margin
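\[ \min \tfrac{1}{2} \|w\|^2 \quad \text{subject to} \quad r^t (w^T x^t + w_0) \ge +1, \ \forall t \]
\[ \begin{aligned}
L_p &= \tfrac{1}{2} \|w\|^2 - \sum_{t=1}^{N} \alpha^t \left[ r^t (w^T x^t + w_0) - 1 \right] \\
    &= \tfrac{1}{2} \|w\|^2 - \sum_{t=1}^{N} \alpha^t r^t (w^T x^t + w_0) + \sum_{t=1}^{N} \alpha^t
\end{aligned} \]
\[ \frac{\partial L_p}{\partial w} = 0 \ \Rightarrow \ w = \sum_{t=1}^{N} \alpha^t r^t x^t \]
\[ \frac{\partial L_p}{\partial w_0} = 0 \ \Rightarrow \ \sum_{t=1}^{N} \alpha^t r^t = 0 \]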
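\[ \begin{aligned}
L_d &= \tfrac{1}{2} w^T w - w^T \sum_t \alpha^t r^t x^t - w_0 \sum_t \alpha^t r^t + \sum_t \alpha^t \\
    &= -\tfrac{1}{2} w^T w + \sum_t \alpha^t \\
    &= -\tfrac{1}{2} \sum_t \sum_s \alpha^t \alpha^s r^t r^s (x^t)^T x^s + \sum_t \alpha^t
\end{aligned} \]
subject to \( \sum_t \alpha^t r^t = 0 \) and \( \alpha^t \ge 0, \ \forall t \)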
Most \(\alpha^t\) are 0 and only a small number have \(\alpha^t > 0\); the corresponding \(x^t\) are
the support vectors
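As an illustration of how the primal parameters follow from the dual solution, here is a minimal NumPy sketch. The two-point dataset and the values of `alpha` are assumptions chosen so the dual can be solved by hand (with one instance per class, the dual reduces to maximizing 2a - 4a² in the common value a, giving a = 1/4); they are not taken from the slides.

```python
import numpy as np

# Toy data (assumed for illustration): one instance per class.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
r = np.array([1.0, -1.0])

# Dual solution for this toy problem, worked out by hand (assumption).
alpha = np.array([0.25, 0.25])

# Dual equality constraint: sum_t alpha^t r^t = 0.
constraint = np.dot(alpha, r)

# w = sum_t alpha^t r^t x^t
w = (alpha * r) @ X

# Support vectors are the instances with alpha^t > 0.
sv = alpha > 1e-8

# On a support vector r^t (w^T x^t + w0) = 1, so w0 = r^t - w^T x^t;
# averaging over all support vectors is numerically more stable.
w0 = np.mean(r[sv] - X[sv] @ w)

# Functional margins r^t (w^T x^t + w0) and the geometric margin width 2 / ||w||.
margins = r * (X @ w + w0)
width = 2.0 / np.linalg.norm(w)
```

Both instances lie exactly on the margin here, so every functional margin equals 1 and the width equals the distance between the two points.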
Soft Margin Hyperplane
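\[ r^t (w^T x^t + w_0) \ge 1 - \xi^t \]
Soft error
\[ \sum_t \xi^t \]
New primal is
\[ L_p = \tfrac{1}{2} \|w\|^2 + C \sum_t \xi^t - \sum_t \alpha^t \left[ r^t (w^T x^t + w_0) - 1 + \xi^t \right] - \sum_t \mu^t \xi^t \]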
Hinge Loss
\[ L_{hinge}(y^t, r^t) = \begin{cases} 0 & \text{if } y^t r^t \ge 1 \\ 1 - y^t r^t & \text{otherwise} \end{cases} \]
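The hinge loss can be written compactly as max(0, 1 - y r). A minimal NumPy sketch follows; the function name `hinge_loss` and the sample values are illustrative assumptions.

```python
import numpy as np

def hinge_loss(y, r):
    """Hinge loss per instance: 0 if y*r >= 1, else 1 - y*r.

    y: model outputs y^t = w^T x^t + w0; r: labels in {-1, +1}.
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    return np.maximum(0.0, 1.0 - y * r)

# Outputs for three instances of the positive class (assumed values):
# outside the margin, inside the margin, and misclassified.
losses = hinge_loss([2.0, 0.5, -1.0], [1, 1, 1])
```

An instance that is correctly classified but inside the margin (the second one) still incurs a nonzero loss, which is what distinguishes the hinge loss from the 0/1 loss.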
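ν-SVM
\[ \min \tfrac{1}{2} \|w\|^2 - \nu \rho + \frac{1}{N} \sum_t \xi^t \]
subject to
\[ r^t (w^T x^t + w_0) \ge \rho - \xi^t, \quad \xi^t \ge 0, \quad \rho \ge 0 \]
\[ L_d = -\tfrac{1}{2} \sum_{t=1}^{N} \sum_s \alpha^t \alpha^s r^t r^s (x^t)^T x^s \]
subject to
\[ \sum_t \alpha^t r^t = 0, \quad 0 \le \alpha^t \le \frac{1}{N}, \quad \sum_t \alpha^t \ge \nu \]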
Replacing the inner product with a kernel \(K\), the discriminant is
\[ g(x) = \sum_t \alpha^t r^t K(x^t, x) \]
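A minimal sketch of evaluating the kernelized discriminant, assuming the dual variables have already been found. The function name `discriminant`, the toy data, and the `alpha` values are illustrative assumptions; a linear kernel is used so the result can be checked by hand.

```python
import numpy as np

def discriminant(x, X_sv, r_sv, alpha_sv, kernel):
    """g(x) = sum_t alpha^t r^t K(x^t, x), summed over the support vectors."""
    return sum(a * rt * kernel(xt, x)
               for a, rt, xt in zip(alpha_sv, r_sv, X_sv))

def linear_kernel(u, v):
    # K(u, v) = u^T v; any other valid kernel can be passed instead.
    return float(np.dot(u, v))

# Toy support vectors and dual variables (assumed for illustration).
X_sv = np.array([[1.0, 1.0], [-1.0, -1.0]])
r_sv = np.array([1.0, -1.0])
alpha_sv = np.array([0.25, 0.25])

g_pos = discriminant(np.array([1.0, 1.0]), X_sv, r_sv, alpha_sv, linear_kernel)
g_neg = discriminant(np.array([-2.0, 0.0]), X_sv, r_sv, alpha_sv, linear_kernel)
```

Only instances with nonzero alpha contribute to the sum, so in practice the loop runs over the support vectors rather than the whole training set.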
Vectorial Kernels
Polynomials of degree q:
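\[ K(x^t, x) = (x^T x^t + 1)^q \]
\[ \begin{aligned}
K(x, y) &= (x^T y + 1)^2 \\
        &= (x_1 y_1 + x_2 y_2 + 1)^2 \\
        &= 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2
\end{aligned} \]
\[ \phi(x) = \left[ 1, \sqrt{2} x_1, \sqrt{2} x_2, \sqrt{2} x_1 x_2, x_1^2, x_2^2 \right]^T \]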
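The equivalence between the degree-2 polynomial kernel and an explicit inner product in the six-dimensional feature space can be checked numerically. This is a small sketch; the test vectors are arbitrary assumptions.

```python
import numpy as np

def poly2_kernel(x, y):
    """Polynomial kernel of degree q = 2: (x^T y + 1)^2."""
    return (np.dot(x, y) + 1.0) ** 2

def phi(x):
    """Explicit basis functions for the degree-2 kernel on 2-D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, s * x1 * x2, x1 ** 2, x2 ** 2])

# Arbitrary 2-D vectors (assumed for illustration).
x = np.array([0.5, -1.5])
y = np.array([2.0, 1.0])

k_implicit = poly2_kernel(x, y)   # computed in the input space
k_explicit = phi(x) @ phi(y)      # computed in the feature space
```

Both values agree, but the kernel needs only one inner product in two dimensions, whereas the explicit map requires building two six-dimensional vectors first.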
Radial-basis functions:
\[ K(x^t, x) = \exp\left[ -\frac{\|x^t - x\|^2}{2 s^2} \right] \]
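A minimal NumPy sketch of the radial-basis kernel computed for whole sets of instances at once. The function name `rbf_kernel` and the sample points are assumptions; `s` is the spread parameter from the formula.

```python
import numpy as np

def rbf_kernel(X, Y, s=1.0):
    """Gram matrix with entries exp(-||x - y||^2 / (2 s^2))."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    # Squared distances via ||x||^2 + ||y||^2 - 2 x^T y.
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * s ** 2))

# Three sample points (assumed for illustration).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
G = rbf_kernel(pts, pts, s=1.0)
```

The kernel value is 1 for identical points and decays toward 0 with distance; a larger `s` makes the decay slower and the resulting discriminant smoother.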
Defining kernels
Kernel “engineering”
Defining good measures of similarity
String kernels, graph kernels, image kernels, ...
Empirical kernel map: Define a set of templates \(m_i\)
and score function \(s(x, m_i)\)
\[ \phi(x^t) = [s(x^t, m_1), s(x^t, m_2), \ldots, s(x^t, m_M)] \]
and
\[ K(x, x^t) = \phi(x)^T \phi(x^t) \]
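The empirical kernel map can be sketched in a few lines. Everything specific here is a hypothetical choice made for illustration: the string inputs, the three templates, and the score function `shared_chars` (number of distinct characters two strings share) are not from the slides.

```python
import numpy as np

def empirical_kernel_map(x, templates, score):
    """phi(x) = [s(x, m_1), ..., s(x, m_M)]."""
    return np.array([score(x, m) for m in templates], dtype=float)

def empirical_kernel(x, xt, templates, score):
    """K(x, x^t) = phi(x)^T phi(x^t)."""
    return float(empirical_kernel_map(x, templates, score)
                 @ empirical_kernel_map(xt, templates, score))

def shared_chars(x, m):
    # Hypothetical score function: count of distinct shared characters.
    return len(set(x) & set(m))

templates = ["abc", "bcd", "xyz"]                       # hypothetical templates
phi_a = empirical_kernel_map("abcd", templates, shared_chars)
phi_b = empirical_kernel_map("cdx", templates, shared_chars)
k = empirical_kernel("abcd", "cdx", templates, shared_chars)
```

Because the kernel is defined as an explicit inner product of the mapped vectors, it is a valid kernel whatever the score function is; the quality of the similarity measure depends entirely on the choice of templates and score.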
Multiple Kernel Learning
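With an adaptive combination \( K(x, y) = \sum_i \eta_i K_i(x, y) \), the dual is
\[ L_d = \sum_t \alpha^t - \tfrac{1}{2} \sum_t \sum_s \alpha^t \alpha^s r^t r^s \sum_i \eta_i K_i(x^t, x^s) \]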
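A minimal sketch of evaluating this dual objective for a weighted sum of kernels. The weights `eta` are fixed by hand here rather than learned, and the toy data, `alpha`, and the function names are assumptions for illustration only.

```python
import numpy as np

def combined_gram(grams, eta):
    """Gram matrix of the combined kernel: sum_i eta_i K_i."""
    return sum(e * K for e, K in zip(eta, grams))

def dual_objective(alpha, r, K):
    """L_d = sum_t alpha^t - 1/2 sum_t sum_s alpha^t alpha^s r^t r^s K[t, s]."""
    v = alpha * r
    return alpha.sum() - 0.5 * v @ K @ v

# Toy data and dual variables (assumed for illustration).
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
r = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])

K_lin = X @ X.T                   # linear kernel
K_poly = (X @ X.T + 1.0) ** 2     # polynomial kernel, q = 2

L_lin = dual_objective(alpha, r, combined_gram([K_lin, K_poly], [1.0, 0.0]))
L_mix = dual_objective(alpha, r, combined_gram([K_lin, K_poly], [0.5, 0.5]))
```

Setting one weight to 1 and the rest to 0 recovers the single-kernel dual; in multiple kernel learning the weights would be optimized together with the alphas instead of being fixed.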
1-vs-all
Pairwise separation
Error-Correcting Output Codes (section 17.5)
Single multiclass optimization
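\[ \min \tfrac{1}{2} \sum_{i=1}^{K} \|w_i\|^2 + C \sum_i \sum_t \xi_i^t \]
subject to
\[ w_{z^t}^T x^t + w_{z^t 0} \ge w_i^T x^t + w_{i0} + 2 - \xi_i^t, \quad \forall i \ne z^t, \quad \xi_i^t \ge 0 \]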
SVM for Regression
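\[ \min \tfrac{1}{2} \|w\|^2 + C \sum_t (\xi_+^t + \xi_-^t) \]
subject to
\[ r^t - (w^T x^t + w_0) \le \epsilon + \xi_+^t \]
\[ (w^T x^t + w_0) - r^t \le \epsilon + \xi_-^t \]
\[ \xi_+^t, \xi_-^t \ge 0 \]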
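The two slack variables measure how far a prediction falls outside the ε-tube, above or below. A minimal NumPy sketch, with assumed function names and sample values:

```python
import numpy as np

def svr_slacks(r, f, eps):
    """Smallest slacks satisfying the two regression constraints.

    xi_plus  = max(0, r - f - eps)   (prediction too low)
    xi_minus = max(0, f - r - eps)   (prediction too high)
    """
    r = np.asarray(r, dtype=float)
    f = np.asarray(f, dtype=float)
    xi_plus = np.maximum(0.0, r - f - eps)
    xi_minus = np.maximum(0.0, f - r - eps)
    return xi_plus, xi_minus

def eps_insensitive_loss(r, f, eps):
    """Per-instance loss: 0 inside the tube, |r - f| - eps outside."""
    xi_plus, xi_minus = svr_slacks(r, f, eps)
    return xi_plus + xi_minus

# Targets and predictions f(x^t) = w^T x^t + w0 (assumed values).
r = [1.0, 1.0, 1.0]
f = [1.2, 2.0, 0.0]
xi_plus, xi_minus = svr_slacks(r, f, eps=0.5)
loss = eps_insensitive_loss(r, f, eps=0.5)
```

Errors smaller than ε cost nothing, and larger errors are penalized linearly rather than quadratically, which makes the fit less sensitive to outliers than squared error.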
Kernel Regression
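For ranking, each constraint \(t\) involves a pair \((u, v)\) with \(r^u \prec r^v\), and the scores must be in the correct order with a margin of at least one unit:
\[ \min \tfrac{1}{2} \|w\|^2 + C \sum_t \xi^t \]
subject to
\[ w^T x^u \ge w^T x^v + 1 - \xi^t, \quad \forall t : r^u \prec r^v, \quad \xi^t \ge 0 \]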
One-Class Kernel Machines
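\[ \min R^2 + C \sum_t \xi^t \]
subject to
\[ \|x^t - a\|^2 \le R^2 + \xi^t, \quad \xi^t \ge 0 \]
\[ L_d = \sum_{t=1}^{N} \alpha^t (x^t)^T x^t - \sum_{t=1}^{N} \sum_s \alpha^t \alpha^s (x^t)^T x^s \]
subject to
\[ 0 \le \alpha^t \le C, \quad \sum_t \alpha^t = 1 \]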
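A small numerical sketch of the one-class dual for an enclosing sphere with center a and radius R. The two-point dataset and `alpha` are assumptions chosen so the optimum can be verified by hand; the center formula a = Σ_t α^t x^t used in the code follows from setting the derivative of the Lagrangian with respect to a to zero, and C is assumed larger than 0.5 so that neither α^t is at its upper bound.

```python
import numpy as np

# Toy data (assumed): two points that the sphere must enclose.
X = np.array([[0.0, 0.0], [2.0, 0.0]])

# Dual solution for this toy problem, worked out by hand (assumption);
# it satisfies sum_t alpha^t = 1 and 0 <= alpha^t <= C for any C > 0.5.
alpha = np.array([0.5, 0.5])

# Gram matrix of inner products (x^t)^T x^s.
K = X @ X.T

# L_d = sum_t alpha^t (x^t)^T x^t - sum_t sum_s alpha^t alpha^s (x^t)^T x^s
L_d = alpha @ np.diag(K) - alpha @ K @ alpha

# Center of the sphere: a = sum_t alpha^t x^t.
a = alpha @ X

# An instance with 0 < alpha^t < C lies on the surface, so its squared
# distance to the center gives R^2.
R2 = np.sum((X[0] - a) ** 2)
```

The center lands midway between the two points and the dual objective equals R², as expected at the optimum when no slack is used. Replacing `X @ X.T` with a kernel Gram matrix gives the kernelized version of the same computation.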
Large Margin Nearest Neighbor