MOTIVATION: KERNELS
Idea
More precisely
Substitute $k(x, x')$ for every occurrence of $\langle x, x' \rangle$ in the SVM formulae.
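The function $k_{\mathrm{RBF}}(x, x') := \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$ for some $\sigma \in \mathbb{R}_+$
is called an RBF kernel (RBF = radial basis function). The parameter $\sigma$ is called the
bandwidth.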
Other names for $k_{\mathrm{RBF}}$: Gaussian kernel, squared-exponential kernel.
If we fix $x'$, the function $k_{\mathrm{RBF}}(\,\cdot\,, x')$ is (up to scaling) a spherical Gaussian density on
$\mathbb{R}^d$, with mean $x'$ and standard deviation $\sigma$.
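As a minimal sketch of the RBF kernel, assuming the parametrization $\exp(-\|x - x'\|^2 / (2\sigma^2))$ with bandwidth `sigma`, the following pure-Python code evaluates the kernel and builds the matrix of pairwise kernel values. The function names `rbf_kernel` and `gram_matrix` and the sample points are illustrative choices, not part of the slides.

```python
import math

def rbf_kernel(x, x_prime, sigma=1.0):
    """RBF kernel exp(-||x - x'||^2 / (2 sigma^2)) for two equal-length sequences."""
    if sigma <= 0:
        raise ValueError("sigma (the bandwidth) must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, kernel):
    """Matrix of all pairwise kernel values k(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]

# Illustrative points in R^2 (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = gram_matrix(points, rbf_kernel)

# A smaller bandwidth makes the kernel decay faster with distance.
narrow = rbf_kernel(points[0], points[1], sigma=0.2)
wide = rbf_kernel(points[0], points[1], sigma=1.0)
```

The diagonal entries equal 1 because each point has distance zero to itself, and the entries shrink as points move apart, which is the "similarity" reading of the kernel.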
[Figure: surface plots over $x_1$ and $x_2$ for $h = 0.2$, $h = 0.5$ and $h = 1$; panel (c): contours ($d = 2$).]
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} y_i \alpha_i \, k_{\mathrm{RBF}}(x_i, x) \right)$$
Circled points are support vectors. The two contour lines running through the
support vectors are the nonlinear counterparts of the convex hulls.
The lines in the image are contour lines of this surface. The classifier runs
along the bottom of the "valley" between the two classes.
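The classifier with an RBF kernel can be sketched in a few lines. This is an illustration only: the support vectors, labels, and coefficients `alphas` below are hand-picked hypothetical values, not the result of training, and the helper name `svm_decision` is an assumption. Ties at a score of exactly zero are assigned to class +1 by convention here.

```python
import math

def rbf_kernel(x, x_prime, sigma=1.0):
    """RBF kernel exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, sigma=1.0):
    """Sign of sum_i y_i * alpha_i * k_RBF(x_i, x); returns +1 or -1."""
    score = sum(
        y_i * a_i * rbf_kernel(x_i, x, sigma)
        for x_i, y_i, a_i in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1

# Hypothetical support vectors and coefficients (not trained values).
support_vectors = [(-1.0, 0.0), (1.0, 0.0)]
labels = [-1, +1]
alphas = [0.5, 0.5]  # chosen so that sum_i y_i * alpha_i = 0

near_positive = svm_decision((0.8, 0.1), support_vectors, labels, alphas)
near_negative = svm_decision((-2.0, 0.5), support_vectors, labels, alphas)
```

Each support vector contributes a Gaussian "bump" of its own sign, so a new point takes the label of whichever class contributes more kernel mass at its location.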
CHOOSING A KERNEL
Theory
To define a kernel:
Practice
The data analyst does not define a kernel, but tries some well-known standard kernels
until one seems to work. Most common choices:
The "linear kernel" $k_{\mathrm{SP}}(x, x') = \langle x, x' \rangle$, i.e. the standard, linear SVM.
$k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{F}}$
for all $x, x' \in \mathbb{R}^d$.
In other words
That is why the SVM still works: It still uses scalar products, just on another
space.
The mapping $\phi$
To make this work, $\phi$ has to transform the data into data on which a linear SVM
works.
$$\phi(x_1, x_2) := \left( x_1^2,\; 2 x_1 x_2,\; x_2^2 \right)$$
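To see how such a map can turn a nonlinear problem into a linear one, the sketch below assumes the quadratic map $\phi(x_1, x_2) = (x_1^2, 2 x_1 x_2, x_2^2)$ and two hypothetical classes lying on concentric circles. In the original plane no straight line separates them, but in feature space the first and third coordinates sum to $x_1^2 + x_2^2$, so a hand-picked hyperplane does. The data, the weight vector `w`, and the offset `c` are all illustrative assumptions.

```python
import math

def phi(x1, x2):
    """Assumed quadratic feature map R^2 -> R^3."""
    return (x1 ** 2, 2.0 * x1 * x2, x2 ** 2)

def on_circle(radius, n):
    """n evenly spaced points on a circle around the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

inner = on_circle(1.0, 8)  # hypothetical class -1
outer = on_circle(2.0, 8)  # hypothetical class +1

# Hand-picked hyperplane in feature space: z1 + z3 = 2.5,
# i.e. x1^2 + x2^2 = 2.5 in the original coordinates.
w, c = (1.0, 0.0, 1.0), 2.5

def linear_classifier(z):
    """Linear rule sign(<w, z> - c) applied to a feature vector z."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) - c > 0 else -1

inner_labels = [linear_classifier(phi(*p)) for p in inner]
outer_labels = [linear_classifier(phi(*p)) for p in outer]
```

The decision boundary is a plane in feature space but a circle of radius $\sqrt{2.5}$ in the original plane, which is the sense in which a linear SVM on the mapped data yields a nonlinear classifier.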
Solution
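$$k(x, x') = \sum_{j=1}^{\infty} \lambda_j \, \phi_j(x) \, \phi_j(x')$$
The $\phi_j$ are functions $\mathbb{R}^d \to \mathbb{R}$ and $\lambda_j \geq 0$. This means the (possibly infinite) vector
$\phi(x) = (\sqrt{\lambda_1}\,\phi_1(x), \sqrt{\lambda_2}\,\phi_2(x), \ldots)$ is a feature map.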
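The feature-map claim can be checked in one line. The sketch below writes $\lambda_j \geq 0$ for the coefficients and $\phi_j$ for the functions in the expansion of $k$, and assumes the scalar product on the (possibly infinite) sequence space is the usual sum of coordinate products:

```latex
\langle \phi(x), \phi(x') \rangle
  = \sum_{j} \sqrt{\lambda_j}\,\phi_j(x) \cdot \sqrt{\lambda_j}\,\phi_j(x')
  = \sum_{j} \lambda_j \,\phi_j(x)\,\phi_j(x')
  = k(x, x')
```

The square roots are what make the two factors combine into $\lambda_j$, and they are well defined because the $\lambda_j$ are non-negative.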
Kernel arithmetic
Various functions of kernels are again kernels: If $k_1$ and $k_2$ are kernels, then e.g.
$k_1 + k_2$, $k_1 \cdot k_2$ and $\text{const.} \cdot k_1$
are again kernels.
Peter Orbanz Data Mining
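The closure rules for kernels can be probed numerically. The sketch below combines a linear and an RBF kernel by sum, product, and scaling (assuming a positive constant) and evaluates the quadratic form $\sum_{i,j} c_i c_j k(x_i, x_j)$ for random coefficient vectors, which should never be negative for a kernel. This is a sanity check under stated assumptions, not a proof; the helper names, the random points, and the tolerance are illustrative choices.

```python
import math
import random

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_sum(k1, k2):
    return lambda x, y: k1(x, y) + k2(x, y)

def kernel_product(k1, k2):
    return lambda x, y: k1(x, y) * k2(x, y)

def kernel_scale(const, k1):
    if const <= 0:
        raise ValueError("const is assumed to be positive")
    return lambda x, y: const * k1(x, y)

def quadratic_form(kernel, points, coeffs):
    """sum_{i,j} c_i c_j k(x_i, x_j); non-negative whenever k is a kernel."""
    return sum(
        ci * cj * kernel(xi, xj)
        for ci, xi in zip(coeffs, points)
        for cj, xj in zip(coeffs, points)
    )

rng = random.Random(0)  # fixed seed so the check is reproducible
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]

combined = {
    "sum": kernel_sum(linear_kernel, rbf_kernel),
    "product": kernel_product(linear_kernel, rbf_kernel),
    "scaled": kernel_scale(3.0, linear_kernel),
}

# Smallest quadratic-form value seen over 200 random coefficient vectors.
smallest = {}
for name, kernel in combined.items():
    values = []
    for _ in range(200):
        coeffs = [rng.uniform(-1, 1) for _ in points]
        values.append(quadratic_form(kernel, points, coeffs))
    smallest[name] = min(values)
```

A negative value well below rounding error would show that a combination is not a kernel; the absence of one in a random search is only supporting evidence.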
THE KERNEL TRICK
Kernels in general
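$$\min_{v_H, c} \; \frac{1}{2} \|v_H\|_{\mathcal{F}}^2 + \frac{\gamma}{2} \sum_{i=1}^{n} \xi_i^2 \quad \text{s.t.} \quad y_i \left( k(v_H, x_i) - c \right) \geq 1 - \xi_i \ \text{ and } \ \xi_i \geq 0$$

$$\max_{\alpha} \; W(\alpha) := \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \left( k(x_i, x_j) + \frac{1}{\gamma} \mathbb{I}\{i = j\} \right) \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0 \ \text{ and } \ \alpha_i \geq 0$$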
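The dual objective $W(\alpha)$ and its constraints can be evaluated directly for any candidate $\alpha$. In the sketch below, `gamma` denotes the slack-penalty parameter that scales the diagonal term $\mathbb{I}\{i = j\}$ (the symbol name is an assumption), and the data, labels, and coefficients are hypothetical. The code only evaluates the objective; it does not solve the optimization problem.

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def dual_objective(alphas, points, labels, kernel, gamma):
    """W(alpha) = sum_i alpha_i
       - 1/2 * sum_{i,j} alpha_i alpha_j y_i y_j (k(x_i, x_j) + I{i=j} / gamma)."""
    n = len(alphas)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            k_ij = kernel(points[i], points[j])
            if i == j:
                k_ij += 1.0 / gamma  # diagonal term contributed by the slack variables
            quad += alphas[i] * alphas[j] * labels[i] * labels[j] * k_ij
    return sum(alphas) - 0.5 * quad

def is_feasible(alphas, labels, tol=1e-12):
    """Check alpha_i >= 0 for all i and sum_i y_i alpha_i = 0 (up to tol)."""
    balanced = abs(sum(y * a for y, a in zip(labels, alphas))) <= tol
    return balanced and all(a >= 0 for a in alphas)

# Hypothetical two-point problem.
points = [(-1.0, 0.0), (1.0, 0.0)]
labels = [-1, +1]

w_value = dual_objective([0.5, 0.5], points, labels, linear_kernel, gamma=1.0)
feasible = is_feasible([0.5, 0.5], labels)
infeasible = is_feasible([1.0, 0.0], labels)
```

Because the kernel enters only through the values $k(x_i, x_j)$, swapping `linear_kernel` for any other kernel function changes the classifier without changing this code.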
Classifier
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} y_i \alpha_i \, k(x_i, x) \right)$$
SUMMARY: SVMs
Basic SVM
Full-fledged SVM
Ingredient
Purpose
Maximum margin
Slack variables
Kernel
Use in practice
HISTORY