If we let k(x, y) = Φ(x) · Φ(y) be a kernel function, then we can write our support vector machine in terms of kernels,

f̂(x) = sign( Σ_{i=1}^{l} α_i y_i k(x_i, x) − Σ_{i=1}^{l} α_i y_i k(x_i, x_sv⁺) + 1 ),

where x_sv⁺ is a support vector with label +1.
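The kernelized decision function above can be sketched directly. This is a minimal illustration, not the book's code; the Gaussian kernel and the function names are my own choices, and the α values would come from a trained model:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-|x - y|^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def decision(x, X, y, alpha, x_sv_plus, kernel=gaussian_kernel):
    # f(x) = sign( sum_i alpha_i y_i k(x_i, x)
    #            - sum_i alpha_i y_i k(x_i, x_sv+) + 1 )
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    b = sum(a * yi * kernel(xi, x_sv_plus) for a, yi, xi in zip(alpha, y, X)) - 1
    return np.sign(s - b)
```

Note that evaluating the decision function at x = x_sv⁺ itself always yields +1, since the two sums cancel and only the +1 term remains — a quick sanity check on the bias construction.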
We can write our training algorithm in terms of kernel functions as well,

α* = argmax_α ( Σ_{i=1}^{l} α_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j k(x_i, x_j) ),

subject to the constraints,

Σ_{i=1}^{l} α_i y_i = 0,
α_i ≥ 0, i = 1, …, l.
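The dual problem above is a quadratic program; as a sketch (not the training algorithm from the slides), it can be handed to a general-purpose constrained optimizer such as SciPy's SLSQP. The function name `train_dual` is my own:

```python
import numpy as np
from scipy.optimize import minimize

def train_dual(X, y, kernel):
    l = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])
    # Negate the objective: we maximize
    # W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij
    def neg_W(alpha):
        return -(alpha.sum() - 0.5 * alpha @ (np.outer(y, y) * K) @ alpha)
    cons = {"type": "eq", "fun": lambda a: a @ y}   # sum_i alpha_i y_i = 0
    bounds = [(0, None)] * l                        # alpha_i >= 0
    res = minimize(neg_W, np.zeros(l), method="SLSQP",
                   bounds=bounds, constraints=cons)
    return res.x
```

For the two 1-D points x₁ = −1 (label −1) and x₂ = +1 (label +1) with the linear kernel, the constraint forces α₁ = α₂ and the maximum is at α₁ = α₂ = 1/2, which the optimizer recovers.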
Selecting the right kernel for a particular non-linear classification problem is called feature search.
Kernel Functions
Kernel Name                          Kernel Function
Linear Kernel                        k(x, y) = x · y
Homogeneous Polynomial Kernel        k(x, y) = (x · y)^d
Non-Homogeneous Polynomial Kernel    k(x, y) = (x · y + c)^d
Gaussian Kernel                      k(x, y) = exp(−|x − y|² / (2σ²))
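The four kernels in the table translate directly into code. A minimal sketch, with default values for d, c, and σ chosen arbitrarily for illustration:

```python
import numpy as np

def linear(x, y):
    # k(x, y) = x . y
    return np.dot(x, y)

def homogeneous_poly(x, y, d=2):
    # k(x, y) = (x . y)^d
    return np.dot(x, y) ** d

def non_homogeneous_poly(x, y, c=1.0, d=2):
    # k(x, y) = (x . y + c)^d
    return (np.dot(x, y) + c) ** d

def gaussian(x, y, sigma=1.0):
    # k(x, y) = exp(-|x - y|^2 / (2 sigma^2))
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-np.sum(diff ** 2) / (2 * sigma ** 2))
```

Note that the Gaussian kernel always evaluates to 1 when x = y, regardless of σ.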
Non-linear Classifiers
Let's review classification with non-linear SVMs:
1. We have a non-linear data set.
2. Pick a kernel other than the linear kernel; this means that the input space will be transformed into a higher-dimensional feature space.
3. Solve our dual maximum-margin problem in the feature space (we are now solving a linear classification problem).
4. The resulting model is a linear model in feature space and a non-linear model in input space.
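The steps above can be sketched end to end. This example uses scikit-learn's SVC (a tool choice of mine, not from the slides) on a hypothetical toy data set of two concentric rings, which no line in input space can separate:

```python
import numpy as np
from sklearn.svm import SVC

# Step 1: a non-linear data set — an inner ring (class -1) and an
# outer ring (class +1) around the origin.
theta = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inner = np.c_[np.cos(theta), np.sin(theta)]          # radius 1
outer = np.c_[3 * np.cos(theta), 3 * np.sin(theta)]  # radius 3
X = np.vstack([inner, outer])
y = np.array([-1] * 20 + [1] * 20)

# Step 2: pick a non-linear (Gaussian/RBF) kernel.
# Steps 3-4: fit() solves the dual problem; the model is linear in
# feature space but non-linear in input space.
clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
print(clf.score(X, y))  # training accuracy
```

The Gaussian kernel separates the rings perfectly, while a linear kernel on the same data cannot.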
k(x, y) = Φ(x) · Φ(y) = (x · y)². That is, we picked our mapping from input space into feature space in such a way that the kernel in feature space can be evaluated in input space. This raises the question: what about the other kernels? What do the associated feature spaces and mappings look like? It turns out that for each kernel function we can construct a canonical feature space and mapping. This implies that feature spaces and mappings for kernels are not unique!
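The identity Φ(x) · Φ(y) = (x · y)² can be checked numerically. For x ∈ ℝ², one standard choice of mapping (an assumption here, since the slides do not spell it out) is Φ(x) = (x₁², √2 x₁x₂, x₂²):

```python
import numpy as np

def phi(x):
    # One explicit feature map for k(x, y) = (x . y)^2 on R^2
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(np.dot(x, y) ** 2)        # kernel evaluated in input space
print(np.dot(phi(x), phi(y)))   # dot product in feature space — same value
```

Both expressions evaluate to the same number, illustrating that the dot product in the 3-dimensional feature space never has to be computed explicitly.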
Properties of Kernels
Definition: [Positive Definite Kernel] A function k : ℝⁿ × ℝⁿ → ℝ such that

Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j k(x_i, x_j) ≥ 0

holds for all l > 0, x_1, …, x_l ∈ ℝⁿ, and α_1, …, α_l ∈ ℝ is called a positive definite kernel.
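The defining condition says that every Gram matrix K with K_ij = k(x_i, x_j) gives a non-negative quadratic form. A minimal sketch of an empirical check (random coefficient vectors α; the function name is my own, and a randomized test can only falsify, not prove, positive definiteness):

```python
import numpy as np

def is_positive_definite_kernel(kernel, points, trials=100, seed=0):
    # Check sum_ij a_i a_j k(x_i, x_j) >= 0 for random coefficient
    # vectors a, i.e. that the Gram matrix acts positive semi-definite.
    K = np.array([[kernel(x, z) for z in points] for x in points])
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a = rng.normal(size=len(points))
        if a @ K @ a < -1e-9:
            return False
    return True
```

The Gaussian kernel passes this check, while a deliberately flipped kernel such as k(x, y) = −x · y fails it.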
Properties of Kernels
New notation: Let k : ℝⁿ × ℝⁿ → ℝ be a kernel; then k(·, x), with x ∈ ℝⁿ, is a partially evaluated kernel and represents a function ℝⁿ → ℝ.
Theorem: [Reproducing Kernel Property] Let k : ℝⁿ × ℝⁿ → ℝ be a positive definite kernel; then the following property holds,

k(x, y) = k(x, ·) · k(·, y),

with x, y ∈ ℝⁿ.
Let Φ(x) = k(·, x); then by the reproducing kernel property,

Φ(x) · Φ(y) = k(·, x) · k(·, y) = k(x, y).

The section on kernels in the book shows that the construction Φ(x) · Φ(y) is indeed well defined and represents a dot product in an appropriate feature space.