
Kernel Functions

If we let k(x, y) = Φ(x) · Φ(y) be a kernel function, then we can write our support vector machine in terms of kernels,

\[
\hat{f}(x) = \mathrm{sign}\left( \sum_{i=1}^{l} \alpha_i y_i\, k(x_i, x) \;-\; \sum_{i=1}^{l} \alpha_i y_i\, k(x_i, x_{sv}) \;+\; 1 \right),
\]

where x_sv is a support vector with label y_sv = +1, used to recover the offset.
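Once the α_i and a support vector are known, this decision function is direct to evaluate. A minimal NumPy sketch with hypothetical names (alphas, ys, xs, sv, and a kernel k are assumed to come from training):

    import numpy as np

    def decision(x, xs, ys, alphas, sv, k):
        """Kernelized SVM decision function.

        xs: (l, n) training points, ys: (l,) labels in {-1, +1},
        alphas: (l,) multipliers from the dual solution,
        sv: index of a support vector, k: kernel function.
        """
        # sum_i alpha_i y_i k(x_i, x)
        score = sum(a * y * k(xi, x) for a, y, xi in zip(alphas, ys, xs))
        # offset recovered from the support vector x_sv; with y_sv = +1
        # this reproduces the "+1" in the formula above
        b = sum(a * y * k(xi, xs[sv]) for a, y, xi in zip(alphas, ys, xs)) - ys[sv]
        return np.sign(score - b)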

We can write our training algorithm in terms of kernel functions as well,

\[
\alpha^{*} = \operatorname*{argmax}_{\alpha} \left( \sum_{i=1}^{l} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j) \right),
\]

subject to the constraints,

\[
\sum_{i=1}^{l} \alpha_i y_i = 0, \qquad \alpha_i \ge 0,\; i = 1, \ldots, l.
\]
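Given a kernel, the dual objective is easy to evaluate from the Gram matrix. A minimal NumPy sketch (names are hypothetical):

    import numpy as np

    def dual_objective(alphas, xs, ys, k):
        """sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j k(x_i, x_j)."""
        K = np.array([[k(xi, xj) for xj in xs] for xi in xs])  # Gram matrix
        ay = alphas * ys
        return alphas.sum() - 0.5 * ay @ K @ ay

Maximizing this objective subject to the two constraints is a quadratic program; in practice it is handed to a QP solver or an SMO-style routine rather than solved by hand.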

Selecting the right kernel for a particular non-linear classification problem is called a feature search.
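In practice a feature search often amounts to a cross-validated sweep over candidate kernels and their free parameters (see the table on the next slide). A sketch using scikit-learn's GridSearchCV; the grid values are illustrative, not prescriptive:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # One sub-grid per kernel family; degree, coef0, and gamma are
    # the free parameters of the respective kernels.
    param_grid = [
        {"kernel": ["linear"]},
        {"kernel": ["poly"], "degree": [2, 3, 4], "coef0": [0.0, 1.0]},
        {"kernel": ["rbf"], "gamma": [0.01, 0.1, 1.0]},
    ]
    search = GridSearchCV(SVC(), param_grid, cv=5)
    # search.fit(X, y)        # X, y: your training data
    # search.best_params_     # the kernel and parameters the search selected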


Kernel Functions
Kernel Name                          Kernel Function                        Free Parameters
Linear Kernel                        k(x, y) = x · y                        none
Homogeneous Polynomial Kernel        k(x, y) = (x · y)^d                    d ≥ 2
Non-Homogeneous Polynomial Kernel    k(x, y) = (x · y + c)^d                d ≥ 2, c > 0
Gaussian Kernel                      k(x, y) = e^(−|x − y|² / 2σ²)          σ > 0
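Each of these kernels is a one-liner in NumPy; a minimal sketch with the free parameters as keyword arguments (default values are arbitrary):

    import numpy as np

    def linear(x, y):
        return x @ y

    def homogeneous_poly(x, y, d=2):
        return (x @ y) ** d

    def non_homogeneous_poly(x, y, d=2, c=1.0):
        return (x @ y + c) ** d

    def gaussian(x, y, sigma=1.0):
        return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))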


Non-linear Classifiers

Let's review classification with non-linear SVMs (see the sketch after this list):

1. We have a non-linear data set.
2. Pick a kernel other than the linear kernel; this means that the input space will be transformed into a higher-dimensional feature space.
3. Solve our dual maximum-margin problem in the feature space (we are now solving a linear classification problem).
4. The resulting model is a linear model in feature space and a non-linear model in input space.
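A hedged end-to-end sketch of these four steps with scikit-learn, using a synthetic non-linear data set (two concentric circles) and the Gaussian (RBF) kernel:

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # 1. A non-linear data set: two concentric circles.
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    # 2.-3. Pick a non-linear kernel and solve the dual problem in feature space.
    model = SVC(kernel="rbf", gamma=1.0).fit(X, y)

    # 4. The model is linear in feature space, non-linear in input space.
    print(model.score(X, y))  # training accuracy of the non-linear boundary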


A Closer Look at Kernels


We have shown that for the mapping

\[
\Phi(x_1, x_2) = (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2)
\]

the kernel

\[
k(x, y) = \Phi(x) \cdot \Phi(y) = (x \cdot y)^2 .
\]

That is, we picked our mapping from input space into feature space in such a way that the kernel in feature space can be evaluated in input space. This begs the question: what about the other kernels? What do the associated feature spaces and mappings look like? It turns out that for each kernel function we can construct a canonical feature space and mapping. This implies that feature spaces and mappings for kernels are not unique!
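The identity Φ(x) · Φ(y) = (x · y)² is easy to confirm numerically; a minimal check:

    import numpy as np

    def phi(x):
        """Canonical map from the input space R^2 into the feature space R^3."""
        return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

    x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    # dot product in feature space == kernel evaluated in input space
    assert np.isclose(phi(x) @ phi(y), (x @ y) ** 2)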


Properties of Kernels
Definition: [Positive Definite Kernel] A function k : R^n × R^n → R such that

\[
\sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j\, k(x_i, x_j) \ge 0
\]

holds is called a positive definite kernel. Here, α_i, α_j ∈ R and x_1, ..., x_l is a set of points in R^n.
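Since Σ_i Σ_j α_i α_j k(x_i, x_j) = αᵀKα for the Gram matrix K with K[i, j] = k(x_i, x_j), positive definiteness can be spot-checked numerically: K must have no negative eigenvalues (up to round-off). A sketch using the Gaussian kernel with σ = 1:

    import numpy as np

    rng = np.random.default_rng(0)
    xs = rng.normal(size=(20, 3))  # 20 arbitrary points in R^3

    # Gram matrix of the Gaussian kernel with sigma = 1
    K = np.array([[np.exp(-np.linalg.norm(a - b) ** 2 / 2) for b in xs]
                  for a in xs])

    # alpha^T K alpha >= 0 for all alpha  <=>  no negative eigenvalues
    print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True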


Properties of Kernels
New notation: Let k : R^n × R^n → R be a kernel; then k(·, x) is a partially evaluated kernel with x ∈ R^n and represents a function R^n → R.

Theorem: [Reproducing Kernel Property] Let k : R^n × R^n → R be a positive definite kernel; then the following property holds,

\[
k(x, y) = k(x, \cdot) \cdot k(\cdot, y),
\]

with x, y ∈ R^n.


Feature Spaces are not Unique


We illustrate that feature spaces are not unique using our homogeneous polynomial kernel to the power of two, that is, k(x, y ) = (x y )2 with x, y R2 . Let : R2 R3 such that 2 2 2 2 (x) = (x1 , x2 ) = (x1 , x2 , 2x1 x2 ) and : R2 {R2 R} with (x) = k(, x) = (() x) , be two mappings from our input space to two different feature spaces, then 2 2 2 2 2 2 2 2 (x) (y ) = (x1 , x2 , 2x1 x2 ) (y1 , y 2 , 2y 1 y 2 ) = (x y )
2 2

= k(x, y ) = k(, x) k(, y ) = (() x) (() y ) = (x) (y). The section on kernels in the book shows that the construction (x) (y ) is indeed well dened and represents a dot product in an appropriate feature space.
2 2

