
Support Vector Machines (SVM)

Y.H. Hu SVM − 1

Outline

• Linear hyperplane classifier

• Optimal hyperplane and separation gap

• Primal/dual quadratic optimization

• Non-separable cases: slack variables

• Inner product kernels

• Mercer's Theorem

• Testing with kernels


Linear Hyperplane Classifier

Given: {(xi, di); i = 1 to N, di ∈ {+1, −1}}.

A linear hyperplane classifier is a hyperplane consisting of points x such that

H = {x | g(x) = wTx + b = 0}

For x on the side of H with d = +1: wTx + b ≥ 0. For x on the other side: wTx + b ≤ 0; d = −1.

Distance from x to H: r = wTx/|w| − (−b/|w|) = g(x)/|w|


Distance from a Point to a Hyper-plane

The hyperplane H is characterized by

(*) wTx + b = 0

(*) says that any vector x on H, projected onto w, will have a length of OA = −b/|w|.

For a point x* at distance r from H, the projection onto the vector w is

wTx*/|w| = OA + BC. Or equivalently, wTx*/|w| = r − b/|w|.

Hence r = (wTx* + b)/|w| = g(x*)/|w|
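The result above is a one-line computation. A minimal Python sketch (the hyperplane 3x1 + 4x2 − 5 = 0 and the test point are made-up values for illustration):

```python
import math

def distance_to_hyperplane(w, b, x):
    """Signed distance r = g(x)/|w|, where g(x) = w.x + b."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return g / norm_w

# Example: hyperplane 3*x1 + 4*x2 - 5 = 0, point (3, 4)
w, b = [3.0, 4.0], -5.0
r = distance_to_hyperplane(w, b, [3.0, 4.0])
print(r)  # (9 + 16 - 5)/5 = 4.0
```

The sign of r tells which side of H the point lies on, which is exactly the d = ±1 decision on the previous slide.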


Optimal Hyperplane: Linearly Separable Case

• The optimal hyperplane should be in the center of the gap.

• Support vectors − samples on the boundaries. The support vectors alone can determine the optimal hyperplane.

• Question: How to find the optimal hyperplane?

For di = +1, g(xi) = wTxi + b ≥ ρ|w| ⇒ woTxi + bo ≥ 1

For di = −1, g(xi) = wTxi + b ≤ −ρ|w| ⇒ woTxi + bo ≤ −1


Separation Gap

For xi being a support vector,

For di = +1, g(xi) = wTxi + b = ρ|w| ⇒ woTxi + bo = 1

For di = −1, g(xi) = wTxi + b = −ρ|w| ⇒ woTxi + bo = −1

Hence wo = w/(ρ|w|), bo = b/(ρ|w|). But the distance from xi to the hyperplane is ρ = g(xi)/|w|. Thus wo = w/g(xi), and ρ = 1/|wo|.

The maximum distance between the two classes is 2ρ = 2/|wo|.

Hence the objective is to find wo, bo that minimize |wo| (so that ρ is maximized) subject to the constraints

woTxi + bo ≥ 1 for di = +1; and woTxi + bo ≤ −1 for di = −1.

Combining these constraints, one has: di(woTxi + bo) ≥ 1


Quadratic Optimization Problem Formulation

Given {(xi, di); i = 1 to N}, find w and b such that

φ(w) = wTw/2

is minimized subject to the N constraints

di(wTxi + b) ≥ 1; 1 ≤ i ≤ N.

The Lagrangian is

J(w, b, α) = φ(w) − Σi=1..N αi [di(wTxi + b) − 1]

Set ∂J(w,b,α)/∂w = 0 ⇒ w = Σi=1..N αi di xi

Set ∂J(w,b,α)/∂b = 0 ⇒ Σi=1..N αi di = 0


Optimization (continued)

The solution of Lagrange multiplier problem is at a saddle

point where the minimum is sought w.r.t. w and b, while the

maximum is sought w.r.t. αi.

Kuhn-Tucker Condition: at the saddle point,

αi[di (wTxi + b) − 1] = 0 for 1 ≤ i ≤ N.

⇒ If xi is NOT a support vector, the corresponding αi = 0!

⇒ Hence, only the support vectors will affect the result of the

optimization!


A Numerical Example

J = w²/2 − α1(−w−b−1) − α2(2w+b−1) − α3(3w+b−1)

(The training samples, read off from the constraint terms, are x1 = 1 with d1 = −1, and x2 = 2, x3 = 3 with d2 = d3 = +1.)

∂J/∂w = 0 ⇒ w = −α1 + 2α2 + 3α3

∂J/∂b = 0 ⇒ 0 = α1 − α2 − α3

Solve: (a) −w−b−1 = 0; (b) 2w+b−1 = 0; (c) 3w+b−1 = 0.

(b) and (c) conflict with each other. Solving (a) and (b) yields w = 2, b = −3. From the Kuhn-Tucker condition, α3 = 0. Thus α1 = α2 = 2. Hence the decision boundary is 2x − 3 = 0, or x = 1.5! This is shown as the dashed line in the figure above.
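The claimed solution can be checked directly: with w = 2, b = −3, every sample must satisfy di(w xi + b) ≥ 1, with equality exactly at the support vectors. A quick Python check (sample values read off from the Lagrangian above):

```python
# Samples read off from the Lagrangian terms d_i*(w*x_i + b) - 1
samples = [(1.0, -1), (2.0, +1), (3.0, +1)]  # (x_i, d_i)
w, b = 2.0, -3.0

margins = [d * (w * x + b) for x, d in samples]
print(margins)  # [1.0, 1.0, 3.0]

# x = 1 and x = 2 are support vectors (margin exactly 1);
# x = 3 has margin 3 > 1, so alpha_3 = 0 by the Kuhn-Tucker condition.
assert all(m >= 1.0 for m in margins)
```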


Primal/Dual Problem Formulation

Given a constrained optimization problem with a convex cost function and linear constraints, a dual problem in which the Lagrange multipliers provide the solution can be formulated.

Duality Theorem (Bertsekas 1995)

(a) If the primal problem has an optimal solution, then the dual

problem has an optimal solution with the same optimal

values.

(b) In order for wo to be an optimal solution and αo to be an

optimal dual solution, it is necessary and sufficient that wo

is feasible for the primal problem and

Φ(wo) = J(wo,bo,α o) = Minw J(w,bo,α o)


Formulating the Dual Problem

J(w, b, α) = (1/2)wTw − Σi=1..N αi di wTxi − b Σi=1..N αi di + Σi=1..N αi

With w = Σi=1..N αi di xi and Σi=1..N αi di = 0. These lead to a

Dual Problem

Maximize  Q(α) = Σi=1..N αi − (1/2) Σi=1..N Σj=1..N αi αj di dj xiTxj

Subject to: Σi=1..N αi di = 0 and αi ≥ 0 for i = 1, 2, …, N.

In matrix form, the quadratic term is

(1/2) [α1d1 … αNdN] [x1Tx1 … x1TxN; … ; xNTx1 … xNTxN] [α1d1 … αNdN]T


Numerical Example (cont’d)

Q(α) = Σi=1..3 αi − (1/2) [−α1 α2 α3] [1 2 3; 2 4 6; 3 6 9] [−α1 α2 α3]T

or Q(α) = α1 + α2 + α3 − [0.5α1² + 2α2² + 4.5α3² − 2α1α2 − 3α1α3 + 6α2α3]

subject to constraints: −α1 + α2 + α3 = 0, and

α1 ≥ 0, α2 ≥ 0, and α3 ≥ 0.

Use the Matlab optimization toolbox command:

x = fmincon('qalpha', X0, A, B, Aeq, Beq)

The solution is [α1 α2 α3] = [2 2 0] as expected.
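The reported solution can be sanity-checked without any toolbox by evaluating Q(α) and the constraints directly (a Python sketch; the perturbed points are arbitrary feasible alternatives):

```python
def Q(a1, a2, a3):
    # Q(alpha) as expanded above
    return (a1 + a2 + a3) - (0.5*a1**2 + 2*a2**2 + 4.5*a3**2
                             - 2*a1*a2 - 3*a1*a3 + 6*a2*a3)

a = (2.0, 2.0, 0.0)
assert abs(-a[0] + a[1] + a[2]) < 1e-12   # equality constraint holds
assert all(ai >= 0 for ai in a)           # nonnegativity holds
print(Q(*a))                              # 2.0

# Any feasible perturbation should give a smaller Q:
assert Q(2.1, 2.1, 0.0) < Q(*a)
assert Q(2.0, 1.5, 0.5) < Q(*a)
```

Note that Q(2, 2, 0) = 2 also equals Σαi − (1/2)w² = 4 − 2 with w = 2, as it must at the saddle point.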


Implication of Minimizing ||w||

Let D denote the diameter of the smallest hyper-ball that

encloses all the input training vectors {x1, x2, …, xN}. The set

of optimal hyper-planes described by the equation

woTx + bo = 0

has a VC-dimension h bounded from above as

h ≤ min { D2/ρ2, m0} + 1

where m0 is the dimension of the input vectors, and ρ = 2/||wo||

is the margin of the separation of the hyper-planes.

VC-dimension determines the complexity of the classifier

structure, and usually the smaller the better.


Non-separable Cases

Recall that in the linearly separable case, each training sample

pair (xi, di) represents a linear inequality constraint

di(wTxi + b) ≥ 1, i = 1, 2, …, N

If the training samples are not linearly separable, the

constraint can be modified to yield a soft constraint:

di(wTxi + b) ≥ 1−ζi , i = 1, 2, …, N

{ζi; 1 ≤ i ≤ N} are known as slack variables. If ζi > 1, then the

corresponding (xi, di) will be mis-classified.

The minimum error classifier would minimize Σi=1..N I(ζi − 1), but it is non-convex w.r.t. w. Hence an approximation is to minimize

Φ(w, ζ) = (1/2)wTw + C Σi=1..N ζi
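For a fixed (w, b), each slack is ζi = max(0, 1 − di(wTxi + b)), so the soft-margin objective can be evaluated directly. A Python sketch (the data and classifier below are made-up values for illustration):

```python
def soft_margin_objective(w, b, C, samples):
    """Phi(w, zeta) = 0.5*w'w + C*sum(zeta_i),
    with zeta_i = max(0, 1 - d_i*(w.x_i + b))."""
    slacks = []
    for x, d in samples:
        g = sum(wi * xi for wi, xi in zip(w, x)) + b
        slacks.append(max(0.0, 1.0 - d * g))
    wTw = sum(wi * wi for wi in w)
    return 0.5 * wTw + C * sum(slacks), slacks

# One-dimensional toy data; the sample at x = 1.6 with d = -1
# sits on the wrong side of the boundary w*x + b = 0.
samples = [([1.0], -1), ([1.6], -1), ([2.0], +1), ([3.0], +1)]
phi, slacks = soft_margin_objective([2.0], -3.0, C=1.0, samples=samples)
print(slacks)  # second slack is about 1.2 (> 1: misclassified)
```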


Primal and Dual Problem Formulation

Primal Optimization Problem  Given {(xi, di); 1 ≤ i ≤ N}. Find w, b such that

Φ(w, ζ) = (1/2)wTw + C Σi=1..N ζi

is minimized subject to ζi ≥ 0 and

di(wTxi + b) ≥ 1 − ζi for i = 1, 2, …, N.

Dual Optimization Problem  Given {(xi, di); 1 ≤ i ≤ N}. Find Lagrange multipliers {αi; 1 ≤ i ≤ N} such that

Q(α) = Σi=1..N αi − (1/2) Σi=1..N Σj=1..N αi αj di dj xiTxj

is maximized subject to the constraints (i) 0 ≤ αi ≤ C (a user-specified positive number) and (ii) Σi=1..N αi di = 0.


Solution to the Dual Problem

The optimal solution to the dual problem is:

wo = Σi=1..Ns αo,i di xi    (Ns: # of support vectors)

At the saddle point,

(i) αi [di(wTxi + b) − 1 + ζi] = 0    (*)

(ii) µi ζi = 0

{µi; 1 ≤ i ≤ N} are Lagrange multipliers that enforce the condition ζi ≥ 0. At the optimal point of the primal problem, ∂Φ/∂ζi = 0. One may deduce that ζi = 0 if αi < C. Solving (*) over the support vectors with 0 < αi < C (so that ζi = 0), we have

bo = (1/Ns) Σi∈SV (di − woTxi)
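For the one-dimensional numerical example from earlier slides, this recipe recovers the expected bias. A Python sketch (using the convention that bo averages di − wo·xi over the support vectors):

```python
# Support vectors from the 1-D numerical example:
# x = 1 (d = -1) and x = 2 (d = +1), with wo = 2.
support = [(1.0, -1), (2.0, +1)]
wo = 2.0

# For each support vector, d_i*(wo*x_i + bo) = 1 forces bo = d_i - wo*x_i;
# averaging over support vectors reduces numerical error in practice.
bo = sum(d - wo * x for x, d in support) / len(support)
print(bo)  # -3.0, matching the decision boundary 2x - 3 = 0
```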


Matlab Implementation

% svm1.m: basic support vector machine
% X: N by m matrix. i-th row is x_i
% d: N by 1 vector. i-th element is d_i
% X, d should be loaded from file or read from input.
% call MATLAB optimization tool box function fmincon
[N, m] = size(X);
a0 = eps*ones(N,1);
C = 1;
a = fmincon('qfun', a0, [], [], d', 0, zeros(N,1), C*ones(N,1), ...
            [], [], X, d);
wo = X'*(a.*d)                    % wo = sum_i a_i d_i x_i
sv = (a > 10*eps);                % indicator of support vectors
bo = sum((d - X*wo).*sv)/sum(sv)  % average of d_i - wo'*x_i over support vectors

function y = qfun(a, X, d)
% the Q(a) function. Note that it actually returns -Q(a),
% because calling fmincon to minimize -Q(a) is
% the same as maximizing Q(a)
[N, m] = size(X);
y = -ones(1,N)*a + 0.5*a'*diag(d)*(X*X')*diag(d)*a;
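The same dual can also be solved without an optimization toolbox. Below is a hedged Python sketch (not the slide's code): projected gradient ascent on the dual of the one-dimensional numerical example, with the equality constraint removed by substituting α1 = α2 + α3. The step size and iteration count are arbitrary choices that happen to converge here.

```python
# Data from the numerical example: x = [1, 2, 3], d = [-1, +1, +1].
# Dual: maximize Q = sum(a_i) - 0.5*(sum_i a_i*d_i*x_i)^2
# subject to -a1 + a2 + a3 = 0 and 0 <= a_i <= C.
# Eliminating a1 = a2 + a3 gives Q = 2*a2 + 2*a3 - 0.5*s^2
# with s = sum_i a_i*d_i*x_i = a2 + 2*a3.
C, eta = 10.0, 0.05
a2 = a3 = 0.0
for _ in range(20000):
    s = a2 + 2.0 * a3
    # gradient step, then clip back into the box [0, C]
    a2 = min(C, max(0.0, a2 + eta * (2.0 - s)))
    a3 = min(C, max(0.0, a3 + eta * (2.0 - 2.0 * s)))
a1 = a2 + a3
print(round(a1, 2), round(a2, 2), round(a3, 2))  # 2.0 2.0 0.0
```

This recovers [α1 α2 α3] = [2 2 0], matching the fmincon result on the earlier slide.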


Inner Product Kernels

In general, if the input is first transformed via a set of nonlinear functions {φj(x)} and then subject to the hyperplane classifier, then

g(x) = Σj=1..p wj φj(x) + b = Σj=0..p wj φj(x) = wTφ(x), with b = w0 and φ0(x) = 1.

Define the inner product kernel

K(x, y) = Σj=0..p φj(x) φj(y) = φ(x)Tφ(y)

and the dual cost becomes

Q(α) = Σi=1..N αi − (1/2) Σi=1..N Σj=1..N αi αj di dj K(xi, xj)


General Pattern Recognition with SVM

[Figure: a two-layer network. Inputs x1, x2, …, xm feed the nonlinear functions φ1(x), φ2(x), …, φp(x); their outputs, weighted by w1, w2, …, wp and offset by the bias b, are summed to produce the output di.]

With a properly chosen set of nonlinear functions {φj(x); 1 ≤ j ≤ p}, any pattern recognition problem can be solved.


Polynomial Kernel

Consider a polynomial kernel

K(x, y) = (1 + xTy)² = 1 + 2 Σi=1..m xi yi + 2 Σi=1..m Σj=i+1..m xi yi xj yj + Σi=1..m xi² yi²

ϕ(x) = [1 x12, …, xm2, √2 x1, …, √2xm, √2 x1 x2, …, √2 x1xm,

√2 x2 x3, …, √2 x2xm, …,√2 xm−1xm]

= [1 ϕ1(x), …, ϕp(x)]

where p = 1 +m + m + (m−1) + (m−2) + … + 1 = (m+2)(m+1)/2

Hence, using a kernel, a low dimensional pattern classification

problem (with dimension m) is solved in a higher dimensional

space (dimension p+1). But only the φ(xi) corresponding to the

support vectors are used for pattern classification!
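The kernel identity above can be checked numerically for m = 2 (a sketch; the test vectors are arbitrary):

```python
import math

def poly_kernel(x, y):
    """K(x, y) = (1 + x'y)^2."""
    return (1.0 + sum(xi * yi for xi, yi in zip(x, y))) ** 2

def phi(x):
    """Explicit feature map for m = 2: (m+1)(m+2)/2 = 6 components."""
    r2 = math.sqrt(2.0)
    x1, x2 = x
    return [1.0, x1 * x1, x2 * x2, r2 * x1, r2 * x2, r2 * x1 * x2]

x, y = [1.0, 2.0], [3.0, 4.0]
k = poly_kernel(x, y)
dot = sum(a * b for a, b in zip(phi(x), phi(y)))
print(k, dot)  # both are 144 (up to float rounding in sqrt(2))
```

The point of the kernel trick is that `poly_kernel` costs O(m) while the explicit map costs O(m²), yet they agree exactly.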


Numerical Example: XOR Problem

Training samples: (−1 −1; −1), (−1 1; +1), (1 −1; +1), (1 1; −1)

φ(x) = [1  x1²  x2²  √2 x1  √2 x2  √2 x1x2]T

Φ = [1 1 1 −√2 −√2  √2;
     1 1 1 −√2  √2 −√2;
     1 1 1  √2 −√2 −√2;
     1 1 1  √2  √2  √2]

K(xi, xj) = ΦΦT = [9 1 1 1;
                   1 9 1 1;
                   1 1 9 1;
                   1 1 1 9]

Note dim[φ(x)] = 6.


XOR Problem (Continued)

Note that K(xi, xj) can be calculated directly without using Φ!

E.g. K1,1 = (1 + x1Tx1)² = (1 + 2)² = 9; K1,2 = (1 + x1Tx2)² = (1 + 0)² = 1.

The corresponding Lagrange multipliers are α = (1/8)[1 1 1 1]T.

w = Σi=1..N αi di φ(xi) = ΦT[α1d1 α2d2 … αNdN]T

  = [0 0 0 0 0 −1/√2]T

Hence g(x) = wTφ(x) = −x1x2:

(x1, x2):   (−1, −1)  (−1, +1)  (+1, −1)  (+1, +1)
y = −x1x2:     −1        +1        +1        −1


Other Types of Kernels

• Polynomial learning machine: K(x, y) = (xTy + 1)^p; p is selected a priori.

• Radial basis function: K(x, y) = exp(−||x − y||²/(2σ²)); σ² is selected a priori.

• Two-layer perceptron: K(x, y) = tanh(βo xTy + β1); only some βo and β1 values are feasible (they must satisfy "Mercer's theorem"!).
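For concreteness, the three kernels can be evaluated on a pair of sample vectors. A Python sketch (the parameter choices p = 2, σ = 1, βo = 0.5, β1 = −1 are arbitrary illustrations, not recommendations):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

x, y = [1.0, 0.0], [0.0, 1.0]

poly = (dot(x, y) + 1.0) ** 2                 # polynomial, p = 2
sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
rbf = math.exp(-sq_dist / (2.0 * 1.0 ** 2))   # radial basis, sigma = 1
mlp = math.tanh(0.5 * dot(x, y) - 1.0)        # two-layer perceptron
print(poly, rbf, mlp)
```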


Mercer's Theorem

Let K(x,y) be a continuous, symmetric kernel, defined on a≤

x,y ≤ b. K(x,y) admits an eigen-function expansion

K(x, y) = Σi=1..∞ λi φi(x) φi(y)

uniformly if and only if

∫a..b ∫a..b K(x, y) ψ(x) ψ(y) dx dy ≥ 0

for all ψ(x) such that ∫a..b ψ²(x) dx < ∞.


Testing with Kernels

For many types of kernels, φ(x) can not be explicitly represented or even found. However,

w = Σi=1..N αi di φ(xi) = ΦT[α1d1 α2d2 … αNdN]T = ΦT f

In the XOR problem, f = (1/8)[−1 +1 +1 −1]T. Suppose that x = (−1, +1). Then

y(x) = [K(x, x1) K(x, x2) K(x, x3) K(x, x4)] f

     = [(1 + x1Tx)²  (1 + x2Tx)²  (1 + x3Tx)²  (1 + x4Tx)²] f

     = [1 9 1 1][−1/8 1/8 1/8 −1/8]T = 1
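The kernel-only test procedure fits in a few lines. A Python sketch that reproduces the slide's numbers for all four XOR inputs (variable names are mine, not the slides'):

```python
samples = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # XOR inputs
labels  = [-1, 1, 1, -1]
f = [d / 8.0 for d in labels]                    # f_i = alpha_i * d_i, alpha_i = 1/8

def K(x, y):
    """Polynomial kernel (1 + x'y)^2; phi(x) is never formed."""
    return (1 + x[0] * y[0] + x[1] * y[1]) ** 2

def y_of(x):
    """Decision value y(x) = sum_j K(x, x_j) * f_j."""
    return sum(K(x, xj) * fj for xj, fj in zip(samples, f))

print([y_of(x) for x in samples])  # [-1.0, 1.0, 1.0, -1.0] -- matches the labels
```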

