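$$\mu_n^{(t)} = \frac{1}{1 + \exp(-w^{(t)\top} x_n)}$$
Let's replace the predicted label probability $\mu_n^{(t)}$ by the predicted binary label $\hat{y}_n^{(t)}$:
$$w^{(t+1)} = w^{(t)} - \eta_t (\hat{y}_n^{(t)} - y_n) x_n$$
where
$$\hat{y}_n^{(t)} = \begin{cases} 1 & \text{if } \mu_n^{(t)} \geq 0.5 \text{ (equivalently, } w^{(t)\top} x_n \geq 0\text{)} \\ 0 & \text{if } \mu_n^{(t)} < 0.5 \text{ (equivalently, } w^{(t)\top} x_n < 0\text{)} \end{cases}$$
Thus $w^{(t)}$ gets updated only when $\hat{y}_n^{(t)} \neq y_n$ (i.e., when $w^{(t)}$ mispredicts)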
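The mistake-driven behaviour can be checked with a small sketch. It assumes labels in {0, 1}, a fixed learning rate `eta`, and plain Python lists for vectors; the function names are illustrative and not from the lecture.

```python
def predict_label(w, x):
    """Threshold at 0.5, which is the same as checking the sign of w.x."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else 0

def mistake_driven_step(w, x, y, eta=1.0):
    """One update w <- w - eta * (y_hat - y) * x with labels in {0, 1}."""
    y_hat = predict_label(w, x)
    return [wi - eta * (y_hat - y) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
x = [1.0, 2.0]

# Score is 0, so y_hat = 1. With y = 1 the prediction is correct: no change.
w_same = mistake_driven_step(w, x, 1)

# With y = 0 the prediction is wrong: w moves by -eta * x.
w_moved = mistake_driven_step(w, x, 0)

print(w_same)   # [0.0, 0.0]
print(w_moved)  # [-1.0, -2.0]
```

The factor (y_hat - y) is zero on a correct prediction, so no explicit "if mistake" branch is needed.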
Mistake-Driven Learning
Consider the mistake-driven SGD update rule
$$w^{(t+1)} = w^{(t)} - \eta_t (\hat{y}_n^{(t)} - y_n) x_n$$
Let's, from now on, assume the binary labels to be {-1,+1}, not {0,1}. Then
$$\hat{y}_n^{(t)} - y_n = \begin{cases} -2y_n & \text{if } \hat{y}_n^{(t)} \neq y_n \\ 0 & \text{if } \hat{y}_n^{(t)} = y_n \end{cases}$$
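As a worked step, using only the symbols defined above: with labels in {-1,+1}, a wrong prediction means $\hat{y}_n^{(t)} = -y_n$, so $\hat{y}_n^{(t)} - y_n = -2y_n$. Substituting the two cases into the update rule gives

$$w^{(t+1)} = \begin{cases} w^{(t)} + 2\eta_t y_n x_n & \text{if } \hat{y}_n^{(t)} \neq y_n \\ w^{(t)} & \text{if } \hat{y}_n^{(t)} = y_n \end{cases}$$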
Hyperplanes
Separates a D-dimensional space into two half-spaces (positive and negative)
Notion of Margins
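Geometric margin $\gamma_n$ of an example $x_n$ is its signed distance from the hyperplane
$$\gamma_n = \frac{w^\top x_n + b}{\|w\|}$$
Margin of a set $\{x_1, \ldots, x_N\}$ is the minimum absolute geometric margin
$$\gamma = \min_{1 \leq n \leq N} \frac{|w^\top x_n + b|}{\|w\|}$$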
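A minimal sketch of both margin computations, assuming vectors are plain Python lists. The hyperplane and points are made-up illustrative values, not data from the lecture.

```python
import math

def geometric_margin(w, b, x):
    """Signed distance of x from the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

def set_margin(w, b, X):
    """Smallest absolute geometric margin over a set of examples."""
    return min(abs(geometric_margin(w, b, x)) for x in X)

w, b = [3.0, 4.0], -5.0                      # hypothetical hyperplane, ||w|| = 5
X = [[3.0, 4.0], [0.0, 0.0], [1.0, 3.0]]     # hypothetical examples

margins = [geometric_margin(w, b, x) for x in X]
print(margins)               # [4.0, -1.0, 2.0]
print(set_margin(w, b, X))   # 1.0
```

The sign of each per-example margin tells which half-space the point lies in; the set margin discards the sign and keeps only the closest distance.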
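The loss being minimized:
$$\sum_{n=1}^{N} \ell_n(w, b) = \sum_{n=1}^{N} \max\{0, -y_n (w^\top x_n + b)\}$$
On a mistake, the (sub)gradients of $\ell_n$ with respect to $w$ and $b$ are $-y_n x_n$ and $-y_n$ (they are zero otherwise), so with learning rate 1 the updates are
$$w \leftarrow w + y_n x_n, \qquad b \leftarrow b + y_n$$
that is,
$$w_{new} = w_{old} + y_n x_n, \qquad b_{new} = b_{old} + y_n$$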
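The updates above can be put together into a short training loop. This is a sketch under stated assumptions rather than the lecture's own pseudocode: labels are in {-1,+1}, the learning rate is 1, weights start at zero, a score of exactly zero is treated as a mistake, and the toy data set is made up.

```python
def perceptron_train(X, y, epochs=10):
    """Cycle through the data, updating (w, b) only on mistakes."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_n, y_n in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x_n)) + b
            if y_n * score <= 0:          # mistake (zero score counts as one)
                w = [wi + y_n * xi for wi, xi in zip(w, x_n)]
                b += y_n
                mistakes += 1
        if mistakes == 0:                 # a full clean pass: stop
            break
    return w, b

# Hypothetical linearly separable toy data.
X = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]

w, b = perceptron_train(X, y)
print(w, b)  # [2.0, 1.0] 1.0
```

On this data a single update on the first example is enough: every later example is already on the correct side, so the second pass makes no changes.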
Updates would be
$$w_{new} = w_{old} + y_n x_n = w_{old} + x_n \quad (\text{since } y_n = +1)$$
$$b_{new} = b_{old} + y_n = b_{old} + 1$$
$$w_{new}^\top x_n + b_{new} = (w_{old} + x_n)^\top x_n + b_{old} + 1 = (w_{old}^\top x_n + b_{old}) + x_n^\top x_n + 1$$
Thus $w_{new}^\top x_n + b_{new}$ is less negative than $w_{old}^\top x_n + b_{old}$
[Figure: $w_{new} = w_{old} + x$]
Updates would be
$$w_{new} = w_{old} + y_n x_n = w_{old} - x_n \quad (\text{since } y_n = -1)$$
$$b_{new} = b_{old} + y_n = b_{old} - 1$$
$$w_{new}^\top x_n + b_{new} = (w_{old} - x_n)^\top x_n + b_{old} - 1 = (w_{old}^\top x_n + b_{old}) - x_n^\top x_n - 1$$
Thus $w_{new}^\top x_n + b_{new}$ is less positive than $w_{old}^\top x_n + b_{old}$
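Both cases can be verified numerically: after one update the score on the misclassified example shifts by exactly $x_n^\top x_n + 1$ in the direction of its label. The sketch below uses made-up weights and a made-up example; the helper names are illustrative.

```python
def score(w, b, x):
    """Value of w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def perceptron_update(w, b, x, y):
    """Single perceptron update: w <- w + y*x, b <- b + y."""
    return [wi + y * xi for wi, xi in zip(w, x)], b + y

x = [1.0, 2.0]                        # x.x = 5

# Case y = +1, but the old score is negative (-3).
w_old, b_old = [-1.0, -1.0], 0.0
w_new, b_new = perceptron_update(w_old, b_old, x, +1)
gain = score(w_new, b_new, x) - score(w_old, b_old, x)

# Case y = -1, but the old score is positive (+3).
w_old2, b_old2 = [1.0, 1.0], 0.0
w_new2, b_new2 = perceptron_update(w_old2, b_old2, x, -1)
drop = score(w_new2, b_new2, x) - score(w_old2, b_old2, x)

print(gain)  # 6.0, i.e. x.x + 1
print(drop)  # -6.0, i.e. -(x.x + 1)
```

A single update need not fix the prediction in general; it only guarantees movement in the right direction by this fixed amount.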
[Figure: $w_{new} = w_{old} - x$]
Convergence of Perceptron
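Theorem (Block & Novikoff): If the training data is linearly separable with margin $\gamma$ by a unit-norm hyperplane $w_*$ ($\|w_*\| = 1$) with $b = 0$, then the perceptron converges after at most $R^2/\gamma^2$ mistakes during training (assuming $\|x\| < R$ for all $x$).
Proof:
Margin of $w_*$ on any example $(x_n, y_n)$:
$$\frac{y_n w_*^\top x_n}{\|w_*\|} = y_n w_*^\top x_n \geq \gamma$$
Start from $w_1 = 0$ and consider the $k$-th mistake, made by $w_k$ on $(x_n, y_n)$: $y_n w_k^\top x_n \leq 0$, followed by the update $w_{k+1} = w_k + y_n x_n$.
$$w_{k+1}^\top w_* = w_k^\top w_* + y_n w_*^\top x_n \geq w_k^\top w_* + \gamma$$
Repeating over the $k$ mistakes gives
$$w_{k+1}^\top w_* \geq k\gamma \qquad (1)$$
$$\|w_{k+1}\|^2 = \|w_k\|^2 + 2 y_n w_k^\top x_n + \|x_n\|^2 \leq \|w_k\|^2 + R^2 \quad (\text{since } y_n w_k^\top x_n \leq 0)$$
Repeating over the $k$ mistakes gives
$$\|w_{k+1}\|^2 \leq kR^2 \qquad (2)$$
Using (1), (2), the Cauchy-Schwarz inequality, and $\|w_*\| = 1$:
$$k\gamma \leq w_{k+1}^\top w_* \leq \|w_{k+1}\| \leq R\sqrt{k}$$
$$k \leq R^2/\gamma^2$$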
Nice Thing: Convergence rate does not depend on the number of training
examples N or the data dimensionality D. Depends only on the margin!!!
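The bound can be sanity-checked on a small example. The sketch below is illustrative only: the separator `w_star`, the data, and the helper names are made up; the perceptron is run with $b = 0$ from zero weights; and $R$ is taken as the largest example norm.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def count_mistakes(X, y, max_epochs=1000):
    """Run the perceptron with b = 0 from w = 0; count updates until a clean pass."""
    w = [0.0] * len(X[0])
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x_n, y_n in zip(X, y):
            if y_n * dot(w, x_n) <= 0:
                w = [wi + y_n * xi for wi, xi in zip(w, x_n)]
                mistakes += 1
                clean = False
        if clean:
            break
    return w, mistakes

# Hypothetical unit-norm separator through the origin and data labelled by it.
w_star = [0.6, 0.8]
X = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0], [3.0, -1.0], [-3.0, 1.0]]
y = [1 if dot(w_star, x) > 0 else -1 for x in X]

gamma = min(y_n * dot(w_star, x_n) for x_n, y_n in zip(X, y))  # margin of w_star
R = max(math.sqrt(dot(x, x)) for x in X)                       # largest norm
bound = R ** 2 / gamma ** 2

w, mistakes = count_mistakes(X, y)
print(mistakes, "mistakes; bound is", bound)
```

The bound is a worst case: on easy data the observed number of mistakes can be far below $R^2/\gamma^2$.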
Machine Learning (CS771A)
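With $(w, b)$ scaled so that $\min_n |w^\top x_n + b| = 1$, the margin is
$$\gamma = \min_n \frac{|w^\top x_n + b|}{\|w\|} = \frac{1}{\|w\|}$$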
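The identity relies on rescaling $(w, b)$ so that the closest example has $|w^\top x_n + b| = 1$; rescaling does not move the hyperplane. A small sketch of that step, with made-up values and illustrative helper names:

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(w):
    return math.sqrt(dot(w, w))

def margin(w, b, X):
    """Minimum absolute distance of the examples from the hyperplane."""
    return min(abs(dot(w, x) + b) for x in X) / norm(w)

def canonical_rescale(w, b, X):
    """Divide (w, b) by min |w.x + b| so the closest example scores exactly 1."""
    s = min(abs(dot(w, x) + b) for x in X)
    return [wi / s for wi in w], b / s

w, b = [3.0, 4.0], -5.0                      # hypothetical hyperplane
X = [[3.0, 4.0], [0.0, 0.0], [1.0, 3.0]]     # hypothetical examples

w_c, b_c = canonical_rescale(w, b, X)

# Same hyperplane, same margin; after rescaling it equals 1 / ||w_c||.
print(margin(w, b, X), margin(w_c, b_c, X), 1.0 / norm(w_c))
```

All three printed values agree up to floating-point rounding, which is why maximizing the margin can be phrased purely in terms of $\|w\|$ once this scaling is fixed.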
Margin: $\gamma = \frac{1}{\|w\|}$
Minimize $f(w, b) = \frac{\|w\|^2}{2}$
subject to $y_n (w^\top x_n + b) \geq 1$,
$n = 1, \ldots, N$
Next class..