Multi-class Classification
- Using binary classifiers
- Random Forests

Regression
- Ridge regression
- Basis functions

Multi-class Classification
[Figure: e.g. K = 3 classes. One-vs-rest binary classifiers (1 vs 2 & 3, 2 vs 1 & 3, 3 vs 1 & 2) each separate one class C_k from the other two, leaving ambiguous regions marked "?". The ambiguity is resolved by assigning each point to the class with the largest score, $\max_k f_k(x)$.]
Training: learn K = 10 two-class "1 vs the rest" SVM classifiers $f_k(x)$. Classification: choose the class $k$ with the most positive score $f_k(x)$.
Example: hand-drawn digit classification.
[Figure: hand-drawn digits with their predicted class labels.]
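A minimal sketch of this recipe in Python, assuming scikit-learn and its bundled digits dataset as a stand-in for the hand-drawn digits (LinearSVC plays the role of each binary SVM $f_k$):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training: learn K = 10 binary "1 vs the rest" classifiers f_k(x).
K = 10
classifiers = [
    LinearSVC(dual=False).fit(X_train, np.where(y_train == k, 1, -1))
    for k in range(K)
]

# Classification: choose the class whose score f_k(x) is most positive.
scores = np.column_stack([clf.decision_function(X_test) for clf in classifiers])
y_pred = np.argmax(scores, axis=1)
print("accuracy:", np.mean(y_pred == y_test))
```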
Alternatively, learn all classes jointly, with one weight vector $\mathbf{w}_k$ per class, requiring each training point's own class to score highest:
for $x_i$ in class 1: $\mathbf{w}_1^\top x_i \ge \mathbf{w}_2^\top x_i$ and $\mathbf{w}_1^\top x_i \ge \mathbf{w}_3^\top x_i$;
for $x_i$ in class 2: $\mathbf{w}_2^\top x_i \ge \mathbf{w}_3^\top x_i$ and $\mathbf{w}_2^\top x_i \ge \mathbf{w}_1^\top x_i$;
for $x_i$ in class 3: $\mathbf{w}_3^\top x_i \ge \mathbf{w}_1^\top x_i$ and $\mathbf{w}_3^\top x_i \ge \mathbf{w}_2^\top x_i$.
This is a quadratic optimization problem subject to linear constraints, and it has a unique minimum. Note that a margin can also be included in the constraints. In practice this gives little or no improvement over the binary (one-vs-rest) case.
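For reference, scikit-learn's LinearSVC exposes a joint formulation of this kind (Crammer and Singer's multi-class SVM, a close relative of the constraints above) via a single flag; a short sketch reusing the digits data from the previous snippet:

```python
from sklearn.svm import LinearSVC

# All K weight vectors w_k are optimized jointly under pairwise score constraints.
joint = LinearSVC(multi_class="crammer_singer").fit(X_train, y_train)
print("accuracy:", joint.score(X_test, y_test))
```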
Random Forests
[Figure: a decision tree. Starting from the root node, each internal node applies a test (here x2 > 1, then x1 > 1) and sends the point down the true or false branch; terminal (leaf) nodes assign the label: classify as green, blue, or red class.]
[Figure: testing — the two splits (Split 1, Split 2) partition the (x1, x2) feature space into the three class regions.]
Trees vs Forests
A single tree may overfit to the training data (note also the lack of margin in its decision boundary). Instead, train multiple trees and combine their predictions. Each tree can differ in both its training data and its node tests; achieve this by injecting randomness into the training algorithm.
A forest is an ensemble of trees. The trees are all slightly different from one another.
[Figure: forest training — trees T1, T2, T3 grown with different node-test parameters, each down to its leaf nodes.]
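A minimal forest-training sketch under the assumptions above: randomness is injected by giving each tree a bootstrap sample of the data and a random subset of features at each node test (scikit-learn's DecisionTreeClassifier supplies the single-tree learner):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        # Randomness source 1: each tree sees a bootstrap resample of the data.
        idx = rng.integers(0, len(X), size=len(X))
        # Randomness source 2: each node test considers a random feature subset.
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1 << 31)))
        forest.append(tree.fit(X[idx], y[idx]))
    return forest

def predict_forest(forest, X):
    # Combine the trees' predictions by majority vote (labels assumed 0..K-1).
    votes = np.stack([tree.predict(X) for tree in forest]).astype(int)
    return np.array([np.bincount(votes[:, i]).argmax() for i in range(X.shape[0])])
```

scikit-learn's RandomForestClassifier packages essentially this bootstrap-plus-random-features recipe.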
Application: body tracking in Microsoft Kinect for Xbox 360.
[Figure: RGB image; training/test data from a motion capture system; node tests; example result — body parts labelled in 3D.]
Regression
Suppose we are given a training set of $N$ observations $(x_1, y_1), \ldots, (x_N, y_N)$, with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. The regression problem is to estimate $f(x)$ from this data such that $y_i = f(x_i)$.
Learning by optimization

Choose $f$ to minimize a trade-off between a loss on the training data and a regularizer:
$$ \min_f \; \sum_{i=1}^{N} l\big(f(x_i), y_i\big) \; + \; \lambda\, R(f) $$
where $l$ is the loss function and $R(f)$ the regularization.

Represent $f$ with basis functions, e.g. a polynomial $f(x, \mathbf{w}) = \sum_j w_j x^j = \mathbf{w}^\top \Phi(x)$ with feature map $\Phi: x \mapsto \phi(x)$, $\mathbb{R} \to \mathbb{R}^{M+1}$; or the basis functions can be Gaussians centred on the training data, $\phi_j(x) = \exp\big(-(x - x_j)^2 / 2\sigma^2\big)$. E.g. for 3 points,
$$ f(x, \mathbf{w}) = (w_1, w_2, w_3) \begin{pmatrix} e^{-(x-x_1)^2/2\sigma^2} \\ e^{-(x-x_2)^2/2\sigma^2} \\ e^{-(x-x_3)^2/2\sigma^2} \end{pmatrix} = \mathbf{w}^\top \Phi(x), \qquad \Phi: \mathbb{R} \to \mathbb{R}^{3} $$
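A sketch of these two feature maps $\Phi$ as design-matrix builders (NumPy assumed; the names poly_design and gaussian_design are illustrative):

```python
import numpy as np

def poly_design(x, M):
    # Row i is (1, x_i, x_i^2, ..., x_i^M): the map R -> R^(M+1).
    return np.vander(x, M + 1, increasing=True)

def gaussian_design(x, centres, sigma):
    # Entry (i, j) is exp(-(x_i - c_j)^2 / (2 sigma^2)): Gaussians centred on the data.
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2.0 * sigma ** 2))
```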
The measurements are noisy: for each $x_i$, $y_i = \bar{y}_i + n_i$, where $\bar{y}_i$ is the true value, $y_i$ the measured value, and the noise $n_i \sim \mathcal{N}(0, \sigma^2)$.
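A toy simulation of this noise model (the sine "true" function and σ = 0.2 are assumptions, chosen to resemble the figures below):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 9)                 # N = 9 sample locations
y_true = np.sin(2.0 * np.pi * x)             # underlying true values
y = y_true + rng.normal(0.0, 0.2, size=9)    # measured values: y_i = y'_i + n_i
```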
Stack the targets and weights as $\mathbf{y} = (y_1, \ldots, y_N)^\top$ and $\mathbf{w} = (w_0, \ldots, w_M)^\top$, and let
$$ \Phi = \begin{pmatrix} 1 & x_1 & \cdots \\ \vdots & \vdots & \\ 1 & x_N & \cdots \end{pmatrix} $$
be the $N \times M$ design matrix (assume $N > M$). Minimizing the regularized squared error gives
$$ \big(\Phi^\top \Phi + \lambda I\big)\,\mathbf{w} = \Phi^\top \mathbf{y}, \qquad \mathbf{w} = \big(\Phi^\top \Phi + \lambda I\big)^{-1} \Phi^\top \mathbf{y} $$
where the factors have sizes $M \times M$, $M \times 1$, $M \times N$, and $N \times 1$ respectively. For $\lambda = 0$ this reduces to the least-squares solution
$$ \mathbf{w} = \big(\Phi^\top \Phi\big)^{-1} \Phi^\top \mathbf{y} = \Phi^{+} \mathbf{y} $$
where $\Phi^{+}$ is the pseudo-inverse of $\Phi$ (pinv in Matlab). Adding the term $\lambda I$ improves the conditioning of the inverse: even if $\Phi$ is not full rank, $\Phi^\top \Phi + \lambda I$ will be (for sufficiently large $\lambda$). As $\lambda \to \infty$, $\mathbf{w} \to \frac{1}{\lambda} \Phi^\top \mathbf{y} \to 0$.

Often the regularization is applied only to the inhomogeneous part of $\mathbf{w}$, i.e. to $\tilde{\mathbf{w}}$, where $\mathbf{w} = (w_0, \tilde{\mathbf{w}})$.
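The closed-form solution as a sketch (np.linalg.solve stands in for the explicit inverse; np.linalg.pinv is NumPy's counterpart of Matlab's pinv):

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    # Solve (Phi^T Phi + lambda I) w = Phi^T y.
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)

# With lambda = 0 and Phi full rank, this matches the pseudo-inverse solution:
# np.allclose(ridge_fit(Phi, y, 0.0), np.linalg.pinv(Phi) @ y)
```

For example, ridge_fit(poly_design(x, 7), y, 1e-3) fits the degree-7 polynomial of the next example to the noisy samples generated above.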
Example: polynomial basis functions,
$$ f(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j = \mathbf{w}^\top \Phi(x), \qquad \Phi: x \mapsto \phi(x), \quad \mathbb{R} \to \mathbb{R}^{M+1} $$
with $N = 9$ samples and $M = 7$.
[Figure: the nine sample points over x in [0, 1].]
[Figure: ridge-regression fits for N = 9, M = 7. Each panel shows the sample points, the ideal fit, and the regularized solution; panels are labelled lambda = 100 and lambda = 0.001.]
[Figure: least-squares fits for M = 3 and M = 5, each showing the sample points, the ideal fit, and the least-squares solution; companion panels with y-axes extending to roughly ±25 and ±400 show the fitted curves oscillating increasingly wildly.]
Example: Gaussian basis functions centred on the $N$ training points,
$$ f(x, \mathbf{w}) = \sum_{i=1}^{N} w_i\, e^{-(x - x_i)^2 / 2\sigma^2} = \mathbf{w}^\top \Phi(x), \qquad \Phi: x \mapsto \phi(x), \quad \mathbb{R} \to \mathbb{R}^{N} $$
where $\mathbf{w}$ is an $N$-vector.
[Figure: the sample points over x in [0, 1].]
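A usage sketch tying the pieces together, reusing gaussian_design, ridge_fit, and the noisy samples (x, y) from the earlier snippets; the values of σ and λ are illustrative:

```python
import numpy as np

sigma, lam = 0.334, 1e-3
Phi = gaussian_design(x, centres=x, sigma=sigma)   # N x N: one Gaussian per sample
w = ridge_fit(Phi, y, lam)                         # w is an N-vector

def f(x_query):
    # Evaluate f(x) = w^T Phi(x) at new points.
    return gaussian_design(np.atleast_1d(x_query), x, sigma) @ w
```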
[Figure: Gaussian-basis fits to the sample points over x in [0, 1].]
[Figure: error norm on a validation set plotted against log10 λ, together with the resulting fit.] The regularization weight λ is chosen to minimize the error on a held-out validation set.
[Figure: validation-set fits for sigma = 0.334 and sigma = 0.1, each showing the sample points, the ideal fit, and the validation-set fit; with sigma too small the fitted curve oscillates over a very large range between the samples.]
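A sketch of the model selection shown in the figures: sweep λ (and σ) and keep the values with the smallest error norm on a held-out validation set. Here x_val and y_val are assumed to be generated the same way as the training samples (x, y):

```python
import numpy as np

def val_error(lam, sigma):
    # Fit on the training samples, score on the validation set.
    w = ridge_fit(gaussian_design(x, x, sigma), y, lam)
    residual = gaussian_design(x_val, x, sigma) @ w - y_val
    return np.linalg.norm(residual)

candidates = [(lam, sigma)
              for lam in 10.0 ** np.arange(-10.0, 3.0)
              for sigma in (0.1, 0.334, 1.0)]
best_lam, best_sigma = min(candidates, key=lambda p: val_error(*p))
```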
Background reading: Bishop, chapters 3, 4.1–4.3 and 14.3; Hastie et al., chapters 10.1–10.6. More on the web page: http://www.robots.ox.ac.uk/~az/lectures/ml