Introduction to Statistical Machine Learning
© 2013 Christfried Webers
NICTA, The Australian National University
ISML 2013
Canberra, February to June 2013
Overview
Introduction
Linear Algebra
Probability
Linear Regression 1
Linear Regression 2
Linear Classification 1
Linear Classification 2
Neural Networks 1
Neural Networks 2
Kernel Methods
Sparse Kernel Methods
Graphical Models 1
Graphical Models 2
Graphical Models 3
Mixture Models and EM 1
Mixture Models and EM 2
Approximate Inference
Sampling
Principal Component Analysis
Sequential Data 1
Sequential Data 2
Combining Models
Selected Topics
Discussion and Summary
Part VII
Linear Classification 1
Classification
Generalised Linear Model
Inference and Decision
Discriminant Functions
Fisher's Linear Discriminant
The Perceptron Algorithm
Classification
Goal: Given input data $\mathbf{x}$, assign it to one of $K$ discrete classes $C_k$, where $k = 1, \dots, K$.
Divide the input space into different regions.
Figure: Length of the petal [in cm] for a given sepal [cm] for iris flowers (Iris Setosa, Iris Versicolor, Iris Virginica).
Linear Model
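Idea: Use a linear model again, as in regression: $y(\mathbf{x}, \mathbf{w})$ is a linear function of the parameters $\mathbf{w}$,
$$y(\mathbf{x}_n, \mathbf{w}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)$$
But in general $y(\mathbf{x}_n, \mathbf{w}) \in \mathbb{R}$.
Example: Which class is $y(\mathbf{x}, \mathbf{w}) = 0.71623$?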
Generalised Linear Model
Activation function: $f(\cdot)$
Link function: $f^{-1}(\cdot)$
Figure: Plot of $\mathrm{sign}(z)$.
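To make the role of the activation function concrete, here is a minimal Python sketch. It assumes the usual generalised linear model form $y(\mathbf{x}, \mathbf{w}) = f(\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}))$ with the sign function as $f$. The function names, the feature matrix `Phi`, the weights `w`, and the choice of mapping a score of exactly 0 to +1 are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def sign_activation(z):
    """Map real-valued scores to labels in {-1, +1} (a score of 0 maps to +1 here)."""
    return np.where(z >= 0, 1, -1)

def glm_predict(Phi, w, f=sign_activation):
    """Apply the activation f to the linear scores Phi @ w, one row per input."""
    return f(Phi @ w)

# Hypothetical features (first column is a constant 1) and weights.
Phi = np.array([[1.0, 2.0], [1.0, -3.0], [1.0, 0.5]])
w = np.array([0.5, 1.0])

labels = glm_predict(Phi, w)   # linear scores are 2.5, -2.5 and 1.0
```

The activation turns the real-valued score into a discrete label, which answers the question of which class a value such as 0.71623 belongs to.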
Generative Models
Two Classes
Definition
A discriminant is a function that maps an input vector $\mathbf{x}$ to one of $K$ classes, denoted by $C_k$.
Consider first two classes ($K = 2$).
Construct a linear function of the inputs $\mathbf{x}$
$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$$
such that $\mathbf{x}$ is assigned to class $C_1$ if $y(\mathbf{x}) \geq 0$, and to class $C_2$ otherwise.
weight vector $\mathbf{w}$
bias $w_0$ (sometimes $-w_0$ is called the threshold)
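The two-class decision rule can be sketched in a few lines of Python. It assumes the convention that $y(\mathbf{x}) \geq 0$ means class $C_1$. The function names and the numerical values of `w` and `w0` are hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_discriminant(x, w, w0):
    """Return y(x) = w^T x + w0 for a single input vector x."""
    return float(np.dot(w, x) + w0)

def classify(x, w, w0):
    """Assign x to class 1 if y(x) >= 0, otherwise to class 2."""
    return 1 if linear_discriminant(x, w, w0) >= 0 else 2

# Hypothetical parameters: the decision boundary is the line x1 + x2 = 1.
w = np.array([1.0, 1.0])
w0 = -1.0

label_a = classify(np.array([2.0, 0.5]), w, w0)   # y = 1.5
label_b = classify(np.array([0.2, 0.3]), w, w0)   # y = -0.5
```

Points exactly on the boundary ($y(\mathbf{x}) = 0$) fall into class 1 under this convention.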
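Figure: Geometry of a linear discriminant in two dimensions ($x_1$, $x_2$). The decision surface $y = 0$ separates region $R_1$ ($y > 0$) from region $R_2$ ($y < 0$) and is orthogonal to $\mathbf{w}$. Its signed distance from the origin is $-w_0 / \|\mathbf{w}\|$, and the signed distance of a point $\mathbf{x}$ from it is $y(\mathbf{x}) / \|\mathbf{w}\|$; $\mathbf{x}_\perp$ is the projection of $\mathbf{x}$ onto the surface.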
Write $\mathbf{x} = \mathbf{x}_\perp + r \frac{\mathbf{w}}{\|\mathbf{w}\|}$, where $\mathbf{x}_\perp$ is the orthogonal projection of $\mathbf{x}$ onto the decision surface and $r$ is the signed distance of $\mathbf{x}$ from it. Then
$$y(\mathbf{x}) = \mathbf{w}^T \Big( \overbrace{\mathbf{x}_\perp + r \frac{\mathbf{w}}{\|\mathbf{w}\|}}^{\mathbf{x}} \Big) + w_0 = r \frac{\mathbf{w}^T \mathbf{w}}{\|\mathbf{w}\|} + \overbrace{\mathbf{w}^T \mathbf{x}_\perp + w_0}^{0} = r \|\mathbf{w}\|$$
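The relation $y(\mathbf{x}) = r \|\mathbf{w}\|$ can be checked numerically. The following sketch computes the signed distance $r = y(\mathbf{x}) / \|\mathbf{w}\|$, the projection $\mathbf{x}_\perp$, and the distance of the surface from the origin. All numbers and the function name are hypothetical, picked so that $\|\mathbf{w}\| = 5$.

```python
import numpy as np

def signed_distance(x, w, w0):
    """Signed distance r = y(x) / ||w|| of x from the surface y(x) = 0."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

# Hypothetical parameters and input point.
w = np.array([3.0, 4.0])   # ||w|| = 5
w0 = -10.0
x = np.array([4.0, 2.0])

r = signed_distance(x, w, w0)               # (12 + 8 - 10) / 5 = 2
x_perp = x - r * w / np.linalg.norm(w)      # projection of x onto the surface
origin_distance = -w0 / np.linalg.norm(w)   # signed distance of the surface from the origin
```

Evaluating $\mathbf{w}^T \mathbf{x}_\perp + w_0$ for the computed `x_perp` gives zero up to rounding, as the derivation requires.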
More compact notation with $\tilde{\mathbf{w}} = (w_0, \mathbf{w})$ and $\tilde{\mathbf{x}} = (1, \mathbf{x})$:
$$y(\mathbf{x}) = \tilde{\mathbf{w}}^T \tilde{\mathbf{x}}$$
The decision surface is now a $D$-dimensional hyperplane in the $(D+1)$-dimensional expanded input space.
Multi-Class
Figure: Two discriminants, each separating one class from the rest ($C_1$ versus not $C_1$, $C_2$ versus not $C_2$), split the input space into regions $R_1$, $R_2$ and $R_3$.
Figure: Pairwise discriminants between the classes $C_1$, $C_2$ and $C_3$, with regions $R_1$, $R_2$ and $R_3$.
Figure: Decision regions $R_i$, $R_j$ and $R_k$ with two points $\mathbf{x}_A$ and $\mathbf{x}_B$ and a third point $\hat{\mathbf{x}}$.
Each class $C_k$ gets its own linear model $y_k(\mathbf{x}) = \tilde{\mathbf{w}}_k^T \tilde{\mathbf{x}}$, where $k = 1, \dots, K$ and $\tilde{\mathbf{w}}_k, \tilde{\mathbf{x}} \in \mathbb{R}^{D+1}$.
Collecting the weight vectors as columns of $\tilde{\mathbf{W}} = (\tilde{\mathbf{w}}_1, \dots, \tilde{\mathbf{w}}_K) \in \mathbb{R}^{(D+1) \times K}$ gives $\mathbf{y}(\mathbf{x}) = \tilde{\mathbf{W}}^T \tilde{\mathbf{x}} \in \mathbb{R}^K$.
Determine $\tilde{\mathbf{W}}$
$$\tilde{\mathbf{W}} = (\tilde{\mathbf{X}}^T \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^T \mathbf{T} = \tilde{\mathbf{X}}^\dagger \mathbf{T}$$
where $\tilde{\mathbf{X}}^\dagger$ is the pseudo-inverse of $\tilde{\mathbf{X}}$.
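A minimal sketch of the pseudo-inverse solution in Python with NumPy follows. It assumes that the rows of $\tilde{\mathbf{X}}$ are the inputs with a leading 1 prepended, that $\mathbf{T}$ holds one-hot (1-of-$K$) target rows, and that a new input is assigned to the class with the largest output. These conventions, the function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def fit_least_squares(X, labels, K):
    """Return W_tilde = pinv(X_tilde) @ T for one-hot targets T."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend the constant input 1
    T = np.eye(K)[labels]                                # N x K one-hot target matrix
    return np.linalg.pinv(X_tilde) @ T                   # shape (D+1) x K

def predict(W_tilde, X):
    """Assign each row of X to the class with the largest linear output."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X_tilde @ W_tilde, axis=1)

# Hypothetical toy data: three well-separated clusters in two dimensions.
X = np.array([[0.0, 0.0], [0.2, 0.1],
              [4.0, 0.0], [4.1, 0.3],
              [0.0, 4.0], [0.2, 4.2]])
labels = np.array([0, 0, 1, 1, 2, 2])

W_tilde = fit_least_squares(X, labels, K=3)
predicted = predict(W_tilde, X)
```

`np.linalg.pinv` is used instead of forming $(\tilde{\mathbf{X}}^T \tilde{\mathbf{X}})^{-1}$ explicitly, which is the numerically safer choice when $\tilde{\mathbf{X}}^T \tilde{\mathbf{X}}$ is close to singular.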
Fisher's Linear Discriminant
$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$$
Class means: $\mathbf{m}_k = \frac{1}{N_k} \sum_{n \in C_k} \mathbf{x}_n$, with projected class means $m_k = \mathbf{w}^T \mathbf{m}_k$.
Within-class variance of the projected data:
$$s_k^2 = \sum_{n \in C_k} (y_n - m_k)^2$$
where $y_n = \mathbf{w}^T \mathbf{x}_n$.
Maximise the Fisher criterion
$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$$
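The Fisher criterion can be rewritten as
$$J(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{S}_B \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}}$$
with the between-class covariance $\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$ and the within-class covariance
$$\mathbf{S}_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$$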
Differentiating $J(\mathbf{w})$ with respect to $\mathbf{w}$ shows that it is maximised when
$$(\mathbf{w}^T \mathbf{S}_B \mathbf{w}) \, \mathbf{S}_W \mathbf{w} = (\mathbf{w}^T \mathbf{S}_W \mathbf{w}) \, \mathbf{S}_B \mathbf{w}$$
Because $\mathbf{S}_B \mathbf{w}$ always points in the direction of $\mathbf{m}_2 - \mathbf{m}_1$ and only the direction of $\mathbf{w}$ matters,
$$\mathbf{w} \propto \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$$
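As a sketch of the two-class result, the following Python code computes $\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$ and compares its Fisher criterion with that of the direction joining the class means. The synthetic Gaussian data, the random seed and the function names are assumptions made for illustration. The code also assumes that $\mathbf{S}_W$ is invertible, which holds for this data.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return the unit vector w proportional to S_W^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class covariance: summed outer products of the centred samples.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w) = (m2 - m1)^2 / (s1^2 + s2^2) for the projections y_n = w^T x_n."""
    y1, y2 = X1 @ w, X2 @ w
    s1_sq = np.sum((y1 - y1.mean()) ** 2)
    s2_sq = np.sum((y2 - y2.mean()) ** 2)
    return (y2.mean() - y1.mean()) ** 2 / (s1_sq + s2_sq)

# Hypothetical data: two elongated, parallel Gaussian clusters.
rng = np.random.default_rng(0)
cov = np.array([[4.0, 3.5], [3.5, 4.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([3.0, 0.0], cov, size=200)

w_fisher = fisher_direction(X1, X2)
diff = X2.mean(axis=0) - X1.mean(axis=0)
w_means = diff / np.linalg.norm(diff)   # direction joining the two class means

J_fisher = fisher_criterion(w_fisher, X1, X2)
J_means = fisher_criterion(w_means, X1, X2)
```

For strongly correlated clusters like these, projecting onto the line joining the means leaves the classes overlapping, while the Fisher direction accounts for the within-class covariance, so `J_fisher` is at least as large as `J_means`.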
Within-class covariance:
$$\mathbf{S}_W = \sum_{k=1}^{K} \mathbf{S}_k$$
where
$$\mathbf{S}_k = \sum_{n \in C_k} (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^T, \qquad \mathbf{m}_k = \frac{1}{N_k} \sum_{n \in C_k} \mathbf{x}_n$$
Between-class covariance:
$$\mathbf{S}_B = \sum_{k=1}^{K} N_k (\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^T, \qquad \mathbf{m} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$$
How many linear features can one find with this method?
$\mathbf{S}_B$ has rank at most $K - 1$: it is a sum of $K$ rank-one matrices, and the global mean $\mathbf{m}$ imposes one linear constraint on them.
Projection onto the subspace spanned by $\mathbf{S}_B$ cannot yield more than $K - 1$ linear features.
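The rank bound can be checked numerically. The sketch below builds the within-class and between-class covariance matrices for $K = 3$ synthetic classes in $D = 5$ dimensions and computes the rank of $\mathbf{S}_B$. It assumes the standard weighting $N_k (\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^T$ for the between-class terms. The data, the seed, the function name and the rank tolerance are illustrative choices.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_W, S_B) for data X (N x D) with integer class labels."""
    m = X.mean(axis=0)                       # global mean
    D = X.shape[1]
    S_W, S_B = np.zeros((D, D)), np.zeros((D, D))
    for k in np.unique(labels):
        X_k = X[labels == k]
        m_k = X_k.mean(axis=0)               # class mean
        S_W += (X_k - m_k).T @ (X_k - m_k)   # S_k
        S_B += len(X_k) * np.outer(m_k - m, m_k - m)
    return S_W, S_B

# Hypothetical data: K = 3 classes in D = 5 dimensions with random class centres.
rng = np.random.default_rng(1)
K, D, N_k = 3, 5, 50
centres = rng.normal(scale=5.0, size=(K, D))
X = np.vstack([rng.normal(loc=centres[k], size=(N_k, D)) for k in range(K)])
labels = np.repeat(np.arange(K), N_k)

S_W, S_B = scatter_matrices(X, labels)
# Explicit tolerance so that rounding noise is not counted as rank.
rank_S_B = np.linalg.matrix_rank(S_B, tol=1e-8 * np.linalg.norm(S_B))
```

Although $\mathbf{S}_B$ is a $5 \times 5$ matrix here, its rank does not exceed $K - 1 = 2$, so at most two discriminative linear features can be extracted.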
The Perceptron Algorithm
Perceptron criterion:
$$E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^T \boldsymbol{\phi}_n t_n$$
where $\mathcal{M}$ denotes the set of all misclassified patterns.
Update of the weights for a misclassified pattern $\boldsymbol{\phi}_n$ with target $t_n$:
$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \boldsymbol{\phi}_n t_n$$
Figure: Illustration of two successive perceptron weight updates.
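Substituting the update into the error contribution of the misclassified pattern $n$ gives
$$-\mathbf{w}^{(\tau+1)T} \boldsymbol{\phi}_n t_n = -\mathbf{w}^{(\tau)T} \boldsymbol{\phi}_n t_n - (\boldsymbol{\phi}_n t_n)^T \boldsymbol{\phi}_n t_n < -\mathbf{w}^{(\tau)T} \boldsymbol{\phi}_n t_n$$
because $(\boldsymbol{\phi}_n t_n)^T \boldsymbol{\phi}_n t_n = \|\boldsymbol{\phi}_n t_n\|^2 > 0$.
BUT: contributions to the error from the other misclassified patterns might have increased.
AND: some correctly classified patterns might now be misclassified.
Perceptron Convergence Theorem: If the training set is linearly separable, the perceptron algorithm is guaranteed to find a solution in a finite number of steps.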
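The update rule and the convergence statement can be illustrated with a short Python sketch. It assumes targets $t_n \in \{-1, +1\}$, a constant first feature acting as the bias, and that a pattern counts as misclassified when $t_n \mathbf{w}^T \boldsymbol{\phi}_n \leq 0$. The data, the epoch limit and the function name are hypothetical.

```python
import numpy as np

def train_perceptron(Phi, t, max_epochs=100):
    """Cycle through the patterns; add phi_n * t_n whenever pattern n is misclassified."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        updates = 0
        for phi_n, t_n in zip(Phi, t):
            if t_n * np.dot(w, phi_n) <= 0:   # misclassified (or on the boundary)
                w = w + phi_n * t_n
                updates += 1
        if updates == 0:                      # every pattern is classified correctly
            return w, True
    return w, False

# Hypothetical linearly separable data; the first feature is a constant 1 (bias).
Phi = np.array([[1.0, 2.0, 1.0],
                [1.0, 1.5, 2.0],
                [1.0, -1.0, -1.5],
                [1.0, -2.0, -0.5]])
t = np.array([1, 1, -1, -1])

w, converged = train_perceptron(Phi, t)
```

On separable data such as this the loop stops after finitely many updates, in line with the convergence theorem. On non-separable data it would run until `max_epochs` and return `False` as the second value.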