Introduction to Statistical Machine Learning
ISML 2013
Canberra, February to June 2013
© 2013 Christfried Webers
NICTA, The Australian National University
Outlines
Overview
Introduction
Linear Algebra
Probability
Linear Regression 1
Linear Regression 2
Linear Classification 1
Linear Classification 2
Neural Networks 1
Neural Networks 2
Kernel Methods
Sparse Kernel Methods
Graphical Models 1
Graphical Models 2
Graphical Models 3
Mixture Models and EM 1
Mixture Models and EM 2
Approximate Inference
Sampling
Principal Component Analysis
Sequential Data 1
Sequential Data 2
Combining Models
Selected Topics
Discussion and Summary
Part III: Linear Algebra

Basic Concepts
Linear Transformations
Trace
Inner Product
Projection
Rank, Determinant, Trace
Matrix Inverse
Eigenvectors
Singular Value Decomposition
Directional Derivative, Gradient
Books
Intuition
Geometry
Points and Lines
Vector addition and scaling
Humans have experience with 3 dimensions (less with 1 and 2, though)
Generalisation to $N$ dimensions (possibly $N \to \infty$)
Line $\to$ vector space $V$
Point $\to$ vector $x \in V$
Example: $X \in \mathbb{R}^{n \times m}$. The space of matrices $\mathbb{R}^{n \times m}$ and the space of vectors $\mathbb{R}^{nm}$ are isomorphic.
Matrix-Vector Multiplication
# compute R = A V entry by entry
for i in range(m):
    R[i] = 0.0
    for j in range(n):
        R[i] = R[i] + A[i, j] * V[j]
$$A V =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
=
\begin{bmatrix}
a_{11} v_1 + a_{12} v_2 + \cdots + a_{1n} v_n \\
a_{21} v_1 + a_{22} v_2 + \cdots + a_{2n} v_n \\
\vdots \\
a_{m1} v_1 + a_{m2} v_2 + \cdots + a_{mn} v_n
\end{bmatrix}$$
Equivalently, with $a_1, a_2, \dots, a_n$ denoting the columns of $A$,

$$A V =
\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
= a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$$
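The loop above and both algebraic views compute the same product. A quick NumPy sketch (NumPy, the seed, and the array shapes are my own choices, not part of the slides):

import numpy as np

m, n = 3, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(m, n))
V = rng.normal(size=n)

# Row view: element i is the inner product of row i of A with V.
R = np.zeros(m)
for i in range(m):
    for j in range(n):
        R[i] += A[i, j] * V[j]

# Column view: the product is a linear combination of the columns of A.
R_cols = sum(V[j] * A[:, j] for j in range(n))

assert np.allclose(R, A @ V)
assert np.allclose(R_cols, A @ V)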
Transpose of R
Given $R = AV$, what is $R^T$?

$$R = \begin{bmatrix} a_1 v_1 + a_2 v_2 + \cdots + a_n v_n \end{bmatrix}$$
Transpose
$$R^T = (AV)^T = V^T A^T =
\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
\begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{bmatrix}
= v_1 a_1^T + v_2 a_2^T + \cdots + v_n a_n^T$$
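A small numerical check of this identity, writing $R^T$ as a weighted sum of the transposed columns of $A$; a sketch only, NumPy and the shapes are my own choices:

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
V = rng.normal(size=(4, 1))      # column vector

R = A @ V                        # R = AV, shape (3, 1)
# R^T = V^T A^T as a weighted sum of the transposed columns of A
RT = sum(V[k, 0] * A[:, k].reshape(1, -1) for k in range(4))

assert np.allclose(R.T, V.T @ A.T)
assert np.allclose(R.T, RT)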
Trace
The trace of a square matrix $A$ is the sum of its diagonal elements, $\operatorname{tr}\{A\} = \sum_i a_{ii}$.

Example

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\qquad
\operatorname{tr}\{A\} = 1 + 5 + 9 = 15$$
vec(X) operator

Define vec(X) as the vector which results from stacking all columns of the matrix X on top of each other.
Example
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\qquad
\operatorname{vec}(A) = \begin{bmatrix} 1 \\ 4 \\ 7 \\ 2 \\ 5 \\ 8 \\ 3 \\ 6 \\ 9 \end{bmatrix}$$

$$\operatorname{tr}\left\{X^T Y\right\} = \operatorname{vec}(X)^T \operatorname{vec}(Y)$$
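In NumPy, vec corresponds to column-major ('F') flattening; a short sketch (not part of the slides) of the example and of the identity $\operatorname{tr}\{X^T Y\} = \operatorname{vec}(X)^T \operatorname{vec}(Y)$:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(A.flatten(order='F'))          # [1 4 7 2 5 8 3 6 9] = vec(A)

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 4))
Y = rng.normal(size=(3, 4))
lhs = np.trace(X.T @ Y)
rhs = X.flatten(order='F') @ Y.flatten(order='F')
assert np.isclose(lhs, rhs)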
Inner Product
$\langle \cdot, \cdot \rangle : V \times V \to F$

Conjugate Symmetry: $\langle x, y \rangle = \overline{\langle y, x \rangle}$

Linearity: $\langle a x, y \rangle = a \langle x, y \rangle$, and $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$

Positive-definiteness: $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ for $x = 0$ only.
Examples:

$$\langle x, y \rangle = x^T y = \sum_{k=1}^{n} x_k y_k = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

and, for matrices $X, Y \in \mathbb{R}^{n \times p}$,

$$\langle X, Y \rangle = \operatorname{tr}\left\{X^T Y\right\} = \sum_{k=1}^{p} (X^T Y)_{kk} = \sum_{k=1}^{p} \sum_{l=1}^{n} X_{lk} Y_{lk}$$
The matrix inner product is additive in its first argument:

$$\begin{aligned}
\langle X + Y, Z \rangle &= \operatorname{tr}\left\{(X+Y)^T Z\right\} = \sum_{k=1}^{p} \big((X+Y)^T Z\big)_{kk} \\
&= \sum_{k=1}^{p} \sum_{l=1}^{n} (X+Y)_{lk} Z_{lk} = \sum_{k=1}^{p} \sum_{l=1}^{n} \big(X_{lk} Z_{lk} + Y_{lk} Z_{lk}\big) \\
&= \sum_{k=1}^{p} \Big[ (X^T Z)_{kk} + (Y^T Z)_{kk} \Big] \\
&= \operatorname{tr}\left\{X^T Z\right\} + \operatorname{tr}\left\{Y^T Z\right\} = \langle X, Z \rangle + \langle Y, Z \rangle
\end{aligned}$$
Projection
In linear algebra and functional analysis, a projection is a linear transformation $P$ from a vector space $V$ to itself such that
$$P^2 = P$$
Example: the point $(a, b)$ is projected to $(a - b, 0)$,

$$A \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a - b \\ 0 \end{bmatrix},
\qquad
A = \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}$$
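A short check (my own sketch, not from the slides) that the example matrix is idempotent and therefore a projection, though not an orthogonal one since it is not symmetric:

import numpy as np

A = np.array([[1., -1.],
              [0.,  0.]])
assert np.allclose(A @ A, A)        # P^2 = P: A is a projection
print(np.allclose(A, A.T))          # False: not an orthogonal projection
print(A @ np.array([3., 2.]))       # [1. 0.] = (a - b, 0) for (a, b) = (3, 2)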
Orthogonal Projection
Orthogonality: need an inner product $\langle x, y \rangle$.

Choose two arbitrary vectors $x$ and $y$. For an orthogonal projection $P$, the vectors $Px$ and $y - Py$ are orthogonal:

$$0 = \langle Px,\, y - Py \rangle = (Px)^T (y - Py) = x^T \big(P^T - P^T P\big)\, y$$

Orthogonal projection: since this holds for all $x$ and $y$, we need $P^T = P^T P$, and therefore $P = P^T$.
Orthogonal Projection
Given a matrix $A \in \mathbb{R}^{n \times p}$ and a vector $x$, what is the closest point $\tilde{x}$ to $x$ in the column space of $A$?

Orthogonal projection onto the column space of $A$:
$$\tilde{x} = A (A^T A)^{-1} A^T x$$

Projection matrix:
$$P = A (A^T A)^{-1} A^T$$

Proof:
$$P^2 = A (A^T A)^{-1} A^T A (A^T A)^{-1} A^T = A (A^T A)^{-1} A^T = P$$

Orthogonal projection?
$$P^T = \big(A (A^T A)^{-1} A^T\big)^T = A (A^T A)^{-1} A^T = P$$
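A numerical sketch of the projection onto the column space of a random tall matrix, checking the two properties shown above (NumPy, seed, and sizes are my own choices):

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 2))          # A in R^{n x p}, n = 5, p = 2
x = rng.normal(size=5)

P = A @ np.linalg.inv(A.T @ A) @ A.T
x_tilde = P @ x

assert np.allclose(P @ P, P)         # idempotent
assert np.allclose(P, P.T)           # symmetric, hence orthogonal projection
# The residual is orthogonal to every column of A
assert np.allclose(A.T @ (x - x_tilde), 0)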
Linear independence: vectors $x_1, \dots, x_k$ are linearly independent if $a_1 x_1 + a_2 x_2 + \cdots + a_k x_k = 0$ holds only for $a_1 = a_2 = \cdots = a_k = 0$.
Rank
Determinant
$$\det\{A\} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A_{i, \sigma(i)}$$

$$\det\{A^T\} = \det\{A\}$$
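A quick numerical check (not part of the slides) using np.linalg.det; the 3x3 matrix from the trace example is singular, so its determinant is zero, and a small 2x2 case matches the permutation formula:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
print(np.linalg.det(A))                               # ~0: this matrix is singular
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

B = np.array([[2., 1.],
              [1., 3.]])
assert np.isclose(np.linalg.det(B), 2 * 3 - 1 * 1)    # = 5, by the permutation formula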
Matrix Inverse
Identity matrix $I$:
$$I = A A^{-1} = A^{-1} A$$

The matrix inverse $A^{-1}$ exists only for square matrices which are NOT singular.

Singular matrix: at least one eigenvalue is zero, determinant $|A| = 0$.

$$(AB)^{-1} = B^{-1} A^{-1}$$
(Only then is $(AB)(AB)^{-1} = (AB) B^{-1} A^{-1} = A A^{-1} = I$.)
What is $(A^{-1})^T$?

Need to assume that $A^{-1}$ exists.
Rule: $(A^{-1})^T = (A^T)^{-1}$
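The rules $(AB)^{-1} = B^{-1} A^{-1}$ and $(A^{-1})^T = (A^T)^{-1}$ are easy to check numerically; a sketch with random (almost surely non-singular) matrices of my own choosing:

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # very likely non-singular
B = rng.normal(size=(3, 3)) + 3 * np.eye(3)
inv = np.linalg.inv

assert np.allclose(inv(A @ B), inv(B) @ inv(A))    # (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(inv(A).T, inv(A.T))             # (A^{-1})^T = (A^T)^{-1}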
Useful Identity
$$\big(A^{-1} + B^T C^{-1} B\big)^{-1} B^T C^{-1} = A B^T \big(B A B^T + C\big)^{-1}$$
Multiply by $(B A B^T + C)$:

$$\begin{aligned}
\big(A^{-1} + B^T C^{-1} B\big)^{-1} B^T C^{-1} \big(B A B^T + C\big)
&= \big(A^{-1} + B^T C^{-1} B\big)^{-1} \big[ B^T C^{-1} B (A B^T) + B^T \big] \\
&= \big(A^{-1} + B^T C^{-1} B\big)^{-1} \big[ B^T C^{-1} B + A^{-1} \big] A B^T \\
&= A B^T
\end{aligned}$$

and hence the left-hand side equals $A B^T (B A B^T + C)^{-1}$.
Woodbury Identity
$$\big(A + B D^{-1} C\big)^{-1} = A^{-1} - A^{-1} B \big(D + C A^{-1} B\big)^{-1} C A^{-1}$$
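Both the identity above and the Woodbury identity can be sanity-checked numerically; a sketch where the symmetric positive definite choices merely guarantee that all inverses exist (shapes and seed are my own):

import numpy as np

rng = np.random.default_rng(5)
inv = np.linalg.inv

def spd(k):
    # random symmetric positive definite matrix, so the inverses below exist
    M = rng.normal(size=(k, k))
    return M @ M.T + k * np.eye(k)

# (A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}
A, C = spd(4), spd(3)
B = rng.normal(size=(3, 4))
lhs = inv(inv(A) + B.T @ inv(C) @ B) @ B.T @ inv(C)
rhs = A @ B.T @ inv(B @ A @ B.T + C)
assert np.allclose(lhs, rhs)

# Woodbury: (A + B D^{-1} C)^{-1} = A^{-1} - A^{-1} B (D + C A^{-1} B)^{-1} C A^{-1}
A2, D = spd(4), spd(2)
B2 = rng.normal(size=(4, 2))
C2 = rng.normal(size=(2, 4))      # generically, A2 + B2 D^{-1} C2 is invertible
lhs = inv(A2 + B2 @ inv(D) @ C2)
rhs = inv(A2) - inv(A2) @ B2 @ inv(D + C2 @ inv(A2) @ B2) @ C2 @ inv(A2)
assert np.allclose(lhs, rhs)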
Eigenvectors
A nonzero vector $x$ is an eigenvector of $A$ with eigenvalue $\lambda \in \mathbb{C}$ if $A x = \lambda x$.

Example:
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} x = \lambda x$$
$$\lambda = \{1, -1\},
\qquad
x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \ \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
$A x = \lambda x$ is equivalent to
$$(A - \lambda I)\, x = 0$$
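As a quick check (not part of the slides), np.linalg.eig recovers the eigenvalues and normalised eigenvectors of the example matrix, and $\det\{A - \lambda I\} = 0$ at each eigenvalue:

import numpy as np

A = np.array([[0., 1.],
              [1., 0.]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                 # [ 1. -1.] (order may differ)
# Columns of eigvecs are unit-norm multiples of (1, 1) and (1, -1)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
    assert np.allclose(np.linalg.det(A - lam * np.eye(2)), 0)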
Real Eigenvalues
In general a matrix may have complex entries, which can be split into real and imaginary parts:

$$\begin{bmatrix} 1 + 2i & 3 + 4i \\ 5 + 6i & 7 + 8i \end{bmatrix}
= \begin{bmatrix} 1 & 3 \\ 5 & 7 \end{bmatrix} + i \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}$$
How can we enforce real eigenvalues?

Let's assume $A \in \mathbb{C}^{n \times n}$ is Hermitian ($A^H = A$).

Calculate $x^H A x = \lambda\, x^H x$ for an eigenvector $x \in \mathbb{C}^n$ of $A$.

Another possibility to calculate $x^H A x$:
$$\begin{aligned}
x^H A x &= x^H A^H x && \text{($A$ is Hermitian)} \\
        &= (x^H A x)^H && \text{(reverse order)} \\
        &= (\lambda\, x^H x)^H && \text{(eigenvalue)} \\
        &= \bar{\lambda}\, x^H x
\end{aligned}$$

and therefore $\lambda = \bar{\lambda}$ ($\lambda$ is real).
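A numerical illustration (my own sketch): the eigenvalues of a random Hermitian matrix come out real up to round-off, and np.linalg.eigvalsh exploits the Hermitian structure directly:

import numpy as np

rng = np.random.default_rng(6)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2             # Hermitian: A^H = A

eigvals = np.linalg.eigvals(A)       # general solver, returns complex numbers
assert np.allclose(eigvals.imag, 0)  # imaginary parts are (numerically) zero

print(np.linalg.eigvalsh(A))         # solver for Hermitian matrices: real output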
Directional Derivative, Gradient