
Statistical ML

Basics for Statistical Machine Learning
Linear Algebra Basics

Mikaela Keller
IDIAP Research Institute
Martigny, Switzerland
mkeller[at]idiap.ch

July 2nd, 2007

1 / 22
Motivation

Linear Algebra Basics

2 / 22
Motivation
Concrete Example: Regression

- Determination of abalone age by:
  - Counting the number of rings in the shell through a microscope ← time-consuming task.
  - Through other measurements: sex, diameter, height, whole weight, shell weight, etc. ← easy to obtain.
- Regression problem: training examples = {(easy measurements, age)}. We want to predict the age of the abalone from the easy measurements alone.

4 / 22
Motivation
Concrete Example: Classification

[Figure: 2-D scatter plot of two classes, both axes from −2 to 2]

- Written digits classification:
  - Automatic recognition of the postal code from scanned mail.
- Classification problem: training examples = {(image, actual digit)}. We want to predict the correct digit from a new image.

5 / 22
Motivation
Concrete Example: Density Estimation / Clustering

[Figure: left panel, waiting time (40 to 100) vs duration of the previous eruption (1 to 6); right panel, 2-D scatter with both axes from −2 to 2]

- Data compression / data visualization / data exploration:
  - Time between two eruptions vs duration of the previous eruption.
- Unsupervised problem: training examples = {(measurement)}. We want to "organize" the information contained in the measurements.

6 / 22
Motivation

- Most of the problems described previously end up reformulated into:
  - curves or surfaces to be discovered,
  - i.e. systems of equations with unknowns to be solved,
  - i.e. matrix manipulation operations.
  ⇒ Linear Algebra.
- Diverse sources of uncertainty:
  - limited amount of examples,
  - noise in the measurements,
  - randomness inherent to the observed phenomena, etc.
  ⇒ Probability Theory.

7 / 22
Motivation

Linear Algebra Basics
  Vectors
  Matrices
  Determinant
  Inverses
  Matrix Diagonalization

8 / 22
Vectors

- Examples x are usually represented as vectors of m components:

  x = (x1, ..., xm)^T (a column vector),   x^T = (x1, ..., xm).

- Inner product (aka dot product, scalar product):

  x^T y = (x1, ..., xm) (y1, ..., ym)^T = x1 y1 + ... + xm ym.

9 / 22
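The slides keep this on paper, but the inner product is one line in code. A minimal sketch using NumPy (not part of the original slides; the vectors are made-up illustrations):

```python
import numpy as np

# A vector x with m = 3 components, stored as a 1-D array.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

# Inner product x^T y = x1*y1 + ... + xm*ym.
ip = x @ y
print(ip)  # 1*4 + 2*(-1) + 3*0.5 = 3.5
```

`x @ y` and `np.dot(x, y)` are equivalent for 1-D arrays.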
Vectors

- "x and y are orthogonal (x ⊥ y)" ⇔ x^T y = 0.
- The norm (length) of x:

  ‖x‖ = √(x^T x).

- The distance between 2 vectors x and y is defined as d(x, y) = ‖x − y‖:

  d(x, y)^2 = ‖x‖^2 + ‖y‖^2 − 2 x^T y.

10 / 22
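The distance identity above can be checked numerically. A short sketch with NumPy (the two example vectors are invented for illustration):

```python
import numpy as np

x = np.array([2.0, 1.0])
y = np.array([-1.0, 3.0])

norm_x = np.sqrt(x @ x)       # ‖x‖ = sqrt(x^T x)
dist = np.linalg.norm(x - y)  # d(x, y) = ‖x − y‖

# Check the identity d(x, y)^2 = ‖x‖^2 + ‖y‖^2 − 2 x^T y.
lhs = dist ** 2
rhs = x @ x + y @ y - 2 * (x @ y)
print(abs(lhs - rhs) < 1e-9)  # True
```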
Matrices

n equations with m unknowns x1, ..., xm:

  a11 x1 + ... + a1m xm = b1
        ...
  an1 x1 + ... + anm xm = bn

  ⇔

  [ a11 ... a1m ] [ x1 ]   [ b1 ]
  [ ...     ... ] [ .. ] = [ .. ]
  [ an1 ... anm ] [ xm ]   [ bn ]

  ⇔  A(n×m) x(m×1) = b(n×1).

11 / 22
Matrices

The same system, with A viewed as a stack of row vectors:

  [ (a11, ..., a1m) ] [ x1 ]   [ b1 ]
  [       ...       ] [ .. ] = [ .. ]
  [ (an1, ..., anm) ] [ xm ]   [ bn ]

  ⇔  Ax = b.

11 / 22
Matrices
Geometrical view

2-D Example:

  2 x1 − x2  = 0
  x1 + 3 x2 = 2

[Figure: each equation is a line in the plane; the first passes through (0,0) and (1,2), the second through (2,0) and (−1,1). The solution is their intersection.]

12 / 22
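The intersection point of the two lines can be computed directly. A sketch using NumPy's linear solver (the code is an addition, not from the slides):

```python
import numpy as np

# The 2-D example:  2*x1 - x2 = 0,  x1 + 3*x2 = 2.
A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
b = np.array([0.0, 2.0])

x = np.linalg.solve(A, b)
print(x)  # [2/7, 4/7], i.e. the intersection of the two lines
```

Substituting back: 2·(2/7) − 4/7 = 0 and 2/7 + 3·(4/7) = 2, as required.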
Matrices

n equations with m unknowns x1, ..., xm:

  Ax = b
  ⇔  x1 (a11, ..., an1)^T + ... + xm (a1m, ..., anm)^T = (b1, ..., bn)^T.

A real-valued matrix A(n×m) is also seen as a linear transformation:

  A : R^m → R^n
      x  → Ax.

13 / 22
Matrices
Alternate geometrical view

2-D Example:

  2 x1 − x2  = 0
  x1 + 3 x2 = 2
  ⇔  x1 (2, 1)^T + x2 (−1, 3)^T = (0, 2)^T.

[Figure: b = (0, 2)^T reached as a linear combination of the column vectors (2, 1)^T and (−1, 3)^T.]

14 / 22
Matrices
Alternate geometrical view (No solution)

2-D Example:

  2 x1 − 2 x2 = 0
  x1 − x2 = 2
  ⇔  x1 (2, 1)^T + x2 (−2, −1)^T = (0, 2)^T.

The columns (2, 1)^T and (−2, −1)^T are collinear, so their combinations span only a line, which does not contain b = (0, 2)^T: the system has no solution.

15 / 22
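Numerically, this failure shows up as a singular coefficient matrix. A sketch with NumPy (added for illustration):

```python
import numpy as np

# The inconsistent system: 2*x1 - 2*x2 = 0, x1 - x2 = 2.
A = np.array([[2.0, -2.0],
              [1.0, -1.0]])
b = np.array([0.0, 2.0])

d = np.linalg.det(A)
print(d)  # 0.0: the columns are collinear, A is singular

try:
    np.linalg.solve(A, b)
except np.linalg.LinAlgError as e:
    print("no unique solution:", e)
```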
Determinant

Recursive Definition: Let A be a square matrix (m × m),

  det(A) = Σ_{j=1}^{m} (−1)^(1+j) a1j det(M1j),

where Mij is A without its line i and its column j, and det(m) = m for m scalar.

Example:

  det(A) = | a11 a12 a13 |
           | a21 a22 a23 |
           | a31 a32 a33 |

         = a11 | a22 a23 | − a12 | a21 a23 | + a13 | a21 a22 |
               | a32 a33 |       | a31 a33 |       | a31 a32 |

         = a11 (a22 a33 − a32 a23) − a12 (a21 a33 − a31 a23) + a13 (a21 a32 − a31 a22).

16 / 22
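The recursive definition translates directly into code. A sketch of the cofactor expansion in NumPy, checked against the library determinant (the 3×3 test matrix is invented):

```python
import numpy as np

def det(A):
    """Determinant by cofactor expansion along the first row,
    following the recursive definition above."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    if m == 1:
        return A[0, 0]  # det(m) = m for a scalar
    total = 0.0
    for j in range(m):
        # M1j: A without its first row and its column j.
        M1j = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det(M1j)  # (−1)^(1+j) with 0-based j
    return total

A = np.array([[2.0, -1.0, 0.0],
              [1.0,  3.0, 2.0],
              [0.0,  1.0, 1.0]])
print(det(A), np.linalg.det(A))  # both ≈ 3.0
```

Cofactor expansion costs O(m!), so in practice `np.linalg.det` (which uses an LU factorization) is the tool of choice.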
Inverses

- Definition: A square matrix A(m×m) is called non-singular or invertible if there exists a matrix B(m×m) such that:

  AB = Im = BA,

  where Im is the m × m identity matrix. If such a B exists it is called the inverse of A and denoted A^(−1).
- "A is invertible" ⇔ det(A) ≠ 0 ⇔ "Ax = 0 iff x = 0".
- If A (square) is invertible, the solution of the system Ax = b is x = A^(−1) b.

17 / 22
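A quick check of the definition and of x = A^(−1) b, reusing the earlier 2-D example (NumPy code added for illustration):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
b = np.array([0.0, 2.0])

Ainv = np.linalg.inv(A)
print(np.allclose(Ainv @ A, np.eye(2)))  # True: A^(-1) A = I

x = Ainv @ b                  # x = A^(-1) b solves Ax = b ...
print(np.allclose(A @ x, b))  # True
# ... though np.linalg.solve(A, b) is preferred numerically
# (it factorizes A instead of forming the inverse).
```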
Determinants and Inverses
Geometrical view

2-D Example:

  |det(A)| = |det( [2 −1; 1 3] )| = |2 · 3 − 1 · (−1)| = 7.

[Figure: |det(A)| is the area of the parallelogram spanned by the columns a.1 = (2, 1)^T and a.2 = (−1, 3)^T, i.e. OP · OQ · sin(θ2 − θ1) for the two column vectors at angles θ1, θ2.]

18 / 22
Matrices

- If A is rectangular and A^T A is invertible, the (least-squares) solution of the system Ax = b is x = (A^T A)^(−1) A^T b.
- (A^T A)^(−1) A^T is called the pseudo-inverse of A.
- Let X(n×m) = [x1^T; ...; xn^T] be a collection of examples, one example per row.
- The Gram matrix of this collection is:

  G = XX^T = [ x1^T x1 ... x1^T xn ]
             [   ...  ...     ...  ]
             [ xn^T x1 ... xn^T xn ].

- A real-valued square matrix A is said to be positive semidefinite if for any vector z: z^T A z ≥ 0.
- Gram matrices are positive semidefinite matrices.

19 / 22
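Both claims on this slide can be exercised in a few lines. A sketch with NumPy, using random made-up data (the seed and sizes are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined system: 5 equations, 2 unknowns.
A = rng.normal(size=(5, 2))
b = rng.normal(size=5)

# Least-squares solution via the pseudo-inverse (A^T A)^(-1) A^T,
# compared with the library least-squares routine.
x = np.linalg.inv(A.T @ A) @ A.T @ b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))  # True

# The Gram matrix of the rows of A is positive semidefinite:
G = A @ A.T
z = rng.normal(size=5)
print(z @ G @ z >= -1e-12)  # True for any z, since z^T G z = ||A^T z||^2
```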
Matrix Diagonalization

- An eigenvector u of A (square matrix) is a solution (≠ 0) of the equation Au = λu ⇔ (A − λI)u = 0, for a particular λ called the associated eigenvalue.
- Eigenvalues are solutions of the characteristic polynomial: det(A − λI) = 0.
- If A(n×n) is real-valued and symmetric then:
  - all eigenvalues λ1, ..., λn are real-valued, and
  - we can find n eigenvectors u1, ..., un such that ui ⊥ uj and ‖uj‖ = 1, i.e. a new basis for R^n.
- If P = (u1, ..., un), then A can be rewritten as:

  A = P diag(λ1, ..., λn) P^T.

- "A positive semidefinite" ⇔ λi ≥ 0 for all i.

20 / 22
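The decomposition A = P diag(λ) P^T for a symmetric matrix can be verified with NumPy's symmetric eigensolver (the 2×2 example matrix is invented):

```python
import numpy as np

# A real symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric matrices: real eigenvalues
# (in ascending order) and orthonormal eigenvectors as columns of P.
lam, P = np.linalg.eigh(A)
print(lam)  # [1. 3.]

# Reconstruction A = P diag(lambda) P^T:
print(np.allclose(P @ np.diag(lam) @ P.T, A))  # True
# P is orthogonal: P^T P = I.
print(np.allclose(P.T @ P, np.eye(2)))  # True
# All eigenvalues >= 0, so this A is positive semidefinite.
```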
Singular Value Decomposition

- The Singular Value Decomposition is a generalization of matrix diagonalization to rectangular matrices.
- Any real-valued matrix M(n×m) can be rewritten as:

  M = U(n×n) Σ(n×m) V(m×m)^T,

  where U and V are orthogonal matrices and σij = 0 unless i = j.

21 / 22
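A sketch of the full SVD of a rectangular matrix with NumPy (the 3×2 example is made up; note `np.linalg.svd` returns the diagonal of Σ as a 1-D array):

```python
import numpy as np

# A rectangular (3 × 2) matrix.
M = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

# full_matrices=True gives U (3x3) and V^T (2x2), matching the slide;
# s holds the singular values, i.e. the diagonal of the 3x2 Sigma.
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the rectangular Sigma and check M = U Sigma V^T.
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, M))  # True
```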
Acknowledgement

- Sources of inspiration:
  - Linear Algebra: Gilbert Strang's MIT course and "Elementary Linear Algebra" by Keith Matthews (both on the web).
  - Some of the motivating figures: Christopher M. Bishop's book "Pattern Recognition and Machine Learning".

22 / 22
