
Statistical ML

Basics for Statistical Machine Learning
Linear Algebra Basics

Mikaela Keller
IDIAP Research Institute
Martigny, Switzerland
mkeller[at]idiap.ch

July 2nd, 2007

1 / 22
Motivation

Linear Algebra Basics

2 / 22
Motivation
Concrete Example: Regression

- Determination of abalone age by:
  - Counting the number of rings in the shell through a microscope ← time-consuming task.
  - Through other measurements: sex, diameter, height, whole weight, shell weight, etc. ← easy to obtain.
- Regression problem: training examples = {(easy measurements, age)}. We want to predict the age of the abalone from the easy measurements alone.

4 / 22
Motivation
Concrete Example: Classification

[Figure: 2-D scatter plot of two classes, both axes from −2 to 2]

- Written digits classification:
  - Automatic recognition of the postal code from scanned mail.
- Classification problem: training examples = {(image, actual digit)}. We want to predict the correct digit from a new image.

5 / 22
Motivation
Concrete Example: Density Estimation / Clustering

[Figure: left panel, waiting time (40 to 100) vs duration of the previous eruption (1 to 6); right panel, 2-D scatter with both axes from −2 to 2]

- Data compression / data visualization / data exploration:
  - Time between two eruptions vs duration of the previous eruption.
- Unsupervised problem: training examples = {(measurement)}. We want to "organize" the information contained in the measurements.

6 / 22
Motivation

- Most of the problems described previously end up reformulated into:
  - curves or surfaces to be discovered,
  - i.e. systems of equations with unknowns to be solved,
  - i.e. matrix manipulation operations.
  ⇒ Linear Algebra.
- Diverse sources of uncertainty:
  - limited amount of examples,
  - noise in the measurements,
  - randomness inherent to the observed phenomena, etc.
  ⇒ Probability Theory.

7 / 22
Motivation

Linear Algebra Basics
  Vectors
  Matrices
  Determinant
  Inverses
  Matrix Diagonalization

8 / 22
Vectors

- Examples x are usually represented as vectors of m components:

  x = (x1, ..., xm)^T (a column vector),   x^T = (x1, ..., xm).

- Inner product (aka dot product, scalar product):

  x^T y = (x1, ..., xm) (y1, ..., ym)^T = x1 y1 + ... + xm ym.

9 / 22
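The slides keep this on paper, but the inner product is one line in code. A minimal sketch using NumPy (not part of the original slides; the vectors are made-up illustrations):

```python
import numpy as np

# A vector x with m = 3 components, stored as a 1-D array.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

# Inner product x^T y = x1*y1 + ... + xm*ym.
ip = x @ y
print(ip)  # 1*4 + 2*(-1) + 3*0.5 = 3.5
```

`x @ y` and `np.dot(x, y)` are equivalent for 1-D arrays.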
Vectors

- "x and y are orthogonal (x ⊥ y)" ⇔ x^T y = 0.
- The norm (length) of x:

  ‖x‖ = √(x^T x).

- The distance between 2 vectors x and y is defined as d(x, y) = ‖x − y‖:

  d(x, y)^2 = ‖x‖^2 + ‖y‖^2 − 2 x^T y.

10 / 22
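The distance identity above can be checked numerically. A short sketch with NumPy (the two example vectors are invented for illustration):

```python
import numpy as np

x = np.array([2.0, 1.0])
y = np.array([-1.0, 3.0])

norm_x = np.sqrt(x @ x)       # ‖x‖ = sqrt(x^T x)
dist = np.linalg.norm(x - y)  # d(x, y) = ‖x − y‖

# Check the identity d(x, y)^2 = ‖x‖^2 + ‖y‖^2 − 2 x^T y.
lhs = dist ** 2
rhs = x @ x + y @ y - 2 * (x @ y)
print(abs(lhs - rhs) < 1e-9)  # True
```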
Matrices

n equations with m unknowns x1, ..., xm:

  a11 x1 + ... + a1m xm = b1
        ...
  an1 x1 + ... + anm xm = bn

  ⇔

  [ a11 ... a1m ] [ x1 ]   [ b1 ]
  [ ...     ... ] [ .. ] = [ .. ]
  [ an1 ... anm ] [ xm ]   [ bn ]

  ⇔  A(n×m) x(m×1) = b(n×1).

11 / 22
Matrices

The same system, with A viewed as a stack of row vectors:

  [ (a11, ..., a1m) ] [ x1 ]   [ b1 ]
  [       ...       ] [ .. ] = [ .. ]
  [ (an1, ..., anm) ] [ xm ]   [ bn ]

  ⇔  Ax = b.

11 / 22
Matrices
Geometrical view

2-D Example:

  2 x1 − x2  = 0
  x1 + 3 x2 = 2

[Figure: each equation is a line in the plane; the first passes through (0,0) and (1,2), the second through (2,0) and (−1,1). The solution is their intersection.]

12 / 22
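The intersection point of the two lines can be computed directly. A sketch using NumPy's linear solver (the code is an addition, not from the slides):

```python
import numpy as np

# The 2-D example:  2*x1 - x2 = 0,  x1 + 3*x2 = 2.
A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
b = np.array([0.0, 2.0])

x = np.linalg.solve(A, b)
print(x)  # [2/7, 4/7], i.e. the intersection of the two lines
```

Substituting back: 2·(2/7) − 4/7 = 0 and 2/7 + 3·(4/7) = 2, as required.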
Matrices

n equations with m unknowns x1, ..., xm:

  Ax = b
  ⇔  x1 (a11, ..., an1)^T + ... + xm (a1m, ..., anm)^T = (b1, ..., bn)^T.

A real-valued matrix A(n×m) is also seen as a linear transformation:

  A : R^m → R^n
      x  → Ax.

13 / 22
Matrices
Alternate geometrical view

2-D Example:

  2 x1 − x2  = 0
  x1 + 3 x2 = 2
  ⇔  x1 (2, 1)^T + x2 (−1, 3)^T = (0, 2)^T.

[Figure: b = (0, 2)^T reached as a linear combination of the column vectors (2, 1)^T and (−1, 3)^T.]

14 / 22
Matrices
Alternate geometrical view (No solution)

2-D Example:

  2 x1 − 2 x2 = 0
  x1 − x2 = 2
  ⇔  x1 (2, 1)^T + x2 (−2, −1)^T = (0, 2)^T.

The columns (2, 1)^T and (−2, −1)^T are collinear, so their combinations span only a line, which does not contain b = (0, 2)^T: the system has no solution.

15 / 22
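Numerically, this failure shows up as a singular coefficient matrix. A sketch with NumPy (added for illustration):

```python
import numpy as np

# The inconsistent system: 2*x1 - 2*x2 = 0, x1 - x2 = 2.
A = np.array([[2.0, -2.0],
              [1.0, -1.0]])
b = np.array([0.0, 2.0])

d = np.linalg.det(A)
print(d)  # 0.0: the columns are collinear, A is singular

try:
    np.linalg.solve(A, b)
except np.linalg.LinAlgError as e:
    print("no unique solution:", e)
```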
Determinant

Recursive Definition: Let A be a square matrix (m × m),

  det(A) = Σ_{j=1}^{m} (−1)^(1+j) a1j det(M1j),

where Mij is A without its line i and its column j, and det(m) = m for m scalar.

Example:

  det(A) = | a11 a12 a13 |
           | a21 a22 a23 |
           | a31 a32 a33 |

         = a11 | a22 a23 | − a12 | a21 a23 | + a13 | a21 a22 |
               | a32 a33 |       | a31 a33 |       | a31 a32 |

         = a11 (a22 a33 − a32 a23) − a12 (a21 a33 − a31 a23) + a13 (a21 a32 − a31 a22).

16 / 22
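The recursive definition translates directly into code. A sketch of the cofactor expansion in NumPy, checked against the library determinant (the 3×3 test matrix is invented):

```python
import numpy as np

def det(A):
    """Determinant by cofactor expansion along the first row,
    following the recursive definition above."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    if m == 1:
        return A[0, 0]  # det(m) = m for a scalar
    total = 0.0
    for j in range(m):
        # M1j: A without its first row and its column j.
        M1j = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det(M1j)  # (−1)^(1+j) with 0-based j
    return total

A = np.array([[2.0, -1.0, 0.0],
              [1.0,  3.0, 2.0],
              [0.0,  1.0, 1.0]])
print(det(A), np.linalg.det(A))  # both ≈ 3.0
```

Cofactor expansion costs O(m!), so in practice `np.linalg.det` (which uses an LU factorization) is the tool of choice.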
Inverses

- Definition: A square matrix A(m×m) is called non-singular or invertible if there exists a matrix B(m×m) such that:

  AB = Im = BA,

  where Im is the m × m identity matrix. If such a B exists it is called the inverse of A and denoted A^(−1).
- "A is invertible" ⇔ det(A) ≠ 0 ⇔ "Ax = 0 iff x = 0".
- If A (square) is invertible, the solution of the system Ax = b is x = A^(−1) b.

17 / 22
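A quick check of the definition and of x = A^(−1) b, reusing the earlier 2-D example (NumPy code added for illustration):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  3.0]])
b = np.array([0.0, 2.0])

Ainv = np.linalg.inv(A)
print(np.allclose(Ainv @ A, np.eye(2)))  # True: A^(-1) A = I

x = Ainv @ b                  # x = A^(-1) b solves Ax = b ...
print(np.allclose(A @ x, b))  # True
# ... though np.linalg.solve(A, b) is preferred numerically
# (it factorizes A instead of forming the inverse).
```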
Determinants and Inverses
Geometrical view

2-D Example:

  |det(A)| = |det( [2 −1; 1 3] )| = |2 · 3 − 1 · (−1)| = 7.

[Figure: |det(A)| is the area of the parallelogram spanned by the columns a.1 = (2, 1)^T and a.2 = (−1, 3)^T, i.e. OP · OQ · sin(θ2 − θ1) for the two column vectors at angles θ1, θ2.]

18 / 22
Matrices

- If A is rectangular and A^T A is invertible, the (least-squares) solution of the system Ax = b is x = (A^T A)^(−1) A^T b.
- (A^T A)^(−1) A^T is called the pseudo-inverse of A.
- Let X(n×m) = [x1^T; ...; xn^T] be a collection of examples, one example per row.
- The Gram matrix of this collection is:

  G = XX^T = [ x1^T x1 ... x1^T xn ]
             [   ...  ...     ...  ]
             [ xn^T x1 ... xn^T xn ].

- A real-valued square matrix A is said to be positive semidefinite if for any vector z: z^T A z ≥ 0.
- Gram matrices are positive semidefinite matrices.

19 / 22
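Both claims on this slide can be exercised in a few lines. A sketch with NumPy, using random made-up data (the seed and sizes are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined system: 5 equations, 2 unknowns.
A = rng.normal(size=(5, 2))
b = rng.normal(size=5)

# Least-squares solution via the pseudo-inverse (A^T A)^(-1) A^T,
# compared with the library least-squares routine.
x = np.linalg.inv(A.T @ A) @ A.T @ b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))  # True

# The Gram matrix of the rows of A is positive semidefinite:
G = A @ A.T
z = rng.normal(size=5)
print(z @ G @ z >= -1e-12)  # True for any z, since z^T G z = ||A^T z||^2
```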
Matrix Diagonalization

- An eigenvector u of A (square matrix) is a solution (≠ 0) of the equation Au = λu ⇔ (A − λI)u = 0, for a particular λ called the associated eigenvalue.
- Eigenvalues are solutions of the characteristic polynomial: det(A − λI) = 0.
- If A(n×n) is real-valued and symmetric then:
  - all eigenvalues λ1, ..., λn are real-valued, and
  - we can find n eigenvectors u1, ..., un such that ui ⊥ uj and ‖uj‖ = 1, i.e. a new basis for R^n.
- If P = (u1, ..., un), then A can be rewritten as:

  A = P diag(λ1, ..., λn) P^T.

- "A positive semidefinite" ⇔ λi ≥ 0 for all i.

20 / 22
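The decomposition A = P diag(λ) P^T for a symmetric matrix can be verified with NumPy's symmetric eigensolver (the 2×2 example matrix is invented):

```python
import numpy as np

# A real symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric matrices: real eigenvalues
# (in ascending order) and orthonormal eigenvectors as columns of P.
lam, P = np.linalg.eigh(A)
print(lam)  # [1. 3.]

# Reconstruction A = P diag(lambda) P^T:
print(np.allclose(P @ np.diag(lam) @ P.T, A))  # True
# P is orthogonal: P^T P = I.
print(np.allclose(P.T @ P, np.eye(2)))  # True
# All eigenvalues >= 0, so this A is positive semidefinite.
```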
Singular Value Decomposition

- The Singular Value Decomposition is a generalization of matrix diagonalization to rectangular matrices.
- Any real-valued matrix M(n×m) can be rewritten as:

  M = U(n×n) Σ(n×m) V(m×m)^T,

  where U and V are orthogonal matrices and σij = 0 unless i = j.

21 / 22
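A sketch of the full SVD of a rectangular matrix with NumPy (the 3×2 example is made up; note `np.linalg.svd` returns the diagonal of Σ as a 1-D array):

```python
import numpy as np

# A rectangular (3 × 2) matrix.
M = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

# full_matrices=True gives U (3x3) and V^T (2x2), matching the slide;
# s holds the singular values, i.e. the diagonal of the 3x2 Sigma.
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the rectangular Sigma and check M = U Sigma V^T.
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, M))  # True
```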
Acknowledgement

- Sources of inspiration:
  - Linear Algebra: Gilbert Strang's MIT course and "Elementary Linear Algebra" by Keith Matthews (both on the web).
  - Some of the motivating figures: Christopher M. Bishop's book "Pattern Recognition and Machine Learning".

22 / 22
