
Last Time

Statistical Linear Models: PCA

Radiometry: Radiance and Irradiance

Color Spaces

RGB, nRGB
HSV/I/L
YCrCb

Pixel Statistics

Reading: Eigenfaces online paper; FP pgs. 505-512

Color Models

Classification

Skin Color Models

Appearance-Based Methods

PCA
ICA, FLD
Non-negative Matrix Factorization, Sparse Matrix Factorization

Statistics: the science of collecting, organizing, and interpreting data.

Statistical Tensor Models:

Multilinear PCA,
Multilinear ICA

Person and Activity Recognition

Data collection.
Data analysis: organize and summarize data to bring out main features and clarify their underlying structure.
Inference and decision theory: extract relevant information from collected data and use it as a guide for further action.

[Figure: image data tensor D organized by People, Views, Illuminations, and Expressions.]

Today

PART I: 2D Vision

Statistical Modeling

Statistical Linear Models:

Maximum Likelihood

Non-parametric: Histogram Table Look-up

Parametric: Gaussian Model

Data Collection

Population: the entire group of individuals that we want information about.
Sample: a representative part of the population that we actually examine in order to gather information.
Sample size: the number of observations/individuals in a sample.
Statistical inference: making an inference about a population based on the information contained in a sample.

Definitions

Individuals (people or things): objects described by data.
Individuals on which an experiment is being performed are known as experimental units, or subjects.
Variables describe characteristics of an individual.

A categorical variable places an individual into a category, such as male/female.
A quantitative variable measures some characteristic of the individual, such as height, or pixel values in an image.

Data Analysis

Experimental units: images.
Observed data: pixel values in images are directly measurable but rarely of direct interest.
Data analysis: extracts the relevant information to bring out the main features and clarify their underlying structure.

Variables

Response variables are directly measurable; they measure the outcome of a study.
Explanatory variables (factors) explain or cause changes in the response variable.

Response vs. Explanatory Variables

Pixels (response variables, directly measurable from data) change with changes in view and illumination, the explanatory variables (not directly measurable but of actual interest).

The Question of Causation

A strong relationship between two variables does not always mean that changes in one variable cause changes in the other.
The relationship between two variables is often influenced by other variables which are lurking in the background.
The best evidence for causation comes from randomized comparative experiments.

The observed relationship between two variables may be due to direct causation, common response, or confounding.
Common response refers to the possibility that a change in a lurking variable is causing changes in both our explanatory variable and our response variable.
Confounding refers to the possibility that either the change in our explanatory variable is causing changes in the response variable, OR that a change in a lurking variable is causing changes in the response variable.

Pixel values change with scene geometry, illumination location, and camera location, which are known as the explanatory variables.

Explaining Association

An association between two variables x and y can reflect many types of relationships.

[Figure: diagram relating association to causality (direct causation, common response, confounding).]

Appearance-Based Models

Models based on the appearance of 3D objects in ordinary images.

Linear Models

Pixels are response variables that are directly measurable from an image.

PCA: Eigenfaces, EigenImages
FLD: Fisher Linear Discriminant Analysis
ICA: images are a linear combination of multiple sources

Multilinear Models

Relevant Tensor Math
MPCA: TensorFaces
MICA

2002 by M. Alex O. Vasilescu

Statistical Linear Models

Generative Models:

Second-order methods: faithful/accurate data representation, minimal reconstruction (mean-square) error; based on covariance.

Linear Models: PCA (Principal Component Analysis), Factor Analysis.

Higher-order methods: meaningful representation; based on higher-order statistics.

ICA (Independent Component Analysis).

Discriminant Models:

FLD (Fisher Linear Discriminant Analysis).

Images

[Figure: images plotted as points in pixel space; each axis (pixel 1, ..., pixel kr) takes values 0-255.]

Image: experimental unit, a multivariate function.
Pixel: response variable.

Image Representation

A k x r image

I = \begin{bmatrix} i_1 & i_2 & \cdots & i_k \\ i_{k+1} & & & \\ \vdots & & & \\ i_{k(r-1)+1} & & \cdots & i_{kr} \end{bmatrix}

is vectorized as

i = \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_{kr} \end{bmatrix}

An image is a point in a kr-dimensional space, with one axis per pixel (e.g., the first axis represents pixel 1):

i = i_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + i_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + i_{kr} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}

Image Representation

In the standard (pixel) basis the vectorized image is expressed with the identity matrix as the basis:

i = \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_{kr} \end{bmatrix} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_{kr} \end{bmatrix}

Find a new basis matrix that results in a compact representation useful for face detection/recognition:

i = B c,   where B is the basis matrix and c is the vector of coefficients.
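As a quick illustration of this representation (my own sketch, not course code, with a made-up image and a random orthonormal basis standing in for B), an image can be flattened into a kr-dimensional vector and re-expressed as i = Bc:

```python
# Sketch: an image as a point in kr-dimensional pixel space, and a change of basis.
import numpy as np

k, r = 4, 3                                        # a tiny k x r image
I = np.arange(k * r, dtype=float).reshape(r, k)    # pixel values laid out as a matrix
i = I.reshape(-1)                                  # vectorized image: a point in R^(kr)

# In the standard basis (the identity matrix) the coefficients are the pixels themselves.
B_std = np.eye(k * r)
assert np.allclose(B_std @ i, i)

# For any orthonormal basis B, the coefficient vector is c = B^T i and i = B c.
B, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((k * r, k * r)))
c = B.T @ i
assert np.allclose(B @ c, i)
print("image as a", i.shape[0], "dimensional vector; coefficients c =", np.round(c, 2))
```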

Toy Example - Representation Heuristic

Consider a set of images of N people under the same viewpoint and lighting.
Each image is made up of 3 pixels, and pixel 1 has the same value as pixel 3 for all images:

i_n = \begin{bmatrix} i_{1n} \\ i_{2n} \\ i_{3n} \end{bmatrix}   s.t.  i_{1n} = i_{3n}  and  1 \le n \le N

[Figure: the image points plotted against the pixel 1, pixel 2, and pixel 3 axes; they all lie on the plane pixel 1 = pixel 3.]

In the standard basis,

i_n = i_{1n} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + i_{2n} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + i_{3n} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} i_{1n} \\ i_{2n} \\ i_{3n} \end{bmatrix}

where the identity matrix is the basis matrix, B.

Toy Example - Representation Heuristic (new basis)

[Figure: the old basis (pixel axes) versus the new basis; the data lies in a 2D subspace of the 3D pixel space.]

Highly correlated variables were combined; the new basis (the new axes) is uncorrelated.

New basis matrix:

B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}

i_n = i_{1n} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + i_{2n} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} i_{1n} \\ i_{2n} \end{bmatrix} = B c_n

Solve for and store the coefficient matrix C from the data matrix D:

D = \begin{bmatrix} i_1 & i_2 & \cdots & i_N \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} c_1 & c_2 & \cdots & c_N \end{bmatrix} = B C,   i.e.   C = B^{-1} D  (using the pseudo-inverse of the non-square B).
Toy Example - Recognition

Given a new image i_new, compute its coefficient vector:

c_new = B^{-1} i_new = \begin{bmatrix} 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} i_{1,new} \\ i_{2,new} \\ i_{3,new} \end{bmatrix}

Next, compare c_new, a reduced-dimensionality representation of i_new, against all coefficient vectors c_n, 1 \le n \le N.

One possible classifier: the nearest-neighbor classifier.

Nearest Neighbor Classifier

Given an input image representation y (the input is also called a probe; the representation may be the image itself, i, or some transformation of the image, e.g. c), the NN classifier assigns to y the label associated with the closest image in the training set.
So if y happens to be closest to another face it will be assigned L = 1 (face); otherwise it will be assigned L = 0 (nonface).

Euclidean distance:

d = \| y_L - y \| = \sqrt{ \sum_{c=1}^{N} ( y_{Lc} - y_c )^2 }
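The toy example's coefficient computation and nearest-neighbor comparison might look as follows in NumPy; this is a minimal sketch with made-up pixel values, and B is the hand-built basis from the slides:

```python
# Sketch: coefficients via the pseudo-inverse of B, then nearest-neighbor matching.
import numpy as np

# New basis matrix B for the 3-pixel toy example (pixel 1 == pixel 3).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
B_pinv = np.linalg.pinv(B)              # equals [[0.5, 0, 0.5], [0, 1, 0]]

# Training images as columns of the data matrix D (pixel 1 == pixel 3).
D = np.array([[10.0, 40.0, 90.0],
              [20.0, 50.0, 60.0],
              [10.0, 40.0, 90.0]])
C = B_pinv @ D                          # coefficient matrix, one column per image

# A new probe image and its reduced-dimensionality representation.
i_new = np.array([42.0, 48.0, 42.0])
c_new = B_pinv @ i_new

# Nearest-neighbor classification: pick the training image whose
# coefficient vector is closest in Euclidean distance.
dists = np.linalg.norm(C - c_new[:, None], axis=0)
print("nearest training image:", np.argmin(dists), "| distances:", np.round(dists, 1))
```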

The Principle Behind Principal Component Analysis [1]

Principal Component Analysis: Eigenfaces

Employs second-order statistics to compute, in a principled way, a new basis matrix.
Also called the Hotelling Transform [2] or the Karhunen-Loève Method [3].
Find an orthogonal coordinate system such that the data is approximated best and the correlation between different axes is minimized.

[1] I.T. Jolliffe; Principal Component Analysis; 1986
[2] R.C. Gonzalez, P.A. Wintz; Digital Image Processing; 1987
[3] K. Karhunen; Über Lineare Methoden in der Wahrscheinlichkeitsrechnung; 1946
    M.M. Loève; Probability Theory; 1955

PCA: Goal - Formally Stated

Problem formulation.
Input: X = [x_1 \cdots x_N], N points in d-dimensional space.
Solve for: B, a d x m basis matrix (m \ll d), such that the coefficients

C = [c_1 \cdots c_N] = B^T [x_1 \cdots x_N]

have minimized correlation (i.e., their covariance is diagonalized).

PCA: Theory

[Figure: a 2D point cloud with its principal directions e_1 and e_2 drawn as new axes.]

Define a new origin as the mean of the data set.
Find the direction of maximum variance in the samples (e_1) and align it with the first axis.
Continue this process with orthogonal directions of decreasing variance, aligning each with the next axis.
Thus, we have a rotation which minimizes the covariance.

Recall:

Correlation:        cor(x, y) = \frac{cov(x, y)}{\sigma_x \sigma_y}
Sample covariance:  cov(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})^T

The Sample Covariance Matrix

Define the covariance (scatter) matrix of the input samples:

S_T = \frac{1}{N-1} \sum_{n=1}^{N} (i_n - \mu)(i_n - \mu)^T      (where \mu is the sample mean)
    = \frac{1}{N-1} (D - M)(D - M)^T,   where M = [\mu \cdots \mu],  D = [i_1 \ i_2 \cdots i_N],  and  \mu = \frac{1}{N} \sum_{n=1}^{N} i_n

PCA: Some Properties of the Covariance/Scatter Matrix

The covariance matrix S_T is symmetric.
The diagonal contains the variance of each parameter (i.e., element S_{T,ii} is the variance in the i-th direction).
Each off-diagonal element S_{T,ij} is the covariance between the two directions i and j and represents their level of correlation (i.e., a value of zero indicates that the two dimensions are uncorrelated).
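A small sketch of the scatter-matrix computation (assuming images are stored as the columns of D; the data here is random filler, not course data):

```python
# Sketch: S_T = 1/(N-1) * (D - M)(D - M)^T for images stored as columns of D.
import numpy as np

rng = np.random.default_rng(0)
d, N = 5, 8
D = rng.uniform(0, 255, (d, N))              # d pixels, N images

mu = D.mean(axis=1, keepdims=True)           # sample mean image
M = np.tile(mu, (1, N))                      # M = [mu ... mu]
S_T = (D - M) @ (D - M).T / (N - 1)

# Sanity check against NumPy's covariance (variables = rows, observations = columns).
assert np.allclose(S_T, np.cov(D))
print("S_T symmetric:", np.allclose(S_T, S_T.T),
      "| diagonal = per-pixel variances:", np.round(np.diag(S_T), 1))
```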

PCA: Goal Revisited

Look for B such that

[c_1 \cdots c_N] = B^T [i_1 \cdots i_N]

and correlation is minimized, i.e., cov(C) is diagonal.

Note that cov(C) can be expressed via cov(D) and B:

C C^T = B^T (D - M)(D - M)^T B = B^T S_T B

Algebraic Definition of PCs

Given a sample of N observations X = [x_1 \cdots x_N] on a vector of d variables, define the k-th principal coefficient of the sample by the linear transformation

c_k = b_k^T x = \sum_{i=1}^{d} b_{ik} x_i

where the vector b_k = [b_{1k} \cdots b_{dk}]^T is chosen such that var[c_k] is maximal, subject to cov[c_k, c_l] = 0 for k > l \ge 1 and to b_k^T b_k = 1.

Algebraic Derivation of b_1

To find b_1, maximize var[c_1] subject to b_1^T b_1 = 1.

Maximize the objective function:

L = b_1^T S b_1 - \lambda (b_1^T b_1 - 1)

Differentiate and set to 0:

\frac{\partial L}{\partial b_1} = S b_1 - \lambda b_1 = 0,   i.e.   (S - \lambda I) b_1 = 0

Therefore b_1 is an eigenvector of S corresponding to the eigenvalue \lambda = \lambda_1.

We have maximized

var[c_1] = b_1^T S b_1 = b_1^T \lambda_1 b_1 = \lambda_1,

so \lambda_1 is the largest eigenvalue of S.
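The derivation above says b_1 is the leading eigenvector of S; a brief NumPy check (with synthetic 2D data, my own illustration) confirms that the variance of the first principal coefficient equals the largest eigenvalue:

```python
# Sketch: b_1 as the top eigenvector of the sample covariance matrix S.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=500).T   # 2 x N data

S = np.cov(X)                                 # sample covariance/scatter matrix
eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues for symmetric S
b1 = eigvecs[:, -1]                           # eigenvector of the largest eigenvalue

c1 = b1 @ X                                   # first principal coefficients c_1 = b_1^T x
print("largest eigenvalue:", round(eigvals[-1], 3),
      "~ var[c_1]:", round(c1.var(ddof=1), 3))
```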

Algebraic Derivation of b_2

To find the next principal direction, maximize var[c_2] subject to cov[c_2, c_1] = 0 and b_2^T b_2 = 1.

Maximize the objective function:

L = b_2^T S b_2 - \lambda (b_2^T b_2 - 1) - \phi (b_2^T b_1 - 0)

Differentiate and set to 0:

\frac{\partial L}{\partial b_2} = S b_2 - \lambda b_2 - \phi b_1 = 0

Data Loss

Sample points can be projected via the new m < d projection matrix B_opt,

c_i = B_opt^T (x_i - \mu),

and can still be reconstructed, x_i \approx B c_i + \mu, but some information will be lost.

[Figure: 2D data projected onto the first principal axis (1D data) and reconstructed back in 2D.]

Data Loss (cont.)

It can be shown that the mean-square error between x_i and its reconstruction using only m principal eigenvectors is given by the expression

\text{MSE} = \sum_{j=1}^{N} \lambda_j - \sum_{j=1}^{m} \lambda_j = \sum_{j=m+1}^{N} \lambda_j

Data Reduction: Theory

Each eigenvalue represents the total variance in its dimension.
Throwing away the least significant eigenvectors in B_opt means throwing away the least significant variance information.

Singular Value Decomposition (SVD): Definition

Any real matrix D \in \mathbb{R}^{d \times N} (non-square) can be decomposed as

D = U \Sigma V^T,   with   U^T U = V V^T = I   and   \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_q),  q = \min(N, d).

The \sigma's are called singular values.

EVD vs. SVD

Remember that C_x = (X - \mu)(X - \mu)^T = D D^T (treating D as the mean-centered data matrix).
For the square matrix C_x, the eigendecomposition gives

C_x = U C_y U^T,   with   C_y = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2, 0, \ldots, 0)   (d \times d),

while the non-square (d x N) data matrix itself is decomposed as D = U \Sigma V^T.

Data Reduction and SVD

Set the redundant singular values to 0:

D' = U \, \mathrm{diag}(\sigma_1, \ldots, \sigma_m, 0, \ldots, 0) \, V^T

Given that the data dimension is m, we can solve for just the first m vectors of U (no need to find all of them).
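A hedged sketch of data reduction with a truncated SVD (random toy data, not the course's code), showing that the discarded energy is exactly the sum of the squared trailing singular values:

```python
# Sketch: truncated SVD of a d x N data matrix with one vectorized image per column.
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((100, 20))          # toy d x N data matrix
D = D - D.mean(axis=1, keepdims=True)       # center the columns

# Thin SVD: U is d x q, s holds the singular values, Vt is q x N (q = min(d, N)).
U, s, Vt = np.linalg.svd(D, full_matrices=False)

m = 5                                       # keep only the m leading singular values
D_reduced = U[:, :m] @ np.diag(s[:m]) @ Vt[:m, :]

# The discarded variance equals the sum of the squared trailing singular values.
print("reconstruction error:", np.linalg.norm(D - D_reduced) ** 2,
      "=", np.sum(s[m:] ** 2))
```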

PCA: Conclusion

A multivariate analysis method.
Finds a more natural coordinate system for the sample data.
Allows for data to be removed with minimum loss in reconstruction ability.

PCA - Dimensionality Reduction

Consider a set of images where each image is made up of 3 pixels and pixel 1 has the same value as pixel 3 for all images:

i_n = [i_{1n} \ i_{2n} \ i_{3n}]^T   s.t.  i_{1n} = i_{3n}  and  1 \le n \le N

PCA chooses axes in the directions of highest variability of the data (maximum scatter).

[Figure: the image points in pixel space with the 1st and 2nd PCA axes drawn through the data.]

The data matrix D factors into a basis matrix and a coefficient matrix:

D = [i_1 \ i_2 \cdots i_N] = B [c_1 \ c_2 \cdots c_N]

Each image i_n is now represented by a vector of coefficients c_n in a reduced-dimensionality space.

Compute D = U S V^T (the SVD of D) and set B = U.

B maximizes the function E = B^T S_T B subject to B^T B = I.
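Following the slide's recipe (D = USV^T, set B = U), a minimal NumPy sketch of PCA dimensionality reduction on a synthetic version of the 3-pixel toy data might look like this:

```python
# Sketch: PCA via the SVD of the (centered) data matrix; B = U, c_n = B^T (i_n - mean).
import numpy as np

rng = np.random.default_rng(1)
N = 50
# Toy data mimicking the 3-pixel example: pixel 1 equals pixel 3 in every image.
p1 = rng.uniform(0, 255, N)
p2 = rng.uniform(0, 255, N)
D = np.stack([p1, p2, p1])                  # 3 x N data matrix

mean = D.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(D - mean, full_matrices=False)

m = 2                                       # keep the two directions with non-zero variance
B = U[:, :m]                                # basis matrix
C = B.T @ (D - mean)                        # m x N coefficient matrix

print("singular values:", np.round(s, 3))   # the third value is ~0: pixel 3 is redundant
print("coefficients of image 0:", np.round(C[:, 0], 2))
```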

PCA for Recognition

Consider the set of images i_n = [i_{1n} \ i_{2n} \ i_{3n}]^T.
PCA chooses axes in the directions of highest variability of the data.

[Figure: the image points in pixel space with the 1st and 2nd PCA axes.]

For the orthonormal basis B, B^{-1} = B^T.

Given a new image i_new, compute the vector of coefficients c_new associated with the new basis B:

c_new = B^T i_new

Next, compare c_new, a reduced-dimensionality representation of i_new, against all coefficient vectors c_n, 1 \le n \le N.

One possible classifier: the nearest-neighbor classifier.

Data and Eigenfaces

Each image below is a column vector in the basis matrix B.

Linear Representation: Eigenimages

Principal components (eigenvectors) of the image ensemble.
The data is composed of 28 faces photographed under the same lighting and viewing conditions.

[Figure: the image ensemble plotted as points in pixel space, and the resulting eigenimages.]

Each image is a linear combination of the eigenimages,

d_i = U c_i,

and can be rebuilt as a running sum of terms; the figure shows reconstructions using 1, 3, 9, and 28 terms.

Eigenvectors are typically computed using the Singular Value Decomposition (SVD) algorithm.
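A rough sketch of the running-sum reconstruction (with random stand-in data rather than the 28-face ensemble): each image d_i = U c_i is rebuilt from an increasing number of eigenimages, and the error shrinks as terms are added:

```python
# Sketch: reconstruct one image from a growing number of eigenimages.
import numpy as np

rng = np.random.default_rng(2)
d, N = 64, 28                               # 28 images, each flattened to d pixels
D = rng.uniform(0, 255, (d, N))

mean = D.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(D - mean, full_matrices=False)   # columns of U are eigenimages

i = 0                                       # reconstruct the first image
c = U.T @ (D[:, i:i + 1] - mean)            # its coefficient vector
for terms in (1, 3, 9, 28):
    approx = mean + U[:, :terms] @ c[:terms]
    err = np.linalg.norm(D[:, i:i + 1] - approx)
    print(f"{terms:2d} terms -> reconstruction error {err:.2f}")
```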

PIE Database (Weizmann)

The Covariance Matrix

Define the covariance (scatter) matrix of the input samples:

S_T = \frac{1}{N-1} \sum_{n=1}^{N} (i_n - \mu)(i_n - \mu)^T      (where \mu is the sample mean)

EigenImages - Basis Vectors

Each image below is a column vector in the basis matrix B.
PCA encodes the variability across images without distinguishing between variability in people, viewpoints, and illumination.

PCA Classifier

Distance to the face subspace:

d_f(y) = \| y - U_f U_f^T y \|

(d_n(y) is the analogous distance to the nonface subspace.)

Use a likelihood-ratio (LR) test to classify a probe y as face or nonface. Intuitively, we expect d_n(y) > d_f(y) to suggest that y is a face. The LR for PCA is defined as

d = \frac{d_n(y)}{d_f(y)}

and we assign L = 1 (face) if d exceeds the threshold, and L = 0 (nonface) otherwise.

PCA for Recognition - EigenImages

[Figure: coefficient vectors of person 1 and person 2 plotted along the 1st and 2nd PCA axes.]
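A hedged sketch of the distance-to-subspace classifier and LR test; U_f and U_n below are random orthonormal stand-ins for the face and nonface subspace bases, and the threshold value is an assumption:

```python
# Sketch: PCA classifier via distances to the face and nonface subspaces.
import numpy as np

rng = np.random.default_rng(3)
d, m = 256, 10
U_f, _ = np.linalg.qr(rng.standard_normal((d, m)))   # face subspace basis (stand-in)
U_n, _ = np.linalg.qr(rng.standard_normal((d, m)))   # nonface subspace basis (stand-in)

def dist_to_subspace(y, U):
    """d(y) = || y - U U^T y ||, the residual after projecting y onto span(U)."""
    return np.linalg.norm(y - U @ (U.T @ y))

def classify(y, threshold=1.0):
    """Likelihood-ratio test: label y a face (L=1) if d_n(y)/d_f(y) > threshold."""
    ratio = dist_to_subspace(y, U_n) / dist_to_subspace(y, U_f)
    return 1 if ratio > threshold else 0

y = rng.standard_normal(d)                   # a probe image (flattened)
print("label:", classify(y))
```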

Face Detection/Recognition

Consider a set of images of 2 people under a fixed viewpoint and N lighting conditions. Each image is made up of 2 pixels.

[Figure: coefficient vectors of person 1 and person 2 plotted along the 1st and 2nd PCA axes.]

Reduce dimensionality by throwing away the axis along which the data varies the least.
The coefficient vector associated with the 1st basis vector is used for classification.
Possible classifier: Mahalanobis distance.
Each image is represented by one coefficient vector.
Each person is displayed in N images and therefore has N coefficient vectors.
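The Mahalanobis-distance classifier mentioned above could be sketched as follows (synthetic coefficient clouds for the two people; not the course's data or code):

```python
# Sketch: assign a probe to the person whose coefficient cloud is closest in
# Mahalanobis distance (distance scaled by that person's sample covariance).
import numpy as np

rng = np.random.default_rng(4)
N = 40
person1 = rng.normal(loc=[-2.0, 0.0], scale=[0.5, 1.0], size=(N, 2))
person2 = rng.normal(loc=[+2.0, 0.0], scale=[1.5, 1.0], size=(N, 2))

def mahalanobis(c, samples):
    """Mahalanobis distance from coefficient vector c to a person's sample cloud."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    diff = c - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

c_probe = np.array([1.0, 0.2])
d1 = mahalanobis(c_probe, person1)
d2 = mahalanobis(c_probe, person2)
print("assign probe to person", 1 if d1 < d2 else 2,
      "| distances:", round(d1, 2), round(d2, 2))
```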

Face Detection and Localization

Scan and classify image windows at different positions and scales (location and scale in the image).

[Figure: pipeline from face examples and non-face examples through feature extraction to off-line training of the classifier for multiple scales, producing detections with a confidence score (e.g., Conf. = 5).]

Face Localization

Cluster the detections in the space-scale space.
Assign the cluster size to the detection confidence.
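A rough sketch (with a placeholder classifier, not the trained one described above) of scanning image windows over positions and scales; the raw detections would then be clustered in space-scale to obtain a confidence:

```python
# Sketch: sliding-window scan over positions and scales, collecting raw detections.
import numpy as np

def classify_window(window):
    """Stand-in for the trained face/nonface classifier; returns True for 'face'."""
    return window.mean() > 200          # arbitrary placeholder rule

def detect(image, window=16, scales=(1.0, 0.5), step=8):
    detections = []
    for s in scales:
        # Coarse rescale by integer subsampling (a real system would interpolate).
        sub = image[::int(1 / s), ::int(1 / s)]
        h, w = sub.shape
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                if classify_window(sub[y:y + window, x:x + window]):
                    detections.append((x / s, y / s, s))
    return detections

img = np.zeros((64, 64))
img[20:40, 20:40] = 255                  # a bright square standing in for a face
hits = detect(img)
print(len(hits), "raw detections; cluster them in space-scale for a confidence score")
```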
