
Last Time

Statistical Linear Models: PCA

Radiometry: Radiance and Irradiance

Color Spaces

RGB, nRGB
HSV/I/L
YCrCb

Pixel Statistics

Reading: Eigenfaces online paper; FP pgs. 505-512

Color Models

Classification

Skin Color Models

Appearance-Based Methods

PCA
ICA, FLD
Non-negative Matrix Factorization, Sparse Matrix Factorization

Statistics: the science of collecting, organizing, and interpreting data.

Statistical Tensor Models:

Multilinear PCA,
Multilinear ICA

Person and Activity Recognition

Data collection.
Data analysis: organize and summarize data to bring out main features and clarify their underlying structure.
Inference and decision theory: extract relevant information from collected data and use it as a guide for further action.

[Figure: image data tensor D organized by People, Views, Illuminations, and Expressions.]

Today

PART I: 2D Vision

Statistical Modeling

Statistical Linear Models:

Maximum Likelihood

Non-parametric: Histogram Table Look-up

Parametric: Gaussian Model

Data Collection

Population: the entire group of individuals that we want information about.
Sample: a representative part of the population that we actually examine in order to gather information.
Sample size: the number of observations/individuals in a sample.
Statistical inference: making an inference about a population based on the information contained in a sample.

Definitions

Individuals (people or things): objects described by data.
Individuals on which an experiment is being performed are known as experimental units, or subjects.
Variables describe characteristics of an individual.

A categorical variable places an individual into a category, such as male/female.
A quantitative variable measures some characteristic of the individual, such as height, or pixel values in an image.

Data Analysis

Experimental units: images.
Observed data: pixel values in images are directly measurable but rarely of direct interest.
Data analysis: extracts the relevant information to bring out the main features and clarify their underlying structure.

Variables

Response variables are directly measurable; they measure the outcome of a study.
Explanatory variables (factors) explain or cause changes in the response variable.

Response vs. Explanatory Variables

Pixels (response variables, directly measurable from data) change with changes in view and illumination, the explanatory variables (not directly measurable but of actual interest).

The Question of Causation

A strong relationship between two variables does not always mean that changes in one variable cause changes in the other.
The relationship between two variables is often influenced by other variables which are lurking in the background.
The best evidence for causation comes from randomized comparative experiments.

The observed relationship between two variables may be due to direct causation, common response, or confounding.
Common response refers to the possibility that a change in a lurking variable is causing changes in both our explanatory variable and our response variable.
Confounding refers to the possibility that either the change in our explanatory variable is causing changes in the response variable, OR that a change in a lurking variable is causing changes in the response variable.

Pixel values change with scene geometry, illumination location, and camera location, which are known as the explanatory variables.

Explaining Association

An association between two variables x and y can reflect many types of relationships.

[Figure: diagram relating association to causality (direct causation, common response, confounding).]

Appearance-Based Models

Models based on the appearance of 3D objects in ordinary images.

Linear Models

Pixels are response variables that are directly measurable from an image.

PCA: Eigenfaces, EigenImages
FLD: Fisher Linear Discriminant Analysis
ICA: images are a linear combination of multiple sources

Multilinear Models

Relevant Tensor Math
MPCA: TensorFaces
MICA

2002 by M. Alex O. Vasilescu

Statistical Linear Models

Generative Models:

Second-order methods: faithful/accurate data representation, minimal reconstruction (mean-square) error; based on covariance.

Linear Models: PCA (Principal Component Analysis), Factor Analysis.

Higher-order methods: meaningful representation; based on higher-order statistics.

ICA (Independent Component Analysis).

Discriminant Models:

FLD (Fisher Linear Discriminant Analysis).

Images

[Figure: images plotted as points in pixel space; each axis (pixel 1, ..., pixel kr) takes values 0-255.]

Image: experimental unit, a multivariate function.
Pixel: response variable.

Image Representation

A k x r image

I = \begin{bmatrix} i_1 & i_2 & \cdots & i_k \\ i_{k+1} & & & \\ \vdots & & & \\ i_{k(r-1)+1} & & \cdots & i_{kr} \end{bmatrix}

is vectorized as

i = \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_{kr} \end{bmatrix}

An image is a point in a kr-dimensional space, with one axis per pixel (e.g., the first axis represents pixel 1):

i = i_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + i_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + i_{kr} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}

Image Representation

In the standard (pixel) basis the vectorized image is expressed with the identity matrix as the basis:

i = \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_{kr} \end{bmatrix} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_{kr} \end{bmatrix}

Find a new basis matrix that results in a compact representation useful for face detection/recognition:

i = B c,   where B is the basis matrix and c is the vector of coefficients.
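As a quick illustration of this representation (my own sketch, not course code, with a made-up image and a random orthonormal basis standing in for B), an image can be flattened into a kr-dimensional vector and re-expressed as i = Bc:

```python
# Sketch: an image as a point in kr-dimensional pixel space, and a change of basis.
import numpy as np

k, r = 4, 3                                        # a tiny k x r image
I = np.arange(k * r, dtype=float).reshape(r, k)    # pixel values laid out as a matrix
i = I.reshape(-1)                                  # vectorized image: a point in R^(kr)

# In the standard basis (the identity matrix) the coefficients are the pixels themselves.
B_std = np.eye(k * r)
assert np.allclose(B_std @ i, i)

# For any orthonormal basis B, the coefficient vector is c = B^T i and i = B c.
B, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((k * r, k * r)))
c = B.T @ i
assert np.allclose(B @ c, i)
print("image as a", i.shape[0], "dimensional vector; coefficients c =", np.round(c, 2))
```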

Toy Example - Representation Heuristic

Consider a set of images of N people under the same viewpoint and lighting.
Each image is made up of 3 pixels, and pixel 1 has the same value as pixel 3 for all images:

i_n = \begin{bmatrix} i_{1n} \\ i_{2n} \\ i_{3n} \end{bmatrix}   s.t.  i_{1n} = i_{3n}  and  1 \le n \le N

[Figure: the image points plotted against the pixel 1, pixel 2, and pixel 3 axes; they all lie on the plane pixel 1 = pixel 3.]

In the standard basis,

i_n = i_{1n} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + i_{2n} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + i_{3n} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} i_{1n} \\ i_{2n} \\ i_{3n} \end{bmatrix}

where the identity matrix is the basis matrix, B.

Toy Example - Representation Heuristic (new basis)

[Figure: the old basis (pixel axes) versus the new basis; the data lies in a 2D subspace of the 3D pixel space.]

Highly correlated variables were combined; the new basis (the new axes) is uncorrelated.

New basis matrix:

B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}

i_n = i_{1n} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + i_{2n} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} i_{1n} \\ i_{2n} \end{bmatrix} = B c_n

Solve for and store the coefficient matrix C from the data matrix D:

D = \begin{bmatrix} i_1 & i_2 & \cdots & i_N \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} c_1 & c_2 & \cdots & c_N \end{bmatrix} = B C,   i.e.   C = B^{-1} D  (using the pseudo-inverse of the non-square B).
Toy Example - Recognition

Given a new image i_new, compute its coefficient vector:

c_new = B^{-1} i_new = \begin{bmatrix} 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} i_{1,new} \\ i_{2,new} \\ i_{3,new} \end{bmatrix}

Next, compare c_new, a reduced-dimensionality representation of i_new, against all coefficient vectors c_n, 1 \le n \le N.

One possible classifier: the nearest-neighbor classifier.

Nearest Neighbor Classifier

Given an input image representation y (the input is also called a probe; the representation may be the image itself, i, or some transformation of the image, e.g. c), the NN classifier assigns to y the label associated with the closest image in the training set.
So if y happens to be closest to another face it will be assigned L = 1 (face); otherwise it will be assigned L = 0 (nonface).

Euclidean distance:

d = \| y_L - y \| = \sqrt{ \sum_{c=1}^{N} ( y_{Lc} - y_c )^2 }
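The toy example's coefficient computation and nearest-neighbor comparison might look as follows in NumPy; this is a minimal sketch with made-up pixel values, and B is the hand-built basis from the slides:

```python
# Sketch: coefficients via the pseudo-inverse of B, then nearest-neighbor matching.
import numpy as np

# New basis matrix B for the 3-pixel toy example (pixel 1 == pixel 3).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
B_pinv = np.linalg.pinv(B)              # equals [[0.5, 0, 0.5], [0, 1, 0]]

# Training images as columns of the data matrix D (pixel 1 == pixel 3).
D = np.array([[10.0, 40.0, 90.0],
              [20.0, 50.0, 60.0],
              [10.0, 40.0, 90.0]])
C = B_pinv @ D                          # coefficient matrix, one column per image

# A new probe image and its reduced-dimensionality representation.
i_new = np.array([42.0, 48.0, 42.0])
c_new = B_pinv @ i_new

# Nearest-neighbor classification: pick the training image whose
# coefficient vector is closest in Euclidean distance.
dists = np.linalg.norm(C - c_new[:, None], axis=0)
print("nearest training image:", np.argmin(dists), "| distances:", np.round(dists, 1))
```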

The Principle Behind Principal Component Analysis [1]

Principal Component Analysis: Eigenfaces

Employs second-order statistics to compute, in a principled way, a new basis matrix.
Also called the Hotelling Transform [2] or the Karhunen-Loève Method [3].
Find an orthogonal coordinate system such that the data is approximated best and the correlation between different axes is minimized.

[1] I.T. Jolliffe; Principal Component Analysis; 1986
[2] R.C. Gonzalez, P.A. Wintz; Digital Image Processing; 1987
[3] K. Karhunen; Über Lineare Methoden in der Wahrscheinlichkeitsrechnung; 1946
    M.M. Loève; Probability Theory; 1955

PCA: Goal - Formally Stated

Problem formulation.
Input: X = [x_1 \cdots x_N], N points in d-dimensional space.
Solve for: B, a d x m basis matrix (m \ll d), such that the coefficients

C = [c_1 \cdots c_N] = B^T [x_1 \cdots x_N]

have minimized correlation (i.e., their covariance is diagonalized).

PCA: Theory

[Figure: a 2D point cloud with its principal directions e_1 and e_2 drawn as new axes.]

Define a new origin as the mean of the data set.
Find the direction of maximum variance in the samples (e_1) and align it with the first axis.
Continue this process with orthogonal directions of decreasing variance, aligning each with the next axis.
Thus, we have a rotation which minimizes the covariance.

Recall:

Correlation:        cor(x, y) = \frac{cov(x, y)}{\sigma_x \sigma_y}
Sample covariance:  cov(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})^T

The Sample Covariance Matrix

Define the covariance (scatter) matrix of the input samples:

S_T = \frac{1}{N-1} \sum_{n=1}^{N} (i_n - \mu)(i_n - \mu)^T      (where \mu is the sample mean)
    = \frac{1}{N-1} (D - M)(D - M)^T,   where M = [\mu \cdots \mu],  D = [i_1 \ i_2 \cdots i_N],  and  \mu = \frac{1}{N} \sum_{n=1}^{N} i_n

PCA: Some Properties of the Covariance/Scatter Matrix

The covariance matrix S_T is symmetric.
The diagonal contains the variance of each parameter (i.e., element S_{T,ii} is the variance in the i-th direction).
Each off-diagonal element S_{T,ij} is the covariance between the two directions i and j and represents their level of correlation (i.e., a value of zero indicates that the two dimensions are uncorrelated).
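A small sketch of the scatter-matrix computation (assuming images are stored as the columns of D; the data here is random filler, not course data):

```python
# Sketch: S_T = 1/(N-1) * (D - M)(D - M)^T for images stored as columns of D.
import numpy as np

rng = np.random.default_rng(0)
d, N = 5, 8
D = rng.uniform(0, 255, (d, N))              # d pixels, N images

mu = D.mean(axis=1, keepdims=True)           # sample mean image
M = np.tile(mu, (1, N))                      # M = [mu ... mu]
S_T = (D - M) @ (D - M).T / (N - 1)

# Sanity check against NumPy's covariance (variables = rows, observations = columns).
assert np.allclose(S_T, np.cov(D))
print("S_T symmetric:", np.allclose(S_T, S_T.T),
      "| diagonal = per-pixel variances:", np.round(np.diag(S_T), 1))
```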

PCA: Goal Revisited

Look for B such that

[c_1 \cdots c_N] = B^T [i_1 \cdots i_N]

and correlation is minimized, i.e., cov(C) is diagonal.

Note that cov(C) can be expressed via cov(D) and B:

C C^T = B^T (D - M)(D - M)^T B = B^T S_T B

Algebraic Definition of PCs

Given a sample of N observations X = [x_1 \cdots x_N] on a vector of d variables, define the k-th principal coefficient of the sample by the linear transformation

c_k = b_k^T x = \sum_{i=1}^{d} b_{ik} x_i

where the vector b_k = [b_{1k} \cdots b_{dk}]^T is chosen such that var[c_k] is maximal, subject to cov[c_k, c_l] = 0 for k > l \ge 1 and to b_k^T b_k = 1.

Algebraic Derivation of b_1

To find b_1, maximize var[c_1] subject to b_1^T b_1 = 1.

Maximize the objective function:

L = b_1^T S b_1 - \lambda (b_1^T b_1 - 1)

Differentiate and set to 0:

\frac{\partial L}{\partial b_1} = S b_1 - \lambda b_1 = 0,   i.e.   (S - \lambda I) b_1 = 0

Therefore b_1 is an eigenvector of S corresponding to the eigenvalue \lambda = \lambda_1.

We have maximized

var[c_1] = b_1^T S b_1 = b_1^T \lambda_1 b_1 = \lambda_1,

so \lambda_1 is the largest eigenvalue of S.
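The derivation above says b_1 is the leading eigenvector of S; a brief NumPy check (with synthetic 2D data, my own illustration) confirms that the variance of the first principal coefficient equals the largest eigenvalue:

```python
# Sketch: b_1 as the top eigenvector of the sample covariance matrix S.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=500).T   # 2 x N data

S = np.cov(X)                                 # sample covariance/scatter matrix
eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues for symmetric S
b1 = eigvecs[:, -1]                           # eigenvector of the largest eigenvalue

c1 = b1 @ X                                   # first principal coefficients c_1 = b_1^T x
print("largest eigenvalue:", round(eigvals[-1], 3),
      "~ var[c_1]:", round(c1.var(ddof=1), 3))
```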

Algebraic Derivation of b_2

To find the next principal direction, maximize var[c_2] subject to cov[c_2, c_1] = 0 and b_2^T b_2 = 1.

Maximize the objective function:

L = b_2^T S b_2 - \lambda (b_2^T b_2 - 1) - \phi (b_2^T b_1 - 0)

Differentiate and set to 0:

\frac{\partial L}{\partial b_2} = S b_2 - \lambda b_2 - \phi b_1 = 0

Data Loss

Sample points can be projected via the new m < d projection matrix B_opt,

c_i = B_opt^T (x_i - \mu),

and can still be reconstructed, x_i \approx B c_i + \mu, but some information will be lost.

[Figure: 2D data projected onto the first principal axis (1D data) and reconstructed back in 2D.]

Data Loss (cont.)

It can be shown that the mean-square error between x_i and its reconstruction using only m principal eigenvectors is given by the expression

\text{MSE} = \sum_{j=1}^{N} \lambda_j - \sum_{j=1}^{m} \lambda_j = \sum_{j=m+1}^{N} \lambda_j

Data Reduction: Theory

Each eigenvalue represents the total variance in its dimension.
Throwing away the least significant eigenvectors in B_opt means throwing away the least significant variance information.

Singular Value Decomposition (SVD): Definition

Any real matrix D \in \mathbb{R}^{d \times N} (non-square) can be decomposed as

D = U \Sigma V^T,   with   U^T U = V V^T = I   and   \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_q),  q = \min(N, d).

The \sigma's are called singular values.

EVD vs. SVD

Remember that C_x = (X - \mu)(X - \mu)^T = D D^T (treating D as the mean-centered data matrix).
For the square matrix C_x, the eigendecomposition gives

C_x = U C_y U^T,   with   C_y = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2, 0, \ldots, 0)   (d \times d),

while the non-square (d x N) data matrix itself is decomposed as D = U \Sigma V^T.

Data Reduction and SVD

Set the redundant singular values to 0:

D' = U \, \mathrm{diag}(\sigma_1, \ldots, \sigma_m, 0, \ldots, 0) \, V^T

Given that the data dimension is m, we can solve for just the first m vectors of U (no need to find all of them).
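A hedged sketch of data reduction with a truncated SVD (random toy data, not the course's code), showing that the discarded energy is exactly the sum of the squared trailing singular values:

```python
# Sketch: truncated SVD of a d x N data matrix with one vectorized image per column.
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((100, 20))          # toy d x N data matrix
D = D - D.mean(axis=1, keepdims=True)       # center the columns

# Thin SVD: U is d x q, s holds the singular values, Vt is q x N (q = min(d, N)).
U, s, Vt = np.linalg.svd(D, full_matrices=False)

m = 5                                       # keep only the m leading singular values
D_reduced = U[:, :m] @ np.diag(s[:m]) @ Vt[:m, :]

# The discarded variance equals the sum of the squared trailing singular values.
print("reconstruction error:", np.linalg.norm(D - D_reduced) ** 2,
      "=", np.sum(s[m:] ** 2))
```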

PCA: Conclusion

A multivariate analysis method.
Finds a more natural coordinate system for the sample data.
Allows for data to be removed with minimum loss in reconstruction ability.

PCA - Dimensionality Reduction

Consider a set of images where each image is made up of 3 pixels and pixel 1 has the same value as pixel 3 for all images:

i_n = [i_{1n} \ i_{2n} \ i_{3n}]^T   s.t.  i_{1n} = i_{3n}  and  1 \le n \le N

PCA chooses axes in the directions of highest variability of the data (maximum scatter).

[Figure: the image points in pixel space with the 1st and 2nd PCA axes drawn through the data.]

The data matrix D factors into a basis matrix and a coefficient matrix:

D = [i_1 \ i_2 \cdots i_N] = B [c_1 \ c_2 \cdots c_N]

Each image i_n is now represented by a vector of coefficients c_n in a reduced-dimensionality space.

Compute D = U S V^T (the SVD of D) and set B = U.

B maximizes the function E = B^T S_T B subject to B^T B = I.
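Following the slide's recipe (D = USV^T, set B = U), a minimal NumPy sketch of PCA dimensionality reduction on a synthetic version of the 3-pixel toy data might look like this:

```python
# Sketch: PCA via the SVD of the (centered) data matrix; B = U, c_n = B^T (i_n - mean).
import numpy as np

rng = np.random.default_rng(1)
N = 50
# Toy data mimicking the 3-pixel example: pixel 1 equals pixel 3 in every image.
p1 = rng.uniform(0, 255, N)
p2 = rng.uniform(0, 255, N)
D = np.stack([p1, p2, p1])                  # 3 x N data matrix

mean = D.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(D - mean, full_matrices=False)

m = 2                                       # keep the two directions with non-zero variance
B = U[:, :m]                                # basis matrix
C = B.T @ (D - mean)                        # m x N coefficient matrix

print("singular values:", np.round(s, 3))   # the third value is ~0: pixel 3 is redundant
print("coefficients of image 0:", np.round(C[:, 0], 2))
```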

PCA for Recognition

Consider the set of images i_n = [i_{1n} \ i_{2n} \ i_{3n}]^T.
PCA chooses axes in the directions of highest variability of the data.

[Figure: the image points in pixel space with the 1st and 2nd PCA axes.]

For the orthonormal basis B, B^{-1} = B^T.

Given a new image i_new, compute the vector of coefficients c_new associated with the new basis B:

c_new = B^T i_new

Next, compare c_new, a reduced-dimensionality representation of i_new, against all coefficient vectors c_n, 1 \le n \le N.

One possible classifier: the nearest-neighbor classifier.

Data and Eigenfaces

Each image below is a column vector in the basis matrix B.

Linear Representation: Eigenimages

Principal components (eigenvectors) of the image ensemble.
The data is composed of 28 faces photographed under the same lighting and viewing conditions.

[Figure: the image ensemble plotted as points in pixel space, and the resulting eigenimages.]

Each image is a linear combination of the eigenimages,

d_i = U c_i,

and can be rebuilt as a running sum of terms; the figure shows reconstructions using 1, 3, 9, and 28 terms.

Eigenvectors are typically computed using the Singular Value Decomposition (SVD) algorithm.
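A rough sketch of the running-sum reconstruction (with random stand-in data rather than the 28-face ensemble): each image d_i = U c_i is rebuilt from an increasing number of eigenimages, and the error shrinks as terms are added:

```python
# Sketch: reconstruct one image from a growing number of eigenimages.
import numpy as np

rng = np.random.default_rng(2)
d, N = 64, 28                               # 28 images, each flattened to d pixels
D = rng.uniform(0, 255, (d, N))

mean = D.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(D - mean, full_matrices=False)   # columns of U are eigenimages

i = 0                                       # reconstruct the first image
c = U.T @ (D[:, i:i + 1] - mean)            # its coefficient vector
for terms in (1, 3, 9, 28):
    approx = mean + U[:, :terms] @ c[:terms]
    err = np.linalg.norm(D[:, i:i + 1] - approx)
    print(f"{terms:2d} terms -> reconstruction error {err:.2f}")
```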

PIE Database (Weizmann)

The Covariance Matrix

Define the covariance (scatter) matrix of the input samples:

S_T = \frac{1}{N-1} \sum_{n=1}^{N} (i_n - \mu)(i_n - \mu)^T      (where \mu is the sample mean)

EigenImages - Basis Vectors

Each image below is a column vector in the basis matrix B.
PCA encodes the variability across images without distinguishing between variability in people, viewpoints, and illumination.

PCA Classifier

Distance to the face subspace:

d_f(y) = \| y - U_f U_f^T y \|

(d_n(y) is the analogous distance to the nonface subspace.)

Use a likelihood-ratio (LR) test to classify a probe y as face or nonface. Intuitively, we expect d_n(y) > d_f(y) to suggest that y is a face. The LR for PCA is defined as

d = \frac{d_n(y)}{d_f(y)}

and we assign L = 1 (face) if d exceeds the threshold, and L = 0 (nonface) otherwise.

PCA for Recognition - EigenImages

[Figure: coefficient vectors of person 1 and person 2 plotted along the 1st and 2nd PCA axes.]
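A hedged sketch of the distance-to-subspace classifier and LR test; U_f and U_n below are random orthonormal stand-ins for the face and nonface subspace bases, and the threshold value is an assumption:

```python
# Sketch: PCA classifier via distances to the face and nonface subspaces.
import numpy as np

rng = np.random.default_rng(3)
d, m = 256, 10
U_f, _ = np.linalg.qr(rng.standard_normal((d, m)))   # face subspace basis (stand-in)
U_n, _ = np.linalg.qr(rng.standard_normal((d, m)))   # nonface subspace basis (stand-in)

def dist_to_subspace(y, U):
    """d(y) = || y - U U^T y ||, the residual after projecting y onto span(U)."""
    return np.linalg.norm(y - U @ (U.T @ y))

def classify(y, threshold=1.0):
    """Likelihood-ratio test: label y a face (L=1) if d_n(y)/d_f(y) > threshold."""
    ratio = dist_to_subspace(y, U_n) / dist_to_subspace(y, U_f)
    return 1 if ratio > threshold else 0

y = rng.standard_normal(d)                   # a probe image (flattened)
print("label:", classify(y))
```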

Face Detection/Recognition

Consider a set of images of 2 people under a fixed viewpoint and N lighting conditions. Each image is made up of 2 pixels.

[Figure: coefficient vectors of person 1 and person 2 plotted along the 1st and 2nd PCA axes.]

Reduce dimensionality by throwing away the axis along which the data varies the least.
The coefficient vector associated with the 1st basis vector is used for classification.
Possible classifier: Mahalanobis distance.
Each image is represented by one coefficient vector.
Each person is displayed in N images and therefore has N coefficient vectors.
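The Mahalanobis-distance classifier mentioned above could be sketched as follows (synthetic coefficient clouds for the two people; not the course's data or code):

```python
# Sketch: assign a probe to the person whose coefficient cloud is closest in
# Mahalanobis distance (distance scaled by that person's sample covariance).
import numpy as np

rng = np.random.default_rng(4)
N = 40
person1 = rng.normal(loc=[-2.0, 0.0], scale=[0.5, 1.0], size=(N, 2))
person2 = rng.normal(loc=[+2.0, 0.0], scale=[1.5, 1.0], size=(N, 2))

def mahalanobis(c, samples):
    """Mahalanobis distance from coefficient vector c to a person's sample cloud."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    diff = c - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

c_probe = np.array([1.0, 0.2])
d1 = mahalanobis(c_probe, person1)
d2 = mahalanobis(c_probe, person2)
print("assign probe to person", 1 if d1 < d2 else 2,
      "| distances:", round(d1, 2), round(d2, 2))
```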

Face Detection and Localization

Scan and classify image windows at different positions and scales (location and scale in the image).

[Figure: pipeline from face examples and non-face examples through feature extraction to off-line training of the classifier for multiple scales, producing detections with a confidence score (e.g., Conf. = 5).]

Face Localization

Cluster the detections in the space-scale space.
Assign the cluster size to the detection confidence.
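A rough sketch (with a placeholder classifier, not the trained one described above) of scanning image windows over positions and scales; the raw detections would then be clustered in space-scale to obtain a confidence:

```python
# Sketch: sliding-window scan over positions and scales, collecting raw detections.
import numpy as np

def classify_window(window):
    """Stand-in for the trained face/nonface classifier; returns True for 'face'."""
    return window.mean() > 200          # arbitrary placeholder rule

def detect(image, window=16, scales=(1.0, 0.5), step=8):
    detections = []
    for s in scales:
        # Coarse rescale by integer subsampling (a real system would interpolate).
        sub = image[::int(1 / s), ::int(1 / s)]
        h, w = sub.shape
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                if classify_window(sub[y:y + window, x:x + window]):
                    detections.append((x / s, y / s, s))
    return detections

img = np.zeros((64, 64))
img[20:40, 20:40] = 255                  # a bright square standing in for a face
hits = detect(img)
print(len(hits), "raw detections; cluster them in space-scale for a confidence score")
```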
