
IME 692, Fall-2016

Principal Component Analysis


Principal component analysis

Same objective as in the case of data matrix factorization


Exception: the rows of the data matrix $\mathcal{X}$ ($n \times p$) are considered as observations of a $p$-variate random variable $X$

Low-dimensional linear combinations of $X$

Easy to interpret
Linear combinations generating the largest spread among the values of $X$
Example: bank notes

$X_1$: length of the bill
$X_2$: height of the bill (left)
$X_3$: height of the bill (right)
$X_4$: distance of the inner frame to the lower border
$X_5$: distance of the inner frame to the upper border
$X_6$: length of the diagonal of the central picture
Standardized Linear Combination

Consider the simple average $p^{-1} \sum_{j=1}^{p} X_j$ of the different variables

Undesirable: every variable receives the same weight, regardless of its importance

A more flexible approach: the weighted average

$\delta^\top X = \sum_{j=1}^{p} \delta_j X_j$ such that $\sum_{j=1}^{p} \delta_j^2 = 1$ (standardized linear combination, SLC)

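Choose $\delta$ such that the variance of the projection is maximized, i.e.,

$$\max_{\{\delta : \|\delta\| = 1\}} \operatorname{Var}(\delta^\top X) = \max_{\{\delta : \|\delta\| = 1\}} \delta^\top \operatorname{Var}(X)\, \delta$$

The solution follows from the Rayleigh quotient result: the maximum of $\delta^\top \Sigma \delta$ over unit vectors $\delta$ equals the largest eigenvalue of $\Sigma = \operatorname{Var}(X)$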
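As a numerical sanity check of the Rayleigh quotient result, the sketch below (Python with NumPy; the covariance matrix is made up for illustration) compares $\delta^\top \Sigma \delta$ for many random unit vectors with the value attained at the top eigenvector. No random direction should exceed $\lambda_1$.

```python
import numpy as np

# Illustrative 3x3 covariance matrix (assumed values, not from the slides)
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
lambda_1 = eigvals[-1]        # largest eigenvalue
gamma_1 = eigvecs[:, -1]      # its unit-norm eigenvector

def slc_variance(delta, Sigma):
    """Variance of the SLC delta^T X, i.e. delta^T Sigma delta."""
    return delta @ Sigma @ delta

rng = np.random.default_rng(0)
best_random = 0.0
for _ in range(10_000):
    d = rng.standard_normal(3)
    d /= np.linalg.norm(d)    # enforce the constraint sum(delta_j^2) = 1
    best_random = max(best_random, slc_variance(d, Sigma))

print(f"lambda_1               = {lambda_1:.4f}")
print(f"Var(gamma_1^T X)       = {slc_variance(gamma_1, Sigma):.4f}")
print(f"best random unit delta = {best_random:.4f}")  # never exceeds lambda_1
```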


Principal Components

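The maximizing direction $\delta$ is given by the eigenvector $\gamma_1$ corresponding to the largest eigenvalue $\lambda_1$ of the covariance matrix $\Sigma = \operatorname{Var}(X)$
First PC: $Y_1 = \gamma_1^\top X$; the second PC $Y_2 = \gamma_2^\top X$ uses the eigenvector of the second largest eigenvalue, and so on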

Generalization
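For a random variable $X$ with $E(X) = \mu$ and $\operatorname{Var}(X) = \Sigma = \Gamma \Lambda \Gamma^\top$ (spectral decomposition), where
$\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_p)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$, and $\Gamma = (\gamma_1, \gamma_2, \dots, \gamma_p)$
The principal components are $Y = \Gamma^\top (X - \mu)$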
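A minimal sketch of the population PC transformation $Y = \Gamma^\top (X - \mu)$, assuming NumPy; `pc_transform` is a hypothetical helper name, and $\mu$, $\Sigma$ are illustrative values. `np.linalg.eigh` returns eigenvalues in ascending order, so the columns are re-sorted to match $\lambda_1 \ge \dots \ge \lambda_p$.

```python
import numpy as np

def pc_transform(x, mu, Sigma):
    """Population PC transformation Y = Gamma^T (x - mu) (hypothetical helper).

    Gamma holds the eigenvectors of Sigma as columns, sorted so that
    lambda_1 >= lambda_2 >= ... >= lambda_p.
    """
    lam, Gamma = np.linalg.eigh(Sigma)   # eigh: ascending eigenvalues
    order = np.argsort(lam)[::-1]        # reorder to descending
    lam, Gamma = lam[order], Gamma[:, order]
    return Gamma.T @ (x - mu), lam, Gamma

# Assumed population parameters, for illustration only
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[3.0, 1.0, 0.2],
                  [1.0, 2.0, 0.3],
                  [0.2, 0.3, 1.0]])
x = np.array([2.0, -1.0, 1.0])

y, lam, Gamma = pc_transform(x, mu, Sigma)

# The spectral decomposition reproduces Sigma: Sigma = Gamma Lambda Gamma^T
print(np.allclose(Gamma @ np.diag(lam) @ Gamma.T, Sigma))
# Gamma is orthogonal, so the length of x - mu is preserved
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x - mu)))
```

Because $\Gamma$ is orthogonal, the PC transformation only rotates (and possibly reflects) the centered variable; eigenvector signs are arbitrary, so each PC is defined only up to sign.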
Illustration
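$X \sim (0, \Sigma)$ with $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, $\rho > 0$

Eigenvalues: $\lambda_1 = 1 + \rho$ and $\lambda_2 = 1 - \rho$

Eigenvectors: $\gamma_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\gamma_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}$

PC transformation: $Y = \Gamma^\top X = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} X$, i.e.,

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} X_1 + X_2 \\ X_1 - X_2 \end{pmatrix}$$

$\operatorname{Var}(Y_1) = \frac{1}{2} \operatorname{Var}(X_1 + X_2) = \frac{1}{2}(1 + 1 + 2\rho) = 1 + \rho = \lambda_1$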
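The two-dimensional illustration can be checked numerically for one particular value of $\rho$ (here $\rho = 0.6$, chosen arbitrarily). The sketch, assuming NumPy, confirms the eigenvalues $1 \pm \rho$ and that $\Gamma^\top \Sigma \Gamma$ is diagonal with entries $1 + \rho$ and $1 - \rho$.

```python
import numpy as np

rho = 0.6  # arbitrary choice with rho > 0
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# eigh sorts ascending: first 1 - rho, then 1 + rho
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)

# Columns are gamma_1 = (1, 1)/sqrt(2) and gamma_2 = (1, -1)/sqrt(2)
Gamma = np.array([[1.0, 1.0],
                  [1.0, -1.0]]) / np.sqrt(2)

# Gamma^T Sigma Gamma = Cov(Y) should equal diag(1 + rho, 1 - rho)
Lambda = Gamma.T @ Sigma @ Gamma
print(np.round(Lambda, 10))
```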
Properties of PCs
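For a random variable $X \sim (\mu, \Sigma)$, let $Y = \Gamma^\top (X - \mu)$ be the PC transformation. Then

$E(Y_j) = 0$, $j = 1, \dots, p$

$\operatorname{Var}(Y_j) = \lambda_j$, $j = 1, \dots, p$

$\operatorname{Cov}(Y_i, Y_j) = 0$, $i \ne j$

$\operatorname{Var}(Y_1) \ge \operatorname{Var}(Y_2) \ge \dots \ge \operatorname{Var}(Y_p) \ge 0$

$\sum_{j=1}^{p} \operatorname{Var}(Y_j) = \operatorname{tr}(\Sigma)$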
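These properties all follow from $\operatorname{Cov}(Y) = \Gamma^\top \Sigma \Gamma = \Lambda$. A short NumPy check on an assumed covariance matrix, with one assertion per property (the zero-mean property holds by construction, since $Y$ is built from $X - \mu$):

```python
import numpy as np

# Illustrative covariance matrix (assumed values)
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

eigvals, Gamma = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]            # lambda_1 >= ... >= lambda_p
eigvals, Gamma = eigvals[order], Gamma[:, order]

# Covariance of Y = Gamma^T (X - mu) is Gamma^T Sigma Gamma
cov_Y = Gamma.T @ Sigma @ Gamma
var_Y = np.diag(cov_Y)

assert np.allclose(var_Y, eigvals)                     # Var(Y_j) = lambda_j
assert np.allclose(cov_Y - np.diag(var_Y), 0.0)        # Cov(Y_i, Y_j) = 0, i != j
assert np.all(np.diff(var_Y) <= 0) and var_Y[-1] >= 0  # decreasing, nonnegative
assert np.isclose(var_Y.sum(), np.trace(Sigma))        # sum Var(Y_j) = tr(Sigma)
print(var_Y, np.trace(Sigma))
```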
PC in practice

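Replace $\mu$ by $\bar{x}$ and $\Sigma$ by the sample covariance matrix $\mathcal{S}$
If $g_1$ is the first eigenvector of $\mathcal{S}$, the 1st PC is $y_1 = (\mathcal{X} - 1_n \bar{x}^\top)\, g_1$

Consider the spectral decomposition $\mathcal{S} = \mathcal{G} \mathcal{L} \mathcal{G}^\top$; then the PCs are obtained by

$$\mathcal{Y} = (\mathcal{X} - 1_n \bar{x}^\top)\, \mathcal{G}$$

$\mathcal{S}_Y = n^{-1} \mathcal{Y}^\top \mathcal{H} \mathcal{Y} = \mathcal{G}^\top \mathcal{S} \mathcal{G} = \mathcal{L}$, where $\mathcal{H} = \mathcal{I}_n - n^{-1} 1_n 1_n^\top$ is the centering matrix and $\mathcal{L} = \operatorname{diag}(\ell_1, \ell_2, \dots, \ell_p)$ is the matrix of eigenvalues of $\mathcal{S}$

Hence $s_{y_i y_i} = \ell_i$: the sample variance of the $i$-th PC equals the $i$-th eigenvalue of $\mathcal{S}$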
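The sample version can be sketched as follows, assuming NumPy and synthetic data. `sample_pcs` is a hypothetical helper; it computes $\mathcal{S}$ with divisor $n$, matching the $n^{-1}$ in $\mathcal{S}_Y = n^{-1} \mathcal{Y}^\top \mathcal{H} \mathcal{Y}$, whereas `np.cov` divides by $n - 1$ unless `bias=True` is passed.

```python
import numpy as np

def sample_pcs(X):
    """Sample PCs Y = (X - 1_n xbar^T) G, where S = G L G^T (hypothetical helper).

    S uses divisor n; np.cov's default divisor n - 1 would rescale every
    eigenvalue by n / (n - 1) but leave the eigenvectors unchanged.
    """
    n = X.shape[0]
    xbar = X.mean(axis=0)
    Xc = X - xbar                    # equals H X with H = I_n - n^{-1} 1_n 1_n^T
    S = Xc.T @ Xc / n
    ell, G = np.linalg.eigh(S)
    order = np.argsort(ell)[::-1]    # l_1 >= l_2 >= ... >= l_p
    ell, G = ell[order], G[:, order]
    return Xc @ G, ell, G

rng = np.random.default_rng(1)
# Synthetic correlated data: 200 observations of 4 variables
X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))

Y, ell, G = sample_pcs(X)
S_Y = Y.T @ Y / X.shape[0]           # Y already has zero column means
print(np.allclose(S_Y, np.diag(ell)))  # True: S_Y = L
```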
The PC transformation is sensitive to scale changes
Multiplying one variable by a scalar changes the eigenvalues and eigenvectors
Reason: the decomposition is performed on the covariance matrix, which depends on the units of each variable
Caution: apply the PC transformation only to data whose variables have approximately the same scale
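To see the scale sensitivity concretely, the sketch below (NumPy, synthetic data) computes the first sample eigenvector before and after multiplying one variable by 100, as if that variable were re-recorded in a smaller unit. The helper `first_pc_direction` is hypothetical, and the eigenvector sign is fixed arbitrarily for readability.

```python
import numpy as np

def first_pc_direction(X):
    """Unit eigenvector of the divisor-n sample covariance for its largest eigenvalue."""
    Xc = X - X.mean(axis=0)
    ell, G = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    g1 = G[:, -1]                                        # eigh sorts ascending
    return g1 if g1[np.argmax(np.abs(g1))] > 0 else -g1  # fix the arbitrary sign

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=500)

X_rescaled = X.copy()
X_rescaled[:, 1] *= 100.0   # hypothetical change of unit for the second variable

print(first_pc_direction(X))           # both variables carry substantial weight
print(first_pc_direction(X_rescaled))  # dominated by the rescaled variable
```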

Effects of scaling
Interpretation of the PCs
First two PCs
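Variable                                      1st PC ($g_1$)   2nd PC ($g_2$)
$X_1$: length of the bill                        -0.044            0.011
$X_2$: height of the bill (left)                  0.112            0.071
$X_3$: height of the bill (right)                 0.139            0.066
$X_4$: inner frame to lower border                0.768           -0.563
$X_5$: inner frame to upper border                0.202            0.659
$X_6$: diagonal of the central picture           -0.579           -0.489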

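$y_1 = g_1^\top x = -0.044\, x_1 + 0.112\, x_2 + 0.139\, x_3 + 0.768\, x_4 + 0.202\, x_5 - 0.579\, x_6$

$y_2 = g_2^\top x = 0.011\, x_1 + 0.071\, x_2 + 0.066\, x_3 - 0.563\, x_4 + 0.659\, x_5 - 0.489\, x_6$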

$y_1 \approx 0.84\, x_4 - 0.66\, x_6$
$y_2 \approx -0.64\, x_4 + 0.75\, x_5 - 0.56\, x_6$
The first two PCs (eigenvectors) explain 88% of the total variance
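The share of variance explained by the $j$-th PC is $\ell_j / \sum_{k} \ell_k$, and a statement such as "the first two PCs explain 88%" refers to the cumulative share. A sketch of the arithmetic, assuming NumPy; the eigenvalues below are hypothetical and are not the bank-note values:

```python
import numpy as np

def explained_variance(ell):
    """Proportion and cumulative proportion of total variance, ell_j / sum(ell)."""
    ell = np.sort(np.asarray(ell, dtype=float))[::-1]  # descending order
    prop = ell / ell.sum()
    return prop, np.cumsum(prop)

# Hypothetical eigenvalues, chosen only to keep the arithmetic easy to follow
prop, cum = explained_variance([3.0, 1.0, 0.5, 0.5])
print(prop)  # shares of roughly 0.6, 0.2, 0.1, 0.1
print(cum)   # the first two PCs account for roughly 0.8 of the total
```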
Correlations of PCs

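The correlation between the PC $Y_j$ and an original variable $X_i$ is

$$\rho_{X_i Y_j} = \frac{\operatorname{Cov}(X_i, Y_j)}{\sqrt{\operatorname{Var}(X_i) \operatorname{Var}(Y_j)}} = \frac{\gamma_{ij} \lambda_j}{\sigma_{X_i X_i}^{1/2} \lambda_j^{1/2}} = \gamma_{ij} \left( \frac{\lambda_j}{\sigma_{X_i X_i}} \right)^{1/2}, \quad i, j = 1, \dots, p$$

since $\operatorname{Cov}(X, Y) = \Sigma \Gamma = \Gamma \Lambda$, so that $\operatorname{Cov}(X_i, Y_j) = \gamma_{ij} \lambda_j$

For the data matrix, $r_{X_i Y_j} = g_{ij} \left( \frac{\ell_j}{s_{X_i X_i}} \right)^{1/2}$, $i, j = 1, \dots, p$

Interpretation: $r^2_{X_i Y_j}$ = proportion of the variance of $X_i$ explained by $Y_j$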
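The formula for $r_{X_i Y_j}$ can be checked against correlations computed directly from the PC scores. A sketch assuming NumPy and synthetic data (`pc_correlations` is a hypothetical helper). The row sums of $r^2_{X_i Y_j}$ equal 1, consistent with reading $r^2$ as a proportion of the variance of $X_i$:

```python
import numpy as np

def pc_correlations(X):
    """r_{X_i Y_j} = g_ij * sqrt(l_j / s_{X_i X_i}) for all i, j (divisor-n moments)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    ell, G = np.linalg.eigh(S)
    order = np.argsort(ell)[::-1]
    ell, G = ell[order], G[:, order]
    # Entry [i, j] of the ratio matrix is l_j / s_ii
    R_XY = G * np.sqrt(ell[None, :] / np.diag(S)[:, None])
    return R_XY, Xc @ G

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 3)) @ rng.standard_normal((3, 3))

R_XY, Y = pc_correlations(X)
# Cross block of the joint correlation matrix: corr(X_i, Y_j) computed directly
direct = np.corrcoef(X, Y, rowvar=False)[:3, 3:]
print(np.allclose(R_XY, direct))
print((R_XY ** 2).sum(axis=1))  # one value per original variable
```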


Normalized PCA

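Heterogeneous variables (measured in different units)

NPC gives each variable the same weight
Standardization of variables: $\mathcal{X}_S = \mathcal{H} \mathcal{X} \mathcal{D}^{-1/2}$, with $\mathcal{D} = \operatorname{diag}(s_{X_1 X_1}, \dots, s_{X_p X_p})$
Centering matrix: $\mathcal{H} = \mathcal{I}_n - n^{-1} 1_n 1_n^\top$

Then $\bar{x}_S = 0$ and $\mathcal{S}_{X_S} = \mathcal{R}$, the correlation matrix of $\mathcal{X}$; the NPCs are obtained from the Jordan (spectral) decomposition $\mathcal{R} = \mathcal{G}_\mathcal{R} \mathcal{L}_\mathcal{R} \mathcal{G}_\mathcal{R}^\top$ as $\mathcal{Z} = \mathcal{X}_S \mathcal{G}_\mathcal{R}$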
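A minimal sketch of normalized PCA, assuming NumPy and the divisor-$n$ convention for $\mathcal{D}$ (`normalized_pca` is a hypothetical helper). Because it works on the correlation matrix $\mathcal{R}$, its eigenvalues, and its NPC scores up to sign, do not change when a column is rescaled, unlike covariance-based PCs.

```python
import numpy as np

def normalized_pca(X):
    """NPCs: standardize X_S = H X D^{-1/2}, then project on eigenvectors of R."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # H X
    d = np.sqrt((Xc ** 2).sum(axis=0) / n)   # square roots of diag(D), divisor n
    X_S = Xc / d                             # H X D^{-1/2}
    R = X_S.T @ X_S / n                      # covariance of X_S = correlation matrix of X
    ell, G_R = np.linalg.eigh(R)
    order = np.argsort(ell)[::-1]
    ell, G_R = ell[order], G_R[:, order]
    return X_S @ G_R, ell, R

rng = np.random.default_rng(4)
# Synthetic data whose columns live on very different scales
X = (rng.standard_normal((250, 3)) @ rng.standard_normal((3, 3))) * np.array([1.0, 100.0, 0.01])

Z, ell, R = normalized_pca(X)
print(ell, ell.sum())  # the eigenvalues of R sum to p = 3
```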