Carlos Pinto
Neuropsychiatric
Overview
1. Distortions in Data
2. Variance and Covariance
3. PCA: A Toy Example
4. Running PCA in R
5. Factor Analysis
6. Running Factor Analysis in R
7. Conclusions
in linear systems . . .
Noise
1. Noise arises from inaccuracies in our measurements
2. Noise is measured relative to the signal

$\mathrm{SNR} = \sigma^2_{\text{signal}} / \sigma^2_{\text{noise}}$

The SNR must be large if we are to extract useful information from our experiment.
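As a quick numerical illustration, the SNR can be estimated directly from sample variances. A minimal Python sketch (the sine-wave signal and noise level here are made up for illustration):

```python
import numpy as np

# Hypothetical measurement: a clean sine-wave signal plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t)           # the "true" signal
noise = 0.1 * rng.standard_normal(t.size)    # measurement noise
measured = signal + noise

# SNR = variance of signal / variance of noise
snr = signal.var() / noise.var()
print(snr)   # roughly 50 for these settings: a healthy SNR
```

With the noise amplitude ten times smaller than the signal's, the variance ratio lands near 50, comfortably "large" in the sense above.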
Rotation
1. We don't always know the true underlying variables in the system we are studying.
2. We may actually be measuring some (hopefully linear!) combination of the true variables.
Redundancy
1. We don't always know whether the variables we are measuring are independent.
2. We may actually be measuring one or more (hopefully linear!) combinations of the same set of variables.
Degrees of Redundancy
An Illustrative Experiment
Goals of PCA
1. To isolate noise
2. To separate out the redundant degrees of freedom
3. To eliminate the effects of rotation
Reduced Data
Variance
Variables: X and Y

Variance of X: $\sigma^2_X = \frac{1}{n-1}\sum_i (x_i - \bar{x})(x_i - \bar{x})$

Variance of Y: $\sigma^2_Y = \frac{1}{n-1}\sum_i (y_i - \bar{y})(y_i - \bar{y})$
Covariance
Covariance of X and Y:

$\sigma^2_{XY} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \sigma^2_{YX}$

The covariance measures the degree of the (linear!) relationship between X and Y. Small values of the covariance indicate that the variables are uncorrelated (though not necessarily independent).
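The formula above translates directly into code. A minimal Python sketch (numpy's `np.cov` uses the same $n-1$ denominator by default):

```python
import numpy as np

# Sample covariance with the n-1 denominator, matching the formula above.
def covariance(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]            # perfectly (linearly) related to x
print(covariance(x, y))        # equals 2 * var(x) here, since y = 2x
print(np.cov(x, y)[0, 1])      # numpy agrees
```

Because y is an exact linear function of x, the covariance is as large as it can be for these variances; shuffling y would drive it toward zero.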
Covariance Matrix
We can write the variances and covariances associated with our two variables in the form of a matrix:

$C = \begin{pmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{pmatrix}$

The diagonal elements are the variances and the off-diagonal elements are the covariances. The covariance matrix is symmetric, since $C_{XY} = C_{YX}$.
Three Variables
For 3 variables, we get a $3 \times 3$ matrix:

$C = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}$

As before, the diagonal elements are the variances, the off-diagonal elements are the covariances, and the matrix is symmetric.
1. The diagonal terms are the variances of our different variables.
2. Large diagonal values correspond to strong signals.
3. The off-diagonal terms are the covariances between different variables. They reflect distortions in the data (noise, rotation, redundancy).
4. Large off-diagonal values correspond to distortions in our data.
Minimizing Distortion
1. If we redefine our variables (as linear combinations of each other) the covariance matrix will change.
2. We want to change the covariance matrix so that the off-diagonal elements are close to zero.
3. That is, we want to diagonalize the covariance matrix.
Two measurements of each variable (both with mean zero):

$X = \left(\tfrac{1}{2},\ -\tfrac{1}{2}\right), \qquad Y = \left(\tfrac{3}{2},\ -\tfrac{3}{2}\right)$
Variances
$C_{11} = \tfrac{1}{2}\cdot\tfrac{1}{2} + \left(-\tfrac{1}{2}\right)\left(-\tfrac{1}{2}\right) = \tfrac{1}{2}$

$C_{12} = \tfrac{1}{2}\cdot\tfrac{3}{2} + \left(-\tfrac{1}{2}\right)\left(-\tfrac{3}{2}\right) = \tfrac{3}{2}$

$C_{21} = \tfrac{3}{2}\cdot\tfrac{1}{2} + \left(-\tfrac{3}{2}\right)\left(-\tfrac{1}{2}\right) = \tfrac{3}{2}$

$C_{22} = \tfrac{3}{2}\cdot\tfrac{3}{2} + \left(-\tfrac{3}{2}\right)\left(-\tfrac{3}{2}\right) = \tfrac{9}{2}$

(with $n - 1 = 1$ in the denominator, since $n = 2$)
Covariance Matrix
$C = \begin{pmatrix} 1/2 & 3/2 \\ 3/2 & 9/2 \end{pmatrix}$

This is not diagonal: the nonzero off-diagonal terms indicate the presence of distortion, in this case because the two variables are correlated.
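The toy calculation can be checked numerically; a short Python sketch on the same two-point data:

```python
import numpy as np

# Toy data from the example: X = (1/2, -1/2), Y = (3/2, -3/2), n = 2.
X = np.array([0.5, -0.5])
Y = np.array([1.5, -1.5])

C = np.cov(X, Y)   # 2x2 covariance matrix, n-1 denominator
print(C)
# [[0.5 1.5]
#  [1.5 4.5]]
```

The nonzero off-diagonal entries (3/2) confirm the correlation worked out by hand above.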
Rotation of Axes I
We need to redefine our variables to diagonalize the covariance matrix. Choose new variables which are a linear combination of the old ones:

$X' = a_1 X + a_2 Y$
$Y' = b_1 X + b_2 Y$

We have to determine the four constants $a_1, a_2, b_1, b_2$ such that the covariance matrix is diagonal.
Rotation of Axes II
Choosing

$X' = X + 3Y, \qquad Y' = -3X + Y$

the new measurements are

$X'_1 = \tfrac{1}{2} + 3\cdot\tfrac{3}{2} = 5, \qquad X'_2 = -\tfrac{1}{2} + 3\cdot\left(-\tfrac{3}{2}\right) = -5$

$Y'_1 = -3\cdot\tfrac{1}{2} + \tfrac{3}{2} = 0, \qquad Y'_2 = -3\cdot\left(-\tfrac{1}{2}\right) - \tfrac{3}{2} = 0$

and the new covariance matrix is diagonal:

$C' = \begin{pmatrix} 50 & 0 \\ 0 & 0 \end{pmatrix}$
1. All off-diagonal entries are now zero, so data distortions have been removed.
2. The variance of the second new variable (second entry on the diagonal) is now zero. This variable has been eliminated, so the data has been dimensionally reduced (from 2 dimensions to 1).
Eigenvalues
1. The numbers on the diagonal of the diagonalized matrix are the eigenvalues of the covariance matrix (strictly, when the new axes are normalized to unit length; normalizing the axes in our example gives the eigenvalues 5 and 0).
2. Large eigenvalues correspond to large variances: these are the important variables to consider.
3. Small eigenvalues correspond to small variances: these variables can sometimes be neglected.
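Diagonalizing the covariance matrix is exactly an eigendecomposition; a short numpy sketch on the toy matrix:

```python
import numpy as np

# Covariance matrix from the toy example.
C = np.array([[0.5, 1.5],
              [1.5, 4.5]])

# eigh is the eigensolver for symmetric matrices; eigenvalues come back
# in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)         # 0 and 5 (up to rounding): one zero variance, one large
print(eigenvectors[:, 1])  # large-variance axis, proportional to (1, 3)
```

The eigenvector proportional to (1, 3) is exactly the direction $X + 3Y$ used in the hand calculation.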
Eigenvectors
The directions of the new rotated axes are called the eigenvectors of the covariance matrix.

Etymological Note:
vector: from Latin, "carrier"
eigen: from German, "proper to, belonging to"
Example 2

Finance, X    Marketing, Y    Policy, Z
3             6               5
7             3               3
10            9               8
3             9               7
10            6               5
Running PCA in R
1. Large variances represent signal. Small variances represent noise.
2. The new axes are linear combinations of the original axes.
3. The principal components are orthogonal (perpendicular) to each other.
4. The distributions of our measurements are fully described by their mean and variance.
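The slides run this in R; an equivalent numpy-only sketch on the Example 2 data, doing PCA by eigendecomposition of the covariance matrix:

```python
import numpy as np

# Example 2 data (columns: Finance X, Marketing Y, Policy Z).
data = np.array([
    [3,  6, 5],
    [7,  3, 3],
    [10, 9, 8],
    [3,  9, 7],
    [10, 6, 5],
], dtype=float)

# Center the data, then diagonalize its covariance matrix.
centered = data - data.mean(axis=0)
C = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Sort components by decreasing variance (eigenvalue).
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

scores = centered @ eigenvectors        # data in the new (rotated) axes
explained = eigenvalues / eigenvalues.sum()
print(explained)   # fraction of total variance per principal component
```

The covariance matrix of `scores` is diagonal by construction, which is precisely the "minimizing distortion" goal from earlier.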
Advantages of PCA
1. The procedure is hypothesis-free.
2. The procedure is well defined.
3. The solution is (essentially) unique.
Factor Analysis
There are a multiplicity of procedures which go under the name of factor analysis. Factor analysis is hypothesis driven and tries to find a priori structure in the data. We'll use the data from Example 2 in the next few slides...
Factor Model
Hypothesis: A significant part of the variance of our 3 variables can be expressed as linear combinations of two underlying factors, $F_1$ and $F_2$:

$X = a_1 F_1 + a_2 F_2 + e_X$
$Y = b_1 F_1 + b_2 F_2 + e_Y$
$Z = c_1 F_1 + c_2 F_2 + e_Z$
Assumptions
1. The factors $F_i$ are independent with mean 0 and variance 1.
2. The error terms $e_i$ are independent with mean 0 and variance $\sigma_i^2$.

With these assumptions, we can calculate the variances in terms of our hypothesised loadings.
$C_{XX} = a_1^2 + a_2^2 + \sigma_{e_X}^2$
$C_{YY} = b_1^2 + b_2^2 + \sigma_{e_Y}^2$
$C_{ZZ} = c_1^2 + c_2^2 + \sigma_{e_Z}^2$

$C_{XY} = a_1 b_1 + a_2 b_2$
$C_{XZ} = a_1 c_1 + a_2 c_2$
$C_{YZ} = b_1 c_1 + b_2 c_2$
To fit the model we would like to choose the loadings $a_i$, $b_i$ and $c_i$ so that the predicted variances match the observed variances.

Problem: The solution is not unique. That is, there are different choices for the loadings that yield the same values for the variances; in fact, an infinite number...
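Both points can be checked numerically. In matrix form the model's implied covariances are $C = LL^\top + \Psi$, and rotating the loadings leaves $LL^\top$ unchanged; a Python sketch with hypothetical (illustrative, not fitted) loadings:

```python
import numpy as np

# Hypothetical loadings for X, Y, Z on two factors (rows = variables).
L = np.array([[0.8, 0.3],   # a1, a2
              [0.7, 0.4],   # b1, b2
              [0.2, 0.9]])  # c1, c2
psi = np.array([0.2, 0.3, 0.1])  # error variances sigma_i^2

# Implied covariance matrix: C = L L^T + diag(psi).
C = L @ L.T + np.diag(psi)
print(C[0, 1])   # C_XY = a1*b1 + a2*b2

# Non-uniqueness: rotating the loadings by any orthogonal matrix R
# leaves the implied covariances unchanged, since (LR)(LR)^T = L L^T.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ R
print(np.allclose(C, L_rot @ L_rot.T + np.diag(psi)))  # True
```

Since theta was arbitrary, every rotation angle gives a different loading matrix with identical fit, which is exactly the infinite family of solutions described above.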
In essence, it finds an initial set of loadings and then rotates these to find the best solution according to a user-specified criterion:

Varimax: Tends to maximise the difference in loading between factors.
Quartimax: Tends to maximise the difference in loading between variables.
Example 3
Physics  History  English  Maths  Politics
8        4        3        8      5
7        3        3        9      3
10       4        3        9      4
3        8        9        4      7
4        8        6        4      7
5        9        8        5      9
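The slides fit this in R; as a rough numpy-only stand-in, here is a principal-factor style extraction (eigendecomposition of the correlation matrix, keeping two factors, no rotation) on the Example 3 data:

```python
import numpy as np

# Example 3 data (rows = students; columns = subjects).
subjects = ["Physics", "History", "English", "Maths", "Politics"]
data = np.array([
    [8, 4, 3, 8, 5],
    [7, 3, 3, 9, 3],
    [10, 4, 3, 9, 4],
    [3, 8, 9, 4, 7],
    [4, 8, 6, 4, 7],
    [5, 9, 8, 5, 9],
], dtype=float)

# Principal-factor style step: eigendecompose the correlation matrix and
# keep the 2 largest components as initial (unrotated) loadings.
R = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1][:2]
loadings = eigenvectors[:, order] * np.sqrt(eigenvalues[order])

for name, row in zip(subjects, loadings):
    print(f"{name:10s} {row[0]: .2f} {row[1]: .2f}")
```

A full analysis would then rotate these loadings (e.g. varimax) to make them interpretable, as described above; the squared loadings summed across factors give each subject's communality.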
Take Home
1. Measurements are subject to many distortions, including noise, rotation and redundancy.
2. PCA is a robust, well-defined method for reducing data and investigating underlying structure.
3. There are many approaches to factor analysis; all of them are subject to a certain degree of arbitrariness and should be used with caution.
Further Reading
A Tutorial on Principal Components Analysis, Jonathon Shlens
http://www.snl.salk.edu/~shlens/notes.html

Factor Analysis, Peter Tryfos
www.yorku.ca/ptryfos/f1400.pdf