
Factor analysis

Introduction 2
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Factor analysis model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Factor analysis model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Variance of x_i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Covariance matrix of x. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Non-uniqueness of factor loadings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Non-uniqueness of factor loadings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Principal factor analysis 11
Procedure - initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Constraint 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Heywood cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Maximum likelihood estimation 18
MLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Testing for number of factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Factor rotation 22
Some general comments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
What do we look for? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Two types of rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Types of rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Estimating/predicting factor scores 28
Random vs. deterministic factor scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Deterministic factor scores: Bartlett's method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Random factor scores: Thompson's method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Factor analysis vs. PCA 33
Common properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Introduction 2 / 35
Introduction
I In social sciences (e.g., psychology), it is often not possible to measure the variables of interest
directly. Examples:
N Intelligence
N Social class
Such variables are called latent variables or common factors.
I Researchers examine such variables indirectly, by measuring variables that can be measured and that
are believed to be indicators of the latent variables of interest. Examples:
N Examination scores on various tests
N Occupation, education, home ownership
Such variables are called manifest variables or observed variables.
I Goal: study the relationship between the latent variables and the manifest variables
3 / 35
Factor analysis model
I Multiple linear regression model:
x_1 = λ_11 f_1 + · · · + λ_1k f_k + u_1
x_2 = λ_21 f_1 + · · · + λ_2k f_k + u_2
...
x_p = λ_p1 f_1 + · · · + λ_pk f_k + u_p
where
N x = (x_1, . . . , x_p)' are the observed variables (random)
N f = (f_1, . . . , f_k)' are the common factors (random)
N u = (u_1, . . . , u_p)' are called specific factors (random)
N λ_ij are called factor loadings (constants)
4 / 35
Factor analysis model
I In short: x = Λf + u, where Λ is the p × k matrix containing the λ_ij's.
I Difference with multiple regression: the common factors f_1, . . . , f_k are unobserved.
I Assumptions:
N E(x) = 0 (if this is not the case, simply subtract the mean vector)
N E(f) = 0, Cov(f) = I
N E(u) = 0, Cov(u_i, u_j) = 0 for i ≠ j
N Cov(f, u) = 0
5 / 35
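A small simulation sketch (not part of the course material; the 6 × 2 loading matrix below is made up) showing data generated according to the model x = Λf + u and the implied covariance structure:

set.seed(1)
n      <- 5000
Lambda <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6), ncol = 2)   # hypothetical 6 x 2 loadings
Psi    <- diag(1 - rowSums(Lambda^2))            # specific variances (variables have variance 1)
f <- matrix(rnorm(2 * n), n, 2)                  # common factors: E(f) = 0, Cov(f) = I
u <- matrix(rnorm(6 * n), n, 6) %*% sqrt(Psi)    # specific factors: Cov(u) = Psi
x <- f %*% t(Lambda) + u                         # each row follows x = Lambda f + u
round(cov(x) - (Lambda %*% t(Lambda) + Psi), 2)  # approximately 0 (sampling noise only)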
Variance of x_i
I Notation:
N Cov(u) = Ψ = diag(ψ_11, . . . , ψ_pp)
N Cov(x) = Σ
I Then (see board; a sketch of the derivation is given below):
N σ_ii = Var(x_i) = ∑_{j=1}^k λ_ij² + ψ_ii
N Var(x_i) consists of two parts:
I h_i² = ∑_{j=1}^k λ_ij², called the communality of x_i, represents the variance of x_i that is shared with the other variables via the common factors
I ψ_ii, called the specific or unique variance, represents the variance of x_i that is not shared with the other variables
6 / 35
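A sketch of the board derivation: writing x_i = ∑_{j=1}^k λ_ij f_j + u_i and using Cov(f) = I, Cov(u_i, u_j) = 0 for i ≠ j, and Cov(f, u) = 0,

Var(x_i) = ∑_{j=1}^k λ_ij² Var(f_j) + Var(u_i) = ∑_{j=1}^k λ_ij² + ψ_ii = h_i² + ψ_ii,

since all cross-covariance terms vanish under these assumptions.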
Covariance matrix of x
I Note that (see board):
N σ_ij = Cov(x_i, x_j) = ∑_{ℓ=1}^k λ_iℓ λ_jℓ
I Hence, the factor model leads to: Σ = ΛΛ' + Ψ
I The reverse is also true: if one can decompose Σ in this form, then the k-factor model holds for x
7 / 35
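A sketch of the corresponding board argument in matrix form: since Cov(f) = I, Cov(u) = Ψ and Cov(f, u) = 0,

Σ = Cov(x) = Cov(Λf + u) = Λ Cov(f) Λ' + Cov(u) = ΛΛ' + Ψ.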
4
Non-uniqueness of factor loadings
I Suppose that the k-factor model holds for x: x = Λf + u
I Let G be a k × k orthogonal matrix.
I Then x = ΛGG'f + u.
I Note that G'f satisfies the assumptions that we made about the common factors (see board).
I Hence the k-factor model holds with factors G'f and factor loadings ΛG.
I Σ = (ΛG)(ΛG)' + Ψ = ΛΛ' + Ψ
I Hence, factors f with loadings Λ, or factors G'f with loadings ΛG, are equivalent for explaining the covariance matrix of the observed variables (a small numerical check is given below).
8 / 35
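A small numerical check (not from the slides; the loading matrix and rotation angle are arbitrary) that a rotated loading matrix reproduces exactly the same covariance matrix:

Lambda <- matrix(c(0.9, 0.8, 0.1,
                   0.1, 0.2, 0.7), ncol = 2)        # hypothetical 3 x 2 loadings
Psi    <- diag(c(0.18, 0.32, 0.50))                 # hypothetical specific variances
theta  <- pi / 6
G      <- matrix(c(cos(theta), -sin(theta),
                   sin(theta),  cos(theta)), 2, 2)  # k x k orthogonal (rotation) matrix
Sigma1 <- Lambda %*% t(Lambda) + Psi
Sigma2 <- (Lambda %*% G) %*% t(Lambda %*% G) + Psi
all.equal(Sigma1, Sigma2)                           # TRUE: same Sigma, different loadings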
Non-uniqueness of factor loadings
I Non-uniqueness can be resolved by imposing an extra condition. For example:
N Λ'Ψ⁻¹Λ is diagonal with its elements in decreasing order (constraint 1)
N Λ'D⁻¹Λ is diagonal with its elements in decreasing order, where D = diag(σ_11, . . . , σ_pp) (constraint 2)
9 / 35
Estimation
I Σ is usually estimated by S (or often: the correlation matrix is estimated by R).
I Given S (or R), we need to find estimates Λ̂ and Ψ̂ that satisfy constraint 1 or 2, so that S (or R) ≈ Λ̂Λ̂' + Ψ̂.
I Note that typically, the number of parameters in Λ̂ and Ψ̂ is smaller than the number of parameters in S. Hence, there is no exact solution in general.
I Two main methods to estimate Λ̂ and Ψ̂:
N principal factor analysis
N maximum likelihood estimation (requires normality assumption)
I In practice, we also need to determine the value of k, the number of factors.
10 / 35
5
Principal factor analysis 11 / 35
Procedure - initialization
I Estimate correlation matrix by R
I Make preliminary estimates ĥ_i² of the communalities h_i², using:
N The square of the multiple correlation coefficient of the ith variable with all the other variables (a one-line computation is sketched below), or
N The largest correlation coefficient between the ith variable and one of the other variables
12 / 35
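A sketch of the SMC-based starting values, using the built-in swiss data purely as a stand-in for the course data: the squared multiple correlation of variable i with the other variables equals 1 − 1/(R⁻¹)_ii.

R       <- cor(swiss)              # any numeric data set; 'swiss' ships with R
h2.init <- 1 - 1 / diag(solve(R))  # preliminary communality estimates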
Idea
I Given R (p × p), we want to find Ψ̂ (p × p) and Λ̂ (p × k) that satisfy constraint 2, so that R ≈ Λ̂Λ̂' + Ψ̂.
I We look at R − Ψ̂, because we are interested in explaining the (co)variances that are shared through the common factors.
I R − Ψ̂ is symmetric. Hence there is a spectral decomposition R − Ψ̂ = GAG' = ∑_{i=1}^p a_i g_(i) g_(i)'.
I If the first k eigenvalues are positive, and the remaining ones are close to zero, then
R − Ψ̂ ≈ ∑_{i=1}^k a_i g_(i) g_(i)' = ∑_{i=1}^k (a_i^{1/2} g_(i))(a_i^{1/2} g_(i))'.
I ΛΛ' = ∑_{i=1}^k λ_(i) λ_(i)', where λ_(i) is the ith column of Λ. Hence, a natural estimate for λ_(i) is λ̂_(i) = a_i^{1/2} g_(i).
I In matrix form: Λ̂ = G_1 A_1^{1/2}.
13 / 35
Procedure
I Determine the spectral decomposition of the reduced correlation matrix R − Ψ̂, where the ones on the diagonal are replaced by ĥ_i² = 1 − ψ̂_ii. Thus, R − Ψ̂ = GAG', where A = diag(a_1, . . . , a_p) contains the eigenvalues of R − Ψ̂, a_1 ≥ . . . ≥ a_p, and G contains the corresponding orthonormal eigenvectors.
I Estimate Λ by Λ̂ = G_1 A_1^{1/2}, where G_1 = (g_(1), . . . , g_(k)) and A_1 = diag(a_1, . . . , a_k).
I Estimate the specific variances ψ_ii by ψ̂_ii = 1 − ∑_{j=1}^k λ̂_ij², i = 1, . . . , p.
I Stop, or repeat the above steps until some convergence criterion has been reached (the whole procedure is sketched in code below).
14 / 35
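A rough sketch of this procedure in R (my own illustration, not the course's R code; the function name, the SMC initialization and the fixed iteration count are choices of this sketch):

principal.factor <- function(R, k, n.iter = 25) {
  h2 <- 1 - 1 / diag(solve(R))                   # initial communalities (SMC)
  for (it in 1:n.iter) {
    Rr <- R
    diag(Rr) <- h2                               # reduced correlation matrix R - Psi-hat
    eig <- eigen(Rr, symmetric = TRUE)
    G1  <- eig$vectors[, 1:k, drop = FALSE]      # first k eigenvectors
    A1  <- pmax(eig$values[1:k], 0)              # first k eigenvalues (truncated at 0)
    Lambda <- G1 %*% diag(sqrt(A1), k)           # Lambda-hat = G1 A1^{1/2}
    h2 <- rowSums(Lambda^2)                      # updated communalities
  }
  list(loadings = Lambda, psi = 1 - h2)          # specific variances: psi-hat_ii = 1 - h_i^2
}
pf <- principal.factor(cor(swiss), k = 2)        # example call on the 'swiss' data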
6
Constraint 2
I D = diag(σ_11, . . . , σ_pp) = I, because working with the correlation matrix is equivalent to working with standardized variables.
I Hence, Λ̂ satisfies constraint 2: Λ̂'D⁻¹Λ̂ = Λ̂'Λ̂ = (A_1^{1/2} G_1')(G_1 A_1^{1/2}) = A_1 is diagonal with decreasing elements.
15 / 35
Heywood cases
I It can happen that ψ̂_ii < 0 or ψ̂_ii > 1.
I This makes no sense:
N ψ_ii is a variance, so it must be nonnegative.
N Working with the correlation matrix means we are working with standardized variables. So Var(x_i) = 1, and Var(u_i) = ψ_ii cannot exceed 1.
I Such cases are called Heywood cases.
16 / 35
Example
I See R-code.
17 / 35
Maximum likelihood estimation 18 / 35
MLE
I Assume that X has a multivariate normal distribution
I Then the log likelihood function (plugging in x̄ for μ) is (see board):
l(Σ) = −(1/2) n log |2πΣ| − (1/2) n tr(Σ⁻¹ S)
I Regard Σ = ΛΛ' + Ψ as a function of Λ and Ψ, and maximize the log likelihood function over Λ and Ψ.
I Optimization is done iteratively:
N For fixed Ψ, one can maximize analytically over Λ
N For fixed Λ, one can maximize numerically over Ψ
I This method is used by the R-function factanal().
I This method can also have problems with Heywood cases.
19 / 35
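A minimal usage sketch of factanal() (the swiss data are used only as an illustration):

fa <- factanal(swiss, factors = 2, rotation = "none")
fa$loadings       # estimated factor loadings Lambda-hat
fa$uniquenesses   # estimated specific variances psi-hat_ii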
7
Testing for number of factors
I An advantage of the MLE method is that it allows one to test whether the number of factors is sufficient:
N Null hypothesis: k factors are sufficient
N Alternative hypothesis: k factors are not sufficient
N p-value < 0.05 means ...
I Often a sequential testing procedure is used: start with 1 factor and then increase the number of factors one at a time until the test does not reject the null hypothesis (sketched in code below).
I It can occur that the test always rejects the null hypothesis. This is an indication that the model does not fit well (or that the sample size is very large).
20 / 35
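A sketch of the sequential procedure, assuming a numeric data frame (here called dat, a hypothetical name) and at most 4 factors:

for (k in 1:4) {
  fit <- factanal(dat, factors = k)
  cat("k =", k, " p-value =", fit$PVAL, "\n")
  if (fit$PVAL > 0.05) break   # stop at the first k that the test does not reject
}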
Example
I See R-code
21 / 35
Factor rotation 22 / 35
Some general comments
I In factor rotation, we look for an orthogonal matrix G such that the rotated factor loadings Λ* = ΛG can be more easily interpreted than the original factor loadings Λ.
I Is it a good idea to look for such rotations?
N Cons: One can keep rotating the factors until one finds an interpretation that one likes.
N Pros: Factor rotation does not change the overall structure of a solution. It only changes how the solution is described, and finds the simplest description.
23 / 35
What do we look for?
I Factor loadings can often be easily interpreted if:
N Each variable is highly loaded on at most one factor.
N All factor loadings are either large and positive, or close to zero.
24 / 35
8
Two types of rotations
I Orthogonal rotation: the factors are restricted to be uncorrelated.
I Oblique rotation: the factors may be correlated.
I Advantage of orthogonal rotation: For orthogonal rotation (based on standardized variables), the
factor loadings represent correlations between factors and observed variables (see board). This is not
the case for oblique rotations.
I Advantage of oblique rotation: it may be unrealistic to assume that the factors are uncorrelated. One may obtain a better fit by dropping this assumption.
25 / 35
Types of rotations
I Orthogonal:
N Varimax: default in factanal(). Aims at factors with a few large loadings, and many
near-zero loadings.
N Quartimax: not implemented in base R.
I Oblique:
N Promax: use option rotation="promax" in factanal(). Aims at simple structure with low
correlation between factors.
N Oblimin: not implemented in base R.
26 / 35
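A brief usage sketch (again with the swiss data as a stand-in):

factanal(swiss, factors = 2)                        # varimax rotation (the default)
factanal(swiss, factors = 2, rotation = "promax")   # oblique promax rotation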
Example
I See R-code
27 / 35
Estimating/predicting factor scores 28 / 35
Random vs. deterministic factor scores
I So far, we considered the factor scores to be random. This is appropriate when we think of different samples consisting of different individuals, and we are interested in the general structure.
I One can also consider the factor scores to be deterministic. That is appropriate when we are interested in a specific group of individuals.
29 / 35
9
Deterministic factor scores: Bartlett's method
I Assume normality, and suppose that Λ and Ψ are known.
I Denote the factor scores for the ith individual by f_i.
I Then x_i given f_i is normally distributed with mean Λf_i and covariance matrix Ψ.
I Hence, the log likelihood for one observation x_i is given by
−(1/2) log |2πΨ| − (1/2) (x_i − Λf_i)' Ψ⁻¹ (x_i − Λf_i).
I Setting the derivative with respect to f_i equal to zero gives (see board):
f̂_i = (Λ'Ψ⁻¹Λ)⁻¹ Λ'Ψ⁻¹ x_i.
30 / 35
Random factor scores: Thompson's method
I Consider f to be random, i.e., f has a normal distribution with mean 0 and covariance matrix I.
I Then (f', x')' has a multivariate normal distribution with mean vector 0 and covariance matrix
[ I    Λ'       ]
[ Λ    ΛΛ' + Ψ  ]
I Then f|x has distribution N(Λ'Σ⁻¹x, I − Λ'Σ⁻¹Λ) (see board).
I Hence, a natural estimator for f_i is f̂_i = Λ'Σ⁻¹x_i.
31 / 35
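Both types of scores are available directly from factanal() via its scores argument; a brief sketch, with swiss again as a stand-in:

fb <- factanal(swiss, factors = 2, scores = "Bartlett")    # Bartlett's method
ft <- factanal(swiss, factors = 2, scores = "regression")  # Thompson's (regression) method
head(fb$scores)
head(ft$scores)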
Examples
I Both methods have advantages and disadvantages, no clear favorite.
I See examples in R-code.
32 / 35
Factor analysis vs. PCA 33 / 35
Common properties
I Both methods are mostly used in exploratory data analysis.
I Both methods try to obtain dimension reduction: explain a data set in a smaller number of variables.
I Both methods don't work if the observed variables are almost uncorrelated:
N Then PCA returns components that are similar to the original variables.
N Then factor analysis has nothing to explain, i.e., ψ_ii is close to 1 for all i.
I Both methods give similar results if the specic variances are small.
I If the specific variances are assumed to be zero in principal factor analysis, then PCA and factor analysis are the same.
34 / 35
Differences
I PCA requires virtually no assumptions.
Factor analysis assumes that the data come from a specific model.
I In PCA, the emphasis is on transforming the observed variables to principal components.
In factor analysis, emphasis is on the transformation from factors to observed variables.
I PCA is not scale invariant.
Factor analysis (with MLE) is scale invariant.
I In PCA, considering k + 1 instead of k components does not change the first k components.
In factor analysis, considering k + 1 instead of k factors may change the first k factors (when using the MLE method).
I Calculation of PCA scores is straightforward.
Calculation of factor scores is more complex.
35 / 35