Factor analysis
From Wikipedia, the free encyclopedia
Factor analysis is related to principal component analysis (PCA) but not identical.
Because PCA performs a variance-maximizing rotation of the variable space, it takes
into account all variability in the variables. In contrast, factor analysis estimates how
much of the variability is due to common factors ("communality"). The two methods
become essentially equivalent if the error terms in the factor analysis model (the
variability not explained by common factors, see below) can be assumed to all have
the same variance.
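This equal-error-variance case can be checked numerically: when the errors are isotropic, the top-k principal directions of the model covariance span exactly the column space of the factor loadings. A minimal numpy sketch (all parameter values below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 6, 2

# Hypothetical loadings and a common (isotropic) error variance.
L = rng.normal(size=(p, k))
sigma2 = 0.5
Sigma = L @ L.T + sigma2 * np.eye(p)      # model covariance with equal error variances

# Adding sigma2 * I shifts every eigenvalue of L L^T by sigma2 without
# changing the eigenvectors, so the top-k eigenvectors of Sigma (the PCA
# directions) span the same subspace as the columns of L.
w, V = np.linalg.eigh(Sigma)              # eigenvalues in ascending order
top = V[:, -k:]                           # principal k-dimensional subspace

P_L = L @ np.linalg.pinv(L)               # orthogonal projection onto span(L)
P_pca = top @ top.T                       # projection onto the principal subspace
assert np.allclose(P_L, P_pca)
```

With unequal error variances the shift is no longer a multiple of the identity, the eigenvectors change, and the two methods diverge.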
Definition
Suppose we have a set of p observable random variables, x1,...,xp with means
μ1,...,μp.
Suppose for some unknown constants l_ij and k unobserved random variables F_j, where i = 1, ..., p and j = 1, ..., k, with k < p, we have

x_i − μ_i = l_i1 F_1 + ... + l_ik F_k + ε_i.

Here the F_j are assumed to have zero mean and unit variance, to be uncorrelated with one another, and to be uncorrelated with the error terms. The ε_i are independently distributed error terms with zero mean and finite variance, which may not be the same for all of them. Let Var(ε_i) = ψ_i, so that in matrix terms we have
x − μ = LF + ε.
Suppose Cov(x) = Σ. Then from the conditions just imposed on F, we have

Σ = Cov(x − μ) = Cov(LF + ε) = L Cov(F) L^T + Cov(ε) = L L^T + Ψ,

where Ψ = diag(ψ_1, ..., ψ_p).
Note that for any orthogonal matrix Q, if we set L′ = LQ and F′ = Q^T F, the criteria for being factors and factor loadings still hold, since L′F′ = LQQ^T F = LF and Cov(F′) = Q^T Cov(F) Q = I. Hence a set of factors and factor loadings is unique only up to an orthogonal transformation.
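This rotation indeterminacy is easy to verify numerically: replacing L by LQ for any orthogonal Q leaves the implied covariance L L^T + Ψ unchanged. A small numpy sketch with made-up loadings and error variances:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 5, 2

# Hypothetical loadings L and error variances psi (illustrative values only).
L = rng.normal(size=(p, k))
psi = rng.uniform(0.5, 1.5, size=p)
Sigma = L @ L.T + np.diag(psi)            # implied covariance of x

# Any orthogonal Q (here a plane rotation) gives equally valid loadings L' = LQ.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ Q
Sigma_rot = L_rot @ L_rot.T + np.diag(psi)

# Same covariance: the data cannot distinguish L from L @ Q.
assert np.allclose(Sigma, Sigma_rot)
```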
Example
The following example is a simplification for expository purposes, and should not be
taken to be realistic. Suppose a psychologist proposes a theory that there are two
kinds of intelligence, "verbal intelligence" and "mathematical intelligence", neither of
which is directly observed. Evidence for the theory is sought in the examination
scores from each of 10 different academic fields of 1000 students. If each student is
chosen randomly from a large population, then each student's 10 scores are random
variables. The psychologist's theory may say that for each of the 10 academic fields,
the score averaged over the group of all students who share some common pair of
values for verbal and mathematical "intelligences" is some constant times their level
of verbal intelligence plus another constant times their level of mathematical
intelligence, i.e., it is a linear combination of those two "factors". The numbers for a
particular subject, by which the two kinds of intelligence are multiplied to obtain the
expected score, are posited by the theory to be the same for all intelligence level pairs,
and are called "factor loadings" for this subject. For example, the theory may hold
that the average student's aptitude in the field of amphibiology is

{10 × the student's verbal intelligence} + {6 × the student's mathematical intelligence}.
The numbers 10 and 6 are the factor loadings associated with amphibiology. Other
academic subjects may have different factor loadings.
Two students having identical degrees of verbal intelligence and identical degrees of
mathematical intelligence may have different aptitudes in amphibiology because
individual aptitudes differ from average aptitudes. That difference is called the "error"
— a statistical term that means the amount by which an individual differs from what
is average for his or her levels of intelligence (see errors and residuals in statistics).
The observable data that go into factor analysis would be 10 scores of each of the
1000 students, a total of 10,000 numbers. The factor loadings and levels of the two
kinds of intelligence of each student must be inferred from the data.
Mathematical model of the same example
In the example above, for i = 1, ..., 1,000 the ith student's scores are

x_{1,i} = μ_1 + l_{1,1} v_i + l_{1,2} m_i + ε_{1,i}
⋮
x_{10,i} = μ_{10} + l_{10,1} v_i + l_{10,2} m_i + ε_{10,i}

where

x_{k,i} is the ith student's score for the kth subject,
μ_k is the mean of the students' scores for the kth subject,
v_i is the ith student's "verbal intelligence",
m_i is the ith student's "mathematical intelligence", and
l_{k,j} are the factor loadings for the kth subject, for j = 1, 2.

In matrix notation, we have

X = μ + LF + ε,

where X is a 10 × 1,000 matrix of observed scores, μ is a 10 × 1,000 matrix whose columns all equal the vector of subject means, L is the 10 × 2 matrix of factor loadings, F is the 2 × 1,000 matrix of the students' two unobserved intelligences, and ε is the 10 × 1,000 matrix of error terms.
Note that, since any rotation of a solution is also a solution, this makes interpreting
the factors difficult. See disadvantages below. In this particular example, if we do not
know beforehand that the two types of intelligence are uncorrelated, then we cannot
interpret the two factors as the two different types of intelligence. Even if they are
uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which
corresponds to mathematical intelligence without an outside argument.
The values of the loadings L, the averages μ, and the variances of the "errors" ε must
be estimated given the observed data X.
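To make this setup concrete, the following numpy sketch simulates the example with invented loadings, subject means, and error variances, and checks that the sample covariance of the simulated scores approximates L L^T + diag(ψ) (every numeric value here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 1000, 10, 2                     # 1000 students, 10 subjects, 2 factors

# Invented model parameters, for illustration only.
L = rng.uniform(1.0, 10.0, size=(p, k))   # factor loadings per subject
mu = rng.uniform(40.0, 60.0, size=p)      # mean score per subject
psi = rng.uniform(4.0, 9.0, size=p)       # error variance per subject

F = rng.normal(size=(n, k))               # unobserved factor values (mean 0, var 1)
eps = rng.normal(size=(n, p)) * np.sqrt(psi)
X = mu + F @ L.T + eps                    # observed scores, one student per row

# The sample covariance of the scores should approximate L L^T + diag(psi).
Sigma_model = L @ L.T + np.diag(psi)
S = np.cov(X, rowvar=False)
rel_err = np.linalg.norm(S - Sigma_model) / np.linalg.norm(Sigma_model)
```

In practice only X is observed; fitting reverses this simulation, estimating L, μ, and ψ from the sample covariance.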
Charles Spearman spearheaded the use of factor analysis in the field of psychology
and is sometimes credited with the invention of factor analysis. He discovered that
school children's scores on a wide variety of seemingly unrelated subjects were
positively correlated, which led him to postulate that a general mental ability, or g,
underlies and shapes human cognitive performance. His postulate now enjoys broad support in the field of intelligence research, where it is known as g theory.
Advantages
Disadvantages
Information collection
The data collection stage is usually done by marketing research professionals. Survey
questions ask the respondent to rate a product sample or descriptions of product
concepts on a range of attributes. Anywhere from five to twenty attributes are chosen.
They could include things like: ease of use, weight, accuracy, durability,
colourfulness, price, or size. The attributes chosen will vary depending on the product
being studied. The same question is asked about all the products in the study. The data for multiple products are coded and input into a statistical program such as R, SPSS, SAS, Stata, or SYSTAT.
Analysis
The analysis will isolate the underlying factors that explain the data. Factor analysis is
an interdependence technique: the complete set of interdependent relationships is examined, with no specification of dependent variables, independent variables, or causality. Factor analysis assumes that all the rating data on different
attributes can be reduced down to a few important dimensions. This reduction is
possible because the attributes are related. The rating given to any one attribute is
partially the result of the influence of other attributes. The statistical algorithm
deconstructs the rating (called a raw score) into its various components, and
reconstructs the partial scores into underlying factor scores. The degree of correlation
between the initial raw score and the final factor score is called a factor loading.
There are two approaches to factor analysis: "principal component analysis" (the total
variance in the data is considered); and "common factor analysis" (the common
variance is considered).
Note that principal component analysis and common factor analysis differ in terms of
their conceptual underpinnings. The factors produced by principal component analysis
are conceptualized as being linear combinations of the variables whereas the factors
produced by common factor analysis are conceptualized as being latent variables.
Computationally, the only difference is that the diagonal of the relationships matrix is
replaced with communalities (the variance accounted for by more than one variable)
in common factor analysis. This makes the factor scores indeterminate: they differ depending on the method used to compute them, whereas scores produced by principal component analysis do not depend on the method of computation. Although there have been heated debates over the merits of the two
methods, a number of leading statisticians have concluded that in practice there is
little difference (Velicer and Jackson, 1990) which makes sense since the
computations are quite similar despite the differing conceptual bases, especially for
datasets where communalities are high and/or there are many variables, reducing the
influence of the diagonal of the relationship matrix on the final result (Gorsuch,
1983).
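The claim that the two methods converge when communalities are high can be illustrated numerically. The numpy sketch below (using an invented single-factor correlation matrix) extracts first-component loadings from the correlation matrix as-is (principal component analysis) and with its diagonal replaced by communalities (principal-axis common factoring), and shows the results nearly coincide:

```python
import numpy as np

# Invented single-factor population with high communalities.
lam = np.array([0.9, 0.85, 0.9, 0.8, 0.88])   # true factor loadings
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)                       # correlation matrix (diagonal = 1)

def first_loadings(M):
    """Loadings on the first component: sqrt(largest eigenvalue) * eigenvector."""
    w, V = np.linalg.eigh(M)                   # eigenvalues in ascending order
    v = V[:, -1] * np.sqrt(w[-1])
    return v if v.sum() > 0 else -v            # fix the arbitrary sign

pca = first_loadings(R)                        # diagonal carries total variance

Rh = R.copy()
np.fill_diagonal(Rh, lam**2)                   # diagonal replaced by communalities
cfa = first_loadings(Rh)                       # common-factor loadings

# With communalities around 0.7-0.8, the two loading vectors differ
# only slightly; the gap widens as communalities drop.
diff = np.abs(pca - cfa).max()
```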
The use of principal components in a semantic space can vary somewhat, because the components may only "predict" rather than "map" onto the vector space. In such uses, the most salient words or themes serve as the preferred basis.