Factor analysis
Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a
potentially lower number of unobserved variables called factors. For example, it is possible that variations in six
observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for
such joint variations in response to unobserved latent variables. The observed variables are modelled as linear
combinations of the potential factors, plus "error" terms. Factor analysis aims to find independent latent variables.

It is a theory used in machine learning and related to data mining. The theory behind factor analytic methods is that
the information gained about the interdependencies between observed variables can be used later to reduce the set of
variables in a dataset. Factor analysis is commonly used in biology, psychometrics, personality theories, marketing,
product management, operations research, and finance. It may help to deal with data sets where there are large
numbers of observed variables that are thought to reflect a smaller number of underlying/latent variables. It is one of
the most commonly used inter-dependency techniques and is used when the relevant set of variables shows a
systematic inter-dependence and the objective is to find out the latent factors that create a commonality.

Factor analysis is related to principal component analysis (PCA), but the two are not identical.[1] There has been
significant controversy in the field over differences between the two techniques (see section on exploratory factor
analysis versus principal components analysis below). PCA can be considered as a more basic version of exploratory
factor analysis (EFA) that was developed in the early days prior to the advent of high-speed computers. Both PCA and
factor analysis aim to reduce the dimensionality of a set of data, but the approaches taken to do so are different for the
two techniques. Factor analysis is clearly designed with the objective to identify certain unobservable factors from the
observed variables, whereas PCA does not directly address this objective; at best, PCA provides an approximation to
the required factors.[2] From the point of view of exploratory analysis, the eigenvalues of PCA are inflated component
loadings, i.e., contaminated with error variance.[3][4][5][6][7][8]

Contents
Statistical model
Definition
Example
Mathematical model of the same example
Geometric interpretation
Practical implementation
Types of factor analysis
Types of factor extraction
Terminology
Criteria for determining the number of factors
Modern criteria
Older methods
Rotation methods
In psychometrics
History
Applications in psychology
Advantages
Disadvantages
Exploratory factor analysis versus principal components analysis
Arguments contrasting PCA and EFA


Variance versus covariance


Differences in procedure and results
In marketing
Information collection
Analysis
Advantages
Disadvantages
In physical and biological sciences
In microarray analysis
Implementation
See also
References
Further reading
External links

Statistical model

Definition
Suppose we have a set of $p$ observable random variables $x_1, \dots, x_p$ with means $\mu_1, \dots, \mu_p$.

Suppose for some unknown constants $l_{ij}$ and $k$ unobserved random variables $F_j$ (called "common factors," because
they influence all the observed random variables), where $i \in \{1,\dots,p\}$ and $j \in \{1,\dots,k\}$ with $k < p$, we have

$$x_i - \mu_i = l_{i1}F_1 + \cdots + l_{ik}F_k + \varepsilon_i .$$

Here, the $\varepsilon_i$ are unobserved stochastic error terms with zero mean and finite variance, which may not be the
same for all $i$.

In matrix terms, we have

$$x - \mu = LF + \varepsilon .$$

If we have $n$ observations, then we will have the dimensions $x_{p \times n}$, $L_{p \times k}$, and $F_{k \times n}$. Each column of $x$ and $F$ denotes
values for one particular observation, and matrix $L$ does not vary across observations.

Also we will impose the following assumptions on $F$:

1. $F$ and $\varepsilon$ are independent.
2. $E(F) = 0$.
3. $\mathrm{Cov}(F) = I$ (to make sure that the factors are uncorrelated).

Any solution of the above set of equations following the constraints for $F$ is defined as the factors, and $L$ as the
loading matrix.

Suppose $\mathrm{Cov}(x - \mu) = \Sigma$. Then note that from the conditions just imposed on $F$, we have

$$\mathrm{Cov}(x - \mu) = \mathrm{Cov}(LF + \varepsilon),$$

or

$$\Sigma = L\,\mathrm{Cov}(F)\,L^T + \mathrm{Cov}(\varepsilon),$$

or

$$\Sigma = LL^T + \Psi .$$

Note that for any orthogonal matrix $Q$, if we set $L' = LQ$ and $F' = Q^T F$, the criteria for being factors and factor
loadings still hold. Hence a set of factors and factor loadings is unique only up to an orthogonal transformation.
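This non-uniqueness can be checked numerically. The following Python sketch (with invented loading values) builds the implied covariance $\Sigma = LL^T + \Psi$ and verifies that an orthogonal rotation of the loadings leaves $\Sigma$ unchanged:

```python
import numpy as np

# Hypothetical loading matrix L (p = 6 observed variables, k = 2 factors)
# and diagonal error covariance Psi; the numbers are made up for illustration.
L = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.8],
              [0.0, 0.9],
              [0.2, 0.7]])
Psi = np.diag([0.2, 0.3, 0.4, 0.3, 0.2, 0.4])

# Covariance of the observed variables implied by the model: Sigma = L L^T + Psi
Sigma = L @ L.T + Psi

# Rotating the loadings by any orthogonal Q gives the same Sigma,
# so the loadings are identified only up to an orthogonal transformation.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Sigma_rotated = (L @ Q) @ (L @ Q).T + Psi

print(np.allclose(Sigma, Sigma_rotated))  # True
```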

Example
Suppose a psychologist has the hypothesis that there are two kinds of intelligence, "verbal intelligence" and
"mathematical intelligence", neither of which is directly observed. Evidence for the hypothesis is sought in the
examination scores from each of 10 different academic fields of 1000 students. If each student is chosen randomly
from a large population, then each student's 10 scores are random variables. The psychologist's hypothesis may say
that for each of the 10 academic fields, the score averaged over the group of all students who share some common pair
of values for verbal and mathematical "intelligences" is some constant times their level of verbal intelligence plus
another constant times their level of mathematical intelligence, i.e., it is a combination of those two "factors". The
numbers for a particular subject, by which the two kinds of intelligence are multiplied to obtain the expected score, are
posited by the hypothesis to be the same for all intelligence level pairs, and are called "factor loadings" for this
subject. For example, the hypothesis may hold that the average student's aptitude in the field of astronomy is

{10 × the student's verbal intelligence} + {6 × the student's mathematical intelligence}.

The numbers 10 and 6 are the factor loadings associated with astronomy. Other academic subjects may have different
factor loadings.

Two students assumed to have identical degrees of the latent, unmeasured traits of verbal and mathematical
intelligence may have different measured aptitudes in astronomy because individual aptitudes differ from average
aptitudes and because of measurement error itself. Such differences make up what is collectively called the "error" — a
statistical term that means the amount by which an individual, as measured, differs from what is average for or
predicted by his or her levels of intelligence (see errors and residuals in statistics).

The observable data that go into factor analysis would be 10 scores of each of the 1000 students, a total of 10,000
numbers. The factor loadings and levels of the two kinds of intelligence of each student must be inferred from the data.

Mathematical model of the same example


In the following, matrices will be indicated by indexed variables. "Subject" indices will be indicated using letters a, b
and c, with values running from 1 to $p$, which is equal to 10 in the above example. "Factor" indices will be indicated
using letters p, q and r, with values running from 1 to $k$, which is equal to 2 in the above example. "Instance" or
"sample" indices will be indicated using letters i, j and k, with values running from 1 to $N$. In the example above, if a
sample of $N = 1000$ students responded to the $p = 10$ questions, the $i$th student's score for the $a$th question is
given by $x_{ai}$. The purpose of factor analysis is to characterize the correlations between the variables $x_a$ of which the
$x_{ai}$ are a particular instance, or set of observations. In order for the variables to be on equal footing, they are
normalized:

$$z_{ai} = \frac{x_{ai} - \hat{\mu}_a}{\hat{\sigma}_a},$$

where the sample mean is:

$$\hat{\mu}_a = \frac{1}{N}\sum_i x_{ai}$$

and the sample variance is given by:

$$\hat{\sigma}_a^2 = \frac{1}{N}\sum_i \left(x_{ai} - \hat{\mu}_a\right)^2 .$$

The factor analysis model for this particular sample is then:

$$z_{1i} = \ell_{11} F_{1i} + \ell_{12} F_{2i} + \varepsilon_{1i}$$
$$\vdots$$
$$z_{10,i} = \ell_{10,1} F_{1i} + \ell_{10,2} F_{2i} + \varepsilon_{10,i}$$

or, more succinctly:

$$z_{ai} = \sum_{p} \ell_{ap} F_{pi} + \varepsilon_{ai}$$

where

$F_{1i}$ is the $i$th student's "verbal intelligence",
$F_{2i}$ is the $i$th student's "mathematical intelligence",
$\ell_{ap}$ are the factor loadings for the $a$th subject, for $p$ = 1, 2.

In matrix notation, we have

$$Z = LF + \varepsilon .$$

Observe that doubling the scale on which "verbal intelligence" (the first component in each column of $F$) is
measured, and simultaneously halving the factor loadings for verbal intelligence, makes no difference to the model.
Thus, no generality is lost by assuming that the standard deviation of the factors for verbal intelligence is 1. Likewise
for mathematical intelligence. Moreover, for similar reasons, no generality is lost by assuming the two factors are
uncorrelated with each other. In other words:

$$\frac{1}{N}\sum_i F_{pi} F_{qi} = \delta_{pq},$$

where $\delta_{pq}$ is the Kronecker delta (0 when $p \neq q$ and 1 when $p = q$). The errors are assumed to be independent of the
factors:

$$\frac{1}{N}\sum_i F_{pi}\,\varepsilon_{ai} = 0 .$$

Note that, since any rotation of a solution is also a solution, this makes interpreting the factors difficult. See
disadvantages below. In this particular example, if we do not know beforehand that the two types of intelligence are
uncorrelated, then we cannot interpret the two factors as the two different types of intelligence. Even if they are
uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which corresponds to mathematical
intelligence without an outside argument.

The values of the loadings $L$, the averages $\mu$, and the variances of the "errors" $\varepsilon$ must be estimated given the observed
data X and F (the assumption about the levels of the factors is fixed for a given F). The "fundamental theorem" may be
derived from the above conditions:

$$r_{ab} \equiv \frac{1}{N}\sum_i z_{ai} z_{bi} = \sum_p \ell_{ap}\ell_{bp} + \frac{1}{N}\sum_i \varepsilon_{ai}\varepsilon_{bi} .$$

The term on the left is the $(a,b)$ term of the correlation matrix (a $p \times p$ matrix) of the observed data, and its
diagonal elements will be 1's. The last term on the right will be a diagonal matrix with terms less than unity. The first
term on the right is the "reduced correlation matrix" and will be equal to the correlation matrix except for its diagonal
values which will be less than unity. These diagonal elements of the reduced correlation matrix are called
"communalities" (which represent the fraction of the variance in the observed variable that is accounted for by the
factors):

$$\hat{r}_{aa} = h_a^2 = \sum_p \ell_{ap}^2 .$$

The sample data will not, of course, exactly obey the fundamental equation given above due to sampling errors,
inadequacy of the model, etc. The goal of any analysis of the above model is to find the factors $F$ and loadings $L$
which, in some sense, give a "best fit" to the data. In factor analysis, the best fit is defined as the minimum of the mean
square error in the off-diagonal residuals of the correlation matrix:[9]

$$\min_{L}\; \sum_{a \neq b} \left( r_{ab} - \sum_p \ell_{ap}\ell_{bp} \right)^{2} .$$

This is equivalent to minimizing the off-diagonal components of the error covariance which, in the model equations
have expected values of zero. This is to be contrasted with principal component analysis which seeks to minimize the
mean square error of all residuals.[9] Before the advent of high speed computers, considerable effort was devoted to
finding approximate solutions to the problem, particularly in estimating the communalities by other means, which
then simplifies the problem considerably by yielding a known reduced correlation matrix. This was then used to
estimate the factors and the loadings. With the advent of high-speed computers, the minimization problem can be
solved iteratively with adequate speed, and the communalities are calculated in the process, rather than being needed
beforehand. The MinRes algorithm is particularly suited to this problem, but is hardly the only iterative means of
finding a solution.
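As an illustration of such an iterative scheme, the sketch below implements iterated principal axis factoring (not MinRes itself): communalities are estimated, placed on the diagonal of the reduced correlation matrix, and refined from the resulting loadings until they stabilise. The function name and the starting values (squared multiple correlations) are choices made here for illustration.

```python
import numpy as np

def iterated_paf(R, k, n_iter=200, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix R
    with k factors. Returns loadings and communalities."""
    # Initial communality estimates: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)           # reduced correlation matrix
        eigvals, eigvecs = np.linalg.eigh(R_reduced)
        idx = np.argsort(eigvals)[::-1][:k]       # k largest eigenvalues
        lam = np.clip(eigvals[idx], 0.0, None)
        loadings = eigvecs[:, idx] * np.sqrt(lam)
        h2_new = np.sum(loadings**2, axis=1)      # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            return loadings, h2_new
        h2 = h2_new
    return loadings, h2
```

Given a sample correlation matrix, `iterated_paf(R, 2)` would return loadings for a two-factor solution, which is usually followed by a rotation (see below).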

If the solution factors are allowed to be correlated (as in oblimin rotation, for example), then the corresponding
mathematical model uses skew coordinates rather than orthogonal coordinates.

Geometric interpretation
The parameters and variables of factor analysis can be given a geometrical interpretation. The data ($z_{ai}$), the factors
($F_{pi}$) and the errors ($\varepsilon_{ai}$) can be viewed as vectors in an $N$-dimensional Euclidean space (sample space), represented
as $\mathbf{z}_a$, $\mathbf{F}_p$ and $\boldsymbol{\varepsilon}_a$ respectively. Since the data are standardized, the data vectors are of unit length ($|\mathbf{z}_a| = 1$). The
factor vectors define a $k$-dimensional linear subspace (i.e. a hyperplane) in this space, upon which the data vectors
are projected orthogonally. This follows from the model equation

$$\mathbf{z}_a = \sum_p \ell_{ap} \mathbf{F}_p + \boldsymbol{\varepsilon}_a$$

and the independence of the factors and the errors: $\mathbf{F}_p \cdot \boldsymbol{\varepsilon}_a = 0$. In the above example, the hyperplane is just a 2-
dimensional plane defined by the two factor vectors. The projection of the data vectors onto the hyperplane is given by

$$\hat{\mathbf{z}}_a = \sum_p \ell_{ap} \mathbf{F}_p$$

and the errors are vectors from that projected point to the data point and are perpendicular to the hyperplane. The
goal of factor analysis is to find a hyperplane which is a "best fit" to the data in some sense, so it doesn't matter how
the factor vectors which define this hyperplane are chosen, as long as they are independent and lie in the hyperplane.
We are free to specify them as both orthogonal and normal ($\mathbf{F}_p \cdot \mathbf{F}_q = \delta_{pq}$) with no loss of generality. After a suitable
set of factors are found, they may also be arbitrarily rotated within the hyperplane, so that any rotation of the factor
vectors will define the same hyperplane, and also be a solution. As a result, in the above example, in which the fitting
hyperplane is two dimensional, if we do not know beforehand that the two types of intelligence are uncorrelated, then
we cannot interpret the two factors as the two different types of intelligence. Even if they are uncorrelated, we cannot
tell which factor corresponds to verbal intelligence and which corresponds to mathematical intelligence, or whether
the factors are linear combinations of both, without an outside argument.

[Figure: Geometric interpretation of factor analysis parameters for 3 respondents to question "a". The "answer" is
represented by the unit vector $\mathbf{z}_a$, which is projected onto a plane defined by two orthonormal vectors $\mathbf{F}_1$ and $\mathbf{F}_2$.
The projection vector is $\hat{\mathbf{z}}_a$ and the error $\boldsymbol{\varepsilon}_a$ is perpendicular to the plane, so that $\mathbf{z}_a = \hat{\mathbf{z}}_a + \boldsymbol{\varepsilon}_a$. The projection
vector may be represented in terms of the factor vectors as $\hat{\mathbf{z}}_a = \ell_{a1}\mathbf{F}_1 + \ell_{a2}\mathbf{F}_2$. The square of the length of the
projection vector is the communality: $|\hat{\mathbf{z}}_a|^2 = h_a^2$. If another data vector $\mathbf{z}_b$ were plotted, the cosine of the angle
between $\mathbf{z}_a$ and $\mathbf{z}_b$ would be $r_{ab}$: the (a,b) entry in the correlation matrix. (Adapted from Harman Fig. 4.3)[9]]
The data vectors $\mathbf{z}_a$ have unit length. The correlation matrix for the data is given by $r_{ab} = \mathbf{z}_a \cdot \mathbf{z}_b$. The
correlation matrix can be geometrically interpreted as the cosine of the angle between the two data vectors $\mathbf{z}_a$ and $\mathbf{z}_b$.
The diagonal elements will clearly be 1's and the off-diagonal elements will have absolute values less than or equal to
unity. The "reduced correlation matrix" is defined as

$$\hat{r}_{ab} = \hat{\mathbf{z}}_a \cdot \hat{\mathbf{z}}_b .$$

The goal of factor analysis is to choose the fitting hyperplane such that the reduced correlation matrix reproduces the
correlation matrix as nearly as possible, except for the diagonal elements of the correlation matrix which are known to
have unit value. In other words, the goal is to reproduce as accurately as possible the cross-correlations in the data.
Specifically, for the fitting hyperplane, the mean square error in the off-diagonal components

$$\varepsilon^2 = \sum_{a \neq b} \left( r_{ab} - \hat{r}_{ab} \right)^2$$

is to be minimized, and this is accomplished by minimizing it with respect to a set of orthonormal factor vectors. It can
be seen that

$$r_{ab} - \hat{r}_{ab} = \boldsymbol{\varepsilon}_a \cdot \boldsymbol{\varepsilon}_b .$$

The term on the right is just the covariance of the errors. In the model, the error covariance is stated to be a diagonal
matrix and so the above minimization problem will in fact yield a "best fit" to the model: It will yield a sample estimate
of the error covariance which has its off-diagonal components minimized in the mean square sense. It can be seen that
since the $\hat{\mathbf{z}}_a$ are orthogonal projections of the data vectors, their length will be less than or equal to the length of the
projected data vector, which is unity. The square of these lengths are just the diagonal elements of the reduced
correlation matrix. These diagonal elements of the reduced correlation matrix are known as "communalities":

$$h_a^2 = \hat{\mathbf{z}}_a \cdot \hat{\mathbf{z}}_a = \sum_p \ell_{ap}^2 .$$
Large values of the communalities will indicate that the fitting hyperplane is rather accurately reproducing the
correlation matrix. It should be noted that the mean values of the factors must also be constrained to be zero, from
which it follows that the mean values of the errors will also be zero.

Practical implementation

Types of factor analysis


Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and group items that are
part of unified concepts.[10] The researcher makes no a priori assumptions about relationships among factors.[10]

Confirmatory factor analysis (CFA) is a more complex approach that tests the hypothesis that the items are associated
with specific factors.[10] CFA uses structural equation modeling to test a measurement model whereby loading on the
factors allows for evaluation of relationships between observed variables and unobserved variables.[10] Structural
equation modeling approaches can accommodate measurement error, and are less restrictive than least-squares
estimation.[10] Hypothesized models are tested against actual data, and the analysis would demonstrate loadings of
observed variables on the latent variables (factors), as well as the correlation between the latent variables.[10]

Types of factor extraction


Principal component analysis (PCA) is a widely used method for factor extraction, which is the first phase of EFA.[10]
Factor weights are computed to extract the maximum possible variance, with successive factoring continuing until
there is no further meaningful variance left.[10] The factor model must then be rotated for analysis.[10]

Canonical factor analysis, also called Rao's canonical factoring, is a different method of computing the same model as
PCA, which uses the principal axis method. Canonical factor analysis seeks factors which have the highest canonical
correlation with the observed variables. Canonical factor analysis is unaffected by arbitrary rescaling of the data.

Common factor analysis, also called principal factor analysis (PFA) or principal axis factoring (PAF), seeks the least
number of factors which can account for the common variance (correlation) of a set of variables.

Image factoring is based on the correlation matrix of predicted variables rather than actual variables, where each
variable is predicted from the others using multiple regression.

Alpha factoring is based on maximizing the reliability of factors, assuming variables are randomly sampled from a
universe of variables. All other methods assume cases to be sampled and variables fixed.

Factor regression model is a combinatorial model of factor model and regression model; or alternatively, it can be
viewed as the hybrid factor model,[11] whose factors are partially known.

Terminology
Factor loadings: Communality is the square of the standardized outer loading of an item. Analogous to Pearson's r, the
squared factor loading is the percent of variance in that indicator variable explained by the factor. To get the percent of
variance in all the variables accounted for by each factor, add the sum of the squared factor loadings for that factor
(column) and divide by the number of variables. (Note the number of variables equals the sum of their variances as the
variance of a standardized variable is 1.) This is the same as dividing the factor's eigenvalue by the number of
variables.


Interpreting factor loadings: By one rule of thumb in confirmatory factor analysis, loadings should be .7 or higher to
confirm that independent variables identified a priori are represented by a particular factor, on the rationale that the
.7 level corresponds to about half of the variance in the indicator being explained by the factor. However, the .7
standard is a high one and real-life data may well not meet this criterion, which is why some researchers, particularly
for exploratory purposes, will use a lower level such as .4 for the central factor and .25 for other factors. In any event,
factor loadings must be interpreted in the light of theory, not by arbitrary cutoff levels.

In oblique rotation, one may examine both a pattern matrix and a structure matrix. The structure matrix is simply the
factor loading matrix as in orthogonal rotation, representing the variance in a measured variable explained by a factor
on both a unique and common contributions basis. The pattern matrix, in contrast, contains coefficients which just
represent unique contributions. The more factors, the lower the pattern coefficients as a rule since there will be more
common contributions to variance explained. For oblique rotation, the researcher looks at both the structure and
pattern coefficients when attributing a label to a factor. Principles of oblique rotation can be derived from both cross
entropy and its dual entropy.[12]
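Numerically, the two matrices are linked through the factor correlation matrix: the structure matrix is the pattern matrix post-multiplied by the factor correlations. A small sketch with invented values:

```python
import numpy as np

# Hypothetical pattern matrix (unique contributions): 4 variables x 2 factors
pattern = np.array([[0.70, 0.05],
                    [0.65, 0.10],
                    [0.05, 0.75],
                    [0.10, 0.60]])

# Hypothetical factor correlation matrix (oblique rotation lets factors correlate)
phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])

# Structure matrix: variable-factor correlations, mixing unique and shared variance
structure = pattern @ phi
print(structure)
```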

Communality: The sum of the squared factor loadings for all factors for a given variable (row) is the variance in that
variable accounted for by all the factors, and this is called the communality. The communality measures the percent of
variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator
in the context of the factors being posited.

Spurious solutions: If the communality exceeds 1.0, there is a spurious solution, which may reflect too small a sample
or the choice to extract too many or too few factors.

Uniqueness of a variable: The variability of a variable minus its communality.
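The bookkeeping described in the entries above (variance explained per factor, communality and uniqueness per variable) amounts to column and row sums of squared loadings. A sketch with invented loading values:

```python
import numpy as np

# Hypothetical loading matrix: 5 variables (rows) x 2 factors (columns)
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.6, 0.3],
                     [0.1, 0.9],
                     [0.2, 0.7]])
n_vars = loadings.shape[0]

# Variance explained by each factor: column sums of squared loadings
# (the factor's eigenvalue), divided by the number of variables.
eigenvalues = np.sum(loadings**2, axis=0)
pct_variance = eigenvalues / n_vars

# Communality of each variable: row sums of squared loadings;
# uniqueness is the remainder.
communalities = np.sum(loadings**2, axis=1)
uniqueness = 1.0 - communalities

print(pct_variance, communalities, uniqueness)
```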

Eigenvalues/characteristic roots: Eigenvalues measure the amount of variation in the total sample accounted for by
each factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables.
If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be
ignored as less important than the factors with higher eigenvalues.

Extraction sums of squared loadings: Initial eigenvalues and eigenvalues after extraction (listed by SPSS as
"Extraction Sums of Squared Loadings") are the same for PCA extraction, but for other extraction methods,
eigenvalues after extraction will be lower than their initial counterparts. SPSS also prints "Rotation Sums of Squared
Loadings" and even for PCA, these eigenvalues will differ from initial and extraction eigenvalues, though their total
will be the same.

Factor scores (also called component scores in PCA): are the scores of each case (row) on each factor (column). To
compute the factor score for a given case for a given factor, one takes the case's standardized score on each variable,
multiplies by the corresponding loadings of the variable for the given factor, and sums these products. Computing
factor scores allows one to look for factor outliers. Also, factor scores may be used as variables in subsequent
modeling. (This is explained from a PCA perspective rather than from a factor analysis perspective.)
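The sum-of-products rule just described can be sketched as follows (a simple weighted-sum scoring scheme; statistical packages often use regression-based estimates instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw data: 100 cases (rows) x 5 variables (columns)
X = rng.normal(size=(100, 5))

# Standardize each variable to zero mean and unit variance
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical loading matrix: 5 variables x 2 factors
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.6, 0.3],
                     [0.1, 0.9],
                     [0.2, 0.7]])

# Factor score of each case on each factor: the case's standardized scores
# weighted by the loadings and summed, as described above.
factor_scores = Z @ loadings

# Cases with unusually large scores may be flagged as factor outliers.
outlier_cases = np.unique(np.where(np.abs(factor_scores) > 3)[0])
```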

Criteria for determining the number of factors


Researchers wish to avoid such subjective or arbitrary criteria for factor retention as "it made sense to me". A number
of objective methods have been developed to solve this problem, allowing users to determine an appropriate range of
solutions to investigate. Methods may not agree. For instance, the parallel analysis may suggest 5 factors while
Velicer's MAP suggests 6, so the researcher may request both 5 and 6-factor solutions and discuss each in terms of
their relation to external data and theory.

Modern criteria


Horn's parallel analysis (PA): A Monte-Carlo based simulation method that compares the observed eigenvalues with
those obtained from uncorrelated normal variables. A factor or component is retained if the associated eigenvalue is
bigger than the 95th percentile of the distribution of eigenvalues derived from the random data. PA is one of the most
recommended rules for determining the number of components to retain, but many programs fail to include this
option (a notable exception being R).[13] However, Formann provided both theoretical and empirical evidence that its
application might not be appropriate in many cases since its performance is considerably influenced by sample size,
item discrimination, and type of correlation coefficient.[14]
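A minimal Monte Carlo sketch of parallel analysis (function and parameter names are chosen here for illustration; the published implementations cited above offer more options):

```python
import numpy as np

def parallel_analysis(X, n_sim=500, percentile=95, seed=0):
    """Horn's parallel analysis for a data matrix X (cases x variables).

    Retains components whose observed eigenvalues exceed the chosen
    percentile of eigenvalues from uncorrelated normal data of the same size.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    sim = np.empty((n_sim, p))
    for s in range(n_sim):
        R = np.corrcoef(rng.normal(size=(n, p)), rowvar=False)
        sim[s] = np.sort(np.linalg.eigvalsh(R))[::-1]
    threshold = np.percentile(sim, percentile, axis=0)

    # Retain components sequentially until one fails the comparison.
    n_retain = 0
    for observed, simulated in zip(obs, threshold):
        if observed > simulated:
            n_retain += 1
        else:
            break
    return n_retain, obs, threshold
```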

Velicer's (1976) MAP test[15] “involves a complete principal components analysis followed by the examination of a
series of matrices of partial correlations” (p. 397). The squared correlation for Step “0” (see Figure 4) is the average
squared off-diagonal correlation for the unpartialed correlation matrix. On Step 1, the first principal component and
its associated items are partialed out. Thereafter, the average squared off-diagonal correlation for the subsequent
correlation matrix is then computed for Step 1. On Step 2, the first two principal components are partialed out and the
resultant average squared off-diagonal correlation is again computed. The computations are carried out for k minus
one step (k representing the total number of variables in the matrix). Thereafter, all of the average squared
correlations for each step are lined up and the step number in the analyses that resulted in the lowest average squared
partial correlation determines the number of components or factors to retain.[15] By this method, components are
maintained as long as the variance in the correlation matrix represents systematic variance, as opposed to residual or
error variance. Although methodologically akin to principal components analysis, the MAP technique has been shown
to perform quite well in determining the number of factors to retain in multiple simulation studies.[16][17][18] This
procedure is made available through SPSS's user interface. See Courtney (2013)[19] for guidance.
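A rough sketch of the MAP computation (partialling successive principal components out of the correlation matrix and tracking the average squared off-diagonal partial correlation; it assumes a full-rank correlation matrix):

```python
import numpy as np

def velicer_map(R):
    """Velicer's minimum average partial (MAP) test on a correlation matrix R.

    Returns the number of components at which the average squared
    off-diagonal partial correlation is smallest, plus the step values.
    """
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])  # PCA loadings

    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off_diag] ** 2)]         # Step 0: unpartialed matrix
    for m in range(1, p):                        # Steps 1 .. p-1
        residual = R - loadings[:, :m] @ loadings[:, :m].T
        d = np.sqrt(np.diag(residual))
        partial = residual / np.outer(d, d)      # partial correlations
        avg_sq.append(np.mean(partial[off_diag] ** 2))
    return int(np.argmin(avg_sq)), avg_sq
```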

Older methods
Kaiser criterion: The Kaiser rule is to drop all components with eigenvalues under 1.0 – this being the eigenvalue equal
to the information accounted for by an average single item. The Kaiser criterion is the default in SPSS and most
statistical software but is not recommended when used as the sole cut-off criterion for estimating the number of
factors as it tends to over-extract factors.[20] A variation of this method has been created where a researcher calculates
confidence intervals for each eigenvalue and retains only factors which have the entire confidence interval greater than
1.0.[16][21]

Scree plot:[22] The Cattell scree test plots the components as the X-axis and the corresponding eigenvalues as the Y-
axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve
makes an elbow toward less steep decline, Cattell's scree test says to drop all further components after the one starting
at the elbow. This rule is sometimes criticised for being amenable to researcher-controlled "fudging". That is, as
picking the "elbow" can be subjective because the curve has multiple elbows or is a smooth curve, the researcher may
be tempted to set the cut-off at the number of factors desired by their research agenda.

Variance explained criteria: Some researchers simply use the rule of keeping enough factors to account for 90%
(sometimes 80%) of the variation. Where the researcher's goal emphasizes parsimony (explaining variance with as few
factors as possible), the criterion could be as low as 50%.
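Both of these older rules reduce to simple operations on the eigenvalues of the correlation matrix; a short sketch (the 80% target is one of the conventions mentioned above):

```python
import numpy as np

def kaiser_and_variance_rules(R, target=0.80):
    """Number of factors suggested by the Kaiser rule and by a
    cumulative variance-explained cut-off, for a correlation matrix R."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

    # Kaiser criterion: keep components with eigenvalue greater than 1
    n_kaiser = int(np.sum(eigvals > 1.0))

    # Variance-explained criterion: smallest number of components whose
    # cumulative eigenvalues reach the target share of total variance
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    n_variance = int(np.searchsorted(cum_share, target) + 1)

    return n_kaiser, n_variance
```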

Rotation methods
The unrotated output maximizes variance accounted for by the first and subsequent factors, and forces the factors to
be orthogonal. This data-compression comes at the cost of having most items load on the early factors, and usually, of
having many items load substantially on more than one factor. Rotation serves to make the output more
understandable, by seeking so-called "Simple Structure": A pattern of loadings where each item loads strongly on only
one of the factors, and much more weakly on the other factors. Rotations can be orthogonal or oblique (allowing the
factors to correlate).


Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of the squared loadings of a
factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original
variables by extracted factor. Each factor will tend to have either large or small loadings of any particular variable. A
varimax solution yields results which make it as easy as possible to identify each variable with a single factor. This is
the most common rotation option. However, the orthogonality (i.e., independence) of factors is often an unrealistic
assumption. Oblique rotations are inclusive of orthogonal rotation, and for that reason, oblique rotations are a
preferred method. Allowing for factors that are correlated with one another is especially applicable in psychometric
research, since attitudes, opinions, and intellectual abilities tend to be correlated, and since it would be unrealistic in
many situations to assume otherwise.[23]
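Varimax can be sketched with the widely used iterative algorithm below, in which each step rotates the loadings by an orthogonal matrix obtained from a singular value decomposition; this is a generic sketch, not any particular package's implementation:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = loadings @ R
        # SVD step for the varimax criterion
        u, s, vt = np.linalg.svd(
            loadings.T @ (Lam**3 - (gamma / p) * Lam @ np.diag(np.sum(Lam**2, axis=0)))
        )
        R = u @ vt
        d_new = np.sum(s)
        if d_new < d * (1.0 + tol):
            break
        d = d_new
    return loadings @ R
```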

Quartimax rotation is an orthogonal alternative which minimizes the number of factors needed to explain each
variable. This type of rotation often generates a general factor on which most variables are loaded to a high or medium
degree. Such a factor structure is usually not helpful to the research purpose.

Equimax rotation is a compromise between varimax and quartimax criteria.

Direct oblimin rotation is the standard method when one wishes a non-orthogonal (oblique) solution – that is, one in
which the factors are allowed to be correlated. This will result in higher eigenvalues but diminished interpretability of
the factors. See below.

Promax rotation is an alternative non-orthogonal (oblique) rotation method which is computationally faster than the
direct oblimin method and therefore is sometimes used for very large datasets.

In psychometrics

History
Charles Spearman pioneered the use of factor analysis in the field of psychology and is sometimes credited with the
invention of factor analysis. He discovered that school children's scores on a wide variety of seemingly unrelated
subjects were positively correlated, which led him to postulate that a general mental ability, or g, underlies and shapes
human cognitive performance. His postulate now enjoys broad support in the field of intelligence research, where it is
known as the g theory.

Raymond Cattell expanded on Spearman's idea of a two-factor theory of intelligence after performing his own tests
and factor analysis. He used a multi-factor theory to explain intelligence. Cattell's theory addressed alternative factors
in intellectual development, including motivation and psychology. Cattell also developed several mathematical
methods for adjusting psychometric graphs, such as his "scree" test and similarity coefficients. His research led to the
development of his theory of fluid and crystallized intelligence, as well as his 16 Personality Factors theory of
personality. Cattell was a strong advocate of factor analysis and psychometrics. He believed that all theory should be
derived from research, which supports the continued use of empirical observation and objective testing to study
human intelligence.

Applications in psychology
Factor analysis is used to identify "factors" that explain a variety of results on different tests. For example, intelligence
research found that people who get a high score on a test of verbal ability are also good on other tests that require
verbal abilities. Researchers explained this by using factor analysis to isolate one factor, often called crystallized
intelligence or verbal intelligence, which represents the degree to which someone is able to solve problems involving
verbal skills.


Factor analysis in psychology is most often associated with intelligence research. However, it also has been used to
find factors in a broad range of domains such as personality, attitudes, beliefs, etc. It is linked to psychometrics, as it
can assess the validity of an instrument by finding if the instrument indeed measures the postulated factors.

Advantages
Reduction of number of variables, by combining two or more variables into a single factor. For example,
performance at running, ball throwing, batting, jumping and weight lifting could be combined into a single factor
such as general athletic ability. Usually, in an item by people matrix, factors are selected by grouping related
items. In the Q factor analysis technique, the matrix is transposed and factors are created by grouping related
people: For example, liberals, libertarians, conservatives and socialists, could form separate groups.
Identification of groups of inter-related variables, to see how they are related to each other. For example, Carroll
used factor analysis to build his Three Stratum Theory. He found that a factor called "broad visual perception"
relates to how good an individual is at visual tasks. He also found a "broad auditory perception" factor, relating to
auditory task capability. Furthermore, he found a global factor, called "g" or general intelligence, that relates to
both "broad visual perception" and "broad auditory perception". This means someone with a high "g" is likely to
have both a high "visual perception" capability and a high "auditory perception" capability, and that "g" therefore
explains a good part of why someone is good or bad in both of those domains.

Disadvantages
"...each orientation is equally acceptable mathematically. But different factorial theories proved to differ as much in
terms of the orientations of factorial axes for a given solution as in terms of anything else, so that model fitting did
not prove to be useful in distinguishing among theories." (Sternberg, 1977[24]). This means all rotations represent
different underlying processes, but all rotations are equally valid outcomes of standard factor analysis
optimization. Therefore, it is impossible to pick the proper rotation using factor analysis alone.
Factor analysis can be only as good as the data allows. In psychology, where researchers often have to rely on
less valid and reliable measures such as self-reports, this can be problematic.
Interpreting factor analysis is based on using a "heuristic", which is a solution that is "convenient even if not
absolutely true".[25] More than one interpretation can be made of the same data factored the same way, and factor
analysis cannot identify causality.

Exploratory factor analysis versus principal components analysis
While exploratory factor analysis and principal component analysis are treated as synonymous techniques in some
fields of statistics, this has been criticised (e.g. Fabrigar et al., 1999;[26] Suhr, 2009[27]). In factor analysis, the
researcher makes the assumption that an underlying causal model exists, whereas PCA is simply a variable reduction
technique.[28] Researchers have argued that the distinctions between the two techniques may mean that there are
objective benefits for preferring one over the other based on the analytic goal. If the factor model is incorrectly
formulated or the assumptions are not met, then factor analysis will give erroneous results. Factor analysis has been
used successfully where adequate understanding of the system permits good initial model formulations. Principal
component analysis employs a mathematical transformation to the original data with no assumptions about the form
of the covariance matrix. The aim of PCA is to determine a few linear combinations of the original variables that can be
used to summarize the data set without losing much information.[29]

Arguments contrasting PCA and EFA


Fabrigar et al. (1999)[26] address a number of reasons used to suggest that principal components analysis is not
equivalent to factor analysis:

1. It is sometimes suggested that principal components analysis is computationally quicker and requires fewer
resources than factor analysis. Fabrigar et al. suggest that the ready availability of computer resources have
rendered this practical concern irrelevant.


2. PCA and factor analysis can produce similar results. This point is also addressed by Fabrigar et al.; in certain
cases, whereby the communalities are low (e.g., .40), the two techniques produce divergent results. In fact,
Fabrigar et al. argue that in cases where the data correspond to assumptions of the common factor model, the
results of PCA are inaccurate.
3. There are certain cases where factor analysis leads to 'Heywood cases'. These encompass situations whereby
100% or more of the variance in a measured variable is estimated to be accounted for by the model. Fabrigar et
al. suggest that these cases are actually informative to the researcher, indicating a misspecified model or a
violation of the common factor model. The lack of Heywood cases in the PCA approach may mean that such
issues pass unnoticed.
4. Researchers gain extra information from a PCA approach, such as an individual's score on a certain component –
such information is not yielded from factor analysis. However, as Fabrigar et al. contend, the typical aim of factor
analysis – i.e. to determine the factors accounting for the structure of the correlations between measured
variables – does not require knowledge of factor scores and thus this advantage is negated. It is also possible to
compute factor scores from a factor analysis.

Variance versus covariance


Factor analysis takes into account the random error that is inherent in measurement, whereas PCA fails to do so. This
point is exemplified by Brown (2009),[30] who indicated that, in respect to the correlation matrices involved in the
calculations:

"In PCA, 1.00s are put in the diagonal meaning that all of the variance in the matrix is to be accounted
for (including variance unique to each variable, variance common among variables, and error variance).
That would, therefore, by definition, include all of the variance in the variables. In contrast, in EFA, the
communalities are put in the diagonal meaning that only the variance shared with other variables is to
be accounted for (excluding variance unique to each variable and error variance). That would, therefore,
by definition, include only variance that is common among the variables."

— Brown (2009), Principal components analysis and exploratory factor analysis – Definitions, differences and choices

For this reason, Brown (2009) recommends using factor analysis when theoretical ideas about relationships between
variables exist, whereas PCA should be used if the goal of the researcher is to explore patterns in their data.

Differences in procedure and results


The differences between principal components analysis and factor analysis are further illustrated by Suhr (2009):[27]

PCA results in principal components that account for a maximal amount of variance for observed variables; FA
accounts for common variance in the data.
PCA inserts ones on the diagonals of the correlation matrix; FA adjusts the diagonals of the correlation matrix with
the unique factors.
PCA minimizes the sum of squared perpendicular distance to the component axis; FA estimates factors which
influence responses on observed variables.
The component scores in PCA represent a linear combination of the observed variables weighted by
eigenvectors; the observed variables in FA are linear combinations of the underlying and unique factors.
In PCA, the components yielded are uninterpretable, i.e. they do not represent underlying ‘constructs’; in FA, the
underlying constructs can be labeled and readily interpreted, given an accurate model specification.

In marketing
The basic steps are:

Identify the salient attributes consumers use to evaluate products in this category.
Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential
customers concerning their ratings of all the product attributes.


Input the data into a statistical program and run the factor analysis procedure. The computer will yield a set of
underlying attributes (or factors).
Use these factors to construct perceptual maps and other product positioning devices.

Information collection
The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to
rate a product sample or descriptions of product concepts on a range of attributes. Anywhere from five to twenty
attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or
size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the
products in the study. The data for multiple products is coded and input into a statistical program such as R, SPSS,
SAS, Stata, STATISTICA, JMP, and SYSTAT.

Analysis
The analysis will isolate the underlying factors that explain the data using a matrix of associations.[31] Factor analysis
is an interdependence technique. The complete set of interdependent relationships is examined. There is no
specification of dependent variables, independent variables, or causality. Factor analysis assumes that all the rating
data on different attributes can be reduced down to a few important dimensions. This reduction is possible because
some attributes may be related to each other. The rating given to any one attribute is partially the result of the
influence of other attributes. The statistical algorithm deconstructs the rating (called a raw score) into its various
components, and reconstructs the partial scores into underlying factor scores. The degree of correlation between the
initial raw score and the final factor score is called a factor loading.

Advantages
Both objective and subjective attributes can be used provided the subjective attributes can be converted into
scores.
Factor analysis can identify latent dimensions or constructs that direct analysis may not.
It is easy and inexpensive.

Disadvantages
Usefulness depends on the researchers' ability to collect a sufficient set of product attributes. If important
attributes are excluded or neglected, the value of the procedure is reduced.
If sets of observed variables are highly similar to each other and distinct from other items, factor analysis will
assign a single factor to them. This may obscure factors that represent more interesting relationships.
Naming factors may require knowledge of theory because seemingly dissimilar attributes can correlate strongly
for unknown reasons.

In physical and biological sciences


Factor analysis has also been widely used in physical sciences such as geochemistry, hydrochemistry,[32] astrophysics
and cosmology, as well as biological sciences, such as ecology, molecular biology and biochemistry.

In groundwater quality management, it is important to relate the spatial distribution of different chemical parameters
to different possible sources, which have different chemical signatures. For example, a sulfide mine is likely to be
associated with high levels of acidity, dissolved sulfates and transition metals. These signatures can be identified as
factors through R-mode factor analysis, and the location of possible sources can be suggested by contouring the factor
scores.[33]

In geochemistry, different factors can correspond to different mineral associations, and thus to mineralisation.[34]


In microarray analysis
Factor analysis can be used for summarizing high-density oligonucleotide DNA microarray data at probe level for
Affymetrix GeneChips. In this case, the latent variable corresponds to the RNA concentration in a sample.[35]

Implementation
Factor analysis has been implemented in several statistical analysis programs since the 1980s:

BMDP
JMP (statistical software)
Python: module Scikit-learn[36] (a brief usage sketch follows this list)
R (with the base function factanal or fa function in package psych). Rotations are implemented in the GPArotation
R package.
SAS (using PROC FACTOR or PROC CALIS)
SPSS[37]
Stata
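
As an illustration of the scikit-learn interface listed above (the data and the number of factors below are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical data: 500 cases of 6 observed variables driven by 2 factors
F = rng.normal(size=(500, 2))
L = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
              [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
X = F @ L.T + 0.5 * rng.normal(size=(500, 6))

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(X)      # factor scores, one row per case
print(fa.components_)             # estimated loadings (factors x variables)
print(fa.noise_variance_)         # estimated unique (error) variances
```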

See also
Design of experiments
Formal concept analysis
Higher-order factor analysis
Independent component analysis
Non-negative matrix factorization
Q methodology
Recommendation system
Root cause analysis

References
1. Bartholomew, D.J.; Steele, F.; Galbraith, J.; Moustaki, I. (2008). Analysis of Multivariate Social Science Data.
Statistics in the Social and Behavioral Sciences Series (2nd ed.). Taylor & Francis. ISBN 978-1584889601.
2. Jolliffe I.T. Principal Component Analysis, Series: Springer Series in Statistics, 2nd ed., Springer, NY, 2002, XXIX,
487 p. 28 illus. ISBN 978-0-387-95442-4
3. Cattell, R. B. (1952). Factor analysis. New York: Harper.
4. Fruchter, B. (1954). Introduction to Factor Analysis. Van Nostrand.
5. Cattell, R. B. (1978). Use of Factor Analysis in Behavioral and Life Sciences. New York: Plenum.
6. Child, D. (2006). The Essentials of Factor Analysis, 3rd edition. Bloomsbury Academic Press.
7. Gorsuch, R. L. (1983). Factor Analysis, 2nd edition. Hillsdale, NJ: Erlbaum.
8. McDonald, R. P. (1985). Factor Analysis and Related Methods. Hillsdale, NJ: Erlbaum.
9. Harman, Harry H. (1976). Modern Factor Analysis. University of Chicago Press. pp. 175, 176. ISBN 978-0-226-
31652-9.
10. Polit DF Beck CT (2012). Nursing Research: Generating and Assessing Evidence for Nursing Practice, 9th ed.
Philadelphia, USA: Wolters Klower Health, Lippincott Williams & Wilkins.
11. Meng, J. (2011). "Uncover cooperative gene regulations by microRNAs and transcription factors in glioblastoma
using a nonnegative hybrid factor model" (https://web.archive.org/web/20111123144133/http://www.cmsworldwid
e.com/ICASSP2011/Papers/ViewPapers.asp?PaperNum=4439). International Conference on Acoustics, Speech
and Signal Processing. Archived from the original (http://www.cmsworldwide.com/ICASSP2011/Papers/ViewPape
rs.asp?PaperNum=4439) on 2011-11-23.
12. Liou, C.-Y.; Musicus, B.R. (2008). "Cross Entropy Approximation of Structured Gaussian Covariance Matrices".
IEEE Transactions on Signal Processing. 56 (7): 3362–3367. doi:10.1109/TSP.2008.917878 (https://doi.org/10.11
09%2FTSP.2008.917878).
13. Ledesma, R.D.; Valero-Mora, P. (2007). "Determining the Number of Factors to Retain in EFA: An easy-to-use
computer program for carrying out Parallel Analysis" (http://pareonline.net/getvn.asp?v=12&n=2). Practical
Assessment Research & Evaluation. 12 (2): 1–11.


14. Tran, U. S., & Formann, A. K. (2009). Performance of parallel analysis in retrieving unidimensionality in the
presence of binary data. Educational and Psychological Measurement, 69, 50-61.
15. Velicer, W.F. (1976). "Determining the number of components from the matrix of partial correlations".
Psychometrika. 41 (3): 321–327. doi:10.1007/bf02293557 (https://doi.org/10.1007%2Fbf02293557).
16. Warne, R. T.; Larsen, R. (2014). "Evaluating a proposed modification of the Guttman rule for determining the
number of factors in an exploratory factor analysis". Psychological Test and Assessment Modeling. 56: 104–123.
17. Ruscio, John; Roche, B. (2012). "Determining the number of factors to retain in an exploratory factor analysis
using comparison data of known factorial structure". Psychological Assessment. 24 (2): 282–292.
doi:10.1037/a0025697 (https://doi.org/10.1037%2Fa0025697). PMID 21966933 (https://www.ncbi.nlm.nih.gov/pub
med/21966933).
18. Garrido, L. E., & Abad, F. J., & Ponsoda, V. (2012). A new look at Horn's parallel analysis with ordinal variables.
Psychological Methods. Advance online publication. doi:10.1037/a0030005
19. Courtney, M. G. R. (2013). Determining the number of factors to retain in EFA: Using the SPSS R-Menu v2.0 to
make more judicious estimations. Practical Assessment, Research and Evaluation, 18(8). Available online:
http://pareonline.net/getvn.asp?v=18&n=8
20. Bandalos, D.L.; Boehm-Kaufman, M.R. (2008). "Four common misconceptions in exploratory factor analysis" (http
s://books.google.com/books?id=KFAnkvqD8CgC&pg=PA61). In Lance, Charles E.; Vandenberg, Robert J.
Statistical and Methodological Myths and Urban Legends: Doctrine, Verity and Fable in the Organizational and
Social Sciences. Taylor & Francis. pp. 61–87. ISBN 978-0-8058-6237-9.
21. Larsen, R.; Warne, R. T. (2010). "Estimating confidence intervals for eigenvalues in exploratory factor analysis".
Behavior Research Methods. 42 (3): 871–876. doi:10.3758/BRM.42.3.871 (https://doi.org/10.3758%2FBRM.42.3.
871). PMID 20805609 (https://www.ncbi.nlm.nih.gov/pubmed/20805609).
22. Cattell, Raymond (1966). "The scree test for the number of factors". Multivariate Behavioral Research. 1 (2): 245–
76. doi:10.1207/s15327906mbr0102_10 (https://doi.org/10.1207%2Fs15327906mbr0102_10). PMID 26828106 (h
ttps://www.ncbi.nlm.nih.gov/pubmed/26828106).
23. Russell, D.W. (December 2002). "In search of underlying dimensions: The use (and abuse) of factor analysis in
Personality and Social Psychology Bulletin" (http://psp.sagepub.com/content/28/12/1629.short). Personality and
Social Psychology Bulletin. 28 (12): 1629–46. doi:10.1177/014616702237645 (https://doi.org/10.1177%2F014616
702237645).
24. Sternberg, R. J. (1977). Metaphors of Mind: Conceptions of the Nature of Intelligence. New York: Cambridge
University Press. pp. 85–111.
25. "Factor Analysis" (https://web.archive.org/web/20040818062948/http://comp9.psych.cornell.edu/Darlington/factor.
htm). Archived from the original (http://comp9.psych.cornell.edu/Darlington/factor.htm) on August 18, 2004.
Retrieved July 22, 2004.
26. Fabrigar; et al. (1999). "Evaluating the use of exploratory factor analysis in psychological research" (http://www.st
atpower.net/Content/312/Handout/Fabrigar1999.pdf) (PDF). Psychological Methods.
27. Suhr, Diane (2009). "Principal component analysis vs. exploratory factor analysis" (http://www2.sas.com/proceedi
ngs/sugi30/203-30.pdf) (PDF). SUGI 30 Proceedings. Retrieved 5 April 2012.
28. SAS Statistics. "Principal Components Analysis" (http://support.sas.com/publishing/pubcat/chaps/55129.pdf)
(PDF). SAS Support Textbook.
29. Meglen, R.R. (1991). "Examining Large Databases: A Chemometric Approach Using Principal Component
Analysis". Journal of Chemometrics. 5 (3): 163–179. doi:10.1002/cem.1180050305 (https://doi.org/10.1002%2Fce
m.1180050305).
30. Brown, J. D. (January 2009). "Principal components analysis and exploratory factor analysis – Definitions,
differences and choices" (http://jalt.org/test/PDF/Brown29.pdf) (PDF). Shiken: JALT Testing & Evaluation SIG
Newsletter. Retrieved 16 April 2012.
31. Ritter, N. (2012). A comparison of distribution-free and non-distribution free methods in factor analysis. Paper
presented at Southwestern Educational Research Association (SERA) Conference 2012, New Orleans, LA
(ED529153).
32. Subbarao, C.; Subbarao, N.V.; Chandu, S.N. (December 1996). "Characterisation of groundwater contamination
using factor analysis". Environmental Geology. 28 (4): 175–180. doi:10.1007/s002540050091 (https://doi.org/10.1
007%2Fs002540050091).


33. Love, D.; Hallbauer, D.K.; Amos, A.; Hranova, R.K. (2004). "Factor analysis as a tool in groundwater quality
management: two southern African case studies". Physics and Chemistry of the Earth. 29 (15–18): 1135–43.
doi:10.1016/j.pce.2004.09.027 (https://doi.org/10.1016%2Fj.pce.2004.09.027).
34. Barton, E.S.; Hallbauer, D.K. (1996). "Trace-element and U—Pb isotope compositions of pyrite types in the
Proterozoic Black Reef, Transvaal Sequence, South Africa: Implications on genesis and age". Chemical Geology.
133 (1–4): 173–199. doi:10.1016/S0009-2541(96)00075-7 (https://doi.org/10.1016%2FS0009-2541%2896%2900
075-7).
35. Hochreiter, Sepp; Clevert, Djork-Arné; Obermayer, Klaus (2006). "A new summarization method for affymetrix
probe level data" (http://bioinformatics.oxfordjournals.org/content/22/8/943.full). Bioinformatics. 22 (8): 943–9.
doi:10.1093/bioinformatics/btl033 (https://doi.org/10.1093%2Fbioinformatics%2Fbtl033). PMID 16473874 (https://
www.ncbi.nlm.nih.gov/pubmed/16473874).
36. http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html
37. MacCallum, Robert (June 1983). "A comparison of factor analysis programs in SPSS, BMDP, and SAS".
Psychometrika. 48 (2): 223–231. doi:10.1007/BF02294017 (https://doi.org/10.1007%2FBF02294017).

Further reading
Child, Dennis (2006), The Essentials of Factor Analysis (https://books.google.com/books?id=rQ2vdJgohH0C) (3rd
ed.), Continuum International, ISBN 978-0-8264-8000-2.
Fabrigar, L.R.; Wegener, D.T.; MacCallum, R.C.; Strahan, E.J. (September 1999). "Evaluating the use of
exploratory factor analysis in psychological research" (http://psycnet.apa.org/journals/met/4/3/272/). Psychological
Methods. 4 (3): 272–299. doi:10.1037/1082-989X.4.3.272 (https://doi.org/10.1037%2F1082-989X.4.3.272).
Jennrich, Robert I., "Rotation to Simple Loadings Using Component Loss Function: The Oblique Case,"
Psychometrika, Vol. 71, No. 1, pp. 173–191, March 2006.
Katz, Jeffrey Owen, and Rohlf, F. James. Primary product functionplane: An oblique rotation to simple structure.
Multivariate Behavioral Research, April 1975, Vol. 10, pp. 219–232.
Katz, Jeffrey Owen, and Rohlf, F. James. Functionplane: A new approach to simple structure rotation.
Psychometrika, March 1974, Vol. 39, No. 1, pp. 37–51.
Katz, Jeffrey Owen, and Rohlf, F. James. Function-point cluster analysis. Systematic Zoology, September 1973,
Vol. 22, No. 3, pp. 295–301.
Mulaik, S. A. (2010), Foundations of Factor Analysis, Chapman & Hall.
Preacher, K.J.; MacCallum, R.C. (2003). "Repairing Tom Swift's Electric Factor Analysis Machine". Understanding
Statistics. 2 (1): 13–43. doi:10.1207/S15328031US0201_02 (https://doi.org/10.1207%2FS15328031US0201_02).
Thompson, B. (2004), Exploratory and Confirmatory Factor Analysis: Understanding concepts and applications,
Washington DC: American Psychological Association, ISBN 978-1591470939.

External links
A Beginner's Guide to Factor Analysis (http://www.tqmp.org/RegularArticles/vol09-2/p079/p079.pdf)
Exploratory Factor Analysis. A Book Manuscript by Tucker, L. & MacCallum R. (1993). Retrieved June 8, 2006,
from: http://www.unc.edu/~rcm/book/factornew.htm
Garson, G. David, "Factor Analysis," from Statnotes: Topics in Multivariate Analysis. Retrieved on April 13, 2009
from https://archive.is/20121214194305/http://www2.chass.ncsu.edu/garson/pa765/statnote.htm
Factor Analysis at 100 (http://www.fa100.info/index.html) — conference material
FARMS — Factor Analysis for Robust Microarray Summarization, an R package (http://www.bioinf.jku.at/software/
farms/farms.html)
