
Source: http://www2.chass.ncsu.edu/garson/pa765/factor.htm
Factor Analysis
Overview
Factor analysis is used to uncover the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors and as such is a "non-dependent" procedure (that is, it does not assume a dependent variable is specified). Factor analysis could be used for any of the following purposes:
To reduce a large number of variables to a smaller number of factors for modeling purposes, where the large number of variables precludes modeling all the measures individually. As such, factor analysis is integrated in structural equation modeling (SEM), helping create the latent variables modeled by SEM. However, factor analysis can be and is often used on a stand-alone basis for similar purposes.
To select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal component factors.
To create a set of factors to be treated as uncorrelated variables as one approach to handling multicollinearity in such procedures as multiple regression.
To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor.
To establish that multiple tests measure the same factor, thereby giving justification for administering fewer tests.
To identify clusters of cases and/or outliers.
To determine network groups by determining which sets of people cluster together (using Q-mode factor analysis, discussed below).
A non-technical analogy: A mother sees various bumps and shapes under a blanket at the bottom of a bed. When one shape moves toward the top of the bed, all the other bumps and shapes move toward the top also, so the mother concludes that what is under the blanket is a single thing, most likely her child. Similarly, factor analysis takes as input a number of measures and tests, analogous to the bumps and shapes. Those that move together are considered a single thing, which it labels a factor. That is, in factor analysis the researcher is assuming that there is a "child" out there in the form of an underlying factor, and he or she takes simultaneous movement (correlation) as evidence of its existence. If correlation is spurious for some reason, this inference will be mistaken, of course, so it is important when conducting factor analysis that possible variables which might introduce spuriousness, such as anteceding causes, be included in the analysis and taken into account.
Factor analysis is part of the multiple general linear hypothesis (MGLH) family of procedures and makes many of the same assumptions as multiple regression: linear relationships, interval or near-interval data, untruncated variables, proper specification (relevant variables included, extraneous ones excluded), lack of high multicollinearity, and multivariate normality for purposes of significance testing. Factor analysis generates a table in which the rows are the observed raw indicator variables and the columns are the factors or latent variables which explain as much of the variance in these variables as possible. The cells in this table are factor loadings, and the meaning of the factors must be induced from seeing which variables are most heavily loaded on which factors. This inferential labeling process can be fraught with subjectivity as diverse researchers impute different labels.
There are several different types of factor analysis, with the most common being principal components analysis (PCA). However, principal axis factoring (PAF), also called common factor analysis, is preferred for purposes of confirmatory factor analysis in structural equation modeling.
Key Concepts and Terms
Exploratory factor analysis (EFA) seeks to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis. There is no prior theory and one uses factor loadings to intuit the factor structure of the data.
Confirmatory factor analysis (CFA) seeks to determine if the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesize beforehand the number of factors in the model, but usually the researcher will also posit expectations about which variables will load on which factors (Kim and Mueller, 1978b: 55). The researcher seeks to determine, for instance, if measures created to represent a latent variable really belong together.
There are two approaches to confirmatory factor analysis:
The Traditional Method. Confirmatory factor analysis can be accomplished through any general-purpose statistical package which supports factor analysis. Note that for SEM CFA one uses principal axis factoring (PAF) rather than principal components analysis (PCA) as the type of factoring. This method allows the researcher to examine factor loadings of indicator variables to determine if they load on latent variables (factors) as predicted by the researcher's model. This can provide more detailed insight into the measurement model than can the single-coefficient goodness-of-fit measures used in the SEM approach. As such, the traditional method is a useful analytic supplement to the SEM CFA approach when the measurement model merits closer examination.
The SEM Approach. Confirmatory factor analysis can mean the analysis of alternative measurement (factor) models using a structural equation modeling package such as AMOS or LISREL. While SEM is typically used to model causal relationships among latent variables (factors), it is equally possible to use SEM to explore CFA measurement models. This is done by removing from the model all straight arrows connecting latent variables, adding curved arrows representing covariance between every pair of latent variables, and leaving in the straight arrows from each latent variable to its indicator variables as well as leaving in the straight arrows from error and disturbance terms to their respective variables. Such a measurement model is run like any other model and is evaluated like other models, using goodness-of-fit measures generated by the SEM package.
Using SEM, the researcher can explore CFA models with or without the assumption of certain correlations among the error terms of the indicator variables. Such measurement error terms represent causes of variance due to unmeasured variables as well as random measurement error. Depending on theory, it may well be that the researcher should assume unmeasured causal variables will be shared by indicators or will correlate, and thus SEM testing may well be merited. That is, including correlated measurement error in the model tests the possibility that indicator variables correlate not just because of being caused by a common factor, but also due to common or correlated unmeasured variables. This possibility would be ruled out if the fit of the model specifying uncorrelated error terms was as good as the model with correlated error specified. In this way, testing of the confirmatory factor model may well be a desirable validation stage preliminary to the main use of SEM to model the causal relations among latent variables.
Using SEM, the redundancy test is to use chi-square difference (discussed in the section on structural equation modeling) to compare an original multifactor model with one which is constrained by forcing all correlations among the factors to be 1.0. If the constrained model is not significantly worse than the unconstrained one, the researcher concludes that a one-factor model would fit the data as well as a multi-factor one and, on the principle of parsimony, the one-factor model is to be preferred.
Using SEM, the measurement invariance test is to use chi-square difference to assess whether a set of indicators reflects a latent variable equally well across groups in the sample. The constrained model is one in which factor loadings are specified to be equal for each class of the grouping variable. If the constrained model is not significantly worse, then the researcher concludes the indicators are valid across groups. This procedure is also called multiple group CFA. If the model fails this test, then it is necessary to examine each indicator for group invariance, since some indicators may still be invariant. This procedure, called the partial measurement invariance test, is discussed by Kline (1998: 225 ff.).
Using SEM, the orthogonality test is similar to the redundancy test, but factor correlations are set to 0. If the constrained model is not significantly worse than the unconstrained one, the factors in the model can be considered orthogonal (uncorrelated, independent). This test requires at least three indicators per factor.
Factors and components: Both are the dimensions (or latent variables) identified with clusters of variables, as computed using factor analysis. Technically speaking, factors (as from PFA -- principal factor analysis, a.k.a. principal axis factoring, a.k.a. common factor analysis) represent the common variance of variables, excluding unique variance; PFA is thus a correlation-focused approach seeking to reproduce the intercorrelation among the variables. By comparison, components (from PCA -- principal components analysis) reflect both common and unique variance of the variables and may be seen as a variance-focused approach seeking both to reproduce the total variable variance with all components and to reproduce the correlations. PCA is far more common than PFA, however, and it is common to use "factors" interchangeably with "components."
PCA is generally used when the research purpose is data reduction (to reduce the information in many measured variables into a smaller set of components).
PFA is generally used when the research purpose is to identify latent variables which contribute to the common variance of the set of measured variables, excluding variable-specific (unique) variance.
Warning: Simulations comparing factor analysis with structural equation modeling (SEM) using simulated data indicate that, at least in some circumstances, factor analysis may not identify the correct number of latent variables, or sometimes even come close. While factor analysis may demonstrate that a particular model with a given predicted number of latent variables is not inconsistent with the data, researchers should understand that other models with different numbers of latent variables may also have good fit by SEM techniques.
Types of Factoring
There are different methods of extracting the factors from a set of data. The method chosen will matter more to the extent that the sample is small, the variables are few, and/or the communality estimates of the variables differ.
o Principal components analysis (PCA): By far the most common form of factor analysis, PCA seeks a linear combination of variables such that the maximum variance is extracted from the variables. It then removes this variance and seeks a second linear combination which explains the maximum proportion of the remaining variance, and so on. This is called the principal axis method and results in orthogonal (uncorrelated) factors. PCA analyzes total (common and unique) variance. (A computational sketch appears after this list of extraction methods.)
SPSS procedure: Select Analyze - Data Reduction - Factor - Variables (input variables) - Descriptives - under Correlation Matrix, check KMO and Anti-image to get overall and individual KMO statistics - Extraction - Method (principal components) and Analyze (correlation matrix) and Display (Scree Plot) and Extract (eigenvalues over 1.0) - Continue - Rotation - under Method, choose Varimax - Continue - Scores - Save as variables - Continue - OK.
Canonical factor analysis, also called Rao's canonical factoring, is a different method of computing the same model as PCA, which uses the principal axis method. CFA seeks factors which have the highest canonical correlation with the observed variables. CFA is unaffected by arbitrary rescaling of the data.
o Principal factor analysis (PFA): Also called principal axis factoring, PAF, and common factor analysis, PFA is a form of factor analysis which seeks the least number of factors which can account for the common variance (correlation) of a set of variables, whereas the more common principal components analysis (PCA) in its full form seeks the set of factors which can account for all the common and unique (specific plus error) variance in a set of variables. PFA uses a PCA strategy but applies it to a correlation matrix in which the diagonal elements are not 1's, as in PCA, but iteratively derived estimates of the communalities (R² of a variable using all factors as predictors; see below).
PFA and SEM: PFA is preferred for purposes of structural equation modeling (SEM). PFA accounts for the covariation among variables, whereas PCA accounts for the total variance of variables. Because of this difference, in theory it is possible under PFA but not under PCA to add variables to a model without affecting the factor loadings of the original variables in the model. Widaman (1993) notes, "principal component analysis should not be used if a researcher wishes to obtain parameters reflecting latent constructs or factors." However, when communalities are similar under PFA and PCA, similar results will follow.
PCA vs. PFA. For most datasets, PCA and PFA will lead to similar substantive conclusions (Wilkinson, Blank, and Gruber, 1996).
o Other Extraction Methods. In addition to PCA and PFA, there are other less-used extraction methods:
Image factoring: based on the correlation matrix of predicted variables rather than actual variables, where each variable is predicted from the others using multiple regression.
Maximum likelihood factoring (MLF): based on a linear combination of variables to form factors, where the parameter estimates are those most likely to have resulted in the observed correlation matrix, using MLE methods and assuming multivariate normality. Correlations are weighted by each variable's uniqueness. (As discussed below, uniqueness is the variability of a variable minus its communality.) MLF generates a chi-square goodness-of-fit test. The researcher can increase the number of factors one at a time until a satisfactory goodness of fit is obtained. Warning: for large samples, even very small improvements in explaining variance can be significant by the goodness-of-fit test and thus lead the researcher to select too many factors.
Alpha factoring: based on maximizing the reliability of factors, assuming variables are randomly sampled from a universe of variables. All other methods assume cases to be sampled and variables fixed.
Unweighted least squares (ULS) factoring: based on minimizing the sum of squared differences between observed and estimated correlation matrices, not counting the diagonal.
Generalized least squares (GLS) factoring: based on adjusting ULS by weighting the correlations inversely according to their uniqueness (more unique variables are weighted less). Like MLF, GLS also generates a chi-square goodness-of-fit test. The researcher can increase the number of factors one at a time until a satisfactory goodness of fit is obtained.
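As a concrete illustration of the PCA extraction logic referenced above, the following minimal Python/NumPy sketch extracts components by eigendecomposition of the correlation matrix. It is a sketch under stated assumptions, not any package's implementation; for PFA one would instead place iteratively re-estimated communalities on the diagonal of R.

import numpy as np

def pca_extract(R, n_factors):
    """Principal components of a correlation matrix R (variables x variables).

    Loadings are eigenvectors scaled by the square roots of their
    eigenvalues, so each loading is the correlation of a variable
    with a component.
    """
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings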
Factor Analytic Data Modes
o R-mode factor analysis. R-mode is by far the most common, so much so that it is normally assumed and not labeled as such. In R-mode, rows are cases, columns are variables, and cell entries are scores of the cases on the variables. In R-mode, the factors are clusters of variables on a set of people or other entities, at a given point of time.
o Q-mode factor analysis, also called inverse factor analysis, is factor analysis which seeks to cluster the cases rather than the variables. That is, in Q-mode the rows are variables and the columns are cases (ex., people), and the cell entries are scores of the cases on the variables. In Q-mode the factors are clusters of people for a set of variables. Q-mode is used to establish the factional composition of a group on a set of issues at a given point in time.
' /"mode issue has to do with negative factor loadings. n
conventional factor anal%sis of variables# loadings are loadings of
variables on factors and a negative loading indicates a negative relation
of the variable to the factor. n /"mode factor anal%sis# loadings are
loadings of cases (often individuals) on factors and a negative loading
indicates that the case/individual displa%s responses opposite to those
who load positivel% on the factor. n conventional factor anal%sis#
loading approaching 6ero indicates the given variable is unrelated to
the factor. n /"mode factor anal%sis# a loading approaching 6ero
indicates the given case is near the mean for the factor. 4luster anal%sis
is now more common than /"mode factor anal%sis.
The following modes are rare:
o O-mode factor analysis is an older form of time series analysis in which data are collected on a single entity (ex., one U.S. Senator), the columns are years, and the rows are measures (variables). In this mode, factors show which years cluster together on a set of measures for a single entity. Based on this, one can compare entities or, in a history of the entity, one can differentiate periods for purposes of explanation of behavior.
o T-mode factor analysis is similar to O-mode in that the columns are years. However, the rows are entities (ex., cases are people) and data are gathered for a single variable. In T-mode, the factors show which years cluster together on that variable for a set of people or other entities. One might investigate, for instance, if Senators' positions on military spending are differentiated between war years and peacetime years.
o S-mode factor analysis uses entities for columns (ex., Senators), years for rows (cases), and cell entries measure a single variable. In S-mode, factors show which Senators or other entities cluster together over a period of years on a single variable. S-mode would be used, for instance, to establish the underlying factional composition of a group on an issue over a long period of time.
Factor loadings: The factor loadings, also called component loadings in PCA, are the correlation coefficients between the variables (rows) and factors (columns). Analogous to Pearson's r, the squared factor loading is the percent of variance in that variable explained by the factor. To get the percent of variance in all the variables accounted for by each factor, add the sum of the squared factor loadings for that factor (column) and divide by the number of variables. (Note the number of variables equals the sum of their variances, as the variance of a standardized variable is 1.) This is the same as dividing the factor's eigenvalue by the number of variables.
Factor, component, pattern, and structure matrices. In SPSS, the factor loadings are found in a matrix labeled Factor Matrix if PFA is requested, or in one labeled Component Matrix if PCA is requested. (Note SPSS output gives both a factor or component matrix and a rotated factor or component matrix. The rotated version is used to induce factor meanings.)
In oblique rotation, one gets both a pattern matrix and a structure matrix. The structure matrix is simply the factor loading matrix as in orthogonal rotation, representing the variance in a measured variable explained by a factor on both a unique and common contributions basis. The pattern matrix, in contrast, contains coefficients which represent only unique contributions. The more factors, the lower the pattern coefficients as a rule, since there will be more common contributions to variance explained. For oblique rotation, the researcher looks at both the structure and pattern coefficients when attributing a label to a factor.
The sum of the squared factor loadings for all factors for a given variable (row) is the variance in that variable accounted for by all the factors, and this is called the communality. In a complete PCA, with no factors dropped, this will be 1.0, or 100% of the variance. The ratio of the squared factor loadings for a given variable (row in the factor matrix) shows the relative importance of the different factors in explaining the variance of the given variable. Factor loadings are the basis for imputing a label to the different factors.
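To make the row and column arithmetic concrete, here is a minimal NumPy sketch using a purely hypothetical orthogonal loading matrix (the values are invented for illustration):

import numpy as np

# Hypothetical loading matrix: 4 variables (rows) by 2 factors (columns).
loadings = np.array([[0.80, 0.12],
                     [0.75, 0.20],
                     [0.10, 0.85],
                     [0.05, 0.70]])

# Percent of variance in all variables explained by each factor:
# column sum of squared loadings divided by the number of variables.
pct_variance = (loadings ** 2).sum(axis=0) / loadings.shape[0] * 100  # ~[30.4, 31.7]

# Communality of each variable: row sum of its squared loadings.
communality = (loadings ** 2).sum(axis=1)  # ~[0.65, 0.60, 0.73, 0.49]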
Reproduced and residual correlation matrices. The factor loadings can be used to estimate the correlation matrix among variables. For any given pair of variables, the reproduced correlation is the product of their factor loadings on the first factor plus the product on the second factor, etc., for all factors. The reproduced correlation matrix can be subtracted from the actual correlation matrix, resulting in a residual correlation matrix. Low or non-significant coefficients in the residual correlation matrix indicate a good factor model. SPSS footnotes to the table of residual correlations report the percentage of non-redundant residual correlations greater than .05. In a good factor analysis, this percentage is low. (This is not a test used to reject a model, however, in part because significance and non-significance depend not only on residual correlation magnitudes but also on sample size.)
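In matrix terms, the reproduced correlation matrix is the loading matrix times its own transpose. A minimal sketch, reusing the hypothetical loadings array above and assuming R holds the observed correlations:

# Reproduced correlation for each pair: sum over factors of the
# products of their loadings, i.e., the matrix product L L'.
reproduced = loadings @ loadings.T

# Residuals: observed correlations minus reproduced ones.
residual = R - reproduced

# Percentage of non-redundant (off-diagonal) residuals above .05,
# the quantity reported in the SPSS footnote.
off_diag = residual[np.triu_indices_from(residual, k=1)]
pct_large = (np.abs(off_diag) > .05).mean() * 100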
Communality, h², is the squared multiple correlation for the variable as dependent using the factors as predictors. The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator.
When an indicator variable has a low communality, the factor model is not working well for that indicator and possibly it should be removed from the model. However, communalities must be interpreted in relation to the interpretability of the factors. A communality of .75 seems high but is meaningless unless the factor on which the variable is loaded is interpretable, though it usually will be. A communality of .25 seems low but may be meaningful if the item is contributing to a well-defined factor. That is, what is critical is not the communality coefficient per se, but rather the extent to which the item plays a role in the interpretation of the factor, though often this role is greater when communality is high.
If the communality exceeds 1.0, there is a spurious solution, which may reflect too small a sample or extraction of too many or too few factors.
Communality for a variable is computed as the sum of squared factor loadings for that variable (row). Recall r-squared is the percent of variance explained, and since factors are uncorrelated, the squared loadings may be added to get the total percent explained, which is what communality is. For full orthogonal PCA, the initial communality will be 1.0 for all variables and all of the variance in the variables will be explained by all of the factors, which will be as many as there are variables. The "extracted" communality is the percent of variance in a given variable explained by the factors which are extracted, which will usually be fewer than all the possible factors, resulting in coefficients less than 1.0. For PFA and other extraction methods, however, the communalities for the various factors will be less than 1 even initially. Communality does not change when rotation is carried out, hence in SPSS there is only one communalities table.
Uniqueness of a variable is 1 - h². That is, uniqueness is the variability of a variable minus its communality.
Eigenvalues: Also called characteristic roots. The eigenvalue for a given factor measures the variance in all the variables which is accounted for by that factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored as redundant with more important factors.
Thus, eigenvalues measure the amount of variation in the total sample accounted for by each factor. Note that the eigenvalue is not the percent of variance explained but rather a measure of amount of variance in relation to total variance (since variables are standardized to have means of 0 and variances of 1, total variance is equal to the number of variables). SPSS will output a corresponding column titled '% of variance'. A factor's eigenvalue may be computed as the sum of its squared factor loadings for all the variables.
Initial eigenvalues and eigenvalues after extraction (listed by SPSS as "Extraction Sums of Squared Loadings") are the same for PCA extraction, but for other extraction methods, eigenvalues after extraction will be lower than their initial counterparts. SPSS also prints "Rotation Sums of Squared Loadings," and even for PCA these eigenvalues will differ from the initial and extraction eigenvalues, though their total will be the same.
o Trace is the sum of variances for all factors, which is equal to the number of variables since the variance of a standardized variable is 1.0. A factor's eigenvalue divided by the trace is the percent of variance it explains in all the variables, usually labeled percent of trace in computer output. Computer output usually lists the factors in descending order of eigenvalue, along with a cumulative percent of trace for as many factors as are extracted.
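Continuing the hypothetical loadings example, eigenvalues and percent of trace can be computed as follows (a sketch, not any package's output):

# Eigenvalue of a factor: column sum of its squared loadings.
eigenvalues = (loadings ** 2).sum(axis=0)

# Trace: number of variables (each standardized variable has variance 1).
trace = loadings.shape[0]

pct_of_trace = eigenvalues / trace * 100
cumulative_pct = np.cumsum(pct_of_trace)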
Factor scores: Also called component scores in PCA, factor scores are the scores of each case (row) on each factor (column). To compute the factor score for a given case on a given factor, one takes the case's standardized score on each variable, multiplies by the corresponding factor score coefficient of the variable for the given factor (see the factor score coefficient matrix, below), and sums these products. Computing factor scores allows one to look for factor outliers. Also, factor scores may be used as variables in subsequent modeling.
The SPSS FACTOR procedure saves standardized factor scores as variables in your working data file. In SPSS, click Scores; select 'Save as variables' and 'Display factor score coefficient matrix'. The factor (or in PCA, component) score coefficient matrix contains the regression coefficients used down the columns to compute scores for cases, were one to want to do this manually. By default SPSS will name them FAC1_1, FAC2_1, FAC3_1, etc., for the corresponding factors (factors 1, 2 and 3) of analysis 1; and FAC1_2, FAC2_2, FAC3_2 for a second set of factor scores, if any, within the same procedure; and so on. Although SPSS adds these variables to the right of your working data set automatically, they will be lost when you close the dataset unless you re-save your data.
Criteria for determining the number of factors, roughly in the order of frequency of use in social science (see Dunteman, 1989: 22-3).
o Kaiser criterion: A common rule of thumb for dropping the least important factors from the analysis is the K1 rule. Though originated earlier by Guttman in 1954, the criterion is usually referenced in relation to Kaiser's 1960 work which relied upon it. The Kaiser rule is to drop all components with eigenvalues under 1.0. It may overestimate or underestimate the true number of factors; the preponderance of simulation study evidence suggests it is a conservative criterion which usually overestimates the true number of factors, sometimes severely so (Lance, Butts, and Michels, 2006). The Kaiser criterion is the default in SPSS and most computer programs but is not recommended when used as the sole cut-off criterion for estimating the number of factors.
o Scree plot: The Cattell scree test plots the components as the X axis and the corresponding eigenvalues as the Y axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow. This rule is sometimes criticised for being amenable to researcher-controlled "fudging." That is, as picking the "elbow" can be subjective because the curve has multiple elbows or is a smooth curve, the researcher may be tempted to set the cut-off at the number of factors desired by his or her research agenda. Even when "fudging" is not a consideration, the scree criterion tends to result in even more factors than the Kaiser criterion. Scree plot example
o Parallel analysis (PA). PA is now often recommended as the best method to assess the true number of factors (Velicer, Eaton, and Fava, 2000: 67; Lance, Butts, and Michels, 2006). Though not available in SPSS or SAS, O'Connor (2000) presents programs to implement PA in SPSS, SAS, and MATLAB. These programs are located at http://flash.lakeheadu.ca/~boconno2/nfactors.html. (A minimal sketch of the PA logic appears after this list of criteria.)
o "inim!m average partial ("AP) criterion. ?eveloped b% Belicer# this
criterion is similar to 3' but more comple, to implement. <54onnor (2@@@)#
lin.ed above# also presents programs for *'3.
o Variance explained criteria: Some researchers simply use the rule of keeping enough factors to account for 90% (sometimes 80%) of the variation. Where the researcher's goal emphasizes parsimony (explaining variance with as few factors as possible), the criterion could be as low as 50%.
o Jolliffe criterion: A less used, more liberal rule of thumb which may result in twice as many factors as the Kaiser criterion. The Jolliffe rule is to drop all components with eigenvalues under .7.
o "ean eigenval!e. &his rule uses onl% the factors whose eigenvalues are at or
above the mean eigenvalue. &his strict rule ma% result in too few factors.
o Comprehensibility. Though not a strictly mathematical criterion, there is much to be said for limiting the number of factors to those whose dimension of meaning is readily comprehensible. Often this is the first two or three.
Before dropping a factor below one's cut-off, however, the researcher should check its correlation with the dependent variable. A very small factor can have a large correlation with the dependent variable, in which case it should not be dropped. Also, as a rule of thumb, factors should have at least three high, interpretable loadings -- fewer may suggest that the researcher has asked for too many factors.
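As promised in the parallel analysis entry above, here is a minimal sketch of the PA logic in Python/NumPy (an illustration, not O'Connor's program): factors are retained as long as their observed eigenvalues exceed the mean eigenvalues of random, uncorrelated data of the same dimensions.

import numpy as np

def parallel_analysis(X, n_sims=100, seed=0):
    """Number of factors whose eigenvalues beat those of random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    rand = np.zeros((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))   # uncorrelated random data
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        rand[i] = np.sort(eig)[::-1]
    threshold = rand.mean(axis=0)             # mean random eigenvalues

    n_factors = 0
    while n_factors < p and obs[n_factors] > threshold[n_factors]:
        n_factors += 1
    return n_factors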
Rotation Methods. Rotation serves to make the output more understandable and is usually necessary to facilitate the interpretation of factors. The sum of eigenvalues is not affected by rotation, but rotation will alter the eigenvalues (and percent of variance explained) of particular factors and will change the factor loadings. Since alternative rotations may explain the same variance (have the same total eigenvalue) but have different factor loadings, and since factor loadings are used to intuit the meaning of factors, this means that different meanings may be ascribed to the factors depending on the rotation -- a problem some cite as a drawback to factor analysis. If factor analysis is used, the researcher may wish to experiment with alternative rotation methods to see which leads to the most interpretable factor structure.
Oblique rotations, discussed below, allow the factors to be correlated, and so a factor correlation matrix is generated when oblique is requested. Normally, however, an orthogonal method such as varimax is selected and no factor correlation matrix is produced, as the correlation of any factor with another is zero.
o No rotation is the default in SPSS, but it is a good idea to select a rotation method, usually varimax. The original, unrotated principal components solution maximizes the sum of squared factor loadings, efficiently creating a set of factors which explain as much of the variance in the original variables as possible. The amount explained is reflected in the sum of the eigenvalues of all factors. However, unrotated solutions are hard to interpret because variables tend to load on multiple factors.
o Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. Each factor will tend to have either large or small loadings of any particular variable. A varimax solution yields results which make it as easy as possible to identify each variable with a single factor. This is the most common rotation option. (An implementation sketch appears after this list of rotation methods.)
o Quartimax rotation is an orthogonal alternative which minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables are loaded to a high or medium degree. Such a factor structure is usually not helpful to the research purpose.
o Equimax rotation is a compromise between varimax and quartimax criteria.
o Direct oblimin rotation is the standard method when one wishes a non-orthogonal solution -- that is, one in which the factors are allowed to be correlated. This will result in higher eigenvalues but diminished interpretability of the factors. See below.
o Promax rotation is an alternative non-orthogonal rotation method which is computationally faster than the direct oblimin method and therefore is sometimes used for very large datasets.
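A compact sketch of Kaiser's varimax algorithm, in the singular value decomposition form commonly implemented in NumPy; loadings is assumed to be an unrotated loading matrix, and this is an illustration rather than SPSS's exact routine.

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix (variables x factors)."""
    p, k = loadings.shape
    R = np.eye(k)                                # accumulated rotation
    d_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD step of Kaiser's algorithm on the criterion gradient.
        col_ss = np.diag((L ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ (L ** 3 - (gamma / p) * L @ col_ss))
        R = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1 + tol):            # criterion has stalled
            break
        d_old = d_new
    return loadings @ R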
PRINCALS: A computer program which adapts PCA for non-metric and non-linear relationships. Its use is still rare.
The Component Transformation Matrix in SPSS output shows the correlation of the factors before and after rotation.
Assumptions
No selection bias/proper specification. The exclusion of relevant variables and the inclusion of irrelevant variables in the correlation matrix being factored will affect, often substantially, the factors which are uncovered. Although social scientists may be attracted to factor analysis as a way of exploring data whose structure is unknown, knowing the factorial structure in advance helps select the variables to be included and yields the best analysis of factors. This dilemma creates a chicken-and-egg problem. Note this is not just a matter of including all relevant variables. Also, if one deletes variables arbitrarily in order to have a "cleaner" factorial solution, erroneous conclusions about the factor structure will result. See Kim and Mueller, 1978a: 67-8.
No outliers. Outliers can impact correlations heavily and thus distort factor analysis. One may use Mahalanobis distance to identify cases which are multivariate outliers, then remove them from the analysis prior to factor analysis. One can also create a dummy variable set to 1 for cases with high Mahalanobis distance, then regress this dummy on all other variables. If this regression is non-significant (or simply has a low R-squared for large samples), then the outliers are judged to be at random and there is less danger in retaining them. The ratio of the beta weights in this regression indicates which variables are most associated with the outlier cases.
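A minimal sketch of the Mahalanobis screening step in Python (NumPy/SciPy); the chi-square cutoff used here is a common convention and an assumption of this sketch, not a rule stated in the article.

import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X, alpha=0.001):
    """Flag multivariate outliers by Mahalanobis distance from the centroid."""
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)   # squared distances
    cutoff = chi2.ppf(1 - alpha, df=X.shape[1])          # chi-square criterion
    return d2 > cutoff                                   # boolean flags per case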
Interval data are assumed. However, Kim and Mueller (1978b: 74-5) note that ordinal data may be used if it is thought that the assignment of ordinal categories to the data does not seriously distort the underlying metric scaling. Likewise, these authors allow use of dichotomous data if the underlying metric correlations between the variables are thought to be moderate (.7) or lower. The result of using ordinal data is that the factors may be that much harder to interpret.
Note that categorical variables with similar splits will necessarily tend to correlate with each other, regardless of their content (see Gorsuch, 1983). This is particularly apt to occur when dichotomies are used. The correlation will reflect similarity of "difficulty" for items in a testing context, hence such correlated variables are called difficulty factors. The researcher should examine the factor loadings of categorical variables with care to assess whether common loading reflects a difficulty factor or substantive correlation. Improper use of dichotomies can result in too many factors. See the discussion of levels of data.
Linearity. Factor analysis is a linear procedure. Of course, as with multiple linear regression, nonlinear transformation of selected variables may be a pre-processing step. The smaller the sample size, the more important it is to screen data for linearity.
"!ltivariate normality of data is re(uired for related significance tests. 34' and
3$'# significance testing apart# have no distributional assumptions. ;ote# however#
that a less"used variant of factor anal%sis# ma,imum li.elihood factor anal%sis# does
assume multivariate normalit%. &he smaller the sample si6e# the more important it is
to screen data for normalit%. *oreover# as factor anal%sis is based on correlation (or
sometimes covariance)# both correlation and covariance will be attenuated when
variables come from different underl%ing distributions (e,.# a normal vs. a bimodal
variable will correlate less than 8.@ even when both series are perfectl% co"ordered).
;onetheless# normalit% is not considered one of the critical assumptions of factor
anal%sis.
Homoscedasticity. Since factors are linear functions of measured variables, homoscedasticity of the relationship is assumed. However, homoscedasticity is not considered a critical assumption of factor analysis.
Orthogonality (for PFA but not PCA): the unique factors should be uncorrelated with each other or with the common factors. Recall that PFA factors only the common variance, ignoring the unique variance. This is not an issue for PCA, which factors the total variance.
Underlying dimensions shared by clusters of variables are assumed. If this assumption is not met, the "garbage in, garbage out" (GIGO) principle applies. Factor analysis cannot create valid dimensions (factors) if none exist in the input data. In such cases, factors generated by the factor analysis algorithm will not be comprehensible. Likewise, the inclusion of multiple definitionally similar variables representing essentially the same data will lead to tautological results.
"oderate to moderate'$ig$ intercorrelations wit$o!t m!lticollinearity are not
mathematicall% re(uired# but appl%ing factor anal%sis to a correlation matri, with
onl% low intercorrelations will re(uire for solution nearl% as man% principal
components as there are original variables# thereb% defeating the data reduction
purposes of factor anal%sis. <n the other hand# too high intercorrelations ma% indicate
a multicollinearit% problem and colinear terms should be combined or otherwise
eliminated prior to factor anal%sis. 7*< statistics ma% be used to address
multicollinearit% in a factor anal%sis# or data ma% first be screened using B$ or
tolerance in regression. Some researchers re(uire correlations L D.@ to conduct factor
anal%sis.
Factor interpretations and labels must have face validity and/or be rooted in theory. It is notoriously difficult to assign valid meanings to factors. A recommended practice is to have a panel not otherwise part of the research project assign one's items to one's factor labels. A rule of thumb is that at least 80% of the assignments should be correct.
Adequate sample size. At a minimum, there must be more cases than factors.
SPSS Output Example
Annotated SPSS Factor Analysis Output
Frequently Asked Questions
How many cases do I need to do factor analysis?
How do I input my data as a correlation matrix rather than raw data?
How many variables do I need to do factor analysis? The more, the better?
What is "sampling adequacy" and what is it used for?
Is it necessary to standardize one's variables before applying factor analysis?
Can you pool data from two samples together in factor analysis?
How does factor comparison of the factor structure of two samples work?
Why is rotation of axes necessary?
Why are the factor scores I get the same when I request rotation and when I do not?
Why is oblique (non-orthogonal) rotation rare in social science?
When should oblique rotation be used?
How high does a factor loading have to be to consider that variable as a defining part of that factor?
What is simple factor structure, and is the simpler, the better?
How is factor analysis related to validity?
What is the matrix of standardized component scores, and for what might it be used in research?
What are the pros and cons of PFA compared to PCA?
Why are my PCA results different in SAS compared to SPSS?
How do I do Q-mode factor analysis of cases rather than variables?
How else may I use factor analysis to identify clusters of cases and/or outliers?
What do I do if I want to factor categorical variables?
How many cases do I need to do factor analysis?
There is no scientific answer to this question, and methodologists differ. Alternative arbitrary "rules of thumb," in descending order of popularity, include those below. These are not mutually exclusive: Bryant and Yarnold, for instance, endorse both STV and the Rule of 200.
1. Rule of 10. There should be at least 10 cases for each item in the instrument being used.
2. STV ratio. The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995).
3. Rule of 100: The number of subjects should be the larger of 5 times the number of variables, or 100. Even more subjects are needed when communalities are low and/or few variables load on each factor. (Hatcher, 1994)
4. Rule of 150: Hutcheson and Sofroniou (1999) recommend at least 150-300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables.
5. Rule of 200. There should be at least 200 cases, regardless of STV (Gorsuch, 1983).
6. Rule of 300. There should be at least 300 cases (Norušis, 2005: 400).
7. Significance rule. There should be 51 more cases than the number of variables, to support chi-square testing (Lawley and Maxwell, 1971).
How do I input my data as a correlation matrix rather than raw data?
In SPSS, one first creates a "matrix data file" using the MATRIX DATA command, as explained in the SPSS Syntax Reference Guide. The format is:
MATRIX DATA VARIABLES=varlist.
BEGIN DATA
MEAN meanslist
STDDEV stddevlist
CORR 1
CORR .11 1
CORR -.34 .44 1
CORR .55 .71 -.19 1
END DATA.
EXECUTE.
where
varlist is a list of variable names separated by commas
meanslist is a list of the means of the variables, in the same order as varlist
stddevlist is a list of the standard deviations of the variables, in the same order
CORR statements define the lower triangle of a correlation matrix, with variables in the same order (the data above are for illustration; one may have more or fewer CORR statements as needed according to the number of variables).
Note the period at the end of the MATRIX DATA and END DATA commands.
Then, if the MATRIX DATA command is part of the same working syntax file, add the FACTOR command as usual but add the subcommand /MATRIX=IN(*). If the MATRIX DATA is not part of the same syntax set but has been run earlier, the matrix data file name is substituted for the asterisk.
How many variables do I need in factor analysis? The more, the better?
For confirmatory factor analysis, there is no specific limit on the number of variables to input. For exploratory factor analysis, Thurstone recommended at least three variables per factor (Kim and Mueller, 1978b: 77).
Using confirmatory factor analysis in structural equation modeling, having several or even a score of indicator variables for each factor will tend to yield a model with more reliability, greater validity, higher generalizability, and stronger tests of competing models than will CFA with two or three indicators per factor, all other things equal. However, the researcher must take account of the statistical artifact that models with fewer variables will yield apparently better fit as measured by SEM goodness-of-fit coefficients, all other things equal.
However, "the more, the better" may not be true when there is a possibility of suboptimal factor solutions ("bloated factors"). Too many too-similar items will mask true underlying factors, leading to suboptimal solutions. For instance, items like "I like my office," "My office is nice," "I like working in my office," etc., may create an "office" factor when the researcher is trying to investigate the broader factor of "job satisfaction." To avoid suboptimization, the researcher should start with a small set of the most defensible (highest face validity) items which represent the range of the factor (ex., ones dealing with work environment, coworkers, and remuneration in a study of job satisfaction). Assuming these load on the same job satisfaction factor, the researcher then should add one additional variable at a time, adding only items which continue to load on the job satisfaction factor, and noting when the factor begins to break down. This stepwise strategy results in the most defensible final factors.
What is "sampling adequacy" and what is it used for?
Measured by the Kaiser-Meyer-Olkin (KMO) statistics, sampling adequacy predicts if data are likely to factor well, based on correlation and partial correlation. In the old days of manual factor analysis, this was extremely useful. KMO can still be used, however, to assess which variables to drop from the model because they are too multicollinear.
There is a KMO statistic for each individual variable, and their sum is the KMO overall statistic. KMO varies from 0 to 1.0, and KMO overall should be .60 or higher to proceed with factor analysis. If it is not, drop the indicator variables with the lowest individual KMO statistic values, until KMO overall rises above .60.
To compute KMO overall, the numerator is the sum of squared correlations of all variables in the analysis (except the 1.0 self-correlations of variables with themselves, of course). The denominator is this same sum plus the sum of squared partial correlations of each variable i with each variable j, controlling for the others in the analysis. The concept is that the partial correlations should not be very large if one is to expect distinct factors to emerge from factor analysis. See Hutcheson and Sofroniou, 1999: 224.
In SPSS, KMO is found under Analyze - Statistics - Data Reduction - Factor - Variables (input variables) - Descriptives - Correlation Matrix - check KMO and Bartlett's test of sphericity and also check Anti-image - Continue - OK. The KMO output is KMO overall. The diagonal elements of the Anti-image correlation matrix are the KMO individual statistics for each variable.
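A minimal NumPy sketch of the overall KMO computation just described. It obtains the partial correlations from the inverse of the correlation matrix, a standard identity that the article does not spell out.

import numpy as np

def kmo_overall(R):
    """Overall KMO statistic from a correlation matrix R."""
    inv_R = np.linalg.inv(R)
    # Partial correlation of each i, j controlling for all other variables.
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d

    off = ~np.eye(R.shape[0], dtype=bool)    # exclude self-correlations
    r2 = (R[off] ** 2).sum()                 # sum of squared correlations
    p2 = (partial[off] ** 2).sum()           # sum of squared partials
    return r2 / (r2 + p2)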
Is it necessary to standardize one's variables before applying factor analysis?
No. Results of factor analysis are not affected by standardization, which is built into the procedure. Note, however, that standardization (subtracting the mean, dividing by the standard deviation) scales data in a sample-specific way. If the research purpose is to compare factor structures between two or more samples, then one should use the covariance matrix rather than the correlation matrix as input to factor analysis (Kim and Mueller, 1978b: 76). However, the covariance method has problems when the variables are measured on widely different scales (ex., income measured to $100,000 and education measured to 22 years). Kim and Mueller recommend multisample standardization for this case (subtracting the grand mean of all samples, and dividing by the standard deviation of all cases) prior to computing the sample covariance matrix (p. 76). However, in practice factor comparison between samples usually is based on ordinary factor analysis of correlation matrices.
Can you pool data from two samples together in factor analysis?
Yes, but only after you have shown both samples have the same factor structure through factor comparison.
How does factor comparison of the factor structure of two samples work?
The pooled data method has the researcher pool the data for two samples, adding a dummy variable whose coding represents group membership. The factor loadings of this dummy variable indicate the factors for which the groups' mean factor scores would be most different.
The factor invariance test, discussed above, is a structural equation modeling technique (available in AMOS, for ex.) which tests for deterioration in model fit when factor loadings are constrained to be equal across sample groups.
The comparison measures method requires computation of various measures which compare factor attributes of the two samples. Factor comparison is discussed by Levine (1977: 37-54), who describes these factor comparison measures:
RMS, root mean square. RMS is the root mean square of the average squared difference of the loadings of the variables on each of two factors. RMS varies from 0 to 2, reaching 0 in the case of a perfect match between samples of both the pattern and the magnitude of factors in the two samples. An RMS of 2 indicates all loadings are at unity but differ in sign between the two samples. Intermediate values are hard to interpret.
CC, coefficient of congruence. The coefficient of congruence is the sum of the products of the paired loadings divided by the square root of the product of the two sums of squared loadings. Like RMS, CC measures both pattern and magnitude similarities between samples. There is a tendency to get a high CC whenever two factors have many variables with the same sign.
s, salient variable similarity index. The salient variable similarity index is based on classifying factor loadings into positive salient ones (over +.1), hyperplane ones (from -.1 to +.1), and negative salient ones (below -.1). Hyperplane loadings, which approach 0, indicate variables having only a near-chance relationship to the factor. The s index will be 0 when there are no salient loadings, indicating no factor congruence between the two samples. An s of 1 indicates perfect congruence, and -1 indicates perfect negative (reflected) congruence. Note that calculating s for all possible pairs of factors between two samples risks coming to conclusions on the basis of chance.
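Minimal sketches of the RMS and CC formulas as described above, for two vectors a and b holding one factor's loadings in each of the two samples (the names are illustrative):

import numpy as np

def rms(a, b):
    """Root mean square of the paired loading differences."""
    return np.sqrt(np.mean((a - b) ** 2))

def congruence(a, b):
    """Coefficient of congruence between two loading vectors."""
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())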
Why is rotation of axes necessary?
For solutions with two or more factors, prior to rotation the first axis will lie in between the clusters of variables, and in general the variables will not sort well on the factors. Rotation of the axes causes the factor loadings of each variable to be more clearly differentiated by factor.
Why are the factor scores I get the same when I request rotation and when I do not?
If the solution has only one factor, rotation will not be done, so the factor scores will be the same whether you request rotation or not.
Why is oblique (non-orthogonal) rotation rare in social science?
Oblique rotation is rare because, although it makes linkage of the variables with the factors clearer, it makes the distinction between factors more difficult. Since identifying the meaning of the different factors is one of the main challenges of factor analysis, oblique rotation tends to make matters worse in most cases.
However, occasionally an oblique rotation will still result in a set of factors whose intercorrelations approach zero. This, indeed, is the test of whether the underlying factor structure of a set of variables is orthogonal. Orthogonal rotation mathematically assures resulting factors will be uncorrelated, and because of this determinism cannot be used to test if the underlying factor structure is orthogonal.
When should oblique rotation be used?
In confirmatory factor analysis (CFA), if theory suggests two factors are correlated, then this measurement model calls for oblique rotation. In exploratory factor analysis (EFA), the researcher does not have a theoretical basis for knowing how many factors there are or what they are, much less whether they are correlated. Researchers conducting EFA usually assume the measured variables are indicators of two or more different factors, a measurement model which implies orthogonal rotation. That EFA is far more common than CFA in social science is another reason why orthogonal rotation is far more common than oblique rotation.
When modeling, oblique rotation may be used as a filter. Data are first analyzed by oblique rotation and the factor correlation matrix is examined. If the factor correlations are small (ex., < .32, corresponding to 10% explained), then the researcher may feel warranted in assuming orthogonality in the model. If the correlations are larger, then covariance between factors should be assumed (ex., in structural equation modeling, one adds double-headed arrows between latents).
For purposes other than modeling, such as seeing if test items sort themselves out on factors as predicted, orthogonal rotation is almost universal.
How high does a factor loading have to be to consider that variable as a defining part of that factor?
This is purely arbitrary, but common social science practice uses a minimum cut-off of .3 or .35. Norman and Streiner (1994: 139) give this alternative formula for minimum loadings when the sample size, N, is 100 or more: Min FL = 5.152/SQRT(N - 2). Another arbitrary rule of thumb terms loadings "weak" if less than .4, "strong" if more than .6, and otherwise "moderate." These rules are arbitrary: the meaning of factor loading magnitudes varies by research context. For instance, loadings of .45 might be considered "high" for dichotomous items, but for Likert scales a .6 might be required to be considered "high."
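As a quick check on the Norman and Streiner formula (a sketch in Python):

import math

def min_loading(n):
    """Norman & Streiner (1994) minimum defining loading for sample size n >= 100."""
    return 5.152 / math.sqrt(n - 2)

print(min_loading(100))   # ~0.52 for N = 100
print(min_loading(300))   # ~0.30 for N = 300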
What is simple factor structure, and is the simpler, the better?
A factor structure is simple to the extent that each variable loads heavily on one and only one factor. Usually rotation is necessary to achieve simple structure, if it can be achieved at all. Oblique rotation does lead to simpler structures in most cases, but it is more important to note that oblique rotations result in correlated factors, which are difficult to interpret. Simple structure is only one of several sometimes conflicting goals in factor analysis.
How is factor analysis related to validity?
In confirmatory factor analysis (CFA), a finding that indicators have high loadings on the predicted factors indicates convergent validity. In an oblique rotation, discriminant validity is demonstrated if the correlation between factors is not so high (ex., > .85) as to lead one to think the two factors overlap conceptually.
What is the matrix of standardized component scores, and for what might it be used in research?
These are the scores of all the cases on all the factors, where cases are the rows and the factors are the columns. They can be used for orthogonalization of predictors in multiple regression. In a case where there is multicollinearity, one may use the component scores in place of the X scores, thereby assuring there is no multicollinearity among the predictors.
Note, however, that this orthogonalization comes at a price. Now, instead of explicit variables, one is modeling in terms of factors, the labels for which are difficult to impute. Statistically, multicollinearity is eliminated by this procedure, but in reality it is hidden in the fact that all variables have some loading on all factors, muddying the purity of meaning of the factors.
A second research use for component scores is simply to be able to use fewer variables in, say, a correlation matrix, in order to simplify presentation of the associations.
Note also that factor scores are quite different from factor loadings. Factor scores are coefficients of cases on the factors, whereas factor loadings are coefficients of variables on the factors.
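A minimal sketch of the orthogonalization use just described, regressing a response y on the standardized component scores instead of the collinear X columns (factor_scores and y are assumed to exist; see the earlier factor score sketch):

import numpy as np

# Component scores are mutually uncorrelated by construction, so an
# ordinary least squares fit on them is free of multicollinearity.
design = np.column_stack([np.ones(len(factor_scores)), factor_scores])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)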
What are the pros and cons of PFA compared to PCA?
PCA determines the factors which can account for the total (unique and common) variance in a set of variables. This is appropriate for creating a typology of variables or reducing attribute space. PCA is appropriate for most social science research purposes and is the most often used form of factor analysis.
PFA determines the least number of factors which can account for the common variance in a set of variables. This is appropriate for determining the dimensionality of a set of variables such as a set of items in a scale, specifically to test whether one factor can account for the bulk of the common variance in the set, though PCA can also be used to test dimensionality. PFA has the disadvantage that it can generate negative eigenvalues, which are meaningless.
Why are my PCA results different in SAS compared to SPSS?
There are different algorithms for computing PCA, yielding quite similar results. SPSS uses "iterated principal factors" by default, whereas SAS does not. In SAS, specify METHOD=PRINIT within PROC FACTOR to get the iterated solution. This assumes, of course, that the same rotation is also specified in both programs. Also, SPSS by default does 25 iterations, and this could make a minor difference if SAS differs, though SPSS allows the user to change this to another number.
How do I do Q-mode factor analysis of cases rather than variables?
Simply transpose the data matrix, reversing rows and columns. Note that there must be more cases than variables.
How else may I use factor analysis to identify clusters of cases and/or outliers?
If there are only two or at most three principal component factors which explain most of the total variation in the original variables, then one can calculate the factor scores of all cases on these factors, and a plot of the factor scores will visually reveal both clusters of cases and also outliers. See Dunteman, 1989: 75-79.
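A minimal sketch of that plot, assuming factor_scores from the earlier sketch has at least two columns:

import matplotlib.pyplot as plt

# Clusters appear as groups of points; outliers as isolated points.
plt.scatter(factor_scores[:, 0], factor_scores[:, 1])
plt.xlabel("Component 1 score")
plt.ylabel("Component 2 score")
plt.show()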
What do I do if I want to factor categorical variables?
A nominal and ordinal analog to factor analysis is latent class analysis. Also, SPSS offers other procedures for factoring categorical data.
Bibliography
Bryant and Yarnold (1995). Principal components analysis and exploratory and confirmatory factor analysis. In Grimm and Yarnold, Reading and understanding multivariate analysis. American Psychological Association Books.
Dunteman, George H. (1989). Principal components analysis. Thousand Oaks, CA: Sage Publications, Quantitative Applications in the Social Sciences Series, No. 69.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4: 272-299.
Gorsuch, R. L. (1983). Factor analysis. Hillsdale, NJ: Lawrence Erlbaum. Orig. ed. 1974.
Hatcher, Larry (1994). A step-by-step approach to using the SAS system for factor analysis and structural equation modeling. Cary, NC: SAS Institute. Focus on the CALIS procedure.
Hutcheson, Graeme and Nick Sofroniou (1999). The multivariate social scientist: Introductory statistics using generalized linear models. Thousand Oaks, CA: Sage Publications.
Kim, Jae-On and Charles W. Mueller (1978a). Introduction to factor analysis: What it is and how to do it. Thousand Oaks, CA: Sage Publications, Quantitative Applications in the Social Sciences Series, No. 13.
Kim, Jae-On and Charles W. Mueller (1978b). Factor analysis: Statistical methods and practical issues. Thousand Oaks, CA: Sage Publications, Quantitative Applications in the Social Sciences Series, No. 14.
Kline, Rex B. (1998). Principles and practice of structural equation modeling. NY: Guilford Press. Covers confirmatory factor analysis using SEM techniques. See esp. Ch. 7.
Lance, Charles E., Marcus M. Butts, and Lawrence C. Michels (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods 9(2): 202-220. Discusses Kaiser and other criteria for selecting the number of factors.
Lawley, D. N. and A. E. Maxwell (1971). Factor analysis as a statistical method. London: Butterworth and Co.
Levine, Mark S. (1977). Canonical analysis and factor comparison. Thousand Oaks, CA: Sage Publications, Quantitative Applications in the Social Sciences Series, No. 6.
Norman, G. R., and D. L. Streiner (1994). Biostatistics: The bare essentials. St. Louis, MO: Mosby.
Norušis, Marija J. (2005). SPSS 13.0 statistical procedures companion. Chicago: SPSS, Inc.
O'Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test. Behavior Research Methods, Instrumentation, and Computers 32: 396-402.
Pett, Marjorie A., Nancy R. Lackey, and John J. Sullivan (2003). Making sense of factor analysis: The use of factor analysis for instrument development in health care research. Thousand Oaks, CA: Sage Publications.
Velicer, W. F., Eaton, C. A., and Fava, J. L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. Pp. 41-71 in R. D. Goffin and E. Helmes, eds., Problems and solutions in human assessment. Boston: Kluwer. Upholds PA over K1 as a number-of-factors cutoff criterion.
Widaman, K. F. (1993). Common factor analysis versus principal components analysis: Differential bias in representing model parameters? Multivariate Behavioral Research 28: 263-311. Cited with regard to preference for PFA over PCA in confirmatory factor analysis in SEM.
Wilkinson, L., G. Blank, and C. Gruber (1996). Desktop data analysis with SYSTAT. Upper Saddle River, NJ: Prentice-Hall.
Copyright 1998, 2006 by G. David Garson.
