
Factor Analysis First Aid

(Introduction to Easy Factor Analysis v9.4)

Author: Antal Pinter © 2013

Program features
Factor analysis is a powerful method for investigating unknown systems, above all in the sense of discovering
internal rules and dependencies among an observed set of variables. By revealing invisible influences and
relationships, it is an indispensable tool for better understanding and describing the system under consideration;
it therefore represents a scientific and theoretical framework as well as a practical, analytic technique.
There are many specialized multivariate statistical programs on the market, such as SPSS, PSPP, STATISTICA, R,
SAS, MVSP, SYSTAT, Canoco, Brodgar, GenStat, Minitab, PC-ORD, PATN, CAP, ECOM, Factor 8.1 and
similar, so why is yet another program needed besides them?
There are a few reasons for that:
This program was designed as low-cost software for students and researchers who are not very familiar
with handling large, demanding and less comfortable software packages, or who have limited statistical
knowledge but want to factor-analyze their data and get results in a fast and easy-to-use manner. This
program gives every researcher, even one with no special skills in software or statistics, the opportunity to
do this work efficiently!
Researchers often need a tool for arbitrary rotation based on a "what-if" concept, to see what would
happen if some of the results in the factor matrix were changed, whether to intentionally exclude the
influence of some variable or for other needs. This opportunity is given by graphical rotation, a
visual and interactive way to intuitively manage a particular factor structure!
After extraction and the various rotations that clarify the interpretation of the results, researchers
often need an objective ranking of their factors in terms of the importance of their mutual influence,
but none of these programs offers such a feature! A special kind of orthogonal rotation, the
so-called special transformation, which provides such a possibility, is also included here!
Data entry is done in the simplest possible way, using Excel files as the most widespread and most
common form of data presentation. The necessary variables are almost always available in that form, whether
from websites, from a variety of database servers, from other programs, or made up by yourself. In case
you need to analyze a ready-made correlation matrix, or when the original data are not available for any
reason, it is also possible to start the factor analysis by entering the data as an ordinary structured
text file!
In addition to mouse input through the Configuration settings menu option, the program offers the comfort
of step-by-step keyboard entry for managing and monitoring the course of the analysis, which, through this
sense of immediacy, can help direct attention toward better decisions about what to do.

An Outline: Some Extracted Fragments of Factor Analysis


1. Theory
The underlying assumption of factor analysis is the existence of a number of unobserved latent variables (or
"factors") that account for the correlations among the observed variables. Each observed variable (X) can then be
expressed as a weighted composite of a set of latent variables (F's) such that

X_i = a_i1 F_1 + a_i2 F_2 + ... + a_im F_m + U_i

where X_i is the i-th observed variable, F_1, ..., F_m are the common factors, and U_i is the residual of X_i on the
factors, i.e. the specific, unique factor. Given the assumption that the residuals are uncorrelated across
the observed variables, the correlations among the observed variables are accounted for by the common factors.
It is crucial in understanding factor analysis to remember that F stands for a function of variables and not for a
variable itself. For example, the functions might be F1 = XW + 2Z and F2 = 3X^2 Z / W^(1/2). The unknown variables
entering into each function F in the above equation, treated as the basic factor-model equation, are related in
unknown ways, and although the equations relating the functions themselves are linear, they can express a
complex interaction between particular variables. Some researchers have even drawn a comparison between factor
analysis and quantum theory! (Rummel, R.J.)
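
To make the basic model tangible, here is a minimal Python sketch, not part of the program, that generates data obeying the factor-model equation above; all loadings and noise levels are illustrative values:

---------------------------------------------------------------------------
import numpy as np

# Simulate X_i = a_i1 F_1 + ... + a_im F_m + U_i for p = 4 observed
# variables and m = 2 common factors (hypothetical, illustrative values).
rng = np.random.default_rng(0)
n, p, m = 500, 4, 2
A = rng.uniform(-0.9, 0.9, size=(p, m))   # loadings a_ij
F = rng.standard_normal((n, m))           # common factor scores
U = 0.5 * rng.standard_normal((n, p))     # unique factors / residuals
X = F @ A.T + U                           # n cases x p observed variables
---------------------------------------------------------------------------

The correlations among the columns of X are then approximately reproduced by the common factors alone, which is exactly the structure factor analysis tries to recover.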

2. Factor Analysis vs. Principal Components


In theory there are two methods of factor analysis: exploratory (EFA) and confirmatory (CFA) factor
analysis (Thompson, 2004). While both methods are used to examine the underlying factor structure of the
data, they play quite different roles with respect to the purpose of a given study: one is used for theory-building,
whereas the other is used primarily for theory-testing. Exploratory factor analysis (EFA) is used when
researchers have little idea about the underlying mechanisms of the target phenomena; they use
EFA to identify a set of unobserved (i.e., latent) factors that reconstruct the complexity of the observed data in
an essential form, retaining all important information available from the original data while removing all
unnecessary and/or redundant information, as well as noise induced by sampling/measurement errors. EFA is a
tool intended to help generate new theory by exploring the latent factors that best account for the variation
and interrelationships of the manifest variables (Henson & Roberts, 2006).
Because this type of factor analysis is used to estimate the unknown structure of the data, this is a critical
point that distinguishes EFA and principal axis factoring from CFA and principal component analysis, where
the latter is used to summarize the information available from a given set of variables and reduce it to a
smaller number of components suitable for testing an existing theory. As an important implication, PCA assumes
the observed items are measured without error, treating all of the variation as explained.
It must be pointed out that there is considerable disagreement among statistical theorists about the two methods,
namely when principal component analysis (PCA) and when principal axis factoring (PAF) should be used as factor
analysis. (Costello & Osborne, 2005; Matsunaga, 2010)
However, empirical results show that the two types of analysis substantially match each other if the number
of variables exceeds 30 or if the communality exceeds 0.60 for a number of the variables.


On the other hand, the essential technical difference between factor analysis and principal components analysis is
much simpler: principal components analysis inserts communality estimates of 1.0 into the diagonal of the
correlation matrix, while the principal axis factoring method most frequently uses the squared multiple
correlation coefficients (SMC), or the maximum correlation coefficients, as initial communalities. (Daniel J. Denis)
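
As a small illustration of this difference, the SMC initial communalities can be computed directly from the inverse of the correlation matrix, using the standard identity SMC_i = 1 - 1/(R^-1)_ii. A minimal Python sketch (the function name is ours, not the program's):

---------------------------------------------------------------------------
import numpy as np

def initial_communalities_smc(R):
    """Squared multiple correlations, SMC_i = 1 - 1/(R^-1)_ii, used as
    initial communality estimates in principal axis factoring.
    PCA would instead keep 1.0 on the diagonal of R."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))
---------------------------------------------------------------------------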

3. A few aspects of the analysis worth special attention


Before starting the analysis, it is worth paying special attention to several important issues. Only some of
them will be mentioned here: the kind and number of variables, the sample size, the eigenvalues with regard
to the required number of factors to extract, and the correlation matrix.
3.1. Variables
For the selection of appropriate variables it is necessary to take into account an optimal number of redundant
variables, i.e. the variables must overlap at least a little in their meaning.
On the question of sample size, it is not advisable to work with fewer than 50 cases; for an EFA it should be at
least 100. Some authors suggest that an N of 100 is poor, 200 is fair, 300 is good, 500 is very good, and
1000 or more is excellent. They also consider that the ratio of the sample size to the number of variables
should be 10, and at least not under 5. These recommendations are, however, not universal,
because the appropriate sample size, or N:p ratio, for a given measurement analysis is actually a function of
several aspects of the data, and a sample size of 100-200 may be sufficient. If that does not give a satisfactory
solution, larger samples are necessary. All these considerations relate to the fields where the method has so far
been most applied, such as psychology and economics, and not to the analysis of natural phenomena, where it is
possible to use much smaller samples than recommended!
3.2. Eigenvalues
The extraction of principal components or factors in principal axis factoring takes place by calculating the
eigenvalues of the correlation matrix.

Although equal to the sum of squared factor loadings, the eigenvalue is technically a solution of the
characteristic equation (R - λI)a = 0 for the unrotated factors. (Rummel, R.J.)
The number of positive eigenvalues determines the number of dimensions needed to represent a set of scores without
any loss of information. Hence, the number of positive eigenvalues determines the number of all possible
factors/components to be extracted. (Kootstra, 2004)

There are several rules/criteria in the literature that help determine the optimal number of factors to
retain. The most frequently used strategy is to retain all factors whose computed eigenvalue is >= 1.0
(the Kaiser-Guttman latent root criterion).
Factors with an eigenvalue lower than 1.0 are considered irrelevant, as they explain less variability than a
single variable does. This method is best used when the number of variables is between 20 and 50. If there are
fewer than 20 variables, there is a tendency to select too few factors, and if the number of variables is greater
than 50, the tendency is to select too many.

Other factor-retention strategies that can be applied include the scree test, Bartlett's test, maximum
likelihood methods (MLM) and parallel analysis; the number can also be determined by a desired proportion
and amount of the explained variance accounted for by the obtained factors. In any case, the number of latent
factors should be determined primarily on the grounds of theoretical expectations and the conceptualization
of the target construct. (Matsunaga, 2010)
Notice that during a PAF some of the eigenvalues can even be negative if the matrix is not of full rank.
Although it is strange to have a negative variance, this happens because factor analysis analyzes only the
common variance, which is less than the total variance. If we were doing a principal components analysis, we
would have 1's on the diagonal, which means that all of the variance is analyzed (which is another
way of saying that we assume there is no measurement error), and we would not have negative
eigenvalues. In general, it is not uncommon to have negative eigenvalues. (IDRE Research Technology Group)
3.3. Correlation matrix
With respect to the correlation matrix, two things are important: the variables have to be intercorrelated, but they
should not correlate too highly (extreme multicollinearity and singularity), as this would cause difficulties in
determining the unique contribution of the variables to a factor. The intercorrelation can be checked with
Bartlett's test of sphericity, which tests the null hypothesis that the original correlation matrix is an identity matrix.
This test has to be significant: if the correlation matrix were an identity matrix, there would be no correlations
between the variables. Multicollinearity, in turn, can be detected via the determinant of the correlation matrix: if the
determinant is greater than 0.00001, there is no multicollinearity. (Kootstra, 2004)
The smaller the determinant, the more significant the matrix. Values of 0.05 or less are considered good for
performing factor analysis.

---------------------------------------------------------------------------
Example:
If det(R) = 0.00001043, then log_e|R| = -11.4137.
Bartlett's (1950) chi-square test of significance of the correlation matrix,
with n = 145 cases and v = 24 variables:

X^2 = -( n - 1 - (2v + 5)/6 ) log_e|R|
    = -(145 - 1 - (2*24 + 5)/6)(-11.4137) = -(144 - 53/6)(-11.4137) = 1543.

The degrees of freedom are: df = v(v - 1)/2 = 24(24 - 1)/2 = 12 * 23 = 276.

Because chi-square tables do not list such a large df, the chi-square can be
transformed to t. The t is 31 and is highly significant! The matrix can be
legitimately factored! (Gorsuch, R., 1983)
---------------------------------------------------------------------------
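
The same computation in a short Python sketch (the function name is ours; scipy is used only to obtain the p-value):

---------------------------------------------------------------------------
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's (1950) chi-square test for a correlation matrix R
    estimated from n cases; returns chi-square, df and the p-value."""
    v = R.shape[0]
    log_det = np.log(np.linalg.det(R))            # log(e)|R|
    chi_square = -(n - 1 - (2 * v + 5) / 6) * log_det
    df = v * (v - 1) // 2
    return chi_square, df, chi2.sf(chi_square, df)
---------------------------------------------------------------------------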

4. Factor rotation
After factor extraction it may be difficult to interpret and name the factors/components on the basis of their factor
loadings. The first factor always accounts for the maximum part of the variance, thereby appearing as the most
important factor, so interpretation of the factors can be very difficult. A solution to this difficulty is factor
rotation. Factor rotation alters the pattern of the factor loadings so as to approximate simple structure as closely
as possible, and hence can improve interpretation.
The matrix is said to have simple structure when
1. Each row contains at least one zero.
2. Each column contains at least m zeroes.
3. For each pair of columns, there should be at least m variables whose entries vanish in one column but not in the
other.

-4-

The orthogonal rotations named Varimax, Quartimax and Equamax belong to the Orthomax rotation class.

4.1. Varimax
Suggested by Henry Felix Kaiser in 1958, it is a popular scheme for orthogonal rotation which cleans up the factors
as follows: "for each factor, high loadings (correlations) will result for a few variables; the rest will be near zero."
The VARIMAX rotation (gamma = 1) maximizes the variance of the squared factor loadings in each
factor, i.e., it SIMPLIFIES the COLUMNS of the factor loading matrix. In each factor the large loadings are
increased and the small ones are decreased, so that each FACTOR has only a few variables with large loadings.
By minimizing the number of variables that have high loadings on each factor, it simplifies the interpretation
of the factors.

4.2. Quartimax
In contrast, the QUARTIMAX rotation (gamma = 0) maximizes the variance of the squared factor loadings in
each variable, i.e., simplifies the ROWS of the factor loading matrix. In each variable the large loadings are
increased and the small ones are decreased so that each VARIABLE will only load on a few factors.
By minimizing the number of factors needed to explain each variable, it simplifies the interpretation of the
observed variables.
4.3. Equamax
EQUAMAX is a rotation method (gamma = m/2) that is a combination of the varimax and quartimax
methods. Both the number of variables that load highly on a factor and the number of factors needed to explain a
variable are minimized.
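
All three rotations are instances of a single orthomax criterion parameterized by gamma. The following Python sketch uses the common SVD-based textbook iteration; it is not necessarily the algorithm implemented in this program:

---------------------------------------------------------------------------
import numpy as np

def orthomax(A, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthomax rotation of a p x m loading matrix A.
    gamma = 1: varimax; gamma = 0: quartimax; gamma = m/2: equamax."""
    p, m = A.shape
    T = np.eye(m)                  # accumulated orthogonal rotation
    criterion = 0.0
    for _ in range(max_iter):
        L = A @ T
        # gradient of the orthomax criterion at the current rotation
        G = A.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                 # project back onto orthogonal matrices
        if s.sum() < criterion * (1.0 + tol):
            break                  # criterion no longer improving
        criterion = s.sum()
    return A @ T, T
---------------------------------------------------------------------------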
4.4. Graphical rotation
Although historically it appeared as the first method of rotation, it has fallen into disuse nowadays. But with it
also disappeared the possibility of directly monitoring the process of rotation, even though this can be useful.
It can be proved that the correlation between variables X_i and X_j is:

r_ij = a_i1 a_j1 + a_i2 a_j2 + ... + a_im a_jm

Thus, two variables can be strongly connected if they have high loadings on the same factors.
Sometimes, even at the price of violating Thurstone's principle of simple structure, there is a need to reduce
some factor loading in order to decrease the dependency between the corresponding variables through their
product, and this is exactly what graphical rotation enables: rotating one factor axis toward a variable to
intentionally exclude its influence on some other variable, or, vice versa, in a similar way, to intentionally
increase the significance of some variable! (Jahn and Vahle, 1968)
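
The elementary step behind graphical rotation is a planar rotation of one pair of factor axes, as in this sketch (in the program the angle is chosen interactively; here it is passed explicitly, and the function name is ours):

---------------------------------------------------------------------------
import numpy as np

def rotate_factor_pair(A, i, j, theta):
    """Rotate the plane spanned by factor axes i and j of the loading
    matrix A by the angle theta (radians); other factors are untouched."""
    T = np.eye(A.shape[1])
    c, s = np.cos(theta), np.sin(theta)
    T[i, i] = T[j, j] = c
    T[i, j], T[j, i] = -s, s
    return A @ T
---------------------------------------------------------------------------

Choosing theta = arctan2(A[v, j], A[v, i]) makes variable v's loading on factor j vanish, which is one way to realize the "exclusion of influence" described above.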

4.5. Special transformation


After the factor analysis, it is advisable to perform a special transformation of the retained factors to determine to
what degree the examined variables affect the resultant variable, and to rank them according to their
importance. This is made possible by merging all loadings into a single factor in such a way that only one factor
is weighted heavily on the dependent aim variable, while the remaining factors stay uncorrelated with it, i.e., with
zero weight! Then the independent variable with the highest loading has the greatest influence on the dependent
variable, and so on, according to the established ranking.
(Fischer and Roppert 1965; Jahn and Vahle 1968, 1970; Jolliffe 1990, 1993; Frank-M. Chmielewski)
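
One way such a transformation can be realized is to rotate the factor space so that its first axis points along the loading row of the dependent (aim) variable; that variable then loads on factor 1 alone, with zero weights elsewhere, and the remaining variables' factor-1 loadings supply the ranking. The sketch below is our reading of this idea under that assumption; consult the cited papers for the exact procedure:

---------------------------------------------------------------------------
import numpy as np

def special_transformation(A, aim):
    """Rotate the loadings A (p x m) so that row 'aim' becomes
    (||b||, 0, ..., 0); the other rows' first-column entries then
    rank the variables by their influence on the aim variable."""
    m = A.shape[1]
    b = A[aim]
    # orthonormal basis whose first vector is b/||b|| (QR on [b | e2..em])
    Q, _ = np.linalg.qr(np.column_stack([b, np.eye(m)[:, 1:]]))
    if b @ Q[:, 0] < 0:
        Q[:, 0] = -Q[:, 0]         # make the aim variable's loading positive
    return A @ Q
---------------------------------------------------------------------------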

5. Factor scores
Factor scores are an attempt to quantify the qualitative results of factor analysis, thus enabling further
examinations that apply them, e.g., as new variables in multiple regression analysis.
Regression factor scores are calculated by linearly combining the standardized variable scores with the
factor score coefficients, where the factor score coefficients are obtained from the factor loadings weighted
by the corresponding eigenvalues, i.e. the squared column sums of the loadings.

Calculation of the regression factor scores
(m = number of cases, r = number of variables, n = number of factors):

z(m,r) = [x(m,r) - μ(r)] / σ(r)

FSC(r,n) = F(r,n) / Σ(i=1..r) F(i,n)^2

FAC(m,n) = Σ(r) z(m,r) * FSC(r,n) = Σ(r) z(m,r) * [ F(r,n) / Σ(i=1..r) F(i,n)^2 ]
Because of the standardization by construction, their mean is always 0, and the standard deviation is equal to
1 if principal components methods are used, or equal to the squared multiple correlation between factors and
variables (typically used as the communality estimate) if principal axis methods are used. (DiStefano, Christine)
Factor scores are transformations of the observed variables and represent the estimated deviation from the
mean of a particular variable, in units of standard deviation.
A score of 0 on a factor therefore means that the importance ratings of the relevant attributes are close to the
average for that sample. Similarly, a negative score corresponds to a variable with lower-than-average importance
ratings and indicates an inverted relationship to those with positive loadings; e.g., a score of -2 means about two
standard deviations below the mean.
To say that the factors are uncorrelated means that the factor scores on the factor patterns are uncorrelated, not
necessarily the factor loadings. Factor loadings are, however, independent (orthogonal). (Suhr, D.D.)
Notice also that the communality (h^2) values, i.e. the squared sums of a variable's loadings given for the
unrotated factors, do not change under orthogonal rotations; hence factor scores may be computed with either the
unrotated or the rotated factor matrix!

____________________________
Dipl.ecc. Antal Pinter, Subotica
pantal24@gmail.com


REFERENCES:
Rummel, R.J.: UNDERSTANDING FACTOR ANALYSIS
(http://www.hawaii.edu/powerkills/UFA.HTM#P2.5)
DiStefano, Christine & Zhu, Min & Mindrila, Diana: Understanding and Using Factor Scores:
Considerations for the Applied Researcher. University of South Carolina.
( http://pareonline.net/pdf/v14n20.pdf )
Suhr, Diana D., Ph.D.: Principal Component Analysis vs. Exploratory Factor Analysis . University of
Northern Colorado ( http://www2.sas.com/proceedings/sugi30/203-30.pdf )
Denis, Daniel J., Ph.D.: DATA & DECISION - Factor Analysis II. University of Montana, 2009
(http://psychweb.psy.umt.edu/denis/datadecision/factor/dd_fa_part_2_aug_2009.pdf)
Costello, Anna B. & Osborne, Jason.W.: Best Practices in Exploratory Factor Analysis.
North Carolina State University, 2005 (http://pareonline.net/pdf/v10n7.pdf)
Matsunaga, Masaki: How to Factor-analyze Your Data Right: Do's, Don'ts, and How-Tos. Rikkyo
University.
(http://mvint.usbmed.edu.co:8002/ojs/index.php/web/article/download/464/605/464-811-1-PB.pdf )
IDRE Research Technology Group: SAS Annotated Output Factor Analysis, University of California.
(http://www.ats.ucla.edu/stat/sas/output/factor.htm)
Kootstra, Gerrit Jan: Exploratory Factor Analysis , 2004
(http://www.let.rug.nl/~nerbonne/teach/rema-stats-meth-seminar/Factor-Analysis-Kootstra-04.PDF)
Jahn, Walter - Vahle, Hans: Die Faktoranalyse und ihre Anwendung, Verlag Die Wirtschaft, Berlin, 1968
Jahn, Walter - Vahle, Hans: A faktoranalízis és alkalmazása, Közgazdasági és Jogi Könyvkiadó, Budapest, 1974
Fischer, G. - Roppert, J.: Ein Verfahren der Transformationsanalyse faktorenanalytischer Ergebnisse, 1965
Chmielewski, Frank-M.: Impact of climate changes on crop yields of winter rye in Halle (southeastern
Germany), 1901 to 1980, Meteorological Institute, Humboldt University at Berlin
(http://www.int-res.com/articles/cr/2/c002p023.pdf)
Baló, Barnabás - Makra, L. - Matyasovszky, I. - Csépe, Z.: Association of Sociodemographic and
Environmental Factors with Allergic Rhinitis and Asthma, University of Szeged and Eötvös Loránd
University, Budapest, 2012
(http://www2.sci.u-szeged.hu/eghajlattan/akta12/2012_ACTA_46_p04_Balo_et_al.pdf )
Hedrih, Vladimir: Psihološka računarska statistika v0.3 - Faktorska analiza, Niš, 2005
(www.hm.co.rs/statistika/knjiga/faktoranaliz.doc)
Gorsuch, Richard L.: Factor Analysis, Hillsdale, NJ, Lawrence Erlbaum Associates, 1983
Darlington, Richard B.: Factor Analysis
(http://www.unt.edu/rss/class/mike/Articles/FactorAnalysisDarlington.pdf)
