Program features
Factor analysis is a powerful method for investigating unknown systems, first of all in the sense of discovering internal rules and dependencies among an observed set of variables. By revealing invisible influences and relationships, it is an indispensable tool for better understanding and describing the system under consideration; it therefore represents a scientific and theoretical structure as well as a practical and analytic technique.
There are many specialized multivariate statistical programs on the market, such as SPSS, PSPP, STATISTICA, R, SAS, MVSP, SYSTAT, Canoco, Brodgar, GenStat, Minitab, PC-ORD, PATN, CAP, ECOM, Factor8.1 and similar ones, so why is yet another program needed besides them?
There are a few reasons for that:
This program was designed as a low-cost tool for students and researchers who are not very familiar with handling large, demanding and less comfortable software packages, or who have less statistical knowledge, but want to factor-analyze their data and get results in a fast and easy-to-use manner. It gives every researcher, even one with no special skills in software or statistics, the opportunity to do this work efficiently!
Researchers often need a tool for arbitrary, "what-if" rotations, to see what would happen if some of the values in the factor matrix were changed, for example to intentionally exclude the influence of some variable. This opportunity is given by the graphical rotation, a visual and interactive way to intuitively manage a particular factor structure!
After extraction and the various rotations performed to clarify the interpretation of the results, researchers often need an objective ranking of their factors in terms of the importance of their mutual influence, but none of the programs listed above offers such a feature! A special kind of orthogonal rotation, the so-called special transformation, which provides this possibility, is also included here!
Data entry is done in the simplest possible way, using Excel files as the most widespread and most common form of data presentation. The necessary variables are usually already available in that form, whether from websites, from a variety of database servers, from other programs, or entered by yourself. In case you need to analyze a ready-made correlation matrix, or when the original data are not available for any reason, it is also possible to start the factor analysis by entering the data as an ordinary structured text file!
In addition to mouse input via the Configuration settings menu option, the program offers the comfort of step-by-step keyboard entry for managing and monitoring the process of the analysis, which, through this sense of immediacy, can help direct attention toward better decisions on what to do next.
On the other hand, the essential technical difference between factor analysis and principal component analysis is much simpler: principal component analysis inserts communality estimates of 1.0 into the diagonal of the correlation matrix, while the principal axis factoring method most frequently uses as initial communalities the squared multiple correlation coefficients (SMC), or the maximum correlation coefficients. (Daniel J. Denis)
Although equal to the sum of squared factor loadings, the eigenvalue is technically a solution of the characteristic equation (R - λI)a = 0 for the unrotated factors. (Rummel, R.J.)
The number of positive eigenvalues determines the number of dimensions needed to represent a set of scores without
any loss of information. Hence, the number of positive eigenvalues determines the number of all possible
factors/components to be extracted. (Kootstra, 2004)
There are several rules/criteria in the literature that help determine the optimal number of factors to retain. The most frequently used strategy is to retain all factors whose computed eigenvalue is >= 1.0 (the Kaiser-Guttman latent root criterion).
Factors with an eigenvalue lower than 1.0 are considered irrelevant, as they explain less variability than a single variable does. This method works best when the number of variables is between 20 and 50. If there are fewer than 20 variables, there is a tendency to select too few factors, and if the number of variables is greater than 50, the tendency is to select too many.
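As an illustrative sketch (not part of the program itself), the Kaiser-Guttman rule reduces to counting eigenvalues of the correlation matrix; the matrix below is a made-up toy example:

```python
import numpy as np

def kaiser_retained(R):
    """Kaiser-Guttman latent root criterion: count the eigenvalues
    of the correlation matrix R that are >= 1.0."""
    eigenvalues = np.linalg.eigvalsh(R)   # eigenvalues of a symmetric matrix
    return int(np.sum(eigenvalues >= 1.0))

# Toy 3-variable correlation matrix: the first two variables correlate
# strongly, the third is almost independent of them.
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
print(kaiser_retained(R))   # one dominant component -> prints 1
```

The eigenvalues here are roughly 1.82, 0.98 and 0.20, so only one factor survives the cut, matching the intuition that the two correlated variables share a single underlying dimension.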
Other factor-retention strategies that can be applied include the scree test, Bartlett's test, the maximum likelihood method (MLM) and parallel analysis; the number can also be determined by a desired proportion or amount of explained variance accounted for by the factors obtained. In any case, the number of latent factors should be determined primarily on the grounds of theoretical expectations and the conceptualization of the target construct. (Masaki Matsunaga, 2010)
Notice that during a PAF some of the eigenvalues can even be negative if the matrix is not of full rank.
Although it is strange to have a negative variance, this happens because the factor analysis is only analyzing the
common variance, which is less than the total variance. If we were doing a principal components analysis, we
would have had 1's on the diagonal, which means that all of the variance is being analyzed (which is another
way of saying that we are assuming that we have no measurement error), and we would not have negative
eigenvalues. In general, it is not uncommon to have negative eigenvalues. (IDRE Research Technology Group)
3.3. Correlation matrix
With respect to the correlation matrix, two things are important: the variables have to be intercorrelated, but they should not correlate too highly (extreme multicollinearity and singularity), as this would cause difficulties in determining the unique contribution of the variables to a factor. The intercorrelation can be checked using Bartlett's test of sphericity, which tests the null hypothesis that the original correlation matrix is an identity matrix. This test has to be significant: if the correlation matrix were an identity matrix, there would be no correlations between the variables. Multicollinearity, in turn, can be detected via the determinant of the correlation matrix: if the determinant is greater than 0.00001, then there is no multicollinearity. (Kootstra 2004)
The smaller the determinant, the more significant the matrix; values of 0.05 or less are considered good for factor analysis.
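The determinant check described above can be done mechanically; here is a minimal Python/numpy sketch of the 0.00001 rule (the function name and example matrices are invented for illustration):

```python
import numpy as np

def multicollinearity_ok(R, threshold=1e-5):
    """True if det(R) exceeds the threshold, i.e. no extreme
    multicollinearity according to the rule quoted above (Kootstra 2004)."""
    return bool(np.linalg.det(R) > threshold)

# A well-behaved 2-variable correlation matrix ...
R_good = np.array([[1.0, 0.3],
                   [0.3, 1.0]])
# ... and a nearly singular one (two almost identical variables).
R_bad = np.array([[1.0,      0.999999],
                  [0.999999, 1.0]])
print(multicollinearity_ok(R_good), multicollinearity_ok(R_bad))  # True False
```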
---------------------------------------------------------------------------------------------------
Example:
If det(R) = 0.00001043, then ln|R| = -11.4137.
Bartlett's (1950) chi-square test of significance of the correlation matrix:

    X^2 = -( n - 1 - (2v + 5)/6 ) * ln|R| =
        = -(145 - 1 - (2*24 + 5)/6)(-11.4137) = -(144 - 53/6)(-11.4137) ≈ 1543.

The degrees of freedom are df = v(v - 1)/2 = 24(24 - 1)/2 = 12 * 23 = 276.
Because chi-square tables do not list such a large df, the chi-square can be transformed to t. The t is 31 and is highly significant! The matrix can be legitimately factored! (Gorsuch R. 1983)
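The arithmetic of the worked example can be reproduced in a few lines of Python (the function name is ours; n is the number of cases, v the number of variables):

```python
def bartlett_chi2(n, v, ln_det_R):
    """Bartlett's (1950) test of a correlation matrix:
    chi2 = -(n - 1 - (2v + 5)/6) * ln|R|, with df = v(v - 1)/2."""
    chi2 = -(n - 1 - (2 * v + 5) / 6) * ln_det_R
    df = v * (v - 1) // 2
    return chi2, df

# Reproduce the example: n = 145 cases, v = 24 variables, ln|R| = -11.4137.
chi2, df = bartlett_chi2(145, 24, -11.4137)
print(round(chi2), df)   # 1543 276, matching the example above
```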
4. Factor rotation
After factor extraction it might be difficult to interpret and name the factors/components on the basis of their factor loadings. The first factor always accounts for the maximum part of the variance, and thereby appears as the most important factor. Interpretation of the factors can therefore be very difficult. A solution for this difficulty is factor rotation. Factor rotation alters the pattern of the factor loadings so as to best approximate simple structure, and hence can improve interpretation.
The matrix is said to have simple structure when
1. Each row contains at least one zero.
2. Each column contains at least m zeroes.
3. For each pair of columns, there should be at least m variables whose entries vanish in one column but not in the
other.
The orthogonal rotations named Varimax, Quartimax and Equamax all belong to the Orthomax rotation class.
4.1. Varimax
Suggested by Henry Felix Kaiser in 1958, it is a popular scheme for orthogonal rotation which cleans up the factors
as follows: "for each factor, high loadings (correlations) will result for a few variables; the rest will be near zero."
The VARIMAX rotation (gamma = 1) maximizes the variance of the squared factor loadings in each factor, i.e., it SIMPLIFIES the COLUMNS of the factor loading matrix. In each factor the large loadings are increased and the small ones are decreased, so that each FACTOR has only a few variables with large loadings. By minimizing the number of variables that have high loadings on each factor, it simplifies the interpretation of the factors.
4.2. Quartimax
In contrast, the QUARTIMAX rotation (gamma = 0) maximizes the variance of the squared factor loadings in
each variable, i.e., simplifies the ROWS of the factor loading matrix. In each variable the large loadings are
increased and the small ones are decreased so that each VARIABLE will only load on a few factors.
By minimizing the number of factors needed to explain each variable, it simplifies the interpretation of the observed variables.
4.3. Equamax
EQUAMAX is a rotation method (gamma = m/2) that is a combination of the Varimax and Quartimax methods. It minimizes both the number of variables that load highly on a factor and the number of factors needed to explain a variable.
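Since the whole Orthomax family differs only in the value of gamma, one routine covers all three rotations. The following is a sketch of the standard SVD-based Orthomax algorithm in Python/numpy; the loading matrix is hypothetical, and this is not the program's own implementation:

```python
import numpy as np

def orthomax(A, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthomax rotation of a loading matrix A (variables x factors).
    gamma = 1 is Varimax, gamma = 0 Quartimax, gamma = m/2 Equamax."""
    p, m = A.shape
    T = np.eye(m)                      # accumulated rotation matrix
    obj = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient of the orthomax criterion (standard SVD algorithm).
        G = A.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        U, S, Vt = np.linalg.svd(G)
        T = U @ Vt
        new_obj = S.sum()
        if new_obj < obj * (1 + tol):  # no further improvement
            break
        obj = new_obj
    return A @ T, T

# Hypothetical unrotated loadings: 4 variables, 2 factors.
A = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.8],
              [0.3, 0.9]])
L_rot, T = orthomax(A, gamma=1.0)      # Varimax
```

Because T is orthogonal, the communalities (row sums of squared loadings) are unchanged by the rotation, while the variance of the squared loadings within each column can only grow.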
4.4. Graphical rotation
Although appears historically as a first method of rotation, it is dissused nowdays. But along with it disappeared
the possibility of direct monitoring the process of rotation too, when it could be useful.
It can be proved that, for orthogonal factors, the correlation between the variables Xi and Xj equals the sum of the products of their loadings: r_ij = a_i1*a_j1 + a_i2*a_j2 + ... + a_in*a_jn.
5. Factor scores
Factor scores are an attempt to quantify the qualitative results of a factor analysis, enabling further examinations that use them, e.g., as new variables in a multiple regression analysis.
Regression factor scores are calculated as linear combinations of the standardized variable scores and the factor score coefficients, where the factor score coefficients are obtained from the factor loadings weighted by the corresponding eigenvalues/squared sums.
    f_ik = Σ (j = 1..r) z_ij * a_jk / λ_k ,   i = 1..m, k = 1..n
    (m = number of cases, r = number of variables, n = number of factors)
Because of this standardization by construction, their mean is always 0, and their standard deviation is 1 if principal component methods are used, or equal to the squared multiple correlation between factors and variables (typically used as the communality estimate) if principal axis methods are used. (DiStefano, Christine)
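For the principal-components case, this recipe (standardized data times loadings divided by eigenvalues) can be sketched in a few lines of Python/numpy; the data matrix here is invented purely for illustration:

```python
import numpy as np

# Hypothetical raw data: 6 cases (rows) and 3 variables (columns).
X = np.array([[2.0, 4.1, 1.0],
              [3.0, 5.2, 0.5],
              [4.0, 6.1, 0.2],
              [5.0, 6.9, 0.8],
              [6.0, 8.2, 0.1],
              [7.0, 9.0, 0.6]])

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized variable scores

R = np.corrcoef(Z, rowvar=False)           # correlation matrix
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]           # sort eigenvalues descending
eigval, eigvec = eigval[order], eigvec[:, order]

A = eigvec * np.sqrt(eigval)               # unrotated loadings a_jk
B = A / eigval                             # score coefficients b_jk = a_jk / lambda_k
F = Z @ B                                  # regression factor scores

# By construction the scores have mean 0 and standard deviation 1
# (principal components case), and they are mutually uncorrelated.
```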
Factor scores are transformations of the observed variables and represent the estimated deviation from the mean of a particular variable, in units of standard deviation.
A score of 0 on a factor therefore means that the importance ratings of the relevant attributes are close to the average for that sample. Similarly, a negative score corresponds to a variable with lower than average importance ratings and indicates an inverted relationship to those with positive loadings; e.g., a score of -2 means about two standard deviations below the mean.
To say the factors are uncorrelated means that the factor scores on the factor patterns are uncorrelated, and not
necessarily the factor loadings. Factor loadings are, however, independent (orthogonal) (Suhr, D.D.)
Notice also that the communality (h^2) values, i.e. the squared sums of a variable's loadings on the unrotated factors, do not change under orthogonal rotations; hence factor scores may be computed from either the unrotated or the rotated factor matrix!
____________________________
Dipl.ecc. Antal Pinter, Subotica
pantal24@gmail.com
REFERENCES:
Rummel, R.J.: UNDERSTANDING FACTOR ANALYSIS
(http://www.hawaii.edu/powerkills/UFA.HTM#P2.5)
DiStefano, Christine & Zhu, Min & Mindrila, Diana: Understanding and Using Factor Scores:
Considerations for the Applied Researcher. University of South Carolina.
( http://pareonline.net/pdf/v14n20.pdf )
Suhr, Diana D., Ph.D.: Principal Component Analysis vs. Exploratory Factor Analysis . University of
Northern Colorado ( http://www2.sas.com/proceedings/sugi30/203-30.pdf )
Daniel, J.Denis, Ph.D.: DATA & DECISION - Factor Analysis II. University of Montana, 2009
(http://psychweb.psy.umt.edu/denis/datadecision/factor/dd_fa_part_2_aug_2009.pdf)
Costello, Anna B. & Osborne, Jason.W.: Best Practices in Exploratory Factor Analysis.
North Carolina State University, 2005 (http://pareonline.net/pdf/v10n7.pdf)
Matsunaga, Masaki: How to Factor-analyze Your Data Right: Do's, Don'ts, and How-Tos, Rikkyo
University.
(http://mvint.usbmed.edu.co:8002/ojs/index.php/web/article/download/464/605/464-811-1-PB.pdf )
IDRE Research Technology Group: SAS Annotated Output Factor Analysis, University of California.
(http://www.ats.ucla.edu/stat/sas/output/factor.htm)
Kootstra, Gerrit Jan: Exploratory Factor Analysis , 2004
(http://www.let.rug.nl/~nerbonne/teach/rema-stats-meth-seminar/Factor-Analysis-Kootstra-04.PDF)
Jahn, Walter - Vahle, Hans: Die Faktoranalyse und ihre Anwendung, Verlag Die Wirtschaft, Berlin, 1968
Jahn, Walter - Vahle, Hans: A faktoranalízis és alkalmazása, Közgazdasági és Jogi Könyvkiadó, Budapest, 1974
Fischer, G. - Roppert, J.: Ein Verfahren der Transformationsanalyse faktorenanalytischer Ergebnisse, 1965
Chmielewski, Frank-M.: Impact of climate changes on crop yields of winter rye in Halle (southeastern
Germany), 1901 to 1980, Meteorological Institute, Humboldt University at Berlin
(http://www.int-res.com/articles/cr/2/c002p023.pdf)
Báló, Barnabás - Makra, L. - Matyasovszky, I. - Csépe, Z.: Association of Sociodemographic and
Environmental Factors with Allergic Rhinitis and Asthma, University of Szeged, Eötvös Loránd
University, Budapest, 2012
(http://www2.sci.u-szeged.hu/eghajlattan/akta12/2012_ACTA_46_p04_Balo_et_al.pdf )
Hedrih, Vladimir: Psihološka računarska statistika v0.3 - Faktorska analiza, Niš, 2005
(www.hm.co.rs/statistika/knjiga/faktoranaliz.doc)
Gorsuch, Richard L.: Factor Analysis, Hillsdale, NJ, Lawrence Erlbaum Associates, 1983
Darlington, Richard B. Factor analysis
(http://www.unt.edu/rss/class/mike/Articles/FactorAnalysisDarlington.pdf)