Factor Analysis
Data reduction tool
Removes redundancy or duplication from a set of
correlated variables
Represents correlated variables with a smaller set of
derived variables.
Factors are formed that are relatively independent of one
another.
Two types of variables:
latent variables: factors
observed variables
Cohesion Variables:
G1 (I do not enjoy being a part of the social environment of this exercise group)
G2 (I am not going to miss the members of this exercise group when the program
ends)
G3 (I am unhappy with my exercise group's level of desire to exceed)
G4 (This exercise program does not give me enough opportunities to improve my
personal performance)
G5 (For me, this exercise group has become one of the most important social
groups to which I belong)
G6 (Our exercise group is united in trying to reach its goals for performance)
G7 (We all take responsibility for the performance by our exercise group)
G8 (I would like to continue interacting with some of the members of this exercise
group after the program ends)
G9 (If members of our exercise group have problems in practice, everyone wants
to help them)
G10 (Members of our exercise group do not freely discuss each athlete's
responsibilities during practice)
G11 (I feel like I work harder during practice than other members of this exercise
group)
Other examples
Diet
Air pollution
Personality
Customer satisfaction
Depression
Quality of Life
2. Screening of Variables:
identifies groupings to allow us to select one variable to
represent many
useful in regression (recall collinearity)
3. Summary:
Allows us to describe many variables using a few factors
4. Clustering of objects:
Helps us to put objects (people) into categories depending on
their factor scores
Standard Result
-----------------------------------
    Variable |  Factor1   Factor2 |
-------------+--------------------+
    notenjoy |  -0.3118    0.5870 |
     notmiss |  -0.3498    0.6155 |
desireexceed |  -0.1919    0.8381 |
personalpe~m |  -0.2269    0.7345 |
importants~l |   0.5682   -0.1748 |
 groupunited |   0.8184   -0.1212 |
responsibi~y |   0.9233   -0.1968 |
    interact |   0.6238   -0.2227 |
problemshelp |   0.8817   -0.2060 |
  notdiscuss |  -0.0308    0.4165 |
  workharder |  -0.1872    0.5647 |
-----------------------------------
How to interpret?
Authors may conclude something like:
"We were able to derive two factors from the 11 items. The first factor is defined as 'teamwork'. The second factor is defined as 'personal competitive nature'. These two factors describe 72% of the variance among the items."
X1 = F + e1
X2 = F + e2
...
Xm = F + em
var(ej) = var(ek), j ≠ k
Reality:
X1 = λ1 F + e1
X2 = λ2 F + e2
...
Xm = λm F + em
var(ej) ≠ var(ek), j ≠ k
(unequal sensitivity to change in factor)
(Related to Item Response Theory (IRT))
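Because factor analysis works from correlations, it can help to write out what the "reality" model implies for them. A sketch, under assumptions that are not stated on the slide: each item is standardized, F has variance 1, and the errors are uncorrelated with F and with each other.

```latex
\begin{align*}
\operatorname{var}(X_j) &= \lambda_j^2 + \operatorname{var}(e_j) = 1, \\
\operatorname{corr}(X_j, X_k) &= \lambda_j \lambda_k \qquad (j \neq k).
\end{align*}
```

Under these assumptions, two items are strongly correlated only when both load heavily on F, and var(ej) is the part of item j that the factor leaves unexplained.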
Multi-Factor Models
Two factor orthogonal model
ORTHOGONAL = INDEPENDENT
Example: cohesion has two domains
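X1 = λ11 F1 + λ12 F2 + e1
X2 = λ21 F1 + λ22 F2 + e2
...
X11 = λ11,1 F1 + λ11,2 F2 + e11

Loadings:
λ11   = -0.31    λ12   =  0.59
λ21   = -0.35    λ22   =  0.62
λ31   = -0.19    λ32   =  0.84
λ41   = -0.23    λ42   =  0.73
λ51   =  0.57    λ52   = -0.17
λ61   =  0.82    λ62   = -0.12
λ71   =  0.92    λ72   = -0.20
λ81   =  0.62    λ82   = -0.22
λ91   =  0.88    λ92   = -0.21
λ10,1 = -0.03    λ10,2 =  0.42
λ11,1 = -0.19    λ11,2 =  0.56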
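As a rough numerical check of the two-factor orthogonal model: the correlation it implies between two different items is the sum, over factors, of the products of their loadings. This assumes standardized items, uncorrelated unit-variance factors, and uncorrelated errors, none of which is stated on the slide. The sketch uses the rotated loadings for groupunited and responsibility from the Standard Result table; the function name is made up for illustration.

```python
def implied_correlation(loadings_j, loadings_k):
    """Correlation implied by an orthogonal factor model for two distinct items."""
    return sum(a * b for a, b in zip(loadings_j, loadings_k))

# Rotated loadings (Factor1, Factor2) copied from the Standard Result table.
groupunited = (0.8184, -0.1212)
responsibility = (0.9233, -0.1968)

r_implied = implied_correlation(groupunited, responsibility)
print(f"implied correlation: {r_implied:.4f}")
```

The implied value is about 0.78, which can be set beside the observed correlation for the same pair of items (0.8016 Pearson, 0.856 polychoric) to judge how well two factors reproduce it.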
Data Exploration
Histograms
normality
discreteness
outliers
Same scale
high = good, low = bad?
[Figure: histograms (y-axis: Frequency) of NotEnjoyPOST, NotMissPOST, DesireExceedPOST, PersonalPerformPOST, ImportantSocialPOST, GroupUnitedPOST, ResponsibilityPOST, InteractPOST, ProblemsHelpPOST, NotDiscussPOST, WorkHarderPOST]
Data exploration
Correlation Matrix
. pwcorr notenjoy-workharder
             | notenjoy  notmiss desire~d person~m import~l groupu~d respon~y
-------------+---------------------------------------------------------------
    notenjoy |   1.0000
     notmiss |   0.3705   1.0000
desireexceed |   0.2609   0.3987   1.0000
personalpe~m |   0.2552   0.3472   0.5946   1.0000
importants~l |  -0.2514  -0.3357  -0.1384  -0.3123   1.0000
 groupunited |  -0.1732  -0.2460  -0.2384  -0.1359   0.4364   1.0000
responsibi~y |  -0.2554  -0.3663  -0.2908  -0.2507   0.4399   0.8016   1.0000
    interact |  -0.1847  -0.2966  -0.2162  -0.2294   0.4415   0.4251   0.5174
problemshelp |  -0.2561  -0.2865  -0.2567  -0.1940   0.4159   0.6498   0.7748
  notdiscuss |   0.1610   0.0763   0.2253   0.2193  -0.0242   0.0027  -0.0598
  workharder |   0.3482   0.1606   0.3794   0.3848  -0.0010  -0.2765  -0.3083

             | interact proble~p notdis~s workha~r
-------------+------------------------------------
    interact |   1.0000
problemshelp |   0.5446   1.0000
  notdiscuss |  -0.0346  -0.0699   1.0000
  workharder |  -0.1063  -0.2358   0.2660   1.0000
[Figure: plots of the items, each axis running from 1 to 5]
Valid correlations?
Data Matrix
Factor analysis is totally dependent on correlations
between variables.
Factor analysis summarizes correlation structure
Data Matrix (rows O1...On, columns v1...vk)
  -> Correlation Matrix (v1...vk by v1...vk)
  -> Factor Matrix (rows v1...vk, columns F1...Fj)
Important implications
Correlation matrix must be valid measure of
association
Likert scale? i.e. on a scale of 1 to K?
Consider previous set of plots
Is Pearson (linear) correlation a reasonable
measure of association?
Polychoric correlations
assume that variables are truncated versions of continuous
variables
only appropriate if continuous underlying assumption makes
sense
                |    notenjoy      notmiss  desireexceed
----------------+----------------------------------------
       notenjoy |           1
        notmiss |   .64411349            1
   desireexceed |   .44814752    .60971951             1
personalperform |   .37687346    .49572253     .74640077
importantsocial |  -.33466689   -.35262233    -.18773414
    groupunited |  -.26640575   -.25987331    -.32414348
 responsibility |  -.38218019   -.43174724    -.34289848
       interact |  -.31300025   -.41147172    -.28711931
   problemshelp |  -.40864072   -.44688816    -.34338549
     notdiscuss |   .28367782     .2071563     .33714715
     workharder |   .49864257    .26866894     .50117974

                | personalperform  importantsocial  groupunited
----------------+-----------------------------------------------
personalperform |               1
importantsocial |      -.42902852                1
    groupunited |      -.22011768        .47698468            1
 responsibility |      -.32272048        .49187407    .85603168
       interact |      -.37003374        .51150655    .46469124
   problemshelp |      -.31435615        .51458893    .75552992
     notdiscuss |       .28191066       -.07289447    -.0934676
     workharder |        .4766736        .02547056   -.35603256

                | responsibility     interact  problemshelp
----------------+-------------------------------------------
 responsibility |              1
       interact |      .59252523            1
   problemshelp |      .84727982    .60910395             1
     notdiscuss |     -.11548039   -.09653691    -.11580359
     workharder |     -.37311526   -.13316066    -.30122735
Eigenvalues
To select how many factors to use, consider
eigenvalues from a principal components analysis
Two interpretations:
eigenvalue: the equivalent number of variables that the factor
represents
eigenvalue: the amount of variance in the data described by the
factor.
Rules to go by:
Cohesion Example
. factormat R, pcf n(134)
(obs=134)

Factor analysis/correlation                  Number of obs    =      134
    Method: principal-component factors      Retained factors =        3
    Rotation: (unrotated)                    Number of params =       30

--------------------------------------------------------------------------
      Factor |   Eigenvalue   Difference   Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      4.96356      3.14606       0.4512       0.4512
     Factor2 |      1.81751      0.76378       0.1652       0.6165
     Factor3 |      1.05373      0.27749       0.0958       0.7123
     Factor4 |      0.77624      0.02065       0.0706       0.7828
     Factor5 |      0.75559      0.22587       0.0687       0.8515
     Factor6 |      0.52972      0.05654       0.0482       0.8997
     Factor7 |      0.47318      0.24670       0.0430       0.9427
     Factor8 |      0.22647      0.02484       0.0206       0.9633
     Factor9 |      0.20163      0.07341       0.0183       0.9816
    Factor10 |      0.12822      0.05407       0.0117       0.9933
    Factor11 |      0.07415            .       0.0067       1.0000
--------------------------------------------------------------------------
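The Proportion and Cumulative columns follow directly from the eigenvalues: with 11 standardized items the total variance is 11, so each proportion is the eigenvalue divided by 11. A minimal sketch of that arithmetic follows. The "eigenvalue above 1" cutoff used at the end is one common rule of thumb and is an assumption here, not something the slide states.

```python
# Eigenvalues copied from the principal-component factor output.
eigenvalues = [4.96356, 1.81751, 1.05373, 0.77624, 0.75559, 0.52972,
               0.47318, 0.22647, 0.20163, 0.12822, 0.07415]

n_items = len(eigenvalues)  # 11 standardized items, so total variance is 11

# Proportion of total variance described by each factor, and the running total.
proportions = [ev / n_items for ev in eigenvalues]
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Assumed rule of thumb: count the factors whose eigenvalue exceeds 1.
n_above_one = sum(1 for ev in eigenvalues if ev > 1)

for i, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"Factor{i}: eigenvalue={ev:.5f} proportion={p:.4f} cumulative={c:.4f}")
print("eigenvalues above 1:", n_above_one)
```

Three eigenvalues exceed 1, which is consistent with "Retained factors = 3" in the output, and the first two factors together account for about 62% of the variance.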
. screeplot
[Figure: scree plot of Eigenvalues (vertical axis) against Number (horizontal axis)]
                                             Number of obs    =      134
                                             Retained factors =        2
                                             Number of params =       21
.........
Factor loadings (pattern matrix) and unique variances
-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
    notenjoy |  -0.6091    0.2661 |      0.5582
     notmiss |  -0.6566    0.2648 |      0.4988
desireexceed |  -0.6712    0.5373 |      0.2608
personalpe~m |  -0.6342    0.4344 |      0.4091
importants~l |   0.5538    0.2162 |      0.6466
 groupunited |   0.7164    0.4137 |      0.3156
responsibi~y |   0.8456    0.4197 |      0.1088
    interact |   0.6271    0.2132 |      0.5613
problemshelp |   0.8187    0.3866 |      0.1802
  notdiscuss |  -0.2830    0.3072 |      0.8256
  workharder |  -0.4977    0.3260 |      0.6461
-------------------------------------------------
Interpretability?
Not interpretable at this stage
In an unrotated solution, the first factor describes most of the
variability.
Ideally we want to
spread variability more evenly among factors.
make factors interpretable
[Figure: items plotted on the factor axes]

       Factor 1   Factor 2
x1        0.5        0.5
x2        0.8        0.8
x3       -0.7        0.7
x4       -0.5       -0.5

       Factor 1   Factor 2
x1        0          0.6
x2        0          0.9
x3       -0.9        0
x4        0         -0.9
. rotate

Factor analysis/correlation                  Number of obs    =      134
    Method: iterated principal factors       Retained factors =        2
    Rotation: orthogonal varimax (Kaiser off) Number of params =       21

Rotated Solution
--------------------------------------------------------------------------
      Factor |     Variance   Difference   Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      3.35544      0.72180       0.5603       0.5603
     Factor2 |      2.63364            .       0.4397       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(55) = 959.26 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances
-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
    notenjoy |  -0.3118    0.5870 |      0.5582
     notmiss |  -0.3498    0.6155 |      0.4988
desireexceed |  -0.1919    0.8381 |      0.2608
personalpe~m |  -0.2269    0.7345 |      0.4091
importants~l |   0.5682   -0.1748 |      0.6466
 groupunited |   0.8184   -0.1212 |      0.3156
responsibi~y |   0.9233   -0.1968 |      0.1088
    interact |   0.6238   -0.2227 |      0.5613
problemshelp |   0.8817   -0.2060 |      0.1802
  notdiscuss |  -0.0308    0.4165 |      0.8256
  workharder |  -0.1872    0.5647 |      0.6461
-------------------------------------------------
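An orthogonal rotation only turns the factor axes; it does not change how well the two factors together describe each item. The sketch below illustrates this by turning the unrotated loadings through a single angle and comparing the result with the rotated table. The angle of about 38.43 degrees is inferred here by comparing the two tables; Stata does not report it, so treat it as an assumption.

```python
import math

# Unrotated and rotated loadings (Factor1, Factor2), copied from the
# unrotated and rotated loading tables in this section.
unrotated = {
    "notenjoy": (-0.6091, 0.2661), "notmiss": (-0.6566, 0.2648),
    "desireexceed": (-0.6712, 0.5373), "personalperform": (-0.6342, 0.4344),
    "importantsocial": (0.5538, 0.2162), "groupunited": (0.7164, 0.4137),
    "responsibility": (0.8456, 0.4197), "interact": (0.6271, 0.2132),
    "problemshelp": (0.8187, 0.3866), "notdiscuss": (-0.2830, 0.3072),
    "workharder": (-0.4977, 0.3260),
}
rotated = {
    "notenjoy": (-0.3118, 0.5870), "notmiss": (-0.3498, 0.6155),
    "desireexceed": (-0.1919, 0.8381), "personalperform": (-0.2269, 0.7345),
    "importantsocial": (0.5682, -0.1748), "groupunited": (0.8184, -0.1212),
    "responsibility": (0.9233, -0.1968), "interact": (0.6238, -0.2227),
    "problemshelp": (0.8817, -0.2060), "notdiscuss": (-0.0308, 0.4165),
    "workharder": (-0.1872, 0.5647),
}

# Assumed angle, inferred from the two tables (not reported by Stata).
THETA_DEGREES = 38.43

def rotate_pair(x, y, degrees):
    """Coordinates of the point (x, y) in axes turned counter-clockwise by `degrees`."""
    t = math.radians(degrees)
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))

max_diff = 0.0          # largest gap between our rotation and the rotated table
max_comm_change = 0.0   # largest change in communality (sum of squared loadings)
for item, (x, y) in unrotated.items():
    rx, ry = rotate_pair(x, y, THETA_DEGREES)
    tx, ty = rotated[item]
    max_diff = max(max_diff, abs(rx - tx), abs(ry - ty))
    max_comm_change = max(max_comm_change,
                          abs((x * x + y * y) - (rx * rx + ry * ry)))

print(f"largest discrepancy from the rotated table: {max_diff:.4f}")
print(f"largest change in communality: {max_comm_change:.2e}")
```

The communalities are unchanged by the rotation, which is why the Uniqueness column is identical in the unrotated and rotated tables.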
Rotation options
Orthogonal
Oblique
allows dependence of factors
makes distinctions sharper (loadings closer to 0s and 1s)
can be harder to interpret once you lose
independence of factors
Uniqueness
Should all items be retained?
Uniqueness for each item describes the proportion of the
item's variance that is not described by the factor model
Recall an R-squared:
proportion of variance in Y explained by X
1-Uniqueness:
proportion of the variance in Xk explained by F1, F2, etc.
Uniqueness:
represents what is left over that is not explained by factors
error that remains
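The relationship between loadings and uniqueness can be checked by hand. A minimal sketch, assuming standardized items and orthogonal factors (as in the varimax solution): uniqueness is 1 minus the sum of an item's squared loadings. The loadings and reported uniqueness values are copied from the rotated factor loadings table; the function name is made up for illustration.

```python
# (Factor1 loading, Factor2 loading, uniqueness reported by Stata),
# copied from the rotated factor loadings table.
items = {
    "notenjoy": (-0.3118, 0.5870, 0.5582),
    "notmiss": (-0.3498, 0.6155, 0.4988),
    "desireexceed": (-0.1919, 0.8381, 0.2608),
    "personalperform": (-0.2269, 0.7345, 0.4091),
    "importantsocial": (0.5682, -0.1748, 0.6466),
    "groupunited": (0.8184, -0.1212, 0.3156),
    "responsibility": (0.9233, -0.1968, 0.1088),
    "interact": (0.6238, -0.2227, 0.5613),
    "problemshelp": (0.8817, -0.2060, 0.1802),
    "notdiscuss": (-0.0308, 0.4165, 0.8256),
    "workharder": (-0.1872, 0.5647, 0.6461),
}

def uniqueness(loadings):
    """1 minus the communality (sum of squared loadings) for one item."""
    return 1.0 - sum(l * l for l in loadings)

computed = {name: uniqueness(vals[:2]) for name, vals in items.items()}
for name, (f1, f2, reported) in items.items():
    print(f"{name:16s} computed={computed[name]:.4f} reported={reported:.4f}")
```

Items such as notdiscuss, with most of their variance left unexplained by the two factors, are the natural candidates when asking whether every item should be retained.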
Interpretation
Naming of Factors
Wrong Interpretation: Factors represent separate
groups of people.
Right Interpretation: Each factor represents a continuum
along which people vary (and the dimensions are independent
if the rotation is orthogonal)
Factor Scoring
. predict f1 f2
(regression scoring assumed)

Scoring coefficients
(method = regression; based on varimax rotated factors)
----------------------------------
    Variable |  Factor1   Factor2
-------------+--------------------
    notenjoy | -0.03322   0.19223
     notmiss |  0.04725   0.13279
desireexceed |  0.15817   0.54996
personalpe~m | -0.04037   0.21452
importants~l |  0.02971  -0.02168
 groupunited |  0.12273   0.12938
responsibi~y |  0.60379   0.07719
    interact |  0.04594  -0.00870
problemshelp |  0.31516   0.06376
  workharder |  0.11750   0.10810
----------------------------------
[Figure: factor scores (axis label: Scores for factor 2), graphs by progrm: Dragon Boat, Walking]
Our example?
Preliminary analysis of pilot data!
Concern: negative items hang together, positive items
hang together:
Is separation into two factors:
based on two different factors (teamwork, personal competitive nature)
based on negative versus positive items?
Stata Code
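* Pearson (pairwise) correlations among the 11 items
pwcorr notenjoy-workharder
* Polychoric correlations (user-written command); store the matrix as R
polychoric notenjoy-workharder
matrix R = r(R)
* Principal-component factors, to inspect the eigenvalues
factormat R, pcf n(134)
screeplot
* Iterated principal factors, retaining two factors, then rotate
factormat R, n(134) ipf factor(2)
rotate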
Stata Options
Pearson correlation
Use factor for principal components and factor analysis
choose estimation approach: ipf, pcf, ml, pf
choose to retain n factors: factor(n)
Polychoric correlation
Use factormat for principal components and factor analysis
choose estimation approach: ipf, pcf, ml, pf
choose to retain n factors: factor(n)
include n(xxx) to describe the sample size