
Factor Analysis

N P Singh
Professor

History

Factor analysis was invented by psychologist Charles Spearman.

What is a factor?

Combination of original variables

Example
Student No.   Finance (Y1)   Marketing (Y2)   Policy (Y3)
1             3              6                5
2             7              3                3
3             10             9                8
4             3              9                7
5             10             6                5
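A factor analysis starts from the correlation matrix of the measured variables. The minimal NumPy sketch below computes it for the five students' grades as read from the table above.

```python
import numpy as np

# Grades of the five students: columns are Finance (Y1), Marketing (Y2), Policy (Y3)
grades = np.array([
    [3, 6, 5],
    [7, 3, 3],
    [10, 9, 8],
    [3, 9, 7],
    [10, 6, 5],
], dtype=float)

# Observed correlation matrix (variables are in columns, hence rowvar=False)
R = np.corrcoef(grades, rowvar=False)
print(np.round(R, 2))
```

Marketing and policy grades move together closely (r is about 0.98), while finance is nearly uncorrelated with both (|r| below 0.1). The two-factor explanation that follows is meant to account for this pattern.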

Examples
It has been suggested that these grades are functions of two underlying factors, F1 and F2, tentatively described as quantitative ability and verbal ability, respectively.
It is assumed that each Y variable is linearly related to the two factors.

Factors
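Written out for the three grades, the assumed linear relationships take roughly the following form. Two notational choices are assumptions made here for illustration: the intercepts β_i0, and the use of β for the loadings.

```latex
\begin{aligned}
Y_1 &= \beta_{10} + \beta_{11} F_1 + \beta_{12} F_2 + e_1 \\
Y_2 &= \beta_{20} + \beta_{21} F_1 + \beta_{22} F_2 + e_2 \\
Y_3 &= \beta_{30} + \beta_{31} F_1 + \beta_{32} F_2 + e_3
\end{aligned}
```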

What are these error terms?
The error terms e1, e2, and e3 serve to indicate that the hypothesized relationships are not exact.
The parameters βij are referred to as loadings. For example, β12 is called the loading of variable Y1 on factor F2.

Factor Analysis

A data reduction technique designed to represent a wide range of attributes on a smaller number of dimensions.

Continued
In this MBA program, finance is highly quantitative, while marketing and policy have a strong qualitative orientation.
Quantitative skills should help a student in finance, but not in marketing or policy.
Verbal skills should be helpful in marketing or policy but not in finance.
In other words, the loadings are expected to have roughly this structure: Y1 (finance) loads substantially on F1 and close to zero on F2, while Y2 (marketing) and Y3 (policy) load close to zero on F1 and substantially on F2.


The Common Factor Model



This model proposes that each observed response (measure 1 through measure 5) is influenced partially by underlying common factors (factor 1 and factor 2) and partially by underlying unique factors (E1 through E5).
The strength of the link between each
factor and each measure varies, such
that a given factor influences some
measures more than others.
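To make the model concrete, the sketch below simulates five measures driven by two common factors plus unique factors. It then checks that their covariance matrix matches the form the model implies, ΛΛᵀ + Ψ, where Λ holds the loadings and Ψ is the diagonal matrix of unique variances. The loading values are hypothetical and chosen only for illustration. The factors are assumed uncorrelated with unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of simulated respondents

# Hypothetical loadings: measures 1-3 mainly reflect factor 1, measures 4-5 factor 2
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.1, 0.7],
])
communality = (loadings ** 2).sum(axis=1)   # variance due to the common factors
unique_var = 1.0 - communality              # unique variance, so each measure has variance 1

factors = rng.standard_normal((n, 2))                        # factor 1, factor 2
unique = rng.standard_normal((n, 5)) * np.sqrt(unique_var)   # E1 ... E5
measures = factors @ loadings.T + unique                     # measure 1 ... measure 5

implied_cov = loadings @ loadings.T + np.diag(unique_var)
sample_cov = np.cov(measures, rowvar=False)
print(np.round(implied_cov, 2))
print("largest discrepancy:", np.abs(sample_cov - implied_cov).max())
```

The off-diagonal entries come only from the shared factors, while the unique factors add to the diagonal alone.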

Factor Analysis

For example, suppose that a bank asked a large number of questions about a given branch. Consider how the following characteristics might be more parsimoniously represented by just a few constructs (factors).

Factor Analysis
- Benefits include: (1) a more concise representation of the marketing situation, which may improve communication; (2) the need for fewer questions on future surveys; and (3) the feasibility of perceptual maps.

- Ideally, interval data (e.g., ratings on a 7-point scale) are gathered on consumers' perceptions of a number of features, such as those noted above for the bank.

Examples of Data

personality.sav - a set of responses from a personality questionnaire.
SAQ.sav - fictional statistics anxiety questionnaire from Andy Field's textbook resources.

Principal Component Analysis
The purpose of PCA is to derive a relatively
small number of components that can
account for the variability found in a
relatively large number of measures.
This procedure, called data reduction, is
typically performed when a researcher does
not want to include all of the original
measures in analyses but still wants to work
with the information that they contain.
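The following is a minimal sketch of PCA as data reduction. It standardizes the measures, takes the eigen-decomposition of their correlation matrix, and forms component scores as linear combinations of the standardized measures. The data are the student grades from the earlier example, and the function name `principal_components` is illustrative only.

```python
import numpy as np

def principal_components(data):
    """Return eigenvalues, weight vectors, and component scores, largest component first."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardized measures
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs   # each component is a weighted sum of the measures
    return eigvals, eigvecs, scores

grades = np.array([[3, 6, 5], [7, 3, 3], [10, 9, 8], [3, 9, 7], [10, 6, 5]], dtype=float)
eigvals, weights, scores = principal_components(grades)
print("proportion of variance:", np.round(eigvals / eigvals.sum(), 3))
```

Keeping only the first one or two columns of `scores` is the data-reduction step. You end up with fewer variables that still carry most of the variance.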

PCA Model

Difference

The first difference is that the direction of influence is reversed: EFA assumes that the measured responses are based on the underlying factors, while in PCA the principal components are based on the measured responses.
The second difference is that EFA assumes that the
variance in the measured variables can be
decomposed into that accounted for by common
factors and that accounted for by unique factors.
The principal components are defined simply as
linear combinations of the measurements, and so
will contain both common and unique variance.
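Both differences can be seen side by side in the model equations. The notation here is an illustrative choice: X_j are the measured variables, C_k the components with weights w_jk, F_k the common factors with loadings λ_jk, and E_j the unique factors.

```latex
\text{PCA:}\quad C_k = \sum_{j=1}^{p} w_{jk} X_j
\qquad\qquad
\text{EFA:}\quad X_j = \sum_{k=1}^{m} \lambda_{jk} F_k + E_j
```

In PCA the component is an exact weighted sum, so it absorbs common and unique variance alike. In EFA the term E_j carries the unique variance, which is kept out of the common factors.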

What is FA & PCA?


FA and PCA (principal components analysis) are methods of data reduction
Take many variables and explain them with a few factors or components
Correlated variables are grouped together and separated from other variables with low or no correlation

What is FA & PCA?

FA and PCA are not much different from canonical correlation in terms of generating canonical variates from linear combinations of variables
However, there are now no "sides" of the equation
And you are not necessarily correlating the factors, components, variates, etc.

FA vs. PCA conceptually


FA produces factors; PCA produces
components
Factors cause variables; components are
aggregates of the variables

FA vs. PCA conceptually


FA analyzes only the variance shared among
the variables (common variance without
error or unique variance); PCA analyzes all
of the variance
FA: What underlying processes could produce these correlations? PCA: Just summarize the empirical associations; very data driven

General Steps to FA
Step 1: Selecting and Measuring a set of
variables in a given domain
Step 2: Data screening in order to prepare
the correlation matrix
Step 3: Factor Extraction
Step 4: Factor Rotation to increase
interpretability
Step 5: Interpretation
Further Steps: Validation and Reliability of
the measures
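Varimax is one common orthogonal choice for Step 4. Below is a minimal NumPy sketch of the widely used SVD-based varimax iteration. The function name and tolerance settings are illustrative choices. The demo is hypothetical: it takes a clean two-factor loading matrix, deliberately rotates it by 30 degrees so the structure is muddled, and lets varimax recover the simple structure.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-10):
    """Orthogonally rotate a loading matrix to maximize the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation
        grad = loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                      # nearest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                              # criterion stopped improving
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical simple structure, then muddled by a 30-degree rotation
true_loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
theta = np.radians(30)
turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = true_loadings @ turn

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged. Only the distribution of loadings across the factors changes.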

Good Factor

A good factor:
Makes sense
Will be easy to interpret
Has simple structure
Lacks complex loadings

Cumulative percent of variance explained.
We are looking for components with an eigenvalue above 1.0 (the Kaiser criterion).
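The eigenvalue-above-1.0 rule and the cumulative percent of variance can both be read off the eigenvalues of the correlation matrix, because those eigenvalues sum to the number of variables. The sketch below uses a hypothetical four-variable correlation matrix, and the function name is illustrative.

```python
import numpy as np

def kaiser_summary(corr):
    """Eigenvalues, cumulative % of variance, and number of components with eigenvalue > 1."""
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    cumulative_pct = 100 * np.cumsum(eigvals) / eigvals.sum()
    n_retain = int((eigvals > 1.0).sum())
    return eigvals, cumulative_pct, n_retain

# Hypothetical correlation matrix with two clear pairs of related variables
corr = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
])
eigvals, cumulative_pct, n_retain = kaiser_summary(corr)
print(np.round(eigvals, 2), np.round(cumulative_pct, 1), n_retain)
```

Here the eigenvalues are 1.7, 1.5, 0.5, and 0.3. Two components pass the 1.0 threshold, and together they explain 80% of the variance.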

Expensive, Appeals to Others, Reliable, Exciting, Attractive Looking, Latest Features, Luxury, Trend Setting, Trust, Distinctive, Not Conservative, Not Family, Not Basic

What shall these components be called?


EXCLUSIVE: Expensive, Exciting, Luxury, Distinctive, Not Conservative, Not Family, Not Basic
TRENDY: Appeals to Others, Attractive Looking, Trend Setting
RELIABLE: Reliable, Latest Features, Trust

Calculate Component Scores


EXCLUSIVE = (Expensive + Exciting + Luxury + Distinctive - Conservative - Family - Basic)/7
TRENDY = (Appeals to Others + Attractive Looking + Trend Setting)/3
RELIABLE = (Reliable + Latest Features + Trust)/3
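The scoring rule averages the ratings on each component's items. The items that load negatively (Conservative, Family, Basic) are subtracted rather than added. The small Python sketch below uses ratings that are hypothetical and only illustrate the arithmetic. The rating scale is not given here and is not assumed.

```python
# Hypothetical ratings from one respondent for one vehicle
ratings = {
    "Expensive": 6, "Exciting": 5, "Luxury": 6, "Distinctive": 5,
    "Conservative": 2, "Family": 1, "Basic": 2,
    "Appeals to Others": 6, "Attractive Looking": 7, "Trend Setting": 5,
    "Reliable": 7, "Latest Features": 6, "Trust": 6,
}

def component_scores(r):
    """Average the items on each component; negatively loading items are subtracted."""
    exclusive = (r["Expensive"] + r["Exciting"] + r["Luxury"] + r["Distinctive"]
                 - r["Conservative"] - r["Family"] - r["Basic"]) / 7
    trendy = (r["Appeals to Others"] + r["Attractive Looking"] + r["Trend Setting"]) / 3
    reliable = (r["Reliable"] + r["Latest Features"] + r["Trust"]) / 3
    return {"EXCLUSIVE": exclusive, "TRENDY": trendy, "RELIABLE": reliable}

scores = component_scores(ratings)
print({name: round(value, 2) for name, value in scores.items()})
```

Subtracting the negatively loading items is why an EXCLUSIVE score can fall below zero, as it does for some vehicles in the table.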

Vehicle    Exclusive   Trendy   Reliable
Beetle     1.4         6.7      6.9
Hummer     3.9         6.2      6.7
Lotus      4.1         7.3      6.7
Minivan    -1.67       4.83     6.5
Pick-Up    -0.43       4.93     6.3

The vehicles do not differ much on the Reliable dimension (6.3 to 6.9).


[Chart: Vehicle by Component, showing Exclusive and Trendy scores for Beetle, Hummer, Lotus, Minivan, and Pick-Up]

Types of FA

Exploratory FA
Summarizing data by grouping correlated
variables
Investigating sets of measured variables related
to theoretical constructs
Usually done near the onset of research
This is the type of FA and PCA discussed here

Types of FA

Confirmatory FA
A more advanced technique
Used when the factor structure is known or at least theorized
Testing generalization of the factor structure to new data, etc.
Typically tested through structural equation modeling (SEM)

Terminology
Observed Correlation Matrix
Reproduced Correlation Matrix
Residual Correlation Matrix
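For an orthogonal solution, the reproduced correlation matrix is the loading matrix times its transpose. The residual matrix is the observed matrix minus the reproduced one. The minimal sketch below uses hypothetical numbers: a one-factor model for three variables.

```python
import numpy as np

# Hypothetical one-factor loadings for three variables
loadings = np.array([[0.8], [0.7], [0.6]])

# Hypothetical observed correlation matrix
observed = np.array([
    [1.00, 0.60, 0.48],
    [0.60, 1.00, 0.42],
    [0.48, 0.42, 1.00],
])

# Reproduced correlations implied by the factor (communalities on the diagonal)
reproduced = loadings @ loadings.T
# Residuals: what the factor fails to reproduce (the off-diagonal entries are of interest)
residual = observed - reproduced
print(np.round(residual, 2))
```

Only the correlation between variables 1 and 2 is left partly unexplained (a residual of 0.04). Small off-diagonal residuals throughout suggest the factors account well for the observed correlations.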

Terminology

Orthogonal Rotation
Loading Matrix: correlation between each variable and the factor

Oblique Rotation
Factor Correlation Matrix: correlation between the factors
Structure Matrix: correlation between factors and variables
Pattern Matrix: unique relationship between each factor and variable, uncontaminated by overlap between the factors
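The oblique matrices are linked. Assuming standardized variables, the structure matrix equals the pattern matrix post-multiplied by the factor correlation matrix. When the factors are uncorrelated, pattern and structure therefore coincide. The sketch below uses hypothetical values.

```python
import numpy as np

# Hypothetical pattern matrix (4 variables x 2 oblique factors)
pattern = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.0, 0.8],
    [0.1, 0.6],
])
# Hypothetical factor correlation matrix
phi = np.array([[1.0, 0.3], [0.3, 1.0]])

# Structure matrix: correlations between the variables and the correlated factors
structure = pattern @ phi
print(np.round(structure, 2))
```

Variable 1 has a pattern loading of 0 on factor 2, yet it correlates 0.24 with that factor. This is the overlap between the factors that the pattern matrix leaves out.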

Terminology

Factor Coefficient Matrix: coefficients used to calculate factor scores (like regression coefficients)
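The sketch below assumes the regression method, one common approach. Under it, the factor coefficient matrix B solves R B = S, where R is the correlation matrix of the variables and S is the structure matrix (the loadings, in the orthogonal case). Factor scores are then the standardized variables times B, much like predictions from a regression. All values are hypothetical, and the function name is illustrative.

```python
import numpy as np

def regression_score_coefficients(corr, structure):
    """Factor score coefficient matrix B solving corr @ B = structure (regression method)."""
    return np.linalg.solve(corr, structure)

# Hypothetical one-factor example (orthogonal case, so structure = loadings)
loadings = np.array([[0.8], [0.7], [0.6]])
corr = loadings @ loadings.T
np.fill_diagonal(corr, 1.0)
coef = regression_score_coefficients(corr, loadings)

# Factor scores for two hypothetical respondents, given as standardized (z) scores
z = np.array([[1.0, 0.5, 0.8], [-0.5, -1.0, 0.0]])
factor_scores = z @ coef
print(np.round(coef.ravel(), 3), np.round(factor_scores.ravel(), 3))
```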

Questions
Three general goals: data reduction, describing relationships, and testing theories about relationships (next chapter)
How many interpretable factors exist in the data? Or, how many factors are needed to summarize the pattern of correlations?

Questions
What does each factor mean?
Interpretation?
What is the percentage of variance in
the data accounted for by the factors?

Questions
Which factors account for the most
variance?
How well does the factor structure fit a
given theory?
What would each subject's score be if they could be measured directly on the factors?

Considerations
(from Comrey and Lee, 1992)

Hypotheses about factors believed to underlie a domain
Should have 6 or more for a stable solution

Include marker variables
Pure variables correlated with only one factor
They define the factor clearly
Complex variables load on more than one factor and muddy the water

Considerations
(from Comrey and Lee, 1992)
Make sure the sample chosen is spread out
on possible scores on the variables and the
factors being measured
Factors are known to change across
samples and time points, so samples should
be tested before being pooled together
