
Correspondence Analysis

Correspondence analysis is a descriptive/exploratory technique designed to analyze simple two-way and multi-way tables containing some measure of correspondence between the rows and columns. The results provide information similar in nature to that produced by factor analysis techniques, and they allow one to explore the structure of the categorical variables included in the table. The most common table of this type is the two-way frequency cross-tabulation table.

In a typical correspondence analysis, a cross-tabulation table of frequencies is first standardised, so that the relative frequencies across all cells sum to 1.0. One way to state the goal of a typical analysis is to represent the entries in the table of relative frequencies in terms of the distances between individual rows and/or columns in a low-dimensional space. There are several parallels in interpretation between correspondence analysis and factor analysis.
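
A minimal Python sketch of this standardisation step, using the region by political outlook counts from the Correspondence Table analysed later in this deck. The variable names are illustrative only and are not part of SPSS.

```python
# Counts from the Correspondence Table
# (rows: Northeast, Midwest, South, West;
#  columns: Liberal, Tend Lib, Moderate, Tend Cons, Conservative).
counts = [
    [19, 23, 58, 16, 15],
    [26, 31, 71, 47, 35],
    [18, 27, 75, 46, 70],
    [30, 19, 40, 26, 33],
]

n = sum(sum(row) for row in counts)  # grand total of the table

# Divide every cell by the grand total so that all cells sum to 1.0.
rel_freq = [[cell / n for cell in row] for row in counts]

total = sum(sum(row) for row in rel_freq)
print(n, round(total, 6))
```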

The data summarise individuals' political affiliation (1, ..., 5) and geographic region (1, ..., 4), in 725 rows of data.

Political affiliation: 1 Liberal, 2 Tend Lib, 3 Moderate, 4 Tend Cons, 5 Conservative

Geographic region: 1 Northeast, 2 Midwest, 3 South, 4 West

Analyze > Dimension Reduction > Correspondence Analysis

Select the row and column variables and define their ranges. Having defined the ranges, use the buttons at the side of the screen to set the desired parameters.

Define the row range: select the row bounds, then Update and Continue. There are 4 regions.

Define the column range: select the column bounds, then Update and Continue. There are 5 political affiliations.

Finally, use the buttons at the side of the screen to set the desired parameters.

Select statistics.

Select plots.

Then use the OK button to run the analysis.

The Correspondence Table is simply the crosstabulation of the row and column variables, including the row and column marginal totals, serving as input.

Correspondence Table (columns are Political Outlook)

Region           Liberal  Tend Lib  Moderate  Tend Cons  Conservative  Active Margin
Northeast             19        23        58         16            15            131
Midwest               26        31        71         47            35            210
South                 18        27        75         46            70            236
West                  30        19        40         26            33            148
Active Margin         93       100       244        135           153            725
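
The cross-tabulation step itself can be sketched in Python as follows. The six (region, outlook) records are made up for illustration, because the 725 case-level rows are not reproduced in this deck; only the category coding is taken from the text.

```python
from collections import Counter

regions = ["Northeast", "Midwest", "South", "West"]  # codes 1..4
outlooks = ["Liberal", "Tend Lib", "Moderate", "Tend Cons", "Conservative"]  # codes 1..5

# Hypothetical case-level data: one (region code, outlook code) pair per person.
records = [(1, 3), (1, 3), (2, 4), (3, 5), (3, 5), (4, 1)]

pair_counts = Counter(records)

# Build the table of counts, then compute the marginal totals.
table = [[pair_counts[(r, c)] for c in range(1, len(outlooks) + 1)]
         for r in range(1, len(regions) + 1)]
row_margin = [sum(row) for row in table]
col_margin = [sum(col) for col in zip(*table)]

for name, row, total in zip(regions, table, row_margin):
    print(f"{name:10s}", row, total)
print("Margin    ", col_margin, sum(row_margin))
```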

The Row Profiles are the cell contents divided by their corresponding row total (e.g. 19/131 = 0.145 for the first cell). This table also shows the column masses (column marginals as a proportion of n) (e.g. 93/725 = 0.128). These are intermediate calculations on the way toward computing distances between points.
Row Profiles (columns are Political Outlook)

Region           Liberal  Tend Lib  Moderate  Tend Cons  Conservative  Active Margin
Northeast           .145      .176      .443       .122          .115          1.000
Midwest             .124      .148      .338       .224          .167          1.000
South               .076      .114      .318       .195          .297          1.000
West                .203      .128      .270       .176          .223          1.000
Mass                .128      .138      .337       .186          .211
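
The row profiles and column masses can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described above; the variable names are illustrative.

```python
# Counts from the Correspondence Table (rows: regions, columns: political outlook).
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Row profile: each cell divided by its row total, so every profile sums to 1.
row_profiles = [[cell / total for cell in row]
                for row, total in zip(counts, row_totals)]

# Column mass: each column total as a proportion of n.
col_masses = [total / n for total in col_totals]

print([round(v, 3) for v in row_profiles[0]])  # Northeast profile
print([round(v, 3) for v in col_masses])
```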

Column Profiles are the cell elements divided by the column marginals (e.g. 19/93 = 0.204). This table also shows the row masses (row marginals as a proportion of n) (e.g. 131/725 = 0.181). These are intermediate calculations on the way toward computing distances between points.
Column Profiles (columns are Political Outlook)

Region           Liberal  Tend Lib  Moderate  Tend Cons  Conservative    Mass
Northeast           .204      .230      .238       .119          .098    .181
Midwest             .280      .310      .291       .348          .229    .290
South               .194      .270      .307       .341          .458    .326
West                .323      .190      .164       .193          .216    .204
Active Margin      1.000     1.000     1.000      1.000         1.000
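
The same arithmetic applied by column, again as an illustrative Python sketch. Here each inner list of col_profiles holds one political outlook category, which is a layout choice of this example rather than anything prescribed by SPSS.

```python
# Counts from the Correspondence Table (rows: regions, columns: political outlook).
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
columns = list(zip(*counts))             # one tuple of counts per column
col_totals = [sum(col) for col in columns]

# Column profile: each cell divided by its column total.
col_profiles = [[cell / total for cell in col]
                for col, total in zip(columns, col_totals)]

# Row mass: each row total as a proportion of n.
row_masses = [total / n for total in row_totals]

print([round(v, 3) for v in col_profiles[0]])  # Liberal profile across regions
print([round(v, 3) for v in row_masses])
```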

In the Summary table, we first look at the chi-square value (41.489 with 12 degrees of freedom, Sig. = .000) and see that it is significant, supporting the assumption that the two variables are related.
Summary

                                                        Proportion of Inertia       Confidence Singular Value
Dimension  Singular Value  Inertia  Chi Square  Sig.    Accounted for  Cumulative   Standard Deviation  Correlation 2
1                    .189     .036                               .627        .627                 .035          -.043
2                    .124     .015                               .268        .895                 .040
3                    .078     .006                               .105       1.000
Total                         .057      41.489  .000a           1.000       1.000

a. 12 degrees of freedom
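
As a cross-check, the chi-square statistic, its degrees of freedom and the total inertia can be recomputed from the Correspondence Table. The sketch below is plain Python arithmetic and assumes the usual Pearson chi-square formula; it does not compute the significance level.

```python
# Counts from the Correspondence Table (rows: regions, columns: political outlook).
counts = [
    [19, 23, 58, 16, 15],
    [26, 31, 71, 47, 35],
    [18, 27, 75, 46, 70],
    [30, 19, 40, 26, 33],
]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

df = (len(counts) - 1) * (len(counts[0]) - 1)

# Total inertia is the chi-square statistic divided by the number of cases.
total_inertia = chi_square / n

print(round(chi_square, 3), df, round(total_inertia, 3))
```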

SPSS has computed the interpoint distances and subjected the distance matrix to principal components analysis, yielding in this case three dimensions, the maximum for a table with 4 rows and 5 columns (the smaller of the two counts, minus one).
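
One common textbook route to these dimensions is a singular value decomposition of the standardised residuals of the table. The NumPy sketch below follows that route; it is an assumption about the general method and may differ in detail from SPSS's internal procedure.

```python
import numpy as np

# Counts from the Correspondence Table (rows: regions, columns: political outlook).
counts = np.array([
    [19, 23, 58, 16, 15],
    [26, 31, 71, 47, 35],
    [18, 27, 75, 46, 70],
    [30, 19, 40, 26, 33],
], dtype=float)

P = counts / counts.sum()   # table of relative frequencies
r = P.sum(axis=1)           # row masses
c = P.sum(axis=0)           # column masses

# Standardised residuals: (observed - expected) / sqrt(expected), in proportions.
expected = np.outer(r, c)
S = (P - expected) / np.sqrt(expected)

singular_values = np.linalg.svd(S, compute_uv=False)

# A table with I rows and J columns has at most min(I, J) - 1 dimensions.
n_dims = min(counts.shape) - 1
print(np.round(singular_values[:n_dims], 3))
```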

The eigenvalues (labelled Inertia) sum to the total inertia of the table, in this case only 0.057. The total inertia equals the chi-square statistic divided by the number of cases (41.489/725 = 0.057). All three possible dimensions are reported here, and their proportions of inertia accumulate to 1.000, so the low total is not the result of dropping dimensions. It reflects the fact that the association between region and political outlook, while significant, is weak.

The eigenvalues (called inertia here) reflect the relative importance of each dimension, with the first always being the most important, the second the next most important, and so on.


The singular values are simply the square roots of the eigenvalues. They are interpreted as the maximum canonical correlation between the categories of the variables in the analysis for any given dimension.

Note that the "Proportion of Inertia" columns are the dimension eigenvalues divided by the total (table) eigenvalue. That is, they are the share of the total inertia that each dimension accounts for: thus the first dimension accounts for 62.7% of the total inertia of 0.057.
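
The proportions can be checked numerically. The NumPy sketch below obtains the eigenvalues from the standardised residual matrix, which is one standard way to compute them and is assumed here rather than taken from SPSS documentation.

```python
import numpy as np

# Counts from the Correspondence Table (rows: regions, columns: political outlook).
counts = np.array([
    [19, 23, 58, 16, 15],
    [26, 31, 71, 47, 35],
    [18, 27, 75, 46, 70],
    [30, 19, 40, 26, 33],
], dtype=float)

P = counts / counts.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Eigenvalues of S S^T are the inertias of the dimensions. Keep the
# min(rows, columns) - 1 = 3 largest; the remaining one is numerically zero.
eigenvalues = np.sort(np.linalg.eigvalsh(S @ S.T))[::-1][:3]

proportion = eigenvalues / eigenvalues.sum()   # "Accounted for"
cumulative = np.cumsum(proportion)             # "Cumulative"

print(np.round(eigenvalues, 3))
print(np.round(proportion, 3))
print(np.round(cumulative, 3))
```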

The standard deviation columns refer back to the singular values and help the researcher assess the relative precision of each dimension.


The Overview Row Points table, for each row point in the correspondence table, displays the mass, scores in dimension, inertia, contribution of the point to the inertia of the dimension, and contribution of the dimension to the inertia of the point.
Overview Row Points (a)

                                                  Contribution of Point     Contribution of Dimension
                       Score in Dimension         to Inertia of Dimension   to Inertia of Point
Region           Mass        1        2  Inertia         1         2             1       2   Total
Northeast        .181    -.702     .309     .020      .470      .139          .832    .105    .938
Midwest          .290    -.130     .065     .005      .026      .010          .181    .030    .210
South            .326     .540     .194     .020      .501      .099          .901    .076    .977
West             .204    -.055    -.675     .012      .003      .752          .010    .970    .979
Active Total    1.000                       .057     1.000     1.000

a. Symmetrical normalization
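
The row scores and the contributions of points to dimensions can be approximated with NumPy. This sketch assumes that symmetrical normalization scales the standard coordinates by the square root of the singular value, which is consistent with the figures in the table; the sign of each dimension is arbitrary in a singular value decomposition, so scores may come out with all signs flipped.

```python
import numpy as np

# Counts from the Correspondence Table (rows: Northeast, Midwest, South, West).
counts = np.array([
    [19, 23, 58, 16, 15],
    [26, 31, 71, 47, 35],
    [18, 27, 75, 46, 70],
    [30, 19, 40, 26, 33],
], dtype=float)

P = counts / counts.sum()
r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = 2                                        # dimensions kept for the map

# Assumed symmetrical normalization: standard coordinates * sqrt(singular value).
row_scores = (U[:, :k] / np.sqrt(r)[:, None]) * np.sqrt(sv[:k])

# Contribution of each point to the inertia of each dimension:
# mass * score^2 / singular value; each column sums to 1.
point_to_dim = r[:, None] * row_scores ** 2 / sv[:k]

print(np.round(row_scores, 3))
print(np.round(point_to_dim, 3))
```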

Keyword interpretations

Mass: the marginal proportions of the row variable, used to weight the point profiles when computing point distance. This weighting has the effect of compensating for unequal numbers of cases.

Scores in dimension: scores used as coordinates for points when plotting the correspondence map. Each point has a score on each dimension.

Inertia: variance.

Contribution of points to dimensions: as factor loadings are used in conventional factor analysis to ascribe meaning to dimensions, so "contribution of points to dimensions" is used to intuit the meaning of correspondence dimensions.

Contribution of dimensions to points: these are multiple correlations, which reflect how well the principal components model is explaining any given point (category).
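
To make the roles of mass and inertia concrete, the sketch below computes the chi-square distance between row profiles and the inertia of each row point as its mass times its squared distance from the average profile. It assumes the standard chi-square distance, in which each squared difference is divided by the column mass; the function name is illustrative.

```python
import math

# Counts from the Correspondence Table (rows: Northeast, Midwest, South, West).
counts = [
    [19, 23, 58, 16, 15],
    [26, 31, 71, 47, 35],
    [18, 27, 75, 46, 70],
    [30, 19, 40, 26, 33],
]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_masses = [sum(col) / n for col in zip(*counts)]   # also the average row profile
row_masses = [total / n for total in row_totals]
row_profiles = [[cell / total for cell in row]
                for row, total in zip(counts, row_totals)]


def chi_square_distance(p, q):
    """Distance between two profiles, weighting each column by 1 / column mass."""
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, col_masses)))


# Distance between the Northeast and South profiles.
d_ne_south = chi_square_distance(row_profiles[0], row_profiles[2])

# Inertia of a row point: mass * squared distance to the average profile.
row_inertia = [mass * chi_square_distance(profile, col_masses) ** 2
               for mass, profile in zip(row_masses, row_profiles)]

print(round(d_ne_south, 3))
print([round(v, 3) for v in row_inertia], round(sum(row_inertia), 3))
```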

The Overview Column Points table is similar to the previous one, except that it describes the column variable in the correspondence table.
Overview Column Points (a)

                                                       Contribution of Point     Contribution of Dimension
                            Score in Dimension         to Inertia of Dimension   to Inertia of Point
Political Outlook     Mass        1        2  Inertia         1         2             1       2   Total
Liberal               .128    -.491    -.800     .016      .163      .663          .363    .630    .993
Tend Lib              .138    -.351     .124     .003      .090      .017          .921    .075    .995
Moderate              .337    -.252     .334     .009      .113      .303          .448    .512    .960
Tend Cons             .186     .237    -.037     .006      .055      .002          .308    .005    .313
Conservative          .211     .721    -.094     .022      .579      .015          .940    .010    .950
Active Total         1.000                       .057     1.000     1.000

a. Symmetrical normalization
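
The column scores and the contribution of each dimension to the inertia of each point can be sketched the same way. As before, the scaling used for symmetrical normalization is an assumption that happens to match the table, and the signs of the dimensions are arbitrary.

```python
import numpy as np

# Counts from the Correspondence Table
# (columns: Liberal, Tend Lib, Moderate, Tend Cons, Conservative).
counts = np.array([
    [19, 23, 58, 16, 15],
    [26, 31, 71, 47, 35],
    [18, 27, 75, 46, 70],
    [30, 19, 40, 26, 33],
], dtype=float)

P = counts / counts.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = 2

# Assumed symmetrical normalization: standard coordinates * sqrt(singular value).
col_scores = (Vt[:k].T / np.sqrt(c)[:, None]) * np.sqrt(sv[:k])

# Inertia of each column point: its share of the total chi-square / n.
point_inertia = (S ** 2).sum(axis=0)

# Contribution of dimension to inertia of point:
# mass * (principal coordinate)^2 / point inertia,
# where principal coordinate = standard coordinate * singular value.
principal = col_scores * np.sqrt(sv[:k])
dim_to_point = c[:, None] * principal ** 2 / point_inertia[:, None]

print(np.round(col_scores, 3))
print(np.round(dim_to_point, 3), np.round(dim_to_point.sum(axis=1), 3))
```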

The Confidence Row Points tables display the standard deviations of the row scores (the values used as coordinates to plot the correspondence map) and are used to assess their precision.
Confidence Row Points

              Standard Deviation in Dimension    Correlation
Region                  1          2                 1-2
Northeast            .190       .307                .528
Midwest              .169       .323                .066
South                .122       .206               -.685
West                 .339       .148               -.026

The Confidence Column Points tables display the standard deviations of the column scores (the values used as coordinates to plot the correspondence map) and are used to assess their precision.
Confidence Column Points

                    Standard Deviation in Dimension    Correlation
Political Outlook             1          2                 1-2
Liberal                    .387       .221               -.694
Tend Lib                   .072       .117                .801
Moderate                   .171       .122                .575
Tend Cons                  .215       .406                .095
Conservative               .127       .302                .304

The plots of transformed categories for dimensions display a plot of the transformation of the row category values and of column category values into scores in dimension, with one plot per dimension.

The x-axis has the category values and the y-axis has the corresponding dimension scores. Thus the category "Northeast" in the Overview Row Points table above had a score in dimension of -0.702, as shown on the plot.

Refer back to Overview Row Points, dimension 1. Why join!

Refer back to Overview Row Points, dimension 2.

Refer back to Overview Column Points, dimension 1.

Refer back to Overview Column Points, dimension 2.

The uniplots for the row and column variables are shown next. Note that the origin of the axes is slightly different in the two plots.

Refer back to Overview Row Points, dimensions 1 & 2.

Refer back to Overview Column Points, dimensions 1 & 2.

Finally the biplot correspondence map is obtained. Note the axes now encompass the most extreme values of both of the uniplots.

Some generalizations can be made about the association of categories (South more conservative, West more liberal), but the researcher must keep firmly in mind that correspondence is not association. That is, the researcher should not allow the map's display of intercategory distances to obscure the fact that, for this example, the total inertia of the correspondence table is only 0.057, so the underlying association is weak.

Refer back to Overview Row Points, dimensions 1 & 2, and Overview Column Points, dimensions 1 & 2.

Care must be taken when interpreting the previous plot. It must be remembered that distances between row points and column points are not defined.
