
Crosstabulation and Measures of Association

Abdul Rahman Othman

When do We Crosstabulate Variables?


Crosstabulate to investigate the relationship between two or more variables. The variables investigated have nominal or ordinal values. If a variable is on an interval scale, transform it to an ordinal scale first.
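The interval-to-ordinal transformation mentioned above can be sketched in Python as follows. This is only an illustration: the cut points, labels and scores are hypothetical and do not come from the slides.

```python
from bisect import bisect_right

# Hypothetical cut points and category labels (assumptions for illustration).
CUT_POINTS = [40, 60, 80]
LABELS = ["low", "medium", "high", "very high"]

def to_ordinal(value, cut_points=CUT_POINTS, labels=LABELS):
    """Map an interval-scale value to an ordered category."""
    # bisect_right counts how many cut points are <= value, which is
    # the index of the category the value falls into.
    return labels[bisect_right(cut_points, value)]

scores = [35, 40, 59.5, 80, 91]  # hypothetical interval-scale scores
print([to_ordinal(s) for s in scores])
```

With these assumed cut points, a value equal to a cut point falls into the higher category. The resulting categories can then be crosstabulated like any other ordinal variable.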

What is a Crosstabulation?
Joint frequency distribution of two or more class (ordinal/nominal) variables. Subdivision of one variable according to the values of another variable.
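As a rough sketch of what a joint frequency distribution looks like in code, the Python snippet below counts how often each pair of category values occurs. The records and the helper name crosstab are hypothetical; they are not the seminar data or an SPSS function.

```python
from collections import Counter

# Hypothetical (status, gender) records, for illustration only.
records = [
    ("Staff", "Male"), ("Staff", "Female"), ("Staff", "Female"),
    ("Student", "Male"), ("Student", "Male"), ("Student", "Female"),
]

def crosstab(pairs):
    """Return {row_value: {col_value: count}} for a list of (row, col) pairs."""
    counts = Counter(pairs)                 # joint frequency of each pair
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for pairs that never occur, so empty cells are filled.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab(records)
for row, cells in table.items():
    print(row, cells)
```

Each inner dictionary is one row of the crosstabulation, that is, one variable subdivided by the values of the other.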

Example: Methodology Seminar Attendees


Academic Status    Gender
                   Male   Female
Staff                 3        9
Student               4       11

SPSS Example
Open the data set tastetest.sav. Data set consists of 30 subjects responding to a product called mulch. Mulch is produced in 3 colors: Red, Blue and Black. The 30 subjects were divided into 3 groups of 10 each. The three groups were randomly assigned the colored mulch.

Cont.
Each subject's rating of the taste of the colored mulch was recorded. Open SPSS and go to File > Open > Data. An Open File dialog box opens. Look for tastetest.sav and double click on it to open it.

Browsing Variable Information


Go to Utilities > Variables. A Variables display box opens. The variable color is highlighted on the left panel of the display box. Information with regard to color is shown on the right panel of the display box.

Variables Display Box Showing Information of the Variable Color

Browsing Variable Taste


In the Variables display box, highlight taste. The following information is presented.


Producing the Crosstabulation


Go to Analyze > Descriptive Statistics > Crosstabs.

Crosstabs Dialog Box


A Crosstabs dialog box opens. Place color in the Row(s): box and taste in the Column(s): box. Click OK.


Output
color * taste Crosstabulation
[Table: counts of each taste rating within each mulch color]

Interpretation
Note that the red mulch (red = 1) has many more respondents rating its taste above average than the other colors. Note that the black mulch (black = 3) has many more below-average ratings than the other colors.

Clustered Bar Chart


A clustered bar chart can be produced to aid in this interpretation. Use the Dialog Recall icon to recall the crosstabulation of color x taste. This time check the options:
Display clustered bar charts
Suppress tables

Click OK.

Crosstabs Dialog Box Recalled

Clustered Bar Chart


[Figure: clustered bar chart of Count by Mulch color (Red, Blue, Black), clustered by Taste scale]

More Graphical Aids


Stacked Bar Charts
Clustered Pie Chart

Stacked Bar Charts


Go to Graphs > Bar. A Bar Charts dialog box opens. Choose the Stacked icon. Choose the option Summaries for groups of cases in the Data in Chart Are section. Click Define.

Define Stacked Bar Dialog Box


A Define Stacked Bar: Summaries for Groups of Cases dialog box opens. Place color in the Category Axis: box. Place taste in the Define Stacks by: box. Note that the default choice in the Bars Represent section is N of cases. This is fine if every group of the grouping variable has the same number of cases.

Cont.
In this particular example N of cases (red) = N of cases (blue) = N of cases (black) = 10. In the case when these are not the same choose the option % of cases instead. Following that we can adjust the bars to be of equal length so that comparisons can be made.
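The reason for switching to percentages when group sizes differ can be sketched in Python. The counts below are hypothetical and deliberately unequal across groups; the helper name within_group_percent is also an assumption made for this illustration.

```python
# Hypothetical counts of mulch color within two taste categories
# (unequal group sizes; values are assumptions, not from the data set).
counts = {
    "Above average": {"Red": 6, "Blue": 3, "Black": 1},
    "Average":       {"Red": 2, "Blue": 2, "Black": 1},
}

def within_group_percent(table):
    """Convert each group's counts to percentages of that group's total."""
    result = {}
    for group, cells in table.items():
        total = sum(cells.values())
        result[group] = {k: 100.0 * v / total for k, v in cells.items()}
    return result

percents = within_group_percent(counts)
for group, cells in percents.items():
    print(group, {k: round(v, 1) for k, v in cells.items()})
```

Because every group's percentages sum to 100, the stacked bars have equal length and the composition of the groups can be compared directly, even though one group has 10 cases and the other has 5.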


Observed Output
[Figure: vertical stacked bar chart of Count by Mulch color, stacked by Taste scale]

Changing the Orientation of the Output


To change the orientation of the default output from vertical to horizontal, double click on the graph to invoke the Chart Editor. Go to the Transpose chart coordinate system icon and click on it. Close the Chart Editor.

Preferred Output

[Figure: horizontal stacked bar chart of Count by Mulch color, stacked by Taste scale]

Stacked Bar Chart when Number of Cases in Groups Not Equal


Recall that in the last example the number of cases in each colored mulch group was 10. Suppose this is not the case. Suppose our grouping variable of interest is taste; the numbers of cases for far above average, above average, average, below average and far below average are then 3, , 11, and 3, respectively.

Redo Stacked Bar Charts


Go to Graphs > Bar. A Bar Charts dialog box opens. Choose the Stacked icon. Choose the option Summaries for groups of cases in the Data in Chart Are section. Click Define.

Define Stacked Bar Dialog Box


A Define Stacked Bar: Summaries for Groups of Cases dialog box opens. Place taste in the Category Axis: box. Place color in the Define Stacks by: box. Choose % of cases in the Bars Represent section.


Resultant Graph
[Figure: stacked bar chart of Percent by Taste scale (Far above average, Above average, Average, Below average, Far below average), stacked by Mulch color (Red, Blue, Black)]

Note
The default colors in the graph do not match the mulch colors. We need to edit the graph to make the colors the same as those of the product. The bar lengths are based on overall percentages. We need to change the bar lengths so that they are all the same, i.e. so that the percentage breakdown of colors is based on group size.

Edits
Double click on the graph to activate the Chart Editor. Click on the blue key box representing red mulch in the Mulch Color legend. All the blue sections of the bars are highlighted. Double click on the blue key box in the legend. A Properties dialog box opens.

Properties Dialog Box


Go to the Fill & Border tab and click on it.

Properties Dialog Box Fill & Border


Change the color in Fill to red.

Cont.
Click Apply.

Resulting Edit

Edit the Other Key Boxes


In the same manner change the green color of the blue mulch to blue and the khaki color of the black mulch to black.


You Should Get

[Figure: stacked bar chart of Percent by Taste scale, with the Mulch color legend recolored to match the mulch colors]

Reforming the Bars So That They are of Equal Length


Double click on the bars to open the Properties dialog box. Click on the Bar Options tab. In the Bar Options tab choose Scale to 100% in the Stacked Bars section. Click Apply.

Resultant Graph
[Figure: stacked bar chart scaled to 100%, Percent by Taste scale, stacked by Mulch color (Red, Blue, Black)]

Reorientate
[Figure: the 100% stacked bar chart transposed to horizontal orientation, Taste scale by Percent, stacked by Mulch color]

Note
Note that the x-axis labels are all wrong. They should be in the form of decimals: 0, 0.2, 0.4, 0.6, 0.8 and 1, not 0%, 0.2%, ..., 1%. Relabel them appropriately. You can also use 0%, 20%, ..., 100%. What can you interpret from this graph?

Clustered Pie Chart


Go to Graphs > Interactive > Pie > Clustered. The Create Clustered Pie Chart dialog box opens.

Create Clustered Pie Chart Dialog Box


Choose the 2-D Coordinate option. Grab Taste from the variables window on the left and place it in the Slice by: box in the Pie Variables section on the right. Grab mulch and place it in the Panel Variables box. Click OK.

Resultant Graphs
[Figure: clustered pie charts, one pie per mulch color (Red, Blue, Black), sliced by Taste scale (Far above average, Above average, Average, Below average, Far below average). Pies show counts.]

Editing the Colors to Match the Bar Chart Colors


Double click graph output to invoke the Interactive Graph Editor. Go to the legend box and double click on the key. A dotted frame appears around the key and all Far above average sectors in the three pie charts. A Color Legend-Taste Scale dialog box opens. Change the colors for every key to match those in the bar charts. Click OK.

Clustered Pie Charts After Editing the Colors


[Figure: clustered pie charts, one pie per mulch color (Red, Blue, Black), sliced by Taste scale, with slice colors edited. Pies show counts.]

Clustered Pie Charts with Taste as Panel and Color as Slices after Editing.
[Figure: clustered pie charts, one pie per taste category (Far above average, Above average, Average, Below average, Far below average), sliced by Mulch color (Red, Blue, Black)]

Measures of Association
Interval variables: Pearson Correlation Coefficient
Ordinal variables: Spearman Rho Correlation
Nominal & ordinal variables: Chi-Square Statistic

Nominal Variables: Measures of Association


Contingency Coefficient
Phi and Cramér's V
Lambda
Uncertainty Coefficient

Ordinal Variables: Measures of Association


Gamma
Somers' d
Kendall's tau-b
Kendall's tau-c
Nominal x interval variables: eta

Pearson Correlation Coefficient


Given a set of bivariate data, the tendency towards a linear relationship between the two variables can be measured by the Pearson correlation coefficient, r.
-1 ≤ r ≤ 1.
r = 0: no tendency towards a linear relationship. The relationship could be random or nonlinear.
r = ±1: perfect linear relationship.

Computational Formula for r.

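\[
r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]
\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}
\]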
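The computational formula for r can be sketched in Python as below. The function name pearson_r and the toy data are assumptions made for illustration; the data are not the weight and glucose values from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r via the computational formula (raw sums, no means)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant; r is undefined then.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical toy data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))
```

The function needs only five running sums, which is why this form of the formula is convenient for hand calculation.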

Example: Weights (kg) and Blood Glucose Levels (mg/100ml)


[Table: 16 paired observations of Weight (X), in kg, and Glucose (Y), in mg/100ml]

Cont.
\[
n = 16, \qquad
\sum_{i=1}^{16} x_i y_i = 126128.1, \qquad
\sum_{i=1}^{16} x_i = 1237.8, \qquad
\sum_{i=1}^{16} y_i = 1621,
\]
\[
\sum_{i=1}^{16} x_i^2 = 97178.6, \qquad
\sum_{i=1}^{16} y_i^2 = 165801
\]

Cont.
\[
r = \frac{16(126128.1) - (1237.8)(1621)}
{\sqrt{\left[16(97178.6) - (1237.8)^2\right]\left[16(165801) - (1621)^2\right]}}
= 0.484
\]
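The arithmetic can be checked with a few lines of Python, using only the summary sums quoted on the slides. The variable names are assumptions.

```python
from math import sqrt

# Summary sums for the weight (x) and glucose (y) example, as given on the slides.
n = 16
sum_xy = 126128.1
sum_x, sum_y = 1237.8, 1621
sum_x2, sum_y2 = 97178.6, 165801

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))  # 0.484
```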

Interpretation
Body weight and blood glucose level show only a weak tendency towards a linear relationship.

Plot of glucose by weight


[Figure: scatter plot of glucose levels (mg/100ml) against weight (kg)]

SPSS Example
Analyze these data using SPSS. Type the data into the SPSS Data Editor. Name the first variable x and label it weight (kg). Name the second variable y and label it blood glucose level (mg/100ml). Go to File > Save As and save the file as sugar.sav.

Cont.
Go to Analyze > Correlate > Bivariate. A Bivariate Correlations dialog box opens.

Bivariate Correlations Dialog Box


Place x and y in the Variables: box. Check the Pearson box under Correlation Coefficients. Uncheck Flag significant correlations. Click Options.

Bivariate Correlations: Options Dialog Box

Check the Means and standard deviations and Cross-product and covariances boxes under Statistics. Click Continue.

Output
Descriptive Statistics
[Table: mean, standard deviation and N for x and y]

Cont.
Correlations

                                               x          y
x   Pearson Correlation                        1       .484
    Sig. (2-tailed)                                    .057
    Sum of Squares and Cross-products   1419.298    723.488
    Covariance                            94.620     48.233
    N                                         16         16
y   Pearson Correlation                     .484          1
    Sig. (2-tailed)                         .057
    Sum of Squares and Cross-products    723.488   1573.438
    Covariance                            48.233    104.896
    N                                         16         16

Relationship to Hand Calculation


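Sum of squares of x = Corrected sum of squares of x

\[
\text{Sum of squares of } x = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2
\]
\[
n \times \text{Sum of squares of } x = n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2
\]

Similarly
Sum of squares of y = Corrected sum of squares of y

\[
\text{Sum of squares of } y = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2
\]
\[
n \times \text{Sum of squares of } y = n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2
\]

and
Sum of cross products of xy = Corrected sum of cross products of xy

\[
\text{Sum of cross products of } xy = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i
\]
\[
n \times \text{Sum of cross products of } xy = n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i
\]

Therefore

\[
r = \frac{n \times \text{Sum of cross products of } xy}
{\sqrt{\left(n \times \text{Sum of squares of } x\right)\left(n \times \text{Sum of squares of } y\right)}}
\]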
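The link between the SPSS quantities and the hand calculation can be sketched in Python from the same summary sums. The variable names are assumptions, and the covariance line assumes the usual n - 1 divisor for a sample covariance.

```python
from math import sqrt

# Summary sums for the weight (x) and glucose (y) example, as given on the slides.
n = 16
sum_xy = 126128.1
sum_x, sum_y = 1237.8, 1621
sum_x2, sum_y2 = 97178.6, 165801

# Corrected sums of squares and cross products.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
scp_xy = sum_xy - sum_x * sum_y / n

# The factors of n cancel, so r can be computed from the corrected sums directly.
r = scp_xy / sqrt(ss_x * ss_y)

# Sample covariance (assumed divisor: n - 1).
cov_xy = scp_xy / (n - 1)

print(ss_x, ss_y, scp_xy, cov_xy, r)
```

Up to rounding, these values should agree with the sums of squares, cross-products, covariance and Pearson correlation reported in the SPSS Correlations table.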

Next Week
More Measures of Association & Test of Hypothesis