
SPSS: Expected frequencies, chi-squared test.

In-depth example: Age groups and radio choices.


Dealing with small frequencies.
Quick Example: Handedness and Careers

Last time we tested whether one nominal variable was
independent of another.

We did this by looking at the cross tabs and seeing how far the
observed frequencies were from the frequencies we would
expect if the two variables were independent.

For nominal variables that only had 2 possible responses each
(yes/no, male/female, insane/sane), we could use the odds
ratio.

When one or both of the variables has more than 2 responses,
the odds ratio is no longer useful, so we use the chi-squared test
instead.

The tradeoff: The odds ratio can be used for one-tailed tests;
chi-squared can't. Chi-squared can handle any number of rows
and columns.

Getting chi-squared is also heavy in math, so in the real world,
SPSS and other software can handle most of it for us.

Most important things to know:


- How to get the expected frequency from a particular cell.
- Chi-squared is a measure of how far the observed
frequencies are from the expected frequencies.
- Large chi-squared values mean large deviations from the
expected frequencies.
- The df for chi-squared is (rows - 1) x (columns - 1).
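
The points above can be checked by hand or with a few lines of code. The sketch below is plain Python, not SPSS, and the function names are made up for illustration. The counts are the age-by-radio table that appears later in these slides.

```python
# Illustrative sketch in plain Python (not SPSS). Function names are hypothetical.

def expected_frequencies(observed):
    """Expected count for a cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(observed):
    """df = (rows - 1) x (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

# Rows: Young, Middle Age, Older Adult. Columns: Music, News, Sports.
radio = [[14, 4, 7], [10, 15, 9], [2, 8, 3]]

print(expected_frequencies(radio)[0][0])  # expected Young/Music count
print(round(chi_squared(radio), 3))       # Pearson chi-squared
print(degrees_of_freedom(radio))          # (3 - 1) x (3 - 1)
```

Large gaps between observed and expected counts make the individual terms in the sum large, which is why a large chi-squared value signals a large deviation from independence.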

SPSS: Expected frequencies


Start with a crosstab.
Analyze → Descriptive Statistics → Crosstabs

In the pop-up, choose your row and column variables and click
the cells button in the upper right of the pop-up.

The cells button brings up the menu of what you want the cells
to show. Uncheck observed and check expected.

Then click Continue, then OK. This will produce a crosstab of
the expected values.

If father figure type and parenting style were independent,
there would be 9.4 moderate style stepfathers in our sample
on average.

Leaving Observed checked and leaving Expected unchecked
produces the observed values.

In our sample, we found 10 moderate style stepfathers.
That is very near the 9.4 we would expect under independence.

Checking both observed and expected produces a table that
has both the observed and expected values in the same table.

It allows cell-to-cell comparison, but it's more cluttered.

The null hypothesis of independence fits the moderates and
stepfathers.

But live-in partners appear to be more permissive and less
authoritarian than other types of father figure.

Note the vague language about the trends in the data. That's
because we can't say whether these trends are significant or
not.
We don't have the tools to say anything definitive about
specific categories.

Bats: Observing frequencies you wouldn't expect.

SPSS: Full crosstab analysis. Consider the following data on a
sample of people's ages and radio preference.

We want to know if a person's radio preference depends on
what generation they belong to.
We have the data from 72 people in total in three nominal
categories of radio choice and three ordinal categories of age.

Should we do an odds ratio or a chi-squared?

Chi-squared.

Because we have a 3x3 table.

Odds Ratio only works for 2x2 tables.

SPSS: Chi-Squared is also in the crosstabs section.


Analyze → Descriptive Statistics → Crosstabs.
Click on the Statistics button.

Put a check next to Chi-Square in the upper left.


It doesn't matter if Risk is checked or unchecked.

Then click Continue, then OK.

Checking Chi-Squared produces the following table.

We want the Pearson Chi-Square. (yeah, Pearson is a big deal)


χ² = 10.268
df = 4.

We could have got this from
(rows - 1) x (cols - 1) = 2 x 2 = 4.

We also know that the p-value = .036.

So if we were testing for independence at alpha = 0.05, we
would reject the null hypothesis of independence.
For interest: Asymp. Sig. stands for Asymptotic Significance.
"Asymptotic" in statistics means "as n → infinity".
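
As a cross-check on the p-value, the tail area of the chi-squared distribution has a simple closed form when df is an even number. The sketch below uses only the Python standard library; the function name is hypothetical, and it deliberately refuses odd df, where this shortcut does not apply.

```python
import math

def chi2_p_value_even_df(chi2, df):
    """Upper-tail p-value of a chi-squared statistic, valid only for even df.

    For even df the tail area is
    exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = chi2 / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

p = chi2_p_value_even_df(10.268, 4)
print(round(p, 3))  # 0.036
print(p < 0.05)     # True, so we reject independence at alpha = 0.05
```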

The Chi-Square test also tells us of potential problems.

The test assumes there is a large number of respondents in
each cell. The standard rule is that every cell should have a
frequency of at least 5.

Having small cells (cells with fewer than 5 respondents) makes
the p-value of the chi-squared test inaccurate.
The more small cells there are, the worse the problem.

There are ways to deal with cells with small n. The easiest one
is to find a logical way to group categories together.

Here, there are substantially fewer older adults than any other
group.
We could merge the middle age and older adult categories
into a not young category.

Then we would have a 2x3 cross tab with larger n values.

For a table of this size, it's simple enough to do by hand.


              Music   News   Sports
Young            14      4        7
Middle Age       10     15        9
Older Adult       2      8        3

The frequencies in the new categories are the frequencies in
both the old categories added together.

              Music          News           Sports
Young         14             4              7
Not Young     10 + 2 = 12    15 + 8 = 23    9 + 3 = 12

              Music   News   Sports
Young            14      4        7
Not Young        12     23       12
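
The merge is just cell-by-cell addition of the rows being combined. A minimal sketch in plain Python, assuming the table is stored as a dictionary of rows (the function name `merge_rows` is made up for illustration):

```python
# Columns: Music, News, Sports.
radio = {
    "Young":       [14, 4, 7],
    "Middle Age":  [10, 15, 9],
    "Older Adult": [2, 8, 3],
}

def merge_rows(table, old_names, new_name):
    """Return a new table where the rows in old_names are summed into new_name."""
    merged = {name: counts for name, counts in table.items()
              if name not in old_names}
    # zip(*rows) walks the rows column by column, so each sum is one merged cell.
    merged[new_name] = [sum(cells) for cells in zip(*(table[n] for n in old_names))]
    return merged

merged = merge_rows(radio, ["Middle Age", "Older Adult"], "Not Young")
print(merged)  # {'Young': [14, 4, 7], 'Not Young': [12, 23, 12]}
```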

We still have one cell below 5, but that's better than having
three cells below 5. This won't distort our answer by much.

But if we do this by hand, then we can't analyze the new
dataset with SPSS.

We need some way to make new variables from old ones.

This slide for interest: For 2x2 crosstabs, there is no way to
merge to improve the frequencies in cells, but we can use a
modification to the chi-squared test called the Yates
Adjustment.
The textbook talks about dealing with cells with few
respondents on pages 326-331.

Also, it's technically the small expected frequencies that cause
trouble, but the best indicator of these is small observed
counts.

We need some way to transform old variables into new ones.


SPSS: Recoding variables.
Goal: To take the three category variable Young/Middle/Old
And make a two category variable Young/Not Young

Transform → Recode into Different Variables.

Select the variable you want to change. In our case it's age.
Give the new variable a name in Output Variable: Name,
Then click on Change.

Then, click on Old and New Values.

This brings up the menu to define the old categories you have
and the new categories you want.

In the new popup, check Output variables are strings first

Then enter the old category name in Old Value: Value


And enter the new category name in New Value: Value

Click Add and repeat the last slide for each category.

1Young → Young,
2MiddleAge → NotYoung, and
3OlderAdult → NotYoung are the recodings we're doing.
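
Outside SPSS, the same recoding is a lookup from each old category to its new category. The sketch below is a plain Python stand-in, not the SPSS dialog; the variable names and the short list of example values are made up for illustration.

```python
# Hypothetical stand-in for the SPSS recode: old category -> new category.
recode = {
    "1Young": "Young",
    "2MiddleAge": "NotYoung",
    "3OlderAdult": "NotYoung",
}

# Made-up example values, not the survey data.
age = ["1Young", "3OlderAdult", "2MiddleAge", "1Young"]

# The new variable sits alongside the old one; the original is not overwritten.
age_merged = [recode[value] for value in age]
print(age_merged)  # ['Young', 'NotYoung', 'NotYoung', 'Young']
```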

Now we can run a crosstab in SPSS with the merged-category
variable.
( Analyze → Descriptive Statistics → Crosstabs )

We can look at the expected frequencies.
(Crosstabs menu, Cells button, check Expected)

Even though one cell has observed frequency less than 5, its
expected frequency is more than 5, so the potential problem is
lessened.

We can also do the chi-squared test again and see if there's a
problem or a change in the p-value.

0/6 cells are too small instead of 3/9.


We went from 4 df to 2 because we now have a 2x3 crosstab.
(2 - 1) x (3 - 1) = 2.
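
The small-cell counts can be reproduced from the two tables. This is a sketch in plain Python with hypothetical function names; it assumes the warning counts cells whose expected frequency is below 5, which matches the 3/9 and 0/6 figures for these tables.

```python
def expected_frequencies(observed):
    """Expected count for a cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def small_cell_report(observed, cutoff=5):
    """Return (cells with expected count below cutoff, total number of cells)."""
    cells = [e for row in expected_frequencies(observed) for e in row]
    return sum(e < cutoff for e in cells), len(cells)

original = [[14, 4, 7], [10, 15, 9], [2, 8, 3]]  # Young, Middle Age, Older Adult
merged = [[14, 4, 7], [12, 23, 12]]              # Young, Not Young

print(small_cell_report(original))  # (3, 9)
print(small_cell_report(merged))    # (0, 6)
```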

Also, the most important part, the p-value, hasn't changed
dramatically. (In the 3x3 table it was .036)

This implies that merging middle age and older didn't change
anything major.
We reject the null; radio choice depends on age.

It's easier to detect differences in larger groups, so we would
expect the p-value to go down a little, but not something
dramatic like .001 or .000.

If the p-value had increased much, we would have lost the
ability to reject the null. (A bad merge can do this.)

Pacing parrot asks: Do we have time for another?

We took a survey of people in four career fields and recorded
whether they were left or right handed.

These are the observed counts.

Most of the respondents are right handed except in the
athletics field, where a few more than half are left handed.

We want to know if this difference is a fluke or if career and
handedness are somehow dependent.

We have a 2x4 crosstab, so we should use a chi-squared test.


These are the results:


Degrees of freedom = 3
χ² = 50.434
There is very significant evidence against independence.

The chi-squared test has a very small p-value (less than .001).
Do the results of this test tell us that there are more left
handed people in athletics in general?

No.

Chi-squared only checks whether two variables are
independent, not specific trends within them.

By comparing the expected and observed counts, we can see
that the athletic field is much different from the others.
We can use this information to guide a next step even if we're
not getting definite answers from just the expected counts.

We could try merging the other three fields into non-athletic
and athletic, as long as those three fields together fairly
represented everything non-athletic.

In that case, the odds ratio shows that someone in the athletic
field has 7.371 times the odds of being left handed as
someone in a non-athletic profession.

The confidence interval shows that this odds ratio is
significantly more than 1 at the alpha = 0.025 level.
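
For reference, an odds ratio from a merged 2x2 table is the odds in one group divided by the odds in the other. The sketch below uses made-up counts, since the survey counts are not reproduced in these notes, so it does not reproduce the 7.371 figure; the function and argument names are hypothetical.

```python
def odds_ratio(group_yes, group_no, other_yes, other_no):
    """Odds of 'yes' in the group divided by odds of 'yes' in the comparison group."""
    return (group_yes / group_no) / (other_yes / other_no)

# Made-up counts, NOT the survey data from the slides:
# left and right handed in athletics, then left and right handed elsewhere.
athletic_left, athletic_right = 12, 10
other_left, other_right = 6, 40

print(odds_ratio(athletic_left, athletic_right, other_left, other_right))
```

An odds ratio of 1 means the two groups have the same odds, which is why the confidence interval is compared against 1.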

Next time: More on cross tabs.


If time permits: Intro to Analysis of Variance. (Ch. 8)
