Pitanje 14

SPSS: Expected frequencies, chi-squared test.
In-depth example: Age groups and radio choices.

Dealing with small frequencies.
Quick Example: Handedness and Careers
Last time we tested whether one nominal variable was
independent of another.
We did this by looking at the crosstabs and seeing how far the
observed frequencies were from the frequencies we would
expect if the two variables were independent.
For nominal variables that only had 2 possible responses each
(yes/no, male/female, insane/sane), we could use the odds
ratio.
When one or both of the variables has more than 2 responses,
the odds ratio is no longer useful, so we use the chi-squared
test instead.
The tradeoff: The odds ratio can be used for one-tailed tests,
chi-squared can't. Chi-squared can handle any number of rows
and columns.
Getting chi-squared is also heavy in math, so in the real world,
SPSS and other software can handle most of it for us.
Most important things to know:

- How to get the expected frequency from a particular cell.
- Chi-squared is a measure of how far the observed
frequencies are from the expected frequencies.
- Large chi-squared values mean large deviations from the
expected frequencies.
- The df for chi-squared is (rows - 1) x (columns - 1).
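
For reference, the quantities in this list can be written compactly. This is the standard textbook form, not anything specific to SPSS. The symbols are notation assumed here for illustration: O_ij and E_ij are the observed and expected counts in row i and column j, R_i and C_j are the row and column totals, N is the sample size, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```
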
SPSS: Expected frequencies

Start with a crosstab.
Analyze > Descriptive Statistics > Crosstabs
In the pop-up, choose your row and column variables and click
the Cells button in the upper right of the pop-up.
The Cells button brings up the menu of what you want the cells
to show. Uncheck Observed and check Expected.
Then click Continue, then OK. This will produce a crosstab of
the expected values.
If father figure type and parenting style were independent,
there would be 9.4 moderate style stepfathers in our sample
on average.
Leaving Observed checked and leaving Expected unchecked
produces the observed values.
In our sample, we found 10 moderate style stepfathers.
That is very near the 9.4 we would expect under independence.
Checking both Observed and Expected produces a table that
has both the observed and expected values in the same table.
It allows cell-to-cell comparison but it's more cluttered.
The null hypothesis of independence fits the moderates and
stepfathers.
But live-in partners appear to be more permissive and less
authoritarian than other types of father figure.
Note the vague language about the trends in the data. That's
because we can't say whether these trends are significant or
not.
We don't have the tools to say anything definitive about
specific categories.
Bats: Observing frequencies you wouldn't expect.
SPSS: Full crosstab analysis. Consider the following data on a
sample of people's ages and radio preference.
We want to know if a person's radio preference depends on
what generation they belong to.
We have the data from 72 people in total in three nominal
categories of radio choice and three ordinal categories of age.
Should we do an odds ratio or a chi-squared?
Chi-squared.
Because we have a 3x3 table.
Odds Ratio only works for 2x2 tables.
SPSS: Chi-Squared is also in the crosstabs section.
Analyze > Descriptive Statistics > Crosstabs.
Click on the Statistics button.
Put a check next to Chi-Square in the upper left.
It doesn't matter if Risk is checked or unchecked.
Then click Continue, then OK.
Checking Chi-Squared produces the following table.
We want the Pearson Chi-Square. (Yeah, Pearson is a big deal.)
χ² = 10.268
df = 4.
We could have got this from
(rows - 1) x (cols - 1) = 2 x 2 = 4.
We also know that the p-value = .036.
So if we were testing for independence at alpha = 0.05, we
would reject the null hypothesis of independence.
For interest: Asymp. Sig. stands for Asymptotic Significance.
Asymptotic in statistics means "as n goes to infinity".
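
To see where these numbers come from, here is a minimal Python sketch that recomputes the Pearson chi-squared by hand from the age-by-radio counts tabulated in these notes. The function names are illustrative, not anything SPSS exposes, and the p-value line uses a closed-form shortcut that is only valid for df = 4.

```python
import math

# Observed counts from the age-by-radio table in these notes.
# Rows: Young, Middle Age, Older Adult. Columns: Music, News, Sports.
observed = [
    [14, 4, 7],
    [10, 15, 9],
    [2, 8, 3],
]

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Pearson chi-squared: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

chi2 = chi_squared(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: this closed form for the upper-tail probability holds only for df = 4:
# P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi-squared = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

Run as written, this prints chi-squared = 10.268, df = 4, p = 0.036, in line with the SPSS figures quoted above.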
The Chi-Square test also tells us of potential problems.
The test assumes there is a large number of respondents in
each cell. The standard rule is that every cell should have a
frequency of at least 5.
Having small cells (cells with fewer than 5 respondents) makes
the p-value of the chi-squared test inaccurate.
The more small cells there are, the worse the problem.
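
The small-cell check can be sketched in a few lines of Python. This assumes the rule is applied to expected counts, using the age-by-radio counts from these notes and the threshold of 5 from the rule above; the variable names are illustrative.

```python
# Observed counts from the age-by-radio table in these notes.
# Rows: Young, Middle Age, Older Adult. Columns: Music, News, Sports.
observed = [
    [14, 4, 7],
    [10, 15, 9],
    [2, 8, 3],
]

THRESHOLD = 5  # the "at least 5" rule of thumb

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Collect (row index, column index, expected count) for cells under the threshold.
small_cells = [
    (i, j, e)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < THRESHOLD
]
total_cells = len(observed) * len(observed[0])

print(f"{len(small_cells)} of {total_cells} cells have an expected count below {THRESHOLD}")
```

For this table the sketch flags 3 of 9 cells, all in the Older Adult row.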
There are ways to deal with cells with small n. The easiest one
is to find a logical way to group categories together.
Here, there are substantially fewer older adults than any other
group.
We could merge the middle age and older adult categories
into a not young category.
Then we would have a 2x3 crosstab with larger n values.
For a table of this size, it's simple enough to do by hand.

              Music   News   Sports
Young            14      4        7
Middle Age       10     15        9
Older Adult       2      8        3
The frequencies in the new categories are the frequencies in
both the old categories added together.

              Music          News           Sports
Young            14             4                7
Not Young   10 + 2 = 12   15 + 8 = 23   9 + 3 = 12

              Music   News   Sports
Young            14      4        7
Not Young        12     23       12
We still have one cell below 5, but that's better than having
three cells below 5. This won't distort our answer by much.
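
The same merge can be sketched in Python by adding the two rows cell by cell. The dictionary layout is just one convenient representation chosen for illustration.

```python
# Observed counts from the 3x3 table; each list is Music, News, Sports.
observed = {
    "Young": [14, 4, 7],
    "Middle Age": [10, 15, 9],
    "Older Adult": [2, 8, 3],
}

# Merge Middle Age and Older Adult into Not Young by summing matching columns.
merged = {
    "Young": observed["Young"],
    "Not Young": [
        middle + older
        for middle, older in zip(observed["Middle Age"], observed["Older Adult"])
    ],
}

print(merged["Not Young"])  # [12, 23, 12]
```
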
But if we do this by hand, then we can't analyze the new
dataset with SPSS.
We need some way to make new variables from old ones.
This slide for interest: For 2x2 crosstabs, there is no way to
merge to improve the frequencies in cells, but we can use a
modification to the chi-squared test called the Yates
Adjustment.
The textbook talks about dealing with cells with few
respondents on pages 326-331.
Also, it's technically the small expected frequencies that cause
trouble, but the best indicator of these is small observed
counts.
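
Also for interest, a minimal sketch of the Yates adjustment in its usual textbook form: each |observed - expected| difference is shrunk by 0.5 before squaring. The 2x2 counts below are hypothetical, made up purely for illustration, and the function name is not from SPSS.

```python
def chi_squared_2x2(table, yates=False):
    """Pearson chi-squared for a 2x2 table, optionally with the Yates adjustment."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(o - e)
            if yates:
                # Shrink each deviation by 0.5, never letting it go below zero.
                diff = max(0.0, diff - 0.5)
            total += diff ** 2 / e
    return total

# Hypothetical counts, not from the lecture data.
example = [
    [8, 2],
    [3, 7],
]

plain = chi_squared_2x2(example)
adjusted = chi_squared_2x2(example, yates=True)
print(f"unadjusted = {plain:.3f}, Yates-adjusted = {adjusted:.3f}")
```

The adjusted statistic is always the smaller of the two, which makes the test more conservative when counts are low.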
We need some way to transform old variables into new ones.

SPSS: Recoding variables.
Goal: To take the three-category variable Young/Middle/Old
and make a two-category variable Young/Not Young.
Transform > Recode into Different Variables.
Select the variable you want to change. In our case it's age.
Give the new variable a name in Output Variable: Name,
then click on Change.
Then click on Old and New Values.
This brings up the menu to define the old categories you have
and the new categories you want.
In the new pop-up, check "Output variables are strings" first.
Then enter the old category name in Old Value: Value
and enter the new category name in New Value: Value.
Click Add and repeat the last slide for each category.
1Young -> Young,
2MiddleAge -> NotYoung, and
3OlderAdult -> NotYoung are the recodings we're doing.
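
Outside SPSS, the same recode amounts to a lookup table from old category to new category. The sketch below uses the category labels from the slide; the sample list of respondents is hypothetical.

```python
# Old category -> new category, matching the recodings listed above.
RECODE = {
    "1Young": "Young",
    "2MiddleAge": "NotYoung",
    "3OlderAdult": "NotYoung",
}

def recode_age(values, mapping=RECODE):
    """Return a new list with each old category replaced by its new category."""
    return [mapping[value] for value in values]

# Hypothetical respondents, for illustration only.
ages = ["1Young", "3OlderAdult", "2MiddleAge", "1Young"]
age_merged = recode_age(ages)

print(age_merged)  # ['Young', 'NotYoung', 'NotYoung', 'Young']
```

Like Recode into Different Variables, this leaves the original values untouched and builds a new variable alongside them.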
Now we can run a crosstab in SPSS with the merged-category
variable.
(Analyze > Descriptive Statistics > Crosstabs)
We can look at the expected frequencies.
(Crosstabs menu, Cells button, check Expected)
Even though one cell has observed frequency less than 5, its
expected frequency is more than 5, so the potential problem is
lessened.
We can also do the chi-squared test again and see if there's a
problem or a change in the p-value.
0/6 cells are too small instead of 3/9.
We went from 4 df to 2 because we now have a 2x3 crosstab.
(2 - 1) x (3 - 1) = 2.
Also, the most important part, the p-value, hasn't changed
dramatically. (In the 3x3 table it was .036.)
This implies that merging middle age and older didn't change
anything major.
We reject the null; radio choice depends on age.
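
As a cross-check on the merged table, here is a hand computation in Python using the Young / Not Young counts from these notes. It is a sketch, not a reproduction of the SPSS output: the p-value line relies on a closed form that holds only for df = 2.

```python
import math

# Merged 2x3 table. Rows: Young, Not Young. Columns: Music, News, Sports.
merged = [
    [14, 4, 7],
    [12, 23, 12],
]

row_totals = [sum(row) for row in merged]
col_totals = [sum(col) for col in zip(*merged)]
n = sum(row_totals)

# Expected counts under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(merged, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(merged) - 1) * (len(merged[0]) - 1)

# Assumption: for df = 2 only, the upper-tail probability is exp(-x/2).
p_value = math.exp(-chi2 / 2)

smallest_expected = min(min(row) for row in expected)
print(f"chi-squared = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print(f"smallest expected count = {smallest_expected:.2f}")
```

The smallest expected count comes out above 5, and the p-value stays below 0.05, so the conclusion is the same as for the 3x3 table.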
It's easier to detect differences in larger groups, so we would
expect the p-value to go down a little, but not something
dramatic like .001 or .000.
If the p-value had increased much we would have lost the
ability to reject the null. (A bad merge can do this.)
Pacing parrot asks: Do we have time for another?
We took a survey of people in four career fields and found out
whether they were left or right handed.
These are the observed counts.
Most of the respondents are right handed except in the
athletics field, where a few more than half are left handed.
We want to know if this difference is a fluke or if career and
handedness are somehow dependent.
We have a 2x4 crosstab, so we should use a chi-squared test.
These are the results:
Degrees of freedom = 3
χ² = 50.434
There is very significant evidence against independence.
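
The statistic and df are enough to recover the tail probability without the underlying counts. The sketch below uses a closed form that is valid only for 3 degrees of freedom; the function name is illustrative.

```python
import math

def chi2_upper_tail_df3(x):
    """P(X > x) for a chi-squared variable with 3 df (closed form for df = 3 only)."""
    y = x / 2
    return math.erfc(math.sqrt(y)) + 2 * math.sqrt(y / math.pi) * math.exp(-y)

p_value = chi2_upper_tail_df3(50.434)
print(f"p = {p_value:.2e}")

# Sanity check against the familiar 5% critical value for 3 df (about 7.815).
check = chi2_upper_tail_df3(7.815)
print(f"tail probability at 7.815 = {check:.3f}")
```

The resulting p-value is many orders of magnitude below .001, which is why SPSS shows it as .000.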
The chi-squared test has a very small p-value (less than .001).
Do the results of this test tell us that there are more left
handed people in athletics in general?
No.
Chi-squared only checks whether two variables are
independent, not specific trends within them.
By comparing the expected and observed counts, we can see
that the athletic field is much different from the others.
We can use this information to guide a next step even if we're
not getting definite answers from just the expected counts.
We could try merging the other three fields into non-athletic
and athletic, as long as those three fields together fairly
represented everything non-athletic.
In that case, the odds ratio shows that someone in the athletic
field has 7.371 times the odds of being left handed as
someone in a non-athletic profession.
The confidence interval shows that this odds ratio is
significantly more than 1 at the alpha = 0.025 level.
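
The survey counts are not reproduced in these notes, so the sketch below uses hypothetical counts purely to show the mechanics of an odds ratio and its confidence interval. It assumes the common log-odds (Wald) interval with z = 1.96, which may not be exactly what SPSS reports.

```python
import math

# Hypothetical 2x2 counts, NOT the lecture data.
athletic_left, athletic_right = 22, 18
other_left, other_right = 20, 140

# Odds of being left handed in each group, and their ratio.
odds_athletic = athletic_left / athletic_right
odds_other = other_left / other_right
odds_ratio = odds_athletic / odds_other

# Standard error of the log odds ratio (Wald approximation).
se_log_or = math.sqrt(
    1 / athletic_left + 1 / athletic_right + 1 / other_left + 1 / other_right
)

# 95% two-sided interval, built on the log scale and transformed back.
z = 1.96
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"odds ratio = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("significantly more than 1" if lower > 1 else "not significantly more than 1")
```

A 95% two-sided interval whose lower bound is above 1 corresponds to a one-tailed test at alpha = 0.025, which is where the 0.025 above comes from.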
Next time: More on cross tabs.

If time permits: Intro to Analysis of Variance. (Ch. 8)
