
KARL PEARSON (1857-1936)

(Pearson's)

British mathematician, father of modern statistics and a pioneer of eugenics!

Chi-squared (χ²) test


This test is used with frequency data: counts of individuals in defined categories, e.g. the numbers of white and purple flowers in a population of pea plants. Chi-squared tests whether the observed frequencies fit the frequencies you expected or predicted.

How do we calculate the expected frequency?


You might expect the observed frequencies to match a specific ratio, e.g. a 3:1 ratio of phenotypes in a genetic cross. Or you may predict a homogeneous (even) distribution of individuals in an environment, e.g. equal numbers of daisies counted in each quadrat on a field. In both cases, the expected frequency for each category is the total number of observations shared out in the predicted proportions.
Note: in some cases you might expect the observed frequencies to match the expected ones; in others you might hope for a difference between them.
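A minimal Python sketch of this step: the total number of observations is shared out in proportion to the predicted ratio. The function name `expected_from_ratio` is illustrative, not from any library; the totals used (433 maize grains, 40 fish) are the ones from the examples that follow.

```python
def expected_from_ratio(total, ratio):
    """Share `total` observations across categories in proportion to `ratio`.

    Values are left unrounded; round them yourself if you want whole numbers.
    """
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# A 9:3:3:1 genetic ratio applied to 433 maize grains
maize = expected_from_ratio(433, [9, 3, 3, 1])
print(maize)  # [243.5625, 81.1875, 81.1875, 27.0625]

# A homogeneous (equal) expectation: 40 fish spread over 4 species
fish = expected_from_ratio(40, [1, 1, 1, 1])
print(fish)   # [10.0, 10.0, 10.0, 10.0]
```

Rounding these to whole numbers (244, 81, 81, 27) is convenient for hand calculation, but it slightly changes the final χ² value.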

Example 1: GENETICS

Comparing the observed frequency of different types of maize grains with the expected ratio calculated using a Punnett square.

The photo shows four different phenotypes for maize grain, as follows:
Purple & Smooth (A), Purple & Shrunken (B), Yellow & Smooth (C) and Yellow & Shrunken (D)

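The Punnett square below shows the expected ratio of phenotypes from a cross between two maize plants that are heterozygous for both genes (PpSs x PpSs), each producing four types of gamete. Purple (P) is dominant to yellow (p), and smooth (S) is dominant to shrunken (s).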
Gametes   PS     Ps     pS     ps
PS        PPSS   PPSs   PpSS   PpSs
Ps        PPSs   PPss   PpSs   Ppss
pS        PpSS   PpSs   ppSS   ppSs
ps        PpSs   Ppss   ppSs   ppss

A:B:C:D = 9:3:3:1
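To see where 9:3:3:1 comes from, the sketch below enumerates all 16 cells of the Punnett square and counts the phenotypes. It assumes, as the A-D labels imply, that P (purple) is dominant to p (yellow) and S (smooth) is dominant to s (shrunken); the helper `phenotype` is hypothetical.

```python
from collections import Counter
from itertools import product

# Gametes produced by each PpSs parent (one allele of each gene)
gametes = ["PS", "Ps", "pS", "ps"]


def phenotype(genotype):
    """Map a genotype string such as 'PSps' to the phenotype labels A-D.

    A dominant allele (capital letter) anywhere in the genotype shows in the phenotype.
    """
    purple = "P" in genotype
    smooth = "S" in genotype
    labels = {(True, True): "A", (True, False): "B",
              (False, True): "C", (False, False): "D"}
    return labels[(purple, smooth)]


# Each cell of the 4 x 4 square combines one gamete from each parent
counts = Counter(phenotype(g1 + g2) for g1, g2 in product(gametes, repeat=2))
print(sorted(counts.items()))  # [('A', 9), ('B', 3), ('C', 3), ('D', 1)]
```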

What is the null hypothesis (H0)?


H0 = there is no statistically significant difference between the observed frequency of maize grains and the expected frequency (the 9:3:3:1 ratio).
HA = there is a significant difference between the observed frequency of maize grains and the expected frequency.
If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.

Calculating χ²

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = the observed results and E = the expected (or predicted) results.
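The formula translates directly into code. This is a minimal sketch; `chi_squared` is our own illustrative function, not a library call, and it assumes the observed and expected counts are listed in the same category order. Applied to the maize data with the whole-number expected frequencies 244, 81, 81 and 27, it returns about 7.81.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Maize phenotypes A-D, using expected values rounded to whole numbers
stat = chi_squared([271, 73, 63, 26], [244, 81, 81, 27])
print(round(stat, 2))  # 7.81
```

Adding up terms that have already been rounded to two decimal places by hand can give 7.82 instead; the difference is only rounding.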

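Phenotype   O     E (9:3:3:1)   O-E   (O-E)²   (O-E)²/E
A           271   244            27    729      2.99
B            73    81            -8     64      0.79
C            63    81           -18    324      4.00
D            26    27            -1      1      0.04
Total       433   433                           χ² = 7.82

Compare your calculated value of χ² with the critical value in your stats table.

Our value of χ² = 7.82. Degrees of freedom = number of categories - 1 = 4 - 1 = 3.

D.F.                        1      2      3      4      5
Critical value (P = 0.05)   3.84   5.99   7.82   9.49   11.07

Our value for χ² is equal to the critical value for 3 degrees of freedom (7.82), so this result is borderline. The borderline value is a side effect of rounding the expected frequencies to whole numbers: with unrounded expected frequencies (243.56, 81.19, 81.19 and 27.06), χ² ≈ 8.03, which exceeds the critical value, so we can reject the null hypothesis.
There is a significant difference between our expected and observed ratios, i.e. they are a poor fit.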
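The whole goodness-of-fit test can be sketched in a few lines. In this version the function name `goodness_of_fit` and the `CRITICAL_05` lookup are illustrative; it keeps the expected frequencies unrounded, uses degrees of freedom = number of categories - 1, and compares the result with the P = 0.05 critical values from the table above.

```python
# Critical values at P = 0.05 from the table above, keyed by degrees of freedom
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def goodness_of_fit(observed, ratio):
    """Chi-squared goodness-of-fit test against an expected ratio.

    Expected frequencies are left unrounded. Returns (chi2, df, reject_h0).
    Only df = 1..5 are covered by the CRITICAL_05 lookup.
    """
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # degrees of freedom = number of categories - 1
    return chi2, df, chi2 > CRITICAL_05[df]


chi2, df, reject = goodness_of_fit([271, 73, 63, 26], [9, 3, 3, 1])
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject}")
# chi2 = 8.03, df = 3, reject H0: True
```

The same function handles an even distribution by passing an equal ratio such as [1, 1, 1, 1].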

Example 2: ECOLOGY
One section of a river was trawled, and the fish of four species were counted and their frequencies recorded. The expected frequencies assume equal numbers of each of the four species in the sample.

What is the null hypothesis (H0)?


H0 = there is no statistically significant difference between the observed frequency of fish species and the expected frequency.
HA = there is a significant difference between the observed frequency of fish species and the expected frequency.
If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.

Calculating χ²

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = the observed results and E = the expected (or predicted) results.

Species   O    E    O-E   (O-E)²   (O-E)²/E
Rudd      15   10    5     25       2.5
Roach     15   10    5     25       2.5
Dace       4   10   -6     36       3.6
Bream      6   10   -4     16       1.6
Total     40   40                   χ² = 10.2

Compare your calculated value of χ² with the critical value in your table of critical values.

Our value of χ² = 10.2. Degrees of freedom = number of categories - 1 = 4 - 1 = 3.


D.F.                        1      2      3      4      5
Critical value (P = 0.05)   3.84   5.99   7.82   9.49   11.07

Our value for χ² (10.2) exceeds the critical value for 3 degrees of freedom (7.82), so we can reject the null hypothesis.
There is a significant difference between our expected and observed frequencies of fish species.
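The fish table can be rebuilt column by column. This sketch assumes, as the example does, that the expected frequency for each species is the total catch divided equally among the four species; the variable names are illustrative.

```python
species = ["Rudd", "Roach", "Dace", "Bream"]
observed = [15, 15, 4, 6]
# Equal numbers expected: 40 fish / 4 species = 10 of each
expected = [sum(observed) / len(observed)] * len(observed)

# One row per species: name, O, E, O - E, (O - E)^2, (O - E)^2 / E
rows = []
for name, o, e in zip(species, observed, expected):
    diff = o - e
    rows.append((name, o, e, diff, diff ** 2, diff ** 2 / e))

for row in rows:
    print("{:<6} {:>3} {:>5.1f} {:>5.1f} {:>6.1f} {:>5.2f}".format(*row))

# chi-squared is the sum of the last column
chi2 = sum(row[-1] for row in rows)
print(f"chi2 = {chi2:.1f}")  # chi2 = 10.2
```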

Example 3: CONTINGENCY TABLES


You can use a contingency table to calculate expected frequencies when you are investigating the relationship between two categorical variables.

In this example we will look at the incidence of colour blindness in both males and females.

What is the null hypothesis (H0)?


H0 = there is no statistically significant difference between the observed frequency of colour blindness in males and females.
HA = there is a significant difference between the observed frequency of colour blindness in males and females.
If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.

Observed frequencies:
          Colour blind   Not colour blind   Total
Males     56             754                810
Females   14             536                550
Total     70             1290               1360

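Expected cell frequency:

$$E = \frac{\text{row total} \times \text{column total}}{n}$$

where n is the grand total of all observations (1360 here).

e.g. the expected frequency for colour-blind males = (56 + 14) x (56 + 754) / 1360 = 70 x 810 / 1360 ≈ 41.7, which rounds to 42.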
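The expected-frequency rule works for a table of any size. A minimal sketch, where `expected_frequencies` is an illustrative name and the observed table is passed as a list of rows:

```python
def expected_frequencies(table):
    """Expected cell counts for a contingency table: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: males, females; columns: colour blind, not colour blind
observed = [[56, 754], [14, 536]]
for row in expected_frequencies(observed):
    print([round(e, 1) for e in row])
# [41.7, 768.3]
# [28.3, 521.7]
```

Rounded to whole numbers these give 42, 768, 28 and 522.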

Expected frequencies (rounded to whole numbers):
          Colour blind   Not colour blind
Males     42             768
Females   28             522

(O-E)²/E:
          Colour blind   Not colour blind
Males     4.67           0.26
Females   7.00           0.38

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.67 + 0.26 + 7.00 + 0.38 \approx 12.3$$

Compare your calculated value of χ² with the critical value in your table of critical values.

Our value of χ² ≈ 12.3. Degrees of freedom = (number of rows - 1) x (number of columns - 1) = (2 - 1) x (2 - 1) = 1.


D.F.                        1      2      3      4      5
Critical value (P = 0.05)   3.84   5.99   7.82   9.49   11.07

Our value for χ² exceeds the critical value for 1 degree of freedom (3.84), so we can reject the null hypothesis. There is a significant difference between our expected and observed frequencies.

A larger fraction of males than females is colour blind (56/810 ≈ 6.9% compared with 14/550 ≈ 2.5%), and the difference is unlikely to be due to chance alone (P < 0.05).
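The full test of association can be sketched the same way. This hypothetical `chi_squared_contingency` function (not a library call) leaves the expected frequencies unrounded, so it gives χ² ≈ 12.80 rather than the ≈ 12.3 obtained with whole-number expected frequencies; both are well above 3.84, so the conclusion is unchanged. It applies no continuity correction (some statistical software applies one to 2 x 2 tables by default, which gives a smaller value).

```python
def chi_squared_contingency(table):
    """Chi-squared test of association for an r x c table, without continuity correction.

    Returns (chi2, df), with expected frequencies left unrounded.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # row total x column total / n
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) x (columns - 1)
    return chi2, df


observed = [[56, 754], [14, 536]]  # rows: males, females
chi2, df = chi_squared_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 12.80, df = 1

# Proportion of each sex that is colour blind
for sex, (cb, not_cb) in zip(["males", "females"], observed):
    print(f"{sex}: {cb / (cb + not_cb):.1%}")
# males: 6.9%
# females: 2.5%
```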
