
Chi Squared Analysis
We can determine if two genes are linked by performing a
statistical analysis on the data from an experiment.

Performing a chi-squared analysis requires knowing the expected number of progeny in each possible phenotypic class.

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$$

Comparing the actual (observed) data to the expected data shows how far the actual data deviate from expectation.
Chi Squared
If I flip a coin 10 times and get 6 heads and 4 tails, who can
confidently state that the coin is unbalanced?

If I flip a coin 1000 times and get 600 heads and 400 tails, who can
confidently state that the coin is unbalanced?

Chi-squared is a statistical analysis that allows one to assess the likelihood that results could have been obtained by chance.

For the purposes of gene linkage, we assess whether the two genes demonstrate independent assortment.
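The two coin-flip questions above can be made concrete with a short Python sketch. It assumes a fair coin as the hypothesis (expected 50% heads, 50% tails) and applies the (obs − exp)²/exp formula to each outcome; the helper names `chi_square` and `coin_chi_square` are illustrative, not from the slides.

```python
# Sketch: chi-square statistic for coin flips, ASSUMING the hypothesis is a
# fair coin (half heads, half tails). Helper names are illustrative only.

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def coin_chi_square(heads, tails):
    """Chi-square for heads/tails counts against the fair-coin hypothesis."""
    n = heads + tails
    expected = [n / 2, n / 2]
    return chi_square([heads, tails], expected)


small = coin_chi_square(6, 4)        # 10 flips
large = coin_chi_square(600, 400)    # 1000 flips
print(f"10 flips:   chi2 = {small:.2f}")   # chi2 = 0.40
print(f"1000 flips: chi2 = {large:.2f}")   # chi2 = 40.00
```

Both runs have the same 60:40 split, but the statistic grows with sample size. With two categories there is 1 degree of freedom, and the commonly tabulated 5% critical value for 1 df is 3.841, so only the 1000-flip result gives strong grounds to call the coin unbalanced.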
The Chi Square Test

The general formula is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where
- O = observed data in each category
- E = expected data in each category based on the experimenter's hypothesis
- Σ = sum of the calculations for each category
Consider the following example in Drosophila melanogaster:
- Gene affecting wing shape
  - c+ = normal wing
  - c = curved wing
- Gene affecting body color
  - e+ = normal (gray)
  - e = ebony

- Note:
  - The wild-type allele is designated with a + sign
  - Recessive mutant alleles are designated with lowercase letters

- The cross:
  - A cross is made between two true-breeding flies (c+c+e+e+ and ccee). The flies of the F1 generation are then allowed to mate with each other to produce an F2 generation.
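To see where the 9:3:3:1 expectation used in Step 2 comes from, here is a Python sketch that enumerates the F1 × F1 cross as a 4 × 4 Punnett square. The F1 flies are heterozygous for both genes (c+c e+e). The sketch assumes independent assortment (the very hypothesis being tested) and complete dominance of the + alleles; the allele strings "c+", "c", "e+", "e" are labels chosen for illustration.

```python
from collections import Counter
from itertools import product

# Sketch: ASSUMES independent assortment and complete dominance of + alleles.
# Allele labels are illustrative strings, not notation from any library.
WING_ALLELES = ("c+", "c")   # F1 flies are heterozygous c+c
BODY_ALLELES = ("e+", "e")   # and heterozygous e+e

# Independent assortment: each gamete carries one wing allele and one body
# allele, giving four equally likely gametes (c+e+, c+e, ce+, ce).
gametes = list(product(WING_ALLELES, BODY_ALLELES))


def phenotype(wing_pair, body_pair):
    """The wild-type (+) allele is dominant over the recessive mutant allele."""
    wing = "straight" if "c+" in wing_pair else "curved"
    body = "gray" if "e+" in body_pair else "ebony"
    return f"{wing} wings, {body} bodies"


# Pair every gamete from one F1 parent with every gamete from the other (4 x 4 = 16).
counts = Counter(
    phenotype((w1, w2), (b1, b2))
    for (w1, b1), (w2, b2) in product(gametes, gametes)
)

for pheno, n in counts.most_common():
    print(f"{pheno}: {n}/16")
```

The tally comes out 9/16 straight gray, 3/16 straight ebony, 3/16 curved gray and 1/16 curved ebony, which is the 9:3:3:1 ratio.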
- The outcome
  - F1 generation
    - All offspring have straight wings and gray bodies
  - F2 generation
    - 193 straight wings, gray bodies
    - 69 straight wings, ebony bodies
    - 64 curved wings, gray bodies
    - 26 curved wings, ebony bodies
    - 352 total flies

- Applying the chi square test
- Step 1: Propose a hypothesis that allows us to calculate the expected values based on Mendel's laws
  - The two traits are independently assorting
- Step 2: Calculate the expected values of the four phenotypes, based on the hypothesis
  - According to our hypothesis, there should be a 9:3:3:1 ratio in the F2 generation

Phenotype                      Expected probability   Expected number
straight wings, gray bodies    9/16                   9/16 × 352 = 198
straight wings, ebony bodies   3/16                   3/16 × 352 = 66
curved wings, gray bodies      3/16                   3/16 × 352 = 66
curved wings, ebony bodies     1/16                   1/16 × 352 = 22
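A Python sketch of the Step 2 arithmetic: each expected count is the phenotype's share of the 9:3:3:1 ratio multiplied by the 352 flies scored. `fractions.Fraction` keeps the values exact, and they come out as whole numbers here because 352/16 = 22.

```python
from fractions import Fraction

# Sketch: expected F2 counts under the 9:3:3:1 ratio predicted by the
# independent-assortment hypothesis, for the 352 flies scored.
TOTAL = 352
ratio = {
    "straight wings, gray bodies": 9,
    "straight wings, ebony bodies": 3,
    "curved wings, gray bodies": 3,
    "curved wings, ebony bodies": 1,
}

parts = sum(ratio.values())  # 9 + 3 + 3 + 1 = 16
expected = {pheno: Fraction(k, parts) * TOTAL for pheno, k in ratio.items()}

for pheno, e in expected.items():
    print(f"{pheno}: {e}")
```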
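- Step 3: Apply the chi square formula

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \frac{(O_4 - E_4)^2}{E_4}$$

$$\chi^2 = \frac{(193 - 198)^2}{198} + \frac{(69 - 66)^2}{66} + \frac{(64 - 66)^2}{66} + \frac{(26 - 22)^2}{22}$$

$$\chi^2 = 0.13 + 0.14 + 0.06 + 0.73$$

$$\chi^2 = 1.06$$

  - This is the sum of the terms after rounding each to two decimal places; adding the unrounded terms gives χ² ≈ 1.05.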
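The Step 3 sum can be reproduced in Python as a sketch, using the observed and expected counts from the slides. It adds the unrounded terms, which gives χ² ≈ 1.05 rather than the 1.06 obtained by adding terms already rounded to two decimals; the difference does not change the interpretation.

```python
# Sketch: the Step 3 calculation for the Drosophila F2 data.
observed = [193, 69, 64, 26]   # F2 counts: straight/gray, straight/ebony, curved/gray, curved/ebony
expected = [198, 66, 66, 22]   # 9:3:3:1 expectation for 352 flies

# One (O - E)^2 / E term per phenotype, then their sum
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)

print([round(t, 2) for t in terms])  # [0.13, 0.14, 0.06, 0.73]
print(round(chi2, 2))                # 1.05
```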
- Step 4: Interpret the chi square value
  - The calculated chi square value can be used to obtain probabilities, or P values, from a chi square table
  - These probabilities allow us to determine the likelihood that the observed deviations are due to random chance alone
  - Low chi square values indicate a high probability that the observed deviations could be due to random chance alone
  - High chi square values indicate a low probability that the observed deviations are due to random chance alone
  - If the chi square value results in a probability that is less than 0.05 (i.e., less than 5%), the hypothesis is rejected
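Instead of a printed chi square table, the P value can also be computed directly. The Python sketch below is a swapped-in technique, not the slides' method: for an integer number of degrees of freedom it evaluates the chi-square upper-tail probability as the regularized upper incomplete gamma function Q(df/2, x/2), built up with the recurrence Q(a + 1, y) = Q(a, y) + yᵃe⁻ʸ/Γ(a + 1). The function names `chi2_p_value` and `reject_hypothesis` are hypothetical; the 0.05 cutoff is the one stated in the slides.

```python
import math

# Sketch (assumption): P value of a chi-square statistic for integer df,
# computed as the regularized upper incomplete gamma function Q(df/2, x/2)
# using Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).


def chi2_p_value(x, df):
    """Probability that a chi-square variable with df degrees of freedom is >= x."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        # add y**a * exp(-y) / Gamma(a + 1), computed in log space
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def reject_hypothesis(x, df, alpha=0.05):
    """Hypothetical helper: reject when the P value is below alpha (default 5%)."""
    return chi2_p_value(x, df) < alpha


# Low chi square values give high P values; high values give low P values.
for x in (0.5, 2.0, 8.0, 15.0):
    print(f"chi2 = {x:4.1f}  P = {chi2_p_value(x, 3):.3f}  reject: {reject_hypothesis(x, 3)}")
```

Running the loop shows the pattern described in Step 4: as the chi square value rises, the P value falls, and only the larger values (8.0 and 15.0 with df = 3) drop below 0.05.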

  - Before we can use the chi square table, we have to determine the degrees of freedom (df)
  - The df is a measure of the number of categories that are independent of each other
  - df = n - 1, where n = total number of categories
  - In our experiment, there are four phenotypes/categories
  - Therefore, df = 4 - 1 = 3
  - Refer to Table 2.1
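A Python sketch of the degrees-of-freedom step, paired with a critical-value lookup as an alternative way of using Table 2.1. The dictionary holds the standard tabulated chi-square critical values at P = 0.05 for df 1 to 5; `CRITICAL_05` is an illustrative name, and 1.06 is the chi square value from Step 3. Rejecting when χ² exceeds the critical value is equivalent to rejecting when P < 0.05.

```python
# Sketch: degrees of freedom and the 5% critical value for a chi-square test.
# Standard tabulated chi-square critical values at P = 0.05 (df 1 to 5 only).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

categories = [
    "straight wings, gray bodies",
    "straight wings, ebony bodies",
    "curved wings, gray bodies",
    "curved wings, ebony bodies",
]

df = len(categories) - 1          # df = n - 1
critical = CRITICAL_05[df]
print(df, critical)               # 3 7.815

chi2 = 1.06                       # value from Step 3
print("reject" if chi2 > critical else "do not reject")  # do not reject
```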



  - With df = 3, the chi square value of 1.06 is slightly greater than 1.005 (which corresponds to P = 0.80), so its P value is slightly less than 0.80
  - P = 0.80 means that chi square values equal to or greater than 1.005 are expected to occur 80% of the time based on random chance alone
  - Therefore, it is quite probable that the deviations between the observed and expected values in this experiment can be explained by random sampling error
  - Because this P value is far above 0.05, the hypothesis that the two genes assort independently is not rejected
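As a check on the table reading, this Python sketch computes the P value for the fly data directly. It relies on the closed form of the chi-square upper-tail probability for df = 3, P(X ≥ x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2), which is a swapped-in calculation rather than the slides' table lookup.

```python
import math

# Sketch: P value for the fly data using the closed-form chi-square
# upper-tail probability for df = 3:
#   P(X >= x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
observed = [193, 69, 64, 26]
expected = [198, 66, 66, 22]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, P = {p:.2f}")          # chi2 = 1.05, P = 0.79
print("reject" if p < 0.05 else "do not reject")  # do not reject
```

The result, P ≈ 0.79, sits just below 0.80, consistent with 1.06 being slightly greater than 1.005, and is far above the 0.05 cutoff.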
