
CHAPTER 11
Other Chi-Square Tests

Statistics and Heredity

An Austrian monk, Gregor Mendel (1822–1884), studied genetics, and his principles are the foundation for modern genetics. Mendel used his spare time to grow a variety of peas at the monastery. One of his many experiments involved crossbreeding peas that had smooth yellow seeds with peas that had wrinkled green seeds. He noticed that the results occurred with regularity. That is, some of the offspring had smooth yellow seeds, some had smooth green seeds, some had wrinkled yellow seeds, and some had wrinkled green seeds. Furthermore, after several experiments, the percentages of each type seemed to remain approximately the same. Mendel formulated his theory based on the assumption of dominant and recessive traits and tried to predict the results. He then crossbred his peas and examined 556 seeds over the next generation. Finally, he compared the actual results with the theoretical results to see if his theory was correct. To do this, he used a “simple” chi-square test, which is explained in this chapter.

© Copyright McGraw-Hill 2004

Objectives

- Test a distribution for goodness of fit using chi-square.
- Test two variables for independence using chi-square.
- Test proportions for homogeneity using chi-square.

Introduction

- The chi-square distribution can be used for tests concerning frequency distributions, such as: “If a sample of buyers is given a choice of automobile colors, will each color be selected with the same frequency?”
- The chi-square distribution can also be used to test the independence of two variables. For example, “Are senators’ opinions on gun control independent of party affiliations?”

Introduction (cont’d.)

- The chi-square distribution can be used to test the homogeneity of proportions. For example: “Is the proportion of high school seniors who attend college immediately after graduating the same for the northern, southern, eastern, and western parts of the United States?”

Test for Goodness-of-Fit

- The chi-square statistic can be used to see whether a frequency distribution fits a specific pattern. This is referred to as the chi-square goodness-of-fit test.


Observed Frequencies vs. Expected Frequencies

Suppose a market analyst wished to see whether consumers have any preference among five flavors of a new fruit soda. A sample of 100 people provided these data:

Cherry   Strawberry   Orange   Lime   Grape
32       28           16       14     10

Since the frequencies for each flavor were obtained from a sample, these actual frequencies are called the observed frequencies. The frequencies obtained by calculation (as if there were no preference) are called the expected frequencies.

Frequency   Cherry   Strawberry   Orange   Lime   Grape
Observed    32       28           16       14     10
Expected    20       20           20       20     20

Goodness-of-Fit Test

- The formula for the chi-square goodness-of-fit test:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with d.f. = number of categories − 1
O = observed frequency
E = expected frequency

This test is a right-tailed test, since when the (O − E) values are squared, the answer will be positive or zero.
Goodness-of-Fit Test Assumptions

- The data are obtained from a random sample.
- The expected frequency for each category must be 5 or more.

An Example

Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors, using the data shown previously? Let α = 0.05.

Solution
STEP 1 State the hypotheses and identify the claim.
H0: Consumers show no preference for flavors (claim).
H1: Consumers show a preference.
STEP 2 Find the critical value. The degrees of freedom are 5 − 1 = 4, and α = 0.05. Hence, the critical value from the table is 9.488.
STEP 3 Compute χ².

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 18.0$$

An Example (cont’d)

STEP 4 Make the decision. The decision is to reject the null hypothesis, since 18.0 > 9.488.
STEP 5 Summarize the results. There is enough evidence to reject the claim that consumers show no preference for the flavors.

A Good Fit

- When the observed values and expected values are close together, the chi-square test value will be small. Then the decision will be not to reject the null hypothesis; hence, there is a “good fit.”

[Figure: observed values and expected values plotted on x-y axes]
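The five steps of the fruit soda example can be retraced in a few lines of Python. This is only a sketch using the standard library: the variable names are illustrative, and the critical value 9.488 is copied from the text rather than computed.

```python
# Fruit soda example: chi-square goodness-of-fit by hand (standard library only).
observed = {"Cherry": 32, "Strawberry": 28, "Orange": 16, "Lime": 14, "Grape": 10}

n = sum(observed.values())            # 100 people in the sample
expected = n / len(observed)          # no preference -> 100 / 5 = 20 per flavor

# Test value: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

df = len(observed) - 1                # number of categories - 1 = 4
critical_value = 9.488                # table value for d.f. = 4, alpha = 0.05 (from the text)

reject_h0 = chi_square > critical_value   # right-tailed test
print(f"chi-square = {chi_square:.1f}, d.f. = {df}, reject H0: {reject_h0}")
```

The printed test value agrees with the 18.0 found in Step 3, and the comparison with 9.488 reproduces the decision in Step 4.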

Not A Good Fit

- When the observed values and the expected values are far apart, the chi-square test value will be large. Then, the null hypothesis will be rejected; hence, there is “not a good fit.”

[Figure: observed values and expected values plotted on x-y axes]

Chi-Square Goodness-of-Fit Procedure

- Step 1 State the hypotheses and identify the claim.
- Step 2 Find the critical value. The test is always right-tailed.
- Step 3 Compute the test value. Find the sum of the $\frac{(O - E)^2}{E}$ values.
- Step 4 Make the decision.
- Step 5 Summarize the results.



Goodness-of-Fit Results

- When there is perfect agreement between the observed and the expected values, χ² = 0, but χ² can never be negative.
- The test is right-tailed because “H0: Good fit” and “H1: Not a good fit” mean that χ² will be small in the first case and χ² will be large in the second case.

More Examples

Goodness-of-fit can also be used to test the observed data against known frequency distributions.

Poisson Distribution: Let X be the number of defects in printed circuit boards. A random sample of n = 60 printed circuit boards is taken and the number of defects recorded. The results were as follows:

No. of defects   0    1    2    3
Observed Freq.   32   15   9    4

Does the assumption of a Poisson distribution seem appropriate as a model for these data?

Solution:
The mean of the (assumed) Poisson distribution is unknown, so it must be estimated from the data by the sample mean:

More Examples (cont.)

$$\bar{X} = \frac{32 \times 0 + 15 \times 1 + 9 \times 2 + 4 \times 3}{60} = 0.75$$

Using the Poisson distribution with µ = 0.75, we can compute p_i, the hypothesized probabilities associated with each class. From these we can calculate the expected frequencies (under the null hypothesis). The last class is treated as “3 or more”, so its probability is the complement of the first three:

$$P(X = 0) = \frac{e^{-0.75}\, 0.75^{0}}{0!} = 0.472 \Rightarrow E(X = 0) = 0.472 \times 60 = 28.32$$

$$P(X = 1) = \frac{e^{-0.75}\, 0.75^{1}}{1!} = 0.354 \Rightarrow E(X = 1) = 0.354 \times 60 = 21.24$$

$$P(X = 2) = \frac{e^{-0.75}\, 0.75^{2}}{2!} = 0.133 \Rightarrow E(X = 2) = 0.133 \times 60 = 7.98$$

$$P(X \ge 3) = 1 - \bigl(P(X = 0) + P(X = 1) + P(X = 2)\bigr) = 0.041 \Rightarrow E(X \ge 3) = 0.041 \times 60 = 2.46$$

Now do the chi-square goodness-of-fit test.

Step 1: The null hypothesis is H0: X ∼ Poisson. The alternative hypothesis is H1: X does not follow a Poisson distribution.

Step 2: Find the critical value. The degrees of freedom are 4 − 1 − 1 = 2, since we had to estimate one parameter (the mean, µ) from the data. With α = 0.05, the critical value from the chi-square table is 5.991.

Step 3: Compute χ²:

$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 3.41$$

Step 4: Make the decision: do not reject the null hypothesis, since the test value is less than 5.991.
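The Poisson example can be reproduced with a short Python sketch. It is only an illustration: the helper `poisson_pmf` and the variable names are made up for this sketch, and the critical value 5.991 is copied from the text. Because the code keeps the probabilities unrounded, its test value (about 3.46) differs slightly from a hand calculation based on probabilities rounded to three decimals.

```python
import math

# Poisson example: defects per printed circuit board (n = 60 boards).
defects = [0, 1, 2, 3]          # the last class is treated as "3 or more"
observed = [32, 15, 9, 4]

n = sum(observed)
mean = sum(d * o for d, o in zip(defects, observed)) / n   # sample mean = 0.75


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# Probabilities for 0, 1, 2, then the complement for "3 or more".
probs = [poisson_pmf(k, mean) for k in defects[:-1]]
probs.append(1.0 - sum(probs))

expected = [p * n for p in probs]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One parameter (the mean) was estimated, so d.f. = classes - 1 - 1.
df = len(observed) - 1 - 1
critical_value = 5.991          # table value for d.f. = 2, alpha = 0.05 (from the text)

print([round(e, 2) for e in expected])
print(f"chi-square = {chi_square:.2f}, d.f. = {df}, reject H0: {chi_square > critical_value}")
```

The decision is the same either way: the test value is well below 5.991. The printed expected frequencies also show that the last class falls below 5, so the expected-frequency assumption stated earlier is not fully met for this table.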

More Examples (cont.)

Uniform distribution: A staff member of an emergency medical service wishes to determine whether the number of accidents is equally distributed during the week. A week was selected at random, and the following data were obtained. State the hypotheses and identify the claim. Find the critical value at α = 0.005.

Day                Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Sun.
No. of Accidents   23     31      14     17       35     44     21

Solution:
“Equal distribution” represents a uniform distribution, i.e., the expected number of accidents is the same for every day:

(23 + 31 + 14 + 17 + 35 + 44 + 21)/7 = 185/7 ≈ 26.4

Next do the goodness-of-fit test:

Step 1: State the hypotheses.
H0: The distribution is uniform.
H1: The distribution is not uniform.

Step 2: Find the critical value for α = 0.005. No parameter is estimated from the data, so the degrees of freedom are 7 − 1 = 6, and the critical value from the table is 18.548.

Step 3: Compute chi-square:

$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 26.02$$

Step 4: Make the decision to reject the null hypothesis, since 26.02 > 18.548. Therefore the claim that the number of accidents is evenly distributed during the week is not supported.
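The expected count and the test value for the accident data can be checked with the sketch below. The variable names are illustrative, and the code only computes the statistic; the result is then compared with the table critical value as in Step 4.

```python
# Accidents-per-day example: expected count and test value under a uniform model.
accidents = {"Mon": 23, "Tues": 31, "Wed": 14, "Thurs": 17, "Fri": 35, "Sat": 44, "Sun": 21}

total = sum(accidents.values())          # 185 accidents in the week
expected = total / len(accidents)        # same expected count for every day

# Test value: sum of (O - E)^2 / E over the seven days.
chi_square = sum((o - expected) ** 2 / expected for o in accidents.values())

print(f"expected per day = {expected:.1f}")
print(f"chi-square = {chi_square:.2f}")
```

Saturday alone contributes about 11.7 to the total, which is why the test value ends up so far in the right tail.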
Tests Using Contingency Tables

When data can be tabulated in table form in terms of frequencies, several types of hypotheses can be tested using the chi-square test. Two such tests are the independence of variables test and the homogeneity of proportions test. The test of independence of variables is used to determine whether two variables are independent of or related to each other when a single sample is selected. The test of homogeneity of proportions is used to determine whether the proportions for a variable are equal when several samples are selected from different populations. Both tests use the chi-square distribution and a contingency table, and the test value is found in the same way. The independence test will be explained first.

Independence Test

- The chi-square independence test can be used to test the independence of two variables.
- H0: There is no relationship between two variables.
- H1: There is a relationship between two variables.
- If the null hypothesis is rejected, there is some relationship between the variables.


An Example

Suppose a new postoperative procedure is administered to a number of patients in a large hospital. One can ask the question, Do the doctors feel differently about this procedure from the nurses, or do they feel basically the same way? Note that the question is not whether they prefer the procedure but whether there is a difference of opinion between the two groups.

To answer this question, a researcher selects a sample of nurses and doctors and tabulates the data in table form, as shown.

Group     Prefer new   Prefer old   No Preference
Nurses    100          80           20
Doctors   50           120          30

An Example (cont’d)

If the null hypothesis is not rejected, the test means that both professions feel basically the same way about the procedure and the differences are due to chance. If the null hypothesis is rejected, the test means that one group feels differently about the procedure from the other. Remember that rejection does not mean that one group favors the procedure and the other does not. Perhaps both groups favor it or both dislike it, but in different proportions.

Chi-Square Independence Test

- In order to test the null hypothesis, one must compute the expected frequencies, assuming the null hypothesis is true.
- When data are arranged in table form for the independence test, the table is called a contingency table.
- The degrees of freedom for any contingency table are d.f. = (rows − 1)(columns − 1) = (R − 1)(C − 1).

Contingency Table

        Column 1   Column 2   Column 3
Row 1   C1,1       C1,2       C1,3
Row 2   C2,1       C2,2       C2,3


Independence Test Value

- The formula for the test value for the independence test is the same as the one for the goodness-of-fit test.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with d.f. = (R − 1)(C − 1)
O = observed frequency
E = expected frequency

Calculation of the Expected Frequencies

Using the contingency table, one can compute the expected frequencies for each block (or cell) as shown next.

1. Find the sum of each row and each column, and find the grand total, as shown.

Group        Prefer new procedure   Prefer old procedure   No Preference   Row sum
Nurses       100                    80                     20              200
Doctors      50                     120                    30              200
Column sum   150                    200                    50              400 (grand total)

Calculation of the Expected Frequencies (cont’d)

2. For each cell, multiply the corresponding row sum by the column sum and divide by the grand total, to get the expected value:

$$\text{Expected value} = \frac{\text{row sum} \times \text{column sum}}{\text{grand total}}$$

The rationale for the computation of the expected frequencies for a contingency table uses proportions. For C1,1, a total of 150 out of 400 people prefer the new procedure. And since there are 200 nurses, one would expect, if the null hypothesis were true, (150/400)(200), or 75, of the nurses to be in favor of the new procedure. For example, for C1,2, the expected value, denoted by E1,2, is

$$E_{1,2} = \frac{200 \times 200}{400} = 100$$

For each cell, the expected values can be computed and placed in the table (shown in parentheses):

Group        Prefer new procedure   Prefer old procedure   No Preference   Row sum
Nurses       100 (75)               80 (100)               20 (25)         200
Doctors      50 (75)                120 (100)              30 (25)         200
Column sum   150                    200                    50              400 (grand total)
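The row-sum times column-sum rule can be sketched in Python for the nurses and doctors table. The list layout and variable names are illustrative choices for this sketch, not part of the original slides.

```python
# Expected frequencies for the nurses/doctors contingency table.
observed = [
    [100, 80, 20],   # Nurses: prefer new, prefer old, no preference
    [50, 120, 30],   # Doctors
]

row_sums = [sum(row) for row in observed]            # [200, 200]
col_sums = [sum(col) for col in zip(*observed)]      # [150, 200, 50]
grand_total = sum(row_sums)                          # 400

# E[i][j] = (row sum i) * (column sum j) / grand total
expected = [[r * c / grand_total for c in col_sums] for r in row_sums]

for row in expected:
    print(row)
```

Both rows come out as 75, 100, and 25 because the two groups have the same row sum of 200; with unequal group sizes the expected rows would differ.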


Calculation of the Expected Frequencies (cont’d)

Now, compute χ²:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 26.67$$

The final steps are to make the decision and summarize the results. This test is always a right-tailed test, and the degrees of freedom are (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2. If α = 0.05, the critical value from the table is 5.991. Hence, the decision is to reject the null hypothesis, since 26.67 > 5.991.

Homogeneity of Proportions Test

- The homogeneity of proportions test is used when samples are selected from several different populations and the researcher is interested in determining whether the proportions of elements that have a common characteristic are the same for each population. The sample sizes are specified in advance, making either the row totals or column totals in the contingency table known before the samples are selected.
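The test value for the nurses and doctors table can be checked with a small helper that works for any table of observed counts. The function name `chi_square_statistic` is hypothetical and used only for this sketch; the same function applies to homogeneity tables, since the test value is found in the same way.

```python
def chi_square_statistic(observed):
    """Return (test value, degrees of freedom) for a table of observed counts."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / grand_total   # expected frequency
            chi_square += (o - e) ** 2 / e

    df = (len(row_sums) - 1) * (len(col_sums) - 1)        # (R - 1)(C - 1)
    return chi_square, df


nurses_doctors = [[100, 80, 20], [50, 120, 30]]
value, df = chi_square_statistic(nurses_doctors)
print(f"chi-square = {value:.2f}, d.f. = {df}")
```

Each row of the table contributes 8.33 + 4 + 1 to the sum, which gives the test value of about 26.67 with 2 degrees of freedom.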

Homogeneity of Proportions Test (cont’d.)

- H0: p1 = p2 = p3 = … = pn.
- H1: At least one proportion is different from the others.
- When the null hypothesis is rejected, it can be assumed that the proportions are not all equal.

Independence and Homogeneity

- The procedures for the chi-square independence and homogeneity tests are identical and summarized below.
- Step 1 State the hypotheses and identify the claim.
- Step 2 Find the critical value in the right tail.


Independence and Homogeneity (cont’d.)

- Step 3 Compute the test value. To compute the test value, first find the expected values. For each cell of the contingency table, use the formula

$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$

to get the expected value. To find the test value, use the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

- Step 4 Make the decision.
- Step 5 Summarize the results.
An Example

A researcher selected a sample of 50 seniors from each of three area high schools (150 seniors in all) and asked each senior, “Do you drive to school in a car owned by either you or your parents?” The data are shown in the table. At α = 0.05, test the claim that the proportion of students who drive their own or their parents’ cars is the same at all three schools.

        School 1   School 2   School 3   Total
Yes     18         22         16         56
No      32         28         34         94
Total   50         50         50         150

An Example (cont’d)

STEP 1 State the hypotheses.
H0: p1 = p2 = p3
H1: At least one proportion is different from the others.

STEP 2 Find the critical value. The formula for the degrees of freedom is the same as before: (2 − 1)(3 − 1) = 2. The critical value is 5.991.

STEP 3 Compute the test value. First, compute the expected values; the complete table is shown, with expected values in parentheses.

        School 1     School 2     School 3     Total
Yes     18 (18.67)   22 (18.67)   16 (18.67)   56
No      32 (31.33)   28 (31.33)   34 (31.33)   94
Total   50           50           50           150


An Example (cont’d)

The test value is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.596$$

STEP 4 Make the decision. The decision is not to reject the null hypothesis, since 1.596 < 5.991.
STEP 5 Summarize the results. There is not enough evidence to reject the null hypothesis that the proportions of high school students who drive their own or their parents’ cars to school are equal for each school.

Assumptions

- The assumptions for the chi-square independence and homogeneity tests:
1. The data are obtained from a random sample.
2. The expected value in each cell must be 5 or more.
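For the school driving example, the homogeneity test can be sketched from the point of view of a common proportion: under H0 every school shares the pooled “yes” proportion 56/150, and each expected count is that proportion times the school's sample size. The variable names are illustrative, and the critical value 5.991 is copied from the text. The sketch also checks the assumption that every expected value is 5 or more.

```python
# Homogeneity of proportions: seniors who drive to school, three schools.
yes = [18, 22, 16]
no = [32, 28, 34]
sizes = [y + n for y, n in zip(yes, no)]          # 50 seniors per school

pooled_yes = sum(yes) / sum(sizes)                # 56 / 150, the common proportion under H0

chi_square = 0.0
smallest_expected = float("inf")
for y, n, size in zip(yes, no, sizes):
    e_yes = size * pooled_yes                     # about 18.67
    e_no = size * (1 - pooled_yes)                # about 31.33
    chi_square += (y - e_yes) ** 2 / e_yes + (n - e_no) ** 2 / e_no
    smallest_expected = min(smallest_expected, e_yes, e_no)

df = (2 - 1) * (len(sizes) - 1)                   # (rows - 1)(columns - 1) = 2
critical_value = 5.991                            # table value for d.f. = 2, alpha = 0.05 (from the text)

print(f"chi-square = {chi_square:.3f}, d.f. = {df}")
print("expected counts all >= 5:", smallest_expected >= 5)
print("reject H0:", chi_square > critical_value)
```

Multiplying the sample size by the pooled proportion gives the same number as (row sum)(column sum)/(grand total), so this is the usual expected-value formula written in terms of proportions.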

Summary

- There are three main uses of the chi-square distribution:
1. The test of independence is used to determine whether two variables are related or are independent.
2. It can be used as a goodness-of-fit test, in order to determine whether the frequencies of a distribution are the same as the hypothesized frequencies.
3. The homogeneity of proportions test is used to determine if several proportions are all equal when samples are selected from different populations.

Conclusions

- The chi-square distribution is useful in a variety of hypothesis tests that can be applied to many different everyday situations.
