
CHAPTER 11 Chi-Square Test

11.1 Introduction
- The chi-square (χ², Greek letter chi, pronounced "ki") test looks not at an
individual item of data (i.e., a single parameter) but at the whole distribution.
As a result, it is known as a nonparametric or distribution-free test.
These tests do not require the population to follow the normal distribution.

- These tests can be used
i) to test a hypothesis about a single variance or standard deviation.
ii) for tests concerning frequency distributions.
iii) to test the independence of two variables.


Characteristics of the Chi-square Distribution:

i) The value of χ² is never negative (i.e., it is always zero or positive).

ii) There is a family of χ² distributions, each with a different shape depending on
the number of degrees of freedom (df).

iii) When the number of df is small, the distribution is positively skewed, but as
the number of df increases, it becomes more symmetrical and approaches
the normal distribution.

[Figure: The Chi-square Family of Curves]

11.2 A Goodness-of-Fit Test

Definition: A nonparametric test involving a set of observed frequencies and a
corresponding set of expected frequencies.

Purpose: To determine whether there is a statistically significant difference between
two sets of data, one observed and the other expected.
It determines whether the frequencies observed for a categorical
variable could have been drawn from a hypothesized population
distribution.

The null and alternative hypotheses are usually stated as:
H₀: The sample is from the specified population
H₁: The sample is not from the specified population


In goodness-of-fit tests, the χ² distribution is used to determine how well an
observed set of data fits an expected set of data.


The frequencies obtained from performing an experiment are called the
observed frequencies and are denoted by O. The expected frequencies, denoted by E,
are the frequencies that we expect to obtain if the null hypothesis is true. The
expected frequency for a category is obtained as

$$E = np$$

where
n is the sample size and
p is the probability that an element belongs to that category if the null hypothesis is
true.
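A minimal Python sketch of the rule E = np, assuming the hypothesized category probabilities are supplied as a list that sums to 1. The function name expected_frequencies is hypothetical, chosen for illustration only.

```python
def expected_frequencies(n, probs):
    """Expected frequency E = n * p for each category, with p taken from H0."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities under H0 must sum to 1")
    return [n * p for p in probs]


# Hypothetical use: n = 1200 observations, five categories equally likely under H0
print(expected_frequencies(1200, [0.20] * 5))
```

Checking that the probabilities sum to 1 guards against a mis-specified null hypothesis, since the expected frequencies must then add up to n.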

The test statistic for a goodness-of-fit test is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency for a category
E = expected frequency for a category = np
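The statistic is a plain sum over the categories, as in this minimal Python sketch (the name chi_square_statistic is hypothetical). It assumes every expected frequency is positive, because each term divides by E.

```python
def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over the categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: O = 50, 30, 20 against E = 40, 40, 20
print(chi_square_statistic([50, 30, 20], [40, 40, 20]))  # 5.0
```

Squaring each difference keeps positive and negative deviations from cancelling, which is why the statistic is never negative.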
If the observed and expected frequencies are close in every category, the resulting
statistic will be small and H₀ is not rejected.

If large differences exist in some categories, a large statistic results and H₀ will be
rejected. Thus, a chi-square goodness-of-fit test is always a right-tailed test.

In a goodness-of-fit test, the degrees of freedom are

$$df = k - 1$$

where k is the number of categories.
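The right-tailed decision compares the statistic with a critical value from a χ² table. As a hedged alternative, the sketch below computes the right-tail area (p-value) directly from the standard closed-form expressions that exist for integer degrees of freedom; chi2_sf and goodness_of_fit_decision are hypothetical names. Rejecting H₀ when the p-value is below α is equivalent to rejecting when χ² exceeds the critical value. If SciPy is available, scipy.stats.chi2.sf(x, df) computes the same right-tail area.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X >= x) for X ~ chi-square(df), integer df >= 1.

    Hypothetical helper based on the closed forms for integer df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal density at chi
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + 2.0 * phi * total


def goodness_of_fit_decision(stat: float, k: int, alpha: float):
    """Right-tailed test with df = k - 1; reject H0 when the p-value is below alpha."""
    df = k - 1
    p_value = chi2_sf(stat, df)
    return df, p_value, p_value < alpha


# At the 1% table value for df = 4 (13.277), the right-tail area should be close to 0.01
print(chi2_sf(13.277, 4))
```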



Example 11.2: A bank has an ATM installed inside the bank, and it is available to its
customers only from 7am to 6pm Monday through Friday. The manager of the bank
wanted to investigate if the percentage of transactions made on this ATM is the same
for each of the five days (Monday through Friday) of the week. She randomly
selected one week and counted the number of transactions made on this ATM on each
of the five days during this week. The information she obtained is given in the
following table, where the number of users represents the number of transactions on
this ATM on these days. For convenience, we will refer to these transactions as
people or users.

Day              Monday  Tuesday  Wednesday  Thursday  Friday
Number of users  253     197      204        279       267

At the 1% level of significance, can we reject the null hypothesis that the proportion
of people who use this ATM each of the five days of the week is the same? Assume
that this week is typical of all weeks in regard to the use of this ATM.

Solution:
H₀: p₁ = p₂ = p₃ = p₄ = p₅ = 0.20
H₁: At least two of the five proportions are not equal to 0.20

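Use the χ² distribution with df = k − 1 = 5 − 1 = 4. At the 1% significance level, the
critical value of χ² is 13.277.

With n = 253 + 197 + 204 + 279 + 267 = 1200, the expected frequency for each day is
E = np = 1200(0.20) = 240.

Test statistic:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 23.184$$

Since 23.184 > 13.277, we reject H₀.
We conclude that the proportion of people who use this ATM is not the same for all
five days of the week; that is, this ATM is used more on one or more of these days.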
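A minimal Python sketch of the Example 11.2 computation, using only the counts from the table and the critical value 13.277 stated in the solution. It gives χ² ≈ 23.183; the 23.184 in the text comes from summing the five terms after rounding each to three decimals.

```python
# Example 11.2: ATM transactions, Monday through Friday
observed = [253, 197, 204, 279, 267]
k = len(observed)
n = sum(observed)                      # 1200 transactions in total
p0 = 1 / k                             # H0: each day has proportion 0.20
expected = [n * p0 for _ in observed]  # 240 for every day

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1                             # 5 - 1 = 4
critical = 13.277                      # table value from the text (df = 4, alpha = 0.01)

print(f"n = {n}, chi-square = {stat:.3f}, df = {df}")
print("Reject H0" if stat > critical else "Do not reject H0")
```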


11.3 Tests of Independence

We often have information on more than one variable for each element. Such
information can be summarized and presented using a two-way classification table,
which is also called a contingency table or cross-tabulation.

Example: Total Enrollment at a university
          Full-time  Part-time
Male      3768       2615
Female    4658       3717

A contingency table can be of any size. For example, it can be 3 × 2, 2 × 3, or 3 × 3.
In general, the table is made up of r rows and c columns and is designated as an r × c
(number of rows by number of columns) table.

The χ² test of independence can be used to test the independence of two variables,
that is, to determine whether a relationship exists between two variables. In other words,
we test the null hypothesis that the two characteristics of the elements of a given
population are NOT related (i.e., they are independent) against the alternative that
the two characteristics are related (i.e., they are dependent).

E.g.: (i) Choice of TV program and gender.
(ii) Magazines read and educational background.
(iii) Years of working experience and income.

A test of independence involves a test of the null hypothesis that two characteristics of
a population are not related. The degrees of freedom for a test of independence are

$$df = (r - 1)(c - 1)$$

where r and c are the number of rows and the number of columns, respectively, in the
given contingency table.

The test statistic for a test of independence is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O and E are the observed and expected frequencies, respectively, for a cell.
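For a contingency table, the expected frequency of each cell under H₀ is usually computed as E = (row total)(column total)/n, where n is the total of all cells. A minimal Python sketch under that assumption, with the hypothetical helper names contingency_expected and independence_test, which also returns df = (r − 1)(c − 1):

```python
def contingency_expected(table):
    """Expected cell counts under H0: E = (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_test(table):
    """Return (chi-square statistic, df) for an r x c table of observed counts."""
    expected = contingency_expected(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    r, c = len(table), len(table[0])
    return stat, (r - 1) * (c - 1)


# Enrollment table from the text: rows = Male, Female; columns = Full-time, Part-time
enrollment = [[3768, 2615], [4658, 3717]]
stat, df = independence_test(enrollment)
print(f"chi-square = {stat:.3f}, df = {df}")
```

Row and column headings are not part of the nested list, so len(table) and len(table[0]) give r and c directly.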

Note:
- Row and column headings do not count in determining the values of r and c.
Each block in the table is called a cell.



Example 11.4: Violence and lack of discipline have become major problems in
schools in the United States. A random sample of 300 adults was selected, and these
adults were asked if they favor giving more freedom to schoolteachers to punish
students for violence and lack of discipline. The two-way classification of the
responses of these adults is presented in the following table. Does the sample provide
sufficient evidence to conclude that the two attributes, gender and opinions of adults,
are dependent? Use a 1% significance level.
            In Favor (F)  Against (A)  No Opinion (N)
Men (M)     93            70           12
Women (W)   87            32           6

Solution:
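A minimal Python sketch of the test, assuming H₀: gender and opinion are independent, H₁: they are dependent, the usual expected-frequency rule E = (row total)(column total)/n, and the table value 9.210 for df = (2 − 1)(3 − 1) = 2 at the 1% level. Under these assumptions it gives χ² ≈ 8.253, which is below 9.210, so H₀ would not be rejected: the sample does not provide sufficient evidence at the 1% level that gender and opinion are dependent.

```python
# Example 11.4 data: rows = Men, Women; columns = In Favor, Against, No Opinion
observed = [[93, 70, 12], [87, 32, 6]]

row_totals = [sum(row) for row in observed]        # [175, 125]
col_totals = [sum(col) for col in zip(*observed)]  # [180, 102, 18]
n = sum(row_totals)                                # 300

# Expected counts under H0 (gender and opinion independent)
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2
critical = 9.210  # chi-square table value, df = 2, alpha = 0.01

print(f"chi-square = {stat:.3f}, df = {df}")
print("Reject H0" if stat > critical else "Do not reject H0")
```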
11.4 Tests of Homogeneity

A test of homogeneity involves testing the null hypothesis that the proportions of
elements with a certain characteristic in two or more different populations are the same
against the alternative hypothesis that these proportions are not the same.

Example 11.5: Consider the data on income distributions for households in California
and Wisconsin given in the table below. Using the 2.5% significance level, test the
null hypothesis that the distribution of households with regard to income levels is
similar (homogeneous) for the two states.
               California  Wisconsin
High income    70          34
Medium income  80          40
Low income     100         76

Solution:
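The arithmetic of a homogeneity test is the same as for a test of independence. A minimal Python sketch, assuming H₀: the distribution of households across income levels is the same for California and Wisconsin, H₁: it is not the same, the rule E = (row total)(column total)/n, and the table value 7.378 for df = (3 − 1)(2 − 1) = 2 at the 2.5% level. Under these assumptions it gives χ² ≈ 4.339, which is below 7.378, so H₀ would not be rejected.

```python
# Example 11.5 data: rows = High, Medium, Low income; columns = California, Wisconsin
observed = [[70, 34], [80, 40], [100, 76]]

row_totals = [sum(row) for row in observed]        # [104, 120, 176]
col_totals = [sum(col) for col in zip(*observed)]  # [250, 150]
n = sum(row_totals)                                # 400

# Expected counts under H0 (same income distribution in both states)
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1)(2 - 1) = 2
critical = 7.378  # chi-square table value, df = 2, alpha = 0.025

print(f"chi-square = {stat:.3f}, df = {df}")
print("Reject H0" if stat > critical else "Do not reject H0")
```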
