
TUTORIAL-CHI SQUARE TESTS

QUESTION 1

Where people turn to for news is different for various age groups. A study
indicated where different age groups primarily get their news:

Column variable

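
| Row variable | Under 36 | 36-50 | 50+ | Total |
|---|---|---|---|---|
| Local TV | 97 | 115 | 130 | 342 |
| National TV | 71 | 101 | 126 | 298 |
| Radio | 80 | 95 | 108 | 283 |
| Local newspaper | 50 | 80 | 103 | 233 |
| Internet | 105 | 85 | 71 | 261 |
| Total | 403 | 476 | 538 | 1417 |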

At the 5% level of significance, is there evidence of a significant relationship between
the age group and where people primarily get their news? If so, explain the
relationship.

Solution:

  (a)   H0: There is no relationship between the age group and where people primarily get 
their news. 

    H1: There is a relationship between the age group and where people primarily get 
their news. 

Excel output  

Observed Frequencies

Column variable


| Row variable | Under 36 | 36-50 | 50+ | Total |
|---|---|---|---|---|
| Local TV | 97 | 115 | 130 | 342 |
| National TV | 71 | 101 | 126 | 298 |
| Radio | 80 | 95 | 108 | 283 |
| Local newspaper | 50 | 80 | 103 | 233 |
| Internet | 105 | 85 | 71 | 261 |
| Total | 403 | 476 | 538 | 1417 |

TUTORIAL WEEK 7‐2012_2013

Expected Frequencies

Column variable


| Row variable | Under 36 | 36-50 | 50+ | Total |
|---|---|---|---|---|
| Local TV | 97.26606 | 114.8850 | 129.8490 | 342 |
| National TV | 84.75229 | 100.1044 | 113.1433 | 298 |
| Radio | 80.48624 | 95.06563 | 107.4481 | 283 |
| Local newspaper | 66.26606 | 78.26958 | 88.46436 | 233 |
| Internet | 74.22936 | 87.67537 | 99.09527 | 261 |
| Total | 403 | 476 | 538 | 1417 |

Data

Level of Significance 0.05

Number of Rows 5

Number of Columns 3

Degrees of Freedom 8

Results

Critical Value 15.50731

Chi-Square Test Statistic 30.92932

p-Value 0.000145

Reject the null hypothesis

  Decision rule: If $\chi^2_{STAT} > 15.5073$, reject H0.

  Test statistic: $\chi^2_{STAT} = 30.9293$

  Decision: Since $\chi^2_{STAT} = 30.9293$ is greater than the critical bound of 15.5073, reject H0.

  There is evidence of a significant relationship between the age group and where people 
primarily get their news.  The “50+” group has a lower than expected frequency of getting 
their news through the Internet while the “under 36” group has a higher than expected 
frequency of getting their news through the Internet. 
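
The Excel figures above can be cross-checked with a short script. The following is a minimal sketch in plain Python, not the tutorial's own method (the tutorial uses Excel). The function names `chi_square_stat` and `p_value_even_df` are illustrative assumptions, and the p-value helper relies on a closed form of the chi-square upper tail that holds only for an even number of degrees of freedom (here df = 8).

```python
import math

def chi_square_stat(observed):
    """Return (statistic, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def p_value_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form needs even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )

# Rows: Local TV, National TV, Radio, Local newspaper, Internet
# Columns: Under 36, 36-50, 50+
news = [
    [97, 115, 130],
    [71, 101, 126],
    [80, 95, 108],
    [50, 80, 103],
    [105, 85, 71],
]

stat, df, expected = chi_square_stat(news)
p_value = p_value_even_df(stat, df)
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.6f}")
```

Comparing `expected` with the observed counts row by row shows where the largest deviations occur, which is what the interpretation of the Internet row relies on.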

QUESTION 2

Where people turn to for news is different for various age groups. Suppose that a
study was conducted on this issue, based on 200 respondents who were
between the ages of 36 and 50 and 200 respondents who were above 50. The
results are presented in the following table with the specific breakdown of the
responses.
Age

| Source of News | 36-50 | Above 50 | Total |
|---|---|---|---|
| Newspapers | 82 | 104 | 186 |
| Other | 118 | 96 | 214 |
| Total | 200 | 200 | 400 |

a) Is there evidence of a significant difference in the proportion who get their
news primarily from newspapers between those 36 to 50 years old and
those above 50 years old? Use a 5% level of significance. Please use
mathematical calculations to solve it.
b) Use Excel to determine the p-value in (a) and interpret its meaning.

Solution: This problem involves a 2x2 contingency table. First we define the null and
alternative hypotheses.

a)

$H_0: \pi_1 = \pi_2$ (the proportions of those aged 36-50 and of those above 50
who get their news primarily from newspapers are equal)

$H_1: \pi_1 \neq \pi_2$ (the proportions of those aged 36-50 and of those above 50
who get their news primarily from newspapers are not the same; the way they get
their news is not independent of age)

For this type of problem we use the chi-square test statistic.

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}, \qquad df = (r-1)(c-1)$$

where $r$ is the number of rows in the contingency table and $c$ is the number of columns. Also:
$f_o$: observed frequency in a particular cell
$f_e$: expected frequency in a particular cell if $H_0$ is true
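The average (pooled) proportion is

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n} = \frac{82 + 104}{200 + 200} = 0.465$$

and its complement, the pooled proportion for the "Other" row, is

$$1 - \bar{p} = \frac{118 + 96}{200 + 200} = 0.535$$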

It follows that the expected frequency table is


Age

| Source of News | 36-50 | Above 50 | Total |
|---|---|---|---|
| Newspapers | 200(0.465) = 93 | 200(0.465) = 93 | 186 |
| Other | 200(0.535) = 107 | 200(0.535) = 107 | 214 |
| Total | 200 | 200 | 400 |

Hence the chi-square test statistic is

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(82 - 93)^2}{93} + \frac{(104 - 93)^2}{93} + \frac{(118 - 107)^2}{107} + \frac{(96 - 107)^2}{107} = 4.863$$

The critical value of the chi-square distribution with 1 degree of freedom at the 5% level of
significance is $\chi^2_{0.05} = 3.841$.

Comparing the critical value with the value of the chi-square test statistic, we see that
$3.841 < 4.863$, so we reject $H_0$. At the 5% level of
significance, the two proportions are not the same: there is a significant difference in the
proportion who get their news primarily from newspapers between those 36 to 50 years
old and those above 50 years old.
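
The hand calculation in part a) can be retraced step by step. This is a minimal sketch in plain Python under the same pooled-proportion approach; the variable names are illustrative and the critical value 3.841 is taken from the text rather than computed.

```python
# Observed newspaper readers (X1, X2) and sample sizes (n1, n2)
newspapers = [82, 104]
sample_sizes = [200, 200]

# Pooled proportion of newspaper readers across both age groups
p_bar = sum(newspapers) / sum(sample_sizes)

stat = 0.0
for x, n in zip(newspapers, sample_sizes):
    # Each age group contributes two cells: "Newspapers" and "Other"
    for observed, share in ((x, p_bar), (n - x, 1 - p_bar)):
        expected = n * share
        stat += (observed - expected) ** 2 / expected

critical_value = 3.841  # chi-square, df = 1, 5% level (from the text)
reject_h0 = stat > critical_value
print(f"p_bar = {p_bar:.3f}, chi-square = {stat:.4f}, reject H0: {reject_h0}")
```

Keeping full precision gives a statistic that matches the Excel value to four decimals, which explains the small gap between the hand-rounded figure and the Excel output.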

b)

Results
Critical Value 3.8415
Chi-Square Test Statistic 4.8638
p-Value 0.0274
Reject the null hypothesis
Based on the Excel output, the p-value is 0.0274. Since it is smaller than the
significance level (0.0274 < 0.05), we reject the $H_0$ hypothesis stated in a).
The p-value is the probability of obtaining a test statistic of 4.8638 or larger
when $H_0$ is true.
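
The p-value reported by Excel can be reproduced without statistical software. For 1 degree of freedom the chi-square variable is the square of a standard normal variable, so its upper tail equals the complementary error function evaluated at the square root of half the statistic. A minimal sketch, with `p_value_df1` as an illustrative helper name:

```python
import math

def p_value_df1(stat):
    """Upper-tail chi-square probability, valid only for df = 1."""
    # P(Z^2 > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))

p_value = p_value_df1(4.8638)
print(f"p-value = {p_value:.4f}")
print("reject H0 at the 5% level:", p_value < 0.05)
```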

QUESTION 3

More shoppers do the majority of their grocery shopping on Saturday than on any
other day of the week. To check this statement, 600 shoppers were interviewed, 200
from each age group: under 35, 35-54 and over 54. The results are
presented in the following table:
Observed Frequencies
Column variable

| Row variable | Under 35 | 35-54 | Over 54 | Total |
|---|---|---|---|---|
| Saturday | 48 | 56 | 24 | 128 |
| A Day other than Saturday | 152 | 144 | 176 | 472 |
| Total | 200 | 200 | 200 | 600 |

Is there evidence of a significant difference among the age groups with respect to
the majority shopping day? Use the 5% level of significance.

Solution: This problem involves a 2x3 contingency table (2 rows, 3 columns). First we define the null and
alternative hypotheses.

a)

$H_0: \pi_1 = \pi_2 = \pi_3$

$H_1$: at least one proportion differs

where population 1 = under 35, 2 = 35-54, 3 = over 54

For this type of problem we use the chi-square test statistic.

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}, \qquad df = (r-1)(c-1) = (2-1)(3-1) = 2$$

where $r$ is the number of rows in the contingency table and $c$ is the number of columns. Also:
$f_o$: observed frequency in a particular cell
$f_e$: expected frequency in a particular cell if $H_0$ is true
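The average (pooled) proportion is

$$\bar{p} = \frac{X_1 + X_2 + X_3}{n_1 + n_2 + n_3} = \frac{48 + 56 + 24}{200 + 200 + 200} \approx 0.2133$$

and its complement, the pooled proportion for the "other day" row, is

$$1 - \bar{p} = \frac{152 + 144 + 176}{200 + 200 + 200} \approx 0.7867$$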

It follows that the expected frequency table is


Expected Frequencies
Column variable

| Row variable | Under 35 | 35-54 | Over 54 | Total |
|---|---|---|---|---|
| Saturday | 200(0.2133) = 42.67 | 200(0.2133) = 42.67 | 200(0.2133) = 42.67 | 128 |
| A Day other than Saturday | 200(0.7867) = 157.33 | 200(0.7867) = 157.33 | 200(0.7867) = 157.33 | 472 |
| Total | 200 | 200 | 200 | 600 |

Hence the chi-square test statistic is

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(48 - 42.67)^2}{42.67} + \frac{(56 - 42.67)^2}{42.67} + \frac{(24 - 42.67)^2}{42.67} + \frac{(152 - 157.33)^2}{157.33} + \frac{(144 - 157.33)^2}{157.33} + \frac{(176 - 157.33)^2}{157.33} \approx 16.525$$

The critical value of the chi-square distribution with 2 degrees of freedom at the 5% level of
significance is $\chi^2_{0.05} = 5.991$.

Since the test statistic exceeds the critical value $\chi^2_{0.05} = 5.991$, we reject $H_0$. There is enough evidence to conclude that
the age groups differ with respect to the day on which they do the majority of their shopping.
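
The calculation for three groups follows the same pattern and can be checked with a short script. This is a minimal sketch in plain Python with illustrative variable names. It keeps the expected frequencies unrounded, so the statistic comes out near 16.525, and it uses the fact that for 2 degrees of freedom the chi-square upper tail is exactly exp(-x/2).

```python
import math

# Shoppers whose main shopping day is Saturday, per age group
saturday = [48, 56, 24]          # under 35, 35-54, over 54
sample_sizes = [200, 200, 200]

p_bar = sum(saturday) / sum(sample_sizes)   # pooled Saturday proportion

stat = 0.0
for x, n in zip(saturday, sample_sizes):
    exp_saturday = n * p_bar
    exp_other = n * (1 - p_bar)
    stat += (x - exp_saturday) ** 2 / exp_saturday
    stat += ((n - x) - exp_other) ** 2 / exp_other

df = (2 - 1) * (len(saturday) - 1)
p_value = math.exp(-stat / 2)    # exact upper tail, valid only for df = 2

print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.6f}")
print("reject H0 at the 5% level:", stat > 5.991)
```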

QUESTION 4

A sample of 500 shoppers was selected in a large metropolitan area in order to gather
information concerning consumer behaviour. Among the questions asked
was “Do you enjoy shopping for clothing?” The results are summarized in the
following contingency table:

Observed Frequencies       

Gender   


| Enjoy Shopping | Male | Female | Total |
|---|---|---|---|
| Yes | 126 | 234 | 360 |
| No | 104 | 36 | 140 |
| Total | 230 | 270 | 500 |

a) Is there evidence of a significant difference between the proportions of males
and females who enjoy shopping for clothing at the 1% level of significance?
b) Determine the p-value in (a) and interpret its meaning.
c) What are your answers to (a) and (b) if 206 males enjoyed shopping for
clothing and 24 did not?

Solution

Excel output: 

Observed Frequencies       

Gender   


| Enjoy Shopping | Male | Female | Total |
|---|---|---|---|
| Yes | 126 | 234 | 360 |
| No | 104 | 36 | 140 |
| Total | 230 | 270 | 500 |

Expected Frequencies       

Gender   


| Enjoy Shopping | Male | Female | Total |
|---|---|---|---|
| Yes | 165.6 | 194.4 | 360 |
| No | 64.4 | 75.6 | 140 |
| Total | 230 | 270 | 500 |

Level of Significance  0.01

Number of Rows  2

Number of Columns  2

Degrees of Freedom  1

Results   

Critical Value  6.634897

Chi‐Square Test Statistic  62.6294

p‐Value  2.5E‐15

Reject the null hypothesis   

  (a)  H0: $\pi_1 = \pi_2$   H1: $\pi_1 \neq \pi_2$   where population: 1 = males, 2 = females

      Decision rule: df = 1. If $\chi^2_{STAT} > 6.635$, reject H0.

      Test statistic: $\chi^2_{STAT} = 62.6294$

      Decision: Since $\chi^2_{STAT} = 62.6294$ is greater than the upper critical bound of 6.6349,
reject H0. There is enough evidence to conclude that there is a significant difference
between the proportions of males and females who enjoy shopping for clothing at the
0.01 level of significance.

  (b)  p‐value = virtually zero.  The probability of obtaining a test statistic of 62.6294 or 
larger when the null hypothesis is true is virtually zero. 

  (c)   (a)  H0: $\pi_1 = \pi_2$   H1: $\pi_1 \neq \pi_2$   where populations: 1 = males, 2 = females

      Decision: Since $\chi^2_{STAT} = 0.9881$ is less than the upper critical bound of
6.635, do not reject H0. There is not enough evidence to conclude that the
proportions of males and females who enjoy shopping for clothing are
different.

        (b)  p‐value = 0.3202. The probability of obtaining a test statistic of 0.9881 or
larger when the null hypothesis is true is 0.3202.
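
Both versions of this question can be compared side by side in code. The sketch below is not the tutorial's Excel procedure: it swaps in the standard shortcut formula for 2x2 tables, n(ad - bc)^2 divided by the product of the four marginal totals, which is algebraically equal to the sum of (fo - fe)^2 / fe over the four cells. The helper name `chi_square_2x2` is an illustrative assumption, and the p-value formula is valid only for df = 1.

```python
import math

def chi_square_2x2(table):
    """Return (statistic, p_value) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Shortcut formula, equivalent to summing (fo - fe)^2 / fe over all cells
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with df = 1
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: Yes / No; columns: Male / Female
original = [[126, 234], [104, 36]]
modified = [[206, 234], [24, 36]]   # part (c): 206 males yes, 24 males no

stat_original, p_original = chi_square_2x2(original)
stat_modified, p_modified = chi_square_2x2(modified)

critical_value = 6.635  # chi-square, df = 1, 1% level (from the text)
for label, stat, p in (("original", stat_original, p_original),
                       ("part (c)", stat_modified, p_modified)):
    print(f"{label}: chi-square = {stat:.4f}, p-value = {p:.4g}, "
          f"reject H0: {stat > critical_value}")
```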

QUESTION 5

A survey was conducted in five countries. The percentages of respondents who said that
they eat out once a week or more are as follows:

GERMANY 10%
FRANCE 12%
UNITED KINGDOM 28%
GREECE 39%
US 57%
Suppose that the survey was based on 1000 respondents in each country.

a) At the 5% level of significance, determine whether there is a significant
difference in the proportion of people who eat out at least once a week in
the various countries.
b) Find the p-value in (a) and interpret its meaning.

Solution:

EXCEL output:

Observed Frequencies

Column variable


| Row variable | Germany | France | UK | Greece | US | Total |
|---|---|---|---|---|---|---|
| Yes | 100 | 120 | 280 | 390 | 570 | 1460 |
| No | 900 | 880 | 720 | 610 | 430 | 3540 |
| Total | 1000 | 1000 | 1000 | 1000 | 1000 | 5000 |

Expected Frequencies

Column variable


| Row variable | Germany | France | UK | Greece | US | Total |
|---|---|---|---|---|---|---|
| Yes | 292 | 292 | 292 | 292 | 292 | 1460 |
| No | 708 | 708 | 708 | 708 | 708 | 3540 |
| Total | 1000 | 1000 | 1000 | 1000 | 1000 | 5000 |

Data

Level of Significance 0.05

Number of Rows 2

Number of Columns 5

Degrees of Freedom 4

Results

Critical Value 9.487728

Chi-Square Test Statistic 742.3961

p-Value 2.3E-159

Reject the null hypothesis

(a) $H_0: \pi_1 = \pi_2 = \pi_3 = \pi_4 = \pi_5$ (the proportion of people who eat out is the same
in all countries)

$H_1$: Not all $\pi_j$ are equal (the proportion of people who eat out is NOT
the same in all countries)

where population 1 = Germany, 2 = France, 3 = UK, 4 = Greece, 5 = US

Test statistic:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 742.3961$$

Decision: Since the calculated test statistic 742.3961 is greater than the critical value of
9.4877, you reject $H_0$ and conclude that there is a difference in the proportion of people
who eat out at least once a week in the various countries.

(b) p-value is virtually zero. The probability of obtaining a data set which gives rise to a test
statistic of 742.3961 or more is virtually zero if there is no difference in the proportion of
people who eat out at least once a week in the various countries.
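
The Excel result can be rebuilt from the percentages given in the question. This is a minimal sketch in plain Python with illustrative variable names. It converts each percentage to a count out of 1000 respondents, and it uses the closed form exp(-x/2)(1 + x/2) for the chi-square upper tail, which holds only for 4 degrees of freedom.

```python
import math

# Percentage eating out once a week or more, as given in the question
percent_yes = {"Germany": 10, "France": 12, "UK": 28, "Greece": 39, "US": 57}
n_per_country = 1000

yes_counts = [pct * n_per_country // 100 for pct in percent_yes.values()]
total_n = n_per_country * len(yes_counts)
p_bar = sum(yes_counts) / total_n            # pooled "Yes" proportion

exp_yes = n_per_country * p_bar
exp_no = n_per_country * (1 - p_bar)

stat = sum(
    (x - exp_yes) ** 2 / exp_yes + ((n_per_country - x) - exp_no) ** 2 / exp_no
    for x in yes_counts
)
df = (2 - 1) * (len(yes_counts) - 1)

half = stat / 2
p_value = math.exp(-half) * (1 + half)       # upper tail, valid only for df = 4

print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.3g}")
print("reject H0 at the 5% level:", stat > 9.4877)
```

The p-value is far below the smallest significance level anyone would use, which is what "virtually zero" means in part (b).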
