
Chi Square Distributions

Prof. G.R.C. Nair

Applications

• To test whether more than two population proportions can be considered equal.
• To determine whether two attributes are independent of each other.
• To test goodness of fit.
• To test the discrepancies between observed and expected frequencies.

Characteristics

1. It is a continuous distribution.
2. It takes non-negative values only.
3. The chi-square distribution curve starts at the origin and lies entirely to the right of the Y axis.
4. It has a single parameter, the degrees of freedom (d.f).
5. The curve is strongly skewed to the right for very small d.f, and its shape changes markedly as d.f increases. For large d.f, the chi-square distribution looks like a normal distribution curve.
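
Characteristic 5 can be checked numerically. A chi-square distribution with k degrees of freedom has mean k, variance 2k and skewness sqrt(8/k). The minimal Python sketch below (the function name `chi_square_skewness` is illustrative and not from the slides) prints these values for a few d.f; the skewness shrinks toward 0, the value for a normal curve, as d.f grows.

```python
import math

def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

# Mean is df and variance is 2 * df for a chi-square distribution.
for df in (1, 2, 5, 10, 30, 100):
    print(f"d.f = {df:3d}  mean = {df}  variance = {2 * df}  "
          f"skewness = {chi_square_skewness(df):.2f}")
```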

Conditions for use

1. Used for large samples.
2. Every expected frequency should be at least 5; for good accuracy, all expected frequencies should exceed 10.
3. Take H0: there is no significant difference between the sample proportions, or between the observed and the corresponding expected values.
4. Given the degrees of freedom and the area required in the right tail (i.e., the significance level of the test), the critical value of chi-square can be found from the table.
5. If the calculated chi-square value is greater than the critical value, H0 is rejected.

Goodness Of Fit Test

This is a test to see whether a particular theoretical probability distribution is appropriate in a specific case.

Frequencies obtained from the actual performance of an experiment, or from observation in a survey, are called observed frequencies, fO.

We test the null hypothesis that these observed frequencies follow the expected frequencies, fE, given by a specific theoretical distribution. The hypothesis tested is how well the observed frequencies fit a given probability distribution.

H0: The observed values follow the selected probability distribution.
H1: They do not follow it.
• The test statistic for a chi-square test is calculated as $\chi^2 = \sum (f_O - f_E)^2 / f_E$.
• Degrees of freedom = n - 1, where n denotes the number of possible outcomes or samples (categories).
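
The statistic and its degrees of freedom can be sketched in a few lines of Python. The function name `chi_square_statistic` and the die-rolling counts are hypothetical, chosen only to illustrate the formula.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fO - fE)^2 / fE over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical data: a die rolled 60 times, so each face is expected 10 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1  # n - 1, with n = number of categories
print(f"chi-square = {chi_sq:.2f}, d.f = {df}")
```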

• The expected frequency for each category should not be less than 5. If a category has an expected frequency below 5, either increase the sample size or combine two or more categories so that the combined expected frequency is at least 5.
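
Combining categories can be sketched as follows. This is one possible approach, assuming the small expected frequencies sit at the end of the list (as they do for the goal counts in Example 2); the helper name `pool_small_tail` is hypothetical.

```python
def pool_small_tail(observed, expected, minimum=5):
    """Merge the last categories together until the final expected frequency >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_exp = exp.pop()
        last_obs = obs.pop()
        exp[-1] += last_exp   # fold the removed category into its neighbour
        obs[-1] += last_obs
    return obs, exp

# Observed and expected frequencies for 0 to 7 goals, taken from Example 2.
observed = [95, 158, 108, 63, 40, 9, 5, 2]
expected = [88, 150, 126, 72, 30, 10, 3, 1]

pooled = pool_small_tail(observed, expected)
print(pooled)
```

Note that this sketch only pools from the tail; a small expected frequency in the middle of the list would need a separate decision about which neighbour to merge with.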
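
Contingency table

• Used for testing association of attributes.
• The expected frequency in each cell of a contingency table is calculated by the formula $f_E = (T_R \times T_C) / n$, where $T_R$ is the row total, $T_C$ is the column total and n is the grand total. The degrees of freedom are d.f = (r-1)(c-1), where r and c are the numbers of rows and columns.
• H0: The attributes have no association, i.e., they are independent.
• H1: The attributes have an association, i.e., they are dependent.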
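
The expected-frequency formula and the degrees of freedom can be illustrated with a short Python sketch. The function names and the 2 x 3 table of counts are hypothetical, used only to show the arithmetic.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[tr * tc / n for tc in col_totals] for tr in row_totals]

def degrees_of_freedom(table):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical 2 x 3 table of observed counts.
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_frequencies(observed)
df = degrees_of_freedom(observed)
print(expected, df)
```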

Example - 1

• The number of car accidents in a city during the last 10 months was as follows: 20, 17, 12, 6, 7, 15, 8, 5, 16 and 14. Does it agree with the general belief that the number of accidents is more or less the same every month? Test at the 10% significance level.
• H0: The number of accidents is the same every month.
• H1: It is not the same.
• Expected number of accidents every month, if they are the same = 120/10 = 12.
• $\chi^2 = \sum (f_O - f_E)^2 / f_E = (20-12)^2/12 + (17-12)^2/12 + \dots = 244/12 = 20.33$
• $\chi^2$ critical is the table value for d.f = 9 at the 10% significance level, which is 14.684.
• The calculated value is greater than the critical value, so reject H0.
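
The arithmetic of this example can be reproduced with a short Python sketch. The critical value is the table value quoted above; the variable names are illustrative. The statistic works out to about 20.33 with d.f = 9.

```python
accidents = [20, 17, 12, 6, 7, 15, 8, 5, 16, 14]

# Under H0 every month has the same expected count: 120 / 10 = 12.
expected = sum(accidents) / len(accidents)

chi_sq = sum((fo - expected) ** 2 / expected for fo in accidents)
df = len(accidents) - 1
critical = 14.684  # table value quoted on the slide (d.f = 9, 10% level)

print(f"chi-square = {chi_sq:.2f}, d.f = {df}, reject H0: {chi_sq > critical}")
```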

Example - 2

• The following table shows the goals scored in the World Cup matches. Test whether they follow the Poisson distribution at the 20% significance level.

No. of goals      0    1    2    3    4    5    6    7
No. of matches   95  158  108   63   40    9    5    2

H0: The Poisson distribution fits well. H1: It does not fit well.
Mean no. of goals = (95*0 + 158*1 + 108*2 + …) / (95 + 158 + 108 + …) = 1.7
Expected frequency as per Poisson: $f_E = n \, e^{-\lambda} \lambda^x / x!$, with $\lambda$ = 1.7 (the mean) and n = 480 (the total number of matches).

No. of goals   fO     fE     (fO-fE)²/fE
0              95     88     49/88 = 0.56
1              158    150    64/150 = 0.43
2              108    126    324/126 = 2.57
3              63     72     81/72 = 1.12
4              40     30     100/30 = 3.33
5              9*     10*    4/14 = 0.29
6              5*     3*
7              2*     1*
Total                        8.3

* The categories for 5, 6 and 7 goals are combined (fO = 16, fE = 14).

d.f = 5, S.L = 20% → $\chi^2$ critical = 7.289
$\chi^2$ calculated (8.3) is greater than the critical value, so reject H0.
The Poisson distribution does not fit well.
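
The steps of this example can be sketched in Python. The sketch computes the Poisson expected frequencies from the rounded mean of 1.7, then evaluates the statistic from the hand-rounded expected values in the table so that it reproduces the figure of 8.3. Unrounded Poisson values can differ from the table by about one unit (for example, roughly 149 rather than 150 for one goal). Variable names are illustrative.

```python
import math

goals = [0, 1, 2, 3, 4, 5, 6, 7]
matches = [95, 158, 108, 63, 40, 9, 5, 2]

n = sum(matches)                                        # total number of matches
mean = sum(g * m for g, m in zip(goals, matches)) / n   # about 1.69
lam = 1.7                                               # mean as rounded on the slide

# Poisson expected frequencies: n * e^(-lam) * lam^x / x!
poisson_expected = [n * math.exp(-lam) * lam ** x / math.factorial(x) for x in goals]

# Statistic from the table's rounded expected values, with 5, 6 and 7 goals combined.
observed_pooled = [95, 158, 108, 63, 40, 9 + 5 + 2]
expected_pooled = [88, 150, 126, 72, 30, 10 + 3 + 1]
chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed_pooled, expected_pooled))

df = len(observed_pooled) - 1
critical = 7.289  # table value quoted on the slide (d.f = 5, 20% level)

print([round(fe, 1) for fe in poisson_expected])
print(f"chi-square = {chi_sq:.1f}, d.f = {df}, reject H0: {chi_sq > critical}")
```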

Example 3

100 students selected randomly from 10,000 doing an MBA were cross-classified by their age at entry and the final grade obtained. Test whether there is any association between age at entry and final grade obtained, at the 5% significance level.

Grade         Under 21 yrs   21 to 24   Over 24   Total
Up to 3       6              9          5         20
3.1 to 3.5    18             14         8         40
3.6 to 4.0    11             12         17        40
Total         35             35         30        100

Expected frequency Table

Grade         Under 21 yrs   21 to 24   Over 24   Total
Up to 3       7              7          6         20
3.1 to 3.5    14             14         12        40
3.6 to 4.0    14             14         12        40
Total         35             35         30        100

$\chi^2$ calculated = 6.37, d.f = 4, S.L = 5% → $\chi^2$ critical = 9.49
$\chi^2$ calculated is less than the critical value, so H0 cannot be rejected.
Age and grade points are independent (i.e., no association).
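
The expected frequency table and the calculated value of 6.37 can be reproduced with a short Python sketch. The critical value is the table value quoted above; variable names are illustrative.

```python
# Observed counts: rows are grade bands, columns are age groups.
observed = [[6, 9, 5],
            [18, 14, 8],
            [11, 12, 17]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# fE = row total * column total / n for each cell
expected = [[tr * tc / n for tc in col_totals] for tr in row_totals]

chi_sq = sum((fo - fe) ** 2 / fe
             for obs_row, exp_row in zip(observed, expected)
             for fo, fe in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 9.49  # table value quoted on the slide (d.f = 4, 5% level)

print(expected)
print(f"chi-square = {chi_sq:.2f}, d.f = {df}, reject H0: {chi_sq > critical}")
```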

Example 4

• In a survey of 200 boys, 75 were found to be intelligent, and 40 of these had skilled fathers; 85 of the unintelligent boys had unskilled fathers. Do these figures support the hypothesis that skilled fathers have intelligent boys? Use the $\chi^2$ test at the 5% significance level.
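
One possible way to set up this example is sketched below. The 2 x 2 table is inferred from the figures in the problem statement (75 intelligent boys, hence 125 unintelligent; 40 and 85 given, the other two cells by subtraction). The critical value 3.841 is the standard table value for d.f = 1 at the 5% level and is not quoted on the slide. No continuity correction is applied, which is an assumption of this sketch.

```python
# Rows: intelligent, unintelligent. Columns: skilled father, unskilled father.
observed = [[40, 75 - 40],
            [125 - 85, 85]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# fE = row total * column total / n for each cell
expected = [[tr * tc / n for tc in col_totals] for tr in row_totals]

chi_sq = sum((fo - fe) ** 2 / fe
             for obs_row, exp_row in zip(observed, expected)
             for fo, fe in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 3.841  # standard table value for d.f = 1 at the 5% level (assumption)

print(expected)
print(f"chi-square = {chi_sq:.2f}, d.f = {df}, reject H0: {chi_sq > critical}")
```

If the calculated value exceeds the critical value, H0 (no association) is rejected, which would support an association between the father's skill and the boy's intelligence.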

Example 5 / HW

• A number of managers from various levels were randomly selected and interviewed about their concern for environmental issues. The response of each person was tallied into one of three categories, as given below:

Level          No concern   Some concern   Great concern
Top level      15           13             12
Middle level   20           19             21
Supervisor     7            7              6
Group leader   28           21             13

• Use the 0.01 significance level to determine whether there is any relationship between the level of management and environmental concern.

Further Examples / HW

• 2009 mid term make up Q. 6
• 2007 Terminal Part B. Q. 3
• 2009 terminal part B Q.3
• 2007 make up Terminal, 7
• 2007 mid term 2?,4
• Mid term 2009 Part C . Q. 6
• 2007 Mid term Make up Q. 5
