TASK
Chi-square goodness-of-fit test (χ²) for Stolen
Vehicles
SUBMITTED BY
NAME: NAHASON MATOKE
REGNO: SIT/H/004/10
SUBMITTED TO:
DR G. WANYEMBI
KAKAMEGA
Chi Sqr Assignment 2
Purpose:
Test for distributional adequacy. The chi-square test (Snedecor and Cochran, 1989) is used
to test whether a sample of data came from a population with a specific distribution.
An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any
univariate distribution for which you can calculate the cumulative distribution function.
The chi-square goodness-of-fit test is applied to binned data (i.e., data put into classes).
This is actually not a restriction since for non-binned data you can simply calculate a
histogram or frequency table before generating the chi-square test. However, the values of
the chi-square test statistic are dependent on how the data is binned. Another disadvantage
of the chi-square test is that it requires a sufficient sample size in order for the chi-square
approximation to be valid.
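As a minimal sketch of the binning step described above, the following builds a frequency table from raw categorical records. The function name `frequency_table` and the sample records are hypothetical and are used only for illustration; they are not the assignment's data.

```python
from collections import Counter


def frequency_table(records, categories):
    """Count how many records fall into each category (bin), in a fixed order."""
    counts = Counter(records)
    # Categories that never occur get a count of 0 rather than being dropped.
    return [counts.get(category, 0) for category in categories]


# Hypothetical raw records, for illustration only.
makes = ["Ford", "Toyota", "Nissan", "Hyundai", "Peugeot"]
records = ["Toyota", "Ford", "Toyota", "Nissan", "Peugeot", "Toyota", "Hyundai"]

observed = frequency_table(records, makes)
print(dict(zip(makes, observed)))
```

Keeping the category order fixed matters because the observed and expected frequencies are later compared bin by bin.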
Test Statistic: For the chi-square goodness-of-fit computation, the data are divided into k
bins and the test statistic is defined as:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency for bin i and $E_i$ is the expected frequency for bin i.
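The test statistic can be sketched in a few lines of code. This is an illustrative implementation of the sum of (observed minus expected) squared over expected; the function name `chi_square_statistic` and the counts below are hypothetical and are not taken from the assignment.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three bins, for illustration only.
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.3f}")
```

Each bin contributes its own term, so a large statistic can be traced back to the bins with the largest deviations.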
Use the χ² goodness-of-fit test at a significance level of 0.01 to test the hypothesis that
the proportions of stolen vehicles by make are identical to the population proportions by make.
Suppose it is established that 15% of all cars are Fords, 35% are Toyotas, 20% are Nissans,
15% are Hyundais, and 15% are Peugeots.
Total vehicles
Total Vehicles = ∑(stolen / percentage of stolen vehicles) × 100 = 5000
Therefore
Expected Stolen Frequencies (Stolen Vehicle)
Given that
15% of all cars are Fords, 35% are Toyotas, 20% are Nissans, 15% are Hyundais, and 15%
are Peugeots
15% Ford of Total Vehicles =
Calculated χ² = 23.64285714
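The expected stolen frequencies follow from multiplying each make's population percentage by the total. The sketch below assumes that the total of 5000 stated in the text is the number of stolen vehicles to which the percentages are applied; the variable names are illustrative.

```python
# Population percentages by make, as given in the text.
population_percent = {
    "Ford": 15,
    "Toyota": 35,
    "Nissan": 20,
    "Hyundai": 15,
    "Peugeot": 15,
}

# Assumption: 5000 is the total from the text to which the percentages apply.
total_vehicles = 5000

# Expected frequency for each make = percentage of the total.
expected = {
    make: percent * total_vehicles / 100
    for make, percent in population_percent.items()
}

for make, frequency in expected.items():
    print(f"{make}: {frequency:.0f}")
```

The percentages sum to 100, so the expected frequencies sum to the same total as the observed frequencies, which the goodness-of-fit test requires.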
That is, chi-square is the sum, over all categories, of the squared difference between the
observed (O_i) and the expected (E_i) frequencies (the deviation, d), divided by the expected
frequency.
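In a goodness-of-fit test with k categories and no parameters estimated from the data, the
degrees of freedom equal the number of categories minus one.
Df: = k-1
= 5-1
=4
The same value is obtained from the independence-test formula, Df = (c-1) (r-1) = (2-1) (5-1)
= 4, when the observed and expected frequencies are set out as two columns across five rows.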
Thus the value calculated from the formula above is compared with values in the chi-
square distribution table (Bissonnette, 2006). We reject the null hypothesis if the chi-
squared value is greater than the critical value (what is called the upper critical value).
Conclusion
Therefore the chi-square statistic for these data is 23.643, with 4 degrees of freedom
((2-1) (5-1)). The critical value at a significance level of 0.01 is 13.277.
Since 23.643 is larger than 13.277, the observed frequencies differ from the expected
frequencies by enough to reject the null hypothesis.
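The comparison above can be checked numerically. The sketch below uses the statistic (23.643) and critical value (13.277) stated in the text, and adds a p-value computed from the closed-form upper-tail probability of the chi-square distribution, which exists when the degrees of freedom are even. The function name `chi2_upper_tail_even_df` is hypothetical, and the p-value is an addition for illustration rather than part of the original working.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For even df the upper tail has the closed form
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


alpha = 0.01          # significance level from the text
df = 4                # degrees of freedom from the text
statistic = 23.643    # calculated chi-square from the text
critical = 13.277     # table value from the text

p_value = chi2_upper_tail_even_df(statistic, df)
reject_null = statistic > critical

print(f"p-value = {p_value:.6f}")
print("reject H0" if reject_null else "fail to reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with the significance level are equivalent decision rules, so both lead to the same conclusion here.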
H0. Null hypothesis: In the chi-square goodness-of-fit test, the null hypothesis assumes that
there is no significant difference between the observed and the expected values.
Ha. Alternative hypothesis: In the chi-square goodness-of-fit test, the alternative hypothesis
assumes that there is a significant difference between the observed and the expected values.
The calculated value of χ² (23.643) is much higher than the table value (13.277), which
means that the difference is unlikely to be due to chance alone. The result is significant.
Hence, the null hypothesis does not hold.