
MASINDE MULIRO UNIVERSITY

OF SCIENCE & TECHNOLOGY


COMPUTER SCIENCE DEPARTMENT

PCT 911: ADVANCED RESEARCH METHODS

TASK
Chi-square goodness-of-fit test (χ²) for Stolen
Vehicles

SUBMITTED BY
NAME: NAHASON MATOKE
REGNO: SIT/H/004/10

SUBMITTED TO:
DR G. WANYEMBI

KAKAMEGA
Chi Sqr Assignment 2

Purpose:
Test for distributional adequacy. The chi-square test (Snedecor and Cochran, 1989) is used
to test whether a sample of data came from a population with a specific distribution.

An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any
univariate distribution for which you can calculate the cumulative distribution function.
The chi-square goodness-of-fit test is applied to binned data (i.e., data put into classes).
This is actually not a restriction since for non-binned data you can simply calculate a
histogram or frequency table before generating the chi-square test. However, the values of
the chi-square test statistic are dependent on how the data is binned. Another disadvantage
of the chi-square test is that it requires a sufficient sample size in order for the chi-square
approximation to be valid.
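As a minimal sketch of the binning step described above, a frequency table can be built from raw records with Python's standard library. The list `raw_makes` is hypothetical and invented for illustration; it is not the assignment data.

```python
from collections import Counter

# Hypothetical raw, non-binned observations: one category label per record.
raw_makes = ["Ford", "Toyota", "Toyota", "Nissan", "Ford", "Toyota"]

# Counter tallies how often each label occurs, which is the frequency table
# (the "binned" data) that the chi-square test works on.
frequency_table = Counter(raw_makes)

for make, count in sorted(frequency_table.items()):
    print(f"{make}: {count}")
```

For continuous data the same idea applies, except that each observation is first assigned to an interval, and the choice of intervals changes the resulting statistic.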

The chi-square test is an alternative to the Anderson-Darling and Kolmogorov-Smirnov
goodness-of-fit tests. The chi-square goodness-of-fit test can be applied to discrete
distributions such as the binomial and the Poisson. The Kolmogorov-Smirnov and
Anderson-Darling tests are restricted to continuous distributions.

Definition: The chi-square test is defined for the hypotheses:


H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.

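Test Statistic: For the chi-square goodness-of-fit computation, the data are divided into k
bins and the test statistic is defined as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency for bin i and $E_i$ is the expected frequency for bin i.
The tables in the worked example label these quantities Oij and Eij.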
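The test statistic can be sketched in a few lines of Python. The function name `chi_square_statistic` and the two-bin coin example are assumptions made for illustration only; they do not come from the assignment.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all k bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical two-bin example: 100 coin tosses with 60 heads and 40 tails,
# tested against a fair coin (50 expected in each bin).
example_statistic = chi_square_statistic([60, 40], [50, 50])
print(example_statistic)
```

Each bin contributes its squared deviation scaled by the expected count, so a deviation of a given size counts for more in a bin where few observations were expected.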

The expected frequencies


The Kenya Evening Star, Nov. 7, 2009, reported the following information for a random
sample of 1000 stolen cars for the previous year:
170 were Fords, 300 Toyotas, 210 Nissans, 190 Hyundais, and 130 Peugeots.

Use the χ² goodness-of-fit test at a significance level of 0.01 to test the hypothesis that
the proportions of makes among stolen cars are identical to the proportions of those makes
in the population of all cars.
Suppose it is established that 15% of all cars are Fords, 35% are Toyotas, 20% are Nissans,
15% are Hyundais, and 15% are Peugeots.

The Observed Stolen Vehicles.


              Ford  Toyota  Nissan  Hyundai  Peugeot  Total
Stolen (Oij)   170     300     210      190      130   1000

Percentage of vehicles stolen for each make:

(Stolen make / Total stolen) * 100

                Ford  Toyota  Nissan  Hyundai  Peugeot  Total
Stolen (Oij) %    17      30      21       19       13    100

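Total vehicles
Total Vehicles = (stolen / percentage of stolen vehicles) * 100 = 1000 for every make
(for example, Ford: (170 / 17) * 100 = 1000), which matches the sample size of 1000 stolen cars.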
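Therefore

Expected Stolen Frequencies (Stolen Vehicle)

Given that 15% of all cars are Fords, 35% are Toyotas, 20% are Nissans, 15% are Hyundais, and
15% are Peugeots, the expected count for each make is its population share multiplied by the
1000 stolen vehicles in the sample. For example:
15% Ford of Total Vehicles = 0.15 * 1000 = 150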

        Ford  Toyota  Nissan  Hyundai  Peugeot
Eij %    15%     35%     20%      15%      15%
Eij      150     350     200      150      150
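The expected counts in the table can be reproduced with a short sketch. The variable names are illustrative assumptions; the sample size and percentages are the ones given in the problem statement.

```python
n_stolen = 1000  # size of the sample of stolen cars in the problem statement

# Population share of each make, in percent, as given in the problem statement.
population_percent = {
    "Ford": 15,
    "Toyota": 35,
    "Nissan": 20,
    "Hyundai": 15,
    "Peugeot": 15,
}

# Under H0 the stolen cars follow the population shares, so the expected
# count for a make is the sample size times that make's share.
expected = {make: n_stolen * pct / 100 for make, pct in population_percent.items()}

for make, count in expected.items():
    print(f"{make}: {count:.0f}")
```

Because the shares sum to 100%, the expected counts sum to the sample size, which is a quick check that no category has been missed.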

Test the null hypothesis

          Oij   Eij   Oij - Eij   (Oij - Eij)^2 / Eij
Ford      170   150          20           2.666666667
Toyota    300   350         -50           7.142857143
Nissan    210   200          10           0.5
Hyundai   190   150          40          10.66666667
Peugeot   130   150         -20           2.666666667

Total                                    23.64285714
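The per-make contributions and their total can be checked with the following sketch, which uses the observed and expected counts from the tables above. The variable names are illustrative assumptions.

```python
observed = {"Ford": 170, "Toyota": 300, "Nissan": 210, "Hyundai": 190, "Peugeot": 130}
expected = {"Ford": 150, "Toyota": 350, "Nissan": 200, "Hyundai": 150, "Peugeot": 150}

# Contribution of each make: squared deviation divided by the expected count.
contributions = {
    make: (observed[make] - expected[make]) ** 2 / expected[make]
    for make in observed
}

# The chi-square statistic is the sum of the contributions.
chi_square = sum(contributions.values())

for make, value in contributions.items():
    print(f"{make:8s} {observed[make] - expected[make]:4d} {value:10.4f}")
print(f"chi-square = {chi_square:.3f}")
```

Hyundai and Toyota account for most of the statistic, so these are the makes whose theft counts depart furthest from their population shares.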

That is, chi-square is the sum, over all categories, of the squared difference between the
observed (Oij) and the expected (Eij) frequency (the deviation, d), with each squared
difference divided by the expected frequency for that category.

Assessing significance levels:

In the chi-square goodness-of-fit test, the degrees of freedom equal the number of
categories (k) minus one.
Df = k - 1
= 5 - 1
= 4
The value calculated from the formula above is then compared with the values in the
chi-square distribution table (Bissonnette, 2006). We reject the null hypothesis if the
chi-squared value is greater than the critical value (what is called the upper critical value).
Conclusion
The chi-square statistic for these data is 23.643 with 4 degrees of freedom. The
critical value at p = .01 is 13.277.
Since 23.643 is larger than 13.277, the observed frequencies differ from the expected
frequencies by enough to reject the null hypothesis.
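The decision can also be checked numerically without a printed table. The sketch below is an illustration, not part of the original assignment: it relies on the closed-form upper-tail probability of the chi-square distribution that holds when the degrees of freedom are even, and the helper name `chi_square_sf_even_df` is an assumption.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half_x = x / 2.0
    terms = (half_x ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half_x) * sum(terms)


chi_square = 23.643      # statistic computed from the table
critical_value = 13.277  # table value for 4 degrees of freedom at the 0.01 level
alpha = 0.01
df = 4

p_value = chi_square_sf_even_df(chi_square, df)
reject_h0 = chi_square > critical_value

print(f"p-value = {p_value:.6f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with the significance level are equivalent decision rules, so both lead to the same conclusion here.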

State the conclusion you can draw from the observations made

Test the null hypothesis

Set up the hypothesis for Chi-Square goodness of fit test:

H0. Null hypothesis: In the chi-square goodness-of-fit test, the null hypothesis assumes that
there is no significant difference between the observed and the expected values.
Ha. Alternative hypothesis: In the chi-square goodness-of-fit test, the alternative hypothesis
assumes that there is a significant difference between the observed and the expected values.

The calculated value of χ² (23.643) is much higher than the table value (13.277), which
means that a difference this large is unlikely to be due to chance alone. It is significant.
Hence, the null hypothesis is rejected: the proportions of makes among stolen cars differ
from the proportions of those makes in the population.
