
Ch. 13 - Inference (Significance Test) for Tables
Chi-Square Goodness of Fit Test


To test whether an observed sample distribution is different from the hypothesized population distribution, use a Chi-Square (χ²) Test for Goodness of Fit. These are tests for data in table format.

I) One Sample (1-Way Table) - i.e. just one column, or just one row. Test Statistic:
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

where d.f. = n − 1 (n being the number of categories). Use a calculator to do this! What you do is compare a table created from sample data (which is really like Ha) to a theoretical table that you would expect to occur (which is really like H0).

Expected Table => the null hypothesis table
Observed Table => your table from the sample

You need to enter the Observed table given to you as a list (in L1, for example). Next, determine what the Expected table will look like, and enter it into a second list (L2). Then, in a third list (L3), calculate each category's contribution (Observed − Expected)²/Expected. Lastly, sum up the values in L3; this sum is your χ² test statistic. You then compare it to the χ²* critical value from the Chi-Square table on page 842, and decide either to reject the null or to fail to reject the null.
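The list procedure above (observed counts, expected counts, per-category contributions, then a sum) can be sketched in a few lines of Python. This is a minimal illustration of the formula, not the calculator's own routine; the function name `chi_square_gof` is hypothetical.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a one-way table.

    observed, expected: equal-length lists of counts (like calculator lists L1 and L2).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    # Like list L3: each category's contribution (Observed - Expected)^2 / Expected
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    # Summing the contributions gives chi-square; d.f. = number of categories - 1
    return sum(contributions), len(observed) - 1
```

The function only computes the statistic and d.f.; the decision still comes from comparing against a critical value or P-value.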

Ex #1) Trix cereal comes in five fruit flavors, and each flavor has a different shape. A curious student who obviously had a lot of time on their hands methodically sorted an entire box of the cereal and found the following distribution of flavors for the pieces of cereal in the box:

Flavor       Frequency
Grape        530
Lemon        470
Lime         420
Orange       610
Strawberry   585

Test the null hypothesis that the flavors are uniformly distributed versus the alternative that they are not.

A. So,
H0: Even distribution (each flavor makes up 1/5 of the pieces)
Ha: Uneven distribution (at least one flavor's share differs from 1/5)

So, our Observed Data Table is the table above. This is entered into L1. Next, we need to calculate the Expected Data Table, the null table. Well, the question says we are testing against a uniformly distributed situation. That means every cell has the same value. So, we need to add up how many pieces of cereal the student sorted, and divide this number evenly among the different flavors. There were 2615 pieces of cereal. Divvied up among 5 flavors, that would be 2615/5 = 523 pieces of cereal for each flavor, if the flavors truly are uniformly distributed. So, here's our Expected Data Table:

Flavor       Frequency
Grape        523
Lemon        523
Lime         523
Orange       523
Strawberry   523
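The expected counts under a uniform null come from one division: total pieces over number of flavors. A short Python sketch of that arithmetic, using the counts from Ex #1:

```python
observed = {"Grape": 530, "Lemon": 470, "Lime": 420, "Orange": 610, "Strawberry": 585}

total = sum(observed.values())        # total pieces the student sorted
per_flavor = total / len(observed)    # uniform H0: every flavor gets an equal share
expected = {flavor: per_flavor for flavor in observed}

print(total, per_flavor)  # 2615 523.0
```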

Now, enter this into L2. Next, in L3, enter the following formula:

$$\mathrm{L3} = \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \quad\text{or, rather,}\quad \mathrm{L3} = \frac{(\mathrm{L1} - \mathrm{L2})^2}{\mathrm{L2}}$$

Lastly, we need to sum up L3. Quit out of the [Stat]:Edit menu. Then, go to [2nd]:[List]:Math:sum(L3) and sum up L3. This will give you χ².

So,

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 47.57170172$$
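The same three-list computation can be checked in Python. The variable names below stand in for calculator lists L1, L2 and L3; this is a sketch of the arithmetic, not of the calculator's internals.

```python
observed = [530, 470, 420, 610, 585]   # list L1: Grape, Lemon, Lime, Orange, Strawberry
expected = [523] * 5                   # list L2: uniform expected counts

# list L3: (L1 - L2)^2 / L2, computed element by element
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)            # same role as sum(L3) on the calculator

print(round(chi_sq, 8))  # 47.57170172
```

Looking at `contributions` also shows which flavors drive the result: Lime (far below 523) and Orange (far above) contribute the most.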

This is our test statistic. We now have to find the P-value. To do this, we look in the Chi-Square table on pg. 842. Move down the left column to the appropriate degrees of freedom (d.f. = n − 1), which in our case is 5 − 1, or d.f. = 4. Now, read across until we find the closest value to 47.57170172. Since we run right off the chart, the P-value must be extremely small, basically close to zero. The data is significant! Another way to do this is to find the critical value χ²*, or point of no return. Let's try a 1% level test. Again, read down the left-most column until you find df = 4, and then read across to the 1% column. This critical value is χ²* = 13.28. Our test statistic is much further out than this, so again it is significant, and again we reject the null.
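Instead of reading the printed table, the right-tail area can be computed directly. The sketch below assumes an even number of degrees of freedom, because only then does the chi-square tail have the simple closed form e^(−x/2) · Σ (x/2)^i / i! for i < df/2; the function names are hypothetical, and the critical value is found by bisection on that tail area.

```python
import math

def chi2_sf_even_df(x, df):
    """Right-tail area P(chi-square > x); valid ONLY for even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only works for even, positive df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def chi2_critical_even_df(alpha, df, tol=1e-10):
    """Critical value chi2* whose right-tail area equals alpha (bisection search)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:   # widen until the tail area drops below alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid                         # tail still too big: chi2* lies further right
        else:
            hi = mid
    return (lo + hi) / 2

p_value = chi2_sf_even_df(47.57170172, 4)
critical = chi2_critical_even_df(0.01, 4)
print(p_value)             # a tiny number, far below 0.01 ("basically zero")
print(round(critical, 2))  # 13.28
```

For odd d.f. you would need the general incomplete gamma function (or a statistics library) instead of this shortcut.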

So, the data is very significant, as the P-value is close to zero. We reject H0 in favor of the alternative and conclude that the evidence suggests that either the Trix flavors are not uniformly distributed, or our box of Trix is not a random sample.

That's basically it. One thing to be careful of, though: don't mistake a linear regression problem for a Chi-Square Test for Goodness of Fit problem, or vice versa. They both have one row of data.

II) One Sample (2-Way Table) - i.e. more than one column, and more than one row. Test Statistic:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

where d.f. = (r − 1)(c − 1) (r being the number of rows, and c being the number of columns). Use a calculator to do this! You technically need two tables in order to do a 2-Way Table Chi-Square test, but the nice thing about the TI-83 is that you only need to enter the Observed (sample) Table, and the calculator will generate the Expected (null) Table for you. (Note: on the AP exam, you should copy both tables down onto your paper!) Again, you are comparing a table created from sample data (which is really like Ha) to a theoretical table that you would expect to occur (which is really like H0).

Expected Table => the null hypothesis table
Observed Table => your table from the sample

We need to enter the Observed table into our calculator. We do NOT need to calculate the Expected table! The calculator will do this for us! We enter the Observed table into a MATRIX, not a list! Using [Matrx]:Edit:[A], we will enter the table in a matrix. We usually pick Matrix A to be the Observed table and Matrix B to be the Expected table, but that's only because those are the defaults. If you want to change them, that's fine, but you'll have to change them when you run the Chi-Square test as well. Enter the number of rows in the top right corner of the screen and then the number of columns. (It's always rows × columns.) The format of the table automatically changes when you hit [Enter]. Now, enter the data into the matrix. Make sure you enter the data into the correct cells.

Next, you would normally do the same for the Expected table, entering it into Matrix B the same way we did with the Observed table and Matrix A, but don't bother! The calculator will put this together for us when we run the Chi-Square test! (We just have to remember to copy it down onto our paper.) So, now, quit out of the matrix menu, and go to [Stat]:Tests:χ²-Test. Make sure that your Observed and Expected entries are set to the correct matrices. (We set Observed to [A], and Expected to [B].) Then, Calculate! Lastly, compare your test statistic value (χ²) to the correct critical value (χ²*) for whatever significance level (α) you are using for the problem. If the test statistic is further out than the critical value, you reject the null hypothesis in favor of the alternative hypothesis. Or, if the P-value is smaller than the significance level you are using, reject.

Let's try an example...

Ex #1) Chronic users of cocaine need the drug to feel pleasure. Perhaps giving them a medication that fights depression will help them stay off cocaine. A three-year study compared an antidepressant called desipramine with lithium (a standard treatment for cocaine addiction) and a placebo. The subjects were 72 chronic users of cocaine who wanted to break their drug habit. Twenty-four of the subjects were randomly assigned to each treatment. Here are the counts and proportions of the subjects who avoided relapse into cocaine use during the study:

Group   Treatment     Subjects   No relapse   Proportion
1       Desipramine   24         14           0.583
2       Lithium       24          6           0.250
3       Placebo       24          4           0.167

So, we need to translate this into a usable Observed Data table (Relapsed = Subjects − No relapse):

Observed (Alternative)
Treatment     No, Did Not Relapse   Yes, Relapsed
Desipramine   14                    10
Lithium        6                    18
Placebo        4                    20
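Building the observed two-way table from the study summary is just a subtraction per group. A Python sketch, using the counts given in the problem:

```python
subjects = {"Desipramine": 24, "Lithium": 24, "Placebo": 24}
no_relapse = {"Desipramine": 14, "Lithium": 6, "Placebo": 4}

# One row per treatment: [did not relapse, relapsed], where relapsed = subjects - no relapse
observed = [[no_relapse[t], subjects[t] - no_relapse[t]] for t in subjects]

print(observed)  # [[14, 10], [6, 18], [4, 20]]
```

The row order follows the dictionary's insertion order (Desipramine, Lithium, Placebo), which matches the 3-row by 2-column layout entered into Matrix [A].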

Now, if we did not have a calculator, we would have to create what we call the Expected Data Table, or the null hypothesis table. This will be based on certain assumptions or expectations detailed in the problem. For example, if we count how many people did not relapse in total, we get 24. This is 24 out of the 72 people in the study. That's 1/3. So we would expect, if neither drug has any effect, an even distribution across the board: for each drug, as well as the placebo, 1/3 of the 24 subjects (8) would not relapse and 2/3 (16) would relapse. i.e. we would expect:

Expected (Null)
Treatment     No, Did Not Relapse   Yes, Relapsed
Desipramine   8                     16
Lithium       8                     16
Placebo       8                     16

Basically, to find each cell, here's the quick formula:

$$\text{Each Cell} = \frac{(\text{That Cell's Row Subtotal})(\text{That Cell's Column Subtotal})}{\text{Overall Total for Entire Table}}$$

But don't bother! As soon as we run the Chi-Square Test (χ²-Test), the Expected Data Table will automatically be calculated! So, let's enter our Observed Data Table into Matrix [A]: [2nd]:[Matrx]:Edit:[A]. We have a 3-row by 2-column table, so change the matrix dimensions to 3x2. Enter our sample data from the Observed Table into matrix [A]. Quit out to the home screen when you're done. Lastly, we need to run the Chi-Square Test. Go to [Stat]:Tests:χ²-Test. Make sure that your Observed and Expected entries are set to the correct matrices. (We set Observed to [A], and Expected to [B].) Then, hit Calculate!
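The row-total × column-total ÷ grand-total rule can be written as a small Python function. It mimics what the calculator produces for Matrix [B]; it is a sketch of the formula, not TI's actual code, and `expected_table` is a hypothetical name.

```python
def expected_table(observed):
    """Expected counts under H0: (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]   # zip(*...) walks the columns
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[14, 10], [6, 18], [4, 20]]   # rows: Desipramine, Lithium, Placebo
print(expected_table(observed))  # [[8.0, 16.0], [8.0, 16.0], [8.0, 16.0]]
```

This reproduces the 8 / 16 split found by the "1/3 did not relapse" reasoning, because every row total is 24 and the column totals are 24 and 48 out of 72.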

Results => χ²-Test
χ² = 10.5
p = .0052475184
df = 2
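The calculator's χ² and df can be reproduced by hand with the cell-by-cell formula. A self-contained Python sketch (the function name `chi_square_two_way` is hypothetical):

```python
def chi_square_two_way(observed):
    """Return (chi-square statistic, df) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # null-table cell
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)               # (r - 1)(c - 1)
    return stat, df

print(chi_square_two_way([[14, 10], [6, 18], [4, 20]]))  # (10.5, 2)
```

The six contributions are 4.5, 2.25, 0.5, 0.25, 2.0 and 1.0; the desipramine and placebo rows account for most of the total.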

So, if we simply think of P-values, this test statistic of χ² = 10.5 has a P-value of slightly over 0.5% (p ≈ 0.0052). This is significant at the 1% level; it is very significant. Or, we can compare our test statistic value of χ² = 10.5 to χ²*, the critical value, which we need to look up. To do this, let's pick a 1% test level, since we already know from our P-value that it is a good level. In the Chi-Square table (pg. 842), look down the degrees of freedom column to d.f. = (3 − 1)(2 − 1), or d.f. = 2. Now, read across to the 1% column. This is the critical value, the point of no return: χ²* = 9.21. Our test statistic is beyond the point of no return, so, again, this is significant.
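With d.f. = 2 the chi-square right tail has the exact closed form P(χ² > x) = e^(−x/2), so both the P-value and the 1% critical value can be checked with nothing but logarithms and exponentials. A Python sketch (this shortcut holds only for d.f. = 2):

```python
import math

chi_sq, df, alpha = 10.5, 2, 0.01

# For df = 2 only: right-tail area P(X > x) = exp(-x / 2)
p_value = math.exp(-chi_sq / 2)
critical = -2 * math.log(alpha)   # solves exp(-x / 2) = alpha for x

print(p_value)              # about 0.0052475, matching the calculator's p
print(round(critical, 2))   # 9.21
print("reject H0" if chi_sq > critical else "fail to reject H0")  # reject H0
```

Both decision rules agree here: p is below α = 0.01, and χ² = 10.5 lies beyond χ²* = 9.21.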

[Figure: χ² curve (d.f. = 2) marking the critical value χ²* = 9.21 and the test statistic χ² = 10.5.]

We reject the null hypothesis that there is no difference between the drugs and the placebo, in favor of the alternative hypothesis: there is evidence that the effect on cocaine relapse differs depending on which treatment (desipramine, lithium, or placebo) is used.
