CONTINGENCY TABLES AND CHI-SQUARED TESTS

In this type of analysis we have two characteristics, such as gender and eye colour, which cannot be measured but which can be used to group people according to the variations within them. These characteristics may, or may not, be associated in some way. How can we decide? We can take a random sample from the population, note which variation of each characteristic applies to each case and then cross-tabulate the data. The data are then analysed to see whether the proportions of each characteristic in the sub-samples are the same as the overall proportions - easier to do than describe! As an example, if there is no relationship between gender and eye colour we would expect similar proportions of males and females to have blue eyes.

The Variables

If the variables are, as is usually the case, nominal (described by name only), frequencies may be cross-tabulated by each category within each variable. Ordinal variables may be used if there are only a limited number of ordered levels, so that each one can be classified as a separate category. Continuous variables may be grouped and then tabulated similarly, though the results will then vary according to the grouping categories.

Contingency Tables (Cross-Tabs)

You have met this type of table before when calculating probabilities. As a reminder: cases are allotted to categories and their frequencies cross-tabulated; e.g. in the gender / eye colour example there might be blue eyed males, blue eyed females, brown eyed males, brown eyed females, etc. These tables are known as contingency tables. All possible 'contingencies' are included in the 'cells', which are themselves mutually exclusive. The table is completed by calculating the 'row totals', the 'column totals' and the 'grand total'.

Expected values

If the two variables, the characteristics under scrutiny, are completely independent, the proportions within the sub-totals of the contingency table would be expected to be the same as those of the totals for each variable. In practice we work with frequencies rather than proportions, distinguishing between 'observed' and 'expected' frequencies by enclosing the latter within brackets.

If gender and eye colour are independent and if a third of the population has blue eyes, we would expect a third of males to be blue eyed and a third of females to be blue eyed. These proportions are obviously contrived so as to be easy to work with. How can we cope with more awkward numbers? The proportions are first calculated as fractions, which are then multiplied by the total frequency to find the expected individual cell frequencies. This produces a formula which is applicable in all cases. For any cell the expected frequency is calculated by:

$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}}$$

where the relevant row and column are those crossing in that particular cell.

Chi-squared ($\chi^2$) Test for Independence

The hypothesis test which is carried out in order to see if there is any association between categorical variables, such as gender and eye colour, is known as the chi-squared ($\chi^2$) test.

Example 1

The following table compiled by a personnel manager relates to a random sample of 180 staff taken from the whole workforce of the supermarket chain. We shall, in this example, test for association between a member of staff's gender and his/her type of job, at the 5% level of significance.

|               | Male | Female | Total |
|---------------|------|--------|-------|
| Supervisor    | 20   | 15     |       |
| Shelf stacker | 20   | 30     |       |
| Till operator | 10   | 35     |       |
| Cleaner       | 10   | 40     |       |
| Total         |      |        |       |

Completing the row and column totals, as previously with probability, gives the full table.

In this example we randomly selected a sample of 180 supermarket staff and found that two thirds, 120, of them were female and one third, 60, were male. Assuming there is no association between gender and job category and finding that we have 45 till operators, we would expect two thirds, 30, of them to be female and one third, 15, of them to be male. Note that these figures are a quarter of each gender respectively, which checks since till operators, 45, form a quarter of the total staff, 180.

We now calculate the other expected frequencies from the probabilities and put them into the table (see Section 4.8):

P(Supervisor) = 35/180

P(Male) = 60/180

P(Supervisor and male) = 35/180 x 60/180, assuming independence.

Therefore the expected number of male supervisors =

$$\frac{35}{180} \times \frac{60}{180} \times 180 = 11.67$$

This is the expected frequency for members of staff who are both male and a supervisor. Note that it is a theoretical number which does not have to be an integer.

This simplifies to:

$$\frac{\text{Row total} \times \text{Column total}}{\text{Overall total}} = \frac{35 \times 60}{180} = 11.67$$

Calculating the other expected frequencies and inserting them in the table (in brackets):

|               | Male       | Female     | Total |
|---------------|------------|------------|-------|
| Supervisor    | 20 (11.67) | 15 (23.33) | 35    |
| Shelf stacker | 20 (16.67) | 30 (33.33) | 50    |
| Till operator | 10 (15.00) | 35 (30.00) | 45    |
| Cleaner       | 10 (16.67) | 40 (33.33) | 50    |
| Total         | 60         | 120        | 180   |
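The row total x column total / overall total rule can be checked with a few lines of Python. This is a minimal sketch using the observed counts of Example 1; the names `observed` and `expected_frequencies` are illustrative and do not come from the handout.

```python
# Observed counts from Example 1: rows are job types, columns are (male, female).
observed = {
    "Supervisor":    [20, 15],
    "Shelf stacker": [20, 30],
    "Till operator": [10, 35],
    "Cleaner":       [10, 40],
}


def expected_frequencies(table):
    """Expected counts under independence: row total x column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    return {
        name: [row_total * col_total / grand_total for col_total in col_totals]
        for name, row_total in zip(table, row_totals)
    }


expected = expected_frequencies(observed)
for job, values in expected.items():
    # Rounded to two decimal places, as in the bracketed figures of the table.
    print(job, [round(v, 2) for v in values])
```

The rounded values should agree with the bracketed figures in the table, for example 11.67 and 23.33 for supervisors.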

These are the frequencies which would be expected if there is no association between gender and job category at the supermarket. If the expected frequencies are observed to actually occur in practice then we can deduce that the two variables are indeed independent. We would obviously not expect to get exact agreement with the expected frequencies, so some critical amount of difference is allowed and we compare the difference from our observations with that allowed by the use of a standard table.

Are the values observed so different to those expected that we must reject the idea of independence? Or are the results just due to sampling errors, with the variables actually being independent? It is to be hoped that you recognise the need for a hypothesis test!

The chi-squared ($\chi^2$) Hypothesis test

To find the answer, we analyse the data and compare the result to a standard table figure. We carry out a formal hypothesis test at 5% significance: the chi-squared test.

1) State Null Hypothesis, H0, (that of no association) and Alternative Hypothesis, H1.

2) Record observed frequencies, O, in each cell of the contingency table.

3) Calculate row, column and grand totals.

4) Calculate expected frequency, E, for each cell:

$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

Note that: No expected frequency should be less than 1 and the number of expected frequencies below 5 should not be over 20% of the total number of cells. Otherwise the test is invalid.

5) Find critical value from chi-square table, as appended, with (r - 1)(c - 1) degrees of freedom, where r and c are the number of rows and columns respectively.

6) Calculate test statistic:

$$\sum \frac{(O - E)^2}{E}$$

7) Compare the two values and conclude whether the variables are independent or not.
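The validity condition in step 4 and the degrees of freedom in step 5 are easy to get wrong by hand, so a small check may help. The sketch below is one possible way to code them; the function names are hypothetical, and the expected frequencies are those of Example 1 as rounded in the text.

```python
def degrees_of_freedom(n_rows, n_cols):
    """(r - 1)(c - 1), counting only the category rows and columns, not the totals."""
    return (n_rows - 1) * (n_cols - 1)


def expected_counts_ok(expected_cells):
    """Apply the rule of thumb from step 4 to a flat list of expected frequencies.

    The test is treated as invalid if any expected frequency is below 1,
    or if more than 20% of the cells have an expected frequency below 5.
    """
    if min(expected_cells) < 1:
        return False
    small_cells = sum(1 for e in expected_cells if e < 5)
    return small_cells <= 0.20 * len(expected_cells)


# Expected frequencies for Example 1 (4 job types x 2 genders).
example_1 = [11.67, 23.33, 16.67, 33.33, 15.00, 30.00, 16.67, 33.33]

print(degrees_of_freedom(4, 2))
print(expected_counts_ok(example_1))
```

For Example 1 every expected frequency is well above 5, so the check passes and the test may go ahead with 3 degrees of freedom.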

In Example 1, we have already carried out steps 2, 3 and 4 of the procedure by calculating the expected values. Whether these are calculated before, or during, the test is up to personal preference. Because calculating the test statistic is rather lengthy, some statisticians also prefer to do this before starting the test and then to insert the calculated value in the formal hypothesis test.

The test statistic is an overall measure of the difference between the expected and observed frequencies. Each cell difference is squared, so that positive and negative differences have the same weighting, and is then divided by the expected frequency for that cell so that it is in proportion to the size of the expected cell contents. When the contributions from each cell are totalled, their sum is compared with a critical value from the chi-squared table, hence the name of this test.

Null Hypothesis (H0): There is no association between gender and job category. (Remember that 'null' means none.)

Alternative Hypothesis (H1): There is an association between gender and job category.

Critical Value: from the chi-squared table

Number of degrees of freedom ($\nu$) = (r - 1)(c - 1) = (4 - 1)(2 - 1) = 3 x 1 = 3

The $\chi^2$ table, as appended, is always one tailed.

Level of significance = 5%

$$\chi^2_{5\%,\ \nu = 3} = 7.816$$

Test statistic

The test statistic is calculated from the contingency table which includes both the observed and the expected values for the frequency of staff. The data may be tabulated, as we shall do in this example, or the contribution of each cell may be calculated directly as $\frac{(O - E)^2}{E}$ and then the test statistic found as the sum of these contributions:

|               | Male       | Female     | Total |
|---------------|------------|------------|-------|
| Supervisor    | 20 (11.67) | 15 (23.33) | 35    |
| Shelf stacker | 20 (16.67) | 30 (33.33) | 50    |
| Till operator | 10 (15.00) | 35 (30.00) | 45    |
| Cleaner       | 10 (16.67) | 40 (33.33) | 50    |
| Total         | 60         | 120        | 180   |

$$\text{Test statistic} = \sum \frac{(O - E)^2}{E}$$

| O     | E     | (O - E) | $(O - E)^2/E$ |
|-------|-------|---------|---------------|
| 20    | 11.67 | 8.33    | 5.946         |
| 15    | 23.33 | -8.33   | 2.974         |
| 20    | 16.67 | 3.33    | 0.665         |
| 30    | 33.33 | -3.33   | 0.333         |
| 10    | 15.00 | -5.00   | 1.667         |
| 35    | 30.00 | 5.00    | 0.833         |
| 10    | 16.67 | -6.67   | 2.669         |
| 40    | 33.33 | 6.67    | 1.335         |
| Total |       |         | 16.422        |

Test statistic: 16.422

Conclusion: Test statistic > Critical value (16.422 > 7.816), therefore reject H0. Conclude that there is an association between gender and job category in the supermarket chain. Looking again at the data we can see that far more males than expected were supervisors or shelf stackers and more females than expected were cleaners or till operators.
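As a cross-check on the hand calculation, the whole of Example 1 can be run in plain Python. This is a sketch rather than a standard routine: `chi_squared_statistic` is a name chosen here for illustration, and the critical value is copied from the appended table rather than computed. Because the code keeps the expected frequencies unrounded, it gives about 16.43 instead of the 16.422 obtained from the two-decimal figures in the table.

```python
# Observed counts from Example 1: rows are job types, columns are (male, female).
observed = [
    [20, 15],  # Supervisor
    [20, 30],  # Shelf stacker
    [10, 35],  # Till operator
    [10, 40],  # Cleaner
]

# Critical value for 5% significance and 3 degrees of freedom, read from the appended table.
CRITICAL_5PCT_3DF = 7.816


def chi_squared_statistic(table):
    """Sum of (O - E)^2 / E over all cells, with E = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for o, col_total in zip(row, col_totals):
            e = row_total * col_total / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


statistic = chi_squared_statistic(observed)
reject_h0 = statistic > CRITICAL_5PCT_3DF
print(f"test statistic = {statistic:.3f}, reject H0: {reject_h0}")
```

The small difference from the tabulated total is due only to rounding and does not affect the conclusion.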

Example 2

In this example we first have to set up the contingency table from the following information collected from a questionnaire:

In a recent survey within a Supermarket chain, a random sample of 160 employees: stackers, sales staff and administrators, were asked to grade their attitude towards future wage restraint on the scale: Very favourable; favourable; unfavourable; very unfavourable.

Of the 40 stackers interviewed, 7 gave the response 'favourable', 24 the response 'unfavourable', and 8 the response 'very unfavourable'. There were 56 sales staff and from these, 10 responded 'very unfavourable', 9 responded 'favourable' and 3 responded 'very favourable'. The rest of the sample were administrators. Of these, 16 gave the response 'very favourable' and 2 the response 'very unfavourable'. In the whole survey, exactly half the employees interviewed responded 'unfavourable'.

We first draw up a contingency table showing these results and then test whether attitude towards future wage restraint is dependent on the type of employment.

Setting up the table: in this example there are three types of employee giving four different responses, i.e. we have a 3 x 4 (or a 4 x 3) table. Adding extra rows and columns for the subtotals and titles we need 5 x 6 cells.

Have a go at compiling the table. As you come to each number in the frequency of response above insert it into the appropriate place; then find the missing figures by difference. There is sufficient information here to enable you to complete your table. When complete, check with that below before calculating the expected values.

|                | V.favourable | Favourable | Unfavourable | V.unfavourable | Total |
|----------------|--------------|------------|--------------|----------------|-------|
| Stackers       |              |            |              |                |       |
| Sales staff    |              |            |              |                |       |
| Administrators |              |            |              |                |       |
| Total          |              |            |              |                |       |

The expected values can next be calculated:

$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}}$$

and inserted.

|                | V.favourable | Favourable | Unfavourable | V.unfavourable | Total |
|----------------|--------------|------------|--------------|----------------|-------|
| Stackers       | 1            | 7          | 24           | 8              | 40    |
| Sales staff    | 3            | 9          | 34           | 10             | 56    |
| Administrators | 16           | 24         | 22           | 2              | 64    |
| Total          | 20           | 40         | 80           | 20             | 160   |

Hypothesis test

Null Hypothesis (H0): There is no association between job category and attitude towards wage restraint.

Alternative Hypothesis (H1): There is an association between job category and attitude towards wage restraint.

Level of significance: 5%

Critical value: Number of degrees of freedom ($\nu$) = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6

$\chi^2$ table, 5%, 6 degrees of freedom = 12.59

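Test statistic

$$\sum \frac{(O - E)^2}{E}$$

(Complete the table.)

| O     | E  | (O - E) | $(O - E)^2/E$ |
|-------|----|---------|---------------|
| 1     | 5  | -4      | 3.200         |
| 7     | 10 | -3      | 0.900         |
| 24    | 20 | +4      | 0.800         |
| 8     | 5  | +3      | 1.800         |
| 3     | 7  | -4      | 2.286         |
| 9     | 14 | -5      | 1.786         |
| 34    |    |         |               |
| 10    |    |         |               |
| 16    |    |         |               |
| 24    |    |         |               |
| 22    |    |         |               |
| 2     |    |         |               |
| Total |    |         | 32.969        |

Test statistic: 32.969

Conclusion: Test statistic > Critical value (32.969 > 12.59), therefore reject H0. Conclude that there is an association between job category and attitude towards future wage restraint. The administrators were for it but the others against it.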

COMPLETED EXAMPLES FROM LECTURE HANDOUT


Example 2

In this example we first have to set up the contingency table from the following information collected from a questionnaire:

In a recent survey within a Supermarket chain, a random sample of 160 employees: stackers, sales staff and administrators, were asked to grade their attitude towards future wage restraint on the scale: Very favourable; favourable; unfavourable; very unfavourable.

Of the 40 stackers interviewed, 7 gave the response 'favourable', 24 the response 'unfavourable', and 8 the response 'very unfavourable'. There were 56 sales staff and from these, 10 responded 'very unfavourable', 9 responded 'favourable' and 3 responded 'very favourable'. The rest of the sample were administrators. Of these, 16 gave the response 'very favourable' and 2 the response 'very unfavourable'. In the whole survey, exactly half the employees interviewed responded 'unfavourable'.

We first draw up a contingency table showing these results and then test whether attitude towards future wage restraint is dependent on the type of employment.

Setting up the table: in this example there are three types of employee giving four different responses, i.e. we have a 3 x 4 (or a 4 x 3) table. Adding extra rows and columns for the subtotals and titles we need 5 x 6 cells.

Have a go at compiling the table. As you come to each number in the frequency of response above insert it into the appropriate cell; then find the missing figures by difference. There is sufficient information here to enable you to complete your table. When complete, check with that below before calculating the expected values.

|                | V.favourable | Favourable | Unfavourable | V.unfavourable | Total |
|----------------|--------------|------------|--------------|----------------|-------|
| Stackers       | 1            | 7          | 24           | 8              | 40    |
| Sales staff    | 3            | 9          | 34           | 10             | 56    |
| Administrators | 16           | 24         | 22           | 2              | 64    |
| Total          | 20           | 40         | 80           | 20             | 160   |

The expected values can next be calculated:

$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}}$$

and inserted.

|                | V.favourable | Favourable | Unfavourable | V.unfavourable | Total |
|----------------|--------------|------------|--------------|----------------|-------|
| Stackers       | 1 (5)        | 7 (10)     | 24 (20)      | 8 (5)          | 40    |
| Sales staff    | 3 (7)        | 9 (14)     | 34 (28)      | 10 (7)         | 56    |
| Administrators | 16 (8)       | 24 (16)    | 22 (32)      | 2 (8)          | 64    |
| Total          | 20           | 40         | 80           | 20             | 160   |

Hypothesis test
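The "find the missing figures by difference" step can also be traced in code. The sketch below starts from only the figures quoted in the survey description and fills each gap from a row or column total; the variable names and the use of `None` to mark an unknown cell are choices made here for illustration.

```python
# Column order: very favourable, favourable, unfavourable, very unfavourable.
# None marks a cell that is not stated in the survey description.
sample_size = 160
row_totals = {"Stackers": 40, "Sales staff": 56}
# "The rest of the sample were administrators."
row_totals["Administrators"] = sample_size - sum(row_totals.values())

table = {
    "Stackers":       [None, 7, 24, 8],
    "Sales staff":    [3, 9, None, 10],
    "Administrators": [16, None, None, 2],
}

# Stackers and sales staff each have one gap: row total minus the known cells.
for group in ("Stackers", "Sales staff"):
    cells = table[group]
    gap = cells.index(None)
    cells[gap] = row_totals[group] - sum(c for c in cells if c is not None)

# Exactly half the sample answered 'unfavourable', which fixes the administrators' count.
unfavourable_total = sample_size // 2
table["Administrators"][2] = (
    unfavourable_total - table["Stackers"][2] - table["Sales staff"][2]
)

# The one remaining administrators cell now follows from the row total.
cells = table["Administrators"]
gap = cells.index(None)
cells[gap] = row_totals["Administrators"] - sum(c for c in cells if c is not None)

column_totals = [sum(col) for col in zip(*table.values())]
for group, cells in table.items():
    print(group, cells, row_totals[group])
print("Total", column_totals, sample_size)
```

The printed rows should match the completed contingency table, with column totals of 20, 40, 80 and 20.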

Null Hypothesis (H0): There is no association between job category and attitude towards wage restraint.

Alternative Hypothesis (H1): There is an association between job category and attitude towards wage restraint.

Level of significance: 5%

Critical value: Number of degrees of freedom ($\nu$) = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 2 x 3 = 6

$\chi^2$ table (Table 5 in Appendix D), 5%, 6 degrees of freedom = 12.59

Test statistic

$$\sum \frac{(O - E)^2}{E}$$

| O     | E  | (O - E) | $(O - E)^2/E$ |
|-------|----|---------|---------------|
| 1     | 5  | -4      | 3.200         |
| 7     | 10 | -3      | 0.900         |
| 24    | 20 | +4      | 0.800         |
| 8     | 5  | +3      | 1.800         |
| 3     | 7  | -4      | 2.286         |
| 9     | 14 | -5      | 1.786         |
| 34    | 28 | +6      | 1.286         |
| 10    | 7  | +3      | 1.286         |
| 16    | 8  | +8      | 8.000         |
| 24    | 16 | +8      | 4.000         |
| 22    | 32 | -10     | 3.125         |
| 2     | 8  | -6      | 4.500         |
| Total |    |         | 32.969        |

Test statistic: 32.969

Conclusion: Test statistic > Critical value (32.969 > 12.59), therefore reject H0. Conclude that there is an association between job category and attitude towards future wage restraint. The administrators were for it but the others against it.
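To see which cells drive the result, the individual contributions can be ranked. The following sketch uses the completed Example 2 table; the names are illustrative. With unrounded arithmetic the total comes to about 32.968, against 32.969 when the contributions are first rounded to three decimals.

```python
responses = ["V.favourable", "Favourable", "Unfavourable", "V.unfavourable"]
observed = {
    "Stackers":       [1, 7, 24, 8],
    "Sales staff":    [3, 9, 34, 10],
    "Administrators": [16, 24, 22, 2],
}

col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# One entry per cell: (contribution, O - E, employee group, response).
contributions = []
for group, counts in observed.items():
    row_total = sum(counts)
    for response, o, col_total in zip(responses, counts, col_totals):
        e = row_total * col_total / grand_total
        contributions.append(((o - e) ** 2 / e, o - e, group, response))

statistic = sum(entry[0] for entry in contributions)
print(f"test statistic = {statistic:.3f}")

# The four largest contributions, with the sign of O - E showing the direction.
for value, diff, group, response in sorted(contributions, reverse=True)[:4]:
    print(f"{group:15s} {response:15s} O - E = {diff:+.0f}  contribution = {value:.3f}")
```

Three of the four largest contributions come from the administrators' row, which is consistent with the handout's remark that the administrators were in favour while the others were against.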

Table 5 PERCENTAGE POINTS OF THE $\chi^2$-DISTRIBUTION

| $\nu$ | 10%   | 5%    | 2.5%  | 1%    | 0.1%  |
|-------|-------|-------|-------|-------|-------|
| 1     | 2.706 | 3.841 | 5.024 | 6.635 | 10.83 |
| 2     | 4.605 | 5.991 | 7.378 | 9.210 | 13.82 |
| 3     | 6.252 | 7.816 | 9.351 | 11.35 | 16.27 |
| 4     | 7.780 | 9.488 | 11.14 | 13.28 | 18.47 |
| 5     | 9.236 | 11.07 | 12.83 | 15.08 | 20.51 |
| 6     | 10.64 | 12.59 | 14.45 | 16.81 | 22.46 |
| 7     | 12.02 | 14.07 | 16.02 | 18.49 | 24.36 |
| 8     | 13.36 | 15.51 | 17.53 | 20.09 | 26.13 |
| 9     | 14.68 | 16.92 | 19.02 | 21.67 | 27.89 |
| 10    | 15.99 | 18.31 | 20.48 | 23.21 | 29.59 |
| 11    | 17.28 | 19.68 | 21.92 | 24.72 | 31.26 |
| 12    | 18.55 | 21.03 | 23.34 | 26.22 | 32.91 |
| 13    | 19.81 | 22.36 | 24.74 | 27.69 | 34.51 |
| 14    | 21.06 | 23.68 | 26.12 | 29.14 | 36.12 |
| 15    | 22.31 | 25.00 | 27.49 | 30.58 | 37.70 |
| 16    | 23.54 | 26.30 | 28.85 | 32.00 | 39.25 |
| 17    | 24.77 | 27.59 | 30.19 | 33.41 | 40.79 |
| 18    | 25.99 | 28.87 | 31.53 | 34.81 | 42.31 |
| 19    | 27.20 | 30.14 | 32.85 | 36.19 | 43.82 |
| 20    | 28.41 | 31.41 | 34.17 | 37.57 | 45.32 |
| 21    | 29.62 | 32.67 | 35.48 | 38.93 | 46.80 |
| 22    | 30.81 | 33.92 | 36.78 | 40.29 | 48.27 |
| 23    | 32.01 | 35.17 | 38.08 | 41.64 | 49.73 |
| 24    | 33.20 | 36.42 | 39.36 | 42.98 | 51.18 |
| 25    | 34.38 | 37.65 | 40.65 | 44.31 | 52.62 |
| 26    | 35.56 | 38.89 | 41.92 | 45.64 | 54.05 |
| 27    | 36.74 | 40.11 | 43.19 | 46.96 | 55.48 |
| 28    | 37.92 | 41.34 | 44.46 | 48.28 | 56.89 |
| 29    | 39.09 | 42.56 | 45.72 | 49.59 | 58.30 |
| 30    | 40.26 | 43.77 | 46.98 | 50.89 | 59.70 |
| 40    | 51.81 | 55.76 | 59.34 | 63.69 | 73.42 |
| 50    | 63.17 | 67.50 | 71.42 | 76.15 | 86.66 |
| 60    | 74.40 | 79.08 | 83.30 | 88.38 | 99.61 |
| 70    | 85.53 | 90.53 | 95.02 | 100.4 | 112.3 |
| 80    | 96.58 | 101.9 | 106.6 | 112.3 | 124.8 |
| 90    | 107.6 | 113.1 | 118.1 | 124.1 | 137.2 |
| 100   | 118.5 | 124.3 | 129.6 | 135.8 | 149.5 |
