
TWO-WAY ANOVA

A firm wants to test four different kinds of new machine.

They have five workers, and each of them uses each of the machines for a week. The results are as follows (rows are workers, columns are machines):

            Machine 1   Machine 2   Machine 3   Machine 4
Worker 1        44          38          47          36
Worker 2        46          40          52          43
Worker 3        34          36          44          32
Worker 4        43          38          46          33
Worker 5        38          42          49          39

We begin by making the null hypothesis that there is no variation either among the workers or among the types of machine. We then proceed much as before, except that now we assume that each observation differs from the overall mean because of (a) factor 1 (which worker), (b) factor 2 (which machine) and (c) error. So our model is

$$X = \mu + F_1 + F_2 + \varepsilon$$

Suppose there are r levels of factor 1 and s levels of factor 2 ("levels" is the standard terminology). We would normally put the data into an $r \times s$ matrix, with the rows representing the levels of factor 1 and the columns the levels of factor 2. Here, for example, factor 1 is the workers, factor 2 is the machines, $r = 5$ and $s = 4$. We then define

$$\bar X_{i\cdot} = \frac{1}{s}\sum_{j=1}^{s} X_{ij}, \qquad \bar X_{\cdot j} = \frac{1}{r}\sum_{i=1}^{r} X_{ij}$$

These are the row and column means; so, for example, $\bar X_{2\cdot}$ is the mean of the entries in the second row. Note that there is an s in the formula for the row means and an r in the formula for the column means. This is simply because an $r \times s$ matrix has s entries in each of its r rows (and r entries in each of its s columns). The dot replaces the index we have summed over, so that there is no confusion about which one it was. Writing $\bar X$ for the overall mean, we break up the total sum of squares into the right sorts of sums of squares by writing

$$\sum_i\sum_j (X_{ij} - \bar X)^2 = \sum_i\sum_j \left[(\bar X_{i\cdot} - \bar X) + (\bar X_{\cdot j} - \bar X) + (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)\right]^2$$

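When we expand the right-hand side we find that all the cross products vanish:

$$\sum_i\sum_j (\bar X_{i\cdot} - \bar X)(\bar X_{\cdot j} - \bar X) = \sum_i (\bar X_{i\cdot} - \bar X) \sum_j (\bar X_{\cdot j} - \bar X) = 0$$

since $\sum_j (\bar X_{\cdot j} - \bar X) = 0$. Then also, using $\sum_j X_{ij} = s\bar X_{i\cdot}$ and $\sum_j \bar X_{\cdot j} = s\bar X$,

$$\sum_i\sum_j (\bar X_{i\cdot} - \bar X)(X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X) = \sum_i (\bar X_{i\cdot} - \bar X) \sum_j (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X) = \sum_i (\bar X_{i\cdot} - \bar X)(s\bar X_{i\cdot} - s\bar X_{i\cdot} - s\bar X + s\bar X) = 0$$

and similarly, summing over i first, the remaining term is

$$\sum_j (\bar X_{\cdot j} - \bar X) \sum_i (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X) = \sum_j (\bar X_{\cdot j} - \bar X)(r\bar X_{\cdot j} - r\bar X - r\bar X_{\cdot j} + r\bar X) = 0.$$

Hence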

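$$\sum_i\sum_j (X_{ij} - \bar X)^2 = \sum_i\sum_j (\bar X_{i\cdot} - \bar X)^2 + \sum_i\sum_j (\bar X_{\cdot j} - \bar X)^2 + \sum_i\sum_j (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2$$

which we can write as

$$SS_{\text{total}} = SS_{\text{factor 1}} + SS_{\text{factor 2}} + SS_{\text{error}}$$

Since the first summand does not depend on j and the second does not depend on i, we can clearly write

$$SS_{\text{factor 1}} = \sum_i s(\bar X_{i\cdot} - \bar X)^2, \qquad SS_{\text{factor 2}} = \sum_j r(\bar X_{\cdot j} - \bar X)^2$$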
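Because the vanishing of the cross products is purely algebraic, the decomposition should hold for any $r \times s$ matrix with one observation per cell, not just for this data. A minimal Python sketch checks this on random matrices; the helper `two_way_ss` is a hypothetical name of ours that simply transcribes the four formulas above.

```python
import random


def two_way_ss(data):
    """Return (SS_total, SS_factor1, SS_factor2, SS_error) for an r x s
    matrix with one observation per cell, using the formulas in the text."""
    r, s = len(data), len(data[0])
    row_means = [sum(row) / s for row in data]
    col_means = [sum(col) / r for col in zip(*data)]
    grand = sum(map(sum, data)) / (r * s)

    ss_total = sum((data[i][j] - grand) ** 2 for i in range(r) for j in range(s))
    ss_f1 = sum(s * (m - grand) ** 2 for m in row_means)  # rows carry weight s
    ss_f2 = sum(r * (m - grand) ** 2 for m in col_means)  # columns carry weight r
    ss_error = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(s)
    )
    return ss_total, ss_f1, ss_f2, ss_error


# Random matrices of assorted shapes: the identity should hold up to rounding.
random.seed(0)
for _ in range(100):
    r, s = random.randint(2, 6), random.randint(2, 6)
    m = [[random.gauss(0, 1) for _ in range(s)] for _ in range(r)]
    tot, f1, f2, err = two_way_ss(m)
    assert abs(tot - (f1 + f2 + err)) < 1e-9
print("SS_total = SS_factor1 + SS_factor2 + SS_error for every trial")
```

Computing `ss_error` directly from the residuals, rather than by subtraction, is what makes this a genuine check of the identity.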

As before, there is an s in the formula for factor 1, i.e. for the rows, and an r in the formula for factor 2, i.e. for the columns. It can be shown that if we divide the sums of squares by the respective numbers of degrees of freedom we obtain the mean squares, and that these are all estimates of the (again supposed common) variance under the null hypothesis that there are no systematic effects:

$$H_0: E(\bar X_{i\cdot}) = E(\bar X_{\cdot j}) = E(\bar X) = \mu$$

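The correct number of degrees of freedom is $r - 1$ for factor 1, $s - 1$ for factor 2, and $(r-1)(s-1)$ for error. This makes the total $rs - 1$, which is correct since one degree of freedom is lost in estimating the overall mean. We can then go on to show that the appropriate tests are based on

$$F = \frac{MS_{\text{factor}}}{MS_{\text{error}}}$$

with $(r-1,\ (r-1)(s-1))$ degrees of freedom for factor 1 and $(s-1,\ (r-1)(s-1))$ for factor 2.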
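The degrees-of-freedom bookkeeping can be checked by expanding the product; the second line is the same count for the example, with $r = 5$ and $s = 4$:

$$(r-1) + (s-1) + (r-1)(s-1) = (r - 1) + (s - 1) + (rs - r - s + 1) = rs - 1$$

$$(5-1) + (4-1) + (5-1)(4-1) = 4 + 3 + 12 = 19 = 5 \cdot 4 - 1$$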

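In the example the five rows of the data matrix correspond to different workers and the four columns to different machines, so factor 1 is workers, factor 2 is machines, r = 5 and s = 4. We write down the data again, this time with the machines down the side and the workers across the top, together with the various sums and means:

                 Worker 1  Worker 2  Worker 3  Worker 4  Worker 5 |  Sum   Mean   Mean - 41   (Mean - 41)^2
Machine 1            44        46        34        43        38   |  205   41.0      0.0           0.00
Machine 2            38        40        36        38        42   |  194   38.8     -2.2           4.84
Machine 3            47        52        44        46        49   |  238   47.6      6.6          43.56
Machine 4            36        43        32        33        39   |  183   36.6     -4.4          19.36
                                                                                          Total:  67.76
Sum                 165       181       146       160       168
Mean               41.25     45.25     36.5      40        42
Mean - 41           0.25      4.25     -4.5      -1         1
(Mean - 41)^2       0.0625   18.0625   20.25      1         1     Total: 40.375

The machine means in the Mean column are the $\bar X_{\cdot j}$, and the worker means in the Mean row are the $\bar X_{i\cdot}$.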

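The overall mean is 41, which for this complete rectangular array (one observation in every cell) is the mean of both the $\bar X_{i\cdot}$ and the $\bar X_{\cdot j}$. The summed squared deviations now have to be multiplied by $s = 4$ and $r = 5$, respectively, and this gives

$$SS_{\text{workers}} = \sum_i s(\bar X_{i\cdot} - \bar X)^2 = 4 \times 40.375 = 161.5$$

$$SS_{\text{machines}} = \sum_j r(\bar X_{\cdot j} - \bar X)^2 = 5 \times 67.76 = 338.8$$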
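The factors s and r appear because each row mean is repeated across the s cells of its row, and each column mean across the r cells of its column. A minimal Python sketch on the example data computes the two "Total" entries of the table and then each sum of squares as a double sum over every cell; the variable names are ours.

```python
# Example data: rows are workers (i), columns are machines (j).
data = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
r, s = len(data), len(data[0])
grand = sum(map(sum, data)) / (r * s)
row_means = [sum(row) / s for row in data]
col_means = [sum(col) / r for col in zip(*data)]

# Single sums of squared deviations (the two "Total" entries in the table).
dev_sq_workers = sum((m - grand) ** 2 for m in row_means)
dev_sq_machines = sum((m - grand) ** 2 for m in col_means)

# Double sums over every cell: the row mean is the same in all s cells of a row,
# so this equals s * dev_sq_workers; likewise r * dev_sq_machines for columns.
ss_workers = sum((row_means[i] - grand) ** 2 for i in range(r) for j in range(s))
ss_machines = sum((col_means[j] - grand) ** 2 for i in range(r) for j in range(s))

print(dev_sq_workers, s * dev_sq_workers, ss_workers)
print(dev_sq_machines, r * dev_sq_machines, ss_machines)
```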

We can work out the sum of squares for error directly, but it is usually easier to work out the total sum of squares

$$SS_{\text{total}} = \sum_i\sum_j (X_{ij} - \bar X)^2 = 574$$

which gives us, by subtraction,

$$SS_{\text{error}} = 574 - 338.8 - 161.5 = 73.7$$

We now compute the mean squares

$$MS_{\text{workers}} = \frac{161.5}{5-1} = 40.375, \qquad MS_{\text{machines}} = \frac{338.8}{4-1} = 112.93$$
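To see that the shortcut by subtraction agrees with the direct route, here is a minimal Python sketch on the example data that computes $SS_{\text{error}}$ both ways, along with the two mean squares. The residual formula is the last term of the decomposition; the variable names are ours.

```python
# Example data: rows are workers (i), columns are machines (j).
data = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
r, s = len(data), len(data[0])
grand = sum(map(sum, data)) / (r * s)
row_means = [sum(row) / s for row in data]
col_means = [sum(col) / r for col in zip(*data)]

# Total sum of squares, straight from every cell.
ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_workers = s * sum((m - grand) ** 2 for m in row_means)
ss_machines = r * sum((m - grand) ** 2 for m in col_means)

# Error sum of squares by subtraction, as in the text ...
ss_error_by_subtraction = ss_total - ss_machines - ss_workers
# ... and directly from the residuals X_ij - Xbar_i. - Xbar_.j + Xbar.
ss_error_direct = sum(
    (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(r)
    for j in range(s)
)

# Mean squares: divide by the degrees of freedom r - 1 and s - 1.
ms_workers = ss_workers / (r - 1)
ms_machines = ss_machines / (s - 1)

print(ss_total, ss_error_by_subtraction, ss_error_direct)
print(ms_workers, ms_machines)
```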

Note that while the number 40.375 has turned up again, that's a bit of a coincidence: we multiplied by 4 because there are 4 columns, and we've now divided by 4 because there are 5 rows and 4 is 5 - 1. Similarly,

$$MS_{\text{error}} = \frac{73.7}{(5-1)(4-1)} = \frac{73.7}{12} \approx 6.142$$

and then the two F-statistics are

workers: $F = 40.375 / 6.142 \approx 6.57$, with $F_{95\%,(4,12)} = 3.26$ and $F_{99\%,(4,12)} = 5.41$;

machines: $F = 112.93 / 6.142 \approx 18.39$, with $F_{95\%,(3,12)} = 3.49$ and $F_{99\%,(3,12)} = 5.95$.

Both statistics exceed their 1% critical values, so there is highly significant variation among both workers and machines.
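The whole test can be reproduced in a few lines. The Python sketch below recomputes the sums of squares from the example data, forms the two F-statistics and compares them with the 1% critical values, which are copied from the F tables quoted above rather than computed; the constant names are ours.

```python
# Example data: rows are workers (i), columns are machines (j).
data = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
r, s = len(data), len(data[0])
grand = sum(map(sum, data)) / (r * s)
row_means = [sum(row) / s for row in data]
col_means = [sum(col) / r for col in zip(*data)]

ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_workers = s * sum((m - grand) ** 2 for m in row_means)
ss_machines = r * sum((m - grand) ** 2 for m in col_means)
ss_error = ss_total - ss_workers - ss_machines

# Degrees of freedom and mean squares.
df_workers, df_machines, df_error = r - 1, s - 1, (r - 1) * (s - 1)
ms_error = ss_error / df_error
f_workers = (ss_workers / df_workers) / ms_error
f_machines = (ss_machines / df_machines) / ms_error

# 1% critical values as quoted in the text (from F tables); not computed here.
F99_WORKERS = 5.41   # F_{99%,(4,12)}
F99_MACHINES = 5.95  # F_{99%,(3,12)}

print(f"MS_error   = {ms_error:.3f}")
print(f"F_workers  = {f_workers:.2f}  (1% critical value {F99_WORKERS})")
print(f"F_machines = {f_machines:.2f}  (1% critical value {F99_MACHINES})")
```

If SciPy is available, `scipy.stats.f.ppf(0.99, 4, 12)` should reproduce the tabulated critical value; it is left out here so the sketch has no dependencies.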
