
LCGC Europe Online Supplement

statistics and data analysis

Analysis of Variance
Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK.

Statistical methods can be powerful tools for unlocking the information contained in analytical data. This second part in our statistics refresher series looks at one of the most frequently used of these tools: Analysis of Variance (ANOVA). In the previous paper we examined the initial steps in describing the structure of the data and explained a number of alternative significance tests (1). In particular, we showed that t-tests can be used to compare the results from two analytical methods or chemical processes. In this article, we will expand on the theme of significance testing by showing how ANOVA can be used to compare the results from more than two sets of data at the same time, and how it is particularly useful in analysing data from designed experiments.

With the advent of built-in spreadsheet functions and affordable dedicated statistical software packages, Analysis of Variance (ANOVA) has become relatively simple to carry out. This article will therefore concentrate on how to select the correct variant of the ANOVA method, the advantages of ANOVA, how to interpret the results and how to avoid some of the pitfalls. For those wanting more detailed theory than is given in the following section, several texts are available (2–5).

A bit of ANOVA theory
Whenever we make repeated measurements there is always some variation. Sometimes this variation (known as within-group variation) makes it difficult for analysts to see if there have been significant changes between different groups of replicates. For example, in Figure 1 (which shows the results from four replicate analyses by 12 analysts), we can see that the total variation is a combination of the spread of results within groups and the spread between the mean values (between-group variation). The statistic that measures the within-group and between-group variations in ANOVA is called the sum of squares and often appears in the output tables abbreviated as SS. It can be shown that the different sums of squares calculated in ANOVA, once divided by their degrees of freedom, are equivalent to variances (1). The central tenet of ANOVA is that the total SS in an experiment can be divided into the components caused by random error, given by the within-group (or sample) SS, and the components resulting from differences between means. It is these latter components that are tested for statistical significance using a simple F-test (1).

Why not use multiple t-tests instead of ANOVA?
Why should we use ANOVA in preference to carrying out a series of t-tests? This is best explained by an example: suppose we want to compare the results from 12 analysts taking part in a training exercise. If we were to use t-tests, we would need to calculate 66 t-values, one for each possible pair of analysts. Not only is this a lot of work, but the chance of reaching a wrong conclusion also increases with every additional test. The correct way to analyse this sort of data is to use one-way ANOVA.

One-way ANOVA
One-way ANOVA will answer the question: Is there a significant difference between the mean values (or levels), given that the means are calculated from a number of replicate observations? 'Significant' here means that the observed spread of means would not normally arise from the chance variation within groups. We have already seen an example of this type of problem in the form of the data contained in Figure 1, which shows the results from 12 different analysts analysing the same material. Using these data and a spreadsheet, the results obtained from carrying out one-way ANOVA are reported in Example 1. In this example, the ANOVA shows there are significant differences between analysts (F value > Fcrit at the 95% confidence level). This result is obvious from a plot of the data (Figure 1) but in many situations a visual inspection of a plot will not give such a clear-cut result. Notice that the output also includes a p-value (see the Interpretation of the result(s) section, which follows).

Note: ANOVA cannot tell us which individual mean or means are different from the consensus value and in what direction they deviate. The most effective way to show this is to plot the data (Figure 1) or alternatively, but less effectively, carry out a multiple comparison test such as Scheffé's test (2). It is also important to make sure the right questions are being asked and that the right data are being captured. In Example 1, it is possible that the time difference between the analysts carrying out the determinations is the reason for the difference in the mean values. This example shows how good experimental design procedures could have prevented ambiguity in the conclusions.
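Returning to the question of why a series of t-tests is unattractive, the short sketch below counts the pairwise comparisons for 12 analysts and gives a rough idea of how quickly the chance of a false positive grows. It assumes each t-test is run at the 5% significance level and that the tests are independent; neither assumption comes from the article, and because pairwise tests share data the second figure should be read as an illustration only. The function names are hypothetical.

```python
from math import comb


def pairwise_tests(n_groups: int) -> int:
    """Number of two-group t-tests needed to compare every pair of groups."""
    return comb(n_groups, 2)


def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent, which is only an approximation
    for pairwise t-tests that share data.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests = pairwise_tests(12)  # 12 analysts, as in Figure 1
print(n_tests)
print(round(familywise_error(n_tests), 3))
```

A single one-way ANOVA replaces all of these comparisons with one F-test at the chosen significance level.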


Example 1 An example of one-way ANOVA carried out by Excel

              A_1     A_2     A_3     A_4     A_5     A_6
Replicate 1   40.71   35.84   36.67   40.54   41.19   41.22
Replicate 2   40.91   36.58   37.33   40.67   40.29   39.61
Replicate 3   40.8    31.3    36.96   40.81   40.99   37.89
Replicate 4   38.42   34.19   36.83   40.78   40.4    36.67

              A_7     A_8     A_9     A_10    A_11    A_12
Replicate 1   34.1    39.2    42.5    39.75   36.04   44.36
Replicate 2   34.1    39.3    42.3    39.69   37.03   45.73
Replicate 3   34.69   39.3    42.5    39.23   36.85   45.25
Replicate 4   34.6    39.3    42.5    39.73   36.24   45.34

(Note: the data table has been split into two sections (A_1 to A_6, A_7 to A_12) for display purposes. The ANOVA is carried out on a single table.)

Anova: Single Factor
Source of Variation   SS         df   MS         F          P-value   F crit
Between Groups        438.7988   11   39.8908    40.31545   6.6E-17   2.066606
Within Groups         35.6208    36   0.989467

SS = sum of squares, df = degrees of freedom, MS = mean square (SS/df). The P-value is < 0.05 (F value is > Fcrit at the 95% confidence level for 11 and 36 degrees of freedom), therefore it can be concluded that there is a significant difference between the analysts' results.
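For readers who want to see where the numbers in the Example 1 table come from, the following is a minimal sketch of the one-way ANOVA arithmetic in plain Python. The function name and the layout of the data are illustrative assumptions, not part of the original spreadsheet. The values are typed in as displayed in Example 1; several of them appear to have been rounded for display, so the sums of squares should come out close to, but may not exactly match, the spreadsheet output. The P-value is not computed here because that needs the F-distribution, which is not in the Python standard library.

```python
# Replicate results for analysts A_1 to A_12, as displayed in Example 1.
results = {
    "A_1": [40.71, 40.91, 40.80, 38.42],
    "A_2": [35.84, 36.58, 31.30, 34.19],
    "A_3": [36.67, 37.33, 36.96, 36.83],
    "A_4": [40.54, 40.67, 40.81, 40.78],
    "A_5": [41.19, 40.29, 40.99, 40.40],
    "A_6": [41.22, 39.61, 37.89, 36.67],
    "A_7": [34.10, 34.10, 34.69, 34.60],
    "A_8": [39.20, 39.30, 39.30, 39.30],
    "A_9": [42.50, 42.30, 42.50, 42.50],
    "A_10": [39.75, 39.69, 39.23, 39.73],
    "A_11": [36.04, 37.03, 36.85, 36.24],
    "A_12": [44.36, 45.73, 45.25, 45.34],
}


def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares and F."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]

    # Between groups: spread of the group means about the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, means)
    )
    # Within groups: spread of each result about its own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between, "SS_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "MS_between": ms_between, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


anova = one_way_anova(list(results.values()))
F_CRIT = 2.066606  # taken from the Example 1 table (11 and 36 df, 95%)
for name, value in anova.items():
    print(name, round(value, 4))
print("significant:", anova["F"] > F_CRIT)
```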

Example 2 Two-way ANOVA
The analysis of tinned ham was carried out at three temperatures (415, 435 and 460 °C) and three times (30, 60 and 90 minutes). Three analyses, determining protein yield, were made at each temperature and time. The measurements are summarized in the diagram below and the results of the two-way ANOVA are given in the table.

(Diagram: the measurements plotted for each combination of Temp (°C) 415, 435, 460 and Time (min) 30, 60, 90.)

Time (min)   Temp 415 °C   Temp 435 °C   Temp 460 °C
30           27.13         27.2          27.03
30           27.2          26.97         27.1
30           27.13         27.13         27.13
60           27.29         27.07         27.1
60           27.13         27.1          27.07
60           27.23         27.03         27.03
90           27.03         27.2          27.03
90           27.13         27.23         27.07
90           27.07         27.27         26.9

Anova: Two-factor with replication
Source of Variation        SS         df   MS         F          P-value    F crit
Sample (=Time)             0.000867    2   0.000433   0.100429   0.904952   3.554561
Columns (=Temperature)     0.049689    2   0.024844   5.75794    0.011667   3.554561
Interaction                0.087644    4   0.021911   5.078112   0.006437   2.927749
Within                     0.077667   18   0.004315
Total                      0.215867   26
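The Example 2 table can be reproduced with a few lines of arithmetic. The sketch below is an illustrative plain-Python version of a balanced two-way ANOVA with replication; the function names and data layout are assumptions made for this example and are not taken from the spreadsheet. As in the spreadsheet output, each F ratio here is taken against the within-group mean square. P-values and critical values are omitted because they require the F-distribution.

```python
# Protein yield replicates from Example 2, keyed by (time in min, temperature).
yields = {
    (30, 415): [27.13, 27.20, 27.13],
    (30, 435): [27.20, 26.97, 27.13],
    (30, 460): [27.03, 27.10, 27.13],
    (60, 415): [27.29, 27.13, 27.23],
    (60, 435): [27.07, 27.10, 27.03],
    (60, 460): [27.10, 27.07, 27.03],
    (90, 415): [27.03, 27.13, 27.07],
    (90, 435): [27.20, 27.23, 27.27],
    (90, 460): [27.03, 27.07, 26.90],
}


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def two_way_anova(cells):
    """Balanced two-way ANOVA; cells maps (time, temperature) -> replicates."""
    times = sorted({t for t, _ in cells})
    temps = sorted({c for _, c in cells})
    values = [x for reps in cells.values() for x in reps]
    grand = mean(values)

    def main_effect_ss(index, levels):
        # Spread of each level mean about the grand mean, weighted by count.
        total = 0.0
        for level in levels:
            group = [x for key, reps in cells.items()
                     if key[index] == level for x in reps]
            total += len(group) * (mean(group) - grand) ** 2
        return total

    ss_time = main_effect_ss(0, times)
    ss_temp = main_effect_ss(1, temps)
    # Within: spread of the replicates about their own cell mean.
    ss_within = sum((x - mean(reps)) ** 2
                    for reps in cells.values() for x in reps)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_inter = ss_total - ss_time - ss_temp - ss_within

    df_time, df_temp = len(times) - 1, len(temps) - 1
    df_inter = df_time * df_temp
    df_within = len(values) - len(cells)
    ms_within = ss_within / df_within

    table = {}
    for name, ss, df in [("time", ss_time, df_time),
                         ("temperature", ss_temp, df_temp),
                         ("interaction", ss_inter, df_inter)]:
        table[name] = {"SS": ss, "df": df, "MS": ss / df,
                       "F": (ss / df) / ms_within}
    table["within"] = {"SS": ss_within, "df": df_within, "MS": ms_within}
    return table


two_way = two_way_anova(yields)
for source, row in two_way.items():
    print(source, {key: round(value, 6) for key, value in row.items()})
```

Computing the interaction sum of squares by subtraction, as done here, relies on the design being balanced (the same number of replicates in every cell), which is the case in Example 2.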

Two-way ANOVA
In a typical experiment things can be more complex than described previously. For example, in Example 2 the aim is to find out if time and/or temperature have any effect on protein yield when analysing samples of tinned ham. When analysing data from this type of experiment we use two-way ANOVA. Two-way ANOVA can test the significance of each of two experimental variables (factors or treatments) with respect to the response, such as an instrument's output. When replicate measurements are made we can also examine whether or not there are significant interactions between variables. An interaction is said to be present when the response being measured changes more than can be explained from the change in level of an individual factor. This is illustrated in Figure 2 for a process with two factors (Y and Z) when both factors are studied at two levels (low and high). In Figure 2(b), the changes in response caused by Y depend on Z, and vice versa.

In two-way ANOVA we ask the following questions:
• Is there a significant interaction between the two factors (variables)?
• Does a change in either of the factors affect the measured result?
It is important to check the answers in the right order: Figure 3 illustrates the decision process. In the case of Example 2 the questions are:
• Is there an interaction between temperature and time which affects the protein yield?
• Does time and/or temperature affect the protein yield?

Using the built-in functions of a spreadsheet (in this case Excel's data analysis tools: two-factor analysis with replication) we see that there is a significant interaction between time and temperature and a significant effect of temperature alone (both p-value < 0.05 and F > Fcrit). Following the process outlined in Figure 3, we consider the interaction question first by comparing the mean squares (MS) for the within-group variation with the interaction MS. This is reported in the results table of Example 2.
F = 0.021911/0.004315 = 5.078

If the interaction is significant (F > Fcrit), as in this case, then the individual factors (time and temperature) should each be compared with the MS for the interaction (not the within-group MS), thus:

Ftemp = 0.024844/0.021911 = 1.134
Ftime = 0.000433/0.021911 = 0.020
Fcrit = 6.944, for 2 and 4 degrees of freedom (at the 95% confidence level)

In other words, neither of the individual factors is significant when compared with the interaction (both F values are well below Fcrit) and, therefore, the interaction of temperature with time is worth further investigation. If one or both of the individual factors were significant compared with the interaction, then the individual factor or factors would dominate and for all practical purposes any interaction could be ignored.

If the interaction term is not significant then it can be considered to be another small error term and can thus be pooled with the within-group (error) sums of squares term. It is the pooled value ($s^2_{\mathrm{pooled}}$) that is then used as the denominator in the F-test to determine if the individual factors affect the measured results significantly. To combine the sums of squares the following formula is used:

\[ s^2_{\mathrm{pooled}} = \frac{SS_{\mathrm{inter}} + SS_{\mathrm{within}}}{dof_{\mathrm{inter}} + dof_{\mathrm{within}}} \]

where $dof_{\mathrm{inter}}$ and $dof_{\mathrm{within}}$ are the degrees of freedom for the interaction term and error term, and $SS_{\mathrm{inter}}$ and $SS_{\mathrm{within}}$ are the sums of squares for the interaction term and error term, respectively ($dof_{\mathrm{pooled}} = dof_{\mathrm{inter}} + dof_{\mathrm{within}}$).

Note: in Example 2, the spreadsheet (Excel) labels the Source of Variation as Sample, Columns, Interaction and Within. Sample = Time, Columns = Temperature, Interaction is the interaction between temperature and time, and Within is a measure of the within-group variation.
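The decision process described above (and drawn in Figure 3) can be sketched as a small function. This is an illustrative outline only: the function name, its arguments and the return layout are assumptions made for this example, and the critical value for the interaction has to be supplied by the caller (here it is copied from the Example 2 table). The second call uses a deliberately inflated, hypothetical critical value simply to exercise the pooling branch.

```python
def factor_f_ratios(ms_factors, ss_inter, df_inter, ss_within, df_within,
                    f_crit_inter):
    """Choose the denominator for the factor F-tests, following Figure 3.

    Returns (F for the interaction, denominator mean square,
    denominator degrees of freedom, F ratio for each factor).
    """
    ms_inter = ss_inter / df_inter
    ms_within = ss_within / df_within
    f_inter = ms_inter / ms_within

    if f_inter > f_crit_inter:
        # Significant interaction: test each factor against the interaction MS.
        denominator, df_denominator = ms_inter, df_inter
    else:
        # Not significant: pool the interaction with the within-group term.
        df_denominator = df_inter + df_within
        denominator = (ss_inter + ss_within) / df_denominator

    ratios = {name: ms / denominator for name, ms in ms_factors.items()}
    return f_inter, denominator, df_denominator, ratios


factor_ms = {"temperature": 0.024844, "time": 0.000433}

# Example 2: the interaction is significant (F crit = 2.927749 from the table).
f_inter, denom, df_denom, ratios = factor_f_ratios(
    factor_ms, ss_inter=0.087644, df_inter=4,
    ss_within=0.077667, df_within=18, f_crit_inter=2.927749,
)
print(round(f_inter, 3), df_denom, {k: round(v, 3) for k, v in ratios.items()})

# Hypothetical critical value of 10, used only to show the pooled branch.
_, pooled_ms, pooled_df, pooled_ratios = factor_f_ratios(
    factor_ms, ss_inter=0.087644, df_inter=4,
    ss_within=0.077667, df_within=18, f_crit_inter=10.0,
)
print(round(pooled_ms, 6), pooled_df)
```

The factor F ratios returned by the function still need to be compared with the critical value for the appropriate degrees of freedom, for example 2 and 4 when the interaction mean square is the denominator.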

Selecting the ANOVA method
One-way ANOVA should be used when there is only one factor being considered and replicate data from changing the level of that factor are available. Two-way ANOVA (with or without replication) is used when there are two factors being considered. If no replicate data are collected then the interactions between the two factors cannot be calculated. Higher level ANOVAs are also available for looking at more than two factors.

Advantages of ANOVA
Compared with using multiple t-tests, one-way and two-way ANOVA require fewer measurements to discover significant effects (i.e., the tests are said to have more power). This is one reason why ANOVA is used frequently when analysing data from statistically designed experiments. Other ANOVA and multivariate ANOVA (MANOVA) methods exist for more complex experimental situations but a description of these is beyond the scope of this introductory article. More details can be found in reference 6.

Interpretation of the result(s)
To reiterate the interpretation of ANOVA results, a calculated F-value that is greater than Fcrit for a stated level of confidence (typically 95%) means that the difference being tested is statistically significant at that level. As an alternative to using the F-values, the p-value can be used to indicate the degree of confidence we have that there is a significant difference between means (i.e., (1 − p) × 100 is the percentage confidence). Normally a p-value of 0.05 or less is considered to denote a significant difference.

Note: Extrapolation of ANOVA results is not advisable, so in Example 2, for instance, it is impossible to say if a time of 15 or 120 minutes would lead to a measurable effect on protein yield. It is, therefore, always more economic in the long run to design the experiment in advance, in order to cover the likely ranges of the parameter(s) of interest.

Avoiding some of the pitfalls using ANOVA
In ANOVA it is assumed that the data for each variable are normally distributed. Usually in ANOVA we do not have a large amount of data, so it is difficult to prove any departure from normality. It has been shown, however, that even quite large deviations do not affect the decisions made on the basis of the F-test. A more important assumption about ANOVA is that the variance (spread) is the same from group to group, that is, homogeneous (homoscedastic). If this is not the case (this often happens in chemistry, see Figure 1) then the F-test can suggest a statistically significant difference when none is present. The best way to avoid this pitfall is, as ever, to plot the data. There also exist a number of tests for heteroscedasticity (e.g., Bartlett's test (5) and Levene's test (2)). It may be possible to overcome this type of problem in the data structure by transforming it, such as by taking logs (7). If the variability within a group is correlated with its mean value then ANOVA may not be appropriate and/or it may indicate the presence of outliers in the data (Figure 4). Cochran's test (5) can be used to test for variance outliers.

Figure 1 Plot comparing the results from 12 analysts. (Axes: analyte concentration (ppm) against analyst ID, A1 to A12. Legend: mean; total standard deviation.)

Conclusions
ANOVA is a powerful tool for determining if there is a statistically significant difference between two or more sets of data. One-way ANOVA should be used when we are comparing several sets of observations. Two-way ANOVA is the method used when there are two separate factors that may be influencing a result. Except for the smallest of data sets, ANOVA is best carried out using a spreadsheet or statistical software package. You should always plot your data to make sure the assumptions ANOVA is based on are not violated.

Acknowledgements
The preparation of this paper was supported under a contract with the UK Department of Trade and Industry as part of the National Measurement System Valid Analytical Measurement Programme (VAM) (8).

References
(1) S. Burke, Scientific Data Management 1(1), 32–38, September 1997.
(2) G.A. Milliken and D.E. Johnson, Analysis of Messy Data, Volume 1: Designed Experiments, Van Nostrand Reinhold Company, New York, USA (1984).
(3) J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, Ellis Horwood PTR Prentice Hall, London, UK (ISBN 0 13 0309907).
(4) C. Chatfield, Statistics for Technology, Chapman & Hall, London, UK (ISBN 0412 25340 2).
(5) T.J. Farrant, Practical Statistics for the Analytical Scientist, A Bench Guide, Royal Society of Chemistry, London, UK (ISBN 0 85404 442 6) (1997).
(6) K.V. Mardia, J.T. Kent and J.M. Bibby, Multivariate Analysis, Academic Press Inc. (ISBN 0 12 471252 5) (1979).
(7) ISO 4259: 1992. Petroleum Products – Determination and Application of Precision Data in Relation to Methods of Test. Annex E, International Organisation for Standardisation, Geneva, Switzerland (1992).
(8) M. Sargent, VAM Bulletin, Issue 13, 4–5, Laboratory of the Government Chemist (Autumn 1995).

Figure 2 Interactive factors. Panels: (a) Y and Z are independent; (b) Y and Z are interacting. (Each panel shows the response at Y low and Y high for Z low and Z high.)

Start
1. Compare within-group mean squares with interaction mean squares.
2. Significant difference? (F > Fcrit)
   Yes: compare interaction mean squares with individual factor mean squares.
   No: pool the within-group and interaction sums of squares, then compare pooled mean squares with individual factor mean squares.

Figure 3 Comparing mean squares in two-way ANOVA with replication.

Figure 4 A plot of variance versus the mean. (Axes: variance against mean value. Annotations: "Unreliable high mean (may contain outliers)"; "Significantly different means by ANOVA".)

Shaun Burke currently works in the Food Technology Department of RHM Technology Ltd, High Wycombe, Buckinghamshire, UK. However, these articles were produced while he was working at LGC, Teddington, Middlesex, UK (http://www.lgc.co.uk).