
ANOVA: ANalysis Of VAriance between groups

You might guess that the size of maple leaves depends on the location of the trees. For example, maple leaves under the shade of tall oaks may be smaller than the maple leaves from trees in the prairie, and maple leaves from trees in the median strips of parking lots may be smaller still. To test this hypothesis you collect several (say 7) groups of 10 maple leaves from different locations. Group A is from under the shade of tall oaks; group B is from the prairie; group C from median strips of parking lots, etc. Most likely you would find that the groups are broadly similar; for example, the range between the smallest and the largest leaves of group A probably includes a large fraction of the leaves in each group. Of course, in detail each group is probably different: it has slightly different highs and lows, and hence it is likely that each group has a different average (mean) size. Can we take this difference in average size as evidence that the groups in fact are different (and perhaps that location causes that difference)?

Note that even if there is no "real" effect of location on leaf size (the null hypothesis), the groups are still likely to have different average leaf sizes. The likely range of variation of the averages if our location-effect hypothesis is wrong, and the null hypothesis is correct, is given by the standard deviation of the estimated means: σ/√N, where σ is the standard deviation of the size of all the leaves and N (10 in our example) is the number of leaves in a group. Thus if we treat the collection of the 7 group means as data, find the standard deviation of those means, and find that it is "significantly" larger than the above, we have evidence that the null hypothesis is not correct and that instead location has an effect. That is, if some (or several) groups' average leaf sizes are "unusually" large or small, it is unlikely to be just "chance". The comparison between the actual variation of the group averages and that expected from the above formula is expressed in terms of the F ratio:

F = (found variation of the group averages) / (expected variation of the group averages)

Thus if the null hypothesis is correct we expect F to be about 1, whereas a "large" F indicates a location effect. How big should F be before we reject the null hypothesis? P reports the significance level.

In terms of the details of the ANOVA test, note that the number of degrees of freedom ("d.f.") for the numerator (found variation of group averages) is one less than the number of groups (6); the number of degrees of freedom for the denominator (the so-called "error", or within-group, or expected variation) is the total number of leaves minus the total number of groups (70 - 7 = 63). The F ratio can be computed as the ratio of the mean sum of squared deviations of each group's mean from the overall mean [weighted by the size of the group] (the "Mean Square" for "between") to the mean sum of the squared deviations of each item from that item's group mean (the "Mean Square" for "error"). In the previous sentence, "mean" means dividing the total "Sum of Squares" by the number of degrees of freedom.

Why not just use the t-test? The t-test tells us whether the variation between two groups is "significant". Why not just do t-tests for all the pairs of locations, thus finding, for example, that leaves from median strips are significantly smaller than leaves from the prairie, whereas shade/prairie and shade/median strips are not significantly different? Multiple t-tests are not the answer, because as the number of groups grows, the number of needed pair comparisons grows quickly.
For 7 groups there are 21 pairs. If we test 21 pairs we should not be surprised to observe things that happen only 5% of the time: even if the tests were independent, the chance of at least one false positive at the 5% level would be about 1 - 0.95^21, roughly 66%. Thus in 21 pairings, a P = .05 for one pair cannot be considered significant. ANOVA puts all the data into one number (F) and gives us one P for the null hypothesis.

One way analysis of variance

Menu location: Analysis_Analysis of Variance_One Way.

This function compares the sample means for k groups. There is an overall test for k means, multiple comparison methods for pairs of means, and tests for the equality of the variances of the groups.

Consider four groups of data that represent one experiment performed on four occasions with ten different subjects each time. You could explore the consistency of the experimental conditions or the inherent error of the experiment by using one way analysis of variance (ANOVA); however, agreement analysis might be more appropriate. One way ANOVA is more appropriate for finding statistical evidence of inconsistency or difference across the means of the four groups. One way ANOVA assumes that each group comes from an approximately normal distribution and that the variability within the groups is roughly constant. The factors are arranged so that experiments are columns and subjects are rows; this is how you must enter your data in the StatsDirect workbook. The overall F test is fairly robust to small deviations from these assumptions, but you could use the Kruskal-Wallis test as an alternative to one way ANOVA if there is any doubt.

Numerically, one way ANOVA is a generalisation of the two sample t test. The F statistic compares the variability between the groups to the variability within the groups:
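A standard set of computing formulas for this ratio, written here in LaTeX and consistent with the definitions that follow (a textbook sketch rather than StatsDirect's exact on-screen presentation), is:

    \[ F = \frac{MST}{MSE}, \qquad
       MST = \frac{\sum_{i=1}^{k} T_i^2/n_i \; - \; G^2/n}{k-1}, \qquad
       MSE = \frac{\sum_{i,j} Y_{ij}^2 \; - \; \sum_{i=1}^{k} T_i^2/n_i}{n-k} \]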

- where F is the variance ratio for the overall test, MST is the mean square due to treatments/groups (between groups), MSE is the mean square due to error (within groups, residual mean square), Yij is an observation, Ti is a group total, G is the grand total of all observations, ni is the number in group i and n is the total number of observations.

Assumptions:

- random samples
- normally distributed observations in each population
- equal variance of observations in each population

- the homogeneity of variance option (marked as "Equality of variance tests (Levene, Bartlett)" in the ANOVA results window) can be used to test the variance assumption. The Shapiro-Wilk test can be used to look for evidence of non-normality. The most commonly unacceptable deviation from the assumptions is inequality of variance when the groups are of unequal sizes.

A significant overall test indicates a difference between the population means for the groups as a whole; you may then go on to make multiple comparisons between the groups, but this "dredging" should be avoided if possible. If the groups in this example had been a series of treatments/exposures to which subjects were randomly allocated, then a two way randomized block design ANOVA should have been used.

Example

From Armitage and Berry (1994, p. 214). Test workbook (ANOVA worksheet: Expt 1, Expt 2, Expt 3, Expt 4).

The following data represent the numbers of worms isolated from the GI tracts of four groups of rats in a trial of carbon tetrachloride as an anthelminthic. These four groups were the control (untreated) groups.

Expt 1: 279 338 334 198 303
Expt 2: 378 275 412 265 286
Expt 3: 172 335 335 282 250
Expt 4: 381 346 340 471 318

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select One Way from the Analysis of Variance section of the analysis menu. Select the columns marked "Expt 1", "Expt 2", "Expt 3" and "Expt 4" in one action when prompted for data.

For this example:

One way analysis of variance

Variables: Expt 1, Expt 2, Expt 3, Expt 4

Source of Variation    Sum Squares    DF    Mean Square
Between Groups         27234.2        3     9078.066667
Within Groups          63953.6        16    3997.1
Corrected Total        91187.8        19

F (variance ratio) = 2.271163 P = .1195
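As an independent check (not part of StatsDirect; SciPy assumed), a minimal Python sketch reproduces the overall test and the assumption checks mentioned above:

    # One way ANOVA on the worm-count data.
    from scipy import stats

    expt1 = [279, 338, 334, 198, 303]
    expt2 = [378, 275, 412, 265, 286]
    expt3 = [172, 335, 335, 282, 250]
    expt4 = [381, 346, 340, 471, 318]

    f, p = stats.f_oneway(expt1, expt2, expt3, expt4)
    print(f, p)                                      # expected: F ≈ 2.2712, P ≈ 0.1195

    print(stats.levene(expt1, expt2, expt3, expt4))  # equality of variances
    for group in (expt1, expt2, expt3, expt4):
        print(stats.shapiro(group))                  # per-group normality check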

The null hypothesis that there is no difference in mean worm counts across the four groups is therefore not rejected. If we had rejected this null hypothesis then we would have had to take a close look at the experimental conditions to make sure that all control groups were exposed to the same conditions.

Technical validation

The American National Institute of Standards and Technology provides Statistical Reference Datasets for testing statistical software (McCullough and Wilson, 1999; http://www.nist.gov/itl/div898/strd). The results below for the SiRstv data set are given to 12 decimal places; StatsDirect provides 15 decimal places of accuracy internally.

One way analysis of variance

Variables: Instrument 1, Instrument 2, Instrument 3, Instrument 4, Instrument 5

Source of Variation    Sum Squares     DF    Mean Square
Between Groups         0.0511462616    4     0.0127865654
Within Groups          0.21663656      20    0.010831828
Corrected Total        0.2677828216    24

F (variance ratio) = 1.180462374402 P = .3494

Two way analysis of variance

Menu location: Analysis_Analysis of Variance_Two Way.

This function calculates ANOVA for a two way randomized block experiment. There are overall tests for differences between treatment means and between block means. Multiple comparison methods are provided for pairs of treatment means.

Consider data classified by two factors such that each level of one factor can be combined with all levels of the other factor: a table of observations Yij with treatments (i, 1 to k) as columns and blocks (j, 1 to b) as rows.

In the example below there is a study of different treatments on clotting times. Response/outcome variable Y is the observed clotting time for blood samples. Blocks are individuals who donated a blood sample. Treatments are different methods by which portions of each of the blood samples are processed. Unlike one way ANOVA, the F tests for two way ANOVA are the same if either or both block and treatment factors are considered fixed or random:
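In the notation defined below, a standard form of these F ratios (a textbook sketch, not necessarily StatsDirect's exact presentation) is:

    \[ F_{treatments} = \frac{MST}{MSE}, \qquad F_{blocks} = \frac{MSB}{MSE} \]

    \[ MST = \frac{b \sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2}{k-1}, \quad
       MSB = \frac{k \sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2}{b-1}, \quad
       MSE = \frac{\sum_{i,j} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2}{(k-1)(b-1)} \]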

- where F is the variance ratio for tests of equality of treatment and block means, MST is the mean square due to treatments/groups (between groups), MSB is the mean square due to blocks (between blocks), MSE is the mean square due to error (within groups, residual mean square), Yij is an observation, Y bar i. is a treatment group mean, Y bar .j is a block mean and Y bar .. is the grand mean of all observations.

If you wish to use a two way ANOVA but your data are clearly non-normal then you should consider using the Friedman test, a nonparametric alternative.

Please note that many statistical software packages and texts present multiple comparison methods for treatment group means only in the context of one way ANOVA. StatsDirect extends this to two way ANOVA by using the treatment group mean square from two way ANOVA for multiple comparisons. Treatment effects must be fixed for this use of multiple comparisons to be valid. See Hsu (1996) for further discussion.

Example

From Armitage and Berry (1994, p. 241). Test workbook (ANOVA worksheet: Treatment 1, Treatment 2, Treatment 3, Treatment 4).

The following data represent clotting times (mins) of plasma from eight subjects treated in four different ways. The eight subjects (blocks) were allocated at random to each of the four treatment groups.

Treatment 1: 8.4 12.8 9.6 9.8 8.4 8.6 8.9 7.9
Treatment 2: 9.4 15.2 9.1 8.8 8.2 9.9 9.0 8.1
Treatment 3: 9.8 12.9 11.2 9.9 8.5 9.8 9.2 8.2
Treatment 4: 12.2 14.4 9.8 12.0 8.5 10.9 10.4 10.0

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Two Way from the Analysis of Variance section of the analysis menu. Select the columns marked "Treatment 1", "Treatment 2", "Treatment 3" and "Treatment 4" in one action when prompted for data.

For this example:

Two way randomized block analysis of variance

Variables: Treatment 1, Treatment 2, Treatment 3, Treatment 4

Source of Variation            Sum Squares    DF    Mean Square
Between blocks (rows)          78.98875       7     11.284107
Between treatments (columns)   13.01625       3     4.33875
Residual (error)               13.77375       21    0.655893
Corrected total                105.77875      31

F (VR between blocks) = 17.204193 P < .0001
F (VR between treatments) = 6.615029 P = .0025

Here we can see that there was a statistically highly significant difference between mean clotting times across the treatment groups. The difference between subjects is of no particular interest here.
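To reproduce this analysis outside StatsDirect, a minimal Python sketch (pandas and statsmodels assumed; not part of StatsDirect) is:

    # Two way randomized block ANOVA on the clotting-time data.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    times = {
        1: [8.4, 12.8, 9.6, 9.8, 8.4, 8.6, 8.9, 7.9],
        2: [9.4, 15.2, 9.1, 8.8, 8.2, 9.9, 9.0, 8.1],
        3: [9.8, 12.9, 11.2, 9.9, 8.5, 9.8, 9.2, 8.2],
        4: [12.2, 14.4, 9.8, 12.0, 8.5, 10.9, 10.4, 10.0],
    }
    rows = [{"treatment": t, "subject": s + 1, "time": y}
            for t, values in times.items() for s, y in enumerate(values)]
    df = pd.DataFrame(rows)

    # Treatments and blocks (subjects) both enter as categorical factors.
    model = smf.ols("time ~ C(treatment) + C(subject)", data=df).fit()
    print(sm.stats.anova_lm(model))   # expected: F(treatment) ≈ 6.62, F(subject) ≈ 17.20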

P values

The P value or calculated probability is the estimated probability of obtaining the observed result, or one more extreme, if the null hypothesis (H0) of the study question is true.

The null hypothesis is usually an hypothesis of "no difference", e.g. no difference between blood pressures in group A and group B. Define a null hypothesis for each study question clearly before the start of your study. The only situation in which you should use a one sided P value is when a large change in an unexpected direction would have absolutely no relevance to your study. This situation is unusual; if you are in any doubt then use a two sided P value.

The term significance level (alpha) is used to refer to a pre-chosen probability and the term "P value" is used to indicate a probability that you calculate after a given study. The alternative hypothesis (H1) is the opposite of the null hypothesis; in plain language terms this is usually the hypothesis you set out to investigate. For example, the question is "is there a significant (not due to chance) difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill?" and the alternative hypothesis is "there is a difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill".

If your P value is less than the chosen significance level then you reject the null hypothesis, i.e. accept that your sample gives reasonable evidence to support the alternative hypothesis. It does NOT imply a "meaningful" or "important" difference; that is for you to decide when considering the real-world relevance of your result.

The choice of significance level at which you reject H0 is arbitrary. Conventionally the 5% (less than 1 in 20 chance of being wrong), 1% and 0.1% (P < 0.05, 0.01 and 0.001) levels have been used. These numbers can give a false sense of security. In the ideal world, we would be able to define a "perfectly" random sample, the most appropriate test and one definitive conclusion. We simply cannot. What we can do is try to optimise all stages of our research to minimise sources of uncertainty.

When presenting P values some groups find it helpful to use the asterisk rating system as well as quoting the P value:

P < 0.05   *
P < 0.01   **
P < 0.001  ***

Most authors refer to statistically significant as P < 0.05 and statistically highly significant as P < 0.001 (less than one in a thousand chance of being wrong). The asterisk system avoids the woolly term "significant". Please note, however, that many statisticians do not like the asterisk rating system when it is used without showing P values. As a rule of thumb, if you can quote an exact P value then do. You might also want to refer to a quoted exact P value as an asterisk in text narrative or tables of contrasts elsewhere in a report.

At this point, a word about error. Type I error is the false rejection of the null hypothesis and type II error is the false acceptance of the null hypothesis. As an aide-mémoire: think that our cynical society rejects before it accepts. The significance level (alpha) is the probability of type I error. The power of a test is one minus the probability of type II error (beta). Power should be maximised when selecting statistical methods. If you want to estimate sample sizes then you must understand all of the terms mentioned here.

The following table shows the relationship between power and error in hypothesis testing:

                TRUTH: H0 is true                         TRUTH: H0 is false
DECISION
Accept H0       correct decision, P = 1 - alpha           type II error, P = beta
Reject H0       type I error, P = alpha (significance)    correct decision, P = 1 - beta (power)

H0 = null hypothesis; P = probability

If you are interested in further details of probability and sampling theory at this point then please refer to one of the general texts listed in the reference section. You must understand confidence intervals if you intend to quote P values in reports and papers. Statistical referees of scientific journals expect authors to quote confidence intervals with greater prominence than P values.

Notes about Type I error:

- is the incorrect rejection of the null hypothesis
- maximum probability is set in advance as alpha
- is not affected by sample size as it is set in advance
- increases with the number of tests or end points (i.e. do 20 tests and 1 is likely to be wrongly significant)

Notes about Type II error:

- is the incorrect acceptance of the null hypothesis
- probability is beta
- beta depends upon sample size and alpha
- can't be estimated except as a function of the true population effect
- beta gets smaller as the sample size gets larger
- beta gets smaller as the number of tests or end points increases
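These properties can be illustrated with a short simulation (a sketch, not part of the original text; NumPy and SciPy assumed): under a true null hypothesis the rejection rate stays near alpha whatever the sample size, while the power (1 - beta) grows as the sample size grows.

    # Monte Carlo illustration of Type I error and power for a two sample t test.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, sims = 0.05, 5000

    def rejection_rate(n, effect):
        """Proportion of simulated studies in which H0 is rejected at level alpha."""
        hits = 0
        for _ in range(sims):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(effect, 1.0, n)          # effect = 0.0 means H0 is true
            if stats.ttest_ind(a, b).pvalue < alpha:
                hits += 1
        return hits / sims

    for n in (10, 40):
        print(n,
              rejection_rate(n, 0.0),   # Type I error rate: stays near alpha
              rejection_rate(n, 0.5))   # power: increases with n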

Kruskal-Wallis test

Menu location: Analysis_Analysis of Variance_Kruskal-Wallis.

This is a method for comparing several independent random samples and can be used as a non-parametric alternative to the one way ANOVA. The Kruskal-Wallis test statistic for k samples, each of size ni, is:
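In the absence of tied ranks, the usual form of this statistic (a sketch consistent with the definitions below; StatsDirect also reports a version adjusted for ties) is:

    \[ T = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} \; - \; 3(N+1) \]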

- where N is the total number of observations (the sum of all the ni) and Ri is the sum of the ranks (from all samples pooled) for the ith sample.

The null hypothesis of the test is that all k distribution functions are equal. The alternative hypothesis is that at least one of the populations tends to yield larger values than at least one of the other populations. Assumptions:

- random samples from populations
- independence within each sample
- mutual independence among samples
- measurement scale is at least ordinal
- either the k population distribution functions are identical, or else some of the populations tend to yield larger values than other populations

If the test is significant, you can make multiple comparisons between the samples. You may choose the level of significance for these comparisons (default is alpha = 0.05). All pairwise comparisons are made and the probability of each presumed "non-difference" is indicated (Conover, 1999; Critchlow and Fligner, 1991; Hollander and Wolfe, 1999). Two alternative methods are used to make all possible pairwise comparisons between groups; these are Dwass-Steel-Critchlow-Fligner and Conover-Inman. In most situations, you should use the Dwass-Steel-Critchlow-Fligner result. By the Dwass-Steel-Critchlow-Fligner procedure, a contrast is considered significant if the following inequality is satisfied:

- where q is a quantile from the normal range distribution for k groups, ni is the size of the ith group, nj is the size of the jth group, tb is the number of ties at rank b and Wij is the sum of the ranks for the ith group where observations for both groups have been ranked together. The values either side of the greater than sign are displayed in parentheses in StatsDirect results.

The Conover-Inman procedure is simply Fisher's least significant difference method performed on ranks. A contrast is considered significant if the following inequality is satisfied:

- where t is a quantile from the Student t distribution on N-k degrees of freedom. The values either side of the greater than sign are displayed in parentheses in StatsDirect results.

An alternative to Kruskal-Wallis is to perform a one way ANOVA on the ranks of the observations. StatsDirect also gives you a homogeneity of variance test option with Kruskal-Wallis; this is marked as "Equality of variance (squared ranks)". Please refer to homogeneity of variance for more details.

Technical Validation

The test statistic is an extension of the Mann-Whitney test and is calculated as above. In the presence of tied ranks the test statistic is given in adjusted and unadjusted forms (opinion varies concerning the handling of ties). The test statistic follows approximately a chi-square distribution with k-1 degrees of freedom; P values are derived from this. For small samples you may wish to refer to tables of the Kruskal-Wallis test statistic, but the chi-square approximation is highly satisfactory in most cases (Conover, 1999).

Example

From Conover (1999, p. 291). Test workbook (ANOVA worksheet: Method 1, Method 2, Method 3, Method 4).

The following data represent corn yields per acre from four different fields where different farming methods were used.

Method 1: 83 91 94 89 89 96 91 92 90
Method 2: 91 90 81 83 84 83 88 91 89 84
Method 3: 101 100 91 93 96 95 94
Method 4: 78 82 81 77 79 81 80 81

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Kruskal-Wallis from the Non-parametric section of the analysis menu. Then select the columns marked "Method 1", "Method 2", "Method 3" and "Method 4" in one selection action.

For this example:

Adjusted for ties: T = 25.62883, P < 0.0001

All pairwise comparisons (Dwass-Steel-Critchlow-Fligner)
Method 1 and Method 2, P = 0.1529
Method 1 and Method 3, P = 0.0782
Method 1 and Method 4, P = 0.0029
Method 2 and Method 3, P = 0.0048
Method 2 and Method 4, P = 0.0044
Method 3 and Method 4, P = 0.0063

All pairwise comparisons (Conover-Inman)
Method 1 and Method 2, P = 0.0078
Method 1 and Method 3, P = 0.0044
Method 1 and Method 4, P < 0.0001
Method 2 and Method 3, P < 0.0001
Method 2 and Method 4, P = 0.0001
Method 3 and Method 4, P < 0.0001

From the overall T we see a statistically highly significant tendency for at least one group to give higher values than at least one of the others. Subsequent contrasts show a significant separation of all groups with the Conover-Inman method, and of all but method 1 vs. methods 2 and 3 with the Dwass-Steel-Critchlow-Fligner method. In most situations, it is best to use only the Dwass-Steel-Critchlow-Fligner result.
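The overall statistic can be checked outside StatsDirect with a minimal Python sketch (SciPy assumed; scipy.stats.kruskal applies a tie correction by default). The same snippet also shows the rank-transform alternative mentioned earlier, a one way ANOVA on the pooled ranks:

    # Kruskal-Wallis test on the corn-yield data.
    from scipy import stats

    method1 = [83, 91, 94, 89, 89, 96, 91, 92, 90]
    method2 = [91, 90, 81, 83, 84, 83, 88, 91, 89, 84]
    method3 = [101, 100, 91, 93, 96, 95, 94]
    method4 = [78, 82, 81, 77, 79, 81, 80, 81]

    t, p = stats.kruskal(method1, method2, method3, method4)
    print(t, p)   # expected: T ≈ 25.63 (adjusted for ties), P < 0.0001

    # Alternative mentioned above: one way ANOVA performed on the pooled ranks.
    ranks = stats.rankdata(method1 + method2 + method3 + method4)
    n1, n2, n3 = len(method1), len(method2), len(method3)
    r1 = ranks[:n1]
    r2 = ranks[n1:n1 + n2]
    r3 = ranks[n1 + n2:n1 + n2 + n3]
    r4 = ranks[n1 + n2 + n3:]
    print(stats.f_oneway(r1, r2, r3, r4))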

ANOVA - Comparing More Than Two Group Means

As we've seen, t tests compare the means of two independent groups or two related sets of scores. Many research designs involve more than just two groups. One approach for comparing more than two means is to test each pair of means with a t test. Not only does this approach involve many tests, but it also compounds the Type I error rate. For example, if we have three groups, Morning, Afternoon, and Evening, whose scores we want to compare, we could compare 1) Morning with Afternoon, 2) Morning with Evening, and 3) Afternoon with Evening. This approach would involve three separate tests, and it would result in an overall error rate of roughly three times that of a single test. Instead of this approach, statisticians use an overall test, called an ANOVA.

ANOVA stands for ANalysis Of VAriance and is described as an omnibus test because it tests all differences between the separate groups at once. The variances being analyzed are based on the deviations from the mean scores, which we used to calculate the standard deviation (itself a measure of variation, of course). Review the calculation for the variance and standard deviation before proceeding; understanding that calculation will help in understanding the following technique.

There are many types of ANOVAs, which depend on the specifics of the research design. The first type we will consider is called simple or oneway ANOVA. A oneway ANOVA is used to compare the means of three or more groups. Think of it as an extension of an independent t test, or think of the independent t test as a special case of an ANOVA for two groups.

The heart of an ANOVA test is something called the sum of squares. This is where recalling the calculation of the variance and standard deviation is helpful. Remember that we determined deviations from the mean for each score in a distribution. This process resulted in positive and negative deviations, which, due to the property of the mean, always sum to 0. In order to calculate an average deviation, we squared and then summed the deviations. This is the sum of squares (SS) - think of it as the sum of squared deviations to remember where the squares originated. In this section, we will be considering three different types of sums of squares.

The total sum of squares represents the total amount of variation in the combined distribution of all raw scores. The between groups sum of squares represents the variation that is related to group membership - that is, the variance that is explained by the characteristics that define the separate groups.

The within groups sum of squares represents the variation within the separate groups, which is unexplained variance and is also called error. Explained variance plus unexplained variance equals total variance; or, stated another way, the between groups sum of squares plus the within groups sum of squares equals the total sum of squares. In a t test, the between groups sum of squares is represented by the difference between the two group means and the within groups sum of squares is represented by the variances of the two groups.

The test statistic for an ANOVA is called the F ratio, which compares the between groups sum of squares with the within groups sum of squares. The between groups sum of squares is divided by the degrees of freedom associated with the number of group means (e.g., if there are 3 groups, then there are 2 degrees of freedom) to obtain the mean sum of squares between groups. Likewise, the within groups sum of squares is divided by the degrees of freedom associated with the group sample sizes (n). For example, when comparing three groups of 10 scores, there are 27 degrees of freedom, which is 9 degrees of freedom from each group. Dividing the within groups sum of squares by its associated degrees of freedom gives the mean sum of squares within groups. The F ratio is the mean between groups sum of squares divided by the mean within groups sum of squares. This test statistic is used to determine statistical significance of the result. SPSS and Excel report the associated p-value directly; Table B.3 in the text lists some example F values for specific levels and degrees of freedom. Notice that there are two degrees of freedom parameters in the table - one for the numerator of the F ratio and another for the denominator.

An effect size for an ANOVA is called eta squared, η², and is calculated by dividing the between groups sum of squares by the total sum of squares to obtain the percentage of variance that is explained by group membership. Similar to the coefficient of determination, the amount of unexplained variance is 1 - the amount of explained variance.

The assumptions for an ANOVA are extensions of the assumptions for a t test, namely 1) independent observations, 2) normally distributed data, and 3) homogeneity of variance among all groups. There is a form of Levene's test to assess whether the variances are similar.

Let's step through an example. Suppose that you are teaching three sections of the same course, designated Morning, Afternoon, and Evening. For the sake of this example, each section has 10 students. You collected data from your students using an instrument that measures their level of activity in the class. Here are the data:

Section    Morning    Afternoon    Evening
           7          4            8
           6          5            7
           8          6            8
           6          5            8
           7          4            6
           6          5            7
           8          6            6
           6          5            9
           5          5            7
           7          4            8
Sum        66         49           74
Mean       6.60       4.90         7.40
SD         0.97       0.74         0.97

1. Set up the hypothesis. The null hypothesis is that the population means are equal. Symbolically, H0: μMorning = μAfternoon = μEvening. The alternative, research, hypothesis is never directional for an ANOVA. In this case, the alternative hypothesis is that the three means are not all equal. Symbolically, H1: μMorning ≠ μAfternoon ≠ μEvening. As we will see, this form of the alternative hypothesis will not identify the source of specific differences if they are found to exist.

2. Establish the alpha level at α = .05, which is always two-tailed for an ANOVA, based on the alternative hypothesis.

3. Select the appropriate test statistic (see the decision tree on page 166 or https://usffiles.usfca.edu/FacStaff/baab/www/lessons/DecisionTree.html). The appropriate test to use is a oneway ANOVA. The test statistic will be F.

4. Check the test's assumptions and then compute the test statistic based on the sample data (obtained value). Independence is a result of the data collection process. Normality is checked by inspecting the histograms and skewness ratios. Homogeneity of variances is checked with Levene's test (see the SPSS output below). The test statistic is calculated and reported by SPSS or Excel. This is how the F ratio is determined:

First, a grand mean is calculated. All 30 scores shown above are added and the sum is divided by 30 to obtain a grand mean of 6.30.

The between groups sum of squares is the sum of the squared differences of each group mean from the grand mean, multiplied by the sample size. [Note: ^2 represents squaring and * represents multiplication]

Between groups sum of squares = 10 * [(6.60-6.30)^2 + (4.90-6.30)^2 + (7.40-6.30)^2], which equals 10 * [0.09 + 1.96 + 1.21] or 32.60

The within groups sum of squares is the sum of each score's squared deviation from its group mean. In other words, we subtract 6.60, 4.90, or 7.40 from each score listed above, square that result, and then add up all of the squares. Here are the squared deviations from the group means:

Section    Morning    Afternoon    Evening
           0.16       0.81         0.36
           0.36       0.01         0.16
           1.96       1.21         0.36
           0.36       0.01         0.36
           0.16       0.81         1.96
           0.36       0.01         0.16
           1.96       1.21         1.96
           0.36       0.01         2.56
           2.56       0.01         0.16
           0.16       0.81         0.36
Sum        8.40       4.90         8.40

Within groups sum of squares = 8.40 + 4.90 + 8.40, which equals 21.70

Next, the two mean sums of squares are calculated by dividing by the degrees of freedom. Because there are three groups, the between groups degrees of freedom is 3-1 or 2. Because each group has 10 observations, the within groups degrees of freedom is 3 * (10-1) or 3*9, which is 27.

The mean between groups sum of squares is 32.60/2 or 16.30
The mean within groups sum of squares is 21.70/27 or .803704
The F ratio is 16.30/.8037 or 20.28

5. Skip to #8. Determine the critical value for the test statistic.

6. Skip to #8. Compare the obtained value with the critical value.

7. Skip to #8. Either reject or retain the null hypothesis based on the following: If obtained value > critical value, then reject the null hypothesis - evidence supports the research hypothesis. If obtained value <= critical value, then retain the null hypothesis - evidence does not support the research hypothesis.

8. Alternative to #5-7, for use with SPSS output: Compare the reported p-value (Sig.) with the preset alpha level. If p-value < alpha level, then reject the null hypothesis - evidence supports the research hypothesis. There is a small chance of committing a Type I error. If p-value >= alpha level, then retain the null hypothesis - evidence does not support the research hypothesis. The chance of committing a Type I error is too large.

In the first table of the SPSS output below, find the group means and the grand mean described above. The second table contains the results of Levene's test, which assesses the equality of the group variances. The third table contains the ANOVA results, which include the three sums of squares, the two mean sums of squares, the F ratio, and the observed p-value. Because the p-value is less than .05, we reject the null hypothesis and conclude that the three group means are different - but how are they different? For this we need to conduct another test, called pairwise comparisons. There are numerous versions of this test. We'll use the one that the author recommends, the Bonferroni test. But before that, look over the Excel results, generated using the ANOVA: Single Factor option on the Data Analysis dialog box, which is accessed from the Data Analysis command on the Tools menu (see the earlier note about installing these tools).
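As an aside, the same F ratio can be verified programmatically; a short Python sketch (SciPy assumed; not part of the course materials) is:

    # One way ANOVA on the three course sections.
    from scipy import stats

    morning   = [7, 6, 8, 6, 7, 6, 8, 6, 5, 7]
    afternoon = [4, 5, 6, 5, 4, 5, 6, 5, 5, 4]
    evening   = [8, 7, 8, 8, 6, 7, 6, 9, 7, 8]

    f, p = stats.f_oneway(morning, afternoon, evening)
    print(f, p)   # expected: F ≈ 20.28 on (2, 27) degrees of freedom, p < .05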

Here are the Excel ANOVA results:

Now for the pairwise comparisons. The Bonferroni test identifies the specific source of the differences found by the overall ANOVA test.
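SPSS produces these comparisons directly. A rough Python equivalent (pairwise t tests with a Bonferroni correction; SciPy and statsmodels assumed, and this approximates rather than exactly matches SPSS's procedure, which uses the pooled error term) might look like:

    # Bonferroni-corrected pairwise t tests between the three sections.
    from itertools import combinations
    from scipy import stats
    from statsmodels.stats.multitest import multipletests

    groups = {
        "Morning":   [7, 6, 8, 6, 7, 6, 8, 6, 5, 7],
        "Afternoon": [4, 5, 6, 5, 4, 5, 6, 5, 5, 4],
        "Evening":   [8, 7, 8, 8, 6, 7, 6, 9, 7, 8],
    }
    pairs = list(combinations(groups, 2))
    pvals = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
    for (a, b), p_adj, significant in zip(pairs, adjusted, reject):
        print(a, "vs", b, round(p_adj, 4), significant)
    # Expected pattern: Afternoon differs from Morning and from Evening;
    # Morning vs Evening is not statistically significant.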

This test reports that the mean score of the Afternoon section (4.9) is different from both the Morning (6.6) and Evening (7.4) sections, but that the difference between Morning and Evening sections' scores is not statistically significant. Here is a picture of the pattern of mean scores.

The effect size is computed from the ANOVA table as the percentage of explained variance, denoted by η², which is 32.6/54.3 or about 60.0%, representing a large effect. The determination of the relative strength of an effect depends on the field of study - general guidelines should be weighed against findings of other researchers.

Factorial ANOVA - Comparing Group Means Based on Combinations of Independent Variables

We'll take the application of ANOVA just one step further. Suppose that you not only want to compare sections of a course but you also want to compare the level of activity of majors and non-majors in the course. Now, instead of just three groups, you have six groups - three sections and two types of students within each section. Again, to make the example more straightforward, we'll assume equal numbers of majors and non-majors within the sections. That is, there are exactly five majors and five non-majors in each of the three sections. This uniformity is not a requirement, but it does make the results easier to understand.

Before conducting the comparison of the six group means, let's introduce a few new terms. First of all, this type of ANOVA is called a factorial ANOVA, and in particular for the example just described, a two factor or two-way ANOVA. A factor is an independent variable that is used to categorize the observations. When we compare the means, there are two types of effects that we can observe. They are called main effects and interaction effects. Main effects are due to the factors themselves. In this example, there is a main effect due to Section and another main effect due to Type of Student. Interaction effects are due to the combination of the two factors. For example, if we found that majors are more actively learning in the morning section and non-majors are more actively learning in the afternoon section, there would be an interaction effect. Interactions are designated by a combination of factors, such as Section * Type.

In the ANOVA table, the result of the factorial structure is a separation of the between groups sum of squares. Because there are more categories of students, we have more ways to determine where the differences might arise. In the following example, which uses the same data that we used earlier, you'll notice that the overall sum of squares is the same. The various F ratios and effect sizes are calculated in the same manner as they were in the oneway ANOVA. The assumptions are the same as for the oneway ANOVA as well - there are just more subgroups to check. Here are the data seen earlier, but now divided by Type of Student as well.
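For readers working in Python rather than Excel or SPSS, a two-factor ANOVA of this kind could be specified roughly as follows (a sketch only: pandas and statsmodels assumed, the file name and the column names Score, Section and Type are illustrative placeholders, and the data are expected in long format with one row per student):

    # Sketch of a two-factor (Section x Type of Student) ANOVA with statsmodels.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("scores.csv")   # placeholder file; assumed columns: Score, Section, Type

    # The * in the formula requests both main effects and the Section:Type interaction.
    model = smf.ols("Score ~ C(Section) * C(Type)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))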

Here is the Excel output from the ANOVA: Two-Factor with Replication command. Note that when using this command, the column and row containing the headings must be included in the range. The values in the Total column were reported in the previous oneway ANOVA.

Here is the ANOVA table with the sum of squares for the two main effects and for the interaction.

Notice that the differences due to Section are statistically significant, which is what we found in the oneway ANOVA, and that the differences due to Type of Student are statistically significant, but the interaction effect is not statistically significant. Try computing the effect sizes for the statistically significant effects. What percentage of variance is left unexplained?

Here is the same analysis from SPSS. First, here are the descriptive statistics about the groups as well as Levene's test for the equality of variances.

Then here are the results of the ANOVA. Notice that SPSS includes additional information in the table.

For the purposes of our example, we just need to focus on the rows labeled Section, Type, Section * Type, Error, and Corrected Total. Compare these results to the Excel output displayed above. Here is a picture that illustrates the pattern of means.

When the two lines do not cross each other, there is no interaction effect. In this graph, we can see that no matter which section they were in, the mean scores of Majors exceeded those of Non-Majors. This pattern represents a main effect. Also, by estimating the midpoints between the two mean scores for each section, we can see that the Morning and Evening mean scores are higher than the Afternoon mean score.

Sum of squares

Sum of squares (SS)

The SS is a number that indicates how much variation there is within a set of measurements. Sum of squares (SS) is the short name for the sum of the squared DEVIATIONS from the mean. The long name tells us exactly which math operations to perform to find the SS. Warning: there are many different sums of squares, as well as different means, in an ANOVA. Make sure you know which one you are dealing with before you do all of those calculations.

ANOVA Table example

SSA = SS(between groups) = [A] - [T]
SSS/A = SS(within groups) = [Y] - [A]
SST = SSA + SSS/A = [Y] - [T]

Bracket terms:

[A] Add up the measurements in each group first, then square each group total, add all of the squared totals together, and then divide by the number of measurements in a single group.
[Y] Square all measurements, regardless of which group, then add them all up.
[T] Add up the measurements in each group first, then add those sums together, then square that number, then divide that number by the total number of measurements in all groups.

Example: Three groups of patients were given different doses of a new drug (the Algebra Pill) that is supposed to increase their algebra skills. They took an exam with a possible score of 20. Assume that all participants were selected at random. The scores (taken from the calculations below) were:

Group 1: 12 8 7 5
Group 2: 10 19 10 11
Group 3: 14 12 10 12

[A] = [(12+8+7+5)² + (10+19+10+11)² + (14+12+10+12)²] divided by 4 = 5828/4 = 1457

[Y] = [12²+8²+7²+5²+10²+19²+10²+11²+14²+12²+10²+12²] divided by 1 = 1548

[T] = (12+8+7+5+10+19+10+11+14+12+10+12)² divided by (3 x 4) = 16,900/12 ≈ 1408

SSA = [A] - [T] = 1457 - 1408 = 49
SSS/A = [Y] - [A] = 1548 - 1457 = 91
SST = [Y] - [T] = 1548 - 1408 = 140

**Note: SSA + SSS/A MUST EQUAL SST. Always double check your work. Does 49 + 91 = 140? Why yes, it does!
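As a cross-check (not part of the original notes; plain Python, no extra libraries needed), the bracket terms and sums of squares can be computed directly:

    # Bracket-term calculation of the sums of squares for the Algebra Pill example.
    groups = [
        [12, 8, 7, 5],      # group 1
        [10, 19, 10, 11],   # group 2
        [14, 12, 10, 12],   # group 3
    ]
    n = len(groups[0])                  # measurements per group (equal group sizes)
    N = sum(len(g) for g in groups)     # total number of measurements

    A = sum(sum(g) ** 2 for g in groups) / n       # [A] = 1457.0
    Y = sum(x ** 2 for g in groups for x in g)     # [Y] = 1548
    T = sum(sum(g) for g in groups) ** 2 / N       # [T] = 1408.33...

    SSA  = A - T    # between groups: ≈ 48.7 (≈ 49 with the rounded [T] above)
    SSSA = Y - A    # within groups:  91.0
    SST  = Y - T    # total:          ≈ 139.7 (≈ 140); SSA + SSSA equals SST

    print(SSA, SSSA, SST)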
