
Factorial Designs
A Simple Example
Probably the easiest way to begin understanding factorial designs is by looking at an example. Let's imagine a design where we have an educational program and we would like to look at a variety of program variations to see which works best. For instance, we would like to vary the amount of time the children receive instruction, with one group getting 1 hour of instruction per week and another getting 4 hours per week. And we'd like to vary the setting, with one group getting the instruction in-class (probably pulled off into a corner of the classroom) and the other group being pulled out of the classroom for instruction in another room. We could think about having four separate groups to do this, but when we are varying the amount of time in instruction, what setting would we use: in-class or pull-out? And when we are studying setting, what amount of instruction time would we use: 1 hour, 4 hours, or something else?

With factorial designs, we don't have to compromise when answering these questions. We can have it both ways if we cross each of our two time-in-instruction conditions with each of our two settings. Let's begin by defining some terms. In factorial designs, a factor is a major independent variable. In this example we have two factors: time in instruction and setting. A level is a subdivision of a factor. In this example, time in instruction has two levels and setting has two levels. Sometimes we depict a factorial design with a numbering notation. In this example, we can say that we have a 2 x 2 (spoken "two-by-two") factorial design. In this notation, the number of numbers tells you how many factors there are and the number values tell you how many levels each factor has. If I said I had a 3 x 4 factorial design, you would know that I had 2 factors and that one factor had 3 levels while the other had 4. The order of the numbers makes no difference, and we could just as easily term this a 4 x 3 factorial design. The number of different treatment groups in any factorial design can easily be determined by multiplying through the number notation. For instance, in our example we have 2 x 2 = 4 groups. In our notational example, we would need 3 x 4 = 12 groups.
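The multiplication rule can be checked by enumerating the cells directly. The following is a minimal sketch in R that uses expand.grid to cross the levels of each factor. The level labels for the 2 x 2 case follow the example above; the factors A and B in the 3 x 4 case are hypothetical placeholders.

```r
# Cross every level of time with every level of setting (the 2 x 2 example).
cells_2x2 <- expand.grid(
  time    = c("1 hour/week", "4 hours/week"),
  setting = c("in-class", "pull-out")
)
print(cells_2x2)   # one row per treatment group
nrow(cells_2x2)    # number of groups in the 2 x 2 design

# A 3 x 4 design with hypothetical factors A (3 levels) and B (4 levels).
cells_3x4 <- expand.grid(A = paste0("a", 1:3), B = paste0("b", 1:4))
nrow(cells_3x4)    # number of groups in the 3 x 4 design
```

Each row of the resulting data frame is one treatment group, so the row count is the product of the numbers in the notation.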

We can also depict a factorial design in design notation. Because of the treatment level combinations, it is useful to use subscripts on the treatment (X) symbol. We can see in the figure that there are four groups, one for each combination of levels of factors. It is also immediately apparent that the groups were randomly assigned and that this is a posttest-only design.

Now, let's look at a variety of different results we might get from this simple 2 x 2 factorial design. Each of the following figures describes a different possible outcome. And each outcome is shown in table form (the 2 x 2 table with the row and column averages) and in graphic form (with each factor taking a turn on the horizontal axis). You should convince yourself that the information in the tables agrees with the information in both of the graphs. You should also convince yourself that the pair of graphs in each figure show the exact same information graphed in two different ways. The lines that are shown in the graphs are technically not necessary -- they are used as a visual aid to enable you to easily track where the averages for a single level go across levels of another factor. Keep in mind that the values shown in the tables and graphs are group averages on the outcome variable of interest. In this example, the outcome might be a test of achievement in the subject being taught. We will assume that scores on this test range from 1 to 10 with higher values indicating greater achievement. You should study carefully the outcomes in each figure in order to understand the differences between these cases.

The Null Outcome


Let's begin by looking at the "null" case. The null case is a situation where the treatments have no effect. This figure assumes that even if we didn't give the training we could expect that students would score a 5 on average on the outcome test. You can see in this hypothetical case that all four groups score an average of 5 and therefore the row and column averages must be 5. You can't see the lines for both levels in the graphs because one line falls right on top of the other.

The Main Effects


A main effect is an outcome that is a consistent difference between levels of a factor. For instance, we would say there's a main effect for setting if we find a statistically significant difference between the averages for the in-class and pull-out groups at all levels of time in instruction. The first figure depicts a main effect of time. For all settings, the 4 hour/week condition worked better than the 1 hour/week one. It is also possible to have a main effect for setting (and none for time).

In the second main effect graph we see that in-class training was better than pull-out training for all amounts of time.

Finally, it is possible to have a main effect on both variables simultaneously as depicted in the third main effect figure. In this instance 4 hours/week always works better than 1 hour/week and in-class setting always works better than pull-out.
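The row and column averages in these tables are marginal means of the four cell averages (assuming equal group sizes). As a minimal sketch, the cell values below are hypothetical numbers chosen to mimic a main effect of time only; they are not read from the figures.

```r
# Hypothetical cell means on the 1-10 outcome scale.
# Rows are settings, columns are amounts of time in instruction.
cell_means <- matrix(
  c(5, 7,    # in-class: 1 hour/week, 4 hours/week
    5, 7),   # pull-out: 1 hour/week, 4 hours/week
  nrow = 2, byrow = TRUE,
  dimnames = list(setting = c("in-class", "pull-out"),
                  time    = c("1 hr", "4 hrs"))
)

setting_means <- rowMeans(cell_means)  # average over time, one value per setting
time_means    <- colMeans(cell_means)  # average over setting, one value per time level

setting_means  # equal across settings: no main effect of setting
time_means     # different across time levels: main effect of time
```

Swapping the roles of rows and columns in the hypothetical values would give the setting-only pattern, and changing both would give main effects on both factors.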

Interaction Effects
If we could only look at main effects, factorial designs would still be useful. But, because of the way we combine levels in factorial designs, they also enable us to examine the interaction effects that exist between factors. An interaction effect exists when differences on one factor depend on the level of another factor. It's important to recognize that an interaction is between factors, not levels. We wouldn't say there's an interaction between 4 hours/week and in-class treatment. Instead, we would say that there's an interaction between time and setting, and then we would go on to describe the specific levels involved.

How do you know if there is an interaction in a factorial design? There are three ways you can determine there's an interaction. First, when you run the statistical analysis, the statistical table will report on all main effects and interactions. Second, you know there's an interaction when you can't talk about the effect of one factor without mentioning the other factor. If you can say at the end of your study that time in instruction makes a difference, then you know that you have a main effect and not an interaction (because you did not have to mention the setting factor when describing the results for time). On the other hand, when you have an interaction it is impossible to describe your results accurately without mentioning both factors. Finally, you can always spot an interaction in the graphs of group means -- whenever there are lines that are not parallel there is an interaction present! If you check out the main effect graphs above, you will notice that all of the lines within a graph are parallel. In contrast, for all of the interaction graphs, you will see that the lines are not parallel.

In the first interaction effect graph, we see that one combination of levels -- 4 hours/week and in-class setting -- does better than the other three. In the second interaction we have a more complex "cross-over" interaction. Here, at 1 hour/week the pull-out group does better than the in-class group while at 4 hours/week the reverse is true. Furthermore, both of these combinations of levels do equally well.
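The "parallel lines" check can also be done numerically: in a 2 x 2 table the lines are parallel exactly when the effect of time is the same in both settings, so the difference of those two differences is zero. The sketch below is illustrative only; the helper interaction_contrast and both tables of cell means are hypothetical and not taken from the figures.

```r
# Difference of differences for a 2 x 2 table of cell means:
# (effect of time in row 1) minus (effect of time in row 2).
interaction_contrast <- function(m) {
  (m[1, 2] - m[1, 1]) - (m[2, 2] - m[2, 1])
}

labels <- list(setting = c("in-class", "pull-out"), time = c("1 hr", "4 hrs"))

# Hypothetical main-effects-only pattern: the two lines are parallel.
parallel <- matrix(c(6, 8,
                     5, 7), nrow = 2, byrow = TRUE, dimnames = labels)

# Hypothetical cross-over pattern: pull-out is better at 1 hr, in-class at 4 hrs.
crossover <- matrix(c(5, 7,
                      7, 5), nrow = 2, byrow = TRUE, dimnames = labels)

interaction_contrast(parallel)   # zero: no interaction in the means
interaction_contrast(crossover)  # non-zero: the lines are not parallel
```

With real data the sample means are rarely exactly parallel, so the statistical test of the interaction term is still needed to judge whether a non-zero contrast is more than chance.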

Summary
Factorial design has several important features. First, it has great flexibility for exploring or enhancing the signal (treatment) in our studies. Whenever we are interested in examining treatment variations, factorial designs should be strong candidates as the designs of choice. Second, factorial designs are efficient. Instead of conducting a series of independent studies we are effectively able to combine these studies into one. Finally, factorial designs are the only effective way to examine interaction effects.

So far, we have only looked at a very simple 2 x 2 factorial design structure. You may want to look at some factorial design variations to get a deeper understanding of how they work. You may also want to examine how we approach the statistical analysis of factorial experimental designs.

Factorial Design Analysis


Here is the regression model statement for a simple 2 x 2 Factorial Design. In this design, we have one factor for time in instruction (1 hour/week versus 4 hours/week) and one factor for setting (in-class or pull-out). The model uses a dummy variable (represented by a Z) for each factor. In two-way factorial designs like this, we have two main effects and one interaction. In this model, the main effects are the statistics associated with the beta values that are adjacent to the Z-variables. The interaction effect is the statistic associated with β3 (i.e., the t-value for this coefficient) because it is adjacent in the formula to the multiplication of (i.e., interaction of) the dummy-coded Z variables for the two factors. Because there are two dummy-coded variables, each having two values, you can write out 2 x 2 = 4 separate equations from this one general model. You might want to see if you can write out the equations for the four cells. Then, look at some of the differences between the groups. You can also write out two equations for each Z variable. These equations represent the main effect equations. To see the difference between levels of a factor, subtract the equations from each other. If you're confused about how to manipulate these equations, check the section on how dummy variables work.
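To make the four cell equations concrete, here is a minimal sketch in R. It assumes the usual dummy-coded form of the model, y = b0 + b1*Z1 + b2*Z2 + b3*Z1*Z2 + e, with Z1 = 1 for 4 hours/week and Z2 = 1 for in-class. The coding direction, the helper predict_cell, and the coefficient values are all illustrative assumptions, not taken from the model statement itself.

```r
# Hypothetical coefficients (illustration only).
b0 <- 5    # predicted mean when Z1 = 0 and Z2 = 0
b1 <- 2    # effect of time when Z2 = 0
b2 <- 1    # effect of setting when Z1 = 0
b3 <- 1.5  # interaction: extra change when both dummies equal 1

# Predicted cell mean from the general model (error term omitted).
predict_cell <- function(z1, z2) b0 + b1 * z1 + b2 * z2 + b3 * z1 * z2

# The 2 x 2 = 4 cell equations, one per combination of dummy values.
cells <- expand.grid(Z1 = 0:1, Z2 = 0:1)
cells$predicted <- predict_cell(cells$Z1, cells$Z2)
print(cells)

# Subtracting equations: the effect of time within each setting.
time_effect_z2_0 <- predict_cell(1, 0) - predict_cell(0, 0)  # equals b1
time_effect_z2_1 <- predict_cell(1, 1) - predict_cell(0, 1)  # equals b1 + b3
time_effect_z2_1 - time_effect_z2_0                          # equals b3
```

The last line shows why the coefficient on the product term carries the interaction: it is exactly the amount by which the effect of one factor changes across the levels of the other.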

Factorial Design
In a factorial design, more than one factor is under consideration in the experiment. The test subjects are assigned at random to the treatment levels of every combination of factors.

Example
A fast food franchise is test marketing 3 new menu items on both the East and West Coasts of the continental United States. To find out if they have the same popularity, 12 franchisee restaurants from each Coast are randomly chosen for participation in the study. In accordance with the factorial design, within the 12 restaurants from the East Coast, 4 are randomly chosen to test market the first new menu item, another 4 the second menu item, and the remaining 4 the last menu item. The 12 restaurants from the West Coast are arranged likewise.

Problem
Suppose the following tables represent the sales figures of the 3 new menu items after a week of test marketing. Each row in the upper table holds the sales figures of 3 different East Coast restaurants, one for each menu item. The lower table represents the West Coast restaurants in the same way. At the .05 level of significance, test whether the mean sales volumes for the new menu items are all equal. Decide also whether the mean sales volumes of the two coastal regions differ.

East Coast:
==========
      Item1 Item2 Item3
E1       25    39    36
E2       36    42    24
E3       31    39    28
E4       26    35    29

West Coast:
==========
      Item1 Item2 Item3
W1       51    43    42
W2       47    39    36
W3       47    53    32
W4       52    46    33

Solution
The solution consists of the following steps:

1. Save the sales figures into a file named "fastfood-3.csv" in CSV format as follows.

Item1,Item2,Item3
E1,25,39,36
E2,36,42,24
E3,31,39,28
E4,26,35,29
W1,51,43,42
W2,47,39,36
W3,47,53,32
W4,52,46,33

2. Load the data into a data frame named df3 with the read.csv function.

> df3 = read.csv("fastfood-3.csv")

3. Concatenate the data rows in df3 into a single vector r.

> r = c(t(as.matrix(df3))) # response data
> r
[1] 25 39 36 36 42 ...

4. Assign new variables for the treatment levels and number of observations.

> f1 = c("Item1", "Item2", "Item3") # 1st factor levels
> f2 = c("East", "West")            # 2nd factor levels
> k1 = length(f1)                   # number of 1st factors
> k2 = length(f2)                   # number of 2nd factors
> n = 4                             # observations per treatment

5. Create a vector that corresponds to the 1st treatment level of the response data r in step 3 element-by-element with the gl function.

> tm1 = gl(k1, 1, n*k1*k2, factor(f1))
> tm1
[1] Item1 Item2 Item3 Item1 Item2 ...

6. Similarly, create a vector that corresponds to the 2nd treatment level of the response data r in step 3.

> tm2 = gl(k2, n*k1, n*k1*k2, factor(f2))
> tm2
[1] East East East East East ...

7. Apply the function aov to a formula that describes the response r by the two treatment factors tm1 and tm2 with interaction.

> av = aov(r ~ tm1 * tm2) # include interaction

8. Print out the ANOVA table with the summary function.

> summary(av)
            Df Sum Sq Mean Sq F value  Pr(>F)
tm1          2    385     193    9.55  0.0015 **
tm2          1    715     715   35.48 1.2e-05 ***
tm1:tm2      2    234     117    5.81  0.0113 *
Residuals   18    363      20

Answer
Since the p-value of 0.0015 for the menu items is less than the .05 significance level, we reject the null hypothesis that the mean sales volumes of the new menu items are all equal. Moreover, the p-value of 1.2e-05 for the East-West coast comparison is also less than the .05 significance level. It shows there is a difference in overall sales volume between the coasts. Finally, the last p-value of 0.0113 (< 0.05) indicates that there is a possible interaction between the menu item and coast location factors, i.e., customers from different coastal regions have different tastes.

Exercise
Create the response data in step 3 above along vertical columns instead of horizontal rows. Adjust the factor levels in steps 5 and 6 accordingly.
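For readers who want to run the analysis without creating the CSV file, the sketch below enters the same sales figures directly and keeps the response and both factors together in one long-format data frame. The names fastfood, sales, item and coast are illustrative choices, not part of the steps above; the aov call has the same form as in step 7.

```r
# Sales figures in the same order as the vector r in step 3:
# rows E1..E4 then W1..W4, each row giving Item1, Item2, Item3.
sales <- c(25, 39, 36,  36, 42, 24,  31, 39, 28,  26, 35, 29,   # East
           51, 43, 42,  47, 39, 36,  47, 53, 32,  52, 46, 33)   # West

fastfood <- data.frame(
  sales = sales,
  item  = factor(rep(c("Item1", "Item2", "Item3"), times = 8)),
  coast = factor(rep(c("East", "West"), each = 12))
)

# Mean sales for each coast/item combination (the six cells of the design).
cell_means <- with(fastfood, tapply(sales, list(coast, item), mean))
print(cell_means)

# Two-way ANOVA with interaction.
fit <- aov(sales ~ item * coast, data = fastfood)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

Printing the cell means alongside the ANOVA table helps when interpreting a significant interaction, because it shows which item/coast combinations drive the difference.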

Installation
R can be downloaded from one of the mirror sites in http://cran.r-project.org/mirrors.html. In case you are installing in Windows Vista or Linux, you should do so with administrative privileges.

Using External Data
R offers plenty of options for loading external data, including Excel, Minitab and SPSS files. We have included a tutorial titled Data Import on the subject.

R Session
After R is started, there is a console waiting for input. At the prompt (>), you can enter numbers and perform calculations.

> 1 + 2
[1] 3

Variable Assignment
We assign values to variables with the assignment operator "=". Just typing the variable by itself at the prompt will print out the value. We should note that another form of assignment operator, "<-", is also in use.

> x = 1
> x
[1] 1

Functions
An R function is invoked by its name, followed by parentheses enclosing zero or more arguments. The following applies the function c to combine three numeric values into a vector.

> c(1, 2, 3)
[1] 1 2 3

Comments
All text after the pound sign "#" is considered a comment.

> 1 + 1 # this is a comment
[1] 2

Add-on Packages
Sometimes we need additional functionality beyond that offered by the core R library. In order to install an add-on package, you should start R with administrative privileges, invoke the install.packages function at the prompt, and follow the instructions.

> install.packages()

Getting Help
R provides extensive documentation. For example, typing ?c or help(c) gives the documentation of the function c in R. Please give it a try.

> help(c)

If you are not sure about the name of the function you are looking for, you can perform a fuzzy search with the apropos function.

> apropos("nova")
[1] "anova" .... "anova.glm"

Finally, there is an R-specific Internet search engine at http://www.rseek.org for more assistance.

Numeric
Decimal values are called numeric in R. It is the default computational data type. If we assign a decimal value to a variable x as follows, the class of x will be numeric.

> x = 10.5 # assign a decimal value
> x        # print the value of x
[1] 10.5
> class(x) # print the class name of x
[1] "numeric"

Furthermore, even if we assign an integer to a variable k, it is still saved as a numeric value.

> k = 1
> k        # print the value of k
[1] 1
> class(k) # print the class name of k
[1] "numeric"

The fact that k is not an integer can be confirmed with the is.integer function. This is because R stores a plain number such as 1 as a double-precision numeric value rather than as an integer. The integer type is discussed in the next section.

> is.integer(k) # is k an integer?
[1] FALSE

Integer
In order to create an integer variable in R, we invoke the as.integer function. We can be assured that y is indeed an integer by applying the is.integer function.

> y = as.integer(3)
> y             # print the value of y
[1] 3
> class(y)      # print the class name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE

Incidentally, we can coerce a numeric value into an integer with the same as.integer function.

> as.integer(3.14) # coerce a numeric value
[1] 3

And we can parse a string for decimal values in much the same way.

> as.integer("5.27") # coerce a decimal string
[1] 5

On the other hand, it is erroneous to try to parse a non-decimal string.

> as.integer("Joe") # coerce a non-decimal string
[1] NA
Warning message:
NAs introduced by coercion

Often, it is useful to perform arithmetic on logical values. As in the C language, TRUE has the value 1, while FALSE has the value 0.

> as.integer(TRUE)  # the numeric value of TRUE
[1] 1
> as.integer(FALSE) # the numeric value of FALSE
[1] 0
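One common use of this arithmetic is counting how many elements of a vector satisfy a condition. The following is a small sketch; the vector scores and the cutoff of 7 are hypothetical values chosen for illustration.

```r
scores <- c(4, 7, 9, 5, 8)  # hypothetical test scores
passed <- scores >= 7       # logical vector, one TRUE/FALSE per score

sum(passed)   # TRUE counts as 1, so this is the number of scores at or above 7
mean(passed)  # the proportion of scores at or above 7
```

Because sum and mean treat TRUE as 1 and FALSE as 0, no explicit call to as.integer is needed here.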

Complex
A complex value in R is defined via the pure imaginary value i.

> z = 1 + 2i # create a complex number
> z          # print the value of z
[1] 1+2i
> class(z)   # print the class name of z
[1] "complex"

The following fails, returning NaN with a warning, because -1 is not a complex value.

> sqrt(-1) # square root of -1
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced

Instead, we have to use the complex value -1 + 0i.

> sqrt(-1+0i) # square root of -1+0i
[1] 0+1i

An alternative is to coerce -1 into a complex value.

> sqrt(as.complex(-1))
[1] 0+1i

Logical
A logical value is often created via comparison between variables.

> x = 1; y = 2 # sample values
> z = x > y    # is x larger than y?
> z            # print the logical value
[1] FALSE
> class(z)     # print the class name of z
[1] "logical"

Standard logical operations are "&" (and), "|" (or), and "!" (negation).

> u = TRUE; v = FALSE
> u & v # u AND v
[1] FALSE
> u | v # u OR v
[1] TRUE
> !u    # negation of u
[1] FALSE

Further details and related logical operations can be found in the R documentation.

> help("&")

Character
A character object is used to represent string values in R. We convert objects into character values with the as.character() function:

> x = as.character(3.14)
> x        # print the character string
[1] "3.14"
> class(x) # print the class name of x
[1] "character"

Two character values can be concatenated with the paste function.

> fname = "Joe"; lname = "Smith"
> paste(fname, lname)
[1] "Joe Smith"

However, it is often more convenient to create a readable string with the sprintf function, which has a C language syntax.

> sprintf("%s has %d dollars", "Sam", 100)
[1] "Sam has 100 dollars"

To extract a substring, we apply the substr function. Here is an example showing how to extract the substring between the third and twelfth positions in a string.

> substr("Mary has a little lamb.", start=3, stop=12)
[1] "ry has a l"

And to replace the first occurrence of the word "little" by another word "big" in the string, we apply the sub function.

> sub("little", "big", "Mary has a little lamb.")
[1] "Mary has a big lamb."

More functions for string manipulation can be found in the R documentation.

> help("sub")
