
Design of Experiments

DOE
Rami Arafeh
07.02.2009
Definitions
- Experiment: a planned scientific inquiry designed to
investigate one or more populations under several
treatments and/or levels
- Experimental Design: the plan of the experiment, which specifies the treatment conditions (independent variables) and what is to be measured (dependent variables).

- Treatment(s): the various conditions (processes, techniques, operations) that distinguish the populations of interest.

- Factor: when several aspects are studied in a single experiment, each is called a factor (independent variable). The different categories within a factor are called the levels of the factor.

- Control: a group of subjects that does not receive the experimental treatment but in all other respects is treated in the same way as the experimental group.
The term is also used for the standard treatment included in the experiment so that there is a reference value against which the other treatments can be compared.

- Placebo: an inactive substance or dummy treatment administered to a control group in order to compare its effects with those of a real substance, drug or treatment.


- Experimental Unit (EU): the smallest entity that receives a single treatment. It may be a single individual or a group.

- Experimental Error: Uncontrolled sources of
variability in the results which occur randomly during
the experiment. Much of this error is due to individual
differences among subjects.


- ANOVA (Analysis of Variance):
A statistical procedure that compares the means of three or more groups, using estimates of variance, in order to examine whether significant differences exist anywhere among them.

It is the process of subdividing the total variability of the experimental observations into portions attributable to recognized sources of variation.
- Mean Separation (Multiple comparisons): if the ANOVA null hypothesis is rejected, then at least one mean is significantly different from at least one other; mean separation procedures identify which means differ.

- LSD (Least Significant Difference): a value based on the standard error of the difference between means that distinguishes statistically similar means from statistically different ones.

Designs
1- Completely Randomized Design (CRD)
In a CRD, the treatments are randomly assigned to the experimental units.

Example. A manufacturer of paper used for making
grocery bags is interested in improving the tensile strength
of the product. Product engineering thinks that tensile
strength is a function of the hardwood concentration in the
pulp and that the range of hardwood concentrations of
practical interest is between 5 and 20%. A team of
engineers responsible for the study decides to investigate
four levels of hardwood concentration: 5%, 10%, 15%, and
20%. They decide to make up six test specimens at each
concentration level, using a pilot plant. All 24 specimens
are tested on a laboratory tensile tester, in random order.
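The randomization step in this example can be sketched in Python: the 24 specimens (4 concentrations × 6 replicates) are put into a completely random test order. This is a minimal sketch; the function name `crd_layout` and the fixed seed are illustrative assumptions, not part of the original study.

```python
import random

def crd_layout(treatments, replicates, seed=None):
    """Completely randomized design: each treatment appears `replicates` times
    and the whole list of runs is shuffled, so every ordering is equally likely.
    Entry k of the result is the treatment for experimental unit (run) k + 1."""
    rng = random.Random(seed)
    layout = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(layout)  # unrestricted randomization over all units
    return layout

# Hypothetical run order for the four hardwood concentrations, six specimens each
plan = crd_layout(["5%", "10%", "15%", "20%"], replicates=6, seed=1)
print(plan)
```

The only restriction is the number of replicates per treatment; contrast this with the blocked randomization of the RCBD later in these slides.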
Where does the variation come from?


CRD
ANOVA
One-way ANOVA partitions the total variation into:
- Variation due to treatment (among-groups variation): the Sum of Squares Among, also called the Sum of Squares Between or Sum of Squares Treatment (SST)
- Variation due to random sampling (within-groups variation): the Sum of Squares Within, also called the Sum of Squares Error (SSE)
Total Variation
[Figure: responses X for Groups 1, 2 and 3 scattered around the overall mean]
$$\text{Total SS} = (X_{11} - \bar{X})^2 + (X_{21} - \bar{X})^2 + \cdots + (X_{ij} - \bar{X})^2$$
Treatment Variation
[Figure: group means of Groups 1, 2 and 3 compared with the overall mean, response X]
$$SST = n_1(\bar{X}_1 - \bar{X})^2 + n_2(\bar{X}_2 - \bar{X})^2 + \cdots + n_t(\bar{X}_t - \bar{X})^2$$
Random (Error) Variation
[Figure: observations in Groups 1, 2 and 3 compared with their own group means, response X]
$$SSE = (X_{11} - \bar{X}_1)^2 + (X_{21} - \bar{X}_1)^2 + \cdots + (X_{tj} - \bar{X}_t)^2$$
$$SS_{Total} = SS_{Error} + SS_{Treatment}$$
Error Variation
$$SSE = SS_{Total} - SS_{Treatment}$$
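The three deviation-form sums of squares can be checked numerically. The Python sketch below computes them on a small hypothetical data set (not from the slides) and confirms that the total splits exactly into treatment and error parts; the function name `one_way_ss` is our own.

```python
def one_way_ss(groups):
    """Deviation-form sums of squares for a one-way layout.

    groups: list of lists, one inner list of observations per treatment.
    Returns (ss_total, ss_treatment, ss_error).
    """
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # Total: every observation against the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    # Treatment: each group mean against the overall mean, weighted by group size
    ss_treatment = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Error: every observation against its own group mean
    ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return ss_total, ss_treatment, ss_error

# Hypothetical data: three groups of three observations
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 10.0]]
total, treat, error = one_way_ss(groups)
print(total, treat, error)  # approximately 32, 14 and 18
```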
The design model
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$
where $Y_{ij}$ is a random variable denoting the $(ij)$th observation, $\mu$ is a parameter common to all treatments called the overall mean, $\tau_i$ is a parameter associated with the $i$th treatment called the $i$th treatment effect, and $\varepsilon_{ij}$ is a random error component.
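To make the model concrete, here is a hypothetical Python simulation that generates data from $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$. The parameter values are invented for illustration, and the errors are assumed to be independent normal with mean 0 and standard deviation `sigma`, the usual ANOVA assumption, which these slides do not state explicitly.

```python
import random

def simulate_crd(mu, taus, n, sigma, seed=None):
    """Draw y_ij = mu + tau_i + eps_ij for i = 1..a (one tau per treatment), j = 1..n.

    Assumption: eps_ij are independent N(0, sigma^2) errors.
    Returns a list of a lists, each holding n observations.
    """
    rng = random.Random(seed)
    return [[mu + tau + rng.gauss(0.0, sigma) for _ in range(n)] for tau in taus]

# Hypothetical parameters; the effects sum to zero so that mu is the overall mean
y = simulate_crd(mu=15.0, taus=[-5.0, 0.5, 1.5, 3.0], n=6, sigma=2.5, seed=42)
```

Setting `sigma=0.0` removes the error term, so each observation equals $\mu + \tau_i$ exactly, which is a quick way to see what each parameter contributes.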
Completely Randomized Design
This is an example of a completely randomized single-
factor experiment with four levels of the factor.
The levels of the factor are called treatments, and each
treatment has six observations or replicates.
This figure indicates that changing the hardwood
concentration has an effect on tensile strength; specifically,
higher hardwood concentrations produce higher observed
tensile strength.
Completely Randomized Design
Analysis of Variance
Suppose we have $a$ different levels of a single factor that we wish to compare.
The response for each of the $a$ treatments is a random variable.
Let $y_{ij}$ represent the $j$th observation taken under treatment $i$.
We initially consider the case in which there is an equal number of observations, $n$, on each treatment.
Analysis of Variance
We are interested in testing the equality of the $a$ treatment means $\mu_1, \mu_2, \ldots, \mu_a$. This is equivalent to testing the hypotheses
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$$
$$H_1: \tau_i \neq 0 \text{ for at least one } i$$
If the null hypothesis is true, each observation consists of the overall mean $\mu$ plus a realization of the random error component $\varepsilon_{ij}$, and changing the levels of the factor has no effect on the mean response.
Analysis of Variance
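The total sum of squares is
$$SS_{Total} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}$$
The treatment sum of squares is
$$SS_{Treatment} = \sum_{i=1}^{a}\frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N}$$
The error sum of squares is
$$SS_{Error} = SS_{Total} - SS_{Treatment}$$
Here $y_{i.}$ is the total of the $n$ observations under treatment $i$, $y_{..}$ is the grand total of all observations, and $N = an$ is the total number of observations.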
Analysis of Variance
The ANOVA partitions the total variability in the sample
data into two component parts.
Then, the test of the hypothesis is based on a
comparison of two independent estimates of the population
variance.
The total variability in the data is described by the total
sum of squares.

Analysis of Variance
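We can show that if the null hypothesis $H_0$ is true, the ratio
$$F = \frac{SS_{Treatment}/(a-1)}{SS_E/[a(n-1)]} = \frac{MS_{Treatment}}{MS_E}$$
has an F-distribution with $a - 1$ and $a(n - 1)$ degrees of freedom.
$MS_E$ estimates the error variance $\sigma^2$ whether or not $H_0$ is true, but if the null hypothesis is false, the expected value of $MS_{Treatment}$ is greater than $\sigma^2$, so large values of $F$ are evidence against $H_0$.
We would reject $H_0$ if $F > F_{\alpha, a-1, a(n-1)}$.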
Analysis of Variance
Example. In the paper tensile strength experiment,
we can use the ANOVA to test the hypothesis that
different hardwood concentrations do not affect the
mean tensile strength of the paper.
The hypotheses are
$$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$$
$$H_a: \tau_i \neq 0 \text{ for at least one } i$$

Analysis of Variance
$$SS_{Total} = \sum_{i=1}^{4}\sum_{j=1}^{6} y_{ij}^2 - \frac{y_{..}^2}{N} = (7)^2 + (8)^2 + \cdots + (20)^2 - \frac{(383)^2}{24} = 512.96$$
$$SS_{Treatment} = \sum_{i=1}^{4}\frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N} = \frac{(60)^2 + (94)^2 + (102)^2 + (127)^2}{6} - \frac{(383)^2}{24} = 382.79$$
$$SS_{Error} = SS_{Total} - SS_{Treatment} = 512.96 - 382.79 = 130.17$$
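The same arithmetic as a short Python sketch, using only quantities quoted in the calculation (treatment totals 60, 94, 102 and 127; n = 6; SS_Total = 512.96). SS_Total is passed in because the individual observations are not listed in this section; the function name is hypothetical.

```python
def treatment_and_error_ss(treatment_totals, n, ss_total):
    """Balanced one-way layout: SS_Treatment from the treatment totals y_i.,
    then SS_Error by subtraction from a given SS_Total."""
    a = len(treatment_totals)
    N = a * n                                  # total number of observations
    grand_total = sum(treatment_totals)        # y..
    correction = grand_total ** 2 / N          # y..^2 / N
    ss_treatment = sum(y_i ** 2 for y_i in treatment_totals) / n - correction
    ss_error = ss_total - ss_treatment
    return ss_treatment, ss_error

ss_treatment, ss_error = treatment_and_error_ss([60, 94, 102, 127], n=6, ss_total=512.96)
print(round(ss_treatment, 2), round(ss_error, 2))  # 382.79 130.17
```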
Analysis of Variance
The typical ANOVA table for a CRD:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F
Treatment             SS_Treatment     a-1                  MS_Treatment   MS_Treatment/MS_Error
Error                 SS_Error         a(n-1)               MS_Error
Total                 SS_Total         an-1

For the tensile strength experiment:

Source of Variation      Sum of Squares   Degrees of Freedom   Mean Square   F
Hardwood concentration   382.79           3                    127.60        19.60
Error                    130.17           20                   6.51
Total                    512.96           23
Analysis of Variance
$$SS_E = SS_{Total} - SS_{Treatment} = 512.96 - 382.79 = 130.17$$
From the ANOVA results, we reject $H_0$ if $F > F_{Table}$:
$$F = 127.60 / 6.51 = 19.60, \qquad F_{0.01, 3, 20} = 4.94$$
Since 19.60 > 4.94, we reject $H_0$ and conclude that hardwood concentration affects the mean strength of the paper.
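A hedged Python sketch of this decision rule. The critical value is supplied from an F table (4.94 here), since computing F quantiles would need a statistics library such as SciPy; the function name `crd_f_test` is hypothetical.

```python
def crd_f_test(ss_treatment, ss_error, a, n, f_crit):
    """F test for a balanced CRD with a treatments and n replicates each.

    f_crit: tabulated F_{alpha, a-1, a(n-1)}, supplied by the user.
    Returns (F, reject_H0).
    """
    ms_treatment = ss_treatment / (a - 1)   # MS_Treatment on a-1 d.f.
    ms_error = ss_error / (a * (n - 1))     # MS_E on a(n-1) d.f.
    f = ms_treatment / ms_error
    return f, f > f_crit

# Tensile-strength example with alpha = 0.01
f, reject = crd_f_test(382.79, 130.17, a=4, n=6, f_crit=4.94)
print(round(f, 2), reject)  # about 19.60, True
```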
Multiple Comparisons and Mean Separation by LSD
When the null hypothesis is rejected in the ANOVA, we know that some of the treatment or factor level means are different.
However, the ANOVA doesn't identify which means are different.
Methods for investigating this issue are called multiple comparison methods. One of them is Fisher's least significant difference (LSD) method.
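Multiple Comparisons
Fisher's LSD method declares the means $\mu_i$ and $\mu_j$ different if $|\bar{y}_{i.} - \bar{y}_{j.}| > LSD$, where LSD, the least significant difference, is
$$LSD = t_{\alpha/2, a(n-1)} \sqrt{\frac{2 MS_E}{n}}$$
If the sample sizes are different in each treatment, the LSD is
$$LSD = t_{\alpha/2, N-a} \sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$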
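Both LSD formulas fit in one small Python function, sketched below. The t critical value is taken from a t table rather than computed, and the name `lsd` is our own.

```python
import math

def lsd(t_crit, ms_error, n_i, n_j=None):
    """Fisher's least significant difference for comparing treatments i and j.

    t_crit: tabulated t_{alpha/2, df_E}, with df_E = a(n-1) (equal n) or N - a.
    n_j: omit for equal sample sizes, in which case n_i is the common n.
    """
    if n_j is None:
        n_j = n_i
    # With equal n, 1/n + 1/n = 2/n, which gives t * sqrt(2 * MS_E / n)
    return t_crit * math.sqrt(ms_error * (1.0 / n_i + 1.0 / n_j))

print(round(lsd(2.086, 6.51, 6), 2))  # 3.07 for the hardwood example
```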
Multiple Comparisons
Ex. Apply the Fisher LSD method to the hardwood concentration experiment. Here a = 4, n = 6, $MS_E = 6.51$, $\alpha = 0.05$ (95% confidence), and $t_{0.025,20} = 2.086$. The treatment means are
$$\bar{y}_{1.} = 10.00, \quad \bar{y}_{2.} = 15.67, \quad \bar{y}_{3.} = 17.00, \quad \bar{y}_{4.} = 21.17$$
The value of LSD is
$$LSD = t_{0.025,20}\sqrt{\frac{2 MS_E}{n}} = 2.086\sqrt{\frac{2(6.51)}{6}} = 3.07$$
Multiple Comparisons
Therefore, any pair of treatment averages that differs in absolute value by more than 3.07 implies that the corresponding pair of treatment means is different.
The comparisons among the observed treatment averages are
4 vs. 1 = 21.17 - 10.00 = 11.17 > 3.07
4 vs. 2 = 21.17 - 15.67 = 5.50 > 3.07
4 vs. 3 = 21.17 - 17.00 = 4.17 > 3.07
3 vs. 1 = 17.00 - 10.00 = 7.00 > 3.07
3 vs. 2 = 17.00 - 15.67 = 1.33 < 3.07
2 vs. 1 = 15.67 - 10.00 = 5.67 > 3.07
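The six pairwise comparisons above can be generated mechanically. A minimal Python sketch, using the treatment averages and the LSD of 3.07 from the example; the function name is hypothetical.

```python
from itertools import combinations

def lsd_comparisons(means, lsd_value):
    """Compare every pair of treatment averages with a single LSD value.

    means: dict mapping treatment label -> average.
    Returns (label_i, label_j, |difference|, significant?) tuples.
    """
    results = []
    for i, j in combinations(sorted(means), 2):
        diff = abs(means[i] - means[j])
        results.append((i, j, round(diff, 2), diff > lsd_value))
    return results

means = {1: 10.00, 2: 15.67, 3: 17.00, 4: 21.17}
for i, j, diff, sig in lsd_comparisons(means, 3.07):
    print(f"{j} vs. {i}: {diff:.2f} {'>' if sig else '<='} 3.07")
```

Only the pair (2, 3) fails to exceed the LSD, matching the hand calculation.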
Multiple Comparisons
From this analysis, we see that there are significant
differences between all pairs of means except 2 and 3.
This implies that the 10% and 15% hardwood concentrations produce approximately the same tensile strength and that all other concentration levels tested produce different tensile strengths.
Designs
2- Randomized Complete Block Design (RCBD)
An RCBD divides the group of experimental units into n homogeneous groups of size t. These homogeneous groups are called blocks.
The treatments are then randomly assigned to the experimental units in each block, one treatment to a unit in each block.
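The restricted randomization of an RCBD can be sketched in Python: unlike the CRD, each block receives its own independent shuffle of the complete set of treatments. The function name and seed are illustrative assumptions.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomized complete block layout: every treatment appears exactly once
    in every block, in an independently randomized order within each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # fresh randomization inside this block only
        layout.append(order)
    return layout

# Hypothetical: four chemicals applied to the four pieces of each of three bolts
for block, order in enumerate(rcbd_layout(["A", "B", "C", "D"], 3, seed=7), start=1):
    print(f"Block {block}: {order}")
```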

RCBD
Example 1:
Suppose we are interested in how weight gain
(Y) in rats is affected by Source of protein (Beef,
Cereal, and Pork) and by Level of Protein (High
or Low).
There are a total of t = 3 × 2 = 6 treatment combinations of the two factors (Beef-High Protein, Cereal-High Protein, Pork-High Protein, Beef-Low Protein, Cereal-Low Protein, and Pork-Low Protein).
RCBD
Suppose we have available to us a total of N =
60 experimental rats to which we are going to
apply the different diets based on the t = 6
treatment combinations.
Prior to the experimentation, the rats were divided into n = 10 homogeneous groups of size 6.
The grouping was based on factors that had previously been ignored (for example, initial weight, appetite, etc.).
Within each of the 10 blocks, the six rats are randomly assigned to the six treatment combinations (diets), one rat per diet.

RCBD
The weight gain after a fixed period is measured for each of the test animals and is tabulated below (columns (1) to (6) are the six diets):

RCBD
Block   (1)   (2)   (3)   (4)   (5)   (6)
1       107    96   112    83    87    90
2       102    72   100    82    70    94
3       102    76   102    85    95    86
4        93    70    93    63    71    63
5       111    79   101    72    75    81
6       128    89   104    85    84    89
7        56    70    72    64    62    63
8        97    91    92    80    72    82
9        80    63    87    82    81    63
10      103   102   112    83    93    81

RCBD
Example 2:
The following experiment compares the effect of four different chemicals (A, B, C and D) in producing water resistance (y) in textiles.
A strip of material, randomly selected from each bolt, is cut into four pieces (samples), and the pieces are randomly assigned to receive one of the four chemical treatments.
RCBD
This process is replicated three times, producing a Randomized Block (RB) design.
Moisture resistance (y) was measured for each of the samples (low readings indicate low moisture penetration).
The data are given in the diagram and table below.


RCBD
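Diagram: Blocks (Bolt Samples); each column is one bolt, each entry is the reading followed by the chemical applied
Bolt 1     Bolt 2     Bolt 3
9.9 C      13.4 D     12.7 B
10.1 A     12.9 B     12.9 D
11.4 B     12.2 A     11.4 C
12.1 D     12.3 C     11.9 A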

RCBD
Blocks (Bolt Samples)
Chemical 1 2 3
A 10.1 12.2 11.9
B 11.4 12.9 12.7
C 9.9 12.3 11.4
D 12.1 13.4 12.9

RCBD
The Model for a Randomized Block Experiment
$$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, b$$
where
$y_{ij}$ = the observation in the $j$th block receiving the $i$th treatment
$\mu$ = overall mean
$\tau_i$ = the effect of the $i$th treatment
$\beta_j$ = the effect of the $j$th block
$\varepsilon_{ij}$ = random error
The ANOVA Table for a Randomized Block Experiment
Source   S.S.   d.f.         M.S.   F           p-value
Treat    SS_T   t-1          MS_T   MS_T/MS_E
Block    SS_B   b-1          MS_B   MS_B/MS_E
Error    SS_E   (t-1)(b-1)   MS_E
RCBD
A randomized block experiment is assumed to be a two-factor experiment.
The factors are blocks and treatments. There is one observation per cell. It is assumed that there is no interaction between blocks and treatments.
The degrees of freedom for the interaction are used to estimate error.
RCBD
The ANOVA Table for Diet Experiment
Source S.S d.f. M.S. F p-value
Block 5992.4167 9 665.82407 9.52 0.00000
Diet 4572.8833 5 914.57667 13.076659 0.00000
ERROR 3147.2833 45 69.93963
The Anova Table for Textile Experiment
SOURCE SUM OF SQUARES D.F. MEAN SQUARE F TAIL PROB.
Blocks 7.17167 2 3.5858 40.21 0.0003
Chem 5.20000 3 1.7333 19.44 0.0017
ERROR 0.53500 6 0.0892
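The textile table can be reproduced from the data with the usual computational formulas. A minimal Python sketch, assuming one observation per cell and no block × treatment interaction (so the leftover sum of squares serves as error); the function name `rcbd_anova` is hypothetical.

```python
def rcbd_anova(data):
    """Sums of squares, d.f. and F ratios for an RCBD with one observation per cell.

    data: dict mapping treatment -> list of responses, one per block,
    with every list in the same block order.
    """
    rows = list(data.values())
    t = len(rows)                                   # number of treatments
    b = len(rows[0])                                # number of blocks
    all_obs = [y for row in rows for y in row]
    cf = sum(all_obs) ** 2 / (t * b)                # correction factor y..^2 / N
    ss_total = sum(y ** 2 for y in all_obs) - cf
    ss_treat = sum(sum(row) ** 2 for row in rows) / b - cf
    block_totals = [sum(row[j] for row in rows) for j in range(b)]
    ss_block = sum(bt ** 2 for bt in block_totals) / t - cf
    # The remainder (block x treatment "interaction") is used as error
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_block, df_error = t - 1, b - 1, (t - 1) * (b - 1)
    ms_error = ss_error / df_error
    return {
        "Blocks": (ss_block, df_block, (ss_block / df_block) / ms_error),
        "Treatments": (ss_treat, df_treat, (ss_treat / df_treat) / ms_error),
        "Error": (ss_error, df_error, None),
    }

# Water-resistance data from the table above (rows = chemicals, columns = bolts 1 to 3)
textile = {
    "A": [10.1, 12.2, 11.9],
    "B": [11.4, 12.9, 12.7],
    "C": [9.9, 12.3, 11.4],
    "D": [12.1, 13.4, 12.9],
}
for source, (ss, df, f) in rcbd_anova(textile).items():
    print(source, round(ss, 5), df, None if f is None else round(f, 2))
```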
If the treatments are defined in terms of two or more factors, the treatment sum of squares can be split (partitioned) into:
- Main effects
- Interactions
The ANOVA Table for the Diet Experiment, with terms for the main effects of and the interaction between Level of Protein and Source of Protein

Source S.S d.f. M.S. F p-value
Block 5992.4167 9 665.82407 9.52 0.00000
Source 882.23333 2 441.11667 6.31 0.00380
Level 2680.0167 1 2680.0167 38.32 0.00000
SL (Source × Level interaction) 1010.6333 2 505.31667 7.23 0.00190
ERROR 3147.2833 45 69.93963
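A quick arithmetic check of this partition, using only numbers copied from the two diet tables: the Source, Level and SL sums of squares and degrees of freedom should add back to the Diet line (4572.8833 on 5 d.f.), and each F is that term's mean square divided by the error mean square.

```python
# Values copied from the diet-experiment ANOVA tables
ms_error = 69.93963
diet_ss, diet_df = 4572.8833, 5
parts = {"Source": (882.23333, 2), "Level": (2680.0167, 1), "SL": (1010.6333, 2)}

# Main effects plus interaction should reproduce the Diet (treatment) line
ss_sum = sum(ss for ss, _ in parts.values())
df_sum = sum(df for _, df in parts.values())
print(round(ss_sum, 4), df_sum)

# Each F ratio: the term's mean square over MS_E
for name, (ss, df) in parts.items():
    print(name, round(ss / df, 5), round(ss / df / ms_error, 2))
```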
H.W
- Latin Square
- Factorial Design
- Split-plot design
- Advantages and disadvantages of each design
