
Statistical Principles of Experimental Design

Dov Stekel

Maximum information from minimum effort


Blocking and randomization

Arrangement of samples and arrays
Class exercise
How many replicates?
Computer practical

Blocking, Randomization and Blinding

Arrange the experimental design to minimise
problems from extraneous sources of variability
Use blocking to avoid confounding
Use randomization and blinding to avoid bias

Toxicity Example
We are interested in characterising the
toxic effect of Benzo(a)pyrene (BP) on rats
8 rats are to be treated with BP and 8 rats
with a control compound
Each array will be hybridized against a
reference sample
16 arrays in the experiment

Experimental Design
There are two batches of 8 slides from two
different print runs (1 and 2)
Hybridisation will be done by two
researchers, Alison and Brian.
What is the best way to arrange the experiment?

Design 1
Alison prepares all 8 BP samples and
hybridises them to the arrays of print run 1
Brian prepares all 8 control samples and
hybridises them to the arrays of print run 2

Design 2
Alison chooses 8 rats and treats 4 with BP and 4
with control substance.
She prepares and hybridises 2 BP samples to
arrays from print run 1 and 2 BP samples to
arrays from print run 2
She prepares and hybridises 2 control samples
to arrays from print run 1 and 2 control samples
to arrays from print run 2
Brian does the same with the other 8 rats

Design 2

Print Run 1

Print Run 2

Print Run 1

Print Run 2



Design 3
8 rats are randomly assigned to Alison, along
with 4 BP preparations and 4 control
preparations. She is not told which
preparations are which.
She prepares and hybridises samples to
randomly pre-arranged arrays so that 2 BP
samples and 2 control samples are hybridised
to 4 arrays from each of print runs 1 and 2.
Brian does the same with the other 8 rats
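Design 3 can be generated mechanically. The R sketch below is a minimal illustration, not the author's procedure: the rat labels, the coded preparation names and the seed are hypothetical. It randomly splits the rats between the researchers, randomly allocates each researcher's rats to the 2 BP + 2 control slots on each print run, and hides treatment names behind codes.

```r
set.seed(42)  # fixed seed only so this illustration is reproducible

rats <- sprintf("rat%02d", 1:16)  # hypothetical rat labels

# Randomly assign 8 rats to each researcher
design <- data.frame(
  rat        = rats,
  researcher = sample(rep(c("Alison", "Brian"), each = 8))
)

# Each researcher's 8 array slots: 2 BP + 2 control on each print run
slots <- data.frame(
  treatment = rep(c("BP", "BP", "control", "control"), times = 2),
  print_run = rep(c(1, 2), each = 4)
)

# Within each researcher, shuffle the slots so rats are allocated at random
assign_slots <- function(d) {
  idx <- sample(nrow(slots))
  d$treatment <- slots$treatment[idx]
  d$print_run <- slots$print_run[idx]
  d
}
design <- do.call(rbind, lapply(split(design, design$researcher), assign_slots))

# Blinding: researchers work from coded labels (hypothetical names), while
# the key linking codes to treatments is held by someone else
design$blind_code <- sample(sprintf("prep%02d", 1:16))

# Blocking check: every researcher x print run x treatment cell holds 2 arrays
print(table(design$researcher, design$print_run, design$treatment))
```

Because the randomisation happens inside each researcher and print-run block, the balance of Design 2 is preserved while any personal choice of rats is removed.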

What is wrong with design 1?

Treatment, researcher and print run are
confounded variables
We cannot tell whether differences between the
two groups of rats result from treatment,
researcher or print run
Use blocking in designs 2 and 3 to deconfound
the variability of interest (treatment) from the
extraneous variabilities (researcher and print run)
Designs 2 and 3 are also balanced, which
increases the power of the analyses
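Confounding can be checked directly by cross-tabulating treatment against each extraneous factor. In this sketch, `is_confounded()` is a hypothetical helper that flags complete confounding: every level of the other factor contains only one treatment, so the two effects cannot be separated.

```r
# Design 1: Alison does all BP samples on print run 1,
# Brian does all control samples on print run 2
design1 <- data.frame(
  treatment  = rep(c("BP", "control"), each = 8),
  researcher = rep(c("Alison", "Brian"), each = 8),
  print_run  = rep(c(1, 2), each = 8)
)

# Design 2 (and 3): each researcher has 2 BP + 2 control on each print run
design2 <- data.frame(
  treatment  = rep(rep(c("BP", "control"), each = 2), times = 4),
  researcher = rep(c("Alison", "Brian"), each = 8),
  print_run  = rep(rep(c(1, 2), each = 4), times = 2)
)

# TRUE if every level of `other` contains a single treatment (complete confounding)
is_confounded <- function(treatment, other) {
  all(tapply(treatment, other, function(x) length(unique(x)) == 1))
}

print(table(design1$treatment, design1$researcher))  # all BP with Alison
print(table(design2$treatment, design2$print_run))   # 4 of each per print run
```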

What is wrong with design 2?

Alison's choice of rats may be biased
For example, she may choose the
healthiest rats, so confounding potential
treatment effects with researcher variability
Use randomization and blinding in design
3 to avoid bias

Arrangement of Samples and Arrays

Is it better to use Affymetrix arrays or a
two-colour array system?
If using a two-colour array system, is it
better to use a reference sample?
If using a two-colour array system, what is
the best arrangement of samples on the arrays?

Several Factors

Available technology
Statistical considerations
We consider the problem from the perspective of
three different experiments

Example 1:
Hepatocellular Carcinomas
Samples are taken from disease and
healthy tissue from patients suffering from
hepatocellular carcinomas and hybridised
to microarrays. We would like to identify
genes that are up- or down-regulated in
hepatocellular carcinomas relative to
healthy tissue.

Design 1.1
[Figure: Healthy 1 on Array 1 and Disease 1 on Array 2, repeated x 20]

Design 1.2
[Figure: Healthy 1 on GeneChip 1 and Disease 1 on GeneChip 2, repeated x 20]

Design 1.3
[Figure: Healthy 1 and Disease 1 co-hybridised on Array 1, repeated x 20]

Design 1.4
[Figure: Healthy 1 and Disease 1 on Array 1, repeated x 10; Healthy 11 and Disease 11 on Array 11, repeated x 10]

Design 1.5
[Figure: Healthy 1 and Disease 1 co-hybridised on Array 1 and again on Array 2, repeated x 20]

Which is the best design?

Simple experiment: five different designs

Design 1.1 is bad because it increases variability.
Design 1.3 is bad because it confounds
colour with disease state.
Designs 1.4 and 1.5 are best.

Design 1.1
[Figure: Array 1 and Array 2]

Coefficient of variation: 30%
This design increases
variability to 43%

Design 1.5
[Figure: Array 1 and Array 2]

Coefficient of variation: 30%
This design reduces
variability to 21%
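The 43% and 21% figures can be reproduced with a short calculation under assumed conditions: log-normal errors (as in the power-analysis assumptions later), independent errors of equal size on each array, a Design 1.1 comparison that combines the errors of two arrays, and a Design 1.5 comparison that averages two direct measurements. This model is consistent with the slide's numbers but is an assumption, not necessarily the author's calculation.

```r
# Log-normal conversions: CV <-> standard deviation of ln(data)
cv_to_sdlog <- function(cv) sqrt(log(1 + cv^2))
sdlog_to_cv <- function(s) sqrt(exp(s^2) - 1)

s <- cv_to_sdlog(0.30)   # one array with a 30% coefficient of variation

# Design 1.1 (assumed): errors from two separate arrays add on the log
# scale, so the log-scale sd grows by sqrt(2)
cv_design_1_1 <- sdlog_to_cv(s * sqrt(2))

# Design 1.5 (assumed): two arrays measure the same direct comparison and
# are averaged, so the log-scale sd shrinks by sqrt(2)
cv_design_1_5 <- sdlog_to_cv(s / sqrt(2))

# gives 43 and 21 (percent)
print(round(100 * c(design_1.1 = cv_design_1_1, design_1.5 = cv_design_1_5)))
```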

Example 2:
B-Cell Lymphomas
Samples are taken from 60 patients
suffering from B-cell lymphomas and
hybridised to microarrays. The aim of the
experiment is to identify clinically relevant
subgroups of patients using a cluster
analysis, and then to build a classification
model to differentiate between the subgroups.

Design 2.1
[Figure: Patient 1 and Patient 2 co-hybridised on Array 1, repeated x 30]

Design 2.2
[Figure: Patient 1 on Array 1, repeated x 60]

Design 2.3
[Figure: Patient 1 on GeneChip 1, repeated x 60]

Which design is best?

Design 2.1 is bad because it is difficult to
compare patients on equal footing.
Designs 2.2 and 2.3 are good.
This is probably the most appropriate use of
Affymetrix technology.

Example 3:
Yeast Time Series
Budding yeast can reproduce sexually by
producing haploid cells through a process
called sporulation. Yeast was placed in a
sporulating medium, and samples were taken at 7
timepoints from the start of sporulation.
We are interested in identifying genes that
show similar profiles in the timecourse.

Design 3.1
[Figure: a Time 0 sample co-hybridised with each of Time 1 to Time 6, one pair per array]

Design 3.2
[Figure: loop of pairs, Time 0 with Time 1, Time 1 with Time 2, and so on up to Time 6 with Time 0, one pair per array]

Design 3.3
[Figure: Time 0 to Time 6 each hybridised to its own GeneChip]

Which is the best design?

Design 3.3 is bad because timepoint is
confounded with array.
Design 3.2 is a loop design. It is a good
design, but harder to analyse.
Design 3.1 is the best design.

Bright Timepoint Problem

Imagine we have a "bright" array. This
could be because of:
Higher gene expression
Experimental artifact

Normalising by array mean or median
cannot deconfound these factors
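A toy illustration of the bright timepoint problem, using hypothetical numbers: assume every gene at one timepoint is genuinely expressed 2-fold higher, and that the array also carries a 1.5-fold brightness artifact that affects both channels equally. Median normalisation removes the artifact and the real change together; dividing by a Time 0 reference co-hybridised on the same array removes only the artifact.

```r
baseline  <- c(100, 200, 400, 800, 1600)  # hypothetical Time 0 intensities
true_fold <- 2                            # assumed genuine global up-regulation
artifact  <- 1.5                          # assumed brightness artifact on this array

bright_sample <- baseline * true_fold * artifact  # the "bright" timepoint channel
reference     <- baseline * artifact              # Time 0 on the same array

# Option 1: normalise each array to its own median, then compare with a
# Time 0 array normalised the same way
fold_array_norm <- (bright_sample / median(bright_sample)) /
                   (baseline / median(baseline))

# Option 2: normalise to the co-hybridised Time 0 reference
fold_ref_norm <- bright_sample / reference

print(fold_array_norm)  # all 1: true change removed along with the artifact
print(fold_ref_norm)    # all 2: artifact cancels, true change kept
```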

Time Series Example
[Figures: time series ratios; raw gene expression for FYV1; FYV1 normalised to array; FYV1 normalised to reference]

Class Exercise
Two strains of Staphylococcus aureus:
methicillin-sensitive and methicillin-resistant
Each strain is cultured and then either
treated or untreated with methicillin
Samples are taken at several time points
(0h, 2h, 6h, 10h)
We want to identify genes involved in

How Many Replicates?

Use power analysis, which relates the power of a test to:
Difference in means we are trying to detect
Population and experimental variability
Type of analysis
Chosen significance threshold
Number of replicates

Population Inference



The confidence is the probability of not getting
a false positive result.
It is the probability of not rejecting (accepting) the null
hypothesis when the null hypothesis is true.
A false positive result is known as a Type I error
We control for Type I errors explicitly by
selecting an appropriate confidence level
In microarray experiments, thousands of genes are tested at once,
so we must modify the confidence level to account for multiplicity
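Why multiplicity matters can be seen with simple arithmetic: if no gene is truly regulated, each test produces a false positive with probability equal to the threshold, so the expected number of false positives is the number of genes times the threshold. The Bonferroni line is one common, conservative correction shown for comparison; it is not necessarily the method used in these slides.

```r
n_genes <- 10000
alpha   <- c(0.05, 0.01, 0.001)

# Expected false positives when every gene is truly unregulated
expected_fp <- n_genes * alpha
print(data.frame(alpha, expected_fp))   # 500, 100 and 10 false positives

# Bonferroni: divide a family-wise 5% level by the number of tests
bonferroni <- 0.05 / n_genes
print(bonferroni)
```

At a threshold of 0.001, testing 10,000 unregulated genes would still be expected to yield about 10 false positives.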

The power is the probability of not getting a false
negative result.
It is the probability of rejecting the null hypothesis
when the null hypothesis is false.
A false negative result is known as a Type II error
We control the power implicitly via the confidence
level and the experimental design.

Type I and Type II Errors

               Not significant    Significant
No effect      Correct            Type I error
Effect         Type II error      Correct

Power Analysis Assumptions

We assume that the data are approximately
log-normally distributed
This corresponds to standard deviation of the
errors of the raw data being proportional to the
signal intensity
This is equivalent to a constant standard
deviation in the logged data
The standard deviation divided by the mean is
called the coefficient of variation
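The equivalence between a constant coefficient of variation on the raw scale and a constant standard deviation on the log scale can be checked by simulation. For log-normal data, CV^2 = exp(sigma^2) - 1, so a 40% CV corresponds to sigma = sqrt(ln(1 + 0.4^2)) on the natural-log scale. The signal levels below are hypothetical.

```r
set.seed(1)
cv       <- 0.40
sigma_ln <- sqrt(log(1 + cv^2))   # sd of ln(data) giving a 40% CV, about 0.385

signal <- c(100, 1000, 10000)     # hypothetical true intensities
# meanlog = -sigma^2 / 2 gives the multiplicative error a mean of 1
sims <- lapply(signal, function(mu)
  mu * rlnorm(1e5, meanlog = -sigma_ln^2 / 2, sdlog = sigma_ln))

raw_sd <- sapply(sims, sd)                            # grows with the signal
raw_cv <- sapply(sims, function(x) sd(x) / mean(x))   # about 0.40 at every level
log_sd <- sapply(sims, function(x) sd(log(x)))        # about 0.385 at every level
print(rbind(signal, raw_sd, raw_cv, log_sd))
```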

Log Normally Distributed Data

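Power Analysis
We will use the power.t.test() function in
R to calculate the power of one- and two-sample
t-tests
power.t.test(n, delta, sd,
             sig.level, power, type,
             alternative)

The function is called with exactly one of the first
five arguments (n, delta, sd, sig.level, power) set to
NULL, and it calculates that unknown value
(n, delta and power default to NULL; sd and sig.level
default to 1 and 0.05, so they must be set to NULL
explicitly to solve for them)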
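A short sketch of the three most common uses, with illustrative values (a difference of 1 standard deviation, 5% significance): leaving out one quantity makes power.t.test() solve for it.

```r
# Solve for power, given n = 20 per group and a difference of 1 sd
res_power <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
print(res_power$power)    # about 0.87

# Solve for n per group, given the desired power
res_n <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.90)
print(ceiling(res_n$n))   # round up: replicates come in whole numbers

# Solve for the smallest detectable difference, given n and power
res_delta <- power.t.test(n = 20, sd = 1, sig.level = 0.05, power = 0.90)
print(res_delta$delta)
```

The default type is "two.sample"; type = "one.sample" or type = "paired" selects the other tests.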

Power Analysis Example: Doxorubicin Chemotherapy
We are interested in the treatment of breast
cancer patients with doxorubicin chemotherapy
We want to perform a microarray experiment to
determine genes that are up- or down-regulated
as a result of the chemotherapy
We would like to know:
How should we design the experiment?
How many patients do we need?

Paired vs Unpaired Design

In a paired design, we take samples from each
patient before and after treatment, and for each
gene, look at the difference in expression before
and after treatment
In an unpaired design, we have two groups of
patients, one group treated, the other group
untreated. We look at the difference in gene
expression between the two groups
Which is a better experiment?

Paired and Unpaired Designs

Paired: test if
mean is different
from zero

Unpaired: test if
means of groups
are different

Power Analysis Assumptions

Suppose we know from a pilot study and
evaluation of our technology that the
coefficient of variation is 40%
Let's say that we want to detect genes that are
2-fold regulated
We are testing 10,000 genes, so we will use a
significance threshold of 0.001 to compensate
for multiplicity
How many patients do we need for a power of
80%, 90% and 99%?

Paired Experiment
The standard deviation of the underlying normal
distribution equivalent to 40% variability is 0.39
The difference in means is log2(2) = 1
The number of patients we need is:
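These numbers can be computed with power.t.test(). The sketch below works on the log2 scale throughout (sd about 0.56, difference 1), so its results may differ from a calculation that pairs the natural-log sd of 0.39 with a log2 difference of 1.

```r
cv    <- 0.40
sd_l2 <- sqrt(log(1 + cv^2)) / log(2)   # log2-scale sd, about 0.56
delta <- log2(2)                         # a 2-fold change is 1 on the log2 scale

# Paired design: a one-sample t-test on each patient's before/after difference
paired_n <- sapply(c(0.80, 0.90, 0.99), function(pw)
  power.t.test(delta = delta, sd = sd_l2, sig.level = 0.001,
               power = pw, type = "paired")$n)

# Round up: patients come in whole numbers
print(ceiling(paired_n))
```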



Unpaired Experiment
The standard deviation and difference in
means are the same as in the paired experiment.
The number of patients we need is:

[Figure: Group Size and 1-Sample Number]
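A sketch comparing the two designs under the same assumptions (log2-scale sd about 0.56, difference 1, threshold 0.001). For the unpaired design power.t.test() returns the size of each group, so the total number of patients is twice that.

```r
cv     <- 0.40
sd_l2  <- sqrt(log(1 + cv^2)) / log(2)   # log2-scale sd, about 0.56
delta  <- 1                              # 2-fold change on the log2 scale
powers <- c(0.80, 0.90, 0.99)

n_for <- function(pw, type)
  power.t.test(delta = delta, sd = sd_l2, sig.level = 0.001,
               power = pw, type = type)$n

paired   <- sapply(powers, n_for, type = "paired")
unpaired <- sapply(powers, n_for, type = "two.sample")  # per group

print(data.frame(power             = powers,
                 paired_patients   = ceiling(paired),
                 unpaired_patients = 2 * ceiling(unpaired)))  # two groups
```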

Paired vs Unpaired
In this example, we need more than twice as many
patients in the unpaired experiment to obtain the
same power as in the paired experiment
Paired experimental design is more
powerful than unpaired experimental
design because the differences between
individuals are factored out in the analysis
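The effect of factoring out differences between individuals shows up clearly on a small hypothetical dataset: five patients with very different baseline expression of one gene, each showing a rise of about 1 (log2 units) after treatment. The paired test sees only the consistent within-patient differences; the unpaired test has to compare them against the large spread between patients.

```r
# Hypothetical log2 expression of one gene in 5 patients
before <- c(5, 8, 11, 14, 17)                   # large patient-to-patient spread
after  <- before + c(0.9, 1.1, 1.0, 0.8, 1.2)   # consistent rise of about 1

p_paired   <- t.test(after, before, paired = TRUE)$p.value
p_unpaired <- t.test(after, before, var.equal = TRUE)$p.value
print(c(paired = p_paired, unpaired = p_unpaired))
```

The same numbers give a highly significant paired result and a clearly non-significant unpaired one.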

Extraneous variability:
Block to avoid confounding variables
Randomisation to avoid bias
Blocked experiments require ANOVA

Two-sample experiments

Reference samples increase variability.
Hybridise both samples to same array.

Multiple patient comparisons
Reference samples or Affymetrix technology
enable comparisons.

Time series analysis

Reference samples are essential.

Number of replicates
Calculate using power analyses.

Computer Practical
Power analysis for population inference test
