Experimental Design
Dov Stekel
Overview
Toxicity Example
We are interested in characterising the
toxic effect of Benzo(a)pyrene (BP) on rats
8 Rats are to be treated with BP and 8 rats
with a control compound
Each array will be hybridized against a
reference sample
16 Arrays in the experiment
Experimental Design
There are two batches of 8 slides from two
different print runs (1 and 2)
Hybridisation will be done by two
researchers, Alison and Brian.
What is the best way to arrange the
experiment?
Design 1
Alison prepares all 8 BP samples and
hybridises them to the arrays of print run 1
Brian prepares all 8 control samples and
hybridises them to the arrays of print run 2
Design 2
Alison chooses 8 rats and treats 4 with BP and 4
with control substance.
She prepares and hybridises 2 BP samples to
arrays from print run 1 and 2 BP samples to
arrays from print run 2
She prepares and hybridises 2 control samples
to arrays from print run 1 and 2 control samples
to arrays from print run 2
Brian does the same with the other 8 rats
Design 2
Arrays per researcher, treatment and print run:
                     Print Run 1   Print Run 2
Alison   Control          2             2
Alison   Treated          2             2
Brian    Control          2             2
Brian    Treated          2             2
Design 3
8 rats are randomly assigned to Alison, along
with 4 BP preparations and 4 control
preparations. She is not told which
preparations are which.
She prepares and hybridises samples to
randomly pre-arranged arrays so that 2 BP
samples and 2 control samples are hybridised
to 4 arrays from each of print runs 1 and 2.
Brian does the same with the other 8 rats
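The allocation described in Design 3 could be generated along the following lines. This is a minimal sketch, not the procedure used in the lecture: the object names, the sample codes and the seed are all hypothetical, and blinding is represented only by giving each preparation a neutral code.

```r
# Sketch of a randomised, balanced allocation for Design 3.
# All names (design, code, ...) are illustrative assumptions.
set.seed(1)  # arbitrary seed so the allocation can be reproduced

# Rat IDs in random order, then filled into a balanced layout:
# 2 researchers x 2 print runs x 2 treatments x 2 rats each
design <- data.frame(
  rat        = sample(1:16),
  researcher = rep(c("Alison", "Brian"), each = 8),
  print_run  = rep(rep(c(1, 2), each = 4), times = 2),
  treatment  = rep(c("BP", "BP", "Control", "Control"), times = 4)
)

# Randomise the processing order within each researcher
design <- design[order(design$researcher, runif(16)), ]
rownames(design) <- NULL

# Neutral codes so the researcher is not told which preparation is which
design$code <- sprintf("S%02d", sample(16))

# Check the balance: every researcher / print run / treatment cell holds 2 arrays
balance <- table(design$researcher, design$print_run, design$treatment)
print(design)
print(balance)
```

The researchers would be handed only the code, rat and print run columns; the treatment column stays with whoever prepared the compounds.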
Several Factors
Available technology
Cost
Statistical considerations
We consider the problem from the perspective
of three different experiments
Example 1:
Hepatocellular Carcinomas
Samples are taken from disease and
healthy tissue from patients suffering from
hepatocellular carcinomas and hybridised
to microarrays. We would like to identify
genes that are up- or down- regulated in
hepatocellular carcinomas relative to
healthy tissue.
Design 1.1
[Diagram: Healthy 1 vs Reference on Array 1; Disease 1 vs Reference on Array 2; x 20]
Design 1.2
[Diagram: Healthy 1 on GeneChip 1; Disease 1 on GeneChip 2; x 20]
Design 1.3
[Diagram: Healthy 1 and Disease 1 together on Array 1; x 20]
Design 1.4
[Diagram: Healthy 1 and Disease 1 on Array 1 (x 10); Healthy 11 and Disease 11 on Array 11 (x 10)]
Design 1.5
[Diagram: Healthy 1 and Disease 1 on Array 1, and again on Array 2; x 20]
Design 1.1
[Diagram: Healthy vs Reference on Array 1; Disease vs Reference on Array 2]
Coefficient of variability is 30%
Design increases variability to 43%
Design 1.5
[Diagram: Healthy and Disease on Array 1, and again on Array 2]
Coefficient of variability: 30%
Experimental design reduces variability to 21%
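The 43% and 21% figures can be reproduced under one set of assumptions that the slides do not state explicitly: measurement error is log-normal, errors on different arrays are independent, and the design either adds two error terms (reference design) or averages two measurements (replicated design). The helper names below are hypothetical.

```r
# Assumption: log-normal, independent measurement error on each array.
cv_to_sd <- function(cv) sqrt(log(1 + cv^2))  # CV -> sd on the natural-log scale
sd_to_cv <- function(s) sqrt(exp(s^2) - 1)    # sd on the log scale -> CV

sd_single <- cv_to_sd(0.30)  # one measurement with 30% variability

# Design 1.1: the disease/healthy comparison combines two arrays,
# so the log-scale variances add (sd grows by sqrt(2))
cv_reference <- sd_to_cv(sd_single * sqrt(2))

# Design 1.5: the same comparison is measured on two arrays and averaged,
# so the log-scale variance halves (sd shrinks by sqrt(2))
cv_replicated <- sd_to_cv(sd_single / sqrt(2))

print(round(100 * c(reference = cv_reference, replicated = cv_replicated)))
```

Under these assumptions the two values round to 43 and 21. The same conversion applied to 40% variability gives a log-scale standard deviation of about 0.39.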
Example 2:
B-Cell Lymphomas
Samples are taken from 60 patients
suffering from B-cell lymphomas and
hybridised to microarrays. The aim of the
experiment is to identify clinically relevant
subgroups of patients using a cluster
analysis, and then to build a classification
model to differentiate between the
subgroups.
Design 2.1
[Diagram: Patient 1 and Patient 2 together on Array 1; x 30]
Design 2.2
[Diagram: Patient 1 vs Reference on Array 1; x 60]
Design 2.3
[Diagram: Patient 1 on GeneChip 1; x 60]
Example 3:
Yeast Time Series
Budding yeast can reproduce sexually by
producing haploid cells through a process
called sporulation. Yeast was placed in a
sporulating medium, samples taken at 7
timepoints from the start of sporulation.
We are interested in identifying genes that
show similar profiles in the timecourse.
Design 3.1
[Diagram: six arrays, each pairing Time 0 with one of Time 1 to Time 6]
Design 3.2
[Diagram: seven arrays pairing consecutive time points: Time 0 with Time 1, Time 1 with Time 2, and so on to Time 5 with Time 6, then Time 6 with Time 0]
Design 3.3
[Diagram: one GeneChip per time point, Time 0 to Time 6]
Class Exercise
Two strains of Staphylococcus aureus:
methicillin-sensitive and methicillin-resistant
Each strain is cultured and then either
treated or untreated with methicillin
Samples are taken at several time points
(0h, 2h, 6h, 10h)
We want to identify genes involved in
methicillin-resistance
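As a starting point for the exercise, the conditions implied by the description can be enumerated as a full factorial layout. This is only a sketch of the condition list, not a proposed answer; the column names and level labels are hypothetical, and it assumes every strain, treatment and time point combination is sampled.

```r
# Enumerate every strain x treatment x time combination (labels are illustrative)
conditions <- expand.grid(
  time_h    = c(0, 2, 6, 10),
  treatment = c("untreated", "methicillin"),
  strain    = c("sensitive", "resistant"),
  stringsAsFactors = FALSE
)

n_conditions <- nrow(conditions)  # 2 strains x 2 treatments x 4 time points
print(conditions)
print(n_conditions)
```

How these conditions are paired on arrays, and how many replicates each one gets, are the design decisions the exercise asks about.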
Population Inference
[Diagram: Population, Sample, Inference]
Confidence
The confidence is the probability of not getting
a false positive result.
It is the probability of accepting the null
hypothesis when the null hypothesis is true.
A false positive result is known as a Type I
Error.
We control for Type I errors explicitly by
selecting an appropriate confidence level
In microarray experiments, we must modify the
confidence level to account for multiplicity
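One common way to account for multiplicity is a Bonferroni correction, sketched below. The slides do not say which correction is intended, so this is an assumption, and the number of genes and the p-values are made up purely for illustration.

```r
# Bonferroni sketch: all numbers here are illustrative assumptions
alpha   <- 0.05    # overall (family-wise) significance level
n_genes <- 10000   # hypothetical number of genes tested on the array

# Per-gene threshold needed to keep the overall level at alpha
per_gene_alpha <- alpha / n_genes

# Equivalent view: adjust the p-values and compare them with alpha
p_values <- c(1e-7, 4e-6, 0.0003, 0.02, 0.6)  # made-up p-values for five genes
adjusted <- p.adjust(p_values, method = "bonferroni", n = n_genes)

print(per_gene_alpha)
print(which(adjusted < alpha))  # indices of genes still significant
```

The per-gene threshold becomes very small, which is why microarray experiments need more replicates than a single test at the 5% level would suggest.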
Power
The power is the probability of not getting a false
negative result.
It is the probability of rejecting the null hypothesis
when the null hypothesis is false.
A false negative result is known as a Type II
Error.
We control the power implicitly via the confidence
level and the experimental design.
Our decision (rows) against the true state (columns):
OUR DECISION        No effect       Effect
Not significant     Correct         Type II error
Significant         Type I error    Correct
Power Analysis
We will use the power.t.test() function in
R to calculate the power of one and two
sample tests
    power.t.test(n, delta, sd,
                 sig.level, power, type,
                 alternative)
Paired: test if the mean difference is different from zero
Unpaired: test if the means of the groups are different
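power.t.test() solves for whichever one of n, delta, sd, sig.level and power is left unspecified. The sketch below leaves power out and supplies the rest. The values of n and sig.level are assumptions chosen for illustration; delta and sd are the values used in the paired example that follows in these slides.

```r
# Solve for power: n, delta, sd and sig.level are given, power is left out.
# n = 8 and sig.level = 0.001 are illustrative assumptions.
res <- power.t.test(n = 8, delta = 1, sd = 0.39,
                    sig.level = 0.001,
                    type = "paired",
                    alternative = "two.sided")

print(res)        # full summary of the calculation
print(res$power)  # the quantity that was solved for
```

For type = "paired", n is the number of pairs and sd is the standard deviation of the within-pair differences; for type = "two.sample", n is the size of each group.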
Paired Experiment
The standard deviation of the underlying normal
distribution equivalent to 40% variability is 0.39
The difference in means is log2(2) = 1
The number of patients we need is:
Power     80%   90%   99%
Number      8     9    11
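A calculation of this kind can be sketched as follows. The significance level is not given on the slide, so sig.level = 0.001 (a level tightened for multiplicity) is an assumption here, and the result depends on it.

```r
# Paired design: number of patients needed at three power levels.
# sig.level = 0.001 is an assumption; the slide does not state it.
powers <- c(0.80, 0.90, 0.99)

n_paired <- sapply(powers, function(p) {
  power.t.test(delta = 1, sd = 0.39, sig.level = 0.001, power = p,
               type = "paired", alternative = "two.sided")$n
})

# power.t.test returns a fractional n; round up to whole patients
patients_paired <- ceiling(n_paired)
print(data.frame(power = powers, patients = patients_paired))
```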
Unpaired Experiment
The standard deviation and difference in
means is the same.
The number of patients we need is:
Power              80%   90%   99%
Group Size           8    10    13
Number              16    20    26
1-Sample Number      8     9    11
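The unpaired version changes only the type argument. In this sketch power.t.test() returns the size of each group, so the total number of patients is twice that. As before, sig.level = 0.001 is an assumption that the slide does not state.

```r
# Unpaired (two-sample) design: group size and total patients.
# sig.level = 0.001 is an assumption; the slide does not state it.
powers <- c(0.80, 0.90, 0.99)

n_group <- sapply(powers, function(p) {
  power.t.test(delta = 1, sd = 0.39, sig.level = 0.001, power = p,
               type = "two.sample", alternative = "two.sided")$n
})

group_size <- ceiling(n_group)  # whole patients per group
total      <- 2 * group_size    # two groups of equal size

print(data.frame(power = powers, group_size = group_size, total = total))
```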
Paired vs Unpaired
In this example, we need at least twice
the patients in the unpaired experiment (16 to 26)
to obtain the same power as the paired
experiment (8 to 11)
Paired experimental design is more
powerful than unpaired experimental
design because the differences between
individuals are factored out in the analysis
Conclusions
Extraneous variability:
Block to avoid confounding variables
Randomisation to avoid bias
Blocked experiments require ANOVA
analyses
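For a blocked design such as the toxicity example, an ANOVA with the blocking factors as extra terms might look like the sketch below. The data are simulated for a single hypothetical gene, the effect sizes are invented, and the model formula is one reasonable choice, not one prescribed by the slides.

```r
# Blocked ANOVA sketch on SIMULATED data for one hypothetical gene
set.seed(42)  # arbitrary seed for reproducibility

# Balanced layout: 2 researchers x 2 print runs x 2 treatments x 2 replicates
design <- expand.grid(
  replicate  = 1:2,
  treatment  = c("Control", "BP"),
  print_run  = c("1", "2"),
  researcher = c("Alison", "Brian")
)

# Simulated log ratios: invented treatment and researcher effects plus noise
design$log_ratio <- rnorm(16, mean = 0, sd = 0.4) +
  ifelse(design$treatment == "BP", 1, 0) +
  ifelse(design$researcher == "Brian", 0.5, 0)

# Treatment effect estimated after accounting for the two blocking factors
fit <- aov(log_ratio ~ treatment + researcher + print_run, data = design)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

Because researcher and print run appear as terms in the model, their variability is separated from the residual instead of inflating it.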
Conclusions
Multiple patient comparisons
Reference samples or Affymetrix technology
enable comparisons.
Number of replicates
Calculate using power analyses.
Computer Practical
Power analysis for population inference test