C. F. Jeff Wu
University of Michigan
(joint work with G. Dyson)
Outline
Current Methods
Proposed Methodology
Analysis Plan
Example
Conclusions
Multiplicity Problem
When we make more than one
comparison in a hypothesis testing
situation, the usual p-value interpretation
breaks down
Control of the familywise error rate is necessary in
order to preserve the nominal type I error rate
Various approaches correct the chance
of making a type I error for multiplicity,
including Tukey, Bonferroni and Holm
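As a rough illustration of two of the corrections named above, the following sketch applies the Bonferroni and Holm rules to a small set of p-values. The p-values and the function names are made up for illustration and are not taken from the talk.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m, where m is the number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: compare the smallest p-value with alpha / m,
    the next with alpha / (m - 1), and so on; stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also not rejected
    return reject


# Illustrative p-values only
pvals = [0.001, 0.012, 0.02, 0.04]
print(bonferroni(pvals))  # [True, True, False, False]
print(holm(pvals))        # [True, True, True, True]
```

Both rules control the familywise error rate at alpha; Holm rejects at least as many hypotheses as Bonferroni, as the example shows.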
Microarray Analysis
Techniques
Westfall-Young step-down (WY)
Significance Analysis of Microarrays
(SAM)
Empirical Bayes (EB)
Bayesian (MCMC)
Mixture Modeling
Dimension reduction techniques
Machine learning
Significance Analysis of
Microarrays (SAM)
Use a t-like statistic
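For reference, the t-like statistic commonly associated with SAM divides the difference in group means by the pooled standard error plus a small positive constant s0. The sketch below assumes that form; the value of s0 passed in is arbitrary here, whereas SAM selects it from the data.

```python
from math import sqrt
from statistics import mean


def sam_statistic(group_a, group_b, s0):
    """t-like statistic: (mean_a - mean_b) / (s + s0),
    where s is the pooled standard error of the mean difference."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    ss = (sum((x - mean_a) ** 2 for x in group_a)
          + sum((x - mean_b) ** 2 for x in group_b))
    s = sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


# Illustrative expression values for one gene (not from the talk)
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(sam_statistic(a, b, s0=0.0))  # ordinary two-sample t-statistic
print(sam_statistic(a, b, s0=0.5))  # shrunk toward zero by s0
```

With s0 = 0 this reduces to the usual equal-variance two-sample t-statistic; a positive s0 damps statistics from genes with very small variance.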
Half-Normal Analysis
Analysis Plan
Robust measures of location and scale
Summary statistic
Two half-normal plots (for upward-regulated and downward-regulated
genes)
Segment determination
Find J, J_NC
Reasonable estimates
Less affected by outliers than mean and SD
Interested in robustness rather than efficiency
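The surviving slide text does not name the robust estimators used. As one common choice, the sketch below assumes the median for location and the scaled median absolute deviation (MAD) for scale, and shows how little they move when a single outlier is introduced. The data are invented for illustration.

```python
from statistics import mean, median, stdev


def mad(values, scale=1.4826):
    """Median absolute deviation, scaled so that it estimates the
    standard deviation when the data are normal."""
    med = median(values)
    return scale * median([abs(x - med) for x in values])


clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean[:-1] + [50.0]  # replace one value with an outlier

for name, data in (("clean", clean), ("contaminated", contaminated)):
    print(name, "mean:", mean(data), "SD:", stdev(data),
          "median:", median(data), "MAD:", mad(data))
```

The mean and SD shift substantially under contamination, while the median and MAD are essentially unchanged, which is the robustness property the slide emphasizes.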
Summary Statistic
Compute quasi two-sample t-statistic
using robust values from above:
c is chosen to minimize
Segment Determination: J
Given k, initialize the null set as the points abs(s)_1 :
abs(s)_k
Regress the null set on the first k half-normal
quantiles (Q_1 : Q_k)
Produce predicted values ŷ_h at the
remaining quantile values (Q_h : h > k)
Compute predicted statistics
with
Find
Sample
Let k = 200, total effects = 500
First 200 ordered positive effects regressed on first
200 half-normal quantiles
Test ordered effects 201 to 500 using absolute
value of predicted statistics
For example, effect 239 is the largest h whose predicted statistic is less than
the t-critical value
So, with k = 200, J
would initially be 239
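The regression step described above can be sketched as follows. This is only an outline under stated assumptions: the plotting positions (i - 0.5)/n, the helper names, and the simulated effects are all illustrative choices, and the predicted statistics and t-critical comparison are not implemented because their formulas do not survive in the slide text.

```python
import random
from statistics import NormalDist


def half_normal_quantiles(n):
    """Half-normal quantiles at assumed plotting positions (i - 0.5) / n."""
    nd = NormalDist()
    return [nd.inv_cdf(0.5 + 0.5 * (i - 0.5) / n) for i in range(1, n + 1)]


def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_beyond_null(abs_effects, k):
    """Regress the k smallest ordered absolute effects on the first k
    half-normal quantiles, then predict at the remaining quantiles."""
    ordered = sorted(abs_effects)
    q = half_normal_quantiles(len(ordered))
    intercept, slope = fit_line(q[:k], ordered[:k])
    predicted = [intercept + slope * qh for qh in q[k:]]
    return ordered, predicted


# Simulated effects standing in for the 500 effects of the slide
random.seed(0)
effects = [abs(random.gauss(0.0, 1.0)) for _ in range(500)]
ordered, predicted = predict_beyond_null(effects, k=200)
print(len(predicted))  # 300 predictions, for effects 201 to 500
```

Observed effects that sit well above their predicted values are the candidates for departure from the null line.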
Example
J = 3116
Find J_NC
Will test all effects after J using same
statistics
To adjust for multiple testing, define NC
as the number of consecutive significant
effects necessary to call all subsequent
effects significant
Use the Bonferroni adjustment (does not
require independence):
Instead of doing thousands of
comparisons, only need to do NC to
determine significance
Define
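The consecutive-run idea can be sketched as below. This is a hypothetical reading, not the talk's definition: it assumes each effect after J has a p-value, tests each at the Bonferroni level alpha / NC, and reports the position where the first run of NC consecutive significant effects begins. Whether the start or the end of that run is the reported cutoff is an assumption of this sketch.

```python
def first_significant_run(pvalues, nc, alpha=0.05):
    """Return the index (within pvalues) where the first run of nc
    consecutive p-values at or below alpha / nc begins, or None."""
    threshold = alpha / nc  # Bonferroni adjustment over nc tests
    run = 0
    for idx, p in enumerate(pvalues):
        run = run + 1 if p <= threshold else 0
        if run == nc:
            return idx - nc + 1
    return None


# Illustrative p-values for the effects following J (not from the talk)
pvalues = [0.2, 0.001, 0.3, 0.004, 0.002, 0.001, 0.0005]
print(first_significant_run(pvalues, nc=3))  # 3
```

Only NC comparisons enter the adjustment, so the per-test threshold stays far less stringent than a Bonferroni correction over thousands of genes.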
Conclusions
Proposed a new method for determining
differential expression in genes
Dealt with the multiplicity problem by using
only a small subset of genes
Can extend to other large data sets
Allow scientists to play a role in sequential
decision making
Incorporate a priori knowledge of experiment
with selection of c