
Use of the Half-Normal Probability Plot to Identify Significant Effects for Microarray Data

C. F. Jeff Wu
University of Michigan
(joint work with G. Dyson)

Outline
Current Methods
Proposed Methodology
Analysis Plan
Example
Conclusions

What are microarrays?
Two major types:
Oligonucleotide gene chips
Spotted glass arrays

Oligonucleotide chips: perfect match (PM) and mismatch (MM) probes are spotted onto a gene chip
~20 probes make up a probe set (or gene)
The MM probe for each gene has the middle base set to the complement of its PM probe
Labeled RNA corresponding to the PM probes is hybridized to the chip

Glass arrays involve the competitive hybridization of two RNA pools to cDNA spotted onto a glass slide
Typically thousands of genes on a slide

Multiplicity Problem
When we make more than one comparison in a hypothesis-testing situation, the usual interpretation of individual p-values breaks down
Control of the family-wise error rate is necessary to preserve the nominal type I error rate
Various approaches correct for the multiplicity-inflated chance of a type I error, including Tukey, Bonferroni, and Holm
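For reference, here is a minimal sketch (not from the slides) of the Bonferroni and Holm adjustments applied to a vector of raw p-values; the function names and the example p-values are hypothetical.

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvals)
    return np.minimum(np.asarray(pvals, dtype=float) * m, 1.0)

def holm(pvals):
    """Holm step-down adjustment: multiply the i-th smallest p-value by (m - i)
    and enforce monotonicity."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adj = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * p[idx])
        adj[idx] = min(running_max, 1.0)
    return adj

# Hypothetical raw p-values, purely for illustration
raw = [0.001, 0.012, 0.030, 0.200]
print(bonferroni(raw))   # [0.004 0.048 0.12  0.8  ]
print(holm(raw))         # [0.004 0.036 0.06  0.2  ]
```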

Microarray Analysis Techniques
Westfall-Young step-down (WY)
Significance Analysis of Microarrays (SAM)
Empirical Bayes (EB)
Bayesian (MCMC)
Mixture modeling
Dimension reduction techniques
Machine learning

Westfall-Young (WY)
Compute ranks r_1, ..., r_k of the original test statistics such that |t_{r_1}| >= |t_{r_2}| >= ... >= |t_{r_k}|
Construct balanced permutations of the samples; for each permutation b, compute the same test statistics t_1^(b), ..., t_k^(b)
Compute u_k^(b) = |t_{r_k}^(b)| and u_j^(b) = max(u_{j+1}^(b), |t_{r_j}^(b)|) for j = k-1, ..., 1
Repeat for B permutations and calculate the adjusted p-value as p~_{r_j} = #{b : u_j^(b) >= |t_{r_j}|} / B, enforcing monotonicity in j
Less conservative than Bonferroni
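A rough sketch of the step-down maxT bookkeeping behind the adjusted p-value above. It assumes a two-group design and uses ordinary (unbalanced) label permutations with a plain t-statistic rather than the balanced permutations on the slide; the function names are mine.

```python
import numpy as np

def t_stats(x, labels):
    """Two-sample t statistics per gene; x is a genes-by-samples array,
    labels is a 0/1 vector of condition assignments."""
    labels = np.asarray(labels)
    a, b = x[:, labels == 0], x[:, labels == 1]
    num = b.mean(axis=1) - a.mean(axis=1)
    den = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                  b.var(axis=1, ddof=1) / b.shape[1])
    return num / den

def westfall_young_maxT(x, labels, B=1000, seed=None):
    """Step-down maxT adjusted p-values via label permutation."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(t_stats(x, labels))
    order = np.argsort(-t_obs)              # ranks r_1, ..., r_k: largest |t| first
    k = len(t_obs)
    exceed = np.zeros(k)
    for _ in range(B):
        perm = rng.permutation(np.asarray(labels))   # simple permutation; the slide uses balanced ones
        t_b = np.abs(t_stats(x, perm))[order]
        u = np.maximum.accumulate(t_b[::-1])[::-1]   # u_j = max(|t_{r_j}|, ..., |t_{r_k}|)
        exceed += (u >= t_obs[order])
    p_sorted = np.maximum.accumulate(exceed / B)     # enforce monotonicity down the ranks
    p_adj = np.empty(k)
    p_adj[order] = p_sorted
    return p_adj
```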

Significance Analysis of Microarrays (SAM)
Use a t-like statistic: d(i) = (mean difference for gene i) / (s(i) + s_0), where s_0 is a small fudge constant added to the gene-specific standard error
Use the balanced permutation method from the previous slide to estimate the null distribution, assuming all effects are null
Call genes that fall outside the threshold bars significant
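The slide does not reproduce the statistic itself; the sketch below uses the published SAM form (mean difference over a pooled standard error plus the fudge constant s_0), with s_0 supplied rather than optimized and the delta-threshold machinery omitted. The function name and argument layout are illustrative.

```python
import numpy as np

def sam_d(x, labels, s0):
    """SAM-style relative difference per gene: mean difference over the
    gene-specific pooled standard error plus the fudge constant s0."""
    labels = np.asarray(labels)
    a, b = x[:, labels == 0], x[:, labels == 1]
    na, nb = a.shape[1], b.shape[1]
    diff = b.mean(axis=1) - a.mean(axis=1)
    pooled = ((na - 1) * a.var(axis=1, ddof=1) +
              (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    s = np.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return diff / (s + s0)
```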

Half-Normal Analysis

Microarray Specific Problem

Analysis Plan
Robust measures of location and scale
Summary statistic
Two half-normal plots (for upward-regulated and downward-regulated genes)
Segment determination: find J and J^NC, dividing the effects into insignificant, borderline, and significant segments
Repeat the procedure, using J^NC as the base

Robust Measures of Location and Scale
Perform transformation and suitable normalization
Compute the median and median absolute deviation (MAD) for each gene
Reasonable estimates; less affected by outliers than the mean and SD
Interested in robustness rather than efficiency
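A small sketch of the per-gene robust summaries, assuming a genes-by-samples matrix of transformed, normalized values; the 1.4826 consistency factor is a common convention, not something stated on the slide, and the function name is mine.

```python
import numpy as np

def robust_location_scale(x):
    """Per-gene median and median absolute deviation (MAD).
    x: genes-by-samples array of transformed, normalized expression values."""
    med = np.median(x, axis=1)
    mad = np.median(np.abs(x - med[:, None]), axis=1)
    return med, 1.4826 * mad   # 1.4826 scales MAD to be consistent with the normal SD (optional convention)
```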

Summary Statistic
Compute a quasi two-sample t-statistic ss_i for each gene using the robust values from above
c is chosen to minimize a variability criterion over the middle 100(1 - 2α)% of the ss_i
Tusher et al. (2001) chose c to minimize the coefficient of variation
Efron et al. (2001) used the 90th percentile of the gene standard error estimates for c
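The exact form of the quasi t-statistic and of the quantity that c minimizes are not legible above, so the sketch below assumes a plausible form: difference of group medians over the sum of group MADs plus c, with c picked by a grid search over a caller-supplied dispersion criterion applied to the middle 100(1 - 2*alpha)% of the statistics. All names and the specific functional form are assumptions.

```python
import numpy as np

def quasi_t(med_a, mad_a, med_b, mad_b, c):
    """Assumed form of the robust quasi two-sample statistic: difference of
    group medians over the sum of group MADs plus the constant c."""
    return (med_b - med_a) / (mad_a + mad_b + c)

def choose_c(med_a, mad_a, med_b, mad_b, candidates, criterion, alpha=0.25):
    """Grid search for c: evaluate a caller-supplied dispersion criterion on the
    middle 100*(1 - 2*alpha)% of the sorted statistics and keep the minimizer."""
    best_c, best_val = None, np.inf
    for c in candidates:
        ss = np.sort(quasi_t(med_a, mad_a, med_b, mad_b, c))
        n = len(ss)
        lo, hi = int(np.floor(alpha * n)), int(np.ceil((1 - alpha) * n))
        val = criterion(ss[lo:hi])
        if val < best_val:
            best_c, best_val = c, val
    return best_c
```

A coefficient-of-variation style criterion in the spirit of Tusher et al. (2001), or one based on the Efron et al. (2001) percentile rule, could be plugged in for criterion.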

Two Half-Normal Plots
Construct two half-normal plots: one for the p positive and one for the r negative ss_i
Run the procedure separately on each set
Denote the ordered p positive effects by abss_(1) <= abss_(2) <= ... <= abss_(p)
Plot abss_(i) against half-normal distribution quantiles, i.e. the points (Φ^{-1}(0.5 + 0.5(i - 0.5)/p), abss_(i))
Goal: obtain a set of noise effects, which yields a baseline against which to test the rest of the effects
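A direct sketch of the plotting positions just defined: the ordered absolute statistics paired with the half-normal quantiles Φ^{-1}(0.5 + 0.5(i - 0.5)/p). The function name is illustrative.

```python
import numpy as np
from scipy.stats import norm

def half_normal_points(ss):
    """Return (half-normal quantile, ordered |ss|) pairs for one plot,
    given the positive (or the negative) summary statistics."""
    abss = np.sort(np.abs(np.asarray(ss, dtype=float)))
    p = len(abss)
    i = np.arange(1, p + 1)
    q = norm.ppf(0.5 + 0.5 * (i - 0.5) / p)   # Phi^{-1}(0.5 + 0.5(i - 0.5)/p)
    return q, abss
```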

Segment Determination: J
Given k, initialize the null set as the points abss_(1), ..., abss_(k)
Regress the null set on the first k half-normal quantiles (Q_1, ..., Q_k)
Produce predicted values ŷ_h at the remaining quantile values (Q_h, h > k)
Compute a predicted statistic for each remaining effect from abss_(h) and ŷ_h
Find the largest h whose predicted statistic is less than the t critical value; this gives the initial J^k

Segment Determination: J (cont)
The initial null set of k genes becomes k + m (= J^k) null genes
Re-do the segment determination procedure, using the k + m genes as the base null set
Continue until no new genes are added
Do this for each k less than p - 1 and store the end point J^k
Set J to the most frequent J^k
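A sketch of the iteration described on the last two slides. It assumes (the slide formulas are not legible) that the predicted statistic is the prediction residual abss_(h) - ŷ_h standardized by its prediction standard error from the least-squares fit, and that J^k is the largest h whose statistic stays below the t critical value; the pass is repeated on the enlarged null set until nothing new is added. Function names are mine.

```python
import numpy as np
from scipy import stats

def one_pass(q, abss, k, alpha=0.05):
    """Fit abss_(1..k) on quantiles Q_1..Q_k, then return the largest h > k whose
    standardized prediction residual stays below the t critical value (else k)."""
    qk, yk = q[:k], abss[:k]
    X = np.column_stack([np.ones(k), qk])
    beta, *_ = np.linalg.lstsq(X, yk, rcond=None)
    resid = yk - X @ beta
    s2 = resid @ resid / (k - 2)
    tcrit = stats.t.ppf(1 - alpha, df=k - 2)
    qbar, sxx = qk.mean(), ((qk - qk.mean()) ** 2).sum()
    j_hat = k
    for h in range(k, len(abss)):                 # 0-based h corresponds to ordered effect h + 1
        y_pred = beta[0] + beta[1] * q[h]
        se_pred = np.sqrt(s2 * (1 + 1.0 / k + (q[h] - qbar) ** 2 / sxx))
        if abs((abss[h] - y_pred) / se_pred) < tcrit:
            j_hat = h + 1                         # largest such h, 1-based
    return j_hat

def segment_determination(q, abss, k, alpha=0.05):
    """Repeat one_pass with the enlarged null set until no new genes are added."""
    j = k
    while True:
        j_new = one_pass(q, abss, j, alpha)
        if j_new == j:
            return j
        j = j_new
```

Per the slides, running this for every starting k < p - 1 and setting J to the most frequent end point completes the segment determination.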

Sample
Let k = 200, total effects = 500
The first 200 ordered positive effects are regressed on the first 200 half-normal quantiles
Test ordered effects 201 to 500 using the absolute value of the predicted statistics
For example, effect 239 is the largest h whose statistic is less than the t critical value, so J^200 would initially be 239
Redo the above with k = 239 effects, so we test effects 240 to 500
Say statistic 242 is the largest h less than the t critical value based on the new regression line, so the new J^200 would be 242
Redo the above again with k = 242 and test effects 243 to 500
No statistics are less than the t critical value, so no new effects are added and J^200 is 242

Example
J = 3116

Find J^NC
Test all effects after J using the same statistics
To adjust for multiple testing, define NC as the number of consecutive significant effects necessary to call all subsequent effects significant
Use the Bonferroni adjustment (which does not require independence)
Instead of doing thousands of comparisons, we only need to do NC of them to determine significance
Define J^NC as the change point beyond which all effects are declared significant
Now we have identified the change points in the graph for segment detection
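A sketch of the scan for J^NC as described above: starting just past J, look for the first run of NC consecutive individually significant effects and treat everything from that run onward as significant. The Bonferroni-based formula for NC is not legible on the slide, so NC is an input here, and the reading of J^NC as the change point before that run is an assumption; the function name is mine.

```python
def find_j_nc(significant, j_hat, nc):
    """Scan the ordered effects after j_hat for the first run of nc consecutive
    individually significant effects.  significant[h] is True if ordered effect
    h + 1 (1-based) is significant.  Returns the 1-based index of the last effect
    before that run (the assumed J^NC), or None if no such run occurs."""
    run_start, run_len = None, 0
    for h in range(j_hat, len(significant)):
        if significant[h]:
            run_len += 1
            if run_len == 1:
                run_start = h
            if run_len == nc:
                return run_start          # effects run_start + 1 onward are called significant
        else:
            run_len = 0
    return None
```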

Example: Downward-Regulated Speed Mouse Data

Example: Downward-Regulated Speed Mouse Data (cont)
(Plot showing J and J^NC)

Error Rate Estimation: FDR
The False Discovery Rate (FDR) is the expected proportion of falsely rejected hypotheses
Permute the condition labels, maintaining balance
Example: 8 replicates in conditions A and B; each permuted A and B group will have 4 replicates from A and 4 from B
Compute the robust statistics, keeping the same c as in the actual data
Determine the average number of effects that fall above the positive or below the negative boundary of the significant sets
Divide that number by the total number of effects called significant
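A sketch of the permutation FDR estimate described above, assuming a two-group design with an even number of replicates per condition. Balanced relabelings (half of each permuted group drawn from each condition, as in the 8-replicate example) are enumerated, the statistic is recomputed with the same c via a caller-supplied function, and the average count of effects beyond the significance boundaries is divided by the number of effects actually called significant. All function and parameter names are illustrative.

```python
import numpy as np
from itertools import combinations

def balanced_labelings(n_per_group):
    """Yield balanced relabelings for two conditions with n_per_group replicates
    each (assumed even): every permuted group takes half its samples from A and
    half from B, as in the 8-replicate example above."""
    half = n_per_group // 2
    a_idx = range(n_per_group)                       # original condition A samples
    b_idx = range(n_per_group, 2 * n_per_group)      # original condition B samples
    for a_pick in combinations(a_idx, half):
        for b_pick in combinations(b_idx, half):
            labels = np.zeros(2 * n_per_group, dtype=int)
            labels[list(a_pick) + list(b_pick)] = 1
            yield labels

def estimate_fdr(x, statistic, lower, upper, n_called, n_per_group):
    """statistic(x, labels) recomputes the robust summary statistics with the same
    c as the real data; lower/upper are the negative and positive significance
    boundaries; n_called is the number of effects declared significant."""
    counts = []
    for labels in balanced_labelings(n_per_group):
        ss = statistic(x, labels)
        counts.append(np.sum((ss > upper) | (ss < lower)))
    return np.mean(counts) / n_called
```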

Speed Data: Analysis and Comparison
WY found 8 genes significant, with Type I error = 0.05

Lemon Data: Analysis and Comparison
WY found 253 genes significant, with Type I error = 0.05

Conclusions
Proposed a new method for determining differential expression in genes
Dealt with the multiplicity problem by using only a small subset of genes
Can extend to other large data sets
Allows scientists to play a role in sequential decision making
Incorporates a priori knowledge of the experiment through the selection of c
