
Multiple hypothesis testing using

the excess discovery count and alpha-investing rules


Dean P. Foster and Robert A. Stine

Department of Statistics

The Wharton School of the University of Pennsylvania

Philadelphia, PA 19104-6340

April 7, 2005

Abstract
We propose an adaptive, sequential methodology for testing multiple hypotheses.
Our methodology consists of a new criterion, the excess discovery count (EDC), and a
new class of testing procedures that we call alpha-investing rules. The excess discovery
count is the difference between the number of correctly rejected null hypotheses and a
fraction of the total number of rejected hypotheses. EDC shares many properties with
the false discovery rate (FDR), but is adapted to testing a sequence of hypotheses rather
than a fixed set. Because EDC controls the count of incorrectly rejected hypotheses
rather than a ratio, we are able to prove that a wide class of testing procedures that
we call alpha-investing rules control EDC. Alpha-investing rules mimic alpha-spending
rules used in sequential trials, but possess a key difference. When a test rejects a null
hypothesis, alpha-investing rules earn additional probability toward testing subsequent
hypotheses. Alpha-investing rules allow one to incorporate domain knowledge into the
testing procedure and improve the power of the tests.
Key words and phrases: Bonferroni method, false discovery rate (FDR), family wide
error rate (FWER), multiple comparison procedure.
All correspondence regarding this manuscript should be directed to Prof. Stine at the address shown
with the title. He can be reached via e-mail at stine@wharton.upenn.edu.

EDC and Alpha-investing 2

1 Introduction
We propose an adaptive, sequential methodology for testing multiple hypotheses. Our
approach works in the usual setting in which one has a batch of several hypotheses
as well as cases in which the hypotheses arrive sequentially in a stream. Streams of
hypothesis tests arise naturally in a variety of contemporary modeling applications, such
as genomics and variable selection for large models. In contrast to the comparatively
well-defined problems that spawned multiple comparison procedures such as Tukey's
studentized range, these applications can involve thousands of tests. For example,
microarrays lead one to compare a control group to a treatment group using measured
differences on over 6,000 genes (Dudoit, Shaffer and Boldrick, 2003). In contrast, the
example used by Tukey to motivate the problems of multiple comparisons compares
the means of only 6 groups (Tukey, 1953, available in Braun (1994)). If one considers
the possibility for interactions, then the number of tests is virtually infinite. Because
our approach allows the testing to proceed sequentially, the choice of future hypotheses
can depend upon the results of previous tests. Thus, having discovered differences in
certain genes, an investigator could, for example, direct attention toward related genes
identified by common transcription factor binding sites (Gupta and Ibrahim, 2005).
Our methodology has two key components, a criterion and a procedure. For multiple
testing, we distinguish criteria that control the number of Type I errors from
testing procedures. We call our new criterion the excess discovery count (EDC). EDC
tracks the expected number of true rejections among the rejected hypotheses. To control
EDC, a test procedure must guarantee that the expected count of true rejections
exceeds a chosen fraction of the number of rejected hypotheses. For example, one might
want to guarantee that at least 95% of the rejected hypotheses were rejected correctly.
Although one can use EDC to control traditional tests, the advantage of this criterion
is that it permits one to control adaptive testing procedures in which the choice of the
next hypothesis to test depends on previous results.
The second component of our methodology is a class of adaptive testing procedures
that we call alpha-investing rules. We show that testing procedures in this class control
EDC. Alpha-investing rules allow one to test a possibly infinite stream of hypotheses,
accommodate dependent tests, and incorporate domain knowledge. Alpha-investing
rules mimic alpha-spending rules that are commonly used in clinical trials. Unlike
alpha-spending rules, however, alpha-investing rules treat each test as an "investment."
Each test has a cost, but can generate a profit in the form of an increase in the
amount of Type I error available for subsequent tests.
The rest of this paper develops as follows. We first review several ideas from the
literature on multiple comparisons, particularly those related to the family wide error
rate and the false discovery rate. With these ideas in place, we define EDC in Section 3
and alpha-investing rules in Section 4. In Section 5, we show that alpha-investing rules
control a generalized version of EDC. We give several examples of testing a sequence
of hypotheses using alpha-investing rules in Section 6. We close in Section 7 with a
brief summary discussion, and defer the single proof to the appendix.

2 Criteria and Procedures


We begin with a brief review of criteria and procedures used to test a collection of
hypotheses. To set the stage for describing EDC, we review the two most important
criteria commonly applied in testing multiple hypotheses: the family wide error rate
and the false discovery rate. These criteria generalize the notion of the Type I error
rate ($\alpha$-level) to tests of several hypotheses and are often confused with testing
procedures. The false discovery rate is a criterion that one might design a testing procedure
to satisfy, but is not itself a testing procedure. Just as there are many $\alpha$-level tests of a
simple hypothesis, so too are there various multiple testing procedures. We confine our
attention to two, the Bonferroni procedure and step-up/step-down tests. These procedures
are most closely related to and suggestive of the alpha-investing rules developed
in Section 4.
Suppose that we have a set of $m$ null hypotheses $\mathcal{H}(m) = \{H_1, H_2, \ldots, H_m\}$ that
specify values for parameters $\theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$. Each parameter $\theta_j$ can be scalar or
vector-valued, and $\Theta$ denotes the space of parameter values. In the most familiar case,
each null hypothesis specifies that a parameter is zero, $H_j : \theta_j = 0$. We describe the
situation in which every hypothesis has this form and is true as the "null model."
We follow the standard notation for labeling the true and false rejections as shown
in Table 1, which is taken from Benjamini and Hochberg (1995). Assume that $m_0$ of the
null hypotheses in $\mathcal{H}(m)$ are true. The observable statistic $R(m)$ counts how many of
these $m$ hypotheses are rejected. The unobservable random variable $V^\theta(m)$ denotes the
number of false positives among the $m$ tests, those cases in which the testing procedure
incorrectly rejects a true null hypothesis. Similarly, $S^\theta(m) = R(m) - V^\theta(m)$ counts the

Table 1: Counts of the number of null hypotheses that are true and false, displayed as sums
of unobserved random variables. The marginal random variable $R(m)$ that counts the total
number rejected is observable, but internal counts such as $V^\theta(m)$ depend upon $\theta$.

                              Claim
State          Accept $H_0$      Reject $H_0$      Total
True $H_0$     $U^\theta(m)$     $V^\theta(m)$     $m_0$
False $H_0$    $T^\theta(m)$     $S^\theta(m)$     $m - m_0$
Total          $m - R(m)$        $R(m)$            $m$

number of correctly rejected null hypotheses. We index these random variables with a
superscript $\theta$ to distinguish them from a statistic such as $R(m)$; $V^\theta(m)$ and $S^\theta(m)$ are
not observable without $\theta$. For a null model, $m_0 = m$, $V^\theta(m) = R(m)$ and $S^\theta(m) = 0$.
A basic premise of multiple testing is to control the chance for any false rejection.
The family wide error rate (FWER) is the probability of falsely rejecting any null
hypothesis from $\mathcal{H}(m)$, regardless of the values of the underlying parameters,
$$\mathrm{FWER}(m) \equiv \sup_{\theta \in \Theta} P_\theta\bigl(V^\theta(m) \ge 1\bigr). \tag{1}$$
An important special case is control of FWER under the null model. We refer to this
criterion as the size of a procedure,
$$\mathrm{Size}(m) = P_0\bigl(V^\theta(m) \ge 1\bigr), \tag{2}$$
where $P_0$ denotes the probability measure under the null model. All of the procedures
that we describe control $\mathrm{Size}(m)$, but not all control the more general FWER.
The Bonferroni procedure is familiar and represents an important benchmark for
comparison. Let $p_1, \ldots, p_m$ denote the p-values of tests of $H_1, \ldots, H_m$. Given a chosen
level $0 < \alpha < 1$, the usual Bonferroni procedure rejects those $H_j$ for which $p_j \le \alpha/m$.
Let the indicators $V_j^\theta \in \{0, 1\}$ track incorrect rejections; $V_j^\theta = 1$ if $H_j$ is incorrectly
rejected and is zero otherwise. Then $V^\theta(m) = \sum_j V_j^\theta$ and the inequality
$$P_\theta\bigl(V^\theta(m) \ge 1\bigr) \le \sum_{j=1}^{m} P_\theta\bigl(V_j^\theta = 1\bigr) \le \alpha \tag{3}$$
shows that this procedure controls $\mathrm{FWER}(m) \le \alpha$. More generally, one need not
distribute $\alpha$ equally over $\mathcal{H}(m)$; the procedure only requires that the sum of the $\alpha$-levels
is not more than $\alpha$. For example, alpha-spending rules allocate $\alpha$ over a collection
of hypotheses with a larger share given to hypotheses of greater interest. Although it
controls FWER, the Bonferroni procedure is often criticized for having little power
compared to other methods. Clearly, its power decreases as $m$ increases because the
threshold $\alpha/m$ for detecting a significant effect decreases.
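The Bonferroni rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `bonferroni_reject` and the example p-values are made up here and do not come from the paper.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return the indices j of hypotheses with p_j <= alpha / m."""
    m = len(p_values)
    threshold = alpha / m  # equal share of alpha for each of the m tests
    return [j for j, p in enumerate(p_values) if p <= threshold]


# Illustrative p-values (made up for this example).
example_p = [0.001, 0.012, 0.03, 0.2, 0.6]
print(bonferroni_reject(example_p))  # threshold is 0.05 / 5 = 0.01
```

Only the first p-value falls below the common threshold, which shows how quickly the rule becomes conservative as the number of tests grows.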
To obtain more power when some null hypotheses are false but still control FWER,
Holm (1979) introduced the following so-called step-down testing procedure. Order
the collection of $m$ hypotheses so that the p-values of the associated test statistics are
sorted from smallest to largest (putting the most significant first),
$$p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}.$$
The test of $H_{(1)}$ has p-value $p_{(1)}$, the test of $H_{(2)}$ has p-value $p_{(2)}$ and so forth.
Holm's procedure rejects those hypotheses $H_{(j)}$ for which $p_{(j)}$ is less than an increasing
sequence of thresholds. The procedure first compares the smallest p-value
to the Bonferroni threshold. If $p_{(1)} > \alpha/m$, the procedure stops and does not reject
any hypothesis. Consequently, $\mathrm{Size}(m) \le \alpha$. If $p_{(1)} \le \alpha/m$, the procedure rejects
$H_{(1)}$ and moves on to test $H_{(2)}$. Rather than compare $p_{(2)}$ to $\alpha/m$, however,
Holm's procedure compares $p_{(2)}$ to a larger threshold, $\alpha/(m - 1)$. In general, if we
define $j_d = \min\{j : p_{(j)} > \alpha/(m - j + 1)\}$, then Holm's step-down procedure rejects
$H_{(1)}, \ldots, H_{(j_d - 1)}$. Because of the nesting, this testing procedure is closed in the sense
of Marcus, Peritz and Gabriel (1976) and hence controls $\mathrm{FWER}(m) \le \alpha$. Obviously,
when compared to using the Bonferroni threshold for each p-value, Holm's method has
larger power. The improvement is small, however, when $m$ is large because $\alpha/m$ is so
close to $\alpha/(m - j)$ when testing the smallest p-values.
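A minimal Python sketch of Holm's step-down procedure, under the definition of $j_d$ given above, might look as follows. The function name `holm_reject` and the example p-values are illustrative assumptions, not part of the paper.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down: reject H_(1), ..., H_(j_d - 1), returned as original indices."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])  # most significant first
    rejected = []
    for rank, j in enumerate(order, start=1):
        if p_values[j] > alpha / (m - rank + 1):  # first failure defines j_d
            break
        rejected.append(j)
    return rejected


# Illustrative p-values (made up for this example).
example_p = [0.001, 0.012, 0.03, 0.2, 0.6]
print(holm_reject(example_p))
```

With these p-values the second hypothesis is rejected because 0.012 is compared to $\alpha/(m-1) = 0.0125$ rather than to the Bonferroni threshold 0.01.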
The false discovery rate (FDR) criterion controls the size of a testing procedure
but introduces a different type of control if the null model is rejected. Benjamini
and Hochberg (1995) define FDR as the expected proportion of false positives among
rejected hypotheses,
$$\mathrm{FDR}(m) = E_\theta\!\left( \frac{V^\theta(m)}{R(m)} \,\Big|\, R(m) > 0 \right) P\bigl(R(m) > 0\bigr). \tag{4}$$
For the null model, $R(m) = V^\theta(m)$ and $\mathrm{FDR}(m) = \mathrm{FWER}(m)$. Thus, test procedures
that control $\mathrm{FDR}(m) \le \alpha$ have $\mathrm{Size}(m) \le \alpha$. Under the alternative, $\mathrm{FDR}(m)$ decreases
as the number of false null hypotheses $m - m_0$ increases (Dudoit et al., 2003). As a
result, $\mathrm{FDR}(m)$ becomes easier to control in the presence of non-zero effects,
allowing more powerful procedures. Variations on FDR include pFDR (which drops
the term $P(R > 0)$; Storey, 2002, 2003) and the local false discovery rate fdr(z) (which
estimates the false discovery rate as a function of the size of the test statistic; Efron,
2005a,b). Closer to our work, Meinshausen and Rice (2004) and Meinshausen and
Buehlmann (2004) consider estimates of $m - m_0$, the total number of false null hypotheses
in $\mathcal{H}(m)$.
Benjamini and Hochberg (1995) show that the following so-called step-up testing
procedure controls FDR. First, assume that the p-values are independent and define
$j_u = \max\{j : p_{(j)} \le j\alpha/m\}$. Using the inequality of Simes (1986), they show that the
testing procedure that rejects $H_{(1)}, \ldots, H_{(j_u)}$ controls $\mathrm{FDR}(m) \le \alpha$. This testing procedure
thus controls $\mathrm{Size}(m) \le \alpha$, but does not control FWER for all $\theta$. A similar step-down
procedure that rejects $H_{(1)}, \ldots, H_{(j_d - 1)}$ for $j_d = \min\{j : p_{(j)} > j\alpha/m\}$
also has $\mathrm{FDR}(m) \le \alpha$. Although this step-down procedure has less power than its step-up
cousin (because $j_d - 1 \le j_u$), it has more power than Holm's procedure. Holm's
step-down procedure sets thresholds for the p-values to $\alpha/m, \alpha/(m-1), \alpha/(m-2), \ldots$ whereas a
Simes-based step-down procedure uses the larger thresholds $\alpha/m, 2\alpha/m, 3\alpha/m, \ldots$. A cost of
this greater power is a restriction to independent tests that Holm's procedure does
not require. Subsequent papers (such as Benjamini and Yekutieli, 2001; Sarkar, 1998;
Troendle, 1996) consider situations in which this type of step-up/step-down testing
controls FDR under dependence, but the results obtain only for certain types of dependence.
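The difference between the step-up and the Simes-based step-down procedures can be seen in a short Python sketch. Both functions use the thresholds $j\alpha/m$; the function names and the example p-values are illustrative assumptions rather than anything defined in the paper.

```python
def simes_step_up(p_values, alpha=0.05):
    """Step-up: reject H_(1), ..., H_(j_u), j_u = max{j : p_(j) <= j * alpha / m}."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    j_u = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * alpha / m:
            j_u = rank  # keep the largest rank that passes its threshold
    return order[:j_u]


def simes_step_down(p_values, alpha=0.05):
    """Step-down: stop at the first j with p_(j) > j * alpha / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    rejected = []
    for rank, j in enumerate(order, start=1):
        if p_values[j] > rank * alpha / m:
            break
        rejected.append(j)
    return rejected


# Illustrative p-values (made up): the second smallest misses its threshold,
# but the third smallest passes, so step-up rejects more than step-down.
example_p = [0.005, 0.028, 0.025, 0.5, 0.9]
print(simes_step_up(example_p), simes_step_down(example_p))
```

The step-down variant stops at the first non-rejection, which is what makes it usable as a stopping rule later in the paper; the step-up variant must look at all of the ordered p-values.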

3 The Excess Discovery Count (EDC)


The excess discovery count (EDC) is a new criterion for controlling a multiple testing
procedure. Its form resembles that of FDR, and it too controls an unobservable random
variable. EDC operates in the domain of counts, however, rather than ratios of counts,
and EDC emphasizes the number of correct rejections $S^\theta(m)$ rather than the number
of incorrect rejections $V^\theta(m)$. EDC is the expected difference between the number of
correctly rejected null hypotheses $S^\theta(m)$ and a fraction $0 \le \gamma \le 1$ of the number of
rejected hypotheses $R(m)$ (see Figure 1). For a procedure that tests $\mathcal{H}(m)$, we have
Definition 1. The excess discovery count criterion for testing a set of $m$ hypotheses
is
$$\mathrm{EDC}_{\alpha,\gamma}(m) = E_\theta\bigl[S^\theta(m) - \gamma R(m)\bigr] + \alpha, \qquad 0 < \alpha, \gamma < 1. \tag{5}$$
Typical values for the two tuning parameters $\alpha$ and $\gamma$ are 0.05 and 0.95, respectively.

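Figure 1: EDC controls the gap between the number of true rejections $S^\theta$ and a fraction of
the number of rejected null hypotheses. A strong signal implies most of the null hypotheses
in $\mathcal{H}$ are false.
[Figure 1 plot: vertical axis "Count"; curves $E_\theta R$, $E_\theta S^\theta$ and $\gamma E_\theta R - \alpha$, with EDC marked as a gap; horizontal axis $\theta$ running from "No signal" through "Moderate" to "Strong signal".]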

$\mathrm{FDR}(m)$ controls the expected proportion of false positives $V^\theta(m)/R(m)$ given that
$R(m) > 0$. $\mathrm{EDC}_{\alpha,\gamma}(m)$ instead controls the expected difference in the counts $S^\theta(m) - \gamma R(m)$.
Being a ratio, $0 \le \mathrm{FDR}(m) \le 1$ and hence resembles a conditional probability.
In contrast $\mathrm{EDC}_{\alpha,\gamma}(m)$ need not be positive, let alone lie between 0 and 1.
We are most interested in procedures such as that suggested by Figure 1 for which
EDC is positive. In this figure, the x-axis indicates the amount of signal in the sense of
the proportion of null hypotheses in $\mathcal{H}$ that are false. "Strong signal" implies that many
of the $m$ hypotheses are false, whereas "no signal" implies the null model. We will say
that a multiple testing procedure "controls EDC" if $\mathrm{EDC}_{\alpha,\gamma}(m) \ge 0$. Control of EDC
amounts to showing that the expected count of true rejections is at least $\gamma E_\theta R(m) - \alpha$.
Under the null model, $S^\theta(m) = 0$ so that
$$\mathrm{EDC}_{\alpha,\gamma}(m) = \alpha - \gamma E_0 R(m) \le \alpha - \gamma\,\mathrm{Size}(m).$$
Thus, a procedure that controls $\mathrm{EDC}_{\alpha,\gamma}(m) \ge 0$ also controls $\mathrm{Size}(m) \le \alpha/\gamma$. One can
also use EDC to control FWER. If $\gamma = 1$, control of EDC implies control of FWER
because
$$\mathrm{EDC}_{\alpha,1}(m) \ge 0 \;\Rightarrow\; P_\theta\bigl(V^\theta(m) \ge 1\bigr) \le E_\theta V^\theta(m) \le \alpha.$$
This property suggests that one can think of $\alpha$ as controlling the FWER when $\gamma \approx 1$.
The second tuning parameter $\gamma$ more closely resembles FDR in the sense of controlling
the procedure once it rejects the null model. Assuming that $E_\theta R(m) > 0$, control

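Figure 2: When viewed as controlling the proportion of false positives among rejected null
hypotheses, EDC controls the gap between the ratio of expectations $E_\theta V^\theta / E_\theta R$ and a decreasing
function of the number of rejected null hypotheses. A strong signal in the heuristic
sense here implies most of the null hypotheses in $\mathcal{H}$ are false.
[Figure 2 plot: vertical axis "Proportion"; a level labeled "FWER" together with $\alpha$ and $\gamma$; curves $(1 - \gamma) + \alpha / E_\theta R$ and $E_\theta V^\theta / E_\theta R$; horizontal axis $\theta$ running from "No signal" through "Moderate" to "Strong signal".]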

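of $\mathrm{EDC}_{\alpha,\gamma}(m)$ implies
$$E_\theta\bigl[S^\theta(m) - \gamma R(m)\bigr] + \alpha \ge 0 \;\Rightarrow\; \frac{E_\theta V^\theta(m)}{E_\theta R(m)} \le (1 - \gamma) + \frac{\alpha}{E_\theta R(m)}.$$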

When many hypotheses in $\mathcal{H}(m)$ are false and $R(m)$ is large, most of the control on
the procedure comes from $\gamma$. Figure 2 shows EDC from this "FDR point of view" that
emphasizes the ratio $E_\theta V^\theta(m) / E_\theta R(m)$ rather than counts. The FDR criterion on this
scale is a horizontal line anchored at FWER that controls $E_\theta[V^\theta(m)/R(m)]$ rather
than the ratio of expectations. A criterion that controls the ratio of expectations (rather
than the expectation of the ratio) has been discussed in Benjamini and Hochberg
(1995).
To supplement these sketches, we ran a small simulation. Figure 3 shows simulated
values of FDR and EDC for testing a collection of $m = 200$ hypotheses using three
procedures: a naive, fixed-level test that rejects $H_j$ if $p_j \le \alpha = 0.05$, the step-down
Simes procedure, and the standard Bonferroni procedure. The tested hypotheses
$H_j : \mu_j = 0$ specify the means of 200 normal populations. We set the values of the $\mu_j$ by
sampling a spike-and-slab mixture. The mixture puts $100(1 - \pi_1)\%$ of its probability
in a spike at zero; $\pi_1 = 0$ identifies the null model. The slab of this mixture, the

Figure 3: FDR (left) and EDC (right, with $\alpha = 0.05$ and $\gamma = 0.95$) control the size of test
procedures ($\pi_1 = 0$) and the number rejected as the level of signal $\pi_1$ grows. The lines show
FDR and EDC for the Bonferroni procedure (solid), Simes-based step-down testing (dashed), and
a naive procedure that rejects each hypothesis at level $\alpha = 0.05$ (dotted).
[Figure 3 plot: left panel "FDR", right panel "EDC"; horizontal axis $\pi_1$ from 0.1 to 1 in both panels.]

signal, is a normal distribution, so that
$$\mu_j \sim \begin{cases} 0 & \text{w.p. } 1 - \pi_1, \\ N(0, \sigma^2) & \text{w.p. } \pi_1. \end{cases} \tag{6}$$
We set the variance of the signal component of the mixture to $\sigma^2 = 2 \log m$ so that
the standard deviation of the non-zero $\mu_j$ matches the bound commonly used in hard
thresholding. The test statistics are independent, normally distributed random variables
$Z_j \stackrel{\text{iid}}{\sim} N(\mu_j, 1)$ for which the two-sided p-values are $p_j = 2(1 - \Phi(|Z_j|))$. Given
these p-values, we computed FDR and $\mathrm{EDC}_{0.05,0.95}$ in a simulation with 10,000 trials.
In the simulation, we varied the amount of signal by varying $\pi_1$ from 0 (the null model)
to 1.
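A reduced version of this simulation can be sketched with the Python standard library. The code below is an assumption-laden illustration, not the authors' code: it covers only procedures that test every hypothesis at one fixed level (the naive rule and Bonferroni), uses far fewer trials than the 10,000 reported, and the names `two_sided_p` and `simulate_fixed_level` are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """p = 2 * (1 - Phi(|z|)), written with the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_fixed_level(pi1, level, m=200, alpha=0.05, gamma=0.95,
                         trials=500, seed=0):
    """Monte Carlo estimates of (FDR, EDC) when every H_j is tested at `level`."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * math.log(m))  # slab standard deviation
    ratio_sum = 0.0   # accumulates V/R, counted as 0 when R = 0
    excess_sum = 0.0  # accumulates S - gamma * R
    for _ in range(trials):
        v = s = 0
        for _ in range(m):
            is_signal = rng.random() < pi1
            mu = rng.gauss(0.0, sigma) if is_signal else 0.0
            z = rng.gauss(mu, 1.0)
            if two_sided_p(z) <= level:
                if is_signal:
                    s += 1  # correct rejection
                else:
                    v += 1  # false positive
        r = v + s
        if r > 0:
            ratio_sum += v / r
        excess_sum += s - gamma * r
    return ratio_sum / trials, excess_sum / trials + alpha


m = 200
for name, level in [("Bonferroni", 0.05 / m), ("naive", 0.05)]:
    fdr, edc = simulate_fixed_level(pi1=0.2, level=level, m=m)
    print(f"{name}: FDR ~ {fdr:.3f}, EDC ~ {edc:.3f}")
```

Averaging $V/R$ with the convention $V/R = 0$ when $R = 0$ estimates the product in (4) directly, and the EDC estimate follows definition (5).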
Qualitatively, FDR and EDC perform similarly. The shaded regions in Figure 3
indicate lack of control of the indicated criterion. Bonferroni and step-down testing
control $\mathrm{FWER}(200) \le 0.05$ and $\mathrm{EDC}_{\alpha,\gamma}(200) \ge 0$. Simulated values of these criteria
remain outside of the shaded regions for all values of $\pi_1$. On the other hand, the naive
procedure that tests all 200 hypotheses at level 0.05 produces results that fall into
the shaded region for many values of $\pi_1$. Both FDR and EDC show this procedure
as switching from liberal (shaded region) to conservative at about the same level of
signal, namely $0.6 < \pi_1 < 0.7$. Notice that FDR emphasizes, relatively speaking,
differences among the procedures when the amount of signal is small; as $\pi_1$ nears 1,
FDR falls to zero for all 3 procedures. Dudoit et al. (2003) discuss this aspect of
FDR further. EDC preserves a more uniform scale for various amounts of signal. We
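note also that the Bonferroni procedure produces linear trends in EDC. The slope of
the line seen in the right panel of Figure 3 depends upon the choice of $\gamma$ in $\mathrm{EDC}_{\alpha,\gamma}$.
Conservative methods force $E_\theta V^\theta(m)$ to be small regardless of the presence of signal so
that $E_\theta[S^\theta(m) - \gamma R(m)] + \alpha \approx (1 - \gamma)\,E_\theta R(m) + \alpha$, which grows roughly linearly in $\pi_1$.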

4 Alpha-Investing Rules
Alpha-investing rules provide a framework for devising multiple testing procedures that
control EDC in a dynamic setting that allows streams of hypotheses. Alpha-investing
rules resemble alpha-spending rules such as those often used in sequential clinical trials.
In a sequential trial, investigators routinely monitor the accumulating results for safety
and efficacy. This monitoring leads to a sequence of tests of one (or several) null
hypotheses as the data accumulate. Alpha-spending (or error-spending) rules control
the level of such tests. Given an overall Type I error rate for the trial, such as $\alpha = 0.05$,
alpha-spending rules allocate, or spend, $\alpha$ over a sequence of tests. As Tukey (1991)
writes, "Once we have spent this error rate, it is gone." When repeatedly testing one
null hypothesis $H_0$ in a clinical trial, spending rules guarantee that $P(\text{reject } H_0) \le \alpha$
when $H_0$ is true.
While similar in that they allocate Type I error over multiple tests, alpha-investing
rules differ from alpha-spending rules in the following way. An alpha-investing rule
earns additional probability toward subsequent Type I errors with each rejected hypothesis.
Rather than treating each test as an expense that consumes its Type I
error rate, an alpha-investing rule treats tests as investments, motivating our choice of
name. In keeping with this analogy, we call the Type I error rate available to the rule
its alpha-wealth. As with an alpha-spending rule, an alpha-investing rule can never
spend more than its current alpha-wealth. Unlike an alpha-spending rule, however, an
alpha-investing rule earns an increment in its alpha-wealth each time that it rejects a
null hypothesis. For alpha-investing, Tukey's remark becomes "If we invest the error
rate wisely, we'll earn more for further tests." A procedure that invests its alpha-wealth
in testing hypotheses that are rejected accumulates additional wealth toward subsequent
tests. The more hypotheses that are rejected, the more alpha-wealth it earns. If
the test of $H_j$ is not significant, however, the rule loses the $\alpha$-level invested in this test
and its alpha-wealth decreases. The more wealth a rule invests in testing hypotheses
that are not rejected, the less alpha-wealth remains for subsequent tests.
More specifically, an alpha-investing rule is a function $\mathcal{I}$ that determines the $\alpha$-level
for testing the next hypothesis in a sequence of tests. We assume an exogenous
system external to the investing rule determines the next hypothesis to test. (Though
not part of the investing rule itself, this exogenous system can use the sequence of
rejections $R_j$ to determine the next hypothesis to test.) An alpha-investing rule has
two parameters: the initial alpha-wealth and the amount earned (called the pay-out)
when a null hypothesis is rejected. Let $W(k) \ge 0$ denote the alpha-wealth accumulated
by an investing rule after $k$ tests; $W(0)$ is the initial alpha-wealth. For example, one
might conventionally set $W(0) = 0.05$ or $0.10$. At step $j$, an alpha-investing rule sets
the level for testing $H_j$ to some value $\alpha_j$ up to its current wealth, $0 \le \alpha_j \le W(j-1)$.
The level $\alpha_j$ for testing $H_j$ typically depends upon the sequence of prior outcomes $R_1$,
$R_2, \ldots, R_{j-1}$, and so we write an alpha-investing rule in general as
$$\alpha_j = \mathcal{I}_{W(0),\omega}(R_1, R_2, \ldots, R_{j-1}) = \mathcal{I}_{W(0),\omega}(j). \tag{7}$$
The outcomes of the sequence of tests determine the alpha-wealth $W(j)$ available
for testing $H_{j+1}$. Let $p_j$ denote the p-value of the test of $H_j$. If $p_j \le \alpha_j$, the test rejects
$H_j$. In this case, the investing rule pays $\log\{1/(1 - p_j)\} \approx p_j$ from the invested $\alpha_j$ and
earns a pay-out $\omega$ that is added to its alpha-wealth. If $p_j > \alpha_j$, the procedure does not
reject $H_j$ and its alpha-wealth decreases by $-\log(1 - \alpha_j)$. The change in the alpha-wealth
is thus
$$W(j) - W(j-1) = \begin{cases} \omega + \log(1 - p_j) & \text{if } p_j \le \alpha_j, \\ \log(1 - \alpha_j) & \text{if } p_j > \alpha_j. \end{cases} \tag{8}$$
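The wealth update in (8) is simple enough to sketch in Python. The helper names `max_level` and `update_wealth` are hypothetical. One detail is an assumption of this sketch rather than a statement from the text: because a failed test costs $-\log(1 - \alpha_j)$, which is slightly more than $\alpha_j$, the sketch caps the level at $1 - e^{-W}$ so that the wealth stays non-negative.

```python
import math


def max_level(wealth):
    """Largest level whose failure cost -log(1 - level) does not exceed the wealth."""
    return -math.expm1(-wealth)  # equals 1 - exp(-wealth)


def update_wealth(wealth, alpha_j, p_j, omega):
    """Apply the wealth change in (8) for one test at level alpha_j."""
    if not 0.0 <= alpha_j <= max_level(wealth):
        raise ValueError("alpha_j must lie between 0 and max_level(wealth)")
    if p_j <= alpha_j:
        # Rejection: earn the pay-out omega and pay -log(1 - p_j).
        return wealth + omega + math.log1p(-p_j)
    # No rejection: pay -log(1 - alpha_j).
    return wealth + math.log1p(-alpha_j)


print(update_wealth(0.05, 0.01, 0.5, omega=0.05))    # test fails, wealth shrinks
print(update_wealth(0.05, 0.01, 0.001, omega=0.05))  # test rejects, wealth grows
```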
The appearance of $\log(1 - \alpha)$ and $\log(1 - p)$ in (8) deserves some explanation.
Consider the following "micro-investment" approach to testing a single null hypothesis
$H_0$. Set the initial wealth $W(0) = \alpha$ and assume that the test of $H_0$ returns
p-value $p_0$. Rather than use one test at level $\alpha$, a micro-investment approach uses a
sequence of tests, each risking a small amount $\delta \ll \alpha$ of the total alpha-wealth. First
test $H_0$ at level $\delta$, rejecting $H_0$ if $p_0 \le \delta$. If $p_0 > \delta$, the investing rule pays $\delta$ for the
first test, and then tests $H_0$ conditionally on $p_0 > \delta$ at level $\delta$. This second test rejects
$H_0$ if $\delta < p_0 \le 2\delta - \delta^2$. If this second test does not reject $H_0$, the investing rule again
pays $\delta$ and retests $H_0$, now conditionally on $p_0 > 2\delta - \delta^2$. This process continues until
the investing rule either spends all of its alpha-wealth or rejects $H_0$ on the $k$th attempt
because
$$1 - (1 - \delta)^{k-1} < p_0 \le 1 - (1 - \delta)^k.$$
If the procedure rejects $H_0$ after $k$ tests, then the total of the micro-payments made is
$$k\delta \approx \delta\,\frac{\log(1 - p_0)}{\log(1 - \delta)} \to -\log(1 - p_0) \quad \text{as } \delta \to 0.$$
The increments to the wealth defined in equation (8) essentially treat each test as a
sequence of such micro-level tests.
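The limit of the micro-payments can be checked numerically. The sketch below takes $k$ to be the first attempt at which $p_0 \le 1 - (1 - \delta)^k$ and compares $k\delta$ with $-\log(1 - p_0)$; the function name `micro_payment_total` and the value $p_0 = 0.03$ are illustrative choices, not taken from the paper.

```python
import math


def micro_payment_total(p0, delta):
    """Total paid, k * delta, where k is the first attempt with p0 <= 1 - (1 - delta)**k."""
    k = math.ceil(math.log1p(-p0) / math.log1p(-delta))
    return k * delta


p0 = 0.03
for delta in (1e-2, 1e-4, 1e-6):
    # The second printed value approaches the third as delta shrinks.
    print(delta, micro_payment_total(p0, delta), -math.log1p(-p0))
```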
In the next section, we show that alpha-investing rules that accumulate alpha-wealth
in this way control EDC. The initial alpha-wealth $W(0)$ controls the chance
for rejecting the null model. Under the null model when no hypothesis is rejected, an
investing rule performs like an alpha-spending rule with level $W(0)$ and so
$\mathrm{Size}(m) \le W(0)$. Results described in the next section permit one to make a correspondence
between the parameters $W(0)$ and $\omega$ that characterize an alpha-investing rule and the
parameters $\alpha$ and $\gamma$ that identify EDC. In particular, to control EDC, it will be shown
most natural to associate $W(0)$ with $\alpha$ and $\omega$ with $\gamma$.
Whereas $W(0)$ controls the probability of rejecting the null model, the pay-out
$\omega$ controls how the testing procedure performs once it has rejected the null model.
The notion of compensation for rejecting a hypothesis captured in (8) allows one to
build context-dependent information into the testing procedure. Suppose that the
substantive context suggests that the first few hypotheses are most likely to be those
that are rejected and that false hypotheses come in clusters. In this setting, one might
consider using an alpha-investing rule like the following. Assume that the last rejected
hypothesis is $H_{k^*}$. If false hypotheses are clustered, an alpha-investing rule should
invest most of its wealth $W(k^*)$ available after rejecting $H_{k^*}$ in testing $H_{k^*+1}$. A rule
that does this is
$$\mathcal{I}_{W(0),\omega}(k) = \frac{6}{\pi^2}\,\frac{W(k^*)}{(k - k^*)^2}, \qquad k = k^* + 1, \ldots, \min\{j : j > k^*, R_j = 1\}. \tag{9}$$
This rule invests $6/\pi^2 \approx 0.6$ of its wealth in testing $H_1$ or the null hypothesis $H_{k^*+1}$
that follows a rejected hypothesis. The $\alpha$-level falls off rapidly at the rate $1/k^2$ as more
subsequent hypotheses are tested and not rejected. If the substantive insight is correct
and the false null hypotheses are clustered, then tests of hypotheses like $H_1$ or $H_{k^*+1}$
represent "good investments." An example in Section 6 illustrates these ideas.
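To make rule (9) concrete, the following Python sketch runs it over a short stream of p-values, updating the wealth with (8). It is a sketch under assumptions, not the authors' implementation: the name `run_clustered_rule` and the example p-values are hypothetical, $k^* = 0$ is used before any rejection, and the level is capped at $1 - e^{-W}$ so that a failed test can never drive the wealth below zero.

```python
import math


def run_clustered_rule(p_values, w0=0.05, omega=0.05):
    """Test H_1, H_2, ... in order with levels from (9); return (rejected, final wealth)."""
    wealth = w0        # current alpha-wealth W(k)
    wealth_star = w0   # W(k*), wealth just after the last rejection
    k_star = 0         # index of the last rejected hypothesis (0 = none yet)
    rejected = []
    for k, p in enumerate(p_values, start=1):
        level = (6.0 / math.pi ** 2) * wealth_star / (k - k_star) ** 2
        level = min(level, -math.expm1(-wealth))  # assumed cap: keep wealth >= 0
        if p <= level:
            wealth += omega + math.log1p(-p)      # earn omega, pay -log(1 - p)
            rejected.append(k)
            k_star, wealth_star = k, wealth       # restart the 1/k^2 schedule
        else:
            wealth += math.log1p(-level)          # pay -log(1 - level)
    return rejected, wealth


# Illustrative stream (made up): two early signals followed by nulls.
print(run_clustered_rule([0.01, 0.02, 0.9, 0.8, 0.7]))
```

Each rejection resets the schedule, so the hypothesis immediately after a rejection is again tested with about 60% of the wealth available at that point.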
While it is relatively straightforward to devise investing rules, it may be difficult
a priori to order the hypotheses in such a way that those most likely to be rejected
come first. Such an ordering relies heavily on the structure of the specific testing
situation. Another complication is the construction of tests that provide the p-values
that determine the alpha-wealth of an investing rule according to (8). In order to show
that a procedure controls EDC, we require a test of $H_j$ to have the property that
$$\forall\, \theta \in \Theta, \quad E_\theta\bigl(V_j^\theta \mid R_{j-1}, R_{j-2}, \ldots, R_1\bigr) \le \alpha_j. \tag{10}$$
This condition amounts to requiring that, conditionally on having either accepted or
rejected the prior $j - 1$ hypotheses, the test of $H_j$ is done at level no higher than the
nominal choice $\alpha_j$. The tests need not be independent.

Remark. These procedures only require that the test of $H_j$ maintain the stated $\alpha$-level
conditionally on the binary random variables $R_1$, $R_2, \ldots, R_{j-1}$. In particular, we
note that the test is not conditioned on the test statistic (such as a z-score) or parameter
estimate. Adaptive testing in a group sequential trial (e.g. Lehmacher and Wassmer,
1999) uses the information on the observed z-statistic at the first look. Tsiatis and
Mehta (2003) show that using this information leads to a less powerful test compared
to traditional group sequential tests that only look at acceptance at the first look.

5 Alpha-Investing Rules Control EDC


An important extension of EDC generalizes this criterion to an arbitrary number of
hypotheses. This version of the criterion replaces the fixed count of hypotheses in the
definition (5) of $\mathrm{EDC}_{\alpha,\gamma}(m)$ by an arbitrary stopping time.
Definition 2. The excess discovery count of a procedure for testing a stream of
hypotheses $H_1, H_2, \ldots$ is
$$\mathrm{EDC}_{\alpha,\gamma} = \inf_{\theta \in \Theta}\, \inf_{M \in \mathcal{M}} E_\theta\bigl[S^\theta(M) - \gamma R(M)\bigr] + \alpha, \tag{11}$$
where $\mathcal{M}$ is the set of stopping times with finite expectation.
The condition on $M$ forces $S^\theta(M) \le R(M) \le M$ and so implies that both $E_\theta R(M)$
and $E_\theta S^\theta(M)$ are bounded. Because step-up testing halts after the last significant
test (which is not a stopping time), this extension of EDC does not apply to such
procedures. In what follows, we will concentrate then on step-down procedures.
We offer two observations on this generalized criterion. First, EDC drifts to $-\infty$
as the number of tests increases for any testing procedure that fixes the level of significance.
To see that this is so, suppose a sequence of tests is made at level $\alpha$ (as in the
naive procedure considered in the prior example). Under the null model, we expect
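$100\alpha\%$ of the hypotheses to be falsely rejected. Because all of the null hypotheses are
true, $S^\theta(m) = 0$ and $\mathrm{EDC}_{\alpha,\gamma}(m) = \alpha - \gamma E_0 R(m) = \alpha(1 - \gamma m) \to -\infty$ as $m \to \infty$.
Hence $\mathrm{EDC}_{\alpha,\gamma} = -\infty$.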
Second, we observe that it is always possible to construct a test procedure for
which $\mathrm{EDC}_{\alpha,\gamma} \ge 0$. The Bonferroni procedure offers a concrete example. Although
the common application of the Bonferroni rule assigns equal $\alpha$-level to each test, this
need not be the case. All that is necessary is that the sum of the levels be no more than
$\alpha$. If one tests $H_j$ at level $\alpha_j$ and $\sum_j \alpha_j \le \alpha$, then $E_\theta V^\theta(m) \le \alpha$ for all $m$. Thus,
$\mathrm{EDC}_{\alpha,\gamma}(m) \ge 0$ for all $\gamma$ and $m$.
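The last step can be spelled out. The short derivation below is a sketch that uses only the identity $S^\theta(m) = R(m) - V^\theta(m)$ from Section 2, the restriction $0 \le \gamma \le 1$, and the bound $E_\theta V^\theta(m) \le \alpha$ just stated:

```latex
\begin{align*}
\mathrm{EDC}_{\alpha,\gamma}(m)
  &= E_\theta\bigl[S^\theta(m) - \gamma R(m)\bigr] + \alpha \\
  &= (1-\gamma)\,E_\theta R(m) - E_\theta V^\theta(m) + \alpha \\
  &\ge \alpha - E_\theta V^\theta(m) \;\ge\; 0 .
\end{align*}
```

The first inequality drops the non-negative term $(1-\gamma)\,E_\theta R(m)$, which is why the conclusion holds for every admissible $\gamma$.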
The following theorem states that an alpha-investing rule $\mathcal{I}_{W(0),\omega}$ with wealth determined
by (8) controls EDC so long as the pay-out $\omega$ is not too large. The theorem
follows by showing that a stochastic process related to the alpha-wealth sequence
$W(0), W(1), \ldots$ is a sub-martingale. Because the proof of this result relies only on
the optional stopping theorem for martingales, we do not require independent tests,
though this is certainly the easiest context in which to show that the p-values are
honest in the sense required for (10) to hold.
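Theorem 1 An alpha-investing rule $\mathcal{I}_{W(0),\omega}$ governed by (8) with initial alpha-wealth
$W(0) \le \alpha$ and pay-out $\omega \le 1 - \gamma$ controls $\mathrm{EDC}_{\alpha,\gamma}$,
$$\mathrm{EDC}_{\alpha,\gamma} \ge 0. \tag{12}$$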


A proof of the theorem is in the appendix.

6 Examples
The examples in this section illustrate alpha-investing rules and EDC. Our first two
examples consider testing a large, but fixed, collection of $m$ hypotheses for which we observe
independent p-values $p_1, p_2, \ldots, p_m$. The first describes an alpha-investing rule
that mimics Simes-based step-down testing. The second shows how alpha-investing
rules are able to leverage domain knowledge to form a more powerful multiple testing
procedure. A third example describes alpha-investing when testing a stream of
hypotheses using dependent test statistics.

6.1 Comparison to Step-Down Testing

We compare alpha-investing to the Simes-based step-down testing procedure described
in Section 2. This procedure rejects $H_{(1)}, H_{(2)}, \ldots, H_{(j_d - 1)}$, where $j_d = \min\{k : p_{(k)} > k\alpha/m\}$
identifies the first test that is not rejected. (Step-up testing does not provide
a stopping time.) Assume that the step-down procedure controls $\mathrm{FDR}(m) \le \alpha$ and
rejects a small number $k > 0$ of the $m$ hypotheses. It follows then that the p-values
have the following structure:
$$p_{(1)} \le \alpha/m, \quad p_{(2)} \le 2\alpha/m, \quad \ldots, \quad p_{(k)} \le k\alpha/m, \quad \text{and} \quad p_{(k+1)} > (k+1)\alpha/m. \tag{13}$$
To reproduce this behavior with alpha-investing, consider the following approach.
Set the initial alpha-wealth $W(0) = \alpha$ and $\omega = \alpha$. Define the alpha-investing rule to
allocate its available alpha-wealth $W(j)$ equally over the hypotheses that have not been
rejected, and begin by testing each hypothesis at the Bonferroni level $\alpha/m$. Because of
the structure in the p-values (13), this first pass rejects at least one hypothesis, namely
$H_{(1)}$. To keep the presentation simple, suppose that only one hypothesis has p-value
less than $\alpha/m$. The procedure pays $-\log(1 - \alpha/m)$ for each test that does not reject,
and earns $\omega + \log(1 - p_{(1)})$ for rejecting $H_{(1)}$. Hence, after testing each hypothesis at
level $\alpha/m$, its alpha-wealth is at least
$$\begin{aligned} W(m) &= W(0) + \omega + \log(1 - p_{(1)}) + (m - 1)\log(1 - \alpha/m) \\ &\ge 2\alpha + m \log(1 - \alpha/m) \\ &\ge \alpha - \alpha^2/m. \end{aligned} \tag{14}$$
After this first pass through the hypotheses, its alpha-wealth is virtually unchanged,
and it retains enough wealth to reject $H_{(2)}$.
For the second pass through the remaining $m-1$ null hypotheses, the alpha-investing rule rejects any hypothesis for which $p_j \le 2\alpha/m$, as in the Simes procedure. Because these tests condition on $p_j > \alpha/m$, this round of testing requires that the alpha-investing rule test each of the remaining $m-1$ hypotheses at level
$$ P_0\left( \frac{\alpha}{m} < p_j \le \frac{2\alpha}{m} \,\Big|\, p_j > \frac{\alpha}{m} \right) = \frac{\alpha}{m-\alpha} . $$
It possesses enough wealth after the first round to do this because, from (14) for $\alpha \le 1/2$,
$$ \frac{W(m)}{m-1} \ge \frac{\alpha - \alpha^2/m}{m-1} \ge \frac{\alpha}{m} . $$
As in the first round, this second pass again approximately conserves the alpha-wealth of the procedure. Thus, so long as $m$ is large and $k \ll m$ so that bounds similar to (14) hold, each pass through the hypotheses conserves enough alpha-wealth for the next round of tests. In this way, the investing rule gradually raises the threshold for rejecting a hypothesis as the number of rejected hypotheses increases.
The simulation summarized in the next section compares this alpha-investing rule to step-down testing. The alpha-investing rule generally does slightly better (rejects more false hypotheses) than step-down testing for two reasons. First, the lower bound (14) for the wealth $W(m)$ assumes $p_{(1)} = \alpha/m$. In fact, we would expect $p_{(1)}$ to be closer to $\alpha/(2m)$, on average. Second, our description assumes that the p-values rejected by step-down testing are evenly distributed, with one between each pair of adjacent thresholds. Instead, it is likely that some passes of the investing rule will reject more than one hypothesis and thus have greater alpha-wealth for testing in the next round than suggested by these lower bounds.

6.2 Investing Rules that Leverage Domain Knowledge

The performance of an alpha-investing rule improves, in the sense of being more powerful, if the investigator "knows the science". If the investigator is able to order the hypotheses a priori so that those most likely to be rejected are tested first, then alpha-investing can reject considerably more hypotheses than step-down testing. The full benefit is only realized, however, when one exploits an aggressive investing rule. The prior investing rule assumes that the hypotheses are arranged in no particular order and spreads its alpha-wealth evenly over the remaining hypotheses.
Suppose that the test procedure rejects $H_k$ and is about to test $H_{k+1}$. Rather than spread its current alpha-wealth $W(k)$ evenly over the remaining hypotheses, a rule can invest more in testing the next hypothesis. For example, one can allocate $W(k)$ using a discrete probability mass function such as this version of the investing rule (9). If none of the remaining hypotheses are rejected, then the level for testing $H_j$ is
$$ \alpha_j = \frac{W(k)}{h_{m-k,2}} \, \frac{1}{(j-k)^2}, \qquad j = k+1, \ldots, m, \tag{15} $$
where the normalizing constant $h_{q,2} = \sum_{i=1}^{q} 1/i^2$. If one of these tests rejects a hypothesis, the procedure reallocates its wealth so that all is spent by the time the procedure tests $H_m$. Mimicking the language of financial investing, we describe this type of alpha-investing rule as aggressive and the previous method as conservative.
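The allocation (15) and the reallocation after each rejection can be sketched as follows. This is an illustration under stated assumptions rather than the rule used in our simulations: the function names, the default values $W(0) = \omega = 0.05$, and in particular the cap `1 - exp(-wealth)` on each level (added so that the wealth cannot become negative) are choices made for this sketch. The wealth changes by $\omega + \log(1 - p_j)$ after a rejection and by $\log(1 - \alpha_j)$ otherwise.

```python
import math


def h(q, power=2):
    """Normalizing constant h_{q,2} = sum_{i=1}^{q} 1 / i**2 (for power=2)."""
    return sum(1.0 / i**power for i in range(1, q + 1))


def aggressive_levels(wealth, k, m):
    """Levels alpha_j, j = k+1..m, from rule (15) given the wealth W(k)."""
    norm = h(m - k)
    return [wealth / (norm * (j - k) ** 2) for j in range(k + 1, m + 1)]


def aggressive_alpha_investing(pvalues, w0=0.05, omega=0.05):
    """One pass over hypotheses in their a priori order.

    Returns (rejected, wealth), where rejected holds 1-based indices.
    """
    m = len(pvalues)
    wealth = w0
    k, wealth_at_k = 0, w0  # most recent rejection (0 = none yet) and W(k)
    rejected = []
    for j in range(1, m + 1):
        level = wealth_at_k / (h(m - k) * (j - k) ** 2)  # rule (15)
        level = min(level, 1.0 - math.exp(-wealth))      # assumed cap: keeps W >= 0
        p = pvalues[j - 1]
        if p <= level:
            wealth += omega + math.log(1.0 - p)          # earn the pay-out
            rejected.append(j)
            k, wealth_at_k = j, wealth                   # reallocate from here
        else:
            wealth = max(0.0, wealth + math.log(1.0 - level))  # pay for the test
    return rejected, wealth


levels = aggressive_levels(0.05, k=0, m=3)
rejected, wealth = aggressive_alpha_investing([0.001, 0.5, 0.5])
```

The levels returned by `aggressive_levels` sum to the wealth $W(k)$ and decay like $1/(j-k)^2$, so most of the wealth is invested in the hypotheses that immediately follow a rejection.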


This idea of using prior information is implicit in alpha-spending rules, but little FDR theory exists for such rules. Recently, Genovese and Wasserman (2004) used prior information on the hypotheses to derive a weighted Benjamini-Hochberg (wBH) procedure. Following the ideas of the previous section, we can show that the wBH procedure satisfies the EDC. The wBH procedure thus lies between these two procedures: it is not quite as aggressive as the rule in this section, but much more aggressive than the usual Simes method.
The simulation summarized in Figure 4 compares step-down testing to conservative and aggressive alpha-investing rules. For this simulation, we assume that the investigator tests the hypotheses in the order implied by $|\mu_j|$. The $m = 200$ hypotheses test means as defined in the simulation in Section 3 (see equation 6). We set the initial wealth $W(0) = 0.05$, $\alpha = 0.05$, $\gamma = 0.95$, and used step-down testing that controls $\mathrm{FDR}(200) < 0.05$. Figure 4 shows FDR and EDC. All three procedures control both FDR and EDC, as they should. FDR for step-down testing closely tracks the performance of the conservative alpha-investing rule. The particularly low FDR obtained by aggressive alpha-investing may appear surprising at first. The low error rate is another benefit of the side-information. Aggressive alpha-investing spends all of its alpha-wealth testing the initial hypotheses (which happen to be false) and runs out of wealth before encountering the hypotheses for which $\mu_j = 0$. This rule also has larger EDC.
Alpha-investing guarantees protection from too many false rejections, but how well does it find signal? Figure 5 compares the power of these alpha-investing rules to that of step-down testing. The plot shows the number of correct rejections $S^\theta(m)$ made by three different rules: aggressive alpha-investing that exploits domain knowledge using the rule (15), conservative alpha-investing (which assumes a random order), and step-down testing. The figure shows the average number of hypotheses rejected by each investing rule relative to the number rejected by step-down testing, on a percentage scale. For example, with a weak signal ($\pi_1 = 0.10$),
$$ 100 \, \frac{S^\theta(m;\ \text{aggressive investing})}{S^\theta(m;\ \text{step-down})} > 150\% . $$
In general, for weak signals, aggressive alpha-investing identifies about 30% more false hypotheses than step-down testing. The two become more similar as signal strength grows (in the form of more false null hypotheses). As discussed in the prior section, conservative alpha-investing rejects a few more hypotheses, about 5-10%, than Simes-based step-down testing.

Figure 4: Both alpha-investing rules (conservative and aggressive) control FDR (left) and EDC (right), as does step-down testing. Conservative alpha-investing assumes no domain knowledge, whereas aggressive alpha-investing uses domain knowledge, here the ordering of $\mu_i^2$. [Two panels, FDR (left) and EDC (right), each plotted against $\pi_1$ from 0.1 to 1.]

6.3 Dependent Tests

The previous examples illustrate EDC and alpha-investing rules when testing a closed set of $m$ hypotheses using independent tests. For dependent tests, however, step-down testing does not guarantee control of FDR. In comparison, one can find alpha-investing rules that control EDC.
EDC itself makes no assumption of independence of the tests, but does require that the tests be conditionally correct in the sense of (10). When hypothesis tests are independent, it is simple to assure that each test indeed has level $\alpha_j$. One need only form each test as though only one hypothesis were being tested; the outcomes of the prior tests $R_1, R_2, \ldots, R_{j-1}$ do not affect its level. This condition is much more difficult to establish when the tests are dependent. Although EDC allows any sort of dependence, it may not be possible to construct tests that satisfy this condition without making assumptions on the form of the dependence.
Figure 5: Aggressive alpha-investing using (15) exploits domain knowledge to achieve higher power than Simes-based step-down testing. This plot shows the percentage of correctly rejected null hypotheses for each procedure, relative to step-down testing. Both alpha-investing rules have more power than step-down testing with the same size. [Vertical axis: % rejected vs. step-down, from 100 to 150; horizontal axis: $\pi_1$ from 0.1 to 1; curves: aggressive, conservative, step-down.]

In some cases, however, known properties of multivariate distributions suggest a suitable test procedure. For example, suppose that the test statistics $Y = (Y_1, \ldots, Y_m)$ for $H(m)$ have a multivariate normal distribution with mean vector $\vec{\mu}$ and covariance matrix $\Sigma$, $Y \sim N(\vec{\mu}, \Sigma)$. In this case, Dykstra (1980) shows that
$$ P(|Y_m| < c_m \mid |Y_1| \le c_1, \ldots, |Y_{m-1}| \le c_{m-1}) \ge P(|Y_m| \le c_m) . \tag{16} $$
Thus, so long as no prior two-sided hypothesis has been rejected, an $\alpha$-level test of $H_m$ that ignores the prior outcomes, as though they were independent, has conditional level at most $\alpha$. The procedure is conservative. If, however, some prior test rejects a null hypothesis, these results no longer hold.
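The inequality (16) can be checked by simulation in the simplest case of two correlated, mean-zero, unit-variance normal statistics. The sketch below is only an illustration: the function name, the correlation 0.8, the cutoff 1.96 and the sample size are assumptions made for the example, and a Monte Carlo estimate is of course not a proof.

```python
import math
import random


def acceptance_probs(rho, c=1.96, n=200_000, seed=0):
    """Estimate P(|Y2| <= c | |Y1| <= c) and P(|Y2| <= c) by simulation.

    (Y1, Y2) is bivariate normal with zero means, unit variances and
    correlation rho. Returns (conditional, marginal).
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    n_y1 = n_y2 = n_both = 0
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rho * y1 + scale * rng.gauss(0.0, 1.0)  # corr(y1, y2) = rho
        in1, in2 = abs(y1) <= c, abs(y2) <= c
        n_y1 += in1
        n_y2 += in2
        n_both += in1 and in2
    return n_both / n_y1, n_y2 / n


conditional, marginal = acceptance_probs(rho=0.8, n=50_000)
```

The estimated conditional acceptance probability exceeds the marginal one, so a test that ignores the earlier non-rejection rejects less often than its nominal level.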
In this case, the simplest way to ensure the level of a test is to remove the effect of the rejected hypothesis. If $H_k$, say, has been rejected, then one can guarantee that (10) holds by constructing subsequent tests to be independent of $Y_k$ and of any $Y_j$, $j < k$, that is correlated with $Y_k$. By removing the information from the rejected test, the acceptance region for subsequent two-sided tests is a symmetric convex set around the origin and inequalities such as (16) hold.
For example, consider a balanced two-way analysis of variance with $r$ row effects $\mu_{r,i}$ and column effects $\mu_{c,j}$ with $\sum_i \mu_{r,i} = \sum_j \mu_{c,j} = 0$. Write the vector of row effects as $\vec{\mu}_r$ and the vector of column effects as $\vec{\mu}_c$. For each cell of the design, we have $n$ independent normally distributed observations $Y_{ijk}$,
$$ Y_{ijk} = \mu_0 + \mu_{r,i} + \mu_{c,j} + Z_{ijk}, \qquad Z_{ijk} \overset{iid}{\sim} N(0, \sigma^2), \quad k = 1, \ldots, n, $$
with known variance $\sigma^2$. Assume that the hypotheses to be tested have the form $H_j : \vec{\lambda}_{r,j}' \vec{\mu}_r = 0,\ \vec{\lambda}_{c,j}' \vec{\mu}_c = 0$. Standard results from linear models show that the usual tests of $H_j$ and $H_k$ are independent if $\vec{\lambda}_{r,j}' \vec{\lambda}_{r,k} = 0$ and $\vec{\lambda}_{c,j}' \vec{\lambda}_{c,k} = 0$. Suppose one begins with tests of the row effects ($\vec{\lambda}_{c,j} = 0$). There are no constraints on the tests until rejecting a hypothesis, $H_k$ say. At this point, one can commence testing column effects, ignoring the prior results for the row effects because these are orthogonal. One can continue testing other hypotheses among the row effects so long as $\vec{\lambda}_{r,j}$ is orthogonal to $\vec{\lambda}_{r,k}$.
A similar procedure can be used in stepwise regression. Consider the familiar forward stepwise search, seeking predictors of the response $Y$ among $X_1, X_2, \ldots, X_m$ in a linear model
$$ Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \cdots + \beta_m X_{m,i} + Z_i, \qquad Z_i \overset{iid}{\sim} N(0, \sigma^2) . $$
Assume that all of the variables have mean zero and $\beta_0 = 0$. Under the normal linear model with known error variance, (16) implies that tests of $H_j : \beta_j = 0$ based on the familiar z-scores for the predictors $Z_j = (X_j' Y)/(X_j' X_j)$ satisfy (10) until some $H_k$ is rejected. For further tests, one can assure that (10) holds by sweeping $X_k$ and all predictors among $X_1, X_2, \ldots, X_{k-1}$ that are correlated with $X_k$ from the remaining predictors. In practice, most predictors are correlated with each other to some extent and this condition requires sweeping $X_1, X_2, \ldots, X_k$ from subsequent predictors. If we collect these $k$ predictors into an $n \times k$ matrix $X$, then the subsequent predictors would be $\tilde{X}_j = (I - X(X'X)^{-1}X')X_j$, $j = k+1, \ldots$. The resulting loss of variation in predictors suggests it would be prudent to at least partially "orthogonalize" the predictors prior to using this type of search.
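Sweeping the earlier predictors from a new one amounts to taking the residual of $X_j$ after projection onto the columns of $X$. The sketch below computes this residual with modified Gram-Schmidt instead of forming $(X'X)^{-1}$; the function names and the toy vectors are assumptions made for illustration, and the two computations agree when the columns of $X$ are linearly independent.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sweep(columns, x_new, tol=1e-12):
    """Residual of x_new after projecting out the span of `columns`.

    Matches (I - X (X'X)^{-1} X') x_new for linearly independent columns.
    """
    basis = []  # orthogonal basis for the swept predictors
    for col in columns:
        v = list(col)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if dot(v, v) > tol:  # skip numerically redundant columns
            basis.append(v)
    resid = list(x_new)
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [ri - coef * bi for ri, bi in zip(resid, b)]
    return resid


# Toy example with n = 4 observations and k = 2 swept predictors.
x1 = [1.0, 0.0, 0.0, 0.0]
x2 = [1.0, 1.0, 0.0, 0.0]
x3 = [1.0, 2.0, 3.0, 4.0]
x3_swept = sweep([x1, x2], x3)
```

The swept predictor is orthogonal to both $X_1$ and $X_2$, and only the part of $X_3$ outside their span remains, which illustrates the loss of variation noted above.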

7 Discussion

The combination of EDC with alpha-investing rules invites the use of adaptive strategies for testing multiple hypotheses. Rather than posit a fixed set of hypotheses in advance of analysis, one can offer a strategy for determining which hypotheses to test next after getting some preliminary results. We would expect good strategies to leverage domain knowledge and be specific to the particular method of analysis.
Part of our motivation for developing EDC and alpha-investing rules arose from our work using stepwise regression for data mining (Foster and Stine, 2004). In this application, we compared forward stepwise regression to tree-based classifiers for predicting the onset of personal bankruptcy. To make regression competitive, we expanded the stepwise search to include all possible interactions among more than 350 "base" predictors. This produced more than 67,000 possible predictors. Because so many of these predictors were interactions (more than 98%), it is not surprising that most of the predictors identified by the search were interactions. Furthermore, because of the wide scope of this search, the procedure lacked power to find subtle effects that, while small, improve the predictive power of a model. It became apparent to us that a hybrid search that only considered the interaction $X_j \times X_k$, say, after including both $X_j$ and $X_k$ as main effects might be very effective. At the time, however, we lacked a method for controlling the selection procedure when the scope of the search dynamically expands as in this situation. We expect to use alpha-investing heavily in this work in the future.
We speculate that the greatest reward from developing a specialized testing strategy will come from developing methods that select the next hypothesis rather than specific functions to determine how $\alpha$ is spent. The rule (15) invests most of the current wealth in testing hypotheses following a rejection. One can imagine quite a few other choices. Our work and that of others in information theory (Rissanen, 1983; Foster, Stine and Wyner, 2002), however, suggest that one can find universal alpha-investing rules. Given a procedure for ordering the hypotheses, a universal alpha-investing rule would lead to rejecting as many hypotheses as the best rule within some class. We would expect such a rule to spend its alpha-wealth a bit more slowly than the simple rule (15), but retain this general form.
Another area of application for alpha-investing is in group-sequential clinical trials. In other work (Foster and Stine, 2005) we address the concept of adaptive design with a modification for alpha-investing. We show that the complaints raised in Tsiatis and Mehta (2003) about the efficiency of such tests can be mitigated by proper alpha-investing. At the same time, we allow the researcher freedom to design rules that guide how to spend or invest their alpha-wealth.

Appendix: Proof of Theorem 1


We prove Theorem 1 in this section. We begin by defining an empirical excess discovery count. Define the random variable
$$ \mathrm{edc}_{\alpha,\gamma}(\theta, j) \equiv S^\theta(j) - \gamma R(j) + \alpha $$
so that
$$ \mathrm{EDC}_{\alpha,\gamma} = \inf_{\theta \in \Theta} \, \inf_{M \in \mathcal{M}} E\big(\mathrm{edc}_{\alpha,\gamma}(\theta, M)\big) . $$
Now define
$$ A(j) \equiv \mathrm{edc}_{\alpha,\gamma}(\theta, j) - W(j) . $$
Our main lemma shows that $A(j)$ is a sub-martingale for alpha-investing rules with initial alpha-wealth $W(0) \le \alpha$ and pay-out $\omega \le 1 - \gamma$. A sub-martingale is "increasing" in the sense that
$$ E(A(j) \mid A(j-1), A(j-2), \ldots, A(1)) \ge A(j-1) . $$
By definition $S^\theta(0) = R(0) = 0$ so that $\mathrm{edc}_{\alpha,\gamma}(\theta, 0) = \alpha$. So if $W(0) \le \alpha$, $\omega \le 1 - \gamma$ and $A(j)$ is a sub-martingale, then the optional stopping theorem implies that for all finite stopping times $M$
$$ E\big(\mathrm{edc}_{\alpha,\gamma}(\theta, M)\big) \ge E\big(\mathrm{edc}_{\alpha,\gamma}(\theta, M) - W(M)\big) \ge \alpha - W(0) \ge 0 . $$
The first inequality follows because the alpha-wealth $W(j) \ge 0$ [a.s.], and the second inequality follows from the sub-martingale property. Since EDC for alpha-investing rules is the infimum over such expectations, all of which are non-negative, EDC itself is non-negative.
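For a single realized sequence of tests, the empirical excess discovery count is simple arithmetic on the counts of rejections. The sketch below uses a hypothetical helper `empirical_edc` and made-up inputs; note that the realized value can be negative, since the guarantee above concerns its expectation.

```python
def empirical_edc(rejected, is_true_null, alpha, gamma):
    """Empirical excess discovery count S - gamma * R + alpha.

    rejected: 0-based indices of rejected hypotheses.
    is_true_null: list of booleans, True where the null hypothesis holds.
    """
    r = len(rejected)                                # R: all rejections
    v = sum(1 for j in rejected if is_true_null[j])  # V: false rejections
    s = r - v                                        # S: correct rejections
    return s - gamma * r + alpha


# Three rejections, none of them false (illustrative data).
value = empirical_edc([0, 2, 3], [False, True, False, False], alpha=0.05, gamma=0.95)
```

Here $S = R = 3$, so the value is $3 - 0.95 \cdot 3 + 0.05 = 0.2$, which also equals $(1-\gamma)R - V + \alpha$.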
Thus to show Theorem 1 all we need is the following lemma:
Lemma 1 Let $V^\theta(m)$ and $R(m)$ denote the cumulative number of false rejections and the cumulative number of all rejections, respectively, when testing a sequence of null hypotheses $\{H_1, H_2, \ldots\}$ using an alpha-investing rule $\mathcal{I}_{W(0),\omega}$ with initial alpha-wealth $W(0) \le \alpha$, pay-out $\omega \le 1 - \gamma$, and cumulative alpha-wealth $W(m)$. Then the process
$$ \begin{aligned} A(j) &\equiv \mathrm{edc}_{\alpha,\gamma}(\theta, j) - W(j) \\ &= (1 - \gamma) R(j) - V^\theta(j) + \alpha - W(j) \end{aligned} $$
is a sub-martingale,
$$ E(A(m) \mid A(m-1), \ldots, A(1)) \ge A(m-1) . \tag{17} $$


Proof.
We begin with some notation for the increments that define the counts in Table 1. Write $V^\theta(m)$ and $R(m)$ as sums of indicators $V_j^\theta, R_j \in \{0, 1\}$,
$$ V^\theta(m) = \sum_{j=1}^{m} V_j^\theta, \qquad R(m) = \sum_{j=1}^{m} R_j . $$
Similarly write the accumulated alpha-wealth $W(m)$ and $A(m)$ as sums of increments, $W(m) = \sum_{j=0}^{m} W_j$ and $A(m) = \sum_{j=0}^{m} A_j$. Let $\alpha_j$ denote the alpha level of the test of $H_j$ that satisfies the condition (10). The change in the alpha-wealth from testing $H_j$ can be written as
$$ W_j = R_j \omega + \log(1 - (p_j \wedge \alpha_j)), $$
where $\wedge$ is the minimum operator. Substituting this into $A_j$ we get
$$ A_j = (1 - \gamma - \omega) R_j - V_j^\theta - \log(1 - (p_j \wedge \alpha_j)) . $$
Since $R_j \ge 0$ and $1 - \gamma - \omega \ge 0$ by the conditions of the lemma, it follows that
$$ A_j \ge -V_j^\theta - \log(1 - (p_j \wedge \alpha_j)) . \tag{18} $$
If $\theta_j \notin H_j$, then $V_j^\theta = 0$ and $A_j \ge 0$ almost surely. So we only need to consider the case in which the null hypothesis $H_j$ is true.
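Abbreviate the conditional expectation
$$ E_{j-1}(X) = E(X \mid A(1), A(2), \ldots, A(j-1)) . $$
Then, when $H_j$ is true, $p_j \sim U[0, 1]$ so that
$$ \begin{aligned} E_{j-1}\big(-\log(1 - (p_j \wedge \alpha_j))\big) &= -\int_0^1 \log(1 - (p \wedge \alpha_j)) \, dp \\ &= -\int_0^{\alpha_j} \log(1 - p) \, dp - \int_{\alpha_j}^1 \log(1 - \alpha_j) \, dp \\ &= \alpha_j . \end{aligned} $$
Since $E_{j-1}(V_j^\theta) \le \alpha_j$ by the definition of this being an $\alpha_j$-level test, equation (18) implies $E_{j-1} A_j \ge 0$.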
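The key identity in the proof, that the expected cost $-\log(1 - (p_j \wedge \alpha_j))$ of a test equals $\alpha_j$ when $p_j$ is uniform, can be verified numerically. The sketch below uses a plain midpoint rule; the function name and grid size are assumptions made for this check.

```python
import math


def expected_cost(alpha_j, n=100_000):
    """Midpoint-rule value of E[-log(1 - min(p, alpha_j))] for p ~ U[0, 1]."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * step  # midpoint of the i-th subinterval
        total += -math.log(1.0 - min(p, alpha_j))
    return total * step


cost_small = expected_cost(0.05)
cost_large = expected_cost(0.30)
```

Both values agree with the corresponding $\alpha_j$ to within the accuracy of the quadrature, so under a true null hypothesis the expected wealth paid for a test matches the bound $\alpha_j$ on the expected number of false rejections.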

References
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statist. Soc., Ser. B, 57, 289-300.
Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165-1188.
Braun, H. I. (ed.) (1994) The Collected Works of John W. Tukey: Multiple Comparisons, vol. VIII. New York: Chapman & Hall.
Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003) Multiple hypothesis testing in microarray experiments. Statistical Science, 18, 71-103.
Dykstra, R. L. (1980) Product inequalities involving the multivariate normal distribution. Journal of the Amer. Statist. Assoc., 75, 646-650.
Efron, B. (2005a) Large scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the Amer. Statist. Assoc., 100, 96-104.
Efron, B. (2005b) Selection and estimation for large-scale simultaneous inference. Tech. rep., Department of Statistics, Stanford University, http://www-stat.stanford.edu/brad/papers/hivdata.
Foster, D. P. and Stine, R. A. (2004) Variable selection in data mining: Building a predictive model for bankruptcy. Journal of the Amer. Statist. Assoc., 99, 303-313.
Foster, D. P. and Stine, R. A. (2005) Theoretical foundations for adaptive testing using alpha-investing rules. Tech. rep., Statistics Department, University of Pennsylvania.
Foster, D. P., Stine, R. A. and Wyner, A. J. (2002) Universal codes for finite sequences of integers drawn from a monotone distribution. IEEE Trans. on Info. Theory, 48, 1713-1720.
Genovese, C., Roeder, K. and Wasserman, L. (2004) False discovery control with p-value weighting. In progress.
Gupta, M. and Ibrahim, J. G. (2005) Towards a complete picture of gene regulation: using Bayesian approaches to integrate genomic sequence and expression data. Tech. rep., University of North Carolina, Chapel Hill, NC.
Holm, S. (1979) A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.
Lehmacher, W. and Wassmer, G. (1999) Adaptive sample size calculations in group sequential trials. Biometrics, 55, 1286-90.
Marcus, R., Peritz, E. and Gabriel, K. R. (1976) On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
Meinshausen, N. and Buehlmann, P. (2004) Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence. Tech. Rep. 121, ETH Zurich, http://stat.ethz.ch/~nicolai/.
Meinshausen, N. and Rice, J. (2004) Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. To appear, Annals of Statistics.
Rissanen, J. (1983) A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416-431.
Sarkar, S. K. (1998) Some probability inequalities for ordered MTP2 random variables: A proof of the Simes conjecture. Annals of Statistics, 26, 494-504.
Simes, R. J. (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73, 751-754.
Storey, J. D. (2002) A direct approach to false discovery rates. Journal of the Royal Statist. Soc., Ser. B, 64, 479-498.
Storey, J. D. (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics, 31, 2013-2035.
Troendle, J. F. (1996) A permutation step-up method of testing multiple outcomes. Biometrics, 52, 846-859.
Tsiatis, A. A. and Mehta, C. (2003) On the inefficiency of the adaptive design for monitoring clinical trials. Biometrika, 90, 367-378.
Tukey, J. W. (1953) The problem of multiple comparisons. Unpublished lecture notes.
Tukey, J. W. (1991) The philosophy of multiple comparisons. Statistical Science, 6, 100-116.
