
Open access, freely available online

Why Most Published Research Findings Are False
John P. A. Ioannidis

The Essay section contains opinion pieces on topics of broad interest to a general medical audience.

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.

Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abbreviation: PPV, positive predictive value

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr

Competing Interests: The author has declared that no competing interests exist.

DOI: 10.1371/journal.pmed.0020124

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. "Negative" research is also very useful. "Negative" is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let $R$ be the ratio of the number of "true relationships" to "no relationships" among those tested in the field. $R$ is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is $R/(R+1)$. The probability of a study finding a true relationship reflects the power $1-\beta$ (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, $\alpha$. Assuming that $c$ relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets $\mathrm{PPV} = (1-\beta)R/(R-\beta R+\alpha)$. A research finding is thus

PLoS Medicine | www.plosmedicine.org | August 2005 | Volume 2 | Issue 8 | e124
more likely true than false if $(1-\beta)R > \alpha$. (Because $R-\beta R = (1-\beta)R$, the PPV can be written as $(1-\beta)R/[(1-\beta)R+\alpha]$, which exceeds one-half exactly when this inequality holds.) Since usually the vast majority of investigators depend on $\alpha = 0.05$, this means that a research finding is more likely true than false if $(1-\beta)R > 0.05$.

Table 1. Research Findings and True Relationships

Research Finding | True Relationship: Yes | True Relationship: No | Total
Yes | $c(1-\beta)R/(R+1)$ | $c\alpha/(R+1)$ | $c(R+\alpha-\beta R)/(R+1)$
No | $c\beta R/(R+1)$ | $c(1-\alpha)/(R+1)$ | $c(1-\alpha+\beta R)/(R+1)$
Total | $cR/(R+1)$ | $c/(R+1)$ | $c$

What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables.

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let $u$ be the proportion of probed analyses that would not have been "research findings," but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that $u$ does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets $\mathrm{PPV} = ([1-\beta]R + u\beta R)/(R+\alpha-\beta R+u-u\alpha+u\beta R)$, and PPV decreases with increasing $u$, unless $1-\beta \le \alpha$, i.e., $1-\beta \le 0.05$ for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1.

Table 2. Research Findings and True Relationships in the Presence of Bias

Research Finding | True Relationship: Yes | True Relationship: No | Total
Yes | $(c[1-\beta]R + uc\beta R)/(R+1)$ | $(c\alpha + uc[1-\alpha])/(R+1)$ | $c(R+\alpha-\beta R+u-u\alpha+u\beta R)/(R+1)$
No | $(1-u)c\beta R/(R+1)$ | $(1-u)c(1-\alpha)/(R+1)$ | $c(1-u)(1-\alpha+\beta R)/(R+1)$
Total | $cR/(R+1)$ | $c/(R+1)$ | $c$

Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to "bury" significant findings [13]. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover, measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.

Testing by Several Independent Teams

Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For $n$ independent studies of equal power, the 2 × 2 table is shown in Table 3: $\mathrm{PPV} = R(1-\beta^n)/(R+1-[1-\alpha]^n-R\beta^n)$ (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless $1-\beta < \alpha$, i.e., typically $1-\beta < 0.05$. This is shown for different levels of power and for different pre-study odds in Figure 2. For $n$ studies of different power, the term $\beta^n$ is replaced by the product of the terms $\beta_i$ for $i = 1$ to $n$, that is, $\prod_{i=1}^{n}\beta_i$, but inferences are similar.

Corollaries

A practical example is shown in Box 1. Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true.

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards $1-\beta = 0.05$. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized) [14] than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller) [15].

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5) [7]. Modern epidemiology is increasingly obliged to target smaller
effect sizes [16]. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors.

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds ($R$). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research [4,8,17], should have extremely low PPV.

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be "negative" results into "positive" results, i.e., bias, $u$. For several research designs, e.g., randomized controlled trials [18–20] or meta-analyses [21,22], there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes) [23]. Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) [24] may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only "best" results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials [25]. Simply abolishing selective publication would not make this problem go away.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, $u$. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28].

Figure 1. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Levels of Bias, $u$
Panels correspond to power of 0.20, 0.50, and 0.80; curves correspond to $u$ = 0.05, 0.20, 0.50, and 0.80.
DOI: 10.1371/journal.pmed.0020124.g001

Table 3. Research Findings and True Relationships in the Presence of Multiple Studies

Research Finding | True Relationship: Yes | True Relationship: No | Total
Yes | $cR(1-\beta^n)/(R+1)$ | $c(1-[1-\alpha]^n)/(R+1)$ | $c(R+1-[1-\alpha]^n-R\beta^n)/(R+1)$
No | $cR\beta^n/(R+1)$ | $c(1-\alpha)^n/(R+1)$ | $c([1-\alpha]^n+R\beta^n)/(R+1)$
Total | $cR/(R+1)$ | $c/(R+1)$ | $c$

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize pursuing and disseminating its most impressive "positive" results. "Negative" results may become attractive for dissemination only if some other team has found a "positive" association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly



Box 1. An Example: Science at Low Pre-Study Odds

Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then $R = 10/100{,}000 = 10^{-4}$, and the pre-study probability for any polymorphism to be associated with schizophrenia is also $R/(R+1) \approx 10^{-4}$. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at $\alpha = 0.05$. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only $12 \times 10^{-4}$.

Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the $p = 0.05$ threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available "data mining" packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with $u = 0.10$, the post-study probability that a research finding is true is only $4.4 \times 10^{-4}$. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only $1.5 \times 10^{-4}$, hardly any higher than the probability we had before any of this extensive research was undertaken!

alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [29].

Figure 2. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Numbers of Conducted Studies, $n$
Panels correspond to power of 0.20, 0.50, and 0.80.

These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Most Research Findings Are False for Most Research Designs and for Most Fields

In the described framework, a PPV exceeding 50% is quite difficult to get. Table 4 provides the results of simulations using the formulas developed for the influence of power, ratio of true to non-true relationships, and bias, for various types of situations that may be characteristic of specific study designs and settings. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies where pooling is used to "correct" the low power of single studies, is probably false if $R \le 1:3$. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance of being true, if $R = 1:10$. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits) [30,31], PPV for each claimed relationship is extremely low, even with considerable
standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias.

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias

As shown, the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings. Let us suppose that in a research field there are no true findings at all to be discovered. History of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a "null field," one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent that observed findings deviate from what is expected by chance alone would be simply a pure measure of the prevailing bias.

For example, let us suppose that no nutrients or dietary patterns are actually important determinants for the risk of developing a specific tumor. Let us also suppose that the scientific literature has examined 60 nutrients and claims all of them to be related to the risk of developing this tumor with relative risks in the range of 1.2 to 1.4 for the comparison of the upper to lower intake tertiles. Then the claimed effect sizes are simply measuring nothing else but the net bias that has been involved in the generation of this scientific literature. Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between "null fields," the fields that claim stronger effects (often with accompanying claims of medical or public health importance) are simply those that have sustained the worst biases.

For fields with very low PPV, the few true relationships would not distort this overall picture much. Even if a few relationships are true, the shape of the distribution of the observed effects would still yield a clear measure of the biases involved in the field. This concept totally reverses the way we view scientific results. Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results.

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a "null field." However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating.

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure "gold" standard is unattainable. However, there are several approaches to improve the post-study probability.

Better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown "gold" standard. However, large studies may still have biases and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where
the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null [32–34].

Table 4. PPV of Research Findings for Various Combinations of Power ($1-\beta$), Ratio of True to Not-True Relationships ($R$), and Bias ($u$)

$1-\beta$ | $R$ | $u$ | Practical Example | PPV
0.95 | 2:1 | 0.30 | Confirmatory meta-analysis of good-quality RCTs | 0.85
0.20 | 1:5 | 0.20 | Underpowered, but well-performed phase I/II RCT | 0.23
0.80 | 1:10 | 0.30 | Adequately powered exploratory epidemiological study | 0.20
0.20 | 1:1,000 | 0.80 | Discovery-oriented exploratory research with massive testing | 0.0010

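
The PPV expressions used throughout this essay are simple enough to check numerically. Below is a minimal Python sketch that assumes the conventional $\alpha = 0.05$. The function names `ppv`, `ppv_bias` and `ppv_n_studies` are illustrative choices and do not come from the original analysis. The sketch does three things:

- It recomputes the Table 4 rows from the bias formula, using the power, $R$ and $u$ values listed for those rows.
- It checks the single-study and $u = 0.10$ estimates quoted in Box 1.
- It evaluates the multi-study formula for a few values of $n$.

```python
"""Minimal sketch of the essay's PPV formulas (Tables 1-3); names are illustrative."""

ALPHA = 0.05  # conventional Type I error rate assumed throughout the essay


def ppv(power: float, R: float, alpha: float = ALPHA) -> float:
    """Single unbiased study: PPV = (1 - beta)R / (R - beta*R + alpha)."""
    beta = 1.0 - power
    return power * R / (R - beta * R + alpha)


def ppv_bias(power: float, R: float, u: float, alpha: float = ALPHA) -> float:
    """Single study with bias u (Table 2):
    PPV = ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)."""
    beta = 1.0 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


def ppv_n_studies(power: float, R: float, n: int, alpha: float = ALPHA) -> float:
    """n independent studies of equal power, no bias (Table 3):
    PPV = R(1 - beta^n) / (R + 1 - [1 - alpha]^n - R*beta^n)."""
    beta = 1.0 - power
    numerator = R * (1.0 - beta ** n)
    denominator = R + 1.0 - (1.0 - alpha) ** n - R * beta ** n
    return numerator / denominator


# (power 1 - beta, pre-study odds R, bias u, PPV reported in Table 4)
TABLE_4_ROWS = [
    (0.95, 2 / 1, 0.30, 0.85),
    (0.20, 1 / 5, 0.20, 0.23),
    (0.80, 1 / 10, 0.30, 0.20),
    (0.20, 1 / 1000, 0.80, 0.0010),
]

if __name__ == "__main__":
    for power, R, u, reported in TABLE_4_ROWS:
        print(f"1-beta={power:.2f}  R={R:g}  u={u:.2f}  "
              f"PPV={ppv_bias(power, R, u):.4f}  (Table 4: {reported})")

    # Box 1 inputs: R = 10/100,000, 60% power, alpha = 0.05.
    print(f"Box 1, no bias:  PPV = {ppv(0.60, 1e-4):.2e}")
    print(f"Box 1, u = 0.10: PPV = {ppv_bias(0.60, 1e-4, 0.10):.2e}")

    # Multi-study PPV for increasing n (illustrative values: 1 - beta = 0.80, R = 1:10).
    for n in (1, 2, 5, 10):
        print(f"n={n:2d}: PPV = {ppv_n_studies(0.80, 0.1, n):.3f}")
```

Run as a script, the sketch prints each recomputed PPV beside the tabulated value, and all four round to the reported figures. The two Box 1 cases come out near $12 \times 10^{-4}$ and $4.4 \times 10^{-4}$. The $n$-study values shrink as $n$ grows because here $1-\beta > \alpha$. Setting $u = 0$ in `ppv_bias`, or $n = 1$ in `ppv_n_studies`, recovers the plain single-study formula.
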
Second, most research questions
are addressed by many teams, and
it is misleading to emphasize the
statistically significant findings of
any single team. What matters is the
totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve. In some research designs, efforts may also be more successful with upfront registration of studies, e.g., randomized trials [35]. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials.

Finally, instead of chasing statistical significance, we should improve our understanding of the range of $R$ values, the pre-study odds, where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high $R$ values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established "classics" will fail the test.

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then […] many relationships are expected to be true among those probed across the relevant research fields and research designs. The wider field may yield some guidance for estimating this probability for the isolated research project. Experiences from biases detected in other neighboring fields would also be useful to draw upon. Even though these assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.

References

1. Ioannidis JP, Haidich AB, Lau J (2001) Any casualties in the clash of randomised and observational evidence? BMJ 322: 879–880.
2. Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR, Ebrahim S (2004) Those confounded vitamins: What can we learn from the differences between observational versus randomised trial evidence? Lancet 363: 1724–1727.
3. Vandenbroucke JP (2004) When are observational studies as credible as randomised trials? Lancet 363: 1728–1731.
4. Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 365: 488–492.
5. Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG (2001) Replication validity of genetic association studies. Nat Genet 29: 306–309.
6. Colhoun HM, McKeigue PM, Davey Smith G (2003) Problems of reporting genetic associations with complex outcomes. Lancet 361: 865–872.
7. Ioannidis JP (2003) Genetic associations: False or true? Trends Mol Med 9: 135–138.
8. Ioannidis JPA (2005) Microarrays and molecular research: Noise discovery? Lancet 365: 454–455.
9. Sterne JA, Davey Smith G (2001) Sifting the evidence-what's wrong with significance tests. BMJ 322: 226–231.
10. Wacholder S, Chanock S, Garcia-Closas M, El ghormli L, Rothman N (2004) Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. J Natl Cancer Inst 96: 434–442.
11. Risch NJ (2000) Searching for genetic determinants in the new millennium. Nature 405: 847–856.
12. Kelsey JL, Whittemore AS, Evans AS, Thompson WD (1996) Methods in observational epidemiology, 2nd ed. New York:
19. Ioannidis JP, Evans SJ, Gotzsche PC, O'Neill RT, Altman DG, et al. (2004) Better reporting of harms in randomized trials: An extension of the CONSORT statement. Ann Intern Med 141: 781–788.
20. International Conference on Harmonisation E9 Expert Working Group (1999) ICH Harmonised Tripartite Guideline. Statistical principles for clinical trials. Stat Med 18: 1905–1942.
21. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, et al. (1999) Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354: 1896–1900.
22. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, et al. (2000) Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 283: 2008–2012.
23. Marshall M, Lockwood A, Bradley C, Adams C, Joy C, et al. (2000) Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia. Br J Psychiatry 176: 249–252.
24. Altman DG, Goodman SN (1994) Transfer of technology from statistical journals to the biomedical literature. Past trends and future predictions. JAMA 272: 129–132.
25. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG (2004) Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291: 2457–2465.
26. Krimsky S, Rothenberg LS, Stott P, Kyle G (1998) Scientific journals and their authors' financial interests: A pilot study. Psychother Psychosom 67: 194–201.
27. Papanikolaou GN, Baltogianni MS, Contopoulos-Ioannidis DG, Haidich AB, Giannakakis IA, et al. (2001) Reporting of conflicts of interest in guidelines of preventive and therapeutic interventions. BMC Med Res Methodol 1: 3.
28. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC (1992) A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 268: 240–248.
29. Ioannidis JP, Trikalinos TA (2005) Early extreme contradictory estimates may appear in published research: The Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol 58: 543–549.
30. Ntzani EE, Ioannidis JP (2003) Predictive ability of DNA microarrays for cancer outcomes and correlates: An empirical assessment. Lancet 362: 1439–1444.
Oxford U Press. 432 P. 31. Ransohoff DF (2004) Rules of evidence
acknowledge that statistical significance 13. Topol EJ (2004) Failing the public health- for cancer molecular-marker discovery and
testing in the report of a single study Rofecoxib, Merck, and the FDA. N Engl J Med validation. Nat Rev Cancer 4: 309-314.
gives only a partial picture, without 351: 1707-1709. 32. Lindley DV (1957) A statistical paradox.
14. Yusuf S, Collins R, Peto R (1984) Why do we Biometrika 44: 187-192.
knowing how much testing has been need some large, simple randomized trials? Stat 33. Barrlett MS (1957) A comment on D.V.
done outside the report and in the Med 3: 409-422. Lindley's statistical paradox. Biomeuika 44:
relevant field at large. Despite a large 15. Alunan DG, Royston P (2000) What do we 533-534.
mean by validating a prognostic model? Stat 34. Senn SJ (2001) Two cheen for P-values.J
statistical literature for multiple testing Med 19: 453-473. Epidemiol Biostat 6: 193-204.
corrections [37],usually it is impossible 16. Taubes G (1995) Epidemiology faces i o limits. 35. De Angelis C. Drazen JM, Frizelle FA, Haug C,
Science 269: 164-169. Hoey J. et al. (2004) Clinical uial regisuation:
to decipher how much data dredging 17. Golub TR, Slonim DK, Tamayo P, Huard A statement from the International &mmittee
by the reporting authors or other C, Gaasenbeek M, er al. (1999) Molecular of MedicalJournal Editors. N Engl J Med 351:
- -

research teams has preceded a reported classification of cancer: Class discovery 1250-12511
and dass prediction by gene expression 36. Ioannidii JPA (2005) Conuadicted and
research finding. Even if determining monitoring. Science 286: 531-537. initially suonger effects in highly cired clinical
this were feasible, this would not 18. Moher D, Schulz KF, Alunan DG (2001) research. JAMA 294: 218-228.
inform us about the pre-study odds. The CONSORT Statement Revised 37. Hsueh HM, ChenJJ, Kodell RL (2003)
recommendations for improving the quality Comparison of methods for estimating the
Thus, it is unavoidable that one should of reports of parallel-group randomised uials. number of uue null hypotheses in multiplicity
make approximate assumptions on how Lancet 357: 1191-1 194. testing. J Biopharm Stat 13: 675-689.
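The closing advice to estimate the pre-study odds R before running an experiment, and the warning that a single significance test gives only a partial picture when other teams may have probed the same question, can be made concrete with a short calculation. The sketch below is a minimal illustration, assuming the bias-free positive predictive value from the framework earlier in the paper, PPV = (1 - β)R / (R - βR + α), and, for n teams, the probability that at least one of n independent studies (all with the same α and power) reports a significant result. The function names, the α = 0.05 and power = 0.8 defaults, and the R values looped over are hypothetical choices for illustration, not estimates for any real field.

```python
"""Sketch (illustrative assumptions only): how pre-study odds R drive the
probability that a claimed finding is true (positive predictive value, PPV)."""


def pre_study_probability(R: float) -> float:
    # Pre-study odds R (true : non-true relationships) expressed as a probability.
    return R / (R + 1)


def ppv(R: float, alpha: float = 0.05, power: float = 0.8) -> float:
    # Single study, no bias: PPV = (1 - beta) R / (R - beta R + alpha).
    beta = 1 - power  # Type II error rate
    return (1 - beta) * R / (R - beta * R + alpha)


def ppv_any_of_n(R: float, n: int, alpha: float = 0.05, power: float = 0.8) -> float:
    # n independent studies of the same question, and at least one reports a
    # "significant" result: PPV = R(1 - beta^n) / (R + 1 - (1 - alpha)^n - R beta^n).
    # For n = 1 this reduces to ppv() above.
    beta = 1 - power
    return R * (1 - beta**n) / (R + 1 - (1 - alpha) ** n - R * beta**n)


if __name__ == "__main__":
    # Hypothetical R values, from well-grounded to highly exploratory research.
    for R in (1.0, 0.1, 0.01, 0.001):
        print(
            f"R={R:<6} pre-study P={pre_study_probability(R):.4f}  "
            f"PPV(1 study)={ppv(R):.3f}  PPV(any of 10)={ppv_any_of_n(R, 10):.3f}"
        )
```

With these illustrative settings, PPV falls steeply as R shrinks, and the "any of 10 studies" PPV is lower than the single-study PPV for every R shown, which is one way to see why a lone significant result says little without knowing how much testing preceded it.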

PLoS Medicine | www.plosmedicine.org | August 2005 | Volume 2 | Issue 8 | e124
