
A Bayesian perspective on interpreting statistical significance

Imagine that you are screening drugs to see if they lower blood pressure. Based on the
amount of scatter you expect to see, and the minimum change you would care about, you've
chosen the sample size for each experiment to have 80% power to detect the difference you
are looking for with a P value less than 0.05. What happens when you repeat the experiment
many times?
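As a concrete illustration of that sample-size choice, here is a minimal Python sketch of the usual normal-approximation formula. It is not part of the original example, and the 10 mmHg change and 12 mmHg scatter are made-up numbers:

from scipy.stats import norm

def n_per_group(min_change, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for an unpaired two-group comparison."""
    d = min_change / sd                    # standardized effect size
    z_alpha = norm.ppf(1 - alpha / 2)      # 1.96 for alpha = 0.05, two-sided
    z_power = norm.ppf(power)              # 0.84 for 80% power
    return 2 * (z_alpha + z_power) ** 2 / d ** 2

# Hypothetical numbers: detect a 10 mmHg drop when the scatter (SD) is 12 mmHg.
print(n_per_group(min_change=10, sd=12))   # roughly 23 per group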

The answer is "it depends". It depends on the context of your experiment. Let's look at the
same experiment performed in three contexts. First, we'll assume that you know a bit about
the pharmacology of the drugs, and expect 10% of the drugs to be active. In this case, the
prior probability is 10%. Second, we'll assume you know a lot about the pharmacology of the
drugs, and expect 80% to be active. Third, we'll assume that the drugs were selected at
random, and you expect only 1% to be active in lowering blood pressure.

What happens when you perform 1000 experiments in each of these contexts? The details of
the calculations are shown on pages 143-145 of Intuitive Biostatistics, by Harvey Motulsky
(Oxford University Press, 1995). Since the power is 80%, you expect 80% of truly effective
drugs to yield a P value less than 0.05 in your experiment. Since you set the definition of
statistical significance to 0.05, you expect 5% of ineffective drugs to yield a P value less than
0.05. Putting these calculations together creates these tables.

A. Prior probability=10%       Drug really works    Drug really doesn't work    Total
P<0.05, "significant"                   80                     45                125
P>0.05, "not significant"               20                    855                875
Total                                  100                    900               1000

B. Prior probability=80%       Drug really works    Drug really doesn't work    Total
P<0.05, "significant"                  640                     10                650
P>0.05, "not significant"              160                    190                350
Total                                  800                    200               1000

C. Prior probability=1%        Drug really works    Drug really doesn't work    Total
P<0.05, "significant"                    8                     50                 58
P>0.05, "not significant"                2                    940                942
Total                                   10                    990               1000

The total for each column is determined by the prior probability - the context of your
experiment. The prior probability equals the fraction of the experiments that fall in the
"Drug really works" column. To compute the number of experiments in each row, use the
definitions of power and alpha. Of the drugs that really work, you won't obtain a P value less
than 0.05 in every case. You chose a sample size to obtain a power of 80%, so 80% of the truly
effective drugs yield "significant" P values and 20% yield "not significant" P values. Of the
drugs that really don't work (middle column), you won't get a "not significant" result in every
case. Since you defined statistical significance as "P<0.05" (alpha=0.05), you will see a
"significant" result in 5% of experiments performed with drugs that are really inactive and a
"not significant" result in the other 95%.
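The bookkeeping behind tables A, B, and C can also be written as a short calculation. Here is a minimal Python sketch (my illustration, not taken from Intuitive Biostatistics) that rebuilds each table from the prior probability, the power, and alpha:

def outcome_table(prior, power=0.80, alpha=0.05, n_experiments=1000):
    """Expected counts of (drug really works, drug really doesn't work)."""
    n_active = prior * n_experiments           # drugs that really work
    n_inactive = (1 - prior) * n_experiments   # drugs that really don't work
    return {
        "P<0.05, 'significant'":     (power * n_active, alpha * n_inactive),
        "P>0.05, 'not significant'": ((1 - power) * n_active, (1 - alpha) * n_inactive),
    }

# Reproduces tables A, B, and C:
for prior in (0.10, 0.80, 0.01):
    print(prior, outcome_table(prior))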

If the P value is less than 0.05, so the results are "statistically significant", what is the chance
that the drug is really active? The answer is different for each experiment.

                               Experiments with P<0.05 and...                       Fraction of experiments with P<0.05
Prior probability              ...drug really works    ...drug really doesn't work  where drug really works
A. Prior probability=10%                80                        45                80/125 = 64%
B. Prior probability=80%               640                        10                640/650 = 98%
C. Prior probability=1%                  8                        50                8/58 = 14%

For experiment A, the chance that the drug is really active is 80/125 or 64%. If you observe a
statistically significant result, there is a 64% chance that the difference is real and a 36%
chance that the difference was caused by random sampling. For experiment B, there is a 98%
chance that the difference is real. In contrast, if you observe a significant result in
experiment C, there is only a 14% chance that the result is real and an 86% chance that it is a
coincidence of random sampling. For experiment C, the vast majority of "significant" results
are due to chance.
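Put another way, the chance that a "significant" drug is really active is the top-left cell divided by the total of the top row, which is Bayes' rule applied to the screening experiment. A minimal sketch of that calculation (my notation, not the book's):

def prob_real_given_significant(prior, power=0.80, alpha=0.05):
    """P(drug really works | P < 0.05)."""
    true_positives = power * prior          # real effects that reach P < 0.05
    false_positives = alpha * (1 - prior)   # coincidences of random sampling
    return true_positives / (true_positives + false_positives)

for prior in (0.10, 0.80, 0.01):
    print(f"prior {prior:.0%}: {prob_real_given_significant(prior):.0%}")
# prior 10%: 64%, prior 80%: 98%, prior 1%: 14%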

Your interpretation of a "statistically significant" result depends on the context of the
experiment. You can't interpret a P value in a vacuum. Interpreting results requires common
sense, intuition, and judgment.
