Statistical significance
Statistical significance is the low probability of obtaining at least as extreme results given that the null hypothesis is true.[1][2][3][4][5][6][7] It is an integral part of statistical hypothesis testing, where it helps investigators decide whether a null hypothesis can be rejected.[8][9] In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone.[10][11] But if the probability of obtaining a result at least as extreme (e.g. a large difference between two or more sample means), given that the null hypothesis is true, is less than a pre-determined threshold (e.g. a 5% chance), then an investigator can conclude that the observed effect actually reflects the characteristics of the population rather than just sampling error.[8]
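As a rough illustration of sampling error, the following Python sketch draws repeated samples from a population in which the null hypothesis is true by construction and counts how often a two-sided z-test still reports a probability below 5%. The function name, sample size, number of trials, and random seed are assumptions chosen for this example and do not come from the cited sources.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(trials=10_000, n=30, alpha=0.05, seed=0):
    """Fraction of samples flagged as significant when the null is true."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    std_normal = NormalDist()      # standard normal: mean 0, sd 1
    rejections = 0
    for _ in range(trials):
        # The null hypothesis holds by construction: population mean 0, sd 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        # z statistic for a known population sd of 1.
        z = sample_mean / (1.0 / sqrt(n))
        # Two-sided probability of a result at least this extreme.
        p = 2 * std_normal.cdf(-abs(z))
        if p < alpha:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(f"Share of 'significant' results under a true null: {rate:.3f}")
```

The printed share should land close to the chosen threshold of 0.05, which is the sense in which sampling error alone produces apparently extreme results about 5% of the time.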
The present-day concept of statistical significance originated with Ronald Fisher when he developed statistical hypothesis testing in the early 20th century.[2][12][13] These tests are used to determine whether the outcome of a study would lead to a rejection of the null hypothesis based on a pre-specified low probability threshold. The p-value computed from the data is compared against that threshold, which can help an investigator decide whether a result contains sufficient information to cast doubt on the null hypothesis.[14]
P-values are often coupled to a significance or alpha (α) level, which is set ahead of time, usually at 0.05 (5%).[14] Thus, if a p-value is found to be less than 0.05, the result is considered statistically significant and the null hypothesis is rejected.[15] Other significance levels, such as 0.1 or 0.01, are also used, depending on the field of study.
In statistics, statistical significance is not the same as research, theoretical, or practical significance.[8][9][16]
1 History
Main article: History of statistics
The concept of statistical significance was originated by Ronald Fisher when he developed statistical hypothesis testing, which he described as "tests of significance", in his 1925 publication, Statistical Methods for Research Workers.[2][12][13] Fisher suggested a probability of one in twenty (0.05) as a convenient cutoff level to reject the null hypothesis.[17] In their 1933 paper, Jerzy Neyman and Egon Pearson recommended that the significance level (e.g. 0.05), which they called α, be set ahead of time, prior to any data collection.[17][18]
Despite his initial suggestion of 0.05 as a significance level, Fisher did not intend this cutoff value to be fixed, and in his 1956 publication Statistical Methods and Scientific Inference he recommended that significance levels be set according to specific circumstances.[17]
2 Role in statistical hypothesis testing
Main articles: Statistical hypothesis testing, Null hypothesis, p-value and Type I and type II errors
Statistical significance plays a pivotal role in statistical hypothesis testing, where it is used to determine if a null hypothesis should be rejected or retained. A null hypothesis is the general or default statement that nothing happened or changed.[19] For a null hypothesis to be rejected as false, the result has to be identified as being statistically significant, i.e. unlikely to have occurred by chance alone.
Figure: In a two-tailed test, the rejection region or α level is partitioned to both ends of the sampling distribution and makes up only 5% of the area under the curve.
To determine if a result is statistically significant, a researcher would have to calculate a p-value, which is the probability of observing an effect at least as extreme as the one obtained, given that the null hypothesis is true.[7] The null hypothesis is rejected if the p-value is less than the significance or α level. The α level is the probability of rejecting the null hypothesis given that it is true (type I error) and is most often set at 0.05 (5%). If the α level is 0.05, then the conditional probability of a type I error, given that the null hypothesis is true, is 5%.[20] A statistically significant result is then one in which the observed p-value is less than 5%, which is formally written as p < 0.05.[20]
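The decision rule described above can be sketched in a few lines of Python. This is a minimal example, assuming a two-sided z-test with a known population standard deviation; the data, the hypothesised mean `mu0`, the value of `sigma`, and the function name are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

ALPHA = 0.05  # significance level, fixed before looking at the data


def two_sided_z_test(sample, mu0, sigma):
    """Return (z, p) for H0: population mean == mu0, with known sigma."""
    n = len(sample)
    # Distance of the sample mean from mu0, in standard errors.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a result at least this extreme in either direction,
    # given that the null hypothesis is true.
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


# Hypothetical measurements, invented for this example.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.3, 5.5]
z, p = two_sided_z_test(sample, mu0=5.0, sigma=0.5)
significant = p < ALPHA

print(f"z = {z:.2f}, p = {p:.4f}, significant at alpha={ALPHA}: {significant}")
```

With these invented numbers the sample mean lies a little over three standard errors from the hypothesised mean, so the p-value falls well below 0.05 and the null hypothesis would be rejected at that level.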
If an observed p-value is not lower than the significance level, then rather than simply accepting the null hypothesis, where feasible it would often be appropriate to increase the sample size of the study and see whether the significance level is reached.[21]
If the α level is set at 0.05, it means that the rejection region comprises 5% of the sampling distribution.[22] This 5% can be allocated to one side of the sampling distribution, as in a one-tailed test, or partitioned to both sides of the distribution, as in a two-tailed test, with each tail (or rejection region) containing 2.5% of the distribution. One-tailed tests are more powerful than two-tailed tests, as a null hypothesis can be rejected with a less extreme result, provided the effect lies in the direction specified in advance.
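To make the difference between the two allocations concrete, the sketch below computes the cutoffs on a standard normal sampling distribution for α = 0.05. It assumes a z statistic, and the observed value of 1.8 is a hypothetical figure chosen to fall between the two cutoffs.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()  # standard normal sampling distribution

# One-tailed test: the whole 5% sits in the upper tail.
one_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA)
# Two-tailed test: 2.5% in each tail, so the cutoff moves further out.
two_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA / 2)

z_observed = 1.8  # hypothetical test statistic
p_one_tailed = 1 - std_normal.cdf(z_observed)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z_observed)))

print(f"one-tailed cutoff: {one_tailed_cutoff:.3f}")
print(f"two-tailed cutoff: {two_tailed_cutoff:.3f}")
print(f"z = {z_observed}: one-tailed p = {p_one_tailed:.4f}, "
      f"two-tailed p = {p_two_tailed:.4f}")
```

The one-tailed cutoff is the smaller of the two, so a statistic such as 1.8 is significant in a one-tailed test at the 5% level but not in a two-tailed test.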
3 Defining significance in terms of sigma (σ)
Main articles: Standard deviation and Normal distribution
In specific fields such as particle physics and manufacturing, statistical significance is often expressed in multiples of the standard deviation or sigma (σ) of a normal distribution, with significance thresholds set at a much stricter level (e.g. 5σ).[23][24] For instance, the certainty of the Higgs boson particle's existence was based on the 5σ criterion, which corresponds to a p-value of about 1 in 3.5 million.[24][25]
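The correspondence between a sigma threshold and a p-value can be checked with a short Python sketch. It assumes the one-tailed convention, in which the p-value is the area of a standard normal distribution beyond the given number of standard deviations; the function name is hypothetical.

```python
from statistics import NormalDist


def sigma_to_p(n_sigma):
    """One-tailed p-value for a result n_sigma standard deviations out."""
    # cdf(-x) equals the upper-tail area beyond +x and avoids the
    # rounding loss of computing 1 - cdf(x) for large x.
    return NormalDist().cdf(-n_sigma)


p_five_sigma = sigma_to_p(5)
print(f"5 sigma: p = {p_five_sigma:.3e}, about 1 in {1 / p_five_sigma:,.0f}")

for k in (2, 3, 5):
    print(f"{k} sigma -> p = {sigma_to_p(k):.3e}")
```

Under this convention the 5σ tail area works out to roughly 1 in 3.5 million, matching the figure quoted above.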
4 Effect size
Main article: Effect size
Researchers focusing solely on whether their results are statistically significant might report findings that are not necessarily substantive.[26] To gauge the research significance of their result, researchers are also encouraged to report the effect size along with p-values (in cases where the effect being tested for is defined in terms of an effect size): the effect size quantifies the strength of an effect, such as the distance between two means or the correlation between two variables.[27]
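One common way to express the distance between two means as an effect size is the standardized mean difference known as Cohen's d. The text does not name a specific measure, so the choice of Cohen's d, the function name, and the two small data sets below are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized difference between two group means (pooled sd)."""
    na, nb = len(group_a), len(group_b)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical measurements for two groups.
treatment = [2, 4, 6, 8, 10]
control = [1, 3, 5, 7, 9]

d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.3f}")
```

Unlike a p-value, this quantity does not shrink or grow with the sample size alone, which is why it is reported alongside the significance test rather than in place of it.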
5 See also
A/B testing
ABX test
Confidence level, the complement of the significance level
Effect size
Fisher's method for combining independent tests of significance
Look-elsewhere effect
Texas sharpshooter fallacy (gives examples of tests where the significance level was set too high)
Reasonable doubt
Statistical hypothesis testing
6 References
[1] Redmond, Carol; Colton, Theodore (2001). "Clinical significance versus statistical significance". Biostatistics in Clinical Trials. Wiley Reference Series in Biostatistics (3rd ed.). West Sussex, United Kingdom: John Wiley & Sons Ltd. pp. 35–36. ISBN 0-471-82211-6.
[2] Cumming, Geoff (2012). Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New York, USA: Routledge. pp. 27–28.
[3] Krzywinski, Martin; Altman, Naomi (30 October 2013). "Points of significance: Significance, P values and t-tests". Nature Methods (Nature Publishing Group) 10 (11): 1041–1042. doi:10.1038/nmeth.2698. Retrieved 3 July 2014.
[4] Sham, Pak C.; Purcell, Shaun M (17 April 2014). "Statistical power and significance testing in large-scale genetic studies". Nature Reviews Genetics (Nature Publishing Group) 15 (5): 335–346. doi:10.1038/nrg3706. Retrieved 3 July 2014.
[5] Johnson, Valen E. (October 9, 2013). "Revised standards for statistical evidence". Proceedings of the National Academy of Sciences (National Academies of Science). doi:10.1073/pnas.1313476110. Retrieved 3 July 2014.
[6] Altman, Douglas G. (1999). Practical Statistics for Medical Research. New York, USA: Chapman & Hall/CRC. p. 167. ISBN 978-0412276309.
[7] Devore, Jay L. (2011). Probability and Statistics for Engineering and the Sciences (8th ed.). Boston, MA: Cengage Learning. pp. 300–344. ISBN 0-538-73352-7.
[8] Sirkin, R. Mark (2005). "Two-sample t tests". Statistics for the Social Sciences (3rd ed.). Thousand Oaks, CA: SAGE Publications, Inc. pp. 271–316. ISBN 1-412-90546-X.
[9] Borror, Connie M. (2009). "Statistical decision making". The Certified Quality Engineer Handbook (3rd ed.). Milwaukee, WI: ASQ Quality Press. pp. 418–472. ISBN 0-873-89745-5.
[10] Babbie, Earl R. (2013). "The logic of sampling". The Practice of Social Research (13th ed.). Belmont, CA: Cengage Learning. pp. 185–226. ISBN 1-133-04979-6.
[11] Faherty, Vincent (2008). "Probability and statistical significance". Compassionate Statistics: Applied Quantitative Analysis for Social Services (With exercises and instructions in SPSS) (1st ed.). Thousand Oaks, CA: SAGE Publications, Inc. pp. 127–138. ISBN 1-412-93982-8.
[12] Poletiek, Fenna H. (2001). "Formal theories of testing". Hypothesis-testing Behaviour. Essays in Cognitive Psychology (1st ed.). East Sussex, United Kingdom: Psychology Press. pp. 29–48. ISBN 1-841-69159-3.
[13] Fisher, Ronald A. (1925). Statistical Methods for Research Workers. Edinburgh, UK: Oliver and Boyd. p. 43. ISBN 0-050-02170-2.
[14] Schlotzhauer, Sandra (2007). Elementary Statistics Using JMP (SAS Press) (PAP/CDR ed.). Cary, NC: SAS Institute. pp. 166–169. ISBN 1-599-94375-1.
[15] McKillup, Steve (2006). "Probability helps you make a decision about your results". Statistics Explained: An Introductory Guide for Life Scientists (1st ed.). Cambridge, United Kingdom: Cambridge University Press. pp. 44–56. ISBN 0-521-54316-9.
[16] Myers, Jerome L.; Well, Arnold D.; Lorch Jr, Robert F. (2010). "The t distribution and its applications". Research Design and Statistical Analysis: Third Edition (3rd ed.). New York, NY: Routledge. pp. 124–153. ISBN 0-805-86431-8.
[17] Quinn, Geoffrey R.; Keough, Michael J. (2002). Experimental Design and Data Analysis for Biologists (1st ed.). Cambridge, UK: Cambridge University Press. pp. 46–69. ISBN 0-521-00976-6.
[18] Neyman, J.; Pearson, E.S. (1933). "The testing of statistical hypotheses in relation to probabilities a priori". Mathematical Proceedings of the Cambridge Philosophical Society 29: 492–510. doi:10.1017/S030500410001152X.
[19] Meier, Kenneth J.; Brudney, Jeffrey L.; Bohte, John (2011). Applied Statistics for Public and Nonprofit Administration (3rd ed.). Boston, MA: Cengage Learning. pp. 189–209. ISBN 1-111-34280-6.
[20] Healy, Joseph F. (2009). The Essentials of Statistics: A Tool for Social Research (2nd ed.). Belmont, CA: Cengage Learning. pp. 177–205. ISBN 0-495-60143-8.
[21] Cohen, Barry H. (2008). Explaining Psychological Statistics (3rd ed.). Hoboken, NJ: John Wiley and Sons. pp. 46–83. ISBN 0-470-00718-4.
[22] Health, David (1995). An Introduction To Experimental Design And Statistics For Biology (1st ed.). Boston, MA: CRC press. pp. 123–154. ISBN 1-857-28132-2.
[23] Vaughan, Simon (2013). Scientific Inference: Learning from Data (1st ed.). Cambridge, UK: Cambridge University Press. pp. 146–152. ISBN 1-107-02482-X.
[24] Bracken, Michael B. (2013). Risk, Chance, and Causation: Investigating the Origins and Treatment of Disease (1st ed.). New Haven, CT: Yale University Press. pp. 260–276. ISBN 0-300-18884-6.
[25] Franklin, Allan (2013). "Prologue: The rise of the sigmas". Shifting Standards: Experiments in Particle Physics in the Twentieth Century (1st ed.). Pittsburgh, PA: University of Pittsburgh Press. pp. Ii–Iii. ISBN 0-822-94430-8.
[26] Carver, Ronald P. (1978). "The Case Against Statistical Significance Testing". Harvard Educational Review 48: 378–399.
[27] Pedhazur, Elazar J.; Schmelkin, Liora P. (1991). Measurement, Design, and Analysis: An Integrated Approach (Student ed.). New York, NY: Psychology Press. pp. 180–210. ISBN 0-805-81063-3.
7 Further reading
Ziliak, Stephen, and McCloskey, Deirdre, (2008). The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. Ann Arbor, University of Michigan Press, 2009.
Thompson, Bruce, (2004). "The significance crisis in psychology and education". Journal of Socio-Economics, 33, pp. 607–613.
Chow, Siu L., (1996). Statistical Significance: Rationale, Validity and Utility, Volume 1 of series Introducing Statistical Methods, Sage Publications Ltd, ISBN 978-0-7619-5205-3 (argues that statistical significance is useful in certain circumstances).
Kline, Rex, (2004). Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research. Washington, DC: American Psychological Association.
8 External links
The article "Earliest Known Uses of Some of the Words of Mathematics (S)" contains an entry on Significance that provides some historical information.
"The Concept of Statistical Significance Testing" (February 1994): article by Bruce Thompson hosted by the ERIC Clearinghouse on Assessment and Evaluation, Washington, D.C.
"What does it mean for a result to be 'statistically significant'?" (no date): an article from the Statistical Assessment Service at George Mason University, Washington, D.C.
9 Text and image sources, contributors, and licenses
9.1 Text
Statistical significance Source: http://en.wikipedia.org/wiki/Statistical_significance?oldid=629745192 Contributors: Bryan Derksen, The
Anome, William Avery, Michael Hardy, Kku, Gabbe, Dcljr, Ellywa, Nichtich, Den fjttrade ankan, Nerd, Cherkash, Topbanana, Paranoid,
Gak, Henrygb, Giftlite, BrendanH, Pgan002, Antandrus, L353a1, DanielCD, Rich Farmbrough, Yknott, Kndiaye, Slb, Cretog8, Arcadian,
Andrewpmk, John Quiggin, Seans Potato Business, Alkarex, Woohookitty, Btyner, Rjwilmsi, Smoe, Thomas Arelatensis, Thisismikesother,
ElKevbo, Cjpun, EvanSeeds, Lborelli, Mathbot, Riki, Preslethe, Vonkje, Chobot, YurikBot, Wavelength, Gaius Cornelius, ENeville,
Nephron, DRosenbach, Jon Olav Vik, Doc pune, Lt-wiki-bot, Davril2020, Badgettrg, Darrel francis, SmackBot, McGeddon, Jtneill, Rob-
fuller, Ohnoitsjamie, Josefec, Nbarth, Danielkueh, Richard001, G716, Arodb, Euchiasmus, Tim bates, Nijdam, Tommyzee, Mmiller0712,
Mdgross50, Grapplequip, DwightKingsbury, Joseph Solis in Australia, Abeg92, Tawkerbot4, LarryQ, Thijs!bot, Tallred, Wildthing61476,
Erxnmedia, Fetchcomms, Magioladitis, Torchiest, Inhumandecency, MartinBot, ChemNerd, Lilac Soul, Coppertwig, Yym1997, Kenneth
M Burke, Spellcast, Philip Trueman, Don Quixote de la Mancha, MuanN, Seraphim, Sprasad.ee, SQL, Wangerin, Lavers, Jasondet, The-
G-Unit-Boss, Melcombe, Wjmummert, Martarius, ClueBot, Binksternet, Srudes2, Winsteps, Pwestfall, Lot49a, Qwfp, Staticshakedown,
Dthomsen8, SilvonenBot, Mifter, Aam aadmi, ZooFari, Jmkim dot com, Addbot, Eric Drexler, DOI bot, Fgnievinski, Bulletproofman19,
MrOllie, Palmerabollo, Numbo3-bot, Ehrenkater, Zorrobot, Luckas-bot, AnomieBOT, ChristopheS, Materialscientist, SvartMan, Xqbot,
Bbarkley, Sylwia Ufnalska, M12107, Constructive editor, FrescoBot, Sawomir Biay, Pinethicket, Edderso, Georg Hurtig, RedBot, Gjsis,
Cerebis, Animalparty, Indicedigini, Raylyons, Billare, Sir Arthur Williams, Rgmooney C109, GoingBatty, Schwa dk, HiW-Bot, Kostya 888,
Muditjai, Mysticyx, L Kensington, Mikhail Ryazanov, ClueBot NG, Mathstat, Michael D. Stephens, Helpful Pixie Bot, BG19bot, Wikstar7,
Lilingxi, Matthieu Vergne, Manoguru, Minsbot, MathewTownsend, BattyBot, HankW512, ChrisGualtieri, Eggingerik, BetseyTrotwood,
NicenFriendlyPerson, Sa publishers, Soranoch, Thewikiguru1, Rgiordan, EmilKarlsson, 1980na and Anonymous: 152
9.2 Images
File:Fisher_iris_versicolor_sepalwidth.svg Source: http://upload.wikimedia.org/wikipedia/commons/4/40/Fisher_iris_versicolor_sepalwidth.svg License: CC-BY-SA-3.0 Contributors: en:Image:Fisher iris versicolor sepalwidth.png Original artist: en:User:Qwfp (original); Pbroks13 (talk) (redraw)
File:NormalDist1.96.png Source: http://upload.wikimedia.org/wikipedia/en/b/bf/NormalDist1.96.png License: ? Contributors: self-made Original artist: Qwfp (talk)
File:Wikiversity-logo.svg Source: http://upload.wikimedia.org/wikipedia/commons/9/91/Wikiversity-logo.svg License: ? Contributors:
Snorky (optimized and cleaned up by verdy_p) Original artist: Snorky (optimized and cleaned up by verdy_p)
9.3 Content license
Creative Commons Attribution-Share Alike 3.0
