Statistics 101:

Formula-Free Statistics

Author: Nick Barrowman, PhD


Date: July 16th, 2012

Conflict of interest disclosure

I do not hold any research grants funded by industry.
I have done paid consulting work for Mead Johnson Nutrition [Canada] Co.
I have no other relevant financial relationships with members of the pharmaceutical industry or medical supply companies.

Learning objectives

By the end of this talk you will be able to:

Understand the principal concepts of statistics
Interpret statistics commonly reported in the medical literature
Identify "red flags" in research reports that may be signs of trouble

Variability

Patients vary.
Physicians vary.
Nurses vary.
Hospitals vary.
Measurements vary.
Disease states vary.
Immune response varies.
Drug adherence varies.

Consequences of variability

Variability means that the patterns we notice may be illusory and we may miss the real patterns.
Variability means that our conclusions will always be tentative.
Variability means that we'll have to deal with uncertainty.

"Uncertainty is an uncomfortable position. But certainty is an absurd one."
Voltaire

What is Statistics?

Statistics is the science of variability.

The principal concepts of statistics

Variability can be modeled using probability.
This provides a framework for drawing inferences and quantifying our uncertainty.
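To make this concrete, here is a minimal Python sketch (not part of the original talk) that models variability with probability. It assumes, purely for illustration, that systolic blood pressure in a population follows a normal distribution with mean 110 mmHg and SD 5 mmHg. Repeated random samples give different sample means, and the spread of those means shrinks as the sample size grows.

```python
import random
import statistics

# Hypothetical population model: systolic BP ~ Normal(mean=110, SD=5) mmHg
POP_MEAN, POP_SD = 110.0, 5.0


def sample_means(n, n_samples=2000, seed=1):
    """Draw n_samples random samples of size n; return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(n_samples)
    ]


for n in (10, 40, 160):
    means = sample_means(n)
    # The SD of the sample means should be close to POP_SD / sqrt(n)
    print(
        f"n = {n:3d}: average of sample means = {statistics.fmean(means):.2f}, "
        f"SD of sample means = {statistics.stdev(means):.2f} "
        f"(theory: {POP_SD / n ** 0.5:.2f})"
    )
```

The theoretical value, SD divided by the square root of n, is the standard error of the mean; the simulated spread should agree with it closely.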

Fundamental statistical ideas

The population and the sample
Confidence intervals
Hypothesis tests
P-values

Population vs. sample

[Figure: a random sample is drawn from the population. A calculation on the sample gives the sample mean blood pressure, which is used to make an inference about the population mean blood pressure.]
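A minimal sketch of the population/sample distinction, using a hypothetical population of 10,000 simulated blood pressure values (normal, mean 110 mmHg, SD 12 mmHg; these numbers are assumptions for illustration). In a real study only the sample is observed; the population mean is available here only because the population is simulated.

```python
import random
import statistics

rng = random.Random(2012)

# Hypothetical population of 10,000 systolic BP values (mmHg).
# In practice the population is unknown and only the sample is observed.
population = [rng.gauss(110, 12) for _ in range(10_000)]
population_mean = statistics.fmean(population)  # the calculation we cannot usually do

# Simple random sample of 50 people, drawn without replacement
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)  # the calculation we can do

print(f"population mean: {population_mean:.1f} mmHg")
print(f"sample mean:     {sample_mean:.1f} mmHg (used to infer the population mean)")
```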

A typically cryptic description

The mean systolic blood pressure in group A was 110 mmHg, while in group B it was 104 mmHg.
There was no difference in systolic blood pressure between the groups.

Statements like this can be perplexing.
For a start, how can there be no difference when there is clearly a difference?

Population vs. sample

[Figure: random samples are drawn from a high-BP population and a low-BP population. A calculation on the samples gives the sample difference between groups in mean blood pressure, which is used to make an inference about the population difference between groups in mean blood pressure.]
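The following sketch shows why a sample difference can be non-zero even when the population difference is exactly zero. The numbers are hypothetical: both groups are drawn from the same population (mean 107 mmHg, SD 10 mmHg), 20 patients per group.

```python
import random
import statistics


def observed_difference(n_per_group, seed):
    """Sample two groups from the SAME population; return the difference in sample means."""
    rng = random.Random(seed)
    group_a = [rng.gauss(107, 10) for _ in range(n_per_group)]
    group_b = [rng.gauss(107, 10) for _ in range(n_per_group)]
    return statistics.fmean(group_a) - statistics.fmean(group_b)


# The population difference is exactly 0, yet each simulated study sees a non-zero difference
diffs = [observed_difference(20, seed) for seed in range(5)]
print([round(d, 1) for d in diffs])
```

Averaged over many simulated studies the differences centre on zero, but any single study can show a gap of several mmHg.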

A typically cryptic description

Sample: The mean systolic blood pressure in group A was 110 mmHg, while in group B it was 104 mmHg.
Population: There was no difference in systolic blood pressure between the groups.

The claim about the population needs a qualifier: "statistically significant" (more on this later).

A typically cryptic description

The mean systolic blood pressure in group A was 110 mmHg, while in group B it was 104 mmHg.
There was no statistically significant difference in systolic blood pressure between the groups.

Note the word "mean": even if the means do not differ significantly between groups, systolic blood pressure varies within each group.

A typically cryptic description

The mean systolic blood pressure in group A was 110 mmHg (SD = 4.2 mmHg), while in group B it was 104 mmHg (SD = 5.0 mmHg).
There was no statistically significant difference in mean systolic blood pressure between the groups.

SD is the standard deviation.
Estimates of variability are essential.
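Reporting each group's mean together with its SD takes only a few lines. The readings below are made up for illustration (they are not the data behind the example); `statistics.stdev` uses the usual n - 1 denominator.

```python
import statistics

# Hypothetical systolic BP readings (mmHg); not data from the talk
groups = {
    "A": [104, 112, 108, 115, 111, 106, 114],
    "B": [99, 107, 101, 110, 103, 98, 110],
}

for name, values in groups.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    print(f"Group {name}: mean = {mean:.0f} mmHg (SD = {sd:.1f} mmHg), n = {len(values)}")
```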

Red flag

Failing to report estimates of variability
Variability is always present. Failing to report estimates of variability can be misleading and can also make it impossible for the reader to verify results.

Comparisons

Many studies focus on comparisons:
between groups
between a single group and a reference standard.

For example, compare weight gain:
On average, did group A gain more than group B?
On average, did people in a single group gain weight? Here the reference standard is no change.

The null hypothesis

The null hypothesis is a default assumption about the population, usually that there is no difference.
For example: in the population, there is no difference between the mean blood pressure for groups A and B.

Weight gain example

With two groups, the null hypothesis is:
Mean weight gain is the same in the two groups,
i.e. difference in mean weight gain = 0.

With one group, the null hypothesis is:
Mean weight gain = 0.

In these two cases, zero is the null value.

Other examples of the null hypothesis

Example: Mortality in two groups
Mortality rate in group A = mortality rate in group B,
i.e. relative risk of mortality is 1.
So 1 is the null value.

Example: IQ in a single group
Mean IQ is 100.
So 100 is the null value.

Hypothesis testing

An example:
"Bedside Limited Echocardiography by the Emergency Physician Is Accurate During Evaluation of the Critically Ill Patient."
Pershad et al. Pediatrics 2004;114;e667-e671.

Goal: to compare echocardiography measurements made by emergency physicians and experienced pediatric echocardiography providers.

Hypothesis testing
[Figure: shortening fraction (SF, %) for each patient, as measured by the emergency physician and by the cardiographer.]

Hypothesis testing

[Figure: difference between measurements made by echocardiographers and emergency physicians.]

Hypothesis testing

On average, echocardiographers' estimates were higher by 4.4%.
Could this difference be a chance occurrence?

Hypothesis testing

We need to test the hypothesis that in the population there is no difference.
We often report a p-value: the probability of observing a difference that is at least as extreme as what was observed, assuming there is no difference in the population.
A p-value < 0.05 is usually considered statistically significant.
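The definition of the p-value can be made concrete with a permutation test. This is a swapped-in technique, not the method used in the talk (the paper used a t-test), chosen because it needs no formulas: if the null hypothesis is true, group labels are interchangeable, so we reshuffle them many times and count how often the shuffled difference is at least as extreme as the observed one. The weight-gain data are hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under the null hypothesis, the labels are arbitrary
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Adding 1 to numerator and denominator keeps the p-value from being exactly 0
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical weight gain (kg) in two groups
a = [2.1, 3.4, 1.8, 2.9, 3.7, 2.5, 3.1, 2.2]
b = [1.2, 2.0, 1.5, 2.4, 1.1, 1.9, 1.6, 2.3]
p = permutation_p_value(a, b)
print(f"p = {p:.4f}", "(statistically significant)" if p < 0.05 else "(not significant)")
```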

Hypothesis testing

On average, echocardiographers' estimates were higher by 4.4%.
P = 0.003 (t-test): statistically significant.

Statistical vs. Clinical significance

But the authors note: "Although statistically significant, the difference of 4.4% in the estimation of SF may not be clinically relevant."
A statistically significant finding is not always clinically significant.
Subject-area judgement is always needed.

Red flag

Treating a statistically significant finding as important without considering whether it is clinically relevant
Statistical significance only indicates that chance is an unlikely explanation for the results. It does not necessarily mean that the results are clinically significant.

Beyond the p-value

The mean sample difference is 4.4%.
The mean population difference could be larger or smaller.
We know (with 95% confidence) that the population difference is greater than zero.
Do we know anything more?

Confidence intervals

Yes! A much more useful result than a p-value is a confidence interval.
A confidence interval tells us what population values (of the difference in means) are consistent with our data.
Values outside the confidence interval are effectively ruled out (at the chosen confidence level, e.g. 95%).
The key issue is the clinical relevance of the values contained in a confidence interval.
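A minimal sketch of how such an interval can be computed, assuming a large-sample normal approximation (estimate plus or minus about 1.96 standard errors for 95%). This is a simplification: for small samples a t-based interval, as a t-test would use, is somewhat wider. The data are hypothetical.

```python
import statistics
from statistics import NormalDist


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Approximate CI for mean(group_a) - mean(group_b).

    Large-sample normal approximation, allowing unequal variances;
    for small samples a t-based interval would be somewhat wider.
    """
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return diff - z * se, diff + z * se


# Hypothetical weight gain (kg) in two groups
a = [2.1, 3.4, 1.8, 2.9, 3.7, 2.5, 3.1, 2.2]
b = [1.2, 2.0, 1.5, 2.4, 1.1, 1.9, 1.6, 2.3]
low, high = mean_difference_ci(a, b)
print(f"difference = {statistics.fmean(a) - statistics.fmean(b):.2f} kg, "
      f"95% CI {low:.2f} to {high:.2f} kg")
```

Asking for 99% confidence widens the interval: more certainty costs precision.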

Hypothesis testing

On average, echocardiographers' estimates were higher by 4.4%.
The 95% confidence interval is 1.6% to 7.2%.
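As a rough consistency check (an approximation not performed in the talk or the paper), a symmetric 95% interval implies a standard error of about (upper - lower) / (2 x 1.96). The estimate divided by that standard error gives an approximate z statistic and two-sided p-value. This assumes normality; the authors used a t-test, so the result should land near, not exactly on, the reported P = 0.003.

```python
from statistics import NormalDist


def approx_p_from_ci(estimate, lower, upper, confidence=0.95, null_value=0.0):
    """Approximate two-sided p-value implied by a symmetric normal-theory CI."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = (upper - lower) / (2 * z_crit)  # implied standard error
    z = (estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Shortening fraction: difference 4.4%, 95% CI 1.6% to 7.2% (reported P = 0.003, t-test)
p = approx_p_from_ci(4.4, 1.6, 7.2)
print(f"approximate p = {p:.3f}; CI excludes 0: {not (1.6 <= 0 <= 7.2)}")
```

An interval that excludes the null value corresponds to p < 0.05, and vice versa.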

"Statistics means never having to say you're certain."

Hypothesis testing
[Figure: inferior vena cava (IVC) diameter (mm) for each patient, as measured by the emergency physician and by the cardiographer.]

Hypothesis testing

[Figure: difference in IVC diameter between measurements made by echocardiographers and emergency physicians.]

Hypothesis testing

On average, echocardiographers' estimates were higher by 0.068 mm.
P = 0.14: not statistically significant.

Hypothesis testing

Since p > 0.05, can we conclude that there is no difference between IVC measurements made by echocardiographers and emergency physicians?
No! The confidence interval can help us understand why not.

Hypothesis testing

On average, echocardiographers' estimates were higher by 0.068 mm.
The 95% confidence interval is -0.025 to 0.16 mm.

Is the null hypothesis true?

It is not necessary (or even possible) to know if the null hypothesis is exactly correct.
For instance, it is not necessary to know if a difference is exactly zero.
Rather, it is sufficient to be confident that a difference is small enough that it is negligible or unimportant.
Who should decide what is unimportant?
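One way to act on this idea is to compare the whole confidence interval with a margin: the largest difference judged clinically unimportant. The sketch below is illustrative only; the function name and the margins (0.5 mm and 0.1 mm) are hypothetical, and in practice the margin must come from subject-area judgement, not from the data.

```python
def interpret_ci(lower, upper, margin, null_value=0.0):
    """Classify a CI relative to the null value and a clinically chosen margin.

    `margin` is the largest difference judged unimportant (a hypothetical,
    subject-area choice, not something estimated from the data).
    """
    if lower > null_value or upper < null_value:
        verdict = "statistically significant difference"
    else:
        verdict = "no statistically significant difference"
    if null_value - margin < lower and upper < null_value + margin:
        return verdict + "; all plausible differences are negligible"
    if upper <= null_value - margin or lower >= null_value + margin:
        return verdict + "; all plausible differences are important"
    return verdict + "; the data cannot rule out an important difference"


# IVC diameter: 95% CI -0.025 to 0.16 mm; the margins are hypothetical
for margin in (0.5, 0.1):
    print(f"margin = {margin} mm: {interpret_ci(-0.025, 0.16, margin)}")
```

With a generous margin the IVC interval supports "negligible difference"; with a strict margin the same interval is inconclusive, which is exactly why p > 0.05 alone cannot settle the question.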

Red flag

Claiming there is no difference because p > 0.05
The sample size of the study may not be sufficient to detect a difference, or the difference may be small (but possibly still important).
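A simulation can show how a real difference is easily missed in a small study. This sketch assumes a hypothetical true difference of 3 units with SD 10 in both groups, and for simplicity uses an approximate z-test rather than Student's t-test. "Power" here is the fraction of simulated studies that reach p < 0.05.

```python
import math
import random
from statistics import NormalDist


def mean_and_variance(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = math.fsum(xs) / len(xs)
    return m, math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulated_power(true_diff, sd, n_per_group, n_sims=2000, alpha=0.05, seed=3):
    """Fraction of simulated studies in which an approximate z-test gives p < alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    detected = 0
    for _ in range(n_sims):
        m_a, v_a = mean_and_variance([rng.gauss(true_diff, sd) for _ in range(n_per_group)])
        m_b, v_b = mean_and_variance([rng.gauss(0.0, sd) for _ in range(n_per_group)])
        se = math.sqrt(v_a / n_per_group + v_b / n_per_group)
        if abs(m_a - m_b) / se > z_crit:
            detected += 1
    return detected / n_sims


# A real (hypothetical) difference of 3 units, SD 10, studied with different sample sizes
for n in (10, 50, 200):
    print(f"n = {n:3d} per group: power = {simulated_power(3, 10, n):.2f}")
```

With 10 patients per group most simulated studies report p > 0.05 even though the difference is real.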

Association

The example examined the question of whether type of training (emergency physician or echocardiographer) was associated with the measured value of SF or IVC.
Many common statistical methods focus on estimating or testing associations.

Measures and tests of association

Student's t-test, Wilcoxon test
chi-square test, Fisher's exact test
log-rank test
Pearson and Spearman correlations
absolute risk reduction
relative risk
odds ratio
number needed to treat

Measures of association for dichotomous outcomes

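Risk in group A | Risk in group B | Absolute risk reduction | Relative risk | Odds ratio
50% | 30% | 20% (NNT = 5) | 0.6 (RRR = 40%) | 0.43
5% | 3% | 2% (NNT = 50) | 0.6 (RRR = 40%) | 0.59
25% | 5% | 20% (NNT = 5) | 0.2 (RRR = 80%) | 0.16

NNT is the number needed to treat; RRR is the relative risk reduction.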
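The arithmetic behind these measures can be checked with a short sketch. It assumes group A is the comparison group and group B the intervention group, which matches the numbers (for example, relative risk = 30% / 50% = 0.6). The function name is illustrative, not from any library.

```python
import math


def association_measures(risk_comparison, risk_intervention):
    """ARR, RR, RRR, OR and NNT for a dichotomous (yes/no) outcome."""
    def odds(p):
        return p / (1 - p)

    arr = risk_comparison - risk_intervention  # absolute risk reduction
    rr = risk_intervention / risk_comparison  # relative risk
    rrr = 1 - rr  # relative risk reduction
    odds_ratio = odds(risk_intervention) / odds(risk_comparison)
    nnt = 1 / arr  # number needed to treat
    return arr, rr, rrr, odds_ratio, nnt


# Risks for (group A, group B) in each scenario
for risk_a, risk_b in [(0.50, 0.30), (0.05, 0.03), (0.25, 0.05)]:
    arr, rr, rrr, odds_ratio, nnt = association_measures(risk_a, risk_b)
    # NNT is conventionally rounded up; round() first guards against float noise
    print(f"A = {risk_a:.0%}, B = {risk_b:.0%}: "
          f"ARR = {arr:.0%} (NNT = {math.ceil(round(nnt, 6))}), "
          f"RR = {rr:.2f} (RRR = {rrr:.0%}), OR = {odds_ratio:.2f}")
```

The same relative risk (0.6) can correspond to very different absolute benefits (NNT of 5 versus 50), and the odds ratio drifts away from the relative risk when the outcome is common.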

The slippery slope of causation

When there is an association, we may be tempted to ascribe causation. For example:
"The new surgical technique produced quicker recovery."

Compare with:
"Patients who received the new surgical technique recovered more quickly."

Perhaps the patients who received the new surgical technique had less serious conditions.

The slippery slope of causation

Be very cautious with conclusions about causality.
A well-executed randomized trial provides the most solid grounds for causal inferences.

"Correlation does not imply causation."

Red flag

Causal inferences in an observational study
An observational study can detect an association, but not (by itself) causation.

Raw and adjusted analyses

In the analysis of the surgery data, it might be wise to take into account the severity of each patient's condition.
This would give adjusted results.
Terminology: unadjusted results are often called "raw" or "crude" results.
Adjusted analyses are often used to account for imbalances between groups.

Raw and adjusted analyses

But remember: an adjusted association is still just an association.
Remain cautious of causal inferences!

Adjusted analyses

linear regression
logistic regression
Poisson regression
Cox proportional hazards regression
Cochran-Mantel-Haenszel methods
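To illustrate crude versus adjusted results, here is a sketch with entirely hypothetical surgery data, in which mild cases mostly receive the new technique and severe cases mostly receive the old one. Within each severity stratum the technique makes no difference, yet the crude odds ratio suggests a strong benefit. The Mantel-Haenszel pooled odds ratio, one of the Cochran-Mantel-Haenszel family of methods, adjusts for severity. The adjusted estimate is still only an association.

```python
# Hypothetical counts per severity stratum:
# (new & quick recovery, new & slow, old & quick recovery, old & slow)
strata = {
    "mild": (80, 20, 16, 4),
    "severe": (5, 15, 25, 75),
}


def odds_ratio(a, b, c, d):
    """Odds of quick recovery with the new technique relative to the old one."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata (tables must be re-iterable)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Crude table: add up the cells across strata, ignoring severity
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
print(f"crude (raw) OR: {odds_ratio(*crude_table):.2f}")
for name, table in strata.items():
    print(f"OR within {name} patients: {odds_ratio(*table):.2f}")
print(f"Mantel-Haenszel (adjusted) OR: {mantel_haenszel_or(strata.values()):.2f}")
```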

Some things I haven't discussed

Biases
Study designs
Power and sample size determination
Other statistical methods:
evaluation of diagnostic tests
meta-analysis
multi-level/hierarchical models
assessment of measurement reliability

Summary

Variability leads to uncertainty.
Statistics uses probability theory to model variability and quantify uncertainty.
Confidence intervals are more informative than p-values, and help to assess clinical significance.
Many common statistical methods focus on estimating and testing associations.
Be careful about ascribing causation.
To account for other factors, statistical methods are available for adjusting associations.

Variability (not variety) is the spice of life.
