
Data is Everywhere

Research Literature
Hypothesis: Surgeon-directed institutional peer review, associated with positive physician feedback, can decrease the morbidity and mortality rates associated with carotid endarterectomy.
Results: The stroke rate decreased from 3.8% (1993-1994) to 0% (1997-1998). The mortality rate decreased from 2.8% (1993-1994) to 0% (1997-1998). The average length of stay decreased from 4.7 days (1993-1994) to 2.6 days (1997-1998). The average total cost decreased from $13,344 (1993-1994) to $9,548 (1997-1998).
Archives of Surgery, August 2000

Popular Press
"For the first time, an influential doctors group is recommending that some children as young as 8 be given cholesterol-fighting drugs to ward off future heart problems... With one-third of U.S. children overweight and about 17 percent obese, the new recommendations are important, said Dr. Jennifer Li, a Duke University children's heart specialist."
cnn.com, July 8, 2008

Data Provides Information

Good data can be analyzed and summarized to provide useful information.
Bad data can be analyzed and summarized to provide incorrect, harmful, or non-informative information.

Steps in a Research Project
- Planning
- Design
- Data Collection
- Data Analysis
- Presentation
- Interpretation

Biostatistics
- Design of Studies
  Sample size
  Selection of study participants
  Role of randomization
- Data Collection
- Variability
  Important patterns in data are obscured by variability.
  Distinguish real patterns from random variation.
- Inference
  Draw general conclusions from limited data, e.g., a survey.
- Summarize
  What summary measures will best convey the results?
  How do we convey uncertainty in results?
- Interpretation
  What do the results mean in terms of practice, the program, the population, etc.?

1954 Salk Polio Vaccine Trial

School children were randomly assigned to vaccine (n = 200,745) or placebo (n = 201,229).

Polio Cases
Vaccine: 82 out of 200,745
Placebo: 162 out of 201,229

Reference: Meier P, "The Biggest Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine," in Statistics: A Guide to the Unknown, 1972.

There were almost twice as many polio cases in the placebo group as in the vaccine group.

Design: Features of the Polio Trial
- Comparison group
- Randomized
- Placebo controls
- Double blind

Objective: The groups should be equivalent except for the factor (vaccine) being investigated.

Question: Could the results be due to chance? Could we get such great imbalance by chance? p-value = ?
Statistical methods tell us how to make these probability calculations.

Types of Data

There are different statistical methods for different types of data.

Binary (dichotomous) data
- Polio: Yes/No
- Cure: Yes/No
- Gender: Male/Female
To compare the number of polio cases in the two treatment arms of the Salk polio vaccine trial, you could use Fisher's Exact Test or the Chi-Square Test.

Categorical data
- Race/ethnicity (nominal: no ordering)
- Country of birth (nominal: no ordering)
- Degree of agreement (ordinal: ordered)

Continuous data (finer measurements)
- Blood pressure, weight, height, age
To compare blood pressure in a clinical trial evaluating two blood-pressure-lowering medications, you could use the 2-sample t-test or the Wilcoxon Rank Sum (nonparametric) test.

Time-to-event data
- Time in remission

Sample Mean (X̄)

Notes on the Sample Mean (X̄)

Add up the data, then divide by the sample size (n). The sample size n is the number of observations (pieces of data).

Formula:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

- Also called the sample average or arithmetic mean.
- Sensitive to extreme values: one data point could make a great change in the sample mean.
- Why is it called the sample mean? To distinguish it from the population mean.

Example: n = 5 systolic blood pressures (mmHg):
X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95

X̄ = (120 + 80 + 90 + 110 + 95)/5 = 99 mmHg

Population Versus Sample

Population: the entire group about which you want information, e.g., the blood pressures of all 18-year-old male college students in the U.S. The population mean is μ.

Sample: a part of the population from which we actually collect information, used to draw conclusions about the whole population, e.g., blood pressures from a sample of n = 5 18-year-old male college students in the U.S. The sample mean is X̄.

The sample mean X̄ is not the population mean μ.
- We don't know the population mean μ, but we would like to.
- We draw a sample from the population and calculate the sample mean X̄.
- How close is X̄ to μ? Statistical theory will tell us how close X̄ is to μ.

STATISTICAL INFERENCE IS THE PROCESS OF TRYING TO DRAW CONCLUSIONS ABOUT THE POPULATION FROM THE SAMPLE. We will return to this later.

Sample Median

The median is the middle number:
80  90  95  110  120
Median = 95

Not sensitive to extreme values: if 120 became 200, the median would not change, but the mean would change a great deal (it becomes 115).

If the sample size is an even number, average the two middle numbers:
80  90  95  110  120  125
Median = (95 + 110)/2 = 102.5 mmHg
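To make the arithmetic concrete, here is a minimal Python sketch (not part of the original lecture; it assumes numpy is installed) that computes the sample mean and median for these blood pressures and shows the median's insensitivity to an extreme value.

```python
import numpy as np

bp = np.array([120, 80, 90, 110, 95])  # n = 5 systolic BPs (mmHg)

print(np.mean(bp))    # 99.0 -> sample mean
print(np.median(bp))  # 95.0 -> sample median (middle value)

# Replace 120 with an extreme 200: the mean jumps, the median does not
bp_extreme = np.array([200, 80, 90, 110, 95])
print(np.mean(bp_extreme))    # 115.0
print(np.median(bp_extreme))  # 95.0
```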

Describing Variability

How can we describe the spread of the distribution?
- Minimum and maximum
- Range = Max − Min

The sample variance is essentially the average of the squared deviations about the sample mean (dividing by n − 1 rather than n):

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

Why n − 1? Stay tuned.

The sample standard deviation (s or SD) is the square root of s².

Calculating s

Example: n = 5 systolic blood pressures (mmHg):
X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95

Sample mean: X̄ = 99 mmHg
Sample variance: s² = 255
Sample standard deviation: s = 15.97 mmHg

Notes on s
- The bigger s is, the more variability there is.
- s measures the spread about the mean.
- s can equal 0 only if there is no spread, i.e., all n observations have the same value.
- The units of s are the same as the units of the data (e.g., mmHg).
- Often abbreviated SD.
- s is the best estimate of the population standard deviation σ.

Interpretation: most of the population will be within about 2 standard deviations (s) of the mean. For a normally (Gaussian) distributed population, "most" is about 95%.
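As a quick check of the arithmetic above, here is a short Python sketch (again assuming numpy; not from the original slides) that reproduces s² = 255 and s = 15.97 for the five blood pressures.

```python
import numpy as np

bp = np.array([120, 80, 90, 110, 95])

xbar = bp.mean()                               # 99.0 mmHg
s2 = ((bp - xbar) ** 2).sum() / (len(bp) - 1)  # divide by n - 1
s = np.sqrt(s2)

print(xbar, s2, s)         # 99.0 255.0 15.97...
print(np.var(bp, ddof=1))  # 255.0 -- ddof=1 gives the n-1 (sample) variance
print(np.std(bp, ddof=1))  # 15.97...
```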

More Notes about SD: Why do we divide by n − 1 instead of n?

We want to replace the population mean μ with X̄ in the formula for s²:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$$

Because we don't know μ, we use X̄. But Σ(Xi − X̄)² tends to be smaller than Σ(Xi − μ)². So, to compensate, we divide by a smaller number: n − 1 instead of n.

n − 1 is called the degrees of freedom of the variance. Why?
- The sum of the deviations is zero.
- The last deviation can be found once we know the other n − 1.
- So only n − 1 of the squared deviations can vary freely.

The term "degrees of freedom" arises in other statistics. It is not always n − 1, but it is in this case.

Other Measures of Variation
- Standard deviation (SD or s)
- Minimum and maximum observation
- Range = Max − Min

What happens to these as the sample size increases? Do they tend to increase? Tend to decrease? Remain about the same?

Continuous Variables: Histograms

Means and medians do not tell the whole story:
- Differences in spread (variability)
- Differences in shape of the distribution

Histograms are a way of displaying the distribution of a set of data by charting the number (or percentage) of observations whose values fall within pre-defined numerical ranges.

How to Make a Histogram

Divide the data into (equal) intervals, then count the number in each interval.

Table 20: Resident Population by Age and State (2000); values are the percent of residents aged 65 and over.

State          %      State            %      State            %
Alabama       13.0    Louisiana       11.6    Ohio            13.3
Alaska         5.7    Maine           14.4    Oklahoma        13.2
Arizona       13.0    Maryland        11.3    Oregon          12.8
Arkansas      14.0    Massachusetts   13.5    Pennsylvania    15.6
California    10.6    Michigan        12.3    Rhode Island    14.5
Colorado       9.7    Minnesota       12.1    South Carolina  12.1
Connecticut   13.8    Mississippi     12.1    South Dakota    14.3
Delaware      13.0    Missouri        13.5    Tennessee       12.4
Florida       17.6    Montana         13.4    Texas            9.9
Georgia        9.6    Nebraska        13.6    Utah             8.5
Hawaii        13.3    Nevada          11.0    Vermont         12.7
Idaho         11.3    New Hampshire   12.0    Virginia        11.2
Illinois      12.1    New Jersey      13.2    Washington      11.2
Indiana       12.4    New Mexico      11.7    West Virginia   15.3
Iowa          14.9    New York        12.9    Wisconsin       13.1
Kansas        13.3    North Carolina  12.0    Wyoming         11.7
Kentucky      12.5    North Dakota    14.7

Source: Statistical Abstract of the United States, 2001.
www.census.gov/prod/2002pubs/01statab/stat-ab01.html

Count the observations in each class. Here are the counts:

Class         Count    Class          Count    Class          Count
4.1 to 5.0      0      9.1 to 10.0      3      14.1 to 15.0     5
5.1 to 6.0      1      10.1 to 11.0     2      15.1 to 16.0     2
6.1 to 7.0      0      11.1 to 12.0     9      16.1 to 17.0     0
7.1 to 8.0      0      12.1 to 13.0    14      17.1 to 18.0     1
8.1 to 9.0      1      13.1 to 14.0    12      18.1 to 19.0     0

How to Make a Simple Histogram
- Divide the range of the data into intervals (bins) of equal width.
- Count the number of observations in each class.
- Draw the histogram.
- Label the scales.

[Figure: frequency histogram of the percent of residents over 65, by state; y-axis: number of states.]

Pictures of Data: Histograms
[Figure: three histograms of systolic blood pressure (mm Hg) for the same population of men, drawn with bin widths of 5 mm Hg, 1 mm Hg, and 20 mm Hg; y-axis: number of men in population.]

How many intervals (bins) should you have in a histogram?
There is no perfect answer to this; it depends on the sample size n.
Rough guideline: number of intervals ≈ √n

n       Number of Intervals
10      about 3
50      about 7
100     about 10
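Here is a minimal Python sketch of these steps (not from the original lecture; it assumes numpy and matplotlib are installed, and uses simulated blood pressures purely for illustration), applying the √n bin guideline.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: simulate 100 systolic BPs for illustration only
rng = np.random.default_rng(0)
bp = rng.normal(loc=125, scale=14, size=100)

# Rough guideline: number of bins ~ sqrt(n)
n_bins = int(round(np.sqrt(len(bp))))  # 10 bins for n = 100

plt.hist(bp, bins=n_bins, edgecolor="black")  # count observations per bin
plt.xlabel("Systolic Blood Pressure (mm Hg)")  # label the scales
plt.ylabel("Number of Men")
plt.show()
```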

Other Types of Histograms

[Figure: IgM concentrations (g/l) in 324 children, displayed as a relative frequency histogram and as a relative frequency polygon; y-axis: relative frequency (%).]

Histogram applet at http://www.stat.sc.edu/~west/javahtml/Histogram.html

Stem and Leaf Plot

 9 | 79
10 | 1166788999
11 | 0112333444555667777889
12 | 00111111233445555566777777889
13 | 0011123333456667788999
14 | 0111224446
15 | 003
16 | 05

Boxplots

[Figure: boxplot of values ranging from about 100 to 160, annotated with the largest non-outlier, 75th percentile, sample median, 25th percentile, smallest non-outlier, and an outlier.]

Shapes of the Distribution
- Symmetrical and bell-shaped
- Positively skewed, or skewed to the right
- Negatively skewed, or skewed to the left
- Bimodal
- Reverse J-shaped
- Uniform

Many Distributions Are Not Symmetric
[Figure: skewed distributions from 1976 and 1988. Source: Silbergeld, Annual Rev. Public Health, 1997.]

Distribution Characteristics
- Right skewed (positively skewed): long right tail; mean > median; e.g., hospital stays.
- Left skewed (negatively skewed): long left tail; mean < median; e.g., humidity (can't get over 100%).
- Symmetric: right and left sides are mirror images; the left tail looks like the right tail; mean ≈ median ≈ mode.

Outlier: an individual observation that falls outside the overall pattern of the graph.

Note on Shapes of Distributions
[Figure: a right-skewed curve with the mode, median, and mean marked from left to right.]
The mean is the balancing point of the distribution.

The Histogram and the Probability Density
[Figure: histograms of systolic blood pressure (mm Hg), y-axis: number of men in population, for a medium sample, a large sample, and the entire population; as the sample grows, the histogram approaches the smooth probability density of the population.]

The Probability Density

- The probability density is a smooth idealized curve that shows the shape of the distribution in the population.
- This is generally a theoretical distribution that we can never see; we can only estimate it from the distribution presented by a representative (random) sample from the population.
- Areas in an interval under the curve represent the percent of the population in the interval.

[Figure: histogram of serum albumin (g/l), roughly 25 to 50, with frequency on the y-axis.]

What is the most well-known distribution? The Normal (Gaussian) Distribution.

The Normal (Gaussian) Distribution
- Symmetric
- Bell-shaped
- Mean ≈ Median ≈ Mode

The Normal Distribution

There are lots of normal distributions. You can tell which normal distribution you have by knowing the mean and standard deviation:
- The mean (μ) is the center.
- The standard deviation (σ) measures the spread (variability).

Applet at http://stat-www.berkeley.edu/~stark/Java/Html/StandardNormal.htm

The Normal Distribution

Areas under a normal curve represent the proportion of the total values described by the curve that fall in that range.
[Figure: a normal curve with a shaded region; the shaded area is approximately 29% of the total area under the curve.]

The 68-95-99.7 Rule

In any normal distribution, approximately:
- 68% of the observations fall within one standard deviation of the mean.
- 95% of the observations fall within two standard deviations* of the mean.
- 99.7% of the observations fall within three standard deviations of the mean.
*more precisely, 1.96

Distribution of Heights in Females Age 18-24
- Approximately normal
- Mean: 65 inches
- Standard deviation: 2.5 inches
[Figure: normal curve over heights from 57.5 to 72.5 inches, with the central 68% shaded.]

The rule says that if a population is normally distributed, then approximately 68% of the population will be within 1 SD of the mean.
It doesn't guarantee that exactly 68% of your sample of data will fall within 1 SD of X̄. Why? The rule works better if the sample size is big.

Standard Normal Distribution
μ = 0 and σ = 1

Standard Normal Scores (Z-Scores)

How many standard deviations from the population mean are you?

$$Z = \frac{\text{observation} - \text{population mean}}{\text{standard deviation}}$$

A standard score of:
- Z = 1 means the observation lies one SD above the mean.
- Z = 2 means the observation lies two SD above the mean.
- Z = −1 means the observation lies one SD below the mean.
- Z = −2 means the observation lies two SD below the mean.

Z-Scores

Example: female heights, mean = 65 inches, s = 2.5 inches.

Height = 72.5 inches:
Z = (72.5 − 65)/2.5 = +3.0

Height = 60 inches:
Z = (60 − 65)/2.5 = −2.0

What's the usefulness of standard normal scores?
- It tells you how many SDs an observation is from the mean.
- Thus, it is a way of quickly assessing how unusual an observation is.

Suppose the mean height is 65 inches and s = 2.5. Is 72.5 inches unusually tall? If we know Z = 3.0, does that help us?
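A short Python sketch (not part of the slides; it assumes scipy is installed) computes these Z-scores and uses the standard normal distribution to answer "how unusual?" directly.

```python
from scipy.stats import norm

mean, sd = 65, 2.5             # female heights (inches)

z_tall = (72.5 - mean) / sd    # +3.0
z_short = (60 - mean) / sd     # -2.0

# How unusual is Z = 3.0? Fraction of a normal population above it:
print(1 - norm.cdf(z_tall))    # ~0.00135, i.e., about 0.13%
# Fraction below Z = -2.0:
print(norm.cdf(z_short))       # ~0.0228, i.e., about 2.28%
```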

Assuming the population has a normal distribution, the fraction of the population that is:

Z     Within Z SDs    More than Z SDs   More than Z SDs   More than Z SDs
      of the mean     above the mean    below the mean    above or below the mean
0.5   38.29%          30.85%            30.85%            61.71%
1.0   68.27%          15.87%            15.87%            31.73%
1.5   86.64%           6.68%             6.68%            13.36%
2.0   95.45%           2.28%             2.28%             4.55%
2.5   98.76%           0.62%             0.62%             1.24%
3.0   99.73%           0.13%             0.13%             0.27%
3.5   99.95%           0.02%             0.02%             0.05%

Normal probability applet at www-stat.stanford.edu/~naras/jsm/FindProbability.html

Problems

Suppose the population is normally distributed:
- If you have a standard score of Z = 3, what % of the population would have scores greater than you?
- If you have a standard score of Z = 1.5, what % of the population would have scores less than you?
- If you have a standard score of Z = −2, what % of the population would have scores greater than you?
- If you have a standard score of Z = −2, what % of the population would have scores less than you?

Suppose we call "unusual" those observations that are either at least 2 SD above the mean or at least 2 SD below the mean. What % are unusual? In other words, what % of the observations will have a standard score either Z > +2.0 or Z < −2.0?

- What % of the observations would have |Z| > 3.0?
- What % of observations would have |Z| > 1.15?
- What % would have |Z| > 2?
- What % of the observations would have |Z| > 1.0 (i.e., more than 1 SD away from the mean)?

Normal Distribution

Tabulated values are the proportion of the standard Normal distribution outside the range ±z, where z is a standard Normal deviate (also called two-sided p-values). Abridged table; the full table runs from z = 0.00 to 2.99 in steps of 0.01:

z      P         z      P         z      P
0.00   1.0000    1.00   0.3173    2.00   0.0455
0.10   0.9203    1.10   0.2713    2.10   0.0357
0.20   0.8415    1.20   0.2301    2.20   0.0278
0.30   0.7642    1.30   0.1936    2.30   0.0214
0.40   0.6892    1.40   0.1615    2.40   0.0164
0.50   0.6171    1.50   0.1336    2.50   0.0124
0.60   0.5485    1.60   0.1096    2.58   0.0099
0.70   0.4839    1.70   0.0891    2.60   0.0093
0.80   0.4237    1.80   0.0719    2.70   0.0069
0.90   0.3681    1.90   0.0574    2.80   0.0051
                 1.96   0.0500    2.90   0.0037

The above results will turn out to be very important later in our discussion of p-values.
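These tabulated two-sided p-values can be reproduced directly; here is a minimal sketch (not from the slides, assuming scipy is available).

```python
from scipy.stats import norm

# Proportion of the standard Normal outside +/- z (a two-sided p-value)
def two_sided_p(z: float) -> float:
    return 2 * (1 - norm.cdf(abs(z)))

print(two_sided_p(1.00))  # 0.3173
print(two_sided_p(1.96))  # 0.0500
print(two_sided_p(2.58))  # 0.0099
```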

Is every variable normally distributed?

Absolutely not. Then why do we spend so much time studying the normal distribution?
1. Some variables are normally distributed.
2. A bigger reason is the Central Limit Theorem (next lecture).

Population versus Sample

The population of interest could be:
- All women between ages 30 and 40
- All patients with a particular disease

The sample is a small number of individuals from the population; the sample is a subset of the population.

Population versus Sample

A parameter is a number that describes the population. A parameter is a fixed number, but in practice we do not know its value. Examples: the population mean μ, the population proportion p.

A statistic is a number that describes a sample of data. A statistic can be calculated. We often use a statistic to estimate an unknown parameter. Examples: the sample mean X̄, the sample proportion p̂.

Sample mean (X̄) versus population mean (μ), e.g., mean blood pressure:
- We know the sample mean (e.g., X̄ = 99 mmHg).
- We don't know the population mean μ, but we would like to.

Sample proportion versus population proportion, e.g., the proportion of individuals with health insurance:
- We know the sample proportion (e.g., 80%).
- We don't know the population proportion.

Key Question: How close is the sample mean (or proportion) to the population mean (or proportion)?

Sources of Error

Errors from biased sampling: the study systematically favors certain outcomes.
- Voluntary response
- Non-response
- Convenience sampling
Solution: random sampling.

Errors from (random) sampling:
- Caused by chance occurrence.
- You can get a bad sample because of bad luck (by "bad" I mean not representative).
- Can be controlled by taking a larger sample.
- Using mathematical statistics, we can figure out how much potential error there is from random sampling (the standard error).

Some Examples of Potentially Biased Sampling

Example: a blood pressure study of women age 30-40.
- Volunteers: non-random; selection bias.
- Family members: non-random; not independent.
- Telephone survey with random digit dialing: random or non-random sample?

Example: a clinic population of 100 consecutive patients.
- Random or non-random sample?
- Convenience samples are sometimes assumed to be random.

Example: the Literary Digest poll of the 1936 presidential election.
- Election result: 62% voted for Roosevelt.
- Digest prediction: 43% voted for Roosevelt.
- Problem: sampling bias.

Selection bias: a mail questionnaire was sent to 10 million people, drawn from sources such as telephone books and club lists. Poor people were unlikely to have a telephone (only 25% had telephones).

Non-response bias: only about 20% responded (2.4 million), and responders differed from non-responders.

Bottom Line
- When a selection procedure is biased, taking a larger sample does not help; this just repeats the mistake on a larger scale.
- Non-respondents can be very different from respondents. When there is a high non-response rate, look out for non-response bias.

Random Sample

- When a sample is randomly selected from a population, it is called a random sample.
- In a simple random sample, each individual in the population has an equal chance of being chosen for the sample.
- Random sampling helps control systematic bias.
- But even with random sampling, there is still sampling variability or error.

Sampling Variability

If we repeatedly choose samples from the same population, a statistic will take different values in different samples.

IDEA: If the statistic does not change much when you repeat the study (you get the same answer each time), then it is fairly reliable (not a lot of variability).

Example

Estimate the proportion of persons in a population who have health insurance. Choose a sample of size n = 1373.

Sample 1: p̂ = 1100/1373 = .8012
Sample 2: p̂ = 1090/1373 = .7939
Sample 3: p̂ = .8347
Sample 4: p̂ = .7786
and so on

Is the sample proportion reliable? If we took another sample of another 1373 persons, would the answer bounce around a lot?

The Sampling Distribution

The spread of the sampling distribution depends on the sample size.
[Figure: histograms of 1000 sample proportions with health insurance, based on samples of size n = 300, n = 1000, and n = 1373; the larger the sample size, the narrower the spread.]

Let's explore this...

Population distribution of health insurance: each person either has health insurance or does not, and the population proportion with insurance is p = .80.

Let's do an experiment...

Take 500 separate random samples from this population of patients, each with n = 20 patients. For each of the 500 samples, we will plot the health insurance status and record the sample proportion. Ready, set, go...
[Figure: two example samples of n = 20, with sample proportions p̂ = 0.9 and p̂ = 0.6.]

So we did this 500 times... Looking at a histogram of the 500 sample proportions, each based on a sample of size 20: it is centered at p̂ = 0.8 with standard deviation s = 0.11.

Let's do ANOTHER experiment... Take 500 separate random samples, each with n = 50 patients, again plotting the health insurance status and recording the sample proportion.
[Figure: two example samples of n = 50, with sample proportions p̂ = 0.8 and p̂ = 0.7.]

So we did this 500 times... The histogram of the 500 sample proportions, each based on a sample of size 50, is centered at p̂ = 0.8 with standard deviation s = 0.06.

Let's do ANOTHER experiment... Take 500 separate random samples, each with n = 100 patients.
[Figure: two example samples of n = 100, with sample proportions p̂ = 0.76 and p̂ = 0.83.]

So we did this 500 times... The histogram of the 500 sample proportions, each based on a sample of size 100, is centered at p̂ = 0.8 with standard deviation s = 0.04.

Let's Review

Population: p = .8 (health insurance / no health insurance). A sketch of this simulation follows below.

n       Mean of 500 sample proportions    SD of the sample proportions
 20     0.799                             0.11
 50     0.803                             0.06
100     0.798                             0.04
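The experiment above is easy to re-run yourself. Here is a minimal Python sketch (not part of the lecture; it assumes numpy, and the random seed and exact output values are illustrative) of drawing 500 samples at each sample size and summarizing the sample proportions.

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = 0.80  # population proportion with health insurance

for n in (20, 50, 100):
    # 500 samples of size n; each binomial count, divided by n, is one p-hat
    phats = rng.binomial(n, p_true, size=500) / n
    print(n, phats.mean().round(3), phats.std(ddof=1).round(3))
# The SD of the 500 sample proportions shrinks roughly like
# sqrt(p(1-p)/n) as n grows, matching the pattern on the review slide
```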

Let's do an experiment...

Population distribution of blood pressures: normal with μ = 125 mm Hg and σ = 14 mm Hg.
[Figure: normal curve of systolic blood pressure (mm Hg), roughly 80 to 160; y-axis: percentage of men in population.]

Take 500 separate random samples from this population of men, each with n = 20 subjects. For each of the 500 samples, we will plot a histogram of the sample BP values and record the sample mean and sample standard deviation. Ready, set, go...
[Figure: two example samples of n = 20, with sample means 125.17 and 124.3 and sample SDs 12.36 and 11.65.]

So we did this 500 times... The histogram of the 500 sample means, each based on a sample of size 20, is centered at X̄ = 125 with standard deviation s_X̄ = 3.07.

Let's do ANOTHER experiment... Take 500 separate random samples, each with n = 50 subjects, again recording the sample mean and sample standard deviation.
[Figure: two example samples of n = 50, with sample means 124.98 and 126.72 and sample SDs 14.05 and 13.64.]

So we did this 500 times... The histogram of the 500 sample means, each based on a sample of size 50, is centered at X̄ = 125.01 with s_X̄ = 1.93.

Let's do ANOTHER experiment... Take 500 separate random samples, each with n = 100 subjects.
[Figure: two example samples of n = 100, with sample means 125.06 and 127.32 and sample SDs 13.15 and 14.93.]

So we did this 500 times... The histogram of the 500 sample means, each based on a sample of size 100, is centered at X̄ = 124.93 with s_X̄ = 1.41.

Let's Review

Population: normal with μ = 125 mm Hg, σ = 14 mm Hg.

n       Mean of 500 sample means    SD of the sample means (s_X̄)
 20     124.997                     3.07
 50     125.015                     1.93
100     124.934                     1.41

Population distribution of hospital length of stay: right-skewed, with μ = 4 days and σ = 3 days.
[Figure: right-skewed distribution of length of stay (in days), ranging from 0 to about 30; y-axis: percentage.]

Let's do an experiment...

Take 500 separate random samples from this population of hospital admissions, each with n = 16 patients. For each of the 500 samples, we will plot a histogram of the sample LOS values and record the sample mean and sample standard deviation. Ready, set, go...
[Figure: two example samples of n = 16, with sample means 4.7 and 5.01 and sample SDs 2.88 and 2.73.]

So we did this 500 times... The histogram of the 500 sample means, each based on a sample of size 16, is centered at X̄ = 4.08 with s_X̄ = 0.74.

Let's do ANOTHER experiment... Take 500 separate random samples of hospital admissions, each with n = 64 patients, again recording the sample mean and sample standard deviation.
[Figure: two example samples of n = 64, with sample means 4.26 and 4.08 and sample SDs 2.72 and 2.45.]

So we did this 500 times... The histogram of the 500 sample means, each based on a sample of size 64, is centered at X̄ = 4.1 with s_X̄ = 0.37.

Let's do ANOTHER experiment... Take 500 separate random samples, each with n = 256 patients.
[Figure: two example samples of n = 256, with sample means 4.48 and 4.29 and sample SDs 3.32 and 2.76.]

So we did this 500 times... Looking at a histogram of the 500 sample means, each based on a sample of size 256:

Let's Review

Population: right-skewed, μ = 4 days, σ = 3 days.

n       Mean of 500 sample means    SD of the sample means (s_X̄)
 16     4.081                       0.74
 64     4.104                       0.37
256     4.100                       0.19

Even though the population distribution of length of stay is skewed, the histograms of the sample means look increasingly normal as n grows; a simulation sketch follows below.
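To see this pattern yourself, here is a minimal Python sketch (not part of the lecture). As a stand-in for the skewed length-of-stay population, it assumes a gamma distribution chosen to have mean 4 and SD 3; the seed and exact outputs are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Skewed stand-in for length of stay: gamma with mean 4 and SD 3
# (shape * scale = 4 and shape * scale**2 = 9)
shape, scale = 16 / 9, 9 / 4

for n in (16, 64, 256):
    samples = rng.gamma(shape, scale, size=(500, n))
    means = samples.mean(axis=1)  # 500 sample means
    print(n, means.mean().round(2), means.std(ddof=1).round(2))
# The SD of the sample means is about 3/sqrt(n): it halves each time n
# quadruples, matching the 0.74 / 0.37 / 0.19 pattern on the review slide
```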

Variation in sample mean values is tied to the size of each sample, NOT the number of samples.

The Sampling Distribution

The sampling distribution of a sample statistic refers to what the distribution of the statistic would look like if we chose a large number of samples from the same population.

Sampling distribution applet: www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html

[Figure: histograms of sample means based on 500 versus 5000 simulations, for n = 16, n = 64, and n = 256; increasing the number of simulations does not change the spread.]

Sampling Distribution of a Sample Mean

The sampling distribution of a sample mean is a theoretical probability distribution. It describes the distribution of all sample means, from all possible random samples of the same size, taken from a population.

In real research it is impossible to estimate the sampling distribution of a sample mean by actually taking multiple random samples from the same population; no research would ever happen if a study needed to be repeated multiple times to understand this sampling behavior. Simulations are useful to illustrate a concept, but not as a practical approach! Luckily, there is some mathematical machinery that generalizes some of the patterns we saw in the simulation results.

Amazing Result

Mathematical statisticians have figured out how to predict what the sampling distribution will look like without actually repeating the study numerous times and having to choose a sample each time. Often, the sampling distribution will look normal.

[Figure: histogram of sample proportions with health insurance from samples of size n = 1373; the histogram looks normal.]

The Big Idea

It's not practical to keep repeating a study to evaluate sampling variability and to determine the sampling distribution. Mathematical statisticians have figured out how to calculate it without doing multiple studies.

The sampling distribution of a statistic is often normally distributed. This mathematical result comes from the CENTRAL LIMIT THEOREM. For the theorem to work, it requires the sample size (n) to be large (usually n > 60 suffices).

Statisticians have derived formulas to calculate the standard deviation of the sampling distribution; it is called the standard error of the statistic.

Central Limit Theorem

If the sample size is large, the distribution of sample means approximates a normal distribution.

Illustration of the Central Limit Theorem
[Figure: distributions of the mean value when rolling one die, two dice, and five dice, and of means based on n = 16, n = 32, and n = 64; as the number of dice (or the sample size) increases, the distribution of the mean looks more and more normal.]

Why is the normal distribution so important in the study of statistics?

It's not because things in nature are always normally distributed (although sometimes they are). It's because of the Central Limit Theorem: the sampling distribution of statistics (like a sample mean) often follows a normal distribution if the sample sizes are large.

Why is the sampling distribution so important?

If a sampling distribution has a lot of variability (i.e., a big standard error), then if you took another sample it's likely you would get a very different result. About 95% of the time, the sample mean (or proportion) will be within 2 standard errors of the population mean (or proportion). This tells us how close the sample statistic should be to the population parameter.

Standard Errors (SE)

The standard deviation IS NOT the standard error of a statistic.

The standard deviation measures the variability among individual observations.

The standard error measures the precision of a statistic, such as a sample mean or proportion, that is calculated from a number (n) of different observations; the sample mean and sample proportion are trying to estimate the population mean or population proportion. A small SE means the statistic is more precise. The SE is the standard deviation of the sampling distribution of the statistic.

Mathematical statisticians have come up with formulas for the standard error. There are different formulae for:
- the standard error of the mean (SEM)
- the standard error of a proportion

These formulae always involve the sample size n. As the sample size gets bigger, the standard error gets smaller.

Standard Error of the Mean (SEM)

This is a measure of the precision of the sample mean:

$$SEM = \frac{s}{\sqrt{n}}$$

Example: measure systolic blood pressure on a random sample of 100 students.
Sample size: n = 100
Sample mean: X̄ = 123.4 mmHg
Sample SD: s = 14.0 mmHg
SEM = 14/√100 = 1.4 mmHg

Notes on SEM
- The smaller SEM is, the more precise X̄ is.
- SEM depends on n and s. SEM gets smaller if s gets smaller or n gets bigger.

Question: How close to the population mean (μ) is the sample mean (X̄)?

ANSWER: The standard error of the sample mean tells us that 95% of the time, the population mean will lie within about 2 standard errors of the sample mean.

95% Confidence Interval for the Population Mean

X̄ ± 2 SEM
(More accurately: X̄ ± 1.96 SEM)

The CI gives the range of plausible values for μ.

Example: blood pressure, with n = 100, X̄ = 123.4 mmHg, s = 14. The 95% CI is
123.4 ± 2 × 1.4
123.4 ± 2.8

Why is this true? Because of the Central Limit Theorem.

INTERPRETATION: We are 95% confident that the sample mean is within 2.8 mmHg of the population mean. The 95% error bound is 2.8.

Ways to write a confidence interval:
120.6 to 126.2
(120.6, 126.2)
(120.6-126.2)

We are highly confident that the population mean falls in the range 120.6 to 126.2.
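A minimal Python sketch of this calculation (not from the slides; it uses only the standard library, and the ±2 multiplier follows the slide's rule of thumb, with 1.96 being the more precise value):

```python
import math

n, xbar, s = 100, 123.4, 14.0  # blood pressure example
sem = s / math.sqrt(n)         # 1.4 mmHg

z = 2                          # slide's rule of thumb; 1.96 more precisely
lo, hi = xbar - z * sem, xbar + z * sem
print(f"SEM = {sem:.2f}; 95% CI = ({lo:.1f}, {hi:.1f})")
# SEM = 1.40; 95% CI = (120.6, 126.2)
```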

Notes on Confidence Intervals

Interpretation: plausible values for the population mean μ, with high confidence.

Are all CIs 95%? No; 95% is just the most commonly used.
- A 99% CI is wider; a 90% CI is narrower.
- To be more confident you need a bigger interval:
  for a 99% CI you need ±2.576 SEM;
  for a 95% CI you need ±2 SEM (actually ±1.96 SEM);
  for a 90% CI you need ±1.645 SEM.
- Where do these come from?

The length of a CI decreases when:
- n increases
- s decreases
- the level of confidence decreases (e.g., 90% or 80% vs. 95%)

A confidence interval accounts only for random sampling error, not for other systematic sources of error or bias. Examples:
- BP measurement is always +5 too high.
- Only those with high BP agree to participate (non-response bias).

Technical interpretation: the CI "works" (includes μ) 95% of the time.

X̄ ± 2 SEM, i.e., X̄ ± 2s/√n

Underlying Assumptions for a 95% CI for the Population Mean
- Random sample of the population.
- Sample size n is at least 60 to use ±2 SEM (the Central Limit Theorem requires large n).

Confidence interval applet: www.stat.sc.edu/~west/javahtml/ConfidenceInterval.html

What if the sample size is smaller than 60?

There needs to be a small correction in the formula X̄ ± 2 SEM: the 2 needs to be slightly bigger. How much bigger depends on the sample size.

Computers or statistical tables refer to the degrees of freedom = n − 1. One looks up the correct number in a t-table, or t-distribution, with n − 1 degrees of freedom. You can think of degrees of freedom like a corrected sample size; in this case it's n − 1 because we had to estimate one parameter by X̄. But it's not always n − 1.

The interval becomes X̄ ± t SEM, i.e., X̄ ± t·s/√n.

Value of t.95 Used for a 95% Confidence Interval for the Mean

df    t         df     t
 1    12.706    12     2.179
 2     4.303    13     2.160
 3     3.182    14     2.145
 4     2.776    15     2.131
 5     2.571    20     2.086
 6     2.447    25     2.060
 7     2.365    30     2.042
 8     2.306    40     2.021
 9     2.262    60     2.000
10     2.228    120    1.980
11     2.201    ∞      1.960

Student's t-Distribution

Tabulated values correspond to a given two-tailed p-value for different degrees of freedom. Abridged table; the full table runs df = 1 to 60:

df     p=0.2   p=0.1   p=0.05   p=0.02   p=0.01   p=0.001
 1     3.078   6.314   12.706   31.821   63.657   636.619
 2     1.886   2.920    4.303    6.965    9.925    31.599
 3     1.638   2.353    3.182    4.541    5.841    12.924
 4     1.533   2.132    2.776    3.747    4.604     8.610
 5     1.476   2.015    2.571    3.365    4.032     6.869
10     1.372   1.812    2.228    2.764    3.169     4.587
15     1.341   1.753    2.131    2.602    2.947     4.073
20     1.325   1.725    2.086    2.528    2.845     3.850
30     1.310   1.697    2.042    2.457    2.750     3.646
40     1.303   1.684    2.021    2.423    2.704     3.551
60     1.296   1.671    2.000    2.390    2.660     3.460
∞      1.282   1.645    1.960    2.326    2.576     3.291

- Most people use t = 2 once n gets above 60 or so.
- Sometimes people use 1.96 when n gets bigger (> 120).
- The value of t depends on the level of confidence and the sample size.

t-distribution applets:
http://www.stat.sc.edu/~west/applets/tdemo.html
http://www.econtools.com/jevons/java/Graphics2D/tDist.html

Example: blood pressure, with n = 5, X̄ = 99 mmHg, s = 15.97, so SEM = 15.97/√5 = 7.142.

The 95% CI is X̄ ± 2.776 SEM (t.95 with 4 degrees of freedom):
99 ± 2.776 × 7.142
99 ± 19.83

The 95% CI for mean blood pressure is (79.17, 118.83), written (79.17, 118.83) or (79.17-118.83). Rounding off is okay too: (79, 119).
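Here is a minimal Python sketch of the small-sample CI (not from the lecture; it assumes scipy), looking up the t multiplier instead of using 2.

```python
import math
from scipy import stats

n, xbar, s = 5, 99.0, 15.97  # blood pressure example, small n
sem = s / math.sqrt(n)       # 7.142

t_crit = stats.t.ppf(0.975, df=n - 1)  # 2.776 for df = 4
lo, hi = xbar - t_crit * sem, xbar + t_crit * sem
print(f"t = {t_crit:.3f}; 95% CI = ({lo:.2f}, {hi:.2f})")
# t = 2.776; 95% CI = (79.17, 118.83)
```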

Confusion between SD and SEM

- Standard deviation (s) measures the spread in the data.
- Standard error (s/√n) measures the precision of the sample mean.
- The standard error of the sample mean depends on the sample size. Does the standard deviation depend on the sample size too?

PROPORTIONS (p)

- Proportion of individuals with health insurance
- Proportion of patients who became infected
- Proportion of patients who are cured
- Proportion of individuals who are hypertensive
- Proportion of individuals positive on a blood test
- Proportion of adverse drug reactions
- Proportion of premature infants who survive

On each individual in the study, we record a binary outcome (Yes/No; Success/Failure) rather than a continuous measurement.

Proportions

Example: n = 200 patients; X = 90 had an adverse drug reaction. The estimated proportion who experience an adverse drug reaction is
p̂ = 90/200 = .45, or 45%.

How accurate an estimate of the population proportion is the sample proportion? What is the standard error of a proportion?

NOTES
- There is uncertainty about this rate because it involved only n = 200 patients. If we had studied a much larger number of patients, would we have gotten a much different answer?
- The sample proportion is p̂ = .45, but it is not the true rate of adverse drug reactions in the population.

The Sampling Distribution of a Proportion

[Figure: histogram of sample proportions centered near .45, ranging from about .35 to .55; y-axis: number of samples.]

The standard error of a sample proportion is

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where p̂ is the sample proportion and n is the sample size.

95% CI for a Proportion

$$\hat{p} \pm 1.96\, SE(\hat{p}) = \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Example: n = 200 patients, X = 90 adverse drug reactions, p̂ = 90/200 = .45.

.45 ± 1.96 √(.45 × .55/200)
.45 ± 1.96 × 0.035
.45 ± 0.07

The 95% confidence interval is (.38, .52).
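This calculation takes only a few lines; here is a minimal sketch (not from the slides; standard library only).

```python
import math

n, x = 200, 90  # patients, adverse drug reactions
phat = x / n    # 0.45

se = math.sqrt(phat * (1 - phat) / n)  # standard error of the proportion
lo, hi = phat - 1.96 * se, phat + 1.96 * se
print(f"SE = {se:.3f}; 95% CI = ({lo:.2f}, {hi:.2f})")
# SE = 0.035; 95% CI = (0.38, 0.52)
```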

Interpreting a 95% CI for a Proportion

- Plausible range of values for the population proportion.
- We are highly confident that the population proportion is in the interval.
- The method works 95% of the time.

Notes on 95% CI for Proportions

- Requires a random (or representative) sample. Suppose the 200 patients were sicker than usual? Suppose the 200 patients were consecutive?
- The confidence interval does not address your definition of a drug reaction or whether that's a good or bad definition. It accounts only for sampling variation.
- You can also have CIs with different levels of confidence.

Sometimes 1.96 SE(p̂) is called the 95% error bound, or the margin of error.

The formula for a 95% CI is ONLY APPROXIMATE. It works well if the numbers of failures (drug reactions) and successes (non-reactions) are both at least 5. Otherwise, you need to use a computer to perform something called exact binomial calculations. You do NOT use the t-correction for small sample sizes like we did for sample means; we use exact binomial calculations.

Example

Study of survival of premature infants: all premature babies born at Johns Hopkins during a 3-year period (Allen, et al., NEJM, 1993).

n = 39 infants born at 25 weeks gestation; 31 survived 6 months.

p̂ = 31/39 = 0.79

95% CI: .63 to .91 (based on exact binomial calculations)

Source: Motulsky, Intuitive Biostatistics

Are confidence intervals needed even though all infants were studied? Are the 39 infants a sample? It seems like it's the whole population.

It makes sense to calculate a CI when the sample is representative of a larger population about which you wish to make inferences. It is reasonable to think that these data from several years at one hospital are representative of data from other years at other hospitals, at least at big-city university hospitals in the United States.

Comparison of 2 Groups

Are the population means different (continuous data)? Two situations:

1. Paired design
   Before-after data
   Twin data

2. Two independent sample design
   [Diagram: subjects are randomized to Treatment A or Treatment B.]

Paired Design: Before → After

Why pairing?
- Controls extraneous noise.
- Everyone acts as their own control.

Example: Blood Pressure and Oral Contraceptive Use

Subjects: Ten non-pregnant, pre-menopausal women 16-49 years old who were beginning a regimen of oral contraceptive (OC) use.
Methods: Measure blood pressure prior to starting OC use, and three months after consistent OC use.
Goal: Identify any changes in average blood pressure associated with OC use in such women.

Rosner, Fundamentals of Biostatistics (2005).

Example: Blood Pressure and Oral Contraceptive Use

Woman    BP Before OC    BP After OC    Difference (After − Before)
 1       115             128            13
 2       112             115             3
 3       107             106            −1
 4       119             128             9
 5       115             122             7
 6       138             145             7
 7       126             132             6
 8       105             109             4
 9       104             102            −2
10       115             117             2
mean     115.6           120.4           4.8

Notes
- The sample average of the differences is 4.8.
- The sample standard deviation (s) of the differences is s = 4.57.

Calculate a 95% CI for the Expected Change in Blood Pressure

95% CI for the population mean BP change:
X̄ ± t.95,df=9 × SEM
4.8 ± 2.262 × 4.57/√10
4.8 ± 2.262 × 1.445
= 1.53 mm Hg to 8.07 mm Hg

Where does 2.262 come from? See the t-distribution with 9 degrees of freedom.

The BP change could be due to factors other than oral contraceptives. A control group of comparable women who were not taking oral contraceptives would strengthen this study.
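Here is a minimal Python sketch of this paired analysis (not part of the lecture; it assumes numpy and scipy), reproducing the CI above and previewing the paired t-test developed in the next sections.

```python
import numpy as np
from scipy import stats

before = np.array([115, 112, 107, 119, 115, 138, 126, 105, 104, 115])
after  = np.array([128, 115, 106, 128, 122, 145, 132, 109, 102, 117])

d = after - before
print(d.mean(), d.std(ddof=1))  # 4.8, 4.57

# 95% CI for the mean change, using t with n - 1 = 9 df
sem = d.std(ddof=1) / np.sqrt(len(d))
t9 = stats.t.ppf(0.975, df=9)   # 2.262
print(d.mean() - t9 * sem, d.mean() + t9 * sem)  # ~1.53 to 8.07

# The paired t-test gives the same t and a two-sided p-value
t_stat, p = stats.ttest_rel(after, before)
print(t_stat, p)                # t = 3.32..., p = 0.0089...
```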

Hypothesis Testing, Significance Testing, and p-values

We want to draw a conclusion about a population parameter: in a population of women who use oral contraceptives, is the average (expected) change in blood pressure (After − Before) 0 or not? Sometimes statisticians use the term "expected" for the population average. Let Δ be the expected (population) mean change in blood pressure.

We choose between two competing possibilities using a single imperfect (paired) sample:
- Null hypothesis H0: Δ = 0
- Alternative hypothesis H1: Δ ≠ 0
We reject H0 if the sample mean is far away from 0.

The number 0 is NOT in the confidence interval (1.53, 8.07). Because 0 is not in the interval, this suggests there is a significant change in BP over time: a significant increase in blood pressure.

The Hypotheses

We set up mutually exclusive, exhaustive possibilities for the truth:

- The null hypothesis H0 typically represents the hypothesis that there is no effect or difference; it represents current beliefs or the state of knowledge. For example, there is no effect of oral contraceptives on blood pressure: H0: Δ = 0.
- The alternative hypothesis H1 typically represents what you are trying to prove. For example, oral contraceptives affect blood pressure: H1: Δ ≠ 0.

Do we have sufficient evidence to reject H0 and claim H1 is true?
- If X̄ is close to zero, it is consistent with H0.
- If X̄ is far from zero, it is consistent with H1.
- How do we decide whether X̄ = 4.8 is more consistent with H0 or H1?

The p-value

What is the probability of observing an extreme sample mean, like 4.8 mm Hg, if the null hypothesis (H0: Δ = 0) were true? The answer is called the p-value.

- If that probability (p-value) is small, it suggests the observed result was unlikely if H0 is true. This provides evidence against H0.
- If that probability (p-value) is large, it suggests the observed result is quite probable if H0 is true. This provides evidence for H0.

http://xkcd.com/892/

How are p-values calculated?

1. First, measure the distance between the sample mean and what you would expect the sample mean to be if H0: Δ = 0 were true:

$$t = \frac{\text{sample mean} - 0}{SEM} = \frac{\bar{X} - 0}{SEM}$$

$$t = \frac{4.8}{4.57/\sqrt{10}} = \frac{4.8}{1.45} = 3.31$$

The value t = 3.31 is called the test statistic. We observed a sample mean that was 3.31 standard errors (SEMs) away from what we would have expected the mean to be if OC had no effect (i.e., Δ = 0).

The t-statistic is analogous to the Z-score introduced earlier:

$$Z = \frac{\text{observation} - \text{mean}}{SD}$$

For Z, the pieces are an observation, the mean, and the standard deviation; for t, they are the sample mean, 0 (because we are calculating p-values under the scenario that H0: Δ = 0), and the standard error of the mean.

How are p-values calculated?

2. Next, calculate the probability of getting a test statistic as or more extreme than what you observed (t = 3.31, in either direction) if H0 were true.

For large samples, this p-value comes from the normal distribution: how unusual is it to get a standard normal score as extreme as ±3.31? Not likely at all (p < .01). Use the normal table.

If the sample size is small (n < 60), a small t-correction must be made: instead of a normal distribution, a t-distribution is used with n − 1 degrees of freedom, and the p-value gets a little larger.

This procedure is called a paired t-test with n − 1 degrees of freedom. In the oral contraceptive example, we performed a paired t-test with 9 degrees of freedom.
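As a sketch of this step (not from the slides; scipy assumed), here is the two-sided p-value for t = 3.31 from the t-distribution with 9 df, alongside the large-sample normal version for comparison.

```python
from scipy import stats

t_obs, df = 3.31, 9
p_two_sided = 2 * (1 - stats.t.cdf(t_obs, df))
print(p_two_sided)  # ~0.009 -- the paired t-test p-value

# With a large n the normal approximation would be used instead;
# note the t-based p-value is larger, as the slide says:
print(2 * (1 - stats.norm.cdf(t_obs)))  # ~0.0009
```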

Interpreting the p-value

The p-value in the blood pressure/OC example is .0089.
Interpretation: If the true before-OC/after-OC blood pressure difference is 0 among all women taking OCs, then the chance of seeing a mean difference as extreme as, or more extreme than, 4.8 in a sample of 10 women is .0089.

- p-values are probabilities (numbers between 0 and 1).
- Small p-values are measures of evidence against H0 in favor of H1.
- The p-value is the probability of obtaining a result as or more extreme than you did by chance alone, assuming the null hypothesis H0 is true.

Using the p-value to make a decision

If the p-value is small, either:
(a) a very rare event occurred and H0 is true, OR
(b) H0 is false.

Using the p-value to make a decision

The p-value in the blood pressure/OC example is .0089.
- This p-value is small.
- So there is a small probability of observing our data (or something more extreme) if H0 is true.
- We reject H0.

The p-value is a continuum of evidence. Guidelines?
- p = .10: suggestive
- p = .05: "magical" cutoff
- p = .01: strong evidence

How precise should p-values be?
- Two decimal places suffice (p = .07).
- Sometimes three decimal places if p < .01 (e.g., p = .007).
- If the p-value is really small, p < .001 is fine.
- If the p-value is really big, p > .20 is fine.

Blood Pressure-OC Example: Summary

Methods: The changes in blood pressure after oral contraceptive use were calculated for 10 women. A paired t-test was used to determine if there was a significant change in blood pressure, and a 95% confidence interval was calculated for the mean blood pressure change (after − before).

Result: Blood pressure measurements increased on average 4.8 mm Hg, with standard deviation 4.57. The 95% confidence interval for the mean change was 1.5 to 8.1. There was evidence that blood pressure measurements after oral contraceptive use were significantly higher than before oral contraceptive use (p = .0089).

Discussion: A limitation of this study is that there was no comparison group of women who did not use oral contraceptives. We do not know if blood pressures might have risen even without oral contraceptive use.

Summary: Paired t-test
1. Designate null and alternative hypotheses.
2. Collect data.
3. Compute the change for each paired set of observations.
4. Compute X̄d, the sample mean of the paired differences, and sd, the sample standard deviation of the differences.
5. Calculate the test statistic:

$$t = \frac{\bar{X}_d - 0}{SEM} = \frac{\bar{X}_d - 0}{s_d/\sqrt{n}}$$

6. Compare t to a t-distribution (with n − 1 degrees of freedom) to get a p-value.
7. If p is small, reject H0. If p is large, fail to reject H0.

Two Types of Errors

- Type I error: claim H1 is true when in fact H0 is true.
- Type II error: do not claim H1 is true when in fact H1 is true.

- The probability of making a Type I error is called the α-level.
- The probability of making a Type II error is called the β-level.
- The probability of NOT making a Type II error is called the power.

The p-value and the α-level

Some people will only call a p-value significant if it is less than some preset cutoff (e.g., .05). This cutoff is called the α-level. The α-level is the probability of a Type I error, i.e., the probability of falsely rejecting H0.

Statistically significant: the p-value is less than a preset threshold value, α.

Do not say merely "The result is statistically significant." Say "The result is statistically significant at α = .05," or "The result is significant (p < .05)."

One-Sided versus Two-Sided p-values

- Two-sided p-value (p = .009): the probability of a result as or more extreme than observed, in either direction (either X̄ ≤ −4.8 or X̄ ≥ 4.8).
- One-sided p-value (p = .0045): the probability of a more extreme positive result than observed (X̄ ≥ 4.8).

You never know what direction study results will go... In this course, we will use two-sided p-values exclusively. This is what is typically done in the scientific/medical literature.

Connection between CIs and HTs

- The CI gives plausible values for the population parameter: "data, take me to the truth."
- Hypothesis testing postulates two choices for the population parameter: "here are two possibilities for the truth; data, help me choose one."

Connection between CIs and HTs

If 0 is not in the 95% CI, then we reject H0 that Δ = 0 at level α = .05 (the p-value < .05). In the BP-OC example, the 95% CI is (1.53, 8.07), which excludes 0.

Why?
- The CI starts at X̄d and captures 2 standard errors in either direction.
- If 0 is not in the 95% CI, then X̄d is more than 2 standard errors from 0 (either above or below).
- So the distance (t) will be > 2 or < −2, and the resulting p-value < .05.

The confidence interval and the p-value are complementary. In this BP-OC example, the 95% confidence interval tells us that the p-value is less than .05, but it doesn't tell us that it is p = .009. You can't get an exact p-value from just looking at a confidence interval. I like to report both.

More on the p-value

STATISTICAL SIGNIFICANCE IS NOT THE SAME AS SCIENTIFIC SIGNIFICANCE.

Example: blood pressure and oral contraceptives, with
n = 100,000; X̄ = .03 mmHg; s = 4.57; p-value = .04

A big n can sometimes produce a small p-value even though the magnitude of the effect is very small (not scientifically significant). Supplement with a CI: the 95% CI is .002 to .058 mmHg.

STATISTICAL SIGNIFICANCE DOES NOT IMPLY CAUSATION.

Blood pressure example: there could be other factors that could explain the change in blood pressure. A significant p-value is only ruling out random sampling as the explanation. We need a comparison group:
- Self-selected (may be okay)
- Randomized (better)

The Language of Hypothesis (Significance) Testing

NOT REJECTING H0 IS NOT THE SAME AS ACCEPTING H0.

Example: blood pressure and oral contraceptives, with
n = 5; X̄ = 5.0 mmHg; s = 4.57; p-value = .07

We cannot reject H0 at significance level α = .05.
- Are we convinced there is no effect of OC on BP? Maybe we should have taken a bigger sample.
- An interesting trend, but not proven beyond a reasonable doubt.
- Look at the confidence interval: the 95% CI is (−.67, 10.7).

More on the p-value

Suppose the p-value is p = .40. How might this result be described?
- "Not statistically significant"
- "Do not reject H0"
Can we also say "Accept H0" or "Claim H0 is true"? No. Statisticians much prefer the double negative: do not reject H0. Innocent until proven guilty.

Comparing Two Independent Groups

Controlled Trial in Peru of Bismuth Subsalicylate (Pepto Bismol)
in Infants with Diarrheal Disease

Infants randomized:
Treatment n = 85
Controls  n = 84

           n    Mean stool output (ml/kg)   SD
Control   84    260                         254
Tx        85    182                         197

Scientific Question: Is there a treatment effect?

Note

The data are not paired: there are different infants in each
group, i.e., 2 independent groups.
How do we calculate
a confidence interval for the difference?
a p-value to determine if the difference in the two groups is
significant? (the 2-sample (unpaired) t-test)

95% CI for the Difference in Means of Two Independent (Unpaired) Groups

Generic CI formula: estimate ± 1.96 × SE
(X̄1 - X̄2) ± 1.96 × SE(X̄1 - X̄2)
SE(X̄1 - X̄2) = standard error of the difference of 2 sample means

The standard error of the difference for two independent
samples is calculated differently than we did for paired designs.
Statisticians have developed formulae for the standard error of
the difference. These formulae depend on the sample sizes in both
groups and the standard deviations in both groups.

The SE of the Difference in Sample Means

Principle: Variation from independent sources can be added:
Variance(X̄1 - X̄2) = (SE(X̄1))² + (SE(X̄2))²
SE(X̄1 - X̄2) = √( (SE(X̄1))² + (SE(X̄2))² )

The formula depends on n1, n2, s1, s2.
There are other slightly different equations for SE(X̄1 - X̄2),
but they all give similar answers.

The SE of the Difference in Sample Means
Example: Pepto Bismol RCT, 95% CI for the Difference in Means

           n    Mean stool output (ml/kg)   SD
Control   84    260                         254
Tx        85    182                         197

SE(X̄1 - X̄2) = √( (254/√84)² + (197/√85)² )
             = √( 27.71² + 21.37² )
             = 34.94

78 ± 1.96 × SE(X̄1 - X̄2)
78 ± 1.96 × 34.94
78 ± 68.48
→ 9 to 147

Note: The confidence interval does not include 0. Thus, p < .05.
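As a quick numerical check, here is a minimal Python sketch of the
slide's calculation (the variable names, such as se_diff, are chosen
here for illustration):

    # Sketch: 95% CI for the difference in mean stool output,
    # from the summary statistics in the table above.
    import math

    n1, mean1, sd1 = 84, 260, 254   # Control
    n2, mean2, sd2 = 85, 182, 197   # Treatment

    se1 = sd1 / math.sqrt(n1)             # SE of each sample mean
    se2 = sd2 / math.sqrt(n2)
    se_diff = math.sqrt(se1**2 + se2**2)  # variances from independent sources add

    diff = mean1 - mean2                  # 78 ml/kg
    lo, hi = diff - 1.96 * se_diff, diff + 1.96 * se_diff
    print(se_diff, (lo, hi))              # SE ~ 35; CI roughly (9, 147)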

Hypothesis Test to Compare Two Independent Groups

Two-Sample (Unpaired) t-test
Are the expected stool outputs equal in the two groups?

H0: μ1 = μ2
H1: μ1 ≠ μ2

t = (difference in means - 0) / (SE of the difference)
t = (260 - 182) / 34.94 = 78 / 34.94 = 2.23

Notes on the 2-sample t-test

This is a 2-sample (unpaired) t-test.
The value t = 2.23 is the test statistic.
We calculate a p-value, which is the probability of obtaining a
test statistic as extreme as we did (beyond ±2.23) if H0 were true.
How is the probability computed? If sample sizes are large
(both greater than 60), a normal distribution is used.
If sample sizes are small, a t correction is required (a t
distribution is used with n1 + n2 - 2 degrees of freedom; that
is, the degrees of freedom is the total sample size from both
groups minus 2).
An assumption that is also required is that both populations
are approximately normally distributed. (Results can be highly
influenced by wild observations or outliers.)
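If the raw data are unavailable, the same test can be reproduced in
Python from the summary statistics alone; a hedged sketch using scipy's
ttest_ind_from_stats (which assumes equal variances by default):

    # Sketch: unpaired t-test from summary statistics only.
    from scipy.stats import ttest_ind_from_stats

    t, p = ttest_ind_from_stats(mean1=260, std1=254, nobs1=84,
                                mean2=182, std2=197, nobs2=85)
    print(t, p)  # t ~ 2.23, p ~ .03, matching the slides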

Diarrhea-Pepto Bismol-Summary

Question: Is there a difference in mean stool output between the two
treatment groups?
Methods: The stool output was calculated for 84 infants randomized to
placebo and 85 infants randomized to Pepto Bismol. A 95%
confidence interval was calculated for the difference in mean
stool output between the two groups, and a two-sample t-test
was used to determine if there was a significant difference
between the two groups.
Result: The mean stool outputs in the treated and control groups were
182 and 260 ml/kg respectively. The control group stool output was
significantly higher than the treated group (p = .03). The
control group mean was 78 ml/kg higher than the treated group
(95% confidence interval 9 to 147 ml/kg).

Nonparametric Alternative to the 2-sample t-test:
Mann-Whitney-Wilcoxon Rank Sum Test

Objective: Assess whether the two populations are different.
Advantages:
Does not assume the populations are normally distributed (the
two-sample t-test requires that assumption with small sample sizes).
Uses only ranks; does not need precise numerical outcomes.
Not sensitive to outliers.
Disadvantages of the Nonparametric Test:
Nonparametric methods are often less sensitive (powerful) for
finding true differences because they throw away information
(they use only ranks).
Need the full data set, not just summary statistics.

Example: Health Intervention Study

Evaluate an intervention to educate high school students about
health and lifestyle.
Y = Post - Pretest score
Randomize. Only 5 individuals in each sample:
Intervention (I):  5   0   7   2  19
Control (C):       6  -5  -6   1   4

p-value calculations:
With such a small sample, we need to be sure score
improvements are normally distributed if we want to use a t-test.
BIG assumption.
Alternative: Mann-Whitney-Wilcoxon non-parametric test!

Rank the pooled data:

Order:  -6  -5   0   1   2   4   5   6   7  19
Rank:    1   2   3   4   5   6   7   8   9  10
Group:   C   C   I   C   I   C   I   C   I   I

Find the average rank in the 2 groups:
Intervention group average rank: (3+5+7+9+10)/5 = 6.8
Control group average rank:      (1+2+4+6+8)/5 = 4.2

Statisticians have developed formulae and tables to determine the
probability of observing such an extreme discrepancy (6.8 vs 4.2)
by chance alone. That's the p-value.
In the example, p = .22.
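For reference, a small Python sketch of the same rank-sum test (scipy's
mannwhitneyu; the exact method requires scipy 1.7 or later):

    # Sketch: Mann-Whitney-Wilcoxon test on the score changes above.
    from scipy.stats import mannwhitneyu

    intervention = [5, 0, 7, 2, 19]
    control      = [6, -5, -6, 1, 4]

    u, p = mannwhitneyu(intervention, control,
                        alternative="two-sided", method="exact")
    print(u, p)  # exact two-sided p ~ .22, as on the slide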

Health Intervention Study-Summary

Question: Is there a difference in test score change between the
intervention and control groups?
Design: 10 high school students were randomized to either receive a
two-month health and lifestyle education program or no program.
Each student was administered a test regarding health and lifestyle
issues prior to randomization and after the two-month period.
Statistics: Differences in the two test scores (after - before) were
computed for each student. Mean and median test score changes were
computed for each study group. A Mann-Whitney rank sum test was used
to determine if there was a statistically significant difference in
test score change between the intervention and control groups at the
end of the two-month study period.
Result: The median score change was four points higher in the
intervention group than in the control group. The difference in
test score improvements between the intervention and control
groups was not statistically significant (p = .17).

Note

In the health intervention study, the p-value was .22:
no significant difference in test scores between the intervention
and control group (p = .22).
The two-sample t-test would give a different answer (p = .11).
Different statistical procedures can give different p-values.
If the largest observation, 19, were changed to 100, the p-value
based on the Mann-Whitney test would not change, but the
two-sample t-test p-value would.

The t-test or the nonparametric test?

Statisticians will not always agree, but there are some guidelines:
Use the nonparametric test if the sample size is small and the
distribution looks skewed. (You might also do a t-test, too, and
compare.)
Use it when only ranks are available.
Otherwise, use the t-test.

Example: Exposure of Young Infants to Environmental Tobacco Smoke

Objective: This study examined the degree to which breast-feeding and
cigarette smoking by mothers and smoking by other household
members contributed to the exposure of infants to the
products of tobacco smoke (urinary cotinine level).
Method: We report median values and interquartile ranges for each
group. Comparisons between groups are made with the
Wilcoxon rank sum test because the distributions of urine
cotinine values are positively skewed.

Source: Mascola et al., AJPH, 1998, 88:893-895.

Extension of the 2-sample t-test:
Analysis of Variance (One-Way ANOVA)

The t-test compares two populations.
Analysis of variance is a generalization of the two-sample
t-test to compare three or more populations.
Are there any differences among the populations?
The test statistic from the ANOVA calculations is called the
F-statistic (the F-test). A p-value is then calculated.

An alternative strategy is to perform lots of two-sample
t-tests (pairwise). That could be a lot of statistical testing!
Instead, perform an ANOVA:
No significant differences → Stop. No further analysis necessary.
Significant differences → Do two-sample t-tests to find them.

Example: Pulmonary Disease

Goal: Does passive smoking have a measurable effect on
pulmonary health?
Methods: Measure mid-expiratory flow (FEF) in liters per
second (amount of air expelled per second) in six smoking groups:
Nonsmokers (NS)
Passive Smokers (PS)
Noninhaling Smokers (NI)
Light Smokers (LS)
Moderate Smokers (MS)
Heavy Smokers (HS)

White and Froeb, "Small-Airways Dysfunction in Non-Smokers Chronically
Exposed to Tobacco Smoke," NEJM 302:13 (1980).

One strategy is to perform lots of two-sample t-tests...

Group number   Group name   Mean FEF (L/s)   SD FEF (L/s)    n
1              NS           3.78             0.79           200
2              PS           3.30             0.77           200
3              NI           3.32             0.86            50
4              LS           3.23             0.78           200
5              MS           2.73             0.81           200
6              HS           2.59             0.82           200

In this example, there would be 15 pairwise comparisons...
It would be nice to have one catch-all test which would tell
you whether there were any differences amongst the six groups.

Mean FEF ± 2 SE

[Figure: mean FEF ± 2 SE plotted by group (NS, PS, NI, LS, MS, HS);
FEF axis from 2.5 to 3.7 L/s]

Based on a one-way analysis of variance, there are significant
differences in pulmonary function among these groups (p < .001).
Pairwise two-sample t-tests show very significant differences
between nonsmokers and all other groups.
There were no significant differences between passive smokers,
noninhalers and light smokers; and between moderate and
heavy smokers.

Smoking and FEF-Summary

Subjects: A sample of over 3,000 persons was classified into one of six
smoking categorizations based on responses to smoking-related questions.
Methods: 200 men were randomly selected from each of five smoking
classification groups (non-smokers, passive smokers, light smokers,
moderate smokers, and heavy smokers), as well as 50 men classified as
non-inhaling smokers, for a study designed to analyze the relationship
between smoking and respiratory function.

Smoking and FEF-Summary

Statistics:
ANOVA was used to test for any differences in FEF levels
amongst the six groups of men.
Individual group comparisons were performed with a series of
two-sample t-tests, and 95% confidence intervals were constructed
for the mean difference in FEF between each combination of groups.
Results:
Analysis of variance showed statistically significant (p < .001)
differences in FEF between the six groups of smokers.
Non-smokers had the highest mean FEF value, 3.78 L/s, and this was
statistically significantly larger than the five other
smoking-classification groups.
The mean FEF value for non-smokers was 1.19 L/s higher than the
mean FEF for heavy smokers (95% CI 1.03-1.35 L/s), the largest
mean difference between any two smoking groups.

What's the rationale behind analysis of variance?

H0: μ1 = μ2 = ... = μk
H1: at least one mean is different

The variation in the sample means between groups is compared to
the variation within a group.
If the between-group variation is a lot bigger than the within-group
variation, that suggests there are some differences among the
populations.

[Figure: mean FEF ± 2 SE by group (NS, PS, NI, LS, MS, HS), as above]

http://www.ruf.rice.edu/~lane/stat_sim/one_way/index.html

Overuse of Hypothesis Tests-Bad Statistics!!

Age       n    Sample Mean
< 20     97    17.8
≥ 20     88    24.6
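The between-versus-within comparison can be carried out directly from
the FEF summary table; a sketch (the formulas are the standard one-way
ANOVA sums of squares, computed here from the published means, SDs, and
group sizes):

    # Sketch: one-way ANOVA F statistic from summary statistics.
    import numpy as np
    from scipy.stats import f as f_dist

    n    = np.array([200, 200, 50, 200, 200, 200])
    mean = np.array([3.78, 3.30, 3.32, 3.23, 2.73, 2.59])
    sd   = np.array([0.79, 0.77, 0.86, 0.78, 0.81, 0.82])

    grand = np.sum(n * mean) / np.sum(n)          # overall mean FEF
    ss_between = np.sum(n * (mean - grand) ** 2)  # variation of group means
    ss_within  = np.sum((n - 1) * sd ** 2)        # variation inside the groups
    df_b, df_w = len(n) - 1, np.sum(n) - len(n)

    F = (ss_between / df_b) / (ss_within / df_w)
    p = f_dist.sf(F, df_b, df_w)
    print(F, p)  # F ~ 58 on (5, 1044) df; p < .001, as reported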

Comparing Two Proportions

Study: Clinical trial of AZT to prevent maternal-infant
transmission of HIV.

Randomize:
AZT      n = 121  →   9 infected infants
Placebo  n = 127  →  31 infected infants

Notes on Design

Random assignment of Tx:
Helps ensure the 2 groups are comparable.
Patient & physician could not request a particular Tx.
Double blind:
Patient & physician did not know the Tx assignment.
Definition of infection:
Two positive cultures (infant > 32 weeks).

Conner et al., New England J. of Medicine 331:1173-1190 (1994)

HIV Transmission Rates

AZT       9/121  =  .074  (7.4%)
Placebo  31/127  =  .244  (24.4%)

Note:
These are NOT the true population parameters for the
transmission rates.
There is sampling variability.
Is the difference significant, or can it be explained by chance?

HIV Transmission Rates

95% confidence intervals:
AZT      95% CI  .03 to .14
Placebo  95% CI  .17 to .32

As the CIs do not overlap, this suggests significance. But what's
the p-value?
Note: if the CIs did overlap, it would still be possible to get
p < .05.
We want a direct method for testing 2 independent proportions.

Display the Data in a 2 × 2 Table
(2 rows and 2 columns)

HIV transmission     AZT   Placebo
(infected)
Yes                    9        31     40
No                   112        96    208
                     121       127    248

Hypothesis Testing

H0: p1 = p2
H1: p1 ≠ p2

p1 = Proportion infected on AZT
p2 = Proportion infected on placebo

1. Fisher's Exact Test
2. (Pearson's) Chi-Square Test (χ²)

Fisher's Exact Test

As with all hypothesis tests, start by assuming H0 is true:
AZT is not effective.
Imagine putting 40 red balls (the infected) and 208 blue balls
(the non-infected) in a jar. Shake it up.
Now choose 121 balls: that's the AZT group.
The remaining balls are the placebo group.
We can calculate the probability you get 9 or fewer red balls
among the 121. That is the one-sided p-value.
The two-sided p-value is just about (but not exactly) twice
the one-sided p-value. It accounts for the probability of
getting either extremely few red balls or a lot of red balls in
the AZT group.
The p-value is the probability of obtaining a result as or more
extreme (more imbalance) than you did by chance alone.

Notes on Fisher's Exact Test

Calculations are difficult; computers are usually used.
Always appropriate to test equality of two proportions:
Exact p-value (no approximations)
No minimum sample size requirements
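A sketch of the test in Python (scipy's fisher_exact, applied to the
observed 2 × 2 table):

    # Sketch: Fisher's exact test for the AZT trial.
    from scipy.stats import fisher_exact

    table = [[9, 31],     # infected:     AZT, placebo
             [112, 96]]   # not infected: AZT, placebo
    odds_ratio, p = fisher_exact(table, alternative="two-sided")
    print(p)  # p < .001, as in the summary slide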

HIV-AZT-Summary

Study: We conducted a randomized, double-blind, placebo-controlled
trial of the efficacy and safety of zidovudine (AZT) in reducing
the risk of maternal-infant HIV transmission.
Methods: HIV transmission rates for both the placebo and AZT groups
were calculated as the ratio of HIV-infected infants (based on
cultures at 32 weeks) divided by the total number of infants, and
95% confidence intervals were calculated. The transmission rates
for the two groups were compared by Fisher's Exact Test.
Results:
The maternal-infant HIV transmission rate for the AZT group was
7.4% (95% CI 3.5% to 13.7%).
The maternal-infant HIV transmission rate for the placebo group
was 24.4% (95% CI 17.2% to 32.8%).
AZT significantly reduced the rate of HIV transmission compared
to placebo (p < .001).

The Chi-Square Approximate Method

Works for big sample sizes:
If all 4 numbers in the 2 × 2 table are 5 or more, it is okay.
The only advantage of this method over Fisher's Exact Test is
that you don't need a computer to do it.

The Chi-Square Approximate Method

Looks at discrepancies between the observed and expected counts.
O = observed
"Expected" refers to the values for the cell counts that would be
expected if the null hypothesis were true:
expected = (row total × column total) / grand total

The Chi-Square Approximate Method

Calculate expected counts assuming H0 is true.
Calculate a test statistic to measure the difference between
what we observe and what we expect:
Test statistic: χ² = Σ over the 4 cells of (O - E)² / E
Use a chi-square table with 1 degree of freedom to get a p-value.
How likely is it to get such a big discrepancy between the
observed and expected?

χ² Distribution with 1 Degree of Freedom

[Figure: χ² density with 1 degree of freedom, with 3.84 marked
(the cutoff for p = .05)]

Performing the χ² Test for a 2 × 2 Table

Observed:
HIV        AZT    Placebo
Yes          9         31     40
No         112         96    208
           121        127    248

Observed = 9
Expected = 121 × 40 / 248 = 19.52

Expected:
HIV        AZT    Placebo
Yes      19.52      20.48     40
No      101.48     106.52    208
           121        127    248

χ² = (9 - 19.52)²/19.52 + (112 - 101.48)²/101.48
   + (31 - 20.48)²/20.48 + (96 - 106.52)²/106.52
   = 13.19

The p-value is about p = .0003.

It is NOT a coincidence that the square of Z on page 141 is almost
the χ²: one is nearly the square of the other, 3.63² ≈ 13.19.
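The same arithmetic in Python (note that scipy applies the Yates
continuity correction to 2 × 2 tables by default, so it is turned off
here to match the hand-computed 13.19):

    # Sketch: Pearson chi-square test for the AZT trial.
    from scipy.stats import chi2_contingency

    table = [[9, 31],
             [112, 96]]
    chi2, p, df, expected = chi2_contingency(table, correction=False)
    print(chi2, p)   # ~13.2, p ~ .0003
    print(expected)  # 19.52, 20.48, 101.48, 106.52 (the expected counts)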

χ² Distribution with 1 Degree of Freedom

This table assumes that you have one degree of freedom (the case when
analyzing a 2 × 2 table):

 χ²     P        χ²     P        χ²     P        χ²     P        χ²      P       χ²      P
0.0  1.0000     2.5  0.1138     5.0  0.0253     7.5  0.0062    10.0  0.0016    12.5  0.0004
0.1  0.7518     2.6  0.1069     5.1  0.0239     7.6  0.0058    10.1  0.0015    12.6  0.0004
0.2  0.6547     2.7  0.1003     5.2  0.0226     7.7  0.0055    10.2  0.0014    12.7  0.0004
0.3  0.5839     2.8  0.0943     5.3  0.0213     7.8  0.0052    10.3  0.0013    12.8  0.0003
0.4  0.5271     2.9  0.0886     5.4  0.0201     7.9  0.0049    10.4  0.0013    12.9  0.0003
0.5  0.4795     3.0  0.0833     5.5  0.0190     8.0  0.0047    10.5  0.0012    13.0  0.0003
0.6  0.4386     3.1  0.0783     5.6  0.0180     8.1  0.0044    10.6  0.0011    13.1  0.0003
0.7  0.4028     3.2  0.0736     5.7  0.0170     8.2  0.0042    10.7  0.0011    13.2  0.0003
0.8  0.3711     3.3  0.0693     5.8  0.0160     8.3  0.0040    10.8  0.0010    13.3  0.0003
0.9  0.3428     3.4  0.0652     5.9  0.0151     8.4  0.0038    10.9  0.0010    13.4  0.0003
1.0  0.3173     3.5  0.0614     6.0  0.0143     8.5  0.0036    11.0  0.0009    13.5  0.0002
1.1  0.2943     3.6  0.0578     6.1  0.0135     8.6  0.0034    11.1  0.0009    13.6  0.0002
1.2  0.2733     3.7  0.0544     6.2  0.0128     8.7  0.0032    11.2  0.0008    13.7  0.0002
1.3  0.2542     3.8  0.0513     6.3  0.0121     8.8  0.0030    11.3  0.0008    13.8  0.0002
1.4  0.2367     3.9  0.0483     6.4  0.0114     8.9  0.0029    11.4  0.0007    13.9  0.0002
1.5  0.2207     4.0  0.0455     6.5  0.0108     9.0  0.0027    11.5  0.0007    14.0  0.0002
1.6  0.2059     4.1  0.0429     6.6  0.0102     9.1  0.0026    11.6  0.0007    14.1  0.0002
1.7  0.1923     4.2  0.0404     6.7  0.0096     9.2  0.0024    11.7  0.0006    14.2  0.0002
1.8  0.1797     4.3  0.0381     6.8  0.0091     9.3  0.0023    11.8  0.0006    14.3  0.0002
1.9  0.1681     4.4  0.0359     6.9  0.0086     9.4  0.0022    11.9  0.0006    14.4  0.0001
2.0  0.1573     4.5  0.0339     7.0  0.0082     9.5  0.0021    12.0  0.0005    14.5  0.0001
2.1  0.1473     4.6  0.0320     7.1  0.0077     9.6  0.0019    12.1  0.0005    14.6  0.0001
2.2  0.1380     4.7  0.0302     7.2  0.0073     9.7  0.0018    12.2  0.0005    14.7  0.0001
2.3  0.1294     4.8  0.0285     7.3  0.0069     9.8  0.0017    12.3  0.0005    14.8  0.0001
2.4  0.1213     4.9  0.0269     7.4  0.0065     9.9  0.0017    12.4  0.0004    14.9  0.0001

Chi-Square for Associations in r × c Tables

Summary of Methods for Comparing Proportions

Fisher's Exact Test:
Always works, with large or small sample sizes
Highly computational; need a computer
χ²-Test:
Works with larger sample sizes
Calculations easy to do
One of the most popular statistical methods in scientific literature
Extends to larger tables

Note on p-Values and Sample Size

Will the p-value change if we have smaller sample sizes but the
proportions remain about the same?
Suppose our sample size were about 1/4 the original:

HIV transmission    AZT   Placebo
Yes                   2         8     10
No                   28        24     52
                     30        32     62

AZT      2/30 = 6.7%
Placebo  8/32 = 25%
p = .083

Note

The p-value depends not only on the observed difference
between the proportions, but also on the sample sizes.
If the sample sizes that the two proportions were based on were
bigger, the p-value would get smaller.

Relative Risk

Ratio of proportions: Relative risk = p1 / p2

AZT Example:
The risk of HIV with AZT relative to placebo:
Relative risk = p̂1 / p̂2 = .074 / .244 = .30
The risk of HIV with placebo relative to AZT:
Relative risk = p̂2 / p̂1 = 3.29
The risk of HIV transmission with placebo is more than 3
times higher compared to AZT.

The Relative Risk versus the p-Value

The relative risk tells you the magnitude of the disease-exposure
association.
The p-value (calculated using either Fisher's exact test or the χ²
statistic) tells you if the observed result can be explained by chance.
You can test
H0: Relative Risk = 1
H1: Relative Risk ≠ 1
using any of the methods for comparing proportions (χ², Fisher's).
A big relative risk does not necessarily mean that the p-value is
small.
The p-value depends both on the magnitude of the relative risk as
well as the sample size.

Describing the Association Between Two Continuous Variables

Scatter plot
Correlation coefficient
Simple linear regression

Association between body weight (X) and plasma volume (Y)

Subject   Body Weight (kg)   Plasma Volume (l)
1         58.0               2.75
2         70.0               2.86
3         74.0               3.37
4         63.5               2.76
5         62.0               2.62
6         70.5               3.49
7         71.0               3.05
8         66.0               3.12

The Scatter Plot

[Figure: scatter diagram of plasma volume (l, 2.6-3.4) against body
weight (kg, 60-70), showing the linear regression line]

The Correlation Coefficient

Measures the direction and magnitude of the linear association
between X and Y.
The correlation coefficient is between -1 and +1.

[Figure: example scatterplots with r = -1, r = -.7, r = 0, r = .7,
and r = 1]

Examples of the Correlation Coefficient

[Figure: scatterplots labeled Perfect Positive, Weak Positive,
Uncorrelated, and Weak Negative]

Properties of Correlation Coefficient

Corr(X, Y) = r
-1 ≤ r ≤ 1
r = 1:  Perfect positive association
r > 0:  Positive association
r = 0:  No association
r < 0:  Negative association
r = -1: Perfect negative association

Closer to -1 and 1: stronger relationship
Sign: direction of association
r = 0: no linear association

Correlation measures linear association

[Figure: a strong relationship along a curve for which r = 0]

Correlation Slider:
noppa5.pc.helsinki.fi/koe/corr/cor7.html
Correlation Guessing Game:
http://istics.net/stat/Correlations/

Four Scatterplots-all have r = .7

[Figure: Anscombe's Data, four scatterplots with the same r = .7]

NOTES AND CAVEATS ON CORRELATION COEFFICIENT

Measures only linear relationships.
Other kinds of relationships are also important:
look at and graph the data.
Sensitive to outliers.
X values are measured, not controlled by the experimental design.
That is, X and Y are random.
Example where r is appropriate:
X = height, Y = weight
Example where r is not appropriate:
Clinical study at different doses:
X = dose of drug, Y = response

Body Weight-Blood Plasma

[Figure: scatter of plasma volume (l, 2.6-3.4) against body weight
(kg, 60-70); r = .76]

How Close Do the Points Fall to the Line?

This is measured by the correlation coefficient.
But what line? This is measured by regression.
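The r = .76 on the plot can be reproduced from the eight subjects in
the earlier table; a minimal sketch:

    # Sketch: Pearson correlation of body weight and plasma volume.
    import numpy as np

    weight = np.array([58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0])
    plasma = np.array([2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12])

    r = np.corrcoef(weight, plasma)[0, 1]
    print(r)  # ~0.76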

Simple Linear Regression

We try to predict Y from X.
Y is the dependent variable.
X is the independent variable:
Predictor
Regressor
Covariate
Called simple because there is only one independent variable.
If there are several independent variables, it's called multiple
linear regression.

Simple Linear Regression

Fit a straight line to the data.

[Figure: plasma volume (l, 2.2-3.6) against body weight (kg, 55-75)
with a fitted straight line]

Least Squares Line

The linear regression line minimizes the sum of squares of the
vertical deviations: the least squares line.
Each distance is y_i - ŷ_i = y_i - (a + b·x_i);
this is computed for each data point in the sample.

Regression by Eye:
www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html

The Equation of a Line

[Figure: a line with intercept a and slope b (Y rises by b for each
unit increase in X)]

Intercept: The expected value of Y when X is 0.
Warning: The intercept is not always easily interpretable.
It is only meaningful if X can take the value 0.
Is body weight ever really 0?

Slope: The slope b is the expected increase in Y
corresponding to a unit increase in X.
b = 0: No association
b > 0: Positive association
(as X increases, Y tends to increase)
b < 0: Negative association
(as X increases, Y tends to decrease)

Simple Linear Regression Model

Y = a + bX + ε
Points don't fall exactly on the line, so to represent that we add ε.
ε is:
Noise
Error
Scatter
Assumptions about ε:
Random noise
Sometimes positive, sometimes negative,
but, on average, it's 0
Normally distributed about 0

Body Weight-Blood Plasma

Plasma volume = 0.0857 + .0436 × weight
Estimate of the intercept = 0.0857
Estimate of the slope = 0.0436
For each kilogram difference in body weight, we expect plasma
volume to differ by .0436 liters.

Prediction

Measurement of plasma volume is time-consuming.
Use the equation and body weight to estimate plasma volume.

Estimate the plasma volume for a 60 kg man:
Ŷ = 0.0857 + .0436 × 60 = 2.7 liters
Estimate the plasma volume for a 70 kg man:
Ŷ = 0.0857 + .0436 × 70 = 3.1 liters
Estimate the plasma volume for a 50 kg man?

Estimated Slope versus Population Parameter Slope

1. The estimated slope is .0436 and is based on a sample (n = 8)
of data.
2. On the other hand, the true slope is a population parameter
and represents the true relation between Y and X based on an
infinite number of observations.

Is the estimated slope (.0436) a good estimate?
Is it close to the true population slope?
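A sketch of the fit and the two predictions (numpy's polyfit returns
the slope and intercept of the least squares line):

    # Sketch: least squares fit and prediction for the plasma volume data.
    import numpy as np

    weight = np.array([58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0])
    plasma = np.array([2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12])

    b, a = np.polyfit(weight, plasma, 1)  # slope ~ .0436, intercept ~ .0857
    for x in (60, 70):
        print(x, a + b * x)               # ~2.7 l and ~3.1 l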

Precision of Estimated Slope

The estimated slope 0.0436 is not the true population parameter
slope b.
Standard Error of the Estimated Slope:
Measures the precision of the estimated slope.
The standard error depends on:
n
Spread in the X's
How close the points are to the line
Statisticians have developed formulas to calculate the SE of
the estimated slope.
Example:
Slope = 0.0436
Standard error of slope = 0.0153

Statistical Inference on Slope

Random sampling behavior of estimated regression coefficients is
normal for large samples (n > 60), and centered at the true values.
Confidence interval
Hypothesis test

95% Confidence Interval for the Slope:
Slope ± t × standard error,
where t is the t-value with n - 2 degrees of freedom.
Example:
0.0436 ± 2.447 × .0153
0.0436 ± .0374
(.0062, .081)
Note: 0 is not in the confidence interval.

t Values Used for Confidence Intervals for Linear Regression Parameters

Critical Values of t for 95% Confidence

df       t          df       t
 1    12.706        12    2.179
 2     4.303        13    2.160
 3     3.182        14    2.145
 4     2.776        15    2.131
 5     2.571        20    2.086
 6     2.447        25    2.060
 7     2.365        30    2.042
 8     2.306        40    2.021
 9     2.262        60    2.000
10     2.228       120    1.980
11     2.201         ∞    1.960

Notes

If 0 is not in the 95% confidence interval, it means
b = 0 is not plausible;
thus we would reject H0: b = 0 at the α = .05 level.
That is, p < .05.
If we want the actual p-value, we would have to go a step further
and perform a significance test.

Hypothesis Testing

H0: b = 0 (slope is zero)
H1: b ≠ 0

Assume the null is true, and calculate the standardized distance of
b from 0:

test statistic = t = (slope - 0) / SE(slope) = 0.0436 / 0.0153 = 2.85

Hypothesis Testing

The p-value represents the probability of observing a slope as
extreme as .0436 if the true population slope were really 0.
The p-value is the probability of being 2.85 or more standard errors
away from the mean of 0 on a normal curve (beyond ±2.85).
p-value = .03
Plasma volume is positively associated with body weight (p = .03).
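scipy's linregress packages the slope, its standard error, and this
two-sided p-value in one call; a sketch on the same data:

    # Sketch: inference on the slope for the plasma volume data.
    import numpy as np
    from scipy.stats import linregress

    weight = np.array([58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0])
    plasma = np.array([2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12])

    fit = linregress(weight, plasma)
    print(fit.slope, fit.stderr, fit.pvalue)  # ~.0436, ~.0153, p ~ .03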

Where does the p-value come from?

In a normal population, what is the probability of being at least
2.85 standard deviations away from the population mean?
Statisticians have calculated these probabilities.
This is fine for larger sample sizes (n > 60).
Actually, a t-correction must be used because the sample size is
small, with n - 2 degrees of freedom.
In the example, n - 2 = 6 degrees of freedom.

Some Notes and Assumptions about Simple Linear Regression

The model predicts Y from X.
The relationship between Y and X is a straight line.
Use the equation in the range of the data.
Beware of extrapolation.
Variability of values about the line is approximately normal.
Pairs of data points (X, Y) are independent.
The variability (standard deviation) about the line is the same
at all places on the line.

The Slope (b) and the Correlation Coefficient (r)

Both indicate the direction of association (positive or negative).
BUT:
Slope b is the expected difference in Y per unit difference
(higher versus lower) in X.
A larger slope does not necessarily mean a stronger linear
association.
The correlation coefficient is scaled between -1 and +1.
The correlation coefficient measures how close the points fall
to a straight line.

Testing for an Association

We can test whether the correlation coefficient is 0 or not.
It turns out this is equivalent to testing whether the slope b is 0.

The Coefficient of Determination, R²

R² is the square of the correlation coefficient.
R² is the fraction of the observed variability in Y that can be
explained by X.
R² is a number between 0 and 1.
Idea:
When there is a straight-line relation, some of the variation in Y
is accounted for by the fact that as X changes, it pulls Y along
with it.
Example:
In the plasma volume and body weight example,
r = .76, so R² = .76² = .58.

What's a Good R²?

There are a couple of important things about R² and r:
These quantities are both estimates based on the sample of data.
They are frequently reported without some recognition of sampling
variability, for example a 95% confidence interval.
A low R² or r is not necessarily bad:
many outcomes cannot/will not be fully or close to fully explained,
in terms of variability, by any one single predictor.
The higher the R² value, the better X predicts Y for individuals
in a sample/population:
individual Y-values vary less about their estimated means based
on X.
However, there may be important overall associations between the
mean of Y and X even though there is still a lot of individual
variability in Y-values about their means estimated by X.

Caveat

CORRELATION IS NOT CAUSATION

Observation:
Negative correlation between death rates from ovarian cancer and
family size, based on 20 countries?
Faulty Interpretation?
Does this mean that having a large family will protect you from
ovarian cancer?

Spurious Correlations with Time

May occur when two variables are recorded over time
and correlated with time.

[Figure: Death Rate from Drowning against Amount of Ice Cream Sold]

Spurious Correlation

Students who received radiation therapy are most likely to die.
Mornings on which it is hard to get out of bed are the mornings
with the most car accidents.

Ecological Fallacy-Height and Dietary Intake in 3 Countries

[Figure: Blood Pressure against Dietary Salt]

Multiple Linear Regression

Finds an equation to predict Y from multiple independent variables
(the X's).

Example:
Y  = Number of bed days
X1 = HMO or fee-for-service plan
X2 = Mental health score at baseline
X3 = Functional status at baseline
X4 = Bed days in year prior to enrollment
X5 = Age
X6 = Gender

Multiple Regression Equation

y = b0 + b1·X1 + b2·X2 + b3·X3 + ε

Data                                 Parameters
Y   dependent variable               b0  intercept
X1  first independent variable       b1  regression coefficient (slope) for X1
X2  second independent variable      b2  regression coefficient for X2
X3  third independent variable       b3  regression coefficient for X3

ε just represents random noise (scatter) and is in the equation to
remind us that actual data points won't fall perfectly on the line.

Interpretation of Regression Coefficients from Multiple Linear Regression

b1 is the expected difference in Y per unit difference
(higher versus lower) in X1 if all the other X's
(independent variables) are held constant.
b2 is the expected difference in Y per unit difference
(higher versus lower) in X2 if all the other X's are held constant.

Uses of Multiple Linear Regression

To look for relationships between variables and adjust for
confounding variables.
To develop models to predict the expected value of Y from the X's.

Example:
Is there an association between hemoglobin and packed cell volume?

Example: Hemoglobin and Packed Cell Volume

Hemoglobin level, packed cell volume, age, and menopausal status
for 20 women:

Subject   Hb (g/dl)   PCV (%)   Age (yrs)   Menopause (0 = No)
 1        11.1        35        20          0
 2        10.7        45        22          0
 3        12.4        47        25          0
 4        14.0        50        28          0
 5        13.1        31        28          0
 6        10.5        30        31          0
 7         9.6        25        32          0
 8        12.5        33        35          0
 9        13.5        35        38          0
10        13.9        40        40          0
11        15.1        45        45          1
12        13.9        47        49          0
13        16.2        49        54          1
14        16.3        42        55          1
15        16.8        40        57          1
16        17.1        50        60          1
17        16.6        46        62          1
18        16.9        55        63          1
19        15.7        42        65          1
20        16.5        46        67          1

Source: data from Campbell et al. (1985)

[Figure: scatter of Hb (g/dl, 10-16) against PCV (%, 25-55)]

[Figures: PCV (%, 25-55) against Age (years, 20-60), and
Hb (g/dl, 10-16) against Age (years, 20-60)]

Is the relationship between hemoglobin and PCV only apparent
because maybe they both increase with age?
Is age a confounder?
Can we control for age?

Multiple Linear Regression

A statistical procedure that tries to evaluate the association
between two variables while controlling for the effects of other
variables.
One way of thinking about how it works is this:
Imagine sorting the individuals into age groups (for example
21-25, 26-30, 31-35, etc.).
For each age group, perform a simple linear regression of
hemoglobin (Y) versus PCV (X).
Get the slope.
Calculate a sort of average of the slopes from each age group.
Basically, this is what multiple linear regression does, but it
doesn't actually need to sort into age groups.

Multiple Linear Regression Results

Regression of Hemoglobin on Age and PCV:
Predicted hemoglobin = 5.24 + 0.110 × Age + 0.097 × PCV

Variable        Coefficient   SE      t-value   p-value
Constant (b0)   5.24          1.21    4.34      0.0004
Age (b1)        0.110         0.016   6.74      0.0001
PCV (b2)        0.097         0.033   2.98      0.0085

Interpretation

MLR: For a given age, hemoglobin differs by 0.097 g/dl for every
unit difference in PCV.
SLR: The simple linear regression suggested hemoglobin differed
by 0.121 per unit difference in PCV (not controlling for age).

Conclusion: Hemoglobin is positively associated with PCV even
after accounting for age (p = .0085).
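A hedged sketch of the same fit with numpy (data transcribed from the
20-subject table above; the coefficients should come out near the
published 5.24, 0.110, and 0.097):

    # Sketch: multiple linear regression of Hb on age and PCV by least squares.
    import numpy as np

    hb  = np.array([11.1, 10.7, 12.4, 14.0, 13.1, 10.5, 9.6, 12.5, 13.5, 13.9,
                    15.1, 13.9, 16.2, 16.3, 16.8, 17.1, 16.6, 16.9, 15.7, 16.5])
    age = np.array([20, 22, 25, 28, 28, 31, 32, 35, 38, 40,
                    45, 49, 54, 55, 57, 60, 62, 63, 65, 67])
    pcv = np.array([35, 45, 47, 50, 31, 30, 25, 33, 35, 40,
                    45, 47, 49, 42, 40, 50, 46, 55, 42, 46])

    X = np.column_stack([np.ones_like(hb), age, pcv])  # intercept column first
    coef, *_ = np.linalg.lstsq(X, hb, rcond=None)
    print(coef)  # intercept, age, and PCV coefficients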

Inference for MLR Coefficients

A parameter estimate (regression coefficient) has an associated
standard error (SE).
Statisticians have developed formulas for the standard errors of
multiple linear regression coefficients.
The SE represents the imprecision in the parameter estimate.
Generally, the standard errors get smaller with larger sample size.

Inference for MLR Coefficients

Hypothesis test:
H0: b = 0
H1: b ≠ 0
Test statistic: t = b / SE
To calculate a p-value, use a t-distribution with degrees of
freedom = sample size - number of coefficients.
Example: df = 20 - 3 = 17
To test H0: b2 = 0:
t = .097 / .033 = 2.98
The p-value is .0085.

Example

Question: Do post-menopausal women have a different hemoglobin
level than pre-menopausal women?
Simple Analysis: A two-sample t-test comparing pre- vs
post-menopausal women is highly significant.
Concern: Post-menopausal women are older AND hemoglobin rises
with age.
Are we just seeing the effects of age?

Solution

Use multiple linear regression to account for the effects of age:

Y = b0 + b1·X1 + b2·X2
X1 = Age (years)
X2 = Menopause = 0 if NO (pre), 1 if YES (post)

Note

The X's do not have to be continuous.
X2 is called an indicator or dummy variable.
Indicator variables are used to define groups.

How is Multiple Linear Regression Adjusting for Age?

Imagine sorting individuals into age groups.
In each age group, divide into pre-menopausal and post-menopausal.
Then perform a two-sample t-test to calculate the difference in
hemoglobin sample means between pre- and post-menopausal women.
Multiple linear regression kind of averages these two-sample
t-tests across all age groups: it averages the differences
across all age groups.

Results:
MLR of Hb against age and menopausal status

Variable          Coefficient   SE      t-value   p-value
Constant (b0)     9.74          1.11    8.77      <0.001
Age (b1)          0.081         0.033   2.41      0.03
Menopausal (b2)   1.88          1.03    1.82      0.09

Source: Campbell and Machin

Confidence Interval for b2

b2 ± t × SE
1.88 ± 2.11 × 1.03

Interpretation:
After accounting for age, post-menopausal women have, on average,
hemoglobin levels about 1.88 g/dl higher than pre-menopausal women
(95% CI -0.28 to 4.08, p = .09).

Power and Sample Size

Used to determine how many subjects are needed to answer the
research question.

Example: Thrombolysis and Acute MI

Clinicians thought thrombolysis would benefit AMI.
Successive studies failed to prove an association.
Finally, a mega-trial adequately powered to prove the association
was done.
Emerg Med J 2003;20:453-458

                            Truth
                       H0                    H1
Decision
Reject H0              Type I error          Power
                       (significance level α)
Fail to reject H0                            Type II error

Statistical Power

Power = the chance you reject H0 when H0 is false.
That is, you correctly conclude there is a treatment effect when
there really is a treatment effect.
Power is a measure of doing the right thing when H1 is true!
Higher power is better (the closer the power is to 1.0, or 100%).

Power

[Figure: power illustrated with overlapping sampling distributions;
x-axis from 10 to 25]

Effect of Effect Size

[Figures: pairs of sampling distributions with small versus large
separation between the means; x-axes from 10 to 25]

Which is harder?
To detect very small differences, or
to find a large (obvious) difference?
Detecting small differences is harder: the smaller the true effect,
the lower the power.

Effect of Amount of Variability

[Figures: power illustrations for small versus large variability;
x-axes from 10 to 25]

Effect of α, or How Certain We Want to Be to Avoid a Type I Error

Conventionally, we choose a probability of .05 for a type I error.
If we lower the significance level, the power will be lower.

[Figures: power illustrations for different α-levels; x-axes from
10 to 25]

Effect of Sample Size

[Figures: power illustrations for different sample sizes; x-axes
from 10 to 25]

What influences power?

Effect size
Variability in measurements
Chosen significance level (α)
Sample size

Blood pressure and oral contraceptives

Is oral contraceptive use associated with higher blood pressure
among 35-39 year olds?

H0: μOC = μNO-OC
H1: μOC ≠ μNO-OC

Pilot Study:
               n    Sample mean systolic BP   Sample SD (s)
OC users        8   132.8                     15.3
Non-OC users   21   127.4                     18.2

2-sample t-test: p = .46

But...

The sample mean difference in blood pressures is 132.8 - 127.4 = 5.4.
This could be considered scientifically significant; however, the
result is not statistically significant (or even close to it!) at
the α = .05 level.
95% CI for the difference in means (OC - Non-OC): -9.5 to +20.3.
Suppose, as a researcher, you were concerned about detecting a
population difference of this magnitude if it truly existed.
This OC/blood pressure study has power of only .13 to detect a
difference in blood pressure of 5.4 or more, if this difference
truly exists in the population of women 35-39 years old!

Design

Determine the sample sizes needed to detect
about a 5 mm increase in blood pressure in OC users
with 80% power
at significance level α = .05.
Using the pilot data, we estimate that the standard deviations are
15.3 and 18.2 in OC and non-OC users respectively.

In order to detect a difference in BP of 5.4 units (if it really
exists in the population) with high (80%) certainty,
we would need to enroll 153 OC users and 153 non-users.
This assumed that we wanted equal numbers of women in each group...

Key Determinants of Sample Size

Specify:
α-level of the test: the probability of a type I error;
p-values below this are called statistically significant.
Power: the power you desire for detecting this treatment effect.
Effect Size: your best estimate of the true difference;
Δ = μ1 - μ2 is the treatment effect.
Pilot data
Smallest effect of scientific interest
Variability: your best estimate of the true SDs.
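The 153-per-group figure above can be reproduced with the usual
normal-approximation formula for comparing two means; a sketch (the
function name n_per_group is chosen here for illustration):

    # Sketch: sample size per group for a two-sample comparison of means.
    from scipy.stats import norm

    def n_per_group(delta, sd1, sd2, alpha=0.05, power=0.80):
        """Subjects per group to detect a mean difference delta."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # 1.96 + 0.84 here
        return (sd1**2 + sd2**2) * z**2 / delta**2

    print(n_per_group(5.4, 15.3, 18.2))  # ~152, round up to 153 per group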

Designing Your Own Study

When designing a study, there is a tradeoff between:
Power
α-level
Sample size
Minimum detectable difference (specific H1)

Industry standard: 80% power, α = .05.

Sample size calculations are an important part of a study proposal.
Study funders want to know that the researcher can detect a
relationship with a high degree of certainty (should it really exist).

Designing Your Own Study

What if the sample size calculation yields group sizes that are too big?
Increase the minimum difference of interest
Increase the α-level
Decrease the desired power
Decrease the SD???

Accounting for confounders requires more information, and sample size
calculations have to be done via computer simulation: consult a
statistician!

Formulae

X̄ = ( Σ X_i ) / n

s = √( Σ (X_i - X̄)² / (n - 1) )

SE(X̄) = s / √n

SE(X̄1 - X̄2) = √( s1²/n1 + s2²/n2 )

SE(p̂) = √( p̂(1 - p̂) / n )

χ² = Σ over the 4 cells of (O - E)² / E

Steps in a Research Project

Planning
Design
Data Collection
Data Analysis
Presentation
Interpretation

Types of Data

Binary data
Categorical data
Continuous data
Survival data

Measures of Center

Measures of Spread

Pictures of Data

Shapes of Distributions

Right skew
Left skew
Symmetric
Uniform
Bimodal

Normal Distribution

Completely described by: ____
____ Rule?

Constructing Intervals

Standard Normal Scores

Z-score = ____
Measures ____

Sampling Distribution

Refers to the distribution of a ____ when multiple samples
have been taken.

Standard Error

Central Limit Theorem

Means and proportions (and differences in means and differences
in proportions) are distributed normally when the sample size
is ____.
Variability of the distribution of means is characterized by
____ = ____.
Variability of the distribution of proportions is characterized by
____ = ____.

Confidence Intervals

95% of the time, the population mean will lie within about two
standard errors of the sample mean.
To construct a Confidence Interval: ____

Hypothesis Testing

Unified Theory of CI/HT

Step 0: Situation, Parameter, H0, Statistic, SE, Distribution
Step 1: ____
Step 2: ____
Step 3: ____
Step 4: ____

Type I error
Type II error
Power
p-value

Power

Sample Size
Difference to Detect
Variability
Significance level (α)

Population versus Sample

Parameter versus Statistic

Correlation

r = -1
r < 0
r = 0
r > 0
r = 1

Simple Linear Regression

Assumptions
CAUTION:
Prediction
Coefficient of Determination
Inference on Slope

Multiple Linear Regression

Association not Causation