Lecture 8

One-sample statistical tests, continued
Recall: Single population mean (large n)
Hypothesis test:
$$Z = \frac{\text{observed mean} - \text{null mean}}{s/\sqrt{n}}$$
Confidence interval:
$$\text{confidence interval} = \text{observed mean} \pm Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
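The two large-n formulas above can be sketched in Python with only the standard library (`statistics.NormalDist`). The function names `z_test` and `z_ci` are hypothetical, and the inputs are borrowed from the light-bulb practice problem later in this lecture purely for illustration. Returning the one-sided lower-tail p-value is an assumption about the direction of the alternative.

```python
from math import sqrt
from statistics import NormalDist

def z_test(observed_mean, null_mean, s, n):
    """Return (Z, lower-tail p-value) for a large-n test of a single mean."""
    se = s / sqrt(n)  # standard error of the mean, s / sqrt(n)
    z = (observed_mean - null_mean) / se
    return z, NormalDist().cdf(z)  # P(Z <= z); assumes Ha: mean < null mean

def z_ci(observed_mean, s, n, confidence=0.95):
    """Return observed mean +/- Z_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    half_width = z_crit * s / sqrt(n)
    return observed_mean - half_width, observed_mean + half_width

# Illustrative inputs (light-bulb problem): mean 1505, null 1520, s = 86, n = 40
z, p = z_test(1505, 1520, 86, 40)
low, high = z_ci(1505, 86, 40)
print(round(z, 3), round(p, 3), round(low, 1), round(high, 1))
```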
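Single population mean (small n, normally distributed trait)
Hypothesis test:
$$T_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s/\sqrt{n}}$$
Confidence interval:
$$\text{confidence interval} = \text{observed mean} \pm T_{n-1,\alpha/2} \cdot \frac{s}{\sqrt{n}}$$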
What is a T-distribution?
A t-distribution is like a Z distribution, except that it has slightly fatter tails to reflect the uncertainty added by estimating $\sigma$.
The bigger the sample size used to estimate $\sigma$, the closer t becomes to Z.
If n > 100, t is practically indistinguishable from Z.
[Figure panels: T-distributions with 1, 4, 9, 29, and 99 degrees of freedom. With 99 degrees of freedom, the t-distribution looks a lot like Z!]
Student's t Distribution
[Figure: standard normal (t with df = ∞), t (df = 13), and t (df = 5) densities, all centered at 0]
Note: t → Z as n increases.
t-distributions are bell-shaped and symmetric, but have "fatter" tails than the normal.
(Figure from Statistics for Managers Using Microsoft Excel, 4th Edition, Prentice-Hall, 2004)
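Student's t Table
              Upper Tail Area
df      .25      .10      .05
1       1.000    3.078    6.314
2       0.817    1.886    2.920
3       0.765    1.638    2.353
Let: n = 3, so df = n - 1 = 2; $\alpha$ = .10, so $\alpha/2$ = .05.
Reading the df = 2 row under the .05 column gives t = 2.920: the area to the right of 2.920 is $\alpha/2$ = .05.
The body of the table contains t values, not probabilities.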
t distribution values, with comparison to the Z value
Confidence level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z
.80                1.372         1.325         1.310         1.28
.90                1.812         1.725         1.697         1.64
.95                2.228         2.086         2.042         1.96
.99                3.169         2.845         2.750         2.58
Note: t → Z as n increases.
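The T probability density function
What does t look like mathematically? (You may at least recognize some resemblance to the normal distribution function.)
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
Where:
$\nu$ is the degrees of freedom
$\Gamma$ (gamma) is the Gamma function
$\pi$ is the constant pi (3.14...)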
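The density formula above translates directly into Python. This is a minimal sketch, assuming `math.lgamma` (the log of the Gamma function) is an acceptable way to evaluate the Gamma ratio without overflow; `t_pdf` is a hypothetical helper name. It evaluates the density at x = 3 for the degrees of freedom shown in the earlier figure panels, to show that the t tails sit above the normal and shrink toward it as df grows.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Student's t density: Gamma((v+1)/2) / (sqrt(v*pi) * Gamma(v/2)) * (1 + x^2/v)^(-(v+1)/2)."""
    # Work with log-Gamma values so large df does not overflow math.gamma
    log_ratio = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_ratio) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()
for df in (1, 4, 9, 29, 99):
    # Height of each density out in the tail (x = 3): more height = fatter tail
    print(df, round(t_pdf(3, df), 4), "normal:", round(normal.pdf(3), 4))
```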
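The t-distribution in SAS
Yikes! The t-distribution looks like a mess! Don't want to integrate!
Luckily, there are charts and SAS! MUST SPECIFY DEGREES OF FREEDOM!
The t-function in SAS is:
probt(t-statistic, df)
It returns the cumulative probability P(T ≤ t-statistic) for a t-distribution with df degrees of freedom.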
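Outside SAS, the same cumulative probability can be approximated numerically. The sketch below is not SAS code and not how SAS computes `probt`; it is a hedged stand-in that integrates the density with Simpson's rule (the helper names `t_pdf`, `t_cdf`, and `t_crit` are hypothetical). `t_crit` inverts it by bisection, which reproduces entries of the t tables above.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    # lgamma = log of the Gamma function; the log scale avoids overflow
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): a numerical stand-in for probt(x, df).

    Integrates the symmetric density from 0 to |x| with Simpson's rule
    (steps must be even) and adds/subtracts that area from P(T <= 0) = 0.5.
    """
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(upper_tail, df):
    """Table lookup by bisection: the t value with `upper_tail` area to its right."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # too much area still to the right, so the answer is larger
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_cdf(-1.11, 39), 3))  # approximates what probt(-1.11, 39) returns
for df in (10, 20, 30):
    # two-sided confidence .80/.90/.95/.99 -> upper-tail areas .10/.05/.025/.005
    print(df, [round(t_crit((1 - c) / 2, df), 3) for c in (0.80, 0.90, 0.95, 0.99)])
```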
The normality assumption
T-tests (and all linear models, in fact) have a normality assumption:
If the outcome variable is not normally distributed and the sample size is small, a t-test is inappropriate.
With non-normal data, it takes a larger n for the Central Limit Theorem (CLT) to kick in, so the sample means do not immediately follow a t-distribution.
This is the source of the "normality assumption" of the t-test.
Computer simulation of the distribution of the sample mean (non-normal, small n):
1. Pick any probability distribution and specify a mean and standard deviation.
2. Tell the computer to randomly generate 1000 samples of size n from that probability distribution.
   E.g., the computer is more likely to spit out values with high probabilities.
3. Calculate 1000 T-statistics, one per sample:
$$T_{n-1} = \frac{\bar{X}_n - \mu}{s_x/\sqrt{n}}$$
4. Plot the T-statistics in histograms.
5. Repeat for different sample sizes (n's).
[Histograms of the simulated T-statistics; underlying distribution is exponential (mean = 1, SD = 1)]
n = 2: This is NOT a t-distribution!
n = 5: This is NOT a t-distribution!
n = 10: This doesn't yet follow a t-distribution!
n = 30: Still not quite a t-distribution! Note the left skew.
n = 100: Now, pretty close to a t-distribution!
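The simulation steps above can be sketched in Python. Since no plotting library is assumed, this version summarizes each histogram by its tail counts instead: if T really followed t with n - 1 df, about 2.5% of the values would fall beyond each two-sided 5% critical value. The seed, the function name `simulate_t_stats`, and the use of `random.expovariate` for the exponential draws are illustrative choices, not the lecture's SAS program; the critical values are standard t-table entries.

```python
import math
import random
import statistics

# Two-sided 5% critical values t_{n-1, .025} from a standard t table, keyed by n
T_CRIT = {2: 12.706, 5: 2.776, 10: 2.262, 30: 2.045, 100: 1.984}

def simulate_t_stats(n, reps=1000, mean=1.0, seed=0):
    """Draw `reps` samples of size n from an exponential (mean = SD = 1) and
    return (sample mean - true mean) / (s / sqrt(n)) for each sample."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.expovariate(1 / mean) for _ in range(n)]
        s = statistics.stdev(sample)  # sample SD, divides by n - 1
        stats.append((statistics.fmean(sample) - mean) / (s / math.sqrt(n)))
    return stats

for n, crit in T_CRIT.items():
    t_stats = simulate_t_stats(n)
    below = sum(t < -crit for t in t_stats) / len(t_stats)
    above = sum(t > crit for t in t_stats) / len(t_stats)
    # A lower tail much heavier than the upper tail is the left skew noted above
    print(f"n={n}: lower tail {below:.3f}, upper tail {above:.3f}")
```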
Conclusions
If the underlying data are not normally distributed AND n is small**, the means do not follow a t-distribution (so using a t-test will result in erroneous inferences).
Data transformation or non-parametric tests should be used instead.
**How small is too small? There is no hard-and-fast rule; it depends on the true shape of the underlying distribution. Here (exponential data), n > 30 (closer to 100) is needed.
Practice Problem:
A manufacturer of light bulbs claims that its light bulbs have a mean life of 1520 hours with an unknown standard deviation. A random sample of 40 such bulbs is selected for testing. If the sample produces a mean value of 1505 hours and a sample standard deviation of 86 hours, is there sufficient evidence to claim that the mean life is significantly less than the manufacturer claimed?
Assume that light bulb lifetimes are roughly normally distributed.
Answer
1. What is your null hypothesis?
Null hypothesis: mean life = 1520 hours
Alternative hypothesis: mean life < 1520 hours
2. What is your null distribution?
Since we have to estimate the standard deviation, we need to make inferences from a T-curve with 39 degrees of freedom.
$$\bar{X}_{40} \sim t_{39}\left(1520,\ s_{\bar{X}} = \frac{86}{\sqrt{40}} \approx 13.6\right)$$
3. Empirical evidence: 1 random sample of 40 has a mean of 1505 hours.
4. Calculate the p-value of what you observed:
$$t_{39} = \frac{1505 - 1520}{13.6} \approx -1.10, \qquad p\text{-value} = P(t_{39} \le -1.10) \approx .14$$
5. Probably not sufficient evidence to reject the null. We cannot sue the light bulb manufacturer for false advertising! Notice that using the t-distribution to calculate the p-value didn't change much! With n > 30, you might as well use the Z table.
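The closing remark (t versus Z barely matters at n = 40) can be checked numerically. This is a sketch under the assumption that a Simpson's-rule integration of the t density is accurate enough here; `t_cdf` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t: Simpson's rule on the density from 0 to |x|."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps
    total = pdf(0) + pdf(abs(x))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

n, xbar, s, mu0 = 40, 1505, 86, 1520
se = s / math.sqrt(n)          # about 13.6
t = (xbar - mu0) / se          # about -1.10
p_t = t_cdf(t, n - 1)          # one-sided p-value from the t-curve with 39 df
p_z = NormalDist().cdf(t)      # the same statistic referred to the Z table
print(f"se={se:.2f} t={t:.3f} p(t39)={p_t:.3f} p(Z)={p_z:.3f}")
```

Both p-values come out near .14, which is why the Z table is an acceptable shortcut at this sample size.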
Practice problem
You want to estimate the average age of kids who ride a particular kids' ride at Disneyland. You take a random sample of 8 kids exiting the ride, and find that their ages are: 2, 3, 4, 5, 6, 6, 7, 7. Assume that ages are roughly normally distributed.
a. Calculate the sample mean.
b. Calculate the sample standard deviation.
c. Calculate the standard error of the mean.
d. Calculate the 99% confidence interval.
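Answer (a, b)
a. Calculate the sample mean.
$$\bar{X}_8 = \frac{\sum_{i=1}^{8} X_i}{8} = \frac{2+3+4+5+6+6+7+7}{8} = \frac{40}{8} = 5.0$$
b. Calculate the sample standard deviation.
$$s_X^2 = \frac{\sum_{i=1}^{8} (X_i - 5)^2}{8 - 1} = \frac{3^2 + 2^2 + 1^2 + 0^2 + 2(1^2) + 2(2^2)}{7} = \frac{24}{7} \approx 3.4$$
$$s_X = \sqrt{3.4} \approx 1.9$$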
Answer (c)
c. Calculate the standard error of the mean.
$$s_{\bar{X}} = \frac{s_X}{\sqrt{n}} = \frac{1.9}{\sqrt{8}} \approx .67$$
Answer (d)
d. Calculate the 99% confidence interval.
$$\text{mean} \pm s_{\bar{X}} \cdot t_{df,\alpha/2} = 5.0 \pm .67(3.50) = (2.65,\ 7.35)$$
where $t_{7,.005} = 3.50$.
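Parts (a) through (d) can be reproduced with Python's `statistics` module. This is a sketch that takes the critical value t(7, .005) = 3.499 from a t table rather than computing it; variable names are illustrative.

```python
import math
import statistics

ages = [2, 3, 4, 5, 6, 6, 7, 7]
n = len(ages)
mean = statistics.fmean(ages)        # (a) sample mean
s = statistics.stdev(ages)           # (b) sample SD, divides by n - 1
se = s / math.sqrt(n)                # (c) standard error of the mean
t_crit = 3.499                       # t_{7, .005} from a t table (rounded to 3.50 above)
ci = (mean - t_crit * se, mean + t_crit * se)   # (d) 99% confidence interval
print(mean, round(s, 3), round(se, 3), tuple(round(x, 2) for x in ci))
```

Without intermediate rounding, s is about 1.85 and the standard error about .65, so the interval comes out near (2.71, 7.29); rounding s to 1.9 first, as in the hand calculation, widens it slightly to (2.65, 7.35).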
Example problem, class data:
A two-tailed hypothesis test:
A researcher claims that Stanford affiliates eat fewer than the recommended intake of 5 fruits and vegetables per day.
We have data to address this claim: 24 people in the class provided data on their daily fruit and vegetable intake.
Do we have evidence to support her claim?
Histogram: fruit and veggie intake (n = 24)
Mean = 3.7 servings
Median = 3 servings
Mode = 3 servings
Std Dev = 1.7 servings
Answer
1. Define your hypotheses (null, alternative)
H0: $\mu$ (average servings) = 5.0
Ha: $\mu$ (average servings) ≠ 5.0 servings (two-sided)
2. Specify your null distribution
$$\bar{X}_{24} \sim T_{23}\left(5.0,\ \frac{1.7}{\sqrt{24}} \approx 0.35\right)$$
Answer, continued
3. Do an experiment
Observed mean in our experiment = 3.7 servings
4. Calculate the p-value of what you observed
$$T_{23} = \frac{3.7 - 5}{0.35} \approx -3.7$$
p-value < .05 (the T23 critical value for p < .05, two-tailed, is 2.07)
5. Reject or fail to reject (~accept) the null hypothesis
Reject! Stanford affiliates eat significantly fewer than the recommended servings of fruits and veggies.
95% Confidence Interval
$$\bar{X}_{24} \pm T_{23,.025} \cdot (\text{standard error}) = 3.7 \pm 2.07 \cdot (0.35) \approx 3.0 \text{ to } 4.4$$
H0: $\mu$ (average servings) = 5.0
The 95% CI excludes 5, so p-value < .05.
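Because only summary statistics are reported for the class data, a sketch of this test has to start from n, mean, and SD rather than raw values. The critical value t(23, .025) = 2.069 is taken from a t table; everything else follows the formulas in the answer.

```python
import math

n, xbar, s, mu0 = 24, 3.7, 1.7, 5.0
se = s / math.sqrt(n)                  # about 0.347
t = (xbar - mu0) / se                  # about -3.75, with 23 df
t_crit = 2.069                         # t_{23, .025}: two-sided 5% cutoff (2.07 above)
reject = abs(t) > t_crit
ci = (xbar - t_crit * se, xbar + t_crit * se)
print(round(se, 3), round(t, 2), reject, tuple(round(x, 2) for x in ci))
print("CI excludes 5.0:", not (ci[0] <= mu0 <= ci[1]))
```

The test and the interval agree: |t| exceeds the cutoff exactly when the 95% CI excludes 5.0.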
Paired data (repeated measures)
BP Before (diastolic)   BP After
100                     92
89                      84
83                      80
98                      93
108                     98
95                      90
(Each row is one patient.)
What about these data? How do you analyze these?
Example problem: paired t-test
BP Before (diastolic)   BP After   Change
100                     92         -8
89                      84         -5
83                      80         -3
98                      93         -5
108                     98         -10
95                      90         -5
Null Hypothesis: Average Change = 0
$$\bar{X} = \frac{-8 - 5 - 3 - 5 - 10 - 5}{6} = \frac{-36}{6} = -6$$
$$s_x = \sqrt{\frac{(-8-(-6))^2 + (-5-(-6))^2 + (-3-(-6))^2 + \cdots}{5}} = \sqrt{\frac{4 + 1 + 9 + 1 + 16 + 1}{5}} = \sqrt{\frac{32}{5}} \approx 2.5$$
$$s_{\bar{x}} = \frac{2.5}{\sqrt{6}} \approx 1.0$$
$$T_5 = \frac{-6 - 0}{1.0} = -6.0$$
With 5 df, |T| > 2.571 corresponds to p < .05 (two-sided test).
95% CI: $-6 \pm 2.571 \times (1.0) = (-8.571,\ -3.43)$
Note: does not include 0.
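The paired t-test above is a one-sample t-test on the changes. A minimal Python sketch, assuming the critical value t(5, .025) = 2.571 from a t table:

```python
import math
import statistics

before = [100, 89, 83, 98, 108, 95]
after = [92, 84, 80, 93, 98, 90]
change = [a - b for a, b in zip(after, before)]   # after minus before, per patient
n = len(change)
d_bar = statistics.fmean(change)                  # mean change, -6.0
s_d = statistics.stdev(change)                    # sqrt(32/5), about 2.53
se = s_d / math.sqrt(n)                           # about 1.03 (rounded to 1.0 above)
t = (d_bar - 0) / se                              # H0: average change = 0
t_crit = 2.571                                    # t_{5, .025}
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
print(change, d_bar, round(t, 2), tuple(round(x, 2) for x in ci))
```

Keeping full precision gives t of about -5.8 and a CI of about (-8.66, -3.35); the conclusion (reject, CI excludes 0) is the same as with the rounded hand calculation.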
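Summary: Single population mean (small n, normality)
Hypothesis test:
$$t_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s_x/\sqrt{n}}$$
Confidence interval:
$$\text{confidence interval} = \text{observed mean} \pm t_{n-1,\alpha/2} \cdot \frac{s_x}{\sqrt{n}}$$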
Summary: paired t-test
Hypothesis test:
$$t_{n-1} = \frac{\text{observed mean } \bar{d} - 0}{s_d/\sqrt{n}}$$
where d = change over time or difference within a pair.
Confidence interval:
$$\text{confidence interval} = \text{observed mean } \bar{d} \pm t_{n-1,\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$$
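Summary: Single population mean (large n)
Hypothesis test:
$$Z \approx t_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s_x/\sqrt{n}}$$
Confidence interval:
$$\text{confidence interval} = \text{observed mean} \pm \left[t_{n-1,\alpha/2} \approx Z_{\alpha/2}\right] \cdot \frac{s_x}{\sqrt{n}}$$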
Examples of Sample Statistics:
Single population mean (known $\sigma$)
Single population mean (unknown $\sigma$)
Single population proportion
Difference in means (t-test)
Difference in proportions (Z-test)
Odds ratio/risk ratio
Correlation coefficient
Regression coefficient
Recall: normal approximation to the binomial
Statistics for proportions are based on a normal distribution, because the binomial can be approximated as normal if np > 5 (and n(1 - p) > 5).
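Recall: stats for proportions
For binomial:
$$\mu_x = np \qquad \sigma_x^2 = np(1-p) \qquad \sigma_x = \sqrt{np(1-p)}$$
For proportion ($\hat{p}$, "p-hat", stands for sample proportion):
$$\mu_{\hat{p}} = p \qquad \sigma_{\hat{p}}^2 = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n} \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
The means differ by a factor of n, and the standard deviations differ by a factor of n.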
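A short numerical check of the "differs by a factor of n" statements, as a sketch. The values n = 200 and p = .15 are borrowed from the smoker practice problem later in this lecture for illustration; the function names are hypothetical.

```python
import math

def binomial_moments(n, p):
    """Mean and SD of the count X ~ Binomial(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))

def proportion_moments(n, p):
    """Mean and SD of the sample proportion p-hat = X / n."""
    return p, math.sqrt(p * (1 - p) / n)

n, p = 200, 0.15
mu_x, sd_x = binomial_moments(n, p)
mu_phat, sd_phat = proportion_moments(n, p)
print(mu_x / mu_phat, sd_x / sd_phat)   # both ratios equal n (up to float rounding)
```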
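Sampling distribution of a sample proportion
p = true population proportion.
$$\hat{p} \sim \text{Normal}\left(p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\right)$$
BUT if you knew p you wouldn't be doing the experiment! In practice, the standard error is estimated from the sample proportion:
$$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Always a normal distribution (never a t), as long as the normal approximation to the binomial holds!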
Practice Problem
A fellow researcher claims that at least 15% of smokers fail to eat any fruits and vegetables at least 3 days a week. You find this hard to believe and decide to check the validity of this statistic by taking a random (representative) sample of smokers. Do you have sufficient evidence to reject your colleague's claim if you discover that 17 of the 200 smokers in your sample eat no fruits and vegetables at least 3 days a week?
Answer
1. What is your null hypothesis?
Null hypothesis: p = proportion of smokers who skip fruits and veggies frequently ≥ .15
Alternative hypothesis: p < .15
2. What is your null distribution?
$$\text{Var}(\hat{p}) = \frac{.15 \times .85}{200} = .00064, \qquad \text{SD}(\hat{p}) = .025$$
$$\hat{p} \sim N(.15,\ .025)$$
3. Empirical evidence: 1 random sample: $\hat{p}$ = 17/200 = .085
4. Z = (.085 - .15)/.025 = -2.6
p-value = P(Z < -2.6) = .0047
5. Sufficient evidence to reject the claim.
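The same one-sample proportion test as a Python sketch, using the null value p0 = .15 for the standard error as in the answer above:

```python
import math
from statistics import NormalDist

n, x, p0 = 200, 17, 0.15
p_hat = x / n                          # 0.085
se0 = math.sqrt(p0 * (1 - p0) / n)     # SD of p-hat under the null, about 0.0252
z = (p_hat - p0) / se0                 # about -2.57
p_value = NormalDist().cdf(z)          # one-sided, since Ha is p < .15
print(p_hat, round(se0, 4), round(z, 2), round(p_value, 4))
```

Without rounding the SD to .025, Z is about -2.57 and the p-value about .005 rather than -2.6 and .0047; the decision to reject is unchanged.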
OR, use computer simulation
1. Have SAS randomly pick 200 yes/no observations from a binomial distribution with p = .15 (the null).
2. Divide the resulting count by 200 to get the observed sample proportion.
3. Repeat this 1000 times (or some arbitrarily large number of times).
4. Plot the resulting distribution of sample proportions in a histogram.
How often did we get observed values of 0.085 or lower when the true p = .15?
Only 4/1000 times!
Empirical p-value = .004
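The simulation steps above can be mirrored in Python in place of SAS. This is a sketch: the seed, the function names, and the use of `random.random() < p` to generate each yes/no observation are illustrative choices. For comparison it also computes the exact binomial tail probability P(X ≤ 17) with `math.comb` (Python 3.8+).

```python
import math
import random

def simulated_p_value(n=200, p_null=0.15, observed=0.085, reps=1000, seed=0):
    """Fraction of simulated sample proportions (under the null) at or below `observed`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        count = sum(rng.random() < p_null for _ in range(n))  # n yes/no draws
        if count / n <= observed:                            # sample proportion
            hits += 1
    return hits / reps

def exact_p_value(n=200, p_null=0.15, k_obs=17):
    """Exact binomial P(X <= k_obs) when X ~ Binomial(n, p_null)."""
    return sum(math.comb(n, k) * p_null**k * (1 - p_null)**(n - k) for k in range(k_obs + 1))

print(simulated_p_value(), round(exact_p_value(), 4))
```

With only 1000 repetitions the empirical p-value is noisy (a handful of hits either way), which is why more repetitions give a steadier estimate.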
Practice Problem
In Saturday's newspaper, in a story about poll results from Ohio, the article said that 625 people in Ohio were sampled and claimed that the margin of error in the results was 4%. Can you explain where that 4% margin of error came from?
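Answer
$$s_{\hat{p}} = \sqrt{\frac{.5 \times .5}{625}} = \frac{.5}{25} = \frac{5}{250} = \frac{1}{50} = 2\%$$
(With no other information, use p = .5, which gives the largest possible standard error.)
4% is 2 standard errors.
Since we're on a normal distribution, 2 standard errors on either side of the mean should represent about 95% confidence.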
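A quick Python check of the margin-of-error arithmetic, as a sketch. It also confirms by brute force that p = .5 maximizes p(1 - p) over a grid of candidate values, and shows the slightly smaller margin obtained with 1.96 instead of 2 standard errors.

```python
import math
from statistics import NormalDist

n = 625
# p = .5 maximizes p(1 - p), giving the widest (most conservative) margin
worst_p = max((k / 100 for k in range(101)), key=lambda q: q * (1 - q))
se = math.sqrt(worst_p * (1 - worst_p) / n)   # .5 / 25 = .02
margin_2se = 2 * se                            # the reported 4%
margin_95 = NormalDist().inv_cdf(0.975) * se   # 1.96 standard errors, about 3.9%
print(worst_p, round(se, 4), round(margin_2se, 4), round(margin_95, 4))
```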
Paired data proportions test
Analogous to the paired t-test
Also takes on a slightly different form known as McNemar's test (we'll see lots more on this next term)
Paired data proportions test
1000 subjects were treated with antidepressants for 6 months and with placebo for 6 months (order of treatment was randomly assigned).
Question: do suicide attempts (yes/no) differ depending on whether a subject is on antidepressants or on placebo?
Paired data proportions test
Data:
15 subjects attempted suicide in both conditions (noninformative)
10 subjects attempted suicide in the antidepressant condition but not the placebo condition
5 subjects attempted suicide in the placebo condition but not the antidepressant condition
970 did not attempt suicide in either condition (noninformative)
Data boils down to 15 observations (the discordant pairs).
In 10/15 cases (66.7%), antidepressant > placebo.
Paired proportions test
Single proportions test:
Under the null hypothesis, antidepressants and placebo work equally well. So,
H0: among discordant cases, p(antidepressant > placebo) = 0.5
Observed $\hat{p}$ = .667
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.667 - .5}{\sqrt{\frac{(.5)(.5)}{15}}} = 1.29; \quad p > .05$$
Not enough evidence to reject the null!
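The discordant-pairs test above can be sketched in Python. The Z calculation follows the slide. The exact two-sided binomial p-value is an added cross-check (the doubling of one tail is one common convention, assumed here).

```python
import math
from statistics import NormalDist

discordant_ad = 10   # attempted only in the antidepressant condition
discordant_pl = 5    # attempted only in the placebo condition
n = discordant_ad + discordant_pl              # concordant pairs are ignored
p_hat = discordant_ad / n
p0 = 0.5
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_two_sided = 2 * NormalDist().cdf(-abs(z))

# Exact binomial check: P(X >= 10) for X ~ Binomial(15, .5), doubled for two sides
upper = sum(math.comb(n, k) for k in range(discordant_ad, n + 1)) / 2**n
p_exact = min(1.0, 2 * upper)
print(round(z, 2), round(p_two_sided, 3), round(p_exact, 3))
```

Note that z squared equals (10 - 5)^2 / (10 + 5), about 1.67, which is the uncorrected McNemar chi-square statistic; both the normal and exact p-values are well above .05.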
Key one-sample Hypothesis Tests
Test for H0: $\mu = \mu_0$:
$$t_{n-1} = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}$$
Test for H0: $p = p_0$:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
$t_{n-1}$ approaches Z for large n.
** If np (expected value) < 5, use the exact binomial rather than the Z approximation.
Corresponding confidence intervals
For a mean:
$$\bar{x} \pm t_{n-1,\alpha/2} \cdot s_{\bar{x}}$$
For a proportion:
$$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$t_{n-1}$ approaches Z for large n.
** If np (expected value) < 5, use the exact binomial rather than the Z approximation.
Symbol overload!
n: Sample size
Z: Z-statistic (standard normal)
$t_{df}$: T-statistic (t-distribution with df degrees of freedom)
$\hat{p}$ (p-hat): sample proportion
$\bar{X}$ (X-bar): sample mean
s: Sample standard deviation
$p_0$: Null hypothesis proportion
$\mu_0$: Null hypothesis mean
