
Statistics

ST 361: Introduction to Statistics


Hypothesis tests and Confidence
Intervals for two means




Kimberly Weems
ksweems@ncsu.edu
Outline: Hypothesis Tests for 2 Means
HT for comparing means of 2 independent populations/groups
Hypothesis Testing via CIs
HT for comparing means of 2 matched/paired (dependent) populations/groups
Hypothesis Testing via CIs



Recall: The Basic Paradigm
Population (parameters) and Sample (statistics)
Inference: use the sample statistics to draw conclusions about the population parameters
Now, Compare two groups
Group 1 Group 2
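Inference for differences
Population 1 and Sample 1: the statistics from Sample 1 support inference about the parameters of Population 1
Population 2 and Sample 2: the statistics from Sample 2 support inference about the parameters of Population 2
Inference: use the difference in statistics to draw conclusions about the difference in parameters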
Hypothetical situation
Population A
Mean 300, standard deviation 100
Population B
Mean 100, standard deviation 30
Sample from both populations
n=30
Calculate the mean of both samples
Take the difference in the means



Hypothetical situation
Sample A mean=323.8
Sample B mean = 98.1
Difference = 225.6

Repeat this process 10000 times
Means of sample A
Normal distribution shape
Centered at 300
Spread from about 240 to 360
$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{30}} = 18.26$$
Means of sample B
Normal distribution shape
Centered at 100
Spread from about 80 to 120
$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{30}} \approx 5.5$$
Differences
Normal distribution shape
Centered at 200
Spread from about 135 to 265
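The repeated-sampling experiment described above can be sketched in a few lines. This is a minimal illustration, not the code used to produce the slides: it assumes both populations are normal (the slides state only the means and SDs), and the function name `simulate_differences` and the seed are choices made here.

```python
import random
import statistics


def simulate_differences(reps=10_000, n=30, seed=1):
    """Simulate (mean of sample A) - (mean of sample B) many times."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Assumption: both populations are normal; the slides give only mean and SD.
        sample_a = [rng.gauss(300, 100) for _ in range(n)]
        sample_b = [rng.gauss(100, 30) for _ in range(n)]
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    return diffs


diffs = simulate_differences()
center = statistics.fmean(diffs)   # should land near 300 - 100 = 200
spread = statistics.stdev(diffs)   # SD of the simulated differences
print(f"center of differences: {center:.1f}")
print(f"SD of differences:     {spread:.2f}")
```

With these settings the center should fall close to 200 and the SD of the differences close to 19, which matches a spread of roughly 135 to 265.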

Fact
The difference of two independent normally distributed variables is also normally distributed.
We must know that the variables are independent.


Some basic principles
For independent random variables $y_1$ and $y_2$, the variance of the difference is the sum of the variances:
$$\mathrm{Var}(y_1 - y_2) = \mathrm{Var}(y_1) + \mathrm{Var}(y_2)$$
Note: the difference still gives a sum!
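Recall
$$\sigma_{\bar{y}_1} = \frac{\sigma_1}{\sqrt{n_1}}, \qquad \sigma_{\bar{y}_2} = \frac{\sigma_2}{\sqrt{n_2}}$$
Variance of sample mean:
$$\sigma_{\bar{y}_1} = \frac{\sigma_1}{\sqrt{n_1}} \;\Rightarrow\; \sigma^2_{\bar{y}_1} = \frac{\sigma_1^2}{n_1}$$
$$\sigma_{\bar{y}_2} = \frac{\sigma_2}{\sqrt{n_2}} \;\Rightarrow\; \sigma^2_{\bar{y}_2} = \frac{\sigma_2^2}{n_2}$$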
Important formula
$$\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Here $\sigma_{\bar{y}_1 - \bar{y}_2}$ is the standard error of the difference in sample means, $\sigma_1^2/n_1$ is the variance of the sample one mean, and $\sigma_2^2/n_2$ is the variance of the sample two mean.
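As a quick check, the standard error formula can be evaluated for the hypothetical populations used earlier (SDs of 100 and 30, with n = 30 drawn from each). This worked line is an illustration added here rather than part of the original slides:

```latex
\sigma_{\bar{y}_1 - \bar{y}_2}
  = \sqrt{\frac{100^2}{30} + \frac{30^2}{30}}
  = \sqrt{333.33 + 30}
  = \sqrt{363.33}
  \approx 19.06
```

This is consistent with simulated differences that spread from about 135 to 265, which is roughly 200 plus or minus 3.4 standard errors.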
Note
In most cases we will not know $\sigma_1$ or $\sigma_2$; instead we will substitute $s_1$ and $s_2$ (the sample SDs). This makes little practical difference if the samples are large.
Test for difference in means: Two-sample t test for independent samples
Assumptions
Samples are random.
Both populations are normally distributed.
The samples are independent.
These assumptions are needed to know the distribution of the difference.
Hypotheses
Null hypothesis
$H_0: \mu_1 - \mu_2 = \Delta_0$
Alternative hypothesis (one of)
$H_1: \mu_1 - \mu_2 > \Delta_0$
$H_1: \mu_1 - \mu_2 < \Delta_0$
$H_1: \mu_1 - \mu_2 \neq \Delta_0$
where $\Delta_0$ is some specific value (i.e., the null value), which is usually 0.
Test Statistic
$$t_0 = \frac{\text{statistic} - \text{null value}}{\text{standard error}}$$
$$t_0 = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Here $\bar{y}_1$ and $\bar{y}_2$ are the sample means and $\Delta_0$ comes from the null hypothesis.
p-value
Found from the t-distribution (t-table)
Degrees of freedom: we will use $n_1 + n_2 - 2$
This is approximately correct if the sample size is large
This is approximately correct if $n_1 = n_2$ and $s_1 = s_2$
Use software in other situations to find the exact df.
The p-value is found in the direction of the alternative hypothesis.
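The slides do not say which formula software uses for the degrees of freedom. A common choice is the Welch-Satterthwaite approximation, sketched below as an assumption about what "software" means here; the function name `welch_df` is hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate df for the separate-variance t test."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Equal sample sizes and equal SDs: the result equals n1 + n2 - 2.
df_equal = welch_df(5.0, 10, 5.0, 10)

# Same sample sizes but very different SDs: the df drops below n1 + n2 - 2.
df_unequal = welch_df(2.0, 10, 8.0, 10)

print(df_equal, df_unequal)
```

The first call illustrates why $n_1 + n_2 - 2$ works well when $n_1 = n_2$ and $s_1 = s_2$; the second shows how unequal spreads reduce the df.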
Conclusion
If the p-value is less than $\alpha$, reject $H_0$.
Example
A study reported in the Journal of Adolescent Health examined gender differences in the amount of time adolescents spent using computers each day. The study randomly selected 2110 students from schools in Hong Kong.
The resulting summary statistics for the total amount of time students use a computer (in minutes) each day are given below. Does this information indicate that gender makes a difference in computer usage?

        Male     Female
Mean    141.15   133.28
StDev   97.06    94.5
n       1009     1101
Example
Assumptions
Samples are random.
Populations are normally distributed.
The samples are independent.
$H_0: \mu_1 - \mu_2 = 0$ (males and females are the same in terms of computer usage)
$H_1: \mu_1 - \mu_2 \neq 0$ (males and females differ in terms of computer usage)



Example
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(141.15 - 133.28) - 0}{\sqrt{\frac{97.06^2}{1009} + \frac{94.5^2}{1101}}} = \frac{7.87}{\sqrt{17.45}} = 1.88$$
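The arithmetic above can be reproduced from the summary statistics alone. This is a small sketch; the function name `two_sample_t` and its argument order are choices made here, not part of the course materials.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2, null_value=0.0):
    """Separate-variance two-sample t statistic from summary statistics."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2 - null_value) / standard_error


# Male vs. female daily computer use (minutes), from the summary table.
t0 = two_sample_t(141.15, 97.06, 1009, 133.28, 94.5, 1101)
print(round(t0, 2))
```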
Example
P-value: see the t table.
Degrees of freedom: df = $n_1 + n_2 - 2 = 2108$, so use the last row of the table.
1.88 is between 1.645 and 1.96: 1.645 < 1.88 < 1.96
Two-sided test: double the probabilities
2(0.05) > p-value > 2(0.025)
0.10 > p-value > 0.05
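With 2108 degrees of freedom the t distribution is nearly indistinguishable from the standard normal, which is why the last row of the table is used. Under that assumption, a two-sided p-value can be approximated with the standard library alone; the function name `two_sided_p_normal` is hypothetical, and this is a normal approximation rather than an exact t calculation.

```python
import math


def two_sided_p_normal(t0):
    """Two-sided p-value using the standard normal in place of the t distribution."""
    # P(|Z| > |t0|) for a standard normal Z equals erfc(|t0| / sqrt(2)).
    return math.erfc(abs(t0) / math.sqrt(2))


p_value = two_sided_p_normal(1.88)
print(f"approximate two-sided p-value: {p_value:.3f}")
```

The result should fall between 0.05 and 0.10, in agreement with the table bracketing.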

Example
If p-value $\le \alpha$, reject $H_0$.
p-value > 0.05, so do not reject $H_0$.
There is not enough evidence to conclude that there is a difference in computer usage between males and females.
Confidence Interval
Statistic $\pm$ Margin of Error
$$(\bar{y}_1 - \bar{y}_2) \pm t^{*}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t^{*}$ is the critical value from the t table with df = $n_1 + n_2 - 2$
Example
We would like to find a 95% confidence interval for the mean difference between male and female computer usage in this population.

        Male     Female
Mean    141.15   133.28
StDev   97.06    94.5
n       1009     1101
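Example
$$(\bar{y}_1 - \bar{y}_2) \pm t^{*}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$(141.15 - 133.28) \pm 1.96\sqrt{\frac{97.06^2}{1009} + \frac{94.5^2}{1101}}$$
$$7.87 \pm 1.96\sqrt{17.45} \;\Rightarrow\; 7.87 \pm 8.19$$
$$(-0.32,\ 16.06)$$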
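The interval can be checked numerically from the same summary statistics. In this sketch the multiplier 1.96 is passed in by hand, as read from the last row of the t table; the function name `two_sample_ci` is a choice made here.

```python
import math


def two_sample_ci(mean1, s1, n1, mean2, s2, n2, t_star):
    """CI for mu1 - mu2: statistic +/- t_star * standard error."""
    difference = mean1 - mean2
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return difference - margin, difference + margin


lower, upper = two_sample_ci(141.15, 97.06, 1009, 133.28, 94.5, 1101, t_star=1.96)
print(f"({lower:.2f}, {upper:.2f})")
```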
Example
Notice that the interval (-0.32, 16.06) contains the null value 0. Therefore, it is plausible (with 95% confidence) that the true difference in mean computer usage is 0. Thus, we fail to reject $H_0$.
We are 95% confident that the interval (-0.32, 16.06) contains the true difference in mean computer usage between males and females.
General Rule: CI approach to a 2-tailed HT
If the null value is contained in the $100(1-\alpha)\%$ CI, then we FAIL TO REJECT $H_0$ at level $\alpha$.
If the null value is NOT contained in the $100(1-\alpha)\%$ CI, then we REJECT $H_0$ at level $\alpha$.
This approach can only be used for a 2-tailed test.
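The rule amounts to a single containment check. The helper below is a hypothetical illustration (the name `reject_by_ci` is not from the slides) and applies only to two-tailed tests, with the interval built at the matching confidence level.

```python
def reject_by_ci(ci_low, ci_high, null_value=0.0):
    """Two-tailed test via CI: reject H0 only if the null value lies outside the interval."""
    return not (ci_low <= null_value <= ci_high)


# 95% CI from the computer-usage example: contains 0, so fail to reject H0.
computer_usage = reject_by_ci(-0.32, 16.06)

# An interval that excludes 0, such as (2.4, 13.6), leads to rejecting H0.
excludes_zero = reject_by_ci(2.4, 13.6)

print(computer_usage, excludes_zero)
```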
Special Case: Pooled t-test
In this section we take up a special case of the two-sample t-test.
It is used when we can make a specific assumption: the SDs are equal, $\sigma_1 = \sigma_2$.
Degrees of freedom will be exactly $n_1 + n_2 - 2$.
Pool the information you have about the two SDs.


Pooled Variances
Combine information about both variances into a single estimate. Use this estimate in the standard error formula.
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Test for difference in means (pooled variance)
Assumptions
Samples are random
Populations are normally distributed.
Population variances are equal.
Samples are independent.
Hypotheses
Null hypothesis
$H_0: \mu_1 - \mu_2 = \Delta_0$
Alternative hypothesis (one of)
$H_1: \mu_1 - \mu_2 > \Delta_0$
$H_1: \mu_1 - \mu_2 < \Delta_0$
$H_1: \mu_1 - \mu_2 \neq \Delta_0$
where $\Delta_0$ is some specific (null) value.
Test Statistic
$$t_0 = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$$
where $s_p^2$ is the pooled variance.
p-value
Found from the t-table with $n_1 + n_2 - 2$ degrees of freedom
Found in the direction of the alternative hypothesis
For a two-sided ($\neq$) alternative, find the one-sided case and double the result.
Conclusion
If p-value $\le \alpha$, reject $H_0$.
Note: Can also use rejection region
For both the separate and pooled variance two-sample t-tests, we can also use a rejection region approach. Recall:
Select a significance level $\alpha$.
Determine the rejection region: the set of values of the test statistic for which one rejects $H_0$.
Compute the sample means and the test statistic. Reject $H_0$ if the test statistic lies in the rejection region.
Alternative Hypothesis & Rejection Region
$H_1$                                   Rejection Region
$H_1: \mu_1 - \mu_2 \neq \Delta_0$      $|t_0| > t_{\alpha/2,\, n_1+n_2-2}$
$H_1: \mu_1 - \mu_2 > \Delta_0$         $t_0 > t_{\alpha,\, n_1+n_2-2}$
$H_1: \mu_1 - \mu_2 < \Delta_0$         $t_0 < -t_{\alpha,\, n_1+n_2-2}$
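The three rejection regions can be written as one small decision function. This is a sketch under stated assumptions: the caller looks up the critical value in a t table ($t_{\alpha/2}$ for the two-sided case, $t_{\alpha}$ for the one-sided cases), and the function name and the labels `"two-sided"`, `"greater"`, and `"less"` are choices made here.

```python
def in_rejection_region(t0, t_crit, alternative):
    """Return True if the test statistic t0 falls in the rejection region.

    t_crit is the positive table value: t_{alpha/2, df} for "two-sided",
    t_{alpha, df} for "greater" or "less".
    """
    if alternative == "two-sided":
        return abs(t0) > t_crit
    if alternative == "greater":
        return t0 > t_crit
    if alternative == "less":
        return t0 < -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Illustrative values only, not taken from a specific example.
print(in_rejection_region(2.5, 2.0, "two-sided"))   # statistic beyond the cutoff
print(in_rejection_region(-2.5, 2.0, "greater"))    # wrong tail for this alternative
```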
Example
Does the color of paper make a difference in exam scores? A history professor created two versions of an exam. The two versions were printed on two colors of paper. He administered them to his class by randomly assigning them to his students.
Twenty-one students took the exam version that was on pink paper. Eighteen students were assigned to the version on gold paper. The resulting scores are summarized below. Does this indicate that there is a significant difference in scores between the two colors of the exam?

Color   n    Mean   St. Dev
Pink    21   72     8.1
Gold    18   64     9.2
Example
Assumptions
Samples are random.
Populations are normally distributed.
Populations have the same variance.
The samples are independent.
$H_0: \mu_1 - \mu_2 = 0$ (version of the exam does not make a difference)
$H_1: \mu_1 - \mu_2 \neq 0$ (versions do make a difference)


Example
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(21 - 1)(8.1)^2 + (18 - 1)(9.2)^2}{21 + 18 - 2}$$
$$= \frac{(21 - 1)(65.61) + (18 - 1)(84.64)}{21 + 18 - 2} = \frac{1312.2 + 1438.88}{37} = \frac{2751.08}{37} = 74.35$$
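The pooled variance calculation is easy to verify in code. A minimal sketch follows; the function name `pooled_variance` is chosen here for illustration.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of a common variance from two sample SDs."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Pink paper: n = 21, SD = 8.1; gold paper: n = 18, SD = 9.2.
sp2 = pooled_variance(8.1, 21, 9.2, 18)
print(round(sp2, 2))
```

Each sample variance is weighted by its own degrees of freedom, so the larger sample has more influence on the pooled estimate.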
Example
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}} = \frac{(72 - 64) - 0}{\sqrt{\frac{74.35}{21} + \frac{74.35}{18}}} = \frac{8}{\sqrt{7.67}} = 2.89$$
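The pooled test statistic can be recomputed from the summary values. In this sketch the pooled variance 74.35 is taken as given from the previous step, and the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, n1, mean2, n2, sp2, null_value=0.0):
    """Pooled-variance two-sample t statistic."""
    standard_error = math.sqrt(sp2 / n1 + sp2 / n2)
    return (mean1 - mean2 - null_value) / standard_error


# Pink (mean 72, n 21) vs. gold (mean 64, n 18), pooled variance 74.35.
t0 = pooled_t(72, 21, 64, 18, sp2=74.35)
print(f"{t0:.3f}")
```

Carrying full precision, the statistic evaluates to roughly 2.89.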
P-value
Degrees of freedom: $n_1 + n_2 - 2 = 21 + 18 - 2 = 37$
Closest df in the table = 38 (can round up or down)
2(0.005) > p-value
0.010 > p-value

Conclusion
The p-value is less than 0.05, so reject $H_0$.
There is evidence of a significant difference between the 2 versions of the exam.
Confidence interval for difference
$$(\bar{y}_1 - \bar{y}_2) \pm t^{*}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
Example
Calculate a 95% confidence interval for the
difference in mean score of exams for the two
versions.
37 degrees of freedom
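Example
$$(\bar{y}_1 - \bar{y}_2) \pm t^{*}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
$$8 \pm 2.024\sqrt{\frac{74.35}{21} + \frac{74.35}{18}}$$
$$8 \pm 2.024\sqrt{7.67} \;\Rightarrow\; 8 \pm 5.605$$
$$(2.4,\ 13.6)$$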
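The pooled interval can be reproduced as follows. The multiplier 2.024 is the table value used in the example and is passed in by hand; the function name `pooled_ci` is chosen here for illustration.

```python
import math


def pooled_ci(mean1, n1, mean2, n2, sp2, t_star):
    """CI for mu1 - mu2 using the pooled variance sp2."""
    difference = mean1 - mean2
    margin = t_star * math.sqrt(sp2 / n1 + sp2 / n2)
    return difference - margin, difference + margin


lower, upper = pooled_ci(72, 21, 64, 18, sp2=74.35, t_star=2.024)
print(f"({lower:.1f}, {upper:.1f})")
```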
Example
Notice that the null value 0 is not contained
in the interval (2.4, 13.6), so we reject the null
hypothesis.
We are 95% confident that the interval (2.4,
13.6) contains the true mean difference
between the exam versions. OR
The observed interval (2.4, 13.6) brackets the
true difference in mean exam scores, with 95%
confidence.
When do we do this test?
For any sample size (especially when n is small), and if we can make the assumption of equal variances.
The equal-variance assumption is often OK if we have two randomly assigned groups.
When using software, the distinction between the pooled and separate-variance tests is not as important.
It was more important before computing was widely available.
You will probably see this test in the literature.
Paired Differences
Compare two measures on the same subject
Right and left hand
Pre-test and post-test
Before and after measure
Record two measures on same subject
Take the difference in those measures
Change scores
Example
We recorded the right and left hand strength of 9 randomly selected college-age adults.

Subject   Dominant   Off Dom   Difference
1         333        350       -17
2         380        374       6
3         164        189       -25
4         330        308       22
5         214        209       5
6         282        224       58
7         390        382       8
8         258        293       -35
9         221        219       2
Example
Treat the differences as a single sample.

Hypotheses:
If there is no difference, the average difference should be 0.
If the dominant hand is stronger, the average difference should be greater than 0.


Notation
$\bar{y}_D$: sample average of differences
$s_D$: standard deviation of differences
$n$: number of differences

Example
$\bar{y}_D$ = ______
$s_D$ = 27.50
$n$ = ______
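The summary values for the differences can be computed directly from the hand-strength table. This sketch uses only the standard library; the variable names are choices made here.

```python
import statistics

# Hand-strength measurements for the 9 subjects, in table order.
dominant = [333, 380, 164, 330, 214, 282, 390, 258, 221]
off_dominant = [350, 374, 189, 308, 209, 224, 382, 293, 219]

# Paired analysis: reduce the two columns to one sample of differences.
differences = [d - o for d, o in zip(dominant, off_dominant)]

n = len(differences)
y_bar_d = statistics.mean(differences)   # sample average of differences
s_d = statistics.stdev(differences)      # sample SD (divides by n - 1)

print(differences)
print(n, round(y_bar_d, 2), round(s_d, 2))
```

The computed standard deviation should agree with the value $s_D = 27.50$ given above.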
Test for paired differences
Assumptions
We have a random sample of the differences.
The population of differences is normally distributed.
Hypotheses
$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$
$H_1: \mu < \mu_0$
$H_1: \mu \neq \mu_0$
where $\mu$ is really $\mu_D$, the true mean of the differences.

Test Statistic
$$t_0 = \frac{\bar{y}_D - \mu_0}{s_D/\sqrt{n}}$$
Here $\mu_0$ comes from the null hypothesis (usually zero), $\bar{y}_D$ is the mean of the sample of differences, and $s_D$ is the SD of the sample of differences.
Test Statistic: Example
$$t_0 = \frac{\bar{y}_D - \mu_0}{s_D/\sqrt{n}} = \;?$$
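One way to check an answer to this exercise is to evaluate the statistic in code. The sketch below assumes a null value of 0 and uses the fact that the nine tabulated differences sum to 24; the function name `paired_t` is hypothetical.

```python
import math


def paired_t(y_bar_d, s_d, n, mu_0=0.0):
    """One-sample t statistic applied to the paired differences."""
    return (y_bar_d - mu_0) / (s_d / math.sqrt(n))


# The nine differences in the table sum to 24, so their mean is 24 / 9.
t0 = paired_t(y_bar_d=24 / 9, s_d=27.50, n=9)
print(f"{t0:.3f}")
```

A statistic this close to zero gives little evidence that the dominant hand is stronger.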
p-value
Found from the t-table using $n-1$ degrees of freedom.

Conclusion
If the p-value is less than $\alpha$, reject $H_0$.
Or, we can use a rejection region approach with $n-1$ df.
Or, for a 2-tailed (i.e., 2-sided) test, we can use a CI approach:
$$\bar{y}_D \pm \text{m.o.e.} = \bar{y}_D \pm t_{\alpha/2,\, n-1}\,(s_D/\sqrt{n})$$
Conclusion for the example:
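For the hand-strength data, the CI approach can be sketched as follows. The critical value 2.306 is the standard table entry for a 95% interval with 8 degrees of freedom and is supplied by hand; the mean 24/9 follows from the tabulated differences, and the function name `paired_ci` is chosen here.

```python
import math


def paired_ci(y_bar_d, s_d, n, t_crit):
    """CI for the mean difference: y_bar_d +/- t_crit * s_d / sqrt(n)."""
    margin = t_crit * s_d / math.sqrt(n)
    return y_bar_d - margin, y_bar_d + margin


# n = 9 differences, so df = 8; the t table gives 2.306 for 95% confidence.
lower, upper = paired_ci(y_bar_d=24 / 9, s_d=27.50, n=9, t_crit=2.306)
contains_zero = lower <= 0 <= upper
print(f"({lower:.2f}, {upper:.2f}) contains 0: {contains_zero}")
```

Because the interval contains 0, a two-sided test at the 5% level would fail to reject $H_0$.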
Rejection Region Approach
Alternative Hypothesis & Rejection Region
$H_1$                      Rejection Region
$H_1: \mu \neq \mu_0$      $|t_0| > t_{\alpha/2,\, n-1}$
$H_1: \mu > \mu_0$         $t_0 > t_{\alpha,\, n-1}$
$H_1: \mu < \mu_0$         $t_0 < -t_{\alpha,\, n-1}$
CI for the mean of a normal distribution, when $\sigma$ is unknown (contd)
When $\sigma$ is unknown, the $100(1-\alpha)\%$ CI for $\mu$ for a particular sample $(x_1, \ldots, x_n)$ is:
$$\bar{x} \pm \text{m.o.e.} = \bar{x} \pm t_{\alpha/2,\, n-1} \times s/\sqrt{n}$$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $t_{\alpha/2,\, n-1}$ is the critical value of a t distribution with df = $n-1$ corresponding to a right-tail probability of $\alpha/2$.
Example: Twin Weights
http://www.statcrunch.com/5.0/index.php?dataid=338704
Weights for 19 newborn twins born to members of the Greater
Columbia South Carolina Area Mothers of Twins Club from
September 2000 to December 2001.
