
Inference From Small Samples

Quantitative Methods for Economics


Dr. Katherine Sauer
Metropolitan State College of Denver
Chapter Overview:

I. Normal Population, σ is known
II. The t-distribution (aka Student's t-distribution)
III. Difference Between Means from Small, Independent Samples
IV. The F-test for equality of two variances
V. Difference between Means, Paired Samples

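I. Normal Population, σ is known

For n < 30:

When the population is Normal and the population standard deviation σ
is known, then the sampling distribution for sample means is

$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

The confidence interval is

$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$

The Test Statistic is

$Z = \frac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$

where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ is the standard error of the sample mean.
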
Example: The temperature (degrees C) of a cooled storage unit is
taken on 8 consecutive days.
4.5 4.8 5.2 4.7 3.8 3.7 4.1 3.9

Temperatures for this type of storage unit are known to be
Normally distributed with a standard deviation of σ = 0.35.

Construct a 90% confidence interval for the true mean temperature.
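Calculate the sample mean: $\bar{x} = 4.3375$

For α = 0.10, $Z_{\alpha/2} = 1.6449$

Calculate the standard error:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.35}{\sqrt{8}} = 0.1237$

Construct the interval $\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$:

4.3375 ± 1.6449(0.1237)
4.3375 ± 0.2035

4.1340 to 4.5410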

We are 90% sure that the true population mean is in this
interval.
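
The interval above can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative and not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist, mean

# Data and known sigma from the storage-unit example
temps = [4.5, 4.8, 5.2, 4.7, 3.8, 3.7, 4.1, 3.9]
sigma = 0.35          # known population standard deviation
confidence = 0.90

x_bar = mean(temps)                            # sample mean
std_error = sigma / sqrt(len(temps))           # sigma / sqrt(n)
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)

lower = x_bar - z_crit * std_error
upper = x_bar + z_crit * std_error

print(f"mean = {x_bar:.4f}, SE = {std_error:.4f}, z = {z_crit:.4f}")
print(f"{confidence:.0%} CI: {lower:.4f} to {upper:.4f}")
```

The printed interval should agree with 4.1340 to 4.5410 up to rounding.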
Test the hypothesis that the mean temperature is 4 degrees.

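H0: μ = 4
H1: μ ≠ 4

For α = 0.10, α/2 = 0.05 and Z = 1.6449

[Figure: standard Normal curve centered at μ = 4, with "Reject H0" regions below Z = -1.6449 and above Z = 1.6449, and "Accept H0" between them]

Reject the null if Z > 1.6449 or Z < -1.6449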

$Z = \frac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$, where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

$Z = \frac{4.3375 - 4}{0.1237} = 2.73$

Z = 2.73 > 1.6449
Reject the null and conclude that the
average temperature is not 4 degrees.
p-value = Pr(Z > 2.73) + Pr(Z < -2.73)
= 0.0032 + 0.0032
= 0.0064


If the true mean is 4, there is only a 0.64% chance of drawing a
sample whose mean is at least this far from 4.
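
A minimal sketch of the same two-tail Z test, again using only the Python standard library. The names `mu_0`, `z` and `p_value` are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

temps = [4.5, 4.8, 5.2, 4.7, 3.8, 3.7, 4.1, 3.9]
sigma = 0.35      # known population standard deviation
mu_0 = 4.0        # mean claimed under H0
alpha = 0.10

std_error = sigma / sqrt(len(temps))
z = (mean(temps) - mu_0) / std_error           # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tail critical value

# Two-tail p-value: area beyond |z| in both tails of N(0, 1)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = {z_crit:.4f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the code does not round Z to 2.73 before looking up the tail area, the p-value may differ from 0.0064 in the last digit.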
Often, we don't know the population standard deviation σ.

We can no longer use the Z table.

II. The t-distribution (aka Student's t-distribution)

Fun origin: A chemist at the Guinness brewery in Dublin (William
Sealy Gosset, who published under the name "Student") invented
the t-distribution in order to monitor quality in brewing, using
small samples from Normal populations with σ unknown.


If random samples of size n are selected from a Normal population
with mean μ and σ unknown, then the distribution of sample means
is a t-distribution.

$\bar{x} \sim t_{n-1}\left(\mu, s_{\bar{x}}\right)$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$

(n-1) refers to the degrees of freedom
The t-distribution is similar to the Normal distribution in several
ways:
it is bell shaped
it is symmetrical about the mean
$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$ is the number of standard errors between the
sample mean and population mean
Ex: find the critical t-value that leaves an upper-tail area of 5%
when the sample size is 10.

10 - 1 = 9 degrees of freedom

Tail area = 0.05

Critical t-value is 1.8331
In large samples, when σ is unknown, we often use Z instead of t.

When samples are large, Z and t are close.
Statistical software always uses t when σ is unknown, even for
large samples.

The confidence interval for a small sample from a Normal
population with σ unknown is

$\bar{x} \pm t_{n-1,\,\alpha/2}\,s_{\bar{x}}$

The test statistic for a small sample from a Normal population
with σ unknown is

$t = \frac{\bar{x} - \mu_{H_0}}{s_{\bar{x}}}$

Example: The waiting time at an airline check-in counter is
known to be Normally distributed. A random sample of 5
passengers was interviewed. They reported the following wait
times: 15.5 21.2 12.6 18.4 22.9 minutes.
Construct a 90% confidence interval for the average wait time.

Calculate the sample average wait time: $\bar{x} = 18.12$

Calculate the standard error: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Remember to divide by n-1 for the variance!!!

 xi     xi - mean   (xi - mean)^2
15.5      -2.62         6.8644
21.2       3.08         9.4864
12.6      -5.52        30.4704
18.4       0.28         0.0784
22.9       4.78        22.8484
sum                    69.748

mean      18.12
variance  17.437
st dev    4.175763

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.1758}{\sqrt{5}} = 1.8675$

The confidence interval is $\bar{x} \pm t_{n-1,\,\alpha/2}\,s_{\bar{x}}$

Find the critical value for t:

$t_{n-1,\,\alpha/2} = t_{4,\,0.05} = 2.1318$

Construct the interval:
18.12 ± (2.1318)(1.8675)
18.12 ± 3.9811
14.1389 to 22.1011

We are 90% confident that the average wait time is in
this range.
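
The wait-time interval can be reproduced with the sketch below. The Python standard library has no t-distribution quantile function, so the critical value 2.1318 is copied from the table value quoted above rather than computed. That is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev

waits = [15.5, 21.2, 12.6, 18.4, 22.9]   # minutes, from the example
t_crit = 2.1318   # t_{4, 0.05}, copied from the t-table value in the text

n = len(waits)
x_bar = mean(waits)
s = stdev(waits)              # sample standard deviation (divides by n - 1)
std_error = s / sqrt(n)       # s / sqrt(n)

lower = x_bar - t_crit * std_error
upper = x_bar + t_crit * std_error

print(f"mean = {x_bar:.2f}, s = {s:.4f}, SE = {std_error:.4f}")
print(f"90% CI: {lower:.4f} to {upper:.4f}")
```

`statistics.stdev` already divides by n-1, which matches the reminder in the example; `statistics.pstdev` would divide by n and give a standard error that is too small.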
Example: Test the hypothesis that the average wait time is at
most 20 minutes.

1. State the null and alternative hypotheses
H0: μ ≤ 20
H1: μ > 20
one-sided test, upper tail

2. Sketch the graph and identify the critical region
α = 0.10, $t_{4,\,0.10} = 1.5332$

[Figure: t-distribution centered at μ = 20, with "Accept H0" to the left of t = 1.5332 and "Reject H0" to the right]

Accept H0 if t ≤ 1.5332

Reject H0 if t > 1.5332
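3. Calculate t:

$t = \frac{\bar{x} - \mu_{H_0}}{s_{\bar{x}}}$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.1758}{\sqrt{5}} = 1.8675$

$t = \frac{18.12 - 20}{1.8675} = -1.0067$

[Figure: the same t-distribution, with the sample value t = -1.0067 (sample mean 18.12) in the "Accept H0" region to the left of t = 1.5332]

Accept H0 because
-1.0067 < 1.5332

Accept the null: the sample gives no evidence against the claim that
the average wait time is at most 20 minutes.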
4. p-value is the area to the right of -1.0067

(rarely looked up in a t-distribution table; use software)
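
The upper-tail test above can be sketched as follows. The critical value 1.5332 is taken from the text as an assumption, since the standard library cannot compute t quantiles, and the p-value is left out for the same reason.

```python
from math import sqrt
from statistics import mean, stdev

waits = [15.5, 21.2, 12.6, 18.4, 22.9]
mu_0 = 20.0       # boundary value of H0: mu <= 20
t_crit = 1.5332   # t_{4, 0.10}, copied from the text

std_error = stdev(waits) / sqrt(len(waits))
t_stat = (mean(waits) - mu_0) / std_error

# Upper-tail test: reject only when t is larger than the critical value
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.4f}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Accept H0")
```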
Example: The temperature (degrees C) of a cooled storage unit is
taken on 8 consecutive days.
4.5 4.8 5.2 4.7 3.8 3.7 4.1 3.9

At the 10% significance level (90% confidence), test the hypothesis
that the mean temperature is 4 degrees, this time treating σ as unknown.
H0: μ = 4
H1: μ ≠ 4

 xi    xi - mean   (xi - mean)^2
4.5     0.1625        0.0264
4.8     0.4625        0.2139
5.2     0.8625        0.7439
4.7     0.3625        0.1314
3.8    -0.5375        0.2889
3.7    -0.6375        0.4064
4.1    -0.2375        0.0564
3.9    -0.4375        0.1914
sum                   2.0588

mean      4.3375
variance  0.294107
st dev    0.542316

Let's verify the output:

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{0.542316}{\sqrt{8}} = 0.1917$

The 90% confidence interval is $\bar{x} \pm t_{n-1,\,\alpha/2}\,s_{\bar{x}}$, with $t_{7,\,0.05} = 1.8946$:

4.3375 ± (1.8946)(0.19174)
4.3375 ± 0.3633

3.9742 to 4.7008

$t = \frac{\bar{x} - \mu_{H_0}}{s_{\bar{x}}} = \frac{4.3375 - 4}{0.19174} = 1.76$

This is a two-tail test.

-1.8946 < 1.76 < 1.8946

Accept the null.

If we had rejected the null, the p-value would have told us the
smallest level of significance at which the null can be rejected.
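
As a rough check, the sketch below repeats the t calculation for the temperature data and compares the half-width of the t interval with the Z interval from section I. Both critical values are copied from the text, not computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

temps = [4.5, 4.8, 5.2, 4.7, 3.8, 3.7, 4.1, 3.9]
mu_0 = 4.0
t_crit = 1.8946      # t_{7, 0.05}, from the text
z_crit = 1.6449      # Z_{0.05}, from section I
sigma_known = 0.35   # the known sigma used in section I

n = len(temps)
se_t = stdev(temps) / sqrt(n)          # standard error estimated from the sample
t_stat = (mean(temps) - mu_0) / se_t

half_width_t = t_crit * se_t                     # sigma unknown
half_width_z = z_crit * sigma_known / sqrt(n)    # sigma known
accept_h0 = abs(t_stat) < t_crit                 # two-tail decision

print(f"t = {t_stat:.2f}, accept H0: {accept_h0}")
print(f"half-width with t: {half_width_t:.4f}, with Z: {half_width_z:.4f}")
```

With the same data, the t interval is wider than the Z interval, which is why 4 falls inside it here although the Z test rejected H0.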
III. Difference Between Means from Small, Independent Samples

Example: Promoters of e-learning software design a test for
effectiveness of an online course based on typing tutor software.
Two groups are randomly selected. Group 1 consists of 10 subjects
who have completed a course that did not use supporting software.
Group 2 consists of 8 subjects who used the online software.

The typing speeds (wpm) are as follows.
Group 1: 23, 35, 37, 12, 26, 60, 13, 24, 27, 53

Group 2: 56, 30, 55, 48, 35, 40, 33, 23

Construct a 90% confidence interval for the difference in mean
typing speed between the two groups. Can you conclude that those
who used the online software can type faster?
            Group 1                              Group 2
xi   xi - mean   (xi - mean)^2       xi   xi - mean   (xi - mean)^2
23      -8            64             56      16           256
35       4            16             30     -10           100
37       6            36             55      15           225
12     -19           361             48       8            64
26      -5            25             35      -5            25
60      29           841             40       0             0
13     -18           324             33      -7            49
24      -7            49             23     -17           289
27      -4            16
53      22           484
sum                 2216             sum                 1008

mean      31                         mean      40
variance  246.2222                   variance  144
st dev    15.69147                   st dev    12
We'll need to construct a pooled estimate of variance.

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$s_p^2 = \frac{(10 - 1)(15.69147)^2 + (8 - 1)(12)^2}{10 + 8 - 2} = 201.5$
Use the pooled estimate of variance to find the standard error.

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{201.5\left(\frac{1}{10} + \frac{1}{8}\right)} = 6.7333$
Find the critical t value:
degrees of freedom = n1 + n2 - 2
= 16
α/2 = 0.05

$t_{16,\,0.05} = 1.7459$


Construct the interval:
(40 - 31) ± 1.7459(6.7333)
9 ± 11.7557

-2.7557 to 20.7557
The interval contains 0, so we cannot conclude that the
difference between means is different from zero.

The data are consistent with typing speeds being the same in the 2 groups.
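
A sketch of the pooled-variance interval for the typing data, using only the standard library. The critical value 1.7459 is copied from the text, and the difference is taken as Group 2 minus Group 1 to match the calculation above.

```python
from math import sqrt
from statistics import mean, variance

group1 = [23, 35, 37, 12, 26, 60, 13, 24, 27, 53]   # no supporting software
group2 = [56, 30, 55, 48, 35, 40, 33, 23]           # used the online software
t_crit = 1.7459   # t_{16, 0.05}, copied from the text

n1, n2 = len(group1), len(group2)

# Pooled variance: sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))

diff = mean(group2) - mean(group1)
lower = diff - t_crit * std_error
upper = diff + t_crit * std_error

print(f"pooled variance = {pooled_var:.1f}, SE = {std_error:.4f}")
print(f"90% CI for the difference: {lower:.4f} to {upper:.4f}")
```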
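At the 5% significance level, test the hypothesis that the mean typing
speed is faster for those who used the software (Group 2).


H0: μ2 = μ1
H1: μ2 > μ1
one tailed test

α = 0.05

$t_{16,\,0.05} = 1.7459$

The test statistic is

$t = \frac{(\bar{x}_2 - \bar{x}_1) - (\mu_2 - \mu_1)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

$t = \frac{(40 - 31) - (0)}{6.7333} = 1.3366$

[Figure: t-distribution centered at μ1 = μ2, with "Accept H0" to the left of t = 1.7459 and "Reject H0" to the right; the sample value t = 1.3366 lies in the "Accept H0" region]

Accept the null hypothesis that the typing speed of both groups is
the same: the sample does not show that software users type faster.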
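
The one-tailed test can be sketched the same way. The difference is taken as software group minus no-software group, matching the (40 - 31) in the calculation; the critical value is copied from the text.

```python
from math import sqrt
from statistics import mean, variance

group1 = [23, 35, 37, 12, 26, 60, 13, 24, 27, 53]   # no supporting software
group2 = [56, 30, 55, 48, 35, 40, 33, 23]           # used the online software
t_crit = 1.7459   # t_{16, 0.05}, copied from the text

n1, n2 = len(group1), len(group2)
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))

# Under H0 the difference between the population means is 0
t_stat = ((mean(group2) - mean(group1)) - 0) / std_error
reject_h0 = t_stat > t_crit   # upper-tail test

print(f"t = {t_stat:.4f}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Accept H0")
```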
Assumptions made in solving this problem:
1. independent samples
2. random samples from Normal populations
3. the variance is the same for both populations
IV. The F-test for equality of two variances

To figure out if two populations have similar variances, we will look
at the sample variances.

If the ratio of the sample variances is close to 1, then the hypothesis
that the populations have equal variance is plausible.
The sampling distribution of $s_1^2 / s_2^2$ is an F-distribution, when the
samples are independent and selected from Normal populations
with equal variances.
The F-distribution is not symmetrical and depends on the
degrees of freedom in each sample.
v1 = n1 - 1    v2 = n2 - 1
Ex: Suppose sample 1 has 10 observations and sample 2 has 8
observations. Find the critical F-value for the 5% level.
v1 = 9    v2 = 7
If we wanted the 2.5% level, we'd need a different table.
Example: Using the data from the typing example, test whether
the population variances are equal at the 5% significance level.

H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \neq \sigma_2^2$

this is a 2-tail test

α/2 = 0.025
F: v1 = 10-1 = 9    v2 = 8-1 = 7

Critical value: F = 4.82

Calculate the test statistic $s_1^2 / s_2^2$:

$\frac{s_1^2}{s_2^2} = \frac{246.22}{144} = 1.7099$

Test statistic: F = 1.7099

Since 1.7099 < 4.82, accept the null hypothesis and conclude that
equal population variances are plausible.
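
A small sketch of the two-tail F-test on the typing data. The critical value 4.82 is copied from the text as an assumption. Only the upper critical value is checked, which is enough here because the larger sample variance is in the numerator.

```python
from statistics import variance

group1 = [23, 35, 37, 12, 26, 60, 13, 24, 27, 53]
group2 = [56, 30, 55, 48, 35, 40, 33, 23]
f_crit = 4.82   # upper 2.5% point of F with (9, 7) df, copied from the text

df1, df2 = len(group1) - 1, len(group2) - 1
var1, var2 = variance(group1), variance(group2)   # sample variances (n - 1)

f_stat = var1 / var2          # s1^2 / s2^2, with the larger variance on top here
reject_h0 = f_stat > f_crit

print(f"v1 = {df1}, v2 = {df2}, F = {f_stat:.4f}, critical value = {f_crit}")
print("Reject H0" if reject_h0 else "Accept H0")
```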
Instead, test the hypothesis that the variance of population 1
exceeds the variance of population 2.

H0: $\sigma_1^2 \le \sigma_2^2$
H1: $\sigma_1^2 > \sigma_2^2$

this is a 1-tail test, upper tail

α = 0.05
F: v1 = 10-1 = 9    v2 = 8-1 = 7

Critical value: F = 3.68

Calculate the test statistic $s_1^2 / s_2^2$:

$\frac{s_1^2}{s_2^2} = \frac{246.22}{144} = 1.7099$

Test statistic: F = 1.7099

Since 1.7099 < 3.68, accept the null hypothesis that the variance of
population 1 is less than or equal to the variance of population 2.
V. Difference between Means, Paired Samples

Paired t-tests are used when data consists of pairs of measurements
on the same subjects.
ex: before and after
Example: The typing speeds for 7 people are recorded before and
after completing a course using typing tutor software.
Person Before After Difference
JM 32 46 14
AC 10 18 8
TB 65 58 -7
AF 39 50 11
AO 24 36 12
PD 10 24 14
FF 24 21 -3
Construct a 90% confidence interval for the difference between
average typing speed before and after the course.

α/2 = 0.05
degrees of freedom = 7-1 = 6
$t_{6,\,0.05} = 1.9432$
Calculate the mean of the differences:
49 / 7 = 7

Calculate the sample standard deviation:
Person   Difference   dif - mean   (dif - mean)^2
JM           14            7             49
AC            8            1              1
TB           -7          -14            196
AF           11            4             16
AO           12            5             25
PD           14            7             49
FF           -3          -10            100
sum                                     436

variance  72.6667
st dev    8.5245
Calculate the sample standard error:

$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{8.5245}{\sqrt{7}} = 3.2219$
Construct the interval:

7 ± 1.9432(3.2219)
7 ± 6.2608

0.7392 to 13.2608


We are 90% confident that the true difference in average typing
speeds is between 0.7392 words per minute and 13.2608 words
per minute.
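
The paired interval can be sketched as below: the differences are treated as one sample and the one-sample t method is applied to them. The critical value 1.9432 is copied from the text, and the list names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Typing speeds (wpm) for JM, AC, TB, AF, AO, PD, FF
before = [32, 10, 65, 39, 24, 10, 24]
after = [46, 18, 58, 50, 36, 24, 21]
t_crit = 1.9432   # t_{6, 0.05}, copied from the text

# One difference per person: after minus before
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)
s_d = stdev(diffs)                 # divides by n - 1
se_d = s_d / sqrt(len(diffs))

lower = d_bar - t_crit * se_d
upper = d_bar + t_crit * se_d

print(f"differences = {diffs}")
print(f"mean = {d_bar}, s_d = {s_d:.4f}, SE = {se_d:.4f}")
print(f"90% CI: {lower:.4f} to {upper:.4f}")
```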
Now at the 2.5% level, test the hypothesis that typing speeds have
increased after taking the course.

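H0: μ_d ≤ 0
H1: μ_d > 0
one sided test

α = 0.025

degrees of freedom = 6

$t_{6,\,0.025} = 2.447$

Calculate the test statistic:

$t = \frac{\text{estimate} - H_0\ \text{claim}}{\text{st. error}} = \frac{7 - 0}{3.2219} = 2.1726$

[Figure: t-distribution centered at μ_d = 0, with "Accept H0" to the left of t = 2.447 and "Reject H0" to the right; the sample value t = 2.1726 lies in the "Accept H0" region]

Accept the null hypothesis: at the 2.5% level, the sample does not
show that typing speeds improved during the course.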
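
A matching sketch of the paired upper-tail test, with the critical value 2.447 copied from the text as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

before = [32, 10, 65, 39, 24, 10, 24]
after = [46, 18, 58, 50, 36, 24, 21]
t_crit = 2.447    # t_{6, 0.025}, copied from the text
claim = 0         # mean difference under H0

diffs = [a - b for a, b in zip(after, before)]
se_d = stdev(diffs) / sqrt(len(diffs))

# (estimate - H0 claim) / standard error
t_stat = (mean(diffs) - claim) / se_d
reject_h0 = t_stat > t_crit   # upper-tail test

print(f"t = {t_stat:.4f}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Accept H0")
```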
Concepts:
t-distribution
F-distribution

Skills:
Construct confidence interval and perform hypothesis test for
means from small, independent samples

Perform an F-test

Construct confidence interval and perform hypothesis test for the
difference between means from small, independent samples

Construct confidence interval and perform hypothesis test for the
difference between means from small, paired samples
