
Sampling Distributions for Means and Proportions
Quantitative Methods for Economics
Dr. Katherine Sauer
Metropolitan State College of Denver
Chapter Overview:

I. Sampling Distributions: Means
II. The Central Limit Theorem
III. The Normal Distribution
IV. Sampling Distributions: Proportions
V. Desirable Properties of Estimators
We can use sample statistics to infer things about the population
parameters.

Sometimes the sample statistic (e.g. mean) will be close to the
population parameter, sometimes it will not.

Recall: Greek letters are used for the population, English letters are
used for the corresponding sample characteristic.
Let's start with some review.

Suppose our population consists of five numbers.
3, 1, 5, 6, 2

Calculate the population mean, variance, and standard deviation.

$\mu = 3.4$

$\sigma^2 = 3.44$

$\sigma = 1.8547$
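
As a quick check of the review calculation, here is a minimal Python sketch using the standard library. The variable names are illustrative only. Note that these are population formulas, so the variance divides by N rather than N - 1.

```python
from statistics import fmean, pvariance, pstdev

population = [3, 1, 5, 6, 2]

mu = fmean(population)            # population mean
sigma_sq = pvariance(population)  # population variance (divides by N, not N - 1)
sigma = pstdev(population)        # population standard deviation

print(mu, sigma_sq, round(sigma, 4))
```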
I. Sampling Distributions: Means

Suppose we want a sample of size 2. In the table, list all possible
combinations of samples of size 2.

Repeat for a sample of size 3.
Sample of 2 Sample of 3
3,1 3,1,5
3,5 3,1,6
3,6 3,1,2
3,2 3,5,6
1,5 3,5,2
1,6 3,6,2
1,2 1,5,6
5,6 1,5,2
5,2 1,6,2
6,2 5,6,2
Now, calculate each sample's mean.
Sample of 2 Mean Sample of 3 Mean
3,1 2 3,1,5 3
3,5 4 3,1,6 3.33
3,6 4.5 3,1,2 2
3,2 2.5 3,5,6 4.67
1,5 3 3,5,2 3.33
1,6 3.5 3,6,2 3.67
1,2 1.5 1,5,6 4
5,6 5.5 1,5,2 2.67
5,2 3.5 1,6,2 3
6,2 4 5,6,2 4.33
Notice that the sample means vary around the population mean:
- from 1.5 to 5.5 for samples of size n = 2
- from 2 to 4.67 for samples of size n = 3

Depending on the sample chosen, the sample mean could be a
good estimate of the population mean, or not.
Let's calculate the mean of all the sample means, $\mu_{\bar{x}}$, and the standard deviation of the sample means, $\sigma_{\bar{x}}$.

                                    Sample of size 2    Sample of size 3
mean, $\mu_{\bar{x}}$               3.4                 3.4
standard dev., $\sigma_{\bar{x}}$   1.1358              0.7572


The mean of all the sample means is the same as the population
mean.

The standard deviation of all the sample means decreases as the
sample size increases.
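
The enumeration above can be reproduced with a short sketch. It assumes sampling without replacement where order does not matter (which is what the tables list), and the function name is hypothetical.

```python
from itertools import combinations
from statistics import fmean, pstdev

population = [3, 1, 5, 6, 2]

def sampling_distribution_of_mean(pop, n):
    """Return the mean of every possible sample of size n (without replacement)."""
    return [fmean(sample) for sample in combinations(pop, n)]

for n in (2, 3):
    means = sampling_distribution_of_mean(population, n)
    # number of samples, mean of the sample means, standard deviation of the sample means
    print(n, len(means), round(fmean(means), 4), round(pstdev(means), 4))
```

Both sample sizes give 10 possible samples, and the mean of the sample means matches the population mean in each case.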
The standard deviation of all the sample means is called the
standard error of the mean.

It can be calculated directly from the samples (as we just did) or
by using the formula when the population standard deviation is
known:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

$\sqrt{\frac{N-n}{N-1}}$ is the finite population correction factor.

When N is large relative to n, this factor is approximately 1.
- if the sample size is less than 5% of the (finite) population size, you don't need the correction factor
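
A minimal sketch of the standard error formula with the finite population correction factor. The function name and the n = 40, N = 1000 illustration are assumptions, not part of the slides.

```python
from math import sqrt

def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the mean; applies the finite population correction when N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

sigma = sqrt(3.44)  # population standard deviation from the five-number example
print(round(standard_error_of_mean(sigma, 2, N=5), 4))
print(round(standard_error_of_mean(sigma, 3, N=5), 4))

# Hypothetical case where the sample is under 5% of the population:
# the correction factor is already close to 1
print(round(sqrt((1000 - 40) / (1000 - 1)), 4))
```

The first two values should agree with the standard deviations calculated directly from the lists of sample means.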
The difference between the population mean and its point estimate
is called the sampling error.

If point estimates are the same as the population parameters, there
is no sampling error and the standard error is zero.
A probability distribution is a list of every possible outcome with the
corresponding probability.

For our example, there are 10 possible samples of size 2. The
probability of each sample being selected is 0.10.

Let's plot the probability distribution of our sample means.

Step 1: Construct a frequency distribution table.
- 3 intervals is probably appropriate
- 1.5 to less than 3, 3 to less than 4.5, 4.5 to less than 6
Interval                  Frequency
$1.5 \le \bar{x} < 3$     3
$3 \le \bar{x} < 4.5$     5
$4.5 \le \bar{x} < 6$     2
Step 2: Calculate the relative frequencies.
relative frequency = probability of a particular sample × frequency = 0.10 × frequency
Interval                  Frequency    Relative Frequency
$1.5 \le \bar{x} < 3$     3            0.3
$3 \le \bar{x} < 4.5$     5            0.5
$4.5 \le \bar{x} < 6$     2            0.2
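
Steps 1 and 2 can be sketched as follows, assuming each interval includes its lower bound and excludes its upper bound ("1.5 to less than 3", and so on). The variable names are illustrative.

```python
from itertools import combinations
from statistics import fmean

population = [3, 1, 5, 6, 2]
means = [fmean(s) for s in combinations(population, 2)]

# Intervals are closed on the left and open on the right: lower <= mean < upper
intervals = [(1.5, 3.0), (3.0, 4.5), (4.5, 6.0)]
prob_each_sample = 1 / len(means)  # every sample is equally likely (0.10)

table = []
for lower, upper in intervals:
    frequency = sum(lower <= m < upper for m in means)
    table.append((lower, upper, frequency, round(frequency * prob_each_sample, 2)))

for row in table:
    print(row)  # (lower bound, upper bound, frequency, relative frequency)
```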
Step 3: Plot the probability histogram.
[Figure: probability histogram titled "Distribution of Sample Means". Horizontal axis: Sample Mean, with the intervals $1.5 \le \bar{x} < 3$, $3 \le \bar{x} < 4.5$ and $4.5 \le \bar{x} < 6$. Vertical axis: Relative Frequency, from 0 to 0.6.]
Notice, even for this very small population and sample size, the
probability distribution is tending toward the bell shape of the
Normal Distribution.
II. The Central Limit Theorem

The Central Limit Theorem says that the probability distribution of the sample means
- for samples of size 30 or greater
- selected from any population whose mean and variance are known
approaches a Normal distribution with mean $\mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

The distribution of sample means for sample sizes of $n \ge 30$.
In addition, the Central Limit Theorem applies for small samples from Normal populations, when the population variance is known.

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

for samples of any size from a Normal distribution with known variance.

The Central Limit Theorem allows us to calculate
- probabilities regarding sample means
- the limits that contain various percentages of sample means

(later it will also help us construct confidence intervals)
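
To see the theorem at work, here is a small simulation sketch. The exponential population (mean 1, standard deviation 1), the seed, and the number of repetitions are all assumptions chosen for illustration; any population with a finite mean and variance could be substituted.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, num_samples):
    """Draw num_samples samples of size n from an exponential population (mean 1, sd 1)."""
    return [fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

n = 30
means = simulate_sample_means(n, 5000)

print("mean of sample means:", round(fmean(means), 3))   # theory: the population mean, 1
print("sd of sample means:  ", round(pstdev(means), 3))  # theory: 1 / sqrt(30)

# Share of sample means within 1.96 standard errors of the population mean
se = 1 / n ** 0.5
within = sum(abs(m - 1) <= 1.96 * se for m in means) / len(means)
print("share within 1.96 standard errors:", round(within, 3))
```

Even though the exponential population is strongly skewed, the share of sample means within 1.96 standard errors should come out near 95%, as a Normal distribution would predict.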
III. The Normal probability distribution

It has long been recognized that large numbers of measurements,
when sorted and plotted in a histogram, tend to form a bell shape.

This bell-shaped curve is the Normal probability distribution
curve.

Formula:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

This formula traces out a bell curve, symmetrical around the mean, $\mu$.

The area under the curve sums to 1.
- true of any probability distribution
The probability that a random variable, X, has a value between x = a
and x = b is given by the area under the curve between x = a and
x = b.
$$P(a < X < b) = \int_a^b f(x)\,dx, \qquad f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
However, we actually don't need to do the integration because the
Normal curve has some special characteristics that let us find the
area from a single table.
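
In software, the cumulative distribution function plays the role of the table. The sketch below compares a crude numerical integration of the Normal formula with the CDF from Python's standard library; the function names and the midpoint-rule integration are illustrative choices, not part of the slides.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the Normal curve at x, following the formula for f(x)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def area_between(a, b, mu, sigma):
    """P(a < X < b) for a Normal variable with mean mu and standard deviation sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

# Crude numerical integration of f(x) (midpoint rule) for comparison with the CDF
mu, sigma, a, b = 0.0, 1.0, 0.0, 1.0
steps = 10_000
width = (b - a) / steps
riemann = sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(round(riemann, 4), round(area_between(a, b, mu, sigma), 4))
```

Both numbers are the area between the mean and a point one standard deviation above it.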
Special properties of the Normal distribution:

1. Total area under the curve is one. (true of any probability
distribution)

2. The curve is symmetrical about the mean.
- the area to the left of the mean is 0.5
- the area to the right of the mean is 0.5

3. The area under the curve between the mean and any point
depends on the number of standard deviations between the point
and the mean.
- the Z-score is the number of standard deviations
between the point and the mean
The area between the mean and a point which is one standard
deviation from the mean is 0.3413.
68.26% of the total area is within one standard deviation


The area between the mean and a point which is two standard
deviations from the mean is 0.4772.
95.44% of the total area is within two standard deviations


The area between the mean and a point which is three standard
deviations from the mean is 0.4986.
99.72% of the total area is within three standard deviations
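[Figure: Normal curve divided into bands one standard deviation wide. Area on each side of the mean: 0.3413 (34%) within one standard deviation, 0.135 (13.5%) between one and two, 0.0235 (2.35%) between two and three, 0.0015 (0.15%) beyond three.]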
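
These areas can be checked against the standard Normal distribution directly. This is a sketch using Python's `statistics.NormalDist`; values computed this way may differ from table-based figures in the last digit because printed tables are rounded.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

for k in (1, 2, 3):
    one_side = standard_normal.cdf(k) - 0.5  # area between the mean and k standard deviations
    # k, one-sided area, total area within k standard deviations of the mean
    print(k, round(one_side, 4), round(2 * one_side, 4))
```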
The Z-score is calculated as

$$Z = \frac{x - \mu}{\sigma}$$

Z is the number of standard deviations between the point (x) and
the mean.

Calculate Z to two decimal places.

Once you have Z, use a Normal probability distribution table to
find the area under the curve.
Here is an excerpt from the table.
Ex: Z = 1.00

Area in upper tail = 0.1587

Area between $\mu$ and $\mu + \sigma$
= 0.5 - 0.1587
= 0.3413
Example: Suppose the time it takes to process an email inquiry is
normally distributed with a mean time of 500 seconds and a
standard deviation of 10 seconds. What is the probability that a
selected email will be processed in more than 505 seconds?

Step 1: Sketch the curve and indicate relevant information.





Step 2: Calculate Z.
$Z = \frac{505 - 500}{10} = 0.5$

Step 3: Look up in table.
When Z = 0.5, the area in the upper tail is 0.3085.
The probability that an email will take more than 505 seconds to
process is 0.3085.
What if instead we wanted to know the probability that processing
an email will take less than 485 seconds?

Step 1: Sketch the curve and indicate relevant information.





Step 2: Calculate Z.
$Z = \frac{485 - 500}{10} = -1.5$

Step 3: Look up in table.
The curve is symmetrical, so the area in the lower tail for Z = -1.5 equals the area in the upper tail for Z = 1.5, which is 0.0668.

The probability that processing an email will take less than 485
seconds is 0.0668.
What if instead we wanted to know the probability that processing
an email will take between 485 and 505 seconds?

Step 1: Sketch the curve and indicate relevant information.





Step 2: Calculate Z.
$Z_1 = \frac{485 - 500}{10} = -1.5$ and $Z_2 = \frac{505 - 500}{10} = 0.5$

Step 3: Look up in the table.
For Z = 1.5, area in tail is 0.0668
For Z = 0.5, area in tail is 0.3085

Subtract both tail areas from 1: 1 - 0.0668 - 0.3085 = 0.6247
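
The three email questions can be answered without a table, as in this sketch. It assumes Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

processing_time = NormalDist(mu=500, sigma=10)  # seconds

p_more_than_505 = 1 - processing_time.cdf(505)                    # upper tail
p_less_than_485 = processing_time.cdf(485)                        # lower tail
p_between = processing_time.cdf(505) - processing_time.cdf(485)   # middle area

print(round(p_more_than_505, 4))
print(round(p_less_than_485, 4))
print(round(p_between, 4))
```

The three probabilities add up to 1, which is why the middle area can also be found by subtracting both tails from 1.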
Example: An importer of Herbs and Spices claims that the average
weight of packets of saffron is 20 grams. However, packets are
actually filled to an average weight of 19.5 grams with a standard
deviation of 1.8 grams. A random sample of 36 packets is selected.
Find the probability that the average weight is 20 grams or more.

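In this example we are dealing with a sample of size n > 30. We'll
apply the CLT and calculate the mean and standard error of the
distribution of means.

$$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

For our sample, $\mu_{\bar{x}} = \mu = 19.5$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{36}} = 0.3$.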
Step 1: Sketch the curve and indicate relevant information.







Step 2: Calculate Z (using the calculated mean and standard error).

$Z = \frac{20 - 19.5}{0.3} = 1.67$

Step 3: Look up Z in the table.
For Z = 1.67, the area in the tail is 0.0475

This is the probability that the average weight is 20 grams or more.
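
A sketch of the same calculation in Python, using the standard error in place of the population standard deviation. Because Z is not rounded to two decimal places here, the result differs slightly from the table-based figure.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 19.5, 1.8, 36

standard_error = sigma / sqrt(n)                   # 1.8 / 6 = 0.3
sample_mean_dist = NormalDist(mu, standard_error)  # distribution of sample means under the CLT

z = (20 - mu) / standard_error
p_mean_at_least_20 = 1 - sample_mean_dist.cdf(20)  # upper tail

print(round(z, 4), round(p_mean_at_least_20, 4))
```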


Instead, let's find the lower and upper limits between which 95% of all
packet weights fall.

In this case, we are dealing with the population, not the sample. Use
the population mean and standard deviation. This assumes the individual
packet weights are Normally distributed.

Step 1: Sketch the curve and indicate relevant information.






Step 2: Look up the Z that corresponds to a tail area of 0.025.

[Figure: Normal curve centered at $\mu = 19.5$; area in the tail above the upper limit = 0.025, area in the tail below the lower limit = 0.025]

When the area of the upper tail is 0.025, Z = 1.96.

Step 3: Find the upper and lower limits.
$Z \times \sigma$ = number of units from the mean

1.96 × 1.8 = 3.528 grams

19.5 + 3.528 = 23.028 grams is the upper limit
19.5 - 3.528 = 15.972 grams is the lower limit

95% of the packets of saffron are between 15.972 and 23.028
grams in weight.

Instead, let's calculate the two limits within which 95% of all
average weights fall.

Now we are dealing with the sample of n = 36.

The methodology is the same as when we use the entire population,
except we'll use the standard error of the means instead of the
standard deviation for the population.

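Step 1: Sketch the curve and indicate relevant information.
[Figure: Normal curve centered at $\mu_{\bar{x}} = \mu = 19.5$ with standard error $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{36}} = 0.3$; area in the tail above the upper limit = 0.025, area in the tail below the lower limit = 0.025]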
Step 2: Look up the Z that corresponds to a tail area of 0.025.
Z = 1.96

Step 3: Find the upper and lower limits.
$Z \times \sigma_{\bar{x}}$ = number of units from the mean

1.96 × 0.3 = 0.588 grams

19.5 + 0.588 = 20.088 grams is the upper limit
19.5 - 0.588 = 18.912 grams is the lower limit

95% of the sample average weights are between 18.912 and 20.088
grams.
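
The two sets of limits differ only in the spread used: the population standard deviation for individual packets, and the standard error for sample means. A minimal sketch follows, with a hypothetical helper name and the inverse CDF standing in for the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 19.5, 1.8, 36

def central_limits(mean, spread, coverage=0.95):
    """Lower and upper limits enclosing the central `coverage` share of a Normal distribution."""
    tail = (1 - coverage) / 2
    z = NormalDist().inv_cdf(1 - tail)  # about 1.96 for 95% coverage
    return mean - z * spread, mean + z * spread

packet_limits = central_limits(mu, sigma)          # individual packets: use sigma
mean_limits = central_limits(mu, sigma / sqrt(n))  # sample means: use the standard error

print([round(v, 3) for v in packet_limits])
print([round(v, 3) for v in mean_limits])
```

The limits for sample means are much narrower because averaging 36 packets removes most of the packet-to-packet variation.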
IV. Sampling Distributions: Proportions

A proportion is the number of elements with a given characteristic
divided by the total number of elements in the group.
ex: The proportion of people who vote in an election is
the number who vote divided by the number eligible to
vote.

Population proportion: $\pi = \frac{X}{N}$

Sample proportion: $p = \frac{x}{n}$

X and x are the number of elements with a given characteristic in the population and in the sample, respectively.

Proportions are often quoted as percentages.

The sample proportion is a point estimate of the population
proportion.
Example: Suppose we have the following population of data.
3, 1, 5, 6, 2

Calculate the population proportion of even numbers.

$\pi = \frac{2}{5} = 0.4$

Referring back to our samples of size 2 and 3, calculate the sample
proportion of even numbers.
Sample of 2    Sample Proportion    Sample of 3    Sample Proportion
3,1            0                    3,1,5          0
3,5            0                    3,1,6          1/3 = 0.33
3,6            1/2 = 0.5            3,1,2          1/3 = 0.33
3,2            1/2 = 0.5            3,5,6          1/3 = 0.33
1,5            0                    3,5,2          1/3 = 0.33
1,6            1/2 = 0.5            3,6,2          2/3 = 0.67
1,2            1/2 = 0.5            1,5,6          1/3 = 0.33
5,6            1/2 = 0.5            1,5,2          1/3 = 0.33
5,2            1/2 = 0.5            1,6,2          2/3 = 0.67
6,2            2/2 = 1              5,6,2          2/3 = 0.67
Calculate the mean of all sample proportions, $\mu_p$, for each sample size.

For samples of size 2: $\mu_p = 0.4$
For samples of size 3: $\mu_p = 0.4$

The mean of all the sample proportions is the same as the
population proportion.
The standard deviation of all the sample proportions decreases as
the sample size increases.

The standard error of all sample proportions is given by

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \sqrt{\frac{N-n}{N-1}}$$

(when N is large, we can omit the finite population correction
factor)

For samples of size 2:

$$\sigma_p = \sqrt{\frac{0.4(1-0.4)}{2}} \sqrt{\frac{5-2}{5-1}}$$
= (0.3464)(0.8660)
= 0.29998
≈ 0.3

For samples of size 3:

$$\sigma_p = \sqrt{\frac{0.4(1-0.4)}{3}} \sqrt{\frac{5-3}{5-1}}$$
= (0.2828)(0.7071)
= 0.19997
≈ 0.2
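
As a check, the sketch below lists every sample proportion of even numbers and compares the standard deviation of that list with the formula. The function and variable names are hypothetical (`pi_even` is used so that it does not clash with the constant pi).

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pstdev

population = [3, 1, 5, 6, 2]
N = len(population)
pi_even = sum(x % 2 == 0 for x in population) / N  # population proportion of even numbers

def sample_proportions(pop, n):
    """Proportion of even numbers in every possible sample of size n."""
    return [sum(x % 2 == 0 for x in s) / n for s in combinations(pop, n)]

def standard_error_of_proportion(pi, n, N):
    """Standard error formula including the finite population correction factor."""
    return sqrt(pi * (1 - pi) / n) * sqrt((N - n) / (N - 1))

for n in (2, 3):
    props = sample_proportions(population, n)
    # n, mean of sample proportions, their standard deviation, and the formula value
    print(n, round(fmean(props), 4), round(pstdev(props), 4),
          round(standard_error_of_proportion(pi_even, n, N), 4))
```

For both sample sizes the directly calculated standard deviation and the formula agree.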
The list of every possible sample proportion with its probability
is called the sampling distribution of proportions.

Let's plot the probability distribution of our proportions for the
samples of size 2.

Step 1: Construct a frequency distribution table.
- we only have 3 values for p (0, 0.5, 1)
p      Frequency
0      3
0.5    6
1      1

Step 2: Calculate the relative frequency distribution (the probability
distribution).
relative frequency = probability of a particular sample × frequency = 0.10 × frequency
p      Frequency    Relative Frequency
0      3            0.3
0.5    6            0.6
1      1            0.1

Step 3: Plot the probability histogram.
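
The frequency and relative frequency tables for the sample proportions can be built by counting, as in this sketch. Exact fractions are used for the proportions so that equal values group together; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 1, 5, 6, 2]
n = 2

samples = list(combinations(population, n))
# Exact sample proportion of even numbers in each sample
proportions = [Fraction(sum(x % 2 == 0 for x in s), n) for s in samples]

frequency = Counter(proportions)
# Each sample is equally likely, so relative frequency = frequency / number of samples
distribution = {float(p): count / len(samples) for p, count in sorted(frequency.items())}

print(distribution)
```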
Notice, even for this very small population and sample size, the
probability distribution is tending toward the bell shape of the
Normal Distribution.

For samples of size 30 or greater the distribution of
sample proportions is approximately Normal with
mean $\mu_p = \pi$ and standard deviation $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$.

$$p \sim N\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$

The distribution of sample proportions
for sample sizes of $n \ge 30$.
Example: In a certain neighborhood, it is known that 12% of
people age 16 to 24 are unemployed. If a random sample of 150
people age 16 to 24 is selected, what is the probability that the
sample contains at most 10% unemployed?


Step 1: Calculate $\mu_p$ and $\sigma_p$.
In this case, n = 150.

If 12% of the population is unemployed then $\pi = 0.12$.
$\mu_p = \pi = 0.12$

$\sigma_p = \sqrt{\frac{0.12(1-0.12)}{150}} = 0.0265$
Step 2: Sketch the curve and indicate relevant information.
[Figure: Normal curve centered at $\pi = 0.12$, with 0.10 marked below the mean]

Step 3: Calculate Z.

$Z = \frac{p - \pi}{\sigma_p} = \frac{0.10 - 0.12}{0.0265} = -0.7547 \approx -0.75$

Step 4: Look up Z in the table.
Area in tail is 0.2266.

The probability that at most 10% of the sample is unemployed is
0.2266.
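Instead, let's calculate the probability that the sample contains at
most 25 unemployed people.

Step 1: Convert the number into a proportion.

25 / 150 = 0.16667

Step 2: Calculate $\mu_p$ and $\sigma_p$.
$\mu_p = \pi = 0.12$ and $\sigma_p = 0.0265$

Step 3: Sketch the curve and indicate relevant information.
[Figure: Normal curve centered at $\pi = 0.12$ with 0.167 marked above the mean; Pr(p < 0.167)]

Step 4: Calculate Z.
$Z = \frac{0.167 - 0.12}{0.0265} = 1.7736 \approx 1.77$

Step 5: Look up Z in the table.
Area in the upper tail is 0.0384. Because 0.167 lies above the mean, the probability we want is everything except the upper tail: 1 - 0.0384 = 0.9616.

The probability that at most 25 people in the sample are
unemployed is 0.9616.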
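
Both unemployment questions can be sketched in Python as follows. The variable names are illustrative, and the inputs are not rounded, so the results differ slightly from table-based figures. Note that "at most 25 people" refers to the area below 25/150, which lies above the mean, so that probability is the whole curve except the upper tail.

```python
from math import sqrt
from statistics import NormalDist

pi_unemployed = 0.12  # population proportion
n = 150

sigma_p = sqrt(pi_unemployed * (1 - pi_unemployed) / n)  # standard error of the proportion
sample_proportion = NormalDist(pi_unemployed, sigma_p)   # Normal approximation for p

p_at_most_10_percent = sample_proportion.cdf(0.10)   # lower tail, below the mean
p_at_most_25_people = sample_proportion.cdf(25 / n)  # everything below 25/150

print(round(sigma_p, 4))
print(round(p_at_most_10_percent, 4))
print(round(p_at_most_25_people, 4))
```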
Many times the value of the population proportion is unknown.
We can approximate the mean and standard error of the
proportions by using the sample proportion:

$\mu_p \approx p$

$s_p = \sqrt{\frac{p(1-p)}{n}}$
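
A minimal sketch of the estimated standard error, for the case where only sample data is available. The function name and the sample counts (18 out of 150) are hypothetical.

```python
from math import sqrt

def estimated_standard_error(x, n):
    """Estimate the standard error of a proportion from one sample: x elements out of n."""
    p = x / n  # sample proportion, used in place of the unknown population proportion
    return sqrt(p * (1 - p) / n)

# Hypothetical sample: 18 unemployed people out of 150 surveyed
print(round(estimated_standard_error(18, 150), 4))
```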
V. Some desirable properties of estimators

1. Estimators should be unbiased (accurate).

An estimator is unbiased if the average value of all the point
estimates is equal to the population parameter being estimated.

To prove that $\bar{x}$ is an unbiased estimator of $\mu$ we would need to
show that the expected value of the sample mean is equal to the
population mean.
$E(\bar{x}) = \mu$

2. Estimators should have minimum variance (precise).

The values of sample statistics vary around the population
parameter. It is desirable to keep this variance at a minimum.

An estimator is precise when the values of the estimates are
close to one another.

Concepts:
- Central Limit Theorem
- Normal distribution
- desirable properties of estimators

Skills:
For both means and proportions:
- calculate the mean of all the sample means (proportions) and the
standard deviation of the sample means (proportions)

- construct a probability distribution table

- calculate the probability of an event
