
Author(s): Brenda Gunderson, Ph.D., 2011

License: Unless otherwise noted, this material is made available under the
terms of the Creative Commons Attribution-Noncommercial-Share
Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/
We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your
ability to use, share, and adapt it. The citation key on the following slide provides information about how you
may share and adapt this material.

Copyright holders of content included in this material should contact open.michigan@umich.edu with any
questions, corrections, or clarification regarding the use of content.

For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis
or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please
speak to your physician if you have questions about your medical condition.

Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.
Some material may be sourced from:
Mind on Statistics
Utts/Heckard, 3rd Edition, Duxbury, 2001
Text Only: ISBN 0567118111
Bundled version: ISBN 1111689301
Material from this publication used with permission.
Attribution Key
For more information see: http://open.umich.edu/wiki/AttributionPolicy

Use + Share + Adapt
{ Content the copyright holder, author, or law permits you to use, share and adapt. }
Creative Commons Attribution License
Creative Commons Attribution Share Alike License
Creative Commons Attribution Noncommercial License
Creative Commons Attribution Noncommercial Share Alike License
GNU Free Documentation License
Creative Commons Zero Waiver
Public Domain - Expired: Works that are no longer protected due to an expired copyright term.
Public Domain - Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain - Self Dedicated: Works that a copyright holder has dedicated to the public domain.

Make Your Own Assessment
{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
Public Domain - Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ
{ Content Open.Michigan has used under a Fair Use determination. }
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ
Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair.
To use this content you should do your own independent analysis to determine whether or not your use will be Fair.
Module 3: Sampling Distributions and the CLT
Objectives: The objective of this module is to give you a hands-on discussion and
understanding of the Central Limit Theorem (CLT), a theorem that plays an
important role in statistics. The sampling distribution of a statistic can be obtained
mathematically, but we will simulate the sampling process and will observe the
empirical sampling distribution of various statistics.
In this module you will simulate random samples from a known population
distribution and compute a sample statistic for each of the generated samples. The
generated sample statistics can be examined to learn about properties of the
sampling distribution of the statistic.
Overview: Statistical inference is the process of drawing conclusions about a
population parameter based on data. When a sample is selected from a population,
a summary number can be computed from the observations, resulting in the value of
a statistic. A statistic is used to estimate the corresponding value for a population
(that is, a sample statistic estimates a population parameter). However, a sample
chosen at random will not necessarily yield an estimate (a value of a statistic) that
is exactly equal to the corresponding parameter for the population. The next
selected sample of the same size will probably give a different estimate from the
first one. If additional samples of the same size were taken, you would begin to see
how the possible estimates (possible values of the statistic) vary and how close they
tend to be to the parameter value.
With a large number of samples, you can assess whether the value of the statistic
(e.g., sample mean $\bar{x}$) will frequently be close to the true value of the population
parameter (e.g., population mean $\mu$), and if so, how close on average. This can be
seen more easily through some pictures:
[Figure: three dot plots titled One Random Sample, Five Random Samples, and Twenty Random Samples, each showing the X values relative to the True Population Parameter.]
Note: Each X represents one statistic value (one estimate) computed from one sample.
When data are gathered by random sampling, the statistic will be a random
variable and as such it will have a probability distribution. The probability
distribution of the sample statistic is called its sampling distribution.
Generally speaking, if we use a statistic to make an inference about a population
parameter, we want its sampling distribution to be centered at the true parameter
(a characteristic which allows us to call that statistic unbiased), and we would like
variability in the estimates to be as small as possible.
Below we have two estimators that are both unbiased, but Estimator I has less
variability (is more precise). Thus, we would prefer Estimator I over Estimator II.
[Figure: dot plots of estimates for Estimator I and Estimator II, each showing the X values relative to the True Population Parameter.]
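The Estimator I / Estimator II comparison can be reproduced with a short simulation. This is a minimal Python sketch under stated assumptions: the population is taken to be Normal with mean 16 and standard deviation 5 (the applet's default parent), and the sample mean and sample median serve as a hypothetical pair of estimators of the center. For a symmetric population both are centered at the true center, but the median typically varies more, so here it plays the role of Estimator II. The function name `compare_estimators` is illustrative only.

```python
import random
import statistics


def compare_estimators(mu=16.0, sigma=5.0, n=5, reps=10_000, seed=1):
    """Simulate the sampling distributions of two estimators of the center
    (sample mean and sample median) for samples of size n from Normal(mu, sigma)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # For each estimator: (center of its sampling distribution, its spread)
    return {
        "mean": (statistics.fmean(means), statistics.stdev(means)),
        "median": (statistics.fmean(medians), statistics.stdev(medians)),
    }


for name, (center, spread) in compare_estimators().items():
    print(f"{name:>6}: center = {center:.2f}, sd = {spread:.2f}")
```

Both centers should land near 16 (both estimators are unbiased here), while the median's sd should come out noticeably larger, which is why the sample mean would be preferred in this setting.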
We will next examine the sampling distribution of the sample statistic most
commonly used for measuring the center of a distribution: the sample mean.
Formula card:
Activity: How do Sample Size and the Distribution
of the Parent Population affect the Sampling
Distribution of the Sample Mean?
In this activity you will observe the effects that sample size and the distribution of
the population you are sampling from have on the sampling distribution of the
sample mean. The sampling distribution of the sample mean, $\bar{x}$, is the distribution
of the sample mean values for all possible samples of the same size from the same
population.
For this activity open the sampling distribution applet (the original applet can be
found at http://onlinestatbook.com/stat_sim/sampling_dist/index.html). This applet
will help you simulate sampling distributions for a variety of statistics, allowing you
to vary the sample size and the population from which the samples are taken.
Read the Instructions.
Press "Begin" and the Sampling Distribution Applet will open;
you will see the screen at the right.
Notice that when the applet begins, a histogram of the
normal distribution with mean 16 and standard deviation 5 is
displayed for the default "parent distribution".

The Sampling Distribution Applet has several options you can choose from:
The 1st histogram, the Parent Population histogram, is the population
from which the sample will be drawn. You can select from Normal, Uniform,
Right Skewed or even customize the distribution by selecting Custom and
dragging the mouse over the plot of the parent distribution. For now, keep
the default N(16, 5) distribution as the parent population.
The 2nd histogram, the Sample Data plot, displays a histogram of the
sampled data. This histogram is initially blank. The 3rd and 4th histograms
show the distribution of statistics computed from the sampled data. The
number of samples (replications) that the 3rd and 4th histograms are based on
is indicated by the label "Reps=", which will be displayed once the sample is
simulated.
Select the Mean as the statistic in the 3rd histogram with a sample size
of 5 (default), then click on Animated sample, and one sample of size n = 5
will be drawn from the normal parent population (note that the applet labels the
sample size N, whereas we generally use n). You will see the five observations
appear in the 2nd histogram; the sample mean of the five numbers will appear
in the 3rd histogram as a blue square. This graphically shows the process of
getting the sample mean from one sample of size 5. Repeat this several
times and you will see how the "sampling distribution" of the sample mean
starts to form in the third histogram. Once you have a feeling for how this works
you can speed things up by taking 5, 1,000 or 10,000 samples at one time.
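The applet's procedure (draw one sample of size n, record its mean, repeat) can be sketched in a few lines of Python. This is an illustrative sketch, not the applet's own code: `normal_parent` and `sample_means` are hypothetical helper names, and the parent is assumed to be the default Normal distribution with mean 16 and standard deviation 5.

```python
import random
import statistics


def normal_parent(rng):
    """One observation from the assumed parent: Normal(mean 16, sd 5)."""
    return rng.gauss(16, 5)


def sample_means(draw, n, reps, seed=0):
    """Mimic the applet: draw a sample of size n, record its mean, repeat `reps` times."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]


one = sample_means(normal_parent, n=5, reps=1)         # like one "Animated sample"
many = sample_means(normal_parent, n=5, reps=10_000)   # like taking 10,000 samples at once
print(one)
print(round(statistics.fmean(many), 2), round(statistics.stdev(many), 2))
```

The list `many` plays the role of the 3rd histogram: its values should center near 16 and be less spread out than the parent population.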
Although we will focus primarily on the sampling distribution of the
sample mean, you do have the option to simulate the sampling distribution of
any of the following statistics:
Mean; Median; sd = Standard deviation (N is used in the denominator);
Variance = Variance of the sample (N is used in the denominator);
Variance (U) = Unbiased estimate of variance (N-1 is used in denominator);
MAD = Mean absolute value of the deviation from the mean; Range
When you are done with a particular simulation, you can click on the Clear lower 3
button to clear histograms 2, 3 and 4 and select new settings for your next
simulation.
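The statistics in this list differ mainly in their denominators. The Python sketch below computes each one for a single sample exactly as defined above (N in the denominator for sd and Variance, N-1 for Variance (U)); the function name `applet_statistics` and the example sample are hypothetical.

```python
import statistics


def applet_statistics(sample):
    """Compute the statistics listed for the applet's 3rd/4th histograms,
    using the denominators stated in the list above."""
    n = len(sample)
    mean = sum(sample) / n
    deviations = [x - mean for x in sample]
    squared = sum(d * d for d in deviations)
    variance_n = squared / n          # "Variance": N in the denominator
    variance_u = squared / (n - 1)    # "Variance (U)": N - 1 in the denominator
    return {
        "Mean": mean,
        "Median": statistics.median(sample),
        "sd": variance_n ** 0.5,      # standard deviation with N in the denominator
        "Variance": variance_n,
        "Variance (U)": variance_u,
        "MAD": sum(abs(d) for d in deviations) / n,  # mean absolute deviation from the mean
        "Range": max(sample) - min(sample),
    }


print(applet_statistics([12, 15, 16, 18, 24]))
```

For this sample the mean is 17, the squared deviations sum to 80, so Variance = 80/5 = 16 (sd = 4) while Variance (U) = 80/4 = 20.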
Tasks: For the following tasks always select Mean (sample mean) as the statistic
of interest in the 3rd histogram (and leave the 4th histogram with none).
1. Select the Normal distribution as a parent population.
a. What are the mean and standard deviation of this population?
Mean = 16.00, sd = 5.00
b. Select a sample size n = 5 for the mean as the statistic of interest. Do
about 5 animated samples and then take 10,000 samples at once.
Draw a picture of the distribution of the sample means. Make sure to label
both axes.
How does the distribution of the sample mean (3rd histogram) compare
with the parent population (e.g., shape, mean, standard deviation)?
The distribution of the resulting sample mean values follows approximately a
normal shape that is centered around the original population mean value of
16, but the spread of the sample mean values is smaller than the spread of
the values in the original population; that is, the sample mean values have a
smaller standard deviation.
c. Clear the lower three graphs and change the sample size to n = 25.
Again, do about 5 animated samples and then take 10,000 samples at
once.
Draw a picture of the distribution of the sample means.
Comment on the changes observed in the 3rd histogram here as
compared to the 3rd histogram generated in part 1(b).
The distribution of the resulting sample mean values again follows a normal
shape that is centered around the original population mean value of 16, but
the sample means seem to be more concentrated (less varied) around the
population mean of 16.
d. What can you say about the relationship between the standard deviation
of the sample mean and the population standard deviation?
The standard deviation of the sample mean is smaller than the population
standard deviation.
e. What can you say about the relationship between the sample size and the
standard deviation of the sample mean?
The standard deviation of the sample mean becomes smaller as the sample
size increases.
f. Does the number of samples (replications) influence the shape of the
sampling distribution? (Note: the number of samples is not the sample
size.) For example, is the shape of the sampling distribution when Rep =
10,000 significantly different from the shape of the sampling distribution
when Rep = 100,000?
No, only the sample size n and the shape of the parent population will
influence the shape of the sampling distribution. More replications only give a
smoother, more accurate picture of that same distribution.
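Parts 1(d) through 1(f) can be checked numerically. In this Python sketch (assuming, as above, a Normal parent with mean 16 and sd 5; `sd_of_sample_means` is a hypothetical helper), the simulated standard deviation of the sample means is compared with $\sigma/\sqrt{n}$ for n = 5 and n = 25.

```python
import math
import random
import statistics


def sd_of_sample_means(n, reps=10_000, mu=16.0, sigma=5.0, seed=2):
    """Empirical sd of `reps` sample means, each computed from n Normal(mu, sigma) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(means)


for n in (5, 25):
    print(f"n = {n:2d}: simulated sd = {sd_of_sample_means(n):.3f}, "
          f"sigma/sqrt(n) = {5.0 / math.sqrt(n):.3f}")
```

Changing `reps` (the number of replications) only makes the simulated value a more precise estimate of the same target $\sigma/\sqrt{n}$; it is the sample size `n` that changes the target itself.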
2. Clear the lower three graphs and then select the skewed distribution as a
parent population.
a. Select a sample size n = 5 for the mean as the statistic of interest. Do a
few animated samples and then take 10,000 samples at once. Draw a picture
of the distribution of the sample means.
b. How does the distribution of the sample mean (3rd histogram) compare to
the distribution of the sample mean in part 1(b) (when the parent population
was normal)?
When the parent population was normal, the distribution of the sample mean
looked more like a normal distribution (more symmetric and bell shaped) than
this histogram of sample means.
c. How does the distribution of the sample mean (3rd histogram) compare
with the parent population (e.g., shape, mean, standard deviation)?
The distribution of the sample mean has a somewhat symmetric shape, with a
mean close to the population mean, and a standard deviation smaller than
that of the population.
d. Change the sample size to n = 25. Do a few animated samples and then take
10,000 samples at once. Draw a picture of the distribution of the sample means.
Comment on the changes observed in the 3rd histogram as compared to the
3rd histogram generated in part 2(a).
The sample means seem to be more concentrated around the value of the
population mean and the shape of the distribution is somewhat normal looking.
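One way to see the shape becoming "somewhat normal looking" is to track skewness (the average cubed z-score, which is 0 for a symmetric shape). The applet's Right Skewed population is not given numerically here, so this Python sketch assumes a stand-in: an exponential population with mean 10, whose skewness is 2. For means of n independent exponential observations the theoretical skewness is $2/\sqrt{n}$, so the printed values should fall from about 2 (n = 1, the parent itself) toward about 0.4 (n = 25). The helper names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def means_from_skewed_parent(n, reps=10_000, seed=3):
    """Sample means of size-n samples from an assumed right-skewed parent:
    an exponential population with mean 10."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1 / 10) for _ in range(n)])
            for _ in range(reps)]


for n in (1, 5, 25):
    print(f"n = {n:2d}: skewness of the sample means = {skewness(means_from_skewed_parent(n)):.2f}")
```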
e. What should be the value of the standard deviation of the sample mean if
the population standard deviation is 6.22 and the sample size is n = 25? How
does the standard deviation in histogram 3 from part 2(c) compare to this
value?
The standard deviation of the sample mean will be equal to $6.22/\sqrt{25} = 1.24$. The
standard deviation observed in 2(c) is larger because that histogram is based on a
smaller sample size (n = 5), and $1/\sqrt{n}$ is larger when n is smaller (for n = 5,
$6.22/\sqrt{5} \approx 2.78$).
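The arithmetic in part 2(e) is just $\sigma/\sqrt{n}$. A tiny Python helper (the name `sd_of_sample_mean` is hypothetical) makes the comparison between the two sample sizes explicit, using the population standard deviation of 6.22 stated above.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


print(round(sd_of_sample_mean(6.22, 25), 2))  # skewed parent, n = 25 -> 1.24
print(round(sd_of_sample_mean(6.22, 5), 2))   # same parent, n = 5  -> 2.78
```

Quadrupling the sample size halves the standard deviation of the sample mean, since $\sqrt{4n} = 2\sqrt{n}$.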
3. Clear the lower three graphs, then select the custom distribution as a
parent population. The parent population plot should be empty. To "draw" a
distribution, you will need to use the mouse. Click and drag on different parts of
the parent population graph until you have drawn a distribution that you like.
a. Sketch your custom population.
This will vary by student. Encourage students to create a unique distribution.
b. Select a sample size n = 5 for the mean as the statistic of interest. Do a
few animated samples and then take 10,000 samples at once. How does the
distribution of the sample mean (3rd histogram) compare with the parent
population (e.g., shape, mean, standard deviation)?
The distribution of the sample mean has a somewhat symmetric shape, with a
mean close to the population mean, and a standard deviation smaller than
that of the population.
c. Change the sample size to n = 25. Do a few animated samples and
then take 10,000 samples at once. Comment on the changes observed in the
3rd histogram here as compared to the 3rd histogram generated in part 3(b).
What can you say about the shape of the distribution of the sample mean
with respect to the sample size n?
The sample means seem more concentrated around the value of the population
mean, with a distribution that does look approximately normal. The larger the sample
size n, the narrower the distribution of the sample mean.
d. What should be the standard deviation of the sample mean for samples of
size n = 25 from your custom population? (Show your calculation.) How does
the standard deviation of the values in histogram 3 from part 3(c) compare to
it?
According to the central limit theorem, the standard deviation of the sample
mean should be equal to $\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation. In
this particular "custom" distribution $\sigma = 6.26$, thus the standard deviation of the
sample mean is $6.26/\sqrt{25} = 1.25$. We can see from the 3rd histogram that the standard
deviation of this empirically generated sampling distribution of the sample mean
is 1.26, which is quite close to the expected 1.25.
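The same check can be run on any custom population. Because each student draws a different one, this Python sketch uses a hypothetical hand-drawn population stored as a list of values (`custom_population` and `check_clt_sd` are invented for illustration). It computes $\sigma$ with N in the denominator, then draws samples of size 25 with replacement and compares the empirical sd of the sample means with $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

# Hypothetical "custom" parent: a lopsided, two-humped population given as values.
custom_population = [2] * 30 + [5] * 10 + [20] * 25 + [24] * 35


def check_clt_sd(population, n=25, reps=10_000, seed=4):
    sigma = statistics.pstdev(population)      # population sd (N in the denominator)
    expected = sigma / math.sqrt(n)            # theoretical sd of the sample mean
    rng = random.Random(seed)
    # Sampling with replacement = independent draws from the custom distribution
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]
    return sigma, expected, statistics.stdev(means)


sigma, expected, empirical = check_clt_sd(custom_population)
print(f"sigma = {sigma:.2f}, sigma/sqrt(n) = {expected:.2f}, empirical = {empirical:.2f}")
```

As with the 1.25 versus 1.26 comparison above, the empirical value should agree with $\sigma/\sqrt{n}$ to within simulation noise.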
4. Fill in the blanks to summarize your findings in Exercises 1, 2, and 3:
a. If the parent population is a normal distribution with a mean $\mu$ and a
standard deviation $\sigma$, then for any sample size (small or large), the sample
mean will have a __normal__ distribution with a mean of __$\mu$__ and a
standard deviation of __$\sigma/\sqrt{n}$__.
b. If the parent population is NOT a normal distribution but has a
mean $\mu$ and a standard deviation $\sigma$, then for a large sample size, the
sample mean will have approximately a __normal__
distribution with a mean of __$\mu$__ and a standard deviation of __$\sigma/\sqrt{n}$__.
The result in 4a is known as the Sampling Distribution of the Sample Mean.
The result in 4b is known as the Central Limit Theorem. You should note that
there are several similarities between them. However, make sure you can see
and understand the difference between the two results: 4a holds exactly for any
sample size but requires a normal parent population, while 4b applies to any parent
population but gives only an approximation that requires a large sample size.
Fill out the chart below to further summarize your findings regarding the
sampling distribution of the sample mean based on the CLT.
Will the Sampling Distribution of the Sample Mean be approximately Normal?
n = 10, Parent Population Normal: Yes
n = 10, Parent Population NOT Normal: No
n = 50, Parent Population Normal: Yes
n = B0, Parent Population NOT Normal: Yes
Check Your Understanding:
A researcher interested in the environmental impact of contaminants in soil has
collected a sample of 100 tree saplings of a certain species. Ten years ago, the
average height of all such tree saplings was
60 inches with a standard deviation of 4 inches. Let X denote the height of a tree
sapling.
a. The sample mean for the 100 tree saplings was 56. Fill in appropriate notation:
__$\bar{x}$__ = 56.
b. Provide the expected value, standard deviation, and approximate distribution of
the sample mean height of tree saplings assuming the values from ten years ago
are treated as population parameters.
Approximately Normal with expected value 60 inches and standard deviation 0.4 inches, that is, N(60, 0.4).
Note that 0.4 comes from $4/\sqrt{100}$.
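The answer to part (b) can be reproduced with Python's standard `statistics.NormalDist`. As an extra illustration beyond what part (b) asks (an assumption-based extension, not part of the original answer), the sketch also expresses the observed mean of 56 as a z-score under that distribution: if the ten-year-old values still held, 56 would lie 10 standard deviations below 60.

```python
import math
from statistics import NormalDist

mu, sigma, n = 60, 4, 100           # ten-year-old values, treated as population parameters
se = sigma / math.sqrt(n)           # sd of the sample mean: 4 / sqrt(100) = 0.4
sampling_dist = NormalDist(mu, se)  # approximate sampling distribution of x-bar: N(60, 0.4)

x_bar = 56                          # observed sample mean from part (a)
z = (x_bar - mu) / se               # how many standard deviations 56 lies from 60
print(f"N({mu}, {se:.1f}); z = {z:.1f}; P(x-bar <= 56) = {sampling_dist.cdf(x_bar):.3g}")
```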
c. Draw a detailed sketch of the sampling distribution of the sample mean height of
tree saplings. Make sure to include your labels.
This will be approximately normal with the x-axis labeled $\bar{x}$ (x-bar) or "sample
mean values". The distribution should be centered at a mean of 60 with a
standard deviation of 0.4.
[Sketch: normal curve labeled N(60, 0.4); vertical axis "density", horizontal axis $\bar{x}$.]
Example Exam Question on Sampling Distribution of the
Sample Mean
For a particular community it is known that the mean amount of water used per
home during October is 1250 gallons and the standard deviation is 325 gallons.
a. The distribution for amount of water used is skewed to the right. Sketch a
skewed right distribution below and label both axes.
b. For a promotional campaign a radio station plans to randomly select 50 homes
and pay their water bills for the month of October. Describe the approximate
sampling distribution of the sample mean amount of water used for a
random sample of 50 homes.
Provide all features of the distribution.
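The sample mean will have approximately a NORMAL distribution (by the Central Limit
Theorem, since n = 50 is large even though the population is skewed) with a mean of
1250 gallons and a standard deviation of $325/\sqrt{50} \approx 45.962$ gallons.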
c. The radio station can afford to pay for a total of 67,000 gallons. What is the
probability that the total number of gallons for a random sample of 50 homes
will exceed 67,000 gallons? (Hint: think about how a total and an average are
related.)

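$$
\begin{aligned}
P(\text{TOTAL} > 67000) &= P\left(\bar{X} > \frac{67000}{50}\right) = P(\bar{X} > 1340) \\
&= P\left(Z > \frac{1340 - 1250}{45.962}\right) = P(Z > 1.96) = 0.025
\end{aligned}
$$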
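The steps of part (c) translate directly into a short Python check using the standard library's `statistics.NormalDist` (the variable names are illustrative). It converts the total into a mean, standardizes with the sd from part (b), and takes the upper-tail probability.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1250, 325, 50
budget_total = 67_000

se = sigma / math.sqrt(n)          # sd of the sample mean, about 45.962 gallons
cutoff_mean = budget_total / n     # TOTAL > 67,000 exactly when the mean > 1340
z = (cutoff_mean - mu) / se        # about 1.96
p = 1 - NormalDist().cdf(z)        # upper-tail probability, about 0.025
print(round(se, 3), cutoff_mean, round(z, 2), round(p, 3))
```

Working with the mean rather than the total works because dividing both sides of TOTAL > 67,000 by n = 50 does not change the event.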
[Sketch for part (a): right-skewed curve; vertical axis "density", horizontal axis "Amount of water (gallons)".]