
The Sampling Distribution

Introduction to Hypothesis Testing and Interval Estimation

Outline
Distinctions
Sampling Distribution
The Central Limit Theorem
Confidence Intervals

Random Sampling

Key things to keep in mind


Population: what we want to talk about
Sample: what we have with our data
Sampling distribution: the means by which we will go from our sample to the population

Sampling Distribution

Sampling distributions concern any statistic we can come up with. Examples:

Measures of Central Tendency


Measures of Variability
Measures of Relationship
Ratios

Sample != sampling distribution


Recall also that sampling distributions can be
theoretical (used in most studies) or empirical
(seeing wider use via bootstrapping).
It is the properties of the sampling distribution
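
As a rough illustration of an empirical sampling distribution, the sketch below bootstraps the mean: it resamples the observed data with replacement many times and looks at how the resampled means vary. The scores and the function name `bootstrap_means` are made up for illustration and are not from the course data.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return the means of resamples drawn with replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

# Hypothetical sample of 10 scores (invented for this sketch).
sample = [72, 85, 91, 64, 78, 88, 95, 70, 81, 76]
boot = bootstrap_means(sample)

# The spread of the resampled means is an empirical estimate of the
# standard error; compare it with the theoretical estimate s / sqrt(n).
boot_se = statistics.stdev(boot)
formula_se = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE = {boot_se:.2f}, s/sqrt(n) = {formula_se:.2f}")
```

The two estimates should land close to each other, which is the sense in which the empirical and theoretical sampling distributions describe the same thing.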

Central Limit Theorem (CLT)

Suppose X is

random
with mean $\mu$
with standard deviation $\sigma$
not necessarily normal

Terms Concerning Sampling Distribution of the Mean

Standard Error of the mean: $\sigma_{\bar{X}}$

Is just the standard deviation of the sampling distribution.

i.e. it is a particular standard deviation

Sampling error

The sample cannot be fully representative of the population
As such, there is variability due to chance
We could have a thousand sample means and
none of them equal exactly the population
mean. However

CLT (continued)

Properties of the sampling distribution of the mean

random
has a mean of $\mu$
has a standard error of $\sigma / \sqrt{n}$
Distributed approximately normal for large samples
Normal for all samples if the variable X is normal

The Central Limit Theorem

For any population of scores, regardless of form, the sampling distribution of the mean will approach a normal distribution as the sample size (N) gets larger.

This of course raises the question of what is large enough

Furthermore, the sampling distribution of the mean will have a mean equal to $\mu$ (the population mean), and a standard deviation equal to $\sigma / \sqrt{N}$
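
The theorem can be checked informally by simulation. The sketch below assumes an exponential population (chosen only because it is strongly skewed, with mean 1 and standard deviation 1), draws many samples of size n, and compares the mean and standard deviation of the sample means with $\mu$ and $\sigma / \sqrt{n}$. The function name `sample_means` and all settings are illustrative assumptions.

```python
import random
import statistics

def sample_means(n, n_samples=5000, seed=1):
    """Means of `n_samples` random samples of size `n` from an exponential population."""
    rng = random.Random(seed)
    # expovariate(1.0) has population mean 1 and standard deviation 1.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

for n in (2, 10, 50):
    means = sample_means(n)
    print(
        f"n={n:>2}: mean of means = {statistics.fmean(means):.3f}, "
        f"SD of means = {statistics.stdev(means):.3f}, "
        f"sigma/sqrt(n) = {1 / n ** 0.5:.3f}"
    )
```

Plotting a histogram of `means` for each n would also show the skew fading as n grows, which is one way to get a feel for what "large enough" means for a given population.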

Central Limit Theorem

With the mean, we can use sample data and the normal curve to reach conclusions about the population of interest
We of course desire large, random samples in order to do so

Non-random selection can result in under-selection or over-selection of subsections of the population.

e.g. carry out a telephone opinion poll

http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html

In summary: sample means

are random
are normally distributed for large sample sizes
distribution has mean $\mu$
distribution has standard error (standard deviation) $\sigma / \sqrt{n}$

Confidence intervals
Draw a sample, which gives us a mean
$\bar{X}$ is our best guess at $\mu$
For most samples $\bar{X}$ will be close to $\mu$
$\bar{X}$ is a point estimate
However, we can also give a range or
interval estimate that takes into account
the uncertainty involved in that estimate

Using the normal distribution

Confidence interval equation

Limits = $\bar{X} \pm z(\sigma_{\bar{X}})$
Where
$\bar{X}$ = sample mean
z = z value from normal curve
$\sigma_{\bar{X}}$ = standard error of the mean

95% confidence interval


Let's say we want a 95% confidence interval.
Obtain the critical z-score for p = .025

2.5% above +z, and 2.5% below -z

p = .025 then z = 1.96


When the population standard deviation is not known, we use the t critical value instead

Limits = $\bar{X} \pm t(s_{\bar{X}})$
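
The critical z value quoted above can be reproduced from the standard normal curve. This is a minimal sketch using Python's standard library; the helper name `z_critical` is an assumption made for illustration. The standard library has no t distribution, so t critical values still need a table or a statistics package.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z: leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.2f}")
```

For 95% confidence the tail area is .025 on each side, and the function returns the familiar 1.96.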

Confidence interval example


Randomly selected a group of 10 of you
folks with a mean score of 89 (s = 6) on
the midterm.
What guess can we make as to the true
mean of the class?

$89 \pm 2.26 \cdot \frac{6}{\sqrt{10}}$

$89 \pm 2.26(1.90)$

$(89 - 4.294) < \mu < (89 + 4.294)$

$84.71 < \mu < 93.29$

This seems pretty wide; it essentially covers a full letter grade. Why do you think that is?
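
The arithmetic of this example can be reproduced in a few lines. The sketch below is illustrative only: the helper name `t_interval` is hypothetical, and the critical value 2.262 (the t-table entry for 95% confidence with 9 degrees of freedom, shown as 2.26 in the example) is entered by hand rather than computed.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Confidence limits: mean +/- t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)      # estimated standard error of the mean
    margin = t_crit * se
    return mean - margin, mean + margin

# Assumed table value: t critical for 95% confidence, df = n - 1 = 9.
T_CRIT_DF9 = 2.262

lower, upper = t_interval(mean=89, s=6, n=10, t_crit=T_CRIT_DF9)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Changing `n` or `s` in the call shows how the width of the interval responds to sample size and variability.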

Important: what a confidence interval means

A 95% confidence interval means that:

95% of the confidence intervals calculated on repeated sampling of the same population will contain $\mu$
Note that the population value does not vary, i.e. it's not a 95% chance that it falls in that specific interval
In other words, the CI attempts to capture the true population mean, but we would have a different interval estimate for each sample drawn

http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html
In R
library(animation)
conf.int(.95)
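
The repeated-sampling interpretation can also be checked numerically. The sketch below is written in Python rather than R, and assumes a normal population with a known standard deviation; `mu`, `sigma`, `n` and the function name `coverage` are arbitrary illustrative choices. It builds a z-based interval from each of many samples and counts how often the interval contains the true mean.

```python
import random
import statistics
from statistics import NormalDist

def coverage(mu=100.0, sigma=15.0, n=25, n_samples=2000, confidence=0.95, seed=2):
    """Fraction of z-based intervals (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # mu is fixed; only the interval moves from sample to sample.
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / n_samples

print(f"share of intervals containing mu: {coverage():.3f}")
```

The share should come out near 0.95. Each individual interval either contains the population mean or it does not; the 95% describes the procedure over many samples.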

Question to think about

How does one know if the confidence interval calculated actually contains the true population mean?
