
AQA Statistics 1 Estimation

1 of 5 27/02/13 MEI
Section 1: Confidence intervals

Notes and Examples

These notes contain subsections on
Estimating the mean and variance of a population
The distribution of sample means
The Central Limit Theorem
Confidence intervals for samples from a Normal distribution
Confidence intervals from a large sample
The meaning of a confidence interval


Estimating the mean and variance of a population

It is important that you understand the difference between a population and a
sample.

A population includes every data value that fits the definition of the
population. (Note the difference between the use of this word in Statistics and
in English: in Statistics the population is a set of data, not a set of individuals.
In English we might talk about the population of the United Kingdom, but in
Statistics we would talk about a set of measurements or values, such as the
population of heights of people living in the United Kingdom.)

A sample is a number of data values from within a population, which can be
used to find out more about a population in situations where it is impractical to
study the whole population (perhaps because it is too large).

Samples can be chosen in a number of ways. In the work that you do here,
you will assume that all samples are random (i.e. the individuals in the sample
were chosen randomly from the population). In practice, samples are not
always random, sometimes because of practical difficulties, or because it
might be desirable to use a sample constructed in a different way (e.g. in a
survey involving people, you might wish to use an equal number of men and
women).

If you want to know the mean and variance of a population, and it is not
possible to collect data for the whole population, then you can use data from a
sample to estimate the mean and variance.

To estimate the mean of a population, use the mean $\bar{x}$ of a sample: it
gives an unbiased estimate for the mean of the population. This means that if
you took all possible samples of a particular size from the population, and
calculated the mean for each sample, then the mean value of all these means
would be equal to the population mean.

To estimate the variance of a population, the sample variance gives an
unbiased estimate for the population variance. Remember from your work on
variance in the earlier chapter on Numerical measures, that the formula for
the sample variance is different from the formula for the population variance.

The population variance is given by $\sigma^2 = \frac{\sum (x - \mu)^2}{n}$.

The sample variance is given by $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.
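
As a rough illustration of the two formulas, the sketch below applies both divisors to the same small data set and checks the results against Python's standard `statistics` module. The data values are invented for this example and do not come from the notes.

```python
import statistics

# Hypothetical data values, invented purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

# Divisor n: use when the data are the whole population.
population_variance = sum_sq_dev / n

# Divisor n - 1: use when the data are a sample from a larger population.
sample_variance = sum_sq_dev / (n - 1)

print("divisor n:    ", population_variance, statistics.pvariance(data))
print("divisor n - 1:", sample_variance, statistics.variance(data))
```

`statistics.pvariance` uses the divisor n and `statistics.variance` uses the divisor n - 1, so each should agree with the corresponding hand calculation.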

With your present knowledge it is difficult to understand fully why the
divisor $n - 1$ gives an unbiased estimator for the population variance.
One reason, however, can be given at this level. The variance is calculated
from the deviations from the mean, $x_i - \bar{x}$. Since the sum of all the
deviations from the mean is zero, the value of the last deviation from the
mean can be deduced from the other values. The final deviation from the mean
is dependent on the previous $n - 1$ deviations, so there are only $n - 1$
independent deviations.

You do not need to fully understand this at this stage: you just need to know
that when you are estimating a population variance from a sample, you must
be careful to use the divisor $n - 1$ rather than the divisor $n$.
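
The idea of an unbiased estimate can be checked exactly on a tiny example. The sketch below lists every possible sample of size 2 from a hypothetical four-value population (the values are invented for illustration) and averages the estimates over all of them. It assumes the samples are drawn with replacement, so that every ordered sample is equally likely. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, invented for illustration.
population = [1, 2, 3, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N  # divisor N: whole population

n = 2
# Every ordered sample of size n, drawn with replacement (an assumption).
samples = list(product(population, repeat=n))


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def variance_with_divisor(sample, divisor):
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


mean_of_means = sum(sample_mean(s) for s in samples) / len(samples)
mean_of_s_sq = sum(variance_with_divisor(s, n - 1) for s in samples) / len(samples)
mean_of_biased = sum(variance_with_divisor(s, n) for s in samples) / len(samples)

print("population mean:", mu, " average of sample means:", mean_of_means)
print("population variance:", sigma_sq)
print("average estimate with divisor n - 1:", mean_of_s_sq)
print("average estimate with divisor n:    ", mean_of_biased)
```

With the divisor n - 1 the average estimate matches the population variance, whereas with the divisor n it comes out too small by a factor of (n - 1)/n.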


The distribution of sample means

If you consider samples of a particular size from any distribution, then the
values of the sample means form a probability distribution.

If the underlying distribution is Normal, with mean $\mu$ and standard
deviation $\sigma$, then the distribution of the sample means is Normal, with
mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.

If the sample size is large enough, then the Central Limit Theorem (see
below) tells you that even if the underlying distribution is not Normal,
the distribution of the sample means is approximately Normal, with
mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.

The quantity $\frac{\sigma}{\sqrt{n}}$ is called the standard error of the mean.
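
A short sketch of how the standard error behaves as the sample size grows. The function name `standard_error` and the value 12 for the population standard deviation are chosen only for this illustration.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation of 12.
for n in (4, 16, 36, 144):
    print(n, standard_error(12, n))
```

Because of the square root, the sample size has to be multiplied by 4 to halve the standard error.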


The Central Limit Theorem

You cannot prove the central limit theorem with the knowledge you have at
present. However, it is important that you understand what it is saying. Below
is a link to a demonstration of the central limit theorem.

http://www.chem.uoa.gr/applets/AppletCentralLimit/Appl_CentralLimit2.html

In this demonstration, you can choose any of eight pre-set distributions, most
of which look nothing like a Normal distribution! When you press Draw, the
program will take a large number of samples of size n from the distribution,
and draw a histogram showing the distribution of sample means. The
"Population" figure given on the screen is the number of samples taken. You
can choose this to be small, medium or large (choosing medium or large will
give a smoother histogram but will take longer to complete). If you choose a
sample size of 1 and press Draw, the histogram will be approximately the
same shape as the distribution itself. Try gradually increasing the sample size
and pressing Draw again to see how the histogram changes. Note the mean
and the standard deviation in each case. You should be able to see that as
the sample size increases, the shape of the histogram gets closer to the
shape of a Normal distribution. You should also notice that the mean value of
the sample means remains approximately the same, but the standard
deviation of the sample means decreases.

The rule of thumb often used is that the sample size needs to be about 30 to
get a good approximation to the Normal distribution. In fact, for some
distributions you can get a good approximation to the Normal distribution even
for values of n which are much smaller than 30. Try each of the distributions
(1 to 8) in turn, with a sample size of 4 (you have to reset this each time you
change the distribution) and with the large radio button selected where it
says Population size. Which of them look like the Normal distribution? The
screen dump below shows a simulation run for the first distribution (the
uniform) and you can see that it is remarkably like the Normal, even with only
n = 4.



In the demonstration above, the histogram only shows an approximation to
the distribution of sample means, as it is plotted from a large number of
random observations. The central limit theorem applies to the theoretical
distribution of sample means. For an underlying distribution (of any shape)
with mean $\mu$ and standard deviation $\sigma$, the theoretical distribution of
sample means for samples of size $n$ has mean $\mu$ and standard deviation
$\frac{\sigma}{\sqrt{n}}$, whatever the value of $n$. The shape of the distribution of sample means gets
closer to a Normal distribution as n gets larger.

Note that if the underlying distribution is Normal, then the distribution of
sample means is also Normal, whatever the value of n.
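
If the applet is not available, a similar experiment can be run in a few lines of Python. This is only a sketch: the function names, the seed and the choice of 20000 samples are assumptions made for this example, and the uniform distribution on [0, 1) stands in for the first pre-set distribution.

```python
import math
import random
import statistics


def simulate_sample_means(draw, n, num_samples, seed=0):
    """Return the means of num_samples random samples, each of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(num_samples)]


def draw_uniform(rng):
    """One observation from the uniform distribution on [0, 1)."""
    return rng.random()


# The uniform distribution on [0, 1) has mean 1/2 and standard deviation sqrt(1/12).
mu = 0.5
sigma = math.sqrt(1 / 12)
n = 4

means = simulate_sample_means(draw_uniform, n=n, num_samples=20000)
observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)
theoretical_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.4f} (theory {mu})")
print(f"sd of sample means:   {observed_sd:.4f} (theory {theoretical_sd:.4f})")
```

The observed values are only approximations to the theoretical ones, because they come from a finite number of random samples. Changing `n` shows the standard deviation of the sample means shrinking as the sample size grows.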


Confidence intervals for samples from a Normal distribution

Confidence limits for the mean based on a sample of size $n$ taken from a
Normal population with standard deviation $\sigma$ are given by:

$\bar{x} \pm k\frac{\sigma}{\sqrt{n}}$

Make sure that you understand how to find the required value of k from tables
for percentage points of the Normal distribution:

For a confidence level of 90%, you have two tails of 5%. For an N(0, 1)
distribution, the cut-off points are z = -k and z = k respectively.
So $P(X < -k) = 0.05$ and $P(X < k) = 0.95$. Hence k = 1.645 from the
tables for percentage points of the Normal distribution.

For a confidence level of 95%, the two tails are 2.5% each, so
$P(X < k) = 0.975$, so k = 1.96 from the tables.

Check that you can show that for a confidence level of 99%, k = 2.576.
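
The table look-up can be checked with a computer. The sketch below uses `NormalDist().inv_cdf` from Python's standard `statistics` module in place of the printed tables. The function name `k_for_confidence` is made up for this example.

```python
from statistics import NormalDist


def k_for_confidence(level):
    """Value of k such that the central `level` of N(0, 1) lies between -k and k."""
    if not 0 < level < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    tail = (1 - level) / 2  # probability in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: k = {k_for_confidence(level):.3f}")
```

The same function handles less common levels, for example `k_for_confidence(0.98)`. In an examination you would still read the value from the tables.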

When you have done several questions on this, you will probably find that you
can remember the commonly used values of k. However, you must make sure
that you know how to find them from the tables, in case you are asked for a
confidence interval with a less common confidence level.
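
Putting the pieces together, a confidence interval for a Normal population with known standard deviation might be computed as in the sketch below. The function name and the numbers in the example (sample mean 50, standard deviation 4, sample size 16) are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval_known_sigma(sample_mean, sigma, n, level=0.95):
    """Confidence limits sample_mean +/- k * sigma / sqrt(n), with sigma known."""
    k = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = k * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical example: sample mean 50, known sigma 4, sample size 16.
lower, upper = confidence_interval_known_sigma(50, 4, 16, level=0.95)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Here the standard error is 4 / 4 = 1, so the limits are 50 ± 1.96, giving the interval (48.04, 51.96).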


Confidence intervals from a large sample

In the work above, we have assumed that we are working with a Normal
distribution, and that the standard deviation of the underlying distribution is
known.

These conditions are not always satisfied. However, if the sample size is large
enough for the Central Limit Theorem to apply, then the distribution of the
sample means will be approximately Normal. Also, for a large sample size, the
sample standard deviation will be a good estimator for the population
standard deviation.
This means that if the sample is large (more than about 30), then we can find
confidence intervals for the mean even if the underlying distribution is not
Normal and / or we do not know the population variance.

If the population variance is not known, then the confidence limits become

$\bar{x} \pm k\frac{s}{\sqrt{n}}$

where $s$ is the sample standard deviation.
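
A minimal sketch of the large-sample calculation, working from raw data rather than a known population standard deviation. The function name, the cut-off of 30 (taken from the rule of thumb above) and the data values are assumptions made for illustration. `statistics.stdev` uses the divisor n - 1.

```python
import math
import statistics
from statistics import NormalDist


def large_sample_confidence_interval(data, level=0.95, min_n=30):
    """Confidence limits x_bar +/- k * s / sqrt(n) for a large sample."""
    n = len(data)
    if n <= min_n:
        raise ValueError("sample too small for the large-sample method")
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    k = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = k * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical measurements (40 values), invented for illustration.
measurements = [20 + (7 * i) % 11 for i in range(40)]

lower, upper = large_sample_confidence_interval(measurements, level=0.95)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The function refuses small samples on purpose, since the method relies on both the Central Limit Theorem and on s being a good estimate of the population standard deviation.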

Note that if the sample is small and either the population standard deviation is
unknown, or the underlying distribution cannot be assumed to be Normal,
then you CANNOT calculate confidence intervals in this way.


The meaning of a confidence interval

It is important that you understand what is meant by a confidence interval. For
example, if you find a 90% confidence interval, this does NOT mean that there
is a 90% chance that the mean lies within the confidence interval. It means
that if you take many samples of the same size and construct a 90%
confidence interval from each one, then 90% of these intervals will contain the
true population mean.

This may seem like a subtle distinction, or just a different wording. The
important point, however, is that the true population mean is not random.
Even though you don't know what it is, it is fixed. Either a particular
confidence interval contains the mean, or it doesn't. So you cannot say that
there is a 90% chance that a particular confidence interval contains the mean.
However, the confidence interval is a random variable as it is based on a
random sample. So you can say that 90% of intervals calculated from
samples of a given size will contain the true mean.
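
The repeated-sampling interpretation can be tried out with a small simulation. The sketch below assumes a Normal population with known mean 0 and standard deviation 1, draws many samples of size 10, builds a 90% confidence interval from each, and counts how many contain the true mean. The function name, seed and number of intervals are chosen only for this example.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(mu, sigma, n, level, num_intervals, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    k = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = k * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_intervals):
        # The sample mean, and so the interval, changes from sample to sample.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / num_intervals


rate = coverage_rate(mu=0, sigma=1, n=10, level=0.90, num_intervals=5000)
print(f"proportion of intervals containing the true mean: {rate:.3f}")
```

The true mean never changes inside the loop; only the intervals do. The proportion printed should be close to 0.90, though not exactly equal to it, because the number of simulated intervals is finite.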

The link below demonstrates confidence intervals. The value for alpha of
0.05 corresponds to a 95% confidence interval. So if you change alpha to
0.02, you will get 98% confidence intervals. Each line on the graph shows a
confidence interval calculated from a particular sample. The lines shown in
red are the confidence intervals which do not catch the true population mean
of 0. By clicking the More Intervals button several times, you can find the
number of intervals which do not catch the mean for a large number of
intervals. You should find that when alpha is 0.05, about 5% of intervals do
not catch the mean.
http://www.stat.sc.edu/~west/javahtml/ConfidenceInterval.html


The GeoGebra resource "Confidence intervals using the Normal
distribution" also shows confidence intervals graphically. You can generate a
new sample from a Normal distribution, and a confidence interval is calculated
from this. You can investigate how often the confidence interval captures the
true mean.


For practice in setting out work on confidence intervals, try the Finding a
Confidence interval activity. You need to cut out the question and solution
and put the steps of the solution in the correct order.
