
Central Limit Theorem

MA700 Statistical Methods for Researchers


Dr. Sriram Devanathan
Amrita Vishwa Vidyapeetham
Central Limit Theorem
As the sample size gets large enough, the sampling distribution of $\bar{X}$ becomes almost Normal, regardless of the shape of the population.
Central Limit Theorem:
When sampling from almost any distribution, $\bar{X}$ is approximately Normally distributed in large samples.
Central Limit Theorem
As the sample size $n$ increases, the sampling distribution of the sample mean approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$:
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
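As a rough check of this statement, the sketch below simulates many sample means and compares their mean and variance with $\mu$ and $\sigma^2/n$. The Exponential(1) population (so $\mu = 1$ and $\sigma^2 = 1$), the sample size n = 50, the number of repetitions and the function name are illustrative assumptions, not taken from the slides.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from n draws of an Exponential(1) population.

    The population choice is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


n = 50
means = simulate_sample_means(n=n, reps=4000)

# Theory: E[X-bar] = mu = 1 and Var(X-bar) = sigma^2 / n = 1 / 50 = 0.02.
print("mean of sample means:    ", statistics.fmean(means))
print("variance of sample means:", statistics.pvariance(means))
print("sigma^2 / n:             ", 1.0 / n)
```

The simulated values should land close to 1 and 0.02, with small differences from run to run if the seed is changed.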

Formal Statement

Normal Population Distribution
Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with mean value $\mu$ and standard deviation $\sigma$. Then for any $n$, $\bar{X}$ is normally distributed.

The Central Limit Theorem
Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and variance $\sigma^2$. Then if $n$ is sufficiently large, $\bar{X}$ has approximately a normal distribution with
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}.$$
The larger the value of $n$, the better the approximation.
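The mean and variance of $\bar{X}$ follow from a short calculation, sketched here under the assumption that "random sample" means the $X_i$ are independent and identically distributed. Both identities hold exactly for every $n$; only the Normal shape is an approximation.

```latex
\begin{align*}
E[\bar{X}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
            = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
            = \frac{1}{n^{2}}\cdot n\sigma^{2} = \frac{\sigma^{2}}{n}.
\end{align*}
```

The variance step uses independence, which makes the variance of the sum equal to the sum of the variances.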
The Central Limit Theorem
[Figure: the population distribution, with the sampling distribution of $\bar{X}$ for small to moderate $n$ and for large $n$.]
Rule of Thumb
If n > 30, the Central Limit Theorem can be used.
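One way to see why n > 30 is only a rule of thumb: for independent draws from a population with finite third moment, the skewness of the sample mean equals the population skewness divided by $\sqrt{n}$. The sketch below uses that identity to ask how large n must be before the skewness of $\bar{X}$ falls below a chosen tolerance. The function names and the tolerance values are hypothetical choices for illustration, not part of the slides.

```python
import math


def skewness_of_mean(pop_skewness, n):
    """Skewness of the mean of n iid draws: population skewness / sqrt(n)."""
    return pop_skewness / math.sqrt(n)


def min_n_for_skewness(pop_skewness, target):
    """Smallest n for which |skewness of the sample mean| <= target.

    `target` is an arbitrary tolerance chosen by the analyst (an assumption).
    """
    if target <= 0:
        raise ValueError("target must be positive")
    return max(1, math.ceil((abs(pop_skewness) / target) ** 2))


# An exponential population has skewness 2; a uniform population has skewness 0.
print(skewness_of_mean(2.0, 30))       # skewness left at n = 30 (about 0.37)
print(min_n_for_skewness(2.0, 0.5))    # n needed for skewness <= 0.5
print(min_n_for_skewness(2.0, 0.25))   # n needed for skewness <= 0.25
print(min_n_for_skewness(0.0, 0.25))   # symmetric population: n = 1 already
```

Under this criterion a symmetric population needs far fewer than 30 observations, while a strongly skewed one may need more.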
A warning!
Not all distributions have a finite mean and variance.
For example, neither the Cauchy distribution (the ratio of two independent standard normal random variables) nor the distribution of the ratio of two iid exponentially distributed random variables has a finite mean or variance.
For such distributions, the CLT does not hold.
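[Figure: density curves plotted for x from -10 to 10, including the Cauchy density.]
Cauchy: $f(x) = \dfrac{1}{\pi} \cdot \dfrac{1}{1 + x^2}$
Ratio of two iid exponentials: $f(x) = \dfrac{1}{(1 + x)^2}$, $x > 0$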
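The failure of the CLT for the Cauchy distribution can be seen in a small simulation. The mean of n independent standard Cauchy variables is itself standard Cauchy, so the spread of the sample mean does not shrink as n grows; its interquartile range stays near 2. The sketch below builds Cauchy draws as the ratio of two standard normals, as described on the slide. The function names, repetition count and sample sizes are assumptions for illustration.

```python
import random
import statistics


def cauchy_draw(rng):
    """One standard Cauchy draw as a ratio of two independent standard normals."""
    denominator = 0.0
    while denominator == 0.0:  # guard against an (extremely unlikely) exact zero
        denominator = rng.gauss(0.0, 1.0)
    return rng.gauss(0.0, 1.0) / denominator


def iqr_of_sample_means(n, reps=2000, seed=0):
    """Interquartile range of `reps` sample means, each from n Cauchy draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([cauchy_draw(rng) for _ in range(n)])
        for _ in range(reps)
    ]
    q1, _median, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


# For a population with finite variance the IQR would shrink like 1/sqrt(n).
# Here it should stay close to 2 for every n.
for n in (1, 10, 100):
    print(n, iqr_of_sample_means(n))
```

The interquartile range is used instead of the standard deviation because the sample standard deviation of Cauchy data does not settle down to any fixed value.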

The Normal distribution


Chest measurements of 5738 Scottish
soldiers by Belgian scholar Lambert Quetelet
(1796-1874)
First application of the Normal distribution to
human data
Figure 6.2.1 Two standardized histograms with approximating Normal density curves.
(a) Chest measurements of Quetelet's Scottish soldiers (in.). Normal density curve has $\mu$ = 39.8 in., $\sigma$ = 2.05 in.
(b) Heights of the 4294 men in the workforce database (cm). Normal density curve has $\mu$ = 174 cm, $\sigma$ = 6.57 cm.
From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.
The sample mean has a sampling distribution
Sampling batches of Scottish soldiers and taking chest measurements. Population mean = 39.8 in., population s.d. = 2.05 in.
[Figure (a): 12 samples of size n = 6. Vertical axis: sample number (1 to 12); horizontal axis: chest measurement (in.), 34 to 46.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 1999.
Twelve samples of size 24
[Figure (b): 12 samples of size n = 24. Vertical axis: sample number (1 to 12); horizontal axis: chest measurement (in.), 34 to 46.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.
Histograms from 100,000 samples
Figure 7.2.2 Standardised histograms of the sample means from 100,000 samples of soldiers (n soldiers per sample).
Panels: (a) n = 6, (b) n = 24, (c) n = 100. Horizontal axis: sample mean of chest measurements (in.), 37 to 42.
From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.
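The narrowing of these histograms matches the standard error $\sigma/\sqrt{n}$: with $\sigma$ = 2.05 in. it is about 0.837 in. for n = 6, 0.418 in. for n = 24 and 0.205 in. for n = 100. The sketch below repeats the experiment on a smaller scale. It assumes the chest measurements are exactly Normal with the stated mean and s.d., and it uses 5,000 samples per setting instead of 100,000 to keep the run short; both are simplifying assumptions, and the function name is hypothetical.

```python
import math
import random
import statistics

POP_MEAN = 39.8  # in., population mean quoted on the slide
POP_SD = 2.05    # in., population s.d. quoted on the slide


def sd_of_sample_means(n, reps=5000, seed=0):
    """Standard deviation of `reps` simulated sample means of n soldiers each.

    Assumes a Normal(POP_MEAN, POP_SD) population, which is only an
    approximation to the real chest-measurement data.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(POP_MEAN, POP_SD) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.pstdev(means)


for n in (6, 24, 100):
    theoretical = POP_SD / math.sqrt(n)
    print(f"n={n:3d}  sigma/sqrt(n)={theoretical:.3f}  simulated={sd_of_sample_means(n):.3f}")
```

Quadrupling the sample size from 6 to 24 halves the spread of the sample means, which is what the three panels show.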
Central Limit Effect -- Histograms of sample means
[Figure (b) Uniform: histograms of sample means for n = 1, 2, 4 and 10. Horizontal axis: 0.0 to 1.0.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.
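A numerical counterpart to the uniform panels, as a sketch: for a Normal distribution about 68.3% of the probability lies within one standard deviation of the mean. The code below estimates the same fraction for means of n Uniform(0, 1) draws, whose standard deviation is $\sqrt{1/(12n)}$. For n = 1 the exact value is $1/\sqrt{3} \approx 0.577$. The function name and the repetition count are assumptions for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def within_one_sd_fraction(n, reps=10000, seed=0):
    """Fraction of sample means (n Uniform(0,1) draws each) within one s.d. of 0.5."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 / (12.0 * n))  # s.d. of the mean of n Uniform(0,1) draws
    hits = 0
    for _ in range(reps):
        mean = statistics.fmean([rng.random() for _ in range(n)])
        if abs(mean - 0.5) <= sd:
            hits += 1
    return hits / reps


# Reference value for an exactly Normal distribution (about 0.683).
normal_reference = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

for n in (1, 2, 4, 10):
    print(n, within_one_sd_fraction(n), "Normal reference:", round(normal_reference, 4))
```

Because the uniform distribution is symmetric and has light tails, the fraction approaches the Normal reference quickly.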
Central Limit Effect -- Histograms of sample means
[Figure (a) Exponential: histograms of sample means for n = 1, 2, 4 and 10.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.
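For a skewed population the approach to Normality is slower, and it shows most clearly in the tails. The sketch below estimates how often the mean of n exponential draws falls below the Normal-theory 5% lower cutoff, $\mu - 1.645\,\sigma/\sqrt{n}$. A rate of 1 (so $\mu = \sigma = 1$) is assumed, as are the function name and repetition count. For n = 1 and n = 2 the cutoff is negative, so the fraction is exactly zero, since exponential values cannot be negative.

```python
import math
import random
import statistics
from statistics import NormalDist


def lower_tail_fraction(n, reps=20000, alpha=0.05, seed=0):
    """Fraction of exponential sample means below the Normal-theory alpha cutoff."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # Exponential with rate 1 (an assumption)
    cutoff = mu + NormalDist().inv_cdf(alpha) * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        mean = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        if mean < cutoff:
            hits += 1
    return hits / reps


# If the sample mean were exactly Normal, every line would show about 0.05.
for n in (1, 2, 4, 10):
    print(n, lower_tail_fraction(n))
```

The fractions rise toward 0.05 as n grows but remain below it at n = 10, reflecting the right skew still visible in the histogram.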
Central Limit Effect -- Histograms of sample means
[Figure (b) Quadratic U: histograms of sample means for n = 1, 2, 4 and 10. Horizontal axis: 0.0 to 1.0.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.
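The U-shaped case is striking because the population has almost no probability near its mean. As a sketch, assume the quadratic U density is $f(x) = 12(x - 1/2)^2$ on [0, 1]; the slides do not give the formula, so this specific form is an assumption. Its CDF is $F(x) = 4(x - 1/2)^3 + 1/2$, which can be inverted to draw samples. The code estimates how often the sample mean lands in the middle third of the interval. Function names and repetition counts are hypothetical.

```python
import math
import random
import statistics


def quadratic_u_draw(rng):
    """Inverse-CDF draw from the assumed density f(x) = 12 (x - 1/2)^2 on [0, 1]."""
    u = rng.random()
    t = (u - 0.5) / 4.0
    # Real cube root that preserves the sign of t.
    return 0.5 + math.copysign(abs(t) ** (1.0 / 3.0), t)


def middle_third_fraction(n, reps=10000, seed=0):
    """Fraction of sample means (n draws each) falling in [1/3, 2/3]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = statistics.fmean([quadratic_u_draw(rng) for _ in range(n)])
        if 1.0 / 3.0 <= mean <= 2.0 / 3.0:
            hits += 1
    return hits / reps


# For n = 1 the exact value is F(2/3) - F(1/3) = 1/27, about 0.037.
for n in (1, 2, 4, 10):
    print(n, middle_third_fraction(n))
```

Although single observations almost never fall near 0.5, most sample means do by n = 10, which is the central limit effect shown in the panels.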
Consequences of the CLT
When asking questions about the mean(s) of distributions, we can use theory based on the Normal distribution.
  Is the mean different from zero?
  Are the means different from each other?
Traits that are made up of the sum of many parts are likely to follow a Normal distribution.
  True even for mixture distributions
Distributions related to the Normal distribution are widely relevant to statistical analyses:
  $\chi^2$ distribution
  t-distribution
  F-distribution
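The question "Is the mean different from zero?" can be sketched as a large-sample z-test, which leans directly on the CLT: $z = \bar{x} / (s/\sqrt{n})$ is compared with the standard Normal distribution. The function name and the data below are hypothetical. For small samples the t-distribution listed above would normally replace the Normal reference; this sketch assumes n is large enough for the Normal approximation.

```python
import math
import statistics
from statistics import NormalDist


def z_test_mean_zero(data):
    """Two-sided large-sample z-test of H0: population mean = 0.

    Returns (z, p_value). Relies on the CLT, so it is only a reasonable
    approximation when the sample is large.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)           # sample standard deviation
    z = mean / (s / math.sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 observations centred well away from zero.
z, p = z_test_mean_zero([1.0, 2.0, 3.0] * 20)
print(z, p)

# Hypothetical data: 40 observations balanced around zero.
z0, p0 = z_test_mean_zero([-1.0, 1.0] * 20)
print(z0, p0)
```

A small p-value suggests the mean differs from zero; a p-value near 1, as in the balanced example, gives no evidence of a difference.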
