
www.statistics.me.uk

The Central Limit Theorem and Standard Error


Often in statistics we analyse samples and from them make inferences about the population(s) from which the samples were taken. If we take a sample from a population, then the mean and standard deviation of the sample are considered to approximate the mean and standard deviation of the population. This is the basis of statistical inference.

If we take a number of samples from a population and calculate the sample means, then the means will in general all be different. However, the mean of the means will normally provide a better estimate of the population mean than the mean of an individual sample. This is a fundamental result from the Central Limit Theorem.

In the Central Limit Theorem, it is presumed that there is a population of means of samples of size n, each drawn from a parent population of data. The Central Limit Theorem states that
(i) the mean of the population of means is equal to the mean of the parent population,
(ii) the standard deviation of the population of means is equal to the standard deviation of the parent population divided by $\sqrt{n}$,
(iii) the distribution of the means of the samples approximates the normal distribution [1] more and more closely as n increases.

The Central Limit Theorem holds whatever the distribution of the parent population, provided that it has a finite mean and standard deviation. This gives us a means of analysing samples from a distribution that is not known by appealing to the normal distribution. The probabilities of variables in the normal distribution can be found on computers or by using statistical tables [2].

The definition of the standard error of the mean is based on point (ii) above. It is a measure of the standard deviation of the mean of a sample and is estimated by dividing the standard deviation of the sample by $\sqrt{n}$.

Illustration of the Central Limit Theorem and Standard Error of the Mean

The spreadsheet has been included in order to illustrate the Central Limit Theorem and Standard Error of the Mean.
The spreadsheet contains 100 rows of (pseudo-)random numbers from the uniform distribution on [0,1], each row being a sample of size 10. The uniform distribution is considered in the document Probability Distributions [3]; it has a population mean of 0.5 and a standard deviation of $1/\sqrt{12} \approx 0.288675$, and these quantities are shown in green on the spreadsheet.

1st Sample of 10

We will first consider the first sample of 10 and its properties. This is given in blue on the spreadsheet. Although the random numbers are always changing when using the spreadsheet, an example of the top-row sample of random numbers is

0.21241309, 0.851326121, 0.435669007, 0.829894915, 0.826754995, 0.232335226, 0.7345375, 0.565696704, 0.350940474, 0.436741732, 0.898372277

The mean of the sample of 10 items of data is 0.579516549 and the standard deviation is 0.259270454. The following table compares the mean and standard deviation of the sample with those of the population.

                      population    sample (size 10)
mean                  0.5           0.579516549
standard deviation    0.288675      0.259270454

[1] The Normal Distribution
[2] E.g. Statistical Tables
[3] Probability Distributions


The standard error is computed from the sample as the standard deviation of the sample divided by the square root of the sample size. The standard error for this sample is 0.259270454/$\sqrt{10}$ = 0.081988516.

Sample of size 1000

Let us now consider the overall sample of size 1000, made up of the 100 samples of size 10. The mean and standard deviation for this large sample are given in orange on the spreadsheet, and they are compared with the population mean and standard deviation below.

                      population    sample (size 1000)
mean                  0.5           0.496645885
standard deviation    0.288675      0.286776176
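The calculations for the first sample and for the pooled sample of 1000 can be sketched outside the spreadsheet. The following is a minimal illustration in Python using only the standard library; the seed, the helper name `summarise`, and the use of the n - 1 divisor for the standard deviation are assumptions made for this sketch and are not taken from the spreadsheet.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary fixed seed so that a run can be repeated

ROWS, SIZE = 100, 10  # 100 samples of size 10, as on the spreadsheet

# Each inner list plays the role of one spreadsheet row of RAND() values.
grid = [[random.random() for _ in range(SIZE)] for _ in range(ROWS)]


def summarise(data):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(data)  # n - 1 divisor: an assumption of this sketch
    return statistics.mean(data), sd, sd / math.sqrt(len(data))


first_mean, first_sd, first_se = summarise(grid[0])

# Pool all 100 rows into a single sample of size 1000.
pooled = [x for row in grid for x in row]
pooled_mean, pooled_sd, pooled_se = summarise(pooled)

POP_MEAN = 0.5
POP_SD = 1 / math.sqrt(12)  # standard deviation of the uniform distribution on [0,1]

print(f"population          mean={POP_MEAN:.6f}  sd={POP_SD:.6f}")
print(f"sample (size 10)    mean={first_mean:.6f}  sd={first_sd:.6f}  se={first_se:.6f}")
print(f"sample (size 1000)  mean={pooled_mean:.6f}  sd={pooled_sd:.6f}  se={pooled_se:.6f}")
```

Because the numbers are random, the printed values will not match those quoted from the spreadsheet, but changing the seed shows the same pattern of the pooled sample lying closer to the population values.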

Note that this larger sample tends to give more accurate estimates of the population mean and standard deviation.

Sample of 100 samples of size 10

Let us now consider how the set of 100 samples informs us about the population of samples. On the spreadsheet, the number of means falling within each range of width 0.05 is counted. A histogram showing the distribution of the 100 means is given on the spreadsheet and is reproduced below.

[Figure: Histogram of sample means. Vertical axis: frequency, scale 0 to 30.]

The histogram illustrates point (iii); the distribution of means begins to resemble the normal distribution. [Note that the original data was obtained from the uniform distribution.] The statistics for the sample of 100 means are shown in purple. Firstly, the mean of the means is the same as the mean of the 1000-sample and, as stated earlier, this tends to be close to the population mean. This illustrates point (i). The standard deviation of the sample of 100 means is equal to 0.077069, which is close to the standard error of 0.081988516 that was estimated from the first sample of 10; for comparison, the value given by point (ii) is 0.288675/$\sqrt{10}$ = 0.091287. This illustrates point (ii) and confirms that the standard error provides an estimate of the standard deviation of the population of means.
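The treatment of the data as 100 samples of size 10 can also be sketched in code. The following Python illustration draws its own random numbers, computes the 100 sample means, compares their mean and standard deviation with points (i) and (ii), and counts the means in ranges of width 0.05 as a rough text histogram. The function name `sample_means`, the seed, and the text layout of the histogram are assumptions of this sketch.

```python
import math
import random
import statistics


def sample_means(rows, size, rng):
    """Return the means of `rows` samples of `size` uniform random numbers on [0,1]."""
    return [statistics.mean([rng.random() for _ in range(size)]) for _ in range(rows)]


rng = random.Random(2)  # arbitrary fixed seed so that a run can be repeated
means = sample_means(100, 10, rng)

mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
theoretical_sd = (1 / math.sqrt(12)) / math.sqrt(10)  # point (ii): sigma / sqrt(n)

print(f"mean of means = {mean_of_means:.6f} (population mean 0.5)")
print(f"sd of means   = {sd_of_means:.6f} (sigma/sqrt(n) = {theoretical_sd:.6f})")

# Count the means falling in each range of width 0.05 between 0 and 1.
BIN_WIDTH = 0.05
counts = [0] * 20
for m in means:
    index = min(int(m / BIN_WIDTH), len(counts) - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    low = i * BIN_WIDTH
    print(f"{low:.2f}-{low + BIN_WIDTH:.2f} {'*' * count}")
```

With equal sample sizes the mean of the means is the same as the mean of all 1000 values, which is why the two figures agree on the spreadsheet.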
Sample of 10 samples of size 100

Let us now look at the data in a different way: as 10 samples of size 100. A sample of 10 means is perhaps too small to show that the means are normally distributed. However, we can use this data to illustrate the other points. The first sample of 100 is shown in grey. The mean of this sample is 0.524434414 and the standard deviation is 0.268612139. The standard error is therefore 0.268612139/$\sqrt{100}$ = 0.0268612139. The statistics in pink give the standard deviation of the sample of 10 means as 0.03477, showing again that the standard error provides an estimate of the standard deviation of the population of means.

Exercise

Try other blocks of random variables by using RAND() to gain more experience in the central limit theorem and the standard error of the mean.
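For readers who prefer code to a spreadsheet, the exercise can be approached along the following lines. This Python sketch repeats the experiment for several sample sizes and compares the standard deviation of the sample means with the value $\sigma/\sqrt{n}$ from point (ii). The sample sizes, the number of samples (500) and the seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics

rng = random.Random(3)  # arbitrary fixed seed so that a run can be repeated

POP_SD = 1 / math.sqrt(12)  # standard deviation of the uniform distribution on [0,1]
NUM_SAMPLES = 500           # number of samples drawn for each sample size

results = {}
for n in (10, 100, 400):
    # Draw NUM_SAMPLES samples of size n and record the mean of each.
    means = [statistics.mean([rng.random() for _ in range(n)])
             for _ in range(NUM_SAMPLES)]
    observed = statistics.stdev(means)
    predicted = POP_SD / math.sqrt(n)
    results[n] = (observed, predicted)
    print(f"n={n:4d}  sd of means={observed:.5f}  sigma/sqrt(n)={predicted:.5f}")
```

Quadrupling the sample size from 100 to 400 should roughly halve the standard deviation of the means, in line with the square root in point (ii).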

Example

In a store, the mean number of shoppers in any hour is 448 with a standard deviation of 21. In a sample of 49 shopping hours, what is the probability that the mean number of shoppers lies between 441 and 446?

Answer

The mean of the sample of 49 is presumed to be normally distributed. Since the standard deviation is 21, the standard error for a sample of 49 is $21/\sqrt{49} = 3$.

The probability that the mean of a sample of 49 lies in the range 441 to 446 is illustrated in the following diagram of the normal distribution with mean 448 and standard deviation 3.
[Figure: normal probability density curve. Horizontal axis: x, from 430 to 465; vertical axis from 0 to 0.16.]


The z-values for the normal distribution are given by $z = (x - 448)/3$. For x=441, $z = (441 - 448)/3 = -2.33$, and for x=446, $z = (446 - 448)/3 = -0.67$.

Reading from the Statistical Tables [2], the area under the standard normal curve up to -2.33 is 0.0099 and the area under the curve up to -0.67 is 0.2514. Hence the area under the curve between the two z-values is 0.2514-0.0099=0.2415, and so the probability that the mean number of shoppers in a sample of size 49 lies between 441 and 446 is 0.2415.
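The same probability can be checked on a computer. The following Python sketch uses `NormalDist` from the standard `statistics` module; the variable names are chosen for this illustration. It computes the probability twice: once with the z-values rounded to two decimal places, as when reading tables, and once without rounding, which should give a slightly larger value of about 0.243.

```python
from statistics import NormalDist

mu, sigma, n = 448, 21, 49
standard_error = sigma / n ** 0.5  # 21 / 7 = 3.0

# Distribution of the mean of a sample of 49, as presumed in the answer.
mean_dist = NormalDist(mu=mu, sigma=standard_error)
p_unrounded = mean_dist.cdf(446) - mean_dist.cdf(441)

# Reproduce the table look-up: z-values rounded to two decimal places.
z_low = round((441 - mu) / standard_error, 2)   # -2.33
z_high = round((446 - mu) / standard_error, 2)  # -0.67
standard_normal = NormalDist()
p_tables = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(f"z-values: {z_low}, {z_high}")
print(f"probability with rounded z-values:   {p_tables:.4f}")
print(f"probability without rounding z:      {p_unrounded:.4f}")
```

The small difference between the two results comes only from rounding the z-values before using the tables.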
