
A113 Mathematics Problem 14: Salary

6th Presentation

Copyright 2009

Recap
Mean is the average of a set of numerical values.

Standard deviation is a measure of the dispersion of data (how spread out the data are) from the mean. A smaller standard deviation indicates that most of the data points lie near the mean.

Variance is the square of the standard deviation.
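As a minimal sketch, the three recap quantities can be computed with Python's standard `statistics` module. The five salaries are made-up values for illustration, and the sketch assumes the population formulas (dividing by the number of values); the slides do not say which convention they use.

```python
import statistics

# Hypothetical salaries, for illustration only
salaries = [1000, 1200, 1400, 1600, 1800]

mean = statistics.mean(salaries)            # average of the values
variance = statistics.pvariance(salaries)   # population variance (divide by n)
std_dev = statistics.pstdev(salaries)       # population standard deviation

print(mean, variance, round(std_dev, 1))
```

Squaring `std_dev` gives back `variance`, which is the relationship stated above.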

Frequency table
The frequency table shows all possible outcomes of an event and the number of times each outcome occurs.
$\text{Relative frequency} = \dfrac{\text{Frequency}}{\text{Total frequency}}$

Salary Range ($)      Frequency   Relative Frequency
1000 <= x < 1025      24          0.024
1025 <= x < 1050      22          0.022
1050 <= x < 1075      19          0.019
1075 <= x < 1100      29          0.029
1100 <= x < 1125      32          0.032
1125 <= x < 1150      11          0.011
.....
1875 <= x < 1900      31          0.031
1900 <= x < 1925      22          0.022
1925 <= x < 1950      15          0.015
1950 <= x < 1975      27          0.027
1975 <= x < 2000      25          0.025
Total                 1000        1.000
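The relative frequency column can be reproduced from the frequency column. This sketch uses only the rows printed in the table (the middle rows are elided in the slides), so the total frequency of 1000 is taken from the table rather than summed.

```python
# Frequencies copied from the rows shown in the table (middle rows omitted)
frequencies = {
    "1000<=x<1025": 24,
    "1025<=x<1050": 22,
    "1050<=x<1075": 19,
    "1075<=x<1100": 29,
    "1100<=x<1125": 32,
    "1125<=x<1150": 11,
    "1875<=x<1900": 31,
    "1900<=x<1925": 22,
    "1925<=x<1950": 15,
    "1950<=x<1975": 27,
    "1975<=x<2000": 25,
}
TOTAL_FREQUENCY = 1000  # from the table's Total row, since not every row is listed

# Relative frequency = frequency / total frequency
relative = {rng: f / TOTAL_FREQUENCY for rng, f in frequencies.items()}

for rng, rf in relative.items():
    print(f"{rng}: {rf:.3f}")
```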

Note: Relative frequency can also be interpreted as the probability that a particular event occurs. For example, the probability that a randomly selected executive earns a salary in the range $1950 to $1975 is 0.027.

Relative frequency histogram


The distribution of the individual entries is shown below.

[Figure: Distribution of the individual entries of salary. Relative frequency histogram; x-axis: Salary Range ($), y-axis: Relative Frequency. Mean = 1494, Standard deviation = 289.6, Variance = 83849.]

For our data, the distribution shows that all salary ranges have about the same chance of occurring.

The mean of the distribution is approximately $1494.

The variance and standard deviation are 83849 and 289.6 respectively.
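The histogram spans $1000 to $2000 and is roughly flat. As a rough check, assuming (as an idealization the slides do not make) that salaries are spread evenly over that range, a continuous uniform distribution on [a, b] has mean (a + b) / 2 and variance (b - a)^2 / 12:

```python
import math

# Idealization (assumption): salaries spread evenly between a and b
a, b = 1000, 2000

uniform_mean = (a + b) / 2               # mean of a uniform distribution
uniform_variance = (b - a) ** 2 / 12     # variance of a uniform distribution
uniform_sd = math.sqrt(uniform_variance)

print(f"mean = {uniform_mean:.0f}, variance = {uniform_variance:.0f}, sd = {uniform_sd:.1f}")
```

The reported values (1494, 83849 and 289.6) are within about 1% of these, which is consistent with the roughly flat histogram.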

Sample size
Sample size is the number of observations that constitute the sample. For example, when we are calculating the average of 2 entries, the number of observations is 2. Hence, the sample size will be 2. Likewise, if we are calculating the average of 30 entries, then the sample size will be 30.

Average of 2 entries
The distribution of the average of 2 entries is shown below.

[Figure: Distribution of the averages of two entries of salary. Relative frequency histogram; x-axis: Salary Range ($), y-axis: Relative Frequency. Mean = 1501, Standard deviation = 202.3, Variance = 40941.]

The variance of 40941 is close to 83849 / 2 ≈ 41925, i.e. the variance of the individual-entries distribution divided by the number of entries in a sample.

The relative frequency is lowest at the two tails of the distribution and increases as we move towards the mean.

The mean of the distribution is approximately $1494, i.e. the mean of the distribution of the individual entries.

The variance is reduced by a factor of about 2, which is the number of entries in each sample.
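The effect of averaging can be reproduced with a short simulation. As a sketch, it assumes a hypothetical population in which salaries are spread evenly between $1000 and $2000 (not the actual data behind the slides), repeatedly averages 2 randomly drawn entries, and compares the variance of those averages with the population variance divided by 2. The helper `simulate_sample_means` is a name chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def simulate_sample_means(n, trials=20_000, low=1000, high=2000):
    """Return `trials` averages, each of n salaries drawn from a hypothetical uniform population."""
    return [
        statistics.fmean(random.uniform(low, high) for _ in range(n))
        for _ in range(trials)
    ]

population_variance = (2000 - 1000) ** 2 / 12  # variance of the hypothetical population
means_of_2 = simulate_sample_means(2)
mean_of_means = statistics.fmean(means_of_2)
variance_of_means = statistics.pvariance(means_of_2)

print(f"mean of averages      = {mean_of_means:.0f}")
print(f"variance of averages  = {variance_of_means:.0f}")
print(f"population variance/2 = {population_variance / 2:.0f}")
```

Calling `simulate_sample_means(10)` or `simulate_sample_means(30)` shows the same pattern for the larger sample sizes.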

Average of 10 entries
The distribution of the average of 10 entries is shown below.

[Figure: Distribution of the averages of ten entries of salary. Relative frequency histogram; x-axis: Salary Range ($), y-axis: Relative Frequency. Mean = 1499, Standard deviation = 94.0, Variance = 8834.]

The variance of 8834 is close to 83849 / 10 ≈ 8385.

The spread of the distribution is reduced further, with the highest frequency occurring at the mean.

The mean of the distribution is approximately $1494.

The variance is reduced roughly in proportion to the number of entries in each sample: it is approximately the variance of the individual entries divided by that number.
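The pattern in these slides is Var(average) ≈ σ² / n, where σ² = 83849 is the variance of the individual entries and n is the number of entries in each sample. A small helper (the name `predicted_spread` is hypothetical) turns that rule into predicted variances and standard deviations that can be compared with the values reported on the slides:

```python
import math

INDIVIDUAL_VARIANCE = 83849  # variance of the individual entries (from the slides)

def predicted_spread(n):
    """Predicted variance and standard deviation of the average of n entries."""
    variance = INDIVIDUAL_VARIANCE / n
    return variance, math.sqrt(variance)

for n in (1, 2, 10, 30):
    var, sd = predicted_spread(n)
    print(f"n = {n:2d}: variance = {var:8.1f}, sd = {sd:6.1f}")
```

The standard deviations reported on the slides (202.3, 94.0 and 52.9 for n = 2, 10 and 30) sit close to these predictions; small gaps are expected because the slide values are presumably computed from a finite number of samples.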

Average of 30 entries
The distribution of the average of 30 entries is shown below.

[Figure: Distribution of the averages of thirty entries of salary. Relative frequency histogram; x-axis: Salary Range ($), y-axis: Relative Frequency. Mean = 1500, Standard deviation = 52.9, Variance = 2796.]

The variance of 2796 is close to 83849 / 30 ≈ 2795.

The mean is approximately $1494, and the spread of the distribution becomes even smaller.

The probability of obtaining an average close to $1494 increases with the number of entries in each sample.
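To put a number on that last point, the sketch below estimates the chance that a sample average lands within $50 of $1494. It assumes the averages are approximately Normal (the Central Limit Theorem) with variance 83849 / n; the $50 window and the name `prob_within_window` are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

MEAN = 1494                  # mean of the individual entries (from the slides)
INDIVIDUAL_VARIANCE = 83849  # variance of the individual entries (from the slides)
WINDOW = 50                  # hypothetical tolerance: "within $50 of the mean"

def prob_within_window(n):
    """Normal approximation to P(|average of n entries - MEAN| < WINDOW)."""
    averages = NormalDist(MEAN, math.sqrt(INDIVIDUAL_VARIANCE / n))
    return averages.cdf(MEAN + WINDOW) - averages.cdf(MEAN - WINDOW)

for n in (10, 30):
    print(f"n = {n}: P(within ${WINDOW}) is about {prob_within_window(n):.2f}")
```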

Central Limit Theorem


The Central Limit Theorem states that if the sample size is large (n ≥ 30), the histogram of the sample means will resemble a bell-shaped curve, also known as the Normal distribution curve.

For example:

[Figure: Normal distribution curve drawn over the histogram of sample means; x-axis: Salary Range ($), y-axis: Relative Frequency; the sample-mean values $1501, $1499 and $1500 are marked near the centre.]
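One way to see the bell shape numerically: for a Normal curve, about 68% of values lie within one standard deviation of the mean. The sketch below, again assuming a hypothetical population spread evenly between $1000 and $2000, compares that fraction for individual entries and for averages of 30 entries. The helper name `fraction_within_one_sd` is illustrative.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable
LOW, HIGH = 1000, 2000  # hypothetical uniform population (assumption)

def fraction_within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mu) < sd for v in values) / len(values)

individuals = [random.uniform(LOW, HIGH) for _ in range(20_000)]
averages_of_30 = [
    statistics.fmean(random.uniform(LOW, HIGH) for _ in range(30))
    for _ in range(20_000)
]

frac_individuals = fraction_within_one_sd(individuals)  # flat shape: about 0.58
frac_averages = fraction_within_one_sd(averages_of_30)  # a Normal curve gives about 0.68

print(f"individual entries: {frac_individuals:.2f}")
print(f"averages of 30:     {frac_averages:.2f}")
```

The individual entries keep their flat shape, while their averages behave like a Normal curve, which is what the theorem predicts.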

Normal distribution

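[Figure: Normal distribution curve, with the mean marked at its centre.]

The Normal curve is symmetrical about its mean and is described completely by its mean and its standard deviation (or variance). The area under the Normal distribution curve between two values represents the probability of an event occurring in that range, and the total area under the curve is 1.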
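As a numerical illustration of the area statement, the sketch below integrates a Normal curve with the trapezoidal rule. The mean 1500 and standard deviation 52.9 are borrowed from the averages-of-30 slide purely as example values, and integrating over ±8 standard deviations captures essentially all of the area. The helper `area_under` is a hypothetical name.

```python
from statistics import NormalDist

curve = NormalDist(mu=1500, sigma=52.9)  # example values only

def area_under(curve, lo, hi, steps=10_000):
    """Trapezoidal-rule estimate of the area under curve.pdf between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (curve.pdf(lo) + curve.pdf(hi))
    total += sum(curve.pdf(lo + i * width) for i in range(1, steps))
    return total * width

mu, sd = curve.mean, curve.stdev
total_area = area_under(curve, mu - 8 * sd, mu + 8 * sd)
area_1sd = area_under(curve, mu - sd, mu + sd)

print(f"total area       = {total_area:.4f}")
print(f"area within 1 sd = {area_1sd:.4f}")
print(f"cdf check        = {curve.cdf(mu + sd) - curve.cdf(mu - sd):.4f}")
```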

Normal distribution

[Figure: Normal distribution of heights, with the mean marked at 163 cm; 90% of the area lies below 172 cm and 10% lies above it.]

The height of a population is approximately normally distributed, so it can be modeled by a Normal distribution curve. In the example above, there is a 10% chance that a randomly chosen individual has a height exceeding 172 cm.
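The figure gives two facts: the mean at 163 cm and 10% of the area above 172 cm. Assuming 163 cm is the mean (as the figure's labels suggest), these are enough to recover the standard deviation, because 172 cm must sit at the 90th percentile, about 1.2816 standard deviations above the mean. A minimal sketch with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

MEAN_HEIGHT = 163   # cm, read from the figure (assumed to be the mean)
CUTOFF = 172        # cm, 10% of heights exceed this value

z_90 = NormalDist().inv_cdf(0.90)       # standard Normal 90th percentile
sigma = (CUTOFF - MEAN_HEIGHT) / z_90   # implied standard deviation
heights = NormalDist(MEAN_HEIGHT, sigma)

print(f"implied standard deviation: {sigma:.2f} cm")
print(f"P(height > {CUTOFF} cm) = {1 - heights.cdf(CUTOFF):.2f}")
```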

Estimating average
From the discussion so far, the distribution of the average of a reasonably large sample approaches a Normal distribution, and its spread is narrow. As the sample size grows, the average obtained from the sample is less likely to deviate far from the actual average of the population. Thus, to estimate the average of a population, we can use the average computed from a randomly selected large sample (n ≥ 30) of the population. In conclusion, adopting Peter's suggestion is an efficient and effective way of solving Serene's problem.
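The estimation procedure can be sketched as follows. Since the actual 1000 salaries are not listed in the slides, the population here is hypothetical (1000 values spread evenly between $1000 and $2000); the sketch draws a random sample of 30 without replacement and compares its average with the population average.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical population of 1000 salaries (the real data are not reproduced here)
population = [random.uniform(1000, 2000) for _ in range(1000)]
true_mean = statistics.fmean(population)

sample = random.sample(population, 30)  # random sample of 30, without replacement
estimate = statistics.fmean(sample)

print(f"population mean = {true_mean:.0f}")
print(f"sample estimate = {estimate:.0f}")
print(f"error           = {estimate - true_mean:+.0f}")
```

With a different seed the estimate changes, but for samples of 30 it stays within about two standard deviations of the average (roughly $105, using 52.9 from the averages-of-30 slide) around 95% of the time.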

Learning points
Understand the relationship between frequency and relative frequency.
Understand the relationship between relative frequency and probability.
Understand that for a large sample size (n ≥ 30), the Normal distribution curve is a fairly accurate representation of the distribution of the sample means.
Understand that the variance of the distribution of sample means is inversely proportional to the sample size.

Discussion
Suppose the distribution of the individual entries is as follows:
[Figure: Distribution of the individual entries of salary. Relative frequency histogram; x-axis: Salary Range ($), y-axis: Relative Frequency. Mean = 1202.5, Standard deviation = 145.7, Variance = 21230.]

Suggest what the distribution of the averages might look like when the sample size is large (n ≥ 30).
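As a starting point for the discussion (a sketch applying the variance rule from the earlier slides, not a model answer), the predicted centre and spread of the averages of 30 entries follow directly from the figures given for this distribution:

```python
import math

mean = 1202.5        # mean of the individual entries (this slide)
variance = 21230     # variance of the individual entries (this slide)
n = 30               # sample size

predicted_variance = variance / n          # variance of the sample means
predicted_sd = math.sqrt(predicted_variance)

print(f"centre ~ {mean}, variance ~ {predicted_variance:.0f}, sd ~ {predicted_sd:.1f}")
```

By the Central Limit Theorem, the histogram of these averages should then look roughly bell-shaped around $1202.5 and much narrower than the distribution of the individual entries.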
