Professional Documents
Culture Documents
Measure of Spread
Dr Hjh edit Master subtitle style Click toMadihah Khalid
4/28/12
Introduction
Assume we have the following three sets of observations:
one measure by calculating their mean and median, which are both 50 for all three datasets.
However, the three datasets look quite different: the way
the data are spread around their central value of 50 is quite different between the three sets. There are different statistical measures that describe this spread.
4/28/12
Range
The simplest measure to describe spread in datasets is the
range, which is the difference between the largest and smallest observation. For the example above, we obtain 100 for all three datasets.
datasets show the same amount of variability. This doesnt really reflect the impression we get by looking at the three datasets.
4/28/12
is obtained by subtracting the sample mean from each individual observation, squaring this difference, adding these differences for all observations and dividing this total by the number of observations minus 1. larger its contribution to the variance. This means that datasets that show a large spread around the mean will have large values for the sample variance, compared to datasets with a smaller spread. In the example datasets, the variances are respectively 1166.67, 835 and 2401.67.
deviation, which is the square root of the variance and expresses the spread of the data in the same units as the
4/28/12
The Formulas
We have just been using this formula: (The "Population Standard Deviation")
4/28/12
observations that are on different scales (e.g. like body weight and height in a group of 10-year olds), one can use the coefficient of variation (C.V.), which is obtained by expressing the standard deviation as a percentage of the (sample) mean. olds were 5 kg on a mean of 20 kg, and the standard deviation of the height were 9 cm on a mean of 150 cm, it is quite hard to tell which of both variables shows the largest variability by comparing the variances (or standard deviations).
for the weight and 6% for the height in the group of 10-year
4/28/12
comes from the description of the distribution of data using percentiles (or quantiles). For example the 40th percentile is that value such that 40% of the observations are smaller than or equal to that value (so 60% of the observations are larger). 75% percentiles, which are also called the first quartile (Q1), the median (Med) and the third quartile (Q3). Note that half of the observations are situated between the first and the third quartile. third and the first quartile, is then another regularly used measure of spread in data. In our example with the three
The most regularly used percentiles are the 25%, 50% and
4/28/12
Other measures
Interquartile range or Interdecile range Mean difference Median absolute deviation Average absolute deviation (or simply called average
deviation)
free). In other words, they have no units even if the variable itself has units. These include:
4/28/12
Example
Find the mean , median and mode for the following
distribution. Also find the range, variance and standard deviation of the distribution
# of mistakes # of pages 0 61 1 109 2 53 3 23 4 4
A)
Height
B)
150154 4
155 159 10
160164 20
165169 7
170174 6
175179 3
# of students
4/28/12
The calculation of the median of grouped data is based on the following formula Looks a bit hideous don't you think? Let's look at it in detail. L = the lower limit of the class containing the median n = the total number of frequencies f = the frequency of the median class CF = the cumulative number of frequencies in the classes preceding the class containing the median i = the width of the class containing the median Putting the numbers from the example into the formula now, we see that the median value is 10.53:
0-5 5 - 10 10 - 15 15 - 20 Total
6 12 19 3 40
4/28/12
Mode
For finding the mode of grouped data, first of all we have to
determine the modal class. The class interval whose frequency is maximum is known by this name. the mode lies in between this class. Then the mode is calculated by the following formula.
Mode = Here, l = lower limit of modal class f1 = frequency of modal class fo = frequency of class preceding the modal class. f2 = frequency of class succeeding the modal class
4/28/12
Using Excel
For mean
http://www.youtube.com/watch?v=M7oON5Yvq-c
For standard deviation
http:// www.youtube.com/watch?v=x5KtqJG-Wds&feature=related
4/28/12 Calculate the median quartiles, percentiles and interquartile range of the data shown in the following cumulative frequency curve