
MEASURES OF DISPERSION

Why Study Dispersion

 A measure of location, such as the mean or median, only
describes the centre of the data. It is valuable from that
standpoint, but it doesn’t tell us anything about the spread
of the data.

 For example, if your nature guide told you that the river ahead
averaged 3 feet in depth, would you want to wade across on
foot without additional information? Probably not. You would
want to know something about the variation in the depth. Is the
maximum depth of the river 3.25 feet and the minimum 2.75
feet? If that is the case, you would probably agree to cross.
Why Study Dispersion (Cont.)

 What if you learned that the river depth ranged from 0.50
feet to 5.5 feet? Your decision would probably be not to
cross. Before making a decision about crossing the river,
you want information on both the typical depth and the
dispersion in the depth of the river.

 A small value for a measure of dispersion indicates that the
data are clustered closely, say around the arithmetic mean.
The mean is therefore considered representative of the data.
Definition
The measurement of the scatter of the values of a data set
among themselves is called a measure of dispersion or
variation.

 The more similar the scores are to each other, the lower the
measure of dispersion will be
 The less similar the scores are to each other, the higher the
measure of dispersion will be
 In general, the more spread out a distribution is, the larger the
measure of dispersion will be
Measures of Dispersion

[Figure: two distributions of scores, one above the other, each plotted over the values 1 to 10 with a vertical axis running from 0 to 125]

Which of the distributions of scores has the larger dispersion?

The upper distribution has more dispersion because the scores are
more spread out. That is, they are less similar to each other.
Significance of Measuring Variation

 To determine the reliability of an average

 To serve as a basis for the control of the variability

 To compare two or more series with regard to their variability

 To facilitate the use of other statistical measures


Measures of Dispersion

Frequently used measures of dispersion are:

i) The Range
ii) The Quartile Deviation
iii) The Mean (or Average) Deviation
iv) The Variance
v) The Standard Deviation

These are absolute measures of dispersion


Why Absolute?
 They are expressed in the same statistical units in which the original data are
presented
 For example, dollar, meter, kilogram, taka etc.

But when two sets of data are expressed in different units, absolute
measures are not comparable.

Even with identical units of measurement, the individual values of one
distribution may differ so widely from those of another (such as the salary
of a manager vs. the wage of a worker) that the averages, and the deviations
of items from those averages, differ widely in magnitude between the two
distributions.
Relative Measures
 To compare the extent of variation of different distributions
having identical or differing units of measurement

 Expressed in the form of coefficients, which are pure numbers

 Independent of the unit of measurement

 The relative measures are


1) Coefficient of Range
2) Coefficient of Quartile Deviation
3) Coefficient of Mean Deviation
4) Coefficient of Variation
Range
 The range is defined as the difference between the largest
score in the set of data and the smallest score in the set of
data, i.e. XL – XS
where XL = largest observation
XS = smallest observation
 What is the range of the following data?
4 8 1 6 6 2 9 3 6 9

 The largest score (XL) is 9; the smallest score (XS) is 1;
the range is XL – XS = 9 – 1 = 8
Range (Cont.)
 The Quick oil company has a number of outlets in the
metropolitan Seattle area. The numbers of oil changes at
the Oak Street outlet in the past 20 days are:

65, 98, 55, 62, 79, 79, 59, 51, 90, 72, 56, 70, 62, 66, 80, 94, 63, 73, 71, 85

Calculate the range of these data.
 Solution:
The largest score (XL) is 98; the smallest score (XS) is 51;
the range is XL - XS = 98 - 51 = 47
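The two range calculations above can be checked with a few lines of Python. This is only a minimal sketch: the function name `value_range` and the variable names are illustrative choices, not part of the original slides.

```python
# Daily oil-change counts from the Oak Street example above.
changes = [65, 98, 55, 62, 79, 79, 59, 51, 90, 72,
           56, 70, 62, 66, 80, 94, 63, 73, 71, 85]

def value_range(data):
    """Return the range: largest observation minus smallest observation."""
    return max(data) - min(data)

oil_range = value_range(changes)                           # 98 - 51
small_range = value_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9])  # 9 - 1

print(oil_range, small_range)
```

Because only the two extreme observations enter the calculation, changing any of the other 18 values leaves the result unchanged.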
Range (Cont.)
 Merits:
1) Simple to compute and understand
2) Gives rough but quick answer
 Demerits:
1) Not based on all observations
2) Affected by extreme observations
3) Cannot be applied to open-ended classes
4) Not suitable for mathematical treatment
 Uses:
1) Quality control
2) Fluctuation in share price
3) Weather forecast
Quartile Deviation
 The quartile deviation (QD) is defined here as the difference
between the third and first quartiles (this difference is also
known as the interquartile range)

The first quartile (Q1) is the 25th percentile

The third quartile (Q3) is the 75th percentile

 QD = Q3 - Q1

 Frequently reduced to the semi-interquartile range (SIR)

 SIR = (Q3 - Q1) / 2


 Find the quartile deviation and semi-interquartile range
from the ungrouped data given previously (the 20 daily oil-change counts)
 Solution
QD = Q3 - Q1
1st quartile (Q1) = 62
3rd quartile (Q3) = 79.75
QD = Q3 - Q1 = 79.75 - 62
= 17.75
Semi-interquartile range:
SIR = (Q3 - Q1) / 2 = 17.75 / 2
= 8.875
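The slides do not say which quartile convention they use. The sketch below assumes the common textbook rule that places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 in the sorted data, with linear interpolation; this is the `"exclusive"` method of Python's `statistics.quantiles` (Python 3.8 or later), and it reproduces Q1 = 62 and Q3 = 79.75 for the oil-change data. Variable names are illustrative.

```python
import statistics

# Daily oil-change counts from the range example (ungrouped data).
changes = [65, 98, 55, 62, 79, 79, 59, 51, 90, 72,
           56, 70, 62, 66, 80, 94, 63, 73, 71, 85]

# Assumption: quartiles sit at positions (n + 1)/4 and 3(n + 1)/4 in the
# sorted data, with linear interpolation (the "exclusive" method).
q1, median, q3 = statistics.quantiles(changes, n=4, method="exclusive")

qd = q3 - q1   # quartile deviation as defined in this section
sir = qd / 2   # semi-interquartile range

print(q1, q3, qd, sir)
```

Other quartile conventions (for example the `"inclusive"` method) can give slightly different Q1 and Q3 for the same data, so the convention should be stated whenever quartile-based measures are reported. With the values above, SIR = 17.75 / 2 = 8.875.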
Quartile Deviation(Cont.)
 Merits:
1) Simple and easy to understand
2) It is not influenced by extreme value
3) Useful for highly skewed distribution
 Demerits:
1) It ignores the first and last 25% of items
2) Not amenable to mathematical treatment
3) Affected by sampling fluctuations
 Uses:
It does not take into account the deviation of every observation
from an average, which is why it is rarely used in practice
Mean Deviation
 The mean deviation is an average of absolute deviations of
individual observations from the central value of a series

 If X1, X2, ..., Xn form a sample of n observations, then

$$AD(x) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

For grouped data,

$$AD(x) = \frac{\sum_{i=1}^{k} f_i \, |x_i - \bar{x}|}{n}$$

where
k = number of classes
f = frequency
n = total number of observations
Mean Deviation ( Cont.)
 Calculation of mean deviation

Step 1: Calculate the mean of the data

Step 2: Subtract the mean from each observation and record the resulting differences

Step 3: Write down the absolute value of each of the differences found in Step 2 (ignore their signs)

Step 4: Calculate the mean of the absolute values of the differences found in Step 3
Mean Deviation (Cont.)
Example
The batting scores of a cricketer were recorded over 10 completed innings to date.
His scores were: 32, 27, 38, 25, 20, 32, 34, 28, 40, 29
Calculate the mean deviation of the cricketer’s scores.

Solution
Step 1

$$\bar{x} = \frac{32 + 27 + \cdots + 29}{10} = 30.5$$

The cricketer’s average number of runs is 30.5
Mean Deviation (Cont.)
Steps 2 and 3 are completed in the table

Score   Deviation from mean   Absolute value of deviation
32      +1.5                  1.5
27      -3.5                  3.5
⁞       ⁞                     ⁞
29      -1.5                  1.5
        Σ(x − x̄) = 0          Σ|x − x̄| = 47.0

Step 4

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n} = \frac{47.0}{10} = 4.7$$
Mean Deviation ( Cont.)
 Merits:
1) Simple & easy
2) Not much affected by sampling fluctuations
3) Based on all items
4) Less affected by extreme items
 Demerits:
1) Algebraic positive and negative signs are
ignored, so +5 and -5 are treated alike
2) It is not suitable for mathematical treatment
 Uses:
1) Useful when working with small samples
2) Forecasting business cycles
Variance
 Variance is defined as the average of the squared deviations from the mean
For ungrouped data

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where
μ = population mean
N = population size
For grouped data

$$\sigma^2 = \frac{\sum f (X - \mu)^2}{N}$$

where
f = frequency
Variance (Cont.)

 First, the variance formula says to subtract the mean from each of the scores

 This difference is called a deviate or a deviation score

 The deviate tells us how far a given score is from the typical, or
average, score

 Thus, the deviate is a measure of dispersion for a given score


Variance (Cont.)

 Variance is the mean of the squared deviation scores

 The larger the variance is, the more the scores deviate, on
average, away from the mean

 The smaller the variance is, the less the scores deviate, on
average, from the mean
Variance (Cont.)
 When calculating variance, it is often easier to use a
computational formula which is algebraically equivalent to the
definitional formula:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{\sum (X - \mu)^2}{N}$$

 σ² is the population variance, X is a score, μ is the population
mean, and N is the number of scores
Variance (Cont.)
Calculate the variance of the numbers 9, 8, 6, 5, 8, 6

Mean of the numbers = 7

Necessary table for calculation

X        X²        X − μ     (X − μ)²
9        81        2         4
8        64        1         1
6        36        -1        1
5        25        -2        4
8        64        1         1
6        36        -1        1
Σ = 42   Σ = 306   Σ = 0     Σ = 12
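Variance (Cont.)

Computational formula:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{306 - \frac{42^2}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2$$

Definitional formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2$$

Both formulas give the same result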
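The agreement between the two formulas can be checked numerically. The sketch below implements both for the six numbers in the example; the function names are illustrative, not taken from the slides.

```python
numbers = [9, 8, 6, 5, 8, 6]

def variance_definitional(data):
    """Population variance: mean of squared deviations from the mean."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

def variance_computational(data):
    """Population variance: (sum of squares - (sum)^2 / N) / N."""
    n = len(data)
    total = sum(data)                    # sum of X
    total_sq = sum(x * x for x in data)  # sum of X squared
    return (total_sq - total ** 2 / n) / n

var_def = variance_definitional(numbers)
var_comp = variance_computational(numbers)

print(var_def, var_comp)
```

The computational form needs only the running totals ΣX and ΣX², which is convenient for hand calculation. In floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.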


Variance (Cont.)
 The following table shows the prices of 80 new vehicles sold
last month at Toyota (in $ thousand)

Selling price   15-18   18-21   21-24   24-27   27-30   30-33   33-36   Total
Frequency       8       23      17      18      8       4       2       80

Considering this distribution as a population, find the
variance of the distribution.
Variance (Cont.)
Necessary table for calculation

Selling price   Frequency (f)   Mid value (X)   X²        fX      fX²
15-18           8               16.5            272.25    132     2178
18-21           23              19.5            380.25    448.5   8745.75
21-24           17              22.5            506.25    382.5   8606.25
24-27           18              25.5            650.25    459     11704.5
27-30           8               28.5            812.25    228     6498
30-33           4               31.5            992.25    126     3969
33-36           2               34.5            1190.25   69      2380.5
Total           80                              4803.75   1845    44082
Variance (Cont.)

$$\sigma^2 = \frac{\sum fX^2 - \frac{(\sum fX)^2}{N}}{N} = \frac{44082 - \frac{(1845)^2}{80}}{80} = 19.15$$

So the average squared deviation from the mean is 19.15
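The grouped-data calculation can be reproduced in Python by taking each class midpoint as the representative value of its class. This is a sketch under that standard assumption; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each selling-price class, in $ thousand.
classes = [(15, 18, 8), (18, 21, 23), (21, 24, 17), (24, 27, 18),
           (27, 30, 8), (30, 33, 4), (33, 36, 2)]

midpoints = [(low + high) / 2 for low, high, _ in classes]  # X
freqs = [f for _, _, f in classes]                          # f

N = sum(freqs)                                              # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # sum of fX
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # sum of fX^2

# Computational formula for the population variance of grouped data.
variance = (sum_fx2 - sum_fx ** 2 / N) / N

print(N, sum_fx, sum_fx2, round(variance, 2))
```

The unrounded value is 19.14609375, which rounds to the 19.15 shown above. Since prices are in $ thousand, the variance is in squared thousands of dollars.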
Variance (Cont.)

Necessary table for calculation (assumed mean A = 25.5, class width c = 3)

Selling price   Frequency (f)   Mid value (X)   d = (X − A)/c   fd    fd²
15-18           8               16.5            -3              -24   72
18-21           23              19.5            -2              -46   92
21-24           17              22.5            -1              -17   17
24-27           18              25.5 (A)        0               0     0
27-30           8               28.5            1               8     8
30-33           4               31.5            2               8     16
33-36           2               34.5            3               6     18
Total           80                                              -65   223
Variance (Cont.)
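 Calculation of the variance by the shortcut method

$$\sigma^2 = \frac{\sum fd^2 - \frac{(\sum fd)^2}{N}}{N} \times c^2 = \frac{223 - \frac{(-65)^2}{80}}{80} \times 3^2 = 19.15$$

where d = (X − A)/c, A is the assumed mean and c is the class width

So the average squared deviation from the mean is 19.15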
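The shortcut method can be sketched in Python as follows. The code assumes the assumed mean A = 25.5 (the midpoint whose coded deviation d is 0 in the table) and the class width c = 3; the variable names are illustrative.

```python
midpoints = [16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5]  # X
freqs = [8, 23, 17, 18, 8, 4, 2]                        # f

A = 25.5  # assumed mean: the midpoint with d = 0 in the table
c = 3     # class width

# Coded deviations d = (X - A) / c, giving -3, -2, -1, 0, 1, 2, 3.
d = [(x - A) / c for x in midpoints]

N = sum(freqs)
sum_fd = sum(f * di for f, di in zip(freqs, d))        # sum of fd
sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))  # sum of fd^2

# Variance of the coded values, scaled back by c squared.
variance = (sum_fd2 - sum_fd ** 2 / N) / N * c ** 2

print(sum_fd, sum_fd2, round(variance, 2))
```

Shifting every value by A does not change the variance, and dividing by c shrinks it by a factor of c², which is why the result is multiplied by c² at the end. Any midpoint could serve as A without changing the answer.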


Sample Variance
 With divisor n, the sample variance is defined as

$$s^2 = \frac{\sum (X - \bar{X})^2}{n}$$

where s² is the sample variance
 s² is used as an estimate of the population variance σ²

But
It tends to underestimate the population variance

It provides a biased estimate

This problem can be solved if we use n-1 instead of n


Sample Variance (Cont.)
 Sample variance is therefore defined as

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

 This estimate is an unbiased estimate of the population
variance

 Dividing by n-1 instead of n makes the average
squared deviation consistent with many similar
measures used in statistical applications
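The effect of the divisor can be seen on a small data set. In the sketch below the six numbers from the earlier variance example are treated as a sample purely for illustration (the slides treat them as a population), and the function names are illustrative.

```python
def variance_n(sample):
    """Sum of squared deviations divided by n (biased as an estimator)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n

def variance_n_minus_1(sample):
    """Sum of squared deviations divided by n - 1 (unbiased estimator)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

# Assumption: these six values are treated as a sample for illustration only.
sample = [9, 8, 6, 5, 8, 6]

biased = variance_n(sample)            # 12 / 6
unbiased = variance_n_minus_1(sample)  # 12 / 5

print(biased, unbiased)
```

The n-1 version is always the larger of the two, and the gap shrinks as the sample size grows.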
Standard Deviation (SD)
 When the deviate scores are squared in variance, their unit of
measure is squared as well

E.g. if people’s weights are measured in pounds, then the
variance of the weights would be expressed in pounds² (or
squared pounds)

 Since squared units of measure are often awkward to deal with,
the square root of variance is often used instead

 The standard deviation is the square root of variance


Standard Deviation ( Cont.)

Standard deviation = √variance

Variance = (standard deviation)²
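As a short illustration, the standard deviation of the six numbers from the earlier variance example (population variance 2) can be computed as below. The function name is an illustrative choice.

```python
import math

numbers = [9, 8, 6, 5, 8, 6]

def population_sd(data):
    """Square root of the population variance."""
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / n
    return math.sqrt(variance)

sd = population_sd(numbers)

print(round(sd, 4))
```

Unlike the variance, the result is in the same units as the original observations.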
Standard Deviation ( Cont.)
 Merits
1) Based on all observations
2) Amenable to mathematical treatment
3) Less affected by sampling fluctuations
 Demerits:
1) Difficult to compute
2) Affected by extreme values
 Uses:
SD is regarded as the best and most widely used measure
of dispersion. It is used extensively in sampling theory
and by biologists.
Coefficient of Variation

 This is a measure of relative variability

 It is used to:
 measure changes that have occurred in a population over time

 compare variability of two populations that are expressed in different units of measurement

 represent spread of the distribution relative to the mean of the same distribution

 It is expressed as a percentage rather than in terms of the units of the particular data
Coefficient of Variation( Cont.)
 The formula for the coefficient of variation (CV) is:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

where x̄ = the mean of the sample
s = the standard deviation of the sample

 CV = 33% implies that the SD of the sample values is 33% of
the mean of the same distribution
 The distribution for which the value of CV is greater is said to
be more variable, or less consistent, or less uniform
Coefficient of Variation ( Cont.)
Example
Calculate the coefficient of variation for the price of 400 g cans
of pet food, given that the mean is 81 cents and s = 6.77 cents.
Interpret the results.

Solution

$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{6.77}{81} \times 100\% = 8.36\%$$

This means that the standard deviation of the price of a 400 g
can of pet food is 8.36% of the mean price.
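The pet food calculation can be written as a small Python function. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return sd / mean * 100

# Pet food example: mean price 81 cents, standard deviation 6.77 cents.
cv = coefficient_of_variation(6.77, 81)

print(round(cv, 2))
```

The cents cancel in the ratio, so the result is a pure percentage and can be compared with the CV of data measured in any other unit.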
Problem
 Two factories, A and B, are engaged in the same industry. Their
average monthly wages and standard deviations are as follows

Factory   Average monthly wage   SD of wage   No. of wage earners
A         4600                   500          100
B         4900                   400          80

I. Which factory, A or B, pays the larger amount in monthly wages?
II. Which factory shows greater variability in the distribution
of wages?
III. What are the mean and SD of all the workers in the two factories
taken together?
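Parts I and II can be approached with the tools from this section: the total wage bill is the average wage times the number of earners, and variability is compared through the CV. The sketch below assumes the table columns read as average monthly wage, SD of wage and number of wage earners, in that order; the dictionary layout and names are illustrative.

```python
# Assumed reading of the table: (average monthly wage, SD of wage, wage earners).
factories = {
    "A": (4600, 500, 100),
    "B": (4900, 400, 80),
}

total_wages = {}
cv = {}
for name, (mean_wage, sd_wage, earners) in factories.items():
    total_wages[name] = mean_wage * earners  # Part I: total monthly wage bill
    cv[name] = sd_wage / mean_wage * 100     # Part II: CV as a percentage

larger_payer = max(total_wages, key=total_wages.get)
more_variable = max(cv, key=cv.get)

print(total_wages, {k: round(v, 2) for k, v in cv.items()})
print(larger_payer, more_variable)
```

Under this reading, factory A pays the larger total (460,000 against 392,000) even though its average wage is lower, and it also has the larger CV (about 10.87% against about 8.16%), so its wages are the more variable.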
Coefficient of Variation (Cont.)
 Merits:
1) Expresses the SD of the data in the context of the
mean of the data
2) CV is a dimensionless number
3) Used for comparison
 Demerits:
1) Sensitive to small changes in the mean
2) It cannot be used directly to construct confidence
intervals
 Uses:
Common in applied probability fields such as queuing theory
and reliability theory
Properties of a Good Measure of Dispersion

 Should be simple to understand
 Should be easy to compute
 Should be rigidly defined
 Should be based on all observations
 Should be amenable to further mathematical treatment
 Should have sampling stability
 Should not be affected by extreme values
Which One Is The Best Measure ?

Standard Deviation
Questions
1) What do you mean by measures of dispersion? Explain with an example.
2) What are the various measures of dispersion? Define these measures for
raw and frequency data. What are the advantages and disadvantages of
these measures?
3) What are the properties of a good measure of dispersion?
4) What is the significance of measuring dispersion?
5) What do you mean by CV? What are the advantages of CV over the standard
deviation?
6) “Measures of central tendency and measures of dispersion are both
needed to understand a frequency distribution”. Justify.
Thank You
