
Measures of Variation

In addition to the central tendency of the data in a population or sample, a second important characteristic of the data is its variability about some center.
Measures of Variation include:
The range
The Variance
The Standard Deviation
The Mean Absolute Deviation
The standard deviation is just the
square root of the variance
Standard Deviation of a Population
We label the population variance $\sigma^2$ and define
$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$$
where
$\mu$ is the population mean,
$N$ is the size of the population, and
$\sum_i (x_i - \mu)^2$ is the sum of the squares of the differences between each item in the population and the mean.
Suppose a student receives the following quiz grades:
{82, 68, 74, 86, 90, 88, 62, 75, 80, 55}
For this student, these grades are the total population of
her scores that are used to calculate her mean or average
grade. We obtain:
$\mu$ = (82 + 68 + 74 + 86 + 90 + 88 + 62 + 75 + 80 + 55)/10
= 760/10 = 76
The mean of this population is 76
Having obtained the mean, we can now calculate the variance of the grades {82, 68, 74, 86, 90, 88, 62, 75, 80, 55} with $\mu = 76$:
$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$$
$$= \frac{(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2}{10}$$
= (36 + 64 + 4 + 100 + 196 + 144 + 196 + 1 + 16 + 441)/10
= 1198/10 = 119.8
We find the standard deviation of this population by taking the square root of the variance.
$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N} = 119.8$$
$$\sigma = \sqrt{119.8} = 10.94$$
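The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original slides. The square root comes out at about 10.945, which the slides report as 10.94.

```python
import math
import statistics

# Quiz grades from the example, treated as a complete population.
grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

N = len(grades)
mu = sum(grades) / N                                # population mean
sigma_sq = sum((x - mu) ** 2 for x in grades) / N   # population variance
sigma = math.sqrt(sigma_sq)                         # population standard deviation

print("mean:", mu)
print("variance:", sigma_sq)
print("standard deviation:", round(sigma, 3))

# Cross-check against the standard library's population formulas.
print("library check:", statistics.pvariance(grades), statistics.pstdev(grades))
```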
If we display the data on a dot plot, we can visualize the use
of the standard deviation as a measure of variation in the
data
[Figure: dot plot of the ten grades {82, 68, 74, 86, 90, 88, 62, 75, 80, 55} on an axis running from 55 to 100, with the mean $\mu = 76$ marked.]
Chebyshev's Theorem
The proportion of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, for any K greater than 1.
Chebyshev's inequality tells us that in any statistical distribution at least 3/4 of the values will lie within 2 standard deviations of the mean, and at least 8/9 of all values will lie within 3 standard deviations of the mean.
In the previous example we found $\mu = 76$ and $\sigma = 10.94$, so
$\mu - 2\sigma = 76 - 2(10.94) = 54.12$
$\mu + 2\sigma = 76 + 2(10.94) = 97.88$
We find that 100% of the values lie within $2\sigma$ of the mean.
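As a quick check of the theorem on the grade data, the sketch below compares the guaranteed minimum $1 - 1/K^2$ with the fraction of grades actually found within K standard deviations of the mean. The helper name `fraction_within` is hypothetical and introduced only for this illustration.

```python
import math

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
mu = sum(grades) / len(grades)
sigma = math.sqrt(sum((x - mu) ** 2 for x in grades) / len(grades))

def fraction_within(data, center, spread, k):
    """Fraction of the data lying within k * spread of center."""
    inside = [x for x in data if abs(x - center) <= k * spread]
    return len(inside) / len(data)

for k in (2, 3):
    bound = 1 - 1 / k ** 2                           # Chebyshev's guaranteed minimum
    actual = fraction_within(grades, mu, sigma, k)   # what the data actually show
    print(f"K={k}: at least {bound:.3f}, observed {actual:.3f}")
```

The observed fraction can never fall below the bound, but it is often much larger, as it is here.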
The Sample Standard Deviation
The standard deviation of a sample is denoted by the letter s. The sample standard deviation is an estimate of the population standard deviation $\sigma$. The sample variance is
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ denotes the sample mean. The sample standard deviation is obtained by taking the square root of the sample variance.
Note! To calculate the sample variance we divide by the number of degrees of freedom, $n - 1$, instead of the sample size n. The sample mean has already been calculated from the same data, so when we use those data to obtain a second statistic only $n - 1$ of the values are free to vary: the nth value is fixed, since the sum must equal n times the mean.
The formula for the standard deviation can be transformed into a form that slightly simplifies the computation.
$$s = \sqrt{\frac{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n(n - 1)}}$$
At first sight it is not clear that we have simplified the calculation. To illustrate how the two formulas are used, assume that the previous 10 grades were a sample taken from a larger number of students enrolled in a course, and calculate the standard deviation both ways.
Using the original formula and treating the previous data {82, 68, 74, 86, 90, 88, 62, 75, 80, 55} as sample data with a mean of 76, we get:
$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$
$$= \sqrt{\frac{(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2}{10 - 1}}$$
$$= \sqrt{1198/9} = \sqrt{133.11} = 11.54$$
To use the modified formula, we first construct the following table for the sample {82, 68, 74, 86, 90, 88, 62, 75, 80, 55}, n = 10. In this second method we find the total of the sample items and the total of the squares of these items.

| x | $x^2$ |
|---|---|
| 82 | 6724 |
| 68 | 4624 |
| 74 | 5476 |
| 86 | 7396 |
| 90 | 8100 |
| 88 | 7744 |
| 62 | 3844 |
| 75 | 5625 |
| 80 | 6400 |
| 55 | 3025 |
| Total: 760 | Total: 58958 |

$$s^2 = \frac{(10)(58958) - 760^2}{(10)(9)} = \frac{589580 - 577600}{90} = 133.11$$
$$s = \sqrt{133.11} = 11.54$$
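The two routes to the sample standard deviation can be compared directly. The sketch below, with illustrative variable names, computes s from the definitional formula and from the computational formula and cross-checks both against `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
n = len(grades)
x_bar = sum(grades) / n

# Definitional formula: squared deviations from the sample mean, divided by n - 1.
s_def = math.sqrt(sum((x - x_bar) ** 2 for x in grades) / (n - 1))

# Computational formula: needs only the sum of x and the sum of x squared.
sum_x = sum(grades)
sum_x2 = sum(x * x for x in grades)
s_comp = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print("sum of x:", sum_x, " sum of x^2:", sum_x2)
print("definitional:", round(s_def, 2), " computational:", round(s_comp, 2))
print("library check:", round(statistics.stdev(grades), 2))
```

The computational form saves work by hand, but in floating-point arithmetic it subtracts two large, nearly equal numbers, so for data with a large mean and small spread the definitional form is usually the safer choice.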
Finding the standard deviation for tabulated or weighted data
Recall the table we constructed for finding the mean of a sample of September temperature readings in the Central Tendency lecture notes. We have augmented that table with two additional columns that will be used for calculating the sample standard deviation of these grouped data.

| Class | Midpoint (x) | Frequency (f) | f*x | $x^2$ | f*$x^2$ |
|---|---|---|---|---|---|
| 64.5 - 69.5 | 67 | 6 | 402 | 4489 | 26934 |
| 69.5 - 74.5 | 72 | 11 | 792 | 5184 | 57024 |
| 74.5 - 79.5 | 77 | 20 | 1540 | 5929 | 118580 |
| 79.5 - 84.5 | 82 | 13 | 1066 | 6724 | 87412 |
| 84.5 - 89.5 | 87 | 9 | 783 | 7569 | 68121 |
| 89.5 - 94.5 | 92 | 1 | 92 | 8464 | 8464 |
| Total | | 60 | 4675 | | 366535 |
The formula for obtaining the standard deviation of weighted or tabulated data is:
$$s = \sqrt{\frac{n \sum_i f_i x_i^2 - \left(\sum_i f_i x_i\right)^2}{n(n - 1)}}$$
From the previous table we have
$n \sum_i f_i x_i^2 = (60)(366535) = 21992100$
$\left(\sum_i f_i x_i\right)^2 = (4675)^2 = 21855625$
$$s = \sqrt{\frac{21992100 - 21855625}{(60)(59)}} = \sqrt{38.55} = 6.21$$
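The grouped-data calculation can be sketched as follows, assuming each class is represented by its midpoint and frequency as in the table. The list name `classes` and the other variable names are illustrative.

```python
import math

# (class midpoint x, frequency f) pairs from the temperature table.
classes = [(67, 6), (72, 11), (77, 20), (82, 13), (87, 9), (92, 1)]

n = sum(f for _, f in classes)                 # total number of readings
sum_fx = sum(f * x for x, f in classes)        # sum of f * x
sum_fx2 = sum(f * x * x for x, f in classes)   # sum of f * x^2

mean = sum_fx / n
s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print("n:", n, " sum f*x:", sum_fx, " sum f*x^2:", sum_fx2)
print("grouped mean:", round(mean, 2), " grouped s:", round(s, 2))
```

Because every reading is replaced by its class midpoint, the result is an approximation to the standard deviation of the raw readings.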
We construct an ogive from the previous table.
[Figure: ogive of cumulative frequency (0 to 60) against temperature, with class boundaries from 64.5 to 94.5. The mean (4675/60 = 77.92 from the table) and the distances s = 6.21 and 2s = 12.42 on either side of it are marked.]
The Normal Distribution
Continuous
Symmetric
Mean = Median = Mode (all the same value)
[Figure: bell-shaped normal curve centered on the mean, with the intervals below marked.]
Within $\sigma$ of the mean: 68% of values
Within $2\sigma$ of the mean: 95% of values
Within $3\sigma$ of the mean: 99.7% of values
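These percentages can be recovered from the normal distribution itself. The sketch below relies on the standard identity that the probability of a normal value lying within k standard deviations of its mean is erf(k/√2); the function name `normal_coverage` is hypothetical.

```python
import math

def normal_coverage(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_coverage(k) * 100:.1f}%")
```

Compared with Chebyshev's bound, which holds for any distribution, these normal-curve figures are much tighter because they use the specific shape of the distribution.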
Other measures of variation
Using the range to estimate the standard deviation:
$s \approx \text{range}/4$
On an earlier slide we found for a population of student grades {82, 68, 74, 86, 90, 88, 62, 75, 80, 55}:
$\mu = 76$ and $\sigma = 10.94$
The range of this population = 90 - 55 = 35
This gives us an estimate of $\sigma \approx 35/4 = 8.75$
In the tabulated data for the temperature readings we have range = 92 - 65 = 27, so $s \approx 27/4 = 6.75$, which agrees fairly well with the calculated value of s = 6.21.
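A minimal sketch of the range rule, with a hypothetical helper name, applied to both data sets. The temperature extremes 65 and 92 are the lowest and highest readings.

```python
def range_rule_estimate(low, high):
    """Rough standard deviation estimate: the range divided by 4."""
    return (high - low) / 4

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
grades_estimate = range_rule_estimate(min(grades), max(grades))   # (90 - 55) / 4
temps_estimate = range_rule_estimate(65, 92)                      # (92 - 65) / 4

print("grades:", grades_estimate)
print("temperatures:", temps_estimate)
```

The rule is only a rough guide: it uses just the two extreme values, so a single unusual reading can shift the estimate considerably.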
The Coefficient of Variation (CV)
Define: For either a population or a sample, the Coefficient of Variation is defined to be the ratio of the standard deviation to the mean.
CV = $\sigma/\mu$ for a population
CV = $s/\bar{x}$ for a sample, where $\bar{x}$ denotes the sample mean
The CV for the population of grades from the previous page:
CV = 10.94/76 = 0.144
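The sketch below computes the coefficient of variation for the grade data treated as a population; the variable names are illustrative.

```python
import math

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

mu = sum(grades) / len(grades)
sigma = math.sqrt(sum((x - mu) ** 2 for x in grades) / len(grades))

cv = sigma / mu   # a unitless ratio, often quoted as a percentage
print(f"CV = {cv:.3f} ({cv * 100:.1f}%)")
```

Because the CV has no units, it can be used to compare the spread of data sets measured on different scales.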
Part 2
Measures of Relative Standing
Relative Standing
A z score is the number of standard deviations that a raw score, x, is above or below the mean.
A raw score x taken from a population is converted to a standardized z score by the formula
$z = (x - \mu)/\sigma$
In a sample the z score of a value x is given by
$z = (x - \bar{x})/s$, where $\bar{x}$ denotes the sample mean
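As an illustration, the sketch below computes z scores for the earlier quiz grades, treated as a population. The function name `z_score` and the choice of the grade data are assumptions made for this example.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
mu = sum(grades) / len(grades)
sigma = math.sqrt(sum((x - mu) ** 2 for x in grades) / len(grades))

for x in (90, 76, 55):
    print(x, round(z_score(x, mu, sigma), 2))
```

A positive z score marks a value above the mean, a negative one a value below it, and a score equal to the mean has z = 0.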
Percentiles
percentile of value x = ((number of values < x) / (total number of values)) * 100
(round the result to the nearest whole number)
Suppose that in a class of 25 people we have the following
averages (ordered in ascending order)
42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98
If you received a 77, what percentile are you?
percentile of 77 = (12/25)*100 = 48
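The percentile calculation can be sketched as a small function; the name `percentile_of` is hypothetical. Note that Python's built-in `round` sends exact halves to the nearest even integer, which may differ from the rounding used by hand.

```python
def percentile_of(value, data):
    """Percentile of value: percentage of data strictly below it, rounded to a whole number."""
    below = sum(1 for x in data if x < value)
    return round(below * 100 / len(data))

averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

print(percentile_of(77, averages))
```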
Quartiles
Instead of finding the percentile of a single data value, as we did on the previous page, it is often useful to group the data into 4, or more, (nearly) equal groups. When the data are grouped into four equal groups, we call these groups quartiles.
Let n = number of items in the data set
k = percent desired (e.g. k = 25)
L = locator, the position of the value separating the first k percent of the data from the rest
L = (k/100) * n
Let's separate the 25 class grades into four quartiles.
Step 1: order the data in ascending order.
42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98
Now find the 3 locators $L_{25}$, $L_{50}$, $L_{75}$, rounding any fractional part up to the next integer:
$L_{25}$ = (25/100) * 25 = 6.25, which rounds up to 7, so $Q_1$ is the 7th value, 70
$L_{50}$ = (50/100) * 25 = 12.5, which rounds up to 13, so $Q_2$ is the 13th value, 77
$L_{75}$ = (75/100) * 25 = 18.75, which rounds up to 19, so $Q_3$ is the 19th value, 84
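The locator procedure can be sketched as a function. The name `percentile_value` is hypothetical, and the handling of a whole-number locator (averaging that value with the next one) is an assumed convention: it does not arise for these 25 grades but is a common textbook rule.

```python
import math

def percentile_value(sorted_data, k):
    """Value separating the first k percent of sorted_data from the rest (0 < k < 100)."""
    n = len(sorted_data)
    locator = k * n / 100
    if locator == int(locator):
        # Whole-number locator: average that value and the next one (assumed convention).
        pos = int(locator)
        return (sorted_data[pos - 1] + sorted_data[pos]) / 2
    # Fractional locator: round up to the next integer position (1-based).
    return sorted_data[math.ceil(locator) - 1]

averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

q1, q2, q3 = (percentile_value(averages, k) for k in (25, 50, 75))
print(q1, q2, q3)
```

Library routines such as `statistics.quantiles` interpolate between neighboring values instead, so their quartiles may differ slightly from the ones found with this locator rule.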

Other measures of relative standing include
Interquartile range (IQR) = $Q_3 - Q_1$ (a measure of variation)
Semi-interquartile range = $(Q_3 - Q_1)/2$ (a measure of variation)
Midquartile = $(Q_3 + Q_1)/2$ (a measure of central tendency)
10-90 percentile range = $P_{90} - P_{10}$ (a measure of variation)
For the data on the previous page we have:
IQR = 84 - 70 = 14
Semi IQR = (84 - 70)/2 = 7
Midquartile = (84 + 70)/2 = 77
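Given the quartiles of the 25 class grades, $Q_1 = 70$ and $Q_3 = 84$, the three quartile-based measures follow directly. The variable names in this sketch are illustrative.

```python
q1, q3 = 70, 84   # first and third quartiles of the 25 class grades

iqr = q3 - q1                # interquartile range: spread of the middle half of the data
semi_iqr = (q3 - q1) / 2     # semi-interquartile range
midquartile = (q3 + q1) / 2  # midpoint between the quartiles

print(iqr, semi_iqr, midquartile)
```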
Box Diagram
Recall the ordered high temperature readings from a previous lecture:
65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92
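To construct a box diagram to illustrate the extent to which the extreme data values lie beyond the interquartile range, draw a line with the low and high values highlighted at the two ends. Mark the gradations between these two extremes, then locate the quartile boundaries $Q_1$, the median, and $Q_3$ on this line. Construct a box about these values.
[Figure: box diagram on an axis from 65 to 92 with gradations at 69, 73, 77, 81, 85 and 89; $Q_1$, the median M and $Q_3$ are marked, located using $L_{25}$ and $L_{75}$.]
Here $L_{25}$ = (25/100) * 60 = 15 is a whole number, so $Q_1$ is taken as the average of the 15th and 16th values:
$Q_1$ = (73 + 74)/2 = 73.5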
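The five values needed to draw the box diagram (low, $Q_1$, median, $Q_3$, high) can be computed with the locator rule. This is a sketch under the assumption that a whole-number locator means averaging that value with the next one, which matches the $Q_1$ calculation above; the function name `percentile_value` is hypothetical.

```python
import math

def percentile_value(sorted_data, k):
    """Value separating the first k percent of sorted_data from the rest (0 < k < 100)."""
    n = len(sorted_data)
    locator = k * n / 100
    if locator == int(locator):
        # Whole-number locator: average that value and the next one.
        pos = int(locator)
        return (sorted_data[pos - 1] + sorted_data[pos]) / 2
    # Fractional locator: round up to the next integer position (1-based).
    return sorted_data[math.ceil(locator) - 1]

temps = [65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
         74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
         78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
         81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92]

low, high = temps[0], temps[-1]
q1 = percentile_value(temps, 25)
median = percentile_value(temps, 50)
q3 = percentile_value(temps, 75)

print("low:", low, " Q1:", q1, " median:", median, " Q3:", q3, " high:", high)
```

The box is drawn from $Q_1$ to $Q_3$ with the median marked inside it, and lines extend from the box to the low and high values.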
