
Introduction

• Dispersion (variability, scatter, or spread) characterizes how
stretched or squeezed the data are.
• Dispersion refers to the variation of the items
around an average.
• Dispersion is a non-negative real number.
• It equals zero if all the data are the same and
increases as the data become more diverse.
Variability
[Figure: cash flow plotted around its mean. Panels: No Variability in Cash Flow; Variability in Cash Flow.]

© 2002 Thomson / South-Western
Variability
[Figure: two panels, Variability and No Variability.]
MEASURES OF CENTRAL TENDENCY
• When the graph of the scores is a normal curve, the
mode, median, and mean are equal
• The mean is the most common measure of central
tendency
• When the scores are quite skewed, or the data are ordinal
and lack a common interval, the median is a better
measure of central tendency
• The mode is used only when the mean or median
cannot be calculated (e.g., nominal data) or when the
only information wanted is the most frequent score (e.g.,
most common uniform size or injury site)

Measures of Variability
• Describe the set of scores in terms of their
spread, or heterogeneity
• Consider two groups of scores:
group 1 = 9, 5, 1; group 2 = 5, 6, 4
• Both have a mean and median of 5, but
group 2 has much more homogeneous or
similar scores than group 1
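The two groups above can be checked with a short sketch. The helper name `describe` is only illustrative; it uses Python's standard `statistics` module and takes the range as a first, rough measure of spread.

```python
from statistics import mean, median

group_1 = [9, 5, 1]
group_2 = [5, 6, 4]

def describe(scores):
    """Return (mean, median, range) for a list of scores."""
    return mean(scores), median(scores), max(scores) - min(scores)

# Same center (mean 5, median 5) but very different spread:
print(describe(group_1))  # range is 8
print(describe(group_2))  # range is 2
```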

From the Dictionary
• Deviation: departure from a standard or norm.
• Variance: the state, quality, or fact of being
variable, divergent, different, or anomalous.
• Error: a deviation from accuracy or correctness
• Variability: something that may or does vary; a
variable feature or factor
• Variation: something that may or does vary; a
variable feature or factor
Objectives of Dispersion:
• (i) To determine the reliability of an average
• (ii) To compare the variability of two or more series
• (iii) It serves as the basis of other statistical measures
such as correlation
• (iv) It serves as the basis of statistical quality control
Properties of a good measure of dispersion:
• (i) It should be easy to understand.
• (ii) It should be simple to calculate
• (iii) It should be uniquely defined.
• (iv) It should be based on all observations.
• (v) It should not be unduly affected by extreme
items
Measures of Variability
• Quartiles
• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation
• All of these measures are appropriate for
measurement data only.
Quartiles and Percentiles
• A quartile divides the data into four approximately equal
groups.
• A percentile is a statistic that identifies the percentage of
the data that is less than the given value.
• The lower quartile, sometimes abbreviated as Q1, is also
known as the 25th percentile.
• Technically, the median is a “middle” quartile and is referred
to as Q2. Because it is the numeric middle of the data, half
of the data is below the median and half is above.
• The upper quartile, or Q3, is also known as the 75th
percentile.
Quartiles
Q1, Q2, Q3
divide ranked scores into four equal parts

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
The location of quartiles

When there are n values in an ordered data set:

lower quartile = $\frac{n+1}{4}$th value

median = $\frac{n+1}{2}$th value

upper quartile = $\frac{3(n+1)}{4}$th value

interquartile range = upper quartile – lower quartile
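As a small illustration of the location rule, the sketch below only computes the positions; the function name `quartile_positions` is hypothetical. When a position is not a whole number, the quartile falls between two ordered values and some convention (for example, averaging the two neighbours) has to be chosen.

```python
def quartile_positions(n):
    """1-based positions of Q1, the median, and Q3 among n ordered values."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4

print(quartile_positions(11))  # whole positions: 3rd, 6th, and 9th values
print(quartile_positions(12))  # fractional positions: 3.25, 6.5, 9.75
```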


Deciles
D1, D2, D3, D4, D5, D6, D7, D8, D9
divide ranked data into ten equal parts

10% | D1 | 10% | D2 | 10% | D3 | 10% | D4 | 10% | D5 | 10% | D6 | 10% | D7 | 10% | D8 | 10% | D9 | 10%
Finding the median, quartiles and inter-quartile range.

Example 2: Find the median and quartiles for the data below.
6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10

Order the data:
3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15

Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10

Inter-Quartile Range = 10 - 4 = 6
Finding the median, quartiles and inter-quartile range.

Example 1: Find the median and quartiles for the data below.
12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10

Order the data:
4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12

Lower Quartile (Q1) = 5½
Median (Q2) = 8
Upper Quartile (Q3) = 9

Inter-Quartile Range = 9 - 5½ = 3½
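Both worked examples can be reproduced with a "median of each half" convention: split the ordered data at the median (leaving the median itself out when n is odd) and take the median of each half. This is a minimal sketch under that assumption; the function name `quartiles` is illustrative, and other quartile conventions can give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

example_2 = [6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10]
example_1 = [12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10]

for data in (example_2, example_1):
    q1, q2, q3 = quartiles(data)
    print(q1, q2, q3, "IQR =", q3 - q1)
```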
Range
• Range is the simplest measure of variability.
• It is the difference between the largest and smallest values
in the sample.
• Used when the measure of central tendency is the mode
(nominal data or when the most frequent score is of
interest) or median (ordinal data or skewed data).
Properties of Range
• Only two values are used in its calculation.
• It is highly sensitive to the largest and smallest values, so it is
strongly influenced by extreme values (outliers).
• It is easy to compute and understand.
• It is best for symmetric data with no outliers.
Example: Apartment Rents

Seventy studio apartments were randomly sampled in
a small college town. The monthly rent prices for
these apartments are listed in ascending order below.
Range
Range = largest value - smallest value
Range = 615 - 425 = 190

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
The interquartile range (IQR)
• It is the range of the data that contains the
middle 50% of cases.
• Recall that you find the range by subtracting the minimum value
from the maximum value in the dataset. You
calculate the IQR in a similar way, except
that you find the difference between the 3rd
quartile (Q3) and the 1st quartile (Q1).
• Therefore, IQR = Q3 - Q1
Interquartile Range
• The interquartile range of a data set is the difference
between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.


Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
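The range and interquartile range for the 70 rents can be checked with a short sketch. It assumes Python's `statistics.quantiles` with its default "exclusive" method, which places the quartiles at the (n + 1)/4 and 3(n + 1)/4 positions and interpolates between neighbouring values; the variable names are illustrative.

```python
from statistics import quantiles

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

rent_range = max(rents) - min(rents)   # largest value - smallest value
q1, q2, q3 = quantiles(rents, n=4)     # three quartile cut points
iqr = q3 - q1

print("Range:", rent_range)            # 615 - 425
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

The range depends only on the two extreme rents, while the IQR ignores the lowest and highest quarters of the data.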
Example
A recent study proclaimed ALDAMAZIN the “wettest” city in Sudan.
The following table lists a measurement of the approximate annual
rainfall in ALDAMAZIN for the last 10 years. Find the range and IQR
for these data.

Year    Rainfall (inches)
1998    90
1999    56
2000    60
2001    59
2002    74
2003    76
2004    81
2005    91
2006    47
2007    59

First, place the data in order from smallest to
largest. The range is the difference between
the minimum and maximum rainfall amounts.
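The slides do not give the solution, so the following is only a sketch of how it could be worked. It assumes the median-of-halves convention for quartiles; a different quartile convention would give a somewhat different IQR.

```python
from statistics import median

rainfall = [90, 56, 60, 59, 74, 76, 81, 91, 47, 59]

ordered = sorted(rainfall)           # 47, 56, 59, 59, 60, 74, 76, 81, 90, 91
rain_range = ordered[-1] - ordered[0]

half = len(ordered) // 2             # n is even, so the halves do not overlap
q1 = median(ordered[:half])          # median of the five smallest values
q3 = median(ordered[half:])          # median of the five largest values
iqr = q3 - q1

print("Range:", rain_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```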
The Five-Number Summary

• The five-number summary is a numerical
description of a data set comprised of the
following measures (in order):
• minimum value,
• lower quartile,
• median,
• upper quartile,
• maximum value.
Box plot
• A box plot summarizes data using the median, upper and lower
quartiles, and the extreme (least and greatest) values. It allows
you to see important characteristics of the data at a glance.
• The Box includes the lower quartile, median, and upper quartile.
• The Whiskers extend from the Box to the max and min.
Constructing a box and whisker plot

• Step 1 - Take the set of numbers given:
34, 18, 100, 27, 54, 52, 93, 59, 61, 87, 68, 85, 78, 82, 91
• Place the numbers in order from least to greatest:
18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100
Constructing a box and whisker plot

• Step 2 - Find the median.
• Remember, the median is the middle value in a data set.
18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100
68 is the median of this data set.
• Step 3 - Find the lower quartile.
• The lower quartile is the median of the data set to the
left of 68.
(18, 27, 34, 52, 54, 59, 61,) 68, 78, 82, 85, 87, 91, 93, 100
52 is the lower quartile.
Constructing a box and whisker plot

• Step 4 - Find the upper quartile.
• The upper quartile is the median of the data set to the
right of 68.
18, 27, 34, 52, 54, 59, 61, 68, (78, 82, 85, 87, 91, 93, 100)
87 is the upper quartile.
• Step 5 - Find the maximum and minimum values in
the set.
18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100
18 is the minimum and 100 is the maximum.
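Steps 1 to 5 can be sketched as a single function that returns the five-number summary needed to draw the box and whiskers. The name `five_number_summary` is illustrative, and the quartiles follow the same "median of the values to the left/right of the median" rule used in the steps.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)                      # Step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                      # values left of the median
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # values right of it
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

numbers = [34, 18, 100, 27, 54, 52, 93, 59, 61, 87, 68, 85, 78, 82, 91]
print(five_number_summary(numbers))
```

The box would span the lower and upper quartiles with a line at the median, and the whiskers would reach out to the minimum and maximum.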
Drawing a Box Plot.

Question: Gemma recorded the heights in cm of girls in the same class and
constructed a box plot from the data. The box plots for both boys and girls
are shown below. Use the box plots to choose some correct statements
comparing heights of boys and girls in the class. Justify your answers.

[Figure: box plots of heights for Boys and Girls on a shared axis from 130 cm to 190 cm]

1. The girls are taller on average.
2. The boys are taller on average.
3. The girls show less variability in height.
4. The boys show less variability in height.
5. The smallest person is a girl.
6. The tallest person is a boy.
Figure 2-14 Boxplots
[Figure: boxplots for Normal, Uniform, and Skewed distributions, with examples labeled Skewed left and Skewed right]
Box Plot and Skewness
• Data is SKEWED LEFT when the longer tail is
on the left side.
• Data is SKEWED RIGHT when the longer tail is
on the right side.
Variance
• The variance is a measure of variability that uses all
the data.
• The variance is based on the difference between each
observation ($x_i$) and the mean ($\bar{x}$ for the sample
and $\mu$ for the population).
• If measuring variance of a population, it is denoted by $\sigma^2$
(“sigma-squared”).
• If measuring variance of a sample, it is denoted by $s^2$
(“s-squared”).
• Measures average squared deviation of data points
from their mean.
• Highly affected by outliers. Best for symmetric
data. A drawback is that the units are squared.
The variance is the average of the squared
differences between the observations and the
mean value

For the population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

For the sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Data set: 5, 9, 16, 17, 18
Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
Deviations from the mean: -8, -4, 3, 4, 5
Population Variance
• Average of the squared deviations from the
arithmetic mean

X       X − μ    (X − μ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Sum     0        130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$

Population Standard Deviation
• Square root of the variance

X       X − μ    (X − μ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Sum     0        130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$

$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} \approx 5.1$
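The population variance and standard deviation for the data set 5, 9, 16, 17, 18 can be reproduced column by column. This is a minimal sketch that mirrors the table; the variable names are illustrative.

```python
import math

data = [5, 9, 16, 17, 18]
n = len(data)

mu = sum(data) / n                        # population mean: 65 / 5
deviations = [x - mu for x in data]       # X - mu column
squared = [d ** 2 for d in deviations]    # (X - mu)^2 column

variance = sum(squared) / n               # sum of squares divided by N
sd = math.sqrt(variance)                  # square root of the variance

print("mean:", mu)
print("sum of deviations:", sum(deviations))
print("sum of squares:", sum(squared))
print("variance:", variance, "sd:", round(sd, 1))
```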
CALCULATIONAL FORMULA FOR STANDARD
DEVIATION
• FORMULA 2.3 SHOULD BE USED IF
THE GROUP TESTED IS VIEWED AS
THE GROUP OF INTEREST;
CONSIDERED THEN THE
POPULATION (E.G., CALCULATING
STANDARD DEVIATION OF THE 50-M
SWIM TIMES AT A SWIM MEET )

• X = SCORES
• N = NUMBER OF SCORES
• FORMULA TYPICALLY USED FOR
HAND CALCULATION
CALCULATIONAL FORMULA FOR STANDARD
DEVIATION
• FORMULA 2.4 SHOULD BE USED IF THE
GROUP TESTED IS VIEWED AS A
REPRESENTATIVE PART OF THE POPULATION;
CONSIDERED THEN A SAMPLE
• STANDARD DEVIATION CALCULATED ON THE
SAMPLE IS USED AS AN ESTIMATE OF THE
POPULATION STANDARD DEVIATION (E.G.,
CALCULATION OF THE STANDARD DEVIATION
OF THE 40-YARD TIME OF COLLEGE WIDE
RECEIVERS THAT IS USED AS AN ESTIMATION
OF THE STANDARD DEVIATION OF ALL
COLLEGE WIDE RECEIVERS)
• X = SCORES
• N = NUMBER OF SCORES
• FORMULA TYPICALLY USED FOR HAND
CALCULATION
SAMPLE CALCULATION OF THE STANDARD DEVIATION USING
FORMULA 2.3 AND 2.4 AND THE FOLLOWING TESTS SCORES: 7,
2, 7, 6, 5, 6, 2
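Formulas 2.3 and 2.4 themselves did not survive on these slides. The sketch below assumes they are the usual calculational (raw-score) forms, in which the sum of squares is computed as ΣX² − (ΣX)²/N and then divided by N for a population or by N − 1 for a sample. That assumption, and all variable names, are illustrative rather than taken from the source.

```python
import math

scores = [7, 2, 7, 6, 5, 6, 2]
n = len(scores)

sum_x = sum(scores)                        # sum of the scores
sum_x2 = sum(x * x for x in scores)        # sum of the squared scores
ss = sum_x2 - sum_x ** 2 / n               # sum of squares, raw-score form

sd_population = math.sqrt(ss / n)          # group treated as the population
sd_sample = math.sqrt(ss / (n - 1))        # group treated as a sample

print("sum X:", sum_x, "sum X^2:", sum_x2, "SS:", ss)
print("population SD:", sd_population)
print("sample SD:", round(sd_sample, 2))
```

Under this assumption the sample value is slightly larger than the population value, because the same sum of squares is divided by a smaller number.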
Standard Deviation
• The Standard Deviation of a data set is the square
root of the variance.
• The standard deviation is measured in the same units
as the data, making it easy to interpret.
Computing a standard deviation
For the population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

For the sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Calculating the Standard Deviation
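• Why only conceptually mean of deviation
scores?
• If the scores Xi are 1, 2, 3, 4, 5, then μ = 3 and the
deviation scores Xi − μ are -2, -1, 0, 1, 2.
• What is the mean deviation?
• Σ(Xi − μ) = 0, so the mean of the deviation scores is always zero.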
4 Steps to Standard Deviation
• 1. Calculate deviation scores: $X_i - \mu$
• 2. Sums of squared deviations
– Or sums of squares (SS): $SS = \sum (X_i - \mu)^2$
• 3. Variance
– mean of squared deviations (MS): $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
• 4. Standard deviation
– square root of variance: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
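The four steps can be traced on the small set of scores 1, 2, 3, 4, 5. This is only an illustrative sketch with made-up variable names; it also shows why the plain mean of the deviation scores is not useful, since the deviations always sum to zero.

```python
import math

scores = [1, 2, 3, 4, 5]
n = len(scores)
mu = sum(scores) / n

# Step 1: deviation scores (these always sum to zero)
deviations = [x - mu for x in scores]

# Step 2: sum of squared deviations (SS)
ss = sum(d ** 2 for d in deviations)

# Step 3: variance, the mean of the squared deviations (MS)
variance = ss / n

# Step 4: standard deviation, the square root of the variance
sd = math.sqrt(variance)

print(deviations, sum(deviations))
print("SS:", ss, "variance:", variance, "SD:", round(sd, 3))
```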
Coefficient of Variation
Ratio of sample standard deviation to sample mean
multiplied by 100.
Measures relative variability, that is, variability relative to
the magnitude of the data.
Unitless, so good for comparing variation between two
groups.

$\frac{\sigma}{\mu} \times 100$   For the population

$\frac{s}{\bar{x}} \times 100$   For the sample
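A short sketch of the sample coefficient of variation follows. The height values are hypothetical and are there only to show that the result does not depend on the unit of measurement; the function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample CV in percent: (s / x-bar) * 100."""
    return stdev(sample) / mean(sample) * 100

heights_cm = [150, 160, 170, 180]            # hypothetical data, in centimetres
heights_in = [h / 2.54 for h in heights_cm]  # the same heights in inches

print(round(coefficient_of_variation(heights_cm), 2))
print(round(coefficient_of_variation(heights_in), 2))  # same value: CV is unitless
```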
Level Of Measurement & Variability
• Which can be used?
• nominal
– none
• ordinal
– range only
• interval/ratio
– all 3 OK
– range, standard deviation, & variance
Standard Deviation (SD)
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
• Conceptually, the mean deviation score for all data
– Gives width (dispersion) of distribution
• Describing a distribution
– Report mean & standard deviation
– μ, σ
Samples & Variability
• Usually study samples to learn about populations
– Sampling introduces error
– Change symbols & formula

$SS = \sum (X - \bar{X})^2$

$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
Samples: Degrees of Freedom (df)
• df = N – 1
– For a single sample (or group)
• s tends to underestimate σ
– Fewer Xi used to calculate
– Dividing by N − 1 boosts value of s
• Also used for
– Confidence intervals for sample means
– Critical values in hypothesis testing
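The effect of dividing by N − 1 can be illustrated exactly on a tiny population by listing every possible sample of size 2 drawn with replacement. This is a sketch under that sampling assumption, shown for the variance rather than the standard deviation; the helper `var` and its `ddof` argument are illustrative names.

```python
from itertools import product

population = [5, 9, 16, 17, 18]

def var(values, ddof):
    """Variance with divisor len(values) - ddof."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - ddof)

# All 25 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))

avg_divide_by_n = sum(var(s, 0) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(var(s, 1) for s in samples) / len(samples)

print("population variance:", var(population, 0))
print("average sample variance, dividing by N:    ", avg_divide_by_n)
print("average sample variance, dividing by N - 1:", avg_divide_by_n_minus_1)
```

Averaged over all samples, dividing by N comes out well below the population variance, while dividing by N − 1 matches it.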
Properties of the
Standard Deviation
• If a constant is added to every score in a distribution,
the standard deviation will not be changed.
• If you visualize the scores in a frequency distribution
histogram, then adding a constant will move each
score so that the entire distribution is shifted to a
new location.
• The center of the distribution (the mean) changes,
but the standard deviation remains the same.

Properties of the
Standard Deviation (cont.)
• If each score is multiplied by a constant, the
standard deviation will be multiplied by the
same constant.
• Multiplying by a constant will multiply the
distance between scores, and because the
standard deviation is a measure of distance, it
will also be multiplied.
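Both properties can be checked numerically. The sketch below uses `statistics.pstdev` (the population standard deviation) on an arbitrary set of scores; the constants 10 and 3 are chosen only for illustration.

```python
from statistics import mean, pstdev

scores = [5, 9, 16, 17, 18]
shifted = [x + 10 for x in scores]   # add a constant to every score
scaled = [x * 3 for x in scores]     # multiply every score by a constant

print(mean(scores), round(pstdev(scores), 3))
print(mean(shifted), round(pstdev(shifted), 3))  # mean moves by 10, SD unchanged
print(mean(scaled), round(pstdev(scaled), 3))    # mean and SD both multiplied by 3
```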

The Mean and Standard Deviation as
Descriptive Statistics
• If you are given numerical values for the mean
and the standard deviation, you should be
able to construct a visual image (or a sketch)
of the distribution of scores.
• As a general rule, about 70% of the scores will
be within one standard deviation of the mean,
and about 95% of the scores will be within a
distance of two standard deviations of the
mean.
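The "about 70%" and "about 95%" figures are rules of thumb. As a rough check, and assuming the scores follow a normal distribution (an assumption not stated on the slide), the exact shares can be computed with `statistics.NormalDist`; the function name `share_within` is illustrative.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

print(round(share_within(1), 3))  # roughly 0.68, i.e. "about 70%"
print(round(share_within(2), 3))  # roughly 0.95
```

For strongly skewed data these shares can differ noticeably, so the rule is best read as a guide for sketching roughly symmetric distributions.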

Choosing Appropriate
Measure of Variability
• If data are symmetric, with no serious outliers,
use range and standard deviation.
• If data are skewed, and/or have serious
outliers, use IQR.
• If comparing variation across two data sets,
use coefficient of variation.
