[Figure: data sets with variability and with no variability; quartiles Q1, Q2, Q3]
The location of quartiles
lower quartile = $\frac{n+1}{4}$th value
median = $\frac{n+1}{2}$th value
upper quartile = $\frac{3(n+1)}{4}$th value
Deciles D1, D2, ..., D9 divide the ordered data into ten parts, each containing 10% of the observations.
Finding the median, quartiles and inter-quartile range.
Example 2: Find the median and quartiles for the data below.
6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10
Ordered data: 3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10
Inter-Quartile Range = 10 - 4 = 6
Finding the median, quartiles and inter-quartile range.
Example 1: Find the median and quartiles for the data below.
12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10
Ordered data: 4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower Quartile (Q1) = 5½
Median (Q2) = 8
Upper Quartile (Q3) = 9
Inter-Quartile Range = 9 - 5½ = 3½
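The two examples above can be checked with a short Python sketch. It assumes the "median of each half" convention, which reproduces the values shown on the slides; the function name `quartiles` is only illustrative, and other quartile conventions can give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above it (middle value skipped when n is odd)
    return median(lower), median(data), median(upper)

example_1 = [12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10]
example_2 = [6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10]

for name, values in (("Example 1", example_1), ("Example 2", example_2)):
    q1, q2, q3 = quartiles(values)
    print(name, "Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", q3 - q1)
```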
Range
Range is the simplest measure of variability.
Properties of Range
Only two values are used in its calculation.
It is influenced by extreme values (outliers): it is highly
sensitive to the largest and smallest values.
It is easy to compute and understand.
• It is the difference between the largest and smallest values
in the sample.
• Used when the measure of central tendency is the mode
(nominal data or when the most frequent score is of
interest) or the median (ordinal data or skewed data).
It is best for symmetric data with no outliers.
Example: Apartment Rents
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
The interquartile range (IQR)
• It is the range of the data that contains the
middle 50% of cases.
• Recall that you find the range by subtracting the
minimum value from the maximum value in the
dataset. You calculate the IQR in a similar way,
except that you find the difference between the 3rd
quartile (Q3) and the 1st quartile (Q1).
• Therefore, IQR = Q3 - Q1
Interquartile Range
The interquartile range of a data set is the difference
between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
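As a rough check, the interquartile range of the 70 apartment rents can be computed as follows. The sketch assumes the median-of-halves convention for quartiles (the slides do not say which convention is intended), and the names `rents` and `quartiles` are illustrative only.

```python
from statistics import median

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    return median(data[:half]), median(data[half + n % 2:])

q1, q3 = quartiles(rents)
iqr = q3 - q1                         # spread of the middle 50% of rents
value_range = max(rents) - min(rents) # full spread, for comparison
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr, "Range =", value_range)
```

Under this convention the IQR is much smaller than the range, because it ignores the lowest and highest quarters of the rents.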
• Example
• A recent study proclaimed ALDAMAZIN the “wettest” city in Sudan.
The following table lists the approximate annual rainfall in
ALDAMAZIN for the last 10 years. Find the range and IQR
for this data.

Year    Rainfall (inches)
1998    90
1999    56
2000    60
2001    59
2002    74
2003    76
2004    81
2005    91
2006    47
2007    59

• First, place the data in order from smallest to
largest. The range is the difference between
the minimum and maximum rainfall amounts.
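A minimal sketch of the rainfall calculation is given below. It assumes the median-of-halves convention for the quartiles; the slides do not state which convention to use, and the (n + 1)/4 position rule would give a somewhat different IQR. The variable names are illustrative.

```python
from statistics import median

rainfall = {1998: 90, 1999: 56, 2000: 60, 2001: 59, 2002: 74,
            2003: 76, 2004: 81, 2005: 91, 2006: 47, 2007: 59}

# Step 1: place the data in order from smallest to largest.
ordered = sorted(rainfall.values())

# Step 2: range = maximum - minimum.
value_range = ordered[-1] - ordered[0]

# Step 3: quartiles as medians of the lower and upper halves (assumed convention).
n = len(ordered)
half = n // 2
q1 = median(ordered[:half])
q3 = median(ordered[half + n % 2:])
iqr = q3 - q1

print("Ordered:", ordered)
print("Range =", value_range, "Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```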
The Five-Number Summary
Question: Gemma recorded the heights in cm of girls in the same class and
constructed a box plot from the data. The box plots for both boys and girls
are shown below. Use the box plots to choose some correct statements
comparing heights of boys and girls in the class. Justify your answers.
[Figure: box plots of heights for Boys and Girls]
1. The girls are taller on average.
2. The boys are taller on average.
3. The girls show less variability in height.
4. The boys show less variability in height.
5. The smallest person is a girl.
6. The tallest person is a boy.
Figure 2-14 Boxplots (panels: Normal, Uniform, Skewed; skewed-left and skewed-right examples)
Box Plot and Skewness
• Data is SKEWED LEFT when the longer tail is
on the left side.
• Data is SKEWED RIGHT when the longer tail is
on the right side.
Variance
• The variance is a measure of variability that uses all
the data.
• The variance is based on the difference between each
observation ($x_i$) and the mean ($\bar{x}$ for the sample
and $\mu$ for the population).
• If measuring variance of a population, it is denoted by $\sigma^2$
(“sigma-squared”).
• If measuring variance of a sample, it is denoted by $s^2$
(“s-squared”).
• Measures average squared deviation of data points
from their mean.
• Highly affected by outliers. Best for symmetric
data. One drawback is that the units are squared.
The variance is the average of the squared
differences between the observations and the
mean value
For the population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
For the sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Data set: 5, 9, 16, 17, 18
Mean = $\frac{\sum X}{N} = \frac{65}{5} = 13$
Deviations from the mean: -8, -4, 3, 4, 5
Population Variance
• Average of the squared deviations from the
arithmetic mean

X      X - μ    (X - μ)²
5      -8       64
9      -4       16
16     +3       9
17     +4       16
18     +5       25
Sum    0        130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
$\sigma = \sqrt{26.0} \approx 5.1$

© 2002 Thomson / South-Western
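The table above can be reproduced step by step in Python. This is only a sketch of the arithmetic; the sample variance (dividing by n - 1) is included for comparison and is not part of the worked table.

```python
data = [5, 9, 16, 17, 18]
n = len(data)

mean = sum(data) / n                     # 65 / 5
deviations = [x - mean for x in data]    # each X minus the mean
squared = [d ** 2 for d in deviations]   # squared deviations
sum_of_squares = sum(squared)

population_variance = sum_of_squares / n        # divide by N
sample_variance = sum_of_squares / (n - 1)      # divide by n - 1 (for comparison)

print("mean =", mean)
print("deviations =", deviations, "sum =", sum(deviations))
print("squared deviations =", squared, "sum =", sum_of_squares)
print("population variance =", population_variance)
print("sample variance =", sample_variance)
```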
CALCULATIONAL FORMULA FOR STANDARD
DEVIATION
• FORMULA 2.3 SHOULD BE USED IF
THE GROUP TESTED IS VIEWED AS
THE GROUP OF INTEREST; THE GROUP
IS THEN CONSIDERED THE
POPULATION (E.G., CALCULATING THE
STANDARD DEVIATION OF THE 50-M
SWIM TIMES AT A SWIM MEET)
• X = SCORES
• N = NUMBER OF SCORES
• FORMULA TYPICALLY USED FOR
HAND CALCULATION
CALCULATIONAL FORMULA FOR STANDARD
DEVIATION
• FORMULA 2.4 SHOULD BE USED IF THE
GROUP TESTED IS VIEWED AS A
REPRESENTATIVE PART OF THE POPULATION;
THE GROUP IS THEN CONSIDERED A SAMPLE
• THE STANDARD DEVIATION CALCULATED ON THE
SAMPLE IS USED AS AN ESTIMATE OF THE
POPULATION STANDARD DEVIATION (E.G.,
CALCULATING THE STANDARD DEVIATION
OF THE 40-YARD TIMES OF COLLEGE WIDE
RECEIVERS TO ESTIMATE
THE STANDARD DEVIATION OF ALL
COLLEGE WIDE RECEIVERS)
• X = SCORES
• N = NUMBER OF SCORES
• FORMULA TYPICALLY USED FOR HAND
CALCULATION
SAMPLE CALCULATION OF THE STANDARD DEVIATION USING
FORMULAS 2.3 AND 2.4 AND THE FOLLOWING TEST SCORES: 7,
2, 7, 6, 5, 6, 2
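Formulas 2.3 and 2.4 themselves are not reproduced in the text above. The sketch below assumes they are the usual raw-score ("calculational") formulas, in which the sum of squares is computed as ΣX² - (ΣX)²/N and then divided by N for a population or by N - 1 for a sample. That assumption, and all variable names, are illustrative rather than taken from the source.

```python
import math

scores = [7, 2, 7, 6, 5, 6, 2]
n = len(scores)

sum_x = sum(scores)                      # ΣX
sum_x2 = sum(x ** 2 for x in scores)     # ΣX²

# Raw-score sum of squares: ΣX² - (ΣX)² / N  (assumed form of the formulas)
sum_of_squares = sum_x2 - sum_x ** 2 / n

population_sd = math.sqrt(sum_of_squares / n)        # group is the population
sample_sd = math.sqrt(sum_of_squares / (n - 1))      # group is a sample

print("ΣX =", sum_x, "ΣX² =", sum_x2, "SS =", sum_of_squares)
print("population SD =", round(population_sd, 2))
print("sample SD =", round(sample_sd, 2))
```

The sample value is slightly larger than the population value because the same sum of squares is divided by a smaller number.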
Standard Deviation
• The Standard Deviation of a data set is the square
root of the variance.
• The standard deviation is measured in the same units
as the data, making it easy to interpret.
Computing a standard deviation
For the population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
For the sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Calculating the Standard Deviation
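• Why is the standard deviation only conceptually the mean of the
deviation scores?
• Take the scores $X_i$ = 1, 2, 3, 4, 5 (mean $\mu$ = 3); the deviation
scores $X_i - \mu$ are -2, -1, 0, 1, 2.
• What is the mean deviation?
• $\sum (X_i - \mu) = 0$, so the mean of the deviation scores is always 0.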
4 Steps to Standard Deviation
• 1. Calculate deviation scores: $X_i - \mu$
• 2. Sum of squared deviations
– Or sum of squares (SS): $SS = \sum (X_i - \mu)^2$
• 3. Variance
– mean of squared deviations (MS): $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
• 4. Standard deviation: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
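The four steps can be sketched in Python as follows. The scores 1 to 5 are used purely as illustrative data, and the group is treated as a population (dividing by N).

```python
import math

scores = [1, 2, 3, 4, 5]
n = len(scores)
mu = sum(scores) / n

# Step 1: deviation scores
deviations = [x - mu for x in scores]

# Step 2: sum of squared deviations (SS)
ss = sum(d ** 2 for d in deviations)

# Step 3: variance = mean of squared deviations (MS)
variance = ss / n

# Step 4: standard deviation = square root of the variance
sd = math.sqrt(variance)

print("deviations =", deviations, "(sum =", sum(deviations), ")")
print("SS =", ss, "variance =", variance, "SD =", round(sd, 3))
```

The deviations sum to zero, which is why they are squared before averaging.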
Coefficient of Variation
Ratio of sample standard deviation to sample mean
multiplied by 100.
Measures relative variability, that is, variability relative to
the magnitude of the data.
Unitless, so good for comparing variation between two
groups.
For the population: $\frac{\sigma}{\mu} \times 100$
For the sample: $\frac{s}{\bar{x}} \times 100$
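A short sketch of the sample coefficient of variation is shown below. The data set and the rescaled copy (the same values multiplied by 10, as if measured in a different unit) are illustrative assumptions chosen to show that the result does not depend on the unit of measurement.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV in percent: (s / x-bar) * 100."""
    return stdev(values) / mean(values) * 100

data = [5, 9, 16, 17, 18]
rescaled = [x * 10 for x in data]   # same data in a unit ten times smaller

cv_data = coefficient_of_variation(data)
cv_rescaled = coefficient_of_variation(rescaled)

print("CV (original) =", round(cv_data, 2), "%")
print("CV (rescaled) =", round(cv_rescaled, 2), "%")
```

Both values agree, whereas the standard deviations of the two lists differ by a factor of 10.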
Level Of Measurement & Variability
• Which can be used?
• nominal
– none
• ordinal
– range only
• interval/ratio
– all 3 OK
– range, standard deviation, & variance
Standard Deviation (SD)
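Population sum of squares: $\sum (X_i - \mu)^2$
Sample sum of squares: $SS = \sum (X - \bar{X})^2$
Sample standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
Sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$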
Samples: Degrees of Freedom (df)
• df = N – 1
– For a single sample (or group)
• s tends to underestimate σ
– Fewer Xi used to calculate
– Dividing by N-1 boosts value of s
• Also used for
– Confidence intervals for sample means
– Critical values in hypothesis testing
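The effect of dividing by N - 1 can be illustrated with a small exhaustive check. The sketch below uses an assumed toy population (1 to 5) and lists every possible sample of size 2 drawn with replacement. It compares variances rather than standard deviations, since the correction is exact for the variance.

```python
from itertools import product

population = [1, 2, 3, 4, 5]

def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / divisor

sigma_squared = variance(population, len(population))   # true population variance

# All 25 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))

mean_dividing_by_n = sum(variance(s, 2) for s in samples) / len(samples)
mean_dividing_by_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print("population variance        =", sigma_squared)
print("average when dividing by N   =", mean_dividing_by_n)
print("average when dividing by N-1 =", mean_dividing_by_n_minus_1)
```

On average, dividing by N gives a value that is too small, while dividing by N - 1 recovers the population variance.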
Properties of the
Standard Deviation
• If a constant is added to every score in a distribution,
the standard deviation will not be changed.
• If you visualize the scores in a frequency distribution
histogram, then adding a constant will move each
score so that the entire distribution is shifted to a
new location.
• The center of the distribution (the mean) changes,
but the standard deviation remains the same.
Properties of the
Standard Deviation (cont.)
• If each score is multiplied by a constant, the
standard deviation will be multiplied by the
same constant.
• Multiplying by a constant will multiply the
distance between scores, and because the
standard deviation is a measure of distance, it
will also be multiplied.
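Both properties can be checked numerically. The scores and the constants (add 10, multiply by 3) in this sketch are illustrative assumptions.

```python
import math
from statistics import mean, pstdev

scores = [1, 2, 3, 4, 5]
shifted = [x + 10 for x in scores]   # constant added to every score
scaled = [x * 3 for x in scores]     # every score multiplied by a constant

print("original:", mean(scores), round(pstdev(scores), 3))
print("shifted: ", mean(shifted), round(pstdev(shifted), 3))
print("scaled:  ", mean(scaled), round(pstdev(scaled), 3))
```

Adding 10 moves the mean but leaves the standard deviation unchanged; multiplying by 3 multiplies both the mean and the standard deviation by 3.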
The Mean and Standard Deviation as
Descriptive Statistics
• If you are given numerical values for the mean
and the standard deviation, you should be
able to construct a visual image (or a sketch)
of the distribution of scores.
• As a general rule, about 70% of the scores will
be within one standard deviation of the mean,
and about 95% of the scores will be within a
distance of two standard deviations of the
mean.
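For a distribution that is exactly normal (an assumption; the rule above is stated only as a general guide), the proportions can be computed with Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

within_one_sd = z.cdf(1) - z.cdf(-1)    # proportion within 1 SD of the mean
within_two_sd = z.cdf(2) - z.cdf(-2)    # proportion within 2 SDs of the mean

print("within 1 SD:", round(within_one_sd, 3))
print("within 2 SD:", round(within_two_sd, 3))
```

The results are roughly 68% and 95%, consistent with the "about 70%" and "about 95%" figures quoted above.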
Choosing Appropriate
Measure of Variability
• If data are symmetric, with no serious outliers,
use range and standard deviation.
• If data are skewed, and/or have serious
outliers, use IQR.
• If comparing variation across two data sets,
use coefficient of variation.