
ESSENTIAL STATISTICS 2E

William Navidi and Barry Monk

McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or further distribution permitted without the prior written consent of McGraw-Hill Education.
Measures of Position

Section 3.3

Objectives
1. Compute and interpret z-scores
2. Compute the quartiles of a data set
3. Compute the percentiles of a data set
4. Compute the five-number summary for a data set
5. Understand the effects of outliers
6. Construct boxplots to visualize the five-number summary
and outliers

Objective 1
Compute and interpret z-scores

z-Score
Who is taller, a man 73 inches tall or a woman 68 inches tall? The
obvious answer is that the man is taller. However, men are taller
than women on average.

Suppose the question is asked this way: Who is taller relative to
their gender, a man 73 inches tall or a woman 68 inches tall?

One way to answer this question is with a z-score.

Interpreting z-Scores
The z-score of an individual data value tells how many standard
deviations that value is from its population mean.

For example, a value one standard deviation above the mean
has a z-score of $z = 1$, and a value two standard deviations
below the mean has a z-score of $z = -2$.

Let $x$ be a value from a population with mean $\mu$ and
standard deviation $\sigma$. The z-score for $x$ is
$z = \frac{x - \mu}{\sigma}$.

Example: z-Score
A National Center for Health Statistics study states that the mean
height for adult men in the U.S. is $\mu = 69.4$ inches, with a standard
deviation of $\sigma = 3.1$ inches. The mean height for adult women is
$\mu = 63.8$ inches, with a standard deviation of $\sigma = 2.8$ inches. Who
is taller relative to their gender, a man 73 inches tall or a woman 68
inches tall?
We compute:
Man's height: $z = \frac{x - \mu}{\sigma} = \frac{73 - 69.4}{3.1} = 1.16$
Woman's height: $z = \frac{x - \mu}{\sigma} = \frac{68 - 63.8}{2.8} = 1.50$

The woman is taller, relative to the population of women's heights.
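The height comparison above can be checked with a few lines of Python. This is a minimal sketch: the function name `z_score` and its parameter names are illustrative choices, and the means and standard deviations are the ones quoted in the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the population mean."""
    return (x - mean) / std_dev


# Population values quoted in the example, in inches.
man_z = z_score(73, mean=69.4, std_dev=3.1)
woman_z = z_score(68, mean=63.8, std_dev=2.8)

print(f"man:   z = {man_z:.2f}")
print(f"woman: z = {woman_z:.2f}")

# The larger z-score identifies who is taller relative to their own group.
print("taller relative to own group:", "woman" if woman_z > man_z else "man")
```

Because each height is measured against its own population, the two z-scores are on a common scale and can be compared directly.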

z-Scores & The Empirical Rule
Since the z-score is the number of standard deviations from the mean,
we can easily interpret the z-score for bell-shaped populations using
The Empirical Rule.

When a population has a histogram that is approximately bell-shaped,
then:
Approximately 68% of the data will have z-scores between $-1$ and $1$.
Approximately 95% of the data will have z-scores between $-2$ and $2$.
All, or almost all, of the data will have z-scores between $-3$ and $3$.
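The percentages above can be compared with an exactly normal population, which is an idealization of "approximately bell-shaped" and an assumption of this sketch. It uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Assumption: an exactly normal population, expressed in z-scores.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal population with z-scores between -k and k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"z-scores between -{k} and {k}: {share_within(k):.1%}")
```

Real data sets that are only roughly bell-shaped will give proportions near these values rather than exactly equal to them.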

Objective 2
Compute the quartiles of a data set

Quartiles
In a previous section, we learned how to compute the mean and
median of a data set as measures of the center. Sometimes, it is
useful to compute measures of position other than the center to get
a more detailed description of the distribution. Quartiles divide a
data set into four approximately equal pieces.

Every data set has three quartiles:
The first quartile, denoted $Q_1$, separates the lowest 25% of the
data from the highest 75%.
The second quartile, denoted $Q_2$, separates the lowest 50% of the
data from the highest 50%. $Q_2$ is the same as the median.
The third quartile, denoted $Q_3$, separates the lowest 75% of the
data from the highest 25%.

Computing Quartiles
There are several methods for computing quartiles, all of which give similar
results. The following procedure is one fairly straightforward method.

Step 1: Arrange the data in increasing order.

Step 2: Let $n$ be the number of values in the data set. To compute the
second quartile, simply compute the median. For the first or third
quartiles, proceed as follows:
For the first quartile, compute $L = 0.25n$.
For the third quartile, compute $L = 0.75n$.

Step 3: If $L$ is a whole number, the quartile is the average of the number
in position $L$ and the number in position $L + 1$.
If $L$ is not a whole number, round it up to the next higher whole
number. The quartile is the number in the position corresponding
to the rounded-up value.

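The three steps above can be sketched in Python as follows. The function name `quartile` is illustrative, and the position is computed with integer arithmetic (an implementation choice of this sketch) so that the whole-number test is not affected by floating-point round-off. The data are the student-absence counts from the outlier example later in this section.

```python
def quartile(data, which):
    """First (which=1) or third (which=3) quartile by the position method."""
    if which not in (1, 3):
        raise ValueError("use the median for the second quartile")
    values = sorted(data)                    # Step 1: increasing order
    n = len(values)
    # Step 2: the position is 0.25n or 0.75n, i.e. which * n / 4.
    whole, remainder = divmod(which * n, 4)
    if remainder == 0:
        # Step 3, whole-number position: average the values in that
        # position and the next one (positions are 1-based).
        return (values[whole - 1] + values[whole]) / 2
    # Step 3, otherwise: round up to position whole + 1, which is index whole.
    return values[whole]


absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42,
            56, 45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

print("Q1 =", quartile(absences, 1))
print("Q3 =", quartile(absences, 3))
```

With 22 values, both positions (5.5 and 16.5) are rounded up, so the quartiles are the 6th and 17th values of the sorted data.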
Example: Computing Quartiles
The following table presents the annual rainfall, in inches, in Los Angeles
during the month of February over several years. Compute the quartiles for
the data.

The data are already in increasing order. There are $n = 45$ values. For
the first quartile we compute $L = 0.25(45) = 11.25$. Since 11.25 is not
a whole number, we round it up to 12. The first quartile is the number in
the 12th position, which is 0.92. $Q_1$ = 1st quartile = 0.92
For the second quartile, we compute the median using the methods
previously presented. The median is 3.12. $Q_2$ = 2nd quartile = 3.12
For the third quartile we compute $L = 0.75(45) = 33.75$. Since 33.75 is
not a whole number, we round it up to 34. The third quartile is the
number in the 34th position, which is 4.94. $Q_3$ = 3rd quartile = 4.94

Quartiles on the TI-84 PLUS
The 1-Var Stats command in the TI-84 PLUS
Calculator displays a list of the most common
parameters and statistics for a given data set.
This command is accessed by pressing STAT and
then highlighting the CALC menu.
The 1-Var Stats command returns the following quantities.
x̄ The mean
Sx The sample standard deviation
σx The population standard deviation
minX The minimum data value
Q1 The first quartile
Med The median
Q3 The third quartile
maxX The maximum data value

Example: Computing Quartiles on the TI-84
The following table presents the annual rainfall, in inches, in Los Angeles
during the month of February over several years. Compute the quartiles for
the data.

Step 1: Enter the data in L1.

Step 2: Press STAT and highlight the CALC menu.

Step 3: Run the 1-Var Stats command.

The quartile values produced by the TI-84 PLUS may differ from results
obtained by hand because it uses a slightly different procedure.

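To see how much two quartile procedures can disagree, the sketch below applies the two methods offered by `statistics.quantiles` in Python's standard library to the student-absence counts from the outlier example later in this section. Python is used here only as a stand-in for "a different procedure"; neither method is claimed to reproduce the TI-84 PLUS calculation.

```python
from statistics import quantiles

absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42,
            56, 45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

# Two interpolation conventions for the same three cut points.
exclusive = quantiles(absences, n=4, method="exclusive")
inclusive = quantiles(absences, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

The medians agree, but the first and third quartiles differ between the two conventions, which is why results from software and from a by-hand procedure should not be expected to match exactly.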
Visualizing the Quartiles
Following is a dotplot of the Los Angeles rainfall data with
the quartiles indicated. The quartiles divide the data set
into four parts, with approximately 25% of the data in each
part.

Objective 3
Compute the percentiles of a data set

Percentiles
Quartiles describe the shape of a distribution by dividing it into
fourths. Sometimes it is useful to divide a data set into a greater
number of pieces to get a more detailed description of the
distribution.
Percentiles divide a data set into hundredths. For a number p
between 1 and 99, the pth percentile separates the lowest p% of the
data from the highest (100 - p)%.

Computing Percentiles
The following procedure computes the pth percentile of a data set:

Step 1: Arrange the data in increasing order.

Step 2: Let $n$ be the number of values in the data set. For the pth
percentile, compute $L = \frac{p}{100} \cdot n$.

Step 3: If $L$ is a whole number, the pth percentile is the average of
the number in position $L$ and the number in position $L + 1$.
If $L$ is not a whole number, round it up to the next higher
whole number. The pth percentile is the number in the
position corresponding to the rounded-up value.

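A minimal Python sketch of this percentile procedure follows. The function name `percentile` is illustrative, p is assumed to be a whole number from 1 to 99, and the position is computed with integer arithmetic so the whole-number test is exact. The data are the student-absence counts from the outlier example later in this section.

```python
def percentile(data, p):
    """pth percentile (whole number p from 1 to 99) by the position method."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    values = sorted(data)                    # Step 1: increasing order
    n = len(values)
    # Step 2: the position is (p / 100) * n, split into whole part and rest.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # Step 3, whole-number position: average that position and the next.
        return (values[whole - 1] + values[whole]) / 2
    # Step 3, otherwise: round up to position whole + 1, which is index whole.
    return values[whole]


absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42,
            56, 45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

print("25th percentile:", percentile(absences, 25))
print("50th percentile:", percentile(absences, 50))
print("60th percentile:", percentile(absences, 60))
```

The 25th and 50th percentiles computed this way coincide with the first quartile and the median, as the definitions require.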
Example: Computing Percentiles
The following table presents the annual rainfall, in inches, in Los Angeles
during the month of February over several years. Compute the 60th percentile
for the data.

The data are already in increasing order. There are $n = 45$ values. For the
60th percentile we compute $L = \frac{60}{100} \cdot 45 = 27$. Since 27 is a whole number,
the 60th percentile is the average of the numbers in the 27th and 28th
positions:
60th percentile = (number in 27th position + number in 28th position) / 2

Computing a Percentile from a Given Data Value
Sometimes we are given a value from a data set and wish to compute
the percentile corresponding to that value. Following is the
procedure for doing this:

Step 1: Arrange the data in increasing order.

Step 2: Let $x$ be the data value whose percentile is to be computed.
Use the following formula to compute the percentile:

$\text{Percentile} = 100 \cdot \frac{(\text{number of values less than } x) + 0.5}{\text{number of values in the data set}}$

Round the result to the nearest whole number. This is the
percentile corresponding to the value $x$.

Example: Percentile of a Given Data Value
The following table presents the annual rainfall, in inches, in Los
Angeles during the month of February over several years. One year,
the rainfall was 1.90. What percentile does this correspond to?

The data are already in increasing order. There are $n = 45$ values in
the data set. There are 17 values less than 1.90.

$\text{Percentile} = 100 \cdot \frac{17 + 0.5}{45} = 38.9$

We round the result to 39. The value 1.90 corresponds to the 39th
percentile.
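The calculation above can be sketched in Python. The names `percentile_rank` and `percentile_of_value` are illustrative, and rounding exact halves upward is an assumption of this sketch, since the procedure only says to round to the nearest whole number.

```python
import math


def percentile_rank(count_below, n):
    """Percentile of a value, given how many of the n values are less than it."""
    raw = 100 * (count_below + 0.5) / n
    return math.floor(raw + 0.5)  # assumption: exact halves round upward


def percentile_of_value(data, x):
    """Count the values less than x, then apply the formula."""
    count_below = sum(1 for value in data if value < x)
    return percentile_rank(count_below, len(data))


# Rainfall example: 17 of the 45 values are less than 1.90.
print(percentile_rank(17, 45))  # 39

# Student-absence counts from the outlier example later in this section.
absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42,
            56, 45, 77, 44, 42, 45, 46, 100, 59, 53, 51]
print(percentile_of_value(absences, 59))
```

The first call needs only the two counts stated in the example; the second shows the same formula applied directly to a list of values.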

Objective 4
Compute the five-number summary for a data
set

Five-Number Summary
The five-number summary of a data set consists of the median, the
first quartile, the third quartile, the smallest value, and the largest
value. These values are generally arranged in order.
The five-number summary of a data set consists of the following
quantities.

Minimum, First Quartile, Median, Third Quartile, Maximum

Example: Five Number Summary
The following table presents the annual rainfall, in inches, in Los
Angeles during the month of February over several years. Compute
the five-number summary.

We previously computed the quartiles:
$Q_1 = 0.92$, Med = $Q_2 = 3.12$, $Q_3 = 4.94$

The minimum and maximum values are:
Minimum = 0.00, Maximum = 13.68

The five-number summary is given by:
0.00, 0.92, 3.12, 4.94, 13.68

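A five-number summary can be assembled in Python along the same lines. This sketch uses the student-absence counts from the outlier example later in this section, since those values are listed in full. The helper names are illustrative, and the quartiles follow the by-hand position method from this section, so other software may report slightly different values.

```python
from statistics import median


def position_quartile(values, which):
    """Q1 (which=1) or Q3 (which=3) of sorted values by the position method."""
    whole, remainder = divmod(which * len(values), 4)
    if remainder == 0:
        return (values[whole - 1] + values[whole]) / 2
    return values[whole]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    return (values[0],
            position_quartile(values, 1),
            median(values),
            position_quartile(values, 3),
            values[-1])


absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42,
            56, 45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

print(five_number_summary(absences))
```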
Example: Five Number Summary on the TI-84
The following table presents the annual rainfall, in inches, in Los
Angeles during the month of February over several years. Compute
the five-number summary.

When using the TI-84 PLUS Calculator, the five-number summary is given by
the 1-Var Stats command.

Objective 5
Understand the effects of outliers

Outliers
An outlier is a value that is considerably larger or
considerably smaller than most of the values in a data set.
Some outliers result from errors; for example, a misplaced
decimal point may cause a number to be much larger or
smaller than the other values in a data set. Some outliers
are correct values, and simply reflect the fact that the
population contains some extreme values.

Example: Outliers
The temperature in a downtown location is measured for
eight consecutive days during the summer. The readings, in
degrees Fahrenheit, are
81.2 85.6 89.3 91.0 83.2 8.45 79.5 87.8
Which reading is an outlier? Is the outlier an error or is it
possible that it is correct?

Solution:
The outlier is 8.45. It certainly is an error, likely resulting
from a misplaced decimal point. The outlier should be
corrected if possible.

Interquartile Range
One method for detecting outliers involves a measure
called the interquartile range (IQR).

The interquartile range is found by subtracting the first
quartile from the third quartile.
$\text{IQR} = Q_3 - Q_1$

IQR Method for Detecting Outliers
The most frequently used method for detecting outliers in a data set is the
IQR method.
Step 1: Find the first quartile, $Q_1$, and the third quartile, $Q_3$.

Step 2: Compute the interquartile range: $\text{IQR} = Q_3 - Q_1$.

Step 3: Compute the outlier boundaries. These boundaries are the
cutoff points for determining outliers.
Lower outlier boundary = $Q_1 - 1.5(\text{IQR})$
Upper outlier boundary = $Q_3 + 1.5(\text{IQR})$

Step 4: Any data value that is less than the lower outlier boundary
or greater than the upper outlier boundary is considered to
be an outlier.

Example: Identifying Outliers
The following table presents the number of students absent in a middle
school in northwestern Montana for each school day in January. Identify any
outliers.
65 67 71 57 51 49 44 41 59 49 42
56 45 77 44 42 45 46 100 59 53 51
We may use the TI-84 PLUS or other technology to
compute the quartiles: $Q_1 = 45$ and $Q_3 = 59$.

Interquartile range: $\text{IQR} = Q_3 - Q_1 = 59 - 45 = 14$

Lower outlier boundary: $Q_1 - 1.5(\text{IQR}) = 45 - 1.5(14) = 24$

Upper outlier boundary: $Q_3 + 1.5(\text{IQR}) = 59 + 1.5(14) = 80$

There are no values less than the lower boundary of 24. The value 100 is
greater than the upper boundary of 80. Therefore, the value 100 is an outlier.

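The IQR method applied above can be sketched in Python. The function name `iqr_outliers` is illustrative, and the quartiles are passed in as arguments (the values 45 and 59 stated in the example) rather than recomputed, because different tools compute quartiles slightly differently.

```python
def iqr_outliers(data, q1, q3):
    """Return (lower boundary, upper boundary, list of outliers)."""
    iqr = q3 - q1                 # interquartile range
    lower = q1 - 1.5 * iqr        # lower outlier boundary
    upper = q3 + 1.5 * iqr        # upper outlier boundary
    outliers = [value for value in data if value < lower or value > upper]
    return lower, upper, outliers


absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42,
            56, 45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

lower, upper, outliers = iqr_outliers(absences, q1=45, q3=59)
print("boundaries:", lower, upper)
print("outliers:", outliers)
```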
Objective 6
Construct boxplots to visualize the five-number
summary and outliers

Boxplot
A boxplot is a graph that presents the five-number summary
along with some additional information about a data set.
There are several different kinds of boxplots. The one we
describe here is sometimes called a modified boxplot.

Example: Boxplot
The following table presents the number of students absent in a middle
school in northwestern Montana for each school day in January. Construct a
boxplot.
65 67 71 57 51 49 44 41 59 49 42
56 45 77 44 42 45 46 100 59 53 51
Step 1:
We may use the TI-84 PLUS or other technology to
compute the quartiles. $Q_1 = 45$, Med = 51, and $Q_3 = 59$.

Step 2:
We draw vertical lines at 45, 51, and 59, then
horizontal lines to complete the box.

Example: Boxplot (Continued 1)
Step 3:
We compute the outlier boundaries:
Lower Outlier Boundary = $Q_1 - 1.5(\text{IQR}) = 24$
Upper Outlier Boundary = $Q_3 + 1.5(\text{IQR}) = 80$

Step 4:
The largest data value that is less than the upper boundary is 77. We
draw a horizontal line from 59 up to 77.

Example: Boxplot (Continued 2)
Step 5:
The smallest data value that is greater than the lower boundary is 41.
We draw a horizontal line from 45 down to 41.

Step 6:
The data value 100 lies outside of the outlier boundaries. Therefore,
100 is an outlier. We plot this point separately.
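The quantities needed to draw this boxplot can be collected with a short Python sketch. The function name `boxplot_parts` and the dictionary layout are illustrative. A value exactly equal to a boundary is treated here as a non-outlier, which follows the outlier definition but is a choice of this sketch for the edge case.

```python
def boxplot_parts(data, q1, med, q3):
    """Box edges, whisker ends, and outliers for a modified boxplot."""
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # Values that are not outliers; the whiskers reach the extremes of these.
    inside = [v for v in data if lower_bound <= v <= upper_bound]
    return {
        "box": (q1, med, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in data
                           if v < lower_bound or v > upper_bound),
    }


absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42,
            56, 45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

parts = boxplot_parts(absences, q1=45, med=51, q3=59)
print(parts)
```

The whisker ends and the separately plotted point match Steps 4 through 6: the whiskers stop at 41 and 77, and 100 is drawn on its own.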

Boxplots on the TI-84 PLUS
The following steps will create a boxplot for
the student absences data on the TI-84 PLUS.

Step 1: Enter the data in L1.

Step 2: Press 2nd, Y=, then 1 to access the
Plot1 menu. Select On and the boxplot
type.

Step 3: Press Zoom, 9 to view the plot.

The Empirical Rule
When a data set has a bell-shaped histogram, it is often possible to
use the standard deviation to provide an approximate description of
the data using a rule known as The Empirical Rule.
Approximately 68% of the data will be within one standard
deviation of the mean.
Approximately 95% of the data will be within two standard
deviations of the mean.
All, or almost all, of the data will be within three standard
deviations of the mean.

Boxplots and Shape of a Data Set (Skewed Right)
Boxplots can be used to determine skewness in a data set.

If the median is closer to the first quartile than to the
third quartile, or the upper whisker is longer than the
lower whisker, the data are skewed to the right.

Boxplots and Shape of a Data Set (Skewed Left)

If the median is closer to the third quartile than to the
first quartile, or the lower whisker is longer than the
upper whisker, the data are skewed to the left.

Boxplots and Shape of a Data Set (Symmetric)

If the median is approximately halfway between the first and
third quartiles, and the two whiskers are approximately
equal in length, the data are approximately symmetric.
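The three shape rules can be combined into one rough Python sketch. Several parts are assumptions rather than part of the rules themselves: the function names, the 10% `tolerance` used to decide what counts as "approximately" equal, and the "inconclusive" result returned when the box and the whiskers point in opposite directions. The inputs are the whisker ends and quartiles from the student-absence boxplot.

```python
def compare(a, b, tolerance):
    """Return 1 if a is clearly larger than b, -1 if clearly smaller, else 0."""
    scale = max(a, b)
    if scale == 0 or abs(a - b) <= tolerance * scale:
        return 0
    return 1 if a > b else -1


def boxplot_shape(low, q1, med, q3, high, tolerance=0.1):
    """Classify shape from whisker ends (low, high) and the quartiles."""
    box = compare(q3 - med, med - q1, tolerance)        # 1: median nearer Q1
    whiskers = compare(high - q3, q1 - low, tolerance)  # 1: upper whisker longer
    if box == 0 and whiskers == 0:
        return "approximately symmetric"
    if box >= 0 and whiskers >= 0:
        return "skewed right"
    if box <= 0 and whiskers <= 0:
        return "skewed left"
    return "inconclusive"  # assumption: the two signals disagree


print(boxplot_shape(low=41, q1=45, med=51, q3=59, high=77))
```

For the absence data the median is nearer the first quartile and the upper whisker is the longer one, so both signals point the same way.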

You Should Know . . .
How to compute and interpret z-scores
How to compute the quartiles of a data set
How to compute a percentile of a data set
How to compute the percentile corresponding to a given data value
How to find the five-number summary for a data set
How to determine outliers using the IQR method
How to construct a boxplot and use it to determine skewness
