
Chapter 3

Numerically Describing Data from One Variable


3.1 Measures of Central Tendency
1. A statistic is resistant if it is not sensitive to extreme data values. The median is resistant
because it is a positional measure of central tendency and increasing the largest value or
decreasing the smallest value does not affect the position of the center. The mean is not
resistant because it is a function of the sum of the data values. Changing the magnitude of
one value changes the sum of the values, and thus affects the mean. The mode is a
resistant measure of center.
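To see this resistance concretely, here is a minimal Python sketch (the data values are made up for the illustration, not taken from an exercise): inflating the largest observation moves the mean substantially but leaves the median unchanged.

import statistics

data = [4, 8, 10, 13, 20]        # hypothetical sample
extreme = [4, 8, 10, 13, 200]    # same sample with the largest value inflated

print(statistics.mean(data), statistics.median(data))        # mean 11, median 10
print(statistics.mean(extreme), statistics.median(extreme))  # mean 47, median still 10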
2. The mean and the median are approximately equal when the data are symmetric. If the mean is significantly greater than the median, the data are skewed right. If the mean is significantly less than the median, the data are skewed left.
3. Since the distribution of household incomes in the United States is skewed to the right, the
mean is greater than the median. Thus, the mean household income is $55,263 and the
median is $41,349.
4. HUD uses the median because the data are skewed. Explanations will vary. One
possibility is that the price of homes has a distribution that is skewed to the right, so the
median is more representative of the typical price of a home.
5. The mean will be larger because it will be influenced by the extreme data values that are to
the right end (or high end) of the distribution.
6. (10,000 + 1)/2 = 5000.5. The median is between the 5000th and the 5001st ordered values.

7. The mode is used with qualitative data because the computations involved with the mean
and median make no sense for qualitative data.
8. parameter; statistic
9. False. A data set may have multiple modes, or it may have no mode at all.
10. False. The formula (n + 1)/2 gives the position of the median, not the value of the median.

11. x̄ = (20 + 13 + 4 + 8 + 10)/5 = 55/5 = 11

12. x̄ = (83 + 65 + 91 + 87 + 84)/5 = 410/5 = 82

13. μ = (3 + 6 + 10 + 12 + 14)/5 = 45/5 = 9



14. μ = (1 + 19 + 25 + 15 + 12 + 16 + 28 + 13 + 6)/9 = 135/9 = 15

15. Mean = 142/59 ≈ 2.4. The mean price per ad slot is approximately $2.4 million.

16. Let x represent the missing value. Since there are 6 data values in the list, the median 26.5
is between the 3rd and 4th ordered values which are 21 and x, respectively. Thus,
(21 + x)/2 = 26.5, so 21 + x = 53 and x = 32.
The missing value is 32.

17. Mean = (420 + 462 + 409 + 236)/4 = 1527/4 = $381.75
Data in order: 236, 409, 420, 462
Median = (409 + 420)/2 = 829/2 = $414.50
No data value occurs more than once, so there is no mode.

18. Mean = (35.34 + 42.09 + 39.43 + 38.93 + 43.39 + 49.26)/6 = 248.44/6 ≈ $41.41
Data in order: 35.34, 38.93, 39.43, 42.09, 43.39, 49.26
Median = (39.43 + 42.09)/2 = 81.52/2 = $40.76
No data value occurs more than once, so there is no mode.

19. Mean = (3960 + 4090 + 3200 + 3100 + 2940 + 3830 + 4090 + 4040 + 3780)/9 = 33,030/9 = 3670 psi
Data in order: 2940, 3100, 3200, 3780, 3830, 3960, 4040, 4090, 4090
Median = the 5th ordered data value = 3830 psi
Mode = 4090 psi (the only data value to occur twice)

20. Mean = (282 + 270 + 260 + 266 + 257 + 260 + 267)/7 = 1862/7 = 266 minutes
Data in order: 257, 260, 260, 266, 267, 270, 282
Median = the 4th ordered data value = 266 minutes
Mode = 260 minutes (the only data value to occur twice)
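Answers like those in Problems 17–20 can be checked with Python's statistics module; the sketch below uses the Problem 19 concrete-strength data (multimode is used because a data set may have no single mode).

import statistics

psi = [3960, 4090, 3200, 3100, 2940, 3830, 4090, 4040, 3780]   # Problem 19 data

print(statistics.mean(psi))        # 3670
print(statistics.median(psi))      # 3830
print(statistics.multimode(psi))   # [4090]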

21. (a) The histogram is skewed to the right, suggesting that the mean is greater than the median. That is, x̄ > M.
(b) The histogram is symmetric, suggesting that the mean is approximately equal to the median. That is, x̄ ≈ M.
(c) The histogram is skewed to the left, suggesting that the mean is less than the median. That is, x̄ < M.


22. (a) IV, because the distribution is symmetric (so mean ≈ median) and centered near 30.
(b) III, because the distribution is skewed to the right, so mean > median.
(c) II, because the distribution is skewed to the left, so mean < median.
(d) I, because the distribution is symmetric (so mean ≈ median) and centered near 40.

23. Los Angeles ATM fees:
Mean = (2.00 + 1.50 + 1.50 + 1.00 + 1.50 + 2.00 + 0.00 + 2.00)/8 = 11.50/8 ≈ $1.44
Data in order: 0.00, 1.00, 1.50, 1.50, 1.50, 2.00, 2.00, 2.00
Median = (1.50 + 1.50)/2 = 3.00/2 = $1.50
Mode = $1.50 and $2.00 (each value occurs three times)
New York City ATM fees:
Mean = (1.50 + 1.00 + 1.00 + 1.25 + 1.25 + 1.50 + 1.00 + 0.00)/8 = 8.50/8 ≈ $1.06
Data in order: 0.00, 1.00, 1.00, 1.00, 1.25, 1.25, 1.50, 1.50
Median = (1.00 + 1.25)/2 = 2.25/2 ≈ $1.13
Mode = $1.00 (because it occurs more often than any other value)
The ATM fees in Los Angeles appear to be higher in general than those in New York City.
All three measures of center were higher for Los Angeles than for New York.
Explanations will vary. Possibilities for the difference may be the number of ATMs
available or the amount of ATM usage in each city.
24. Reaction Time to Blue:
Mean = (0.582 + 0.481 + 0.841 + 0.267 + 0.685 + 0.45)/6 = 3.306/6 = 0.551 sec
Data in order: 0.267, 0.45, 0.481, 0.582, 0.685, 0.841
Median = (0.481 + 0.582)/2 = 1.063/2 = 0.5315 sec
No data value occurs more than once, so there is no mode.

Reaction Time to Red:
Mean = (0.408 + 0.407 + 0.542 + 0.402 + 0.456 + 0.533)/6 = 2.748/6 = 0.458 sec
Data in order: 0.402, 0.407, 0.408, 0.456, 0.533, 0.542
Median = (0.408 + 0.456)/2 = 0.864/2 = 0.432 sec
No data value occurs more than once, so there is no mode.
There is a shorter reaction time to the red screen than to the blue screen. Explanations will
vary. This information could be useful in designing warning screens for computer
software controlling critical operations (such as nuclear power plants, for one example).



25. (a) μ = (76 + 60 + 60 + 81 + 72 + 80 + 80 + 68 + 73)/9 = 650/9 ≈ 72.2 beats per minute
(b) Samples and sample means will vary.
(c) Answers will vary.

26. (a) μ = (39 + 21 + 9 + 32 + 30 + 45 + 11 + 12 + 39)/9 = 238/9 ≈ 26.4 minutes
(b) Samples and sample means will vary.
(c) Answers will vary.

27. (a) μ = (0 + 0 + 0 + 4 + 10 + 1 + 10 + 10 + 19 + 9 + 18 + 20 + 13 + 13 + 2 + 7 + 8 + 13)/18 = 157/18 ≈ 8.7 goals per year
(b) Samples and sample means will vary.
(c) Answers will vary.
28. (a) time = (91.538 + 92.552 + 86.291 + 82.087 + 83.687 + 83.601 + 86.251)/7 = 606.007/7 ≈ 86.572 hours
Data in order: 82.087, 83.601, 83.687, 86.251, 86.291, 91.538, 92.552
Median = the 4th ordered data value = 86.251 hours
(b) distance = (3687 + 3662 + 3453 + 3278 + 3427 + 3391 + 3593)/7 = 24,491/7 ≈ 3499 km
Data in order: 3278, 3391, 3427, 3453, 3593, 3662, 3687
Median = the 4th ordered data value = 3453 km
(c) margin = (7.617 + 6.033 + 6.733 + 7.283 + 1.017 + 6.317 + 4.667)/7 = 39.667/7 ≈ 5.667 minutes
Data in order: 1.017, 4.667, 6.033, 6.317, 6.733, 7.283, 7.617
Median = the 4th ordered data value = 6.317 minutes
(d) Mean winning speed = (40.28 + 39.56 + 40.02 + 39.93 + 40.94 + 40.56 + 41.65)/7 = 282.94/7 ≈ 40.420 km/h
Winning speed = total distance/total time = 24,491/606.007 ≈ 40.414 km/h
Winning speed = mean distance/mean time = 3499/86.572 ≈ 40.417 km/h

The three results agree approximately. The differences are due to rounding.
29. The distribution is relatively symmetric as is evidenced by both the histogram and the fact
that the mean and median are approximately equal. Therefore, the mean is the better
measure of central tendency.



30. The distribution is skewed right as is evidenced by both the histogram and the fact that the mean is significantly greater than the median. Therefore, the median is the better measure of central tendency.
31. (a) x̄ ≈ 51.1; M = 51.
(b) The mean is approximately equal to the median, suggesting that the distribution is symmetric, and this is confirmed by the histogram.
32. (a) x̄ ≈ 5.88 million shares; M = 5.58 million shares.
(b) The mean is greater than the median, suggesting that the distribution is skewed right, and this is confirmed by the histogram.

33. [Histogram: Weight of Plain M&Ms — frequency vs. weight (grams).]
x̄ ≈ 0.874 grams; M = 0.88 grams. The mean is approximately equal to the median, suggesting that the distribution is symmetric. This is confirmed by the histogram (though it does appear to be slightly skewed left). The mean is the better measure of central tendency.
34. [Histogram: Length of Eruptions — frequency vs. length (seconds).]
x̄ ≈ 104.1 seconds; M = 104 seconds. The mean is approximately equal to the median, suggesting that the distribution is symmetric. This is confirmed by the histogram. The mean is the better measure of central tendency.



35. [Histogram: Hours Worked per Week — frequency vs. hours.]
x̄ = 22 hours; M = 25 hours. The mean is smaller than the median, suggesting that the distribution is skewed left. This is confirmed by the histogram. The median is the better measure of central tendency.
36. [Histogram: Car Dealer's Profit — number of sales vs. dollars.]
x̄ ≈ $1392.83; M = $1177.50. The mean is significantly greater than the median, suggesting that the distribution is skewed right. This is confirmed by the histogram. The median is the better measure of central tendency.
37. The highest frequency is 12,362, and so the mode region of birth is Central America.
38. The highest frequency is 131, and so the mode offense is Street or Highway.
39. The vote counts are: Bush = 21, Kerry = 17, Nader = 1, and Badnarik = 1. The mode
candidate is Bush.
40. The frequencies are: Cancer = 1, Gunshot wound = 8, Assault = 1, Motor vehicle accident
= 7, Fall = 2, and Congestive heart failure = 1. The mode diagnosis is Gunshot wound.
41. Sample size of 5:
All data recorded correctly: x = 99.8; M = 100 .
106 recorded at 160: x = 110.6; M = 100 .
Sample size of 12:
All data recorded correctly: x 100.4; M = 101 .
106 recorded at 160: x 104.9; M = 101 .
Sample size of 30:
All data recorded correctly: x = 100.6; M = 99 .
106 recorded at 160: x = 102.4; M = 99 .
For each sample size, the mean becomes larger while the median remains the same. As the
sample size increases, the impact of the misrecorded data value on the mean decreases.


42. (a) Mean ≈ 27.1 years; M = 27 years; Mode = 26 years
(b) Mean ≈ 249.8 lb; M = 245 lb; Mode = 305 lb
(c) Mean ≈ 4.6 years; M = 4 years; Mode = 3 years
(d) The frequency for Purdue is 3. The frequencies of all other colleges are lower than 3, so the mode college attended is Purdue.
(e) Samples and sample means will vary.
(f) Offensive guards: Mean = 306.4 lb; M = 305 lb; Mode = 305 lb
Running backs: Mean = 217.8 lb; M = 220 lb; Mode = 225 lb
Yes, there appears to be differences in the weights of offensive guards and running
backs. All three measures of center indicate that offensive guards are significantly
heavier than running backs. This is due to the nature of the positions. Offensive
guards must be able to protect the quarterback while the running back must be able to
run quickly.
(g) It does not make sense to compute the mean player number. The variable player
number is qualitative, so the quantitative calculations will be meaningless.

43. Samples and sample means will vary.


44. NBA salaries are likely significantly skewed to the right. Therefore, since the median will be lower than the mean, the players would rather use the median salary to support the claim that the average player's salary needs to be increased. The negotiator for the owners would rather use the mean salary.
45. The amount of money lost per visitor is likely skewed to the right. Therefore, the median loss would be less than the mean because the mean amount would be inflated by those few visitors who lost very large amounts of money.
46. The sum of the nineteen readable scores is 19 × 84 = 1596. The sum of all twenty scores is 20 × 82 = 1640. Therefore, the unreadable score is 1640 − 1596 = 44.
47. The sum of the six numbers will be 6 × 34 = 204.
48. (a) Median. Home prices are likely skewed right.
(b) Mode. The variable major is qualitative.
(c) Mean. The data are quantitative and symmetric.
(d) Median. The data are quantitative and skewed.
(e) Median. NFL salaries are likely skewed right.
(f) Mode. The variable requested song is qualitative.

49. (a) Mean: (30 + 30 + 45 + 50 + 50 + 50 + 55 + 55 + 60 + 75)/10 = 500/10 = 50. The mean is $50,000.
Median: The ten data values are in order, so we average the two middle values: (50 + 50)/2 = 100/2 = 50. The median is $50,000.
Mode: The mode is $50,000 (the most frequent salary).



(b) Add $2500 ($2.5 thousand) to each salary to form the new data set.
New data set: 32.5, 32.5, 47.5, 52.5, 52.5, 52.5, 57.5, 57.5, 62.5, 77.5
Mean: (32.5 + 32.5 + 47.5 + 52.5 + 52.5 + 52.5 + 57.5 + 57.5 + 62.5 + 77.5)/10 = 525/10 = 52.5
The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values.
(52.5 + 52.5)/2 = 105/2 = 52.5. The new median is $52,500.
Mode: The new mode is $52,500 (the most frequent new salary).
All three measures of central tendency increased by $2500, which was the amount of
the raises.
(c) Multiply each original data value by 1.05 to generate the new data set.
New data set: 31.5, 31.5, 47.25, 52.5, 52.5, 52.5, 57.75, 57.75, 63, 78.75
Mean: (31.5 + 31.5 + 47.25 + 52.5 + 52.5 + 52.5 + 57.75 + 57.75 + 63 + 78.75)/10 = 525/10 = 52.5. The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values.
(52.5 + 52.5)/2 = 105/2 = 52.5. The new median is $52,500.
Mode: The new mode is $52,500 (the most frequent new salary).
All three measures of central tendency increased by 5%, which was the amount of the
raises.
(d) Add $25 thousand to the largest data value to form the new data set.
New data set: 30, 30, 45, 50, 50, 50, 55, 55, 60, 100
Mean: (30 + 30 + 45 + 50 + 50 + 50 + 55 + 55 + 60 + 100)/10 = 525/10 = 52.5. The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values.
(50 + 50)/2 = 100/2 = 50. The new median is $50,000.
Mode: The new mode is $50,000 (the most frequent salary).
The mean was increased by $2500, but the median and mode remained unchanged.
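The pattern in Problem 49 — adding a constant to every salary shifts each measure of center by that constant, while multiplying by a factor scales each measure — can be verified with a short sketch (salaries in thousands of dollars, as in the problem):

import statistics

salaries = [30, 30, 45, 50, 50, 50, 55, 55, 60, 75]   # thousands of dollars

def center(data):
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

print(center(salaries))                       # all three measures equal 50
print(center([x + 2.5 for x in salaries]))    # each measure increases by 2.5
print(center([x * 1.05 for x in salaries]))   # each measure increases by 5%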

50. (a) x̄ = (65 + 70 + 71 + 75 + 95)/5 = 376/5 = 75.2
(b) The five data values are in order, so the median is the middle value: M = 71.
(c) The distribution is skewed right, so the median is the better measure of central tendency.
(d) Adding 4 to each score gives the following new data set: 69, 74, 75, 79, 99.
x̄ = (69 + 74 + 75 + 79 + 99)/5 = 396/5 = 79.2
(e) The curved test score mean is 4 greater than the unadjusted test score mean. Adding 4 to each score increased the mean by 4.



51. The largest data value is 0.94 and the smallest is 0.76. The mean after deleting those two
data values is 0.875 grams. (Note: The value 0.94 occurs twice, but we only remove one.)
The trimmed mean is more resistant than the regular mean. Note in this case that the
trimmed mean 0.875 grams is approximately equal to the median 0.88 grams.

52. Midrange = (0.76 + 0.94)/2 = 1.7/2 = 0.85 grams. The midrange is not resistant because it is computed using the two most extreme data values.
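Neither the trimmed mean (Problem 51) nor the midrange (Problem 52) is built into Python's statistics module, so the helpers below are a minimal sketch of the two definitions; the weights listed are hypothetical, not the full M&M data set.

import statistics

def trimmed_mean(data, trim=1):
    # Mean after removing `trim` smallest and `trim` largest observations.
    ordered = sorted(data)
    return statistics.mean(ordered[trim:len(ordered) - trim])

def midrange(data):
    # Average of the smallest and largest observations (not resistant).
    return (min(data) + max(data)) / 2

weights = [0.76, 0.85, 0.88, 0.90, 0.94]   # hypothetical weights in grams
print(trimmed_mean(weights))               # mean of the middle three values
print(midrange(weights))                   # (0.76 + 0.94)/2 = 0.85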

3.2 Measures of Dispersion


1. No. In comparing two populations, the larger the standard deviation, the more dispersed the distribution, provided that the variable of interest in both populations has the same unit of measurement. Since 5 inches = 5(2.54) = 12.7 centimeters, the distribution with a standard deviation of 5 inches is in fact more dispersed.
2. In the calculation of the sample variance, the degrees of freedom is n − 1, and it is used as the divisor in averaging the squared deviations about the mean.
3. All data values are used in computing the standard deviation, including extreme values.
Since a statistic is resistant only if it is not influenced by extreme data values, the standard
deviation is not resistant.
4. zero
5. A statistic is biased whenever that statistic consistently overestimates or underestimates a
parameter.
6. range

7. The standard deviation is the square root of the variance.

8. mean; mean; spread

9. True

10. True
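Problems 11–16 below build variances and standard deviations from deviation tables; the same arithmetic in Python (shown here with the Problem 11 data) is a useful check and agrees with statistics.stdev.

import math
import statistics

data = [20, 13, 4, 8, 10]                     # Problem 11 data
xbar = sum(data) / len(data)                  # 11.0
sq_devs = [(x - xbar) ** 2 for x in data]     # 81, 4, 49, 9, 1
s2 = sum(sq_devs) / (len(data) - 1)           # 144/4 = 36.0
s = math.sqrt(s2)                             # 6.0

print(s2, s, statistics.stdev(data))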

11. From Section 3.1, Exercise 11, we know x̄ = 11.

Data, xi    Sample Mean, x̄    Deviation, xi − x̄    Squared Deviation, (xi − x̄)²
20          11                 20 − 11 = 9           9² = 81
13          11                 13 − 11 = 2           2² = 4
4           11                 4 − 11 = −7           (−7)² = 49
8           11                 8 − 11 = −3           (−3)² = 9
10          11                 10 − 11 = −1          (−1)² = 1
                               Σ(xi − x̄) = 0         Σ(xi − x̄)² = 144

s² = Σ(xi − x̄)²/(n − 1) = 144/(5 − 1) = 36; s = √(144/(5 − 1)) = √36 = 6.


12. From Section 3.1, Exercise 12, we know x̄ = 82.

Data, xi    Sample Mean, x̄    Deviation, xi − x̄    Squared Deviation, (xi − x̄)²
83          82                 83 − 82 = 1           1² = 1
65          82                 65 − 82 = −17         (−17)² = 289
91          82                 91 − 82 = 9           9² = 81
87          82                 87 − 82 = 5           5² = 25
84          82                 84 − 82 = 2           2² = 4
                               Σ(xi − x̄) = 0         Σ(xi − x̄)² = 400

s² = Σ(xi − x̄)²/(n − 1) = 400/(5 − 1) = 100; s = √(400/(5 − 1)) = √100 = 10.

13. From Section 3.1, Exercise 13, we know μ = 9.

Data, xi    Population Mean, μ    Deviation, xi − μ    Squared Deviation, (xi − μ)²
3           9                      3 − 9 = −6           (−6)² = 36
6           9                      6 − 9 = −3           (−3)² = 9
10          9                      10 − 9 = 1           1² = 1
12          9                      12 − 9 = 3           3² = 9
14          9                      14 − 9 = 5           5² = 25
                                   Σ(xi − μ) = 0        Σ(xi − μ)² = 80

σ² = Σ(xi − μ)²/N = 80/5 = 16; σ = √(80/5) = √16 = 4.

14. From Section 3.1, Exercise 14, we know μ = 15.

Data, xi    Population Mean, μ    Deviation, xi − μ    Squared Deviation, (xi − μ)²
1           15                     1 − 15 = −14         (−14)² = 196
19          15                     19 − 15 = 4          4² = 16
25          15                     25 − 15 = 10         10² = 100
15          15                     15 − 15 = 0          0² = 0
12          15                     12 − 15 = −3         (−3)² = 9
16          15                     16 − 15 = 1          1² = 1
28          15                     28 − 15 = 13         13² = 169
13          15                     13 − 15 = −2         (−2)² = 4
6           15                     6 − 15 = −9          (−9)² = 81
                                   Σ(xi − μ) = 0        Σ(xi − μ)² = 576

σ² = Σ(xi − μ)²/N = 576/9 = 64; σ = √(576/9) = √64 = 8.

15. x̄ = (6 + 52 + 13 + 49 + 35 + 25 + 31 + 29 + 31 + 29)/10 = 300/10 = 30.

Data, xi    Sample Mean, x̄    Deviation, xi − x̄    Squared Deviation, (xi − x̄)²
6           30                 6 − 30 = −24          (−24)² = 576
52          30                 52 − 30 = 22          22² = 484
13          30                 13 − 30 = −17         (−17)² = 289
49          30                 49 − 30 = 19          19² = 361
35          30                 35 − 30 = 5           5² = 25
25          30                 25 − 30 = −5          (−5)² = 25
31          30                 31 − 30 = 1           1² = 1
29          30                 29 − 30 = −1          (−1)² = 1
31          30                 31 − 30 = 1           1² = 1
29          30                 29 − 30 = −1          (−1)² = 1
                               Σ(xi − x̄) = 0         Σ(xi − x̄)² = 1764

s² = Σ(xi − x̄)²/(n − 1) = 1764/(10 − 1) = 196; s = √196 = 14.

16. μ = (4 + 10 + 12 + 12 + 13 + 21)/6 = 72/6 = 12.

Data, xi    Population Mean, μ    Deviation, xi − μ    Squared Deviation, (xi − μ)²
4           12                     4 − 12 = −8          (−8)² = 64
10          12                     10 − 12 = −2         (−2)² = 4
12          12                     12 − 12 = 0          0² = 0
12          12                     12 − 12 = 0          0² = 0
13          12                     13 − 12 = 1          1² = 1
21          12                     21 − 12 = 9          9² = 81
                                   Σ(xi − μ) = 0        Σ(xi − μ)² = 150

σ² = Σ(xi − μ)²/N = 150/6 = 25; σ = √(150/6) = √25 = 5.



17. Range = Largest Data Value − Smallest Data Value = 462 − 236 = $226.
From Section 3.1, Exercise 17, we know x̄ = $381.75.

Data, xi    Sample Mean, x̄    Deviation, xi − x̄    Squared Deviation, (xi − x̄)²
420         381.75             38.25                 1463.0625
462         381.75             80.25                 6440.0625
409         381.75             27.25                 742.5625
236         381.75             −145.75               21,243.0625
                               Σ(xi − x̄) = 0         Σ(xi − x̄)² = 29,888.75

s² = Σ(xi − x̄)²/(n − 1) = 29,888.75/(4 − 1) ≈ 9962.9 dollars²; s = √(29,888.75/(4 − 1)) ≈ $99.81

18. Range = Largest Data Value − Smallest Data Value = 49.26 − 35.34 = $13.92.
To calculate the sample variance and the sample standard deviation, we use the computational formula:

Data value, xi    Data value squared, xi²
35.34             1248.9156
42.09             1771.5681
39.43             1554.7249
38.93             1515.5449
43.39             1882.6921
49.26             2426.5476
Σxi = 248.44      Σxi² = 10,399.9932

s² = [Σxi² − (Σxi)²/n]/(n − 1) = [10,399.9932 − (248.44)²/6]/(6 − 1) ≈ 22.584 dollars²;
s = √22.584 ≈ $4.75

19. Range = Largest Data Value − Smallest Data Value = 4090 − 2940 = 1150 psi.
To calculate the sample variance and the sample standard deviation, we use the computational formula:

Data value, xi    Data value squared, xi²
3960              15,681,600
4090              16,728,100
3200              10,240,000
3100              9,610,000
2940              8,643,600
3830              14,668,900
4090              16,728,100
4040              16,321,600
3780              14,288,400
Σxi = 33,030      Σxi² = 122,910,300

s² = [Σxi² − (Σxi)²/n]/(n − 1) = [122,910,300 − (33,030)²/9]/(9 − 1) ≈ 211,275 psi²;
s = √211,275 ≈ 459.6 psi



20. Range = Largest Data Value − Smallest Data Value = 282 − 257 = 25 minutes.
From Section 3.1, Exercise 20, we know x̄ = 266 minutes.

Data, xi    Sample Mean, x̄    Deviation, xi − x̄    Squared Deviation, (xi − x̄)²
282         266                16                    256
270         266                4                     16
260         266                −6                    36
266         266                0                     0
257         266                −9                    81
260         266                −6                    36
267         266                1                     1
                               Σ(xi − x̄) = 0         Σ(xi − x̄)² = 426

s² = Σ(xi − x̄)²/(n − 1) = 426/(7 − 1) = 71 min²; s = √(426/(7 − 1)) ≈ 8.4 min

21. Histogram (b) depicts a higher standard deviation because the data are more dispersed, with data values ranging from 30 to 75. Histogram (a)'s data values only range from 40 to 60.
22. (a) III, because it is centered between 52 and 57 and has the greatest amount of dispersion
of the three histograms with mean = 53.
(b) I, because it is centered near 53 and its dispersion is consistent with s = 1.3 but not
with s = 0.12 or s = 9 .
(c) IV, because it is centered near 53 and it has the least dispersion of the three histograms
with mean = 53.
(d) II, because it has a center near 60.
23. Los Angeles ATM fees:
Range = Largest Data Value Smallest Data Value = 2.00 0.00 = $2.00.
2
Data value, xi Data value squared, xi2
xi )
(

2
xi n
2.00
4
s=
1.50
2.25
n 1
2
1.50
2.25
(11.5 )

19.75
1.00
1
8
=
1.50
2.25
8 1
$0.68
2.00
4
0.00
2.00
xi = 11.5

0
4
2
xi = 19.75

133

Chapter 3 Numerically Summarizing Data


New York City ATM fees:
Range = Largest Data Value Smallest Data Value = 1.50 0.00 = $1.50.
2
Data value, xi Data value squared, xi2
xi )
(

2
xi n
1.50
2.25
s=
1.00
1
n 1
2
1.00
1
( 8.5 )
10.625
1.25
1.5625
8
=
1.25
1.5625
8 1
$0.48
1.50
2.25
1.00
1
0.00
0
2
xi = 8.5
xi = 10.625
Based on both the range and the standard deviation, ATM fees in Los Angeles have more
dispersion than ATM fees in New York. Both the range and the standard deviation for Los
Angeles are larger.
24. Reaction Time to Blue:
Range = Largest Data Value − Smallest Data Value = 0.841 − 0.267 = 0.574 sec.
2
Data value, xi Data value squared, xi2
xi )
(

2
xi n
0.582
0.338724
s=
0.481
0.231361
n 1
0.841
0.707281
( 3.306 )2

2.02038
0.267
0.071289
6
=
6 1
0.685
0.469225
0.1994 sec.
0.45
0.2025

= 3.306

2
i

= 2.02038

Reaction Time to Red:


Range = Largest Data Value Smallest Data Value = 0.542 0.402 = 0.140 sec.
2
Data value, xi Data value squared, xi2
xi )
(

2
xi n
0.408
0.166464
s=
0.407
0.165649
n 1
2
0.542
0.293764
2.748 )
(
1.279506
0.402
0.161604
6
=
0.456
0.207936
6 1
0.533
0.284089
0.0647 sec.
2
2.748
1.279506
x
=
x
=
i
i
Based on both the range and the standard deviation, the reaction times for blue have more
variability than those for red. Both the range and the standard deviation for blue are larger.
134

Section 3.2 Measures of Dispersion

25. (a) We use the computational formula: Σxi = 650; Σxi² = 47,474; N = 9;
σ² = [Σxi² − (Σxi)²/N]/N = [47,474 − (650)²/9]/9 ≈ 58.8 (beats/min)²; σ ≈ 7.7 beats/min
(b) Samples, sample variances, and sample standard deviations will vary.
(c) Answers will vary.

26. (a) We use the computational formula: Σxi = 238; Σxi² = 7778; N = 9;
σ² = [Σxi² − (Σxi)²/N]/N = [7778 − (238)²/9]/9 ≈ 164.9 min²; σ ≈ 12.8 min
(b) Samples, sample variances, and sample standard deviations will vary.
(c) Answers will vary.

27. (a) We use the computational formula: Σxi = 157; Σxi² = 2107; N = 18;
σ² = [Σxi² − (Σxi)²/N]/N = [2107 − (157)²/18]/18 ≈ 41.0 goals²; σ ≈ 6.4 goals
(b) Samples, sample variances, and sample standard deviations will vary.
(c) Answers will vary.
28. (a) Range = Largest Data Value − Smallest Data Value = 92.552 − 82.087 = 10.465 hours.
For the population variance and standard deviation, we use the computational formula:
Σxi = 606.007; Σxi² = 52,561.3666; N = 7;
σ² = [Σxi² − (Σxi)²/N]/N = [52,561.3666 − (606.007)²/7]/7 ≈ 13.981 hours²; σ ≈ 3.739 hours
(b) Range = Largest Data Value − Smallest Data Value = 3687 − 3278 = 409 km.
For the population variance and standard deviation, we use the computational formula:
Σxi = 24,491; Σxi² = 85,825,565; N = 7;
σ² = [Σxi² − (Σxi)²/N]/N = [85,825,565 − (24,491)²/7]/7 ≈ 19,793.3 km²; σ ≈ 140.7 km
(c) Range = Largest Data Value − Smallest Data Value = 7.617 − 1.017 = 6.600 min.
For the population variance and standard deviation, we use the computational formula:
Σxi = 39.667; Σxi² = 255.510823; N = 7;
σ² = [Σxi² − (Σxi)²/N]/N = [255.510823 − (39.667)²/7]/7 ≈ 4.390 min²; σ ≈ 2.095 min
(d) Range = Largest Data Value − Smallest Data Value = 41.65 − 39.56 = 2.09 km/h.
For the population variance and standard deviation, we use the computational formula:
Σxi = 282.94; Σxi² = 11,439.397; N = 7;
σ² = [Σxi² − (Σxi)²/N]/N = [11,439.397 − (282.94)²/7]/7 ≈ 0.423 (km/h)²; σ ≈ 0.651 km/h
29. (a) Ethan: μ = Σxi/N = (9 + 24 + 8 + 9 + 5 + 8 + 9 + 10 + 8 + 10)/10 = 100/10 = 10 fish;
Range = Largest Data Value − Smallest Data Value = 24 − 5 = 19 fish
Drew: μ = Σxi/N = (15 + 2 + 3 + 18 + 20 + 1 + 17 + 2 + 19 + 3)/10 = 100/10 = 10 fish;
Range = Largest Data Value − Smallest Data Value = 20 − 1 = 19 fish
Both fishermen have the same mean and range, so these values do not indicate any differences between their catches per day.
(b) Ethan: Σxi = 100; Σxi² = 1236; N = 10;
σ = √{[Σxi² − (Σxi)²/N]/N} = √{[1236 − (100)²/10]/10} ≈ 4.9 fish
Drew: Σxi = 100; Σxi² = 1626; N = 10;
σ = √{[Σxi² − (Σxi)²/N]/N} = √{[1626 − (100)²/10]/10} ≈ 7.9 fish
Yes, now there appears to be a difference in the two fishermen's records. Ethan had a more consistent fishing record, which is indicated by the smaller standard deviation.
(c) Answers will vary. One possibility follows: The range is limited as a measure of
dispersion because it does not take all of the data values into account. It is obtained by
using only the two most extreme data values. Since the standard deviation utilizes all
of the data values, it provides a better overall representation of dispersion.
30. (a) Range = Largest Data Value − Smallest Data Value = 349 − 180 = 169 lb
Σxi = 8591; Σxi² = 2,332,051; N = 33; μ = Σxi/N = 8591/33 ≈ 260.3 lb
σ = √{[Σxi² − (Σxi)²/N]/N} = √{[2,332,051 − (8591)²/33]/33} ≈ 53.8 lb
(b) Range = Largest Data Value − Smallest Data Value = 306 − 177 = 129 lb
Σxi = 5889; Σxi² = 1,481,833; N = 24; μ = Σxi/N = 5889/24 ≈ 245.4 lb
σ = √{[Σxi² − (Σxi)²/N]/N} = √{[1,481,833 − (5889)²/24]/24} ≈ 39.2 lb
(c) The weights of the offense have the greater dispersion. The offense has both the larger range and the larger standard deviation.
31. Range = Largest Data Value − Smallest Data Value = 73 − 28 = 45.
For the sample variance and sample standard deviation, we use the computational formula:
Σxi = 2045; Σxi² = 109,151; n = 40;
s² = [Σxi² − (Σxi)²/n]/(n − 1) = [109,151 − (2045)²/40]/(40 − 1) ≈ 118.0; s ≈ 10.9



32. Range = Largest Data Value − Smallest Data Value = 10.96 − 3.01 = 7.95 million shares.
For the sample variance and sample standard deviation, we use the computational formula:
Σxi = 205.92; Σxi² = 1355.6208; n = 35;
s² = [Σxi² − (Σxi)²/n]/(n − 1) = [1355.6208 − (205.92)²/35]/(35 − 1) ≈ 4.238 million shares²;
s ≈ 2.059 million shares

33. (a) We use the computational formula: Σxi = 43.71; Σxi² = 38.2887; n = 50;
s = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[38.2887 − (43.71)²/50]/(50 − 1)} ≈ 0.04 g

(b) The histogram is approximately symmetric, so the Empirical Rule is applicable.


(c) Since 0.79 is exactly 2 standard deviations below the mean [0.79 = 0.87 − 2(0.04)] and 0.95 is exactly 2 standard deviations above the mean [0.95 = 0.87 + 2(0.04)], the
Empirical Rule predicts that approximately 95% of the M&Ms will weigh between
0.79 and 0.95 grams.
(d) All except 1 of the M&Ms weigh between 0.79 and 0.95 grams. Thus, the actual
percentage is 49/50 = 98%.
(e) Since 0.91 is exactly 1 standard deviation above the mean [0.91 = 0.87 + 0.04], the
Empirical Rule predicts that 13.5% + 2.35% + 0.15% = 16% of the M&Ms will weigh
more than 0.91 grams.
(f) Seven of the M&Ms weigh more than 0.91 grams (not including the ones that weigh
exactly 0.91 grams). Thus, the actual percentage is 7/50 = 14%.
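Parts (c)–(f) compare the Empirical Rule's prediction with the actual proportion of observations in an interval. A small helper like the one below does that comparison for any data set; the weights shown are made up, since the 50 raw M&M weights are not reproduced here.

def proportion_between(data, low, high):
    # Actual fraction of observations in [low, high].
    return sum(low <= x <= high for x in data) / len(data)

weights = [0.79, 0.82, 0.84, 0.86, 0.87, 0.87, 0.88, 0.89, 0.91, 0.95]  # hypothetical
mean, sd = 0.87, 0.04

# Empirical Rule: about 95% of a bell-shaped data set lies within 2 standard deviations.
print(proportion_between(weights, mean - 2 * sd, mean + 2 * sd))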
34. (a) We use the computational formula: Σxi = 4582; Σxi² = 478,832; n = 44;
s = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[478,832 − (4582)²/44]/(44 − 1)} ≈ 6 sec

(b) The histogram is approximately symmetric, so the Empirical Rule is applicable.


(c) Since 92 is exactly 2 standard deviations below the mean [92 = 104 − 2(6)] and 116 is exactly 2 standard deviations above the mean [116 = 104 + 2(6)], the Empirical Rule predicts that approximately 95% of the eruptions should last between 92 and 116 sec.
(d) All except 3 of the observed eruptions lasted between 92 and 116 seconds. Thus, the
actual percentage is 41/ 44 93% .
(e) Since 98 is exactly 1 standard deviation below the mean [98 = 104 − 6], the Empirical Rule predicts that 13.5% + 2.35% + 0.15% = 16% of the eruptions will last less than 98 sec.
(f) Five of the observed eruptions lasted less than 98 seconds. Thus, the actual percentage
is 5 / 44 11% .


35. Car 1: Σxi = 3352; Σxi² = 755,712; n = 15
Measures of Center:
x̄ = Σxi/n = 3352/15 ≈ 223.5 miles; Mode: none;
M = 223 miles (the 8th value in the ordered data)
Measures of Dispersion:
Range = Largest Data Value − Smallest Data Value = 271 − 178 = 93 miles;
s² = [Σxi² − (Σxi)²/n]/(n − 1) = [755,712 − (3352)²/15]/(15 − 1) ≈ 475.1 miles²; s ≈ 21.8 miles

Car 2: Σxi = 3558; Σxi² = 877,654; n = 15
Measures of Center:
x̄ = Σxi/n = 3558/15 = 237.2 miles; Mode: none;
M = 230 miles (the 8th value in the ordered data)
Measures of Dispersion:
Range = Largest Data Value − Smallest Data Value = 326 − 160 = 166 miles;
s² = [Σxi² − (Σxi)²/n]/(n − 1) = [877,654 − (3558)²/15]/(15 − 1) ≈ 2406.9 miles²; s ≈ 49.1 miles

The distribution for Car 1 is symmetric since the mean and median are approximately
equal. The distribution for Car 2 is skewed right slightly since the mean is larger than the
median. Both distributions have similar measures of center, but Car 2 has more dispersion
which can be seen by its larger range, variance, and standard deviation. This means that
the distance Car 1 can be driven on 10 gallons of gas is more consistent. Thus, Car 1 is
probably the better car to buy.
36. Fund A: Σxi = 61; Σxi² = 356.12; n = 20
Measures of Center:
x̄ = Σxi/n = 61/20 = 3.05; Mode: none; M = (3.0 + 3.1)/2 = 3.05
Measures of Dispersion:
Range = Largest Data Value − Smallest Data Value = 8.6 − (−2.3) = 10.9;
s² = [Σxi² − (Σxi)²/n]/(n − 1) = [356.12 − (61)²/20]/(20 − 1) ≈ 8.95; s = √8.95 ≈ 2.99

Fund B: Σxi = 68.1; Σxi² = 825.27; n = 20
Measures of Center:
x̄ = Σxi/n = 68.1/20 ≈ 3.41; Mode = 4.3; M = (3.5 + 3.8)/2 = 3.65
Measures of Dispersion:
Range = Largest Data Value − Smallest Data Value = 12.9 − (−6.7) = 19.6;
s² = [Σxi² − (Σxi)²/n]/(n − 1) = [825.27 − (68.1)²/20]/(20 − 1) ≈ 31.23; s = √31.23 ≈ 5.59

The distribution for Mutual Fund A is symmetric since the mean and median are equal. Likewise, the distribution for Mutual Fund B is approximately symmetric (but skewed left slightly since the mean is smaller than the median). Mutual Fund B has a larger measure of center and greater dispersion, which can be seen by its larger range, variance, and standard deviation. This means that the rate of return on Mutual Fund A is generally lower, but more consistent. The rate of return on Mutual Fund B is generally higher, but more dispersed.

37. (a) Financial Stocks: Σxi = 502.9; Σxi² = 9591.0556; n = 32
x̄ = Σxi/n = 502.9/32 ≈ 15.716; M = (15.92 + 16.26)/2 = 16.09
Energy Stocks: Σxi = 719.4; Σxi² = 21,213.3104; n = 32
x̄ = Σxi/n = 719.4/32 ≈ 22.481; M = (19.50 + 19.67)/2 = 19.585
Energy Stocks have higher mean and median rates of return.
(b) Financial Stocks: s = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[9591.0556 − (502.9)²/32]/(32 − 1)} ≈ 7.378
Energy Stocks: s = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[21,213.3104 − (719.4)²/32]/(32 − 1)} ≈ 12.751
Energy Stocks are riskier since they have a larger standard deviation.



38. (a) American League: Σxi = 166.26; Σxi² = 715.1876; n = 40
x̄ = Σxi/n = 166.26/40 ≈ 4.157; M = (4.18 + 4.21)/2 = 4.195
National League: Σxi = 149.93; Σxi² = 576.4971; n = 40
x̄ = Σxi/n = 149.93/40 ≈ 3.748; M = (3.84 + 3.87)/2 = 3.855
The American League has both the higher mean and median earned-run average.
(b) American League: s = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[715.1876 − (166.26)²/40]/(40 − 1)} ≈ 0.787
National League: s = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[576.4971 − (149.93)²/40]/(40 − 1)} ≈ 0.610
The American League has more dispersion.


39. (a) Since 70 is exactly 2 standard deviations below the mean [70 = 100 2(15)] and 130 is
exactly 2 standard deviations above the mean [130 = 100 + 2(15)], the Empirical Rule
predicts that approximately 95% of people has an IQ score between 70 and 130.
(b) Since about 95% of people have an IQ score between 70 and 130, approximately 5% of people have an IQ score either less than 70 or greater than 130.
(c) Approximately 5%/2 = 2.5% of people have an IQ score greater than 130.
40. (a) Since 404 is exactly 1 standard deviation below the mean [404 = 518 − 114] and 632 is exactly 1 standard deviation above the mean [632 = 518 + 114], the Empirical Rule predicts that approximately 68% of SAT scores are between 404 and 632.
(b) Since about 68% of SAT scores is between 404 and 632, then approximately 32% of
people of SAT scores is either less than 404 or greater than 632.
(c) Since 746 is exactly 2 standard deviations above the mean [746 = 518 + 2(114)], the
Empirical Rule predicts that approximately 2.5% of SAT scores is greater than 746.
41. (a) Approximately 95% of the data will be within 2 standard deviations of the mean. Now, 325 − 2(30) = 265 and 325 + 2(30) = 385. Thus, about 95% of pairs of kidneys will be between 265 and 385 grams.
(b) Since 235 is exactly 3 standard deviations below the mean [235 = 325 − 3(30)] and 415 is exactly 3 standard deviations above the mean [415 = 325 + 3(30)], the Empirical Rule predicts that about 99.7% of pairs of kidneys weigh between 235 and 415 grams.
(c) Since about 99.7% of pairs of kidneys weigh between 235 and 415 grams, about 0.3% of pairs of kidneys weigh either less than 235 or more than 415 grams.
(d) Since 295 is exactly 1 standard deviation below the mean [295 = 325 − 30] and 385 is exactly 2 standard deviations above the mean [385 = 325 + 2(30)], the Empirical Rule predicts that approximately 34% + 34% + 13.5% = 81.5% of pairs of kidneys weigh between 295 and 385 grams.



42. (a) Approximately 68% of the data will be within 1 standard deviation of the mean. Now, 4 − 0.007 = 3.993 and 4 + 0.007 = 4.007. Thus, about 68% of bolts manufactured will be between 3.993 and 4.007 inches long.
(b) Since 3.986 is exactly 2 standard deviations below the mean [3.986 = 4 − 2(0.007)] and 4.014 is exactly 2 standard deviations above the mean [4.014 = 4 + 2(0.007)], the Empirical Rule predicts that about 95% of bolts manufactured will be between 3.986 and 4.014 inches long.
(c) Since about 95% of bolts are between 3.986 and 4.014 inches, about 5% of bolts manufactured will either be shorter than 3.986 inches or longer than 4.014 inches. That is, about 5% of the bolts will be discarded.
(d) Since 4.007 is exactly 1 standard deviation above the mean [4.007 = 4 + 0.007] and 4.021 is exactly 3 standard deviations above the mean [4.021 = 4 + 3(0.007)], the Empirical Rule predicts that approximately 13.5% + 2.35% = 15.85% of bolts manufactured will be between 4.007 and 4.021 inches long.

43. (a) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/3²)·100% ≈ 88.9% of gasoline prices are within 3 standard deviations of the mean.
(b) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/2.5²)·100% = 84% of gasoline prices are within k = 2.5 standard deviations of the mean. Now, 1.37 − 2.5(0.05) = 1.245 and 1.37 + 2.5(0.05) = 1.495. Thus, the gasoline prices that are within 2.5 standard deviations of the mean are from $1.245 to $1.495.
(c) Since 1.27 is exactly k = 2 standard deviations below the mean [1.27 = 1.37 − 2(0.05)] and 1.47 is exactly k = 2 standard deviations above the mean [1.47 = 1.37 + 2(0.05)], Chebyshev's theorem predicts that at least (1 − 1/2²)·100% = 75% of gas stations have prices between $1.27 and $1.47 per gallon.
44. (a) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/2²)·100% = 75% of commuters in Boston have a commute time within 2 standard deviations of the mean.
(b) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/1.5²)·100% ≈ 55.6% of commuters in Boston have a commute time within 1.5 standard deviations of the mean. Now, 27.3 − 1.5(8.1) = 15.15 and 27.3 + 1.5(8.1) = 39.45. Thus, the commute times within 1.5 standard deviations of the mean are from 15.15 to 39.45 minutes.
(c) Since 3 is exactly k = 3 standard deviations below the mean [3 = 27.3 − 3(8.1)] and 51.6 is exactly k = 3 standard deviations above the mean [51.6 = 27.3 + 3(8.1)], Chebyshev's theorem predicts that at least (1 − 1/3²)·100% ≈ 88.9% of commuters in Boston have commute times between 3 and 51.6 minutes.
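Chebyshev's inequality guarantees a minimum proportion within k standard deviations for any distribution; the sketch below reproduces the bounds and the interval arithmetic used in Problems 43 and 44.

def chebyshev_bound(k):
    # Minimum proportion of observations within k standard deviations (k > 1).
    return 1 - 1 / k ** 2

def interval(mean, sd, k):
    return mean - k * sd, mean + k * sd

print(chebyshev_bound(2))          # 0.75
print(chebyshev_bound(3))          # about 0.889
print(interval(27.3, 8.1, 1.5))    # about (15.15, 39.45) -- Problem 44(b)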



45. When calculating the variability in team batting averages, we are finding the variability of
means. When calculating the variability of all players, we are finding the variability of
individuals. Since there is more variability among individuals than among means, the
teams will have less variability.
46. (a) Range = Largest Data Value − Smallest Data Value = 75 − 30 = $45 thousand.
For the population variance and standard deviation, we use the computational formula:
Σxi = 500; Σxi² = 26,600; N = 10;
σ² = [Σxi² − (Σxi)²/N]/N = [26,600 − (500)²/10]/10 = 160 (thousand dollars)²; σ = √160 ≈ $12.6 thousand
(b) Add $2500 ($2.5 thousand) to each salary to form the new data set.
New data set: 32.5, 32.5, 47.5, 52.5, 52.5, 52.5, 57.5, 57.5, 62.5, 77.5
Range = Largest Data Value − Smallest Data Value = 77.5 − 32.5 = $45 thousand.
Σxi = 525; Σxi² = 29,162.5; N = 10;
σ² = [29,162.5 − (525)²/10]/10 = 160 (thousand dollars)²; σ ≈ $12.6 thousand
All three measures of variability remain the same.
(c) Multiply each original data value by 1.05 to generate the new data set.
New data set: 31.5, 31.5, 47.25, 52.5, 52.5, 52.5, 57.75, 57.75, 63, 78.75
Range = Largest Data Value − Smallest Data Value = 78.75 − 31.5 = $47.25 thousand.
Σxi = 525; Σxi² = 29,326.5; N = 10;
σ² = [29,326.5 − (525)²/10]/10 = 176.4 (thousand dollars)²; σ ≈ $13.3 thousand
All three measures of variability are larger than the original, showing greater dispersion of salaries. (Note that R and σ are each 5% larger than the original, and σ² is 1.1025 = (1.05)² times larger than the original.)
(d) Add $25 thousand to the largest data value to form the new data set.
New data set: 30, 30, 45, 50, 50, 50, 55, 55, 60, 100
Range = Largest Data Value − Smallest Data Value = 100 − 30 = $70 thousand.
Σxi = 525; Σxi² = 30,975; N = 10;
σ² = [30,975 − (525)²/10]/10 = 341.25 (thousand dollars)²; σ ≈ $18.5 thousand
All three measures of variability are significantly larger than the original.
47. Sample size of 5:
All data recorded correctly: s 5.3 .
106 recorded incorrectly as 160: s 27.9 .
Sample size of 12:
All data recorded correctly: s 14.7 .
106 recorded incorrectly as 160: s 22.7 .
Sample size of 30:
All data recorded correctly: s 15.9 .
106 recorded incorrectly as 160: s 19.2 .
As the sample size increases, the impact of the misrecorded data value on the standard
deviation decreases.
48. We use the computational formula: Σxi = 312; Σxi² = 24,336; n = 4;
s = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[24,336 − (312)²/4]/(4 − 1)} = 0
If all the values in a data set are identical, then there is zero variance.

49. (a) The coefficient of variation for systolic blood pressure before exercise is (14.1/121)·100% = 11.65%, while the coefficient of variation for systolic blood pressure after exercise is (18.1/135.9)·100% = 13.32%. There is more variability in systolic blood pressure after exercise.
(b) The coefficient of variation for free calcium concentration in the group of people with normal blood pressure is (16.1/107.9)·100% = 14.92%, while the coefficient of variation for free calcium concentration in the group of people with high blood pressure is (31.7/168.2)·100% = 18.85%. There is more variability in free calcium concentration in the high blood pressure group.
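The coefficient of variation expresses the standard deviation as a percentage of the mean, which is what makes measurements on different scales comparable; a one-line helper reproduces the Problem 49 figures.

def coefficient_of_variation(sd, mean):
    # Standard deviation as a percentage of the mean.
    return sd / mean * 100

print(round(coefficient_of_variation(14.1, 121.0), 2))   # 11.65 (before exercise)
print(round(coefficient_of_variation(18.1, 135.9), 2))   # 13.32 (after exercise)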



50. From Section 3.1, Exercise 17, we know x̄ = $381.75.

Data, xi    Sample Mean, x̄    Deviation, xi − x̄    Absolute Deviation, |xi − x̄|
420         381.75             38.25                 38.25
462         381.75             80.25                 80.25
409         381.75             27.25                 27.25
236         381.75             −145.75               145.75
                               Σ(xi − x̄) = 0         Σ|xi − x̄| = 291.50

MAD = Σ|xi − x̄|/n = 291.50/4 = $72.875, which is somewhat less than the sample standard deviation of s ≈ $99.81.
51. (a) Skewness = 3(50 − 40)/10 = 3. The distribution is skewed to the right.
(b) Skewness = 3(100 − 100)/15 = 0. The distribution is perfectly symmetric.
(c) Skewness = 3(400 − 500)/120 = −2.5. The distribution is skewed to the left.
(d) Skewness = 3(0.8742 − 0.88)/0.0397 ≈ −0.44. The distribution is slightly skewed to the left.
(e) Skewness = 3(104.136 − 104)/6.249 ≈ 0.07. The distribution is symmetric.
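The measure in Problem 51 is Pearson's skewness coefficient, 3(mean − median)/s; a tiny helper makes the sign convention easy to check.

def pearson_skewness(mean, median, sd):
    # Positive -> skewed right, negative -> skewed left, near zero -> symmetric.
    return 3 * (mean - median) / sd

print(pearson_skewness(50, 40, 10))       # 3.0  (skewed right)
print(pearson_skewness(400, 500, 120))    # -2.5 (skewed left)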

52. (a) Reading from the graph, the average annual return for a portfolio that is 10% foreign is
14.9%. The level of risk is 14.7%.
(b) To best minimize risk, 30% should be invested in foreign stocks. According to the
graph, a 30% investment in foreign stocks has the smallest standard deviation (level of
risk) at about 14.3%.
(c) Answers will vary. One possibility follows: The risk decreases because a portfolio
including foreign stocks is more diversified.
(d) According to Chebyshev's theorem, at least 75% of returns are within k = 2 standard deviations of the mean. Thus, at least 75% of returns are between x̄ − ks = 15.8 − 2(14.3) = −12.8% and x̄ + ks = 15.8 + 2(14.3) = 44.4%. By Chebyshev's theorem, at least 88.9% of returns are within k = 3 standard deviations of the mean. Thus, at least 88.9% of returns are between x̄ − ks = 15.8 − 3(14.3) = −27.1% and x̄ + ks = 15.8 + 3(14.3) = 58.7%. An investor should not be surprised if she has a negative rate of return. Chebyshev's theorem indicates that a negative return is fairly common.


Consumer Reports: Basement Waterproofing Coatings

(a) x̄_A = 546.2/6 ≈ 91.03 g; M_A = (90.9 + 91.2)/2 = 182.1/2 = 91.05 g
There are 2 modes: 90.8 g and 91.2 g (each value occurs twice).
(b) Σxi = 546.2; Σxi² = 49,722.66; n = 6;
s_A = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[49,722.66 − (546.2)²/6]/(6 − 1)} ≈ 0.23 g
(c) x̄_B = 522.3/6 = 87.05 g; M_B = (87.0 + 87.1)/2 = 174.1/2 = 87.05 g
There are 2 modes: 87.0 g and 87.2 g (each value occurs twice).
(d) Σxi = 522.3; Σxi² = 45,466.33; n = 6;
s_B = √{[Σxi² − (Σxi)²/n]/(n − 1)} = √{[45,466.33 − (522.3)²/6]/(6 − 1)} ≈ 0.15 g
(e) Back-to-back stem-and-leaf plot (product A leaves on the left, product B leaves on the right; leaf unit = 0.1 g):
          | 86 | 8
          | 87 | 0 0 1 2 2
          | 88 |
          | 89 |
    9 8 8 | 90 |
    3 2 2 | 91 |
Yes, there appears to be a difference in these two products' ability to mitigate water seepage. All 6 of the measurements for product B are less than the measurements for product A. Although it is not clear whether there is any practical difference in these two products' ability to mitigate water seepage, product B appears to do a better job.

3.3 Measures of Central Tendency and Dispersion from Grouped Data


1. When we approximate the mean and standard deviation from grouped data, we assume that
all of the data points within each group can be approximated by the midpoint of that group.
2. x̄ = Σxi/n is a weighted average in which the value of each weight is one.
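Those two ideas — treat every observation in a class as the class midpoint and weight by the class frequency — translate directly into code. The frequency table below is made up for illustration; the exercise tables follow the same pattern.

import math

groups = [(15, 8), (25, 16), (35, 21), (45, 11), (55, 4)]   # (midpoint, frequency) pairs

n = sum(f for _, f in groups)
mean = sum(x * f for x, f in groups) / n                        # grouped mean
var = sum(f * (x - mean) ** 2 for x, f in groups) / (n - 1)     # grouped sample variance
print(round(mean, 2), round(math.sqrt(var), 2))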

3.

10 + 20

10 19

20 + 30

20 29

xi f i

xi x

= 15

120

32.8333

17.8333

2544.2127

= 25

16

400

32.8333

7.8333

981.7694

fi

30 39

35

21

735

32.8333

2.1667

98.5864

40 49

45

11

495

32.8333

12.1667

1628.3145

50 59

55

220

32.8333

22.1667

1965.4504

f
x f
x=
f
i

Midpoint, xi

Class

= 60

x f

i i

1+ 6

15

2
2

(x x ) f
( f ) 1
2

f = 7218.3334

7185.3334
$11.06
60 1

( xi )

Frequency, fi

xi f i

xi

11

38.5

14.5714

11.0714

1348.3349

14.5714

6.0714

= 3.5

6 + 11

6 10

( x x )

= 1970

1970
=
32.8333 $32.83 ; s =
60

i i

4.

( xi x )

Frequency, fi

Midpoint, xi

Class

= 8.5

fi

11 15

13.5

67.5

14.5714

1.0714

5.7395

16 20

18.5

111

14.5714

3.9286

92.6034

21 25

23.5

23.5

14.5714

8.9286

79.7199

26 30

28.5

57

14.5714

13.9286

388.0118

31 35

33.5

33.5

14.5714

18.9286

358.2919

36 40

38.5

77

14.5714

23.9286

1145.1558

f
xf
=
f

i i
i

= 28

x f

i i

408
=
14.5714 14.6 points ; =
28

147

( x )

= 408

(x )
f
i

f = 3417.8572

3417.8572
11.0 points
28

Chapter 3 Numerically Summarizing Data


5.

Class
09
10 19

0 + 10
2
10 + 20
2

xi f i

xi

=5

31

155

17.3

12.3

4689.99

= 15

39

585

17.3

2.3

206.31

fi

20 29

25

17

425

17.3

7.7

1007.93

30 39

35

210

17.3

17.7

1879.74

40 49

45

180

17.3

27.7

3069.16

50 59

55

110

17.3

37.7

2842.58

60 69

65

65

17.3

47.7

2275.29

f
xf
=
f

i i
i

6.

( xi )

Frequency, fi

Midpoint, xi

Class
09
10 19

1730
=
= 17.3 days ; =
100

2
10 + 20
2

( x )

= 1730

i i

(x )
f
i

f = 15,971

15,971
12.6 days
100

( xi x )

Frequency, fi

xi f i

xi x

=5

24

120

21.6

16.6

6613.44

= 15

14

210

21.6

6.6

609.84

Midpoint, xi
0 + 10

x f

= 100

fi

20 29

25

39

975

21.6

3.4

450.84

30 39

35

18

630

21.6

13.4

3232.08

40 49

45

225

21.6

23.4

2737.8

x f
x=
f

i i
i

= 100

2160
=
= 21.6 hr/wk ; s =
100

x f

= 2160

i i

(x x ) f
( f ) 1
2

148

( x x )

13, 644
11.7 hr/wk
100 1

f = 13, 644

Section 3.3 Measures of Central Tendency and Dispersion from Grouped Data
7.

Frequency, fi
(in millions)

xi f i

xi

= 30

28.9

867

44.4695

14.4695

6050.6898

= 40

35.7

1428

44.4695

4.4695

713.1586

Midpoint, xi

Class
25 34
35 44

25 + 35
2
35 + 45
2

fi

45 54

50

35.1

1755

44.4695

5.5305

1073.5837

55 64

60

24.7

1482

44.4695

15.5305

5957.5518

f
xf
=
f

i i
i

8.

( xi )

Class
0 0.9
1.0 1.9

= 124.4

x f

i i

5532
=
44.4695 44.5 yrs ; =
124.4

2
1+ 2
2

(x )
f

f = 13, 794.9839

13, 794.9839
10.5 yrs
124.4

( xi )

Freq, fi

xi f i

xi

= 0.5

539

269.5

2.7627

2.2627

2759.5783

= 1.5

1.5

2.7627

1.2627

1.5944

Midpt, xi
0 +1

( x )

= 5532

fi

2.0 2.9

2.5

1336

3340

2.7627

0.2627

92.1991

3.0 3.9

3.5

1363

4770.5

2.7627

0.7373

740.9422

4.0 4.9

4.5

289

1300.5

2.7627

1.7373

872.2631

5.0 5.9

5.5

21

115.5

2.7627

2.7373

157.3490

6.0 6.9

6.5

13

2.7627

3.7373

27.9348

f
xf
=
f

i i
i

= 3551

x f

i i

( x )

= 9810.5

9810.5
=
2.7627 2.8 ; =
3551

(x )
f

149

f = 4651.8609

4651.8609
1.1
3551

Chapter 3 Numerically Summarizing Data


9. (a)

50 59
60 69

50 + 60
2
60 + 70
2

( xi )

Freq, fi

xi f i

xi

= 55

55

80.9350

25.9350

672.6242

= 65

308

20,020

80.9350

15.9350

78,208.6613

Midpt, xi

Class

fi

70 79

75

1519

113,925

80.9350

5.9350

53,505.5977

80 89

85

1626

138,210

80.9350

4.0650

26,868.3900

90 99

95

503

47,785

80.9350

14.0650

99,505.5851

100 109

105

11

1155

80.9350

24.0650

6370.3665

f
xf
=
f

i i
i

= 3968

x f

i i

( x )

= 321,150

321,150
=
80.9350 80.9F ; =
3968

(x )
f
i

f = 265,131.2248

265,131.2248
8.2F
3968

High Temperatures in August in Chicago

Frequency

(b)

1800
1600
1400
1200
1000
800
600
400
200
0

50

60

70

80

90 100 110

Temperature

(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, μ − 2σ = 80.9 − 2(8.2) = 64.5 and μ + 2σ = 80.9 + 2(8.2) = 97.3, so 95% of days in August will have high temperatures between 64.5°F and 97.3°F.

150

Section 3.3 Measures of Central Tendency and Dispersion from Grouped Data
10. (a)

20 + 25

20 24

2
25 + 30

25 29

( xi )

Freq, fi

xi f i

xi

= 22.5

90

37.8333

15.3333

940.4404

= 27.5

15

412.5

37.8333

10.3333

1601.6563

Midpoint, xi

Class

fi

30 34

32.5

27

877.5

37.8333

5.3333

767.9904

35 39

37.5

40

1500

37.8333

0.3333

4.4436

40 44

42.5

28

1190

37.8333

4.6667

609.7865

45 49

47.5

15

712.5

37.8333

9.6667

1401.6763

50 54

52.5

210

37.8333

14.6667

860.4484

55 59

57.5

115

37.8333

19.6667

773.5582

f
xf
=
f

i i
i

= 135

x f

i i

( x )

= 5107.5

5107.5
=
37.8333 37.8 in ; =
135

(x )
f
i

f = 6960.0001

6960.0001
7.2 in
135

Annual Rainfall for St. Louis, MO

Frequency

(b)
45
40
35
30
25
20
15
10
5
0

20

25

30

35

40

45

50

60

65

Rainfall (inches)

(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the
mean. Now, 2 = 37.8 2(7.2) = 23.4 and + 2 = 37.8 + 2(7.2) = 52.2 , so 95% of
annual rainfalls in St. Louis will be between 23.4 and 52.2 inches.

151

Chapter 3 Numerically Summarizing Data


11. (a)

15 + 20

15 19

2
20 + 25

20 24

( xi )

Freq, fi

xi f i

xi

= 17.5

93

1627.5

32.2721

14.7721

20,293.99

= 22.5

511

11,497.5

32.2721

9.7721

48,787.40

Midpoint, xi

Class

25 29

27.5

1628

44,770

32.2721

4.7721

37,074.34

30 34

32.5

2832

92,040

32.2721

0.2279

147.09

35 39

37.5

1843

69,112.5

32.2721

5.2279

50,370.92

40 44

42.5

377

16,022.5

32.2721

10.2279

39,437.95

f
xf
=
f

i i
i

= 7284

x f

i i

( x )

= 235, 070

235, 070
=
32.2721 32.3 yr ; =
7284

(x )
f
i

f = 196,111.69

196,111.69
5.2 yr
7284

Number of Multiple Births in 2002


Frequency

(b)
3000
2500
2000
1500
1000
500
0

15

20

25

30

35

40

45

Mothers Age

(c)

fi

By the Empirical Rule, 95% of the observations will be within 2 standard deviations of
the mean. Now, 2 = 32.3 2(5.2) = 21.9 and + 2 = 32.3 + 2(5.2) = 42.7 , so
95% of mothers of multiple births will be between 21.9 and 42.7 years of age.

152

Section 3.3 Measures of Central Tendency and Dispersion from Grouped Data
12. (a)

400 449
450 499

400 + 450
2
450 + 500
2

( xi )

Freq, fi

xi f i

xi

= 425

281

119,425

603.1482

178.1482

8,918,035.5

= 475

577

274,075

603.1482

128.1482

9,475,471.6

Midpoint, xi

Class

fi

500 549

525

840

441,000

603.1482

78.1482

5,129,998.6

550 599

575

1120

644,000

603.1482

28.1482

887,399.7

600 649

625

1166

728,750

603.1482

21.8518

556,766.4

650 699

675

900

607,500

603.1482

71.8518

4,646,413.0

700 749

725

518

375,550

603.1482

121.8518

7,691,192.1

750 800

775.5

394

305,547

603.1482

172.3518

11,703,826.
3

f
xf
=
f

i i
i

= 5796

x f

i i

( x )

= 3, 495,650

3, 495,847
=
603.1482 603.1 ; =
5796

(x )
f
i

f = 49,009,103.2

49, 009,103.2
92.0
5796

SAT Verbal Scores, 2003


Frequency

(b)

1200
1000
800
600
400
200
0

400 450 500 550 600 650 700 750 800

Score

(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of
the mean. Now, 2 = 603.1 2(92.0) = 419.1 and + 2 = 603.1 + 2(92) = 787.1 , so
95% of ISACS college-bound seniors will have SAT Verbal scores between 419 and 787.

153

Chapter 3 Numerically Summarizing Data


13.

20 + 30

20 29

2
30 + 40

30 39

( xi x )

Freq, fi

xi f i

xi x

= 25

25

51.75

26.75

715.5625

= 35

210

51.75

16.75

1683.375

Midpt, xi

Class

fi

40 49

45

10

450

51.75

6.75

455.625

50 59

55

14

770

51.75

3.25

147.875

60 69

65

390

51.75

13.25

1053.375

70 79

75

225

51.75

23.25

1621.6875

f
x=

= 40

x f

i i

( x x )

= 2070

f = 5677.5

x f = 2070 = 51.75 51.8 (compared to 51.1 using the raw data.);


40
f
( x x ) f = 5677.5 12.1 (compared to 10.9 using the raw data.)
40 1
( f ) 1
i i
i

s=

14.

3+5

3 4.99

2
5+7

5 6.99

( xi x )

Frequency, fi

xi f i

xi x

=4

12

48

48

=6

14

84

Midpoint, xi

Class

7 8.99

48

24

9 10.99

10

30

48

f
x=

= 35

x f

i i

( x x )
i

fi

f = 120

x f = 210 = 6 million shares (compared to 5.88 million shares using the raw data.);
f 35
( x x ) f = 120 1.879 million shares (compared to 2.059 million shares using
35 1
( f ) 1
i i
i

s=

= 210

the raw data.)

154

Section 3.3 Measures of Central Tendency and Dispersion from Grouped Data

15. GPA = x̄w = Σwixi/Σwi = [5(3) + 3(4) + 4(4) + 3(2)]/(5 + 3 + 4 + 3) = 49/15 ≈ 3.27

16. Course Average = x̄w = Σwixi/Σwi = [5(100) + 10(93) + 60(86) + 25(85)]/(5 + 10 + 60 + 25) = 8715/100 = 87.15%

17. Cost per pound = x̄w = Σwixi/Σwi = [4($3.50) + 3($2.75) + 2($2.25)]/(4 + 3 + 2) ≈ $2.97/lb

18. Cost per pound = x̄w = Σwixi/Σwi = [2.5($1.30) + 4($4.50) + 2($3.75)]/(2.5 + 4 + 2) ≈ $3.38/lb
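The same weighted-mean formula, Σwi·xi / Σwi, in code; the GPA numbers from Problem 15 serve as the check.

def weighted_mean(weights, values):
    # sum(w * x) / sum(w)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

credits = [5, 3, 4, 3]    # weights (credit hours)
points = [3, 4, 4, 2]     # grade points
print(round(weighted_mean(credits, points), 2))   # 3.27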

19. (a)

( xi )

Class

Midpt, xi

Freq, fi

xi f i

xi

09

20,225

101,125

35.6058

30.6058

18,945,060.7

10 19

15

21,375

320,625

35.6058

20.6058

9,075,803.5

20 29

25

20,437

510,925

35.6058

10.6058

2,298,814.9

30 39

35

21,176

741,160

35.6058

0.6058

7,771.5

40 49

45

22,138

996,210

35.6058

9.3942

1,953,700.5

50 59

55

16,974

933,570

35.6058

19.3942

6,384,515.4

60 69

65

10,289

668,785

35.6058

29.3942

8,889,891.4

70 79

75

6,923

519,225

35.6058

39.3942

10,743,824.4

80 89

85

3,053

259,505

35.6058

49.3942

7,448,669.7

90 99

95

436

41,420

35.6058

59.3942

1,538,064.6

f
x f
=
f

i i
i

= 143,026

x f

i i

( x )

= 5,092,550

5,092,550
=
35.6058 35.6 yr ; =
143,026

155

( x )
f
i

fi

f = 67, 286,116.6

67, 286,116.6
21.7 yr
143,026

Chapter 3 Numerically Summarizing Data


(b)

( xi )

Class

Midpt, xi

Freq, fi

xi f i

xi

09

19,319

96,595

38.0872

33.0872

21,149,722.6

10 19

15

20,295

304,425

38.0872

23.0872

10,817,616.6

20 29

25

19,459

486,475

38.0872

13.0872

3,332,836.4

30 39

35

20,936

732,760

38.0872

3.0872

199,536.9

40 49

45

22,586

1,016,370

38.0872

6.9128

1,079,312.8

50 59

55

17,864

982,520

38.0872

16.9128

5,109,868.6

60 69

65

11,563

751,595

38.0872

26.9128

8,375,067.1

70 79

75

9,121

684,075

38.0872

36.9128

12,427,862.3

80 89

85

5,367

456,195

38.0872

46.9128

11,811,751.5

90 99

95

1,215

115,425

38.0872

56.9128

3,935,466.2

f
x f
=
f

i i
i

= 147,725

x f

i i

( x )

= 5,626, 435

5,626, 435
=
38.0872 38.1 yr ; =
147,725

( x )
f
i

fi

f = 78, 239,041.0

78, 239,041
23.0 yr
147,725

(c) & (d) Females have both a higher mean age and more dispersion in age.
20. (a)

( xi )

Class

Midpt, xi

Freq, fi

xi f i

xi

10 14

12.5

1.1

13.75

25.8462

13.4996

200.4631

15 19

17.5

53.0

927.5

25.8462

8.4996

3828.8896

20 24

22.5

115.1

2589.75

25.8462

3.4996

1409.6527

25 29

27.5

112.9

3104.75

25.8462

1.5004

254.1605

30 34

32.5

61.9

2011.75

25.8462

6.5004

2615.5969

35 39

37.5

19.8

742.5

25.8462

11.5004

2618.7322

40 44

42.5

3.9

165.75

25.8462

16.5004

1061.8265

45 49

47.5

0.2

9.5

25.8462

21.5004

92.4534

= 367.9

x f

i i

= 9565.25

156

( x )
i

fi

f = 12,081.7749

Section 3.3 Measures of Central Tendency and Dispersion from Grouped Data

x f
=
f

i i
i

(b)

9565.25
=
25.9996 26.0 yr ; =
367.9

( x )
f

12,081.7749
5.7 yr
367.9

( xi )

Class

Midpt, xi

Freq, fi

xi f i

xi

10 14

12.5

0.7

8.75

27.6180

15.118

159.9877

15 19

17.5

43.0

752.5

27.6180

10.118

4402.0787

20 24

22.5

103.6

2331

27.6180

5.118

2713.6905

25 29

27.5

113.6

3124

27.6180

0.118

1.5818

30 34

32.5

91.5

2973.75

27.6180

4.882

2180.8040

35 39

37.5

41.4

1552.5

27.6180

9.882

4042.8725

40 44

42.5

8.3

352.75

27.6180

14.882

1838.2336

45 49

47.5

0.5

23.75

27.6180

19.882

197.6470

x f
=
f

i i
i

(c) & (d)

21.

Class
09
10 19
20 29
30 39
40 49
50 59
60 69

= 402.6

x f

i i

( x )

= 11,119

11,1119
=
27.6180 27.6 yr ; =
402.6

( x )
f
i

fi

f = 12,081.7749

15,536.8952
6.2 yr
402.6

The year 2002 has both the higher mean age of mothers and more dispersion in
the age of mothers.
Frequency, f
31
39
17
6
4
2
1

Cumulative Frequency, CF
31
70
87
93
97
99
100

The total frequency is 100, so the position of the median is n/2 = 100/2 = 50, which falls in the second class, 10–19. Then M = L + [(n/2 − CF)/f]·i = 10 + [(50 − 31)/39]·(20 − 10) ≈ 14.9 days.
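The interpolation formula M = L + [(n/2 − CF)/f]·i wraps naturally in a small function; the call below reproduces the Problem 21 result with the same class boundaries.

def grouped_median(L, n, CF, f, i):
    # L = lower boundary of the median class, n = total frequency,
    # CF = cumulative frequency below the median class,
    # f = frequency of the median class, i = class width.
    return L + (n / 2 - CF) / f * i

print(round(grouped_median(L=10, n=100, CF=31, f=39, i=10), 1))   # 14.9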



22.

Class
09
10 19
20 29
30 39
40 49

Frequency, f
24
14
39
18
5

Cumulative Frequency, CF
24
38
77
95
100

The total frequency is 100, so the position of the median is n/2 = 100/2 = 50, which falls in the third class, 20–29. Then M = L + [(n/2 − CF)/f]·i = 20 + [(50 − 38)/39]·(30 − 20) ≈ 23.1 hr/wk.

23.

Class
25 34
35 44
45 54
55 64

Frequency, f (millions)
28.9
35.7
35.1
24.7

Cumulative Frequency, CF (millions)


28.9
64.6
99.7
124.4

The total frequency is 124.4 (million), so the position of the median is n/2 = 124.4/2 = 62.2, which falls in the second class, 35–44. Then M = L + [(n/2 − CF)/f]·i = 35 + [(62.2 − 28.9)/35.7]·(45 − 35) ≈ 44.3 years.
24.
        Class       Frequency, f    Cumulative Frequency, CF
        0-0.9            539                 539
        1.0-1.9            1                 540
        2.0-2.9         1336                1876
        3.0-3.9         1363                3239
        4.0-4.9          289                3528
        5.0-5.9           21                3549
        6.0-6.9            2                3551

    The total frequency is 3551, so the position of the median is n/2 = 3551/2 = 1775.5, which is
    in the third class, 2.0-2.9. Then M = L + ((n/2 - CF)/f)·i = 2.0 + ((1775.5 - 540)/1336)(3.0 - 2.0) ≈ 2.9.
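
For readers who want to check the interpolated medians in Problems 21-24, here is a minimal Python sketch (not part of the original solutions). It implements M = L + ((n/2 - CF)/f)·i, where L is the lower limit of the median class, CF is the cumulative frequency before that class, f is its frequency, and i is the class width:

```python
# Interpolated median from a grouped frequency table.
def grouped_median(lower_limits, frequencies, width):
    n = sum(frequencies)
    half = n / 2
    cf = 0
    for L, f in zip(lower_limits, frequencies):
        if cf + f >= half:                      # this is the median class
            return L + (half - cf) / f * width
        cf += f

# Problem 21: classes 0-9, 10-19, ..., 60-69, each of width 10.
print(round(grouped_median([0, 10, 20, 30, 40, 50, 60],
                           [31, 39, 17, 6, 4, 2, 1], 10), 1))   # about 14.9 days
```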

25. From the table in Problem 5, the modal class (highest frequency class) is 10-19 days.
26. From the table in Problem 6, the modal class (highest frequency class) is 20-29 hr/wk.
27. From the table in Problem 7, the modal class (highest frequency class) is 25-44 years.
28. From the table in Problem 8, the modal class (highest frequency class) is 3.0-3.9.
29. (a) Answers will vary. One possibility follows: Many colleges do not permit students
under age 16 to enroll in courses, so a reasonable midpoint to use would be 17.
(b) Answers will vary. One possibility follows: Since it is not likely that many students
would be over 70 years old, a reasonable midpoint would be 60.
    (c) Answers will vary depending on choices for midpoints in parts (a) and (b). Using the
        midpoints chosen above:

        Class            Midpoint, xi           Freq, fi    xi·fi
        Less than 18     17                        139        2,363
        18-19            (18 + 20)/2 = 19        4,089       77,691
        20-21            (20 + 22)/2 = 21        3,357       70,497
        22-24            (22 + 25)/2 = 23.5      1,661       39,033.5
        25-29            (25 + 30)/2 = 27.5        470       12,925
        30-34            (30 + 35)/2 = 32.5        145        4,712.5
        35-39            (35 + 40)/2 = 37.5         95        3,562.5
        40-49            (40 + 50)/2 = 45          117        5,265
        50 and above     60                         21        1,260

        Σfi = 10,094;  Σxi·fi = 217,309.5

        x̄ = Σxi·fi / Σfi = 217,309.5 / 10,094 ≈ 21.5 years. This estimate is a little higher than the
        actual mean age of 20.9 years.


3.4 Measures of Position


1. Answers will vary. The kth percentile of a set of data is the value that divides the bottom
   k% of the data from the top (100 - k)% of the data. For example, if a data value lies at the
   60th percentile, then approximately 60% of the data is below it and approximately 40% is
   above this value.
2. This can happen because the percentile is rounded to the nearest integer. For example, if
   there were 150 scores in the class, then the percentile for the top score would be given by
   (149/150)·100 ≈ 99.3, which rounds to the 99th percentile, while the next score would
   correspond to a percentile of (148/150)·100 ≈ 98.7, which also rounds to the 99th percentile.
3. A four-star mutual fund is in the top 40% but not in the top 20% of its investment class.
That is, it is above the bottom 60% but below the top 20% of the ranked funds.
4. Not necessarily. When an outlier is discovered it should be investigated to find its cause.
Once the cause is determined, then it can be determined whether it should be removed from
the data set.
5. To qualify for Mensa, one needs to have an IQ that is in the top 2% of people.
6. Comparing z-scores gives us a unitless comparison of standard deviations from the mean.
They also take the relative size and variability of the data into account. This allows us to
have a standard basis for comparison and also enables us to more easily detect possible
outliers.

7.  z-score for the 34-week gestation baby: z = (x - μ)/σ = (2400 - 2600)/670 ≈ -0.30
    z-score for the 40-week gestation baby: z = (x - μ)/σ = (3300 - 3500)/475 ≈ -0.42
    The weight of the 34-week gestation baby is 0.30 standard deviations below the mean, while
    the weight of the 40-week gestation baby is 0.42 standard deviations below the mean. Thus,
    the 40-week gestation baby weighs less relative to the gestation period.

8.  z-score for the 34-week gestation baby: z = (x - μ)/σ = (3000 - 2600)/670 ≈ 0.60
    z-score for the 40-week gestation baby: z = (x - μ)/σ = (3900 - 3500)/475 ≈ 0.84
    The weight of the 34-week gestation baby is 0.60 standard deviations above the mean, while
    the weight of the 40-week gestation baby is 0.84 standard deviations above the mean. Thus,
    the 34-week gestation baby weighs less relative to the gestation period.
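
A minimal Python sketch of the comparison used in Problems 7 and 8 follows (not part of the original solutions); it simply converts each weight to a z-score with its own group's mean and standard deviation:

```python
# Compare two observations from different distributions via z-scores.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

z_34 = z_score(2400, 2600, 670)   # 34-week gestation baby (Problem 7)
z_40 = z_score(3300, 3500, 475)   # 40-week gestation baby (Problem 7)
print(round(z_34, 2), round(z_40, 2))   # -0.30 and -0.42
# The more negative z-score (the 40-week baby) is lighter relative to its own group.
```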



9.  z-score for the 75-inch man: z = (x - μ)/σ = (75 - 69.6)/2.7 = 2.00
    z-score for the 70-inch woman: z = (x - μ)/σ = (70 - 64.1)/2.6 ≈ 2.27
    The height of the 75-inch man is 2 standard deviations above the mean, while the height of
    a 70-inch woman is 2.27 standard deviations above the mean. Thus, the 70-inch woman is
    relatively taller than the 75-inch man.

10. z-score for the 68-inch man: z = (x - μ)/σ = (68 - 69.6)/2.7 ≈ -0.59
    z-score for the 62-inch woman: z = (x - μ)/σ = (62 - 64.1)/2.6 ≈ -0.81
    The height of the 68-inch man is 0.59 standard deviations below the mean, while the height
    of a 62-inch woman is 0.81 standard deviations below the mean. Thus, the 68-inch man is
    relatively taller than the 62-inch woman.

11. z-score for Jake Peavy: z = (x - μ)/σ = (2.27 - 4.198)/0.772 ≈ -2.50
    z-score for Johan Santana: z = (x - μ)/σ = (2.61 - 4.338)/0.785 ≈ -2.20
    Jake Peavy's 2004 ERA was 2.50 standard deviations below the mean, while Johan
    Santana's 2004 ERA was 2.20 standard deviations below the mean. Thus, Peavy had the
    better year relative to his peers.

12. z-score for Ted Williams: z = (x - μ)/σ = (0.406 - 0.28062)/0.03281 ≈ 3.82
    z-score for Ichiro Suzuki: z = (x - μ)/σ = (0.372 - 0.26992)/0.02154 ≈ 4.74
    Ted Williams' 1941 batting average was 3.82 standard deviations above the mean, while
    Ichiro Suzuki's 2004 batting average was 4.74 standard deviations above the mean. Thus,
    Suzuki had the better year relative to his peers.

13. The data provided in Table 17 are already listed in ascending order.
    (a) i = (k/100)(n + 1) = (40/100)(51 + 1) = 20.8. Since i = 20.8 is not an integer, we average
        the 20th and 21st data values: P40 = (325.5 + 333.2)/2 = 329.35. This means that
        approximately 40% of the states have violent crime rates less than 329.35 crimes per
        100,000 population, and approximately 60% of the states have violent crime rates more
        than this.



    (b) i = (k/100)(n + 1) = (95/100)(51 + 1) = 49.4. Since i = 49.4 is not an integer, we average
        the 49th and 50th data values: P95 = (730.2 + 793.5)/2 = 761.85. This means that
        approximately 95% of the states have violent crime rates less than 761.85 crimes per
        100,000 population, and approximately 5% of the states have violent crime rates more
        than this.
    (c) i = (k/100)(n + 1) = (10/100)(51 + 1) = 5.2. Since i = 5.2 is not an integer, we average the
        5th and 6th data values: P10 = (173.4 + 221.0)/2 = 197.2. This means that approximately
        10% of the states have violent crime rates less than 197.2 crimes per 100,000
        population, and approximately 90% of the states have violent crime rates more than this.
    (d) Of the 51 states, 48 have a violent crime rate less than Florida's violent crime rate.
        Percentile rank of Florida = (48/51)·100 ≈ 94. Florida's violent crime rate is at the 94th
        percentile. This means that approximately 94% of the states have violent crime rates
        that are less than that of Florida, and approximately 6% of the states have violent crime
        rates that are larger than that of Florida.
    (e) Of the 51 states, 40 have a violent crime rate less than California's violent crime rate.
        Percentile rank of California = (40/51)·100 ≈ 78. California's violent crime rate is at the
        78th percentile. This means that approximately 78% of the states have violent crime
        rates that are less than that of California, and approximately 22% of the states have
        violent crime rates that are larger than that of California.
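
A minimal Python sketch of the percentile rule used in Problems 13 and 14 follows (not part of the original solutions). The data below are a hypothetical sorted list, since the 51 state crime rates are not reproduced here:

```python
# kth percentile via the index rule i = (k/100)(n + 1), averaging when i is not an integer.
def percentile(sorted_data, k):
    n = len(sorted_data)
    i = k / 100 * (n + 1)
    if i == int(i):
        return sorted_data[int(i) - 1]          # 1-based position -> 0-based index
    lo = int(i)                                  # position just below i
    return (sorted_data[lo - 1] + sorted_data[lo]) / 2

sample = list(range(1, 52))                      # hypothetical sorted data: 1, 2, ..., 51
print(percentile(sample, 40))                    # position 20.8 -> mean of 20th and 21st values
```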
14. The data provided in Table 17 are already listed in ascending order.
    (a) i = (k/100)(n + 1) = (30/100)(51 + 1) = 15.6. Since i = 15.6 is not an integer, we average
        the 15th and 16th data values: P30 = (275.8 + 285.6)/2 = 280.7. This means that
        approximately 30% of the states have violent crime rates less than 280.7 crimes per
        100,000 population, and approximately 70% of the states have violent crime rates more
        than this.
    (b) i = (k/100)(n + 1) = (85/100)(51 + 1) = 44.2. Since i = 44.2 is not an integer, we average
        the 44th and 45th data values: P85 = (646.3 + 658.0)/2 = 652.15. This means that
        approximately 85% of the states have violent crime rates less than 652.15 crimes per
        100,000 population, and approximately 15% of the states have violent crime rates more
        than this.



    (c) i = (k/100)(n + 1) = (5/100)(51 + 1) = 2.6. Since i = 2.6 is not an integer, we average the
        2nd and 3rd data values: P5 = (108.9 + 110.2)/2 = 109.55. This means that approximately
        5% of the states have violent crime rates less than 109.55 crimes per 100,000 population,
        and approximately 95% of the states have violent crime rates more than this.
    (d) Of the 51 states, 45 have a violent crime rate less than New Mexico's violent crime
        rate. Percentile rank of New Mexico = (45/51)·100 ≈ 88. New Mexico's violent crime
        rate is at the 88th percentile. This means that approximately 88% of the states have
        violent crime rates that are less than that of New Mexico, and approximately 12% of
        the states have violent crime rates that are larger than that of New Mexico.
    (e) Of the 51 states, 15 have a violent crime rate less than Rhode Island's violent crime
        rate. Percentile rank of Rhode Island = (15/51)·100 ≈ 29. Rhode Island's violent crime
        rate is at the 29th percentile. This means that approximately 29% of the states have
        violent crime rates that are less than that of Rhode Island, and approximately 71% of
        the states have violent crime rates that are larger than that of Rhode Island.
15. (a) Computing the sample mean (x̄) and sample standard deviation (s) for the data yields
        x̄ = 3.9935 inches and s ≈ 1.7790 inches. Using these values as approximations for
        μ and σ, the z-score for x = 0.97 inches is z = (0.97 - 3.9935)/1.7790 ≈ -1.70. The
        rainfall in 1971 (0.97 inches) is 1.70 standard deviations below the mean.
    (b) The data provided are already listed in ascending order. There are n = 20 data points.
        The index for the first quartile is i = (25/100)(20 + 1) = 5.25. Since i = 5.25 is not an
        integer, we average the 5th and 6th data values: Q1 = (2.47 + 2.78)/2 = 2.625 inches. The
        index for the second quartile is i = (50/100)(20 + 1) = 10.5. Since i = 10.5 is not an
        integer, we average the 10th and 11th data values: Q2 = (3.97 + 4.0)/2 = 3.985 inches. The
        index for the third quartile is i = (75/100)(20 + 1) = 15.75. Since i = 15.75 is not an
        integer, we average the 15th and 16th data values: Q3 = (5.22 + 5.50)/2 = 5.36 inches.
    (c) IQR = Q3 - Q1 = 5.36 - 2.625 = 2.735 inches
    (d) Lower fence = Q1 - 1.5(IQR) = 2.625 - 1.5(2.735) = -1.478 inches.
        Upper fence = Q3 + 1.5(IQR) = 5.36 + 1.5(2.735) = 9.463 inches.
        According to this criterion, there are no outliers.
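
For readers who want a programmatic check of the quartile/fence procedure used in Problems 15-18, here is a minimal Python sketch (not part of the original solutions). The example data are hypothetical, not the rainfall data from Problem 15:

```python
# Quartiles by the (k/100)(n + 1) position rule, IQR fences, and outlier flagging.
def quartile(sorted_data, k):
    n = len(sorted_data)
    i = k / 100 * (n + 1)
    if i == int(i):
        return sorted_data[int(i) - 1]
    lo = int(i)
    return (sorted_data[lo - 1] + sorted_data[lo]) / 2

def outliers(sorted_data):
    q1, q3 = quartile(sorted_data, 25), quartile(sorted_data, 75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted_data if x < lower or x > upper]

print(outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))   # [50] is flagged as an outlier
```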



16. (a) Computing the sample mean (x̄) and sample standard deviation (s) for the data yields
        x̄ = 10.08 g/dL and s ≈ 1.8858 g/dL. Using these values as approximations for μ
        and σ, the z-score for x = 7.8 g/dL is z = (7.8 - 10.08)/1.8858 ≈ -1.21. Blackie's
        hemoglobin level (7.8 g/dL) is 1.21 standard deviations below the mean.
    (b) The data provided are already listed in ascending order. There are n = 20 data points.
        The index for the first quartile is i = (25/100)(20 + 1) = 5.25. Since i = 5.25 is not an
        integer, we average the 5th and 6th data values: Q1 = (8.9 + 9.4)/2 = 9.15 g/dL. The index
        for the second quartile is i = (50/100)(20 + 1) = 10.5. Since i = 10.5 is not an integer, we
        average the 10th and 11th data values: Q2 = (9.9 + 10.0)/2 = 9.95 g/dL. The index for the
        third quartile is i = (75/100)(20 + 1) = 15.75. Since i = 15.75 is not an integer, we
        average the 15th and 16th data values: Q3 = (11.0 + 11.2)/2 = 11.1 g/dL.
    (c) IQR = Q3 - Q1 = 11.1 - 9.15 = 1.95 g/dL
    (d) Lower fence = Q1 - 1.5(IQR) = 9.15 - 1.5(1.95) = 6.225 g/dL.
        Upper fence = Q3 + 1.5(IQR) = 11.1 + 1.5(1.95) = 14.025 g/dL.
        The hemoglobin level 5.7 g/dL is an outlier because it is less than the lower fence.
17. (a) Computing the sample mean (x̄) and sample standard deviation (s) for the data yields
        x̄ ≈ 15.9227 mg/L and s ≈ 7.3837 mg/L. Using these values as approximations for μ
        and σ, the z-score for x = 20.46 mg/L is z = (20.46 - 15.9227)/7.3837 ≈ 0.61. The
        organic concentration of 20.46 mg/L is 0.61 standard deviations above the mean.
    (b) There are n = 33 data points, and we must put them in ascending order:
        5.2, 5.29, 5.3, 6.51, 7.4, 8.09, 8.81, 9.72, 10.3, 11.4, 11.9, 14, 14.86, 14.86,
        14.9, 15.35, 15.42, 15.72, 15.91, 16.51, 16.87, 17.5, 17.9, 18.3, 19.8, 20.46,
        20.46, 22.49, 22.74, 27.1, 29.8, 30.91, 33.67
        The index for the first quartile is i = (25/100)(33 + 1) = 8.5. Since i = 8.5 is not an
        integer, we average the 8th and 9th data values: Q1 = (9.72 + 10.3)/2 = 10.01 mg/L. The
        index for the second quartile is i = (50/100)(33 + 1) = 17. Since i = 17 is an integer, the
        17th data value is the second quartile: Q2 = 15.42 mg/L. The index for the third
        quartile is i = (75/100)(33 + 1) = 25.5. Since i = 25.5 is not an integer, we average the
        25th and 26th data values: Q3 = (19.8 + 20.46)/2 = 20.13 mg/L.
    (c) IQR = Q3 - Q1 = 20.13 - 10.01 = 10.12 mg/L
    (d) Lower fence = Q1 - 1.5(IQR) = 10.01 - 1.5(10.12) = -5.17 mg/L.
        Upper fence = Q3 + 1.5(IQR) = 20.13 + 1.5(10.12) = 35.31 mg/L.
        According to this criterion, there are no outliers.
18. (a) Computing the sample mean (x̄) and sample standard deviation (s) for the data yields
        x̄ ≈ 10.0266 mg/L and s ≈ 4.9789 mg/L. Using these values as approximations for μ
        and σ, the z-score for x = 17.99 mg/L is z = (17.99 - 10.0266)/4.9789 ≈ 1.60. The
        organic concentration of 17.99 mg/L is 1.60 standard deviations above the mean.
    (b) There are n = 47 data points, and we must put them in ascending order:
        3.02, 3.79, 3.91, 3.99, 4.6, 4.71, 4.8, 4.85, 4.9, 5.5, 7, 7.11, 7.31, 7.45, 7.66,
        7.85, 7.9, 7.92, 8.05, 8.37, 8.5, 8.5, 8.79, 9.1, 9.11, 9.29, 9.6, 9.81, 10.3, 10.47,
        10.72, 10.89, 11.33, 11.56, 11.72, 11.72, 11.8, 11.97, 12.57, 12.89, 16.92, 17.9,
        17.99, 21, 21.4, 21.82, 22.62
        The index for the first quartile is i = (25/100)(47 + 1) = 12. Since i = 12 is an integer, the
        12th data value is the first quartile: Q1 = 7.11 mg/L. The index for the second quartile
        is i = (50/100)(47 + 1) = 24. Since i = 24 is an integer, the 24th data value is the second
        quartile: Q2 = 9.1 mg/L. The index for the third quartile is i = (75/100)(47 + 1) = 36.
        Since i = 36 is an integer, the 36th data value is the third quartile: Q3 = 11.72 mg/L.
    (c) IQR = Q3 - Q1 = 11.72 - 7.11 = 4.61 mg/L
    (d) Lower fence = Q1 - 1.5(IQR) = 7.11 - 1.5(4.61) = 0.195 mg/L.
        Upper fence = Q3 + 1.5(IQR) = 11.72 + 1.5(4.61) = 18.635 mg/L.
        The organic carbon concentrations 21, 21.4, 21.82, and 22.62 mg/L are outliers because
        they are higher than the upper fence.
19. The first and third quartiles are Q1 = 433 minutes and Q3 = 489.5 minutes.
    Upper fence = Q3 + 1.5(IQR) = 489.5 + 1.5(489.5 - 433) = 574.25 minutes.
    The cutoff point is 574 minutes. If more minutes are used, the customer is contacted.
20. The first and third quartiles are Q1 = $84 and Q3 = $138.
    Upper fence = Q3 + 1.5(IQR) = 138 + 1.5(138 - 84) = $219.
    If daily charges exceed $219, the customer will be contacted.


21. (a) The first and third quartiles are Q1 = $67 and Q3 = $479.
        Lower fence = Q1 - 1.5(IQR) = 67 - 1.5(479 - 67) = -$551
        Upper fence = Q3 + 1.5(IQR) = 479 + 1.5(479 - 67) = $1097.
        Therefore, $12,777 is an outlier because it is greater than the upper fence.
(b)

(c) Answers will vary. One possibility is that a student may have provided his or her
annual income instead of his or her weekly income.
22. (a) The first and third quartiles are Q1 = $21 and Q3 = $54.
        Lower fence = Q1 - 1.5(IQR) = 21 - 1.5(54 - 21) = -$28.50
        Upper fence = Q3 + 1.5(IQR) = 54 + 1.5(54 - 21) = $103.50.
        Therefore, $115 and $1000 are outliers because they are greater than the upper fence.

(b)

(c) Answers will vary. One possibility follows: It is possible that $115 is correct but
simply an unusual situation. For the data value $1000, perhaps a student provided his
or her annual expenditures for entertainment instead of his or her weekly expenditures.



23.
        Pulse    z-score
         76        0.49
         60       -1.59
         60       -1.59
         81        1.14
         72       -0.03
         80        1.01
         80        1.01
         68       -0.55
         73        0.10

    Mean of the pulse rates = 72.2; standard deviation of the pulse rates = 7.671.
    Mean of the z-scores = 0.0; standard deviation of the z-scores = 1.00.

24.
        Travel Time    z-score
            39           0.98
            21          -0.42
             9          -1.36
            32           0.43
            30           0.28
            45           1.44
            11          -1.20
            12          -1.12
            39           0.98

    Mean of the travel times = 26.4; standard deviation of the travel times = 12.842.
    Mean of the z-scores = 0.0; standard deviation of the z-scores = 1.000.
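
A minimal Python sketch of the standardization in Problems 23 and 24 follows (not part of the original solutions). The printed standard deviations (7.671 and 12.842) match the population form, so the sketch divides by n; however the data are standardized, the resulting z-scores have mean 0 and standard deviation 1:

```python
# Convert a data set to z-scores and confirm the z-scores have mean 0 and SD 1.
pulse = [76, 60, 60, 81, 72, 80, 80, 68, 73]

n = len(pulse)
mu = sum(pulse) / n
sigma = (sum((x - mu) ** 2 for x in pulse) / n) ** 0.5
z = [(x - mu) / sigma for x in pulse]

print(round(mu, 1), round(sigma, 3))          # about 72.2 and 7.671
z_mean = sum(z) / n
z_sigma = (sum((zi - z_mean) ** 2 for zi in z) / n) ** 0.5
print(round(z_mean, 6), round(z_sigma, 6))    # 0.0 and 1.0 (up to rounding)
```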

3.5 The Five-Number Summary and Boxplots


1. The median and interquartile range are better measures of central tendency and dispersion
if the data are skewed or if the data contain outliers.
2. right
3. (a) The median is to the left of the center of the box and the right line is substantially
longer than the left line, so the distribution is skewed right.
(b) Reading the boxplot, the five-number summary is approximately: 0, 1, 3, 6, 16.
4. (a) The median is near the center of the box and the horizontal lines are approximately the
same in length, so the distribution is symmetric.
(b) Reading the boxplot, the five-number summary is approximately: -1, 2, 5, 8, 11.



5. The data in ascending order are as follows:
42, 43, 46, 46, 47, 48, 49, 49, 50, 50, 51, 51, 51, 51, 52, 52, 54, 54, 54, 54, 54, 55, 55, 55,
55, 56, 56, 56, 57, 57, 57, 57, 58, 60, 61, 61, 61, 62, 64, 64, 65, 68, 69
The smallest number (youngest president) in the data set is 42. The largest number in the
data set is 69. The first quartile is Q1 = 51 (the 11th data point). The median is M = 55
(the 22nd data point). The third quartile is Q3 = 58 (the 33rd data point). The five-number
summary is 42, 51, 55, 58, 69.
The upper and lower fences are: Lower fence = Q1 - 1.5(IQR) = 51 - 1.5(58 - 51) = 40.5;
Upper fence = Q3 + 1.5(IQR) = 58 + 1.5(58 - 51) = 68.5. Thus, 69 is an outlier.

The median is near the center of the box and the horizontal lines are approximately the
same in length, so the distribution is symmetric.
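
For readers who want to reproduce the five-number summary and fences for Problem 5, here is a minimal Python sketch (not part of the original solution; the ages are copied from the ordered list above):

```python
# Five-number summary and fences for the presidential ages in Problem 5.
ages = [42, 43, 46, 46, 47, 48, 49, 49, 50, 50, 51, 51, 51, 51, 52, 52, 54, 54, 54, 54,
        54, 55, 55, 55, 55, 56, 56, 56, 57, 57, 57, 57, 58, 60, 61, 61, 61, 62, 64, 64,
        65, 68, 69]

def position_value(data, k):
    i = k / 100 * (len(data) + 1)
    if i == int(i):
        return data[int(i) - 1]
    lo = int(i)
    return (data[lo - 1] + data[lo]) / 2

q1 = position_value(ages, 25)
m = position_value(ages, 50)
q3 = position_value(ages, 75)
print(min(ages), q1, m, q3, max(ages))              # 42 51 55 58 69
print(q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1))   # fences 40.5 and 68.5
```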
6. The data in ascending order are as follows:
1, 2, 8, 8, 11, 11, 12, 15, 16, 16, 17, 23, 23, 23, 23, 28, 28, 31, 33, 33, 35, 40
The smallest number in the data set is 1. The largest number in the data set is 40. The first
quartile is Q1 = (11 + 11)/2 = 11 (the mean of the 5th and 6th data points). The median is
M = (17 + 23)/2 = 20 (the mean of the 11th and 12th data points). The third quartile is
Q3 = (28 + 28)/2 = 28 (the mean of the 16th and 17th data points). The five-number summary is
1, 11, 20, 28, 40.
The upper and lower fences are: Lower fence = Q1 - 1.5(IQR) = 11 - 1.5(28 - 11) = -14.5;
Upper fence = Q3 + 1.5(IQR) = 28 + 1.5(28 - 11) = 53.5. Thus, there are no outliers.

The median is near the center of the box and the horizontal lines are approximately the
same in length, so the distribution is symmetric.



7. The data in ascending order are as follows:
   1, 3, 3, 3, 3, 4, 4, 4, 5, 7, 7, 7, 9, 10, 10, 10, 12, 13, 14, 15, 16, 17, 17, 17, 17, 18, 19, 19,
   21, 22, 23, 25, 27, 27, 29, 32, 35, 36, 45
   The smallest number in the data set is 1. The largest number in the data set is 45. The first
   quartile is Q1 = 7 (the 10th data point). The median is M = 15 (the 20th data point). The
   third quartile is Q3 = 22 (the 30th data point). The five-number summary is 1, 7, 15, 22,
   45. The upper and lower fences are:
   Lower fence = Q1 - 1.5(IQR) = 7 - 1.5(22 - 7) = -15.5;
   Upper fence = Q3 + 1.5(IQR) = 22 + 1.5(22 - 7) = 44.5. Thus, 45 is an outlier.

The median is to the left of the center of the box and the right line is substantially longer
than the left line, so the distribution is skewed right.
8. The data in ascending order are as follows:
   18, 19, 19, 19, 20, 21, 22, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 28, 28, 29,
   29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 32, 32, 32, 32,
   32, 32, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 38, 39, 46
   The smallest number in the data set is 18. The largest is 46. The first quartile is Q1 = 26
   (the 16th data point). The median is M = 30 (the 32nd data point). The third quartile is
   Q3 = 32 (the 48th data point). The five-number summary is 18, 26, 30, 32, 46.
   The upper and lower fences are: Lower fence = Q1 - 1.5(IQR) = 26 - 1.5(32 - 26) = 17;
   Upper fence = Q3 + 1.5(IQR) = 32 + 1.5(32 - 26) = 41. Thus, 46 is an outlier.

The median is to the right of the center of the box, so the distribution is skewed left.



9. The data in ascending order are as follows:
   0.598, 0.600, 0.600, 0.601, 0.602, 0.603, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607, 0.608,
   0.608, 0.608, 0.608, 0.608, 0.609, 0.610, 0.610, 0.610, 0.610, 0.611, 0.611, 0.612
   The smallest number in the data set is 0.598. The largest is 0.612. The first quartile is
   Q1 = (0.603 + 0.605)/2 = 0.604 (the mean of the 6th and 7th data points). The median is
   M = 0.608 (the 13th data point). The third quartile is Q3 = (0.610 + 0.610)/2 = 0.610 (the
   mean of the 19th and 20th data points). The five-number summary is 0.598, 0.604, 0.608,
   0.610, 0.612. The upper and lower fences are:
   Lower fence = Q1 - 1.5(IQR) = 0.604 - 1.5(0.610 - 0.604) = 0.595;
   Upper fence = Q3 + 1.5(IQR) = 0.610 + 1.5(0.610 - 0.604) = 0.619.
   Thus, there are no outliers.

The median is to the right of the center of the box, so the distribution is skewed left.
Answers will vary concerning the source of variability in weight.
10. The data in ascending order are as follows:
421, 480, 581, 583, 598, 611, 616, 618, 643, 645, 646, 649, 653, 654, 660, 664, 666, 667,
669, 672, 675, 678, 679, 682, 683, 684, 688, 688, 692, 692, 698, 698, 704, 706, 707, 707,
711, 711, 713, 715, 726, 737, 740, 741, 787, 791, 802, 816, 821, 830, 971
The smallest number in the data set is 421. The largest number in the data set is 971. The
first quartile is Q1 = 653 (the 13th data point). The median is M = 684 (the 26th data
point). The third quartile is Q3 = 713 (the 39th data point). The five-number summary is
421, 653, 684, 713, 971. The upper and lower fences are:
Lower fence = Q1 - 1.5(IQR) = 653 - 1.5(713 - 653) = 563;
Upper fence = Q3 + 1.5(IQR) = 713 + 1.5(713 - 653) = 803.

Thus, the data points 421, 480, 816, 821, 830, and 971 are outliers.


The median is near the center of the box. Though the left line is longer than the right line,
when we consider the positions of the outliers, the distribution is relatively symmetric.
Answers will vary. Wyoming is very rural resulting in the need to drive further distances.
New York is more urban with many mass transit systems resulting in many individual
gasoline expenditures.



11. (a) The data in ascending order are as follows:
        28, 32, 33, 35, 36, 38, 39, 44, 44, 45, 45, 46, 46, 48, 48, 48, 49, 50, 51, 51, 51, 52, 52,
        53, 53, 54, 55, 56, 56, 58, 59, 60, 60, 62, 63, 66, 69, 70, 70, 73
        The smallest number in the data set is 28. The largest number in the data set is 73. The
        first quartile is Q1 = (45 + 45)/2 = 45 (the mean of the 10th and 11th data points). The
        median is M = (51 + 51)/2 = 51 (the mean of the 20th and 21st data points). The third
        quartile is Q3 = (58 + 59)/2 = 58.5 (the mean of the 30th and 31st data points). The
        five-number summary is 28, 45, 51, 58.5, 73.
    (b) Lower fence = Q1 - 1.5(IQR) = 45 - 1.5(58.5 - 45) = 24.75;
        Upper fence = Q3 + 1.5(IQR) = 58.5 + 1.5(58.5 - 45) = 78.75. There are no outliers.

(c) The median is near the center of the box and the horizontal lines are approximately
equal in length, so the distribution is symmetric. This is confirmed by the histogram.
(d) Since the distribution is symmetric and contains no outliers, the mean and standard
deviation should be reported as the measures of central tendency and dispersion.
12. (a) The data in ascending order are as follows:
3.01, 3.04, 3.25, 3.38, 3.38, 3.56, 3.78, 4.35, 4.43, 4.50, 4.74, 4.88, 5.00, 5.02, 5.32,
5.34, 5.53, 5.58, 5.64, 5.75, 6.06, 6.07, 6.23, 6.52, 6.57, 6.92, 7.16, 7.25, 7.57, 7.97,
8.40, 8.74, 9.70, 10.32, 10.96
The smallest number in the data set is 3.01. The largest number in the data set is 10.96.
The first quartile is Q1 = 4.43 (the 9th data point). The median is M = 5.58 (the 18th
data point). The third quartile is Q3 = 7.16 (the 27th data point). The five-number
summary is 3.01, 4.43, 5.58, 7.16, 10.96.
(b) Lower fence = Q1 - 1.5(IQR) = 4.43 - 1.5(7.16 - 4.43) = 0.335;
    Upper fence = Q3 + 1.5(IQR) = 7.16 + 1.5(7.16 - 4.43) = 11.255. There are no outliers.



(c) The median is to the left of the center of the box and the right line is substantially
longer than the left line, so the distribution is skewed right. This is confirmed by the
histogram.
(d) Since the distribution is skewed, the median and interquartile range should be reported
as the measures of central tendency and dispersion.
13. (a) The data in ascending order are as follows:
        0, 0, 0, 0, 0, 0, 0, 0.41, 0.62, 0.64, 0.67, 0.89, 0.94, 1.05, 1.06, 1.15, 1.22, 1.35, 1.68,
        1.7, 1.7, 2.04, 2.07, 2.16, 2.38, 2.45, 2.59, 2.83
        The smallest number in the data set is 0. The largest number in the data set is 2.83.
        The first quartile is Q1 = (0 + 0.41)/2 = 0.205 (the mean of the 7th and 8th data points).
        The median is M = (1.05 + 1.06)/2 = 1.055 (the mean of the 14th and 15th data points).
        The third quartile is Q3 = (1.7 + 2.04)/2 = 1.87 (the mean of the 21st and 22nd data
        points). The five-number summary is 0, 0.205, 1.055, 1.87, 2.83.
    (b) Lower fence = Q1 - 1.5(IQR) = 0.205 - 1.5(1.87 - 0.205) = -2.2925;
        Upper fence = Q3 + 1.5(IQR) = 1.87 + 1.5(1.87 - 0.205) = 4.3675.

Thus, there are no outliers.

(c) The right line is substantially longer than the left line, so the distribution is skewed
right. This is confirmed by the histogram.
(d) Since the distribution is skewed, the median and interquartile range should be reported
as the measures of central tendency and dispersion.
14. (a) The data in ascending order are as follows:
78, 107, 108, 161, 177, 225, 234, 237, 255, 262, 268, 274, 279, 285, 286, 291, 292,
311, 314, 343, 345, 351, 352, 352, 357, 375, 377, 402, 424, 444, 459, 470, 484, 496,
503, 539, 540, 553, 563, 579, 593, 599, 621, 638, 662, 717, 740, 770, 770, 822, 1633
The smallest number in the data set is 78. The largest is 1633. The first quartile is
Q1 = 279 (the 13th data point). The median is M = 375 (the 26th data point). The third
quartile is Q3 = 563 (the 39th data point). The five-number summary is 78, 279, 375,
563, 1633.



(b) Lower fence = Q1 - 1.5(IQR) = 285.5 - 1.5(563 - 285.5) = -130.75;
    Upper fence = Q3 + 1.5(IQR) = 563 + 1.5(563 - 285.5) = 979.25.

Thus, the data point 1633 is an outlier.

(c) The median is to the left of the center of the box, so the distribution is skewed right.
This is confirmed by the histogram.
(d) Since the distribution is skewed, the median and interquartile range should be reported
as the measures of central tendency and dispersion.
15. The data in ascending order are:
Keebler:
20, 20, 21, 21, 21, 22, 23, 24, 24, 24, 25, 25, 26, 28, 28, 28, 28, 29, 31, 32, 33

Store Brand: 16, 17, 18, 21, 21, 21, 23, 23, 24, 24, 24, 25, 26, 26, 27, 27, 28, 29, 30, 31, 33
Since both sets of data contain n = 21 data points, the quartiles are in the same positions for
both sets. Namely, the first quartile is the mean of the 5th and 6th data points, the median is
the 11th data point, and the third quartile is the mean of the 16th and 17th data points.
The five-number summaries are:
Keebler: 20, 21.5, 25, 28, 33
Store Brand: 16, 21, 24, 27.5, 33
The fences for Keebler Chips Deluxe Chocolate Chip Cookies are:
Lower fence = 21.5 - 1.5(28 - 21.5) = 11.75; Upper fence = 28 + 1.5(28 - 21.5) = 37.75
The fences for the store brand chocolate chip cookies are:
Lower fence = 21 - 1.5(27.5 - 21) = 11.25; Upper fence = 27.5 + 1.5(27.5 - 21) = 37.25
So, neither data set has any outliers.

Keebler appears to have both a higher number of chocolate chips per cookie and the more
consistent number of chips per cookie.



16. The data in ascending order are:
Oklahoma: 18, 30, 40, 44, 47, 55, 61, 62, 64, 64, 73, 78, 79, 83, 145

Kansas:

42, 59, 62, 64, 68, 71, 73, 88, 91, 92, 95, 101, 113, 116, 122

Nebraska:

26, 28, 30, 55, 60, 61, 62, 63, 65, 69, 74, 81, 88, 102, 110

Since all three sets of data contain n = 15 data points, the quartiles are in the same positions
for all three sets. Namely, the first quartile is the 4th data point, the median is the 8th data
point, and the third quartile is the 12th data point.
The five-number summaries are:
Oklahoma: 18, 44, 62, 78, 145
Kansas: 42, 64, 88, 101, 122
Nebraska: 26, 55, 63, 81, 110
Oklahoma: Lower fence = 44 - 1.5(78 - 44) = -7; Upper fence = 78 + 1.5(78 - 44) = 129,
          so 145 is an outlier.
Kansas:   Lower fence = 64 - 1.5(101 - 64) = 8.5; Upper fence = 101 + 1.5(101 - 64) = 156.5,
          so there are no outliers.
Nebraska: Lower fence = 55 - 1.5(81 - 55) = 16; Upper fence = 81 + 1.5(81 - 55) = 120, so
          there are no outliers.

Kansas appears to have a higher number of tornados per year.


17. The data in ascending order are:

McGwire: 340, 341, 350, 350, 360, 360, 360, 369, 370, 370, 370, 370, 377, 380, 380, 380,
380, 380, 385, 385, 388, 390, 390, 390, 390, 398, 400, 400, 409, 410, 410, 410,
410, 410, 420, 420, 420, 420, 420, 423, 425, 430, 430, 430, 430, 430, 430, 430,
440, 440, 440, 450, 450, 450, 450, 452, 458, 460, 460, 461, 470, 470, 470, 478,
480, 500, 510, 510, 527, 550
The smallest number in the data set is 340. The largest number is 550. The first quartile is
Q1 = 380 (the mean of the 17th and 18th data points). The median is M = 420 (the mean of
the 35th and 36th data points). The third quartile is Q3 = 450 (the mean of the 53rd and 54th
data points). The five-number summary for Mark McGwire is 340, 380, 420, 450, 550.
Lower fence = 380 - 1.5(450 - 380) = 275; Upper fence = 450 + 1.5(450 - 380) = 555.
Thus, there are no outliers.


Sosa: 340, 344, 350, 350, 350, 360, 364, 364, 365, 366, 368, 370, 370, 370, 370, 370, 371,
380, 380, 380, 380, 380, 380, 388, 390, 390, 400, 400, 400, 400, 400, 405, 410, 410,
410, 410, 410, 414, 415, 420, 420, 420, 420, 420, 420, 420, 420, 430, 430, 430, 430,
430, 430, 433, 433, 434, 434, 440, 440, 440, 450, 460, 480, 480, 482, 500,
The smallest number in the data set is 340. The largest number is 500. The first quartile is
Q1 = 370.5 (the mean of the 16th and 17th data points). The median is M = 410 (the mean
of the 33rd and 34th data points). The third quartile is Q3 = 430 (the mean of the 50th and
51st data points). The five-number summary for Sammy Sosa is 340, 370.5, 410, 430, 500.
Lower fence = 370.5 - 1.5(430 - 370.5) = 281.25;
Upper fence = 430 + 1.5(430 - 370.5) = 519.25. Thus, there are no outliers.

(Note: The TI-84 gives Q1 = 371 because the calculator uses a different, but acceptable,
procedure for determining the quartiles. In most cases, the different procedures produce
the same results, but in this case, they differ slightly.)
Bonds:

320, 320, 347, 350, 360, 360, 360, 361, 365, 370, 370, 375, 375, 375, 375, 380,
380, 380, 380, 380, 385, 390, 390, 391, 394, 396, 400, 400, 400, 400, 404, 405,
410, 410, 410, 410, 410, 410, 410, 410, 410, 410, 411, 415, 415, 416, 417, 417,
420, 420, 420, 420, 420, 420, 420, 420, 429, 430, 430, 430, 430, 430, 435, 435,
436, 440, 440, 440, 440, 442, 450, 454, 488

The smallest number in the data set is 320. The largest number is 488. The first quartile is
Q1 = 380 (the mean of the 18th and 19th data points). The median is M = 410 (the 37th
data point). The third quartile is Q3 = 420 (the mean of the 55th and 56th data points). The
five-number summary for Barry Bonds is 320, 380, 410, 420, 488.
Lower fence = 380 - 1.5(420 - 380) = 320; Upper fence = 420 + 1.5(420 - 380) = 480.
Thus, 488 is an outlier.

Mark McGwire appears to have longer distances. Barry Bonds appears to have the most
consistent distances.


Chapter 3 Review Exercises


1. (a) x̄ = Σxi/n = 7925.1/10 = 792.51 m/s; M = (792.4 + 792.4)/2 = 792.4 m/s
       Data in order: 789.6, 791.4, 791.7, 792.3, 792.4, 792.4, 793.1, 793.8, 794.0, 794.4
   (b) Range = Largest Data Value - Smallest Data Value = 794.4 - 789.6 = 4.8 m/s.

       Data, xi   Deviation, xi - x̄          Squared Deviation, (xi - x̄)²
       793.8      793.8 - 792.51 =  1.29      1.29² = 1.6641
       793.1      793.1 - 792.51 =  0.59      0.59² = 0.3481
       792.4      792.4 - 792.51 = -0.11      (-0.11)² = 0.0121
       794.0      794.0 - 792.51 =  1.49      1.49² = 2.2201
       791.4      791.4 - 792.51 = -1.11      (-1.11)² = 1.2321
       792.4      792.4 - 792.51 = -0.11      (-0.11)² = 0.0121
       791.7      791.7 - 792.51 = -0.81      (-0.81)² = 0.6561
       792.3      792.3 - 792.51 = -0.21      (-0.21)² = 0.0441
       789.6      789.6 - 792.51 = -2.91      (-2.91)² = 8.4681
       794.4      794.4 - 792.51 =  1.89      1.89² = 3.5721

       Σxi = 7925.1;  Σ(xi - x̄) = 0;  Σ(xi - x̄)² = 18.2290

       s² = Σ(xi - x̄)²/(n - 1) = 18.2290/(10 - 1) ≈ 2.03 (m/s)²;
       s = sqrt( Σ(xi - x̄)²/(n - 1) ) = sqrt( 18.2290/(10 - 1) ) ≈ 1.42 m/s.

2. (a) x̄ = Σxi/n = 1268/10 = 126.8 beats/min; M = (128 + 129)/2 = 128.5 beats/min
       Data in order: 86, 96, 115, 120, 128, 129, 136, 143, 146, 169
   (b) Range = Largest Data Value - Smallest Data Value = 169 - 86 = 83 beats/min.
       The sample mean is x̄ = 126.8 for every deviation below.

       Data, xi   Deviation, xi - x̄   Squared Deviation, (xi - x̄)²
       136           9.2                  84.64
       169          42.2                1780.84
       120          -6.8                  46.24
       128           1.2                   1.44
       129           2.2                   4.84
       143          16.2                 262.44
       115         -11.8                 139.24
       146          19.2                 368.64
        96         -30.8                 948.64
        86         -40.8                1664.64

       Σxi = 1268;  Σ(xi - x̄) = 0;  Σ(xi - x̄)² = 5301.60

       s² = Σ(xi - x̄)²/(n - 1) = 5301.60/(10 - 1) ≈ 589.1 (beats/min)²;
       s = sqrt( 5301.60/(10 - 1) ) ≈ 24.3 beats/min.

3. (a) x̄ = Σxi/n = 91,610/9 ≈ 10,178.8889 ≈ $10,178.89; M = $9,980
       Data in order: 5500, 7200, 7889, 8998, 9980, 10995, 12999, 13999, 14050
   (b) Range = Largest Data Value - Smallest Data Value = 14,050 - 5,500 = $8,550.
       The sample mean is x̄ = 10,178.8889 for every deviation below.

       Data, xi    Deviation, xi - x̄    Squared Deviation, (xi - x̄)²
       14,050        3,871.1111           14,985,501.1
       13,999        3,820.1111           14,593,248.8
       12,999        2,820.1111            7,953,026.6
       10,995          816.1111              666,037.3
        9,980         -198.8889               39,556.8
        8,998       -1,180.8889            1,394,498.6
        7,889       -2,289.8889            5,243,591.2
        7,200       -2,978.8889            8,873,779.1
        5,500       -4,678.8889           21,892,001.3

       Σxi = 91,610;  Σ(xi - x̄) = 0;  Σ(xi - x̄)² = 75,641,240.9

       s = sqrt( Σ(xi - x̄)²/(n - 1) ) = sqrt( 75,641,240.9/(9 - 1) ) ≈ $3,074.92.

   (c) x̄ = Σxi/n = 118,610/9 ≈ 13,178.8889 ≈ $13,178.89
       Data in order: 5500, 7200, 7889, 8998, 9980, 10995, 12999, 13999, 41050
       M = $9,980; Range = 41,050 - 5,500 = $35,550.
       The sample mean is x̄ = 13,178.8889 for every deviation below.

       Data, xi    Deviation, xi - x̄    Squared Deviation, (xi - x̄)²
       41,050       27,871.1111          776,798,833.9
       13,999          820.1111              672,582.2
       12,999         -179.8889               32,360.0
       10,995       -2,183.8889            4,769,370.7
        9,980       -3,198.8889           10,232,890.1
        8,998       -4,180.8889           17,479,831.9
        7,889       -5,289.8889           27,982,924.5
        7,200       -5,978.8889           35,747,112.4
        5,500       -7,678.8889           58,965,334.7

       Σxi = 118,610;  Σ(xi - x̄) = 0;  Σ(xi - x̄)² = 932,681,240.9

       s = sqrt( Σ(xi - x̄)²/(n - 1) ) = sqrt( 932,681,240.9/(9 - 1) ) ≈ $10,797.46.

The mean, range, and standard deviation are all changed considerably by the incorrectly
entered data value. The median does not change. The median is resistant.
4. (a) x̄ = Σxi/n = 2,071,024/15 ≈ 138,068.2667 ≈ $138,068
       Data in order: 99000, 115000, 124757, 128429, 135512, 136529, 136833, 136924,
       138820, 140794, 149143, 149380, 153146, 157216, 169541
       M = $136,924
   (b) Range = Largest Data Value - Smallest Data Value = 169,541 - 99,000 = $70,541.
       The sample mean is x̄ = 138,068.2667 for every deviation below.

       Data, xi     Deviation, xi - x̄    Squared Deviation, (xi - x̄)²
       138,820           751.7333                565,103
       169,541        31,472.7333            990,532,941
       135,512        -2,556.2667              6,534,499
       149,143        11,074.7333            122,649,717
       140,794         2,725.7333              7,429,622
       153,146        15,077.7333            227,338,041
        99,000       -39,068.2667          1,526,329,462
       136,924        -1,144.2667              1,309,346
       136,833        -1,235.2667              1,525,884
       115,000       -23,068.2667            532,144,928
       124,757       -13,311.2667            177,189,821
       128,429        -9,639.2667             92,915,463
       157,216        19,147.7333            366,635,690
       149,380        11,311.7333            127,955,310
       136,529        -1,539.2667              2,369,342

       Σxi = 2,071,024;  Σ(xi - x̄) = 0;  Σ(xi - x̄)² = 4,183,425,169

       s = sqrt( Σ(xi - x̄)²/(n - 1) ) = sqrt( 4,183,425,169/(15 - 1) ) ≈ $17,286.30.
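
For readers who want to reproduce the deviation-table computations in Review Exercises 1-4, here is a minimal Python sketch (not part of the original solutions), using the muzzle-velocity data from Exercise 1:

```python
# Sample variance and standard deviation via the definitional (deviation) formula.
data = [793.8, 793.1, 792.4, 794.0, 791.4, 792.4, 791.7, 792.3, 789.6, 794.4]

n = len(data)
xbar = sum(data) / n                                   # sample mean
sq_dev = [(x - xbar) ** 2 for x in data]               # the "squared deviations" column
s2 = sum(sq_dev) / (n - 1)                             # sample variance
s = s2 ** 0.5                                          # sample standard deviation

print(round(xbar, 2), round(s2, 2), round(s, 2))       # 792.51, 2.03, 1.42
```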


5. (a) μ = Σxi/N = 933/16 ≈ 58.3 years
       Data in order: 44, 46, 51, 55, 56, 56, 56, 58, 59, 62, 62, 62, 64, 65, 68, 69
       M = (58 + 59)/2 = 58.5 years
       The data are bimodal: 56 years and 62 years. Both have frequencies of 3.
   (b) Range = 69 - 44 = 25 years
       To calculate the population standard deviation, we use the computational formula.

       Data values, xi:  44, 56, 51, 46, 59, 56, 58, 55, 65, 64, 68, 69, 56, 62, 62, 62
       Squared values, xi²: 1936, 3136, 2601, 2116, 3481, 3136, 3364, 3025, 4225, 4096,
                            4624, 4761, 3136, 3844, 3844, 3844
       Σxi = 933;  Σxi² = 55,169

       σ = sqrt( (Σxi² - (Σxi)²/N) / N ) = sqrt( (55,169 - (933)²/16) / 16 ) ≈ 6.9 years
   (c) Answers will vary depending on the samples selected.

6. (a) To find the mean, we find Σxi = 2846 and n = 16, so μ = 2846/16 ≈ 177.9 home runs.
       To find the median, we put the data in order and find the mean of the 8th and 9th data
       values: M = (183 + 185)/2 = 184 home runs. The mode is the most frequent data value,
       which is 135 home runs.
   (b) Range = 235 - 135 = 100 home runs. To find the standard deviation, we determine
       Σxi² = 521,902. So,
       σ = sqrt( (Σxi² - (Σxi)²/N) / N ) = sqrt( (521,902 - (2846)²/16) / 16 ) ≈ 31.3 home runs.

(c) Answers will vary.


(d) The reporter is not lying because the mode is an average. He is being deceptive,
however, because the word average is usually meant as the mean.

7. (a) To find the mean, we determine Σxi = 78 and n = 36, so x̄ = 78/36 ≈ 2.2 children.
       To find the median, we put the data in order and find the mean of the 18th and 19th data
       values: M = (2 + 3)/2 = 2.5 children.
   (b) Range = 4 - 0 = 4 children. To find the standard deviation, we determine Σxi² = 224.
       s = sqrt( (Σxi² - (Σxi)²/n) / (n - 1) ) = sqrt( (224 - (78)²/36) / (36 - 1) ) ≈ 1.3 children.

8. (a) To find the mean, we determine Σxi = 134 and n = 30, so x̄ = 134/30 ≈ 4.5 cars.
       To find the median, we put the data in order and find the mean of the 15th and 16th data
       values: M = (4 + 5)/2 = 4.5 cars.
   (b) Range = 9 - 1 = 8 cars. To find the standard deviation, we determine Σxi² = 754.
       s = sqrt( (Σxi² - (Σxi)²/n) / (n - 1) ) = sqrt( (754 - (134)²/30) / (30 - 1) ) ≈ 2.3 cars.
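
A minimal Python sketch of the computational (shortcut) formula used in Review Exercises 5-8 follows (not part of the original solutions). Exercises 7 and 8 use the sample form shown here (divide by n - 1); Exercises 5 and 6 use the population form (divide by N). The totals below are from Exercise 7:

```python
# Shortcut formula: s = sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1)).
sum_x, sum_x2, n = 78, 224, 36

mean = sum_x / n
s = ((sum_x2 - sum_x ** 2 / n) / (n - 1)) ** 0.5
print(round(mean, 1), round(s, 1))   # about 2.2 children and 1.3 children
```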

9. (a) By the Empirical Rule, approximately 99.7% of the data will be within 3 standard
       deviations of the mean. Now, 600 - 3(53) = 441 and 600 + 3(53) = 759. Thus, about
       99.7% of light bulbs have lifetimes between 441 and 759 hours.
   (b) Since 494 is exactly 2 standard deviations below the mean [494 = 600 - 2(53)] and 706
       is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], the Empirical
       Rule predicts that approximately 95% of the light bulbs will have lifetimes between
       494 and 706 hours.
   (c) Since 547 is exactly 1 standard deviation below the mean [547 = 600 - 1(53)] and 706
       is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], the Empirical
       Rule predicts that approximately 34 + 47.5 = 81.5% of the light bulbs will have
       lifetimes between 547 and 706 hours.
   (d) Since 441 hours is 3 standard deviations below the mean [441 = 600 - 3(53)], the
       Empirical Rule predicts that 0.15% of light bulbs will last less than 441 hours. Thus,
       the company should expect to replace about 0.15% of the light bulbs.
   (e) By Chebyshev's theorem, at least (1 - 1/k²)·100% = (1 - 1/2.5²)·100% = 84% of all the
       light bulbs are within k = 2.5 standard deviations of the mean.
   (f) Since 494 is exactly k = 2 standard deviations below the mean [494 = 600 - 2(53)] and
       706 is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], Chebyshev's
       inequality indicates that at least (1 - 1/k²)·100% = (1 - 1/2²)·100% = 75% of the light
       bulbs will have lifetimes between 494 and 706 hours.
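
A minimal Python sketch of the Chebyshev bound used in parts (e) and (f) follows (not part of the original solution): at least (1 - 1/k²) of any data set lies within k standard deviations of the mean, regardless of shape.

```python
# Chebyshev's inequality: fraction within k standard deviations of the mean.
def chebyshev_bound(k):
    return 1 - 1 / k ** 2

print(round(chebyshev_bound(2.5) * 100, 1))   # 84.0 -> at least 84% within 2.5 sigma
print(round(chebyshev_bound(2.0) * 100, 1))   # 75.0 -> at least 75% within 2 sigma
# For bell-shaped data, the Empirical Rule gives the sharper approximations
# 68%, 95%, and 99.7% for k = 1, 2, and 3.
```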
10. (a) By the Empirical Rule, approximately 99.7% of the data will be within 3 standard
        deviations of the mean. Now, 4302 - 3(340) = 3282 and 4302 + 3(340) = 5322. Thus,
        about 99.7% of toner cartridges will print between 3282 and 5322 pages.
    (b) Since 3622 is exactly 2 standard deviations below the mean [3622 = 4302 - 2(340)]
        and 4982 is exactly 2 standard deviations above the mean [4982 = 4302 + 2(340)], the
        Empirical Rule predicts that approximately 95% of the toner cartridges will print
        between 3622 and 4982 pages.
    (c) Since 3622 is exactly 2 standard deviations below the mean [3622 = 4302 - 2(340)], the
        Empirical Rule predicts that 0.15 + 2.35 = 2.5% of the toner cartridges will print
        fewer than 3622 pages. Thus, the company should expect to replace about 2.5% of the
        toner cartridges.
    (d) By Chebyshev's theorem, at least (1 - 1/k²)·100% = (1 - 1/1.5²)·100% ≈ 55.6% of all the
        toner cartridges are within k = 1.5 standard deviations of the mean.
    (e) Since 3282 is exactly k = 3 standard deviations below the mean [3282 = 4302 - 3(340)]
        and 5322 is exactly 3 standard deviations above the mean [5322 = 4302 + 3(340)],
        Chebyshev's inequality indicates that at least (1 - 1/k²)·100% = (1 - 1/3²)·100% ≈ 88.9%
        of the toner cartridges will print between 3282 and 5322 pages.
11. With μ = 42.2826 (computed in part (a)):

        Class    Midpt, xi   Freq, fi    xi·fi        xi - μ      (xi - μ)²·fi
        20-24      22.5        6035      135,787.5    -19.7826    2,361,804.87
        25-29      27.5        4352      119,680      -14.7826      951,021.94
        30-34      32.5        4083      132,697.5     -9.7826      390,740.09
        35-39      37.5        3933      147,487.5     -4.7826       89,960.54
        40-44      42.5        4194      178,245        0.2174          198.22
        45-49      47.5        3716      176,510        5.2174      101,154.21
        50-54      52.5        3005      157,762.5     10.2174      313,707.76
        55-59      57.5        2355      135,412.5     15.2174      545,345.61
        60-64      62.5        1664      104,000       20.2174      680,148.79
        65-69      67.5        1173       79,177.5     25.2174      745,930.95
        70-74      72.5        1025       74,312.5     30.2174      935,918.54
        75-79      77.5         895       69,362.5     35.2174    1,110,037.41
        80-84      82.5         744       61,380       40.2174    1,203,374.81

        Σfi = 37,174;  Σxi·fi = 1,571,815;  Σ(xi - μ)²·fi = 9,429,343.76

    (a) μ = Σxi·fi / Σfi = 1,571,815 / 37,174 ≈ 42.2826 ≈ 42.28 years
    (b) σ = sqrt( Σ(xi - μ)²·fi / Σfi ) = sqrt( 9,429,343.76 / 37,174 ) ≈ 15.93 years

12. With μ = 43.7136 (computed in part (a)):

        Class    Midpt, xi   Freq, fi    xi·fi       xi - μ      (xi - μ)²·fi
        20-24      22.5        1903      42,817.5    -21.2136     856,382.02
        25-29      27.5        1415      38,912.5    -16.2136     371,976.37
        30-34      32.5        1364      44,330      -11.2136     171,515.94
        35-39      37.5        1430      53,625       -6.2136      55,210.62
        40-44      42.5        1409      59,882.5     -1.2136       2,075.21
        45-49      47.5        1242      58,995        3.7864      17,806.34
        50-54      52.5        1008      52,920        8.7864      77,818.43
        55-59      57.5         784      45,080       13.7864     149,040.82
        60-64      62.5         599      37,437.5     18.7864     211,404.37
        65-69      67.5         415      28,012.5     23.7864     234,804.02
        70-74      72.5         482      34,945       28.7864     399,412.59
        75-79      77.5         456      35,340       33.7864     520,533.50
        80-84      82.5         372      30,690       38.7864     559,631.15

        Σfi = 12,879;  Σxi·fi = 562,987.5;  Σ(xi - μ)²·fi = 3,627,581.38

    (a) μ = Σxi·fi / Σfi = 562,987.5 / 12,879 ≈ 43.7136 ≈ 43.71 years
    (b) σ = sqrt( Σ(xi - μ)²·fi / Σfi ) = sqrt( 3,627,581.38 / 12,879 ) ≈ 16.78 years

    (c) The mean age of a female involved in a traffic fatality is greater than the mean age of
        a male involved in a traffic fatality. Also, the ages of females involved in a traffic
        fatality are more dispersed. Answers will vary. One possibility is that an insurance
        company might use this information to help establish the rates it would charge for
        insuring drivers.
13. GPA = x̄w = Σwixi / Σwi = (5(4) + 4(3) + 3(4) + 3(2)) / (5 + 4 + 3 + 3) = 50/15 ≈ 3.33

14. Cost per pound = x̄w = Σwixi / Σwi = (2($2.70) + 1($1.30) + (1/2)($1.80)) / (2 + 1 + 1/2)
    ≈ $2.17/lb

15. (a) Yankees: Σxi = 184,193,950 and n = 29, so μ = 184,193,950/29 ≈ $6,351,516.
        Mets: Σxi = 96,660,970 and n = 28, so μ = 96,660,970/28 ≈ $3,452,177.



    (b) Yankees: M = $3,100,000 (the 15th data value)
        Mets: M = (800,000 + 1,000,000)/2 = $900,000 (the mean of the 14th and 15th values)
(c) In both cases, the mean is substantially larger than the median, so both distributions are
skewed right.
    (d) Yankees: Σxi² ≈ 2.3001 × 10^15, so
        σ = sqrt( (Σxi² - (Σxi)²/N) / N ) = sqrt( (2.3001 × 10^15 - (184,193,950)²/29) / 29 )
          ≈ $6,242,767.0
        Mets: Σxi² ≈ 9.37457 × 10^14, so
        σ = sqrt( (9.37457 × 10^14 - (96,660,970)²/28) / 28 ) ≈ $4,643,606.1

(e) Yankees: $301,400; $837,500; $3,100,000; $11,623,571.50; $22,000,000


Mets: $300,000; $318,750; $900,000; $4,666,666.50; $17,166,667
(f) Fences for the Yankees:
Lower fence = 837,500 - 1.5(11,623,571.50 - 837,500) = -$15,341,607.25
Upper fence = 11,623,571.50 + 1.5(11,623,571.50 - 837,500) = $27,802,678.75
The Yankees have no outliers.
Fences for the Mets:
Lower fence = 318,750 - 1.5(4,666,666.50 - 318,750) = -$6,203,124.75
Upper fence = 4,666,666.50 + 1.5(4,666,666.50 - 318,750) = $11,188,541.25
The data values $16,071,429 (Vaughn) and $17,166,667 (Piazza) are outliers.

Annotations will vary. One possibility is that the Mets' salaries are clearly lower and
less dispersed than the Yankees' salaries.
(g) In both boxplots, the median is to the left of the center of the box and the right line is
substantially longer than the left line, so both distributions are skewed right.
(h) For both distributions, the median is the better measure of central tendency since the
distributions are skewed.



16. (a) Material A: Σxi = 64.04 and n = 10, so x̄ = 64.04/10 = 6.404 million cycles.
        Material B: Σxi = 113.32 and n = 10, so x̄ = 113.32/10 = 11.332 million cycles.
    (b) Material A: M = (5.69 + 5.88)/2 = 5.785 million cycles (the mean of the 5th and 6th values)
        Material B: M = (8.20 + 9.65)/2 = 8.925 million cycles (the mean of the 5th and 6th values)

(c) In both cases, the mean is substantially larger than the median, so both distributions are
skewed right.
    (d) Material A: Σxi² ≈ 472.177, so
        s = sqrt( (Σxi² - (Σxi)²/n) / (n - 1) ) = sqrt( (472.177 - (64.04)²/10) / (10 - 1) )
          ≈ 2.626 million cycles
        Material B: Σxi² ≈ 1597.4002, so
        s = sqrt( (1597.4002 - (113.32)²/10) / (10 - 1) ) ≈ 5.900 million cycles

(e) Material A: 3.17; 4.52; 5.785; 8.01; 11.92 million cycles


Material B: 5.78; 6.84; 8.925; 14.71; 24.37 million cycles
(f) Fences for Material A:
Lower fence = 4.52 1.5(8.01 4.52) = 0.715 million cycles
Upper fence = 8.01 + 1.5(8.01 4.52) = 13.245 million cycles
Material A has no outliers.

Fences for Material B:


Lower fence = 6.84 1.5(14.71 6.84) = 4.965 million cycles
Upper fence = 14.71 + 1.5(14.71 6.84) = 26.515 million cycles
Material B has no outliers
Bearing Failures



(g) In both boxplots, the median is to the left of the center of the box and the right line is
substantially longer than the left line, so both distributions are skewed right.
(h) For both distributions, the median is the better measure of central tendency since the
distributions are skewed.
17. The data provided are already listed in ascending order.
    (a) i = (k/100)(n + 1) = (40/100)(88 + 1) = 35.6. Since i = 35.6 is not an integer, we average
        the 35th and 36th data values: P40 = (366,155 + 371,479)/2 = $368,817. This means that
        approximately 40% of drivers in the 2004 Nextel Cup Series earned less than $368,817,
        and approximately 60% of drivers in the 2004 Nextel Cup Series earned more than
        $368,817.
    (b) i = (k/100)(n + 1) = (95/100)(88 + 1) = 84.55. Since i = 84.55 is not an integer, we average
        the 84th and 85th data values: P95 = (5,692,620 + 6,221,710)/2 = $5,957,165. This means
        that approximately 95% of drivers in the 2004 Nextel Cup Series earned less than
        $5,957,165, and approximately 5% of drivers in the 2004 Nextel Cup Series earned
        more than $5,957,165.
    (c) i = (k/100)(n + 1) = (10/100)(88 + 1) = 8.9. Since i = 8.9 is not an integer, we average the
        8th and 9th data values: P10 = (65,175 + 70,550)/2 = $67,862.50. This means that
        approximately 10% of drivers in the 2004 Nextel Cup Series earned less than
        $67,862.50, and approximately 90% of drivers in the 2004 Nextel Cup Series earned
        more than $67,862.50.
    (d) Of the 88 drivers in the 2004 Nextel Cup Series, 73 earned less than $4,117,750.
        Percentile rank of $4,117,750 = (73/88)·100 ≈ 83. Thus, $4,117,750 was at the 83rd
        percentile. This means that approximately 83% of drivers in the 2004 Nextel Cup
        Series earned less than $4,117,750, and approximately 17% of drivers in the 2004
        Nextel Cup Series earned more than $4,117,750.
    (e) Of the 88 drivers in the 2004 Nextel Cup Series, 13 earned less than $116,359.
        Percentile rank of $116,359 = (13/88)·100 ≈ 15. Thus, $116,359 was at the 15th
        percentile. This means that approximately 15% of drivers in the 2004 Nextel Cup
        Series earned less than $116,359, and approximately 85% of drivers in the 2004 Nextel
        Cup Series earned more than $116,359.



18. The data provided are already listed in ascending order.
    (a) i = (k/100)(n + 1) = (30/100)(88 + 1) = 26.7. Since i = 26.7 is not an integer, we average
        the 26th and 27th data values: P30 = $268,422.50. This means that approximately 30%
        of drivers in the 2004 Nextel Cup Series earned less than $268,422.50, and
        approximately 70% of drivers in the 2004 Nextel Cup Series earned more than
        $268,422.50.
    (b) i = (k/100)(n + 1) = (90/100)(88 + 1) = 80.1. Since i = 80.1 is not an integer, we average
        the 80th and 81st data values: P90 = (4,759,020 + 5,152,670)/2 = $4,955,845. This means
        that approximately 90% of drivers in the 2004 Nextel Cup Series earned less than
        $4,955,845, and approximately 10% of drivers in the 2004 Nextel Cup Series earned
        more than $4,955,845.
    (c) i = (k/100)(n + 1) = (5/100)(88 + 1) = 4.45. Since i = 4.45 is not an integer, we average
        the 4th and 5th data values: P5 = (57,450 + 57,590)/2 = $57,520. This means that
        approximately 5% of drivers in the 2004 Nextel Cup Series earned less than $57,520,
        and approximately 95% of drivers in the 2004 Nextel Cup Series earned more than
        $57,520.
    (d) Of the 88 drivers in the 2004 Nextel Cup Series, 49 earned less than $1,333,520.
        Percentile rank of $1,333,520 = (49/88)·100 ≈ 56. Thus, $1,333,520 was at the 56th
        percentile. This means that approximately 56% of drivers in the 2004 Nextel Cup
        Series earned less than $1,333,520, and approximately 44% of drivers in the 2004
        Nextel Cup Series earned more than $1,333,520.
    (e) Of the 88 drivers in the 2004 Nextel Cup Series, 16 earned less than $139,614.
        Percentile rank of $139,614 = (16/88)·100 ≈ 18. Thus, $139,614 was at the 18th
        percentile. This means that approximately 18% of drivers in the 2004 Nextel Cup
        Series earned less than $139,614, and approximately 82% of drivers in the 2004 Nextel
        Cup Series earned more than $139,614.
19. z-score for the female: z = (x - μ)/σ = (160 - 156.5)/51.2 ≈ 0.07
    z-score for the male: z = (x - μ)/σ = (185 - 183.4)/40 = 0.04
    The weight of the 160-pound female is 0.07 standard deviations above the mean, while the
    weight of the 185-pound male is 0.04 standard deviations above the mean. Thus, the
    160-pound female is relatively heavier.


20. (a) Reading the boxplot, the median crime rate is approximately 4050 per 100,000
population.
(b) Reading the boxplot, the 25th percentile crime rate is approximately 3100 per 100,000
population.
(c) Reading the boxplot, there is one outlier. It is approximately 8000.
(d) Reading the boxplot, the lowest crime rate is approximately 2200 per 100,000
population.

Case Study: Who Was A Mourner?


1. The table below gives the length of each word, line by line in the passage. A listing is also
provided of the proper names, numbers, abbreviation, and titles that have been omitted
from the data set.
3, 7, 8, 3, 7, 3, 3, 6, 2, 3, 3, 2, 3
4, 3, 8, 2, 3, 7, 4, 2, 11
(omitted Richardson and 22d)
6, 3, 4, 9, 3, 7, 4, 2, 4, 2, 6, 4, 3
7, 5, 2, 8, 2, 4
(omitted Frogg Lane, Liberty-Tree, and Monday)
4, 3, 3, 7, 2, 7, 3, 4, 2, 10, 2, 6
5, 4, 8, 2, 3, 7, 2, 4, 6, 4, 3, 5, 6, 2
3, 5, 5, 5, 5, 6, 5, 4, 8, 8
2, 3, 8, 7, 2, 3, 6, 3, 6, 2, 3, 9
(omitted appeard)
3, 6, 4, 3, 3, 7, 3, 5, 2, 9, 3
8, 8, 2, 6, 4, 3, 4, 5, 2, 3, 3, 4, 2, 7
5, 6, 8, 4, 3, 7, 6, 6, 5, 2, 3
6, 12, 5, 6, 2
(omitted Wolfes Summit of human Glory)
5, 2, 3, 1, 7, 6, 3, 5, 4, 4, 1, 6, 3
2. Mean = 4.54; Median = 4; Mode = 3; standard deviation 2.21 ; sample variance 4.90 ;
Range = 11; Minimum = 1; Maximum = 12; Sum = 649; Count = 143

Answers will vary. None of the provided authors match both the measures of central
tendency and the measures of dispersion well. In other words, there is no clear cut choice
for the author based on the information provided. Based on measures of central tendency,
James Otis or Samuel Adams would appear to be the more likely candidates for A
MOURNER. Based on measures of dispersion, Tom Sturdy seems the more likely choice.
Still, the unknown author's mean word length differs considerably from that of Sturdy, and
the unknown author's standard deviation differs considerably from those of Otis and
Adams.
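
For readers who want to recompute the summary statistics reported in Part 2, here is a minimal Python sketch (not part of the original solution). As a quick illustration it uses only the first line of word lengths listed in Part 1, not the full 143-word list:

```python
# Descriptive statistics of a list of word lengths.
from statistics import mean, median, mode, stdev, variance

def summarize(lengths):
    return {
        "mean": round(mean(lengths), 2),
        "median": median(lengths),
        "mode": mode(lengths),
        "sample std dev": round(stdev(lengths), 2),
        "sample variance": round(variance(lengths), 2),
        "range": max(lengths) - min(lengths),
        "sum": sum(lengths),
        "count": len(lengths),
    }

first_line = [3, 7, 8, 3, 7, 3, 3, 6, 2, 3, 3, 2, 3]   # first line of word lengths from Part 1
print(summarize(first_line))
```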



3. Comparing the two Adams summaries, both the measures of center and the measures of
variability differ considerably for the two documents. For example, the means differ by
0.09 and the standard deviations differ by 0.19, not to mention the differences in word
counts and the maximum length. This calls into question the viability of word-length
analysis as a tool for resolving disputed documents. Word-length may be a part of the
analysis needed to determine unknown authors, but other variables should also be taken
into consideration.
4. Other information that would be useful to identify A MOURNER would be the style of the
rhetoric, vocabulary choices, use of particular phrases, and the overall flow of the writing.
In other words, identifying an unknown author requires qualitative analysis in addition to
quantitative analysis.

