
Chapter 1 Solutions

1.1. Most students will prefer to work in seconds, to avoid having to work with decimals or
fractions.
1.2. Who? The individuals in the data set are students in a statistics class. What? There are
eight variables: ID (a label, with no units); Exam1, Exam2, Homework, Final, and Project
(in points, on a scale of 0 to 100); TotalPoints (in points, computed from the other
scores, on a scale of 0 to 900); and Grade (A, B, C, D, and E). Why? The primary purpose
of the data is to assign grades to the students in this class, and (presumably) the variables
are appropriate for this purpose. (The data might also be useful for other purposes.)
1.3. Exam1 = 79, Exam2 = 88, Final = 88.
1.4. For this student, TotalPoints = $2 \times 86 + 2 \times 82 + 3 \times 77 + 2 \times 90 + 80 = 827$, so the grade is B.
1.5. The cases are apartments. There are five variables: rent (quantitative), cable (categorical),
pets (categorical), bedrooms (quantitative), distance to campus (quantitative).
1.6. (a) To find injuries per worker, divide the rates in Example 1.6 by 100,000 (or, redo the
computations without multiplying by 100,000). For wage and salary workers, there are
0.000034 fatal injuries per worker. For self-employed workers, there are 0.000099 fatal
injuries per worker. (b) These rates are 1/10 the size of those in Example 1.6, or 10,000
times larger than those in part (a): 0.34 fatal injuries per 10,000 wage/salary workers, and
0.99 fatal injuries per 10,000 self-employed workers. (c) The rates in Example 1.6 would
probably be more easily understood by most people, because numbers like 3.4 and 9.9 feel
more familiar. (It might be even better to give rates per million workers: 34 and 99.)
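All three parts use the same two steps. First divide by the old base to get a per-worker rate, then multiply by the new base. The minimal Python sketch below assumes the two rates quoted in part (c): 3.4 and 9.9 fatal injuries per 100,000 workers. `rescale` is a hypothetical helper name, not something defined in the text.

```python
# Fatal-injury rates per 100,000 workers, as quoted in part (c) above.
RATES_PER_100K = {"wage/salary": 3.4, "self-employed": 9.9}


def rescale(rate, old_base, new_base):
    """Convert a rate 'per old_base workers' to a rate 'per new_base workers'."""
    per_worker = rate / old_base
    return per_worker * new_base


for group, rate in RATES_PER_100K.items():
    print(
        f"{group}: {rescale(rate, 100_000, 1):.6f} per worker, "
        f"{rescale(rate, 100_000, 10_000):.2f} per 10,000, "
        f"{rescale(rate, 100_000, 1_000_000):.0f} per million"
    )
```

For wage and salary workers, this reproduces the 0.000034, 0.34, and 34 discussed above.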
1.7. Shown are two possible stemplots; the first uses split
stems (described on page 11 of the text). The scores are
slightly left-skewed; most range from 70 to the low 90s.

Split stems:
5 | 58
6 | 0
6 | 58
7 | 0023
7 | 5558
8 | 00003
8 | 5557
9 | 0002233
9 | 8

Single stems:
5 | 58
6 | 058
7 | 00235558
8 | 000035557
9 | 00022338
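Stemplots like these are easy to produce in code. The sketch below shows one way to do it in Python. It assumes integer scores with tens-digit stems and ones-digit leaves. The score list is read back from the stemplots themselves, and `stemplot` is a hypothetical helper written for this illustration, not a function from any statistics package.

```python
from collections import defaultdict

# The 30 first-exam scores, read back from the stemplots above.
SCORES = [55, 58, 60, 65, 68, 70, 70, 72, 73, 75, 75, 75, 78, 80, 80,
          80, 80, 83, 85, 85, 85, 87, 90, 90, 90, 92, 92, 93, 93, 98]


def stemplot(data, split=False):
    """Return stemplot rows as strings (stem = tens digit, leaf = ones digit).

    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    Empty rows between the lowest and highest observations are kept,
    because a gap in the distribution is worth seeing.
    """
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        half = int(leaf >= 5) if split else 0
        rows[(stem, half)].append(str(leaf))

    first, last = min(rows), max(rows)
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in halves:
            if first <= (stem, half) <= last:
                leaves = "".join(rows.get((stem, half), []))
                lines.append(f"{stem} | {leaves}".rstrip())
    return lines


for line in stemplot(SCORES, split=True):
    print(line)
```

Calling `stemplot(SCORES)` without splitting gives the single-stem display. Because interior empty rows are kept, the helper also preserves blank stems such as the one discussed in Exercise 1.9.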

1.8. Preferences will vary. However, the stemplot in Figure 1.8 shows a bit more detail, which
is useful for comparing the two distributions.
1.9. (a) The stemplot of the altered data is shown below. (b) Blank stems
should always be retained (except at the beginning or end of the stemplot),
because the gap in the distribution is an important piece of information about
the data.

1 | 6
2 |
2 | 5568
3 | 34
3 | 55678
4 | 012233
4 | 8
5 | 1

1.10. Student preferences will vary. The stemplot has the advantage of showing each individual
score. Note that this histogram has the same shape as the second stemplot in the solution to Exercise 1.7.

[Figure for 1.10: histogram of first exam scores with classes 10 points wide (Frequency vs. First exam scores).]
1.11. Student preferences may vary, but the larger classes in this histogram hide a lot of detail.

Looking at Data: Distributions

[Figure for 1.11: histogram of first exam scores with classes 20 points wide (Frequency vs. First exam scores).]

1.12. This histogram shows more details about the distribution (perhaps more detail than
is useful). Note that this histogram has the same shape as the first stemplot (with split stems) in the solution to Exercise 1.7.

[Figure for 1.12: histogram of first exam scores with classes 5 points wide (Frequency vs. First exam scores).]

1.13. Using either a stemplot or histogram, we see that the distribution is left-skewed, centered
near 80, and spread from 55 to 98. (Of course, a histogram would not show the exact values
of the maximum and minimum.)
1.14. (a) The cases are the individual employees. (b) The first four (employee identification
number, last name, first name, and middle initial) are labels. Department and education level
are categorical variables; number of years with the company, salary, and age are quantitative
variables. (c) Column headings in student spreadsheets will vary, as will sample cases.
1.15. A Web search for "city rankings" or "best cities" will yield lots of ideas, such as crime
rates, income, cost of living, entertainment and cultural activities, taxes, climate, and school
system quality. (Students should be encouraged to think carefully about how some of these
might be quantitatively measured.)


1.16. Recall that categorical variables place individuals into groups or categories, while
quantitative variables take numerical values for which arithmetic operations. . . make sense.
Variables (a), (d), and (e) (age, amount spent on food, and height) are quantitative. The
answers to the other three questions (about dancing, musical instruments, and broccoli) are
categorical variables.
1.18. Student answers will vary. A Web search for "college ranking methodology" gives
some ideas; in a recent year, U.S. News and World Report used 16 measures of academic
excellence, including academic reputation (measured by surveying college and university
administrators), retention rate, graduation rate, class sizes, faculty salaries, student-faculty
ratio, percentage of faculty with the highest degree in their fields, quality of entering students
(ACT/SAT scores, high school class rank, enrollment-to-admission ratio), financial resources,
and the percentage of alumni who give to the school.

[Figure for 1.19: bar graph of percent of respondents by favorite color.]

1.19. For example, blue is by far the most popular choice; 70% of respondents chose 3 of the
10 options (blue, green, and purple).


1.20. For example, opinions about least favorite color are somewhat more varied than opinions
about favorite color. Interestingly, purple is liked and disliked by about the same fractions of people.

[Figure for 1.20: bar graph of percent of respondents by least favorite color.]

1.21. (a) There were 232 total respondents. The table that follows gives the percents; for
example, $10/232 \approx 4.31\%$. (b) The bar graph is shown below the table. (c) For example, 87.5%
of the group were between 19 and 50. (d) The age-group classes do not have equal width:
The first is 18 years wide, the second is 6 years wide, the third is 11 years wide, etc.
Note: In order to produce a histogram from the given data, the bar for the first age
group would have to be three times as wide as the second bar, the third bar would have to
be wider than the second bar by a factor of 11/6, etc. Additionally, if we change a bar's
width by a factor of x, we would need to change that bar's height by a factor of 1/x.

Age group (years)   Percent
1 to 18               4.31%
19 to 24             41.81%
25 to 35             30.17%
36 to 50             15.52%
51 to 69              6.03%
70 and over           2.16%

[Figure for 1.21: bar graph of percent by age group (years).]
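Each percent in part (a) is a group's count divided by 232. In this Python sketch, only the first count (10) comes from the solution. The other counts are an assumption, back-calculated from the printed percents (for example, 41.81% of 232 is 97).

```python
# Counts per age group. Only the first (10) is stated in the solution; the rest
# are back-calculated from the printed percents and total 232.
COUNTS = {
    "1 to 18": 10,
    "19 to 24": 97,
    "25 to 35": 70,
    "36 to 50": 36,
    "51 to 69": 14,
    "70 and over": 5,
}

total = sum(COUNTS.values())
percents = {group: 100 * n / total for group, n in COUNTS.items()}

for group, pct in percents.items():
    print(f"{group:>12}: {pct:6.2f}%")

# Part (c): share of respondents aged 19 to 50.
share_19_to_50 = sum(percents[g] for g in ("19 to 24", "25 to 35", "36 to 50"))
print(f"19 to 50: {share_19_to_50:.1f}%")
```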

1.22. (a) & (b) The bar graph and pie charts are shown below. (c) A clear majority (76%)
agree or strongly agree that they browse more with the iPhone than with their previous
phone. (d) Student preferences will vary. Some might prefer the pie chart because it is more
familiar.
[Figures for 1.22: bar graph and pie chart of response percents (Strongly agree, Mildly agree, Mildly disagree, Strongly disagree).]

1.23. Ordering bars by decreasing height shows the models most affected by iPhone sales.
However, because "other phone" and "replaced nothing" are different from the other
categories, it makes sense to place those two bars last (in any order).

[Figure for 1.23: bar graph of replacement percent by previous phone model.]

1.24. (a) The weights add to 254.2 million tons, and the percents add to 99.9.
(b) & (c) The bar graph and pie chart are shown below.

[Figures for 1.24: bar graph and pie chart of percent of total waste by source (Food scraps, Glass, Metals, Paper/paperboard, Plastics, Rubber/leather/textiles, Wood, Yard trimmings, Other).]

1.25. (a) & (b) Both bar graphs are shown below. (c) The ordered bars in the graph from (b)
make it easier to identify those materials that are frequently recycled and those that are not.
(d) Each percent represents part of a different whole. (For example, 2.6% of food scraps are
recycled; 23.7% of glass is recycled, etc.)

[Figures for 1.25: two bar graphs of percent recycled by material; in the graph for (b), the bars are ordered by height.]

1.26. (a) The bar graph is shown below. (b) The graph clearly illustrates the dominance of Google;
its bar dwarfs those of the other search engines.

[Figure for 1.26: bar graph of market share (%) by search engine (Google, Yahoo, MSN, AOL, Microsoft Live, Ask, Other).]


1.27. The two bar graphs are shown below.


[Figures for 1.27: two bar graphs of percent of all spam by type of spam (Adult, Financial, Health, Leisure, Products, Scams), with the categories in different orders.]


1.28. (a) The bar graph is below. (b) The number of Facebook users trails off rapidly after the
top seven or so. (Of course, this is due in part to the variation in the populations of these
countries. For example, that Norway has nearly half as many Facebook users as France is
remarkable, because the 2008 populations of France and Norway were about 62.3 million
and 4.8 million, respectively.)

[Figure for 1.28: bar graph of Facebook users (millions) by country.]

1.29. (a) Most countries had moderate (single- or double-digit) increases in Facebook usage.
Chile (2197%) is an extreme outlier, as are (maybe) Venezuela (683%) and Colombia (246%).
(b) In the stemplot, Chile and Venezuela have been omitted, and stems are split five ways.
(c) One observation is that, even without the outliers, the distribution is right-skewed. (d) The
stemplot can show some of the detail of the low part of the distribution, if the outliers are omitted.

[Stemplot for 1.29: percent increases in Facebook usage, stems split five ways (Chile and Venezuela omitted).]


1.30. (a) The given percentages refer to nine distinct groups (all M.B.A. degrees, all
M.Ed. degrees, and so on) rather than one single group. (b) The bar graph is shown below.
Bars are ordered by height, as suggested by the text; students may forget to do this or might
arrange the bars in the opposite order (smallest to largest).

[Figure for 1.30: bar graph of degrees earned by women (%) for each graduate degree (M.Ed., Other M.A., Ed.D., Other Ph.D., Other M.S., Law, M.D., M.B.A., and Theology).]

1.31. (a) The luxury car bar graph is shown below; bars are in decreasing order of size (the
order given in the table). (b) The intermediate car bar graph is also shown below. For this
stand-alone graph, it seemed appropriate to re-order the bars by decreasing size. Students may
leave the bars in the order given in the table; this (admittedly) might make comparison of the
two graphs simpler. (c) The third graph is one possible choice for comparing the two types of
cars: for each color, we have one bar for each car type.

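[Figures for 1.31: bar graphs of percent of luxury cars and percent of intermediate cars by color, and a combined bar graph with one bar per car type for each color.]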

1.32. This distribution is skewed to the right, meaning that Shakespeare's plays contain many
short words (up to six letters) and fewer very long words. We would probably expect most
authors to have skewed distributions, although the exact shape and spread will vary.


1.33. Shown is the stemplot; as the text suggests, we have trimmed numbers (dropped the last
digit) and split stems (3 | 5 represents 350 to 359 mg/dl). 359 mg/dl appears to be an outlier.
Overall, glucose levels are not under control: Only 4 of the 18 had levels in the desired range.

0 | 799
1 | 0134444
1 | 5577
2 | 0
2 | 57
3 |
3 | 5

1.34. The back-to-back stemplot below suggests that the individual-instruction group was more
consistent (their numbers have less spread) but not more successful (only two had numbers in
the desired range).

Individual |   | Class
           | 0 | 799
        22 | 1 | 0134444
  99866655 | 1 | 5577
     22222 | 2 | 0
         8 | 2 | 57
           | 3 |
           | 3 | 5

1.35. The distribution is roughly symmetric, centered near 7 (or between 6 and 7), and
spread from 2 to 13.
1.36. (a) Total emissions would almost certainly be higher for very large countries; for
example, we would expect that even with great attempts to control emissions, China (with over
1 billion people) would have higher total emissions than the smallest countries in the data set.
(b) A stemplot is shown; a histogram would also be appropriate. We see a strong right skew
with a peak from 0 to 2 metric tons per person and a smaller peak from 8 to 10. The three
highest countries (the United States, Canada, and Australia) appear to be outliers; apart from
those countries, the distribution is spread from 0 to 11 metric tons per person.

0 | 00000000000000011111
0 | 222233333
0 | 445
0 | 6677
0 | 888999
1 | 001
1 |
1 |
1 | 67
1 | 9

(Stems are split five ways; 1 | 9 represents 19 metric tons per person, with values truncated.)
1.37. To display the distribution, use either a stemplot or a histogram. DT scores are skewed to
the right, centered near 5 or 6, spread from 0 to 18. There are no outliers. We might also note
that only 11 of these 264 women (about 4%) scored 15 or higher.

0 | 000000000000000000000000000000000000011111111111111111111
0 | 2222222222222222233333333333333333333333
0 | 444444444444444444445555555555555555555
0 | 666666666666666666667777777777777
0 | 888888888888888999999999999999999
1 | 000000000000111111111
1 | 22222222222233333333333
1 | 444444455
1 | 66666777
1 | 8


1.38. (a) The first histogram shows two modes: 5 to 5.2 and 5.6 to 5.8. (b) The second histogram
has peaks in locations close to those of the first, but these peaks are much less pronounced,
so they would usually not be viewed as distinct modes. (c) The results will vary with the
software used.

[Figures for 1.38: two histograms of rainwater pH (Frequency vs. Rainwater pH) with different class boundaries.]

1.39. Graph (a) is studying time (Question 4); it is reasonable to expect this to be right-skewed
(many students study little or not at all; a few study longer).
Graph (d) is the histogram of student heights (Question 3): One would expect a fair
amount of variation but no particular skewness to such a distribution.
The other two graphs are (b) handedness and (c) gender, unless this was a particularly
unusual class: We would expect right-handed students to outnumber lefties
substantially. (Roughly 10 to 15% of the population as a whole is left-handed.)
1.40. Sketches will vary. The distribution of coin years would be left-skewed because newer
coins are more common than older coins.
1.41. (a) Not only are most responses multiples of 10; many are multiples of 30 and 60. Most
people will round their answers when asked to give an estimate like this; in fact, the most
striking answers are ones such as 115, 170, or 230. The students who claimed 360 minutes
(6 hours) and 300 minutes (5 hours) may have been exaggerating. (Some students might also
consider suspicious the student who claimed to study 0 minutes per night. As a teacher, I can
easily believe that such students exist, and I suspect that some of your students might easily
accept that claim as well.) (b) The stemplots suggest that women (claim to) study more than men.
The approximate centers are 175 minutes for women and 120 minutes for men.

          Women |   | Men
                | 0 | 033334
             96 | 0 | 66679999
       22222221 | 1 | 2222222
888888888875555 | 1 | 558
           4440 | 2 | 00344
                | 2 |
                | 3 | 0
              6 | 3 |

(Split stems; stems are hundreds of minutes and leaves are tens of minutes.)


1.42. The stemplot gives more information than a histogram (since all the original numbers can
be read off the stemplot), but both give the same impression. The distribution is roughly
symmetric with one value (4.88) that is somewhat low. The center of the distribution is between
5.4 and 5.5 (the median is 5.46, the mean is 5.448); if asked to give a single estimate for the
true density of the earth, something in that range would be the best answer.

48 | 8
49 |
50 | 7
51 | 0
52 | 6799
53 | 04469
54 | 2467
55 | 03578
56 | 12358
57 | 59
58 | 5

(48 | 8 represents 4.88.)
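A stemplot keeps every digit, so the summary statistics quoted above can be recomputed directly from it. The Python sketch below assumes the 29 measurements are read off the stemplot exactly as shown (48 | 8 = 4.88). It uses only the standard-library `statistics` module.

```python
import statistics

# The 29 density measurements read from the stemplot (48 | 8 = 4.88).
DENSITY = [4.88, 5.07, 5.10, 5.26, 5.27, 5.29, 5.29, 5.30, 5.34, 5.34,
           5.36, 5.39, 5.42, 5.44, 5.46, 5.47, 5.50, 5.53, 5.55, 5.57,
           5.58, 5.61, 5.62, 5.63, 5.65, 5.68, 5.75, 5.79, 5.85]

mean = statistics.mean(DENSITY)
median = statistics.median(DENSITY)
sd = statistics.stdev(DENSITY)  # sample standard deviation (divisor n - 1)
print(f"n = {len(DENSITY)}, mean = {mean:.4f}, median = {median}, s = {sd:.4f}")
```

The results should agree with the density row of the table in the solution to Exercise 1.81.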

1.43. (a) There are four variables: GPA, IQ, and self-concept are quantitative, while gender
is categorical. (OBS is not a variable, since it is not really a characteristic of a student.)
(b) The stemplot is shown below (0 | 5 represents a GPA of 0.5). (c) The distribution is skewed
to the left, with center (median) around 7.8. GPAs are spread from 0.5 to 10.8, with only 15
below 6. (d) There is more variability among the boys; in fact, there seems to be a subset of
boys with GPAs from 0.5 to 4.9. Ignoring that group, the two distributions have similar shapes.

 0 | 5
 1 | 8
 2 | 4
 3 | 4689
 4 | 0679
 5 | 1259
 6 | 0112249
 7 | 22333556666666788899
 8 | 0000222223347899
 9 | 002223344556668
10 | 01678

  Female |    | Male
         |  0 | 5
         |  1 | 8
         |  2 | 4
       4 |  3 | 689
       7 |  4 | 069
     952 |  5 | 1
    4210 |  6 | 129
98866533 |  7 | 223566666789
  997320 |  8 | 0002222348
   65300 |  9 | 2223445668
     710 | 10 | 68

1.44. The stemplot below uses split stems. The distribution is fairly symmetric (perhaps slightly
left-skewed), with center around 110 (clearly above 100). IQs range from the low 70s to the
high 130s, with a gap in the low 80s.

 7 | 24
 7 | 79
 8 |
 8 | 69
 9 | 0133
 9 | 6778
10 | 0022333344
10 | 555666777789
11 | 0000111122223334444
11 | 55688999
12 | 003344
12 | 677888
13 | 02
13 | 6


1.45. The stemplot below uses split stems. The distribution is skewed to the left, with center
around 59.5. Most self-concept scores are between 35 and 73, with a few below that, and one
high score of 80 (but not really high enough to be an outlier).

2 | 01
2 | 8
3 | 0
3 | 5679
4 | 02344
4 | 6799
5 | 1111223344444
5 | 556668899
6 | 00001233344444
6 | 55666677777899
7 | 0000111223
7 |
8 | 0

1.46. The time plot shows that women's times decreased quite rapidly from 1972 until the
mid-1980s. Since that time, they have been fairly consistent: Almost all times since 1986 are
between 141 and 147 minutes.

[Figure for 1.46: time plot of winning time (minutes) against year, 1970 to 2005.]

1.47. The total for the 24 countries was 897 days, so with Suriname, it is 897 + 694 = 1591
days, and the mean is $\bar{x} = 1591/25 = 63.64$ days.
1.48. The mean score is $\bar{x} = 821/10 = 82.1$.

1.49. To find the ordered list of times, start with the 24 times in Example 1.23, and add 694 to
the end of the list. The ordered times (the median, 40, is the 13th value) are
4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33, 40,
42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77, 694
The outlier increases the median from 36.5 to 40 days, but the change is much less than the
outlier's effect on the mean.
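The effect of the single outlier on the two measures of center can be checked in a few lines. This Python sketch uses the 24 times and Suriname's 694 days listed above, together with the standard-library `statistics` module.

```python
import statistics

# The 24 times from Example 1.23, sorted; Suriname's 694 days is added separately.
TIMES = [4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33, 40,
         42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77]
WITH_SURINAME = TIMES + [694]

for label, data in (("without Suriname", TIMES), ("with Suriname", WITH_SURINAME)):
    print(f"{label}: mean = {statistics.mean(data):.2f}, "
          f"median = {statistics.median(data)}")
```

The mean jumps by more than 26 days, while the median moves only from 36.5 to 40.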
1.50. The median of the service times is 103.5 seconds. (This is the average of the 40th and
41st numbers in the sorted list, but for a set of 80 numbers, we assume that most students
will compute the median using software, which does not require that the data be sorted.)
1.51. In order, the scores are:
55, 73, 75, 80, 80, 85, 90, 92, 93, 98
The middle two scores are 80 and 85, so the median is $M = (80 + 85)/2 = 82.5$.


1.52. See the ordered list given in the previous solution.
The first quartile is $Q_1 = 75$, the median of the first five numbers: 55, 73, 75, 80, 80.
Similarly, $Q_3 = 92$, the median of the last five numbers: 85, 90, 92, 93, 98.
1.53. The maximum and minimum can be found by inspecting the list. The sorted list is shown
below (read across the rows). The first quartile, median, and third quartile fall between the 20th
and 21st, the 40th and 41st, and the 60th and 61st values, respectively.

     1     2     2     3     4     9     9     9    11    19
    19    25    30    35    40    44    48    51    52    54
    55    56    57    59    64    67    68    73    73    75
    75    76    76    77    80    88    89    90   102   103
   104   106   115   116   118   121   126   128   137   138
   140   141   143   148   148   157   178   179   182   199
   201   203   211   225   274   277   289   290   325   367
   372   386   438   465   479   700   700   951  1148  2631

This confirms the five-number summary (1, 54.5, 103.5, 200, and 2631 seconds)
given in Example 1.26. The sum of the 80 numbers is 15,726 seconds, so the mean is
$\bar{x} = 15{,}726/80 = 196.575$ seconds (the value 197 in the text was rounded).
Note: The most tedious part of this process is sorting the numbers and adding them
all up. Unless you really want to confirm that your students can sort a list of 80 numbers,
consider giving the students the sorted list of times, and checking their ability to identify the
locations of the quartiles.
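Sorting and summing 80 numbers by hand is tedious, but software does both instantly. This Python sketch hard-codes the sorted list above. It computes the quartiles as the medians of the lower and upper halves, the method used in these solutions. Other software packages may use different quartile rules, so their results can differ slightly. `five_number_summary` is a hypothetical helper name.

```python
import statistics

SERVICE_TIMES = [
    1, 2, 2, 3, 4, 9, 9, 9, 11, 19,
    19, 25, 30, 35, 40, 44, 48, 51, 52, 54,
    55, 56, 57, 59, 64, 67, 68, 73, 73, 75,
    75, 76, 76, 77, 80, 88, 89, 90, 102, 103,
    104, 106, 115, 116, 118, 121, 126, 128, 137, 138,
    140, 141, 143, 148, 148, 157, 178, 179, 182, 199,
    201, 203, 211, 225, 274, 277, 289, 290, 325, 367,
    372, 386, 438, 465, 479, 700, 700, 951, 1148, 2631,
]


def five_number_summary(data):
    """Min, Q1, M, Q3, Max, with Q1 and Q3 taken as medians of the lower and
    upper halves (the middle value is left out of both halves when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


print(five_number_summary(SERVICE_TIMES))
print(sum(SERVICE_TIMES), statistics.mean(SERVICE_TIMES))
```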
1.54. The median and quartiles were found earlier; the minimum and maximum are easy to
locate in the ordered list of scores (see the solutions to Exercises 1.51 and 1.52), so the
five-number summary is Min = 55, $Q_1 = 75$, $M = 82.5$, $Q_3 = 92$, Max = 98.
1.55. Use the five-number summary from the solution to Exercise 1.54:

[Figure for 1.55: boxplot of score on first exam.]

1.56. The interquartile range is $\text{IQR} = Q_3 - Q_1 = 92 - 75 = 17$, and $1.5 \times 17 = 25.5$, so the
$1.5 \times \text{IQR}$ rule would consider as outliers scores outside the range $Q_1 - 25.5 = 49.5$ to
$Q_3 + 25.5 = 117.5$. According to this rule, there are no outliers.
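The 1.5 × IQR rule is mechanical enough to code directly. This Python sketch assumes the same half-median quartile method used in Exercise 1.52. `quartiles` is a hypothetical helper.

```python
import statistics

# First-exam scores from Exercise 1.51, already sorted.
SCORES = [55, 73, 75, 80, 80, 85, 90, 92, 93, 98]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    half = len(xs) // 2
    return statistics.median(xs[:half]), statistics.median(xs[-half:])


q1, q3 = quartiles(SCORES)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in SCORES if x < low_fence or x > high_fence]
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, fences = ({low_fence}, {high_fence})")
print("outliers:", outliers)
```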
1.57. The variance can be computed from the formula $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$; for
example, the first term in the sum would be $(80 - 82.1)^2 = 4.41$. However, in practice,
software or a calculator is the preferred approach; this yields $s^2 = 1416.9/9 \approx 157.43$ and
$s = \sqrt{s^2} \approx 12.5472$.
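The formula is easy to apply in code once the mean is known. This Python sketch uses the ten scores from Exercise 1.51 and divides by n − 1, matching the formula above. It illustrates the arithmetic and is not a replacement for a calculator's built-in function.

```python
import math

SCORES = [55, 73, 75, 80, 80, 85, 90, 92, 93, 98]

n = len(SCORES)
x_bar = sum(SCORES) / n                     # 821 / 10 = 82.1
squared_devs = [(x - x_bar) ** 2 for x in SCORES]
variance = sum(squared_devs) / (n - 1)      # s^2 uses divisor n - 1
s = math.sqrt(variance)
print(f"sum of squares = {sum(squared_devs):.1f}, s^2 = {variance:.2f}, s = {s:.4f}")
```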


1.58. In order to have $s = 0$, all 5 cases must be equal; for example, 1, 1, 1, 1, 1, or
12.5, 12.5, 12.5, 12.5, 12.5. (If any two numbers are different, then $x_i - \bar{x}$ would be nonzero
for some $i$, so the sum of squared differences would be positive, so $s^2 > 0$, so $s > 0$.)
1.59. Without Suriname, the quartiles are 23 and 46.5 days; with Suriname included, they are
23 and 53.5 days. Therefore, the IQR increases from 23.5 to 30.5 days, a much less drastic
change than the change in $s$ (18.6 to 132.6 days).
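The contrast between the IQR and s can be reproduced with the 24 times listed in Exercise 1.49. This Python sketch again assumes the half-median quartile method, and `quartiles` is a hypothetical helper. `statistics.stdev` gives the sample standard deviation (divisor n − 1).

```python
import statistics

TIMES = [4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33, 40,
         42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded if n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    return statistics.median(xs[:half]), statistics.median(xs[-half:])


for label, data in (("without Suriname", TIMES), ("with Suriname", TIMES + [694])):
    q1, q3 = quartiles(data)
    print(f"{label}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}, "
          f"s = {statistics.stdev(data):.1f}")
```

The IQR grows by 7 days, while s grows more than sevenfold. This is the resistance property the solution describes.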
1.60. Divide total score by 4: $950/4 = 237.5$ points.

1.61. (a) Use a stemplot or histogram. (b) Because the distribution is skewed, the five-number
summary is the best choice; in millions of dollars, it is

Min     Q1      M        Q3       Max
3338    4589    7558.5   13,416   66,667

Some students might choose the less-appropriate summary: $\bar{x} \approx 12{,}144$ and $s \approx 12{,}421$
million dollars. (c) For example, the distribution is sharply right-skewed. (This is not surprising
given that we are looking at the top 100 companies; the top fraction of most distributions will
tend to be skewed to the right.)

[Stemplot for 1.61, with split stems.]
1.62. (a) Either a stemplot or histogram can be used to display the distribution. Two stemplots
are shown below: one with all points, and one with the outlier mentioned in part (b) excluded.
The table gives the mean and standard deviation, as well as the five-number summary, both with
and without the outlier (all values are percents). The five-number summary is preferable because
of the outlier; in particular, note the outlier's effect on the standard deviation. (See also the
solution to the next exercise.) (b) O'Doul's is marketed as non-alcoholic beer.

                   x̄        s        Min   Q1     M     Q3   Max
All points         4.7593   0.7523   0.4   4.30   4.7   5    6.5
Without O'Doul's   4.8106   0.5864   3.8   4.35   4.7   5    6.5

Note: In federal regulations, part of the definition of beer is that it has at least 0.5%
alcohol. By that standard, O'Doul's is a low-alcohol beverage, but it is not beer.

All points:
0 | 4
0 |
1 |
1 |
2 |
2 |
3 |
3 | 88
4 | 11111122222222223334444
4 | 555555666667777777778889999999999
5 | 000000011224
5 | 5666688999999
6 | 1
6 | 5

Without O'Doul's:
3 | 88
4 | 111111
4 | 2222222222333
4 | 4444555555
4 | 66666777777777
4 | 8889999999999
5 | 000000011
5 | 22
5 | 45
5 | 6666
5 | 88999999
6 | 1
6 |
6 | 5

1.63. All of these numbers are given in the table in the solution to the previous exercise.
(a) $\bar{x}$ changes from 4.76% (with) to 4.81% (without); the median (4.7%) does not change.
(b) $s$ changes from 0.7523% to 0.5864%; $Q_1$ changes from 4.3% to 4.35%, while $Q_3 = 5\%$
does not change. (c) A low outlier decreases $\bar{x}$; any kind of outlier increases $s$. Outliers
have little or no effect on the median and quartiles.
1.64. (a) A stemplot or histogram can be used to display the distribution. Students may report
either mean/standard deviation or the five-number summary (in units of calories):

x̄        s       Min   Q1    M       Q3    Max
141.06   27.79   70    113   145.5   157   210

 7 | 0
 8 |
 9 | 4556889
10 | 2458
11 | 00000000334
12 | 08
13 | 0235558
14 | 22333444555666788899
15 | 0012233356777
16 | 00012336669
17 | 01459
18 | 8
19 | 5
20 | 00
21 | 0

(b) O'Doul's has the fewest calories (70) of these 86 beers.
(c) Nearly all the beers with fewer than 120 calories are marketed as "light" beers (and most
have "light" in their names). Of the other beers, only one (Weinhard's Amber Light) is called "light."
Note: If we apply the $1.5 \times \text{IQR}$ rule to all 86 beers, O'Doul's does not qualify as an outlier
(the cutoff is $Q_1 - 1.5 \times \text{IQR} = 113 - 1.5 \times 44 = 47$). However, if we restrict our attention to
the light beers (fewer than 120 calories), any beer below 80 calories is an outlier.

1.65. Use a small data set with an odd number of points, so that the median is the middle
number. After deleting the lowest observation, the median will be the average of that middle
number and the next number after it; if that latter number is much larger, the median will
change substantially. For example, start with 0, 1, 2, 998, 1000; after removing 0, the
median changes from 2 to 500.
1.66. Salary distributions (especially in professional sports) tend to be skewed to the right. This
skew makes the mean higher than the median.


1.67. (a) The distribution is left-skewed. While the skew makes the five-number summary
preferable, some students might give the mean/standard deviation. In ounces, these statistics are:

x̄       s       Min   Q1     M     Q3     Max
6.456   1.425   3.7   4.95   6.7   7.85   8.2

3 | 7
4 | 3
4 | 7777
5 | 23
5 |
6 | 0033
6 | 7
7 | 03
7 | 668899999
8 | 2

(b) The numerical summary does not reveal the two weight clusters (visible in a stemplot or
histogram). (c) For small potatoes (less than 6 oz), n = 8, $\bar{x} = 4.662$ oz, and $s = 0.501$ oz.
For large potatoes, n = 17, $\bar{x} = 7.300$ oz, and $s = 0.755$ oz. Because there are clearly two
groups, it seems appropriate to treat them separately.
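Part (c) can be reproduced by splitting the data at 6 oz. This Python sketch assumes the 25 weights are read from the stemplot in part (a) (for example, 3 | 7 = 3.7 oz). It uses the standard-library `statistics` module.

```python
import statistics

# Potato weights (ounces) read from the stemplot in part (a).
WEIGHTS = [3.7, 4.3, 4.7, 4.7, 4.7, 4.7, 5.2, 5.3,
           6.0, 6.0, 6.3, 6.3, 6.7, 7.0, 7.3, 7.6, 7.6, 7.8, 7.8,
           7.9, 7.9, 7.9, 7.9, 7.9, 8.2]
CUTOFF = 6.0  # "small" means less than 6 oz, as in part (c)

small = [w for w in WEIGHTS if w < CUTOFF]
large = [w for w in WEIGHTS if w >= CUTOFF]
for label, group in (("small", small), ("large", large)):
    print(f"{label}: n = {len(group)}, mean = {statistics.mean(group):.3f}, "
          f"s = {statistics.stdev(group):.3f}")
```

Each group's standard deviation is well under half the overall s of 1.425 oz. This is why the separate summaries describe the data better.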


1.68. (a) The five-number summary is Min = 2.2 cm, $Q_1 = 10.95$ cm, $M = 28.5$ cm, $Q_3 = 41.9$ cm,
Max = 69.3 cm. (b) & (c) The boxplot and histogram are shown below. (Students might choose
different interval widths for the histogram.) (d) Preferences will vary. Both plots reveal the
right-skew of this distribution, but the boxplot does not show the two peaks visible in the histogram.

[Figures for 1.68: boxplot and histogram of diameter at breast height (cm).]

1.69. (a) The five-number summary is Min = 0 mg/l, $Q_1 = 0$ mg/l, $M = 5.085$ mg/l, $Q_3 = 9.47$ mg/l,
Max = 73.2 mg/l. (b) & (c) The boxplot and histogram are shown below.
(Students might choose different interval widths for the histogram.) (d) Preferences will
vary. Both plots reveal the sharp right-skew of this distribution, but because $\text{Min} = Q_1$, the
boxplot looks somewhat strange. The histogram seems to convey the distribution better.

[Figures for 1.69: boxplot and histogram of CRP (mg/l).]

1.70. Answers depend on whether natural (base-e) or common (base-10) logarithms are used.
Both sets of answers are shown here. If this exercise is assigned, it would probably be best for
the sanity of both instructor and students to specify which logarithm to use.
(a) The five-number summary is:

Logarithm   Min   Q1   M        Q3       Max
Natural     0     0    1.8048   2.3485   4.3068
Common      0     0    0.7838   1.0199   1.8704

(The ratio between these answers is roughly $\ln 10 \approx 2.3$.)
(b) & (c) The boxplots and histograms are shown below. (Students might choose different
interval widths for the histograms.) (d) As for Exercise 1.69, preferences will vary.

[Figures for 1.70: boxplots and histograms of the natural log of (1 + CRP) and the base-10 log of (1 + CRP).]


1.71. (a) The five-number summary (in units of µmol/l) is Min = 0.24, $Q_1 = 0.355$, $M = 0.76$,
$Q_3 = 1.03$, Max = 1.9. (b) & (c) The boxplot and histogram are shown below.
(Students might choose different interval widths for the histogram.) (d) The distribution is
right-skewed. A histogram (or stemplot) is preferable because it reveals an important feature
not evident from a boxplot: This distribution has two peaks.

[Figures for 1.71: boxplot and histogram of retinol level (µmol/l).]

1.72. The mean and standard deviation for these ratings are $\bar{x} = 5.9$ and $s \approx 3.7719$; the
five-number summary is $\text{Min} = Q_1 = 1$, $M = 6.5$, $Q_3 = \text{Max} = 10$. For a graphical
presentation, a stemplot (or histogram) is better than a boxplot because the latter obscures
details about the distribution. (With a little thought, one might realize that $\text{Min} = Q_1 = 1$ and
$Q_3 = \text{Max} = 10$ means that there are lots of 1s and lots of 10s, but this is much more evident
in a stemplot or histogram.)

 1 | 0000000000000000
 2 | 0000
 3 | 0
 4 | 0
 5 | 00000
 6 | 000
 7 | 0
 8 | 000000
 9 | 00000
10 | 000000000000000000
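Every leaf in this stemplot is a 0, so the data are really a frequency table of the ratings 1 through 10. This Python sketch rebuilds the 60 ratings from those counts (read from the stemplot) and recomputes the summaries. The summaries reveal nothing about the two clusters at 1 and 10.

```python
import statistics

# Frequency of each rating, read from the stemplot (each leaf is one rating).
FREQ = {1: 16, 2: 4, 3: 1, 4: 1, 5: 5, 6: 3, 7: 1, 8: 6, 9: 5, 10: 18}
ratings = sorted(r for r, count in FREQ.items() for _ in range(count))

half = len(ratings) // 2
q1 = statistics.median(ratings[:half])   # median of the lower half
q3 = statistics.median(ratings[-half:])  # median of the upper half
print(f"n = {len(ratings)}, mean = {statistics.mean(ratings):.1f}, "
      f"s = {statistics.stdev(ratings):.4f}")
print(f"Min = {ratings[0]}, Q1 = {q1}, M = {statistics.median(ratings)}, "
      f"Q3 = {q3}, Max = {ratings[-1]}")
```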

1.73. The distribution of household net worth would almost surely be strongly skewed to the
right: Most families would generally have accumulated little or modest wealth, but a few
would have become rich. This strong skew pulls the mean to be higher than the median.
1.74. See also the solution to Exercise 1.36. (a) The five-number summary (in units of metric
tons per person) is:
Min = 0, $Q_1 = 0.75$, $M = 3.2$, $Q_3 = 7.8$, Max = 19.9
The evidence for the skew is in the large gaps between the higher numbers; that is, the
differences $Q_3 - M$ and $\text{Max} - Q_3$ are large compared to $Q_1 - \text{Min}$ and $M - Q_1$. (b) The IQR
is $Q_3 - Q_1 = 7.05$, so outliers would be less than $0.75 - 1.5 \times 7.05 = -9.825$ or greater than
$7.8 + 1.5 \times 7.05 = 18.375$. According to this rule, only the United States qualifies as an outlier,
but Canada and Australia seem high enough to also include them.

0 | 00000000000000011111
0 | 222233333
0 | 445
0 | 6677
0 | 888999
1 | 001
1 |
1 |
1 | 67
1 | 9

1.75. The total salary is $690,000, so the mean is $\bar{x} = \$690{,}000/9 \approx \$76{,}667$. Six of the nine
employees earn less than the mean. The median is $M = \$35{,}000$.
1.76. If three individuals earn $0, $0, and $20,000, the reported median income of those with
income is $20,000. If the two individuals with no income take jobs at $14,000 each, the median
decreases to $14,000. The same thing can happen to the mean: In this example, the mean drops
from $20,000 to $16,000.
1.77. The total salary is now $825,000, so the new mean is $\bar{x} = \$825{,}000/9 \approx \$91{,}667$. The
median is unchanged.
1.78. Details are in the table below: $\bar{x} = 11{,}200/7 = 1600$, $s^2 = 214{,}870/6 \approx 35{,}811.67$, and
$s = \sqrt{35{,}811.67} \approx 189.24$.

xᵢ        xᵢ − x̄     (xᵢ − x̄)²
1792        192        36,864
1666         66         4,356
1362       −238        56,644
1614         14           196
1460       −140        19,600
1867        267        71,289
1439       −161        25,921
11,200        0       214,870   (column sums)
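The table's column sums can be checked in a few lines. This Python sketch follows the same steps as the table: compute deviations, square them, and divide the sum by n − 1. Its only input is the seven observations in the first column. With these values, the sum of the squared deviations comes out to 214,870.

```python
import math

X = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # the seven observations in the table

n = len(X)
x_bar = sum(X) / n                   # 11,200 / 7 = 1600
deviations = [x - x_bar for x in X]  # these always sum to 0
squares = [d * d for d in deviations]
variance = sum(squares) / (n - 1)
s = math.sqrt(variance)

for x, d, sq in zip(X, deviations, squares):
    print(f"{x:6d} {d:8.0f} {sq:10.0f}")
print(f"sum of deviations = {sum(deviations):.0f}, sum of squares = {sum(squares):.0f}")
print(f"s^2 = {variance:.2f}, s = {s:.2f}")
```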


1.79. The quote describes a distribution with a strong right skew: Lots of years with no losses
to hurricane ($0), but very high numbers when they do occur. For example, if there is one
hurricane in a 10-year period causing $1 million in damages, the average annual loss for
that period would be $100,000, but that does not adequately represent the cost for the year
of the hurricane. Means are not the appropriate measure of center for skewed distributions.
1.80. (a) $\bar{x}$ and $s$ are appropriate for symmetric distributions with no outliers. (b) Both high
numbers are flagged as outliers. For women, IQR = 60, so the upper $1.5 \times \text{IQR}$ limit is
270 minutes. For men, IQR = 90, so the upper $1.5 \times \text{IQR}$ limit is 285 minutes. The table
shows the effect of removing these outliers.

          Women           Men
          x̄       s       x̄       s
Before    165.2   56.5    117.2   74.2
After     158.4   43.7    110.9   66.9

1.81. (a) & (b) See the table below. In both cases, the mean and median are quite similar.

           x̄        s        M
pH         5.4256   0.5379   5.44
Density    5.4479   0.2209   5.46

1.82. See also the solution to Exercise 1.43. (a) The mean of this distribution appears to be higher than 100. (There is no substantial difference between the standard deviations.) (b) The mean and median are quite similar; the mean is slightly smaller due to the slight left skew of the data. (c) In addition to the mean and median, the standard deviation is shown for reference (the exercise did not ask for it).

       x̄       s        M
IQ     108.9   13.17    110
GPA    7.447   (2.1)    7.829

Note: Students may be somewhat puzzled by the statement in (b) that the median is "close" to the mean (when they differ by 1.1), followed by (c), where they "differ a bit" (when \(M - \bar{x} = 0.382\)). It may be useful to emphasize that we judge the size of such differences relative to the spread of the distribution. For example, we can note that \(1.1/13.17 \approx 0.08\) for (b), and \(0.382/2.1 \approx 0.18\) for (c).
1.83. With only two observations, the mean and median are always equal because the median
is halfway between the middle two (in this case, the only two) numbers.
1.84. (a) The mean (green arrow) moves along with the moving point (in fact, it moves in
the same direction as the moving point, at one-third the speed). At the same time, as long
as the moving point remains to the right of the other two, the median (red arrow) points to
the middle point (the rightmost nonmoving point). (b) The mean follows the moving point
as before. When the moving point passes the rightmost fixed point, the median slides along with it until the moving point passes the leftmost fixed point, then the median stays there.
1.85. (a) There are several different answers, depending on the configuration of the first five points. Most students will likely assume that the first five points should be distinct (no repeats), in which case the sixth point must be placed at the median. This is because the median of 5 (sorted) points is the third, while the median of 6 points is the average of the third and fourth. If these are to be the same, the third and fourth points of the set of six must both equal the third point of the set of five.
The diagram below illustrates all of the possibilities; in each case, the arrow shows the location of the median of the initial five points, and the shaded region (or dot) on the line indicates where the sixth point can be placed without changing the median. Notice that there are four cases where the median does not change, regardless of the location of the sixth point. (The points need not be equally spaced; these diagrams were drawn that way for convenience.)

(b) Regardless of the configuration of the first five points, if the sixth point is added so as to leave the median unchanged, then in that (sorted) set of six, the third and fourth points must be equal. One of these two points will be the middle (fourth) point of the (sorted) set of seven, no matter where the seventh point is placed.
Note: If you have a student who illustrates all possible cases above, then it is likely that
the student either (1) obtained a copy of this solutions manual, (2) should consider a career
in writing solutions manuals, (3) has too much time on his or her hands, or (4) both 2 and
3 (and perhaps 1) are true.
1.86. The five-number summaries (all in millimeters) are:

Variety   Min     Q1      M       Q3       Max
bihai     46.34   46.71   47.12   48.245   50.26
red       37.40   38.07   39.16   41.69    43.09
yellow    34.57   35.45   36.11   36.82    38.13

[Figure: side-by-side boxplots of length (mm) by Heliconia variety: bihai, red, yellow]

H. bihai is clearly the tallest variety: the shortest bihai was over 3 mm taller than the tallest red. Red is generally taller than yellow, with a few exceptions. Another noteworthy fact: The red variety is more variable than either of the other varieties.
1.87. (a) The means and standard deviations (all in millimeters) are:

Variety   x̄         s
bihai     47.5975   1.2129
red       39.7113   1.7988
yellow    36.1800   0.9753

bihai           red             yellow
46 | 3466789    37 | 4789       34 | 56
47 | 114        38 | 0012278    35 | 146
48 | 0133       39 | 167        36 | 0015678
49 |            40 | 56         37 | 01
50 | 12         41 | 4699       38 | 1
                42 | 01
                43 | 0

(b) Bihai and red appear to be right-skewed (although it is difficult to tell with such small samples). Skewness would make these distributions unsuitable for \(\bar{x}\) and \(s\).
1.88. (a) The mean is \(\bar{x} = 15\), and the standard deviation is \(s \approx 5.4365\). (b) The mean is still 15; the new standard deviation is 3.7417. (c) Using the mean as a substitute for missing data will not change the mean, but it decreases the standard deviation.
1.89. The minimum and maximum are easily determined to be 1 and 12 letters, and the quartiles and median can be found by adding up the bar heights. For example, the first two bars have total height 22.3% (less than 25%), and adding the third bar brings the total to 45%, so \(Q_1\) must equal 3 letters. Continuing this way, we find that the five-number summary, in units of letters, is:
Min = 1, \(Q_1 = 3\), \(M = 4\), \(Q_3 = 5\), Max = 12
Note that even without the frequency table given in the data file, we could draw the same conclusion by estimating the heights of the bars in the histogram.
1.90. Because the mean is to be 7, the five numbers must add up to 35. Also, the third number (in order from smallest to largest) must be 10 because that is the median. Beyond that, there is some freedom in how the numbers are chosen.
Note: It is likely that many students will interpret "positive numbers" as meaning "positive integers" only, which leads to eight possible solutions, shown below.
1 1 10 10 13    1 1 10 11 12    1 2 10 10 12    1 2 10 11 11
1 3 10 10 11    1 4 10 10 10    2 2 10 10 11    2 3 10 10 10
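Assuming, as in the Note, that "positive numbers" means positive integers, a brute-force Python sketch can confirm that exactly eight lists work. The helper `five_number_sets` is hypothetical, written only for this check.

```python
from itertools import combinations_with_replacement

def five_number_sets(total=35, median=10):
    """All sorted 5-tuples of positive integers with the given sum and median."""
    solutions = []
    # combinations_with_replacement yields nondecreasing tuples, so each tuple is
    # already sorted and its middle (third) element is the median.
    for combo in combinations_with_replacement(range(1, total + 1), 5):
        if sum(combo) == total and combo[2] == median:
            solutions.append(combo)
    return solutions

for row in five_number_sets():
    print(*row)   # prints the eight lists from the Note, in lexicographic order
```

Enumerating only sorted tuples avoids counting rearrangements of the same five numbers as different answers.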

1.91. The simplest approach is to take (at least) six numbers, say, a, b, c, d, e, f in increasing order. For this set, \(Q_3 = e\); we can cause the mean to be larger than e by simply choosing f to be much larger than e. For example, if all numbers are nonnegative, \(f > 5e\) would accomplish the goal because then
\[
\bar{x} = \frac{a+b+c+d+e+f}{6} \ge \frac{e+f}{6} > \frac{e+5e}{6} = e.
\]
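A small Python sketch of the argument, using the textbook convention that \(Q_3\) is the median of the observations above the overall median. The function names and the example data (with \(f = 30 > 5e = 25\)) are illustrative assumptions, not from the exercise.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def third_quartile(values):
    """Q3 as the median of the observations above the position of the overall median."""
    v = sorted(values)
    upper_half = v[(len(v) + 1) // 2:]   # for six values a..f this is d, e, f
    return median(upper_half)

# Hypothetical example: a..e = 1..5 and f = 30, which exceeds 5e = 25.
data = [1, 2, 3, 4, 5, 30]
print(third_quartile(data), sum(data) / len(data))   # 5 7.5
```

Other software may use a different quartile rule, so the exact value of \(Q_3\) can differ slightly from this convention.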
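1.92. The algebra might be a bit of a stretch for some students:
\[
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})
  &= (x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + \cdots + (x_{n-1} - \bar{x}) + (x_n - \bar{x}) \\
  &= x_1 - \bar{x} + x_2 - \bar{x} + x_3 - \bar{x} + \cdots + x_{n-1} - \bar{x} + x_n - \bar{x}
     && \text{(drop all the parentheses)} \\
  &= x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n - \bar{x} - \bar{x} - \bar{x} - \cdots - \bar{x} - \bar{x}
     && \text{(rearrange the terms)} \\
  &= x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n - n\bar{x}.
\end{aligned}
\]
Next, simply observe that \(n\bar{x} = x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n\), so the sum of the deviations is 0.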


1.93. (a) One possible answer is 1, 1, 1, 1. (b) 0, 0, 20, 20. (c) For (a), any set of four identical numbers will have \(s = 0\). For (b), the answer is unique; here is a rough description of why. We want to maximize the "spread-out-ness" of the numbers (which is what standard deviation measures), so 0 and 20 seem to be reasonable choices based on that idea. We also want to make each individual squared deviation, \((x_1 - \bar{x})^2\), \((x_2 - \bar{x})^2\), \((x_3 - \bar{x})^2\), and \((x_4 - \bar{x})^2\), as large as possible. If we choose 0, 20, 20, 20 (or 20, 0, 0, 0), we make the first squared deviation \(15^2\), but the other three are only \(5^2\). Our best choice is two at each extreme, which makes all four squared deviations equal to \(10^2\).
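If, as the answers suggest, the four numbers must be whole numbers from 0 to 20 (repeats allowed), an exhaustive Python search is a quick sanity check on parts (a) and (b). The 0 to 20 range is an assumption inferred from the answers, not quoted from the exercise.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Assumption: four whole numbers from 0 to 20, repeats allowed.
def all_sets():
    return combinations_with_replacement(range(21), 4)

smallest = min(all_sets(), key=stdev)   # first set found with s = 0
largest = max(all_sets(), key=stdev)    # two values at each extreme

print(largest, round(stdev(largest), 3))   # (0, 0, 20, 20) 11.547
```

The search only confirms the reasoning in (c) for this particular range; it is not a proof for arbitrary bounds.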
1.94. Answers will vary. Typical calculators will carry only about 12 to 15 digits; for example,
a TI-83 fails (gives s = 0) for 14-digit numbers. Excel (at least the version I checked) also
fails for 14-digit numbers, but it gives s = 262,144 rather than 0. The (very old) version of
Minitab used to prepare these answers fails at 20,000,001 (eight digits), giving s = 2.
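These failures come from limited floating-point precision. One plausible mechanism, sketched below in Python, is the one-pass "shortcut" formula \(s^2 = (\sum x_i^2 - n\bar{x}^2)/(n-1)\), which subtracts two huge, nearly equal numbers; whether any particular calculator or program uses it internally is an assumption. The two-pass version (subtract \(\bar{x}\) first) survives the same data.

```python
import math

def sd_one_pass(values):
    """Shortcut formula sqrt((sum x^2 - n*xbar^2)/(n-1)); prone to cancellation."""
    n = len(values)
    xbar = sum(values) / n
    ss = sum(x * x for x in values) - n * xbar * xbar
    return math.sqrt(max(ss, 0.0) / (n - 1))   # clamp: rounding can make ss negative

def sd_two_pass(values):
    """Subtract the mean first, then square; far less sensitive to large offsets."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

data = [1e9 + 1, 1e9 + 2, 1e9 + 3]   # true standard deviation is exactly 1
print(sd_one_pass(data), sd_two_pass(data))   # the two-pass result is 1.0
```

The squares of numbers near \(10^9\) exceed the roughly 16 significant digits a double carries, so the one-pass difference is dominated by rounding error.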
1.95. The table below reproduces the means and standard deviations from the solution to Exercise 1.87 and shows those values expressed in inches. For each conversion, multiply by 39.37/1000 = 0.03937 (or divide by 25.4; an inch is defined as 25.4 millimeters). For example, for the bihai variety, \(\bar{x} = (47.5975\ \text{mm})(0.03937\ \text{in/mm}) = (47.5975\ \text{mm}) \div (25.4\ \text{mm/in}) \approx 1.874\ \text{in}\).

            (in mm)              (in inches)
Variety     x̄         s         x̄       s
bihai       47.5975   1.2129    1.874   0.04775
red         39.7113   1.7988    1.563   0.07082
yellow      36.1800   0.9753    1.424   0.03840
1.96. (a) \(\bar{x} = 5.4479\) and \(s = 0.2209\). (b) The first measurement corresponds to \(5.50 \times 62.43 = 343.365\) pounds per cubic foot. To find \(\bar{x}_{\text{new}}\) and \(s_{\text{new}}\), we similarly multiply by 62.43: \(\bar{x}_{\text{new}} \approx 340.11\) and \(s_{\text{new}} \approx 13.79\).
Note: The conversion from cm to feet is included in the multiplication by 62.43; the step-by-step process of this conversion looks like this:
\[
(1\ \text{g/cm}^3)(0.001\ \text{kg/g})(2.2046\ \text{lb/kg})(30.48^3\ \text{cm}^3/\text{ft}^3) \approx 62.43\ \text{lb/ft}^3
\]
1.97. Convert from kilograms to pounds by multiplying by 2.2: \(\bar{x} = (2.42\ \text{kg})(2.2\ \text{lb/kg}) \approx 5.32\ \text{lb}\) and \(s = (1.18\ \text{kg})(2.2\ \text{lb/kg}) \approx 2.60\ \text{lb}\).
1.98. Variance is changed by a factor of \(2.54^2 = 6.4516\); generally, for a transformation \(x_{\text{new}} = a + bx\), the new variance is \(b^2\) times the old variance.
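The rule for \(x_{\text{new}} = a + bx\) can be checked numerically with a Python sketch. The measurements below are hypothetical values chosen only for illustration, and `linear_transform` is an illustrative helper.

```python
import statistics

def linear_transform(values, a, b):
    """Return a + b*x for every observation x."""
    return [a + b * x for x in values]

# Hypothetical lengths in inches (not data from the exercises).
inches = [9.0, 11.5, 12.25, 14.0, 15.5]

cm = linear_transform(inches, a=0.0, b=2.54)        # 1 inch = 2.54 cm
shifted = linear_transform(inches, a=10.0, b=1.0)   # pure shift, no rescaling

var_in = statistics.variance(inches)
print(round(statistics.variance(cm) / var_in, 4))   # 6.4516, i.e. 2.54**2
print(statistics.variance(shifted) == var_in)       # True: adding a does not change spread
```

The shift \(a\) moves the mean but leaves every deviation \(x_i - \bar{x}\) unchanged, which is why only \(b\) affects the variance.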
1.99. There are 80 service times, so to find the 10% trimmed mean, remove the highest and lowest eight values (leaving 64). Remove the highest and lowest 16 values (leaving 48) for the 20% trimmed mean.
The mean and median for the full data set are \(\bar{x} = 196.575\) and \(M = 103.5\) minutes. The 10% trimmed mean is \(\bar{x} \approx 127.734\), and the 20% trimmed mean is \(\bar{x} \approx 111.917\) minutes. Because the distribution is right-skewed, removing the extremes lowers the mean.
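A Python sketch of a trimmed mean follows. The function name is illustrative, the sample is hypothetical (the 80 service times are not reproduced here), and rounding \(p \cdot n\) to get the number trimmed from each end is an assumed convention.

```python
def trimmed_mean(values, proportion):
    """Mean after removing `proportion` of the observations from EACH end.

    Assumption: the number trimmed per end is round(proportion * n); textbooks
    and software differ on how to handle non-integer counts.
    """
    v = sorted(values)
    k = round(proportion * len(v))
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)

# With 80 observations this trims 8 per end (10%) or 16 per end (20%),
# leaving 64 or 48 values, as in the solution.
print(round(0.10 * 80), round(0.20 * 80))   # 8 16

# Hypothetical right-skewed sample: trimming pulls the mean toward the bulk of the data.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trimmed_mean(sample, 0.0), trimmed_mean(sample, 0.10))   # 14.5 5.5
```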


1.100. After changing the scale from centimeters to inches, the five-number summary values change by the same ratio (that is, they are multiplied by 0.39). The shape of the histogram might change slightly because of the change in class intervals. (a) The five-number summary (in inches) is Min = 0.858, \(Q_1 = 4.2705\), \(M = 11.115\), \(Q_3 = 16.341\), Max = 27.027. (b) & (c) The boxplot and histogram are shown below. (Students might choose different interval widths for the histogram.) (d) As in Exercise 1.56, the histogram reveals more detail about the shape of the distribution.

[Figure: boxplot and histogram (frequency) of diameter at breast height (in)]

1.101. Take the mean plus or minus two standard deviations: \(572 \pm 2(51) = 470\) to 674.
1.102. Take the mean plus or minus three standard deviations: \(572 \pm 3(51) = 419\) to 725.
1.103. The z-score is \(z = \frac{620 - 572}{51} \approx 0.94\).
1.104. The z-score is \(z = \frac{510 - 572}{51} \approx -1.22\). This is negative because an ISTEP score of 510 is below average; specifically, it is 1.22 standard deviations below the mean.

1.105. Using Table A, the proportion below 620 (\(z \approx 0.94\)) is 0.8264 and the proportion at or above is 0.1736; these two proportions add to 1. The graph illustrates this with a single curve; it conveys essentially the same idea as the "graphical subtraction" picture shown in Example 1.36.
[Figure: Normal curve with the axis marked at 419, 470, 521, 572, 623, 674, and 725; area 0.8264 to the left of 620 and 0.1736 to the right]
1.106. Using Table A, the proportion below 620 (\(z \approx 0.94\)) is 0.8264, and the proportion below 660 (\(z \approx 1.73\)) is 0.9582. Therefore:
\[
\text{area between 620 and 660} = \text{area left of 660} - \text{area left of 620} = 0.9582 - 0.8264 = 0.1318
\]
The graph illustrates this with a single curve; it conveys essentially the same idea as the graphical subtraction picture shown in Example 1.37.
[Figure: Normal curve with the axis marked at 419, 470, 521, 572, 623, 674, and 725; area 0.1318 between 620 and 660]

1.107. Using Table A, this ISTEP score should correspond to a standard score of \(z \approx 0.67\) (software gives 0.6745), so the ISTEP score (unstandardized) is \(572 + 0.67(51) \approx 606.2\) (software: 606.4).
1.108. Using Table A, x should correspond to a standard score of \(z \approx -0.84\) (software gives \(-0.8416\)), so the ISTEP score (unstandardized) is \(x = 572 - 0.84(51) \approx 529.2\) (software: 529.1).
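Python's `statistics.NormalDist` (Python 3.8 and later) reproduces the calculations for Exercises 1.103 to 1.108. Rounding z to two decimals before looking up areas mimics Table A; the unrounded software values differ slightly. The percentiles in 1.107 and 1.108 are assumed to be the 75th and 20th, as implied by z = 0.6745 and z = −0.8416.

```python
from statistics import NormalDist

istep = NormalDist(mu=572, sigma=51)   # ISTEP scores modeled as N(572, 51)
std = NormalDist()                     # standard Normal, N(0, 1)

z_620 = (620 - 572) / 51               # Exercise 1.103
z_510 = (510 - 572) / 51               # Exercise 1.104

# Table A works with z rounded to two decimals, so round before looking up areas.
below_620 = std.cdf(round(z_620, 2))                       # Exercise 1.105
between = std.cdf(round((660 - 572) / 51, 2)) - below_620  # Exercise 1.106

# Assumption (from the quoted z-values): 1.107 asks for the 75th percentile, 1.108 for the 20th.
p75 = istep.inv_cdf(0.75)
p20 = istep.inv_cdf(0.20)

print(round(z_620, 2), round(z_510, 2))        # 0.94 -1.22
print(round(below_620, 4), round(between, 4))  # 0.8264 0.1318
print(round(p75, 1), round(p20, 1))            # 606.4 529.1
```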
1.109. Of course, student sketches will not be as neat as the curves shown, but they should have roughly the correct shape. (a) It is easiest to draw the curve first, and then mark the scale on the axis. (b) Draw a copy of the first curve, with the peak over 20. (c) The curve has the same shape, but is translated left or right.
[Figure: sketches of the Normal curves, with the axis marked at 1, 4, 7, ..., 28]
1.110. (a) As in the previous exercise, draw the curve first, and then mark the scale on the axis. (b) In order to have a standard deviation of 1, the curve should be 1/3 as wide, and three times taller. (c) The curve is centered at the same place (the mean), but its height and width change. Specifically, increasing the standard deviation makes the curve wider and shorter; decreasing the standard deviation makes the curve narrower and taller.
[Figure: sketches of the Normal curves, with the axis marked at 10, 13, 16, and 19]

1.111. Sketches will vary.


1.112. (a) The table below gives the ranges for women; for example, about 68% of women speak between 7856 and 20,738 words per day. (b) Negative numbers do not make sense for this situation. The 68–95–99.7 rule is reasonable for a distribution that is close to Normal, but by constructing a stemplot or histogram, it is easily confirmed that this distribution is slightly right-skewed. (c) These ranges are also in the table; the men's distribution is more skewed than the women's distribution, so the 68–95–99.7 rule is even less appropriate. (d) This does not support the conventional wisdom: The ranges from parts (a) and (c) overlap quite a bit. Additionally, the difference in the means is quite small relative to the large standard deviations.

          Women                Men
68%       7856 to 20,738       4995 to 23,125
95%       1415 to 27,179       -4070 to 32,190
99.7%     -5026 to 33,620      -13,135 to 41,255
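A Python sketch of the 68–95–99.7 ranges is below. The means and standard deviations are assumptions: they are back-calculated from the midpoints and half-widths of the 68% ranges in the table, since the solution does not quote them.

```python
def empirical_rule_ranges(mean, sd):
    """68-95-99.7 rule: intervals mean +/- k*sd for k = 1, 2, 3."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Assumption: mean and sd back-calculated from the 68% ranges in the table.
women = empirical_rule_ranges(14297, 6441)
men = empirical_rule_ranges(14060, 9065)

for k in (1, 2, 3):
    print(k, women[k], men[k])
```

The negative lower limits that appear for k = 2 and k = 3 are exactly the symptom noted in part (b): a Normal model does not fit these skewed word counts.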


1.113. (a) Ranges are given in the table below. In both cases, some of the lower limits are negative, which does not make sense; this happens because the women's distribution is skewed, and the men's distribution has an outlier. Contrary to the conventional wisdom, the men's mean is slightly higher, although the outlier is at least partly responsible for that. (b) The means suggest that Mexican men and women tend to speak more than people of the same gender from the United States.

          Women                Men
68%       8489 to 20,919       7158 to 22,886
95%       2274 to 27,134       -706 to 30,750
99.7%     -3941 to 33,349      -8570 to 38,614

1.114. (a) For example, \(\frac{68 - 70}{10} = -0.2\). The complete list is given below. (b) The cut-off for an A is the 85th percentile for the N(0, 1) distribution. From Table A, this is approximately 1.04; software gives 1.0364. (c) The top two students (with scores of 92 and 98) received A's.

Score:      68    54    92   75   73   98    64    55   80   70
Standard:  -0.2  -1.6   2.2  0.5  0.3  2.8  -0.6  -1.5   1    0

1.115. (a) We need the 5th, 15th, 55th, and 85th percentiles for a N(0, 1) distribution. These are given in the table below. (b) To convert to actual scores, take the standard-score cut-off z and compute 10z + 70. (c) Opinions will vary.

           Table A              Software
       Standard   Actual    Standard   Actual
F       -1.64      53.6     -1.6449    53.55
D       -1.04      59.6     -1.0364    59.64
C        0.13      71.3      0.1257    71.26
B        1.04      80.4      1.0364    80.36

Note: The cut-off for an A given in the previous solution is the lowest score that gets an A; that is, the point where one's grade drops from an A to a B. These cut-offs are the points where one's grade jumps up. In practice, this is only an issue for a score that falls exactly on the border between two grades.
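Exercises 1.114 and 1.115 can be reproduced with a Python sketch using `statistics.NormalDist`; the grade labels follow the 1.115 table, and the variable names are illustrative.

```python
from statistics import NormalDist

scores = [68, 54, 92, 75, 73, 98, 64, 55, 80, 70]
z = [(x - 70) / 10 for x in scores]        # standardize: mean 70, sd 10

std = NormalDist()
a_cutoff = std.inv_cdf(0.85)               # 85th percentile, about 1.0364
a_scores = [x for x, zi in zip(scores, z) if zi >= a_cutoff]

# Exercise 1.115: 5th, 15th, 55th, 85th percentiles, converted back with 10z + 70.
percentiles = {"F": 0.05, "D": 0.15, "C": 0.55, "B": 0.85}
cutoffs = {g: round(10 * std.inv_cdf(p) + 70, 2) for g, p in percentiles.items()}

print(a_scores)   # [92, 98]
print(cutoffs)    # {'F': 53.55, 'D': 59.64, 'C': 71.26, 'B': 80.36}
```

The score of 80 standardizes to exactly 1, just below the 1.0364 cut-off, which is why only two students earn an A.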
1.116. (a) The curve forms a \(1 \times 1\) square, which has area 1. (b) \(P(X < 0.35) = 0.35\). (c) \(P(0.35 < X < 0.65) = 0.3\).
1.117. (a) The height should be \(\frac{1}{4}\) since the area under the curve must be 1. The density curve is shown. (b) \(P(X \le 1) = \frac{1}{4} = 0.25\). (c) \(P(0.5 < X < 2.5) = 0.5\).
[Figures: the density curves for Exercises 1.116 and 1.117, with the areas marked at 0.35 and 0.65]

1.118. The mean and median both equal 0.5; the quartiles are Q 1 = 0.25 and Q 3 = 0.75.
1.119. (a) Mean is C, median is B (the right skew pulls the mean to the right). (b) Mean A,
median A. (c) Mean A, median B (the left skew pulls the mean to the left).

1.120. Hint: It is best to draw the curve first, then place the numbers below it. Students may at first make mistakes like drawing a half-circle instead of the correct bell-shaped curve, or being careless about locating the standard deviation.
[Figure: Normal curve with the axis marked at 218, 234, 250, 266, 282, 298, and 314]

1.121. (a) The applet shows an area of 0.6826 between \(-1.000\) and 1.000, while the 68–95–99.7 rule rounds this to 0.68. (b) Between \(-2.000\) and 2.000, the applet reports 0.9544 (compared to the rounded 0.95 from the 68–95–99.7 rule). Between \(-3.000\) and 3.000, the applet reports 0.9974 (compared to the rounded 0.997).

1.122. See the sketch of the curve in the solution to Exercise 1.120. (a) The middle 95% fall within two standard deviations of the mean: \(266 \pm 2(16)\), or 234 to 298 days. (b) The shortest 2.5% of pregnancies are shorter than 234 days (more than two standard deviations below the mean).
1.123. (a) 99.7% of horse pregnancies fall within three standard deviations of the mean: \(336 \pm 3(3)\), or 327 to 345 days. (b) About 16% are longer than 339 days since 339 days or more corresponds to at least one standard deviation above the mean.
[Figure: Normal curve with the axis marked at 327, 330, 333, 336, 339, 342, and 345]
Note: This exercise did not ask for a sketch of the Normal curve, but students should be encouraged to make such sketches anyway.
1.124. Because the quartiles of any distribution have 50% of observations between them, we seek to place the flags so that the reported area is 0.5. The closest the applet gets is an area of 0.5034, between \(-0.680\) and 0.680. Thus, the quartiles of any Normal distribution are about 0.68 standard deviations above and below the mean.
Note: Table A places the quartiles at about \(\pm 0.67\); other statistical software gives \(\pm 0.6745\).
1.125. The mean and standard deviation are \(\bar{x} = 5.4256\) and \(s = 0.5379\). About 67.62% (\(71/105 \approx 0.6762\)) of the pH measurements are in the range \(\bar{x} \pm s = 4.89\) to 5.96. About 95.24% (100/105) are in the range \(\bar{x} \pm 2s = 4.35\) to 6.50. All (100%) are in the range \(\bar{x} \pm 3s = 3.81\) to 7.04.

1.126. Using values from Table A: (a) \(Z > 1.65\): 0.0495. (b) \(Z < 1.65\): 0.9505. (c) \(Z > -0.76\): 0.7764. (d) \(-0.76 < Z < 1.65\): \(0.9505 - 0.2236 = 0.7269\).
[Figure: standard Normal curves for parts (a) to (d), with the areas shaded]
1.127. Using values from Table A: (a) \(Z \le -1.8\): 0.0359. (b) \(Z \ge -1.8\): 0.9641. (c) \(Z > 1.6\): 0.0548. (d) \(-1.8 < Z < 1.6\): \(0.9452 - 0.0359 = 0.9093\).
[Figure: standard Normal curves for parts (a) to (d), with the areas shaded]
1.128. (a) 22% of the observations fall below \(-0.7722\). (This is the 22nd percentile of the standard Normal distribution.) (b) 40% of the observations fall above 0.2533 (the 60th percentile of the standard Normal distribution).
1.129. (a) \(z = 0.3853\) has cumulative proportion 0.65 (that is, 0.3853 is the 65th percentile of the standard Normal distribution). (b) If \(z = 0.1257\), then \(Z > z\) has proportion 0.45 (0.1257 is the 55th percentile).
[Figures: standard Normal curves showing the areas 0.22 and 0.40 (Exercise 1.128) and 0.65 and 0.45 (Exercise 1.129)]

1.130. 70 is two standard deviations below the mean (that is, it has standard score \(z = -2\)), so about 2.5% (half of the outer 5%) of adults would have WAIS scores below 70.
1.131. 130 is two standard deviations above the mean (that is, it has standard score \(z = 2\)), so about 2.5% of adults would score at least 130.
1.132. Tonya's score standardizes to \(z = \frac{1820 - 1509}{321} \approx 0.9688\), while Jermaine's score corresponds to \(z = \frac{29 - 21.5}{5.4} \approx 1.3889\). Jermaine's score is higher.
1.133. Jacob's score standardizes to \(z = \frac{16 - 21.5}{5.4} \approx -1.0185\), while Emily's score corresponds to \(z = \frac{1020 - 1509}{321} \approx -1.5234\). Jacob's score is higher.
1.134. Jose's score standardizes to \(z = \frac{2080 - 1509}{321} \approx 1.7788\), so an equivalent ACT score is \(21.5 + 1.7788 \times 5.4 \approx 31.1\). (Of course, ACT scores are reported as whole numbers, so this would presumably be a score of 31.)
1.135. Maria's score standardizes to \(z = \frac{30 - 21.5}{5.4} \approx 1.5741\), so an equivalent SAT score is \(1509 + 1.5741 \times 321 \approx 2014\).
1.136. Maria's score standardizes to \(z = \frac{2090 - 1509}{321} \approx 1.81\), for which Table A gives 0.9649. Her score is the 96.5 percentile.
1.137. Jacob's score standardizes to \(z = \frac{19 - 21.5}{5.4} \approx -0.4630\), for which Table A gives 0.3228. His score is the 32.3 percentile.
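A Python sketch of the SAT/ACT comparisons in Exercises 1.132 to 1.136, using `statistics.NormalDist` for the two Normal models. The helper names `z_score` and `equivalent_score` are illustrative, not from the text.

```python
from statistics import NormalDist

SAT = NormalDist(mu=1509, sigma=321)
ACT = NormalDist(mu=21.5, sigma=5.4)

def z_score(x, dist):
    """Standard score of x under a Normal model."""
    return (x - dist.mean) / dist.stdev

def equivalent_score(x, source, target):
    """Score on `target` with the same standard score as x has on `source`."""
    return target.mean + z_score(x, source) * target.stdev

print(round(z_score(1820, SAT), 4), round(z_score(29, ACT), 4))   # 0.9688 1.3889
print(round(equivalent_score(2080, SAT, ACT), 1))                   # 31.1
print(round(equivalent_score(30, ACT, SAT)))                        # 2014
print(round(100 * SAT.cdf(2090), 1))                                # 96.5
```

Comparing standard scores rather than raw scores is what makes results on the two tests comparable.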

1.138. 1920 and above: The top 10% corresponds to a standard score of \(z = 1.2816\), which in turn corresponds to a score of \(1509 + 1.2816 \times 321 \approx 1920\) on the SAT.
1.139. 1239 and below: The bottom 20% corresponds to a standard score of \(z = -0.8416\), which in turn corresponds to a score of \(1509 - 0.8416 \times 321 \approx 1239\) on the SAT.
1.140. The quartiles of a Normal distribution are \(\pm 0.6745\) standard deviations from the mean, so for ACT scores, they are \(21.5 \pm 0.6745 \times 5.4 \approx 17.9\) to 25.1.
1.141. The quintiles of the SAT score distribution are \(1509 - 0.8416 \times 321 \approx 1239\), \(1509 - 0.2533 \times 321 \approx 1428\), \(1509 + 0.2533 \times 321 \approx 1590\), and \(1509 + 0.8416 \times 321 \approx 1779\).
1.142. For a Normal distribution with mean 55 mg/dl and standard deviation 15.5 mg/dl: (a) 40 mg/dl standardizes to \(z = \frac{40 - 55}{15.5} \approx -0.9677\). Using Table A, 16.60% of women fall below this level (software: 16.66%). (b) 60 mg/dl standardizes to \(z = \frac{60 - 55}{15.5} \approx 0.3226\). Using Table A, 37.45% of women fall above this level. (c) Subtract the answers from (a) and (b) from 100%: Table A gives 45.95% (software: 45.99%), so about 46% of women fall in the intermediate range.
1.143. For a Normal distribution with mean 46 mg/dl and standard deviation 13.6 mg/dl: (a) 40 mg/dl standardizes to \(z = \frac{40 - 46}{13.6} \approx -0.4412\). Using Table A, 33% of men fall below this level (software: 32.95%). (b) 60 mg/dl standardizes to \(z = \frac{60 - 46}{13.6} \approx 1.0294\). Using Table A, 15.15% of men fall above this level. (c) Subtract the answers from (a) and (b) from 100%: Table A gives 51.85% (software: 51.88%), so about 52% of men fall in the intermediate range.
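The HDL breakdown in Exercises 1.142 and 1.143 is easy to reproduce with software; this Python sketch uses unrounded z-values, so it should match the "software" figures rather than Table A. The helper `hdl_breakdown` is hypothetical.

```python
from statistics import NormalDist

def hdl_breakdown(mean, sd, low=40, high=60):
    """Proportions below `low`, above `high`, and in between, for a Normal model."""
    model = NormalDist(mu=mean, sigma=sd)
    below = model.cdf(low)
    above = 1 - model.cdf(high)
    return below, above, 1 - below - above

women = hdl_breakdown(55, 15.5)   # Exercise 1.142
men = hdl_breakdown(46, 13.6)     # Exercise 1.143

for label, parts in (("women", women), ("men", men)):
    print(label, [round(100 * p, 2) for p in parts])
```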
1.144. (a) About 0.6% of healthy young adults have osteoporosis (the cumulative probability below a standard score of \(-2.5\) is 0.0062). (b) About 31% of this population of older women has osteoporosis: The BMD level which is 2.5 standard deviations below the young adult mean would standardize to \(-0.5\) for these older women, and the cumulative probability for this standard score is 0.3085.
1.145. (a) About 5.2%: \(x < 240\) corresponds to \(z < -1.625\). Table A gives 5.16% for \(-1.63\) and 5.26% for \(-1.62\). Software (or averaging the two table values) gives 5.21%. (b) About 54.7%: \(240 < x < 270\) corresponds to \(-1.625 < z < 0.25\). The area to the left of 0.25 is 0.5987; subtracting the answer from part (a) leaves about 54.7%. (c) About 279 days or longer: Searching Table A for 0.80 leads to \(z > 0.84\), which corresponds to \(x > 266 + 0.84(16) = 279.44\). (Using the software value \(z > 0.8416\) gives \(x > 279.47\).)


1.146. (a) The quartiles for a standard Normal distribution are \(\pm 0.6745\). (b) For a \(N(\mu, \sigma)\) distribution, \(Q_1 = \mu - 0.6745\sigma\) and \(Q_3 = \mu + 0.6745\sigma\). (c) For human pregnancies, \(Q_1 = 266 - 0.6745 \times 16 \approx 255.2\) and \(Q_3 = 266 + 0.6745 \times 16 \approx 276.8\) days.
1.147. (a) As the quartiles for a standard Normal distribution are \(\pm 0.6745\), we have IQR = 1.3490. (b) c = 1.3490: For a \(N(\mu, \sigma)\) distribution, the quartiles are \(Q_1 = \mu - 0.6745\sigma\) and \(Q_3 = \mu + 0.6745\sigma\), so \(\text{IQR} = 1.3490\sigma\).
1.148. In the previous two exercises, we found that for a \(N(\mu, \sigma)\) distribution, \(Q_1 = \mu - 0.6745\sigma\), \(Q_3 = \mu + 0.6745\sigma\), and \(\text{IQR} = 1.3490\sigma\). Therefore, \(1.5 \times \text{IQR} = 2.0235\sigma\), and the suspected outliers are below \(Q_1 - 1.5 \times \text{IQR} = \mu - 2.698\sigma\), and above \(Q_3 + 1.5 \times \text{IQR} = \mu + 2.698\sigma\). The percentage outside of this range is \(2 \times 0.0035 = 0.70\%\).
1.149. (a) The first and last deciles for a standard Normal distribution are \(\pm 1.2816\). (b) For a N(9.12, 0.15) distribution, the first and last deciles are \(\mu - 1.2816\sigma \approx 8.93\) and \(\mu + 1.2816\sigma \approx 9.31\) ounces.
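A Python sketch confirming the Normal constants used in Exercises 1.146 to 1.149, via `statistics.NormalDist`; variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()
q1, q3 = std.inv_cdf(0.25), std.inv_cdf(0.75)   # about -0.6745 and 0.6745
iqr = q3 - q1                                     # about 1.3490 (in units of sigma)
upper_fence = q3 + 1.5 * iqr                      # about 2.698
outside = 2 * std.cdf(-upper_fence)               # both tails, by symmetry

pregnancy = NormalDist(mu=266, sigma=16)          # Exercise 1.146(c)
product = NormalDist(mu=9.12, sigma=0.15)         # Exercise 1.149(b), in ounces

print(round(q3, 4), round(iqr, 4), round(upper_fence, 3))   # 0.6745 1.349 2.698
print(round(100 * outside, 2))                              # 0.7 (percent)
print(round(pregnancy.inv_cdf(0.25), 1), round(pregnancy.inv_cdf(0.75), 1))  # 255.2 276.8
print(round(product.inv_cdf(0.10), 2), round(product.inv_cdf(0.90), 2))      # 8.93 9.31
```

Because every Normal distribution is a rescaled standard Normal, these constants apply to any \(\mu\) and \(\sigma\).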
1.150. The shape of the quantile plot suggests that the data are right-skewed (as was observed in Exercises 1.36 and 1.74). This can be seen in the flat section in the lower left (these numbers were less spread out than they should be for Normal data) and the three apparent outliers (the United States, Canada, and Australia) that deviate from the line in the upper right; these were much larger than they would be for a Normal distribution.
1.151. (a) The plot is reasonably linear except for the point in the upper right, so this distribution is roughly Normal, but with a high outlier. (b) The plot is fairly linear, so the distribution is roughly Normal. (c) The plot curves up to the right (that is, the large values of this distribution are larger than they would be in a Normal distribution), so the distribution is skewed to the right.

1.152. See also the solution to Exercise 1.42. The plot suggests no major deviations from Normality, although the three lowest measurements do not quite fall in line with the other points.
[Figure: Normal quantile plot of density against Normal score]

1.153. (a) All three quantile plots are below; the yellow variety is the nearest to a straight line. (b) The other two distributions are slightly right-skewed (the lower-left portion of the graph is somewhat flat); additionally, the bihai variety appears to have a couple of high outliers.
[Figure: Normal quantile plots of flower length (mm) against Normal score for H. bihai, H. caribaea red, and H. caribaea yellow]

1.154. Shown are a histogram and quantile plot for one sample of 200 simulated N(0, 1) points. Histograms will vary slightly but should suggest a bell curve. The Normal quantile plot shows something fairly close to a line but illustrates that, even for actual Normal data, the tails may deviate slightly from a line.
[Figure: histogram (frequency) of the simulated values, and Normal quantile plot of the simulated values against Normal score]

1.155. Shown are a histogram and quantile plot for one sample of 200 simulated uniform data points. Histograms will vary slightly but should suggest the density curve of Figure 1.34 (but with more variation than students might expect). The Normal quantile plot shows that, compared to a Normal distribution, the uniform distribution does not extend as low or as high (not surprising, since all observations are between 0 and 1).
[Figure: histogram (frequency) of the simulated uniform values, and Normal quantile plot of the simulated values against Normal score]


1.156. Shown is a back-to-back stemplot; the distributions could also be compared with histograms or boxplots. Either mean/standard deviation or the five-number summary could be used; both are given below. Both the graphical and numerical descriptions reveal that hatchbacks generally have higher fuel efficiency (and also are more variable).

[Figure: back-to-back stemplot of fuel efficiency for hatchbacks and large sedans, stems 13 to 30]

              x̄        s       Min   Q1   M      Q3   Max
Hatchback     22.548   3.423   16    20   21.5   25   30
Large sedan   16.571   1.425   13    16   17.0   17   19

1.157. (a) The distribution appears to be roughly Normal. (b) One could justify using either the mean and standard deviation or the five-number summary:

x̄        s        Min    Q1    M       Q3      Max
15.27%   3.118%   8.2%   13%   15.5%   17.6%   22.8%

(c) For example, binge drinking rates are typically 10% to 20%. Which states are high, and which are low? One might also note the geographical distribution of states with high binge-drinking rates: The top six states (Wisconsin, North Dakota, Iowa, Minnesota, Illinois, and Nebraska) are all adjacent to one another.

 8 | 28
 9 |
10 | 58
11 | 34
12 | 023689
13 | 015788
14 | 0077
15 | 13466889
16 | 01567
17 | 45677789
18 | 8
19 | 148
20 | 2
21 | 6
22 | 8

1.158. (a) The stemplot suggests that there are two groups of states: the under-23% and over-23% groups. Additionally, while they do not qualify as outliers, Oklahoma (16.3%) and Vermont (30%) stand out as notably low and high. (b) One could justify using either the mean and standard deviation or the five-number summary:

x̄        s        Min     Q1      M       Q3      Max
23.71%   3.517%   16.3%   20.8%   24.3%   26.4%   30%

Neither summary reveals the two groups of states visible in the stemplot. (c) One could explore the connections (geographical, socioeconomic, etc.) between the states in the two groups; for example, the top group includes many northeastern states, while the bottom group includes quite a few southern states.

[Figure: stemplot of the Exercise 1.158 percentages, stems 16 to 30]
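The five-number summary for Exercise 1.157 can be recomputed from its stemplot. This Python sketch assumes that stems are whole percents and leaves are tenths (consistent with Min = 8.2% and Max = 22.8%), stores values in tenths so the arithmetic is exact, and uses the textbook convention that the quartiles are the medians of the lower and upper halves.

```python
# Binge-drinking rates from the Exercise 1.157 stemplot, in tenths of a percent.
stemplot = {
    8: "28", 9: "", 10: "58", 11: "34", 12: "023689", 13: "015788",
    14: "0077", 15: "13466889", 16: "01567", 17: "45677789", 18: "8",
    19: "148", 20: "2", 21: "6", 22: "8",
}
tenths = sorted(10 * stem + int(leaf) for stem, leaves in stemplot.items() for leaf in leaves)

def median(v):
    """Median of an already-sorted list."""
    mid = len(v) // 2
    return v[mid] if len(v) % 2 else (v[mid - 1] + v[mid]) / 2

def five_number_summary(values):
    """Min, Q1, M, Q3, Max, with quartiles as medians of the lower and upper halves."""
    v = sorted(values)
    n = len(v)
    lower, upper = v[: n // 2], v[(n + 1) // 2 :]
    return v[0], median(lower), median(v), median(upper), v[-1]

summary = [x / 10 for x in five_number_summary(tenths)]
print(len(tenths), summary, sum(tenths) / len(tenths) / 10)
```

With 50 values, \(Q_1\) is the 13th smallest and \(Q_3\) the 38th, which reproduces 13%, 15.5%, and 17.6% from the table above.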

1.159. Students might compare color preferences using a stacked bar graph or side-by-side bars. (They could also make six pie charts, but comparing slices across pies is difficult.) Possible observations: white is considerably less popular in Europe, and gray is less common in China.
[Figures: stacked bar graph and side-by-side bar graphs of car color preferences (percent) for North America, South America, Europe, China, South Korea, and Japan; colors: silver, white, gray, black, blue, red, brown, other]
Note: The order of countries and colors is as given in the text, which is more-or-less arbitrary. (Colors are ordered by decreasing popularity in North America.)

1.162. Using either a histogram or stemplot, we see that this distribution is sharply right-skewed. For this reason, the five-number summary is preferred:

Min   Q1   M      Q3   Max
0     3    12.5   34   86

Some students might report the less appropriate \(\bar{x} \approx 21.62\) and \(s \approx 22.76\). From the histogram and five-number summary, we can observe, for example, that many countries have fewer than 10 Internet users per 100 people. In 75% of countries, no more than about 1/3 of the population uses the Internet.
[Figure: histogram (frequency) of Internet users per hundred people]


1.163. The distribution is somewhat right-skewed (although considerably less than the distribution with all countries) with only one country (Bosnia and Herzegovina) in the 20s. Because of the irregular shape, students might choose either the mean/standard deviation or the five-number summary:

x̄       s       Min    Q1      M        Q3      Max
39.85   22.05   1.32   18.68   43.185   54.94   85.65

0 | 145789
1 | 23488889
2 | 5
3 | 0134467
4 | 124666669
5 | 022345688
6 | 223
7 | 026
8 | 15

1.164. (a) & (b) The graphs are below. Bars are shown in alphabetical order by city name (as the data were given in the table). (c) For Baltimore, for example, this rate is \(5091/651 \approx 7.82\). The complete table is shown below. (d) & (e) Graphs below. Note that the text does not specify whether the bars should be ordered by increasing or decreasing rate. (f) Preferences may vary, but the ordered bars make comparisons easier.

City               Acres of open space per 1000 people
Baltimore           7.82
Boston              8.26
Chicago             4.02
Long Beach          6.25
Los Angeles         8.07
Miami               3.67
Minneapolis        14.87
New York            6.23
Oakland             9.30
Philadelphia        7.04
San Francisco       7.61
Washington, D.C.   13.12

[Figures: bar graphs of open space (acres), population (thousands), and acres of open space per 1000 people for the twelve cities, with bars in alphabetical order and ordered by rate]
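A Python sketch of parts (c) to (e) for Exercise 1.164: it recomputes Baltimore's rate and orders the cities by decreasing rate, which is the order a sorted bar graph would use. The rates are those in the table above.

```python
# Acres of open space per 1000 people, from the Exercise 1.164 table.
rates = {
    "Baltimore": 7.82, "Boston": 8.26, "Chicago": 4.02, "Long Beach": 6.25,
    "Los Angeles": 8.07, "Miami": 3.67, "Minneapolis": 14.87, "New York": 6.23,
    "Oakland": 9.30, "Philadelphia": 7.04, "San Francisco": 7.61,
    "Washington, D.C.": 13.12,
}

# Part (c): open space divided by population in thousands.
print(round(5091 / 651, 2))   # Baltimore: 7.82

# Parts (d) and (e): order the bars by decreasing rate.
for city, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{city:<18}{rate:6.2f}")
```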

1.165. The given description is true on the average, but the curves (and a few calculations)
give a more complete picture. For example, a score of about 675 is about the 97.5th
percentile for both genders, so the top boys and girls have very similar scores.
1.166. (a) & (b) Answers will vary. Definitions might be as simple as "free time," or "time spent doing something other than studying." For part (b), it might be good to encourage students to discuss practical difficulties; for example, if we ask Sally to keep a log of her activities, the time she spends filling it out presumably reduces her available leisure time.
1.167. Shown is a stemplot; a histogram should look similar to this. This distribution is
relatively symmetric apart from one high outlier. Because of the outlier, the five-number
summary (in hours) is preferred:
22  23.735  24.31  24.845  28.55
Alternatively, the mean and standard deviation are $\bar{x} = 24.339$ and $s = 0.9239$
hours.

22 | 013
22 | 7899
23 | 000011222233344444
23 | 55566666667777778888888999
24 | 00000011111112222222223333333333444444
24 | 555555666666666777777888888999999
25 | 00001111233344
25 | 56666889
26 | 2
26 | 56
27 | 2
27 |
28 |
28 | 5
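One way to see why the five-number summary is preferred here is to measure how far each extreme lies from the nearer quartile, in units of the IQR. The following is a minimal Python sketch that uses only the summary reported above; the function name `iqrs_beyond_quartiles` is hypothetical.

```python
def iqrs_beyond_quartiles(five_num):
    """Given (min, Q1, M, Q3, max), return how many IQRs the minimum lies
    below Q1 and how many IQRs the maximum lies above Q3."""
    lo, q1, _median, q3, hi = five_num
    iqr = q3 - q1
    return (q1 - lo) / iqr, (hi - q3) / iqr


summary = (22, 23.735, 24.31, 24.845, 28.55)  # hours, from the solution
below, above = iqrs_beyond_quartiles(summary)
print(f"min is {below:.2f} IQRs below Q1; max is {above:.2f} IQRs above Q3")
```

This prints `min is 1.56 IQRs below Q1; max is 3.34 IQRs above Q3`. The maximum sits more than twice as far out as the minimum, which is consistent with the single high outlier in the stemplot. A value like this moves $\bar{x}$ and $s$ much more than it moves the median and quartiles.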

1.168. Gender and automobile preference are categorical; age and household income are
quantitative.
1.169. Either a bar graph or a pie chart could be used. The given numbers sum to 66.7, so
the Other category presumably includes the remaining 29.3 million subscribers.

[Figure: bar graph of subscribers (millions) for Comcast, AT&T, Road Runner, Verizon, America Online, EarthLink, Charter, Qwest, Cablevision, United Online, and Other.]

1.170. Women's weights are skewed to the right: This makes the mean higher than the median,
and it is also revealed in the differences $M - Q_1 = 14.9$ lb and $Q_3 - M = 24.1$ lb.
1.171. (a) For car makes (a categorical variable), use either a bar graph or pie chart. For
car age (a quantitative variable), use a histogram, stemplot, or boxplot. (b) Study time is
quantitative, so use a histogram, stemplot, or boxplot. To show change over time, use a time
plot (average hours studied against time). (c) Use a bar graph or pie chart to show radio
station preferences. (d) Use a Normal quantile plot to see whether the measurements follow
a Normal distribution.


1.172. The counts given add to 6067, so the others received 626 spam messages. Either a bar
graph or a pie chart would be appropriate. What students learn from this graph will vary;
one observation might be that AA and BB (and perhaps some others) could use some advice
on how to reduce the amount of spam they receive.

[Figure: bar graph of spam count by account ID (AA through LL, and other).]

1.173. No, and no: It is easy to imagine examples of many different data sets with mean 0 and
standard deviation 1; for example, $\{-1, 0, 1\}$ and $\{-2, 0, 0, 0, 0, 0, 0, 0, 2\}$.
Likewise, for any given five numbers $a \le b \le c \le d \le e$ (not all the same), we can
create many data sets with that five-number summary, simply by taking those five numbers
and adding some additional numbers in between them, for example (in increasing order):
10, ___, 20, ___, ___, 30, ___, ___, 40, ___, 50. As long as the number in the first blank is
between 10 and 20, and so on, the five-number summary will be 10, 20, 30, 40, 50.
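Both claims can be checked numerically. The following is a minimal Python sketch; the helper `five_number_summary` is hypothetical and computes the quartiles as the medians of the observations below and above the position of the overall median. The values used to fill the blanks are arbitrary choices within the allowed ranges, not values from the text.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max). Q1 and Q3 are the medians of the
    observations below and above the position of the overall median."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                # median excluded when n is odd
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


# Two different data sets, each with mean 0 and standard deviation 1.
for ds in ([-1, 0, 1], [-2, 0, 0, 0, 0, 0, 0, 0, 2]):
    print(statistics.mean(ds), statistics.stdev(ds))

# 10, _, 20, _, _, 30, _, _, 40, _, 50 with arbitrary fillers in each range.
filled = [10, 15, 20, 25, 28, 30, 32, 35, 40, 45, 50]
print(five_number_summary(filled))
```

Both data sets report mean 0 and standard deviation 1.0, and the filled-in list returns `(10, 20, 30, 40, 50)`; any other choice of fillers in the same ranges gives the same summary.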
1.174. The time plot is shown below; because of the great detail in this plot, it is larger than
other plots. Ruth's and McGwire's league-leading years are marked with different symbols.
(a) During World War II (when many baseball players joined the military), the best home
run numbers decline sharply and steadily. (b) Ruth seemed to set a new standard for other
players; after his first league-leading year, he had 10 seasons much higher than anything that
had come before, and home run production has remained near that same level ever since
(even the worst post-Ruth year, 1945, had more home runs than the best pre-Ruth season).
While some might argue that McGwire's numbers also raised the standard, the change is
not nearly as striking, nor did McGwire maintain it for as long as Ruth did. (This is not
necessarily a criticism of McGwire; it instead reflects that in baseball, as in many other
endeavors, rates of improvement tend to decrease over time as we reach the limits of human
ability.)

[Figure: time plot of league-leading home runs in a season against year, 1880 to 2000, with Ruth's and McGwire's league-leading years marked with different symbols.]
1.175. When the outlier (73 home runs) is omitted, Bonds's mean changes from 36.56 to
34.41 home runs (a drop of 2.15), while his median changes from 35.5 to 34 home runs
(a drop of 1.5). This illustrates that outliers affect the mean more than the median.

1 | 69
2 | 4
2 | 55
3 | 3344
3 | 77
4 | 02
4 | 5669
5 |
5 |
6 |
6 |
7 | 3
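The four statistics can be recomputed from the stemplot. The following is a minimal Python sketch; the list is read off the stemplot (stems are tens, leaves are units, including the 73 at the top), and dropping that single value reproduces the figures quoted in the solution.

```python
from statistics import mean, median

# Season home run totals read from the stemplot (split stems).
home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37,
             40, 42, 45, 46, 46, 49, 73]
without_outlier = [hr for hr in home_runs if hr != 73]

for label, data in (("all seasons", home_runs), ("without 73", without_outlier)):
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

Running it gives means of 36.56 and 34.41 and medians of 35.5 and 34. Removing one value shifts the mean by 2.15 but the median by only 1.5.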

1.176. Recall the text's description of the effects of a linear transformation $x_{\text{new}} = a + bx$:
The mean and standard deviation are each multiplied by $b$ (technically, the standard deviation
is multiplied by $|b|$, but this problem specifies that $b > 0$). Additionally, we add $a$ to the
(new) mean, but $a$ does not affect the standard deviation. (a) The desired transformation
is $x_{\text{new}} = -40 + 2x$; that is, $a = -40$ and $b = 2$. (We need $b = 2$ to double the standard
deviation; as this also doubles the mean, we then subtract 40 to make the new mean 100.)
(b) $x_{\text{new}} = -45.4545 + 1.8182x$; that is, $a = -45\tfrac{5}{11} \approx -45.4545$ and
$b = \tfrac{20}{11} \approx 1.8182$. (This choice of $b$ makes the new standard deviation 20 and the
new mean $145\tfrac{5}{11}$; we then subtract 45.4545 to make the new mean 100.) (c) David's score,
$2 \times 72 - 40 = 104$, is higher within his class than Nancy's score,
$1.8182 \times 78 - 45.4545 \approx 96.4$, is within her class. (d) A third-grade score of 75
corresponds to a score of 110 from the $N(100, 20)$ distribution, which has a standard score of
$z = \frac{110 - 100}{20} = 0.5$. (Alternatively, $z = \frac{75 - 70}{10} = 0.5$.) A sixth-grade score
of 75 corresponds to about 90.9 on the transformed scale, which has standard score
$z = \frac{90.9 - 100}{20} \approx -0.45$ (alternatively, $z = \frac{75 - 80}{11} \approx -0.45$).
Therefore, about 69% of third graders and 32% of sixth graders score below 75.
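The coefficients and proportions can be reproduced in a few lines. The following is a minimal Python sketch using the standard library's `statistics.NormalDist`; the helper name `rescale_coefficients` is hypothetical. The grade means and standard deviations (70 and 10, 80 and 11) are the ones used in the $z$ computations of part (d), and, as in part (d), the transformed scores are treated as $N(100, 20)$.

```python
from statistics import NormalDist


def rescale_coefficients(mean, sd, new_mean=100, new_sd=20):
    """Return (a, b) so that x_new = a + b*x has the requested mean and sd (b > 0)."""
    b = new_sd / sd          # multiplying by b scales the standard deviation
    a = new_mean - b * mean  # then shift so the new mean lands on new_mean
    return a, b


a3, b3 = rescale_coefficients(70, 10)   # third grade: mean 70, sd 10
a6, b6 = rescale_coefficients(80, 11)   # sixth grade: mean 80, sd 11
print(f"third grade: a = {a3:.4f}, b = {b3:.4f}")
print(f"sixth grade: a = {a6:.4f}, b = {b6:.4f}")

# (c) transformed scores: David (third grade, 72) and Nancy (sixth grade, 78)
print(f"David: {a3 + b3 * 72:.1f}, Nancy: {a6 + b6 * 78:.1f}")

# (d) proportion scoring below 75 in each grade, on the N(100, 20) scale
new_scale = NormalDist(100, 20)
print(f"third grade below 75: {new_scale.cdf(a3 + b3 * 75):.2f}")
print(f"sixth grade below 75: {new_scale.cdf(a6 + b6 * 75):.2f}")
```

This gives $a = -45.4545$ and $b = 1.8182$ for the sixth grade, transformed scores of 104.0 and 96.4, and proportions of about 0.69 and 0.32, in line with parts (b) through (d).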

1.177. Results will vary. One set of 20 samples gave the results shown below (Normal quantile
plots are not shown). Theoretically, $\bar{x}$ will have a Normal distribution with mean 25 and
standard deviation $8/\sqrt{30} \approx 1.46$, so that about 99.7% of the time, one should find
$\bar{x}$ between 20.6 and 29.4. Meanwhile, the theoretical distribution of $s$ is nearly Normal
(slightly skewed) with mean $\approx 7.9313$ and standard deviation $\approx 1.0458$; about
99.7% of the time, $s$ will be between 4.8 and 11.1.

Means
22 | 568
23 |
23 | 89
24 | 02
24 | 89
25 | 3
25 | 6799
26 | 124
26 | 59
27 | 4

Standard deviations
 5 | 6
 6 |
 6 | 66899
 7 | 3
 7 |
 8 | 113
 8 | 789
 9 | 000
 9 | 556
10 | 2
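A simulation like the one described can be run with the standard library. The following is a minimal Python sketch, assuming (as the theoretical values above imply) 20 samples of size 30 from a Normal distribution with mean 25 and standard deviation 8. The seed is an arbitrary choice, and a different seed gives different results, as the solution notes.

```python
import random
import statistics

random.seed(1)  # arbitrary seed; change it to get a different set of samples

# 20 samples of size 30 from a Normal distribution with mean 25 and sd 8
samples = [[random.gauss(25, 8) for _ in range(30)] for _ in range(20)]
means = [statistics.mean(s) for s in samples]
sds = [statistics.stdev(s) for s in samples]

print("means:", sorted(round(m, 1) for m in means))
print("sds:  ", sorted(round(s, 1) for s in sds))

# Compare with the 99.7% ranges quoted in the solution.
print(sum(20.6 <= m <= 29.4 for m in means), "of 20 means in [20.6, 29.4]")
print(sum(4.8 <= s <= 11.1 for s in sds), "of 20 sds in [4.8, 11.1]")
```

The sorted, rounded values can be typed directly into stemplots like the ones shown, or passed to a Normal quantile plot.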
Note: If we take a sample of size $n$ from a Normal distribution and compute the sample
standard deviation $S$, then $(S/\sigma)\sqrt{n-1}$ has a chi distribution with $n - 1$ degrees of
freedom (which looks like a Normal distribution when $n$ is reasonably large). You can learn
all you would want to know (and more) about this distribution on the Web (for example, at
Wikipedia). One implication of this is that on the average, $s$ underestimates $\sigma$;
specifically, the mean of $S$ is
$\sigma \left( \sqrt{\frac{2}{n-1}} \, \frac{\Gamma(n/2)}{\Gamma(n/2 - 1/2)} \right)$.
The factor in parentheses is always less than 1, but approaches 1 as $n$ approaches infinity.
The proof of this fact is left as an exercise (for the instructor, not for the average student!).
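The theoretical values quoted in 1.177 follow from this formula. The following is a minimal Python sketch using `math.lgamma`, which avoids overflowing $\Gamma$ for large $n$; the helper name `mean_factor` is hypothetical. The standard deviation of $S$ uses $E[S^2] = \sigma^2$, so $\operatorname{Var}(S) = \sigma^2 - (E[S])^2$.

```python
import math


def mean_factor(n):
    """E[S] / sigma for a Normal sample of size n:
    sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)


n, sigma, mu = 30, 8, 25
c = mean_factor(n)
mean_s = sigma * c
sd_s = sigma * math.sqrt(1 - c**2)   # Var(S) = sigma^2 - E[S]^2
sd_xbar = sigma / math.sqrt(n)

print(f"xbar: sd = {sd_xbar:.2f}, "
      f"99.7% range = ({mu - 3 * sd_xbar:.1f}, {mu + 3 * sd_xbar:.1f})")
print(f"s: mean = {mean_s:.4f}, sd = {sd_s:.4f}, "
      f"99.7% range = ({mean_s - 3 * sd_s:.1f}, {mean_s + 3 * sd_s:.1f})")

# The factor is below 1 and creeps toward 1 as n grows.
for k in (5, 30, 1000):
    print(k, round(mean_factor(k), 4))
```

For $n = 30$ this reproduces the standard deviation 1.46 for $\bar{x}$ with range 20.6 to 29.4, and a mean of about 7.9313 and standard deviation of about 1.0458 for $s$, with range 4.8 to 11.1.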