Chapter 4
4.1 (a) Yes, the scatterplot below (left) shows a linear relationship between the cube root of
weight, \(\sqrt[3]{\text{weight}}\), and length.
[Figure: scatterplot of the cube root of weight versus length (cm) (left); residual plot versus length (cm) (right).]
(b) Let \(x\) = length and \(y = \sqrt[3]{\text{weight}}\). The least-squares regression line is \(\hat{y} = -0.0220 + 0.2466x\).
The intercept of \(-0.0220\) clearly has no practical interpretation in this situation, since weight and
the cube root of weight must be positive. The slope 0.2466 indicates that for every 1 cm increase
in length, the cube root of weight will increase, on average, by 0.2466. (c)
\(\sqrt[3]{\text{weight}} = -0.0220 + 0.2466(36) \approx 8.8556\), so the predicted weight is \(8.8556^3 \approx 694.5\) g. The
predicted weight with this model is slightly higher than the predicted weight of 689.9 g with the
model in Example 4.2. (d) The residual plot above (right) shows the residuals are negative for
lengths below 17 cm, positive for lengths between 18 cm and 27 cm, and have no clear pattern
for lengths above 28 cm. (e) Nearly all (99.88%) of the variation in the cube root of the weight
can be explained by the linear relationship with the length.
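The back-transformation in part (c) can be sketched in a few lines of Python. This is only an illustration: it assumes the intercept is negative (the sign that reproduces the quoted value 8.8556), and the function name is hypothetical.

```python
# Coefficients as reported in part (b); x is length in cm.
# Assumption: the intercept is negative, which reproduces 8.8556 in part (c).
INTERCEPT = -0.0220
SLOPE = 0.2466

def predict_weight(length_cm: float) -> float:
    """Predict weight (g) by cubing the fitted cube-root value."""
    cube_root = INTERCEPT + SLOPE * length_cm
    return cube_root ** 3

print(round(predict_weight(36), 1))
```

Cubing the fitted value undoes the cube-root transformation, so the prediction is back on the original gram scale.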
4.2 (a) The scatterplot below (left) shows positive association between length and period with
one very unusual point (106.5, 2.115) in the top right corner.
[Figure: scatterplot of period (s) versus length (cm) (left); residual plot versus length (cm) (right).]
(b) The residual plot above (right) shows that the residuals tend to be small or negative for small
lengths and then get larger for lengths between 40 and 50 cm. The residual for the one very large
length is negative again. Even though the value of \(r^2\) is 0.983, the residual plot suggests that a
model with some curvature (or a linear model after a transformation) might be better. (c) The
information from the physics student suggests that there should be a linear relationship between
period and \(\sqrt{\text{length}}\). (d) A scatterplot (left) and residual plot (right) are shown below for the
transformed data. The least-squares regression line for the transformed data is
\(\hat{y} = -0.0858 + 0.210\sqrt{\text{length}}\). The value of \(r^2\) is slightly higher, 0.986 versus 0.983, and the
residual plot looks better, although the residuals for the three smallest lengths are positive and
the residuals for the next six lengths are negative.
[Figure: scatterplot of period versus square root of length (left); residual plot versus square root of length (right).]
(e) According to the theoretical relationship, the slope in the model for (d) should be
\(2\pi/\sqrt{980} \approx 0.2007\). The estimated model appears to agree with the theoretical relationship because
the estimated slope is 0.210, an absolute difference of about 0.0093. (f) The predicted period of
an 80-centimeter pendulum is \(\hat{y} = -0.0858 + 0.210\sqrt{80} \approx 1.7925\) seconds.
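The arithmetic in parts (e) and (f) can be checked with a short Python sketch. It assumes that 980 is the gravitational acceleration in cm/s^2 and that the fitted intercept is negative; the variable names are illustrative only.

```python
import math

G_CM_PER_S2 = 980  # assumed to be g in cm/s^2, as the theoretical slope suggests

# Theoretical slope of period on sqrt(length): 2*pi / sqrt(g).
theoretical_slope = 2 * math.pi / math.sqrt(G_CM_PER_S2)

# Fitted model from part (d); the negative intercept is an assumption
# that reproduces the 1.7925 s prediction quoted in part (f).
estimated_slope = 0.210
predicted_period = -0.0858 + estimated_slope * math.sqrt(80)

print(f"theoretical slope: {theoretical_slope:.4f}")
print(f"slope difference:  {estimated_slope - theoretical_slope:.4f}")
print(f"predicted period:  {predicted_period:.4f} s")
```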
4.3 (a) A scatterplot is shown below (left). The relationship is strong, negative and slightly
nonlinear (or curved), with no outliers.
[Figure: scatterplot of pressure (atmospheres) versus volume (cubic cm) (left); scatterplot of pressure (atmospheres) versus 1/volume (right).]
(b) Yes, the scatterplot for the transformed data (above on the right) shows a clear linear
relationship. (c) The least-squares regression equation is \(\hat{P} = 0.3677 + 15.8994(1/V)\). The
square of the correlation coefficient, \(r^2 = 0.9958\), indicates almost a perfect fit. The residual plot
(below) shows a definite pattern, which should be of some concern, but the model still provides a
good fit.
[Figure: residual plot versus 1/volume.]
(d) Letting \(y = 1/P\), the least-squares regression line is \(\hat{y} = 0.1002 + 0.0398V\). The scatterplot
(below on the left), the value of \(r^2 = 0.9997\), and the residual plot (below on the right) indicate
that the linear model provides an excellent fit for the transformed data. This transformation also
achieves linearity because \(V = k/P\).
[Figure: scatterplot of 1/pressure versus volume (cubic cm) (left); residual plot versus volume (cubic cm) (right).]
(e) When the gas volume is 15 cm\(^3\) the model in part (c) predicts the pressure to be
\(\hat{P} = 0.3677 + 15.8994(1/15) \approx 1.4277\) atmospheres, and the model in part (d) predicts the
reciprocal of pressure to be \(0.1002 + 0.0398(15) = 0.6972\) or \(\hat{P} = 1/0.6972 \approx 1.4343\)
atmospheres. The predictions are the same to the nearest one-hundredth of an atmosphere.
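The two predictions in part (e) can be reproduced with a small Python sketch. The coefficients are those quoted in parts (c) and (d); the function names are hypothetical.

```python
def pressure_from_reciprocal_volume(volume: float) -> float:
    """Model from part (c): pressure regressed on 1/volume."""
    return 0.3677 + 15.8994 * (1 / volume)

def pressure_from_reciprocal_pressure(volume: float) -> float:
    """Model from part (d): 1/pressure regressed on volume, then inverted."""
    return 1 / (0.1002 + 0.0398 * volume)

p_c = pressure_from_reciprocal_volume(15)
p_d = pressure_from_reciprocal_pressure(15)
print(f"part (c): {p_c:.4f} atm, part (d): {p_d:.4f} atm")
```

The second model predicts 1/P, so its fitted value has to be inverted before the two predictions can be compared.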
4.4 (a) The scatterplot below (left) shows that the relationship between \(\text{period}^2\) and length is
roughly linear.
[Figure: scatterplot of period squared versus length (cm) (left); residual plot versus length (cm) (right).]
(b) The least-squares regression line for the transformed data \(y = \text{period}^2\) and \(x\) = length is
\(\hat{y} = -0.1547 + 0.0428x\). The value of \(r^2 = 0.992\) and the residual plot above (right) indicate that
the linear model provides a good fit for the transformed data. As we noticed in Exercise 4.2 part
(d), the residual plot looks better, but there is still a pattern with the residuals for the three
smallest lengths being positive and the residuals for the next six lengths being negative. (c)
According to the theoretical relationship, the slope in the model should be \(4\pi^2/980 \approx 0.0403\). The
estimated model appears to agree with the theoretical relationship because the estimated slope is
0.0428, an absolute difference of about 0.0025. (d) The predicted squared period of an 80-centimeter
pendulum is \(\hat{y} = -0.1547 + 0.0428(80) \approx 3.2693\), or a period of 1.8081 seconds. The two models
provide very similar predicted values, with an absolute difference of only 0.0156.
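The comparison between the two pendulum models can be sketched as follows. The negative intercepts are assumptions that reproduce the predictions quoted in Exercises 4.2 and 4.4, and the variable names are illustrative.

```python
import math

LENGTH_CM = 80

# Exercise 4.2(d): period regressed on sqrt(length).
period_sqrt_model = -0.0858 + 0.210 * math.sqrt(LENGTH_CM)

# Exercise 4.4(b): period^2 regressed on length; the square root
# of the fitted value gives the period.
period_squared = -0.1547 + 0.0428 * LENGTH_CM
period_squared_model = math.sqrt(period_squared)

difference = abs(period_squared_model - period_sqrt_model)
print(f"{period_sqrt_model:.4f} s vs {period_squared_model:.4f} s (difference {difference:.4f})")
```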
4.5 (a) A scatterplot is shown below (left). The relationship is strong, negative and nonlinear (or
curved).
[Figure: scatterplot of light intensity versus depth (meters) (left); scatterplot of ln(light intensity) versus depth (meters) (right).]
[Figure: residual plot versus depth (meters) (left); scatterplot of light intensity (lumens) versus depth (meters) (right).]
(g) At 22 m, the predicted light intensity is \(\hat{y} = 888.1139e^{-0.333(22)} \approx 0.5846\) lumens. No, the
absolute difference between the observed light intensity 0.58 and the predicted light intensity
0.5846 is very small (0.0046 lumens) because the model provides an excellent fit.
4.6 (a) A scatterplot is shown below (left).
[Figure: scatterplot of acres versus year (left); scatterplot of log(acres) versus year (right).]
(b) The ratios are 226,260/63,042 = 3.5890, 907,075/226,260 = 4.0090, and 2,826,095/907,075 =
3.1156. (c) The transformed values of y are 4.7996, 5.3546, 5.9576, and 6.4512. A scatterplot of
the logarithms against year is shown above (right). (d) Minitab output is shown below.
The regression equation is
log(Acres) = - 1095 + 0.556 year

Predictor      Coef   SE Coef       T      P
Constant   -1094.51     29.26  -37.41  0.001
year        0.55577   0.01478   37.60  0.001

S = 0.0330502   R-Sq = 99.9%   R-Sq(adj) = 99.8%
(e) If \(x\) = year and \(y\) = acres, then the model after the inverse transformation is
\(\hat{y} = 10^{-1094.51} \cdot 10^{0.5558x}\). The coefficient of \(10^{0.5558x}\) is 0.0000 (rounded to 4 decimal places) so all of
the predicted values would be 0. (Note: If properties of exponents are not used to simplify the
right-hand-side, then some calculators will be able to do the calculations without having serious
overflow problems.) (f) The least-squares regression line of log(acres) on year, with \(x\) now measured in years since 1977, is
\(\log\hat{y} = 4.2513 + 0.5558x\). (g) The residual plot below shows no clear pattern, so the linear
regression model on the transformed data is appropriate.
[Figure: residual plot versus years since 1977 (left); scatterplot of acres with the exponential model superimposed (right).]
(h) If \(x\) = years since 1977 and \(y\) = acres, then the model after the inverse transformation is
\(\hat{y} = 10^{4.2513} \cdot 10^{0.5558x} \approx 17{,}836.1042 \cdot 10^{0.5558x}\). A scatterplot with the exponential model
superimposed is shown above (right). The exponential model provides an excellent fit. (i) The
predicted number of acres defoliated in 1982 (5 years since 1977) is
\(\hat{y} \approx 17{,}836.1042 \cdot 10^{0.5558(5)} = 10{,}722{,}597.42\) acres.
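The inverse transformation in parts (h) and (i) can be sketched in Python. This assumes, as part (i) indicates, that x counts years since 1977; the function name is hypothetical.

```python
def predicted_acres(years_since_1977: float) -> float:
    """Back-transform log10(acres) = 4.2513 + 0.5558 * x to the original scale."""
    log_acres = 4.2513 + 0.5558 * years_since_1977
    return 10 ** log_acres

# Predictions for 1978 through 1982 (x = 1, ..., 5).
for year in range(1978, 1983):
    print(year, round(predicted_acres(year - 1977)))
```

Working with the small x values avoids the underflow problem described in part (e), where the constant \(10^{-1094.51}\) rounds to zero.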
4.7 (a) If \(y\) = number of transistors and \(x\) = number of years since 1970, then \(y(1) = ab^1 = 2250\)
and \(y(4) = ab^4 = 9000\), so \(a = 2250/\sqrt[3]{4} \approx 1417.4112\) and \(b = (9000/1417.4112)^{0.25} \approx 1.5874\). This model
predicts the number of transistors in year \(x\) after 1970 to be \(\hat{y} = 1417.4112 \times 1.5874^x\). (b) Using
the natural logarithm transformation on both sides of the model in (a) produces the line
\(\ln\hat{y} = 7.2566 + 0.4621x\). (c) The slope for Moore's model (0.4621) is larger than the estimated
slope in Example 4.6 (0.332), so the actual transistor counts have grown more slowly than
Moore's law suggests.
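The constants in Exercise 4.7 can be recovered from the two conditions y(1) = 2250 and y(4) = 9000. The sketch below is one way to do it, with illustrative variable names: dividing the two equations eliminates a and leaves b.

```python
import math

# Two points implied by the model y = a * b**x (x = years since 1970).
x1, y1 = 1, 2250
x2, y2 = 4, 9000

b = (y2 / y1) ** (1 / (x2 - x1))  # yearly growth factor: b**3 = 9000/2250 = 4
a = y1 / b ** x1                  # model value at x = 0

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"ln(a) = {math.log(a):.4f}, ln(b) = {math.log(b):.4f}")
```

The two logarithms are the intercept and slope of the line in part (b).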
4.8 (a) According to the claim, the number of children killed doubled every year after 1950.
Year              1951  1952  1953  1954  1955  1956  1957  1958  1959  1960
Number of deaths     2     4     8    16    32    64   128   256   512  1024
(b) A scatterplot showing the exponential relationship is shown below (left).
[Figure: scatterplot of number of deaths versus year (left); scatterplot of log(number of deaths) versus year (right).]
(c) According to the paper, the number of children killed \(x\) years after 1950 is \(2^x\). Thus,
\(2^{45} = 3.5184 \times 10^{13}\), or approximately 35 trillion children were killed in 1995. This is clearly a
mistake. (d) A scatterplot of the logarithms against year (above on the right) shows a strong,
positive linear relationship. (e) The least-squares regression line for predicting the logarithm of
\(y\) = deaths from \(x\) = year is approximately \(\log\hat{y} = -587.0 + 0.301x\). Thus, the predicted value in
1995 is \(\log\hat{y} = -587.0 + 0.301(1995) \approx 13.495\). As a check, \(\log(2^{45}) \approx 13.5463\). The absolute
difference in these two predictions, 0.0513, is relatively small.
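The doubling claim is easy to check numerically. The following Python sketch (the function name is hypothetical) reproduces the table in part (a) and the 1995 figure in part (c).

```python
import math

def deaths_if_doubling(year: int, base_year: int = 1950) -> int:
    """Number of deaths in `year` if the count doubles every year after `base_year`."""
    return 2 ** (year - base_year)

# Table from part (a): 1951 through 1960.
print([deaths_if_doubling(year) for year in range(1951, 1961)])

# Part (c): the implied count for 1995 and its base-10 logarithm.
deaths_1995 = deaths_if_doubling(1995)
print(deaths_1995, round(math.log10(deaths_1995), 4))
```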
4.9 (a) A scatterplot is shown below.
[Figure: scatterplot of population versus time (since 1790) (left); scatterplot of log(population) versus time (since 1790) (right).]
(b) In the scatterplot above (right), the transformed data appear to be linear from 0 to 90 (or 1790
to about 1880), and then linear again, but with a smaller slope. The linear trend indicates that the
exponential model is still appropriate and the smaller slope reflects a slower growth rate. (c) The
least-squares regression line for predicting \(y\) = log(population) from \(x\) = time since 1790 is
\(\hat{y} = 1.329 + 0.0054x\). Transforming back to the original variables, the estimated population size
is \(21.3304 \times 1.0125^x\). A scatterplot with this regression line is shown below (left). (d) The
residual plot (below on the right) shows random scatter and \(r^2 = 0.995\), so the exponential model
provides an excellent fit.
[Figure: scatterplot of log(population) versus time (since 1790) with the regression line (left); residual plot versus time (since 1790) (right).]
(e) The predicted value of log(population) in 2010 is \(\hat{y} = 1.329 + 0.0054(220) \approx 2.517\), or about
\(10^{2.517} = 328.8516\) million people. The prediction is probably too low, because these estimates
usually do not include homeless people and illegal immigrants.
4.10 (a) A scatterplot of distance versus height is shown below (left).
[Figure: scatterplot of distance versus height (left); scatterplot of distance versus square root of height (right).]
(b) The curve tends to bow downward, which resembles a power curve \(x^p\) with \(p < 1\).
Since we want to pull in the right tail of the distribution, we should apply a transformation \(x^p\)
with \(p < 1\). (c) A scatterplot of distance against the square root of height (shown above, right)
straightens the graph quite nicely.
4.11 (a) Let x = Body weight in kg and y = Life span in years. Scatterplots of the original data
(left) and the transformed data (right), after taking the logarithms of both variables, are shown
below. The linear trend in the scatterplot for the transformed data suggests that the power model
is appropriate.
[Figure: scatterplot of life span versus weight (kg) (left); scatterplot of log(life span) versus log(weight) (right).]
(b) The least squares regression line for the transformed data is \(\log\hat{y} = 0.7617 + 0.2182\log x\).
The residual plot (below on the left) shows fairly random scatter about zero and \(r^2 = 0.7117\).
Thus, 71.17% of the variation in the log of the life spans is explained by the linear relationship
with the log of the body weight.
[Figure: residual plot versus log(weight) (left); scatterplot of life span versus transformed weight (right).]
(c) The inverse transformation gives the estimated power model \(\hat{y} = 10^{0.7617}x^{0.2182} \approx 5.7770x^{0.2182}\).
(d) This model predicts the average life span for humans to be
\(\hat{y} \approx 5.7770 \times 65^{0.2182} = 14.3642\) years, considerably shorter than the expected life span of humans.
(e) According to the biologists, the power model is \(y = ax^{0.2}\). The easiest and best option is to
plot a graph of \((\text{weight}^{0.2}, \text{lifespan})\) and then fit a least-squares regression line using the
transformed weight as the explanatory variable. The scatterplot (above on the right) shows that
this model provides a good fit for the data. The least-squares regression line is
\(\hat{y} = -2.70 + 7.95x^{0.2}\) with a predicted average life span of \(\hat{y} = -2.70 + 7.95 \times 65^{0.2} \approx 15.62\) years for
humans. Note: Students may try some other models, which are not as good. For example,
raising both sides of the equation to the fifth power, the model becomes \(y^5 = a^5x\), which is a
linear regression model with no intercept parameter (or an intercept of zero). After transforming
life span \(y\) to \(y^5\), the estimated model is \(\hat{y}^5 = 30{,}835x\). This model predicts the average life span
of humans to be \(\hat{y} = (30{,}835 \times 65)^{0.2}\)
transformed data is \(\hat{y}^5 = 1389463 + 30{,}068x\) with a predicted average life span of
\(\hat{y} = (1389463 + 30068 \times 65)^{0.2}\)
[Figure: scatterplot of log(cost) versus diameter (inches) (left); scatterplot of log(cost) versus log(diameter) (right).]
(b) Let \(y\) = the cost of the pizza and \(x\) = the diameter of the pizza. The least-squares regression
line is \(\log\hat{y} = -1.5118 + 2.1150\log x\). The inverse transformation gives the estimated power
model \(\hat{y} = 10^{-1.5118}x^{2.115} \approx 0.0308x^{2.115}\). (c) According to this model, the predicted costs of the
four different size pizzas are $4.01, $5.90, $8.18, and $13.91, from smallest to largest. There are
only slight differences between the predicted costs for the model and the actual costs, so an
adjustment does not appear to be necessary based on this model. (d) According to our estimated
power model in part (b), the predicted cost for the new soccer team pizza is
\(\hat{y} = 0.0308 \times 24^{2.115} \approx 25.57\) dollars. (e) An alternative model is based on setting the cost proportional
to the area, or the power model of the form \(\text{cost} \propto (\pi/4)x^2\). Most students will square the
diameter and then fit a linear model to obtain the least squares regression line
\(\hat{y} = 0.506 + 0.0445x^2\). The estimated price of the soccer team pizza is
[Figure: scatterplot of weight (pounds) versus height (inches).]
(c) Calculate the logarithms of the heights and the logarithms of the weights. The least-squares
regression line for the transformed data is \(\log\hat{y} = -1.3912 + 2.0029\log x\), with \(r^2 = 0.9999\); almost
all (99.99%) of the variation in log of weight is explained by the linear relationship with log of
height. (d) The residual plot below for the transformed data shows that the residuals are very
close to zero with no discernible pattern. This model clearly fits the transformed data very well.
[Figure: residual plot versus log(height).]
[Figure: scatterplot of heart weight versus cavity length (cm) (left); scatterplot of log(heart weight) versus log(cavity length) (right).]
Numerical Summaries: The correlation between log of cavity length and log of heart weight is
0.997, indicating a near perfect association. Model: The power model is \(\text{weight} = a \times \text{length}^b\).
After taking the logarithms of both variables, the least-squares regression line is
\(\log\hat{y} = 0.1364 + 3.1387\log x\). Approximately 99.3% of the variation in the log of heart weight
is explained by the linear relationship with log of cavity length. The residual plot below suggests
that there may be a little bit of curvature remaining, but nothing to get overly concerned about.
[Figure: residual plot versus log(cavity length).]
4.15 (a) The scatterplot below (left) shows that the relationship between \(y\) = distance and \(x\) =
time is strong, positive, and nonlinear (curved).
[Figure: scatterplot of distance (cm) versus time (seconds) (left); residual plot versus time squared (right).]
(b) The least-squares regression line for the transformed data is \(\hat{y} = 0.990 + 490.416x^2\). (c) The
residual plot above (right) shows random scatter and \(r^2 = 0.9984\), so 99.84% of the variability in
the distance fallen is explained with this linear model. (d) Yes, the scatterplot below (left) shows
that this transformation does a very good job creating a linear trend. The least-squares regression
line for the transformed data is \(\sqrt{\hat{y}} = 0.1046 + 22.0428x\).
[Figure: scatterplot of the transformed distance versus time (seconds) (left); residual plot versus time (seconds) (right).]
(e) The residual plot above (right) shows no obvious pattern and \(r^2 = 0.9986\). This is an excellent
model. (f) The predicted distance that an object had fallen after 0.47 seconds is 109.32 cm using
the model from (b) and 109.51 cm using the model from (d). There is very little difference in the
predicted values, but most students will probably pick the prediction from (d) because \(r^2\) is a
little higher and the residual plot shows less variability about the regression line.
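The two predictions in part (f) can be reproduced with the short Python sketch below. It assumes that the transformed response in part (d) is the square root of distance, which is the reading that gives the quoted 109.51 cm; the variable names are illustrative.

```python
TIME_S = 0.47

# Model from part (b): distance regressed on time squared.
distance_b = 0.990 + 490.416 * TIME_S ** 2

# Model from part (d), read here as sqrt(distance) regressed on time
# (an assumption); squaring the fitted value undoes the transformation.
distance_d = (0.1046 + 22.0428 * TIME_S) ** 2

print(f"model (b): {distance_b:.2f} cm, model (d): {distance_d:.2f} cm")
```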
4.16 (a) We are given the model \(\ln y = -2.00 + 2.42\ln x\). Using properties of logarithms, the
power model is \(e^{\ln y} = e^{-2.00+2.42\ln x}\) or \(y = e^{-2.00}x^{2.42}\). (b) The estimated biomass of a tree with a
diameter of 30 cm is \(\hat{y} = e^{-2.00} \times 30^{2.42} \approx 508.2115\) kg.
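The equivalence between the log-scale line and the power model can be illustrated in Python. The sketch assumes an intercept of -2.00, which is the sign that reproduces the 508.2115 kg estimate; both function names are hypothetical.

```python
import math

def biomass_power_model(diameter_cm: float) -> float:
    """Power model y = e**(-2.00) * x**2.42."""
    return math.exp(-2.00) * diameter_cm ** 2.42

def biomass_log_model(diameter_cm: float) -> float:
    """Same prediction computed on the log scale, then exponentiated."""
    return math.exp(-2.00 + 2.42 * math.log(diameter_cm))

print(round(biomass_power_model(30), 2), round(biomass_log_model(30), 2))
```

Both functions give the same value, which is the point of part (a): exponentiating the linear model in ln x yields the power model in x.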
4.17 Who? The individuals are carnivores. What? The response variable y is a measure of
abundance and the explanatory variable x is the size of the carnivore. Why? Ecologists were
interested in learning more about nature's patterns. When, where and how? The data were
collected before 2002 (the publication date) by relating the body mass of the carnivore to the
number of carnivores. Rather than simply counting the total number of observed carnivores, the
researchers created a measure of abundance based on a count relative to the size of prey in an
area. Graphs: A scatterplot of y = abundance versus x = body mass (on the left below) shows a
nonlinear relationship. Using the log transformation for both variables provides a moderately
strong, negative, linear relationship (see the scatterplot below on the right).
[Figure: scatterplot of abundance versus body mass (kg) (left); scatterplot of log(abundance) versus log(body mass) (right).]
Numerical Summaries: The correlation between log body mass and log abundance is \(-0.912\).
Model: The least-squares regression line for the transformed data is
\(\log\hat{y} = 1.9503 - 1.0481\log x\), with \(r^2 = 0.8325\) and a residual plot (below) showing no obvious
patterns.
[Figure: residual plot versus log(body mass).]
4.18 Let x = the breeding length, length at which 50% of females first reproduce and y = the
asymptotic body length. The scatterplot (left) and residual plot (right) below show that the linear
model does not provide a great fit for these body measurements of this fish species. Most of the
residuals are negative for breeding lengths below 30 cm and above 150 cm.
[Figure: scatterplot of asymptotic body length versus breeding length (cm) (left); residual plot versus breeding length (right).]
Applying the log transformation to both lengths produces better results. The scatterplot (left)
and residual plot (right) below show that a linear model provides a very good fit. The least
squares regression model for the transformed data is \(\log\hat{y} = 0.3011 + 0.9520\log x\), with
\(r^2 = 0.898\) and a residual plot with very little structure, although most of the residuals are still
negative when the explanatory variable is above 1.9.
[Figure: scatterplot of log(asymptotic body length) versus log(breeding length) (left); residual plot versus log(breeding length) (right).]
The inverse transformation gives the estimated power model \(\hat{y} = 10^{0.3011}x^{0.952} \approx 2.0003x^{0.952}\),
which provides a good fit for these data.
4.19 (a) Scatterplots of the original data (left) and the transformed data (right) are shown below.
[Figure: scatterplot of mean colony size versus time (hours) (left); scatterplot of log(mean colony size) versus time (hours) (right).]
(b) The first phase is from 0 to 6 hours when the mean colony size actually decreases. This
decrease is hard to see on the graph of the original data, but is more obvious on the graph of the
transformed data. In the second phase, from 6 to 24 hours, the mean colony size increases
exponentially. Both graphs show this phase clearly, but it is most noticeable from the linear
trend on the graph of the transformed data for this time period. At 36 hours, mean growth is in
the third phase where growth is still occurring, but at a lower rate than the previous phase. The
point in the top right corner of both graphs clearly shows the new phase because this point does
not fit the pattern for phase two. (c) Let \(y\) = mean colony size and \(x\) = time. The least-squares
regression line for the transformed data is \(\log\hat{y} = -0.5942 + 0.0851x\). Using the inverse
transformation, the predicted size of a colony 10 hours after inoculation is
\(\hat{y} = 10^{-0.5942} \cdot 10^{0.0851(10)} = 10^{0.2568} \approx 1.8063\).
4.20 The correlation for time (hours 6 to 24) and log(mean colony size) is \(r = 0.9915\). The
correlation for time (hours 6 to 24) and log(individual colony size) is \(r = 0.9846\). As expected, the
correlation for the individual colony size is smaller than the correlation for the mean colony size
because individual measurements have more variability. The scatterplots below show the
differences in the relationships for mean colony sizes (left) and individual colony sizes (right).
[Figure: scatterplots of log(colony size) versus time (hours) for mean colony sizes (left) and individual colony sizes (right).]
4.21 (a) \(\text{weight} = c_1(\text{height})^3\) and \(\text{strength} = c_2(\text{height})^2\), so \(\text{strength} = c_3(\text{weight})^{2/3}\), where \(c_1\),
\(c_2\), and \(c_3\) are arbitrary constants. (b) The graph of \(y = x^{2/3}\) below shows that strength does not
increase linearly with body weight, as would be the case if a person 1 million times as heavy as
an ant could lift 1 million times more than the ant. Strength increases more slowly. For
example, if weight is multiplied by 1000, strength will increase by a factor of \(1000^{2/3} = 100\).
[Figure: graph of strength versus weight, \(y = x^{2/3}\).]
4.22 (a) Answers will vary. (b) The population of cancer cells after \(n-1\) years is \(P = P_0(7/6)^{n-1}\).
The population of cancer cells after \(n\) years is \(P = P_0(7/6)^{n-1} + (1/6)(P_0(7/6)^{n-1}) = P_0(7/6)^n\).
(c) Answers will vary, but the exponential model should provide a good fit for the data collected.
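The induction step in part (b) can be checked numerically. The sketch below uses exact fractions to compare the year-by-year recurrence with the closed form; the function names and the starting population are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def population_by_recurrence(p0: Fraction, years: int) -> Fraction:
    """Add one sixth of the current population each year."""
    population = p0
    for _ in range(years):
        population += population / 6
    return population

def population_closed_form(p0: Fraction, years: int) -> Fraction:
    """Closed form P0 * (7/6)**n from part (b)."""
    return p0 * Fraction(7, 6) ** years

p0 = Fraction(36)  # hypothetical starting population
for n in range(6):
    print(n, population_by_recurrence(p0, n), population_closed_form(p0, n))
```

Using `Fraction` keeps the comparison exact, so the two columns agree without floating-point rounding.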
4.23 (a) The sum of the six counts is 10+9+24+61+206+548 = 858 people. (b) The sum of the
top row shows 10+9+24 = 43 people had arthritis. (c) The marginal distribution of participation
in soccer is shown below.
          Elite   Non-elite   Did not play
Count        71         215            572
Percent    8.3%       25.1%          66.7%
(d) The percent of each group who have arthritis is 14.08% for the elite soccer players, 4.19% for
the non-elite soccer players and 4.20% for the people who did not play. This suggests an
association between playing elite soccer and developing arthritis.
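The marginal and conditional percents in Exercise 4.23 can be computed directly from the counts. This is a minimal sketch using the counts stated in parts (a) through (c); the dictionary names are illustrative.

```python
# Counts from parts (a)-(c): people with arthritis and group totals.
arthritis = {"Elite": 10, "Non-elite": 9, "Did not play": 24}
totals = {"Elite": 71, "Non-elite": 215, "Did not play": 572}

grand_total = sum(totals.values())

# Marginal distribution of soccer participation (percent of all participants).
marginal = {group: 100 * n / grand_total for group, n in totals.items()}

# Conditional percent with arthritis within each participation group.
arthritis_rate = {group: 100 * arthritis[group] / totals[group] for group in totals}

for group in totals:
    print(f"{group}: {marginal[group]:.1f}% of participants, "
          f"{arthritis_rate[group]:.2f}% with arthritis")
```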
4.24 The percents should add to 100% because they provide a breakdown of all participants
according to one categorical variable. The sum is 8.3% + 25.1% + 66.7% = 100.1 %. If one
more decimal place is included in each of the percents, then the sum is 8.28% + 25.06% +
66.67% = 100.01%. The percents do not add to 100% because of rounding.
4.25 (a) The sum of the six counts is 5375 students. (b) The proportion of these students who
smoke is 1004/5375 = 0.1868, so the percent of smokers is 18.68%. (c) The marginal
distribution of parents' smoking behavior is shown below.
          Neither parent smokes   One parent smokes   Both parents smoke
Count                      1356                2239                 1780
Percent                  25.23%              41.66%               33.12%
(d) The three conditional distributions are shown in the table below.
                         Neither parent   One parent   Both parents
                         smokes           smokes       smoke
Student does not smoke   86.14%           81.42%       77.53%
Student smokes           13.86%           18.58%       22.47%
The conditional distributions reveal what many people expect: parents have a substantial
influence on their children. Students who smoke are more likely to come from families where
one or more of their parents smoke.
4.26 (a) The two-way table is shown below. (b) The percent of eggs in each group that hatched
are 59.26% in a cold nest, 67.86% in a neutral nest, and 72.12% in a hot nest. The percents
indicate that hatching increases with temperature. The cold nest did not prevent hatching, but
made it less likely.
              Cold   Neutral   Hot
Hatched         16        38    75
Not hatched     11        18    29
Total           27        56   104
4.27 (a) The two conditional distributions are shown in the table below. The biggest difference
between men and women is in Administration: a higher percentage of women chose this major.
A greater percent of men chose the other fields, especially finance. (b) A total of 386 students
responded, so 722 - 386 = 336 did not respond. About 46.54% of the students did not respond.
                 Female     Male
Accounting       30.22%   34.78%
Administration   40.44%   24.84%
Economics         2.22%    3.73%
Finance          27.11%   36.65%
4.28 Two examples are shown below. In general, choose a to be any number from 0 to 50, and
then all the other entries can be determined.
25  25        10  40
35  15        50   0
Note: This is why we say that such a table has one degree of freedom: We can make
one (nearly) arbitrary choice for the value of a, and then have no more decisions to make.
4.29 (a) The two-way table is shown below. (b) Overall, 11.88% of white defendants and
10.24% of black defendants receive the death penalty. For white victims, 12.58% of white
defendants and 17.46% of black defendants receive the death penalty. For black victims, 0% of
white defendants and 5.83% of black defendants receive the death penalty. (c) The death penalty
is more likely when the victim was white (14.02%) rather than black (5.36%). Because most
convicted killers are of the same race as their victims, whites are more often sentenced to death.
                  Death penalty   No death penalty
White defendant              19                141
Black defendant              17                149
4.30 (a) The two-way table is shown below. (b) Overall, 70% of male applicants are admitted,
while only 56% of females are admitted. (c) In the business school, 80% of male applicants are
admitted, compared with 90% of females. In the law school, 10% of males are admitted,
compared with 33.33% of females. (d) Six out of 7 men apply to the business school, which
admits 82.5% of all applicants, while 3 out of 5 women apply to the law school, which admits
only 27.5% of its applicants.
         Admit   Deny
Male       490    210
Female     280    220
4.31 The table below gives the two marginal distributions. The marginal distribution of marital
status is found by taking, e.g., \(337/8235 \approx 4.1\%\). The marginal distribution of job grade is found
by taking, e.g., \(955/8235 \approx 11.6\%\).
Single    Married   Divorced   Widowed
4.1%      93.9%     1.5%       0.5%

Grade 1   Grade 2   Grade 3    Grade 4
11.6%     51.5%     30.2%      6.7%
As rounded here, both sets of percents add up to 100%. If students round to the nearest whole
percent, the marital status numbers add up to 101%. If they round to two places after the
decimal, the job grade percents add up to 100.01%.
4.32 The percent of single men in grade 1 jobs is \(58/337 \approx 17.21\%\). The percent of grade 1 jobs
held by single men is \(58/955 \approx 6.07\%\).
4.33 Divide the entries in the first column by the first column total; e.g., \(17.21\% \approx 58/337\).
These should add to 100% (except for rounding error). The percentages in the table below add to
100.01%.
Job grade   % of single men
1           17.21%
2           65.88%
3           14.84%
4            2.08%
If the percents are rounded to the nearest tenth, 17.2%, 65.9%, 14.8%, and 2.1%, then they add to
100%.
4.34 (a) We need to compute percents to account for the fact that the study included many more
married men than single men, so that we would expect their numbers to be higher in every job
grade (even if marital status had no relationship with job level). (b) A table of percents is below;
descriptions of the relationship may vary. Single and widowed men had higher percents of grade
1 jobs; single men had the lowest (and widowed men the highest) percents of grade 4 jobs.
Job grade   Single   Married   Divorced   Widowed
1           17.21%   11.31%    11.90%     19.05%
4            2.08%    6.90%     5.56%      9.52%
4.35 Age is the main lurking variable: Married men would generally be older than single men,
so they would have been in the work force longer, and therefore had more time to advance
in their careers.
4.36 (a) A bar graph is shown below; 58.33% of desipramine users did not have a relapse,
while 25.0% of lithium users and 16.7% of those who received a placebo succeeded in breaking
their addictions. (b) Because random assignment was used, there is statistical evidence for
causation (though there are other questions we need to consider before we can reach that
conclusion).
[Figure: bar graph of the percent without a relapse for Desipramine, Lithium, and Placebo.]
4.37 (a) To find the marginal distribution of opinion, we need to know the total numbers of
people with each opinion: \(49/133 \approx 36.84\%\) said higher, \(32/133 \approx 24.06\%\) said the same,
and \(52/133 \approx 39.10\%\) said lower. The numbers are summarized in the first table below. The
main finding is probably that about 39% of users think the recycled product is of lower quality.
This is a serious barrier to sales. (b) There were 36 buyers and 97 nonbuyers among the
respondents, so (for example) \(20/36 \approx 55.56\%\) of buyers rated the quality as higher. Similar
arithmetic with the buyers and nonbuyers rows gives the two conditional distributions of opinion,
shown in the second table below. We see that buyers are much more likely to consider recycled
filters higher in quality, though 25% still think they are lower in quality. We cannot draw any
conclusion about causation: It may be that some people buy recycled filters because they start
with a high opinion of recycled products, or it may be that use persuades people that the quality
is high.
Higher   The same   Lower
36.84%   24.06%     39.10%

            Higher   The same   Lower
Buyers      55.56%   19.44%     25.00%
Nonbuyers   29.90%   25.77%     44.33%
4.38 (a) The two-way table is shown below. (b) The overall batting averages are 0.240 for Joe
and 0.260 for Moe. Moe has the best overall batting average.
       Hit   No hit
Joe    120      380
Moe    130      370
(c) Two separate tables, one for each type of pitcher, are shown below. Against left-handed
pitchers, Joe's batting average is 0.200 and Moe's batting average is 0.100. Against right-handed
pitchers, Joe's batting average is 0.400 and Moe's batting average is 0.300. Joe is better
against both kinds of pitchers.
Left-handed pitchers        Right-handed pitchers
       Hit   No hit                Hit   No hit
Joe     80      320         Joe     40       60
Moe     10       90         Moe    120      280
(d) Both players do better against right-handed pitchers than against left-handed pitchers. Joe
spent 80% of his at-bats facing left-handers, while Moe only faced left-handers 20% of the time.
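The reversal in Exercise 4.38 (an instance of Simpson's paradox) can be reproduced from the counts in part (c). The Python sketch below stores hits and at-bats for each player; the data structure and function names are illustrative.

```python
# (hits, at-bats) for each player against each type of pitcher, from part (c).
records = {
    "Joe": {"left": (80, 400), "right": (40, 100)},
    "Moe": {"left": (10, 100), "right": (120, 400)},
}

def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats."""
    return hits / at_bats

def overall_average(player: str) -> float:
    """Pool a player's hits and at-bats over both types of pitcher."""
    hits = sum(h for h, _ in records[player].values())
    at_bats = sum(n for _, n in records[player].values())
    return batting_average(hits, at_bats)

for player, splits in records.items():
    by_pitcher = {side: batting_average(*counts) for side, counts in splits.items()}
    print(player, by_pitcher, round(overall_average(player), 3))
```

Joe leads within each pitcher type, yet Moe leads overall, because Joe's at-bats are concentrated against the left-handers, where both players hit worse.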
4.39 Examples will vary, of course; one very simplistic possibility is shown below. The key is
to be sure that there is a lower percentage of overweight people among the smokers than among
the nonsmokers.
Combined (all people)
                  Early death
                  Yes    No
Overweight         41    59
Not overweight     50    50

Smokers
                  Early death
                  Yes    No
Overweight         10     0
Not overweight     40    20

Nonsmokers
                  Early death
                  Yes    No
Overweight         31    59
Not overweight     10    30
4.40 Who? The individuals are students. What? The categorical variables of interest are
educational level or degree (Associate's, Bachelor's, Master's, Professional, or Doctor's) and
gender (male or female). Why? The researchers were interested in checking if the participation
of women changes with level of degree. When, where, how, and by whom? These projections,
in thousands, were made for 2005-2006 by the National Center for Education Statistics. Graphs:
The conditional distributions of sex for each degree level are shown in the bar graph below (left).
The conditional distributions of degree level for each gender are shown in the bar graph below
(right).
[Figure: bar graphs of Percent by gender within each degree level (left) and Percent by degree level within each gender (right).]
Numerical summaries: The software output below from Minitab provides the joint distribution,
marginal distributions, and conditional distributions in one consolidated table. The first entry in
each cell is the count, the second entry is the % of the row (or the conditional distribution of
gender for each type of degree), the third entry is the % of the column (or the conditional
distribution of degree for each gender), and the fourth entry is the overall %.
Columns: Gender

                Female      Male       All

Associate's        431       244       675
                 63.85     36.15    100.00
                 26.85     21.90     24.83
                15.851     8.974    24.825

Bachelor's         813       584      1397
                 58.20     41.80    100.00
                 50.65     52.42     51.38
                29.901    21.478    51.379

Doctor's            21        24        45
                 46.67     53.33    100.00
                  1.31      2.15      1.66
                 0.772     0.883     1.655

Master's           298       215       513
                 58.09     41.91    100.00
                 18.57     19.30     18.87
                10.960     7.907    18.867

Professional        42        47        89
                 47.19     52.81    100.00
                  2.62      4.22      3.27
                 1.545     1.729     3.273

All               1605      1114      2719
                 59.03     40.97    100.00
                100.00    100.00    100.00
                59.029    40.971   100.000

Cell Contents:  Count
                % of Row
                % of Column
                % of Total
Interpretation: Women earn a majority of associate's, bachelor's, and master's degrees, but fall
slightly below 50% for professional and doctoral degrees. The distributions of degree level are
very similar for females and males.
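The four entries in each cell of the software output can be reproduced by hand. The Python sketch below is not part of the original solution; it recomputes the row, column, and overall percents from the counts (in thousands) reported for this exercise. The names `counts`, `row_totals`, `col_totals`, and `cell_percents` are illustrative assumptions.

```python
# Counts (in thousands) of degrees by level and gender, from the output above.
counts = {
    "Associate's": {"Female": 431, "Male": 244},
    "Bachelor's": {"Female": 813, "Male": 584},
    "Doctor's": {"Female": 21, "Male": 24},
    "Master's": {"Female": 298, "Male": 215},
    "Professional": {"Female": 42, "Male": 47},
}

# Marginal totals for rows (degree level) and columns (gender).
row_totals = {degree: sum(c.values()) for degree, c in counts.items()}
col_totals = {g: sum(c[g] for c in counts.values()) for g in ("Female", "Male")}
grand_total = sum(row_totals.values())


def cell_percents(degree, gender):
    """Return (% of row, % of column, % of total) for one cell."""
    n = counts[degree][gender]
    return (
        round(100 * n / row_totals[degree], 2),   # conditional on degree
        round(100 * n / col_totals[gender], 2),   # conditional on gender
        round(100 * n / grand_total, 3),          # joint distribution
    )


for degree in counts:
    print(degree, {g: cell_percents(degree, g) for g in ("Female", "Male")})
```

The "% of row" values give the conditional distribution of gender for each degree level, and the "% of column" values give the conditional distribution of degree level for each gender.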
4.41 No. Rich nations have more TV sets than poor nations. Rich nations also have longer life
expectancies because they have better nutrition, clean water, and better health care. This is a
common response relationship between TV sets and length of life.
[Diagram: Wealth linked to both x = # of TV sets and y = average life span.]
4.42 In this case, there may be a causative effect, but in the direction opposite to the one
suggested: People who are overweight are more likely to be on diets, and so choose artificial
sweeteners over sugar. (Also, heavier people are at higher risk of developing diabetes; if they do,
they are likely to switch to artificial sweeteners.)
[Diagram: Use of sweeteners and Weight gain.]
4.43 No. The number of hours standing up is a confounding variable in this case. The diagram
below illustrates the confounding between exposure to chemicals and standing up.
[Diagram: Exposure to chemicals and Miscarriages, joined by a link marked "?"; Time standing up as the confounding variable.]
4.44 Well-off people tend to have more cars. They also tend to live longer, probably because
they are better educated, take better care of themselves, and get better medical care. The cars
have nothing to do with it. The relationship between number of cars and length of life is
common response.
[Diagram: Wealth linked to both Number of cars and Length of life.]
4.45 It could be that children with lower intelligence watch many hours of television and get
lower grades as well. It could also be that children from lower socio-economic households watch
more television and earn lower grades because their parents are less likely to limit television
viewing and, lacking education themselves, are unable to help with schoolwork. The variables
number of hours watching television and grade point average change in common response to
socio-economic status or IQ.
[Diagram: IQ or socioeconomic status linked to both Number of hours spent watching TV and GPA.]
4.46 Single men tend to have a different value system than married men. They have many
interests, but getting married and earning a substantial amount of money are not among their top
priorities. Confounding is the best term to describe the relationship between marital status and
income.
[Diagram: Marital status and Annual income, joined by a link marked "?"; Values as the confounding variable.]
4.47 The effects of coaching are confounded with those of experience. A student who has taken
the SAT once may improve his or her score on the second attempt because of increased
familiarity with the test. The student may also have increased knowledge from additional math
and science courses.
[Diagram: Coaching course, Experience, and SAT score.]
4.48 A reasonable explanation is that the cause-and-effect relationship goes in the other
direction: Doing well makes students feel good about themselves, rather than vice versa.
[Diagram: Self-esteem and Quality of work.]
CASE CLOSED!
1. (a) Let y = premium and x = age. Scatterplots of the original data (left) and transformed data
(right) after taking the logarithms of both variables are shown below. The plot of the original
data shows a strong nonlinear relationship. The plot for the transformed data shows a clear
linear trend, so the power model is appropriate.
[Figure: scatterplots of Premium ($) versus Age (years) (left) and Log(Premium) versus Log(Age) (right).]
(b) A scatterplot of the logarithm of premium versus age is shown below (left). The linear trend
suggests that the exponential model is appropriate.
[Figure: scatterplot of Log(Premium) versus Age (left) and residual plot of Residual versus Age (right).]
(c) Since the association between the log of premium and age is nearly perfect, the exponential
model is most appropriate. The least-squares regression line for the transformed data is
log y = -0.0275 + 0.0373x. Using the inverse transformation, the predicted premium is
y = 10^(-0.0275) × 10^(0.0373x) ≈ 0.9386 × 10^(0.0373x). (d) The predicted monthly premiums are
y = 0.9386 × 10^(0.0373 × 58) ≈ $136.74 for a 58-year-old and y = 0.9386 × 10^(0.0373 × 68) ≈ $322.76 for a
68-year-old. (e) You should feel very comfortable with these predictions. The residual plot
above (right) shows no clear patterns and r^2 = 99.9%, so the exponential model provides an
excellent fit.
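The inverse transformation in parts (c) and (d) can be sketched in a few lines of Python. This is an illustration rather than part of the original solution: the intercept is taken as -0.0275, which is the value consistent with the factor 0.9386 = 10^(-0.0275) quoted in part (c), and the function name `predicted_premium` is an assumption.

```python
def predicted_premium(age, intercept=-0.0275, slope=0.0373):
    """Predicted monthly premium from the exponential model.

    The fitted line is log10(y) = intercept + slope * age, so the
    prediction on the original scale is y = 10 ** (intercept + slope * age).
    """
    log_y = intercept + slope * age
    return 10 ** log_y


# Ages used in part (d) of the solution.
for age in (58, 68):
    print(f"age {age}: predicted premium = ${predicted_premium(age):.2f}")
```

Small differences in the last digit relative to the printed answers are possible, since the solution rounds 10^(-0.0275) to 0.9386 before multiplying.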
2. (a) The entries in each column are only from these six selected causes of death. There are
other causes of death so the total number of deaths in each age group is higher than the sum of
the deaths for these six causes. (b) Percents should be used to compare the age groups because
the age groups contain different numbers of individuals. (c) The conditional distributions are
shown in the table below. Each entry is obtained by dividing the count for that cause of death by
the appropriate column total.
                15 to 24 years   25 to 44 years   45 to 64 years
Accidents           45.32%           21.60%            5.42%
AIDS                 0.52%            5.34%            1.35%
Cancer               4.93%           14.77%           33.16%
Heart disease        3.28%           12.63%           23.27%
Homicide            15.59%            5.71%            0.63%
Suicide             11.87%            8.73%            2.30%
[Figure: bar graphs of Percent by cause of death (Accidents, AIDS, Cancer, Heart disease, Homicide, Suicide) for the three age groups, grouped by age group (left) and by cause (right).]
(d) The leading cause of death for the youngest age group is accidents, followed by homicide
and suicide. For the middle age group, accidents are still the leading cause of death, but cancer
and heart disease are second and third, respectively. For the oldest age group, cancer is the
leading cause of death, with heart disease running a close second.
3. (a) The chance of dying for men over 65 who walk at least 2 miles a day is half that of men
who do not exercise. (b) Individuals who exercise regularly have many other habits and
characteristics that could contribute to longer lives.
4.49 Spending more time watching TV means that less time is spent on other activities. Answers
will vary, but some possible lurking variables are: the amount of time parents spend at home, the
amount of exercise and the economy. For example, parents of heavy TV watchers may not
spend as much time at home as other parents. Heavy TV watchers may not get as much exercise
as other adolescents. As the economy has grown over the past 20 years, more families can afford
TV sets (many homes now contain more than two TV sets), and as a result, TV viewing has
increased and children have less physical work to do in order to make ends meet.
4.50 (a) Let y = intensity and x = distance. A scatterplot of the original data is shown below
(left). The data appear to follow a power law model of the form y = ax^b where b is some
negative number.
[Figure: scatterplots of Intensity (candelas) versus Distance (meters) (left) and Log(Intensity) versus Log(distance) (right).]
(b) A scatterplot of the transformed data (above on the right), after taking the logarithms of both
variables, shows a clear linear trend, so the power model is appropriate. The least-squares
regression line for the transformed data is log y = -0.5235 - 2.0126 log x. (c) The residual plot
below shows no obvious patterns and r^2 = 99.9%, so this linear model on the transformed data
provides an excellent fit.
[Figure: residual plot of Residual versus Log(Distance) (left) and Intensity (candelas) versus Distance (meters) showing the observed intensity and the predicted values (right).]
(d) Using the inverse transformation to find the predicted intensity gives
y = 10^(-0.5235) × x^(-2.0126) ≈ 0.2996 x^(-2.0126). The plot of the original data with this model is shown above
(right). (e) The predicted intensity of the 100-watt bulb at 2.1 meters is
y = 0.2996 × 2.1^(-2.0126) ≈ 0.0673 candelas.
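The back-transformation for the power model in parts (d) and (e) can be sketched as follows. This Python snippet is an illustration, not part of the original solution; the coefficients -0.5235 and -2.0126 are those consistent with the factor 0.2996 and the negative exponent described in part (a), and the function name `predicted_intensity` is an assumption.

```python
import math


def predicted_intensity(distance_m, log_a=-0.5235, b=-2.0126):
    """Predicted intensity (candelas) from the power model y = a * x**b.

    The fitted line is log10(y) = log_a + b * log10(x), so a = 10 ** log_a.
    """
    a = 10 ** log_a
    return a * distance_m ** b


x = 2.1  # distance in meters, as in part (e)
y = predicted_intensity(x)
print(f"predicted intensity at {x} m: {y:.4f} candelas")

# The prediction satisfies the fitted line on the log-log scale.
print(math.isclose(math.log10(y), -0.5235 - 2.0126 * math.log10(x)))
```

Because the exponent is close to -2, the fitted model is close to an inverse-square relationship between intensity and distance.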
4.51 (a) Yes, this transformation achieves linearity; see the scatterplot below.
[Figure: scatterplot of Intensity (candelas) versus 1/(Distance-squared).]
(b) Let x = distance and y = intensity. The least-squares regression line for the transformed data
is y = -0.0006 + 0.30(1/x^2). (c) The predicted intensity of the 100-watt bulb at 2.1 meters is
y = -0.0006 + 0.30(1/2.1^2) ≈ 0.0674 candelas. (d) Writing the model from part (d) of Exercise
4.50 in a slightly different form shows that the models are very similar: y = -0.0006 + 0.3/2.1^2
versus y ≈ 0.3/2.1^2.0126. The absolute difference in the predicted values is 0.0001. Thus, the
4.52 The explanatory variable is the amount of herbal tea and the response variable is a measure
of health and attitude. The most important lurking variable is social interaction; many of the
nursing home residents may have been lonely before the students started visiting.
4.53 (a) The column sums are shown below.
Single:   10,949 + 7,653 + 4,009 + 720 = 23,331
Married:  2,472 + 19,640 + 32,183 + 8,539 = 62,834
Widowed:  16 + 228 + 2,312 + 8,732 = 11,288
Divorced: 155 + 2,904 + 7,898 + 1,703 = 12,660
The sum of these column totals is 23,331 + 62,834 + 11,288 + 12,660 = 110,113, which is not
equal to 110,115. The difference is due to rounding. (b) The marginal distributions, conditional
distributions, and joint distribution are shown in the software output from Minitab below.
Rows: Age   Columns: Marital Status

         divorced   married    single   widowed       All

15-24         155      2472     10949        16     13592
             1.14     18.19     80.55      0.12    100.00
             1.22      3.93     46.93      0.14     12.34
            0.141     2.245     9.943     0.015    12.344

25-39        2904     19640      7653       228     30425
             9.54     64.55     25.15      0.75    100.00
            22.94     31.26     32.80      2.02     27.63
            2.637    17.836     6.950     0.207    27.631

40-64        7898     32183      4009      2312     46402
            17.02     69.36      8.64      4.98    100.00
            62.39     51.22     17.18     20.48     42.14
            7.173    29.227     3.641     2.100    42.140

65+          1703      8539       720      8732     19694
             8.65     43.36      3.66     44.34    100.00
            13.45     13.59      3.09     77.36     17.89
            1.547     7.755     0.654     7.930    17.885

All         12660     62834     23331     11288    110113
            11.50     57.06     21.19     10.25    100.00
           100.00    100.00    100.00    100.00    100.00
           11.497    57.063    21.188    10.251   100.000

Cell Contents:  Count
                % of Row
                % of Column
                % of Total
The table below provides just the marginal distribution for marital status.
Single   Married   Widowed   Divorced
21.19%   57.06%    10.25%    11.50%
A bar chart of the marginal distribution is shown below.
[Figure: bar chart of Percent by Marital status (Single, Married, Widowed, Divorced).]
(c) The two conditional distributions are shown in the table below.
Age     Single   Married   Widowed   Divorced
15-24   80.55%   18.19%    0.12%     1.14%
40-64   8.64%    69.36%    4.98%     17.02%
Among the younger women, more than 4 out of 5 have not yet married, and those who are
married have had little time to become widowed or divorced. Most of the older group is or has
been married; only about 8.64% are still single. (d) Among single women, 46.93% are 15-24,
32.80% are 25-39, 17.18% are 40-64, and 3.09% are 65 or older.
4.54 (a) The scatterplots below show a strong nonlinear relationship for the original data (left)
and a nearly perfect, negative linear association for the transformed data (right).
[Figure: scatterplots of Height (feet) versus Bounce (left) and Log(Height) versus Bounce (right).]
Not only is the linear association between the log(height) and bounce stronger than the linear
association between the logarithms of both variables, but there is also a value of zero for the
bounce number which means that the logarithm cannot be used for this point. The exponential
model is more appropriate for predicting y = height from x = bounce number. (b) The least-squares
regression line for the transformed data is log y = 0.4610 - 0.1191x. The residual plot
below shows that the first two residuals are positive and the next three residuals are negative, but
the residuals are all very small. The value of r^2 is 0.998, which indicates that 99.8% of the
variability in log(height) is explained by the linear relationship with bounce. This model provides an
excellent fit.
[Figure: residual plot of Residual versus Bounce.]
[Diagram: Number of flu cases reported, Amount of ice cream sold, and Season or temperature.]
4.56 Who? The individuals are randomly selected people from three different locations. What?
The response variable is whether or not the individual suffered from CHD and the explanatory
variable is a measure of how prone an individual is to sudden anger. Both variables are
categorical, with CHD being yes or no and the level of anger being classified as low, moderate,
or high. Why? The researchers wanted to see if there was an association between these two
categorical variables. When, where, how, and by whom? In the late 1990s a random sample of
almost 13,000 people was followed for four years. The Spielberger Trait Anger Scale was used
to classify the level of anger and medical records were used for CHD. Graphs: A bar graph of
the conditional distributions of CHD for each level of anger is shown below (left). To see the
increase in the percent of individuals with CHD in each group, a separate bar graph is shown
(right). Notice how the change in scale changes your impression of the effect.
[Figure: bar graph of Percent with CHD (yes/no) for each Anger level: low, moderate, high (left); bar graph of Percent with CHD by Anger level on an expanded scale (right).]
Numerical summaries: The software output below from Minitab shows the marginal
distributions, conditional distributions, and joint distribution.
Rows: CHD   Columns: Anger

           high       low  moderate       All

No          606      3057      4621      8284
           7.32     36.90     55.78    100.00
          95.73     98.30     97.67     97.76
          7.151    36.075    54.532    97.758

Yes          27        53       110       190
          14.21     27.89     57.89    100.00
           4.27      1.70      2.33      2.24
          0.319     0.625     1.298     2.242

All         633      3110      4731      8474
           7.47     36.70     55.83    100.00
         100.00    100.00    100.00    100.00
          7.470    36.700    55.830   100.000

Cell Contents:  Count
                % of Row
                % of Column
                % of Total
The most important numbers for comparison are the percents of each anger group
that experienced CHD: 53/3110 ≈ 1.70% of the low-anger group, 110/4731 ≈ 2.33% of the
moderate-anger group, and 27/633 ≈ 4.27% of the high-anger group.
Interpretation: Risk of CHD increases with proneness to sudden anger. It might be good to
point out to students that results like these are typically reported in the media with a reference to
the relative risk of CHD; for example, because 4.3%/1.7% ≈ 2.5, we might read that subjects in the
high-anger group had 2.5 times the risk of those in the low-anger group.
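The relative risk mentioned in the interpretation is simply a ratio of two conditional percents. The Python sketch below is not part of the original solution; it recomputes the risk of CHD in each anger group from the counts in the software output and then divides by the risk in the low-anger group. The names `chd`, `risk`, and `relative_risk` are illustrative assumptions.

```python
# (number with CHD, group size) for each anger level, from the output above.
chd = {
    "low": (53, 3110),
    "moderate": (110, 4731),
    "high": (27, 633),
}


def risk(level):
    """Proportion of the given anger group that experienced CHD."""
    cases, group_size = chd[level]
    return cases / group_size


def relative_risk(level, baseline="low"):
    """Risk in one group divided by the risk in the baseline group."""
    return risk(level) / risk(baseline)


for level in chd:
    print(f"{level}: risk = {100 * risk(level):.2f}%, "
          f"relative risk vs. low = {relative_risk(level):.2f}")
```

The solution works from the rounded percents 4.3% and 1.7%; using the unrounded counts gives a ratio that also rounds to 2.5.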
4.57 Who? The individuals are cultures of marine bacteria. What? The two quantitative
variables are x = time (minutes) and y = count (number of surviving bacteria in hundreds). Why?
Researchers wanted to see if the bacteria would decay exponentially over time when exposed to
X-rays. When, where, how, and by whom? It is not clear when or where the data were collected,
but the counts were obtained after exposing cultures to X-rays for different lengths of time.
Graphs: Scatterplots below show the original data (left) and the transformed data (right) after
taking the logarithm of count. Both plots suggest that the exponential decay model is appropriate
for these data.
[Figure: scatterplots of count versus Time (minutes) (left) and Log(Count) versus Time (minutes) (right).]
Numerical summaries: The least-squares regression line for the transformed data is
log y = 2.5941 - 0.0949x. Using the inverse transformation, the predicted count is
y = 10^(2.5941) × 10^(-0.0949x) ≈ 392.7354 × 10^(-0.0949x). Interpretation: The residual plot below shows no
clear pattern and r^2 = 98.8%, so the exponential decay model provides an excellent model for
the number of surviving bacteria after exposure to X-rays.
[Figure: residual plot of Residual versus Time (minutes).]
4.58 (a) The two-way table below was obtained by adding the corresponding entries for each
age group. The proportion of smokers who stayed alive for 20 years is 443/582 ≈ 0.7612 or
76.12% and the proportion of nonsmokers who stayed alive is 502/732 ≈ 0.6858 or 68.58%.
         Smoker   Not
Dead       139    230
Alive      443    502
(b) For the youngest group, 269/288 or 93.40% of the smokers and 327/340 or 96.18% of the
nonsmokers survived. For the middle group, 167/245 or 68.16% of the smokers and 147/199 or
73.87% of the nonsmokers survived. For the oldest group, 7/49 or 14.29% of the smokers and
28/193 or 14.51% of the nonsmokers survived. The results are reversed when the data for the
three age groups are combined. (c) The percents of smokers in the three age groups are
288/628 × 100 ≈ 45.86% for the youngest group, 245/444 × 100 ≈ 55.18% for the middle-aged
group, but only 49/242 × 100 ≈ 20.25% for the oldest group.
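One way to see why the comparison reverses is to write each combined survival rate as a weighted average of the three age-group rates, with weights equal to the share of that smoking group falling in each age group. The Python sketch below is not part of the original solution; it uses the counts quoted in part (b), and the names `groups`, `combined_rate`, and `weighted_rate` are illustrative assumptions.

```python
# (number alive after 20 years, group size) from part (b).
groups = {
    "youngest": {"smoker": (269, 288), "nonsmoker": (327, 340)},
    "middle": {"smoker": (167, 245), "nonsmoker": (147, 199)},
    "oldest": {"smoker": (7, 49), "nonsmoker": (28, 193)},
}


def combined_rate(status):
    """Survival rate after pooling the three age groups."""
    alive = sum(g[status][0] for g in groups.values())
    total = sum(g[status][1] for g in groups.values())
    return alive / total


def weighted_rate(status):
    """Same rate, written as a weighted average of the age-group rates."""
    total = sum(g[status][1] for g in groups.values())
    return sum(
        (g[status][1] / total) * (g[status][0] / g[status][1])
        for g in groups.values()
    )


for status in ("smoker", "nonsmoker"):
    total = sum(g[status][1] for g in groups.values())
    oldest_share = groups["oldest"][status][1] / total
    print(f"{status}: combined survival = {combined_rate(status):.4f}, "
          f"share in oldest group = {oldest_share:.3f}")
```

Nonsmokers survive at a higher rate within every age group, but a much larger share of the nonsmokers belongs to the oldest group, where survival is low for everyone; that weighting pulls the combined nonsmoker rate below the combined smoker rate.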