13.73 (a)
                   Coefficients   Standard Error    t Stat   P-value
Intercept             6808.1047         854.9682    7.9630    0.0005
Twitter Activity         0.0503           0.0035   14.2532    0.0000
(b) For each additional unit increase in Twitter activity, the mean receipts per theater will
increase by an estimated $0.05. The estimated mean receipt per theater is $6808.10 when
there is no Twitter activity.
(c) Ŷ = b0 + b1X = 6808.1047 + 0.0503(100000) = $11,835.26
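The arithmetic in (c) can be checked with a short sketch. The function name `predict` is illustrative and not from the text. With the four-decimal slope printed in (a), the result is about $11,838.10; the reported $11,835.26 is consistent with a slope carried to more digits (roughly 0.05027), which still rounds to 0.0503. That reading is an inference from the numbers, not something the text states.

```python
def predict(b0, b1, x):
    """Point prediction from a fitted simple linear regression line."""
    return b0 + b1 * x

b0, b1 = 6808.1047, 0.0503   # coefficients as printed in (a)
x = 100_000                  # Twitter activity used in (c)

# Prediction using the rounded slope exactly as printed
rounded_pred = predict(b0, b1, x)

# Slope implied by the reported prediction (an inference, not a value from the text)
reported_pred = 11835.26
implied_b1 = (reported_pred - b0) / x

print(f"prediction with printed slope: {rounded_pred:.2f}")
print(f"implied unrounded slope:       {implied_b1:.6f}")
```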
(d) You should not use the model to predict the receipts for a movie that has a Twitter
activity of 1,000,000 because 1,000,000 falls outside the domain of the independent
variable and any prediction performed through extrapolation will not be reliable.
(e) r² = 0.9760. So 97.60% of the variation in receipts per theater can be explained by the
variation in Twitter activity.
(f) [Residual plot: residuals versus X]
The residual plot does not reveal any specific pattern. However, the sample size is too small
for the residual analysis to be reliable.
(g) t = 14.2532 and p-value = 0.0000. Since the p-value < 0.05, reject H0. There is
evidence of a linear relationship between Twitter activity and receipts.
(h) $10,015.85 ≤ μY|X=100,000 ≤ $13,654.67
$6,790.94 ≤ YX=100,000 ≤ $16,879.58
(i) The results of (a)-(h) suggest that Twitter activity is a useful predictor of receipts on the
first weekend a movie opens. However, the sample size of 7 is too small for the
prediction to be reliable.
13.74 cont. (f) Based on a visual inspection of the graphs of the distribution of residuals and the
residuals versus the number of cases, there is no pattern. The model appears to be
adequate.
(g) t = 24.88 > t0.05/2 = 2.1009 with 18 degrees of freedom for α = 0.05. Reject H0. There
is evidence that the fitted linear regression model is useful.
(h) 44.88 ≤ μY|X=150 ≤ 46.80
41.56 ≤ YX=150 ≤ 50.12
[Residual plot: residuals versus diameter at breast height]
There are clusters of negative residuals at the low and high end of the diameter values.
There appears to be some non-linear relationship between height and diameter.
[Normal probability plot: residuals versus Z value]
The normal probability plot does not suggest any possible departure from the normality
assumption.
13.76 (a) [Scatter diagram: Y versus X]
b0 = -122.3439, b1 = 1.7817
(b) For each additional thousand dollars in assessed value, the estimated mean selling price of a
house increases by 1.7817 thousand dollars. The estimated mean selling price of a house
with a 0 assessed value is –122.3439 thousand dollars. However, this interpretation is not
meaningful in the current setting since the assessed value is very unlikely to be 0 for a
house.
(c) Ŷ = -122.3439 + 1.78171X = -122.3439 + 1.78171(170) = 180.5475 thousand
dollars
(d) r² = 0.9256. So, 92.56% of the variation in selling price can be explained by the variation
in assessed value.
(e) [Residual plot: residuals versus assessed value]
13.76 cont. (e)
[Normal probability plot: residuals versus Z value]
Neither the residual plot nor the normal probability plot reveals any potential
violation of the linearity, equal variance, and normality assumptions.
(f) t = 18.6648 with 28 degrees of freedom, p-value is virtually zero. Since p-value < 0.05,
reject H0. There is evidence of a linear relationship between selling price and assessed
value.
(g) 1.5862 ≤ β1 ≤ 1.9773
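The interval in (g) can be approximately rebuilt from the figures in (a) and (f). This is a sketch under two assumptions that are not in the text: the standard error of the slope is recovered as b1 / t (since t = b1 / Sb1), and the critical value 2.0484 for 28 degrees of freedom is taken from a t table. Small differences in the last digit come from rounding.

```python
b1 = 1.7817        # slope from (a)
t_stat = 18.6648   # t statistic from (f)
t_crit = 2.0484    # t(0.025, 28) from a t table (assumption, not given in the text)

# Standard error of the slope, recovered from t = b1 / S_b1
s_b1 = b1 / t_stat

# 95% confidence interval: b1 +/- t_crit * S_b1
half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width

print(f"{lower:.4f} <= beta1 <= {upper:.4f}")
```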
13.77 (a) [Scatter diagram: Y versus X]
b0 = 151.9153, b1 = 16.6334
13.77 cont. (b) For each additional thousand square feet of heating area, the estimated mean
assessed value of a house increases by 16.6334 thousand dollars. The estimated mean
assessed value of a house with a heating area of 0 square feet is 151.9153 thousand dollars.
However, this interpretation is not meaningful in the current setting, since a house with a
positive assessed value is very unlikely to have a heating area of 0.
(c) Ŷ = 151.9153399 + 16.6334X = 151.9153399 + 16.6334(1.75) = 181.0237 thousand
dollars
(d) r² = 0.6593. So, 65.93% of the variation in assessed value can be explained by the
variation in the heating area.
(e) [Residual plot: residuals versus heating area]
[Normal probability plot: residuals versus Z value]
Neither the residual plot nor the normal probability plot reveals any potential
violation of the linearity, equal variance, and normality assumptions.
13.77 cont. (f) t = 5.0161 with 13 degrees of freedom, p-value = 0.0002. Since p-value < 0.05,
reject H0. There is evidence of a linear relationship between assessed value and heating
area.
13.78 (a) [Scatter diagram: GPA versus GMAT]
b0 = 0.30, b1 = 0.00487
(b) 0.30 is the portion of estimated mean GPI index (GPA) that is not affected by the GMAT
score. The mean GPI index of a student with a zero GMAT score is estimated to be 0.30,
which does not have practical meaning. For each additional point on the GMAT score,
the estimated GPI increases by an average of 0.00487.
(c) Ŷ = 0.30 + 0.00487X = 0.30 + 0.00487(600) = 3.222
(d) r² = 0.7978. 79.78% of the variation in the GPI can be explained by the
variation in the GMAT score.
(e) Based on a visual inspection of the graphs of the distribution of residuals and the
residuals versus the GMAT score, there is no pattern. The model appears to be adequate.
(f) t = 8.428 > t0.05/2 = 2.1009 with 18 degrees of freedom for α = 0.05. Reject H0. There
is evidence that the fitted linear regression model is useful.
(g) 3.144 ≤ μY|X=600 ≤ 3.301
2.886 ≤ YX=600 ≤ 3.559
(h) 0.00366 ≤ β1 ≤ 0.00608
13.79 (a) [Scatter diagram]
[Residual plot: residuals versus invoices]
13.79 cont. (e)
[Invoices residual plot: residuals versus invoices]
(f) Based on a visual inspection of the graphs of the distribution of residuals and the
residuals versus the number of invoices and time, there appears to be autocorrelation in
the residuals.
(g) D = 0.69 < 1.37 = dL. There is evidence of positive autocorrelation. The
model does not appear to be adequate. The number of invoices and, hence, the time
needed to process them, tend to be high for a few days in a row during historically
heavier shopping days or during advertised sales days. This could be a possible cause
of the positive autocorrelation.
Due to the violation of the independence of errors assumption, the prediction made in (c)
is very likely to be erroneous.
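The Durbin-Watson statistic D used in (g) is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below shows the computation; the residual values are hypothetical, chosen only to illustrate the behavior, and are not the data from this problem. Values of D well below 2 point toward positive autocorrelation, and values well above 2 toward negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_i - e_{i-1})^2) / sum(e_i^2)."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals in time order (NOT the invoice data from the problem)
trending = [0.5, 0.6, 0.4, 0.3, -0.2, -0.4, -0.5, -0.3]   # neighbors resemble each other
alternating = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]           # neighbors flip sign

d_trending = durbin_watson(trending)
d_alternating = durbin_watson(alternating)

print(f"trending residuals:    D = {d_trending:.3f}")
print(f"alternating residuals: D = {d_alternating:.3f}")
```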
13.80 (a) [Scatter plot: O-ring damage index versus temperature (degrees F)]
The scatter plot does not show any clear relationship between atmospheric temperature and
O-ring damage.
13.80 cont. (b),(f)
[Scatter plot: horizontal axis temperature (degrees F)]
(c) In (b), there are 16 observations with an O-ring damage index of 0 for a variety of
temperatures. If one concentrates on these observations with no O-ring damage, there is
obviously no relationship between O-ring damage index and temperature. If all
observations are used, the observations with no O-ring damage will bias the estimated
relationship. If the intention is to investigate the relationship between the degrees of O-
ring damage and atmospheric temperature, it makes sense to focus only on the flights in
which there was O-ring damage.
(d) Prediction should not be made for an atmospheric temperature of 31 °F because it is
outside the range of the temperature variable in the data. Such a prediction would involve
extrapolation, which assumes that any relationship between the two variables continues to
hold outside the observed range of the temperature variable.
(e) Ŷ = 18.036 - 0.240X
(g) A nonlinear model is more appropriate for these data.
(h) [Temperature residual plot: residuals versus temperature]
The string of negative residuals and positive residuals that lie on a straight line with a
positive slope in the lower-right corner of the plot is a strong indication that a nonlinear
model should be used if all 23 observations are to be used in the fit.
13.81 (a)
Regression Statistics
Multiple R           0.6137
R Square             0.3766
Adjusted R Square    0.3543
Standard Error       9.1725
Observations         30

ANOVA
              df          SS          MS         F   Significance F
Regression     1   1423.1916   1423.1916   16.9156           0.0003
Residual      28   2355.7750     84.1348
Total         29   3778.9667
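The regression statistics above follow directly from the ANOVA table, which gives a quick way to check the output. The sketch below recomputes them from the printed sums of squares; the variable names are illustrative. Note that Multiple R is reported as a positive number regardless of the sign of the slope.

```python
import math

# Values copied from the ANOVA table
ssr, sse, sst = 1423.1916, 2355.7750, 3778.9667
df_reg, df_res = 1, 28
n = 30

msr = ssr / df_reg                 # mean square regression
mse = sse / df_res                 # mean square error
f_stat = msr / mse                 # F = MSR / MSE
r2 = ssr / sst                     # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)   # one independent variable
s_yx = math.sqrt(mse)              # standard error of the estimate
multiple_r = math.sqrt(r2)         # reported without sign

print(f"F = {f_stat:.4f}, r2 = {r2:.4f}, adj r2 = {adj_r2:.4f}")
print(f"S_YX = {s_yx:.4f}, Multiple R = {multiple_r:.4f}")
```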
[Residual plot: residuals versus X]
[Normal probability plot: residuals versus Z value]
The residual plot and the normal probability plot do not reveal any possible violation of
the assumptions.
13.81 cont. (f) H0: β1 = 0   H1: β1 ≠ 0
p-value = 0.0003. Reject H0 at the 5% level of significance. There is evidence that the
fitted linear regression model is useful.
(g) 66.1668 ≤ μY|X=4.5 ≤ 77.5440
(h) 52.2241 ≤ YX=4.5 ≤ 91.4867
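The confidence interval in (g) and the prediction interval in (h) are tied together: both are centered on Ŷ, with half-widths t·S_YX·sqrt(h) and t·S_YX·sqrt(1 + h), where h = 1/n + (X - X̄)²/SSX. It follows that (PI half-width)² - (CI half-width)² = (t·S_YX)². The sketch below checks this using S_YX = 9.1725 from the regression output; the critical value 2.0484 for 28 degrees of freedom is taken from a t table and is an assumption, as the text does not print it.

```python
import math

s_yx = 9.1725     # standard error of the estimate, from the regression output
t_crit = 2.0484   # t(0.025, 28) from a t table (assumption, not given in the text)

ci = (66.1668, 77.5440)   # confidence interval for the mean response, part (g)
pi = (52.2241, 91.4867)   # prediction interval for an individual response, part (h)

# Both intervals share the same center, the point prediction
y_hat_from_ci = sum(ci) / 2
y_hat_from_pi = sum(pi) / 2

ci_half = (ci[1] - ci[0]) / 2
pi_half = (pi[1] - pi[0]) / 2

# PI half-width rebuilt from the CI half-width and t * S_YX
pi_half_rebuilt = math.sqrt(ci_half ** 2 + (t_crit * s_yx) ** 2)

print(f"center from CI: {y_hat_from_ci:.4f}, center from PI: {y_hat_from_pi:.4f}")
print(f"PI half-width: {pi_half:.4f}, rebuilt: {pi_half_rebuilt:.4f}")
```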
(i) -24.2292 ≤ β1 ≤ -8.1185
(j) The “population” might be considered to be all the teams in recent years in which
baseball has been played.
(k) Other independent variables that might be considered for inclusion in the models are (i)
runs scored, (ii) hits allowed, (iii) walks allowed, (iv) number of errors, etc.
13.82 (a)
Regression Statistics
Multiple R           0.9429
R Square             0.8890
Adjusted R Square    0.8851
Standard Error       33.8396
Observations         30

ANOVA
              df            SS            MS          F   Significance F
Regression     1   256892.1202   256892.1202   224.3372           0.0000
Residual      28    32063.2465     1145.1159
Total         29   288955.3667
[Residual plot: residuals versus X]
13.82 cont. (e)
[Normal probability plot: residuals versus Z value]
The normal probability plot suggests possible departure from the normality assumption.
(f) tSTAT = 14.9779 with a p-value that is approximately zero. Reject H0 at the 5% level of
significance. There is evidence of a linear relationship between annual revenue and
franchise value.
(g) $417.5025 million ≤ μY|X=150 ≤ $448.2982 million
(h) $361.8935 million ≤ YX=150 ≤ $503.9071 million
(i) The relationship between revenue and value is stronger for NBA
franchises than for European soccer teams and Major League Baseball teams.
13.83 (a)
Regression Statistics
Multiple R           0.8900
R Square             0.7921
Adjusted R Square    0.7805
Standard Error       258.7281
Observations         20

ANOVA
              df             SS             MS         F   Significance F
Regression     1   4589582.0709   4589582.0709   68.5624           0.0000
Residual      18   1204923.7291     66940.2072
Total         19   5794505.8000
13.83 cont. (e)
[Residual plot: residuals versus X]
[Normal probability plot: residuals versus Z value]
Based on a visual inspection of the graphs of the distribution of the residuals versus
revenues, the equal variance assumption appears to be violated. The normal probability
plot suggests that the normality assumption might have been violated.
(f) The p-value is approximately zero. Reject H0 at the 5% level of significance. There is
evidence of a linear relationship between annual revenue and franchise value.
(g) $54.3111 million ≤ μY|X=150 ≤ $418.2124 million
(h) -$336.9499 million ≤ YX=150 ≤ $809.4734 million
(i) The relationship between revenue and value is stronger for NBA
franchises than for European soccer teams and Major League Baseball teams.
13.84 (a) [Scatter diagram: weight (grams) versus circumference (cms.)]
Ŷ = -2629.222 + 82.4717X
(b) For each increase of one centimeter in circumference, the estimated mean weight of a
pumpkin will increase by 82.4717 grams.
(c) Ŷ = -2629.222 + 82.4717(60) = 2319.080 grams.
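The prediction of 2319.080 grams in (c) is reproduced only when the intercept is negative, which suggests that a minus sign on 2629.222 was lost when the equation was typeset. The sketch below compares both signs; the function name `predict` is illustrative.

```python
def predict(b0, b1, x):
    """Point prediction from a fitted simple linear regression line."""
    return b0 + b1 * x

b1 = 82.4717          # slope from (a), grams per centimeter
x = 60                # circumference in centimeters
reported = 2319.080   # prediction stated in (c)

with_negative_intercept = predict(-2629.222, b1, x)
with_positive_intercept = predict(2629.222, b1, x)

print(f"intercept -2629.222: {with_negative_intercept:.3f} grams")
print(f"intercept +2629.222: {with_positive_intercept:.3f} grams")
```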
(d) There appears to be a positive relationship between weight and circumference of a
pumpkin. It is a good idea for the farmer to sell pumpkins by circumference instead of
weight because circumference is a good predictor of weight, and it is much easier to measure
the circumference of a pumpkin than its weight.
(e) r² = 0.9373. 93.73% of the variation in pumpkin weight can be explained by the
variation in circumference.
(f) [Circumference residual plot: residuals versus circumference]
13.85 (a) [Scatter plot: Y versus X]
(b)
Regression Statistics
Multiple R           0.3836
R Square             0.1472
Adjusted R Square    0.1235
Standard Error       849860.1708
Observations         38

ANOVA
              df                        SS                       MS        F   Significance F
Regression     1    4487471111709.6900       4487471111709.6900   6.2131           0.0174
Residual      36   26001443158075.5000        722262309946.5420
Total         37   30488914269785.2000
13.85 cont. (e)
[Residual plot: residuals versus X]
There is a slight increase in the variance of the residuals at the higher end of the median
family income. In general, however, the assumption of homoscedasticity seems to be
intact.
[Normal probability plot: residuals versus Z value]
The normal probability plot does not suggest violation of the normality assumption.
(f) t = 2.4926 with a p-value = 0.0174 < 0.05. Reject H0. There is enough evidence to
conclude that there is a linear relationship between one-month sales total and median
income of customer base.
(g) 7.2995 ≤ β1 ≤ 71.0400
You are 95% confident that the slope is somewhere between 7.2995 and 71.04.
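Two consistency checks tie (f) and (g) to the ANOVA output in (b). In simple linear regression F equals t², and the confidence interval for the slope is b1 ± t·Sb1. The sketch below makes two assumptions that are not in the text: the slope b1 is recovered as the midpoint of the interval in (g), and the critical value for 36 degrees of freedom is taken as 2.0281.

```python
t_stat = 2.4926                      # t statistic from (f)
f_reported = 6.2131                  # F from the ANOVA table in (b)
ci_lower, ci_upper = 7.2995, 71.0400 # interval from (g)
t_crit = 2.0281                      # t(0.025, 36), assumed from a t table

# Check 1: F = t^2 when there is a single independent variable
f_from_t = t_stat ** 2

# Check 2: rebuild the interval from the slope and its standard error
b1 = (ci_lower + ci_upper) / 2       # slope recovered as the interval midpoint (assumption)
s_b1 = b1 / t_stat                   # since t = b1 / S_b1
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1

print(f"t^2 = {f_from_t:.4f} versus reported F = {f_reported}")
print(f"{lower:.4f} <= beta1 <= {upper:.4f}")
```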
13.86 (a) [Scatter plot: sales versus age]
(b) Ŷ = 931626.16 + 21782.76X
(c) Since median age of customer base cannot be 0, b0 just captures the portion of the latest
one-month mean sales total that varies with factors other than median age.
b1 = 21782.76 means that as the median age of customer base increases by one year, the
estimated mean latest one-month sales total will increase by $21782.76.
(d) r² = 0.0017. Only 0.17% of the total variation in the franchise's latest one-month sales
total can be explained by using the median age of customer base.
(e) [Residual plot: residuals versus age]
The residuals are spread fairly evenly across the range of median age.
(f) H0: ρ = 0   H1: ρ ≠ 0
Test statistic: t = r / sqrt((1 - r²)/(n - 2)) = 0.2482
Decision rule: Reject H0 when |t| > 2.0281.
Decision: Since t = 0.2482 is less than the upper critical bound 2.0281, do not reject H0.
There is not enough evidence to conclude that there is a linear relationship between
one-month sales total and median age of customer base.
(g) b1 ± tα/2 Sb1 = 21782.76354 ± 2.0281(87749.63)
-156181.50 ≤ β1 ≤ 199747.02
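The test statistic in (f) and the interval in (g) can be recomputed from the printed values. This sketch assumes n = 38, which is taken from the regression output of problem 13.85 and matches the critical value 2.0281 for 36 degrees of freedom; the text does not state n for this problem. Because r² is given only to four decimals, the recomputed t differs slightly from 0.2482, and the interval endpoints differ by less than one dollar because the critical value is rounded.

```python
import math

def t_from_r2(r2, n, sign=1):
    """t statistic for H0: rho = 0, computed as r / sqrt((1 - r^2) / (n - 2))."""
    r = sign * math.sqrt(r2)
    return r / math.sqrt((1 - r2) / (n - 2))

r2 = 0.0017       # from (d)
n = 38            # assumed sample size (see lead-in)
t_crit = 2.0281   # critical value from the decision rule in (f)

# Slope is positive in (b), so r takes a positive sign
t_stat = t_from_r2(r2, n, sign=1)
reject_h0 = abs(t_stat) > t_crit

# Confidence interval for the slope from the values printed in (g)
b1, s_b1 = 21782.76354, 87749.63
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
print(f"{lower:.2f} <= beta1 <= {upper:.2f}")
```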
13.87 (a) [Scatter diagram: sales versus HS]
There appears to be some positive linear relationship between total sales and percentage
of customer base with high school diploma.
(b) Ŷ = -2969741.23 + 59660.09X
(c) b1 = 59660.09 indicates that as the percent of customer base with a high school diploma
increases by one, the estimated mean latest one-month sales total will increase by
$59660.09.
(d) r² = 0.2405. 24.05% of the total variation in the franchise's latest one-month sales total
can be explained by the percentage of customer base with a high school diploma.
(e) [HS residual plot: residuals versus HS]
13.88 (a) [Scatter diagram: sales versus College]
[Residual plot: residuals versus College]
13.89 (a) [Scatter diagram: sales versus growth]
It is not obvious that there is any linear relationship between total sales and annual
population growth rate of customer base over the past 10 years.
(b) Ŷ = 1595571.48 + 26833.54X
(c) b0 = 1595571 means the estimated mean latest one-month sales total is $1595571 when
the annual population growth rate of customer base over the past 10 years is zero.
b1 = 26833.54 means that as the annual population growth rate increases by 1%, the
estimated mean latest one-month sales total will increase by $26833.54.
(d) r² = 0.0126. Only 1.26% of the total variation in the franchise's latest one-month sales
total can be explained by the annual population growth rate of customer base over the
past 10 years.
(e) [Growth residual plot: residuals versus growth]
There seems to be a diamond-shaped pattern in the residuals and, hence, a
violation of the homoscedasticity assumption. The variance is larger when the growth
rate is closer to zero.
(f) H0: ρ = 0   H1: ρ ≠ 0
Test statistic: t = r / sqrt((1 - r²)/(n - 2)) = 0.6776
Decision rule: Reject H0 when |t| > 2.0281.
Decision: Since t = 0.6776 is less than the upper critical bound 2.0281, do not reject H0.
There is not enough evidence to conclude that there is a linear relationship between
one-month sales total and the annual population growth rate of customer base over the past 10
years.
(g) b1 ± tα/2 Sb1 = 26833.54 ± 2.0281(39601.427)
-53481.77 ≤ β1 ≤ 107148.86
13.90 (a) The correlation between compensation and the investment return is 0.1457.
(b) H0: ρ = 0 vs. H1: ρ ≠ 0
The tSTAT value is 2.0404 with a p-value = 0.0427 < 0.05. Reject H0. The correlation
between compensation and the investment return is statistically significant.
(c) The small correlation between compensation and stock performance was surprising (or
maybe it shouldn’t have been!).