
CHAPTER 7

MULTIPLE REGRESSION
ANSWERS TO ODD-NUMBERED PROBLEMS AND CASES
1. A good predictor variable is highly related to the dependent variable but not too
highly related to other predictor variables.
3. The net regression coefficient measures the average change in the dependent variable per
unit change in the relevant independent variable, holding the other independent variables
constant.
5. Ŷ = 7.52 + 3(20) - 12.2(7) = -17.88
7. a. Each variable is perfectly related to itself. The correlation is always 1.
b. The entries in a correlation matrix reflected about the main diagonal are the
same. For example, r32 = r23.
c. Variables 5 and 6, with correlation coefficients of .79 and .70, respectively.
d. The r14 = -.51 indicates a negative linear relationship.
e. Yes. Variables 5 and 6 are to some extent collinear, r56 = .69.
f. Models that include variables 4 and 6 or variables 2 and 5 are possibilities. The
predictor variables in these models are related to the dependent variable and not
too highly related to each other.
g. Variable 5.
9. a. Correlations: Food, Income, Size
         Food    Income
Income   0.884
Size     0.737   0.867
Income is highly correlated with Food (expenditures) and, to a lesser extent,
so is Size. However, the predictor variables Income and Size are themselves
highly correlated, indicating there is a potential multicollinearity problem.
b. The regression equation is
Food = 3.52 + 2.28 Income - 0.41 Size
Predictor   Coef     SE Coef   T      P      VIF
Constant    3.519    3.161     1.11   0.302
Income      2.2776   0.8126    2.80   0.026  4.016
Size        -0.411   1.236     -0.33  0.749  4.016
S = 2.89279  R-Sq = 78.5%  R-Sq(adj) = 72.3%
When income is increased by one thousand dollars holding family size constant, the
average increase in annual food expenditures is 228 dollars. When family size is
increased by one person holding income constant, the average decrease in annual
food expenditures is 41 dollars. Since family size is positively related to food
expenditures, r = .737, it doesn't make sense that a decrease in expenditures
would occur.
c. Multicollinearity is a problem as indicated by VIF's of about 4.0. Size should be
dropped from the regression function and the analysis redone with only Income
as the predictor variable.
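For instructors who want students to reproduce this kind of check outside Minitab, the following is a minimal Python sketch (pandas and statsmodels) that fits the part b model and computes the VIFs. The file name food.csv and the exact column names are assumptions for illustration, not part of the problem data.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("food.csv")          # hypothetical file: Food, Income, Size

# Fit Food on Income and Size, as in part b
fit = smf.ols("Food ~ Income + Size", data=df).fit()
print(fit.summary())

# VIF for each predictor; values near 4 reflect the Income/Size overlap noted above
X = sm.add_constant(df[["Income", "Size"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))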
11. a. Scatter diagram follows. Female drivers indicated by solid circles, male drivers by
diamonds.


b. The regression equation is:
Ŷ = 25.5 - 1.04 X1 + 1.21 X2
For a given age of car, female drivers are expected to get about 1.2 more miles
per gallon than male drivers.
c. Fitted line for female drivers (X2 = 1) has equation:
Ŷ = (25.5 + 1.21) - 1.04 X1 = 26.71 - 1.04 X1
Fitted line for male drivers (X2 = 0) has equation:
Ŷ = 25.5 - 1.04 X1
(Parallel lines with different intercepts)

d. The line falls "between" the points representing female drivers and the points
representing male drivers. A single straight line equation over-predicts mileage for
male drivers and under-predicts mileage for female drivers. It is important to include
the gender variable in this regression function.
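A dummy-variable fit of this kind is easy to reproduce in Python; the sketch below is illustrative only (the file mileage.csv and the columns MPG, Age, and Female, with Female = 1 for female drivers, are assumed names). It prints the two parallel fitted lines implied by the single equation in part b.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("mileage.csv")       # hypothetical: MPG, Age (of car), Female (0/1)

fit = smf.ols("MPG ~ Age + Female", data=df).fit()
b0 = fit.params["Intercept"]
b_age = fit.params["Age"]
b_fem = fit.params["Female"]

# Same slope on Age; intercepts differ by the dummy coefficient
print(f"Male drivers:   MPG-hat = {b0:.2f} {b_age:+.2f} Age")
print(f"Female drivers: MPG-hat = {b0 + b_fem:.2f} {b_age:+.2f} Age")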
13. a. Correlations: Sales, Outlets, Auto, Income
         Sales   Outlets  Auto
Outlets  0.739
Auto     0.548   0.670
Income   0.936   0.556   0.281

The regression equation is
Sales = -3.92 + 0.00238 Outlets + 0.457 Auto + 0.401 Income
Predictor  Coef      SE Coef   T      P      VIF
Constant   -3.918    2.290     -1.71  0.131
Outlets    0.002384  0.001572  1.52   0.173  2.473
Auto       0.4574    0.1675    2.73   0.029  1.854
Income     0.40058   0.03779   10.60  0.000  1.481
S = 2.66798  R-Sq = 97.4%  R-Sq(adj) = 96.2%
Analysis of Variance
Source          DF  SS       MS      F      P
Regression      3   1843.40  614.47  86.32  0.000
Residual Error  7   49.83    7.12
Total           10  1893.23
Personal income by region makes a significant contribution to sales. Adding Income
to the regression function results in an increase in R² from 55% to 97%. In addition,
the t value and corresponding p value for Income indicate the coefficient of this
variable in the population is different from 0, given the predictor variables Outlets
and Auto. Notice, however, that the regression should be rerun after deleting the
insignificant predictor variable Outlets. The correlation matrix and the VIF numbers
suggest Outlets is multicollinear with Auto and Income.
b. Predicted Values for New Observations
New Obs  Fit     SE Fit  95% CI             95% PI
1        27.306  1.878   (22.865, 31.746)   (19.591, 35.020)
Values of Predictors for New Observations
New Obs  Outlets  Auto  Income
1        2500     20.2  40.0
Annual sales for region 12 are predicted to be 27.306 million.
c. The standard error of the estimate has been reduced to 2.67 from 10.3 and R² has
increased to 97%. The 95% PI in part b is fairly narrow. The forecast for region 12
sales in part b should be accurate.
d. The best choice is to drop Outlets from the regression function. If this is done,
the regression equation is
Sales = -4.03 + 0.621 Auto + 0.430 Income
Predictor  Coef     SE Coef  T      P      VIF
Constant   -4.027   2.468    -1.63  0.141
Auto       0.6209   0.1382   4.49   0.002  1.086
Income     0.43017  0.03489  12.33  0.000  1.086
S = 2.87655  R-Sq = 96.5%  R-Sq(adj) = 95.6%
Measures of fit are nearly the same as those for the full model and there is no longer
a multicollinearity problem.
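The fit, confidence interval, and prediction interval reported in part b can also be generated programmatically; a sketch follows, assuming the region data sit in a file sales.csv with columns Sales, Outlets, Auto, and Income (illustrative names only).

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("sales.csv")         # hypothetical file with the sample regions
fit = smf.ols("Sales ~ Outlets + Auto + Income", data=df).fit()

# New observation for region 12
new = pd.DataFrame({"Outlets": [2500], "Auto": [20.2], "Income": [40.0]})
pred = fit.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper",   # fit and 95% CI
            "obs_ci_lower", "obs_ci_upper"]])           # 95% PI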
15. a. Scatter plot for cash purchases versus number of items (rectangles) and credit card
purchases versus number of items (solid circles) follows.

b. Minitab regression output:

Notice that for a given number of items, gross sales from cash purchases are estimated
to be about $18.60 less than gross sales from credit card purchases.

c. The regression in part b is significant. The number of items sold and whether
the purchases were cash or credit card explains approximately 83% of the
variation in gross sales. The predictor variable Items is clearly significant. The
coefficient of the dummy variable X2 is significantly different from 0 at the
10% level but not at the 5% level. From the residual plots below we see that
there are a few large residuals (see, in particular, cash sales for day 25 and credit
card sales for day 1); but overall, the plots do not indicate any serious departures
from the usual regression assumptions.
d. Ŷ = 13.61 + 5.99(25) - 18.61 = $145

e. s_y.x's = 30.98  df = 47  t.025 ≈ Z.025 = 1.96
95% (large sample) prediction interval:
145 ± 1.96(30.98) = ($84, $206)
f. Fitted function in part b is effectively two parallel straight lines given by the
equations:
Cash purchases:
Ŷ = 13.61 + 5.99 Items - 18.61 = -4.98 + 5.99 Items
Credit card purchases:
Ŷ = 13.61 + 5.99 Items
If we fit separate straight lines to the two types of purchases we get:
Cash purchases:
Ŷ = -.60 + 5.78 Items   R² = 90.5%
Credit card purchases:
Ŷ = 10.02 + 6.46 Items   R² = 66.0%
Predictions for cash sales and credit card sales will not be too much different
for the two procedures (one prediction equation or two individual equations).
In terms of R², the single equation model falls between the fits of the separate
models for cash purchases and credit card purchases but closer to the higher
number for cash purchases. For convenience and overall good fit, prefer the
single equation with the dummy variable.
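The comparison in part f (one equation with a dummy variable versus two separate straight lines) can be scripted directly; the sketch below assumes a file purchases.csv with columns Sales, Items, and Cash (1 for cash-purchase days), all illustrative names.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("purchases.csv")     # hypothetical: Sales, Items, Cash (0/1)

# Single equation with a dummy variable (two parallel lines)
both = smf.ols("Sales ~ Items + Cash", data=df).fit()

# Separate straight lines for each purchase type
cash = smf.ols("Sales ~ Items", data=df[df["Cash"] == 1]).fit()
card = smf.ols("Sales ~ Items", data=df[df["Cash"] == 0]).fit()

for label, m in [("dummy model", both), ("cash only", cash), ("credit card only", card)]:
    print(f"{label}: R-sq = {m.rsquared:.3f}")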
17. a. View will enter the stepwise regression function first since it has the largest
correlation with Price. After that, the order of entry is difficult to determine from
the correlation matrix alone. Several of the predictor variable pairs are fairly highly
correlated, so multicollinearity could be a problem. For example, once View is in the
model, Elevation may not enter (be significant). Slope and Area are correlated, so
it may be that only one of these predictors is required.
b. As pointed out in part a, it is difficult to determine the results of a stepwise program.
However, a two-predictor model will probably work as well as any in this case.
Potential two-predictor models include View and Area or View and Slope.

19. Stepwise regression results, with significance level .05 to enter and leave the
regression function, follow.
Alpha-to-Enter: 0.05  Alpha-to-Remove: 0.05
Response is Y on 3 predictors, with N = 20
Step          1
Constant      -26.24
X3            31.4
T-Value       3.30
P-Value       0.004
S             14.6
R-Sq          37.71
R-Sq(adj)     34.25
The "best" regression model relates final exam score to the single predictor
variable grade point average.
All possible regression results are summarized in the following table.
Predictor Variables   R²
X1                    .295
X2                    .301
X3                    .377
X1, X2                .404
X1, X3                .452
X2, X3                .460
X1, X2, X3            .498
The R² criterion would suggest using all three predictor variables. However, the
results in problem 7.18 suggest there is a multicollinearity problem with three
predictors. The best two independent variable model uses predictors X2 and X3.
When this model is fit, X2 is not required. We end up with a model involving the
single predictor X3, the model selected by the stepwise procedure.
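The all-possible-regressions table above takes only a short loop to produce; the sketch below assumes the scores are in exams.csv with columns Y, X1, X2, and X3 (illustrative names).

from itertools import combinations
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("exams.csv")         # hypothetical: Y, X1, X2, X3
predictors = ["X1", "X2", "X3"]

# Fit every non-empty subset of predictors and report R-squared
for k in range(1, len(predictors) + 1):
    for subset in combinations(predictors, k):
        fit = smf.ols("Y ~ " + " + ".join(subset), data=df).fit()
        print(", ".join(subset), round(fit.rsquared, 3))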
21. Scatter diagram with fitted quadratic regression function:

a. & b. The regression equation is
Assets = 7.61 - 0.0046 Accounts + 0.000034 Accounts**2
Predictor    Coef        SE Coef     T      P      VIF
Constant     7.608       8.503       0.89   0.401
Accounts     -0.00457    0.02378     -0.19  0.853  25.965
Accounts**2  0.00003361  0.00000893  3.76   0.007  25.965
S = 12.4117  R-Sq = 97.9%  R-Sq(adj) = 97.3%
Analysis of Variance
Source          DF  SS     MS     F       P
Regression      2   51130  25565  165.95  0.000
Residual Error  7   1078   154
Total           9   52208
The regression is significant (F = 165.95, p value = .000). Given Accounts in the
model, Accounts**2 is significant (t value = 3.76, p value = .007). Here Accounts
could be dropped from the regression function and the analysis repeated with only
Accounts**2 as the predictor variable. If this is done, R² and the coefficient of
Accounts**2 remain virtually unchanged.
c. Dropping Accounts**2 from the model gives:
The regression equation is
Assets = -17.1 + 0.0832 Accounts
Predictor  Coef      SE Coef   T      P
Constant   -17.121   8.778     -1.95  0.087
Accounts   0.083205  0.007592  10.96  0.000
S = 20.1877  R-Sq = 93.8%  R-Sq(adj) = 93.0%
The coefficient of Accounts changes from the quadratic model to the straight
line model because, not surprisingly, Accounts and Accounts**2 are highly
collinear (VIF = 25.965 in the quadratic model).
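A quadratic fit like the one in parts a and b can be specified with the I() operator in a model formula; the sketch below assumes a file banks.csv with columns Assets and Accounts (illustrative names) and also prints the VIFs that expose the Accounts/Accounts**2 collinearity.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("banks.csv")         # hypothetical: Assets, Accounts

# Quadratic model: Assets on Accounts and Accounts squared
quad = smf.ols("Assets ~ Accounts + I(Accounts**2)", data=df).fit()
print(quad.summary())

# VIFs for the two (highly collinear) terms
X = sm.add_constant(pd.DataFrame({"Accounts": df["Accounts"],
                                  "Accounts2": df["Accounts"] ** 2}))
for i in (1, 2):
    print(X.columns[i], variance_inflation_factor(X.values, i))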
23. Using the final model from problem 22 with H2S = 7.3 and Lactic = 1.85:
Predicted Values for New Observations
New Obs  Fit    SE Fit  95% CI           95% PI
1        32.36  3.02    (25.78, 38.95)   (16.69, 48.04)
Since s_y.x's = 6.53 and t.025 = 2.179, a large sample 95% prediction interval is:
32.36 ± 2.179(6.53) = (18.13, 46.59)
Notice the large sample 95% prediction interval is not too much different than the
actual 95% prediction interval (PI) above.
Although the fit in this case is relatively good, the standard error of the estimate is
somewhat large, so there is a fair amount of uncertainty associated with any forecast.
It may be a good idea to collect more data and, perhaps, investigate additional
predictor variables.
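The hand calculation above (point forecast plus or minus a t multiplier times the standard error of the estimate) is easy to verify with a few lines; the numbers are taken from the solution, and 12 residual degrees of freedom are assumed because that is what the 2.179 multiplier implies.

from scipy import stats

y_hat = 32.36                      # point forecast from the fitted model
s = 6.53                           # standard error of the estimate
t = stats.t.ppf(0.975, 12)         # about 2.179

lower, upper = y_hat - t * s, y_hat + t * s
print(f"approximate 95% PI: ({lower:.2f}, {upper:.2f})")   # roughly (18.1, 46.6)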
CASE 7-1: THE BOND MARKET
The actual data for this case are supplied in Appendix A. Students can either be asked to
respond to the question at the end of the case or they can be assigned to run and analyze the data.
One approach that I have used successfully is to assign one group of students the role of asking
Judy Johnson's questions and another group the responsibility for Ron's answers.
1. What questions do you think Judy will have for Ron? The students always seem
to come up with questions that Ms. Johnson will ask. The key is that Ron should be able
to answer them. Possible issues include:
Are all the predictor variables in the final model required? Is a simpler model
with fewer predictor variables feasible?
Do the estimated regression coefficients in the final model make sense and are
they reliable?
Four observations have large standardized residuals. Is this a cause for concern?
Is the final model a good one and can it be confidently used to forecast the
utility's bond interest rate at the time of issuance?
Is multiple regression the appropriate statistical method to use for this situation?
CASE 7-2: AAA WASHINGTON
1. The multiple regression model that includes both unemployment rate and average
monthly temperature is shown below. Temperature is the only good predictor variable.

3. Unemployment rate lagged 11 months is a good predictor of emergency road service
calls. Unemployment rate lagged 3 months is not a good predictor. The Minitab output
with Temp and Lag11Rate is given below.
The regression equation is
Calls = 21405 - 88.4 Temp + 756 Lag11Rate
Predictor  Coef    SE Coef  T      P
Constant   21405   1830     11.70  0.000
Temp       -88.36  19.21    -4.60  0.000
Lag11Rate  756.3   172.0    4.40   0.000
S = 1116.80  R-Sq = 64.1%  R-Sq(adj) = 62.8%
Analysis of Variance
Source          DF  SS         MS        F      P
Regression      2   120430208  60215104  48.28  0.000
Residual Error  54  67351116   1247243
Total           56  187781324
The regression is significant. The signs on the coefficients of the independent variables
make sense. The coefficient of each independent variable is significantly different
from 0 (t = -4.6, p value = .000 and t = 4.4, p value = .000, respectively).
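Constructing the lagged predictor is the only non-routine step in this model; a pandas sketch follows, assuming a monthly file aaa.csv with columns Calls, Temp, and Rate (illustrative names).

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("aaa.csv")           # hypothetical: monthly Calls, Temp, Rate

# Unemployment rate lagged 11 months becomes a new column
df["Lag11Rate"] = df["Rate"].shift(11)

# Drop the first 11 months (no lagged value available) and fit the model
fit = smf.ols("Calls ~ Temp + Lag11Rate", data=df.dropna()).fit()
print(fit.summary())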
CASE 7-3: FANTASY BASEBALL (A)
1. The regression is significant. The R² of 78.1% looks good. The t statistic for each
of the predictor variables is large with a very small p-value. The VIF's are relatively
small for the three predictors, indicating that multicollinearity is not a problem. The
residual plots shown in Figure 7-4 indicate that this model is valid. Dr. Hanke has
developed a good model to forecast ERA.
3. The regression results with WHIP replacing OBA as a predictor variable follow.
The residual plots are very similar to those in Figure 7-4.
The regression equation is
ERA = -2.81 + 4.43 WHIP + 0.101 CMD + 0.862 HR/9
Predictor  Coef     SE Coef  T      P      VIF
Constant   -2.8105  0.4873   -5.77  0.000
WHIP       4.4333   0.3135   14.14  0.000  1.959
CMD        0.10076  0.04254  2.37   0.019  1.793
HR/9       0.8623   0.1195   7.22   0.000  1.135
S = 0.439289  R-Sq = 77.9%  R-Sq(adj) = 77.4%
Analysis of Variance
Source          DF   SS      MS      F       P
Regression      3    91.167  30.389  157.48  0.000
Residual Error  134  25.859  0.193
Total           137  117.026
The fit and the adequacy of this model are virtually indistinguishable from the
corresponding model with OBA instead of WHIP as a predictor. The estimated
coefficients of CMD and HR/9 are nearly the same in both models. Both models are
good. The original model with OBA as a predictor has a slightly higher R² and a
slightly smaller standard error of the estimate. Using these criteria, it is the preferred
model.
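The side-by-side comparison described here (R-squared, adjusted R-squared, and the standard error of the estimate for the OBA model versus the WHIP model) takes only a few lines; the file and column names below, including HR9 in place of HR/9, are assumptions about how the pitcher data might be stored.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pitchers.csv")      # hypothetical column names

models = {"OBA model":  smf.ols("ERA ~ OBA + CMD + HR9", data=df).fit(),
          "WHIP model": smf.ols("ERA ~ WHIP + CMD + HR9", data=df).fit()}

for label, m in models.items():
    s = np.sqrt(m.mse_resid)          # standard error of the estimate
    print(f"{label}: R-sq = {m.rsquared:.3f}, adj R-sq = {m.rsquared_adj:.3f}, S = {s:.3f}")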
CASE 7-4: FANTASY BASEBALL (B)
The project may not be doomed to failure. A lot can be learned from investigating the
influence of the various independent variables on WINS. However, the best regression model
does not explain a large percentage of the variation in WINS, R² = 34%, so the experts have
a point. There will be a lot of uncertainty associated with any forecast of WINS. The stepwise
selection of the best predictor variables and the subsequent full regression output follow.
Stepwise Regression: WINS versus THROWS, ERA, ...
Alpha-to-Enter: 0.05  Alpha-to-Remove: 0.05
Response is WINS on 10 predictors, with N = 138
Step          1        2
Constant      20.531   5.543
ERA           -2.16    -2.01
T-Value       -7.00    -6.80
P-Value       0.000    0.000
RUNS                   0.0182
T-Value                3.86
P-Value                0.000
S             3.33     3.17
R-Sq          26.51    33.83
R-Sq(adj)     25.97    32.85
The regression equation is
WINS = 5.54 - 2.01 ERA + 0.0182 RUNS
Predictor  Coef      SE Coef   T      P      VIF
Constant   5.543     4.108     1.35   0.179
ERA        -2.0110   0.2959    -6.80  0.000  1.017
RUNS       0.018170  0.004702  3.86   0.000  1.017
S = 3.17416  R-Sq = 33.8%  R-Sq(adj) = 32.8%
Analysis of Variance
Source          DF   SS       MS      F      P
Regression      2    695.31   347.66  34.51  0.000
Residual Error  135  1360.17  10.08
Total           137  2055.48
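Stepwise selection is a Minitab feature, but a simple forward-selection loop with a 0.05 entry threshold captures the same idea; the sketch below is generic and assumes a file teams.csv whose WINS column is the response and whose remaining columns are the ten candidate predictors (all names are illustrative).

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("teams.csv")         # hypothetical: WINS plus candidate predictors
y = df["WINS"]
candidates = [c for c in df.columns if c != "WINS"]
selected = []

# Forward selection: at each step add the predictor with the smallest p-value, if below .05
while candidates:
    pvals = {}
    for c in candidates:
        X = sm.add_constant(df[selected + [c]])
        pvals[c] = sm.OLS(y, X).fit().pvalues[c]
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.05:
        break
    selected.append(best)
    candidates.remove(best)

print("selected predictors:", selected)
print(sm.OLS(y, sm.add_constant(df[selected])).fit().summary())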

