
Statistics for Business & Economics, 6th edition

Chapter 13:
Multiple Regression

13.1 Given the following estimated linear model: y = 10 + 3x1 + 2x2 + 4x3

a. y = 10 + 3(20) + 2(11) + 4(10) = 132
b. y = 10 + 3(15) + 2(14) + 4(20) = 163
c. y = 10 + 3(35) + 2(19) + 4(25) = 253
d. y = 10 + 3(10) + 2(17) + 4(30) = 194
13.2 Given the following estimated linear model: y = 10 + 5x1 + 4x2 + 2x3
a. y = 10 + 5(20) + 4(11) + 2(10) = 174
b. y = 10 + 5(15) + 4(14) + 2(20) = 181
c. y = 10 + 5(35) + 4(19) + 2(25) = 311
d. y = 10 + 5(10) + 4(17) + 2(30) = 188
13.3 Given the following estimated linear model: y = 10 + 2x1 + 12x2 + 8x3
a. y = 10 + 2(20) + 12(11) + 8(10) = 262
b. y = 10 + 2(15) + 12(24) + 8(20) = 488
c. y = 10 + 2(20) + 12(19) + 8(25) = 478
d. y = 10 + 2(10) + 12(9) + 8(30) = 378
13.4 Given the following estimated linear model: y = 10 + 2x1 + 12x2 + 8x3
a. y increases by 8
b. y increases by 8
c. y increases by 24
d. y increases by 8
13.5 Given the following estimated linear model: y = 10 - 2x1 + 14x2 - 6x3
a. y decreases by 8
b. y decreases by 6
c. y increases by 28


13.6

The estimated regression slope coefficients are interpreted as follows:

b1 = .661: All else equal, an increase in the plane's top speed by one mph
will increase the expected number of hours in the design effort by an
estimated .661 million, or 661 thousand, worker-hours.
b2 = .065: All else equal, an increase in the plane's weight by one ton will
increase the expected number of hours in the design effort by an estimated
.065 million, or 65 thousand, worker-hours.
b3 = -.018: All else equal, an increase in the percentage of parts in common
with other models will result in a decrease in the expected number of hours
in the design effort by an estimated .018 million, or 18 thousand, worker-hours.

13.7

The estimated regression slope coefficients are interpreted as follows:

b1 = .057: All else equal, an increase of one unit in the change over the
quarter in bond purchases by financial institutions results in an estimated
.057 increase in the change over the quarter in the bond interest rate.
b2 = -.065: All else equal, an increase of one unit in the change over the
quarter in bond sales by financial institutions results in an estimated .065
decrease in the change over the quarter in the bond interest rate.

13.8

a. b1 = .052: All else equal, an increase of one hundred dollars in
weekly income results in an estimated .052 quarts per week increase in
milk consumption.
b2 = 1.14: All else equal, an increase in family size by one person will
result in an estimated increase in milk consumption of 1.14 quarts per week.
b. The intercept term b0 of -.025 is the estimated milk consumption in
quarts per week given that the family's weekly income is 0 dollars and
there are 0 members in the family. This is likely extrapolating beyond
the observed data series and is not a useful interpretation.

13.9

a. b1 = .653: All else equal, a one unit increase in the average number
of meals eaten per week will result in an estimated .653 pounds gained
during freshman year.
b2 = -1.345: All else equal, a one unit increase in the average number of
hours of exercise per week will result in an estimated 1.345 pound
weight loss.
b3 = .613: All else equal, a one unit increase in the average number of
beers consumed per week will result in an estimated .613 pound weight
gain.
b. The intercept term b0 of 7.35 is the estimated amount of weight gain
during the freshman year given that the meals eaten is 0, hours exercise
is 0 and there are no beers consumed per week. This is likely
extrapolating beyond the observed data series and is not a useful
interpretation.
13.10 Compute the slope coefficients for the model yi = b0 + b1 x1i + b2 x2i

Given that b1 = sy(rx1y - rx1x2 rx2y) / [sx1(1 - r²x1x2)] and
b2 = sy(rx2y - rx1x2 rx1y) / [sx2(1 - r²x1x2)]:

a. b1 = 400(.60 - (.50)(.70)) / [200(1 - .50²)] = 2.000,
   b2 = 400(.70 - (.50)(.60)) / [200(1 - .50²)] = 3.200
b. b1 = 400(-.60 - (-.50)(.70)) / [200(1 - (-.50)²)] = -.667,
   b2 = 400(.70 - (-.50)(-.60)) / [200(1 - (-.50)²)] = 1.067
c. b1 = 400(.40 - (.80)(.45)) / [200(1 - (.80)²)] = .083,
   b2 = 400(.45 - (.80)(.40)) / [200(1 - (.80)²)] = .271
d. b1 = 400(.60 - (-.60)(-.50)) / [200(1 - (-.60)²)] = .9375,
   b2 = 400(-.50 - (-.60)(.60)) / [200(1 - (-.60)²)] = -.4375
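These slope formulas translate directly into code. A minimal Python sketch; the inputs below use the part d values, with the signs of the correlations inferred from the printed answers, so treat the sign assignments as an assumption:

```python
def slope_coefs(s_y, s_x1, s_x2, r_x1y, r_x2y, r_x1x2):
    """Two-predictor OLS slopes from standard deviations and correlations."""
    denom = 1 - r_x1x2 ** 2
    b1 = s_y * (r_x1y - r_x1x2 * r_x2y) / (s_x1 * denom)
    b2 = s_y * (r_x2y - r_x1x2 * r_x1y) / (s_x2 * denom)
    return b1, b2

# Part d inputs (signed correlations are inferred, not printed explicitly)
b1, b2 = slope_coefs(400, 200, 200, 0.60, -0.50, -0.60)
print(round(b1, 4), round(b2, 4))  # 0.9375 -0.4375
```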

13.11 a. When the correlation between X1 and X2 is 0, the slope coefficient of the
X1 term simplifies to the slope coefficient of the bivariate regression. Start with
equation 13.4: b1 = sy(rx1y - rx1x2 rx2y) / [sx1(1 - r²x1x2)]. Note that if the
correlation between X1 and X2 is zero, then the second terms in both the numerator
and the denominator are zero and the formula algebraically reduces to
b1 = sy rx1y / sx1,
which is the equivalent of the bivariate slope coefficient (see box on bottom
of page 380).
b. When the correlation between X1 and X2 is 1, the denominator factor
(1 - r²x1x2) goes to 0 and the slope coefficient is undefined.
13.12 a. Electricity sales as a function of number of customers and price
Regression Analysis: salesmw2 versus priclec2, numcust2
The regression equation is
salesmw2 = - 647363 + 19895 priclec2 + 2.35 numcust2
Predictor        Coef   SE Coef      T      P
Constant      -647363    291734  -2.22  0.030
priclec2        19895     22515   0.88  0.380
numcust2       2.3530    0.2233  10.54  0.000

S = 66399   R-Sq = 79.2%   R-Sq(adj) = 78.5%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       2  1.02480E+12  5.12400E+11  116.22  0.000
Residual Error  61  2.68939E+11   4408828732
Total           63  1.29374E+12


All else equal, for every one unit increase in the price of electricity, we estimate
that sales will increase by 19895 mwh. Note that this estimated coefficient is not
significantly different from zero (p-value = .380).
All else equal, for every additional residential customer who uses electricity in the
heating of their home, we estimate that sales will increase by 2.353 mwh.
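Each t statistic in the Minitab output is just the coefficient divided by its standard error; a quick pure-Python check of the table above:

```python
# (coef, se_coef) pairs read off the Minitab output above
rows = {
    "Constant": (-647363, 291734),
    "priclec2": (19895, 22515),
    "numcust2": (2.3530, 0.2233),
}
for name, (coef, se) in rows.items():
    # reproduces Minitab's T column: -2.22, 0.88, 10.54
    print(f"{name}: t = {coef / se:.2f}")
```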
b. Electricity sales as a function of number of customers
Regression Analysis: salesmw2 versus numcust2
The regression equation is
salesmw2 = - 410202 + 2.20 numcust2
Predictor        Coef   SE Coef      T      P
Constant      -410202    114132  -3.59  0.001
numcust2       2.2027    0.1445  15.25  0.000

S = 66282   R-Sq = 78.9%   R-Sq(adj) = 78.6%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       1  1.02136E+12  1.02136E+12  232.48  0.000
Residual Error  62  2.72381E+11   4393240914
Total           63  1.29374E+12

An additional residential customer will add 2.2027 mwh to electricity sales.


The two models have roughly equivalent explanatory power; therefore, adding
price as a variable does not add a significant amount of explanatory power to the
model. There appears to be a problem of high correlation between the independent
variables of price and customers.
c.
Regression Analysis: salesmw2 versus priclec2, degrday2
The regression equation is
salesmw2 = 2312260 - 165275 priclec2 + 56.1 degrday2
Predictor        Coef   SE Coef      T      P
Constant      2312260    148794  15.54  0.000
priclec2      -165275     24809  -6.66  0.000
degrday2        56.06     60.37   0.93  0.357

S = 110725   R-Sq = 42.2%   R-Sq(adj) = 40.3%

Analysis of Variance
Source          DF           SS           MS      F      P
Regression       2  5.45875E+11  2.72938E+11  22.26  0.000
Residual Error  61  7.47863E+11  12260053296
Total           63  1.29374E+12

All else equal, an increase in the price of electricity will reduce electricity sales by
165,275 mwh.
All else equal, an increase in the degree days (departure from normal weather) by
one unit will increase electricity sales by 56.06 mwh.
Note that the coefficient on the price variable is now negative, as expected, and it
is significantly different from zero (p-value = .000).


d.
Regression Analysis: salesmw2 versus Yd872, degrday2
The regression equation is
salesmw2 = 293949 + 326 Yd872 + 58.4 degrday2
Predictor       Coef  SE Coef      T      P
Constant      293949    67939   4.33  0.000
Yd872         325.85    21.30  15.29  0.000
degrday2       58.36    35.79   1.63  0.108

S = 66187   R-Sq = 79.3%   R-Sq(adj) = 78.7%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       2  1.02652E+12  5.13259E+11  117.16  0.000
Residual Error  61  2.67221E+11   4380674677
Total           63  1.29374E+12

All else equal, an increase in personal disposable income by one unit will increase
electricity sales by 325.85 mwh.
All else equal, an increase in degree days by one unit will increase electricity sales
by 58.36 mwh.
13.13 a. mpg as a function of horsepower and weight
Regression Analysis: milpgal versus horspwr, weight
The regression equation is
milpgal = 55.8 - 0.105 horspwr - 0.00661 weight
150 cases used 5 cases contain missing values
Predictor        Coef    SE Coef      T      P
Constant       55.769      1.448  38.51  0.000
horspwr      -0.10489    0.02233  -4.70  0.000
weight     -0.0066143  0.0009015  -7.34  0.000

S = 3.901   R-Sq = 72.3%   R-Sq(adj) = 72.0%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       2  5850.0  2925.0  192.23  0.000
Residual Error 147  2236.8    15.2
Total          149  8086.8

All else equal, a one unit increase in the horsepower of the engine will reduce fuel
mileage by .10489 mpg. All else equal, an increase in the weight of the car by 100
pounds will reduce fuel mileage by .66143 mpg.
b. Add number of cylinders
Regression Analysis: milpgal versus horspwr, weight, cylinder
The regression equation is
milpgal = 55.9 - 0.117 horspwr - 0.00758 weight + 0.726 cylinder
150 cases used 5 cases contain missing values
Predictor       Coef   SE Coef      T      P
Constant      55.925     1.443  38.77  0.000
horspwr     -0.11744   0.02344  -5.01  0.000
weight     -0.007576  0.001066  -7.10  0.000
cylinder      0.7260    0.4362   1.66  0.098

S = 3.878   R-Sq = 72.9%   R-Sq(adj) = 72.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  5891.6  1963.9  130.62  0.000
Residual Error 146  2195.1    15.0
Total          149  8086.8


All else equal, one additional cylinder in the engine of the auto will increase fuel
mileage by .726 mpg. Note that this is not significant at the .05 level
(p-value = .098).
Horsepower and weight still have the expected negative signs.
c. mpg as a function of weight, number of cylinders
Regression Analysis: milpgal versus weight, cylinder
The regression equation is
milpgal = 55.9 - 0.0104 weight + 0.121 cylinder
154 cases used 1 cases contain missing values
Predictor        Coef    SE Coef       T      P
Constant       55.914      1.525   36.65  0.000
weight     -0.0103680  0.0009779  -10.60  0.000
cylinder       0.1207     0.4311    0.28  0.780

S = 4.151   R-Sq = 68.8%   R-Sq(adj) = 68.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       2  5725.0  2862.5  166.13  0.000
Residual Error 151  2601.8    17.2
Total          153  8326.8

All else equal, an increase in the weight of the car by 100 pounds will reduce fuel
mileage by 1.0368 mpg. All else equal, an increase in the number of cylinders in
the engine will increase mpg by .1207 mpg.
The explanatory power of the models has stayed relatively the same with a slight
drop in explanatory power for the latest regression model.
Note that the coefficient on weight has stayed negative and significant (p-values
of .000) for all of the regression models; although the value of the coefficient has
changed. The number of cylinders is not significantly different from zero in either
of the models where it was used as an independent variable. There is likely some
correlation between cylinders and the weight of the car as well as between
cylinders and the horsepower of the car.
d. mpg as a function of horsepower, weight, price
Regression Analysis: milpgal versus horspwr, weight, price
The regression equation is
milpgal = 54.4 - 0.0938 horspwr - 0.00735 weight +0.000137 price
150 cases used 5 cases contain missing values
Predictor         Coef     SE Coef      T      P
Constant        54.369       1.454  37.40  0.000
horspwr       -0.09381     0.02177  -4.31  0.000
weight      -0.0073518   0.0008950  -8.21  0.000
price       0.00013721  0.00003950   3.47  0.001

S = 3.762   R-Sq = 74.5%   R-Sq(adj) = 73.9%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  6020.7  2006.9  141.82  0.000
Residual Error 146  2066.0    14.2
Total          149  8086.8

All else equal, an increase by one unit in the horsepower of the auto will reduce
fuel mileage by .09381 mpg. All else equal, an increase by 100 pounds in the
weight of the auto will reduce fuel mileage by .73518 mpg and an increase in the
price of the auto by one dollar will increase fuel mileage by .00013721 mpg.


e.
Horsepower and weight remain significant negative independent variables
throughout, whereas the number of cylinders has been insignificant. The sizes of
the coefficients change as the combination of independent variables changes. This
is likely due to strong correlation that may exist between the independent variables.
13.14

a. Horsepower as a function of weight, cubic inches of displacement


Regression Analysis: horspwr versus weight, displace
The regression equation is
horspwr = 23.5 + 0.0154 weight + 0.157 displace
151 cases used 4 cases contain missing values
Predictor      Coef   SE Coef     T      P  VIF
Constant     23.496     7.341  3.20  0.002
weight     0.015432  0.004538  3.40  0.001  6.0
displace    0.15667   0.03746  4.18  0.000  6.0

S = 13.64   R-Sq = 69.2%   R-Sq(adj) = 68.8%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       2  61929  30964  166.33  0.000
Residual Error 148  27551    186
Total          150  89480

All else equal, a 100 pound increase in the weight of the car is associated with a
1.54 increase in horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is
associated with a 1.57 increase in the horsepower of the auto.
b. Horsepower as a function of weight, displacement, number of cylinders
Regression Analysis: horspwr versus weight, displace, cylinder
The regression equation is
horspwr = 16.7 + 0.0163 weight + 0.105 displace + 2.57 cylinder
151 cases used 4 cases contain missing values
Predictor      Coef   SE Coef     T      P   VIF
Constant     16.703     9.449  1.77  0.079
weight     0.016261  0.004592  3.54  0.001   6.2
displace    0.10527   0.05859  1.80  0.074  14.8
cylinder      2.574     2.258  1.14  0.256   7.8

S = 13.63   R-Sq = 69.5%   R-Sq(adj) = 68.9%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       3  62170  20723  111.55  0.000
Residual Error 147  27310    186
Total          150  89480

All else equal, a 100 pound increase in the weight of the car is associated with a
1.63 increase in horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is
associated with a 1.05 increase in the horsepower of the auto.
All else equal, one additional cylinder in the engine is associated with a 2.57
increase in the horsepower of the auto.
Note that adding the independent variable number of cylinders has not added to the
explanatory power of the model. R square has increased marginally. Engine
displacement is no longer significant at the .05 level (p-value of .074) and the
estimated regression slope coefficient on the number of cylinders is not

25

Statistics for Business & Economics, 6th edition

significantly different from zero. This is due to the strong correlation that exists
between cubic inches of engine displacement and the number of cylinders.
c. Horsepower as a function of weight, displacement and fuel mileage
Regression Analysis: horspwr versus weight, displace, milpgal
The regression equation is
horspwr = 93.6 + 0.00203 weight + 0.165 displace - 1.24 milpgal
150 cases used 5 cases contain missing values
Predictor      Coef   SE Coef      T      P  VIF
Constant      93.57     15.33   6.11  0.000
weight     0.002031  0.004879   0.42  0.678  8.3
displace    0.16475   0.03475   4.74  0.000  6.1
milpgal     -1.2392    0.2474  -5.01  0.000  3.1

S = 12.55   R-Sq = 74.2%   R-Sq(adj) = 73.6%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       3  66042  22014  139.77  0.000
Residual Error 146  22994    157
Total          149  89036

All else equal, a 100 pound increase in the weight of the car is associated with a
.203 increase in horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is
associated with a 1.6475 increase in the horsepower of the auto.
All else equal, an increase in the fuel mileage of the vehicle by 1 mile per gallon is
associated with a reduction in horsepower of 1.2392.
Note that the negative coefficient on fuel mileage indicates the trade-off that is
expected between horsepower and fuel mileage. The displacement variable is
significantly positive, as expected, however, the weight variable is no longer
significant. Again, one would expect high correlation among the independent
variables.
d. Horsepower as a function of weight, displacement, mpg and price
Regression Analysis: horspwr versus weight, displace, milpgal, price
The regression equation is
horspwr = 98.1 - 0.00032 weight + 0.175 displace - 1.32 milpgal
+0.000138 price
150 cases used 5 cases contain missing values
Predictor       Coef    SE Coef      T      P   VIF
Constant       98.14      16.05   6.11  0.000
weight     -0.000324   0.005462  -0.06  0.953  10.3
displace     0.17533    0.03647   4.81  0.000   6.8
milpgal      -1.3194     0.2613  -5.05  0.000   3.5
price      0.0001379  0.0001438   0.96  0.339   1.3

S = 12.55   R-Sq = 74.3%   R-Sq(adj) = 73.6%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       4  66187  16547  105.00  0.000
Residual Error 145  22849    158
Total          149  89036

Engine displacement has a significant positive impact on horsepower, fuel mileage
is negatively related to horsepower, and price is not significant.

e. Explanatory power has marginally increased from the first model to the last.
The estimated coefficient on price is not significantly different from zero.
Displacement and fuel mileage have the expected signs. The coefficient on
weight has the wrong sign; however, it is not significantly different from zero
(p-value of .953).
13.15

A regression analysis has produced the following Analysis of Variance table

Given that SST = SSR + SSE, s²e = SSE / (n - K - 1), R² = SSR/SST = 1 - SSE/SST,
and R²adj = 1 - [SSE / (n - K - 1)] / [SST / (n - 1)]:
a. SSE = 500, s²e = 500 / (30 - 3 - 1) = 19.2308, se = 4.3853
b. SST = SSR + SSE = 4,500 + 500 = 5,000
c. R² = 4500/5000 = 1 - 500/5000 = .90, R²adj = 1 - (500/26) / (5000/29) = .8885
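The quantities in 13.15 all follow mechanically from SSR, SSE, n and K; a small helper (a sketch of the formulas above, not part of the textbook):

```python
import math

def anova_summary(ssr, sse, n, k):
    """SST, error variance, R-squared and adjusted R-squared from an ANOVA table."""
    sst = ssr + sse
    s2e = sse / (n - k - 1)
    r2 = ssr / sst
    r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return sst, s2e, r2, r2_adj

sst, s2e, r2, r2_adj = anova_summary(4500, 500, 30, 3)
print(sst, round(s2e, 4), r2, round(r2_adj, 4))  # 5000 19.2308 0.9 0.8885
print(round(math.sqrt(s2e), 4))                  # 4.3853
```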

13.16

A regression analysis has produced the following Analysis of Variance table

Given that SST = SSR + SSE, s²e = SSE / (n - K - 1), R² = SSR/SST = 1 - SSE/SST,
and R²adj = 1 - [SSE / (n - K - 1)] / [SST / (n - 1)]:
a. SSE = 2500, s²e = 2500 / (32 - 2 - 1) = 86.207, se = 9.2848
b. SST = SSR + SSE = 7,000 + 2,500 = 9,500
c. R² = 7000/9500 = 1 - 2500/9500 = .7368, R²adj = 1 - (2500/29) / (9500/31) = .7187

13.17

A regression analysis has produced the following Analysis of Variance table

Given that SST = SSR + SSE, s²e = SSE / (n - K - 1), R² = SSR/SST = 1 - SSE/SST,
and R²adj = 1 - [SSE / (n - K - 1)] / [SST / (n - 1)]:
a. SSE = 10,000, s²e = 10,000 / (50 - 4 - 1) = 222.222, se = 14.9071
b. SST = SSR + SSE = 40,000 + 10,000 = 50,000
c. R² = 40,000/50,000 = 1 - 10,000/50,000 = .80,
R²adj = 1 - (10,000/45) / (50,000/49) = .7822


13.18

A regression analysis has produced the following Analysis of Variance table

Given that SST = SSR + SSE, s²e = SSE / (n - K - 1), R² = SSR/SST = 1 - SSE/SST,
and R²adj = 1 - [SSE / (n - K - 1)] / [SST / (n - 1)]:
a. SSE = 15,000, s²e = 15,000 / (206 - 5 - 1) = 75.0, se = 8.660
b. SST = SSR + SSE = 80,000 + 15,000 = 95,000
c. R² = 80,000/95,000 = 1 - 15,000/95,000 = .8421,
R²adj = 1 - (15,000/200) / (95,000/205) = .8382

13.19 a. R² = 3.549/3.881 = .9145; therefore, 91.45% of the variability in work-hours
of design effort can be explained by the variation in the plane's top speed,
weight and percentage of parts in common with other models.
b. SSE = 3.881 - 3.549 = .332
c. R²adj = 1 - (.332/(27 - 4)) / (3.881/26) = .9033
d. R = √.9145 = .9563. This is the sample correlation between observed and
predicted values of the design effort.

13.20 a. R² = 88.2/162.1 = .5441; therefore, 54.41% of the variability in milk
consumption can be explained by the variations in weekly income and family
size.
b. R²adj = 1 - (73.9/(30 - 3)) / (162.1/29) = .5103
c. R = √.5441 = .7376. This is the sample correlation between observed and
predicted values of milk consumption.

13.21 a. R² = 79.2/(79.2 + 45.9) = .6331; therefore, 63.31% of the variability in weight
gain can be explained by the variations in the average number of meals eaten,
number of hours exercised and number of beers consumed weekly.
b. R²adj = 1 - (45.9/(25 - 4)) / (125.1/24) = .5807
c. R = √.6331 = .7957. This is the sample correlation between observed and
predicted values of weight gained.


13.22

a.
Regression Analysis: Y profit versus X2 offices
The regression equation is
Y profit = 1.55 -0.000120 X2 offices
Predictor          Coef     SE Coef      T      P
Constant         1.5460      0.1048  14.75  0.000
X2 offi     -0.00012033  0.00001434  -8.39  0.000

S = 0.07049   R-Sq = 75.4%   R-Sq(adj) = 74.3%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  0.34973  0.34973  70.38  0.000
Residual Error  23  0.11429  0.00497
Total           24  0.46402

b.
Regression Analysis: X1 revenue versus X2 offices
The regression equation is
X1 revenue = - 0.078 + 0.000543 X2 offices
Predictor         Coef     SE Coef      T      P
Constant       -0.0781      0.2975  -0.26  0.795
X2 offi     0.00054280  0.00004070  13.34  0.000

S = 0.2000   R-Sq = 88.5%   R-Sq(adj) = 88.1%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  7.1166  7.1166  177.84  0.000
Residual Error  23  0.9204  0.0400
Total           24  8.0370

c.
Regression Analysis: Y profit versus X1 revenue
The regression equation is
Y profit = 1.33 - 0.169 X1 revenue
Predictor      Coef  SE Coef      T      P
Constant     1.3262   0.1386   9.57  0.000
X1 reven   -0.16913  0.03559  -4.75  0.000

S = 0.1009   R-Sq = 49.5%   R-Sq(adj) = 47.4%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  0.22990  0.22990  22.59  0.000
Residual Error  23  0.23412  0.01018
Total           24  0.46402

d.
Regression Analysis: X2 offices versus X1 revenue
The regression equation is
X2 offices = 957 + 1631 X1 revenue
Predictor     Coef  SE Coef      T      P
Constant     956.9    476.5   2.01  0.057
X1 reven    1631.3    122.3  13.34  0.000

S = 346.8   R-Sq = 88.5%   R-Sq(adj) = 88.1%

Analysis of Variance
Source          DF        SS        MS       F      P
Regression       1  21388013  21388013  177.84  0.000
Residual Error  23   2766147    120267
Total           24  24154159


13.23

Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients: bj ± t(n-K-1),α/2 s(bj)
95% CI for x1: 4.8 ± 2.086(2.1); .4194 up to 9.1806
95% CI for x2: 6.9 ± 2.086(3.7); -.8182 up to 14.6182
95% CI for x3: -7.2 ± 2.086(2.8); -13.0408 up to -1.3592
b. Test the hypothesis H0: βj = 0, H1: βj > 0
For x1: t = 4.8/2.1 = 2.286; t20,.05/.01 = 1.725, 2.528
Therefore, reject H0 at the 5% level but not at the 1% level
For x2: t = 6.9/3.7 = 1.865; t20,.05/.01 = 1.725, 2.528
Therefore, reject H0 at the 5% level but not at the 1% level
For x3: t = -7.2/2.8 = -2.571; t20,.05/.01 = 1.725, 2.528
Therefore, do not reject H0 at either level
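The confidence intervals and t ratios in 13.23-13.26 all use the same two operations; a minimal sketch (the critical value is supplied by the caller, e.g. read from a t table):

```python
def t_ratio(b, se):
    """t statistic for H0: beta_j = 0."""
    return b / se

def ci(b, se, t_crit):
    """Two-sided confidence interval b +/- t_crit * se."""
    return b - t_crit * se, b + t_crit * se

lo, hi = ci(4.8, 2.1, 2.086)        # x1 in 13.23: t(20, .025) = 2.086
print(round(lo, 4), round(hi, 4))   # 0.4194 9.1806
print(round(t_ratio(4.8, 2.1), 3))  # 2.286
```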

13.24

Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients: bj ± t(n-K-1),α/2 s(bj)
95% CI for x1: 6.8 ± 2.042(3.1); .4698 up to 13.1302
95% CI for x2: 6.9 ± 2.042(3.7); -.6554 up to 14.4554
95% CI for x3: -7.2 ± 2.042(3.2); -13.7344 up to -.6656
b. Test the hypothesis H0: βj = 0, H1: βj > 0
For x1: t = 6.8/3.1 = 2.194; t30,.05/.01 = 1.697, 2.457
Therefore, reject H0 at the 5% level but not at the 1% level
For x2: t = 6.9/3.7 = 1.865; t30,.05/.01 = 1.697, 2.457
Therefore, reject H0 at the 5% level but not at the 1% level
For x3: t = -7.2/3.2 = -2.25; t30,.05/.01 = 1.697, 2.457
Therefore, do not reject H0 at either level


13.25

Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients: bj ± t(n-K-1),α/2 s(bj)
95% CI for x1: 34.8 ± 2.000(12.1); 10.60 up to 59.00
95% CI for x2: 56.9 ± 2.000(23.7); 9.50 up to 104.30
95% CI for x3: -57.2 ± 2.000(32.8); -122.80 up to 8.40
b. Test the hypothesis H0: βj = 0, H1: βj > 0
For x1: t = 34.8/12.1 = 2.876; t60,.05/.01 = 1.671, 2.390
Therefore, reject H0 at the 5% level but not at the 1% level
For x2: t = 56.9/23.7 = 2.401; t60,.05/.01 = 1.671, 2.390
Therefore, reject H0 at the 5% level but not at the 1% level
For x3: t = -57.2/32.8 = -1.744; t60,.05/.01 = 1.671, 2.390
Therefore, do not reject H0 at either level

13.26

Given the following results, where the numbers in parentheses are the sample
standard errors of the coefficient estimates:
a. Compute two-sided 95% confidence intervals for the three regression slope
coefficients: bj ± t(n-K-1),α/2 s(bj)
95% CI for x1: 17.8 ± 2.042(7.1); 3.3018 up to 32.2982
95% CI for x2: 26.9 ± 2.042(13.7); -1.0754 up to 54.8754
95% CI for x3: -9.2 ± 2.042(3.8); -16.9596 up to -1.4404
b. Test the hypothesis H0: βj = 0, H1: βj > 0
For x1: t = 17.8/7.1 = 2.507; t35,.05/.01 = 1.697, 2.457
Therefore, reject H0 at the 5% level but not at the 1% level
For x2: t = 26.9/13.7 = 1.964; t35,.05/.01 = 1.697, 2.457
Therefore, reject H0 at the 5% level but not at the 1% level
For x3: t = -9.2/3.8 = -2.421; t35,.05/.01 = 1.697, 2.457
Therefore, do not reject H0 at either level

13.27 a. b1 = .661, s(b1) = .099, n = 27, t23,.05/.025 = 1.714, 2.069

90% CI: .661 ± 1.714(.099); .4913 up to .8307
95% CI: .661 ± 2.069(.099); .4562 up to .8658

b. b2 = .065, s(b2) = .032, t23,.025/.005 = 2.069, 2.807
95% CI: .065 ± 2.069(.032); -.0012 up to .1312
99% CI: .065 ± 2.807(.032); -.0248 up to .1548
c. H0: β2 = 0, H1: β2 ≠ 0
t = .065/.032 = 2.031; t23,.05/.025 = 1.714, 2.069
Therefore, reject H0 at the 10% level but not at the 5% level
d. H0: β1 = β2 = 0, H1: at least one βi ≠ 0 (i = 1, 2)
F = [(3.311 - .332)/2] / (.332/23) = 103.188
Therefore, reject H0 at the 1% level since F = 103.188 > 5.66 = F2,23,.01
13.28

a. H0: β1 = 0; H1: β1 > 0
t = .052/.023 = 2.26; t27,.025/.01 = 2.052, 2.473
Therefore, reject H0 at the 2.5% level but not at the 1% level
b. t27,.05/.025/.005 = 1.703, 2.052, 2.771
90% CI: 1.14 ± 1.703(.35); .5439 up to 1.7361
95% CI: 1.14 ± 2.052(.35); .4218 up to 1.8582
99% CI: 1.14 ± 2.771(.35); .1701 up to 2.1099

13.29

a. H0: β2 = 0; H1: β2 < 0
t = -1.345/.565 = -2.381; t21,.025/.01 = 2.080, 2.518
Therefore, reject H0 at the 2.5% level but not at the 1% level
b. H0: β3 = 0; H1: β3 > 0
t = .613/.243 = 2.523; t21,.01/.005 = 2.518, 2.831
Therefore, reject H0 at the 1% level but not at the .5% level
c. t21,.05/.025/.005 = 1.721, 2.080, 2.831
90% CI: .653 ± 1.721(.189); .3277 up to .9783
95% CI: .653 ± 2.080(.189); .2599 up to 1.0461
99% CI: .653 ± 2.831(.189); .1179 up to 1.1881


13.30 a. H0: β3 = 0, H1: β3 < 0
t = -.000191/.000446 = -.428; t16,.10 = -1.337
Therefore, do not reject H0 at the 20% level
b. H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)
F = (16/3)(.71/(1 - .71)) = 13.057; F3,16,.01 = 5.29
Therefore, reject H0 at the 1% level
13.31 a. t85,.025/.005 = 2.000, 2.660
95% CI: 7.878 ± 2.000(1.809); 4.260 up to 11.496
99% CI: 7.878 ± 2.660(1.809); 3.0661 up to 12.6899
b. H0: β2 = 0; H1: β2 > 0
t = .003666/.001344 = 2.73; t85,.005 = 2.660
Therefore, reject H0 at the .5% level
13.32

a. All else being equal, an extra $1 in mean per capita personal income leads
to an expected extra $.04 of net revenue per capita from the lottery.
b. b2 = .8772, s(b2) = .3107, n = 29, t24,.025 = 2.064
95% CI: .8772 ± 2.064(.3107); .2359 up to 1.5185
c. H0: β3 = 0, H1: β3 < 0
t = -365.01/263.88 = -1.383; t24,.10/.05 = -1.318, -1.711
Therefore, reject H0 at the 10% level but not at the 5% level

13.33 a. b3 = 7.653, s(b3) = 3.082, n = 19, t15,.025 = 2.131

95% CI: 7.653 ± 2.131(3.082); 1.0853 up to 14.2207
b. H0: β2 = 0, H1: β2 > 0
t = 19.729/8.992 = 2.194; t15,.025/.01 = 2.131, 2.602
Therefore, reject H0 at the 2.5% level but not at the 1% level
13.34 a. n = 19, b1 = .2, s(b1) = .0092, t16,.025 = 2.12
95% CI: .2 ± 2.12(.0092); .1805 up to .2195
b. H0: β2 = 0, H1: β2 < 0
t = -.1/.084 = -1.19; t16,.10 = -1.337
Therefore, do not reject H0 at the 10% level


13.35 a. n = 14, b1 = .101, s(b1) = .023, t10,.05 = 1.812
90% CI: .101 ± 1.812(.023); .0593 up to .1427
b. H0: β2 = 0, H1: β2 < 0
t = -.244/.08 = -3.05; t10,.01/.005 = -2.764, -3.169
Therefore, reject H0 at the 1% level but not at the .5% level
c. H0: β3 = 0, H1: β3 > 0
t = .057/.00925 = 6.162; t10,.005 = 3.169
Therefore, reject H0 at the .5% level
13.36 a. n = 39, b5 = .0495, s(b5) = .01172, t30,.005 = 2.750
99% CI: .0495 ± 2.750(.01172); .0173 up to .0817
b. H0: β4 = 0, H1: β4 ≠ 0
t = -.48122/.77954 = -.617; t30,.10 = 1.31
Therefore, do not reject H0 at the 20% level
c. H0: β7 = 0, H1: β7 ≠ 0
t = .00645/.00306 = 2.108; t30,.025/.01 = 2.042, 2.457
Therefore, reject H0 at the 5% level but not at the 2% level
13.37 Test the hypothesis that all three of the predictor variables are equal to zero, given
the following Analysis of Variance tables
a. H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)
F = MSR/s²e = (SSR/K) / (SSE/(n - K - 1)) = (4500/3) / (500/26) = 78.0; F3,26,.05 = 2.98
Therefore, reject H0 at the 5% level
b. H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)
F = (9780/3) / (2100/26) = 40.362; F3,26,.05 = 2.98
Therefore, reject H0 at the 5% level
c. H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)
F = (46,000/3) / (25,000/26) = 15.9467; F3,26,.05 = 2.98
Therefore, reject H0 at the 5% level
d. H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)
F = (87,000/3) / (48,000/26) = 15.7083; F3,26,.05 = 2.98
Therefore, reject H0 at the 5% level
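The F statistic used throughout 13.37 can be computed with a one-line helper (a sketch of the formula above):

```python
def f_stat(ssr, sse, n, k):
    """Overall F test: MSR / MSE = (SSR/k) / (SSE/(n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))

print(round(f_stat(4500, 500, 30, 3), 1))   # 78.0 (part a)
print(round(f_stat(9780, 2100, 30, 3), 3))  # 40.362 (part b)
```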
13.38 a. SST = 3.881, SSR = 3.549, SSE = .332
H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)
F = (3.549/3) / (.332/23) = 81.955; F3,23,.01 = 4.76
Therefore, reject H0 at the 1% level.
b. Analysis of Variance table:

Sources of    Sum of  Degrees of
variation    Squares     Freedom  Mean Squares  F-Ratio
Regressor      3.549           3         1.183   81.955
Error           .332          23       .014435
Total          3.881          26

13.39 H0: β1 = β2 = 0, H1: at least one βi ≠ 0 (i = 1, 2)

F = (45/2)(.463/(1 - .463)) = 21.262; F2,45,.01 = 5.18
Therefore, reject H0 at the 1% level
13.40 a. SST = 162.1, SSR = 88.2, SSE = 73.9
H0: β1 = β2 = 0, H1: at least one βi ≠ 0 (i = 1, 2)
F = (88.2/2) / (73.9/27) = 16.113; F2,27,.01 = 5.49
Therefore, reject H0 at the 1% level
b.

Sources of    Sum of  Degrees of
variation    Squares     Freedom  Mean Squares  F-Ratio
Regressor       88.2           2         44.10   16.113
Error           73.9          27         2.737
Total          162.1          29

13.41 a. SST = 125.1, SSR = 79.2, SSE = 45.9


H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)

F = (79.2/3) / (45.9/21) = 12.078; F3,21,.01 = 4.87
Therefore, reject H0 at the 1% level
b.

Sources of    Sum of  Degrees of
variation    Squares     Freedom  Mean Squares  F-Ratio
Regressor       79.2           3          26.4   12.078
Error           45.9          21      2.185714
Total          125.1          24

13.42 a. H0: β1 = β2 = β3 = β4 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3, 4)

The test can be based directly on the coefficient of determination. Since
R² = SSR/SST and 1 - R² = SSE/SST, it follows that
F = (SSR/K) / (SSE/(n - K - 1)) = [(n - K - 1)/K] [R²/(1 - R²)]
Hence F = (24/4)(.51/(1 - .51)) = 6.2449; F4,24,.01 = 4.22. Therefore,
reject H0 at the 1% level
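The same R²-based form of the F statistic appears in 13.42-13.45; as a sketch:

```python
def f_from_r2(r2, n, k):
    """Overall F test expressed through the coefficient of determination."""
    return ((n - k - 1) / k) * r2 / (1 - r2)

# 13.42: n - K - 1 = 24, K = 4, R-squared = .51
print(round(f_from_r2(0.51, 29, 4), 4))  # 6.2449
# 13.43: n - K - 1 = 15, K = 3, R-squared = .84
print(round(f_from_r2(0.84, 19, 3), 2))  # 26.25
```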
13.43 a. H0: β1 = β2 = β3 = 0, H1: at least one βi ≠ 0 (i = 1, 2, 3)

Since R² = SSR/SST and 1 - R² = SSE/SST,
F = [(n - K - 1)/K] [R²/(1 - R²)] = (15/3)(.84/(1 - .84)) = 26.25; F3,15,.01 = 5.42.
Therefore, reject H0 at the 1% level
13.44 a. H0: β1 = β2 = 0, H1: at least one βi ≠ 0 (i = 1, 2)

Since R² = SSR/SST and 1 - R² = SSE/SST,
F = [(n - K - 1)/K] [R²/(1 - R²)] = (16/2)(.96/(1 - .96)) = 192; F2,16,.01 = 6.23
Therefore, reject H0 at the 1% level


13.45 H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0, H1: at least one
βi ≠ 0 (i = 1, 2, 3, 4, 5, 6, 7)
F = (31/7)(.3572/(1 - .3572)) = 4.017; F7,30,.01 = 3.30.
Therefore, reject H0 at the 1% level
13.46

F = [(SSE* - SSE)/K1] / [SSE/(n - K - 1)] = [(n - K - 1)/K1] [(SSE* - SSE)/SST] / [SSE/SST]
= [(n - K - 1)/K1] [(1 - R²*) - (1 - R²)] / (1 - R²) = [(n - K - 1)/K1] (R² - R²*) / (1 - R²)

13.47

Let β3 be the coefficient on the number of preschool children in the household
H0: β3 = 0, H1: β3 ≠ 0; F = [(88.2 - 83.7)/1] / (83.7/26) = 1.398; F1,26,.05 = 4.23
Therefore, do not reject H0 at the 5% level
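The restricted-versus-full comparison in 13.46-13.47 is the extra-sum-of-squares F test; a sketch using the 13.47 numbers (reading 83.7 as the full-model error sum of squares and n - K - 1 = 26, which is an interpretation of the printed figures):

```python
def partial_f(sse_restricted, sse_full, k1, df_error):
    """F test for dropping k1 regressors: ((SSE* - SSE)/k1) / (SSE/df_error)."""
    return ((sse_restricted - sse_full) / k1) / (sse_full / df_error)

print(round(partial_f(88.2, 83.7, 1, 26), 3))  # 1.398
```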

13.48

a. R²adj = 1 - [SSE/(n - K - 1)] / [SST/(n - 1)] = 1 - [(n - 1)/(n - K - 1)](1 - R²)
= [(n - K - 1) - (n - 1)(1 - R²)] / (n - K - 1) = [(n - 1)R² - K] / (n - K - 1)

b. Since R²adj = [(n - 1)R² - K] / (n - K - 1), solving for R² gives
R² = [(n - K - 1)R²adj + K] / (n - 1)

c. F = (SSR/K) / (SSE/(n - K - 1)) = [(n - K - 1)/K] (SSR/SST) / (SSE/SST)
= [(n - K - 1)/K] [R²/(1 - R²)]
Substituting the result of part b:
F = [(n - K - 1)/K] {[(n - K - 1)R²adj + K]/(n - 1)} / {[(n - 1) - (n - K - 1)R²adj - K]/(n - 1)}
= [(n - K - 1)/K] [(n - K - 1)R²adj + K] / [(n - K - 1)(1 - R²adj)]
= [(n - K - 1)R²adj + K] / [K(1 - R²adj)]
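The algebra above can be spot-checked numerically; a sketch using R² = .90, n = 30, K = 3 (illustrative values only):

```python
r2, n, k = 0.90, 30, 3

# Adjusted R-squared: definition vs the closed form from part a
adj_def = 1 - (1 - r2) * (n - 1) / (n - k - 1)
adj_closed = ((n - 1) * r2 - k) / (n - k - 1)
assert abs(adj_def - adj_closed) < 1e-12

# F from R-squared vs F from adjusted R-squared (part c)
f_r2 = ((n - k - 1) / k) * r2 / (1 - r2)
f_adj = ((n - k - 1) * adj_closed + k) / (k * (1 - adj_closed))
print(round(f_r2, 6), round(f_adj, 6))  # both 78.0
```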

13.49 Given the estimated multiple regression equation ŷ = 6 + 5x1 + 4x2 + 7x3 + 8x4

a. ŷ = 6 + 5(10) + 4(23) + 7(9) + 8(12) = 307
b. ŷ = 6 + 5(23) + 4(18) + 7(10) + 8(11) = 351
c. ŷ = 6 + 5(10) + 4(23) + 7(9) + 8(12) = 307
d. ŷ = 6 + 5(-10) + 4(13) + 7(-8) + 8(-16) = -176
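Evaluating such fitted equations is just an intercept plus a dot product; a small sketch (not part of the original solution; the helper name is ours):

```python
def predict(b0, coefs, xs):
    """Evaluate an estimated linear model: b0 + sum of b_i * x_i."""
    return b0 + sum(b * x for b, x in zip(coefs, xs))

# Parts a, b, and d of 13.49
print(predict(6, [5, 4, 7, 8], [10, 23, 9, 12]))     # 307
print(predict(6, [5, 4, 7, 8], [23, 18, 10, 11]))    # 351
print(predict(6, [5, 4, 7, 8], [-10, 13, -8, -16]))  # -176
```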


Statistics for Business & Economics, 6th edition

13.50 Ŷ = 7.35 + .653(20) - 1.345(10) + .613(6) = 10.638 pounds

13.51 Ŷ = .578 + .052(6) + 1.14(4) = 5.45 quarts of milk per week

13.52 Ŷ = 2.0 + .661(1) + .065(7) - .018(50) = 2.216 million worker-hours
13.53 a. All else equal, a one square foot increase in the lot size is expected to increase
the selling price of the house by $1.4686.
b. 98.43% of the variation in the selling price of homes can be explained by the
variation in house size, lot size, number of bedrooms, and number of bathrooms.
c. H0: β4 = 0, H1: β4 > 0, t = 2701.1/1996.2 = 1.353, t(15,.10) = 1.341, t(15,.05) = 1.753
Therefore, reject H0 at the 10% level but not at the 5% level
d. Ŷ = 1998.5 + 22.352(1250) + 1.4686(4700) + 6767.3(3) + 2701.1(1.5) = 61,194.47
13.54 Compute values of yi when xi = 1, 2, 4, 6, 8, 10

Xi                    1    2        4    6        8        10
yi = 4xi^1.5          4    11.3137  32   58.7878  90.5097  126.4911
yi = 1 + 2xi + 2xi²   5    13       41   85       145      221

13.55 Compute values of yi when xi = 1, 2, 4, 6, 8, 10

Xi                    1    2        4        6         8         10
yi = 4xi^1.8          4    13.9288  48.5029  100.6311  168.8970  252.3829
yi = 1 + 2xi + 2xi²   5    13       41       85        145       221

13.56 Compute values of yi when xi = 1, 2, 4, 6, 8, 10

Xi                      1    2        4     6        8        10
yi = 4xi^1.5            4    11.3137  32    58.7878  90.5097  126.4911
yi = 1 + 2xi + 1.7xi²   4.7  11.8     36.2  74.2     125.8    191

13.57 Compute values of yi when xi = 1, 2, 4, 6, 8, 10

Xi                      1    2       4        6        8        10
yi = 3xi^1.2            3    6.8922  15.8341  25.7574  36.3772  47.5468
yi = 1 + 5xi - 1.5xi²   4.5  5       -3       -23      -55      -99
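The rows of these tables can be regenerated in a few lines; a quick check of the 13.56 rows (not part of the original solution):

```python
xs = [1, 2, 4, 6, 8, 10]
# 13.56: nonlinear model y = 4x^1.5 versus quadratic y = 1 + 2x + 1.7x^2
power = [round(4 * x ** 1.5, 4) for x in xs]
quad = [round(1 + 2 * x + 1.7 * x ** 2, 4) for x in xs]
print(power)  # [4.0, 11.3137, 32.0, 58.7878, 90.5097, 126.4911]
print(quad)   # [4.7, 11.8, 36.2, 74.2, 125.8, 191.0]
```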

13.58 There are many possible answers. Relationships that can be approximated by a
non-linear quadratic model include many supply functions, production functions,
and cost functions, including average cost versus the number of units produced.


13.59 To estimate the function with linear least squares, solve the constraint β2 - β1 = 2
for β2. Since β2 = 2 + β1, substitute into the equation and rearrange algebraically:
Y = β0 + β1X1 + (2 + β1)X2 + β3X3
Y = β0 + β1X1 + 2X2 + β1X2 + β3X3
Y = β0 + β1[X1 + X2] + 2X2 + β3X3
Y - 2X2 = β0 + β1[X1 + X2] + β3X3
Conduct the variable transformations and estimate the model using least squares.
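As a sketch of that step in code, assuming the constraint is β2 = 2 + β1 (the helper and variable names are ours, not the text's):

```python
def transform(y, x1, x2, x3):
    """Impose beta2 = 2 + beta1 by transforming the data:
    regress Y* = Y - 2*X2 on Z = X1 + X2 and X3; afterwards b2 = b1 + 2."""
    y_star = [yi - 2 * v for yi, v in zip(y, x2)]
    z = [a + b for a, b in zip(x1, x2)]
    return y_star, z, x3

# Tiny made-up data just to show the transformation
y_star, z, x3 = transform([10, 12], [1, 2], [3, 4], [5, 6])
print(y_star, z)  # [4, 4] [4, 6]
```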
13.60 a. All else equal, a 1% increase in annual consumption expenditures will be
associated with a 1.1556% increase in expenditures on vacation travel.
All else equal, a 1% increase in the size of the household will be associated
with a .4408% decrease in expenditures on vacation travel.
b. 16.8% of the variation in vacation travel expenditures can be explained by the
variation in the log of total consumption expenditures and the log of the number
of members in the household.
c. 1.1556 ± 1.96(.0546) = 1.0486 up to 1.2626
d. H0: β2 = 0, H1: β2 ≠ 0, t = -.4408/.0490 = -8.996
Therefore, reject H0 at the 1% level

13.61 a. A 1% increase in median income leads to an expected .68% increase in store
size.
b. H0: β1 = 0, H1: β1 ≠ 0, t = .68/.077 = 8.831. Therefore, reject H0 at the 1% level

13.62 a. All else equal, a 1% increase in the price of beef will be associated with a
decrease of .529% in the tons of beef consumed annually in the U.S.
b. All else equal, a 1% increase in the price of pork will be associated with an
increase of .217% in the tons of beef consumed annually in the U.S.
c. H0: β4 = 0, H1: β4 > 0, t = .416/.163 = 2.552, t(25,.01) = 2.485. Therefore, reject
H0 at the 1% level
d. H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0, (i = 1, 2, 3, 4)
F = [(n - k - 1)/k] × R²/(1 - R²) = (25/4) × .683/(1 - .683) = 13.466, F(4,25,.01) = 4.18. Therefore,
reject H0 at the 1% level
e. If an important independent variable has been omitted, there may be
specification bias. The regression coefficients produced for the misspecified
model would be misleading.

13.63 Estimate a Cobb-Douglas production function with three independent variables:
Y = β0 X1^β1 X2^β2 X3^β3, where X1 = capital, X2 = labor, and X3 = basic research
Taking the log of both sides of the equation yields:
log(Y) = log(β0) + β1 log(X1) + β2 log(X2) + β3 log(X3)
Using this form, now regress the log of Y on the logs of the three independent
variables and obtain the estimated regression slope coefficients.
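Because the relationship is exact in logs, the slope of log(Y) against log(X) is the elasticity. A toy check with made-up values b0 = 2, b1 = 0.5 (not data from the exercise):

```python
import math

# In logs, the slope of log(Y) against log(X1) is the elasticity b1.
def log_slope(y_pair, x_pair):
    (y0, y1), (x0, x1) = y_pair, x_pair
    return (math.log(y1) - math.log(y0)) / (math.log(x1) - math.log(x0))

# Generate Y = 2 * X1^0.5 at two X1 values and recover b1 = 0.5
ys = tuple(2 * x ** 0.5 for x in (1.0, 4.0))
print(round(log_slope(ys, (1.0, 4.0)), 6))  # 0.5
```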

13.64 a. Coefficients for exponential models can be estimated by taking the
logarithm of both sides of the multiple regression model to obtain an equation
that is linear in the logarithms of the variables:
log(Y) = log(β0) + β1 log(X1) + β2 log(X2) + β3 log(X3) + β4 log(X4) + log(ε)
Substituting in the restrictions on the coefficients: β1 + β2 = 1, so β2 = 1 - β1, and
β3 + β4 = 1, so β4 = 1 - β3:
log(Y) = log(β0) + β1 log(X1) + [1 - β1] log(X2) + β3 log(X3) + [1 - β3] log(X4) + log(ε)
Simplify algebraically and estimate the coefficients. The coefficient β2 can then be
found by subtracting β1 from 1.0. Likewise, the coefficient β4 can be found by
subtracting β3 from 1.0.
b. The constant elasticity of Y with respect to X4 is the regression slope coefficient on the X4
term of the logarithmic model.
13.65 Linear model:
Regression Plot: Salary = 20544.5 + 616.113 Experience
S = 3117.89   R-Sq = 78.0%   R-Sq(adj) = 77.9%
(scatterplot of Salary versus Experience with fitted line not shown)

Quadratic model:
Regression Plot: Salary = 18683.8 + 910.807 Experience - 8.21382 Experience**2
S = 3027.17   R-Sq = 79.4%   R-Sq(adj) = 79.1%
(scatterplot of Salary versus Experience with fitted curve not shown)


Cubic model:
Regression Plot: Salary = 20881.1 + 344.484 Experience + 26.4323 Experience**2 - 0.582553 Experience**3
S = 2982.43   R-Sq = 80.2%   R-Sq(adj) = 79.8%
(scatterplot of Salary versus Experience with fitted curve not shown)
All three of the models appear to fit the data well. The cubic model appears to fit
the data the best as the standard error of the estimate is lowest. In addition,
explanatory power is marginally higher for the cubic model than the other models.
13.66
Results for: GermanImports.xls
Regression Analysis: LogYt versus LogX1t, LogX2t
The regression equation is
LogYt = - 4.07 + 1.36 LogX1t + 0.101 LogX2t

Predictor   Coef      SE Coef   T       P      VIF
Constant    -4.0709   0.3100    -13.13  0.000
LogX1t      1.35935   0.03005   45.23   0.000  4.9
LogX2t      0.10094   0.05715   1.77    0.088  4.9

S = 0.04758   R-Sq = 99.7%   R-Sq(adj) = 99.7%

Analysis of Variance
Source          DF  SS      MS      F        P
Regression      2   21.345  10.673  4715.32  0.000
Residual Error  28  0.063   0.002
Total           30  21.409

Source  DF  Seq SS
LogX1t  1   21.338
LogX2t  1   0.007

13.67 What is the model constant when the dummy variable equals 1?
a. ŷ = 7 + 8x1, b0 = 7
b. ŷ = 12 + 6x1, b0 = 12
c. ŷ = 7 + 12x1, b0 = 7


13.68 What is the model constant when the dummy variable equals 1?
a. ŷ = 5.78 + 4.87x1
b. ŷ = 1.15 + 9.51x1
c. ŷ = 13.67 + 8.98x1
13.69 The interpretation of the dummy variable is that, for a given difference between
the spot price in the current year and the OPEC price in the previous year, the
difference between the OPEC price in the current year and the OPEC price in the
previous year is $5.22 higher in 1974, during the oil embargo, than in other years.
13.70 a. All else being equal, the expected selling price is higher by $3,219 if the condo has a
fireplace.
b. All else being equal, the expected selling price is higher by $2,005 if the condo has
brick siding.
c. 95% CI: 3219 ± 1.96(947) = $1,362.88 up to $5,075.12
d. H0: β5 = 0, H1: β5 > 0, t = 2005/768 = 2.611, t(809,.005) = 2.576
Therefore, reject H0 at the .5% level
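The interval in part (c) is the standard estimate ± z × (standard error) computation; as a sketch (not part of the original solution):

```python
# 95% confidence interval from 13.70(c): estimate +/- z * standard error
b, se, z = 3219, 947, 1.96
lo, hi = b - z * se, b + z * se
print(round(lo, 2), round(hi, 2))  # 1362.88 5075.12
```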

13.71 a. All else being equal, the price-earnings ratio is higher by 1.23 for a regional
company than for a national company.
b. H0: β2 = 0, H1: β2 ≠ 0, t = 1.23/.496 = 2.48, t(29,.01) = 2.462, t(29,.005) = 2.756
Therefore, reject H0 at the 2% level but not at the 1% level
c. H0: β1 = β2 = 0, H1: At least one βi ≠ 0, (i = 1, 2)
F = [(n - k - 1)/k] × R²/(1 - R²) = (29/2) × .37/(1 - .37) = 8.516, F(2,29,.05) = 3.33
Therefore, reject H0 at the 5% level

13.72 35.6% of the variation in overall performance in law school can be explained by
the variation in undergraduate GPA, scores on the LSAT, and whether the student's
letters of recommendation are unusually strong. The overall model is significant,
since we can reject the null hypothesis that the model has no explanatory power in
favor of the alternative hypothesis that the model has significant explanatory
power. The individual regression coefficients that are significantly different from
zero include the scores on the LSAT and whether the student's letters of
recommendation were unusually strong. The coefficient on undergraduate GPA was
not found to be significant at the 5% level.
13.73 a. All else equal, the annual salary of the attorney general who can be removed is
$5,793 higher than if the attorney general cannot be removed.
b. All else equal, the annual salary of the attorney general of the state is $3,100
lower if the supreme court justices are elected on partisan ballots.
c. H0: β5 = 0, H1: β5 > 0, t = 5793/2897 = 1.9996, t(43,.05) = 1.68, t(43,.025) = 2.016
Therefore, reject H0 at the 5% level but not at the 2.5% level
d. H0: β6 = 0, H1: β6 < 0, t = -3100/1761 = -1.76, -t(43,.05) = -1.68, -t(43,.025) = -2.016
Therefore, reject H0 at the 5% level but not at the 2.5% level
e. t(43,.025) = 2.016
95% CI: 547 ± 2.016(124.3) = 296.41 up to 797.59
13.74 a. All else equal, the average rating of a course is 6.21 units higher if a visiting
lecturer is brought in than otherwise.
b. H0: β4 = 0, H1: β4 > 0, t = 6.21/3.59 = 1.73, t(20,.05) = 1.725
Therefore, reject H0 at the 5% level
c. 56.9% of the variation in the average course rating can be explained by the
variation in the percentage of time spent in group discussions, the dollars spent
on preparing the course materials, the dollars spent on food and drinks, and
whether a guest lecturer is brought in.
H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0, (i = 1, 2, 3, 4)
F = [(n - k - 1)/k] × R²/(1 - R²) = (20/4) × .569/(1 - .569) = 6.6, F(4,20,.01) = 4.43
Therefore, reject H0 at the 1% level
d. t(20,.025) = 2.086
95% CI: .52 ± 2.086(.21) = .0819 up to .9581
13.75 34.4% of the variation in a test on understanding college economics can be
explained by which course was taken, the student's GPA, the teacher who taught the
course, the gender of the student, the pre-test score, the number of credit hours
completed, and the age of the student. The regression model has significant
explanatory power:
H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0, (i = 1, 2, 3, 4, 5, 6, 7)
F = [(n - k - 1)/k] × R²/(1 - R²) = (342/7) × .344/(1 - .344) = 25.62


13.76
Results for: Student Performance.xls
Regression Analysis: Y versus X1, X2, X3, X4, X5
The regression equation is
Y = 2.00 + 0.0099 X1 + 0.0763 X2 - 0.137 X3 + 0.064 X4 + 0.138 X5

Predictor  Coef      SE Coef  T      P      VIF
Constant   1.997     1.273    1.57   0.132
X1         0.00990   0.01654  0.60   0.556  1.3
X2         0.07629   0.05654  1.35   0.192  1.2
X3         -0.13652  0.06922  -1.97  0.062  1.1
X4         0.0636    0.2606   0.24   0.810  1.4
X5         0.13794   0.07521  1.83   0.081  1.1

S = 0.5416   R-Sq = 26.5%   R-Sq(adj) = 9.0%

Analysis of Variance
Source          DF  SS      MS      F     P
Regression      5   2.2165  0.4433  1.51  0.229
Residual Error  21  6.1598  0.2933
Total           26  8.3763

The model is not significant (p-value of the F-test = .229). The model only explains
26.5% of the variation in GPA with the hours spent studying, hours spent preparing
for tests, hours spent in bars, whether or not students take notes or mark highlights
when reading texts, and the average number of credit hours taken per semester. The
only independent variables that are marginally significant (10% level but not the 5%
level) are the number of hours spent in bars and the average number of credit hours.
The other independent variables are not significant at common levels of alpha.
13.77 a. Begin the analysis with the correlation matrix to identify important independent
variables as well as correlations between the independent variables.

Correlations: Salary, Experience, yearsenior, Gender_1F
            Salary   Experien  yearseni
Experien     0.883
             0.000
yearseni     0.777     0.674
             0.000     0.000
Gender_1    -0.429    -0.378    -0.292
             0.000     0.000     0.000

Regression Analysis: Salary versus Experience, yearsenior, Gender_1F
The regression equation is
Salary = 22644 + 437 Experience + 415 yearsenior - 1443 Gender_1F

Predictor  Coef     SE Coef  T      P      VIF
Constant   22644.1  521.8    43.40  0.000
Experien   437.10   31.41    13.92  0.000  2.0
yearseni   414.71   55.31    7.50   0.000  1.8
Gender_1   -1443.2  519.8    -2.78  0.006  1.2

S = 2603   R-Sq = 84.9%   R-Sq(adj) = 84.6%

Analysis of Variance
Source          DF   SS          MS          F       P
Regression      3    5559163505  1853054502  273.54  0.000
Residual Error  146  989063178   6774405
Total           149  6548226683

84.9% of the variation in annual salary (in dollars) can be explained by the variation
in the years of experience, the years of seniority, and the gender of the employee. All
of the variables are significant at the .01 level of significance (p-values of .000, .000,
and .006 respectively). The F-test of the significance of the overall model shows
that we reject H0 that all of the slope coefficients are jointly equal to zero in favor
of H1 that at least one slope coefficient is not equal to zero. The F-test yielded a p-value of .000.
b. H0: β3 = 0, H1: β3 < 0
t = -1443.2/519.8 = -2.78, -t(146,.01) = -2.326
Therefore, reject H0 at the 1% level and conclude that the annual salaries for
females are statistically significantly lower than they are for males.
c. Add an interaction term and test for the significance of the slope coefficient on
the interaction term.
13.78 Two variables are included as predictor variables. What is the effect on the
estimated slope coefficients when these two variables have a correlation equal to
a. .78? A large correlation between the independent variables will lead to a high
variance for the estimated slope coefficients and will tend to produce a small
Student's t statistic. Use the rule of thumb |r| > 2/√n to determine whether the
correlation is large.
b. .08? Essentially no correlation exists between the independent variables, so there is
no effect on the estimated slope coefficients.
c. .94? A large correlation between the independent variables will lead to a high
variance for the estimated slope coefficients and will tend to produce a small
Student's t statistic.
d. .33? Use the rule of thumb |r| > 2/√n to determine whether the correlation is large.
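The screen above is mechanical; a sketch assuming the cutoff |r| > 2/√n and a hypothetical sample size n = 50 (not data from the exercise):

```python
import math

def correlation_is_large(r, n):
    """Rule-of-thumb screen: flag the correlation when |r| > 2 / sqrt(n)."""
    return abs(r) > 2 / math.sqrt(n)

n = 50  # hypothetical sample size; 2 / sqrt(50) is about 0.283
for r in (0.78, 0.08, 0.94, 0.33):
    print(r, correlation_is_large(r, n))  # True, False, True, True in turn
```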
13.79 n = 34 and four independent variables. One independent variable has R = .23. Does this imply that this
independent variable will have a very small Student's t statistic?
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a small Student's t statistic. A high correlation among the
independent variables could result in a very small Student's t statistic, as the
correlation creates a high variance.


13.80 n = 47 with three independent variables. One of the independent variables has a
correlation of .95 with the dependent variable.
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a small Student's t statistic. A high correlation among the
independent variables could result in a very small Student's t statistic, as the
correlation creates a high variance.
13.81 n = 49 with two independent variables. One of the independent variables has a
correlation of .56 with the dependent variable.
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a small Student's t statistic. A high correlation among the
independent variables could result in a very small Student's t statistic, as the
correlation creates a high variance.
13.82 through 13.84 Reports can be written by following the extended Case Study on
the data file Cotton; see Section 13.9.
13.85
Regression Analysis: y_deathrate versus x1_totmiles, x2_avgspeed
The regression equation is
y_deathrate = - 2.97 - 0.00447 x1_totmiles + 0.219 x2_avgspeed

Predictor  Coef       SE Coef   T      P      VIF
Constant   -2.969     3.437     -0.86  0.416
x1_totmi   -0.004470  0.001549  -2.89  0.023  11.7
x2_avgsp   0.21879    0.08391   2.61   0.035  11.7

S = 0.1756   R-Sq = 55.1%   R-Sq(adj) = 42.3%

Analysis of Variance
Source          DF  SS       MS       F     P
Regression      2   0.26507  0.13254  4.30  0.061
Residual Error  7   0.21593  0.03085
Total           9   0.48100

55.1% of the variation in death rates can be explained by the variation in total
miles traveled and in average travel speed. The overall model is significant at the
10% but not the 5% level, since the p-value of the F-test is .061.
All else equal, the average speed variable has the expected sign, since as average
speed increases, the death rate also increases. The total miles traveled variable is
negative, which indicates that the more miles traveled, the lower the death rate.
Both of the independent variables are significant at the 5% level (p-values of .023
and .035 respectively). There appears to be some correlation between the
independent variables.


Due to the high correlation between the independent variables, an alternative
quadratic model is as follows:
Regression Analysis: y_deathrate versus x1_totmiles, x1_totsquared
The regression equation is
y_deathrate = - 6.54 + 0.0268 x1_totmiles - 0.000015 x1_totsquared

Predictor  Coef         SE Coef     T      P      VIF
Constant   -6.539       1.296       -5.04  0.001
x1_totmi   0.026800     0.002835    9.45   0.000  285.5
x1_totsq   -0.00001480  0.00000153  -9.68  0.000  285.5

S = 0.06499   R-Sq = 93.9%   R-Sq(adj) = 92.1%

Analysis of Variance
Source          DF  SS       MS       F      P
Regression      2   0.45143  0.22572  53.44  0.000
Residual Error  7   0.02957  0.00422
Total           9   0.48100

Source    DF  Seq SS
x1_totmi  1   0.05534
x1_totsq  1   0.39609

13.86
Regression Analysis: y_FemaleLFPR versus x1_income, x2_yrsedu, x3_femaleun
The regression equation is
y_FemaleLFPR = 0.2 + 0.000406 x1_income + 4.84 x2_yrsedu - 1.55 x3_femaleun

Predictor  Coef       SE Coef    T      P      VIF
Constant   0.16       34.91      0.00   0.996
x1_incom   0.0004060  0.0001736  2.34   0.024  1.2
x2_yrsed   4.842      2.813      1.72   0.092  1.5
x3_femal   -1.5543    0.3399     -4.57  0.000  1.3

S = 3.048   R-Sq = 54.3%   R-Sq(adj) = 51.4%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      3   508.35  169.45  18.24  0.000
Residual Error  46  427.22  9.29
Total           49  935.57

13.87
Regression Analysis: y_money versus x1_pcincome, x2_ir
The regression equation is
y_money = - 1158 + 0.253 x1_pcincome - 19.6 x2_ir

Predictor  Coef     SE Coef  T      P      VIF
Constant   -1158.4  587.9    -1.97  0.080
x1_pcinc   0.25273  0.03453  7.32   0.000  1.3
x2_ir      -19.56   21.73    -0.90  0.391  1.3

S = 84.93   R-Sq = 89.8%   R-Sq(adj) = 87.5%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   570857  285429  39.57  0.000
Residual Error  9   64914   7213
Total           11  635771

Source    DF  Seq SS
x1_pcinc  1   565012
x2_ir     1   5845


13.88
Regression Analysis: y_manufgrowth versus x1_aggrowth, x2_exportgrowth, x3_inflation
The regression equation is
y_manufgrowth = 2.15 + 0.493 x1_aggrowth + 0.270 x2_exportgrowth - 0.117 x3_inflation

Predictor  Coef      SE Coef  T      P      VIF
Constant   2.1505    0.9695   2.22   0.032
x1_aggro   0.4934    0.2020   2.44   0.019  1.0
x2_expor   0.26991   0.06494  4.16   0.000  1.0
x3_infla   -0.11709  0.05204  -2.25  0.030  1.0

S = 3.624   R-Sq = 39.3%   R-Sq(adj) = 35.1%

Analysis of Variance
Source          DF  SS      MS      F     P
Regression      3   373.98  124.66  9.49  0.000
Residual Error  44  577.97  13.14
Total           47  951.95

Source    DF  Seq SS
x1_aggro  1   80.47
x2_expor  1   227.02
x3_infla  1   66.50

13.89 The method of least squares regression yields estimators that are BLUE (Best
Linear Unbiased Estimators). This result holds when the assumptions regarding the
behavior of the error term are true. BLUE estimators are the most efficient ("best")
estimators out of the class of all unbiased estimators. The advent of computing
power incorporating the method of least squares has dramatically increased its use.

13.90 The analysis of variance table identifies how the total variability of the dependent
variable (SST) is split between the portion of variability that is explained by the
regression model (SSR) and the part that is unexplained (SSE). The coefficient of
determination (R²) is derived as the ratio of SSR to SST. The analysis of variance
table also computes the F statistic for the test of the significance of the overall
regression, that is, whether all of the slope coefficients are jointly equal to zero. The
associated p-value is also generally reported in this table.
13.91 a. False. If the regression model does not explain a large enough portion of the
variability of the dependent variable, then the error sum of squares can be larger
than the regression sum of squares.
b. False. The sum of several simple linear regressions will not equal a multiple
regression, since the assumption of all else equal is violated in the simple
linear regressions. The multiple regression holds all else equal in calculating the
partial effect that a change in one of the independent variables has on the
dependent variable.
c. True
d. False. While the regular coefficient of determination (R²) cannot be negative,
the adjusted coefficient of determination R̄² can become negative. If the
independent variables added into a regression equation have very little
explanatory power, the loss of degrees of freedom may more than offset the
added explanatory power.


e. True
13.92 If one model contains more explanatory variables, then SST remains the same for
both models but SSR will be higher for the model with more explanatory variables.
Since SST = SSR1 + SSE1 which is equivalent to SSR2 + SSE2 and given that SSR2 >
SSR1, then SSE1 > SSE2. Hence, the coefficient of determination will be higher with
a greater number of explanatory variables and the coefficient of determination must
be interpreted in conjunction with whether or not the regression slope coefficients on
the explanatory variables are significantly different from zero.
13.93

This is a classic example of what happens when there is a high degree of


correlation between the independent variables. The overall model can be shown to
have significant explanatory power and yet none of the slope coefficients on the
independent variables are significantly different from zero. This is due to the effect
that high correlation among the independent variables has on the variance of the
estimated slope coefficient.

13.94 Σei = Σ(yi - a - b1x1i - b2x2i)
Since a = ȳ - b1x̄1 - b2x̄2,
Σei = Σ(yi - ȳ + b1x̄1 + b2x̄2 - b1x1i - b2x2i)
Σei = nȳ - nȳ + nb1x̄1 + nb2x̄2 - nb1x̄1 - nb2x̄2
Σei = 0
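The algebra in 13.94 can be verified numerically; a one-regressor sketch with made-up data (the two-regressor case works identically):

```python
# OLS by hand on toy data; with an intercept, the residuals must sum to zero.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 2.9, 4.2, 4.8]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
a = ybar - b1 * xbar  # intercept from the normal equations
resid = [y - a - b1 * x for x, y in zip(xs, ys)]
print(round(abs(sum(resid)), 10))  # 0.0
```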

13.95 a. All else equal, a unit change in population, industry size, measure of
economic quality, measure of political quality, measure of environmental quality,
measure of health and educational quality, and social life results in a respective
4.983, 2.198, 3.816, -.310, -.886, 3.215, and .085 change in the new business
starts in the industry.
b. R² = .766. 76.6% of the variability in new business starts in the industry can be
explained by the variability in the independent variables: population, industry
size, and the economic, political, environmental, and health and educational quality-of-life measures.
c. t(62,.05) = 1.67; therefore, the 90% CI: 3.816 ± 1.67(2.063) = .3708 up to 7.2612
d. H0: β5 = 0, H1: β5 ≠ 0, t = -.886/3.055 = -.29, t(62,.025) = 2.000
Therefore, do not reject H0 at the 5% level
e. H0: β6 = 0, H1: β6 ≠ 0, t = 3.215/1.568 = 2.05, t(62,.025) = 2.000
Therefore, reject H0 at the 5% level
f. H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0, (i = 1, 2, 3, 4, 5, 6, 7)
F = [(n - k - 1)/k] × R²/(1 - R²) = (62/7) × .766/(1 - .766) = 28.99
F(7,62,.01) = 2.79. Therefore, reject H0 at the 1% level

13.96


a. All else equal, an increase of one question results in a decrease of 1.834 in the
expected percentage of responses received. All else equal, an increase of one
word in the length of the questionnaire results in a decrease of .016 in the expected
percentage of responses received.
b. 63.7% of the variability in the percentage of responses received can be explained
by the variability in the number of questions asked and the number of words.
c. H0: β1 = β2 = 0, H1: At least one βi ≠ 0, (i = 1, 2)
F = [(n - k - 1)/k] × R²/(1 - R²) = (27/2) × .637/(1 - .637) = 23.69
F(2,27,.01) = 5.49. Therefore, reject H0 at the 1% level
d. t(27,.005) = 2.771, 99% CI: -1.8345 ± 2.771(.6349) = -3.5938 up to -.0752
e. t = -1.78, -t(27,.05) = -1.703, -t(27,.025) = -2.052.
Therefore, reject H0 at the 5% level but not at the 2.5% level


13.97 a. All else equal, a 1% increase in course time spent in group discussion results
in an expected increase of .3817 in the average rating of the course. All else
equal, a dollar increase in money spent on the preparation of subject matter
materials results in an expected increase of .5172 in the average rating by
participants of the course. All else equal, a unit increase in expenditure on
non-course related materials results in an expected increase of .0753 in the average
rating of the course.
b. 57.9% of the variation in the average rating can be explained by the linear
relationship with the % of class time spent on discussion, money spent on the
preparation of subject matter materials, and money spent on non-class related
materials.
c. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0, (i = 1, 2, 3)
F = [(n - k - 1)/k] × R²/(1 - R²) = (21/3) × .579/(1 - .579) = 9.627
F(3,21,.05) = 3.07
Therefore, reject H0 at the 5% level
d. t(21,.05) = 1.721, 90% CI: .3817 ± 1.721(.2018) = .0344 up to .729
e. t = 2.64, t(21,.01) = 2.518, t(21,.005) = 2.831
Therefore, reject H0 at the 1% level but not at the .5% level
f. t = 1.09, t(21,.05) = 1.721. Therefore, do not reject H0 at the 10% level


13.98
Regression Analysis: y_rating versus x1_expgrade, x2_Numstudents
The regression equation is
y_rating = - 0.200 + 1.41 x1_expgrade - 0.0158 x2_Numstudents

Predictor  Coef       SE Coef   T      P      VIF
Constant   -0.2001    0.6968    -0.29  0.777
x1_expgr   1.4117     0.1780    7.93   0.000  1.5
x2_Numst   -0.015791  0.003783  -4.17  0.001  1.5

S = 0.1866   R-Sq = 91.5%   R-Sq(adj) = 90.5%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   6.3375  3.1687  90.99  0.000
Residual Error  17  0.5920  0.0348
Total           19  6.9295

13.99 H0: β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0, (i = 4, 5, 6, 7)
F = [(n - k - 1)/k1] × (R² - R²*)/(1 - R²) = (55/4) × (.467 - .242)/(1 - .467) = 5.804, F(4,55,.01) = 3.68
Therefore, reject H0 at the 1% level

13.100

a. All else equal, each extra point in the student's expected score leads to an
expected increase of .469 in the actual score.
b. t(103,.025) = 1.98; therefore, the 95% CI: 3.369 ± 1.98(.456) = 2.4661 up to 4.2719
c. H0: β3 = 0, H1: β3 ≠ 0, t = 3.054/1.457 = 2.096, t(103,.025) = 1.96
Therefore, reject H0 at the 5% level
d. 68.6% of the variation in the exam scores is explained by their linear
dependence on the student's expected score, hours per week spent working on
the course, and the student's grade point average.
e. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0, (i = 1, 2, 3)
F = [(n - k - 1)/k] × R²/(1 - R²) = (103/3) × .686/(1 - .686) = 75.008, F(3,103,.01) = 3.95
Reject H0 at any common level of alpha
f. R = √.686 = .82825
g. Ŷ = 2.178 + .469(80) + 3.369(8) + 3.054(3) = 75.812
13.101 a. t(22,.005) = 2.819; therefore, the 99% CI: .0974 ± 2.819(.0215)
= .0368 up to .1580
b. H0: β2 = 0, H1: β2 > 0, t = .374/.209 = 1.789, t(22,.05) = 1.717, t(22,.025) = 2.074.
Therefore, reject H0 at the 5% level but not the 2.5% level
c. R² = [22(.91) + 2]/24 = .9175


d. H0: β1 = β2 = 0, H1: At least one βi ≠ 0, (i = 1, 2)
F = [(n - k - 1)/k] × R²/(1 - R²) = (22/2) × .9175/(1 - .9175) = 122.33, F(2,22,.01) = 5.72
Reject H0 at any common level of alpha
e. R = √.9175 = .9579
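Parts (c) and (d) chain two formulas used earlier in the chapter; as a sketch (not part of the original solution, and the function names are ours):

```python
def r2_from_adjusted(r2_adj, n, k):
    """Recover R^2 from the adjusted value: [(n-k-1)*R2adj + k] / (n-1)."""
    return ((n - k - 1) * r2_adj + k) / (n - 1)

def f_stat(r2, n, k):
    """Overall F statistic from R^2: [(n-k-1)/k] * R^2 / (1-R^2)."""
    return (n - k - 1) / k * r2 / (1 - r2)

r2 = r2_from_adjusted(0.91, n=25, k=2)  # 13.101(c): 22(.91) + 2, over 24
print(round(r2, 4))                     # 0.9175
print(round(f_stat(r2, 25, 2), 2))      # 122.33
```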
13.102 a. t(2669,.05) = 1.645; therefore, the 90% CI: 480.04 ± 1.645(224.9) = 110.0795 up
to 850.0005
b. t(2669,.005) = 2.576; therefore, the 99% CI: 1350.3 ± 2.576(212.3) = 803.4152 up
to 1897.1848
c. H0: β8 = 0, H1: β8 ≠ 0, t = 891.67/180.87 = 4.9299
t(2669,.005) = 2.576; therefore, reject H0 at the .5% level
d. H0: β9 = 0, H1: β9 ≠ 0, t = 722.95/110.98 = 6.5142
t(2669,.005) = 2.576; therefore, reject H0 at the .5% level
e. 52.39% of the variability in minutes played in the season can be explained by
the variability in all 9 variables.
f. R = √.5239 = .7238
13.103 a. H0: β1 = 0, H1: β1 > 0
t = .052/.019 = 2.737, t(60,.005) = 2.66; therefore, reject H0 at the 1% level
b. H0: β2 = 0, H1: β2 ≠ 0, t = -.005/.042 = -.119
t(60,.10) = 1.296; therefore, do not reject H0 even at the 20% level
c. 17% of the variation in the growth rate in GDP can be explained by the
variations in real income per capita and the average tax rate, as a proportion of
GNP.
d. R = √.17 = .4123
13.104 A report can be written by following the Case Study and testing the significance
of the model; see Section 13.9.


13.105 a. Start with the correlation matrix:

Correlations: EconGPA, SATverb, SATmath, HSPct
          EconGPA  SATverb  SATmath
SATverb   0.478
          0.000
SATmath   0.427    0.353
          0.000    0.003
HSPct     0.362    0.201    0.497
          0.000    0.121    0.000

Regression Analysis: EconGPA versus SATverb, SATmath, HSPct
The regression equation is
EconGPA = 0.612 + 0.0239 SATverb + 0.0117 SATmath + 0.00530 HSPct
61 cases used 51 cases contain missing values

Predictor  Coef      SE Coef   T     P      VIF
Constant   0.6117    0.4713    1.30  0.200
SATverb    0.023929  0.007386  3.24  0.002  1.2
SATmath    0.011722  0.007887  1.49  0.143  1.5
HSPct      0.005303  0.004213  1.26  0.213  1.3

S = 0.4238   R-Sq = 32.9%   R-Sq(adj) = 29.4%

Analysis of Variance
Source          DF  SS       MS      F     P
Regression      3   5.0171   1.6724  9.31  0.000
Residual Error  57  10.2385  0.1796
Total           60  15.2556

Source   DF  Seq SS
SATverb  1   3.7516
SATmath  1   0.9809
HSPct    1   0.2846

The regression model indicates positive coefficients, as expected, for all three
independent variables. The greater the high school rank, and the higher the SAT
verbal and SAT math scores, the larger the Econ GPA. The high school rank
variable has the smallest t-statistic and is removed from the model:
Regression Analysis: EconGPA versus SATverb, SATmath
The regression equation is
EconGPA = 0.755 + 0.0230 SATverb + 0.0174 SATmath
67 cases used 45 cases contain missing values

Predictor  Coef      SE Coef   T     P      VIF
Constant   0.7547    0.4375    1.72  0.089
SATverb    0.022951  0.006832  3.36  0.001  1.1
SATmath    0.017387  0.006558  2.65  0.010  1.1

S = 0.4196   R-Sq = 30.5%   R-Sq(adj) = 28.3%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      2   4.9488   2.4744  14.05  0.000
Residual Error  64  11.2693  0.1761
Total           66  16.2181

Source   DF  Seq SS
SATverb  1   3.7109
SATmath  1   1.2379


Both SAT variables are now statistically significant at the .05 level and appear to
pick up separate influences on the dependent variable. The simple correlation
coefficient between SAT math and SAT verbal is relatively low at .353. Thus,
multicollinearity will not be dominant in this regression model.
The final regression model, with conditional t-statistics in parentheses under the
coefficients, is:
Ŷ = .755 + .023(SATverbal) + .0174(SATmath)
           (3.36)           (2.65)
S = .4196   R² = .305   n = 67
b. Start with the correlation matrix:

Correlations: EconGPA, Acteng, ACTmath, ACTss, ACTcomp, HSPct
          EconGPA  Acteng  ACTmath  ACTss  ACTcomp
Acteng    0.387
          0.001
ACTmath   0.338    0.368
          0.003    0.001
ACTss     0.442    0.448   0.439
          0.000    0.000   0.000
ACTcomp   0.474    0.650   0.765    0.812
          0.000    0.000   0.000    0.000
HSPct     0.362    0.173   0.290    0.224   0.230
          0.000    0.150   0.014    0.060   0.053

Regression Analysis: EconGPA versus Acteng, ACTmath, ACTss, ACTcomp, HSPct
The regression equation is
EconGPA = - 0.207 + 0.0266 Acteng - 0.0023 ACTmath + 0.0212 ACTss
+ 0.0384 ACTcomp + 0.0128 HSPct
71 cases used 41 cases contain missing values

Predictor  Coef      SE Coef   T      P      VIF
Constant   -0.2069   0.6564    -0.32  0.754
Acteng     0.02663   0.02838   0.94   0.352  2.2
ACTmath    -0.00229  0.03031   -0.08  0.940  4.2
ACTss      0.02118   0.02806   0.75   0.453  4.6
ACTcomp    0.03843   0.07287   0.53   0.600  12.7
HSPct      0.012817  0.005271  2.43   0.018  1.2

S = 0.5034   R-Sq = 31.4%   R-Sq(adj) = 26.1%

Analysis of Variance
Source          DF  SS       MS      F     P
Regression      5   7.5253   1.5051  5.94  0.000
Residual Error  65  16.4691  0.2534
Total           70  23.9945

Source   DF  Seq SS
Acteng   1   3.5362
ACTmath  1   1.0529
ACTss    1   1.4379
ACTcomp  1   0.0001
HSPct    1   1.4983


The regression shows that only high school rank is significant at the .05 level. We
may suspect multicollinearity between the variables, particularly since there is a
total ACT score (ACT composite) as well as the components that make up the
ACT composite. Since conditional significance is dependent on which other
independent variables are included in the regression equation, drop one variable at
a time. ACTmath has the lowest t-statistic and is removed:
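The one-variable-at-a-time elimination used throughout this chapter can be sketched in code. Below is a minimal illustration with NumPy on synthetic data; the variables x1–x3 and the cutoff of |t| = 2 are hypothetical stand-ins, not the textbook's data file:

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS with an intercept; return coefficients and t-statistics."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # least-squares fit
    resid = y - Xd @ beta
    s2 = resid @ resid / (n - k - 1)               # residual variance
    cov = s2 * np.linalg.inv(Xd.T @ Xd)            # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
# x3 has no true effect, so backward elimination should drop it
y = 2.0 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=n)

names = ["x1", "x2", "x3"]
while True:
    beta, t = ols_t_stats(X, y)
    worst = np.argmin(np.abs(t[1:]))   # slope with the smallest |t|
    if abs(t[1 + worst]) >= 2.0:       # rough 5% cutoff
        break
    print("dropping", names.pop(worst))
    X = np.delete(X, worst, axis=1)

print("kept:", names)
```

Dropping only one variable at a time matters because each remaining t-statistic is conditional on the other predictors still in the model.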
Regression Analysis: EconGPA versus Acteng, ACTss, ACTcomp, HSPct

The regression equation is
EconGPA = - 0.195 + 0.0276 Acteng + 0.0224 ACTss + 0.0339 ACTcomp
          + 0.0127 HSPct

71 cases used 41 cases contain missing values

Predictor       Coef    SE Coef      T      P    VIF
Constant     -0.1946     0.6313  -0.31  0.759
Acteng       0.02756    0.02534   1.09  0.281    1.8
ACTss        0.02242    0.02255   0.99  0.324    3.0
ACTcomp      0.03391    0.04133   0.82  0.415    4.2
HSPct       0.012702   0.005009   2.54  0.014    1.1

S = 0.4996   R-Sq = 31.4%   R-Sq(adj) = 27.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       4   7.5239  1.8810  7.54  0.000
Residual Error  66  16.4706  0.2496
Total           70  23.9945

Source   DF  Seq SS
Acteng    1  3.5362
ACTss     1  2.1618
ACTcomp   1  0.2211
HSPct     1  1.6048
Again, high school rank is the only conditionally significant variable. ACTcomp
has the lowest t-statistic and is removed:
Regression Analysis: EconGPA versus Acteng, ACTss, HSPct

The regression equation is
EconGPA = 0.049 + 0.0390 Acteng + 0.0364 ACTss + 0.0129 HSPct

71 cases used 41 cases contain missing values

Predictor       Coef    SE Coef     T      P    VIF
Constant      0.0487     0.5560  0.09  0.930
Acteng       0.03897    0.02114  1.84  0.070    1.3
ACTss        0.03643    0.01470  2.48  0.016    1.3
HSPct       0.012896   0.004991  2.58  0.012    1.1

S = 0.4983   R-Sq = 30.7%   R-Sq(adj) = 27.6%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3   7.3558  2.4519  9.87  0.000
Residual Error  67  16.6386  0.2483
Total           70  23.9945

Source   DF  Seq SS
Acteng    1  3.5362
ACTss     1  2.1618
HSPct     1  1.6579
Now ACTss and high school rank are conditionally significant. ACTenglish has a t-statistic less than 2 and is removed:

Regression Analysis: EconGPA versus ACTss, HSPct

The regression equation is
EconGPA = 0.566 + 0.0479 ACTss + 0.0137 HSPct

71 cases used 41 cases contain missing values

Predictor       Coef    SE Coef     T      P    VIF
Constant      0.5665     0.4882  1.16  0.250
ACTss        0.04790    0.01355  3.53  0.001    1.1
HSPct       0.013665   0.005061  2.70  0.009    1.1

S = 0.5070   R-Sq = 27.1%   R-Sq(adj) = 25.0%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   6.5123  3.2562  12.67  0.000
Residual Error  68  17.4821  0.2571
Total           70  23.9945

Source   DF  Seq SS
ACTss     1  4.6377
HSPct     1  1.8746

Both of the independent variables are statistically significant at the .05 level and
hence, the final regression model, with conditional t-statistics in parentheses under
the coefficients, is:
Y = .567 + .0479(ACTss) + .0137(HSPct)
           (3.53)         (2.70)
S = .5070   R² = .271   n = 71
c. The regression model with the SAT variables is the better predictor
because the standard error of the estimate is smaller than for the ACT model
(.4196 vs. .5070). The R2 measure cannot be directly compared due to the sample
size differences.
13.106
Correlations: Salary, age, Experien, yrs_asoc, yrs_full, Sex_1Fem, Market, C8

            Salary     age  Experien  yrs_asoc  yrs_full  Sex_1Fem  Market
age          0.749
             0.000
Experien     0.883   0.877
             0.000   0.000
yrs_asoc     0.698   0.712     0.803
             0.000   0.000     0.000
yrs_full     0.777   0.583     0.674     0.312
             0.000   0.000     0.000     0.000
Sex_1Fem    -0.429  -0.234    -0.378    -0.367    -0.292
             0.000   0.004     0.000     0.000     0.000
Market       0.026  -0.134    -0.150    -0.113    -0.017     0.062
             0.750   0.103     0.067     0.169     0.833     0.453
C8          -0.029  -0.189    -0.117    -0.073    -0.043    -0.094  -0.107
             0.721   0.020     0.155     0.373     0.598     0.254   0.192

The correlation matrix indicates that several of the independent variables are likely
to be significant; however, multicollinearity is also likely. The regression
model with all independent variables is:
Regression Analysis: Salary versus age, Experien, ...

The regression equation is
Salary = 23725 - 40.3 age + 357 Experien + 263 yrs_asoc + 493 yrs_full
         - 954 Sex_1Fem + 3427 Market + 1188 C8

Predictor     Coef  SE Coef      T      P   VIF
Constant     23725     1524  15.57  0.000
age         -40.29    44.98  -0.90  0.372   4.7
Experien    356.83    63.48   5.62  0.000  10.0
yrs_asoc    262.50    75.11   3.49  0.001   4.0
yrs_full    492.91    59.27   8.32  0.000   2.6
Sex_1Fem    -954.1    487.3  -1.96  0.052   1.3
Market      3427.2    754.1   4.54  0.000   1.1
C8          1188.4    597.5   1.99  0.049   1.1

S = 2332   R-Sq = 88.2%   R-Sq(adj) = 87.6%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression        7  5776063882  825151983  151.74  0.000
Residual Error  142   772162801    5437766
Total           149  6548226683

Source     DF      Seq SS
age         1  3669210599
Experien    1  1459475287
yrs_asoc    1     1979334
yrs_full    1   500316356
Sex_1Fem    1    22707368
Market      1   100860164

Since age is insignificant and has the smallest t-statistic, it is removed from the
model. The conditional F test for age is:

F = (SSR_F - SSR_R) / s²_Y|X = (5,776,063,882 - 5,771,700,580) / (2332)² = .80

which is well below any common critical value of F. Thus, age is removed from
the model. The remaining independent variables are all significant at the .05 level
of significance and hence become the final regression model. Residual analysis to
determine if the assumption of linearity holds true follows:
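The conditional F statistic can be reproduced directly from the printed ANOVA values. Here the exact regression sums of squares and the full model's residual mean square are used rather than the rounded figures:

```python
# Conditional (partial) F test for dropping `age`:
# F = (SSR_full - SSR_reduced) / r / MSE_full, with r = 1 restriction.
ssr_full = 5_776_063_882      # regression SS with age included
ssr_reduced = 5_771_700_580   # regression SS with age removed
mse_full = 5_437_766          # residual mean square of the full model

f_stat = (ssr_full - ssr_reduced) / 1 / mse_full
print(round(f_stat, 2))  # about 0.80, well below any common critical F value
```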

Regression Analysis: Salary versus Experien, yrs_asoc, ...

The regression equation is
Salary = 22455 + 324 Experien + 258 yrs_asoc + 491 yrs_full - 1043 Sex_1Fem
         + 3449 Market + 1274 C8

Predictor      Coef  SE Coef      T      P  VIF
Constant    22455.2    557.7  40.26  0.000
Experien     324.24    51.99   6.24  0.000  6.7
yrs_asoc     257.88    74.88   3.44  0.001  4.0
yrs_full     490.97    59.19   8.29  0.000  2.6
Sex_1Fem    -1043.4    476.7  -2.19  0.030  1.2
Market       3449.4    753.2   4.58  0.000  1.1
C8           1274.5    589.3   2.16  0.032  1.1

S = 2330   R-Sq = 88.1%   R-Sq(adj) = 87.6%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression        6  5771700580  961950097  177.15  0.000
Residual Error  143   776526103    5430252
Total           149  6548226683

[Residual plot: RESI1 versus Experien]
[Residual plots: RESI1 versus yrs_asoc, yrs_full, Sex_1Fem, Market, and C8]

The residual plot for Experience shows a relatively strong quadratic relationship
between Experience and Salary. Therefore, a new variable that captures the
quadratic relationship is generated and added to the model. None of the other
residual plots shows strong evidence of non-linearity.
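Generating the squared term and refitting can be sketched as follows. The data here are synthetic stand-ins for the salary file (hypothetical coefficients and noise level), so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
exper = rng.uniform(0, 40, size=200)
# Hypothetical salary with a concave (quadratic) experience profile
salary = 19000 + 900 * exper - 16 * exper**2 + rng.normal(0, 1000, size=200)

def r_squared(X, y):
    """R-squared of an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_linear = r_squared(exper[:, None], salary)
r2_quadratic = r_squared(np.column_stack([exper, exper**2]), salary)
print(r2_linear, r2_quadratic)  # the squared term raises R-squared noticeably
```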
Regression Analysis: Salary versus Experien, ExperSquared, ...

The regression equation is
Salary = 18915 + 875 Experien - 15.9 ExperSquared + 222 yrs_asoc
         + 612 yrs_full - 650 Sex_1Fem + 3978 Market + 1042 C8

Predictor      Coef  SE Coef      T      P   VIF
Constant    18915.2    583.2  32.43  0.000
Experien     875.35    72.20  12.12  0.000  20.6
ExperSqu    -15.947    1.717  -9.29  0.000  16.2
yrs_asoc     221.58    59.40   3.73  0.000   4.0
yrs_full     612.10    48.63  12.59  0.000   2.8
Sex_1Fem     -650.1    379.6  -1.71  0.089   1.2
Market       3978.3    598.8   6.64  0.000   1.1
C8           1042.3    467.1   2.23  0.027   1.1

S = 1844   R-Sq = 92.6%   R-Sq(adj) = 92.3%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression        7  6065189270  866455610  254.71  0.000
Residual Error  142   483037413    3401672
Total           149  6548226683

Source     DF      Seq SS
Experien    1  5109486518
ExperSqu    1    91663414
yrs_asoc    1    15948822
yrs_full    1   678958872
Sex_1Fem    1    12652358
Market      1   139540652
C8          1    16938635

The squared term for experience is statistically significant; however, Sex_1Fem
is no longer significant at the .05 level and hence is removed from the model:
Regression Analysis: Salary versus Experien, ExperSquared, ...

The regression equation is
Salary = 18538 + 888 Experien - 16.3 ExperSquared + 237 yrs_asoc
         + 624 yrs_full + 3982 Market + 1145 C8

Predictor      Coef  SE Coef      T      P   VIF
Constant    18537.8    543.6  34.10  0.000
Experien     887.85    72.32  12.28  0.000  20.4
ExperSqu    -16.275    1.718  -9.48  0.000  16.0
yrs_asoc     236.89    59.11   4.01  0.000   3.9
yrs_full     624.49    48.41  12.90  0.000   2.8
Market       3981.8    602.9   6.60  0.000   1.1
C8           1145.4    466.3   2.46  0.015   1.0

S = 1857   R-Sq = 92.5%   R-Sq(adj) = 92.2%

Analysis of Variance
Source           DF          SS          MS       F      P
Regression        6  6055213011  1009202168  292.72  0.000
Residual Error  143   493013673     3447648
Total           149  6548226683

This is the final model with all of the independent variables being conditionally
significant, including the quadratic transformation of Experience. This would
indicate that a non-linear relationship exists between experience and salary.
13.107
Correlations: hseval, Comper, Homper, Indper, sizehse, incom72

          hseval  Comper  Homper  Indper  sizehse
Comper    -0.335
           0.001
Homper     0.145  -0.499
           0.171   0.000
Indper    -0.086  -0.140  -0.564
           0.419   0.188   0.000
sizehse    0.542  -0.278   0.274  -0.245
           0.000   0.008   0.009   0.020
incom72    0.426  -0.198  -0.083   0.244    0.393
           0.000   0.062   0.438   0.020    0.000

The correlation matrix indicates that the size of the house, income and percent
homeowners have a positive relationship with house value. There is a negative
relationship between the percent industrial and percent commercial and house
value.

Regression Analysis: hseval versus Comper, Homper, ...

The regression equation is
hseval = - 19.0 - 26.4 Comper - 12.1 Homper - 15.5 Indper + 7.22 sizehse
         + 0.00408 incom72

Predictor      Coef   SE Coef      T      P  VIF
Constant     -19.02     13.20  -1.44  0.153
Comper      -26.393     9.890  -2.67  0.009  2.2
Homper      -12.123     7.508  -1.61  0.110  3.0
Indper      -15.531     8.630  -1.80  0.075  2.6
sizehse       7.219     2.138   3.38  0.001  1.5
incom72    0.004081  0.001555   2.62  0.010  1.4

S = 3.949   R-Sq = 40.1%   R-Sq(adj) = 36.5%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5   876.80  175.36  11.25  0.000
Residual Error  84  1309.83   15.59
Total           89  2186.63

All variables are conditionally significant with the exception of Indper and Homper.
Since Homper has the smaller t-statistic, it is removed:
Regression Analysis: hseval versus Comper, Indper, sizehse, incom72

The regression equation is
hseval = - 30.9 - 15.2 Comper - 5.73 Indper + 7.44 sizehse + 0.00418 incom72

Predictor      Coef   SE Coef      T      P  VIF
Constant     -30.88     11.07  -2.79  0.007
Comper      -15.211     7.126  -2.13  0.036  1.1
Indper       -5.735     6.194  -0.93  0.357  1.3
sizehse       7.439     2.154   3.45  0.001  1.5
incom72    0.004175  0.001569   2.66  0.009  1.4

S = 3.986   R-Sq = 38.2%   R-Sq(adj) = 35.3%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       4   836.15  209.04  13.16  0.000
Residual Error  85  1350.48   15.89
Total           89  2186.63

Indper is not significant and is removed:


Regression Analysis: hseval versus Comper, sizehse, incom72

The regression equation is
hseval = - 34.2 - 13.9 Comper + 8.27 sizehse + 0.00364 incom72

Predictor      Coef   SE Coef      T      P  VIF
Constant     -34.24     10.44  -3.28  0.002
Comper      -13.881     6.974  -1.99  0.050  1.1
sizehse       8.270     1.957   4.23  0.000  1.2
incom72    0.003636  0.001456   2.50  0.014  1.2

S = 3.983   R-Sq = 37.6%   R-Sq(adj) = 35.4%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3   822.53  274.18  17.29  0.000
Residual Error  86  1364.10   15.86
Total           89  2186.63

This becomes the final regression model. The selection of a community with the
objective of having larger house values would include communities where the
percent of commercial property is low, the median rooms per residence is high and
the per capita income is high.
13.108

a. Correlation matrix:
Correlations: deaths, vehwt, impcars, lghttrks, carage

          deaths   vehwt  impcars  lghttrks
vehwt      0.244
           0.091
impcars   -0.284  -0.943
           0.048   0.000
lghttrks   0.726   0.157   -0.175
           0.000   0.282    0.228
carage    -0.422   0.123    0.011    -0.329
           0.003   0.400    0.943     0.021

Crash deaths are positively related to vehicle weight and percentage of light trucks
and negatively related to percent imported cars and car age. Light trucks will have
the strongest linear association of any independent variable followed by car age.
Multicollinearity is likely to exist due to the strong correlation between impcars
and vehicle weight.
b.
Regression Analysis: deaths versus vehwt, impcars, lghttrks, carage

The regression equation is
deaths = 2.60 + 0.000064 vehwt - 0.00121 impcars + 0.00833 lghttrks
         - 0.0395 carage

Predictor       Coef    SE Coef      T      P   VIF
Constant       2.597      1.247   2.08  0.043
vehwt      0.0000643  0.0001908   0.34  0.738  10.9
impcars    -0.001213   0.005249  -0.23  0.818  10.6
lghttrks    0.008332   0.001397   5.96  0.000   1.2
carage      -0.03946    0.01916  -2.06  0.045   1.4

S = 0.05334   R-Sq = 59.5%   R-Sq(adj) = 55.8%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       4  0.183634  0.045909  16.14  0.000
Residual Error  44  0.125174  0.002845
Total           48  0.308809

Light trucks is a significant positive variable. Since impcars has the smallest t-statistic, it is removed from the model:

Regression Analysis: deaths versus vehwt, lghttrks, carage

The regression equation is
deaths = 2.55 + 0.000106 vehwt + 0.00831 lghttrks - 0.0411 carage

Predictor        Coef     SE Coef      T      P  VIF
Constant        2.555       1.220   2.09  0.042
vehwt      0.00010622  0.00005901   1.80  0.079  1.1
lghttrks     0.008312    0.001380   6.02  0.000  1.2
carage       -0.04114     0.01754  -2.34  0.024  1.2

S = 0.05277   R-Sq = 59.4%   R-Sq(adj) = 56.7%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       3  0.183482  0.061161  21.96  0.000
Residual Error  45  0.125326  0.002785
Total           48  0.308809

Also, remove vehicle weight using the same argument:


Regression Analysis: deaths versus lghttrks, carage

The regression equation is
deaths = 2.51 + 0.00883 lghttrks - 0.0352 carage

Predictor      Coef   SE Coef      T      P  VIF
Constant      2.506     1.249   2.01  0.051
lghttrks   0.008835  0.001382   6.39  0.000  1.1
carage     -0.03522   0.01765  -2.00  0.052  1.1

S = 0.05404   R-Sq = 56.5%   R-Sq(adj) = 54.6%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       2  0.174458  0.087229  29.87  0.000
Residual Error  46  0.134351  0.002921
Total           48  0.308809

The model has light trucks and car age as the significant variables. Note that car
age is marginally significant (p-value of .052) and hence could also be dropped
from the model.
c. The regression modeling indicates that the percentage of light trucks is
conditionally significant in all of the models and hence is an important predictor
in the model. Car age and imported cars are marginally significant predictors
when only light trucks is included in the model.
13.109

a. Correlation matrix:
Correlations: deaths, Purbanpop, Ruspeed, Prsurf

          deaths  Purbanpo  Ruspeed
Purbanpo  -0.594
           0.000
Ruspeed    0.305    -0.224
           0.033     0.121
Prsurf    -0.556     0.207   -0.232
           0.000     0.153    0.109

Descriptive Statistics: deaths, Purbanpop, Prsurf, Ruspeed

Variable   N    Mean  Median  TrMean   StDev  SE Mean
deaths    49  0.1746  0.1780  0.1675  0.0802   0.0115
Purbanpo  49  0.5890  0.6311  0.5992  0.2591   0.0370
Prsurf    49  0.7980  0.8630  0.8117  0.1928   0.0275
Ruspeed   49  58.186  58.400  58.222   1.683    0.240

Variable  Minimum  Maximum      Q1      Q3
deaths     0.0569   0.5505  0.1240  0.2050
Purbanpo   0.0000   0.9689  0.4085  0.8113
Prsurf     0.2721   1.0000  0.6563  0.9485
Ruspeed    53.500   62.200  57.050  59.150

The proportion of urban population and rural roads that are surfaced are negatively
related to crash deaths. Average rural speed is positively related, but the
relationship is not as strong as the proportion of urban population and surfaced
roads. The simple correlation coefficients among the independent variables are
relatively low and hence multicollinearity should not be dominant in this model.
Note the relatively narrow range for average rural speed. This would indicate that
there is not much variability in this independent variable.
b. Multiple regression
Regression Analysis: deaths versus Purbanpop, Prsurf, Ruspeed

The regression equation is
deaths = 0.141 - 0.149 Purbanpop - 0.181 Prsurf + 0.00457 Ruspeed

Predictor      Coef   SE Coef      T      P  VIF
Constant     0.1408    0.2998   0.47  0.641
Purbanpo   -0.14946   0.03192  -4.68  0.000  1.1
Prsurf     -0.18058   0.04299  -4.20  0.000  1.1
Ruspeed    0.004569  0.004942   0.92  0.360  1.1

S = 0.05510   R-Sq = 55.8%   R-Sq(adj) = 52.8%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       3  0.172207  0.057402  18.91  0.000
Residual Error  45  0.136602  0.003036
Total           48  0.308809

The model has conditionally significant variables for percent urban population and
percent surfaced roads. Since average rural speed is not conditionally significant,
it is dropped from the model:

Regression Analysis: deaths versus Purbanpop, Prsurf

The regression equation is
deaths = 0.416 - 0.155 Purbanpop - 0.188 Prsurf

Predictor      Coef  SE Coef      T      P  VIF
Constant    0.41609  0.03569  11.66  0.000
Purbanpo   -0.15493  0.03132  -4.95  0.000  1.0
Prsurf     -0.18831  0.04210  -4.47  0.000  1.0

S = 0.05501   R-Sq = 54.9%   R-Sq(adj) = 53.0%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       2  0.169612  0.084806  28.03  0.000
Residual Error  46  0.139197  0.003026
Total           48  0.308809

This becomes the final model since both variables are conditionally significant.
c. Conclude that the proportion of urban population and the percent of rural
roads that are surfaced are important independent variables in explaining crash
deaths. All else equal, a higher proportion of urban population is associated
with lower crash deaths. All else equal, increases in the proportion of rural
roads that are surfaced will result in lower crash deaths. The average rural
speed is not conditionally significant.
13.110 a. Correlation matrix and descriptive statistics
Correlations: hseval, sizehse, Taxhse, Comper, incom72, totexp

          hseval  sizehse  Taxhse  Comper  incom72
sizehse    0.542
           0.000
Taxhse     0.248    0.289
           0.019    0.006
Comper    -0.335   -0.278  -0.114
           0.001    0.008   0.285
incom72    0.426    0.393   0.261  -0.198
           0.000    0.000   0.013   0.062
totexp     0.261   -0.022   0.228   0.269    0.376
           0.013    0.834   0.030   0.010    0.000

The correlation matrix shows that multicollinearity is not likely to be a problem in
this model since all of the correlations among the independent variables are
relatively low.

Descriptive Statistics: hseval, sizehse, Taxhse, Comper, incom72, totexp

Variable   N     Mean   Median   TrMean    StDev  SE Mean
hseval    90   21.031   20.301   20.687    4.957    0.522
sizehse   90   5.4778   5.4000   5.4638   0.2407   0.0254
Taxhse    90   130.13   131.67   128.31    48.89     5.15
Comper    90  0.16211  0.15930  0.16206  0.06333  0.00668
incom72   90   3360.9   3283.0   3353.2    317.0     33.4
totexp    90  1488848  1089110  1295444  1265564   133402

Variable  Minimum  Maximum       Q1       Q3
hseval     13.300   35.976   17.665   24.046
sizehse    5.0000   6.2000   5.3000   5.6000
Taxhse      35.04   399.60    98.85   155.19
Comper    0.02805  0.28427  0.11388  0.20826
incom72    2739.0   4193.0   3114.3   3585.3
totexp     361290  7062330   808771  1570275

The range for applying the regression model (variable mean +/- 2 standard deviations):
Hseval   21.03 +/- 2(4.957) = 11.11 to 30.94
Sizehse  5.48 +/- 2(.24) = 5.0 to 5.96
Taxhse   130.13 +/- 2(48.89) = 32.35 to 227.91
Comper   .16 +/- 2(.063) = .034 to .286
Incom72  3361 +/- 2(317) = 2727 to 3995
Totexp   1488848 +/- 2(1265564) = not a good approximation (the lower bound is negative)
b. Regression models:
Regression Analysis: hseval versus sizehse, Taxhse, ...

The regression equation is
hseval = - 31.1 + 9.10 sizehse - 0.00058 Taxhse - 22.2 Comper
         + 0.00120 incom72 + 0.000001 totexp

Predictor        Coef     SE Coef      T      P  VIF
Constant       -31.07       10.09  -3.08  0.003
sizehse         9.105       1.927   4.72  0.000  1.3
Taxhse      -0.000584    0.008910  -0.07  0.948  1.2
Comper        -22.197       7.108  -3.12  0.002  1.3
incom72      0.001200    0.001566   0.77  0.445  1.5
totexp     0.00000125  0.00000038   3.28  0.002  1.5

S = 3.785   R-Sq = 45.0%   R-Sq(adj) = 41.7%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5   982.98  196.60  13.72  0.000
Residual Error  84  1203.65   14.33
Total           89  2186.63

Taxhse is not conditionally significant, nor is income. Dropping one variable at a
time, eliminate Taxhse first and then eliminate income:

Regression Analysis: hseval versus sizehse, Comper, totexp

The regression equation is
hseval = - 29.9 + 9.61 sizehse - 23.5 Comper + 0.000001 totexp

Predictor        Coef     SE Coef      T      P  VIF
Constant      -29.875       9.791  -3.05  0.003
sizehse         9.613       1.724   5.58  0.000  1.1
Comper        -23.482       6.801  -3.45  0.001  1.2
totexp     0.00000138  0.00000033   4.22  0.000  1.1

S = 3.754   R-Sq = 44.6%   R-Sq(adj) = 42.6%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3   974.55  324.85  23.05  0.000
Residual Error  86  1212.08   14.09
Total           89  2186.63

This is the final regression model. All of the independent variables are
conditionally significant.
Both the size of house and total government expenditures enhance the market value
of homes, while the percent of commercial property tends to reduce market values
of homes.
c. In the final regression model, the tax variable was not found to be conditionally
significant and hence it is difficult to support the developer's claim.
13.111

a. Correlation matrix
Correlations: retsal84, Unemp84, perinc84

          retsal84  Unemp84
Unemp84     -0.370
             0.008
perinc84     0.633   -0.232
             0.000    0.101

There is a positive association between per capita income and retail sales. There is
a negative association between unemployment and retail sales. High correlation
among the independent variables does not appear to be a problem since the
correlation between the independent variables is relatively low.
Descriptive Statistics: retsal84, perinc84, Unemp84

Variable   N   Mean  Median  TrMean  StDev  SE Mean
retsal84  51   5536    5336    5483    812      114
perinc84  51  12277   12314   12166   1851      259
Unemp84   51  7.335   7.000   7.196  2.216    0.310

Variable  Minimum  Maximum     Q1     Q3
retsal84     4250     8348   5059   6037
perinc84     8857    17148  10689  13218

Regression Analysis: retsal84 versus Unemp84, perinc84

The regression equation is
retsal84 = 3054 - 86.3 Unemp84 + 0.254 perinc84

Predictor     Coef  SE Coef      T      P  VIF
Constant    3054.3    724.4   4.22  0.000
Unemp84     -86.25    40.20  -2.15  0.037  1.1
perinc84   0.25368  0.04815   5.27  0.000  1.1

S = 612.9   R-Sq = 45.3%   R-Sq(adj) = 43.0%

Analysis of Variance
Source          DF        SS       MS      F      P
Regression       2  14931938  7465969  19.88  0.000
Residual Error  48  18029333   375611
Total           50  32961271

This is the final model since all of the independent variables are conditionally significant at
the .05 level. The 95% confidence intervals for the regression slope coefficients are:
b1 +/- t(Sb1): -86.25 +/- 2.011(40.2) = -86.25 +/- 80.84
b2 +/- t(Sb2): .254 +/- 2.011(.04815) = .254 +/- .0968
b. All else equal, the conditional effect of a $1,000 decrease in per capita income
on retail sales would be to reduce retail sales by $254.
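The $254 figure follows directly from the slope coefficient:

```python
b_perinc = 0.25368     # estimated slope on per capita income
change_income = -1000  # a $1,000 decrease in per capita income
effect = b_perinc * change_income
print(round(effect))   # retail sales fall by about $254, all else equal
```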

c.

Adding state population as a predictor yields the following regression results:


Regression Analysis: retsal84 versus Unemp84, perinc84, Totpop84

The regression equation is
retsal84 = 2828 - 71.3 Unemp84 + 0.272 perinc84 - 0.0247 Totpop84

Predictor      Coef  SE Coef      T      P  VIF
Constant     2828.4    737.9   3.83  0.000
Unemp84      -71.33    41.40  -1.72  0.091  1.1
perinc84    0.27249  0.04977   5.47  0.000  1.1
Totpop84   -0.02473  0.01845  -1.34  0.187  1.1

S = 607.8   R-Sq = 47.3%   R-Sq(adj) = 44.0%

Analysis of Variance
Source          DF        SS       MS      F      P
Regression       3  15595748  5198583  14.07  0.000
Residual Error  47  17365523   369479
Total           50  32961271

The population variable is not conditionally significant and adds little explanatory power,
therefore, it will not improve the multiple regression model.
13.112 a.
Correlations: FRH, FBPR, FFED, FM2, GDPH, GH

         FRH   FBPR   FFED    FM2   GDPH
FBPR   0.510
       0.000
FFED   0.244  0.957
       0.001  0.000
FM2    0.854  0.291  0.077
       0.000  0.000  0.326
GDPH   0.934  0.580  0.287  0.987
       0.000  0.000  0.000  0.000
GH     0.907  0.592  0.285  0.977  0.973
       0.000  0.000  0.000  0.000  0.000

The correlation matrix shows that both interest rates have a significant positive
linear association with residential investment. The money supply, GDP and government
expenditures also have a significant linear association with residential investment.
Note the high correlation between the two interest rate variables, which, as
expected, would create significant problems if both variables were included in the
same regression model. Hence, the two interest rates will be used in two separate
models.
Regression Analysis: FRH versus FBPR, FM2, GDPH, GH

The regression equation is
FRH = 70.0 - 3.79 FBPR - 0.0542 FM2 + 0.0932 GDPH - 0.165 GH

166 cases used 52 cases contain missing values

Predictor       Coef   SE Coef      T      P   VIF
Constant       70.00     24.87   2.82  0.005
FBPR         -3.7871    0.6276  -6.03  0.000   1.2
FM2        -0.054210  0.009210  -5.89  0.000  46.8
GDPH        0.093223  0.007389  12.62  0.000  58.1
GH          -0.16514   0.03747  -4.41  0.000  28.7

S = 23.42   R-Sq = 86.7%   R-Sq(adj) = 86.3%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        4  573700  143425  261.42  0.000
Residual Error  161   88331     549
Total           165  662030

This will be the final model with the prime rate as the interest rate variable since all of
the independent variables are conditionally significant. Note the significant
multicollinearity among the independent variables.
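The VIF column reported in the output follows the standard definition VIF_j = 1/(1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A small sketch on synthetic, hypothetical data (two predictors deliberately built from a shared factor):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j of predictor matrix X."""
    others = np.delete(X, j, axis=1)
    Xd = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(Xd, X[:, j], rcond=None)
    resid = X[:, j] - Xd @ beta
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

rng = np.random.default_rng(2)
z = rng.normal(size=300)
# Columns 0 and 1 share the common factor z, so both should show inflated VIFs;
# column 2 is independent noise and should sit near VIF = 1.
X = np.column_stack([z + 0.3 * rng.normal(size=300),
                     z + 0.3 * rng.normal(size=300),
                     rng.normal(size=300)])
print([round(vif(X, j), 1) for j in range(3)])
```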
Regression Analysis: FRH versus FFED, FM2, GDPH, GH

The regression equation is
FRH = 55.0 - 2.76 FFED - 0.0558 FM2 + 0.0904 GDPH - 0.148 GH

166 cases used 52 cases contain missing values

Predictor      Coef   SE Coef      T      P   VIF
Constant      55.00     26.26   2.09  0.038
FFED        -2.7640    0.6548  -4.22  0.000   1.2
FM2        -0.05578   0.01007  -5.54  0.000  50.7
GDPH       0.090402  0.007862  11.50  0.000  59.6
GH         -0.14752   0.03922  -3.76  0.000  28.5

S = 24.61   R-Sq = 85.3%   R-Sq(adj) = 84.9%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        4  564511  141128  233.00  0.000
Residual Error  161   97519     606
Total           165  662030

The model with the federal funds rate as the interest rate variable is also the final
model with all of the independent variables conditionally significant. Again, high
correlation among the independent variables will be a problem with this regression
model.

b. 95% confidence intervals for the slope coefficients on the interest rate term:
Bank prime rate as the interest rate variable:
b1 +/- t(Sb1): -3.7871 +/- 1.96(.6276) = -3.7871 +/- 1.23
Federal funds rate as the interest rate variable:
b1 +/- t(Sb1): -2.764 +/- 1.96(.6548) = -2.764 +/- 1.2834
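Either interval can be reproduced from the reported coefficient and its standard error; for example, for the prime-rate model:

```python
b1, se_b1 = -3.7871, 0.6276  # coefficient and standard error for FBPR
z = 1.96                     # large-sample 95% critical value
margin = z * se_b1
ci = (b1 - margin, b1 + margin)
print(ci)  # roughly (-5.02, -2.56)
```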

13.113 a.
Correlations: Infmrt82, Phys82, Perinc84, Perhosp

           Infmrt82  Phys82  Perinc84
Phys82        0.434
              0.001
Perinc84      0.094   0.614
              0.511   0.000
Perhosp       0.411   0.285     0.267
              0.003   0.042     0.058

The correlation matrix shows a positive association with Phys82 and Perhosp.
These variables are the number of physicians per 100,000 population and the total
per capita expenditures for hospitals. One would expect a negative association;
therefore, examine the scatter diagram of infant mortality vs. Phys82:

[Scatter plot: Infmrt82 versus Phys82]

The graph shows an obvious outlier which, upon further investigation, is the
District of Columbia. Due to the outlier status, this row is dropped from the
analysis and the correlation matrix is recalculated:

Correlations: Infmrt82, Phys82, Perinc84, Perhosp

           Infmrt82  Phys82  Perinc84
Phys82       -0.147
              0.309
Perinc84     -0.192   0.574
              0.181   0.000
Perhosp       0.199  -0.065     0.140
              0.166   0.654     0.331

The physicians per 100,000 population variable now has the correct sign; however,
none of the independent variables has a statistically significant linear association
with the dependent variable. Per capita expenditures for hospitals has an unexpected
positive sign; however, it is not significant. The multiple regression results
are likely to yield low explanatory power with insignificant independent variables:
Regression Analysis: Infmrt82 versus Phys82, Perinc84, Perhosp

The regression equation is
Infmrt82 = 12.7 - 0.00017 Phys82 - 0.000206 Perinc84 + 6.30 Perhosp

Predictor        Coef    SE Coef      T      P  VIF
Constant       12.701      1.676   7.58  0.000
Phys82      -0.000167   0.006647  -0.03  0.980  1.5
Perinc84   -0.0002064  0.0001637  -1.26  0.214  1.6
Perhosp         6.297      3.958   1.59  0.118  1.1

S = 1.602   R-Sq = 8.9%   R-Sq(adj) = 3.0%

Analysis of Variance
Source          DF       SS     MS     F      P
Regression       3   11.546  3.849  1.50  0.227
Residual Error  46  118.029  2.566
Total           49  129.575

As expected, the model explains less than 9% of the variability in infant mortality.
None of the independent variables is conditionally significant, and high correlation
among the independent variables does not appear to be a significant problem. The
standard error of the estimate is very large (1.602) relative to the size of the infant
mortality rates and hence the model would not be a good predictor. Sequentially
dropping the independent variable with the lowest t-statistic confirms the
conclusion that none of the independent variables is conditionally significant. The
search is on for better independent variables.
b. The two variables to include are per capita spending on education (PerEduc)
and per capita spending on public welfare (PerPbwel). Since the conditional
significance of the independent variables is a function of other independent
variables in the model, we will include the original set of variables:

Regression Analysis: Infmrt82 versus Phys82, Perinc84, ...

The regression equation is
Infmrt82 = 12.2 - 0.00122 Phys82 + 0.000015 Perinc84 + 8.87 Perhosp
           - 1.96 PerEduc - 4.56 PerPbwel

Predictor       Coef    SE Coef      T      P  VIF
Constant      12.167      1.654   7.35  0.000
Phys82     -0.001219   0.008028  -0.15  0.880  2.4
Perinc84   0.0000152  0.0001942   0.08  0.938  2.4
Perhosp        8.872      4.010   2.21  0.032  1.2
PerEduc       -1.960      1.315  -1.49  0.143  1.8
PerPbwel      -4.555      3.360  -1.36  0.182  1.7

S = 1.550   R-Sq = 18.4%   R-Sq(adj) = 9.1%

Analysis of Variance
Source          DF       SS     MS     F      P
Regression       5   23.848  4.770  1.98  0.100
Residual Error  44  105.727  2.403
Total           49  129.575

The model shows low explanatory power and only one independent variable that is
conditionally significant (Perhosp). Dropping sequentially the independent variable
with the lowest t-statistic yields a model with no conditionally significant
independent variables. This problem illustrates that in some applications, the
variables that have been identified as theoretically important predictors do not
meet the statistical test.
13.114 a.
Correlations: Salary, age, yrs_asoc, yrs_full, Sex_1Fem, Market, C8

           Salary     age  yrs_asoc  yrs_full  Sex_1Fem  Market
age         0.749
            0.000
yrs_asoc    0.698   0.712
            0.000   0.000
yrs_full    0.777   0.583     0.312
            0.000   0.000     0.000
Sex_1Fem   -0.429  -0.234    -0.367    -0.292
            0.000   0.004     0.000     0.000
Market      0.026  -0.134    -0.113    -0.017     0.062
            0.750   0.103     0.169     0.833     0.453
C8         -0.029  -0.189    -0.073    -0.043    -0.094  -0.107
            0.721   0.020     0.373     0.598     0.254   0.192

The correlation matrix indicates several independent variables that should provide
good explanatory power in the regression model. We would expect that age, years
at Associate professor and years at full professor are likely to be conditionally
significant:

Regression Analysis: Salary versus age, yrs_asoc, ...

The regression equation is
Salary = 21107 + 105 age + 532 yrs_asoc + 690 yrs_full - 1312 Sex_1Fem
         + 2854 Market + 1101 C8

Predictor     Coef  SE Coef      T      P  VIF
Constant     21107     1599  13.20  0.000
age         104.59    40.62   2.58  0.011  3.1
yrs_asoc    532.27    63.66   8.36  0.000  2.4
yrs_full    689.93    52.66  13.10  0.000  1.7
Sex_1Fem   -1311.8    532.3  -2.46  0.015  1.3
Market      2853.9    823.3   3.47  0.001  1.0
C8          1101.0    658.1   1.67  0.097  1.1

S = 2569   R-Sq = 85.6%   R-Sq(adj) = 85.0%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression        6  5604244075  934040679  141.49  0.000
Residual Error  143   943982608    6601277
Total           149  6548226683

Dropping the C8 variable yields:


Regression Analysis: Salary versus age, yrs_asoc, ...

The regression equation is
Salary = 21887 + 90.0 age + 539 yrs_asoc + 697 yrs_full - 1397 Sex_1Fem
         + 2662 Market

Predictor     Coef  SE Coef      T      P  VIF
Constant     21887     1539  14.22  0.000
age          90.02    39.92   2.26  0.026  3.0
yrs_asoc    539.48    63.91   8.44  0.000  2.4
yrs_full    697.35    52.80  13.21  0.000  1.7
Sex_1Fem   -1397.2    533.2  -2.62  0.010  1.2
Market      2662.3    820.3   3.25  0.001  1.0

S = 2585   R-Sq = 85.3%   R-Sq(adj) = 84.8%

Analysis of Variance
Source           DF          SS          MS       F      P
Regression        5  5585766862  1117153372  167.14  0.000
Residual Error  144   962459821     6683749
Total           149  6548226683

This is the final model. All of the independent variables are conditionally
significant and the model explains a sizeable portion of the variability in salary.
b. To test the hypothesis that the rate of change in female salaries as a function of
age is less than the rate of change in male salaries as a function of age, the
dummy variable Sex_1Fem is used to see if the slope coefficient for age (X1) is
different for males and females. The following model is used:
Y = β0 + (β1 + β6X4)X1 + β2X2 + β3X3 + β4X4 + β5X5
  = β0 + β1X1 + β6X4X1 + β2X2 + β3X3 + β4X4 + β5X5


Create the variable X4X1 and then test for its conditional significance in the regression
model. If it proves to be a significant predictor of salaries, then there is strong evidence
to conclude that the rate of change in female salaries as a function of age differs from
that for males:
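The interaction test just described can be sketched as follows. The data here are synthetic and contain no true interaction, so the femage term should come out insignificant; the names mirror those in the output:

```python
import numpy as np

# Slope-dummy (interaction) sketch: femage = Sex_1Fem * age.
# Synthetic data with NO true interaction, so its t-statistic should be small.
rng = np.random.default_rng(1)
n = 150
age = rng.uniform(30, 65, n)
fem = rng.integers(0, 2, n).astype(float)       # Sex_1Fem dummy
salary = 22000 + 90 * age - 1400 * fem + rng.normal(0, 2500, n)

femage = fem * age                               # the interaction variable
X = np.column_stack([np.ones(n), age, fem, femage])
b, _, _, _ = np.linalg.lstsq(X, salary, rcond=None)
resid = salary - X @ b
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_femage = b[3] / se[3]     # compare |t| to ~2 for conditional significance
```

A small |t| on the interaction term, as in the actual output below, means the age slope does not differ detectably between the two groups.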
Regression Analysis: Salary versus age, femage, ...

The regression equation is
Salary = 22082 + 85.1 age + 11.7 femage + 543 yrs_asoc + 701 yrs_full
         - 1878 Sex_1Fem + 2673 Market

Predictor       Coef  SE Coef      T      P    VIF
Constant       22082     1877  11.77  0.000
age            85.07    48.36   1.76  0.081    4.4
femage         11.66    63.89   0.18  0.855   32.2
yrs_asoc      542.85    66.73   8.13  0.000    2.6
yrs_full      701.35    57.35  12.23  0.000    2.0
Sex_1Fem       -1878     2687  -0.70  0.486   31.5
Market        2672.8    825.1   3.24  0.001    1.0

S = 2594   R-Sq = 85.3%   R-Sq(adj) = 84.7%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression        6  5585990999  930998500  138.36  0.000
Residual Error  143   962235684    6728921
Total           149  6548226683

The regression shows that the newly created variable of femage is not conditionally
significant. Thus, we cannot conclude that the rate of change in female salaries as a
function of age differs from that of male salaries.
13.115
Regression Analysis: hseval versus sizehse, taxrate, incom72, Homper

The regression equation is
hseval = -32.7 + 6.74 sizehse - 223 taxrate + 0.00464 incom72 + 11.2 Homper

Predictor       Coef   SE Coef      T      P    VIF
Constant     -32.694     8.972  -3.64  0.000
sizehse        6.740     1.880   3.58  0.001    1.4
taxrate      -222.96     45.39  -4.91  0.000    1.2
incom72     0.004642  0.001349   3.44  0.001    1.2
Homper        11.215     4.592   2.44  0.017    1.3

S = 3.610   R-Sq = 49.3%   R-Sq(adj) = 47.0%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4  1079.08  269.77  20.70  0.000
Residual Error   85  1107.55   13.03
Total            89  2186.63

All of the independent variables are conditionally significant. Now add the percent of
commercial property to the model to see if it is significant:


Regression Analysis: hseval versus sizehse, taxrate, ...

The regression equation is
hseval = -31.6 + 6.76 sizehse - 218 taxrate + 0.00453 incom72 + 10.3 Homper
         - 2.18 Comper

Predictor       Coef   SE Coef      T      P    VIF
Constant     -31.615     9.839  -3.21  0.002
sizehse        6.757     1.892   3.57  0.001    1.4
taxrate      -217.63     49.58  -4.39  0.000    1.4
incom72     0.004534  0.001412   3.21  0.002    1.4
Homper        10.287     5.721   1.80  0.076    2.0
Comper        -2.182     7.940  -0.27  0.784    1.7

S = 3.630   R-Sq = 49.4%   R-Sq(adj) = 46.4%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        5  1080.07  216.01  16.40  0.000
Residual Error   84  1106.56   13.17
Total            89  2186.63

With a t-statistic of -.27 we have not found strong enough evidence to reject H0 that the
slope coefficient on percent commercial property is zero. The conditional F test:

    FComper = (SSR_F - SSR_R) / s²Y|X = (1080.07 - 1079.08) / (3.63)² = .075

Thus, at any common level of alpha, do not reject H0 that the percent commercial
property has no effect on house values.
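The same conditional F computation can be checked in code; the sums of squares are copied from the two ANOVA tables, and scipy's F distribution supplies the critical value:

```python
from scipy.stats import f

# Conditional F test for Comper, using the printed regression sums of squares.
ssr_full = 1080.07       # SSR with Comper included (5 predictors)
ssr_reduced = 1079.08    # SSR without Comper (4 predictors)
s = 3.630                # standard error of estimate, full model
q = 1                    # one added variable

F_comper = (ssr_full - ssr_reduced) / q / s**2   # about .075
F_crit = f.ppf(0.95, dfn=q, dfd=84)              # 84 = 90 - 5 - 1 residual df
reject = F_comper > F_crit                       # do not reject H0
```

The statistic falls far short of the .05 critical value (about 3.95), matching the conclusion above.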
Add percent industrial property to the base model:
Regression Analysis: hseval versus sizehse, taxrate, ...

The regression equation is
hseval = -28.6 + 6.10 sizehse - 232 taxrate + 0.00521 incom72 + 8.68 Homper
         - 7.50 Indper

Predictor       Coef   SE Coef      T      P    VIF
Constant     -28.643     9.602  -2.98  0.004
sizehse        6.096     1.956   3.12  0.003    1.5
taxrate      -232.34     46.00  -5.05  0.000    1.2
incom72     0.005208  0.001431   3.64  0.000    1.4
Homper         8.681     5.070   1.71  0.091    1.6
Indper        -7.505     6.427  -1.17  0.246    1.7

S = 3.602   R-Sq = 50.2%   R-Sq(adj) = 47.2%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        5  1096.77  219.35  16.91  0.000
Residual Error   84  1089.86   12.97
Total            89  2186.63

Likewise, the percent industrial property is not significantly different from zero. The
conditional F test:

    FIndper = (SSR_5 - SSR_4) / s²Y|X = (1096.77 - 1079.08) / (3.602)² = 1.36

Again, this is lower than the critical value of F at common levels of alpha; therefore, do
not reject H0 that the percent industrial property has no effect on house values.
Tax rate models:
Regression Analysis: taxrate versus taxbase, expercap, Homper

The regression equation is
taxrate = -0.0174 - 0.000000 taxbase + 0.000162 expercap + 0.0424 Homper

Predictor          Coef     SE Coef      T      P    VIF
Constant      -0.017399    0.007852  -2.22  0.029
taxbase     -0.00000000  0.00000000  -0.80  0.426    1.2
expercap     0.00016204  0.00003160   5.13  0.000    1.1
Homper         0.042361    0.009378   4.52  0.000    1.2

S = 0.007692   R-Sq = 31.9%   R-Sq(adj) = 29.5%

Analysis of Variance
Source           DF          SS          MS      F      P
Regression        3  0.00237926  0.00079309  13.41  0.000
Residual Error   86  0.00508785  0.00005916
Total            89  0.00746711

Since taxbase is not significant, it is dropped from the model:


Regression Analysis: taxrate versus expercap, Homper

The regression equation is
taxrate = -0.0192 + 0.000158 expercap + 0.0448 Homper

Predictor          Coef     SE Coef      T      P    VIF
Constant      -0.019188    0.007511  -2.55  0.012
expercap     0.00015767  0.00003106   5.08  0.000    1.1
Homper         0.044777    0.008860   5.05  0.000    1.1

S = 0.007676   R-Sq = 31.4%   R-Sq(adj) = 29.8%

Analysis of Variance
Source           DF         SS         MS      F      P
Regression        2  0.0023414  0.0011707  19.87  0.000
Residual Error   87  0.0051257  0.0000589
Total            89  0.0074671

Both of the independent variables are significant. This becomes the base model, to which
we now add percent commercial property and percent industrial property sequentially:
Regression Analysis: taxrate versus expercap, Homper, Comper

The regression equation is
taxrate = -0.0413 + 0.000157 expercap + 0.0643 Homper + 0.0596 Comper

Predictor          Coef     SE Coef      T      P    VIF
Constant      -0.041343    0.008455  -4.89  0.000
expercap     0.00015660  0.00002819   5.55  0.000    1.1
Homper         0.064320    0.009172   7.01  0.000    1.4
Comper          0.05960     0.01346   4.43  0.000    1.3

S = 0.006966   R-Sq = 44.1%   R-Sq(adj) = 42.2%

Analysis of Variance
Source           DF         SS         MS      F      P
Regression        3  0.0032936  0.0010979  22.62  0.000
Residual Error   86  0.0041735  0.0000485
Total            89  0.0074671


Percent commercial property is conditionally significant and an important independent
variable, as shown by the conditional F test:

    FComper = (SSR_3 - SSR_2) / s²Y|X = (.003294 - .00234) / (.006966)² = 19.62

With 1 degree of freedom in the numerator and (90 - 3 - 1) = 86 degrees of freedom
in the denominator, the critical value of F at the .05 level is 3.95. Hence we would
conclude that the percentage of commercial property has a statistically significant positive
impact on tax rate.
We now add industrial property to test the effect on tax rate:
Regression Analysis: taxrate versus expercap, Homper, Indper

The regression equation is
taxrate = -0.0150 + 0.000156 expercap + 0.0398 Homper - 0.0105 Indper

Predictor          Coef     SE Coef      T      P    VIF
Constant      -0.015038    0.009047  -1.66  0.100
expercap     0.00015586  0.00003120   5.00  0.000    1.1
Homper          0.03982     0.01071   3.72  0.000    1.6
Indper         -0.01052     0.01273  -0.83  0.411    1.5

S = 0.007690   R-Sq = 31.9%   R-Sq(adj) = 29.5%

Analysis of Variance
Source           DF          SS          MS      F      P
Regression        3  0.00238178  0.00079393  13.43  0.000
Residual Error   86  0.00508533  0.00005913
Total            89  0.00746711

The percent industrial property is insignificant with a t-statistic of only -.83. The
conditional F test confirms that the variable does not have a significant impact on tax rate:

    FIndper = (SSR_3 - SSR_2) / s²Y|X = (.002382 - .00234) / (.00769)² = .683

With 1 degree of freedom in the numerator and (90 - 3 - 1) = 86 degrees of freedom in
the denominator, the critical value of F at the .05 level is 3.95. Hence we would conclude
that the percentage of industrial property has no statistically significant impact on tax rate.
In conclusion, we found no evidence to back three of the activist's claims and strong
evidence to reject one of them. We concluded that commercial development will have no
effect on house value, while it will actually increase the tax rate. In addition, we concluded
that industrial development will have no effect on house value or tax rate.
It was important to include all of the other independent variables in the regression models
because the conditional significance of any one variable is influenced by which other
independent variables are in the regression model. Therefore, it is important to test if
direct relationships can be explained by the relationships with other predictor variables.
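A small synthetic illustration of this point: a predictor can look highly significant on its own yet be conditionally insignificant once a correlated predictor is included. All names and data below are made up for the demo:

```python
import numpy as np

# With correlated predictors, x1's t-statistic changes sharply depending
# on whether x2 is also in the model. Synthetic data only.
rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.1 * rng.normal(size=n)   # x1 highly correlated with x2
y = 3.0 * x2 + rng.normal(size=n)          # y truly depends only on x2

def t_stat_first(cols, y):
    """t-statistic of the first non-intercept column in an OLS fit."""
    X = np.column_stack([np.ones(len(y)), *cols])
    b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b[1] / se[1]

t_alone = t_stat_first([x1], y)            # x1 looks very "significant" alone
t_conditional = t_stat_first([x1, x2], y)  # but not once x2 is controlled for
```

This is exactly why each variable's significance must be judged conditionally on the other predictors in the model.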


13.116

a. Correlation matrix:
Correlations: EconGPA, sex, Acteng, ACTmath, ACTss, ACTcomp, HSPct

          EconGPA     sex  Acteng  ACTmath   ACTss  ACTcomp
sex         0.187
            0.049
Acteng      0.387   0.270
            0.001   0.021
ACTmath     0.338  -0.170   0.368
            0.003   0.151   0.001
ACTss       0.442  -0.105   0.448    0.439
            0.000   0.375   0.000    0.000
ACTcomp     0.474  -0.084   0.650    0.765   0.812
            0.000   0.478   0.000    0.000   0.000
HSPct       0.362   0.216   0.173    0.290   0.224    0.230
            0.000   0.026   0.150    0.014   0.060    0.053

(p-values shown beneath each correlation)

There exists a positive relationship between EconGPA and all of the independent variables,
which is expected. Note the high correlation between the composite ACT score and its
individual components, which is also as expected. Thus, high correlation among the
independent variables is likely to be a serious concern in this regression model.
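The VIF column in the regression output quantifies exactly this concern: VIF_j = 1/(1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A sketch with synthetic ACT-like scores, where the composite is built as a near-average of components purely for illustration:

```python
import numpy as np

# VIF sketch: the composite score is nearly a linear combination of its
# components, so its VIF should be large. Synthetic data only.
rng = np.random.default_rng(3)
n = 120
acteng = rng.normal(22, 4, n)
actmath = rng.normal(22, 4, n)
actcomp = 0.5 * acteng + 0.5 * actmath + rng.normal(0, 1, n)

def vif(j, cols):
    """VIF of column j: regress it on the other columns, VIF = 1/(1 - R^2)."""
    y = cols[j]
    others = [c for i, c in enumerate(cols) if i != j]
    X = np.column_stack([np.ones(n), *others])
    b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

cols = [acteng, actmath, actcomp]
vif_comp = vif(2, cols)   # large, since actcomp is nearly redundant
```

A VIF well above roughly 5 to 10, like the 12.8 printed for ACTcomp below, signals that the predictor is largely explained by the others.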
Regression Analysis: EconGPA versus sex, Acteng, ...

The regression equation is
EconGPA = -0.050 + 0.261 sex + 0.0099 Acteng + 0.0064 ACTmath + 0.0270 ACTss
          + 0.0419 ACTcomp + 0.00898 HSPct

71 cases used; 41 cases contain missing values

Predictor       Coef   SE Coef      T      P    VIF
Constant     -0.0504    0.6554  -0.08  0.939
sex           0.2611    0.1607   1.62  0.109    1.5
Acteng       0.00991   0.02986   0.33  0.741    2.5
ACTmath      0.00643   0.03041   0.21  0.833    4.3
ACTss        0.02696   0.02794   0.96  0.338    4.7
ACTcomp      0.04188   0.07200   0.58  0.563   12.8
HSPct       0.008978  0.005716   1.57  0.121    1.4

S = 0.4971   R-Sq = 34.1%   R-Sq(adj) = 27.9%

Analysis of Variance
Source           DF       SS      MS     F      P
Regression        6   8.1778  1.3630  5.52  0.000
Residual Error   64  15.8166  0.2471
Total            70  23.9945

As expected, high correlation among the independent variables is affecting the results. A
strategy of dropping the variable with the lowest t-statistic from each successive model
removes the following variables (in order): 1) ACTmath, 2) Acteng, 3) ACTss, 4) HSPct.
The two variables that remain in the final model are sex and ACTcomp:


Regression Analysis: EconGPA versus sex, ACTcomp

The regression equation is
EconGPA = 0.322 + 0.335 sex + 0.0978 ACTcomp

73 cases used; 39 cases contain missing values

Predictor      Coef  SE Coef     T      P    VIF
Constant     0.3216   0.5201  0.62  0.538
sex          0.3350   0.1279  2.62  0.011    1.0
ACTcomp     0.09782  0.01989  4.92  0.000    1.0

S = 0.4931   R-Sq = 29.4%   R-Sq(adj) = 27.3%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        2   7.0705  3.5352  14.54  0.000
Residual Error   70  17.0192  0.2431
Total            72  24.0897

Both independent variables are conditionally significant.


b. The model could be used in college admission decisions by creating a predicted
GPA in economics based on sex and ACT comp scores. This predicted GPA
could then be used with other factors in deciding admission. Note that this
model predicts that females will outperform males with equal test scores.
Using this model as the only source of information may lead to charges of
unequal treatment.
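For instance, a sketch of the prediction step, using the fitted coefficients from the final model; the 25 ACT composite score is an arbitrary illustrative input:

```python
# Predicted economics GPA from the final fitted equation:
# EconGPA = 0.322 + 0.335 sex + 0.0978 ACTcomp  (sex = 1 for female)
def predicted_econ_gpa(sex, actcomp):
    return 0.322 + 0.335 * sex + 0.0978 * actcomp

female_25 = predicted_econ_gpa(1, 25)   # 0.322 + 0.335 + 2.445 = 3.102
male_25 = predicted_econ_gpa(0, 25)     # 0.322 + 2.445 = 2.767
```

At any given ACT composite, the predicted female GPA exceeds the predicted male GPA by the sex coefficient, 0.335.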
13.117
92.7% of the total variation in gross revenue from a medical practice can be
explained by the variations in number of hours worked, number of physicians,
number of allied health personnel and the number of rooms used in the practice.
The regression model indicates that gross revenue from a medical practice is
positively associated with the number of hours worked, the number of physicians
in the practice, the number of allied health personnel and the number of rooms
used in the practice. All of these variables are expected to have a positive effect on
gross revenue. With df = (50 - 4 - 1) = 45, the critical value of t for a two-tailed
test at an alpha of .05 is approximately 2.021. Therefore, one can conclude that all
of the independent variables are conditionally significant with the sole exception of
the number of rooms used in the practice. This variable, while having a positive
estimated coefficient, is not significantly different from zero.
The number of physicians in the practice has a large effect on gross revenue since,
all else equal, for a 1% increase in the number of physicians in the practice, we
estimate an increase in the gross revenue from the medical practice by .673%.
Note however that before additional physicians are hired, the analysis should be
extended to include not only gross revenue but also profits.
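Since the physician coefficient is interpreted here as an elasticity (a percentage change in revenue per percentage change in physicians), its exact multiplicative effect can be checked against the usual linear approximation:

```python
# Elasticity interpretation sketch: a coefficient of .673 on the (log)
# number of physicians means a 1% rise in physicians raises expected
# gross revenue by about .673%.
elasticity = 0.673
pct_increase = 0.01                              # a 1% increase in physicians
multiplier = (1 + pct_increase) ** elasticity    # exact multiplicative effect
approx_pct_change = elasticity * pct_increase    # the usual approximation
```

For small percentage changes the exact multiplier and the .673% approximation agree to several decimal places.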