
Spring 2014

Stat 431
Solution to Homework 5
(100 points total)

1. (15 points) Plots are shown in Figure 1.

[Figure 1: fitted regression for (a), (b) and (c); x in (0, 80), y in (150, 190)]


R-code:
# (a) linear and (b) quadratic fits, drawn on one plot
curve(161 + 2.6*x/10, 0, 80, ylim = c(150, 190))
curve(96.2 + 33.6*x/10 - 3.2*(x/10)^2, 0, 80, add = TRUE)
# (c) piecewise-constant fit; "+ x - x" makes curve() draw a
# horizontal segment at the given level over each interval
curve(157.2 + x - x, 0, 29, xlim = c(0, 80), ylim = c(150, 190))
curve(157.2 + 19.1 + x - x, 30, 44, xlim = c(0, 80), ylim = c(150, 190), add = TRUE)
curve(157.2 + 27.2 + x - x, 45, 64, xlim = c(0, 80), ylim = c(150, 190), add = TRUE)
curve(157.2 + 8.5 + x - x, 65, 80, xlim = c(0, 80), ylim = c(150, 190), add = TRUE)
2. (20 points)
(a) (5 points) Output of regression:
Call:
lm(formula = Mailings ~ HoursOfEffort * Aware)

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)               2.454      2.502   0.981   0.3286
HoursOfEffort            13.821      1.088  12.705   <2e-16 ***
AwareYES                  1.707      3.991   0.428   0.6697
HoursOfEffort:AwareYES    4.308      1.683   2.560   0.0117 *

Residual standard error: 11.16 on 121 degrees of freedom


Multiple R-squared: 0.7665,
Adjusted R-squared: 0.7607
F-statistic: 132.4 on 3 and 121 DF, p-value: < 2.2e-16

The interaction coefficient is 4.308: for each one-unit increase in HoursOfEffort, the predicted increase in Mailings is 4.308 greater for a customer who was aware of FedEx than for a customer who was not.
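The two group slopes can be read directly off the coefficient vector. A minimal sketch (the data frame name fedex is an assumption; fit1 is the model object used in part (d)):
R-code:
# slope of Mailings on HoursOfEffort within each Aware group
fit1 <- lm(Mailings ~ HoursOfEffort * Aware, data = fedex)
b <- coef(fit1)
b["HoursOfEffort"]                                # Aware = NO slope: 13.821
b["HoursOfEffort"] + b["HoursOfEffort:AwareYES"]  # Aware = YES slope: 18.129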
(b) (5 points) We use an ANOVA (partial F) test of $H_0: \beta_{Aware} = 0$ vs. $H_1: \beta_{Aware} \neq 0$. Output:
Analysis of Variance Table

Model 1: Mailings ~ HoursOfEffort
Model 2: Mailings ~ HoursOfEffort + Aware
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    123 19188
2    122 15896  1    3292.7 25.272 1.729e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see the P-value is smaller than 0.05, which means Aware does explain a significant extra amount of variation in the number of shipments per month by customers. (Full credit can also be earned by comparing the model Mailings ~ HoursOfEffort to the model Mailings ~ HoursOfEffort * Aware, which includes the interaction term.)
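The comparison above can be reproduced with anova(); a minimal sketch, again assuming the data frame is named fedex:
R-code:
# partial F test for Aware after HoursOfEffort
fit.red  <- lm(Mailings ~ HoursOfEffort, data = fedex)
fit.full <- lm(Mailings ~ HoursOfEffort + Aware, data = fedex)
anova(fit.red, fit.full)  # F = 25.272, p = 1.729e-06, as in the table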
(c) (5 points) We split the data by Aware and fit the regression separately in each group. Output:
Call:
lm(formula = MailingYES ~ HoursOfEffortYES)

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)         4.161      3.464   1.201    0.236
HoursOfEffortYES   18.129      1.430  12.679   <2e-16 ***

Residual standard error: 12.43 on 48 degrees of freedom


Multiple R-squared: 0.7701,
Adjusted R-squared: 0.7653
F-statistic: 160.8 on 1 and 48 DF, p-value: < 2.2e-16
Call:
lm(formula = MailingNO ~ HoursOfEffortNO)

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       2.4541     2.2955   1.069    0.289
HoursOfEffortNO  13.8214     0.9982  13.846   <2e-16 ***
Residual standard error: 10.24 on 73 degrees of freedom
Multiple R-squared: 0.7242,
Adjusted R-squared: 0.7204
F-statistic: 191.7 on 1 and 73 DF, p-value: < 2.2e-16
We can see that the coefficient of HoursOfEffort in the No group is the same as in the regression of (a), while the coefficient of HoursOfEffort in the Yes group equals the coefficient of HoursOfEffort plus the interaction coefficient from (a). Likewise, the intercept in the No group is the same as the intercept in (a), and the intercept in the Yes group equals the intercept in (a) plus the coefficient of Aware in (a).
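A sketch of this split-sample check (the data frame name fedex is an assumption):
R-code:
# refit the simple regression separately within each Aware group
fit.no  <- lm(Mailings ~ HoursOfEffort, data = subset(fedex, Aware == "NO"))
fit.yes <- lm(Mailings ~ HoursOfEffort, data = subset(fedex, Aware == "YES"))
coef(fit.no)   # (2.4541, 13.8214): the intercept and slope from (a)
coef(fit.yes)  # (4.161, 18.129): the (a) values plus AwareYES and the interaction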
(d) (5 points)
predict(fit1, newdata = data.frame(HoursOfEffort = 4, Aware = "YES"),
        interval = "prediction", level = 0.95)
       fit      lwr      upr
1 76.67738 53.83333 99.52142
So the prediction interval is [53.83, 99.52].
3. (25 points)
(a) (5 points)
Call:
lm(formula = GP1000M.City ~ Weight + Horsepower, data = car)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.168e+01  1.727e+00   6.765 6.91e-10 ***
Weight      8.918e-03  8.822e-04  10.109  < 2e-16 ***
Horsepower  8.838e-02  1.226e-02   7.207 7.88e-11 ***
Residual standard error: 3.5 on 109 degrees of freedom
Multiple R-squared: 0.841,
Adjusted R-squared: 0.8381
F-statistic: 288.3 on 2 and 109 DF, p-value: < 2.2e-16
The LS coefficients are 11.68, 0.008918, and 0.08838. $R^2$ is 0.841 and the RMSE is 3.5.
(b) (5 points)
> vif(fit.car)
    Weight Horsepower
  2.202488   2.202488
The VIFs for these two variables are both 2.202488. They are identical because there are only two predictors in the regression: the VIF is a function of the $R^2$ from the (simple) regression of one predictor on the other, and since $R^2$ in simple linear regression is the squared correlation between the response and the predictor, the $R^2$ for regressing Weight on Horsepower is equal to the $R^2$ for regressing Horsepower on Weight.
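This is easy to verify numerically, since $VIF = 1/(1 - R^2)$ and here $R^2$ is the squared pairwise correlation:
R-code:
# with two predictors, each VIF equals 1/(1 - r^2) for their correlation r
r <- cor(car$Weight, car$Horsepower)  # 0.7388965, as in part (c)
1 / (1 - r^2)                         # 2.202488, matching vif(fit.car)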
(c) (5 points) The pairwise correlations are shown below.
> cor(Weight, Horsepower)
[1] 0.7388965
> cor(Weight, H.per.P)
[1] 0.2403865
> cor(Horsepower, H.per.P)
[1] 0.8227992

(d) (5 points)
Call:
lm(formula = GP1000M.City ~ Weight + H.per.P)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.703e-01  2.047e+00   0.327    0.744
Weight      1.253e-02  6.051e-04  20.707  < 2e-16 ***
H.per.P     2.707e+02  3.623e+01   7.474 2.08e-11 ***
Residual standard error: 3.458 on 109 degrees of freedom
Multiple R-squared: 0.8448,
Adjusted R-squared: 0.842
F-statistic: 296.7 on 2 and 109 DF, p-value: < 2.2e-16
The LS coefficients are 0.6703, 0.01253, and 270.7; $R^2$ is 0.8448 and the RMSE is 3.458. The VIFs of the two predictors are both 1.06133.
(e) (5 points) Comparing the results in (d) and (a), we see that $R^2$ is larger and the RMSE is smaller in (d) than in (a), which means the response is explained better in (d). The collinearity in (d) is also weaker than in (a), as the VIFs in (d) are smaller.
4. (20 points)
(a) (10 points) We use the following two relations:

$$MSR_{X_j} = \frac{MSE_{MLR}}{MSE_{SLR}}, \qquad SE(\hat{\beta}_j^{MLR}) = SE(\hat{\beta}_j^{SLR}) \sqrt{VIF_{X_j} \cdot MSR_{X_j}}.$$

From the two regression summaries we know $MSE_{SLR} = 21.21^2$, $MSE_{MLR} = 19.79^2$, $SE(\hat{\beta}_{MRI}^{MLR}) = 0.5634$, and $SE(\hat{\beta}_{MRI}^{SLR}) = 0.4806$. Solving for the VIF,

$$VIF_{X_{MRI}} = (0.5634/0.4806)^2 \times (21.21^2/19.79^2) = 1.5785.$$
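As a quick arithmetic check in R, plugging in the reported values:
R-code:
# VIF of MRI recovered from the two standard errors and the two MSEs
(0.5634 / 0.4806)^2 * (21.21^2 / 19.79^2)  # approximately 1.5785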


(b) (10 points) The RMSE for PIQ ~ MRI is 21.21 with 36 degrees of freedom, and the RMSE for PIQ ~ MRI + Height + Weight is 19.79 with 34 degrees of freedom. Hence

$$F = \frac{(21.21^2 \times 36 - 19.79^2 \times 34)/2}{19.79^2} = 3.676.$$

Under the null, $F$ follows an $F_{2,34}$ distribution, which gives a P-value of about 0.036. So we reject the null hypothesis $H_0: \beta_{Height} = \beta_{Weight} = 0$ at significance level 0.05.
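The same computation in R, using the RMSEs and degrees of freedom reported above:
R-code:
# partial F statistic and P-value for adding Height and Weight
rss.red  <- 21.21^2 * 36  # RSS of PIQ ~ MRI
rss.full <- 19.79^2 * 34  # RSS of PIQ ~ MRI + Height + Weight
F.stat <- ((rss.red - rss.full) / 2) / (rss.full / 34)
F.stat                                 # approximately 3.676
pf(F.stat, 2, 34, lower.tail = FALSE)  # approximately 0.036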
5. (20 points)
(a) (8 points) Output of the regression summary:
Call:
lm(formula = courseevaluation ~ beauty + tenuretrack + age +
female)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.444818   0.156449  28.411  < 2e-16 ***
beauty       0.134947   0.032932   4.098 4.94e-05 ***
tenuretrack -0.197480   0.060341  -3.273  0.00115 **
age         -0.003809   0.002764  -1.378  0.16883
female      -0.228929   0.052567  -4.355 1.64e-05 ***
Residual standard error: 0.5318 on 458 degrees of freedom
Multiple R-squared: 0.08938,
Adjusted R-squared: 0.08143
F-statistic: 11.24 on 4 and 458 DF, p-value: 1.046e-08


From the summary, a one-unit increase in beauty is associated with a 0.1349 higher course evaluation on average; a tenure-track teacher's course evaluation is predicted to be 0.197480 lower than a non-tenure-track teacher's; each additional year of age is associated with a 0.003809 lower course evaluation on average; and a female teacher's course evaluation is predicted to be 0.228929 lower than a male's. Under the assumption of normally distributed residuals, roughly 95% of the observations will fall within 2(0.5318) of their predicted values.

[Figure 2: Residuals vs fitted values for (a)]
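The residual plot in Figure 2 can be produced along the following lines (a minimal sketch; the object name fit.eval is an assumption):
R-code:
# residuals vs. fitted values for the model in (a)
fit.eval <- lm(courseevaluation ~ beauty + tenuretrack + age + female)
plot(fitted(fit.eval), resid(fit.eval),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)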


(b) (6 points) Output of the regression summary.
Call:
lm(formula = courseevaluation ~ beauty * female)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.10364    0.03359 122.158  < 2e-16 ***
beauty         0.20027    0.04333   4.622 4.95e-06 ***
female        -0.20505    0.05103  -4.018 6.85e-05 ***
beauty:female -0.11266    0.06398  -1.761   0.0789 .
Residual standard error: 0.5361 on 459 degrees of freedom
Multiple R-squared: 0.07256,
Adjusted R-squared: 0.0665

F-statistic: 11.97 on 3 and 459 DF, p-value: 1.471e-07

From the summary, in our model a female teacher's course evaluation is predicted to be 0.20505 lower than a male's; a one-unit increase in beauty for males is associated with a 0.20027 higher course evaluation on average; and a one-unit increase in beauty for females is associated with a (0.20027 - 0.11266) = 0.08761 higher course evaluation on average.
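The two slopes come straight from the coefficient vector; a minimal sketch (the object name fit.int is an assumption, and female is taken to be the 0/1 indicator used in the output above):
R-code:
# beauty slope for male (female = 0) and female (female = 1) instructors
fit.int <- lm(courseevaluation ~ beauty * female)
b <- coef(fit.int)
b["beauty"]                       # male slope: 0.20027
b["beauty"] + b["beauty:female"]  # female slope: 0.20027 - 0.11266 = 0.08761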


(c) (6 points) The plot is in Figure 3. The black dots are male instructors and red dots are
female instructors.

[Figure 3: course evaluation vs. beauty, with fitted lines for male (black) and female (red) instructors]


R-code:
# male instructors: scatterplot (black) and fitted line
plot(beauty.male, courseevaluation.male, xlab = "beauty", ylab = "course evaluation")
abline(lm(courseevaluation.male ~ beauty.male))
# female instructors: overlay points and fitted line in red
points(beauty.female, courseevaluation.female, col = "red")
abline(lm(courseevaluation.female ~ beauty.female), col = "red")
