Stat 431
Solution to Homework 5
(100 points total)
[Figure: scatter plot of the data with fitted regression lines]
Model 1: Mailings ~ HoursOfEffort
Model 2: Mailings ~ HoursOfEffort + Aware
  Sum of Sq      F    Pr(>F)
     3292.7 25.272 1.729e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see that the P-value is smaller than 0.05, which means Aware does explain a significant extra amount of variation in the number of shipments per month by customers. (Full credit can also be earned by comparing the model Mailings ~ HoursOfEffort to the model Mailings ~ HoursOfEffort * Aware, which includes the interaction term.)
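Stated in general form, the extra-sum-of-squares F statistic behind this anova() comparison is

```latex
F \;=\; \frac{\left(\mathrm{SSE}_{\mathrm{reduced}} - \mathrm{SSE}_{\mathrm{full}}\right)\big/\left(\mathrm{df}_{\mathrm{reduced}} - \mathrm{df}_{\mathrm{full}}\right)}{\mathrm{SSE}_{\mathrm{full}}\big/\mathrm{df}_{\mathrm{full}}},
```

where df denotes residual degrees of freedom; under the null hypothesis that the extra terms are zero, F follows an F distribution with those numerator and denominator degrees of freedom.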
(c) (5 points) We split the data by Aware and perform a regression on each part. Output (Yes group):

Call:
lm(formula = MailingYES ~ HoursOfEffortYES)

Coefficients:
(Intercept)  HoursOfEffortYES
The slope in the No group is the same as the coefficient of HoursOfEffort in (a), and the slope in the Yes group is the same as the coefficient of HoursOfEffort plus the interaction coefficient in the regression of (a). The intercept in the No group is the same as the intercept in (a), and the intercept in the Yes group is the same as the intercept in (a) plus the coefficient of Aware in (a).
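This equivalence between group-wise fits and the interaction model can be checked numerically. A minimal Python sketch (synthetic data with hypothetical coefficient values, not the homework data) fits the full interaction model and the two separate regressions and verifies that the coefficients combine exactly as described above:

```python
import numpy as np

# Synthetic example (hypothetical numbers): y depends on x, a group
# indicator g, and their interaction.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 8, 2 * n)
g = np.repeat([0.0, 1.0], n)              # 0 = "No" group, 1 = "Yes" group
y = 150 + 5 * x + 10 * g + 2 * x * g + rng.normal(0, 3, 2 * n)

# Full model with interaction: y ~ x * g
X = np.column_stack([np.ones(2 * n), x, g, x * g])
b = np.linalg.lstsq(X, y, rcond=None)[0]  # [intercept, x, g, x:g]

# Separate simple regressions within each group
def ols_line(xs, ys):
    A = np.column_stack([np.ones(len(xs)), xs])
    return np.linalg.lstsq(A, ys, rcond=None)[0]

b_no = ols_line(x[g == 0], y[g == 0])
b_yes = ols_line(x[g == 1], y[g == 1])

# "No" group: intercept and slope equal the main-effect coefficients;
# "Yes" group: add the group coefficient and the interaction coefficient.
assert np.allclose(b_no, [b[0], b[1]])
assert np.allclose(b_yes, [b[0] + b[2], b[1] + b[3]])
```

The assertions pass because ordinary least squares with a full set of group main effects and interactions is algebraically equivalent to fitting each group separately.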
(d) (5 points)
predict(fit1, newdata = data.frame(HoursOfEffort = 4, Aware = "YES"),
interval = "prediction", level = 0.95)
       fit      lwr      upr
1 76.67738 53.83333 99.52142

So the prediction interval is [53.83, 99.52].
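For reference, the 95% prediction interval that predict(..., interval = "prediction") reports has the general form, for a new predictor vector $x_0$,

```latex
\hat{y}_0 \;\pm\; t_{0.975,\;n-p-1}\; s\,\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0},
```

where $s$ is the residual standard error and $n-p-1$ the residual degrees of freedom; the extra 1 under the square root is what widens a prediction interval relative to a confidence interval for the mean.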
3. (25 points)
(a) (5 points)
Call:
lm(formula = GP1000M.City ~ Weight + Horsepower, data = car)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.168e+01  1.727e+00   6.765 6.91e-10 ***
Weight      8.918e-03  8.822e-04  10.109  < 2e-16 ***
Horsepower  8.838e-02  1.226e-02   7.207 7.88e-11 ***

Residual standard error: 3.5 on 109 degrees of freedom
Multiple R-squared: 0.841,    Adjusted R-squared: 0.8381
F-statistic: 288.3 on 2 and 109 DF,  p-value: < 2.2e-16
The LS coefficients are 11.68, 0.008918, and 0.08838. R2 is 0.841, and the RMSE is 3.5.
(b) (5 points)
> vif(fit.car)
    Weight Horsepower
  2.202488   2.202488
The VIFs for these two variables are both 2.202488. They are identical because there are only two predictors in the regression: each VIF is a function of the R2 from the (simple) regression of one predictor on the other, and since R2 in simple linear regression equals the squared correlation coefficient of the response with the predictor, the R2 for regressing Weight on Horsepower is equal to the R2 for regressing Horsepower on Weight.
(c) (5 points) The pairwise correlations are shown below.
> cor(Weight, Horsepower)
[1] 0.7388965
> cor(Weight, H.per.P)
[1] 0.2403865
> cor(Horsepower, H.per.P)
[1] 0.8227992
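With exactly two predictors, each VIF equals 1/(1 - r^2), where r is the correlation between the two predictors, so the correlations above determine the VIFs reported in (b) and (d). A quick Python recomputation sketch:

```python
# With two predictors, each VIF is 1 / (1 - r^2), where r is the
# correlation between the two predictors.
def vif_two_predictors(r):
    return 1.0 / (1.0 - r ** 2)

# (b): Weight vs. Horsepower, r = 0.7388965
print(round(vif_two_predictors(0.7388965), 6))  # 2.202488
# (d): Weight vs. H.per.P, r = 0.2403865
print(round(vif_two_predictors(0.2403865), 5))  # 1.06133
```

Both values match the vif() output quoted in the solution.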
(d) (5 points)
Call:
lm(formula = GP1000M.City ~ Weight + H.per.P)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.703e-01  2.047e+00   0.327    0.744
Weight      1.253e-02  6.051e-04  20.707  < 2e-16 ***
H.per.P     2.707e+02  3.623e+01   7.474 2.08e-11 ***

Residual standard error: 3.458 on 109 degrees of freedom
Multiple R-squared: 0.8448,    Adjusted R-squared: 0.842
F-statistic: 296.7 on 2 and 109 DF,  p-value: < 2.2e-16
The LS coefficients are 0.6703, 0.01253, and 270.7; R2 is 0.8448, and the RMSE is 3.458. The VIFs of the two predictors are both 1.06133.
(e) (5 points) Comparing the results in (d) and (a), we see that R2 is larger and the RMSE is smaller in (d) than in (a), which means the response is explained better by the model in (d). The collinearity in (d) is also weaker than in (a), since the VIFs in (d) are smaller.
4. (20 points)
(a) (10 points) We have the following two formulas:
\[
\mathrm{MSR}_{X_j} = \frac{\hat\sigma^2_{\mathrm{MLR}}}{\hat\sigma^2_{X_j,\mathrm{SLR}}},
\qquad
SE\!\left(\hat\beta_j^{\mathrm{MLR}}\right) = SE\!\left(\hat\beta_j^{\mathrm{SLR}}\right)\sqrt{\mathrm{VIF}_{X_j}\cdot\mathrm{MSR}_{X_j}}.
\]
From the two summaries of regression, we know
\[
\hat\sigma_{MRI,\mathrm{SLR}} = 21.21, \qquad
\hat\sigma_{\mathrm{MLR}} = 19.79, \qquad
SE\!\left(\hat\beta_{MRI}^{\mathrm{MLR}}\right) = 0.5634, \qquad
SE\!\left(\hat\beta_{MRI}^{\mathrm{SLR}}\right) = 0.4806.
\]
So we can calculate
\[
\mathrm{VIF}_{MRI}
= \left(\frac{SE(\hat\beta_{MRI}^{\mathrm{MLR}})}{SE(\hat\beta_{MRI}^{\mathrm{SLR}})}\right)^2
  \cdot \frac{\hat\sigma^2_{MRI,\mathrm{SLR}}}{\hat\sigma^2_{\mathrm{MLR}}}
= \left(\frac{0.5634}{0.4806}\right)^2 \left(\frac{21.21}{19.79}\right)^2
\approx 1.58.
\]
(b) (10 points) Under the null, the F statistic follows an F(2, 34) distribution, which gives a P-value of 0.0029. So we reject the null hypothesis that the coefficient of MRI is 0 at significance level 0.05.
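The VIF implied by these standard errors can be checked numerically; a short Python recomputation sketch, using only the values quoted from the two regression summaries:

```python
# Values quoted from the two regression summaries
se_mlr = 0.5634   # SE of the MRI coefficient in the multiple regression
se_slr = 0.4806   # SE of the MRI coefficient in the simple regression
s_slr = 21.21     # residual standard error of the simple regression
s_mlr = 19.79     # residual standard error of the multiple regression

# Solve SE_MLR = SE_SLR * sqrt(VIF * s_MLR^2 / s_SLR^2) for VIF
vif_mri = (se_mlr / se_slr) ** 2 * (s_slr / s_mlr) ** 2
print(round(vif_mri, 2))  # 1.58
```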
5. (20 points)
(a) (8 points) Output of the regression summary:
Call:
lm(formula = courseevaluation ~ beauty + tenuretrack + age + female)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.444818   0.156449  28.411  < 2e-16 ***
beauty       0.134947   0.032932   4.098 4.94e-05 ***
tenuretrack -0.197480   0.060341  -3.273  0.00115 **
age         -0.003809   0.002764  -1.378  0.16883
female      -0.228929   0.052567  -4.355 1.64e-05 ***

Residual standard error: 0.5318 on 458 degrees of freedom
Multiple R-squared: 0.08938,    Adjusted R-squared: 0.08143
F-statistic: 11.24 on 4 and 458 DF,  p-value: 1.046e-08
[Figure: residuals vs. fitted values for the model in (a)]
From the summary, we can see that a one-unit increase in beauty is associated with a 0.1349 higher course evaluation on average; a tenure-track teacher's course evaluation is 0.197480 lower than a non-tenure-track teacher's; a teacher one year older has a course evaluation 0.003809 lower on average; and a female teacher's course evaluation is 0.228929 lower than a male teacher's. Under the assumption of normally distributed residuals, roughly 95% of the observations will fall within 2(0.5318) of their predicted values.
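The "roughly 95%" figure comes from the normal distribution: about 95% of a normal variable's mass lies within two standard deviations of its mean. A quick Python check of the exact two-sigma coverage:

```python
from statistics import NormalDist

# Probability that a normal observation falls within 2 standard
# deviations of its mean
coverage = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(round(coverage, 4))  # 0.9545
```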
(b) (6 points) From the summary of the model with the beauty-by-female interaction (overall p-value: 1.471e-07):
From the summary, we can see that in our model, if the teacher is female, the course evaluation is 0.20505 lower than for males; a one-unit increase in beauty for males is associated with a 0.20027 higher course evaluation on average; and a one-unit increase in beauty for females is associated with a (0.20027 - 0.11266) = 0.08761 higher course evaluation on average.
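The group-specific slopes combine the main effect and the interaction term; a small Python sketch using the coefficients quoted above:

```python
# Coefficients quoted in the solution for the interaction model
b_beauty = 0.20027        # beauty slope for male instructors
b_interaction = -0.11266  # extra beauty slope for female instructors

slope_male = b_beauty
slope_female = b_beauty + b_interaction
print(round(slope_female, 5))  # 0.08761
```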
(c) (6 points) The plot is in Figure 3: course evaluation (vertical axis) versus beauty (horizontal axis). The black dots are male instructors and the red dots are female instructors.

[Figure 3: course evaluation vs. beauty]