Problem 2
A. Brand preference. In a small-scale experimental study of the relation between degree of brand liking (Y) and moisture content (X1) and sweetness (X2) of the product, the following results were obtained from the experiment based on a completely randomized design (data are coded):
i:    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
X1:   4   4   4   4   6   6   6   6   8   8   8   8  10  10  10  10
X2:   2   4   2   4   2   4   2   4   2   4   2   4   2   4   2   4
Yi:  64  73  61  76  72  80  71  83  83  89  86  93  88  95  94 100
a. Fit a regression model to the data. State the estimated regression function. How is b1 interpreted here?
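A minimal sketch for obtaining the fit, using numpy's least squares on the coded data above (under normal errors, least squares and maximum likelihood coincide):

```python
import numpy as np

# Coded data from the table above (n = 16)
X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10], dtype=float)
X2 = np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4], dtype=float)
Y = np.array([64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100], dtype=float)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares fit
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ b
sse = resid @ resid
print("b0, b1, b2 =", np.round(b, 3))  # 37.65, 4.425, 4.375
print("SSE =", round(sse, 2))          # 94.3
```

With these data the estimated regression function is Ŷ = 37.65 + 4.425·X1 + 4.375·X2, so b1 = 4.425 is the estimated increase in mean liking per unit increase in moisture content, sweetness held constant.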
b. Obtain the residuals and prepare a box plot of the residuals. What information does this plot
provide?
The box plot of the residuals shows that they are roughly symmetric about zero, with mean zero.
c. Plot the residuals against the fitted values Ŷ, X1, X2, and X1X2 on separate graphs. Also prepare a normal probability plot. Analyze the plots and summarize your findings.
The normal probability plot of the residuals is reasonably linear, indicating approximate normality. The plots of the residuals against the fitted values, X1, X2, and X1X2 each show a uniform spread around zero, so the first-order regression function appears adequate in all of these:
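A sketch of how these diagnostic plots can be produced with matplotlib and scipy (the output filename diagnostics.png is an assumption):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
from scipy import stats

X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10], dtype=float)
X2 = np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4], dtype=float)
Y = np.array([64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100], dtype=float)

X = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ b  # residuals

fig, ax = plt.subplots(2, 3, figsize=(12, 7))
ax[0, 0].boxplot(e)
ax[0, 0].set_title("Box plot of residuals")
panels = [(X @ b, "fitted values"), (X1, "X1"), (X2, "X2"), (X1 * X2, "X1*X2")]
for a, (v, name) in zip(ax.flat[1:5], panels):
    a.scatter(v, e)
    a.axhline(0, linestyle="--")
    a.set_title(f"Residuals vs {name}")
stats.probplot(e, plot=ax[1, 2])  # normal probability plot
fig.tight_layout()
fig.savefig("diagnostics.png")
```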
d. Conduct a formal test for lack of fit of the first-order regression function; use a = .01. State the
alternatives, decision rule, and conclusion.
H0: E{Y} = β0 + β1X1 + β2X2
Ha: E{Y} ≠ β0 + β1X1 + β2X2
Replicate        j=1    j=2    j=3    j=4    j=5    j=6    j=7     j=8
(X1, X2)        (4,2)  (4,4)  (6,2)  (6,4)  (8,2)  (8,4)  (10,2)  (10,4)
i=1              64     73     72     80     83     89     88      95
i=2              61     76     71     83     86     93     94     100
Mean Ȳ.j        62.5   74.5   71.5   81.5   84.5   91     91      97.5
Σi(Yij − Ȳ.j)²  4.5    4.5    0.5    4.5    4.5    8      18      12.5
c = 8 distinct (X1, X2) combinations
SSPE (from above) = 57, with df = n − c = 16 − 8 = 8
SSE = 94.3
Then SSLF = SSE − SSPE = 94.3 − 57 = 37.3, with df = c − p = 8 − 3 = 5
F* = (SSLF/5) / (SSPE/8) = 1.0470175 < F(1 − .01; 5, 8) = 6.631825
Since F* < F(.99; 5, 8) at α = .01, we do not reject H0 and conclude that the first-order regression function is adequate (no lack of fit).
REMARK: We didn't discuss the lack-of-fit test in our class review of linear regression. More details on this test can be found in Section 3.7 (page 115) of the book Applied Linear Regression Models (3rd edition) by Neter, Kutner, Nachtsheim, and Wasserman.
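The computation above can be reproduced with a short script (a sketch; scipy is assumed for the F critical value):

```python
import numpy as np
from scipy.stats import f as f_dist

X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10], dtype=float)
X2 = np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4], dtype=float)
Y = np.array([64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100], dtype=float)
n, p = len(Y), 3

# SSE from the first-order fit
X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
sse = float(np.sum((Y - X @ b) ** 2))

# Pure error: squared deviations of replicates from their cell means
cells = {}
for x1, x2, y in zip(X1, X2, Y):
    cells.setdefault((x1, x2), []).append(y)
c = len(cells)  # 8 distinct X combinations
sspe = sum(float(np.sum((np.array(v) - np.mean(v)) ** 2)) for v in cells.values())

sslf = sse - sspe
f_star = (sslf / (c - p)) / (sspe / (n - c))
crit = f_dist.ppf(0.99, c - p, n - c)
print(f"SSPE={sspe:.1f}, SSLF={sslf:.2f}, F*={f_star:.4f}, F(.99;{c-p},{n-c})={crit:.3f}")
```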
B. Refer to Brand preference. The diagonal elements of the hat matrix are h55 = h66 = h77 = h88 = h99 = h10,10 = h11,11 = h12,12 = .137 and h11 = h22 = h33 = h44 = h13,13 = h14,14 = h15,15 = h16,16 = .237.
a. Explain the reason for the pattern in the diagonal elements of the hat matrix.
hii = .2375 for i = 1, 2, 3, 4, 13, 14, 15, 16 and hii = .1375 for i = 5, 6, ..., 12.
The diagonal elements of the hat matrix measure the leverage of each observation. Because the design is balanced, there are only two groups of leverage values: the observations at the extreme moisture levels (X1 = 4 or 10) lie farther from the center of the X space and have the larger leverage (.2375), while those at the middle levels (X1 = 6, 8) have the smaller (.1375). No single observation stands out, so the hat-matrix diagonal flags no outliers here. Also, Σ hii = p = 3, the number of regression parameters.
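The leverages can be computed directly from the design matrix — a sketch:

```python
import numpy as np

X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10], dtype=float)
X2 = np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4], dtype=float)
X = np.column_stack([np.ones(16), X1, X2])

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages h_ii
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
print(np.round(h, 4))     # .2375 for i = 1-4 and 13-16, .1375 for i = 5-12
print(round(h.sum(), 6))  # sum of leverages equals p = 3
```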
b. According to the rule of thumb stated in the chapter, are any of the observations outlying with regard to their X values?
The rule of thumb flags points with a hat diagonal greater than 2p/n as high-leverage points. (When p is small, some texts instead flag any point with a hat diagonal greater than .2 or .5.) Here 2p/n = 6/16 = .375, and since no hii exceeds .375, no cases are outlying with respect to their X values.
c. Obtain the studentized deleted residuals and identify any outlying Y observations.
i:       1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
Yi:     64    73    61    76    72    80    71    83    83    89    86    93    88    95    94   100
ti:   -.04   .06 -1.36  1.39  -.37  -.66  -.77   .50   .47  -.60  1.82   .98 -1.14 -2.10  1.49   .25
|ti|:  .04   .06  1.36  1.39   .37   .66   .77   .50   .47   .60  1.82   .98  1.14  2.10  1.49   .25
Case 14 has the largest absolute studentized deleted residual (|t14| = 2.10), followed by cases 11 and 15. Using t(.975; 12) = 2.1788 as the cutoff, none of the |ti| exceeds it, so no outlying Y observations are identified.
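The studentized deleted residuals can be computed without refitting n times, using a standard closed-form expression — a sketch:

```python
import numpy as np

X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10], dtype=float)
X2 = np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4], dtype=float)
Y = np.array([64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100], dtype=float)
n, p = 16, 3

X = np.column_stack([np.ones(n), X1, X2])
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = Y - H @ Y  # ordinary residuals
h = np.diag(H)
sse = float(e @ e)

# Closed form: t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2))
t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))
print(np.round(t, 2))  # case 14 gives -2.10, the largest in absolute value
```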
e. Calculate the average absolute percent difference in the fitted values with and without
case 14. What does this measure indicate about the influence of case 14?
(1/n) Σ_{i=1}^{n} |(Ŷi(14) − Ŷi) / Ŷi| × 100% = 0.677679%,
where Ŷi(14) denotes the fitted value for case i when case 14 is omitted from the fit,
so the effect of case 14 on the fitted values is small (under 0.7% on average) and the case need not be removed.
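This measure can be computed by refitting once with case 14 deleted — a sketch:

```python
import numpy as np

X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10], dtype=float)
X2 = np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4], dtype=float)
Y = np.array([64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100], dtype=float)
X = np.column_stack([np.ones(16), X1, X2])

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
yhat = X @ b  # fitted values from the full data

keep = np.arange(16) != 13  # drop case 14 (zero-based index 13)
b14, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
yhat14 = X @ b14  # fitted values at all 16 points, with case 14 excluded from the fit

apd = float(np.mean(np.abs((yhat14 - yhat) / yhat)) * 100)
print(f"{apd:.4f}%")  # about 0.6777%
```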
f. Calculate Cook's distance Di for each case. Are any cases influential according to this measure?
Cook's distances:
i:    1        2       3       4       5       6       7       8
Di:  .00019   .0004   .1804   .1863   .0077   .0245   .0323   .01435
i:    9       10      11      12      13      14      15      16
Di:  .0122   .0204   .1498   .0510   .1318   .3634   .21067  .0068
The largest value is D14 = .3634. Referred to the F(3, 13) distribution, this falls well below the 50th percentile, so no case, including case 14, is influential according to this measure.
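Cook's distances follow directly from the residuals and leverages — a sketch:

```python
import numpy as np

X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10], dtype=float)
X2 = np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4], dtype=float)
Y = np.array([64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100], dtype=float)
n, p = 16, 3

X = np.column_stack([np.ones(n), X1, X2])
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = Y - H @ Y
h = np.diag(H)
mse = float(e @ e) / (n - p)

# Cook's distance: D_i = e_i^2 * h_ii / (p * MSE * (1 - h_ii)^2)
D = e ** 2 * h / (p * mse * (1 - h) ** 2)
print(np.round(D, 4))  # D_14 = 0.3634 is the largest
```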
Problem 3.
a. Find the maximum likelihood estimates of β0, β1, and β2. State the fitted response function.
With a unit increase in the first or second covariate (income in $1,000s, or age of the oldest car in years), the estimated odds of purchase are multiplied by exp(b1) = 1.070079093 or exp(b2) = 1.819627221, respectively. That is, the odds of buying a new car increase by about 7% for every additional $1,000 of income and by about 82% for every additional year of the oldest car's age.
c.What is the estimated probability that a family with annual income of $50 thousand and an
oldest car of 3 years will purchase a new car next year?
Using the fitted model, the estimated probability that such a family purchases a new car next year is 0.6090245.
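The reported numbers can be cross-checked with a short script. The slope coefficients are recovered from the reported odds ratios; the intercept value below is back-solved from the reported probability and is an assumption, not part of the original output:

```python
import numpy as np

b1 = np.log(1.070079093)  # income slope, from the reported odds ratio exp(b1)
b2 = np.log(1.819627221)  # age slope, from the reported odds ratio exp(b2)
b0 = -4.7393              # intercept back-solved from the reported probability (assumed)

income, age = 50.0, 3.0   # $50 thousand income, oldest car 3 years old
eta = b0 + b1 * income + b2 * age
pi_hat = 1.0 / (1.0 + np.exp(-eta))  # inverse logit
print(round(pi_hat, 3))   # about 0.609
```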
Cutpoints that divide the fitted logit values into three equal groups are 0.25110785 and 0.59013409. Each group contains 11 observations, of which 3, 2, and 9, respectively, have Y = 1. The resulting pattern of observed proportions does not look clearly sigmoidal, but with only three intervals little can be concluded.
c. Obtain the deviance residuals and present them in an index plot. Do there appear to be any
outlying cases?
Additional Problem
Prove the two equations in Remark 3 on page 8 of the lecture notes on GLM
[Hint: Consider the expectations of the first and the second derivatives (w.r.t. θi) of the log-likelihood function.]
The two equations to prove are
E[∂l/∂θi] = 0 and E[∂²l/∂θi²] + E[(∂l/∂θi)²] = 0,
where l(θi, φ | yi) = log f(yi | θi, φ) for any density function f(yi | θi, φ). Both follow, under the usual regularity conditions, by differentiating the identity ∫ f(yi | θi, φ) dyi = 1 once and twice with respect to θi and interchanging differentiation and integration.
For the exponential-family density
f(yi | θi, φ) = exp{ (Ai/φ)[yi θi − γ(θi)] + τ(yi, φ) },
the score is ∂l/∂θi = (Ai/φ)[yi − γ'(θi)], so the first equation gives
E[yi] = γ'(θi).
Since ∂²l/∂θi² = −(Ai/φ) γ''(θi) and E[(∂l/∂θi)²] = (Ai/φ)² Var(yi), the second equation gives
−(Ai/φ) γ''(θi) + (Ai/φ)² Var(yi) = 0,
therefore
Var(yi) = (φ/Ai) γ''(θi).
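As a sanity check (not part of the proof), the identities E[yi] = γ'(θi) and Var(yi) = (φ/Ai) γ''(θi) can be verified numerically for a Poisson family, where γ(θ) = e^θ and φ/Ai = 1 — a sketch:

```python
import numpy as np

# Poisson family: gamma(theta) = exp(theta) and phi/A_i = 1,
# so the identities predict E[y] = Var(y) = exp(theta).
rng = np.random.default_rng(42)
theta = 1.2
lam = np.exp(theta)  # gamma'(theta) = gamma''(theta) = exp(theta)
y = rng.poisson(lam, size=500_000)

print(round(y.mean(), 3), round(lam, 3))  # sample mean vs gamma'(theta)
print(round(y.var(), 3), round(lam, 3))   # sample variance vs gamma''(theta)
```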