Solution Key
y = c(42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14, 13, 11, 12, 8, 7, 8, 8, 9, 15, 15)
x1 = c(80, 80, 75, 62, 62, 62, 62, 62, 58, 58, 58, 58, 58, 58, 50, 50, 50, 50, 50, 56, 70)
x2 = c(27, 27, 25, 24, 22, 23, 24, 24, 23, 18, 18, 17, 18, 19, 18, 18, 19, 19, 20, 20, 20)
x3 = c(89, 88, 90, 87, 87, 87, 93, 93, 87, 80, 89, 88, 82, 93, 89, 86, 72, 79, 80, 82, 91)
(a) [2 marks] Fit a linear regression model relating the results of the stack loss to the three regressor
variables. Provide a summary output of your fitted model. Use the model to predict stack loss when
x1 = 60, x2 = 26, and x3 = 85.
m1 = lm(y~x1+x2+x3)
summary(m1)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.2377 -1.7117 -0.4551 2.3614 5.6978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39.9197 11.8960 -3.356 0.00375 **
## x1 0.7156 0.1349 5.307 5.8e-05 ***
## x2 1.2953 0.3680 3.520 0.00263 **
## x3 -0.1521 0.1563 -0.973 0.34405
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.243 on 17 degrees of freedom
## Multiple R-squared: 0.9136, Adjusted R-squared: 0.8983
## F-statistic: 59.9 on 3 and 17 DF, p-value: 3.016e-09
## [1] 23.76576
Above is the code and output for the main effects linear regression model. The predicted stack
loss at the specified levels of x is 23.77.
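The call that produced the predicted value is not shown in the output above. A minimal sketch of two equivalent ways to obtain it, assuming the data vectors defined at the top of this key (the names `new.x`, `y.hat`, and `y.hat.manual` are illustrative, not from the original):

```r
# Stack loss data as given at the top of this solution key
y  = c(42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14, 13, 11, 12, 8, 7, 8, 8, 9, 15, 15)
x1 = c(80, 80, 75, 62, 62, 62, 62, 62, 58, 58, 58, 58, 58, 58, 50, 50, 50, 50, 50, 56, 70)
x2 = c(27, 27, 25, 24, 22, 23, 24, 24, 23, 18, 18, 17, 18, 19, 18, 18, 19, 19, 20, 20, 20)
x3 = c(89, 88, 90, 87, 87, 87, 93, 93, 87, 80, 89, 88, 82, 93, 89, 86, 72, 79, 80, 82, 91)
m1 = lm(y ~ x1 + x2 + x3)

# Option 1: predict() with a data frame holding the new regressor values
new.x = data.frame(x1 = 60, x2 = 26, x3 = 85)
y.hat = predict(m1, newdata = new.x)

# Option 2: inner product of the coefficient vector with (1, 60, 26, 85)
y.hat.manual = sum(coef(m1) * c(1, 60, 26, 85))

print(c(y.hat, y.hat.manual))
```

Both values should agree with the 23.77 reported above.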
(b) [3 marks] Conduct a t-test for the null hypothesis H0 : β3 = 0 at the α = 0.05 level. Show the
calculation of the test statistic and p-value. What conclusion do you draw regarding the relationship
between stack loss and acid concentration?
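$$t^* = \frac{\hat\beta_3 - 0}{\widehat{se}(\hat\beta_3)} = \frac{-0.1521225}{0.156294} = -0.9733098$$
The two-sided p-value on 17 degrees of freedom is 0.344, which exceeds α = 0.05. Therefore we do not
reject the null hypothesis that stack-loss is unrelated to acid concentration.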
## [1] -0.9733098
## [1] 0.3440461
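The code behind the two values printed above is not shown. A minimal sketch of the calculation, assuming the same data and model as in part (a) (the names `est`, `se`, `t.star`, and `p.value` are illustrative):

```r
# Stack loss data as given at the top of this solution key
y  = c(42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14, 13, 11, 12, 8, 7, 8, 8, 9, 15, 15)
x1 = c(80, 80, 75, 62, 62, 62, 62, 62, 58, 58, 58, 58, 58, 58, 50, 50, 50, 50, 50, 56, 70)
x2 = c(27, 27, 25, 24, 22, 23, 24, 24, 23, 18, 18, 17, 18, 19, 18, 18, 19, 19, 20, 20, 20)
x3 = c(89, 88, 90, 87, 87, 87, 93, 93, 87, 80, 89, 88, 82, 93, 89, 86, 72, 79, 80, 82, 91)
m1 = lm(y ~ x1 + x2 + x3)

# Estimate and standard error for the x3 coefficient
est = coef(summary(m1))["x3", "Estimate"]
se  = coef(summary(m1))["x3", "Std. Error"]

# Test statistic for H0: beta3 = 0
t.star = (est - 0) / se

# Two-sided p-value from the t distribution with n - p = 17 degrees of freedom
p.value = 2 * pt(-abs(t.star), df = m1$df.residual)

print(t.star)
print(p.value)
```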
(c) [2 marks] Calculate a 90% confidence interval for β2 (show your work) and provide a written
interpretation of this regression coefficient.
For a one degree increase in temperature, holding x1 and x3 fixed, we expect that the daily stack-loss
will increase by 1.2953 units. A 90% confidence interval for this coefficient is
β̂2 ± t(0.95; 17) · se(β̂2) = 1.2953 ± 1.740 × 0.3680 = (0.655, 1.936).
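As a check on the interval, here is a minimal sketch that computes it by hand and with `confint()`, assuming the same data and model as in part (a) (the names `ci.manual` and `ci.builtin` are illustrative):

```r
# Stack loss data as given at the top of this solution key
y  = c(42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14, 13, 11, 12, 8, 7, 8, 8, 9, 15, 15)
x1 = c(80, 80, 75, 62, 62, 62, 62, 62, 58, 58, 58, 58, 58, 58, 50, 50, 50, 50, 50, 56, 70)
x2 = c(27, 27, 25, 24, 22, 23, 24, 24, 23, 18, 18, 17, 18, 19, 18, 18, 19, 19, 20, 20, 20)
x3 = c(89, 88, 90, 87, 87, 87, 93, 93, 87, 80, 89, 88, 82, 93, 89, 86, 72, 79, 80, 82, 91)
m1 = lm(y ~ x1 + x2 + x3)

# Hand calculation: estimate +/- t quantile * standard error
est = coef(summary(m1))["x2", "Estimate"]
se  = coef(summary(m1))["x2", "Std. Error"]
t.crit = qt(0.95, df = m1$df.residual)   # 90% two-sided interval uses the 0.95 quantile
ci.manual = est + c(-1, 1) * t.crit * se

# Built-in equivalent
ci.builtin = as.vector(confint(m1, "x2", level = 0.90))

print(ci.manual)
print(ci.builtin)
```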
(d) [3 marks] Conduct a residual analysis of the fitted model using various residual plots. What conclusions
do you draw about the overall fit of the model?
Based on the residual analysis I conclude that the fit of this model is fairly good. Recall, for a well
fit model we’d expect the standardized residuals to be independent N (0, 1). From the scatterplots
we see that the residuals appear to be approximately centred at zero and all but one of them
are within ±1.96. The scatterplots in x2 and ŷ show a potential quadratic relationship (large
residuals at the extremes, small residuals for moderate values) which may be of concern. Finally,
the quantile plot shows good agreement between the standardized residuals and the standard
normal distribution.
# Scatterplots (with loess curves) of standardized residuals vs explanatory variables & fitted values
# Normal quantile plot
par(mfrow=c(2,3))
scatter.smooth(x1,rstandard(m1),main="Residuals vs X1"); abline(h=c(-1.96,0,1.96), lty=2)
scatter.smooth(x2,rstandard(m1),main="Residuals vs X2"); abline(h=c(-1.96,0,1.96), lty=2)
scatter.smooth(x3,rstandard(m1),main="Residuals vs X3"); abline(h=c(-1.96,0,1.96), lty=2)
qqnorm(rstandard(m1)); abline(0,1)
scatter.smooth(m1$fitted.values,rstandard(m1),main="Residuals vs Fitted Y")
abline(h=c(-1.96,0,1.96), lty=2)
[Figure: standardized residuals, rstandard(m1), plotted against x1, x2, x3, and the fitted values (with loess curves), plus a normal quantile plot of the standardized residuals.]
Note: Students may have slightly different residual plots if they chose to include interaction terms or remove
terms from their model. At minimum two plots should be considered (preferably a Normal quantile plot
and a scatter plot of residuals versus fitted values). Conclusions drawn must be consistent with the student’s
plots. Note that the residuals supplied directly from the fitted model are not standardized, which makes
interpretation of the plots more difficult.
Question 2 [10 marks]
The angle θ at which electrons are emitted in muon decay has a distribution with the density:
$$f(x \mid \alpha) = \frac{1 + \alpha x}{2}, \qquad -1 \le x \le 1, \quad -1 \le \alpha \le 1$$
where x = cos θ.
(a) [3 marks] Find the likelihood, log-likelihood, score, and information functions for a sample of n
independent observations from this distribution.
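Likelihood:
$$L(\alpha) = \prod_{i=1}^{n} \frac{1 + \alpha x_i}{2}$$
Log-likelihood:
$$\ell(\alpha) = \log L(\alpha) = \sum_{i=1}^{n} \log(1 + \alpha x_i) - n \log 2$$
Score:
$$S(\alpha) = \frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{n} \frac{x_i}{1 + \alpha x_i}$$
Information:
$$I(\alpha) = -\frac{\partial^2 \ell}{\partial \alpha^2} = \sum_{i=1}^{n} \frac{x_i^2}{(1 + \alpha x_i)^2}$$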
(b) [2 marks] Use the Newton Raphson algorithm to find the maximum likelihood estimate of α for the
data given below. Note: you must code the algorithm yourself instead of using any built-in optimization
or root finding functions.
alpha.hat = alpha.new
print(trace)
## [,1] [,2]
## trace 0.0000000 4.174163e+00
## 0.3134817 -1.019505e-01
## 0.3066253 -3.409367e-04
## 0.3066022 -3.765120e-09
## 0.3066022 -2.870967e-16
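Only the last two lines of the Newton Raphson code and its trace survive above, and the data are not reproduced here. The following is a minimal sketch of how such an iteration could be coded using the score and information from part (a). It runs on simulated data, so its numbers will not match the trace above; the simulator `r.muon`, the seed, the sample size, and the true value 0.3 are all assumptions made purely for illustration.

```r
# Score and information functions from part (a)
Score = function(a, x) sum(x / (1 + a * x))
Info  = function(a, x) sum(x^2 / (1 + a * x)^2)

# ILLUSTRATIVE data only (not the assignment data): simulate from
# f(x|alpha) = (1 + alpha*x)/2 by inverting the CDF (requires alpha != 0)
r.muon = function(n, alpha) {
  u = runif(n)
  (-1 + sqrt(1 - alpha * (2 - alpha - 4 * u))) / alpha
}
set.seed(1)
x = r.muon(200, alpha = 0.3)

# Newton Raphson update: alpha.new = alpha.old + S(alpha.old) / I(alpha.old)
alpha.new = 0            # starting value
delta = 1                # size of the most recent step
epsilon = 1e-8           # convergence tolerance
iter = 0
trace = c(alpha.new, Score(alpha.new, x))
while (delta > epsilon && iter < 100) {
  alpha.old = alpha.new
  alpha.new = alpha.old + Score(alpha.old, x) / Info(alpha.old, x)
  delta = abs(alpha.new - alpha.old)
  iter = iter + 1
  trace = rbind(trace, c(alpha.new, Score(alpha.new, x)))
}
alpha.hat = alpha.new
print(trace)             # column 1: alpha, column 2: score at that alpha
```

As in the trace above, the second column should shrink toward zero as the iterates settle on the maximum likelihood estimate.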
(c) [3 marks] Calculate 95% confidence intervals for α using the likelihood ratio, score, and Wald asymptotic
results. Which of the three intervals do you prefer, and why?
lr.lower = uniroot(lr.int,interval=c(-1,alpha.hat),x=x,a.hat=alpha.hat)$root
lr.upper = uniroot(lr.int,interval=c(alpha.hat,1),x=x,a.hat=alpha.hat)$root
print(c(lr.lower,lr.upper))
sr.lower = uniroot(sr.int,interval=c(-.5,alpha.hat),x=x)$root
sr.upper = uniroot(sr.int,interval=c(alpha.hat,1),x=x)$root
## Error in uniroot(sr.int, interval = c(alpha.hat, 1), x = x): f() values at end points not of opposite
# Find the Wald based 95% CI {alpha: (alpha.hat - alpha)^2 * I(alpha.hat) < chisq(0.95)}
wr = function(a,x,a.hat){ (a.hat-a)^2*Info(a.hat,x)}
wr.int = function(a,x,a.hat) {wr(a,x,a.hat) - qchisq(0.95,1)}
wr.lower = uniroot(wr.int,interval=c(-0.99,alpha.hat),x=x,a.hat=alpha.hat)$root
wr.upper = uniroot(wr.int,interval=c(alpha.hat,1.3),x=x,a.hat=alpha.hat)$root
print(c(wr.lower,wr.upper))
## [1] -0.2033615 0.8166075
In this case I prefer either the likelihood ratio based or Wald based interval. The Score statistic
does not seem particularly well behaved in this case (see plot below). Perhaps we do not have a
large enough sample size to use that asymptotic result. In this case calculation of the MLE
and Information are relatively easy so there is no problem using the Wald result; however, it does
impose a symmetric interval which we may not prefer.
Note: As seen in the plot below there is another part of the domain of α for which the score
statistic is less than the 0.95 quantile of the χ²(1) distribution. So one could argue that we should have a
composite confidence interval of (-1, -0.98) ∪ (-0.255, 1). In my mind, given our MLE I would not include
this region and would instead not rely on a Score statistic based interval in this setting.
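The helper functions `lr`, `lr.int`, `sr`, and `sr.int` called in the code for this part are not reproduced here. The sketch below shows one plausible form for them, inferred from how they are called (for example, the plotting code uses `-2*lr.out`, which suggests `lr` returns the log relative likelihood). These definitions are assumptions, and the data are simulated for illustration only, so the resulting interval will not match the assignment's.

```r
# Log-likelihood, score and information from part (a)
loglik = function(a, x) sum(log(1 + a * x)) - length(x) * log(2)
Score  = function(a, x) sum(x / (1 + a * x))
Info   = function(a, x) sum(x^2 / (1 + a * x)^2)

# ASSUMED helper definitions (the originals are not shown)
# Log relative likelihood r(a) = l(a) - l(a.hat); the LR statistic is -2 r(a)
lr     = function(a, x, a.hat) loglik(a, x) - loglik(a.hat, x)
lr.int = function(a, x, a.hat) -2 * lr(a, x, a.hat) - qchisq(0.95, 1)
# Score statistic S(a)^2 / I(a)
sr     = function(a, x) Score(a, x)^2 / Info(a, x)
sr.int = function(a, x) sr(a, x) - qchisq(0.95, 1)

# ILLUSTRATIVE data only (not the assignment data), simulated by CDF inversion
set.seed(1)
u = runif(200)
alpha.true = 0.3
x = (-1 + sqrt(1 - alpha.true * (2 - alpha.true - 4 * u))) / alpha.true

# MLE by a fixed number of Newton Raphson steps from alpha = 0
alpha.hat = 0
for (k in 1:50) alpha.hat = alpha.hat + Score(alpha.hat, x) / Info(alpha.hat, x)

# LR interval: roots of lr.int on either side of the MLE
lr.lower = uniroot(lr.int, interval = c(-0.99, alpha.hat), x = x,
                   a.hat = alpha.hat, tol = 1e-8)$root
lr.upper = uniroot(lr.int, interval = c(alpha.hat, 0.99), x = x,
                   a.hat = alpha.hat, tol = 1e-8)$root
print(c(lr.lower, lr.upper))
```

The same root-finding pattern applies to `sr.int`, but as the error message in this part shows, it fails whenever the score statistic does not cross the cutoff inside the chosen search interval.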
# Plot the log relative likelihood statistic and score statistic and wald statistic
# over -1<alpha<1 (CI are alphas with curve below chisq(0.95)=3.84)
alphas=seq(-0.99,.99,.01)
par(mfrow=c(1,3))
lr.out = rep(0,length(alphas))
for(i in 1:length(alphas)){lr.out[i]=lr(alphas[i],x,alpha.hat)}
plot(alphas,-2*lr.out,type='l',main="LR Statistic")
abline(h=qchisq(0.95,1), lty=2); abline(v=alpha.hat,lty=3)
sr.out = rep(0,length(alphas))
for(i in 1:length(alphas)){sr.out[i]=sr(alphas[i],x)}
plot(alphas,sr.out,type='l', main="Score Statistic")
abline(h=qchisq(0.95,1), lty=2); abline(v=alpha.hat,lty=3)
wr.out = rep(0,length(alphas))
for(i in 1:length(alphas)){wr.out[i]=wr(alphas[i],x,alpha.hat)}
plot(alphas,wr.out,type='l', main="Wald Statistic")
abline(h=qchisq(0.95,1), lty=2); abline(v=alpha.hat,lty=3)
[Figure: three panels titled "LR Statistic", "Score Statistic", and "Wald Statistic", showing -2 * lr.out, sr.out, and wr.out plotted against alphas over (-1, 1), with a dashed horizontal line at qchisq(0.95,1) and a dotted vertical line at alpha.hat.]
We wish to test H0 : α = 0.25 versus Ha : α ≠ 0.25. The three tests are conducted below using R.
In all three cases we do not reject the null hypothesis that α = 0.25.
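The R code for these three tests is not reproduced here. A minimal sketch of how the likelihood ratio, score, and Wald statistics could be computed at α0 = 0.25 follows, with each compared against the χ²(1) distribution. The data are simulated for illustration only (an assumption), so the output says nothing about the assignment's actual p-values; the names `alpha.0`, `stats`, and `p.values` are illustrative.

```r
# Log-likelihood, score and information from part (a)
loglik = function(a, x) sum(log(1 + a * x)) - length(x) * log(2)
Score  = function(a, x) sum(x / (1 + a * x))
Info   = function(a, x) sum(x^2 / (1 + a * x)^2)

# ILLUSTRATIVE data only (not the assignment data), simulated by CDF inversion
set.seed(1)
u = runif(200)
alpha.true = 0.3
x = (-1 + sqrt(1 - alpha.true * (2 - alpha.true - 4 * u))) / alpha.true

# MLE by a fixed number of Newton Raphson steps from alpha = 0
alpha.hat = 0
for (k in 1:50) alpha.hat = alpha.hat + Score(alpha.hat, x) / Info(alpha.hat, x)

# Hypothesized value under H0
alpha.0 = 0.25

# The three asymptotically chi-squared(1) test statistics
lr.stat = 2 * (loglik(alpha.hat, x) - loglik(alpha.0, x))   # likelihood ratio
sr.stat = Score(alpha.0, x)^2 / Info(alpha.0, x)            # score
wr.stat = (alpha.hat - alpha.0)^2 * Info(alpha.hat, x)      # Wald

stats = c(LR = lr.stat, Score = sr.stat, Wald = wr.stat)
p.values = 1 - pchisq(stats, df = 1)
print(rbind(stats, p.values))
```

H0 is rejected at the 5% level by a given test when its statistic exceeds qchisq(0.95, 1), equivalently when its p-value is below 0.05.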
Question 3 [10 marks]
Suppose that Y is a random variable from the exponential distribution with rate parameter λ > 0 and
probability density function:
$$f(y; \lambda) = \lambda e^{-\lambda y}$$
(a) [2 marks] Show that the distribution of Y is a member of the exponential family by identifying the
canonical parameter, the dispersion parameter, and the functions a(φ), b(θ), c(y; φ).
Recall a distribution is a member of the exponential family if its pdf/pmf can be written in the
form:
$$f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi) \right\}$$
For the exponential distribution we can write
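The expression that followed is not reproduced in this copy. As a hedged sketch, one parameterization consistent with the relationship θ = 1/µ = λ used in part (c) is given below. The choice a(φ) = −1 is an assumption made here to match that sign convention; the more common alternative takes θ = −λ, a(φ) = 1, and b(θ) = −log(−θ).

```latex
\begin{align*}
f(y;\lambda) &= \lambda e^{-\lambda y}
              = \exp\{-\lambda y + \log\lambda\}
              = \exp\left\{\frac{y\lambda - \log\lambda}{-1}\right\} \\
\theta &= \lambda, \qquad b(\theta) = \log\theta, \qquad a(\phi) = -1,
        \qquad \phi = 1, \qquad c(y;\phi) = 0
\end{align*}
```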
(b) [2 marks] Obtain an expression for the mean and variance of Y and identify the canonical link.
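The worked answer to this part is not reproduced in this copy. A short sketch of the calculation follows, assuming the parameterization θ = λ, b(θ) = log θ, a(φ) = −1, which is chosen to agree with θ = 1/µ = λ and Var(Y) = µ² as used in part (c):

```latex
\begin{align*}
E(Y) &= \mu = b'(\theta) = \frac{1}{\theta} = \frac{1}{\lambda} \\
Var(Y) &= b''(\theta)\, a(\phi) = \left(-\frac{1}{\theta^2}\right)(-1)
        = \frac{1}{\lambda^2} = \mu^2 \\
\theta &= \frac{1}{\mu} \quad\Rightarrow\quad \text{canonical link } g(\mu) = \frac{1}{\mu}
\end{align*}
```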
(c) [3 marks] Suppose $Y_i$, $i = 1, \ldots, n$ are iid and for each $Y_i$ there is a vector of explanatory variables
$x_i = (1, x_{i1}, \ldots, x_{i,p-1})'$. Consider the linear predictor $\eta_i = x_i'\beta$ and the canonical link found in (b).
Find the specific form of the score vector and information matrix for β.
To find the Score and Information we can either substitute into the Likelihood using the parameter
relationship defined by the canonical link: x′β = η = θ = 1/µ = λ, i.e. λ = x′β, or we can use the
general result discussed in class (course notes pages 6-8). Since we have a random sample we can
omit the subscript i and find the contributions from a single observation.
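Using the general result for the exponential family, note that in this case we have:
$$\frac{\partial \eta}{\partial \mu} = -\frac{1}{\mu^2}, \qquad W^{-1} = Var(Y)\left(\frac{\partial \eta}{\partial \mu}\right)^2 = \mu^2 \left(-\frac{1}{\mu^2}\right)^2 = \frac{1}{\mu^2}$$
This implies:
$$S_j(\beta) = \frac{\partial \ell}{\partial \beta_j} = (y - \mu) \cdot W \cdot \frac{\partial \eta}{\partial \mu}\, x_j = (y - \mu)\, \mu^2 \left(-\frac{1}{\mu^2}\right) x_j = -\left(y - \frac{1}{x'\beta}\right) x_j = -y x_j + \frac{x_j}{x'\beta}$$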
Since we are using the canonical link the observed Information and expected Information are equal and
we have (using W = µ² and µ = 1/(x′β)):
$$\mathcal{I}_{jk}(\beta) = I_{jk}(\beta) = x_j W x_k = x_j\, \mu^2\, x_k = \frac{x_j x_k}{(x'\beta)^2}$$
Therefore using either method, the jth element of the Score vector and the (j, k)th element of the
Information matrix are given by:
$$S_j(\beta) = \sum_{i=1}^{n} \left( -y_i x_{ij} + \frac{x_{ij}}{x_i'\beta} \right), \qquad I_{jk}(\beta) = \sum_{i=1}^{n} \frac{x_{ij} x_{ik}}{(x_i'\beta)^2}$$
(d) [3 marks] The R code below gives data on y the time in years until a first claim for 25 insurance
policies and x a proprietary measure of risk. Use Newton Raphson to estimate β = (β0 , β1 ) from an
exponential generalized linear model with the canonical link. Again, you must code your own Newton
Raphson algorithm rather than relying on any built-in functions in R.
y = c(0.9683, 0.4515, 17.4488, 0.6287, 2.2330, 2.6467, 3.9589, 0.0782, 5.4717, 4.1161,
0.6715, 1.6350, 0.1640, 0.3331, 0.7501, 3.0846, 0.6889, 6.3826, 7.0869, 0.7967,
3.2684, 0.1373, 2.8698, 1.5126, 0.9055)
x = c(0.1036, 2.1824, 0.1745, 2.0089, 1.2317, 0.6166, 0.4675, 3.2074, 0.0277, 1.2962,
0.6812, 0.1946, 1.3291, 0.4381, 0.2984, 0.3018, 0.7928, 0.2021, 1.0280, 0.0121,
1.2043, 2.9322, 1.4526, 0.6444, 0.1849)
I22 = sum(X[2,]^2/xTB^2)
rbind( c(I11,I12), c(I21,I22))
}
# Put data in matrix form, set up initial beta estimate and tolerance for convergence
Y = y
X = rbind(rep(1,length(x)),x)
Beta.old = Beta.new = c(0,1)
delta = 1
epsilon = 10^{-5}
trace = c(Beta.new,Score(Beta.new,Y,X))
After seven iterations of Newton Raphson (equivalently Fisher Scoring here since we’re using the canonical
link) we find the maximum likelihood estimate β̂ = (0.189, 0.315).
Note: It appears that the convergence of Newton Raphson is sensitive to the starting values used. It is
possible that the algorithm may converge to a local maximum instead of the true MLE.
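The code fragments above omit the `Score` function, most of the `Info` function, and the iteration loop. The following is a minimal sketch of a complete Newton Raphson routine built from the score vector and information matrix derived in part (c). It keeps the fragment's conventions (`X` stored as a 2 × n matrix, starting value `c(0,1)`, tolerance `10^{-5}`), but the function bodies, the stopping rule, and the iteration cap are assumptions rather than the original code.

```r
# Data from part (d)
y = c(0.9683, 0.4515, 17.4488, 0.6287, 2.2330, 2.6467, 3.9589, 0.0782, 5.4717, 4.1161,
      0.6715, 1.6350, 0.1640, 0.3331, 0.7501, 3.0846, 0.6889, 6.3826, 7.0869, 0.7967,
      3.2684, 0.1373, 2.8698, 1.5126, 0.9055)
x = c(0.1036, 2.1824, 0.1745, 2.0089, 1.2317, 0.6166, 0.4675, 3.2074, 0.0277, 1.2962,
      0.6812, 0.1946, 1.3291, 0.4381, 0.2984, 0.3018, 0.7928, 0.2021, 1.0280, 0.0121,
      1.2043, 2.9322, 1.4526, 0.6444, 0.1849)

# Score vector: S_j = sum_i ( -y_i x_ij + x_ij / x_i'beta ), with X stored as p x n
Score = function(Beta, Y, X) {
  xTB = as.vector(t(X) %*% Beta)          # linear predictor x_i'beta for each i
  as.vector(X %*% (-Y + 1 / xTB))
}

# Information matrix: I_jk = sum_i x_ij x_ik / (x_i'beta)^2
Info = function(Beta, Y, X) {
  xTB = as.vector(t(X) %*% Beta)
  X %*% (t(X) / xTB^2)                    # row i of t(X) is divided by (x_i'beta)^2
}

# Put data in matrix form, set starting value and tolerance
Y = y
X = rbind(rep(1, length(x)), x)
Beta.new = c(0, 1)
delta = 1
epsilon = 10^(-5)
iter = 0

# Newton Raphson: Beta.new = Beta.old + I(Beta.old)^{-1} S(Beta.old)
while (delta > epsilon && iter < 100) {
  Beta.old = Beta.new
  Beta.new = Beta.old + as.vector(solve(Info(Beta.old, Y, X), Score(Beta.old, Y, X)))
  delta = max(abs(Beta.new - Beta.old))   # largest change in any coefficient
  iter = iter + 1
}
Beta.hat = Beta.new
print(round(Beta.hat, 3))
```

The result can be compared with the estimate β̂ = (0.189, 0.315) reported above and with the `glm` fit that follows.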
# Note that in the future we will use the GLM function to estimate beta
# Here's the appropriate GLM call for an exponential regression
fit = glm(y~x,family=Gamma)
summary(fit,dispersion=1)
##
## Call:
## glm(formula = y ~ x, family = Gamma)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7096 -1.2278 -0.5957 0.2998 1.9012
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.18897 0.08441 2.239 0.0252 *
## x 0.31466 0.14812 2.124 0.0336 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 1)
##
## Null deviance: 36.062 on 24 degrees of freedom
## Residual deviance: 29.788 on 23 degrees of freedom
## AIC: 100.29
##
## Number of Fisher Scoring iterations: 7