Simon Wood
Mathematical Sciences, University of Bath, U.K.
$y_i \sim N(\mu_i, \sigma^2)$,
where $\mu_i = E(y_i)$.
$y_i \sim \mathrm{EF}(\mu_i, \phi)$
GLMs: scope
[Figure: AIDS in Belgium. New Cases (50 to 250) against Year (1982 to 1992).]
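$y_i \sim \mathrm{Poi}(\mu_i)$, $\log \mu_i = \beta_0 + \beta_1 t_i$ (linear in year)
$y_i \sim \mathrm{Poi}(\mu_i)$, $\log \mu_i = \beta_0 + \beta_1 t_i + \beta_2 t_i^2$ (quadratic in year)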
GLM estimation
Model estimation is by maximum likelihood, via a Newton-type method.
[Figure: horizontal axis exp(0), 10 to 70; vertical axis 0.0 to 0.4.]
IRLS
$\alpha_i = 1 + (y_i - \mu_i)(V_i'/V_i + g_i''/g_i')$ gives Newton's method.
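The iteration can be sketched for the Poisson model with log link, where $V(\mu) = \mu$ and $g'(\mu) = 1/\mu$. In that case $V'/V + g''/g' = 1/\mu - 1/\mu = 0$, so the Newton factor $1 + (y_i - \mu_i)(V_i'/V_i + g_i''/g_i')$ is 1 and Fisher scoring coincides with Newton's method. This is a minimal sketch, not the code used by `glm`. The function name `irls_poisson` is hypothetical, and the counts are assumed to be the 13 annual values (1981 to 1993) from Wood's published version of this example, since they do not appear in the text above.

```r
# Belgian AIDS counts, 1981-1993 (assumed values, not given in the slides)
cases <- c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)
year  <- 1:13
X <- model.matrix(~ year + I(year^2))      # quadratic model in year

# IRLS for a Poisson GLM with log link (hypothetical helper)
irls_poisson <- function(X, y, tol = 1e-10, maxit = 50) {
  mu  <- y + 0.1                           # starting values that avoid log(0)
  eta <- log(mu)
  dev_old <- Inf
  for (it in seq_len(maxit)) {
    z <- eta + (y - mu) / mu               # pseudodata: eta + g'(mu) * (y - mu)
    w <- mu                                # weights: 1 / (V(mu) * g'(mu)^2)
    beta <- lm.wfit(X, z, w)$coefficients  # weighted least squares step
    eta <- drop(X %*% beta)
    mu  <- exp(eta)
    dev <- 2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
    if (abs(dev - dev_old) < tol * (abs(dev) + 0.1)) break
    dev_old <- dev
  }
  list(coefficients = beta, deviance = dev, iterations = it)
}

fit <- irls_poisson(X, cases)
fit$coefficients
```

Each pass is an ordinary weighted least squares fit, so the whole algorithm needs only a linear model solver inside a loop.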
Distribution of $\hat{\beta}$
Deviance
Properties of deviance
If the model exactly fits the data then the deviance is zero.
As a rough approximation,
$D/\phi \sim \chi^2_{n-\dim(\beta)}$
if the model is correct. The approximation can be good in some
cases and is exact for the strictly linear model.
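The zero-deviance property is easy to check numerically for the Poisson case, where $D = 2\sum_i \{y_i \log(y_i/\mu_i) - (y_i - \mu_i)\}$. The sketch below uses a hypothetical helper `poisson_deviance` and made-up counts that are purely illustrative.

```r
# Poisson deviance, taking 0 * log(0) as 0 (hypothetical helper)
poisson_deviance <- function(y, mu) {
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}

y <- c(3, 0, 7, 12, 5)                  # made-up counts, for illustration only

poisson_deviance(y, y)                  # mu = y (exact fit): deviance is 0
poisson_deviance(y, rep(mean(y), 5))    # constant-mean fit: deviance > 0

# The helper agrees with glm() for an intercept-only Poisson model
fit0 <- glm(y ~ 1, family = poisson)
all.equal(poisson_deviance(y, fitted(fit0)), deviance(fit0))
```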
Model Comparison
For GLMs we need to check the assumptions that the data are
independent and have the assumed mean-variance
relationship, and are consistent with the assumed
distribution.
Pearson residuals
Deviance residuals
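Both kinds of residual can be computed by hand and compared with what `residuals` returns. For a Poisson model the Pearson residual is $(y_i - \hat\mu_i)/\sqrt{\hat\mu_i}$ and the deviance residual is $\mathrm{sign}(y_i - \hat\mu_i)\sqrt{d_i}$, where $d_i$ is observation $i$'s contribution to the deviance. This is a sketch; the counts are assumed to be the 13 annual values from Wood's published version of the Belgian AIDS example, which are not given in the text above.

```r
# Belgian AIDS counts, 1981-1993 (assumed values, not given in the slides)
belg.aids <- data.frame(
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240),
  year  = 1:13
)
am1 <- glm(cases ~ year, family = poisson(link = log), data = belg.aids)

y  <- belg.aids$cases
mu <- fitted(am1)

# Pearson residuals: raw residual scaled by sqrt(V(mu)), with V(mu) = mu
rp <- (y - mu) / sqrt(mu)

# Deviance residuals: signed square root of each deviance contribution
d  <- 2 * (y * log(y / mu) - (y - mu))   # valid here because all counts are > 0
rd <- sign(y - mu) * sqrt(pmax(d, 0))

all.equal(unname(rp), unname(residuals(am1, type = "pearson")))
all.equal(unname(rd), unname(residuals(am1, type = "deviance")))
sum(rd^2)                                # the residual deviance of am1
```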
glm in R
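A minimal sketch of the fitting and checking steps for this example. The model formulas and family are those printed in the output on later slides. The data values are an assumption: the 13 annual counts (1981 to 1993) from Wood's published version of the example, with `year` coded 1 to 13.

```r
# Belgian AIDS counts, 1981-1993 (assumed values, not given in the slides)
belg.aids <- data.frame(
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240),
  year  = 1:13
)

# Log-linear trend in year
am1 <- glm(cases ~ year, family = poisson(link = log), data = belg.aids)
par(mfrow = c(2, 2))
plot(am1)                      # the four default diagnostic panels

# Add a quadratic term and check again
am2 <- glm(cases ~ year + I(year^2), family = poisson(link = log),
           data = belg.aids)
par(mfrow = c(2, 2))
plot(am2)

c(am1 = deviance(am1), am2 = deviance(am2))
```

If the assumed counts are right, the two deviances should match the residual deviances reported on the later slides.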
[Figure: default diagnostic plots. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance contours). Axes: Predicted values, Theoretical Quantiles, Leverage, Residuals. Observation 13 is labelled.]
. . . much better.
[Figure: default diagnostic plots. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance contours). Axes: Predicted values, Theoretical Quantiles, Leverage, Residuals. Observation 11 is labelled.]
Now, examine the fitted model, first with the default print method
> am2
Call:
glm(formula=cases~year+I(year^2),
family=poisson(link=log),data=belg.aids)
Coefficients:
(Intercept)         year    I(year^2)
    1.90146      0.55600     -0.02135
summary.glm (edited)
> summary(am2)
...
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.45903  -0.64491   0.08927   0.67117   1.54596

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.901459   0.186877  10.175  < 2e-16 ***
year         0.556003   0.045780  12.145  < 2e-16 ***
I(year^2)   -0.021346   0.002659  -8.029 9.82e-16 ***

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 872.2058  on 12  degrees of freedom
Residual deviance:   9.2402  on 10  degrees of freedom
AIC: 96.924
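The z values and p-values in the coefficient table follow directly from the estimates and standard errors, so they can be recomputed from the printed numbers. The sketch below does this and also compares the residual deviance with its rough $\chi^2_{10}$ reference distribution. Because the inputs are rounded as printed, the last digit may differ slightly from the table.

```r
# Estimates and standard errors copied from the printed summary
est <- c("(Intercept)" = 1.901459, "year" = 0.556003, "I(year^2)" = -0.021346)
se  <- c(0.186877, 0.045780, 0.002659)

z <- est / se                  # Wald z statistics
p <- 2 * pnorm(-abs(z))        # two-sided p-values from the standard normal

round(z, 3)
signif(p, 3)

# Rough goodness-of-fit check: residual deviance 9.2402 on 10 df
pchisq(9.2402, df = 10, lower.tail = FALSE)
```

A residual deviance close to its degrees of freedom gives no evidence of lack of fit, subject to the caveat that the chi-squared approximation for the deviance is only rough.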
Hypothesis testing
Model 1: cases ~ year
Model 2: cases ~ year + I(year^2)
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        11     80.686
2        10      9.240  1   71.446 2.849e-17
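The test in this table can be reproduced by hand from the two residual deviances: the drop in deviance is referred to a chi-squared distribution with degrees of freedom equal to the number of extra parameters. The sketch uses only the printed numbers. Output of this form is what `anova(am1, am2, test = "Chisq")` produces, although the call itself is not shown in the text above.

```r
# Residual deviances and degrees of freedom copied from the printed table
dev1 <- 80.686; df1 <- 11      # cases ~ year
dev2 <- 9.240;  df2 <- 10      # cases ~ year + I(year^2)

stat <- dev1 - dev2            # drop in deviance from adding the quadratic term
p <- pchisq(stat, df = df1 - df2, lower.tail = FALSE)

stat
p
```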
AIC comparison
> AIC(am1,am2,am3)
    df       AIC
am1  2 166.36982
am2  3  96.92358
am3  4  98.69148
So, both hypothesis testing and AIC agree that the quadratic
model, am2, is the most appropriate.
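AIC is $-2\,l(\hat\beta) + 2\,\dim(\beta)$, so the table can be checked against the maximized log likelihood. The sketch below makes two assumptions. The data are the 13 annual counts from Wood's published version of the example. The definition of `am3` is not shown in the text above and is taken here to be the cubic model, which matches its 4 degrees of freedom. The helper `aic_by_hand` is hypothetical.

```r
# Belgian AIDS counts, 1981-1993 (assumed values, not given in the slides)
belg.aids <- data.frame(
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240),
  year  = 1:13
)
am1 <- glm(cases ~ year, family = poisson(link = log), data = belg.aids)
am2 <- glm(cases ~ year + I(year^2), family = poisson(link = log),
           data = belg.aids)
# Assumed form of am3: a cubic in year
am3 <- glm(cases ~ year + I(year^2) + I(year^3), family = poisson(link = log),
           data = belg.aids)

# AIC = -2 * log likelihood + 2 * number of coefficients (hypothetical helper)
aic_by_hand <- function(m) -2 * as.numeric(logLik(m)) + 2 * length(coef(m))

sapply(list(am1 = am1, am2 = am2, am3 = am3), aic_by_hand)
AIC(am1, am2, am3)
```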
predict
Now we can plot the data, fitted curve, and standard error
bands:
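year <- seq(1,13,length=100)                  # prediction grid (assumed; not shown in the extracted text)
fv <- predict(am2,newdata=data.frame(year=year),se=TRUE)  # link-scale fit and s.e. (assumed call)
plot(belg.aids$year+1980,belg.aids$cases)     # data
lines(year+1980,exp(fv$fit),col=2)            # fit
lines(year+1980,exp(fv$fit+2*fv$se),col=3)    # upper c.l.
lines(year+1980,exp(fv$fit-2*fv$se),col=3)    # lower c.l.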
[Figure: data, fitted curve, and upper and lower confidence limits; cases (50 to 250) against year (1982 to 1992).]
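The plotting code builds the bands on the scale of the linear predictor and then exponentiates, which keeps both limits positive. The sketch below shows that the exponentiated link-scale fit matches `predict(..., type = "response")`. The prediction grid and the data values are assumptions: the counts are the 13 annual values from Wood's published version of the example. It uses the full component name `fv$se.fit`, which the shorter `fv$se` reaches by partial matching.

```r
# Belgian AIDS counts, 1981-1993 (assumed values, not given in the slides)
belg.aids <- data.frame(
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240),
  year  = 1:13
)
am2 <- glm(cases ~ year + I(year^2), family = poisson(link = log),
           data = belg.aids)

nd <- data.frame(year = seq(1, 13, length = 100))    # assumed prediction grid

fv <- predict(am2, newdata = nd, se.fit = TRUE)      # link (log) scale
mu <- predict(am2, newdata = nd, type = "response")  # response scale

all.equal(unname(exp(fv$fit)), unname(mu))           # same fitted curve

# Approximate limits: +/- 2 s.e. on the log scale, then back-transformed
lower <- exp(fv$fit - 2 * fv$se.fit)
upper <- exp(fv$fit + 2 * fv$se.fit)
range(lower)
```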