
Generalized Linear Models

Generalized linear models: Exponential family



In generalized linear models the response is assumed to follow a
probability distribution from the exponential family, with density of the form:




$$
f(y \mid \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}
$$

a(\phi): dispersion
b(\theta): moments
\theta: link function

Generalized linear models: Exponential family


Normal distribution has the form:









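$$
\begin{aligned}
f(y \mid \mu, \sigma^2)
 &= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}
  = e^{-\frac{(y-\mu)^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)} \\
 &= e^{-\frac{y^2 - 2y\mu + \mu^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)}
  = e^{\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)} \\
 &= e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}
\end{aligned}
$$

$$
\theta = \mu, \quad b(\theta) = \frac{\theta^2}{2}, \quad a(\phi) = \sigma^2, \quad
c(y,\phi) = -\left( \frac{y^2}{2\sigma^2} + \frac{1}{2}\ln(2\pi\sigma^2) \right)
$$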
Generalized linear models: Exponential family
Poisson Distribution has the form:









$$
f(y \mid \mu) = \frac{e^{-\mu}\mu^{y}}{y!}
 = e^{\,y\ln(\mu) - \mu - \ln(y!)}
 = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}
$$

$$
\theta = \ln(\mu), \quad b(\theta) = \mu = e^{\theta}, \quad a(\phi) = 1, \quad c(y,\phi) = -\ln(y!)
$$
Generalized linear models: Exponential family
Binomial Distribution has the form:









$$
\begin{aligned}
f(y \mid p) &= \binom{n}{y} p^{y} (1-p)^{n-y}
 = e^{\,y\ln(p) + (n-y)\ln(1-p) + \ln\binom{n}{y}} \\
 &= e^{\,y\ln\left(\frac{p}{1-p}\right) - \left(-n\ln(1-p)\right) + \ln\binom{n}{y}}
 = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}
\end{aligned}
$$

$$
\theta = \ln\left(\frac{p}{1-p}\right), \quad b(\theta) = -n\ln(1-p), \quad a(\phi) = 1, \quad
c(y,\phi) = \ln\binom{n}{y}
$$
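The three identifications above (Normal, Poisson, Binomial) can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names and the test values are illustrative assumptions, and each family is evaluated once through the exponential-family form and once through its textbook density.

```python
import math

def expfam_density(y, theta, b, a_phi, c):
    """Evaluate exp{(y*theta - b(theta)) / a(phi) + c(y, phi)}."""
    return math.exp((y * theta - b(theta)) / a_phi + c(y))

def normal_via_expfam(y, mu, sigma2):
    # theta = mu, b(theta) = theta^2 / 2, a(phi) = sigma^2
    c = lambda v: -(v * v / (2 * sigma2) + 0.5 * math.log(2 * math.pi * sigma2))
    return expfam_density(y, mu, lambda t: t * t / 2, sigma2, c)

def poisson_via_expfam(y, mu):
    # theta = ln(mu), b(theta) = exp(theta), a(phi) = 1, c = -ln(y!)
    c = lambda v: -math.lgamma(v + 1)
    return expfam_density(y, math.log(mu), math.exp, 1.0, c)

def binomial_via_expfam(y, n, p):
    # theta = ln(p / (1 - p)), b(theta) = n ln(1 + exp(theta)) = -n ln(1 - p)
    c = lambda v: math.log(math.comb(n, v))
    b = lambda t: n * math.log(1 + math.exp(t))
    return expfam_density(y, math.log(p / (1 - p)), b, 1.0, c)

# Textbook densities at arbitrary (made-up) evaluation points
normal_direct = math.exp(-(1.2 - 0.5) ** 2 / (2 * 2.0)) / math.sqrt(2 * math.pi * 2.0)
poisson_direct = math.exp(-3.5) * 3.5 ** 4 / math.factorial(4)
binomial_direct = math.comb(10, 3) * 0.3 ** 3 * 0.7 ** 7

print(normal_via_expfam(1.2, 0.5, 2.0), normal_direct)
print(poisson_via_expfam(4, 3.5), poisson_direct)
print(binomial_via_expfam(3, 10, 0.3), binomial_direct)
```

Each printed pair should agree up to floating-point rounding, which is what the algebra on the slides implies.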
Generalized linear models: Exponential family

In generalized linear models the response is assumed to follow a
probability distribution from the exponential family, with density of the form:




$$
f(y \mid \theta, \phi) = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}
$$

$$
E(Y) = b'(\theta), \quad \mathrm{Var}(Y) = b''(\theta)\, a(\phi), \quad g(\mu) = \theta
$$
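The relations E(Y) = b'(θ) and Var(Y) = b''(θ) a(φ) can be illustrated by differentiating b numerically. This is a rough sketch under stated assumptions: the function name, the step size `h`, and the parameter values are all illustrative choices, and central finite differences stand in for the exact derivatives.

```python
import math

def mean_and_variance(b, theta, a_phi=1.0, h=1e-4):
    """Approximate b'(theta) and b''(theta) * a(phi) by central differences."""
    b1 = (b(theta + h) - b(theta - h)) / (2 * h)
    b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / (h * h)
    return b1, b2 * a_phi

# Poisson: b(theta) = exp(theta), theta = ln(mu); expect mean = variance = mu
mu = 3.5
pois_mean, pois_var = mean_and_variance(math.exp, math.log(mu))

# Binomial: b(theta) = n ln(1 + exp(theta)); expect mean n*p, variance n*p*(1-p)
n, p = 10, 0.3
binom_mean, binom_var = mean_and_variance(
    lambda t: n * math.log(1 + math.exp(t)), math.log(p / (1 - p))
)

print(pois_mean, pois_var)
print(binom_mean, binom_var)
```

The finite-difference values are only approximate; they should land close to μ for the Poisson case and close to np and np(1-p) for the Binomial case.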
Generalized linear models: Exponential family
Normal distribution has the form:









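$$
f(y \mid \mu, \sigma^2)
 = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}
 = e^{\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)}
 = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}
$$

$$
\theta = \mu, \quad b(\theta) = \frac{\theta^2}{2}, \quad a(\phi) = \sigma^2, \quad
c(y,\phi) = -\left( \frac{y^2}{2\sigma^2} + \frac{1}{2}\ln(2\pi\sigma^2) \right)
$$

$$
E(Y) = b'(\theta) = \theta = \mu, \quad
\mathrm{Var}(Y) = b''(\theta)\, a(\phi) = \sigma^2, \quad
g(\mu) = \mu
$$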

Generalized linear models: Exponential family


Poisson Distribution has the form:









$$
f(y \mid \mu) = \frac{e^{-\mu}\mu^{y}}{y!}
 = e^{\,y\ln(\mu) - \mu - \ln(y!)}
 = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}
$$

$$
\theta = \ln(\mu), \quad b(\theta) = \exp(\theta), \quad a(\phi) = 1, \quad c(y,\phi) = -\ln(y!)
$$

$$
E(Y) = b'(\theta) = \exp(\theta) = \mu, \quad
\mathrm{Var}(Y) = b''(\theta)\, a(\phi) = \exp(\theta) = \mu, \quad
g(\mu) = \ln(\mu)
$$
Generalized linear models: Exponential family
Binomial Distribution has the form:









$$
f(y \mid p) = \binom{n}{y} p^{y} (1-p)^{n-y}
 = e^{\,y\ln\left(\frac{p}{1-p}\right) - \left(-n\ln(1-p)\right) + \ln\binom{n}{y}}
 = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}
$$

$$
\theta = \ln\left(\frac{p}{1-p}\right), \quad
b(\theta) = -n\ln(1-p) = n\ln\left(1 + \exp(\theta)\right), \quad
a(\phi) = 1, \quad c(y,\phi) = \ln\binom{n}{y}
$$

$$
E(Y) = b'(\theta) = np, \quad
\mathrm{Var}(Y) = b''(\theta)\, a(\phi) = np(1-p), \quad
g(p) = \ln\left(\frac{p}{1-p}\right)
$$
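As a worked step for the Binomial case, the derivatives of b(θ) = n ln(1 + e^θ) can be written out explicitly. This derivation is added for illustration and uses only the identity e^θ = p/(1-p), which follows from the definition of θ:

$$
b'(\theta) = \frac{n e^{\theta}}{1 + e^{\theta}} = n \cdot \frac{p/(1-p)}{1/(1-p)} = np
$$

$$
b''(\theta) = \frac{n e^{\theta}}{(1 + e^{\theta})^{2}}
 = n \cdot \frac{e^{\theta}}{1 + e^{\theta}} \cdot \frac{1}{1 + e^{\theta}}
 = n\,p\,(1-p)
$$

With a(φ) = 1 this gives E(Y) = np and Var(Y) = np(1-p).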
Probability distributions





Normal: (2)



Inverse Gaussian: (3)


Gamma: (4)

Probability distributions





Negative Binomial: (5)

Poisson: (6)

Binomial: (6)

Generalized Linear Models (GLM)
General class of linear models made up of three components: random, systematic, and link function.
Random component: identifies the dependent variable (Y) and its probability distribution.
Systematic component: identifies the set of explanatory variables (X_1, ..., X_k).
Link function: identifies a function of the mean that is a linear function of the explanatory variables:

$$
g(\mu) = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k
$$
Generalised linear model
If the distribution of the observations belongs to the exponential family and
some function of the expected value of the observations is a linear function of
the parameters, then a generalised linear model is used:

$$
\big( g(E(y_1)), \ldots, g(E(y_n)) \big) = \big( g(b'(\theta_1)), \ldots, g(b'(\theta_n)) \big) = X\beta
$$

The function g is called the link function. Here is a list of popular distributions
and their corresponding link functions:
binomial - logit = ln(p/(1-p))
normal - identity
Gamma - inverse
Poisson - log

The most natural choice is \theta = X\beta. Optimization for this kind of function is
done iteratively.
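The link functions listed above, together with their inverses, can be sketched as follows. The dictionary name, the helper `linear_predictor`, and the coefficient values are hypothetical; they only illustrate how a linear predictor is mapped back to a mean through the inverse link.

```python
import math

# (link g, inverse link g^-1) for each distribution in the list above
LINKS = {
    "binomial": (lambda p: math.log(p / (1 - p)), lambda eta: 1 / (1 + math.exp(-eta))),
    "normal": (lambda mu: mu, lambda eta: eta),
    "gamma": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "poisson": (math.log, math.exp),
}

def linear_predictor(alpha, betas, xs):
    """alpha + beta_1 * x_1 + ... + beta_k * x_k for one observation."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Made-up coefficients and covariates, purely for illustration
eta = linear_predictor(0.2, [0.5, -0.1], [1.0, 3.0])
g, g_inv = LINKS["poisson"]
mu = g_inv(eta)          # fitted mean under a log link
print(eta, mu, g(mu))    # g(mu) recovers the linear predictor
```

Applying g to the fitted mean returns the linear predictor, which is the defining property g(μ) = Xβ.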
Likelihood function
$$
f(\theta \mid X) = \frac{f(X \mid \theta)\, p(\theta)}{p(X)}
$$
Likelihood function
$$
L(x_1, x_2, \ldots, x_n; p) = \prod_{i=1}^{n} f_X(x_i; p)
$$

$$
\ln L(x_1, x_2, \ldots, x_n; p) = \ln\left( \prod_{i=1}^{n} f_X(x_i; p) \right) = \sum_{i=1}^{n} \ln f_X(x_i; p)
$$

Likelihood function Poisson distribution

$$
L = \prod f(y \mid \mu) = \prod \frac{e^{-\mu}\mu^{y}}{y!} = \prod e^{\,y\log(\mu) - \mu - \log(y!)}
$$

$$
\ln L = \sum \left[\, y\log(\mu) - \mu - \log(y!) \,\right]
$$
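The Poisson log-likelihood above is simple to evaluate directly. The sketch below uses made-up counts (not the grasshopper data) and a coarse grid search, an illustrative choice rather than a recommended optimizer, to show that the log-likelihood peaks at the sample mean.

```python
import math

def poisson_loglik(mu, ys):
    """Sum of y*log(mu) - mu - log(y!) over the sample."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)

counts = [2, 0, 3, 1, 4]                 # made-up sample
sample_mean = sum(counts) / len(counts)

# Evaluate the log-likelihood on a grid of candidate means from 0.5 to 4.49
grid = [0.5 + 0.01 * k for k in range(400)]
best = max(grid, key=lambda m: poisson_loglik(m, counts))

print(sample_mean, best)
```

The grid maximizer should coincide with the sample mean up to the grid spacing.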

Newton Raphson algorithm
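The content of this slide did not survive extraction. As a hedged illustration only, and not a reconstruction of the original slide, here is a minimal Newton-Raphson sketch for a Poisson regression with a log link, one covariate, and an intercept. The function name, the data, and the starting values are assumptions; the update is β ← β + I(β)^{-1} U(β), with U the score vector and I the negative Hessian of the log-likelihood.

```python
import math

def newton_poisson(xs, ys, iters=25, tol=1e-10):
    """Newton-Raphson for ln(mu_i) = b0 + b1 * x_i; returns (b0, b1)."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0   # start at the intercept-only fit
    for _ in range(iters):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: gradient of the log-likelihood
        u0 = sum(y - m for y, m in zip(ys, mus))
        u1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Negative Hessian: a symmetric 2x2 matrix [[i00, i01], [i01, i11]]
        i00 = sum(mus)
        i01 = sum(m * x for x, m in zip(xs, mus))
        i11 = sum(m * x * x for x, m in zip(xs, mus))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system explicitly for the Newton step
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 2, 4, 7, 12]                    # made-up counts
b0, b1 = newton_poisson(xs, ys)
print(b0, b1)
```

At convergence the score vector is zero, so the fitted means reproduce the total count and the x-weighted total count.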


Likelihood ratio test
Let us assume that we have a sample of size n, x = (x_1, ..., x_n), and we want to
estimate a parameter vector \theta = (\theta_1, \theta_2). Both \theta_1 and \theta_2 can also be
vectors. We want to test the null hypothesis against the alternative:

$$
H_0: \theta_1 = \theta_{10} \quad \text{against} \quad H_1: \theta_1 \neq \theta_{10}
$$

Let us assume that the likelihood function is L(x | \theta). Then the likelihood ratio test works as
follows: 1) maximise the likelihood function under the null hypothesis (i.e. fix the
parameter(s) \theta_1 equal to \theta_{10}) and find the value of the likelihood at the maximum; 2) maximise
the likelihood under the alternative hypothesis (i.e. unconstrained maximisation) and find the
value of the likelihood at the maximum; then find the ratio:

$$
w = \frac{L(x \mid \theta_{10}, \tilde{\theta}_2)}{L(x \mid \hat{\theta}_1, \hat{\theta}_2)}
$$

where \tilde{\theta}_2 is the value of the parameter after constrained (\theta_1 = \theta_{10}) maximisation, and
\hat{\theta}_1, \hat{\theta}_2 are the values of both parameters after unconstrained maximisation.

w is the likelihood ratio statistic. Tests carried out using this statistic are called likelihood
ratio tests. In this case it is clear that:

$$
0 \le w \le 1
$$

If the value of w is small then the null hypothesis is rejected. If g(w) is the density of the
distribution of w, then the critical region can be calculated using:

$$
\int_{0}^{c_\alpha} g(w)\, dw = \alpha
$$
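A minimal sketch of the procedure in the simplest setting, a single Poisson mean with no nuisance parameter, is given below. The data, the null value, and the function names are illustrative assumptions. Instead of the exact density g(w), the sketch uses the standard large-sample result that -2 ln w is approximately chi-squared with 1 degree of freedom, whose upper tail equals erfc(sqrt(x/2)).

```python
import math

def poisson_loglik(mu, ys):
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)

def likelihood_ratio_test(ys, mu0):
    """Return (w, -2 ln w, approximate p-value) for H0: mu = mu0."""
    mu_hat = sum(ys) / len(ys)                 # unconstrained maximum
    log_w = poisson_loglik(mu0, ys) - poisson_loglik(mu_hat, ys)
    stat = max(-2 * log_w, 0.0)                # guard against rounding below zero
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-squared tail with 1 d.f.
    return math.exp(log_w), stat, p_value

counts = [2, 0, 3, 1, 4]                       # made-up sample
w, stat, p_value = likelihood_ratio_test(counts, mu0=1.0)
print(w, stat, p_value)
```

Small values of w correspond to large values of -2 ln w and small p-values, matching the rejection rule stated above.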
Deviances
In the linear model, we maximise the likelihood under the full model and under the null
hypothesis. The ratio of the values of the likelihood function under the two
hypotheses (null and alternative) is related to the F-distribution. The interpretation is
how much the variance would increase if we removed part of the model (the null
hypothesis).
In logistic and log-linear models, the likelihood function is again maximised under the null
and alternative hypotheses. Then twice the logarithm of the ratio of the
values of the likelihood under these two hypotheses (called the deviance) asymptotically has a
chi-squared distribution:

$$
\chi^2 \approx 2\left[ \log l(y \mid y) - \log l(y \mid \hat{\mu}) \right]
$$

That is, twice the difference between the maximum achievable log-likelihood and the value of the
log-likelihood at the estimated parameters.
That is the reason why in log-linear and logistic regressions it is usual to talk about
deviances and chi-squared statistics instead of variances and F-statistics.
Analysis based on log-linear and logistic models (and generalised
linear models in general) is usually called analysis of deviance. The reason for this is that
chi-squared is related to the deviation between the fitted model and the observations.
Another test is based on Pearson's chi-squared statistic. It asymptotically approaches a
chi-squared distribution with n-p degrees of freedom (n is the number of observations and p is
the number of parameters):

$$
X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\mathrm{Var}(y_i)}
$$
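For the Poisson case both statistics have simple closed forms, sketched below with made-up counts and an intercept-only fit. The function names are illustrative. The sketch assumes the Poisson variance function Var(y_i) = μ_i and treats the term y ln(y/μ) as zero when y = 0.

```python
import math

def poisson_deviance(ys, mus):
    """2 * [log-likelihood of saturated model - log-likelihood of fitted model]."""
    total = 0.0
    for y, m in zip(ys, mus):
        log_term = y * math.log(y / m) if y > 0 else 0.0
        total += log_term - (y - m)
    return 2 * total

def pearson_chi2(ys, mus):
    """Sum of squared residuals scaled by the Poisson variance (equal to the mean)."""
    return sum((y - m) ** 2 / m for y, m in zip(ys, mus))

ys = [2, 0, 3, 1, 4]                     # made-up counts
mus = [sum(ys) / len(ys)] * len(ys)      # intercept-only fitted means

deviance = poisson_deviance(ys, mus)
pearson = pearson_chi2(ys, mus)
print(deviance, pearson)
```

Both values would be compared against a chi-squared distribution with n-p degrees of freedom (here 5 - 1 = 4), bearing in mind that the approximation is asymptotic.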

Goodness of Fit: Deviance
Deviance = -2[L_M - L_S]
where L_M is the maximum log likelihood of the model of interest and
L_S is the maximum log likelihood for the most complex model, which has a
separate parameter at each explanatory setting (saturated model).
Deviance has approximately a chi-square distribution with df = N-p,
where N = number of observations and p = number of parameters (including
intercept).
Likelihood ratio test for model comparison between M_1 and M_0 (M_0 is a
simpler model than M_1):

Likelihood ratio = -2[L_0 - L_1] = -2[L_0 - L_S] - {-2[L_1 - L_S]} = Deviance_0 - Deviance_1
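The identity Likelihood ratio = Deviance_0 - Deviance_1 can be verified numerically. The sketch below is illustrative only: it uses made-up Poisson counts in two groups, with M_0 fitting a common mean and M_1 fitting one mean per group. All counts are positive so that the saturated log-likelihood can be evaluated directly.

```python
import math

def poisson_loglik(ys, mus):
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(ys, mus))

def deviance(ys, mus):
    """-2 [L_M - L_S]; the saturated model sets each fitted mean equal to y."""
    return -2 * (poisson_loglik(ys, mus) - poisson_loglik(ys, ys))

group_a = [2, 1, 3]                      # made-up counts
group_b = [6, 4, 5]
ys = group_a + group_b

mus0 = [sum(ys) / len(ys)] * len(ys)                       # M_0: common mean
mus1 = [sum(group_a) / 3] * 3 + [sum(group_b) / 3] * 3     # M_1: group means

lr = -2 * (poisson_loglik(ys, mus0) - poisson_loglik(ys, mus1))
dev0, dev1 = deviance(ys, mus0), deviance(ys, mus1)
print(lr, dev0 - dev1)
```

The saturated log-likelihood L_S cancels in the difference, which is why the two printed numbers agree.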

Model fit metrics
Covariance matrix for parameters: computed from the 2nd partial derivatives of the log
likelihood function.
Likelihood ratio test: ratio of max square log likelihood to square log
likelihood of null hypothesis.
Deviance: measure of how overdetermined the system is. Compare the max of the full system
to the max of the saturated model (number of parameters equals number of data points).
Range of plausible models
Likelihood ratio
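$$
\lambda = \frac{f(y; \theta_0, \sigma^2)}{f(y; \theta_b, \sigma^2)}
$$

With \theta_0 the specified model and \theta_b the best model.
Ratio of likelihood of any model to likelihood of best model. For a Normal observation,
with the best model reproducing the observation (\theta_b = y):

$$
\lambda
 = \frac{\exp\left[-\frac{1}{2}\left(\frac{y - \theta_0}{\sigma}\right)^2\right]}
        {\exp\left[-\frac{1}{2}\left(\frac{y - \theta_b}{\sigma}\right)^2\right]}
 = \exp\left[-\frac{1}{2}\left(\frac{y - \theta_0}{\sigma}\right)^2\right]
 = \exp\left[-\frac{1}{2} z^2\right]
$$

Log-likelihood ratio: \ln\lambda = -\frac{z^2}{2}, so z^2 = -2\ln\lambda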
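The relation z² = -2 ln λ can be checked for a single Normal observation. This is a sketch under stated assumptions: the parameter symbol θ, the numerical values, and the choice θ_b = y (the best model reproduces the observation exactly) are illustrative and are not confirmed by the surviving slide text.

```python
import math

def normal_pdf(y, theta, sigma):
    return math.exp(-0.5 * ((y - theta) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(y, theta0, sigma):
    """Likelihood of the specified model over likelihood of the best model."""
    theta_best = y                       # assumption: best model fits y exactly
    return normal_pdf(y, theta0, sigma) / normal_pdf(y, theta_best, sigma)

y, theta0, sigma = 3.0, 1.0, 2.0         # made-up values
z = (y - theta0) / sigma
lam = likelihood_ratio(y, theta0, sigma)
print(z ** 2, -2 * math.log(lam))
```

The normalising constants cancel in the ratio, leaving only the exponent, so the two printed values agree.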
Example
Site Longitude Latitude Alt Sl Te Pp V Ndvi Soil Lc S M B P
Nicolás B. -104.7 24.38 1920 2 17 450 5 90 7 8 9 3 79 80
Librado R. -104.26 24.4 2005 2 17 450 7 84 9 8 24 28 31 32
La Ermita -104.33 23.89 2169 10 17 550 6 109 9 11 47 6 13 14
Madero -104.29 24.27 1942 2 17 450 3 74 9 8 16 85 110 111
Castillo N. -104.49 24.34 1923 2 17 450 7 78 7 8 20 58 33 34
Km 188 -104.61 25.38 1501 3 21 350 4 83 2 10 29 28 27 28
Km 130 -104.51 24.99 1733 4 19 450 4 85 3 9 22 20 20 21
Las Huertas -104.29 24.27 1930 2 17 450 3 75 9 8 15 13 20 21
18 de Agosto -104.15 23.95 1866 1 17 450 7 81 9 12 17 10 20 21
El Venado -104.28 23.87 1747 4 17 450 6 83 4 8 20 10 20 21
Km 23 -104.46 24.51 2126 4 15 550 7 83 3 11 16 8 14 15
Km 73 -104.32 25.13 1284 5 21 350 4 78 9 8 1 0 1 2
Rodríguez -104.09 24.32 2064 4 17 550 6 78 2 11 15 0 7 8
Berros - Tuitan -104.27 23.97 1855 4 17 450 6 84 9 9 15 0 0 1
27 de Noviembre -104.49 24.22 1862 2 17 450 5 91 9 8 17 0 31 32
Km 86 -104.64 24.7 1954 6 17 450 3 89 4 8 8 4 10 11
Morcillo -104.7 24.11 1971 3 17 550 8 88 9 10 2 0 5 6
Km 43 -104.47 24.65 1908 3 17 450 7 75 3 9 12 0 3 4
Zarco - Cieneguilla -104.04 24.1 2143 5 15 550 7 82 2 9 10 0 9 10
Berros - Saltito -104.28 23.94 1858 15 17 450 6 83 9 8 6 0 1 2
Km 36 -104.7 24.27 1909 2 17 450 3 86 9 9 0 0 0 1
Zaragoza -104.16 23.87 1856 1 17 450 7 76 9 8 11 15 21 22
Entrada Guadiana -104.34 23.95 1867 8 17 550 6 83 9 11 1 0 2 3
Carlos R. -104.44 24.27 1867 1 17 450 5 88 7 8 15 0 5 6
Km 153 -104.53 25.12 1360 2 21 350 7 81 9 9 8 6 0 1
Km 51 -104.16 25.21 1416 24 21 250 7 79 4 9 1 0 0 1
Km 29 -104.16 25.36 1396 7 21 250 7 71 4 10 0 0 0 1
Km 237 -104.75 25.76 1905 2 17 450 6 79 9 11 5 1 0 1
Km 260 -104.89 25.96 1940 2 17 450 6 82 2 9 0 0 0 1
Km 84 -104.42 25.82 1770 3 19 350 7 75 9 11 0 0 0 1
Francisco Z. -104.07 24.22 2166 3 15 550 6 74 2 8 2 0 0 1
Km 61 -104.29 25.55 1651 7 21 250 7 77 9 9 0 0 0 1
Km 104 -104.58 25.79 1942 1 17 450 6 77 9 10 3 0 1 2
Km 76 -104.28 25.67 1817 5 19 250 4 77 4 10 1 0 0 1

Variable distribution
[Figure: histograms for M, B and P. x-axis: Grasshopper count (-20 to 120); y-axis: Frequency (0 to 20).]
Correlation
Species          Longitude  Latitude  Altitude  Slope  Temperature  Precipitation  Vegetation  Ndvi    Soil   Landcover
M. lakinus       0.09       -0.42^a   0.28      -0.10  -0.19        0.34^a         -0.09       0.57^a  -0.05  0.03
B. nubilum       0.03       -0.16     0.07      -0.21  -0.05        0.04           -0.24       -0.16   0.10   -0.26
P. nebrascensis  -0.04      -0.28     0.17      -0.25  -0.17        0.14           -0.33       0.07    0.13   -0.31
Multicollinearity
               Longitude  Latitude  Altitude  Slope  Temperature  Precipitation  Vegetation  Ndvi     Soil   Landcover
Longitude      1.00       -0.37^a   0.00      0.32   -0.01        -0.06          0.22        -0.30    0.09   -0.01
Latitude       -0.37^a    1.00      -0.45^a   0.02   0.57^a       -0.66^a        -0.02       -0.37^a  -0.18  0.23
Altitude       0.00       -0.45^a   1.00      -0.28  -0.92^a      0.79^a         0.04        0.30     -0.08  0.09
Slope          0.32       0.02      -0.28     1.00   0.32         -0.30          0.16        0.11     0.12   0.03
Temperature    -0.01      0.57      -0.92^a   0.32   1.00         -0.85^a        -0.04       -0.22    0.05   0.04
Precipitation  -0.06      -0.64     0.79^a    -0.30  -0.85^a      1.00           0.09        0.39^a   -0.03  0.08
Vegetation     0.22       -0.02     0.04      0.15   -0.04        0.06           1.00        -0.10    0.04   0.32
Ndvi           -0.30      -0.37^a   0.30      0.11   -0.22        0.39^a         -0.10       1.00     0.16   0.06
Soil           0.09       -0.18     -0.08     0.12   0.05         -0.03          0.04        0.16     1.00   -0.02
Landcover      -0.01      0.23      0.09      0.04   0.05         0.08           0.32        0.06     -0.02  1.00
Deviance
Species          Model  Link function  Deviance Value  d.f.  Value/df
M. lakinus       Gamma  Log            2.244           7     0.335
B. nubilum       Gamma  Log            11.211          9     1.080
P. nebrascensis  Gamma  Log            2.835           7     0.715
Lower and Upper give the 95% Wald Confidence Interval; Wald Chi-Square, df and Sig. give the Hypothesis Test.

Parameter               B        Std. Error  Lower    Upper    Wald Chi-Square  df  Sig.
(Intercept)             835.919  62.1403     714.126  957.712  180.960          1   .000
[Temperature=15.00]     -2.627   .5287       -3.663   -1.591   24.692           1   .000
[Temperature=17.00]     -2.889   .6660       -4.195   -1.584   18.822           1   .000
[Temperature=19.00]     -5.630   .5807       -6.768   -4.491   93.982           1   .000
[Temperature=21.00]     0^a      .           .        .        .                .   .
[Precipitation=250.00]  -4.156   .3781       -4.897   -3.415   120.788          1   .000
[Precipitation=350.00]  0^a      .           .        .        .                .   .
[Precipitation=450.00]  2.332    .4734       1.404    3.260    24.268           1   .000
[Precipitation=550.00]  0^a      .           .        .        .                .   .
[Vegetation=3.00]       -3.481   .5117       -4.484   -2.478   46.261           1   .000
[Vegetation=4.00]       -2.383   .8388       -4.027   -.739    8.072            1   .004
[Vegetation=5.00]       -3.402   .5694       -4.518   -2.286   35.696           1   .000
[Vegetation=6.00]       -4.161   .5299       -5.199   -3.122   61.647           1   .000
[Vegetation=7.00]       -4.288   .5890       -5.442   -3.133   52.991           1   .000
[Vegetation=8.00]       0^a      .           .        .        .                .   .
[Soil=2.00]             -.103    .3156       -.721    .516     .106             1   .745
[Soil=3.00]             1.911    .2833       1.356    2.467    45.522           1   .000
[Soil=4.00]             .488     .1837       .128     .848     7.052            1   .008
[Soil=7.00]             .533     .2793       -.015    1.080    3.638            1   .056
[Soil=9.00]             0^a      .           .        .        .                .   .
[Landcover=8.00]        .436     .2687       -.090    .963     2.636            1   .104
[Landcover=9.00]        .369     .2951       -.210    .947     1.561            1   .212
[Landcover=10.00]       -.118    .3949       -.892    .656     .089             1   .765
[Landcover=11.00]       1.666    .4357       .812     2.520    14.623           1   .000
[Landcover=12.00]       0^a      .           .        .        .                .   .
Longitude               8.360    .6374       7.111    9.610    172.047          1   .000
Latitude                1.366    .2438       .889     1.844    31.421           1   .000
Slope                   -.039    .0147       -.067    -.010    6.859            1   .009
Ndvi                    .122     .0113       .100     .144     117.147          1   .000
(Scale)                 .048^b   .0122       .029     .079

Dependent Variable: M1
Model: (Intercept), Precipitation, Temperature, Vegetation, Soil, Landcover, Longitude, Latitude, Slope, Ndvi
a. Set to zero because this parameter is redundant.
b. Maximum likelihood estimate.
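The interval and test columns in the table can be approximately reproduced from B and Std. Error alone. The sketch below assumes the usual Wald construction (B ± 1.96 × SE for the interval, (B/SE)² for the chi-square with 1 degree of freedom); this construction is an assumption here, since the slides do not state it. The function name is illustrative, and small differences from the table are expected because the printed B and SE values are rounded.

```python
import math

def wald_summary(b, se, z=1.959964):
    """Return (lower, upper, Wald chi-square, p-value) for one coefficient."""
    chi2 = (b / se) ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-squared tail with 1 d.f.
    return b - z * se, b + z * se, chi2, p_value

# (B, Std. Error) pairs copied from the table above
rows = {
    "(Intercept)": (835.919, 62.1403),
    "Longitude": (8.360, 0.6374),
    "[Soil=7.00]": (0.533, 0.2793),
}

for name, (b, se) in rows.items():
    lower, upper, chi2, p_value = wald_summary(b, se)
    print(f"{name:12s} {lower:9.3f} {upper:9.3f} {chi2:9.3f} {p_value:6.3f}")
```

For the intercept this gives an interval near (714.126, 957.712) and a chi-square near 180.96, in line with the tabulated values.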

[Figure: Residual versus Fit plot]