in logistic regression
Claudia Czado
TU München
• Parameter estimation
• Regression diagnostics
scores:
$$ s_j(\beta) := \frac{\partial l}{\partial \beta_j} = \sum_{i=1}^n \left[ y_i x_{ij} - n_i \frac{e^{x_i^t \beta}}{1 + e^{x_i^t \beta}}\, x_{ij} \right] = \sum_{i=1}^n x_{ij} \Big( y_i - \underbrace{n_i \frac{e^{x_i^t \beta}}{1 + e^{x_i^t \beta}}}_{E(Y_i \mid X_i = x_i)} \Big), \quad j = 1, \dots, p $$
$$ \frac{\partial^2 l}{\partial \beta_r \, \partial \beta_s} = - \sum_{i=1}^n n_i \frac{e^{x_i^t \beta}}{(1 + e^{x_i^t \beta})^2}\, x_{is} x_{ir} = - \sum_{i=1}^n n_i\, p(x_i)(1 - p(x_i))\, x_{is} x_{ir} $$

$$ \Rightarrow \quad H(\beta) = \left[ \frac{\partial^2 l}{\partial \beta_r \, \partial \beta_s} \right]_{r,s = 1, \dots, p} = -X^t D X \in \mathbb{R}^{p \times p} $$
$$ \Rightarrow \; \|D^{1/2} X Z\|^2 = 0 \;\Leftrightarrow\; D^{1/2} X Z = 0 \;\Leftrightarrow\; X^t D^{1/2} D^{1/2} X Z = 0 $$
$$ \overset{D^{1/2}X \text{ full rank}}{\Leftrightarrow}\; Z = (X^t D X)^{-1}\, 0 \;\Leftrightarrow\; Z = 0 \qquad \text{q.e.d.} $$
⇒ There is at most one solution to the score equations, i.e. if the MLE of β exists, it is unique and solves the score equations.
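Since the Hessian is negative definite wherever D has positive entries, the score equations can be solved by Newton–Raphson. A minimal sketch in Python/NumPy (illustrative only; the slides' own computations come from S-Plus/R, and the function name here is hypothetical):

```python
import numpy as np

def logistic_mle(X, y, n, tol=1e-10, max_iter=50):
    """Newton-Raphson for the logistic-regression MLE.

    X : (N, p) design matrix, y : (N,) success counts, n : (N,) trial counts.
    Uses the score s(beta) = X^t (y - n p) and Hessian H(beta) = -X^t D X
    with D = diag(n_i p_i (1 - p_i)), as derived above.  Assumes the MLE
    exists (no complete separation), so the iteration stays finite.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - n * p)
        d = n * p * (1 - p)                 # diagonal of D(beta)
        neg_H = (X.T * d) @ X               # -H(beta) = X^t D X
        step = np.linalg.solve(neg_H, score)  # Newton step -H^{-1} s
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

At convergence the score vanishes, which is exactly the first-order condition above.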
$$ \Rightarrow \; l(\beta^*) = \sum_{i=1}^n \left[ y_i x_i^t \beta^* - \ln(1 + e^{x_i^t \beta^*}) \right] = \sum_{i=1,\,Y_i=1}^n \left\{ x_i^t \beta^* - \ln(1 + e^{x_i^t \beta^*}) \right\} - \sum_{i=1,\,Y_i=0}^n \ln(1 + e^{x_i^t \beta^*}) $$
2) $V(\beta)^{1/2} (\hat{\beta}_n - \beta) \xrightarrow{D} N_p(0, I_p)$, where $V(\beta) := X^t D(\beta) X$ is the Fisher information.
Coefficients:
Value Std. Error t value
(Intercept) 2.5 0.24 10.4
poly(Age, 2)1 -14.9 2.97 -5.0
poly(Age, 2)2 3.7 2.53 1.4
Sex -2.6 0.20 -13.0
PClass2nd -1.2 0.26 -4.7
PClass3rd -2.5 0.28 -8.9
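The Std. Error and t value columns of such a summary follow from the asymptotic normality result: the covariance of $\hat{\beta}$ is estimated by $(X^t D(\hat{\beta}) X)^{-1}$. A sketch (hypothetical helper, not the S-Plus code behind the table above):

```python
import numpy as np

def wald_table(X, n, beta_hat):
    """Std. errors and t values from the inverse Fisher information
    (X^t D(beta_hat) X)^{-1} evaluated at the fitted coefficients."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    d = n * p * (1 - p)                       # diagonal of D(beta_hat)
    cov = np.linalg.inv((X.T * d) @ X)        # estimated Cov(beta_hat)
    se = np.sqrt(np.diag(cov))
    return se, beta_hat / se                  # Std. Error, t value
```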
Binomial model
Response: cbind(Survived, 1 - Survived)
The quadratic effect of Age in the interaction between Age and PClass is weakly significant, since a drop in deviance of 641 − 634 = 7 on 2 df gives a p-value of .03.
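The quoted p-value can be reproduced directly: for 2 degrees of freedom the chi-square survival function has the closed form $e^{-x/2}$. A quick check in Python:

```python
import math

# Drop-in-deviance test for the 2-df quadratic Age terms:
# the deviance falls from 641 to 634, a drop of 7 on 2 df.
drop, df = 641 - 634, 2
# For df = 2 the chi-square survival function is exactly exp(-x/2).
p_value = math.exp(-drop / 2)
print(round(p_value, 3))  # 0.03
```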
The residual deviance test gives a p-value of .42, i.e. the model including the interactions shows no lack of fit.
[Figure: fitted logits (top panel) and fitted probabilities (bottom panel) versus Age (0–60), one curve per group: 1st, 2nd, 3rd class female and 1st, 2nd, 3rd class male]
Pearson residuals: $e^P_i := \dfrac{Y_i - n_i \hat{p}_i}{(n_i \hat{p}_i (1 - \hat{p}_i))^{1/2}}$

$$ \Rightarrow \; \chi^2 = \sum_{i=1}^n (e^P_i)^2 $$

deviance residuals: $e^d_i := \operatorname{sign}(Y_i - n_i \hat{p}_i) \left( 2 y_i \ln \dfrac{y_i}{n_i \hat{p}_i} + 2 (n_i - y_i) \ln \dfrac{n_i - y_i}{n_i (1 - \hat{p}_i)} \right)^{1/2}$

$$ \Rightarrow \; D = \sum_{i=1}^n (e^d_i)^2 $$
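Both kinds of residuals are simple elementwise computations. A sketch, assuming grouped binomial data with fitted probabilities already in hand (helper name hypothetical):

```python
import numpy as np

def pearson_deviance_residuals(y, n, p_hat):
    """Pearson and deviance residuals for grouped binomial data,
    following the definitions above."""
    eP = (y - n * p_hat) / np.sqrt(n * p_hat * (1 - p_hat))
    # Convention 0*log(0) = 0 for boundary cells y = 0 or y = n.
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / (n * p_hat)), 0.0)
        t2 = np.where(n - y > 0,
                      (n - y) * np.log((n - y) / (n * (1 - p_hat))), 0.0)
    eD = np.sign(y - n * p_hat) * np.sqrt(2 * (t1 + t2))
    return eP, eD
```

Summing the squares gives the $\chi^2$ statistic and the deviance $D$, respectively.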
[Figure: residual plots for the model r.inter — left panel: sqrt(abs(resid(r.inter))) (Deviance Residuals) versus Fitted : (Age + Sex + PClass)^2; right panel: Pearson Residuals versus Quantiles of Standard Normal]
[Figure: the corresponding residual plots for the grouped model r.group.inter — sqrt(abs(resid(r.group.inter))) (Deviance Residuals) and Pearson Residuals]
$$ Z_i^\beta := \eta_i + (y_i - \mu_i) \frac{d\eta_i}{d\mu_i}, \quad i = 1, \dots, n $$

$$ \Rightarrow \; \frac{d\eta_i}{d\mu_i} = \frac{n_i - \mu_i}{\mu_i} \cdot \frac{(n_i - \mu_i) - (-1)\mu_i}{(n_i - \mu_i)^2} = \frac{n_i}{\mu_i (n_i - \mu_i)} = \frac{1}{n_i p_i (1 - p_i)} $$

$$ \Rightarrow \; Z^\beta = X\beta + D^{-1}(\beta)\,\epsilon \quad \text{where} \quad \epsilon := Y - (n_1 p_1, \dots, n_n p_n)^t $$
$$ \hat{\beta} = (X^t D(\hat{\beta}) X)^{-1} X^t D(\hat{\beta})\, Z^{\hat{\beta}} $$

Define
$$ e := Z^{\hat{\beta}} - X\hat{\beta} = \left( I - X (X^t D(\hat{\beta}) X)^{-1} X^t D(\hat{\beta}) \right) Z^{\hat{\beta}} $$

If one considers $D(\hat{\beta})$ as a non-random constant quantity, then we have
$$ E(e) = \left( I - X (X^t D(\hat{\beta}) X)^{-1} X^t D(\hat{\beta}) \right) \underbrace{E(Z^{\hat{\beta}})}_{= X\hat{\beta}} = 0 $$
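Iterating this weighted least-squares update until it stabilizes yields the MLE $\hat{\beta}$. A minimal IWLS sketch for grouped binomial data (illustrative names, not the slides' S-Plus code):

```python
import numpy as np

def iwls(X, y, n, n_iter=25):
    """Iteratively weighted least squares for logistic regression.

    Each step forms the working response Z = X beta + D^{-1}(y - n p)
    and regresses it on X with weights D = diag(n_i p_i (1 - p_i)),
    i.e. beta <- (X^t D X)^{-1} X^t D Z, as in the slides."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        d = n * p * (1 - p)                  # diagonal of D(beta)
        z = X @ beta + (y - n * p) / d       # working response Z^beta
        XtD = X.T * d
        beta = np.linalg.solve(XtD @ X, XtD @ z)
    return beta
```

For the canonical logit link this update coincides with the Newton–Raphson step, so the fixed point solves the score equations.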
"adjusted residuals":
$$ e^a_i := \frac{e^r_i}{\operatorname{Var}(e^r_i)^{1/2}} = \frac{e^r_i}{\left\{ \{n_i \hat{p}_i (1 - \hat{p}_i)\}^2 \left[ \frac{1}{n_i \hat{p}_i (1 - \hat{p}_i)} - (X (X^t D(\hat{\beta}) X)^{-1} X^t)_{ii} \right] \right\}^{1/2}} $$
$$ = \frac{e^r_i}{\left\{ n_i \hat{p}_i (1 - \hat{p}_i) \left[ 1 - n_i \hat{p}_i (1 - \hat{p}_i) (X (X^t D(\hat{\beta}) X)^{-1} X^t)_{ii} \right] \right\}^{1/2}} $$
$$ = \frac{e^P_i}{\left[ 1 - n_i \hat{p}_i (1 - \hat{p}_i) (X (X^t D(\hat{\beta}) X)^{-1} X^t)_{ii} \right]^{1/2}} = \frac{e^P_i}{[1 - h_{ii}]^{1/2}} $$
where $h_{ii} := n_i \hat{p}_i (1 - \hat{p}_i) (X (X^t D(\hat{\beta}) X)^{-1} X^t)_{ii}$.
$$ Y = X\beta + \epsilon, \qquad \hat{\beta} = (X^t X)^{-1} X^t Y, \qquad \hat{Y} = X\hat{\beta} = HY, $$
$$ H = X (X^t X)^{-1} X^t, \qquad H^2 = H $$
$$ \Rightarrow \; \hat{e} := Y - \hat{Y} = (I - H) Y = (I - H)\hat{e} $$
Lemma: $e^P = (I - H)\, e^P$, where $e^P_i := \dfrac{Y_i - n_i \hat{p}_i}{(n_i \hat{p}_i (1 - \hat{p}_i))^{1/2}}$, i.e. $e^P = \hat{D}^{-1/2}(Y - \hat{Y})$, with $H := \hat{D}^{1/2} X (X^t \hat{D} X)^{-1} X^t \hat{D}^{1/2}$.

$$ e^P = \underbrace{(I - H)}_{M}\, e^P $$

$M$ spans the residual space of $e^P$. This suggests that small $m_{ii}$ (or large $h_{ii}$) should be useful in detecting extreme points in the design space $X$.
We have $\sum_{i=1}^n h_{ii} = p$ (exercise), therefore we consider observations with $h_{ii} > \frac{2p}{n}$ as "high leverage points".
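The leverages $h_{ii}$ and the $2p/n$ threshold are cheap to compute from the fit. A sketch (hypothetical helper), using $h_{ii} = n_i \hat{p}_i(1-\hat{p}_i)\,(X(X^t\hat{D}X)^{-1}X^t)_{ii}$:

```python
import numpy as np

def leverages(X, n, p_hat):
    """Diagonal h_ii of the logistic hat matrix
    H = D^{1/2} X (X^t D X)^{-1} X^t D^{1/2}, D = diag(n_i p_i (1 - p_i)).
    The h_ii sum to p, the number of columns of X."""
    d = n * p_hat * (1 - p_hat)
    A = X @ np.linalg.solve((X.T * d) @ X, X.T)   # X (X^t D X)^{-1} X^t
    return d * np.diag(A)

# Rule of thumb from the slides: flag h_ii > 2 * p / n_obs as high leverage.
```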
$$ e_{Y|X_{-j}} := Y - X_{-j}(X_{-j}^t X_{-j})^{-1} X_{-j}^t\, Y = \big( I - \underbrace{X_{-j}(X_{-j}^t X_{-j})^{-1} X_{-j}^t}_{=: H_{-j}} \big) Y $$
= raw residuals in the model with the $j$th covariable removed
$$ e_{X_j|X_{-j}} := X_j - X_{-j}(X_{-j}^t X_{-j})^{-1} X_{-j}^t\, X_j = (I - H_{-j}) X_j $$
The LSE of $\beta_j$ in (*), denoted by $\hat{\beta}_j^*$, satisfies $\hat{\beta}_j^* = \hat{\beta}_j$, where $\hat{\beta}_j$ is the LSE in $Y = X\beta + \epsilon$ (exercise).
Since $\hat{\beta}_j^* = \hat{\beta}_j$, we can interpret the partial residual plot as follows. If the partial residual plot scatters
- around 0 ⇒ $X_j$ has no influence on $Y$
- linearly ⇒ $X_j$ should enter the model linearly
- nonlinearly ⇒ $X_j$ should be included with this nonlinear form.
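The equality $\hat{\beta}_j^* = \hat{\beta}_j$ left as an exercise (known as the Frisch–Waugh–Lovell result) is easy to verify numerically. A sketch with simulated data (all names hypothetical):

```python
import numpy as np

# Small simulated linear-model dataset, purely for illustration.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30), rng.normal(size=30)])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=30)

j = 2
Xmj = X[:, [0, 1]]                              # X with column j removed
Hmj = Xmj @ np.linalg.solve(Xmj.T @ Xmj, Xmj.T)  # H_{-j}
eY = Y - Hmj @ Y                                 # e_{Y|X_-j}
eX = X[:, j] - Hmj @ X[:, j]                     # e_{X_j|X_-j}

beta_j_star = (eX @ eY) / (eX @ eX)              # LSE in the residual regression (*)
beta_full = np.linalg.lstsq(X, Y, rcond=None)[0] # LSE in Y = X beta + eps
print(np.isclose(beta_j_star, beta_full[j]))     # True
```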
$$ e = Y - X_{-j}\hat{\beta}_{-j} - X_j \hat{\beta}_j $$
$$ e_{(Y|X_{-j})i} := \frac{Y_i - n_i \hat{p}_i}{n_i \hat{p}_i (1 - \hat{p}_i)} + \hat{\beta}_j x_{ij} \quad \text{as partial residual} $$
Heuristic derivation:
Justification: IWLS
$$ Z_L := X\hat{\beta} + L\hat{\gamma} + \hat{D}_L^{-1}\, e^r_L \quad \text{with} $$
$$ e^r_{Li} := y_i - n_i \underbrace{\frac{e^{x_i^t\hat{\beta} + l_i \hat{\gamma}}}{1 + e^{x_i^t\hat{\beta} + l_i \hat{\gamma}}}}_{\hat{p}_{Li}}, \qquad L = (l_1, \dots, l_n)^t $$
$$ \hat{D}_L := \operatorname{diag}(\dots, n_i \hat{p}_{Li}(1 - \hat{p}_{Li}), \dots) $$
$$ e_{(Y|X_{-j})i} := (\hat{D}_L^{-1})_{ii}\, e^r_{Li} + \hat{\gamma} l_i = \frac{y_i - n_i \hat{p}_{Li}}{n_i \hat{p}_{Li}(1 - \hat{p}_{Li})} + \hat{\gamma} l_i $$
⇒ partial residual plot in logistic regression: plot $x_{ij}$ versus $e_{(Y|X_{-j})i}$.
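The plotted quantity needs only the fitted probabilities and the coefficient of the candidate column. A minimal sketch (hypothetical helper name), assuming grouped binomial data:

```python
import numpy as np

def partial_residuals(y, n, p_hat, beta_j, x_j):
    """Partial residuals e_(Y|X_-j),i = (y_i - n_i p_i)/(n_i p_i (1-p_i))
    + beta_j * x_ij; plotting them against x_j suggests the functional
    form in which X_j should enter the model."""
    return (y - n * p_hat) / (n * p_hat * (1 - p_hat)) + beta_j * x_j
```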
$$ D_i := \|X\hat{\beta} - X\hat{\beta}_{-i}\|^2 / (p \hat{\sigma}^2) = \frac{\hat{e}_i^2\, h_{ii}}{p \hat{\sigma}^2 (1 - h_{ii})^2} $$
$$ \Rightarrow \; D_i \approx \frac{(e^a_i)^2\, h_{ii}}{1 - h_{ii}} = (e^P_i)^2 \frac{h_{ii}}{(1 - h_{ii})^2} =: D^a_i $$
where $e^a_i = \dfrac{e^P_i}{(1 - h_{ii})^{1/2}}$ and $e^P_i := \dfrac{Y_i - n_i \hat{p}_i}{(n_i \hat{p}_i (1 - \hat{p}_i))^{1/2}}$.
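$D^a_i$ requires only the Pearson residuals and the leverages, so no case-deletion refits are needed. A sketch (illustrative helper name):

```python
import numpy as np

def cooks_distance_approx(y, n, p_hat, h):
    """One-step approximation D_i^a = (e^P_i)^2 h_ii / (1 - h_ii)^2
    to Cook's distance for logistic regression, as in the slides."""
    eP = (y - n * p_hat) / np.sqrt(n * p_hat * (1 - p_hat))
    return eP**2 * h / (1 - h) ** 2
```

Large values of $D^a_i$ flag observations whose deletion would move the fitted coefficients substantially.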