
The hidden logistic regression model

P.J. Rousseeuw¹ and A. Christmann²

¹ Universitaire Instelling Antwerpen (UIA), Department of Mathematics and Computer Science, Universiteitsplein 1, B-2610 Wilrijk, Belgium
² University of Dortmund, SFB-475, HRZ, 44221 Dortmund, Germany

Keywords: Logistic regression, Hidden layer, Overlap, Robustness.

Abstract

The logistic regression model is commonly used to describe the effect of one or several explanatory variables on a binary response variable. Here we consider an alternative model under which the observed response is strongly related but not equal to the unobservable true response. We call this the hidden logistic regression (HLR) model because the unobservable true responses are comparable to a hidden layer in a feedforward neural net. We propose the maximum estimated likelihood method in this model, which is robust against separation, unlike existing methods for logistic regression. We also consider outlier-robust estimation in this setting.
1 Introduction

The logistic regression model assumes independent Bernoulli distributed response variables with success probabilities $\Lambda(x_i'\theta)$, where $\Lambda$ is the logistic distribution function, $x_i \in \mathbb{R}^p$ are vectors of explanatory variables, $1 \le i \le n$, and $\theta \in \mathbb{R}^p$ is unknown. Under these assumptions, the classical maximum likelihood (ML) estimator has certain asymptotic optimality properties. However, even if the logistic regression assumptions are satisfied there are data sets for which the ML estimate does not exist. This occurs for exactly those data sets in which there is no overlap between successes and failures, cf. Albert and Anderson (1984) and Santner and Duffy (1986). This identification problem is not limited to the ML estimator but is shared by all estimators for logistic regression, such as that of Künsch et al. (1989).
One way to deal with this problem is to measure the amount of overlap. This can be done by exploiting a connection between the notion of overlap and the notion of regression depth proposed by Rousseeuw and Hubert (1999), leading to the algorithm of Christmann and Rousseeuw (2001). A comparison between this approach and the support vector machine is given in Christmann, Fischer and Joachims (2000).
In Section 2 we use an alternative model, which is an extension of the logistic regression model. We assume that due to an additional stochastic mechanism the true response of a logistic regression model is unobservable, but that there exists an observable variable which is strongly related to the true response. E.g., in a medical context there is often no perfect laboratory test procedure to detect whether a specific illness is present or not (i.e., misclassification errors may sometimes occur). In that case, the true response (whether the disease is present) is not observable, but the result of the laboratory test is.
It can be argued that the true unobservable responses are comparable to a hidden layer in a feedforward neural network model, which is why we call this the hidden logistic regression (HLR) model. In Section 3 we propose the maximum estimated likelihood (MEL) technique in this model, and show that it is immune to the identification problem described above. The MEL estimator is studied by simulations (Section 4) and on real data sets (Section 5). In Section 6 we also consider outlier-robust estimation in this setting, whereas Section 7 provides a discussion and an outlook to further research.

2 The hidden logistic regression model

The classical logistic regression model assumes $n$ observable independent responses $Y_i$ with Bernoulli distributions $\mathrm{Bin}(1, \Lambda(x_i'\theta))$, where $i = 1, \ldots, n$ and $\theta \in \mathbb{R}^p$. Throughout this paper we assume that there is an intercept, so we put $x_{i,1} = 1$ for all $i$, and thus $p \ge 2$.
The new model assumes that the true responses are unobservable (latent) due to an additional stochastic mechanism. In medical diagnosis there is typically no test procedure (e.g. a blood test) which is completely free of misclassification errors. Another possible cause of misclassifications is the occurrence of clerical errors.
To clarify the model, let us first consider a medical application with only $n = 1$ patient. His/her true status (e.g. presence or absence of the disease) has two possible values, typically denoted as success ($s$) and failure ($f$). We assume that the true status $T$ is unobservable. However, we can observe the variable $Y$ which is strongly related to $T$ as in Figure 1. If the true status is $T = s$ we observe $Y = 1$ with probability $P(Y = 1 \mid T = s) = \delta_1$, hence a misclassification occurs with probability $P(Y = 0 \mid T = s) = 1 - \delta_1$. Analogously, if the true status is $f$ we observe $Y = 1$ with probability $P(Y = 1 \mid T = f) = \delta_0$ and we obtain $Y = 0$ with probability $P(Y = 0 \mid T = f) = 1 - \delta_0$. We of course assume that the probability of observing the true status is higher than 50%, i.e. $0 < \delta_0 < 0.5 < \delta_1 < 1$.
Fig. 1. Unobservable truth T and observable response Y.

Ekholm and Palmgren (1982) considered the general case with $n$ observations. In our notation, there are $n$ unobservable independent random variables $T_i$ resulting from a classical logistic regression model with finite parameter vector $\theta = (\theta_1, \ldots, \theta_p)' = (\alpha, \beta_1, \ldots, \beta_{p-1})'$. Hence $T_i$ has a Bernoulli distribution with success probability $\pi_i = \Lambda(x_i'\theta)$ where $\Lambda(z) = 1/[1 + \exp(-z)]$ and $x_i \in \mathbb{R}^p$. Furthermore, they assume that the observable responses $Y_i$ are related to $T_i$ as in Figure 1. For instance, when $T_i = s$ we obtain $Y_i = 1$ with probability $P(Y_i = 1 \mid T_i = s) = \delta_1$ whereas $Y_i = 0$ occurs with the complementary probability $P(Y_i = 0 \mid T_i = s) = 1 - \delta_1$. (The plain logistic model assumes $\delta_0 = 0$ and $\delta_1 = 1$.) The entire mechanism in Figure 2 we call the hidden logistic regression model because the true status $T_i$ is hidden by the stochastic structure in the top part of Figure 2. This model can be interpreted as a special kind of neural net, with a single hidden layer that corresponds to the latent variable $T_i$.

3 The maximum estimated likelihood method

a. Construction

We now need a way to fit data sets arising from the hidden logistic model. Two approaches already exist, by Ekholm and Palmgren (1982) and by Copas (1988), but here we will proceed in a different way.

Fig. 2. Hidden logistic regression model.

Let us start by looking only at Figure 1, where $Y$ is observed but $T$ is not. How can we then estimate $T$? This is actually the smallest nontrivial estimation problem, because any such problem needs more than one possible value of the parameter and more than one possible outcome. Here we have exactly two values for both, and the only distributions on two possible outcomes are the Bernoulli distributions. Under $f$ the likelihood of $Y = 0$ exceeds that of $Y = 1$, and under $s$ the opposite holds. Therefore, the maximum likelihood estimator of $T$ given $(Y = y)$ becomes simply
$$\hat{T}_{\mathrm{ML}}(Y = 0) = f, \qquad \hat{T}_{\mathrm{ML}}(Y = 1) = s \qquad (1)$$
which conforms with intuition.
Let us now consider the conditional probability that $Y = 1$ given $\hat{T}_{\mathrm{ML}}$, yielding
$$P(Y = 1 \mid \hat{T}_{\mathrm{ML}}) = \begin{cases} \delta_0 & \text{if } y = 0 \\ \delta_1 & \text{if } y = 1 \end{cases} \qquad (2)$$
where $y$ is the observed value of $Y$. Denoting (2) by $\tilde{Y}$, we can rewrite it as
$$\tilde{Y} = \delta_0 + (\delta_1 - \delta_0)Y = (1 - Y)\delta_0 + Y\delta_1$$
which is a weighted average of $\delta_0$ and $\delta_1$ with weights $1 - Y$ and $Y$.
In the model with $n$ observations $y_i$ we obtain analogously
$$\tilde{y}_i = (1 - y_i)\delta_0 + y_i\delta_1 \qquad (3)$$
which we will call the pseudo-observations. In words, the pseudo-observation $\tilde{y}_i$ is the success probability conditional on the most likely estimate of the true status $t_i$.
We now want to fit a logistic regression to the pseudo-observations $\tilde{y}_i$. (In the classical case, $\tilde{y}_i = y_i$.) There are several estimation methods, but here we will apply the maximum likelihood

formula. The goal is thus to maximize
$$L(\theta \mid (\tilde{y}_1, \ldots, \tilde{y}_n)) = \prod_{i=1}^{n} [\Lambda(x_i'\theta)]^{\tilde{y}_i}\,[1 - \Lambda(x_i'\theta)]^{1 - \tilde{y}_i} \qquad (4)$$
over $\theta \in \mathbb{R}^p$. We call (4) the estimated likelihood because we don't know the true likelihood, which depends on the unobservable $t_1, \ldots, t_n$. (We only know the true likelihood when $\delta_0 = 0$ and $\delta_1 = 1$.) The maximizer $\hat{\theta}$ of (4) can thus be called the maximum estimated likelihood (MEL) estimator.
In order to compute the MEL estimator we can take the logarithm of (4), yielding
$$\sum_{i=1}^{n} \tilde{y}_i \ln(\Lambda(x_i'\theta)) + (1 - \tilde{y}_i)\ln(1 - \Lambda(x_i'\theta)) \qquad (5)$$
which always exists since $\theta$ is finite. Differentiating with respect to $\theta$ yields the ($p$-variate) score function
$$s(\theta \mid (\tilde{y}_1, \ldots, \tilde{y}_n)) = \sum_{i=1}^{n} (\tilde{y}_i - \Lambda(x_i'\theta))\, x_i \qquad (6)$$
for all $\theta \in \mathbb{R}^p$. Setting (6) equal to zero yields the desired estimate.
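Since (5) is smooth and strictly concave (see Property 1 below), a few Newton-Raphson steps on the score (6) with the Hessian (7) suffice in practice. The following Python sketch is ours, not the authors' S-PLUS implementation; the name `mel_fit` and its defaults are illustrative only.

```python
import numpy as np

def logistic(z):
    """The logistic distribution function Lambda."""
    return 1.0 / (1.0 + np.exp(-z))

def mel_fit(X, y_tilde, max_iter=100, tol=1e-10):
    """Maximize the estimated likelihood (5) by Newton-Raphson.

    X       : (n, p) design matrix whose first column is all ones,
    y_tilde : pseudo-observations (3), strictly inside (0, 1).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = logistic(X @ theta)
        score = X.T @ (y_tilde - p)                 # score function (6)
        H = -(X * (p * (1.0 - p))[:, None]).T @ X   # Hessian (7)
        step = np.linalg.solve(H, -score)           # plain Newton step, no safeguards
        theta += step
        if np.abs(step).max() < tol:
            break
    return theta
```

Equivalently, because the $\tilde{y}_i$ lie strictly inside $(0, 1)$, any GLM routine that accepts non-integer binomial responses can be used to maximize (5).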

b. Properties of the MEL estimator

Unlike the classical ML estimator, the MEL estimator always exists.

Property 1. When $0 < \delta_0 < \delta_1 < 1$ and the data set has a design matrix of full column rank, the MEL estimator always exists and is unique.
(Note that when the design matrix is not of full column rank, we can first reduce the dimension of the $x_i$ by means of principal component analysis.)
Proof. The Hessian matrix of (5) equals
$$\frac{\partial}{\partial\theta}\, s(\theta) = -\sum_{i=1}^{n} \Lambda(x_i'\theta)(1 - \Lambda(x_i'\theta))\, x_i x_i' \qquad (7)$$
and is thus negative definite because the design matrix has rank $p$. Therefore the differentiable function (5) is strictly concave. Now let us take any $\gamma \neq 0$ and replace $\theta$ in (5) by $\lambda\gamma$. If we let $\lambda \to +\infty$ then (5) always tends to $-\infty$ because there is at least one $x_i$ in the data set with $x_i'\gamma \neq 0$ due to full rank, and neither $\tilde{y}_i$ nor $(1 - \tilde{y}_i)$ can be zero. Therefore, there must be a finite maximizer $\hat{\theta}_{\mathrm{MEL}}$ of (5), which is unique because the concavity is strict. □
This implies that the MEL estimator exists even when the data set has no overlap. Therefore also the resulting odds ratios $OR_j = \exp(\hat{\theta}_j)$ always exist, i.e. they are never zero or $+\infty$.
A property shared by all logistic regression estimators is x-affine equivariance. This says that when the $x_i$ are replaced by $x_i^* = A x_i$, where $A$ is a nonsingular $p \times p$ matrix, then the regression coefficients transform accordingly.

Property 2. The MEL estimator is x-affine equivariant.

Proof. From (6) it follows that $\hat{\theta}_{\mathrm{MEL}}^* = (A')^{-1}\hat{\theta}_{\mathrm{MEL}}$, hence $(x_i^*)'\hat{\theta}_{\mathrm{MEL}}^* = x_i' A' (A')^{-1}\hat{\theta}_{\mathrm{MEL}} = x_i'\hat{\theta}_{\mathrm{MEL}}$. This also yields the same predicted values. □

In linear regression there exist two other types of equivariance: one about adding a linear function to the response (`regression equivariance') and one about multiplying the response by a constant factor (`y-scale equivariance'), but these obviously do not apply to logistic regression.

c. Choice of $\delta_0$ and $\delta_1$

If $\delta_0$ and $\delta_1$ are known from the context (e.g. from the type I and type II error probabilities of a blood test) then we can use these values. But in many cases, $\delta_0$ and $\delta_1$ are not given in advance. Copas (1988, page 241) found that accurate estimation of $\delta_0$ and $\delta_1$ from the data itself is very difficult, if not impossible, unless $n$ is extremely large. He essentially considers them as tuning constants that can be chosen, as do we.
The `symmetric' approach used by Copas is to choose a single constant $\delta > 0$ and to set
$$\delta_0 = \delta \quad \text{and} \quad \delta_1 = 1 - \delta. \qquad (8)$$
His computations require that $\delta$ be small enough so that terms in $\delta^2$ can be ignored. In his Table 1 the values $\delta = 0.01$ and $\delta = 0.02$ occur, whereas he considers $\delta = 0.05$ to be unreasonably high (page 238). In most of Copas' examples $\delta = 0.01$ performs well, and this turns out to be true also for our MEL method, so we could use $\delta = 0.01$ as the default choice. This approach has the advantage of simplicity.
On the other hand, there is something to be said for an `asymmetric' choice which takes into account how many $y_i$'s are 0 and 1 in the data set. Let us consider the marginal distribution of the $y_i$ (that is, unconditional on the $x_i$) from which we construct some estimate $\hat\pi$ of the marginal success probability $P(Y = 1)$. It seems reasonable to constrain $\delta_0$ and $\delta_1$ such that the average of the pseudo-observations $\tilde y_i$ corresponds to $\hat\pi$. This yields
$$\frac{1}{n}\sum_{i=1}^{n} \tilde y_i = (1 - \hat\pi)\delta_0 + \hat\pi\delta_1 = \hat\pi$$
hence
$$\hat\pi - \hat\pi\delta_1 = \delta_0 - \hat\pi\delta_0, \qquad \text{i.e.} \qquad \frac{1 - \delta_1}{1 - \hat\pi} = \frac{\delta_0}{\hat\pi}.$$
Since it is natural to assume that $0 < \hat\pi < 1$, the latter ratios equal a (small) positive number, which we will write as $\delta/(1 + \delta)$ with $\delta > 0$. Consequently we can write both $\delta_0$ and $\delta_1$ as functions of $\delta$:
$$\delta_0 = \frac{\delta\hat\pi}{1 + \delta} \qquad \text{and} \qquad \delta_1 = \frac{1 + \delta\hat\pi}{1 + \delta}. \qquad (9)$$
However, since we have assumed that $0 < \hat\pi < 1$ we have to construct $\hat\pi$ accordingly. We cannot take the standard estimate $\bar\pi = \frac{1}{n}\sum_{i=1}^{n} y_i = (\text{number of } y_i = 1)/n$, because $\bar\pi$ can become 0 or 1. A natural idea is to bound $\bar\pi$ away from 0 and 1 by putting
$$\hat\pi = \max(\delta, \min(1 - \delta, \bar\pi)) \qquad (10)$$
which means truncation at $\delta$ and $1 - \delta$. This is sufficient because always
$$\delta_0 = \frac{\delta\hat\pi}{1 + \delta} < \frac{\hat\pi + \delta\hat\pi}{1 + \delta} = \hat\pi \qquad \text{and} \qquad \delta_1 = \frac{1 + \delta\hat\pi}{1 + \delta} > \frac{\hat\pi + \delta\hat\pi}{1 + \delta} = \hat\pi$$
hence $\delta_0 < \hat\pi < \delta_1$. Note that both misclassification probabilities in Figure 1 are less than $\delta$ because
$$\delta_0 = \frac{\delta\hat\pi}{1 + \delta} < \frac{\delta}{1 + \delta} < \delta \qquad \text{and} \qquad 1 - \delta_1 = 1 - \frac{1 + \delta\hat\pi}{1 + \delta} = \frac{\delta(1 - \hat\pi)}{1 + \delta} < \frac{\delta}{1 + \delta} < \delta.$$
Our default choice will be $\delta = 0.01$, which implies smaller misclassification errors than by putting $\delta = 0.01$ in formula (8).
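For concreteness, the asymmetric recipe (9)-(10) together with the pseudo-observations (3) fits in a few lines of Python. This is a sketch with our own function names (`choose_deltas`, `pseudo_observations`), not the authors' S-PLUS code:

```python
import numpy as np

def choose_deltas(y, delta=0.01):
    """Asymmetric choice of (delta0, delta1) following (9) and (10)."""
    pi_bar = float(np.mean(y))                       # raw success fraction
    pi_hat = max(delta, min(1.0 - delta, pi_bar))    # truncation (10)
    delta0 = delta * pi_hat / (1.0 + delta)          # (9)
    delta1 = (1.0 + delta * pi_hat) / (1.0 + delta)
    return delta0, delta1

def pseudo_observations(y, delta0, delta1):
    """Pseudo-observations (3)."""
    y = np.asarray(y, dtype=float)
    return (1.0 - y) * delta0 + y * delta1
```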

When the data are `balanced' in the sense that there are as many $y_i = 1$ as $y_i = 0$, expression (10) yields $\hat\pi = 0.5$, hence $\delta_0 = 1 - \delta_1$ by (9), yielding identical misclassification probabilities, as in the symmetric formulas (8). In all other, `unbalanced' cases, our asymmetric approach yields less biased predictions. An extreme case is when all $y_i = 1$. (This is a situation where the classical ML estimator does not exist.) The MEL estimator will put all $\tilde y_i = \delta_1$, yielding a fit with all slopes $\hat\beta_1 = \ldots = \hat\beta_{p-1} = 0$ and with intercept $\hat\alpha = \Lambda^{-1}(\delta_1)$. Using the symmetric approach (8) yields $\delta_1 = 0.99$, hence $\hat\alpha = \mathrm{logit}(0.99) = \ln(99) \approx 4.595$, so the fitted values are constant and equal to 0.99. On the other hand, the asymmetric approach yields $\hat\pi = 0.99$ and $\delta_1 = (1 + (0.99)(0.01))/(1 + 0.01) = 1.0099/1.01 = 0.999901$. This again yields zero slopes but a larger intercept $\hat\alpha = \mathrm{logit}(0.999901) = \ln(10099) \approx 9.22$, so the fitted values are 0.9999, which is much closer to 1.
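Since this is simple arithmetic, it can be checked in a couple of lines (a throwaway Python snippet, not part of the method):

```python
import math

delta = 0.01
# symmetric choice (8): delta1 = 1 - delta = 0.99
print(math.log(0.99 / 0.01))              # 4.595... = ln(99)

# asymmetric choice (9)-(10) when all y_i = 1:
pi_hat = 1.0 - delta                      # pi-bar = 1, truncated to 0.99 by (10)
delta1 = (1 + delta * pi_hat) / (1 + delta)
print(delta1)                             # 0.999901...
print(math.log(delta1 / (1 - delta1)))    # 9.220... = ln(10099)
```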
Our recommendation is therefore to compute $\hat\pi$, $\delta_0$, and $\delta_1$ as in (9) and (10) with $\delta = 0.01$, to compute the pseudo-observations $\tilde y_i$ according to (3), and to carry out the resulting MEL method. Our S-PLUS code for this method can be downloaded from

http://win-www.uia.ac.be/u/statis/software/HLR_readme.html

or

http://www.statistik.uni-dortmund.de/sfb475/berichte/rouschr2.zip

The ML estimator has the nice property under the logistic regression model that if $\hat\theta$ is the ML estimate for the data set $\{(x_i', y_i),\ 1 \le i \le n\}$, then $-\hat\theta$ is the ML estimate for the data set $\{(x_i', 1 - y_i),\ 1 \le i \le n\}$. Hence, recoding all response variables $Y_i$ to $1 - Y_i$ affects the ML estimator only in the way that it changes the signs of the regression coefficients, and the odds ratios become $\exp(-\hat\theta_j) = 1/OR_j$. We call this equivariance with respect to recoding the response variable. The MEL estimator has the same property, whether $\delta_0$ and $\delta_1$ are given by (8) or (9).

Property 3. The MEL estimator is equivariant with respect to recoding the response variable.

Proof. Writing $y_i^* = 1 - y_i$ and recomputing (10) and (9) [or (8)] yields $\tilde y_i^* = 1 - \tilde y_i$ by (3). Applying the ML estimator to the $(x_i', \tilde y_i^*)$ yields the desired result. □

4 Simulations

In this section we carry out a small simulation to compare the bias and the standard error of the usual ML estimator and the proposed MEL estimator with $\delta = 0.01$ under the assumptions of the logistic regression model. We will estimate $p = 3$ coefficients, including the intercept term. Both explanatory variables are generated from the standard normal distribution. As true parameter vectors we use $\theta_A = (1, 0, 0)'$ and $\theta_B = (1, 1, 2)'$. The number of observations $n$ will be 20, 50, and 100. For each situation 1,000 samples are generated.
We use the depth-based algorithm (Christmann and Rousseeuw 2001) to check whether the data set has overlap, i.e. whether the ML estimate exists. It turned out that there were 12 data sets without overlap for $n = 20$ with $\theta_A$, and 129 data sets without overlap for $n = 20$ with $\theta_B$. This contrasts sharply with the MEL estimate, which existed for all data sets.
Table 1 compares ML and MEL for the data sets with overlap. In situation A, where the true slopes are zero, there is not much difference between the estimators. But in situation B, the MEL estimator has a substantially smaller bias and standard error than the ML estimator. This can be explained by the well-known phenomenon that ML tends to overestimate the magnitude of nonzero coefficients, whereas MEL exhibits a kind of `shrinkage' behavior.
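For reference, a stripped-down Python version of this experiment (situation B, $n = 50$, fewer replications than the 1,000 used above, and a generic numerical maximizer of (5) instead of Newton steps) could look as follows; `mel_estimate` is our own name:

```python
import numpy as np
from scipy.optimize import minimize

def mel_estimate(X, y, delta=0.01):
    """MEL fit: pseudo-observations via (9), (10), (3), then maximize (5)."""
    pi_hat = min(max(y.mean(), delta), 1.0 - delta)      # (10)
    d0 = delta * pi_hat / (1.0 + delta)                  # (9)
    d1 = (1.0 + delta * pi_hat) / (1.0 + delta)
    yt = (1.0 - y) * d0 + y * d1                         # (3)
    # negative of (5), written with logaddexp for numerical stability
    def negll(th):
        z = X @ th
        return np.sum(yt * np.logaddexp(0.0, -z) + (1.0 - yt) * np.logaddexp(0.0, z))
    return minimize(negll, np.zeros(X.shape[1]), method="BFGS").x

rng = np.random.default_rng(1)
n, theta_true = 50, np.array([1.0, 1.0, 2.0])            # situation B
est = []
for _ in range(200):                                     # 1,000 samples in the paper
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
    est.append(mel_estimate(X, y))
print(np.mean(est, axis=0) - theta_true)                 # Monte Carlo bias
```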
5 Examples

In this section we consider some benchmark data sets. Both the banknotes data set (Riedwyl 1997) and the hemophilia data set (Hermans and Habbema 1975) have no overlap, hence their ML estimate does not exist. The vasoconstriction data (Finney 1947, Pregibon 1981) and the food stamp data (Künsch et al. 1989) are well-known in the literature on outlier detection and robust logistic regression. They both have little overlap: it suffices to delete 3 (resp. 6) observations in these data sets to make the ML estimate nonexistent (see Christmann and Rousseeuw 2001). Some of these observations are considered as outliers in Künsch et al. (1989). The cancer remission data set (Lee 1974) is chosen because $n/p \approx 4$ is small. The toxoplasmosis data set (Efron 1986) and the IVC data set (Jaeger et al. 1997, 1998) have a large $n$.


Table 1: Bias and Standard Error of the ML estimator and the MEL estimator with $\delta = 0.01$.

                          ML               MEL
   n   coeff.        Bias     SE      Bias     SE
  Case A with $\theta_A = (1, 0, 0)'$:
   20  α̂            0.291  0.032    0.272  0.028
       β̂₁           0.010  0.031    0.009  0.029
       β̂₂          -0.014  0.035   -0.004  0.030
   50  α̂            0.097  0.012    0.095  0.012
       β̂₁          -0.015  0.011   -0.015  0.011
       β̂₂          -0.021  0.012   -0.021  0.012
  100  α̂            0.053  0.008    0.052  0.008
       β̂₁           0.004  0.008    0.004  0.008
       β̂₂          -0.004  0.008   -0.004  0.008
  Case B with $\theta_B = (1, 1, 2)'$:
   20  α̂            0.586  0.067    0.360  0.039
       β̂₁           0.652  0.083    0.364  0.045
       β̂₂           1.372  0.159    0.780  0.057
   50  α̂            0.133  0.019    0.097  0.017
       β̂₁           0.156  0.022    0.104  0.019
       β̂₂           0.350  0.030    0.247  0.025
  100  α̂            0.061  0.011    0.038  0.010
       β̂₁           0.085  0.012    0.050  0.011
       β̂₂           0.154  0.016    0.084  0.015

The IVC data set describes an in vitro experiment to study possible risk factors of the thrombus-capturing efficacy of inferior vena cava (IVC) filters. We focus on the study of a particular conical IVC filter, for which the design consisted of 48 different settings $x_i$. For each vector $x_i$ there were $m_i$ replications with $m_i \in \{50, 60, 90, 100\}$, yielding a total of $n = 3200$.
Table 2: Comparison between MEL estimates with $\delta = 0.01$ and ML estimates.

  Data set (n, p)            Method      α̂     β̂₁     β̂₂     β̂₃    β̂₄    β̂₅     β̂₆
  Banknotes (200, 7)         ML      no overlap, ML does not exist
                             MEL     147.09   0.46  -1.02   1.33   2.20  2.32  -2.37
  Hemophilia (52, 3)         ML      no overlap, ML does not exist
                             MEL      -5.43 -56.59  47.39
  Vasoconstriction (39, 3)   ML       -2.92   5.22   4.63
                             MEL      -2.77   4.98   4.41
  Food stamp (150, 4)        ML        0.93  -1.85   0.90  -0.33
                             MEL       0.89  -1.83   0.88  -0.33
  Cancer remission (27, 7)   ML       58.04  24.66  19.29 -19.60   3.90  0.15 -87.43
                             MEL      58.51  18.20  12.20 -12.19   3.68  0.14 -81.42
  Toxoplasmosis (697, 4)     ML        0.10  -0.45  -0.19   0.21
                             MEL       0.10  -0.44  -0.19   0.21
  IVC (3200, 5)              ML       -1.79   0.67  -1.05  -1.25   1.83
                             MEL      -1.73   0.65  -1.03  -1.22   1.79

Table 2 shows that the MEL estimates with $\delta = 0.01$ were quite similar to the ML estimates for the data sets with overlap. This is even true for the cancer remission data set, taking into account the huge standard errors of the ML coefficients, namely 71.23, 47.84, 57.95, 61.68, 2.34, 2.28, and 67.57. The odds ratios $\exp(\hat\theta_j)$ based on the ML and MEL estimates were quite similar too (see Table 3).
Figure 3 shows that the choice of $\delta$ has relatively little impact on the MEL estimates for the food stamp data set, which has overlap. Figure 4 shows the effect of $\delta$ for the banknotes data. Because the latter data set has no overlap, we know that $\|\hat\theta\|$ tends to $+\infty$ as $\delta$ goes to 0 (since $\delta = 0$ corresponds to the ML estimator). One could therefore use $\delta$ like a `ridge parameter' in Figure 4.
Table 3: Comparison of odds ratios based on ML and MEL.

  Data set          Method  exp(α̂)  exp(β̂₁)  exp(β̂₂)  exp(β̂₃)
  Vasoconstriction  ML        0.05   185.03   102.64
                    MEL       0.06   146.13    81.97
  Food stamp        ML        2.53     0.16     2.45     0.72
                    MEL       2.44     0.16     2.42     0.72
  Toxoplasmosis     ML        1.10     0.64     0.83     1.24
                    MEL       1.10     0.64     0.83     1.24
6 Outlier-robust estimation

In the literature on logistic regression, many robust alternatives to the maximum likelihood estimator have been proposed. They can easily be modified for the hidden logistic regression model in the same way that we constructed the MEL estimator, i.e. by applying them to the pseudo-observations (3).
As an example we will consider a modification of the least trimmed weighted squares (LTWS) estimator of Christmann (1994a), which is defined as follows. We assume large strata, i.e. each design point $x_i$ has $m_i$ responses $Y_{ij}$ for $j = 1, \ldots, m_i$. One then adds all the $Y_{ij}$ corresponding to that $x_i$, yielding
$$Z_i = \sum_{j=1}^{m_i} Y_{ij} \in \{0, \ldots, m_i\}$$
and redefines $n$ as the number of the $x_i$'s (which is less than the total number of original responses $Y_{ij}$). The large strata assumption says that $n$ and $p$ are fixed while $\min_{1 \le i \le n} m_i \to \infty$ and $m_i/(\sum_{l=1}^{n} m_l) \to k_i \in (0, 1)$. One then puts $\bar\pi_i = Z_i/m_i$ and $Z_i^* = (m_i \bar\pi_i (1 - \bar\pi_i))^{1/2}\, \Lambda^{-1}(\bar\pi_i)$ as well as $X_i^* = (m_i \bar\pi_i (1 - \bar\pi_i))^{1/2}\, x_i$. For large values of $m_i$ the $Z_i^*$ approximately follow a linear regression model in the $X_i^*$. Christmann (1994a) defined the LTWS estimator of $\theta$ as the least trimmed squares (LTS) estimator (Rousseeuw 1984) applied to the transformed variables $Z_i^*$ and $X_i^*$, that is,
$$\hat\theta_{\mathrm{LTWS}} = \operatorname*{argmin}_{\theta \in \mathbb{R}^p} \sum_{i=1}^{h} r_{i:n}^2$$
where $r_{1:n}^2 \le \ldots \le r_{n:n}^2$ are the ordered squared residuals with $r_i = Z_i^* - \theta' X_i^*$. The robustness aspects and asymptotic behavior of $\hat\theta_{\mathrm{LTWS}}$ were investigated in Christmann (1994a, 1998).
In the hidden logistic model, we apply the LTWS method to the pseudo-observations $\tilde y_{ij}$ defined in (3), with $\delta_0$ and $\delta_1$ given by (9) and (10). That is, we put
$$\tilde Y_{ij} = (1 - Y_{ij})\delta_0 + Y_{ij}\delta_1$$
yielding the corresponding variable $\tilde Z_i = \sum_{j=1}^{m_i} \tilde Y_{ij}$. Substituting $\tilde Z_i$ for $Z_i$ yields $\tilde\pi_i = \tilde Z_i/m_i$ and
$$\tilde Z_i^* = (m_i \tilde\pi_i (1 - \tilde\pi_i))^{1/2}\, \Lambda^{-1}(\tilde\pi_i), \qquad \tilde X_i^* = (m_i \tilde\pi_i (1 - \tilde\pi_i))^{1/2}\, x_i$$
to which we apply LTS regression. Like the MEL estimator, this modified LTWS estimator exists for all data sets (and it is still x-affine equivariant). In addition, it is also robust to outliers in $Z_i$ and $x_i$. (The latter means that the modified LTWS estimator can resist the effect of leverage points, unlike some other robust approaches.)
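To make the construction concrete, here is a small Python sketch of the transformation to $(\tilde Z_i^*, \tilde X_i^*)$ together with a deliberately naive LTS search by random $p$-subsets, standing in for the FAST-LTS algorithm actually used below; all names (`ltws_transform`, `naive_lts`) and the input convention (aggregated counts $Z_i$ with group sizes $m_i$) are our own assumptions.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def ltws_transform(Z, m, X, delta0, delta1):
    """Pseudo-counts and the transformation to the approximate linear model."""
    Z_tilde = (m - Z) * delta0 + Z * delta1      # Z~_i = sum_j of the y~_ij
    pi = Z_tilde / m                             # pi~_i, strictly inside (0, 1)
    w = np.sqrt(m * pi * (1.0 - pi))
    return w * logit(pi), X * w[:, None]         # Z~*_i and X~*_i

def naive_lts(Xs, Zs, h, n_trials=3000, seed=0):
    """Crude LTS by random p-subsets; a stand-in for FAST-LTS
    (Rousseeuw and Van Driessen 1999b), which adds C-steps."""
    rng = np.random.default_rng(seed)
    n, p = Xs.shape
    best_theta, best_obj = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            theta = np.linalg.solve(Xs[idx], Zs[idx])   # exact fit to p points
        except np.linalg.LinAlgError:
            continue
        r2 = np.sort((Zs - Xs @ theta) ** 2)
        obj = r2[:h].sum()                              # trimmed sum of squares
        if obj < best_obj:
            best_theta, best_obj = theta, obj
    return best_theta
```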

Fig. 3. Graphs of the MEL coefficients versus $\delta$ for the food stamp data set, for $\delta$ = 0.0001, 0.001, 0.005, 0.01, 0.05, and 0.1.

Fig. 4. Graphs of the first four MEL coefficients versus $\delta$ for the banknotes data set, for $\delta$ = 0.0001, 0.001, 0.005, 0.01, 0.05, and 0.1.


Let us illustrate the modified LTWS estimator on the toxoplasmosis data set (Efron 1986) of Section 5. In aggregated form this data set has $n = 34$ observations, with $m_i$ ranging from 1 to 82 with a mean of 20.5. We ran the modified LTWS method with the default choices $\delta = 0.01$ and $h = [0.75n] = 25$, which took only a few seconds because we used the FAST-LTS program (Rousseeuw and Van Driessen 1999b). The resulting coefficients were $(-0.37, -1.26, -0.17, 0.42)'$, which clearly differ from the non-robust coefficients given in Table 2. Of course, the odds ratios 0.69, 0.28, 0.84, and 1.52 based on the outlier-robust approach also differ from the non-robust odds ratios in Table 3. The observations 27, 28, and 30 stick out in the robust residual plot (Figure 5), which agrees with findings based on a robust minimum Hellinger distance approach (Christmann 1994b).

Fig. 5. Toxoplasmosis data: Index plot of standardized LTS residuals based on modified LTWS.

7 Discussion and outlook

The main problem addressed in this paper is that the coefficients of the binary regression model (with logistic or probit link function) cannot be estimated when the $x_i$'s of successes and failures don't overlap. This is a deficiency of the model itself, because the fit can be made perfect by letting $\|\theta\|$ tend to infinity. Therefore, this problem is shared by all reasonable estimators that operate under the logistic model.
Our approach to resolve this problem is to work with a generalized model, which we call the hidden logistic model. Here we compute the pseudo-observations $\tilde y_i$, defined as the probability that $y_i = 1$ conditional on the maximum likelihood estimate of the true status $t_i$. The resulting MEL estimator always exists and is unique, even though the hypothetical misclassification probabilities (based on our default setting $\delta = 1\%$) are so small that they would not be visible in the observed data.
The hidden logistic model was previously used (under a different name) in an important paper by Copas (1988). However, his approach and ours are almost diametrically opposite. Copas' motivation is to reduce the effect of the outliers that matter, which are the observations $(x_i, y_i)$ where $x_i$ is far away from the bulk of the data and $y_i$ has the value which is very unlikely under the logistic


model. In the terminology of Rousseeuw and van Zomeren (1990) these are bad leverage points. In logistic regression their effect is always to flatten the fit, i.e. to bring the estimated slopes close to zero. Copas' approach shrinks the logistic distribution function $\Lambda$ away from 0 and 1 (by letting it range between $\delta$ and $1 - \delta$), so that bad leverage points are no longer that unlikely under his model, which greatly reduces their effect. On the other hand, his approach aggravates the problems that arise when there is little overlap between successes and failures, as in his analysis of the vasoconstriction data.
Our approach goes in the other direction: rather than shrinking $\Lambda$ while leaving the responses $y_i$ unchanged, we leave $\Lambda$ unchanged and shrink the $y_i$ to the pseudo-observations $\tilde y_i$, which are slightly larger than zero or slightly less than 1. This completely eliminates the overlap problem. It does not help at all for the problem of bad leverage points, but for that problem we can use existing techniques from the robustness literature. For instance, for grouped data (i.e. tied $x_i$'s) we saw in Section 6 that the fitting can be done by the LTS regression method, which is robust against leverage points.
In general, also other robust techniques can be applied to the $(x_i, \tilde y_i)$. For instance, note that the score function (6) is similar to an M-estimator equation. Since the (pseudo-)residual is always bounded due to
$$|\tilde y_i - \Lambda(x_i'\theta)| < 1$$
the main problem comes from the factor $x_i$, which need not be bounded (this corresponds to the leverage point issue). A straightforward remedy is to downweight leverage points, yielding the weighted maximum estimated likelihood (WEMEL) estimator defined as the solution $\hat\theta$ of
$$\sum_{i=1}^{n} (\tilde y_i - \Lambda(x_i'\theta))\, w_i x_i = 0 \qquad (11)$$
where the weights $w_i$ only depend on how far away $x_i$ is from the bulk of the data. For instance, we can put
$$w_i = \frac{M}{\max\{RD^2(x_i), M\}} \qquad (12)$$
where $x_i = (x_{i,2}, \ldots, x_{i,p}) \in \mathbb{R}^{p-1}$ collects the non-intercept regressors, $RD(x_i)$ is its robust distance, and $M$ is the 75th percentile of all $RD^2(x_j)$, $j = 1, \ldots, n$.
When all regressor variables are continuous and there are not more than (say) 30 of them, we can use the robust distances that come out of the minimum covariance determinant (MCD) estimator of Rousseeuw (1984), for which the fast algorithm of Rousseeuw and Van Driessen (1999a) is available. This algorithm has been incorporated in the packages S-PLUS (as the function cov.mcd) and SAS/IML (as the routine MCD), and both provide the robust distances in their output. In case not all regressor variables are continuous or there are very many of them (even more than one thousand), we can use the robust distances provided by the robust principal components algorithm of Hubert, Rousseeuw and Verboven (2001).
We have not yet studied the WEMEL estimator in any detail, but we note that it is easy to compute because most GLM algorithms (including the one in S-PLUS) allow the user to input prior weights $w_i$.
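As a concrete (hypothetical) illustration, the weights (12) can be obtained in Python from scikit-learn's `MinCovDet` (whose `mahalanobis` method returns squared robust distances), after which (11) is solved by Newton iterations; the function names and defaults below are our own:

```python
import numpy as np
from sklearn.covariance import MinCovDet

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def wemel_weights(X_noint):
    """Leverage weights (12); X_noint excludes the intercept column."""
    mcd = MinCovDet(random_state=0).fit(X_noint)
    rd2 = mcd.mahalanobis(X_noint)        # squared robust distances RD^2
    M = np.percentile(rd2, 75)            # 75th percentile of the RD^2
    return M / np.maximum(rd2, M)

def wemel_fit(X, y_tilde, w, max_iter=100, tol=1e-10):
    """Solve the weighted score equation (11) by Newton-Raphson."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = logistic(X @ theta)
        score = X.T @ (w * (y_tilde - p))             # left-hand side of (11)
        H = -(X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(H, -score)
        theta += step
        if np.abs(step).max() < tol:
            break
    return theta
```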
We also have not yet addressed the issue of bias correction for either MEL or WEMEL, which is a subject for further research. It may be possible to apply the same type of calculus as for formula (27) of Copas (1988).
Last but not least are the computation of influence functions and breakdown values. It would be interesting to connect our work in the hidden logistic model with the existing body of literature on outlier detection and robust estimation in the classical logistic model, including the work of Pregibon (1982), Stefanski et al. (1986), Künsch et al. (1989), and Müller and Neykov (2000).

Acknowledgement

The second author was supported by the Deutsche Forschungsgemeinschaft (SFB 475, "Reduction of complexity in multivariate data structures").


References

A. Albert, J.A. Anderson (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71, 1-10.
A. Christmann (1994a). Least median of weighted squares in logistic regression with large strata. Biometrika, 81, 413-417.
A. Christmann (1994b). Ausreißeridentifikation im logistischen Regressionsmodell. In: S.J. Pöppl, H.-G. Lipinski, T. Mansky (Eds.), Medizinische Informatik: Ein integrierender Teil arztunterstützender Technologien. MMV Medizin Verlag, München, pp. 478-481.
A. Christmann (1998). On positive breakdown point estimators in regression models with discrete response variables. Habilitation thesis, University of Dortmund, Department of Statistics.
A. Christmann, P. Fischer, and T. Joachims (2000). Comparison between the regression depth method and the support vector machine to approximate the minimum number of misclassifications. Technical report 53/00, University of Dortmund, SFB 475, submitted.
A. Christmann, P.J. Rousseeuw (2001). Measuring overlap in logistic regression. To appear in Computational Statistics and Data Analysis.
J.B. Copas (1988). Binary regression models for contaminated data. With discussion. J. R. Statist. Soc. B, 50, 225-265.
B. Efron (1986). Double exponential families and their use in generalized linear regression. J. Amer. Statist. Assoc., 81, 709-721.
A. Ekholm, J. Palmgren (1982). A model for binary response with misclassification. In: R. Gilchrist, editor, GLIM-82: Proc. Int. Conf. Generalized Linear Models, pp. 128-143, Springer: Heidelberg.
D.J. Finney (1947). The estimation from individual records of the relationship between dose and quantal response. Biometrika, 34, 320-334.
J. Hermans, J.D.F. Habbema (1975). Comparison of five methods to estimate posterior probabilities. EDV in Medizin und Biologie, 6, 14-19.
M. Hubert, P.J. Rousseeuw, and S. Verboven (2001). A fast method for robust principal components with applications to chemometrics. To appear in Chemometrics and Intelligent Laboratory Systems.
H.J. Jaeger, T. Mair, M. Geller, R.K. Kinne, A. Christmann, K.D. Mathias (1997). A physiologic in vitro model of the inferior vena cava with a computer-controlled flow system for testing of inferior vena cava filters. Investigative Radiology, 32, 511-522.
H.J. Jaeger, S. Kolb, T. Mair, M. Geller, A. Christmann, R.K. Kinne, K.D. Mathias (1998). In vitro model for the evaluation of inferior vena cava filters: effect of experimental parameters on thrombus-capturing efficacy of the Vena Tech-LGM Filter. Journal of Vascular and Interventional Radiology, 9, 295-304.
H.R. Künsch, L.A. Stefanski, and R.J. Carroll (1989). Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. Amer. Statist. Assoc., 84, 460-466.
E.T. Lee (1974). A computer program for linear logistic regression analysis. Computer Programs in Biomedicine, 80-92.
C. Müller, N.M. Neykov (2000). Breakdown points of trimmed likelihood estimators and related estimators in generalized linear models. Technical report.
D. Pregibon (1981). Logistic regression diagnostics. Ann. Statist., 9, 705-724.
D. Pregibon (1982). Resistant fits for some commonly used logistic models with medical applications. Biometrics, 38, 485-498.
H. Riedwyl (1997). Lineare Regression und Verwandtes. Birkhäuser, Basel.
P.J. Rousseeuw (1984). Least median of squares regression. J. Amer. Statist. Assoc., 79, 871-880.
P.J. Rousseeuw, M. Hubert (1999). Regression depth. J. Amer. Statist. Assoc., 94, 388-433.
P.J. Rousseeuw, K. Van Driessen (1999a). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, 212-223.
P.J. Rousseeuw, K. Van Driessen (1999b). Computing LTS regression for large data sets. Technical Report, University of Antwerp, submitted.
P.J. Rousseeuw, B.C. van Zomeren (1990). Unmasking multivariate outliers and leverage points. J. Amer. Statist. Assoc., 85, 633-639.
T.J. Santner, D.E. Duffy (1986). A note on A. Albert and J.A. Anderson's conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika, 73, 755-758.
L.A. Stefanski, R.J. Carroll, and D. Ruppert (1986). Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika, 73, 413-424.
