You are on page 1of 4

page 1

Chapter 21 Logistic Regression for Binomial Response Variables

Response variable is binomial counts, denoted by counts of binary variables. That is, if
X~bernoulli(p), then Y = ∑ X i ∼ Binomial ( n, p )

Example 1 Case Study 21.1: Island size and bird extinctions: On each island we count the
number of species that went extinct out all the species on the island. What is the relationship
between the area of an island and the probability of extinction of birds present on the island?

Example 2 Case study 21.2: Moth coloration and natural selection: At each distance from
Liverpool we count the number of moths from each morph that were taken by predators. What is
the relationship between the distance from Liverpool, where trees are dark from industrial soot,
and the probability of predation on the light and dark morphs of the moth Carbonaria?

Logistic regression model for binomial counts

Y = the number of successes in m binomial trials. For example, how many species went
extinct in the 10 year period of the study on each island?

Yi ~Binomial(mi,πi) where i is the ith island and πi is the probability of extinction on each

X 1 ,..., X p : explanatory variables, in the extinction example, X is the area of the island.

For a particular set of values of the explanatory variables,

µ (Y X 1 ,…, X p ) = π = probability of success.

Y/m = the binomial proportion. Note that the sample size in the bird extinction study is
the number of islands, not the number of species.

• We model µ (Y ) = π just as we did for the binary response model:

• logit(π) = η = β 0 + β1 X 1 + … + β p X p

• As before:

1 + eη
page 2
Continuous versus Counted Proportions:
Not all proportions are appropriate to model with logistic regression. We model proportions like
fat calories/total calories, etc., using normal theory, usually. The only proportions that are
appropriate in this context are those that result from an integer count of a certain outcome over
the total number of trials or outcomes.


Since the response variable, Y, is a Binomial random variable we have:

µ (Yi X 1i ,…, X pi ) = πi
SD (Yi X 1i ,… , X pi ) = miπ i (1 − π i )

Variance of Y is not constant across all i.

Estimation of Logistic Regression Coefficients

As for the binary response model we use the maximum likelihood estimators (MLE’s).

Model Assessment
Estimated versus observed: One was to assess the appropriateness of the model and the efficacy
of the estimation routine is to plot the estimated probability, πˆi , against the observed response
proportion, π i = i . Additionally, plots of the observed logits versus one or more of the
explanatory variables are useful for visual examination as we do for ordinary scatterplots in
linear regression. See Display 21.2.

Residual analysis: As in the binary response case, we have two widely used residuals for
binomial counts models.

There are two standard ways to define a residual in logistic regression.

yi − miπˆi
1. Pearson residual = .
miπˆi (1 − πˆi )

⎧⎪ ⎛ Y ⎞ ⎛ m − Y ⎞ ⎫⎪
2. Deviance residual = Dresi = sign (Yi − miπˆi ) 2 ⎨Yi log ⎜ i ⎟ + ( mi − Yi ) log ⎜ i ⎟⎬
⎩⎪ ⎝ miπˆi ⎠ ⎝ mi − miπˆi ⎠ ⎭⎪

The Pearson residual is more easily understood, but the deviance residual directly gives the
contribution of each point to the lack of fit of the model.
page 3
Since the data are grouped, the residuals in a binomial counts logistic regression (either Pearson
or deviance) are more useful than in the binary response regression.

The residuals should be plotted against the predicted values for πis and examined for outliers or
remaining patterns.

Comparing two models

As with the binary response case, we use the value –2ln(Maximized likelihood function) to
compare models. Recall that the MLE’s of the β i ’s are the values that maximize the likelihood
function of the data. So, we find that the values for the β i ’s that maximize the likelihood
function, take the natural log and multiply by –2.

• The quantity –2 ln(Maximized likelihood) is also called the deviance of a model since
larger values indicate greater deviation from the assumed model. Comparing two nested
models by the difference in deviances is a drop-in-deviance test.

• The difference between the values of –2ln(Maximized likelihood function) for a full and
reduced model has approximately a chi-square distribution if the null hypothesis that the
extra parameters are all 0 is true. The d.f. is the difference in the number of parameters
for the two models.

Model Selection
Both AIC and BIC can be used as model selection criteria. As with linear regression models,
they are only relative measures of fit, not absolute measures of fit.

AIC = Deviance + 2p

BIC = Deviance + plog(n)

where p is the number of parameters in the model. Stepwise model selection methods are
available in SPSS using likelihood ratio tests or Wald’s test. The LR methods are preferred.
Other software programs, like S-Plus, have stepwise procedures using AIC or BIC.

Goodness-of-fit Tests
Since we have multiple counts per cell there is a goodness-of-fit test similar to that in linear
regression. We can compare the model with the log odds or logit is linear in the parameters to the
model where each cell has a separate mean. So we are comparing the logistic regression model
with p predictors to the model n different parameters, where p is the number of predictors in the
logistic regression model and n is the number of categorical treatment combinations. That is we
are testing

Reduced model: logit(π i ) = β 0 + β1 x1 + ... + β p x p (p parameters)

Saturated model: logit(π i ) = α i (n parameters)
page 4

⎧⎪ ⎛ Y ⎞ ⎛ m −Y ⎞ ⎫⎪
The test statistic is D 2 = ∑ Di 2 = ∑ 2 ⎨Yi log ⎜ i ⎟ + ( mi − Yi ) log ⎜ i ⎟⎬
i i ⎩⎪ ⎝ miπˆi ⎠ ⎝ mi − miπˆi ⎠ ⎭⎪

Both the denominators, miπˆi and mi − miπˆi , need to be large for the distribution of the test
statistic to be approximately χ 2 n − p . The pvalue for the test is then Pr χ 2 n − p ≥ D 2 }
Wald Test and Confidence Intervals for Single Coefficients.
The Wald test performs similarly as in the binary counts case. The normal approximation used
by this test is adequate so long as n is moderately large and the mπ is greater than 5.

Fitting the model in Matlab.

Below is the code for fitting the bird extinction model in Matlab.

case2101 =[185.8000 75.0000 5.0000

105.8000 67.0000 3.0000
30.7000 66.0000 10.0000
8.5000 51.0000 6.0000
4.8000 28.0000 3.0000
4.5000 20.0000 4.0000
4.3000 43.0000 8.0000
3.6000 31.0000 3.0000
2.6000 28.0000 5.0000
1.7000 32.0000 6.0000
1.2000 30.0000 8.0000
0.7000 20.0000 2.0000
0.7000 31.0000 9.0000
0.6000 16.0000 5.0000
0.4000 15.0000 7.0000
0.3000 33.0000 8.0000
0.2000 40.0000 13.0000
0.0700 6.0000 3.0000];


[b,dev,stats]=glmfit(area,[extinct atrisk],'binomial');

x = 1:10:180;
y = glmval(b,x,'logit');