Dichotomous Responses
Carolyn J. Anderson
Department of Educational Psychology
Overview
■ Review and some uses & examples.
■ Interpreting logistic regression models.
■ Inference for logistic regression.
■ Model checking.

Topics: Qualitative Explanatory Variables; Multiple Logistic Regression; Model Selection; The Tale of the Titanic; Final Few Remarks on Logistic Regression.
For example, in the High School and Beyond data set we could
look at whether students who attend academic versus
non-academic programs differ in terms of
■ School type (public or private).
■ Race (4 levels).
■ Career choice (11 levels).
■ SES level (3 levels).
Observed counts (program type by SES level and school type):

SES Level   School Type   Non-Academic   Academic     ni
Low         public              91           40       131
            private              4            4         8
Middle      public             138          111       249
            private             14           36        50
High        public              44           82       126
            private              1           35        36
Total                                                 600
X1 = 1 if public
   = 0 if private

logit(π) = α + β1 x1 + β2 s1 + β3 s2

■ exp(β1) = the conditional odds ratio between program type and school type, given SES.
For example, for low SES,

(odds academic | public, low) / (odds academic | private, low)
    = exp(α + β1 + β2) / exp(α + β2)
    = (e^α e^β1 e^β2) / (e^α e^β2)
    = e^β1
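The cancellation can be checked numerically. Here is a quick Python sketch (ours, not part of the original notes), using the estimates reported later for this model, α̂ = 2.1107, β̂1 = −1.3856, β̂2 = −1.5844:

```python
import math

# Estimates from the fitted model reported later in the notes.
alpha, b1, b2 = 2.1107, -1.3856, -1.5844

odds_public_low = math.exp(alpha + b1 + b2)   # odds(academic | public, low)
odds_private_low = math.exp(alpha + b2)       # odds(academic | private, low)

# The intercept and the SES term cancel, leaving e^{beta1}.
ratio = odds_public_low / odds_private_low
print(round(ratio, 2), round(math.exp(b1), 2))  # both ≈ 0.25
```

So the estimated conditional odds of attending an academic program are about a quarter as large for public-school students as for private-school students, at every SES level.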
■ exp(β2) = the conditional odds ratio between program type and low versus high SES given school type,

exp(β2) = e^β2 = (odds academic)|low / (odds academic)|high    (given school type)

■ exp(β3) = the conditional odds ratio between program type and middle versus high SES given school type,

exp(β3) = e^β3 = (odds academic)|middle / (odds academic)|high    (given school type)
■ Answer: Homogeneous Association. So if a logit model with only main effects for the (qualitative) explanatory variables fits a 3-way table, then we know that the table displays homogeneous association.
Goodness of fit:

Statistic          df    Value       p-value
X²                  2    3.748        .15
G² (deviance)       2    4.6219       .10
Log likelihood          −375.3239
logit(π̂i) = 2.1107 − 1.3856x1i − 1.5844s1i − .9731s2i

The constraints do not affect
■ Estimated fitted/predicted values of π (or logit(π)), and therefore do not affect goodness-of-fit statistics or residuals.
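As a sketch (in Python, ours rather than the notes'), the fitted probabilities for all six cells follow directly from the estimated equation; for example, π̂ for (public, low) comes out near .297, versus the observed proportion 40/131 ≈ .305:

```python
import math

def expit(eta):
    """Inverse logit: probability from a linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

# Estimates from the fitted equation above.
alpha, b1, b2, b3 = 2.1107, -1.3856, -1.5844, -0.9731

def fitted_pi(school, ses):
    """Fitted probability of attending an academic program for one cell."""
    x1 = 1 if school == "public" else 0
    s1 = 1 if ses == "low" else 0
    s2 = 1 if ses == "middle" else 0       # high SES is the reference
    return expit(alpha + b1 * x1 + b2 * s1 + b3 * s2)

for school in ("public", "private"):
    for ses in ("low", "middle", "high"):
        print(school, ses, round(fitted_pi(school, ses), 3))
```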
Parameter estimates under different identification constraints
(first category = 0; last category = 0; sum to zero):

            First = 0    Last = 0    Sum = 0
Public        0.0000     -1.3856     -.6928
Private       1.3856      0.0000      .6928
Low SES       0.0000     -1.5844     -.7319
Mid SES        .6113      -.9731     -.1206
High SES      1.5844      0.0000      .8525
Multiple logistic regression model as a GLM:
■ Random component: the Binomial distribution (the response variable is dichotomous).
■ Systematic component: a linear predictor with more than one variable,

    α + β1 x1 + β2 x2 + . . . + βk xk

■ Link: the logit,

    logit(π) = α + β1 x1 + β2 x2 + . . . + βk xk
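As a small illustration (a Python sketch, not from the notes), the logit link and its inverse:

```python
import math

def expit(eta):
    """Inverse logit: maps a linear predictor in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(pi):
    """Logit link: maps a probability in (0, 1) to the log-odds scale."""
    return math.log(pi / (1.0 - pi))

# The link and its inverse compose to the identity.
eta = 0.7
print(logit(expit(eta)))  # ≈ 0.7
```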
■ Socioeconomic status or “s”, where

    s1 = 1 if Low, 0 otherwise        s2 = 1 if Middle, 0 otherwise
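This dummy coding is easy to sketch in code (the function name here is ours, purely illustrative):

```python
def ses_dummies(level):
    """Dummy-code SES with 'high' as the reference category."""
    s1 = 1 if level == "low" else 0      # indicator for Low SES
    s2 = 1 if level == "middle" else 0   # indicator for Middle SES
    return s1, s2

# High SES is the reference: both indicators are zero.
print(ses_dummies("low"), ses_dummies("middle"), ses_dummies("high"))
# → (1, 0) (0, 1) (0, 0)
```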
M0 is a special case of M1; M0 is nested within M1. We can test whether imposing equal spacing between categories of SES leads to a significant reduction in fit using a likelihood-ratio test, ΔG² = −2(L0 − L1).
Qualitative Explanatory
Variables
Model Selection
Qualitative Explanatory
Variables
Model Selection
Overivew
Qualitative Explanatory
Variables
Model Selection
Overivew
Qualitative Explanatory
Variables
Model Selection
                            Standard      Wald
Parameter     DF  Estimate    Error    Chi-Square  Pr > ChiSq
Intercept      1    6.6794   0.8029      69.2142      <.0001
math           1   -0.1006   0.0118      72.6982      <.0001
ses            1   -0.9768   0.2644      13.6467      0.0002
sctyp 1        1   -0.5663   0.5611       1.0186      0.3128
ses*sctyp 1    1    0.5964   0.2646       5.0792      0.0242
Adding in reading also leads to a nice model.
2. If there is an interaction between a continuous and a discrete variable, then the curves would cross.
3. Interaction between 2 continuous variables:
   ■ Plot π̂ versus values of 1 variable for selected levels of the other variable (e.g., low, middle, and high values of the “other” variable).
   ■ If there is no interaction between the variables, the curves will be parallel.
   ■ If there is an interaction between the continuous variables, the curves will cross.
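This check is easy to sketch numerically. In the Python sketch below the coefficients are hypothetical, chosen only to illustrate the pattern: without an interaction the two curves are a constant distance apart on the logit scale, while an interaction makes the gap change with x, so the curves cross or diverge.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only for illustration.
a, b1, b2, b3 = -1.0, 0.8, 0.5, 0.6

def pi_hat(x, z, interaction=False):
    """Fitted probability at covariate x and level z of the other variable."""
    eta = a + b1 * x + b2 * z + (b3 * x * z if interaction else 0.0)
    return expit(eta)

def logit_gap(x, interaction=False):
    """Gap between the z = 1 and z = 0 curves on the log-odds scale."""
    p1, p0 = pi_hat(x, 1, interaction), pi_hat(x, 0, interaction)
    return math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))

# No interaction: the gap is constant (parallel logit curves).
print([round(logit_gap(x), 6) for x in (0.0, 1.0, 2.0)])        # all 0.5
# Interaction: the gap changes with x (here it is b2 + b3*x).
print([round(logit_gap(x, True), 6) for x in (0.0, 1.0, 2.0)])
```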
■ Weight
■ Age

The correlations:

           systolic   diastolic   weight
systolic     1.00        .802      .186

If you put all 6 variables in the model, age ends up being the only one that really looks significant. Age is correlated with both blood pressure measurements and weight.
When two models differ by a single term, the change in degrees of freedom will always equal 1 (i.e., Δdf = 1). Therefore, we could just look at ΔG². When this is not the case, you should use p-values.

                                Models       −2(L0 − L1)
Model             −2L          Compared        = ΔG²       p-value    R²
(1) MSP          659.210          —              —            —       .25
(2) MS, MP, SP   659.226      (2) − (1)         .016         .899     .24
(3a) MS, MP      664.972      (3a) − (2)       5.746         .017     .24
(3b) MS, SP      659.288      (3b) − (2)        .062         .803     .25
(3c) MP, SP      659.386      (3c) − (2)        .160         .689     .25
(4a) M, SP       659.441      (4a) − (3b)       .153         .696     .25
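Each comparison in the table drops a single term (Δdf = 1), so each p-value is the upper tail of a χ² distribution with 1 df. Since χ²(1) is the square of a standard normal, that tail probability can be computed with only the standard library via the complementary error function; a quick Python sketch (ours, not from the notes):

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x), using the identity chi2(1) = Z^2."""
    return math.erfc(math.sqrt(x / 2.0))

# Reproduce the p-values in the table from the Delta G² column:
for delta_g2 in (0.016, 5.746, 0.062, 0.160, 0.153):
    print(round(chi2_sf_df1(delta_g2), 3))
# prints 0.899, 0.017, 0.803, 0.689, 0.696
```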
■ For example, suppose that you have 6 possible predictors. Fit the
1. Most complex model.
2. Delete the 6-way interaction.
3. Delete all the 5-way interactions.
4. Delete all 4-way interactions.
5. etc.
■ What you should NOT do is let a computer algorithm do stepwise regression.
Desirable criteria for an R²-type measure:
■ R² must possess utility as a measure of goodness of fit & have an intuitively reasonable interpretation.
■ R² should be dimensionless.
■ R² should have a well-defined range, with the endpoints denoting a perfect relationship (e.g., −1 ≤ R² ≤ 1 or 0 ≤ R² ≤ 1).
■ R² should be general enough to apply to any type of model (e.g., random or fixed predictors).
■ R² shouldn’t depend on the method used to fit the model to data.
■ R² values for different models fit to the same data set should be directly comparable.
■ Relative values of R² should be comparable.
■ Positive and negative residuals should be equally weighted by R².
■ Description:
  The Titanic was billed as the ship that would never sink. On her maiden voyage, she set sail from Southampton to New York. On April 14th, 1912, at 11:40pm, the Titanic struck an iceberg and at 2:20 a.m. sank. Of the 2228 passengers and crew on board, only 705 survived.
■ Data Available:
  ◆ n = 1046
  ◆ Y = survived (0 = no, 1 = yes)
  ◆ Explanatory variables that we’ll look at:
    ■ Pclass = Passenger class (1 = first class, 2 = second, 3 = third)
    ■ Sex = Passenger gender (1 = female, 2 = male)
    ■ Age in years.
                                         Standard      Wald
Parameter              DF   Estimate       Error    Chi-Square   Pr > ChiSq
pclass 1                1     0.6673      0.4104       2.6433        .10
pclass 2                1     0.9925      0.4061       5.9740        .01
sex female              1     1.1283      0.2559      19.4478       <.01
age                     1    −0.0419      0.00799     27.5016       <.01
pclass*sex 1 female     1     0.1678      0.1940       0.7480        .39
pclass*sex 2 female     1     0.6072      0.1805      11.3190       <.01
◆ Use the conditional probability distribution where you consider the “sufficient statistics” (statistics computed on the data that are needed to estimate certain model parameters) as being fixed.
◆ The conditional probability distribution and the maximized value of the conditional likelihood function depend only on the parameters that you’re interested in testing.
■ This only works when you use the canonical link for the random component.
■ The conditional method is especially useful for small samples. You can perform “exact” inference for a parameter by using a conditional likelihood function that eliminates all of the other parameters.
◆ LOGISTIC: You do not need the number of cases,
  model y = x1 x2 /<other options you might want> ;
■ Response pattern with counts (i.e., tabular form).
◆ GENMOD: You need a variable for the number of cases (ncases) that equals the frequency count,
  model y/ncases = x1 x2 / link=logit dist=binomial;
◆ LOGISTIC: You also give the number of cases (ncases), but do not need the link and distribution options,
  model y/ncases = x1 x2 /<other options you might want> ;
For both LOGISTIC and GENMOD, interactions are included by using the ∗ notation.
Example: High School and Beyond with school type and SES both as nominal variables. The model was

    logit(πij) = α + βi^P + βj^SES

So

    df = (# school types) × (# SES levels) − (# unique parameters)
       = (2 × 3) − [1 + (2 − 1) + (3 − 1)]
       = 2

In general, for a similar model with I and J levels,

    df = IJ − 1 − (I − 1) − (J − 1)
       = (I − 1)(J − 1)
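The general formula can be wrapped in a tiny helper (a Python sketch; the function name is ours):

```python
def logit_main_effects_df(I, J):
    """Residual df for a main-effects logit model fit to an I x J table."""
    cells = I * J                       # number of binomial logits modeled
    params = 1 + (I - 1) + (J - 1)      # intercept + dummy-coded main effects
    return cells - params               # algebraically (I - 1) * (J - 1)

# School type (I = 2) by SES (J = 3): df = 2, matching the fit statistics above.
print(logit_main_effects_df(2, 3))  # → 2
```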