
LOGISTIC REGRESSION-PART 4

Kami Memimpin We Lead


Multicategory Logit Models
Models For Nominal Responses
• Y is nominal with J categories
• Let $\pi_1, \ldots, \pi_J$ denote the response probabilities,
with $\pi_1 + \cdots + \pi_J = 1$
• If we have n independent observations based on
these probabilities, the probability distribution
for the number of outcomes of each of the J
types is the multinomial distribution.
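
As a small illustration of the multinomial distribution named above, the sketch below evaluates the probability of one particular set of counts. The probabilities (0.5, 0.3, 0.2) and the counts are hypothetical values chosen for the example, not data from these slides.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` outcomes of each type in
    sum(counts) independent trials with category probabilities `probs`."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! * ... * n_J!)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p ** c for p, c in zip(probs, counts))

# Hypothetical example: J = 3 categories, n = 3 observations,
# one observation falling in each category.
probs = (0.5, 0.3, 0.2)
p_111 = multinomial_pmf((1, 1, 1), probs)
print(p_111)
```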

• Multicategory (or polychotomous) logit models
simultaneously refer to all pairs of categories
• They describe the odds of response in one
category rather than another
• Once the model specifies logits for a certain
(J-1) pairs of categories, the rest are redundant

• Logit models for nominal responses pair each
response category with a baseline category
• The choice of baseline category is arbitrary
• If the last category (J) is the baseline, the
baseline-category logits are
$$\log\left(\frac{\pi_j}{\pi_J}\right), \quad j = 1, \ldots, J-1$$

• Given that the response falls in category j or J,
this is the log odds that the response is j
• For J = 3, for instance, the logit model uses
$\log(\pi_1/\pi_3)$ and $\log(\pi_2/\pi_3)$
• The logit model using the baseline-category
logits with a predictor x has the form
$$\log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \beta_j x, \quad j = 1, \ldots, J-1$$

• Parameters in the (J-1) equations determine
parameters for logits using all other pairs of
response categories
• For instance, for an arbitrary pair of categories a
and b
$$\log\left(\frac{\pi_a}{\pi_b}\right) = \log\left(\frac{\pi_a/\pi_J}{\pi_b/\pi_J}\right) = \log\left(\frac{\pi_a}{\pi_J}\right) - \log\left(\frac{\pi_b}{\pi_J}\right)$$
$$= (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b)x$$
• The logit equation for categories a and b has
intercept parameter $(\alpha_a - \alpha_b)$ and slope parameter
$(\beta_a - \beta_b)$
• For optimal efficiency, one should fit the J-1 logit
equations simultaneously
• Estimates of the model parameters will then have
smaller standard errors than the estimates obtained
by fitting the equations separately
• For simultaneous fitting, the same parameter
estimates occur for a pair of categories no matter
which category is baseline.
Alligator Food Choice Example
• The data is taken from a study by the Florida
Game and Fresh Water Fish Commission of
factors influencing the primary food choice of
alligators
• For 59 alligators sampled in Lake George,
Florida, the data give the alligator's length (in meters)
and the primary food type, by volume, found in
the alligator's stomach
• Primary food type has three categories: Fish,
Invertebrate, and Other

Reading The Data
data gator;
input length choice $ @@;
datalines;
1.24 I 1.30 I 1.30 I 1.32 F 1.32 F 1.40 F 1.42 I 1.42 F
1.45 I 1.45 O 1.47 I 1.47 F 1.50 I 1.52 I 1.55 I 1.60 I
1.63 I 1.65 O 1.65 I 1.65 F 1.65 F 1.68 F 1.70 I 1.73 O
1.78 I 1.78 I 1.78 O 1.80 I 1.80 F 1.85 F 1.88 I 1.93 I
1.98 I 2.03 F 2.03 F 2.16 F 2.26 F 2.31 F 2.31 F 2.36 F
2.36 F 2.39 F 2.41 F 2.44 F 2.46 F 2.56 O 2.67 F 2.72 I
2.79 F 2.84 F 3.25 O 3.28 O 3.33 F 3.56 F 3.58 F 3.66 F
3.68 O 3.71 F 3.89 F
;

Fitting A Baseline-Category Logit Model
/* Baseline-category (generalized) logit model with "O" (Other) as the reference */
proc logistic data=gator descending;
   model choice (REFERENCE="O") = length / link=glogit scale=none aggregate;
run;

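• We applied the baseline-category logit model with J = 3, where
– Y = "primary food choice" is the response
– X = "length of alligator" is the predictor
• From the parameter estimates, with fish as category 1, invertebrate as
category 2, and other as the baseline category 3,
$$\log(\hat\pi_1/\hat\pi_3) = 1.618 - 0.110x$$
$$\log(\hat\pi_2/\hat\pi_3) = 5.697 - 2.465x$$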

• The estimated log odds that the response is fish
rather than invertebrate equals
$$\log(\hat\pi_1/\hat\pi_2) = (1.618 - 5.697) + [-0.110 - (-2.465)]x = -4.080 + 2.355x$$
• For each logit, one interprets the estimates just
as in ordinary binary logistic regression models,
conditional on the event that the response
outcome was one of those two categories
• For instance, given that the primary food type is
fish or invertebrate, the estimated probability
that it is fish increases with length x according to
a logistic curve
• For alligators of length x + 1 meters, the
estimated odds that primary food type is fish
rather than invertebrate equal exp(2.355) = 10.5
times the estimated odds for alligators of length
x meters
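
The fish-versus-invertebrate logit and the odds ratio of 10.5 can be checked with a few lines of arithmetic. This is only a sketch: it takes the two length slopes as negative (-0.110 and -2.465), which is what the stated slope difference of 2.355 implies, and it uses the rounded estimates, so the intercept comes out as -4.079 rather than the -4.080 shown on the slide.

```python
from math import exp

# Estimated baseline-category logits from the alligator fit
# (category 1 = fish, 2 = invertebrate, 3 = other as baseline).
alpha = {1: 1.618, 2: 5.697}
beta = {1: -0.110, 2: -2.465}

# Logit for fish vs. invertebrate: difference of the two baseline logits.
intercept_12 = alpha[1] - alpha[2]
slope_12 = beta[1] - beta[2]

# Multiplicative change in the odds of fish vs. invertebrate per extra meter.
odds_ratio_per_meter = exp(slope_12)

print(round(intercept_12, 3), round(slope_12, 3), round(odds_ratio_per_meter, 1))
```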

• To test the hypothesis that primary food choice
is independent of alligator length, we test
$H_0: \beta_j = 0$ for j = 1, 2 in the model
• The LR test takes twice the difference in
maximized log likelihoods between this model
and the simpler one having response
independent of length
• The test statistic equals 16.8 with df = 2, giving a
p-value of about 0.0002 and strong evidence of a length
effect
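
The p-value for this likelihood-ratio statistic can be computed without statistical tables, because a chi-squared variable with 2 degrees of freedom has the closed-form upper tail exp(-x/2). The helper name `chi2_sf_df2` is introduced here only for illustration.

```python
from math import exp

def chi2_sf_df2(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 2 df.
    For df = 2 the chi-squared distribution is exponential with mean 2,
    so the tail probability has the closed form exp(-x / 2)."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    return exp(-x / 2.0)

lr_statistic = 16.8  # likelihood-ratio statistic reported on the slide
p_value = chi2_sf_df2(lr_statistic)
print(f"{p_value:.4f}")
```

The result is far below the usual 0.05 threshold, so the hypothesis that food choice is independent of length is rejected.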
• One can express the multicategory logit model directly
in terms of the response probabilities, as
$$\pi_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{h=1}^{J} \exp(\alpha_h + \beta_h x)}, \quad j = 1, \ldots, J$$
• The denominator is the same for each probability
• The numerators for the various j sum to the denominator, so the
probabilities sum to 1
• The parameters equal zero in the above equation for
whichever category is the baseline in the logit
expressions

• For the alligator data, the estimated probabilities of
the outcomes equal
$$\hat\pi_1 = \frac{\exp(1.62 - 0.11x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$$
$$\hat\pi_2 = \frac{\exp(5.70 - 2.47x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$$
$$\hat\pi_3 = \frac{1}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)}$$

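• To graph the model-predicted probabilities
for each food choice by alligator length:
/* PLOTS=EFFECT draws the fitted probabilities against length */
proc logistic data=gator descending plots=effect;
   model choice (REFERENCE="O") = length / link=glogit scale=none aggregate;
   /* PREDPROBS=I saves the individual predicted probabilities of each category */
   output out=prob PREDPROBS=I;
run;
proc print data=prob;
run;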


Larger alligators clearly prefer to eat fish
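
The pattern in the plot can also be seen by evaluating the fitted probabilities at the smallest and largest lengths in the sample (1.24 m and 3.89 m). The sketch below uses the rounded estimates with the length slopes taken as negative (-0.11 and -2.47), consistent with the fish-versus-invertebrate slope of 2.355; the function name `gator_probs` is introduced here for illustration.

```python
from math import exp

def gator_probs(length_m):
    """Estimated probabilities (fish, invertebrate, other) at a given length
    in meters, using the rounded estimates with 'other' as the baseline."""
    fish = exp(1.62 - 0.11 * length_m)
    invertebrate = exp(5.70 - 2.47 * length_m)
    other = 1.0  # baseline category: exp(0)
    total = fish + invertebrate + other
    return fish / total, invertebrate / total, other / total

# Smallest and largest lengths in the sample
small = gator_probs(1.24)
large = gator_probs(3.89)
for x, (p_fish, p_inv, p_other) in ((1.24, small), (3.89, large)):
    print(f"{x:.2f} m: fish={p_fish:.2f} invertebrate={p_inv:.2f} other={p_other:.2f}")
```

Under these estimates the probability of fish rises from roughly one quarter for the smallest alligator to roughly three quarters for the largest, while invertebrates become very unlikely.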

Logit Models For Ordinal Responses
• When response categories are ordered, logits can
directly incorporate the ordering
• We can have models with simpler interpretations
• Define the j-th cumulative probability, the probability that the
response Y falls in category j or below, as
$$P(Y \le j) = \pi_1 + \cdots + \pi_j, \quad j = 1, \ldots, J$$

• The cumulative probabilities reflect the ordering,
with
$$P(Y \le 1) \le P(Y \le 2) \le \cdots \le P(Y \le J) = 1$$
• Models for cumulative probabilities do not use the
final one, $P(Y \le J)$, since it equals 1
• The logits of the first J-1 cumulative probabilities
are
$$\mathrm{logit}[P(Y \le j)] = \log\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \log\left(\frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}\right), \quad j = 1, \ldots, J-1$$
• These are called cumulative logits
• A model for the j-th cumulative logit looks like
an ordinary logit model for a binary response in
which categories 1 to j combine to form a single
category, and categories j + 1 to J form a second
category

• For a predictor X, the model
$$\mathrm{logit}[P(Y \le j)] = \alpha_j + \beta x, \quad j = 1, \ldots, J-1$$
has parameter $\beta$ describing the effect of X on
the log odds of response in category j or below
• This model assumes an identical effect of X for
all J-1 cumulative logits
• When this model fits well, it requires a single
parameter rather than J-1 parameters to
describe the effect of X
• This model refers to odds ratios for the collapsed
response scale, for any fixed j
• For two values x1 and x2 of X, the odds ratio utilizes the
cumulative probabilities and their complements
• We have
$$\log\left(\frac{P(Y \le j \mid X = x_2)/P(Y > j \mid X = x_2)}{P(Y \le j \mid X = x_1)/P(Y > j \mid X = x_1)}\right) = \beta(x_2 - x_1)$$
• Since the log odds ratio is proportional to the distance
between the x values, with the same proportionality
constant $\beta$ for any j, it is called a proportional odds
model
• For $x_2 - x_1 = 1$, the odds of response below any
given category multiply by $e^{\beta}$ for each unit
increase in X
• When the model holds with $\beta = 0$, X and Y are
statistically independent
• Explanatory variables in cumulative logit
models can be continuous, categorical or of
both types
• The ML fitting process uses an iterative
algorithm simultaneously for all j
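
The proportional odds property can be verified numerically: for any cutpoint j, the cumulative odds ratio between two x values depends only on their distance. The intercepts and slope below are hypothetical values chosen for illustration (J = 4, so three cutpoints); they are not estimates from these slides.

```python
from math import exp

def cumulative_prob(alpha_j, beta, x):
    """P(Y <= j | X = x) under logit[P(Y <= j)] = alpha_j + beta * x."""
    eta = alpha_j + beta * x
    return 1.0 / (1.0 + exp(-eta))

def cumulative_odds(alpha_j, beta, x):
    """Odds P(Y <= j) / P(Y > j) at X = x."""
    p = cumulative_prob(alpha_j, beta, x)
    return p / (1.0 - p)

# Hypothetical values for illustration only.
alphas = [-1.0, 0.2, 1.5]   # intercepts, increasing in j
beta = 0.7                  # common effect of X
x1, x2 = 2.0, 3.0           # one-unit increase in X

# The cumulative odds ratio is the same for every cutpoint j.
odds_ratios = [
    cumulative_odds(a, beta, x2) / cumulative_odds(a, beta, x1) for a in alphas
]
print(odds_ratios, exp(beta))
```

Each entry of `odds_ratios` agrees with `exp(beta)` up to floating-point error, which is the single-parameter effect the model assumes.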
Example: Political Ideology

• Political ideology is measured on a five-point ordinal scale,
ranging from very liberal to very conservative
• Political party has two categories: Democrat and
Republican

SAS Codes: Read The Data

data ideology;
input party $ ideology $ count @@;
datalines;
Demo VL 80 Demo SL 81 Demo M 171 Demo SC 41 Demo VC 55
Repub VL 30 Repub SL 46 Repub M 148 Repub SC 84 Repub VC 99
;

SAS Codes: Fit The Model
/* Cumulative logit (proportional odds) model; cell counts enter through FREQ */
proc logistic data=ideology order=data descending;
   freq count;
   class party (ref='Demo') / param=ref;
   model ideology = party / link=clogit scale=none;
run;

Response Profile

• Because the ORDER=DATA option is specified in
the PROC statement, the values of
ideology are ordered in the sequence in
which PROC LOGISTIC encounters them in
the data (VL, SL, M, SC, VC)
• The DESCENDING option then causes
PROC LOGISTIC to reverse this order,
so the ordered values run from VC
to VL

Fit Statistics

• Next, PROC LOGISTIC prints out a test for
the appropriateness of the proportional
odds assumption
• If you reject the null hypothesis, you reject
the assumption of proportional odds and
you need to consider a different approach

Testing For Effects

Parameter Estimation

Discussion
• The ML fit of the proportional odds model has
estimated effect $\hat\beta$ = 0.975 (ASE = 0.129)
• For any fixed j, the estimated odds that a
Republican's response is in the conservative
direction rather than the liberal direction equal
exp(0.975) = 2.65 times the estimated odds for
Democrats
• A 95% CI for this odds ratio equals
exp[0.975 ± 1.96(0.129)] = (2.1, 3.4)
• A fairly substantial association exists, with Democrats
tending to be more liberal than Republicans
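
The odds ratio and its 95% Wald interval follow directly from the reported estimate and standard error. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
from math import exp

beta_hat = 0.975   # estimated party effect reported on the slide
ase = 0.129        # its asymptotic standard error
z = 1.96           # standard normal quantile for a 95% Wald interval

odds_ratio = exp(beta_hat)
ci_lower = exp(beta_hat - z * ase)
ci_upper = exp(beta_hat + z * ase)

print(f"{odds_ratio:.2f} ({ci_lower:.1f}, {ci_upper:.1f})")
```

The interval excludes 1, which matches the conclusion that party and ideology are associated.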
• The cumulative probabilities equal
$$P(Y \le j \mid X = x) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)}$$
• The first estimated cumulative probability for
Republicans (VC) equals
$$\frac{\exp(-2.0440 + 0.9745)}{1 + \exp(-2.0440 + 0.9745)} = 0.25549$$
• The estimated probability of the j-th category
can be obtained as
$$P(Y \le j \mid X = x) - P(Y \le j-1 \mid X = x)$$
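• Using SAS, the estimated probabilities can be saved and plotted:
proc logistic data=ideology order=data descending plots=effect(polybar);
   freq count;
   class party (ref='Demo') / param=ref;
   model ideology = party / link=clogit scale=none;
   /* PREDPROBS=I saves the individual predicted probabilities of each category */
   output out=prob PREDPROBS=I;
run;
proc print data=prob;
run;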
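
The first cumulative probability, P(very conservative), can be checked by hand for both parties. This sketch assumes the reference coding implied by `ref='Demo'` (Democrat = 0, Republican = 1) and takes the first intercept as -2.0440, the sign needed to reproduce the value 0.25549 quoted for Republicans. The other intercepts are not given in the slides, so only the first category is computed.

```python
from math import exp

def first_cumulative_prob(is_republican):
    """Estimated P(Y <= 1), i.e. P(very conservative), from the fitted
    proportional odds model, assuming Democrat = 0 and Republican = 1."""
    alpha_1 = -2.0440   # first intercept (sign assumed, see lead-in)
    beta = 0.9745       # party effect
    eta = alpha_1 + beta * (1 if is_republican else 0)
    return exp(eta) / (1.0 + exp(eta))

p_repub = first_cumulative_prob(True)
p_demo = first_cumulative_prob(False)
print(f"Republican: {p_repub:.4f}, Democrat: {p_demo:.4f}")
```

The ratio of the two cumulative odds, p/(1 - p), recovers exp(0.9745), the estimated odds ratio for party.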


Thank You
