Logistic Regression-Part 4: Kami Memimpin We Lead


Multicategory Logit Models
Models For Nominal Responses
• Y is nominal with J categories
• Let $\pi_1, \ldots, \pi_J$ denote the response probabilities,
with $\pi_1 + \cdots + \pi_J = 1$
• If we have n independent observations based on
these probabilities, the probability distribution
for the number of outcomes of each of the J
types is the multinomial distribution
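
As a small illustration of the multinomial distribution described above, the sketch below evaluates the probability of one particular set of counts. The probabilities `pi` and the counts are hypothetical values chosen only for illustration; they are not taken from the slides.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(N_1 = n_1, ..., N_J = n_J) for n = sum(counts) independent observations."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact integer division: n! / (n_1! ... n_J!)
    return coef * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical response probabilities for J = 3 categories (illustrative only)
pi = [0.5, 0.3, 0.2]
print(multinomial_pmf([2, 1, 1], pi))
```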

• Multicategory (or polychotomous) logit models
simultaneously refer to all pairs of categories
• They describe the odds of response in one
category rather than another
• Once the model specifies logits for a certain
(J-1) pairs of categories, the rest are redundant

• Logit models for nominal responses pair each
response category with a baseline category
• The choice of baseline category is arbitrary
• If the last category (J) is the baseline, the
baseline-category logits are
\[ \log\left(\frac{\pi_j}{\pi_J}\right), \quad j = 1, \ldots, J-1 \]

• Given that the response falls in category j or J,
this is the log odds that the response is j
• For J = 3, for instance, the logit model uses
$\log(\pi_1/\pi_3)$ and $\log(\pi_2/\pi_3)$
• The logit model using the baseline-category
logits with a predictor x has the form
\[ \log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \beta_j x, \quad j = 1, \ldots, J-1 \]

• Parameters in the (J-1) equations determine
parameters for logits using all other pairs of
response categories
• For instance, for an arbitrary pair of categories a
and b
\[ \log\left(\frac{\pi_a}{\pi_b}\right) = \log\left(\frac{\pi_a/\pi_J}{\pi_b/\pi_J}\right) = \log\left(\frac{\pi_a}{\pi_J}\right) - \log\left(\frac{\pi_b}{\pi_J}\right) \]
\[ = (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b)x \]
• The logit equation for categories a and b has
intercept parameter $(\alpha_a - \alpha_b)$ and slope parameter
$(\beta_a - \beta_b)$
• For optimal efficiency, one should fit J-1 logit
equations simultaneously
• Estimates of the model parameters will then have
smaller standard error than the estimates obtained
by fitting the equations separately
• For simultaneous fitting, the same parameter
estimates occur for a pair of categories no matter
which category is baseline.
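
The two points above (any pairwise logit follows from the J-1 baseline logits, and the choice of baseline does not change it) can be checked numerically. This is a minimal sketch; the parameter values and the helper name `pair_logit_params` are hypothetical and used only for illustration.

```python
from math import isclose


def pair_logit_params(alpha, beta, a, b):
    """Intercept and slope of log(pi_a / pi_b) from baseline-category parameters.

    alpha and beta are dicts keyed by category; the baseline category has 0 and 0.
    """
    return alpha[a] - alpha[b], beta[a] - beta[b]


# Hypothetical parameters with category 3 as the baseline (illustrative only)
alpha = {1: 0.8, 2: -0.4, 3: 0.0}
beta = {1: 0.5, 2: 1.2, 3: 0.0}
print(pair_logit_params(alpha, beta, 1, 2))

# Same model re-expressed with category 2 as the baseline:
# subtract category 2's parameters from every category.
alpha_b2 = {k: v - alpha[2] for k, v in alpha.items()}
beta_b2 = {k: v - beta[2] for k, v in beta.items()}

# The logit for the pair (1, 2) is unchanged by the choice of baseline.
same_intercept = isclose(pair_logit_params(alpha, beta, 1, 2)[0],
                         pair_logit_params(alpha_b2, beta_b2, 1, 2)[0])
same_slope = isclose(pair_logit_params(alpha, beta, 1, 2)[1],
                     pair_logit_params(alpha_b2, beta_b2, 1, 2)[1])
print(same_intercept, same_slope)
```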
Alligator Food Choice Example
• The data is taken from a study by the Florida
Game and Fresh Water Fish Commission of
factors influencing the primary food choice of
alligators
• For 59 alligators sampled in Lake George,
Florida, it shows the alligator length (in meters)
and the primary food type, in volume, found in
the alligator's stomach
• Primary food type has three categories: Fish,
Invertebrate, and Other

Reading The Data
data gator;
input length choice $ @@;
datalines;
1.24 I 1.30 I 1.30 I 1.32 F 1.32 F 1.40 F 1.42 I 1.42 F
1.45 I 1.45 O 1.47 I 1.47 F 1.50 I 1.52 I 1.55 I 1.60 I
1.63 I 1.65 O 1.65 I 1.65 F 1.65 F 1.68 F 1.70 I 1.73 O
1.78 I 1.78 I 1.78 O 1.80 I 1.80 F 1.85 F 1.88 I 1.93 I
1.98 I 2.03 F 2.03 F 2.16 F 2.26 F 2.31 F 2.31 F 2.36 F
2.36 F 2.39 F 2.41 F 2.44 F 2.46 F 2.56 O 2.67 F 2.72 I
2.79 F 2.84 F 3.25 O 3.28 O 3.33 F 3.56 F 3.58 F 3.66 F
3.68 O 3.71 F 3.89 F
;

Fitting A Baseline-Category Logit Model
* Baseline-category (generalized) logit model with Other as the reference category;
proc logistic data=gator descending;
model choice (REFERENCE="O") = length / link=glogit scale=none aggregate;
run;





• We applied the baseline-category logit model with J = 3
– Y = "primary food choice" is the response (1 = Fish, 2 = Invertebrate, 3 = Other, the baseline)
– X = "length of alligator" is the predictor
• From the parameter estimates
\[ \log(\hat\pi_1/\hat\pi_3) = 1.618 - 0.110x, \qquad \log(\hat\pi_2/\hat\pi_3) = 5.697 - 2.465x \]

• The estimated log odds that the response is fish
rather than invertebrate equals
\[ \log(\hat\pi_1/\hat\pi_2) = (1.618 - 5.697) + [-0.110 - (-2.465)]x = -4.080 + 2.355x \]
• For each logit, one interprets the estimates just
as in ordinary binary logistic regression models,
conditional on the event that the response
outcome was one of those two categories
• For instance, given that the primary food type is
fish or invertebrate, the estimated probability
that it is fish increases in length x according to
an S-shaped logistic curve
• For alligators of length x + 1 meters, the
estimated odds that primary food type is fish
rather than invertebrate equal exp(2.355) = 10.5
times the estimated odds for alligators of length
x meters
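
The fish-versus-invertebrate interpretation above can be reproduced from the two fitted baseline logits, log(π̂1/π̂3) = 1.618 - 0.110x and log(π̂2/π̂3) = 5.697 - 2.465x. The sketch below is only a check on the arithmetic; the variable and function names are illustrative.

```python
import math

# Fitted baseline-category logits quoted in the slides (baseline = Other)
a_fish, b_fish = 1.618, -0.110   # log(pi_fish / pi_other)
a_inv, b_inv = 5.697, -2.465     # log(pi_invertebrate / pi_other)

# Fish versus invertebrate: difference of the two baseline logits
alpha_fi = a_fish - a_inv        # about -4.08
beta_fi = b_fish - b_inv         # 2.355
odds_ratio_per_meter = math.exp(beta_fi)


def p_fish_given_fish_or_inv(x):
    """Estimated P(fish | fish or invertebrate) at length x: a logistic curve in x."""
    return 1.0 / (1.0 + math.exp(-(alpha_fi + beta_fi * x)))


print(round(alpha_fi, 3), round(beta_fi, 3), round(odds_ratio_per_meter, 1))
for length in (1.5, 2.0, 2.5):
    print(length, round(p_fish_given_fish_or_inv(length), 3))
```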

• To test the hypothesis that primary food choice
is independent of alligator length, we test
$H_0: \beta_j = 0$ for j = 1, 2 in the model
• The LR test takes twice the difference in
maximized log likelihoods between this model
and the simpler one having response
independent of length
• The test statistic equals 16.8 with df = 2, giving a
p-value of about 0.0002 and strong evidence of a length
effect
• One can express the multicategory logit model directly
in terms of the response probabilities, as
\[ \pi_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{h=1}^{J} \exp(\alpha_h + \beta_h x)}, \quad j = 1, \ldots, J-1 \]
• The denominator is the same for each probability
• The numerators for various j sum to the denominator
• The parameters equal zero in the above equation for
whichever category is the baseline in the logit
expressions
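
For the likelihood-ratio test of the length effect (statistic 16.8 with df = 2), the p-value can be checked by hand. The sketch below relies on the fact that the chi-squared survival function with 2 degrees of freedom is exp(-x/2), so no statistics library is needed; this shortcut applies only to df = 2.

```python
import math

lr_statistic = 16.8  # twice the difference in maximized log likelihoods
df = 2               # one slope per non-baseline logit

# For df = 2 only: P(chi-squared > x) = exp(-x / 2)
p_value = math.exp(-lr_statistic / 2)
print(f"{p_value:.4f}")
```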

• For the alligator data, the estimated probabilities
of the outcomes equal
\[ \hat\pi_1 = \frac{\exp(1.62 - 0.11x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)} \]
\[ \hat\pi_2 = \frac{\exp(5.70 - 2.47x)}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)} \]
\[ \hat\pi_3 = \frac{1}{1 + \exp(1.62 - 0.11x) + \exp(5.70 - 2.47x)} \]
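
The estimated response probabilities for the alligator data can be evaluated at any length with a few lines of code. This is a sketch using the rounded coefficients from the slides (1.62, -0.11, 5.70, -2.47); the function name `gator_probs` is illustrative.

```python
import math


def gator_probs(x):
    """Estimated food-choice probabilities at length x (meters); Other is the baseline."""
    e_fish = math.exp(1.62 - 0.11 * x)
    e_inv = math.exp(5.70 - 2.47 * x)
    denom = 1.0 + e_fish + e_inv  # the baseline category contributes exp(0) = 1
    return {"Fish": e_fish / denom,
            "Invertebrate": e_inv / denom,
            "Other": 1.0 / denom}


# Smallest and largest lengths in the sample
for length in (1.24, 3.89):
    probs = gator_probs(length)
    print(length, {k: round(v, 3) for k, v in probs.items()})
```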

• A graph of the model-predicted probabilities
for each food choice by alligator length
proc logistic data=gator descending plots=effect;
model choice (REFERENCE="O") = length / link=glogit
scale=none aggregate;
output out = prob PREDPROBS=I;
run;
proc print data=prob;
run;

Larger alligators clearly prefer to eat fish

Logit Models For Ordinal Responses
• When response categories are ordered, logits can
directly incorporate the ordering
• We can have models with simpler interpretations
• Define the j-th cumulative probability that the
response Y falls in category j or below as
\[ P(Y \le j) = \pi_1 + \cdots + \pi_j, \quad j = 1, \ldots, J \]

• The cumulative probabilities reflect the ordering,
with
\[ P(Y \le 1) \le P(Y \le 2) \le \cdots \le P(Y \le J) = 1 \]
• Models for cumulative probabilities do not use the
final one, $P(Y \le J)$, since it equals 1
• The logits of the first J-1 cumulative probabilities
are
\[ \mathrm{logit}[P(Y \le j)] = \log\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \log\left(\frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}\right), \quad j = 1, \ldots, J-1 \]
• These are called cumulative logits
• A model for the j-th cumulative logit looks like
an ordinary logit model for a binary response in
which categories 1 to j combine to form a single
category, and categories j + 1 to J form a second
category
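
To make the cumulative logits concrete, the sketch below computes them from a vector of category probabilities by collapsing categories 1 to j against categories j+1 to J. The probabilities in `pi` are hypothetical and chosen only for illustration.

```python
import math


def cumulative_logits(probs):
    """Return logit[P(Y <= j)] for j = 1, ..., J-1."""
    logits = []
    cum = 0.0
    for p in probs[:-1]:          # the last cumulative probability equals 1 and is skipped
        cum += p
        logits.append(math.log(cum / (1.0 - cum)))
    return logits


# Hypothetical probabilities for J = 5 ordered categories (illustrative only)
pi = [0.1, 0.2, 0.4, 0.2, 0.1]
print([round(v, 3) for v in cumulative_logits(pi)])
```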

• For a predictor X, the model
\[ \mathrm{logit}[P(Y \le j)] = \alpha_j + \beta x, \quad j = 1, \ldots, J-1 \]
has parameter $\beta$ describing the effect of X on
the log odds of response in category j or below
• This model assumes an identical effect of X for
all J-1 cumulative logits
• When this model fits well, it requires a single
parameter rather than J-1 parameters to
describe the effect of X
• This model refers to odds ratios for the collapsed
response scale, for any fixed j
• For two values $x_1$ and $x_2$ of X, the odds ratio utilizes the
cumulative probabilities and their complements
• We have
\[ \log\left[\frac{P(Y \le j \mid X = x_2)/P(Y > j \mid X = x_2)}{P(Y \le j \mid X = x_1)/P(Y > j \mid X = x_1)}\right] = \beta(x_2 - x_1) \]
• Since the log odds ratio is proportional to the distance
between the x values, with the same proportionality
constant $\beta$ for any j, it is called a proportional odds
model
• For $x_2 - x_1 = 1$, the odds of response below any
given category multiply by $e^{\beta}$ for each unit
increase in X
• When the model holds with $\beta = 0$, X and Y are
statistically independent
• Explanatory variables in cumulative logit
models can be continuous, categorical or of
both types
• The ML fitting process uses an iterative
algorithm simultaneously for all j
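
The proportional odds property stated above (the same multiplicative effect on the cumulative odds at every cutpoint j) can be verified numerically. In this sketch the intercepts `alphas` and slope `beta` are hypothetical values, not estimates from any data set.

```python
import math


def cumulative_odds(alpha_j, beta, x):
    """Odds of Y <= j versus Y > j under logit[P(Y <= j)] = alpha_j + beta * x."""
    p = 1.0 / (1.0 + math.exp(-(alpha_j + beta * x)))
    return p / (1.0 - p)


# Hypothetical cutpoint intercepts (J = 4 categories) and common slope
alphas = [-1.5, 0.0, 1.2]
beta = 0.7

# Cumulative odds ratio for a one-unit increase in x, at each cutpoint j
ratios = [cumulative_odds(a, beta, 2.0) / cumulative_odds(a, beta, 1.0) for a in alphas]
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))
```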
Example: Political Ideology

• Political ideology uses a five-point ordinal scale,
ranging from very liberal to very conservative
• For political party we have Democrats and
Republicans

SAS Codes: Read The Data
data ideology;
input party $ ideology $ count @@;
datalines;
Demo VL 80 Demo SL 81 Demo M 171 Demo SC 41 Demo VC 55
Repub VL 30 Repub SL 46 Repub M 148 Repub SC 84 Repub VC 99
;

SAS Codes: Fit The Model
* Cumulative logit (proportional odds) model with Democrats as the reference party;
proc logistic data = ideology order=data descending;
freq count;
class party (ref='Demo') /param = ref;
model ideology = party /link=clogit scale=none;
run;

Response Profile

• Because the ORDER=DATA option is specified in
the PROC statement, the values of ideology
are ordered in the sequence in
which PROC LOGISTIC encounters them in
the data (VL, SL, M, SC, VC)
• The DESCENDING option then reverses
that order, so the ordered values run
from VC (first) to VL (last)

Fit Statistics

• Next, PROC LOGISTIC prints out a test for
the appropriateness of the proportional
odds assumption
• If you reject the null hypothesis, you reject
the assumption of proportional odds and
you need to consider a different approach
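
Alongside the formal test, an informal descriptive check is to compute the sample cumulative odds ratio at each cutpoint from the raw counts and see whether the values are roughly constant. The sketch below uses the counts from the data step, listed in the order VC, SC, M, SL, VL; it is not the test that PROC LOGISTIC reports.

```python
# Counts from the data step, reordered as VC, SC, M, SL, VL
demo = [55, 41, 171, 81, 80]
repub = [99, 84, 148, 46, 30]


def sample_cumulative_odds(counts):
    """Sample odds of falling at or before each cutpoint (J-1 values)."""
    total = sum(counts)
    odds = []
    running = 0
    for c in counts[:-1]:
        running += c
        odds.append(running / (total - running))
    return odds


# Republican-to-Democrat cumulative odds ratio at each of the four cutpoints
sample_ors = [r / d for r, d in zip(sample_cumulative_odds(repub),
                                    sample_cumulative_odds(demo))]
print([round(v, 2) for v in sample_ors])
```

If the four ratios are of similar size, a single common effect is a plausible summary of the party difference.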

Testing For Effects

Parameter Estimation

Discussion
• The ML fit of the proportional odds model has
estimated effect $\hat\beta = 0.975$ (ASE = 0.129)
• For any fixed j, the estimated odds that a
Republican's response is in the conservative
direction rather than the liberal direction equal
exp(0.975) = 2.65 times the estimated odds for
Democrats
• A 95% CI for this odds ratio equals
exp[0.975 ± 1.96(0.129)] = (2.1, 3.4).
• A fairly substantial association exists, Democrats
tending to be more liberal than Republicans.
• The cumulative probabilities equal
\[ P(Y \le j \mid X = x) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)} \]
• The first estimated cumulative probability for
Republicans (VC) equals
\[ \frac{\exp(-2.0440 + 0.9745)}{1 + \exp(-2.0440 + 0.9745)} = 0.25549 \]
• The estimated probability of the j-th category
can be obtained as
\[ P(Y = j \mid X = x) = P(Y \le j \mid X = x) - P(Y \le j-1 \mid X = x) \]
• Using SAS
proc logistic data = ideology order=data descending plots=effect (polybar);
freq count;
class party (ref='Demo') /param = ref;
model ideology = party /link=clogit scale=none;
output out = prob PREDPROBS=I;
run;
proc print data=prob;
run;
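
The numbers in the discussion can also be reproduced by hand. The sketch below recomputes the odds ratio and its 95% CI from the reported estimate (0.975, ASE 0.129), then turns cumulative probabilities into category probabilities by differencing. Only the first intercept (-2.0440) and the slope (0.9745) come from the slides; the remaining intercepts are hypothetical placeholders, so only the first category probability corresponds to the fitted model.

```python
import math

# Odds ratio and Wald 95% CI from the reported estimate and ASE
beta_hat, ase = 0.975, 0.129
odds_ratio = math.exp(beta_hat)
ci_low = math.exp(beta_hat - 1.96 * ase)
ci_high = math.exp(beta_hat + 1.96 * ase)
print(round(odds_ratio, 2), round(ci_low, 1), round(ci_high, 1))


def category_probs(alphas, beta, x):
    """Category probabilities obtained by differencing cumulative probabilities."""
    cum = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in alphas] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# First intercept from the slides; the other three are HYPOTHETICAL placeholders
alphas = [-2.0440, -1.0, 0.5, 1.5]
probs_repub = category_probs(alphas, 0.9745, x=1)  # x = 1 for Republicans
probs_demo = category_probs(alphas, 0.9745, x=0)   # x = 0 for Democrats (reference)
print(round(probs_repub[0], 4), round(probs_demo[0], 4))
```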



Thank You
