
MULTICATEGORY LOGIT MODELS
Del Rosario, RP | Perez, JJ

Nominal Responses
One response variable Y with J levels
One or more explanatory or predictor variables
quantitative, qualitative or both

Logistic Regression

Forming Logits
When J = 2, Y is dichotomous
log odds that an event occurs rather than does not occur:

logit(π) = log[π / (1 − π)]

When J > 2, Y is a multicategory or polytomous response variable

There are J(J − 1)/2 logits that can be formed, but only (J − 1) are non-redundant.

Categorical Logit Models


Nominal response
Multinomial logistic regression / Baseline-category logits

Ordinal response
Ordinal logistic regression
Cumulative Logits / Proportional Odds Model
Adjacent-Categories Logits
Continuation-Ratio Logits

Multicategory Logits
Model all relationships between probabilities for pairs of categories simultaneously (vs fitting separate binary logistic regressions)
Optimal efficiency: estimates of the model parameters have smaller SEs than the estimates obtained by fitting the equations separately.
With simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is the baseline.

They describe the odds of response in one category rather than another.

Baseline Category Logits


Logit models for nominal responses pair each response category with a baseline category.
The choice of baseline category is arbitrary.
If the last category (J) is the baseline, the baseline-category logits are:

log(π_j / π_J),  j = 1, …, J − 1

Given that the response falls in category j or J, this is the log odds that the response is j.
For J = 3, for instance, the logit model uses log(π_1/π_3) and log(π_2/π_3).

Baseline Category Logits


The logit model using the baseline-category logits with a predictor x is

log(π_j / π_J) = α_j + β_j x,  j = 1, …, J − 1

Parameters in the (J − 1) equations determine parameters for logits using all other pairs of response categories.
For instance, for an arbitrary pair of categories a and b:

log(π_a / π_b) = log[(π_a/π_J) / (π_b/π_J)]
               = log(π_a / π_J) − log(π_b / π_J)
               = (α_a + β_a x) − (α_b + β_b x)
               = (α_a − α_b) + (β_a − β_b) x
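The algebra above can be checked numerically. Below is a minimal Python sketch with illustrative parameter values (the α and β values are made up, not from any fitted model): any pairwise logit computed from the category probabilities equals the difference of the corresponding baseline-category equations.

```python
import math

# Hypothetical baseline-category model with J = 3 and baseline category 3:
# log(pi_j / pi_3) = alpha_j + beta_j * x, so alpha_3 = beta_3 = 0.
alpha = {1: 1.0, 2: 0.5, 3: 0.0}   # illustrative values only
beta  = {1: -0.2, 2: 0.8, 3: 0.0}

def probs(x):
    """Category probabilities implied by the baseline-category logits."""
    num = {j: math.exp(alpha[j] + beta[j] * x) for j in alpha}
    den = sum(num.values())
    return {j: num[j] / den for j in num}

x = 1.7
p = probs(x)

# Logit for the arbitrary pair (a, b) = (1, 2) computed directly ...
direct = math.log(p[1] / p[2])
# ... equals the difference of the two baseline-category equations:
from_params = (alpha[1] - alpha[2]) + (beta[1] - beta[2]) * x
print(direct, from_params)
```

The two printed values agree, illustrating why only (J − 1) logits are non-redundant.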

Example 1: Alligator Food Choice


A study looking into factors influencing the primary food choice of alligators in the wild.
59 alligators were sampled; the data show the alligator length (in meters) and the primary food type, by volume, found in the alligator's stomach.
Food type has three categories: Fish (1), Invertebrate (2), and Other (3).

Example 1: Alligator Food Choice


Table 1. Alligator size (meter) and Primary food choice

. mlogit food size, b(3)


Iteration 0:  log likelihood = -57.570928
Iteration 1:  log likelihood = -49.97414
Iteration 2:  log likelihood = -49.186349
Iteration 3:  log likelihood = -49.170647
Iteration 4:  log likelihood = -49.170622
Iteration 5:  log likelihood = -49.170622

Using STATA

Multinomial logistic regression                 Number of obs =         59
                                                LR chi2(2)    =      16.80
                                                Prob > chi2   =     0.0002
Log likelihood = -49.170622                     Pseudo R2     =     0.1459

        food |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
        size |   -.110109    .517082    -0.21   0.831    -1.123571    .9033531
       _cons |   1.617731   1.307275     1.24   0.216    -.9444801    4.179943
-------------+----------------------------------------------------------------
2            |
        size |  -2.465446   .8996503    -2.74   0.006    -4.228728    -.702164
       _cons |   5.697444   1.793809     3.18   0.001     2.181644    9.213244
-------------+----------------------------------------------------------------
3            |  (base outcome)

. estat ic

       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+-------------------------------------------------------------
           . |     59   -57.57093   -49.17062      4   106.3412   114.6514
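The AIC and BIC reported by estat ic can be reproduced from the log likelihood. A quick Python check, assuming k = 4 estimated parameters (two non-baseline equations, each with an intercept and a slope):

```python
import math

# Values from the mlogit/estat ic output above:
ll, k, n = -49.170622, 4, 59   # log likelihood, free parameters, observations

aic = -2 * ll + 2 * k               # AIC = -2*ll + 2k
bic = -2 * ll + k * math.log(n)     # BIC = -2*ll + k*ln(n)
print(round(aic, 4), round(bic, 4))  # 106.3412 114.6514
```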

Using R

Example 1: Alligator Food Choice


Y = primary food choice; X = length of alligator
Estimated log odds that the primary food choice of alligators is fish rather than other types:

log(π_1 / π_3) = 1.618 − 0.110x

Estimated log odds that the primary food choice of alligators is invertebrate rather than other types:

log(π_2 / π_3) = 5.697 − 2.465x

Example 1: Alligator Food Choice


What about the estimated log odds that the primary food choice of alligators is fish rather than invertebrate?

log(π_1 / π_2) = (1.618 − 5.697) + (−0.110 − (−2.465))x

log(π_1 / π_2) = −4.080 + 2.355x

For every 1 meter increase in length of the alligator, the odds of choosing fish rather than an invertebrate as primary food multiply by e^2.355 = 10.54.
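The subtraction of the two fitted equations can be verified directly. A short Python check using the unrounded coefficients from the Stata output:

```python
import math

# Fitted baseline-category equations from the Stata output (baseline = Other):
a1, b1 = 1.617731, -0.110109   # log(pi_fish / pi_other)
a2, b2 = 5.697444, -2.465446   # log(pi_invert / pi_other)

# Subtracting the two equations gives log(pi_fish / pi_invert):
a, b = a1 - a2, b1 - b2
print(f"{a:.3f} {b:.3f}")        # -4.080 2.355
print(f"{math.exp(b):.2f}")      # odds multiplier per meter: 10.54
```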

Example 1: Alligator Food Choice


Hypothesis testing on the effect of length as predictor:

H0: β_j = 0 for j = 1, 2
LR = 16.8, p = 0.0002
Strong evidence of an effect of alligator length on food choice
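The LR statistic is twice the gap between the fitted and null log likelihoods shown in the iteration log. Since the test has 2 degrees of freedom, the chi-square p-value has a closed form; a stdlib-only Python check:

```python
import math

# Log likelihoods from the mlogit iteration log:
ll_null, ll_model = -57.570928, -49.170622
lr = 2 * (ll_model - ll_null)          # LR statistic, df = 2
# For a chi-square with 2 df the survival function is exp(-x/2):
p = math.exp(-lr / 2)
print(round(lr, 2), round(p, 4))       # 16.8 0.0002
```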

Estimated Probabilities
π_j = exp(α_j + β_j x) / Σ_{h=1}^{J} exp(α_h + β_h x)

Denominator: the same for each probability
Numerators: over j = 1, …, J, they sum to the denominator
Parameters: zero for whichever category is the baseline in the logit expressions

Estimated Probabilities
π̂_1 = e^{1.62 − 0.11x} / [1 + e^{1.62 − 0.11x} + e^{5.70 − 2.47x}]

π̂_2 = e^{5.70 − 2.47x} / [1 + e^{1.62 − 0.11x} + e^{5.70 − 2.47x}]

π̂_3 = 1 / [1 + e^{1.62 − 0.11x} + e^{5.70 − 2.47x}]
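These expressions can be evaluated at any length x. A Python sketch using the rounded coefficients above (the chosen length of 2.0 meters is just for illustration):

```python
import math

def probs(x):
    """Estimated food-choice probabilities for an alligator of length x meters
    (rounded coefficients from the fitted model; baseline = Other)."""
    num_fish   = math.exp(1.62 - 0.11 * x)
    num_invert = math.exp(5.70 - 2.47 * x)
    den = 1 + num_fish + num_invert
    return num_fish / den, num_invert / den, 1 / den

p_fish, p_invert, p_other = probs(2.0)   # a 2-meter alligator, for illustration
print(round(p_fish, 3), round(p_invert, 3), round(p_other, 3))
# The three estimated probabilities always sum to 1:
assert abs(p_fish + p_invert + p_other - 1) < 1e-9
```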

Example 2: Job Satisfaction and Income


The researchers seek to find the relationship between Y = job satisfaction and X1 = income, stratified by X2 = gender (1 = F, 2 = M), for black Americans.

. mlogit satisfaction income gender [weight=count], b(1)
(frequency weights assumed)

Iteration 0:  log likelihood = -107.39082
Iteration 1:  log likelihood = -103.35145
Iteration 2:  log likelihood = -102.92608
Iteration 3:  log likelihood = -102.91365
Iteration 4:  log likelihood = -102.91362
Iteration 5:  log likelihood = -102.91362

Multinomial logistic regression                 Number of obs =        104
                                                LR chi2(6)    =       8.95
                                                Prob > chi2   =     0.1762
Log likelihood = -102.91362                     Pseudo R2     =     0.0417

satisfaction |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |  (base outcome)
-------------+----------------------------------------------------------------
2            |
      income |   .9239423   .7752856     1.19   0.233    -.5955895    2.443474
      gender |   .1239678   1.317757     0.09   0.925    -2.458788    2.706724
       _cons |   -.583335   1.990687    -0.29   0.769    -4.485009    3.318339
-------------+----------------------------------------------------------------
3            |
      income |   1.157282   .7388206     1.57   0.117    -.2907792    2.605344
      gender |    .005601    1.22245     0.00   0.996    -2.390357    2.401559
       _cons |   .5385145   1.842591     0.29   0.770    -3.072897    4.149926
-------------+----------------------------------------------------------------
4            |
      income |   1.560782   .7659445     2.04   0.042     .0595581    3.062005
      gender |   .1884805   1.286052     0.15   0.883    -2.332134    2.709095
       _cons |   -1.81048   1.977129    -0.92   0.360    -5.685582    2.064621

. estat ic

       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+-------------------------------------------------------------
           . |    104   -107.3908   -102.9136      9   223.8272   247.6267

Example 2: Job Satisfaction and Income

log(π_j / π_1) = α_j + β_j^I I + β_j^G G,  j = 2, 3, 4

I = Income
β_j^I is the conditional log odds ratio between income and job satisfaction categories j & 1 (2 & 1, 3 & 1, 4 & 1), given gender.
G = Gender
β_j^G is the conditional log odds ratio between gender and job satisfaction categories j & 1 (2 & 1, 3 & 1, 4 & 1), given income.
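Exponentiating the income coefficients from the Stata output turns the conditional log odds ratios into odds ratios. A short Python sketch:

```python
import math

# Income coefficients from the Stata output (baseline satisfaction = 1):
beta_income = {2: 0.9239423, 3: 1.157282, 4: 1.560782}

# exp(beta) = multiplicative change in the odds of satisfaction category j
# versus category 1 for a one-unit increase in income, holding gender fixed.
odds_ratios = {j: round(math.exp(b), 2) for j, b in beta_income.items()}
print(odds_ratios)   # e.g. category 4 vs 1: exp(1.5608) is about 4.76
```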

Models for Ordinal Responses

Cumulative Logit Models


Logits can utilize ordered categories
This results in models with simpler interpretations and potentially greater power than baseline-category logit models.

A cumulative probability for Y is the probability that Y is less than or equal to a certain value. In notation, for j = 1, 2, …, J,

P(Y ≤ j) = P(Y = 1) + P(Y = 2) + … + P(Y = j)
         = π_1 + π_2 + … + π_j

The cumulative probabilities reflect the ordering:

P(Y ≤ 1) ≤ P(Y ≤ 2) ≤ … ≤ P(Y ≤ J) = 1

Models for cumulative probabilities do not use P(Y ≤ J), because it necessarily equals 1.

Cumulative Logit Models


The logits of the first J − 1 cumulative probabilities are

logit[P(Y ≤ j)] = log[ P(Y ≤ j) / (1 − P(Y ≤ j)) ]
               = log[ (π_1 + … + π_j) / (π_{j+1} + … + π_J) ],  j = 1, …, J − 1

These are called cumulative logits.

logit[P(Y ≤ j)] is like an ordinary logit for a binary response, i.e. categories 1 to j combine to form the first category, and categories j + 1 to J form the second.
Each cumulative logit uses all response categories.
For J = 3, both logit[P(Y ≤ 1)] = log[π_1/(π_2 + π_3)] and logit[P(Y ≤ 2)] = log[(π_1 + π_2)/π_3] are used.
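The definition can be sketched in Python; the probability vector below is illustrative, not from the lecture examples:

```python
import math

# Illustrative category probabilities for J = 3:
pi = [0.2, 0.5, 0.3]

def cumulative_logits(pi):
    """Cumulative logits logit[P(Y <= j)] for j = 1, ..., J-1."""
    logits = []
    cum = 0.0
    for p in pi[:-1]:
        cum += p
        logits.append(math.log(cum / (1 - cum)))
    return logits

l1, l2 = cumulative_logits(pi)
# For J = 3 these are log[pi_1/(pi_2+pi_3)] and log[(pi_1+pi_2)/pi_3]:
print(round(l1, 4), round(l2, 4))
```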

Proportional Odds Property


For a predictor x, the cumulative logit model is given by:

logit[P(Y ≤ j)] = α_j + βx,  j = 1, …, J − 1

Notice that β does not have a subscript j, which implies that the value of β is constant for all J − 1 cumulative logits.
When the model fits well, a single parameter instead of J − 1 parameters is enough to describe the effect of x.
The curves of the cumulative probabilities have the same shape/slope/rate of change but different locations depending on j.

Proportional Odds Property


At any fixed x value, the ordering is retained, with P(Y ≤ 1) being the lowest.
This is the case when β > 0.
When β < 0, the curves are descending.

When β = 0, the graph has a horizontal line for each cumulative probability.
This implies that X and Y are statistically independent.

Proportional Odds Property


P(Y = j) = P(Y ≤ j) − P(Y ≤ j − 1)
These category probabilities are graphed in the figure on the right.
The graph shown is for β > 0.
As x increases, the probability of falling in a lower category increases as well.
This runs against the usual interpretation that a positive slope implies a positive association.
When β < 0, the labels in the figure are reversed.

Proportional Odds Property


Consider the odds ratio of the cumulative probabilities at two values x_1 and x_2.
Take the logarithm on both sides and simplify:

log[ P(Y ≤ j | X = x_2) / P(Y > j | X = x_2) ] − log[ P(Y ≤ j | X = x_1) / P(Y > j | X = x_1) ]
= (α_j + βx_2) − (α_j + βx_1) = β(x_2 − x_1)

Thus, the log OR is the difference between the cumulative logits at those two values of x, and is equal to β(x_2 − x_1).
This is the proportional odds assumption.
The log OR is proportional to the distance between any two x values.
For x_2 − x_1 = 1, the odds of response below any given category multiply by exp(β) for every unit increase in x.
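The proportional odds property can be verified numerically: under the model, the log OR between any two x values is the same for every cutpoint j. A Python sketch with illustrative α_j and β (not estimates from the lecture examples):

```python
import math

# Proportional-odds model sketch: logit[P(Y <= j)] = alpha_j + beta * x
alpha = [-1.0, 0.5]          # illustrative cutpoints for J = 3
beta = 0.8                   # illustrative common slope

def cum_prob(j, x):
    """P(Y <= j) under the proportional-odds model (j is 0-based here)."""
    return 1 / (1 + math.exp(-(alpha[j] + beta * x)))

x1, x2 = 1.0, 3.0
for j in range(2):
    p1, p2 = cum_prob(j, x1), cum_prob(j, x2)
    log_or = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    # The log odds ratio equals beta * (x2 - x1) for every cutpoint:
    print(j + 1, round(log_or, 6))
```

Both cutpoints print the same value, β(x_2 − x_1) = 1.6.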

Estimated Probabilities
The model expression for the cumulative probabilities is:

P(Y ≤ j) = exp(α_j + βx) / [1 + exp(α_j + βx)]

To estimate the category probabilities, use P(Y = j) = P(Y ≤ j) − P(Y ≤ j − 1).
For example, P(Y = 1) = P(Y ≤ 1) and P(Y = J) = 1 − P(Y ≤ J − 1).
Example: All explanatory variables are categorical

A study looks at factors that influence the decision of whether college juniors will apply to graduate school. The response is ordinal, with "very likely" at the highest end of the scale.
Because all variables are categorical, the data can be entered in a contingency table.

Apply to Grad School

Parental    Undergrad                 Somewhat    Very
Education   institution   Unlikely    Likely      Likely
Low         Private          175         98          20
Low         Public            25         12           7
High        Private           14         26          10
High        Public             6          4           3

Example: Cont.
Ensure that dataset is in case or expanded form before using
polr.

Example: Cont. (Using polr)


R command is polr (proportional odds logistic regression) from the
nnet package. Format of dataset should be in case form.

Example: Cont. (using polr)

The exponentiated coefficients of the last output are called proportional odds ratios.
For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 3.07 times greater among students with high parental education, given that all the other variables in the model are held constant.
Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 3.07 times greater among students with high parental education, given that all of the other variables in the model are held constant.

Example: Cont. (using vglm from VGAM package)

Example: Cont. (in Stata)

Example: w/ continuous predictor (Using polr)

Example: Cont.

The exponentiated coefficients of the last output are called proportional odds ratios.
For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 2.85 times greater among students with high parental education, given that all the other variables in the model are held constant.
Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 2.85 times greater among students with high parental education, given that all of the other variables in the model are held constant.

For gpa, when a student's GPA moves 1 unit, the odds of moving from "unlikely" applying to "somewhat likely" or "very likely" applying (or from the lower and middle categories to the high category) are multiplied by 1.85.

Example: Cont. (using vglm)

Example: Cont. (using Stata)

Inference on Model Parameters


Testing for independence (H0: β = 0)
The test statistic is the difference between the deviance for the independence model and for the model allowing an explanatory variable.
If the p-value < level of significance, H0 is rejected and we conclude that an association exists.

Tests of independence on an ordinal scale take the ordering of the response categories into account.
When the model fits, this test is more powerful than tests of independence for nominal data, because:
it focuses on a restricted alternative via P(Y ≤ j)
it has only a single degree of freedom (recall that β is the same for all J − 1 cumulative logits)

Inference on Model Parameters


Testing H0: β = 0

Inference on Model Parameters


Testing the assumption of proportional odds
Our model with a constant β will only hold if the proportional odds assumption is not violated. If it is violated, it would be better to get individual estimates β_j for each j.
Agresti suggests an LR test between the vglm model with parallel=TRUE, for simultaneous fitting (a single β), and with parallel=FALSE, for individual fitting (separate estimates β_j).
The assumption of proportionality fails, for example, when the cumulative probability curves intersect (recall the earlier graph).
This occurs, for instance, when males tend toward the moderate responses of the ordinal scale, whereas females tend toward both extreme responses.

Inference on Model Parameters

H0: The model without the additional parameters β_j is sufficient.

The p-value does not reject the null hypothesis. There is no need to estimate individual β_j's; the single β is enough.
Alternatives if the proportional odds assumption is violated:
Run the model with individual β_j's. (Issues: increase in SEs, decrease in power)
Run the model using baseline-category logits and use the ordinality in an informal way to interpret the association. (Issues: increase in number of parameters, less parsimonious)
Collapse the multicategory response to binary. (Issues: loss of efficiency, loss of data)

Invariance
Invariance to choice of response categories
Situation: Researcher A used a 5-point Likert scale (SD, D, N, A, SA). Researcher B conducted a similar study but used a 3-point Likert scale (D, N, A). If the proportional odds assumption is not violated, the estimated effect of a predictor is roughly the same in both studies.

This feature of the model makes it possible to compare estimates from studies using different response scales.

Paired-Category Ordinal Logits

ADJACENT-CATEGORIES LOGITS
The adjacent-categories logits are:

log(π_{j+1} / π_j),  j = 1, …, J − 1

For J = 3, the logits are log(π_2/π_1) and log(π_3/π_2).

The corresponding model is

log(π_{j+1} / π_j) = α_j + β_j x

Paired-Category Ordinal Logits


A simpler proportional odds version of the model is

log(π_{j+1} / π_j) = α_j + βx

For it, the effects (β_j = β) of x on the odds of making the higher instead of the lower response are identical for each pair of adjacent response categories.
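A small Python sketch of the adjacent-categories logits (the probability vector is illustrative); note that they telescope to a logit comparing the two extreme categories:

```python
import math

# Illustrative probabilities for J = 3 (not from the lecture examples):
pi = [0.25, 0.45, 0.30]

# Adjacent-categories logits: log(pi_{j+1} / pi_j), j = 1, ..., J-1
adjacent = [math.log(pi[j + 1] / pi[j]) for j in range(len(pi) - 1)]
print([round(a, 4) for a in adjacent])

# Their sum telescopes to log(pi_J / pi_1), the extreme-categories logit:
print(round(sum(adjacent), 4), round(math.log(pi[-1] / pi[0]), 4))
```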

Example
Stem Cell Research and Religious Fundamentalism

Example: Cont.

Paired-Category Ordinal Logits


CONTINUATION-RATIO LOGITS
Another approach forms logits for ordered response categories in a sequential manner. The models apply simultaneously to:

log[π_2 / π_1],  log[π_3 / (π_1 + π_2)],  …,  log[π_J / (π_1 + … + π_{J−1})]

These are called continuation-ratio logits.

They refer to a binary response that contrasts each category with a grouping of categories from lower levels of the response scale.

Example
Tonsil Size and Streptococcus

Example: Cont.
