
MULTICATEGORY LOGIT MODELS
Del Rosario, RP | Perez, JJ

Nominal Responses
One response variable Y with J levels
One or more explanatory or predictor variables
quantitative, qualitative or both

Logistic Regression

Forming Logits
When J = 2, Y is dichotomous
log odds that an event occurs rather than does not occur:

logit(π) = log[π / (1 − π)]

When J > 2, Y is a multicategory or polytomous response variable

There are J(J − 1)/2 logits that can be formed, but only (J − 1) are non-redundant.

Categorical Logit Models


Nominal response
Multinomial logistic regression / Baseline-category logits

Ordinal response
Ordinal logistic regression
Cumulative Logits / Proportional Odds Model
Adjacent-Categories Logits
Continuation-Ratio Logits

Multicategory Logits
Model all relationships between probabilities for pairs of categories simultaneously (vs fitting separate binary logistic regressions)
Optimal efficiency: estimates of the model parameters have smaller SEs than the estimates obtained by fitting the equations separately.
With simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is the baseline.

They describe the odds of response in one category rather than another.

Baseline Category Logits


Logit models for nominal responses pair each response category with a baseline category.
The choice of baseline category is arbitrary.
If the last category (J) is the baseline, the baseline-category logits are:

log(π_j / π_J),  j = 1, …, J − 1

Given that the response falls in category j or J, this is the log odds that the response is j.
For J = 3, for instance, the logit model uses log(π_1/π_3) and log(π_2/π_3).

Baseline Category Logits


The logit model using the baseline-category logits with a predictor x is

log(π_j / π_J) = α_j + β_j x,  j = 1, …, J − 1

Parameters in the (J − 1) equations determine parameters for logits using all other pairs of response categories.
For instance, for an arbitrary pair of categories a and b:

log(π_a / π_b) = log[(π_a/π_J) / (π_b/π_J)]
               = log(π_a / π_J) − log(π_b / π_J)
               = (α_a + β_a x) − (α_b + β_b x)
               = (α_a − α_b) + (β_a − β_b) x
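The algebra above can be checked numerically. Below is a minimal Python sketch with illustrative parameter values (the α and β values are made up, not from any fitted model): any pairwise logit computed from the category probabilities equals the difference of the corresponding baseline-category equations.

```python
import math

# Hypothetical baseline-category model with J = 3 and baseline category 3:
# log(pi_j / pi_3) = alpha_j + beta_j * x, so alpha_3 = beta_3 = 0.
alpha = {1: 1.0, 2: 0.5, 3: 0.0}   # illustrative values only
beta  = {1: -0.2, 2: 0.8, 3: 0.0}

def probs(x):
    """Category probabilities implied by the baseline-category logits."""
    num = {j: math.exp(alpha[j] + beta[j] * x) for j in alpha}
    den = sum(num.values())
    return {j: num[j] / den for j in num}

x = 1.7
p = probs(x)

# Logit for the arbitrary pair (a, b) = (1, 2) computed directly ...
direct = math.log(p[1] / p[2])
# ... equals the difference of the two baseline-category equations:
from_params = (alpha[1] - alpha[2]) + (beta[1] - beta[2]) * x
print(direct, from_params)
```

The two printed values agree, illustrating why only (J − 1) logits are non-redundant.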

Example 1: Alligator Food Choice


A study looking into factors influencing the primary food choice of alligators in the wild.
59 alligators were sampled; the data show the alligator length (in meters) and the primary food type, by volume, found in the alligator's stomach.
Food type has three categories: Fish (1), Invertebrate (2), and Other (3).

Example 1: Alligator Food Choice


Table 1. Alligator size (meter) and Primary food choice

. mlogit food size, b(3)


Iteration 0:  log likelihood = -57.570928
Iteration 1:  log likelihood = -49.97414
Iteration 2:  log likelihood = -49.186349
Iteration 3:  log likelihood = -49.170647
Iteration 4:  log likelihood = -49.170622
Iteration 5:  log likelihood = -49.170622

Using STATA

Multinomial logistic regression                 Number of obs =         59
                                                LR chi2(2)    =      16.80
                                                Prob > chi2   =     0.0002
Log likelihood = -49.170622                     Pseudo R2     =     0.1459

        food |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
        size |   -.110109    .517082    -0.21   0.831    -1.123571    .9033531
       _cons |   1.617731   1.307275     1.24   0.216    -.9444801    4.179943
-------------+----------------------------------------------------------------
2            |
        size |  -2.465446   .8996503    -2.74   0.006    -4.228728    -.702164
       _cons |   5.697444   1.793809     3.18   0.001     2.181644    9.213244
-------------+----------------------------------------------------------------
3            |  (base outcome)

. estat ic

       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+-------------------------------------------------------------
           . |     59   -57.57093   -49.17062      4   106.3412   114.6514
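The AIC and BIC reported by estat ic can be reproduced from the log likelihood. A quick Python check, assuming k = 4 estimated parameters (two non-baseline equations, each with an intercept and a slope):

```python
import math

# Values from the mlogit/estat ic output above:
ll, k, n = -49.170622, 4, 59   # log likelihood, free parameters, observations

aic = -2 * ll + 2 * k               # AIC = -2*ll + 2k
bic = -2 * ll + k * math.log(n)     # BIC = -2*ll + k*ln(n)
print(round(aic, 4), round(bic, 4))  # 106.3412 114.6514
```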

Using R

Example 1: Alligator Food Choice


Y = primary food choice; X = length of alligator
Estimated log odds that the primary food choice of alligators is fish rather than other types:

log(π_1 / π_3) = 1.618 − 0.110x

Estimated log odds that the primary food choice of alligators is invertebrate rather than other types:

log(π_2 / π_3) = 5.697 − 2.465x

Example 1: Alligator Food Choice


What about the estimated log odds that the primary food choice of alligators is fish rather than invertebrate?

log(π_1 / π_2) = (1.618 − 5.697) + (−0.110 − (−2.465))x

log(π_1 / π_2) = −4.080 + 2.355x

For every 1 meter increase in length of the alligator, the odds of choosing fish rather than an invertebrate as primary food multiply by e^2.355 = 10.54.
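The subtraction of the two fitted equations can be verified directly. A short Python check using the unrounded coefficients from the Stata output:

```python
import math

# Fitted baseline-category equations from the Stata output (baseline = Other):
a1, b1 = 1.617731, -0.110109   # log(pi_fish / pi_other)
a2, b2 = 5.697444, -2.465446   # log(pi_invert / pi_other)

# Subtracting the two equations gives log(pi_fish / pi_invert):
a, b = a1 - a2, b1 - b2
print(f"{a:.3f} {b:.3f}")        # -4.080 2.355
print(f"{math.exp(b):.2f}")      # odds multiplier per meter: 10.54
```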

Example 1: Alligator Food Choice


Hypothesis testing on the effect of length as predictor:

H0: β_j = 0 for j = 1, 2
LR = 16.8, p = 0.0002
Strong evidence of an effect of alligator length on food choice
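The LR statistic is twice the gap between the fitted and null log likelihoods shown in the iteration log. Since the test has 2 degrees of freedom, the chi-square p-value has a closed form; a stdlib-only Python check:

```python
import math

# Log likelihoods from the mlogit iteration log:
ll_null, ll_model = -57.570928, -49.170622
lr = 2 * (ll_model - ll_null)          # LR statistic, df = 2
# For a chi-square with 2 df the survival function is exp(-x/2):
p = math.exp(-lr / 2)
print(round(lr, 2), round(p, 4))       # 16.8 0.0002
```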

Estimated Probabilities
π_j = exp(α_j + β_j x) / Σ_{h=1}^{J} exp(α_h + β_h x)

Denominator: the same for each probability
Numerators: over j = 1, …, J, they sum to the denominator
Parameters: zero for whichever category is the baseline in the logit expressions

Estimated Probabilities
π̂_1 = e^{1.62 − 0.11x} / [1 + e^{1.62 − 0.11x} + e^{5.70 − 2.47x}]

π̂_2 = e^{5.70 − 2.47x} / [1 + e^{1.62 − 0.11x} + e^{5.70 − 2.47x}]

π̂_3 = 1 / [1 + e^{1.62 − 0.11x} + e^{5.70 − 2.47x}]
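These expressions can be evaluated at any length x. A Python sketch using the rounded coefficients above (the chosen length of 2.0 meters is just for illustration):

```python
import math

def probs(x):
    """Estimated food-choice probabilities for an alligator of length x meters
    (rounded coefficients from the fitted model; baseline = Other)."""
    num_fish   = math.exp(1.62 - 0.11 * x)
    num_invert = math.exp(5.70 - 2.47 * x)
    den = 1 + num_fish + num_invert
    return num_fish / den, num_invert / den, 1 / den

p_fish, p_invert, p_other = probs(2.0)   # a 2-meter alligator, for illustration
print(round(p_fish, 3), round(p_invert, 3), round(p_other, 3))
# The three estimated probabilities always sum to 1:
assert abs(p_fish + p_invert + p_other - 1) < 1e-9
```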

Example 2: Job Satisfaction and Income


The researchers seek to find the relationship between Y = job satisfaction and X1 = income, stratified by X2 = gender (1 = F, 2 = M), for black Americans.

. mlogit satisfaction income gender [weight=count], b(1)
(frequency weights assumed)

Iteration 0:  log likelihood = -107.39082
Iteration 1:  log likelihood = -103.35145
Iteration 2:  log likelihood = -102.92608
Iteration 3:  log likelihood = -102.91365
Iteration 4:  log likelihood = -102.91362
Iteration 5:  log likelihood = -102.91362

Multinomial logistic regression                 Number of obs =        104
                                                LR chi2(6)    =       8.95
                                                Prob > chi2   =     0.1762
Log likelihood = -102.91362                     Pseudo R2     =     0.0417

satisfaction |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |  (base outcome)
-------------+----------------------------------------------------------------
2            |
      income |   .9239423   .7752856     1.19   0.233    -.5955895    2.443474
      gender |   .1239678   1.317757     0.09   0.925    -2.458788    2.706724
       _cons |   -.583335   1.990687    -0.29   0.769    -4.485009    3.318339
-------------+----------------------------------------------------------------
3            |
      income |   1.157282   .7388206     1.57   0.117    -.2907792    2.605344
      gender |    .005601    1.22245     0.00   0.996    -2.390357    2.401559
       _cons |   .5385145   1.842591     0.29   0.770    -3.072897    4.149926
-------------+----------------------------------------------------------------
4            |
      income |   1.560782   .7659445     2.04   0.042     .0595581    3.062005
      gender |   .1884805   1.286052     0.15   0.883    -2.332134    2.709095
       _cons |   -1.81048   1.977129    -0.92   0.360    -5.685582    2.064621

. estat ic

       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+-------------------------------------------------------------
           . |    104   -107.3908   -102.9136      9   223.8272   247.6267

Example 2: Job Satisfaction and Income

log(π_j / π_1) = α_j + β_j^I I + β_j^G G,  j = 2, 3, 4

I = Income
β_j^I is the conditional log odds ratio between income and job satisfaction categories j & 1 (2 & 1, 3 & 1, 4 & 1), given gender.
G = Gender
β_j^G is the conditional log odds ratio between gender and job satisfaction categories j & 1 (2 & 1, 3 & 1, 4 & 1), given income.
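Exponentiating the income coefficients from the Stata output turns the conditional log odds ratios into odds ratios. A short Python sketch:

```python
import math

# Income coefficients from the Stata output (baseline satisfaction = 1):
beta_income = {2: 0.9239423, 3: 1.157282, 4: 1.560782}

# exp(beta) = multiplicative change in the odds of satisfaction category j
# versus category 1 for a one-unit increase in income, holding gender fixed.
odds_ratios = {j: round(math.exp(b), 2) for j, b in beta_income.items()}
print(odds_ratios)   # e.g. category 4 vs 1: exp(1.5608) is about 4.76
```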

Models for Ordinal Responses

Cumulative Logit Models


Logits can utilize ordered categories
This results in models with simpler interpretations and potentially greater power than baseline-category logit models.

A cumulative probability for Y is the probability that Y is less than or equal to a certain value. In notation, for j = 1, 2, …, J,

P(Y ≤ j) = P(Y = 1) + P(Y = 2) + … + P(Y = j)
         = π_1 + π_2 + … + π_j

The cumulative probabilities reflect the ordering:

P(Y ≤ 1) ≤ P(Y ≤ 2) ≤ … ≤ P(Y ≤ J) = 1

Models for cumulative probabilities do not use P(Y ≤ J), because it necessarily equals 1.

Cumulative Logit Models


The logits of the first J − 1 cumulative probabilities are

logit[P(Y ≤ j)] = log[ P(Y ≤ j) / (1 − P(Y ≤ j)) ]
               = log[ (π_1 + … + π_j) / (π_{j+1} + … + π_J) ],  j = 1, …, J − 1

These are called cumulative logits.

logit[P(Y ≤ j)] is like an ordinary logit for a binary response, i.e. categories 1 to j combine to form the first category, and categories j + 1 to J form the second.
Each cumulative logit uses all response categories.
For J = 3, both logit[P(Y ≤ 1)] = log[π_1/(π_2 + π_3)] and logit[P(Y ≤ 2)] = log[(π_1 + π_2)/π_3] are used.
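The definition can be sketched in Python; the probability vector below is illustrative, not from the lecture examples:

```python
import math

# Illustrative category probabilities for J = 3:
pi = [0.2, 0.5, 0.3]

def cumulative_logits(pi):
    """Cumulative logits logit[P(Y <= j)] for j = 1, ..., J-1."""
    logits = []
    cum = 0.0
    for p in pi[:-1]:
        cum += p
        logits.append(math.log(cum / (1 - cum)))
    return logits

l1, l2 = cumulative_logits(pi)
# For J = 3 these are log[pi_1/(pi_2+pi_3)] and log[(pi_1+pi_2)/pi_3]:
print(round(l1, 4), round(l2, 4))
```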

Proportional Odds Property


For a predictor x, the cumulative logit model is given by:

logit[P(Y ≤ j)] = α_j + βx,  j = 1, …, J − 1

Notice that β does not have a subscript j, which implies that the value of β is constant for all J − 1 cumulative logits.
When the model fits well, a single parameter instead of J − 1 parameters is enough to describe the effect of x.
The curves of the cumulative probabilities have the same shape/slope/rate of change but different locations depending on j.

Proportional Odds Property


At any fixed x value, the ordering is retained, with P(Y ≤ 1) being the lowest.
This is the case when β > 0.
When β < 0, the curves are descending.

When β = 0, the graph has a horizontal line for each cumulative probability.
This implies that X and Y are statistically independent.

Proportional Odds Property


P(Y = j) = P(Y ≤ j) − P(Y ≤ j − 1)
These category probabilities are graphed in the figure on the right.
The graph shown is for β > 0.
As x increases, the probability of falling in a lower category increases as well.
This runs against the usual interpretation that a positive slope implies a positive association.
When β < 0, the labels in the figure are reversed.

Proportional Odds Property


Consider the odds ratio of the cumulative probabilities at two values x_1 and x_2.
Take the logarithm on both sides and simplify:

log[ P(Y ≤ j | X = x_2) / P(Y > j | X = x_2) ] − log[ P(Y ≤ j | X = x_1) / P(Y > j | X = x_1) ]
= (α_j + βx_2) − (α_j + βx_1) = β(x_2 − x_1)

Thus, the log OR is the difference between the cumulative logits at those two values of x, and is equal to β(x_2 − x_1).
This is the proportional odds assumption.
The log OR is proportional to the distance between any two x values.
For x_2 − x_1 = 1, the odds of response below any given category multiply by exp(β) for every unit increase in x.
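The proportional odds property can be verified numerically: under the model, the log OR between any two x values is the same for every cutpoint j. A Python sketch with illustrative α_j and β (not estimates from the lecture examples):

```python
import math

# Proportional-odds model sketch: logit[P(Y <= j)] = alpha_j + beta * x
alpha = [-1.0, 0.5]          # illustrative cutpoints for J = 3
beta = 0.8                   # illustrative common slope

def cum_prob(j, x):
    """P(Y <= j) under the proportional-odds model (j is 0-based here)."""
    return 1 / (1 + math.exp(-(alpha[j] + beta * x)))

x1, x2 = 1.0, 3.0
for j in range(2):
    p1, p2 = cum_prob(j, x1), cum_prob(j, x2)
    log_or = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    # The log odds ratio equals beta * (x2 - x1) for every cutpoint:
    print(j + 1, round(log_or, 6))
```

Both cutpoints print the same value, β(x_2 − x_1) = 1.6.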

Estimated Probabilities
The model expression for the cumulative probabilities is:

P(Y ≤ j) = exp(α_j + βx) / [1 + exp(α_j + βx)]

To estimate the category probabilities, use P(Y = j) = P(Y ≤ j) − P(Y ≤ j − 1).
For example, P(Y = 1) = P(Y ≤ 1) and P(Y = J) = 1 − P(Y ≤ J − 1).
Example: All explanatory variables are categorical

A study looks at factors that influence the decision of whether college juniors will apply to graduate school. The response is ordinal, with "very likely" at the highest end of the scale.
Because all variables are categorical, the data can be entered in a contingency table.

Apply to Grad School

Parental    Undergrad                 Somewhat    Very
Education   institution   Unlikely    Likely      Likely
Low         Private          175         98          20
Low         Public            25         12           7
High        Private           14         26          10
High        Public             6          4           3

Example: Cont.
Ensure that dataset is in case or expanded form before using
polr.

Example: Cont. (Using polr)


R command is polr (proportional odds logistic regression) from the
nnet package. Format of dataset should be in case form.

Example: Cont. (using polr)

The exponentiated coefficients of the last output are called proportional odds ratios.
For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 3.07 times greater among students with high parental education, given that all the other variables in the model are held constant.
Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 3.07 times greater among students with high parental education, given that all of the other variables in the model are held constant.

Example: Cont. (using vglm from VGAM package)

Example: Cont. (in Stata)

Example: w/ continuous predictor (Using polr)

Example: Cont.

The exponentiated coefficients of the last output are called proportional odds ratios.
For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 2.85 times greater among students with high parental education, given that all the other variables in the model are held constant.
Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 2.85 times greater among students with high parental education, given that all of the other variables in the model are held constant.

For gpa, when a student's GPA moves 1 unit, the odds of moving from "unlikely" applying to "somewhat likely" or "very likely" applying (or from the lower and middle categories to the high category) are multiplied by 1.85.

Example: Cont. (using vglm)

Example: Cont. (using Stata)

Inference on Model Parameters


Testing for independence (H0: β = 0)
The test statistic is the difference between the deviance for the independence model and for the model allowing an explanatory variable.
If the p-value < level of significance, H0 is rejected and we conclude that an association exists.

Tests of independence on an ordinal scale take the ordering of the response categories into account.
When the model fits, this test is more powerful than tests of independence for nominal data, because:
it focuses on a restricted alternative via P(Y ≤ j)
it has only a single degree of freedom (recall that β is the same for all J − 1 cumulative logits)

Inference on Model Parameters


Testing H0: β = 0

Inference on Model Parameters


Testing the assumption of proportional odds
Our model with a constant β will only hold if the proportional odds assumption is not violated. If it is violated, it would be better to get individual estimates β_j for each j.
Agresti suggests an LR test between the vglm model with parallel=TRUE, for simultaneous fitting (a single β), and with parallel=FALSE, for individual fitting (separate estimates β_j).
The assumption of proportionality fails, for example, when the cumulative probability curves intersect (recall the earlier graph).
This occurs, for instance, when males tend toward the moderate responses of the ordinal scale, whereas females tend toward both extreme responses.

Inference on Model Parameters

H0: The model without the additional parameters β_j is sufficient.

The p-value does not reject the null hypothesis. There is no need to estimate individual β_j's; the single β is enough.
Alternatives if the proportional odds assumption is violated:
Run the model with individual β_j's. (Issues: increase in SEs, decrease in power)
Run the model using baseline-category logits and use the ordinality in an informal way to interpret the association. (Issues: increase in number of parameters, less parsimonious)
Collapse the multicategory response to binary. (Issues: loss of efficiency, loss of data)

Invariance
Invariance to choice of response categories
Situation: Researcher A used a 5-point Likert scale (SD, D, N, A, SA). Researcher B conducted a similar study but used a 3-point Likert scale (D, N, A). If the proportional odds assumption is not violated, the estimated effect of a predictor is roughly the same in both studies.

This feature of the model makes it possible to compare estimates from studies using different response scales.

Paired-Category Ordinal Logits

ADJACENT-CATEGORIES LOGITS
The adjacent-categories logits are:

log(π_{j+1} / π_j),  j = 1, …, J − 1

For J = 3, the logits are log(π_2/π_1) and log(π_3/π_2).

The corresponding model is

log(π_{j+1} / π_j) = α_j + β_j x

Paired-Category Ordinal Logits


A simpler proportional odds version of the model is

log(π_{j+1} / π_j) = α_j + βx

For it, the effects (β_j = β) of x on the odds of making the higher instead of the lower response are identical for each pair of adjacent response categories.
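A small Python sketch of the adjacent-categories logits (the probability vector is illustrative); note that they telescope to a logit comparing the two extreme categories:

```python
import math

# Illustrative probabilities for J = 3 (not from the lecture examples):
pi = [0.25, 0.45, 0.30]

# Adjacent-categories logits: log(pi_{j+1} / pi_j), j = 1, ..., J-1
adjacent = [math.log(pi[j + 1] / pi[j]) for j in range(len(pi) - 1)]
print([round(a, 4) for a in adjacent])

# Their sum telescopes to log(pi_J / pi_1), the extreme-categories logit:
print(round(sum(adjacent), 4), round(math.log(pi[-1] / pi[0]), 4))
```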

Example
Stem Cell Research and Religious Fundamentalism

Example: Cont.

Paired-Category Ordinal Logits


CONTINUATION-RATIO LOGITS
Another approach forms logits for ordered response categories in a sequential manner. The models apply simultaneously to:

log[π_2 / π_1],  log[π_3 / (π_1 + π_2)],  …,  log[π_J / (π_1 + … + π_{J−1})]

These are called continuation-ratio logits.

They refer to a binary response that contrasts each category with a grouping of categories from lower levels of the response scale.

Example
Tonsil Size and Streptococcus

Example: Cont.
