Categories    Description
0             Household without cars
1             Household owning 1 car
2             Household owning 2 cars
3             Household owning 3 cars
4 or more     Household owning 4 or more cars
In the above example, there is a natural ordering, which suggests that households categorized as 2 own more
cars than the households categorized as 1 or 0. In this particular case, we cannot arbitrarily change the order.
Such data, where the order of outcomes is not arbitrary, but rather systematic, is called ordered data, which is
also a type of categorical data.
The use of categorical variables, such as gender, as explanatory variables is common in OLS regression
models. As an explanatory variable, a categorical variable captures the systematic differences latent in data
that cannot be accounted for by other variables in the model. For instance, if there is a systematic difference
Variable Description
lfp Paid Labor Force: 1=yes 0=no
k5 Number of children less than 6 years old
k618 Children between 6 and 18 years of age
age Wife's age in years
wc Wife College: 1=yes 0=no
hc Husband College: 1=yes 0=no
lwg Log of wife's estimated wages
inc Family income excluding wife's in thousands
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   NotInLF   325         43.2      43.2            43.2
        inLF      428         56.8      56.8            100.0
        Total     753         100.0     100.0
The above table shows that 325 women (43.2%) in the sample were unemployed, whereas the other 56.8%
were employed. We are interested in determining the effect of the educational attainment of both
husband and wife on a woman’s status in the labour force. The hypothesis that we would like to test is the
following: if a woman has received college education, she may be more likely to be in the labour force. For
this, we perform a cross-tabulation in SPSS and select the chi-square option in the dialogue box.
Table 4: Cross tabulation of labour force participation and woman’s education
Crosstab
The above table suggests that 68% of college educated women were employed against 52.5% of women who
did not attend college. Now we would like to know whether the association between the wife’s education and
participation in the labour force is statistically significant. We use the chi-square statistic to test the
significance of the association between the two variables.
Chi-Square Tests
The column Asymptotic Significance (two-sided) in the above figure indicates the significance of the
relationship. A low significance value of 0.05 or less suggests that there may be a relationship between the
two variables. In the above case, the significance value of .000 suggests that there is a statistically significant
relationship between women’s education and their participation in the labour force. But what about the
relationship between the educational attainment of the women’s husbands and the women’s participation in
the workforce? A cross-tab suggested that 60% of the women whose husbands received college education were
employed against 55% of the women whose husbands did not attend college. The chi-square test returned
a high significance value of .160, suggesting that there was no statistically significant relationship between
the husband’s educational attainment and the wife’s participation in the labour force.
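The chi-square test on a 2x2 cross-tabulation can be sketched in a few lines of Python. The cell counts below are an assumption: the cross-tabulation itself is not reproduced here, so the counts were chosen only to be consistent with the reported percentages (52.5% and about 68% employed) and the sample totals (325, 428, 753). The function name is illustrative, and the p-value uses the closed form for one degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# ASSUMED counts (not taken from the original table):
# rows = wife attended college (no, yes),
# columns = (not in labour force, in labour force).
assumed_counts = [[257, 284],
                  [68, 144]]
chi2, p_value = chi_square_2x2(assumed_counts)
print(f"chi-square = {chi2:.2f}, p = {p_value:.5f}")
```

Under these assumed counts the p-value falls well below 0.05, which agrees with the .000 significance value discussed above.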
Now returning to the example of the labour force survey of married women, we set y = 1 if the woman is in the
labour force and y = 0 if she is unemployed. The independent variables include number of children,
education, and expected income. Now consider that a woman may be about to leave her job while another
Assuming that $\varepsilon$ is distributed logistically with variance $\pi^2/3$, the Logit model can be expressed as
$$\Pr(y = 1 \mid x) = \frac{\exp(x_i\beta)}{1 + \exp(x_i\beta)} \qquad [4]$$
Unlike the OLS model, where the variance can be estimated because the dependent variable is observed, in
the binary Logit model the variance is assumed because the dependent variable, $y^*$, is latent. One can argue
that an OLS method can be used to estimate the model. However, this leads to serious estimation problems.
The first and foremost problem is the heteroscedastic error terms. Since $y_i = x_i\beta + \varepsilon_i$ can only be 0 or 1,
either $x_i\beta + \varepsilon_i = 0$ or $x_i\beta + \varepsilon_i = 1$. This leaves $\varepsilon_i$ equal to either $-x_i\beta$ or $1 - x_i\beta$. In such a case, the
variance is given by (Greene (1985), p.874):
$$\mathrm{Var}(\varepsilon_i \mid x) = x_i\beta\,(1 - x_i\beta) \qquad [5]$$
The above equation shows that the variance of $\varepsilon_i$ changes with $x$, so the errors are heteroscedastic. Other
major problems include the fact that $x_i\beta$ cannot be constrained to the 0-1 interval and that one cannot avoid
negative variances.
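A short numerical sketch of equation [5] makes both problems visible. The values of the linear predictor below are hypothetical and the function name is illustrative; the point is only that the implied variance differs from one observation to the next and turns negative once $x_i\beta$ leaves the 0-1 interval.

```python
def lpm_error_variance(xb):
    """Var(e_i | x) = x_i*beta * (1 - x_i*beta), as in equation [5]."""
    return xb * (1.0 - xb)

# Hypothetical values of the linear predictor x_i * beta.
for xb in (0.1, 0.5, 0.9, 1.2, -0.3):
    print(f"x_i*beta = {xb:5.2f}  variance = {lpm_error_variance(xb):6.3f}")
```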
Estimation of binary Logit models
To identify the binary model, we always set one category or alternative as the base case. The estimated
coefficients are then interpreted as a comparison with the base case. Using the labour force example,
probability of being employed is given by:
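$$\Pr(\text{work}) = \frac{\exp(V_w)}{\exp(V_w) + \exp(V_u)} \qquad [7]$$
$$\Pr(\text{unemp}) = 1 - \Pr(\text{work}) = 1 - \frac{1}{1 + \exp(-V_w)}$$
$$\Pr(\text{unemp}) = \frac{\exp(-V_w)}{1 + \exp(-V_w)} \qquad [10]$$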
Odds Ratio
The odds ratio between the two outcomes is expressed as follows:
$$\frac{\Pr(\text{work})}{\Pr(\text{unemp})} = \frac{\dfrac{1}{1 + \exp(-V_w)}}{\dfrac{\exp(-V_w)}{1 + \exp(-V_w)}} = \frac{1}{\exp(-V_w)} = \exp(V_w) \qquad [11]$$
Log of odds
And finally, the log of odds is expressed as
$$\ln\frac{\Pr(\text{work})}{\Pr(\text{unemp})} = \ln \exp(V_w) = V_w,$$
which is equal to $x_i\beta$.
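The chain from probabilities to odds to log-odds can be checked numerically. The sketch below assumes the unemployed alternative is the base case with a systematic utility of 0, and uses a hypothetical value for $V_w$; the function name is illustrative.

```python
import math

def binary_logit(v_work, v_unemp=0.0):
    """Return (P(work), P(unemp)) as in equation [7]."""
    e_w, e_u = math.exp(v_work), math.exp(v_unemp)
    p_work = e_w / (e_w + e_u)
    return p_work, 1.0 - p_work

v_w = 0.8  # hypothetical systematic utility of working
p_work, p_unemp = binary_logit(v_w)
odds = p_work / p_unemp
print(p_work, p_unemp, odds, math.log(odds))
```

With the base utility at 0, the odds equal $\exp(V_w)$ and the log of the odds returns $V_w$ itself.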
Let us revisit the dataset of labour force participation of married women and estimate a Binary Logit model of
participation in the labour force using income, children, and education of women and their husbands as
explanatory variables.
Interpreting binary Logit model and statistical inference
Table 6 lists the coefficients of a Binary Logit model that estimates the probability of being employed for
married, white women. The column B lists the estimated coefficients (Betas), while the odds ratios are
presented in the last column, Exp(B). We first begin with the interpretation of coefficients. It is better to use
the odds ratio than the actual coefficients to interpret the model. Variable k5 represents the number of young
children under the age of 6 in a household. The coefficient, $\beta$, is equal to −1.463, and the odds ratio is
$\exp(-1.463) = 0.232$. This implies that each additional young child decreases the odds of the mother being
employed by a factor of 0.23, all else being equal. The odds of college educated women being employed are
2.242 times those of women who did not receive college education.
Parameter Estimates
The impact of age could be interpreted as follows: with an increase in age by one year, the odds of being
employed decline by a factor of 0.94. But what if one is interested in determining the impact of being 10
years older, rather than being just one year older? The general formula for the change in odds is $\exp(\beta\delta)$,
where $\delta$ is the change in the number of units. For a 10-year change in age, the odds of working decline by a
factor of $\exp(-0.0628 \times 10) = 0.53$. That is, the odds decline by almost 50%. If we were interested in
determining the odds of not working, we can simply take the inverse of $\exp(\beta)$. Thus, each additional young
child increases the odds of being unemployed by a factor of $1/0.232 = 4.31$.
We can also interpret the results as a percentage change in odds using the formula $100[\exp(\beta_k\delta) - 1]$. Again,
each additional young child decreases the odds of being employed by 77% ($100[\exp(-1.4629) - 1] = -76.8\%$).
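The arithmetic above is easy to reproduce. The following sketch uses only the coefficients quoted in the text (−1.463 and −1.4629 for k5, −0.0628 for age); the function names are illustrative and not part of any statistical package.

```python
import math

def odds_factor(beta, delta=1.0):
    """Multiplicative change in the odds for a change of delta units: exp(beta * delta)."""
    return math.exp(beta * delta)

def percent_change_in_odds(beta, delta=1.0):
    """Percentage change in the odds: 100 * (exp(beta * delta) - 1)."""
    return 100.0 * (odds_factor(beta, delta) - 1.0)

k5_factor = odds_factor(-1.463)              # one more young child
age10_factor = odds_factor(-0.0628, 10)      # ten more years of age
k5_not_working = 1.0 / k5_factor             # odds of not working
k5_percent = percent_change_in_odds(-1.4629)
print(k5_factor, age10_factor, k5_not_working, k5_percent)
```

Taking the inverse of the unrounded factor gives a value marginally above the 4.31 obtained from the rounded 0.232.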
Statistical Inference of Binary Logit models
Logit models are interpreted similarly to OLS models. Instead of the t-statistic (or Z-statistic) to evaluate the
statistical significance of the estimates, SPSS uses the Wald statistic, which is expressed as
$(\text{Coefficient}/SE)^2$.
Estimation software also reports the significance level for Wald statistics. It has been observed that when the
estimated coefficient is very large, the corresponding standard error is also very large, thus returning a very
small value for the Wald statistic. This often leads one to fail to reject the null hypothesis that the estimated
parameter is equal to 0. In cases where the model returns a large coefficient for a variable, Wald statistics
may not be the best instrument to evaluate the parameter. One may want to rescale the variable in such a case.
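A minimal sketch of the Wald calculation follows. The standard errors are hypothetical, since the parameter table is not reproduced here; the second call illustrates the situation described above, where a very large coefficient with a very large standard error yields a small Wald statistic.

```python
import math

def wald_test(coefficient, standard_error):
    """Wald statistic (coefficient / SE)^2 and its p-value on 1 degree of freedom."""
    wald = (coefficient / standard_error) ** 2
    # Chi-square survival function with df = 1: erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value

# Coefficient for k5 from the text; the standard error is a HYPOTHETICAL value.
wald, p_value = wald_test(-1.463, 0.2)

# HYPOTHETICAL large coefficient with an inflated standard error.
big_wald, big_p = wald_test(15.0, 40.0)
print(wald, p_value, big_wald, big_p)
```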
Another more informative and reliable method is the likelihood ratio test. Each variable in the final model
is eliminated in turn and the reduced model is estimated to obtain -2 * log-likelihood (-2LL). The procedure is
repeated for every variable in the final model. The likelihood ratio test returns the change in the value of -2LL if
the effect is removed from the final model. The difference between -2LL for the model with only an intercept
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   1029.746
Final            905.266             124.480      7    .000
Pseudo R-Square
Cox and Snell .152
Nagelkerke .204
McFadden .121
where $l_0$ is the kernel of the log-likelihood of the intercept-only model (the only information in that model is
the sample shares), while $l_B$ is the kernel of the log-likelihood of the final model. This formulation of
McFadden R-square has been adopted in the logistic regression estimation techniques in some software, e.g.
SPSS, which automatically generates this and other goodness-of-fit statistics.
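The reported fit statistics can be recovered from the two -2LL values in the model fitting table. The sketch below assumes the sample size of 753 from the frequency table and the standard textbook definitions of the Cox and Snell and Nagelkerke measures; that the software applies exactly these definitions is an assumption, and the function name is illustrative.

```python
import math

def fit_statistics(neg2ll_null, neg2ll_final, n):
    """Likelihood ratio chi-square and three pseudo R-squares from -2LL values."""
    lr_chi2 = neg2ll_null - neg2ll_final
    mcfadden = 1.0 - neg2ll_final / neg2ll_null
    cox_snell = 1.0 - math.exp(-lr_chi2 / n)
    # Nagelkerke rescales Cox and Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(-neg2ll_null / n))
    return {"lr_chi2": lr_chi2, "mcfadden": mcfadden,
            "cox_snell": cox_snell, "nagelkerke": nagelkerke}

stats = fit_statistics(1029.746, 905.266, n=753)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, the results agree with the chi-square of 124.480 and the pseudo R-square values of .152, .204 and .121 listed above.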
For Logit models, R-square of 0.07 and higher reflects a good fit. In fact, Louviere et al (2000) have argued
that $\rho^2$ values of 0.2 to 0.4 are "considered to be indicative of extremely good model fits." They have cited a
simulation experiment by Domencich and McFadden (1975) who have "equivalenced this range to 0.7 to 0.9
for a linear function."
In the above equation, $\beta_j$ is the coefficient for variable $x_i$ when $Y_i = j$. The subscript i on x indicates that it
varies across the decision makers (i) and the subscript j on $\beta$ indicates that it varies across choices (j). The
above model will return a set of probabilities for J alternatives for the decision-maker with characteristics $x_i$.
It will also return J − 1 non-redundant baseline logits. As mentioned earlier, we normalize the Multinomial
Logit model by assuming that one set of parameters is equal to 0, i.e., $\beta_1 = 0$, therefore $e^{\beta_1' x_i} = 1$. The choice
of the base case, whose coefficients are set to 0, is completely arbitrary. The probabilities are therefore
expressed as follows:
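$$\text{Prob}(Y = j) = \frac{e^{\beta_j' x_i}}{1 + \sum_{k=2}^{J} e^{\beta_k' x_i}} \quad \text{for } j = 2, \ldots, J, \qquad [15]$$
$$\text{Prob}(Y = 1) = \frac{1}{1 + \sum_{k=2}^{J} e^{\beta_k' x_i}} \qquad [16]$$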
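$$g_3 = 0 \qquad [19]$$
$$P(\text{auto}) = \frac{\exp(g_1)}{\exp(g_1) + \exp(g_2) + \exp(g_3)} = \frac{\exp(g_1)}{1 + \exp(g_1) + \exp(g_2)} \qquad [20]$$
$$P(\text{transit}) = \frac{\exp(g_2)}{1 + \exp(g_1) + \exp(g_2)} \qquad [21]$$
$$P(\text{walk}) = \frac{1}{1 + \exp(g_1) + \exp(g_2)} \qquad [22]$$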
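Equations [19] to [22] translate directly into code. In the sketch below walk is the base case with $g_3 = 0$, and the values of $g_1$ and $g_2$ are hypothetical; the function name is illustrative.

```python
import math

def mnl_probabilities(g1, g2):
    """Probabilities for auto, transit and walk with walk as the base (g3 = 0)."""
    denom = 1.0 + math.exp(g1) + math.exp(g2)
    return {"auto": math.exp(g1) / denom,
            "transit": math.exp(g2) / denom,
            "walk": 1.0 / denom}

probs = mnl_probabilities(g1=1.0, g2=0.5)  # hypothetical values of g1 and g2
print(probs)
```

The three probabilities sum to one, and the log of P(auto)/P(walk) returns $g_1$, which is why each set of coefficients is read as a comparison with the base case.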
The first difference you will notice is that there are two sets of coefficients. One set of coefficients estimates
the odds of working against being inactive, and the other set measures the odds of being in school against
being inactive. The odds of black men being in the labour force were $\exp(-0.444) = 0.64$ times those of
whites and others. Stated otherwise, the odds of non-blacks working were $1/0.64 = 1.56$ times those of
blacks. The estimated coefficient (−0.444) measures the change in log-odds
(ln[Prob-Work/Prob-Inactive]) when the variable Black is increased by one unit, i.e. from 0 to 1, whereas
$\exp(-0.444) = 0.64$ gives the ratio of the odds (Prob-Work/Prob-Inactive) of working against being inactive
when Black = 1 to the odds when Black = 0. Note that if one would like to determine the odds of blacks being
inactive rather than working, the odds are $1/\exp(-0.444) = 1.56$ times, the same as the odds for non-blacks
working against being inactive. Similarly, the odds of young men from intact families being in school against
being inactive were $1/\exp(-0.547) = 1.73$ times as high as those of young men from broken homes. As for
continuous explanatory variables, we can see that the odds of working or being in school against being inactive
increase with family income and test score. A unit increase in family income increases the odds of attending
school by $[\exp(0.268) - 1] \times 100 = 30.7\%$.
For the Conditional Logit model, $z_{ij} = [x_{ij}, w_i]$. If $x_{ij}$ represents the attributes of the choices, the subscript
‘ij’ on x indicates that it varies across the decision makers (i) and choices (j), whereas $w_i$ represents the
characteristics of the decision maker (i) and hence does not vary across alternatives. We can re-write the
above equation as follows:
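$$\text{Prob}(Y_i = j) = \frac{e^{\beta' x_{ij} + \alpha' w_i}}{\sum_{j=1}^{J} e^{\beta' x_{ij} + \alpha' w_i}} = \frac{e^{\beta' x_{ij}}\, e^{\alpha' w_i}}{\sum_{j=1}^{J} e^{\beta' x_{ij}}\, e^{\alpha' w_i}} \qquad [25]$$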
It is useful to note that terms that do not vary across alternatives – that is, those specific to the individual – fall
out of the probability. Therefore the above equation can be simplified as follows:
$$\text{Prob}(Y_i = j) = \frac{e^{\beta' x_{ij}}}{\sum_{j=1}^{J} e^{\beta' x_{ij}}} \qquad [26]$$
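The cancellation of the individual-specific term can be verified numerically. In the sketch below the coefficients, the attribute values and the value standing in for $\alpha' w_i$ are all hypothetical, and the function name is illustrative.

```python
import math

def conditional_logit_probs(beta, x, individual_term=0.0):
    """P(Y_i = j) for one decision-maker; x[j] holds the attributes of alternative j."""
    utilities = [sum(b * v for b, v in zip(beta, x_j)) + individual_term
                 for x_j in x]
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

beta = [-0.05, -0.3]                          # hypothetical coefficients (time, cost)
x = [[20.0, 4.0], [35.0, 2.5], [50.0, 0.0]]   # hypothetical attributes of 3 alternatives
without_w = conditional_logit_probs(beta, x)
with_w = conditional_logit_probs(beta, x, individual_term=1.7)  # stands in for alpha' w_i
print(without_w)
print(with_w)
```

Both calls return the same probabilities, because the common factor $e^{\alpha' w_i}$ appears in the numerator and in every term of the denominator.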
The way to accommodate income as a regressor is to introduce alternative-specific dummy variables and
multiply them with the common characteristics of the individual decision-maker.
The income variable is introduced in the model as an alternative-specific variable. For example, the variable
Inc − Eaton will capture the impact of income on the utility of shopping at Eaton Centre, whereas the variable
Inc − Fairview will capture the impact of income on shopping at the Fairview Mall. One can see that by
interacting the characteristics of the decision-maker with the alternative-specific dummies (not shown in the
above table), we have created new variables that vary across alternatives for each decision-maker. Also,
remember not to include all interacted income variables in the utility function because if you add them
together, they will again reproduce the original income variable and hence will drop out of the equation during
estimation. In the above example, include any two interacted income variables in the model.
Unlike the Multinomial Logit model, the Conditional Logit model returns one set of parameters, regardless of
the number of alternatives. However, the data set has to be conditioned so that each decision-maker is
repeated in the data set for the number of available alternatives, which is evident from the above two tables.
Therefore, the total number of rows in the data set is equal to the number of decision-makers (i) times
available alternatives (j). This is only true if all decision-makers are presented with the same choice set. The
Conditional Logit model allows the modeller to restrict the number of alternatives available to a
decision-maker. Consider the example of mode choice. A trip-maker without a valid driver’s license can be
offered a choice set that excludes the auto-drive mode.
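The data layout described above, with one row per decision-maker and available alternative plus income interacted with alternative-specific dummies, can be sketched as follows. The records are hypothetical, "other_mall" is a placeholder for the third alternative, and the function and field names are illustrative rather than those of any estimation package.

```python
def to_long_format(decision_makers, alternatives, base):
    """One row per decision-maker and available alternative, with income interactions."""
    rows = []
    for person in decision_makers:
        for alt in person["choice_set"]:
            row = {"id": person["id"], "alt": alt,
                   "chosen": int(alt == person["choice"])}
            for other in alternatives:
                if other != base:  # omit one interaction so income does not drop out
                    row[f"inc_{other}"] = person["income"] if alt == other else 0.0
            rows.append(row)
    return rows

alternatives = ["eaton", "fairview", "other_mall"]  # "other_mall" is a placeholder
people = [
    {"id": 1, "income": 55.0, "choice": "eaton", "choice_set": alternatives},
    {"id": 2, "income": 32.0, "choice": "other_mall",
     "choice_set": ["fairview", "other_mall"]},     # restricted choice set
]
rows = to_long_format(people, alternatives, base="other_mall")
for row in rows:
    print(row)
```

The first decision-maker contributes three rows and the second only two, so the row count equals decision-makers times alternatives only when everyone faces the full choice set.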
The marginal effects for any variable x k can be computed by differentiating the Logit model with respect to
the variable x k . Therefore, marginal effects are given by the following equation:
$$\delta_{jk} = \frac{\partial P_j}{\partial x_k} = P_j\,[\mathbf{1}(j = k) - P_k]\,\beta \qquad [27]$$
The above is referred to as direct elasticity where m indexes the regressor (attribute) variable and j,k index the
alternatives. Consider the following example where we would like to determine the direct elasticity of the
auto-drive mode with respect to the cost of driving. Direct elasticity calculations require the following inputs:
$x_{km}$ is the cost of driving,
$P_k$ is the probability of auto-drive, and
Cross elasticity calculations for change in the auto-drive mode with respect to changes in transit costs require
the following inputs:
$x_{jm}$ is the cost of transit,
$P_k$ is the probability of auto-drive, and
$\beta_k$ is the estimated coefficient for the cost variable.
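The marginal effects in equation [27] can be computed and checked against a finite difference. The sketch assumes a single cost attribute with one generic coefficient; the coefficient and the cost values are hypothetical, and the function names are illustrative.

```python
import math

def probs(beta, x):
    """Logit probabilities for alternatives with a single attribute x[j]."""
    exps = [math.exp(beta * v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_effect(beta, x, j, k):
    """dP_j / dx_k = P_j * (1(j == k) - P_k) * beta, as in equation [27]."""
    p = probs(beta, x)
    return p[j] * ((1.0 if j == k else 0.0) - p[k]) * beta

beta = -0.4                 # hypothetical cost coefficient
cost = [3.0, 2.0, 0.5]      # hypothetical costs: auto-drive, transit, walk
direct = marginal_effect(beta, cost, j=0, k=0)  # auto-drive w.r.t. its own cost
cross = marginal_effect(beta, cost, j=0, k=1)   # auto-drive w.r.t. transit cost
print(direct, cross)
```

With a negative cost coefficient the direct effect is negative and the cross effect is positive: a higher driving cost lowers the probability of auto-drive, while a higher transit cost raises it.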
In estimating Conditional Logit models, one is not restricted by the number of choices. Here the "size of the
estimation problem is independent of the number of choices" (Greene,1997, p. 920). Greene further argues
that the number of choices should be restricted to 100. The fact remains that even with 100 choices,
interpretation of the model becomes a major concern. From the behavioural perspective, a decision-maker
seldom undertakes simultaneous evaluation of 100 choices. To assume that a rational decision-maker can
simultaneously evaluate 100 choices is debatable at best.
The Conditional Logit model does not contain a constant term (β 0 in the OLS tradition). The Conditional
Logit model can only include J − 1 alternative-specific constants. In the above-mentioned mode choice
problem involving three alternatives, we can create alternative-specific constants for any two alternatives.
In conditional Logit models, we do not set any category as the base case or set its systematic utility to 0. The
binary choice is presented as conditional Logit:
$$P(\text{auto}) = \frac{\exp(V_a)}{\exp(V_a) + \exp(V_t)} \qquad [30]$$
The above equation presents an interesting property of conditional Logit models. We do not observe the
actual utility, but the difference in the utility of two choices.
$$P(\text{transit}) = 1 - \frac{1}{1 + \exp(V_t - V_a)} = \frac{\exp(V_t - V_a)}{1 + \exp(V_t - V_a)} \qquad [32]$$
$$P(\text{transit}) = \frac{\exp(V_t)}{\exp(V_a) + \exp(V_t) + \exp(V_w)}$$
$$P(\text{walk}) = \frac{\exp(V_w)}{\exp(V_a) + \exp(V_t) + \exp(V_w)}$$
If you look at the probability function carefully, we are still dealing with the difference in utilities. Let us
divide both the numerator and the denominator of P(auto) by $\exp(V_a)$:
$$P(\text{auto}) = \frac{1}{1 + \dfrac{\exp(V_t)}{\exp(V_a)} + \dfrac{\exp(V_w)}{\exp(V_a)}} = \frac{1}{1 + \exp(V_t - V_a) + \exp(V_w - V_a)} \qquad [35]$$
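$$P_{ij'} = \frac{e^{\beta' x_{ij'}}}{\sum_{j=1}^{J} e^{\beta' x_{ij}}} \qquad [37]$$
$$\frac{P_{ij}}{P_{ij'}} = \frac{e^{\beta' x_{ij}}}{e^{\beta' x_{ij'}}} = \exp[\beta'(x_{ij} - x_{ij'})] \qquad [38]$$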
The above expression suggests that the log-odds of choosing $j$ over $j'$ are given by the "weighted difference
between the individual’s values on the explanatory variables for the two alternatives, with the weights being
the estimated parameters", i.e., the $\beta$s.
The interpretation is illustrated by using a model estimated by David Hensher, which has been reproduced by
Greene (1997) and Powers and Xie (2000). The example is that of a classic mode choice problem where 152
respondents were surveyed. The original model consisted of four choices: air, bus, car, and train. Powers and
Xie (2000) have excluded air as an alternative and have reported the results for a three mode choice set where
the choices are 1=train, 2=bus, and 3=car. We have retained the Powers and Xie (2000) results in this discussion.
The explanatory variables are terminal wait time (TTME), in-vehicle time (INVT), in-vehicle cost (INVC),
and GC which is a generalised cost measure computed as INVC + (INVT* Value of Time).
Table 11: Estimates from a Conditional Logit Model with alternative-specific attributes
The log-odds for an individual of choosing train (1) over bus (2) are given as:
$$\ln\frac{P_{i1}}{P_{i2}} = -.002(TTME_1 - TTME_2) - .435(INVC_1 - INVC_2) - .077(INVT_1 - INVT_2) + .431(GC_1 - GC_2)$$
The above suggests that the odds of choosing a mode decline with the increase in wait time, in-vehicle travel
time, and in-vehicle costs. The odds of choosing a mode, however, increase with the increase in generalised
cost.
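The log-odds expression for train over bus can be evaluated for a single traveller. The coefficients below are those reported in the equation, while the attribute values for the two modes are hypothetical, since the survey data are not reproduced here; the function name is illustrative.

```python
import math

# Coefficients as reported in the log-odds equation in the text.
COEFFICIENTS = {"TTME": -0.002, "INVC": -0.435, "INVT": -0.077, "GC": 0.431}

def log_odds(attrs_a, attrs_b, coefficients=COEFFICIENTS):
    """ln(P_a / P_b) = sum over k of beta_k * (x_ak - x_bk)."""
    return sum(b * (attrs_a[name] - attrs_b[name])
               for name, b in coefficients.items())

# HYPOTHETICAL attribute values for one traveller.
train = {"TTME": 35.0, "INVC": 30.0, "INVT": 300.0, "GC": 100.0}
bus = {"TTME": 40.0, "INVC": 25.0, "INVT": 320.0, "GC": 95.0}

train_vs_bus = log_odds(train, bus)
print(train_vs_bus, math.exp(train_vs_bus))  # log-odds and odds of train over bus
```

Swapping the two alternatives flips the sign of the log-odds, and identical attribute values give log-odds of zero, since only differences between alternatives enter the expression.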
When the attributes of choices as well as the characteristics of the decision-makers explain the utility of the
alternatives, the model can contain alternative-specific variables as well as individual-specific covariates after
multiplying individual-specific covariates with alternative-specific dummies.
The model is presented as follows:
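$$\text{Prob}(Y_i = j) = \frac{e^{\beta' x_{ij} + \alpha_j' w_i}}{\sum_{j=1}^{J} e^{\beta' x_{ij} + \alpha_j' w_i}} = \frac{e^{\beta' x_{ij}}\, e^{\alpha_j' w_i}}{\sum_{j=1}^{J} e^{\beta' x_{ij}}\, e^{\alpha_j' w_i}} \qquad [40]$$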
where $x_{ij}$ are the alternative-specific covariates, while $w_i$ are individual-specific attributes. Interpretation of
the above model is similar to that of the Conditional Logit model discussed earlier. In the Conditional Logit
model we have included alternative-specific variables. Now we include an individual-specific variable,
household income (HHINC), which does not vary across alternatives. As mentioned earlier, HHINC will be
multiplied with the alternative-specific dummies to enter the model as a regressor. Powers and Xie (2000)
omit the lowest coded category (train) and create two alternative-specific constants, DB (dummy for bus) and
DC (dummy for car). The new variables are:
HHINC * Dummy for bus = HHINC_DB
HHINC * Dummy for car = HHINC_DC
The results indicate that an increase in household income increases the odds in favour of bus and car
against train. However, a look at the t-statistics reveals that only HHINC_DC returns a statistically significant
coefficient.
Mroz, T. A. (1987). The sensitivity of an empirical model of married women’s hours of work to economic and
statistical assumptions. Econometrica. Vol. 55, no. 4. pp. 765-799.
Long, J. Scott, Freese, Jeremy. (2005). Regression models for categorical dependent variables using Stata.
Stata Press. Texas.
Louviere, J. J., Hensher, D. A., and Swait, J. D. (2000). Stated choice methods: Analysis and application.
Cambridge University Press.
Domencich, T., McFadden, D. (1975). Urban travel demand: A behavioral analysis. North-Holland,
Amsterdam.
Greene, William H. (1997). Econometric Analysis. 3rd edition. Prentice Hall.
McFadden, D. (1974). The measurement of urban travel demand. Journal of Public Economics. 3(4) 303-328.
Powers, Daniel A., Xie, Yu. (2000). Statistical methods for categorical data analysis. Academic Press.
California.