
Econometric Models for Probabilistic Choice Among Products
Author(s): Daniel McFadden
Source: The Journal of Business, Vol. 53, No. 3, Part 2: Interfaces Between Marketing and Economics (Jul., 1980), pp. S13-S29
Published by: The University of Chicago Press
Stable URL: http://www.jstor.org/stable/2352205

CONSUMER BEHAVIOR
Daniel McFadden
Massachusetts Institute of Technology

Econometric Models for Probabilistic Choice among Products*


An Econometrician's View of Marketing

I understand the discipline of marketing exists to answer questions such as: "Will housepersons buy more Brand A soap if its perfume content is increased?" Traditional econometric demand analysis provides no answer. Its attention has been concentrated on consumption levels of broad commodity classes (e.g., housing services), examined using aggregate market data, with demand models constructed on the twin pillars of economic rationality and consumer sovereignty. The market researcher has understandably looked elsewhere, to psychology and survey research, for answers to his questions.

Realities have forced econometric demand analysts to broaden their perspective. Public intervention in the supply of some commodities, notably in the areas of transportation, energy, and communications, has required economists to recognize the marketing considerations implicit in issues of policy. (The decision of whether to build and how to design a public
* Prepared for presentation at the Conference on Interfaces between Marketing and Economics, Graduate School of Management, University of Rochester, April 7, 1978. Research was supported in part by the National Science Foundation through grant SOC75-22657 to the University of California, Berkeley. Portions of this paper were written while the author was an Irving Fisher Visiting Professor of Economics at the Cowles Foundation for Research in Economics, Yale University.
(Journal of Business, 1980, vol. 53, no. 3, pt. 2)

This paper reviews several recent developments in econometric demand analysis which may be of interest in market research. Econometric models of probabilistic choice, suitable for forecasting choice among existing or new brands, or switching between brands, are surveyed. These models incorporate attribute descriptions of commodities, making them statistical counterparts of the Court-Griliches-Lancaster theory of consumer behavior. Particular attention is given to models which yield tree structures of similarities between alternatives. Also reviewed are methods for estimating econometric models of probabilistic choice from "point-of-sale" sample surveys.

© 1980 by The University of Chicago
0021-9398/80/5332-0003$01.50


transit system is not different in kind from the decision of whether to market and how to determine the perfume content of Brand A soap.) On the other hand, economists have begun to use, and even commission, sample surveys and to confront the problems of developing models appropriate for the analysis of survey data. There are four key elements in these economic analyses of marketing problems.
1. The objective of market research is assumed to be the forecasting of behavior of consumers in economic markets. Then, the analysis should be focused toward this objective, with questions of attitudes, intentions, perceptions, and psychological measurement entertained only to the extent that they can be linked to market behavior and can improve forecasting ability.
2. The theory of the economically rational utility-maximizing consumer, interpreted broadly to admit the effects of perception, state of mind, and imperfect discrimination, provides a plausible, logically unified foundation for the development of models of various aspects of market behavior.
3. The core of a model of market behavior will be an equation, consistent with the theory of the economic consumer, which specifies the probabilities of choices (e.g., brand, frequency, or volume of purchases) as a function of the objective market environment of the consumer, including measured attributes and prices of alternatives, socioeconomic characteristics, and past experience. If this function depends on nonmarket variables, such as perceptions and attitudes, then the system should be completed with equations relating the evolution of attitudes and perceptions to the market environment. The basic tool for forecasting will be a "reduced-form" equation relating behavior to market environment, with intervening nonmarket variables eliminated.
4. The design of sample surveys, construction of questionnaires, specification of models of market behavior, and statistical methods for analysis should be integrated. In particular, survey design and statistical procedure should be jointly optimized to yield consistent, maximally efficient forecasts.
Theories and Models of Probabilistic-Choice Behavior

I start from the axiom that consumers are "rational" in the sense that they make choices which maximize their perceived utility, subject to economic constraints on expenditures. To accommodate the demonstrated inability of individuals to discriminate perfectly, or of the analyst to measure exactly the environment of the choice, I follow Thurstone (1927) in assuming that utility is a random function. Sensible assumptions on the variables on which the utility of an alternative can


depend and on the probabilistic structure of the utility function will permit the construction of plausible, nonvacuous models of behavior. Although this structure is apparently static, it can in principle accommodate learning behavior and "behavior modification" via specification of the dependence of utility on experience, information, and perception.

The following paragraphs summarize the development in McFadden (1973). I shall concentrate on choice among a finite set of alternatives (e.g., brand choice), although the theory is more general. Suppose alternatives $j = 1, \ldots, J$ are offered, with alternative $j$ having a vector of measured attributes $z_j$ and a (random) utility $u(z_j)$. Then, the choice probability that $i$ will be chosen satisfies

$$P(i \mid z) = P[u(z_i) \ge u(z_j) \ \text{for } j \ne i], \tag{1}$$

where we assume the distribution of $u$ is such that the probability of ties is zero, and $z = (z_1, \ldots, z_J)$.

The random utility function $u(z_j)$ associated with a discrete alternative should be interpreted as the maximum utility attainable by the consumer, given his budget constraint and a fixed alternative $j$. Then $u(z_j)$ is a function of income and prices, including the price of alternative $j$. The sources of randomness in the utility function are unobserved variations in tastes and in the attributes of alternatives, and errors of perception and optimization by the consumer. If underlying preferences are defined in terms of psychological needs, with commodities having attributes which meet these needs, then $z_j$ will include these attributes. In particular, if a consumer preference model of the Court-Griliches-Lancaster type is applied to choice among discrete, mutually exclusive alternatives, then $z_j$ will include the attributes of the portfolio of purchases made when alternative $j$ is specified. Further, if "household production" of the type studied by Becker enters the consumer's optimization process, then $u(z_j)$ is interpreted as maximum utility obtainable with optimal household production, given the budget constraint and the fixed alternative $j$.

The models of econometric choice outlined above represent a statistical implementation of economic consumer theory for problems of discrete choice. They are consistent with a variety of special preference structures, including those implied by Lancasterian or Beckerian views of the consumer, for appropriate specifications of the random utilities of the discrete alternatives. The choice probabilities will provide the key mapping from the environment of choice to market behavior. For econometric purposes, the measured variables influencing utility will be specified, as will a finite-dimensional parametric family of probability distributions for the random utility function.


The Luce Model

One historically and empirically important probabilistic choice model, due to Luce (1959), is

$$P(i \mid z, \theta) = v(z_i, \theta) \Big/ \sum_{j=1}^{J} v(z_j, \theta), \tag{2}$$

where $v(z_i, \theta)$ is a scale value for an alternative with measured attributes $z_i$ and is assumed known up to a finite parameter vector $\theta$. In its econometric implementation, with $\log v(z_i, \theta)$ linear-in-parameters, this model is also called the multinomial logit (MNL) model. This model can be derived from the random-utility model by assuming the utilities of the various alternatives to be independently distributed, with the extreme value distribution

$$P[u(z_i) \le u_i] = \exp[-v(z_i, \theta) e^{-u_i}]. \tag{3}$$
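The link between (2) and (3) can be checked numerically. The following is a minimal sketch, not part of the original development: it assumes that a utility with the distribution in (3) can be drawn as $\log v_i$ plus a standard extreme value (Gumbel) deviate, and the function names and the draw count are illustrative choices.

```python
import math
import random


def luce_probabilities(scale_values):
    """Equation (2): each scale value divided by the sum of all scale values."""
    total = sum(scale_values)
    return [v / total for v in scale_values]


def standard_gumbel(rng):
    """One draw with CDF exp(-exp(-x)), by inverting a uniform deviate."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the extremely unlikely case of a zero draw
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice_shares(scale_values, draws=100_000, seed=0):
    """Monte Carlo shares when utilities follow the distribution in (3).

    log(v_i) + E, with E standard Gumbel, has CDF exp(-v_i * exp(-u)).
    The alternative with the highest drawn utility is recorded as chosen.
    """
    rng = random.Random(seed)
    counts = [0] * len(scale_values)
    for _ in range(draws):
        utilities = [math.log(v) + standard_gumbel(rng) for v in scale_values]
        counts[utilities.index(max(utilities))] += 1
    return [c / draws for c in counts]
```

With scale values 1, 2, and 3, formula (2) gives probabilities 1/6, 1/3, and 1/2, and the simulated shares should agree with these up to sampling error.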

The Luce model has been employed for market analysis in economics with reasonable success, notably in transportation planning where it has been used to forecast market penetration of new travel modes (see McFadden [1974]; McFadden and Associates [1977]; for a survey of other empirical applications, see McFadden [1976]). Some marketing applications are indicated in Silk and Urban (1978) and Punj and Staelin (1978). The model contains the structural restriction, independence from irrelevant alternatives (IIA), that the relative odds of two alternatives are independent of the attributes, or even the presence, of a third alternative. A classic example, with strong contrasts in similarities of alternatives, shows that this restriction is sometimes implausible. Suppose $z_1$, $z_2$, and $z_3$ are the attributes of a trip by red bus, blue bus, and auto, respectively. Suppose consumers treat the two buses as equivalent and are indifferent between auto and bus. Then, one expects $P(1 \mid z_1, z_2) = P(1 \mid z_1, z_3) = P(2 \mid z_2, z_3) = 1/2$ and $P(1 \mid z_1, z_2, z_3) = P(2 \mid z_1, z_2, z_3) = 1/4$. The relative odds of 1 and 3 depend on the presence of alternative 2 (they are even in the binary choice but one to two in the trinomial choice), and these probabilities are inconsistent with the Luce model. The example suggests that the Luce model will be unsuitable for marketing applications where the patterns of perceived similarities of brands have a significant influence on market shares.

Thurstone (Multinomial-Probit) Model

The structural restrictions of the Luce model have led psychologists to develop alternatives without the IIA property. One conceptually appealing alternative is the multinomial-probit model, obtained from the random-utility model by assuming the utilities of alternatives to have a multivariate normal distribution. This model was first proposed by Thurstone (1927) and has been applied to psychological-choice data by


Bock and Jones (1968). Applications to market-choice problems have been inhibited by computation cost. However, in a series of economic applications, approximations have been introduced which reduce the computational barrier (see Hausman and Wise 1978; Daganzo, Bouthelier, and Sheffi 1977; and Lerman and Manski 1980).
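A small simulation suggests how correlated normal utilities can accommodate the red bus/blue bus pattern. This is a crude frequency count under assumptions of my own choosing (unit variances, a single common component shared by the two buses, an independent auto utility); it is not one of the approximation methods cited above, and the function name is hypothetical.

```python
import math
import random


def simulate_probit_shares(means, bus_correlation, draws=100_000, seed=0):
    """Shares for (red bus, blue bus, auto) with unit-variance normal utilities.

    The two bus utilities share a common normal component so that their
    correlation equals bus_correlation (0 <= bus_correlation < 1); the auto
    utility is drawn independently of both.
    """
    rng = random.Random(seed)
    load = math.sqrt(bus_correlation)
    own = math.sqrt(1.0 - bus_correlation)
    counts = [0, 0, 0]
    for _ in range(draws):
        common = rng.gauss(0.0, 1.0)
        utilities = [
            means[0] + load * common + own * rng.gauss(0.0, 1.0),
            means[1] + load * common + own * rng.gauss(0.0, 1.0),
            means[2] + rng.gauss(0.0, 1.0),
        ]
        counts[utilities.index(max(utilities))] += 1
    return [c / draws for c in counts]
```

With equal means and zero correlation the three shares are each close to 1/3, as in the Luce model. As the bus correlation approaches one, the shares approach 1/4, 1/4, and 1/2, the pattern expected in the example.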

Elimination-by-Aspects Model

Another model permitting a very general pattern of similarities is the elimination-by-aspects model of Tversky (1972). Conceptually, each alternative can be viewed as owning a set of aspects or features (e.g., for soap the relevant aspects may be color, fragrance, price category, size, etc.). The consumer is viewed as selecting an aspect at random, eliminating available alternatives which fail to own this aspect, and repeating this process until a single alternative remains. An example illustrates the structure of this model in a special preference-tree case. Suppose a choice set with three alternatives $\{z_1, z_2, z_3\}$ exists. Suppose the alternatives can be arranged in a treelike structure, as in figure 1, with common segments representing common aspects of alternatives and the length $v_i$ of each segment a measure of the abundance (or desirability) of its associated aspects. The choice probabilities are formed by sampling randomly from aspects, discarding branches without the sampled aspect, and repeating the process until a single alternative remains. Thus, in figure 1, an aspect sampled from $v_1$, $v_2$, or $v_3$ determines a single alternative in one step. An aspect sampled from $v_4$ eliminates $z_3$, with additional step(s) required to select an aspect in either $v_1$ or $v_2$. The choice probabilities then satisfy

$$\begin{aligned}
P(1 \mid z_1, z_2) &= v_1/(v_1 + v_2) \\
P(1 \mid z_1, z_3) &= (v_1 + v_4)/(v_1 + v_3 + v_4) \\
P(2 \mid z_2, z_3) &= (v_2 + v_4)/(v_2 + v_3 + v_4) \\
P(3 \mid z_1, z_2, z_3) &= v_3/(v_1 + v_2 + v_3 + v_4) \\
P(1 \mid z_1, z_2, z_3) &= \frac{v_1}{v_1 + v_2 + v_3 + v_4} + \frac{v_4}{v_1 + v_2 + v_3 + v_4} \cdot \frac{v_1}{v_1 + v_2}
\end{aligned} \tag{4}$$

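FIG. 1.-The elimination-by-aspects model for a preference tree. (Tree diagram not reproduced: alternatives $z_1$ and $z_2$ share the segment $v_4$ and end in the segments $v_1$ and $v_2$; alternative $z_3$ is reached by the segment $v_3$ alone.)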
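The probabilities in (4) can be reproduced by a short recursive routine. The sketch below is one possible rendering of the elimination process, with aspects represented as sets of owning alternatives; the representation, the function name, and the equal-split rule when no aspect discriminates are my assumptions rather than anything specified in the text.

```python
def eba_probability(target, choice_set, aspects):
    """Probability that `target` is chosen from `choice_set` under
    elimination by aspects.

    `aspects` maps a frozenset of the alternatives owning an aspect to the
    weight (segment length) of that aspect.
    """
    choice_set = frozenset(choice_set)
    if target not in choice_set:
        return 0.0
    if len(choice_set) == 1:
        return 1.0
    # Only aspects owned by some, but not all, of the available alternatives
    # can eliminate anything; aspects with the same survivors are pooled.
    relevant = {}
    for owners, weight in aspects.items():
        survivors = owners & choice_set
        if survivors and survivors != choice_set:
            relevant[survivors] = relevant.get(survivors, 0.0) + weight
    total = sum(relevant.values())
    if total == 0.0:
        # Assumption: with no discriminating aspect, alternatives are equally likely.
        return 1.0 / len(choice_set)
    return sum(
        weight / total * eba_probability(target, survivors, aspects)
        for survivors, weight in relevant.items()
    )


# The preference tree of figure 1, with illustrative segment lengths.
tree = {
    frozenset({1}): 1.0,      # v1
    frozenset({2}): 2.0,      # v2
    frozenset({3}): 3.0,      # v3
    frozenset({1, 2}): 4.0,   # v4, shared by alternatives 1 and 2
}
```

For these lengths, `eba_probability(1, {1, 3}, tree)` evaluates $(v_1 + v_4)/(v_1 + v_3 + v_4) = 5/8$, and setting the shared weight to zero returns the Luce probabilities $v_i/(v_1 + v_2 + v_3)$.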


Note that when $v_4 = 0$ this model reduces to the Luce model. The model is readily extended to more complex trees and to general networks. Tversky shows that this model is consistent with random preference maximization. In a psychological laboratory setting, the $v_i$ are treated as parameters. Application of the model to market choice would require that they be made parametric functions of the measured attributes of the alternatives. Since the general network case requires $2^J - 2$ terms $v_i$ in a choice set with $J$ alternatives, the model may become computationally infeasible for large choice sets. This problem is discussed further in McFadden (1980).

Generalized Extreme-Value Model

A third model due to me (McFadden and Associates 1978) is a direct generalization of the Luce (MNL) model to permit patterns of "non-independence" of alternatives. A random-utility model in which the utilities of the alternatives have independent extreme value distributions yields the Luce model. Considering nonindependent extreme value distributions leads to the generalized extreme value (GEV) models described below. The practical interest in these models lies in their form when specialized to preference trees such as the one in figure 2. For the GEV model, transition probabilities from one level of the tree to the next have multinomial-logit forms, with the scale value
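FIG. 2.-Example of a preference tree for air conditioners. (Tree diagram not reproduced: a two-level tree whose branches are labeled Central Air Conditioner, Heat Pump, and Room Air Conditioner, with brands, labeled Brand A, Brand B, and Brand 1 through Brand 4, as the final alternatives.)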


associated with each branch obtained from the fitted probabilities for transitions further down the tree. Then, the GEV model can be estimated recursively by fitting a sequence of multinomial-logit models. The following result characterizes the GEV family.
Theorem

Suppose $G(y_1, \ldots, y_J)$ is a nonnegative, homogeneous-of-degree-1 function of $(y_1, \ldots, y_J) \ge 0$ which is positive whenever any argument is positive. Suppose for any distinct $(i_1, \ldots, i_k)$ from $\{1, \ldots, J\}$, $\partial^k G / \partial y_{i_1} \cdots \partial y_{i_k}$ is nonnegative if $k$ is odd and nonpositive if $k$ is even. Then

$$P_i = \partial \log G(v_1, \ldots, v_J) / \partial \log v_i, \tag{5}$$

where $v_i$ is a scale value for alternative $i$, defines a choice model which is consistent with utility maximization.

The theorem is proved by showing that $\exp[-G(v_1 e^{-u_1}, \ldots, v_J e^{-u_J})]$ is a multivariate extreme value distribution and then obtaining (5) from (1) by construction. The special case $G(y_1, \ldots, y_J) = \sum_j y_j$ yields the MNL model. If $G$ is symmetric in its arguments, then the choice probabilities satisfy the property of "simple scalability" (Tversky 1972), which has undesirable structural restrictions similar to the IIA property. However, in general $G$ may depend on the "clustering" of alternatives in attribute space. An example of a more general $G$ function satisfying the hypotheses of the theorem is
$$G(y) = \sum_{m=1}^{M} a_m H_m(y), \tag{6}$$

where

$$H_m(y) = \Big[ \sum_{k \in B_m} y_k^{1/(1-\sigma_m)} \Big]^{1-\sigma_m},$$

$B_m \subseteq \{1, \ldots, J\}$, $\bigcup_{m=1}^{M} B_m = \{1, \ldots, J\}$, $a_m > 0$, and $0 \le \sigma_m < 1$.

The parameter $\sigma_m$ is an index of the similarity of the unobserved attributes of alternatives in $B_m$. The choice probabilities for this function satisfy

$$P_i = \sum_{m=1}^{M} P(i \mid B_m) P(B_m), \tag{7}$$

where $P(i \mid B_m)$ is the conditional probability that alternative $i$ is chosen, given the event $B_m$. Then,


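$$P(i \mid B_m) = \begin{cases} v_i^{1/(1-\sigma_m)} \Big/ \sum_{j \in B_m} v_j^{1/(1-\sigma_m)} & \text{if } i \in B_m \\ 0 & \text{if } i \notin B_m \end{cases} \tag{8}$$

and $P(B_m)$ is the probability of the event $B_m$, with

$$P(B_m) = a_m H_m(v) \Big/ \sum_{n=1}^{M} a_n H_n(v). \tag{9}$$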
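As a check that (7), (8), and (9) are what (5) delivers for the function in (6), the probabilities can be computed both ways. The sketch below is illustrative only: alternatives are indexed from zero, each cluster is passed as a hypothetical triple $(a_m, B_m, \sigma_m)$, and the derivative in (5) is approximated by finite differences rather than taken analytically.

```python
import math


def nest_term(y, members, sigma):
    """H_m(y) from (6): (sum over k in B_m of y_k^(1/(1-sigma)))^(1-sigma)."""
    power = 1.0 / (1.0 - sigma)
    return sum(y[k] ** power for k in members) ** (1.0 - sigma)


def gev_g(y, nests):
    """G(y) from (6); `nests` is a list of (a_m, B_m, sigma_m) triples."""
    return sum(a * nest_term(y, members, sigma) for a, members, sigma in nests)


def gev_probabilities(v, nests):
    """Choice probabilities assembled from (7), (8), and (9)."""
    g = gev_g(v, nests)
    probabilities = [0.0] * len(v)
    for a, members, sigma in nests:
        power = 1.0 / (1.0 - sigma)
        within_total = sum(v[j] ** power for j in members)
        p_nest = a * nest_term(v, members, sigma) / g      # equation (9)
        for i in members:
            p_within = v[i] ** power / within_total         # equation (8)
            probabilities[i] += p_within * p_nest            # summand of (7)
    return probabilities


def gev_probabilities_from_g(v, nests, step=1e-6):
    """Equation (5), approximated by central differences in log v_i."""
    probabilities = []
    for i in range(len(v)):
        up = list(v)
        down = list(v)
        up[i] = v[i] * math.exp(step)
        down[i] = v[i] * math.exp(-step)
        slope = (math.log(gev_g(up, nests)) - math.log(gev_g(down, nests))) / (2.0 * step)
        probabilities.append(slope)
    return probabilities
```

The clusters may overlap, as (6) allows. A single cluster containing every alternative with $\sigma_m = 0$ gives back the Luce probabilities, which makes a convenient sanity check.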

Functions of the form in (6) can also be nested to yield a wider class satisfying the theorem hypotheses. For example, the function

$$G(y) = \sum_{q=1}^{Q} a_q \Big[ \sum_{m \in D_q} H_m(y)^{1/(1-\delta_q)} \Big]^{1-\delta_q}, \tag{10}$$

where $\bigcup_m B_m = \{1, \ldots, J\}$, satisfies the hypotheses provided $1 > \sigma_m \ge \delta_q \ge 0$ for $m \in D_q$. The choice probabilities for (10) and analogous

functions can be written as sums of products of conditional and marginal probabilities, in a manner generalizing (7), with each probability element having a MNL form and the denominator in each element equaling a representative term in the succeeding element.

Choice probabilities of the form (7) were apparently first derived for the case of three alternatives and $B_1 = \{1\}$, $B_2 = \{2, 3\}$ by Cardell (1977). For the case of disjoint $B_m$, the form (7) has been discovered, independently, by Daly and Zachary (1976), Ben-Akiva and Lerman (1977), and Williams (1977).

A particular advantage of the GEV model is that it can be specialized to forms with a "treelike" interpretation analogous to figure 2, but with the property that at each level of the tree choice can be interpreted as conforming to a MNL model. For example, $G(y_1, y_2, y_3) = (y_1^{1/\rho} + y_2^{1/\rho})^{\rho} + y_3$ with $\rho = 1 - \sigma$ yields the choice probabilities

$$\begin{aligned}
P(1 \mid z_1, z_2) &= v_1^{1/\rho} / (v_1^{1/\rho} + v_2^{1/\rho}) \\
P(1 \mid z_1, z_3) &= v_1/(v_1 + v_3) \\
P(3 \mid z_1, z_2, z_3) &= v_3 / G(v_1, v_2, v_3) \\
P(1 \mid z_1, z_2, z_3) &= P(1 \mid z_1, z_2) \, (v_1^{1/\rho} + v_2^{1/\rho})^{\rho} / G(v_1, v_2, v_3).
\end{aligned} \tag{11}$$

To compare this model with the elimination-by-aspects model (4), I have calculated the respective $v_i$ parameters and $\sigma$ so that binary-choice probabilities in the two models are the same and then compared their trinomial-choice probabilities. I found that these probabilities never differ by more than .015, or 3%. Thus, at least for three alternatives, these models appear to be indistinguishable for empirical purposes.
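The behavior of (11) at the two ends of the range of $\sigma$ is easy to tabulate. The routine below is a sketch with a hypothetical name; it simply evaluates the trinomial probabilities in (11), assumes $0 \le \sigma < 1$ so that $1/\rho$ is defined, and makes no attempt to reproduce the calibration against model (4) described above.

```python
def nested_trinomial(v1, v2, v3, sigma):
    """Trinomial probabilities (P1, P2, P3) from (11), with rho = 1 - sigma.

    Alternatives 1 and 2 share a nest; alternative 3 stands alone.
    Requires 0 <= sigma < 1. Very small rho with scale values far from one
    can overflow the powers, so this is for illustration only.
    """
    rho = 1.0 - sigma
    a1 = v1 ** (1.0 / rho)
    a2 = v2 ** (1.0 / rho)
    nest = (a1 + a2) ** rho           # contribution of the nest {1, 2} to G
    g = nest + v3                     # G(v1, v2, v3)
    p1_within = a1 / (a1 + a2)        # binary probability of 1 over 2
    p1 = p1_within * nest / g
    p2 = (1.0 - p1_within) * nest / g
    p3 = v3 / g
    return p1, p2, p3
```

With equal scale values, $\sigma = 0$ returns one third for each alternative, the Luce result, while values of $\sigma$ close to one return shares close to 1/4, 1/4, and 1/2, the pattern of the red bus/blue bus example.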


To see how (11) can be interpreted as a nest of MNL, or Luce, models, suppose that the scale values $v_i$ can be specialized to a form log-linear-in-parameters, $\log v_i = \theta' z_i$. Then, the choice probability for alternative 1, conditioned on 1 or 2 being chosen, equals the binary-choice probability for 1 over 2,

$$P(1 \mid z_1, z_2) = e^{\theta' z_1/\rho} / (e^{\theta' z_1/\rho} + e^{\theta' z_2/\rho}). \tag{12}$$

Consideration of conditional-choice data then permits estimation of the parameter vector $\theta/(1 - \sigma)$ and calculation of the inclusive value, or composite value,

$$V_{12} = \ln (e^{\theta' z_1/\rho} + e^{\theta' z_2/\rho}). \tag{13}$$

The trinomial choice between 1, 2, and 3 can, at the upper level of the tree, be written as a binomial choice between $z_3$ and an inclusive alternative $V_{12}$, with choice probability

$$P(3 \mid z_3, V_{12}) = e^{\theta' z_3} / [e^{\theta' z_3} + e^{(1-\sigma) V_{12}}]. \tag{14}$$

This multinomial-logit form permits estimation of $(1 - \sigma)$. Then, all the parameters of this GEV model can be estimated by the empirically practical method of multinomial-logit estimation at each of the two nested levels of the decision tree. The parameter $\sigma$ is a measure of the degree of similarity between 1 and 2, with $\sigma = 0$ giving the Luce model and the limit $\sigma = 1$ corresponding to the "red bus/blue bus" example. Consistency with utility maximization requires $0 \le \sigma \le 1$.

The Williams-Daly-Zachary Theorem

A family of probabilistic-choice models consistent with the random-utility model, and containing the GEV family, is characterized by the following theorem, whose essential ingredients were proved by Williams (1977) and by Daly and Zachary (1979):

Theorem

Suppose $G(y_1, \ldots, y_J)$ is a nonnegative, homogeneous-of-degree-1 function of $(y_1, \ldots, y_J) \ge 0$, and is positive whenever any argument is positive. Suppose for any distinct $(i_1, \ldots, i_k)$ from $\{1, \ldots, J\}$, $\partial^k \log G / \partial y_{i_1} \cdots \partial y_{i_k}$ is nonnegative if $k$ is odd and nonpositive if $k$ is even. Then,

$$P_i = \partial \log G(v_1, \ldots, v_J) / \partial \log v_i, \tag{15}$$

where $v_i$ is a scale value for alternative $i$, defines a choice model consistent with utility maximization.

This result differs from the characterization of the GEV model in that the mixed partial derivatives of $\log G$, rather than $G$, are signed. It is


easy to show that the conditions of the GEV theorem are a special case of the conditions in this theorem.

The Williams-Daly-Zachary theorem characterizes random-utility models where the utility of an alternative can be written as the sum of a nonstochastic scale value and a random effect, with the distribution of the random effects independent of the scale values. Note that the distribution of the random effects can depend on aspects of alternatives other than their scale values, such as their direction or clustering in attribute space. (Without such dependence, the resulting choice models are simply scalable.) A proof and further discussion of the theorem are given in McFadden (1978). This Williams-Daly-Zachary theorem provides a useful foundation for constructing or checking probabilistic-choice models where consistency with the random-utility model is required.
Generic and Nominal Models

An alternative is described by a vector $z$ of measured attributes. It may also have a vector $y$ of unmeasured attributes (with the randomness of utility attributed to variations in $y$). The vector $z$ contains information on generic (or intrinsic) properties of the alternative, and in addition nominal (or extrinsic) information such as labels attached by the observer for purposes of identification. For example, a transportation mode alternative may be described by generic variables such as time, cost, and number of stops as well as nominal labels such as "bus," "express," "Alternative No. 4."

It is reasonable to postulate that behavior depends solely on generic properties of an alternative. However, an observed nominal label which is "correlated" with the unobserved generic variable may appear to be related to choice. For example, a label which identifies a transportation mode as bus may be correlated with an unobserved generic variable measuring the schedule flexibility of the mode, and hence may act as a "proxy" for the generic variable.

The "similarity" between alternatives should also be perceived by the subject in generic terms. Again, nominal labels may serve as proxies for unobserved generic indicators. The plausibility of the red bus/blue bus example is based on the assumption that the label "bus" identifies alternatives with closely related unobserved attributes.

Generic models are desirable for forecasting market behavior, particularly the demand for new products. Knowledge of the historical effect of nominal variables, reflecting underlying unmeasured generic effects, is of little use in forecasting the demand for a product whose unmeasured generic attributes have changed.

Empirical experience in travel-demand forecasting suggests that it is difficult to construct purely generic choice models, using conventional market data alone, and obtain plausible fits to observed choice behavior. Hence, the success of this approach to forecasting market behavior would appear to depend on comprehensive measurement of attributes of alternatives.

Measurement Issues

The measurement requirements imposed by generic probabilistic-choice models may be impractical in some marketing applications, where a full inventory of physical attributes of a product would be excessively long and expensive, and would raise fundamental questions of the appropriate dimensions for measurement. Since the proximate objective of these measurements is determination of their impact on the utility of alternatives, direct psychological scaling of consumer ratings of these attributes would appear to be a more direct and economical alternative.

Two constraints should be imposed on the construction of these scales due to the context in which they will be used. First, the scales themselves will become explanatory variables in a probabilistic-choice model of market behavior. Hence, a primary criterion in scale construction should be their ability to provide predictive power in a choice model. Candidate scales can be compared by testing the forecasting accuracy of choice models employing them as variables. Note that this extrinsic criterion for scale construction is quite different from the intrinsic criteria commonly employed in multidimensional scaling. Second, if the scales constructed in the manner described above are to be useful for forecasting purposes, they must be tied back to the physical attributes of products which are subject to design decisions by suppliers. This can be achieved by econometric models relating scale levels to physical attributes, or by psychological methods with trial products.

Attitudes and Behavior

Market behavior in response to an objective decision environment is presumably mediated by attitudes and perceptions. In fact, much of marketing seems to be concerned with modifying the link between the objective attributes of products and the consumer's perception of them.
What, then, should be the role of attitude and judgment scales in a market-behavior forecasting system? As the preceding section indicated, psychological scales can be treated simply as attributes of alternatives in probabilistic-choice models, provided they are constructed so as to contribute to the explanation of behavior on one hand and to be tied to the experience of the consumer and the objective attributes of products on the other. There is no requirement that the explanatory variables in a choice model be exclusively objective attributes or exclusively psychological scales; economy and plausibility should usually suggest a mix.

The demand system will now be "structural." In addition to the choice model, there will be equations (or, more generally, mappings obtained by laboratory or trial-product experiments) relating perceptions and changes in attitudes to experience and to the objective attributes of products. For some forecasting purposes, a reduced form obtained from this system by solving out intervening attitude and perceptual scales and relating behavior to objective attributes will be relevant. For others, where the problem involves modifying the structural relationship between attitudes and objective attributes, each structural equation will be used separately. In both cases, psychological measurement has a well-defined interconnection with behavior and with the objective attributes of products.
Statistical Methods for Estimating Probabilistic-Choice Models

Most parametric probabilistic-choice models have a nonlinear structure requiring a general-purpose estimation technique such as nonlinear least-squares or maximum-likelihood estimation. For some models, notably the MNL model, efficient computer programs have been developed and extensive experience has been accumulated in their use.¹ More recently, an initial version of a "production" multinomial probit program has been released (see Lerman and Manski 1980).

Most of the statistical procedures and empirical applications of probabilistic-choice models to date have treated survey data obtained from exogenous random or stratified sample designs, where the actual choice behavior of subjects does not affect their probability of selection in the sample.

Several recent papers have considered the problem of statistical estimation of choice models using data collected by sampling procedures other than random sampling. Of particular interest are choice-based samples, utilizing data collected from "purchase" or "point-of-sale" surveys. Such data sources are often available to analysts from sales or warranty records or can be commissioned at low cost relative to random household surveys. Manski and Lerman (1977) have shown that treating choice-based samples as if they were random and calculating estimators appropriate to random samples will generally yield inconsistent estimates.² They introduce a weighted likelihood function whose maximization is shown to yield consistent estimates.

Manski and McFadden (1980) have considered more generally the problem of estimation of discrete-choice models under alternative
1. One example in the public domain is the QUAIL program for MNL analysis which I developed.
2. In a MNL model with alternative-specific dummy variables, the inconsistency is confined to the dummy variable coefficients.
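The point of note 2 can be illustrated without sampling noise by working with population limits. The sketch below is an illustration under stated assumptions, not the estimator code of any cited paper: it takes a binary logit with one attribute on a small discrete grid, forms the distribution that a choice-based sample would have when a share `h1` of observations is drawn from choosers of alternative 1, and finds the parameter values that maximize the expected log likelihood. The weights $Q(i)/H(i)$ used in the weighted case are the form usually attributed to the Manski-Lerman estimator and are stated here as an assumption, since the text above does not spell them out. All function names are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def choice_based_cells(z_values, p_z, alpha, beta, h1, weighted):
    """Probability mass on (z, chosen) and (z, not chosen) in a choice-based
    sample, optionally reweighted by Q(i)/H(i). Also returns the share Q(1)."""
    p1 = [logistic(alpha + beta * z) for z in z_values]
    q1 = sum(p * w for p, w in zip(p1, p_z))   # population share of choice 1
    q0 = 1.0 - q1
    w1 = q1 / h1 if weighted else 1.0
    w0 = q0 / (1.0 - h1) if weighted else 1.0
    cells = []
    for z, p, pz in zip(z_values, p1, p_z):
        m1 = w1 * h1 * p * pz / q1
        m0 = w0 * (1.0 - h1) * (1.0 - p) * pz / q0
        cells.append((z, m1, m0))
    return cells, q1


def pseudo_true_logit(cells, iterations=50):
    """Maximize the mass-weighted binary logit log likelihood by Newton steps;
    returns the (intercept, slope) to which the sample estimator converges."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, m1, m0 in cells:
            p = logistic(alpha + beta * z)
            score = m1 * (1.0 - p) - m0 * p
            curvature = (m1 + m0) * p * (1.0 - p)
            g0 += score
            g1 += score * z
            h00 += curvature
            h01 += curvature * z
            h11 += curvature * z * z
        det = h00 * h11 - h01 * h01
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta
```

In this setup the unweighted limit keeps the true slope but shifts the intercept by $\log\{[H(1)/Q(1)]/[H(0)/Q(0)]\}$, while the weighted limit recovers both true parameters.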


FIG. 3.-Contingency table layout of observations. (Table not reproduced: columns index the choice set $C = \{1, \ldots, i, \ldots, J\}$ and rows index the attribute set $Z$; the cell for $(i, z)$ holds $f(i,z) = P(i \mid z, \theta^*) p(z) = q(z \mid i) Q(i)$, with row sums $p(z)$ and column sums $Q(i)$.)

sample designs. The discrete-choice problem can be defined by a finite set $C$ of mutually exclusive alternative responses, a space of attributes $Z$, assumed to be a subset of a finite-dimensional vector space, a probability density $p(z)$ [$z \in Z$], giving the distribution of attributes in the population, and response probability or choice probability $P(i \mid z, \theta^*)$, specifying the conditional probability of selection of alternative $i \in C$, given attributes $z \in Z$. Prior knowledge of causal structure is assumed to allow the analyst to specify the response model $P(i \mid z, \theta)$ up to a parameter vector $\theta^*$. The analyst's problem is to estimate $\theta^*$ from a suitable sample of subjects and their associated responses. The probability density of $(i, z)$ pairs in the population is given by

$$f(i, z) = P(i \mid z, \theta^*) \, p(z), \qquad [(i, z) \in C \times Z]. \tag{16}$$
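Equation (16) can be made concrete on a toy population. In the sketch below the attribute space is reduced to a few labeled values, which is my simplification; the function names are hypothetical. It builds the joint frequencies from the choice probabilities and the attribute distribution, then recovers the population share of each response and the distribution of attributes among those making each response.

```python
def joint_table(p_z, choice_prob):
    """f(i, z) = P(i | z) p(z) for every attribute value z and response i."""
    return {z: [p_i * pz for p_i in choice_prob[z]] for z, pz in p_z.items()}


def shares(table):
    """Population share of each response: the sum of f(i, z) over z."""
    n_responses = len(next(iter(table.values())))
    return [sum(row[i] for row in table.values()) for i in range(n_responses)]


def attribute_given_choice(table):
    """Distribution of z among those giving response i: f(i, z) / share(i)."""
    q = shares(table)
    return {z: [f / q_i for f, q_i in zip(row, q)] for z, row in table.items()}


# Illustrative numbers only: two attribute values and two responses.
p_z = {"low": 0.5, "high": 0.5}
choice_prob = {"low": [0.8, 0.2], "high": [0.4, 0.6]}
table = joint_table(p_z, choice_prob)
```

Here the shares of the two responses are 0.6 and 0.4, and two thirds of those giving the first response have the attribute value "low". Multiplying each conditional distribution by the corresponding share reproduces the same joint table, so the joint frequencies factor in either direction.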

The analyst can draw observations of $(i, z)$ pairs from $C \times Z$ according to one of various sampling rules. The problem of interest is first, given any sampling rule, to determine how $\theta^*$ may be estimated, and second, to assess the relative advantages of alternative sampling rules and estimation methods.

The data layout can be visualized using a contingency table, as illustrated in figure 3. An observation $(i, z)$ occurs in the population with frequency $f(i, z)$. The row sums give the marginal distributions of attributes $p(z)$, while the column sums give the population shares of responses $Q(i)$. The joint frequency $f(i, z)$ can be written either in terms of the conditional probability of $i$ given $z$, or choice probability, or in terms of the conditional probability of $z$ given $i$, as the formulae in the figure illustrate.

The feature of the probabilistic-choice problem which distinguishes it from the general analysis of discrete data is the postulate that the response probability $P(i \mid z, \theta^*)$ belongs to a known parametric family


and reflects an underlying link from $z$ to $i$ which will continue to hold even if the distribution $p(z)$ of the explanatory variables changes.³

Alternatively, given a population $C \times Z$ with probability distribution specified by $f(i, z)$, one might, in the absence of any knowledge of the process relating $i$'s to $z$'s, obtain a random sample from $C \times Z$ and directly examine the joint distribution $f(i, z)$. This exploratory data analysis approach is exemplified by the literature on associations in contingency tables, where it is assumed only that $Z$ is finite (see, e.g., Goodman and Kruskal 1954; Haberman 1974; and Bishop, Fienberg, and Holland 1975).

If one believes that the elements of $C$ index conceptually distinct populations of $z$ values, then the natural analytical approach is to decompose $f(i, z)$ into the product $f(i, z) = q(z \mid i) Q(i)$, where $q(z \mid i)$ gives the distribution of $z$ within the population indexed by $i$ and $Q(i)$ is the proportion of the population with this index. This is the approach taken in discriminant analysis. There, prior knowledge allows the analyst to specify $q(z \mid i)$ up to a parametric family, and a sample suitable for estimating the unknown parameters is obtained from the subpopulation $i$ (see, e.g., Kendall and Stuart 1976 and Anderson 1958).

Models for data analysis of associations, or discriminant analysis, imply (by Bayes law) probabilistic-choice functions. These derived choice probabilities may be inconsistent with economic-choice behavior. When they are consistent, a separate and interesting question is whether the parameter vector $\theta^*$ can be estimated conveniently from the associated model of association or discrimination.

Manski and McFadden (1980) attempt to provide a general theory of estimation for quantal response models. The scope of the investigation is as follows: Consider the problem of estimating $\theta^*$ from stratified samples of $(i, z)$ observations.
A stratified sampling process is one in which the analyst establishes an index set B, partitions C × Z into strata S_b ⊆ C × Z, b ∈ B, and specifies a suitable probability distribution over B. To obtain an (i,z) observation, he draws a stratum according to the specified distribution on B and then samples at random from within the drawn stratum. Within the class of all stratification rules, two symmetric types of stratification are of particular statistical and empirical interest. In "exogenous" sampling, the analyst partitions Z into subsets Z_b, b ∈ B, and lets S_b = C × Z_b. In "endogenous" or "choice-based" sampling, he partitions C into subsets C_b, b ∈ B, and lets S_b = C_b × Z. Less formally, in exogenous sampling the analyst selects decision makers and observes their choices, while in choice-based sampling the analyst selects alternatives and observes decision makers choosing them. In figure 3 exogenous sampling corresponds to stratifying on rows and then sampling randomly from each row, while choice-based sampling corresponds to stratifying on columns and then sampling randomly from each column.

3. This postulate is fundamental to the concept of "scientific explanation." If the response probability function is invariant over populations with different distributions of attributes, then it defines a "law" which transcends the character of specific sets of data. Otherwise, the model provides only a device for summarizing data and fails to provide a key ingredient of "explanation": predictive power.

Manski and McFadden make a detailed statistical examination of maximum-likelihood estimation of θ* in both exogenous and choice-based samples. They find that application of maximum likelihood is wholly classical in exogenous samples. In choice-based samples, however, the form of the maximum-likelihood estimate (MLE) depends crucially on whether the analyst has available certain prior information, namely, the marginal distributions p(z), z ∈ Z, or Q(i), i ∈ C, where Q(i) = Σ_z P(i | z, θ*) p(z). The maximum-likelihood estimator of θ in a choice-based sample when p is known and Q is unknown satisfies
$$\max_{\theta} \; \sum_{n=1}^{N} \log P(i_n \mid z_n, \theta) \;-\; \sum_{n=1}^{N} \log \sum_{z \in Z} P(i_n \mid z, \theta)\, p(z), \tag{17}$$

where (i_n, z_n) is observed for a sample n = 1, . . . , N. When Q and p are both known, (17) is maximized subject to the constraints

$$Q(i) = \sum_{z \in Z} P(i \mid z, \theta)\, p(z). \tag{18}$$
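To make criterion (17) concrete, the following sketch evaluates it on a grid of θ values. It is an illustration under stated assumptions, not the authors' procedure: the model is an assumed binary logit, Z is a small finite set, p(z) and the sampling frequencies H(i) are made-up numbers, and expected cell frequencies under choice-based sampling stand in for a random sample, so the result shows the large-sample behavior of the criterion. All names are hypothetical.

```python
import math

# Hypothetical setup (not from the paper).
C = [0, 1]                            # alternatives
Z = [0.0, 1.0, 2.0]                   # finite attribute values
p = {0.0: 0.5, 1.0: 0.3, 2.0: 0.2}    # known marginal p(z)
H = {0: 0.5, 1: 0.5}                  # sampling frequencies of the alternatives
THETA_STAR = 0.8                      # assumed true parameter

def P(i, z, theta):
    """Assumed binary logit choice probability P(i | z, theta)."""
    p1 = 1.0 / (1.0 + math.exp(-theta * z))
    return p1 if i == 1 else 1.0 - p1

def Q(i, theta):
    """Population share implied by theta: sum over z of P(i | z, theta) p(z)."""
    return sum(P(i, z, theta) * p[z] for z in Z)

# Expected frequency of cell (i, z) under choice-based sampling: H(i) q(z | i).
freq = {(i, z): H[i] * P(i, z, THETA_STAR) * p[z] / Q(i, THETA_STAR)
        for i in C for z in Z}

def criterion_17(theta):
    """Criterion (17), with cell frequencies replacing the sum over observations."""
    return sum(freq[(i, z)] * (math.log(P(i, z, theta)) - math.log(Q(i, theta)))
               for i in C for z in Z)

grid = [k / 10 for k in range(0, 21)]   # candidate theta values 0.0, 0.1, ..., 2.0
theta_hat = max(grid, key=criterion_17)
```

With these expected frequencies the grid maximizer coincides with the assumed true value, as the information inequality implies. With a finite random sample the maximizer would only be close to it.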

When p is unknown, the classical conditions for maximum-likelihood estimation are not met. However, several alternative nonclassical maximum-likelihood and pseudo-maximum-likelihood methods are available which yield consistent estimators. When Q is known and p is unknown, Cosslett (1980) has shown that the nonclassical full-information maximum-likelihood estimator satisfies
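$$\max_{\theta} \min_{\lambda} \sum_{n=1}^{N} \log \left\{ \frac{P(i_n \mid z_n, \theta) \left[ \sum_{i \in C} \lambda_i Q(i) \right]}{\sum_{j \in C} \lambda_j P(j \mid z_n, \theta)} \right\}. \tag{19}$$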

A second estimator, introduced by Manski and Lerman (1977), satisfies

$$\max_{\theta \in \Theta} \sum_{n=1}^{N} w(i_n) \log P(i_n \mid z_n, \theta), \tag{20}$$

where w(i) = Q(i)/H(i) and H(i) is the sampling frequency for alternative i.
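The role of the weights w(i) = Q(i)/H(i) in (20) can be illustrated with a small numerical sketch. It rests on assumptions that are not in the paper: a binary logit model, a finite Z, invented values for p(z) and H(i), and expected cell frequencies in place of a random choice-based sample. The names `criterion`, `theta_weighted`, and `theta_unweighted` are hypothetical.

```python
import math

# Hypothetical setup (not from the paper).
C = [0, 1]
Z = [0.0, 1.0, 2.0]
p = {0.0: 0.5, 1.0: 0.3, 2.0: 0.2}    # marginal p(z)
H = {0: 0.5, 1: 0.5}                  # sampling frequencies H(i)
THETA_STAR = 0.8                      # assumed true parameter

def P(i, z, theta):
    """Assumed binary logit choice probability P(i | z, theta)."""
    p1 = 1.0 / (1.0 + math.exp(-theta * z))
    return p1 if i == 1 else 1.0 - p1

# Known population shares Q(i) and the weights w(i) = Q(i) / H(i) of (20).
Q = {i: sum(P(i, z, THETA_STAR) * p[z] for z in Z) for i in C}
w = {i: Q[i] / H[i] for i in C}

# Expected frequency of cell (i, z) under choice-based sampling: H(i) q(z | i).
freq = {(i, z): H[i] * P(i, z, THETA_STAR) * p[z] / Q[i] for i in C for z in Z}

def criterion(theta, weights):
    """Weighted log likelihood, with cell frequencies replacing the sample sum."""
    return sum(freq[(i, z)] * weights[i] * math.log(P(i, z, theta))
               for i in C for z in Z)

grid = [k / 10 for k in range(0, 21)]
theta_weighted = max(grid, key=lambda t: criterion(t, w))
theta_unweighted = max(grid, key=lambda t: criterion(t, {i: 1.0 for i in C}))
```

In this setup the weighted criterion peaks at the assumed true value, because the weights restore the population cell frequencies, while the unweighted criterion peaks below it. The size of that gap depends entirely on the invented numbers.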

If both p and Q are unknown in a choice-based sample, then, provided an identification condition is satisfied,⁴ Manski and McFadden show that the nonclassical full-information maximum-likelihood estimator satisfies

$$\max_{\theta \in \Theta} \max_{\lambda \ge 0} \sum_{n=1}^{N} \log \left[ \frac{P(i_n \mid z_n, \theta)\, \lambda_{i_n}}{\sum_{j \in C} P(j \mid z_n, \theta)\, \lambda_j} \right]. \tag{21}$$

4. An important case in which the identification condition fails is the MNL model, where in the absence of a knowledge of Q there is a confounding of the effects of Q and alternative-specific dummies.
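One way to see the confounding noted in footnote 4 is that, for an MNL model, multiplying each choice probability by λ_j and renormalizing, as inside the logarithm of (21), is algebraically the same as adding log λ_j to the alternative-specific constant of alternative j. The sketch below checks this identity numerically. The utilities, constants, and λ values are invented for illustration, and the function names are hypothetical.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit probabilities for a list of systematic utilities."""
    m = max(utilities)                       # subtract the max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def weighted_probs(probs, lam):
    """Term inside the log of (21) for every i: P(i) lam_i / sum_j P(j) lam_j."""
    denom = sum(pj * lj for pj, lj in zip(probs, lam))
    return [pi * li / denom for pi, li in zip(probs, lam)]

# Hypothetical three-alternative example.
alpha = [0.0, 0.5, -0.3]   # alternative-specific constants
beta = 0.8                 # coefficient on the attribute
x = [1.0, 2.0, 0.5]        # attribute level of each alternative
lam = [1.0, 2.5, 0.4]      # multipliers lambda_j

V = [a + beta * xi for a, xi in zip(alpha, x)]

# Left: lambda-weighted MNL probabilities. Right: MNL with shifted constants.
lhs = weighted_probs(mnl_probs(V), lam)
rhs = mnl_probs([v + math.log(l) for v, l in zip(V, lam)])

# The weighted form is also unchanged when lambda is rescaled.
scaled = weighted_probs(mnl_probs(V), [3.0 * l for l in lam])
```

Because `lhs` and `rhs` agree, the data cannot separate λ from the constants in this model without outside information such as Q. The invariance to rescaling λ also shows that only relative values of λ matter in the criterion.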

Note that one can, with some loss of efficiency, obtain consistent estimates for an information case by using a consistent estimator which ignores some available information. For example, the estimators (19) or (20) could be used in the case both p and Q are known, and the estimator (21) could be used in any of the information cases.

The choice-based sampling estimators above have been extended by Cosslett (1980) to a case which has particular potential value for application to marketing data. Suppose p(z) is observed, say, from the U.S. Census Public Use Sample or other general data sources, and a choice-based sample is drawn for a single alternative, say, data from warranty registrations of purchasers. (This is termed an "enriched" sample.) Then, modified versions of the criterion (19) or (21) can be used to estimate the parameters of the probabilistic-choice function.

More generally, this approach to the question of sample design and estimation method suggests that an integrated analysis of data availability, choice model structure, survey cost, and statistical method can improve the flexibility, precision, and cost effectiveness of market forecasts.

Conclusion

This paper has surveyed recent developments in the specification and estimation of econometric models of probabilistic choice. The motivation for this work has been the problem of forecasting the impact on demand of introducing new commodities or modifying aspects of existing commodities. Particular attention is given to correctly representing patterns of similarities between alternatives and to developing statistical methods for estimation of probabilistic-choice models from "purchase" surveys. The applications of these methods in transportation and labor economics appear similar to problems in market research, suggesting that some of these methods may be useful in the latter discipline.
References
Anderson, T. W. 1958. An Introduction to Multivariate Statistical Analysis. New York: Wiley.

Ben-Akiva, M., and Lerman, S. 1977. Disaggregate travel and mobility choice models and measures of accessibility. Mimeographed. Forthcoming in P. Stopher (ed.), Third International Conference on Behavioural Travel Modelling.

Bishop, Y.; Fienberg, S.; and Holland, P. 1975. Discrete Multivariate Analysis. Cambridge, Mass.: M.I.T. Press.

Bock, R. D., and Jones, L. V. 1968. The Measurement and Prediction of Judgement and Choice. San Francisco: Holden-Day.

Cardell, S. 1977. A choice model without independence from irrelevant alternatives. Mimeographed. Charles River Associates.

Cosslett, S. 1980. Efficient estimation of discrete choice models. In C. Manski and D. McFadden (eds.), Structural Analysis of Discrete Data. Cambridge, Mass.: M.I.T. Press.

Daganzo, C.; Bouthelier, F.; and Sheffi, Y. 1977. Multinomial probit and qualitative choice: a computationally efficient algorithm. Transportation Science 11, no. 4 (November): 338-58.

Daly, A., and Zachary, S. 1979. Improved multiple choice models. In D. Hensher and Q. Dalvi (eds.), Identifying and Measuring the Determinants of Mode Choice. London: Teakfield.

Goodman, L., and Kruskal, W. 1954. Measures of association for cross-classification. Journal of the American Statistical Association 49, no. 268 (December): 732-64.

Haberman, S. 1974. The Analysis of Frequency Data. Chicago: University of Chicago Press.

Hausman, J., and Wise, D. A. 1978. A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 48, no. 2 (March): 403-26.

Kendall, M., and Stuart, J. 1976. Advanced Theory of Statistics. Vol. 3. New York: Hafner.

Lerman, S., and Manski, C. 1980. On the use of simulated frequencies to approximate choice probabilities. In C. Manski and D. McFadden (eds.), Structural Analysis of Discrete Data. Cambridge, Mass.: M.I.T. Press.

Luce, R. D. 1959. Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.

McFadden, D. 1973. Conditional logit analysis of qualitative choice behavior. In P. Zarembka (ed.), Frontiers in Economics. New York: Academic Press.

McFadden, D. 1974. Measurement of urban travel demand. Journal of Public Economics 3, no. 4 (November): 303-28.

McFadden, D. 1976. Quantal choice analysis: a survey. Annals of Economic and Social Measurement 5, no. 4 (January): 363-90.

McFadden, D. 1978. Modelling the choice of residential location. In A. Karlquist et al. (eds.), Spatial Interaction Theory and Planning Models. Amsterdam: North-Holland.

McFadden, D. 1980. Econometric models of probabilistic choice. In C. Manski and D. McFadden (eds.), Structural Analysis of Discrete Data. Cambridge, Mass.: M.I.T. Press.

McFadden, D.; Talvitie, A.; Cosslett, S.; Hasan, I.; Johnson, M.; Reid, F.; and Train, K. 1978. Demand Model Estimation and Validation. Institute of Transportation Studies, Urban Travel Demand Forecasting Project, Final Report Series, vol. 5. Berkeley: University of California Press.

Manski, C., and McFadden, D. 1980. Alternative estimators and sample designs for discrete choice analysis. In C. Manski and D. McFadden (eds.), Structural Analysis of Discrete Data. Cambridge, Mass.: M.I.T. Press.

Manski, C., and Lerman, S. 1977. The estimation of choice probabilities from choice-based sampling. Econometrica 45, no. 8 (November): 1977-88.

Punj, G., and Staelin, R. 1978. The choice process for graduate business schools. Journal of Marketing Research 15 (November): 588-98.

Silk, A., and Urban, G. 1978. Pre-test-market evaluation of new packaged goods: a model and measurement methodology. Journal of Marketing Research 15 (May): 171-91.

Thurstone, L. 1927. A law of comparative judgement. Psychological Review 34, no. 4 (July): 272-86.

Tversky, A. 1972. Elimination by aspects: a theory of choice. Psychological Review 79, no. 4 (July): 281-99.

Williams, H. C. L. 1977. On the formation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A 9, no. 3 (March): 285-344.
