Modeling Bounded Rational Decision Making

Journal of Economic Behavior and Organization 21 (1993) 331-352.
North-Holland
A model of decision making under bounded rationality*
Kent D. Wall
Naval Postgraduate School, Monterey, CA, USA
Received November 1991, final version received June 1992
A model of decision making under bounded rationality is presented that combines satisficing
behavior with learning and adaptation through environmental feedback. The aspirations, or
goals, of the decision maker dynamically adjust in response to the observed sequence of past
decisions and their corresponding effects on the decision maker’s objective function. A simple
linear response model is employed to represent the beliefs of the decision maker concerning the
causal connection between his/her decisions and the resulting objective function value. The
combination of these simple elements yields a decision process model rich in dynamic behavior;
it can exhibit optimizing behavior in the long-run and chaotic pseudo-random search in the
short-run. As such, the model bridges the gap between substantive rationality and procedural
rationality.
1. Introduction
There is no doubt that mathematical optimization as a characterization of
long-run economic equilibrium is of value. Certainly such an approach is
viable in situations where changes in the economic environment occur slowly
relative to the speed with which agents can adjust, and agents are able to
acquire accurate information sufficiently fast and relatively inexpensively.
Under these conditions the long-run dominates the short-run because the
period of adjustment is so transient as to be inconsequential. For all
practical purposes we may consider agents to be forever in equilibrium,
moving instantaneously from one to another as the environment changes and
information dictates.
If, however, speeds of adjustment are sufficiently constrained, or information
is either incomplete or imperfect so that learning effects cannot be
ignored, then the long-run is no longer dominant. In these situations it is of
interest to study the dynamic response of economic agents and it becomes
necessary to model precisely just how decisions are actually made.
Correspondence to: Kent D. Wall, DRMI, code 64Wa, U.S. Naval Postgraduate School,
Monterey, CA 93943, USA.
*The author wishes to thank James D. Hamilton and Maxim Engers for their help in framing
the issues presented in the introduction, and in an earlier version of the proof of Theorem 1.
Thanks are also due the anonymous referees for many constructive suggestions.
0167-2681/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved
Such considerations have not been lost on many prominent economists.
Simon (1955, 1957), March and Simon (1958), Cyert and March (1962),
Baumol and Quandt (1964), Day (1967), Winter (1971), Nelson and Winter
(1973), and Radner (1975) among others have given theoretical direction.
Theoretical research in this area, however, has not experienced the rapid
growth of other subjects like rational expectations, partly owing to the
limited framework available for empirical investigations. An example of the
empirical work done in this respect is Crain et al. (1984). Their findings
encourage further theoretical developments, but their nonparametric methodology
does not support the testing of more detailed models.
The primary purpose of this paper is to lay out a model that addresses
this empirical gap. The model has two primary advantages: (1) its form
admits econometric estimation; and (2) its framework contains the optimizing
paradigm as a special case, thus permitting testing of this paradigm against
an alternative. The model is predicated on Simon’s concept of bounded
rationality and embodies explicit mechanisms for learning, adaptation, and
goal formation, all in a form suited for empirical studies. Other researchers
have touched upon this theme and elements of the model presented here owe
much to their efforts. Most notable among these are Day, March, and, of
course, Simon. The notion of adaptive aspirations is attributed to Simon (1955) and March
and Simon (1958). Simple first order adjustment schemes for the adaptation
of aspirations can be found in Levinthal and March (1981) and March
(1988). Representation of search and learning in a form analogous to
nonlinear programming optimization is due to Day (1967) and Day and
Tinney (1968). This paper constitutes an elaboration and synthesis of these
earlier efforts. The exposition begins with a statement of the conceptual
framework and then proceeds to its implementation in concrete operational
terms. Some properties of the resulting model then are demonstrated, and a
brief discussion of the method necessary to employ the model in empirical
research is given.
2. Bounded rationality
The descriptive model of decision making presented here owes its con-
ceptualization to Simon’s theory of bounded rationality, the essence of which
may be captured in eight statements:
[A] Decision making is dominated by the effects of complexity on the
limited abilities of humans to process large amounts of information.
Thus, information processing tends to be parsimonious, and solutions
are simple-minded.
[B] New solutions are synthesized by modifying the currently implemented
one; so search is local.
[C] Alternatives are considered one at a time, not simultaneously, so search
is sequential.
[D] The search for a new and better solution is undertaken only when it is
deemed necessary; when it is observed that goals are not being met.
[E] A satisficing mode is used in searching; the first solution that is ‘good
enough’ is implemented.
[F] Goals are stated in terms of aspirations, and these are formed by
adaptation and learning from experience.
[G] Search strategies are developed on the basis of learning and adaptation
through experience.
[H] The attention the decision maker pays to the environment is the
product of learning and adaptation driven by experience.
The rendering of these tenets in mathematical terms depends directly upon
the type of problem under study. In economics the underlying decision
problem is that faced by a single decision maker who must select the values
of N decision variables at time t so as to maximize the value of some
objective function, f. The decision variables are represented by an N-vector,
x(t), indexed by t, the time at which their values have been fixed. The
objective function is then written as a function of x(t), f(t) = f(x(t)). The
choice of x(t) is limited by constraints and so the decision maker must select
x(t) from some set of feasible alternatives denoted by X(t). Time is assumed
to evolve discretely so t takes its values from the set of integers, $\mathcal{I}$. The
decision variables are assumed to be continuously valued so x(t) takes its
values in $\mathcal{R}^N$. The objective function f is assumed to be a continuously
differentiable mapping from $\mathcal{R}^N$ to $\mathcal{R}^1$. This basic decision problem, given
bounded rationality, is stated not as a maximization problem, but one in
which the decision maker must determine x(t) subject to $x(t) \in X(t)$ such
that $f(t) = f(x(t)) \ge a$, where a denotes currently held aspirations.
Given this framework, the desiderata of a behaviorally based model
incorporating bounded rationality are implemented as follows. First, con-
sideration of [A] through [C] suggests decision rules specifying a change in
decision variables relative to previous values:
$$x(t+1) = x(t) + \Delta x(t+1) = x(t) + \alpha(t+1)\,d(t+1), \tag{1}$$
where d(t) represents the local search direction and $\alpha(t)$ denotes the step
length to be taken along this direction. Furthermore, these two quantities are
chosen on the basis of an information set Y(t) gathered from experience:
$$d(t+1) = \Psi_1(Y(t)), \tag{2}$$
$$\alpha(t+1) = \Psi_2(Y(t)). \tag{3}$$
Realistic modeling would suggest that Y(t) be very simple and limited in its
scope. The decision maker should not be assumed in possession of inform-
ation not directly observable or easily inferred from directly observable data.
For example, in a model of monopolistic producer behavior we would
exclude elasticities of demand from Y(t), but include the past prices asked
for the product produced and the corresponding consumption observed.
In consideration of [D] through [F] the decision maker is presumed to
hold at the end of time t an aspiration level a(t+ 1) representing a value of
the objective function f( .) which he or she believes is satisfactory and a
reasonable goal to attain in the next time period. Motivation for continued
search is provided by failure to achieve the currently defined goal; i.e.,
If $f(x(t)) < a(t+1)$, continue searching for a better alternative. (4)
On the other hand, if satisfactory performance has been achieved then
terminate searching; i.e.,
If $f(x(t)) \ge a(t+1)$, set $\alpha(t+1) = 0$ and stop the search process. (5)
Finally, adaptation and learning require that a(t + 1) be adjusted according
to experience as captured in Y(t):
$$a(t+1) = \Psi_3(Y(t)). \tag{6}$$
3. The operational model
Any empirical implementation of the model described by (1) through (6)
requires concrete form for Y(t) and the mappings $\Psi_i$, where i = 1, 2, 3. Broad
guidance in such a task is provided by three overriding considerations: First,
research in cognitive psychology has found that humans employ simple
linear relationships to capture causal connections between variables and that
linear models capture a great deal of this behavior [Dawes and Corrigan
(1974)]. Second, there is evidence of persistence and inertia in the formation
of beliefs and in the processing of information used to modify relationships
[Edwards (1968), Einhorn (1980)]. Third, perception and judgment suffer
from significant biases through which objective data are selectively attended
to and subjectively interpreted [Kahneman, Slovic and Tversky (1982) parts
II-IV]. The simplest possible information processing mechanisms should be
employed with the focus primarily on linear relationships.
With these considerations in mind, the information set is assumed to
consist of nothing more than observations on past and current values of f(t)
and x(t). Thus
$$Y(t) = \{f(\tau), x(\tau);\ \tau = t, t-1, t-2, \ldots\}. \tag{7}$$
This is to be contrasted with the information set assumed in the neoclassical
optimizing paradigm where complete information about f is assumed; i.e.,
the functional form of f(t) and all parameters, derivatives, and so forth are
known. A further restriction can be placed on (7) by requiring a finite
‘memory’, and this is done below by inclusion of only the current and N
previous observations as required.
Eqs. (2) and (3) are replaced by a set of operations consistent with [D]
and [G] but involving only simple linear functions. To wit, the decision
maker at the beginning of period t+ 1, holding aspirations a(t + l), chooses
x(t + 1) to force the value of the objective function, f, at the end of the
period to equal a(t + 1). The function f( .), however, is not included in Y(t),
only past observed values of f and corresponding x are available for use in
determining x(t + 1). The exact form of f, its parameters, and how its value is
determined by x must be learned. Therefore, to reflect this state of incomplete
information, the decision maker is hypothesized to possess an estimate of this
function, denoted $f^e$, and to use this in place of f. Algebraically this
amounts to requiring x(t + 1) satisfy
$$a(t+1) = f^e(t+1) = f^e(x(t+1)).$$
Furthermore, the decision maker uses a linear representation of reality in
constructing $f^e$:
$$f^e(t+1) = f(t) + c_t'\,\Delta x(t+1). \tag{8}$$
The desired increment to x(t) can then be obtained by computing the
minimum norm solution to this equality constraint; i.e.,
$$\Delta x(t+1) = [a(t+1) - f(t)]\,\frac{c_t}{\|c_t\|^2} = \alpha(t+1)\,d(t+1).$$
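The minimum norm solution can be checked with a short Lagrangian argument. The lines below are a sketch added for clarity, assuming the sensitivity vector $c_t$ is nonzero; the multiplier $\lambda$ is notation introduced here and is not part of the original exposition.

```latex
\begin{aligned}
&\min_{\Delta x}\ \tfrac{1}{2}\,\Delta x'\Delta x
  \quad \text{subject to} \quad c_t'\,\Delta x = a(t+1) - f(t), \\
&\mathcal{L} = \tfrac{1}{2}\,\Delta x'\Delta x
  - \lambda\,\bigl[c_t'\,\Delta x - a(t+1) + f(t)\bigr], \\
&\frac{\partial \mathcal{L}}{\partial \Delta x} = \Delta x - \lambda c_t = 0
  \ \Longrightarrow\ \Delta x = \lambda c_t, \\
&c_t'(\lambda c_t) = a(t+1) - f(t)
  \ \Longrightarrow\ \lambda = \frac{a(t+1) - f(t)}{\|c_t\|^2}, \\
&\Delta x(t+1) = \frac{a(t+1) - f(t)}{\|c_t\|}\cdot\frac{c_t}{\|c_t\|}.
\end{aligned}
```

Splitting the last line into a scalar factor and a unit vector gives a step length of $[a(t+1) - f(t)]/\|c_t\|$ along the direction $c_t/\|c_t\|$.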
The step length is seen to depend upon the difference between the achieved
objective and the desired value of the objective. Thus, if the decision maker is
close to his or her goal then a small step is taken, while large discrepancies
between the actual and desired objective value will lead to more bold search,
in the form of large changes in the decision variables. If the decision maker
had achieved the goal then no step is taken and satisficing behavior obtains.
The search direction is seen to be a unit vector which approximates the
direction of steepest ascent but uses only ‘backward looking’ information.
While this unit vector never changes in magnitude, the direction in which it
points does change with experience and therefore embodies the learning and
adaptation required by [G].
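The step rule just described can be sketched in a few lines of Python. This is an illustrative sketch, not the author's code: the function name `search_step` and its arguments are hypothetical, the satisficing rule (5) is folded in by returning a zero step when the goal is already met, and taking no step when the estimated sensitivity vector is zero is an added assumption.

```python
import numpy as np

def search_step(a_next, f_now, c):
    """Return (alpha, d, dx): step length, unit search direction, increment.

    a_next : aspiration a(t+1)
    f_now  : achieved objective value f(t)
    c      : current sensitivity estimate c_t (N-vector)
    """
    c = np.asarray(c, dtype=float)
    norm = np.linalg.norm(c)
    if norm == 0.0:
        # Assumption: with no directional information, stay put.
        zero = np.zeros_like(c)
        return 0.0, zero, zero
    d = c / norm                       # unit vector along estimated ascent
    if f_now >= a_next:
        alpha = 0.0                    # rule (5): goal met, stop searching
    else:
        alpha = (a_next - f_now) / norm
    return alpha, d, alpha * d

# Example: the increment closes the gap under the linear approximation.
alpha, d, dx = search_step(a_next=10.0, f_now=5.0, c=[3.0, 4.0])
gap_closed = float(np.dot([3.0, 4.0], dx))   # equals a_next - f_now
```

In the example the predicted change in the objective, c'dx, equals the gap a(t+1) - f(t), which is exactly the equality constraint the minimum norm solution is meant to satisfy.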
The N-vector $c_t$ represents the decision maker’s best estimate of the
sensitivity of f(t) to changes in x(t). It is comprised of nothing more than
simple finite first differences derived from Y(t). This formulation follows after
the work of Cyert and March (1962) and Day (1967) who argue that humans
employ finite first difference approximations when dealing with the concepts
of derivative and gradient. In the one-dimensional case where x(t) is a scalar
$c_t$ is simply
$$c_t = c(t) = [f(t) - f(t-1)]/[x(t) - x(t-1)].$$
In the multi-dimensional case $c_t$ is defined by a set of N equations:
$$\begin{aligned}
\Delta f(t) &= c_t'[x(t) - x(t-1)] \\
&\ \ \vdots \\
\Delta f(t-N+1) &= c_t'[x(t-N+1) - x(t-N)],
\end{aligned}$$
where the prime denotes transposition. Therefore, $c_t$ is any solution to
$$\Delta f(t) = X(t)\,c_t,$$
where X'(t) is the N x N matrix obtained by columnwise concatenation of
the vectors x(t-j) - x(t-j-1), $0 \le j \le N-1$. One such solution is obtained
via the (Moore-Penrose) pseudo-inverse:
$$c_t = X(t)^{+}\,\Delta f(t).$$
If the rows of X(t) are linearly independent then $X(t)^{+}$ is replaced by $X(t)^{-1}$.
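A minimal numerical sketch of this finite-difference estimate follows. It assumes NumPy's `numpy.linalg.pinv` for the Moore-Penrose pseudo-inverse; the function name `estimate_sensitivity` and the layout of the history arrays (oldest observation first) are hypothetical choices made here for illustration.

```python
import numpy as np

def estimate_sensitivity(x_hist, f_hist):
    """Estimate c_t from the last N+1 observations.

    x_hist : array of shape (N+1, N), decisions x(t-N), ..., x(t), oldest first
    f_hist : array of shape (N+1,), objective values f(t-N), ..., f(t)
    """
    x_hist = np.asarray(x_hist, dtype=float)
    f_hist = np.asarray(f_hist, dtype=float)
    # Row j of X is x(t-j) - x(t-j-1); entry j of df is f(t-j) - f(t-j-1).
    X = np.diff(x_hist, axis=0)[::-1]
    df = np.diff(f_hist)[::-1]
    return np.linalg.pinv(X) @ df

# Example with a linear objective f(x) = 2*x1 - 3*x2 + 1, for which the
# finite differences recover the true gradient.
x_hist = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
f_hist = [2 * x1 - 3 * x2 + 1 for x1, x2 in x_hist]
c = estimate_sensitivity(x_hist, f_hist)
```

For a nonlinear objective the same call returns a backward-looking secant approximation rather than the true gradient, which is the point of the formulation.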
All that remains to be specified is the exact form of (6), and here we appeal
to March and Simon (1958) and March (1988) as a point of departure.
Aspirations are hypothesized to adjust to experience by a linear first-order
adaptive adjustment mechanism. Two additional terms, however, are added
representing environmental feedback from the observed values of f(t) and
the observed rate of change in f(t). The result is a form of double
exponential smoothing on f(t):
$$\bar f(t+1) = [1-\beta]\,\bar f(t) + \beta f(t),$$
$$\dot f(t+1) = [1-\delta]\,\dot f(t) + \delta[f(t) - f(t-1)],$$
$$a(t+1) = [1-\phi]\,a(t) + \phi\,\bar f(t+1) + \gamma\,\dot f(t+1).$$
Aspirations are seen to adjust to exponentially smoothed estimates of f(t)
and $\Delta f(t)$. Thus it is not only the perceived level of the objective function
but also the perceived rate of change of the objective function that influence
aspirations. Goal formation is argued to be significantly affected by the
magnitude of change of the objective with a rapidly rising (falling) f(t)
leading the decision maker to increase (decrease) the value of a(t+ 1) above
(below) what would have been the case by keying only on the level of f(t).
The inclusion of a rate of change effect does not appear to have been
considered in the existing literature, but is crucial to the performance of this
model. The motivation for this term is based on two considerations. First,
humans attend to rapidly changing factors far more readily than they do to
those changing only gradually over time [Kahneman and Tversky, in
Kahneman, Slovic and Tversky (1982, ch. 4), and Hogarth (1987, ch. 2)]. One
needs only to reflect upon the public reaction to a change in gasoline prices
from, say $0.90/gal to $1.30/gal that occurs over a period of two or three
weeks. The same magnitude of change that occurs over a period of two years
evokes almost no reaction at all. Second, the formation of goals is more
dependent upon long term perceptions than the formation of the estimate $f^e$, which is
only concerned with what can be expected over the next decision period.
Therefore, aspirations are more likely to be a function of what is perceived to
be possible to achieve over more than one decision period. Using $\bar f(t+1)$ as
a base and $\dot f(t+1)$ as a measure of the average rate of change in f(t) over the
next k periods, one may expect that it is possible to achieve $\bar f(t+1) + \dot f(t+1)k$
over the next k periods. By setting $\gamma = \phi k$ we obtain the expression for
a(t + 1).
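The aspiration update amounts to three one-line recursions: a smoothed level of the objective, a smoothed rate of change, and the aspiration itself. The Python sketch below is illustrative only; the function name `update_aspiration` and the argument names are hypothetical, and the numbers in the example are arbitrary rather than taken from the paper.

```python
def update_aspiration(f_bar, f_dot, a, f_now, f_prev, beta, delta, phi, gamma):
    """One pass of the double exponential smoothing for aspirations.

    f_bar, f_dot : current smoothed level and smoothed rate of change of f
    a            : current aspiration a(t)
    f_now, f_prev: observed f(t) and f(t-1)
    Returns the updated (f_bar, f_dot, a) for period t+1.
    """
    f_bar_next = (1.0 - beta) * f_bar + beta * f_now
    f_dot_next = (1.0 - delta) * f_dot + delta * (f_now - f_prev)
    a_next = (1.0 - phi) * a + phi * f_bar_next + gamma * f_dot_next
    return f_bar_next, f_dot_next, a_next

# Example: a rising objective pulls the aspiration above its previous value.
f_bar, f_dot, a = update_aspiration(
    f_bar=10.0, f_dot=1.0, a=12.0, f_now=11.0, f_prev=10.0,
    beta=0.4, delta=0.3, phi=0.4, gamma=1.2,
)
```

Setting gamma to zero removes the rate-of-change term and leaves a plain first-order adjustment of aspirations toward the smoothed level.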
The model of decision making under bounded rationality can now be
summarized:
Step 0. Initialization.
Set $t = t_0$, $x(t_0) = x_0$, $f(t_0) = f_0$, and
Step 1. Update the information set at end of period t (by observing f(t)
and remembering x(t))
$$Y(t) = \{f(t), x(t), Y(t-1)\}. \tag{9}$$
Step 2. Update aspirations
$$\bar f(t+1) = [1-\beta]\,\bar f(t) + \beta f(t), \tag{10}$$
$$\dot f(t+1) = [1-\delta]\,\dot f(t) + \delta[f(t) - f(t-1)], \tag{11}$$
$$a(t+1) = [1-\phi]\,a(t) + \phi\,\bar f(t+1) + \gamma\,\dot f(t+1). \tag{12}$$
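Step 3. Update linear representation of the environment
$$c_t = X(t)^{+}\,\Delta f(t). \tag{13}$$
Step 4. Determine new decision to implement
$$\alpha(t+1) = [a(t+1) - f(t)]/\|c_t\|, \tag{14}$$
$$d(t+1) = c_t/\|c_t\|, \tag{15}$$
$$\Delta x(t+1) = \alpha(t+1)\,d(t+1), \tag{16}$$
$$x(t+1) = x(t) + \Delta x(t+1). \tag{17}$$
Step 5. Implement new decision
Set t = t + 1, implement decision and return to Step 1.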
It is important to note here that the model addresses in a behaviorally
attractive, but implicit way, all the elements of the neoclassical search theory
developed by Stigler. The model can be interpreted as balancing marginal
gains (to be derived from continued search) against the cost of continued
search. The aspiration, a(t + 1), contains all the marginal benefit ‘calculations’.
The gap a(t+1) - f(t) represents the value of continued search. If it is large,
then search is very worthwhile and large steps in decision space will be taken
(subject to any hard constraints, like the physical limitations on the rate of
change of capital). The cost of changing is well worth the potential for gain.
If, on the other hand, a(t+1) - f(t) is small, then small steps will be taken
because there is little gain to be had. When a(t+1) = f(t) there is no benefit
to continued search and it ceases. Since a(t+1) is a function of
$\bar f(t+1) + h\dot f(t+1)$ we find that when there is an expectation of gain, then
search is undertaken because it is deemed worth it.
4. Dynamic properties of the model
The model presented above is inherently dynamic, representing the evolution
of decision making over time by a set of difference equations. In the
terminology of dynamic systems, it is capable of describing transient
behavior or the short-run fluctuations of the decision variables as they are
incrementally altered in the search for a satisfactory solution. For such
models two overriding issues arise: (1) the existence of steady-state, or long-
run, solutions; and (2) the stability of these solutions.
For use of the model in economics it seems reasonable to demand that the
model demonstrate neoclassical optimizing behavior in the long-run under
circumstances where learning and adaptation would produce an accumulation
of knowledge leading to complete information. In other words, it
seems reasonable to demand that the neoclassical optimum be a solution to
the model and that the model be able to converge to this solution. The
following theorem, the proof of which is found in the appendix, demonstrates
the ability of the model to do this under certain conditions.
Theorem 1. Consider $f: \mathcal{R}^N \to \mathcal{R}^1$, strictly concave and differentiable on an
open bounded set $\mathcal{S} \subset \mathcal{R}^N$. Let $x^* \in \mathcal{S}$ satisfy $f(x^*) > f(x)$ for all $x \in \mathcal{S}$, $x \ne x^*$,
and take $a(t) = f^* = f(x^*)$ for all t. Define x(t + 1) to be the element of
$X(t) \subset \mathcal{S}$ nearest to $x(t) + \alpha(t+1)d(t+1)$ for t = 1, 2, 3, ... and where
the initial conditions have been assigned so that f(0) > f(-1) > ... > f(-N).
Then the sequence {x(t)} converges to x*.
The model is thus capable of converging to the neoclassical optimizing
solution given feasible starting values and the omniscience to know the
maximum attainable value of the objective function. The learning and
adaptation inherent in the naive search scheme of (13)-(17) is sufficient to
seek out the optimum solution when f * is given.
It is unreasonable, however, to assume that f * is initially available. In
realistic situations this information must be extracted from experience. It
thus becomes critical to see if the model can learn f * as the search proceeds.
More specifically, it is of vital interest to ascertain the interaction between
(13)-(17) and the learning and adaptation equations for aspirations, (10)-(12).
If only a reasonable initial value for a(0) is required, and not f *, then a
much more interesting model obtains - one with far more empirical and
theoretical potential.
At the present time a proof of Theorem 1 with a(0) = f * replaced by
a(0) > f(0) is not available. All that can be offered here are the results of
some simulation studies. They illustrate diverse dynamic behavior and
demonstrate the capability required to learn, adapt, and converge to the
optimizing solution. This convergence is by no means guaranteed, for it
depends on a complex interplay between the initial conditions and the
parameters governing adaptation and learning. Similar findings are present in
the simulation studies of economic behavior carried out by Witt (1986) in
terms of market behavior and firm survival.
Simulation Example 1. Consider the quadratic function in one decision
variable presented in Day (1967):
$$f(x(t)) = -x(t)^2 + 8x(t) - 1.$$
This function has a unique global maximum at x* = 4.0 with f* = 15.0. The
simulation studies presented below are initialized using x(-1) = 0,
x(0) = 1.0, and a(0) = 12. The decision maker is constrained to select only
non-negative x(t) and can change x(t) by no more than 10% in any one
period. The first simulation depicted in figs. 1 and 2 below used $\phi = 0.4$,
$\beta = 0.4$, $\delta = 0.3$, and $\gamma = 3\phi$ (h = 3). The first figure shows how a(t), f(t),
and $\bar f(t) + h\dot f(t)$ behave over time. The model appears to converge to the
maximizing decision within 25 time periods. Note how expectations and
aspirations rise, forcing search to proceed long enough for the actual
maximum achievable objective value to be found. After this time, aspirations
adapt to the mounting evidence that no greater objective value is possible.
Eventually aspirations fall and converge towards the optimum. The second
figure presents the x(t) trajectory through the first 600 time periods.
Convergence, for all practical purposes, is achieved; however, more than 2,000
iterations are required before x(t) converges to x* in the strict sense. This
result is representative of the performance under a range of values for $\phi$, $\beta$
and $\delta$. Learning and adaptation suffice to lead the decision maker to within
some neighbourhood of the neo-classical optimum.
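The one-dimensional experiment can be sketched as a short simulation loop. The code below is a hedged illustration, not a reproduction of the reported figures: the function names are hypothetical, the quadratic is the one with maximum f* = 15 at x* = 4, and several details are assumptions made here because the source does not state them, namely that the smoothed level starts at f(0), that the smoothed rate starts at f(0) - f(-1), and that the last sensitivity estimate is retained in periods where x did not change.

```python
def f(x):
    """Quadratic objective with its maximum f* = 15 at x* = 4."""
    return -x ** 2 + 8.0 * x - 1.0

def simulate(T=50, phi=0.4, beta=0.4, delta=0.3, h=3.0,
             x_prev=0.0, x_now=1.0, a=12.0, max_change=0.10):
    """Run T periods of the satisficing search; return (xs, fs, aspirations)."""
    gamma = h * phi
    f_prev, f_now = f(x_prev), f(x_now)
    f_bar = f_now                 # assumed initial smoothed level
    f_dot = f_now - f_prev        # assumed initial smoothed rate of change
    c = (f_now - f_prev) / (x_now - x_prev)
    xs, fs, asp = [x_now], [f_now], [a]
    for _ in range(T):
        # Step 2: update aspirations.
        f_bar = (1.0 - beta) * f_bar + beta * f_now
        f_dot = (1.0 - delta) * f_dot + delta * (f_now - f_prev)
        a = (1.0 - phi) * a + phi * f_bar + gamma * f_dot
        # Step 3: finite-difference sensitivity (kept if x did not move).
        if x_now != x_prev:
            c = (f_now - f_prev) / (x_now - x_prev)
        # Step 4: minimum norm step, zero when satisfied or c is zero.
        if f_now >= a or c == 0.0:
            dx = 0.0
        else:
            dx = (a - f_now) / c
        # Constraints: at most a 10% change per period, x non-negative.
        limit = max_change * x_now
        dx = max(-limit, min(limit, dx))
        x_next = max(0.0, x_now + dx)
        # Step 5: implement the decision and observe the outcome.
        x_prev, f_prev = x_now, f_now
        x_now, f_now = x_next, f(x_next)
        xs.append(x_now)
        fs.append(f_now)
        asp.append(a)
    return xs, fs, asp

xs, fs, asp = simulate(T=50)
```

Changing `h` from 3 to 1 in the call is the natural way to explore the short-horizon case; what trajectory results will depend on the initialization assumptions noted above.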
The model exhibits other behavior, however, that illustrates the possibility
of sub-optimization through what might be called defective information
processing. For example, consider the behavior captured in figs. 3 and 4.
Here h has been reduced to 1, representing a decision maker with a short
horizon. In this situation $\bar f(t) + h\dot f(t)$ does not rise fast enough to force a(t) to
stay above f(t) long enough to permit the decision maker to learn the
maximum achievable value of f(t). In this case f(t) ‘catches up with’ a(t).
The decision maker becomes satisfied with the achieved objective value too
soon and search ceases too early. Compare this situation with that exhibited
in fig. 1.
The simulations of figs. 1 through 4 depict what might happen in a
stable environment where learning and adaptation result in an accumulation
of knowledge that ultimately leads the decision maker to the neo-classical
optimum, or some small neighborhood of it. The real-world, however, is
better characterized as a changing environment and it is of some interest to
investigate how the model behaves in such a situation. Therefore the problem
considered above was repeated with a changing coefficient in the objective
function. Figs. 5 and 6 present model behavior in the case where the
coefficient on x(t) is allowed to change according to the sequence
{8, 9, 9.5, 8, 7, 6.5, 5, 4.5, 4, 4.5}. For this simulation $\phi = 0.2$, $\beta = 0.5$, $\delta = 0.3$,
$\gamma = 2\phi$ (h = 2), and a(0) = 8. Fig. 5 illustrates the ability of the model to learn
about the maximum achievable value of the objective function and track it
‘up’ or ‘down’. Fig. 6 presents the behavior of the decision variable. The solid
piecewise constant line represents the optimal decision value and the small
open circles indicate the decisions taken by the model. The decisions follow
closely the optimal values but never converge precisely to the exact value
before another parameter shift occurs. Note that there is evidence of
satisficing at times, particularly between t = 115 and 126, and again between
t = 176 and 190.
Fig. 1. Aspirations, expectations, and profit as functions of time for the quadratic function in
one-dimension. (Horizontal axis: time index; series: a(t), $\bar f(t) + h\dot f(t)$, f(t).)
Fig. 2. The decision variable as a function of time for the quadratic problem in one-dimension.
(Horizontal axis: time index.)
Fig. 3. Aspirations, expectations, and profit as functions of time for the quadratic problem with
inadequate learning leading to suboptimal decisions. (Horizontal axis: time index; series: a(t),
$\bar f(t) + h\dot f(t)$, f(t).)
Fig. 4. The decision variable as a function of time with inadequate learning and adaptation
leading to suboptimal satisficing decisions. (Horizontal axis: time index.)
Fig. 5. Aspirations, expectations, and achieved profit for the quadratic problem with time
varying environment. (Horizontal axis: time index; series: a(t), $\bar f(t) + h\dot f(t)$, f(t).)
Fig. 6. The decision variable as a function of time for the quadratic problem with time varying
environment. (Horizontal axis: time index.)
Fig. 7. Profit function contours for the two dimensional example. (Horizontal axis: x(1),
capital-labor ratio.)
Simulation Example 2. A very good two dimensional example is found in
Day and Tinney (1968). A firm seeks to maximize its profit through the
manipulation of two decision variables: $x_1(t)$, its capital-to-labor ratio during
period t, and $x_2(t)$, its production level during period t. At the end of each
period the firm observes its profit $f(x(t)) = f(x_1(t), x_2(t)) = f(t)$ defined as:
f&(t)) =(TOXl(t)l -IT1-i?oll(X*(t)~W2Xl(t))I~OIe1
-~02C(XZ(t)w1Xl(t))/00182
- 5009
and adds this to its information set composed of past observations on f(t)
and x(t). It then decides on a new value for x and implements this for period
t + 1. The firm is not assumed to have any information concerning the form
or parameters of the demand function and is not assumed to know its supply
function.
The difficulty presented to our model of decision making can be gauged
from the contour plot of f(x) displayed in fig. 7. Profit falls off rapidly for
capital-to-labor ratios less than 0.15 and production rates less than 15,000
units. The ‘ridges’ encountered just prior to these two precipices often
confound many search algorithms by inducing zig-zagging behavior. More
importantly, the profit function is strictly concave only within a neighborhood
of x* = [0.3067; 47,274.41]' (at which point f* = $183,662.7). Thus for large
values of x1 and x2, strict concavity does not hold.
The first experiment employs an initial aspiration level of a(0) = $150,000
and initializing decisions of
x(0) = [1.50; 89,000]', x(-1) = [1.55; 94,000]', x(-2) = [1.60; 104,000]'.
The parameters for the adjustment eqs. (10)-(12) are set at:
$$\beta = 0.20, \quad \delta = 0.10, \quad \phi = 0.40, \quad \gamma = 5\phi.$$
Two restrictions are placed on x(t) to capture some elements of realism.
First, the firm is assumed to be unable to reduce its capital-to-labor ratio
below 0.1 or drop production below 14,000 units. Second, the firm’s speeds
of adjustment are limited. In any one period it cannot change its capital-to-
labor ratio and its production by more than 10%. Thus the firm has to
choose x(t) subject to:
$$x_1(t) \ge 0.1, \quad x_2(t) \ge 14{,}000,$$
and
$$|\Delta x_1(t)/x_1(t-1)| \le 0.1, \quad |\Delta x_2(t)/x_2(t-1)| \le 0.1.$$
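Because both restrictions are simple bounds on each decision variable, the nearest feasible decision to a proposed one can be found by componentwise clipping. The sketch below illustrates this under stated assumptions: the function name `apply_constraints` and its arguments are hypothetical, the lower bounds are the 0.1 and 14,000 quoted in the text, and the previous decision is assumed to be feasible already.

```python
import numpy as np

def apply_constraints(x_prev, x_proposed, lower=(0.1, 14000.0), max_change=0.10):
    """Project a proposed decision onto the feasible box for this period.

    x_prev     : last implemented decision [capital-labor ratio, production]
    x_proposed : unconstrained decision x_prev + dx
    lower      : floors on the two decision variables
    max_change : largest relative change allowed per period
    """
    x_prev = np.asarray(x_prev, dtype=float)
    x_proposed = np.asarray(x_proposed, dtype=float)
    limit = max_change * np.abs(x_prev)
    # Speed-of-adjustment limit: stay within 10% of the previous value.
    x_new = np.clip(x_proposed, x_prev - limit, x_prev + limit)
    # Hard floors on the capital-labor ratio and on production.
    return np.maximum(x_new, np.asarray(lower, dtype=float))

# Example: a bold proposed move is cut back to a 10% change in each variable.
x_next = apply_constraints([1.50, 89000.0], [0.50, 120000.0])
```

Since the feasible set is a box, componentwise clipping returns the feasible point nearest to the proposal in the Euclidean sense.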
Fig. 8 displays the decision history in terms of $x_1(t)$ versus $x_2(t)$. The
model finds the neoclassical optimizing solution after approximately 150
iterations and, for all practical purposes, converges. The value of the profit
function remains within one decimal place of the optimum, x1 is within two
decimal places of its optimizing value, and x2 matches its optimizing value to
within 0.5%.
As with the first simulation example, one might ask what the behavior of
the model is if it is embedded in a changing environment. Would it ever be
able to find a time-varying optimum? Would the decision variables display
behavior that could be interpreted as a sample from some stochastic process?
Some food for thought is provided by figs. 9 and 10. Here the model
operates in a shifting environment much like that leading to figs. 5 and 6.
Two shifts in the parameters of the profit function are introduced, corres-
ponding to shifts in the demand curve and in the factor input supply curves.
One shift occurs at t = 120 and the other at t =250. Over the period of the
simulation there are now three different maxima that must be sought:
x* = [0.31; 47,274]', f* = $183,663,
x* = [0.86; 38,893]', f* = $135,660,
x* = [0.68; 80,423]', f* = $254,212.
Fig. 8. Convergent behavior in decision space for the two-dimensional profit maximization
problem. (Horizontal axis: capital-labor ratio.)
For this simulation experiment all model parameters are kept at their
previous values except $\phi = 0.05$.
The model locates the three maxima to within some small neighborhood.
Fig. 9 presents the path taken in decision variable space, clearly illustrating
how search proceeds from one solution to the next as information is received
disconfirming prior conceptions of where the optimal decision is located. Fig.
10 shows that the model requires approximately 100 time periods to learn
where the optimal decision resides.
5. Estimation framework
For the model to be truly useful in empirical research it must permit
estimation and testing in conjunction with time series data. This requires
casting eqs. (9)-(17) in a form suitable for econometric work and one
framework immediately suggests itself. Eqs. (10)-(12) comprise a recursive
system of linear difference equations that bring to mind a state-space
approach employing a Kalman filter for generating model residuals.
Fig. 9. Behavior in decision space for the two-dimensional problem with time varying
environment. (Horizontal axis: capital-labor ratio.)
Fig. 10. Aspirations, expectations, and achieved profit as functions of time for the two-
dimensional profit maximization problem with time varying environment. (Horizontal axis:
time index; series: a(t), $\bar f(t) + h\dot f(t)$, f(t).)
To see this, define as the state a 3 x 1 vector, s(t), where
$$s(t) = [\bar f(t), \dot f(t), a(t)]'$$
and define z(t) to be a vector of (N + 1) exogenous, or predetermined, inputs:
$$z(t) = [f(t), f(t-1), \ldots, f(t-N);\ x(t)', x(t-1)', \ldots, x(t-N)']'.$$
A state space form representation of the decision model immediately
results:
$$s(t) = F s(t-1) + G z(t), \tag{18}$$
$$y(t) = H(z(t))\,s(t) + h(z(t)), \tag{19}$$
where
$$F = \begin{bmatrix} (1-\beta) & 0 & 0 \\ 0 & (1-\delta) & 0 \\ \phi & \gamma & (1-\phi) \end{bmatrix}, \quad
G = \begin{bmatrix} \beta & 0 & 0 \cdots 0 \\ \delta & -\delta & 0 \cdots 0 \\ 0 & 0 & 0 \cdots 0 \end{bmatrix},$$
$$H(z(t)) = [\,0 : 0 : c_t/\|c_t\|^2\,],$$
and y(t) = x(t). The state space model is linear in the state but time-varying in
the output equations because of the terms involving $c_t/\|c_t\|^2$. It should be
noted, however, that this time variation in the coefficient matrix H is due
solely to exogenous variables and may be treated as predetermined. Estimat-
ing the model can be accomplished by a number of techniques, for example
that used in Burmeister, Wall and Hamilton (1986), Kalaba and Tesfatsion
(1980, 1988), or Pagan (1980). If one is willing to state stochastic hypotheses
regarding additive disturbances in (18)-(19), then a Gaussian likelihood
function can be postulated and the methods of the first and last references
above apply. If one does not wish to introduce additive random errors then
a least-squares approach can be taken and the methods of Kalaba and
Tesfatsion applied. In either case, the methods are well developed, tried and
true.
The researcher applying this estimation framework needs to obtain time
series data for f(t) and x(t) over the period of interest, say, \( \{t_0 \le t \le t_f\} \), and
use them to form the z(t) vector. The \( c_t \) are constructed using (13) and the
first N observations of f(t) and x(t) as initial conditions. The required
pseudo-inverses can be computed by singular value decomposition.
Finally, H(z(t)) and h(z(t)) are formed. The need for initial conditions in
computing the \( c_t \) means estimation proceeds using the data over the interval
\( \{t_0 + N - 1 \le t \le t_f\} \). Four parameters are estimated: \( \{\phi, \beta, \delta, \gamma\} \). In addition,
an estimate of the trajectory of a(t) over the sample period can be obtained
by coupling the parameter estimation algorithm to a state space model filter/
smoother algorithm like that of De Jong (1989). This allows the all-important
goal formation behavior to be revealed and can be used to shed
light on how aspirations have responded to stimuli over the sample.
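The construction of the linear response vector by pseudo-inverse can be sketched as follows. This is a hedged illustration, assuming that the defining equations take the stacked form of N first differences of f regressed on N first differences of x; the function name estimate_c and the array layout (oldest observation first, N + 1 rows so that N differences are available) are assumptions made here. NumPy's pinv computes the Moore-Penrose pseudo-inverse through a singular value decomposition.

```python
import numpy as np

def estimate_c(f_hist, x_hist):
    """Solve  delta_f(j) = c' [x(j) - x(j-1)]  for c in the minimum-norm
    least-squares sense.

    f_hist : (N+1,)   objective values, oldest first (hypothetical layout)
    x_hist : (N+1, n) decision vectors, oldest first (hypothetical layout)
    """
    dF = np.diff(np.asarray(f_hist, dtype=float))          # N differences of f
    dX = np.diff(np.asarray(x_hist, dtype=float), axis=0)  # N x n differences of x
    # np.linalg.pinv uses an SVD internally, so a singular dX is handled.
    return np.linalg.pinv(dX) @ dF

# Usage: a linear objective f(x) = 2*x1 - 3*x2 + 1 is recovered exactly.
x_hist = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
f_hist = 2.0 * x_hist[:, 0] - 3.0 * x_hist[:, 1] + 1.0
c = estimate_c(f_hist, x_hist)
```

When the matrix of differences is rank deficient, for instance because the decision maker has recently moved along a single direction, the pseudo-inverse returns the minimum-norm solution rather than failing.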
As an illustration of formulating an estimation problem, suppose one
wishes to study the behavior of the U.S. auto industry over the past four
decades, with particular interest in how the industry responded to growing
foreign competition. In this case behavior might be interpreted in terms of
the decisions taken to adjust production capacity, employment, and price.
Thus x(t) is a 3 x 1 vector composed of time series observations on each of
these three variables. Furthermore, it is hypothesized that decisions are based
upon attention to profits; thus f(t) is the observed industry profit in period t.
If, instead, it is hypothesized that decision makers focus on market share, then f(t) is
the observed market share of domestic auto producers. Depending on the
frequency of observation, t is indexed by months, quarters, or years.
Estimation of this model requires four time series, and then their use in
generating the three time series comprising the elements of \( c_t \). The four
estimated parameters, together with the estimated trajectory for a(t), would
yield information on the rate with which the industry responded to the
foreign challenge, how industry goals were affected by changing market
conditions, or whether profit or market share better reflects the concerns of
industry management.
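For an application such as this one, the input vector z(t) would be assembled from the observed series. The short sketch below shows one way to do so; the function name build_z and the array layout (rows indexed by time, one column per decision variable) are assumptions introduced for illustration, and the numbers in the usage lines are placeholders, not data.

```python
import numpy as np

def build_z(f, x, t, N):
    """Stack z(t) = [f(t), f(t-1), ..., f(t-N); x(t)', ..., x(t-N)']'.

    f : (T,)   observed objective series (e.g. profit or market share)
    x : (T, n) observed decision series (e.g. capacity, employment, price)
    """
    f = np.asarray(f, dtype=float)
    x = np.asarray(x, dtype=float)
    if t < N or t >= len(f):
        raise ValueError("need N lagged observations before period t")
    f_lags = f[t - N:t + 1][::-1]          # f(t), f(t-1), ..., f(t-N)
    x_lags = x[t - N:t + 1][::-1].ravel()  # x(t)', x(t-1)', ..., x(t-N)'
    return np.concatenate([f_lags, x_lags])

# Usage with placeholder numbers: three decision variables, one lag.
f_obs = np.array([10.0, 11.0, 12.0, 13.0])
x_obs = np.arange(12.0).reshape(4, 3)
z3 = build_z(f_obs, x_obs, t=3, N=1)
```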
6. Discussion and conclusions
The primary concern of this paper is the presentation of an operational
model of boundedly rational decision making that can be used in empirical
studies of dynamic economics. The model’s main attributes are: (1) its
inherently dynamic nature, for it naturally explains the evolution of decisions
by framing them in terms of difference equations; (2) its ability to depict both
neoclassical optimizing behavior and non-neoclassical suboptimizing
behavior; (3) its very modest assumptions concerning the information set of the
decision maker; (4) its very modest assumptions concerning the
computational and cognitive abilities of the decision maker; and (5) its explicit
incorporation of learning and adaptation and the simplicity of the
mechanisms by which this takes place. Learning and adaptation are the dominant
components in the model; they determine whether one obtains the optimizing
solution or a suboptimal result. The former may be characterized as a special
limiting case where, given a stable environment and sufficient time, learning
and adaptation are adequate to guide the search to the overall maximum.
Suboptimization occurs when learning or adaptation allows aspirations to
converge too rapidly to the actual objective function value, thus terminating
search prematurely.
This last point brings out an important feature of the model. It
demonstrates the need to strike a balance between reacting rapidly to changes in
the environment and overreacting to ephemeral disturbances. If aspirations
adapt too slowly to information, then decision making will proceed too
cautiously and search will be prolonged unnecessarily. On the other hand, if
aspirations adapt too rapidly, then it is possible to converge to a
suboptimum. For example, consider a situation like that depicted in fig. 10 at
t = 120. If aspirations adjust too rapidly to the adverse change, then the
decision maker will be inclined to satisfice and accept a profit (of
approximately $120,000) lower than that which is ultimately achievable ($135,660). A
simulation with \( \phi = 0.5 \) bears this out; convergence to the first optimum
($183,663) occurs very fast, but between t = 120 and t = 250 the model ends
search at a profit of $124,755.
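The speed trade-off can be illustrated with a deliberately simplified aspiration rule. The sketch below assumes that aspirations follow a plain exponential smoother of achieved performance with a single rate phi; this is a stand-in chosen for illustration and is not the full aspiration update of the model. The function name aspiration_path is hypothetical, and the profit levels are simply the approximate figures quoted in the paragraph above. Under this rule the gap between aspiration and achieved profit shrinks by the factor (1 - phi) each period, so a large phi closes the gap, and hence removes the incentive to search, within a few periods.

```python
def aspiration_path(a0, f_path, phi):
    """Simplified (assumed) rule: a(t+1) = (1 - phi) * a(t) + phi * f(t)."""
    path = [a0]
    for f in f_path:
        path.append((1.0 - phi) * path[-1] + phi * f)
    return path

# After an adverse shift, achieved profit stays at 120,000 while the
# aspiration starts from the previous optimum of 183,663.
achieved = [120000.0] * 10
fast = aspiration_path(183663.0, achieved, phi=0.5)
slow = aspiration_path(183663.0, achieved, phi=0.1)

# Remaining gap between aspiration and achieved profit after 10 periods.
gap_fast = fast[-1] - 120000.0
gap_slow = slow[-1] - 120000.0
```

With phi = 0.5 the remaining gap after ten periods is a small fraction of one percent of the initial gap, whereas with phi = 0.1 roughly a third of it remains, leaving the decision maker dissatisfied and still searching.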
Perhaps the most important characteristic of the model is its ability to
portray long-run behavior different from that of the neoclassical paradigm.
The model presented here can display suboptimal satisficing behavior, as well
as optimizing behavior. The sole determinants of this are the parameter
values: \( \beta \), \( \delta \), \( \phi \), and \( \gamma \) (or h).
The model is but one operationalization of the key mappings in eqs. (2),
(3) and (6). Other operationalizations can be specified, but the ones presented
in this paper are believed to be the simplest. Even so, these simple
specifications, when coupled with learning and adaptation, demonstrate
remarkable abilities.
A number of questions are to be investigated in the near future. First there
is the proof of convergence to the neoclassical optimum when a(t) is not
fixed at f*. The convergence of the algorithm has been demonstrated in
simulation, but is seen to depend in a complicated way on a(0), the
parameters \( \beta \), \( \gamma \), \( \delta \) and \( \phi \), and the specification of X(t). A proof appears
possible with the aid of a theorem on algorithmic convergence by Zangwill
(1969, ch. 11).
Next there is the investigation of the effects of a changing environment.
Both examples show the algorithm capable of finding new optima as the
environment undergoes shifts. It seems important to establish if limits exist
to the ability of the model to learn enough to keep pace with rapidly
changing environments. For example, ‘At what “speed” of change in the
environment does the model begin to lose its ability to “catch up” with the
optimum?’
Finally, there are a number of normative questions that the model can be
used to answer. For example, if we assume a boundedly rational decision
maker, as above, how can the information set be amended to help the
decision maker converge more rapidly to the maximum? In a rapidly
changing environment, how can the information set be altered to aid in
tracking the optimum and, hopefully, to catch up with it? In other words,
what can be done to increase learning and speed adaptation? The model
appears ideally suited to help in answering these and similar questions.
Appendix
Proof of Theorem 1. The argument is by induction. Let the 1 x N gradient
vector of f at a point p be denoted Df(p). At time t = 0 we know from the
prescribed initialization that \( Df(x(-j))\,[x(-j+1) - x(-j)] > 0 \) for \( 1 \le j \le N \).
By construction, \( c_0' = Df(q) = Df(\sum_j \lambda_j x(j)) \) for \( -N \le j \le 0 \) and \( \lambda_j > 0 \).
Therefore \( c_0'[x^* - x(0)] = Df(q)[x^* - x(0)] = Df(\sum_j \lambda_j x(j))[x^* - x(0)] \). From the
concavity of f, \( Df(x(0))[x^* - x(0)] > 0 \) and \( Df(\sum_j \lambda_j x(j))[x^* - x(0)] >
Df(x(0))[x^* - x(0)] \). Therefore \( c_0'[x^* - x(0)] > 0 \) and d(1) is an ascent
direction with a component along \( x^* - x(0) \) in a direction towards \( x^* \). If the step
along d(1) is not ‘too’ long, the algorithm will produce an x(1) with f(1) >
f(0). In fact any step of length less than \( \|x^* - x(0)\| \) will produce this result.
To show that the algorithm gives such a step length note that
\( x(1) - x(0) = [f^* - f(x(0))]\, c_0/\|c_0\|^2 = Df(\xi^*)[x^* - x(0)]\, c_0/\|c_0\|^2 \),
and
\( \|x(1) - x(0)\| \le \rho\, \|x^* - x(0)\| \),
where \( \rho = \|Df(\xi^*)\| / \|Df(q)\| \). But from the strict concavity of f, \( \rho < 1 \) and
so the step to x(1) is not ‘too’ large so as to overshoot \( x^* \).
Now assume that for some t = n, \( f(n) > f(n-1) > f(n-2) > f(n-3) > \cdots >
f(n-N) \). By exactly the same arguments as above, \( c_n'[x^* - x(n)] > 0 \), so
d(n+1) is a direction with a component along \( x^* - x(n) \) and ‘towards’ \( x^* \).
Furthermore, \( \|x(n+1) - x(n)\| < \|x^* - x(n)\| \) so that f(n+1) > f(n). Thus f(n)
converges monotonically to some upper bound, F, and by continuity, x(n)
converges to some point z such that F = f(z). To see that this upper bound is
indeed f*, and that z = x*, consider (16) rewritten as
\( \|c_n\|\,[x(n+1) - x(n)] = [f^* - f(n)]\, c_n/\|c_n\| \).
In the limit as \( n \to \infty \), \( c_n \to Df(z) \), the lefthand side \( \to 0 \), and \( c_n/\|c_n\| \to 1 \). This
implies \( f(x(n)) \to f^* \) and continuity of f implies \( x(n) \to z = x^* \). Q.E.D.
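The monotone ascent established in the proof can be checked numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not part of the proof: it takes a one-dimensional strictly concave test function chosen here, f(x) = -(x - 2)^2 with f* = 0 at x* = 2, fixes the aspiration at f*, and uses a single past observation so that the linear response coefficient is the secant slope. In one dimension the step of eq. (16), [f* - f(n)] c_n / ||c_n||^2, reduces to [f* - f(n)] / c_n. The function name secant_ascent is hypothetical.

```python
def secant_ascent(f, f_star, x_prev, x_curr, steps):
    """Iterate x(n+1) = x(n) + [f* - f(x(n))] / c_n, with c_n the secant
    slope through the two most recent points (one-dimensional case)."""
    xs = [x_prev, x_curr]
    for _ in range(steps):
        c = (f(xs[-1]) - f(xs[-2])) / (xs[-1] - xs[-2])  # secant slope c_n
        xs.append(xs[-1] + (f_star - f(xs[-1])) / c)
    return xs

# Assumed test function: strictly concave with maximum f* = 0 at x* = 2.
f = lambda x: -(x - 2.0) ** 2

# Initialization satisfies the ascent condition f(x(0)) > f(x(-1)).
xs = secant_ascent(f, 0.0, x_prev=-0.5, x_curr=0.0, steps=20)
values = [f(x) for x in xs]
```

For this test function the iterates approach x* from below without overshooting, and the objective values increase at every step, in line with the argument above.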
References
Baumol, W.J. and R.E. Quandt, 1964, Rules of thumb and optimally imperfect decisions,
American Economic Review 54, 23-46.
Burmeister, E., K.D. Wall and J.D. Hamilton, 1986, Estimation of unobserved expected monthly
inflation using Kalman filtering, Journal of Business and Economic Statistics 4, 147-160.
Crain, W. Mark, William F. Shughart II and Robert D. Tollison, 1984, Journal of Economic
Behavior and Organization 5, 375-386.
Cyert, R.M. and J.G. March, 1962, A behavioral theory of the firm (Prentice-Hall, Englewood
Cliffs, NJ).
Dawes, R. and B. Corrigan, 1974, Linear models in decision making, Psychological Bulletin 81,
95-106.
Day, R.H., 1967, Profits, learning and the convergence of satisficing to marginalism, Quarterly
Journal of Economics 81, 302-311.
Day, R.H. and E.H. Tinney, 1968, How to cooperate in business without really trying, Journal of
Political Economy 76, 583-600.
De Jong, Piet, 1989, Smoothing and interpolation with the state-space model, Journal of the
American Statistical Association 84, 1085-1088.
Edwards, Ward, 1968, Conservatism in human information processing, in: B. Kleinmuntz, ed.,
Formal representation of human judgement (Wiley, New York) 17-52.
Einhorn, H., 1980, Learning from experience and suboptimal rules in decision making, in:
T.S. Wallsten, ed., Cognitive processes in choice and decision behavior (Lawrence Erlbaum,
Hillsdale, NJ).
Friedman, Milton, 1953, Essays in positive economics (University of Chicago Press, Chicago).
Hogarth, Robin, 1987, Judgement and choice, 2nd ed. (Wiley, New York).
Kahneman, Daniel, Paul Slovic and Amos Tversky, eds., 1982, Judgement under uncertainty:
Heuristics and biases (Cambridge University Press, Cambridge, MA).
Kalaba, Robert and Leigh Tesfatsion, 1980, A least-squares model specification test for a class of
dynamic nonlinear economic models with systematically varying parameters, Journal of
Optimization Theory and Applications 32, 538-567.
Kalaba, Robert and Leigh Tesfatsion, 1988, Exact sequential filtering, smoothing and prediction
for nonlinear systems, Nonlinear Analysis: Theory, Methods and Applications 12, 599-615.
Levinthal, Daniel and James G. March, 1981, A model of adaptive organizational search,
Journal of Economic Behavior and Organization 2, 307-334.
March, James G., 1988, Variable risk preference and adaptive aspirations, Journal of Economic
Behavior and Organization 9, 5-24.
March, James G. and Herbert A. Simon, 1958, Organizations (Wiley & Sons, New York).
Nelson, Richard R. and S. Winter, 1973, Toward an evolutionary theory of economic
capabilities, American Economic Review 63, 440-449.
Pagan, Adrian, 1980, Some identification and estimation results for regression models with
stochastically varying coefficients, Journal of Econometrics 13, 341-363.
Radner, Roy, 1975, A behavioral model of cost reduction, Bell Journal of Economics 6, 196-215.
Schoemaker, Paul, 1980, Experiments on decisions under risk: The expected utility hypothesis
(Martinus Nijhoff Publishing, Boston, MA).
Simon, Herbert A., 1955, A behavioral model of rational choice, Quarterly Journal of Economics
69, 99-118.
Simon, H.A., 1957, Models of man (Wiley & Sons, New York).
Winter, S., 1971, Satisficing, selection, and the innovating remnant, Quarterly Journal of
Economics 85, 237-261.
Witt, Ulrich, 1986, Firms’ market behavior under imperfect information and economic natural
selection, Journal of Economic Behavior and Organization 7, 265-290.
Zangwill, W.I., 1969, Nonlinear programming: A unified approach (Prentice-Hall, Englewood
Cliffs, NJ).
