Normal-Form Games
Ben Zod
March 31, 2016
Abstract
Richard McKelvey and Thomas Palfrey introduced a statistical
generalization of the Nash equilibrium solution concept that accounts
for irrationality or lack of information among players of a game. Their
model, quantal response, clouds an action's perceived utility with statistical
noise, resulting in continuous movement of equilibria as payoffs
change, and often describes empirical results with greater accuracy
than does Nash's solution concept. This paper examines this model
and its origins, and finds a weakness caused by its stochastic specification.
Quantal response equilibria exhibit inflexibility when adapting
to the appearance of an additional strategy, and can be manipulated
to yield counter-intuitive results.
Introduction
Before McKelvey and Palfrey (1995) introduced the quantal response model
for normal-form games, the field of game theory focused on the decision-making
of intelligent and rational subjects. The field examined how perfectly
rational players would act in situations involving conflict, cooperation, and
strategy. McKelvey and Palfrey relax the assumptions of intelligence and
rationality, and apply a stochastic model to normal-form games.
Traditional game theory, pioneered by John Nash, defines equilibrium in
a game as a situation in which each player's strategy is the best response
to every other player's strategy. That is, after revealing the outcome, each
player could not have done better by changing strategies, given every other
player's strategy remains constant. In normal-form games, this solution concept
results in pure strategy equilibria, in which players always play one of
their strategy choices, and mixed strategy equilibria, in which players mix
between their available options according to some probability distribution.
Nash's solution concept shows exactly how perfectly intelligent and rational
decision makers act. However, humans possess varying levels of information,
and vary in their rationality. For this reason, empirical data of humans
playing games often does not align well with the Nash equilibrium predictions,
as shown in [8].
The assumption of perfectly rational and knowledgeable players is not one
that fits well with human players. For example, imagine a subject is asked
to participate in an experiment in which she is provided two light bulbs
of differing brightness, and is asked to write down which bulb is brighter.
The equivalent Nash solution concept to this experiment would be that the
subject would always know which bulb is brighter, and would also write down
the brighter bulb, regardless of how close in brightness they are. However,
this is not something we expect from a human subject. If one light bulb
is much brighter than the other, we expect the subject to answer correctly
close to 100% of the time. However, as the light bulbs get closer and closer
in brightness, we expect the subject to be more likely to make a mistake.
That is, as the difference in brightness between the two bulbs decreases, the
probability of the subject answering correctly also decreases. Thus, instead
of best responding, the subject is making an educated guess based on the
information provided. This can be thought of as better responding.
Quantal response equilibria attempt to model this behavior of better
responding by assigning each action a probability proportional to the difference
between the expected utility of that action and that of every other possible
action. In the light bulb example, this means that the probability that the
subject answers that bulb one is brighter is proportional to the difference
between the true brightness of bulbs one and two. If bulb one is much brighter
than bulb two, the subject answers bulb one with probability close to 1. If
bulb one is just a small amount brighter than bulb two, the subject answers
bulb one with probability close to 0.5.[1]
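This graded accuracy can be sketched numerically. The following is a minimal illustration, not part of Luce's or McKelvey and Palfrey's formal development: it assumes a logistic response rule, with a hypothetical noise parameter `lam` and made-up brightness values.

```python
import math

def p_choose_one(b1: float, b2: float, lam: float) -> float:
    """Probability the subject names bulb one as brighter, under a logistic
    ("better response") rule: the answer depends on the brightness difference,
    so mistakes become likely as the two bulbs approach equal brightness."""
    return 1.0 / (1.0 + math.exp(-lam * (b1 - b2)))

# A large brightness gap yields a near-certain correct answer...
big_gap = p_choose_one(100.0, 10.0, lam=0.1)
# ...while a tiny gap yields answers only slightly better than a coin flip.
small_gap = p_choose_one(100.0, 99.0, lam=0.1)
print(big_gap, small_gap)
```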
[1] The light bulb example is an experiment of individual choice. Although the quantal
response model is designed for and used in games of more than one player, the individual
choice example is presented for simplicity, to convey the core concept.

This paper will first explore individual choice models, closely examining
Duncan Luce's (1959) choice axiom, its results, and its limitations. The logic
applied to individual choice decisions will then be expanded to the realm of
game theory, where McKelvey and Palfrey's quantal response equilibrium
concept will be more carefully defined, calculated, and analyzed. Following
this, I will delve into the limitations of the quantal response model, the
situations in which it succeeds and fails, and the areas in which it can improve
as a model that predicts the behavior of subjects with varying levels of rationality.
Individual Choice
2.1 Luce's Choice Axiom
All theorems and lemmas in this section are attributed to Duncan Luce
(1959). For further information and proofs of the results below, see [6].

Throughout this section we will suppose that a universal set U is given,
which should be interpreted as the universe of possible alternatives. The
decision maker must be able to evaluate the elements of U according to some
preference specification and select elements from certain subsets of U. Now,
let T be a finite subset of U, and suppose that an element must be chosen
from T. If S is a subset of T (S ⊆ T), let P_T(S) denote the probability
that the selected element lies in S. If x is an element of T (x ∈ T), let
P_T(x) denote the probability that the selected element is x.
With notation defined, we can specify the three probability axioms that
form a foundation of any probabilistic study.
The three axioms of probability
(i) For S ⊆ T, 0 ≤ P_T(S) ≤ 1.
(ii) P_T(T) = 1.
(iii) If R, S ⊆ T and R ∩ S = ∅, then P_T(R ∪ S) = P_T(R) + P_T(S).

Note: If we repeatedly apply part (iii), we see that

P_T(S) = Σ_{x ∈ S} P_T(x).

Luce's choice axiom (axiom 1) relates the probability measures on different
subsets: if all pairwise discriminations in T are imperfect, then for x ∈ S ⊆ T,

P_T(x) = P_S(x) P_T(S).

Lemma. If axiom 1 holds for T and its subsets, then for any x, y ∈ S ⊆ T,

P_S(x) / P_S(y) = P_T(x) / P_T(y).

The importance of this lemma is that it implies that when axiom 1 holds
for T and its subsets, the ratio P_S(x)/P_S(y) is independent of S.
Luce's first theorem formally establishes that, assuming axiom 1 holds,
all the probabilities are determined by the pairwise probabilities:

P_T(x) = 1 / (1 + Σ_{y ∈ T − {x}} P(y,x)/P(x,y)).
His next theorem shows that axiom 1 also demands that the pairwise
probabilities meet certain constraints.

Theorem 2.4. If axiom 1 holds for {x, y, z} and if none of the pairwise
discriminations is perfect[2], then

P(x,y)P(y,z)P(z,x) = P(x,z)P(z,y)P(y,x).
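The product rule of Theorem 2.4 can be verified numerically for pairwise probabilities generated by a strength scale, taking P(x,y) = v(x)/(v(x)+v(y)); the scale values below are hypothetical.

```python
# Hypothetical strength scale over three alternatives.
v = {"x": 2.0, "y": 5.0, "z": 0.7}

def P(a: str, b: str) -> float:
    """Pairwise choice probability induced by the scale v."""
    return v[a] / (v[a] + v[b])

# Theorem 2.4: P(x,y)P(y,z)P(z,x) = P(x,z)P(z,y)P(y,x).
lhs = P("x", "y") * P("y", "z") * P("z", "x")
rhs = P("x", "z") * P("z", "y") * P("y", "x")
print(lhs, rhs)
```

Both sides reduce to v(x)v(y)v(z) divided by the same product of pairwise sums, which is why the identity holds for any positive scale.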
Corollary 2.5. Under the conditions of the theorem,

P(x,z) = P(x,y)P(y,z) / [P(x,y)P(y,z) + P(y,x)P(z,y)].

Theorem 2.6. If axiom 1 holds for T and its subsets, there exists a positive
real-valued function v on T, unique up to multiplication by a positive
constant, such that for every S ⊆ T,

P_S(x) = v(x) / Σ_{y ∈ S} v(y).
Proof. Define v(x) = kP_T(x), where k > 0; then by part (i) of axiom 1 and
part (iii) of the probability axioms, we have

P_S(x) = P_T(x) / P_T(S) = kP_T(x) / Σ_{y ∈ S} kP_T(y) = v(x) / Σ_{y ∈ S} v(y),

so existence is ensured.
To show uniqueness, suppose that v' is another such function; then for any
x ∈ T,

v(x) = kP_T(x) = kv'(x) / Σ_{y ∈ T} v'(y).

Let k' = k / Σ_{y ∈ T} v'(y); then v(x) = k'v'(x), so v' is a positive multiple
of v, as required.

[2] A pairwise discrimination between x and y is perfect if P(x,y) = 0 or P(x,y) = 1.
What Luce has shown here is the following: say we are confined to a
local region T in which all pairwise discriminations are imperfect. Suppose
we also know that the several probability measures are related to one another
in accordance with axiom 1, such that P_S acts like a conditional probability
relative to P_T. What has been shown is that the distribution P_T(x) can
be interpreted as a particular choice of unit of a ratio scale over T. These
scales can be extended throughout U, which has important implications.
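A quick computation illustrates the ratio-scale result: with choice probabilities P_S(x) = v(x)/Σ_{y∈S} v(y), the ratio P_S(x)/P_S(y) does not depend on S. The scale v below is made up for illustration.

```python
# Hypothetical ratio scale over a small universe of alternatives.
v = {"a": 3.0, "b": 1.5, "c": 4.0, "d": 0.5}

def P(x: str, S: set) -> float:
    """Choice probability of x from the offered set S: v(x) / sum over S."""
    return v[x] / sum(v[y] for y in S)

# The ratio P_S(a)/P_S(b) is the same for every S containing a and b.
ratios = [
    P("a", S) / P("b", S)
    for S in ({"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"})
]
print(ratios)
```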
In the third chapter of Individual Choice Behavior, Luce applies his axiom
and its results to utility theory. If we let A be the set of pure alternatives
and E the set of chance events, then aαb, where a, b ∈ A and α ∈ E, is
the uncertain alternative where a is the outcome if α occurs, and b is the
outcome if α does not occur. The symbol Q_D is introduced for the subsets
D of E to describe the probability that an element from D is considered most
likely to occur, according to the subject. Luce's choice axiom is assumed to
hold for the families {P_T} and {Q_D}.

The idea is that aαb will be preferred to aβb, where β ∈ E, if and only if
one of the following is true:

(1) a is preferred to b and α is considered more likely than β, or
(2) b is preferred to a and β is considered more likely than α.

A preference structure following this rule is considered decomposable, and
can be written as

P(aαb, aβb) = P(a,b)Q(α,β) + P(b,a)Q(β,α) for a, b ∈ A and α, β ∈ E.
2.2 Debreu's Critique

Quantal Response
3.1 Notation
Consider a finite n-person game in normal form: there is a set N = {1, ..., n}
of players, and for each player i ∈ N, a strategy set S_i = {s_{i1}, ..., s_{iJ_i}}
consisting of J_i pure strategies. There exists a payoff function u_i : S → R for
each i ∈ N, where S = Π_{i ∈ N} S_i.

Let Δ_i be the set of probability measures on S_i. Elements of Δ_i are of the
form p_i : S_i → R, where Σ_{s_{ij} ∈ S_i} p_i(s_{ij}) = 1 and p_i(s_{ij}) ≥ 0
for all s_{ij} ∈ S_i. We use the notation p_{ij} = p_i(s_{ij}). We denote points
in Δ = Π_{i ∈ N} Δ_i by p = (p_1, ..., p_n), where p_i = (p_{i1}, ..., p_{iJ_i}) ∈ Δ_i.
We use s_{ij} to denote the strategy p_i ∈ Δ_i with p_{ij} = 1, and we use the
shorthand notation p = (p_i, p_{-i}). Thus, (s_{ij}, p_{-i}) represents the strategy
profile where i plays the pure strategy s_{ij}, and all other players play their
components of p.
The payoff function is extended to have domain Δ by the rule

u_i(p) = Σ_{s ∈ S} p(s)u_i(s), where p(s) = Π_{i ∈ N} p_i(s_i).

As defined originally by John Nash (1950), a vector p* = (p*_1, ..., p*_n) ∈ Δ
is a Nash equilibrium if, for all i ∈ N and all p'_i ∈ Δ_i,
u_i(p'_i, p*_{-i}) ≤ u_i(p*).

Now, we write X_i = R^{J_i} to represent the space of possible payoffs for
strategies that player i might adopt, and we let X = Π_{i=1}^{n} X_i. The
function ū : Δ → X is defined as

ū(p) = (ū_1(p), ..., ū_n(p)), where ū_{ij}(p) = u_i(s_{ij}, p_{-i}).
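For a two-player game, ū can be computed directly from the payoff matrices. A minimal sketch; the payoff matrices and function names here are mine, chosen purely for illustration.

```python
# Expected payoff of each pure strategy against the opponent's mixed strategy,
# i.e. u_bar[i][j] = u_i(s_ij, p_minus_i), for a two-player normal-form game.
# The 2x2 payoff matrices below are hypothetical (rows for player 1, columns
# for player 2).
U1 = [[4.0, 0.0],
      [0.0, 1.0]]   # row player's payoffs
U2 = [[0.0, 1.0],
      [1.0, 0.0]]   # column player's payoffs

def u_bar_row(U, q):
    """Row player's expected payoff for each row, given column mix q."""
    return [sum(U[j][k] * q[k] for k in range(len(q))) for j in range(len(U))]

def u_bar_col(U, p):
    """Column player's expected payoff for each column, given row mix p."""
    return [sum(U[j][k] * p[j] for j in range(len(p))) for k in range(len(U))]

p = [0.5, 0.5]           # row player's mixed strategy
q = [0.25, 0.75]         # column player's mixed strategy
print(u_bar_row(U1, q))  # row player's u_bar against q
print(u_bar_col(U2, p))  # column player's u_bar against p
```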
3.2 Quantal Response Equilibrium
Next, McKelvey and Palfrey (1995) define a statistical version of Nash
equilibrium called quantal response equilibrium. In this version, each player's
utility for each action is subject to random error.[3] For each i, each
j ∈ {1, ..., J_i}, and any p ∈ Δ, player i's utility for playing action j is
defined as

û_{ij}(p) = ū_{ij}(p) + ε_{ij},

where ε_{ij} is the random error term.

[3] There are multiple interpretations of this random error. The interpretation considered
in this paper is that players are not entirely rational, meaning the true utility they receive
is clouded by statistical noise, leading to occasional errors which decrease with an increase
in rationality. An alternative interpretation is that players do in fact calculate the expected
payoffs correctly, but have an additive payoff disturbance associated with each available
pure strategy.
3.3 The Logistic Quantal Response Function
The most commonly used, easiest to work with, and most conceptually
understood class of quantal response functions is the logistic quantal response
function. As will become apparent, the logistic function evolves directly from
Luce's individual choice model.

For any given λ > 0 and x_i ∈ R^{J_i}, the logistic quantal response
function is defined by

σ_{ij}(x_i) = e^{λx_{ij}} / Σ_{k=1}^{J_i} e^{λx_{ik}},
where x_{ij} = ū_{ij}(π), and the π's are equilibrium probability distributions.[4]

The set of possible logistic response functions is parameterized by the
parameter λ. In our interpretation, λ represents a player's rationality: λ = 0
corresponds to play that is uniformly random regardless of payoffs, while
λ = ∞ corresponds to perfect rationality. The logit equilibrium is then the
set of π satisfying the fixed-point condition

π_{ij} = e^{λū_{ij}(π)} / Σ_{k=1}^{J_i} e^{λū_{ik}(π)} for all i, j.

[4] This function comes from a specific extreme value distribution called the Gumbel
distribution, as described in the papers of Emil J. Gumbel.
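Computationally, the logistic quantal response function is a softmax over λ-scaled payoffs. A minimal sketch; subtracting the maximum payoff before exponentiating is a standard overflow guard of my choosing and does not change the probabilities.

```python
import math

def logit_response(x, lam):
    """Logistic quantal response: sigma_ij = exp(lam*x_j) / sum_k exp(lam*x_k).
    Shifting by max(x) leaves the ratios unchanged but avoids overflow."""
    m = max(x)
    w = [math.exp(lam * (xj - m)) for xj in x]
    total = sum(w)
    return [wj / total for wj in w]

probs = logit_response([2.0, 1.0, 0.0], lam=1.0)
print(probs)  # the highest-payoff action is most likely, but not certain
```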
Theorem 3.3. Let σ be the logistic quantal response function. Let {λ_1, λ_2, ...}
be a sequence such that lim_{t→∞} λ_t = ∞. Let {p^1, p^2, ...} be a corresponding
sequence with p^t ∈ π*(λ_t) for all t, such that lim_{t→∞} p^t = p*. Then p* is a
Nash equilibrium.

Proof. Assume, for contradiction, that p* is not a Nash equilibrium. Then
there is some player i and some pair of strategies, s_{ij} and s_{ik}, with
p*(s_{ik}) > 0 and u_i(s_{ij}, p*_{-i}) > u_i(s_{ik}, p*_{-i}); equivalently,
ū_{ij}(p*) > ū_{ik}(p*). Since ū is a continuous function, it follows that for
sufficiently small ε > 0 there is a T such that for t ≥ T,
ū_{ij}(p^t) > ū_{ik}(p^t) + ε.
But as t → ∞ (and hence λ_t → ∞), σ_{ik}(ū_i(p^t))/σ_{ij}(ū_i(p^t)) → 0.
Therefore p^t(s_{ik}) → 0. But this contradicts p*(s_{ik}) > 0. Thus, our
assumption that p* is not a Nash equilibrium is proven false, and the proof
is complete.
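The key step of the proof, that the ratio of an inferior action's logit probability to a superior action's vanishes as λ grows, is easy to see numerically. A sketch with hypothetical payoffs:

```python
import math

def logit_probs(x, lam):
    """Logit choice probabilities over payoff vector x at rationality lam."""
    m = max(x)
    w = [math.exp(lam * (xj - m)) for xj in x]
    s = sum(w)
    return [wj / s for wj in w]

payoffs = [1.0, 0.8]            # action 0 strictly better than action 1
ratios = []
for lam in (1.0, 10.0, 100.0):
    p = logit_probs(payoffs, lam)
    ratios.append(p[1] / p[0])  # inferior/superior probability ratio
print(ratios)  # shrinks toward 0 as lam grows
```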
3.4 An Example
Consider the following matching pennies game, in which Alice's payoff for
(H, H) is a parameter X:

            Bob
           H      T
Alice  H  X, 0   0, 1
       T  0, 1   1, 0

In tables such as this, each player plays one of their given strategies (in
this case, H and T on the left for Alice, Player 1, and H and T on the top for
Bob, Player 2), resulting in the payoffs from one of the boxes shown above.
In each outcome box, the left number is Player 1's utility gained from that
outcome, and the right number is Player 2's utility. For example, when both
players play T, Alice gets 1 utility and Bob gets 0 utility.
Computing Alice's logit response to Bob's mixed strategy gives

p_{1H} = F((X+1)p_{2H} − 1),

where F(x) = 1/(1 + e^{−λx}) is the cdf of the difference between two extreme
value distributions. Following a similar method for Bob, we get

p_{2H} = F(1 − 2p_{1H}).

Thus, our two equilibrium probabilities in this game are as follows:

p_{1H} = 1 / (1 + e^{−λ[(X+1)p_{2H} − 1]}) and p_{2H} = 1 / (1 + e^{−λ(1 − 2p_{1H})}).

We are left with a system of two equations in two unknowns, parameterized
by λ, the rationality parameter. There is no closed-form solution
to these logit QREs. However, for a given λ, we can plot the two functions
against each other, with the intersection points being the equilibria.

Using Mathematica, we can manipulate λ and trace the movement of the
equilibrium as λ moves from 0 to ∞. This progression is shown in Figures
1, 2 and 3, for X = 4, 9, 19.
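The fixed point can also be found numerically rather than graphically. Substituting Bob's equation into Alice's gives p_{1H} = F((X+1)F(1 − 2p_{1H}) − 1), a strictly decreasing map of p_{1H}, so bisection finds the unique equilibrium; the solver below is my own sketch, not taken from the paper.

```python
import math

def qre_matching_pennies(X, lam, tol=1e-12):
    """Logit QRE of the asymmetric matching pennies game, found by bisection
    on h(p1) = F((X+1)*F(1-2*p1) - 1) - p1, where F(t) = 1/(1+exp(-lam*t)).
    The composed map is strictly decreasing in p1, so the root is unique."""
    F = lambda t: 1.0 / (1.0 + math.exp(-lam * t))
    h = lambda p1: F((X + 1) * F(1 - 2 * p1) - 1) - p1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    p1 = 0.5 * (lo + hi)
    return p1, F(1 - 2 * p1)

print(qre_matching_pennies(X=4, lam=1))    # close to (0.723, 0.391)
print(qre_matching_pennies(X=4, lam=100))  # approaches the Nash mix (0.5, 0.2)
```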
We can see that as X increases, so does p_{1H} for a fixed λ; the smaller
λ is, the bigger the increase in p_{1H}. The interpretation here is that a player
who has some level of irrationality will give in to the own-payoff effect, in that
they will more often play H as their payoff for (H,H) increases. The more
rational a player is, the less they will give in to this effect, and when a player
is perfectly rational (λ = ∞), they will not change behavior based on their
own payoff, and will play according to the Nash mixed strategy equilibrium.

We also see that as λ increases, the equilibrium probability that Bob plays
H, p_{2H}, strictly decreases, while the equilibrium probability that Alice plays
H, p_{1H}, increases and then decreases. The initial increase can be thought
of as Alice realizing how large her own payoff is for playing H as she
becomes a little bit rational. But as her rationality continues to increase, she
realizes that a rational Bob would play H infrequently, and so she adjusts
accordingly, resulting in p_{1H} decreasing in λ for high enough λ.
Figure 1: X = 4. Equilibria by panel: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5,
EQ = (0.657, 0.461); (c) λ = 1, EQ = (0.723, 0.391); (d) λ = 3, EQ = (0.682, 0.251);
(e) λ = 10, EQ = (0.568, 0.205); (f) λ = 100, EQ = (0.507, 0.200).
Figure 2: X = 9. Equilibria by panel: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5,
EQ = (0.831, 0.418); (c) λ = 1, EQ = (0.894, 0.313); (d) λ = 3, EQ = (0.795, 0.145);
(e) λ = 10, EQ = (0.607, 0.104); (f) λ = 100, EQ = (0.511, 0.100).
Figure 3: X = 19. Equilibria by panel: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5,
EQ = (0.966, 0.386); (c) λ = 1, EQ = (0.989, 0.273); (d) λ = 3, EQ = (0.895, 0.086);
(e) λ = 10, EQ = (0.644, 0.053); (f) λ = 100, EQ = (0.515, 0.050).
4.1

Recall Gerard Debreu's 1960 critique of Luce's individual choice model.
Debreu argued that the model gives undeserved[5] additional probability to
correlated strategies. Debreu uses the example of an individual deciding between
recordings of music. When asked to decide between a recording of Debussy and
a recording of Beethoven, the subject more often chooses Debussy. However,
when asked to decide between Debussy, Beethoven, and the same Beethoven
symphony but with a different conductor, the subject more often selects a
Beethoven recording. This is counter-intuitive: the subject does not have
a preference between the two Beethoven options, and so we would expect
them to select Debussy with the same probability as when only given two
options. Taken to its logical extreme, as the number of Beethoven options
increases, with the subject still preferring Debussy to any one Beethoven and
being indifferent between all of the Beethoven options, the probability that
the subject selects Debussy goes to 0.
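Debreu's limiting argument is straightforward to reproduce under Luce's model. A sketch with hypothetical scale values, with Debussy weighted above any single Beethoven recording:

```python
def luce_prob(target: float, others: list) -> float:
    """Luce choice probability of the target alternative from the offered set."""
    return target / (target + sum(others))

debussy, beethoven = 3.0, 2.0  # hypothetical scale values

# With one Beethoven recording, Debussy is the likelier choice...
two_way = luce_prob(debussy, [beethoven])
# ...but each duplicate Beethoven recording drains probability from Debussy.
with_copies = [luce_prob(debussy, [beethoven] * n) for n in (1, 2, 10, 100)]
print(two_way, with_copies)
```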
This section will argue a related critique of the logistic quantal response
model.
4.2 A Motivating Example

            Bob
           B      S
Alice  B  2, 1   0, 0
       S  0, 0   1, 2

This game, often called Bach or Stravinsky (or Battle of the Sexes),
tells the story of Alice and Bob deciding separately whether to go to a Johann
Sebastian Bach concert or an Igor Stravinsky concert. Alice would prefer
Bach, Bob would prefer Stravinsky, but neither gets any utility if they go to
different concerts.
It is important to note here that this game has three Nash equilibria:
two pure strategies where they both go to the same concert, and one mixed
strategy, where each goes to the concert they would prefer with probability
2/3.
Let us also consider the following related game:

            Bob
            B      S
Alice  B   2, 1   0, 0
      S1   0, 0   1, 2
      S2   0, 0   1, 2

Here Alice's strategy S has been split into two identical copies, S1 and S2.
            Bob
            B      SY     SX
Alice  B   2, 1   0, 0   0, 0
      SX   0, 0   1, 2   0, 0
      SY   0, 0   0, 0   1, 2
4.3
Consider the general 2x2 game

            Bob
           L      R
Alice  T  a, α   b, β
       B  c, γ   d, δ

and the corresponding 3x2 game in which Alice's strategy B is duplicated:

            Bob
           L      R
Alice  T  a, α   b, β
       M  c, γ   d, δ
       B  c, γ   d, δ
Using a method similar to the one used to calculate the QRE in the
matching pennies game, we can calculate the generalized QRE system of
equations. For the 2x2 game,

p_{1T}^{2x2} = 1 / (1 + e^{−λ[(a−b−c+d)p_{2L}^{2x2} + b−d]})

p_{2L}^{2x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{2x2} + γ−δ]})
and for the 3x2 game,

p_{1T}^{3x2} = 1 / (1 + 2e^{−λ[(a−b−c+d)p_{2L}^{3x2} + b−d]})

p_{1M}^{3x2} = 1 / (2 + e^{−λ[(c−d−a+b)p_{2L}^{3x2} + d−b]})

p_{2L}^{3x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{3x2} + γ−δ]})
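The two systems can be solved and compared numerically. The sketch below uses the Bach or Stravinsky payoffs (a = 2, b = c = 0, d = 1 for Alice; α = 1, β = γ = 0, δ = 2 for Bob) and damped fixed-point iteration, which converges here because at λ = 1 each system is a contraction; the solver parameters are my choices.

```python
import math

def solve_qre(n_copies: int, lam: float, iters: int = 2000, damp: float = 0.5):
    """Logit QRE of Bach or Stravinsky in which Alice's S strategy appears
    n_copies times. Payoffs: a=2, b=c=0, d=1 (Alice); alpha=1, beta=gamma=0,
    delta=2 (Bob). Returns (p1T, p2L) = (P[Alice plays B], P[Bob plays B])."""
    F = lambda t: 1.0 / (1.0 + math.exp(-lam * t))
    p1T, p2L = 0.5, 0.5
    for _ in range(iters):
        # Alice: payoff difference u(B) - u(S) = 3*p2L - 1; the n_copies
        # identical S rows inflate the denominator of the logit rule.
        e = math.exp(-lam * (3 * p2L - 1))
        p1T += damp * (1.0 / (1.0 + n_copies * e) - p1T)
        # Bob: payoff difference u(B) - u(S) = 3*p1T - 2.
        p2L += damp * (F(3 * p1T - 2) - p2L)
    return p1T, p2L

two_by_two = solve_qre(n_copies=1, lam=1.0)
three_by_two = solve_qre(n_copies=2, lam=1.0)
print(two_by_two, three_by_two)  # the equilibria differ
```

Duplicating the S strategy visibly lowers Alice's equilibrium probability of B, exactly the inflexibility this section is about.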
Lemma 4.1. For λ > 0 and x, y, z ∈ R, the function

f(p) = 1 / (1 + xe^{−λ(yp+z)})

is monotonic in p.

Proof. If y < 0, then e^{−λ(yp+z)} increases in p, so f(p) decreases in p for
positive x and increases in p for negative x.
If y > 0, then e^{−λ(yp+z)} decreases in p, so f(p) increases in p for
positive x and decreases in p for negative x.
If y = 0 or x = 0, then f(p) does not change in p.
Lemma 4.2. For λ > 0 and a, b, c, d, α, β, γ, δ ∈ R, the functions p_{1T}^{2x2},
p_{2L}^{2x2}, p_{1T}^{3x2}, and p_{2L}^{3x2} are monotonic.

Proof. Each function can be written in the form of f(p), as previously
specified. Thus, by Lemma 4.1, each function is monotonic.
Theorem 4.3. For any fixed λ > 0, the quantal response equilibria of the
general 2x2 and 3x2 games differ. That is, the equilibrium probabilities
p_{1T}^{2x2}, p_{2L}^{2x2}, p_{1T}^{3x2}, and p_{2L}^{3x2}, such that

p_{1T}^{2x2} = 1 / (1 + e^{−λ[(a−b−c+d)p_{2L}^{2x2} + b−d]}),
p_{2L}^{2x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{2x2} + γ−δ]})

and

p_{1T}^{3x2} = 1 / (1 + 2e^{−λ[(a−b−c+d)p_{2L}^{3x2} + b−d]}),
p_{2L}^{3x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{3x2} + γ−δ]}),

satisfy p_{1T}^{2x2} ≠ p_{1T}^{3x2}.

Proof. For any value of p_{2L},

p_{1T}^{2x2} − p_{1T}^{3x2} = e^{−λ[(a−b−c+d)p_{2L} + b−d]} /
((1 + e^{−λ[(a−b−c+d)p_{2L} + b−d]})(1 + 2e^{−λ[(a−b−c+d)p_{2L} + b−d]})) > 0,
thus the function p_{1T}^{2x2} is always above the function p_{1T}^{3x2} for any
given p_{2L}, for all λ > 0. We also know that the difference between p_{2L}^{2x2}
and p_{2L}^{3x2} is exactly 0 at all p_{1T} and λ > 0, since they are the same
function of p_{1T}. For convenience, we will just write both of these functions
as p_{2L}.

Because p_{1T}^{2x2}, p_{1T}^{3x2}, and p_{2L} are all monotonic, as shown in
Lemma 4.2, and because p_{1T}^{2x2} − p_{1T}^{3x2} is always positive, the
intersection(s) of p_{1T}^{2x2} and p_{2L} must be different from the
intersection(s) of p_{1T}^{3x2} and p_{2L}. These intersections represent the
fixed point equilibrium solutions to each system, and thus we obtain the
desired result.
This theorem shows that, for any fixed λ, the addition of the new but
identical strategy changes the equilibrium probabilities. Even if this change
is small, it is significant because we can continue to add copies of the same
strategy until the difference between the equilibrium of the new game and
that of the original game is large. We show this in the next theorem.

Theorem 4.4. Let N be the number of copies of one of the original two
strategies in the generalized game for Player 1. For any fixed λ, as N
increases, the equilibrium probability of the other strategy increases or
decreases strictly monotonically.
Proof. This result is the logical extreme of Theorem 4.3. We have already
seen that when N is increased from 0 to 1, as in moving from the 2x2 to the
3x2 example, the function p_{1T}^{3x2} is always below p_{1T}^{2x2}; thus,
depending on the game's specification, the equilibrium p_{1T}^{3x2} will either
increase or decrease from p_{1T}^{2x2} as a result of this difference in
p_{1T}^{3x2} and p_{1T}^{2x2}. Now, imagine a game that has k strategies
available, with k−1 of them being identical copies of one strategy. When we
add an additional copy to this kx2 game, such that we have a (k+1)x2 game
with N = k, we have the following two functions for the two games:

p_{1T}^{kx2} = 1 / (1 + (k−1)e^{−λ[(a−b−c+d)p_{2L}^{kx2} + b−d]})

p_{1T}^{(k+1)x2} = 1 / (1 + ke^{−λ[(a−b−c+d)p_{2L}^{(k+1)x2} + b−d]})
Figure 4: Bach or Stravinsky. Panels: (a) λ = 0; (b) λ = 0.5; (c) λ = 1.8;
(d) λ = 3; (e) λ = 10; (f) λ = 100. The blue line represents the function
p_{1T}^{2x2}, the orange line represents the function p_{1T}^{3x2}, and the
green line represents the function p_{2L}.
As λ increases, multiple equilibria appear. Because of the nature of the
functions, three equilibria appear in the 2x2 game for lower values of λ than
in the 3x2 game. We can think of this as Alice in the 3x2 game taking
longer to reach the same level of rationality as her 2x2 self.
4.4 The Prisoner's Dilemma
The story behind the famous prisoner's dilemma game is this: two guilty
people (Alice and Bob) are being questioned separately for collaboratively
breaking the law. Each can either defect and confess to the crime, or
cooperate with the other by not confessing. If both cooperate, they get 1
year each. If one defects and the other tries to cooperate, the defector gets
off without jail time, and the other gets 5 years. If both defect, then they
each get 3 years. The normal-form game is presented below, with numbers
corresponding to utility, not jail time.
            Bob
           D      C
Alice  D  1, 1   5, 0
       C  0, 5   3, 3
This game has a dominant strategy, in that defecting provides more utility
no matter how the other player acts. Similarly, cooperating is a dominated
strategy, in that each player can get more utility by not cooperating. Thus,
the only situation in which both players are best responding, and thus the
only Nash equilibrium, is when both players defect. Even though the best
joint outcome is when both players cooperate, it is too tempting in that case
for each player to defect.
The theorems in the previous section have an interesting consequence
for this game. As we did in previous sections, we create a third option for
Alice that is an exact copy of her second strategy. We are now looking at
the game below.
            Bob
            D      C
Alice  D   1, 1   5, 0
      C1   0, 5   3, 3
      C2   0, 5   3, 3
After reading Section 4.3, we suspect that the probability that Alice
defects in this game will be different from the same probability in the original
2x2 version. Using the same technique as before, we see that this is true,
and in fact the addition of a copy of the C strategy makes it more
likely that Alice will cooperate. This is shown in Figure 5.

Note also that as λ increases, the equilibrium moves towards both players
playing D with probability close to 1 much more quickly than in other games
we have looked at. This is caused by D being a dominant strategy. However,
what this shows is that for any fixed rationality value λ, there is a number
of copies of C, call it N, that can be added such that the probability that
Alice plays D is approximately 0. For a very low value of λ, N need not be
large. As the value of λ increases, N grows very quickly. For λ = ∞, there
is no such N. But, as shown in Figure 6, for any fixed λ, we can keep adding
copies of cooperate until Alice defects with probability approximately 0.
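This limiting behavior can be checked numerically. With the prisoner's dilemma payoffs above, D beats C by 2 − p_{2D} in expectation for Alice, so with N identical copies of C her logit probability of D is p_{1D} = 1/(1 + Ne^{−λ(2−p_{2D})}), while Bob's equation is unchanged by the duplication. The damped iteration below is my own sketch:

```python
import math

def pd_qre(n_copies: int, lam: float, iters: int = 4000, damp: float = 0.5):
    """Logit QRE of the prisoner's dilemma when Alice has n_copies identical
    C strategies. Against the opponent's mix, D beats C by 2 - p2D for Alice
    and by 2 - p1D for Bob. Returns (p1D, p2D)."""
    p1D = p2D = 0.5
    for _ in range(iters):
        p1D += damp * (1.0 / (1.0 + n_copies * math.exp(-lam * (2 - p2D))) - p1D)
        p2D += damp * (1.0 / (1.0 + math.exp(-lam * (2 - p1D))) - p2D)
    return p1D, p2D

# At a fixed rationality, piling on copies of C drives Alice toward cooperation.
for n in (1, 100, 10_000):
    print(n, pd_qre(n_copies=n, lam=3.0))
```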
Figure 5: Prisoner's Dilemma. Panels: (a) λ = 0; (b) λ = 1; (c) λ = 3. The
blue line represents the function p_{1D}^{2x2}, the orange line represents the
function p_{1D}^{3x2}, and the green line represents the function p_{2D}.
Figure 6: Prisoner's Dilemma. Panels: (a) λ = 0; (b) λ = 1; (c) λ = 3;
(d) λ = 5; (e) λ = 7; (f) λ = 9. The green line represents the function p_{2D}.
Each other line represents the function p_{1D} in a game with N copies of the C
strategy for Alice. Dark blue: N = 1; orange: N = 2; red: N = 5; purple:
N = 10; brown: N = 20; light blue: N = 50; yellow: N = 100; magenta:
N = 1,000; dark green: N = 10,000; bright red: N = 1,000,000.
Conclusion
In this paper, we examine only the effect of adding identical copies of
pre-existing strategies to a game. However, it is important to note that this is
only an extreme example, used for convenience and ease of understanding.
This specific case is only a small part of the broader phenomenon. We
know from the specification of the quantal response model that as we change
the payoffs of a game, the equilibria change continuously. Thus, if we take an
identical strategy and adjust it ever so slightly, we see the same phenomenon.
Continuing in this way, we reach the conclusion that similar effects to the
ones we have shown arise even when strategies are merely correlated in some
way. That is, if two strategies are similar but not identical, the model may
still have trouble adapting. This applies far more widely than the
identical-strategy special case.
Moving forward, in order to better suit the model to adapt and adjust to
similar or identical strategies, there needs to exist in the model specification
some parameter representing correlation between strategies. If two strategies
are highly correlated, meaning that they offer the player similar results, then
we should see similar equilibrium probabilities in the game with only one
of the two strategies, and the game with both strategies. If two strategies
are mostly uncorrelated, in that there is a significant difference in outcome
between the two strategies, then the addition of one of the strategies to a
game containing the other strategy should in fact make a significant impact
on the equilibrium probabilities.
References
[1] Anderson, Simon P., Jacob K. Goeree, and Charles A. Holt. "Minimum-Effort
Coordination Games: Stochastic Potential and Logit Equilibrium."
Games and Economic Behavior 34.2 (2001): 177-199.

[2] Debreu, Gerard. Review of Individual Choice Behavior: A Theoretical
Analysis. The American Economic Review (1960): 186-188.

[3] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. "Quantal
Response Equilibria." The New Palgrave Dictionary of Economics (2008):
783-787.

[4] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. "Risk Averse
Behavior in Generalized Matching Pennies Games." Games and Economic
Behavior 45.1 (2003): 97-113.

[5] Haile, Philip A., Ali Hortaçsu, and Grigory Kosenok. "On the Empirical
Content of Quantal Response Equilibrium." American Economic Review
98.1 (2008): 180-200.

[6] Luce, R. Duncan. Individual Choice Behavior: A Theoretical Analysis.
Mineola, NY: Dover Publications, 2005.

[7] McKelvey, Richard D., and Thomas R. Palfrey. "Quantal Response
Equilibria for Normal Form Games." Games and Economic Behavior 10.1
(1995): 6-38.

[8] Ochs, Jack. "Coordination Problems." Handbook of Experimental
Economics (1995), edited by John Kagel and Alvin E. Roth, Princeton
University Press, Princeton, 195-251.