Normal-Form Games
Ben Zod
March 31, 2016
Abstract
Richard McKelvey and Thomas Palfrey introduced a statistical
generalization of the Nash equilibrium solution concept that accounts
for irrationality or lack of information among players of a game. Their
model, quantal response, clouds an action's perceived utility with statistical
noise, resulting in continuous movement of equilibria as payoffs
change, and often describes empirical results with greater accuracy
than does Nash's solution concept. This paper examines this model
and its origins, and finds a weakness caused by its stochastic specification.
Quantal response equilibria exhibit inflexibility when adapting
to the appearance of an additional strategy, and can be manipulated
to yield counter-intuitive results.
Introduction
Before McKelvey and Palfrey (1995) introduced the quantal response model
for normal-form games, the field of game theory focused on the decision-making
of intelligent and rational subjects. The field examined how perfectly
rational players would act in situations involving conflict, cooperation, and
strategy. McKelvey and Palfrey relax the assumptions of intelligence and
rationality, and apply a stochastic model to normal-form games.
Traditional game theory, pioneered by John Nash, defines equilibrium in
a game as a situation in which each player's strategy is the best response
to every other player's strategy. That is, after revealing the outcome, each
player could not have done better by changing strategies, given every other
player's strategy remains constant. In normal-form games, this solution concept
results in pure strategy equilibria, in which players always play one of
their strategy choices, and mixed strategy equilibria, in which players mix
between their available options according to some probability distribution.
Nash's solution concept shows exactly how perfectly intelligent and rational
decision makers act. However, humans possess varying levels of information,
and vary in their rationality. For this reason, empirical data of humans
playing games often does not align well with the Nash equilibrium predictions,
as shown in [8].
The assumption of perfectly rational and knowledgeable players is not one
that fits well with human players. For example, imagine a subject is asked
to participate in an experiment in which she is provided two light bulbs
of differing brightness, and is asked to write down which bulb is brighter.
The equivalent Nash solution concept to this experiment would be that the
subject would always know which bulb is brighter, and would also write down
the brighter bulb, regardless of how close in brightness they are. However,
this is not something we expect from a human subject. If one light bulb
is much brighter than the other, we expect the subject to answer correctly
close to 100% of the time. However, as the light bulbs get closer and closer
in brightness, we expect the subject to be more likely to make a mistake.
That is, as the difference in brightness between the two bulbs decreases, the
probability of the subject answering correctly also decreases. Thus, instead
of best responding, the subject is making an educated guess based on the
information provided. This can be thought of as better responding.
Quantal response equilibria attempt to model this behavior of better
responding by assigning each action a probability proportional to the difference
between the expected utility of that action and that of every other possible
action. In the light bulb example, this means that the probability that the
subject answers that bulb one is brighter is proportional to the difference
between the true brightness of bulbs one and two. If bulb one is much brighter
than bulb two, the subject answers bulb one with probability close to 1. If
bulb one is just a small amount brighter than bulb two, the subject answers
bulb one with probability close to 0.5.[1]
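This graded accuracy can be sketched numerically. The following is a minimal illustration, not part of Luce's or McKelvey and Palfrey's formal development: it assumes a logistic response rule, with a hypothetical noise parameter `lam` and made-up brightness values.

```python
import math

def p_choose_one(b1: float, b2: float, lam: float) -> float:
    """Probability the subject names bulb one as brighter, under a logistic
    ("better response") rule: the answer depends on the brightness difference,
    so mistakes become likely as the two bulbs approach equal brightness."""
    return 1.0 / (1.0 + math.exp(-lam * (b1 - b2)))

# A large brightness gap yields a near-certain correct answer...
big_gap = p_choose_one(100.0, 10.0, lam=0.1)
# ...while a tiny gap yields answers only slightly better than a coin flip.
small_gap = p_choose_one(100.0, 99.0, lam=0.1)
print(big_gap, small_gap)
```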
[1] The light bulb example is an experiment of individual choice. Although the quantal
response model is designed for and used in games of more than one player, the individual
choice example is presented for simplicity, to convey the core concept.

This paper will first explore individual choice models, closely examining
Duncan Luce's (1959) choice axiom, its results, and its limitations. The logic
applied to individual choice decisions will then be expanded to the realm of
game theory, where McKelvey and Palfrey's quantal response equilibrium
concept will be more carefully defined, calculated, and analyzed. Following
this, I will delve into the limitations of the quantal response model, the
situations in which it succeeds and fails, and the areas in which it can improve
as a model that predicts the behavior of subjects with varying levels of rationality.
Individual Choice
2.1 Luce's Choice Axiom
All theorems and lemmas in this section are attributed to Duncan Luce
(1959). For further information and proofs of the results below, see [6].

Throughout this section we will suppose that a universal set U is given,
which should be interpreted as the universe of possible alternatives. The
decision maker must be able to evaluate the elements of U according to some
preference specification and select elements from certain subsets of U. Now,
let T be a finite subset of U, and suppose that an element must be chosen
from T. If S is a subset of T (S ⊆ T), let P_T(S) denote the probability
that the selected element lies in S. If x is an element of T (x ∈ T), let
P_T(x) denote the probability that the selected element is x.
With notation defined, we can specify the three probability axioms that
form a foundation of any probabilistic study.
The three axioms of probability
(i) For S ⊆ T, 0 ≤ P_T(S) ≤ 1.
(ii) P_T(T) = 1.
(iii) If R, S ⊆ T and R ∩ S = ∅, then P_T(R ∪ S) = P_T(R) + P_T(S).

Note: If we repeatedly apply part (iii), we see that

P_T(S) = Σ_{x ∈ S} P_T(x).

Luce's choice axiom (axiom 1) relates the probability measures on different
subsets: if all pairwise discriminations in T are imperfect, then for x ∈ S ⊆ T,

P_T(x) = P_S(x) P_T(S).

Lemma. If axiom 1 holds for T and its subsets, then for any x, y ∈ S ⊆ T,

P_S(x) / P_S(y) = P_T(x) / P_T(y).

The importance of this lemma is that it implies that when axiom 1 holds
for T and its subsets, the ratio P_S(x)/P_S(y) is independent of S.
Luce's first theorem formally establishes that, assuming axiom 1 holds,
all the probabilities are determined by the pairwise probabilities:

P_T(x) = 1 / (1 + Σ_{y ∈ T − {x}} P(y,x)/P(x,y)).
His next theorem shows that axiom 1 also demands that the pairwise
probabilities meet certain constraints.

Theorem 2.4. If axiom 1 holds for {x, y, z} and if none of the pairwise
discriminations is perfect[2], then

P(x,y)P(y,z)P(z,x) = P(x,z)P(z,y)P(y,x).
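The product rule of Theorem 2.4 can be verified numerically for pairwise probabilities generated by a strength scale, taking P(x,y) = v(x)/(v(x)+v(y)); the scale values below are hypothetical.

```python
# Hypothetical strength scale over three alternatives.
v = {"x": 2.0, "y": 5.0, "z": 0.7}

def P(a: str, b: str) -> float:
    """Pairwise choice probability induced by the scale v."""
    return v[a] / (v[a] + v[b])

# Theorem 2.4: P(x,y)P(y,z)P(z,x) = P(x,z)P(z,y)P(y,x).
lhs = P("x", "y") * P("y", "z") * P("z", "x")
rhs = P("x", "z") * P("z", "y") * P("y", "x")
print(lhs, rhs)
```

Both sides reduce to v(x)v(y)v(z) divided by the same product of pairwise sums, which is why the identity holds for any positive scale.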
Corollary 2.5. Under the conditions of the theorem,

P(x,z) = P(x,y)P(y,z) / [P(x,y)P(y,z) + P(y,x)P(z,y)].

Theorem 2.6. If axiom 1 holds for T and its subsets, there exists a positive
real-valued function v on T, unique up to multiplication by a positive
constant, such that for every S ⊆ T,

P_S(x) = v(x) / Σ_{y ∈ S} v(y).
Proof. Define v(x) = kP_T(x), where k > 0; then by part (i) of axiom 1 and
part (iii) of the probability axioms, we have

P_S(x) = P_T(x) / P_T(S) = kP_T(x) / Σ_{y ∈ S} kP_T(y) = v(x) / Σ_{y ∈ S} v(y),

so existence is ensured.
To show uniqueness, suppose that v' is another such function; then for any
x ∈ T,

v(x) = kP_T(x) = kv'(x) / Σ_{y ∈ T} v'(y).

Let k' = k / Σ_{y ∈ T} v'(y); then v(x) = k'v'(x), so v' is a positive multiple
of v, as required.

[2] A pairwise discrimination between x and y is perfect if P(x,y) = 0 or P(x,y) = 1.
What Luce has shown here is the following: say we are confined to a
local region T in which all pairwise discriminations are imperfect. Suppose
we also know that the several probability measures are related to one another
in accordance with axiom 1, such that P_S acts like a conditional probability
relative to P_T. What has been shown is that the distribution P_T(x) can
be interpreted as a particular choice of unit of a ratio scale over T. These
scales can be extended throughout U, which has important implications.
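A quick computation illustrates the ratio-scale result: with choice probabilities P_S(x) = v(x)/Σ_{y∈S} v(y), the ratio P_S(x)/P_S(y) does not depend on S. The scale v below is made up for illustration.

```python
# Hypothetical ratio scale over a small universe of alternatives.
v = {"a": 3.0, "b": 1.5, "c": 4.0, "d": 0.5}

def P(x: str, S: set) -> float:
    """Choice probability of x from the offered set S: v(x) / sum over S."""
    return v[x] / sum(v[y] for y in S)

# The ratio P_S(a)/P_S(b) is the same for every S containing a and b.
ratios = [
    P("a", S) / P("b", S)
    for S in ({"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"})
]
print(ratios)
```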
In the third chapter of Individual Choice Behavior, Luce applies his axiom
and its results to utility theory. If we let A be the set of pure alternatives
and E the set of chance events, then aαb, where a, b ∈ A and α ∈ E, is
the uncertain alternative where a is the outcome if α occurs, and b is the
outcome if α does not occur. The symbol Q_D is introduced for the subsets
D of E to describe the probability that an element from D is considered most
likely to occur, according to the subject. Luce's choice axiom is assumed to
hold for the families {P_T} and {Q_D}.

The idea is that aαb will be preferred to aβb, where β ∈ E, if and only if
one of the following is true:

(1) a is preferred to b and α is considered more likely than β, or
(2) b is preferred to a and β is considered more likely than α.

A preference structure following this rule is considered decomposable, and
can be written as

P(aαb, aβb) = P(a,b)Q(α,β) + P(b,a)Q(β,α) for a, b ∈ A and α, β ∈ E.
2.2 Debreu's Critique

Quantal Response
3.1 Notation
Consider a finite n-person game in normal form: there is a set N = {1, ..., n}
of players, and for each player i ∈ N, a strategy set S_i = {s_{i1}, ..., s_{iJ_i}}
consisting of J_i pure strategies. There exists a payoff function u_i : S → R for
each i ∈ N, where S = Π_{i ∈ N} S_i.

Let Δ_i be the set of probability measures on S_i. Elements of Δ_i are of the
form p_i : S_i → R, where Σ_{s_{ij} ∈ S_i} p_i(s_{ij}) = 1 and p_i(s_{ij}) ≥ 0
for all s_{ij} ∈ S_i. We use the notation p_{ij} = p_i(s_{ij}). We denote points
in Δ = Π_{i ∈ N} Δ_i by p = (p_1, ..., p_n), where p_i = (p_{i1}, ..., p_{iJ_i}) ∈ Δ_i.
We use s_{ij} to denote the strategy p_i ∈ Δ_i with p_{ij} = 1, and we use the
shorthand notation p = (p_i, p_{-i}). Thus, (s_{ij}, p_{-i}) represents the strategy
profile where i plays the pure strategy s_{ij}, and all other players play their
components of p.
The payoff function is extended to have domain Δ by the rule

u_i(p) = Σ_{s ∈ S} p(s)u_i(s), where p(s) = Π_{i ∈ N} p_i(s_i).

As defined originally by John Nash (1950), a vector p* = (p*_1, ..., p*_n) ∈ Δ
is a Nash equilibrium if, for all i ∈ N and all p'_i ∈ Δ_i,
u_i(p'_i, p*_{-i}) ≤ u_i(p*).

Now, we write X_i = R^{J_i} to represent the space of possible payoffs for
strategies that player i might adopt, and we let X = Π_{i=1}^{n} X_i. The
function ū : Δ → X is defined as

ū(p) = (ū_1(p), ..., ū_n(p)), where ū_{ij}(p) = u_i(s_{ij}, p_{-i}).
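For a two-player game, ū can be computed directly from the payoff matrices. A minimal sketch; the payoff matrices and function names here are mine, chosen purely for illustration.

```python
# Expected payoff of each pure strategy against the opponent's mixed strategy,
# i.e. u_bar[i][j] = u_i(s_ij, p_minus_i), for a two-player normal-form game.
# The 2x2 payoff matrices below are hypothetical (rows for player 1, columns
# for player 2).
U1 = [[4.0, 0.0],
      [0.0, 1.0]]   # row player's payoffs
U2 = [[0.0, 1.0],
      [1.0, 0.0]]   # column player's payoffs

def u_bar_row(U, q):
    """Row player's expected payoff for each row, given column mix q."""
    return [sum(U[j][k] * q[k] for k in range(len(q))) for j in range(len(U))]

def u_bar_col(U, p):
    """Column player's expected payoff for each column, given row mix p."""
    return [sum(U[j][k] * p[j] for j in range(len(p))) for k in range(len(U))]

p = [0.5, 0.5]           # row player's mixed strategy
q = [0.25, 0.75]         # column player's mixed strategy
print(u_bar_row(U1, q))  # row player's u_bar against q
print(u_bar_col(U2, p))  # column player's u_bar against p
```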
3.2 Quantal Response Equilibrium
Next, McKelvey and Palfrey (1995) define a statistical version of Nash
equilibrium called quantal response equilibrium. In this version, each player's
utility for each action is subject to random error.[3] For each i, each
j ∈ {1, ..., J_i}, and any p ∈ Δ, player i's utility for playing action j is
defined as

û_{ij}(p) = ū_{ij}(p) + ε_{ij},

where ε_{ij} is the random error term.

[3] There are multiple interpretations of this random error. The interpretation considered
in this paper is that players are not entirely rational, meaning the true utility they receive
is clouded by statistical noise, leading to occasional errors which decrease with an increase
in rationality. An alternative interpretation is that players do in fact calculate the expected
payoffs correctly, but have an additive payoff disturbance associated with each available
pure strategy.
3.3 The Logistic Quantal Response Function
The most commonly used, easiest to work with, and most conceptually
understood class of quantal response functions is the logistic quantal response
function. As will become apparent, the logistic function evolves directly from
Luce's individual choice model.

For any given λ > 0 and x_i ∈ R^{J_i}, the logistic quantal response
function is defined by

σ_{ij}(x_i) = e^{λx_{ij}} / Σ_{k=1}^{J_i} e^{λx_{ik}},
where x_{ij} = ū_{ij}(π), and the π's are equilibrium probability distributions.[4]

The set of possible logistic response functions is parameterized by the
parameter λ. In our interpretation, λ represents a player's rationality: λ = 0
corresponds to play that is uniformly random regardless of payoffs, while
λ = ∞ corresponds to perfect rationality. The logit equilibrium is then the
set of π satisfying the fixed-point condition

π_{ij} = e^{λū_{ij}(π)} / Σ_{k=1}^{J_i} e^{λū_{ik}(π)} for all i, j.

[4] This function comes from a specific extreme value distribution called the Gumbel
distribution, as described in the papers of Emil J. Gumbel.
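Computationally, the logistic quantal response function is a softmax over λ-scaled payoffs. A minimal sketch; subtracting the maximum payoff before exponentiating is a standard overflow guard of my choosing and does not change the probabilities.

```python
import math

def logit_response(x, lam):
    """Logistic quantal response: sigma_ij = exp(lam*x_j) / sum_k exp(lam*x_k).
    Shifting by max(x) leaves the ratios unchanged but avoids overflow."""
    m = max(x)
    w = [math.exp(lam * (xj - m)) for xj in x]
    total = sum(w)
    return [wj / total for wj in w]

probs = logit_response([2.0, 1.0, 0.0], lam=1.0)
print(probs)  # the highest-payoff action is most likely, but not certain
```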
Theorem 3.3. Let σ be the logistic quantal response function. Let {λ_1, λ_2, ...}
be a sequence such that lim_{t→∞} λ_t = ∞. Let {p^1, p^2, ...} be a corresponding
sequence with p^t ∈ π*(λ_t) for all t, such that lim_{t→∞} p^t = p*. Then p* is a
Nash equilibrium.

Proof. Assume, for contradiction, that p* is not a Nash equilibrium. Then
there is some player i and some pair of strategies, s_{ij} and s_{ik}, with
p*(s_{ik}) > 0 and u_i(s_{ij}, p*_{-i}) > u_i(s_{ik}, p*_{-i}); equivalently,
ū_{ij}(p*) > ū_{ik}(p*). Since ū is a continuous function, it follows that for
sufficiently small ε > 0 there is a T such that for t ≥ T,
ū_{ij}(p^t) > ū_{ik}(p^t) + ε.
But as t → ∞ (and hence λ_t → ∞), σ_{ik}(ū_i(p^t))/σ_{ij}(ū_i(p^t)) → 0.
Therefore p^t(s_{ik}) → 0. But this contradicts p*(s_{ik}) > 0. Thus, our
assumption that p* is not a Nash equilibrium is proven false, and the proof
is complete.
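The key step of the proof, that the ratio of an inferior action's logit probability to a superior action's vanishes as λ grows, is easy to see numerically. A sketch with hypothetical payoffs:

```python
import math

def logit_probs(x, lam):
    """Logit choice probabilities over payoff vector x at rationality lam."""
    m = max(x)
    w = [math.exp(lam * (xj - m)) for xj in x]
    s = sum(w)
    return [wj / s for wj in w]

payoffs = [1.0, 0.8]            # action 0 strictly better than action 1
ratios = []
for lam in (1.0, 10.0, 100.0):
    p = logit_probs(payoffs, lam)
    ratios.append(p[1] / p[0])  # inferior/superior probability ratio
print(ratios)  # shrinks toward 0 as lam grows
```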
3.4 An Example
Consider the following matching pennies game, in which Alice's payoff for
(H, H) is a parameter X:

            Bob
           H      T
Alice  H  X, 0   0, 1
       T  0, 1   1, 0

In tables such as this, each player plays one of their given strategies (in
this case, H and T on the left for Alice, Player 1, and H and T on the top for
Bob, Player 2), resulting in the payoffs from one of the boxes shown above.
In each outcome box, the left number is Player 1's utility gained from that
outcome, and the right number is Player 2's utility. For example, when both
players play T, Alice gets 1 utility and Bob gets 0 utility.
Computing Alice's logit response to Bob's mixed strategy gives

p_{1H} = F((X+1)p_{2H} − 1),

where F(x) = 1/(1 + e^{−λx}) is the cdf of the difference between two extreme
value distributions. Following a similar method for Bob, we get

p_{2H} = F(1 − 2p_{1H}).

Thus, our two equilibrium probabilities in this game are as follows:

p_{1H} = 1 / (1 + e^{−λ[(X+1)p_{2H} − 1]}) and p_{2H} = 1 / (1 + e^{−λ(1 − 2p_{1H})}).

We are left with a system of two equations in two unknowns, parameterized
by λ, the rationality parameter. There is no closed-form solution
to these logit QREs. However, for a given λ, we can plot the two functions
against each other, with the intersection points being the equilibria.

Using Mathematica, we can manipulate λ and trace the movement of the
equilibrium as λ moves from 0 to ∞. This progression is shown in Figures
1, 2 and 3, for X = 4, 9, 19.
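The fixed point can also be found numerically rather than graphically. Substituting Bob's equation into Alice's gives p_{1H} = F((X+1)F(1 − 2p_{1H}) − 1), a strictly decreasing map of p_{1H}, so bisection finds the unique equilibrium; the solver below is my own sketch, not taken from the paper.

```python
import math

def qre_matching_pennies(X, lam, tol=1e-12):
    """Logit QRE of the asymmetric matching pennies game, found by bisection
    on h(p1) = F((X+1)*F(1-2*p1) - 1) - p1, where F(t) = 1/(1+exp(-lam*t)).
    The composed map is strictly decreasing in p1, so the root is unique."""
    F = lambda t: 1.0 / (1.0 + math.exp(-lam * t))
    h = lambda p1: F((X + 1) * F(1 - 2 * p1) - 1) - p1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    p1 = 0.5 * (lo + hi)
    return p1, F(1 - 2 * p1)

print(qre_matching_pennies(X=4, lam=1))    # close to (0.723, 0.391)
print(qre_matching_pennies(X=4, lam=100))  # approaches the Nash mix (0.5, 0.2)
```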
We can see that as X increases, so does p_{1H} for a fixed λ; the smaller
λ is, the bigger the increase in p_{1H}. The interpretation here is that a player
who has some level of irrationality will give in to the own-payoff effect, in that
they will more often play H as their payoff for (H,H) increases. The more
rational a player is, the less they will give in to this effect, and when a player
is perfectly rational (λ = ∞), they will not change behavior based on their
own payoff, and will play according to the Nash mixed strategy equilibrium.

We also see that as λ increases, the equilibrium probability that Bob plays
H, p_{2H}, strictly decreases, while the equilibrium probability that Alice plays
H, p_{1H}, increases and then decreases. The initial increase can be thought
of as Alice realizing how large her own payoff is for playing H as she
becomes a little bit rational. But as her rationality continues to increase, she
realizes that a rational Bob would play H infrequently, and so she adjusts
accordingly, resulting in p_{1H} decreasing in λ for high enough λ.
Figure 1: X = 4. Equilibria by panel: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5,
EQ = (0.657, 0.461); (c) λ = 1, EQ = (0.723, 0.391); (d) λ = 3, EQ = (0.682, 0.251);
(e) λ = 10, EQ = (0.568, 0.205); (f) λ = 100, EQ = (0.507, 0.200).
Figure 2: X = 9. Equilibria by panel: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5,
EQ = (0.831, 0.418); (c) λ = 1, EQ = (0.894, 0.313); (d) λ = 3, EQ = (0.795, 0.145);
(e) λ = 10, EQ = (0.607, 0.104); (f) λ = 100, EQ = (0.511, 0.100).
Figure 3: X = 19. Equilibria by panel: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5,
EQ = (0.966, 0.386); (c) λ = 1, EQ = (0.989, 0.273); (d) λ = 3, EQ = (0.895, 0.086);
(e) λ = 10, EQ = (0.644, 0.053); (f) λ = 100, EQ = (0.515, 0.050).
4.1

Recall Gerard Debreu's 1960 critique of Luce's individual choice model.
Debreu argued that the model gives undeserved[5] additional probability to
correlated strategies. Debreu uses the example of an individual deciding between
recordings of music. When asked to decide between a recording of Debussy and
a recording of Beethoven, the subject more often chooses Debussy. However,
when asked to decide between Debussy, Beethoven, and the same Beethoven
symphony but with a different conductor, the subject more often selects a
Beethoven recording. This is counter-intuitive: the subject does not have
a preference between the two Beethoven options, and so we would expect
them to select Debussy with the same probability as when only given two
options. Taken to its logical extreme, as the number of Beethoven options
increases, with the subject still preferring Debussy to any one Beethoven and
being indifferent between all of the Beethoven options, the probability that
the subject selects Debussy goes to 0.
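Debreu's limiting argument is straightforward to reproduce under Luce's model. A sketch with hypothetical scale values, with Debussy weighted above any single Beethoven recording:

```python
def luce_prob(target: float, others: list) -> float:
    """Luce choice probability of the target alternative from the offered set."""
    return target / (target + sum(others))

debussy, beethoven = 3.0, 2.0  # hypothetical scale values

# With one Beethoven recording, Debussy is the likelier choice...
two_way = luce_prob(debussy, [beethoven])
# ...but each duplicate Beethoven recording drains probability from Debussy.
with_copies = [luce_prob(debussy, [beethoven] * n) for n in (1, 2, 10, 100)]
print(two_way, with_copies)
```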
This section will argue a related critique of the logistic quantal response
model.
4.2 A Motivating Example

            Bob
           B      S
Alice  B  2, 1   0, 0
       S  0, 0   1, 2

This game, often called Bach or Stravinsky (or Battle of the Sexes),
tells the story of Alice and Bob deciding separately whether to go to a Johann
Sebastian Bach concert or an Igor Stravinsky concert. Alice would prefer
Bach, Bob would prefer Stravinsky, but neither gets any utility if they go to
different concerts.
It is important to note here that this game has three Nash equilibria:
two pure strategies where they both go to the same concert, and one mixed
strategy, where each goes to the concert they would prefer with probability
2/3.
Let us also consider the following related game:

            Bob
            B      S
Alice  B   2, 1   0, 0
      S1   0, 0   1, 2
      S2   0, 0   1, 2

Here Alice's strategy S has been split into two identical copies, S1 and S2.
            Bob
            B      SY     SX
Alice  B   2, 1   0, 0   0, 0
      SX   0, 0   1, 2   0, 0
      SY   0, 0   0, 0   1, 2
4.3
Consider the general 2x2 game

            Bob
           L      R
Alice  T  a, α   b, β
       B  c, γ   d, δ

and the corresponding 3x2 game in which Alice's strategy B is duplicated:

            Bob
           L      R
Alice  T  a, α   b, β
       M  c, γ   d, δ
       B  c, γ   d, δ
Using a method similar to the one used to calculate the QRE in the
matching pennies game, we can calculate the generalized QRE system of
equations. For the 2x2 game,

p_{1T}^{2x2} = 1 / (1 + e^{−λ[(a−b−c+d)p_{2L}^{2x2} + b−d]})

p_{2L}^{2x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{2x2} + γ−δ]})
and for the 3x2 game,

p_{1T}^{3x2} = 1 / (1 + 2e^{−λ[(a−b−c+d)p_{2L}^{3x2} + b−d]})

p_{1M}^{3x2} = 1 / (2 + e^{−λ[(c−d−a+b)p_{2L}^{3x2} + d−b]})

p_{2L}^{3x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{3x2} + γ−δ]})
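The two systems can be solved and compared numerically. The sketch below uses the Bach or Stravinsky payoffs (a = 2, b = c = 0, d = 1 for Alice; α = 1, β = γ = 0, δ = 2 for Bob) and damped fixed-point iteration, which converges here because at λ = 1 each system is a contraction; the solver parameters are my choices.

```python
import math

def solve_qre(n_copies: int, lam: float, iters: int = 2000, damp: float = 0.5):
    """Logit QRE of Bach or Stravinsky in which Alice's S strategy appears
    n_copies times. Payoffs: a=2, b=c=0, d=1 (Alice); alpha=1, beta=gamma=0,
    delta=2 (Bob). Returns (p1T, p2L) = (P[Alice plays B], P[Bob plays B])."""
    F = lambda t: 1.0 / (1.0 + math.exp(-lam * t))
    p1T, p2L = 0.5, 0.5
    for _ in range(iters):
        # Alice: payoff difference u(B) - u(S) = 3*p2L - 1; the n_copies
        # identical S rows inflate the denominator of the logit rule.
        e = math.exp(-lam * (3 * p2L - 1))
        p1T += damp * (1.0 / (1.0 + n_copies * e) - p1T)
        # Bob: payoff difference u(B) - u(S) = 3*p1T - 2.
        p2L += damp * (F(3 * p1T - 2) - p2L)
    return p1T, p2L

two_by_two = solve_qre(n_copies=1, lam=1.0)
three_by_two = solve_qre(n_copies=2, lam=1.0)
print(two_by_two, three_by_two)  # the equilibria differ
```

Duplicating the S strategy visibly lowers Alice's equilibrium probability of B, exactly the inflexibility this section is about.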
Lemma 4.1. For λ > 0 and x, y, z ∈ R, the function

f(p) = 1 / (1 + xe^{−λ(yp+z)})

is monotonic in p.

Proof. If y < 0, then e^{−λ(yp+z)} increases in p, so f(p) decreases in p for
positive x and increases in p for negative x.
If y > 0, then e^{−λ(yp+z)} decreases in p, so f(p) increases in p for
positive x and decreases in p for negative x.
If y = 0 or x = 0, then f(p) does not change in p.
Lemma 4.2. For λ > 0 and a, b, c, d, α, β, γ, δ ∈ R, the functions p_{1T}^{2x2},
p_{2L}^{2x2}, p_{1T}^{3x2}, and p_{2L}^{3x2} are monotonic.

Proof. Each function can be written in the form of f(p), as previously
specified. Thus, by Lemma 4.1, each function is monotonic.
Theorem 4.3. For any fixed λ > 0, the quantal response equilibria of the
general 2x2 and 3x2 games differ. That is, the equilibrium probabilities
p_{1T}^{2x2}, p_{2L}^{2x2}, p_{1T}^{3x2}, and p_{2L}^{3x2}, such that

p_{1T}^{2x2} = 1 / (1 + e^{−λ[(a−b−c+d)p_{2L}^{2x2} + b−d]}),
p_{2L}^{2x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{2x2} + γ−δ]})

and

p_{1T}^{3x2} = 1 / (1 + 2e^{−λ[(a−b−c+d)p_{2L}^{3x2} + b−d]}),
p_{2L}^{3x2} = 1 / (1 + e^{−λ[(α−β−γ+δ)p_{1T}^{3x2} + γ−δ]}),

satisfy p_{1T}^{2x2} ≠ p_{1T}^{3x2}.

Proof. For any value of p_{2L},

p_{1T}^{2x2} − p_{1T}^{3x2} = e^{−λ[(a−b−c+d)p_{2L} + b−d]} /
((1 + e^{−λ[(a−b−c+d)p_{2L} + b−d]})(1 + 2e^{−λ[(a−b−c+d)p_{2L} + b−d]})) > 0,
thus the function p_{1T}^{2x2} is always above the function p_{1T}^{3x2} for any
given p_{2L}, for all λ > 0. We also know that the difference between p_{2L}^{2x2}
and p_{2L}^{3x2} is exactly 0 at all p_{1T} and λ > 0, since they are the same
function of p_{1T}. For convenience, we will just write both of these functions
as p_{2L}.

Because p_{1T}^{2x2}, p_{1T}^{3x2}, and p_{2L} are all monotonic, as shown in
Lemma 4.2, and because p_{1T}^{2x2} − p_{1T}^{3x2} is always positive, the
intersection(s) of p_{1T}^{2x2} and p_{2L} must be different from the
intersection(s) of p_{1T}^{3x2} and p_{2L}. These intersections represent the
fixed point equilibrium solutions to each system, and thus we obtain the
desired result.
This theorem shows that, for any fixed λ, the addition of the new but
identical strategy changes the equilibrium probabilities. Even if this change
is small, it is significant because we can continue to add copies of the same
strategy until the difference between the equilibrium of the new game and
that of the original game is large. We show this in the next theorem.

Theorem 4.4. Let N be the number of copies of one of the original two
strategies in the generalized game for Player 1. For any fixed λ, as N
increases, the equilibrium probability of the other strategy increases or
decreases strictly monotonically.
Proof. This result is the logical extreme of Theorem 4.3. We have already
seen that when N is increased from 0 to 1, as in moving from the 2x2 to the
3x2 example, the function p_{1T}^{3x2} is always below p_{1T}^{2x2}; thus,
depending on the game's specification, the equilibrium p_{1T}^{3x2} will either
increase or decrease from p_{1T}^{2x2} as a result of this difference in
p_{1T}^{3x2} and p_{1T}^{2x2}. Now, imagine a game that has k strategies
available, with k−1 of them being identical copies of one strategy. When we
add an additional copy to this kx2 game, such that we have a (k+1)x2 game
with N = k, we have the following two functions for the two games:

p_{1T}^{kx2} = 1 / (1 + (k−1)e^{−λ[(a−b−c+d)p_{2L}^{kx2} + b−d]})

p_{1T}^{(k+1)x2} = 1 / (1 + ke^{−λ[(a−b−c+d)p_{2L}^{(k+1)x2} + b−d]})
Figure 4: Bach or Stravinsky. Panels: (a) λ = 0; (b) λ = 0.5; (c) λ = 1.8;
(d) λ = 3; (e) λ = 10; (f) λ = 100. The blue line represents the function
p_{1T}^{2x2}, the orange line represents the function p_{1T}^{3x2}, and the
green line represents the function p_{2L}.
As λ increases, multiple equilibria appear. Because of the nature of the
functions, three equilibria appear in the 2x2 game for lower values of λ than
in the 3x2 game. We can think of this as Alice in the 3x2 game taking
longer to reach the same level of rationality as her 2x2 self.
4.4 The Prisoner's Dilemma
The story behind the famous prisoner's dilemma game is this: two guilty
people (Alice and Bob) are being questioned separately for collaboratively
breaking the law. Each can either defect and confess to the crime, or
cooperate with the other by not confessing. If both cooperate, they get 1
year each. If one defects and the other tries to cooperate, the defector gets
off without jail time, and the other gets 5 years. If both defect, then they
each get 3 years. The normal-form game is presented below, with numbers
corresponding to utility, not jail time.
            Bob
           D      C
Alice  D  1, 1   5, 0
       C  0, 5   3, 3
This game has a dominant strategy, in that defecting provides more utility
no matter how the other player acts. Similarly, cooperating is a dominated
strategy, in that each player can get more utility by not cooperating. Thus,
the only situation in which both players are best responding, and thus the
only Nash equilibrium, is when both players defect. Even though the best
joint outcome is when both players cooperate, it is too tempting in that case
for each player to defect.
The theorems in the previous section have an interesting consequence
for this game. As we did in previous sections, we create a third option for
Alice that is an exact copy of her second strategy. We are now looking at
the game below.
            Bob
            D      C
Alice  D   1, 1   5, 0
      C1   0, 5   3, 3
      C2   0, 5   3, 3
After reading Section 4.3, we suspect that the probability that Alice
defects in this game will be different from the same probability in the original
2x2 version. Using the same technique as before, we see that this is true,
and in fact the addition of a copy of the C strategy makes it more
likely that Alice will cooperate. This is shown in Figure 5.

Note also that as λ increases, the equilibrium moves towards both players
playing D with probability close to 1 much more quickly than in other games
we have looked at. This is caused by D being a dominant strategy. However,
what this shows is that for any fixed rationality value λ, there is a number
of copies of C, call it N, that can be added such that the probability that
Alice plays D is approximately 0. For a very low value of λ, N need not be
large. As the value of λ increases, N grows very quickly. For λ = ∞, there
is no such N. But, as shown in Figure 6, for any fixed λ, we can keep adding
copies of cooperate until Alice defects with probability approximately 0.
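This limiting behavior can be checked numerically. With the prisoner's dilemma payoffs above, D beats C by 2 − p_{2D} in expectation for Alice, so with N identical copies of C her logit probability of D is p_{1D} = 1/(1 + Ne^{−λ(2−p_{2D})}), while Bob's equation is unchanged by the duplication. The damped iteration below is my own sketch:

```python
import math

def pd_qre(n_copies: int, lam: float, iters: int = 4000, damp: float = 0.5):
    """Logit QRE of the prisoner's dilemma when Alice has n_copies identical
    C strategies. Against the opponent's mix, D beats C by 2 - p2D for Alice
    and by 2 - p1D for Bob. Returns (p1D, p2D)."""
    p1D = p2D = 0.5
    for _ in range(iters):
        p1D += damp * (1.0 / (1.0 + n_copies * math.exp(-lam * (2 - p2D))) - p1D)
        p2D += damp * (1.0 / (1.0 + math.exp(-lam * (2 - p1D))) - p2D)
    return p1D, p2D

# At a fixed rationality, piling on copies of C drives Alice toward cooperation.
for n in (1, 100, 10_000):
    print(n, pd_qre(n_copies=n, lam=3.0))
```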
Figure 5: Prisoner's Dilemma. Panels: (a) λ = 0; (b) λ = 1; (c) λ = 3. The
blue line represents the function p_{1D}^{2x2}, the orange line represents the
function p_{1D}^{3x2}, and the green line represents the function p_{2D}.
Figure 6: Prisoner's Dilemma. Panels: (a) λ = 0; (b) λ = 1; (c) λ = 3;
(d) λ = 5; (e) λ = 7; (f) λ = 9. The green line represents the function p_{2D}.
Each other line represents the function p_{1D} in a game with N copies of the C
strategy for Alice. Dark blue: N = 1; orange: N = 2; red: N = 5; purple:
N = 10; brown: N = 20; light blue: N = 50; yellow: N = 100; magenta:
N = 1,000; dark green: N = 10,000; bright red: N = 1,000,000.
Conclusion
In this paper, we examine only the effect of adding identical copies of
pre-existing strategies to a game. However, it is important to note that this is
only an extreme example, used for convenience and ease of understanding.
This specific case is only a small part of the broader phenomenon. We
know from the specification of the quantal response model that as we change
the payoffs of a game, the equilibria change continuously. Thus, if we take an
identical strategy and adjust it ever so slightly, we see the same phenomenon.
Continuing in this way, we reach the conclusion that similar effects to the
ones we have shown arise even when strategies are merely correlated in some
way. That is, if two strategies are similar but not identical, the model may
still have trouble adapting. This applies far more widely than the
identical-strategy special case.
Moving forward, in order to better suit the model to adapt and adjust to
similar or identical strategies, there needs to exist in the model specification
some parameter representing correlation between strategies. If two strategies
are highly correlated, meaning that they offer the player similar results, then
we should see similar equilibrium probabilities in the game with only one
of the two strategies, and the game with both strategies. If two strategies
are mostly uncorrelated, in that there is a significant difference in outcome
between the two strategies, then the addition of one of the strategies to a
game containing the other strategy should in fact make a significant impact
on the equilibrium probabilities.
References
[1] Anderson, Simon P., Jacob K. Goeree, and Charles A. Holt. "Minimum-Effort
Coordination Games: Stochastic Potential and Logit Equilibrium."
Games and Economic Behavior 34.2 (2001): 177-199.

[2] Debreu, Gerard. Review of Individual Choice Behavior: A Theoretical
Analysis. The American Economic Review (1960): 186-188.

[3] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. "Quantal
Response Equilibria." The New Palgrave Dictionary of Economics (2008):
783-787.

[4] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. "Risk Averse
Behavior in Generalized Matching Pennies Games." Games and Economic
Behavior 45.1 (2003): 97-113.

[5] Haile, Philip A., Ali Hortaçsu, and Grigory Kosenok. "On the Empirical
Content of Quantal Response Equilibrium." American Economic Review
98.1 (2008): 180-200.

[6] Luce, R. Duncan. Individual Choice Behavior: A Theoretical Analysis.
Mineola, NY: Dover Publications, 2005.

[7] McKelvey, Richard D., and Thomas R. Palfrey. "Quantal Response
Equilibria for Normal Form Games." Games and Economic Behavior 10.1
(1995): 6-38.

[8] Ochs, Jack. "Coordination Problems." Handbook of Experimental
Economics (1995), edited by John Kagel and Alvin E. Roth, Princeton
University Press, Princeton, 195-251.