
Games and Economic Behavior 79 (2013) 67–74


Note

Evolutionary stability in repeated extensive games played by finite automata


Luciano Andreozzi¹
Università degli Studi di Trento, Facoltà di Economia, Via Inama, 8, 38100 Trento, Italy

Abstract
We discuss the emergence of cooperation in repeated Trust Mini-Games played by finite automata. Contrary to a previous result obtained by Piccione and Rubinstein (1993), we first prove that this repeated game admits two Nash equilibria, a cooperative and a non-cooperative one. Second, we show that the cooperative equilibrium is the only (cyclically) stable set under the so-called best response dynamics.
© 2013 Elsevier Inc. All rights reserved.

Article history: Received 10 May 2010. Available online 18 January 2013.
JEL classification: C70, C72.
Keywords: Finite automata; Trust game; Evolutionary stability; Cooperation.

1. Introduction

Repeated games have enormous sets of equilibria. In a seminal article, Abreu and Rubinstein (1988) introduced the idea that the equilibrium selection problem could be addressed by modeling strategies as finite automata. In this approach, the total payoff of a strategy is a combination of the complexity of the automaton that represents it (as measured by the number of its states) and the payoff it obtains in the playing of the game. They proved that arbitrarily small costs of complexity can drastically reduce the set of strategies that can be sustained in equilibrium. In the repeated Prisoner's Dilemma (PD), for example, some popular strategies such as Tit for Tat (TfT) cannot be Nash equilibria. The reason is that, in playing against itself, TfT never reaches the states in which it does not cooperate. It follows that a strategy of unconditional cooperation is a best response to TfT, because it obtains the same payoff as TfT itself, but with a smaller number of states. Abreu and Rubinstein (1988) proved that cooperation can only be achieved by machines that put the punishing phase first. Each machine starts by punishing the other by playing Defect for a fixed number of rounds, and does not revert to cooperation unless the other machine has played Defect for the same number of rounds. Once the punishing phase is over, both machines start cooperating. Switching to defection during the cooperative phase is deterred by the threat to start the punishment phase all over again. Abreu and Rubinstein (1988) provide a nice interpretation of this initial phase of punishment as a show of strength: at the beginning of the play, each machine tests the ability of the other machine to punish a possible defection. Machines that are unable to punish are exploited by unending defection. While this argument drastically restricts the strategies that one can observe in equilibrium, it still allows for a huge variety of possible outcomes, including perpetual defection.
Binmore and Samuelson (1992) used an evolutionary model to study the resulting equilibrium selection problem and obtained a stark result. When the cost of complexity is so small that

1 E-mail address: luciano.andreozzi@economia.unitn.it. I would like to thank Ken Binmore, Larry Samuelson and Michele Piccione for their comments on previous versions of this paper. Two anonymous referees and an editor of this journal provided detailed comments that greatly improved the exposition of the matter. All remaining mistakes are mine.

0899-8256/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.geb.2013.01.003


Fig. 1. The Trust Mini-Game.

it can be ranked lexicographically after the game payoffs, the only (modified) evolutionarily stable strategies are those that cooperate throughout the game. However, this result proved to be fragile. Volij (2002) showed that it crucially depends on the way complexity costs are modeled. When they enter directly into a machine's payoff function the opposite conclusion holds true: Defect is the only evolutionarily stable strategy.²

We contribute to this literature by studying the emergence of cooperation in sequential games, and in particular in the Trust Mini-Game (TG) in Fig. 1. It is the sequential version of a PD, in which the first player (the Sender) chooses whether to Trust (T) the second player (the Receiver), who in turn decides whether to Reward (R) the Sender's trust or not (NR). With the assumption that v_S < 0 < 1 < V_R, (NT, NR) is the game's unique Nash equilibrium. At first sight, cooperation is even less likely to emerge in sequential games than in simultaneous ones. Piccione and Rubinstein (1993) proved that in any sequential game, if players are constrained to choose pure strategies, the only Nash equilibria of the machine game are constant repetitions of one of the equilibria of the stage game. In the TG this entails that there are no pure strategy equilibria other than an infinite repetition of (NT, NR). The intuition behind their result is as follows. In the TG the punishment phase would consist of a finite number of rounds in which the Sender plays NT and the Receiver plays NR. However, the Receiver's behavior cannot be observed when the Sender plays NT. It follows that the Receiver can eliminate the states associated with the punishment phase (in which he plays NR) and reduce the complexity of his strategy. Without the punishment phase, however, cooperation cannot be in equilibrium.
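The claim that (NT, NR) is the unique equilibrium of the stage game follows by backward induction, which can be sketched in a few lines. The values v_S = −1 and V_R = 2 below are illustrative choices of ours; the paper only assumes v_S < 0 < 1 < V_R.

```python
# Backward induction in the one-shot Trust Mini-Game (Fig. 1).
# Illustrative parameters; the paper only restricts their signs: v_S < 0 < 1 < V_R.
v_S, V_R = -1.0, 2.0

# Payoffs (Sender, Receiver) at each end node of the extensive game.
payoffs = {
    "NT":        (0.0, 0.0),   # Sender does not trust
    ("T", "R"):  (1.0, 1.0),   # trust rewarded
    ("T", "NR"): (v_S, V_R),   # trust exploited
}

# The Receiver moves last: after T he compares R against NR.
receiver_choice = "R" if payoffs[("T", "R")][1] > payoffs[("T", "NR")][1] else "NR"
# The Sender anticipates the Receiver's choice.
sender_choice = "T" if payoffs[("T", receiver_choice)][0] > payoffs["NT"][0] else "NT"

print(sender_choice, receiver_choice)  # -> NT NR
```

Since V_R > 1 the Receiver would not reward, and since v_S < 0 the Sender therefore does not trust, which is exactly the unique (subgame perfect) equilibrium outcome stated in the text.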
To prove that cooperation can emerge in the repeated TG we first have to show that (contrary to Piccione and Rubinstein, 1993) there can be an NE in which the two players play Trust and Reward. We do this by relaxing the assumption that players are constrained to use pure strategies. This assumption was common in the early literature, where it was justified with the argument that it is unclear how the complexity of a mixed strategy could be modeled (see Abreu and Rubinstein, 1988). However, in the kind of evolutionary models we shall deal with, mixed strategies have an uncontroversial interpretation in terms of polymorphisms within populations of agents, each of whom uses a pure strategy. In these models considering mixed strategies is important because mixed strategy NE may turn out to be stable under some adjustment dynamics. We first show that once players are allowed to use mixed strategies, Piccione and Rubinstein's (1993) result is overturned: besides a set of non-cooperative equilibria, there is a (mixed strategy) cooperative Nash equilibrium. Second, we address the equilibrium selection problem by using the so-called best response dynamics. The game is assumed to be played repeatedly by agents belonging to two separate populations, one of Senders, the other of Receivers. Occasionally, agents are given the opportunity to change the machine they use to play the repeated game, in which case they adopt the machine that yields the largest payoff. We show that under this adjustment process there is a path that leads from the non-cooperative equilibrium to the cooperative one, but not vice versa. Formally, the cooperative mixed strategy NE is socially stable in the sense introduced by Matsui (1992), while the set of non-cooperative equilibria is not. This shows that Volij's (2002) result is also fragile. In a Prisoner's Dilemma in which one player is allowed to choose first, the non-cooperative equilibrium cannot be stable and cooperation is bound to emerge.
The paper proceeds as follows. Section 2 introduces the necessary technicalities and definitions. Section 3 contains the proof that a cooperative equilibrium exists in the repeated TG. Section 4 introduces the best response dynamics and proves that the cooperative equilibrium is the only stable equilibrium. Section 5 concludes. All proofs are relegated to Appendix A.

2. The model: definitions

Two players repeatedly play the Trust Game in Fig. 1. We shall indicate with S_S = {T, NT} and S_R = {R, NR} the pure strategy sets for the Sender and the Receiver. The set E = {NT, (T, NR), (T, R)} is the set of outcomes of the game. (Note that since the Trust Game is an extensive form game, E does not correspond to S_S × S_R.) h_i(e) is the payoff that player i obtains on reaching the end-node e ∈ E (i = Sender, Receiver).

A finite automaton, or a machine, M is a finite collection of states of which one is the initial one. Each state is associated with a strategy, which is the strategy the automaton plays when in that state. After each round the state of the automaton

2 Samuelson and Swinkels (2003) discuss the differences between these two approaches.


Fig. 2. The unique minimum CURB set of the machine game.

changes depending upon its current state and the outcome of the previous round. Formally, a machine for player i is a quadruple ⟨Q_i, q_i^0, λ_i, μ_i⟩ with the following characteristics:

– Q_i is a set of states;
– q_i^0 ∈ Q_i is the initial state;
– λ_i : Q_i → S_i is the output function;
– μ_i : Q_i × E → Q_i is the transition function.
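To fix ideas, the quadruple ⟨Q_i, q_i^0, λ_i, μ_i⟩ can be encoded directly. The sketch below is an illustrative representation of ours (names and data layout are not from the paper); it instantiates the two-state "grim" Sender machine discussed later, which plays T until it observes the outcome (T, NR).

```python
from dataclasses import dataclass

@dataclass
class Machine:
    """A finite automaton <Q_i, q_i^0, lambda_i, mu_i>."""
    initial: str      # q_i^0, the initial state
    output: dict      # lambda_i : state -> stage-game action
    transition: dict  # mu_i : (state, outcome) -> next state

def step(m: Machine, state: str, outcome) -> str:
    """Apply the transition function; stay put if no transition is listed."""
    return m.transition.get((state, outcome), state)

# The Sender's grim machine: trust until the Receiver fails to reward.
grim = Machine(
    initial="trust",
    output={"trust": "T", "punish": "NT"},
    transition={("trust", ("T", "NR")): "punish"},  # "punish" is absorbing
)

print(grim.output[grim.initial], step(grim, "trust", ("T", "NR")))  # -> T punish
```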

Let M_S and M_R be the sets of finite automata for players S and R. Two machines (M_S, M_R) playing against each other produce a deterministic history of strategies chosen by the two players (s_t). E(s_t) is the end node reached as a consequence of the play s_t, and h_i(E(s_t)) is the payoff obtained by player i at round t. The payoff resulting from a match (M_S, M_R) for player i = S, R is thus π_i(M_S, M_R) = (1 − δ) Σ_{t=1}^∞ δ^{t−1} h_i(E(s_t)), where δ is the time discount factor. Each state after the first one has a cost c. We shall denote with G the quadruple ⟨M_S, M_R, π_S^c, π_R^c⟩. G is usually referred to as the machine game. The extension to mixed strategies is done in the usual way. We shall indicate with π_S^c(M_S, q) (π_R^c(M_R, p)) the payoff the Sender (Receiver) obtains by using the machine M_S (M_R) when the cost of a state is c and he expects the Receiver (Sender) to use the mixed strategy q (p). Notice that, since in Section 4 mixed strategies will be interpreted as the frequencies with which pure strategies are used within two populations, payoffs are only defined for pure strategies. Finally, BR_S(q) and BR_R(p) denote the best response correspondences for the Sender and the Receiver.

3. Equilibria

Before we present our results, we introduce the notion of Closed Under Rational Behavior (CURB) set (Basu and Weibull, 1991). Let Δ(M̃) be the set of probability distributions over a set of machines M̃.

Definition 1. Let (M̃_S, M̃_R) be two finite sets of automata. We say that (M̃_S, M̃_R) is a CURB set for the game G if (a) for all q ∈ Δ(M̃_R), M_S ∈ BR_S(q) implies that M_S ∈ M̃_S, and (b) for all p ∈ Δ(M̃_S), M_R ∈ BR_R(p) implies that M_R ∈ M̃_R. A CURB set is minimal if it contains no proper subsets that are CURB sets.

Intuitively, a set of machines (M̃_S, M̃_R) is a CURB set if, when player i expects player j to choose with positive probability only strategies in M̃_j, he will only choose a strategy within M̃_i.

We shall now prove that the machine game G has a small minimal CURB set, and then that there is one cooperative equilibrium within that set. Fig. 2 contains all the automata that form this minimal CURB set. These are: all the one-state machines for both players (M_S^NT, M_S^T for the Sender and M_R^R, M_R^NR for the Receiver) plus a two-state machine for the


Table 1
A simplified version of the machine game.

            M_R^NR                        M_R^R
M_S^NT      0, 0                          0, 0
M_S^T       v_S, V_R                      1, 1
M_S^g       v_S(1 − δ) − c, V_R(1 − δ)    1 − c, 1

sender, M_S^g. The latter machine implements the grim strategy: it plays T in the first round and keeps playing T as long as the other player has played R.³ It reverts to a constant play of NT after the first round in which the other player has played NR. These strategies yield very simple patterns when matched against each other. M_S^NT produces a stream of NT independently of the strategy with which it is matched. (M_S^T, M_R^R) and (M_S^g, M_R^R) produce an uninterrupted stream of (T, R). (M_S^T, M_R^NR) produces a continuous stream of (T, NR). (M_S^g, M_R^NR) produces (T, NR) in the first round, followed by a stream of NT.

Let S̄_S = {M_S^NT, M_S^T, M_S^g}, S̄_R = {M_R^NR, M_R^R} and S̄ = S̄_S × S̄_R. We shall indicate with G^S̄ the game in which players' choices are restricted to the set S̄. Table 1 represents G^S̄. All proofs are based on the following assumption:

Assumption 1.
(i) δ ≥ δ_crit := (V_R − 1)/V_R;
(ii) c ≤ c_crit := −δ v_S/(1 − v_S).

δ_crit is the threshold value of δ such that when δ ≥ δ_crit, (M_S^g, M_R^R) is a Nash equilibrium in the machine game without complexity costs (c = 0). This is the familiar condition that players must be sufficiently patient for cooperation to be a Nash equilibrium in a repeated game. The second condition imposes that complexity costs are sufficiently small with respect to δ and v_S.
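The simple play patterns described above, and the corresponding Table 1 entries, can be checked by simulating matches directly. The sketch below uses illustrative parameter values of ours (the paper only restricts their signs) and truncates the infinite discounted sum at a large horizon.

```python
# Reproducing the Table 1 payoffs by simulating machine matches.
# Illustrative parameters satisfying v_S < 0 < 1 < V_R.
v_S, V_R, delta, c = -1.0, 2.0, 0.9, 0.05

def match_payoffs(sender, receiver, rounds=2000):
    """Discounted average payoffs of two machines, gross of state costs.
    Machines are (initial_state, output, transition) triples; the infinite
    sum is truncated at `rounds` (the tail is negligible for delta < 1)."""
    qs, qr = sender[0], receiver[0]
    ps = pr = 0.0
    for t in range(rounds):
        a_s = sender[1][qs]
        if a_s == "NT":
            outcome, hs, hr = "NT", 0.0, 0.0
        else:
            a_r = receiver[1][qr]
            outcome = ("T", a_r)
            hs, hr = (1.0, 1.0) if a_r == "R" else (v_S, V_R)
        ps += (1 - delta) * delta**t * hs
        pr += (1 - delta) * delta**t * hr
        qs = sender[2].get((qs, outcome), qs)   # stay put if no transition
        qr = receiver[2].get((qr, outcome), qr)
    return ps, pr

# The four one-state machines and the two-state grim machine M_S^g.
M_S_NT = ("s", {"s": "NT"}, {})
M_S_T  = ("s", {"s": "T"}, {})
M_R_R  = ("s", {"s": "R"}, {})
M_R_NR = ("s", {"s": "NR"}, {})
M_S_g  = ("t", {"t": "T", "p": "NT"}, {("t", ("T", "NR")): "p"})

# (M_S^g, M_R^R): an uninterrupted stream of (T, R); the Sender's extra
# state then costs him c, as in the bottom-right cell of Table 1.
ps, pr = match_payoffs(M_S_g, M_R_R)
print(round(ps - c, 3), round(pr, 3))  # prints: 0.95 1.0
```

Running `match_payoffs(M_S_g, M_R_NR)` likewise returns (v_S(1 − δ), V_R(1 − δ)) before state costs, matching the bottom-left cell.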
Proposition 1. Let Assumption 1 hold. Then the set S̄ is the unique minimal CURB set for G.

We shall only present a sketch of the proof that S̄ is in fact a minimal CURB set. First note that against M_S^NT all the Receiver's machines obtain the same payoff (zero). Hence, it suffices to consider what any alternative machine can obtain against M_S^T and M_S^g. The key of the proof is that against M_S^T no machine can do better than M_R^NR, because M_R^NR exploits M_S^T with the minimum number of states. Against M_S^g no machine that always plays R can do better than M_R^R. On the other hand, if a machine M̃_R that plays NR for the first time in round n > 1 obtains a larger payoff than M_R^R against the mixed strategy chosen by the Sender, then M_R^NR (which plays NR at the first round) will obtain an even larger payoff, because it would obtain a larger payoff with a smaller number of states. Similarly, any machine M̃_S with |M̃_S| > 1 that never plays Trust is strictly dominated by M_S^NT, because it contains a larger number of states and obtains the same repeated game payoff (zero). Now consider a machine M̃_S that plays T for the first time in round n. The proof consists in showing that any such machine cannot obtain a payoff which is simultaneously larger than the payoffs obtained by M_S^g and M_S^T.

The following is a simple corollary of Proposition 1, which is worth stating as a separate result.

Corollary 1. All the NE for G^S̄ are NE also for G.

The following proposition characterizes the NE for G^S̄, which, because of the previous corollary, are equilibria for G as well.

Proposition 2. The game G^S̄ has a connected component N of NE in which the Sender chooses M_S^NT with probability one and the Receiver chooses M_R^NR with a probability q_0 ≥ q̄ := min(1/(1 − v_S), (1 − c)/(1 − v_S(1 − δ))). If Assumption 1 is met, G^S̄ also has a mixed strategy NE (p*, q*), where p* = (0, (1 − V_R(1 − δ))/(δ V_R), (V_R − 1)/(δ V_R)) and q* = (−c/(δ v_S), (δ v_S + c)/(δ v_S)).

There is a clear intuition behind both (sets of) NE in Proposition 2. First, there is a set of NE in which the Sender chooses the non-trusting machine M_S^NT because she expects the Receiver to choose the non-rewarding machine M_R^NR with a
3 For the sake of a simple notation, we use the same letters for strategies and outcomes when this does not create confusion. So in Fig. 2 the letter R stands for the outcome (T, R), NR stands for (T, NR), and so on.


sufficiently high probability. The second NE is slightly more complex. The Sender expects the Receiver to either always play Reward (M_R^R) or always play Not Reward (M_R^NR), and the probabilities he puts on these two strategies are such that he gets the same payoff by choosing the unconditionally trustful machine M_S^T and the grim machine M_S^g. M_S^g is more complex than M_S^T. However, it obtains a larger payoff against M_R^NR because M_S^g quits trusting after the first time that the Receiver has played NR. So, while M_S^T yields higher payoffs when the Receiver plays M_R^R with a sufficiently large probability, M_S^g becomes the best reply when the non-rewarding machine M_R^NR is expected with a larger probability. The trick of the proof is that when the cost of an extra state is sufficiently low (i.e. when c < c_crit) it pays to have an extra state to discriminate between M_R^NR and M_R^R rather than reverting to the simpler (but non-discriminating) machine that never trusts, M_S^NT.

4. Learning

Consider the following extremely simplified model of learning.⁴ There are two large (infinite) populations of agents which, with an abuse of notation, we shall denote as S (Sender) and R (Receiver). The game G is played by pairs of individuals drawn at random from S and R. Each agent in each population adopts a machine. The state of the two populations is represented by a pair (p(t), q(t)), where p(t) and q(t) are the distributions over the machines within the S and R populations respectively. Let p_{M_S}(t) and q_{M_R}(t) be the fractions of the populations S and R that use machines M_S and M_R respectively at time t. As above, let BR_S(q(t)) and BR_R(p(t)) be the sets of best replies for the Sender and the Receiver respectively when the state of the two populations is (p(t), q(t)).⁵ Agents in each population revise their strategies at a fixed rate. When revising her strategy, an agent will switch to one of the best replies.
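The indifference conditions behind the cooperative mixed equilibrium can be checked numerically. The sketch below evaluates the Table 1 payoff matrix at the (p*, q*) of Proposition 2, under illustrative parameters of ours satisfying Assumption 1 (v_S = −1, V_R = 2, δ = 0.9, c = 0.05).

```python
# Numerical check of the mixed NE (p*, q*) of the restricted game of Table 1.
# Parameters are illustrative and satisfy Assumption 1:
# delta >= (V_R - 1)/V_R = 0.5 and c <= -delta*v_S/(1 - v_S) = 0.45.
v_S, V_R, delta, c = -1.0, 2.0, 0.9, 0.05

# Rows: M_S^NT, M_S^T, M_S^g.  Columns: M_R^NR, M_R^R.
U_S = [[0.0, 0.0],
       [v_S, 1.0],
       [v_S * (1 - delta) - c, 1.0 - c]]
U_R = [[0.0, 0.0],
       [V_R, 1.0],
       [V_R * (1 - delta), 1.0]]

# The mixed equilibrium of Proposition 2, written out explicitly.
p = [0.0, (1 - V_R * (1 - delta)) / (delta * V_R), (V_R - 1) / (delta * V_R)]
q = [-c / (delta * v_S), 1 + c / (delta * v_S)]

# Receiver's payoffs against p* (should be equal: NR vs R indifference).
rec = [sum(p[i] * U_R[i][j] for i in range(3)) for j in range(2)]
# Sender's payoffs against q* (M_S^T and M_S^g equal, and both above 0,
# the payoff of M_S^NT).
sen = [sum(q[j] * U_S[i][j] for j in range(2)) for i in range(3)]

print([round(x, 4) for x in rec], [round(x, 4) for x in sen])
```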
These hypotheses ensure that the states of the two populations evolve according to the well-known Best Response Dynamics (BRD)

ṗ(t) = b_S(t) − p(t),    q̇(t) = b_R(t) − q(t),    (1)

where b_S(t) ∈ BR_S(q(t)) and b_R(t) ∈ BR_R(p(t)) for all t (Gilboa and Matsui, 1991).⁶

Proposition 3. The set Δ(S̄) is invariant under the BRD. That is, for any initial condition (p(0), q(0)) ∈ Δ(S̄), (p(t), q(t)) ∈ Δ(S̄) for any t > 0.

Proof. This is an immediate consequence of the definitions of the BRD and of a CURB set. □

This proposition allows us to study the stability properties of the equilibria in the only minimal CURB set as if that were an independent game. In fact, if the system starts at any point of Δ(S̄), the dynamics will not take it out of Δ(S̄). The stability concept we shall use was introduced by Matsui (1992). Intuitively, a set of states X is stable if there is no best response path that leads from any element of X to a state which is not in X. To make this precise, we need some further piece of terminology. A strategy distribution (p′, q′) ∈ Δ(S̄) is directly accessible from (p, q) if there exists a best reply path such that (p(0), q(0)) = (p, q) and (p(T), q(T)) = (p′, q′) for some T ≥ 0. Also, (p′, q′) is accessible from (p, q) if one of the following holds true: (i) (p′, q′) is directly accessible from (p, q); (ii) there exists a sequence (p_n, q_n) converging to (p′, q′) such that (p_n, q_n) is directly accessible from (p, q) for any n; (iii) (p′, q′) is accessible from another (p″, q″) which is accessible from (p, q). We need one final definition: a set of states F ⊆ Δ(S̄) is a cyclically stable set (CSS) if (i) any (p′, q′) ∉ F is not accessible from any (p, q) ∈ F, and (ii) any (p, q) ∈ F is accessible from any (p′, q′) ∈ F. The idea of a CSS is that a set of states F is stable if the best response dynamics (1) cannot leave F. We are now ready to formulate our main result.

Proposition 4. In game G, the mixed strategy NE (p*, q*) is cyclically stable. The set of NE N is not cyclically stable.

Fig. 3 provides a graphical illustration of this proposition. It represents the state space Δ(S̄), with the two (sets of) Nash equilibria (p*, q*) and N. p_0, p_1 and p_g represent the fractions of population S which play M_S^NT, M_S^T and M_S^g respectively, and q_0 and q_1 are the fractions of R that play M_R^NR and M_R^R. It also represents two orbits generated by the BRD. The first orbit originates from x, which lies on the face where p_0 = 0, and has the familiar appearance of the orbits generated in

4 This is the continuous and deterministic counterpart of the stochastic learning model proposed by Volij (2002).
5 We are implicitly assuming here that only a finite number of machines are represented at the initial state (p(0), q(0)) and at any following time t > 0. The analysis of the BRD for games with an infinite number of strategies is beyond the scope of this paper. On the other hand, none of the results presented below requires such an analysis.
6 Note that these are not differential equations, because best replies might not be unique, so that more than one orbit can originate from the same initial condition.


Fig. 3. Orbits generated by the BRD on Δ(S̄).

2 × 2 games with a single mixed strategy NE (see for example Berger, 2002). p_0 remains constantly equal to zero, while the populations converge towards (p*, q*). The second orbit starts from y ∈ N and converges to (p*, q*), just like the first one. The logic of the proof is to show that from any point in N there is a best response path that approaches (p*, q*), while there are no best response paths going from (p*, q*) to N. Actually, (p*, q*) attracts all best response paths originating in a sufficiently small neighborhood.

5. Conclusions

The results presented above have two main implications for the theory of repeated games played by finite automata. First, a show of strength is not necessary to sustain cooperation, provided that mixed strategies are allowed. Even sequential games, for which the show of strength argument fails, do have cooperative equilibria in mixed strategies. Second, when a player is allowed to choose first in a PD, the non-cooperative equilibrium cannot be stable. This suggests that the exact timing of decisions plays a crucial, and neglected, role in the way in which cooperation can spread in a world of universal defection.

Appendix A

A.1. Proof of Proposition 1

We shall prove Proposition 1 with the help of three lemmata.

Lemma 1. The set of strategies S̄ is a minimal CURB set for G.

Proof. We must show that any machine for the Sender and the Receiver M̃_i ∉ S̄_i (i = S, R) yields, against any element of Δ(S̄), a payoff which is strictly smaller than the payoff offered by at least one element of S̄. First consider any machine for the Sender M̃_S ∉ S̄_S. M̃_S has at least two states, because otherwise it would belong to S̄_S. If it never plays T, its payoff is 0 − c(|M̃_S| − 1) < 0 and hence it is strictly dominated by M_S^NT, whose payoff is zero. Suppose thus that M̃_S plays T at the beginning of the game (there is no loss of generality in this assumption, because none of the machines in S̄_R behaves differently depending on the round in which the first T takes place). Its payoff against M_R^NR is thus bounded above by v_S(1 − δ) − c(|M̃_S| − 1). Its payoff against M_R^R is bounded above by 1 − c(|M̃_S| − 1). As a consequence, M̃_S is strictly dominated by M_S^g whenever |M̃_S| > 2. If |M̃_S| = 2, M̃_S is a two-state machine whose initial state plays T and the other NT. (If both states played T, M̃_S would be dominated by M_S^T.) It is a tedious exercise to show that all two-state machines in which the first state plays T are strictly dominated by M_S^g or M_S^T against any probability distribution involving only M_R^NR and M_R^R.

Now consider an alternative machine for the Receiver M̃_R ∉ S̄_R. All the Receiver's machines obtain the same payoff (zero) against M_S^NT, and hence it suffices to consider what M̃_R obtains against M_S^T and M_S^g. M̃_R has at least two states. If the initial state plays NR, it is dominated by M_R^NR. To see this, consider that against M_S^T any machine obtains at most V_R − c(|M̃_R| − 1) < V_R, while against M_S^g a machine that plays NR in the first round gets at most V_R(1 − δ) − c(|M̃_R| − 1). Suppose then that M̃_R has R as initial state. Neither M_S^T nor M_S^g plays NT unless the Receiver has played NR once. If M̃_R always plays R after any T, M̃_R is strictly dominated by M_R^R. The reason is that it obtains a constant stream of (T, R) against both M_S^T and M_S^g, and it has at least two states. So M̃_R must have at least one state in which it plays NR, which is reached after a sequence of T's, say at round n. After it has played NR for the first time, the best that M̃_R can do is to keep playing NR, for M_S^g will not play Trust any longer, while M_S^T will continue to play T. As a consequence, M̃_R obtains the same payoff as M_R^NR from round n onwards. Before that, it obtains a stream of 1. Let π^0(p) and π^1(p) = 1 be the payoffs obtained by M_R^NR and M_R^R resp. when the Sender plays the mixed strategy p. If the Receiver uses M̃_R he obtains π̃(p) = (1 − δ^{n−1}) + δ^{n−1} π^0(p) − c(|M̃_R| − 1). Clearly, if π^0(p) ≥ π^1(p) = 1, then π^0(p) > π̃(p), so that M̃_R cannot be a best response. If π^0(p) ≤ π^1(p) = 1, then π^1(p) = 1 > π̃(p), and again M̃_R cannot be a best response. □


The second part of the following lemma is a well-known result in this literature: a best reply to a machine never contains more states than the machine itself. The first part depends upon the sequential structure of the Trust Game.

Lemma 2. Let M_S and M_R be two machines for the Sender and the Receiver resp. such that |M_i| ≥ 2. If M_R ∈ BR_R(M_S), then |M_R| < |M_S|, while if M_S ∈ BR_S(M_R), then |M_S| ≤ |M_R|.

Proof. The second part of the lemma is just a consequence of Piccione and Rubinstein's (1993) Lemma 1. To prove the first part we have to show that the best reply for the Receiver to any machine M_S of the Sender contains a strictly smaller number of states. Consider any machine M_S = ⟨Q_S, q_S^1, λ_S, μ_S⟩ which has at least two states. Let b_R : Q_S → S_R be the policy function that maximizes the Receiver's payoff stream against M_S. Now consider a machine for the Receiver M̄_R defined as follows: M̄_R = ⟨Q_S, q_S^1, λ_R, μ_R⟩, where λ_R(q_S) = b_R(q_S) and μ_R(q_S, ·) = μ_S(q_S, E(λ_S(q_S), λ_R(q_S))). This machine has the same set of states as M_S and implements the optimal policy, so that it maximizes the Receiver's repeated game payoffs. We now show that it is possible to construct an alternative machine for the Receiver M̃_R that behaves like M̄_R against M_S (and hence obtains the same payoff), but has a smaller number of states. Consider first a match (M_S, M̄_R). If one state q̄_S ∈ Q_S is not reached in a match (M_S, M̄_R), one can obtain M̃_R by replacing Q_S with Q_S \ {q̄_S}. M̃_R obtains the same payoff as M̄_R and contains a smaller number of states. Suppose then that all states in Q_S are reached. There must be at least one succession of states Q̄_S = {q̃_S^1, ..., q̃_S^k} (with k ≥ 1), such that μ_S(q̃_S^t, NT) = q̃_S^{t+1} and λ_S(q̃_S^t) = NT. In other words, there must be a succession of (at least one) states in which the Sender plays NT. (If in all states he played T, the Receiver's best reply would be M_R^NR.) There are two possibilities. First, q̃_S^1 = q_S^1 (the succession of NT starts at the beginning of the game) and μ_S(q̃_S^k, NT) = q_S^{k+1} with λ_S(q_S^{k+1}) = T (after k rounds M_S enters a state in which it plays T). Consider the following machine M̃_R = ⟨Q̃_R, q̃_R^1, λ_R, μ_R⟩, where Q̃_R = Q_S \ Q̄_S and q̃_R^1 = q_S^{k+1}. This machine is obtained from M̄_R by selecting q_S^{k+1} as the initial state and removing the first k states. It behaves exactly as M̄_R against M_S and therefore obtains the same payoff, with fewer states. A second possibility is that q̃_S^1 ≠ q_S^1, that is, the succession of NT does not start at the beginning of the game. Let q̄_S ∈ Q_S be the state such that μ_S(q̄_S, E(λ_S(q̄_S), λ_R(q̄_S))) = q̃_S^1 and λ_S(q̄_S) = T. q̄_S is thus the last state in which M_S plays T against M̄_R before entering the succession Q̄_S of states in which it plays NT. Let q*_S ∈ Q_S be the state such that μ_S(q̃_S^k, E(λ_S(q̃_S^k), λ_R(q̃_S^k))) = q*_S. So q*_S is the state M_S enters at the end of the succession Q̄_S. If q*_S ∈ Q̄_S, M_S never leaves Q̄_S. In this case M̄_R must be modified as follows: Q̃_R = Q_S \ Q̄_S and μ_R(q̄_S, E(λ_S(q̄_S), λ_R(q̄_S))) = q̄_S. Finally, if q*_S ∉ Q̄_S, then λ_S(q*_S) = T, so after playing NT for k rounds, M_S plays T again. In this case M̄_R must be modified as follows: μ_R(q̄_S, E(λ_S(q̄_S), λ_R(q̄_S))) = q*_S. In all the cases we obtain a machine that obtains the maximum payoff against M_S with a strictly smaller number of states. □

Lemma 3. There are no minimal CURB sets other than S̄ in G.
Proof. By way of contradiction, assume that S̃ = {S̃_S, S̃_R} is a set of machines for the Sender and the Receiver such that S̃ is a minimal CURB set. Of course, S̃ ∩ S̄ = ∅. (If any of the machines in S̃ were also in S̄, S̄ would not be a minimal CURB set.) Let M_S ∈ S̃_S be such that |M_S| ≤ |M′_S| for each M′_S ∈ S̃_S. Thus M_S is one of the machines that have the minimal number of states in S̃_S. Since S̃ is a CURB set, it must be that if M_R ∈ BR_R(M_S), then M_R ∈ S̃_R. Because of Lemma 2, |M_R| < |M_S|. Since S̃ is a CURB set, it must also be that all best replies to M_R are in S̃_S. This implies that there exists a strategy M̃_S ∈ S̃_S such that M̃_S ∈ BR_S(M_R). Because of Lemma 2, this implies that |M̃_S| ≤ |M_R|. We thus have that |M̃_S| ≤ |M_R| < |M_S|, which contradicts that |M_S| ≤ |M′_S| for each M′_S ∈ S̃_S. □

Clearly, Lemma 1 and Lemma 3 imply Proposition 1.

A.2. Proof of Proposition 2
Proof. Let q be the probability with which the Receiver plays M_R^NR. The payoff in playing M_S^T is v_S q + (1 − q), while the payoff in playing M_S^g is (v_S(1 − δ) − c)q + (1 − c)(1 − q). M_S^NT is thus a best reply provided that both these magnitudes are non-positive, which requires q ≥ 1/(1 − v_S) and q ≥ (1 − c)/(1 − v_S(1 − δ)). If δ < δ_crit, M_R^NR is a weakly dominant strategy for the Receiver, and therefore (M_S^NT, M_R^NR) is the only Nash equilibrium. If δ ≥ δ_crit and the Sender plays p*, the Receiver is indifferent between M_R^NR and M_R^R. If the Receiver plays q*, the Sender is indifferent between M_S^T and M_S^g. The Sender's payoff in playing these two strategies is (δ v_S + c(1 − v_S))/(δ v_S), which is non-negative provided that c ≤ c_crit = −δ v_S/(1 − v_S). □
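The threshold algebra can be sanity-checked numerically (a sketch of ours, with illustrative parameters, not part of the proof): at c = c_crit the Sender's common payoff from M_S^T and M_S^g at the indifference point q* falls exactly to zero, the payoff of M_S^NT.

```python
# Numeric sanity check of c_crit, with illustrative parameters.
v_S, V_R, delta = -1.0, 2.0, 0.9

c_crit = -delta * v_S / (1 - v_S)     # = 0.45 for these parameters
q_star = -c_crit / (delta * v_S)      # Sender's indifference point at c = c_crit

# At c = c_crit both trusting machines yield exactly zero at q*,
# so for any c < c_crit trusting strictly beats M_S^NT at the mixed NE.
pay_T = v_S * q_star + (1 - q_star)
pay_g = (v_S * (1 - delta) - c_crit) * q_star + (1 - c_crit) * (1 - q_star)
print(round(pay_T, 10), round(pay_g, 10))
```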

A.3. Proof of Proposition 4

Proof. To assess the stability of (p*, q*) and of the set of NE N we need to consider only the BRD paths within Δ(S̄). To see this consider that, because of Proposition 3, no state outside Δ(S̄) is accessible from any state in Δ(S̄). It follows that the BRD cannot take the system from (p*, q*) to N or vice versa by first reaching a state (p′, q′) ∉ Δ(S̄). We prove the


proposition by showing that (p*, q*) is a CSS and that (p*, q*) is accessible from any (p, q) ∈ N, which of course implies that N cannot be a CSS. The orbits on the face spanned by p_1 and q_0 can be easily computed, provided that they do not cross the threshold q̄_0. In fact, as long as q_0(t) < q̄_0, M_S^NT is not a best reply, which guarantees that p_0 remains zero. The logic of the proof is to calculate the Poincaré section of the two orbits, using the plane {x, y, ȳ, (p*, q*)}, which we shall denote as H. Let (q_0^n(x)) and (q_0^n(y)) be the values taken by q_0(t) when crossing the plane H starting from x and y respectively, where n = 0, 1, 2, ... denotes the successive crossings. We shall show that for any n, (q_0^n(y)) = (q_0^n(x)). This means that the fraction of M_R^NR players when the orbits cross the plane H does not depend upon the initial fraction of M_S^NT players within the S population. Since we know that (q_0^n(x)) approaches q_0^* monotonically, this implies that the same will happen for (q_0^n(y)). Furthermore, q_0(t) remains below q̄_0 for any t > 0, and hence along the best response path originating from y, p_0(t) goes monotonically to zero and p_1(t) and p_g(t) approach p_1^* and p_g^*.

Consider any point on H. At any such point p_1 = p_1^*(1 − p_00), with p_00 ∈ [0, 1], and q_0 ∈ [q_0^*, q̄_0]. One of the best replies for the Sender is clearly M_S^g, so that one has that when p_1(0) = p_1^*(1 − p_00), a best reply path is p_1(t) = p_1^*(1 − p_00)e^{−t}, with t ∈ [0, t_1). t_1 = log(q_0/q_0^*) is the time it takes for population R to move from q_0 to q_0^*. Let p_11 be p_1(t_1). When q_0(t) = q_0^*, M_S^T becomes the best reply, so that one has p_1(t) = 1 − (1 − p_11)e^{−(t−t_1)}. The orbit will cross H again when p_1(t)/(p_1(t) + p_g(t)) = p_1^*, that is, when p_1(t)/(1 − p_0(t)) = p_1^*. This requires that

(1 − (1 − p_11)e^{−(t−t_1)})/(1 − p_00 e^{−t}) = p_1^*,

that is,

(1 − (1 − p_1^*(1 − p_00)e^{−t_1})e^{−(t−t_1)})/(1 − p_00 e^{−t}) = p_1^*.

Solving the last equation for t, one obtains t_2 = log((e^{t_1} − p_1^*)/(1 − p_1^*)), which does not depend on p_00. This means that starting from y, p_1 will take the same time t_2 to come back to p_1^* as when starting from x. At that time, one would have that q_0(t_2) = q_0(0) e^{−t_2}. With an analogous reasoning one can show that it takes the same time t_3 for the system to go back to the plane H, independently of the point (x or y) from which the orbit started. One can iterate this reasoning to show that in fact (q_0^i(y)) = (q_0^i(x)) for any i, and therefore that both orbits converge to (p*, q*). □

References
Abreu, D., Rubinstein, A., 1988. The structure of Nash equilibria in repeated games with finite automata. Econometrica 56, 1259–1282.
Basu, K., Weibull, J.W., 1991. Strategy subsets closed under rational behavior. Econ. Letters 36 (2), 141–146.
Berger, U., 2002. Best response dynamics for role games. Int. J. Game Theory 30, 527–538.
Binmore, K., Samuelson, L., 1992. Evolutionary stability in repeated games played by finite automata. J. Econ. Theory 57, 278–305.
Gilboa, I., Matsui, A., 1991. Social stability and equilibrium. Econometrica 59, 859–867.
Matsui, A., 1992. Best response dynamics and socially stable strategies. J. Econ. Theory 57, 343–362.
Piccione, M., Rubinstein, A., 1993. Finite automata play a repeated extensive game. J. Econ. Theory 61, 160–168.
Samuelson, L., Swinkels, J.M., 2003. Evolutionary stability and lexicographic preferences. Games Econ. Behav. 44 (2), 332–342.
Volij, O., 2002. In defense of defect. Games Econ. Behav. 39, 309–321.
