Stat155: Game Theory

Lecture 9: von Neumann's Minimax Theorem

Peter Bartlett

Outline:
Nash equilibrium.
Games as linear programs.
von Neumann's minimax theorem.
Low regret learning algorithms for playing games.
Definition

A pair (i*, j*) ∈ {1, …, m} × {1, …, n} is a saddle point (or pure Nash
equilibrium) for a payoff matrix A ∈ R^{m×n} if

    max_i a_{i j*} = a_{i* j*} = min_j a_{i* j}.

If Player I plays i* and Player II plays j*, neither player has an
incentive to change.
Think of these as locally optimal strategies for the players.
We've seen that they are also globally optimal strategies.

Definition

A pair (x*, y*) ∈ Δ_m × Δ_n is a Nash equilibrium for a payoff matrix
A ∈ R^{m×n} if

    max_{x∈Δ_m} x^T A y* = (x*)^T A y* = min_{y∈Δ_n} (x*)^T A y.

Just like a pure Nash equilibrium (saddle point), but with mixed strategies.
If Player I plays x* and Player II plays y*, neither player has an
incentive to change.
x* is a best response to y*, and y* is a best response to x*.
Think of these as locally optimal strategies for the players.
We'll see that they are also globally optimal strategies.
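As a small numerical illustration (not part of the original slides, and assuming NumPy), we can check the mixed-strategy definition for Rock-Paper-Scissors. Since x ↦ x^T A y* and y ↦ (x*)^T A y are linear, the max and min over the simplex are attained at pure strategies, so it suffices to compare (x*)^T A y* against the best pure deviation for each player.

```python
import numpy as np

# Payoff matrix for Rock-Paper-Scissors (row player's gain).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

x_star = np.ones(3) / 3   # candidate mixed strategy for Player I
y_star = np.ones(3) / 3   # candidate mixed strategy for Player II

value = x_star @ A @ y_star

# The max of x^T A y* over the simplex equals the largest entry of A y*
# (attained at a pure row); similarly the min over y uses x*^T A.
best_row_payoff = np.max(A @ y_star)     # best deviation for Player I
best_col_payoff = np.min(x_star @ A)     # best deviation for Player II

# Nash equilibrium: neither player gains by deviating.
is_nash = np.isclose(best_row_payoff, value) and np.isclose(best_col_payoff, value)
print(is_nash)  # True: uniform play is a Nash equilibrium for RPS
```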
Nash equilibrium

Theorem

For x* ∈ Δ_m and y* ∈ Δ_n, the following are equivalent.
1. (x*, y*) is a Nash equilibrium.
2. x* and y* are optimal:

    min_{y∈Δ_n} (x*)^T A y = max_{x∈Δ_m} min_{y∈Δ_n} x^T A y,
    max_{x∈Δ_m} x^T A y* = min_{y∈Δ_n} max_{x∈Δ_m} x^T A y.

Proof

(1)⇒(2): The same as the proof for the optimality of a saddle point.
(2)⇒(1): The von Neumann minimax theorem implies that

    (x*)^T A y* ≥ min_{y∈Δ_n} (x*)^T A y = max_{x∈Δ_m} min_{y∈Δ_n} x^T A y
                = min_{y∈Δ_n} max_{x∈Δ_m} x^T A y = max_{x∈Δ_m} x^T A y* ≥ (x*)^T A y*,

so all of these inequalities are equalities, and (x*, y*) is a Nash equilibrium.
Linear Programs

A linear program is an optimization problem of the form

    max_{x∈R^n} b^T x
    s.t. d_i^T x ≤ c_i,   i = 1, …, k.

Here, b ∈ R^n specifies the linear objective x ↦ b^T x, and
d_i ∈ R^n, c_i ∈ R specify the ith constraint.
The set of x's that satisfy the constraints is a polytope
(an intersection of half-spaces).

The row player's problem,

    max_{x∈Δ_m} min_j x^T A e_j,

is not a linear program: the constraints are linear, but the objective
is not. But we can convert it to a linear program by introducing a
slack variable.
Linear Programs

For the row player:

    max_{x∈R^m, v∈R} v
    s.t. v ≤ x^T A e_1,
         ⋮
         v ≤ x^T A e_n,
         x_1 ≥ 0, …, x_m ≥ 0,
         1^T x = 1.

By maximizing the lower bound v on all of the x^T A e_j, we maximize
the minimum of the x^T A e_j.

For the column player, we can obtain a similar linear program:

    min_{y∈R^n, v∈R} v
    s.t. v ≥ e_1^T A y,
         ⋮
         v ≥ e_m^T A y,
         y_1 ≥ 0, …, y_n ≥ 0,
         1^T y = 1.
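As an aside (not from the slides), the row player's linear program can be handed directly to an off-the-shelf solver. This sketch assumes SciPy's `linprog`; the decision variables are stacked as z = (x_1, …, x_m, v), and since `linprog` minimizes, we minimize -v.

```python
import numpy as np
from scipy.optimize import linprog

def solve_row_player(A):
    """Solve the row player's LP: max v s.t. v <= x^T A e_j for all j,
    x >= 0, 1^T x = 1. Variables are z = (x_1, ..., x_m, v)."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # minimize -v, i.e. maximize v
    # Constraints v - x^T A e_j <= 0 for j = 1..n, written as [-A^T | 1] z <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Equality constraint 1^T x = 1 (coefficient 0 on v).
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Rock-Paper-Scissors: optimal strategy is uniform, value 0.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x, v = solve_row_player(A)
print(np.round(x, 3), round(v, 3))
```

The column player's LP is the mirror image (minimize v with v ≥ e_i^T A y), and by LP duality both programs have the same optimal value, which foreshadows the minimax theorem.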
von Neumann's Minimax Theorem

Theorem

For any two-person zero-sum game with payoff matrix A ∈ R^{m×n},

    max_{x∈Δ_m} min_{y∈Δ_n} x^T A y = min_{y∈Δ_n} max_{x∈Δ_m} x^T A y.

We call the optimal expected payoff the value of the game, V.
LHS: Player I plays x ∈ Δ_m first, then Player II responds with y ∈ Δ_n.
RHS: Player II plays y ∈ Δ_n first, then Player I responds with x ∈ Δ_m.
Notice that we should always prefer to play last:

    max_{x∈Δ_m} min_{y∈Δ_n} x^T A y ≤ min_{y∈Δ_n} max_{x∈Δ_m} x^T A y.

Proof of von Neumann's Minimax Theorem

There are many proofs; see the text for a proof based on the
separating hyperplane theorem from convex analysis.
We will consider playing the game repeatedly, with one player learning
a good strategy (playing small improvements on its previous actions)
and the other player giving a best response.
This gives an explicit sequence of strategies for the two players that
converges to their optimal strategies, and that proves the minimax
theorem.
The key property of the learner is that it has small regret.
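The equality in the theorem can be illustrated numerically (an example added here, assuming NumPy). For a 2×2 game we grid over mixed strategies (p, 1-p), using the fact that the inner min and max on each side are attained at pure strategies; the chosen payoff matrix is one with no saddle point.

```python
import numpy as np

# A 2x2 zero-sum game with no saddle point (value 0.2 at p = q = 0.4).
A = np.array([[ 2., -1.],
              [-1.,  1.]])

ps = np.linspace(0, 1, 10001)
X = np.stack([ps, 1 - ps], axis=1)      # grid of mixed strategies (p, 1-p)

# max_x min_y x^T A y: the inner min is attained at a pure column.
lhs = np.max(np.min(X @ A, axis=1))
# min_y max_x x^T A y: the inner max is attained at a pure row
# (reuse the same grid as column strategies (q, 1-q)).
rhs = np.min(np.max(A @ X.T, axis=0))

print(lhs, rhs)  # both approximately 0.2, the value of the game
```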
Proof of von Neumann's Minimax Theorem

Theorem

The existence of a row player with low regret implies the minimax theorem.

In the proof, the row player is a low regret learning algorithm, playing
x_t, the column player plays a best response y_t, and we define

    x̄ = (1/T) Σ_{t=1}^T x_t,    ȳ = (1/T) Σ_{t=1}^T y_t.

The proof shows that x̄ and ȳ are asymptotically optimal, in the sense
that the gain of x̄ and the loss of ȳ approach the value of the game.

Next, we'll consider a specific low regret learning algorithm:
gradient ascent.
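To make the proof strategy concrete, here is a sketch of the repeated play just described, with multiplicative weights standing in as the row player's low regret algorithm (the lecture turns to gradient ascent next; multiplicative weights is simply one convenient choice, not the lecture's algorithm). The column player best-responds each round, and the averaged strategies x̄, ȳ approach the players' optimal strategies.

```python
import numpy as np

def repeated_play(A, T=2000, eta=0.05):
    """Row player: multiplicative weights (a low regret algorithm).
    Column player: best response to the current x_t. Returns the
    averaged strategies x_bar and y_bar."""
    m, n = A.shape
    w = np.ones(m)                        # row player's weights
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x = w / w.sum()
        j = np.argmin(x @ A)              # column minimizing x^T A y
        y = np.zeros(n); y[j] = 1.0       # best response (pure strategy)
        x_sum += x; y_sum += y
        w *= np.exp(eta * A[:, j])        # update on gains A e_j
    return x_sum / T, y_sum / T

# Rock-Paper-Scissors: value 0, optimal strategies uniform.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x_bar, y_bar = repeated_play(A)
# Gain of x_bar and loss of y_bar both approach the value of the game.
print(np.min(x_bar @ A), np.max(A @ y_bar))
```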