
Stat155
Game Theory
Lecture 9: von Neumann's Minimax Theorem
Peter Bartlett

September 22, 2016

Outline for today

Nash equilibrium.
Games as linear programs.
von Neumann's minimax theorem.
Low regret learning algorithms for playing games.


Recall: Saddle Point

Definition
A pair $(i^*, j^*) \in \{1, \dots, m\} \times \{1, \dots, n\}$ is a saddle point (or pure Nash equilibrium) for a payoff matrix $A \in \mathbb{R}^{m \times n}$ if
\[
\max_i a_{i j^*} = a_{i^* j^*} = \min_j a_{i^* j}.
\]

If Player I plays $i^*$ and Player II plays $j^*$, neither player has an incentive to change.
Think of these as locally optimal strategies for the players.
We've seen that they are also globally optimal strategies.

Nash equilibrium

Definition
A pair $(x^*, y^*) \in \Delta_m \times \Delta_n$ is a Nash equilibrium for a payoff matrix $A \in \mathbb{R}^{m \times n}$ if
\[
\max_{x \in \Delta_m} x^\top A y^* = x^{*\top} A y^* = \min_{y \in \Delta_n} x^{*\top} A y.
\]

Just like a pure Nash equilibrium (saddle point), but with mixed strategies.
If Player I plays $x^*$ and Player II plays $y^*$, neither player has an incentive to change.
$x^*$ is a best response to $y^*$, and $y^*$ is a best response to $x^*$.
Think of these as locally optimal strategies for the players.
We'll see that they are also globally optimal strategies.
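The two definitions above can be checked mechanically for a given matrix. The following is a minimal sketch, not part of the lecture: the function names `saddle_points`, `payoff`, and `is_nash` and the example matrices are illustrative assumptions. It relies on the fact that a linear function on a simplex attains its maximum and minimum at a pure strategy, so only pure deviations need to be checked.

```python
from fractions import Fraction


def saddle_points(A):
    """Return all 0-based pairs (i, j) with max_i a[i][j] == a[i][j] == min_j a[i][j]."""
    m, n = len(A), len(A[0])
    points = []
    for i in range(m):
        row_min = min(A[i])
        for j in range(n):
            col_max = max(A[k][j] for k in range(m))
            if A[i][j] == col_max and A[i][j] == row_min:
                points.append((i, j))
    return points


def payoff(x, A, y):
    """Expected payoff x^T A y to the row player."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def is_nash(x, y, A):
    """Check max_x x^T A y* == x*^T A y* == min_y x*^T A y.

    Only pure deviations are tested, which suffices because the payoff
    is linear in each player's strategy. Exact arithmetic (ints or
    Fractions) is assumed; floats would need a tolerance.
    """
    m, n = len(A), len(A[0])
    value = payoff(x, A, y)
    best_row = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    best_col = min(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
    return best_row == value == best_col


# Illustrative matrices (not from the lecture).
A_saddle = [[3, 1, 4], [2, 0, 1]]
A_pennies = [[1, -1], [-1, 1]]
half = Fraction(1, 2)
print(saddle_points(A_saddle))
print(saddle_points(A_pennies))
print(is_nash([half, half], [half, half], A_pennies))
```

Matching pennies has no saddle point, yet the uniform mixed strategies satisfy the Nash equilibrium condition, which is the distinction the two definitions draw.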

Nash equilibrium

Theorem
For $x^* \in \Delta_m$ and $y^* \in \Delta_n$, the following are equivalent.
1. $(x^*, y^*)$ is a Nash equilibrium.
2. $x^*$ and $y^*$ are optimal:
\[
\min_y x^{*\top} A y = \max_x \min_y x^\top A y,
\qquad
\max_x x^\top A y^* = \min_y \max_x x^\top A y.
\]

Proof
(1) $\Rightarrow$ (2): The same as the proof for the optimality of a saddle point.
(2) $\Rightarrow$ (1): The von Neumann minimax theorem implies that
\[
x^{*\top} A y^* \ge \min_y x^{*\top} A y = \max_x \min_y x^\top A y = \min_y \max_x x^\top A y = \max_x x^\top A y^* \ge x^{*\top} A y^*.
\]
The chain begins and ends with the same quantity, so every inequality holds with equality, and so $(x^*, y^*)$ is a Nash equilibrium.

Linear Programs

A linear program is an optimization problem involving the choice of a real vector to maximize a linear objective subject to linear constraints:
\[
\begin{aligned}
\max_{x \in \mathbb{R}^n} \quad & b^\top x \\
\text{s.t.} \quad & d_1^\top x \le c_1 \\
& d_2^\top x \le c_2 \\
& \;\;\vdots \\
& d_k^\top x \le c_k
\end{aligned}
\]

Here, $b \in \mathbb{R}^n$ specifies the linear objective $x \mapsto b^\top x$, and $d_i \in \mathbb{R}^n$, $c_i \in \mathbb{R}$ specify the $i$th constraint.
The set of $x$'s that satisfy the constraints is a polytope (an intersection of half-spaces).

From the perspective of the row player, a two-player zero-sum game is an optimization problem of the form
\[
\begin{aligned}
\max_{x \in \mathbb{R}^m} \quad & \min_{i \in \{1, \dots, n\}} x^\top A e_i \\
\text{s.t.} \quad & x_1 \ge 0 \\
& \;\;\vdots \\
& x_m \ge 0 \\
& \mathbf{1}^\top x = 1
\end{aligned}
\]

This is not a linear program: the constraints are linear, but the objective is not. But we can convert it to a linear program by introducing a slack variable.
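To see concretely why the row player's objective $\min_i x^\top A e_i$ is not linear, one can evaluate it at two strategies and at their midpoint. This is a small sketch under assumptions: the function name `row_objective` and the matching-pennies matrix are illustrative and do not come from the slides.

```python
from fractions import Fraction


def row_objective(x, A):
    """f(x) = min over columns i of x^T A e_i.

    This is the payoff the row player is guaranteed when the column
    player answers x with the worst pure column.
    """
    m, n = len(A), len(A[0])
    return min(sum(x[k] * A[k][i] for k in range(m)) for i in range(n))


A = [[1, -1], [-1, 1]]  # matching pennies, used only as an illustration
half = Fraction(1, 2)

f1 = row_objective([1, 0], A)          # always play row 1
f2 = row_objective([0, 1], A)          # always play row 2
f_mid = row_objective([half, half], A)  # midpoint of the two

# A linear objective would satisfy f_mid == (f1 + f2) / 2.
print(f1, f2, f_mid, Fraction(f1 + f2, 2))
```

The midpoint does strictly better than the average of the endpoints, so the objective is concave and piecewise linear rather than linear.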
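With the slack variable $v$, the row player's problem becomes a linear program:
\[
\begin{aligned}
\max_{x \in \mathbb{R}^m,\ v \in \mathbb{R}} \quad & v \\
\text{s.t.} \quad & v \le x^\top A e_1 \\
& \;\;\vdots \\
& v \le x^\top A e_n \\
& x_1 \ge 0 \\
& \;\;\vdots \\
& x_m \ge 0 \\
& \mathbf{1}^\top x = 1
\end{aligned}
\]

By maximizing the lower bound $v$ on all of the $x^\top A e_i$ we maximize the minimum of the $x^\top A e_i$.

For the column player, we can obtain a similar linear program (with $y \in \mathbb{R}^n$ and one constraint for each of the $m$ rows of $A$):
\[
\begin{aligned}
\min_{y \in \mathbb{R}^n,\ v \in \mathbb{R}} \quad & v \\
\text{s.t.} \quad & v \ge e_1^\top A y \\
& \;\;\vdots \\
& v \ge e_m^\top A y \\
& y_1 \ge 0 \\
& \;\;\vdots \\
& y_n \ge 0 \\
& \mathbf{1}^\top y = 1
\end{aligned}
\]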
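For a game with only two rows, the row player's linear program has a single free variable $p$ (with $x = (p, 1-p)$), so it can be solved exactly by checking the vertices of the feasible region. The sketch below is an illustration under assumptions, not the method used in the lecture: the function name `solve_row_lp_2xn` and the example matrix are hypothetical, and a general game would be handed to an LP solver instead.

```python
from fractions import Fraction
from itertools import combinations


def solve_row_lp_2xn(A):
    """Maximize v subject to v <= x^T A e_j for all j, with x = (p, 1 - p).

    Each column j gives a line g_j(p) = A[1][j] + p * (A[0][j] - A[1][j]).
    The optimum of this small LP sits at p = 0, p = 1, or where two
    of these lines cross, so it is enough to test those candidates.
    """
    assert len(A) == 2, "this sketch only handles two-row games"
    n = len(A[0])
    slope = [Fraction(A[0][j] - A[1][j]) for j in range(n)]
    intercept = [Fraction(A[1][j]) for j in range(n)]

    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(n), 2):
        if slope[j] != slope[k]:
            p = (intercept[k] - intercept[j]) / (slope[j] - slope[k])
            if 0 <= p <= 1:
                candidates.add(p)

    def guaranteed(p):
        # The largest feasible v for this p is the minimum over columns.
        return min(intercept[j] + p * slope[j] for j in range(n))

    best_p = max(candidates, key=guaranteed)
    return [best_p, 1 - best_p], guaranteed(best_p)


# Illustrative 2 x 2 game (not from the lecture).
A = [[2, -1], [-1, 1]]
x_opt, v_row = solve_row_lp_2xn(A)

# The column player minimizes, which is the same as being the row
# player in the game with payoff matrix -A^T.
neg_A_T = [[-A[i][j] for i in range(2)] for j in range(2)]
y_opt, neg_v_col = solve_row_lp_2xn(neg_A_T)
v_col = -neg_v_col
print(x_opt, v_row, y_opt, v_col)
```

In this example the two programs return the same value, which is what the minimax theorem later in the lecture asserts in general.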


Linear Programs: An Aside

There are efficient algorithms for solving linear programs. (Efficient means that the run-time is polynomial in the size of the problem.)
The column player's linear program is the dual of the row player's linear program:
For any concave maximization problem, like the row player's linear program (we'll call it the primal problem), it is possible to define a dual convex minimization problem, like the column player's linear program.
This dual problem has a value that is at least as large as the value of the primal problem.
In many important cases (such as our linear program), these values are the same. In optimization, this is called strong duality.
This is von Neumann's minimax theorem.
The principle of indifference is a general property of dual optimization problems (called complementary slackness).
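The claim that the dual value is at least as large as the primal value (weak duality) can be spot-checked numerically: for any feasible $x$ and $y$, $\min_j x^\top A e_j \le x^\top A y \le \max_i e_i^\top A y$. The following sketch samples random mixed strategies for a random matrix. The helper names, the matrix size, and the seed are arbitrary assumptions made for illustration.

```python
import random


def random_simplex_point(k, rng):
    """Sample a probability vector of length k (not uniformly; any point will do)."""
    w = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(w)
    return [wi / total for wi in w]


def primal_value(x, A):
    """Row player's guarantee for x: min over columns j of x^T A e_j."""
    m, n = len(A), len(A[0])
    return min(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))


def dual_value(y, A):
    """Column player's guarantee for y: max over rows i of e_i^T A y."""
    m, n = len(A), len(A[0])
    return max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))


rng = random.Random(0)  # fixed seed so the run is repeatable
A = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

gaps = []
for _ in range(1000):
    x = random_simplex_point(3, rng)
    y = random_simplex_point(4, rng)
    gaps.append(dual_value(y, A) - primal_value(x, A))

# Weak duality: every gap is nonnegative (up to rounding error).
print(min(gaps))
```

Random sampling only illustrates weak duality. Strong duality says the smallest achievable gap is exactly zero, which requires solving both programs.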
von Neumann's Minimax Theorem

Theorem
For any two-person zero-sum game with payoff matrix $A \in \mathbb{R}^{m \times n}$,
\[
\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y.
\]

We call the optimal expected payoff the value of the game, $V$.
LHS: Player I plays $x \in \Delta_m$ first, then Player II responds with $y \in \Delta_n$.
RHS: Player II plays $y \in \Delta_n$ first, then Player I responds with $x \in \Delta_m$.
Notice that we should always prefer to play last:
\[
\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y \le \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y.
\]
The astonishing part is that it does not help.

Proof of von Neumann's Minimax Theorem

There are many proofs; see the text for a proof based on the separating hyperplane theorem from convex analysis.
We will consider playing the game repeatedly, with one player learning a good strategy (playing small improvements on its previous actions) and the other player giving a best response.
This gives an explicit sequence of strategies for the two players that converges to their optimal strategies, and that proves the minimax theorem.
The key property of the learner is that it has small regret.
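A standard worked example (matching pennies, chosen here for illustration and not taken from the slides) shows why mixed strategies are needed for equality. With pure strategies, playing last helps:

```latex
A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},
\qquad
\max_i \min_j a_{ij} = -1 \;<\; 1 = \min_j \max_i a_{ij}.
```

With mixed strategies $x = (p, 1-p)$ and $y = (q, 1-q)$, the two sides coincide:

```latex
\begin{aligned}
x^\top A e_1 &= 2p - 1, \qquad x^\top A e_2 = 1 - 2p, \\
\max_{x \in \Delta_2} \min_{y \in \Delta_2} x^\top A y
  &= \max_{p \in [0,1]} \min(2p - 1,\; 1 - 2p)
   = \max_{p \in [0,1]} \bigl(-|2p - 1|\bigr) = 0
   \quad \text{at } p = \tfrac12, \\
\min_{y \in \Delta_2} \max_{x \in \Delta_2} x^\top A y
  &= \min_{q \in [0,1]} \max(2q - 1,\; 1 - 2q)
   = \min_{q \in [0,1]} |2q - 1| = 0
   \quad \text{at } q = \tfrac12.
\end{aligned}
```

So the value of this game is $V = 0$, attained when both players mix uniformly.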



Definition (Regret in a repeated game)
Consider a two-player zero-sum game that is repeated for $T$ rounds. At each round, the row player chooses an $x_t \in \Delta_m$, then the column player chooses a $y_t \in \Delta_n$, and the row player receives a payoff of $x_t^\top A y_t$.
The row player's regret after $T$ rounds is how much its total payoff falls short of the best in retrospect that it could have achieved against the column player's choices with a fixed mixed strategy:
\[
R_T = \underbrace{\max_{x \in \Delta_m} \sum_{t=1}^T x^\top A y_t}_{\text{best total payoff}} - \underbrace{\sum_{t=1}^T x_t^\top A y_t}_{\text{total payoff}}.
\]
If the row player's regret $R_T$ grows sublinearly (that is, $R_T / T \to 0$), we say that it has low regret.

We'll see that there are learning algorithms that have low regret against any sequence played by the column player.
These learning algorithms don't need to know anything about the game in advance; they just need to see, after each round, the column vector of payoffs corresponding to the column player's choice.
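The regret definition translates directly into a short computation. In this sketch the function name `regret` and the example play sequences are illustrative assumptions. The maximum over $x \in \Delta_m$ is taken over pure rows only, which is enough because the total payoff is linear in $x$.

```python
def regret(A, xs, ys):
    """Row player's regret R_T for strategies xs[t] played against ys[t].

    R_T = max_x sum_t x^T A y_t - sum_t x_t^T A y_t, where the max over
    the simplex is attained at a pure row.
    """
    m, n = len(A), len(A[0])
    cumulative = [0] * m   # cumulative[i] = sum_t (A y_t)_i
    earned = 0             # sum_t x_t^T A y_t
    for x, y in zip(xs, ys):
        Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        for i in range(m):
            cumulative[i] += Ay[i]
        earned += sum(x[i] * Ay[i] for i in range(m))
    return max(cumulative) - earned


A = [[1, -1], [-1, 1]]  # matching pennies, used only as an illustration

# Row player stubbornly plays row 1 while the column player plays column 2.
stubborn = regret(A, [[1, 0]] * 4, [[0, 1]] * 4)

# Row player mixes uniformly while the column player alternates columns.
uniform = regret(A, [[0.5, 0.5]] * 4, [[1, 0], [0, 1], [1, 0], [0, 1]])
print(stubborn, uniform)
```

In the first sequence the row player loses 1 every round while row 2 would have won 1 every round, so the regret grows linearly in $T$. In the second sequence no fixed strategy would have done better, so the regret is zero.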
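Theorem
The existence of a row player with low regret implies the minimax theorem.

Proof
Suppose the row player has low regret.
Define $\bar{x} = \frac{1}{T} \sum_{t=1}^T x_t$.
Suppose that the column player plays a best response $y_t$ against the row player's choice $x_t$:
\[
x_t^\top A y_t = \min_{y \in \Delta_n} x_t^\top A y.
\]
Define $\bar{y} = \frac{1}{T} \sum_{t=1}^T y_t$.

Proof ($\max_x \min_y x^\top A y \ge \min_y \max_x x^\top A y$)
\[
\begin{aligned}
\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y
&\ge \min_{y \in \Delta_n} \bar{x}^\top A y \\
&= \min_{y \in \Delta_n} \frac{1}{T} \sum_{t=1}^T x_t^\top A y \\
&\ge \frac{1}{T} \sum_{t=1}^T \min_{y \in \Delta_n} x_t^\top A y \\
&= \frac{1}{T} \sum_{t=1}^T x_t^\top A y_t \\
&= \max_{x \in \Delta_m} \frac{1}{T} \sum_{t=1}^T x^\top A y_t - \frac{R_T}{T} \\
&= \max_{x \in \Delta_m} x^\top A \bar{y} - \frac{R_T}{T} \\
&\ge \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y - \frac{R_T}{T}.
\end{aligned}
\]
As $T \to \infty$, $R_T / T \to 0$.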
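The argument in the proof can be simulated. The sketch below is an illustration under stated assumptions, not the lecture's algorithm: the row player uses exponential weights (Hedge), a well-known low regret learner swapped in here for simplicity, whereas the lecture goes on to study gradient ascent. The function names, the step size `eta` (chosen assuming payoffs lie in $[-1, 1]$), and the number of rounds are all hypothetical choices.

```python
import math


def best_response_column(x, A):
    """Pure column minimizing x^T A e_j, returned as a one-hot vector."""
    m, n = len(A), len(A[0])
    col_payoffs = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]
    j_star = min(range(n), key=lambda j: col_payoffs[j])
    return [1.0 if j == j_star else 0.0 for j in range(n)]


def play(A, T):
    """Hedge (row player) against best responses (column player) for T rounds.

    Returns (min_y xbar^T A y, max_x x^T A ybar, R_T / T).
    """
    m, n = len(A), len(A[0])
    eta = math.sqrt(2 * math.log(m) / T)  # assumes payoffs in [-1, 1]
    cumulative = [0.0] * m                # sum_t (A y_t)_i for each row i
    x_bar, y_bar = [0.0] * m, [0.0] * n
    earned = 0.0
    for _ in range(T):
        top = max(cumulative)             # subtract the max for stability
        w = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(w)
        x = [wi / total for wi in w]
        y = best_response_column(x, A)
        Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        earned += sum(x[i] * Ay[i] for i in range(m))
        for i in range(m):
            cumulative[i] += Ay[i]
            x_bar[i] += x[i] / T
        for j in range(n):
            y_bar[j] += y[j] / T
    lower = min(sum(x_bar[i] * A[i][j] for i in range(m)) for j in range(n))
    upper = max(sum(A[i][j] * y_bar[j] for j in range(n)) for i in range(m))
    return lower, upper, (max(cumulative) - earned) / T


A = [[1, -1], [-1, 1]]  # matching pennies, used only as an illustration
lower, upper, avg_regret = play(A, 2000)
print(lower, upper, avg_regret)
```

The chain of inequalities in the proof says that the gap `upper - lower` is at most `avg_regret`, so as the average regret shrinks, the guarantee of $\bar{x}$ and the guarantee of $\bar{y}$ are squeezed together around the value of the game.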


Theorem
The existence of a row player with low regret implies the minimax theorem.

In the proof, the row player is a low regret learning algorithm, playing $x_t$, the column player plays a best response $y_t$, and we define $\bar{x} = \frac{1}{T} \sum_{t=1}^T x_t$, $\bar{y} = \frac{1}{T} \sum_{t=1}^T y_t$.
The proof shows that $\bar{x}$ and $\bar{y}$ are asymptotically optimal, in the sense that the gain of $\bar{x}$ and the loss of $\bar{y}$ approach the value of the game.
Next, we'll consider a specific low regret learning algorithm: gradient ascent.
