Overview
Game Playing
Game problem vs State space problem
Status labeling procedure in Game tree
Nim game problem
Zero-sum games
one player's win is the other's loss
Discrete:
Finite number of states or configurations
Perfect information:
Fully observable
No information is hidden from either player.
Deterministic:
The outcome of actions is known
MAX nodes:
Select the successor with the maximum value
terminal test
also called goal test
determines when the game is over
calculate the result
usually win, lose, draw; sometimes a score (see below)
Two-Person Games
games with two opposing players
often called MIN and MAX
usually MAX moves first, then they take turns
in game terminology, a move comprises two steps (plies)
one by MAX and one by MIN
full information
both players know the full state of the environment
partial information
one player only knows part of the environment
some aspects may be hidden from the opponent, or from both players
Evaluation function
Evaluation function or static evaluator is used to
evaluate the goodness of a game position
The zero-sum assumption lets us use a single evaluation
function to describe the goodness of a board with respect to both players
f(n) >> 0: position n good for me and bad for you
f(n) << 0: position n bad for me and good for you
f(n) near 0: position n is a neutral position
f(n) = +infinity: win for me
f(n) = -infinity: win for you
MINIMAX Procedure
The procedure through which scoring
information travels up the game tree is
called the MINIMAX procedure.
Minimax procedure
Create start node as a MAX node with current board
configuration
Expand nodes down to some depth (a.k.a. ply) of lookahead
in the game
Apply the evaluation function at each of the leaf nodes
Back up values for each of the non-leaf nodes until a value
is computed for the root node
At MIN nodes, the backed-up value is the minimum of the
values associated with its children.
At MAX nodes, the backed-up value is the maximum of the
values associated with its children.
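The backing-up procedure above can be sketched in Python. This is a minimal illustration, assuming a toy game tree encoded as nested lists, where an int is a leaf carrying its static-evaluator value; the tree shown is a made-up 2-ply example, not one from the slides.

```python
# A minimal sketch of the minimax procedure: MAX nodes back up the
# maximum of their children's values, MIN nodes the minimum, and
# leaves return their static evaluation.

def minimax(node, maximizing=True):
    if isinstance(node, int):      # leaf: apply the evaluation function
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The root is a MAX node; its children are MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical 2-ply lookahead

# Pick the move leading to the highest backed-up value.
best = max(range(len(tree)), key=lambda i: minimax(tree[i], maximizing=False))
```

Note that the evaluation function is applied only at the lookahead frontier; every interior value is derived purely by the MIN/MAX backup rule.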
Minimax Algorithm
From the root node, select the move that leads to the highest backed-up value
[Figure: example game tree - static-evaluator values at the leaves are backed up through alternating MIN and MAX levels to the root]
Tic-Tac-Toe
[Figure: a tic-tac-toe position with an X placed]
e(p) = (rows, columns, and diagonals open for MAX) - (rows, columns, and diagonals open for MIN) = 6 - 5 = 1
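The classic "open lines" evaluator for tic-tac-toe can be sketched as follows. The sample position (X in the centre, O in a corner) is my own illustration, not necessarily the position in the figure.

```python
# Static evaluator for tic-tac-toe:
# e(p) = (lines still open for MAX) - (lines still open for MIN),
# where a line is a row, column, or diagonal.

LINES = [
    [(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)],  # rows
    [(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)],  # columns
    [(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)],                            # diagonals
]

def evaluate(board):
    """Lines with no O are open for MAX (X); lines with no X are open for MIN (O)."""
    open_for_max = sum(1 for line in LINES
                       if all(board[r][c] != 'O' for r, c in line))
    open_for_min = sum(1 for line in LINES
                       if all(board[r][c] != 'X' for r, c in line))
    return open_for_max - open_for_min

board = [['O', ' ', ' '],
         [' ', 'X', ' '],
         [' ', ' ', ' ']]
# Here 5 lines are open for X and 4 for O, so e(p) = 1.
```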
Alpha-Beta Strategy
Maintain two bounds:
Alpha (α): a lower bound on the best value the
player to move can achieve
Beta (β): an upper bound on what the
opponent can achieve
Search, maintaining α and β
Whenever α >= β at a node, further search below that node is
irrelevant
If beta value of any MIN node below a MAX node is less than or
equal to its alpha value, then prune the path below the MIN
node.
If alpha value of any MAX node below a MIN node exceeds the
beta value of the MIN node, then prune the nodes below the
MAX node.
Alpha-beta pruning
We can improve on the performance of the
minimax algorithm through alpha-beta pruning
Basic idea: "If you have an idea that is surely bad,
don't take the time to see how truly awful it is." - Pat Winston
[Figure: pruning example - the first MIN child backs up 2, so the MAX root is >= 2; any further MIN child can be cut off as soon as its value falls to 2 or below]
Alpha-beta pruning
Traverse the search tree in depth-first order
At each MAX node n, alpha(n) = maximum value found
so far
At each MIN node n, beta(n) = minimum value found so
far
The alpha values start at -infinity and only increase, while beta
values start at +infinity and only decrease
[Figure: depth-first alpha-beta traversal of an example tree, with two subtrees pruned]
Alpha-beta algorithm
function MAX-VALUE (state, α, β)
  ;; α = best score for MAX so far; β = best score for MIN so far
  if TERMINAL-TEST (state) then return UTILITY (state)
  v := -infinity
  for each s in SUCCESSORS (state) do
    v := MAX (v, MIN-VALUE (s, α, β))
    if v >= β then return v
    α := MAX (α, v)
  end
  return v

function MIN-VALUE (state, α, β)
  if TERMINAL-TEST (state) then return UTILITY (state)
  v := +infinity
  for each s in SUCCESSORS (state) do
    v := MIN (v, MAX-VALUE (s, α, β))
    if v <= α then return v
    β := MIN (β, v)
  end
  return v
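The MAX-VALUE / MIN-VALUE pseudocode above can be translated directly into Python. This is a sketch assuming the same toy tree encoding as before: ints are terminal utilities, lists are internal nodes.

```python
import math

# Direct translation of MAX-VALUE / MIN-VALUE: an int plays the role of
# TERMINAL-TEST + UTILITY, iterating a list plays the role of SUCCESSORS.

def max_value(state, alpha, beta):
    if isinstance(state, int):
        return state
    v = -math.inf
    for s in state:
        v = max(v, min_value(s, alpha, beta))
        if v >= beta:              # beta cutoff: MIN would never allow this
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if isinstance(state, int):
        return state
    v = math.inf
    for s in state:
        v = min(v, max_value(s, alpha, beta))
        if v <= alpha:             # alpha cutoff: MAX would never allow this
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical example tree
root = max_value(tree, -math.inf, math.inf)
# The second subtree is cut off after its first leaf (2 <= alpha = 3),
# so the leaves 4 and 6 are never examined; the root value is 3.
```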
Effectiveness of alpha-beta
Alpha-beta is guaranteed to compute the same value for the root
node as minimax, with less than or equal computation
Worst case: no pruning; examine b^d leaf nodes, where each
node has b children and a d-ply search is performed
Best case: examine only (2b)^(d/2) leaf nodes
Result: you can search twice as deep as minimax for the same effort!
The best case occurs when each player's best move is the first alternative
generated
In Deep Blue, they found empirically that alpha-beta pruning
meant that the average branching factor at each node was about 6
instead of about 35!
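The effectiveness claim above can be checked empirically with a small experiment. This is a hypothetical setup of my own (branching factor 4, depth 4, random integer leaves, fixed seed), counting how many leaves each algorithm evaluates.

```python
import math
import random

def make_tree(b, d, rng):
    """Uniform random game tree: ints at depth 0 are leaf utilities."""
    if d == 0:
        return rng.randint(-10, 10)
    return [make_tree(b, d - 1, rng) for _ in range(b)]

def minimax(node, maximizing, counter):
    if isinstance(node, int):
        counter[0] += 1            # count every leaf evaluation
        return node
    values = [minimax(c, not maximizing, counter) for c in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, alpha, beta, maximizing, counter):
    if isinstance(node, int):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for c in node:
        w = alphabeta(c, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, w)
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            beta = min(beta, v)
        if alpha >= beta:          # cutoff: remaining children are irrelevant
            break
    return v

rng = random.Random(0)
tree = make_tree(4, 4, rng)
mm, ab = [0], [0]
v1 = minimax(tree, True, mm)
v2 = alphabeta(tree, -math.inf, math.inf, True, ab)
# Same root value; alpha-beta typically examines far fewer of the
# 4**4 = 256 leaves than plain minimax does.
```

With a better move ordering (best move generated first), the leaf count drops further, which is exactly why practical programs invest in move-ordering heuristics.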