
Lecture 8 Adversarial Search

Dr. Muhammad Adnan Hashmi


1 January 2013

Alpha-beta is still not efficient enough: suppose we have 100 seconds and can explore 10^4 nodes/sec, giving only 10^6 nodes per move.

We need to figure out ways to cut this down further:
- Cutoff test, e.g., a depth limit
- Evaluation function = estimated desirability of a position.

It's a heuristic function that is supposed to cut off the search at non-terminal states:
- It should order terminal states the same way as the true utility function.
- Its computation should not take too long.
- For non-terminal states, its value should be strongly correlated with the actual chances of winning.
- The heuristic value should reflect how useful the state is for reaching the goal from there (not just a simple minimax value).



For chess, Eval is typically a linear weighted sum of features: Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s), e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc. Try to imagine a heuristic function for:
- Checkers
- Tic-tac-toe
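As a concrete illustration, here is a minimal Python sketch of such a weighted material evaluation. The dict-of-piece-counts board representation and the conventional 9/5/3/3/1 weights are illustrative assumptions for this example, not any particular engine's API:

```python
# A minimal sketch of Eval(s) = w1*f1(s) + ... + wn*fn(s) for chess material.
# The board representation (two dicts of piece counts) is a simplifying
# assumption made for this example.

WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def eval_material(white, black):
    """Each feature f_i(s) = (# white pieces of type i) - (# black pieces of type i)."""
    return sum(w * (white.get(piece, 0) - black.get(piece, 0))
               for piece, w in WEIGHTS.items())

# White is up a queen, black is up a rook: 9*(+1) + 5*(-1) = +4 for white.
print(eval_material({"queen": 2, "rook": 1}, {"queen": 1, "rook": 2}))  # 4
```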

Minimax-Cutoff is identical to Minimax-Value, except that:
- Terminal-Test is replaced by a Cutoff test
- Utility is replaced by Eval

In essence, the Cutoff test cuts off the search at the specified point, and the action is chosen based on Eval rather than by searching all the way to the leaves.
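A minimal Python sketch of this idea, assuming the caller supplies hypothetical `successors` and `eval_fn` callables and using a simple depth limit as the cutoff test:

```python
# Depth-limited minimax: the cutoff test replaces Terminal-Test,
# and eval_fn replaces Utility.

def minimax_cutoff(state, depth, maximizing, successors, eval_fn, limit):
    kids = successors(state)
    if depth >= limit or not kids:        # Cutoff test replaces Terminal-Test
        return eval_fn(state)             # Eval replaces Utility
    values = (minimax_cutoff(s, depth + 1, not maximizing,
                             successors, eval_fn, limit)
              for s in kids)
    return max(values) if maximizing else min(values)
```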

Does it work in practice? With b = 35 and a budget of b^m ≈ 10^6 nodes, we get m ≈ 4, and 4-ply look-ahead makes a hopeless chess player!
- 4-ply ≈ human novice
- 8-ply ≈ typical PC, human master
- 12-ply ≈ Deep Blue, Kasparov (strong heuristics needed).
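A quick sanity check of those figures against the 10^6-node budget from the earlier 100-second example:

```python
# With branching factor b = 35, a full-width search to m = 4 ply
# visits roughly the 10^6-node budget computed earlier.
print(35 ** 4)  # 1500625 -- about 1.5e6 nodes, on the order of 10^6
```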

Deep Blue's evaluation function was initially written in a generalized form, with many to-be-determined parameters (e.g., how important is a safe king position compared to a space advantage in the center?). The optimal values for these parameters were then determined by the system itself, by analyzing thousands of master games. The evaluation function was split into 8,000 parts, many of them designed for special positions. The opening book contained over 4,000 positions and 700,000 grandmaster games. The endgame database contained many six-piece endgames and positions with five or fewer pieces. [http://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)]

- Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a pre-computed database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
- Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation function, and employs undisclosed methods for extending some lines of search up to 40 ply.
- Othello: human champions refuse to compete against computers, which are too good.
- Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.


Chance is introduced by dice, card-shuffling, or coin-flipping.



The ExpectiMinimax value gives perfect play. It works just like Minimax, except that we must also handle chance nodes:
- if state is a Max node, return the highest ExpectiMinimax-Value of Successors(state)
- if state is a Min node, return the lowest ExpectiMinimax-Value of Successors(state)
- if state is a chance node, return the probability-weighted average of the ExpectiMinimax-Values of Successors(state).
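A minimal Python sketch of this rule on a toy tree; the tuple-based node representation is an assumption made for the example:

```python
# Nodes are ("max"|"min"|"chance", children); chance children are
# (probability, child) pairs, and a bare number is a leaf utility.

def expectiminimax(node):
    if isinstance(node, (int, float)):      # leaf: return its utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of successor values
    return sum(p * expectiminimax(c) for p, c in children)

# Fair coin before a Min choice: 0.5*min(3, 5) + 0.5*min(2, 8) = 2.5
tree = ("chance", [(0.5, ("min", [3, 5])), (0.5, ("min", [2, 8]))])
print(expectiminimax(tree))  # 2.5
```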


Dice rolls increase b: there are 21 possible rolls with 2 dice, and Backgammon has about 20 legal moves per roll, so depth 4 gives 20 × (21 × 20)³ ≈ 1.5 × 10⁹ nodes. As depth increases, the probability of reaching a given node shrinks, and alpha-beta pruning becomes much less effective. TD-Gammon instead uses depth-2 search plus a very good Eval, and plays at world-champion level.
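A worked check of that branching arithmetic:

```python
# ~20 legal moves per position and 21 distinct rolls of two dice:
# each ply of one player's move multiplies by 21 rolls * 20 moves.
nodes = 20 * (21 * 20) ** 3
print(f"{nodes:,}")  # 1,481,760,000 -> about 1.5e9 nodes at depth 4
```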




E.g., card games, where the opponent's initial cards are unknown. Typically we can calculate a probability for each possible deal: it is just like one big dice roll at the beginning of the game. Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals. Special case: if an action is optimal for all deals, it's optimal. GIB, the current best bridge program, approximates this idea (as sketched below) by:
- generating 100 deals consistent with the bidding information, and
- picking the action that wins the most tricks on average.
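A Python sketch of this deal-sampling idea, under stated assumptions: `sample_deal()` returns one deal consistent with the bidding and `minimax_value(deal, action)` scores an action in that fully observable deal; both are hypothetical helpers, not GIB's actual interface:

```python
# Score each candidate action by minimax in each sampled deal, then
# pick the action with the best total (equivalently, best average).

def choose_action(actions, sample_deal, minimax_value, n_deals=100):
    totals = {a: 0.0 for a in actions}
    for _ in range(n_deals):
        deal = sample_deal()              # one "big dice roll" made concrete
        for a in actions:
            totals[a] += minimax_value(deal, a)
    # argmax over totals equals argmax over averages (same n_deals divisor)
    return max(actions, key=lambda a: totals[a])
```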


Road A leads to a small heap of gold pieces Road B leads to a fork: Take the left fork and you'll find a mound of jewels; Take the right fork and you'll be run over by a bus.

Road A leads to a small heap of gold pieces Road B leads to a fork: Take the left fork and you'll be run over by a bus; Take the right fork and you'll find a mound of jewels.

Road A leads to a small heap of gold pieces Road B leads to a fork: Guess correctly and you'll find a mound of jewels; Guess incorrectly and you'll be run over by a bus.

The lesson: when we cannot tell which fork is which, Road B's value must be averaged over its outcomes, so the guaranteed gold on Road A is the rational choice; assuming we will somehow know the right fork at decision time ("averaging over clairvoyance") would wrongly favour Road B.


