
Monte Carlo Tree Search in Monopoly

Matthew Long, Kapil Kanagal


Stanford University
December 11, 2015

Abstract
In this paper, we implement a version of
depth-limited Monte Carlo tree search to play
the board game Monopoly. We bootstrap an
existing set of human-implemented rule-based AI
algorithms by simulating a series of turns into the future using each AI's policy; an evaluation function is then used to determine which AI's policy puts the agent in the best position. Ultimately, this
resulted in an AI that performs 40% better than
any of the AIs whose policies it uses.

1 Introduction

Games are a hallmark of modern artificial intelligence research. Many games today include 3 or more players, have large state spaces, and incorporate randomness and hidden information, making it challenging to build AI players for them. Monte Carlo tree search (MCTS) is a relatively new algorithm that can produce strategies for complex games by playing large numbers of random games, and it has had success in the notoriously difficult game Go (1). In pure Monte Carlo game search, players first build a representation of the search tree by simulating random complete games, and (state, action) tuples are scored on the basis of leading to an eventual victory. A policy can then be constructed by selecting the highest-scoring action for a given state. Since Monopoly has the random element of dice rolls, which drastically influence property acquisition, trading, and building, as well as a high branching factor (11 possible dice totals on each roll), running a full Monte Carlo tree search at each decision point would be infeasible with our resources, and final outcomes would only be loosely tied to individual actions. Instead, we propose a modified depth-limited MCTS, where a domain-specific evaluation function is used to score the end result of a series of actions.

Compared to other games, the rules of Monopoly are relatively complex (e.g. property sets, houses and hotels, auctions), and AIs are typically constructed by attempting to imitate a human-developed strategy in a computer program. These AIs are based on proven Monopoly strategies and vary their playing styles (aggressive, money-centered, property-centered, etc.), but do not attempt any game tree search. They employ heuristics that imitate what their creators believe is a successful Monopoly policy and are quite sophisticated, particularly in their auction bids and trade offers. The goal of this project is to bootstrap a number of these rule-based AIs into a better-performing AI by using depth-limited MCTS to evaluate the policies of each AI and choosing the one that leaves the agent in the strongest state.

2 Approach

2.1 Monopoly

Figure 1: A Monopoly board

Monopoly is a board game for two or more players who take turns rolling two dice and moving around a board of properties, which they can buy, auction, trade, and develop. A player who owns every property in a color set may build houses and hotels on it, greatly increasing the rent charged to opponents who land there. Players who cannot pay what they owe go bankrupt, and the last solvent player wins.

2.2 Literature Review


Hybrid Monte Carlo tree search techniques have been explored for a number of applications. For instance, in a game like chess there are so-called shallow traps, in which precise play over a small number of immediate moves is necessary to avoid losing. In these circumstances,
MCTS with terminal scoring could easily overlook these
problems. Proposed solutions to these problems involve
merging the minimax algorithm with MCTS to create
an AI that is precisely aware of the game tree in the immediate future and reasonably aware of the deeper tree (2).
Our proposed solution involves merging MCTS with
depth-limited minimax to avoid having to exhaustively
search the game tree, because even searching exhaustively
to a small depth will require a large amount of parallelism
to do in a reasonable amount of time due to the large
branching factor. The other aspect of our project is
attempting to create a better AI given a number of
rule-based AIs. This is similar in concept to bootstrap aggregating in machine learning, where a number of models are trained on a dataset and each votes on the final output of the ensemble model (3). Ensemble-based AI
game playing does not appear to be a well-explored area
of research, possibly due to the fact that switching policies
periodically throughout a game could make long-term
tactics impossible to execute.

2.3 Infrastructure

For this project we used monopyly, a Monopoly framework developed by Richard Shepherd. As it was developed for a tournament, it includes a number of human-built AIs, which were used for bootstrapping and as competition opponents. The framework is written in
Python and allows us to abstract away the gameplay
algorithms and GUI and focus on the implementation of
our AI player. Simulations were performed on a 4-core
AWS EC2 instance to take advantage of parallelization
and enable long-running simulation jobs.
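Since each simulated game is independent of the others, runs can be spread across the instance's cores with a process pool. The following is a minimal sketch of that setup; play_one_game is a hypothetical stand-in for wrapping monopyly's game classes, not part of the framework's actual API.

from multiprocessing import Pool
from collections import Counter
import random

def play_one_game(seed):
    # Hypothetical stand-in: would wrap monopyly's game classes, play one
    # full game seeded with `seed`, and return the winning player's name.
    rng = random.Random(seed)
    return rng.choice(["Brett's AI", "Buffy", "Willow"])  # placeholder result

def run_simulations(num_games=500, workers=4):
    # Each game is independent, so a process pool gives near-linear speedup
    # on a multi-core machine such as the 4-core EC2 instance used here.
    with Pool(processes=workers) as pool:
        winners = pool.map(play_one_game, range(num_games))
    return Counter(winners)

if __name__ == "__main__":
    print(run_simulations(num_games=20, workers=2))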

2.4 Algorithm

Ideally, the entire game could be simulated, but due to runtime constraints, simulating a set number of moves into the future is more appropriate. The runtime is O(c·m·p·n), where c is the number of simulations we run, m is the number of Monopoly AIs we use to choose a policy, p is the number of players in the game, and n is the depth of the tree search. For example, with the settings used in our experiments (20 iterations, 5 contributing AIs, 3 players, depth 2), each decision involves on the order of 20 · 5 · 3 · 2 = 600 simulated player-turns. There is no exponential term in the time complexity because we are not attempting to fully search the game tree to the chosen depth, but rather running random simulations of n full turns from the current state. The number of random simulations and the depth of the simulations are connected: to obtain a reasonable representation of the deeper game tree, more simulations need to be run. This means that while the theoretical complexity isn't exponential, in practice it effectively is.

Data: Agent, Current Player, MC AIs, Game State, Iterations, Depth
Result: Action
Scores = [0, ..., 0];
if Current Player == Agent then
    Copy Game State;
    for AI in MC AIs do
        AgentAI = AI;
        for i = 1 to Iterations do
            Simulate Depth turns following AgentAI policy;
            Scores[AI] += EvaluateState(Agent);
        end
    end
    Restore original Game State;
    Agent policy = argmax(Scores);
else
    Simulate as usual with Current Player;
end
Algorithm 1: Agent Decision Policy

One key aspect of our algorithm is storing the current game state before entering the simulation and reverting back to that game state afterwards. This proved to be a difficult issue to resolve in code, but we managed it using Python's copy.deepcopy() function, which ensures that we can revert to the original game state after simulating the outcomes of the various policies. Storing the deep copy allows us to return to the original game state from before the simulation while still choosing the move that corresponds to the optimal policy found during the simulations.
2.5 Evaluation Function

The evaluation function needs to incorporate domain knowledge to provide a reasonable representation of the desirability of a player's current state. Since it is used to determine which AI policy to follow, it is a critical piece of the proposed bootstrapped agent. The evaluation function used was a linear function consisting of six features.

1. Sets - Number of property sets owned; if all the properties in a set are possessed by a single player, that player can begin to build houses and hotels

2. Property Ratio - Ratio of properties owned by the player to total buyable properties

3. Blocker - Number of property sets in which the player is preventing another player from completing the set. To see the relevance of this, consider when another player has 2 out of 3 properties in a set: that final property is valuable to that player and consequently to all other players.

4. Houses - Number of houses owned by the player

5. Hotels - Number of hotels owned by the player

6. Cash - Total amount of cash the player possesses

The property ratio and cash features were normalized by running a number of simulations to derive a mean and standard deviation and then using these values to transform the features. Approximate weights were determined by the authors for three game periods, divided by the percentage of total properties owned: early (<50%), mid (≥50% and <100%), and late (100%).

Feature       Early   Mid   Late
Sets            3       6     1
Prop. Rat.      6       3     1
Blocker         3       6     4
Houses          2       4     4
Hotels          1       2     6
Cash            2       3     5

Table 1: Feature Weights


The weights reflect how priorities shift over the course of a game: early on, acquiring properties and working toward sets matters most; in the mid game, completed sets, blocking opponents' sets, and building houses dominate; and in the late game, hotels and cash carry the most weight, since rent income and the ability to survive opponents' rents decide the outcome.
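For concreteness, here is a minimal sketch of how such a stage-dependent linear evaluation could be written, using the weights from Table 1. The feature dictionary, the normalization statistics, and the function names are assumptions for illustration, not the code used in our experiments.

# Stage-dependent linear evaluation (illustrative sketch only).
WEIGHTS = {  # weights taken from Table 1
    "early": {"sets": 3, "prop_ratio": 6, "blocker": 3, "houses": 2, "hotels": 1, "cash": 2},
    "mid":   {"sets": 6, "prop_ratio": 3, "blocker": 6, "houses": 4, "hotels": 2, "cash": 3},
    "late":  {"sets": 1, "prop_ratio": 1, "blocker": 4, "houses": 4, "hotels": 6, "cash": 5},
}

def game_stage(fraction_owned):
    # Classify the game period by the fraction of all properties that are owned.
    if fraction_owned < 0.5:
        return "early"
    return "mid" if fraction_owned < 1.0 else "late"

def normalize(value, mean, std):
    # Z-score normalization using statistics gathered from prior simulations.
    return (value - mean) / std if std else 0.0

def evaluate_state(features, fraction_owned, cash_stats=(1200.0, 600.0), ratio_stats=(0.3, 0.15)):
    # Normalize the two features described above (cash and property ratio);
    # the statistics supplied here are placeholders, not measured values.
    f = dict(features)
    f["cash"] = normalize(f["cash"], *cash_stats)
    f["prop_ratio"] = normalize(f["prop_ratio"], *ratio_stats)
    w = WEIGHTS[game_stage(fraction_owned)]
    return sum(w[name] * f[name] for name in w)

if __name__ == "__main__":
    feats = {"sets": 1, "prop_ratio": 0.35, "blocker": 2, "houses": 4, "hotels": 0, "cash": 900}
    print(evaluate_state(feats, fraction_owned=0.6))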

2.6 Setup and Metrics

A reasonable description of the goal of this paper was to produce a better AI given a set of AIs. We thought an
interesting way to explore this hypothesis was to take a
number of different AIs, 5 in this case, written by one
person (Brett Hutley), determine the performance of each
of them individually and then determine the performance
of our MCTS player using the same AIs, referred to as
contributing AIs, as potential policies. Performance is
determined by the percentage of wins in a series of games.
The monopyly framework provides a tournament class
for running permutations of games given an initial player
group. A controlled setup was necessary in order to get
comparable results, so we settled on using the same 7
opponents (from Stephen Chan, Simon Cornell, and Alan
Brooks) for all tests and only playing 3-player games.
The number of AI iterations and the simulation depth were hyperparameters that needed to be determined. Ideally these could be tuned to identify the parameters most likely to produce victories; however, resources were a major constraint, so information on the time complexity was first collected to determine feasible values.

Figure 2: Iterations vs. Average Time (depth = 2)

Figure 3: Depth vs. Average Time (iters = 20)

Games take approximately two minutes each when depth is set to 2 and the number of iterations to 20. While
this is at risk of under-exploring the game tree, it is
about what is technically feasible for us at the moment
and it is desirable to look more than one turn into the
future to avoid the greedy approach of just looking at the
immediate next best result.
3 Results

3.1 Contributing AI Performance

First, we need to quantify the performance of each of the AI players that will contribute to our MCTS player, so that it can be observed whether the tree search adds value over simply picking one of the AIs and always following its policy. 500 games were played with each of the
contributing players against various combinations of the
opponent AIs and their win percentages were tallied.

AI            Wins   Probability
Brett's AI     60       0.12
Buffy          31       0.06
Willow         62       0.12
Cordie         24       0.05
Xander         48       0.10

Table 2: Contributing AI Performance



Note that all of these players are performing worse than their opponents. If all 3 players were evenly matched, a 33% winning percentage would be expected.

3.2 MCTS Agent Performance

3.2.1 Wins

Run over 500 games, the MCTS agent won 83 games, or 16.8% of the games it participated in.

Figure 4: AI Win Totals

These results suggest that the amalgamation of AIs can play better than any single AI could.

3.2.2 Playing Technique

Figure 5: AI Choice Probability vs. Game Stage

It is clear from Figure 5 that Willow and Brett's AI are favored throughout the game. This makes intuitive sense because these are the two best players when playing individually. It is interesting to note that while their policies were chosen more often than the others', the difference was small, meaning that the other AIs' policies would often score better over the next few moves than those of Willow or Brett's AI.

4 Discussion

Our results support our hypothesis that an aggregation of AI policies, tied together with an evaluation function and MCTS, can play better than the individual AIs behind it. While this phenomenon is not exactly intuitive, it is common in machine learning and is the whole premise behind ensemble methods. The fact that MCTS improved upon all of Brett Hutley's AIs is a promising validation of the algorithm we designed, although a better test might be trying to improve upon AIs that are already playing at a high level (i.e. >33% wins). It would be an interesting continuation of the project to see how speeding up simulations could improve the AI and to quantify the quality of the MCTS search over the game tree more thoroughly. The authors believe that 20 iterations at depth 2 probably under-explores the game tree, but given resource constraints it was a reasonable selection of hyperparameters.

5 Future Work

The next step in this project would be to speed up the game tree search. This was the major bottleneck in the project and prevented us from exploring hyperparameter tuning more extensively. The most obvious way to search the tree faster would be to make use of multithreading: each simulation is completely independent of the others, so this would be fairly simple to do, provided we had a machine with enough cores to take advantage of this tactic. Another, likely more intensive, approach would be to rewrite the framework. It was designed to interface with a GUI and provide ample logging functionality, but by simplifying the rules somewhat, implementing logging on an ad-hoc basis, and eliminating the need for GUI compatibility, some degree of speedup could be achieved. Along these lines, the game could be implemented in a compiled language like C++, which would certainly be faster.

References
[1] C. Browne, E. Powley, and D. Whitehouse. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 2012.
[2] H. Baier and M. Winands. Monte-Carlo tree search and minimax hybrids. Computer Games, 2014.
[3] T. Dietterich. Ensemble methods in machine learning. Multiple Classifier Systems, 2000.
