
CLEI 2011

Applying Genetic Algorithms to Evolve Strategy in Mastermind Game


Fabrício H. Rodrigues
Instituto de Ciências Exatas e Tecnológicas - ICET, Universidade Feevale, Novo Hamburgo, Brasil
Email: fabriciohr@feevale.br

Michelle D. Leonhardt
Instituto de Informática, Universidade Federal do Rio Grande do Sul - UFRGS, Porto Alegre, Brasil
Email: mdleonhardt@inf.ufrgs.br

Acknowledgment: Incentive is a great part of any work. Thanks to those who gave us such a gift.

This paper is electronically published in Electronic Notes in Theoretical Computer Science. URL: www.elsevier.nl/locate/entcs

Abstract

Despite its simplicity, the Mastermind game has proven to be a worthy problem for AI researchers due to its well-defined rules and its challenging nature. Different approaches to developing algorithms to play the game have been tried, using a number of AI paradigms. However, most of the solutions developed to solve the problem are man-made ones (i.e., they are wholly designed by the researchers to face the game). In this article, instead of presenting a new application of AI techniques to play the game efficiently, we show our efforts towards the use of genetic algorithms to evolve a new strategy based on human-like acting during game solving, which have so far yielded strategies that manage to win over 60% of the games played.
Keywords: Artificial Intelligence, Machine Learning, Genetic Algorithms, Genetic Programming, Mastermind, Strategy, Games

1 Introduction
The Mastermind game is a traditional guessing board game for 2 players: the codemaker and the codebreaker.
The first is responsible for creating a secret sequence of colors, while the second has to discover the hidden code within a certain limit of guesses in order to win the game. The secret sequence has a length of N colors chosen from M available ones. Repetition of colors within the sequence may or may not be allowed, depending on the version of the game. After each guess, the codemaker provides the codebreaker with a hint about the fitness of the given guess in relation to the actual sequence. This hint has the form {A,B}, where A is the number of colors in the guess that are at the correct position in the hidden sequence and B is the number of colors in the guess that are present in the hidden sequence but at the wrong position.

Despite its simplicity, the game has proven to be a worthy problem for AI researchers [10], due to its well-defined rules and its challenging nature. Different AI techniques have been applied to Mastermind: simulated annealing [2], hill climbing heuristics [11], genetic algorithms (GA) and other evolutionary strategies [1] [4] [6] [8], constraint optimization techniques [2] [12] and ad hoc algorithms [9]. Most of these attempts consisted in developing algorithms to directly play the game as the codebreaker, and they were quite successful.

In this work, however, we handle the problem of playing Mastermind with a different approach: the use of GA (a search heuristic, conceived by John Henry Holland in the early 1970s, that mimics the dynamics of natural evolution to find solutions for a wide variety of problems [3]) to evolve a strategy to play the game. Our work consists of building a rule-based-system-like framework to be the player of the game and evolving a strategy to play Mastermind based on this framework. We started by defining a set of basic functions that serve as the building blocks of the strategy. Such functions work by taking a previously guessed color sequence and modifying it in order to generate a new sequence to be guessed. In GA terminology, the framework can be seen as the chromosome form and the set of basic functions as the pool of genes. The GA objective is, given the framework and any set of basic functions, to evolve a good strategy to play Mastermind as the codebreaker (i.e., an appropriate combination of functions to be used by the framework to play the game).

We chose GA to do so for two main reasons: (1) since the set of basic functions has dozens of members and the strategy structure consists of a combination of over a hundred functions (which means a search space of around 10^100 possible strategies), an exhaustive search would be unfeasible; and (2) the generalist character of GA [5] allows the developed evolution scheme to be used to solve other similar problems (i.e., to evolve strategies to perform other tasks), which would not be possible with a problem-specific heuristic. Another reason for choosing this approach is the fact that, in nature, the genetic code translates not into static entities but into dynamic beings, whose features rule over a complex set of functions; the same idea should apply to its computational counterpart. Known genetic programming techniques (such as the widely used ones that evolve trees representing sentences in programming languages [5]) already do this, but the approach presented here is easier to implement and may be suitable for other kinds of problems, even outside the computational field.
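To make the hint rule above concrete, here is a minimal sketch (our own illustration, not part of any cited work; the function name and the single-letter color labels are assumptions):

from collections import Counter

def feedback(secret, guess):
    # A = number of colors of the guess at the correct position in the secret.
    a = sum(s == g for s, g in zip(secret, guess))
    # Colors shared by secret and guess, counted with multiplicity...
    common = sum((Counter(secret) & Counter(guess)).values())
    # ...minus the exact matches gives B, the right colors at wrong positions.
    b = common - a
    return a, b

print(feedback("BYRG", "BRGY"))  # -> (1, 3): one exact match, three misplaced colors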
The remaining sections are structured as follows: Section 2 reviews techniques usually used to play Mastermind; Section 3 presents the details of our approach to solve the problem; Section 4 describes the implementation of the GA and presents the results we have achieved so far; and Section 5 discusses our approach and suggests future work.

2 Related Work
This section presents a brief review of the efforts made towards playing Mastermind. In general, the work on playing Mastermind has consisted of attempts to develop algorithms to directly play the game. In spite of the diversity of techniques used to do so, most of the works share an underlying basic strategy: selecting the next guess from the group of remaining possible codes, using the clues given by the codemaker to cut some of them off, thus pruning the search space.

The work by Berghman, Goossens, and Leus (2009) [1] is one of those that use such a strategy and seems to have had the best results in the task. In their work, they always use the same code as the first guess and run their algorithm, which employs GA, to select the next ones to be guessed. Before each guess is made, the GA is started with a population of 150 distinct possible codes. In each generation, each code in the population is played against all already given guesses, as if the given guesses were the correct answers. In the next step, the received clues (the number of correct colors at correct positions and of correct colors at wrong positions) are used to compute the fitness of each code in the population. The population is sorted by the fitness of the codes, and it is checked whether some eligible code has evolved (for the purposes of the work, an eligible code is a code that, when played against all the already given guesses, obtains the same hint reached by each of the given guesses when played against the actual hidden code). If so, the eligible code is stored. The GA continues until 60 eligible codes have been collected or until 100 generations are reached. The algorithm then chooses the next code to be guessed from the set of eligible codes collected during the GA execution. This choice is made by calculating, for each of the eligible codes, how many eligible codes would remain if the code in focus were the hidden one. The eligible code that would leave the fewest eligible codes remaining is used as the next guess.

It was reported that this algorithm always finds the hidden code, in an average of 4.39 guesses, taking 0.614 seconds to perform the task, for the version with codes of L = 4 colors among P = 6 possible ones, with repetition allowed; 6.475 guesses, taking 1.284 seconds, for L = 6 and P = 9, also with repetition allowed; and 8.366 guesses, taking 20.571 seconds, for L = 8 and P = 12, again with repetition allowed. A comparison with several other successful related approaches is also reported, showing that their algorithm is as efficient in terms of average number of guesses as its counterparts for the version with L = 4 and P = 6 and outperforms them for the other versions. Concerning processing time, their algorithm is outperformed on the version with L = 4 and P = 6 (in which some other approaches take only 0.001 seconds to win the game), but is comparable to the others on the remaining versions.
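To illustrate the core of this family of methods, the sketch below is our own reconstruction, not the authors' code: it shows the eligibility test and one plausible reading of the guess-selection rule, assuming a feedback function like the one sketched in the Introduction:

def is_eligible(candidate, history):
    # history: list of (guess, hint) pairs already played. A candidate is
    # eligible if, were it the hidden code, every previous guess would have
    # received exactly the hint it actually received.
    return all(feedback(candidate, guess) == hint for guess, hint in history)

def next_guess(eligible):
    # For each eligible code c, treat every eligible code s as a possible
    # hidden code, play c against it, and count how many eligible codes stay
    # consistent with the resulting hint; pick the c that leaves the fewest
    # codes remaining on average.
    def expected_remaining(c):
        total = 0
        for s in eligible:
            hint = feedback(s, c)
            total += sum(feedback(e, c) == hint for e in eligible)
        return total / len(eligible)
    return min(eligible, key=expected_remaining)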
As we can see, the approach described here, as well as those referred to throughout the references section, is based on an embedded process of analyzing and pruning a great part of, or even the whole, search space. This is a task that requires quite intensive CPU processing. Even though these approaches have achieved good results, it is not so hard to find people whose performance at playing Mastermind is comparable to theirs. Moreover, it seems that people do not have as much information processing power as computers do, at least not of the same kind, or not of the kind required to perform the pruning task. Considering that, it is plausible to think that there is another kind of strategy, different from those present in previous work, which achieves good results. This is what we are after.

3 Our Approach
Playing Mastermind efficiently is a rather well-explored task. On the other hand, the challenge of developing an algorithm that evolves a new strategy for the game has received far less attention. In our work we are interested in this second approach. The version of the game considered in this work uses sequences of 4 colors (i.e., L = 4) chosen from 9 available ones (i.e., P = 9), with repetition forbidden, and with a maximum of 12 guesses to win the game (this version is usually referred to as the bulls-and-cows or bulls-and-pigs game).

As the codemaker's hint represents the proximity of a guess to the hidden code, it can be considered a state the player is in on the way to the answer. In this version of Mastermind, there are 14 possible hints/states during the game, as follows (sorted by their distance from the answer): {0,0} (all wrong), {0,1}, {1,0}, {0,2}, {1,1}, {2,0}, {0,3}, {0,4}, {1,2}, {2,1}, {3,0}, {1,3}, {2,2} and {4,0} (solution).

As our strategy is based on human-like acting during the game, we started the work by observing people's interaction with it. The first guess usually has no special form and can be considered random. In order to make each of the following guesses, people usually analyze one or more of the previous ones and the states reached (the number of colors at the correct position and the number of right colors at wrong positions). Then, they generate the new guess by performing one of the following actions: (1) choosing a random new code; (2) picking one of the previously guessed codes and changing the order of its colors in order to make a new code; (3) picking one of the previously guessed codes and changing the color of one or more of its positions in order to make a new code; or (4) a combination of actions (2) and (3). It was also observed that, just with this kind of strategy, some people reach results comparable to those obtained by the approaches listed in [1].

Although we do not have a well-defined model of the mental process executed while a person is playing the game, we believe that it is a process of making and testing hypotheses about the form of the hidden code, which guides the player through the search space of possible answers, with this process based only on the actions listed above. Considering this, we formulated the hypothesis that, by providing some learning algorithm with means of performing the actions above, as well as of making some analysis over previous guesses, it is possible to evolve a strategy to play Mastermind comparable in efficiency to that usually employed by people.

To test our hypothesis, we started by implementing the actions the player can take to make a guess during the game. We defined 45 basic functions (numbered from 0 to 44) to perform these actions, divided into 4 categories:
(1) functions that generate a new code, not related to the previous ones (one that generates a random code and another that takes the first available colors to make the code); (2) functions that receive a code, change the position of some of its colors, and return this new code (e.g., exchange the positions of the first and last colors, or the first two, or three of them); (3) functions that receive a code, exchange one or more of its colors for other available ones, and return this new code (e.g., change the first color, the first two, the last, or the first, the third and the fourth); and (4) functions that receive a code, both change the position of the colors and exchange some of them for other available ones, and return this new code (so far, only functions that shift the colors of the code right or left a given number of positions and replace the emptied positions with other available colors have been implemented).

As an example of using a basic function, let's imagine that the player has the code Blue|Yellow|Red|Green to work with. Let's also suppose that the basic function to be applied is the one that shifts the code right once and replaces the emptied position with another available color. It will first shift the code, generating the sequence Empty|Blue|Yellow|Red, and then replace the empty position with a random available color (purple, let's say), generating the new code Purple|Blue|Yellow|Red.
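As a concrete rendering of a category (4) function, the sketch below (our own; the color names and the helper's name are assumptions, not part of the implemented set) performs the shift-right-and-replace operation from the example above:

import random

# Hypothetical labels for the 9 available colors (P = 9).
COLORS = ["Blue", "Yellow", "Red", "Green", "Purple",
          "Orange", "White", "Black", "Pink"]

def shift_right_and_replace(code):
    # Shift the colors one position to the right, dropping the last color,
    # and fill the emptied first position with a random color not already
    # present (repetition is forbidden in this version of the game).
    shifted = code[:-1]                # Blue|Yellow|Red|Green -> _|Blue|Yellow|Red
    available = [c for c in COLORS if c not in shifted]
    return [random.choice(available)] + shifted

print(shift_right_and_replace(["Blue", "Yellow", "Red", "Green"]))
# e.g. ['Purple', 'Blue', 'Yellow', 'Red']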

We next developed the strategy form as a rule-based-system-like framework. More specifically, it was implemented as a matrix, with the rows representing the hints/states and the columns representing the number of times the player has received a given hint (i.e., reached a given state). In each position of this matrix is placed a number that represents one of the basic functions, as can be seen in Figure 1. In order to allow the player to use the strategy, we also implemented a memory in which the player stores how many times it has received each clue (reached each state) and another memory to store one of the previously guessed codes (in this case, the immediately previous one) and the clue received for it. A reduced example of a strategy is shown below, in Figure 1.

hint/times |  1    2    3    4    5
{0,0}      |  0    0   10   32    3
{0,1}      | 40   20   20   11   38
{1,0}      | 19    3   29   43    0
{1,1}      | 13   39   18   41    6
{2,0}      |  7   22   37    9   34

Figure 1: Reduced strategy form example.

The use of the strategy is shown by the algorithm below.

Algorithm 1
BEGIN
  Generate and play a random code;
  Receive a hint;
  WHILE received hint doesn't indicate win DO
    Store current code and hint and discard old ones;
    Update number of times the hint was received;
    function_to_use := select from the strategy the function in the row corresponding to the stored hint and the column corresponding to the number of times this hint was received;
    new_code := function_to_use(stored_code);
    Play the new code and receive a hint;
  END;
END.

As an example of use of the strategy, let's imagine a player that uses the matrix of Figure 1 as its strategy and that has played the sequence Blue|Green|Red|Yellow as its last guess, receiving the hint {1,0}. Let's also imagine that it is the second time the player has received this hint. Therefore, in order to make the next guess, the player will choose the function stored at the intersection of the row {1,0} (corresponding to the received hint) and the column 2 (since it is the second time the hint {1,0} has been received). Doing so, the player will find the number 3 and then choose the 3rd basic function of its list to apply to the stored code. If the 3rd basic function is "exchange the positions of the first and last colors", it will take the stored code Blue|Green|Red|Yellow and generate the new one Yellow|Green|Red|Blue, which will be the next guess. Let's finally imagine that for this sequence the player receives the clue {2,0} and that it is the 4th time this state is reached. In that case, the player will choose the function in the row {2,0} and column 4 (which turns out to be the 9th), applying it to the previous guess in order to generate the next one. Such a process is repeated until the player wins the game or reaches the limit of tries.

Throughout our experiments we made two interventions in the strategy being evolved. We defined beforehand that the colors within a guess that receives the hint {0,0} would be excluded and the player would not be allowed to choose them for a further guess (since they are all wrong). We also stated that the functions in the row {0,0} would be only those that generate a new code without receiving another one to modify (since a code that receives the hint {0,0}, and causes that row to be entered, is useless to modify, as all of its colors become forbidden). This care was taken in order to accelerate the evolution of the strategy and keep it away from some basic traps.

The portion of the strategy to be evolved by the algorithm is the combination of functions within the matrix that results in an efficient way to play Mastermind. Thus, it falls within the class of combinatorial optimization problems [14], for which GA is a good tool [5], which justifies our choice of GA.
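A minimal Python sketch of Algorithm 1 follows (our rendering, not the original implementation; it assumes the feedback helper and COLORS list sketched earlier, a BASIC_FUNCTIONS list indexed 0 to 44, and a strategy stored as a dict; the {0,0} color-exclusion rule is omitted for brevity):

import random

def random_code(length=4):
    # Assumed helper: a random code with no repeated colors (L = 4, P = 9).
    return random.sample(COLORS, length)

def play(strategy, secret, max_guesses=12):
    # strategy maps (hint, times_hint_was_received) -> basic function index.
    code = random_code()                       # the first guess is random
    hint = feedback(secret, code)
    times_seen = {}                            # per-hint counters (the first memory)
    guesses = 1
    while hint != (4, 0) and guesses < max_guesses:
        times_seen[hint] = times_seen.get(hint, 0) + 1
        # A full implementation would cap the column index at the matrix width.
        fn = BASIC_FUNCTIONS[strategy[(hint, times_seen[hint])]]
        code = fn(code)                        # modify the stored previous guess
        hint = feedback(secret, code)
        guesses += 1
    return hint == (4, 0), guesses             # (won?, number of guesses used)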
4 Applying GA
For the representation of the problem in terms of GA, we considered each strategy matrix as an individual, each row of the matrix as one of the individual's chromosomes and the 45-function set as the pool of genes. The fitness evaluation consists in randomly drawing 100 possible secret codes per generation and making all the individuals play against this set of codes. For each individual, we calculate the rate of wins (times in which the answer is discovered within 12 guesses) and the average number of guesses needed to win a game. The population is then ranked by the rate of wins, with ties broken by the average number of guesses needed to win.

The individuals that will form the couples that generate the next generation are chosen by a variation of the ranking (classification) selection method. The GA randomly draws an individual and decides whether it takes part in a couple or not based on a probability calculated from the position of the individual in the ranking (this probability is 100% for the first in the ranking, decreasing linearly down to 0% for the last). Once 3 individuals have been taken, the 3 possible couples are formed with them and each couple generates 1 child. In our GA, reproduction occurs by mixing each of the chromosomes of one individual with the corresponding chromosomes of another. For this process we tested single cut-point, double cut-point and uniform crossover, all reaching about the same results. The probability of crossover was set to 100% in every experiment. We also applied our mutation operator to 1% of the offspring. The mutation operator works by taking the individual's strategy matrix, randomly drawing 5 of its 130 positions and changing their basic functions for other randomly chosen ones.

For this experiment we used a fixed-size population of 1000 strategies. Each new generation is composed of 950 children of individuals of the previous generation and 50 survivors. The best individual of each generation always survives, and the other 49 are selected the same way the couples are. We have also made some experiments with a 10000-individual population, in order to evaluate whether this increase in population size would represent a significant improvement in our search.

The following table presents the results of our experiments. They were conducted by executing the GA over 500 generations. The rows give the population size. The columns give data concerning the individuals between the 200th and the 240th generation (the interval in which the algorithm seems to converge, as the standard deviation of the results over those 40 generations becomes low) across all 20 executions of the GA: the rate of wins of the best individual evolved, the average rate of wins, its coefficient of variation, the average number of guesses needed to win the game and its coefficient of variation.
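To make the selection and reproduction scheme concrete, the following sketch (our own; individuals are represented as plain matrices of basic-function indices, and all helper names are assumptions) outlines the rank-based drawing, the uniform-crossover variant, the three-couples step and the mutation operator:

import random

def draw_individual(ranked):
    # ranked: individuals sorted best-first. The acceptance probability
    # decreases linearly from 100% for the first in the ranking to 0% for the last.
    n = len(ranked)
    while True:
        i = random.randrange(n)
        if random.random() < (n - 1 - i) / (n - 1):
            return ranked[i]

def crossover(p1, p2):
    # Uniform crossover, done chromosome by chromosome: each row of the
    # child's matrix takes each gene (basic-function index) from either parent.
    return [[random.choice(genes) for genes in zip(r1, r2)]
            for r1, r2 in zip(p1, p2)]

def make_children(ranked):
    # Take 3 individuals, form the 3 possible couples, 1 child per couple.
    a, b, c = (draw_individual(ranked) for _ in range(3))
    return [crossover(a, b), crossover(a, c), crossover(b, c)]

def mutate(individual, n_positions=5, n_functions=45):
    # Draw 5 of the matrix positions and replace their basic functions with
    # randomly chosen ones (applied to 1% of the offspring).
    rows, cols = len(individual), len(individual[0])
    for _ in range(n_positions):
        r, c = random.randrange(rows), random.randrange(cols)
        individual[r][c] = random.randrange(n_functions)
    return individual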
Table 1. Results of experiments
Population size | Max. rate of wins | Avg. rate of wins | Coef. of var. of avg. rate of wins | Avg. number of guesses | Coef. of var. of avg. guesses
1000            | 67%               | 61.11%            | 3.84%                               | 8.88                   | 3.49%
10000           | 72%               | 65.71%            | 4.52%                               | 8.83                   | 3.07%


The results achieved so far by our algorithm suggest a promising horizon. Even though we have not yet managed to evolve a strategy that allows players to win every game they play, we have reached players that could be seen as fairly smart ones, consistently winning more than 60% of the games, as shown by the low coefficient of variation of the efficiency of the evolved strategies. Individuals that win over 70% of the games have even been achieved. The average number of guesses, which stays under the maximum number of guesses, also evidences the evolution of a genuine heuristic to play the game, one that combines the basic functions in a configuration that seems to use their strengths properly, rarely needing all the allowed guesses to win. It represents a good advance over the average of less than 10% of wins shown by a randomly selected distribution of basic functions and points to the suitability of the type of strategy we propose for the kind of problem that Mastermind represents. Moreover, considering that there are more candidate basic functions than the 45 implemented so far, we believe there is much room to explore in which our algorithm can achieve better results. The tests with populations of 10000 individuals showed a relatively small difference in the results, which suggests that 1000 individuals is a good population size for our experiments with this approach.

A last point worth mentioning is the time our approach spends on the task. In our experiments with the 1000-individual population, it took an average of 1.064 seconds to evolve each generation, totaling 532 seconds to reach the 500th generation. For the 10000-individual population, these times change to 12.6 seconds and 6300 seconds, respectively. On the other hand, with the evolved strategy, only 0.000008 seconds are needed, on average, to play a game. That represents a great advance over other approaches in terms of playing performance, which justifies further work on the proposal.

5 Concluding Remarks
The work presented here introduces a rather novel approach to the task of playing Mastermind. Unlike previous works, which have embedded strategies and are based upon intensive information processing (dealing with a great slice of the space of possible codes), we are evolving a strategy that is based only on the experience gained during evolution. Heavy information processing is used only during the evolution of the strategy, in which the strategies face many scenarios and pass the information acquired about them on to the following generations. Our strategy uses local information to make its guesses and does not have to deal, at least explicitly, with the whole set of possible answers. Due to this feature, it takes far less time to play the game than the endeavors listed in the references, which represents one of the strengths of our proposal. Another difference is that, while the strategies of other approaches grow in complexity as the set of possible answers grows (i.e., as either the code length or the number of possible colors grows), ours only varies in size as the code length varies, due to the change in the number of possible hints.

Unlike other approaches that use specifically designed solutions, our approach was elaborated on general concepts, such as the application of a specific basic function after receiving a given hint, which can be seen as the use of a certain tool when facing a certain state of the solution of a problem. This design gives generality to our approach, which allows it to be used to evolve strategies for more complex tasks, such as image processing. For example, there is a variety of algorithms developed to improve the quality of an image for a certain purpose. However, finding the combination of them that really improves the image is a manual and heuristic try-and-evaluate process, which requires the work of an expert, as described in [7]. If it is possible to quantitatively evaluate the quality of an image and classify it into stages of quality, the evolution scheme described in this work could be applied to this problem. With the possible algorithms to apply to the image in place of the basic functions and the stage of quality of the image replacing the hints given by the codemaker, the work described here could be used to find a good order of algorithms to be applied to some kind of image in order to raise its quality.

As future work, we can mention the expansion of the set of basic functions with those that complete the change-colors-and-positions class of functions (estimated to be up to 200 of them), which is expected to allow the emergence of more efficient strategies, due to the wider variety of tools available during evolution to compose more suitable individuals. Along with this, experiments varying the crossover methods as well as the criteria for ranking the population will be made in order to improve the exploration of the space of possible strategies. Finally, we plan to implement 2 variations of the type of strategy presented here, in order to evaluate whether changes in what is stored by the player during the game and in how the basic functions are selected given a certain hint can improve the performance of the evolved strategies.

References
[1] Berghman, L., D. Goossens and R. Leus, Efficient solutions for Mastermind using genetic algorithms, Computers & Operations Research, v. 36, pp. 1880-1885 (2009).
[2] Bernier, J. L., C. Herráiz, J. J. Merelo, S. Olmeda and A. Prieto, Solving Mastermind using GAs and simulated annealing: A case of dynamic constraint optimization, Proceedings of the 4th International Conference on Parallel Problem Solving from Nature, pp. 553-563, Springer, Berlin/Heidelberg (1996).
[3] Holland, J. H., Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, MA, USA (1992).
[4] Kalisker, T. and D. Camens, Solving Mastermind Using Genetic Algorithms, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2003), Part II, p. 206, Springer, Berlin/Heidelberg (2003).
[5] Linden, R., Algoritmos Genéticos: Uma importante ferramenta da Inteligência Computacional, Editora Brasport, Rio de Janeiro (2006).
[6] Merelo-Guervós, J. J., P. Castillo and V. M. Rivas, Finding a needle in a haystack using hints and evolutionary computation: the case of evolutionary MasterMind, Applied Soft Computing, v. 6, issue 2, pp. 170-179 (2003).
[7] Noguerol, L., Sistema de Informação como Apoio ao Diagnóstico em Parasitologia, monograph, Universidade Feevale (2008).
[8] Ryan, C., M. Nicolau and M. O'Neil, Genetic Algorithms Using Grammatical Evolution, Proceedings of the 5th European Conference, EuroGP 2002, pp. 1-4, Springer, Berlin/Heidelberg (2002).
[9] Shapiro, E., Playing Mastermind Logically, ACM SIGART Bulletin, issue 85, pp. 28-29, ACM, New York, NY, USA (1983).
[10] Stuckman, J. and G. Zhang, Mastermind is NP-Complete, arXiv.org, Cornell University Library (2006).
[11] Temporel, A. and T. Kovacs, A heuristic hill climbing algorithm for Mastermind, Proceedings of the 2003 UK Workshop on Computational Intelligence (UKCI-03), pp. 189-196 (2003).
[12] Van Hentenryck, P., A constraint approach to mastermind in logic programming, ACM SIGART Bulletin, issue 103, pp. 31-35, ACM, New York, NY, USA (1988).
