
King-Rook vs King with Neural Networks

Alyssa Milburn, Leiden University
April 9, 2009

1 Introduction

This report, written for the 2009 Artificial Intelligence course at Leiden University, discusses the use of neural networks for predicting the depth-of-win for a player in King-Rook vs King chess games.

1.1 King-Rook vs King

The particular chess games we examine here involve a white player with a rook and king, versus an opposing black side with only a king. For each game we consider the position of the given three pieces on a standard chess board, and the goal is to predict the number of moves required for the white player to win (the depth of the win), if indeed a win is possible at all from the given starting position.

1.2 Theory

Neural networks [2] (pp. 736-748) are a statistical learning method involving units/nodes connected by weighted links. Each unit computes an input value by summing the weighted output values of the nodes on its input links, and then applies an activation function (commonly used activation functions are a threshold function and a sigmoid function along the lines of 1/(1 + e^-x)) in order to obtain an output value. In particular, for this assignment we examined feed-forward networks, which are generally composed of layers of units interconnected with the previous/next layer. There is generally an input layer, an output layer, and one or more hidden layers. Input is applied to the units in the input layer, and the units in the output layer then produce the network output.

There are various factors influencing the accuracy of a neural network, such as the ability of the network to represent a certain function (amongst other things this is affected by the number of hidden units/layers), the addition of bias nodes to provide a convenient constant value to the network, and the training technique used to set the weights in the network before use. One common training technique is called back propagation, so named because it involves adjusting the network gradually towards the desired result by examining the error at the output layer (the difference between the expected result provided in the training data and the actual result produced by the network) and then propagating the error back through the network, proportioning it amongst the other units, and finally adjusting the weights on the links involved using the error and a provided learning rate.

A simple rule for adjusting the weight of an input link of a node, given a certain error, is to take the existing weight and add the product of the learning rate, the error at the output end of the link, the output value of the lower (input) node, and the derivative of the activation function applied to the weighted inputs of the node at the output end. With activation function f, learning rate α, error ε, the lower node's input x and the weighted inputs i of the node being adjusted, this gives

w := w + α · ε · f(x) · f'(i)

The combination of the signs on the error and the inputs results in the weight moving in the right direction, which is important, and the use of the derivative of the activation function means that the weights are adjusted by gradient descent, reducing the overall error of the network.

The calculation of the error to use for each unit makes the situation somewhat more complex. While for an output node one can simply use the difference between the expected result and the obtained one, for intermediate nodes it is necessary to take account of the errors on the later units in order to proportion the error appropriately throughout the network. It turns out to be simpler to include the derivative in the error term itself. The formula above then becomes

w := w + α · ε · f(x)

where the error for an output node is ε = f'(i) · (expected - f(i)), and the error for any other node is ε = f'(i) · z, where z is the sum, over all outgoing links, of the link weight multiplied by the error of the node at the other end of that link; this is a simple estimate of the node's contribution to the error on that later node, based on the weighting given to the node in question in the calculation.

Training then simply involves repeated applications of the back propagation algorithm: setting the input nodes from a given entry in an example set, calculating the values of the output nodes by summing inputs and applying the activation function as necessary, and then calculating the errors and adjusting the weights in the network as discussed above.
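As a concrete illustration of the simplified update rule, the following minimal sketch (separate from the framework listed in Appendix I; the function and variable names are purely illustrative) performs one back propagation step for a single link feeding an output node:

#include <cmath>

// Sigmoid activation and its derivative, as described above.
static double sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }
static double sigmoid_deriv(double v) { double s = sigmoid(v); return s * (1.0 - s); }

// One update of a single link weight w feeding an output node, following
// w := w + alpha * error * f(x), where the derivative has already been
// folded into the error term: error = f'(i) * (expected - f(i)).
void update_output_link(double &w, double alpha, double expected,
                        double weighted_input_sum, double lower_node_output) {
    double error = sigmoid_deriv(weighted_input_sum) * (expected - sigmoid(weighted_input_sum));
    w += alpha * error * lower_node_output;
}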

2 Approach

The assignment suggested that use be made of the King-Rook vs. King data set provided by the UCI Machine Learning Repository [1].

2.1 Implementation

The necessary code was written in C++, using object-oriented programming techniques to implement a reasonably flexible neural network framework which could easily use differing activation functions, multiple hidden layers, back propagation, configurable parameters and bias nodes, allowing a reasonable amount of experimentation with these techniques. While originally an attempt was made to have a single neuron class, it soon became apparent that, due to the differing types of neurons (output neurons and hidden neurons need to perform weight adjustments on their input links and to recalculate errors in different manners, with input and bias neurons being far simpler), it was simpler to have a basic neuron class and then use subclasses to implement the differences. This led to some problems (in particular a misspelt overridden function name leading to the failure of the network to function), but we feel it was the best decision. We tried to simplify the remaining code as much as possible while complying with our flexibility goals; for example, the input and output values are calculated on demand rather than being stored, and the interface presented to the main program is very simple, with parameters being changed by modifying the source code rather than through a more sophisticated system.
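In outline, the resulting design is a small abstract neuron base class whose virtual hooks are overridden by the concrete neuron types; a condensed sketch of the declarations is shown below, and the complete listing appears in Appendix I.

#include <vector>
#include <utility>

// Abstract base: every neuron can produce an output value; weight adjustment
// and error calculation are no-ops unless a subclass overrides them.
struct neuron {
    std::vector<neuron *> outputs;
    virtual void adjust_weights() { }
    virtual void calculate_error() { }
    virtual double output_value() = 0;
};

// Hidden neurons hold weighted input links and an error term; output neurons
// additionally store an expected value and override the error calculation,
// while input and bias neurons simply provide a value.
struct normalneuron : public neuron {
    double error;
    std::vector<std::pair<double, neuron *> > inputs;
    void adjust_weights();
    void calculate_error();
    double output_value();
    double input_value();
};
struct outputneuron : public normalneuron { double expected; void calculate_error(); };
struct inputneuron : public neuron { double value; double output_value(); };
struct biasneuron : public neuron { double output_value(); };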

Figure 1: A plot of sin(2x) cos(2y) against the trained neural network output.

2.2 Experiments and Results

We began, as suggested, by testing our network against the simple binary xor function, which exposed several bugs in our implementation, and then against a more complicated trigonometric function, sin(2x) cos(2y). While the network did not produce good results for this function at first, good results (Figure 1) were produced when using two hidden layers (here we used a fixed 50,000 training exercises, 8 nodes in each hidden layer, a learning rate of 0.2 and a weight range of -8.0 to 8.0) and after adjusting the expected output values to lie between 0.0 and 1.0 (by adding 1.0 to the result of the function and dividing by 2.0). With more extreme weight ranges (for instance -1.0 to 1.0 and -100.0 to 100.0) the network spiralled off towards constant output values; with only a single hidden layer or fewer nodes it reproduced a much flatter output function; and increasing or decreasing the number of training exercises increased or decreased the accuracy of the network output, as would be expected. Modifying the learning rate seemed to have a fairly unpredictable effect, with the network also spiralling off towards constant output values when a high (> 0.7) learning rate was specified, but we did not experiment with this in any significant fashion.

We then proceeded to examine the King-Rook vs King problem as described earlier. After modifying the data file provided by the repository with a script in a minor way (to convert the names "draw", "one" etc. to numeric constants, and to randomise the order of the data) and modifying the neural network test program to read this data file and train with it in a naive fashion (simply training on the first 28,000 entries and then producing results for the remaining 56), we proceeded to testing. Our first model used a single output node, with the number of moves (0 to 17, with 17 used to indicate a draw) encoded to the range 0.0 to 1.0 for the purposes of the network. The output of our first few attempts was useless, either because it was apparently almost completely random, or because the network produced near-constant results.
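The rescaling of target values described above amounts to very little code. A minimal sketch follows; the helper names are ours, and the division by 17 is our reading of the 0-17 to 0.0-1.0 encoding rather than something fixed by the data set.

#include <cmath>

// Trigonometric test target, shifted from [-1, 1] into [0, 1] so a sigmoid
// output node can represent it: (sin(2x) cos(2y) + 1) / 2.
double trig_target(double x, double y) {
    return (std::sin(2 * x) * std::cos(2 * y) + 1.0) / 2.0;
}

// Depth-of-win target for the single-output model: moves 0..16, with 17
// standing for a draw, mapped onto 0.0..1.0 (assumed here to be a plain
// division by 17).
double krk_target(int moves_or_draw) {
    return moves_or_draw / 17.0;
}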

Figure 2: A plot of network output for the King-Rook vs King problem, with the expected values on the x axis and the network output on the y axis.

While adjusting the training rate and the number of hidden nodes/layers in the network seemed to have relatively little effect, we had some limited amount of success in reducing the weight range from the previous 8.0 to 0.2. The output of the network at this point is shown in Figure 2, using two hidden layers with 12 nodes per layer. As can clearly be seen, while a vague correlation is present, the network is not managing to function even remotely reliably. In fact, as further experimentation showed, this network turned out to perform little better than a network with a single hidden node in a single hidden layer. Adding extra layers and increasing or decreasing the number of nodes in each layer resulted in no real improvement, nor did adjusting the network parameters. Since we had a limited amount of training data and were using almost all of it already, adding more complexity (such as some kind of stopping condition) was not considered. Finally, we attempted to use distributed coding, with 18 different output nodes, one for each possible result. This was even less successful than our previous attempt, no matter what changes to the parameters were made.
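The distributed coding corresponds to one-hot target values during training and an argmax readout during testing, mirroring what the main program in Appendix I does; in outline (helper names are illustrative only):

#include <vector>

// Encode an expected result (0-17) as 18 target values, one per output node.
std::vector<double> encode_one_hot(int expected) {
    std::vector<double> targets(18, 0.0);
    targets[expected] = 1.0;
    return targets;
}

// Decode the 18 network outputs by picking the node with the largest value.
int decode_argmax(const std::vector<double> &outputs) {
    int best = -1;
    double best_value = 0.0;
    for (unsigned int k = 0; k < outputs.size(); k++) {
        if (outputs[k] > best_value) {
            best_value = outputs[k];
            best = k;
        }
    }
    return best;
}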


3 Conclusion

Obviously we failed to produce a network which could succeed at the proposed task. While our network code appears to be correct and capable of simulating reasonably complex functions such as the trigonometric one discussed previously, we failed to train a network to give reliable results for the King-Rook vs King problem. Time constraints meant that we did not look into possible solutions such as automated searches for optimal parameters, differing inputs (such as providing the positions of pieces relative to each other, as opposed to global positions), providing more training data or using a different activation function. One result which was particularly surprising to us was the number of parameter adjustments required to make the network stabilise for any reasonably complicated function; the range of the starting link weights and the learning rate have a large effect, and values outside a fairly limited range lead to disaster for the network during training.
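As an illustration of the relative-position inputs mentioned above (which we did not implement; the encoding below is purely hypothetical), the six absolute coordinates could be replaced by file and rank offsets between the pieces:

// Hypothetical relative encoding: offsets of the white rook and black king
// from the white king, instead of six absolute board coordinates.
struct relative_inputs {
    int rook_file_offset, rook_rank_offset;
    int bking_file_offset, bking_rank_offset;
};

relative_inputs make_relative(int wk_file, int wk_rank,
                              int wr_file, int wr_rank,
                              int bk_file, int bk_rank) {
    relative_inputs r;
    r.rook_file_offset = wr_file - wk_file;
    r.rook_rank_offset = wr_rank - wk_rank;
    r.bking_file_offset = bk_file - wk_file;
    r.bking_rank_offset = bk_rank - wk_rank;
    return r;
}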

References
[1] Asuncion, A. and Newman, D.J. (2007). UCI Machine Learning Repository, http://www.ics.uci.edu/mlearn/MLRepository.html. Irvine, CA: University of California, School of Information and Computer Science.
[2] S.J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 2nd edition, Prentice Hall, 2003.


Appendix I: Code
/*
 * neuralnet.h
 *
 * AI 2009, NN
 * Alyssa Milburn, 0824623
 */

#include <vector>
#include <utility>

struct neuron {
    std::vector<neuron *> outputs;
    virtual void adjust_weights() { }
    virtual void calculate_error() { }
    virtual double output_value() = 0;
};

struct normalneuron : public neuron {
    double error;
    std::vector<std::pair<double, neuron *> > inputs;
    void adjust_weights();
    void calculate_error();
    double output_value();
    double input_value();
};

struct outputneuron : public normalneuron {
    double expected;
    void calculate_error();
};

struct biasneuron : public neuron {
    double output_value();
};

struct inputneuron : public neuron {
    double value;
    double output_value();
};

struct layer {
    std::vector<neuron *> neurons;
    void wire_to_prev_layer(layer &l);
    void calculate_error();
    void adjust_weights();
};

class neuralnet {
protected:
    std::vector<layer> layers;
public:
    neuralnet(unsigned int numlayers, unsigned int *neuronsperlayer);
    void set_input_value(unsigned int n, double v);
    void set_output_expected_value(unsigned int n, double v);
    double get_output_value(unsigned int n);
    void do_backprop_cycle();
};


/*
 * neuralnet.cpp
 *
 * AI 2009, NN
 * Alyssa Milburn, 0824623
 */

#include "neuralnet.h"

#include <cmath>
#include <cassert>
#include <cstdlib>

// customisable parameters
#define WEIGHT_RANGE 1.0
const double learning_rate = 0.2;

// activation function (replace as desired)
double sigmoid(double v, bool diff = false) {
    if (diff)
        return sigmoid(v) * (1 - sigmoid(v));
    else
        return 1 / (1 + exp(-v));
}

void normalneuron::adjust_weights() {
    // standard weight adjustment using a previously-calculated error
    for (unsigned int i = 0; i < inputs.size(); i++) {
        inputs[i].first += learning_rate * inputs[i].second->output_value() * error;
    }
}

void normalneuron::calculate_error() {
    // Note that this is not used for outputneuron, which overrides this function.
    // g'(input) * sum over outputs of (weight * error)
    error = 0;
    for (unsigned int i = 0; i < outputs.size(); i++) {
        normalneuron *output = dynamic_cast<normalneuron *>(outputs[i]);
        // the weights are stored in the output neuron, unfortunately
        unsigned int j;
        for (j = 0; j < output->inputs.size(); j++) {
            if (output->inputs[j].second == this) {
                // output (weight * error)
                error += output->inputs[j].first * output->error;
                break;
            }
        }
        // make sure we actually found a result
        assert(j != output->inputs.size());
    }
    error *= sigmoid(input_value(), true);
}

void outputneuron::calculate_error() {
    // g'(input) * (expected - output)
    error = sigmoid(input_value(), true) * (expected - output_value());
}

double normalneuron::input_value() {
    // sum inputs using their values and weights
    double value = 0;
    for (unsigned int i = 0; i < inputs.size(); i++) {
        value += inputs[i].first * inputs[i].second->output_value();
    }
    return value;
}

double normalneuron::output_value() {
    return sigmoid(input_value());
}

double biasneuron::output_value() {
    return -1.0;
}


double inputneuron::output_value() {
    return value;
}

void layer::wire_to_prev_layer(layer &l) {
    // connect or wire up all the nodes in this layer with all the nodes
    // in the specified previous layer, avoiding any bias neurons in this
    // layer since they don't have inputs
    for (unsigned int i = 0; i < neurons.size(); i++) {
        normalneuron *ourneuron = dynamic_cast<normalneuron *>(neurons[i]);
        if (!ourneuron) {
            assert(dynamic_cast<biasneuron *>(neurons[i]));
            continue;
        }
        for (unsigned int j = 0; j < l.neurons.size(); j++) {
            neuron *prevneuron = l.neurons[j];
            // random starting weight in [-WEIGHT_RANGE, WEIGHT_RANGE]
            double random_weight = WEIGHT_RANGE - ((rand() / (double)RAND_MAX) * 2 * WEIGHT_RANGE);
            ourneuron->inputs.push_back(std::pair<double, neuron *>(random_weight, prevneuron));
            prevneuron->outputs.push_back(ourneuron);
        }
    }
}

void layer::calculate_error() {
    for (unsigned int i = 0; i < neurons.size(); i++) {
        neurons[i]->calculate_error();
    }
}

void layer::adjust_weights() {
    for (unsigned int i = 0; i < neurons.size(); i++) {
        neurons[i]->adjust_weights();
    }
}

neuralnet::neuralnet(unsigned int numlayers, unsigned int *neuronsperlayer) {
    assert(numlayers >= 2);
    for (unsigned int i = 0; i < numlayers; i++) {
        layers.push_back(layer());
        for (unsigned int j = 0; j < neuronsperlayer[i]; j++) {
            if (i == 0) {
                // input layer
                layers[i].neurons.push_back(new inputneuron);
            } else if (i == numlayers - 1) {
                // output layer
                layers[i].neurons.push_back(new outputneuron);
            } else {
                // hidden layer
                layers[i].neurons.push_back(new normalneuron);
            }
        }
        // all layers except output layer get a bias neuron
        if (i != numlayers - 1) {
            layers[i].neurons.push_back(new biasneuron);
        }
        // wire layers together
        if (i > 0) {
            layers[i].wire_to_prev_layer(layers[i - 1]);
        }
    }
}

void neuralnet::do_backprop_cycle() {
    // going backwards from the last layer to the first hidden
    // layer, we calculate the errors on each layer and then
    // adjust the weights (with the previous layer) accordingly
    for (int i = layers.size() - 1; i > 0; i--)
        layers[i].calculate_error();
    for (int i = layers.size() - 1; i > 0; i--)
        layers[i].adjust_weights();
}

void neuralnet::set_input_value(unsigned int n, double v) {
    layer &l = layers[0];
    assert(n < l.neurons.size());
    inputneuron *inneuron = dynamic_cast<inputneuron *>(l.neurons[n]);
    inneuron->value = v;
}

void neuralnet::set_output_expected_value(unsigned int n, double v) {
    layer &l = layers[layers.size() - 1];
    assert(n < l.neurons.size());
    outputneuron *outneuron = dynamic_cast<outputneuron *>(l.neurons[n]);
    outneuron->expected = v;
}

double neuralnet::get_output_value(unsigned int n) {
    layer &l = layers[layers.size() - 1];
    assert(n < l.neurons.size());
    return l.neurons[n]->output_value();
}

/*
 * main.cpp
 *
 * AI 2009, NN
 * Alyssa Milburn, 0824623
 */

#include "neuralnet.h"

#include <cstdio> // for fscanf
#include <cstdlib>
#include <ctime>
#include <cmath> // sin etc
#include <cassert>
#include <iostream>

using namespace std;

int main(int argc, char **argv) {
    srand(time(NULL));

    unsigned int layercounts[4] = { 6, 6, 6, 18 };
    neuralnet ournetwork(4, layercounts);

    for (unsigned int i = 0; i < 28056; i++) {
        int data[6];
        char chars[4];
        int expected;
        fscanf(stdin, "%c,%d,%c,%d,%c,%d,%d\n", &chars[0], &data[1], &chars[1],
               &data[3], &chars[2], &data[5], &expected);
        data[0] = chars[0] - 'a';
        data[1] -= 1;
        data[2] = chars[1] - 'a';
        data[3] -= 1;
        data[4] = chars[2] - 'a';
        data[5] -= 1;
        for (unsigned int j = 0; j < 6; j++) {
            assert(data[j] >= 0 && data[j] < 8);
            ournetwork.set_input_value(j, (double)data[j] / 8.0);
        }
        if (i > 28000) {
            double max = 0.0;
            int sel = -1;
            for (unsigned int i = 0; i < 18; i++) {
                double t = ournetwork.get_output_value(i);
                if (t > max) {
                    max = t;
                    sel = i;
                }
            }
            cout << expected << " " << sel << endl;
        } else {
            for (unsigned int i = 0; i < 18; i++) {
                double t = (expected == i ? 1.0 : 0.0);
                ournetwork.set_output_expected_value(i, t);
            }
            ournetwork.do_backprop_cycle();
        }
    }
}


# rewrite.py
import random

data = open("krkopt.data").readlines()
random.shuffle(data)

d = { 'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
      'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11,
      'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15,
      'sixteen': 16, 'draw': 17 }

outdata = open("data.txt", "w")
for i in data:
    x = i.strip().split(",")
    x[-1] = str(d[x[-1]])
    outdata.write(",".join(x) + "\n")

