
An investigation into genetic algorithms for the prediction of football results to aid computerised gambling


Mark Rowan
School of Computer Science,
The University of Birmingham,
Birmingham,
B15 2TT, UK
mark@tamias.co.uk
September 5, 2007
Supervisor: Dr. John Bullinaria

Abstract
Gambling is a risky business, particularly in the field of sports betting, where the inherently random
nature of the game can present great difficulties to systems attempting to predict future event outcomes.
In this project, a genetic algorithm representation is presented with the intention of recovering underlying
structure in football data, including potential temporal changes.
A number of enhancements to the algorithm are then proposed and implemented, with varying
degrees of success, and suitable values for a range of parameters are identified. The results show that
it is possible to recover at least sufficient underlying structure from the data to break even across the
length of a footballing season, and indeed that in certain circumstances, a healthy profit may be made.
It is also shown to be possible to use the GA-trained model to predict with some modicum of accuracy
the final placings of teams in a league table.
This project is an extension of a previous mini-project in the same area, and builds upon the proposals
put forward in that mini-project.
Keywords Football, betting, prediction, gambling, sports, genetic algorithm, evolutionary computation.

Contents
1 Introduction and literature                                   6
  1.1 Introduction                                              6
  1.2 Relevant literature                                       6

2 Design                                                        8
  2.1 Program structure                                         8
    2.1.1 Gambler                                               8
    2.1.2 Predictor                                             9
    2.1.3 Population                                            9
    2.1.4 Individual                                            9
    2.1.5 ScoresDatabase                                        9
    2.1.6 FixtureResult                                        10
  2.2 Genetic algorithm                                        10
    2.2.1 Representation                                       10
    2.2.2 Fitness evaluation                                   11
    2.2.3 Search operators                                     12
  2.3 Data sources                                             13
  2.4 Predicting outcomes using the model                      13
    2.4.1 Calculating a prediction from the representation     13
    2.4.2 Oracles                                              14

3 Experiments performed                                        15
  3.1 Profiling the testing data                               15
  3.2 Refining use of available data                           16
  3.3 Converting GA to use a bitstring representation          18
    3.3.1 Different binary precision values                    20
    3.3.2 High-precision bitstring                             23
    3.3.3 Low-precision bitstring                              40
  3.4 Altering population extents for testing data             52
  3.5 Predicting league tables                                 54
  3.6 Removing seeded home/away win ratio information          57
  3.7 Calculation of certainties                               58

4 Conclusions and further work                                 67
  4.1 Findings                                                 67
  4.2 Further work                                             68
    4.2.1 Oracle switching strategies                          68
    4.2.2 Intelligent weighting of training data               68
    4.2.3 Decreasing bias of predictions                       68
    4.2.4 Niching and fitness sharing                          68
    4.2.5 Adding knowledge to the model dynamically            69

A Calculating the binary reflected Gray code                   70

B Populations divided by time                                  72
  B.1 English Premier League                                   72
  B.2 French Le Championnat                                    73
  B.3 German 1. Bundesliga                                     73

C Project proposal                                             74

List of Figures
2.1  Program structure                                                   8

3.1  Profile of testing data sets                                       17
3.2  Results for reducing extent of input data                          19
3.3  Binary precision values                                            22
3.4  Fitness profile for old real-valued representation                 24
3.5  Fitness profile for new binary representation                      25
3.6  Actual evolved parameter values                                    26
3.7  Results for different numbers of oracles                           27
3.8  Fitness profile for different initial mutation rates               28
3.9  Results for different initial mutation rates                       29
3.10 Fitness profile for different numbers of epochs of training        31
3.11 Results for training over different numbers of epochs              32
3.12 Fitness profile for different population sizes                     34
3.13 Results for different population sizes                             35
3.14 Multi-point parameter crossover                                    36
3.15 Multi-point team crossover                                         36
3.16 Fitness profile for different crossover schemas                    37
3.17 Results for different crossover schemas                            38
3.18 Fitness profile for different initial mutation rates               41
3.19 Results for different initial mutation rates                       42
3.20 Fitness profile for different numbers of epochs of training        43
3.21 Results for training over different numbers of epochs              44
3.22 Fitness profile for different population sizes                     46
3.23 Results for different population sizes                             47
3.24 Fitness profile for different crossover schemas                    49
3.25 Results for different crossover schemas                            50
3.26 Fitness profile for different population training data extents     52
3.27 Results for different population training data extents             53
3.28 Fitness profile for different home/away win ratio initialisations  58
3.29 Results for different home/away win ratio initialisations          60
3.30 Results for different betting strategies based on certainties      64

List of Tables
2.1  Testing dataset format                                              8
2.2  Training dataset format                                             9

3.1  Total percentage yields for naïve betting                          16
3.2  Results for reducing extent of input data                          18
3.3  Results for different binary precisions                            21
3.4  Fitness for real-valued representation                             21
3.5  Fitness for different binary precisions                            23
3.6  Results for different numbers of oracles                           24
3.7  Fitness for different initial mutation rates                       26
3.8  Results for different initial mutation rates                       31
3.9  Fitness for different numbers of epochs                            33
3.10 Results for different numbers of epochs                            33
3.11 Fitness for different population sizes                             34
3.12 Results for different population sizes                             37
3.13 Fitness for different crossover schemas                            39
3.14 Results for different crossover schemas                            39
3.15 Fitness for different initial mutation rates                       40
3.16 Results for different initial mutation rates                       41
3.17 Fitness for different numbers of epochs                            45
3.18 Results for different numbers of epochs                            45
3.19 Fitness for different population sizes                             48
3.20 Results for different population sizes                             48
3.21 Fitness for different crossover schemas                            51
3.22 Results for different crossover schemas                            51
3.23 Fitness for different population training data extents             55
3.24 Results for different population training data extents             55
3.25 English Premier League 2005-2006 final and predicted final league table   55
3.26 English Premier League 2006-2007 final and predicted final league table   55
3.27 French Le Championnat 2005-2006 final and predicted final league table    56
3.28 French Le Championnat 2006-2007 final and predicted final league table    56
3.29 German 1. Bundesliga 2005-2006 final and predicted final league table     56
3.30 German 1. Bundesliga 2006-2007 final and predicted final league table     59
3.31 Fitness for different home/away win ratio initialisations          59
3.32 Results for different home/away win ratio initialisations          59
3.33 English Premier League 2005-2006 final and predicted final league table   59
3.34 English Premier League 2006-2007 final and predicted final league table   61
3.35 French Le Championnat 2005-2006 final and predicted final league table    61
3.36 French Le Championnat 2006-2007 final and predicted final league table    61
3.37 German 1. Bundesliga 2005-2006 final and predicted final league table     62
3.38 German 1. Bundesliga 2006-2007 final and predicted final league table     62
3.39 Results for different betting strategies based on certainties      66

Chapter 1

Introduction and literature


1.1 Introduction

For many centuries, mankind has sought to capitalise on the unpredictability of certain stochastic events;
whether this be casting lots on dice, playing roulette, forecasting the stock markets, or betting on the
outcome of sporting matches such as football.
Some of these challenges rely more on luck than skill, although there is almost always an element
of both required to be successful in predicting outcomes of future events. If the element of skill can be
reproduced computationally and improved beyond human skill (as has been well documented in many
fields, such as excelling at the game of chess), then it may prove possible to succeed to some extent in
predicting the outcome of sports matches given past performance.
Genetic algorithms, pioneered by Holland [7], have long been known to perform well on many general
function optimisation problems. If the problem of comparing pairs of teams can successfully be represented as a numerical function, then it follows that a GA should have at least some measure of success
at optimising the function, in order to ascertain some understanding of the data describing the teams.

1.2 Relevant literature

The mini-project in this field by the same author and preceding this project [14] investigated suitable
representations and parameters for a genetic algorithm-based system to read a set of training data
consisting of results of past football matches between teams, and make predictions based upon the
trained model for future matches. It also proposed future extensions to the project, of which a number
are implemented and investigated here.
Clair and Letscher [3] successfully implement a strategy to predict winners and underdogs in
football pools. This is quite a different strategy to that employed in this project, however, as the
tournament for which they are predicting is a simple knockout, where the winner of a pair of teams
goes on to play in the next round. They focus on a feature unique to the pools, of computationally
finding a balance between predicting success for the favourite team, and predicting an upset caused by
the underdog. This is done in order to minimise the size of the pool of players with which the money
will be shared, rather than just to maximise the number of correct predictions, which is the aim of this
project.
Tsakonas et al. [15] present work similar in a number of ways to the aims of this project. They find
success in using fuzzy rules, neural networks, and genetic programming, and they highly recommend
such soft-computing techniques as an area for future research in the field of computer-aided gambling.
They make use of a relatively large number of data features (difference of infirmity factors; difference
of dynamics profile; difference of ranks; host factor; personal score of the teams).
With the exception of the study in neural networks, they focus primarily on the use of these techniques
to generate hard rule-sets which then govern prediction, rather than using the techniques themselves
to dynamically adapt to the data and predict outcomes more flexibly. Conversely, this project uses
a mathematical representation of the strengths and weaknesses of teams rather than attempting the
generation of rigid sets of rules using fuzzy logic or genetic programming.

Rotshtein et al. [13] make use of the logic that human sports fans frequently make predictions based on simple, common-sense assumptions, such as: IF a team t1 won all previous matches, AND a team t2 lost all previous matches, AND the team t1 won the previous matches between t1 and t2, THEN it should be expected that t1 would win. They show how this can again be formalised as a fuzzy logic model, and proceed to devise such a model.
Their input data is far more sparse than that of both Tsakonas et al. [15] and the mini-project [14] relating to this project. The authors introduce a genetic algorithm and a neural network to optimise the fuzzy logic rule generator, with respectable success, although they concede in their conclusions that the prediction model could be further improved by accounting for additional factors in the fuzzy rules: home/away game, number of injured players, and various psychological effects, including factors such as the referee who oversees the game and the weather conditions prior to kick-off.

Chapter 2

Design
2.1 Program structure

Figure 2.1: Program structure

2.1.1 Gambler

The Gambler module reads the testing set into memory and then requests betting recommendations
from the Predictor for each match in the testing set. It calculates the winnings for testing purposes, and
expects to find testing data in the tab-separated format shown in table 2.1.
%season identifier (String)
$home team (String)  away team (String)  winning team (String)  odds (double)  date (String, dd/mm/yyyy)
$home team           away team           winning team           odds           dd/mm/yyyy
...
%next season identifier
$home team           away team           winning team           odds           dd/mm/yyyy
...

Table 2.1: Testing dataset format

%season identifier (String)
$home team (String)  away team (String)  home score (int)  away score (int)  home result  date (String, dd/mm/yyyy)
$home team           away team           home score        away score        home result  dd/mm/yyyy
...
%next season identifier
$home team           away team           home score        away score        home result  dd/mm/yyyy
...

Table 2.2: Training dataset format

2.1.2 Predictor

The predictor initialises a Population or multiple Populations, and provides an interface for the Gambler
to query the predicted outcomes of matches. The Predictor in turn queries each of the Populations it
has initialised, to find their predicted results, before reporting these to the Gambler. It also initialises a
ScoresDatabase.

2.1.3 Population

Each Population initialises a training set by reading items from the ScoresDatabase. It also initialises
an array of Individuals and begins training the GA.
Training is performed according to the following process:

1. Randomly initialise each Individual's representation vector
2. Mutate each Individual
3. Evaluate each Individual's fitness
4. Select and cross over two Individuals to create a new array of Individuals

The two parent Individuals are chosen using roulette-wheel (fitness-proportional) selection, so that two good, but not necessarily the fittest, Individuals are selected. This allows the GA to maintain population diversity; a minimal selection sketch is given below.
The Population selects the fittest Individual to be consulted for prediction of games, and returns this Individual's prediction for the winning team in a match when the Predictor requests it on behalf of the Gambler.
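To make the selection step concrete, the following is a minimal sketch of roulette-wheel selection in Java. It is illustrative rather than the project's actual code: the ScoredIndividual interface and its getScaledFitness() accessor are assumed names standing in for however the project's Individual class exposes its scaled fitness, and larger scaled fitness is assumed to mean a fitter individual.

    import java.util.Random;

    // Minimal roulette-wheel (fitness-proportional) selection sketch.
    public final class RouletteWheelSelection {
        // Hypothetical accessor for an individual's scaled fitness.
        interface ScoredIndividual {
            double getScaledFitness();
        }

        private static final Random rng = new Random();

        public static <T extends ScoredIndividual> T select(T[] population) {
            double total = 0.0;
            for (T ind : population) {
                total += ind.getScaledFitness();
            }
            // Each individual occupies a slice of the wheel proportional
            // to its share of the total scaled fitness.
            double spin = rng.nextDouble() * total;
            double cumulative = 0.0;
            for (T ind : population) {
                cumulative += ind.getScaledFitness();
                if (cumulative >= spin) {
                    return ind;
                }
            }
            return population[population.length - 1]; // guard against rounding error
        }
    }

Because the wheel slice is proportional to scaled fitness rather than rank, good-but-not-best individuals are regularly chosen, which is what preserves the population diversity described above.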

2.1.4 Individual

Each Individual contains the vector representation of the four attributes for each team (home and away attack/defence strengths), as well as a reference to an array of teams, passed to it by the ScoresDatabase, which it uses to predict the outcome of a match between any two teams passed to it. It also initialises the mutation strength for the Individual, as well as the ratios for home wins, away wins, and draws, used as the thresholds for determining the winner of a match between two teams. Each Individual provides a method for random mutation of the representation vector according to the mutation strengths contained within.

2.1.5 ScoresDatabase

The ScoresDatabase reads the training data set into memory, creating an array of FixtureResult objects.
It also creates an array listing all unique teams, which is passed to the Populations so they can initialise
their Individuals correctly.
The ScoresDatabase expects to find data in the tab-separated format in table 2.2.

2.1.6 FixtureResult

Each FixtureResult is a simple container, storing the names of the two teams which play a match, the date and season in which the match was played, and the final result of the match.
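A container of this kind might look like the following minimal Java sketch; the field names here are illustrative assumptions rather than the project's actual declarations.

    // Minimal sketch of a FixtureResult container (illustrative field names).
    public class FixtureResult {
        public final String homeTeam;
        public final String awayTeam;
        public final String season;   // season identifier string
        public final String date;     // dd/mm/yyyy, as in the training data
        public final int homeScore;
        public final int awayScore;

        public FixtureResult(String homeTeam, String awayTeam, String season,
                             String date, int homeScore, int awayScore) {
            this.homeTeam = homeTeam;
            this.awayTeam = awayTeam;
            this.season = season;
            this.date = date;
            this.homeScore = homeScore;
            this.awayScore = awayScore;
        }
    }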

2.2 Genetic algorithm

2.2.1 Representation

The representation used for the GA, originally developed in the mini-project [14], encodes values representing attack and defence strengths for each team, with one set for home performance and another for away performance, so that the predicted outcome of a match can be determined by comparing the two teams' relevant attack and defence strengths. The reason for encoding one set of attributes for home performance and another for away performance is that teams are well known to play very differently at their home ground compared to when they are visiting another team's ground. There are well-defined reasons for this: Buraimo and Simmons [2] state that home field advantage is a bundle of attributes including home team psychology, greater familiarity with the pitch, passionate home fans, and the susceptibility of referees to home crowd pressure.
For a set of teams t, these attributes are represented as a vector of real values in the following format, where ah refers to attacking strength at home, aa is attacking strength away from home, and dh and da are the home and away defensive strengths, respectively:

[{ah, aa, dh, da}_1, {ah, aa, dh, da}_2, ..., {ah, aa, dh, da}_t]

These values can be stored as binary values in a single concatenated string. In addition to the parameters for each team's strengths, the representation encodes its own mutation rate to enable self-adapting mutation of the population. For an 8-bit binary representation of each value, with around 40 unique teams in each training set and four parameters per team, this leads to approximately 40 × 4 × 8 = 1280 bits in the bitstring. This equates to a search space with 2^1280 ≈ 10^385 combinations. Clearly, classical search in a space this large is well outside the scope of any reasonably-available computational facilities, which is why an intelligent search such as a genetic algorithm must be employed to optimise the bitstring.
Since a team's performance fluctuates over time due to seasonal effects, such as the arrival of new players in the summer and winter transfer windows, and smaller teams typically over-performing early in the season, the representation was extended further to encompass disparate, temporally-related portions of input space, with an expert trained on each portion of the input space, in order to attempt to recover any underlying temporal information from the training sets. The boundaries at which the input space is sectioned are noted in Appendix B. The mini-project records that the best strategy for utilising the predictions of the various experts was to weight the prediction of the expert whose period of training on the input data includes the testing data sample by three times the weightings of the other experts, and thus to produce a weighted average prediction from the ensemble [14].
For a set of n trained expert predictors P,

P = {p_1, p_2, ..., p_n}

the training data T is split into n portions according to time, and each predictor p_m is trained on the input data portion t_m:

T = {t_1, t_2, ..., t_n}

If a testing sample is dated to be within the boundaries of portion k of the training data, then predictor p_k is chosen as the expert predictor, and its prediction is weighted by a magnitude of three. The remaining experts' predictions are weighted by a magnitude of one.
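As a sketch of this weighting scheme in Java (illustrative only: the Expert interface and its predict() method are assumed names, with predict() returning the usual 3, 1, or 0):

    // Hypothetical expert interface: predict() returns 3 (home win),
    // 1 (draw) or 0 (away win) for the named teams.
    interface Expert {
        int predict(String homeTeam, String awayTeam);
    }

    public final class EnsemblePredictor {
        // Weighted average: the expert whose training period covers the
        // match date (inPeriodExpert) is weighted 3; all others weighted 1.
        public static double ensemblePrediction(Expert[] experts, int inPeriodExpert,
                                                String homeTeam, String awayTeam) {
            double weightedSum = 0.0;
            double totalWeight = 0.0;
            for (int i = 0; i < experts.length; i++) {
                double weight = (i == inPeriodExpert) ? 3.0 : 1.0;
                weightedSum += weight * experts[i].predict(homeTeam, awayTeam);
                totalWeight += weight;
            }
            return weightedSum / totalWeight; // weighted average across the ensemble
        }
    }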
Gray codes

Traditional binary encoding of real values suffers from the problem of Hamming cliffs. A Hamming cliff in a bitstring occurs when a small change to a bitstring (flipping a single bit) produces a large change in the resulting value. For example, the string 0001 represents the value 1, but a single mutation can cause the string to become 1001, with the value 9. Mutation can therefore be very drastic in cases such as this. As a further example of the problem, bits to the left of the string are more significant (cause greater change when flipped) than those to the right of the string. It is therefore impossible to predict what magnitude of effect a single mutation operation may have.
Another way to describe the problem of Hamming cliffs is that, for any subset of numbers, the Hamming distance (the number of bits which must be flipped to obtain one number from another) across the subset can be very large.
This unpredictability and the problem of Hamming cliffs can be overcome by the use of Gray coding, an alternative method of representing a binary number which minimises the Hamming distance between two consecutive numbers to just one bit. Bitner et al. [1] note that "it is desirable to have an algorithm in which the change from one subset to the next is as small as possible, since the cost of additions and deletions may be high" [in terms of lost information] and that "the smallest possible change is obviously the addition or deletion of a single element, and in terms of bit vectors, this means that successive vectors must differ by a single bit. Such sequences are known as Gray codes." Mathias and Whitley [12] describe this effect in a similar way: "Gray coding is known to eliminate Hamming cliffs that exist in binary function spaces. A Hamming cliff occurs when two consecutive numbers have complementary binary representations. For example, the binary representations for the numbers 7 and 8 are complements of each other (i.e. 0111 and 1000). Gray coding is an alternative encoding of a number using binary characters which allows every number to be only a Hamming distance of 1 away from its immediate neighbours."
The reflected Gray code is described by Bitner et al. [1] and can be calculated from any real value by converting the value from decimal to binary, and then from binary to the reflected Gray code. Algorithms for the conversion from binary to reflected Gray code and vice versa are available at Wikipedia [16]; a Java implementation of these, together with the required real-to-binary functions derived from them and developed for this project, is given in Appendix A.
Clearly, recalculating the real values from the Gray codes via binary every time a fitness evaluation is performed is computationally very wasteful. An optimisation of this procedure is to maintain, inside each Individual Java object, an array of the corresponding real values for every set of n bits in the bitstring, where n is the binary precision being employed. After mutation or crossover (or any other change to the bitstring), the corresponding real values then need be calculated only once, and the array updated accordingly. Values can subsequently be read quickly from this buffer, rather than being re-calculated each time they are requested.
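The project's own conversion routines are given in Appendix A; for orientation, a standard reflected-binary Gray code conversion for non-negative integer-encoded values looks like the following generic sketch (not the project's code, and it may differ from the Appendix A implementation in detail).

    // Standard reflected-binary Gray code conversions for non-negative
    // integer-encoded values.
    public final class GrayCode {
        // Binary to Gray: each Gray bit is the XOR of adjacent binary bits.
        public static int toGray(int binary) {
            return binary ^ (binary >>> 1);
        }

        // Gray to binary: fold the XOR chain back down the word.
        public static int fromGray(int gray) {
            int binary = gray;
            for (int shift = 1; shift < Integer.SIZE; shift <<= 1) {
                binary ^= binary >>> shift;
            }
            return binary;
        }
    }

For example, toGray(7) returns 4 (0100 in binary) and toGray(8) returns 12 (1100): the two codes differ in a single bit, unlike the plain binary 0111 and 1000.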

2.2.2 Fitness evaluation

Good parents are chosen by the genetic algorithm according to their fitness, which is a measure of how well an individual in the population performs on the problem. The fitness is calculated by taking the home and away attack and defence parameters for each team contained in each individual, and using them to predict the outcome for each match in the training set. The output of the predictor is a value x ∈ {0, 1, 3}, where:

x = 3 in the case that the home team is predicted to win
x = 1 in the case that the match is predicted to be a draw
x = 0 in the case that the away team is predicted to win

The reason for choosing these values for x is that they are the points awarded to teams in the equivalent cases during actual football matches, and it therefore seemed sensible to follow this convention for the fitness evaluation.
The predicted outcome x is then compared with the actual outcome y, as read from the training set, and the absolute difference is cumulatively summed across the whole training set. For an individual i trained over a training set T, where each t ∈ T is a pair of teams playing a match, the fitness is therefore calculated as:

f(i) = Σ_{t∈T} |x_t − y_t|

and the problem therefore becomes a non-linear fitness-minimisation problem.


Weighting per season

A team's overall performance tends to remain relatively stable over many seasons. For example, Manchester United are usually found to finish in one of the top three places. However, some teams in the League have had vastly different performances in previous seasons. A good example is that of Chelsea FC, who would in the past invariably finish the season mid-table, but since the arrival of a wealthy investor at the club have finished more recent seasons at the top of the table. Other clubs, such as Blackburn Rovers, who have been champions in past seasons, have been consistently found in the lower part of the top half of the table in more recent times.
It is generally the case that a team's recent performance is a good indicator of its future performance, whereas its historic performance is not. For example, Wimbledon used to perform well in the Premier League but, for various reasons and effects, are now to be found three divisions below in League 2 under the name MK Dons. One way to deal with this effect is to take into account the whole history of a team's performance in the league across the entire training set, but to bias the fitness evaluation towards a team's recent history. This is achieved by weighting the fitness contribution of each match according to the time since the match being evaluated was played.
If w is the distance between two seasons, taken as the year of the most recent season in the training set minus the year of the season of the current training sample, the fitness evaluation becomes:

f(i) = Σ_{t∈T} |x_t − y_t| / 2^{w_t}

This produces an exponentially decaying curve of weightings: training on the most recent season (w = 0) has a weighting of 1, the preceding season (w = 1) a weighting of 0.5, the season before that 0.25, then 0.125, and so on.
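A minimal sketch of this weighted fitness evaluation in Java, with illustrative parameter names (the project's actual method signatures are not shown in this document):

    public final class FitnessEvaluation {
        // Season-weighted fitness: each match's error |x - y| is divided by
        // 2^w, where w is the season distance of the match (0 for the most
        // recent season in the training set). Lower fitness is better.
        public static double fitness(int[] predicted, int[] actual, int[] seasonDistance) {
            double f = 0.0;
            for (int t = 0; t < predicted.length; t++) {
                double weight = Math.pow(2.0, -seasonDistance[t]); // 1, 0.5, 0.25, ...
                f += weight * Math.abs(predicted[t] - actual[t]);
            }
            return f; // minimised by the GA
        }
    }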

2.2.3 Search operators

After fitness evaluation, the fitness values of all individuals in the population are scaled using a successive combination of three fitness scaling techniques developed in the mini-project [14]: linear scaling (equation 2.1), power scaling (equation 2.2), and exponential scaling (equation 2.3).

f'(z) = z − b    (2.1)

where b is the fitness of the worst individual, computed for each generation. This has the effect of removing a constant amount of fitness from each individual, such that only the remaining differences (which now appear greater) are used for ranking the individuals [5].

f'(z) = z^k, k > 0    (2.2)

where k is a constant, set in this project to 3 [6].

f'(z) = e^z    (2.3)

Once candidate parent individuals have been selected using roulette-wheel fitness selection, a new population of individuals is generated using the search operators for crossover and mutation. Two different methods of crossover are described and implemented in sections 3.3.2 and 3.3.3. Mutation is straightforward when using the bitstring representation: first, the self-adapted mutation rate is read from the relevant location in the bitstring and converted from the encoded reflected binary Gray code to a real value (or rather, read from the real-valued array buffer maintained inside the Individual Java object). Each bit in the bitstring is then flipped with probability equal to the mutation rate.

2.3 Data sources

All data for the training and testing sets was obtained from the freely-available archive at http://www.football-data.co.uk in CSV format. The attributes used for training were the names of the home and away teams for each match and the date on which the match was played, as well as a record of the final score. For the testing set, the attributes recorded were the names of the home and away teams for each match, the date on which the match was played, the name of the winning team, and the odds given to that outcome before the match, according to the BetBrain¹ service.

2.4 Predicting outcomes using the model

2.4.1 Calculating a prediction from the representation

The prediction is calculated by normalising the relevant home/away attack and defence parameters for the two teams, a (at home) and b (away), and subtracting the sets of parameters from each other, to find the relative strengths s of each team:

s_a = (ah_a − da_b + 1) / 2    (2.4)

s_b = (aa_b − dh_a + 1) / 2    (2.5)

where ah_a is team a's home attacking strength, da_b is team b's away defensive strength, and so on. The difference d between the two strengths is calculated and again normalised to a value in the range 0–1:

d = (s_a − s_b + 1) / 2    (2.6)

The difference d can then be compared to a pre-defined home/away win ratio. The ratio is initially given values of 0.5 for home wins (ρ_h) and 0.25 for away wins (ρ_a), with subtraction used to calculate the resulting draw ratio. If d > 1 − ρ_h then the home team is considered to win, and the home team receives a prediction value of 3. If d < ρ_a then the away team is considered to win, and the home team receives a prediction value of 0. Finally, if ρ_a ≤ d ≤ 1 − ρ_h, the result is predicted to be a draw, and the home team receives a prediction value of 1:

x = 3 if d > 1 − ρ_h;  x = 0 if d < ρ_a;  x = 1 otherwise

The prediction value x can then be compared to the actual outcome y, and the difference added to the overall fitness of the Individual during fitness evaluation.
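A minimal Java sketch of equations 2.4–2.6 and the thresholding step (parameter names are illustrative, and the strengths are assumed already normalised to the 0–1 range):

    public final class OutcomePredictor {
        // Predict a match outcome from the four relevant strength parameters
        // and the current home/away win ratios (equations 2.4-2.6).
        // Returns 3 (home win), 1 (draw) or 0 (away win).
        public static int predict(double homeAttackA, double awayDefenceB,
                                  double awayAttackB, double homeDefenceA,
                                  double homeWinRatio, double awayWinRatio) {
            double sA = (homeAttackA - awayDefenceB + 1.0) / 2.0; // eq. 2.4
            double sB = (awayAttackB - homeDefenceA + 1.0) / 2.0; // eq. 2.5
            double d  = (sA - sB + 1.0) / 2.0;                    // eq. 2.6
            if (d > 1.0 - homeWinRatio) return 3; // home win predicted
            if (d < awayWinRatio)       return 0; // away win predicted
            return 1;                             // draw predicted
        }
    }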
The values in the ratio are initially seeded at 0.5 and 0.25 due to the real-world proportions of games which are won by the home or away team, or which result in a draw, although the values are allowed to self-adapt during mutation of the population, as they are stored as part of the bitstring. Buraimo and Simmons [2] state that around 48 percent of games in English football are won by the home team, with the remainder of games split approximately equally between away wins and draws². For ease of calculation, these values are therefore initially seeded at 50% for home wins, 25% for away wins, and 25% for draws.
During mutation the values for the ratio are unbounded, apart from the necessity to stay within 0–1 enforced implicitly by the binary representation, so the draw ratio is permitted to become zero or even negative. It follows that individuals with such extreme ratio values will be unsuccessful and are therefore less likely to survive to future generations, so the GA essentially employs a self-penalising penalty method to remove unfit individuals, rather than explicitly repairing them to contain reasonable values.
¹ An online service at http://www.betbrain.com which collates several bookmakers' freely-available odds and provides the user with the highest odds for each outcome, in order to maximise potential profits.
² This can be confirmed at http://www.sportpress.com, where the statistics are calculated to be 46.42% home wins (std dev 2.99), 26.86% draws (std dev 3.11), and 26.72% away wins (std dev 1.50) over the 14-season history of the Premier League.


2.4.2 Oracles

The mini-project introduced the concept of oracles, which are selected from each population using fitness-proportional selection [14]. The system consults a number of oracles in order to calculate a prediction for the outcome of a match, rather than just consulting the fittest individual in the population. The oracles' results are combined and averaged to bring an element of diversity into the predictions and to mitigate any cases in which the fittest individual in a population may predict badly while the majority of the population (i.e. the other oracles being consulted) disagrees with it. This has the effect of reinforcing popular predictions amongst the individuals of the population, and weakening unpopular predictions which are held only by a minority of the fittest individuals of the population.


Chapter 3

Experiments performed
3.1 Profiling the testing data

For this project, six data sets were chosen and prepared for testing and training. In addition to the
English Premier League 2005-06 season and associated training data used for the mini-project [14], the
French Le Championnat and German Bundesliga 1 leagues were selected and processed to provide
testing data and associated training data for the 2005-06 and 2006-07 seasons. The English Premier
League 2006-07 season was also added to the data sets. All the data was obtained from the free data
sets available at http://www.football-data.co.uk.
In this section, the following abbreviations will be used to denote different testing data sets:
EPL0506 English Premier League 2005-06 season
EPL0607 English Premier League 2006-07 season
FLC0506 French Le Championnat 2005-06 season
FLC0607 French Le Championnat 2006-07 season
GB10506 German Bundesliga 1 2005-06 season
GB10607 German Bundesliga 1 2006-07 season
Betting odds for the testing set are obtained from the BetBrain best odds in the football-data sets,
in order to avoid bias towards any one specific bookmaker, and also as these generally represent the best
odds freely available to gamblers.
The sum-inverse-odds of a testing set indicates how hard it is for a gambler to record a profit over that set. Bookmakers shorten their odds slightly to ensure that on average they receive more money than they pay out in winnings. If the decimal odds for a home win, away win, and draw are inverted and summed, a fair book would give a result of exactly 1.0. However, bookmakers' adjusted odds sum to slightly more, around 1.10, thus granting them roughly a 10% profit per bet on average.
BetBrain is an online service which locates the best (highest) odds for any given outcome from a number of bookmakers offering bets on that outcome, meaning that a gambler can receive a larger payout in the case of a correct bet. In this case, the average profit recorded by the bookmakers is significantly lower, although still above zero. The following are the average sum-inverse-odds across the six testing data sets, showing that on average random betting would still record approximately a 1.5–2.5% loss (a small worked example follows the list):
EPL0506  Average sum-inverse-odds: 1.01565, std dev: 0.01177
EPL0607  Average sum-inverse-odds: 1.01766, std dev: 0.02161
FLC0506  Average sum-inverse-odds: 1.01966, std dev: 0.01417
FLC0607  Average sum-inverse-odds: 1.02682, std dev: 0.01252
GB10506  Average sum-inverse-odds: 1.01720, std dev: 0.01411
GB10607  Average sum-inverse-odds: 1.02250, std dev: 0.01191
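As a small worked example of the calculation (with made-up decimal odds, not values from the data sets):

    public final class OddsProfile {
        // Sum of inverse decimal odds: each reciprocal is the bookmaker's
        // implied probability of that outcome, so a fair book sums to 1.0.
        public static double sumInverseOdds(double home, double draw, double away) {
            return 1.0 / home + 1.0 / draw + 1.0 / away;
        }

        public static void main(String[] args) {
            // Illustrative odds only: 1/2.10 + 1/3.25 + 1/3.60
            //   = 0.476 + 0.308 + 0.278 ≈ 1.062,
            // i.e. a 6.2% over-round in the bookmaker's favour.
            System.out.println(sumInverseOdds(2.10, 3.25, 3.60));
        }
    }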
A profile of the testing data sets can be seen in fig 3.1, showing the total profit or loss which would be made by placing £10 bets on all games for the home team to win, the away team to win, or a draw, respectively. The mean of these results is also plotted to indicate the expected profit/loss for naïve (or random) betting. It is this value that is the target profit to beat, as it indicates better-than-random performance.
The percentage yield of results is calculated using the formula y = p/o, where p is the raw profit and o is the total outlay required to return profit p (for example, a £50 profit on a £1000 outlay is a 5% yield). This gives a more accurate measure of how successful a betting strategy is than simply recording the profit, as for a given yield, a larger profit may be obtained simply by spending more on the initial outlay. The percentage yields for naïvely betting on the six testing sets can be seen in table 3.1.
In table 3.1, the labels for the tests are as follows:
1. Betting £10 per game on the home team to win.
2. Betting £10 per game on the away team to win.
3. Betting £10 per game on a draw.

Test    EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
1        14.30%     6.67%    -2.67%     6.45%   -12.27%    -1.58%     1.82%
2       -15.72%   -17.68%    -7.07%    -8.25%    -5.22%    10.42%    -7.25%
3       -19.58%    -2.27%     0.86%     0.60%    10.65%    -3.60%    -2.22%
Total    -7.00%    -4.42%    -2.96%    -0.40%    -2.28%     1.75%

Table 3.1: Total percentage yields for naïve betting


The most successful strategy across all six testing sets is clearly to bet naïvely on the home team to win in every game. This will, on average, return a 1.86% yield. However, this does not indicate that betting on home wins would return a positive yield for every testing set. The fact that these six testing sets return an average yield of 1.86% for home wins is most likely due to the excessively large yield which is returned on the EPL0506 testing set alone when betting on home wins (14.30%). Indeed, for other testing sets such as GB10506, the worst yield is returned by betting on home wins, so it is purely an artefact of the selected six data sets that betting on home wins is successful. If a computationally intelligent strategy can reliably return better yields than 1.86%, then it can be considered more successful.

3.2 Refining use of available data

Hypothesis  Training the model using the genetic algorithm is computationally expensive, with multiple populations each containing many individuals, and each requiring a large number of fitness evaluations on each epoch. Using all the available historic data may prove to be unnecessary, as currently each season is weighted as a function of the inverse of the distance (in years) of the current training season from the first season in the testing set. Training updates from data 14 seasons prior to the testing set will therefore produce only a very small adjustment in the model, compared to updates from more recent training data.
It is also possible that including data from very old seasons could actually cause the model to become less accurate, as (at least in the English Premier League) teams were often very different in their performance at the start of the Premier League around 1993, compared to more recently. For example, Queens Park Rangers finished fifth in the 1993/94 season and maintained a good performance for two seasons before being relegated from the Premier League after a poor season, never to return; conversely, Chelsea were a mid-table finishing team for the early years of the Premier League before securing a consistent place amongst the top four finishing teams from the 2002/03 season onwards [10].
It is hypothesised that if the training data for each testing set is reduced to just the five seasons preceding the testing set, rather than the current 12–14 seasons, not only will training proceed much faster, as fewer fitness evaluations are required for each epoch of training, but an increase in overall percentage yield will also result, as the model is trained on only relevant (more recent) data.

Figure 3.1: Profile of testing data sets
Method  This experiment was run with the previous best settings from the mini-project [14]: five populations trained on different parts of the input data set (months August–September, October–December, January, February–March, April–May), over 40 epochs of training, with 100 individuals per population, an initial mutation rate of 0.10, and allowing interactive switching of oracle predictors between one oracle and five, according to the current performance of the model on the testing set, with the most expert oracle having a weighting of 3 times that of the other oracles.
Two tests were run:

1. Training on the full training set for each testing set:
   EPL0506  English Premier League data from 1993–94 to 2004–05.
   EPL0607  English Premier League data from 1993–94 to 2005–06.
   FLC0506  French Le Championnat data from 1993–94 to 2004–05.
   FLC0607  French Le Championnat data from 1993–94 to 2005–06.
   GB10506  German Bundesliga 1 data from 1993–94 to 2004–05.
   GB10607  German Bundesliga 1 data from 1993–94 to 2005–06.

2. Training on the reduced training set for each testing set:
   EPL0506  English Premier League data from 2000–01 to 2004–05.
   EPL0607  English Premier League data from 2001–02 to 2005–06.
   FLC0506  French Le Championnat data from 2000–01 to 2004–05.
   FLC0607  French Le Championnat data from 2001–02 to 2005–06.
   GB10506  German Bundesliga 1 data from 2000–01 to 2004–05.
   GB10607  German Bundesliga 1 data from 2001–02 to 2005–06.

         Test   EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield    1      17.90%    -0.84%     2.64%    -4.79%   -11.53%    -2.06%     0.22%
         2      18.20%    -5.08%     1.42%    -3.11%    -5.84%     0.67%     1.04%
Std dev  1       3.50%     1.86%     2.96%     2.69%     5.40%     3.91%     3.39%
         2       2.26%     3.69%     3.04%     3.23%     4.02%     4.75%     3.50%
T-test   1,2    0.7190    0.0000    0.1573    0.0504    0.0001    0.0316    0.1597

Table 3.2: Results for reducing extent of input data


Results  It appears from the results in fig 3.2 and table 3.2 that the hypothesis was correct: the average percentage yield was greater when the model was trained on reduced input data, probably due to the removal of older, less-accurate training data, and on four of the data sets (EPL0607, FLC0607, GB10506, and GB10607) the t-test considered the results statistically significant. The reduction in the required number of fitness evaluations per epoch of training also had the additional desired effect of decreasing training time. However, neither technique provides greater yields than the 1.86% returned by simply betting on the home team to win in the six given testing sets.

Figure 3.2: Results for reducing extent of input data

3.3 Converting GA to use a bitstring representation

The current representation and mutation allows, and even encourages, evolved parameter values to be polarised at 0.0 and 1.0, with relatively few values in between, because the parameters are artificially bounded to these values (see fig 3.6(a)). For example, a parameter with value 0.7 and a large positive mutation (greater than 0.3) would be bounded at 1.0 irrespective of the actual mutation value. The same would occur for any parameter value larger than 0.7; the resulting post-mutation value would always be artificially bounded at 1.0. This then increases the likelihood that, after mutation, a parameter will take a value of 1.0. The same effect obviously occurs at the lower bound of 0.0. This effect occurs because the GA essentially "knows", or has been informed of, its boundaries for mutation.
A better representation would not inform the GA during mutation of what the boundaries are. This can be achieved using binary bitstrings, as mutation is instead performed by flipping bits with probability proportional to mutation strength; hence, if the parameter value is currently at an extremity, any further mutation is guaranteed to move the value away from the boundary.
The conversion of the representation from real values to binary means that some of the techniques implemented in the mini-project [14] are no longer appropriate, particularly improved fast evolutionary programming using adaptive Gaussian and Cauchy mutation, as this only finds use in real-valued optimisation. In a bitstring representation, mutation is of course binary, consisting of the flipping of a bit with probability proportional to the mutation rate. Since mutation can only affect a chosen bit in this one discrete way, there is no use for Gaussian or Cauchy-generated continuous random numbers in determining by how much to perform a mutation.
The bitstring representation presents an elegant method of achieving self-adaptive mutation. The current mutation rate can be encoded in the bitstring as just an extra sequence of bits, with no discernible difference from the remainder of the string. The value of these bits can be read before mutating, to obtain the current mutation rate. Mutation, operating on the entire bitstring, then mutates the mutation rate with exactly the same probability as all of the parameter values encoded in the remainder of the bitstring. There is also no need now for a separate home/away win ratio mutation rate, as mutation of the win ratio is performed with the same self-adaptive probability as overall bitstring mutation. Nor is there any need for the bounding n used in self-adapting mutation, as mutation at the upper or lower boundaries of allowed values will always move the value away from the boundary in a Gray-coded bitstring representation. Finally, there is no meaning to be found in the complex and simple schemas for self-adaptation, as these are only applicable to real-valued representations; the discrete nature of a bitstring representation renders the idea essentially useless.
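A minimal sketch of this self-adaptive bitstring mutation in Java (the field layout and the decoding helper are illustrative assumptions, not the project's actual code):

    import java.util.Random;

    public final class BitstringMutation {
        // Mutate a bitstring in place: read the Gray-coded mutation-rate
        // field first, then flip every bit (including the rate field itself)
        // with probability equal to the decoded rate.
        public static void mutate(boolean[] bits, int rateStart, int rateBits, Random rng) {
            double rate = decodeGrayFraction(bits, rateStart, rateBits);
            for (int i = 0; i < bits.length; i++) {
                if (rng.nextDouble() < rate) {
                    bits[i] = !bits[i];
                }
            }
        }

        // Decode a Gray-coded field of the bitstring to a fraction in [0, 1).
        private static double decodeGrayFraction(boolean[] bits, int start, int len) {
            int gray = 0;
            for (int i = 0; i < len; i++) {
                gray = (gray << 1) | (bits[start + i] ? 1 : 0);
            }
            int binary = gray;
            for (int shift = 1; shift < len; shift <<= 1) {
                binary ^= binary >>> shift;
            }
            return binary / (double) (1 << len);
        }
    }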

3.3.1 Different binary precision values

Hypothesis A fractional number between 0 and 1 can be represented in binary by approximating or


rounding to the nearest value when insufficient bits are available to represent the number fully. The
greater the number of bits per number, the smaller the significance embodied in the least-significant
bit, and therefore the greater the range of values between 0 and 1 that can be represented. However,
with greater-precision binary numbers come two potential problems: mutation and crossover will be
proportionally slower, as a greater number of bits must be processed on each operation, and in addition
mutation will proceed more slowly than with smaller-precision bitstrings as the flipping of a bit has less
significance the more bits there are in the representation of the number. This will in turn lead to slower
movement through the search space.
It is hypothesised that for this reason the shorter bitstrings, with a precision of 8 bits per number or
less, will be the more effective, as the range of values which can be represented by 8 bits is likely to be
sufficiently large for the model to be precise. With the higher-precision (longer) bitstrings, e.g. 12 bits
or over, the increased resolution provided by the representation becomes less useful, and the detrimental
effect on the rate of mutation, and hence the rate of exploration of the search space, becomes more
pronounced, meaning it may take greater amounts of training to achieve a similar performance of the
model.
Method  The GA was trained on the six testing/training sets at four different bitstring precisions:

1. 4 bits (resolution down to 1/2^4, or a minimum increment of 0.0625).
2. 6 bits (resolution down to 1/2^6, or a minimum increment of 0.015625).
3. 8 bits (resolution down to 1/2^8, or a minimum increment of 0.00390625).
4. 12 bits (resolution down to 1/2^12, or a minimum increment of 0.000244140625).

Training was performed for 40 epochs, with five populations of 100 individuals, an initial mutation rate of 0.10, and interactive switching of oracles, as in the previous experiment.
In addition, the average fitness value of the best individual in the population trained on November–December training data, over all epochs of training for the EPL0506 testing set, was recorded and plotted.

         Test   EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield    1      17.25%    -2.86%     4.02%    -4.55%    -7.20%     5.40%     2.01%
         2      18.75%    -3.12%     5.22%    -5.00%    -7.26%     2.10%     1.78%
         3      17.97%    -3.57%     4.50%    -6.01%    -7.62%     3.40%     1.45%
         4      19.44%    -2.74%     4.65%    -5.31%    -8.41%     3.84%     1.91%
Std dev  1       4.37%     3.46%     3.87%     2.26%     2.68%     6.44%     3.85%
         2       3.10%     3.59%     0.82%     2.87%     2.48%     2.92%     2.63%
         3       4.69%     0.38%     2.61%     2.25%     0.83%     1.69%     2.41%
         4       1.90%     2.25%     2.37%     2.13%     1.04%     1.24%     1.82%
T-test   1,2    0.1683    0.7955    0.1427    0.5375    0.9347    0.0257    0.4341
         1,3    0.5788    0.3186    0.6092    0.0260    0.5930    0.1449    0.3784
         1,4    0.0001    0.0406    0.0198    0.7643    0.0445    0.2437    0.1855
         2,3    0.4896    0.5403    0.2018    0.1719    0.6351    0.0607    0.3499
         2,4    0.3467    0.6574    0.2613    0.6653    0.0410    0.0099    0.3303
         3,4    0.1548    0.0821    0.8434    0.2633    0.2018    0.3058    0.3085

Table 3.3: Results for different binary precisions


Results The t-test indicates that tests 1 and 4 (4-bit and 12-bit precisions, respectively) are the most
statistically significant, which appears reasonable given the hypothesis. According to the results in fig 3.3
and table 3.3, the shorter (lower-precision) the bitstring, the better the final yield, except in the case of
extremely high-precision (12-bit) bitstrings, which perform on average almost as well as the low-precision
(4-bit) bitstring representation, but with a significantly smaller standard deviation and a much greater
computation time. This is in direct contrast to the actual fitness values observed during training (fig 3.5),
where the performance of the GA in improving fitness is directly proportional to the precision of the
bitstring, and it is also in contrast to the hypothesis which predicted that performance would increase
up to a certain bitstring precision and then begin to decrease again.
Importantly, the fitness results for bitstring representation at any precision (fig 3.5) are much better
than for the results of the real representation (fig 3.4). This is most likely due to the effect that
polarisation has on the real-valued representation as, with a large enough mutation rate and bounds of
0.0 and 1.0, the real-valued representation effectively becomes a 1-bit precision bitstring representation,
and is more prone to variations in the evolved parameter values.
Finally, tests 1 and 4 both achieved higher percentage yields over the six testing sets than naïvely
betting on the home team to win, although as previously discussed, the standard deviation is moderately
high.
Test   Fitness   Std dev
1      157.204   1.798

Table 3.4: Fitness for real-valued representation

Fitness seems to reach a plateau very quickly (fig 3.5), usually around the same value across runs, although the actual evolved representation values differ across runs. This is permissible because the performances of the teams are measured as a ratio between teams, and there are therefore many solutions to the problem which result in the same ratio between teams' parameters.
Despite the increased fitness performance and moderate model performance on the testing sets, longer bitstrings are actually potentially less beneficial to the project, as the computation time is much larger whilst giving very little extra benefit compared to shorter bitstrings. Also, mutations are relatively smaller, as each bit is less significant than in a shorter bitstring, so mutation to achieve a similar level of exploration of the search space will require a larger number of epochs.
Figure 3.3: Binary precision values

Test   Fitness   Std dev
1      152.427   3.766
2      148.939   4.205
3      148.436   4.229
4      147.353   3.912

Tests   T-test
1,2     0.0033
1,3     0.0010
1,4     0.0000
2,3     0.6753
2,4     0.1740
3,4     0.3521

Table 3.5: Fitness for different binary precisions


We can see, however, that the binary bitstring representation does produce a better spread of parameter values than the real representation used in the mini-project [14]. Fig 3.6(a) and fig 3.6(b) show the parameter values of the fittest individuals after 40 epochs of training using a real-valued and a bitstring representation, respectively. We can also see that the final fitness values achieved when using the bitstring representation (table 3.5) are better than those achieved when using the old real-valued representation (table 3.4).

3.3.2 High-precision bitstring

In order to further investigate the difference in performance between low- and high-precision bitstring representations, two sets of identical experiments will be performed: one with a low-precision bitstring representation, and one with a high-precision bitstring representation. The high-precision bitstring tests will be run with a precision of 8 bits, and the low-precision bitstring tests with a precision of 4 bits.
Investigating oracles
Hypothesis The large discrepancy in performance between the EPL0506 and other testing sets, as
seen in fig 3.3 and table 3.3, may be due to the effects of switching oracles after ten losses in every 20
consecutive predictions (as documented in section 2.4.2). It is possible that this particular scheme works
well only for this particular testing set, and is very specific to the set. It is therefore necessary to find a
scheme which works well in general across all the testing sets, or at least as many as possible.
Method  The experiment was run with five separately-trained populations over 40 epochs of training, with 100 individuals per population, an initial mutation rate of 0.10, and the following oracle prediction strategies:

1. One fixed oracle only (i.e. the fittest individual is used as a single predictor, with no switching).
2. Three fixed oracles, with the most expert weighted at 3 times the others.
3. Five fixed oracles, with the most expert weighted at 3 times the others.
4. Five switching oracles, with the most expert weighted at 3 times the others (i.e. the strategy used prior to this experiment).
Results Clearly, on the EPL0506 testing set for which the oracle switching scheme was originally
devised, this technique is successful, returning the highest yield, as seen in the relevant graph in fig 3.7.
However, as seen in table 3.6 and in the other graphs, it is not the most successful global strategy. This
confirms the hypothesis that the particular oracle-switching strategy previously used is only suitable for
a specific training/testing set. Indeed, the most successful global strategy of those tested is to use just
one fixed oracle, with yields approaching those of betting naïvely on home wins across the data set.
These results also show that the greater the number of fixed oracles being used for prediction, the lower
the yield, and the t-test confirms that the difference in results using different numbers of fixed oracles
is statistically significant.

Figure 3.4: Fitness profile for old real-valued representation

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       14.53%     5.66%    -2.21%     6.53%   -12.35%    -1.31%     1.81%
Test 2        8.81%     3.73%    -2.86%     6.13%    -8.35%     0.01%     1.25%
Test 3      -19.58%    -1.79%     0.86%     0.60%    10.64%    -3.60%    -2.15%
Test 4       17.97%    -3.57%     4.50%    -6.01%    -7.62%     3.40%     1.45%
Std dev
Test 1        2.46%     3.07%     1.55%     1.96%     3.33%     2.20%     2.43%
Test 2        5.72%     4.70%     1.75%     1.98%     5.24%     5.19%     4.10%
Test 3        0.00%     0.00%     0.00%     0.00%     0.29%     0.00%     0.05%
Test 4        4.69%     0.38%     2.61%     2.25%     2.83%     1.69%     2.41%
T-test
Tests 1,2     0.0001    0.0925    0.1709    0.4850    0.0025    0.2477    0.1664
Tests 1,3     0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
Tests 1,4     0.0024    0.0000    0.0134    0.5125    0.0000    0.0000    0.0881
Tests 2,3     0.0000    0.0000    0.0000    0.0000    0.0000    0.0019    0.0003
Tests 2,4     0.0000    0.0000    0.0000    0.0000    0.5454    0.0042    0.0916
Tests 3,4     0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000

Table 3.6: Results for different numbers of oracles
Figure 3.5: Fitness profile for new binary representation

It is possible that an alternative strategy of switching oracles may produce a more successful yield
than the strategy employed previously (five oracles switching to one, and vice-versa, after 10 losses in
any 20 consecutive bets), but it is hard to see how this would be any less tailored to just the six
training/testing sets presented here. This could, however, be investigated as part of further work.
Initial mutation rates
Hypothesis The ideal initial mutation rate is likely to be tightly coupled with the precision of the
bitstring, for the reasons discussed in section 3.3.1. It is currently unknown what is a suitable mutation
strength for a bitstring precision of 8 bits, but it is likely to be different to the 0.10 discovered to be
suitable for the real-valued representation in the mini-project [14].
Method The experiment was run with five separately-trained populations over 40 epochs of training,
with 100 individuals per population, with 1 fixed oracle and initial mutation rates as follows:
1. Initial mutation rate of 0.05.
2. Initial mutation rate of 0.10 (the current value).
3. Initial mutation rate of 0.20.
4. Initial mutation rate of 0.30.
Results The results actually show very little change in performance related to mutation rate, including
the fitness values produced during training. The t-test confirms that only the difference between an initial
mutation rate of 0.05 and an initial rate of 0.30 is statistically significant, and even then in only a few
cases.

Figure 3.6: Actual evolved parameter values: (a) real-valued representation; (b) binary representation


Test   Fitness   Std dev
1      149.900   4.732
2      149.029   4.987
3      149.629   4.653
4      150.770   4.507

Tests   T-test
1,2     0.5293
1,3     0.8393
1,4     0.5088
2,3     0.6617
2,4     0.2015
3,4     0.3830

Table 3.7: Fitness for different initial mutation rates



Figure 3.7: Results for different numbers of oracles


Figure 3.9: Results for different initial mutation rates


This is likely to be at least in part due to the self-adapting mutation, which must be settling on a
suitable mutation rate relatively quickly regardless of the initial rate.
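The report does not detail the self-adaptation mechanism itself, so the following sketch shows one common self-adaptive scheme, purely as an assumed illustration of why the initial rate would be forgotten quickly:

    import java.util.Random;

    /** Illustrative sketch of self-adaptive mutation (an assumed scheme:
     *  the rate is known to self-adapt here, but the mechanism is not stated). */
    public class SelfAdaptiveMutation {
        private static final Random RNG = new Random();

        /** Perturb the individual's own mutation rate, then flip bits at the
         *  new rate; whatever the initial rate, it is soon forgotten. */
        public static double mutate(boolean[] genome, double mutationRate) {
            // Log-normal self-adaptation: the rate itself evolves each epoch.
            double newRate = mutationRate * Math.exp(0.1 * RNG.nextGaussian());
            newRate = Math.max(1.0 / genome.length, Math.min(1.0, newRate));
            for (int i = 0; i < genome.length; i++) {
                if (RNG.nextDouble() < newRate) {
                    genome[i] = !genome[i];   // per-bit flip at the adapted rate
                }
            }
            return newRate;                   // stored back on the individual
        }
    }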
The percentage yields recorded for each of the initial mutation rates are fairly similar, although the
yield for an initial mutation rate of 0.20 is highest by a slight amount. For this reason, future experiments
will be run with an initial mutation rate of 0.20.
Performance with different initial mutation rates is likely to be different with a lower-precision bitstring, and this will be investigated later.
Varying number of epochs
Hypothesis The fitness of the best individual in the population is certain to improve the longer the
population is trained for, as the GA employs elitism. As Jennison and Sheehan [8] state, an elitist
algorithm "would be guaranteed to solve ... simple problems because mutation and crossover can be
relied upon to improve the fittest individual, element by element if necessary, until the optimum is
reached". Given a larger number of epochs, the elite individual will have a correspondingly greater
chance of approaching an optimum.
It is unclear, however, what effect this would have on the resulting yields when the model is applied
to the testing data. Extra epochs of training may either improve the model, or alternatively force it
further into a local optimum, as "by increasing selectivity, elitist strategies increase the risk of effective
convergence at an inferior local optimum in problems with multiple local optima" [8].
Method The experiment was run with five separately-trained populations over different numbers of
epochs of training, with 100 individuals per population, 1 fixed oracle, initial mutation rate of 0.20.
Numbers of epochs tested are as follows:
1. 40 epochs (the current number).
2. 60 epochs.
3. 80 epochs.
4. 150 epochs.
Results As fig 3.10 shows, the fitness of the best individual in the population follows an exponential
decrease. Final fitnesses are lower in the cases where more epochs of training are performed, as both the
graph and table 3.9 show. The t-test shows that there is a good likelihood that the different runs come
from the same distribution, which is to be expected if the fitness curves all follow the same approximate
decrease. The best performance on the testing sets is, however, found to occur at only 60 epochs, with
a decrease in performance as the number of epochs of training increases (table 3.10). This is possibly an
effect of over-fitting to the training data once the GA has hit a local optimum and continues to descend
into it. Further work will be needed to enable the algorithm to escape such local optima.
Varying population sizes
Hypothesis With a larger population size it is expected that a greater area of the search space may
be examined upon each epoch, and therefore there is a greater chance of finding and settling into
an optimum. However, with larger population sizes, the computation time is correspondingly greatly
increased as fitness evaluations, mutations, and crossovers must be performed more often on every epoch.
If there is any benefit to be found in varying the population size, it is likely to emerge from having larger
populations, as the increased spread of individuals across the search space increases the chance of finding
optima.
Method The experiment was run with five separately-trained populations over 60 epochs of training,
with varying numbers of individuals per population, 1 fixed oracle, and initial mutation rate of 0.20.
Sizes of populations are as follows:
1. 50 individuals.
2. 100 individuals (the current number).
3. 200 individuals.
4. 300 individuals.

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       13.62%     6.00%    -2.83%     6.01%   -12.45%    -1.55%     1.46%
Test 2       14.20%     6.63%    -2.82%     6.30%   -12.51%    -1.36%     1.74%
Test 3       13.99%     6.41%    -2.65%     6.39%   -12.09%    -1.46%     1.76%
Test 4       13.72%     6.65%    -2.78%     6.16%   -12.23%    -1.06%     1.74%
Std dev
Test 1        2.31%     3.02%     1.49%     1.63%     0.65%     1.61%     1.79%
Test 2        0.36%     1.35%     0.50%     0.72%     0.93%     1.52%     0.90%
Test 3        2.97%     1.39%     0.11%     0.76%     0.96%     1.49%     1.28%
Test 4        2.96%     0.74%     0.91%     0.69%     1.92%     1.75%     1.50%
T-test
Tests 1,2     0.2229    0.3443    0.9739    0.4234    0.8184    0.6675    0.5751
Tests 1,3     0.6234    0.5421    0.5462    0.2990    0.1274    0.8335    0.4953
Tests 1,4     0.2317    0.0000    0.0177    0.0001    0.5928    0.3106    0.1921
Tests 2,3     0.7251    0.5635    0.1026    0.6665    0.1292    0.8172    0.5007
Tests 2,4     0.4308    0.9698    0.8320    0.4969    0.5288    0.5266    0.6308
Tests 3,4     0.7531    0.4563    0.4877    0.2749    0.7456    0.3944    0.5187

Table 3.8: Results for different initial mutation rates

Figure 3.10: Fitness profile for different numbers of epochs of training

Figure 3.11: Results for training over different numbers of epochs


Test   Fitness   Std dev
1      148.912   4.645
2      148.200   5.017
3      147.759   5.117
4      146.163   4.429

Tests   T-test
1,2     0.6050
1,3     0.4082
1,4     0.0374
2,3     0.7595
2,4     0.1347
3,4     0.2445

Table 3.9: Fitness for different numbers of epochs

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       13.82%     6.78%    -2.69%     6.03%   -13.20%    -1.48%     1.54%
Test 2       13.78%     6.70%    -2.60%     6.48%   -12.38%    -0.71%     1.88%
Test 3       13.93%     6.24%    -2.43%     6.57%   -12.83%    -1.46%     1.67%
Test 4       12.97%     5.71%    -2.91%     6.41%   -12.92%    -0.73%     1.42%
Std dev
Test 1        1.39%     0.74%     0.54%     0.91%     1.45%     1.43%     1.07%
Test 2        1.36%     0.90%     2.12%     0.86%     1.10%     2.00%     1.39%
Test 3        2.75%     1.08%     1.76%     1.08%     1.60%     1.16%     1.57%
Test 4        5.63%     2.80%     1.15%     1.38%     1.82%     2.12%     2.48%
T-test
Tests 1,2     0.9153    0.7262    0.8313    0.0830    0.0296    0.1235    0.4515
Tests 1,3     0.8522    0.0447    0.4870    0.0635    0.3958    0.9694    0.4688
Tests 1,4     0.0000    0.0000    0.0004    0.0440    0.5605    0.1510    0.1260
Tests 2,3     0.7993    0.1094    0.7665    0.7371    0.2570    0.1091    0.4631
Tests 2,4     0.4927    0.1044    0.5224    0.8431    0.2100    0.9665    0.5232
Tests 3,4     0.4474    0.3868    0.2631    0.6544    0.8430    0.1372    0.4553

Table 3.10: Results for different numbers of epochs


Figure 3.12: Fitness profile for different population sizes

Test   Fitness   Std dev
1      151.618   4.412
2      147.827   4.798
3      145.538   4.358
4      145.119   5.036

Tests   T-test
1,2     0.0055
1,3     0.0000
1,4     0.0000
2,3     0.0839
2,4     0.0575
3,4     0.7547

Table 3.11: Fitness for different population sizes


Results The fitness of the population, as predicted, benefits from the increased population size (fig 3.12).
The performance of the algorithm also shows a general, but only slight, increase when trained with a
larger population (table 3.12). However, the computation cost of training increases dramatically with
large numbers of individuals, for only a correspondingly small increase in performance, leading to the
conclusion that keeping the population size at a manageable 100 individuals incurs only a small
performance decrease at a much lower computation cost.

Figure 3.13: Results for different population sizes


Crossover schemas
Hypothesis Currently each new generation is produced from two parents by selecting blocks of n bits
corresponding to real numbers in the bitstring from either parent, and combining these using multi-point
crossover into a new individual (fig 3.14). An alternative method of crossover could take all four of the
parameters representing a team's performance as a single item to be crossed over (fig 3.15); a code sketch
of both schemas follows the figures. This may provide extra benefits, as the offspring receives complete
sets of attributes for each team which could already be weighted to work well together. Conversely, by
using parameter crossover, it is possible that a well-evolved team representation could be destroyed and
lost from the model if its parameters are divided up amongst other individuals in the offspring population.
Parent 1: 1010 0110 0101 0110 1110 0010 0010 0011 1100 0101 0010 0101 . . .
Parent 2: 0110 1010 1100 1010 0101 0100 1110 0000 0001 1100 1100 1001 . . .
Offspring: 1010 1010 1100 0110 1110 0100 0010 0000 1100 0101 1100 1001 . . .
Figure 3.14: Multi-point parameter crossover

Parent 1: 1101 0110 1011 0100 0110 1010 0110 0011 0100 1110 0010 0001 . . .
Parent 2: 0010 0110 1010 1101 1011 0110 0101 1101 1110 0010 0011 0001 . . .
Offspring: 1101 0110 1011 0100 1011 0110 0101 1101 0100 1110 0010 0001 . . .
Figure 3.15: Multi-point team crossover
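Purely as an illustration (the class and method names are assumptions, not the project's code), both schemas can be expressed as multi-point crossover with different atomic block sizes:

    import java.util.Random;

    /** Illustrative sketch of the two multi-point crossover schemas:
     *  parameter crossover inherits each n-bit parameter from either parent,
     *  team crossover inherits a team's whole block of four parameters. */
    public class Crossover {
        private static final Random RNG = new Random();

        /** Copy each block of blockBits bits from a randomly chosen parent. */
        public static boolean[] crossover(boolean[] p1, boolean[] p2, int blockBits) {
            boolean[] child = new boolean[p1.length];
            for (int start = 0; start < p1.length; start += blockBits) {
                boolean[] source = RNG.nextBoolean() ? p1 : p2;  // pick a parent per block
                int end = Math.min(start + blockBits, p1.length);
                System.arraycopy(source, start, child, start, end - start);
            }
            return child;
        }

        public static boolean[] parameterCrossover(boolean[] p1, boolean[] p2, int precision) {
            return crossover(p1, p2, precision);       // one parameter per block (fig 3.14)
        }

        public static boolean[] teamCrossover(boolean[] p1, boolean[] p2, int precision) {
            return crossover(p1, p2, 4 * precision);   // four parameters per team (fig 3.15)
        }
    }

With an 8-bit precision, parameter crossover therefore recombines 8-bit blocks while team crossover recombines 32-bit blocks, which is what preserves co-adapted sets of team attributes.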
Another experiment of interest would be to ascertain the usefulness of employing elitism in the training
algorithm. The current strategy is for the fittest individual to be copied from the parent population to the
offspring population, and two further parents are then chosen from the parent population for crossover.
As mentioned before, Jennison and Sheehan [8] state that an elitist algorithm "would be guaranteed to
solve simple problems" but that "by increasing selectivity, elitist strategies increase the risk of effective
convergence at an inferior local optimum in problems with multiple local optima". As the search space
has many local optima, it is not known whether there will be a detrimental effect on training if elitism
is used. Therefore, by comparing results when training with elitism to results without elitism, it can be
determined what effect elitism has.
Method The experiment was run with five separately-trained populations over 60 epochs of training,
with 100 individuals per population, 1 fixed oracle, and initial mutation rate of 0.20. Crossover schemas
used are as follows:
1. Parameter crossover with elitism (the current schema).
2. Team crossover with elitism.
3. Parameter crossover without elitism.
4. Team crossover without elitism.
Results The fitness values are only marginally different between the two types of crossover (fig 3.16
and table 3.13), but there is a clear fitness advantage to employing elitism. The non-elitist algorithms
record a slower improvement in fitness over the epochs of training, which is to be expected, as elitism is
known to force the training algorithm into an optimum (whether local or global).
More interestingly, the average yield suffers slightly when elitism is not employed (table 3.14), indicating that elitism is beneficial to the training, and that the search space either has one deep global
optimum or plateaus of similar-depth local optima. This second scenario seems more plausible, as has
been discussed before, because the representation takes the form of a ratio of performances between
teams. If one optimum contains good values to describe these ratios then there are of course a number
of other multiples which still maintain the same ratio between the values. This would lead to a fitness
landscape with numerous same-depth local optima around a plateau.

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       13.80%     5.76%    -2.08%     6.23%   -12.26%    -2.16%     1.55%
Test 2       13.53%     6.11%    -2.47%     6.44%   -12.45%    -1.53%     1.61%
Test 3       12.86%     6.61%    -3.51%     6.40%   -12.84%    -0.97%     1.42%
Test 4       14.04%     6.24%    -2.72%     6.03%   -12.88%    -0.51%     1.70%
Std dev
Test 1        1.16%     2.90%     2.15%     0.77%     1.56%     2.74%     1.88%
Test 2        2.92%     1.69%     1.18%     0.81%     0.86%     1.64%     1.52%
Test 3        5.85%     1.11%     1.30%     0.72%     1.64%     2.27%     2.15%
Test 4        2.13%     0.98%     1.34%     3.20%     1.47%     2.17%     1.88%
T-test
Tests 1,2     0.6673    0.5991    0.4309    0.3582    0.6093    0.3305    0.4992
Tests 1,3     0.4363    0.1801    0.0067    0.4311    0.2090    0.1029    0.2277
Tests 1,4     0.0044    0.0000    0.0261    0.0000    0.1554    0.0229    0.0348
Tests 2,3     0.6115    0.2277    0.0045    0.8549    0.2963    0.3275    0.3871
Tests 2,4     0.4842    0.7558    0.4806    0.5416    0.2094    0.0682    0.4233
Tests 3,4     0.3506    0.2144    0.0398    0.5794    0.9235    0.4653    0.4288

Table 3.12: Results for different population sizes

Figure 3.16: Fitness profile for different crossover schemas

Figure 3.17: Results for different crossover schemas


Test   Fitness   Std dev
1      148.429   4.921
2      148.859   4.478
3      152.188   7.415
4      151.511   5.598

Tests   T-test
1,2     0.7480
1,3     0.0407
1,4     0.0442
2,3     0.0619
2,4     0.0708
3,4     0.7172

Table 3.13: Fitness for different crossover schemas

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       13.68%     6.46%    -3.16%     6.60%   -12.37%    -1.56%     1.61%
Test 2       14.46%     6.32%    -2.68%     6.20%   -12.68%    -1.58%     1.68%
Test 3       13.48%     6.12%    -3.01%     6.20%   -12.26%    -1.22%     1.55%
Test 4       14.18%     5.84%    -2.81%     6.23%   -12.22%    -1.79%     1.57%
Std dev
Test 1        3.37%     2.49%     2.70%     1.49%     1.23%     1.24%     2.09%
Test 2        1.51%     1.89%     0.57%     1.00%     1.50%     1.61%     1.35%
Test 3        2.32%     1.88%     0.86%     0.79%     0.77%     1.56%     1.36%
Test 4        0.39%     2.57%     2.12%     0.64%     1.22%     0.85%     1.30%
T-test
Tests 1,2     0.2942    0.8193    0.3850    0.2746    0.4360    0.9507    0.5266
Tests 1,3     0.8109    0.5838    0.7899    0.2464    0.7144    0.4084    0.5923
Tests 1,4     0.0000    0.8803    0.2444    0.0001    0.6737    0.4366    0.3725
Tests 2,3     0.0825    0.7077    0.1100    1.0000    0.2295    0.4293    0.4265
Tests 2,4     0.3690    0.4590    0.7643    0.9001    0.2485    0.5668    0.5513
Tests 3,4     0.1493    0.6676    0.6617    0.8834    0.8904    0.1186    0.5618

Table 3.14: Results for different crossover schemas

Test 2 (team crossover with elitism) is the most successful strategy by a small margin, but, more
usefully, the standard deviation for this crossover schema is lower than that of the original parameter
crossover schema.

3.3.3 Low-precision bitstring

As discussed before, a similar set of experiments will now be performed using lower-precision bitstring
representations, in order to further investigate the effect of precision in the representation. As discovered
in experiment 3.3, 4-bit precision bitstrings achieved a high performance compared to other precisions
which were tested, so the following branch of experiments will be performed with 4-bit representations.
As was discovered in experiment 3.7, the switching of oracles according to current performance of
the trained model is dataset-dependent, and so this aspect of low-precision representations will not be
investigated further.
Initial mutation rates
Hypothesis The required initial mutation rate for successful training is likely to be smaller than with
the high-precision bitstring, as fewer changes (mutations) are required with shorter bitstrings to achieve
a similar magnitude of change in the corresponding real value.
Method The experiment was run with five separately-trained populations over 40 epochs of training,
with 100 individuals per population, with 1 fixed oracle and initial mutation rates as follows:
1. Initial mutation rate of 0.05.
2. Initial mutation rate of 0.10 (the current value).
3. Initial mutation rate of 0.20.
4. Initial mutation rate of 0.30.

Test   Fitness   Std dev
1      156.551   0.932
2      153.139   2.978
3      152.057   4.064
4      152.581   3.995

Tests   T-test
1,2     0.0000
1,3     0.0000
1,4     0.0000
2,3     0.2891
2,4     0.5788
3,4     0.6478

Table 3.15: Fitness for different initial mutation rates

Results The results in fig 3.18 and table 3.15 show that fitness is much more dependent on the initial
mutation rate than with longer bitstrings (see fig 3.8). This, as mentioned before, is likely to be due to
the greater significance of a single mutation operation on a 4-bit number compared to a single mutation
operation on an 8-bit number.
The performances recorded in fig 3.19 and table 3.16 show that there is a large drop in performance
on the EPL0506 testing set with an initial mutation rate of 0.05, although the results on the remaining
testing sets are actually better than for any other mutation rate value tested. However, the standard
deviation is very large indeed, making these results unreliable. The mutation rate value with the best
percentage yield and the lowest standard deviation is 0.10, which is a lower value than that found for
the high-precision bitstrings, as predicted in the hypothesis.

Figure 3.18: Fitness profile for different initial mutation rates

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1        0.98%     5.34%    -0.87%     3.36%    -6.35%    -0.17%     0.38%
Test 2       13.99%     5.91%    -2.85%     6.52%   -11.90%    -1.22%     1.74%
Test 3       13.79%     4.53%    -2.98%     5.55%   -12.85%    -0.43%     1.27%
Test 4       12.22%     3.79%    -2.10%     5.94%   -12.72%    -1.53%     0.93%
Std dev
Test 1       11.06%     5.01%     3.91%     4.98%     7.77%     6.80%     6.59%
Test 2        3.29%     3.67%     1.53%     1.05%     1.87%     1.55%     2.16%
Test 3        2.79%     5.16%     2.96%     2.77%     2.00%     4.65%     3.39%
Test 4        4.51%     5.17%     3.32%     3.43%     3.09%     2.69%     3.70%
T-test
Tests 1,2     0.0000    0.6445    0.0246    0.0045    0.0018    0.4573    0.1888
Tests 1,3     0.0000    0.5784    0.0368    0.0624    0.0004    0.8746    0.2588
Tests 1,4     0.0000    0.8825    0.4302    0.0744    0.0006    0.3584    0.2910
Tests 2,3     0.8199    0.2809    0.8477    0.1103    0.0904    0.4265    0.4293
Tests 2,4     0.1191    0.1020    0.3085    0.4306    0.2611    0.6194    0.3068
Tests 3,4     0.1452    0.6167    0.3249    0.6532    0.8695    0.3118    0.4869

Table 3.16: Results for different initial mutation rates

Figure 3.19: Results for different initial mutation rates


Varying number of epochs


Hypothesis Fitness is again expected to improve with increased epochs of training, although as
mentioned before, performance of the trained model will depend on whether the GA descends into a local or
a global optimum. It is predicted that the lower-precision bitstring will converge more quickly to an
optimum than the high-precision bitstring; or rather, that within the same number of epochs, the achieved
fitness will be better.
Method The experiment was run with five separately-trained populations over different numbers of
epochs of training, with 100 individuals per population, 1 fixed oracle, and an initial mutation rate of
0.10. Numbers of epochs tested are as follows:
1. 40 epochs (the current number).
2. 60 epochs.
3. 80 epochs.
4. 150 epochs.

Figure 3.20: Fitness profile for different numbers of epochs of training


Results Fitness is again seen to improve with more epochs of training, but an unexpected observation is
that the low-precision representation plateaus frequently, with a fast initial fitness improvement followed
by long stretches of no improvement. Final fitness values are not significantly higher than those
recorded with the high-precision bitstring representation, although the standard deviations are lower.
The best model performance was obtained with 80 epochs of training, possibly due to over-fitting to
the training data over longer runs of training, or to further descent into a local optimum rather than a
global optimum. More epochs of training appear to be required for the model to reach a performance
comparable to the high-precision bitstring representation, as the lower mutation rate leads to slower
descent into optima.

Figure 3.21: Results for training over different numbers of epochs


Test   Fitness   Std dev
1      150.980   4.963
2      150.042   4.528
3      152.312   4.081
4      155.425   3.128

Tests   T-test
1,2     0.4885
1,3     0.3054
1,4     0.0005
2,3     0.0688
2,4     0.0000
3,4     0.0041

Table 3.17: Fitness for different numbers of epochs

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       12.76%     4.97%    -2.85%     5.83%   -12.16%    -1.25%     1.22%
Test 2        9.70%     3.65%    -2.15%     6.25%   -12.41%    -0.12%     0.82%
Test 3       13.79%     4.94%    -3.01%     6.67%   -12.08%    -2.00%     1.38%
Test 4       12.40%     4.80%    -3.72%     5.49%   -12.98%    -1.33%     0.78%
Std dev
Test 1        5.42%     3.87%     2.78%     3.06%     2.52%     2.39%     3.34%
Test 2        8.87%     6.16%     2.92%     4.28%     3.13%     3.71%     4.85%
Test 3        3.52%     3.87%     0.86%     1.68%     1.09%     1.09%     2.02%
Test 4        3.47%     3.30%     2.60%     2.07%     2.32%     1.82%     2.60%
T-test
Tests 1,2     0.1501    0.3669    0.3870    0.6881    0.7488    0.2070    0.4247
Tests 1,3     0.4290    0.9735    0.7846    0.2353    0.8895    0.1658    0.5796
Tests 1,4     0.0333    0.4444    0.7536    0.0604    0.2328    0.9013    0.4043
Tests 2,3     0.0404    0.3803    0.1666    0.6534    0.6165    0.0219    0.3132
Tests 2,4     0.1676    0.4153    0.0496    0.4280    0.4694    0.1531    0.2805
Tests 3,4     0.1658    0.8919    0.2040    0.0318    0.0868    0.1224    0.2505

Table 3.18: Results for different numbers of epochs

Varying population sizes


Hypothesis Larger population sizes are again expected to perform more successfully than smaller
population sizes, due to the larger coverage of the search space and the greater population diversity
afforded, which helps avoid local optima.
Method The experiment was run with five separately-trained populations over 80 epochs of training,
with varying numbers of individuals per population, 1 fixed oracle, and initial mutation rate of 0.10.
Sizes of populations are as follows:
1. 50 individuals.
2. 100 individuals (the current number).
3. 200 individuals.
4. 300 individuals.

Figure 3.22: Fitness profile for different population sizes

Results Fig 3.22 shows that, as predicted, fitness improves in direct correlation with the number of
individuals in the population, with the greatest descent into optima occurring with the larger population
sizes. The standard deviation of the fitness results does, however, increase with population size.
Unexpectedly, the best performance of the model does not come with the largest tested population
size (table 3.20), although the results for the lower sizes of 50 and 100 individuals are poor compared
to the larger sizes, as predicted. The best combination of percentage yield and lower standard deviation
occurs with a population size of 200 individuals. It is possible that the slightly lower percentage yield
recorded for the largest population size is an artifact of the large standard deviation. In addition, the
t-test shows that the difference between the results of tests 3 and 4 is more statistically significant
than for any other pair of tests; the reason for this is unknown.

Figure 3.23: Results for different population sizes


Test   Fitness   Std dev
1      152.854   3.824
2      152.240   3.973
3      151.490   4.825
4      148.275   5.209

Tests   T-test
1,2     0.5803
1,3     0.2738
1,4     0.0009
2,3     0.5514
2,4     0.0041
3,4     0.0282

Table 3.19: Fitness for different population sizes

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       12.86%     2.66%    -3.26%     5.81%   -11.99%    -1.55%     0.76%
Test 2       11.14%     3.75%    -3.45%     5.40%   -12.13%    -1.18%     0.59%
Test 3       13.18%     4.49%    -1.62%     5.81%   -11.81%    -0.93%     1.52%
Test 4       12.75%     3.38%    -2.84%     6.87%   -11.88%    -1.71%     1.09%
Std dev
Test 1        4.91%     5.51%     2.55%     3.03%     1.74%     1.87%     3.27%
Test 2        5.46%     6.45%     4.22%     2.89%     2.87%     2.84%     4.12%
Test 3        4.50%     5.41%     3.44%     3.98%     3.17%     2.29%     3.80%
Test 4        3.82%     5.59%     3.09%     2.42%     2.43%     3.53%     3.48%
T-test
Tests 1,2     0.2459    0.5235    0.8527    0.6259    0.8326    0.5944    0.6125
Tests 1,3     0.8117    0.2420    0.0619    0.9983    0.8095    0.3029    0.5377
Tests 1,4     0.2282    0.9446    0.3502    0.2759    0.8591    0.8396    0.5829
Tests 2,3     0.1553    0.6627    0.1008    0.6797    0.7113    0.7324    0.5071
Tests 2,4     0.2330    0.8306    0.5628    0.0577    0.7415    0.5629    0.4981
Tests 3,4     0.7166    0.4805    0.1957    0.2634    0.9315    0.3597    0.4912

Table 3.20: Results for different population sizes

Crossover schemas
Hypothesis Using the same alternative crossover schemas as experiment 3.3.2, with and without
elitism, we should expect to see the same result: that team crossover with elitism is the most successful
strategy.
Method The experiment was run with five separately-trained populations over 60 epochs of training,
with 100 individuals per population, 1 fixed oracle, and initial mutation rate of 0.10. Crossover schemas
used are as follows:
1. Parameter crossover with elitism (the current schema).
2. Team crossover with elitism.
3. Parameter crossover without elitism.
4. Team crossover without elitism.

Figure 3.24: Fitness profile for different crossover schemas

Results As predicted, team crossover with elitism is again the most successful strategy for improving
fitness, with the non-elitist strategies faring less well (fig 3.24 and table 3.21). However, unexpectedly,
this was one of the least successful strategies at producing high percentage yields. The t-test (table 3.22)
shows that there is little difference of statistical significance between team and parameter crossover
when elitism is employed (tests 1 and 2), and the standard deviations of all the tests are high. These
results are inconclusive, and may require further testing.
Compared to the overall results of experiments using a high-precision bitstring representation, the
low-precision representation's standard deviations are generally larger, although yields vary between
being better and worse on each experiment. In conclusion, the results seem to indicate that using 8-bit
precision is more successful, compared to using 4-bit precision, than first thought.

Figure 3.25: Results for different crossover schemas


Test   Fitness   Std dev
1      150.980   4.963
2      150.042   4.528
3      152.312   4.081
4      155.425   3.128

Tests   T-test
1,2     0.4885
1,3     0.3054
1,4     0.0005
2,3     0.0688
2,4     0.0000
3,4     0.0041

Table 3.21: Fitness for different crossover schemas

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       12.76%     4.97%    -2.85%     5.83%   -12.16%    -1.25%     1.22%
Test 2        9.70%     3.65%    -2.15%     6.25%   -12.41%    -0.12%     0.82%
Test 3       13.79%     4.94%    -3.01%     6.67%   -12.08%    -2.00%     1.38%
Test 4       12.40%     4.80%    -3.72%     5.49%   -12.98%    -1.33%     0.78%
Std dev
Test 1        5.42%     3.87%     2.78%     3.06%     2.52%     2.39%     3.34%
Test 2        8.87%     6.16%     2.92%     4.28%     3.13%     3.71%     4.85%
Test 3        3.52%     3.87%     0.86%     1.68%     1.09%     1.09%     2.02%
Test 4        3.47%     3.30%     2.60%     2.07%     2.32%     1.82%     2.60%
T-test
Tests 1,2     0.1501    0.3669    0.3870    0.6881    0.7488    0.2070    0.4247
Tests 1,3     0.4290    0.9735    0.7846    0.2353    0.8895    0.1658    0.5796
Tests 1,4     0.0333    0.4444    0.7536    0.0604    0.2328    0.9013    0.4043
Tests 2,3     0.0404    0.3803    0.1666    0.6534    0.6165    0.0219    0.3132
Tests 2,4     0.1676    0.4153    0.0496    0.4280    0.4694    0.1531    0.2805
Tests 3,4     0.1658    0.8919    0.2040    0.0318    0.0868    0.1224    0.2505

Table 3.22: Results for different crossover schemas

3.4 Altering population extents for testing data

Hypothesis A common feature of the results of all the experiments performed so far is that percentage
yields for the German Bundesliga (GB1) testing sets are significantly lower than for the other four testing
sets. This could be due to fundamental differences in the quality of the league, for example matches
being inherently less predictable than in the English and French leagues, but there may equally be a
much simpler reason that can be analysed computationally.
The German football league has a different fixture format compared to the English and French
leagues. The German football season, like the English one, starts in August (although the French league
starts in late July), and all three leagues last until late May; but in Germany there is a winter break of
six weeks from mid-December through to the end of January, in the French league there is a three-week
break from the last week of December through to the second week of January, and in the English league
December–January is in fact one of the busiest times of the season in terms of fixture schedules. The
extents for the five expert populations were originally chosen in the mini-project to fit the English
league fixture format [14], but given the differences in the French and German leagues, the extents of
the training data used for training each population should be changed to reflect each league's fixture
schedule. Appendix B outlines the new extents.
Method The experiment was run with an 8-bit precision bitstring representation, with separately-trained
populations over 60 epochs of training, 100 individuals per population, 1 fixed oracle, an initial
mutation rate of 0.20, and team crossover with elitism. Populations were initialised with the
following extents:
1. Old population extents (same for all testing sets).
2. New population extents (as defined in Appendix B).

Figure 3.26: Fitness profile for different population training data extents


Figure 3.27: Results for different population training data extents


Results The differences between the original and league-tailored population extents are surprisingly
minimal. The two sets of results (table 3.24 and fig 3.27) show almost identical percentage yields and
standard deviations. According to the t-test, none of the differences between the two experiments are
statistically significant, with the most statistical significance attaching to the GB10506 testing set,
which shows a small improvement when using tailored population extents.
Fitness during training of the November–January population shows a considerable improvement once
the fitness values are scaled to equivalent starting points (the fitnesses in the second population were
all of larger value before scaling, due to the extra number of training data points now included in the
population through changing the population extents). This almost certainly results from the greater
number of training cases available to the genetic algorithm in this population. The t-test reports a
moderate level of statistical significance (table 3.23 and fig 3.26).

3.5 Predicting league tables

Hypothesis It is a relatively easy extension to the Gambler class of the program to sum the predicted
numbers of wins, losses, and draws awarded to each team over the course of a season's betting in order
to ascertain the number of points the team would earn during the season, and therefore the predicted
final ranking of the league table. This has benefits for people interested in gambling, as there is money
to be made in predicting the eventual champions and losers of the league each season, as well as being
a very useful indicator of the long-term performance of the system. Tsakonas et al. [15] also used their
system to attempt to predict league tables, with some success.
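A minimal sketch of such an extension (hypothetical names; the report only states that the Gambler class is extended) accumulates 3 points per predicted win and 1 per predicted draw:

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch: turn predicted win/draw counts into a league
     *  table using 3 points for a win and 1 for a draw. */
    public class PredictedLeagueTable {
        private final Map<String, Double> points = new HashMap<>();

        /** Counts may be fractional, as predictions are averaged over runs. */
        public void addPredicted(String team, double wins, double draws) {
            points.merge(team, 3 * wins + draws, Double::sum);
        }

        /** Print the predicted final ranking, sorted by descending points. */
        public void printTable() {
            points.entrySet().stream()
                  .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                  .forEach(e -> System.out.printf("%-16s %6.2f%n", e.getKey(), e.getValue()));
        }
    }

This matches how the Pts columns in the tables below are derived; for example, 3 x 17.84 predicted wins plus 2.4 predicted draws gives 55.92 points.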
Method The experiment was run with an 8-bit precision bitstring representation, with separately-trained
populations using the new individually-tailored extents, over 60 epochs of training, with 100 individuals
per population, 1 fixed oracle, an initial mutation rate of 0.20, and using team crossover with elitism.
Results The actual final league tables for each of the six data sets can be seen in tables 3.25, 3.26,
3.27, 3.28, 3.29, and 3.30, alongside the predicted final league tables as produced by the genetic algorithm.
In cases where a team was new to the league for a particular season, no predictions were made for that team,
with the remaining teams therefore playing two fewer matches per new team. The new teams' places are
inserted into the predicted table for reference, but ignored when calculating differences in final positions
between the actual and predicted results. Abbreviations for the table headings are as follows:
W(h) Number of games won (at home),
W(a) Number of games won (away),
D Number of games drawn,
L(h) Number of games lost (at home),
L(a) Number of games lost (away),
Pts Total points tally over the season,
P Number of games played,
+/- Difference between predicted and actual final position of the team.
The predicted league tables are moderately accurate in terms of their final placings of teams in the
league, with several predictions correct and others only a small number of places away from the correct
position. This is most evident in the English Premier League tables (3.25 and 3.26), but still occurs
throughout the testing data sets, with perhaps less accuracy in the French and German league tables
but still some generally-correct predictions. The average error across the teams in each table has been
calculated at the foot of every table by summing the total number of places by which each team is
misplaced, and dividing the result by the number of teams involved in the predictions (that is, not
including teams for which no training data was available, and which appear as a row of zeroes in the
league tables).

Test   Fitness   Std dev
1      70.054    1.912
2      69.127    2.231

Tests   T-test
1,2     0.1213

Table 3.23: Fitness for different population training data extents

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       13.14%     5.82%    -2.42%     6.12%   -11.13%    -1.89%     1.61%
Test 2       13.09%     5.54%    -2.58%     5.77%    -9.64%    -2.40%     1.63%
Std dev
Test 1        2.23%     2.61%     1.90%     2.23%     3.50%     2.50%     2.54%
Test 2        2.54%     1.13%     1.75%     4.42%     2.36%     2.43%     2.35%
T-test
Tests 1,2     0.9402    0.6793    0.7803    0.5040    0.1431    0.5537    0.6001

Table 3.24: Results for different population training data extents

Actual final table:
Team            W   D   L   Pts  P
Chelsea         29  4   5   91   38
ManUnited       25  8   5   83   38
Liverpool       25  7   6   82   38
Arsenal         20  7   11  67   38
Tottenham       18  11  9   65   38
Blackburn       19  6   13  63   38
Newcastle       17  7   14  58   38
Bolton          15  11  12  56   38
WestHam         16  7   15  55   38
Wigan           15  6   17  51   38
Everton         14  8   16  50   38
Fulham          14  6   18  48   38
Charlton        13  8   17  47   38
Middlesbrough   12  9   17  45   38
ManCity         13  4   21  43   38
AstonVilla      10  12  16  42   38
Portsmouth      10  8   20  38   38
Birmingham      8   10  20  34   38
WestBrom        7   9   22  30   38
Sunderland      3   6   29  15   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
Arsenal         17.84  0     2.4   0     15.76  55.92  36   +3
Chelsea         17.88  0     1.84  0     16.28  55.48  36   -1
ManUnited       17.96  0     1.36  0.04  16.68  55.24  36   -1
Liverpool       17.8   0     1.8   0     16.4   55.2   36   -1
AstonVilla      17.88  0     0.52  0     17.6   54.16  36   +10
Blackburn       17.56  0     1.4   0     17.04  54.08  36   =
Newcastle       17.88  0     0.4   0     17.72  54.04  36   =
WestHam         17.84  0     0.24  0     17.92  53.76  36   +1
Portsmouth      17.52  0     0.72  0     17.76  53.28  36   +7
Wigan           0      0     0     0     0      0      0
Tottenham       17.48  0.04  0.68  0     17.84  53.24  36   -5
Bolton          17.36  0     0.88  0     17.76  52.96  36   -3
Sunderland      17.36  0     0.88  0     17.76  52.96  36   +7
Birmingham      17.36  0     0.76  0     17.88  52.84  36   +4
Everton         17.24  0     1.08  0     17.68  52.8   36   -4
ManCity         17.24  0     0.88  0     17.88  52.6   36   -1
Fulham          16.88  0     1.52  0     17.6   52.16  36   -5
WestBrom        16.92  0     1.08  0     18     51.84  36   +1
Middlesbrough   16.8   0     1.4   0     17.8   51.8   36   -5
Charlton        16.44  0     1.68  0     17.88  51     36   -7
Avg error per team playing: 3.47

Table 3.25: English Premier League 2005–2006 final and predicted final league table

Actual final table:
Team            W   D   L   Pts  P
ManUnited       28  5   5   89   38
Chelsea         24  11  3   83   38
Liverpool       20  8   10  68   38
Arsenal         19  11  8   68   38
Tottenham       17  9   12  60   38
Everton         15  13  10  58   38
Bolton          16  8   14  56   38
Reading         16  7   15  55   38
Portsmouth      14  12  12  54   38
Blackburn       15  7   16  52   38
AstonVilla      11  17  10  50   38
Middlesbrough   12  10  16  46   38
Newcastle       11  10  17  43   38
ManCity         11  9   18  42   38
WestHam         12  5   21  41   38
Fulham          8   15  15  39   38
Wigan           10  8   20  38   38
SheffieldUnited 10  8   20  38   38
Charlton        8   10  20  34   38
Watford         5   13  20  28   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
Arsenal         16     0     1.64  0     14.36  49.64  32   +3
Chelsea         15.84  0     1.28  0     14.88  48.8   32   -1
ManUnited       15.56  0     1.88  0.04  14.56  48.56  32   -2
Portsmouth      15.56  0     1.4   0     15.04  48.08  32   +4
Newcastle       15.72  0     0.92  0     15.36  48.08  32   +7
Liverpool       15.28  0     2.12  0.04  14.6   47.96  32   -3
Everton         15.6   0     0.84  0.04  15.56  47.64  32   -1
Reading         0      0     0     0     0      0      0
WestHam         15.32  0     1.4   0     15.28  47.36  32   +6
ManCity         15.08  0.2   1.36  0     15.52  47.2   32   +4
Tottenham       15.28  0     1.32  0     15.4   47.16  32   -5
Blackburn       15.24  0     1.36  0.04  15.36  47.08  32   -2
Middlesbrough   15.12  0     1.36  0.04  15.52  46.72  32   -1
Fulham          14.88  0     1.72  0     15.4   46.36  32   +2
Charlton        14.84  0     1.6   0     15.56  46.12  32   +3
AstonVilla      14.92  0     1.32  0     15.76  46.08  32   -5
Wigan           14.76  0     1.6   0     15.64  45.88  32   =
SheffieldUnited 0      0     0     0     0      0      0
Bolton          14.68  0     1.44  0     15.88  45.48  32   -10
Watford         0      0     0     0     0      0      0
Avg error per team playing: 3.47

Table 3.26: English Premier League 2006–2007 final and predicted final league table

Actual final table:
Team            W   D   L   Pts  P
Lyon            25  9   4   84   38
Bordeaux        18  15  5   69   38
Lille           16  14  8   62   38
Lens            14  18  6   60   38
Marseille       16  12  10  60   38
Auxerre         17  8   13  59   38
Rennes          18  5   15  59   38
Nice            16  10  12  58   38
ParisSG         13  13  12  52   38
Monaco          13  13  12  52   38
LeMans          13  13  12  52   38
Nancy           12  12  14  48   38
StEtienne       11  14  13  47   38
Nantes          11  12  15  45   38
Sochaux         11  11  16  44   38
Toulouse        10  11  17  41   38
Troyes          9   12  17  39   38
Ajaccio         8   9   21  33   38
Strasbourg      5   14  19  29   38
Metz            6   11  21  29   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
LeMans          17.92  0.04  0.88  0     17.2   54.76  36   +10
Rennes          17.96  0     0.8   0     17.24  54.68  36   +5
Sochaux         17.84  0.04  0.88  0     17.28  54.52  36   +11
Lille           17.6   0.12  1     0.04  17.36  54.16  36   -1
Nantes          17.84  0     0.48  0     17.68  54     36   +8
Lens            17.56  0.08  1     0     17.4   53.92  36   -2
Lyon            17.2   0     2.28  0     16.52  53.88  36   -6
Bordeaux        17.64  0     0.8   0     17.56  53.72  36   -6
Metz            17.68  0     0.64  0     17.68  53.68  36   +10
Ajaccio         17.48  0     1.2   0.08  17.28  53.64  36   +7
ParisSG         17.56  0     0.64  0     17.8   53.32  36   -2
Nancy           0      0     0     0     0      0      0
StEtienne       17.48  0     0.88  0     17.64  53.32  36   =
Nice            17.4   0     0.96  0     17.64  53.16  36   -5
Strasbourg      17.28  0     1.24  0.04  17.48  53.08  36   +4
Troyes          17.28  0     1.12  0.04  17.6   52.96  36   +1
Auxerre         17.32  0     0.84  0     17.84  52.8   36   -10
Monaco          17.12  0     1.4   0.08  17.48  52.76  36   -7
Toulouse        16.84  0.04  1.64  0     17.52  52.2   36   -3
Marseille       16.76  0     1.64  0.04  17.56  51.92  36   -14
Avg error per team playing: 5.89

Table 3.27: French Le Championnat 2005–2006 final and predicted final league table

Actual final table:
Team            W   D   L   Pts  P
Lyon            24  9   5   81   38
Marseille       19  7   12  64   38
Toulouse        17  7   14  58   38
Rennes          14  15  9   57   38
Lens            15  12  11  57   38
Bordeaux        16  9   13  57   38
Sochaux         15  12  11  57   38
Auxerre         13  15  10  54   38
Monaco          13  12  13  51   38
Lille           13  11  14  50   38
StEtienne       14  7   17  49   38
LeMans          11  16  11  49   38
Nancy           13  10  15  49   38
Lorient         12  13  13  49   38
ParisSG         12  12  14  48   38
Nice            9   16  13  43   38
Valenciennes    11  10  17  43   38
Troyes          9   12  17  39   38
Sedan           7   14  17  35   38
Nantes          7   13  18  34   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
Lyon            17.84  0     1.12  0     17.04  54.64  36   =
Lens            17.96  0     0.72  0     17.32  54.6   36   +3
Rennes          17.88  0     0.88  0     17.24  54.52  36   +1
Nice            18     0     0.52  0     17.48  54.52  36   +12
Sochaux         17.84  0     0.88  0     17.28  54.4   36   +2
LeMans          17.84  0     0.64  0     17.52  54.16  36   +6
Lorient         17.8   0     0.72  0     17.48  54.12  36   +7
Lille           17.96  0     0.2   0     17.84  54.08  36   +2
Bordeaux        17.76  0     0.64  0     17.6   53.92  36   -3
StEtienne       17.8   0     0.36  0     17.84  53.76  36   +1
Marseille       17.72  0     0.56  0     17.72  53.72  36   -9
Monaco          17.72  0     0.32  0     17.96  53.48  36   -3
ParisSG         17.64  0     0.48  0     17.88  53.4   36   +2
Auxerre         17.44  0     1.08  0     17.48  53.4   36   -6
Nantes          17.56  0     0.64  0     17.8   53.32  36   +4
Sedan           17.2   0     1     0     17.8   52.6   36   +2
Valenciennes    0      0     0     0     0      0      0
Toulouse        16.96  0     1.24  0     17.8   52.12  36   -14
Nancy           16.96  0     1.16  0     17.88  52.04  36   -5
Troyes          16.76  0     1.56  0     17.68  51.84  36   -2
Avg error per team playing: 4.42

Table 3.28: French Le Championnat 2006–2007 final and predicted final league table

Actual final table:
Team            W   D   L   Pts  P
BayernMunich    22  9   3   75   34
WerderBremen    21  7   6   70   34
Hamburg         21  5   8   68   34
Schalke04       16  13  5   61   34
Leverkusen      14  10  10  52   34
Hertha          12  12  10  48   34
Dortmund        11  13  10  46   34
Nurnberg        12  8   14  44   34
Stuttgart       9   16  9   43   34
Mgladbach       10  12  12  42   34
Mainz           9   11  14  38   34
Hannover        7   17  10  38   34
Bielefeld       10  7   17  37   34
EinFrankfurt    9   9   17  36   34
Wolfsburg       7   13  14  34   34
Kaiserslautern  8   9   17  33   34
FCKoln          7   9   18  30   34
Duisburg        5   12  17  27   34

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
Stuttgart       15.68  0.08  1.32  0     15     48.6   32   +8
WerderBremen    15.6   0.12  1.44  0     14.96  48.6   32   =
BayernMunich    15.48  0     2.04  0.04  14.48  48.48  32   -2
FCKoln          15.6   0.28  0.72  0     15.48  48.36  32   +13
Nurnberg        15.68  0.04  0.72  0     15.6   47.88  32   +3
Dortmund        15.6   0.08  0.68  0.12  15.68  47.72  32   +1
Bielefeld       15.44  0.16  0.88  0     15.68  47.68  32   +6
Hannover        15.52  0     1.08  0.2   15.4   47.64  32   +4
Mainz           15.4   0     1.36  0.04  15.2   47.56  32   +2
Wolfsburg       15.4   0.04  1.24  0     15.32  47.56  32   +5
Schalke04       15.32  0.16  0.96  0     15.72  47.4   32   -7
Kaiserslautern  15.24  0.16  1.08  0     15.68  47.28  32   +4
Leverkusen      15.28  0     1.32  0.12  15.4   47.16  32   -8
Hertha          15.4   0     0.88  0.4   15.64  47.08  32   -8
EinFrankfurt    15.2   0     1.44  0     15.36  47.04  32   -1
Mgladbach       15.4   0     0.64  0.04  15.92  46.84  32   -6
Hamburg         15.04  0     1.16  0.16  15.76  46.28  32   -14
Duisburg        0      0     0     0     0      0      0
Avg error per team playing: 5.41

Table 3.29: German 1.Bundesliga 2005–2006 final and predicted final league table

The total average placing error for the six league tables is therefore 28.31.
One obviously poor feature of the GA's predictions which becomes apparent when looking at the
predicted league tables is the disproportionate number of home wins relative to away wins or draws. It is
clear that, for example in the English Premier League 2005–2006 predicted table (3.25), a team is generally
never predicted to lose at home or win away from home. A team instead becomes higher-placed by
converting more of its away losses into draws. This results in the spread of points being awarded to each
team over the course of the season becoming very small, with only fractions of points separating the
teams, quite unlike the actual league table results.

3.6 Removing seeded home/away win ratio information

Hypothesis The home/away win ratio (see section 2.4.1) is seeded with random numbers centred
around specific values, in an attempt to enhance the population's initial fitness during training by
providing external information about well-known good ratios. It is possible for an individual to choose two
values whose sum is > 1, but such individuals will then perform badly in the fitness evaluation, as they
will be unable to predict draws between teams. Individuals with invalid ratio values are still permitted,
though, in order to enhance population diversity, but they are less likely to survive to future generations
and therefore they penalise themselves. This is a penalty approach to invalid solutions.
The home/away win ratio could instead be calculated and mutated by a different method, replacing
the current penalty approach of letting individuals with invalid ratios die out of the population. By
adding a third value to the bitstring to represent the draw ratio, and then normalising all three values
to obtain a true ratio of home wins to away wins to draws, it can be guaranteed that for any three
values the resulting ratio will always be valid. This can be seen as a kind of repair approach to the
problem of finding valid ratios, in contrast to the previous penalty approach. The difference can be
explained as follows, where h stands for home wins, a for away wins, and d for draws.
Seeded home/away win ratios, with calculation for the (potentially invalid) draw ratio:

    h ∈ [0, 1]                          (3.1)
    a ∈ [0, 1]                          (3.2)
    d = 1 - h - a                       (3.3)

Unseeded home/away win and draw ratio values, and formulae for deriving the ratio of the three
values:

    h ∈ [0, 1]                          (3.4)
    a ∈ [0, 1]                          (3.5)
    d ∈ [0, 1]                          (3.6)
    h' = h / (h + a + d)                (3.7)
    a' = a / (h + a + d)                (3.8)
    d' = d / (h + a + d)                (3.9)
This alternative method of representing and calculating the home/away win and draw ratio employs
what can be seen as a repair approach for individuals with invalid (summing to ≠ 1) h, a, and d values.
By refraining from seeding the initial values for h, a, and d, the GA has more flexibility in the initial
population to choose an appropriate ratio, which should lead to fewer generalisations that the home team
should almost invariably win or, occasionally, draw.
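A minimal sketch of the repair step (assumed names), normalising the three raw gene values into a valid ratio as in equations (3.7)-(3.9):

    /** Illustrative sketch of the repair approach: any raw (h, a, d) triple
     *  decoded from the bitstring is normalised into a valid ratio. */
    public class RatioRepair {
        /** Returns {h', a', d'} summing to 1. */
        public static double[] repair(double h, double a, double d) {
            double sum = h + a + d;
            if (sum == 0.0) {
                // Degenerate genome: fall back to a uniform ratio (an assumption;
                // the report does not say how this case is handled).
                return new double[]{1.0 / 3, 1.0 / 3, 1.0 / 3};
            }
            return new double[]{h / sum, a / sum, d / sum};
        }

        public static void main(String[] args) {
            // Raw values summing to more than 1 are no longer invalid:
            double[] ratio = repair(0.8, 0.6, 0.4);   // -> {0.444, 0.333, 0.222}
            System.out.printf("h'=%.3f a'=%.3f d'=%.3f%n", ratio[0], ratio[1], ratio[2]);
        }
    }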
Method The experiment was run with an 8-bit precision bitstring representation, with separately-trained
populations using the new individually-tailored extents, over 60 epochs of training, with 100 individuals
per population, 1 fixed oracle, an initial mutation rate of 0.20, using team crossover with elitism, and
the new method of calculating home/away win and draw ratios.
The tests are labelled as follows:
1. Old method with seeded ratios (penalty approach),
2. New method with unseeded ratios (repair approach).

Figure 3.28: Fitness profile for different home/away win ratio initialisations
Results The two tests produced similar final percentage yields (table 3.32), although the plots of profit
and loss (figure 3.29) show some difference in the mid-season performance of the GA, particularly in the
GB10506 and GB10607 testing sets, where the new ratio calculation scheme performs quite significantly
better.
Performance of the GA on the training sets is slightly improved using the new ratio calculation scheme
(figure 3.28), although the t-test indicates that the difference is only marginally significant (table 3.31).
The total average placing error for the predicted league tables is 26.93, which shows some improvement
compared to the previous total average placing error of 28.31 using the pre-seeded home/away win ratio
information. The improvement in the spread of predicted final placings for each team comes from
the greater tendency of the GA to predict a draw between two teams than when using the pre-seeded
home/away win ratios (compare tables 3.25 and 3.33). There is still almost no increase in the number
of predictions for a team to lose at home or win away from home, however.

3.7 Calculation of certainties

Hypothesis Kvam and Sokol [9] build upon the work by Clair and Letscher [3] to accurately rank
(and/or rate) teams using only basic input data, based on Markov chain models. They summarise that
"part of the reason for the comparative success of our model is that the other models ... treat the
outcome of games as binary events, wins and losses. In contrast, our model estimates the probability of
the winning team being better than the losing team based on the location of the game and the margin
of victory", and their model is therefore able to more accurately assess the outcome of a close game.

Actual final table:
Team            W   D   L   Pts  P
Stuttgart       21  7   6   70   34
Schalke04       21  5   8   68   34
WerderBremen    20  6   8   66   34
BayernMunich    18  6   10  60   34
Leverkusen      15  6   13  51   34
Nurnberg        11  15  8   48   34
Hamburg         10  15  9   45   34
Bochum          13  6   15  45   34
Dortmund        12  8   14  44   34
Hertha          12  8   14  44   34
Hannover        12  8   14  44   34
Bielefeld       11  9   14  42   34
Cottbus         11  8   15  41   34
EinFrankfurt    9   13  12  40   34
Wolfsburg       8   13  13  37   34
Mainz           8   10  16  34   34
Aachen          9   7   18  34   34
Mgladbach       6   8   20  26   34

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
WerderBremen    15.96  0     0.76  0     15.28  48.64  32   +2
Bochum          15.92  0     0.4   0     15.68  48.16  32   +6
Mgladbach       16     0     0.16  0     15.84  48.16  32   +14
Hannover        15.92  0     0.36  0     15.72  48.12  32   +7
Hertha          15.84  0     0.56  0     15.6   48.08  32   +5
Dortmund        15.92  0     0.28  0     15.8   48.04  32   +3
Schalke04       15.84  0     0.48  0     15.68  48     32   -5
BayernMunich    15.8   0     0.52  0     15.68  47.92  32   -4
Wolfsburg       15.72  0     0.76  0     15.52  47.92  32   +6
Stuttgart       15.72  0     0.72  0     15.56  47.88  32   -9
Mainz           15.84  0     0.32  0     15.84  47.84  32   +5
Leverkusen      15.76  0     0.4   0     15.84  47.68  32   -7
Cottbus         15.68  0     0.44  0     15.88  47.48  32   =
Nurnberg        15.56  0     0.72  0     15.72  47.4   32   -8
Hamburg         15.56  0     0.56  0     15.88  47.24  32   -8
EinFrankfurt    15.32  0     0.68  0     16     46.64  32   -2
Aachen          0      0     0     0     0      0      0
Bielefeld       15.04  0     1.08  0     15.88  46.2   32   -5
Avg error per team playing: 5.65

Table 3.30: German 1.Bundesliga 2006–2007 final and predicted final league table

Test   Fitness   Std dev
1      147.565   4.312
2      145.920   4.490

Tests   T-test
1,2     0.1928

Table 3.31: Fitness for different home/away win ratio initialisations

            EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
Yield
Test 1       11.60%     5.02%    -1.90%     4.94%   -11.08%    -1.37%     1.20%
Test 2       12.00%     4.06%    -2.04%     6.19%    -7.97%    -2.56%     1.61%
Std dev
Test 1        2.99%     1.95%     2.23%     2.25%     2.46%     2.90%     2.47%
Test 2        3.87%     4.40%     1.64%     4.06%     5.76%     4.02%     3.96%
T-test
Tests 1,2     0.6846    0.3245    0.8006    0.1895    0.0184    0.2395    0.3762

Table 3.32: Results for different home/away win ratio initialisations

Actual final table:
Team            W   D   L   Pts  P
Chelsea         29  4   5   91   38
ManUnited       25  8   5   83   38
Liverpool       25  7   6   82   38
Arsenal         20  7   11  67   38
Tottenham       18  11  9   65   38
Blackburn       19  6   13  63   38
Newcastle       17  7   14  58   38
Bolton          15  11  12  56   38
WestHam         16  7   15  55   38
Wigan           15  6   17  51   38
Everton         14  8   16  50   38
Fulham          14  6   18  48   38
Charlton        13  8   17  47   38
Middlesbrough   12  9   17  45   38
ManCity         13  4   21  43   38
AstonVilla      10  12  16  42   38
Portsmouth      10  8   20  38   38
Birmingham      8   10  20  34   38
WestBrom        7   9   22  30   38
Sunderland      3   6   29  15   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
ManUnited       16.88  0.12  4.72  0     14.28  55.72  36   +1
Arsenal         16.92  0.24  4.04  0     14.84  55.52  36   +2
Chelsea         16.24  0.24  5     0     14.6   54.44  36   -2
Liverpool       15.92  0.16  5.56  0.04  14.44  53.8   36   -1
Blackburn       16.36  0.16  3.96  0     15.6   53.52  36   +1
Newcastle       16.28  0.12  3.44  0.04  16.16  52.64  36   +1
Portsmouth      16.28  0.04  3.6   0     16.08  52.56  36   +9
AstonVilla      16.12  0     3.84  0.08  16     52.2   36   +7
WestHam         15.88  0.16  3.92  0     16.2   52.04  36   =
Wigan           0      0     0     0     0      0      0
Bolton          15.92  0     4.16  0     15.92  51.92  36   -2
Tottenham       16.04  0.08  3.48  0     16.48  51.84  36   -6
Everton         15.64  0     4.24  0.08  16.12  51.16  36   -2
Fulham          15.8   0     3.72  0.44  16.24  51.12  36   -2
ManCity         15.72  0     3.84  0.08  16.44  51     36   =
Sunderland      15.4   0     4.12  0.28  16.36  50.32  36   +4
Middlesbrough   15.64  0     3.12  0.32  16.92  50.04  36   -3
Charlton        15.24  0     4.28  0     16.48  50     36   -5
WestBrom        14.76  0.04  5.28  0     15.96  49.68  36   =
Birmingham      14.96  0     4.16  0     16.88  49.04  36   -2
Avg error per team playing: 2.63

Table 3.33: English Premier League 2005–2006 final and predicted final league table

Figure 3.29: Results for different home/away win ratio initialisations


Actual final table:
Team            W   D   L   Pts  P
ManUnited       28  5   5   89   38
Chelsea         24  11  3   83   38
Liverpool       20  8   10  68   38
Arsenal         19  11  8   68   38
Tottenham       17  9   12  60   38
Everton         15  13  10  58   38
Bolton          16  8   14  56   38
Reading         16  7   15  55   38
Portsmouth      14  12  12  54   38
Blackburn       15  7   16  52   38
AstonVilla      11  17  10  50   38
Middlesbrough   12  10  16  46   38
Newcastle       11  10  17  43   38
ManCity         11  9   18  42   38
WestHam         12  5   21  41   38
Fulham          8   15  15  39   38
Wigan           10  8   20  38   38
SheffieldUnited 10  8   20  38   38
Charlton        8   10  20  34   38
Watford         5   13  20  28   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
ManUnited       13.96  0.44  6.2   0     11.44  49.4   32   =
Liverpool       13.36  0.6   5.44  0     12.68  47.32  32   +1
Wigan           13.8   0.12  5.16  0.2   12.88  46.92  32   +13
Blackburn       13.56  0.08  5.68  0     12.72  46.6   32   +5
Portsmouth      13.4   0.16  5.48  0.04  12.96  46.16  32   +3
AstonVilla      13.64  0.08  4.92  0.24  13.16  46.08  32   +4
Newcastle       13.36  0.08  5.36  0.08  13.2   45.68  32   +5
Reading         0      0     0     0     0      0      0
Arsenal         12.88  0.28  6     0.04  12.84  45.48  32   -4
Chelsea         13.12  0.16  5.64  0.04  13.12  45.48  32   -7
WestHam         13.08  0.12  5.56  0.16  13.24  45.16  32   +4
ManCity         13.24  0     4.92  0.36  13.6   44.64  32   +2
Middlesbrough   12.8   0     5.96  0.28  13.04  44.36  32   -1
Fulham          12.72  0.08  5.64  0.12  13.52  44.04  32   +2
Everton         12.76  0     5.68  0.12  13.44  43.96  32   -8
Bolton          12.2   0     6.68  0.16  12.96  43.28  32   -7
Tottenham       12.56  0.08  5.2   0.08  14.16  43.12  32   -11
SheffieldUnited 0      0     0     0     0      0      0
Charlton        12.28  0     5.6   0.36  13.76  42.44  32   =
Watford         0      0     0     0     0      0      0
Avg error per team playing: 4.53

Table 3.34: English Premier League 2006–2007 final and predicted final league table

Actual final table:
Team            W   D   L   Pts  P
Lyon            25  9   4   84   38
Bordeaux        18  15  5   69   38
Lille           16  14  8   62   38
Lens            14  18  6   60   38
Marseille       16  12  10  60   38
Auxerre         17  8   13  59   38
Rennes          18  5   15  59   38
Nice            16  10  12  58   38
ParisSG         13  13  12  52   38
Monaco          13  13  12  52   38
LeMans          13  13  12  52   38
Nancy           12  12  14  48   38
StEtienne       11  14  13  47   38
Nantes          11  12  15  45   38
Sochaux         11  11  16  44   38
Toulouse        10  11  17  41   38
Troyes          9   12  17  39   38
Ajaccio         8   9   21  33   38
Strasbourg      5   14  19  29   38
Metz            6   11  21  29   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
Lyon            17.84  0.28  2.36  0     15.6   56.72  36   =
Nantes          17.84  0     1.48  0.12  16.68  55     36   +11
Ajaccio         17.72  0.08  0.92  0     17.36  54.32  36   +14
LeMans          17.64  0.04  1.2   0     17.16  54.24  36   +7
Monaco          17.6   0.04  1.16  0     17.24  54.08  36   +5
ParisSG         17.52  0.04  1.16  0     17.32  53.84  36   +3
Rennes          17.48  0     1.24  0     17.28  53.68  36   =
Strasbourg      17.32  0.04  1.56  0     17.12  53.64  36   +10
Lens            17.4   0.04  1.28  0     17.32  53.6   36   -5
Sochaux         17.4   0.04  1.12  0     17.48  53.44  36   +4
Lille           17.4   0     1.08  0     17.52  53.28  36   -8
Nancy           0      0     0     0     0      0      0
StEtienne       17.24  0     1.4   0.08  17.32  53.12  36   =
Metz            16.92  0.08  1.8   0     17.28  52.8   36   +6
Bordeaux        17.12  0     1.2   0.04  17.64  52.56  36   -12
Troyes          16.76  0     2     0.16  17.24  52.28  36   +1
Nice            16.92  0     1.4   0.08  17.64  52.16  36   -8
Toulouse        16.88  0     1.48  0.04  17.6   52.12  36   -2
Auxerre         16.84  0     1.48  0.04  17.64  52     36   -12
Marseille       16.48  0     1.64  0.12  17.88  51.08  36   -14
Avg error per team playing: 6.42

Table 3.35: French Le Championnat 2005–2006 final and predicted final league table

Actual final table:
Team            W   D   L   Pts  P
Lyon            24  9   5   81   38
Marseille       19  7   12  64   38
Toulouse        17  7   14  58   38
Rennes          14  15  9   57   38
Lens            15  12  11  57   38
Bordeaux        16  9   13  57   38
Sochaux         15  12  11  57   38
Auxerre         13  15  10  54   38
Monaco          13  12  13  51   38
Lille           13  11  14  50   38
StEtienne       14  7   17  49   38
LeMans          11  16  11  49   38
Nancy           13  10  15  49   38
Lorient         12  13  13  49   38
ParisSG         12  12  14  48   38
Nice            9   16  13  43   38
Valenciennes    11  10  17  43   38
Troyes          9   12  17  39   38
Sedan           7   14  17  35   38
Nantes          7   13  18  34   38

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
Rennes          17.96  0.24  1.92  0     15.96  56.52  36   +3
Lyon            17.76  0.12  2.16  0     16.04  55.8   36   -1
Sochaux         17.52  0.08  2.2   0     16.28  55     36   +4
Nice            17.52  0.2   1.64  0     16.76  54.8   36   +12
Nantes          17.36  0.08  2.04  0     16.52  54.36  36   +14
Auxerre         17.12  0     2.28  0.04  16.56  53.64  36   +2
ParisSG         17.2   0.08  1.76  0     17.04  53.6   36   +8
Lille           17.16  0.12  1.76  0     17.04  53.6   36   +2
Lens            17.08  0     2.04  0     16.88  53.28  36   -4
Monaco          17.36  0     1.2   0.04  17.44  53.28  36   -1
Sedan           16.8   0.12  2.16  0     17.04  52.92  36   +7
StEtienne       17.04  0.04  1.6   0.2   17.16  52.84  36   -1
LeMans          16.84  0.04  1.8   0.08  17.32  52.44  36   -1
Lorient         16.44  0     2.88  0     16.68  52.2   36   =
Bordeaux        16.64  0     1.72  0.2   17.6   51.64  36   -9
Troyes          16.28  0     2.56  0.24  17.16  51.4   36   +1
Valenciennes    0      0     0     0     0      0      0
Nancy           15.8   0.12  3.44  0     16.72  51.2   36   -4
Marseille       16.12  0     2.12  0.16  17.76  50.48  36   -16
Toulouse        15.32  0     3.2   0.28  17.36  49.16  36   -16
Avg error per team playing: 5.58

Table 3.36: French Le Championnat 2006–2007 final and predicted final league table


Actual final table:
Team            W   D   L   Pts  P
BayernMunich    22  9   3   75   34
WerderBremen    21  7   6   70   34
Hamburg         21  5   8   68   34
Schalke04       16  13  5   61   34
Leverkusen      14  10  10  52   34
Hertha          12  12  10  48   34
Dortmund        11  13  10  46   34
Nurnberg        12  8   14  44   34
Stuttgart       9   16  9   43   34
Mgladbach       10  12  12  42   34
Mainz           9   11  14  38   34
Hannover        7   17  10  38   34
Bielefeld       10  7   17  37   34
EinFrankfurt    9   9   17  36   34
Wolfsburg       7   13  14  34   34
Kaiserslautern  8   9   17  33   34
FCKoln          7   9   18  30   34
Duisburg        5   12  17  27   34

Predicted final table:
Team            W(h)   W(a)  D     L(h)  L(a)   Pts    P    +/-
BayernMunich    14.72  0.4   4.4   0     12.6   49.76  32   =
Dortmund        14.92  0.24  3.48  0     13.36  48.96  32   +5
Hertha          14.88  0     3.32  0     13.8   47.96  32   +3
Nurnberg        14.72  0.08  3.2   0.04  13.96  47.6   32   +4
WerderBremen    14.24  0.28  3.96  0.16  13.48  47.52  32   -3
Leverkusen      14.36  0.24  3.64  0.08  13.68  47.44  32   -1
Mgladbach       14.56  0     3.08  0     14.36  46.76  32   +3
Stuttgart       13.68  0.24  4     0.08  14.16  45.76  32   +1
Hannover        13.8   0.08  4.08  0.12  13.96  45.72  32   +3
Hamburg         14.08  0     3.32  0.12  14.52  45.56  32   -7
Schalke04       13.96  0     3.64  0.2   14.28  45.52  32   -7
FCKoln          13.68  0.04  4.24  0     14.04  45.4   32   +5
Mainz           13.8   0     3.44  0.24  14.64  44.84  32   -2
Kaiserslautern  13.68  0     3.76  0.2   14.48  44.8   32   +2
Bielefeld       13.72  0     3.24  0.24  14.8   44.4   32   -2
EinFrankfurt    13.56  0     3.52  0.04  14.88  44.2   32   -2
Wolfsburg       13     0.16  4.32  0.24  14.36  43.8   32   -2
Duisburg        0      0     0     0     0      0      0
Avg error per team playing: 3.06

Table 3.37: German 1.Bundesliga 2005–2006 final and predicted final league table

Final league table:

Team          W    D    L    Pts   P
Stuttgart     21   7    6    70    34
Schalke04     21   5    8    68    34
WerderBremen  20   6    8    66    34
BayernMunich  18   6    10   60    34
Leverkusen    15   6    13   51    34
Nurnberg      11   15   8    48    34
Hamburg       10   15   9    45    34
Bochum        13   6    15   45    34
Dortmund      12   8    14   44    34
Hertha        12   8    14   44    34
Hannover      12   8    14   44    34
Bielefeld     11   9    14   42    34
Cottbus       11   8    15   41    34
EinFrankfurt  9    13   12   40    34
Wolfsburg     8    13   13   37    34
Mainz         8    10   16   34    34
Aachen        9    7    18   34    34
Mgladbach     6    8    20   26    34

Predicted final league table (+/- as in table 3.36; a row of zeros marks a team absent from the training data, for which no prediction could be made):

Team          W(h)   W(a)   D     L(h)   L(a)   Pts    P    +/-
WerderBremen  13.24  0.96   6.88  0      11.16  49.48  32   +2
BayernMunich  13.92  0.2    6.96  0.04   10.92  49.32  32   +2
Wolfsburg     13.36  0.64   6.64  0      11.52  48.64  32   +12
Hannover      12.84  0.04   7.36  0.2    11.56  46     32   +7
Stuttgart     12.6   0.24   6.56  0.16   12.52  45.08  32   -4
Dortmund      12.36  0.48   6.32  0.12   12.8   44.84  32   +3
Bochum        12.24  0.32   7.04  0.12   12.32  44.72  32   +1
Leverkusen    12.12  0.32   6.76  0.08   12.76  44.08  32   -3
Cottbus       12.36  0      6.76  0.28   12.72  43.84  32   +4
Mgladbach     12.24  0.12   6.72  0.04   12.92  43.8   32   +7
Nurnberg      12.32  0.04   6.68  0.12   12.84  43.76  32   -5
Hertha        12.16  0.12   6.76  0.24   12.84  43.6   32   -2
Schalke04     12.08  0.12   6.92  0.2    12.76  43.52  32   -11
Mainz         12.4   0.04   6.2   0.44   13     43.52  32   +2
Hamburg       11.84  0.16   7.12  0.32   12.6   43.12  32   -8
Bielefeld     11.6   0      6.96  0.76   12.8   41.76  32   -4
Aachen        0      0      0     0      0      0      0    n/a
EinFrankfurt  11.6   0      6.56  0.68   13.24  41.36  32   -3

Avg error per team playing: 4.71

Table 3.38: German 1. Bundesliga 2006-2007 final and predicted final league table


the winning team being better than the losing team based on the location of the game and the margin
of victory and is therefore able to more accurately assess the outcome of a close game.
The current representation strategy used in this project can not only calculate a prediction for which team will win a match (see section 2.4.1), but can also attach a certainty to this prediction, by ascertaining the magnitude of the difference between strength_a and strength_b, in a similar way to Kvam and Sokol [9]. This is equivalent to determining whether the match will be a close contest, or whether one team is far superior to the other and so has a better chance of winning. Different betting strategies can then be devised using these certainties to determine how much to trust a particular prediction. This should lead to fewer (or smaller) bets being placed upon uncertain predictions, and more (or larger) bets being placed upon predictions with a higher degree of certainty.
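Concretely, the certainty measure amounts to the following minimal sketch (the method name is illustrative, and it is assumed that the two strengths have already been decoded from the bitstring; this is not the project code verbatim):

// Certainty of a prediction: the margin between the two teams' decoded
// strengths. A large margin suggests a one-sided fixture; a small margin
// suggests a close contest.
double certainty(double strengthA, double strengthB) {
    return Math.abs(strengthA - strengthB);
}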
Method The experiment was run using an 8-bit precision bitstring representation with separately-trained populations using the new individually-tailored extents, over 60 epochs of training, with 100 individuals per population, 1 fixed oracle, an initial mutation rate of 0.20, using team crossover with elitism, and the new method of calculating home/away win and draw ratios. The four certainty strategies employed are as follows (a code sketch follows the list):
1. Fixed betting, i.e. always place a bet of 10 on teams which appear in the training data set (the current strategy),
2. Place a bet of 10 × certainty on teams in the training data set (varying the magnitude of the bet according to certainty),
3. Place a bet of 10 only if the certainty is greater than a random number drawn from [0, 1] and the teams appear in the training data set, granting games with a greater certainty of a particular result a correspondingly greater chance of being bet upon (varying the frequency of the bet according to certainty),
4. Place a bet of 10 only if the certainty is greater than 3/4 of the maximum certainty encountered so far and the teams appear in the training data set (varying the frequency of the bet according to certainty).
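The four strategies can be restated as stake calculations, as in the following sketch. This is illustrative only, not the project code: the stake of 10 and the 3/4 threshold come from the descriptions above, certainty is assumed to lie in [0, 1], a returned stake of 0 means no bet is placed, and in every case a bet is only placed if both teams appear in the training data set.

import java.util.Random;

// Sketch of the four staking strategies; returns the stake to place.
double stake(int strategy, double certainty, double maxCertaintySeen, Random rng) {
    switch (strategy) {
        case 1: return 10.0;                                // fixed bet on every match
        case 2: return 10.0 * certainty;                    // magnitude varies with certainty
        case 3: return certainty > rng.nextDouble()         // frequency varies with certainty
                ? 10.0 : 0.0;
        case 4: return certainty > 0.75 * maxCertaintySeen  // bet only on the most certain games
                ? 10.0 : 0.0;
        default: throw new IllegalArgumentException("Unknown strategy: " + strategy);
    }
}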
Results Due to the potentially different amounts that can now be bet across the testing set, the resulting percentage yields and standard deviations must be normalised according to the total amount that was bet in a particular testing run. This leads to a higher-than-expected percentage yield on all testing sets, as the normalisation factor is smaller than the total amount bet for each testing set in all previous experiments.
For the sake of calculating percentage yields, the previous experiments assume a bet of 10 was placed on all matches (regardless of whether the teams even appeared in the training data set), meaning that the percentage yield was always calculated according to an outlay of 380 games × 10 = 3800 for the English and French leagues, and 306 games × 10 = 3060 for the German leagues. Once the teams which do not appear in the training set, and upon which no bet is therefore placed, are taken into account, the corresponding betting outlay is lower, and this affects the percentage yield.
In order to allow comparison between the results of this experiment and all previous experiments, an additional column is added to the results table (table 3.39) indicating the scaled equivalent percentage yield if it were assumed that bets were placed on all matches in the testing set, regardless of the presence of the teams in the training set.
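The relationship between the normalised and scaled yields can be sketched as follows (method and parameter names are illustrative; the fixed stake of 10 and the match counts are those given above):

// Normalised yield: profit as a percentage of the amount actually staked.
double percentageYield(double totalReturns, double totalStaked) {
    return (totalReturns - totalStaked) / totalStaked * 100.0;
}

// Scaled equivalent yield: the same profit expressed against the outlay that
// would have been made had a fixed stake been placed on every match,
// e.g. 380 matches * 10 = 3800 for the English and French leagues.
double scaledYield(double normalisedYield, double totalStaked, int matches, double fixedStake) {
    return normalisedYield * totalStaked / (matches * fixedStake);
}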
The results for introducing calculations of certainties into the decisions for the magnitude or frequency of bets are very promising (fig. 3.30 and table 3.39). All three certainty-based strategies performed better than the fixed betting strategy employed previously, with the differences being largely statistically significant (t-tests on tests 1 and 2, 1 and 3, and 1 and 4). The most successful betting strategy appears to be the one employed in test 4: betting only if the certainty is greater than 3/4 of the maximum certainty encountered so far. However, this strategy does appear to have a significantly larger standard deviation than the next-best strategy of placing bets of a size proportional to the certainty of the predicted outcome. The standard deviation for this latter strategy, and its overall percentage yield, indicate that it is a very successful and less random strategy to employ, although the t-test suggests that the difference between strategies 2 and 4 is only moderately significant. It is perhaps no surprise that the standard deviation for test 3 (betting if the certainty is greater than a random number) is very large, due to the extra stochasticity introduced by the randomised betting. This is the worst-performing of the three certainty-based strategies, but it is still marginally better than the fixed betting strategy.

Figure 3.30: Results for different betting strategies based on certainties

Yield
Test   EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average   Scaled
1      10.77%    7.15%     -1.48%    5.38%     -10.55%   -0.28%    1.83%     1.56%
2      13.88%    6.47%     -1.99%    4.66%     -9.28%    -1.96%    1.97%     1.67%
3      11.52%    4.70%     -1.85%    5.12%     -6.32%    -2.06%    1.85%     1.58%
4      14.15%    5.61%     -0.76%    4.38%     -6.89%    -3.78%    2.12%     1.80%

Std dev
Test   EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average   Scaled
1      8.88%     5.42%     3.93%     4.30%     4.27%     5.20%     5.33%     4.53%
2      4.11%     5.23%     2.00%     2.05%     5.37%     2.47%     3.54%     3.01%
3      14.99%    15.07%    10.44%    10.37%    15.13%    12.12%    13.02%    11.07%
4      9.73%     9.90%     6.79%     5.27%     9.09%     7.15%     7.99%     6.79%

T-test
Tests  EPL0506   EPL0607   FLC0506   FLC0607   GB10506   GB10607   Average
1,2    0.0004    0.0000    0.2306    0.0001    0.0000    0.8453    0.1794
1,3    0.0004    0.0000    0.2957    0.0002    0.0000    0.8495    0.1910
1,4    0.0008    0.1498    0.3031    0.0075    0.0000    0.2633    0.1207
2,3    0.5979    0.5998    0.9533    0.8669    0.3478    0.9704    0.7227
2,4    0.0047    0.4154    0.5951    0.1020    0.6082    0.0783    0.3006
3,4    0.0096    0.3143    0.6506    0.2913    0.2706    0.1922    0.2881

Table 3.39: Results for different betting strategies based on certainties

Chapter 4
Conclusions and further work

4.1 Findings

This project has successfully implemented and investigated the effects of a number of experimental
techniques to improve the performance of the genetic algorithm on a range of training and testing data
sets. A variety of parameters were tested in order to find optimal values, and these were recorded in
Chapter 3.
Of particular interest, it was found that a bitstring representation of the real-valued parameters relating to each team in the training set is much more successful than using arrays of actual real numbers. Also of note is the finding that a reduction in the scope of the training data is beneficial to the optimisation, as much older data on the performances of football teams differs considerably from modern data.
Appropriate values for the number of epochs of training and the sizes of populations for different-precision bitstrings were found, and a comprehensive comparison of the effects of different values for GA
parameters when using two different precisions of bitstring representation was undertaken, leading to the
conclusion that a precision of 8 bits gives a reasonable trade-off between computation time and required
precision to produce a good performance from the trained model.
Initial mutation rates for the self-adapting mutation were tested, and optimum values discovered,
as well as two competing crossover strategies both with and without elitism being employed, which
showed that elitism is beneficial to the optimisation of the bitstring representation. Qualitative analysis
of the fitness landscape was undertaken (see section 3.3.2) to determine why it may be the case that
elitism greatly benefits the performance of the optimisation procedure. It was also discovered that using
strategies for switching the number of oracles during prediction over a testing set is too tightly tied to
the features of a particular data set to be of reliable use, and that a compromise value of three oracles
working in co-operation with each other produces the best performance over the six testing sets examined
(see section 3.3.2).
It was discovered that in order to obtain success from using multiple experts trained on different
sections of the input space, the input space must be partitioned appropriately according to the fixture
schedule of the different leagues being tested.
Experiments showed that the penalty-method of dealing with individuals with invalid home/away win
ratios was slightly inferior to the repair-method which guarantees that any home/away win and draw
ratio will always be correct. In addition, using unseeded initial values for this ratio produced better
results than supplying the information to the GA before training, as the self-adaptation is able to at
least partially recover this information from the training set.
The system is able to predict with a moderate amount of accuracy the final placings of teams in a
league table, although the system tends to have an inherent bias towards predicting the home team to
win any given fixture, leading to a poor prediction of the final number of points actually awarded to
each team in the league table. Indeed, the differentiation between teams comes more from the number of
points lost through predicting a team to draw instead of winning at home, than from predicting teams
to lose at home.
Percentage yields can be boosted by betting either amounts or with frequency proportional to the

certainty of a prediction, as calculated by the magnitude of the difference between the strengths of two
teams extracted from the optimised bitstring representation. Different betting strategies using these
certainty values were investigated, and the results found to significantly improve over fixed-amount
betting regardless of certainty. These percentage yields were particularly high on some testing sets,
notably the English and French leagues, but the German league testing sets remained very difficult to
return a positive yield from.
Overall, the system has shown considerable success at generalising from training data to unseen
testing data across a number of testing sets.

4.2 Further work

4.2.1 Oracle switching strategies

As discussed in section 3.3.2, it is possible that an alternative strategy of switching oracles may produce a
more successful yield than the strategy employed previously (five oracles switching to one and vice-versa
after 10 losses in any 20 consecutive bets). The experiments showed that, unlike in the mini-project where
this particular strategy was successful on the testing data used [14], on the extended range of testing data
sets in this project the performance decreased significantly when this strategy was employed. Future
work could look into the effects of different strategies and attempt to determine whether a generalised
strategy can produce improved performance over a number of testing sets, or if indeed the strategy is
highly dependent on the particular testing set in use.
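For reference, the previously-employed rule could be sketched as follows. This is illustrative only: the class and method names are not from the project code, and clearing the window after a switch is an assumption, since the original behaviour immediately after switching is not specified here.

import java.util.ArrayDeque;
import java.util.Deque;

// Toggle between five oracles and one whenever 10 of the last 20 bets were lost.
class OracleSwitcher {
    private final Deque<Boolean> window = new ArrayDeque<>(); // outcomes of recent bets
    private int numOracles = 5;

    void recordBet(boolean won) {
        window.addLast(won);
        if (window.size() > 20) window.removeFirst();   // keep only the last 20 bets
        long losses = window.stream().filter(w -> !w).count();
        if (losses >= 10) {
            numOracles = (numOracles == 5) ? 1 : 5;     // switch strategy
            window.clear();                             // assumption: start counting afresh
        }
    }

    int getNumOracles() { return numOracles; }
}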

4.2.2 Intelligent weighting of training data

With the current system, for any team that has been absent from the League for some time due to being relegated and then re-promoted, training is far from accurate. For example, if a team was relegated five years prior to the testing-set season, and then promoted in the testing-set season, the training for the model containing that team is based entirely on the team's performance five or more years ago, which is weighted very low compared to more recent data, and is therefore considered less accurate. A fair assumption would be that a team which has had a long absence from the league and has only just re-joined it is unlikely to perform exceptionally well. Indeed, in every season except 2001-02 at least one Premier League newcomer has been relegated [17]. Therefore, weightings during training should be calculated as a function of distance from the most recent season in which a team appears in the training data, rather than simply being calculated as a function of distance from the current season.
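A sketch of the proposed weighting follows. The exponential form and the decay rate are assumptions for illustration; the project's actual weighting function may differ.

// Weight for a season's training data, decaying from the most recent season
// in which the team appears in the training data, rather than from the
// current season, so a long-absent team's newest available data still
// receives a high weight.
double trainingWeight(int season, int teamsLatestSeasonInData, double decayRate) {
    int distance = teamsLatestSeasonInData - season;  // 0 for the team's newest data
    return Math.pow(1.0 - decayRate, distance);
}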

4.2.3 Decreasing bias of predictions

Currently the trained model has a strong tendency to predict a win for the home team, which is an over-generalisation from the training set. Although it is known that home-wins outnumber away-wins and draws, the frequency with which the GA predicts home-wins is unrealistic, as can be seen from the league tables in section 3.5. Investigations should be performed to ascertain why the GA is encouraged by the fitness function to settle for a majority home-win prediction, and what can be done to mitigate this. This may require adjustment to the fitness evaluation function to more forcefully encourage the correct prediction of a result, rather than settling for predicting the most common result (home-wins) in order to reduce the accumulated fitness penalty. One way to achieve this may be to include bookmakers' odds in the fitness function: home-wins generally have lower paybacks than away-wins and draws due to their statistically greater frequency of occurrence, so a fitness function which rewards correct predictions of away-wins and draws more highly, reflecting the higher odds associated with these outcomes, may encourage convergence to a fairer prediction model during training.
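One simple form such a reward could take is sketched below. This is illustrative only: decimal odds are assumed, so a correct prediction of a long-odds away-win or draw contributes more fitness than a correct prediction of the statistically common home-win.

// Fitness contribution of a single prediction: a correct prediction is
// rewarded in proportion to the bookmakers' decimal odds for that outcome.
double fitnessContribution(boolean correct, double decimalOdds) {
    return correct ? decimalOdds : 0.0;
}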

4.2.4 Niching and fitness sharing

It was proposed in the mini-project [14] that fitness sharing and niching may improve the performance of the GA. These are techniques outlined by Darwen and Yao [4]: fitness sharing modifies a search landscape by reducing payoff in densely-populated regions, encouraging search in unexplored regions and causing sub-populations to form.
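A common formulation of fitness sharing is sketched below for illustration. This is the standard triangular sharing function, not necessarily the exact scheme of Darwen and Yao [4], and it assumes a fitness that is to be maximised (the division would need inverting for a penalty-style fitness).

// Shared fitness: raw fitness divided by the niche count, so that payoff
// falls in densely-populated regions of the search space.
double sharedFitness(double rawFitness, double[] distancesToOthers, double sigmaShare) {
    double nicheCount = 1.0;  // the individual itself contributes sh(0) = 1
    for (double d : distancesToOthers) {
        if (d < sigmaShare) {
            nicheCount += 1.0 - d / sigmaShare;  // sharing function sh(d)
        }
    }
    return rawFitness / nicheCount;
}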

One major use of such fitness enhancement techniques is to encourage diversity in a population which
must be kept small due to computational constraints. However, earlier experiments (see sections 3.3.2
and 3.3.3) have shown that the improvement in model performance obtained with larger population sizes
is minimal at best, and even in some cases worse than with smaller population sizes. Niching and fitness
sharing are therefore unlikely to bring much benefit in these situations.
Another major benefit of niching and fitness sharing is to ensure that a greater number of optima
are located simultaneously by the population on each run. One way to find multiple optima is to make
several runs of an ordinary GA: for each run, genetic drift will cause the population to converge to one
optimum, but on the next run it will probably converge to a different optimum. Thus, multiple optima
are found from multiple runs [4]. (As an aside, this explains why there is a standard deviation in the
results obtained in the experiments, as different optima produce different models, from which different
predictions will be made, thus affecting the total returns of the model slightly each time). Darwen and Yao [4] go on to say that fitness sharing "... reduces payoff ... at heavily-populated regions of the search space, and it does it dynamically during a single run, which reduces the repeated partial exploration suffered by sequential schemes".
Although this will not necessarily result in correspondingly better model performance, it does allow
the potential for finding a number of individuals located in different optima, and using these as the
oracles during betting predictions to ensure that all the oracles are well-trained and will disagree slightly
due to their guaranteed location at different optima. This slight disagreement, or negative correlation
between experts in an ensemble, is an effect more typically encountered in the field of neural networks
known as Negative Correlation Learning, and results in the ability of an ensemble of experts (such as the oracles in this system) to "create negatively correlated networks to encourage specialisation and cooperation among the individual [experts]" [11].

4.2.5 Adding knowledge to the model dynamically

Work could be undertaken to update the model with the results of recent matches as they are played throughout a season, without having to completely re-train the model each time new knowledge is added to the training set. This would be an important and significant addition, as it would enable the model to predict results for teams which do not appear in the base training set (e.g. those teams which have recently been promoted) after a few matches have been played.
Genetic algorithms are inherently off-line search algorithms, which makes it hard to incorporate new information into the trained model without completely re-training it. Alternative optimisation techniques, for example Ant Colony Optimisation, may facilitate this necessary ability to update the model on-line. Alternatively, it could be the case that the main limitation of the system actually lies with the representation, and not the method of optimisation. Future work should first look at including additional data such as the referee, individual players' situations, or a recent change of manager.


Appendix A

Calculating the binary reflected Gray code
Converting real values to binary, and then to binary reflected Gray codes:

String grayCode = binToGray(decToBin(value, precision));

// Convert a real value in [0, 1) to a binary fraction of the given precision.
private String decToBin(double val, int precision) {
    String binary = "";
    for (int i = 1; i <= precision; i++) {
        if (val >= (1 / Math.pow((double) 2, (double) i))) {
            binary += "1";
            val -= (1 / Math.pow((double) 2, (double) i));
        } else {
            binary += "0";
        }
    }
    return binary;
}

// Convert a binary string to its binary reflected Gray code: the first bit
// is copied, and each subsequent bit is the XOR of adjacent binary bits.
private String binToGray(String binary) {
    String gray = "";
    int precision = binary.length();
    gray += binary.substring(0, 1);
    for (int i = 1; i < precision; i++) {
        gray += Integer.toString(Integer.parseInt(binary.substring(i - 1, i))
                ^ Integer.parseInt(binary.substring(i, i + 1))); // ^ = XOR
    }
    return gray;
}

Converting in the reverse direction (Gray codes to binary and then back to real values):

double value = binToDec(grayToBin(grayCode));

// Convert a Gray code back to binary: each bit is the XOR of the previously
// decoded binary bit and the current Gray bit.
private String grayToBin(String gray) {
    String binary = "";
    int precision = gray.length();
    binary += gray.substring(0, 1);
    for (int i = 1; i < precision; i++) {
        binary += Integer.toString(Integer.parseInt(binary.substring(i - 1, i))
                ^ Integer.parseInt(gray.substring(i, i + 1))); // ^ = XOR
    }
    return binary;
}

// Convert a binary fraction back to a real value in [0, 1).
private double binToDec(String binary) {
    double dec = 0.0;
    int precision = binary.length();
    for (int i = 1; i <= precision; i++) {
        if (binary.substring(i - 1, i).equals("1")) {
            dec += (1 / Math.pow(2, i));
        }
    }
    return dec;
}


Appendix B

Populations divided by time


B.1 English Premier League

The home and away strengths of a football team are not always constant, as a team has different runs
of form, or injuries can hit important players, or a new manager can introduce a fresh enthusiasm to the
players. Obviously these effects are very hard to predict in advance, if not impossible. However certain
periods of the footballing season can be identified as fluctuating in approximately the same manner at
the same period of each season.
It is hypothesised that there are five distinct periods of the English footballing season that can have
a direct effect on the performance of the team:
Start of the season (August to end of September): teams try to integrate their new summer signings, and many smaller teams frequently over-perform, due to enthusiasm at the start of the season, as well as the effect of having had fewer international-quality players on duty for their countries over the summer. This part of the season can be very unpredictable, and the emergence of a clear future League champion is very unlikely.

Mid-autumn (October to end of November): teams are starting to find their comfortable form. New signings are generally well-integrated, and the teams start to perform more consistently.

Christmas (December to end of the year): this is a very tough period of the calendar for teams, with fixtures packed very closely together (often two matches in a week) and weaker teams beginning to tire. Stronger teams, often those which will finish at the top of the table, perform more consistently than weaker teams.

Mid-winter (January to end of February): teams have made their January transfers and injected new dynamism and enthusiasm into the team. Once new players have settled, weaker teams may be boosted in their performance compared to earlier in the season, whereas stronger teams will receive less of a boost to their already-good form.

Easter (March to end of the season in May): weaker teams are starting to tire again, and long-term injuries are beginning to build and take their toll. Many teams will struggle to be as consistent in their performances as before.
By splitting the training data into a number of disparate portions according to this schema, and
training separate populations on each portion of the data, it is possible to separate off harder-to-predict
periods such as the start of the season from relatively easier-to-predict periods, so that each population
can be considered an expert in the specific period on which it is trained.
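As an illustration, the schema above amounts to a simple mapping from the month of a fixture to one of the five populations (a sketch; this is not the project's actual partitioning code):

// Map a fixture's month (1 = January ... 12 = December) to one of the five
// EPL training periods outlined above.
int eplPeriod(int month) {
    switch (month) {
        case 8: case 9:          return 0;  // start of the season
        case 10: case 11:        return 1;  // mid-autumn
        case 12:                 return 2;  // Christmas
        case 1: case 2:          return 3;  // mid-winter
        case 3: case 4: case 5:  return 4;  // Easter to the end of the season
        default:
            throw new IllegalArgumentException("No league fixtures expected in month " + month);
    }
}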
However, as described in section 3.4, different countries' leagues have very different structures in terms of when fixtures are played. This means that schemas such as the one outlined above for the English Premier League are specific to each league. Population extents for the other leagues used in the testing sets for this project are outlined below.

B.2 French Le Championnat

Start of the season (July to end of August): with an earlier start to the season than in the English Premier League, this portion of training data should not need to encompass September. Teams are finding their form and, as in the EPL, initial results can be very unexpected and unpredictable.

Autumn (September to end of October).

Pre-winter break (November to end of January): there is a three-week break during the end of December and the first half of January, so this section of the input data needs to be larger than the equivalent section in the English Premier League training set, in order to encompass enough samples for training.

Post-winter break (February to end of March).

Easter (April to end of May).

B.3 German 1. Bundesliga

Start of the season (August to end of October).

Pre-winter break (November to end of January): there is a six-week break during the second half of December and the whole of January, so this section of the input data needs to be larger than the equivalent sections in both the EPL and FLC training sets, in order to encompass enough samples for training.

Post-winter break (February to end of March).

Easter (April to end of May).


Appendix C

Project proposal


References
[1] James R. Bitner, Gideon Ehrlich, and Edward M. Reingold. Efficient generation of the binary reflected Gray code and its applications. Communications of the ACM, 19(9):517-521, 1976. ISSN 0001-0782.
[2] Babatunde Buraimo and Rob Simmons. Market size and attendance in English Premier League football. Technical report, Lancaster University Management School, March 2006.
[3] B. Clair and D. Letscher. Optimal strategies for sports betting pools. Working paper, Department of Mathematics and Computer Science, Saint Louis University, 2005.
[4] Paul Darwen and Xin Yao. A dilemma for fitness sharing with a scaling function. In Proceedings of the 1995 IEEE Conference on Evolutionary Computation, pages 166-171. IEEE, December 1995.
[5] Stephanie Forrest. Documentation for prisoner's dilemma and norms programs that use the genetic algorithm. University of Michigan, 1985.
[6] Andrew Mcgilvary Gillies. Machine learning procedures for generating image domain feature detectors. PhD thesis, University of Michigan, 1985.
[7] J. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, 1975.
[8] Christopher Jennison and Nuala Sheehan. Theoretical and empirical properties of the genetic algorithm as a numerical optimizer. Journal of Computational and Graphical Statistics, 4(4):296-318, December 1995. ISSN 1061-8600.
[9] Paul Kvam and Joel S. Sokol. A logistic regression/Markov chain model for NCAA basketball. Naval Research Logistics, 53(8):788-803, July 2006.
[10] Barclays Premier League. Premier League history. World wide web, August 2007. http://www.premierleague.com/page/History.
[11] Yong Liu and Xin Yao. Ensemble learning via negative correlation. Neural Networks, 12(10):1399-1404, 1999. URL citeseer.ist.psu.edu/liu99ensemble.html.
[12] K. E. Mathias and L. D. Whitley. Transforming the search space with Gray coding. In Proceedings of the First IEEE Conference on Evolutionary Computation (IEEE World Congress on Computational Intelligence), volume 1, pages 513-518. IEEE, June 1994.
[13] A. P. Rotshtein, M. Posner, and A. B. Rakityanskaya. Football predictions based on a fuzzy model with genetic and neural tuning. Cybernetics and Systems Analysis, 41(4):619-630, July 2005.
[14] Mark Rowan. Evolving strategies for prediction of sporting fixtures. Master's thesis, University of Birmingham, 2007.
[15] A. Tsakonas, G. Dounias, S. Shtovba, and V. Vivdyuk. Soft computing-based result prediction of football games. In Proceedings of the First International Conference on Inductive Modeling, volume 3, pages 15-21, Lviv, 2002.
[16] Wikipedia. Gray code. World wide web, September 2007. http://en.wikipedia.org/w/index.php?title=Gray_code&oldid=155616205#Programming_algorithms.
[17] Wikipedia. FA Premier League. World wide web, 24 August 2007. http://en.wikipedia.org/w/index.php?title=Premier_League&oldid=153340420.
