
IST-Africa 2015 Conference Proceedings

Paul Cunningham and Miriam Cunningham (Eds)


IIMC International Information Management Corporation, 2015
ISBN: 978-1-905824-50-2

Using Artificial Neural Networks to Predict Winners in Horseraces:
A Case Study at the Champs de Mars
Sameerchand PUDARUTH1, Manish JOGEEAH2, Akshay Kumar CHANDOO3
University of Mauritius, Reduit, Moka, 80835, Mauritius
Tel: +230 403 7400, Fax: + 230 454 9642, Email: s.pudaruth@uom.ac.mu1,
manish.jogeeah@umail.uom.ac.mu2, akshay.chandoo@umail.uom.ac.mu3
Abstract: In this paper, we use a machine learning approach to predict winners at
the Champs de Mars horse racing track. In particular, we use a multi-layer perceptron
(artificial neural network) with one hidden layer containing five nodes and a
zero-based log-sigmoid activation function to predict the output. The training was
done on the 347 races of the first 41 meetings while the testing was carried out on
16 races from the last two meetings. Currently, the success rate of the neural network
is only 25%, well below the rates claimed in the literature. However, a different data
set has been used, and the Champs de Mars has always been considered different from
most racing tracks in the world because of its very short straight of only 300m.
Nevertheless, we also noted that 11 out of 16 winners can be found within the first
four predicted positions. We believe that the system can still be improved
significantly by selecting the right parameters of the neural network and the right
features. The lessons learnt from this work can be adapted to more important
economic matters such as predicting the price of stocks, foreign currency exchange
rates, tourist arrivals and the price of oil and gas.
Keywords: artificial neural networks, horseracing, Champs de Mars.

1. Introduction
Horse racing is a very popular sport in Mauritius, which is considered a racing-mad
nation. Horse races are held at the Champ de Mars, the second oldest track in the
world, which was founded in 1812 and celebrated its 200th anniversary in 2012. The
Mauritian people are very fond of gambling, and betting on races is common among
thousands of Mauritians. Rumours, such as confidential news said to come from the
stable, jockey or owner, influence people to place a bet on a particular horse. This is
how the majority of bettors place their bets. They also consult various race magazines
in which tipsters in Mauritius give their opinions and analysis of each race. These race
magazines also give an insight into each horse's performance at training prior to the
race and very often contain interviews with horse trainers, jockeys or owners who give
their opinion on the chances of their horse.
Bettors place their trust in the analysis of professional tipsters because they believe
that these tipsters have access to more information and are closer to the horses'
connections. Bettors do not spend enough time doing their own analysis, as analysing a
race is a time-consuming and very tedious task. Most bettors do not have the proper
techniques or tools to capture hidden patterns and trends in the large amount of data
that can be associated with a particular race. Bettors are confronted with large
amounts of different analytical data and often fail to find good correlations among the
different attributes, or the percentage by which a particular attribute influences the
result, which would help them reach a conclusion.

Copyright 2015 The authors www.IST-Africa.org/Conference2015 Page 1 of 8


Our aim is to provide the proper tools to analyse all the data associated with a race
before drawing a conclusion about the potential winner. Data, more specifically the
results of the first 41 race meetings, have been collected for the 2014 racing season. A
thorough analysis of this large amount of data should help to identify horses that
stand out. People who are interested in gambling on sports such as horse racing and
football, and scientists who are interested in the application of probability to such
types of events, will be the direct beneficiaries of this work. However, the applications
of neural networks are not limited solely to such types of data. Neural networks can be
used to find appropriate solutions for problems in diverse fields such as astrophysics,
medicine, business, banking and finance, transport and logistics, bioinformatics, law
and many others.
This paper proceeds as follows. In section 2, we give an overview of related work in
this area, although we have noticed that little work has been done in this field.
Section 3 describes the methodology and the data collection processes. The results are
presented, discussed and evaluated in section 4. Section 5 concludes the paper and
reflects on the potential of neural networks with regard to economic affairs.

2. Literature Review
Silverman [1] used a Gibbs model in order to predict the speed of a horse. He assumed that
the horse with the fastest speed would win most of the races. However, when tested with
real data, the horse with the fastest speed won only 21.63% of the total races. Although this
is better than pure random guessing, it is far less than what experienced tipsters can
achieve. In Mauritius, about 40% of the winning horses are public favourites and the best
tipsters usually achieve around the same percentage. Thus, the success rate obtained by
Silverman is far from adequate and would very likely lead to a loss if bets were placed.
In another experiment, Silverman [1] used some additional features such as the number of
days since the horse last ran, the change in weight, whether the horse had gained or lost
weight and its average speed in recent races. A conditional logistic model was then
used to make the prediction. This model fared better than the first one as the return on
investment was found to be 36.73% when the parameters were varied.
A probabilistic approach was used in [3] to determine the winner of a horse race, using
data from the 2010 horse racing season at the Champs de Mars racecourse in Mauritius. In
2010, each horseracing meeting had 8 races. Thirty meetings (two hundred and forty
races) were used to collect the statistics and the testing was based on meetings 31, 32
and 33. Out of 24 races, the system predicted fourteen winners compared with eleven
winners from the best professional tipster for the same three meetings. In [2], a similar
experiment, this time using fuzzy logic, was carried out on the 2012 horseracing season.
In this case, only 10 winners were predicted but this was still slightly better than
the professional tipsters. A simulation of a betting operation in which Rs100 was staked
on each of the predicted winners was also conducted. The return on investment was
calculated to be 90.6%.
Schumaker [4] used support vector regression and features such as fastest time, win
percentage, place percentage and average finishing position for the last four races to predict
the rank of a horse in its next race. He explains how a balancing point can be found
between accuracy and payout in order to maximise payout. In [5], Schumaker compared his
S&C Racing System with random chance, crowd wisdom and Dr. Z Bettors on six different
wagers (win, place, show, exacta, quiniela and trifecta). His system outperformed all
three of these. He also found that using information from only the last four races was
enough to maximise both accuracy and payout.
In [6], the authors used artificial neural networks for the prediction of winners in
horseraces. In particular, they applied five different training algorithms to horse
racing data collected from the Aqueduct track in New York. Back-propagation (BP) and
Back-propagation with momentum (BPM) were able to predict 39 winners out of 100 races.
Quasi-Newton BFGS (BFG) predicted 35 winners, Conjugate Gradient Descent (CGD)
predicted 32 winners and Levenberg-Marquardt (LM) predicted only 29 winners. Thus, BP
and BPM were the best at predicting winners while LM was found to be the fastest
algorithm. The work of Davoodi and Khanteymoori [6] was based primarily on the work of
Williams and Li [7], who conducted similar experiments on data collected from the
Caymanas Park race track in Jamaica.
In [8], Bishell explained that most models used to predict the outcome of horseraces
consider the strength of each horse separately from the other horses. Thus, he
created two new models, which he called precise predictor with clustering and race
predictor, which considered the strength of each horse relative to the other horses
running in the same race. However, his experiments concluded that this approach is worse
than considering the horses separately.
Edelman [9] used a machine learning technique known as the support vector machine
(SVM) to predict the odds of horseraces. He used a sample of 200 races with 12 features
and showed that SVM can do equally well at predicting the finishing positions compared
with traditional linear and logistic regression-based methods. Lessmann et al. [10],
building on the work of Edelman [9], Sung and Johnson [11] and Benter [12], varied the
parameters of the support vector machine in the first step of the commonly accepted
two-stage procedure. They showed that it is possible to further enhance the predictive
accuracy of such models by 56% through a judicious choice of parameters.
All the works mentioned so far are related to gambling. However, it is important to
note that neural networks are only a tool that can be used in a wide variety of domains.
Ao [13] has used a neural network to predict the number of tourist arrivals. Neural
networks have also been used extensively in the prediction of stock prices [13][14] and in
analysing the performance of enterprises [15][16]. Bhurtun et al. [17] have used neural
networks to predict the peak energy demand for the next hour in Mauritius. Neural
networks are a powerful tool but the results are often misinterpreted and the benefits
are often exaggerated. To harness the power of this technique, it is important to
recognise its shortcomings and to carry out more research in order to make better
decisions regarding the type of network and the parameters that must be used to get
valuable results. Neural networks are not a panacea for all types of problems. Some
problems can be solved using other, much simpler methods such as linear regression and
k-nearest neighbour.

3. Methodology
The aim of this paper is to analyse in depth the large amount of data associated with a
race in order to identify hidden trends that are very difficult to notice without
proper tools, and thereby to enable more accurate predictions. The specific objectives
are: to gather all data for each horse participating in a particular race, to provide the
neural network with the training data and desired output for supervised learning, to test
the model developed from the training data on the testing data and to determine the
potential winner of a race.
Data for the 2014 racing season have been collected in an Excel sheet. The data consist
of the results of each race in each of the first 41 race meetings. On average there were
between 8 and 9 races per race meeting, so the Excel sheet contains 347 races. The neural
network first needs to be provided with training data and the expected output. It uses a
supervised learning algorithm to develop a model based on the training data. Then the
output for the testing data can be predicted. The training data consists of the following
inputs.

3.1 Weight

Weight refers to the weight the horse will be carrying and includes the jockey's weight.
The weight is determined by the rating of the horse, which gives an idea of the horse's
ability. The rating of a well-performing horse rises and consequently the horse will
carry more weight. The weight a horse carries is important, as a horse with a top weight
such as 61kg is likely to be penalised compared with a horse carrying 50kg. Horses with
a low weight are likely to produce a better finish.

3.2 Draw

The draw or barrier draw is the horse's starting position in the starting stalls. This is
quite a determining factor for the position the horse will be able to secure during the
race. It also determines how much effort a horse will have to make in the early stages of
the race. The Champ de Mars racetrack is a tricky one and a good starting position in the
stalls is always favourable. For example, for an 1850m race, a big barrier draw is
undesirable. This is because there is a tight bend just 150m after the starting line.
Horses starting from a bad barrier and wanting to lead will have to cover more ground
round this bend if they are not fast enough when coming out of their respective gates.

3.3 Odds

It is common knowledge that a horse with a high chance of winning will offer the lowest
return. So the odds of a horse give an idea of how the horse's chance of winning is
perceived by the bookmaker.

3.4 Jockey

The jockey is the most influential factor in determining the probability of a horse
winning a race. Different jockeys have different abilities. Moreover, this factor is quite
unpredictable, as a jockey might be less focused at a particular race meeting due to
other problems. To classify jockeys based on their abilities, the worth of a jockey is
calculated by dividing the number of wins by the number of rides in a horse racing
season. So a jockey having a high value is considered to be a good jockey.
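
The jockey's worth described above is a simple win ratio. A minimal sketch in Python follows; the function name and the two season records are hypothetical and are not taken from the paper's data.

```python
def jockey_worth(wins: int, rides: int) -> float:
    """Fraction of rides won in a season (0.0 for a jockey with no rides)."""
    if wins < 0 or rides < 0 or wins > rides:
        raise ValueError("expected 0 <= wins <= rides")
    return wins / rides if rides else 0.0


# Hypothetical season records as (wins, rides); illustrative numbers only.
season_records = {"Jockey A": (30, 150), "Jockey B": (12, 120)}

worth = {name: jockey_worth(wins, rides) for name, (wins, rides) in season_records.items()}
best_jockey = max(worth, key=worth.get)  # highest win ratio
```

With this definition a higher value always means a better record, which is the reading used in the text.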

3.5 Previous Performance

The previous performance is generally what drives the odds. For a horse that finished in
the top spots in its last race, the probability of performing equally well the next time is
considered high. The last 5 performances have been considered. Generally, a horse which
has been performing well in its last outings is expected to perform well.

3.6 Distance

The distance to cover is an important factor. Horses are generally classified according to the
distances at which they excel. A horse which runs well over short distances is known as a
sprinter. Horses performing best over middle-distance races are known as milers and
long-distance horses are called stayers. Races of different distances are run at the Champ
de Mars. These are: 1000m, 1365m, 1400m, 1500m, 1600m, 1650m, 1850m, 2100m and 2300m.

3.7 Margin

The margin is the number of lengths separating the winner and the horse. A horse that won
its race therefore has a margin value of zero. This is the attribute we are going to
predict. The horse with the lowest margin is considered to be the winner.

Figure 1. A Neural Network to Predict the Rank of a Horse in a Race


A neural network is an algorithm that can learn complex patterns from a set of data. The
patterns that are discovered are then used to make predictions on new data. A neural
network usually consists of three types of layers: an input layer, zero or more hidden
layers and an output layer. As shown in Fig. 1, the input layer consists of the features
that are believed to have an impact on the output. The hidden layer allows the network to
discover complex relationships that usually exist between the inputs and the output.
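
To make the layer structure concrete, the sketch below runs one forward pass through a network with one hidden layer of five nodes, as stated in the abstract. It is only an illustration under assumptions: the nine inputs assume that each of the five previous performances is a separate input alongside weight, draw, odds and distance; the activation is taken to be the standard logistic function 1/(1 + e^(-x)); and the weights and the feature values are arbitrary, not those learnt by the software used in this study.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Standard logistic function, with outputs in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: input layer -> hidden layer -> single output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


N_INPUTS = 9  # assumed: weight, draw, five previous performances, odds, distance
N_HIDDEN = 5  # hidden nodes, as stated in the abstract

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden_weights = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
hidden_biases = [0.0] * N_HIDDEN
output_weights = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
output_bias = 0.0

# A hypothetical horse whose nine features have already been scaled to [0, 1].
scaled_features = [0.8, 0.9, 0.5, 0.5, 0.8, 0.6, 0.2, 0.7, 0.4]
scaled_margin = forward(scaled_features, hidden_weights, hidden_biases,
                        output_weights, output_bias)
```

Because the output node is also a sigmoid, the value produced lies between 0 and 1 and would have to be scaled back to a margin in lengths.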

4. Experiments, Results and Evaluation


In the 2014 racing season, there were 43 race meetings. In this study, the first forty-one
(41) race meetings have been used as the training set and the last two meetings have been
used as the testing set. There are 347 races in the training set and 16 races in the
testing set. An artificial neural network has been used to predict the horses which will
finish in the first four positions. A trial version of NeuroXL Predictor [18] has been
used as the neural engine. The parameters used for training and making the predictions
are shown in Table 1.
Table 1. Parameters Used for Constructing the Neural Network
Parameter                        Value
Number of Epochs/Cycles          20000
Minimum Weight Delta             0.000001
Scale Inputs and Output Values   Yes
Initial Weights                  0.3
Learning Rate                    0.3
Momentum                         0.60
Neurons in Hidden Layer          5
Activation Function              Zero-based Log-Sigmoid
Table 1 shows the parameters that have been used for initialising the multi-layer
perceptron. The neural network is allowed to perform up to 20000 cycles to reach
convergence. Five nodes were used in the hidden layer and a zero-based log-sigmoid
activation function was used to estimate the final values.
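
How the learning rate, momentum, cycle limit and minimum weight delta of Table 1 could interact is sketched below on a toy problem with a single weight. This is an assumption-laden illustration, not the internals of NeuroXL Predictor: the update rule is the textbook gradient descent with momentum, and "Minimum Weight Delta" is assumed to be a stopping threshold on the size of the weight change.

```python
def train_with_momentum(gradient, weight, learning_rate=0.3, momentum=0.60,
                        max_epochs=20000, min_weight_delta=0.000001):
    """Gradient descent with momentum on a single weight.

    Stops after max_epochs, or earlier once the weight change is smaller
    than min_weight_delta (an assumed reading of that parameter).
    """
    delta = 0.0
    epochs_run = 0
    for epochs_run in range(1, max_epochs + 1):
        # New step = plain gradient step + a fraction of the previous step.
        delta = -learning_rate * gradient(weight) + momentum * delta
        weight += delta
        if abs(delta) < min_weight_delta:
            break
    return weight, epochs_run


# Toy error surface E(w) = (w - 2)^2 with gradient 2 * (w - 2); not the paper's data.
final_weight, epochs_run = train_with_momentum(lambda w: 2.0 * (w - 2.0), weight=0.3)
```

On this toy problem the loop stops far below the 20000-cycle cap, which shows why the cap acts only as an upper bound on training time.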

Figure 2. Building the Model based on the Training Set


The first step in making the predictions involves creating a model based on the data in
the training set. The blue line in Fig. 2 shows the variation in the actual data while
the green line shows how the neural network has fitted the training data. The next step
is to make the prediction. Five features, namely the weight the horse is carrying, its
draw, its five previous performances, its odds on Saturday and the distance of the race,
have been used in order to predict the margin at the finishing post. The jockey's worth
was not used because for the international meetings we have assumed that all jockeys
have equal worth.
Table 2. Predicted Results for Race 1 of Meeting 42
                                                                    Predicted  Predicted  Actual
Horse             Weight  Draw  P1  P2  P3  P4  P5  Odds  Distance  Margin     Rank       Rank
Kowloon Bay       59      10    6   6   9   7   3   1200  1600      3.578      1          1
Dream In Combat   59.5    8     7   6   9   7   2   450   1600      3.735      2          2
Captain Matthew   61      2     5   2   2   3   4   330   1600      7.484      3
Storm Alterno     57      9     2   7   4   9   6   1100  1600      8.438      4
Sheriff Marshall  59.5    7     5   8   9   5   5   1600  1600      8.954      5
Fort Noble        59.5    1     4   6   1   6   5   700   1600      10.807     6          3
Arromonches       59      3     5   5   6   1   2   600   1600      15.186     7          4
Young Royal       60.5    6     1   11  4   2   7   800   1600      23.102     8

Table 2 shows the results after running the prediction algorithm on the first race of the
42nd race meeting. The horses have been sorted in ascending order of their predicted
margin. The predicted rank and actual rank are also shown. The neural network has
predicted Kowloon Bay as the winner, Dream In Combat in second position, Captain
Matthew in third place and Storm Alterno completing the quartet. We can see that the
neural network has been able to predict the first and second places correctly in this race.
However, the horses which were predicted to finish in the 6th and 7th positions have
completed the trifecta and quartet respectively.
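
The ranking step can be reproduced from the predicted margins in Table 2: sort the horses by predicted margin and compare with the actual ranks. The sketch below uses the values of Table 2; the variable names are ours.

```python
# Predicted margins (in lengths) for Race 1 of Meeting 42, from Table 2.
predicted_margin = {
    "Kowloon Bay": 3.578,
    "Dream In Combat": 3.735,
    "Captain Matthew": 7.484,
    "Storm Alterno": 8.438,
    "Sheriff Marshall": 8.954,
    "Fort Noble": 10.807,
    "Arromonches": 15.186,
    "Young Royal": 23.102,
}

# Actual ranks as given in Table 2 (only the first four finishers are listed).
actual_rank = {"Kowloon Bay": 1, "Dream In Combat": 2, "Fort Noble": 3, "Arromonches": 4}

# The lowest predicted margin is ranked first.
ranking = sorted(predicted_margin, key=predicted_margin.get)
predicted_rank = {horse: position for position, horse in enumerate(ranking, start=1)}

# Horses whose predicted rank matches their actual rank.
correct = [horse for horse, rank in actual_rank.items() if predicted_rank[horse] == rank]
```

Here `correct` contains Kowloon Bay and Dream In Combat, while Fort Noble and Arromonches were ranked 6th and 7th by the network.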

Table 3. Predicted Results v/s Actual Results for Meetings 42 and 43
        Race Meeting 42                         Race Meeting 43
        Predicted Results   Actual Results      Predicted Results   Actual Results
Race    1st 2nd 3rd 4th     1st 2nd 3rd 4th     1st 2nd 3rd 4th     1st 2nd 3rd 4th
1       7   3   1   8       7   3   4   6       5   7   2   3       5   3   8   1
2       2   5   6   8       8   7   3   1       6   2   1   8       2   1   4   5
3       6   2   5   3       6   7   3   4       5   8   1   7       3   4   6   8
4       5   7   3   4       4   2   7   6       3   5   7   6       5   8   7   6
5       1   8   4   6       5   2   1   6       2   5   7   3       1   6   3   4
6       3   8   6   7       1   2   5   7       4   1   5   7       6   4   3   1
7       1   5   4   3       5   1   3   8       7   8   3   4       7   3   2   8
8       2   1   5   4       5   8   4   1       1   2   6   7       6   3   8   4
Table 3 shows the detailed predicted and actual results for meetings 42 and 43. Only the
first four positions have been shown. Overall, the neural network has been able to predict 4
winners out of 16 races, i.e., a success rate of 25%. However, 3 horses that were
predicted to finish second have won their respective races. Furthermore, 2 horses that were
predicted to finish third have also won their races and 2 horses that were predicted to
finish in fourth position have actually won their races. We can also note that for races 5
and 7 of meeting 42 and race 6 of meeting 43, the horses that were predicted to win the
race have finished in the 2nd or 3rd position.
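
The counts quoted above can be checked directly against Table 3. The sketch below encodes each race as a pair (predicted first four, actual first four) of horse numbers and tallies the predicted position of each actual winner; the helper name is ours.

```python
from collections import Counter

# (predicted first four, actual first four) for races 1 to 8, from Table 3.
meeting_42 = [
    ([7, 3, 1, 8], [7, 3, 4, 6]), ([2, 5, 6, 8], [8, 7, 3, 1]),
    ([6, 2, 5, 3], [6, 7, 3, 4]), ([5, 7, 3, 4], [4, 2, 7, 6]),
    ([1, 8, 4, 6], [5, 2, 1, 6]), ([3, 8, 6, 7], [1, 2, 5, 7]),
    ([1, 5, 4, 3], [5, 1, 3, 8]), ([2, 1, 5, 4], [5, 8, 4, 1]),
]
meeting_43 = [
    ([5, 7, 2, 3], [5, 3, 8, 1]), ([6, 2, 1, 8], [2, 1, 4, 5]),
    ([5, 8, 1, 7], [3, 4, 6, 8]), ([3, 5, 7, 6], [5, 8, 7, 6]),
    ([2, 5, 7, 3], [1, 6, 3, 4]), ([4, 1, 5, 7], [6, 4, 3, 1]),
    ([7, 8, 3, 4], [7, 3, 2, 8]), ([1, 2, 6, 7], [6, 3, 8, 4]),
]


def predicted_position_of_winner(predicted, actual):
    """Position (1 to 4) at which the actual winner was predicted, or None."""
    winner = actual[0]
    return predicted.index(winner) + 1 if winner in predicted else None


races = meeting_42 + meeting_43
tally = Counter(predicted_position_of_winner(p, a) for p, a in races)
winners_in_top_four = sum(count for position, count in tally.items() if position is not None)
success_rate = tally[1] / len(races)
```

The tally gives 4, 3, 2 and 2 winners at predicted positions 1 to 4, i.e. 11 of the 16 winners within the first four predicted positions and a success rate of 0.25.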
These results are well below those claimed in [6] and [7], although we need to point out
that the data is different and the features used are also quite different. It is also
important to point out that none of the predicted winners were crowd favourites. Instead,
they can be considered as long shots, as the odds of three of the four winning horses were
above Rs1000. Betting Rs100 on each of these 16 races would have led to a positive payout
of Rs2000. From this angle, the system seems to be a profitable one. However, it is too
early to draw any firm conclusions about the reliability of this system, as more testing
would be required for that.
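
The payout figure can be related to the 225% return quoted in the conclusions with a short calculation. The sketch assumes that the "positive payout of Rs2000" is the net gain after deducting the stakes; this reading is ours and is not stated explicitly in the text.

```python
STAKE_PER_RACE = 100  # Rs staked on each predicted winner, as in the text
N_RACES = 16          # races in the testing set
NET_GAIN = 2000       # Rs; ASSUMPTION: "positive payout" means profit after stakes

total_staked = STAKE_PER_RACE * N_RACES        # Rs1600
total_collected = total_staked + NET_GAIN      # Rs3600
return_ratio = total_collected / total_staked  # amount collected relative to the stakes
profit_ratio = NET_GAIN / total_staked         # net gain relative to the stakes
```

Under this assumption the amount collected is 2.25 times the stakes, which matches a 225% return, while the net gain alone is 1.25 times the stakes.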

5. Conclusions
The aim of this paper was to gauge the potential of using artificial neural networks for
identifying prospective winners of horseraces. To the best of our knowledge, no such work
has been done earlier on data from the Champs de Mars track. Data from 363 races were
collected and divided into a training set and a testing set. Out of 16 races, our system
was able to predict 4 winners correctly. The best winning percentage of professional
tipsters was 38% for the 2014 horse racing season while the percentage of crowd favourites
that won was 40%. Thus, our proposed system fared less well than these two approaches. We
also estimated the payout for the 16 races and a 225% return on investment was obtained
because the system favoured long shots. From this angle, the system is as good as previous
systems that have been presented in the literature. In the future, we intend to use even
more features and to work on different types of wagers in order to find the one with the
best payout. The effects of changing the different parameters of the neural network will
also be investigated. The results will also be compared with other machine learning
approaches such as naive Bayes, fuzzy logic and support vector machines. The lessons
learnt from this research can be applied to make predictions on more important national
economic affairs such as predicting the price of stocks, fuels and vehicles, currency
exchange rates, the inflation rate, energy consumption, fish capture sites, tourist
arrivals, population growth and the risks of developing life-threatening diseases.
However, in order to unleash the full potential of neural networks, more
multi-disciplinary collaboration is required.

References
[1] N. Silverman, Optimal Decisions with Multiple Agents of Varying Performance, Ph.D. Dissertation,
University of California, Los Angeles, 2013.
[2] S. Pudaruth, R. Seesaha and L. Rambacussing, Generating Horse Racing Tips at the Champs de Mars
using Fuzzy Logic, International Journal of Computer Science and Technology, Vol. 4, No. 3, July -
September 2013.
[3] S. Pudaruth, N. Medard and Z. B. Dookhun, Horse Racing Prediction at the Champs de Mars using a
Weighted Probabilistic Approach, International Journal of Computer Applications, Vol. 72, No. 5, May
2013.
[4] R. P. Schumaker, Using SVM Regression to Predict Harness Races: A One Year Study of Northfield
Park, in Proc. of the Midwest Decision Sciences Institute Conference, Indianapolis, May 2011.
[5] R. P. Schumaker, Machine Learning the Harness Track: Crowdsourcing and Varying Race History,
Decision Support Systems and Electronic Commerce, Vol. 54, No. 3, pp. 1370-1379, February 2013.
[6] E. Davoodi and A. R. Khanteymoori, Horse Racing Prediction using Artificial Neural Networks, in
Proc. of the 11th WSEAS International Conference on Recent Advances in Neural Networks, Fuzzy
Systems & Evolutionary Computing, pp. 155-160, 2010.
[7] J. Williams and Y. Li, A Case Study using Neural Network Algorithms: Horse Racing Predictions in
Jamaica, in Proc. of the International Conference on Artificial Intelligence, Las Vegas, 2008.
[8] A. Bishell, Machine Learning and New Zealand Horse Racing Prediction, BSc. Report, Department of
Computer Science, Massey University, New Zealand, 2006.
[9] D. Edelman, Adapting Support Vector Machine methods for Horserace Odds Prediction, Annals of
Operations Research, Vol. 151(1), pp. 325-336, April 2007.
[10] S. Lessmann, M. C. Sung and J. E. V. Johnson, Adaptive Least-Square Support Vector Regression
Models to Forecast the Outcome of Horseraces, The Journal of Prediction Markets, Vol. 1(3), pp. 169-
187, 2007.
[11] M. Sung and J. E. V. Johnson, Comparing the Effectiveness of One- and Two-step Conditional Logit
Models for Predicting Outcomes in a Speculative Market, Journal of Prediction Markets, Vol. 1,
pp. 43-59, 2007.
[12] W. Benter, Computer-based Horse Race Handicapping and Wagering Systems: A Report, in D. B.
Hausch, V. S. Y. Lo and W. T. Ziemba (eds), Efficiency of Racetrack Betting Markets (London,
Academic Press), pp. 183-198, 1994.
[13] S. I. Ao, A Framework for Neural Network to make Business Forecasting with Hybrid VAR and GA
Components, Engineering Letters, Vol. 13, No. 1, May 2006.
[14] C. Wong, Re-Thinking Financial Neural Network Studies: Seven Cardinal Confounds, in Proc. of the
Global Conference on Business and Finance Proceedings, Vol. 6, No. 1, Las Vegas, Nevada, 2011.
[15] A. Bhunia, S. Mukhuti and G. Roy, Financial Performance Analysis - A Case Study, Current Research
Journal of Social Sciences, Vol. 3, No. 3, pp. 269-275, 2011.
[16] S. Y. Lin, C. H. Chen and C. C. Lo, Currency Exchange Rates Prediction based on Linear Regression
Analysis using Cloud Computing, International Journal of Grid and Distributed Computing, Vol. 6,
No. 2, April 2013.
[17] C. Bhurtun, I. Jahmeerbacus and C. Jeewooth, Short Term Load Forecasting in Mauritius using Neural
Network, in Proc. of the 8th IEEE International Conference on the Industrial and Commercial Use of
Energy (ICUE), pp. 184-191, Cape Town, August 2011.
[18] NeuroXL, 2015. Online. http://neuroxl.com/products/excel-forecasting-software/neuroxl-predictor.htm.
Last Accessed: 28 February 2015.
