Bayesian hierarchical model for the prediction of football results
Gianluca Baio1,2
Marta A. Blangiardo3
Abstract
The problem of modelling football data has become increasingly popular in the last few years and many different models have been proposed
with the aim of estimating the characteristics that bring a team to lose or
win a game, or to predict the score of a particular match. We propose a
Bayesian hierarchical model to address both these aims and test its predictive strength on data about the Italian Serie A championship 1991-1992.
To overcome the issue of overshrinkage produced by the Bayesian hierarchical model, we specify a more complex mixture model that results in
better fit to the observed data. We test its performance using an example
about the Italian Serie A championship 2007-2008.
Keywords: Bayesian hierarchical models; overshrinkage; Football data;
bivariate Poisson distribution.
Introduction
Statistical modelling of sport data is a popular topic and much research has been
produced to this aim, also in reference to football. From the statistical point of
view, the task is stimulating because it raises some interesting issues. One such
matter is related to the distributional form associated with the number of goals
scored in a single game by the two opponents.
* To whom correspondence should be sent. However, this work is the result of equal participation of both authors in all the aspects of preparation, analysis and writing of the manuscript.
Although the Binomial and Negative Binomial distributions were proposed in the late
1970s (Pollard et al. 1977), the Poisson distribution has been widely accepted
as a suitable model for these quantities; in particular, a simplifying assumption
often used is that of independence between the goals scored by the home and
the away team. For instance, Maher (1982) used a model with two independent
Poisson variables where the relevant parameters are constructed as the product
of the strength in the attack for one team and the weakness in defense for the
other.
Despite that, some authors have shown empirical, although relatively low,
levels of correlation between the two quantities (Lee 1997, Karlis & Ntzoufras
2000). Consequently, the use of more sophisticated models has been proposed,
for instance by Dixon & Coles (1997), who applied a correction factor to the
independent Poisson model to improve the performance in terms of prediction.
More recently, Karlis & Ntzoufras (2000, 2003) advocated the use of a bivariate
Poisson distribution that has a more complicated formulation for the likelihood
function, and includes an additional parameter explicitly accounting for the
covariance between the goals scored by the two competing teams. They specify
the model in a frequentist framework (although extensions using the Bayesian
approach have been described by Tsionas 2001), and their main purpose is the
estimation of the effects used to explain the number of goals scored.
We propose in this paper a Bayesian hierarchical model for the number of
goals scored by the two teams in each match. Hierarchical models are widely
used in many different fields as they are a natural way of taking into account
relations between variables, by assuming a common distribution for a set of
relevant parameters, thought to underlie the outcomes of interest (Congdon
2003).
Within the Bayesian framework, which naturally accommodates hierarchical
models (Bernardo & Smith 1999), there is no need for the bivariate Poisson
model. We show here that, by assuming two conditionally independent Poisson
variables for the number of goals scored, correlation is taken into account, since
the observable variables are mixed at an upper level. Moreover, as we work
in a Bayesian context, prediction of a new game under the model is naturally
accommodated by means of the posterior predictive distribution.
The paper is structured as follows: first we describe in Section 2 the model and the
data used; Section 3 describes the results in terms of parameter estimation as well as
prediction of a new outcome. In Section 4 we deal with the problem of overshrinkage
and present a possible solution using a mixture model. Finally, Section 5 presents some
issues and some possible extensions that can be material for future work, and in
Section 6 we include the WinBUGS code for our analysis.
The model
In order to allow direct comparison with Karlis & Ntzoufras (2003), we first
consider the Italian Serie A for the season 1991-1992. The league is made up of a
total of T = 18 teams, playing each other twice in a season (once at home and
once away). We indicate the number of goals scored by the home and by the
away team in the g-th game of the season (g = 1, ..., G = 306) as $y_{g1}$ and $y_{g2}$
respectively.
The vector of observed counts $y = (y_{g1}, y_{g2})$ is modelled as independent
Poisson:
$$y_{gj} \mid \theta_{gj} \sim \text{Poisson}(\theta_{gj}),$$
where the parameters $\theta = (\theta_{g1}, \theta_{g2})$ represent the scoring intensity in the g-th
game for the team playing at home (j = 1) and away (j = 2), respectively.
We model these parameters according to a formulation that has been used
widely in the statistical literature (see Karlis & Ntzoufras 2003 and the references
therein), assuming a log-linear random effect model:
$$\log \theta_{g1} = \mathit{home} + \mathit{att}_{h(g)} + \mathit{def}_{a(g)},$$
$$\log \theta_{g2} = \mathit{att}_{a(g)} + \mathit{def}_{h(g)}.$$
Poisson-logNormal models have been discussed and widely used in the literature;
see for instance Aitchinson & Ho (1989), Chib & Winkelman (2001) and
Tunaru (2002).
The parameter home represents the advantage for the team hosting the game
and we assume that this effect is constant for all the teams and throughout the
season. In addition, the scoring intensity is determined jointly by the attack
and defense ability of the two teams involved, represented by the parameters
att and def , respectively. The nested indexes h(g), a(g) = 1, . . . , T identify the
team that is playing at home (away) in the g-th game of the season.
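As a minimal sketch of the log-linear predictor described above, the following Python function maps a home effect and team-specific attack and defense effects to the two scoring intensities. The function name, the dictionary layout and the numerical values are hypothetical and serve only to illustrate the structure; they are not estimates from the paper.

```python
import math


def scoring_intensities(home, att, deff, h, a):
    """Return (theta_home, theta_away) for a game between teams h and a.

    att and deff map a team index to its attack and defense effect
    ('deff' is used because 'def' is a reserved word in Python).
    """
    theta_home = math.exp(home + att[h] + deff[a])  # home team scores
    theta_away = math.exp(att[a] + deff[h])         # away team scores
    return theta_home, theta_away


# Hypothetical effects for two teams (illustrative values only)
att = {1: 0.3, 2: -0.1}
deff = {1: -0.2, 2: 0.1}
theta_home, theta_away = scoring_intensities(0.2, att, deff, h=1, a=2)
print(round(theta_home, 3), round(theta_away, 3))
```

With all effects set to zero both intensities equal 1, so each effect can be read as a multiplicative change in the expected number of goals.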
The data structure for the model is presented in Table 1 and consists of the
name and code of the teams, and the number of goals scored for each game
of the season. As can be seen, the indexes h(g) and a(g) are uniquely
associated with one of the 18 teams. For example, in Table 1 Sampdoria are
always associated with the index 16, whether they play away, as for a(4), or at
home, as for h(303).
Table 1: The data structure

  g    home team    away team    h(g)   a(g)   yg1   yg2
  1    Verona       Roma          18     15     0     1
  2    Napoli       Atalanta      13      2     1     0
  3    Lazio        Parma         11     14     1     1
  4    Cagliari     Sampdoria      4     16     3     2
  ...  ...          ...          ...    ...    ...   ...
  303  Sampdoria    Cremonese     16      5     2     2
  304  Roma         Bari          15      3     2     0
  305  Inter        Atalanta       9      2     0     0
  306  Torino       Ascoli        17      1     5     2
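The indexing in Table 1 can be reproduced with a short Python sketch. It assumes that each team code is the position of the team name in alphabetical order, which is consistent with the rows shown but is not stated explicitly in the text; the variable names are hypothetical.

```python
# The 18 teams of the 1991-1992 season
teams = [
    "Ascoli", "Atalanta", "Bari", "Cagliari", "Cremonese", "Fiorentina",
    "Foggia", "Genoa", "Inter", "Juventus", "Lazio", "Milan",
    "Napoli", "Parma", "Roma", "Sampdoria", "Torino", "Verona",
]

# Assumption: codes 1..T follow alphabetical order of the team names
code = {name: i + 1 for i, name in enumerate(sorted(teams))}

# First four games of Table 1: (home team, away team, yg1, yg2)
games = [
    ("Verona", "Roma", 0, 1),
    ("Napoli", "Atalanta", 1, 0),
    ("Lazio", "Parma", 1, 1),
    ("Cagliari", "Sampdoria", 3, 2),
]

# Rows in the form (h(g), a(g), yg1, yg2) used by the model
rows = [(code[h], code[a], y1, y2) for h, a, y1, y2 in games]
print(rows)
```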
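Conversely, for each t = 1, ..., T, the team-specific effects are modelled as exchangeable from a common distribution (here and below the second argument of the Normal distribution is a precision, as in WinBUGS):
$$\mathit{att}_t \sim \text{Normal}(\mu_{att}, \tau_{att}), \qquad \mathit{def}_t \sim \text{Normal}(\mu_{def}, \tau_{def}).$$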
As suggested by various works, we need to impose some identifiability constraints on the team-specific parameters. In line with Karlis & Ntzoufras (2003),
we use a sum-to-zero constraint, that is
$$\sum_{t=1}^{T} \mathit{att}_t = 0, \qquad \sum_{t=1}^{T} \mathit{def}_t = 0.$$
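One common way to impose a sum-to-zero constraint is to centre unconstrained effects on their own mean, which is also the device used in the WinBUGS listing at the end of the paper. The Python sketch below illustrates the idea; the function name and the raw values are hypothetical.

```python
def sum_to_zero(raw_effects):
    """Centre a list of unconstrained effects so that they sum to zero."""
    mean = sum(raw_effects) / len(raw_effects)
    return [x - mean for x in raw_effects]


# Hypothetical unconstrained attack effects for four teams
att_star = [0.40, -0.10, 0.25, -0.35]
att = sum_to_zero(att_star)
print(att, sum(att))
```

Centring leaves the differences between teams unchanged, so only the overall level of the effects is fixed by the constraint.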
However, we also assessed the performance of the model using a corner constraint instead, in which the team-specific effects for only one team are set
to 0, for instance $\mathit{att}_1 := 0$ and $\mathit{def}_1 := 0$. Even if this latter method is slightly
faster to run, the interpretation of these coefficients is incremental with respect
to the baseline identified by the team associated with an attacking and defending
strength of 0 and therefore is less intuitive.
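Finally, the hyper-priors of the attack and defense effects are modelled independently using again flat prior distributions:
$$\mu_{att} \sim \text{Normal}(0, 0.0001), \qquad \mu_{def} \sim \text{Normal}(0, 0.0001),$$
$$\tau_{att} \sim \text{Gamma}(0.1, 0.1), \qquad \tau_{def} \sim \text{Gamma}(0.1, 0.1).$$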
A graphical representation of the model is depicted in Figure 1. The inherent hierarchical nature implies a form of correlation between the observable variables $y_{g1}$ and $y_{g2}$ by means of the unobservable hyper-parameters
$\eta = (\mu_{att}, \mu_{def}, \tau_{att}, \tau_{def})$. In fact, the components of $\eta$ represent a latent structure that we assume to be common for all the games played in a season and
that determines the average scoring rate.
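The argument can be made explicit with the law of total covariance. Writing $\theta_{g1}$ and $\theta_{g2}$ for the two scoring intensities (notation assumed here), and using the conditional independence of the two counts given the intensities together with $E[y_{gj} \mid \theta_{gj}] = \theta_{gj}$, a sketch of the derivation is:

```latex
\begin{aligned}
\operatorname{Cov}(y_{g1}, y_{g2})
  &= E\left[\operatorname{Cov}(y_{g1}, y_{g2} \mid \theta_{g1}, \theta_{g2})\right]
   + \operatorname{Cov}\left(E[y_{g1} \mid \theta_{g1}],\, E[y_{g2} \mid \theta_{g2}]\right) \\
  &= 0 + \operatorname{Cov}(\theta_{g1}, \theta_{g2}).
\end{aligned}
```

The first term vanishes by conditional independence. The second is in general non-zero once the shared hyper-parameters are integrated out, since the team effects entering the two intensities are drawn from the same distributions.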
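[Figure 1: graphical representation of the model. Nodes recoverable from the figure: $\mu_{att}$, $\tau_{att}$, $\mu_{def}$, $\tau_{def}$, home, $\mathit{att}_{h(g)}$, $\mathit{def}_{a(g)}$, $\mathit{att}_{a(g)}$, $\mathit{def}_{h(g)}$, $\theta_{g1}$, $\theta_{g2}$, $y_{g1}$, $y_{g2}$.]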
Results
Table 2: Posterior summaries of the attack and defense effects for each team, and of the home effect

                        attack effect                         defense effect
 t  team         mean     2.5%     median   97.5%    mean     2.5%     median   97.5%
 1  Ascoli      -0.2238  -0.5232  -0.2165   0.0595   0.4776   0.2344   0.4804   0.6987
 2  Atalanta    -0.1288  -0.4050  -0.1232   0.1321  -0.0849  -0.3392  -0.0841   0.1743
 3  Bari        -0.2199  -0.5098  -0.2213   0.0646   0.1719  -0.0823   0.1741   0.4168
 4  Cagliari    -0.1468  -0.4246  -0.1453   0.1255  -0.0656  -0.3716  -0.0645   0.2109
 5  Cremonese   -0.1974  -0.4915  -0.1983   0.0678   0.1915  -0.0758   0.1894   0.4557
 6  Fiorentina   0.1173  -0.1397   0.1255   0.3451   0.0672  -0.1957   0.0656   0.3372
 7  Foggia       0.3464   0.1077   0.3453   0.5811   0.3701   0.1207   0.3686   0.6186
 8  Genoa       -0.0435  -0.3108  -0.0464   0.2149   0.1700  -0.0811   0.1685   0.4382
 9  Inter       -0.2077  -0.4963  -0.2046   0.0980  -0.2061  -0.5041  -0.2049   0.0576
10  Juventus     0.1214  -0.1210   0.1205   0.3745  -0.3348  -0.6477  -0.3319  -0.0514
11  Lazio        0.0855  -0.1626   0.0826   0.3354   0.0722  -0.1991   0.0742   0.3145
12  Milan        0.5226   0.2765   0.5206   0.7466  -0.3349  -0.6788  -0.3300  -0.0280
13  Napoli       0.2982   0.0662   0.2956   0.5267   0.0668  -0.2125   0.0667   0.3283
14  Parma       -0.1208  -0.3975  -0.1200   0.1338  -0.2038  -0.5136  -0.2031   0.0859
15  Roma        -0.0224  -0.2999  -0.0182   0.2345  -0.1358  -0.4385  -0.1300   0.1253
16  Sampdoria   -0.0096  -0.2716  -0.0076   0.2436  -0.1333  -0.4484  -0.1317   0.1346
17  Torino       0.0824  -0.1821   0.0837   0.3408  -0.4141  -0.7886  -0.4043  -0.1181
18  Verona      -0.2532  -0.5601  -0.2459   0.0206   0.3259   0.1026   0.3254   0.5621

                 mean     2.5%     median   97.5%
    home         0.2124   0.1056   0.2128   0.3213
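As a rough worked example, the posterior means reported above can be plugged into the log-linear model to obtain approximate scoring rates and outcome probabilities for a single game, here Cagliari (team 4) at home against Sampdoria (team 16). This is only a plug-in sketch: a full posterior predictive analysis would average over the MCMC draws rather than use point estimates, and the function names below are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing k goals under a Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(theta_home, theta_away, max_goals=15):
    """Home win, draw and away win probabilities for independent Poisson
    counts, summing over scores up to max_goals for each team."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, theta_home) * poisson_pmf(j, theta_away)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Posterior means from the table of results (plug-in values)
home = 0.2124
att_cagliari, def_cagliari = -0.1468, -0.0656
att_sampdoria, def_sampdoria = -0.0096, -0.1333

theta_home = math.exp(home + att_cagliari + def_sampdoria)
theta_away = math.exp(att_sampdoria + def_cagliari)
win, draw, loss = outcome_probabilities(theta_home, theta_away)
print(round(theta_home, 4), round(theta_away, 4))
print(round(win, 3), round(draw, 3), round(loss, 3))
```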
[Figure: one panel per team (Ascoli, Atalanta, Bari, Cagliari, Cremonese, Fiorentina, Foggia, Genoa, Inter, Juventus, Lazio, Milan, Napoli, Parma, Roma, Sampdoria, Torino, Verona).]
One well-known potential drawback of Bayesian hierarchical models is the phenomenon of overshrinkage, under which some of the extreme occurrences tend
to be pulled towards the grand mean of the overall observations. In the application of a hierarchical model for the prediction of football results this can
be particularly relevant, as presumably a few teams will have very good performances (and therefore compete for the final title or the top positions), while
some other teams will have very poor performance (struggling against relegation).
The model of Section 2 assumes that all the attack and defense propensities are
drawn from a common process, characterised by the common vector of hyper-parameters $(\mu_{att}, \tau_{att}, \mu_{def}, \tau_{def})$; clearly, this might not be sufficient to capture
the presence of different quality in the teams, therefore producing overshrinkage,
with the effect of: a) penalising extremely good teams; and b) overestimating the
performance of poor teams.
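For intuition only, the mechanism can be seen in the simpler conjugate Normal-Normal setting, which is an assumption made here for illustration and not the Poisson model of this paper. Suppose an effect $\alpha_t$ has prior $\text{Normal}(\mu, \tau)$ (mean and precision) and is informed by the average $\bar{x}_t$ of $n$ observations, each with known precision $\tau_x$; all of these symbols are hypothetical. Then:

```latex
E[\alpha_t \mid \bar{x}_t] = w\,\bar{x}_t + (1 - w)\,\mu,
\qquad
w = \frac{n\,\tau_x}{n\,\tau_x + \tau}.
```

Every team is pulled towards the common mean $\mu$ with the same weight $1 - w$, so the teams furthest from $\mu$ are moved the most in absolute terms. This is the effect described in points a) and b) above.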
One possible way to avoid this problem is to introduce a more complicated
structure for the parameters of the model, in order to allow for three different
generating mechanisms, one for the top teams, one for the mid-table teams, and
one for the bottom-table teams. Also, in line with Berger (1984), shrinkage can
be limited by modelling the attack and defense parameters using a non central
t (nct) distribution on $\nu = 4$ degrees of freedom instead of the Normal of Section 2.
Consequently, the model for the likelihood, and the prior specification for
the $\theta_{gj}$ and for the hyper-parameter home are unchanged, while the other hyper-parameters are modelled as follows. First we define for each team t two latent
(unobservable) variables $grp^{att}(t)$ and $grp^{def}(t)$, which take on the values 1, 2 or
3 identifying the bottom-, mid- or top-table performances in terms of attack (defense). These are given suitable categorical distributions, each depending on a
vector of prior probabilities $\pi^{att} = (\pi^{att}_{1t}, \pi^{att}_{2t}, \pi^{att}_{3t})$ and $\pi^{def} = (\pi^{def}_{1t}, \pi^{def}_{2t}, \pi^{def}_{3t})$.
We specify minimally informative models for both $\pi^{att}$ and $\pi^{def}$ in terms of a
Dirichlet distribution with parameters (1, 1, 1), but obviously one can include
(perhaps subjective) prior information on the vectors $\pi^{att}$ and $\pi^{def}$ to represent
the prior chance that each team is in one of the three categories.
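The attack and defense effects are then modelled for each team t as:
$$\mathit{att}_t \sim \text{nct}\left(\mu^{att}_{grp(t)}, \tau^{att}_{grp(t)}, \nu\right), \qquad \mathit{def}_t \sim \text{nct}\left(\mu^{def}_{grp(t)}, \tau^{def}_{grp(t)}, \nu\right).$$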
In particular, since the values of $grp^{att}(t)$ and $grp^{def}(t)$ are unknown, this formulation essentially amounts to defining a mixture model on the attack and
defense effects:
$$\mathit{att}_t = \sum_{k=1}^{3} \pi^{att}_{kt} \times \text{nct}\left(\mu^{att}_k, \tau^{att}_k, \nu\right), \qquad \mathit{def}_t = \sum_{k=1}^{3} \pi^{def}_{kt} \times \text{nct}\left(\mu^{def}_k, \tau^{def}_k, \nu\right).$$
The location and scale of the nct distributions (as suggested, we use $\nu = 4$)
depend on the probability that each team actually belongs in any of the three
categories of $grp^{att}(t)$ and $grp^{def}(t)$.
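The mixture can be simulated by first drawing the latent group and then drawing the effect from the corresponding component. The Python sketch below does this under an assumption: the nct distribution is read as a location-scale Student t with a location, a precision and 4 degrees of freedom, as in the BUGS `dt` distribution. The group probabilities, locations and precisions are hypothetical values chosen only for illustration.

```python
import math
import random


def draw_t(location, precision, df, rng):
    """Draw from a location-scale Student t (scale = 1 / sqrt(precision))."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return location + (z / math.sqrt(chi2 / df)) / math.sqrt(precision)


def draw_effect(pi, mu, tau, df, rng):
    """Draw the latent group (1, 2 or 3) and the effect from its component."""
    k = rng.choices([0, 1, 2], weights=pi)[0]
    return k + 1, draw_t(mu[k], tau[k], df, rng)


rng = random.Random(1)
pi = [0.2, 0.5, 0.3]           # hypothetical group probabilities for one team
mu = [-1.0, 0.0, 1.0]          # hypothetical group locations
tau = [100.0, 100.0, 100.0]    # hypothetical group precisions
draws = [draw_effect(pi, mu, tau, 4, rng) for _ in range(20000)]
mean_effect = sum(x for _, x in draws) / len(draws)
print(round(mean_effect, 2))
```

With these inputs the average of the draws should be close to the probability-weighted location, 0.2 × (-1) + 0.5 × 0 + 0.3 × 1 = 0.1.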
The model for the location and scale parameters of the nct distributions
is specified as follows. If a team has poor performance, then it is likely
to show low (negative) propensity to score, and high (positive) propensity to
concede goals. This can be represented using suitable truncated Normal distributions, such as
$$\mu^{att}_1 \sim \text{truncNormal}(0, 0.001, -3, 0), \qquad \mu^{def}_1 \sim \text{truncNormal}(0, 0.001, 0, 3).$$
Symmetrically, a top team is likely to show high propensity to score and low propensity to concede goals:
$$\mu^{att}_3 \sim \text{truncNormal}(0, 0.001, 0, 3), \qquad \mu^{def}_3 \sim \text{truncNormal}(0, 0.001, -3, 0).$$
Finally, for the average teams we assume that the means of the attack and defense
effects have independent dispersed Normal distributions
$$\mu^{att}_2 \sim \text{Normal}(0, \tau^{att}_2), \qquad \mu^{def}_2 \sim \text{Normal}(0, \tau^{def}_2)$$
(that is, on average, the attack and defense effects are 0, but can take on both
negative and positive values).
For all the groups k = 1, 2, 3, the precisions are modelled using independent
minimally informative Gamma distributions
$$\tau^{att}_k \sim \text{Gamma}(0.01, 0.01), \qquad \tau^{def}_k \sim \text{Gamma}(0.01, 0.01).$$
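To see what such a truncated Normal prior looks like in practice, the sketch below draws from it by inverting the Normal distribution function. It assumes that the four arguments are mean, precision, lower bound and upper bound, with bounds (-3, 0) for the bottom-table attack location; the function name is hypothetical.

```python
import random
from statistics import NormalDist


def draw_trunc_normal(mean, precision, lower, upper, rng):
    """Inverse-CDF draw from a Normal(mean, precision) truncated to [lower, upper]."""
    dist = NormalDist(mean, precision ** -0.5)  # standard deviation = 1 / sqrt(precision)
    u = rng.uniform(dist.cdf(lower), dist.cdf(upper))
    return dist.inv_cdf(u)


rng = random.Random(7)
draws = [draw_trunc_normal(0.0, 0.001, -3.0, 0.0, rng) for _ in range(5000)]
sample_mean = sum(draws) / len(draws)
print(round(min(draws), 2), round(max(draws), 2), round(sample_mean, 2))
```

Because a precision of 0.001 corresponds to a standard deviation of about 31.6, the density is almost flat over an interval of width 3, so the draws are close to uniform on the interval and the constraint is carried mainly by the bounds.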
4.1
We used the Italian Serie A 2007-2008 to test the model described above. A few
major differences between this season and the one described in Section 2 should be
noted. Firstly, starting from the season 1994-1995, in Serie A a win is worth
3 points (instead of just 2). Moreover, in the season 2004-2005, the number
of teams in the league was increased to 20 (and therefore the total number of
games played is now G = 20 × 19 = 380).
These two factors are likely to increase the gap between top and bottom
teams and consequently to invalidate the assumption that the attack (defense)
effects are drawn from a common distribution. In fact, when using the basic
model of Section 2 for the 2007-2008 data, a large overshrinkage was produced, penalising in particular Inter and Roma (the top two teams). These clubs performed
very well, with over 80 points in the final table, while the estimated points were
only 69 and 67, respectively (see Table 3). Similarly, the bottom-table teams
were predicted to have significantly more points than observed.
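Observed and predicted points are obtained from match scores in the same way. The sketch below shows the conversion for a list of results, with the number of points per win as a parameter to reflect the change from 2 to 3 points. The function name, team labels and scores are hypothetical.

```python
def league_points(results, win_points=3):
    """Total points per team from (home team, away team, yg1, yg2) tuples."""
    points = {}
    for home_team, away_team, y1, y2 in results:
        points.setdefault(home_team, 0)
        points.setdefault(away_team, 0)
        if y1 > y2:
            points[home_team] += win_points
        elif y1 < y2:
            points[away_team] += win_points
        else:  # a draw is worth one point to each team
            points[home_team] += 1
            points[away_team] += 1
    return points


# Hypothetical scores between three teams labelled A, B and C
results = [("A", "B", 2, 1), ("B", "A", 1, 1), ("A", "C", 0, 1)]
print(league_points(results))
print(league_points(results, win_points=2))
```

Applying the same function to each set of simulated scores gives a predictive distribution of the final points for every team.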
Table 3: Observed results

team          points   scored   conceded
Inter           85       69        26
Roma            82       72        37
Juventus        72       72        37
Fiorentina      66       55        39
Milan           64       66        38
Sampdoria       60       56        46
Udinese         57       48        53
Napoli          50       50        53
Genoa           48       44        52
Atalanta        48       52        56
Palermo         47       47        57
Lazio           46       47        51
Siena           44       40        45
Cagliari        42       40        56
Torino          40       36        49
Reggina         40       37        56
Catania         37       33        45
Empoli          36       29        52
Parma           34       42        62
Livorno         30       35        60
[Figure: scatter plot with axes "Attack effect" and "Defense effect"; each team is labelled by a three-letter abbreviation (Int, Rom, Mil, Juv, Fio, Sam, Udi, Nap, Gen, Ata, Pal, Laz, Sie, Cag, Tor, Reg, Cat, Emp, Par, Liv).]
Figure 4: Posterior predictive validation of the mixture model: the black line
represents the observed cumulative points through the season, while the blue
line represents predictions for the Bayesian hierarchical mixture model
Figure 5: Posterior probability that each team belongs in one of the three groups. Panels: (a) Attack effect; (b) Defense effect. Horizontal axis: posterior probability of being a Bottom-, Mid- or Top-table club.
Discussion
The model presented in this paper is a simple application of Bayesian hierarchical modelling. The basic structure presented in Section 2 can be easily implemented
and run using standard MCMC algorithms, such as the one provided for WinBUGS
in Section 6. The performance of this model is not inferior to the one used by Karlis
& Ntzoufras (2003), which relies on a bivariate Poisson structure and requires
a specific algorithm.
Moreover, the hierarchical model can be easily extended to include a mixture
structure to account for the fact that teams show a different propensity to
score and concede goals, as represented by the attack and defense effects. In
this case, the model becomes more complex and time consuming, but it can still
be accommodated within standard MCMC algorithms (that we again developed
in WinBUGS).
Sensitivity analysis has been performed on the choice of the arbitrary
cutoffs of ±3 for the truncated Normal distributions used in the mixture model.
When larger values were chosen, the model was not able to assign the teams
to the three components of the mixture, with almost all of them being associated
with the second category. This is intuitively due to the fact that when the
truncated Normal distributions have a larger support, their density is too low
in comparison with the central component. On the other hand, when the cutoff
is too small, the densities of the extreme components are too high and
therefore none of the teams is assigned to the second category.
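The effect of the cutoff on the height of the density can be checked numerically. The sketch below evaluates the density of a Normal with mean 0 and precision 0.001, truncated to (-c, 0), at the same point for two cutoffs. The function name and the alternative cutoff of 10 are hypothetical choices for illustration.

```python
from statistics import NormalDist


def trunc_normal_density(x, mean, precision, lower, upper):
    """Density at x of a Normal(mean, precision) truncated to [lower, upper]."""
    if not lower <= x <= upper:
        return 0.0
    dist = NormalDist(mean, precision ** -0.5)
    return dist.pdf(x) / (dist.cdf(upper) - dist.cdf(lower))


density_cutoff_3 = trunc_normal_density(-1.0, 0.0, 0.001, -3.0, 0.0)
density_cutoff_10 = trunc_normal_density(-1.0, 0.0, 0.001, -10.0, 0.0)
print(round(density_cutoff_3, 3), round(density_cutoff_10, 3))
```

Since the untruncated density is nearly flat over these ranges, the truncated density is roughly one over the width of the interval, so widening the support from 3 to 10 lowers it by about a factor of three.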
One of the limitations of the results produced by the model (rather than of
the model itself) is that, for the sake of simplicity, predictions are obtained in
one batch, that is for all the G games of the season, using the observed results
to estimate the parameters. An alternative, more complex approach would be
to dynamically predict new instances of the games. One possibility would be to
define the hyper-parameters as time-specific, in order to account for periods
of variable form of the teams (including injuries, suspensions, etc.). Moreover,
prior information can be included at various levels in the model, perhaps in the
form of expert opinion about the strength of each team.
The complete codes for the WinBUGS models presented in Sections 2 and 4 are given below. Notice the use of the function djl.dnorm.trunc to code the truncated
Normal distribution. This is not available in the standard WinBUGS version, but
can be freely downloaded from the internet (see Lunn 2008).
model {
   # LIKELIHOOD AND RANDOM EFFECT MODEL FOR THE SCORING PROPENSITY
   for (g in 1:ngames) {
      # Observed number of goals scored by each team
      y1[g] ~ dpois(theta[g,1])
      y2[g] ~ dpois(theta[g,2])
      # Predictive distribution for the number of goals scored
      ynew[g,1] ~ dpois(theta[g,1])
      ynew[g,2] ~ dpois(theta[g,2])
      # Average Scoring intensities (accounting for mixing components)
      log(theta[g,1]) <- home + att[hometeam[g]] + def[awayteam[g]]
      log(theta[g,2]) <- att[awayteam[g]] + def[hometeam[g]]
   }
   # 1. BASIC MODEL FOR THE HYPERPARAMETERS
   # prior on the home effect
   home ~ dnorm(0,0.0001)
   # Trick to code the sum-to-zero constraint
   for (t in 1:nteams){
      att.star[t] ~ dnorm(mu.att,tau.att)
      def.star[t] ~ dnorm(mu.def,tau.def)
      att[t] <- att.star[t] - mean(att.star[])
      # The lines below are reconstructed from the priors stated in Section 2
      def[t] <- def.star[t] - mean(def.star[])
   }
   # priors on the random effects
   mu.att ~ dnorm(0,0.0001)
   mu.def ~ dnorm(0,0.0001)
   tau.att ~ dgamma(0.1,0.1)
   tau.def ~ dgamma(0.1,0.1)
}
References
Aitchinson, J. & Ho, C. (1989), The multivariate poisson-log normal distribution, Biometrika 76, 643-653.
Berger, J. (1984), The robust Bayesian point of view, in J. Kadane, ed., Robustness of Bayesian analysis, North Holland, Amsterdam, Netherlands.
Bernardo, J. M. & Smith, A. (1999), Bayesian Theory, John Wiley and Sons, New York, NY.
Chib, S. & Winkelman, R. (2001), Markov Chain Monte Carlo Analysis of Correlated Count Data, Journal of Business and Economic Statistics 4, 428-435.
Congdon, P. (2003), Applied Bayesian Modelling, John Wiley and Sons, Chichester, UK.
Dixon, M. & Coles, S. (1997), Modelling association football scores and inefficiencies in the football betting market, Journal of the Royal Statistical Society C 46, 265-280.
Karlis, D. & Ntzoufras, I. (2000), On modelling soccer data, Student 3, 229-244.
Karlis, D. & Ntzoufras, I. (2003), Analysis of sports data by using bivariate Poisson models, Journal of the Royal Statistical Society D 52, 381-393.
Lee, A. (1997), Modeling scores in the Premier League: is Manchester United really the best?, Chance 10, 15-19.
Lunn, D. (2008), WinBUGS code for the truncated normal distribution, Documentation and code available online.
URL: http://www.winbugs-development.org.uk/shared.html
Maher, M. (1982), Modelling association football scores, Statistica Neerlandica 36, 109-118.
Pollard, R., Benjamin, P. & Reep, C. (1977), Sport and the negative binomial distribution, in S. Ladany & R. Machol, eds, Optimal Strategies in Sports, North Holland, New York, NY, pp. 188-195.
Tsionas, E. (2001), Bayesian Multivariate Poisson Regression, Communications in Statistics - Theory and Methodology 30(2), 243-255.
Tunaru, R. (2002), Hierarchical bayesian models for multiple count data, Austrian Journal of Statistics 31, 221-229.