
INDIAN INSTITUTE OF MANAGEMENT, KOZHIKODE

QUANTITATIVE ANALYSIS IN FOOTBALL


Quantitative Methods Project
9/18/2013

This report shows various ways in which quantitative analysis can be applied in football to assess the performance of a team and predict the result of a match.

Quantitative Analysis in Football

Contents
INTRODUCTION
DESCRIPTIVE STATISTICS
PROBABILITY ANALYSIS
INTERVAL ESTIMATION
HYPOTHESIS TESTING
REFERENCES

Group 27


INTRODUCTION:
We are all well aware of how football has affected the financial world. The money generated by football has grown steadily since 1990, and there have been record-breaking financial deals and negotiations between football clubs and players. Besides these deals, a huge amount of money changes hands through betting. According to the Nevada Gaming Commission, $3.2 billion was wagered on sports in the state's casinos in 2011; of that amount, $1.34 billion, or 41 percent, was wagered on football alone. According to the Fantasy Sports Trade Association (FSTA), thirty-three million Americans participate in fantasy football, and $1.18 billion changes hands between players through pools each year. Hence, there is a need to evaluate quantitatively not only the players but also the performance of the team as a whole. Football results contain a large element of chance, but statistical analysis can still be used to estimate the likely outcomes of games. In this project we show how quantitative analysis can be used to analyse the performance of a team and, in turn, to predict the result of a match. In football betting, there are only three possible half-time and full-time outcomes (home win, draw, away win). We have used the results of the league matches played by two teams, Real Madrid and Manchester United, over the 15 seasons from 1998-1999 to 2012-2013 to analyse and predict their performance. For each season, the data contain the number of matches played, the league position, the points gained, and the home and away records (matches won, drawn and lost, goals scored for the team and goals scored against it). The data used for our analysis are given below.

Real Madrid

Pos = league position; Pld = matches played; HW, HD, HL = home matches won, drawn, lost; HF, HA = home goals for, against; AW, AD, AL, AF, AA = the same for away matches; Pts = points.

Season     Pos  Pld   HW   HD   HL   HF   HA   AW   AD   AL   AF   AA  Pts
2012-2013    2   38   17    2    0   67   21    9    5    5   36   21   85
2011-2012    1   38   16    2    1   70   19   16    2    1   51   13  100
2010-2011    2   38   16    1    2   61   12   13    4    2   41   21   92
2009-2010    2   38   18    0    1   60   18   13    3    3   42   17   96
2008-2009    2   38   14    2    3   49   29   11    1    7   34   23   78
2007-2008    1   38   17    0    2   53   18   10    4    5   31   18   85
2006-2007    1   38   12    4    3   32   18   11    3    5   34   22   76
2005-2006    2   38   11    4    4   40   21    9    6    4   30   19   70
2004-2005    2   38   15    1    3   43   12   10    4    5   28   20   80
2003-2004    4   38   13    2    4   43   26    8    5    6   29   28   70
2002-2003    1   38   13    5    1   52   22    9    7    3   34   20   78
2001-2002    3   38   14    5    0   48   14    5    4   10   21   30   66
2000-2001    1   38   15    3    1   53   15    9    5    5   28   25   80
1999-2000    5   38    9    4    6   31   27    7   10    2   27   21   62
1998-1999    2   38   14    2    3   46   24    7    3    9   31   38   68

Manchester United (English Premier League)

Pos = league position; Pld = matches played; HW, HD, HL = home matches won, drawn, lost; HF, HA = home goals for, against; AW, AD, AL, AF, AA = the same for away matches; Pts = points.

Season     Pos  Pld   HW   HD   HL   HF   HA   AW   AD   AL   AF   AA  Pts
2012-2013    1   38   16    0    3   45   19   12    5    2   41   24   89
2011-2012    2   38   15    2    2   52   19   13    3    3   37   14   89
2010-2011    1   38   18    1    0   49   12    5   10    4   29   25   80
2009-2010    2   38   16    1    2   52   12   11    3    5   34   16   85
2008-2009    1   38   16    2    1   43   13   12    4    3   25   11   90
2007-2008    1   38   17    1    1   47    7   10    5    4   33   15   87
2006-2007    1   38   15    2    2   46   12   13    3    3   37   15   89
2005-2006    2   38   13    5    1   37    8   12    3    4   35   26   83
2004-2005    3   38   12    6    1   31   12   10    5    4   27   14   77
2003-2004    3   38   12    4    3   37   15   11    2    6   27   20   75
2002-2003    1   38   16    2    1   42   12    9    6    4   32   22   83
2001-2002    3   38   11    2    6   40   17   13    3    3   47   28   77
2000-2001    1   38   15    2    2   49   12    9    6    4   30   19   80
1999-2000    1   38   15    4    0   59   16   13    3    3   38   29   91
1998-1999    1   38   14    4    1   45   18    8    9    2   35   19   79
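As a quick consistency check on the data tables, each season's points should equal 3 x (home wins + away wins) + (home draws + away draws), using the 3/1/0 points rule stated in the probability analysis. The minimal Python sketch below applies this check to the Manchester United rows; the tuple layout and the helper name `points_ok` are illustrative choices, not part of the original analysis.

```python
# Manchester United rows from the table above:
# (season, home wins, home draws, away wins, away draws, points)
MAN_U = [
    ("2012-2013", 16, 0, 12, 5, 89),
    ("2011-2012", 15, 2, 13, 3, 89),
    ("2010-2011", 18, 1, 5, 10, 80),
    ("2009-2010", 16, 1, 11, 3, 85),
    ("2008-2009", 16, 2, 12, 4, 90),
    ("2007-2008", 17, 1, 10, 5, 87),
    ("2006-2007", 15, 2, 13, 3, 89),
    ("2005-2006", 13, 5, 12, 3, 83),
    ("2004-2005", 12, 6, 10, 5, 77),
    ("2003-2004", 12, 4, 11, 2, 75),
    ("2002-2003", 16, 2, 9, 6, 83),
    ("2001-2002", 11, 2, 13, 3, 77),
    ("2000-2001", 15, 2, 9, 6, 80),
    ("1999-2000", 15, 4, 13, 3, 91),
    ("1998-1999", 14, 4, 8, 9, 79),
]


def points_ok(home_w, home_d, away_w, away_d, points):
    """True if points = 3 per win + 1 per draw (a loss earns 0)."""
    return 3 * (home_w + away_w) + (home_d + away_d) == points


# Collect the seasons whose points do not match the 3/1/0 rule
mismatches = [row[0] for row in MAN_U if not points_ok(*row[1:])]
print("Seasons failing the check:", mismatches)
```

The same check can be run on the Real Madrid rows by swapping in their columns.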

DESCRIPTIVE STATISTICS:
Descriptive statistics describes the main features of a collection of data. Measures commonly used to describe a data set include measures of central tendency and measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance) and the minimum and maximum values of the variables; kurtosis and skewness describe the shape of the distribution. Based on the positions held by the teams in the various seasons, Manchester United finished in the top three in all 15 seasons and finished first in 9 of them.


Figure: Positions profile of Man U (league finishes: 1st, 2nd, 3rd)

Real Madrid finished in the top five in all 15 seasons: first 5 times, second 7 times, and third, fourth and fifth once each, so second place was its most common finish.

Figure: Positions profile of Real Madrid (league finishes: 1st to 5th)

Stacked column charts show the relationship of individual items to the whole, comparing the contribution of each value to a total across categories. The numbers of home and away wins, draws and losses in each season can therefore be depicted as stacked column charts, with the segments of each column representing the numbers of wins, draws and losses.


Figure: Home Manchester United (stacked columns of wins, draws and losses per season, 1998-1999 to 2012-2013; vertical axis 0 to 20 matches)

Figure: Home Real Madrid (stacked columns of wins, draws and losses per season, 1998-1999 to 2012-2013; vertical axis 0 to 20 matches)


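Figure: Away Manchester United (stacked columns of wins, draws and losses per season; vertical axis 0 to 20 matches)

Figure: Away Real Madrid (stacked columns of wins, draws and losses per season; vertical axis 0 to 20 matches)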

The summary statistics for the number of home and away wins of each team are as follows.

Statistic            Man U Home   Man U Away   Real Madrid Home   Real Madrid Away
Mean                 14.73333     10.73333     14.26667           9.8
Standard Error       0.511456     0.589323     0.628427           0.711805
Median               15           11           14                 9
Mode                 16           13           14                 9
Standard Deviation   1.980861     2.282438     2.433888           2.756810
Sample Variance      3.92381      5.209524     5.92381            7.6
Kurtosis             -0.44462     1.366206     0.111816           0.715808
Skewness             -0.46411     -1.16011     -0.52951           0.570227
Range                7            8            9                  11
Minimum              11           5            9                  5
Maximum              18           13           18                 16
Sum                  221          161          214                147
Count                15           15           15                 15
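The layout of this table (Mean, Standard Error, ..., Count) resembles a spreadsheet descriptive-statistics report. A minimal Python sketch (3.8 or later) that recomputes it for Manchester United's home wins is shown here. It assumes the bias-adjusted sample formulas for skewness and excess kurtosis used by Excel's SKEW and KURT functions; with those formulas it reproduces the reported values, for example skewness -0.46411 and kurtosis -0.44462.

```python
import math
import statistics

# Manchester United home wins per season, 2012-2013 back to 1998-1999
HOME_WINS = [16, 15, 18, 16, 16, 17, 15, 13, 12, 12, 16, 11, 15, 15, 14]


def describe(xs):
    """Spreadsheet-style descriptive statistics for a sample."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
    zs = [(x - mean) / sd for x in xs]
    # Assumed: bias-adjusted sample skewness and excess kurtosis (Excel SKEW/KURT form)
    skew = n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in zs)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z ** 4 for z in zs)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard error": sd / math.sqrt(n),
        "median": statistics.median(xs),
        "mode": statistics.mode(xs),  # on ties, returns the first mode in data order
        "standard deviation": sd,
        "sample variance": statistics.variance(xs),
        "kurtosis": kurt,
        "skewness": skew,
        "range": max(xs) - min(xs),
        "minimum": min(xs),
        "maximum": max(xs),
        "sum": sum(xs),
        "count": n,
    }


for name, value in describe(HOME_WINS).items():
    print(f"{name:20s} {value:.6g}")
```

Here 15 and 16 both occur four times; `statistics.mode` returns 16 because it appears first in the data, which matches the mode in the table.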

Box Plot
This plot is used to show the dispersion of the values around the median and the skewness of the values. It is drawn from five numbers: the minimum (x1), the lower quartile (Q1), the median, the upper quartile (Q3) and the maximum (x2).

Real Madrid
Home Win (sorted): 9 11 12 13 13 14 14 14 15 15 16 16 17 17 18
Median = 14, Q1 = 13, Q3 = 16, Minimum (x1) = 9, Maximum (x2) = 18

Away Win (sorted): 5 7 7 8 9 9 9 9 10 10 11 11 13 13 16
Median = 9, Q1 = 8, Q3 = 11, Minimum (x1) = 5, Maximum (x2) = 16

Figure: Box plots of Real Madrid home and away wins

Real Madrid's home wins are left-skewed (skewness -0.53): most seasons cluster at the higher values, with a tail of a few weaker seasons, indicating that a high number of matches are won on home ground.

Real Madrid's away wins are right-skewed (skewness 0.57): most seasons cluster at the lower values, with a few stronger seasons, indicating that fewer matches are won away from home.

Manchester United
Home Win (sorted): 11 12 12 13 14 15 15 15 15 16 16 16 16 17 18
Median = 15, Q1 = 13, Q3 = 16, Minimum (x1) = 11, Maximum (x2) = 18

Away Win (sorted): 5 8 9 9 10 10 11 11 12 12 12 13 13 13 13
Median = 11, Q1 = 9, Q3 = 13, Minimum (x1) = 5, Maximum (x2) = 13


Manchester United's home wins are left-skewed (skewness -0.46), indicating that a high number of matches are won on home ground.

Manchester United's away wins are also left-skewed (skewness -1.16), indicating that a high number of matches are won away from home as well. The home and away distributions therefore have a similar shape, although the team still wins about four fewer matches per season away (mean 10.73) than at home (mean 14.73).
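The quartiles quoted above can be reproduced with a short Python sketch (3.8 or later). It assumes the quartiles are taken at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 of the sorted data, which is what `statistics.quantiles(..., method="exclusive")` does; for n = 15 these are the 4th, 8th and 12th values. The dictionary and helper names are illustrative.

```python
import statistics

# Wins per season, 2012-2013 back to 1998-1999 (from the data tables)
WINS = {
    "Real Madrid home": [17, 16, 16, 18, 14, 17, 12, 11, 15, 13, 13, 14, 15, 9, 14],
    "Real Madrid away": [9, 16, 13, 13, 11, 10, 11, 9, 10, 8, 9, 5, 9, 7, 7],
    "Manchester United home": [16, 15, 18, 16, 16, 17, 15, 13, 12, 12, 16, 11, 15, 15, 14],
    "Manchester United away": [12, 13, 5, 11, 12, 10, 13, 12, 10, 11, 9, 13, 9, 13, 8],
}


def five_number_summary(xs):
    """Return (minimum, Q1, median, Q3, maximum) for a box plot.

    method="exclusive" uses positions (n + 1)p of the sorted data; for
    n = 15 the three cut points fall exactly on the 4th, 8th and 12th values.
    """
    q1, median, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return min(xs), q1, median, q3, max(xs)


for label, xs in WINS.items():
    lo, q1, med, q3, hi = five_number_summary(xs)
    print(f"{label:24s} min={lo} Q1={q1:g} median={med:g} Q3={q3:g} max={hi}")
```

Other quartile conventions (for example the "inclusive" method) can give slightly different Q1 and Q3 for small samples, so the method should be stated alongside the box plot.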

PROBABILITY ANALYSIS
Determining the distribution of the number of home wins for both teams
Let X be the random variable that denotes the number of home wins in a season. We assume that X follows a normal distribution with parameters $\mu$ and $\sigma$. The standard normal variable is $Z = \frac{X - \mu}{\sigma}$, and $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ is the standard normal density function.

Manchester United

$\mu = 14.733$; $\sigma = 1.980860804$

Season      Win   Z         f(Z)
2012-2013   16    0.6395    0.3252
2011-2012   15    0.1346    0.3953
2010-2011   18    1.6491    0.1024
2009-2010   16    0.6395    0.3252
2008-2009   16    0.6395    0.3252
2007-2008   17    1.1443    0.2073
2006-2007   15    0.1346    0.3953
2005-2006   13    -0.8750   0.2720
2004-2005   12    -1.3799   0.1540
2003-2004   12    -1.3799   0.1540
2002-2003   16    0.6395    0.3252
2001-2002   11    -1.8847   0.0675
2000-2001   15    0.1346    0.3953
1999-2000   15    0.1346    0.3953
1998-1999   14    -0.3702   0.3725


Hence, the standard normal distribution of home wins is given by the following graph.

Figure: Standard normal density f(Z) against Z for Manchester United's home wins (Z from -3 to 2)

Real Madrid

$\mu = 14.26666667$; $\sigma = 2.433887739$

Season      Win   Z         f(Z)
2012-2013   17    1.1230    0.2131
2011-2012   16    0.7122    0.3101
2010-2011   16    0.7122    0.3101
2009-2010   18    1.5339    0.1238
2008-2009   14    -0.1096   0.397
2007-2008   17    1.1230    0.2131
2006-2007   12    -0.9313   0.2589
2005-2006   11    -1.3422   0.1625
2004-2005   15    0.3013    0.3814
2003-2004   13    -0.5204   0.3485
2002-2003   13    -0.5204   0.3485
2001-2002   14    -0.1096   0.397
2000-2001   15    0.3013    0.3814
1999-2000   9     -2.1639   0.0387
1998-1999   14    -0.1096   0.397

Hence, the standard normal distribution of home wins is given by the following graph.

Figure: Standard normal density f(Z) against Z for Real Madrid's home wins (Z from -3 to 2)
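The Z and f(Z) columns for both teams can be recomputed with a short Python sketch (3.8 or later). It assumes, as the report does, that $\mu$ and $\sigma$ are estimated by the sample mean and sample standard deviation of the 15 seasons. It evaluates the density exactly with `statistics.NormalDist`, so its f(Z) values can differ in the third decimal place from values read off a printed normal table.

```python
import statistics

# Home wins per season, 2012-2013 back to 1998-1999 (from the data tables)
HOME_WINS = {
    "Manchester United": [16, 15, 18, 16, 16, 17, 15, 13, 12, 12, 16, 11, 15, 15, 14],
    "Real Madrid": [17, 16, 16, 18, 14, 17, 12, 11, 15, 13, 13, 14, 15, 9, 14],
}
STD_NORMAL = statistics.NormalDist()  # mean 0, standard deviation 1


def z_scores_and_density(xs):
    """Return (x, z, f(z)) rows, with mu and sigma estimated from the sample."""
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    rows = []
    for x in xs:
        z = (x - mu) / sigma  # standardize: Z = (X - mu) / sigma
        rows.append((x, z, STD_NORMAL.pdf(z)))  # f(z) = exp(-z^2/2) / sqrt(2*pi)
    return rows


for team, wins in HOME_WINS.items():
    print(team)
    for x, z, f in z_scores_and_density(wins):
        print(f"  wins={x:2d}  Z={z:8.4f}  f(Z)={f:.4f}")
```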

For both teams, all 15 seasons lie within three standard deviations of the mean (|Z| < 3). The two teams also have similar means (14.73 and 14.27 home wins per season) and standard deviations (1.98 and 2.43), indicating similar performance on home ground.

Calculating the number of points a team can expect to gain in a match:
Number of points gained if the match is won = 3
Number of points gained if the match is drawn = 1
Number of points gained if the match is lost = 0

Manchester United
Result   Points (x)   P(x)          xP(x)
Win      3            0.775438596   2.326316
Draw     1            0.133333333   0.133333
Loss     0            0.09122807    0
E(X)                                2.459649

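Hence, the average number of points Manchester United can expect to gain in a home match is 2.46. (The probabilities are the team's home-match frequencies: 221 wins, 38 draws and 26 losses in 285 home matches.)

Real Madrid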
Result   Points (x)   P(x)       xP(x)
Win      3            0.515789   1.547368
Draw     1            0.231579   0.231579
Loss     0            0.252632   0
E(X)                             1.778947
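Both expected-points tables apply $E(X) = \sum x\,P(x)$ with points x = 3, 1, 0. Working back from the probabilities, they correspond to Manchester United's home record (221 wins, 38 draws, 26 losses in 285 home matches) and Real Madrid's away record (147 wins, 66 draws, 72 losses in 285 away matches). A minimal Python sketch, assuming relative frequencies as the probability estimates:

```python
from fractions import Fraction

POINTS = {"win": 3, "draw": 1, "loss": 0}


def expected_points(wins, draws, losses):
    """E(X) = sum of x * P(x), with P(x) estimated by relative frequency."""
    n = wins + draws + losses
    counts = {"win": wins, "draw": draws, "loss": losses}
    # Fractions keep the arithmetic exact until the final conversion to float
    return sum(POINTS[result] * Fraction(c, n) for result, c in counts.items())


# Counts taken from the home/away columns of the data tables
man_u_home = expected_points(221, 38, 26)  # 285 home matches
real_away = expected_points(147, 66, 72)   # 285 away matches
print(f"Manchester United, home: {float(man_u_home):.6f}")
print(f"Real Madrid, away:       {float(real_away):.6f}")
```

Passing a team's combined home and away counts to the same function gives the expected points for a match played anywhere.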


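Hence, the average number of points Real Madrid can expect to gain in an away match is 1.78. (The probabilities are the team's away-match frequencies: 147 wins, 66 draws and 72 losses in 285 away matches.) Because one figure is for home matches and the other for away matches, the two are not directly comparable.

Determining the expected amount of money that a team will earn in a future season

Manchester United
Considering a sample of 15 English Premier League seasons: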

Position  Event          Times in 15 seasons   x (in million dollars)   P(X)   xP(X)
1         Finishes 1st   9                     15.1                     0.60   9.06
2         Finishes 2nd   3                     7.3                      0.20   1.46
3         Finishes 3rd   3                     4.5                      0.20   0.90
Total                    15
E(X) = 11.42

Thus, the expected prize money for Manchester United in the next Premier League season is $11.42 million. On this basis, the management could incur a maintenance cost of up to $11.42 million and still expect to break even; a higher cost would be expected to result in a loss. (Currently, the maintenance cost for Manchester United stands at around $9 million a year.)

Real Madrid
Considering a sample of 15 Spanish La Liga seasons:
Position  Event          Times in 15 seasons   x (in million dollars)   P(X)   xP(X)
1         Finishes 1st   5                     8.6                      0.33   2.87
2         Finishes 2nd   7                     5.2                      0.47   2.43
3         Finishes 3rd   1                     4.1                      0.07   0.27
4         Finishes 4th   1                     3.3                      0.07   0.22
5         Finishes 5th   1                     2.1                      0.07   0.14
Total                    15
E(X) = 2.87 + 2.43 + 0.27 + 0.22 + 0.14 = 5.93

Thus, the expected prize money for Real Madrid in the next La Liga season is $5.93 million. On this basis, the management could incur a maintenance cost of up to $5.93 million and still expect to break even; a higher cost would be expected to result in a loss. (Currently, the maintenance cost for Real Madrid stands at around $4 million a year.)
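The expected-value calculation for the prize money can be checked with a few lines of Python. This sketch uses the exact relative frequencies (for example 5/15 rather than the rounded 0.33) as P(X); summing all five terms for Real Madrid gives about 5.93. The list layout and function name are illustrative.

```python
def expected_value(outcomes):
    """E(X) = sum of x * P(X = x), with P estimated as times / total seasons."""
    total = sum(times for times, _ in outcomes)
    return sum(amount * times / total for times, amount in outcomes)


# (times finished in that position over 15 seasons, prize money in $ million)
MAN_U = [(9, 15.1), (3, 7.3), (3, 4.5)]                           # 1st to 3rd
REAL_MADRID = [(5, 8.6), (7, 5.2), (1, 4.1), (1, 3.3), (1, 2.1)]  # 1st to 5th

print(f"Manchester United: ${expected_value(MAN_U):.2f} million")
print(f"Real Madrid:       ${expected_value(REAL_MADRID):.2f} million")
```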

INTERVAL ESTIMATION
Manchester United
Estimating the mean number of league goals scored by Manchester United per season. A sample of the past 15 seasons shows the mean to be 78.73 and the standard deviation to be 10.23. Assuming that the number of goals scored per season is normally distributed, construct a 95% confidence interval for the mean.

Data and Analysis:
Sample size (n)              15
Mean                         78.73
Standard deviation (s)       10.23
Confidence level             95%
Standard error (s/√n)        2.64
Degrees of freedom           14
t value (t0.025, 14)         2.145

Using the t distribution, the confidence interval is $\bar{x} \pm t_{0.025,14}\, s/\sqrt{n}$, which gives:
Upper limit: 84.40
Lower limit: 73.07

Real Madrid
Estimating the mean number of league goals scored by Real Madrid per season


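A sample of the past 15 seasons shows the mean to be 83 and the standard deviation to be 17.24. Assuming that the number of goals scored per season is normally distributed, construct a 95% confidence interval for the mean.

Data and Analysis:
Sample size (n)              15
Mean                         83
Standard deviation (s)       17.23783215
Confidence level             95%
Standard error (s/√n)        4.450789122
Degrees of freedom           14
t value (t0.025, 14)         2.145

Using the t distribution, the confidence interval is $\bar{x} \pm t_{0.025,14}\, s/\sqrt{n}$, which gives:
Upper limit: 92.55
Lower limit: 73.45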

Conclusion: We can be 95% confident that Manchester United's mean number of league goals per season lies between about 73 and 84, and that Real Madrid's lies between about 73 and 93. Comparing the two teams, Manchester United's goal tally varies less from season to season (a standard deviation of about 10 goals, against about 17 for Real Madrid), so Manchester United can be expected to perform more consistently than Real Madrid.
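The goal totals behind these intervals can be rebuilt from the data tables as home goals for plus away goals for in each season; doing so reproduces the reported means (78.73 and 83) and standard deviations (about 10.23 and 17.24). A minimal Python sketch of the t interval follows; it hard-codes the table value t = 2.145 for 14 degrees of freedom, as the report does, rather than calling a statistics library for the t quantile.

```python
import math
import statistics

# League goals per season (home goals for + away goals for), 2012-2013 back to 1998-1999
GOALS = {
    "Manchester United": [86, 89, 78, 86, 68, 80, 83, 72, 58, 64, 74, 87, 79, 97, 80],
    "Real Madrid": [103, 121, 102, 102, 83, 84, 66, 70, 71, 72, 86, 69, 81, 58, 77],
}
T_CRIT = 2.145  # t(0.025, 14) read from a t table, as used in the report


def t_interval(xs, t_crit=T_CRIT):
    """Two-sided confidence interval for the mean: x_bar +/- t * s / sqrt(n)."""
    n = len(xs)
    mean = statistics.mean(xs)
    se = statistics.stdev(xs) / math.sqrt(n)  # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se


for team, goals in GOALS.items():
    lo, hi = t_interval(goals)
    print(f"{team}: 95% CI for mean goals per season = ({lo:.2f}, {hi:.2f})")
```

This is an interval for the long-run mean number of goals per season, not a range that each individual future season must fall within; a prediction interval for a single season would be wider.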

HYPOTHESIS TESTING:
Manchester United
One-sample hypothesis test
Problem: A random sample of 570 English Premier League matches featuring Manchester United showed that the average number of goals scored by them was $\bar{x}$ = 1.182 per match, with standard deviation s = 0.1851. Is the mean number of goals scored by Manchester United per match greater than 1? (Level of significance = 1%)

EIGHT-STEP PROCEDURE:
Step 1. The parameter of interest is the mean number of goals scored by Manchester United per match, $\mu$ ($\sigma$ is not given).
Step 2. $H_0: \mu \le 1$
Step 3. $H_a: \mu > 1$
Step 4. $\alpha = 0.01$
Step 5. The test statistic is $t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
Step 6. Since n = 570, d.f. = 569. As d.f. > 100, it can be approximated as infinity and the critical value read from the standard normal table: for $\alpha = 0.01$, $t_{0.01, 569} \approx 2.326$. Hence, reject $H_0$ if $t_0 > 2.326$.
Step 7. Computations: with $\bar{x}$ = 1.182, s = 0.1851, $\mu_0$ = 1 and n = 570, we have $t_0 = \frac{1.182 - 1}{0.1851/\sqrt{570}} = 23.47$.
Step 8. Conclusion: Since $t_0 = 23.47 > 2.326$ ($t_{0.01, 569}$), we reject the null hypothesis $H_0: \mu \le 1$ at the 0.01 level of significance. Therefore, based on the sample of 570 Manchester United EPL matches, we conclude at the 1% level of significance that the mean number of goals scored by Manchester United per match exceeds 1.

Real Madrid
One-sample hypothesis test
Problem: A random sample of 570 Spanish La Liga matches featuring Real Madrid showed that the average number of goals scored by them was $\bar{x}$ = 1.31 per match, with standard deviation s = 0.301. Is the mean number of goals scored by Real Madrid per match greater than 1? (Level of significance = 5%)

EIGHT-STEP PROCEDURE:


Step 1. The parameter of interest is the mean number of goals scored by Real Madrid per match, $\mu$ ($\sigma$ is not given).
Step 2. $H_0: \mu \le 1$
Step 3. $H_a: \mu > 1$
Step 4. $\alpha = 0.05$
Step 5. The test statistic is $t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
Step 6. Since n = 570, d.f. = 569. As d.f. > 100, it can be approximated as infinity and the critical value read from the standard normal table: for $\alpha = 0.05$, $t_{0.05, 569} \approx 1.645$. Hence, reject $H_0$ if $t_0 > 1.645$.
Step 7. Computations: with $\bar{x}$ = 1.31, s = 0.301, $\mu_0$ = 1 and n = 570, we have $t_0 = \frac{1.31 - 1}{0.301/\sqrt{570}} = 24.59$.
Step 8. Conclusion: Since $t_0 = 24.59 > 1.645$ ($t_{0.05, 569}$), we reject the null hypothesis $H_0: \mu \le 1$ at the 0.05 level of significance. Therefore, based on the sample of 570 Real Madrid Spanish La Liga matches, we conclude at the 5% level of significance that the mean number of goals scored by Real Madrid per match exceeds 1.
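Steps 5 to 8 for both teams can be reproduced from the quoted summary figures with a short Python sketch (3.8 or later). Following Step 6, it approximates the t critical value for 569 degrees of freedom by the standard normal quantile, obtained from `statistics.NormalDist().inv_cdf`; the helper names are illustrative.

```python
import math
from statistics import NormalDist


def one_sample_t(x_bar, s, n, mu0):
    """Test statistic t0 = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))


def right_tail_critical(alpha):
    """Large-sample critical value z_alpha for a right-tailed test.

    Step 6 treats d.f. = 569 as effectively infinite, so t(alpha, 569) ~ z_alpha.
    """
    return NormalDist().inv_cdf(1 - alpha)


# (team, x_bar, s, n, alpha) as quoted in the two problems above
PROBLEMS = [
    ("Manchester United", 1.182, 0.1851, 570, 0.01),
    ("Real Madrid", 1.31, 0.301, 570, 0.05),
]

for team, x_bar, s, n, alpha in PROBLEMS:
    t0 = one_sample_t(x_bar, s, n, mu0=1.0)
    crit = right_tail_critical(alpha)
    decision = "reject H0" if t0 > crit else "do not reject H0"
    print(f"{team}: t0 = {t0:.2f}, critical value = {crit:.3f} -> {decision}")
```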


REFERENCES:
Source of data:
http://www.statto.com/football/teams/real-madrid/history/modern
http://www.statto.com/football/teams/manchester-united/history/modern
