Professional Documents
Culture Documents
Abstract
Rankings have predictive value for determining the outcomes of basketball games and tennis matches. Rankings, based on power scores,
are also available for NFL teams. This paper evaluates power scores as predictors of the outcomes of NFL games for the 19942000
seasons. The evaluation involves a comparison of forecasts generated from probit regressions based on power scores published in The New
York Times with those of a naive model, the betting market, and the opinions of the sports editor of The New York Times. We conclude that
the betting market is the best predictor followed by the probit predictions based on power scores. We analyze the editors predictions and
find that his predictions were comparable to a bootstrapping model of his forecasts but were inferior to those based on power scores and
even worse than naive forecasts.
2001 International Institute of Forecasters. Published by Elsevier Science B.V. All rights reserved.
Keywords: Sports forecasting; Expert predictions; Bootstrapping
1. Introduction
In an earlier paper (Boulier & Stekler, 1999) we
found that the rankings of teams in basketball
tournaments and of tennis players in the Grand Slam
events were good predictors of the outcomes of these
competitions. Using statistical probit models, we
concluded that there was predictive value in both the
information that one competitor was considered
superior to another and in the quantitative difference
in the rankings. In many sports there is additional
information contained in so-called power scores
that measure the relative abilities of teams based on
objective criteria such as the teams performance and
the strength of its own and its opponents schedules.1
*Corresponding author. Tel.: 11-202-994-8088; fax: 11-202994-6147.
E-mail address: mortile@gwu.edu (B.L. Boulier).
1
Bassett (1997) and Glickman and Stern (1998) provide
references for alternative methods of statistical estimation of
power scores.
0169-2070 / 01 / $ see front matter 2001 International Institute of Forecasters. Published by Elsevier Science B.V. All rights reserved.
doi:10.1016/S0169-2070(01)00144-3
258
2. Data
During the sample period, the NFL had between
28 and 31 teams,4 each of which played 16 games in
a 17-week period from September to December. Our
analysis is based on games played during weeks
617 of the 19941996 seasons, weeks 517 of the
19971999 seasons, and weeks 717 of the 2000
season, yielding 1212 observations. The New York
Times publishes a power score that measures the
relative abilities of each team. This measure was first
published after the first 5 weeks of the 19941996
seasons, the first four games of the 19971999
seasons, and the first six games of the 2000 season.
The power scores summarizes a teams relative
performance in past games. It is based on the won
loss record of the teams, point margins of victory or
loss, whether games were played at home or away,
and the quality of opponents. Runaway scores are
downweighted and performance in recent games is
given greater weight. The top team is assigned a
rating of 1.000 and relative power scores are used to
In 1994, which is the first year in our sample, there were only 28
teams. The number of teams increased to 30 in 1995 and to 31 in
2000. The teams are divided into two Conferences and within each
Conference there are three Divisions. While each team plays 16
games, their opponents are selected on the basis of each teams
performance in the previous year.
259
Table 1
Descriptive statistics a
Variable
Mean
Standard
deviation
Minimum
Maximum
0.245
210.528
0.608
0.483
0.611
0.754
0.183
7.086
0.488
0.500
0.488
0.431
0.001
230
0
0
0
0
0.965
21
1
1
1
1
0.549
0.498
3. Results
5
260
Table 2
Spearman rank correlations of power score ratings of 31 NFL teams by eight systems for week 17 of the 1999 NFL season a
NY
Times
Sagarin
LA Times
Huckaby
Kambour
Massey
Moore
Packard
0.837
0.885
0.934
0.832
0.957
0.960
0.970
Sagarin
USA
Today
0.850
0.893
0.804
0.873
0.814
0.821
LA
Times
0.906
0.829
0.928
0.891
0.875
Huckaby
Kambour
Massey
Moore
0.896
0.976
0.936
0.913
0.884
0.854
0.849
0.975
0.937
0.944
Websites: The New York Times (www.nytimes.com), Jeff Sagarin (www.usatoday.com / sports / sagarin / nfl00.htm), The Los Angeles
Times (www.latimes.com), Stewart Huckaby (www.hometown.aol.com / swhuck / nfl.html), Edward Kambour (www.stat.tamu.edu / |
kambour / NFL.html), Sonny Moore (www.members.aol.com / powerrater / nfl-foot.htm), Erik Packard (www.mesastate.edu / |epackard /
nfl.html).
Table 3
Proportion of times that the team that is predicted to win actually wins by forecasting method, 19942000
Betting market a
Year
Power score:
higher rank
wins
Naive: home
team predicted
to win
1994
1995
1996
1997
1998
1999
2000
All years
Brier scoreall
years:
158
170
167
183
183
189
162
1212
0.639
0.535
0.587
0.623
0.634
0.624
0.611
0.608
0.627
0.571
0.635
0.557
0.596
0.577
0.630
0.597
0.589
0.612
0.593
0.607
0.645
0.614
0.611
0.611
153
167
165
178
183
189
160
1195
0.392
0.403
0.389
If the spread was zero in the betting market, that observation was excluded. There were 17 such cases.
0.647
0.629
0.636
0.674
0.683
0.667
0.663
0.658
0.342
O (r 2 d )
N
n 51
QR 5 ]]]],
N
where d n 5 1 if the event occurs on the nth occasion
and equals 0 otherwise, and r n is the predicted
probability that the event will occur on the nth
occasion. Predicting that the higher ranked team will
win can be interpreted as a forecast that the event
will occur with probability 1. If the forecasts are
always accurate, QR is 0; if the predictions are
correct 50% of the time, the value of the statistic
equals 0.5. Predictions that are always wrong
produce QR 5 1. Whenever the values of r n are
constrained to equal 1 or 0, the Brier Score equals
the proportion of wrong predictions. The Brier Score
is equivalent to the mean square error of the probability forecasts. The Brier Scores range from 0.342
for the predictions of the betting market to 0.403 for
the predictions of the sports editor.
7
261
3.3. Probits
Our analysis of the forecasts derived solely from
the rankings based on power scores did not take into
account either the quantitative difference in the ranks
or the difference in the power scores. In this section,
we estimate a probit model designed to predict the
probability that the higher ranked team will win. The
explanatory variables will either be the difference of
ranks or the difference in power scores from which
the rankings are derived. We can then determine
whether it is possible to improve upon the performance of the forecast: the higher ranked team
will win. Thus it will be possible to determine
whether the difference in ranks and / or power scores
of two teams contains information beyond merely
knowing that one team has a higher rank than the
other.
A probit statistical model is a statistical model
relating the probability of the occurrence of discrete
random events (Yi ), which take 0, 1 values such as
winning or losing, to some set of explanatory
variables. It yields probability estimates of the event
occurring if the explanatory variables have specified
values. Specifically, if we let Yi 5 1 for a win and
Yi 5 0 otherwise, then the probit can be specified as:
b 9x i
Prob[Yi 5 1] 5
E f(z) dz
2`
262
See Vergin and Sosik (1999, p. 23) for a brief review of the
literature on the sources of the home-field advantage.
11
A low pseudo-R 2 does not mean that rank is not a powerful
predictor of outcomes of games on average. Suppose, for example,
there are 200 games. Assume that in 100 of these games there is a
big difference in ranks of opponents and in 100 there is a small
difference in ranks. Suppose further that the higher ranked team
wins 75% of games when there is a big difference in ranks and
55% of games when there is a small difference in ranks. For this
sample of games, a probit regression estimates that the marginal
effect of a dummy variable denoting a big difference in ranks is
0.20 and is statistically significant at the 0.003 level. Yet, the
pseudo-R 2 is only 0.034.
Table 4
Coefficients of probits based on differences of ranks and differences of power scores, without and with a dummy variable for home team
advantage
Independent
variable
Constant
Difference in
ranks
Difference in
power scores
Home team is
higher ranked
dummy
Pseudo R 2
(2)
(3)
(4)
0.0098
(0.0655)
20.0256*
(0.0053)
20.2949*
(0.0759)
20.0273*
(0.0054)
0.0289
(0.0614)
20.2750*
(0.0723)
1.0246*
(0.2087)
1.0905*
(0.2130)
0.6268*
(0.0754)
0.6263*
(0.0753)
0.01
0.06
0.02
0.06
Notes: standard errors in parentheses. An asterisk indicates that the coefficient is statistically significant at the 0.01 level. There are 1212
observations.
263
Table 5
Predicted and actual probabilities of winning by difference in ranks, 19942000
Difference in
ranks
30
29
28
27
26
25
24
23
22
21
20
19
18
17
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
Predicted
probability of
higher ranked
team winning
Proportion of wins
0.782
0.774
0.767
0.759
0.751
0.742
0.734
0.725
0.717
0.708
0.699
0.690
0.681
0.672
0.663
0.653
0.644
0.634
0.625
0.615
0.605
0.595
0.585
0.575
0.565
0.555
0.545
0.535
0.524
0.514
2
6
9
9
11
16
16
17
25
22
25
25
35
42
40
47
43
43
53
60
63
56
46
68
69
65
64
80
69
86
1.000
1.000
0.778
0.667
0.636
0.813
0.750
0.824
0.760
0.909
0.720
0.720
0.714
0.548
0.625
0.596
0.605
0.605
0.509
0.617
0.667
0.518
0.522
0.603
0.507
0.662
0.594
0.575
0.565
0.477
264
Table 6
Brier scores for recursive probit regressions and for two naive models, The New York Times sports editor, and the betting market,
19942000
Year
Difference
in ranks
plus home
team
dummy
Naive models
Difference
in power
scores
Difference in
power scores
plus home team
dummy
Probability forecasts
Higher
ranked
team
picked
Home
team
picked
New
York
Times
sports
editor
Betting
market
Win/loss
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
1994
1995
1996
1997
1998
1999
2000
All years
146
157
155
170
171
175
148
1122
0.236
0.257
0.242
0.238
0.225
0.242
0.254
0.242
0.238
0.259
0.235
0.229
0.209
0.233
0.247
0.235
0.237
0.255
0.243
0.237
0.226
0.243
0.241
0.240
0.239
0.257
0.235
0.231
0.209
0.234
0.236
0.234
0.411
0.408
0.387
0.359
0.316
0.389
0.419
0.382
0.363
0.471
0.419
0.382
0.357
0.365
0.399
0.393
0.418
0.408
0.394
0.406
0.357
0.377
0.392
0.392
0.363
0.420
0.361
0.459
0.404
0.400
0.385
0.400
141
155
153
165
171
175
146
1106
0.362
0.381
0.373
0.339
0.322
0.331
0.356
0.351
265
266
Table 7
Probit regressions examining The New York Times editors picks, 19942000
Row
(1)
(2)
(3)
(4)
Dependent
variable
Independent variables
Constant
Power score
difference:
higher ranked
team minus
lower ranked
team
Home
team is
higher
ranked
Editor picks
higher ranked
team to win
Higher ranked
team wins
Editor picks
home team to
win
Home team
wins
20.1525
(0.0806)
3.0646*
(0.2900)
0.4337*
(0.0846)
20.3289*
(0.0861)
0.2172*
(0.0424)
1.0138*
(0.2229)
0.6148*
(0.0761)
0.2598*
(0.0626)
Editor picks
higher
ranked
team to win
Power score
difference:
home team
minus away
team
Editor
picks
home
team to
win
Pseudo-R 2
0.12
0.1040
(0.0903)
0.06
3.3080*
(0.1799)
1.1117*
(0.1544)
0.29
0.0929
(0.0899)
0.06
Notes: standard errors are in parentheses. An asterisk indicates significance at the 0.01 level. There are 1212 observations.
267
268
Table 8
Brier Scores of The New York Times editors picks compared with bootstrap estimates
Year
(1)
New York Times sports
editors picks
(2)
Bootstrap forecasts
of win or loss
1994
1995
1996
1997
1998
1999
2000
All years
146
157
155
170
171
175
148
1122
0.363
0.420
0.361
0.459
0.404
0.400
0.385
0.400
0.404
0.446
0.381
0.382
0.351
0.366
0.392
0.388
Table 9
Accuracy of judgmental forecasts of outcomes of professional football games: The New York Times sports editor and four former NFL
players / coaches
Year
1994
1995
1996
1997
1998
1999
2000
Chris Collinsworth
No. of
games
Percent
correct
No. of
games
158
170
167
183
183
189
162
62.7
57.1
63.5
55.7
59.6
63.0
59.7
251
251
251
Jerry Glanville
Nick Buoniconti
Len Dawson
Percent
correct
No. of
games
Percent
correct
No. of
games
Percent
correct
No. of
games
Percent
correct
67.3
66.1
68.9
251
251
251
57.0
57.4
61.4
251
251
251
65.3
60.6
69.7
251
251
251
63.8
61.4
67.7
269
Acknowledgements
We appreciate the helpful comments of Keith Ord,
an Associate Editor, and the two referees.
270
References
Avery, C., & Chevalier, J. (1999). Identifying investor sentiment
from price paths: the case of football betting. Journal of
Business, 72, 493521.
Bassett, Jr. G. W. (1997). Robust sports ratings based on least
absolute values. The American Statistician, 51 (2), 99105.
Bunn, D., & Wright, G. (1991). Interaction of judgmental and
statistical forecasting methods: issues and analysis. Management Science, 37, 501518.
Boulier, B. L., & Stekler, H. O. (1999). Are sports seedings good
predictors? International Journal of Forecasting, 15, 8391.
Brier, G. W. (1950). Verification of weather forecasts expressed in
terms of probability. Monthly Weather Review, 78, 13.
Forrest, D., & Simmons, R. (2000). Forecasting sport: the
behavior and performance of football tipsters. International
Journal of Forecasting, 16, 317331.
Gandar, J. M., Dare, W. H., Brown, C. R., & Zuber, R. A. (1998).
Informed traders and price variations in the betting market for
professional basketball games. The Journal of Finance, 53 (1),
385401.
Glickman, M. E., & Stern, H. S. (1998). A state-space model for
national football league scores. Journal of the American
Statistical Association, 93, 2535.