Combining Probability Forecasts (Michael P. Clements, David I. Harvey - 2010)

International Journal of Forecasting 27 (2011) 208–223
www.elsevier.com/locate/ijforecast
Combining probability forecasts
Michael P. Clements (a), David I. Harvey (b,*)
(a) Department of Economics, University of Warwick, United Kingdom
(b) School of Economics, University of Nottingham, United Kingdom
Abstract
We consider different methods for combining probability forecasts. In empirical exercises, the data generating process of
the forecasts and the event being forecast is not known, and therefore the optimal form of combination will also be unknown.
We consider the properties of various combination schemes for a number of plausible data generating processes, and indicate
which types of combinations are likely to be useful. We also show that whether forecast encompassing is found to hold between
two rival sets of forecasts or not may depend on the type of combination adopted. The relative performances of the different
combination methods are illustrated, with an application to predicting recession probabilities using leading indicators.
© 2010 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Keywords: Probability forecasts; Forecast combinations; Recession probabilities
1. Introduction
In this paper we consider different ways of combining probability forecasts of the same event. As
was noted by Diebold and Lopez (1996), forecasts of economic and financial variables often take the
form of probabilities, and there are good reasons to believe that probability forecasts will become
increasingly prominent.[1] For example, in a macroeconomic policy setting, a forecast of the
probability that the target rate of inflation will be exceeded next year, or of the probability that
the economy will contract, may be markedly more informative than simple point forecasts of the
expected rates of inflation and output growth, especially in the absence of any indication of the
degree of uncertainty to be attached to the point forecasts.
An extensive body of literature in economics and management science attests to the usefulness of
forecast combination for point forecasts, where by a point forecast we mean a forecast, defined on
$\mathbb{R}$, of an outcome that is also defined on $\mathbb{R}$, in contrast to probability
forecasts, which are defined on the interval [0, 1] of a binary outcome variable. The recent
literature on combining point forecasts covers a number of areas, including the specification of
the combination weights, testing for forecast encompassing, the impact of parameter estimation
uncertainty, and the limiting distributions of the tests when the forecasts are from models which
are nested.[2]
[*] Corresponding address: School of Economics, University of Nottingham, University Park,
Nottingham NG7 2RD, United Kingdom. Tel.: +44 0 115 9515481; fax: +44 0 115 9514159.
E-mail address: dave.harvey@nottingham.ac.uk (D.I. Harvey).
[1] In other spheres, the combination of probability assessments is commonplace, although the
emphasis tends to be different from ours. The literature on the combination of experts' subjective
probability distributions (see, e.g., Clemen & Winkler, 1999; Genest & Zidek, 1986) looks at ways of
aggregating individual assessments such that the aggregate possesses desirable properties, rather
than focusing on accuracy. See also Dawid (1986) and Winkler (1996) on probability forecasting and
evaluation from a meteorological perspective, as well as for a discussion of earlier contributions.
doi:10.1016/j.ijforecast.2009.12.016
With a few exceptions,[3] the standard assumption in the literature has been that the forecaster has
a squared error loss function, and most of the work on the combination of point forecasts has been
based on linear combinations of the individual forecasts. The focus on linear combinations is
readily justifiable when, in addition to squared-error loss, we assume that the variable being
forecast ($y_t$) and the forecasts (given by the vector $\mathbf{f}_t$) follow a joint Gaussian
distribution. Then, standard results indicate that the conditional expectation
$E(y_t \mid \mathbf{f}_t)$ is the optimal predictor (the conditional expectation minimizes the
expected squared error loss), and the joint normality of $(y_t, \mathbf{f}_t')'$ ensures that the
conditional expectation of $y_t$ is a linear combination of the elements of $\mathbf{f}_t$ (see,
for example, Timmermann, 2006, pp. 144–145).
However, for probability forecasts, the limited support of $(y_t, \mathbf{f}_t)$ suggests that the
justification for considering linear combinations of forecasts is more problematic, and alternatives
such as the logarithmic opinion pool (LoOP) figure prominently in the literature. We investigate
different ways of combining probability forecasts, both in terms of the accuracy of the combined
forecasts, and in terms of the implications for forecast encompassing. Because in practice the data
generating process of the forecasts and the event being forecast will typically be unknown, we
consider the properties of various combination schemes for a number of plausible data generating
processes, in order to see whether some tend to dominate the others. We establish the large-sample
properties of the combination schemes, but, at least as importantly, we also investigate their
relative performances when the combination weights are estimated, as is often done in practice.
[2] See, for example, Bates and Granger (1969), Chong and Hendry (1986), Clark and McCracken (2001),
Deutsch, Granger, and Teräsvirta (1994), Granger and Ramanathan (1984), Harvey, Leybourne, and
Newbold (1998), Hendry and Clements (2004), Harvey and Newbold (2000), Newbold and Granger (1974),
Stock and Watson (1999) and West (1996, 2001), as well as the reviews by Clemen (1989), Clements and
Harvey (2009), Diebold and Lopez (1996), Newbold and Harvey (2002) and Timmermann (2006).
[3] Elliott and Timmermann (2004) is a notable exception.
The plan of the rest of the paper is as follows. In Section 2 we describe the forecast combinations
we consider, as well as the loss functions that are typically used for evaluating probability
forecasts. Section 3 discusses two specific data generation processes and establishes the optimal
form of combination in each case. Section 4 describes the estimation of the combination weights for
the two loss functions or scoring rules, LPS and QPS, and contrasts the implications for forecast
encompassing of using the optimal form of combination for the given data generation process with the
situation when the form of combination is non-optimal. In Section 5, using Monte Carlo, we explore
the relative accuracies of the different forecast combinations for the two data generating processes
of Section 3, and focus in particular on the dependence of the ranking on (i) the number of
forecasts used to estimate the combination weights, and (ii) the importance of the loss function:
QPS versus LPS. Section 6 illustrates the different types of combinations of probability forecasts
through an application based on combining the forecast recession probabilities generated using two
leading indicators from the Conference Board's Composite Leading Indicator. Section 7 offers some
concluding remarks.
2. Combinations and scoring rules for probability
forecasts
2.1. Forecast combinations
The first combination method is the linear combination of forecasts. In the literature on combining
experts' subjective probability distributions, this is commonly referred to as the linear opinion
pool (LiOP). In the econometrics literature, the most general form of a linear combination of two
point forecasts, $f_{1t}$ and $f_{2t}$, is given by:
LiOP: $$C_t(f_{1t}, f_{2t}; \beta) = \alpha + \beta_1 f_{1t} + \beta_2 f_{2t},$$ (1)
where $\beta$ is used in a generic sense to denote the combination parameters, and where no
restrictions are imposed on either $\alpha$ or the weights $\{\beta_1, \beta_2\}$. The weights are
typically estimated by applying OLS to:
$$y_t = \alpha + \beta_1 f_{1t} + \beta_2 f_{2t} + \varepsilon_t,$$
as this corresponds to minimizing the sum of squared forecast errors of the combined forecast. This
is the form of the regression for testing for forecast encompassing advocated by Fair and Shiller
(1990), while restricted versions have been considered by Chong and Hendry (1986) and West (2001)
(setting $\beta_1 = 1$ when testing that $\beta_2 = 0$), and Harvey et al. (1998) and West and
McCracken (1998) (restricting $\beta_1 + \beta_2 = 1$). Forecast encompassing holds when a
combination of two (or more) forecasts does not result in a statistically significant reduction in
the forecast loss relative to just one of the forecasts: here, $\beta_2 = 0$ implies that the $f_1$
forecast encompasses $f_2$. Tests of forecast encompassing assess ex post whether there is a linear
combination of forecasts that results in a statistically significant reduction in forecast loss
(such as a reduction in mean squared error) relative to using a particular forecast. Note that, when
applied to probability forecasts, neither the general form nor either of the restricted forms
ensures that $C_t \in [0, 1]$.
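The OLS estimation of the LiOP weights described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the labels `alpha`, `beta1` and `beta2` for the intercept and weights, the function names `fit_liop` and `liop`, and the toy data are all assumptions made here. It solves the normal equations directly, using the standard library only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_liop(y, f1, f2):
    """OLS of y on a constant, f1 and f2; returns (alpha, beta1, beta2)."""
    rows = [[1.0, a, b] for a, b in zip(f1, f2)]
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


def liop(f1, f2, alpha, beta1, beta2):
    """Linear opinion pool; the value is not guaranteed to lie in [0, 1]."""
    return alpha + beta1 * f1 + beta2 * f2


# Toy (made-up) outcomes and forecasts, for illustration only.
y = [1, 0, 1, 1, 0, 0, 1, 0]
f1 = [0.9, 0.2, 0.7, 0.8, 0.3, 0.1, 0.6, 0.4]
f2 = [0.6, 0.4, 0.8, 0.5, 0.5, 0.3, 0.9, 0.2]
alpha, beta1, beta2 = fit_liop(y, f1, f2)
combined = [liop(a, b, alpha, beta1, beta2) for a, b in zip(f1, f2)]
```

Because nothing constrains the fitted values, entries of `combined` can fall outside the unit interval, which is the limitation noted in the text.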
The second form of combination is the multiplicative combination:
LoOP: $$C_t(f_{1t}, f_{2t}; \beta) = \alpha f_{1t}^{\beta_1} f_{2t}^{\beta_2},$$ (2)
which is known as the logarithmic opinion pool (LoOP) when applied to experts' subjective
probability distributions (see Genest & Zidek, 1986, and the references therein). This is a
simplified version of the following formula for combining discrete probability distributions:
$$f^j = \frac{\prod_{i=1}^{N} (f_i^j)^{\beta_i}}{\sum_{j=1}^{M} \prod_{i=1}^{N} (f_i^j)^{\beta_i}} = \frac{\exp\left(\sum_{i=1}^{N} \beta_i \log f_i^j\right)}{\sum_{j=1}^{M} \exp\left(\sum_{i=1}^{N} \beta_i \log f_i^j\right)},$$
where $f_i^j$ is individual $i$'s probability of class $c_j$, where there are $M$ classes. The
denominator is a scaling factor. Typically, $\sum_{i=1}^{N} \beta_i = 1$, and if the weights are
equal, i.e. $\beta_i = N^{-1}$, the LoOP delivers the geometric mean. Assuming a binary variable
($M = 2$), and with $N = 2$, we obtain:
$$f = \frac{f_1^{\beta_1} f_2^{\beta_2}}{f_1^{\beta_1} f_2^{\beta_2} + (1 - f_1)^{\beta_1} (1 - f_2)^{\beta_2}},$$
where we drop the $j$ superscript on $f$ and $f_i$, which are now the combined and individual
probabilities of the event occurring. The multiplicative combination (Eq. (2)) follows directly by
setting the denominator equal to $\alpha^{-1}$. As was the case with Eq. (1), without restrictions
on $\alpha$, $\beta_1$ and $\beta_2$, there is no guarantee that $C_t \in [0, 1]$.
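The difference between the scaled binary LoOP and the multiplicative form of Eq. (2) can be illustrated with a short sketch. The function names and the parameter labels `alpha`, `beta1` and `beta2` are assumptions made here for illustration, and the numerical inputs are arbitrary.

```python
def loop_normalized(f1, f2, beta1, beta2):
    """Binary LoOP with the scaling denominator; the result lies in [0, 1]."""
    num = f1 ** beta1 * f2 ** beta2
    den = num + (1.0 - f1) ** beta1 * (1.0 - f2) ** beta2
    return num / den


def loop_multiplicative(f1, f2, alpha, beta1, beta2):
    """Multiplicative form of Eq. (2); not restricted to [0, 1]."""
    return alpha * f1 ** beta1 * f2 ** beta2


# Equal weights: a rescaled geometric mean of the two forecasts.
p_equal = loop_normalized(0.8, 0.6, 0.5, 0.5)
# An unrestricted scale factor can push the combination above one.
p_unscaled = loop_multiplicative(0.8, 0.6, 2.0, 0.5, 0.5)
```

The scaled version treats the event and its complement symmetrically, so the combined probabilities of the two classes sum to one; the multiplicative form gives up that property in exchange for a simpler functional form.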
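The final form of combination is that of Kamstra and Kennedy (1998) (henceforth KK). KK suggest
combining log odds ratios by logit regressions, without claiming any optimality properties for the
resulting combination: "we only claim that this methodology is a means of combining individual
probability forecasts in a computationally attractive manner, while alleviating bias" (p. 86).
Specifically, the KK combination of $f_{1t}$ and $f_{2t}$ is:
KK: $$C_t(f_{1t}, f_{2t}; \beta) = \frac{\exp\left[\alpha + \beta_1 \ln\left(\frac{f_{1t}}{1 - f_{1t}}\right) + \beta_2 \ln\left(\frac{f_{2t}}{1 - f_{2t}}\right)\right]}{1 + \exp\left[\alpha + \beta_1 \ln\left(\frac{f_{1t}}{1 - f_{1t}}\right) + \beta_2 \ln\left(\frac{f_{2t}}{1 - f_{2t}}\right)\right]} = \frac{\exp(\alpha) \left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\beta_1} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\beta_2}}{1 + \exp(\alpha) \left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\beta_1} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\beta_2}},$$ (3)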
where $\beta_1$ and $\beta_2$ are the maximum likelihood estimates of the slope coefficients from a
logit regression of $y_t$ on a constant, $\ln\left(\frac{f_{1t}}{1 - f_{1t}}\right)$ and
$\ln\left(\frac{f_{2t}}{1 - f_{2t}}\right)$. The KK combination does ensure that $C_t \in [0, 1]$,
whilst allowing for an unrestricted intercept and not requiring that $\beta_1 + \beta_2 = 0$.
Consequently, relative to LoOP, the KK combination restricts the resulting forecasts to the unit
interval whilst catering for forecasts which may be individually biased.
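The two equivalent ways of writing the KK combination in Eq. (3) can be checked numerically with a small sketch. The parameters are taken as given here; the logit regression that would estimate them is not implemented. The function names and the labels `alpha`, `beta1` and `beta2` are assumptions made for illustration.

```python
import math


def log_odds(f):
    """Log odds ratio of a probability strictly inside (0, 1)."""
    return math.log(f / (1.0 - f))


def kk_combination(f1, f2, alpha, beta1, beta2):
    """Logistic transform of a weighted sum of log odds (first form of Eq. (3))."""
    index = alpha + beta1 * log_odds(f1) + beta2 * log_odds(f2)
    return 1.0 / (1.0 + math.exp(-index))


def kk_product_form(f1, f2, alpha, beta1, beta2):
    """Same combination written in terms of powers of the odds (second form)."""
    odds = math.exp(alpha) * (f1 / (1.0 - f1)) ** beta1 * (f2 / (1.0 - f2)) ** beta2
    return odds / (1.0 + odds)


# Arbitrary illustrative inputs; the weights need not sum to any fixed value.
p_kk = kk_combination(0.8, 0.6, -0.2, 1.5, 0.7)
p_kk_alt = kk_product_form(0.8, 0.6, -0.2, 1.5, 0.7)
```

Whatever the parameter values, the logistic transform keeps the result strictly between zero and one, in contrast to the LiOP and the multiplicative LoOP.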
2.2. Scoring rules
In contrast to the evaluation of point forecasts, the actual probabilities are not observed, so that
$f$, the forecast probability that $y = 1$, is compared to the realized value of $y$. The two main
ways of scoring probability forecasts are the quadratic and logarithmic scores. The Brier or
quadratic probability score (QPS: Brier, 1950) is simply the expected squared error
$E\left[(f - y)^2\right]$, corresponding to the usual notion of squared-error loss. The logarithmic
probability score (LPS: see Brier, 1950; Good, 1952) is defined as
$E\left[-y \log(f) - (1 - y) \log(1 - f)\right]$.[4] For a sequence of probability forecasts and
outcomes, $\{f_t, y_t\}$, $t = 1, \ldots, n$, these scores are calculated as
$$\mathrm{QPS} = \frac{1}{n} \sum_{t=1}^{n} 2 (f_t - y_t)^2$$ (4)
and
$$\mathrm{LPS} = -\frac{1}{n} \sum_{t=1}^{n} \left[y_t \ln f_t + (1 - y_t) \ln(1 - f_t)\right].$$ (5)
It is conventional to calculate QPS as in Eq. (4), so that
QPS is twice the standard mean squared error measure
commonly calculated for point forecasts. These two
well-known measures for scoring probability forecasts
have been used in economic applications by Anderson
and Vahid (2001) and Diebold and Rudebusch (1989),
inter alia.
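The sample versions of the two scores can be computed as in the following sketch. It assumes the sign convention under which LPS is non-negative, and it assumes that all forecasts lie strictly inside (0, 1) so that the logarithms are finite. The function names and the toy data are illustrative.

```python
import math


def qps(forecasts, outcomes):
    """Quadratic probability score: twice the mean squared error."""
    n = len(forecasts)
    return sum(2.0 * (f - y) ** 2 for f, y in zip(forecasts, outcomes)) / n


def lps(forecasts, outcomes):
    """Logarithmic probability score: minus the mean Bernoulli log-likelihood."""
    n = len(forecasts)
    total = 0.0
    for f, y in zip(forecasts, outcomes):
        total += y * math.log(f) + (1 - y) * math.log(1.0 - f)
    return -total / n


# Toy (made-up) forecasts and binary outcomes.
forecasts = [0.9, 0.2, 0.6]
outcomes = [1, 0, 0]
score_q = qps(forecasts, outcomes)
score_l = lps(forecasts, outcomes)
```

Lower values are better for both scores. LPS penalizes confident mistakes much more heavily than QPS, because the logarithm diverges as the probability assigned to the realized outcome approaches zero.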
3. Data generating processes and optimal combinations
The general form of DGP we consider is:
$$y_t = 1\left(g(f_{1t}, f_{2t}) > v_t\right),$$ (6)
where $v_t \sim U(0, 1)$ (that is, $v_t$ is a uniform random variable on (0, 1)) and where $g(.)$ is
some function of the individual probability forecasts $f_{1t}$, $f_{2t}$. This implies that
$\Pr(y_t = 1) = F_u(g(f_{1t}, f_{2t}))$, where $F_u$ is the cdf (cumulative distribution function)
of a uniform (0, 1) random variable. By construction, the form of the optimal combination is
$$C_t^*(f_{1t}, f_{2t}; \beta) = \Pr(y_t = 1 \mid f_{1t}, f_{2t}) = \max\left[g(f_{1t}, f_{2t}), 0\right],$$
where the max function is employed since the form of $g(f_{1t}, f_{2t})$ for one of our specific
DGPs (CJ-DGP, below) does not ensure $g(f_{1t}, f_{2t}) > 0$, although for both DGPs,
$g(f_{1t}, f_{2t}) < 1$. Thus, when the DGP for $y_t$ is specified in terms of the forecasts
$f_{1t}$, $f_{2t}$, as here, the optimal or correct form of combination is given by the DGP.
[4] As written, LPS takes possible values on $[0, +\infty)$ and QPS takes values on [0, 1].
The first special case of Eq. (6) is the logit-based DGP (henceforth L-DGP) considered by Clements
and Harvey (in press), which relates $y_t$ to the forecast models' explanatory variables via a
logit-type transformation, and where the forecasts are also logit functions:
$$y_t = 1\left(\frac{\exp(\theta_1 X_{1t} + \theta_2 X_{2t})}{1 + \exp(\theta_1 X_{1t} + \theta_2 X_{2t})} > v_t\right)$$ (7)
$$f_{1t} = \frac{\exp(\phi_{11} X_{1t})}{1 + \exp(\phi_{11} X_{1t})}$$ (8)
$$f_{2t} = \frac{\exp(\phi_{12} X_{2t})}{1 + \exp(\phi_{12} X_{2t})}.$$ (9)
This DGP reflects the common practice of using logit models to obtain probability forecasts.[5]
It allows for an investigation of forecast encompassing (when, say, $\theta_2 = 0$), and of the
effects of parameter estimation uncertainty on predictive accuracy (i.e., the effects of replacing
the population $\phi_{ij}$ by the estimator $\hat{\phi}_{ij}$), although our focus will be on
comparing the accuracies of different combination methods, and also on the effect of uncertainty on
estimating the combination weights. If we assume that $\{X_{1t}, X_{2t}\}$ has a normal
distribution:
$$\begin{pmatrix} X_{1t} \\ X_{2t} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{X_1,X_2} \\ \rho_{X_1,X_2} & 1 \end{pmatrix}\right),$$ (10)
then $f_{1t}$ and $f_{2t}$ are correlated, provided that $\rho_{X_1,X_2} \neq 0$.
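Assuming that $X_{1t}$ and $X_{2t}$ are scalars, the DGP can be re-written for $y_t$ in terms of
$f_{1t}$ and $f_{2t}$ by combining Eq. (7) with rearrangements of Eqs. (8) and (9):
$$y_t = 1\left(\frac{\left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\theta_1/\phi_{11}} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\theta_2/\phi_{12}}}{1 + \left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\theta_1/\phi_{11}} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\theta_2/\phi_{12}}} > v_t\right),$$
so that the true $\Pr(y_t = 1)$ in terms of the individual forecasts is given by:
$$\Pr(y_t = 1 \mid f_{1t}, f_{2t}) = \frac{\left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\theta_1/\phi_{11}} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\theta_2/\phi_{12}}}{1 + \left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\theta_1/\phi_{11}} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\theta_2/\phi_{12}}},$$ (11)
which defines the optimal form of combination:
$$C_t^*(f_{1t}, f_{2t}; \beta) = \frac{\left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\theta_1/\phi_{11}} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\theta_2/\phi_{12}}}{1 + \left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\theta_1/\phi_{11}} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\theta_2/\phi_{12}}}.$$ (12)
In terms of Eq. (6), $g(f_{1t}, f_{2t}) = C_t^*(f_{1t}, f_{2t}; \beta)$. It is readily apparent that
Eq. (12) is the form of combination suggested by Kamstra and Kennedy (1998), which we termed the KK
combination in Section 2 (Eq. (12) is equivalent to Eq. (3) with $\alpha = 0$).
[5] For example, logit models have been used to obtain forecasts of the probabilities of recessions
from information sets that include leading indicators such as the yield curve; see, e.g., Estrella
and Mishkin (1998).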
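The rearrangement behind Eqs. (11) and (12) can be verified numerically. Inverting each logit forecast gives a log odds ratio equal to that model's coefficient times its regressor, so the DGP index is a weighted sum of the two log odds ratios, with weights equal to the ratio of the DGP coefficient to the forecast-model coefficient. The sketch below checks this for one arbitrary point. The labels `theta1`, `theta2` (DGP coefficients) and `phi11`, `phi12` (forecast-model coefficients) are assumed here, and all numerical values are illustrative.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def l_dgp_probability(x1, x2, theta1, theta2):
    """Pr(y = 1) implied by the logit-type DGP for scalar regressors."""
    return logistic(theta1 * x1 + theta2 * x2)


def optimal_kk(f1, f2, theta1, theta2, phi11, phi12):
    """KK-type combination with exponents theta1/phi11 and theta2/phi12."""
    odds = (f1 / (1.0 - f1)) ** (theta1 / phi11) * (f2 / (1.0 - f2)) ** (theta2 / phi12)
    return odds / (1.0 + odds)


theta1, theta2, phi11, phi12 = 1.0, 0.5, 0.8, 0.4  # illustrative values only
x1, x2 = 0.3, -1.2
f1 = logistic(phi11 * x1)  # forecast from the first logit model
f2 = logistic(phi12 * x2)  # forecast from the second logit model
p_true = l_dgp_probability(x1, x2, theta1, theta2)
p_comb = optimal_kk(f1, f2, theta1, theta2, phi11, phi12)
```

The two probabilities coincide up to floating-point error, which is the sense in which the KK form is the optimal combination for this DGP with scalar regressors.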
When the $X_{it}$ are vectors of variables, we cannot in general establish the optimality of the KK
combination. For vector information sets, the log odds ratios are given by:
$$\ln\left(\frac{f_{1t}}{1 - f_{1t}}\right) = \phi_{11}' X_{1t}, \qquad \ln\left(\frac{f_{2t}}{1 - f_{2t}}\right) = \phi_{12}' X_{2t},$$
where $\phi_{11}$ and $\phi_{12}$ are the population values for the two individual models. Then the
logit regression of $y_t$ on the two log odds ratios results in:
$$\Pr(y_t = 1) = \frac{\exp(\beta_1 \phi_{11}' X_{1t} + \beta_2 \phi_{12}' X_{2t})}{1 + \exp(\beta_1 \phi_{11}' X_{1t} + \beta_2 \phi_{12}' X_{2t})}.$$ (13)
This will only match the data generating process (the logit regression of $y_t$ on $X_{1t}$ and
$X_{2t}$ given by Eq. (7)) if the following are satisfied:
$$\beta_1 \phi_{11} = \theta_1, \qquad \beta_2 \phi_{12} = \theta_2.$$ (14)
When these restrictions hold, KK is the optimal form of combination.
The second special case of Eq. (6) is:
$$y_t = 1\left(u_{1t}^{\beta} u_{2t}^{(1-\beta)} > v_t + c\right)$$ (15)
$$f_{1t} = u_{1t}$$
$$f_{2t} = u_{2t},$$
where the forecasts $u_{1t}$ and $u_{2t}$ are drawn from the Cook and Johnson (1981) bivariate
distribution:
$$f(u_{1t}, u_{2t}) = \frac{\alpha + 1}{\alpha} (u_{1t} u_{2t})^{-(\alpha+1)/\alpha} \left(u_{1t}^{-1/\alpha} + u_{2t}^{-1/\alpha} - 1\right)^{-(\alpha+2)}$$ (16)
and $c > 0$. In terms of the general setup of Eq. (6),
$g(f_{1t}, f_{2t}) = f_{1t}^{\beta} f_{2t}^{(1-\beta)} - c$, where $c > 0$ ensures
$g(f_{1t}, f_{2t}) < 1$. There are a number of reasons for considering Eq. (15). It is simple enough
to be analytically tractable for a number of cases of interest, whilst generating forecasts with
desirable characteristics, as explained below. It is also, when $c = 0$, a data generation process
for which LoOP is the optimal form of combination.
This DGP has the property that the random variables $u_{1t}$ and $u_{2t}$ are correlated, but have
marginal distributions which are U(0, 1). The degree of correlation between $u_{1t}$ and $u_{2t}$,
$0 \le \rho \le 1$, is determined by the parameter $\alpha$, with $\rho \to 0$ as
$\alpha \to \infty$ and $\rho \to 1$ as $\alpha \to 0$. The equation for $y_t$ indicates that $y_t$
depends on $f_{1t}$ and $f_{2t}$, such that when the constant $c$ is zero,
$\Pr(y_t = 1) = u_{1t}^{\beta} u_{2t}^{(1-\beta)}$, as $v_t$ is U(0, 1), and is drawn independently
of $u_{1t}$ and $u_{2t}$ for all $t$.[6] The form of the DGP ensures that the forecasts have the
characteristics of probability forecasts (i.e., they are defined on the unit interval), while also
allowing for contemporaneous correlation among the predictions (a property frequently observed in
practice). By setting $\beta = 1$, $\Pr(y_t = 1) = u_{1t}$, and thus depends only on $f_{1t}$, so
that $f_{2t}$ conveys no useful information, given that we have $f_{1t}$. When in addition
$c \neq 0$, $f_{1t}$ is biased (as is $f_{2t}$). Note that it is straightforward to simulate
$\{y_t, f_{1t}, f_{2t}\}$ from Eqs. (15) and (16).
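Since from Eq. (15):
$$\Pr(y_t = 1 \mid u_{1t}, u_{2t}) = \Pr\left(u_{1t}^{\beta} u_{2t}^{(1-\beta)} > v_t + c\right) = \Pr\left(v_t < u_{1t}^{\beta} u_{2t}^{(1-\beta)} - c\right) = \max\left(u_{1t}^{\beta} u_{2t}^{(1-\beta)} - c, 0\right),$$
it follows immediately that the optimal type of combination, for $c \neq 0$, is given by:
$$C_t^*(f_{1t}, f_{2t}; \beta, c) = \max\left(f_{1t}^{\beta} f_{2t}^{(1-\beta)} - c, 0\right),$$
with $f_{1t} = u_{1t}$ and $f_{2t} = u_{2t}$; or, simply,
$C_t^*(f_{1t}, f_{2t}; \beta) = f_{1t}^{\beta} f_{2t}^{(1-\beta)}$ when $c = 0$ as
$u_{1t}^{\beta} u_{2t}^{(1-\beta)} > 0$. This matches the LoOP defined in Eq. (2), when $\alpha = 1$
in Eq. (2).
[6] Note that $y_t$ can also be written as $y_t = 1\left(u_{1t}^{\beta} u_{2t}^{(1-\beta)} > w_t\right)$,
where $w_t \sim U(c, 1 + c)$.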
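One way to simulate from this DGP is sketched below. It draws the pair of forecasts using a standard mixture representation of the Cook and Johnson family (independent unit exponentials divided by a common gamma variable), which is a technique chosen here for illustration and is not described in the source. The labels `alpha` (dependence parameter), `beta` (exponent on the first forecast) and `c`, as well as the function names, are assumptions.

```python
import random


def draw_cook_johnson(alpha, rng):
    """One (u1, u2) pair with U(0, 1) margins and Cook-Johnson dependence."""
    z = rng.gammavariate(alpha, 1.0)  # common Gamma(alpha, 1) mixing variable
    e1 = rng.expovariate(1.0)         # independent unit exponentials
    e2 = rng.expovariate(1.0)
    return (1.0 + e1 / z) ** (-alpha), (1.0 + e2 / z) ** (-alpha)


def simulate_cj_dgp(n, beta, c, alpha, seed=0):
    """Simulate (y, f1, f2) triples, with y = 1 when u1^beta * u2^(1-beta) > v + c."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u1, u2 = draw_cook_johnson(alpha, rng)
        v = rng.random()  # uniform draw, independent of the forecasts
        y = 1 if u1 ** beta * u2 ** (1.0 - beta) > v + c else 0
        data.append((y, u1, u2))
    return data


# Illustrative settings: smaller alpha gives more strongly correlated forecasts.
sample = simulate_cj_dgp(n=20000, beta=0.5, c=0.0, alpha=0.5, seed=1)
```

With `c = 0`, the sample frequency of `y = 1` should be close to the sample mean of `u1 ** beta * u2 ** (1 - beta)`, in line with the LoOP being the optimal combination in that case.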
4. Loss functions and estimation
Suppose that we have the optimal form of combination for a given DGP. Consider selecting combination
weights to minimize the LPS of the combined forecast $C_t^*(f_{1t}, f_{2t}; \beta)$. This is
equivalent to maximum likelihood (ML) estimation of $\beta$, since the likelihood function (for iid
data on a Bernoulli random variable with a probability $C_t^*(\beta)$ that $Y_t = 1$) is given by:
$$L = \prod_{y_t = 1} C_t^*(f_{1t}, f_{2t}; \beta)^{y_t} \prod_{y_t = 0} \left[1 - C_t^*(f_{1t}, f_{2t}; \beta)\right]^{1 - y_t}.$$ (17)
Taking logs, we obtain:
$$\ln L = \sum_{t} \left[y_t \ln C_t^*(f_{1t}, f_{2t}; \beta) + (1 - y_t) \ln\left(1 - C_t^*(f_{1t}, f_{2t}; \beta)\right)\right],$$
which is proportional to (minus) LPS (see Eq. (5)). Thus, ML using the likelihood function for an
iid Bernoulli random variable is equivalent to minimizing LPS.
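To make the proportionality explicit, a short worked step may help. It writes $C_t$ for the combined forecast evaluated at a candidate parameter value $\beta$ (a shorthand assumed here) and uses the sign convention under which LPS is non-negative:

```latex
\ln L(\beta) = \sum_{t=1}^{n}\bigl[y_t \ln C_t + (1 - y_t)\ln(1 - C_t)\bigr] = -\,n\,\mathrm{LPS}(\beta),
\qquad\text{so}\qquad
\arg\max_{\beta}\,\ln L(\beta) = \arg\min_{\beta}\,\mathrm{LPS}(\beta).
```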
Alternatively, $\Pr(y_t = 1 \mid f_{1t}, f_{2t}) = C_t^*(f_{1t}, f_{2t}; \beta)$ can be thought of
instead as the conditional expectation:
$$E(y_t \mid f_{1t}, f_{2t}) = C_t^*(f_{1t}, f_{2t}; \beta),$$
giving rise to the (generally nonlinear) regression model:
$$y_t = C_t^*(f_{1t}, f_{2t}; \beta) + \varepsilon_t,$$ (18)
to be estimated by (nonlinear) least squares. The nonlinear least squares estimation of Eq. (18) is
clearly equivalent to estimating $\beta$ on the basis of minimizing QPS.
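The equivalence can be seen in one line, using the definition of QPS in Eq. (4) with the combined forecast in place of $f_t$. The shorthand $\mathrm{SSR}(\beta)$ for the sum of squared residuals of Eq. (18) is introduced here for illustration only:

```latex
\mathrm{QPS}(\beta) = \frac{1}{n}\sum_{t=1}^{n} 2\,\bigl(C_t(\beta) - y_t\bigr)^2 = \frac{2}{n}\,\mathrm{SSR}(\beta),
\qquad\text{so}\qquad
\arg\min_{\beta}\,\mathrm{QPS}(\beta) = \arg\min_{\beta}\,\mathrm{SSR}(\beta).
```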
Given that both Eqs. (17) and (18) are correctly specified, in the sense that they incorporate the
optimal form of combination, both maximum likelihood estimation of Eq. (17) and nonlinear least
squares estimation of Eq. (18) will provide consistent estimates of $\beta$ under standard
assumptions. By implication, the parameters of the optimal combination will be estimated
consistently, regardless of whether we use QPS or LPS. Tests of forecast encompassing based on both
QPS and LPS will therefore be valid, in the sense that when $f_{2t}$ does not enter the optimal
combination, both QPS and LPS will indicate that $f_{1t}$ encompasses $f_{2t}$ in population.
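As an example, consider the KK combination and the L-DGP. The optimal form of combination is given
by Eq. (12), so that estimation of the combination weights by QPS based on
$$C_t(f_{1t}, f_{2t}; \beta) = \frac{\left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\beta_1} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\beta_2}}{1 + \left(\frac{f_{1t}}{1 - f_{1t}}\right)^{\beta_1} \left(\frac{f_{2t}}{1 - f_{2t}}\right)^{\beta_2}}$$ (19)
will result in:
$$\hat{\beta}_1 \xrightarrow{p} \frac{\theta_1}{\phi_{11}}, \qquad \hat{\beta}_2 \xrightarrow{p} \frac{\theta_2}{\phi_{12}}.$$
When $\theta_2 = 0$, so that $y_t$ depends only on $X_{1t}$ in the DGP, testing for forecast
encompassing ($\beta_2 = 0$ in Eq. (19)) will give the correct inference as
$\hat{\beta}_2 \xrightarrow{p} 0$. The same inference would result from maximizing Eq. (17) with
$C_t$ given by Eq. (19).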
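Consider the second DGP, CJ-DGP. Minimizing QPS corresponds to choosing $\{\beta, c\}$ such that:
$$\arg\min_{\{\beta, c\}} \sum_{t} \left[y_t - \max\left(u_{1t}^{\beta} u_{2t}^{(1-\beta)} - c, 0\right)\right]^2.$$
When $\beta = 1$, such that $f_{1t}$ encompasses $f_{2t}$, $\hat{\beta} \xrightarrow{p} 1$.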
When we dispense with the assumption that Eqs. (17) and (18) are correctly specified, as when
$C_t^*(.)$ is replaced by some other type of combination $C_t(.)$, we cannot establish (in general)
that the inference we make concerning forecast encompassing will match that made when the correct
form of combination is used, even asymptotically.
Formally, for a loss function $L = L(e_t)$, let $e_{j,t} = y_t - C_{jt}(\mathbf{f}_t; \beta_j)$
denote the combined forecast error associated with a form of combination given by $C_{jt}(.)$, for
$j = 1, 2, \ldots, J$ combination methods (in our case, $J = 3$: LiOP, LoOP and KK). Conditional on
the form of combination, the optimal combination weights are given by:
$$\beta_j^* = \arg\min_{\beta_j} E\left[L(e_{j,t})\right],$$ (20)
where the expectation is over the conditional distribution of $e_t$, given past forecasts and
outcomes. While the optimal weight on $f_2$ may be zero in the optimal form of combination,
indicating forecast encompassing of $f_2$ by $f_1$, in general it cannot be proved that the vector
$\beta_j^*$, for combination method $j$, will also indicate forecast encompassing.
Furthermore, although
$$E\left[L(e_t^*)\right] \le E\left[L(e_{j,t})\right] \quad \forall j,$$ (21)
where $e_t^*$ denotes the forecast error using the optimal form of combination, we could have any
accuracy ranking among the different non-optimal combination methods, i.e.:
$$E\left[L(e_{1,t})\right] \gtrless E\left[L(e_{2,t})\right] \gtrless \ldots \gtrless E\left[L(e_{J,t})\right].$$ (22)
Moreover, in practice, the population parameters $\{\beta_1^*, \ldots, \beta_J^*\}$ will be replaced
by estimates of the combination weights. In that case, the inequality in Eq. (21) may no longer
hold, and the relationships between the different forms of combination in Eq. (22) based on
population values may be overturned. The rankings may also depend on the loss function, $L(.)$. In
what follows, we report results for both loss functions (QPS and LPS), as well as focusing on the
form of the combination, $C_t(.)$, and the impact of uncertainty about $\beta$. The next section
describes the Monte Carlo investigation of the accuracy of optimal and non-optimal methods of
forecast combination for both of our example DGPs.
5. Monte Carlo simulations
Deriving analytical expressions for the value of
the loss for the different ways of combining forecasts
is mathematically intractable, except in a number
of special cases. Consequently, in this section we
report the results of some Monte Carlo simulations for
probability forecast combination using LiOP, LoOP
and KK, for both the L-DGP and the CJ-DGP. We
report the results first for QPS (Section 5.1), then for
LPS (Section 5.2).
5.1. Simulation results for QPS
5.1.1. For the L-DGP
The simulations for the L-DGP are based on Eqs. (7)–(9), where the $X$s are normal random variables
drawn from Eq. (10). We ignore parameter estimation uncertainty in the forecast model parameters
$\phi$, in order to focus on uncertainty in the estimation of the combination weights. Simulated
population parameters $\phi$, based on the average values across 10,000 replications of the logit
model estimates on samples of size 10,000, were used. We generated samples of size
$n = \{25, 50, 100, 200, \infty\}$ to estimate the combination weights for a given replication (the
case $n = \infty$ indicates that optimal population weights, which were simulated using a sample
size of $n$ = 10,000, were used). We then used these estimated weights to compute out-of-sample
combined forecasts and the corresponding QPS for that replication, using 10,000 out-of-sample
observations in order to approximate the population QPS associated with the given set of combination
weights.[7] The tables then report the means and standard deviations of the simulated QPS values
across replications. Both here and throughout the paper, simulations were performed in Gauss 9.0
using 10,000 Monte Carlo replications.
Table 1 reports the results for scalars $X_{1t}$ and $X_{2t}$, $\theta_1 = 1$ and $\theta_2 = 0$ (so
that encompassing holds in the KK combination), and for correlation parameter values
$\rho_{X_1,X_2} = \{0.2, 0.5, 0.8\}$. Focusing firstly on the results for the population combination
weights ($n = \infty$), unreported results confirm that the KK combination attaches zero weight to
$f_2$, and the same is found to be true for LiOP and LoOP. All methods result in the same limiting
value of QPS, regardless of the value of $\rho_{X_1,X_2}$. Even though KK is the optimal form of
combination, when the combination weights are estimated using small samples of forecasts, the
estimation uncertainty of this method is greater than that of LiOP, and the latter method is shown
to be the most accurate in terms of QPS, with a lower mean and standard deviation obtained for all
correlation values. The accuracy of the LoOP method lies between those of KK and LiOP in small
samples. Unreported results for a non-encompassing case ($\theta_1 = 0.5$ and $\theta_2 = 0.5$)
again show that all three combinations give essentially the same limiting value of QPS, but that
LiOP is the most accurate for small values of $n$.
[7] We constrain $\alpha \ge 0$ for the LoOP combination. We also constrain infeasible values of the
combined forecasts to their boundary values (i.e., 0 or 1 for QPS, and 0.0001 or 0.9999 for the
results for LPS reported in Section 5.2). Replications where all of the forecasts being combined are
outside the feasible range (for any of the combination schemes) are treated as anomalous
replications, and were resampled in order to obtain the simulation results reported.

Table 1
Means and standard deviations of QPS across Monte Carlo replications: L-DGP, $\theta_1 = 1$, $\theta_2 = 0$.
(rho denotes $\rho_{X_1,X_2}$.)

n     Method   rho = 0.2        rho = 0.5        rho = 0.8
               Mean    S.d.     Mean    S.d.     Mean    S.d.
25    LiOP     0.459   0.041    0.460   0.041    0.460   0.041
      LoOP     0.483   0.076    0.484   0.071    0.485   0.072
      KK       0.491   0.080    0.492   0.081    0.492   0.082
50    LiOP     0.436   0.019    0.436   0.020    0.437   0.020
      LoOP     0.443   0.030    0.443   0.031    0.444   0.030
      KK       0.445   0.032    0.445   0.034    0.445   0.033
100   LiOP     0.425   0.010    0.425   0.010    0.425   0.010
      LoOP     0.426   0.015    0.427   0.016    0.427   0.013
      KK       0.427   0.012    0.427   0.013    0.427   0.013
200   LiOP     0.419   0.006    0.419   0.006    0.419   0.006
      LoOP     0.420   0.009    0.420   0.010    0.420   0.007
      KK       0.420   0.007    0.420   0.007    0.420   0.007
∞     LiOP     0.413            0.413            0.413
      LoOP     0.413            0.413            0.413
      KK       0.413            0.413            0.413
In order to investigate the reason for the poor finite-sample performance of KK, which is the
optimal form of combination, Table 2 reports the bias-variance decomposition of the mean QPS figures
reported in Table 1. Given the definition of QPS as twice the expected squared error, the squared
bias plus the variance is equal to half the QPS values reported in Table 1. From Table 2 it is
apparent that the high small-sample QPS values for KK are due primarily to the variance. Although
both the squared bias and the variance increase as the sample size gets smaller, the bias term is
almost inconsequential compared to the variance.
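As a quick arithmetic check of this identity, the entries for $n = 25$ and a regressor correlation of 0.2 reported in Tables 1 and 2 can be compared directly. The agreement is up to the rounding of the published figures:

```latex
\begin{align*}
\text{LiOP:}\quad & \text{Bias}^2 + \text{Variance} = 0.008 + 0.221 = 0.229 \approx \tfrac{1}{2}(0.459) = 0.2295,\\
\text{LoOP:}\quad & \text{Bias}^2 + \text{Variance} = 0.012 + 0.230 = 0.242 \approx \tfrac{1}{2}(0.483) = 0.2415,\\
\text{KK:}\quad & \text{Bias}^2 + \text{Variance} = 0.011 + 0.235 = 0.246 \approx \tfrac{1}{2}(0.491) = 0.2455.
\end{align*}
```

In each case the variance term accounts for well over nine tenths of the total.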
The results reported in Table 1 are for independent data on $\{y_t, f_{1t}, f_{2t}\}$ across $t$,
but in time series applications the data and forecasts are likely to be serially dependent. We check
that our results are qualitatively unaffected by allowing for serial correlation by reporting in
Table 3 the results for setups similar to those reported in Table 1, but allowing for serial
dependence. Specifically, the $X_{it}$ ($i = 1, 2$) in Eqs. (8) and (9) are replaced by $W_{it}$,
where $W_{it} = \gamma W_{i,t-1} + X_{it}$, $i = 1, 2$, i.e., the explanatory variables are AR(1),
with innovations given by the $X_{it}$. The $X_{it}$ are as defined in Eq. (10). The results in
Table 3 are for $\gamma = 0.5$, and are qualitatively similar for values of $\gamma$ such as
$\gamma = 0.8$, for example. It is apparent that autocorrelation has no effect on either the
rankings of the combination methods or our qualitative findings, and thus, in order to save space,
in what follows we only report results for serially independent data.
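The serially dependent regressors can be generated along the following lines. This is a sketch under stated assumptions: the label `gamma` for the autoregressive coefficient, the label `rho_x` for the correlation between the two innovations, the zero start-up values and the function name are all choices made here, since the source does not give these details.

```python
import math
import random


def simulate_ar1_regressors(n, gamma, rho_x, seed=0):
    """Two AR(1) series whose innovations are correlated standard normals."""
    rng = random.Random(seed)
    w1, w2 = 0.0, 0.0  # start-up values (an assumption; not stated in the source)
    path = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        # Build x2 so that Corr(x1, x2) = rho_x with unit variance.
        x2 = rho_x * x1 + math.sqrt(1.0 - rho_x ** 2) * rng.gauss(0.0, 1.0)
        w1 = gamma * w1 + x1
        w2 = gamma * w2 + x2
        path.append((w1, w2))
    return path


path = simulate_ar1_regressors(n=500, gamma=0.5, rho_x=0.5, seed=3)
```

Each series can then be passed through the logit transformations in place of the serially independent regressors to obtain autocorrelated forecasts.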
We also report results for when the two models' explanatory variables are vectors. To explore the
relative performances of the three combination methods in these cases, we consider the following
generalization of Eqs. (7)–(9). Let $X_{1t}$ and $X_{2t}$ be $(2 \times 1)$ vectors:
$$X_{1t} = \begin{pmatrix} X_{11t} \\ X_{12t} \end{pmatrix}, \qquad X_{2t} = \begin{pmatrix} X_{21t} \\ X_{22t} \end{pmatrix},$$
so that:
$$y_t = 1\left(\frac{\exp(\theta_{11} X_{11t} + \theta_{12} X_{12t} + \theta_{21} X_{21t} + \theta_{22} X_{22t})}{1 + \exp(\theta_{11} X_{11t} + \theta_{12} X_{12t} + \theta_{21} X_{21t} + \theta_{22} X_{22t})} > v_t\right)$$
$$f_{1t} = \frac{\exp(\phi_{11} X_{11t} + \phi_{12} X_{12t})}{1 + \exp(\phi_{11} X_{11t} + \phi_{12} X_{12t})}$$
$$f_{2t} = \frac{\exp(\phi_{21} X_{21t} + \phi_{22} X_{22t})}{1 + \exp(\phi_{21} X_{21t} + \phi_{22} X_{22t})}.$$
We assume that $X_t = (X_{1t}', X_{2t}')'$ is jointly normally distributed, with zero means and unit
variances, and that all correlations are equal to zero except for $\rho_{X_{11},X_{21}}$. A non-zero
value of $\rho_{X_{11},X_{21}}$ generates correlated forecasts, and, in addition, each forecast has
an idiosyncratic component ($X_{12t}$ and $X_{22t}$ respectively) which is independent of the other
variables.

Table 2
Bias-variance decomposition of simulated QPS values: L-DGP, $\theta_1 = 1$, $\theta_2 = 0$.
(rho denotes $\rho_{X_1,X_2}$.)

n     Method   rho = 0.2            rho = 0.5            rho = 0.8
               Bias^2   Variance    Bias^2   Variance    Bias^2   Variance
25    LiOP     0.008    0.221       0.008    0.222       0.008    0.222
      LoOP     0.012    0.230       0.011    0.231       0.011    0.232
      KK       0.011    0.235       0.011    0.235       0.011    0.235
50    LiOP     0.004    0.214       0.004    0.214       0.004    0.214
      LoOP     0.005    0.216       0.005    0.216       0.005    0.217
      KK       0.005    0.218       0.005    0.217       0.005    0.218
100   LiOP     0.002    0.210       0.002    0.210       0.002    0.210
      LoOP     0.002    0.211       0.002    0.211       0.002    0.211
      KK       0.002    0.211       0.002    0.211       0.002    0.211
200   LiOP     0.001    0.208       0.001    0.208       0.001    0.208
      LoOP     0.001    0.209       0.001    0.209       0.001    0.209
      KK       0.001    0.209       0.001    0.209       0.001    0.209
Table 4 reports results for the design parameters
$\{\theta_{11} = 2, \theta_{12} = 5, \theta_{21} = 4, \theta_{22} = 1\}$, which are chosen such that
the restrictions given by Eq. (14) for KK to be the optimal form of combination are not satisfied.
Despite this, in the limiting case KK is still the most accurate forecast combination, followed by
LoOP and lastly LiOP, with the differences in accuracy being large, in the order of 40% for
$\rho_{X_{11},X_{21}} = 0.2$. For small values of $n$, LiOP's performance is more competitive, with
estimation uncertainty having a considerably greater impact on the accuracy of KK, and even more on
that of LoOP. For $n = 25$, for example, LiOP is better than KK for the larger correlations between
$X_{11}$ and $X_{21}$, in terms both of the mean QPS, and of the standard deviation being
substantially smaller.
5.1.2. For the CJ-DGP
We simulated data from Eqs. (15) and (16), and computed the out-of-sample QPS mean and standard deviation for each combination method, where the combination parameters were estimated using n = {25, 50, 100, 200, ∞} observations (the case n = ∞ again indicates the use of the optimal population weights, simulated using n = 10,000). The simulations were conducted using the same methodology as for the L-DGP. The parameter that governs encompassing (the encompassing parameter) was set to 0.5 or 1, where a value of 1 corresponds to $f_1$ encompassing $f_2$ using the optimal type of combination (LoOP when c = 0). We also set Corr($u_{1t}$, $u_{2t}$) = {0, 0.8} and c = {0, 0.5} to allow for correlated and biased forecasts (when Corr($u_{1t}$, $u_{2t}$) ≠ 0 and c ≠ 0, respectively). The results are recorded in Table 5 (encompassing parameter equal to 1) and Table 6 (encompassing parameter equal to 0.5).
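The simulation design described above (estimate the combination weights on n observations, score the combination out-of-sample, repeat across replications, and report the mean and standard deviation of QPS) can be sketched as follows. This is a minimal illustration only: the toy data generating process in `simulate` is a hypothetical stand-in and is not the CJ-DGP of Eqs. (15) and (16); the number of replications, the evaluation sample size and the use of NumPy are likewise our assumptions; only the linear pool is shown; and QPS is taken to be the mean of $2(f_t - y_t)^2$.

```python
import numpy as np


def qps(f, y):
    """Quadratic probability score, assumed here to be mean(2 * (f - y)^2)."""
    return float(np.mean(2.0 * (f - y) ** 2))


def simulate(n_obs, rng):
    """Toy DGP (an assumption, NOT the paper's CJ-DGP): two noisy logit forecasts."""
    x = rng.normal(size=n_obs)
    p = 1.0 / (1.0 + np.exp(-x))  # true event probability
    y = (rng.uniform(size=n_obs) < p).astype(float)
    f1 = 1.0 / (1.0 + np.exp(-(x + rng.normal(scale=1.0, size=n_obs))))
    f2 = 1.0 / (1.0 + np.exp(-(x + rng.normal(scale=1.0, size=n_obs))))
    return y, f1, f2


def liop_weights(y, f1, f2):
    """Least-squares weights for the linear pool a0 + a1*f1 + a2*f2."""
    design = np.column_stack([np.ones_like(f1), f1, f2])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def monte_carlo(n_est, n_eval=1000, n_reps=300, seed=0):
    """Mean and s.d. across replications of the out-of-sample QPS of the linear pool."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_reps):
        y_in, f1_in, f2_in = simulate(n_est, rng)
        a0, a1, a2 = liop_weights(y_in, f1_in, f2_in)
        y_out, f1_out, f2_out = simulate(n_eval, rng)
        scores.append(qps(a0 + a1 * f1_out + a2 * f2_out, y_out))
    return float(np.mean(scores)), float(np.std(scores, ddof=1))


results = {n: monte_carlo(n) for n in (25, 50, 100, 200)}
for n, (mean, sd) in results.items():
    print(f"n = {n:3d}: mean QPS = {mean:.3f}, s.d. = {sd:.3f}")
```

Even in this toy setting, both the mean and the dispersion of the out-of-sample QPS shrink as the estimation sample grows, which is the pattern the tables are designed to expose.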
Consider first the case where the population weights are used. When the encompassing parameter equals 1, so that $f_1$ encompasses $f_2$ using the optimal combination, unreported results show that all three combinations attach zero weight to $f_2$ when c = 0 and/or Corr($u_{1t}$, $u_{2t}$) = 0, but not in the case where the forecasts are both correlated and biased. Instead, when both Corr($u_{1t}$, $u_{2t}$) ≠ 0 and c ≠ 0, we find weights on $f_2$ for LiOP, LoOP and KK of 0.106, 0.015 and 0.082, although the LoOP value of 0.015 is not significantly different from zero across replications. This highlights the result that when non-optimal combination methods are employed, whether one forecast encompasses another may depend on the form of the combination adopted. For example, using LiOP we find that the inclusion of $f_2$ attracts a non-zero weight in the combination, whereas using LoOP the weight on $f_2$ is zero (i.e., $f_1$ encompasses $f_2$).
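To make the link between a zero weight and encompassing concrete, the sketch below writes out generic textbook forms of a linear pool, a logarithmic pool and a logit-based pool. These functional forms and the parameter names `a0`, `a1`, `a2` are assumptions for illustration; the exact definitions of LiOP, LoOP and KK are given in an earlier section of the paper and may be parameterised differently.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-z))


def liop(f1, f2, a0, a1, a2):
    """Linear opinion pool: weighted sum of the two probabilities."""
    return a0 + a1 * f1 + a2 * f2


def loop(f1, f2, a1, a2):
    """Logarithmic opinion pool: normalised weighted geometric mean."""
    num = f1 ** a1 * f2 ** a2
    den = num + (1.0 - f1) ** a1 * (1.0 - f2) ** a2
    return num / den


def kk(f1, f2, a0, a1, a2):
    """Logit-based pool in the spirit of Kamstra and Kennedy (1998)."""
    return logistic(a0 + a1 * logit(f1) + a2 * logit(f2))


# With a zero weight on f2, each combination ignores f2 entirely, which is
# what "f1 encompasses f2" means for that particular form of combination.
for f2 in (0.1, 0.5, 0.9):
    print(liop(0.3, f2, 0.0, 1.0, 0.0),
          loop(0.3, f2, 1.0, 0.0),
          kk(0.3, f2, 0.0, 1.0, 0.0))
```

Whether the fitted weight on $f_2$ is zero is a separate question for each functional form, which is why the encompassing verdict can differ across LiOP, LoOP and KK.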
In terms of the most accurate combination, when we abstract from estimation uncertainty (n = ∞), LoOP is always at least as good as the other two methods. This is true whether the encompassing parameter is 1 or 0.5, although the differences are small in magnitude. The form of the DGP is such that the entries in the tables for LiOP and LoOP can be calculated analytically for some combinations of the design parameter values. For example, we can show that $\mathrm{QPS}_{LiOP} > \mathrm{QPS}_{LoOP}$ for the encompassing specification (encompassing parameter equal to 1) with c = 0.5 and Corr($u_{1t}$, $u_{2t}$) = 0.⁸

⁸ An appendix to this paper is available at www.forecasters.org/ijf. This outlines the calculation of QPS for LiOP and LoOP for the CJ-DGP.

Table 3
Design parameters with subscripts 1 and 2 set to 1 and 0, respectively; autocorrelated explanatory variables.

              Corr(X_1,X_2) = 0.2  Corr(X_1,X_2) = 0.5  Corr(X_1,X_2) = 0.8
n     Method  Mean      S.d.       Mean      S.d.       Mean      S.d.
25    LiOP    0.444     0.048      0.445     0.050      0.445     0.049
      LoOP    0.472     0.088      0.471     0.080      0.473     0.081
      KK      0.478     0.086      0.478     0.087      0.478     0.085
50    LiOP    0.418     0.022      0.417     0.021      0.418     0.021
      LoOP    0.426     0.035      0.425     0.033      0.426     0.033
      KK      0.428     0.036      0.428     0.036      0.427     0.036
100   LiOP    0.405     0.010      0.405     0.010      0.405     0.010
      LoOP    0.407     0.014      0.408     0.013      0.408     0.015
      KK      0.408     0.013      0.408     0.013      0.408     0.013
200   LiOP    0.400     0.006      0.399     0.006      0.400     0.006
      LoOP    0.401     0.014      0.400     0.009      0.400     0.009
      KK      0.401     0.007      0.401     0.007      0.401     0.007
∞     LiOP    0.394                0.394                0.394
      LoOP    0.394                0.394                0.394
      KK      0.394                0.394                0.394

Table 4
Design parameters with subscripts 11, 12, 21 and 22 set to 2, 5, 4 and 1, respectively.

              Corr(X_11,X_21) = 0.2  Corr(X_11,X_21) = 0.5  Corr(X_11,X_21) = 0.8
n     Method  Mean       S.d.        Mean       S.d.        Mean       S.d.
25    LiOP    0.190      0.023       0.191      0.023       0.176      0.020
      LoOP    0.245      0.076       0.244      0.072       0.230      0.067
      KK      0.190      0.062       0.209      0.061       0.215      0.062
50    LiOP    0.179      0.012       0.181      0.011       0.166      0.010
      LoOP    0.199      0.038       0.200      0.035       0.187      0.034
      KK      0.152      0.035       0.171      0.034       0.176      0.033
100   LiOP    0.175      0.007       0.176      0.006       0.162      0.006
      LoOP    0.181      0.023       0.183      0.023       0.170      0.019
      KK      0.134      0.016       0.153      0.016       0.159      0.017
200   LiOP    0.173      0.005       0.174      0.004       0.160      0.004
      LoOP    0.174      0.030       0.176      0.029       0.163      0.016
      KK      0.126      0.008       0.145      0.008       0.150      0.008
∞     LiOP    0.171                  0.172                  0.158
      LoOP    0.166                  0.168                  0.157
      KK      0.119                  0.138                  0.144
When we allow for finite-sample estimation uncertainty, LiOP is the most accurate form of combination, especially for the smaller sample sizes. For small values of n, both the LoOP and KK combinations are similarly affected by estimation uncertainty, with the standard deviation of QPS for LoOP being especially large when c ≠ 0.
Table 5
Means and standard deviations of QPS across Monte Carlo replications: CJ-DGP, encompassing parameter equal to 1.

              c = 0                  c = 0                  c = 0.5                c = 0.5
              Corr(u_1t,u_2t) = 0    Corr(u_1t,u_2t) = 0.8  Corr(u_1t,u_2t) = 0    Corr(u_1t,u_2t) = 0.8
n     Method  Mean       S.d.        Mean       S.d.        Mean       S.d.        Mean       S.d.
25    LiOP    0.367      0.033       0.369      0.035       0.194      0.021       0.196      0.023
      LoOP    0.405      0.073       0.403      0.072       0.251      0.146       0.237      0.093
      KK      0.406      0.074       0.406      0.072       0.245      0.063       0.242      0.059
50    LiOP    0.350      0.015       0.351      0.016       0.185      0.012       0.185      0.013
      LoOP    0.363      0.033       0.362      0.030       0.212      0.112       0.203      0.048
      KK      0.365      0.033       0.365      0.033       0.211      0.037       0.210      0.036
100   LiOP    0.341      0.008       0.342      0.008       0.180      0.007       0.180      0.007
      LoOP    0.347      0.019       0.346      0.017       0.189      0.088       0.183      0.032
      KK      0.347      0.014       0.347      0.014       0.191      0.017       0.191      0.018
200   LiOP    0.337      0.005       0.338      0.005       0.177      0.005       0.177      0.005
      LoOP    0.339      0.008       0.339      0.007       0.177      0.060       0.174      0.008
      KK      0.339      0.007       0.340      0.007       0.183      0.008       0.183      0.008
∞     LiOP    0.333                  0.333                  0.174                  0.174
      LoOP    0.333                  0.333                  0.168                  0.168
      KK      0.333                  0.333                  0.176                  0.176
Table 6
Means and standard deviations of QPS across Monte Carlo replications: CJ-DGP, encompassing parameter equal to 0.5.

              c = 0                  c = 0                  c = 0.5                c = 0.5
              Corr(u_1t,u_2t) = 0    Corr(u_1t,u_2t) = 0.8  Corr(u_1t,u_2t) = 0    Corr(u_1t,u_2t) = 0.8
n     Method  Mean       S.d.        Mean       S.d.        Mean       S.d.        Mean       S.d.
25    LiOP    0.443      0.040       0.390      0.036       0.124      0.016       0.188      0.023
      LoOP    0.462      0.077       0.420      0.071       0.161      0.077       0.227      0.082
      KK      0.476      0.077       0.425      0.073       0.171      0.061       0.231      0.060
50    LiOP    0.421      0.020       0.370      0.017       0.119      0.008       0.178      0.012
      LoOP    0.418      0.031       0.380      0.033       0.137      0.039       0.196      0.056
      KK      0.433      0.033       0.385      0.034       0.143      0.033       0.200      0.035
100   LiOP    0.409      0.010       0.360      0.009       0.116      0.005       0.173      0.007
      LoOP    0.402      0.014       0.363      0.013       0.123      0.049       0.176      0.029
      KK      0.414      0.013       0.366      0.014       0.127      0.017       0.181      0.016
200   LiOP    0.404      0.006       0.356      0.006       0.115      0.004       0.170      0.005
      LoOP    0.395      0.006       0.356      0.007       0.114      0.032       0.167      0.010
      KK      0.407      0.007       0.359      0.007       0.120      0.007       0.173      0.007
∞     LiOP    0.399                  0.351                  0.113                  0.167
      LoOP    0.389                  0.350                  0.107                  0.160
      KK      0.400                  0.352                  0.115                  0.167
5.2. Simulation results for LPS
5.2.1. For the L-DGP
The results for when LPS replaces QPS as the loss function are reported in Table 7, where the setup is exactly the same as that in Table 1, apart from the loss function. Recall that KK is optimal. In contrast to the results we obtained for QPS, KK is less affected by estimation uncertainty than LoOP and LiOP, and is markedly more accurate for the smallest set of forecasts, n = 25. LoOP, and to a lesser extent LiOP, are badly affected by estimation uncertainty in finite samples.
5.2.2. For the CJ-DGP
The LPS results for the CJ-DGP with the encompassing parameter equal to 1 (corresponding to Table 5 for QPS) are given in Table 8. LoOP is optimal for c = 0, and from the limit results, LiOP is best when c = 0.5. However, when estimation uncertainty is taken into account KK is clearly best, as it was for the L-DGP.

Table 7
Means and standard deviations of LPS across Monte Carlo replications: L-DGP, design parameters with subscripts 1 and 2 set to 1 and 0, respectively.

              Corr(X_1,X_2) = 0.2  Corr(X_1,X_2) = 0.5  Corr(X_1,X_2) = 0.8
n     Method  Mean      S.d.       Mean      S.d.       Mean      S.d.
25    LiOP    0.749     0.323      0.740     0.280      0.736     0.289
      LoOP    0.820     0.637      0.747     0.366      0.729     0.285
      KK      0.669     0.097      0.667     0.088      0.664     0.083
50    LiOP    0.665     0.188      0.663     0.177      0.663     0.176
      LoOP    0.671     0.264      0.661     0.185      0.657     0.168
      KK      0.631     0.030      0.630     0.029      0.630     0.029
100   LiOP    0.633     0.176      0.631     0.155      0.628     0.120
      LoOP    0.628     0.123      0.627     0.123      0.625     0.109
      KK      0.615     0.014      0.614     0.014      0.614     0.014
200   LiOP    0.616     0.143      0.618     0.170      0.614     0.123
      LoOP    0.613     0.113      0.613     0.099      0.610     0.042
      KK      0.607     0.007      0.607     0.007      0.607     0.007
∞     LiOP    0.599                0.599                0.599
      LoOP    0.599                0.599                0.599
      KK      0.599                0.599                0.599

Table 8
Means and standard deviations of LPS across Monte Carlo replications: CJ-DGP, encompassing parameter equal to 1.

              c = 0                  c = 0                  c = 0.5                c = 0.5
              Corr(u_1t,u_2t) = 0    Corr(u_1t,u_2t) = 0.8  Corr(u_1t,u_2t) = 0    Corr(u_1t,u_2t) = 0.8
n     Method  Mean       S.d.        Mean       S.d.        Mean       S.d.        Mean       S.d.
25    LiOP    0.685      0.370       0.679      0.371       0.879      0.826       0.822      0.893
      LoOP    0.699      0.450       0.665      0.394       0.970      1.216       0.911      1.264
      KK      0.586      0.130       0.577      0.102       0.709      0.433       0.652      0.397
50    LiOP    0.595      0.287       0.596      0.310       0.411      0.579       0.394      0.487
      LoOP    0.591      0.258       0.572      0.209       0.422      0.677       0.465      0.929
      KK      0.536      0.035       0.536      0.036       0.332      0.112       0.330      0.101
100   LiOP    0.561      0.314       0.553      0.277       0.321      0.398       0.328      0.452
      LoOP    0.543      0.186       0.534      0.136       0.312      0.366       0.342      0.630
      KK      0.517      0.015       0.517      0.016       0.299      0.020       0.299      0.022
200   LiOP    0.543      0.331       0.540      0.319       0.295      0.375       0.293      0.369
      LoOP    0.523      0.166       0.515      0.083       0.278      0.238       0.293      0.414
      KK      0.508      0.008       0.508      0.008       0.289      0.010       0.288      0.010
∞     LiOP    0.500                  0.500                  0.251                  0.250
      LoOP    0.500                  0.500                  0.257                  0.257
      KK      0.500                  0.500                  0.280                  0.279
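For readers who wish to relate the two loss functions to the limiting entries in the tables, the sketch below scores a hypothetical benchmark: a forecast that is uniformly distributed on (0, 1) and perfectly calibrated. The definitions used (QPS as the mean of $2(f_t - y_t)^2$, LPS as minus the mean Bernoulli log likelihood) are assumptions on our part, since the scoring rules are defined in an earlier section. Under these assumptions the expected scores are 1/3 and 1/2, which is at least consistent with the n = ∞ entries for c = 0 in Table 5 (0.333) and Table 8 (0.500).

```python
import math
import random


def qps(forecasts, outcomes):
    """Quadratic probability score: mean of 2 * (f - y)^2 (assumed convention)."""
    return sum(2.0 * (f - y) ** 2 for f, y in zip(forecasts, outcomes)) / len(outcomes)


def lps(forecasts, outcomes):
    """Logarithmic probability score: minus the mean Bernoulli log likelihood."""
    total = 0.0
    for f, y in zip(forecasts, outcomes):
        total += math.log(f) if y == 1 else math.log(1.0 - f)
    return -total / len(outcomes)


random.seed(12345)
n = 200_000
# Hypothetical benchmark: forecasts uniform on (0, 1) and perfectly calibrated,
# i.e. the event occurs with probability exactly equal to the forecast.
forecasts = [random.uniform(1e-9, 1.0 - 1e-9) for _ in range(n)]
outcomes = [1 if random.random() < f else 0 for f in forecasts]

print(f"QPS = {qps(forecasts, outcomes):.3f}  (analytical value 1/3)")
print(f"LPS = {lps(forecasts, outcomes):.3f}  (analytical value 1/2)")
```

The logarithmic score is unbounded as a forecast approaches 0 or 1 on the wrong side of the outcome, whereas the quadratic score never exceeds 2 per observation, which may help to explain why the LPS standard deviations in Tables 7 and 8 are so much larger than their QPS counterparts.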
5.3. Summary
In summary, our results indicate that for large samples of forecasts, corresponding to the known combination weight case in the limit, LiOP is dominated by KK for the L-DGP, and by LoOP for the CJ-DGP. However, the impact of estimation uncertainty on the rankings depends on whether the aim is to minimize QPS or LPS. Under QPS, estimation uncertainty has a dramatic effect on the accuracy of the KK and LoOP combination forecasts when the sample of forecasts is small, and in such small n cases, the LiOP combination method may be expected to generate superior forecasts. However, for LPS loss KK might be expected to be the preferred combination method unless there are large samples of forecasts on which the combination weights are to be estimated. The deterioration in the small-sample performance of LiOP when accuracy is assessed by LPS rather than QPS is attributable to the loss of precision from estimating the weights using a numerical algorithm in the case of LPS, rather than using OLS under QPS loss.
6. Empirical illustration
Our empirical illustration of the use of different types of combinations of probability forecasts is based on recession probability forecasting, where the recessionary periods are those defined by the NBER's Business Cycle Dating Committee.⁹ Two logit forecasting models are considered: one uses the spread or yield curve, and the other uses average weekly hours.¹⁰ The usefulness of the spread for predicting US recessions was established by Anderson and Vahid (2001), Estrella and Mishkin (1998), and Hamilton and Kim (2000), and it is a component of the Conference Board's Composite Leading Indicator (CLI). Average weekly hours is also a component of the CLI.¹¹ Stock and Watson (2003, Table 3, p. 77) found that hours was on a par with the spread in terms of predicting output growth at around the time of the 2001 recession.
After constructing the annual percentage change in hours, we have monthly data from 1965:01 to 2007:02. We ran separate logit regressions of the recession indicator variable on the spread, and of the recession indicator on the annual change in hours. The estimation sample began around 1965 for both models (the precise starting point depends on the lag selected for the leading indicator), and ended in either 1985:12 or 1995:12. For each estimation sample, the lag of the indicator variable (either the spread or the hours variable) is selected to maximize the model likelihood value. From these estimated logit regressions, we use the in-sample predicted recession probabilities over this period to estimate the combination weights for LiOP, LoOP and KK that minimize QPS. We then generate out-of-sample 1-step-ahead forecast recession probabilities, without updating the logit model parameter estimates as the forecast origin is moved forward in time (this is usually known as a fixed forecasting scheme in the literature).¹² Using the weights estimated for the in-sample predictions, the three forecast combinations are constructed out-of-sample. The forecast combinations are then evaluated out-of-sample by QPS, and we report pairwise comparisons of equal forecast accuracy using the modified Diebold and Mariano (1995) statistic of Harvey, Leybourne, and Newbold (1997). The out-of-sample period begins in 1986:01 for the 1965 to 1985:12 estimation sample, and in 1996:01 for the 1965 to 1995:12 estimation sample, and runs until 2007:02 in both cases.

⁹ See http://www.nber.org/cycles.html.
¹⁰ The spread data are the 10-year treasury constant maturity rate (series identifier GS10) less the 3-month treasury bill at the secondary market rate (series identifier TB3MS). The hours variable is average weekly hours, total private industries (series identifier AWHNONAG). The data were taken from the FRED website http://research.stlouisfed.org/fred2.
¹¹ See http://www.conference-board.org/economics/bci/serieslist01.cfm. The Conference Board uses average weekly hours in manufacturing.
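The modified Diebold-Mariano comparison mentioned above can be sketched as follows, using the small-sample correction of Harvey, Leybourne, and Newbold (1997). The loss differential is oriented as loss of y minus loss of x, so that a positive statistic favours x; this orientation, the toy forecasts and the outcome series are our assumptions for illustration and are not the recession data.

```python
import math


def modified_dm(loss_x, loss_y, h=1):
    """Modified Diebold-Mariano statistic for h-step-ahead forecasts.

    loss_x and loss_y are per-period losses (for QPS, 2 * (f_t - y_t)**2).
    Positive values indicate a smaller average loss for x (assumed orientation).
    """
    n = len(loss_x)
    d = [ly - lx for lx, ly in zip(loss_x, loss_y)]
    d_bar = sum(d) / n

    def autocov(k):
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    var_d_bar = (autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))) / n
    dm = d_bar / math.sqrt(var_d_bar)
    # Small-sample correction factor of Harvey, Leybourne, and Newbold (1997).
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dm


# Hypothetical outcomes and probability forecasts, for illustration only.
y = [0, 0, 1, 0, 1, 0, 0, 1]
f_x = [0.1, 0.2, 0.8, 0.3, 0.7, 0.2, 0.1, 0.6]
f_y = [0.3, 0.4, 0.5, 0.5, 0.4, 0.4, 0.3, 0.5]
loss_x = [2.0 * (f - yt) ** 2 for f, yt in zip(f_x, y)]
loss_y = [2.0 * (f - yt) ** 2 for f, yt in zip(f_y, y)]

stat = modified_dm(loss_x, loss_y, h=1)
print(f"Modified DM statistic: {stat:.2f}")
```

For 1-step-ahead forecasts (h = 1) only the variance of the loss differential enters, and the correction factor reduces to the square root of (n - 1)/n. The statistic is then compared with critical values from the Student's t distribution with n - 1 degrees of freedom.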
A possible criticism is that the out-of-sample combination weights are calculated from in-sample predictions rather than out-of-sample forecasts. This could be remedied by introducing an additional data split, so that between the current in-sample and out-of-sample periods there is a period over which out-of-sample forecasts are generated, which are then used to estimate weights. However, it was felt that this would complicate the illustration, with little gain in terms of insight. The selected logit models are not the result of elaborate model search/specification procedures (in each case we simply choose the best lag from a maximum lag of twelve), and thus there is little reason to suppose that the in-sample fits will be very different from their out-of-sample counterparts.
For both in-sample periods, a nine month lag was selected for the spread, and a one month lag for hours. Both leading indicators have significant in-sample predictive power for recessions for both in-sample periods. We obtained pseudo-$R^2$ statistics of 0.288 and 0.290 for the spread, and 0.131 and 0.138 for hours, using the measure of Estrella and Mishkin (1998). Their pseudo-$R^2$ statistic is defined as:

$$R^2 = 1 - \left(\frac{ll_u}{ll_r}\right)^{-\frac{2}{n}\, ll_r},$$

where $ll_u$ and $ll_r$ are the unrestricted maximised value of the log likelihood, and the value imposing the restriction that the slope coefficients are zero, respectively. In all cases we rejected the null that the slope coefficient (on either the spread or hours) could be omitted.

¹² We also considered recursive and rolling schemes, whereby the logit forecasting models' parameters were updated as the forecast origin was moved through the data, using either an expanding window or fixed window of data. In both cases the combination weights were estimated on the most recent 240 in-sample predicted probabilities from the two models. In the case of the recursive scheme the spread model was better than the combinations, while for the rolling scheme the individual models and combinations were all equally accurate, according to the Diebold-Mariano statistic. Hence, there are no interesting differences between the various ways of combining forecasts, so we do not report these results.
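The pseudo-$R^2$ measure can be evaluated directly from the two log-likelihood values. The sketch below uses hypothetical log likelihoods chosen only for illustration; they are not the values from the estimated logit models.

```python
def estrella_pseudo_r2(ll_u, ll_r, n):
    """Pseudo-R^2 = 1 - (ll_u / ll_r) ** (-(2 / n) * ll_r).

    ll_u: maximised log likelihood of the unrestricted model.
    ll_r: log likelihood with the slope coefficients restricted to zero.
    n:    number of observations.
    """
    return 1.0 - (ll_u / ll_r) ** (-(2.0 / n) * ll_r)


# Hypothetical log-likelihood values, chosen only for illustration.
n_obs = 240
ll_restricted = -100.0    # intercept-only logit
ll_unrestricted = -70.0   # logit including the leading indicator
print(round(estrella_pseudo_r2(ll_unrestricted, ll_restricted, n_obs), 3))
```

The measure equals zero when the indicator adds nothing to the fit ($ll_u = ll_r$) and approaches one as the unrestricted log likelihood approaches zero, mirroring the range of the conventional $R^2$.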
Table 9 records the values obtained out-of-sample for QPS using the individual indicator logit models, as well as the LiOP, LoOP and KK combinations of these forecasts. Of the two individual model forecasts, the spread model is more accurate than that using hours, matching the higher $R^2$ in-sample for the spread, but combination improves upon the best individual set of forecasts, with KK performing markedly better than LiOP and LoOP in both forecast periods. Table 10 indicates that the spread model forecasts are statistically significantly more accurate than those of the hours model, at the 1% level for the forecast period 1986:01 to 2007:02, and at the 10% level for the shorter forecast period. Statistical significance is assessed by the modified Diebold-Mariano statistic of the null of equal accuracy assessed by QPS (see Diebold & Mariano, 1995; Harvey et al., 1997).¹³ The KK forecast combination is significantly more accurate than either LiOP or LoOP at all conventional levels of significance for the second forecast period, and at the 10% level for the first forecast period. This result is consistent with the simulation findings, given that the combination weights are estimated using samples of size in excess of n = 200.

Since recessionary periods, as defined by the NBER, are relatively rare events (there was only one recession during our second forecast period), one way of assessing the economic significance of our findings is to examine the forecast performance of the individual models and the best combination model (KK) at these times. During the recession at the beginning of the decade, the average probability of recession for the model using hours was only 0.255, while that for the spread was 0.410. The KK combination returned an average forecast of 0.462.

¹³ Our setting satisfies the standard assumptions for the Diebold-Mariano statistic to be valid: although the variable being forecast is binary, the forecast error is continuous because the forecast probability is continuous.

Table 9
QPS for out-of-sample recession probability forecasts.

Forecast model/combination   1986:01-2007:02   1996:01-2007:02
Spread                       0.087             0.073
Hours                        0.108             0.093
LiOP                         0.082             0.067
LoOP                         0.084             0.071
KK                           0.075             0.056

Notes: For the first forecast period 1986:01-2007:02, the combination weights are estimated from 240 in-sample predicted probabilities, while for the second forecast period 1996:01-2007:02 there are around 360 data points.
7. Conclusions
Forecasts of economic and financial variables which take the form of probabilities are becoming increasingly common. As there is an extensive body of literature in economic and management science suggesting that forecast combination can improve on the best individual forecast, we consider ways of combining probability forecasts. For probability forecasts, the justification for considering linear combinations is weaker than in the case of standard point forecasts, and we consider three combination schemes, one of which is linear combination. When the loss function is given by QPS, our simulation results indicate that linear combination may work reasonably well when the optimal combination weights are estimated from a small sample of forecasts, but for moderate sample sizes, combinations such as those proposed by Kamstra and Kennedy (1998) are likely to prove superior. Moreover, when the loss function is given by LPS, our results suggest that there is little to recommend linear combination, particularly when we allow for estimation uncertainty by having small or moderately sized samples of forecasts on which to estimate the combination weights.
We present an empirical illustration based on combining US recession probability forecasts from two single leading indicator logit models. The Kamstra-Kennedy combination is found to be more accurate than both the individual forecasts and the other types of combinations we consider, including linear combination. Given that we have large samples of forecasts on which to estimate the combination parameters for the different ways of combining forecasts, the finding that the Kamstra-Kennedy combination fares best is consistent with our simulation findings.

Table 10
Tests for equal QPS-forecast accuracy.

                  1986:01-2007:02        1996:01-2007:02
                  Test stat.  p-value    Test stat.  p-value
Spread & Hours     2.89       0.002       1.43       0.077
Spread & LiOP     -3.96       1.000      -2.99       0.998
Spread & LoOP     -4.73       1.000      -1.97       0.975
Spread & KK       -2.50       0.998      -6.55       1.000
LiOP & KK         -1.36       0.932      -5.02       1.000
LoOP & KK         -1.90       0.985      -8.28       1.000
LiOP & LoOP        2.72       0.004       3.01       0.001

Notes: The test statistic is the modified Diebold-Mariano statistic of the null hypothesis of equal accuracy, assessed by QPS. For x & y, a p-value of less than 0.05 indicates rejection of the null hypothesis in favour of x being more accurate at the 5% level, and a p-value greater than 0.95 indicates rejection of the null hypothesis in favour of y being more accurate at the 5% level.
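The decision rule stated in the notes to Table 10 can be sketched as follows. Two assumptions are made here: the p-value is approximated with a standard normal distribution rather than the Student's t reference distribution, and the Spread & LiOP statistic for 1986:01-2007:02 is taken to be negative, since a p-value of 1.000 implies a statistic in the lower tail.

```python
from statistics import NormalDist


def one_sided_p_value(test_stat):
    """Upper-tail p-value, using a standard normal approximation (an assumption)."""
    return 1.0 - NormalDist().cdf(test_stat)


def verdict(p_value, level=0.05):
    """Apply the decision rule described in the notes to Table 10."""
    if p_value < level:
        return "x more accurate"
    if p_value > 1.0 - level:
        return "y more accurate"
    return "no rejection"


# Statistics for 1986:01-2007:02 taken from Table 10 (sign of the second assumed).
for pair, stat in [("Spread & Hours", 2.89), ("Spread & LiOP", -3.96)]:
    p = one_sided_p_value(stat)
    print(f"{pair}: p = {p:.3f}, {verdict(p)}")
```

Reading a single one-sided p-value against both tails in this way allows one table entry to signal which member of the pair is the more accurate.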
For standard point forecasts, where the literature focuses almost exclusively on linear combinations of forecasts and squared error loss, the notion of forecast encompassing is well-defined, and is a useful test of predictive accuracy to be reported alongside related tests, such as tests of whether two sets of forecasts are equally accurate (e.g., Diebold & Mariano, 1995). We have shown that for probability forecasts there are a number of types of combination that might be considered, and that forecast encompassing may hold for one type of combination but not another. Thus, while probability forecast encompassing tests can still be conducted in a linear combination setting (see Clements & Harvey, in press), when a broader range of combination methods is allowed, the notion of forecast encompassing appears to be less useful, due to the dependence on the form of the combination. For this reason we have focused mainly on the relative accuracies of the different types of combination, paying particular attention to the need to estimate combination weights using what may, on occasion, be relatively small samples of forecasts.
Acknowledgements
We are grateful to Graham Elliott and an anonymous referee for helpful comments.
References
Anderson, H. M., & Vahid, F. (2001). Predicting the probability of a recession with nonlinear autoregressive leading indicator models. Macroeconomic Dynamics, 5, 482-505.
Bates, J. M., & Granger, C. W. J. (1969). The combination of forecasts. Operations Research Quarterly, 20, 451-468. Reprinted in T. C. Mills (ed.), Economic forecasting: The international library of critical writings in economics. Cheltenham: Edward Elgar, 1999.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1-3.
Chong, Y. Y., & Hendry, D. F. (1986). Econometric evaluation of linear macro-economic models. Review of Economic Studies, 53, 671-690. Reprinted in C. W. J. Granger (ed.), Modelling economic series. Oxford: Clarendon Press, 1990.
Clark, T. E., & McCracken, M. W. (2001). Tests of equal forecast accuracy and encompassing for nested models. Journal of Econometrics, 105, 85-110.
Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5, 559-583. Reprinted in T. C. Mills (ed.), Economic forecasting: The international library of critical writings in economics. Cheltenham: Edward Elgar, 1999.
Clemen, R. T., & Winkler, R. L. (1999). Combining probability distributions from experts in risk analysis. Risk Analysis, 19, 187-203.
Clements, M. P., & Harvey, D. I. (2009). Forecast combination and encompassing. In T. C. Mills, & K. Patterson (Eds.), Palgrave handbook of econometrics, volume 2: Applied econometrics (pp. 169-198). Basingstoke: Palgrave MacMillan.
Clements, M. P., & Harvey, D. I. (in press). Forecast encompassing tests and probability forecasts. Journal of Applied Econometrics.
Cook, R. D., & Johnson, M. E. (1981). A family of distributions for modelling non-elliptically symmetric multivariate data. Journal of the Royal Statistical Society, Series B, 43, 210-218.
Dawid, A. P. (1986). Probability forecasting. In S. Kotz, N. L. Johnson, & C. B. Read (Eds.), Encyclopedia of Statistical Sciences: vol. 7 (pp. 210-218). John Wiley & Sons.
Deutsch, M., Granger, C. W. J., & Teräsvirta, T. (1994). The combination of forecasts using changing weights. International Journal of Forecasting, 10, 47-57.
Diebold, F. X., & Lopez, J. A. (1996). Forecast evaluation and combination. In G. S. Maddala, & C. R. Rao (Eds.), Handbook of statistics: vol. 14 (pp. 241-268). Amsterdam: North-Holland.
Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13, 253-263. Reprinted in T. C. Mills (ed.), Economic forecasting: The international library of critical writings in economics. Cheltenham: Edward Elgar, 1999.
Diebold, F. X., & Rudebusch, G. D. (1989). Scoring the leading indicators. Journal of Business, 62, 369-391.
Elliott, G., & Timmermann, A. (2004). Optimal forecast combinations under general loss functions and forecast error distributions. Journal of Econometrics, 122, 47-79.
Estrella, A., & Mishkin, F. S. (1998). Predicting US recessions: Financial variables as leading indicators. Review of Economics and Statistics, 80, 45-61.
Fair, R. C., & Shiller, R. J. (1990). Comparing information in forecasts from econometric models. American Economic Review, 80, 39-50.
Genest, C., & Zidek, J. V. (1986). Combining probability distributions: A critique and an annotated bibliography. Statistical Science, 1, 114-148.
Good, I. (1952). Rational decisions. Journal of the Royal Statistical Society, Series B, 14(1), 107-114.
Granger, C. W. J., & Ramanathan, R. (1984). Improved methods of combining forecasts. Journal of Forecasting, 3, 197-204.
Hamilton, J. D., & Kim, D. H. (2000). A re-examination of the predictability of economic activity using the yield spread. NBER working papers, 7954.
Harvey, D. I., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13, 281-291.
Harvey, D. I., Leybourne, S., & Newbold, P. (1998). Tests for forecast encompassing. Journal of Business and Economic Statistics, 16, 254-259. Reprinted in T. C. Mills (ed.), Economic forecasting: The international library of critical writings in economics. Cheltenham: Edward Elgar, 1999.
Harvey, D. I., & Newbold, P. (2000). Tests for multiple forecast encompassing. Journal of Applied Econometrics, 15, 471-482.
Hendry, D. F., & Clements, M. P. (2004). Pooling of forecasts. The Econometrics Journal, 7, 1-31.
Kamstra, M., & Kennedy, P. (1998). Combining qualitative forecasts using logit. International Journal of Forecasting, 14, 83-93.
Newbold, P., & Granger, C. W. J. (1974). Experience with forecasting univariate time series and the combination of forecasts. Journal of the Royal Statistical Society, Series A, 137, 131-146. Reprinted in T. C. Mills (ed.), Economic forecasting: The international library of critical writings in economics. Cheltenham: Edward Elgar, 1999.
Newbold, P., & Harvey, D. I. (2002). Forecast combination and encompassing. In M. P. Clements, & D. F. Hendry (Eds.), A Companion to Economic Forecasting (pp. 268-283). Oxford: Blackwells.
Stock, J. H., & Watson, M. W. (1999). A comparison of linear and nonlinear univariate models for forecasting macroeconomic time series. In R. F. Engle, & H. White (Eds.), Cointegration, causality and forecasting: A festschrift in honour of Clive Granger (pp. 1-44). Oxford: Oxford University Press.
Stock, J. H., & Watson, M. W. (2003). How did leading indicator forecasts perform during the 2001 recession? Federal Reserve Bank of Richmond, Economic Quarterly, 89(3), 71-90.
Timmermann, A. (2006). Forecast combinations. In G. Elliott, C. Granger, & A. Timmermann (Eds.), Handbook of economic forecasting, vol. 1. Handbooks in economics, 24 (pp. 135-196). North-Holland: Elsevier.
West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica, 64, 1067-1084.
West, K. D. (2001). Tests for forecast encompassing when forecasts depend on estimated regression parameters. Journal of Business and Economic Statistics, 19, 29-33.
West, K. D., & McCracken, M. W. (1998). Regression-based tests of predictive ability. International Economic Review, 39, 817-840.
Winkler, R. L. (1996). Scoring rules and the evaluation of probabilities (with discussion). Test, 5(1), 1-60.
