
MFx Macroeconomic Forecasting

IMFx
This training material is the property of the International Monetary Fund (IMF) and is intended for use in IMF
Institute for Capacity Development (ICD) courses. Any reuse requires the permission of the ICD.
EViews is a trademark of IHS Global Inc.

Module 9 (Optional): Combination Forecasts


Pat Healy & Carl Sandberg

Roadmap
M9_Intro. Introduction, Motivations & Outline
M9_S1. Forecast Combination Basics
M9_S2. Solving the Theoretical Combination Problem &
Implementation Issues
M9_S3. Methods to Estimate Weights (M<T)
M9_S4. Methods to Estimate Weights (M>T)
M9_S5. EViews 1: EViews Workshop 1
M9_S6. EViews 2: Forecast Combination Tools in EViews
M9_Conclusion. Conclusions

Module 9
M9_Intro: Introduction, Motivations
& Outline

Introduction
Generally speaking, multiple forecasts are
available to decision makers before they make a
policy decision
Key Question: Given the uncertainty associated
with identifying the true DGP, should a single
(best) forecast be used? Or should we (somehow)
average over all the available forecasts?

Motivations: One vs. Many


There are several disadvantages of using only one
forecasting model:
Model misspecifications of an unknown form
Implausible that one statistical model would be preferable to
others at all points of the forecast horizon

Combining separate forecasts offers:


A simple way of building a complex, more flexible forecasting
model to explain the data
Some insurance against breaks or other non-stationarities that
may occur in the future

Outline
1. Forecast Combination Basics
2. Solving the Theoretical Combination Problem &
Implementation Issues
3. Methods to Estimate & Assign Weights
4. Workshop 1: Combining Forecasts in EViews
5. Workshop 2: EViews Combination Tool
6. Wrap-Up

Module 9
Session 1, Part 1: Forecast
Combination Basics

1. Forecast Combination Basics


Learning Objectives:
Look at a general framework and notation for
combining forecasts
Introduce the theoretical forecast combination
problem

General Framework
Today (say, at time T) we want to forecast the value
that a variable of interest (Y) will take at time T+h
We have a certain number (M) of forecasts available
How can we pool, or combine these M forecasts into
an optimal forecast?
Is there any advantage of pooling instead of just finding
and using the best one among the M available?

Some Notation
y_t is the value of Y at time t (today is T)
x_{t,h,i} is an unbiased (point) forecast of y_{t+h} made at time t
h is the forecasting horizon
i = 1, ..., M is the identifier of the available forecast
M is the total number of forecasts

Some (more) Notation

e_{t+h,i} = y_{t+h} - x_{t,h,i} is the forecast (prediction) error
\sigma^2_{t+h,i} = Var(e_{t+h,i}) is the forecast error variance (the inverse of precision)
\sigma_{t+h,i,j} = Cov(e_{t+h,i}, e_{t+h,j})
w_{t,h} = [w_{t,h,1}, ..., w_{t,h,M}] is a vector of weights
L(e_{t,h}) is the loss from making a forecast error
E[L(e_{t,h})] is the risk associated with a forecast

The Forecast Combination Problem

A combined forecast is a weighted average of the M forecasts:

y^c_{T+h} = \sum_{i=1}^{M} w_{T,h,i} x_{T,h,i}

The forecast combination problem can be formally stated as:

Problem: Choose weights w_{T,h,i} to minimize E[L(e^c_{T+h})]
subject to \sum_{i=1}^{M} w_{T,h,i} = 1
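The course's workshops use EViews, but the combination itself is just a dot product of weights and forecasts with the weights summing to one. A minimal Python sketch (all numbers are made up for illustration):

```python
import numpy as np

# M = 3 available forecasts of Y at T+h (illustrative values)
x = np.array([2.1, 1.8, 2.4])   # x_{T,h,i}, i = 1..M
w = np.array([0.5, 0.3, 0.2])   # weights w_{T,h,i}

assert np.isclose(w.sum(), 1.0)  # the constraint in the combination problem
y_combined = w @ x               # y^c_{T+h} = sum_i w_i * x_i
print(round(y_combined, 3))      # -> 2.07
```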

What is the Problem Really About?


Observations on a variable Y
Observations of forecasts of Y with forecasting horizon 1
Forecasting errors
Question: How much weight shall we give to the current forecast given past
performance and knowing that there will be a forecasting error?

What is the Problem Really About?

[Figure: realized GDP up to time T, with several forecasts fanning out at T+1]

Module 9
Session 1, Part 2: Combining
Prediction Errors

Examples of Loss Functions

Squared error loss: E[(e^c_{T+h})^2]
Absolute error loss: E[|e^c_{T+h}|]
Linex loss: E[\exp(\alpha e^c_{T+h}) - \alpha e^c_{T+h} - 1]

We will focus on mean squared error loss:

E[L(e^c_{T+h})] = E[(e^c_{T+h})^2]

Combining Prediction Errors

Notice that because \sum_{i=1}^{M} w_{T,h,i} = 1, then:

e^c_{T+h} = y_{T+h} - y^c_{T+h} = \sum_{i=1}^{M} w_{T,h,i} y_{T+h} - \sum_{i=1}^{M} w_{T,h,i} x_{T,h,i}
          = \sum_{i=1}^{M} w_{T,h,i} (y_{T+h} - x_{T,h,i})
          = \sum_{i=1}^{M} w_{T,h,i} e_{T+h,i}

Hence, if weights sum to one:

E[L(e^c_{T+h})] = E[L(\sum_{i=1}^{M} w_{T,h,i} e_{T+h,i})]

Combination Problem with MSE

Let w be the (M x 1) vector of weights, e the (M x 1) vector of forecast errors, u an (M x 1) vector of 1s, and \Sigma the (M x M) variance-covariance matrix of the errors:

\Sigma = E[ee']

It follows that

\sum_{i=1}^{M} w_{T,h,i} = u'w
e^c_{T+h} = \sum_{i=1}^{M} w_{T,h,i} e_{T+h,i} = w'e
(e^c_{T+h})^2 = w'ee'w
E[(e^c_{T+h})^2] = E[w'ee'w] = w'E[ee']w = w'\Sigma w

Problem 1: Choose w to minimize w'\Sigma w subject to u'w = 1.

Issues and Clarification (I)


Must the weights sum to one?
If forecasts are unbiased, this guarantees an
unbiased combination forecast

How restrictive is pooling the forecasts rather


than information sets?
Pooling information sets is theoretically better but
practically difficult or impossible

Issues and Clarification (II)


Are point forecasts the only forecasts that we can
combine?
We can also combine forecasts of distributions.

Are Bayesian methods useful?


Not entirely.

Is there a difference between averaging across


forecasts and across forecasting models?
If you know the models and the models are linear in the
parameters, there is no difference.

Broad Summary of Questions


1. What are the optimal weights in the population?
2. How can we estimate the optimal weights?
3. Are these estimates good?

Module 9
Session 2, Part 1: Solving the
Theoretical Combination Problem

2. Solving the Theoretical Combination Problem

Learning Objectives:
Create a simple combination of forecasts using 2 forecasts
Generalize to M forecasts
Explore key takeaways from combining

Simple Combination
For now, let's assume that we know the distribution of the forecasting errors associated with each forecast.

What are the optimal (loss-minimizing) weights, in population?

[Figure: forecasts of the variable at T+1]

Simple Combination with M=2

Consider two point forecasts:

y^c_{T+h} = w x_{T,h,1} + (1-w) x_{T,h,2}

E[(e^c_{T+h})^2] = E[(w e_{T+h,1} + (1-w) e_{T+h,2})^2]
                 = w^2 \sigma^2_{T+h,1} + (1-w)^2 \sigma^2_{T+h,2} + 2w(1-w) \sigma_{T+h,1,2}

The solution to the Problem (M=2) is:

w^* = \frac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}}    (weight of x_{T,h,1})

1 - w^* = \frac{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}}    (weight of x_{T,h,2})
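The two-forecast optimal weight is a one-line formula. A small Python check with illustrative (made-up) error variances and covariance:

```python
import numpy as np

def optimal_weight_2(var1, var2, cov12):
    """Optimal weight w* on forecast 1 when combining two unbiased forecasts
    under MSE loss: w* = (var2 - cov12) / (var1 + var2 - 2*cov12)."""
    return (var2 - cov12) / (var1 + var2 - 2.0 * cov12)

# Illustrative error variances/covariance (not from the course data)
var1, var2, cov12 = 1.0, 2.0, 0.5
w_star = optimal_weight_2(var1, var2, cov12)
print(round(w_star, 3))       # weight on the more precise forecast 1 -> 0.75
print(round(1 - w_star, 3))   # weight on forecast 2 -> 0.25
```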

Interpreting the Optimal Weights

Consider:

\frac{w^*}{1 - w^*} = \frac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}

a larger weight is assigned to the more precise model
the weights are the same (w^* = 0.5) if and only if \sigma^2_{T+h,1} = \sigma^2_{T+h,2}
if the covariance between the two forecasts increases, greater weight goes to the more precise forecast

General Result with M Forecasts:

The choice of weights will reflect the variance-covariance matrix of the forecast errors.
Result: The vector of optimal weights w^* with M forecasts is

w^{*'} = \frac{u' \Sigma^{-1}_{T,h}}{u' \Sigma^{-1}_{T,h} u}
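As a quick numerical check of this closed form, a numpy sketch (the covariance matrix below is invented for illustration; the course's own computations are done in EViews):

```python
import numpy as np

def optimal_weights(sigma):
    """w* = Sigma^{-1} u / (u' Sigma^{-1} u) for a known (M x M)
    variance-covariance matrix of forecast errors."""
    M = sigma.shape[0]
    u = np.ones(M)
    sinv_u = np.linalg.solve(sigma, u)   # Sigma^{-1} u without explicit inverse
    return sinv_u / (u @ sinv_u)

# Illustrative covariance matrix of forecast errors (M = 3)
sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 1.5]])
w = optimal_weights(sigma)
assert np.isclose(w.sum(), 1.0)   # constraint u'w = 1 holds by construction
print(np.round(w, 3))             # most weight goes to the low-variance forecast
```

For M = 2 this reproduces the formula from the previous slide, which is a handy sanity check.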

Takeaway 1: Combining Forecasts Decreases Risk

Compute the expected loss (MSE) under the optimal weights, where \rho_{T+h,1,2} is the correlation coefficient of the two forecast errors:

E[(e^c_{T+h}(w^*))^2] = \frac{\sigma^2_{T+h,1} \sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2} \sigma_{T+h,1} \sigma_{T+h,2}}

Suppose that \sigma^2_{T+h,1} \le \sigma^2_{T+h,2}. Then

E[(e^c_{T+h}(w^*))^2] = \sigma^2_{T+h,1} \cdot \frac{\sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2} \sigma_{T+h,1} \sigma_{T+h,2}} \le \sigma^2_{T+h,1}

Result:

E[(e^c_{T+h}(w^*))^2] \le \min\{\sigma^2_{T+h,1}, \sigma^2_{T+h,2}\}

That is, the forecast risk from combining forecasts is no higher than the lowest of the forecasting risks from the single forecasts.

Takeaway 2: Bias vs. Variance Tradeoff

The MSE loss function of a forecast has two components:
the squared bias of the forecast
the (ex-ante) forecast variance

E[(y_{T+h} - x_{T,h,i})^2] = Bias^2_{T,h,i} + \sigma^2_y + Var(x_{T,h,i})

For the combined forecast:

E[(y^c_{T+h} - y_{T+h})^2] = \left(\sum_i w_{T,h,i} Bias_{T,h,i}\right)^2 + \sigma^2_y + Var\left(\sum_i w_{T,h,i} x_{T,h,i}\right)

Result: Combining forecasts offers a tradeoff between increased overall bias vs. lower (ex-ante) forecast variance.

What is the Problem Really About?

[Figure: realized GDP up to time T and the combined forecast at T+1]

Module 9
Session 2, Part 2: Implementation
Issues

Issue: What if it is optimal for one weight to be < 0?

Consider again the case M = 2. The optimal weights are such that:

\frac{w^*}{1 - w^*} = \frac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}

If \sigma_{T+h,1,2} > 0 and \sigma^2_{T+h,2} < \sigma_{T+h,1,2} < \sigma^2_{T+h,1}, then w^* < 0

Shall we impose the constraints that w_{T,h,i} >= 0?

Another Issue: Estimating \Sigma

In reality, we do not know \Sigma: we can only estimate the theoretical weights using the observed past forecast errors.

[Figure: time series of past forecast errors e_{t,h,1} and e_{t,h,2} up to T]

Questions when estimating \Sigma

1) Are the estimates of \Sigma based on past errors unbiased?
2) Does the population \Sigma depend on t?
If not, estimates become better as T increases
If it does, different issues arise: heteroskedasticity of any sort, serial correlation, etc.
Do our estimates capture this dependence?
3) Does \Sigma depend on past realizations of y?

Questions when estimating \Sigma

4) How good are our estimates of \Sigma? If M is large relative to T, our estimates are poor!
5) Shall we just focus on weighted averages? Why not consider the median forecast, or trim extreme forecasts?

One More Issue: Optimality of Equal Weights?

When does it make sense (in terms of minimum squared error) to use equal weights?
when the variances of the forecast errors are the same
and all the pairwise covariances across forecast errors are the same
and the loss function is symmetric

Result: Equal weights tend to perform better than many estimates of the optimal weights (Stock and Watson 2004, Smith and Wallis 2009)

Module 9
Session 3: Methods to estimate the
weights when M is low relative to T

Methods of Weighting (M<T)


Learning Objectives:
Decide when to combine and when not to
combine
Estimate weights using OLS
Address Sampling Error

To combine or not to combine?

We need to assess whether one set of forecasts encompasses all information contained in another set of forecasts.
Example: for 2 forecasting models, run the regression:

y_{t+h} = \beta_{T,h,0} + \beta_{T,h,1} x_{t,h,1} + \beta_{T,h,2} x_{t,h,2} + \epsilon_{t+h},    t = 1, 2, ..., T - h

One forecast encompasses the other if you cannot reject:

H_0: (\beta_{T,h,0}, \beta_{T,h,1}, \beta_{T,h,2}) = (0, 1, 0)
H_0: (\beta_{T,h,0}, \beta_{T,h,1}, \beta_{T,h,2}) = (0, 0, 1)

All other outcomes imply that there is some information in both forecasts that can be used to obtain a lower mean squared error.
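The encompassing regression is ordinary least squares of the realized values on the competing forecasts. A Python sketch on simulated data (the data-generating process and seed are invented for illustration, not the course workfile):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated example: a target series and two noisy forecasts of it
T = 200
y = rng.normal(size=T)
x1 = y + rng.normal(scale=0.5, size=T)   # more precise forecast
x2 = y + rng.normal(scale=0.9, size=T)   # noisier forecast

# Encompassing regression: y_{t+h} = b0 + b1*x1 + b2*x2 + e
X = np.column_stack([np.ones(T), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))
# If (b0, b1, b2) were statistically indistinguishable from (0, 1, 0),
# forecast 1 would encompass forecast 2. A formal test needs standard
# errors (e.g. a Wald test); this sketch only shows point estimates.
```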

OLS estimates of the weights

If we assume a linear-in-weights model, OLS can be used to estimate the weights that minimize the MSE using data for t = 1, ..., T - h:

y_{t+h} = \beta_{T,h,0} + \beta_{T,h,1} x_{t,h,1} + \beta_{T,h,2} x_{t,h,2} + ... + \beta_{T,h,M} x_{t,h,M} + \epsilon_{t+h}

(including a constant allows correcting for the bias in any one of the forecasts), or

y_{t+h} = \beta_{T,h,1} x_{t,h,1} + \beta_{T,h,2} x_{t,h,2} + ... + \beta_{T,h,M} x_{t,h,M} + \epsilon_{t+h}    s.t. \sum_{i=1}^{M} \beta_{T,h,i} = 1

OLS estimates of the weights

If weights sum to one, then the previous equation becomes a regression of a vector of 0s on the past forecasting errors:

0 = \beta_{T,h,1} (x_{t,h,1} - y_{t+h}) + \beta_{T,h,2} (x_{t,h,2} - y_{t+h}) + ... + \epsilon_{t+h}
0 = \beta_{T,h,1} e_{t+h,1} + \beta_{T,h,2} e_{t+h,2} + ... + \epsilon_{t+h}

Problem 2: Choose w to minimize w'\hat{\Sigma}w subject to u'w = 1 and w_i >= 0, where

\hat{\Sigma} = \sum_{t=1}^{T-h} e_{t,h} e'_{t,h}

and e_{t,h} is a vector that collects the forecast errors of the M forecasts made in t.
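A sample analogue of Problem 2 can be sketched in a few lines of numpy. The simulated errors are invented for illustration, and clipping negative weights to zero with renormalization is a simple heuristic, not the exact inequality-constrained optimum:

```python
import numpy as np

def estimated_weights(E):
    """E is a (T-h) x M matrix of past forecast errors e_{t,h,i}.
    Builds Sigma_hat = sum_t e_t e_t', applies the equality-constrained
    solution, then clips negative weights and renormalizes (heuristic)."""
    sigma_hat = E.T @ E
    u = np.ones(E.shape[1])
    w = np.linalg.solve(sigma_hat, u)
    w = w / w.sum()              # impose u'w = 1
    w = np.clip(w, 0.0, None)    # impose w_i >= 0 (heuristically)
    return w / w.sum()

rng = np.random.default_rng(1)
E = rng.normal(scale=[0.5, 1.0, 2.0], size=(100, 3))  # forecast 1 most precise
w = estimated_weights(E)
print(np.round(w, 3))  # most weight should go to the most precise forecast
```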

Reducing the dependency on sampling errors

Estimates of \Sigma reflect (in part) sampling error. Although the optimal weights depend on \Sigma, it makes sense to reduce the dependence of the weights on such an estimate.
One way to achieve this is to shrink the estimated optimal weights towards equal weights (Stock and Watson 2004):

w^s_{T,h,i} = \psi \hat{w}_{T,h,i} + (1 - \psi)(1/M),    \psi = \max(0, 1 - M/(T - h - M - 1))
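The shrinkage formula above is a convex combination of the estimated and equal weights. A minimal sketch (the weights, T, and h below are illustrative):

```python
import numpy as np

def shrink_weights(w_hat, M, T, h):
    """Shrinkage toward equal weights in the spirit of Stock-Watson (2004):
    w_s = psi * w_hat + (1 - psi)/M, psi = max(0, 1 - M/(T - h - M - 1))."""
    psi = max(0.0, 1.0 - M / (T - h - M - 1))
    return psi * np.asarray(w_hat) + (1.0 - psi) / M

# Illustrative: 3 estimated weights from a short sample (T = 40, h = 1)
w_hat = np.array([0.7, 0.2, 0.1])
w_s = shrink_weights(w_hat, M=3, T=40, h=1)
print(np.round(w_s, 3))
# With a short sample the weights move toward 1/3 each;
# as T grows, psi -> 1 and the estimated weights are kept.
```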

Module 9
Session 4: Methods to estimate the
weights when M is high relative to T

Methods of Weighting (M>T)


Learning Objectives:
Explore shortcomings of OLS weights
Look at other parametric weights.
Consider some non-parametric weights and
techniques

Premise: problems with OLS weights

The problem with OLS weights is that:

If M is large relative to T - h, the estimator loses precision and may not even be feasible (if M > T - h)
Even if M is low relative to T - h, OLS estimation of weights is subject to sampling error.

Other Types of Weights

Relative Performance
Shrinking Relative Performance
Recent Performance
Adaptive Weights
Non-parametric (trimming and indexing)

Relative performance weights

A solution to the problem of OLS weights is to ignore the covariance across forecast errors and compute weights based on their relative performance over the past.
For each forecast compute

MSE_{T,h,i} = \frac{1}{T-h-1} \sum_{t=1}^{T-h} e^2_{t,h,i}

MSE weights (or relative performance weights):

\omega_{T,h,i} = \frac{1/MSE_{T,h,i}}{\sum_{j=1}^{M} 1/MSE_{T,h,j}}
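Inverse-MSE weighting needs only each forecast's own error history. A short sketch with invented errors:

```python
import numpy as np

def inverse_mse_weights(E):
    """Relative-performance weights: ignore error covariances and weight
    each forecast by the inverse of its past mean squared error."""
    mse = np.mean(E**2, axis=0)   # one MSE per forecast (columns of E)
    inv = 1.0 / mse
    return inv / inv.sum()

# Illustrative past errors for M = 3 forecasts (rows are periods)
E = np.array([[ 0.1,  0.5,  1.0],
              [-0.2,  0.4, -1.2],
              [ 0.1, -0.6,  0.9]])
w = inverse_mse_weights(E)
print(np.round(w, 3))   # the low-error first forecast gets the largest weight
```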

Shrinking relative performance

Consider instead

\omega_{T,h,i} = \frac{MSE_{T,h,i}^{-k}}{\sum_{j=1}^{M} MSE_{T,h,j}^{-k}}

If k = 1 we obtain standard MSE weights
If k = 0 we obtain equal weights 1/M
The parameter k allows attenuating the attention we pay to performance

Highlighting recent performance

Consider computing a discounted MSE:

MSE_{T,h,i} = \frac{1}{\#periods} \sum_{t=1}^{T-h} \lambda(t) e^2_{t,h,i},    with 0 < \delta \le 1,

where \lambda(t) can be either one of the following:

\lambda(t) = 1 if t \ge T - h - v, 0 otherwise    (rolling window)
\lambda(t) = \delta^{T-h-t}    (discounted MSE)

Computing MSE weights using either rolling windows or discounting allows paying more attention to recent performance.
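Both schemes can be sketched in a few lines. The error series, delta, and window length v are illustrative, and the normalization by the number of observations follows the slide's 1/#periods factor (other normalizations are possible):

```python
import numpy as np

def discounted_mse(errors, delta=0.95):
    """Discounted MSE: weight the squared error at time t by delta^(T-h-t),
    so recent errors count more. delta = 1 reproduces the plain MSE."""
    e = np.asarray(errors)
    n = len(e)
    lam = delta ** np.arange(n - 1, -1, -1)   # lambda(t) = delta^(T-h-t)
    return (lam * e**2).sum() / n

def rolling_mse(errors, v=20):
    """Rolling-window MSE: lambda(t) = 1 only for the last v periods."""
    e = np.asarray(errors)[-v:]
    return np.mean(e**2)

e = np.array([2.0, 2.0, 2.0, 0.1, 0.1, 0.1])  # errors improved recently
print(round(np.mean(e**2), 3))                 # plain MSE
print(round(discounted_mse(e, delta=0.5), 3))  # emphasizes recent, smaller errors
print(round(rolling_mse(e, v=3), 3))           # only the last 3 errors
```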

Adaptive weights
Relative performance weights could be sensitive to adding new forecast errors. A possibility is to adapt previous weights using the most recently computed weights:

\hat{\omega}_{T,h,i} = MSE weight (with or without covariance)

\omega^*_{T,h,i} = \lambda \omega^*_{T-1,h,i} + (1 - \lambda) \hat{\omega}_{T,h,i}
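The adaptive update is exponential smoothing of the weight vector. A minimal sketch with illustrative weights and smoothing parameter:

```python
import numpy as np

def adapt_weights(w_prev, w_new, lam=0.8):
    """Adaptive weights: smooth the newly computed MSE weights with last
    period's weights, w*_T = lam * w*_{T-1} + (1 - lam) * w_hat_T."""
    return lam * np.asarray(w_prev) + (1.0 - lam) * np.asarray(w_new)

w_prev = np.array([0.5, 0.5])
w_new = np.array([0.9, 0.1])   # the latest MSE weights jumped
w_star = adapt_weights(w_prev, w_new, lam=0.8)
print(np.round(w_star, 2))     # moves only part of the way: [0.58 0.42]
```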

Non parametric weights: Trimming


Often advantageous to discard the models with the worst and
best performance when combining forecasts
Simple averages are easily distorted by extreme
forecasts/forecast errors
Aiolfi and Favero (2003) recommend ranking the individual
models by R^2 and discarding the bottom and top 10 percent
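One simple variant of trimming drops the most extreme forecast values before taking an equal-weight average (note that Aiolfi and Favero rank models by R^2 rather than by the forecast values themselves; the numbers below are invented):

```python
import numpy as np

def trimmed_mean_forecast(forecasts, trim=0.10):
    """Equal-weight average after discarding the top and bottom `trim`
    fraction of the cross-section of forecasts."""
    x = np.sort(np.asarray(forecasts))
    k = int(np.floor(trim * len(x)))
    if k > 0:
        x = x[k:-k]
    return x.mean()

forecasts = [1.9, 2.0, 2.1, 2.0, 1.8, 2.2, 2.0, 1.9, 2.1, 9.0]  # one outlier
print(round(np.mean(forecasts), 2))               # plain mean pulled up to 2.7
print(round(trimmed_mean_forecast(forecasts), 2)) # trimming removes the outlier
```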

Non parametric weights: Indexing

Rank the models by their MSE. Let R_i be the rank of the i-th model; then the index-based weights are:

\omega^{index}_{T,h,i} = \frac{1/R_i}{\sum_{j=1}^{M} 1/R_j}
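Index weights depend only on the models' MSE ranks, not on the MSE magnitudes. A short sketch with invented MSEs:

```python
import numpy as np

def index_weights(mse):
    """Rank-based weights: rank models by MSE (rank 1 = best) and set
    w_i = (1/R_i) / sum_j (1/R_j)."""
    mse = np.asarray(mse)
    ranks = np.empty(len(mse), dtype=int)
    ranks[np.argsort(mse)] = np.arange(1, len(mse) + 1)  # 1 = lowest MSE
    inv = 1.0 / ranks
    return inv / inv.sum()

mse = [0.8, 0.2, 1.5]    # model 2 is best, model 3 worst
w = index_weights(mse)
print(np.round(w, 3))    # weights depend only on the ranks
```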

Module 9
Session 5: Workshop on Combining
Forecasts

Workshop 1: Combining Forecasts

Have open MF_combination_forecasting.wf1; we'll be working on the CF_W1_Combined pagefile.
Let's estimate 4 regression models:

Combining Forecasts
Step One: Estimate each of the models using LS and Forecast
2008-09 using EViews. Note the RMSE of each.
Step Two: Calculate a combined forecast from the misspecified models using:
Equal Weights
Trimmed Weights
Inverse MSE weights

Combining Forecasts
Step Three: Compare the RMSE of the combined forecast
models to that of the individual forecast models.
Which is most accurate?
Do all 3 combined ones outperform the individual ones?

Step Four: Repeat Steps 1-3 for the true DGP:

Then forecast 2008-09; what is the RMSE of this full model?

Module 9
Session 6 : Forecast Combination
Tools in EViews

Workshop 2: Forecast Combination Techniques

Have open MF_M9.wf1; we'll be working on the USA pagefile.
Let's explore different forecast combination techniques!
Note: This workshop will involve the use of multiple program files. Make sure to have the workfile open at all times.

W2: Forecast Combination Techniques

Step One: First, remember to always inspect your data:
The USA pagefile contains quarterly data for 1959q1-2010q4:
rgdp = real GDP index (2005=100); in this case we will use the natural log (lngdp)
growth = growth rate over different time spans (1, 2 and 4 quarters)
fh_i_j = growth forecast indexed by time horizon (i) and model number (j)

Step Two: Produce a combined forecast for GDP growth in 2004q1; we can compare this to actual GDP growth.
Use programs to compute forecasts for the rest of 2004-2007 (doing this by hand is tedious).


Module 9
Session 7: Conclusions

Conclusions
Numerous weighting schemes have been proposed to
formulate combined forecasts.
Simple combination schemes are difficult to beat; why this is
the case is not fully understood.
Simple weights reduce variability with relatively little cost in terms of
overall bias
Also provide diversity if pool of models is indeed diverse.

Conclusions
Results are valid for symmetric loss functions; they may not be valid if the sign of the error matters.
Forecasting based solely on the model with the best in-sample performance often yields poor out-of-sample forecasting performance.
This reflects the reasonable prior that a preferred model is really just an approximation of the true DGP, which can change each period.
Combined forecasts imply diversification of risk (provided not all the models suffer from the same misspecification problem).

Appendix

JVI14.09


Appendix 1
Let e be the (M x 1) vector of the forecast errors. Problem 1: choose the vector w to minimize E[w'ee'w] subject to u'w = 1.
Notice that E[w'ee'w] = w'E[ee']w = w'\Sigma w. The Lagrangean is

L = w'\Sigma w - \lambda [u'w - 1]

and the FOC is

2\Sigma w - \lambda u = 0  =>  w^* = (\lambda/2) \Sigma^{-1} u

Using u'w^* = 1 one can obtain

u'w^* = (\lambda/2) u'\Sigma^{-1} u = 1  =>  \lambda/2 = \frac{1}{u'\Sigma^{-1} u}

Substituting back one obtains

w^* = \frac{\Sigma^{-1} u}{u'\Sigma^{-1} u}

Appendix 1
Let \Sigma_{T,h} be the variance-covariance matrix of the forecasting errors:

\Sigma_{T,h} = \begin{pmatrix} \sigma^2_{T+h,1} & \sigma_{T+h,1,2} \\ \sigma_{T+h,1,2} & \sigma^2_{T+h,2} \end{pmatrix}

Consider the inverse of this matrix:

\Sigma^{-1}_{T,h} = \frac{1}{\det \Sigma_{T,h}} \begin{pmatrix} \sigma^2_{T+h,2} & -\sigma_{T+h,1,2} \\ -\sigma_{T+h,1,2} & \sigma^2_{T+h,1} \end{pmatrix}

Let u = [1, 1]'. The two weights w^* and (1 - w^*) can be written as

\begin{pmatrix} w^* \\ 1 - w^* \end{pmatrix} = \frac{\Sigma^{-1}_{T,h} u}{u'\Sigma^{-1}_{T,h} u}

Appendix 2
Notice that

\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2} = E[(e_{T+h,1} - e_{T+h,2})^2] \ge 0

and that

\sigma^2_{T+h,1} (1 - \rho^2_{T+h,1,2}) \ge 0

So, the following inequality holds:

\frac{\sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2}\sigma_{T+h,1}\sigma_{T+h,2}} \le 1

since

\sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2}) \le \sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2}\sigma_{T+h,1}\sigma_{T+h,2}
<=> 0 \le \sigma^2_{T+h,1} - 2\rho_{T+h,1,2}\sigma_{T+h,1}\sigma_{T+h,2} + \rho^2_{T+h,1,2}\sigma^2_{T+h,2}
<=> 0 \le (\sigma_{T+h,1} - \rho_{T+h,1,2}\sigma_{T+h,2})^2

Appendix 3
The MSE loss function of a forecast has two components:
the squared bias of the forecast
the (ex-ante) forecast variance

Write y_{T+h} = E(y_{T+h}) + \epsilon_y. Then:

E[(y_{T+h} - x_{T,h,i})^2] = E[(E(y_{T+h}) + \epsilon_y - x_{T,h,i})^2]
= E[(E(y_{T+h}) + \epsilon_y - E(x_{T,h,i}) + E(x_{T,h,i}) - x_{T,h,i})^2]
= E[(Bias_i + \epsilon_y + E(x_{T,h,i}) - x_{T,h,i})^2]
= Bias_i^2 + \sigma^2_y + Var(x_{T,h,i})

where Bias_i = E(y_{T+h}) - E(x_{T,h,i}) and the cross terms vanish in expectation.

Appendix 4
Suppose that x = Py, where P is an (m x T) matrix and y is a (T x 1) vector collecting all y_t, t = 1, ..., T. Consider:

MSE_{T,h,i} = \frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - y_{t+h})^2
= \frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - E[y_{t+h}] - \epsilon_{y,t+h})^2
= \sigma^2_y + \frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - E[y_{t+h}])^2 - \frac{2}{T-h} \sum_{t=1}^{T-h} \epsilon_{y,t+h} (x_{t,h,i} - E[y_{t+h}])

Appendix 4
Consider:

\frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - E[y_{t+h}])^2 - \frac{2}{T-h} \sum_{t=1}^{T-h} \epsilon_{y,t+h} (x_{t,h,i} - E[y_{t+h}])
= ... - \frac{2}{T-h} \epsilon'(Py - E[y])
= ... - \frac{2}{T-h} \epsilon'(P E[y] + P\epsilon - E[y])
= ... - \frac{2}{T-h} \epsilon'P\epsilon + \frac{2}{T-h} \epsilon'(I - P)E[y]

Taking expectations (E[\epsilon] = 0, so the last term vanishes):

E[MSE_{T,h,i}] = \sigma^2_y + MSPE_T - \frac{2}{T-h} E[\epsilon'P\epsilon]
