
MFx Macroeconomic Forecasting

IMFx
This training material is the property of the International Monetary Fund (IMF) and is intended for use in IMF
Institute for Capacity Development (ICD) courses. Any reuse requires the permission of the ICD.
EViews is a trademark of IHS Global Inc.

Module 9 (Optional): Combination Forecasts


Pat Healy & Carl Sandberg

Roadmap
M9_Intro. Introduction, Motivations & Outline
M9_S1. Forecast Combination Basics
M9_S2. Solving the Theoretical Combination Problem &
Implementation Issues
M9_S3. Methods to Estimate Weights (M<T)
M9_S4. Methods to Estimate Weights (M>T)
M9_S5. EViews 1: EViews Workshop 1
M9_S6. EViews 2: Forecast Combination Tools in EViews
M9_Conclusion. Conclusions

Module 9
M9_Intro: Introduction, Motivations
& Outline

Introduction
Generally speaking, multiple forecasts are
available to decision makers before they make a
policy decision
Key Question: Given the uncertainty associated
with identifying the true DGP, should a single
(best) forecast be used? Or should we (somehow)
average over all the available forecasts?

Motivations: One vs. Many


There are several disadvantages of using only one
forecasting model:
Model misspecifications of an unknown form
Implausible that one statistical model would be preferable to
others at all points of the forecast horizon

Combining separate forecasts offers:


A simple way of building a complex, more flexible forecasting
model to explain the data
Some insurance against breaks or other non-stationarities that
may occur in the future

Outline
1. Forecast Combination Basics
2. Solving the Theoretical Combination Problem &
Implementation Issues
3. Methods to Estimate & Assign Weights
4. Workshop 1: Combining Forecasts in EViews
5. Workshop 2: EViews Combination Tool
6. Wrap-Up

Module 9
Session 1, Part 1: Forecast
Combination Basics

1. Forecast Combination Basics


Learning Objectives:
Look at a general framework and notation for
combining forecasts
Introduce the theoretical forecast combination
problem

General Framework
Today (say, at time T) we want to forecast the value
that a variable of interest (Y) will take at time T+h
We have a certain number (M) of forecasts available
How can we pool, or combine these M forecasts into
an optimal forecast?
Is there any advantage of pooling instead of just finding
and using the best one among the M available?

Some Notation
y_t is the value of Y at time t (today is T)
x_{t,h,i} is an unbiased (point) forecast of y_{t+h} made at time t
h is the forecasting horizon
i = 1, ..., M is the identifier of the available forecast
M is the total number of forecasts

Some (more) Notation

e_{t+h,i} = y_{t+h} - x_{t,h,i} is the forecast (prediction) error
\sigma^2_{t+h,i} = Var(e_{t+h,i}) is the forecast error variance (the inverse of precision)
\sigma_{t+h,i,j} = Cov(e_{t+h,i}, e_{t+h,j})
w_{t,h} = [w_{t,h,1}, ..., w_{t,h,M}] is a vector of weights
L(e_{t,h}) is the loss from making a forecast error
E[L(e_{t,h})] is the risk associated with a forecast

The Forecast Combination Problem

A combined forecast is a weighted average of the M forecasts:

y^c_{T+h} = \sum_{i=1}^{M} w_{T,h,i} x_{T,h,i}

The forecast combination problem can be formally stated as:

Problem: Choose weights w_{T,h,i} to minimize E[L(e^c_{T+h})]
subject to \sum_{i=1}^{M} w_{T,h,i} = 1
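The course's workshops use EViews, but the combination itself is just a dot product of weights and forecasts with the weights summing to one. A minimal Python sketch (all numbers are made up for illustration):

```python
import numpy as np

# M = 3 available forecasts of Y at T+h (illustrative values)
x = np.array([2.1, 1.8, 2.4])   # x_{T,h,i}, i = 1..M
w = np.array([0.5, 0.3, 0.2])   # weights w_{T,h,i}

assert np.isclose(w.sum(), 1.0)  # the constraint in the combination problem
y_combined = w @ x               # y^c_{T+h} = sum_i w_i * x_i
print(round(y_combined, 3))      # -> 2.07
```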

What is the Problem Really About?


Observations on a variable Y
Observations of forecasts of Y with forecasting horizon 1
Forecasting errors
Question: How much weight shall we give to the current forecast given past
performance and knowing that there will be a forecasting error?

What is the Problem Really About?

[Figure: realized GDP up to time T, with several forecasts fanning out at T+1]

Module 9
Session 1, Part 2: Combining
Prediction Errors

Examples of Loss Functions

Squared error loss: E[(e^c_{T+h})^2]
Absolute error loss: E[|e^c_{T+h}|]
Linex loss: E[\exp(\alpha e^c_{T+h}) - \alpha e^c_{T+h} - 1]

We will focus on mean squared error loss:

E[L(e^c_{T+h})] = E[(e^c_{T+h})^2]

Combining Prediction Errors

Notice that because \sum_{i=1}^{M} w_{T,h,i} = 1, then:

e^c_{T+h} = y_{T+h} - y^c_{T+h} = \sum_{i=1}^{M} w_{T,h,i} y_{T+h} - \sum_{i=1}^{M} w_{T,h,i} x_{T,h,i}
          = \sum_{i=1}^{M} w_{T,h,i} (y_{T+h} - x_{T,h,i})
          = \sum_{i=1}^{M} w_{T,h,i} e_{T+h,i}

Hence, if weights sum to one:

E[L(e^c_{T+h})] = E[L(\sum_{i=1}^{M} w_{T,h,i} e_{T+h,i})]

Combination Problem with MSE

Let w be the (M x 1) vector of weights, e the (M x 1) vector of forecast errors, u an (M x 1) vector of 1s, and \Sigma the (M x M) variance-covariance matrix of the errors:

\Sigma = E[ee']

It follows that

\sum_{i=1}^{M} w_{T,h,i} = u'w
e^c_{T+h} = \sum_{i=1}^{M} w_{T,h,i} e_{T+h,i} = w'e
(e^c_{T+h})^2 = w'ee'w
E[(e^c_{T+h})^2] = E[w'ee'w] = w'E[ee']w = w'\Sigma w

Problem 1: Choose w to minimize w'\Sigma w subject to u'w = 1.

Issues and Clarification (I)


Must the weights sum to one?
If forecasts are unbiased, this guarantees an
unbiased combination forecast

How restrictive is pooling the forecasts rather


than information sets?
Pooling information sets is theoretically better but
practically difficult or impossible

Issues and Clarification (II)


Are point forecasts the only forecasts that we can
combine?
We can also combine forecasts of distributions.

Are Bayesian methods useful?


Not entirely.

Is there a difference between averaging across


forecasts and across forecasting models?
If you know the models and the models are linear in the
parameters, there is no difference.

Broad Summary of Questions


1. What are the optimal weights in the population?
2. How can we estimate the optimal weights?
3. Are these estimates good?

Module 9
Session 2, Part 1: Solving the
Theoretical Combination Problem

2. Solving the Theoretical Combination Problem

Learning Objectives:
Create a simple combination of forecasts using 2 forecasts
Generalize to M forecasts
Explore key takeaways from combining

Simple Combination
For now, let's assume that we know the distribution of the forecasting errors associated with each forecast.

What are the optimal (loss-minimizing) weights, in population?

[Figure: forecasts of the variable at T+1]

Simple Combination with M=2

Consider two point forecasts:

y^c_{T+h} = w x_{T,h,1} + (1-w) x_{T,h,2}

E[(e^c_{T+h})^2] = E[(w e_{T+h,1} + (1-w) e_{T+h,2})^2]
                 = w^2 \sigma^2_{T+h,1} + (1-w)^2 \sigma^2_{T+h,2} + 2w(1-w) \sigma_{T+h,1,2}

The solution to the Problem (M=2) is:

w^* = \frac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}}    (weight of x_{T,h,1})

1 - w^* = \frac{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}}    (weight of x_{T,h,2})
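The two-forecast optimal weight is a one-line formula. A small Python check with illustrative (made-up) error variances and covariance:

```python
import numpy as np

def optimal_weight_2(var1, var2, cov12):
    """Optimal weight w* on forecast 1 when combining two unbiased forecasts
    under MSE loss: w* = (var2 - cov12) / (var1 + var2 - 2*cov12)."""
    return (var2 - cov12) / (var1 + var2 - 2.0 * cov12)

# Illustrative error variances/covariance (not from the course data)
var1, var2, cov12 = 1.0, 2.0, 0.5
w_star = optimal_weight_2(var1, var2, cov12)
print(round(w_star, 3))       # weight on the more precise forecast 1 -> 0.75
print(round(1 - w_star, 3))   # weight on forecast 2 -> 0.25
```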

Interpreting the Optimal Weights

Consider:

\frac{w^*}{1 - w^*} = \frac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}

a larger weight is assigned to the more precise model
the weights are the same (w^* = 0.5) if and only if \sigma^2_{T+h,1} = \sigma^2_{T+h,2}
if the covariance between the two forecasts increases, greater weight goes to the more precise forecast

General Result with M Forecasts:

The choice of weights will reflect the variance-covariance matrix of the forecast errors.
Result: The vector of optimal weights w^* with M forecasts is

w^{*'} = \frac{u' \Sigma^{-1}_{T,h}}{u' \Sigma^{-1}_{T,h} u}
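As a quick numerical check of this closed form, a numpy sketch (the covariance matrix below is invented for illustration; the course's own computations are done in EViews):

```python
import numpy as np

def optimal_weights(sigma):
    """w* = Sigma^{-1} u / (u' Sigma^{-1} u) for a known (M x M)
    variance-covariance matrix of forecast errors."""
    M = sigma.shape[0]
    u = np.ones(M)
    sinv_u = np.linalg.solve(sigma, u)   # Sigma^{-1} u without explicit inverse
    return sinv_u / (u @ sinv_u)

# Illustrative covariance matrix of forecast errors (M = 3)
sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 1.5]])
w = optimal_weights(sigma)
assert np.isclose(w.sum(), 1.0)   # constraint u'w = 1 holds by construction
print(np.round(w, 3))             # most weight goes to the low-variance forecast
```

For M = 2 this reproduces the formula from the previous slide, which is a handy sanity check.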

Takeaway 1: Combining Forecasts Decreases Risk

Compute the expected loss (MSE) under the optimal weights, where \rho_{T+h,1,2} is the correlation coefficient of the two forecast errors:

E[(e^c_{T+h}(w^*))^2] = \frac{\sigma^2_{T+h,1} \sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2} \sigma_{T+h,1} \sigma_{T+h,2}}

Suppose that \sigma^2_{T+h,1} \le \sigma^2_{T+h,2}. Then

E[(e^c_{T+h}(w^*))^2] = \sigma^2_{T+h,1} \cdot \frac{\sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2} \sigma_{T+h,1} \sigma_{T+h,2}} \le \sigma^2_{T+h,1}

Result:

E[(e^c_{T+h}(w^*))^2] \le \min\{\sigma^2_{T+h,1}, \sigma^2_{T+h,2}\}

That is, the forecast risk from combining forecasts is no higher than the lowest of the forecasting risks from the single forecasts.

Takeaway 2: Bias vs. Variance Tradeoff

The MSE loss function of a forecast has two components:
the squared bias of the forecast
the (ex-ante) forecast variance

E[(y_{T+h} - x_{T,h,i})^2] = Bias^2_{T,h,i} + \sigma^2_y + Var(x_{T,h,i})

For the combined forecast:

E[(y^c_{T+h} - y_{T+h})^2] = \left(\sum_i w_{T,h,i} Bias_{T,h,i}\right)^2 + \sigma^2_y + Var\left(\sum_i w_{T,h,i} x_{T,h,i}\right)

Result: Combining forecasts offers a tradeoff between increased overall bias vs. lower (ex-ante) forecast variance.

What is the Problem Really About?

[Figure: realized GDP up to time T and the combined forecast at T+1]

Module 9
Session 2, Part 2: Implementation
Issues

Issue: What if it is optimal for one weight to be < 0?

Consider again the case M = 2. The optimal weights are such that:

\frac{w^*}{1 - w^*} = \frac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}

If \sigma_{T+h,1,2} > 0 and \sigma^2_{T+h,2} < \sigma_{T+h,1,2} < \sigma^2_{T+h,1}, then w^* < 0

Shall we impose the constraints that w_{T,h,i} >= 0?

Another Issue: Estimating \Sigma

In reality, we do not know \Sigma: we can only estimate the theoretical weights using the observed past forecast errors.

[Figure: time series of past forecast errors e_{t,h,1} and e_{t,h,2} up to T]

Questions when estimating \Sigma

1) Are the estimates of \Sigma based on past errors unbiased?
2) Does the population \Sigma depend on t?
If not, estimates become better as T increases
If it does, different issues arise: heteroskedasticity of any sort, serial correlation, etc.
Do our estimates capture this dependence?
3) Does \Sigma depend on past realizations of y?

Questions when estimating \Sigma

4) How good are our estimates of \Sigma? If M is large relative to T, our estimates are poor!
5) Shall we just focus on weighted averages? Why not consider the median forecast, or trim extreme forecasts?

One More Issue: Optimality of Equal Weights?

When does it make sense (in terms of minimum squared error) to use equal weights?
when the variances of the forecast errors are the same
and all the pairwise covariances across forecast errors are the same
and the loss function is symmetric

Result: Equal weights tend to perform better than many estimates of the optimal weights (Stock and Watson 2004, Smith and Wallis 2009)

Module 9
Session 3: Methods to estimate the
weights when M is low relative to T

Methods of Weighting (M<T)


Learning Objectives:
Decide when to combine and when not to
combine
Estimate weights using OLS
Address Sampling Error

To combine or not to combine?

We need to assess whether one set of forecasts encompasses all information contained in another set of forecasts.
Example: for 2 forecasting models, run the regression:

y_{t+h} = \beta_{T,h,0} + \beta_{T,h,1} x_{t,h,1} + \beta_{T,h,2} x_{t,h,2} + \epsilon_{t+h},    t = 1, 2, ..., T - h

One forecast encompasses the other if you cannot reject:

H_0: (\beta_{T,h,0}, \beta_{T,h,1}, \beta_{T,h,2}) = (0, 1, 0)
H_0: (\beta_{T,h,0}, \beta_{T,h,1}, \beta_{T,h,2}) = (0, 0, 1)

All other outcomes imply that there is some information in both forecasts that can be used to obtain a lower mean squared error.
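The encompassing regression is ordinary least squares of the realized values on the competing forecasts. A Python sketch on simulated data (the data-generating process and seed are invented for illustration, not the course workfile):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated example: a target series and two noisy forecasts of it
T = 200
y = rng.normal(size=T)
x1 = y + rng.normal(scale=0.5, size=T)   # more precise forecast
x2 = y + rng.normal(scale=0.9, size=T)   # noisier forecast

# Encompassing regression: y_{t+h} = b0 + b1*x1 + b2*x2 + e
X = np.column_stack([np.ones(T), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))
# If (b0, b1, b2) were statistically indistinguishable from (0, 1, 0),
# forecast 1 would encompass forecast 2. A formal test needs standard
# errors (e.g. a Wald test); this sketch only shows point estimates.
```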

OLS estimates of the weights

If we assume a linear-in-weights model, OLS can be used to estimate the weights that minimize the MSE using data for t = 1, ..., T - h:

y_{t+h} = \beta_{T,h,0} + \beta_{T,h,1} x_{t,h,1} + \beta_{T,h,2} x_{t,h,2} + ... + \beta_{T,h,M} x_{t,h,M} + \epsilon_{t+h}

(including a constant allows correcting for the bias in any one of the forecasts), or

y_{t+h} = \beta_{T,h,1} x_{t,h,1} + \beta_{T,h,2} x_{t,h,2} + ... + \beta_{T,h,M} x_{t,h,M} + \epsilon_{t+h}    s.t. \sum_{i=1}^{M} \beta_{T,h,i} = 1

OLS estimates of the weights

If weights sum to one, then the previous equation becomes a regression of a vector of 0s on the past forecasting errors:

0 = \beta_{T,h,1} (x_{t,h,1} - y_{t+h}) + \beta_{T,h,2} (x_{t,h,2} - y_{t+h}) + ... + \epsilon_{t+h}
0 = \beta_{T,h,1} e_{t+h,1} + \beta_{T,h,2} e_{t+h,2} + ... + \epsilon_{t+h}

Problem 2: Choose w to minimize w'\hat{\Sigma}w subject to u'w = 1 and w_i >= 0, where

\hat{\Sigma} = \sum_{t=1}^{T-h} e_{t,h} e'_{t,h}

and e_{t,h} is a vector that collects the forecast errors of the M forecasts made in t.
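A sample analogue of Problem 2 can be sketched in a few lines of numpy. The simulated errors are invented for illustration, and clipping negative weights to zero with renormalization is a simple heuristic, not the exact inequality-constrained optimum:

```python
import numpy as np

def estimated_weights(E):
    """E is a (T-h) x M matrix of past forecast errors e_{t,h,i}.
    Builds Sigma_hat = sum_t e_t e_t', applies the equality-constrained
    solution, then clips negative weights and renormalizes (heuristic)."""
    sigma_hat = E.T @ E
    u = np.ones(E.shape[1])
    w = np.linalg.solve(sigma_hat, u)
    w = w / w.sum()              # impose u'w = 1
    w = np.clip(w, 0.0, None)    # impose w_i >= 0 (heuristically)
    return w / w.sum()

rng = np.random.default_rng(1)
E = rng.normal(scale=[0.5, 1.0, 2.0], size=(100, 3))  # forecast 1 most precise
w = estimated_weights(E)
print(np.round(w, 3))  # most weight should go to the most precise forecast
```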

Reducing the dependency on sampling errors

Estimates of \Sigma reflect (in part) sampling error. Although the optimal weights depend on \Sigma, it makes sense to reduce the dependence of the weights on such an estimate.
One way to achieve this is to shrink the estimated optimal weights towards equal weights (Stock and Watson 2004):

w^s_{T,h,i} = \psi \hat{w}_{T,h,i} + (1 - \psi)(1/M),    \psi = \max(0, 1 - M/(T - h - M - 1))
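The shrinkage formula above is a convex combination of the estimated and equal weights. A minimal sketch (the weights, T, and h below are illustrative):

```python
import numpy as np

def shrink_weights(w_hat, M, T, h):
    """Shrinkage toward equal weights in the spirit of Stock-Watson (2004):
    w_s = psi * w_hat + (1 - psi)/M, psi = max(0, 1 - M/(T - h - M - 1))."""
    psi = max(0.0, 1.0 - M / (T - h - M - 1))
    return psi * np.asarray(w_hat) + (1.0 - psi) / M

# Illustrative: 3 estimated weights from a short sample (T = 40, h = 1)
w_hat = np.array([0.7, 0.2, 0.1])
w_s = shrink_weights(w_hat, M=3, T=40, h=1)
print(np.round(w_s, 3))
# With a short sample the weights move toward 1/3 each;
# as T grows, psi -> 1 and the estimated weights are kept.
```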

Module 9
Session 4: Methods to estimate the
weights when M is high relative to T

Methods of Weighting (M>T)


Learning Objectives:
Explore shortcomings of OLS weights
Look at other parametric weights.
Consider some non-parametric weights and
techniques

Premise: problems with OLS weights

The problem with OLS weights is that:

If M is large relative to T - h, the estimator loses precision and may not even be feasible (if M > T - h)
Even if M is low relative to T - h, OLS estimation of weights is subject to sampling error.

Other Types of Weights

Relative Performance
Shrinking Relative Performance
Recent Performance
Adaptive Weights
Non-parametric (trimming and indexing)

Relative performance weights

A solution to the problem of OLS weights is to ignore the covariance across forecast errors and compute weights based on their relative performance over the past.
For each forecast compute

MSE_{T,h,i} = \frac{1}{T-h-1} \sum_{t=1}^{T-h} e^2_{t,h,i}

MSE weights (or relative performance weights):

\omega_{T,h,i} = \frac{1/MSE_{T,h,i}}{\sum_{j=1}^{M} 1/MSE_{T,h,j}}
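Inverse-MSE weighting needs only each forecast's own error history. A short sketch with invented errors:

```python
import numpy as np

def inverse_mse_weights(E):
    """Relative-performance weights: ignore error covariances and weight
    each forecast by the inverse of its past mean squared error."""
    mse = np.mean(E**2, axis=0)   # one MSE per forecast (columns of E)
    inv = 1.0 / mse
    return inv / inv.sum()

# Illustrative past errors for M = 3 forecasts (rows are periods)
E = np.array([[ 0.1,  0.5,  1.0],
              [-0.2,  0.4, -1.2],
              [ 0.1, -0.6,  0.9]])
w = inverse_mse_weights(E)
print(np.round(w, 3))   # the low-error first forecast gets the largest weight
```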

Shrinking relative performance

Consider instead

\omega_{T,h,i} = \frac{MSE_{T,h,i}^{-k}}{\sum_{j=1}^{M} MSE_{T,h,j}^{-k}}

If k = 1 we obtain standard MSE weights
If k = 0 we obtain equal weights 1/M
The parameter k allows attenuating the attention we pay to performance

Highlighting recent performance

Consider computing a discounted MSE:

MSE_{T,h,i} = \frac{1}{\#periods} \sum_{t=1}^{T-h} \lambda(t) e^2_{t,h,i},    with 0 < \delta \le 1,

where \lambda(t) can be either one of the following:

\lambda(t) = 1 if t \ge T - h - v, 0 otherwise    (rolling window)
\lambda(t) = \delta^{T-h-t}    (discounted MSE)

Computing MSE weights using either rolling windows or discounting allows paying more attention to recent performance.
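Both schemes can be sketched in a few lines. The error series, delta, and window length v are illustrative, and the normalization by the number of observations follows the slide's 1/#periods factor (other normalizations are possible):

```python
import numpy as np

def discounted_mse(errors, delta=0.95):
    """Discounted MSE: weight the squared error at time t by delta^(T-h-t),
    so recent errors count more. delta = 1 reproduces the plain MSE."""
    e = np.asarray(errors)
    n = len(e)
    lam = delta ** np.arange(n - 1, -1, -1)   # lambda(t) = delta^(T-h-t)
    return (lam * e**2).sum() / n

def rolling_mse(errors, v=20):
    """Rolling-window MSE: lambda(t) = 1 only for the last v periods."""
    e = np.asarray(errors)[-v:]
    return np.mean(e**2)

e = np.array([2.0, 2.0, 2.0, 0.1, 0.1, 0.1])  # errors improved recently
print(round(np.mean(e**2), 3))                 # plain MSE
print(round(discounted_mse(e, delta=0.5), 3))  # emphasizes recent, smaller errors
print(round(rolling_mse(e, v=3), 3))           # only the last 3 errors
```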

Adaptive weights
Relative performance weights could be sensitive to adding new forecast errors. A possibility is to adapt previous weights using the most recently computed weights:

\hat{\omega}_{T,h,i} = MSE weight (with or without covariance)

\omega^*_{T,h,i} = \lambda \omega^*_{T-1,h,i} + (1 - \lambda) \hat{\omega}_{T,h,i}
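The adaptive update is exponential smoothing of the weight vector. A minimal sketch with illustrative weights and smoothing parameter:

```python
import numpy as np

def adapt_weights(w_prev, w_new, lam=0.8):
    """Adaptive weights: smooth the newly computed MSE weights with last
    period's weights, w*_T = lam * w*_{T-1} + (1 - lam) * w_hat_T."""
    return lam * np.asarray(w_prev) + (1.0 - lam) * np.asarray(w_new)

w_prev = np.array([0.5, 0.5])
w_new = np.array([0.9, 0.1])   # the latest MSE weights jumped
w_star = adapt_weights(w_prev, w_new, lam=0.8)
print(np.round(w_star, 2))     # moves only part of the way: [0.58 0.42]
```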

Non parametric weights: Trimming


Often advantageous to discard the models with the worst and
best performance when combining forecasts
Simple averages are easily distorted by extreme
forecasts/forecast errors
Aiolfi and Favero (2003) recommend ranking the individual
models by R^2 and discarding the bottom and top 10 percent
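One simple variant of trimming drops the most extreme forecast values before taking an equal-weight average (note that Aiolfi and Favero rank models by R^2 rather than by the forecast values themselves; the numbers below are invented):

```python
import numpy as np

def trimmed_mean_forecast(forecasts, trim=0.10):
    """Equal-weight average after discarding the top and bottom `trim`
    fraction of the cross-section of forecasts."""
    x = np.sort(np.asarray(forecasts))
    k = int(np.floor(trim * len(x)))
    if k > 0:
        x = x[k:-k]
    return x.mean()

forecasts = [1.9, 2.0, 2.1, 2.0, 1.8, 2.2, 2.0, 1.9, 2.1, 9.0]  # one outlier
print(round(np.mean(forecasts), 2))               # plain mean pulled up to 2.7
print(round(trimmed_mean_forecast(forecasts), 2)) # trimming removes the outlier
```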

Non parametric weights: Indexing

Rank the models by their MSE. Let R_i be the rank of the i-th model; then the index-based weights are:

\omega^{index}_{T,h,i} = \frac{1/R_i}{\sum_{j=1}^{M} 1/R_j}
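Index weights depend only on the models' MSE ranks, not on the MSE magnitudes. A short sketch with invented MSEs:

```python
import numpy as np

def index_weights(mse):
    """Rank-based weights: rank models by MSE (rank 1 = best) and set
    w_i = (1/R_i) / sum_j (1/R_j)."""
    mse = np.asarray(mse)
    ranks = np.empty(len(mse), dtype=int)
    ranks[np.argsort(mse)] = np.arange(1, len(mse) + 1)  # 1 = lowest MSE
    inv = 1.0 / ranks
    return inv / inv.sum()

mse = [0.8, 0.2, 1.5]    # model 2 is best, model 3 worst
w = index_weights(mse)
print(np.round(w, 3))    # weights depend only on the ranks
```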

Module 9
Session 5: Workshop on Combining
Forecasts

Workshop 1: Combining Forecasts

Have open MF_combination_forecasting.wf1; we'll be working on the CF_W1_Combined pagefile.
Let's estimate 4 regression models:

Combining Forecasts
Step One: Estimate each of the models using LS and Forecast
2008-09 using EViews. Note the RMSE of each.
Step Two: Calculate a combined forecast from the misspecified models using:
Equal Weights
Trimmed Weights
Inverse MSE weights

Combining Forecasts
Step Three: Compare the RMSE of the combined forecast
models to that of the individual forecast models.
Which is most accurate?
Do all 3 combined ones outperform the individual ones?

Step Four: Repeat Steps 1-3 for the true DGP:

Then forecast 2008-09; what is the RMSE of this full model?

Module 9
Session 6 : Forecast Combination
Tools in EViews

Workshop 2: Forecast Combination Techniques

Have open MF_M9.wf1; we'll be working on the USA pagefile.
Let's explore different forecast combination techniques!
Note: This workshop will involve the use of multiple program files. Make sure to have the workfile open at all times.

W2: Forecast Combination Techniques

Step One: First, remember to always inspect your data:
The USA pagefile contains quarterly data for 1959q1-2010q4:
rgdp = real GDP index (2005=100); in this case we will use the natural log (lngdp)
growth = growth rate over different time spans (1, 2 and 4 quarters)
fh_i_j = growth forecast indexed by time horizon (i) and model number (j)

Step Two: Produce a combined forecast for GDP growth in 2004q1; we can compare this to actual GDP growth.
Use programs to compute forecasts for the rest of 2004-2007 (doing this by hand is tedious).


Module 9
Session 7: Conclusions

Conclusions
Numerous weighting schemes have been proposed to
formulate combined forecasts.
Simple combination schemes are difficult to beat; why this is
the case is not fully understood.
Simple weights reduce variability with relatively little cost in terms of
overall bias
Also provide diversity if pool of models is indeed diverse.

Conclusions
Results are valid for symmetric loss functions; they may not be valid if the sign of the error matters.
Forecasting based solely on the model with the best in-sample performance often yields poor out-of-sample forecasting performance.
This reflects the reasonable prior that a preferred model is really just an approximation of the true DGP, which can change each period.
Combined forecasts imply diversification of risk (provided not all the models suffer from the same misspecification problem).

Appendix

JVI14.09


Appendix 1
Let e be the (M x 1) vector of the forecast errors. Problem 1: choose the vector w to minimize E[w'ee'w] subject to u'w = 1.
Notice that E[w'ee'w] = w'E[ee']w = w'\Sigma w. The Lagrangean is

L = w'\Sigma w - \lambda [u'w - 1]

and the FOC is

2\Sigma w - \lambda u = 0  =>  w^* = (\lambda/2) \Sigma^{-1} u

Using u'w^* = 1 one can obtain

u'w^* = (\lambda/2) u'\Sigma^{-1} u = 1  =>  \lambda/2 = \frac{1}{u'\Sigma^{-1} u}

Substituting back one obtains

w^* = \frac{\Sigma^{-1} u}{u'\Sigma^{-1} u}

Appendix 1
Let \Sigma_{T,h} be the variance-covariance matrix of the forecasting errors:

\Sigma_{T,h} = \begin{pmatrix} \sigma^2_{T+h,1} & \sigma_{T+h,1,2} \\ \sigma_{T+h,1,2} & \sigma^2_{T+h,2} \end{pmatrix}

Consider the inverse of this matrix:

\Sigma^{-1}_{T,h} = \frac{1}{\det \Sigma_{T,h}} \begin{pmatrix} \sigma^2_{T+h,2} & -\sigma_{T+h,1,2} \\ -\sigma_{T+h,1,2} & \sigma^2_{T+h,1} \end{pmatrix}

Let u = [1, 1]'. The two weights w^* and (1 - w^*) can be written as

\begin{pmatrix} w^* \\ 1 - w^* \end{pmatrix} = \frac{\Sigma^{-1}_{T,h} u}{u'\Sigma^{-1}_{T,h} u}

Appendix 2
Notice that

\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2} = E[(e_{T+h,1} - e_{T+h,2})^2] \ge 0

and that

\sigma^2_{T+h,1} (1 - \rho^2_{T+h,1,2}) \ge 0

So, the following inequality holds:

\frac{\sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2}\sigma_{T+h,1}\sigma_{T+h,2}} \le 1

since

\sigma^2_{T+h,2} (1 - \rho^2_{T+h,1,2}) \le \sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\rho_{T+h,1,2}\sigma_{T+h,1}\sigma_{T+h,2}
<=> 0 \le \sigma^2_{T+h,1} - 2\rho_{T+h,1,2}\sigma_{T+h,1}\sigma_{T+h,2} + \rho^2_{T+h,1,2}\sigma^2_{T+h,2}
<=> 0 \le (\sigma_{T+h,1} - \rho_{T+h,1,2}\sigma_{T+h,2})^2

Appendix 3
The MSE loss function of a forecast has two components:
the squared bias of the forecast
the (ex-ante) forecast variance

Write y_{T+h} = E(y_{T+h}) + \epsilon_y. Then:

E[(y_{T+h} - x_{T,h,i})^2] = E[(E(y_{T+h}) + \epsilon_y - x_{T,h,i})^2]
= E[(E(y_{T+h}) + \epsilon_y - E(x_{T,h,i}) + E(x_{T,h,i}) - x_{T,h,i})^2]
= E[(Bias_i + \epsilon_y + E(x_{T,h,i}) - x_{T,h,i})^2]
= Bias_i^2 + \sigma^2_y + Var(x_{T,h,i})

where Bias_i = E(y_{T+h}) - E(x_{T,h,i}) and the cross terms vanish in expectation.

Appendix 4
Suppose that x = Py, where P is an (m x T) matrix and y is a (T x 1) vector collecting all y_t, t = 1, ..., T. Consider:

MSE_{T,h,i} = \frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - y_{t+h})^2
= \frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - E[y_{t+h}] - \epsilon_{y,t+h})^2
= \sigma^2_y + \frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - E[y_{t+h}])^2 - \frac{2}{T-h} \sum_{t=1}^{T-h} \epsilon_{y,t+h} (x_{t,h,i} - E[y_{t+h}])

Appendix 4
Consider:

\frac{1}{T-h} \sum_{t=1}^{T-h} (x_{t,h,i} - E[y_{t+h}])^2 - \frac{2}{T-h} \sum_{t=1}^{T-h} \epsilon_{y,t+h} (x_{t,h,i} - E[y_{t+h}])
= ... - \frac{2}{T-h} \epsilon'(Py - E[y])
= ... - \frac{2}{T-h} \epsilon'(P E[y] + P\epsilon - E[y])
= ... - \frac{2}{T-h} \epsilon'P\epsilon + \frac{2}{T-h} \epsilon'(I - P)E[y]

Taking expectations (E[\epsilon] = 0, so the last term vanishes):

E[MSE_{T,h,i}] = \sigma^2_y + MSPE_T - \frac{2}{T-h} E[\epsilon'P\epsilon]
