
Bayesian Estimation of Linearized DSGE Models

Dr. Tai-kuang Ho
Associate Professor, Department of Quantitative Finance, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan 30013. Tel: +886-3-571-5131, ext. 62136. Fax: +886-3-562-1823. E-mail: tkho@mx.nthu.edu.tw.
1 Posterior distribution
Chapter 12 of Tsay (2005) provides an elegant introduction to Markov Chain
Monte Carlo Methods with applications.
Greenberg (2008) provides a very good introduction to fundamentals of Bayesian
inference and simulation.
Geweke (2005) provides a more advanced treatment of Bayesian econometrics.
Bayesian inference combines prior belief (knowledge) with empirical data to form the posterior distribution, which is the basis for statistical inference.
θ: the parameters of a DSGE model
Y: the empirical data
π(θ): the prior distribution of the parameters
The prior distribution π(θ) incorporates the prior belief and knowledge about the parameters.
f(Y|θ): the likelihood function of the data for given parameters
By the definition of conditional probability:

f(\theta \mid Y) = \frac{f(\theta, Y)}{f(Y)} = \frac{f(Y \mid \theta)\, \pi(\theta)}{f(Y)} \qquad (1)

The marginal distribution f(Y) is defined as:

f(Y) = \int f(Y, \theta)\, d\theta = \int f(Y \mid \theta)\, \pi(\theta)\, d\theta

f(θ|Y) is called the posterior distribution of θ.
It is the probability density function (PDF) of θ given the observed empirical data Y.
Omitting the scale factor, equation (1) can be expressed as:

f(\theta \mid Y) \propto f(Y \mid \theta)\, \pi(\theta)
Bayes' theorem:

posterior PDF ∝ (likelihood function) × (prior PDF)

Expressed in logarithms:

log(posterior PDF) = log(likelihood function) + log(prior PDF) + constant
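To make the proportionality concrete, here is a minimal sketch in Python (an illustration added to these notes, with made-up data and a Beta(2, 2) prior): the unnormalized log posterior is simply the log likelihood plus the log prior.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 7 successes in 10 Bernoulli trials
y, n = 7, 10

def log_prior(theta):
    # Beta(2, 2) prior on the success probability (an assumption for this sketch)
    return stats.beta.logpdf(theta, 2, 2)

def log_likelihood(theta):
    return stats.binom.logpmf(y, n, theta)

def log_posterior(theta):
    # log(posterior) = log(likelihood) + log(prior) + constant
    return log_likelihood(theta) + log_prior(theta)

theta_grid = np.linspace(0.01, 0.99, 99)
print(theta_grid[np.argmax(log_posterior(theta_grid))])  # posterior mode
```

For this conjugate example the exact posterior is Beta(9, 5), whose mode 8/12 ≈ 0.67 matches the grid maximizer.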
2 Markov Chain Monte Carlo (MCMC) methods
This section draws from Chapter 7 of Greenberg (2008).
The basis of an MCMC algorithm is the construction of a transition kernel, denoted by p(x, y), that has an invariant density equal to the target density.
Given such a kernel, we can start the process at x_0 to yield a draw x_1 from p(x_0, x_1), x_2 from p(x_1, x_2), x_3 from p(x_2, x_3), ..., and x_g from p(x_{g-1}, x_g).
The distribution of x_g is approximately equal to the target distribution after a transient period.
Therefore, MCMC algorithms provide an approximation to the exact posterior
distribution of a parameter.
How do we find a kernel that has the target density as its invariant distribution?
The Metropolis-Hastings algorithm provides a general principle for finding such kernels.
The Gibbs sampler is a special case of the Metropolis-Hastings algorithm.
2.1 Gibbs algorithm
The Gibbs algorithm is applicable when it is possible to sample from each conditional distribution.
Suppose we want to sample from the joint distribution f(x_1, x_2).
Further suppose that we are able to sample from the two conditional distributions f(x_1 | x_2) and f(x_2 | x_1).
Gibbs algorithm:

1. Choose x_2^{(0)} (you can also start from x_1^{(0)}).

2. The first iteration:
draw x_1^{(1)} from f(x_1 | x_2^{(0)})
draw x_2^{(1)} from f(x_2 | x_1^{(1)})

3. The g-th iteration:
draw x_1^{(g)} from f(x_1 | x_2^{(g-1)})
draw x_2^{(g)} from f(x_2 | x_1^{(g)})

4. Draw until the desired number of iterations is obtained.
We discard some portion of the initial sample.
This portion is the transient or burn-in sample.
Let n be the number of total iterations and m be the number of burn-in draws.
The point estimate of x_1 (similarly for x_2) and its variance are:

\hat{x}_1 = \frac{1}{n - m} \sum_{g=m+1}^{n} x_1^{(g)}

\hat{\sigma}_1^2 = \frac{1}{n - m - 1} \sum_{g=m+1}^{n} \left( x_1^{(g)} - \hat{x}_1 \right)^2
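As an illustration (not part of the original notes), here is a minimal Python Gibbs sampler for a bivariate normal with correlation ρ, for which both conditionals are normal and known in closed form; ρ, n, m, and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8              # correlation of the target bivariate normal (illustrative)
n, m = 10_000, 1_000   # total iterations and burn-in draws

x1, x2 = 0.0, 0.0      # arbitrary starting values
draws = np.empty((n, 2))
for g in range(n):
    # Both conditionals are N(rho * other, 1 - rho^2)
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal()
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal()
    draws[g] = x1, x2

kept = draws[m:]                  # discard the burn-in sample
print(kept.mean(axis=0))          # point estimates, close to (0, 0)
print(kept.var(axis=0, ddof=1))   # close to 1
print(np.corrcoef(kept.T)[0, 1])  # close to rho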
The invariant distribution of the Gibbs kernel is the target distribution.
Proof:

Let x = (x_1, x_2) and y = (y_1, y_2), where the kernel p(x, y) moves x_1 → y_1 and x_2 → y_2.

The Gibbs kernel is:

p(x, y) = f(y_1 \mid x_2)\, f(y_2 \mid y_1)

Here f(y_1 | x_2) corresponds to the step "draw x_1^{(g)} from f(x_1 | x_2^{(g-1)})", and f(y_2 | y_1) corresponds to the step "draw x_2^{(g)} from f(x_2 | x_1^{(g)})".
It follows that:

\int p(x, y)\, f(x)\, dx = \int f(y_1 \mid x_2)\, f(y_2 \mid y_1)\, f(x_1, x_2)\, dx_1\, dx_2

= f(y_2 \mid y_1) \int f(y_1 \mid x_2)\, f(x_1, x_2)\, dx_1\, dx_2

= f(y_2 \mid y_1) \int f(y_1 \mid x_2)\, f(x_2)\, dx_2

= f(y_2 \mid y_1)\, f(y_1) = f(y_1, y_2) = f(y)
This proves that f(y) is the invariant distribution for the Gibbs kernel p(x, y).
That the invariant distribution of the Gibbs kernel is the target distribution is a necessary, but not a sufficient, condition for the kernel to converge to the target distribution.
Please refer to Tierney (1994) for a further discussion of such conditions.
The Gibbs sampler can be easily extended to more than two blocks.
In practice, convergence of the Gibbs sampler is an important issue.
I will use Brooks and Gelman's (1998) method for convergence checks.
2.2 Metropolis-Hastings algorithm
The Metropolis-Hastings algorithm is more general than the Gibbs sampler because it does not require the availability of the full set of conditional distributions for sampling.
Suppose that we want to draw a random sample from the distribution f(X).
The distribution f(X) contains a complicated normalization constant, so that a direct draw is either too time-consuming or infeasible.
However, there exists an approximate distribution (a jumping distribution, or proposal distribution) from which random draws are easy to obtain.
The Metropolis-Hastings algorithm generates a sequence of random draws from the proposal distribution whose distribution converges to f(X).
MH algorithm:

1. Given x, draw Y from q(x, y).
2. Draw U from U(0, 1).
3. Return Y if:

U \le \alpha(x, Y) = \min\left\{ \frac{f(Y)\, q(Y, x)}{f(x)\, q(x, Y)},\ 1 \right\}

4. Otherwise, return x and go to step 1.
5. Draw until the desired number of iterations is obtained.
q(x, y) is the proposal distribution.
The normalization constant of f(X) is not needed because only a ratio is used in the computation.
How do we choose the proposal density q(x, y)?
The proposal density should generate proposals that have a reasonably good
probability of acceptance.
The sampling should be able to explore a large part of the support.
Two well-known proposal kernels are the random walk kernel and the independent
kernel.
A. Random walk kernel:

y = x + u, \qquad h(u) = h(-u) \;\Rightarrow\; q(x, y) = q(y, x) \;\Rightarrow\; \alpha(x, y) = \min\left\{ \frac{f(y)}{f(x)},\ 1 \right\}

B. Independent kernel:

q(x, y) = q(y) \;\Rightarrow\; \alpha(x, y) = \min\left\{ \frac{f(y)\, q(x)}{f(x)\, q(y)},\ 1 \right\}
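A minimal Python sketch of the random walk kernel, added for illustration: the target exp(-x^4/4) is an arbitrary unnormalized density, and because only the ratio f(y)/f(x) enters α, its normalization constant is never needed. The step size and iteration count are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_f(x):
    # Unnormalized log target density (illustrative choice)
    return -0.25 * x**4

n_iter, step = 50_000, 1.0
x = 0.0
draws = np.empty(n_iter)
accepted = 0
for j in range(n_iter):
    y = x + step * rng.standard_normal()   # symmetric proposal: q(x, y) = q(y, x)
    # alpha(x, y) = min{f(y)/f(x), 1}, compared on the log scale
    if np.log(rng.uniform()) <= log_f(y) - log_f(x):
        x, accepted = y, accepted + 1
    draws[j] = x

print(accepted / n_iter)   # acceptance rate
print(draws.mean())        # close to 0 by symmetry of the target
```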
2.2.1 Metropolis algorithm
The algorithm uses a symmetric proposal function, namely q(Y, x) = q(x, Y).
Metropolis algorithm:

1. Given x, draw Y from q(x, y).
2. Draw U from U(0, 1).
3. Return Y if:

U \le \alpha(x, Y) = \min\left\{ \frac{f(Y)}{f(x)},\ 1 \right\}

4. Otherwise, return x and go to step 1.
5. Draw until the desired number of iterations is obtained.
2.2.2 Properties of MH algorithm
A kernel q(x, y) is reversible if:

f(x)\, q(x, y) = f(y)\, q(y, x)

It can be shown that f is the invariant distribution for the reversible kernel q defined above.
Now we begin with a kernel that is not reversible, say:

f(x)\, q(x, y) > f(y)\, q(y, x)

We make the irreversible kernel into a reversible kernel by multiplying both sides by a function α:

f(x)\, \alpha(x, y)\, q(x, y) = f(y)\, \alpha(y, x)\, q(y, x)

Defining p(x, y) = \alpha(x, y)\, q(x, y), this reads:

f(x)\, p(x, y) = f(y)\, p(y, x)

This turns the irreversible kernel q(x, y) into the reversible kernel p(x, y).
Now set α(y, x) = 1:

f(x)\, \alpha(x, y)\, q(x, y) = f(y)\, q(y, x)

\alpha(x, y) = \frac{f(y)\, q(y, x)}{f(x)\, q(x, y)} < 1

By letting α(x, y) < α(y, x), we equalize the probability that the kernel goes from x to y with the probability that the kernel goes from y to x.
Similar considerations for the general case imply that:

\alpha(x, y) = \min\left\{ \frac{f(y)\, q(y, x)}{f(x)\, q(x, y)},\ 1 \right\}
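Detailed balance is easy to verify numerically. The following illustrative Python check (added to these notes) builds p(x, y) = α(x, y) q(x, y) from an arbitrary irreversible proposal matrix on a three-state space and confirms both reversibility and invariance:

```python
import numpy as np

f = np.array([0.2, 0.3, 0.5])                 # target distribution (illustrative)
q = np.array([[0.1, 0.6, 0.3],                # arbitrary irreversible proposal
              [0.4, 0.2, 0.4],
              [0.5, 0.2, 0.3]])

# alpha(x, y) = min{1, f(y) q(y, x) / (f(x) q(x, y))}
alpha = np.minimum(1.0, (f[None, :] * q.T) / (f[:, None] * q))
p = alpha * q                                  # p(x, y) for x != y
np.fill_diagonal(p, 0)
np.fill_diagonal(p, 1 - p.sum(axis=1))         # rejected mass stays at x

assert np.allclose(f[:, None] * p, (f[:, None] * p).T)   # f(x)p(x,y) = f(y)p(y,x)
assert np.allclose(f @ p, f)                              # f is invariant
```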
2.2.3 Metropolis-Hastings algorithm with two blocks
Suppose we are at the (g-1)-th iteration x = (x_1, x_2) and want to move to the g-th iteration y = (y_1, y_2).
MH algorithm:

1. Draw Z_1 from q_1(x_1, Z | x_2).
2. Draw U_1 from U(0, 1).
3. Return y_1 = Z_1 if:

U_1 \le \alpha(x_1, Z_1 \mid x_2) = \min\left\{ \frac{f(Z_1, x_2)\, q_1(Z_1, x_1 \mid x_2)}{f(x_1, x_2)\, q_1(x_1, Z_1 \mid x_2)},\ 1 \right\}

4. Otherwise, return y_1 = x_1.
5. Draw Z_2 from q_2(x_2, Z | y_1).
6. Draw U_2 from U(0, 1).
7. Return y_2 = Z_2 if:

U_2 \le \alpha(x_2, Z_2 \mid y_1) = \min\left\{ \frac{f(y_1, Z_2)\, q_2(Z_2, x_2 \mid y_1)}{f(y_1, x_2)\, q_2(x_2, Z_2 \mid y_1)},\ 1 \right\}

8. Otherwise, return y_2 = x_2.
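A compact Python sketch of the two-block scheme (illustrative, not from the notes), using a random walk proposal within each block so that the q-ratios cancel from α; the target is an unnormalized bivariate normal with correlation 0.8, and all tuning constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_f(x1, x2):
    # Unnormalized log density of a bivariate normal with correlation 0.8
    return -0.5 * (x1**2 - 1.6 * x1 * x2 + x2**2) / (1 - 0.8**2)

x1, x2 = 0.0, 0.0
draws = np.empty((20_000, 2))
for g in range(draws.shape[0]):
    # Block 1: random walk proposal for x1 given the current x2
    z1 = x1 + 0.5 * rng.standard_normal()
    if np.log(rng.uniform()) <= log_f(z1, x2) - log_f(x1, x2):
        x1 = z1
    # Block 2: random walk proposal for x2 given the (possibly updated) x1
    z2 = x2 + 0.5 * rng.standard_normal()
    if np.log(rng.uniform()) <= log_f(x1, z2) - log_f(x1, x2):
        x2 = z2
    draws[g] = x1, x2

print(np.corrcoef(draws[2_000:].T)[0, 1])   # close to 0.8
```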
3 Estimation algorithm
An and Schorfheide (2007) review Bayesian estimation and evaluation techniques
that have been developed in recent years for empirical work with DSGE models.
Why use Bayesian methods to estimate DSGE models?
Bayesian estimation of DSGE models has three characteristics (An and Schorfheide, 2007).
First, compared to GMM estimation, Bayesian estimation is system-based. (This is also true for maximum likelihood estimation.)
Second, the estimation is based on the likelihood function generated by the DSGE model, rather than on the discrepancy between model-implied impulse responses and VAR impulse responses.
Third, prior distributions can be used to incorporate additional information into
the parameter estimation.
3.1 Draw from the posterior by the Random Walk Metropolis algorithm
Remember that the posterior is proportional to the likelihood function times the prior:

f(\theta \mid Y) \propto f(Y \mid \theta)\, \pi(\theta)
How do we compute posterior moments?
The Random Walk Metropolis (RWM) algorithm allows us to draw from the posterior f(θ|Y).
The RWM algorithm belongs to the more general class of Metropolis-Hastings algorithms.
RWM algorithm:

1. Initialize the algorithm with an arbitrary value θ_0 and set j = 1.

2. Draw θ*_j = θ_{j-1} + ε, ε ~ N(0, Σ), where Σ is the proposal covariance, so that θ*_j ~ N(θ_{j-1}, Σ).

3. Draw u from U(0, 1).

4. Return θ_j = θ*_j if:

u \le \alpha(\theta_{j-1}, \theta_j^{*}) = \min\left\{ \frac{f(Y \mid \theta_j^{*})\, \pi(\theta_j^{*})}{f(Y \mid \theta_{j-1})\, \pi(\theta_{j-1})},\ 1 \right\}

5. Otherwise, return θ_j = θ_{j-1}.

6. If j < N, then set j = j + 1 and go to step 2.
The Kalman filter is used to evaluate the likelihood values f(Y | θ*_j) and f(Y | θ_{j-1}) above.
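For reference, here is a bare-bones Kalman filter log likelihood in Python for a linear Gaussian state space x_t = A x_{t-1} + w_t, y_t = C x_t + v_t. In the DSGE application, A and C would come from the model solution (e.g., Klein's method); here they are left as inputs, so this is a sketch rather than the lecture's actual code:

```python
import numpy as np

def kalman_loglik(Y, A, C, Q, R, x0, P0):
    """Log likelihood of data Y (T x n_obs) for the state space
    x_t = A x_{t-1} + w_t, w_t ~ N(0, Q);  y_t = C x_t + v_t, v_t ~ N(0, R)."""
    x, P = x0, P0
    loglik = 0.0
    for y in Y:
        # Prediction step
        x = A @ x
        P = A @ P @ A.T + Q
        # Prediction error and its covariance
        e = y - C @ x
        F = C @ P @ C.T + R
        loglik += -0.5 * (len(y) * np.log(2 * np.pi)
                          + np.linalg.slogdet(F)[1]
                          + e @ np.linalg.solve(F, e))
        # Update step
        K = P @ C.T @ np.linalg.inv(F)
        x = x + K @ e
        P = P - K @ C @ P
    return loglik
```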
3.2 Computational algorithm
The algorithm follows Schorfheide (2000) and An and Schorfheide (2007).
1. Use a numerical optimization routine to maximize log f(Y|θ) + log π(θ).

2. Denote the posterior mode by θ̃.

3. Denote by Σ̃ the inverse of the Hessian computed at the posterior mode θ̃.

4. Specify an initial value for θ_0, or draw θ_0 from N(θ̃, c_0² Σ̃).

5. Set j = 1 and set the number of MCMC draws N.

6. Evaluate f(Y|θ_0) and π(θ_0):
A. evaluate π(θ_0) for the given θ_0;
B. use Paul Klein's method to solve the model for the given θ_0;
C. use the Kalman filter to evaluate f(Y|θ_0).

7. Draw θ*_j = θ_{j-1} + ε, ε ~ N(0, c² Σ̃), so that θ*_j ~ N(θ_{j-1}, c² Σ̃).

8. Draw u from U(0, 1).

9. Evaluate f(Y|θ*_j) and π(θ*_j):
A. evaluate π(θ*_j) for the given θ*_j;
B. use Paul Klein's method to solve the model for the given θ*_j;
C. use the Kalman filter to evaluate f(Y|θ*_j).

10. Return θ_j = θ*_j if:

u \le \alpha(\theta_{j-1}, \theta_j^{*}) = \min\left\{ \frac{f(Y \mid \theta_j^{*})\, \pi(\theta_j^{*})}{f(Y \mid \theta_{j-1})\, \pi(\theta_{j-1})},\ 1 \right\}

11. Otherwise, return θ_j = θ_{j-1}.

12. If j < N, then set j = j + 1 and go to step 7.

13. Approximate the posterior expected value of a function h(θ) by:

E\left[ h(\theta) \mid Y \right] = \frac{1}{N_{sim}} \sum_{j=1}^{N_{sim}} h\left( \theta_j \right), \qquad N_{sim} = N - N_{burn\text{-}in}
It is recommended to adjust the scale factor c so that the acceptance rate is roughly 25 percent in the RWM algorithm.
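Steps 5 to 12 amount to a short loop. A hedged Python sketch follows, where log_post(θ) stands in for log f(Y|θ) + log π(θ) (prior evaluation, Klein solution, Kalman filter), and c, N, and the seed are assumptions to be tuned:

```python
import numpy as np

def rwm(log_post, theta0, Sigma, c=0.3, N=100_000, seed=0):
    """Random Walk Metropolis for log_post(theta) = log f(Y|theta) + log pi(theta)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(c**2 * Sigma)      # proposal covariance c^2 * Sigma_tilde
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws = np.empty((N, theta.size))
    accepted = 0
    for j in range(N):
        theta_star = theta + L @ rng.standard_normal(theta.size)
        lp_star = log_post(theta_star)
        # Accept with probability min{exp(lp_star - lp), 1}
        if np.log(rng.uniform()) <= lp_star - lp:
            theta, lp = theta_star, lp_star
            accepted += 1
        draws[j] = theta
    return draws, accepted / N

# Usage sketch: tune c until the acceptance rate is roughly 0.25, then discard
# a burn-in portion and average, e.g. draws[len(draws) // 5:].mean(axis=0).
```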
4 An example: business cycle accounting
This example illustrates Bayesian estimation of the wedges process in Chari, Kehoe and McGrattan's (2007) business cycle accounting.
The wedges process is s_t = \left( \hat{A}_t, \hat{\tau}_{lt}, \hat{\tau}_{xt}, \hat{g}_t \right).

s_{t+1} = P s_t + Q \varepsilon_{t+1}

P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{pmatrix}, \qquad Q = \begin{pmatrix} q_{11} & 0 & 0 & 0 \\ q_{21} & q_{22} & 0 & 0 \\ q_{31} & q_{32} & q_{33} & 0 \\ q_{41} & q_{42} & q_{43} & q_{44} \end{pmatrix}

\varepsilon_{t+1} \sim N\left( 0_{4 \times 1},\ I_{4 \times 4} \right)
We estimate the lower triangular matrix Q to ensure that the estimate of V = QQ' is positive semidefinite.
The matrix Q has no structural interpretation.
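An illustrative Python simulation of the wedges process (the parameter values below are placeholders, not estimates); it also shows why parametrizing the innovation covariance through a lower triangular Q makes V = QQ' positive semidefinite by construction:

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder parameter values, not estimates
P = np.diag([0.9, 0.8, 0.7, 0.85])          # persistence of the four wedges
Q = np.array([[0.30, 0.00, 0.00, 0.00],
              [0.05, 0.25, 0.00, 0.00],
              [0.02, 0.04, 0.20, 0.00],
              [0.01, 0.03, 0.02, 0.15]])    # lower triangular

V = Q @ Q.T                                  # innovation covariance, PSD by construction
assert np.all(np.linalg.eigvalsh(V) >= 0)

T = 200
s = np.zeros((T, 4))                         # s_t = (A_hat, tau_l_hat, tau_x_hat, g_hat)
for t in range(1, T):
    s[t] = P @ s[t - 1] + Q @ rng.standard_normal(4)
```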
Given the wedges, which are functionally similar to shocks, the next step is to
solve the log-linearized model.
We use Paul Klein's MATLAB code to solve the log-linearized model.
The state variables of the model are:

\left( \hat{k}_t, s_t \right) = \left( \hat{k}_t, \hat{A}_t, \hat{\tau}_{lt}, \hat{\tau}_{xt}, \hat{g}_t \right)

The control variables of the model are:

\left( \hat{c}_t, \hat{x}_t, \hat{y}_t, \hat{l}_t \right)

The observed variables of the model are:

\left( \hat{y}_t, \hat{x}_t, \hat{l}_t, \hat{g}_t \right)
Here again is the log-linearized model:

\frac{\tilde{c}}{\tilde{y}}\, \hat{c}_t + \frac{\tilde{x}}{\tilde{y}}\, \hat{x}_t + \frac{\tilde{g}}{\tilde{y}}\, \hat{g}_t = \hat{y}_t \qquad (1.a)

\hat{y}_t = \hat{A}_t + \alpha \hat{k}_t + (1 - \alpha)\, \hat{l}_t \qquad (2.a)

\hat{c}_t = \hat{A}_t + \alpha \hat{k}_t - \left( \alpha + \frac{\tilde{l}}{1 - \tilde{l}} \right) \hat{l}_t - \frac{1}{1 - \tilde{\tau}_l}\, \hat{\tau}_{lt} \qquad (3.a)

(1 + \gamma)^{\sigma}\, \hat{\tau}_{xt} + (1 + \tilde{\tau}_x)(1 + \gamma)^{\sigma}\, E_t \hat{c}_{t+1} - (1 + \tilde{\tau}_x)(1 + \gamma)^{\sigma}\, \hat{c}_t = E_t \left[ \alpha \frac{\tilde{y}}{\tilde{k}} \left( \hat{y}_{t+1} - \hat{k}_{t+1} \right) + (1 - \delta)\, \hat{\tau}_{x,t+1} \right] \qquad (4.a)

(1 + \gamma_n)(1 + \gamma_z)\, \hat{k}_{t+1} = (1 - \delta)\, \hat{k}_t + \frac{\tilde{x}}{\tilde{k}}\, \hat{x}_t \qquad (5.a)
In Lecture 4, I have shown the maximum likelihood estimation of the wedges process.
Going from MLE to Bayesian estimation is straightforward.
The first step is to set the priors.
The choice of the priors follows Saijo, Hikaru (2008), "The Japanese Depression in the Interwar Period: A General Equilibrium Analysis," B.E. Journal of Macroeconomics.
The prior for the diagonal terms of matrix P is assumed to follow a beta distribution with mean 0.7 and standard deviation 0.2.
The prior for the non-diagonal terms of matrix P is assumed to follow a normal distribution with mean 0 and standard deviation 0.3.
The prior for the diagonal terms of matrix Q is assumed to follow a uniform distribution between 0 and 0.5.
The prior for the non-diagonal terms of matrix Q is assumed to follow a uniform distribution between -0.5 and 0.5.
The table below summarizes the priors.

Prior
Name                             Domain        Density   Parameter 1   Parameter 2
p11, p22, p33, p44               [0, 1)        Beta      0.035         0.015
p12, p13, p14                    R             Normal    0             0.3
p21, p23, p24                    R             Normal    0             0.3
p31, p32, p34                    R             Normal    0             0.3
p41, p42, p43                    R             Normal    0             0.3
q11, q22, q33, q44               [0, 0.5]      Uniform   0             0.5
q21, q31, q32, q41, q42, q43     [-0.5, 0.5]   Uniform   -0.5          0.5
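A sketch of how these priors could be evaluated with scipy.stats; treating the Beta row's Parameter 1 and Parameter 2 as shape parameters is an assumption, since the table's convention differs across rows (mean/std for the normal rows, bounds for the uniform rows):

```python
import numpy as np
from scipy import stats

# Prior distributions for the free parameters, following the table above.
# NOTE: reading the Beta parameters as shape parameters is an assumption.
prior_p_diag = stats.beta(0.035, 0.015)          # p11, p22, p33, p44 on [0, 1)
prior_p_off  = stats.norm(loc=0, scale=0.3)      # off-diagonal p's on R
prior_q_diag = stats.uniform(loc=0, scale=0.5)   # q11, ..., q44 on [0, 0.5]
prior_q_off  = stats.uniform(loc=-0.5, scale=1)  # q21, ..., q43 on [-0.5, 0.5]

def log_prior(P, Q):
    off = ~np.eye(4, dtype=bool)                       # off-diagonal mask for P
    tril = np.tril(np.ones((4, 4), dtype=bool), k=-1)  # strictly lower part of Q
    return (prior_p_diag.logpdf(np.diag(P)).sum()
            + prior_p_off.logpdf(P[off]).sum()
            + prior_q_diag.logpdf(np.diag(Q)).sum()
            + prior_q_off.logpdf(Q[tril]).sum())
```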
References

[1] An, Sungbae and Frank Schorfheide (2007), "Bayesian analysis of DSGE models," Econometric Reviews, 26 (2-4), pp. 113-172.
[2] Brooks, Stephen P. and Andrew Gelman (1998), "General methods for monitoring convergence of iterative simulations," Journal of Computational and Graphical Statistics, 7 (4), pp. 434-455.
[3] Chari, V. V., Patrick J. Kehoe and Ellen R. McGrattan (2007), "Business cycle accounting," Econometrica, 75 (3), pp. 781-836.
[4] Geweke, John (2005), Contemporary Bayesian Econometrics and Statistics, Wiley-Interscience.
[5] Greenberg, Edward (2008), Introduction to Bayesian Econometrics, Cambridge University Press.
[6] Schorfheide, Frank (2000), "Loss function-based evaluation of DSGE models," Journal of Applied Econometrics, 15, pp. 645-670.
[7] Tierney, Luke (1994), "Markov chains for exploring posterior distributions," The Annals of Statistics, 22 (4), pp. 1701-1762.
[8] Tsay, Ruey S. (2005), Analysis of Financial Time Series, Second Edition, Wiley-Interscience.