
Massachusetts Institute of Technology
Department of Economics
Time Series 14.384
Guido Kuersteiner
Lecture 9: GMM Estimation
9.1. Introduction
In this lecture we consider estimators based on the generalized method of moments (GMM) principle.
Let $X_t$ be an $m \times 1$ vector of observable variables and $g(X_t, \theta) : \mathbb{R}^m \times \Theta \to \mathbb{R}^g$ a function depending on a parameter $\theta \in \Theta$, where $\Theta$ is a parameter space, usually a subset of $\mathbb{R}^d$.
When $g = d$ the method of moments principle estimates $\theta$ by matching $E(g(X_t, \theta))$ to a sample analog such that
$$\frac{1}{n}\sum_{t=1}^{n} g(X_t, \hat\theta) - E(g(X_t, \hat\theta)) = 0.$$
When $g > d$ the same approach is termed GMM according to Hansen (1982), who provided the most general formulation of this approach.
In economic applications moment conditions are often derived from conditional moment restrictions implied by economic models. To formalize this idea assume that $g = 1$ and let $\mathcal{F}_t = \sigma(X_s \mid s \le t)$. Then, assume that
$$E(g(X_t, \theta_0) \mid \mathcal{F}_{t-1}) = 0.$$
Letting
$$u_t = g(X_t, \theta_0)$$
it follows immediately that for all $z_t$ which are measurable with respect to $\mathcal{F}_{t-1}$,
$$E(u_t z_t) = 0.$$
This condition is the basis for the construction of GMM estimators. Here, $z_t$ is called an instrument; it can itself be a function of $X_{t-1}, X_{t-2}, \ldots$ such that $z_t = F(X_{t-1}, X_{t-2}, \ldots, \theta_0)$, or it can consist of elements of $X_{t-1}, X_{t-2}, \ldots$ directly.
Example 9.1. Let $X_t = (y_t, x_t)$ and consider the nonlinear regression
$$y_t = f(x_t, \theta) + u_t$$
where $u_t$ is assumed to satisfy the conditional moment restriction $E(u_t \mid \mathcal{F}_{t-1}) = 0$. If in addition it also holds that $E(u_t \mid x_t) = 0$, then this model can be estimated by nonlinear least squares methods as well as by GMM methods, where $x_t, X_{t-1}, X_{t-2}, \ldots$ can be used as instruments. If $E(u_t \mid x_t) \ne 0$, then nonlinear least squares is not valid and GMM based on instruments $X_{t-1}, X_{t-2}, \ldots$ should be used.
Example 9.2. Hansen and Singleton (1982, Ecta) consider an intertemporal asset pricing model where representative agents solve
$$\max_{\{c_t\}} \; E_0\left[\sum_{t=0}^{\infty} \beta^t u(c_t)\right] \quad \text{s.t.} \quad c_t + \sum_{j=1}^{N} P_{j,t} Q_{j,t} \le \sum_{j=1}^{N} V_{j,t} Q_{j,t-1} + w_t,$$
where $E_t$ stands for expectations conditional on $\mathcal{F}_t$, $\beta$ is the subjective discount factor, $c_t$ is consumption at time $t$, $u(.)$ is the temporally additive utility function, $P_{j,t}$ is the price of asset $j$ at time $t$, $Q_{j,t}$ is the number of shares held in asset $j$ at time $t$, and $w_t$ is labor income. The value $V_{j,t}$ is the payoff of holding asset $j$ for one period. For stocks $V_{j,t}$ is usually equal to $P_{j,t} + D_{j,t}$, where $D_{j,t}$ are dividends paid out between $t-1$ and $t$. Assuming that all assets are stocks, the first order conditions for optimal consumption and investment are given by
$$P_{j,t}\, u'(c_t) = E_t\left[\beta\left(P_{j,t+1} + D_{j,t+1}\right) u'(c_{t+1})\right]$$
where $u'(.)$ is the marginal utility of consumption. Letting $R_{j,t+1}$ be the one period gross return of holding asset $j$, where
$$R_{j,t+1} = \frac{P_{j,t+1} + D_{j,t+1}}{P_{j,t}},$$
this condition can be written as
$$E_t\left[\beta R_{j,t+1}\frac{u'(c_{t+1})}{u'(c_t)} - 1\right] = 0.$$
In this example we have shown that for $X_t = (c_{t+1}, c_t, R_{1,t+1}, \ldots, R_{N,t+1})$
$$g(X_t, \theta) = \begin{pmatrix} \beta R_{1,t+1} \dfrac{u'(c_{t+1})}{u'(c_t)} - 1 \\ \vdots \\ \beta R_{N,t+1} \dfrac{u'(c_{t+1})}{u'(c_t)} - 1 \end{pmatrix}$$
and valid instruments $z_t$ are all variables in the information set of the agent. The parameter vector $\theta$ contains $\beta$ and other parameters that determine the utility function $u(c)$. Moment conditions for estimation can then be derived from
$$E[g(X_t, \theta_0) \otimes z_t] = 0.$$
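A sketch of how these moments could be computed; the CRRA specification $u'(c) = c^{-\gamma}$ and all array layouts are assumptions of this sketch rather than part of Hansen and Singleton's formulation as reproduced above.

```python
import numpy as np

def euler_moments(theta, c, R, Z):
    """Observation-level moments g(X_t, theta) ⊗ z_t for the CRRA case u'(c) = c**(-gamma).

    theta = (beta, gamma); c : (n+1,) consumption levels; R : (n, N) gross returns
    realized at t+1; Z : (n, k) instruments dated t or earlier.
    Returns an (n, N*k) array whose column means are the sample moment conditions.
    """
    beta, gamma = theta
    mrs = (c[1:] / c[:-1]) ** (-gamma)          # u'(c_{t+1}) / u'(c_t)
    g = beta * R * mrs[:, None] - 1.0           # (n, N): one Euler error per asset
    # Kronecker product g(X_t, theta) ⊗ z_t, observation by observation
    return np.einsum('ti,tj->tij', g, Z).reshape(len(g), -1)
```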
Example 9.3. Linear Asset Pricing Model (Hansen and Singleton, 1996, JBES). The first order condition is
$$E\left(e^{-\rho\tau}\, U'(t+\tau)P(t+\tau)\right) = U'(t)P(t)$$
for all $\tau > 0$. For the CRRA utility function $U(C) = (C^{\gamma+1} - 1)/(\gamma + 1)$ a log-linearized version of this first order condition is
$$\gamma\, c_{t+1} + r_t = u_{t+1}$$
with
$$c_t = \Delta \log C_t, \qquad r_t = \log(P_{t+1}/P_t).$$
In this case the function $g(.)$ takes on the form
$$g(c_{t+1}, r_t) = \gamma\, c_{t+1} + r_t.$$
In order to determine valid instruments we need to establish the properties of the error term $u_t$. Assume that the underlying model is a continuous time process with $M(t) = U'(t)P(t)$ such that
$$dM(t) = M(t)\, dW(t),$$
such that by Ito's formula
$$d\ln M(t) = dW(t) - \tfrac{1}{2}\, dt$$
or
$$\ln M(t+1) - \ln M(t) = W(t+1) - W(t) - \tfrac{1}{2}.$$
If the discrete time data is a geometric average of the continuous time process, then
$$u_{t+1} = \int_0^1 \left[W(t+1-\tau) - W(t-\tau)\right] d\tau$$
has an MA(1) structure. The valid instrument set is therefore $z_t = (c_{t-1}, r_{t-2}, \ldots)$.
9.2. Formulation of the Estimator and Asymptotic Properties
For simplicity we assume that $g = 1$ and that $z_t$ is a $k \times 1$ vector of instruments. From before we use $u_t = g(X_t, \theta_0)$ and define the $k \times 1$ vector of functions
$$u_t z_t = m(X_t, z_t, \theta_0).$$
Assume that $A_n$ is a sequence of non-singular $k \times k$ matrices and define
$$G_n(\theta) = A_n \frac{1}{n}\sum_{t=1}^{n} m(X_t, z_t, \theta).$$
The GMM estimator then solves
$$\min_{\theta} \|G_n(\theta)\|^2 = \min_{\theta}\left(\frac{1}{n}\sum_{t=1}^{n} m(X_t, z_t, \theta)\right)' A_n' A_n \left(\frac{1}{n}\sum_{t=1}^{n} m(X_t, z_t, \theta)\right).$$
Define the matrix
$$V_n^{-1} = A_n' A_n.$$
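A sketch of the criterion $\|G_n(\theta)\|^2$ and its numerical minimization; the `moments` callable is assumed to return the $n \times k$ array of $m(X_t, z_t, \theta)$ as in the sketches above, and the derivative-free optimizer is simply a convenient choice.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, moments, W):
    """||G_n(theta)||^2 = m_bar' W m_bar with W = A_n' A_n = V_n^{-1}.

    moments(theta) returns an (n, k) array of m(X_t, z_t, theta);
    W is a k x k positive definite weight matrix.
    """
    m_bar = moments(theta).mean(axis=0)        # (1/n) sum_t m(X_t, z_t, theta)
    return m_bar @ W @ m_bar

def gmm_estimate(moments, theta0, W):
    # theta0 is an initial guess; Nelder-Mead avoids the need for analytic derivatives
    result = minimize(gmm_objective, theta0, args=(moments, W), method="Nelder-Mead")
    return result.x
```

When $d = k$ the choice of the weight matrix $W = A_n'A_n$ is asymptotically irrelevant, as shown at the end of this section; when $d < k$ it matters and is taken up in Section 9.3.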
Consistency of the GMM estimator holds if the following conditions hold.

Assumption 1 (Consistency). i) $\hat\theta$ satisfies
$$\|G_n(\hat\theta)\|^2 \le \inf_{\theta \in \Theta} \|G_n(\theta)\|^2 + o_p(1),$$
ii) uniform law of large numbers: assume that $V_n^{-1} \to_p V^{-1}$ and that there exists a nonstochastic function $G(\theta)$ such that
$$\sup_{\theta \in \Theta}\left|\|G_n(\theta)\|^2 - \|G(\theta)\|^2\right| \to_p 0,$$
iii) identification: for any $\varepsilon > 0$ and a neighborhood $N(\theta_0, \varepsilon)$,
$$\inf_{\theta \notin N(\theta_0, \varepsilon)} \|G(\theta)\|^2 > \|G(\theta_0)\|^2.$$
Assumption 1i) is the definition of the estimator $\hat\theta$ as the minimizer of the criterion function up to a small in probability error. Primitive conditions for the other two conditions can be obtained for example from Andrews (1991a). There, the following sufficient conditions (stronger than necessary) for Assumption 1ii) are given.
Assumption 2. (i) For some sequence $\{h_j(.) : j \ge 1\}$ of real or complex Borel measurable functions on $\mathbb{R}^r$, $m(., \theta)$ has a pointwise convergent series expansion for each $\theta \in \Theta$: $m(X, z, \theta) = \sum_{j \ge 1} c_j(\theta) h_j(X, z)$ for all $(X, z) \in \mathbb{R}^r$, where for each $\theta$, $c_j(\theta)$ is a sequence of real constants. (ii) $\sum_{j \ge 1} |c_j(\theta)|\, E|h_j(X_t, z_t)| < \infty$ for all $t, \theta$. (iii) $\sum_{j \ge J} |c_j(\theta)|^2 / a_j \to 0$ for some summable sequence of positive constants $\{a_j\}$ for which $\sum_{j \ge 1} a_j \gamma_j < \infty$, where $\gamma_j = \sum_{s=-\infty}^{\infty} \gamma_j(s)$ and
$$\gamma_j(s) = \sup_t \left|\mathrm{cov}\left(h_j(X_t, z_t), h_j(X_{t+|s|}, z_{t+|s|})\right)\right|.$$
A mixing condition can be used to establish that $\gamma_j < \infty$ for all $j$ (see Comment 1 in Andrews). Andrews also establishes that $m(., \theta)$ satisfies the conditions of the theorem if it is sufficiently smooth in the sense of having a uniformly bounded Sobolev norm of some order.

Assumption 1iii) is satisfied if $\Theta$ is compact, $G(\theta)$ achieves a unique minimum at $\theta_0$ and $G(\theta)$ is continuous.
For asymptotic normality we need the following assumptions:

Assumption 3 (CLT). (i) $\theta_0$ is contained in $U \subset \Theta$, where $U$ is an open set.
(ii) $\frac{1}{\sqrt{n}}\sum_{t=1}^{n} m(X_t, z_t, \theta_0) \Rightarrow N(0, \Omega)$, where
$$\Omega = \lim_{n \to \infty} \mathrm{var}\left(\frac{1}{\sqrt{n}}\sum_{t=1}^{n} m(X_t, z_t, \theta_0)\right)$$
is positive definite.
(iii) $G_n(\theta)$ is twice continuously differentiable on some neighborhood $\Theta_0$ of $\theta_0$.
(iv) Assume that
$$\frac{\partial^2 \|G(\theta_0)\|^2}{\partial\theta\,\partial\theta'}$$
is nonsingular and that
$$\sup_{\theta \in \Theta_0}\left\|\frac{\partial^2 \|G_n(\theta)\|^2}{\partial\theta\,\partial\theta'} - \frac{\partial^2 \|G(\theta)\|^2}{\partial\theta\,\partial\theta'}\right\| \to_p 0.$$
(v) $\hat\theta \to_p \theta_0$.
(vi) $\dfrac{\partial \|G_n(\hat\theta)\|^2}{\partial\theta} = o_p(n^{-1/2})$.
Let
$$M_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} m(X_t, z_t, \theta)$$
and define
$$\Phi_0 = \mathrm{plim}\; \frac{\partial M_n(\theta_0)}{\partial\theta'}.$$
Note that Assumption 3iv) requires that $\Phi_0$ has full column rank and that $V$ is nonsingular.
Central limit theorems for martingale difference sequences or mixing sequences can be used to establish Assumption 3ii). If $m(X_t, z_t, \theta_0) = u_t z_t$ is a strictly stationary and ergodic martingale difference sequence, for example, then
$$\Omega = E\,u_t^2 z_t z_t'.$$
Under more general mixing conditions allowing for non-stationarity, $\Omega$ takes on a more complicated form. Let $v_t = u_t z_t$. Then $\Omega = \lim_{n} \Omega_n$ where
$$\Omega_n = \frac{1}{n}\sum_{t,s} E\,v_t v_s' = \sum_{j=-n+1}^{n-1} \Gamma_n(j)$$
and
$$\Gamma_n(j) = \begin{cases} \dfrac{1}{n}\displaystyle\sum_{t=j+1}^{n} E\,v_t v_{t-j}' & \text{for } j \ge 0,\\[2mm] \dfrac{1}{n}\displaystyle\sum_{t=-j+1}^{n} E\,v_{t+j} v_t' & \text{for } j < 0. \end{cases}$$
Under Assumptions 1 and 3 we can now derive the asymptotic distribution of the GMM estimator. Using a first order mean value expansion of the first order condition we obtain
$$o_p(1) = n^{1/2}\frac{\partial}{\partial\theta}\left[M_n(\hat\theta)' A_n' A_n M_n(\hat\theta)\right] = \frac{\partial M_n(\theta_0)'}{\partial\theta} A_n' A_n\, n^{1/2} M_n(\theta_0) + \left[\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n \frac{\partial M_n(\bar\theta)}{\partial\theta'} + o_p(1)\right] n^{1/2}(\hat\theta - \theta_0),$$
where $\bar\theta$ lies between $\hat\theta$ and $\theta_0$, such that
$$n^{1/2}(\hat\theta - \theta_0) = -\left[\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n \frac{\partial M_n(\bar\theta)}{\partial\theta'}\right]^{-1} \frac{\partial M_n(\theta_0)'}{\partial\theta} A_n' A_n\, n^{1/2} M_n(\theta_0) + o_p(1)$$
$$= -\left(\Phi_0' V^{-1} \Phi_0\right)^{-1} \Phi_0' V^{-1} n^{1/2} M_n(\theta_0) + o_p(1) \to_d N\!\left(0,\; \left(\Phi_0' V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1} \Omega\, V^{-1}\Phi_0 \left(\Phi_0' V^{-1}\Phi_0\right)^{-1}\right).$$
Note that when $d = k$, i.e. if there are the same number of instruments as parameters and the estimator is just identified, $\Phi_0$ is a $k \times k$ invertible matrix. In this case,
$$\left(\Phi_0' V^{-1}\Phi_0\right)^{-1} = \Phi_0^{-1} V\, \Phi_0'^{-1}$$
and
$$\left(\Phi_0' V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1}\Omega\, V^{-1}\Phi_0\left(\Phi_0' V^{-1}\Phi_0\right)^{-1} = \Phi_0^{-1}\Omega\,\Phi_0'^{-1},$$
such that the asymptotic variance covariance matrix does not depend on $V$. This is however not the case in the overidentified case where $d < k$.
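A sketch of the sandwich variance just derived; the argument matrices are whatever estimates of $\Phi_0$, $\Omega$ and $V$ one has available (Section 9.4 discusses how to construct them), and the function name is hypothetical.

```python
import numpy as np

def gmm_asymptotic_variance(Phi, Omega, V):
    """Sandwich form (Phi'V^{-1}Phi)^{-1} Phi'V^{-1} Omega V^{-1} Phi (Phi'V^{-1}Phi)^{-1}.

    Phi : (k, d) Jacobian limit Phi_0, Omega : (k, k) long-run variance,
    V : (k, k) weighting limit.  Divide the result by n to approximate var(theta_hat).
    """
    Vinv = np.linalg.inv(V)
    bread = np.linalg.inv(Phi.T @ Vinv @ Phi)     # (Phi' V^{-1} Phi)^{-1}
    meat = Phi.T @ Vinv @ Omega @ Vinv @ Phi      # Phi' V^{-1} Omega V^{-1} Phi
    return bread @ meat @ bread
```

With $V = \Omega$ this collapses to $(\Phi_0'\Omega^{-1}\Phi_0)^{-1}$, the efficient case discussed in the next section.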
9.3. Efficient GMM
The previous result indicates that when $d < k$ the choice of $V$ matters for asymptotic efficiency and that by appropriately choosing $V$ it is possible to minimize the asymptotic variance of $\hat\theta$. The smallest variance that can be achieved is when
$$V = \Omega.$$
Then, the asymptotic variance of $\hat\theta$ is $\left(\Phi_0' \Omega^{-1}\Phi_0\right)^{-1}$. To show that this is in fact the best possible choice of $V$ we show that
$$\left(\Phi_0' V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1}\Omega\, V^{-1}\Phi_0\left(\Phi_0' V^{-1}\Phi_0\right)^{-1} - \left(\Phi_0'\Omega^{-1}\Phi_0\right)^{-1} \ge 0$$
for all $V$. Note that $\ge 0$ stands for positive semidefinite and that $F^{-1} - G^{-1} \ge 0$ if and only if $G - F \ge 0$. Thus we need to show that
$$\Phi_0'\Omega^{-1}\Phi_0 - \Phi_0' V^{-1}\Phi_0\left(\Phi_0' V^{-1}\Omega\, V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1}\Phi_0 \ge 0.$$
Note that
$$\Phi_0'\Omega^{-1/2}\left[I_k - \Omega^{1/2} V^{-1}\Phi_0\left(\Phi_0' V^{-1}\Omega\, V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1}\Omega^{1/2}\right]\Omega^{-1/2}\Phi_0 = HPH' = HP(HP)' \ge 0,$$
where $H = \Phi_0'\Omega^{-1/2}$, $P = I_k - \Omega^{1/2} V^{-1}\Phi_0\left(\Phi_0' V^{-1}\Omega\, V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1}\Omega^{1/2}$, and the second equality uses the fact that $P$ is a projection matrix ($P$ is symmetric and $P^2 = P$). The result then follows since a matrix of the form $HP(HP)'$ is necessarily positive semidefinite: for any $x \in \mathbb{R}^d$ it follows that $x'HP(HP)'x = \|P'H'x\|^2 \ge 0$.
9.4. Weight Matrix Estimation
Estimation of the matrix $\Omega$ is important both because it is the optimal weight matrix for the overidentified GMM estimator and because it is a part of the asymptotic variance of $\hat\theta$ and thus needed to construct confidence intervals and test statistics based on $\hat\theta$. In the case where there is no serial correlation in $v_t = u_t z_t = m(X_t, z_t, \theta_0)$ the weight matrix can be estimated fairly easily by forming sample averages.

For this purpose we however first need a consistent estimate of $\theta_0$. Such a consistent estimate $\tilde\theta$ can be based on an inefficient GMM estimator where
$$\tilde\theta = \arg\min_{\theta}\left\|\frac{1}{n}\sum_{t=1}^{n} m(X_t, z_t, \theta)\right\|^2.$$
Under Assumption 1 it then follows as before that $\tilde\theta \to_p \theta_0$.
Then, we estimate $\Omega = E\,u_t^2 z_t z_t'$ by
$$\hat\Omega = \frac{1}{n}\sum_{t=1}^{n} m(X_t, z_t, \tilde\theta)\, m(X_t, z_t, \tilde\theta)'$$
and $\Phi_0$ by
$$\hat\Phi_0 = \frac{1}{n}\sum_{t=1}^{n} \frac{\partial m(X_t, z_t, \tilde\theta)}{\partial\theta'},$$
and $\hat\Omega$ and $\hat\Phi_0$ are consistent under mild regularity conditions.
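A sketch of the resulting two-step procedure for the serially uncorrelated case; it reuses the hypothetical `gmm_estimate` sketch from Section 9.2, and the forward-difference Jacobian is simply one convenient way to approximate $\hat\Phi_0$.

```python
import numpy as np

def jacobian_hat(moments, theta, eps=1e-6):
    """Forward-difference estimate of Phi_0 = plim dM_n(theta_0)/dtheta' (a k x d matrix)."""
    theta = np.asarray(theta, dtype=float)
    base = moments(theta).mean(axis=0)
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        cols.append((moments(theta + step).mean(axis=0) - base) / eps)
    return np.column_stack(cols)

def two_step_gmm(moments, theta0):
    """First step: identity weight matrix; second step: weight by the inverse of Omega_hat."""
    k = moments(np.asarray(theta0, dtype=float)).shape[1]
    theta_tilde = gmm_estimate(moments, theta0, np.eye(k))     # inefficient first-step estimate
    M = moments(theta_tilde)                                   # rows m(X_t, z_t, theta_tilde)
    omega_hat = M.T @ M / M.shape[0]                           # (1/n) sum_t m_t m_t'
    theta_hat = gmm_estimate(moments, theta_tilde, np.linalg.inv(omega_hat))
    return theta_hat, omega_hat, jacobian_hat(moments, theta_tilde)
```

The efficient asymptotic variance $(\hat\Phi_0'\hat\Omega^{-1}\hat\Phi_0)^{-1}/n$ can then be formed from the returned pieces.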
When $v_t$ is autocorrelated, $\Omega$ takes on a more complicated form and simple sample analogs are no longer consistent estimates. We have seen earlier that in the more general case $\Omega$ can be written as $\Omega = \lim_n \Omega_n$ where
$$\Omega_n = \sum_{j=-n+1}^{n-1}\Gamma_n(j).$$
If we replace $\Gamma_n(j)$ by a sample analog such as
$$\hat\Gamma_n(j) = \begin{cases} \dfrac{1}{n}\displaystyle\sum_{t=j+1}^{n} \hat v_t \hat v_{t-j}' & \text{for } j \ge 0,\\[2mm] \dfrac{1}{n}\displaystyle\sum_{t=-j+1}^{n} \hat v_{t+j} \hat v_t' & \text{for } j < 0, \end{cases}$$
with
$$\hat v_t = m(X_t, z_t, \tilde\theta),$$
then it can be shown under mild regularity conditions that, for $j$ fixed and finite,
$$\hat\Gamma_n(j) - \Gamma(j) \to_p 0$$
as $n \to \infty$. The problem is, however, that we need to estimate too many terms of the form $\Gamma(j)$ and that convergence does not hold uniformly in $j$.
The problem can be resolved in principle by replacing $\hat\Omega$ by
$$\hat\Omega_M = \sum_{j=-M}^{M} \hat\Gamma_n(j)$$
where $M$ needs to go to infinity at some appropriate rate relative to the sample size. The problem with such an estimate is, however, that $\hat\Omega_M$ is not necessarily positive definite and thus cannot be used as a variance covariance matrix. Newey and West (1987) solve this problem. They show that by using appropriate weights $w(j, M)$ and by specifying
$$\hat\Omega_M = \sum_{j=-M}^{M} w(j, M)\, \hat\Gamma_n(j)$$
one can insure positive definiteness of $\hat\Omega_M$. The weights $w(j, M)$ are restricted to be of the form
$w(j, M) = k(j/M)$. The function $k(j/M)$ is called a kernel weight and is assumed to satisfy $k(.) \in \mathcal{K}_1$, where
$$\mathcal{K}_1 = \left\{k : \mathbb{R} \to [-1, 1],\; k(0) = 1,\; k(x) = k(-x)\ \forall x \in \mathbb{R},\; \int |k(x)|\,dx < \infty,\; k \text{ continuous except at countably many points}\right\}.$$
Examples of such kernel functions are

Truncated: $k_{Tr}(x) = \begin{cases} 1 & \text{for } |x| \le 1,\\ 0 & \text{otherwise;} \end{cases}$

Bartlett: $k_{B}(x) = \begin{cases} 1 - |x| & \text{for } |x| \le 1,\\ 0 & \text{otherwise;} \end{cases}$

Parzen: $k_{P}(x) = \begin{cases} 1 - 6x^2 + 6|x|^3 & \text{for } 0 \le |x| \le 1/2,\\ 2(1 - |x|)^3 & \text{for } 1/2 < |x| \le 1,\\ 0 & \text{otherwise;} \end{cases}$

Tukey-Hanning: $k_{T}(x) = \begin{cases} (1 + \cos \pi x)/2 & \text{for } |x| \le 1,\\ 0 & \text{otherwise;} \end{cases}$

Quadratic Spectral: $k_{Q}(x) = \dfrac{25}{12\pi^2 x^2}\left(\dfrac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5)\right).$
It can be shown that the Bartlett, Parzen, and Quadratic Spectral kernels all produce positive semidefinite estimates of $\Omega$ while this is not necessarily the case for the truncated and the Tukey-Hanning kernels.
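A sketch of the kernel-weighted estimate $\hat\Omega_M$ with the Bartlett kernel as the default weight; the data layout and function names are assumptions of this sketch.

```python
import numpy as np

def bartlett(x):
    # Bartlett kernel k_B(x) = 1 - |x| for |x| <= 1 and 0 otherwise
    return np.where(np.abs(x) <= 1, 1.0 - np.abs(x), 0.0)

def hac_omega(V, M, kernel=bartlett):
    """Kernel-weighted estimate Omega_hat_M = sum_{j=-M}^{M} k(j/M) Gamma_hat_n(j).

    V : (n, k) array with rows v_hat_t = m(X_t, z_t, theta_tilde); M : truncation lag.
    Uses Gamma_hat_n(-j) = Gamma_hat_n(j)' to fold the sum over negative lags.
    """
    n, _ = V.shape
    omega = V.T @ V / n                              # Gamma_hat_n(0)
    for j in range(1, M + 1):
        gamma_j = V[j:].T @ V[:-j] / n               # Gamma_hat_n(j) = (1/n) sum_{t>j} v_t v_{t-j}'
        omega = omega + float(kernel(j / M)) * (gamma_j + gamma_j.T)
    return omega
```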
Newey and West (1987) show consistency of $\hat\Omega_M$ under the following assumptions. Let $Y_t = (X_t, z_t)$.

Theorem 9.4 (Newey and West). i) Let $m_t(\theta) = m(X_t, z_t, \theta)$ and assume that $m_t(\theta)$ is measurable in $Y_t$ for all $\theta$ and continuously differentiable in $\theta$ for all $\theta$ in a neighborhood $N(\theta_0, \varepsilon)$.
ii) There exists a measurable function $\bar m(Y)$ such that
$$\sup_{\theta \in N(\theta_0, \varepsilon)} \|m(X, z, \theta)\| \le \bar m(Y), \qquad \sup_{\theta \in N(\theta_0, \varepsilon)} \left\|\frac{\partial m(X, z, \theta)}{\partial\theta'}\right\| \le \bar m(Y)$$
and
$$\sup_t E\left[\bar m(Y_t)^2\right] < D \text{ for some } D < \infty.$$
Also, there exist finite constants $D, \delta > 0$ and $r \ge 1$ such that
$$\sup_t E|m_t(\theta_0)|^{4(r+\delta)} < D.$$
iii) $Y_t$ is a mixing sequence of size $r/(r-1)$ for $r > 1$.
iv) $E\,m_t(\theta_0) = 0$ for all $t$ and $\sqrt{n}(\tilde\theta - \theta_0) = O_p(1)$.
v) $|w(j, M)| < C$ for a finite constant $C$ and, for each $j$, $\lim_{M \to \infty} w(j, M) = 1$.
Then, if $M$ is chosen such that $M \to \infty$ as $n \to \infty$ and $M/n^{1/4} \to 0$,
$$\hat\Omega_M - \Omega_n \to_p 0.$$
The question of how to choose M optimally was analyzed by Andrews (1991b) and Andrews and
Monahan (1992).
9.5. Test of Overidentifying Restrictions
When the number of orthogonality restrictions $k$ exceeds the number of parameters $d$ the overidentifying restrictions can be tested. The econometric model implies that all the restrictions should hold, but because $k > d$ they may actually be violated in the sample.
In order to develop the test statistic we consider the asymptotic distribution of $\sqrt{n}\, M_n(\hat\theta) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} m(X_t, z_t, \hat\theta)$. Note that the mean value expansion
$$\sqrt{n}\, M_n(\hat\theta) = \sqrt{n}\, M_n(\theta_0) + \frac{\partial M_n(\bar\theta)}{\partial\theta'}\sqrt{n}(\hat\theta - \theta_0)$$
leads to
$$\hat\theta - \theta_0 = \left[\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n \frac{\partial M_n(\bar\theta)}{\partial\theta'}\right]^{-1}\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n\left[M_n(\hat\theta) - M_n(\theta_0)\right]$$
and substituting back gives
$$\sqrt{n}\, M_n(\hat\theta) = \left[I_k - \frac{\partial M_n(\bar\theta)}{\partial\theta'}\left(\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n \frac{\partial M_n(\bar\theta)}{\partial\theta'}\right)^{-1}\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n\right]\sqrt{n}\, M_n(\theta_0) + \frac{\partial M_n(\bar\theta)}{\partial\theta'}\left(\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n \frac{\partial M_n(\bar\theta)}{\partial\theta'}\right)^{-1}\frac{\partial M_n(\bar\theta)'}{\partial\theta} A_n' A_n\, \sqrt{n}\, M_n(\hat\theta).$$
The second term is $o_p(1)$ from the first order condition and the fact that
$$\frac{\partial M_n(\bar\theta)}{\partial\theta'} - \frac{\partial M_n(\hat\theta)}{\partial\theta'} = o_p(1).$$
It thus follows that
$$\sqrt{n}\, M_n(\hat\theta) \to_d N(0, \Psi_0)$$
where
$$\Psi_0 = \left(I_k - \Phi_0\left(\Phi_0' V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1}\right)\Omega\left(I_k - \Phi_0\left(\Phi_0' V^{-1}\Phi_0\right)^{-1}\Phi_0' V^{-1}\right)'.$$
In order to construct the test statistic we now choose $A_n = C_n^{-1}$, where $\hat\Omega_n = C_n C_n'$ and $C_n \to C$ in probability. Then
$$\sqrt{n}\, C_n^{-1} M_n(\hat\theta)$$
has asymptotic variance covariance matrix
$$I_k - C^{-1}\Phi_0\left(\Phi_0'\Omega^{-1}\Phi_0\right)^{-1}\Phi_0' C'^{-1}$$
which is idempotent of rank $k - d$. It then follows that
$$n\, M_n(\hat\theta)'\,\hat\Omega_n^{-1} M_n(\hat\theta) \to_d \chi^2_{k-d}.$$
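A sketch of the resulting test of overidentifying restrictions; the moment array layout matches the earlier sketches, and the chi-square p-value via scipy is a convenience choice.

```python
import numpy as np
from scipy import stats

def j_test(moments, theta_hat, omega_hat):
    """Test of overidentifying restrictions: J = n M_n(theta_hat)' Omega_hat^{-1} M_n(theta_hat).

    Under the null that all k moment conditions hold, J is asymptotically
    chi-square with k - d degrees of freedom.
    """
    M = moments(theta_hat)                         # (n, k) array of m(X_t, z_t, theta_hat)
    n, k = M.shape
    d = np.atleast_1d(theta_hat).size
    m_bar = M.mean(axis=0)
    J = n * m_bar @ np.linalg.inv(omega_hat) @ m_bar
    return J, stats.chi2.sf(J, df=k - d)           # statistic and asymptotic p-value
```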
References

Andrews, D. W. (1991a): "An Empirical Process Central Limit Theorem for Dependent Non-Identically Distributed Random Variables," Journal of Multivariate Analysis, pp. 187-203.

Andrews, D. W. (1991b): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59(3), 817-858.

Andrews, D. W., and J. Monahan (1992): "An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator," Econometrica, 60(4), 953-966.

Hansen, L. P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50(4), 1029-1053.

Hansen, L. P., and K. J. Singleton (1982): "Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models," Econometrica, pp. 1269-1286.

Hansen, L. P., and K. J. Singleton (1996): "Efficient Estimation of Linear Asset-Pricing Models With Moving Average Errors," Journal of Business and Economic Statistics, pp. 53-68.

Newey, W. K., and K. D. West (1987): "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55(3), 703-708.