Yasutomo Murasawa
October 19, 2004
Contents
1 Vector Differentiation
  1.1 Scalar-Valued Functions
  1.2 Vector-Valued Functions
  1.3 Differentiation Rules
2 Application to OLS
  2.1 Unconstrained Maximization
  2.2 OLS
3 Application to RLS
  3.1 Constrained Maximization
  3.2 RLS
1 Vector Differentiation
1.1 Scalar-Valued Functions
Let $X \subset \mathbb{R}^n$, $Y \subset \mathbb{R}^m$, and $f : X \to Y$. Let $m = 1$.
Definition 1 The gradient of $f(\cdot)$ at $x$ is
\[
\frac{df}{dx}(x) :=
\begin{pmatrix}
\frac{\partial f}{\partial x_1}(x) \\
\vdots \\
\frac{\partial f}{\partial x_n}(x)
\end{pmatrix}.
\]
Remark: We also write $\nabla f(x)$.
Definition 2 The Hessian matrix of $f(\cdot)$ at $x$ is
\[
\frac{d^2 f}{dx\,dx'}(x) :=
\begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(x)
\end{pmatrix}.
\]
Remark: We also write $\nabla^2 f(x)$.
1.2 Vector-Valued Functions
Let m > 1.
Definition 3 The Jacobian matrix of $f(\cdot)$ at $x$ is
\[
\frac{df}{dx'}(x) :=
\begin{pmatrix}
\frac{df_1}{dx'}(x) \\
\vdots \\
\frac{df_m}{dx'}(x)
\end{pmatrix}.
\]
Remark: We define
\[
\frac{df}{dx}(x) := \frac{df}{dx'}(x)'.
\]
1.3 Differentiation Rules
Theorem 1 For all $a, x \in \mathbb{R}^n$,
\[
\frac{d\,a'x}{dx} = a.
\]
Proof. By the definition of the gradient,
\[
\frac{d\,a'x}{dx} :=
\begin{pmatrix}
\frac{\partial a'x}{\partial x_1} \\
\vdots \\
\frac{\partial a'x}{\partial x_n}
\end{pmatrix}
=
\begin{pmatrix}
\frac{\partial}{\partial x_1} \sum_{i=1}^n a_i x_i \\
\vdots \\
\frac{\partial}{\partial x_n} \sum_{i=1}^n a_i x_i
\end{pmatrix}
=
\begin{pmatrix}
a_1 \\
\vdots \\
a_n
\end{pmatrix}
= a. \qquad \Box
\]
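Theorem 1 can be checked numerically. The sketch below (my own illustration, not part of the original notes, with arbitrary test values) compares a central finite-difference gradient of $f(x) = a'x$ against $a$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.normal(size=n)
x = rng.normal(size=n)

f = lambda x: a @ x  # scalar-valued f(x) = a'x
eps = 1e-6
# central finite differences, one coordinate direction e at a time
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])
assert np.allclose(grad, a, atol=1e-6)  # gradient equals a
```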
Theorem 2 For all $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$,
\[
\frac{dAx}{dx'} = A.
\]
Proof. By the definition of the Jacobian matrix and Theorem 1,
\[
\frac{dAx}{dx'} :=
\begin{pmatrix}
\frac{d\,a_1'x}{dx'} \\
\vdots \\
\frac{d\,a_m'x}{dx'}
\end{pmatrix}
=
\begin{pmatrix}
a_1' \\
\vdots \\
a_m'
\end{pmatrix}
= A,
\]
where $a_i'$ is the $i$th row of $A$. $\Box$
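Theorem 2 admits the same kind of numerical sanity check (again my own sketch with arbitrary test values): the $j$th column of the Jacobian of $f(x) = Ax$ is the partial derivative with respect to $x_j$, and the assembled matrix should equal $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.normal(size=(m, n))
x = rng.normal(size=n)

f = lambda x: A @ x  # vector-valued f(x) = Ax
eps = 1e-6
# column j of the Jacobian is the finite-difference derivative w.r.t. x_j
J = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])
assert np.allclose(J, A, atol=1e-6)  # Jacobian equals A
```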
Theorem 3 For all $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^n$,
\[
\frac{dx'Ax}{dx} = (A + A')x.
\]
Proof. We can write
\[
x'Ax = \sum_{i=1}^n \sum_{j=1}^n x_i a_{i,j} x_j.
\]
By the definition of the gradient,
\[
\frac{dx'Ax}{dx} :=
\begin{pmatrix}
\frac{\partial x'Ax}{\partial x_1} \\
\vdots \\
\frac{\partial x'Ax}{\partial x_n}
\end{pmatrix}
=
\begin{pmatrix}
\frac{\partial}{\partial x_1} \sum_{i=1}^n \sum_{j=1}^n x_i a_{i,j} x_j \\
\vdots \\
\frac{\partial}{\partial x_n} \sum_{i=1}^n \sum_{j=1}^n x_i a_{i,j} x_j
\end{pmatrix}
=
\begin{pmatrix}
\frac{\partial}{\partial x_1} \Bigl( x_1 \sum_{j=1}^n a_{1,j} x_j + \sum_{i \ne 1} \sum_{j=1}^n x_i a_{i,j} x_j \Bigr) \\
\vdots \\
\frac{\partial}{\partial x_n} \Bigl( x_n \sum_{j=1}^n a_{n,j} x_j + \sum_{i \ne n} \sum_{j=1}^n x_i a_{i,j} x_j \Bigr)
\end{pmatrix}
=
\begin{pmatrix}
\sum_{j=1}^n a_{1,j} x_j + \sum_{i=1}^n a_{i,1} x_i \\
\vdots \\
\sum_{j=1}^n a_{n,j} x_j + \sum_{i=1}^n a_{i,n} x_i
\end{pmatrix}
=
\begin{pmatrix}
a_{1\cdot}' x + a_{\cdot 1}' x \\
\vdots \\
a_{n\cdot}' x + a_{\cdot n}' x
\end{pmatrix}
=
\begin{pmatrix}
a_{1\cdot}' \\
\vdots \\
a_{n\cdot}'
\end{pmatrix} x
+
\begin{pmatrix}
a_{\cdot 1}' \\
\vdots \\
a_{\cdot n}'
\end{pmatrix} x
= Ax + A'x = (A + A')x,
\]
where $a_{i\cdot}$ denotes the $i$th row of $A$ (as a column vector) and $a_{\cdot j}$ its $j$th column. $\Box$
Remark: If $A$ is symmetric, then
\[
\frac{dx'Ax}{dx} = 2Ax.
\]
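Theorem 3 and the symmetric-case remark can both be verified numerically. This sketch (my own illustration, arbitrary test matrix) differentiates the quadratic form $x'Ax$ by central finite differences and compares against $(A + A')x$, and against $2Sx$ for a symmetric $S$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n))  # generally non-symmetric
x = rng.normal(size=n)

eps = 1e-6
f = lambda x: x @ A @ x  # quadratic form x'Ax
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])
assert np.allclose(grad, (A + A.T) @ x, atol=1e-5)  # Theorem 3

# symmetric case: the gradient reduces to 2Sx
S = A + A.T
g = lambda x: x @ S @ x
grad_s = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
                   for e in np.eye(n)])
assert np.allclose(grad_s, 2 * S @ x, atol=1e-5)  # Remark
```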
2 Application to OLS
2.1 Unconstrained Maximization
Let $X \subset \mathbb{R}^n$ be open, $Y \subset \mathbb{R}$, and $f : X \to Y$ be differentiable and concave. Consider
\[
\max_x f(x) \quad \text{s.t. } x \in X.
\]
The FOC is
\[
\nabla f(x^*) = 0.
\]
Since $f(\cdot)$ is concave, the FOC is also sufficient.
2.2 OLS
Let $(y, X)$ be a $(1+k)$-variate random sample of size $n$. Assume a linear regression model for $y$ given $X$ s.th.
\[
E(y \mid X) = X\beta.
\]
The ordinary least squares (OLS) estimator of $\beta$ solves
\[
\max_\beta -\frac{1}{2}(y - X\beta)'(y - X\beta) \quad \text{s.t. } \beta \in \mathbb{R}^k.
\]
The FOC is
\[
X'(y - Xb) = 0.
\]
Assume that $X$ has full column rank. Then
\[
b = (X'X)^{-1}X'y.
\]
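The closed-form solution can be checked on simulated data (a sketch of my own, with made-up sample values): $b = (X'X)^{-1}X'y$ should satisfy the FOC and agree with a standard least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])       # true coefficients (made up)
y = X @ beta + 0.1 * rng.normal(size=n)  # simulated sample

# closed-form OLS: solve the normal equations X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# b satisfies the FOC X'(y - Xb) = 0 ...
assert np.allclose(X.T @ (y - X @ b), 0, atol=1e-8)
# ... and agrees with NumPy's least-squares routine
assert np.allclose(b, np.linalg.lstsq(X, y, rcond=None)[0])
```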
3 Application to RLS
3.1 Constrained Maximization
Let $X \subset \mathbb{R}^n$ be open and convex, $Y \subset \mathbb{R}$, $f : X \to Y$ be differentiable and concave, and $g : X \to \mathbb{R}^m$ be differentiable and convex. Consider
\[
\max_x f(x) \quad \text{s.t. } g(x) = 0,\ x \in X.
\]
Assume that a constraint qualification holds. The Lagrange function is
\[
L(x; \lambda) = f(x) + \lambda' g(x).
\]
The FOCs are
\[
\frac{df}{dx}(x^*) + \frac{dg}{dx}(x^*)\,\lambda^* = 0, \qquad g(x^*) = 0.
\]
3.2 RLS
The restricted least squares (RLS) estimator of s.t. R = r solves
max
1
2
(y X)
(y X)
s.t. R r = 0,
and
k
.
Assume that $R$ has full row rank (constraint qualification). The Lagrange function is
\[
L(\beta; \lambda) := -\frac{1}{2}(y - X\beta)'(y - X\beta) + \lambda'(R\beta - r)
= -\frac{1}{2}y'y + \frac{1}{2}\beta'X'y + \frac{1}{2}y'X\beta - \frac{1}{2}\beta'X'X\beta + \lambda'R\beta - \lambda'r.
\]
The FOCs are
\[
X'y - X'Xb^* + R'\lambda^* = 0, \qquad Rb^* - r = 0.
\]
Premultiplying the first equation by $R(X'X)^{-1}$,
\[
R(X'X)^{-1}X'y - Rb^* + R(X'X)^{-1}R'\lambda^* = 0,
\]
or, using the second FOC $Rb^* = r$,
\[
Rb - r + R(X'X)^{-1}R'\lambda^* = 0,
\]
where $b$ is the OLS estimator of $\beta$. Assume that $X$ has full column rank. Then
\[
\lambda^* = -\left[R(X'X)^{-1}R'\right]^{-1}(Rb - r).
\]
Plugging this back into the first equation,
\[
X'y - X'Xb^* - R'\left[R(X'X)^{-1}R'\right]^{-1}(Rb - r) = 0,
\]
or
\[
X'Xb^* = X'y - R'\left[R(X'X)^{-1}R'\right]^{-1}(Rb - r),
\]
or
\[
b^* = b - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(Rb - r).
\]
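The RLS formula can be exercised on simulated data (my own sketch; the restriction $\beta_1 + \beta_2 = 3$ and all sample values are made up for illustration). The restricted estimator $b^*$ should satisfy $Rb^* = r$ exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 60, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=n)

# unrestricted OLS estimator b = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

# one linear restriction R beta = r, here beta_1 + beta_2 = 3 (illustrative)
R = np.array([[1.0, 1.0, 0.0]])
r = np.array([3.0])

# b* = b - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (Rb - r)
M = R @ XtX_inv @ R.T
b_star = b - XtX_inv @ R.T @ np.linalg.solve(M, R @ b - r)

# the RLS estimator satisfies the restriction exactly
assert np.allclose(R @ b_star, r)
```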