
EECS 545 HW 1

Due Wednesday, Sept. 10, by 5 PM in the 545 Box in EECS 2420

1. Positive (semi-)definite matrices (5 points each)


Let $A$ be a real, symmetric $d \times d$ matrix. We say $A$ is positive semi-definite (PSD) if, for all
$x \in \mathbb{R}^d$, $x^T A x \geq 0$. We say $A$ is positive definite (PD) if, for all $x \neq 0$, $x^T A x > 0$. We write
$A \succeq 0$ when $A$ is PSD, and $A \succ 0$ when $A$ is PD.
The spectral theorem says that every real symmetric matrix $A$ can be expressed
$$A = U \Lambda U^T,$$
where $U$ is a $d \times d$ matrix such that $U U^T = U^T U = I$ (called an orthogonal matrix), and
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$. Multiplying on the right by $U$ we see that $AU = U\Lambda$. If we let $u_i$
denote the $i$-th column of $U$, we have $A u_i = \lambda_i u_i$ for each $i$. This expression reveals that the
$\lambda_i$ are eigenvalues of $A$, and the corresponding columns $u_i$ are eigenvectors associated to $\lambda_i$.
Using the spectral decomposition, show that
(a) $A$ is PSD iff $\lambda_i \geq 0$ for each $i$.
(b) $A$ is PD iff $\lambda_i > 0$ for each $i$.
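As a quick numerical sanity check (not part of the required proof), the claims in (a) and (b) can be observed empirically: the sign of the smallest eigenvalue of a symmetric matrix tracks the sign of $x^T A x$ over random directions. The sketch below assumes NumPy; the matrices `A_psd` and `A_indef` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_quadratic_form(A, trials=10_000):
    """Smallest x^T A x observed over random unit vectors x."""
    d = A.shape[0]
    X = rng.standard_normal((trials, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return np.min(np.einsum('ij,jk,ik->i', X, A, X))

# A_psd = B B^T is PSD by construction; A_indef has a negative eigenvalue.
B = rng.standard_normal((4, 4))
A_psd = B @ B.T
A_indef = np.diag([2.0, 1.0, 0.5, -0.5])

for name, A in [("A_psd", A_psd), ("A_indef", A_indef)]:
    eigvals = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
    print(name, "min eigenvalue:", eigvals.min(),
          "min x^T A x:", min_quadratic_form(A))
```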
2. Maximum likelihood estimation (5 points)
Consider a random variable $X$ (possibly a vector) whose distribution (density function or
mass function) belongs to a parametric family. The density or mass function may be written
$f(x\,;\theta)$, where $\theta$ is called the parameter, and can be either a scalar or vector. For example, in
the Gaussian family, $\theta$ can be a two-dimensional vector consisting of the mean and variance.
Suppose the parametric family is known, but the value of the parameter is unknown. It is
often of interest to estimate this parameter from observations of $X$. Maximum likelihood
estimation is one of the most important parameter estimation techniques. Let $X_1, \ldots, X_n$ be
iid random variables distributed according to $f(x\,;\theta)$. By independence, the joint distribution
of the observations is the product
$$\prod_{i=1}^{n} f(X_i\,;\theta).$$

Viewed as a function of $\theta$, this quantity is called the likelihood of $\theta$. It is often more convenient
to work with the log-likelihood,
$$\sum_{i=1}^{n} \log f(X_i\,;\theta).$$

A maximum likelihood estimate (MLE) of $\theta$ is any parameter
$$\hat{\theta} \in \arg\max_{\theta} \sum_{i=1}^{n} \log f(X_i\,;\theta),$$
where $\arg\max$ denotes the set of all values achieving the maximum. If there is a unique
maximizer, it is called the maximum likelihood estimate.
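To make the definition concrete, here is a small numerical sketch (assuming NumPy; the Gaussian-mean example and the grid search are illustrative choices, not part of the assignment): the log-likelihood of iid $N(\theta, 1)$ data is evaluated over a grid of candidate $\theta$ values and the maximizer is read off.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=200)   # iid N(theta = 2, 1) observations

def log_likelihood(theta, x):
    """Sum of log N(theta, 1) densities: the log-likelihood of theta."""
    return np.sum(-0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi))

grid = np.linspace(0.0, 4.0, 2001)             # candidate parameter values
values = np.array([log_likelihood(t, x) for t in grid])
theta_hat = grid[np.argmax(values)]            # arg max over the grid

print("grid-search MLE:", theta_hat, "sample mean:", x.mean())
```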

(a) Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Poi}(\lambda)$, a Poisson random variable with intensity parameter $\lambda$. Determine the maximum likelihood estimator of $\lambda$. Use the properties from the notes on unconstrained optimization to verify that the estimate you obtain is the MLE.
3. Unconstrained Optimization (5 points each)
In this problem you will prove some of the properties of unconstrained optimization problems
discussed in class. It will be helpful to understand the proofs presented in the notes.
(a) Show that if f is strictly convex, then f has at most one global minimizer.
For the next two parts, the following fact will be helpful. A twice continuously differentiable
function admits the quadratic expansion
$$f(x) = f(y) + \langle \nabla f(y),\, x - y \rangle + \frac{1}{2} \langle x - y,\, \nabla^2 f(y)(x - y) \rangle + o\left(\|x - y\|^2\right),$$
where $o(t)$ denotes a function satisfying $\lim_{t \to 0} \frac{o(t)}{t} = 0$, as well as the expansion
$$f(x) = f(y) + \langle \nabla f(y),\, x - y \rangle + \frac{1}{2} \left\langle x - y,\, \nabla^2 f\big(y + t(x - y)\big)(x - y) \right\rangle$$
for some $t \in (0, 1)$.
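The first expansion can also be checked numerically: the remainder after the quadratic term should shrink faster than $\|x - y\|^2$. The sketch below assumes NumPy; the test function $f(x) = \log(1 + \|x\|^2)$ and the base point are arbitrary illustrative choices.

```python
import numpy as np

# Check f(x) ~ f(y) + <grad f(y), x - y> + 1/2 <x - y, Hess f(y) (x - y)>
# for the (arbitrarily chosen) smooth function f(x) = log(1 + ||x||^2).

def f(x):
    return np.log(1.0 + x @ x)

def grad_f(x):
    return 2.0 * x / (1.0 + x @ x)

def hess_f(x):
    s = 1.0 + x @ x
    return 2.0 * np.eye(x.size) / s - 4.0 * np.outer(x, x) / s**2

y = np.array([0.5, -1.0])
for eps in [1e-1, 1e-2, 1e-3]:
    x = y + eps * np.array([1.0, 2.0])
    quad = f(y) + grad_f(y) @ (x - y) + 0.5 * (x - y) @ hess_f(y) @ (x - y)
    # The ratio below should tend to 0, i.e., the remainder is o(||x - y||^2).
    print(eps, abs(f(x) - quad) / np.linalg.norm(x - y) ** 2)
```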
(b) Show that if $f$ is twice continuously differentiable and $x^*$ is a local minimizer, then
$\nabla^2 f(x^*) \succeq 0$, i.e., the Hessian of $f$ is positive semi-definite at the local minimizer $x^*$.
(c) Show that if $f$ is twice continuously differentiable, then $f$ is convex if and only if the
Hessian $\nabla^2 f(x)$ is positive semi-definite for all $x \in \mathbb{R}^d$.
(d) Consider the function $f(x) = \frac{1}{2} x^T A x + b^T x + c$, where $A$ is a symmetric $d \times d$ matrix.
Derive the Hessian of $f$. Under what conditions on $A$ is $f$ convex? Strictly convex?
(e) Optional: In class, we assumed the domain of the objective function was $\mathbb{R}^d$. Suppose
the domain is instead some subset $S \subseteq \mathbb{R}^d$, and we seek to solve
$$\min_{x \in S} f(x).$$

Do the various properties still hold, or do they need to be modified in some way? What
conditions on S are needed for the (modified) properties to hold?
