
ESTIMATION THEORY

Outline

1. Random Variables
2. Introduction
3. Estimation Techniques
4. Extensions to Complex Vector Parameters
5. Application to Communication Systems

[Kay93] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New Jersey, 1993.
[Cover-Thomas91] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.

Random Variables
Definitions

A random variable $X$ is a function that assigns a number to every outcome of an experiment.

A random variable $X$ is completely characterized by:
- its cumulative distribution function (cdf): $F_X(x) = P(X \le x)$
- its probability density function (pdf): $p_X(x) = \frac{dF_X(x)}{dx}$

Properties

The probability that $X$ lies between $x_1$ and $x_2$ then is
$$P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} p_X(x)\,dx.$$

The mean of $X$ is given by $m_X = E\{X\} = \int_{-\infty}^{\infty} x\, p_X(x)\, dx$.

The variance of $X$ is given by $\mathrm{var}\{X\} = E\{(X - m_X)^2\} = \int_{-\infty}^{\infty} (x - m_X)^2\, p_X(x)\, dx$.

Random Variables
Examples

Uniform random variable:
- pdf: $p_X(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $0$ otherwise
- mean and variance: $m_X = \frac{a+b}{2}$, $\mathrm{var}\{X\} = \frac{(b-a)^2}{12}$

Gaussian random variable:
- pdf: $p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right)$
- mean and variance: $m_X = m$, $\mathrm{var}\{X\} = \sigma^2$
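As a quick numerical sanity check, the sketch below (assuming NumPy is available) draws samples from both distributions and compares the empirical moments with the formulas above; all constants are arbitrary choices, not from the slides.

```python
# Monte Carlo check of the uniform and Gaussian mean/variance formulas.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Uniform on [a, b]: mean (a+b)/2, variance (b-a)^2/12
a, b = 2.0, 5.0
u = rng.uniform(a, b, n)
print(u.mean(), (a + b) / 2)        # both ~3.5
print(u.var(), (b - a) ** 2 / 12)   # both ~0.75

# Gaussian with mean m and variance sigma^2
m, sigma = 1.0, 2.0
g = rng.normal(m, sigma, n)
print(g.mean(), m)                  # both ~1.0
print(g.var(), sigma ** 2)          # both ~4.0
```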

Random Variables
Two random variables

Two random variables $X$ and $Y$ are characterized by:
- the joint cdf: $F_{X,Y}(x,y) = P(X \le x, Y \le y)$
- the joint pdf: $p_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}$

The marginal pdfs can then be determined by
$$p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dy \quad\text{and}\quad p_Y(y) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dx.$$

For two random variables $X$ and $Y$, we can define the conditional pdfs
$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \quad\text{and}\quad p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}.$$

From this follows the popular Bayes rule:
$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)}.$$

For independent random variables $X$ and $Y$ we have $p_{X,Y}(x,y) = p_X(x)\, p_Y(y)$.

Random Variables
Function of random variables

Suppose $Z$ is a function of the random variables $X$ and $Y$, e.g., $Z = g(X,Y)$.

Corresponding increments in the cdf of $Z$ and the joint cdf of $X$ and $Y$ are the same. Hence, the expectation over $Z$ equals the joint expectation over $X$ and $Y$:
$$E\{Z\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, p_{X,Y}(x,y)\, dx\, dy.$$

The mean of $Z$ is given by $m_Z = E\{Z\}$.

The variance of $Z$ is given by $\mathrm{var}\{Z\} = E\{(Z - m_Z)^2\}$.

Random Variables
Vector random variables

A vector random variable $\mathbf{x}$ is a vector of random variables: $\mathbf{x} = [X_1, X_2, \dots, X_N]^T$.

Its cdf/pdf is the joint cdf/pdf of all these random variables.

The mean of $\mathbf{x}$ is given by $\mathbf{m}_{\mathbf{x}} = E\{\mathbf{x}\} = [E\{X_1\}, \dots, E\{X_N\}]^T$.

The covariance matrix of $\mathbf{x}$ is given by
$$\mathbf{C}_{\mathbf{x}} = E\{(\mathbf{x} - \mathbf{m}_{\mathbf{x}})(\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T\}, \quad [\mathbf{C}_{\mathbf{x}}]_{i,j} = \mathrm{cov}\{X_i, X_j\}.$$

Introduction
Problem Statement

Suppose we have an unknown scalar parameter $\theta$ that we want to estimate from an observed vector $\mathbf{x}$, which is related to $\theta$ through the following relationship:
$$\mathbf{x} = \mathbf{g}(\theta) + \mathbf{n},$$
where $\mathbf{n}$ is a random noise vector with probability density function (pdf) $p_{\mathbf{n}}(\mathbf{n})$.

The estimator is of the form $\hat{\theta} = f(\mathbf{x})$.

Note that $\hat{\theta}$ itself is a random variable. Hence, the performance of the estimator should be described statistically.

Introduction
Special Models

To solve any estimation problem, we need a model. Here, we will look deeper into two specific models.

The linear model: The relationship between $\mathbf{x}$ and $\theta$ is then given by
$$\mathbf{x} = \mathbf{h}\theta + \mathbf{n},$$
where $\mathbf{h}$ is the model vector and $\mathbf{n}$ is the noise vector, which is assumed to have mean $\mathbf{0}$, $\mathbf{m}_{\mathbf{n}} = E\{\mathbf{n}\} = \mathbf{0}$, and covariance matrix $\mathbf{C} = \mathrm{cov}\{\mathbf{n}\} = E\{\mathbf{n}\mathbf{n}^T\}$.

The linear Gaussian model: This model is a special case of the linear model, where the noise vector is assumed to be Gaussian (or normal) distributed:
$$p_{\mathbf{n}}(\mathbf{n}) = \frac{1}{(2\pi)^{N/2} \det^{1/2}(\mathbf{C})} \exp\left(-\frac{1}{2}\mathbf{n}^T \mathbf{C}^{-1} \mathbf{n}\right).$$
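The sketch below (NumPy assumed, with arbitrary sizes and constants) shows how one realization of the linear Gaussian model can be simulated; it is only an illustration of the model, not part of the slides.

```python
# Draw one observation x = h*theta + n from the linear Gaussian model.
import numpy as np

rng = np.random.default_rng(1)
N = 8
theta = 0.7                                  # the (here: chosen) scalar parameter
h = rng.standard_normal(N)                   # known model vector

# Build a valid (symmetric positive definite) noise covariance C
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)

n = rng.multivariate_normal(np.zeros(N), C)  # zero-mean Gaussian noise, cov C
x = h * theta + n                            # observed vector
```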

Estimation Techniques

We can view the unknown parameter $\theta$ as a deterministic variable:
- Minimum Variance Unbiased (MVU) Estimator
- Best Linear Unbiased Estimator (BLUE)
- Maximum Likelihood Estimator (MLE)
- Least Squares Estimator (LSE)

The Bayesian philosophy: $\theta$ is viewed as a random variable:
- Minimum Mean Square Error (MMSE) Estimator
- Linear Minimum Mean Square Error (LMMSE) Estimator

Minimum Variance Unbiased Estimation


A natural criterion that comes to mind is the Mean Square Error (MSE):
$$\mathrm{mse}(\hat{\theta}) = E\{(\hat{\theta} - \theta)^2\} = \mathrm{var}\{\hat{\theta}\} + \left(E\{\hat{\theta}\} - \theta\right)^2.$$

The MSE does not only depend on the variance but also on the bias. This means that an estimator that tries to minimize the MSE will often depend on the parameter $\theta$, and is therefore unrealizable.

Solution: constrain the bias to zero and minimize the variance, which leads to the so-called Minimum Variance Unbiased (MVU) estimator:
- unbiased: $m_{\hat{\theta}} = E\{\hat{\theta}\} = \theta$ for all $\theta$
- minimum variance: $\mathrm{var}\{\hat{\theta}\}$ is minimal for all $\theta$

Remark: The MVU does not always exist and is generally difficult to find.

Minimum Variance Unbiased Estimation (Linear Gaussian Model)

For the linear Gaussian model the MVU exists and its solution can be found by means of the Cramer-Rao lower bound (see notes, [Kay93], [Cover-Thomas91]):
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}.$$

Properties:
- $m_{\hat{\theta}} = \theta$
- $\mathrm{var}\{\hat{\theta}\} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1}$
- $\hat{\theta}$ is Gaussian distributed, i.e., $\hat{\theta} \sim \mathcal{N}\left(\theta, (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1}\right)$
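A minimal Monte Carlo sketch of this estimator follows (the same closed form reappears below as the BLUE and the MLE); the data generation mirrors the earlier sketch, and the empirical mean and variance are compared with the stated properties.

```python
# MVU estimator for the linear Gaussian model:
# theta_hat = (h^T C^{-1} h)^{-1} h^T C^{-1} x.
import numpy as np

rng = np.random.default_rng(2)
N, theta = 8, 0.7
h = rng.standard_normal(N)
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)

Ci_h = np.linalg.solve(C, h)        # C^{-1} h without forming the inverse
X = theta * h + rng.multivariate_normal(np.zeros(N), C, size=20_000)
est = X @ Ci_h / (h @ Ci_h)         # one estimate per simulated observation

print(est.mean(), theta)            # empirical mean ~ theta (unbiased)
print(est.var(), 1 / (h @ Ci_h))    # empirical var ~ (h^T C^{-1} h)^{-1}
```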

Best Linear Unbiased Estimation

In this case we constrain the estimator to have the form $\hat{\theta} = \mathbf{a}^T \mathbf{x}$.

- Unbiased: $m_{\hat{\theta}} = E\{\hat{\theta}\} = \mathbf{a}^T E\{\mathbf{x}\} = \theta$ for all $\theta$
- Minimum variance: $\mathrm{var}\{\hat{\theta}\} = \mathbf{a}^T \mathrm{cov}\{\mathbf{x}\}\, \mathbf{a} = \mathbf{a}^T \mathbf{C} \mathbf{a}$ is minimal for all $\theta$

The first condition can only be satisfied if we assume a linear model for $\mathbf{m}_{\mathbf{x}} = E\{\mathbf{x}\}$: $E\{\mathbf{x}\} = \mathbf{h}\theta$, so that the condition becomes $\mathbf{a}^T \mathbf{h} = 1$.

Hence, we have to solve
$$\min_{\mathbf{a}} \mathbf{a}^T \mathbf{C} \mathbf{a} \quad\text{subject to}\quad \mathbf{a}^T \mathbf{h} = 1.$$

Best Linear Unbiased Estimation


Problem:
$$\min_{\mathbf{a}} \mathbf{a}^T \mathbf{C} \mathbf{a} \quad\text{subject to}\quad \mathbf{a}^T \mathbf{h} = 1$$

Solution:
$$\mathbf{a} = \frac{\mathbf{C}^{-1}\mathbf{h}}{\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h}}, \quad\text{i.e.,}\quad \hat{\theta} = \frac{\mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}}{\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h}}$$

Proof: Using the method of the Lagrange multipliers, we obtain
$$J(\mathbf{a}) = \mathbf{a}^T \mathbf{C} \mathbf{a} + \lambda\left(1 - \mathbf{a}^T \mathbf{h}\right).$$
Setting the gradient with respect to $\mathbf{a}$ to zero we get $2\mathbf{C}\mathbf{a} - \lambda\mathbf{h} = \mathbf{0}$, so $\mathbf{a} = \frac{\lambda}{2}\mathbf{C}^{-1}\mathbf{h}$. The Lagrange multiplier $\lambda$ is obtained by the constraint $\mathbf{a}^T \mathbf{h} = 1$: $\frac{\lambda}{2} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1}$.

Properties:
- $\mathrm{var}\{\hat{\theta}\} = \mathbf{a}^T \mathbf{C} \mathbf{a} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1}$

Best Linear Unbiased Estimation (Linear Model)

For the linear model the BLUE is given by
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}.$$

Remark: For the linear model the BLUE equals the MVU only when the noise is Gaussian.

Maximum Likelihood Estimation

Since the pdf of $\mathbf{x}$ depends on $\theta$, we often write it as a function that is parametrized on $\theta$: $p(\mathbf{x}; \theta)$. This function can also be interpreted as the likelihood function, since it tells us how likely it is to observe a certain $\mathbf{x}$ for a certain $\theta$.

The Maximum Likelihood Estimator (MLE) finds the $\theta$ that maximizes $p(\mathbf{x}; \theta)$.

The MLE is generally easy to derive. Asymptotically, the MLE has the same mean and variance as the MVU (although this does not make the MLE itself equivalent to the MVU).

Maximum Likelihood Estimation (Linear Gaussian Model)

For the linear Gaussian model, the likelihood function is given by
$$p(\mathbf{x}; \theta) = \frac{1}{(2\pi)^{N/2} \det^{1/2}(\mathbf{C})} \exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{h}\theta)\right).$$

It is clear that this function is maximized by solving
$$\min_{\theta}\, (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{h}\theta).$$

Maximum Likelihood Estimation (Linear Gaussian Model)

Problem:
$$\min_{\theta}\, (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{h}\theta)$$

Solution:
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}$$

Proof: Rewriting the cost function that we have to minimize, we get
$$J(\theta) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{x} - 2\theta\, \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x} + \theta^2\, \mathbf{h}^T \mathbf{C}^{-1} \mathbf{h}.$$
Setting the gradient with respect to $\theta$ to zero we get $-2\mathbf{h}^T \mathbf{C}^{-1} \mathbf{x} + 2\theta\, \mathbf{h}^T \mathbf{C}^{-1} \mathbf{h} = 0$.

Remark: For the linear Gaussian model, the MLE is equivalent to the MVU estimator.
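To illustrate the equivalence between maximizing the likelihood and minimizing the weighted least-squares cost, the sketch below (assuming SciPy is available) minimizes the cost numerically and compares the result with the closed-form solution; all data is synthetic.

```python
# Numerical MLE vs closed-form MLE for the linear Gaussian model.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
N, theta = 8, 0.7
h = rng.standard_normal(N)
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)
x = h * theta + rng.multivariate_normal(np.zeros(N), C)

Ci = np.linalg.inv(C)
cost = lambda t: (x - h * t) @ Ci @ (x - h * t)   # negative log-likelihood up to constants

theta_numeric = minimize_scalar(cost).x
theta_closed = (h @ Ci @ x) / (h @ Ci @ h)
print(theta_numeric, theta_closed)                # the two estimates coincide
```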

Least Squares Estimation

The Least Squares Estimator (LSE) finds the $\theta$ for which
$$J(\theta) = (\mathbf{x} - \mathbf{g}(\theta))^T (\mathbf{x} - \mathbf{g}(\theta))$$
is minimal.

Properties:
- No probabilistic assumptions required
- The performance highly depends on the noise

Least Squares Estimation (Linear Model)

For the linear model, the LSE solves the following problem.

Problem:
$$\min_{\theta}\, (\mathbf{x} - \mathbf{h}\theta)^T (\mathbf{x} - \mathbf{h}\theta)$$

Solution:
$$\hat{\theta} = (\mathbf{h}^T \mathbf{h})^{-1} \mathbf{h}^T \mathbf{x}$$

Proof: As before, with $\mathbf{C}$ replaced by $\mathbf{I}$.

Remark: For the linear model the LSE corresponds to the BLUE when the noise is white, and to the MVU when the noise is Gaussian and white.
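A short sketch of the LSE using NumPy's built-in least-squares solver follows; it also verifies the orthogonality condition derived on the next slide. The noise level is an arbitrary choice.

```python
# LSE for the linear model via np.linalg.lstsq, plus the orthogonality check.
import numpy as np

rng = np.random.default_rng(4)
N, theta = 8, 0.7
h = rng.standard_normal(N)
x = h * theta + 0.3 * rng.standard_normal(N)   # white noise here

theta_hat, *_ = np.linalg.lstsq(h[:, None], x, rcond=None)
print(theta_hat[0], (h @ x) / (h @ h))         # lstsq matches (h^T h)^{-1} h^T x
print(h @ (x - h * theta_hat[0]))              # ~0: residual orthogonal to h
```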

Least Squares Estimation (Linear Model)


Orthogonality Condition

Let us compute $\mathbf{h}^T (\mathbf{x} - \mathbf{h}\hat{\theta})$:
$$\mathbf{h}^T (\mathbf{x} - \mathbf{h}\hat{\theta}) = \mathbf{h}^T \mathbf{x} - \mathbf{h}^T \mathbf{h}\, (\mathbf{h}^T \mathbf{h})^{-1} \mathbf{h}^T \mathbf{x} = 0.$$

For the linear model the LSE leads to the following orthogonality condition:
$$\mathbf{h}^T (\mathbf{x} - \mathbf{h}\hat{\theta}) = 0,$$
i.e., the residual $\mathbf{x} - \mathbf{h}\hat{\theta}$ is orthogonal to the model vector $\mathbf{h}$.

The Bayesian Philosophy


$\theta$ is viewed as a random variable and we must estimate its particular realization. This allows us to use prior knowledge about $\theta$, i.e., its prior pdf $p(\theta)$.

Again, we would like to minimize the MSE, but this time both $\mathbf{x}$ and $\theta$ are random, hence the notation Bmse for Bayesian MSE.

Note the difference between these two MSEs:
$$\mathrm{mse}(\hat{\theta}) = E\{(\hat{\theta} - \theta)^2\} = \int (\hat{\theta} - \theta)^2\, p(\mathbf{x}; \theta)\, d\mathbf{x}$$
$$\mathrm{Bmse}(\hat{\theta}) = E\{(\hat{\theta} - \theta)^2\} = \int\!\!\int (\hat{\theta} - \theta)^2\, p(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta$$

Whereas the first MSE depends on $\theta$, the second MSE does not depend on $\theta$.

Minimum Mean Square Error Estimator


We know that $p(\mathbf{x}, \theta) = p(\theta|\mathbf{x})\, p(\mathbf{x})$, so that
$$\mathrm{Bmse}(\hat{\theta}) = \int \left[\int (\hat{\theta} - \theta)^2\, p(\theta|\mathbf{x})\, d\theta\right] p(\mathbf{x})\, d\mathbf{x}.$$

Since $p(\mathbf{x}) \ge 0$ for all $\mathbf{x}$, we have to minimize the inner integral for each $\mathbf{x}$.

Problem:
$$\min_{\hat{\theta}} \int (\hat{\theta} - \theta)^2\, p(\theta|\mathbf{x})\, d\theta$$

Solution: the mean of the posterior pdf of $\theta$:
$$\hat{\theta} = E\{\theta|\mathbf{x}\} = \int \theta\, p(\theta|\mathbf{x})\, d\theta$$

Proof: Setting the derivative with respect to $\hat{\theta}$ to zero we obtain
$$\int 2(\hat{\theta} - \theta)\, p(\theta|\mathbf{x})\, d\theta = 0 \quad\Rightarrow\quad \hat{\theta} = \int \theta\, p(\theta|\mathbf{x})\, d\theta.$$

Remarks: In contrast to the MVU estimator, the MMSE estimator always exists. The MMSE estimator has a smaller average MSE (Bayesian MSE) than the MVU, but the MMSE estimator is biased whereas the MVU estimator is unbiased.
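The posterior mean can also be computed by brute force. The sketch below does this on a grid for a toy scalar problem (one observation $x = \theta + n$ with Gaussian prior and Gaussian noise, variances chosen arbitrarily) and compares the result with the known closed form for that toy problem.

```python
# MMSE estimate as the posterior mean, computed on a grid.
import numpy as np

s2, st2, x = 0.5, 2.0, 1.3            # noise var, prior var, observation
t = np.linspace(-10, 10, 200_001)     # grid over theta

# Unnormalized posterior p(theta|x) ∝ p(x|theta) p(theta)
post = np.exp(-(x - t) ** 2 / (2 * s2) - t ** 2 / (2 * st2))
post /= post.sum()                    # normalize on the grid

theta_mmse = (t * post).sum()         # posterior mean
print(theta_mmse, st2 / (st2 + s2) * x)   # grid result vs closed form
```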

Minimum Mean Square Error Estimator (Linear Gaussian Model)

For the linear Gaussian model where $\theta$ is assumed to be Gaussian with mean $0$ and variance $\sigma_\theta^2$, the MMSE estimator can be found by means of the conditional pdf of a Gaussian vector random variable [Kay93]:
$$\hat{\theta} = \sigma_\theta^2\, \mathbf{h}^T \left(\sigma_\theta^2\, \mathbf{h}\mathbf{h}^T + \mathbf{C}\right)^{-1} \mathbf{x} = \left(\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h} + \sigma_\theta^{-2}\right)^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x},$$
where the last equality is due to the matrix inversion lemma (see notes).

Remark: Compare this with the MVU for the linear Gaussian model.
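The two expressions above are easy to check numerically; the sketch below evaluates both sides of the matrix inversion lemma identity for random data (all values synthetic).

```python
# Numerical check that the two MMSE expressions agree.
import numpy as np

rng = np.random.default_rng(5)
N, st2 = 8, 2.0                       # st2 plays the role of sigma_theta^2
h = rng.standard_normal(N)
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)
x = rng.standard_normal(N)

# sigma_theta^2 h^T (sigma_theta^2 h h^T + C)^{-1} x
form1 = st2 * h @ np.linalg.solve(st2 * np.outer(h, h) + C, x)
# (h^T C^{-1} h + sigma_theta^{-2})^{-1} h^T C^{-1} x
form2 = (h @ np.linalg.solve(C, x)) / (h @ np.linalg.solve(C, h) + 1 / st2)
print(form1, form2)                   # identical up to rounding
```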

Linear Minimum Mean Square Error Estimator

As for the BLUE, we now constrain the estimator to have the form $\hat{\theta} = \mathbf{a}^T \mathbf{x}$.

The Bayesian MSE can then be written as
$$\mathrm{Bmse}(\hat{\theta}) = E\{(\mathbf{a}^T \mathbf{x} - \theta)^2\} = \mathbf{a}^T E\{\mathbf{x}\mathbf{x}^T\}\, \mathbf{a} - 2\mathbf{a}^T E\{\mathbf{x}\theta\} + E\{\theta^2\}.$$

Setting the derivative with respect to $\mathbf{a}$ to zero, we obtain
$$2\, E\{\mathbf{x}\mathbf{x}^T\}\, \mathbf{a} - 2\, E\{\mathbf{x}\theta\} = \mathbf{0}.$$

The LMMSE estimator is therefore given by
$$\hat{\theta} = E\{\theta\, \mathbf{x}^T\} \left(E\{\mathbf{x}\mathbf{x}^T\}\right)^{-1} \mathbf{x}.$$

Linear Minimum Mean Square Error Estimator


Orthogonality Condition

Let us compute $E\{(\hat{\theta} - \theta)\, \mathbf{x}^T\}$:
$$E\{(\hat{\theta} - \theta)\, \mathbf{x}^T\} = E\{\theta\, \mathbf{x}^T\} \left(E\{\mathbf{x}\mathbf{x}^T\}\right)^{-1} E\{\mathbf{x}\mathbf{x}^T\} - E\{\theta\, \mathbf{x}^T\} = \mathbf{0}.$$

The LMMSE leads to the following orthogonality condition:
$$E\{(\hat{\theta} - \theta)\, \mathbf{x}^T\} = \mathbf{0},$$
i.e., the estimation error is uncorrelated with the data.

Linear Minimum Mean Square Error Estimator (Linear Model)

For the linear model where $\theta$ is assumed to have mean $0$ and variance $\sigma_\theta^2$, the LMMSE estimator is given by
$$\hat{\theta} = \sigma_\theta^2\, \mathbf{h}^T \left(\sigma_\theta^2\, \mathbf{h}\mathbf{h}^T + \mathbf{C}\right)^{-1} \mathbf{x} = \left(\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h} + \sigma_\theta^{-2}\right)^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x},$$
where the last equality is again due to the matrix inversion lemma.

Remark: The LMMSE estimator is equivalent to the MMSE estimator when the noise and the unknown parameter are Gaussian.

Summary

$\theta$ deterministic:
- MVU — linear model: ?; linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x}$
- BLUE — linear model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x}$; linear Gaussian model: same as linear model
- MLE — linear model: ?; linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x}$
- LSE — linear model: $\hat{\theta} = (\mathbf{h}^T\mathbf{h})^{-1}\mathbf{h}^T\mathbf{x}$; linear Gaussian model: same as linear model

$\theta$ stochastic with mean $0$ and variance $\sigma_\theta^2$ (Gaussian in the linear Gaussian model):
- MMSE — linear model: ?; linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T\mathbf{C}^{-1}\mathbf{h} + \sigma_\theta^{-2})^{-1}\mathbf{h}^T\mathbf{C}^{-1}\mathbf{x}$
- LMMSE — linear model: $\hat{\theta} = (\mathbf{h}^T\mathbf{C}^{-1}\mathbf{h} + \sigma_\theta^{-2})^{-1}\mathbf{h}^T\mathbf{C}^{-1}\mathbf{x}$; linear Gaussian model: same as linear model

Extensions to Complex Vector Parameters

The previous results extend to the estimation of a complex vector parameter $\boldsymbol{\theta}$ from $\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{n}$, with the model vector $\mathbf{h}$ replaced by a model matrix $\mathbf{H}$ and transposes replaced by Hermitian transposes $(\cdot)^H$.

$\boldsymbol{\theta}$ deterministic:
- MVU — linear model: ?; linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^H \mathbf{C}^{-1}\mathbf{x}$
- BLUE — linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^H \mathbf{C}^{-1}\mathbf{x}$; linear Gaussian model: same as linear model
- MLE — linear model: ?; linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^H \mathbf{C}^{-1}\mathbf{x}$
- LSE — linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H\mathbf{x}$; linear Gaussian model: same as linear model

$\boldsymbol{\theta}$ stochastic with mean $\mathbf{0}$ and covariance $\mathbf{C}_{\boldsymbol{\theta}}$ (Gaussian in the linear Gaussian model):
- MMSE — linear model: ?; linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{C}^{-1}\mathbf{H} + \mathbf{C}_{\boldsymbol{\theta}}^{-1})^{-1}\mathbf{H}^H\mathbf{C}^{-1}\mathbf{x}$
- LMMSE — linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{C}^{-1}\mathbf{H} + \mathbf{C}_{\boldsymbol{\theta}}^{-1})^{-1}\mathbf{H}^H\mathbf{C}^{-1}\mathbf{x}$; linear Gaussian model: same as linear model

 

Application to Communications

Consider a simple baseband communication link: the transmitted symbol sequence $s[n]$ passes through a channel and is corrupted by additive noise $w[n]$, producing the received sequence
$$x[n] = \sum_{l=0}^{L-1} h_l\, s[n-l] + w[n],$$
where the symbol block $\mathbf{s} = [s[0], \dots, s[K-1]]^T$ has length $K$ and the channel impulse response $\mathbf{h} = [h_0, \dots, h_{L-1}]^T$ has length $L$.

[Figure: block diagram showing the symbol sequence $s[n]$ passing through the channel $\mathbf{h}$, with additive noise $w[n]$, producing the received sequence $x[n]$.]

Application to Communications

Defining $\mathbf{x} = [x[0], \dots, x[N-1]]^T$ and $\mathbf{w} = [w[0], \dots, w[N-1]]^T$, we obtain two equivalent linear models for the received block.

Channel estimation model: $\mathbf{x} = \mathbf{S}\mathbf{h} + \mathbf{w}$, where $\mathbf{S}$ is the $N \times L$ Toeplitz matrix of transmitted symbols,
$$\mathbf{S} = \begin{bmatrix} s[0] & s[-1] & \cdots & s[-L+1] \\ s[1] & s[0] & \cdots & s[-L+2] \\ \vdots & \vdots & \ddots & \vdots \\ s[N-1] & s[N-2] & \cdots & s[N-L] \end{bmatrix}.$$

Symbol estimation model: $\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{w}$, where $\mathbf{H}$ is the $N \times K$ Toeplitz matrix of channel coefficients with entries $[\mathbf{H}]_{n,k} = h_{n-k}$ (and $h_l = 0$ for $l < 0$ or $l \ge L$).
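The two Toeplitz data models can be built directly with scipy.linalg.toeplitz. The sketch below (with illustrative block sizes, real-valued for simplicity) verifies that both factorizations reproduce the same convolution.

```python
# Build the channel estimation and symbol estimation models from one convolution.
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(6)
L, K = 3, 6                       # channel length L, symbol block length K
h = rng.standard_normal(L)
s = rng.standard_normal(K)

# Full convolution output x[n] = sum_l h[l] s[n-l], length K + L - 1
x = np.convolve(h, s)

# Channel estimation model: x = S h, S built from the (known) symbols
S = toeplitz(np.r_[s, np.zeros(L - 1)], np.r_[s[0], np.zeros(L - 1)])
print(np.allclose(x, S @ h))      # True

# Symbol estimation model: x = H s, H built from the (known) channel taps
H = toeplitz(np.r_[h, np.zeros(K - 1)], np.r_[h[0], np.zeros(K - 1)])
print(np.allclose(x, H @ s))      # True
```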

Application to Communications

Most communication systems (GSM, UMTS, WLAN, ...) consist of two periods:
- Training period: During this period we try to estimate the channel by transmitting some known symbols, also known as training symbols or pilots.
- Data period: During this period we use the estimated channel to recover the unknown data symbols that convey useful information.

What kind of processing do we use in each of these periods?
- During the training period we use one of the previously developed estimation techniques on the channel estimation model, $\mathbf{x} = \mathbf{S}\mathbf{h} + \mathbf{w}$, assuming that $\mathbf{S}$ is known.
- During the data period we use one of the previously developed estimation techniques on the symbol estimation model, $\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{w}$, assuming that $\mathbf{H}$ is known.

Application to Communications
Channel estimation

Let us assume that $\mathrm{cov}\{\mathbf{w}\} = \sigma^2 \mathbf{I}$ (white noise).

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):
$$\hat{\mathbf{h}} = (\mathbf{S}^H \mathbf{S})^{-1} \mathbf{S}^H \mathbf{x}$$

LMMSE (or when the noise and channel are Gaussian also the MMSE):
$$\hat{\mathbf{h}} = (\mathbf{S}^H \mathbf{S} + \sigma^2 \mathbf{C}_{\mathbf{h}}^{-1})^{-1} \mathbf{S}^H \mathbf{x}$$

Remark: Note that the LMMSE estimator requires the knowledge of $\mathbf{C}_{\mathbf{h}}$, which is generally not available.

Application to Communications
Symbol estimation

Let us assume that $\mathrm{cov}\{\mathbf{w}\} = \sigma^2 \mathbf{I}$ (white noise).

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):
$$\hat{\mathbf{s}} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \mathbf{x}$$

LMMSE (or when the noise and symbols are Gaussian also the MMSE):
$$\hat{\mathbf{s}} = (\mathbf{H}^H \mathbf{H} + \sigma^2 \mathbf{C}_{\mathbf{s}}^{-1})^{-1} \mathbf{H}^H \mathbf{x}$$

Remark: Note that the LMMSE estimator requires the knowledge of $\mathbf{C}_{\mathbf{s}}$, which can be set to $\sigma_s^2 \mathbf{I}$ if the data symbols have energy $\sigma_s^2$ and are uncorrelated.
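Putting the two periods together, the sketch below runs a toy baseband link: LS channel estimation from known pilots, followed by LMMSE-style symbol estimation with the estimated channel. All sizes, the BPSK-like constellation, and the noise level are illustrative assumptions, and everything is real-valued for simplicity.

```python
# End-to-end toy example: training-based channel estimation, then symbol estimation.
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(7)
L, K, sigma2, Es = 3, 8, 0.01, 1.0     # channel length, block length, noise var, symbol energy

def conv_matrix(v, ncols):
    # Tall Toeplitz matrix T such that T @ u == np.convolve(v, u)
    return toeplitz(np.r_[v, np.zeros(ncols - 1)],
                    np.r_[v[0], np.zeros(ncols - 1)])

h = rng.standard_normal(L)             # the true (unknown) channel

# Training period: known pilots, estimate h with LS (= BLUE for white noise)
pilots = rng.choice([-1.0, 1.0], K)
S = conv_matrix(pilots, L)
x_train = S @ h + np.sqrt(sigma2) * rng.standard_normal(S.shape[0])
h_hat = np.linalg.lstsq(S, x_train, rcond=None)[0]

# Data period: unknown symbols, estimate s with the LMMSE-style formula
# (H^H H + sigma^2/Es I)^{-1} H^H x, using the estimated channel
s = rng.choice([-1.0, 1.0], K)
H = conv_matrix(h_hat, K)
x_data = conv_matrix(h, K) @ s + np.sqrt(sigma2) * rng.standard_normal(K + L - 1)
s_hat = np.linalg.solve(H.T @ H + sigma2 / Es * np.eye(K), H.T @ x_data)
print(np.sign(s_hat) == s)             # at this SNR the detected symbols should all match
```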
