
ESTIMATION THEORY

Outline

1. Random Variables
2. Introduction
3. Estimation techniques
4. Extensions to Complex Vector Parameters
5. Application to communication systems

[Kay93] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New Jersey, 1993.
[Cover-Thomas91] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.

Random Variables
Definitions

A random variable X is a function that assigns a number to every outcome of an experiment.

A random variable X is completely characterized by:

Its probability density function (pdf): p_X(x), with p_X(x) \ge 0 and \int_{-\infty}^{\infty} p_X(x)\, dx = 1

Its cumulative distribution function (cdf): F_X(x) = \Pr[X \le x] = \int_{-\infty}^{x} p_X(u)\, du

Properties: F_X(x) lies between 0 and 1 and is non-decreasing. The probability that X lies between x_1 and x_2 is then given by

\Pr[x_1 < X \le x_2] = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} p_X(x)\, dx

The mean of X is given by

m_X = E[X] = \int_{-\infty}^{\infty} x\, p_X(x)\, dx

The variance of X is given by

\mathrm{var}[X] = E[(X - m_X)^2] = \int_{-\infty}^{\infty} (x - m_X)^2\, p_X(x)\, dx

Random Variables
Examples

Uniform random variable:

pdf: p_X(x) = \frac{1}{b-a} for a \le x \le b, and p_X(x) = 0 otherwise

mean and variance: E[X] = \frac{a+b}{2}, \quad \mathrm{var}[X] = \frac{(b-a)^2}{12}

Gaussian random variable:

pdf: p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left(-\frac{(x-m)^2}{2\sigma^2}\right)

mean and variance: E[X] = m, \quad \mathrm{var}[X] = \sigma^2
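
As a quick numerical illustration of these two examples, the following sketch (assuming NumPy is available; the values of a, b, m and sigma are arbitrary choices) draws many samples and compares the empirical mean and variance with the expressions above.

import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 5.0        # uniform random variable on [a, b]
m, sigma = 1.0, 0.5    # Gaussian random variable with mean m and standard deviation sigma
N = 100_000

u = rng.uniform(a, b, N)
g = rng.normal(m, sigma, N)

# uniform: mean (a+b)/2, variance (b-a)^2/12
print(u.mean(), (a + b) / 2, u.var(), (b - a) ** 2 / 12)
# Gaussian: mean m, variance sigma^2
print(g.mean(), m, g.var(), sigma ** 2)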

Random Variables
Two random variables

For two random variables X and Y, we can define:

The joint cdf: F_{X,Y}(x,y) = \Pr[X \le x, Y \le y]

The joint pdf: p_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}

The marginal pdfs p_X(x) and p_Y(y) can then be determined by

p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dy \quad and \quad p_Y(y) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dx

The conditional pdfs p_{X|Y}(x|y) and p_{Y|X}(y|x) are given by

p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \quad and \quad p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}

From this follows the popular Bayes rule

p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)}

For independent random variables X and Y we have p_{X,Y}(x,y) = p_X(x)\, p_Y(y) and p_{X|Y}(x|y) = p_X(x).

Random Variables
Function of random variables

Suppose Z is a function of the random variables X and Y, e.g., Z = g(X, Y).

Corresponding increments in the cdf of Z and the joint cdf of X and Y are the same. Hence, the expectation over Z equals the joint expectation over X and Y.

The mean of Z is given by

m_Z = E[Z] = \int_{-\infty}^{\infty} z\, p_Z(z)\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, p_{X,Y}(x,y)\, dx\, dy

The variance of Z is given by

\mathrm{var}[Z] = E[(Z - m_Z)^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \big(g(x,y) - m_Z\big)^2\, p_{X,Y}(x,y)\, dx\, dy

Random Variables
Vector random variables

A vector random variable X is a vector of random variables: X = [X_1, X_2, \ldots, X_N]^T.

Its cdf/pdf is the joint cdf/pdf of all these random variables.

The mean of X is given by

m_X = E[X] = [E[X_1], E[X_2], \ldots, E[X_N]]^T

The covariance matrix of X is given by

\mathrm{cov}[X] = C_X = E[(X - m_X)(X - m_X)^T]
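
A minimal sketch (NumPy assumed; the 2-dimensional mean vector and covariance matrix are arbitrary illustrative values) showing how the mean vector and covariance matrix are approximated from realizations of a vector random variable.

import numpy as np

rng = np.random.default_rng(1)
m_x = np.array([1.0, -2.0])                # mean vector
C_x = np.array([[2.0, 0.5],
                [0.5, 1.0]])               # covariance matrix

X = rng.multivariate_normal(m_x, C_x, size=50_000)   # each row is one realization

m_hat = X.mean(axis=0)                     # sample estimate of E[X]
C_hat = np.cov(X, rowvar=False)            # sample estimate of E[(X - m_X)(X - m_X)^T]
print(m_hat)
print(C_hat)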




Introduction
Problem Statement

Suppose we have an unknown scalar parameter \theta that we want to estimate from an observed vector x, which is related to \theta through the following relationship:

x = f(\theta) + w

where f(\cdot) is a known function and w is a random noise vector with probability density function (pdf) p_W(w).

The estimator is of the form \hat{\theta} = g(x). Note that \hat{\theta} itself is a random variable. Hence, the performance of the estimator \hat{\theta} should be described statistically.

Introduction
Special Models

To solve any estimation problem, we need a model. Here, we will look deeper into two specific models:

The linear model: The relationship between x and \theta is then given by

x = h\theta + w

where h is the model vector and w is the noise vector, which is assumed to have mean 0, m_W = E[w] = 0, and covariance matrix C = \mathrm{cov}[w] = E[w w^T].

The linear Gaussian model: This model is a special case of the linear model, where the noise vector w is assumed to be Gaussian (or normal) distributed:

p_W(w) = \frac{1}{(2\pi)^{N/2}\, \det^{1/2}(C)}\, \exp\!\left(-\tfrac{1}{2}\, w^T C^{-1} w\right)

where N is the length of w.
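
To make the linear Gaussian model concrete, here is a small simulation sketch (NumPy assumed; the parameter value theta, the model vector h and the noise covariance C are arbitrary illustrative choices). The same kind of setup is reused in the estimator sketches further on.

import numpy as np

rng = np.random.default_rng(2)

N = 20
theta = 0.7                          # the (for the simulation, known) scalar parameter
h = rng.standard_normal(N)           # model vector
C = 0.1 * np.eye(N)                  # noise covariance matrix (white noise in this example)

# linear Gaussian model: x = h*theta + w with w ~ N(0, C)
w = rng.multivariate_normal(np.zeros(N), C)
x = h * theta + w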

Estimation Techniques

We can view the unknown parameter \theta as a deterministic variable:

Minimum Variance Unbiased (MVU) Estimator
Best Linear Unbiased Estimator (BLUE)
Maximum Likelihood Estimator (MLE)
Least Squares Estimator (LSE)

The Bayesian philosophy: \theta is viewed as a random variable:

Minimum Mean Square Error (MMSE) Estimator
Linear Minimum Mean Square Error (LMMSE) Estimator

Minimum Variance Unbiased Estimation


A natural criterion that comes to mind is the Mean Square Error (MSE):

\mathrm{mse}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{var}(\hat{\theta}) + \big(E[\hat{\theta}] - \theta\big)^2

The MSE does not only depend on the variance but also on the bias. This means that an estimator that tries to minimize the MSE will often depend on the unknown parameter \theta, and is therefore unrealizable.

Solution: constrain the bias to zero and minimize the variance, which leads to the so-called Minimum Variance Unbiased (MVU) estimator:

unbiased: m_{\hat{\theta}} = E[\hat{\theta}] = \theta for all \theta

minimum variance: \mathrm{var}(\hat{\theta}) is minimal for all \theta

Remark: The MVU does not always exist and is generally difficult to find.

Minimum Variance Unbiased Estimation (Linear Gaussian Model)

For the linear Gaussian model the MVU exists and its solution can be found by means of
the Cramer-Rao lower bound (see notes, [Kay93], [Cover-Thomas91]):


 

\hat{\theta} = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Properties: \hat{\theta} is Gaussian distributed, i.e., \hat{\theta} \sim N\big(\theta, (h^T C^{-1} h)^{-1}\big), so that E[\hat{\theta}] = \theta and \mathrm{var}(\hat{\theta}) = (h^T C^{-1} h)^{-1}.
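
A sketch of this estimator in the simulation setup introduced earlier (NumPy assumed, values illustrative). A small Monte Carlo loop checks that the estimate is unbiased and that its variance is close to (h^T C^{-1} h)^{-1}.

import numpy as np

rng = np.random.default_rng(3)
N, theta = 20, 0.7
h = rng.standard_normal(N)
C = 0.1 * np.eye(N)
Cinv = np.linalg.inv(C)

def mvu(x, h, Cinv):
    # theta_hat = (h^T C^-1 h)^-1 h^T C^-1 x
    return (h @ Cinv @ x) / (h @ Cinv @ h)

trials = 5000
est = np.empty(trials)
for i in range(trials):
    w = rng.multivariate_normal(np.zeros(N), C)
    est[i] = mvu(h * theta + w, h, Cinv)

print(est.mean(), theta)                      # empirical mean vs true theta (unbiased)
print(est.var(), 1.0 / (h @ Cinv @ h))        # empirical variance vs (h^T C^-1 h)^-1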

Best Linear Unbiased Estimation





In this case we constrain the estimator to have the form \hat{\theta} = a^T x.

Unbiased: E[\hat{\theta}] = a^T E[x] = \theta for all \theta

Minimum variance: \mathrm{var}(\hat{\theta}) = a^T \mathrm{cov}[x]\, a = a^T C a is minimal for all \theta

The first condition can only be satisfied if we assume a linear model for m_x: m_x = E[x] = h\theta.

Hence, we have to solve

\min_a\; a^T C a \quad subject to \quad a^T h = 1

Best Linear Unbiased Estimation


Problem:

\min_a\; a^T C a \quad subject to \quad a^T h = 1

Solution:

a = \frac{C^{-1} h}{h^T C^{-1} h}

Proof:
Using the method of the Lagrange multipliers, we obtain

J(a) = a^T C a + \lambda\,(a^T h - 1)

Setting the gradient with respect to a to zero we get

2 C a + \lambda h = 0 \;\Rightarrow\; a = -\frac{\lambda}{2}\, C^{-1} h

The Lagrange multiplier \lambda is obtained by the constraint a^T h = 1:

-\frac{\lambda}{2}\, h^T C^{-1} h = 1 \;\Rightarrow\; -\frac{\lambda}{2} = \frac{1}{h^T C^{-1} h} \;\Rightarrow\; a = \frac{C^{-1} h}{h^T C^{-1} h}

Properties:

E[\hat{\theta}] = \theta \quad and \quad \mathrm{var}(\hat{\theta}) = a^T C a = \frac{1}{h^T C^{-1} h}

Best Linear Unbiased Estimation (Linear Model)

For the linear model the BLUE is given by

\hat{\theta} = a^T x = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Remark: For the linear model the BLUE equals the MVU only when the noise is Gaussian.
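
Since the BLUE only uses the mean and covariance of the noise, the sketch below (NumPy assumed, arbitrary scenario) applies it to colored, uniformly distributed (non-Gaussian) noise and checks that it remains unbiased with variance (h^T C^{-1} h)^{-1}.

import numpy as np

rng = np.random.default_rng(4)
N, theta = 20, 0.7
h = rng.standard_normal(N)

A = 0.2 * rng.standard_normal((N, N))    # coloring matrix
d = 0.5                                  # driving noise uniform on [-d, d]
C = (d ** 2 / 3) * A @ A.T               # resulting noise covariance
Cinv = np.linalg.inv(C)

a = Cinv @ h / (h @ Cinv @ h)            # BLUE weights: a = C^-1 h / (h^T C^-1 h)

est = np.empty(5000)
for i in range(est.size):
    w = A @ rng.uniform(-d, d, N)        # colored, non-Gaussian noise with covariance C
    est[i] = a @ (h * theta + w)

print(est.mean(), theta)                 # unbiased
print(est.var(), 1.0 / (h @ Cinv @ h))   # minimum variance among linear unbiased estimators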

Maximum Likelihood Estimation

Since the pdf of x depends on \theta, we often write it as a function that is parametrized on \theta: p(x; \theta). This function can also be interpreted as the likelihood function, since it tells us how likely it is to observe a certain x. The Maximum Likelihood Estimator (MLE) finds the \theta that maximizes p(x; \theta) for a certain x.

The MLE is generally easy to derive.

Asymptotically, the MLE has the same mean and variance as the MVU estimator (but this does not mean it is asymptotically equivalent to the MVU).

Maximum Likelihood Estimation (Linear Gaussian Model)

For the linear Gaussian model, the likelihood function is given by

p(x; \theta) = \frac{1}{(2\pi)^{N/2}\, \det^{1/2}(C)}\, \exp\!\left(-\tfrac{1}{2}\,(x - h\theta)^T C^{-1} (x - h\theta)\right)

It is clear that this function is maximized by solving

\min_\theta\; (x - h\theta)^T C^{-1} (x - h\theta)

Maximum Likelihood Estimation (Linear Gaussian Model)

 

Problem:

\min_\theta\; (x - h\theta)^T C^{-1} (x - h\theta)

Solution:

\hat{\theta} = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Proof:
Rewriting the cost function that we have to minimize, we get

(x - h\theta)^T C^{-1} (x - h\theta) = x^T C^{-1} x - 2\theta\, h^T C^{-1} x + \theta^2\, h^T C^{-1} h

Setting the gradient with respect to \theta to zero we get

-2\, h^T C^{-1} x + 2\theta\, h^T C^{-1} h = 0 \;\Rightarrow\; \hat{\theta} = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Remark: For the linear Gaussian model, the MLE is equivalent to the MVU estimator.
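
The sketch below (NumPy assumed, illustrative values) double-checks this equivalence numerically: it maximizes the log-likelihood over a grid of theta values and compares the result with the closed-form solution.

import numpy as np

rng = np.random.default_rng(5)
N, theta = 20, 0.7
h = rng.standard_normal(N)
C = 0.1 * np.eye(N)
Cinv = np.linalg.inv(C)
x = h * theta + rng.multivariate_normal(np.zeros(N), C)

def neg_log_like(t):
    # up to constants: 0.5 * (x - h t)^T C^-1 (x - h t)
    r = x - h * t
    return 0.5 * r @ Cinv @ r

grid = np.linspace(-2.0, 2.0, 4001)
theta_grid = grid[np.argmin([neg_log_like(t) for t in grid])]   # numerical MLE
theta_closed = (h @ Cinv @ x) / (h @ Cinv @ h)                  # closed-form MLE (= MVU)
print(theta_grid, theta_closed)   # agree up to the grid resolution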


Least Squares Estimation

The Least Squares Estimator (LSE) finds the \theta for which

J(\theta) = (x - f(\theta))^T (x - f(\theta))

is minimal.

Properties:
No probabilistic assumptions required
The performance highly depends on the noise

Least Squares Estimation (Linear Model)

For the linear model, the LSE solves

Problem:

\min_\theta\; (x - h\theta)^T (x - h\theta)

Solution:

\hat{\theta} = (h^T h)^{-1} h^T x

Proof: As before (the MLE derivation with C replaced by I).
Remark: For the linear model the LSE corresponds to the BLUE when the noise is white,
and to the MVU when the noise is Gaussian and white.


Least Squares Estimation (Linear Model)


Orthogonality Condition



Let us compute h^T (x - h\hat{\theta}_{LSE}):

h^T (x - h\hat{\theta}_{LSE}) = h^T x - h^T h\, (h^T h)^{-1} h^T x = 0

Hence, for the linear model the LSE leads to the following orthogonality condition:

h^T (x - h\hat{\theta}_{LSE}) = 0
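
A quick numerical illustration of the orthogonality condition (NumPy assumed, arbitrary data): the least-squares residual is orthogonal to the model vector up to rounding error.

import numpy as np

rng = np.random.default_rng(6)
N = 20
h = rng.standard_normal(N)
x = 0.7 * h + 0.3 * rng.standard_normal(N)   # any observed data

theta_lse = (h @ x) / (h @ h)                # (h^T h)^-1 h^T x
residual = x - h * theta_lse
print(h @ residual)                          # ~ 0:  h^T (x - h*theta_lse) = 0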

The Bayesian Philosophy


\theta is viewed as a random variable and we must estimate its particular realization.

This allows us to use prior knowledge about \theta, i.e., its prior pdf p(\theta).

Again, we would like to minimize the MSE

\mathrm{Bmse}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]

but this time both \hat{\theta} and \theta are random, hence the notation Bmse for Bayesian MSE.

Note the difference between these two MSEs:

\mathrm{mse}(\hat{\theta}) = \int (\hat{\theta} - \theta)^2\, p(x; \theta)\, dx

\mathrm{Bmse}(\hat{\theta}) = \int\!\!\int (\hat{\theta} - \theta)^2\, p(x, \theta)\, dx\, d\theta

Whereas the first MSE depends on \theta, the second MSE does not depend on \theta.

Minimum Mean Square Error Estimator




We know that p(x, \theta) = p(\theta | x)\, p(x), so that

\mathrm{Bmse}(\hat{\theta}) = \int \left[ \int (\hat{\theta} - \theta)^2\, p(\theta | x)\, d\theta \right] p(x)\, dx

Since p(x) \ge 0 for all x, we have to minimize the inner integral for each x.

Problem:

\min_{\hat{\theta}}\; \int (\hat{\theta} - \theta)^2\, p(\theta | x)\, d\theta

Solution:

mean of the posterior pdf of \theta: \hat{\theta} = E[\theta | x] = \int \theta\, p(\theta | x)\, d\theta

Proof: Setting the derivative with respect to \hat{\theta} to zero we obtain:

\frac{\partial}{\partial \hat{\theta}} \int (\hat{\theta} - \theta)^2\, p(\theta | x)\, d\theta = \int 2(\hat{\theta} - \theta)\, p(\theta | x)\, d\theta = 2\hat{\theta} - 2\int \theta\, p(\theta | x)\, d\theta = 0
Remarks:
In contrast to the MVU estimator the MMSE estimator always exists.
The MMSE has a smaller average MSE (Bayesian MSE) than the MVU, but the MMSE
estimator is biased whereas the MVU estimator is unbiased.

Minimum Mean Square Error Estimator (Linear Gaussian Model)

For the linear Gaussian model where \theta is assumed to be Gaussian with mean 0 and variance \sigma_\theta^2, the MMSE estimator can be found by means of the conditional pdf of a Gaussian vector random variable [Kay93]:

\hat{\theta} = E[\theta | x] = \sigma_\theta^2\, h^T (\sigma_\theta^2\, h h^T + C)^{-1} x = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1}\, h^T C^{-1} x

where the last equality is due to the matrix inversion lemma (see notes).

Remark: Compare this with the MVU for the linear Gaussian model.
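
The following sketch (NumPy assumed, illustrative values) evaluates both forms of this estimator, confirming numerically that the matrix inversion lemma makes them identical, and also prints the MVU estimate, which is approached as the prior variance sigma_theta^2 grows.

import numpy as np

rng = np.random.default_rng(7)
N = 20
h = rng.standard_normal(N)
C = 0.1 * np.eye(N)
Cinv = np.linalg.inv(C)
sigma2_theta = 0.5                                       # prior variance of theta

theta = np.sqrt(sigma2_theta) * rng.standard_normal()    # theta ~ N(0, sigma2_theta)
x = h * theta + rng.multivariate_normal(np.zeros(N), C)

# form 1: sigma_theta^2 h^T (sigma_theta^2 h h^T + C)^-1 x
form1 = sigma2_theta * h @ np.linalg.solve(sigma2_theta * np.outer(h, h) + C, x)
# form 2: (h^T C^-1 h + 1/sigma_theta^2)^-1 h^T C^-1 x   (matrix inversion lemma)
form2 = (h @ Cinv @ x) / (h @ Cinv @ h + 1.0 / sigma2_theta)
# MVU for comparison
mvu = (h @ Cinv @ x) / (h @ Cinv @ h)
print(form1, form2, mvu)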

Linear Minimum Mean Square Error Estimator




As for the BLUE, we now constrain the estimator to have the form \hat{\theta} = a^T x.

The Bayesian MSE can then be written as

\mathrm{Bmse}(\hat{\theta}) = E[(a^T x - \theta)^2] = a^T E[x x^T]\, a - 2\, a^T E[x\theta] + E[\theta^2]

Setting the derivative with respect to a to zero, we obtain

2\, E[x x^T]\, a - 2\, E[x\theta] = 0 \;\Rightarrow\; a = (E[x x^T])^{-1} E[x\theta]

The LMMSE estimator is therefore given by

\hat{\theta} = a^T x = E[\theta x^T]\, (E[x x^T])^{-1} x

Linear Minimum Mean Square Error Estimator


Orthogonality Condition





Let us compute E[(\theta - \hat{\theta})\, x]:

E[(\theta - a^T x)\, x] = E[\theta x] - E[x x^T]\, a = E[\theta x] - E[x x^T]\, (E[x x^T])^{-1} E[x\theta] = 0

Hence, the LMMSE leads to the following orthogonality condition:

E[(\theta - \hat{\theta})\, x] = 0
Linear Minimum Mean Square Error Estimator (Linear Model)

For the linear model where \theta is assumed to have mean 0 and variance \sigma_\theta^2, the LMMSE estimator is given by

\hat{\theta} = \sigma_\theta^2\, h^T (\sigma_\theta^2\, h h^T + C)^{-1} x = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1}\, h^T C^{-1} x

where the last equality is again due to the matrix inversion lemma.

Remark: The LMMSE estimator is equivalent to the MMSE estimator when the noise and
the unknown parameter are Gaussian.
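
To illustrate the earlier remark that the Bayesian estimators trade bias for a smaller average MSE, the Monte Carlo sketch below (NumPy assumed, illustrative values) draws theta from its prior in each trial and compares the Bayesian MSE of the LMMSE estimator with that of the classical MVU/BLUE estimator.

import numpy as np

rng = np.random.default_rng(8)
N, sigma2_theta = 10, 0.2
h = rng.standard_normal(N)
C = 0.5 * np.eye(N)
Cinv = np.linalg.inv(C)
hCh = h @ Cinv @ h

err_lmmse, err_mvu = [], []
for _ in range(20_000):
    theta = np.sqrt(sigma2_theta) * rng.standard_normal()       # theta drawn from its prior
    x = h * theta + rng.multivariate_normal(np.zeros(N), C)
    hCx = h @ Cinv @ x
    err_lmmse.append(hCx / (hCh + 1.0 / sigma2_theta) - theta)  # LMMSE error
    err_mvu.append(hCx / hCh - theta)                           # MVU/BLUE error

print(np.mean(np.square(err_lmmse)))   # Bayesian MSE of the LMMSE estimator (smaller)
print(np.mean(np.square(err_mvu)))     # Bayesian MSE of the MVU/BLUE estimator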


Summary

Scalar parameter \theta, linear model x = h\theta + w with E[w] = 0 and \mathrm{cov}[w] = C; in the linear Gaussian model, w is in addition Gaussian.

\theta deterministic:

MVU (linear Gaussian model): \hat{\theta} = (h^T C^{-1} h)^{-1} h^T C^{-1} x
BLUE (linear model): \hat{\theta} = (h^T C^{-1} h)^{-1} h^T C^{-1} x; linear Gaussian model: same as linear model
MLE (linear Gaussian model): \hat{\theta} = (h^T C^{-1} h)^{-1} h^T C^{-1} x
LSE (linear model): \hat{\theta} = (h^T h)^{-1} h^T x; linear Gaussian model: same as linear model

\theta stochastic with mean 0 and var. \sigma_\theta^2:

LMMSE (linear model): \hat{\theta} = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1} h^T C^{-1} x; linear Gaussian model: same as linear model

\theta Gaussian with mean 0 and var. \sigma_\theta^2:

MMSE (linear Gaussian model): \hat{\theta} = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1} h^T C^{-1} x

Extensions to Complex Vector Parameters

Vector parameter \theta, linear model x = H\theta + w with E[w] = 0 and \mathrm{cov}[w] = C; (\cdot)^H denotes the Hermitian (complex conjugate) transpose.

\theta deterministic:

MVU (linear Gaussian model): \hat{\theta} = (H^H C^{-1} H)^{-1} H^H C^{-1} x
BLUE (linear model): \hat{\theta} = (H^H C^{-1} H)^{-1} H^H C^{-1} x; linear Gaussian model: same as linear model
MLE (linear Gaussian model): \hat{\theta} = (H^H C^{-1} H)^{-1} H^H C^{-1} x
LSE (linear model): \hat{\theta} = (H^H H)^{-1} H^H x; linear Gaussian model: same as linear model

\theta stochastic with mean 0 and cov. C_\theta:

LMMSE (linear model): \hat{\theta} = (H^H C^{-1} H + C_\theta^{-1})^{-1} H^H C^{-1} x; linear Gaussian model: same as linear model

\theta Gaussian with mean 0 and cov. C_\theta:

MMSE (linear Gaussian model): \hat{\theta} = (H^H C^{-1} H + C_\theta^{-1})^{-1} H^H C^{-1} x

Application to Communications

A block of K data symbols s = [s_0, \ldots, s_{K-1}]^T (the symbol vector has length K) is transmitted over a channel with impulse response h = [h_0, \ldots, h_{L-1}]^T (the channel vector has length L) and is received in additive noise:

x_n = \sum_{l=0}^{L-1} h_l\, s_{n-l} + w_n

[Block diagram: symbols s -> channel h -> additive noise w -> received samples x_n]

Application to Communications

Stacking the received samples into a vector x, the convolution with the channel can be written in two equivalent matrix forms. Defining the convolution (Toeplitz) matrices S (built from the symbols s) and H (built from the channel taps h), we obtain

Channel estimation model: x = S h + w

Symbol estimation model: x = H s + w
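
A small sketch (NumPy assumed; the symbol and channel values are arbitrary, and conv_matrix is a hypothetical helper written here only for illustration) of how the two matrix models can be built from Toeplitz convolution matrices. Both reproduce the same received vector as the direct convolution.

import numpy as np

def conv_matrix(v, ncols):
    # Toeplitz matrix T such that T @ u == np.convolve(v, u) for any u of length ncols
    T = np.zeros((len(v) + ncols - 1, ncols))
    for k in range(ncols):
        T[k:k + len(v), k] = v
    return T

rng = np.random.default_rng(9)
K, L = 8, 3
s = rng.choice([-1.0, 1.0], K)          # symbol vector (length K), BPSK for illustration
h = rng.standard_normal(L)              # channel vector (length L)
w = 0.01 * rng.standard_normal(K + L - 1)

S = conv_matrix(s, L)                   # channel estimation model:  x = S h + w
H = conv_matrix(h, K)                   # symbol estimation model:   x = H s + w
x = np.convolve(s, h) + w

print(np.allclose(S @ h + w, x), np.allclose(H @ s + w, x))   # True True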

Application to Communications

Most communications systems (GSM, UMTS, WLAN, ...) consist of two periods:
Training period: During this period we try to estimate the channel by transmitting some
known symbols, also known as training symbols or pilots.
Data period: During this period we use the estimated channel to recover the unknown
data symbols that convey useful information.
What kind of processing do we use in each of these periods?
During the training period we use one of the previously developed estimation techniques on the channel estimation model x = S h + w, assuming that S is known.

During the data period we use one of the previously developed estimation techniques on the symbol estimation model x = H s + w, assuming that H is known.

Application to Communications
Channel estimation




Let us assume that \mathrm{cov}[w] = C = \sigma^2 I.

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):

\hat{h} = (S^H S)^{-1} S^H x

LMMSE (or when the noise and channel are Gaussian also the MMSE):

\hat{h} = (S^H S + \sigma^2 C_h^{-1})^{-1} S^H x

Remark: Note that the LMMSE estimator requires the knowledge of the channel covariance C_h, which is generally not available.
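
A training-period sketch under these assumptions (NumPy assumed; conv_matrix is the illustrative helper from the previous sketch, and the pilot sequence, channel and noise level are arbitrary). Since the example is real-valued, the Hermitian transpose reduces to the ordinary transpose.

import numpy as np

def conv_matrix(v, ncols):
    T = np.zeros((len(v) + ncols - 1, ncols))
    for k in range(ncols):
        T[k:k + len(v), k] = v
    return T

rng = np.random.default_rng(10)
K, L, sigma2 = 16, 4, 0.01
pilots = rng.choice([-1.0, 1.0], K)     # known training symbols
h = rng.standard_normal(L)              # true channel (to be estimated)

S = conv_matrix(pilots, L)
x = S @ h + np.sqrt(sigma2) * rng.standard_normal(K + L - 1)

# BLUE/LSE (and MVU/MLE for Gaussian noise): h_hat = (S^H S)^-1 S^H x
h_hat = np.linalg.solve(S.T @ S, S.T @ x)
print(h)
print(h_hat)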

Application to Communications
Symbol estimation




Let us assume that \mathrm{cov}[w] = C = \sigma^2 I.

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):

\hat{s} = (H^H H)^{-1} H^H x

LMMSE (or when the noise and symbols are Gaussian also the MMSE):

\hat{s} = (H^H H + \sigma^2 C_s^{-1})^{-1} H^H x

Remark: Note that the LMMSE estimator requires the knowledge of the symbol covariance C_s, which can be set to E_s I if the data symbols have energy E_s and are uncorrelated.
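
A companion data-period sketch (NumPy assumed, again real-valued and with the illustrative conv_matrix helper; the channel is treated as known and E_s = 1 for BPSK symbols), comparing the LS and LMMSE symbol estimates.

import numpy as np

def conv_matrix(v, ncols):
    T = np.zeros((len(v) + ncols - 1, ncols))
    for k in range(ncols):
        T[k:k + len(v), k] = v
    return T

rng = np.random.default_rng(11)
K, sigma2, Es = 16, 0.05, 1.0
h = np.array([1.0, 0.5, -0.2])          # channel, assumed known in the data period
s = rng.choice([-1.0, 1.0], K)          # unknown BPSK data symbols (energy Es = 1)

H = conv_matrix(h, K)
x = H @ s + np.sqrt(sigma2) * rng.standard_normal(len(h) + K - 1)

# BLUE/LSE:  s_hat = (H^H H)^-1 H^H x
s_ls = np.linalg.solve(H.T @ H, H.T @ x)
# LMMSE:     s_hat = (H^H H + (sigma2/Es) I)^-1 H^H x
s_lmmse = np.linalg.solve(H.T @ H + (sigma2 / Es) * np.eye(K), H.T @ x)

print(np.mean(np.sign(s_ls) == s), np.mean(np.sign(s_lmmse) == s))   # fraction of correct decisions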
