
ESTIMATION THEORY

Outline

1. Random Variables
2. Introduction
3. Estimation techniques
4. Extensions to Complex Vector Parameters
5. Application to communication systems

[Kay93] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New Jersey, 1993.
[Cover-Thomas91] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.

Random Variables
Definitions

A random variable X is a function that assigns a number to every outcome of an experiment.

A random variable X is completely characterized by:

Its probability density function (pdf): p_X(x), with p_X(x) \ge 0 and \int_{-\infty}^{\infty} p_X(x)\, dx = 1

Its cumulative distribution function (cdf): F_X(x) = \Pr[X \le x] = \int_{-\infty}^{x} p_X(u)\, du

Properties: F_X(x) lies between 0 and 1 and is non-decreasing. The probability that X lies between x_1 and x_2 is then given by

\Pr[x_1 < X \le x_2] = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} p_X(x)\, dx

The mean of X is given by

m_X = E[X] = \int_{-\infty}^{\infty} x\, p_X(x)\, dx

The variance of X is given by

\mathrm{var}[X] = E[(X - m_X)^2] = \int_{-\infty}^{\infty} (x - m_X)^2\, p_X(x)\, dx

Random Variables
Examples

Uniform random variable:

pdf: p_X(x) = \frac{1}{b-a} for a \le x \le b, and p_X(x) = 0 otherwise

mean and variance: E[X] = \frac{a+b}{2}, \quad \mathrm{var}[X] = \frac{(b-a)^2}{12}

Gaussian random variable:

pdf: p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left(-\frac{(x-m)^2}{2\sigma^2}\right)

mean and variance: E[X] = m, \quad \mathrm{var}[X] = \sigma^2
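
As a quick numerical illustration of these two examples, the following sketch (assuming NumPy is available; the values of a, b, m and sigma are arbitrary choices) draws many samples and compares the empirical mean and variance with the expressions above.

import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 5.0        # uniform random variable on [a, b]
m, sigma = 1.0, 0.5    # Gaussian random variable with mean m and standard deviation sigma
N = 100_000

u = rng.uniform(a, b, N)
g = rng.normal(m, sigma, N)

# uniform: mean (a+b)/2, variance (b-a)^2/12
print(u.mean(), (a + b) / 2, u.var(), (b - a) ** 2 / 12)
# Gaussian: mean m, variance sigma^2
print(g.mean(), m, g.var(), sigma ** 2)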

Random Variables
Two random variables

For two random variables X and Y, we can define:

The joint cdf: F_{X,Y}(x,y) = \Pr[X \le x, Y \le y]

The joint pdf: p_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}

The marginal pdfs p_X(x) and p_Y(y) can then be determined by

p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dy \quad and \quad p_Y(y) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dx

The conditional pdfs p_{X|Y}(x|y) and p_{Y|X}(y|x) are given by

p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \quad and \quad p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}

From this follows the popular Bayes rule

p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)}

For independent random variables X and Y we have p_{X,Y}(x,y) = p_X(x)\, p_Y(y) and p_{X|Y}(x|y) = p_X(x).

Random Variables
Function of random variables

Suppose Z is a function of the random variables X and Y, e.g., Z = g(X, Y).

Corresponding increments in the cdf of Z and the joint cdf of X and Y are the same. Hence, the expectation over Z equals the joint expectation over X and Y.

The mean of Z is given by

m_Z = E[Z] = \int_{-\infty}^{\infty} z\, p_Z(z)\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, p_{X,Y}(x,y)\, dx\, dy

The variance of Z is given by

\mathrm{var}[Z] = E[(Z - m_Z)^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \big(g(x,y) - m_Z\big)^2\, p_{X,Y}(x,y)\, dx\, dy

Random Variables
Vector random variables

A vector random variable X is a vector of random variables: X = [X_1, X_2, \ldots, X_N]^T.

Its cdf/pdf is the joint cdf/pdf of all these random variables.

The mean of X is given by

m_X = E[X] = [E[X_1], E[X_2], \ldots, E[X_N]]^T

The covariance matrix of X is given by

\mathrm{cov}[X] = C_X = E[(X - m_X)(X - m_X)^T]
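
A minimal sketch (NumPy assumed; the 2-dimensional mean vector and covariance matrix are arbitrary illustrative values) showing how the mean vector and covariance matrix are approximated from realizations of a vector random variable.

import numpy as np

rng = np.random.default_rng(1)
m_x = np.array([1.0, -2.0])                # mean vector
C_x = np.array([[2.0, 0.5],
                [0.5, 1.0]])               # covariance matrix

X = rng.multivariate_normal(m_x, C_x, size=50_000)   # each row is one realization

m_hat = X.mean(axis=0)                     # sample estimate of E[X]
C_hat = np.cov(X, rowvar=False)            # sample estimate of E[(X - m_X)(X - m_X)^T]
print(m_hat)
print(C_hat)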




Introduction
Problem Statement

Suppose we have an unknown scalar parameter \theta that we want to estimate from an observed vector x, which is related to \theta through the following relationship:

x = f(\theta) + w

where f(\cdot) is a known function and w is a random noise vector with probability density function (pdf) p_W(w).

The estimator is of the form \hat{\theta} = g(x). Note that \hat{\theta} itself is a random variable. Hence, the performance of the estimator \hat{\theta} should be described statistically.

Introduction
Special Models

To solve any estimation problem, we need a model. Here, we will look deeper into two specific models:

The linear model: The relationship between x and \theta is then given by

x = h\theta + w

where h is the model vector and w is the noise vector, which is assumed to have mean 0, m_W = E[w] = 0, and covariance matrix C = \mathrm{cov}[w] = E[w w^T].

The linear Gaussian model: This model is a special case of the linear model, where the noise vector w is assumed to be Gaussian (or normal) distributed:

p_W(w) = \frac{1}{(2\pi)^{N/2}\, \det^{1/2}(C)}\, \exp\!\left(-\tfrac{1}{2}\, w^T C^{-1} w\right)

where N is the length of w.
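
To make the linear Gaussian model concrete, here is a small simulation sketch (NumPy assumed; the parameter value theta, the model vector h and the noise covariance C are arbitrary illustrative choices). The same kind of setup is reused in the estimator sketches further on.

import numpy as np

rng = np.random.default_rng(2)

N = 20
theta = 0.7                          # the (for the simulation, known) scalar parameter
h = rng.standard_normal(N)           # model vector
C = 0.1 * np.eye(N)                  # noise covariance matrix (white noise in this example)

# linear Gaussian model: x = h*theta + w with w ~ N(0, C)
w = rng.multivariate_normal(np.zeros(N), C)
x = h * theta + w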

Estimation Techniques

We can view the unknown parameter \theta as a deterministic variable:

Minimum Variance Unbiased (MVU) Estimator
Best Linear Unbiased Estimator (BLUE)
Maximum Likelihood Estimator (MLE)
Least Squares Estimator (LSE)

The Bayesian philosophy: \theta is viewed as a random variable:

Minimum Mean Square Error (MMSE) Estimator
Linear Minimum Mean Square Error (LMMSE) Estimator

Minimum Variance Unbiased Estimation


A natural criterion that comes to mind is the Mean Square Error (MSE):

\mathrm{mse}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{var}(\hat{\theta}) + \big(E[\hat{\theta}] - \theta\big)^2

The MSE does not only depend on the variance but also on the bias. This means that an estimator that tries to minimize the MSE will often depend on the unknown parameter \theta, and is therefore unrealizable.

Solution: constrain the bias to zero and minimize the variance, which leads to the so-called Minimum Variance Unbiased (MVU) estimator:

unbiased: m_{\hat{\theta}} = E[\hat{\theta}] = \theta for all \theta

minimum variance: \mathrm{var}(\hat{\theta}) is minimal for all \theta

Remark: The MVU does not always exist and is generally difficult to find.

Minimum Variance Unbiased Estimation (Linear Gaussian Model)

For the linear Gaussian model the MVU exists and its solution can be found by means of
the Cramer-Rao lower bound (see notes, [Kay93], [Cover-Thomas91]):


 

\hat{\theta} = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Properties: \hat{\theta} is Gaussian distributed, i.e., \hat{\theta} \sim N\big(\theta, (h^T C^{-1} h)^{-1}\big), so that E[\hat{\theta}] = \theta and \mathrm{var}(\hat{\theta}) = (h^T C^{-1} h)^{-1}.
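
A sketch of this estimator in the simulation setup introduced earlier (NumPy assumed, values illustrative). A small Monte Carlo loop checks that the estimate is unbiased and that its variance is close to (h^T C^{-1} h)^{-1}.

import numpy as np

rng = np.random.default_rng(3)
N, theta = 20, 0.7
h = rng.standard_normal(N)
C = 0.1 * np.eye(N)
Cinv = np.linalg.inv(C)

def mvu(x, h, Cinv):
    # theta_hat = (h^T C^-1 h)^-1 h^T C^-1 x
    return (h @ Cinv @ x) / (h @ Cinv @ h)

trials = 5000
est = np.empty(trials)
for i in range(trials):
    w = rng.multivariate_normal(np.zeros(N), C)
    est[i] = mvu(h * theta + w, h, Cinv)

print(est.mean(), theta)                      # empirical mean vs true theta (unbiased)
print(est.var(), 1.0 / (h @ Cinv @ h))        # empirical variance vs (h^T C^-1 h)^-1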

Best Linear Unbiased Estimation





In this case we constrain the estimator to have the form \hat{\theta} = a^T x.

Unbiased: E[\hat{\theta}] = a^T E[x] = \theta for all \theta

Minimum variance: \mathrm{var}(\hat{\theta}) = a^T \mathrm{cov}[x]\, a = a^T C a is minimal for all \theta

The first condition can only be satisfied if we assume a linear model for m_x: m_x = E[x] = h\theta.

Hence, we have to solve

\min_a\; a^T C a \quad subject to \quad a^T h = 1

Best Linear Unbiased Estimation


Problem:

\min_a\; a^T C a \quad subject to \quad a^T h = 1

Solution:

a = \frac{C^{-1} h}{h^T C^{-1} h}

Proof:
Using the method of the Lagrange multipliers, we obtain

J(a) = a^T C a + \lambda\,(a^T h - 1)

Setting the gradient with respect to a to zero we get

2 C a + \lambda h = 0 \;\Rightarrow\; a = -\frac{\lambda}{2}\, C^{-1} h

The Lagrange multiplier \lambda is obtained by the constraint a^T h = 1:

-\frac{\lambda}{2}\, h^T C^{-1} h = 1 \;\Rightarrow\; -\frac{\lambda}{2} = \frac{1}{h^T C^{-1} h} \;\Rightarrow\; a = \frac{C^{-1} h}{h^T C^{-1} h}

Properties:

E[\hat{\theta}] = \theta \quad and \quad \mathrm{var}(\hat{\theta}) = a^T C a = \frac{1}{h^T C^{-1} h}

Best Linear Unbiased Estimation (Linear Model)

For the linear model the BLUE is given by

\hat{\theta} = a^T x = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Remark: For the linear model the BLUE equals the MVU only when the noise is Gaussian.
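
Since the BLUE only uses the mean and covariance of the noise, the sketch below (NumPy assumed, arbitrary scenario) applies it to colored, uniformly distributed (non-Gaussian) noise and checks that it remains unbiased with variance (h^T C^{-1} h)^{-1}.

import numpy as np

rng = np.random.default_rng(4)
N, theta = 20, 0.7
h = rng.standard_normal(N)

A = 0.2 * rng.standard_normal((N, N))    # coloring matrix
d = 0.5                                  # driving noise uniform on [-d, d]
C = (d ** 2 / 3) * A @ A.T               # resulting noise covariance
Cinv = np.linalg.inv(C)

a = Cinv @ h / (h @ Cinv @ h)            # BLUE weights: a = C^-1 h / (h^T C^-1 h)

est = np.empty(5000)
for i in range(est.size):
    w = A @ rng.uniform(-d, d, N)        # colored, non-Gaussian noise with covariance C
    est[i] = a @ (h * theta + w)

print(est.mean(), theta)                 # unbiased
print(est.var(), 1.0 / (h @ Cinv @ h))   # minimum variance among linear unbiased estimators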

Maximum Likelihood Estimation

Since the pdf of x depends on \theta, we often write it as a function that is parametrized on \theta: p(x; \theta). This function can also be interpreted as the likelihood function, since it tells us how likely it is to observe a certain x. The Maximum Likelihood Estimator (MLE) finds the \theta that maximizes p(x; \theta) for a certain x.

The MLE is generally easy to derive.

Asymptotically, the MLE has the same mean and variance as the MVU estimator (but this does not mean it is asymptotically equivalent to the MVU).

Maximum Likelihood Estimation (Linear Gaussian Model)

For the linear Gaussian model, the likelihood function is given by

p(x; \theta) = \frac{1}{(2\pi)^{N/2}\, \det^{1/2}(C)}\, \exp\!\left(-\tfrac{1}{2}\,(x - h\theta)^T C^{-1} (x - h\theta)\right)

It is clear that this function is maximized by solving

\min_\theta\; (x - h\theta)^T C^{-1} (x - h\theta)

Maximum Likelihood Estimation (Linear Gaussian Model)

 

Problem:

\min_\theta\; (x - h\theta)^T C^{-1} (x - h\theta)

Solution:

\hat{\theta} = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Proof:
Rewriting the cost function that we have to minimize, we get

(x - h\theta)^T C^{-1} (x - h\theta) = x^T C^{-1} x - 2\theta\, h^T C^{-1} x + \theta^2\, h^T C^{-1} h

Setting the gradient with respect to \theta to zero we get

-2\, h^T C^{-1} x + 2\theta\, h^T C^{-1} h = 0 \;\Rightarrow\; \hat{\theta} = (h^T C^{-1} h)^{-1}\, h^T C^{-1} x

Remark: For the linear Gaussian model, the MLE is equivalent to the MVU estimator.
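
The sketch below (NumPy assumed, illustrative values) double-checks this equivalence numerically: it maximizes the log-likelihood over a grid of theta values and compares the result with the closed-form solution.

import numpy as np

rng = np.random.default_rng(5)
N, theta = 20, 0.7
h = rng.standard_normal(N)
C = 0.1 * np.eye(N)
Cinv = np.linalg.inv(C)
x = h * theta + rng.multivariate_normal(np.zeros(N), C)

def neg_log_like(t):
    # up to constants: 0.5 * (x - h t)^T C^-1 (x - h t)
    r = x - h * t
    return 0.5 * r @ Cinv @ r

grid = np.linspace(-2.0, 2.0, 4001)
theta_grid = grid[np.argmin([neg_log_like(t) for t in grid])]   # numerical MLE
theta_closed = (h @ Cinv @ x) / (h @ Cinv @ h)                  # closed-form MLE (= MVU)
print(theta_grid, theta_closed)   # agree up to the grid resolution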


Least Squares Estimation

The Least Squares Estimator (LSE) finds the \theta for which

J(\theta) = (x - f(\theta))^T (x - f(\theta))

is minimal.

Properties:
No probabilistic assumptions required
The performance highly depends on the noise

Least Squares Estimation (Linear Model)

For the linear model, the LSE solves

Problem:

\min_\theta\; (x - h\theta)^T (x - h\theta)

Solution:

\hat{\theta} = (h^T h)^{-1} h^T x

Proof: As before (the MLE derivation with C replaced by I).
Remark: For the linear model the LSE corresponds to the BLUE when the noise is white,
and to the MVU when the noise is Gaussian and white.


Least Squares Estimation (Linear Model)


Orthogonality Condition



Let us compute h^T (x - h\hat{\theta}_{LSE}):

h^T (x - h\hat{\theta}_{LSE}) = h^T x - h^T h\, (h^T h)^{-1} h^T x = 0

Hence, for the linear model the LSE leads to the following orthogonality condition:

h^T (x - h\hat{\theta}_{LSE}) = 0
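
A quick numerical illustration of the orthogonality condition (NumPy assumed, arbitrary data): the least-squares residual is orthogonal to the model vector up to rounding error.

import numpy as np

rng = np.random.default_rng(6)
N = 20
h = rng.standard_normal(N)
x = 0.7 * h + 0.3 * rng.standard_normal(N)   # any observed data

theta_lse = (h @ x) / (h @ h)                # (h^T h)^-1 h^T x
residual = x - h * theta_lse
print(h @ residual)                          # ~ 0:  h^T (x - h*theta_lse) = 0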

The Bayesian Philosophy


\theta is viewed as a random variable and we must estimate its particular realization.

This allows us to use prior knowledge about \theta, i.e., its prior pdf p(\theta).

Again, we would like to minimize the MSE

\mathrm{Bmse}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]

but this time both \hat{\theta} and \theta are random, hence the notation Bmse for Bayesian MSE.

Note the difference between these two MSEs:

\mathrm{mse}(\hat{\theta}) = \int (\hat{\theta} - \theta)^2\, p(x; \theta)\, dx

\mathrm{Bmse}(\hat{\theta}) = \int\!\!\int (\hat{\theta} - \theta)^2\, p(x, \theta)\, dx\, d\theta

Whereas the first MSE depends on \theta, the second MSE does not depend on \theta.

Minimum Mean Square Error Estimator




We know that p(x, \theta) = p(\theta | x)\, p(x), so that

\mathrm{Bmse}(\hat{\theta}) = \int \left[ \int (\hat{\theta} - \theta)^2\, p(\theta | x)\, d\theta \right] p(x)\, dx

Since p(x) \ge 0 for all x, we have to minimize the inner integral for each x.

Problem:

\min_{\hat{\theta}}\; \int (\hat{\theta} - \theta)^2\, p(\theta | x)\, d\theta

Solution:

mean of the posterior pdf of \theta: \hat{\theta} = E[\theta | x] = \int \theta\, p(\theta | x)\, d\theta

Proof: Setting the derivative with respect to \hat{\theta} to zero we obtain:

\frac{\partial}{\partial \hat{\theta}} \int (\hat{\theta} - \theta)^2\, p(\theta | x)\, d\theta = \int 2(\hat{\theta} - \theta)\, p(\theta | x)\, d\theta = 2\hat{\theta} - 2\int \theta\, p(\theta | x)\, d\theta = 0
Remarks:
In contrast to the MVU estimator the MMSE estimator always exists.
The MMSE has a smaller average MSE (Bayesian MSE) than the MVU, but the MMSE
estimator is biased whereas the MVU estimator is unbiased.

Minimum Mean Square Error Estimator (Linear Gaussian Model)

For the linear Gaussian model where \theta is assumed to be Gaussian with mean 0 and variance \sigma_\theta^2, the MMSE estimator can be found by means of the conditional pdf of a Gaussian vector random variable [Kay93]:

\hat{\theta} = E[\theta | x] = \sigma_\theta^2\, h^T (\sigma_\theta^2\, h h^T + C)^{-1} x = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1}\, h^T C^{-1} x

where the last equality is due to the matrix inversion lemma (see notes).

Remark: Compare this with the MVU for the linear Gaussian model.
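
The following sketch (NumPy assumed, illustrative values) evaluates both forms of this estimator, confirming numerically that the matrix inversion lemma makes them identical, and also prints the MVU estimate, which is approached as the prior variance sigma_theta^2 grows.

import numpy as np

rng = np.random.default_rng(7)
N = 20
h = rng.standard_normal(N)
C = 0.1 * np.eye(N)
Cinv = np.linalg.inv(C)
sigma2_theta = 0.5                                       # prior variance of theta

theta = np.sqrt(sigma2_theta) * rng.standard_normal()    # theta ~ N(0, sigma2_theta)
x = h * theta + rng.multivariate_normal(np.zeros(N), C)

# form 1: sigma_theta^2 h^T (sigma_theta^2 h h^T + C)^-1 x
form1 = sigma2_theta * h @ np.linalg.solve(sigma2_theta * np.outer(h, h) + C, x)
# form 2: (h^T C^-1 h + 1/sigma_theta^2)^-1 h^T C^-1 x   (matrix inversion lemma)
form2 = (h @ Cinv @ x) / (h @ Cinv @ h + 1.0 / sigma2_theta)
# MVU for comparison
mvu = (h @ Cinv @ x) / (h @ Cinv @ h)
print(form1, form2, mvu)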

Linear Minimum Mean Square Error Estimator




As for the BLUE, we now constrain the estimator to have the form \hat{\theta} = a^T x.

The Bayesian MSE can then be written as

\mathrm{Bmse}(\hat{\theta}) = E[(a^T x - \theta)^2] = a^T E[x x^T]\, a - 2\, a^T E[x\theta] + E[\theta^2]

Setting the derivative with respect to a to zero, we obtain

2\, E[x x^T]\, a - 2\, E[x\theta] = 0 \;\Rightarrow\; a = (E[x x^T])^{-1} E[x\theta]

The LMMSE estimator is therefore given by

\hat{\theta} = a^T x = E[\theta x^T]\, (E[x x^T])^{-1} x

Linear Minimum Mean Square Error Estimator


Orthogonality Condition





Let us compute E[(\theta - \hat{\theta})\, x]:

E[(\theta - a^T x)\, x] = E[\theta x] - E[x x^T]\, a = E[\theta x] - E[x x^T]\, (E[x x^T])^{-1} E[x\theta] = 0

Hence, the LMMSE leads to the following orthogonality condition:

E[(\theta - \hat{\theta})\, x] = 0
Linear Minimum Mean Square Error Estimator (Linear Model)

For the linear model where \theta is assumed to have mean 0 and variance \sigma_\theta^2, the LMMSE estimator is given by

\hat{\theta} = \sigma_\theta^2\, h^T (\sigma_\theta^2\, h h^T + C)^{-1} x = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1}\, h^T C^{-1} x

where the last equality is again due to the matrix inversion lemma.

Remark: The LMMSE estimator is equivalent to the MMSE estimator when the noise and
the unknown parameter are Gaussian.
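
To illustrate the earlier remark that the Bayesian estimators trade bias for a smaller average MSE, the Monte Carlo sketch below (NumPy assumed, illustrative values) draws theta from its prior in each trial and compares the Bayesian MSE of the LMMSE estimator with that of the classical MVU/BLUE estimator.

import numpy as np

rng = np.random.default_rng(8)
N, sigma2_theta = 10, 0.2
h = rng.standard_normal(N)
C = 0.5 * np.eye(N)
Cinv = np.linalg.inv(C)
hCh = h @ Cinv @ h

err_lmmse, err_mvu = [], []
for _ in range(20_000):
    theta = np.sqrt(sigma2_theta) * rng.standard_normal()       # theta drawn from its prior
    x = h * theta + rng.multivariate_normal(np.zeros(N), C)
    hCx = h @ Cinv @ x
    err_lmmse.append(hCx / (hCh + 1.0 / sigma2_theta) - theta)  # LMMSE error
    err_mvu.append(hCx / hCh - theta)                           # MVU/BLUE error

print(np.mean(np.square(err_lmmse)))   # Bayesian MSE of the LMMSE estimator (smaller)
print(np.mean(np.square(err_mvu)))     # Bayesian MSE of the MVU/BLUE estimator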


Summary

Scalar parameter \theta, linear model x = h\theta + w with E[w] = 0 and \mathrm{cov}[w] = C; in the linear Gaussian model, w is in addition Gaussian.

\theta deterministic:

MVU (linear Gaussian model): \hat{\theta} = (h^T C^{-1} h)^{-1} h^T C^{-1} x
BLUE (linear model): \hat{\theta} = (h^T C^{-1} h)^{-1} h^T C^{-1} x; linear Gaussian model: same as linear model
MLE (linear Gaussian model): \hat{\theta} = (h^T C^{-1} h)^{-1} h^T C^{-1} x
LSE (linear model): \hat{\theta} = (h^T h)^{-1} h^T x; linear Gaussian model: same as linear model

\theta stochastic with mean 0 and var. \sigma_\theta^2:

LMMSE (linear model): \hat{\theta} = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1} h^T C^{-1} x; linear Gaussian model: same as linear model

\theta Gaussian with mean 0 and var. \sigma_\theta^2:

MMSE (linear Gaussian model): \hat{\theta} = (h^T C^{-1} h + \sigma_\theta^{-2})^{-1} h^T C^{-1} x

Extensions to Complex Vector Parameters

Vector parameter \theta, linear model x = H\theta + w with E[w] = 0 and \mathrm{cov}[w] = C; (\cdot)^H denotes the Hermitian (complex conjugate) transpose.

\theta deterministic:

MVU (linear Gaussian model): \hat{\theta} = (H^H C^{-1} H)^{-1} H^H C^{-1} x
BLUE (linear model): \hat{\theta} = (H^H C^{-1} H)^{-1} H^H C^{-1} x; linear Gaussian model: same as linear model
MLE (linear Gaussian model): \hat{\theta} = (H^H C^{-1} H)^{-1} H^H C^{-1} x
LSE (linear model): \hat{\theta} = (H^H H)^{-1} H^H x; linear Gaussian model: same as linear model

\theta stochastic with mean 0 and cov. C_\theta:

LMMSE (linear model): \hat{\theta} = (H^H C^{-1} H + C_\theta^{-1})^{-1} H^H C^{-1} x; linear Gaussian model: same as linear model

\theta Gaussian with mean 0 and cov. C_\theta:

MMSE (linear Gaussian model): \hat{\theta} = (H^H C^{-1} H + C_\theta^{-1})^{-1} H^H C^{-1} x

Application to Communications

A block of K data symbols s = [s_0, \ldots, s_{K-1}]^T (the symbol vector has length K) is transmitted over a channel with impulse response h = [h_0, \ldots, h_{L-1}]^T (the channel vector has length L) and is received in additive noise:

x_n = \sum_{l=0}^{L-1} h_l\, s_{n-l} + w_n

[Block diagram: symbols s -> channel h -> additive noise w -> received samples x_n]

Application to Communications

Stacking the received samples into a vector x, the convolution with the channel can be written in two equivalent matrix forms. Defining the convolution (Toeplitz) matrices S (built from the symbols s) and H (built from the channel taps h), we obtain

Channel estimation model: x = S h + w

Symbol estimation model: x = H s + w
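
A small sketch (NumPy assumed; the symbol and channel values are arbitrary, and conv_matrix is a hypothetical helper written here only for illustration) of how the two matrix models can be built from Toeplitz convolution matrices. Both reproduce the same received vector as the direct convolution.

import numpy as np

def conv_matrix(v, ncols):
    # Toeplitz matrix T such that T @ u == np.convolve(v, u) for any u of length ncols
    T = np.zeros((len(v) + ncols - 1, ncols))
    for k in range(ncols):
        T[k:k + len(v), k] = v
    return T

rng = np.random.default_rng(9)
K, L = 8, 3
s = rng.choice([-1.0, 1.0], K)          # symbol vector (length K), BPSK for illustration
h = rng.standard_normal(L)              # channel vector (length L)
w = 0.01 * rng.standard_normal(K + L - 1)

S = conv_matrix(s, L)                   # channel estimation model:  x = S h + w
H = conv_matrix(h, K)                   # symbol estimation model:   x = H s + w
x = np.convolve(s, h) + w

print(np.allclose(S @ h + w, x), np.allclose(H @ s + w, x))   # True True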

Application to Communications

Most communications systems (GSM, UMTS, WLAN, ...) consist of two periods:
Training period: During this period we try to estimate the channel by transmitting some
known symbols, also known as training symbols or pilots.
Data period: During this period we use the estimated channel to recover the unknown
data symbols that convey useful information.
What kind of processing do we use in each of these periods?
During the training period we use one of the previously developed estimation techniques on the channel estimation model x = S h + w, assuming that S is known.

During the data period we use one of the previously developed estimation techniques on the symbol estimation model x = H s + w, assuming that H is known.

Application to Communications
Channel estimation




Let us assume that \mathrm{cov}[w] = C = \sigma^2 I.

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):

\hat{h} = (S^H S)^{-1} S^H x

LMMSE (or when the noise and channel are Gaussian also the MMSE):

\hat{h} = (S^H S + \sigma^2 C_h^{-1})^{-1} S^H x

Remark: Note that the LMMSE estimator requires the knowledge of the channel covariance C_h, which is generally not available.
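
A training-period sketch under these assumptions (NumPy assumed; conv_matrix is the illustrative helper from the previous sketch, and the pilot sequence, channel and noise level are arbitrary). Since the example is real-valued, the Hermitian transpose reduces to the ordinary transpose.

import numpy as np

def conv_matrix(v, ncols):
    T = np.zeros((len(v) + ncols - 1, ncols))
    for k in range(ncols):
        T[k:k + len(v), k] = v
    return T

rng = np.random.default_rng(10)
K, L, sigma2 = 16, 4, 0.01
pilots = rng.choice([-1.0, 1.0], K)     # known training symbols
h = rng.standard_normal(L)              # true channel (to be estimated)

S = conv_matrix(pilots, L)
x = S @ h + np.sqrt(sigma2) * rng.standard_normal(K + L - 1)

# BLUE/LSE (and MVU/MLE for Gaussian noise): h_hat = (S^H S)^-1 S^H x
h_hat = np.linalg.solve(S.T @ S, S.T @ x)
print(h)
print(h_hat)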

Application to Communications
Symbol estimation




Let us assume that \mathrm{cov}[w] = C = \sigma^2 I.

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):

\hat{s} = (H^H H)^{-1} H^H x

LMMSE (or when the noise and symbols are Gaussian also the MMSE):

\hat{s} = (H^H H + \sigma^2 C_s^{-1})^{-1} H^H x

Remark: Note that the LMMSE estimator requires the knowledge of the symbol covariance C_s, which can be set to E_s I if the data symbols have energy E_s and are uncorrelated.
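
A companion data-period sketch (NumPy assumed, again real-valued and with the illustrative conv_matrix helper; the channel is treated as known and E_s = 1 for BPSK symbols), comparing the LS and LMMSE symbol estimates.

import numpy as np

def conv_matrix(v, ncols):
    T = np.zeros((len(v) + ncols - 1, ncols))
    for k in range(ncols):
        T[k:k + len(v), k] = v
    return T

rng = np.random.default_rng(11)
K, sigma2, Es = 16, 0.05, 1.0
h = np.array([1.0, 0.5, -0.2])          # channel, assumed known in the data period
s = rng.choice([-1.0, 1.0], K)          # unknown BPSK data symbols (energy Es = 1)

H = conv_matrix(h, K)
x = H @ s + np.sqrt(sigma2) * rng.standard_normal(len(h) + K - 1)

# BLUE/LSE:  s_hat = (H^H H)^-1 H^H x
s_ls = np.linalg.solve(H.T @ H, H.T @ x)
# LMMSE:     s_hat = (H^H H + (sigma2/Es) I)^-1 H^H x
s_lmmse = np.linalg.solve(H.T @ H + (sigma2 / Es) * np.eye(K), H.T @ x)

print(np.mean(np.sign(s_ls) == s), np.mean(np.sign(s_lmmse) == s))   # fraction of correct decisions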
