
Estimation theory

Senaka Samarasekera

SSP algorithm development process

Formulation of the problem.
Selection of a computational structure with well-defined parameters for the implementation of the estimator.
Selection of a criterion of performance or cost function that measures the performance of the estimator under some assumptions about the statistical properties of the signals to be processed.
Optimization of the performance criterion to determine the parameters of the optimum estimator.
Evaluation of the optimum value of the performance criterion to determine whether the optimum estimator satisfies the design specifications.

Formulation of the problem

Many practical applications (e.g., speech, audio, and image coding) require subjective criteria that are difficult to express mathematically.
Thus, we focus on criteria of performance that
only depend on the estimation error e(n),
provide a sufficient measure of user satisfaction, and
lead to a mathematically tractable problem.
We generally select a criterion of performance by compromising between these objectives.

Estimators
Estimator: a function of the DATA (a.k.a. a STATISTIC) that approximates the ACTUAL VALUE of the PARAMETER in our mathematical model.
Scenario 1: let us observe a DC voltage using a noisy voltmeter.
We can model this as x[n] = A + w[n], where w[n] is the noise process.
We can use the median or the mean as the estimator \hat{A}, which we will use as an approximation to the true value of the parameter A.
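As a quick illustration (a minimal sketch, not part of the lecture slides; the values A = 5, N = 100 and the noise level are assumed purely for illustration), Scenario 1 can be simulated and the mean and median estimators compared:

```python
import numpy as np

rng = np.random.default_rng(0)

A_true = 5.0   # assumed true DC level (illustrative value)
N = 100        # number of observations
sigma = 1.0    # assumed noise standard deviation

# Scenario 1: x[n] = A + w[n], with w[n] zero-mean Gaussian noise
x = A_true + sigma * rng.standard_normal(N)

A_hat_mean = np.mean(x)      # sample-mean estimator
A_hat_median = np.median(x)  # sample-median estimator

print(f"mean estimate:   {A_hat_mean:.3f}")
print(f"median estimate: {A_hat_median:.3f}")
```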

Cost of an estimator
Bias: systematic error (try to avoid this if possible, but not at all costs)
B(\hat\theta) = E[\hat\theta] - \theta
Unbiased estimators: B(\hat\theta) = 0
Variance
var(\hat\theta) = E[(\hat\theta - E[\hat\theta])^2]
Minimum variance estimator: \hat\theta_{MV} = \arg\min var(\hat\theta)
Mean-square error
MSE(\hat\theta) = E[(\hat\theta - \theta)^2] = var(\hat\theta) + B^2(\hat\theta)
Minimum mean-square error estimator: \hat\theta_{MMSE} = \arg\min MSE(\hat\theta)

Small change in Scenario 1

Now let us take an estimator of the form \hat{A} = a \bar{x}, a scaled version of the sample mean \bar{x} = \frac{1}{N}\sum_n x[n].
Find the bias, the variance, and the value of a that will minimize the mean square error.
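A minimal numerical sketch of this exercise, assuming the modified estimator is the scaled sample mean \hat{A} = a\bar{x} and using illustrative values for A, \sigma^2 and N; it sweeps a and compares the empirically minimizing value with the analytic optimum a = A^2/(A^2 + \sigma^2/N):

```python
import numpy as np

# Sketch: bias/variance/MSE of the scaled sample mean A_hat = a * mean(x)
# (the form of the estimator and the values below are assumptions for illustration).
A, sigma2, N = 1.0, 1.0, 10
a = np.linspace(0.0, 1.5, 301)

bias = (a - 1.0) * A          # E[A_hat] - A
var = a**2 * sigma2 / N       # var(A_hat), since var(mean) = sigma^2 / N
mse = var + bias**2           # MSE = variance + bias^2

a_opt_numeric = a[np.argmin(mse)]
a_opt_analytic = A**2 / (A**2 + sigma2 / N)
print(a_opt_numeric, a_opt_analytic)   # both close to A^2 / (A^2 + sigma^2/N)
```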

Likelihood function
A pdf parameterized by an unknown parameter: p(x; \theta), viewed as a function of \theta.
Eg: if the noise was Gaussian in the DC measurement case,
p(x[n]; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x[n]-A)^2}{2\sigma^2}\right)
What would be the case for multiple observations, assuming i.i.d. Gaussian noise?
The stock price of a growing company can be modelled as a deterministic growth trend plus i.i.d. AWGN. What is the likelihood function? Is this process stationary or non-stationary?

Assignment 3 question 2
Find the likelihood function of the autoregressive moving average (ARMA) model, where the driving noise is i.i.d. zero-mean Gaussian.

Deterministic vs Random parameters
Deterministic parameters: only one value is possible, but we do not know that value (this lecture).
Random parameters: an a priori pdf can be defined on the parameter. This accounts for the prior uncertainty about the parameter, which is therefore treated as a random variable.
In the next lecture we will look at Bayesian methods, which estimate random parameters.

Estimator development
Deterministic parameter case:
Least squares estimator
Maximum likelihood estimator
Method of moments estimator
There is no explicit guarantee on the optimality of the estimator cost, so we will have to derive conditions of optimality for these estimators.

Optimal estimators: minimize a suitable average cost function
Unbiased
Minimum variance (unbiased)
Minimum mean square error
Minimax estimators: minimize the maximum risk, i.e. the best estimator in the worst-case scenario.

Maximum likelihood estimators

MLE: \hat\theta_{ML} = \arg\max_\theta p(x; \theta)

Note that for unimodal likelihood functions the maximum is unique and can be found by setting the derivative of the likelihood (or log-likelihood) to zero.

Advantages
Most versatile estimator of the lot
Has a nice physical intuition to it
Has a direct way of finding it
Is consistent, asymptotically unbiased, asymptotically optimal (w.r.t. MSE), and invariant under functional transformations
Therefore, for a large number of observations, MLE -> MVUE

DC level

If we extend this to N observations,
p(x; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)

DC level example
Let x[n] = A + w[n], for n = 0, 1, ..., N-1, be observations from a noisy voltmeter measuring a DC level A,
where w[n] is i.i.d. Gaussian noise with zero mean and unit variance.
What is the maximum likelihood estimator? What if the variance was also unknown?
Note: for exponential-type likelihoods we can equivalently maximize the log-likelihood function. This helps to avoid the numerical underflow/overflow that is quite common in likelihood computations.
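A minimal sketch of this example (the values of A, \sigma^2 and N are illustrative assumptions): the closed-form MLEs for the DC level and the noise variance, checked against the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
A_true, sigma2_true, N = 2.0, 1.0, 500    # illustrative values
x = A_true + np.sqrt(sigma2_true) * rng.standard_normal(N)

def log_likelihood(x, A, sigma2):
    """Gaussian log-likelihood for the DC-level model x[n] = A + w[n]."""
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) - np.sum((x - A) ** 2) / (2 * sigma2)

# Closed-form MLEs (obtained by setting the gradient of the log-likelihood to zero)
A_mle = np.mean(x)                        # MLE of the DC level
sigma2_mle = np.mean((x - A_mle) ** 2)    # MLE of the variance (biased by a factor (N-1)/N)

print(A_mle, sigma2_mle)
# The log-likelihood at the MLE is never smaller than at any other parameter value
print(log_likelihood(x, A_mle, sigma2_mle) >= log_likelihood(x, A_true, sigma2_true))
```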

Assignment 3 question 3
A noisy oscillator at a modulator is modeled as a sinusoid with an unknown phase offset plus i.i.d. standard Gaussian noise. If the phase offset of the oscillator is a constant over N samples, find an estimator of the phase using these N samples.

Numerical determination of the MLE

For complicated likelihood functions you can use numerical optimization methods such as
Grid search
Newton-Raphson
Scoring method
Expectation Maximization

Warning: these can get stuck in a local maximum and are not guaranteed to find the global maximum.
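A minimal Newton-Raphson sketch, using an exponential-distribution likelihood as an assumed stand-in example (the data and the starting point are illustrative); the iteration solves the score equation and is compared with the closed-form MLE:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=1000)   # illustrative data, Exp(rate = 0.5)
N, S = len(x), np.sum(x)

def score(lam):        # derivative of the log-likelihood N*log(lam) - lam*S
    return N / lam - S

def score_deriv(lam):  # second derivative of the log-likelihood
    return -N / lam**2

lam = 1.0              # initial guess (illustrative)
for _ in range(20):    # Newton-Raphson iterations on the score equation
    lam = lam - score(lam) / score_deriv(lam)

print(lam, 1.0 / np.mean(x))   # Newton solution vs. closed-form MLE 1/mean(x)
```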

Expectation Maximization method

Uses auxiliary variables to simplify the likelihood. (Adding variables simplifies the problem?)

Let z be the unobserved auxiliary variable vector.
EM solves the problem by looking at the complete-data likelihood p(x, z; \theta) instead of p(x; \theta).
When is this natural (or effective)?
Eg: when we can choose z such that the complete-data likelihood has a nice low-dimensional sufficient statistic.

Expectation Maximization method

For each iteration k, with the current estimate \theta^{(k)}, the data x, and the complete-data likelihood p(x, z; \theta):

Expectation step:
Compute Q(\theta; \theta^{(k)}) = E_{z|x; \theta^{(k)}}[\log p(x, z; \theta)]
Maximization step:
\theta^{(k+1)} = \arg\max_\theta Q(\theta; \theta^{(k)})

For exponential-family likelihoods the EM method has the property that the likelihood is non-decreasing at every iteration, so it converges to a local maximum (see the assignment questions later).
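A minimal EM sketch for a zero-mean two-component Gaussian mixture with unknown weight and variances (the same kind of model used in the method-of-moments example later); all numerical values and the initialization are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: zero-mean two-component Gaussian mixture (weight and variances assumed)
eps_true, s1_true, s2_true = 0.3, 1.0, 9.0
z = rng.random(2000) < eps_true
x = np.where(z, np.sqrt(s1_true), np.sqrt(s2_true)) * rng.standard_normal(2000)

def gauss(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

eps, s1, s2 = 0.5, 0.5, 5.0              # initial guesses (illustrative)
for _ in range(200):
    # E-step: posterior probability that each sample came from component 1
    num = eps * gauss(x, s1)
    gamma = num / (num + (1 - eps) * gauss(x, s2))
    # M-step: re-estimate the weight and variances from the weighted samples
    eps = np.mean(gamma)
    s1 = np.sum(gamma * x**2) / np.sum(gamma)
    s2 = np.sum((1 - gamma) * x**2) / np.sum(1 - gamma)

print(eps, s1, s2)   # should approach the values used to generate the data
```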

Optimality of MLE for the linear model
Let the data be defined via the general linear model
x = H\theta + w
where H is a known observation matrix and w \sim N(0, C). Then the MLE is
\hat\theta = (H^T C^{-1} H)^{-1} H^T C^{-1} x

This is an efficient estimator, as it attains the CRLB, and the pdf of the estimator is
\hat\theta \sim N(\theta, (H^T C^{-1} H)^{-1})
Assignment: Prove this.
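A minimal sketch of the linear-model MLE for the white-noise case C = \sigma^2 I (the observation matrix and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative general linear model x = H @ theta + w with white Gaussian noise
N, sigma = 50, 0.5
H = np.column_stack([np.ones(N), np.arange(N)])   # assumed known observation matrix
theta_true = np.array([1.0, 0.1])
x = H @ theta_true + sigma * rng.standard_normal(N)

# MLE for the linear model with white noise: theta_hat = (H^T H)^{-1} H^T x
theta_mle = np.linalg.solve(H.T @ H, H.T @ x)
print(theta_mle)

# Covariance of the estimator (attains the CRLB): sigma^2 (H^T H)^{-1}
cov_theta = sigma**2 * np.linalg.inv(H.T @ H)
print(np.sqrt(np.diag(cov_theta)))
```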

Properties of MLE: Consistency

Assuming i.i.d. samples, the normalized log-likelihood is
\frac{1}{N}\log p(x; \theta) = \frac{1}{N}\sum_n \log p(x[n]; \theta)

From the law of large numbers,
\frac{1}{N}\sum_n \log p(x[n]; \theta) \to E_{\theta_0}[\log p(x; \theta)]   (1)

Recall the non-negativity of the Kullback-Leibler divergence, D(p_{\theta_0} \| p_\theta) \ge 0, which implies
E_{\theta_0}[\log p(x; \theta)] \le E_{\theta_0}[\log p(x; \theta_0)]

So the RHS of (1) is maximized at \theta = \theta_0. So if we take N \to \infty, the maximizer of the log-likelihood converges to \theta_0, i.e. \hat\theta_{ML} \to \theta_0.

Properties of MLE: Asymptotic optimality
From the mean value theorem, the score function satisfies
\frac{\partial \log p(x;\theta)}{\partial\theta}\Big|_{\hat\theta} = \frac{\partial \log p(x;\theta)}{\partial\theta}\Big|_{\theta_0} + \frac{\partial^2 \log p(x;\theta)}{\partial\theta^2}\Big|_{\bar\theta}(\hat\theta - \theta_0)
for some \bar\theta between \hat\theta and \theta_0.
Since the left-hand side is zero at the MLE,
\sqrt{N}(\hat\theta - \theta_0) = \frac{-\frac{1}{\sqrt{N}}\frac{\partial \log p(x;\theta)}{\partial\theta}\big|_{\theta_0}}{\frac{1}{N}\frac{\partial^2 \log p(x;\theta)}{\partial\theta^2}\big|_{\bar\theta}}
Assuming i.i.d. samples, the denominator becomes
\frac{1}{N}\sum_n \frac{\partial^2 \log p(x[n];\theta)}{\partial\theta^2}\Big|_{\bar\theta} \to -i(\theta_0)
since \hat\theta \to \theta_0 (consistency) and E\left[\frac{\partial^2 \log p(x[n];\theta)}{\partial\theta^2}\right] = -i(\theta_0), the Fisher information of a single sample.
Properties of MLE: Asymptotic optimality
For i.i.d. samples the numerator is
\frac{1}{\sqrt{N}}\sum_n \frac{\partial \log p(x[n];\theta)}{\partial\theta}\Big|_{\theta_0}
Note that the summands are independent random variables too.
Using the central limit theorem, the numerator converges in distribution to a Gaussian.
But E\left[\frac{\partial \log p(x[n];\theta)}{\partial\theta}\big|_{\theta_0}\right] = 0 and Var\left[\frac{\partial \log p(x[n];\theta)}{\partial\theta}\big|_{\theta_0}\right] = i(\theta_0).
Using Slutsky's theorem,
\sqrt{N}(\hat\theta - \theta_0) \to N\left(0, i(\theta_0)^{-1}\right)
so the MLE is asymptotically unbiased and asymptotically attains the CRLB.

Invariance properties
Looks at the MLE of a transformed parameter \alpha = g(\theta).
If a measurement equation is given by the likelihood function p(x; \theta), and if g is one-to-one, then
\hat\alpha_{ML} = g(\hat\theta_{ML})

Method of moments

Method of moment estimators

This is a simple estimator that can be used
as it is, if its performance is good enough, or
as a starting point for the MLE (which would then be consistent).

Let \mu = [\mu_1, ..., \mu_p]^T be a vector of moments of the pdf (likelihood) and \theta be the underlying parameter vector. Then, using the pdf definition, we can find the function h such that \mu = h(\theta).
If h is invertible, \theta = h^{-1}(\mu).
Now, if we plug in the estimators of the respective moments (the sample moments), we get the MOM estimators: \hat\theta = h^{-1}(\hat\mu).
The multiple moments needed can be easily found via the moment generating function.

Example
Let the likelihood function be a two-component Gaussian mixture with unknown variances and weight. Let both means be 0:
p(x[n]; \theta) = \epsilon N(0, \sigma_1^2) + (1-\epsilon) N(0, \sigma_2^2)
Find the MOM estimators.

Example
Let the required moments be estimated by their sample moments (for a zero-mean mixture, only the even moments are non-zero).
Then the parameters (after some algebra) can be shown to be functions of these sample moments.
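A simplified sketch of the moment-matching idea: here the two component variances are assumed known and only the mixture weight is estimated, so a single moment equation suffices (the full example above needs several moments); all values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simplified version of the mixture example: variances are assumed known,
# only the mixture weight eps is estimated from the second moment.
s1, s2 = 1.0, 9.0                 # assumed known component variances
eps_true = 0.3
z = rng.random(5000) < eps_true
x = np.where(z, np.sqrt(s1), np.sqrt(s2)) * rng.standard_normal(5000)

# Moment equation: E[x^2] = eps*s1 + (1 - eps)*s2  =>  eps = (E[x^2] - s2) / (s1 - s2)
m2 = np.mean(x**2)                # sample second moment (plug-in estimate)
eps_mom = (m2 - s2) / (s1 - s2)
print(eps_mom)
```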

Minimum variance unbiased estimators (MVUE)

MVUE
Assume two observations with the likelihood functions p(x[0]; \theta) and p(x[1]; \theta).

Assuming that the MVUE is of the form \hat\theta = a_0 x[0] + a_1 x[1],
find the MVUE.
What if the likelihood function of the second observation changed?

MVUE
Sometimes the variance and the bias equations become so complicated that direct optimization methods fail.
Sometimes no single function of the data gives the minimum variance for all possible parameter values.

Finding the MVUE

So we use three indirect approaches to find the MVUE:
Find a sufficient statistic. Find an unbiased estimator and condition it on the sufficient statistic.
Find the CRLB. Select a functional form using some other knowledge. Find the parameters that come nearest to the CRLB.
Constrain the model to be linear -> BLUE (best linear unbiased estimator).

MVUE I: Sufficient statistic
MVUE in the exponential family

Sufficient statistic
All the information about the parameter that is in the likelihood function comes through the sufficient statistic.
Note: the raw data itself is a sufficient statistic.
An MVUE should be a function of the sufficient statistic.
Minimal sufficient statistic: the smallest of them all (in dimension).
A minimal sufficient statistic is always a function of every other sufficient statistic.

Complete statistics
If the whole parameter set is identifiable using the sufficient statistic, it is called a COMPLETE statistic.
A sufficient statistic T(x) is complete iff
E_\theta[g(T(x))] = 0 for all \theta implies g(T(x)) = 0 (almost everywhere).
How does this condition relate to parameter identifiability?
Note that
E_\theta[g(T)] = \int g(t) p(t; \theta) dt
Since the only function in the null space of this integral operator is the zero function, the space of all possible functions g is spanned by the parametric family \{p(t; \theta)\}.

Neyman-Fisher factorization theorem
T(x) is a sufficient statistic for the parameter \theta iff we can factor the likelihood function as
p(x; \theta) = g(T(x), \theta) h(x)

In the DC level observation example, find the sufficient statistic
1. for the DC level when the noise power is known,
2. for the noise power when the DC level is 0,
3. for the DC level and the noise power jointly.

Rao-Blackwell theorem

Let X and Y be two random variables (vectors). Define the conditional expectation of X given Y as \hat{X} = E[X | Y].
Then
E[\hat{X}] = E[X] and var(\hat{X}) \le var(X).

Rao-Blackwell theorem proof

Property 1: E[\hat{X}] = E[E[X | Y]] = E[X] (law of iterated expectations).
Property 2: var(X) = var(E[X | Y]) + E[var(X | Y)] \ge var(\hat{X}) (law of total variance).

Rao-Blackwell theorem applied to estimators
Let \hat\theta be an estimator of a parameter \theta. Then the conditional expectation of \hat\theta given a sufficient statistic T(x),
\tilde\theta = E[\hat\theta | T(x)],
is always at least as good an estimator of \theta, and is never worse.
From a mean square error perspective this means
E[(\tilde\theta - \theta)^2] \le E[(\hat\theta - \theta)^2].

Lehmann-Scheffé theorem
If a statistic is UNBIASED, COMPLETE and SUFFICIENT for some parameter \theta, then this statistic has the minimum expected loss for ANY CONVEX LOSS FUNCTION.
In many practical applications with the squared loss function, it has the smallest mean squared error among all estimators with the same expected value.
Hence an unbiased, complete, sufficient statistic is an MVUE.

Pitman-Koopman theorem

Among families of probability distributions whose domain does not vary with the parameter being estimated, only in the EXPONENTIAL FAMILY is there a sufficient statistic whose dimension remains bounded as the sample size increases.
So, practically, we can find worthwhile minimal sufficient statistics only for the exponential family of distributions.
So what is this magical exponential family?

Exponential family
Not to be confused with the exponential distribution (which is also a member of this family when the parameter of interest is the mean).
This is concerned with how the pdfs are parameterized.
A good number of common pdfs belong to this family when some of their parameters are known.
Eg:
Poisson distribution with unknown mean
Exponential distribution with unknown mean
Gaussian distribution with unknown mean / unknown variance / both mean and variance unknown

Exponential family
Definition

A set of probability distributions admitting the following canonical decomposition:
p(x; \theta) = h(x) \exp\left( \langle t(x), \eta(\theta) \rangle - F(\theta) \right)
Where
t(x) = sufficient statistic
\eta(\theta) = natural parameters
\langle \cdot, \cdot \rangle = inner/dot product
F = log-normalizer
h(x) = carrier measure

If the observation x is a scalar, the pdf is UNIVARIATE; otherwise, if the observation is a vector, the pdf is MULTIVARIATE.
The ORDER of a member in this family is the DIMENSION OF THE natural parameter space (equivalently, of the sufficient statistic).

Example
Univariate Poisson distribution with unknown mean \lambda.
Recall that the pmf (since x = 0, 1, 2, ...) is
p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}
This can be rearranged to
p(x; \lambda) = \frac{1}{x!} \exp\left( x \log\lambda - \lambda \right)
So t(x) = x, \eta = \log\lambda, F(\eta) = e^\eta = \lambda, and h(x) = 1/x!.
This is an exponential family member of order 1.

Example
Find the sufficient statistic, natural parameters, log-normalizer, and the carrier measure for the univariate Gaussian distribution with unknown mean and variance.
Since
p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \exp\left( \frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right)

Therefore
t(x) = [x, x^2]^T, \eta = [\mu/\sigma^2, -1/(2\sigma^2)]^T, F = \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2), h(x) = 1.
This is an exponential family member of order 2.

Exponential Family

Assignment question
Prove that a (properly normalized) product of arbitrary exponential family members is also a member of the exponential family.
Is the same true for mixtures of exponential family members?

Log-normalizer
Exponential families are characterized by their strictly convex and differentiable function F, called the log-normalizer or the partition function.
Since the pdf must integrate to one, we have
F(\eta) = \log \int h(x) \exp\left( \langle t(x), \eta \rangle \right) dx

Log-normalizer
It is also related to the moment generating and cumulant generating functions of the sufficient statistics.
The moment generating function of the sufficient statistic is
M_{t(x)}(u) = E\left[ e^{\langle u, t(x) \rangle} \right] = \exp\left( F(\eta + u) - F(\eta) \right)
Since the cumulant generating function is the log of the MGF,
K_{t(x)}(u) = F(\eta + u) - F(\eta)

Log-normalizer
Therefore, in the exponential family we can easily find the mean and the covariance of the sufficient statistic as
E[t(x)] = \nabla F(\eta), Cov[t(x)] = \nabla^2 F(\eta)

The Fisher information of an exponential family member also becomes
I(\eta) = \nabla^2 F(\eta)
Assignment question: Prove this.

MLE in exponential family

Taking the log-likelihood of N i.i.d. observations,
\log p(x; \eta) = \sum_n \log h(x[n]) + \left\langle \eta, \sum_n t(x[n]) \right\rangle - N F(\eta)

Taking the gradient with respect to \eta,
\nabla_\eta \log p(x; \eta) = \sum_n t(x[n]) - N \nabla F(\eta)

At stationary points the gradient is zero. Therefore the estimators can be found by solving the equation
\nabla F(\hat\eta) = \frac{1}{N} \sum_n t(x[n])
which is also the method of moments estimator in this case (it matches E[t(x)] to the sample mean of t(x)).
Also note that the Hessian
\nabla^2_\eta \log p(x; \eta) = -N \nabla^2 F(\eta)
is negative definite, showing that the log-likelihood function is strictly concave and the stationary point is a maximum.
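A minimal sketch of this moment-matching view of the exponential-family MLE, using the Gaussian with unknown mean and variance (sufficient statistics t(x) = [x, x^2]); the data values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
x = 3.0 + 2.0 * rng.standard_normal(10000)   # illustrative Gaussian data

# For the Gaussian, the sufficient statistics are t(x) = [x, x^2].
# The exponential-family MLE solves  grad F(eta) = (1/N) sum t(x[n]),
# i.e. it matches the model moments E[t(x)] to the sample moments of t(x).
t_bar = np.array([np.mean(x), np.mean(x**2)])

mu_mle = t_bar[0]                   # E[x] = mu
var_mle = t_bar[1] - t_bar[0]**2    # E[x^2] = mu^2 + sigma^2  =>  sigma^2 = E[x^2] - mu^2
print(mu_mle, var_mle)
```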

Assignment questions
Show that the MLE and the MVUE are both efficient for the natural parameters of the exponential family.
Show that the Expectation Maximization method finds a local maximum of an exponential family joint distribution (of observed and unobserved data).

Completeness of the sufficient statistic in the exponential family
Theorem: A sufficient statistic of an exponential family member is a complete statistic.
Proof
Suppose E_\eta[g(T)] = 0 for all \eta. We can write this as
\int g(t) q(t) e^{\langle \eta, t \rangle - F(\eta)} dt = 0
where q(t) is the density factor that does not depend on \eta.
This is a scaled Laplace transform of the function g(t) q(t). From the uniqueness of the Laplace transform, this implies g(t) q(t) = 0. Since q(t) > 0, this implies g(t) = 0, i.e. the statistic is complete.

MVUE II: Best linear unbiased estimator

Wiener Filter

We estimate a desired signal d(n) as a linear combination of the observed data, \hat{d}(n) = \sum_{k=0}^{m-1} w_k x(n-k).
In our compact notation this can be written as \hat{d} = X w.
If the number of samples is equal to the number of coefficients, we can find a w that makes the error d - \hat{d} zero.
If N > m (which is normally the case), we will try to find the w that minimizes the squared error \sum_n |d(n) - \hat{d}(n)|^2.

Wiener Filter

Let us define the
cross-correlation vector p, where each component is p_k = E[d(n) x(n-k)], and the
auto-correlation matrix R, where each component is R_{jk} = E[x(n-j) x(n-k)].

Wiener filter

Now we find an estimate \hat{d}(n) = w^T x(n),
where the optimum coefficients satisfy the normal equations R w = p, i.e. w = R^{-1} p.
Applications of Wiener filter
Noise cancellation
System identification
Channel equalization

BLUE
Is an extension of the Wiener filtering process.
Now we assume a linear estimator \hat\theta = A x for some matrix A.
For this to be unbiased we need E[\hat\theta] = \theta for every \theta.
If we assume x = H\theta + w and E[w] = 0, then E[\hat\theta] = A H \theta, so unbiasedness requires A H = I.
If the covariance matrix of w is C, then the variance of each component is
var(\hat\theta_i) = a_i^T C a_i,
where a_i^T is the i-th row of A, and we minimize each of these subject to A H = I.

BLUE
Taking the gradient of each variance and equating it to zero, we end up with the BLUE
\hat\theta = (H^T C^{-1} H)^{-1} H^T C^{-1} x
and
Cov(\hat\theta) = (H^T C^{-1} H)^{-1}

Assignment: Proof: take each variance and use Lagrange multipliers to enforce A H = I. Then minimizing the augmented cost function gives the BLUE.
Ref: Kay (Appendix 6B, pg. 153)
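A minimal sketch of the BLUE for a linear model with a known, non-white noise covariance (the model matrix, covariance, and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustrative linear model x = H @ theta + w with known (non-white) noise covariance C.
# Only the mean and covariance of w are used, not its full pdf.
N = 40
H = np.column_stack([np.ones(N), np.arange(N) / N])
theta_true = np.array([2.0, -1.0])
C = np.diag(0.1 + np.arange(N) / N)          # assumed known noise covariance
w = rng.multivariate_normal(np.zeros(N), C)
x = H @ theta_true + w

Ci = np.linalg.inv(C)
theta_blue = np.linalg.solve(H.T @ Ci @ H, H.T @ Ci @ x)   # BLUE estimate
cov_blue = np.linalg.inv(H.T @ Ci @ H)                     # its covariance
print(theta_blue)
```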

Least squares estimators

Least squares estimators (LSE)

One of the oldest methods, going back to Legendre and Gauss.
Part of classical regression analysis.
Also known as data/curve fitting.
Does not formulate a likelihood function; just uses the signal model directly.
No probabilistic modeling of the noise.
Cannot claim any probabilistic optimality properties.
Two main classes of problems: linear and non-linear least squares.

Linear least squares estimators

For data x, parameter estimates \theta, and a model (matrix) H, the squared error function becomes
J(\theta) = (x - H\theta)^T (x - H\theta)

Taking the gradient,
\nabla_\theta J = -2 H^T x + 2 H^T H \theta

Setting it to zero to find the minimum,
\hat\theta = (H^T H)^{-1} H^T x
Using this in the squared error equation, the minimum error becomes
J_{min} = x^T (x - H\hat\theta) = x^T x - x^T H (H^T H)^{-1} H^T x
Proof: Kay pg. 225

Weighted LSE
What if we change the error criterion to
J(\theta) = (x - H\theta)^T W (x - H\theta)
where W is a positive definite weighting matrix?

When would we use this? When all data points are not the same (e.g., some are noisier than others).
Now
\hat\theta = (H^T W H)^{-1} H^T W x
and
J_{min} = x^T W x - x^T W H (H^T W H)^{-1} H^T W x
If we add a probabilistic description of the noise, which enables us to formulate a likelihood function, then W characterizes the spectral characteristics of the noise (the natural choice is W = C^{-1}, the inverse noise covariance).
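A minimal weighted-LSE sketch where the weights are chosen as the inverse noise variances (the model and noise levels are illustrative assumptions), compared against the ordinary LSE:

```python
import numpy as np

rng = np.random.default_rng(9)

# Weighted least squares sketch: noisier samples get smaller weight.
N = 30
H = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
theta_true = np.array([1.0, 0.5])
sigma = np.where(np.arange(N) < 15, 0.1, 2.0)      # second half is much noisier
x = H @ theta_true + sigma * rng.standard_normal(N)

W = np.diag(1.0 / sigma**2)                         # weight matrix (inverse noise variance)
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ x)
theta_ls = np.linalg.solve(H.T @ H, H.T @ x)        # ordinary LSE for comparison
print(theta_wls, theta_ls)
```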

Geometrical interpretation

Let H = [h_1, h_2, ..., h_p], where each column of the model matrix is now viewed as an N-dimensional vector.
If the model is to be identifiable, these columns should be independent, thus spanning a p-dimensional subspace.
Let us define the signal estimate \hat{s} = H\hat\theta, the signal-space approximation of the data.
To minimize J, the error x - \hat{s} should be orthogonal to this p-dimensional subspace.
Hence minimizing the squared error is equivalent to making the error orthogonal to the columns of H, i.e. H^T (x - H\hat\theta) = 0.
Eg: Assume a model with two columns h_1 and h_2.

Geometrical interpretation
What happens if the columns of H are orthonormal? Then the projection onto the signal space becomes
\hat{s} = H H^T x
Comparing with the earlier result, this shows that the matrix H behaves like a unitary matrix on its column space, as H^T H = I.

Geometrical interpretation
Let us look at the signal estimate with the LSE:
\hat{s} = H\hat\theta = H (H^T H)^{-1} H^T x

Define the projection matrix P = H (H^T H)^{-1} H^T, which maps the data onto the signal space.
By defining its complement P^\perp = I - P, we see that the error is
e = x - \hat{s} = P^\perp x
and the minimum cost is
J_{min} = x^T P^\perp x

Sequential least squares estimators

So far, the whole set of data points was taken in one single vector x. This is called batch processing.
What happens if data arrive sequentially? Can we update the LSE sequentially?
E.g., updating the mean:
\bar{x}_N = \bar{x}_{N-1} + \frac{1}{N}\left( x[N] - \bar{x}_{N-1} \right)
i.e., the new estimate is the old estimate plus a correction term driven by the new observation.

Can we do this for any LSE?

Sequential least squares estimators

Now we index the estimators using the time step n.
Let us denote H[n] as the matrix whose rows are h^T[0], h^T[1], ..., h^T[n], where each h^T[i] is a row vector.
Since H[n] is H[n-1] with the row h^T[n] appended, we can write the Grammian as
H^T[n] H[n] = H^T[n-1] H[n-1] + h[n] h^T[n]

Sequential least squares estimators

Note that
\hat\theta[n] = (H^T[n] H[n])^{-1} H^T[n] x[n]

Since H^T[n] x[n] = H^T[n-1] x[n-1] + h[n] x[n], and using the Grammian recursion above, this can be written in terms of the previous estimate.

Defining \Sigma[n] = (H^T[n] H[n])^{-1} and using the matrix inversion lemma,
\Sigma[n] = \Sigma[n-1] - \frac{\Sigma[n-1] h[n] h^T[n] \Sigma[n-1]}{1 + h^T[n] \Sigma[n-1] h[n]}

Sequential least squares estimators

Substituting this into the estimator and simplifying, we get
\hat\theta[n] = \hat\theta[n-1] + K[n] \left( x[n] - h^T[n] \hat\theta[n-1] \right)
where the correction gain factor is
K[n] = \frac{\Sigma[n-1] h[n]}{1 + h^T[n] \Sigma[n-1] h[n]}
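A minimal sketch of these sequential (recursive) least-squares updates; the regressor sequence, noise level, and the initialization of \Sigma are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(10)

# Sequential (recursive) least squares for the linear model x[n] = h_n^T theta + w[n].
p = 2
theta_true = np.array([1.0, -0.5])

theta = np.zeros(p)        # running estimate
Sigma = 1e6 * np.eye(p)    # running inverse Grammian (large initial value = weak prior)

for n in range(1000):
    h = np.array([1.0, np.sin(0.1 * n)])             # current regressor row (illustrative)
    x_n = h @ theta_true + 0.1 * rng.standard_normal()

    K = Sigma @ h / (1.0 + h @ Sigma @ h)             # correction gain factor
    theta = theta + K * (x_n - h @ theta)             # update driven by the prediction error
    Sigma = Sigma - np.outer(K, h) @ Sigma            # inverse-Grammian update (matrix inversion lemma)

print(theta)   # converges towards theta_true
```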
