
Estimation theory

Senaka Samarasekera

SSP algorithm development process

Formulation of the problem.
Selection of a computational structure with well-defined parameters for the implementation of the estimator.
Selection of a criterion of performance or cost function that measures the performance of the estimator under some assumptions about the statistical properties of the signals to be processed.
Optimization of the performance criterion to determine the parameters of the optimum estimator.
Evaluation of the optimum value of the performance criterion to determine whether the optimum estimator satisfies the design specifications.

Formulation of the problem

Many practical applications (e.g., speech, audio, and image coding) require subjective criteria that are difficult to express mathematically.
Thus, we focus on criteria of performance that
only depend on the estimation error e(n),
provide a sufficient measure of user satisfaction, and
lead to a mathematically tractable problem.
We generally select a criterion of performance by compromising between these objectives.

Estimators
Estimator: a function of the DATA (a.k.a. a STATISTIC) that approximates the ACTUAL VALUE of the PARAMETER in our mathematical model.
Scenario 1: let us observe a DC voltage using a noisy voltmeter.
We can model this as x[n] = A + w[n], where w[n] is the noise process.
We can use the median or the mean as the estimator \hat{A}, which we will use as an approximation to the true value of the parameter A.
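As a quick illustration (a minimal sketch, not part of the lecture slides; the values A = 5, N = 100 and the noise level are assumed purely for illustration), Scenario 1 can be simulated and the mean and median estimators compared:

```python
import numpy as np

rng = np.random.default_rng(0)

A_true = 5.0   # assumed true DC level (illustrative value)
N = 100        # number of observations
sigma = 1.0    # assumed noise standard deviation

# Scenario 1: x[n] = A + w[n], with w[n] zero-mean Gaussian noise
x = A_true + sigma * rng.standard_normal(N)

A_hat_mean = np.mean(x)      # sample-mean estimator
A_hat_median = np.median(x)  # sample-median estimator

print(f"mean estimate:   {A_hat_mean:.3f}")
print(f"median estimate: {A_hat_median:.3f}")
```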

Cost of an estimator
Bias: systematic error (try to avoid this if possible, but not at all costs)
B(\hat\theta) = E[\hat\theta] - \theta
Unbiased estimators: B(\hat\theta) = 0
Variance
var(\hat\theta) = E[(\hat\theta - E[\hat\theta])^2]
Minimum variance estimator: \hat\theta_{MV} = \arg\min var(\hat\theta)
Mean-square error
MSE(\hat\theta) = E[(\hat\theta - \theta)^2] = var(\hat\theta) + B^2(\hat\theta)
Minimum mean-square error estimator: \hat\theta_{MMSE} = \arg\min MSE(\hat\theta)

Small change in Scenario 1

Now let us take an estimator of the form \hat{A} = a \bar{x}, a scaled version of the sample mean \bar{x} = \frac{1}{N}\sum_n x[n].
Find the bias, the variance, and the value of a that will minimize the mean square error.
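A minimal numerical sketch of this exercise, assuming the modified estimator is the scaled sample mean \hat{A} = a\bar{x} and using illustrative values for A, \sigma^2 and N; it sweeps a and compares the empirically minimizing value with the analytic optimum a = A^2/(A^2 + \sigma^2/N):

```python
import numpy as np

# Sketch: bias/variance/MSE of the scaled sample mean A_hat = a * mean(x)
# (the form of the estimator and the values below are assumptions for illustration).
A, sigma2, N = 1.0, 1.0, 10
a = np.linspace(0.0, 1.5, 301)

bias = (a - 1.0) * A          # E[A_hat] - A
var = a**2 * sigma2 / N       # var(A_hat), since var(mean) = sigma^2 / N
mse = var + bias**2           # MSE = variance + bias^2

a_opt_numeric = a[np.argmin(mse)]
a_opt_analytic = A**2 / (A**2 + sigma2 / N)
print(a_opt_numeric, a_opt_analytic)   # both close to A^2 / (A^2 + sigma^2/N)
```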

Likelihood function
A pdf parameterized by an unknown parameter: p(x; \theta), viewed as a function of \theta.
Eg: if the noise was Gaussian in the DC measurement case,
p(x[n]; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x[n]-A)^2}{2\sigma^2}\right)
What would be the case for multiple observations, assuming i.i.d. Gaussian noise?
The stock price of a growing company can be modelled as a deterministic growth trend plus i.i.d. AWGN. What is the likelihood function? Is this process stationary or non-stationary?

Assignment 3 question 2
Find the likelihood function of the autoregressive moving average (ARMA) model, where the driving noise is i.i.d. zero-mean Gaussian.

Deterministic vs Random parameters
Deterministic parameters: only one value is possible, but we do not know that value (this lecture).
Random parameters: an a priori pdf can be defined on the parameter. This accounts for the prior uncertainty about the parameter, which is therefore treated as a random variable.
In the next lecture we will look at Bayesian methods, which estimate random parameters.

Estimator development
Deterministic parameter case:
Least squares estimator
Maximum likelihood estimator
Method of moments estimator
There is no explicit guarantee on the optimality of the estimator cost, so we will have to derive conditions of optimality for these estimators.

Optimal estimators: minimize a suitable average cost function
Unbiased
Minimum variance (unbiased)
Minimum mean square error
Minimax estimators: minimize the maximum risk, i.e. the best estimator in the worst-case scenario.

Maximum likelihood estimators

MLE: \hat\theta_{ML} = \arg\max_\theta p(x; \theta)

Note that for unimodal likelihood functions the maximum is unique and can be found by setting the derivative of the likelihood (or log-likelihood) to zero.

Advantages
Most versatile estimator of the lot
Has a nice physical intuition to it
Has a direct way of finding it
Is consistent, asymptotically unbiased, asymptotically optimal (w.r.t. MSE), and invariant under functional transformations
Therefore, for a large number of observations, MLE -> MVUE

DC level

If we extend this to N observations,
p(x; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)

DC level example
Let x[n] = A + w[n], for n = 0, 1, ..., N-1, be observations from a noisy voltmeter measuring a DC level A,
where w[n] is i.i.d. Gaussian noise with zero mean and unit variance.
What is the maximum likelihood estimator? What if the variance was also unknown?
Note: for exponential-type likelihoods we can equivalently maximize the log-likelihood function. This helps to avoid the numerical underflow/overflow that is quite common in likelihood computations.
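A minimal sketch of this example (the values of A, \sigma^2 and N are illustrative assumptions): the closed-form MLEs for the DC level and the noise variance, checked against the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
A_true, sigma2_true, N = 2.0, 1.0, 500    # illustrative values
x = A_true + np.sqrt(sigma2_true) * rng.standard_normal(N)

def log_likelihood(x, A, sigma2):
    """Gaussian log-likelihood for the DC-level model x[n] = A + w[n]."""
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) - np.sum((x - A) ** 2) / (2 * sigma2)

# Closed-form MLEs (obtained by setting the gradient of the log-likelihood to zero)
A_mle = np.mean(x)                        # MLE of the DC level
sigma2_mle = np.mean((x - A_mle) ** 2)    # MLE of the variance (biased by a factor (N-1)/N)

print(A_mle, sigma2_mle)
# The log-likelihood at the MLE is never smaller than at any other parameter value
print(log_likelihood(x, A_mle, sigma2_mle) >= log_likelihood(x, A_true, sigma2_true))
```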

Assignment 3 question 3
A noisy oscillator at a modulator is modeled as a sinusoid with an unknown phase offset plus i.i.d. standard Gaussian noise. If the phase offset of the oscillator is a constant over N samples, find an estimator of the phase using these N samples.

Numerical determination of the MLE

For complicated likelihood functions you can use numerical optimization methods such as
Grid search
Newton-Raphson
Scoring method
Expectation Maximization

Warning: these can get stuck in a local maximum and are not guaranteed to find the global maximum.
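A minimal Newton-Raphson sketch, using an exponential-distribution likelihood as an assumed stand-in example (the data and the starting point are illustrative); the iteration solves the score equation and is compared with the closed-form MLE:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=1000)   # illustrative data, Exp(rate = 0.5)
N, S = len(x), np.sum(x)

def score(lam):        # derivative of the log-likelihood N*log(lam) - lam*S
    return N / lam - S

def score_deriv(lam):  # second derivative of the log-likelihood
    return -N / lam**2

lam = 1.0              # initial guess (illustrative)
for _ in range(20):    # Newton-Raphson iterations on the score equation
    lam = lam - score(lam) / score_deriv(lam)

print(lam, 1.0 / np.mean(x))   # Newton solution vs. closed-form MLE 1/mean(x)
```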

Expectation Maximization method

Uses auxiliary variables to simplify the likelihood. (Adding variables simplifies the problem?)

Let z be the unobserved auxiliary variable vector.
EM solves the problem by looking at the complete-data likelihood p(x, z; \theta) instead of p(x; \theta).
When is this natural (or effective)?
Eg: when we can choose z such that the complete-data likelihood has a nice low-dimensional sufficient statistic.

Expectation Maximization method

For each iteration k, with the current estimate \theta^{(k)}, the data x, and the complete-data likelihood p(x, z; \theta):

Expectation step:
Compute Q(\theta; \theta^{(k)}) = E_{z|x; \theta^{(k)}}[\log p(x, z; \theta)]
Maximization step:
\theta^{(k+1)} = \arg\max_\theta Q(\theta; \theta^{(k)})

For exponential-family likelihoods the EM method has the property that the likelihood is non-decreasing at every iteration, so it converges to a local maximum (see the assignment questions later).
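A minimal EM sketch for a zero-mean two-component Gaussian mixture with unknown weight and variances (the same kind of model used in the method-of-moments example later); all numerical values and the initialization are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: zero-mean two-component Gaussian mixture (weight and variances assumed)
eps_true, s1_true, s2_true = 0.3, 1.0, 9.0
z = rng.random(2000) < eps_true
x = np.where(z, np.sqrt(s1_true), np.sqrt(s2_true)) * rng.standard_normal(2000)

def gauss(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

eps, s1, s2 = 0.5, 0.5, 5.0              # initial guesses (illustrative)
for _ in range(200):
    # E-step: posterior probability that each sample came from component 1
    num = eps * gauss(x, s1)
    gamma = num / (num + (1 - eps) * gauss(x, s2))
    # M-step: re-estimate the weight and variances from the weighted samples
    eps = np.mean(gamma)
    s1 = np.sum(gamma * x**2) / np.sum(gamma)
    s2 = np.sum((1 - gamma) * x**2) / np.sum(1 - gamma)

print(eps, s1, s2)   # should approach the values used to generate the data
```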

Optimality of MLE for the linear model
Let the data be defined via the general linear model
x = H\theta + w
where H is a known observation matrix and w \sim N(0, C). Then the MLE is
\hat\theta = (H^T C^{-1} H)^{-1} H^T C^{-1} x

This is an efficient estimator, as it attains the CRLB, and the pdf of the estimator is
\hat\theta \sim N(\theta, (H^T C^{-1} H)^{-1})
Assignment: Prove this.
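A minimal sketch of the linear-model MLE for the white-noise case C = \sigma^2 I (the observation matrix and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative general linear model x = H @ theta + w with white Gaussian noise
N, sigma = 50, 0.5
H = np.column_stack([np.ones(N), np.arange(N)])   # assumed known observation matrix
theta_true = np.array([1.0, 0.1])
x = H @ theta_true + sigma * rng.standard_normal(N)

# MLE for the linear model with white noise: theta_hat = (H^T H)^{-1} H^T x
theta_mle = np.linalg.solve(H.T @ H, H.T @ x)
print(theta_mle)

# Covariance of the estimator (attains the CRLB): sigma^2 (H^T H)^{-1}
cov_theta = sigma**2 * np.linalg.inv(H.T @ H)
print(np.sqrt(np.diag(cov_theta)))
```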

Properties of MLE: Consistency

Assuming i.i.d. samples, the normalized log-likelihood is
\frac{1}{N}\log p(x; \theta) = \frac{1}{N}\sum_n \log p(x[n]; \theta)

From the law of large numbers,
\frac{1}{N}\sum_n \log p(x[n]; \theta) \to E_{\theta_0}[\log p(x; \theta)]   (1)

Recall the non-negativity of the Kullback-Leibler divergence, D(p_{\theta_0} \| p_\theta) \ge 0, which implies
E_{\theta_0}[\log p(x; \theta)] \le E_{\theta_0}[\log p(x; \theta_0)]

So the RHS of (1) is maximized at \theta = \theta_0. So if we take N \to \infty, the maximizer of the log-likelihood converges to \theta_0, i.e. \hat\theta_{ML} \to \theta_0.

Properties of MLE: Asymptotic optimality
From the mean value theorem, the score function satisfies
\frac{\partial \log p(x;\theta)}{\partial\theta}\Big|_{\hat\theta} = \frac{\partial \log p(x;\theta)}{\partial\theta}\Big|_{\theta_0} + \frac{\partial^2 \log p(x;\theta)}{\partial\theta^2}\Big|_{\bar\theta}(\hat\theta - \theta_0)
for some \bar\theta between \hat\theta and \theta_0.
Since the left-hand side is zero at the MLE,
\sqrt{N}(\hat\theta - \theta_0) = \frac{-\frac{1}{\sqrt{N}}\frac{\partial \log p(x;\theta)}{\partial\theta}\big|_{\theta_0}}{\frac{1}{N}\frac{\partial^2 \log p(x;\theta)}{\partial\theta^2}\big|_{\bar\theta}}
Assuming i.i.d. samples, the denominator becomes
\frac{1}{N}\sum_n \frac{\partial^2 \log p(x[n];\theta)}{\partial\theta^2}\Big|_{\bar\theta} \to -i(\theta_0)
since \hat\theta \to \theta_0 (consistency) and E\left[\frac{\partial^2 \log p(x[n];\theta)}{\partial\theta^2}\right] = -i(\theta_0), the Fisher information of a single sample.
Properties of MLE: Asymptotic optimality
For i.i.d. samples the numerator is
\frac{1}{\sqrt{N}}\sum_n \frac{\partial \log p(x[n];\theta)}{\partial\theta}\Big|_{\theta_0}
Note that the summands are independent random variables too.
Using the central limit theorem, the numerator converges in distribution to a Gaussian.
But E\left[\frac{\partial \log p(x[n];\theta)}{\partial\theta}\big|_{\theta_0}\right] = 0 and Var\left[\frac{\partial \log p(x[n];\theta)}{\partial\theta}\big|_{\theta_0}\right] = i(\theta_0).
Using Slutsky's theorem,
\sqrt{N}(\hat\theta - \theta_0) \to N\left(0, i(\theta_0)^{-1}\right)
so the MLE is asymptotically unbiased and asymptotically attains the CRLB.

Invariance properties
Looks at the MLE of a transformed parameter \alpha = g(\theta).
If a measurement equation is given by the likelihood function p(x; \theta), and if g is one-to-one, then
\hat\alpha_{ML} = g(\hat\theta_{ML})

Method of moments

Method of moment estimators

This is a simple estimator that can be used
as it is, if its performance is good enough, or
as a starting point for the MLE (which would then be consistent).

Let \mu = [\mu_1, ..., \mu_p]^T be a vector of moments of the pdf (likelihood) and \theta be the underlying parameter vector. Then, using the pdf definition, we can find the function h such that \mu = h(\theta).
If h is invertible, \theta = h^{-1}(\mu).
Now, if we plug in the estimators of the respective moments (the sample moments), we get the MOM estimators: \hat\theta = h^{-1}(\hat\mu).
The multiple moments needed can be easily found via the moment generating function.

Example
Let the likelihood function be a two-component Gaussian mixture with unknown variances and weight. Let both means be 0:
p(x[n]; \theta) = \epsilon N(0, \sigma_1^2) + (1-\epsilon) N(0, \sigma_2^2)
Find the MOM estimators.

Example
Let the required moments be estimated by their sample moments (for a zero-mean mixture, only the even moments are non-zero).
Then the parameters (after some algebra) can be shown to be functions of these sample moments.
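A simplified sketch of the moment-matching idea: here the two component variances are assumed known and only the mixture weight is estimated, so a single moment equation suffices (the full example above needs several moments); all values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simplified version of the mixture example: variances are assumed known,
# only the mixture weight eps is estimated from the second moment.
s1, s2 = 1.0, 9.0                 # assumed known component variances
eps_true = 0.3
z = rng.random(5000) < eps_true
x = np.where(z, np.sqrt(s1), np.sqrt(s2)) * rng.standard_normal(5000)

# Moment equation: E[x^2] = eps*s1 + (1 - eps)*s2  =>  eps = (E[x^2] - s2) / (s1 - s2)
m2 = np.mean(x**2)                # sample second moment (plug-in estimate)
eps_mom = (m2 - s2) / (s1 - s2)
print(eps_mom)
```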

Minimum variance unbiased estimators (MVUE)

MVUE
Assume two observations with the likelihood functions p(x[0]; \theta) and p(x[1]; \theta).

Assuming that the MVUE is of the form \hat\theta = a_0 x[0] + a_1 x[1],
find the MVUE.
What if the likelihood function of the second observation changed?

MVUE
Sometimes the variance and the bias equations become so complicated that direct optimization methods fail.
Sometimes no single function of the data gives the minimum variance for all possible parameter values.

Finding the MVUE

So we use three indirect approaches to find the MVUE:
Find a sufficient statistic. Find an unbiased estimator and condition it on the sufficient statistic.
Find the CRLB. Select a functional form using some other knowledge. Find the parameters that come nearest to the CRLB.
Constrain the model to be linear -> BLUE (best linear unbiased estimator).

MVUE I: Sufficient statistic
MVUE in the exponential family

Sufficient statistic
All the information about the parameter that is in the likelihood function comes through the sufficient statistic.
Note: the raw data itself is a sufficient statistic.
An MVUE should be a function of the sufficient statistic.
Minimal sufficient statistic: the smallest of them all (in dimension).
A minimal sufficient statistic is always a function of every other sufficient statistic.

Complete statistics
If the whole parameter set is identifiable using the sufficient statistic, it is called a COMPLETE statistic.
A sufficient statistic T(x) is complete iff
E_\theta[g(T(x))] = 0 for all \theta implies g(T(x)) = 0 (almost everywhere).
How does this condition relate to parameter identifiability?
Note that
E_\theta[g(T)] = \int g(t) p(t; \theta) dt
Since the only function in the null space of this integral operator is the zero function, the space of all possible functions g is spanned by the parametric family \{p(t; \theta)\}.

Neyman-Fisher factorization theorem
T(x) is a sufficient statistic for the parameter \theta iff we can factor the likelihood function as
p(x; \theta) = g(T(x), \theta) h(x)

In the DC level observation example, find the sufficient statistic
1. for the DC level when the noise power is known,
2. for the noise power when the DC level is 0,
3. for the DC level and the noise power jointly.

Rao-Blackwell theorem

Let X and Y be two random variables (vectors). Define the conditional expectation of X given Y as \hat{X} = E[X | Y].
Then
E[\hat{X}] = E[X] and var(\hat{X}) \le var(X).

Rao-Blackwell theorem proof

Property 1: E[\hat{X}] = E[E[X | Y]] = E[X] (law of iterated expectations).
Property 2: var(X) = var(E[X | Y]) + E[var(X | Y)] \ge var(\hat{X}) (law of total variance).

Rao-Blackwell theorem applied to estimators
Let \hat\theta be an estimator of a parameter \theta. Then the conditional expectation of \hat\theta given a sufficient statistic T(x),
\tilde\theta = E[\hat\theta | T(x)],
is always at least as good an estimator of \theta, and is never worse.
From a mean square error perspective this means
E[(\tilde\theta - \theta)^2] \le E[(\hat\theta - \theta)^2].

Lehmann-Scheffé theorem
If a statistic is UNBIASED, COMPLETE and SUFFICIENT for some parameter \theta, then this statistic has the minimum expected loss for ANY CONVEX LOSS FUNCTION.
In many practical applications with the squared loss function, it has the smallest mean squared error among all estimators with the same expected value.
Hence an unbiased, complete, sufficient statistic is an MVUE.

Pitman-Koopman theorem

Among families of probability distributions whose domain does not vary with the parameter being estimated, only in the EXPONENTIAL FAMILY is there a sufficient statistic whose dimension remains bounded as the sample size increases.
So, practically, we can find worthwhile minimal sufficient statistics only for the exponential family of distributions.
So what is this magical exponential family?

Exponential family
Not to be confused with the exponential distribution (which is also a member of this family when the parameter of interest is the mean).
This is concerned with how the pdfs are parameterized.
A good number of common pdfs belong to this family when some of their parameters are known.
Eg:
Poisson distribution with unknown mean
Exponential distribution with unknown mean
Gaussian distribution with unknown mean / unknown variance / both mean and variance unknown

Exponential family
Definition

A set of probability distributions admitting the following canonical decomposition:
p(x; \theta) = h(x) \exp\left( \langle t(x), \eta(\theta) \rangle - F(\theta) \right)
Where
t(x) = sufficient statistic
\eta(\theta) = natural parameters
\langle \cdot, \cdot \rangle = inner/dot product
F = log-normalizer
h(x) = carrier measure

If the observation x is a scalar, the pdf is UNIVARIATE; otherwise, if the observation is a vector, the pdf is MULTIVARIATE.
The ORDER of a member in this family is the DIMENSION OF THE natural parameter space (equivalently, of the sufficient statistic).

Example
Univariate Poisson distribution with unknown mean \lambda.
Recall that the pmf (since x = 0, 1, 2, ...) is
p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}
This can be rearranged to
p(x; \lambda) = \frac{1}{x!} \exp\left( x \log\lambda - \lambda \right)
So t(x) = x, \eta = \log\lambda, F(\eta) = e^\eta = \lambda, and h(x) = 1/x!.
This is an exponential family member of order 1.

Example
Find the sufficient statistic, natural parameters, log-normalizer, and the carrier measure for the univariate Gaussian distribution with unknown mean and variance.
Since
p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \exp\left( \frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right)

Therefore
t(x) = [x, x^2]^T, \eta = [\mu/\sigma^2, -1/(2\sigma^2)]^T, F = \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2), h(x) = 1.
This is an exponential family member of order 2.

Exponential Family

Assignment question
Prove that a (properly normalized) product of arbitrary exponential family members is also a member of the exponential family.
Is the same true for mixtures of exponential family members?

Log-normalizer
Exponential families are characterized by their strictly convex and differentiable function F, called the log-normalizer or the partition function.
Since the pdf must integrate to one, we have
F(\eta) = \log \int h(x) \exp\left( \langle t(x), \eta \rangle \right) dx

Log-normalizer
It is also related to the moment generating and cumulant generating functions of the sufficient statistics.
The moment generating function of the sufficient statistic is
M_{t(x)}(u) = E\left[ e^{\langle u, t(x) \rangle} \right] = \exp\left( F(\eta + u) - F(\eta) \right)
Since the cumulant generating function is the log of the MGF,
K_{t(x)}(u) = F(\eta + u) - F(\eta)

Log-normalizer
Therefore, in the exponential family we can easily find the mean and the covariance of the sufficient statistic as
E[t(x)] = \nabla F(\eta), Cov[t(x)] = \nabla^2 F(\eta)

The Fisher information of an exponential family member also becomes
I(\eta) = \nabla^2 F(\eta)
Assignment question: Prove this.

MLE in exponential family

Taking the log-likelihood of N i.i.d. observations,
\log p(x; \eta) = \sum_n \log h(x[n]) + \left\langle \eta, \sum_n t(x[n]) \right\rangle - N F(\eta)

Taking the gradient with respect to \eta,
\nabla_\eta \log p(x; \eta) = \sum_n t(x[n]) - N \nabla F(\eta)

At stationary points the gradient is zero. Therefore the estimators can be found by solving the equation
\nabla F(\hat\eta) = \frac{1}{N} \sum_n t(x[n])
which is also the method of moments estimator in this case (it matches E[t(x)] to the sample mean of t(x)).
Also note that the Hessian
\nabla^2_\eta \log p(x; \eta) = -N \nabla^2 F(\eta)
is negative definite, showing that the log-likelihood function is strictly concave and the stationary point is a maximum.
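A minimal sketch of this moment-matching view of the exponential-family MLE, using the Gaussian with unknown mean and variance (sufficient statistics t(x) = [x, x^2]); the data values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
x = 3.0 + 2.0 * rng.standard_normal(10000)   # illustrative Gaussian data

# For the Gaussian, the sufficient statistics are t(x) = [x, x^2].
# The exponential-family MLE solves  grad F(eta) = (1/N) sum t(x[n]),
# i.e. it matches the model moments E[t(x)] to the sample moments of t(x).
t_bar = np.array([np.mean(x), np.mean(x**2)])

mu_mle = t_bar[0]                   # E[x] = mu
var_mle = t_bar[1] - t_bar[0]**2    # E[x^2] = mu^2 + sigma^2  =>  sigma^2 = E[x^2] - mu^2
print(mu_mle, var_mle)
```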

Assignment questions
Show that the MLE and the MVUE are both efficient for the natural parameters of the exponential family.
Show that the Expectation Maximization method finds a local maximum of an exponential family joint distribution (of observed and unobserved data).

Completeness of the sufficient statistic in the exponential family
Theorem: A sufficient statistic of an exponential family member is a complete statistic.
Proof
Suppose E_\eta[g(T)] = 0 for all \eta. We can write this as
\int g(t) q(t) e^{\langle \eta, t \rangle - F(\eta)} dt = 0
where q(t) is the density factor that does not depend on \eta.
This is a scaled Laplace transform of the function g(t) q(t). From the uniqueness of the Laplace transform, this implies g(t) q(t) = 0. Since q(t) > 0, this implies g(t) = 0, i.e. the statistic is complete.

MVUE II: Best linear unbiased estimator

Wiener Filter

We estimate a desired signal d(n) as a linear combination of the observed data, \hat{d}(n) = \sum_{k=0}^{m-1} w_k x(n-k).
In our compact notation this can be written as \hat{d} = X w.
If the number of samples is equal to the number of coefficients, we can find a w that makes the error d - \hat{d} zero.
If N > m (which is normally the case), we will try to find the w that minimizes the squared error \sum_n |d(n) - \hat{d}(n)|^2.

Wiener Filter

Let us define the
cross-correlation vector p, where each component is p_k = E[d(n) x(n-k)], and the
auto-correlation matrix R, where each component is R_{jk} = E[x(n-j) x(n-k)].

Wiener filter

Now we find an estimate \hat{d}(n) = w^T x(n),
where the optimum coefficients satisfy the normal equations R w = p, i.e. w = R^{-1} p.
Applications of Wiener filter
Noise cancellation
System identification
Channel equalization

BLUE
Is an extension of the Wiener filtering process.
Now we assume a linear estimator \hat\theta = A x for some matrix A.
For this to be unbiased we need E[\hat\theta] = \theta for every \theta.
If we assume x = H\theta + w and E[w] = 0, then E[\hat\theta] = A H \theta, so unbiasedness requires A H = I.
If the covariance matrix of w is C, then the variance of each component is
var(\hat\theta_i) = a_i^T C a_i,
where a_i^T is the i-th row of A, and we minimize each of these subject to A H = I.

BLUE
Taking the gradient of each variance and equating it to zero, we end up with the BLUE
\hat\theta = (H^T C^{-1} H)^{-1} H^T C^{-1} x
and
Cov(\hat\theta) = (H^T C^{-1} H)^{-1}

Assignment: Proof: take each variance and use Lagrange multipliers to enforce A H = I. Then minimizing the augmented cost function gives the BLUE.
Ref: Kay (Appendix 6B, pg. 153)
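A minimal sketch of the BLUE for a linear model with a known, non-white noise covariance (the model matrix, covariance, and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustrative linear model x = H @ theta + w with known (non-white) noise covariance C.
# Only the mean and covariance of w are used, not its full pdf.
N = 40
H = np.column_stack([np.ones(N), np.arange(N) / N])
theta_true = np.array([2.0, -1.0])
C = np.diag(0.1 + np.arange(N) / N)          # assumed known noise covariance
w = rng.multivariate_normal(np.zeros(N), C)
x = H @ theta_true + w

Ci = np.linalg.inv(C)
theta_blue = np.linalg.solve(H.T @ Ci @ H, H.T @ Ci @ x)   # BLUE estimate
cov_blue = np.linalg.inv(H.T @ Ci @ H)                     # its covariance
print(theta_blue)
```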

Least squares estimators

Least squares estimators (LSE)

One of the oldest methods, going back to Legendre and Gauss.
Part of classical regression analysis.
Also known as data/curve fitting.
Does not formulate a likelihood function; just uses the signal model directly.
No probabilistic modeling of the noise.
Cannot claim any probabilistic optimality properties.
Two main classes of problems: linear and non-linear least squares.

Linear least squares estimators

For data x, parameter estimates \theta, and a model (matrix) H, the squared error function becomes
J(\theta) = (x - H\theta)^T (x - H\theta)

Taking the gradient,
\nabla_\theta J = -2 H^T x + 2 H^T H \theta

Setting it to zero to find the minimum,
\hat\theta = (H^T H)^{-1} H^T x
Using this in the squared error equation, the minimum error becomes
J_{min} = x^T (x - H\hat\theta) = x^T x - x^T H (H^T H)^{-1} H^T x
Proof: Kay pg. 225

Weighted LSE
What if we change the error criterion to
J(\theta) = (x - H\theta)^T W (x - H\theta)
where W is a positive definite weighting matrix?

When would we use this? When all data points are not the same (e.g., some are noisier than others).
Now
\hat\theta = (H^T W H)^{-1} H^T W x
and
J_{min} = x^T W x - x^T W H (H^T W H)^{-1} H^T W x
If we add a probabilistic description of the noise, which enables us to formulate a likelihood function, then W characterizes the spectral characteristics of the noise (the natural choice is W = C^{-1}, the inverse noise covariance).
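A minimal weighted-LSE sketch where the weights are chosen as the inverse noise variances (the model and noise levels are illustrative assumptions), compared against the ordinary LSE:

```python
import numpy as np

rng = np.random.default_rng(9)

# Weighted least squares sketch: noisier samples get smaller weight.
N = 30
H = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
theta_true = np.array([1.0, 0.5])
sigma = np.where(np.arange(N) < 15, 0.1, 2.0)      # second half is much noisier
x = H @ theta_true + sigma * rng.standard_normal(N)

W = np.diag(1.0 / sigma**2)                         # weight matrix (inverse noise variance)
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ x)
theta_ls = np.linalg.solve(H.T @ H, H.T @ x)        # ordinary LSE for comparison
print(theta_wls, theta_ls)
```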

Geometrical interpretation

Let H = [h_1, h_2, ..., h_p], where each column of the model matrix is now viewed as an N-dimensional vector.
If the model is to be identifiable, these columns should be independent, thus spanning a p-dimensional subspace.
Let us define the signal estimate \hat{s} = H\hat\theta, the signal-space approximation of the data.
To minimize J, the error x - \hat{s} should be orthogonal to this p-dimensional subspace.
Hence minimizing the squared error is equivalent to making the error orthogonal to the columns of H, i.e. H^T (x - H\hat\theta) = 0.
Eg: Assume a model with two columns h_1 and h_2.

Geometrical interpretation
What happens if the columns of H are orthonormal? Then the projection onto the signal space becomes
\hat{s} = H H^T x
Comparing with the earlier result, this shows that the matrix H behaves like a unitary matrix on its column space, as H^T H = I.

Geometrical interpretation
Let us look at the signal estimate with the LSE:
\hat{s} = H\hat\theta = H (H^T H)^{-1} H^T x

Define the projection matrix P = H (H^T H)^{-1} H^T, which maps the data onto the signal space.
By defining its complement P^\perp = I - P, we see that the error is
e = x - \hat{s} = P^\perp x
and the minimum cost is
J_{min} = x^T P^\perp x

Sequential least squares estimators

So far, the whole set of data points was taken in one single vector x. This is called batch processing.
What happens if data arrive sequentially? Can we update the LSE sequentially?
E.g., updating the mean:
\bar{x}_N = \bar{x}_{N-1} + \frac{1}{N}\left( x[N] - \bar{x}_{N-1} \right)
i.e., the new estimate is the old estimate plus a correction term driven by the new observation.

Can we do this for any LSE?

Sequential least squares estimators

Now we index the estimators using the time step n.
Let us denote H[n] as the matrix whose rows are h^T[0], h^T[1], ..., h^T[n], where each h^T[i] is a row vector.
Since H[n] is H[n-1] with the row h^T[n] appended, we can write the Grammian as
H^T[n] H[n] = H^T[n-1] H[n-1] + h[n] h^T[n]

Sequential least squares estimators

Note that
\hat\theta[n] = (H^T[n] H[n])^{-1} H^T[n] x[n]

Since H^T[n] x[n] = H^T[n-1] x[n-1] + h[n] x[n], and using the Grammian recursion above, this can be written in terms of the previous estimate.

Defining \Sigma[n] = (H^T[n] H[n])^{-1} and using the matrix inversion lemma,
\Sigma[n] = \Sigma[n-1] - \frac{\Sigma[n-1] h[n] h^T[n] \Sigma[n-1]}{1 + h^T[n] \Sigma[n-1] h[n]}

Sequential least squares estimators

Substituting this into the estimator and simplifying, we get
\hat\theta[n] = \hat\theta[n-1] + K[n] \left( x[n] - h^T[n] \hat\theta[n-1] \right)
where the correction gain factor is
K[n] = \frac{\Sigma[n-1] h[n]}{1 + h^T[n] \Sigma[n-1] h[n]}
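A minimal sketch of these sequential (recursive) least-squares updates; the regressor sequence, noise level, and the initialization of \Sigma are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(10)

# Sequential (recursive) least squares for the linear model x[n] = h_n^T theta + w[n].
p = 2
theta_true = np.array([1.0, -0.5])

theta = np.zeros(p)        # running estimate
Sigma = 1e6 * np.eye(p)    # running inverse Grammian (large initial value = weak prior)

for n in range(1000):
    h = np.array([1.0, np.sin(0.1 * n)])             # current regressor row (illustrative)
    x_n = h @ theta_true + 0.1 * rng.standard_normal()

    K = Sigma @ h / (1.0 + h @ Sigma @ h)             # correction gain factor
    theta = theta + K * (x_n - h @ theta)             # update driven by the prediction error
    Sigma = Sigma - np.outer(K, h) @ Sigma            # inverse-Grammian update (matrix inversion lemma)

print(theta)   # converges towards theta_true
```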
