
Chapter 5

Estimation Theory and Applications


References:
S.M.Kay, Fundamentals of Statistical Signal Processing: Estimation
Theory, Prentice Hall, 1993

Estimation Theory and Applications


Application Areas
1. Radar

Radar system transmits an electromagnetic pulse s(n). It is reflected by an aircraft, causing an echo r(n) to be received after τ₀ seconds:

r(n) = s(n − τ₀) + w(n)

where the range R of the aircraft is related to the time delay by

τ₀ = 2R / c

2. Mobile Communications

The position of the mobile terminal can be estimated using the time-of-arrival measurements received at the base stations.

3. Speech Processing
Recognition of human speech by a machine is a difficult task because
our voice changes from time to time.
Given a human voice, the estimation problem is to determine the speech as closely as possible.
4. Image Processing
Estimation of the position and orientation of an object from a camera
image is useful when using a robot to pick it up, e.g., bomb-disposal
5. Biomedical Engineering
Estimation of the heart rate of a fetus; the difficulty is that the measurements are corrupted by the mother's heartbeat as well.
6. Seismology
Estimation of the underground distance of an oil deposit based on
sound reflection due to the different densities of oil and rock layers.

Differences from Detection


1. Radar

Radar system transmits an electromagnetic pulse s(n). After some time, it receives a signal r(n). The detection problem is to decide whether r(n) is an echo from an object or not.

2. Communications
In wired or wireless communications, we need to know the information
sent from the transmitter to the receiver.
e.g., binary phase shift keying (BPSK) signals consist of only two symbols, 0 and 1. The detection problem is to decide whether a 0 or a 1 was transmitted.

3. Speech Processing
Given a human speech signal, the detection problem is to decide what the spoken word is from a set of predefined words, e.g., 0, 1, ..., 9

(Figure: waveform of the spoken word "0")

Another example is voice authentication: given a voice which is claimed to be from George Bush, we need to decide whether it is Bush or not.

4. Image Processing
Fingerprint authentication: given a fingerprint image whose claimed owner is A, we need to verify whether this is true or not.

Other biometric examples include face authentication, iris authentication,


etc.

5. Biomedical Engineering

(Figure: newspaper clipping, 17 Jan. 2003, Hong Kong Economic Times)

e.g., given some X-ray slides, the detection problem is to determine whether the patient has breast cancer or not

6. Seismology
To detect whether there is oil or no oil at a region

What is Estimation?
Extract or estimate some parameters from the observed signals, e.g.,

Use a voltmeter to measure a DC signal:

x[n] = A + w[n],    n = 0, 1, ..., N − 1

Given x[n], we need to find the DC value A
→ the parameter is directly observed in the received signal

Estimate the amplitude, frequency and phase of a sinusoid in noise:

x[n] = A cos(ω₀n + φ) + w[n],    n = 0, 1, ..., N − 1

Given x[n], we need to find A, ω₀ and φ
→ the parameters are not directly observed in the received signal

Estimate the value of resistance R from a set of voltage and current readings:

V[n] = V_actual[n] + w₁[n],    I[n] = I_actual[n] + w₂[n],    n = 0, 1, ..., N − 1

Given N pairs of (V[n], I[n]), we need to estimate the resistance R; ideally, R = V / I
→ the parameter is not directly observed in the received signals

Estimate the position of the mobile terminal using time-of-arrival measurements:

r[n] = sqrt( (x_s − x_n)² + (y_s − y_n)² ) / c + w[n],    n = 0, 1, ..., N − 1

Given r[n], we need to find the mobile position (x_s, y_s), where c is the signal propagation speed and (x_n, y_n) represents the known position of the n-th base station
→ the parameters are not directly observed in the received signals

Types of Parameter Estimation

Linear or non-linear
Linear:       DC value, amplitude of the sine wave
Non-linear:   frequency of the sine wave, mobile position

Single parameter or multiple parameters
Single:       DC value; scalar
Multiple:     amplitude, frequency and phase of sinusoid; vector

Constrained or unconstrained
Constrained:   use other available information & knowledge, e.g., from the N pairs of (V[n], I[n]) we draw a line which best fits the data points, and the estimate of the resistance is given by the slope of the line. We can add a constraint that the line should cross the origin (0,0)
Unconstrained: no further information & knowledge is available

Parameter is unknown deterministic or random
Unknown deterministic: constant but unknown (classical), e.g., the DC value is an unknown constant
Random:                random variable with prior knowledge of its PDF (Bayesian), e.g., if we have prior knowledge that the DC value is bounded by −A₀ and A₀ with a particular PDF → better estimate

Parameter is stationary or changing
Stationary: unknown deterministic for the whole observation period, e.g., time-of-arrivals of a static source
Changing:   unknown deterministic at different time instants, e.g., time-of-arrivals of a moving source

Performance Measures for Classical Parameter Estimation

Accuracy: is the estimator biased or unbiased?
e.g.,

x[n] = A + w[n],    n = 0, 1, ..., N − 1

where w[n] is a zero-mean random noise with variance σ_w²

Proposed estimators:

Â₁ = x[0]

Â₂ = (1/N) Σ_{n=0}^{N−1} x[n]

Â₃ = (1/(N−1)) Σ_{n=0}^{N−1} x[n]

Â₄ = ( Π_{n=0}^{N−1} x[n] )^{1/N} = ( x[0]·x[1] ··· x[N−1] )^{1/N}

Biased:                   E{Â} ≠ A
Unbiased:                 E{Â} = A
Asymptotically unbiased:  E{Â} → A only as N → ∞

Taking the expected values of Â₁, Â₂ and Â₃, we have

E{Â₁} = E{x[0]} = E{A} + E{w[0]} = A + 0 = A

E{Â₂} = E{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N) Σ_{n=0}^{N−1} E{A} + (1/N) Σ_{n=0}^{N−1} E{w[n]}
      = (1/N)·N·A + (1/N) Σ_{n=0}^{N−1} 0 = A

E{Â₃} = (1/(N−1))·N·A = A / (1 − 1/N)

Q. State the biasedness of Â₁, Â₂ and Â₃.

For Â₄, it is difficult to analyze the biasedness. However, for w[n] = 0:

( x[0]·x[1] ··· x[N−1] )^{1/N} = ( A·A ··· A )^{1/N} = ( A^N )^{1/N} = A

What is the value of the mean square error or variance?

They correspond to the fluctuation of the estimate in the second order:

MSE = E{(Â − A)²}                        (5.1)

var = E{(Â − E{Â})²}                     (5.2)

If the estimator is unbiased, then MSE = var

In general,

MSE = E{(Â − A)²} = E{(Â − E{Â} + E{Â} − A)²}
    = E{(Â − E{Â})²} + E{(E{Â} − A)²} + 2·E{(Â − E{Â})(E{Â} − A)}
    = var + (E{Â} − A)² + 2·(E{Â} − E{Â})(E{Â} − A)
    = var + (bias)²                      (5.3)

E{(Â₁ − A)²} = E{(x[0] − A)²} = E{(A + w[0] − A)²} = E{w²[0]} = σ_w²

E{(Â₂ − A)²} = E{ ( (1/N) Σ_{n=0}^{N−1} x[n] − A )² } = E{ ( (1/N) Σ_{n=0}^{N−1} w[n] )² } = σ_w² / N

E{(Â₃ − A)²} = E{ ( (1/(N−1)) Σ_{n=0}^{N−1} x[n] − A )² } = N·σ_w²/(N−1)² + A²/(N−1)²
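The bias and MSE expressions above can be checked numerically. The following MATLAB sketch is a minimal illustration and not part of the original notes; the values A = 1, σ_w² = 0.5, N = 20 and the trial count are arbitrary choices:

% Monte Carlo check of bias and MSE for A1 = x[0], A2 = sample mean, A3 = sum/(N-1)
A = 1; var_w = 0.5; N = 20; trials = 1e5;
x  = A + sqrt(var_w)*randn(trials, N);   % each row is one realization of x[n]
A1 = x(:,1);                             % uses the first sample only
A2 = mean(x, 2);                         % sample mean
A3 = sum(x, 2)/(N-1);                    % scaled by 1/(N-1)
bias = [mean(A1) mean(A2) mean(A3)] - A
mse  = [mean((A1-A).^2) mean((A2-A).^2) mean((A3-A).^2)]
% theory: bias = [0, 0, A/(N-1)]; MSE = [var_w, var_w/N, (N*var_w + A^2)/(N-1)^2]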

An optimum estimator should give estimates which are

Unbiased
Minimum variance (and hence minimum MSE)

Q. How do we know the estimator has the minimum variance?

Cramer-Rao Lower Bound (CRLB)
Performance bound in terms of the minimum achievable variance provided by any unbiased estimator
Used for classical parameter estimation
Requires knowledge of the noise PDF, and the PDF must have closed form
Easier to determine than other variance bounds

Let the parameters to be estimated be θ = [θ₁, θ₂, ..., θ_P]^T. The CRLB for θ_i in Gaussian noise is stated as follows:

CRLB(θ_i) = [J(θ)]_{i,i} = [I⁻¹(θ)]_{i,i}                    (5.4)

where

I(θ) = [ −E{∂²ln p(x;θ)/∂θ₁²}       −E{∂²ln p(x;θ)/∂θ₁∂θ₂}   ...   −E{∂²ln p(x;θ)/∂θ₁∂θ_P} ]
       [ −E{∂²ln p(x;θ)/∂θ₂∂θ₁}     −E{∂²ln p(x;θ)/∂θ₂²}            ⋮                       ]      (5.5)
       [ ⋮                                                     ⋱                             ]
       [ −E{∂²ln p(x;θ)/∂θ_P∂θ₁}    ...                              −E{∂²ln p(x;θ)/∂θ_P²}  ]

p(x; θ) represents the PDF of x = [x[0], x[1], ..., x[N−1]]^T parameterized by the unknown parameter vector θ

Note that
I(θ) is known as the Fisher information matrix

[J]_{i,i} is the (i,i) element of J, e.g.,

J = [ 1  2 ]   →   [J]_{2,2} = 3
    [ 2  3 ]

E{∂²ln p(x;θ)/∂θ_i∂θ_j} = E{∂²ln p(x;θ)/∂θ_j∂θ_i}, and hence I(θ) is symmetric

Review of Gaussian (Normal) Distribution

The Gaussian PDF for a scalar random variable x is defined as

p(x) = 1/sqrt(2πσ²) · exp( −(x − μ)² / (2σ²) )                       (5.6)

We can write x ~ N(μ, σ²)

The Gaussian PDF for a random vector x of size N is defined as

p(x) = 1/( (2π)^{N/2} det^{1/2}(C) ) · exp( −(1/2)(x − μ)^T C⁻¹ (x − μ) )      (5.7)

We can write x ~ N(μ, C)

The covariance matrix C has the form of

C = E{(x − μ)(x − μ)^T}
  = [ E{(x[0] − μ₀)²}                    E{(x[0] − μ₀)(x[1] − μ₁)}    ...   E{(x[0] − μ₀)(x[N−1] − μ_{N−1})} ]
    [ E{(x[1] − μ₁)(x[0] − μ₀)}          ⋱                                   ⋮                                ]      (5.8)
    [ ⋮                                                                       ⋮                                ]
    [ E{(x[N−1] − μ_{N−1})(x[0] − μ₀)}   ...                                  E{(x[N−1] − μ_{N−1})²}           ]

where
x = [x[0], x[1], ..., x[N−1]]^T
μ = E{x} = [μ₀, μ₁, ..., μ_{N−1}]^T

If x is a zero-mean white vector and all vector elements have variance σ², then

C = E{(x − μ)(x − μ)^T} = [ σ²  0  ...  0 ]
                          [ 0   σ²      ⋮ ]  = σ² I_N
                          [ ⋮       ⋱   0 ]
                          [ 0   ...  0  σ² ]

The Gaussian PDF for the random vector x can be simplified as

p(x) = 1/(2πσ²)^{N/2} · exp( −(1/(2σ²)) Σ_{n=0}^{N−1} x²[n] )               (5.9)

with the use of

C⁻¹ = σ⁻² I_N
det(C) = (σ²)^N = σ^{2N}

Example 5.1
Determine the PDFs of

x[0] = A + w[0]

and

x[n] = A + w[n],    n = 0, 1, ..., N − 1

where {w[n]} is a white Gaussian process with known variance σ_w² and A is a constant.

p(x[0]; A) = 1/sqrt(2πσ_w²) · exp( −(1/(2σ_w²)) (x[0] − A)² )

p(x; A) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)² )
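The vector PDF above is the quantity that the CRLB and ML examples below differentiate. As a quick numerical illustration (a sketch only; the sample values and the function handle name loglike are my own, not from the notes), its logarithm can be evaluated and plotted versus a trial value of A:

% Log-likelihood ln p(x;A) for x[n] = A + w[n] with white Gaussian noise
N = 50; A_true = 2; var_w = 1;
x = A_true + sqrt(var_w)*randn(1,N);                 % simulated data
loglike = @(A) -N/2*log(2*pi*var_w) - sum((x-A).^2)/(2*var_w);
Agrid = 0:0.01:4;
L = arrayfun(loglike, Agrid);
plot(Agrid, L), xlabel('A'), ylabel('ln p(x;A)')
% the peak should be near the sample mean of x (cf. Examples 5.3 and 5.9)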

Example 5.2
Find the CRLB for estimating A based on a single measurement:

x[0] = A + w[0]

p(x[0]; A) = 1/sqrt(2πσ_w²) · exp( −(1/(2σ_w²)) (x[0] − A)² )

ln(p(x[0]; A)) = −(1/2) ln(2πσ_w²) − (1/(2σ_w²)) (x[0] − A)²

∂ln(p(x[0]; A))/∂A = −(1/(2σ_w²))·2(x[0] − A)·(−1) = (x[0] − A)/σ_w²

∂²ln(p(x[0]; A))/∂A² = −1/σ_w²

As a result,

−E{∂²ln(p(x[0]; A))/∂A²} = 1/σ_w²

I(A) = 1/σ_w²
J(A) = σ_w²
CRLB(A) = σ_w²

This means the best we can do is to achieve estimator variance = σ_w², or

var(Â) ≥ σ_w²

where Â is any unbiased estimator of A

We also observe that the simple unbiased estimator

Â₁ = x[0]

achieves the CRLB:

E{(Â₁ − A)²} = E{(x[0] − A)²} = E{(A + w[0] − A)²} = E{w²[0]} = σ_w²

Example 5.3
Find the CRLB for estimating A based on N measurements:

x[n] = A + w[n],    n = 0, 1, ..., N − 1

p(x; A) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)² )

ln(p(x; A)) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)²

∂ln(p(x; A))/∂A = −(1/(2σ_w²)) Σ_{n=0}^{N−1} 2(x[n] − A)·(−1) = Σ_{n=0}^{N−1} (x[n] − A) / σ_w²

∂²ln(p(x; A))/∂A² = −N/σ_w²

As a result,

I(A) = N/σ_w²
J(A) = σ_w²/N
CRLB(A) = σ_w²/N

This means the best we can do is to achieve estimator variance = σ_w²/N, or

var(Â) ≥ σ_w²/N

where Â is any unbiased estimator of A

We also observe that the simple unbiased estimator

Â₁ = x[0]

does not achieve the CRLB:

E{(Â₁ − A)²} = E{(x[0] − A)²} = E{(A + w[0] − A)²} = E{w²[0]} = σ_w²

On the other hand, the sample mean estimator

Â₂ = (1/N) Σ_{n=0}^{N−1} x[n]

does achieve the CRLB:

E{(Â₂ − A)²} = E{ ( (1/N) Σ_{n=0}^{N−1} x[n] − A )² } = E{ ( (1/N) Σ_{n=0}^{N−1} w[n] )² } = σ_w²/N

⇒ the sample mean is the optimum estimator for white Gaussian noise

Example 5.4
Find the CRLB for A and σ_w² given {x[n]}:

x[n] = A + w[n],    n = 0, 1, ..., N − 1

p(x; θ) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)² ),    θ = [A, σ_w²]^T

ln(p(x; θ)) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)²
            = −(N/2) ln(2π) − (N/2) ln(σ_w²) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)²

∂ln(p(x; θ))/∂A = Σ_{n=0}^{N−1} (x[n] − A) / σ_w²

∂²ln(p(x; θ))/∂A² = −N/σ_w²    ⇒    −E{∂²ln(p(x; θ))/∂A²} = N/σ_w²

∂²ln(p(x; θ))/∂A∂σ_w² = −Σ_{n=0}^{N−1} (x[n] − A) / σ_w⁴ = −Σ_{n=0}^{N−1} w[n] / σ_w⁴

−E{∂²ln(p(x; θ))/∂A∂σ_w²} = Σ_{n=0}^{N−1} E{w[n]} / σ_w⁴ = 0

∂ln(p(x; θ))/∂σ_w² = −N/(2σ_w²) + (1/(2σ_w⁴)) Σ_{n=0}^{N−1} (x[n] − A)²

∂²ln(p(x; θ))/∂(σ_w²)² = N/(2σ_w⁴) − (1/σ_w⁶) Σ_{n=0}^{N−1} w²[n]

−E{∂²ln(p(x; θ))/∂(σ_w²)²} = −N/(2σ_w⁴) + N·σ_w²/σ_w⁶ = N/(2σ_w⁴)

I(θ) = [ N/σ_w²     0          ]
       [ 0          N/(2σ_w⁴)  ]

J(θ) = I⁻¹(θ) = [ σ_w²/N     0         ]
                [ 0          2σ_w⁴/N   ]

CRLB(A) = σ_w²/N

CRLB(σ_w²) = 2σ_w⁴/N

⇒ the CRLBs of A for unknown and known noise power are identical

Q. The CRLB of A is not affected by knowledge of the noise power. Why?

Q. Can you suggest a method to estimate σ_w²?

Example 5.5
Find the CRLB for the phase of a sinusoid in white Gaussian noise:

x[n] = A cos(ω₀n + φ) + w[n],    n = 0, 1, ..., N − 1

where A and ω₀ are assumed known.

The PDF is

p(x; φ) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))² )

ln(p(x; φ)) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

∂ln(p(x; φ))/∂φ = −(1/(2σ_w²)) Σ_{n=0}^{N−1} 2(x[n] − A cos(ω₀n + φ))·A sin(ω₀n + φ)
                = −(A/σ_w²) Σ_{n=0}^{N−1} [ x[n] sin(ω₀n + φ) − (A/2) sin(2ω₀n + 2φ) ]

∂²ln(p(x; φ))/∂φ² = −(A/σ_w²) Σ_{n=0}^{N−1} [ x[n] cos(ω₀n + φ) − A cos(2ω₀n + 2φ) ]

−E{∂²ln(p(x; φ))/∂φ²} = (A/σ_w²) Σ_{n=0}^{N−1} [ A cos(ω₀n + φ) cos(ω₀n + φ) − A cos(2ω₀n + 2φ) ]
 = (A²/σ_w²) Σ_{n=0}^{N−1} [ cos²(ω₀n + φ) − cos(2ω₀n + 2φ) ]
 = (A²/σ_w²) Σ_{n=0}^{N−1} [ 1/2 + (1/2)cos(2ω₀n + 2φ) − cos(2ω₀n + 2φ) ]
 = NA²/(2σ_w²) − (A²/(2σ_w²)) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ)

As a result,

CRLB(φ) = [ NA²/(2σ_w²) − (A²/(2σ_w²)) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ) ]⁻¹
        = (2σ_w²/(NA²)) · [ 1 − (1/N) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ) ]⁻¹

If N >> 1, then

(1/N) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ) ≈ 0    ⇒    CRLB(φ) ≈ 2σ_w²/(NA²)

Example 5.6
Find the CRLB for A, ω₀ and φ for

x[n] = A cos(ω₀n + φ) + w[n],    n = 0, 1, ..., N − 1,    N >> 1

p(x; θ) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))² ),    θ = [A, ω₀, φ]^T

ln(p(x; θ)) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

∂ln(p(x; θ))/∂A = −(1/(2σ_w²)) Σ_{n=0}^{N−1} 2(x[n] − A cos(ω₀n + φ))·(−cos(ω₀n + φ))
                = (1/σ_w²) Σ_{n=0}^{N−1} ( x[n] cos(ω₀n + φ) − A cos²(ω₀n + φ) )

∂²ln(p(x; θ))/∂A² = −(1/σ_w²) Σ_{n=0}^{N−1} cos²(ω₀n + φ) = −(1/σ_w²) Σ_{n=0}^{N−1} [ 1/2 + (1/2)cos(2ω₀n + 2φ) ] ≈ −N/(2σ_w²)

−E{∂²ln(p(x; θ))/∂A²} ≈ N/(2σ_w²)

Similarly,

−E{∂²ln(p(x; θ))/∂A∂ω₀} = −(A/(2σ_w²)) Σ_{n=0}^{N−1} n sin(2ω₀n + 2φ) ≈ 0

−E{∂²ln(p(x; θ))/∂A∂φ} = −(A/(2σ_w²)) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ) ≈ 0

−E{∂²ln(p(x; θ))/∂ω₀²} = (A²/σ_w²) Σ_{n=0}^{N−1} n² sin²(ω₀n + φ) ≈ (A²/(2σ_w²)) Σ_{n=0}^{N−1} n²

−E{∂²ln(p(x; θ))/∂ω₀∂φ} = (A²/σ_w²) Σ_{n=0}^{N−1} n sin²(ω₀n + φ) ≈ (A²/(2σ_w²)) Σ_{n=0}^{N−1} n

−E{∂²ln(p(x; θ))/∂φ²} = (A²/σ_w²) Σ_{n=0}^{N−1} sin²(ω₀n + φ) ≈ NA²/(2σ_w²)

I(θ) ≈ (1/σ_w²) [ N/2     0                            0                           ]
                [ 0       (A²/2) Σ_{n=0}^{N−1} n²      (A²/2) Σ_{n=0}^{N−1} n      ]
                [ 0       (A²/2) Σ_{n=0}^{N−1} n       NA²/2                       ]

After matrix inversion, we have

CRLB(A) ≈ 2σ_w²/N

CRLB(ω₀) ≈ 12 / ( SNR · N(N² − 1) ),    SNR = A²/(2σ_w²)

CRLB(φ) ≈ 2(2N − 1) / ( SNR · N(N + 1) )

Note that CRLB(φ) here is larger than the bound in Example 5.5 where only φ is unknown:

2(2N − 1)/( SNR·N(N + 1) ) ≈ 4/(SNR·N) > 1/(SNR·N) = 2σ_w²/(NA²)

In general, the CRLB increases as the number of parameters to be estimated increases
The CRLB decreases as the number of samples increases
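To get a feel for how these bounds scale with the data length, the short MATLAB sketch below evaluates the three approximate (large-N) CRLB expressions; the chosen amplitude and noise variance are illustrative assumptions only:

% Approximate CRLBs of Example 5.6 versus data length N (illustrative values)
A = 1; var_w = 0.1;                          % assumed amplitude and noise variance
SNR = A^2/(2*var_w);
N = (10:10:100)';
CRLB_A   = 2*var_w./N;                       % amplitude bound
CRLB_w0  = 12./(SNR*N.*(N.^2-1));            % frequency bound (rad^2/sample^2)
CRLB_phi = 2*(2*N-1)./(SNR*N.*(N+1));        % phase bound (rad^2)
semilogy(N, [CRLB_A CRLB_w0 CRLB_phi])
legend('CRLB(A)','CRLB(\omega_0)','CRLB(\phi)'), xlabel('N')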

Parameter Transformation in CRLB

Find the CRLB for α = g(θ), where g(·) is a function, e.g.,

x[n] = A + w[n],    n = 0, 1, ..., N − 1

What is the CRLB for A²?

The CRLB for the parameter transformation α = g(θ) is given by

CRLB(α) = (∂g(θ)/∂θ)² / ( −E{∂²ln(p(x; θ))/∂θ²} )                  (5.10)

For a nonlinear function, '=' is replaced by '≈' and this is true only for large N

Example 5.7
Find the CRLB for the power of the DC value, i.e., A²:

x[n] = A + w[n],    n = 0, 1, ..., N − 1

α = g(A) = A²

∂g(A)/∂A = 2A    ⇒    (∂g(A)/∂A)² = 4A²

From Example 5.3, we have

−E{∂²ln(p(x; A))/∂A²} = N/σ_w²

As a result,

CRLB(A²) ≈ 4A²σ_w² / N,    N >> 1

Example 5.8
Find the CRLB for α = c₁ + c₂A from

x[n] = A + w[n],    n = 0, 1, ..., N − 1

α = g(A) = c₁ + c₂A

∂g(A)/∂A = c₂    ⇒    (∂g(A)/∂A)² = c₂²

As a result,

CRLB(α) = c₂² · CRLB(A) = c₂²σ_w² / N

Maximum Likelihood Estimation

Parameter estimation is achieved via maximizing the likelihood function
Optimum realizable approach and can give performance close to the CRLB
Used for classical parameter estimation
Requires knowledge of the noise PDF, and the PDF must have closed form
Generally computationally demanding

Let p(x; θ) be the PDF of the observed vector x parameterized by the parameter vector θ. The maximum likelihood (ML) estimate is

θ̂ = arg max_θ p(x; θ)                       (5.11)

e.g., given p(x = x₀; θ), where x₀ is the observed data (figure: likelihood function plotted versus θ)

Q. What is the most probable value of θ?

Example 5.9
Given

x[n] = A + w[n],    n = 0, 1, ..., N − 1

where A is an unknown constant and w[n] is white Gaussian noise with known variance σ_w². Find the ML estimate of A.

p(x; A) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)² )

Since arg max_A p(x; A) = arg max_A {ln(p(x; A))}, taking the logarithm of p(x; A) gives

ln(p(x; A)) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)²

Differentiating with respect to A yields

∂ln(p(x; A))/∂A = −(1/(2σ_w²)) Σ_{n=0}^{N−1} 2(x[n] − A)·(−1) = Σ_{n=0}^{N−1} (x[n] − A) / σ_w²

Â = arg max_A {ln(p(x; A))} is determined from

Σ_{n=0}^{N−1} (x[n] − Â) / σ_w² = 0    ⇒    Σ_{n=0}^{N−1} (x[n] − Â) = 0    ⇒    Â = (1/N) Σ_{n=0}^{N−1} x[n]

Note that
the ML estimate is identical to the sample mean
→ attains the CRLB

Q. How about if σ_w² is unknown?

Example 5.10
Find the ML estimate for the phase of a sinusoid in white Gaussian noise:

x[n] = A cos(ω₀n + φ) + w[n],    n = 0, 1, ..., N − 1

where A and ω₀ are assumed known.

The PDF is

p(x; φ) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))² )

ln(p(x; φ)) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

It is obvious that the maximum of p(x; φ) or ln(p(x; φ)) corresponds to the minimum of

(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²    or    Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

Differentiating with respect to φ and then setting the result to zero:

Σ_{n=0}^{N−1} 2(x[n] − A cos(ω₀n + φ))·A sin(ω₀n + φ)
= A Σ_{n=0}^{N−1} [ x[n] sin(ω₀n + φ) − (A/2) sin(2ω₀n + 2φ) ] = 0

⇒ Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ̂) = (A/2) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ̂)

The ML estimate for φ is determined from the root of the above equation

Q. Any ideas to solve the nonlinear equation?

An approximate ML (AML) solution may exist, and it depends on the structure of the ML expression. For example, there exists an AML solution for

Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ̂) = (A/2) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ̂)

Dividing both sides by N:

(1/N) Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ̂) = (A/2)·(1/N) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ̂) ≈ (A/2)·0 = 0,    N >> 1

The AML solution is obtained from

Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ̂) = 0

Σ_{n=0}^{N−1} x[n] sin(ω₀n) cos(φ̂) + Σ_{n=0}^{N−1} x[n] cos(ω₀n) sin(φ̂) = 0

cos(φ̂) Σ_{n=0}^{N−1} x[n] sin(ω₀n) = −sin(φ̂) Σ_{n=0}^{N−1} x[n] cos(ω₀n)

φ̂ = −tan⁻¹( Σ_{n=0}^{N−1} x[n] sin(ω₀n) / Σ_{n=0}^{N−1} x[n] cos(ω₀n) )

In fact, the AML solution is reasonable:

φ̂ = −tan⁻¹( Σ_{n=0}^{N−1} (A cos(ω₀n + φ) + w[n]) sin(ω₀n) / Σ_{n=0}^{N−1} (A cos(ω₀n + φ) + w[n]) cos(ω₀n) )

  ≈ −tan⁻¹( ( −(NA/2) sin(φ) + Σ_{n=0}^{N−1} w[n] sin(ω₀n) ) / ( (NA/2) cos(φ) + Σ_{n=0}^{N−1} w[n] cos(ω₀n) ) ),    N >> 1

  = −tan⁻¹( ( −sin(φ) + (2/(NA)) Σ_{n=0}^{N−1} w[n] sin(ω₀n) ) / ( cos(φ) + (2/(NA)) Σ_{n=0}^{N−1} w[n] cos(ω₀n) ) )

  ≈ φ
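The closed-form AML phase estimate is easy to try out in MATLAB. The sketch below is an illustration only (not part of the original notes); it reuses the signal parameters of the grid-search demo in Example 5.13, and atan2 is used in place of atan to resolve the quadrant, which is an implementation choice of mine:

% AML phase estimate: phi_hat = -atan( sum(x.*sin(w0*n)) / sum(x.*cos(w0*n)) )
N = 100; n = 0:N-1;
w0 = 0.2*pi; A = sqrt(2); phi = 0.3*pi; np = 0.1;
x = A*cos(w0*n + phi) + sqrt(np)*randn(1,N);
phi_hat = -atan2( sum(x.*sin(w0*n)), sum(x.*cos(w0*n)) );   % atan2 resolves the quadrant
[phi/pi, phi_hat/pi]                                        % both should be close to 0.3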

For parameter transformation, if there is a one-to-one relationship between α = g(θ) and θ, the ML estimate for α is simply

α̂ = g(θ̂)                                    (5.12)

Example 5.11
Given N samples of a white Gaussian process w[n], n = 0, 1, ..., N − 1, with unknown variance σ². Determine the power of w[n] in dB.

The power in dB is related to σ² by

P = 10 log₁₀(σ²)

which is a one-to-one relationship. To find the ML estimate for P, we first find the ML estimate for σ².

p(w; σ²) = 1/(2πσ²)^{N/2} · exp( −(1/(2σ²)) Σ_{n=0}^{N−1} w²[n] )

ln(p(w; σ²)) = −(N/2) ln(2π) − (N/2) ln(σ²) − (1/(2σ²)) Σ_{n=0}^{N−1} w²[n]

Differentiating the log-likelihood function with respect to σ²:

∂ln(p(w; σ²))/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ_{n=0}^{N−1} w²[n]

Setting the resultant expression to zero:

N/(2σ̂²) = (1/(2σ̂⁴)) Σ_{n=0}^{N−1} w²[n]    ⇒    σ̂² = (1/N) Σ_{n=0}^{N−1} w²[n]

As a result,

P̂ = 10 log₁₀(σ̂²) = 10 log₁₀( (1/N) Σ_{n=0}^{N−1} w²[n] )

Example 5.12
Given

x[n] = A + w[n],    n = 0, 1, ..., N − 1

where A is an unknown constant and w[n] is white Gaussian noise with unknown variance σ². Find the ML estimates of A and σ².

p(x; θ) = 1/(2πσ²)^{N/2} · exp( −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)² ),    θ = [A, σ²]^T

∂ln(p(x; θ))/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A)

∂ln(p(x; θ))/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ_{n=0}^{N−1} (x[n] − A)²

Solving the first equation:

Â = (1/N) Σ_{n=0}^{N−1} x[n] = x̄

Putting A = Â = x̄ into the second equation:

σ̂² = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)²
Numerical Computation of the ML Solution

When the ML solution is not in closed form, it can be computed by
Grid search
Numerical methods: Newton-Raphson, golden section, bisection, etc.

Example 5.13
From Example 5.10, the ML solution of φ is determined from

Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ̂) = (A/2) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ̂)

Suggest methods to find φ̂.

Approach 1: Grid search
Let

g(φ) = Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ) − (A/2) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ)

It is obvious that

φ̂ = root of g(φ)

The idea of grid search is simple:

Search all possible values of φ, or a given range of φ, to find the root
Values are discrete → tradeoff between resolution & computation
e.g.,
Range for φ: any value in [0, 2π)
Discrete points: 1000 → resolution is 2π/1000

MATLAB source code:

N=100;
n=[0:N-1];
w = 0.2*pi;
A = sqrt(2);
p = 0.3*pi;
np = 0.1;
q = sqrt(np).*randn(1,N);
x = A.*cos(w.*n+p)+q;
for j=1:1000
pe = j/1000*2*pi;
s1 =sin(w.*n+pe);
s2 =sin(2.*w.*n+2.*pe);
g(j) = x*s1'-A/2*sum(s2);
end
pe = [1:1000]/1000;
plot(pe,g)

Note: the x-axis is φ/(2π)

stem(pe,g)
axis([0.14 0.16 -2 2])

g(0.152 × 2π) = −0.2324,    g(0.153 × 2π) = 0.2168

φ̂ = 0.153 × 2π = 0.306π    (grid resolution: 0.001 × 2π)

For a coarser grid, say 200 discrete points:


clear pe;
clear s1;
clear s2;
clear g;
for j=1:200
pe = j/200*2*pi;
s1 =sin(w.*n+pe);
s2 =sin(2.*w.*n+2.*pe);
g(j) = x*s1'-A/2*sum(s2);
end
pe = [1:200]/200;
plot(pe,g)


stem(pe,g)
axis([0.14 0.16 -2 2])

g(0.150 × 2π) = −1.1306,    g(0.155 × 2π) = 1.1150

φ̂ = 0.155 × 2π = 0.310π    (grid resolution: 0.005 × 2π)

Accuracy increases as the number of grid points increases

Approach 2: Newton-Raphson iterative procedure

With an initial guess φ̂₀, the root of g(φ) can be determined from

φ̂_{k+1} = φ̂_k − g(φ̂_k) / g'(φ̂_k),    g'(φ) = dg(φ)/dφ                  (5.13)

g(φ) = Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ) − (A/2) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ)

g'(φ) = Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) − (A/2) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ)·2
      = Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) − A Σ_{n=0}^{N−1} cos(2ω₀n + 2φ)

with the initial guess φ̂₀ = 0

p1 = 0;
for k=1:10
s1 =sin(w.*n+p1);
s2 =sin(2.*w.*n+2.*p1);
c1 =cos(w.*n+p1);
c2 =cos(2.*w.*n+2.*p1);
g = x*s1'-A/2*sum(s2);
g1 = x*c1'-A*sum(c2);
p1 = p1 - g/g1;
p1_vector(k) = p1;
end
stem(p1_vector/(2*pi))

The Newton-Raphson method converges at around the 3rd iteration

φ̂ = 0.1525 × 2π = 0.305π

Q. Can you comment on the grid search & Newton-Raphson methods?

ML Estimation for the General Linear Model

The general linear data model is given by

x = Hθ + w                                   (5.14)

where
x is the observed vector of size N
w is a Gaussian noise vector with known covariance matrix C
H is a known matrix of size N × p
θ is the parameter vector of size p

Based on (5.7), the PDF of x parameterized by θ is

p(x; θ) = 1/( (2π)^{N/2} det^{1/2}(C) ) · exp( −(1/2)(x − Hθ)^T C⁻¹ (x − Hθ) )        (5.15)

Since C is not a function of θ, the ML solution is equivalent to

θ̂ = arg min_θ {J(θ)}    where    J(θ) = (x − Hθ)^T C⁻¹ (x − Hθ)

Differentiating J(θ) with respect to θ and then setting the result to zero:

−2H^T C⁻¹ x + 2H^T C⁻¹ Hθ = 0
H^T C⁻¹ x = H^T C⁻¹ Hθ

As a result, the ML solution for the linear model is

θ̂ = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ x                  (5.16)

For white noise:

θ̂ = (H^T (σ_w² I)⁻¹ H)⁻¹ H^T (σ_w² I)⁻¹ x = (H^T H)⁻¹ H^T x          (5.17)

Example 5.14
Given N pairs of (x, y) where x is error-free but y is subject to error:

y[n] = m·x[n] + c + w[n],    n = 0, 1, ..., N − 1

where w is a Gaussian noise vector with known covariance matrix C.

Find the ML estimates for m and c.

y[n] = m·x[n] + c + w[n] = [x[n]  1]·[m  c]^T + w[n] = [x[n]  1]·θ + w[n],    θ = [m  c]^T

y[0] = [x[0]  1]·θ + w[0]
y[1] = [x[1]  1]·θ + w[1]
⋮
y[N−1] = [x[N−1]  1]·θ + w[N−1]

Writing in matrix form:

y = Hθ + w

where

y = [y[0], y[1], ..., y[N−1]]^T

H = [ x[0]      1 ]
    [ x[1]      1 ]
    [ ⋮         ⋮ ]
    [ x[N−1]    1 ]

Applying (5.16) gives

θ̂ = [m̂  ĉ]^T = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ y
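As a concrete illustration of (5.16)/(5.17), the MATLAB sketch below fits a line to assumed data; for white noise the covariance weighting cancels and the plain form (5.17) applies:

% Line fit y[n] = m*x[n] + c + w[n] using theta = (H'*H)\(H'*y)   (white-noise case)
N = 50;
x = (0:N-1)';                              % error-free abscissa values
m_true = 0.5; c_true = -1;
y = m_true*x + c_true + 0.2*randn(N,1);    % noisy ordinates
H = [x ones(N,1)];                         % observation matrix of Example 5.14
theta = (H'*H)\(H'*y);                     % [m_hat; c_hat], equation (5.17)
m_hat = theta(1), c_hat = theta(2)
% for coloured noise with covariance C, use theta = (H'*(C\H))\(H'*(C\y))   (5.16)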

Example 5.15
Find the ML estimates of A, ω₀ and φ for

x[n] = A cos(ω₀n + φ) + w[n],    n = 0, 1, ..., N − 1,    N >> 1

where w[n] is white Gaussian noise with variance σ_w².

Recall from Example 5.6:

p(x; θ) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))² ),    θ = [A, ω₀, φ]^T

The ML solution for θ can be found by minimizing

J(A, ω₀, φ) = Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

This can be achieved by using a 3-D grid search or the Newton-Raphson method, but it is computationally complex.
Another, simpler solution is as follows:

J(A, ω₀, φ) = Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²
            = Σ_{n=0}^{N−1} (x[n] − A cos(φ) cos(ω₀n) + A sin(φ) sin(ω₀n))²

Since A and φ do not appear quadratically in J(A, ω₀, φ), the first step is to use a parameter transformation:

α₁ = A cos(φ),    α₂ = −A sin(φ)

A = sqrt(α₁² + α₂²),    φ = tan⁻¹(−α₂/α₁)

Let

c = [1  cos(ω₀)  ...  cos(ω₀(N−1))]^T
s = [0  sin(ω₀)  ...  sin(ω₀(N−1))]^T

We have

J(α₁, α₂, ω₀) = (x − α₁c − α₂s)^T (x − α₁c − α₂s) = (x − Hα)^T (x − Hα),    α = [α₁  α₂]^T,    H = [c  s]

Applying (5.17) gives

α̂ = (H^T H)⁻¹ H^T x

Substituting α̂ back into J(α₁, α₂, ω₀):

J(ω₀) = (x − Hα̂)^T (x − Hα̂)
      = (x − H(H^T H)⁻¹H^T x)^T (x − H(H^T H)⁻¹H^T x)
      = ( (I − H(H^T H)⁻¹H^T) x )^T ( (I − H(H^T H)⁻¹H^T) x )
      = x^T (I − H(H^T H)⁻¹H^T)^T (I − H(H^T H)⁻¹H^T) x
      = x^T (I − H(H^T H)⁻¹H^T) x
      = x^T x − x^T H(H^T H)⁻¹H^T x

Minimizing J(ω₀) is identical to maximizing

x^T H(H^T H)⁻¹H^T x

or

ω̂₀ = arg max_{ω₀} x^T H(H^T H)⁻¹H^T x

⇒ the 3-D search is reduced to a 1-D search

After determining ω̂₀, Â and φ̂ can be obtained as well.

For sufficiently large N:

H^T H = [ c^T c   c^T s ]  ≈  [ N/2   0  ]
        [ s^T c   s^T s ]     [ 0    N/2 ]

x^T H(H^T H)⁻¹H^T x ≈ (2/N) ( (c^T x)² + (s^T x)² ) = (2/N) | Σ_{n=0}^{N−1} x[n] exp(−jω₀n) |²

ω̂₀ = arg max_{ω₀} (1/N) | Σ_{n=0}^{N−1} x[n] exp(−jω₀n) |²    →    periodogram maximum
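A rough MATLAB sketch of the periodogram-based frequency estimate follows; the coarse grid over ω₀ and the signal values are illustrative assumptions (a finer search or FFT-based evaluation would normally be used):

% 1-D search for the periodogram maximum (approximate ML frequency estimate)
N = 100; n = 0:N-1;
A = 1; w0_true = 0.2*pi; phi = 0.3*pi;
x = A*cos(w0_true*n + phi) + 0.1*randn(1,N);
wgrid = pi*(0.01:0.001:0.99);                        % candidate frequencies (rad/sample)
P = zeros(size(wgrid));
for k = 1:length(wgrid)
    P(k) = abs(x*exp(-1j*wgrid(k)*n'))^2 / N;        % (1/N)|sum x[n] e^{-j w n}|^2
end
[~, idx] = max(P);
w0_hat = wgrid(idx)                                  % should be close to 0.2*pi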

Least Squares Methods

Parameter estimation is achieved via minimizing a least squares (LS) cost function
Generally not optimum but computationally simple
Used for classical parameter estimation
No knowledge of the noise PDF is required
Can be considered as a generalization of LS filtering

Variants of LS Methods

1. Standard LS
Consider the general linear data model:

x = Hθ + w

where
x is the observed vector of size N
w is a zero-mean noise vector with unknown covariance matrix
H is a known matrix of size N × p
θ is the parameter vector of size p

The LS solution is given by

θ̂ = arg min_θ (x − Hθ)^T (x − Hθ) = (H^T H)⁻¹ H^T x               (5.18)

which is equal to (5.17)

The LS solution is optimum if the covariance matrix of w is C = σ_w² I and w is Gaussian distributed

Define

e = x − Hθ,    e = [e(0)  e(1)  ...  e(N−1)]^T

(5.18) is equivalent to

θ̂ = arg min_θ Σ_{k=0}^{N−1} e²(k)                                  (5.19)

which is similar to LS filtering

Q. Any differences between (5.19) and LS filtering?

Example 5.16
Given

x[n] = A + w[n],    n = 0, 1, ..., N − 1

where A is an unknown constant and w[n] is zero-mean noise.

Find the LS solution of A.

Using (5.19),

Â = arg min_A Σ_{n=0}^{N−1} (x[n] − A)²

Differentiating Σ_{n=0}^{N−1} (x[n] − A)² with respect to A and setting the result to 0:

Â = (1/N) Σ_{n=0}^{N−1} x[n]

On the other hand, writing {x[n]} in matrix form:

x = HA + w,    H = [1  1  ...  1]^T

Using (5.18),

Â = ( [1 1 ... 1][1 1 ... 1]^T )⁻¹ [1 1 ... 1][x[0]  x[1]  ...  x[N−1]]^T = (1/N) Σ_{n=0}^{N−1} x[n]

Both (5.18) and (5.19) give the same answer, and the LS solution is optimum if the noise is white Gaussian

Example 5.17
Consider the LS filtering problem again. Given

d[n] = X^T[n] W + q[n],    n = 0, 1, ..., N − 1

where
d[n] is the desired response
X[n] = [x[n]  x[n−1]  ...  x[n−L+1]]^T is the input signal vector
W = [w₀  w₁  ...  w_{L−1}]^T is the unknown filter weight vector
q[n] is zero-mean noise

Writing in matrix form:

d = HW + q

Using (5.18):

Ŵ = (H^T H)⁻¹ H^T d

where

H = [ X^T(0)   ]   [ x[0]      0         ...   0       ]
    [ X^T(1)   ] = [ x[1]      x[0]      ...   0       ]
    [ ⋮        ]   [ ⋮         ⋮               ⋮       ]
    [ X^T(N−1) ]   [ x[N−1]    x[N−2]    ...   x[N−L]  ]

with x[−1] = x[−2] = ... = 0

Note that

R_xx = H^T H
R_dx = H^T d

where R_xx is not the original version but the modified version of (3.6)

Example 5.18
Find the LS estimate of A for

x[n] = A cos(ω₀n + φ) + w[n],    n = 0, 1, ..., N − 1,    N >> 1

where ω₀ and φ are known constants while w[n] is zero-mean noise.

Using (5.19),

Â = arg min_A Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

Differentiating Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))² with respect to A and setting the result to 0:

−2 Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ)) cos(ω₀n + φ) = 0

⇒ Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) = A Σ_{n=0}^{N−1} cos²(ω₀n + φ)

The LS solution is then

Â = Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) / Σ_{n=0}^{N−1} cos²(ω₀n + φ)
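A minimal MATLAB check of this amplitude estimator (the signal values are assumed for illustration; note that no Gaussian assumption is needed, so uniform noise is used here):

% LS amplitude estimate for a sinusoid with known frequency and phase
N = 200; n = 0:N-1;
w0 = 0.2*pi; phi = 0.3*pi; A_true = 1.5;
w = 0.5*(rand(1,N) - 0.5);                          % zero-mean (uniform) noise
x = A_true*cos(w0*n + phi) + w;
c = cos(w0*n + phi);
A_hat = sum(x.*c) / sum(c.^2)                       % should be close to A_true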

2. Weighted LS
Use a more general form of LS via a symmetric weighting matrix W (W = W^T):

θ̂ = arg min_θ (x − Hθ)^T W (x − Hθ) = (H^T W H)⁻¹ H^T W x             (5.20)

Due to the presence of W, it is generally difficult to write the cost function (x − Hθ)^T W(x − Hθ) in scalar form as in (5.19)

Rationale of using W: put larger weights on data with smaller errors,
                      put smaller weights on data with larger errors

When W = C⁻¹, where C is the covariance matrix of the noise vector:

θ̂ = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ x                                           (5.21)

which is equal to the ML solution and is optimum for Gaussian noise

Example 5.19
Given two noisy measurements of A:

x₁ = A + w₁    and    x₂ = A + w₂

where w₁ and w₂ are zero-mean uncorrelated noises with known variances σ₁² and σ₂². Determine the optimum weighted LS solution.

Use

W = C⁻¹ = [ σ₁²   0   ]⁻¹ = [ 1/σ₁²   0      ]
          [ 0     σ₂² ]     [ 0       1/σ₂²  ]

Grouping x₁ and x₂ into matrix form:

[ x₁ ] = [ 1 ] A + [ w₁ ]        or        x = HA + w
[ x₂ ]   [ 1 ]     [ w₂ ]

Using (5.21),

Â = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ x = ( [1  1][ 1/σ₁²  0; 0  1/σ₂² ][1  1]^T )⁻¹ [1  1][ 1/σ₁²  0; 0  1/σ₂² ][x₁  x₂]^T

As a result,

Â = ( 1/σ₁² + 1/σ₂² )⁻¹ ( x₁/σ₁² + x₂/σ₂² ) = σ₂²/(σ₁² + σ₂²)·x₁ + σ₁²/(σ₁² + σ₂²)·x₂

Note that
If σ₂² > σ₁², a larger weight is placed on x₁, and vice versa
If σ₂² = σ₁², the solution is equal to the standard sample mean
The solution will be more complicated if w₁ and w₂ are correlated

Exact values of σ₁² and σ₂² are not necessary; only their ratio is needed. Defining β = σ₁²/σ₂², we have

Â = ( 1/(1 + β) ) x₁ + ( β/(1 + β) ) x₂
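A short MATLAB sketch of this two-measurement combination (the variances below are assumed numbers; the matrix form (5.21) and the scalar form should agree):

% Optimum weighted LS combination of two measurements of A
A = 5; var1 = 1; var2 = 4;                        % assumed noise variances
x1 = A + sqrt(var1)*randn;  x2 = A + sqrt(var2)*randn;
H = [1; 1];  C = diag([var1 var2]);  x = [x1; x2];
A_wls = (H'*(C\H)) \ (H'*(C\x))                   % matrix form, equation (5.21)
A_chk = (x1/var1 + x2/var2) / (1/var1 + 1/var2)   % scalar form, same value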

3. Nonlinear LS
The LS cost function cannot be represented via a linear model as in

x = Hθ + w

In general, it is more complex to solve, e.g., the LS estimates for A, ω₀ and φ can be found by minimizing

Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

whose solution is not straightforward, as seen in Example 5.15

Grid search and numerical methods are used to find the minimum

4. Constrained LS
The linear LS cost function is minimized subject to constraints:

θ̂ = arg min_θ (x − Hθ)^T (x − Hθ)    subject to θ ∈ S              (5.22)

where S is a set of equalities/inequalities in terms of θ

Generally it can be solved by linear/nonlinear programming, but simpler solutions exist for linear and quadratic constraint equations, e.g.,

Linear constraint equation:      θ₁ + θ₂ + θ₃ = 10
Quadratic constraint equation:   θ₁² + θ₂² + θ₃² = 100
Other types of constraints:      θ₁ > θ₂ > θ₃ > 10,    θ₁ + 2θ₂ + 3θ₃ ≤ 100

Consider the case where the constraint set S is

Aθ = b

which contains r linear equations. The constrained LS problem for the linear model is

θ̂_c = arg min_θ (x − Hθ)^T (x − Hθ)    subject to Aθ = b             (5.23)

The technique of Lagrangian multipliers can solve (5.23) as follows.

Define the Lagrangian

J_c = (x − Hθ)^T (x − Hθ) + λ^T (Aθ − b)                              (5.24)

where λ is an r-length vector of Lagrangian multipliers.

The procedure is to first solve for λ and then for θ̂_c.

Expanding (5.24):

J_c = x^T x − 2θ^T H^T x + θ^T H^T Hθ + λ^T Aθ − λ^T b

Differentiating J_c with respect to θ:

∂J_c/∂θ = −2H^T x + 2H^T Hθ + A^T λ

Setting the result to zero:

−2H^T x + 2H^T Hθ̂_c + A^T λ = 0

θ̂_c = (H^T H)⁻¹H^T x − (1/2)(H^T H)⁻¹A^T λ = θ̂ − (1/2)(H^T H)⁻¹A^T λ

where θ̂ is the unconstrained LS solution. Putting θ̂_c into Aθ = b:

Aθ̂_c = Aθ̂ − (1/2)A(H^T H)⁻¹A^T λ = b    ⇒    λ = 2( A(H^T H)⁻¹A^T )⁻¹ (Aθ̂ − b)

Putting λ back into θ̂_c:

θ̂_c = θ̂ − (H^T H)⁻¹A^T ( A(H^T H)⁻¹A^T )⁻¹ (Aθ̂ − b)

The idea of constrained LS can be illustrated by finding the minimum value of y:

y = x² − 3x + 2    subject to    x − y = 1
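The closed-form constrained solution derived above translates directly into MATLAB. The sketch below uses an arbitrary small example of my own (two parameters whose sum is constrained to 1) and is illustrative only:

% Constrained LS: minimize ||x - H*theta||^2 subject to A*theta = b
N = 30;
H = [ (1:N)'  ones(N,1) ];                 % simple two-parameter model
theta_true = [0.7; 0.3];
x = H*theta_true + 0.1*randn(N,1);
A = [1 1];  b = 1;                         % constraint: theta1 + theta2 = 1
theta_ls = (H'*H)\(H'*x);                  % unconstrained LS solution
G = (H'*H)\A';                             % (H'H)^{-1} A'
theta_c  = theta_ls - G*((A*G)\(A*theta_ls - b))   % constrained solution
% A*theta_c equals b (up to rounding)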

5. Total LS
Motivation: noise in both x and H:

x + w₁ = Hθ + w₂                             (5.25)

where w₁ and w₂ are zero-mean noise vectors.

A typical example is LS filtering in the presence of both input noise and output noise. The noisy input is

x(k) = s(k) + n_i(k),    k = 0, 1, ..., N − 1

and the noisy output is

r(k) = s(k) ∗ h(k) + n_o(k),    k = 0, 1, ..., N − 1

The parameters to be estimated are {h(k)} given x(k) and r(k).

Another example is frequency estimation using linear prediction:

For a single sinusoid s(k) = A cos(ωk + φ), it is true that

s(k) = 2 cos(ω) s(k−1) − s(k−2)

i.e., s(k) is perfectly predicted by s(k−1) and s(k−2):

s(k) = a₀ s(k−1) + a₁ s(k−2)

It is desirable to obtain a₀ = 2 cos(ω) and a₁ = −1 in the estimation process.

In the presence of noise, the observed signal is

x(k) = s(k) + w(k),    k = 0, 1, ..., N − 1

The linear prediction model is now

x(k) = a₀ x(k−1) + a₁ x(k−2):

x(2) = a₀ x(1) + a₁ x(0)
x(3) = a₀ x(2) + a₁ x(1)
⋮
x(N−1) = a₀ x(N−2) + a₁ x(N−3)

or, in matrix form,

[ x(2)    ]   [ x(1)     x(0)   ]
[ x(3)    ] = [ x(2)     x(1)   ] [ a₀ ]
[ ⋮       ]   [ ⋮        ⋮      ] [ a₁ ]
[ x(N−1)  ]   [ x(N−2)   x(N−3) ]

Substituting x(k) = s(k) + w(k), the noise appears on both sides of the model:

[ s(2) + w(2)      ]   [ s(1) + w(1)       s(0) + w(0)      ]
[ s(3) + w(3)      ] = [ s(2) + w(2)       s(1) + w(1)      ] [ a₀ ]
[ ⋮                ]   [ ⋮                 ⋮                ] [ a₁ ]
[ s(N−1) + w(N−1)  ]   [ s(N−2) + w(N−2)   s(N−3) + w(N−3)  ]

6. Mixed LS
A combination of LS, weighted LS, nonlinear LS, constrained LS and/or total LS
Examples: weighted LS with constraints, total LS with constraints, etc.

Questions for Discussion

1. Suppose you have N pairs of (xᵢ, yᵢ), i = 1, 2, ..., N, and you need to fit them into the model y = ax. Assuming that only {yᵢ} contain zero-mean noise, determine the least squares estimate for a.
(Hint: the relationship between xᵢ and yᵢ is

yᵢ = a xᵢ + nᵢ,    i = 1, 2, ..., N

where {nᵢ} is the noise in {yᵢ}.)

2. Use least squares to estimate the line y = ax in Q.1, but now only {xᵢ} contain zero-mean noise.

3. In a radar system, the received signal is

r(n) = s(n − τ₀) + w(n)

where the range R of an object is related to the time delay by

τ₀ = 2R / c

Suppose we get an unbiased estimate of τ₀, say τ̂₀, and its variance is var(τ̂₀). Determine the corresponding range variance var(R̂), where R̂ is the estimate of R.

If var(τ̂₀) = (0.1 μs)² and c = 3 × 10⁸ m s⁻¹, what is the value of var(R̂)?