Introduction
Here you'll find some notes that I wrote up as I worked through this excellent book. I've
worked hard to make these notes as good as I can, but I have no illusions that they are perfect.
If you feel that there is a better way to accomplish or explain an exercise or derivation
presented in these notes, or that one or more of the explanations is unclear, incomplete,
or misleading, please tell me. If you find an error of any kind (technical, grammatical,
typographical, whatever), please tell me that, too. I'll gladly add to the acknowledgments
in later printings the name of the first person to bring each problem to my attention.
Special thanks (most recent comments are listed first) to Iman Bagheri, Jeong-Min Choi and
Hemant Saggar for their corrections involving chapter 2.
All comments (no matter how small) are much appreciated. In fact, if you find these notes
useful I would appreciate a contribution in the form of a solution to a problem that is not yet
worked in these notes. Sort of a "take a penny, leave a penny" type of approach. Remember:
pay it forward.
wax@alum.mit.edu
We use the fact that

\int_{Z_0} p(R|H_0) dR + \int_{Z - Z_0} p(R|H_0) dR = 1 ,    (1)

(and the analogous statement for p(R|H_1)) to replace all integrals over Z - Z_0 with (one minus) integrals over Z_0. We get

R = P_0 C_{00} \int_{Z_0} p(R|H_0) dR + P_0 C_{10} \left( 1 - \int_{Z_0} p(R|H_0) dR \right)
  + P_1 C_{01} \int_{Z_0} p(R|H_1) dR + P_1 C_{11} \left( 1 - \int_{Z_0} p(R|H_1) dR \right)
  = P_0 C_{10} + P_1 C_{11}
  + \int_{Z_0} \left\{ P_1 (C_{01} - C_{11}) p(R|H_1) - P_0 (C_{10} - C_{00}) p(R|H_0) \right\} dR .    (2)
If we introduce the probability of false alarm P_F, the probability of detection P_D, and the
probability of a miss P_M, as defined in the book, we find that the risk R given via Equation 1 becomes,
when we use \int_{Z_0} p(R|H_0) dR + \int_{Z_1} p(R|H_0) dR = \int_{Z_0} p(R|H_0) dR + P_F = 1,

R = P_0 C_{10} + P_1 C_{11} + P_1 (C_{01} - C_{11}) P_M - P_0 (C_{10} - C_{00})(1 - P_F) .    (3)
The likelihood ratio \Lambda(R) is compared against the threshold

\eta = \frac{P_0 (C_{10} - C_{00})}{P_1 (C_{01} - C_{11})} ,

and we decide H_1 when \Lambda(R) > \eta. Thus if P_0 changes, the decision regions Z_0 and Z_1 change (via the above expression)
and thus both P_F and P_M change since they depend on Z_0 and Z_1. Let's assume that we
specify a decision boundary that then defines classification regions Z_0 and Z_1. These
decision regions correspond to a specific value of P_1 denoted via P_1^*. Note that P_1^* is not the
true prior probability of the class H_1 but is simply an equivalent prior probability that one
could use in the likelihood ratio test to obtain the same decision regions Z_0 and Z_1. The
book denotes R_B(P_1) to be the expression given via Equation 4 where P_F and P_M change
in concert with P_1. The book denotes R_F(P_1) to be the expression given by Equation 4 but
where P_F and P_M are fixed and are held constant as we change the value of P_1.
In the case where we do not fix P_F and P_M we can evaluate R_B(P_1) at its two end points of
P_1. If P_1 = 0 then from Equation 4, R_B(0) = C_{00}(1 - P_F) + C_{10} P_F. When we have P_1 = 0
then we see that

\eta = \frac{P_0 (C_{10} - C_{00})}{P_1 (C_{01} - C_{11})} \to +\infty ,

for all R the function \Lambda(R) is always less than \eta, and all classifications are H_0. Thus Z_1 is
the empty set and P_F = 0. Thus we get R_B(0) = C_{00}.
The other extreme is when P_1 = 1. In that case P_0 = 0 so \eta = 0 and we would have
\Lambda(R) > 0 for all R, implying that all points are classified as H_1. This implies that P_M = 0.
The expression for R_B(P_1) from Equation 4 is given by

R_B(1) = C_{00}(1 - P_F) + C_{10} P_F + (C_{11} - C_{00}) - (C_{10} - C_{00}) P_F = C_{11} ,

when we simplify. These two values for R_B(0) and R_B(1) give the end point conditions on
RB (P1 ) seen in Figure 2.7. If we do not know the value of P1 then one can still design a
hypothesis test by specifying values of PM and PF such that the coefficient of P1 in the
expression for RF (P1 ) in Equation 4 vanishes. The idea behind this procedure is that this
will make RF a horizontal line for all values of P1 which is an upper bound on the Bayes
risk. To make the coefficient of P_1 vanish requires that

(C_{11} - C_{00}) + (C_{01} - C_{11}) P_M - (C_{10} - C_{00}) P_F = 0 .    (5)
This is also known as the minimax test. The decision threshold can be introduced into the
definitions of P_F and P_M, and the above gives an equation that can be used to determine its
value. If we take C_{00} = C_{11} = 0 and introduce the shorthands C_{01} = C_M (the cost of a miss)
and C_{10} = C_F (the cost of a false alarm) we get our constant minimax risk of

R_F = C_F P_F + P_1 [ C_M P_M - C_F P_F ] = P_0 C_F P_F + P_1 C_M P_M .    (6)
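As a quick numerical illustration (mine, not from the text), the minimax condition C_F P_F = C_M P_M can be solved for the decision threshold by bisection. The sketch below assumes the Gaussian mean-shift problem (H_0: N(0,1) versus H_1: N(d,1)); the cost values and d are arbitrary choices for illustration.

```python
import math

def Q(x):
    # Upper-tail probability of a standard normal, i.e. erfc_*(x) in the book's notation.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def minimax_threshold(d, C_F, C_M, lo=-10.0, hi=20.0, iters=200):
    """Find gamma with C_F*P_F(gamma) = C_M*P_M(gamma) for H0: N(0,1) vs H1: N(d,1).

    P_F(gamma) = Q(gamma) decreases and P_M(gamma) = 1 - Q(gamma - d) increases
    in gamma, so f(gamma) = C_F*P_F - C_M*P_M is monotone and bisection applies."""
    def f(g):
        return C_F * Q(g) - C_M * (1.0 - Q(g - d))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma = minimax_threshold(d=2.0, C_F=1.0, C_M=1.0)
PF = Q(gamma)
PM = 1.0 - Q(gamma - 2.0)
```

With equal costs and d = 2 the minimax threshold lands at d/2, as the symmetry of the problem suggests.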
The statistic l is the scaled sum of N Gaussian random variables and thus has a mean given by

\frac{1}{N}(N m) = m ,

and a variance given by the sum of the N variances (multiplied by the square of the scaling
factor \frac{1}{N}) or

\frac{1}{N^2}(N \sigma^2) = \frac{\sigma^2}{N} .

These two arguments have shown that l \sim N\left( m, \frac{\sigma^2}{N} \right).
We now derive Pr(\epsilon) for the case where we have measurements from Gaussians with different
means (Example 1). To do that we need to note the following symmetry identity about the
erfc_*(X) function. We have that

erfc_*(-X) = \int_{-X}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx
= 1 - \int_{-\infty}^{-X} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx
= 1 - \int_{X}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx
= 1 - erfc_*(X) .    (7)
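The book's erfc_*(X) is just the standard normal upper-tail probability, which can be expressed in Python through math.erfc; this small sketch (my own, not from the text) checks the symmetry identity (7) numerically.

```python
import math

def erfc_star(x):
    # erfc_*(x) = integral from x to infinity of the standard normal density.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Check erfc_*(-X) = 1 - erfc_*(X) for a few values of X.
checks = [abs(erfc_star(-X) - (1.0 - erfc_star(X))) for X in (0.0, 0.5, 1.3, 2.7)]
```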
Then using this result we can derive the given expression for Pr(\epsilon) under the case when
P_0 = P_1 = \frac{1}{2} and \eta = 1. We have

Pr(\epsilon) = \frac{1}{2}(P_F + P_M)   (since \eta = 1)
= \frac{1}{2}\left( erfc_*\left( \frac{d}{2} \right) + 1 - P_D \right)
= \frac{1}{2}\left( erfc_*\left( \frac{d}{2} \right) + 1 - erfc_*\left( -\frac{d}{2} \right) \right)
= \frac{1}{2}\left( erfc_*\left( \frac{d}{2} \right) + 1 - \left( 1 - erfc_*\left( \frac{d}{2} \right) \right) \right)
= erfc_*\left( \frac{d}{2} \right) .    (8)
When we put in the expressions for p(R_1|H_0) and p(R_2|H_0) we see why converting to polar
coordinates is helpful. We have

P_F = \int\int_{Z_1} \frac{1}{\sqrt{2\pi\sigma_0^2}} e^{-R_1^2/2\sigma_0^2} \frac{1}{\sqrt{2\pi\sigma_0^2}} e^{-R_2^2/2\sigma_0^2} dR_1 dR_2
= \frac{1}{2\pi\sigma_0^2} \int\int_{Z_1} \exp\left( -\frac{1}{2} \frac{R_1^2 + R_2^2}{\sigma_0^2} \right) dR_1 dR_2 .

When we change to polar coordinates the differential of area changes via dR_1 dR_2 = Z d\theta dZ and
we thus get for P_F the following

P_F = \frac{1}{2\pi\sigma_0^2} \int_{\theta=0}^{2\pi} \int_{Z=\gamma}^{\infty} Z \exp\left( -\frac{1}{2} \frac{Z^2}{\sigma_0^2} \right) d\theta \, dZ = \frac{1}{\sigma_0^2} \int_{Z=\gamma}^{\infty} Z \exp\left( -\frac{1}{2} \frac{Z^2}{\sigma_0^2} \right) dZ .

If we let v = \frac{Z^2}{2\sigma_0^2} then dv = \frac{Z}{\sigma_0^2} dZ, so P_F becomes

P_F = \int_{\gamma^2/2\sigma_0^2}^{\infty} e^{-v} dv = \left. -e^{-v} \right|_{\gamma^2/2\sigma_0^2}^{\infty} = e^{-\gamma^2/2\sigma_0^2} ,    (9)
the expression for P_F given in the book. For P_D the only thing that changes in the calculation
is that the normal has a variance of \sigma_1^2 rather than \sigma_0^2. Making this change gives the
expression for P_D in the book.
We can compute the ROC curve for this example by writing \gamma^2 = -2\sigma_0^2 \ln(P_F) and putting
this into the expression for P_D. We find

P_D = \exp\left( -\frac{\gamma^2}{2\sigma_1^2} \right) = \exp\left( \frac{\sigma_0^2}{\sigma_1^2} \ln(P_F) \right) .

This gives

\ln(P_D) = \frac{\sigma_0^2}{\sigma_1^2} \ln(P_F)   or   P_D = P_F^{\sigma_0^2/\sigma_1^2} .    (10)

Note that P_D gets closer to one as the ratio \sigma_1^2/\sigma_0^2 gets larger (since P_F is less than one).
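Equation 10 can be sanity-checked numerically: sweeping the threshold gamma, the pairs (P_F, P_D) from Equation 9 should fall exactly on the curve P_D = P_F^{sigma_0^2/sigma_1^2}. The following sketch (my own, with arbitrary variance values) does this.

```python
import math

sigma0_sq, sigma1_sq = 1.0, 4.0  # arbitrary illustrative variances

def roc_point(gamma):
    # P_F and P_D from Equation 9 (and its H1 analogue).
    PF = math.exp(-gamma ** 2 / (2.0 * sigma0_sq))
    PD = math.exp(-gamma ** 2 / (2.0 * sigma1_sq))
    return PF, PD

# Verify PD == PF**(sigma0^2/sigma1^2) along the curve (Equation 10).
errs = []
for gamma in (0.5, 1.0, 2.0, 3.0):
    PF, PD = roc_point(gamma)
    errs.append(abs(PD - PF ** (sigma0_sq / sigma1_sq)))
```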
In the general M hypothesis case the Bayes risk is given by

R = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} P_j C_{ij} \int_{Z_i} p(R|H_j) dR = \sum_{j=0}^{M-1} P_j \sum_{i=0}^{M-1} C_{ij} \int_{Z_i} p(R|H_j) dR .    (11)
Let's consider the case where there are three classes (M = 3) and expand our R where we
replace the integration region over the correct regions with their complement in terms of
Z i.e.

R = P_0 C_{00} \int_{Z_0 = Z - Z_1 - Z_2} p(R|H_0) dR + P_0 C_{10} \int_{Z_1} p(R|H_0) dR + P_0 C_{20} \int_{Z_2} p(R|H_0) dR
  + P_1 C_{01} \int_{Z_0} p(R|H_1) dR + P_1 C_{11} \int_{Z_1 = Z - Z_0 - Z_2} p(R|H_1) dR + P_1 C_{21} \int_{Z_2} p(R|H_1) dR
  + P_2 C_{02} \int_{Z_0} p(R|H_2) dR + P_2 C_{12} \int_{Z_1} p(R|H_2) dR + P_2 C_{22} \int_{Z_2 = Z - Z_0 - Z_1} p(R|H_2) dR .    (12)
If we then simplify by breaking the integrals over compound regions into their component pieces, for
example we simplify the first integral above as

\int_{Z - Z_1 - Z_2} p(R|H_0) dR = 1 - \int_{Z_1} p(R|H_0) dR - \int_{Z_2} p(R|H_0) dR .    (13)
If we define the integrands of the above integrals at the point R as I_0, I_1, and I_2 such that

I_0(R) = P_2 (C_{02} - C_{22}) p(R|H_2) + P_1 (C_{01} - C_{11}) p(R|H_1)
I_1(R) = P_0 (C_{10} - C_{00}) p(R|H_0) + P_2 (C_{12} - C_{22}) p(R|H_2)
I_2(R) = P_0 (C_{20} - C_{00}) p(R|H_0) + P_1 (C_{21} - C_{11}) p(R|H_1) ,    (14)

then the optimal decision is made based on the relative magnitude of I_i(R). For example,
our decision rule should be

if I_0(R) \le \min(I_1(R), I_2(R))   decide 0
if I_1(R) \le \min(I_0(R), I_2(R))   decide 1
if I_2(R) \le \min(I_0(R), I_1(R))   decide 2 .    (15)
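The decision rule in Equation 15 is easy to implement directly. The sketch below is my own construction: it uses three one-dimensional Gaussian class densities with made-up means and 0-1 costs (C_ij = 1 for i != j, C_ii = 0), in which case minimizing I_i(R) reduces to picking the class with the largest P_i p(R|H_i).

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

priors = [1 / 3, 1 / 3, 1 / 3]
means = [-2.0, 0.0, 2.0]          # class-conditional means (illustrative)
C = [[0 if i == j else 1 for j in range(3)] for i in range(3)]

def decide(R):
    # I_i(R) = sum over j != i of P_j (C_ij - C_jj) p(R|H_j); decide the smallest.
    dens = [gauss(R, means[j], 1.0) for j in range(3)]
    I = [sum(priors[j] * (C[i][j] - C[j][j]) * dens[j] for j in range(3) if j != i)
         for i in range(3)]
    return min(range(3), key=lambda i: I[i])
```

With these symmetric parameters the rule classifies each point to its nearest class mean, as expected.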
Based on the results above it seems that for a general M decision hypothesis Bayes test we
can write the risk as

R = \sum_{i=0}^{M-1} P_i C_{ii} + \sum_{i=0}^{M-1} \int_{Z_i} \left( \sum_{j=0; j \ne i}^{M-1} P_j (C_{ij} - C_{jj}) p(R|H_j) \right) dR .

Note that in the above expression the first term, \sum_{i=0}^{M-1} P_i C_{ii}, is a fixed cost and cannot be
changed regardless of the decision region selected. The second term in R or
\sum_{i=0}^{M-1} \int_{Z_i} \left( \sum_{j=0; j \ne i}^{M-1} P_j (C_{ij} - C_{jj}) p(R|H_j) \right) dR ,

is the average cost accumulated when we incorrectly assign a sample to the regions i =
0, 1, 2, \dots, M-1. Thus we should define Z_i to be the points R such that the integrand
evaluated at that R is smaller than all other possible integrands. Thus if we define

I_i(R) \equiv \sum_{j=0; j \ne i}^{M-1} P_j (C_{ij} - C_{jj}) p(R|H_j) ,

a point R will be classified as from Z_i if it has I_i(R) the smallest from all possible values of
I_j(R). That is we classify R as from H_i when

I_i(R) \le \min(I_j(R))   for   0 \le j \le M-1 .    (16)
The book presents this decision region as the equations for the three class case M = 3 as

P_1 (C_{01} - C_{11}) \Lambda_1(R) > P_0 (C_{10} - C_{00}) + P_2 (C_{12} - C_{02}) \Lambda_2(R)   then H_1 or H_2    (17)
P_2 (C_{02} - C_{22}) \Lambda_2(R) > P_0 (C_{20} - C_{00}) + P_1 (C_{21} - C_{01}) \Lambda_1(R)   then H_2 or H_1    (18)
P_2 (C_{12} - C_{22}) \Lambda_2(R) > P_0 (C_{20} - C_{10}) + P_1 (C_{21} - C_{11}) \Lambda_1(R)   then H_2 or H_0 .    (19)
We can determine the decision regions in \Lambda_1 and \Lambda_2 space when we replace all inequalities
with equalities. In that case each of the above equalities would then be a line. We can then
solve for the values of \Lambda_1 and \Lambda_2 that determine the intersection point by solving these three
equations for \Lambda_1 and \Lambda_2. In addition, the linear decision regions in \Lambda_1 and \Lambda_2 space can be
plotted by taking each inequality as an equality and plotting the given lines.
Written with their complementary outcomes, the same statements (the book's equations 20-23) read:
then H_1 or H_2 else H_0 or H_2; then H_2 or H_1 else H_0 or H_1; and then H_1 or H_2 else H_0.
The first two equations state that we should pick H1 or H2 depending on the magnitudes of
the costs.
where C(\epsilon) is the uniform cost, which is 1 except in a window of size \Delta centered around \epsilon = 0
where C(\epsilon) is zero. That is

\int dA \, C(A - \hat{a}(R)) p(A|R) = 1 - \int_{|A - \hat{a}(R)| \le \Delta/2} dA \, p(A|R) = 1 - \int_{\hat{a}(R) - \Delta/2}^{\hat{a}(R) + \Delta/2} p(A|R) dA ,    (24)
If we define

\sigma_p^2 \equiv \left( \frac{1}{\sigma_a^2} + \frac{N}{\sigma_n^2} \right)^{-1} = \frac{\sigma_a^2 \sigma_n^2}{N \sigma_a^2 + \sigma_n^2} ,
then completing the square in A the posterior can be written as

p(A|R) = k(R) \exp\left( -\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} (R_i - A)^2 \right) \exp\left( -\frac{A^2}{2\sigma_a^2} \right)
= k(R) \exp\left( -\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} R_i^2 \right) \exp\left( -\frac{1}{2} \left[ \frac{A^2}{\sigma_p^2} - \frac{2A}{\sigma_n^2} \sum_{i=1}^{N} R_i \right] \right)
= k(R) \exp\left( -\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} R_i^2 \right) \exp\left( \frac{\sigma_p^2}{2\sigma_n^4} \left( \sum_{i=1}^{N} R_i \right)^2 \right) \exp\left( -\frac{1}{2\sigma_p^2} \left( A - \frac{\sigma_p^2}{\sigma_n^2} \sum_{i=1}^{N} R_i \right)^2 \right)
= k'(R) \exp\left( -\frac{1}{2\sigma_p^2} \left( A - \frac{\sigma_p^2}{\sigma_n^2} \sum_{i=1}^{N} R_i \right)^2 \right) .
Note that the mean value of the above density can be observed by inspection where we have

\hat{a}_{ms}(R) = \frac{\sigma_p^2}{\sigma_n^2} \sum_{i=1}^{N} R_i = \frac{\sigma_a^2}{\sigma_a^2 + \frac{\sigma_n^2}{N}} \left( \frac{1}{N} \sum_{i=1}^{N} R_i \right) .    (25)
The Bayes risk of an arbitrary estimator \hat{a} can be written as

R_B(\hat{a}|R) = E_z[C(z + \hat{a} - \hat{a}_{ms})|R] ,    (27)

where z \equiv a - \hat{a}_{ms}. Since the posterior density of z is symmetric about zero,

p_z(Z|R) = p_z(-Z|R) ,    (28)

we equally have

R_B(\hat{a}|R) = E_z[C(z + \hat{a}_{ms} - \hat{a})|R] .    (29)
We now add 1/2 of Equation 27 and 1/2 of Equation 29 and use the convexity of C to get

R_B(\hat{a}|R) = \frac{1}{2} E_z[C(z + \hat{a} - \hat{a}_{ms})|R] + \frac{1}{2} E_z[C(z + \hat{a}_{ms} - \hat{a})|R]
= E_z\left[ \frac{1}{2} C(z - (\hat{a}_{ms} - \hat{a})) + \frac{1}{2} C(z + (\hat{a}_{ms} - \hat{a})) \,\Big|\, R \right]
\ge E_z\left[ C\left( \frac{1}{2}(z - (\hat{a}_{ms} - \hat{a})) + \frac{1}{2}(z + (\hat{a}_{ms} - \hat{a})) \right) \,\Big|\, R \right]
= E_z[C(z)|R] = R_B(\hat{a}_{ms}|R) .    (30)
While all of these manipulations may seem complicated, my feeling is that the takeaway is
that the risk of using any estimator \hat{a} \ne \hat{a}_{ms} will be larger (or worse) than using \hat{a}_{ms}
when the cost function is convex. This is a strong argument for using the mean-squared cost
function above others.
For a nonlinear observation R_i = s(A) + n_i the posterior takes the form

p_{a|r}(A|R) = k(R) \exp\left( -\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} [R_i - s(A)]^2 - \frac{A^2}{2\sigma_a^2} \right) ,    (31)

so that

\ln(p_{a|r}(A|R)) = \ln(k(R)) - \frac{1}{2\sigma_n^2} \sum_{i=1}^{N} [R_i - s(A)]^2 - \frac{A^2}{2\sigma_a^2} .    (32)
For this example, from the functional form for p_{a|r}(A|R) (now containing a nonlinear function
in a), we have our MAP estimate given by solving the following

\frac{\partial \ln(p_{a|r}(A|R))}{\partial A} = \frac{\partial}{\partial A}\left[ \ln(k(R)) - \frac{1}{2\sigma_n^2} \sum_{i=1}^{N} [R_i - s(A)]^2 - \frac{1}{2} \frac{A^2}{\sigma_a^2} \right]
= \frac{1}{\sigma_n^2} \sum_{i=1}^{N} [R_i - s(A)] \frac{ds(A)}{dA} - \frac{A}{\sigma_a^2} = 0 ,

equation for A. When we do this (and calling the solution \hat{a}_{map}(R)) we have

\hat{a}_{map}(R) = \frac{\sigma_a^2}{\sigma_n^2} \sum_{i=1}^{N} [R_i - s(A)] \left. \frac{\partial s(A)}{\partial A} \right|_{A = \hat{a}_{map}(R)} .    (33)
Letting v = (1+\lambda) A (so that dv = (1+\lambda) dA) the normalization condition becomes

1 = k(N) \int_0^{\infty} A^N e^{-(1+\lambda)A} dA = \frac{k(N)}{(1+\lambda)^{N+1}} \int_0^{\infty} v^N e^{-v} dv = \frac{k(N) N!}{(1+\lambda)^{N+1}} ,

so

k(N) = \frac{(1+\lambda)^{N+1}}{N!} .    (35)
Now that we have the expression for k(N) we can evaluate \hat{a}_{ms}(N). We find

\hat{a}_{ms}(N) = \int_0^{\infty} A \, p(A|N) dA
= \frac{(1+\lambda)^{N+1}}{N!} \int_0^{\infty} A^{N+1} e^{-(1+\lambda)A} dA
= \frac{(1+\lambda)^{N+1}}{N!} \frac{1}{(1+\lambda)^{N+2}} \int_0^{\infty} v^{N+1} e^{-v} dv
= \frac{N+1}{\lambda + 1} .    (36)
To evaluate \hat{a}_{map}(N) we first note from Equation 34 that

\ln(p(A|N)) = N \ln(A) - A(1+\lambda) + \ln(k(N)) ,

so that setting the first derivative equal to zero we get

\frac{\partial \ln(p(A|N))}{\partial A} = \frac{N}{A} - (1+\lambda) = 0 .

Solving for A we get \hat{a}_{map}(N)

\hat{a}_{map}(N) = \frac{N}{1+\lambda} .    (37)
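The closed forms a_ms(N) = (N+1)/(lambda+1) and a_map(N) = N/(1+lambda) can be confirmed against a brute-force evaluation of the unnormalized posterior p(A|N) proportional to A^N e^{-A(1+lambda)} on a grid. The sketch below is my own check, with arbitrary values of N and lambda.

```python
import math

N, lam = 5, 0.5                              # arbitrary illustrative values
dA = 1e-4
grid = [k * dA for k in range(1, 400000)]    # A in (0, 40), far past the posterior mass

post = [A ** N * math.exp(-A * (1 + lam)) for A in grid]   # unnormalized p(A|N)
Z = sum(post) * dA
a_ms_numeric = sum(A * p for A, p in zip(grid, post)) * dA / Z
a_map_numeric = grid[max(range(len(post)), key=lambda k: post[k])]

a_ms_exact = (N + 1) / (lam + 1)             # Equation 36
a_map_exact = N / (1 + lam)                  # Equation 37
```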
(38)
Recall the Schwarz inequality

\int f(x) g(x) dx \le \left( \int f(x)^2 dx \right)^{1/2} \left( \int g(x)^2 dx \right)^{1/2} .    (39)

We will have equality if f(x) = k g(x). If we take for f and g the functions

f(R) \equiv \frac{\partial \ln(p_{r|a}(R|A))}{\partial A} \sqrt{p_{r|a}(R|A)}
g(R) \equiv \sqrt{p_{r|a}(R|A)} \, (\hat{a}(R) - A) ,
as the component functions in the Schwarz inequality, then we find a right-hand-side (RHS)
of this inequality given by

RHS = \left( \int \left( \frac{\partial \ln(p_{r|a}(R|A))}{\partial A} \right)^2 p_{r|a}(R|A) dR \right)^{1/2} \left( \int p_{r|a}(R|A) (\hat{a}(R) - A)^2 dR \right)^{1/2} ,

and a left-hand-side (LHS) given by

LHS = \int \frac{\partial \ln(p_{r|a}(R|A))}{\partial A} p_{r|a}(R|A) (\hat{a}(R) - A) dR .

From the derivation in the book this LHS expression is equal to the value of 1. The resulting inequality is

1 \le \left( \int \left( \frac{\partial \ln(p_{r|a}(R|A))}{\partial A} \right)^2 p_{r|a}(R|A) dR \right)^{1/2} \left( \int p_{r|a}(R|A) (\hat{a}(R) - A)^2 dR \right)^{1/2} ,

and squaring both sides
gives the book's equation 186. Simply dividing by the integral with the derivative and
recognizing that these integrals are expectations gives

E[(\hat{a}(R) - A)^2] \ge \left\{ E\left[ \left( \frac{\partial \ln(p_{r|a}(R|A))}{\partial A} \right)^2 \right] \right\}^{-1} ,    (40)

which is one formulation of the Cramer-Rao lower bound on the value of the expression
Var[\hat{a}(R) - A] and is the book's equation 188. From the above proof we will have an efficient
estimator (one that achieves the Cramer-Rao lower bound), and the Schwarz inequality is
tight, when f = kg or in this case
\frac{\partial \ln(p_{r|a}(R|A))}{\partial A} \sqrt{p_{r|a}(R|A)} = k(A) \sqrt{p_{r|a}(R|A)} \, (\hat{a}(R) - A) ,

or

\frac{\partial \ln(p_{r|a}(R|A))}{\partial A} = k(A)(\hat{a}(R) - A) .    (41)

If we can write our estimator \hat{a}(R) in this form then we can state that we have an efficient
estimator.
For this problem the likelihood of the data is

p_{r|a}(R|A) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\sigma_n} \exp\left( -\frac{(R_i - A)^2}{2\sigma_n^2} \right) .    (42)
To find the maximum likelihood solution we need to find the maximum of the above expression with respect to the variable A. The A derivative of this expression is given by

\frac{\partial \ln(p_{r|a}(R|A))}{\partial A} = \frac{1}{\sigma_n^2} \sum_{i=1}^{N} (R_i - A) = \frac{1}{\sigma_n^2} \left( \sum_{i=1}^{N} R_i - AN \right) = \frac{N}{\sigma_n^2} \left( \frac{1}{N} \sum_{i=1}^{N} R_i - A \right) .    (43)

Setting this equal to zero and solving for A we get

\hat{a}_{ml}(R) = \frac{1}{N} \sum_{i=1}^{N} R_i .    (44)
(45)
Thus we find

Var[\hat{a}_{ml}(R) - A] = \left\{ -E\left[ \frac{\partial^2 \ln(p_{r|a}(R|A))}{\partial A^2} \right] \right\}^{-1} = \left( \frac{N}{\sigma_n^2} \right)^{-1} = \frac{\sigma_n^2}{N} .    (46)
The maximum likelihood estimate of A, after we observe the number of events N, is given by
finding the maximum of the density above, Pr(n events|a = A). We can do this by setting
\frac{\partial \ln(p(n=N|A))}{\partial A} equal to zero and solving for A. This derivative is

\frac{\partial}{\partial A} \ln(Pr(n = N|A)) = \frac{\partial}{\partial A} \left( N \ln(A) - A - \ln(N!) \right)
= \frac{N}{A} - 1 = \frac{1}{A}(N - A) .    (48)

Setting this equal to zero gives

\hat{a}_{ml}(N) = N .    (49)
Note that in Equation 48 we have written \frac{\partial \ln(p(n=N|A))}{\partial A} in the form k(A)(\hat{a} - A) and thus
\hat{a}_{ml} is an efficient estimator (one that achieves the Cramer-Rao bound). Computing the
variance of this estimator using this method we then need to compute

\frac{\partial^2 \ln(p(n = N|A))}{\partial A^2} = -\frac{N}{A^2} .
Thus using this we have

Var[\hat{a}_{ml}(N) - A] = \frac{1}{-E\left[ \frac{\partial^2 \ln(p(N|A))}{\partial A^2} \right]} = \frac{1}{E\left[ \frac{N}{A^2} \right]} = \frac{A^2}{E[N]} = \frac{A^2}{A} = A .    (50)
A bit of explanation might be needed for these manipulations. In the above E[N] is the
expectation of the observation N with A a fixed parameter. The distribution of N with A
a fixed parameter is a Poisson distribution with mean A given by Equation 47. From facts
about the Poisson distribution this expectation is A.
Note that these results can be obtained from MAP estimates in the case where our prior
information is infinitely weak. For example, in example 2 weak prior information means
that we should take \sigma_a \to \infty in the MAP estimate of a. Using Equation 25 (since \hat{a}_{map}(R) =
\hat{a}_{ms}(R) for this example) this limit gives

\hat{a}_{map}(R) = \frac{\sigma_a^2}{\sigma_a^2 + (\sigma_n^2/N)} \left( \frac{1}{N} \sum_{i=1}^{N} R_i \right) \to \frac{1}{N} \sum_{i=1}^{N} R_i ,

which matches the maximum likelihood estimate of A as shown in Equation 44.
In example 4, since A is distributed as an exponential with parameter \lambda it has a variance
given by Var[A] = \frac{1}{\lambda^2} (see [1]), so to remove any prior dependence in the MAP estimate we
take \lambda \to 0. In that case the MAP estimate of A given by Equation 37 limits to

\hat{a}_{map} = \frac{N}{1+\lambda} \to N ,    (51)
From this expression for p(A|R), to get p(R|A) we would need to drop the prior factor p_a(A) \propto \exp\left( -\frac{A^2}{2\sigma_a^2} \right).
When we do this and then take the logarithm of p(R|A) we get

\ln(p(R|A)) = \ln(k''(R)) - \frac{1}{2\sigma_n^2} \sum_{i=1}^{N} [R_i - s(A)]^2 .
To satisfy this equation either \frac{ds}{dA} = 0 or s(A) = \frac{1}{N} \sum_{i=1}^{N} R_i. The second condition
has a solution for \hat{a}_{ml}(R) given by

\hat{a}_{ml}(R) = s^{-1}\left( \frac{1}{N} \sum_{i=1}^{N} R_i \right) ,    (53)
which is the book's equation 209. If this estimate is unbiased we can evaluate the Cramer-Rao lower bound on Var[\hat{a}_{ml}(R) - A] by computing the second derivative
\frac{\partial^2 \ln(p(R|A))}{\partial A^2} = \frac{1}{\sigma_n^2} \frac{\partial^2 s}{\partial A^2} \sum_{i=1}^{N} [R_i - s(A)] - \frac{1}{\sigma_n^2} \frac{\partial s}{\partial A} \cdot N \frac{\partial s}{\partial A}
= \frac{1}{\sigma_n^2} \frac{\partial^2 s}{\partial A^2} \sum_{i=1}^{N} [R_i - s(A)] - \frac{N}{\sigma_n^2} \left( \frac{\partial s}{\partial A} \right)^2 .    (54)
Taking the expectation of the above expression and using the fact that E(R_i - s(A)) =
E(n_i) = 0, the first term in the above expression vanishes and we are left with the expectation
of the second derivative of the log-likelihood given by

-\frac{N}{\sigma_n^2} \left( \frac{\partial s}{\partial A} \right)^2 .

Using this expectation the Cramer-Rao lower bound gives

Var[\hat{a}_{ml}(R) - A] \ge \frac{1}{-E\left[ \frac{\partial^2 \ln(p(R|A))}{\partial A^2} \right]} = \frac{\sigma_n^2}{N \left( \frac{ds(A)}{dA} \right)^2} .    (55)
We can see why we need to divide by the derivative squared when computing the variance
of a nonlinear transformation from the following simple example. If we take Y = s(A) and
Taylor expand Y about the point A = A_A where Y_A = s(A_A) we find

Y = Y_A + \left. \frac{ds}{dA} \right|_{A=A_A} (A - A_A) + O((A - A_A)^2) ,

so

Y - Y_A \approx (A - A_A) \left. \frac{ds}{dA} \right|_{A=A_A} .

From this we can easily compute the variance of our nonlinear function Y in terms of the
variance of the input and find

Var[Y - Y_A] = \left( \left. \frac{ds(A)}{dA} \right|_{A=A_A} \right)^2 Var[A - A_A] ,

which shows that a nonlinear transformation scales the variance of the mapped variable
Y by the square of the derivative of the mapping.
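The variance-scaling claim above can be checked with a quick Monte Carlo experiment: for a smooth map s(A) and A concentrated near A_A, the sample variance of Y = s(A) should be approximately (s'(A_A))^2 times the variance of A. This sketch is my own; the map s(A) = A^3 and all parameter values are arbitrary.

```python
import random
import statistics

random.seed(0)

A_A = 2.0
sigma_A = 0.01                        # small spread so the linearization is accurate
s = lambda A: A ** 3                  # arbitrary smooth nonlinear map
ds = lambda A: 3 * A ** 2             # its derivative

samples_A = [random.gauss(A_A, sigma_A) for _ in range(200000)]
var_A = statistics.variance(samples_A)
var_Y = statistics.variance([s(A) for A in samples_A])

predicted = ds(A_A) ** 2 * var_A      # Var[Y] ~ s'(A_A)^2 Var[A]
```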
(56)
(57)
(58)
Then using the Schwarz inequality, as was done on Page 13, we can get the stated lower bound
on the variance of our estimator \hat{a}(R). The Schwarz inequality will hold with equality if and
only if

\frac{\partial^2 \ln(p(R, A))}{\partial A^2} = -k .    (59)

Since p(R, A) = p(A|R) p(R) we have \ln(p(R, A)) = \ln(p(A|R)) + \ln(p(R)), so Equation 59
becomes

\frac{\partial^2 \ln(p(R, A))}{\partial A^2} = \frac{\partial^2 \ln(p(A|R))}{\partial A^2} = -k .

Then integrating this expression twice gives that p(A|R) must satisfy

p(A|R) = \exp\left( -\frac{k}{2} A^2 + c_1(R) A + c_2(R) \right) .
\int \hat{a}_1(R) \frac{\partial p(R|A)}{\partial A_1} dR - A_1 \int \frac{\partial p(R|A)}{\partial A_1} dR
= 1 - A_1 \frac{\partial}{\partial A_1} \int p(R|A) dR   (using the book's equation 264 for the first term)
= 1 - A_1 \cdot 0 = 1 .    (60)
With \Delta m' = W \Delta m and R' = W R, the sufficient statistic becomes

l(R') = \Delta m^T Q R = \Delta m'^T W^{-T} Q W^{-1} R' .

Recall that W^T is the matrix containing the eigenvectors of K as its column values; since Q
is defined as Q = K^{-1} we can conclude that

K W^T = W^T \Lambda   is the same as   Q^{-1} W^T = W^T \Lambda ,

so inverting both sides gives W^{-T} Q = \Lambda^{-1} W^{-T}. Multiplying this last expression by W^{-1} on the
right gives W^{-T} Q W^{-1} = \Lambda^{-1} W^{-T} W^{-1}. Since the eigenvectors are orthogonal, W W^T = I
so W^{-T} W^{-1} = I and we obtain

W^{-T} Q W^{-1} = \Lambda^{-1} .

Using this expression we see that l(R') becomes

l(R') = \Delta m'^T \Lambda^{-1} R' = \sum_{i=1}^{N} \frac{\Delta m_i' R_i'}{\lambda_i} ,    (61)

and in the same way

d^2 = \Delta m^T Q \Delta m = \Delta m'^T \Lambda^{-1} \Delta m' = \sum_{i=1}^{N} \frac{(\Delta m_i')^2}{\lambda_i} .    (62)
If \rho > 0 then \Delta m_1' = 0 and \Delta m_2' = 1 will maximize d^2. In terms of m_{11} and m_{12} this means
that

\frac{m_{11} + m_{12}}{\sqrt{2}} = 0   and   \frac{m_{11} - m_{12}}{\sqrt{2}} = 1 .

Solving for m_{11} and m_{12} we get \begin{bmatrix} m_{11} \\ m_{12} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \phi_2. Note that \phi_2 is the eigenvector
corresponding to the smaller eigenvalue (when \rho > 0).
If \rho < 0 then \Delta m_1' = 1 and \Delta m_2' = 0 will maximize d^2. In terms of m_{11} and m_{12} this means
that

\frac{m_{11} + m_{12}}{\sqrt{2}} = 1   and   \frac{m_{11} - m_{12}}{\sqrt{2}} = 0 .

Solving for m_{11} and m_{12} we get \begin{bmatrix} m_{11} \\ m_{12} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \phi_1. Note that \phi_1 is again the eigenvector
corresponding to the smaller eigenvalue (when \rho < 0).
When m_1 = m_2 = m we get for the H_1 decision boundary

\frac{1}{2} (R - m)^T Q_0 (R - m) - \frac{1}{2} (R - m)^T Q_1 (R - m) > \ln(\eta) + \frac{1}{2} \ln|K_1| - \frac{1}{2} \ln|K_0| .

We can write the left-hand-side of the above as

\frac{1}{2} (R - m)^T Q_0 (R - m) - \frac{1}{2} (R - m)^T Q_1 (R - m) = \frac{1}{2} (R - m)^T (Q_0 - Q_1)(R - m) .
We have Q_0 = \frac{1}{\sigma_n^2} I, and we write

Q_1 = \frac{1}{\sigma_n^2} \left( I + \frac{1}{\sigma_n^2} K_s \right)^{-1} = \frac{1}{\sigma_n^2} [I - H] ,    (63)

so

\left( I + \frac{1}{\sigma_n^2} K_s \right)^{-1} = I - H ,

and

H = I - \left( I + \frac{1}{\sigma_n^2} K_s \right)^{-1}
= \left( I + \frac{1}{\sigma_n^2} K_s \right)^{-1} \left( I + \frac{1}{\sigma_n^2} K_s - I \right)
= \left( I + \frac{1}{\sigma_n^2} K_s \right)^{-1} \frac{1}{\sigma_n^2} K_s
= \left( \sigma_n^2 I + K_s \right)^{-1} K_s .    (64)
Notice that from the functional forms of Q_0 and Q_1 we can write this as

I - \left( I + \frac{1}{\sigma_n^2} K_s \right)^{-1} = \sigma_n^2 Q_0 - \sigma_n^2 Q_1 = \sigma_n^2 \Delta Q ,
We have

P_F = \int_{\gamma}^{\infty} \left( 2^{N/2} \sigma_n^N \Gamma(N/2) \right)^{-1} L^{N/2 - 1} e^{-L/2\sigma_n^2} dL = 1 - \int_{0}^{\gamma} \left( 2^{N/2} \sigma_n^N \Gamma(N/2) \right)^{-1} L^{N/2 - 1} e^{-L/2\sigma_n^2} dL ,    (65)

since the integrand is a density and must integrate to one. If we assume N is even and let
M = \frac{N}{2} - 1 (an integer) then

\Gamma\left( \frac{N}{2} \right) = \Gamma(M + 1) = M! .

Let's change variables in the integrand by letting x = \frac{L}{2\sigma_n^2} (so that dx = \frac{dL}{2\sigma_n^2}); the
expression for P_F becomes

P_F = 1 - \int_0^{\gamma/2\sigma_n^2} \frac{x^M}{M!} e^{-x} dx .

Using the same transformation of the integrand used above (i.e. letting x = \frac{L}{2\sigma_n^2}) we can
write P_F in terms of x as

P_F = \int_{\gamma/2\sigma_n^2}^{\infty} \frac{x^M}{M!} e^{-x} dx .    (66)
Denoting \bar{\gamma} \equiv \frac{\gamma}{2\sigma_n^2}, repeated integration by parts gives

P_F = \int_{\bar{\gamma}}^{\infty} \frac{x^M}{M!} e^{-x} dx
= \frac{\bar{\gamma}^M}{M!} e^{-\bar{\gamma}} + \int_{\bar{\gamma}}^{\infty} \frac{x^{M-1}}{(M-1)!} e^{-x} dx   (first integration by parts)
= \frac{\bar{\gamma}^M}{M!} e^{-\bar{\gamma}} + \frac{\bar{\gamma}^{M-1}}{(M-1)!} e^{-\bar{\gamma}} + \int_{\bar{\gamma}}^{\infty} \frac{x^{M-2}}{(M-2)!} e^{-x} dx   (second integration by parts)
= e^{-\bar{\gamma}} \sum_{k=M, M-1, \dots, 2} \frac{\bar{\gamma}^k}{k!} + \int_{\bar{\gamma}}^{\infty} x e^{-x} dx
= e^{-\bar{\gamma}} \sum_{k=2}^{M} \frac{\bar{\gamma}^k}{k!} + \bar{\gamma} e^{-\bar{\gamma}} + e^{-\bar{\gamma}} .

Thus

P_F = e^{-\bar{\gamma}} \sum_{k=0}^{M} \frac{\bar{\gamma}^k}{k!} .    (67)
Factoring out the largest term, this can be written as

P_F = \frac{\bar{\gamma}^M e^{-\bar{\gamma}}}{M!} \sum_{k=0}^{M} \bar{\gamma}^{k-M} \frac{M!}{k!}
= \frac{\bar{\gamma}^M e^{-\bar{\gamma}}}{M!} \left( 1 + \frac{M}{\bar{\gamma}} + \frac{M(M-1)}{\bar{\gamma}^2} + \frac{M(M-1)(M-2)}{\bar{\gamma}^3} + \dots \right) .    (68)

If we drop the terms after the second in the above expansion and recall that (1+x)^{-1} \approx 1 - x
when x \ll 1, we can write

P_F \approx \frac{\bar{\gamma}^M e^{-\bar{\gamma}}}{M!} \left( 1 - \frac{M}{\bar{\gamma}} \right)^{-1} .    (69)
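Equation 67 can be verified numerically: the finite sum e^{-g} sum_{k=0}^{M} g^k/k! should match a direct numerical integration of x^M e^{-x}/M! from g to infinity. The following sketch is my own check, with arbitrary M and g, using a simple midpoint rule on a truncated interval.

```python
import math

def PF_series(M, g):
    # Equation 67: e^{-g} * sum_{k=0}^{M} g^k / k!
    return math.exp(-g) * sum(g ** k / math.factorial(k) for k in range(M + 1))

def PF_numeric(M, g, upper=80.0, n=400000):
    # Midpoint-rule integration of x^M e^{-x} / M! over [g, upper];
    # the tail beyond `upper` is negligible for these parameters.
    h = (upper - g) / n
    total = 0.0
    for i in range(n):
        x = g + (i + 0.5) * h
        total += x ** M * math.exp(-x)
    return total * h / math.factorial(M)

err = abs(PF_series(4, 3.0) - PF_numeric(4, 3.0))
```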
Problem Solutions
The conventions of this book dictate that lower case letters (like y) indicate a random variable
while capital letters (like Y) indicate a particular realization of the random variable y.
To maintain consistency with the book I'll try to stick to that notation. This is mentioned
because other books (like [1]) use the opposite convention, which could introduce confusion.
(70)
(71)
(72)
A similar expression holds for p_n(N). Now from properties of the exponential distribution
the mean of s is \frac{1}{a} and the mean of n is \frac{1}{b}. We hope in a physical application that the mean of
s is larger than the mean of n. Thus we should expect that b > a. I'll assume this condition
in what follows.
We now need to compute p(R|H_0) and p(R|H_1). The density p(R|H_0) is already given.
To compute p(R|H_1) we can use the fact that the probability density function (PDF) of the
sum of two independent random variables is the convolution of their individual densities,
p(R|H_1) = \int p_s(R - n) p_n(n) dn.
Since the domain of n is n \ge 0 the function p_n(N) vanishes for N < 0 and the lower limit on
the above integral becomes 0. The upper limit is restricted by recognizing that the argument
to p_s(R - n) will be negative for n sufficiently large. For example, when R - n < 0 or n > R
the density p_s(R - n) will cause the integrand to vanish. Thus we need to evaluate
p(R|H_1) = \int_0^R p_s(R - n) p_n(n) dn = \int_0^R a e^{-a(R-n)} b e^{-bn} dn = \frac{ab}{b - a} \left( e^{-aR} - e^{-bR} \right) ,

when we integrate and simplify some. This function has a domain given by 0 < R < \infty and
is zero otherwise. As an aside, we can check that the above density integrates to one (as it
must). We have
\int_0^{\infty} \frac{ab}{b-a} \left( e^{-aR} - e^{-bR} \right) dR = \frac{ab}{b-a} \left[ -\frac{e^{-aR}}{a} + \frac{e^{-bR}}{b} \right]_0^{\infty} = 0 + \frac{ab}{b-a} \left( \frac{1}{a} - \frac{1}{b} \right) = 1 ,

when we simplify. The likelihood ratio then decides H_1 when

\Lambda(R) = \frac{\frac{ab}{b-a} \left( e^{-aR} - e^{-bR} \right)}{b e^{-bR}} = \frac{a}{b-a} \left( e^{(b-a)R} - 1 \right) > \eta ,

which is equivalent to

e^{(b-a)R} > \frac{(b-a)\eta}{a} + 1 ,   i.e.   R > \frac{1}{b-a} \ln\left( \frac{(b-a)\eta}{a} + 1 \right) \equiv \gamma ,

where

\eta = \frac{P_0 (C_{10} - C_{00})}{P_1 (C_{01} - C_{11})} .
Part (3): For a Neyman-Pearson test (in the standard form) we fix a value of

P_F \equiv Pr(say H_1 | H_0 is true) ,

say \alpha, and seek to maximize P_D. Since we have shown that the LRT in this problem is
equivalent to R > \gamma we have

Pr(say H_1 | H_0 is true) = Pr(R > \gamma | H_0 is true) = \int_{\gamma}^{\infty} p_n(N) dN = \int_{\gamma}^{\infty} b e^{-bN} dN = b \left[ \frac{e^{-bN}}{-b} \right]_{\gamma}^{\infty} = e^{-b\gamma} .
b
2
1 2
p(R|H1 )
=
exp |R| + R
(R) =
p(R|H0 )
2
2
r
r
1
1
1 2
1
2
=
=
exp
(R 2|R| + 1)
exp
(|R| 1)
2
2
2
2
2
2
r
1
1
e 2 exp
(|R| 1)2 .
=
2
2
Part (2): The LRT says to decide H_1 when \Lambda(R) > \eta and decide H_0 otherwise. From the
above expression for \Lambda(R) this can be written as

\frac{1}{2} (|R| - 1)^2 > \ln\left( \eta \sqrt{\frac{2}{\pi}} e^{\frac{1}{2}} \right) ,

or simplifying some

|R| > \sqrt{ 2 \ln\left( \eta \sqrt{\frac{2}{\pi}} e^{\frac{1}{2}} \right) } + 1 .
If we plot the two densities we get the result shown in Figure 1. See the caption on that plot
for a description.
Figure 1: Plots of the density \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} R^2 \right) (in black) and \frac{1}{2} \exp(-|R|) (in red). Notice
that the exponential density has fatter tails than the Gaussian density. An LRT where we declare H_1 if |R|
is greater than a threshold makes sense since the exponential density (from
H_1) is much more likely to have large valued samples.
distribution function for y under the hypothesis H_0 is given by

P(Y|H_0) = Pr\{ y \le Y | H_0 \} = Pr\{ x^2 \le Y | H_0 \} = Pr\{ -\sqrt{Y} \le x \le \sqrt{Y} | H_0 \} = 2 \int_0^{\sqrt{Y}} \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{\xi^2}{2\sigma^2} \right) d\xi .

The density function for Y (under H_0) is the derivative of this expression with respect to Y.
We find

p(Y|H_0) = \frac{2}{\sqrt{2\pi}\sigma} \exp\left( -\frac{Y}{2\sigma^2} \right) \frac{1}{2\sqrt{Y}} = \frac{1}{\sqrt{2\pi}\sigma\sqrt{Y}} \exp\left( -\frac{Y}{2\sigma^2} \right) .
Next the distribution function for y under the hypothesis H_1 is given by

P(Y|H_1) = Pr\{ y \le Y | H_1 \} = Pr\{ x^3 \le Y | H_1 \} = Pr\{ x \le Y^{1/3} | H_1 \} = \int_{-\infty}^{Y^{1/3}} \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{\xi^2}{2\sigma^2} \right) d\xi .

Again the density function for y (under H_1) is the derivative of this expression with respect
to Y. We find

p(Y|H_1) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{Y^{2/3}}{2\sigma^2} \right) \frac{1}{3 Y^{2/3}} = \frac{1}{3\sqrt{2\pi}\sigma Y^{2/3}} \exp\left( -\frac{Y^{2/3}}{2\sigma^2} \right) .
Using these densities the LRT then gives

\Lambda(Y) = \frac{p(Y|H_1)}{p(Y|H_0)} = \frac{\frac{1}{3 Y^{2/3}} \exp\left( -\frac{Y^{2/3}}{2\sigma^2} \right)}{\frac{1}{Y^{1/2}} \exp\left( -\frac{Y}{2\sigma^2} \right)} = \frac{1}{3} Y^{-1/6} \exp\left( \frac{1}{2\sigma^2} (Y - Y^{2/3}) \right) .
P(Y|H_0) = Pr\{ -\sqrt{Y} \le x \le \sqrt{Y} | H_0 \} = \int_{-\sqrt{Y}}^{\sqrt{Y}} \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{(\xi - m)^2}{2\sigma^2} \right) d\xi .
The density function for y|H_0 is the derivative of this expression. To evaluate that derivative
we will use the identity

\frac{d}{dt} \int_{\alpha(t)}^{\beta(t)} f(x, t) dx = f(\beta(t), t) \frac{d\beta}{dt} - f(\alpha(t), t) \frac{d\alpha}{dt} + \int_{\alpha(t)}^{\beta(t)} \frac{\partial f}{\partial t}(x, t) dx .    (73)
Using this we find

p(Y|H_0) = \frac{1}{2\sqrt{Y}} \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{(\sqrt{Y} - m)^2}{2\sigma^2} \right) + \frac{1}{2\sqrt{Y}} \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{(\sqrt{Y} + m)^2}{2\sigma^2} \right)
= \frac{1}{2\sqrt{2\pi}\sigma\sqrt{Y}} \left( \exp\left( -\frac{(\sqrt{Y} - m)^2}{2\sigma^2} \right) + \exp\left( -\frac{(\sqrt{Y} + m)^2}{2\sigma^2} \right) \right) .
In the same way, for the transformation y = e^x we have

P(Y|H_0) = Pr\{ e^x \le Y | H_0 \} = \int_{-\infty}^{\ln(Y)} \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{(\xi - m)^2}{2\sigma^2} \right) d\xi ,

and the LRT decides H_1 when \Lambda(Y) = \frac{p(Y|H_1)}{p(Y|H_0)} > \eta.
derived in the book are valid here after we replace N with K. The LRT expressions desired
for this problem are then given by the book's Eq. 29 or Eq. 31.
Part (3): Given the decision region l(R) > \gamma for H_1 and the opposite inequality for deciding
H_0, we can write P_F and P_M = 1 - P_D as

P_F = Pr(choose H_1 | H_0 is true) = Pr\left( \sum_{i=1}^{K} R_i^2 > \gamma \,\Big|\, H_0 is true \right)
P_D = Pr(choose H_1 | H_1 is true) = Pr\left( \sum_{i=1}^{K} R_i^2 > \gamma \,\Big|\, H_1 is true \right) .
For this problem the likelihood ratio is

\Lambda(R) = \frac{p(R|H_1)}{p(R|H_0)} = \frac{\frac{1}{\sqrt{2\pi(m_1^2 \sigma_b^2 + \sigma_n^2)}} \exp\left( -\frac{R^2}{2(m_1^2 \sigma_b^2 + \sigma_n^2)} \right)}{\frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left( -\frac{R^2}{2\sigma_n^2} \right)}
= \sqrt{\frac{\sigma_n^2}{m_1^2 \sigma_b^2 + \sigma_n^2}} \exp\left( \frac{m_1^2 \sigma_b^2}{2(m_1^2 \sigma_b^2 + \sigma_n^2)\sigma_n^2} R^2 \right) ,

and the LRT \Lambda(R) > \eta is equivalent to

R^2 > \frac{2(m_1^2 \sigma_b^2 + \sigma_n^2)\sigma_n^2}{m_1^2 \sigma_b^2} \ln\left( \frac{\sqrt{m_1^2 \sigma_b^2 + \sigma_n^2}}{\sigma_n} \eta \right) .

Taking the square root we decide H_1 when

|R| > \sqrt{ \frac{2(m_1^2 \sigma_b^2 + \sigma_n^2)\sigma_n^2}{m_1^2 \sigma_b^2} \ln\left( \frac{\sqrt{m_1^2 \sigma_b^2 + \sigma_n^2}}{\sigma_n} \eta \right) } \equiv \gamma .    (74)
Figure 2: The ROC curve for Problem 2.2.6. The minimum probability of error is denoted
with a red marker.
The optimal processor takes the absolute value of the number R and compares its value to
a threshold \gamma.
Part (2): To draw the ROC curve we need to compute P_F and P_D. We find

P_F = \int_{|R| > \gamma} p(R|H_0) dR = 2 \int_{R=\gamma}^{\infty} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left( -\frac{1}{2} \frac{R^2}{\sigma_n^2} \right) dR .

Let v = \frac{R}{\sigma_n}; then

P_F = 2 \int_{v = \gamma/\sigma_n}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} v^2 \right) dv = 2 \, erfc_*\left( \frac{\gamma}{\sigma_n} \right) .

For P_D we find

P_D = 2 \int_{R=\gamma}^{\infty} \frac{1}{\sqrt{2\pi(m_1^2 \sigma_b^2 + \sigma_n^2)}} \exp\left( -\frac{R^2}{2(m_1^2 \sigma_b^2 + \sigma_n^2)} \right) dR
= 2 \int_{v = \gamma/\sqrt{m_1^2 \sigma_b^2 + \sigma_n^2}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} v^2 \right) dv = 2 \, erfc_*\left( \frac{\gamma}{\sqrt{m_1^2 \sigma_b^2 + \sigma_n^2}} \right) .
To plot the ROC curve we plot the points (P_F, P_D) as a function of \gamma. To show an example
of this type of calculation we need to specify some parameter values. Let \sigma_n = 1, \sigma_b = 2,
and m_1 = 5. With these parameters in the R code chap 2 prob 2.2.6.R we obtain the plot
shown in Figure 2.
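The original computation was done in R (chap 2 prob 2.2.6.R, not reproduced here); an equivalent sketch in Python, using the same parameter values sigma_n = 1, sigma_b = 2, m_1 = 5, is below. Only the (P_F, P_D) pairs are computed; the plotting step is omitted.

```python
import math

def erfc_star(x):
    # Standard normal upper-tail probability.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

sigma_n, sigma_b, m1 = 1.0, 2.0, 5.0
var1 = m1 ** 2 * sigma_b ** 2 + sigma_n ** 2   # variance of R under H1

def roc_point(gamma):
    PF = 2.0 * erfc_star(gamma / sigma_n)
    PD = 2.0 * erfc_star(gamma / math.sqrt(var1))
    return PF, PD

curve = [roc_point(0.1 * k) for k in range(1, 60)]
```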
Using P_0 = P_1 = \frac{1}{2} we find

Pr(\epsilon) = P_0 P_F + P_1 P_M = \frac{1}{2} P_F(\gamma) + \frac{1}{2} (1 - P_D(\gamma)) = \frac{1}{2} + \frac{1}{2} (P_F(\gamma) - P_D(\gamma)) .

The minimum probability of error (MPE) rule has C_{00} = C_{11} = 0 and C_{10} = C_{01} = 1 and
thus \eta = 1. With this value of \eta we would evaluate Equation 74 to get the value of \gamma of the
MPE decision threshold. This threshold gives Pr(\epsilon) = 0.1786.
Since the N samples are independent, p(R|S_j) = \prod_{i=1}^{N} p(r_i|S_j). The needed single-sample probabilities are

p(r = a|S_1) = p(r = a|I_0, S_1) p(I_0|S_1) + p(r = a|I_1, S_1) p(I_1|S_1) = (0.4)(0.5) + (0.7)(0.5) = 0.55
p(r = b|S_1) = p(r = b|I_0, S_1) p(I_0|S_1) + p(r = b|I_1, S_1) p(I_1|S_1) = (0.6)(0.5) + (0.3)(0.5) = 0.45
p(r = a|S_2) = p(r = a|I_0, S_2) p(I_0|S_2) + p(r = a|I_1, S_2) p(I_1|S_2) = (0.4)(0.4) + (0.7)(0.6) = 0.58
p(r = b|S_2) = p(r = b|I_0, S_2) p(I_0|S_2) + p(r = b|I_1, S_2) p(I_1|S_2) = (0.6)(0.4) + (0.3)(0.6) = 0.42 .
With what we have thus far the LRT (will say to decide S_2) if

\Lambda(R) \equiv \frac{p(r = a|S_2)^{N_a} \, p(r = b|S_2)^{N - N_a}}{p(r = a|S_1)^{N_a} \, p(r = b|S_1)^{N - N_a}} = \left( \frac{p(r = a|S_2)}{p(r = a|S_1)} \cdot \frac{p(r = b|S_1)}{p(r = b|S_2)} \right)^{N_a} \left( \frac{p(r = b|S_2)}{p(r = b|S_1)} \right)^{N} > \eta .

For the numbers in this problem we find

\frac{p(r = a|S_2) \, p(r = b|S_1)}{p(r = b|S_2) \, p(r = a|S_1)} = \frac{(0.58)(0.45)}{(0.42)(0.55)} = 1.129870 > 1 .
Figure 3: The ROC curve for Problem 2.2.7. Left: When N = 1. Right: When N = 10.
A vertical line is drawn at the desired value of PF = = 0.25.
We can solve for N_a (and don't need to flip the inequality since the log is positive) to find

N_a > \frac{ \ln(\eta) + N \ln\left( \frac{p(r=b|S_1)}{p(r=b|S_2)} \right) }{ \ln\left( \frac{p(r=a|S_2)}{p(r=b|S_2)} \cdot \frac{p(r=b|S_1)}{p(r=a|S_1)} \right) } \equiv \gamma .
Since N_a can only be integer valued for N_a = 0, 1, 2, \dots, N we only need to consider integer
values of \gamma, say \gamma_I for \gamma_I = 0, 1, 2, \dots, N+1. Note that at the limit of \gamma_I = 0 the LRT N_a \ge 0
is always true, so we always declare S_2 and obtain the point (P_F, P_D) = (1, 1). At the limit of
\gamma_I = N+1 the LRT of N_a \ge N+1 always fails so we always declare S_1 and obtain the point
(P_F, P_D) = (0, 0). Since N_a is the count of the number of a's from N it is a binomial random
variable (under both S_1 and S_2) and once \gamma_I is specified, we have P_F and P_D given by

P_F = Pr\{ N_a \ge \gamma_I | S_1 \} = \sum_{k=\gamma_I}^{N} \binom{N}{k} p(r = a|S_1)^k p(r = b|S_1)^{N-k}
P_D = Pr\{ N_a \ge \gamma_I | S_2 \} = \sum_{k=\gamma_I}^{N} \binom{N}{k} p(r = a|S_2)^k p(r = b|S_2)^{N-k} .

To plot the ROC curve we evaluate P_F and P_D for various values of \gamma_I. This is done in
the R code chap 2 prob 2.2.7.R. Since the value of P_F = \alpha = 0.25 does not exactly fall on
an integral value for \gamma_I we must use a randomized rule to achieve the desired performance.
Since we are not told what N is (the number of samples observed) we will consider two cases:
N = 1 and N = 10.
In the case N = 1 the desired value P_F = \alpha = 0.25 falls between the two points P_F(\gamma_I =
2) = 0 and P_F(\gamma_I = 1) = 0.55. To get the target value of 0.25 we need to introduce the
probability that we will use the threshold \gamma_I = 2 as p_{\gamma=2}. The complement of this probability,
or 1 - p_{\gamma=2}, is the probability that we use the threshold \gamma_I = 1. Then to get the desired false
alarm rate we need to take p_{\gamma=2} to satisfy

p_{\gamma=2} P_F(\gamma_I = 2) + (1 - p_{\gamma=2}) P_F(\gamma_I = 1) = 0.25 .

Putting in what we know, P_F(\gamma_I = 2) = 0 and P_F(\gamma_I = 1) = 0.55, this gives p_{\gamma=2} = 0.54.
The randomized procedure that gets P_F = \alpha while maximizing P_D is to observe N_a and then
with probability p_{\gamma=2} = 0.54 return the result from the test N_a \ge 2 (which will always be
false, causing us to return S_1). With probability 1 - p_{\gamma=2} = 0.45 return the result from
the test N_a \ge 1.
In the case where N = 10 we find that \alpha = 0.25 falls between the two points P_F(\gamma_I = 8) =
0.09955965 and P_F(\gamma_I = 7) = 0.2660379. We again need a randomized rule where we have
to pick p_{\gamma=8} such that

p_{\gamma=8} P_F(\gamma_I = 8) + (1 - p_{\gamma=8}) P_F(\gamma_I = 7) = 0.25 .

Solving this gives p_{\gamma=8} = 0.096336. The randomized decision to get P_F = 0.25 while
maximizing P_D is of the same form as in the N = 1 case. That is, observe N_a and then
with probability p_{\gamma=8} = 0.096336 return the result from the test N_a \ge 8. With probability
1 - p_{\gamma=8} = 0.903664 return the result from the test N_a \ge 7.
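The numbers quoted above can be reproduced with the binomial tail P_F(gamma_I) = sum_{k=gamma_I}^{N} C(N,k) p^k (1-p)^{N-k} with p = p(r = a|S_1) = 0.55. The sketch below (my own) recomputes P_F for N = 10 and solves for the randomization probability.

```python
from math import comb

def binom_tail(N, p, k0):
    # Pr{Na >= k0} for Na ~ Binomial(N, p)
    return sum(comb(N, k) * p ** k * (1 - p) ** (N - k) for k in range(k0, N + 1))

N, p_a_S1, alpha = 10, 0.55, 0.25
PF8 = binom_tail(N, p_a_S1, 8)
PF7 = binom_tail(N, p_a_S1, 7)

# Mixing probability for using threshold gamma_I = 8:
#   p_mix*PF8 + (1 - p_mix)*PF7 = alpha
p_mix = (PF7 - alpha) / (PF7 - PF8)
```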
Problem 2.2.8 (a Cauchy hypothesis test)
Part (1): For the given densities the LRT would state that if
\Lambda(X) = \frac{p(X|H_1)}{p(X|H_0)} = \frac{1 + (X - a_0)^2}{1 + (X - a_1)^2} = \frac{1 + X^2}{1 + (X - 1)^2} > \eta ,

we decide H_1. The boundary of the decision region is where (1 - \eta) X^2 + 2\eta X + 1 - 2\eta = 0.
Using the quadratic formula we can find the values of X where the left-hand-side of this
expression equals zero. We find

X_{\pm} = \frac{-2\eta \pm \sqrt{4\eta^2 - 4(1 - \eta)(1 - 2\eta)}}{2(1 - \eta)} = \frac{-\eta \pm \sqrt{-(1 - 3\eta + \eta^2)}}{1 - \eta} .    (75)

In order for a real value of X above to exist we must have that the expression 1 - 3\eta + \eta^2 < 0.
If that expression is not true then (1 - \eta) X^2 + 2\eta X + 1 - 2\eta is either always positive or
always negative. This expression 1 - 3\eta + \eta^2 is zero at the values

\eta = \frac{3 \pm \sqrt{9 - 4}}{2} = \frac{3 \pm \sqrt{5}}{2} = \{ 0.381966, 2.618034 \} .
For various values of \eta we have

P_F = Pr(choose H_1 | H_0) = 1 - \int_{X_-}^{X_+} \frac{dX}{\pi (1 + X^2)} = 1 - \frac{1}{\pi} \left( \tan^{-1}(X_+) - \tan^{-1}(X_-) \right)
P_D = Pr(choose H_1 | H_1) = 1 - \int_{X_-}^{X_+} \frac{dX}{\pi (1 + (X - 1)^2)} = 1 - \frac{1}{\pi} \left( \tan^{-1}(X_+ - 1) - \tan^{-1}(X_- - 1) \right) .

When \eta > 1 the quadratic opens downward and the H_1 region is instead the interval between
the roots, so that

P_D = \int_{X_+}^{X_-} p(X|H_1) dX = \frac{1}{\pi} \left( \tan^{-1}(X_- - 1) - \tan^{-1}(X_+ - 1) \right) .
All of these calculations are done in the R script chap 2 prob 2.2.8.R. When that script is
run we get the result shown in Figure 4.
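A Python version of that calculation (the original is the R script chap 2 prob 2.2.8.R, not reproduced here) is sketched below: for eta between (3 - sqrt(5))/2 and 1 the roots X± of Equation 75 are real, the H_1 region is outside [X-, X+], and the (P_F, P_D) pair follows from the arctangent expressions.

```python
import math

def roots(eta):
    # Roots of (1 - eta) X^2 + 2 eta X + 1 - 2 eta = 0 (Equation 75);
    # real only for eta in ((3 - sqrt(5))/2, (3 + sqrt(5))/2).
    disc = -(1.0 - 3.0 * eta + eta ** 2)
    a = 1.0 - eta
    r1 = (-eta - math.sqrt(disc)) / a
    r2 = (-eta + math.sqrt(disc)) / a
    return min(r1, r2), max(r1, r2)

def roc_point(eta):
    # Valid for eta < 1, where H1 is declared outside [X-, X+].
    Xm, Xp = roots(eta)
    PF = 1.0 - (math.atan(Xp) - math.atan(Xm)) / math.pi
    PD = 1.0 - (math.atan(Xp - 1.0) - math.atan(Xm - 1.0)) / math.pi
    return PF, PD

curve = [roc_point(eta) for eta in (0.45, 0.6, 0.75, 0.9)]
```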
Figure 4: The ROC curve (P_D vs. P_F) for Problem 2.2.8.
p(R|H_0) = P_0^{N_H} (1 - P_0)^{N - N_H} .

Using these the LRT is given by

\Lambda(R) = \frac{p(R|H_1)}{p(R|H_0)} = \frac{P_1^{N_H} (1 - P_1)^{N - N_H}}{P_0^{N_H} (1 - P_0)^{N - N_H}} > \eta ,

decide H_1 (otherwise decide H_0).
Part (2): In the case where Pr(H_0) = Pr(H_1) = \frac{1}{2}, C_{00} = C_{11} = 0, and equal error costs C_{10} = C_{01},
we have \eta = \frac{Pr(H_0)(C_{10} - C_{00})}{Pr(H_1)(C_{01} - C_{11})} = 1 and our LRT says to decide H_1 when
\frac{Pr\{N(T) = n|H_1\}}{Pr\{N(T) = n|H_0\}} = e^{-k_1 T} e^{k_0 T} \left( \frac{k_1}{k_0} \right)^n > 1 .

If we assume that k_1 > k_0, indicating that the second source has events that happen more
frequently, then the LRT can be written as

n > \frac{T(k_1 - k_0)}{\ln\left( \frac{k_1}{k_0} \right)} \equiv \gamma .

Since n can only take on integer values the only possible values for \gamma are 0, 1, 2, \dots. Thus
our LRT reduces to n \ge \gamma_I for \gamma_I = 0, 1, 2, \dots. Note we consider the case where n = \gamma_I to
cause us to state that the event H_1 occurred.
Part (3): To determine the probability of error we use

Pr(\epsilon) = P_0 P_F + P_1 P_M = P_0 P_F + P_1 (1 - P_D) = \frac{1}{2} + \frac{1}{2} (P_F - P_D) ,

where

P_F = Pr\{Say H_1 | H_0\} = \sum_{i=\gamma_I}^{\infty} \frac{e^{-k_0 T} (k_0 T)^i}{i!}
P_D = Pr\{Say H_1 | H_1\} = \sum_{i=\gamma_I}^{\infty} \frac{e^{-k_1 T} (k_1 T)^i}{i!} .

The above could be plotted as a function of \gamma_I in the (P_F, P_D) plane to obtain the ROC
curve for this problem.
For this problem the density of Y under H_1 is the mixture

p(Y|H_1) = \sum_{k} p(Y|N = k) P(N = k) ,

where p(Y|N = k) is a zero-mean Gaussian with variance (k+1)\sigma^2 and P(N = k) = \frac{\lambda^k e^{-\lambda}}{k!}.
The LRT is then based on

\Lambda(Y) = \frac{p(Y|H_1)}{p(Y|H_0)} .
First note that

E(\Lambda^{n+1}|H_0) = \int \left( \frac{p(R|H_1)}{p(R|H_0)} \right)^{n+1} p(R|H_0) dR = \int \left( \frac{p(R|H_1)}{p(R|H_0)} \right)^{n} p(R|H_1) dR = E(\Lambda^n|H_1) .

Next,

E(\Lambda|H_0) = \int \frac{p(R|H_1)}{p(R|H_0)} p(R|H_0) dR = \int p(R|H_1) dR = 1 .
Part (3): Recall that Var(\Lambda|H_0) = E(\Lambda^2|H_0) - E(\Lambda|H_0)^2. From #1 in this problem we have
shown that E(\Lambda^2|H_0) = E(\Lambda|H_1). From #2 of this problem E(\Lambda|H_0) = 1 so

E(\Lambda|H_0)^2 = 1^2 = 1 = E(\Lambda|H_0) ,

thus

Var(\Lambda|H_0) = E(\Lambda|H_1) - E(\Lambda|H_0) .
To evaluate $E(\Lambda|H_1) = E(\Lambda^2|H_0)$ for the Gaussian mean-shift problem, write
$$E(\Lambda^2|H_0) = e^{-Nm^2/\sigma^2} \prod_{i=1}^{N} \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{R_i^2}{2\sigma^2} + \frac{2mR_i}{\sigma^2}\right) dR_i\,.$$
Completing the square in each exponent,
$$-\frac{R_i^2}{2\sigma^2} + \frac{2mR_i}{\sigma^2} = -\frac{1}{2\sigma^2}(R_i-2m)^2 + \frac{2m^2}{\sigma^2}\,,$$
so each integral evaluates to $e^{2m^2/\sigma^2}$ and the product becomes
$$E(\Lambda^2|H_0) = e^{-Nm^2/\sigma^2}\, e^{2Nm^2/\sigma^2} = e^{Nm^2/\sigma^2}\,.$$
To derive the bounds, write
$$\mathrm{erfc}_*(X) = \frac{1}{\sqrt{2\pi}}\int_X^\infty \frac{1}{x}\left(x e^{-x^2/2}\right) dx\,.$$
Then using the integration by parts lemma $\int v\,du = vu - \int u\,dv$ with
$$v = \frac{1}{x}\,, \quad dv = -\frac{1}{x^2}\,dx\,, \quad du = x e^{-x^2/2}\,dx\,, \quad u = -e^{-x^2/2}\,,$$
we find
$$\mathrm{erfc}_*(X) = \frac{1}{\sqrt{2\pi}\,X}\, e^{-X^2/2} - \frac{1}{\sqrt{2\pi}}\int_X^\infty \frac{1}{x^2}\, e^{-x^2/2}\,dx\,. \quad (76)$$
Since the second term $\frac{1}{\sqrt{2\pi}}\int_X^\infty \frac{1}{x^2} e^{-x^2/2}\,dx$ is positive (and nonzero), if we drop it from the above we will have an expression that is larger in value than $\mathrm{erfc}_*(X)$, or an upper bound. This means that
$$\mathrm{erfc}_*(X) < \frac{1}{\sqrt{2\pi}\,X}\, e^{-X^2/2}\,. \quad (77)$$
Integrating the remaining integral by parts in the same way (writing $x^{-2}e^{-x^2/2} = x^{-3}\cdot x e^{-x^2/2}$), and remembering the factor of $\frac{1}{\sqrt{2\pi}}$, we get
$$\mathrm{erfc}_*(X) = \frac{1}{\sqrt{2\pi}\,X}\, e^{-X^2/2} - \frac{1}{\sqrt{2\pi}\,X^3}\, e^{-X^2/2} + \frac{3}{\sqrt{2\pi}}\int_X^\infty \frac{1}{x^4}\, e^{-x^2/2}\,dx\,. \quad (78)$$
Since the last expression is positive (and nonzero), if we drop it the remaining terms will sum to something smaller than $\mathrm{erfc}_*(X)$. Thus we have just shown
$$\frac{1}{\sqrt{2\pi}\,X}\, e^{-X^2/2} - \frac{1}{\sqrt{2\pi}\,X^3}\, e^{-X^2/2} < \mathrm{erfc}_*(X)\,. \quad (79)$$
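The bounds in Equations 77 and 79 are easy to confirm numerically, since $\mathrm{erfc}_*(X)$ can be computed from the standard-library complementary error function (a sketch; the test points are arbitrary):

```python
import math

def q_func(x):
    """erfc_*(x) = P(Z > x) for a standard normal Z, via math.erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def upper_bound(x):   # Equation 77
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)

def lower_bound(x):   # Equation 79
    return (1.0 / x - 1.0 / x**3) * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

for x in (1.5, 2.0, 4.0):
    assert lower_bound(x) < q_func(x) < upper_bound(x)
```

Both bounds tighten rapidly as $X$ grows, as the derivation suggests.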
To derive the full asymptotic expansion, make the substitution $t = x^2/2$ in the integral definition of $\mathrm{erfc}_*(X)$:
$$\mathrm{erfc}_*(X) = \frac{1}{\sqrt{2\pi}}\int_X^\infty e^{-x^2/2}\,dx = \frac{1}{2\sqrt{\pi}}\int_{X^2/2}^\infty t^{-1/2}\, e^{-t}\,dt\,.$$
This motivates defining the family of integrals
$$I_n(x) = \int_x^\infty t^{-(n-1/2)}\, e^{-t}\,dt\,,$$
so that the integral we initially have is $I_1(\frac{X^2}{2})$. We can write $I_n(x)$ recursively using integration by parts as
$$I_n(x) = \left[-t^{-(n-1/2)} e^{-t}\right]_x^\infty - \left(n-\tfrac{1}{2}\right)\int_x^\infty t^{-(n+1/2)}\, e^{-t}\,dt = x^{-(n-1/2)}\, e^{-x} - \left(n - \tfrac{1}{2}\right) I_{n+1}(x)\,. \quad (80)$$
Let $x \equiv \frac{X^2}{2}$. Applying this recursion repeatedly gives
$$I_1(x) = e^{-x}\left[x^{-1/2} - \frac{1}{2}\, x^{-3/2} + \frac{1\cdot 3}{2^2}\, x^{-5/2} - \cdots + (-1)^N \frac{1\cdot 3\cdots(2N-1)}{2^N}\, x^{-(N+1/2)}\right] + (-1)^{N+1}\frac{1\cdot 3\cdots(2N+1)}{2^{N+1}}\, I_{N+2}(x)\,.$$
Substituting $x = X^2/2$ (so that $x^{-(k+1/2)} = 2^{k+1/2}\, X^{-(2k+1)}$) and collecting the factor $\frac{1}{\sqrt{2\pi}}$, we obtain
$$\mathrm{erfc}_*(X) = \frac{1}{\sqrt{2\pi}}\,\frac{e^{-X^2/2}}{X}\left[1 + \sum_{k=1}^{N} (-1)^k \frac{1\cdot 3\cdot 5\cdots(2k-1)}{X^{2k}}\right] + (-1)^{N+1}\frac{1\cdot 3\cdots(2N+1)}{2^{N+2}\sqrt{\pi}}\, I_{N+2}\!\left(\frac{X^2}{2}\right)\,. \quad (81)$$
Using
$$\int_0^\infty \frac{1}{\sqrt{2\pi}}\, e^{-v^2/2}\,dv = \frac{1}{2}\,,$$
we see that the above becomes
$$\mathrm{erfc}_*(X) \leq \frac{1}{2}\, e^{-X^2/2}\,,$$
as we were to show.
Part (2): We want to compare the expression derived above with the bound
$$\mathrm{erfc}_*(X) < \frac{1}{\sqrt{2\pi}\,X}\, e^{-X^2/2}\,. \quad (82)$$
Note that these two bounds only differ in the coefficient of the exponential $e^{-X^2/2}$. These two coefficients will be equal at the point $X$ when
$$\frac{1}{\sqrt{2\pi}\,X} = \frac{1}{2} \quad\Rightarrow\quad X = \sqrt{\frac{2}{\pi}} = 0.7978\,.$$
Once we know this value where the coefficients are equal, we know that when $X > 0.7978$ Equation 82 gives a tighter bound, since in that case $\frac{1}{\sqrt{2\pi}\,X} < \frac{1}{2}$, while if $X < 0.7978$ the bound
$$\mathrm{erfc}_*(X) \leq \frac{1}{2}\, e^{-X^2/2}$$
is better. One can observe the truth of these statements by looking at the book's Figure 2.10, which plots several approximations to $\mathrm{erfc}_*(X)$; in that figure one can visually observe the crossover of the two upper bounds near $X \approx 0.8$.
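The crossover point and the ordering of the two upper bounds on either side of it can be checked directly (a quick sketch; the probe points 0.5 and 1.0 are arbitrary):

```python
import math

# The coefficients 1/(sqrt(2*pi)*X) and 1/2 agree where X = sqrt(2/pi);
# past that point the Equation 82 bound is the tighter one.
crossover = math.sqrt(2.0 / math.pi)

def bound_a(x):  # Equation 82
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)

def bound_b(x):  # the (1/2) exp(-X^2/2) bound
    return 0.5 * math.exp(-x * x / 2.0)

assert abs(crossover - 0.7978845608) < 1e-9
assert bound_a(1.0) < bound_b(1.0)   # X > crossover: Equation 82 tighter
assert bound_a(0.5) > bound_b(0.5)   # X < crossover: the 1/2 bound tighter
```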
Figure 5: For $\gamma = 4$, the points that satisfy $\Omega(\gamma)$ are outside of the black curve, the points that satisfy $\Omega_{\rm subset}(\gamma)$ are outside of the green curve, and the points that satisfy $\Omega_{\rm super}(\gamma)$ are outside of the red curve. When integrated over, these regions give exact, lower bound, and upper bound values for $P_F$ and $P_D$.
Problem 2.2.17 (multidimensional LRTs)
Part (1): For the given densities we have that when
$$\Lambda(X_1, X_2) = \frac{\sigma_0}{2\sigma_1}\left[\exp\left(\frac{1}{2}\,\frac{\sigma_1^2-\sigma_0^2}{\sigma_1^2\sigma_0^2}\,X_1^2\right) + \exp\left(\frac{1}{2}\,\frac{\sigma_1^2-\sigma_0^2}{\sigma_1^2\sigma_0^2}\,X_2^2\right)\right] > \eta\,,$$
we decide $H_1$. Solving for the function of $X_1$ and $X_2$ in terms of the parameters of this problem we have
$$\exp\left(\frac{1}{2}\,\frac{\sigma_1^2-\sigma_0^2}{\sigma_0^2\sigma_1^2}\,X_1^2\right) + \exp\left(\frac{1}{2}\,\frac{\sigma_1^2-\sigma_0^2}{\sigma_0^2\sigma_1^2}\,X_2^2\right) > \frac{2\eta\,\sigma_1}{\sigma_0} \equiv \gamma\,. \quad (83)$$
For what follows let's assume that $\sigma_1 > \sigma_0$. This integration region is like a nonlinear ellipse, in that the points that satisfy this inequality are outside of a squashed ellipse. See Figure 5, where we draw the contour (in black) of Equation 83 when $\gamma = 4$. The points outside of this squashed ellipse are the ones that satisfy $\Omega(\gamma)$.
Here $\Omega(\gamma)$ is the region of the $(X_1, X_2)$ plane that satisfies Equation 83. Recalling the identity $X < e^X$ when $X > 0$, the region of the $(X_1, X_2)$ plane where
$$\frac{1}{2}\,\frac{\sigma_1^2-\sigma_0^2}{\sigma_0^2\sigma_1^2}\,X_1^2 + \frac{1}{2}\,\frac{\sigma_1^2-\sigma_0^2}{\sigma_0^2\sigma_1^2}\,X_2^2 > \gamma \quad (84)$$
will be a smaller set than $\Omega(\gamma)$. The last statement reflects the fact that $X_1$ and $X_2$ need to be larger to make their squared sum bigger than $\gamma$: there are points $(X_1, X_2)$ closer to the origin where Equation 83 is true while Equation 84 is not. Thus if we integrate over this new region rather than the original $\Omega(\gamma)$ region we will have a lower bound on $P_F$ and $P_D$.
Thus
$$P_F \geq \int_{\Omega_{\rm subset}(\gamma)} p(X_1, X_2|H_0)\,dX_1\,dX_2$$
$$P_D \geq \int_{\Omega_{\rm subset}(\gamma)} p(X_1, X_2|H_1)\,dX_1\,dX_2\,.$$
Here $\Omega_{\rm subset}(\gamma)$ is the region of the $(X_1, X_2)$ plane that satisfies Equation 84. This region is the exterior of the circle
$$X_1^2 + X_2^2 > \frac{2\sigma_0^2\sigma_1^2\,\gamma}{\sigma_1^2 - \sigma_0^2}\,.$$
See Figure 5, where we draw the contour (in green) of Equation 84 when $\gamma = 4$. The points outside of this circle are the ones that belong to $\Omega_{\rm subset}(\gamma)$ and would be integrated over to obtain the approximations to $P_F$ and $P_D$ for the value of $\gamma = 4$.
One might think that one could then integrate in polar coordinates to evaluate these integrals.
This appears to be true for the lower bound approximation for PF (where we integrate against
p(X1 , X2 |H0 ) which has an analytic form with the polar expression X12 +X22 ) but the integral
over p(X1 , X2 |H1 ) due to its functional form (even in polar) appears more difficult. If anyone
knows how to integrate this analytically please contact me.
To get an upper bound on $P_F$ and $P_D$ we want to construct a region of the $(X_1, X_2)$ plane that is a superset of the points in $\Omega(\gamma)$. We can do this by considering the internal polytope (or box) one gets by taking $X_1 = 0$ and solving for the two points $X_2$ on Equation 83 (and the same thing for $X_2$) and connecting these points by straight lines. For example, when we take $\gamma = 4$ and solve for these four points we get
$$(1.711617, 0)\,, \quad (0, 1.711617)\,, \quad (-1.711617, 0)\,, \quad (0, -1.711617)\,.$$
One can see lines connecting these points drawn in Figure 5. Let the points in the $(X_1, X_2)$ space outside of these lines be denoted $\Omega_{\rm super}(\gamma)$. This then gives the bounds
$$P_F < \int_{\Omega_{\rm super}(\gamma)} p(X_1, X_2|H_0)\,dX_1\,dX_2$$
$$P_D < \int_{\Omega_{\rm super}(\gamma)} p(X_1, X_2|H_1)\,dX_1\,dX_2\,.$$
The R script chap 2 prob 2.2.17.R performs plots needed to produce these figures.
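Under $H_0$ the lower-bound region is the exterior of a circle, and since $X_1^2 + X_2^2$ is exponentially distributed under $H_0$ (a chi-squared with two degrees of freedom, scaled by $\sigma_0^2$), that piece has a closed form which a Monte Carlo sketch can confirm (parameter values here are illustrative):

```python
import math, random

sigma0, sigma1, gamma = 1.0, 2.0, 4.0
c = 2.0 * sigma0**2 * sigma1**2 * gamma / (sigma1**2 - sigma0**2)

# P(X1^2 + X2^2 > c | H0) = exp(-c / (2*sigma0^2)) exactly.
pf_lower_exact = math.exp(-c / (2.0 * sigma0**2))

random.seed(0)
n = 200_000
hits = sum(1 for _ in range(n)
           if random.gauss(0, sigma0)**2 + random.gauss(0, sigma0)**2 > c)
pf_lower_mc = hits / n

assert abs(pf_lower_mc - pf_lower_exact) < 5e-3
```

The analogous integral under $H_1$ has no such simple form, which is consistent with the remark above about its difficulty.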
Part (1): For this problem the likelihood ratio takes the form of a mixture over which component carries the shifted mean,
$$\Lambda(X) = \sum_{k=1}^{M} p_k \exp\left(m X_k - \frac{m^2}{2}\right)\,.$$
If the above ratio is greater than some threshold $\eta$ we decide $H_1$, otherwise we decide $H_0$.
Part (2): If $M = 2$ and $p_1 = p_2 = \frac{1}{2}$ this becomes
$$\frac{1}{2}\, e^{-m^2/2}\left(e^{mX_1} + e^{mX_2}\right) > \eta\,,$$
or
$$e^{mX_1} + e^{mX_2} > 2\eta\, e^{m^2/2}\,. \quad (85)$$
Note that $\eta > 0$. For the rest of the problem we assume that $m > 0$. Based on this expression we will decide $H_1$ when the magnitude of $(X_1, X_2)$ is large and the inequality in Equation 85 is satisfied. For example, when $\eta = 4$ the region of the $(X_1, X_2)$ plane classified as $H_1$ consists of the points to the North and East of the boundary line (in black) in Figure 6. The points classified as $H_0$ are the points to the South-West in Figure 6.
Part (3): The exact expressions for $P_F$ and $P_D$ involve integrating over the region of $(X_1, X_2)$ space defined by Equation 85, but with different integrands. For $P_F$ the integrand is $p(X|H_0)$ and for $P_D$ the integrand is $p(X|H_1)$.
We can find a lower bound for $P_F$ and $P_D$ by noting that for points on the decision boundary we have
$$e^{mX_1} \leq e^{mX_1} + e^{mX_2} = 2\eta\, e^{m^2/2}\,,$$
thus
$$e^{mX_1} < 2\eta\, e^{m^2/2} \quad\text{or}\quad X_1 < \frac{m}{2} + \frac{1}{m}\ln(2\eta)\,.$$
Figure 6: For $\eta = 4$, the points that are classified as $H_1$ are to the North and East past the black boundary curve. Points classified as $H_0$ are in the South-West direction, below the black boundary curve. A lower bound for $P_F$ and $P_D$ can be obtained by integrating to the North and East of the green curve. An upper bound for $P_F$ and $P_D$ can be obtained by integrating to the North and East of the red curve.
The same expression holds for $X_2$. Thus if we integrate instead over the region where $X_1 > \frac{m}{2} + \frac{1}{m}\ln(2\eta)$ and $X_2 > \frac{m}{2} + \frac{1}{m}\ln(2\eta)$ we will get a lower bound for $P_F$ and $P_D$. To get an upper bound we find the point where the line $X_1 = X_2$ intersects the decision boundary. This is given at the location
$$2 e^{mX_1} = 2\eta\, e^{m^2/2} \quad\text{or}\quad X_1 = \frac{m}{2} + \frac{1}{m}\ln(\eta)\,.$$
Using this we can draw an integration region to compute an upper bound for $P_F$ and $P_D$.
We would integrate over the (X1 , X2 ) points to the North-East of the red curve in Figure 6.
The R script chap 2 prob 2.2.18.R performs plots needed to produce these figures.
For this problem the likelihood ratio of the two Gaussian hypotheses can be written in terms of the statistics $l_\alpha \equiv \sum_{i=1}^N R_i$ and $l_\beta \equiv \sum_{i=1}^N R_i^2$ as
$$\Lambda(R) = \left(\frac{\sigma_0}{\sigma_1}\right)^N \exp\left\{\left(\frac{m_1}{\sigma_1^2} - \frac{m_0}{\sigma_0^2}\right) l_\alpha + \frac{1}{2}\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) l_\beta - \frac{N}{2}\left(\frac{m_1^2}{\sigma_1^2} - \frac{m_0^2}{\sigma_0^2}\right)\right\}\,.$$
We decide $H_1$ if the above ratio is greater than our threshold $\eta$. The above can be written
$$\left(\frac{m_1}{\sigma_1^2} - \frac{m_0}{\sigma_0^2}\right) l_\alpha + \frac{1}{2}\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) l_\beta > \ln(\eta) + N\ln\left(\frac{\sigma_1}{\sigma_0}\right) + \frac{N}{2}\left(\frac{m_1^2}{\sigma_1^2} - \frac{m_0^2}{\sigma_0^2}\right)\,. \quad (86)$$
The above is the expression for a line in the $(l_\alpha, l_\beta)$ space. We take the right-hand side of the above expression to be equal to $\gamma$ (a parameter we can change to study different possible detection trade-offs).
Part (2): If $m_0 = \frac{1}{2}m_1 > 0$ and $\sigma_0 = 2\sigma_1$, the above LRT, when written in terms of $m_1$ and $\sigma_1$, becomes: if
$$\frac{7 m_1}{8\sigma_1^2}\, l_\alpha - \frac{3}{8\sigma_1^2}\, l_\beta > \gamma\,,$$
then we decide $H_1$. Note that this decision region is a line in $(l_\alpha, l_\beta)$ space. An example of this decision boundary is drawn in Figure 7 for $m_1 = 1$, $\sigma_1 = \frac{1}{2}$, and $\gamma = 4$.
Figure 7: For $m_1 = 1$, $\sigma_1 = \frac{1}{2}$, and $\gamma = 4$, the points that are classified as $H_1$ are the ones in the South-East direction across the black decision boundary (plotted in the $(l_\alpha, l_\beta)$ plane).
Problem 2.2.20 (specifications of different means and variances)
Part (1): When $m_0 = 0$ and $\sigma_0 = \sigma_1$ Equation 86 becomes
$$\frac{m_1}{\sigma_1^2}\, l_\alpha > \gamma\,.$$
If we assume that $m_1 > 0$ this is equivalent to $l_\alpha > \frac{\gamma\,\sigma_1^2}{m_1} \equiv \gamma^*$, so that
$$P_F = \int_{\gamma^*}^{\infty} p(L|H_0)\,dL \quad\text{and}\quad P_D = \int_{\gamma^*}^{\infty} p(L|H_1)\,dL\,.$$
P
Recall that L in this case is l N
i=1 Ri we can derive the densities p(L|H0 ) and p(L|H1 )
from the densities P
for Ri in each case. Under H0 each Ri is a Gaussian with mean 0 and
2
variance 1 . Thus N
variance N12 . Under H1 each Ri
i=1 Ri is a Gaussian with mean 0 and P
2
is a Gaussian with a mean of m1 and a variance 1 . Thus N
i=1 Ri is a Gaussian with mean
1.0
0.0
0.2
0.4
P_D
0.6
0.8
1.0
0.8
0.6
P_D
0.4
0.2
0.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
P_F
0.2
0.4
0.6
0.8
1.0
P_F
Figure 8: Left: The ROC curve for Problem 2.2.20. Part 1. Right: The ROC curve for
Problem 2.2.20. Part 2.
Nm1 and variance N12 . Thus we have
$$P_F = \int_{\gamma^*}^{\infty} \frac{1}{\sqrt{2\pi N}\,\sigma_1}\, e^{-\frac{L^2}{2N\sigma_1^2}}\,dL = \int_{\gamma^*/(\sqrt{N}\sigma_1)}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{V^2}{2}}\,dV = \mathrm{erfc}_*\!\left(\frac{\gamma^*}{\sqrt{N}\,\sigma_1}\right)$$
$$P_D = \int_{\gamma^*}^{\infty} \frac{1}{\sqrt{2\pi N}\,\sigma_1}\, e^{-\frac{(L-Nm_1)^2}{2N\sigma_1^2}}\,dL = \int_{(\gamma^*-Nm_1)/(\sqrt{N}\sigma_1)}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{V^2}{2}}\,dV = \mathrm{erfc}_*\!\left(\frac{\gamma^* - N m_1}{\sqrt{N}\,\sigma_1}\right)\,.$$
In the R script chap 2 prob 2.2.20.R we plot the given ROC curve for $m_1 = 1$, $\sigma_1 = \frac{1}{2}$, and $N = 3$. When this script is run it produces the plot given in Figure 8 (left).
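One point on this ROC can be computed directly from the two $\mathrm{erfc}_*$ expressions above; a Python sketch (the threshold $\gamma^* = 1$ here is an arbitrary choice):

```python
import math

def q_func(x):
    """erfc_*(x): upper tail of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def roc_point(gamma_star, m1, sigma1, N):
    """(P_F, P_D) for the threshold test l_alpha > gamma_star."""
    scale = math.sqrt(N) * sigma1
    return q_func(gamma_star / scale), q_func((gamma_star - N * m1) / scale)

pf, pd = roc_point(gamma_star=1.0, m1=1.0, sigma1=0.5, N=3)
```

Sweeping `gamma_star` traces out the full curve in Figure 8 (left).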
Part (2): When $m_0 = m_1 = 0$, $\sigma_1^2 = \sigma_s^2 + \sigma_n^2$, and $\sigma_0^2 = \sigma_n^2$, Equation 86 becomes
$$\frac{1}{2}\left(\frac{1}{\sigma_n^2} - \frac{1}{\sigma_s^2+\sigma_n^2}\right) l_\beta > \gamma\,,$$
or simplifying some we get
$$l_\beta > \frac{2\sigma_n^2(\sigma_s^2+\sigma_n^2)}{\sigma_s^2}\,\gamma\,. \quad (87)$$
This is a horizontal line in the $(l_\alpha, l_\beta)$ plane. Let's define the constant on the right-hand side as $\gamma^*$. Points in the $(l_\alpha, l_\beta)$ plane above this constant are classified as $H_1$ and points below this constant are classified as $H_0$. To compute the ROC we have
$$P_F = \int_{\gamma^*}^{\infty} p(L|H_0)\,dL \quad\text{and}\quad P_D = \int_{\gamma^*}^{\infty} p(L|H_1)\,dL\,.$$
Recall that $L$ in this case is $l_\beta = \sum_{i=1}^N R_i^2$. We can derive the densities $p(L|H_0)$ and $p(L|H_1)$ from the densities for $R_i$ under $H_0$ and $H_1$.
Under $H_0$ each $R_i$ is a Gaussian random variable with a mean of $0$ and a variance of $\sigma_n^2$. Thus $\frac{1}{\sigma_n^2}\sum_{i=1}^N R_i^2$ is a chi-squared random variable with $N$ degrees of freedom, and we should write Equation 87 as
$$\frac{l_\beta}{\sigma_n^2} > \frac{2(\sigma_s^2+\sigma_n^2)}{\sigma_s^2}\,\gamma\,,$$
so
$$P_F = \int_{2(\sigma_s^2+\sigma_n^2)\gamma/\sigma_s^2}^{\infty} p_{\chi^2_N}(L)\,dL\,.$$
Here p(L|H1 ) is again the chi-squared probability density with N degrees of freedom.
In the R script chap 2 prob 2.2.20.R we plot the given ROC curve for $\sigma_n = 1$, $\sigma_s = 2$, and $N = 3$. When this script is run it produces the plot given in Figure 8 (right).
For this problem define the normalized squared distance
$$D_i^2 = \frac{(x-x_i)^2}{\sigma^2} + \frac{(y-y_i)^2}{\sigma^2} + \frac{(z-z_i)^2}{\sigma^2}\,, \quad (88)$$
for $i = 0, 1$; to perform our hypothesis test we can then use a chi-squared distribution for the distributions $p(D^2|H_0)$ and $p(D^2|H_1)$.
If we want to use the distance (rather than the squared distance) as our measure of deviation between the impact point and one of the hypothesis points, it turns out that the Euclidean distance between two points is given by a chi distribution (not chi-squared). That is, if $X_i$ are normal random variables with means $\mu_i$ and variances $\sigma_i^2$ then
$$Y = \sqrt{\sum_{i=1}^N \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$$
is given by a chi distribution. The probability density function for a chi distribution looks like
$$f(x; N) = \frac{2^{1-N/2}\, x^{N-1}\, e^{-x^2/2}}{\Gamma\!\left(\frac{N}{2}\right)}\,. \quad (89)$$
If we remove the mean $\mu_i$ from the expression for $Y$ (i.e. do not center the $X_i$) we get the noncentral chi distribution. That is, if $Z$ looks like
$$Z = \sqrt{\sum_{i=1}^N \left(\frac{X_i}{\sigma_i}\right)^2}\,,$$
and we define $\lambda = \sqrt{\sum_{i=1}^N \left(\frac{\mu_i}{\sigma_i}\right)^2}$, then the probability density function for $Z$ is given by
$$f(x; N, \lambda) = \frac{e^{-(x^2+\lambda^2)/2}\, x^N\, \lambda}{(\lambda x)^{N/2}}\, I_{N/2-1}(\lambda x)\,, \quad (90)$$
where $I_\nu(z)$ is the modified Bessel function of the first kind. Since the chi distribution is a bit more complicated than the chi-squared, we will consider the case where we use Equation 88 to measure (squared) distances. For reference, the chi-squared probability density looks like
$$f(x; N) = \frac{x^{N/2-1}\, e^{-x/2}}{2^{N/2}\, \Gamma\!\left(\frac{N}{2}\right)}\,. \quad (91)$$
Using this, the LRT for $H_1$ against $H_0$ looks like
$$\frac{p(R|H_1)}{p(R|H_0)} = \left(\frac{D_1^2}{D_0^2}\right)^{\frac{N}{2}-1} \exp\left(-\frac{1}{2}\left(D_1^2 - D_0^2\right)\right)$$
$$= \left[\frac{(x-x_1)^2 + (y-y_1)^2 + (z-z_1)^2}{(x-x_0)^2 + (y-y_0)^2 + (z-z_0)^2}\right]^{\frac{N}{2}-1} \exp\left(-\frac{1}{2\sigma^2}\left[(x-x_1)^2 - (x-x_0)^2 + (y-y_1)^2 - (y-y_0)^2 + (z-z_1)^2 - (z-z_0)^2\right]\right)\,.$$
Part (2): The time variable is another independent measurement and the density function
for the combined time and space measurement would simply be the product of the spatial
density functions for H1 and H0 discussed above and the Gaussian for time.
The Bayes risk is
$$\mathcal{R} = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} C_{ij} \int_{Z_i} p(H_j|R)\,p(R)\,dR = \sum_{i=0}^{M-1} \int_{Z_i} p(R) \left[\sum_{j=0}^{M-1} C_{ij}\, p(H_j|R)\right] dR\,.$$
Given a sample $R$ the risk is given by evaluating the above over the various $Z_i$'s. We can make this risk as small as possible by picking $Z_i$ such that it only integrates over points $R$ where the integrand above is as small as possible. That means pick $R$ to be from class $H_i$ if
$$p(R) \sum_{j=0}^{M-1} C_{ij}\, p(H_j|R) \leq p(R) \sum_{j=0}^{M-1} C_{i'j}\, p(H_j|R) \quad\text{for all } i' \neq i\,.$$
We can drop $p(R)$ from both sides to get the optimal decision: pick the smallest value (over $i$) of
$$\beta_i = \sum_{j=0}^{M-1} C_{ij}\, p(H_j|R)\,.$$
This is the definition of $\beta_i$ and is what we minimize to find the optimal decision rule.
Part (2): When the costs are as given we see that
$$\beta_i = \sum_{j=0,\, j\neq i}^{M-1} p(H_j|R)\,,$$
so that minimizing $\beta_i$ is the same as maximizing $p(H_i|R)$. The resulting error probability involves integrals of the conditional densities over the decision intervals with endpoints at $\pm\frac{m}{2}$ and $\pm\frac{3m}{2}$, for example terms of the form
$$\int_{m/2}^{3m/2} p(R|H_2)\,dR \quad\text{and}\quad \int_{3m/2}^{\infty} p(R|H_5)\,dR\,.$$
These integrals could be evaluated by converting to expressions involving the error function.
Figure 9: The three densities for Problem 2.3.4 for $m = 2.5$. Here $p(R|H_1)$ is plotted in red, $p(R|H_2)$ is plotted in black, and $p(R|H_3)$ is plotted in green. The hypothesis, $H_i$, that is chosen for any value of $R$ corresponds to the index of the largest likelihood $p(R|H_i)$ at the given point $R$. The decision regions are drawn in the figure as light gray lines.
From the plot it looks like the classification, as $R$ increases, is to pick green or $H_3$, then red or $H_1$, then black or $H_2$, and finally green or $H_3$ again. Each point where the decision changes is found by solving an equation of the form $p(R|H_i) = p(R|H_j)$. We will compute these decision boundaries here. For the first decision boundary we need to find $R$ such that $p(R|H_3) = p(R|H_1)$, or
$$\frac{1}{\sqrt{2}} \exp\left(-\frac{R^2}{2(2m^2)}\right) = \exp\left(-\frac{R^2}{2m^2}\right)\,.$$
When we take the logarithm and solve for $R$ we find $R = \pm m\sqrt{2\ln(2)} = \pm 2.943525$ when we consider $m = 2.5$; the negative root is the one separating the left green region from the red one. Let's denote this numeric value as $m_{13}$. For the second decision boundary we need to find $R$ such that $p(R|H_1) = p(R|H_2)$, or
$$\frac{R^2}{2m^2} = \frac{(R-m)^2}{2m^2} \quad\text{or}\quad R = \frac{m}{2}\,.$$
We will denote this boundary as $m_{12}$. Finally, for the third decision boundary we need to find $R$ such that $p(R|H_2) = p(R|H_3)$, or
$$\exp\left(-\frac{(R-m)^2}{2m^2}\right) = \frac{1}{\sqrt{2}} \exp\left(-\frac{R^2}{2(2m^2)}\right)\,.$$
This can be written as $R^2 - 4mR + 2m^2(1 - \ln 2) = 0$, or solving for $R$ we get
$$R = m\left(2 \pm \sqrt{2 + 2\ln(2)}\right) = \{0.399, 9.601\}\,,$$
when $m = 2.5$. We would take the larger value (and denote it $m_{23}$) since we are looking for a decision boundary to the right of $m_{12}$.
Part (3): To evaluate this probability we will compute
$$\Pr(\epsilon) = 1 - \Pr(\text{correct})\,,$$
with $\Pr(\text{correct})$ given by
$$3\Pr(\text{correct}) = \int_{-\infty}^{m_{13}} p(R|H_3)\,dR + \int_{m_{13}}^{m_{12}} p(R|H_1)\,dR + \int_{m_{12}}^{m_{23}} p(R|H_2)\,dR + \int_{m_{23}}^{\infty} p(R|H_3)\,dR\,.$$
The factor of three in the above expression accounts for the fact that all three hypotheses are equally likely, i.e. $P(H_i) = \frac{1}{3}$ for $i = 1, 2, 3$.
Some of this algebra is done in the R code chap 2 prob 2.3.4.R.
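The boundaries and the error probability can be evaluated numerically; a sketch assuming the densities as read from the problem ($H_1 \sim N(0, m^2)$, $H_2 \sim N(m, m^2)$, $H_3 \sim N(0, 2m^2)$):

```python
import math

m = 2.5
m13 = -m * math.sqrt(2.0 * math.log(2.0))                 # p(R|H3) = p(R|H1)
m12 = m / 2.0                                             # p(R|H1) = p(R|H2)
m23 = m * (2.0 + math.sqrt(2.0 + 2.0 * math.log(2.0)))    # p(R|H2) = p(R|H3)

def phi(x):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_correct = (phi(m13 / (math.sqrt(2.0) * m))               # H3, left tail
             + (phi(m12 / m) - phi(m13 / m))               # H1 region
             + (phi((m23 - m) / m) - phi((m12 - m) / m))   # H2 region
             + (1.0 - phi(m23 / (math.sqrt(2.0) * m)))) / 3.0
p_error = 1.0 - p_correct
```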
For this problem the two likelihood ratios reduce to functions of the statistics $l_1 = R_1^2$ and $l_2 = R_2^2$:
$$\Lambda_1(R) = \frac{\sigma_n}{\sqrt{\sigma_s^2+\sigma_n^2}}\, \exp\left(\frac{\sigma_s^2}{2\sigma_n^2(\sigma_s^2+\sigma_n^2)}\, l_1\right)$$
$$\Lambda_2(R) = \frac{\sigma_n}{\sqrt{\sigma_s^2+\sigma_n^2}}\, \exp\left(\frac{\sigma_s^2}{2\sigma_n^2(\sigma_s^2+\sigma_n^2)}\, l_2\right)\,.$$
The class priors are specified as $P_0 = 1-2p$ and $P_1 = P_2 = p$, so using Equations 17, 18, and 19 we have that the decision region is based on
if $p(1-0)\Lambda_1(R) > (1-2p)(1-0) + p(\gamma-1)\Lambda_2(R)$ then $H_1$ or $H_2$ else $H_0$ or $H_2$
if $p(1-0)\Lambda_2(R) > (1-2p)(1-0) + p(\gamma-1)\Lambda_1(R)$ then $H_2$ or $H_1$ else $H_0$ or $H_1$
if $p(\gamma-0)\Lambda_2(R) > (1-2p)(1-1) + p(\gamma-1)\Lambda_1(R)$ then $H_2$ or $H_0$ else $H_1$ or $H_0$.
Simplifying these some we get
if $\Lambda_1(R) > \frac{1}{p} - 2 + (\gamma-1)\Lambda_2(R)$ then $H_1$ or $H_2$ else $H_0$ or $H_2$
if $\Lambda_2(R) > \frac{1}{p} - 2 + (\gamma-1)\Lambda_1(R)$ then $H_2$ or $H_1$ else $H_0$ or $H_1$
if $\Lambda_2(R) > \Lambda_1(R)$ then $H_2$ or $H_0$ else $H_1$ or $H_0$.
To find the decision regions in the $l_1$-$l_2$ plane we can first turn the inequalities above into equalities to find the decision boundaries. One decision boundary is easy: when we put in what we know for $\Lambda_1(R)$ and $\Lambda_2(R)$, the last equation becomes $l_2 = l_1$, which is a diagonal line in the $l_1$-$l_2$ space. To determine the full decision boundaries let's take values for $p$ and $\gamma$, say $p = 0.25$ and $\gamma = 0.75$, and plot the first two decision boundaries above. This is done in Figure 10. Note that the first and second equations are symmetric (each gives the other) if we switch $l_1$ and $l_2$; thus the decision boundary in the $l_1$-$l_2$ space expressed by these two expressions is the same. Based on the inequalities in the above expressions we would get the classification regions given in Figure 10. Some of this algebra and our plots are performed in the R code chap 2 prob 2.3.5.R.
Figure 10: The decision boundaries for Problem 2.3.5 (in the $l_1$-$l_2$ plane) with the final classification label denoted in each region.
Problem 2.3.6 (a discrete M class decision problem)
Part (1): We are given
$$\Pr(r = n|H_m) = \frac{(km)^n e^{-km}}{n!}\,,$$
for $m = 1, 2, 3, \dots, M$ and $k$ a fixed positive constant. Thus each hypothesis involves a progressively larger mean number of observed events: the mean number of events under the hypotheses $H_1, H_2, H_3, H_4, \dots$ are $k, 2k, 3k, 4k, \dots$. Since each hypothesis is equally likely and the coefficients of the errors are the same, the optimal decision criterion corresponds to picking the hypothesis with the maximum likelihood, i.e. we pick $H_m$ if
$$\Pr(r = n|H_m) = \frac{(km)^n e^{-km}}{n!} \geq \frac{(km')^n e^{-km'}}{n!} = \Pr(r = n|H_{m'})\,, \quad (92)$$
for all $m' \neq m$. For the value of $k = 3$ we plot $\Pr(r = n|H_m)$ as a function of $n$ for several values of $m$ in Figure 11. In that plot you can see that for progressively larger values of $r$ we would select larger hypothesis indices $H_m$: as $r$ increases we would decide on the hypotheses $H_1, H_2, H_3, \dots$ in turn. The selected hypothesis changes where two sequential likelihood functions have equal values. If we take the inequality in Equation 92 as an equality, the boundaries of the decision region between two hypotheses $m$ and $m'$ are where (canceling the factor $n!$)
$$(km)^n\, e^{-km} = (km')^n\, e^{-km'}\,.$$
Figure 11: Plots of the likelihoods $\Pr(r = n|H_m)$ as a function of $r$ in Problem 2.3.6 for the hypotheses $H_1$ through $H_5$. The optimal decision (given a measurement $r$) is to take the hypothesis that has the largest likelihood.
Solving for $n$ (the decision boundary between $H_m$ and $H_{m'}$) in the above we get
$$n = \frac{k(m' - m)}{\ln\left(\frac{m'}{m}\right)}\,.$$
Since the change in hypothesis happens between sequential $H_m$ and $H_{m'}$ we have $m' = m + 1$ and the above simplifies to
$$n_{m,m+1} = \frac{k}{\ln\left(1 + \frac{1}{m}\right)}\,.$$
We can evaluate this expression for $m = 1$ to find the location where we switch between $H_1$ and $H_2$, for $m = 2$ to find the location where we switch between $H_2$ and $H_3$, etc. For $m = 1, 2, 3, 4, 5$ (here $M = 5$) we find
[1]  4.328085  7.398910 10.428178 13.444260 16.454445
These numbers match very well the cross over points in Figure 11. Since the number of counts
received must be a natural number we would round the numbers above to the following
$$H_1: 1 \leq r \leq 4$$
$$H_2: 5 \leq r \leq 7$$
$$H_3: 8 \leq r \leq 10$$
$$\vdots$$
$$H_m: \left\lfloor \frac{k}{\ln\left(1+\frac{1}{m-1}\right)} \right\rfloor + 1 \leq r \leq \left\lfloor \frac{k}{\ln\left(1+\frac{1}{m}\right)} \right\rfloor$$
$$\vdots$$
$$H_M: \left\lceil \frac{k}{\ln\left(1+\frac{1}{M-1}\right)} \right\rceil \leq r\,. \quad (93)$$
The probability of a correct decision is then
$$\Pr(\text{correct}) = \frac{1}{M}\sum_{m=1}^{M} \sum_{n \in Z_m} \Pr(r = n|H_m) = \frac{1}{M}\left[\sum_{n=1}^{4} \Pr(r = n|H_1) + \sum_{n=5}^{7} \Pr(r = n|H_2) + \sum_{n=8}^{10} \Pr(r = n|H_3) + \cdots + \sum_{n=n_l}^{n_u} \Pr(r = n|H_m) + \cdots\right]\,,$$
where the lower and upper summation endpoints nl and nu are based on the decision boundary computed as given in Equation 93.
with $l_1$ defined as
$$l_1 = \sum_{i=1}^{3} c_i r_i = \frac{1}{\sigma^2}\sum_{i=1}^{3} (m_{1i} - m_{0i})\, r_i \quad\text{so}\quad c_i = \frac{1}{\sigma^2}(m_{1i} - m_{0i})\,.$$
In the same way
$$\Lambda_2(R) = \frac{p(R|H_2)}{p(R|H_0)} = \exp\left(-\frac{1}{2\sigma^2}\left(m_2^T m_2 - m_0^T m_0\right)\right)\exp\left(\frac{1}{\sigma^2}\, r^T (m_2 - m_0)\right) = \exp\left(-\frac{1}{2\sigma^2}\left(m_2^T m_2 - m_0^T m_0\right)\right) e^{l_2}\,,$$
with $l_2$ defined as
$$l_2 = \sum_{i=1}^{3} d_i r_i = \frac{1}{\sigma^2}\sum_{i=1}^{3} (m_{2i} - m_{0i})\, r_i \quad\text{so}\quad d_i = \frac{1}{\sigma^2}(m_{2i} - m_{0i})\,.$$
Part (2): For the given cost assignment, when we use Equations 17, 18, and 19 (and cancel out the common cost) we have
$$P_1 \Lambda_1(R) > P_0 - P_2 \Lambda_2(R) \;\text{ then } H_1 \text{ or } H_2 \text{ else } H_0 \text{ or } H_2$$
$$P_2 \Lambda_2(R) > P_0 \;\text{ then } H_2 \text{ or } H_1 \text{ else } H_0 \text{ or } H_1$$
$$P_2 \Lambda_2(R) > P_0 + P_1 \Lambda_1(R) \;\text{ then } H_2 \text{ or } H_0 \text{ else } H_1 \text{ or } H_0\,.$$
To make the remaining problem simpler, we will specify values for $P_0 = P_1 = P_2 = \frac{1}{3}$ and values for $m_0$, $m_1$, $m_2$, and $\sigma$ so that every expression in the decision boundary can be evaluated numerically. This is done in the R script chap 2 prob 2.3.6.R and the result plotted in Figure 12.
$$p(A|R) = \frac{p(R|A)\,p(A)}{p(R)}\,,$$
from which we see that we need to compute $p(R|A)$ and $p(A)$. Now when the variable $a$ is given, the expression we are observing, $r = ab + n$, is the sum of two independent random variables $ab$ and $n$. These two terms are independent normal random variables with zero means and variances $A^2\sigma_b^2$ and $\sigma_n^2$ respectively. The variance of $r$ is then the sum of the variances of $ab$ and $n$. With these arguments we have that the densities we need in the above are given by
$$p(R|A) = \frac{1}{\sqrt{2\pi}\,(A^2\sigma_b^2 + \sigma_n^2)^{1/2}}\, \exp\left(-\frac{R^2}{2(A^2\sigma_b^2 + \sigma_n^2)}\right)$$
$$p(A) = \frac{1}{\sqrt{2\pi}\,\sigma_a}\, \exp\left(-\frac{A^2}{2\sigma_a^2}\right)\,.$$
Figure 12: Plots of the decision regions for Problem 2.3.6 (in the $l_1$-$l_2$ plane). The regions into which we would classify each set are labeled.
Note that to calculate our estimate $\hat{a}_{\rm map}(R)$ we don't need to explicitly calculate $p(R)$, since it is not a function of $A$. From the above densities
$$\ln(p(R|A)) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(A^2\sigma_b^2 + \sigma_n^2) - \frac{R^2}{2(A^2\sigma_b^2 + \sigma_n^2)}$$
$$\ln(p(A)) = -\frac{1}{2}\ln(2\pi) - \ln(\sigma_a) - \frac{A^2}{2\sigma_a^2}\,.$$
Setting the $A$ derivative of the sum of these expressions to zero, and writing $v \equiv A^2\sigma_b^2 + \sigma_n^2$, leads to a quadratic in $v$ with the solutions
$$v = \frac{-\sigma_a^2\sigma_b^2 \pm \sqrt{\sigma_a^4\sigma_b^4 + 4\sigma_a^2\sigma_b^2 R^2}}{2} = \frac{\sigma_a^2\sigma_b^2}{2}\left(\pm\sqrt{1 + \frac{4R^2}{\sigma_a^2\sigma_b^2}} - 1\right)\,.$$
From the definition of $v$ we expect $v > 0$, thus we must take the expression with the positive sign. Now that we know $v$ we can solve $A^2\sigma_b^2 + \sigma_n^2 = v$ for $A$, from which we would find two additional solutions for $\hat{a}_{\rm map}(R)$.
Part (2): If we desire to simultaneously find $\hat{a}_{\rm map}$ and $\hat{b}_{\rm map}$ then we must treat our two unknowns $a$ and $b$ as a vector $(a, b)$ and find them both at the same time. Thus our unknown is now the pair $(A, B)$ and we form the log-likelihood $l$ given by
$$l = \ln(p(R|A, B)) + \ln(p(A)) + \ln(p(B)) - \ln(p(R))$$
$$= \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma_n} e^{-\frac{(R-AB)^2}{2\sigma_n^2}}\right) + \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma_a} e^{-\frac{A^2}{2\sigma_a^2}}\right) + \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma_b} e^{-\frac{B^2}{2\sigma_b^2}}\right) - \ln(p(R))\,.$$
We want to maximize $l$ with respect to $A$ and $B$, so we take derivatives of $l$ with respect to $A$ and $B$, set the results equal to zero, and then solve. The $A$ and $B$ derivatives of $l$ are given by
$$\frac{(R - AB)B}{\sigma_n^2} - \frac{A}{\sigma_a^2} = 0$$
$$\frac{(R - AB)A}{\sigma_n^2} - \frac{B}{\sigma_b^2} = 0\,.$$
One solution to these equations is $A = B = 0$; if $A$ and $B$ are both non-zero then the above two equations become
$$R - AB = \frac{\sigma_n^2\, A}{\sigma_a^2\, B} \quad\text{and}\quad R - AB = \frac{\sigma_n^2\, B}{\sigma_b^2\, A}\,,$$
Next, when $r = A + \sum_{i=1}^k b_i + n$, we have that
$$p(R|A) = \frac{1}{\sqrt{2\pi}\,(k\sigma_b^2 + \sigma_n^2)^{1/2}}\, \exp\left(-\frac{(R - A)^2}{2(k\sigma_b^2 + \sigma_n^2)}\right)\,.$$
With $l \equiv \ln(p(R|A)) + \ln(p(A)) - \ln(p(R))$, taking the $A$ derivative of $l$ and setting it equal to zero gives
$$\frac{R - A}{k\sigma_b^2 + \sigma_n^2} - \frac{A}{\sigma_a^2} = 0\,.$$
If we solve the above for $A$ we find
$$A = \frac{\sigma_a^2\, R}{\sigma_a^2 + k\sigma_b^2 + \sigma_n^2}\,. \quad (94)$$
Now suppose instead that each $B_i$ is retained as an unknown, with the priors
$$p(A) = \frac{1}{\sqrt{2\pi}\,\sigma_a}\, e^{-\frac{A^2}{2\sigma_a^2}} \quad\text{and}\quad p(B_i) = \frac{1}{\sqrt{2\pi}\,\sigma_b}\, e^{-\frac{B_i^2}{2\sigma_b^2}}\,.$$
Then computing the log-likelihood $l \equiv \ln(p(R|A, B)) + \ln(p(A, B)) - \ln(p(R))$ we find
$$l = -\frac{\left(R - A - \sum_{i=1}^k B_i\right)^2}{2\sigma_n^2} - \frac{A^2}{2\sigma_a^2} - \frac{1}{2\sigma_b^2}\sum_{i=1}^k B_i^2 + \text{constants}\,.$$
If we take the derivative of $l$ with respect to $A$ and set it equal to zero we get
$$\frac{1}{\sigma_n^2}\left(R - A - \sum_{i=1}^k B_i\right) - \frac{A}{\sigma_a^2} = 0\,.$$
If we take the derivative of $l$ with respect to $B_i$ and set it equal to zero we get
$$\frac{1}{\sigma_n^2}\left(R - A - \sum_{i'=1}^k B_{i'}\right) - \frac{B_i}{\sigma_b^2} = 0\,,$$
for $i = 1, 2, \dots, k-1, k$. As a system of equations this is
$$\begin{bmatrix} \frac{1}{\sigma_n^2} + \frac{1}{\sigma_a^2} & \frac{1}{\sigma_n^2} & \frac{1}{\sigma_n^2} & \cdots & \frac{1}{\sigma_n^2} \\ \frac{1}{\sigma_n^2} & \frac{1}{\sigma_n^2} + \frac{1}{\sigma_b^2} & \frac{1}{\sigma_n^2} & \cdots & \frac{1}{\sigma_n^2} \\ \frac{1}{\sigma_n^2} & \frac{1}{\sigma_n^2} & \frac{1}{\sigma_n^2} + \frac{1}{\sigma_b^2} & \cdots & \frac{1}{\sigma_n^2} \\ \vdots & & & \ddots & \vdots \\ \frac{1}{\sigma_n^2} & \frac{1}{\sigma_n^2} & \frac{1}{\sigma_n^2} & \cdots & \frac{1}{\sigma_n^2} + \frac{1}{\sigma_b^2} \end{bmatrix} \begin{bmatrix} A \\ B_1 \\ B_2 \\ \vdots \\ B_k \end{bmatrix} = \frac{R}{\sigma_n^2} \begin{bmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\,.$$
This system must have a solution that depends on k (the number of bi terms). While it
might be hard to solve the full system above analytically we can solve smaller systems, look
for a pattern and hope it generalizes. If anyone knows how to solve this system analytically
as is please contact me. When k = 1 we have the system
$$\begin{bmatrix} \frac{1}{\sigma_n^2} + \frac{1}{\sigma_a^2} & \frac{1}{\sigma_n^2} \\ \frac{1}{\sigma_n^2} & \frac{1}{\sigma_n^2} + \frac{1}{\sigma_b^2} \end{bmatrix} \begin{bmatrix} A \\ B_1 \end{bmatrix} = \frac{R}{\sigma_n^2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}\,.$$
Solving for $A$ and $B_1$ we get
$$A = \frac{\kappa_n \kappa_b\, R}{\kappa_a\kappa_b + \kappa_b\kappa_n + \kappa_a\kappa_n} \quad\text{and}\quad B_1 = \frac{\kappa_n \kappa_a\, R}{\kappa_a\kappa_b + \kappa_b\kappa_n + \kappa_a\kappa_n}\,.$$
Here we have introduced the precisions, defined as one over the variance: $\kappa_a \equiv \frac{1}{\sigma_a^2}$, $\kappa_b \equiv \frac{1}{\sigma_b^2}$, and $\kappa_n \equiv \frac{1}{\sigma_n^2}$. If we do the same thing when $k = 2$ we find solutions for $A$, $B_1$, and $B_2$ given by
$$A = \frac{\kappa_n \kappa_b\, R}{\kappa_a\kappa_b + 2\kappa_a\kappa_n + \kappa_b\kappa_n} \quad\text{and}\quad B_1 = B_2 = \frac{\kappa_n \kappa_a\, R}{\kappa_a\kappa_b + 2\kappa_a\kappa_n + \kappa_b\kappa_n}\,.$$
When $k = 3$ we find
$$A = \frac{\kappa_n \kappa_b\, R}{\kappa_a\kappa_b + 3\kappa_a\kappa_n + \kappa_b\kappa_n} \quad\text{and}\quad B_1 = B_2 = B_3 = \frac{\kappa_n \kappa_a\, R}{\kappa_a\kappa_b + 3\kappa_a\kappa_n + \kappa_b\kappa_n}\,,$$
and in general
$$A = \frac{\kappa_n \kappa_b\, R}{\kappa_a\kappa_b + k\kappa_a\kappa_n + \kappa_b\kappa_n} \quad\text{and}\quad B_1 = B_2 = \cdots = B_k = \frac{\kappa_n \kappa_a\, R}{\kappa_a\kappa_b + k\kappa_a\kappa_n + \kappa_b\kappa_n}\,.$$
See the Mathematica file chap 2 prob 2 4 1.nb where we solve the above linear system for several values of $k$. If we take the above expression for $A$ (assuming $k$ terms $B_i$ in the summation) and multiply the top and bottom of the fraction by $\sigma_n^2\sigma_a^2\sigma_b^2$ we get
$$A = \frac{\sigma_a^2\, R}{\sigma_n^2 + k\sigma_b^2 + \sigma_a^2}\,,$$
in agreement with Equation 94.
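The closed-form pattern can be checked by substituting it back into the stationarity equations; a quick sketch with arbitrary parameter values and $k = 3$:

```python
# Check the closed-form solution against the stationarity equations.
sigma_a, sigma_b, sigma_n, R, k = 1.0, 2.0, 0.5, 3.0, 3

ka, kb, kn = 1/sigma_a**2, 1/sigma_b**2, 1/sigma_n**2
den = ka*kb + k*ka*kn + kb*kn
A = kn*kb*R / den
B = kn*ka*R / den   # all B_i are equal by symmetry

resid_A = (R - A - k*B)/sigma_n**2 - A/sigma_a**2
resid_B = (R - A - k*B)/sigma_n**2 - B/sigma_b**2
assert abs(resid_A) < 1e-12 and abs(resid_B) < 1e-12
```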
For this problem the a priori density for $\lambda$ is a gamma distribution,
$$p(\lambda) = \frac{l^{n'}}{\Gamma(n')}\, \lambda^{n'-1}\, e^{-l\lambda} \quad\text{for } \lambda \geq 0\,, \quad (95)$$
and is zero for $\lambda < 0$. This density has an expectation and variance [1] given by
$$E(\lambda) = \frac{n'}{l} \quad (96)$$
$$\mathrm{Var}(\lambda) = \frac{n'}{l^2}\,. \quad (97)$$
Part (2): After the first measurement is made we need to compute the a posteriori distribution of $\lambda$ using Bayes' rule
$$p(\lambda|X) = \frac{p(X|\lambda)\,p(\lambda)}{p(X)}\,.$$
For the densities given we have
$$p(\lambda|X) = \frac{1}{p(X)}\, \lambda e^{-\lambda X}\, \frac{l^{n'}}{\Gamma(n')}\, \lambda^{n'-1}\, e^{-l\lambda} = \frac{l^{n'}}{p(X)\,\Gamma(n')}\, e^{-(l+X)\lambda}\, \lambda^{n'}\,.$$
In the above expression the factor $p(X)$ just contributes to the normalization constant, in that $p(\lambda|X)$ when integrated over valid values of $\lambda$ should evaluate to one. Now notice that the functional form for $p(\lambda|X)$ derived above is of the same form as a gamma distribution, just like the a priori distribution $p(\lambda)$ was. That is (with the proper normalization) we see that $p(\lambda|X)$ must be
$$p(\lambda|X) = \frac{(l+X)^{n'+1}}{\Gamma(n'+1)}\, e^{-(l+X)\lambda}\, \lambda^{n'}\,.$$
Then $\hat{\lambda}_{\rm ms}$ is just the mean of $\lambda$ with the above density. From Equations 96 and 97 we see that for the a posteriori density that we have here
$$\hat{\lambda}_{\rm ms} = \frac{n'+1}{l+X} \quad\text{and}\quad E[(\lambda - \hat{\lambda}_{\rm ms})^2] = \frac{n'+1}{(l+X)^2}\,.$$
Part (3): When we now have several measurements $X_i$ the calculation of the a posteriori density is as follows:
$$p(\lambda|X) = \frac{p(X|\lambda)\,p(\lambda)}{p(X)} = \frac{1}{p(X)}\left(\prod_{i=1}^{n} \lambda e^{-\lambda X_i}\right) \frac{l^{n'}}{\Gamma(n')}\, \lambda^{n'-1}\, e^{-l\lambda} = \frac{l^{n'}}{p(X)\,\Gamma(n')}\, \exp\left\{-\lambda\left(l + \sum_{i=1}^{n} X_i\right)\right\} \lambda^{n+n'-1}\,.$$
Again we recognize this as a gamma distribution with different parameters, so the normalization constant can be determined and the full a posteriori density is
$$p(\lambda|X) = \frac{\left(l + \sum_{i=1}^n X_i\right)^{n+n'}}{\Gamma(n+n')}\, \exp\left\{-\lambda\left(l + \sum_{i=1}^{n} X_i\right)\right\} \lambda^{n+n'-1}\,.$$
Again using Equations 96 and 97, our estimate of $\lambda$ and its variance are now given by
$$\hat{\lambda}_{\rm ms} = \frac{n + n'}{l + \sum_{i=1}^n X_i} \quad\text{and}\quad E[(\lambda - \hat{\lambda}_{\rm ms})^2] = \frac{n + n'}{\left(l + \sum_{i=1}^n X_i\right)^2}\,.$$
For the MAP estimate we need the derivative
$$\frac{d}{d\lambda}\ln(p(\lambda|X)) = \frac{d}{d\lambda}\left[-\lambda\left(l + \sum_{i=1}^n X_i\right) + (n + n' - 1)\ln(\lambda)\right] = -\left(l + \sum_{i=1}^n X_i\right) + \frac{n + n' - 1}{\lambda}\,.$$
Setting this expression equal to zero and solving for $\lambda$ then gives
$$\hat{\lambda}_{\rm map} = \frac{n + n' - 1}{l + \sum_{i=1}^n X_i}\,.$$
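The conjugate update and the two estimators can be sanity-checked numerically: the posterior is Gamma(shape $n'+n$, rate $l+\sum X_i$), the MS estimate is its mean, and the MAP estimate its mode (parameter values below are arbitrary):

```python
import math

def posterior_params(n_prime, l, xs):
    """Gamma (shape, rate) posterior after observing exponential samples xs."""
    return n_prime + len(xs), l + sum(xs)

shape, rate = posterior_params(n_prime=2.0, l=1.0, xs=[0.5, 1.5])
lam_ms = shape / rate            # posterior mean
lam_map = (shape - 1.0) / rate   # posterior mode

def log_post(lam):  # unnormalized log posterior density
    return (shape - 1.0) * math.log(lam) - rate * lam

# The mode really does maximize the posterior density.
assert log_post(lam_map) > log_post(lam_map + 0.01)
assert log_post(lam_map) > log_post(lam_map - 0.01)
```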
For this problem the posterior for $A$ given one measurement $R$ is
$$p(A|R) \propto \exp\left(-\frac{(R-A)^2}{2\sigma_n^2}\right)\exp\left(-\frac{k_0^2 (A-m_0)^2}{2\sigma_n^2}\right) = \exp\left(-\frac{1}{2\sigma_n^2}\left[(R-A)^2 + k_0^2(A-m_0)^2\right]\right)$$
$$= \exp\left(-\frac{1}{2\sigma_n^2}\left[(1+k_0^2)A^2 - 2(R + k_0^2 m_0)A + R^2 + k_0^2 m_0^2\right]\right)$$
$$= \alpha\, \exp\left(-\frac{1}{2\left(\frac{\sigma_n^2}{1+k_0^2}\right)}\left[A^2 - 2\left(\frac{R + k_0^2 m_0}{1+k_0^2}\right)A\right]\right) = \beta\, \exp\left(-\frac{1}{2\left(\frac{\sigma_n^2}{1+k_0^2}\right)}\left(A - \frac{R + k_0^2 m_0}{1+k_0^2}\right)^2\right)\,.$$
Here $\alpha$ and $\beta$ are constants that don't depend on $A$. Notice that this expression is a normal density with a mean and variance given by
$$m_1 = \frac{R + k_0^2 m_0}{1 + k_0^2} \quad\text{and}\quad \sigma_1^2 = \frac{\sigma_n^2}{1 + k_0^2}\,.$$
With $N$ measurements the likelihood is
$$p(R|A) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_n}\, \exp\left(-\frac{(R_i - A)^2}{2\sigma_n^2}\right)\,,$$
so
$$p(A|R) \propto \exp\left(-\frac{1}{2\sigma_n^2}\sum_{i=1}^{N}(R_i - A)^2\right)\exp\left(-\frac{(A-m_0)^2}{2\left(\frac{\sigma_n^2}{k_0^2}\right)}\right)$$
$$\propto \exp\left(-\frac{1}{2\sigma_n^2}\left[(N + k_0^2)A^2 - 2\left(\sum_{i=1}^{N} R_i + m_0 k_0^2\right)A\right]\right)$$
$$\propto \exp\left(-\frac{N + k_0^2}{2\sigma_n^2}\left(A - \frac{\sum_{i=1}^{N} R_i + m_0 k_0^2}{N + k_0^2}\right)^2\right)\,.$$
When we put in the correct normalization we see that the above is a normal density with mean and variance given by
$$m_N = \frac{m_0 k_0^2 + N\left(\frac{1}{N}\sum_{i=1}^{N} R_i\right)}{k_0^2 + N} \quad\text{and}\quad \sigma^2 = \frac{\sigma_n^2}{N + k_0^2}\,.$$
Problem 2.4.4 (more conjugate priors)
Part (1): For the densities given we find that
$$p(A|R) = \frac{p(R|A)\,p(A)}{p(R)} = \frac{1}{p(R)}\left[\prod_{i=1}^{N} \frac{A^{1/2}}{(2\pi)^{1/2}}\, \exp\left(-\frac{A}{2}(R_i - m)^2\right)\right] c\, A^{\frac{k_1}{2}-1}\, \exp\left(-\frac{1}{2}A k_1 k_2\right)$$
$$= \frac{c}{p(R)\,(2\pi)^{N/2}}\, A^{\frac{N+k_1}{2}-1}\, \exp\left(-\frac{A}{2}\sum_{i=1}^{N}(R_i - m)^2 - \frac{1}{2}A k_1 k_2\right)$$
$$= \frac{c}{p(R)\,(2\pi)^{N/2}}\, A^{\frac{N+k_1}{2}-1}\, \exp\left(-\frac{1}{2}A\left(k_1 k_2 + \sum_{i=1}^{N}(R_i - m)^2\right)\right)\,, \quad (98)$$
where $c$ is the prior's normalization constant.
This will have the same form as the a priori distribution for $A$, with parameters $k_1'$ and $k_2'$, if
$$\frac{k_1'}{2} - 1 = \frac{k_1 + N}{2} - 1 \quad\text{or}\quad k_1' = k_1 + N\,,$$
and
$$k_1' k_2' = k_1 k_2 + \sum_{i=1}^{N}(R_i - m)^2\,.$$
Solving for $k_2'$ we get
$$k_2' = \frac{1}{k_1 + N}\left(k_1 k_2 + N\left[\frac{1}{N}\sum_{i=1}^{N}(R_i - m)^2\right]\right)\,.$$
Writing the gamma density in the form
$$f(A) = \frac{\lambda^r}{\Gamma(r)}\, A^{r-1}\, e^{-\lambda A}\,, \quad (99)$$
we see that the a posteriori distribution, computed above, is also a gamma distribution, here with $\lambda = \frac{1}{2}k_1 k_2$ and $r = \frac{k_1}{2}$ replaced by their primed versions. Using the known expression for the mean of a gamma distribution in the form of Equation 99 we have
$$E[A] = \frac{r}{\lambda} = \frac{\frac{k_1}{2}}{\frac{1}{2}k_1 k_2} = \frac{1}{k_2}\,.$$
For the a posteriori density, then, with $w \equiv \frac{1}{N}\sum_{i=1}^N (R_i - m)^2$,
$$E[A|R] = \frac{1}{k_2'} = \frac{k_1 + N}{k_1 k_2 + N w}\,.$$
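This conjugate update is easy to exercise numerically; a sketch with arbitrary inputs, returning the posterior parameters $(k_1', k_2')$:

```python
def precision_update(k1, k2, residuals):
    """Gamma-conjugate update for the Gaussian precision A.

    residuals are the values R_i - m; returns (k1', k2')."""
    N = len(residuals)
    ss = sum(r * r for r in residuals)   # sum of (R_i - m)^2
    k1_new = k1 + N
    k2_new = (k1 * k2 + ss) / k1_new
    return k1_new, k2_new

k1p, k2p = precision_update(2.0, 1.0, [0.5, -0.5, 1.0])
# posterior mean of A is 1/k2'; the prior mean was 1/k2
```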
Combining the results above, after $K$ measurements the a posteriori density for $a$ is normal with variance
$$\sigma^2 = \frac{\sigma_n^2}{K + k_0^2} \quad\text{with}\quad l \equiv \frac{1}{K}\sum_{i=1}^{K} R_i\,. \quad (100)$$
To make the a priori model for this problem match the one from Problem 2.4.3 we need to take $m_0 = 0$ and $\frac{\sigma_n}{k_0} = \sigma_a$, or $k_0 = \frac{\sigma_n}{\sigma_a}$. Thus the mean $m_K$ of the a posteriori distribution for $a$ becomes (after $K$ measurements)
$$m_K = \hat{a}_{\rm ms} = \frac{K\left(\frac{1}{K}\sum_{i=1}^K R_i\right)}{K + \frac{\sigma_n^2}{\sigma_a^2}} = \left(\frac{1}{1 + \frac{\sigma_n^2}{K\sigma_a^2}}\right)\frac{1}{K}\sum_{i=1}^{K} R_i\,. \quad (101)$$
Part (3): The mean square error is the integral of the conditional variance over all possible values for $R$; i.e. using the book's equation 126 we get
$$R_{\rm ms} = \int dR\, p(R) \int [A - \hat{a}_{\rm ms}(R)]^2\, p(A|R)\,dA = \int dR\, p(R)\, \frac{\sigma_n^2}{K + k_0^2} = \frac{\sigma_n^2}{K + k_0^2}\,,$$
or the conditional variance again (this is true since the conditional variance is independent of the measurement $R$).
Part (4-a): Arguments above show that each density over only the first $j$ measurements, $p(A|\{R_i\}_{i=1}^j)$, should be normal with a mean and a variance which we will denote by $\hat{a}_j(R_1, R_2, \dots, R_j)$ and $\sigma_j^2$. We can now start a recursive estimation procedure with our initial mean and variance estimates given by $\hat{a}_0 = 0$ and $\sigma_0^2 = \sigma_a^2$, using Bayes' rule to link mean and variance estimates as new measurements arrive. For example, we will use
$$p(A|R_1, \dots, R_{j-1}, R_j) \propto \exp\left(-\frac{1}{2\sigma_{j-1}^2}(A - \hat{a}_{j-1})^2\right)\exp\left(-\frac{1}{2\sigma_n^2}(A - R_j)^2\right)$$
$$= \exp\left(-\frac{1}{2}\left[\left(\frac{1}{\sigma_{j-1}^2} + \frac{1}{\sigma_n^2}\right)A^2 - 2\left(\frac{\hat{a}_{j-1}}{\sigma_{j-1}^2} + \frac{R_j}{\sigma_n^2}\right)A\right] + \cdots\right)$$
$$= \exp\left(-\frac{1}{2}\left(\frac{1}{\sigma_{j-1}^2} + \frac{1}{\sigma_n^2}\right)\left(A - \frac{\frac{\hat{a}_{j-1}}{\sigma_{j-1}^2} + \frac{R_j}{\sigma_n^2}}{\frac{1}{\sigma_{j-1}^2} + \frac{1}{\sigma_n^2}}\right)^2 + \cdots\right)\,,$$
where the omitted terms do not depend on $A$. From the above expression we will define the variance of $p(A|\{R_i\}_{i=1}^j)$, or $\sigma_j^2$, via
$$\frac{1}{\sigma_j^2} = \frac{1}{\sigma_{j-1}^2} + \frac{1}{\sigma_n^2} \quad\text{so}\quad \sigma_j^2 = \frac{\sigma_{j-1}^2\,\sigma_n^2}{\sigma_n^2 + \sigma_{j-1}^2}\,. \quad (102)$$
n
With the correct normalization this is a normal density with a mean given by
a
j1 Rj
+ 2 j2
a
j (R1 , R2 , , Rj1, Rj ) =
2
j1
n
=
2
j1
n2
a
(R
,
R
,
,
R
)
+
Rj .
j1
1
2
j1
2
2
n2 + j1
n2 + j1
2
This expresses the new mean a
j as a function of a
j1 , j1
and the new measurement Rj .
From Equation 102, with $j = 1$ and $\sigma_0^2 = \sigma_a^2$, when we iterate a few of these terms we find
$$\frac{1}{\sigma_1^2} = \frac{1}{\sigma_a^2} + \frac{1}{\sigma_n^2}$$
$$\frac{1}{\sigma_2^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_n^2} = \frac{1}{\sigma_a^2} + \frac{2}{\sigma_n^2}$$
$$\frac{1}{\sigma_3^2} = \frac{1}{\sigma_a^2} + \frac{2}{\sigma_n^2} + \frac{1}{\sigma_n^2} = \frac{1}{\sigma_a^2} + \frac{3}{\sigma_n^2}$$
$$\vdots$$
$$\frac{1}{\sigma_j^2} = \frac{1}{\sigma_a^2} + \frac{j}{\sigma_n^2}\,.$$
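The recursion can be checked against the batch posterior in a few lines; a sketch with arbitrary data and variances:

```python
# Recursive Gaussian updates (Equation 102 and the mean recursion above)
# agree with the closed-form batch posterior after all measurements.
sigma_a2, sigma_n2 = 4.0, 1.0
data = [1.0, 2.0, 0.5, 1.5]

a_hat, var = 0.0, sigma_a2
for R_j in data:
    var_new = var * sigma_n2 / (sigma_n2 + var)
    a_hat = (sigma_n2 * a_hat + var * R_j) / (sigma_n2 + var)
    var = var_new

j = len(data)
var_closed = 1.0 / (1.0 / sigma_a2 + j / sigma_n2)
a_closed = sum(data) / (j + sigma_n2 / sigma_a2)   # m_K from Equation 101
assert abs(var - var_closed) < 1e-12
assert abs(a_hat - a_closed) < 1e-12
```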
Problem 2.4.6
Part (1): Using Equation 29 we can write our risk as
$$\mathcal{R}(\hat{a}|R) = \int_{-\infty}^{\infty} C(\hat{a}_{\rm ms} - \hat{a} + Z)\, p(Z|R)\,dZ\,. \quad (103)$$
Splitting this integral at zero and using the symmetry of the cost function, $C(Z) = C(-Z)$, gives
$$\mathcal{R}(\hat{a}|R) = \int_0^{\infty} C(Z)\, p(Z - \hat{a} + \hat{a}_{\rm ms}|R)\,dZ + \int_0^{\infty} C(Z)\, p(Z + \hat{a} - \hat{a}_{\rm ms}|R)\,dZ\,.$$
Then $\Delta\mathcal{R} \equiv \mathcal{R}(\hat{a}|R) - \mathcal{R}(\hat{a}_{\rm ms}|R)$ can be computed by subtracting the two expressions above. Note that this would give the expression stated in the book except that the two shifted densities would need to enter as a difference.
  E[R_i R_j] =
    \begin{cases}
      E[R_i] E[R_j] = m^2 & i \neq j \\
      \mathrm{Var}(R_i) + E[R_i]^2 = \sigma^2 + m^2 & i = j ;
    \end{cases}

note that we have used the fact that the R_i are uncorrelated in the evaluation of E[R_i R_j] when i \neq j. Next define
  \bar{R} = \frac{1}{n} \sum_{i=1}^n R_i ,

so that

  V = \frac{1}{n} \sum_{i=1}^n (R_i - \bar{R})^2 = \frac{1}{n} \sum_{i=1}^n (R_i^2 - 2\bar{R} R_i + \bar{R}^2)    (104)
    = \frac{1}{n} \sum_{i=1}^n R_i^2 - \frac{2\bar{R}}{n} \sum_{i=1}^n R_i + \frac{n}{n}\bar{R}^2
    = \frac{1}{n} \sum_{i=1}^n R_i^2 - \bar{R}^2 .    (105)
Taking the expectation of V we find

  E[V] = \frac{1}{n} \sum_{i=1}^n E[R_i^2] - E[\bar{R}^2] = \frac{1}{n}\, n(\sigma^2 + m^2) - E[\bar{R}^2] = \sigma^2 + m^2 - E[\bar{R}^2] .

From the definition of \bar{R} we have

  E[\bar{R}^2] = \frac{1}{n^2} \sum_{i,j} E[R_i R_j] = \frac{1}{n^2} \left( n(\sigma^2 + m^2) + n(n-1)m^2 \right) = \frac{\sigma^2}{n} + m^2 .    (106)

Thus we have

  E[V] = \sigma^2 + m^2 - \frac{\sigma^2}{n} - m^2 = \sigma^2 - \frac{\sigma^2}{n} \neq \sigma^2 .    (107)
If we instead consider the estimator with a 1/(n-1) prefactor we find

  \frac{1}{n-1} \sum_{i=1}^n (R_i - \bar{R})^2
    = \frac{1}{n-1} \sum_{i=1}^n R_i^2 - \frac{2}{n-1} \sum_{i=1}^n R_i \bar{R} + \frac{n}{n-1} \bar{R}^2
    = \frac{1}{n-1} \sum_{i=1}^n R_i^2 - \frac{n}{n-1} \bar{R}^2 .

Taking the expectation of this expression gives

  \frac{1}{n-1} \sum_{i=1}^n E[R_i^2] - \frac{n}{n-1} E[\bar{R}^2]
    = \frac{1}{n-1}\, n(\sigma^2 + m^2) - \frac{n}{n-1} \left( \frac{\sigma^2}{n} + m^2 \right) = \sigma^2 ,

when we simplify. Thus \frac{1}{n-1} \sum_{i=1}^n (R_i - \bar{R})^2 is an unbiased estimator of \sigma^2.
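We can check this bias numerically (a simulation sketch of my own; the parameter values are arbitrary):

```python
import random

# Simulation sketch (my own check, not from the text): the 1/n estimator V
# from Equation 105 is biased low by the factor (n-1)/n, while the 1/(n-1)
# version is unbiased. Here m = 1, sigma^2 = 4, n = 5.
random.seed(0)
n, trials, m, sigma2 = 5, 100000, 1.0, 4.0
sum_v = sum_u = 0.0
for _ in range(trials):
    R = [random.gauss(m, sigma2 ** 0.5) for _ in range(n)]
    Rbar = sum(R) / n
    ss = sum((r - Rbar) ** 2 for r in R)
    sum_v += ss / n        # E[V] = (n-1)/n * sigma^2 = 3.2
    sum_u += ss / (n - 1)  # E[.] = sigma^2 = 4.0
print(sum_v / trials, sum_u / trials)
```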
Now since R is drawn from a binomial random variable with parameters (n, A), the expectation of R is An, from which we see that the above equals zero and our estimator is unbiased. To study the conditional variance of our error (defined as e \equiv \hat{a} - A) consider the following.
For the density considered here the derivative of the log-likelihood is given by

  \frac{\partial \ln(p(R|A))}{\partial A} = \frac{R}{A} - \frac{n - R}{1 - A} = \frac{R - nA}{A(1-A)} .    (111)

Squaring this expression gives

  \frac{R^2 - 2nAR + n^2A^2}{A^2(1-A)^2} .

Taking the expectation with respect to R, and recalling that E[R] = nA and E[R^2] = nA(1-A) + n^2A^2, we have

  \frac{(nA(1-A) + n^2A^2) - 2nA(nA) + n^2A^2}{A^2(1-A)^2} = \frac{n}{A(1-A)} ,
when we simplify. As this is equal to 1/\sigma_e^2(A) computed in Equation 109, this estimator is efficient. Another way to see this is to note that Equation 111 can be written in the form required for efficiency, namely

  \frac{\partial \ln(p(R|A))}{\partial A} = [\hat{a}(R) - A]\, k(A) ,

by writing it as

  \frac{\partial \ln(p(R|A))}{\partial A} = \left( \frac{R}{n} - A \right) \frac{n}{A(1-A)} .
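Since the binomial pmf has only n + 1 terms we can check the value E[(\partial \ln p/\partial A)^2] = n/(A(1-A)) exactly (my own check, with arbitrary values n = 10 and A = 0.3):

```python
from math import comb

# Exact check (my own addition) that E[(d ln p/dA)^2] = n/(A(1-A)) for the
# binomial, by summing the squared score of Equation 111 over the pmf.
n, A = 10, 0.3
total = 0.0
for R in range(n + 1):
    pR = comb(n, R) * A**R * (1 - A)**(n - R)
    score = (R - n * A) / (A * (1 - A))
    total += score**2 * pR
print(total, n / (A * (1 - A)))  # both ~47.619
```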
Part (1): Here the density of the data is

  p(R|A) = \frac{1}{(2\pi)^{n/2} A_2^{n/2}} \exp\left\{ -\frac{1}{2A_2} \sum_{i=1}^n (R_i - A_1)^2 \right\} .

Taking the logarithm of the above expression, which we desire to maximize, we get

  -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(A_2) - \frac{1}{2A_2} \sum_{i=1}^n (R_i - A_1)^2 .
Setting the derivatives of this with respect to A_1 and A_2 equal to zero gives the system

  \frac{1}{A_2} \sum_{i=1}^n (R_i - A_1) = 0
  -\frac{n}{2} \frac{1}{A_2} + \frac{1}{2A_2^2} \sum_{i=1}^n (R_i - A_1)^2 = 0 .

The first of these gives

  \hat{a}_1 = \frac{1}{n} \sum_{i=1}^n R_i .

When we put this expression into the second equation and solve for A_2 we get

  \hat{a}_2 = \frac{1}{n} \sum_{i=1}^n (R_i - \hat{a}_1)^2 .
Part (2): The estimator for A_1 is unbiased, while the estimator for A_2 is biased, as was shown in Problem 2.4.7 above.
Part (3): The estimators are seen to be coupled since the estimate of A2 depends on the
value of A1 .
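The two coupled estimates can be sketched as (my own addition, not from the text):

```python
# The two coupled ML estimates just derived, as a small sketch (mine).
def gaussian_mle(R):
    n = len(R)
    a1 = sum(R) / n                         # hat a_1: the sample mean
    a2 = sum((r - a1) ** 2 for r in R) / n  # hat a_2: the (biased) 1/n variance
    return a1, a2

print(gaussian_mle([2.0, 4.0, 6.0]))  # (4.0, 2.666...)
```

Note how a2 is computed from a1, which is the coupling discussed in Part (3).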
Here the density of the data is

  p(R|A) = \frac{1}{2\pi\sigma_n^2} \exp\left\{ -\frac{1}{2\sigma_n^2}(R_1 - S_1)^2 \right\} \exp\left\{ -\frac{1}{2\sigma_n^2}(R_2 - S_2)^2 \right\} ,

so that

  \ln(p(R|A)) = -\frac{1}{2\sigma_n^2}(R_1 - S_1)^2 - \frac{1}{2\sigma_n^2}(R_2 - S_2)^2 + \text{constants} .
To find the maximum likelihood estimate of A we need to take the derivatives of \ln(p(R|A)) with respect to A_1 and A_2. We find

  \frac{\partial \ln(p(R|A))}{\partial A_1}
    = \frac{1}{\sigma_n^2}(R_1 - S_1)\frac{\partial S_1}{\partial A_1} + \frac{1}{\sigma_n^2}(R_2 - S_2)\frac{\partial S_2}{\partial A_1}
    = \frac{1}{\sigma_n^2}(R_1 - S_1)x_{11} + \frac{1}{\sigma_n^2}(R_2 - S_2)x_{21} ,

and

  \frac{\partial \ln(p(R|A))}{\partial A_2}
    = \frac{1}{\sigma_n^2}(R_1 - S_1)\frac{\partial S_1}{\partial A_2} + \frac{1}{\sigma_n^2}(R_2 - S_2)\frac{\partial S_2}{\partial A_2}
    = \frac{1}{\sigma_n^2}(R_1 - S_1)x_{12} + \frac{1}{\sigma_n^2}(R_2 - S_2)x_{22} .
Each derivative would need to be equated to zero and the resulting system solved for A_1 and A_2. To do this we first solve for S and then with these known values we solve for A. As a first step write the above as

  x_{11} S_1 + x_{21} S_2 = x_{11} R_1 + x_{21} R_2
  x_{12} S_1 + x_{22} S_2 = x_{12} R_1 + x_{22} R_2 .

In matrix notation this means that \hat{S} is given by

  \begin{bmatrix} \hat{S}_1 \\ \hat{S}_2 \end{bmatrix}
    = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix}^{-1}
      \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix}
      \begin{bmatrix} R_1 \\ R_2 \end{bmatrix}
    = \begin{bmatrix} R_1 \\ R_2 \end{bmatrix} .
Here we have assumed the needed matrix inverses exist. Now that we know the estimate for S, the estimate for A is given by

  \begin{bmatrix} \hat{A}_1 \\ \hat{A}_2 \end{bmatrix}
    = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} \begin{bmatrix} \hat{S}_1 \\ \hat{S}_2 \end{bmatrix}
    = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} \begin{bmatrix} R_1 \\ R_2 \end{bmatrix} .
We now seek to answer the question as to whether \hat{a}_1(R) and \hat{a}_2(R) are unbiased estimators of A_1 and A_2. Consider the expectation of the vector [\hat{a}_1(R), \hat{a}_2(R)]^T. We find

  E\begin{bmatrix} \hat{a}_1(R) \\ \hat{a}_2(R) \end{bmatrix}
    = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} E\begin{bmatrix} R_1 \\ R_2 \end{bmatrix}
    = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} \begin{bmatrix} S_1 \\ S_2 \end{bmatrix}
    = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} ,

so both estimators are unbiased.
We next compute the covariance matrix of the vector [\hat{a}_1(R), \hat{a}_2(R)]^T. To do this we first let X denote the matrix

  X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} ,

so that

  \begin{bmatrix} \hat{a}_1(R) - A_1 \\ \hat{a}_2(R) - A_2 \end{bmatrix}
    = X^{-1} \begin{bmatrix} R_1 \\ R_2 \end{bmatrix} - \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
    = X^{-1} \left( X \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \end{bmatrix} \right) - \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
    = X^{-1} \begin{bmatrix} n_1 \\ n_2 \end{bmatrix} .

Since

  \mathrm{Var}\begin{bmatrix} n_1 \\ n_2 \end{bmatrix} = \begin{bmatrix} \sigma_n^2 & 0 \\ 0 & \sigma_n^2 \end{bmatrix} = \sigma_n^2 I ,

we have

  \mathrm{Var}\begin{bmatrix} \hat{a}_1(R) \\ \hat{a}_2(R) \end{bmatrix}
    = X^{-1}\, \mathrm{Var}\begin{bmatrix} n_1 \\ n_2 \end{bmatrix}\, (X^{-1})^T = \sigma_n^2\, X^{-1} (X^{-1})^T .
We can see how this expression compares with the one derived in Equation 112 if we expand the denominator D of the above fraction to get

  D = x_{11}^2 x_{12}^2 + x_{11}^2 x_{22}^2 + x_{21}^2 x_{12}^2 + x_{21}^2 x_{22}^2 - (x_{11}^2 x_{12}^2 + 2 x_{11} x_{12} x_{21} x_{22} + x_{21}^2 x_{22}^2)
    = x_{11}^2 x_{22}^2 - 2 x_{11} x_{12} x_{21} x_{22} + x_{21}^2 x_{12}^2 = (x_{11} x_{22} - x_{12} x_{21})^2 .

Thus the above expression matches that from Equation 112 and we have that our estimate is efficient.
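As a numerical sketch of this solution (my own addition; a hand-rolled 2x2 solve stands in for a linear algebra library), in the noise-free case \hat{A} = X^{-1} S must recover A exactly:

```python
# Numerical sketch (all names mine): with S = X A and R = S + n, the ML
# estimate above reduces to A_hat = X^{-1} R; here we check the 2x2 algebra
# in the noise-free case, where A_hat must recover A exactly.
def solve_2x2(X, R):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    assert det != 0, "X must be invertible"
    a1 = (X[1][1] * R[0] - X[0][1] * R[1]) / det
    a2 = (-X[1][0] * R[0] + X[0][0] * R[1]) / det
    return a1, a2

X = [[2.0, 1.0], [1.0, 3.0]]
A = (1.0, 2.0)
S = (X[0][0] * A[0] + X[0][1] * A[1], X[1][0] * A[0] + X[1][1] * A[1])
print(solve_2x2(X, S))  # (1.0, 2.0)
```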
Part (1): Here we are told that P(R|A) = \frac{A^R e^{-A}}{R!} and A is nonrandom. The logarithm is given by

  \ln(P(R|A)) = R \ln(A) - A - \ln(R!) ,

so that

  \frac{\partial^2 \ln(P(R|A))}{\partial A^2} = -\frac{R}{A^2}
  \quad\text{and}\quad
  -E\left[ \frac{\partial^2 \ln(P(R|A))}{\partial A^2} \right] = \frac{E[R]}{A^2} = \frac{A}{A^2} = \frac{1}{A} ,

using the fact that E[R] = A; see [1]. Thus we see that \mathrm{Var}[\hat{a}(R)] \geq A.
Part (2): In this case we have

  p(R|A) = \prod_{i=1}^n p(R_i|A) = \prod_{i=1}^n \frac{A^{R_i} e^{-A}}{R_i!} = \frac{e^{-nA}\, A^{\sum_{i=1}^n R_i}}{\prod_{i=1}^n R_i!} ,

so that

  \ln(p(R|A)) = -nA + \left( \sum_{i=1}^n R_i \right) \ln(A) - \ln\left( \prod_{i=1}^n R_i! \right) .
To compute the maximum likelihood estimate of A we take the first derivative of the log-likelihood, set the result equal to zero, and solve for A. This means that we need to solve

  \frac{\partial \ln(p(R|A))}{\partial A} = -n + \frac{\sum_{i=1}^n R_i}{A} = 0 ,

which gives

  \hat{A} = \frac{1}{n} \sum_{i=1}^n R_i .
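We can verify that this sample mean is indeed the maximizer with a small check of my own (the data values are arbitrary):

```python
from math import lgamma, log

# A quick check (mine) that the sample mean maximizes the Poisson
# log-likelihood -nA + (sum R_i) ln A - ln(prod R_i!).
def loglik(A, R):
    return sum(r * log(A) - A - lgamma(r + 1) for r in R)

R = [3, 1, 4, 1, 5]
A_hat = sum(R) / len(R)  # 2.8
assert loglik(A_hat, R) > loglik(A_hat - 0.1, R)
assert loglik(A_hat, R) > loglik(A_hat + 0.1, R)
print(A_hat)
```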
For this problem the density of the data is

  p(R|A) = \frac{1}{\pi^n} \prod_{i=1}^n \frac{1}{1 + (R_i - A)^2} ,

so that

  \ln(p(R|A)) = -n \ln(\pi) - \sum_{i=1}^n \ln(1 + (R_i - A)^2)
  \quad\text{and}\quad
  \frac{\partial \ln(p(R|A))}{\partial A} = \sum_{i=1}^n \frac{2(R_i - A)}{1 + (R_i - A)^2} .
By the Cramer-Rao inequality we have that

  \mathrm{Var}[\hat{a}(R)] \geq \frac{-1}{E\left\{ \frac{\partial^2 \ln(p(R|A))}{\partial A^2} \right\}} .
Thus we need to evaluate the expectation in the denominator of the above fraction. We find

  E\left[ \frac{\partial^2 \ln(p(R|A))}{\partial A^2} \right]
    = \sum_{i=1}^n \int_{-\infty}^{\infty} \frac{2\left( (R_i - A)^2 - 1 \right)}{(1 + (R_i - A)^2)^2}\, p(R_i|A)\, dR_i
    = \frac{2}{\pi} \sum_{i=1}^n \left[ -\int_{-\infty}^{\infty} \frac{dR_i}{(1 + (R_i - A)^2)^3} + \int_{-\infty}^{\infty} \frac{(R_i - A)^2\, dR_i}{(1 + (R_i - A)^2)^3} \right]
    = \frac{2}{\pi} \sum_{i=1}^n \left[ -\int_{-\infty}^{\infty} \frac{dv}{(1 + v^2)^3} + \int_{-\infty}^{\infty} \frac{v^2\, dv}{(1 + v^2)^3} \right]
    = \frac{2}{\pi} \sum_{i=1}^n \left[ -\frac{3\pi}{8} + \frac{\pi}{8} \right] = -\frac{2}{\pi} \cdot \frac{\pi}{4}\, n = -\frac{n}{2} .

Thus

  \mathrm{Var}[\hat{a}(R)] \geq \frac{2}{n} ,

as we were to show.
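The two integral values 3\pi/8 and \pi/8 used above can be checked numerically (my own addition) with the substitution v = \tan(t), which maps both integrands to bounded trigonometric ones:

```python
from math import pi, sin, cos

# Midpoint-rule check (my own) of the two integrals used above, via v = tan(t):
#   int dv/(1+v^2)^3     = int cos^4(t) dt          = 3*pi/8
#   int v^2 dv/(1+v^2)^3 = int sin^2(t) cos^2(t) dt = pi/8
N = 200000
h = pi / N
I1 = I2 = 0.0
for k in range(N):
    t = -pi / 2 + (k + 0.5) * h
    I1 += cos(t) ** 4 * h
    I2 += (sin(t) * cos(t)) ** 2 * h
print(I1, 3 * pi / 8)
print(I2, pi / 8)
```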
  p(R|\rho) = \prod_{i=1}^n p(R_{i1}, R_{i2}|\rho)
    = \prod_{i=1}^n \frac{1}{2\pi (1-\rho^2)^{1/2}} \exp\left\{ -\frac{R_{i1}^2 - 2\rho R_{i1} R_{i2} + R_{i2}^2}{2(1-\rho^2)} \right\}
    = \frac{1}{(2\pi)^n (1-\rho^2)^{n/2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \sum_{i=1}^n (R_{i1}^2 - 2\rho R_{i1} R_{i2} + R_{i2}^2) \right\} .

Thus

  \ln(p(R|\rho)) = -n \ln(2\pi) - \frac{n}{2} \ln(1-\rho^2) - \frac{1}{2(1-\rho^2)} \sum_{i=1}^n (R_{i1}^2 - 2\rho R_{i1} R_{i2} + R_{i2}^2) .
The ML estimate is given by solving \partial \ln(p(R|\rho))/\partial \rho = 0, where

  \frac{\partial \ln(p(R|\rho))}{\partial \rho}
    = \frac{n\rho}{1-\rho^2} - \frac{\rho}{(1-\rho^2)^2} \sum_{i=1}^n (R_{i1}^2 - 2\rho R_{i1} R_{i2} + R_{i2}^2) + \frac{1}{1-\rho^2} \sum_{i=1}^n R_{i1} R_{i2} .
If we define

  S_{11} \equiv \frac{1}{n} \sum_{i=1}^n R_{i1}^2 , \quad
  S_{12} \equiv \frac{1}{n} \sum_{i=1}^n R_{i1} R_{i2} , \quad
  S_{22} \equiv \frac{1}{n} \sum_{i=1}^n R_{i2}^2 ,

the derivative \partial \ln(p(R|\rho))/\partial \rho now becomes

  \frac{\partial \ln(p(R|\rho))}{\partial \rho}
    = \frac{n\rho}{1-\rho^2} + \frac{n}{1-\rho^2} S_{12} - \frac{n\rho}{(1-\rho^2)^2} (S_{11} - 2\rho S_{12} + S_{22})
    = \frac{n}{(1-\rho^2)^2} \left[ \rho - \rho^3 - \rho S_{11} + (1+\rho^2) S_{12} - \rho S_{22} \right] .    (113)
For later work we will need to compute the second derivative of \ln(p(R|\rho)) with respect to \rho. Using Mathematica we find

  \frac{\partial^2 \ln(p(R|\rho))}{\partial \rho^2}
    = \frac{n}{(1-\rho^2)^3} \left[ 1 - \rho^4 - (1+3\rho^2) S_{11} + (6\rho + 2\rho^3) S_{12} - (1+3\rho^2) S_{22} \right] .    (114)

The equation for the ML estimate of \rho is given by setting Equation 113 equal to zero and solving for \rho.
Part (2): One form of the Cramer-Rao inequality bounds the variance of \hat{\rho}(R) by a function of the expectation of \frac{\partial^2 \ln(p(R|\rho))}{\partial \rho^2}. In this problem the samples are mean zero, variance one, and correlated with correlation \rho, which means that

  E[R_{i1}] = E[R_{i2}] = 0 , \quad E[R_{i1}^2] = E[R_{i2}^2] = 1 , \quad E[R_{i1} R_{i2}] = \rho .

Using these with the definitions of S_{11}, S_{22}, and S_{12} we get that

  E[S_{11}] = E[S_{22}] = 1 \quad\text{and}\quad E[S_{12}] = \rho .
Using these expressions, the expectation of Equation 114 becomes

  E\left[ \frac{\partial^2 \ln(p(R|\rho))}{\partial \rho^2} \right] = -n\, \frac{1+\rho^2}{(1-\rho^2)^2} .

Thus using the second derivative form of the Cramer-Rao inequality, or

  E[(\hat{a}(R) - A)^2] \geq \left\{ -E\left[ \frac{\partial^2 \ln(p_{r|a}(R|A))}{\partial A^2} \right] \right\}^{-1} ,    (115)

we find

  E[(\hat{\rho}(R) - \rho)^2] \geq \frac{(1-\rho^2)^2}{n(1+\rho^2)} .
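Setting Equation 113 to zero means finding a root of the bracketed factor on (-1, 1); here is a bisection sketch of my own (not from the text):

```python
# Sketch (mine) of solving Equation 113 for rho by bisection: the positive
# prefactor n/(1-rho^2)^2 never vanishes, so only the bracketed cubic factor
# can be zero on (-1, 1).
def rho_mle(S11, S12, S22, tol=1e-12):
    def g(r):
        return r * (1 - r * r) - r * S11 + (1 + r * r) * S12 - r * S22
    lo, hi = -1 + 1e-9, 1 - 1e-9
    assert g(lo) * g(hi) < 0, "no sign change in bracket"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# With S11 = S22 = 1 the score vanishes exactly at rho = S12:
print(rho_mle(1.0, 0.4, 1.0))  # ~0.4
```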
Following the steps in the book's proof of the Cramer-Rao bound we get an equation similar to the book's 185 of

  \int \left[ \frac{\partial \ln(p(R|A))}{\partial A} \sqrt{p(R|A)} \right] \left[ \sqrt{p(R|A)}\, (\hat{a}(R) - A) \right] dR = 1 + B'(A) .

Using the Schwarz inequality 39 on the integral on the left-hand-side we get that

  \left\{ \int \left[ \frac{\partial \ln(p(R|A))}{\partial A} \right]^2 p(R|A)\, dR \right\}^{1/2}
  \left\{ \int p(R|A) (\hat{a}(R) - A)^2\, dR \right\}^{1/2} \geq 1 + B'(A) .

Squaring both sides and dividing then gives

  \mathrm{Var}[\hat{a}(R)] \geq \frac{(1 + B'(A))^2}{E\left[ \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2 \right]} .
Part (1): Note that E\left[ \partial \ln(p(R|A))/\partial A \right] = 0, since

  \int \frac{\partial \ln(p(R|A))}{\partial A}\, p(R|A)\, dR
    = \int \frac{\partial p(R|A)}{\partial A}\, dR
    = \frac{\partial}{\partial A} \int p(R|A)\, dR
    = \frac{\partial}{\partial A}(1) = 0 .
Part (2): The covariance matrix is \Lambda_z = E[zz^T]. The needed outer product is given by

  zz^T = \begin{bmatrix} \hat{a}(R) - A \\ \frac{\partial \ln(p(R|A))}{\partial A} \end{bmatrix}
         \begin{bmatrix} \hat{a}(R) - A & \frac{\partial \ln(p(R|A))}{\partial A} \end{bmatrix}
       = \begin{bmatrix}
           (\hat{a}(R) - A)^2 & (\hat{a}(R) - A) \frac{\partial \ln(p(R|A))}{\partial A} \\
           (\hat{a}(R) - A) \frac{\partial \ln(p(R|A))}{\partial A} & \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2
         \end{bmatrix} .

Following the same steps leading to Equation 60 we can show that the expectation of the (1,2) and (2,1) terms is one, so that E[zz^T] takes the form

  E[zz^T] = \begin{bmatrix}
              E[(\hat{a}(R) - A)^2] & 1 \\
              1 & E\left[ \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2 \right]
            \end{bmatrix} .

Since \Lambda_z is a covariance matrix it is nonnegative definite, so its determinant is nonnegative:

  E[(\hat{a}(R) - A)^2]\, E\left[ \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2 \right] - 1 \geq 0 ,

or

  E[(\hat{a}(R) - A)^2] \geq \frac{1}{E\left[ \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2 \right]} ,

which is the Cramer-Rao inequality. If equality holds in the Cramer-Rao inequality then this means that |\Lambda_z| = 0.
Problem 2.4.22 (a derivation of the random Cramer-Rao inequality)

Note that we can show that E[z] = 0 in the same way as was done in Problem 2.4.21. We now compute zz^T and find

  zz^T = \begin{bmatrix}
           (\hat{a}(R) - A)^2 & (\hat{a}(R) - A) \frac{\partial \ln(p(R,A))}{\partial A} \\
           (\hat{a}(R) - A) \frac{\partial \ln(p(R,A))}{\partial A} & \left( \frac{\partial \ln(p(R,A))}{\partial A} \right)^2
         \end{bmatrix} .

Consider the expectation of the (1,2) component of zz^T, where using integration by parts in A (with vanishing boundary terms) we find

  \int\!\!\int (\hat{a}(R) - A) \frac{\partial \ln(p(R,A))}{\partial A}\, p(R,A)\, dR\, dA
    = \int\!\!\int (\hat{a}(R) - A) \frac{\partial p(R,A)}{\partial A}\, dR\, dA
    = \int\!\!\int p(R,A)\, dR\, dA = 1 ,

so that

  E[zz^T] = \begin{bmatrix}
              E[(\hat{a}(R) - A)^2] & 1 \\
              1 & E\left[ \left( \frac{\partial \ln(p(R,A))}{\partial A} \right)^2 \right]
            \end{bmatrix} .

Proceeding with the same determinant argument as in Problem 2.4.21 then gives the Cramer-Rao inequality for random variables.
Part (2): The first row of the outer product zz^T has components that look like

  (\hat{a}(R) - A)^2 , \quad
  (\hat{a}(R) - A) \frac{1}{p(R|A)} \frac{\partial p(R|A)}{\partial A} , \quad
  (\hat{a}(R) - A) \frac{1}{p(R|A)} \frac{\partial^2 p(R|A)}{\partial A^2} , \quad \ldots , \quad
  (\hat{a}(R) - A) \frac{1}{p(R|A)} \frac{\partial^i p(R|A)}{\partial A^i} ,

for i \leq n. When we evaluate the expectation of the terms in this row with i \geq 2 we find

  E\left[ \frac{\hat{a}(R) - A}{p(R|A)} \frac{\partial^i p(R|A)}{\partial A^i} \right]
    = \int (\hat{a}(R) - A) \frac{\partial^i p(R|A)}{\partial A^i}\, dR
    = \frac{\partial}{\partial A} \int (\hat{a}(R) - A) \frac{\partial^{i-1} p(R|A)}{\partial A^{i-1}}\, dR + \int \frac{\partial^{i-1} p(R|A)}{\partial A^{i-1}}\, dR ,

where the second integral vanishes since

  \int \frac{\partial^{i-1} p(R|A)}{\partial A^{i-1}}\, dR = \frac{\partial^{i-1}}{\partial A^{i-1}} \int p(R|A)\, dR = \frac{\partial^{i-1}}{\partial A^{i-1}}(1) = 0

for i \geq 2, and iterating the first term down to i = 1 (where the integral equals one, a constant with vanishing derivatives) shows these expectations are zero.

Thus \Lambda_z has its (1,1) element given by E[(\hat{a}(R) - A)^2], its (1,2) element given by 1, and all other elements in the first row are zero. The first column of \Lambda_z is similar. Denote the lower-right corner of \Lambda_z as J and note that J has elements for i \geq 1 and j \geq 1 given by

  J_{ij} = E\left[ \frac{1}{p(R|A)} \frac{\partial^i p(R|A)}{\partial A^i} \cdot \frac{1}{p(R|A)} \frac{\partial^j p(R|A)}{\partial A^j} \right]
         = \int \frac{1}{p(R|A)} \frac{\partial^i p(R|A)}{\partial A^i} \frac{\partial^j p(R|A)}{\partial A^j}\, dR .
Then following the same procedure as in the proof of the Cramer-Rao bound for nonrandom variables we get

  E[(\hat{a}(R) - A)^2] \geq \frac{\mathrm{cofactor}(J_{11})}{|J|} = J^{11} .
Part (3): When N = 1 then zz^T looks like

  zz^T = \begin{bmatrix}
           (\hat{a}(R) - A)^2 & (\hat{a}(R) - A) \frac{1}{p(R|A)} \frac{\partial p(R|A)}{\partial A} \\
           (\hat{a}(R) - A) \frac{1}{p(R|A)} \frac{\partial p(R|A)}{\partial A} & \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2
         \end{bmatrix} ,

so J = E\left[ \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2 \right] and thus

  J^{11} = \frac{1}{E\left[ \left( \frac{\partial \ln(p(R|A))}{\partial A} \right)^2 \right]} ,

which recovers the ordinary scalar Cramer-Rao bound.
If

  x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} ,

we want to show that

  \nabla_x (A^T B) = (\nabla_x A^T) B + (\nabla_x B^T) A .    (116)

To do this, note that

  A^T B = \begin{bmatrix} A_1 & A_2 & \cdots & A_n \end{bmatrix}
          \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_n \end{bmatrix}
        = A_1 B_1 + A_2 B_2 + \cdots + A_n B_n ,
so that

  \nabla_x (A^T B)
    = \begin{bmatrix}
        \frac{\partial}{\partial x_1}(A_1 B_1 + A_2 B_2 + \cdots + A_n B_n) \\
        \vdots \\
        \frac{\partial}{\partial x_n}(A_1 B_1 + A_2 B_2 + \cdots + A_n B_n)
      \end{bmatrix}
    = \begin{bmatrix}
        \sum_{j=1}^n \frac{\partial A_j}{\partial x_1} B_j + \sum_{j=1}^n A_j \frac{\partial B_j}{\partial x_1} \\
        \vdots \\
        \sum_{j=1}^n \frac{\partial A_j}{\partial x_n} B_j + \sum_{j=1}^n A_j \frac{\partial B_j}{\partial x_n}
      \end{bmatrix}
    = \begin{bmatrix}
        \frac{\partial A_1}{\partial x_1} & \frac{\partial A_2}{\partial x_1} & \cdots & \frac{\partial A_n}{\partial x_1} \\
        \vdots & & \vdots \\
        \frac{\partial A_1}{\partial x_n} & \frac{\partial A_2}{\partial x_n} & \cdots & \frac{\partial A_n}{\partial x_n}
      \end{bmatrix}
      \begin{bmatrix} B_1 \\ \vdots \\ B_n \end{bmatrix}
      +
      \begin{bmatrix}
        \frac{\partial B_1}{\partial x_1} & \frac{\partial B_2}{\partial x_1} & \cdots & \frac{\partial B_n}{\partial x_1} \\
        \vdots & & \vdots \\
        \frac{\partial B_1}{\partial x_n} & \frac{\partial B_2}{\partial x_n} & \cdots & \frac{\partial B_n}{\partial x_n}
      \end{bmatrix}
      \begin{bmatrix} A_1 \\ \vdots \\ A_n \end{bmatrix} .

Since

  \nabla_x A^T = \begin{bmatrix}
      \frac{\partial A_1}{\partial x_1} & \frac{\partial A_2}{\partial x_1} & \cdots & \frac{\partial A_n}{\partial x_1} \\
      \frac{\partial A_1}{\partial x_2} & \frac{\partial A_2}{\partial x_2} & \cdots & \frac{\partial A_n}{\partial x_2} \\
      \vdots & \vdots & & \vdots \\
      \frac{\partial A_1}{\partial x_n} & \frac{\partial A_2}{\partial x_n} & \cdots & \frac{\partial A_n}{\partial x_n}
    \end{bmatrix} ,

we can write the above as (\nabla_x A^T) B + (\nabla_x B^T) A, thus our derivative relationship is

  \nabla_x (A^T B) = (\nabla_x A^T) B + (\nabla_x B^T) A .    (117)
Part (2): Use Part (1) from this problem with B = x, where we would get

  \nabla_x (A^T x) = (\nabla_x A^T) x + (\nabla_x x^T) A .

If A does not depend on x then \nabla_x A^T = 0 and we have \nabla_x (A^T x) = (\nabla_x x^T) A. We now need to evaluate \nabla_x x^T, where we find

  \nabla_x x^T = \begin{bmatrix}
      \frac{\partial x_1}{\partial x_1} & \frac{\partial x_2}{\partial x_1} & \cdots & \frac{\partial x_n}{\partial x_1} \\
      \vdots & & \vdots \\
      \frac{\partial x_1}{\partial x_n} & \frac{\partial x_2}{\partial x_n} & \cdots & \frac{\partial x_n}{\partial x_n}
    \end{bmatrix} = I ,

the identity matrix. Thus we have \nabla_x (A^T x) = A.

Part (3): Note from the dimensions given that x^T C is a (1 \times n)(n \times m) = 1 \times m sized matrix. We can write the product x^T C as

  x^T C = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
          \begin{bmatrix}
            c_{11} & c_{12} & \cdots & c_{1m} \\
            c_{21} & c_{22} & \cdots & c_{2m} \\
            \vdots & \vdots & & \vdots \\
            c_{n1} & c_{n2} & \cdots & c_{nm}
          \end{bmatrix}
        = \begin{bmatrix} \sum_{j=1}^n x_j c_{j1} & \sum_{j=1}^n x_j c_{j2} & \cdots & \sum_{j=1}^n x_j c_{jm} \end{bmatrix} .

Thus

  \nabla_x (x^T C)
    = \begin{bmatrix}
        \frac{\partial}{\partial x_1} \sum_{j} x_j c_{j1} & \cdots & \frac{\partial}{\partial x_1} \sum_{j} x_j c_{jm} \\
        \vdots & & \vdots \\
        \frac{\partial}{\partial x_n} \sum_{j} x_j c_{j1} & \cdots & \frac{\partial}{\partial x_n} \sum_{j} x_j c_{jm}
      \end{bmatrix}
    = \begin{bmatrix}
        c_{11} & \cdots & c_{1m} \\
        \vdots & & \vdots \\
        c_{n1} & \cdots & c_{nm}
      \end{bmatrix} = C .
  \nabla_x Q = \nabla_x (A(x)^T A(x)) = (\nabla_x A(x)^T) A(x) + (\nabla_x A(x)^T) A(x) = 2 (\nabla_x A(x)^T) A(x) ,    (118)

as we were to show.
Part (2): If A(x) = Bx then, using Problem 2.4.27 where we computed \nabla_x x^T, we find

  \nabla_x A^T = \nabla_x (x^T B^T) = (\nabla_x x^T) B^T = B^T .

Using this and the result from Part (1) we get

  \nabla_x Q = 2 B^T B x .

Part (3): If Q = x^T x then A(x) = x, so from Part (4) of Problem 2.4.27 we have \nabla_x x^T = I, which when used with Equation 118 derived above gives

  \nabla_x Q = 2x ,

as we were to show.
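The identities in Parts (1)-(3) are easy to check by finite differences; here is a check of my own for Part (2):

```python
# Finite-difference check (my own, not from the text) of Part (2): the
# gradient of Q = (Bx)^T (Bx) is 2 B^T B x.
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def Q(B, x):
    return sum(y * y for y in matvec(B, x))

B = [[1.0, 2.0], [0.0, 3.0]]
x = [0.5, -1.0]
h = 1e-6
grad_fd = []
for i in range(len(x)):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    grad_fd.append((Q(B, xp) - Q(B, xm)) / (2 * h))
Bx = matvec(B, x)
grad_exact = [2 * sum(B[k][i] * Bx[k] for k in range(len(B))) for i in range(len(x))]
print(grad_fd, grad_exact)  # both ~[-3.0, -24.0]
```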
References
[1] S. Ross. A First Course in Probability. Macmillan, 3rd edition, 1988.