
Chapter 1 Detection Theory

1.1 Decision Theory and Hypothesis Testing

Fundamentally, detection theory deals with making decisions between two or more possibilities based on a set of observations. Two common problems of this type which arise in electrical engineering are:

General communications problem: decide whether a binary 0 or 1 was sent.

General radar problem: decide whether a target is present from the return signal.

In the second case, a radar operator might make a decision as to whether a blip on a radar screen is an aircraft or simply radar clutter (noise). However, human decision making tends to be subjective, meaning that different operators arrive at different decisions based on the same set of observations, because each operator follows their own decision-making process. The same operator may also give different answers at different times, so that the decision-making process is not even repeatable. Aside from this, the decision-making procedure often needs to be automated because human intervention is simply impractical, such as in communications.

The fundamental statistical approach to decision making was developed early in the 20th century and is based on hypothesis tests, which require a statistical model for the observations. The statistical hypothesis testing approach has the following advantages:

1. It is based on general optimality criteria, with different criteria leading to different tests.
2. It is quantitative and objective.
3. It is repeatable, in that the statistical behaviour of a test can be determined.
4. Tests can be compared to one another objectively.

As the hypothesis testing approach is statistical, a mathematical model of the system generating the observations is needed. In many problems the model is simply a deterministic signal in additive random noise. For the two cases above the models are:

General communications problem: x(t) = A s(t) + n(t), t = 1, ..., N.
n(t): noise, random variables (RVs).
s(t): signal pulse shape, deterministic and usually specified.
A: unknown amplitude, −∞ < A < ∞, of the signal, e.g., A ∈ {−1, +1}.
x(t): the observations, formed from the sum of the signal A s(t) and the random noise.

General radar problem: x(t) = A s(t) + n(t), t = 1, ..., N.
n(t): noise, RVs.
s(t): radar pulse waveform, deterministic and usually specified.
A: unknown amplitude, −∞ < A < ∞, of the radar return signal, e.g., A ∈ [0, ∞).
x(t): the observations, formed from the sum of the radar return A s(t) and the random noise.

When there are two possible decisions which can be made, a hypothesis test is defined as a test of the null hypothesis, H, versus the alternative hypothesis, K. Together, H and K are called hypotheses and represent the decisions which can be made.

Hypothesis test for the communications problem: H: A = −1 (binary 0 sent) versus K: A = +1 (binary 1 sent).

Hypothesis test for the radar problem: H: A = 0 (target absent) versus K: A > 0 (target present).

On the basis of the observations, we have to decide between H and K .
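To make the two signal models concrete, the short sketch below (an added illustration, not part of the original notes) generates observations under each model; the pulse shape, amplitudes and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
s = np.ones(N)          # pulse shape s(t), here a simple rectangular pulse (arbitrary choice)
sigma = 0.5             # noise standard deviation (arbitrary choice)

# Communications model: A is -1 or +1 (binary 0 or 1 sent)
A_comms = rng.choice([-1, +1])
x_comms = A_comms * s + sigma * rng.standard_normal(N)

# Radar model: A = 0 (target absent) or A > 0 (target present)
A_radar = 0.0 if rng.random() < 0.5 else 1.0
x_radar = A_radar * s + sigma * rng.standard_normal(N)

print("comms bit amplitude:", A_comms, "observations:", np.round(x_comms, 2))
print("radar amplitude:   ", A_radar, "observations:", np.round(x_radar, 2))
```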

1.1.1 Simple and Composite Hypotheses

Let the observations have a probability density function (pdf) f_X(x; θ), where the observations x may be a single RV (univariate) or a collection of RVs (multivariate), and θ is the parameter (or set of parameters) of the distribution. The parameter space Θ is defined as the set of all values which θ can take, so θ ∈ Θ. Likewise, the observation space Ω is defined as the set of all values which x can take, so x ∈ Ω. A hypothesis can be classified according to whether it is simple or composite. A simple hypothesis exactly specifies θ; e.g., θ = θ_H and θ = θ_K, where θ_H and θ_K are points, are both simple hypotheses. A composite hypothesis does not exactly specify θ; e.g., θ ∈ Θ_H and θ ∈ Θ_K, where Θ_H and Θ_K are regions, are both composite hypotheses. It is possible to have a simple or composite null hypothesis and a simple or composite alternative hypothesis.

Communications problem: θ = A and Θ = {−1, +1}. Both the null and alternative hypotheses are simple, Θ_H = θ_H = −1 and Θ_K = θ_K = +1.

Radar problem: θ = A and Θ = [0, ∞). The null hypothesis is simple since Θ_H = θ_H = {0}, a single point in the space of possible values for θ. The alternative hypothesis is composite since Θ_K = (0, ∞), a region in the space of possible values for θ.

Any combination of simple/composite null/alternative hypotheses is possible, such as a composite null and composite alternative or a composite null and simple alternative.


Figure 1.1: Simple and composite hypotheses. Θ is the region containing all the possible values of θ; a composite null specifies a region Θ_H within Θ, while a simple null specifies a single point θ = θ_H.

1.1.2 Multiple Hypothesis Tests

Consider an M-ary pulse amplitude modulation (PAM) system which can be modeled as x(t) = A s(t) + n(t). There are M possible hypotheses, H_1, ..., H_M, of which only one can be true:

H_1: A = 1,
H_2: A = 2,
...
H_{M−1}: A = M − 1,
H_M: A = M.

When there are multiple hypotheses/alternatives we have a multiple hypothesis test. Only the binary hypothesis test with one null and one alternative is considered for now.

1.2 Bayesian Approach to Hypothesis Testing

How should the decision be made? The Bayesian approach is to assign a cost to each possible outcome and then to minimise the average cost. Most other methods can be considered as special cases of the Bayesian approach. The possible outcomes and associated costs are as follows:

1. Outcome: accept K given H is true. Incorrect decision. False alarm. Type I error. Cost is C_KH.
2. Outcome: accept H given K is true. Incorrect decision. Missed detection. Type II error. Cost is C_HK.
3. Outcome: accept H given H is true. Correct decision. Cost is C_HH.
4. Outcome: accept K given K is true. Correct decision. Correct detection. Cost is C_KK.

It is assumed that the cost of an incorrect decision is greater than the cost of a correct decision, since incorrect decisions should be penalised more than correct decisions, so

C_KH > C_HH,    C_HK > C_KK.

The cost of making a correct decision is also assumed to be nonnegative, so that correct decisions are not penalised. This means that nothing is lost by making a correct decision, or that there is no gain for making a correct decision,

C_HH ≥ 0,    C_KK ≥ 0.

The outcomes listed above have been described in terms of accepting a particular hypothesis, but they can also be described in terms of rejecting the complement (opposite) hypothesis, e.g., rejecting H given H is true is a Type I error. The probabilities with which these outcomes occur have several different names:

1. Pr[Accept K | H]: probability of false alarm (P_FA), Pr[Type I error], α, level of the test.
2. Pr[Accept H | K]: probability of a missed detection, Pr[Type II error], β.
3. Pr[Accept H | H]: probability of correctly accepting the null.
4. Pr[Accept K | K]: probability of detection (P_D), power of the test.

Let the pdf of the observations under the null hypothesis (given that the null hypothesis is true) be f_X(x | H) and the pdf of the observations under the alternative hypothesis be f_X(x | K). For every value of x we have to decide between H and K. Note that in saying we decide between H and K it is implicitly assumed that only one hypothesis can be true. This means that Ω must be divided into two disjoint (non-overlapping) regions Ω_H and Ω_K. H is accepted if x ∈ Ω_H, while K is accepted if x ∈ Ω_K. Together these regions define the test, as they determine which decision is made given the observations; hence they are called the decision regions. In particular, Ω_H is called the acceptance region, since it is the region for which the null is accepted, while Ω_K is called the rejection region or the critical region, since it is the region for which the null is rejected. As the decision regions are disjoint, their intersection is the null set, Ω_H ∩ Ω_K = ∅. As the decision regions must cover the observation space Ω, their union is the observation space, Ω_H ∪ Ω_K = Ω. For binary hypothesis tests Ω_H and Ω_K are complements of each other, Ω_H = Ω̄_K and Ω_K = Ω̄_H.

Figure 1.2: Decision regions for a binary hypothesis test. The observation space Ω is split into the acceptance region Ω_H (accept H when x ∈ Ω_H) and the rejection region Ω_K (accept K when x ∈ Ω_K), with Ω_H ∩ Ω_K = ∅ and Ω_H ∪ Ω_K = Ω.

There are connections between the probabilities of these outcomes:

1. Pr[Accept H | H] = 1 − Pr[Accept K | H] = 1 − α.
2. Pr[Accept K | K] = 1 − Pr[Accept H | K] = 1 − β, so P_D = 1 − β.

To see this, take the first case, where H is true. Since a decision must be made and the only choices are to accept H or K, the sum of these two conditional probabilities must be 1,

Pr[Accept H | H] + Pr[Accept K | H] = 1.

The same argument is true for the second case above. Alternatively, using the fact that the decision regions are disjoint but cover the observation space, and that any pdf must integrate to 1, the proofs are as follows.

Proof 1

1. Pr[Accept H | H] = Pr[x ∈ Ω_H | H] = ∫_{Ω_H} f_X(x | H) dx
                    = 1 − ∫_{Ω_K} f_X(x | H) dx
                    = 1 − Pr[x ∈ Ω_K | H] = 1 − Pr[Accept K | H] = 1 − α.

2. Pr[Accept K | K] = Pr[x ∈ Ω_K | K] = ∫_{Ω_K} f_X(x | K) dx
                    = 1 − ∫_{Ω_H} f_X(x | K) dx
                    = 1 − Pr[x ∈ Ω_H | K] = 1 − Pr[Accept H | K] = 1 − β.

Finally, we need to know the a priori probabilities (prior probabilities) with which the null and alternative hypotheses occur. These are designated by P_H and P_K respectively. Obviously P_H + P_K = 1, so that one of the hypotheses must be true. Denote by C̄ the average cost of making a decision. Using conditional probabilities, C̄ can be expressed in terms of C̄_H, the average cost of making a decision conditional on H being true, and C̄_K, the average cost conditional on K being true,

C̄ = C̄_H P_H + C̄_K P_K.

As there are only two decisions which can be made, the average costs C̄_H and C̄_K can be expressed in terms of the cost of accepting either H or K. Remembering that α is the probability of accepting K given H is true, and vice versa for β, leads to

C̄_H = (1 − α) C_HH + α C_KH,
C̄_K = (1 − β) C_KK + β C_HK,

so the average cost of making a decision is

C̄ = ((1 − α) C_HH + α C_KH) P_H + ((1 − β) C_KK + β C_HK) P_K
  = C_HH P_H + C_KK P_K + α (C_KH − C_HH) P_H + β (C_HK − C_KK) P_K.

Substituting in the integral expressions for α and β gives

C̄ = C_HH P_H + C_KK P_K + (C_KH − C_HH) P_H ∫_{Ω_K} f_X(x | H) dx + (C_HK − C_KK) P_K ∫_{Ω_H} f_X(x | K) dx.

The two integrals can be combined into one integral over the region Ω_K, since ∫_{Ω} f_X(x | K) dx = 1 implies ∫_{Ω_H} f_X(x | K) dx = 1 − ∫_{Ω_K} f_X(x | K) dx,

C̄ = C_HH P_H + C_HK P_K + ∫_{Ω_K} [ (C_KH − C_HH) P_H f_X(x | H) − (C_HK − C_KK) P_K f_X(x | K) ] dx.

The average cost is comprised of a constant term independent of Ω_K and an integral term dependent on Ω_K. The average cost is minimised by choosing the decision region Ω_K which minimises the integral. The integrand itself has two terms, (C_KH − C_HH) P_H f_X(x | H) and (C_HK − C_KK) P_K f_X(x | K), which are both nonnegative since

C_KH − C_HH > 0 and C_HK − C_KK > 0,
P_H, P_K ≥ 0 are probabilities,
f_X(x | H), f_X(x | K) ≥ 0 are pdfs.

This means that, for a specific x, the term inside the integral will be negative only when (C_HK − C_KK) P_K f_X(x | K) > (C_KH − C_HH) P_H f_X(x | H), and assigning all these values of x to Ω_K will minimise the integral. So, whenever this term is negative accept K, and whenever it is positive accept H,

(C_HK − C_KK) P_K f_X(x | K) ≷ (C_KH − C_HH) P_H f_X(x | H),

where here and throughout '≷' means: accept K if the left-hand side is the larger, accept H otherwise. If for some x the integrand is exactly zero, then the cost of placing this value of x in Ω_H is the same as if it is placed in Ω_K. This means it does not matter which decision is made. In general, if the distributions are continuous then the probability of this occurring is vanishingly small. This may not be the case if the distributions are discrete, where the probability of this occurring can be non-zero. The Bayes test is usually written as

f_X(x | K) / f_X(x | H)  ≷  P_H (C_KH − C_HH) / (P_K (C_HK − C_KK)),

where the term on the LHS is the likelihood ratio (LR) L(x) and the term on the RHS is the threshold τ, so that the test can be written

L(x) ≷ τ.

The Bayes test is sometimes called the LR test. When carrying out a test in general, the value being compared to the threshold is called the test statistic. The LR is a measure of the probability that x was generated from the alternative distribution (K is true) relative to the probability that x was generated from the null distribution (H is true). To see this, consider the shaded area in Figure 1.3, which is approximately f_X(x) dx for small dx. This area is simply Pr[x < X ≤ x + dx], the probability that the RV X takes values near x. Applying this to the pdfs in the LR leads to the interpretation above.

What are the effects of changing the threshold τ? From the integral expression for the average cost, increasing τ cannot increase the size of the region Ω_K, and if Ω_K decreases, the size of Ω_H is increased by the same amount (convince/show yourself that this is true). This means K is chosen less often relative to H. The situation is reversed if τ decreases. How does this relate back to the a priori probabilities and the costs?

Figure 1.3: The shaded area under the pdf f_X(x) between x and x + dx is approximately f_X(x) dx.

1. Increasing τ is the same as increasing the ratio P_H/P_K, meaning that the probability of the null hypothesis occurring is increased relative to the probability that the alternative hypothesis occurs. So, it makes sense to accept the null hypothesis more often.

2. Increasing τ is the same as increasing the ratio C_KH/C_HK (if C_HH and C_KK are held constant), meaning the cost of incorrectly accepting the alternative hypothesis is increased relative to the cost of incorrectly accepting the null hypothesis. So again, it makes sense to accept the null hypothesis more often.

Some further points:

- If the cost of a correct decision is greater than that of an incorrect decision, the inequality in the Bayes test is reversed.
- The Bayesian approach can be generalised to deal with multiple hypotheses.

Example 1.2.1 Develop the Bayes test for testing whether an observation is N(0, 1) distributed or N(1, 1) distributed,

H: X ∼ N(0, 1),
K: X ∼ N(1, 1).

Assume that C_HK = C_KH = 1, C_HH = C_KK = 0 and P_H = P_K = 1/2. Taking the signal model as x = μ + n, where n has a standard normal distribution, N(0, 1), the above hypothesis test is equivalent to

H: μ = 0,
K: μ = 1,

so we are testing whether the mean of an observation is 0 or 1. Both the null and alternative hypotheses are simple. The distributions of the observations under the null and alternative are

f_X(x | H) = (1/√(2π)) exp(−x²/2),
f_X(x | K) = (1/√(2π)) exp(−(x − 1)²/2).

From which the LR is

L(x) = f_X(x | K) / f_X(x | H)
     = [(1/√(2π)) exp(−(x − 1)²/2)] / [(1/√(2π)) exp(−x²/2)]
     = exp(−(x − 1)²/2 + x²/2)
     = exp(x − 1/2).

The threshold is

τ = P_H (C_KH − C_HH) / (P_K (C_HK − C_KK)) = (0.5 × (1 − 0)) / (0.5 × (1 − 0)) = 1.

So the Bayes (LR) test is

exp(x − 1/2) ≷ 1,

or equivalently, by taking the natural log of both sides,

x ≷ 1/2.

Note that the former relation is expressed in terms of the LR L(x), while the latter is expressed in terms of the observation x only. The acceptance and rejection regions based on L(x) are 0 ≤ L(x) < 1 and 1 < L(x) < ∞ respectively. The acceptance and rejection regions based on x are Ω_H: −∞ < x < 1/2 and Ω_K: 1/2 < x < ∞ respectively.

α and β are found as follows:

α = Pr[x ∈ Ω_K | H] = ∫_{1/2}^{∞} f_X(x | H) dx
  = ∫_{1/2}^{∞} (1/√(2π)) exp(−x²/2) dx
  = 1 − Φ(1/2) ≈ 0.31,

where Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−u²/2) du is the standard normal cumulative distribution function.

β = Pr[x ∈ Ω_H | K] = ∫_{−∞}^{1/2} f_X(x | K) dx
  = ∫_{−∞}^{1/2} (1/√(2π)) exp(−(x − 1)²/2) dx
  = Φ(1/2 − 1) ≈ 0.31.

The probability of an incorrect decision is α P_H + β P_K = 0.31, so the probability of a correct decision is 1 − 0.31 = 0.69.
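As a check on Example 1.2.1, the short sketch below (an added illustration, assuming NumPy and SciPy are available) applies the likelihood-ratio test numerically and estimates α and β by Monte Carlo; both should come out near 0.31.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_trials = 200_000

# Bayes/LR test of Example 1.2.1: accept K when x > 1/2 (threshold tau = 1 on the LR scale)
x_H = rng.normal(0.0, 1.0, n_trials)   # observations when H is true, X ~ N(0, 1)
x_K = rng.normal(1.0, 1.0, n_trials)   # observations when K is true, X ~ N(1, 1)

alpha_mc = np.mean(x_H > 0.5)          # Monte Carlo estimate of alpha
beta_mc = np.mean(x_K <= 0.5)          # Monte Carlo estimate of beta

print("alpha: analytic", 1 - norm.cdf(0.5), "Monte Carlo", alpha_mc)
print("beta:  analytic", norm.cdf(-0.5), "Monte Carlo", beta_mc)
# Probability of an incorrect decision with P_H = P_K = 1/2:
print("P_E:", 0.5 * alpha_mc + 0.5 * beta_mc)
```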

1.2.1 Evaluating the Probabilities α and β

Depending on the test statistic, there are several different but equivalent ways to evaluate the probabilities of Type I and II error. Clearly, the test statistic should be chosen to make the evaluation as simple as possible. The three most common test statistics are listed below.

Case 1. Based on the observations x: the pdfs of x under the null and alternative are f_X(x | H) and f_X(x | K) respectively. The acceptance and rejection regions are x ∈ Ω_H and x ∈ Ω_K respectively. By definition,

α = Pr[Accept K | H] = Pr[x ∈ Ω_K | H] = ∫_{Ω_K} f_X(x | H) dx,
β = Pr[Accept H | K] = Pr[x ∈ Ω_H | K] = ∫_{Ω_H} f_X(x | K) dx.

Case 2. Based on the LR L: note that the LR is a function of the observations and is therefore a random variable with some pdf. The pdfs of L under the null and alternative are defined as f_L(L | H) and f_L(L | K) respectively. The acceptance and rejection regions are 0 ≤ L < τ and τ < L < ∞ respectively. So,

α = Pr[Accept K | H] = Pr[τ < L < ∞ | H] = ∫_{τ}^{∞} f_L(L | H) dL,
β = Pr[Accept H | K] = Pr[0 ≤ L < τ | K] = ∫_{0}^{τ} f_L(L | K) dL.

Case 3. Based on an arbitrary test statistic T: let T be some monotonically increasing transformation of the LR, so that the test is T ≷ γ. Again, the test statistic is a RV. Let its pdfs under the null and alternative be f_T(T | H) and f_T(T | K) respectively. The acceptance and rejection regions are −∞ < T < γ and γ < T < ∞ respectively. So,

α = Pr[Accept K | H] = Pr[γ < T < ∞ | H] = ∫_{γ}^{∞} f_T(T | H) dT,
β = Pr[Accept H | K] = Pr[−∞ < T < γ | K] = ∫_{−∞}^{γ} f_T(T | K) dT.

T must be a monotonically increasing (or decreasing) transformation to ensure the transformation is one-to-one, i.e., unique and hence reversible. If the transformation is decreasing, then the inequality in the Bayes test is reversed. Note that whichever test statistic is used, the general result is the same: α is the probability that the test statistic lies in the rejection region under the null, while β is the probability that the test statistic lies in the acceptance region under the alternative.
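The following sketch (an added illustration, not from the notes) checks numerically, for the Gaussian example above, that thresholding the observation x at 1/2, the LR L(x) = exp(x − 1/2) at τ = 1, and the log-LR at ln τ = 0 all give the same false alarm probability.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 100_000)      # observations drawn under H: X ~ N(0, 1)

L = np.exp(x - 0.5)                    # likelihood ratio for N(1,1) vs N(0,1)
logL = x - 0.5                         # log-likelihood ratio (monotone in L)

alpha_x = np.mean(x > 0.5)             # test on the observation, threshold 1/2
alpha_L = np.mean(L > 1.0)             # test on the LR, threshold tau = 1
alpha_logL = np.mean(logL > 0.0)       # test on the log-LR, threshold ln(tau) = 0

print(alpha_x, alpha_L, alpha_logL)    # all three estimates coincide
```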

1.2.2 Difficulties with the Bayes Test

1. The a priori probabilities P_H, P_K may be unknown.
2. The costs may be difficult to evaluate or set objectively.

Two special cases of the Bayes test which deal with these difficulties are the ideal observer test and the Neyman-Pearson test.

1.3 Ideal Observer Test

The ideal observer test minimises the probability of error, P_E = α P_H + β P_K:

P_E = α P_H + β P_K
    = P_H ∫_{Ω_K} f_X(x | H) dx + P_K ∫_{Ω_H} f_X(x | K) dx
    = P_K + ∫_{Ω_K} [ P_H f_X(x | H) − P_K f_X(x | K) ] dx.

To minimise P_E, choose Ω_K to contain the x for which P_H f_X(x | H) − P_K f_X(x | K) < 0, giving

f_X(x | K) / f_X(x | H) ≷ P_H / P_K.

This type of test is useful in communications problems where the aim is to minimise the bit error rate (BER), which is exactly P_E. As the ideal observer test is a likelihood ratio test, it can be interpreted as a Bayes test. The assumption of the equivalent Bayes test is C_KH − C_HH = C_HK − C_KK, which occurs, e.g., when

1. the costs of Type I and II errors are equal, C_HK = C_KH, and
2. the cost of a correct decision is zero, C_HH = C_KK = 0.

In addition, it can often be assumed that P_H = P_K = 1/2. This occurs in communications when the transmitter can be assumed to be an ideal source of information,

P_H = Pr[bit sent is 0] = P_K = Pr[bit sent is 1] = 1/2.

The resulting Bayes test has a threshold τ = 1,

f_X(x | K) / f_X(x | H) ≷ 1.

In other words, choose the hypothesis with the greatest likelihood, comparing f_X(x | K) against f_X(x | H).

Example 1.3.1 Consider a binary PAM communications system where the signal is observed in additive Gaussian noise, x = As + n, where x is the observation, A ∈ {−1, +1} represents the bit sent from an ideal source of information, s > 0 is the signal amplitude and n is zero-mean Gaussian noise with variance σ². We want to decide which bit was sent, A = −1 or A = +1, so the hypothesis test is

H: A = −1,
K: A = +1.

Since n ∼ N(0, σ²), As + n is normal (Gaussian and normal are used interchangeably) with mean As and variance σ². The distributions under both hypotheses are

f_X(x | H) = N(−s, σ²) = (1/√(2πσ²)) exp(−(x + s)²/(2σ²)),
f_X(x | K) = N(+s, σ²) = (1/√(2πσ²)) exp(−(x − s)²/(2σ²)).

The LR is L(x) = exp(2xs/σ²), so the ideal observer test is

exp(2xs/σ²) ≷ 1,

or equivalently, by taking the natural log,

x ≷ 0.

The acceptance and rejection regions are Ω_H: −∞ < x < 0 and Ω_K: 0 < x < ∞. In communications the usual measure of performance is the bit error rate (BER), which is the probability of an error, P_E. Given that the bits sent are equiprobable (P_H = P_K = 1/2) and that for the ideal observer test α = β, we have P_E = α/2 + β/2 = α = β. Using α to obtain P_E gives

P_E = α = ∫_{Ω_K} f_X(x | H) dx
    = ∫_{0}^{∞} (1/√(2πσ²)) exp(−(x + s)²/(2σ²)) dx
    = 1 − Φ(s/σ).

As the signal-to-noise ratio (SNR) is s²/σ², P_E = 1 − Φ(√SNR).
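A quick numerical check of this result (an added sketch assuming NumPy/SciPy; the amplitude and noise values are arbitrary): simulate the ideal observer test x ≷ 0 and compare the measured BER with 1 − Φ(√SNR).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
s, sigma = 1.0, 1.0                      # signal amplitude and noise std (arbitrary)
n_bits = 500_000

A = rng.choice([-1.0, +1.0], n_bits)     # equiprobable bits from an ideal source
x = A * s + sigma * rng.standard_normal(n_bits)

decisions = np.where(x > 0, +1.0, -1.0)  # ideal observer test: accept K when x > 0
ber_sim = np.mean(decisions != A)

snr = s**2 / sigma**2
ber_theory = 1 - norm.cdf(np.sqrt(snr))
print("simulated BER:", ber_sim, "theory:", ber_theory)
```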

Example 1.3.2 Now extend the scenario in the previous example so that we have N observations, x(t) = As + n(t), t = 1, ..., N. The n(t) are now independent and identically distributed random variables from a N(0, σ²) distribution. It follows that the observations are also independent and identically distributed random variables, x(t) ∼ N(As, σ²). To form the LR we require the joint pdf of the observations under each hypothesis,

f_{X(1),...,X(N)}(x(1), ..., x(N) | H) and f_{X(1),...,X(N)}(x(1), ..., x(N) | K).

Recall the result that if X and Y are independent random variables with pdfs f_X(x) and f_Y(y) respectively, then the joint pdf of X and Y is f_XY(x, y) = f_X(x) f_Y(y). Applying this result here gives

f_X(x | H) = f_{X(1),...,X(N)}(x(1), ..., x(N) | H)
           = Π_{t=1}^{N} f_{X(t)}(x(t) | H)
           = Π_{t=1}^{N} (1/√(2πσ²)) exp(−(x(t) + s)²/(2σ²)),

f_X(x | K) = f_{X(1),...,X(N)}(x(1), ..., x(N) | K)
           = Π_{t=1}^{N} f_{X(t)}(x(t) | K)
           = Π_{t=1}^{N} (1/√(2πσ²)) exp(−(x(t) − s)²/(2σ²)).

The LR is L(x) = exp( (2s/σ²) Σ_{t=1}^{N} x(t) ), so the ideal observer test is

exp( (2s/σ²) Σ_{t=1}^{N} x(t) ) ≷ 1,

or equivalently

T(x) = Σ_{t=1}^{N} x(t) ≷ 0.

The acceptance and rejection regions are Ω_H: −∞ < T(x) < 0 and Ω_K: 0 < T(x) < ∞. To find α, and hence P_E, the distribution of the test statistic T(x) under H and K must be found. In either case, T(x) is a sum of N independent and identically distributed N(As, σ²) random variables. Under H, T(x) ∼ N(−Ns, Nσ²), and under K, T(x) ∼ N(Ns, Nσ²).

α = Pr[0 < T(x) < ∞ | H]
  = ∫_{0}^{∞} (1/√(2πNσ²)) exp(−(T + Ns)²/(2Nσ²)) dT
  = 1 − Φ(√N s/σ).

The same result is obtained for β, so P_E = 1 − Φ(√(N·SNR)), where SNR = s²/σ².

1. As N increases, the BER decreases.
2. As the SNR increases, the BER decreases.

Figure 1.4: Binary PAM example: bit error rate P_E versus SNR (dB). The curves shift downwards with increasing N.
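The curve in Figure 1.4 can be reproduced with a few lines (an added sketch; matplotlib is assumed to be available):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

snr_db = np.linspace(-15, 10, 200)
snr = 10 ** (snr_db / 10)

for N in [1, 2, 4, 8]:                         # number of observations per bit
    ber = 1 - norm.cdf(np.sqrt(N * snr))       # P_E = 1 - Phi(sqrt(N * SNR))
    plt.semilogy(snr_db, ber, label=f"N = {N}")

plt.xlabel("SNR (dB)")
plt.ylabel("BER ($P_E$)")
plt.legend()
plt.show()
```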

1.4 Neyman-Pearson Test

The Neyman-Pearson test uses a different criterion of optimality: it maximises the power of the test while holding the P_FA constant at level α_0,

max (1 − β) subject to α = α_0.

The Neyman-Pearson test is useful in detection problems like radar where neither the a priori probabilities nor the costs are known. In addition, the Neyman-Pearson test is simply better suited to such problems from a practical point of view. Using the method of Lagrange multipliers leads to a Bayes test with threshold τ,

f_X(x | K) / f_X(x | H) ≷ τ,

with τ being found from the constraint α = α_0, either through the distribution of the observations under the null hypothesis,

α_0 = ∫_{Ω_K} f_X(x | H) dx,

or through the distribution of the LR under the null hypothesis,

α_0 = ∫_{τ}^{∞} f_L(L | H) dL.

Note that maximising the power is equivalent to minimising the probability of Type II error, β. Although not discussed here, the reverse Neyman-Pearson test instead minimises the P_FA of the test while holding the power constant at level 1 − β_0. The Neyman-Pearson test is also important because of its relation to some special classes of hypothesis tests: most powerful tests and uniformly most powerful tests.

Definition 1 (Most Powerful Tests) A test T_0 of level α_0, α ≤ α_0, is most powerful (MP) if there is no other test T_1 with α ≤ α_0 such that 1 − β_1 > 1 − β_0, or equivalently, P_D(T_1) > P_D(T_0); i.e., the power is maximised.

Definition 2 (Uniformly Most Powerful Tests) A test is uniformly most powerful (UMP) of level α_0 if it is most powerful independent of any unknown parameters.

By definition the Neyman-Pearson test maximises the power subject to α = α_0, so it will be UMP, and hence MP, if the test is independent of any unknown parameters. This is likely, but not necessarily, to be the case when the null hypothesis is simple.

1.4.1 Derivation of Neyman-Pearson Detector


By definition the Neyman-Pearson detector maximises the power 1 − β for a given level α = α_0,

max (1 − β) subject to α = α_0.

Introducing the Lagrange multiplier λ to account for the constraint gives the following cost function C, which must be maximised with respect to the test and λ:

C = (1 − β) + λ (α_0 − α)
  = ∫_{Ω_K} f_X(x | K) dx + λ ( α_0 − ∫_{Ω_K} f_X(x | H) dx )
  = λ α_0 + ∫_{Ω_K} [ f_X(x | K) − λ f_X(x | H) ] dx.

For λ ≥ 0, choose K when

f_X(x | K) / f_X(x | H) ≷ λ.

The Lagrange multiplier λ, i.e., the threshold τ, is found by differentiating C with respect to λ, giving

α_0 = ∫_{Ω_K} f_X(x | H) dx.

The connection between τ and α_0 is clearer when considering the likelihood ratio,

α_0 = ∫_{τ}^{∞} f_L(L | H) dL.
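As an illustration of solving α_0 = ∫_{Ω_K} f_X(x | H) dx for the threshold (an added sketch, using the scalar Gaussian mean-shift problem of Example 1.2.1 with an assumed level α_0 = 0.1):

```python
import numpy as np
from scipy.stats import norm

alpha0 = 0.1                      # required false alarm probability (assumed level)

# H: x ~ N(0, 1), K: x ~ N(1, 1); the LR is monotone in x, so the NP test is x > gamma.
gamma = norm.ppf(1 - alpha0)      # solves alpha0 = Pr[x > gamma | H]
tau = np.exp(gamma - 0.5)         # corresponding threshold on the LR L(x) = exp(x - 1/2)
power = 1 - norm.cdf(gamma - 1.0) # P_D = Pr[x > gamma | K], K shifts the mean to 1

print("gamma:", gamma, "tau:", tau, "power (1 - beta):", power)
```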

1.5 Comparisons

The following table makes a quick comparison of the three tests in terms of a priori knowledge and settings. Note that the Neyman-Pearson test essentially replaces knowledge of the a priori probabilities and the costs with a set level.

Test             Costs C_HK, C_KH, C_HH, C_KK   A priori probabilities P_H, P_K   Null & alternative pdfs f_X(x | H), f_X(x | K)   Set level α_0
Bayesian         yes                            yes                               yes                                              -
Ideal observer   -                              yes                               yes                                              -
Neyman-Pearson   -                              -                                 yes                                              yes

1.6 Receiver Operating Characteristics

The receiver operating characteristic (ROC) gives P_D as a function of P_FA, P_D(P_FA). The ROC is useful in that it allows different tests to be directly compared to each other on the same basis, P_D versus P_FA. Clearly, a test T_1 whose ROC is never less than that of another test T_2 is at least as powerful as T_2. The ROC possesses several useful properties:

1. The ROC is bounded in [0, 1]: 0 ≤ P_D(P_FA) ≤ 1 for 0 ≤ P_FA ≤ 1. This follows from the fact that the ROC is a plot of a probability versus a probability.

2. The points (0, 0) and (1, 1) lie on the ROC.

Proof 2 Let τ → 0 and τ → ∞ in

P_FA = ∫_{τ}^{∞} f_L(L | H) dL,    P_D = ∫_{τ}^{∞} f_L(L | K) dL;

then we have P_D = P_FA = 1 as τ → 0 and P_D = P_FA = 0 as τ → ∞.

3. The most useful property of the ROC is that at any point (P_FA, P_D) on the curve, the slope is the threshold τ for a LR test with the same (P_FA, P_D),

dP_D/dP_FA = τ.

Proof 3 It is straightforward to show that

dP_D/dP_FA = d(1 − β)/dα = −(dβ/dτ)/(dα/dτ).

The derivatives dα/dτ and dβ/dτ are easily found. By definition,

α = ∫_{τ}^{∞} f_L(L | H) dL = 1 − ∫_{0}^{τ} f_L(L | H) dL

and

β = ∫_{0}^{τ} f_L(L | K) dL,

so that

dα/dτ = −f_L(τ | H) and dβ/dτ = f_L(τ | K),

from which it follows that

dP_D/dP_FA = f_L(τ | K) / f_L(τ | H).

Now it must be proven that the right-hand side of the above equation is equal to τ. This is done by finding an alternative expression for dβ/dτ,

β = ∫_{Ω_H} f_X(x | K) dx = ∫_{Ω_H} [f_X(x | K) / f_X(x | H)] f_X(x | H) dx = ∫_{Ω_H} L(x) f_X(x | H) dx.

Making the change of variable y = L(x) gives

β = ∫_{0}^{τ} y f_L(y | H) dy.

Hence

dβ/dτ = τ f_L(τ | H),

and equating the two different expressions for dβ/dτ gives

f_L(τ | K) / f_L(τ | H) = τ.

4. The ROC is a nondecreasing function, dP_D/dP_FA ≥ 0. This follows from the previous property and the fact that τ ≥ 0. Another simple way to arrive at this property is to consider the effect of changing the threshold τ on P_FA and P_D,

P_FA = ∫_{τ}^{∞} f_L(L | H) dL,    P_D = ∫_{τ}^{∞} f_L(L | K) dL.

If τ decreases, then neither P_FA nor P_D can decrease, which means P_D is a nondecreasing function of P_FA.

5. Another useful property of the ROC is that it is concave: dP_D/dP_FA is a nonincreasing function, or equivalently d²P_D/dP_FA² ≤ 0.

Proof 4 Differentiating the ROC twice gives

d²P_D/dP_FA² = d(dP_D/dP_FA)/dP_FA = dτ/dP_FA,

and since

dP_FA/dτ = −f_L(τ | H),

dτ/dP_FA = −1/f_L(τ | H) ≤ 0.
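A small numerical illustration of property 3 (an added sketch, again using the N(0, 1) versus N(1, 1) problem): the slope of the ROC at the operating point of a threshold-τ LR test is approximately τ.

```python
import numpy as np
from scipy.stats import norm

def roc_point(tau):
    """(P_FA, P_D) of the LR test L(x) = exp(x - 1/2) > tau for N(0,1) vs N(1,1)."""
    gamma = np.log(tau) + 0.5          # equivalent threshold on x
    return 1 - norm.cdf(gamma), 1 - norm.cdf(gamma - 1.0)

tau = 1.5
pfa1, pd1 = roc_point(tau * 1.01)      # nudge the threshold to estimate the slope
pfa2, pd2 = roc_point(tau * 0.99)
slope = (pd2 - pd1) / (pfa2 - pfa1)
print("numerical slope:", slope, "threshold tau:", tau)
```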

1.6.1 Alternative Interpretation of a Detector and its ROC

A detector can simply be viewed as some function D(·) which takes as its input the observations x and outputs a unique numerical value corresponding to the decision which is made. In a binary hypothesis testing problem the simplest assignment is

D(x) = 1: choose K,
D(x) = 0: choose H.

Some simple relations follow:

Ω_H = {x : D(x) = 0},    Ω_K = {x : D(x) = 1},
α = E[D(x) | H],    1 − β = E[D(x) | K].

This interpretation of a detector is useful when the test is randomised and D(x) takes values in [0, 1]. In this case the decision is no longer simply that the null is accepted or rejected; instead, D(x) is the probability that the null is rejected. Randomised tests are not considered here.

Example 1.6.1 Consider a radar system where the radar return signal is observed in additive Gaussian noise,

x(t) = A + n(t),

where x(t) are the observations, A ≥ 0 is the radar return signal and n(t) are independent and identically distributed N(0, σ²) random variables. To decide whether a target is present the hypothesis test is

H: A = 0, target absent,
K: A > 0, target present.

Since x(t) ∼ N(A, σ²),

f_X(x(t) | H) = N(0, σ²) = (1/√(2πσ²)) exp(−x(t)²/(2σ²)),
f_X(x(t) | K) = N(A, σ²) = (1/√(2πσ²)) exp(−(x(t) − A)²/(2σ²)).

The LR is

L(x) = exp( (A/σ²) Σ_{t=1}^{N} x(t) − N A²/(2σ²) ),

so the Neyman-Pearson test is

exp( (A/σ²) Σ_{t=1}^{N} x(t) − N A²/(2σ²) ) ≷ τ,

or equivalently, by taking the natural log,

T(x) = Σ_{t=1}^{N} x(t) ≷ γ = (σ²/A) ln τ + N A/2.

Under H, T(x) ∼ N(0, Nσ²), and under K, T(x) ∼ N(NA, Nσ²),

α = Pr[γ < T(x) < ∞ | H]
  = ∫_{γ}^{∞} (1/√(2πNσ²)) exp(−T²/(2Nσ²)) dT
  = 1 − Φ(γ/(√N σ)).

For the Neyman-Pearson test, the threshold is found from the expression for α,

γ = √N σ Φ⁻¹(1 − α).

The threshold depends on N, α and σ. N and α will always be given, so if the variance of the noise is known the test will be UMP. If σ² is unknown, the threshold, and therefore the test, will depend on the unknown parameter σ², meaning the test will not be UMP.

β = Pr[−∞ < T(x) < γ | K]
  = ∫_{−∞}^{γ} (1/√(2πNσ²)) exp(−(T − NA)²/(2Nσ²)) dT
  = Φ( (γ − NA)/(√N σ) ).

As the SNR is A²/σ², substituting γ = √N σ Φ⁻¹(1 − α) into the expression for P_D = 1 − β gives

P_D = 1 − Φ( Φ⁻¹(1 − P_FA) − √(N·SNR) ).

1. For a given P_FA, increasing either the SNR or N increases P_D.
2. Increasing SNR or N means the same detection rate can be obtained at a lower P_FA.

Figure 1.5: Radar detection example: ROC curve, P_D versus P_FA. The curves move towards the upper-left corner (higher P_D at a given P_FA) as SNR or N increases.
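The ROC of Figure 1.5 follows directly from the expression for P_D (an added sketch; matplotlib assumed, the N·SNR values are arbitrary):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

pfa = np.linspace(1e-4, 1.0, 500)

for n_snr in [0.5, 1.0, 2.0, 4.0]:                    # values of N * SNR
    pd = 1 - norm.cdf(norm.ppf(1 - pfa) - np.sqrt(n_snr))
    plt.plot(pfa, pd, label=f"N*SNR = {n_snr}")

plt.xlabel("$P_{FA}$")
plt.ylabel("$P_D$")
plt.legend()
plt.show()
```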

1.7 UMP Tests for Normal Random Vectors

Collect the observations into an N-dimensional vector X. Under the null and alternative, let X follow a multivariate normal distribution, but possibly with different means and covariances,

H: X ∼ MVN(μ_H, Σ_H),
K: X ∼ MVN(μ_K, Σ_K),

where μ_H, μ_K are the means of X under H and K, and Σ_H, Σ_K are the covariances of X under H and K. Given that the pdf of an N-variate MVN(μ, Σ) RV is

f_X(x; μ, Σ) = (2π)^{−N/2} |Σ|^{−1/2} exp( −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) ),

the LR is (show it)

L(x) = (|Σ_H| / |Σ_K|)^{1/2} exp( −(1/2)(x − μ_K)ᵀ Σ_K⁻¹ (x − μ_K) + (1/2)(x − μ_H)ᵀ Σ_H⁻¹ (x − μ_H) ).

Taking the natural log of the LR gives the log-likelihood ratio (LLR)

ℓ(x) = ln L(x) = (1/2) ln(|Σ_H| / |Σ_K|) − (1/2)(x − μ_K)ᵀ Σ_K⁻¹ (x − μ_K) + (1/2)(x − μ_H)ᵀ Σ_H⁻¹ (x − μ_H),

where |Σ| is the determinant of the matrix Σ. Note that the logarithm is a monotonically increasing transformation. Collecting all terms in x into a test statistic T(x) gives

T(x) = (1/2) xᵀ (Σ_H⁻¹ − Σ_K⁻¹) x  +  (μ_Kᵀ Σ_K⁻¹ − μ_Hᵀ Σ_H⁻¹) x,

where the first term is the quadratic part and the second the linear part, so that the Neyman-Pearson test is

T(x) ≷ γ,

where

γ = ln τ + (1/2) ln(|Σ_K| / |Σ_H|) + (1/2)( μ_Kᵀ Σ_K⁻¹ μ_K − μ_Hᵀ Σ_H⁻¹ μ_H ).
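A direct implementation of this quadratic-plus-linear test statistic (an added sketch; the means, covariances, LR threshold τ and test data are arbitrary assumptions):

```python
import numpy as np

def gaussian_llr_test(x, mu_H, cov_H, mu_K, cov_K, tau):
    """Return True if K is accepted: T(x) >= gamma for the binary MVN test."""
    iH, iK = np.linalg.inv(cov_H), np.linalg.inv(cov_K)
    # quadratic part + linear part
    T = 0.5 * x @ (iH - iK) @ x + (mu_K @ iK - mu_H @ iH) @ x
    gamma = (np.log(tau)
             + 0.5 * np.log(np.linalg.det(cov_K) / np.linalg.det(cov_H))
             + 0.5 * (mu_K @ iK @ mu_K - mu_H @ iH @ mu_H))
    return T >= gamma

rng = np.random.default_rng(4)
N = 3
mu_H, mu_K = np.zeros(N), np.ones(N)             # example means (assumed)
cov_H, cov_K = np.eye(N), 2.0 * np.eye(N)        # example covariances (assumed)
x = rng.multivariate_normal(mu_K, cov_K)         # a draw from the alternative
print("accept K:", gaussian_llr_test(x, mu_H, cov_H, mu_K, cov_K, tau=1.0))
```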

Example 1.7.1 We wish to detect a known deterministic signal s in zero-mean additive Gaussian noise n with known distribution, n ∼ MVN(0, Σ). From this information the signal model is

x = μ + n,

and the relevant hypothesis test is

H: μ = 0,
K: μ = s.

This gives the distributions of the observations under H and K as f_X(x | H) = MVN(0, Σ) and f_X(x | K) = MVN(s, Σ) respectively, i.e., the problem is equivalent to

H: X ∼ MVN(0, Σ),
K: X ∼ MVN(s, Σ).

Matching this problem up with the UMP test for normal random vectors gives the test statistic as T(x) = sᵀ Σ⁻¹ x and the threshold as γ = ln τ + (1/2) sᵀ Σ⁻¹ s.

1.7.1 Matched Filter Interpretation

Consider Σ = I in the previous example, so that the elements of n are independent and identically distributed N(0, 1) random variables. Then

T(x) = sᵀ x = Σ_{n=1}^{N} x(n) s(n),    γ = ln τ + (1/2) sᵀ s.

This is the matched filter, a UMP test for detecting a known deterministic signal s in i.i.d. Gaussian noise with known variance. It is also known as the linear correlator detector. The detector can be implemented as a discrete-time linear system by passing the observations through a filter and then downsampling at rate N. Let the discrete-time filter be

h(n) = s(N − n), n = 0, ..., N − 1,

where s(N − n) is the (N − n)th element of s.

Figure 1.6: Implementation of the matched filter as a discrete-time linear system: the input x(n + 1) is passed through the filter h(n) and the output is downsampled at rate N.

Before the sampler, the output of the filter h(n) is given by the convolution

y(n) = Σ_{m=0}^{N−1} x(m + 1) h(n − m) = Σ_{m=0}^{N−1} x(m + 1) s(N − n + m);

after the sampler we have

y(N − 1) = Σ_{m=0}^{N−1} x(m + 1) s(N − (N − 1) + m)
         = Σ_{m=0}^{N−1} x(m + 1) s(m + 1)
         = Σ_{m=1}^{N} x(m) s(m)
         = sᵀ x.
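The two equivalent implementations, direct correlation sᵀ x and filtering followed by sampling at n = N − 1, can be compared numerically (an added sketch; the signal is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 16
s = np.sin(2 * np.pi * np.arange(1, N + 1) / N)   # known signal s(1..N) (arbitrary)
x = s + rng.standard_normal(N)                    # observations under K with sigma = 1

# Direct correlator: T(x) = s^T x
T_direct = s @ x

# Filter h(n) = s(N - n), n = 0..N-1, then take the output at n = N - 1
h = s[::-1]
y = np.convolve(x, h)                             # y(n) = sum_m x(m) h(n - m)
T_filter = y[N - 1]

print(T_direct, T_filter)                         # identical up to rounding
```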

1.7.2 Pre-Whitening

Consider the same detection problem which led to the matched filter, but with Σ = Σ_H = Σ_K ≠ I. If Σ is known, the problem can be transformed back into one where the covariance is I by a pre-whitening transformation.

1. Take the Cholesky decomposition of Σ⁻¹ to obtain a triangular matrix C such that Cᵀ C = Σ⁻¹. The Cholesky decomposition allows us to represent Σ⁻¹ as the product of two triangular matrices, Cᵀ and C, one being the transpose of the other.

2. Pre-whiten the observations by multiplying x by C, x̃ = C x.

3. Consider what happens to the signal model:

C x = C μ + C n,
x̃ = μ̃ + ñ.

The covariance of ñ is

Cov[ñ] = Cov[C n] = C Cov[n] Cᵀ = C Σ Cᵀ = I,

since Cᵀ C = Σ⁻¹ gives Cᵀ C Σ = I, so Cᵀ C Σ Cᵀ = Cᵀ and hence C Σ Cᵀ = I.

By pre-whitening the observations, the elements of ñ become independent N(0, 1) RVs. Pre-whitening brings us back to the MF scenario with x replaced by the pre-whitened observations x̃ and s replaced by s̃ = C s. Hence, the UMP test compares the test statistic T̃(x̃) = s̃ᵀ x̃ with the threshold γ = ln τ + (1/2) s̃ᵀ s̃.
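A sketch of the pre-whitened matched filter (an added example; the covariance and signal are arbitrary assumptions). Note that NumPy's Cholesky routine returns a lower-triangular L with L Lᵀ = Σ, so C = L⁻¹ satisfies Cᵀ C = Σ⁻¹.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 4
s = np.array([1.0, 2.0, -1.0, 0.5])               # known signal (arbitrary)
Sigma = np.array([[2.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.2, 0.0],
                  [0.0, 0.2, 1.5, 0.3],
                  [0.0, 0.0, 0.3, 1.0]])          # known noise covariance (arbitrary)

L = np.linalg.cholesky(Sigma)                     # L @ L.T = Sigma
C = np.linalg.inv(L)                              # C.T @ C = Sigma^{-1}

n = rng.multivariate_normal(np.zeros(N), Sigma)
x = s + n                                         # observations under K

x_w, s_w = C @ x, C @ s                           # pre-whitened observations and signal
T = s_w @ x_w                                     # matched filter on whitened data
# Equivalent form without explicit whitening: s^T Sigma^{-1} x
T_direct = s @ np.linalg.solve(Sigma, x)
print(T, T_direct)                                # the two agree up to rounding
```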

1.7.3 Probability of Detection for the MF with Pre-Whitening

The distributions of the pre-whitened observations under H and K are

H: X̃ ∼ MVN(0, I),
K: X̃ ∼ MVN(s̃, I).

The test statistic, T̃(x̃) = s̃ᵀ x̃, is a linear transform of the pre-whitened Gaussian observations, so it is not hard to show (try it) that

H: T̃(X̃) ∼ N(0, s̃ᵀ s̃),
K: T̃(X̃) ∼ N(s̃ᵀ s̃, s̃ᵀ s̃).

Let d² = s̃ᵀ s̃. The P_FA is

P_FA = ∫_{γ}^{∞} (1/√(2πd²)) exp(−T̃²/(2d²)) dT̃ = 1 − Φ(γ/d),

so that the threshold is γ = d Φ⁻¹(1 − P_FA). Since

P_D = ∫_{γ}^{∞} (1/√(2πd²)) exp(−(T̃ − d²)²/(2d²)) dT̃ = 1 − Φ((γ − d²)/d),

the probability of detection is

P_D = 1 − Φ( Φ⁻¹(1 − P_FA) − d ).

The SNR is d²/N, so again we have the equation

P_D = 1 − Φ( Φ⁻¹(1 − P_FA) − √(N·SNR) ),

which defines the ROC. The results are exactly the same without pre-whitening; just replace x̃ by x and s̃ by s.
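The detection probability formula can be checked by simulation (an added sketch; P_FA, the signal and the covariance are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
s = np.array([1.0, 0.5, -0.5, 1.0])               # known signal (arbitrary)
Sigma = np.diag([2.0, 1.0, 1.5, 0.5])             # known noise covariance (arbitrary)
pfa = 0.05
n_trials = 100_000

d2 = s @ np.linalg.solve(Sigma, s)                # d^2 = s^T Sigma^{-1} s
gamma = np.sqrt(d2) * norm.ppf(1 - pfa)           # threshold for the chosen P_FA

noise = rng.multivariate_normal(np.zeros(len(s)), Sigma, size=n_trials)
T = (noise + s) @ np.linalg.solve(Sigma, s)       # statistic s^T Sigma^{-1} x under K

pd_sim = np.mean(T > gamma)
pd_theory = 1 - norm.cdf(norm.ppf(1 - pfa) - np.sqrt(d2))
print("simulated P_D:", pd_sim, "theory:", pd_theory)
```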

1.8 Bayes Tests for Multiple Hypotheses

Assume that there are M hypotheses, H_1, ..., H_M, and a decision must be made as to which one is true. A Bayesian approach would be to define a cost for each possible outcome and then to minimise the average cost of making a decision. Statistically, an outcome is defined as the intersection of the two events 1. accept H_i, and 2. H_j is true,

Outcome: (accept H_i) ∩ (H_j is true),  i, j = 1, ..., M.

A correct decision is then the outcome (accept H_i) ∩ (H_i is true), while an incorrect decision is the outcome (accept H_i) ∩ (H_j is true), i ≠ j. In total there are M² decisions which can be made, one for every combination of (accept H_i), (H_j is true), i, j = 1, ..., M. Within the set of decisions there are M correct decisions and M(M − 1) incorrect decisions. An indicator function will be used to record which outcome occurred,

δ_ij = 1 if (accept H_i) ∩ (H_j is true) occurs,
δ_ij = 0 otherwise,

i.e., δ_ii = 1 corresponds to a correct decision, while δ_ij = 1 with i ≠ j corresponds to an incorrect decision. If the cost of the corresponding outcome is C_ij, then the cost of making a decision will be

C = Σ_{i,j} δ_ij C_ij.

This is the cost, not the average cost. To obtain the average cost, take the expectation,

E[C] = E[ Σ_{i,j} δ_ij C_ij ] = Σ_{i,j} E[δ_ij] C_ij.

It should be obvious that since δ_ij is an indicator function taking only the values 0 and 1, its expected value is the probability that the event corresponding to δ_ij = 1 occurs. From now on, assume that when C̄ appears, the averaging has already been done. The average cost is then

C̄ = Σ_{i,j} Pr[choose H_i, H_j true] C_ij = Σ_{i,j} Pr[H_i | H_j] Pr[H_j] C_ij.

Pr[H_i | H_j] is the conditional probability of accepting hypothesis H_i given that H_j is true. Pr[H_j] is the a priori probability that H_j is true (a priori means before observing x) and represents knowledge about the system which is assumed to be known. To go further, it is necessary to consider with what information the decision will be made; of course, the decision is made using the information contained in the observations x. It follows that for every possible value of x a decision must be made. Denote by Ω the set of all values which x can take, i.e., x ∈ Ω. Ω must then be divided into M regions, Ω_1, ..., Ω_M, one for each hypothesis. If x ∈ Ω_i, then H_i is accepted. The regions Ω_i satisfy two important properties:

1. Ω = ∪_{i=1,...,M} Ω_i,
2. Ω_i ∩ Ω_j = ∅, i ≠ j.

The first ensures that the entire set Ω is covered by the Ω_i's and is another way of saying that a decision is made for every x ∈ Ω. The second ensures that the regions are disjoint (do not overlap) and is another way of saying that a specific value of x leads to a single decision, i.e., only one hypothesis is accepted. In some problems it is possible that more than a single hypothesis can be true, but these types of tests, which are also called multiple hypothesis tests, are not considered here. The regions Ω_i are called decision regions as they determine which decision is made. As for binary hypothesis testing, acceptance and rejection/critical regions can be defined with respect to hypothesis H_i as

Acceptance region: Ω_i,
Rejection/critical region: Ω̄_i = ∪_{j ≠ i} Ω_j,

where Ω̄_i is the complement of Ω_i, so that Ω = Ω_i ∪ Ω̄_i. Now it is possible to express Pr[H_i | H_j] in terms of the observations and the decision regions. From the discussion of decision regions above it follows by definition that

Pr[H_i | H_j] = Pr[x ∈ Ω_i | H_j].

Figure 1.7: Decision regions for a multiple hypothesis test: the observation space Ω is divided into the disjoint regions Ω_1, ..., Ω_M.

In words, the conditional probability of accepting H_i given H_j is true is equal to the conditional probability that x lies in the acceptance region for H_i given H_j is true. It is a basic property of a pdf that

Pr[x ∈ Ω_i] = ∫_{Ω_i} f_X(x) dx;

including the conditioning on H_j gives

Pr[x ∈ Ω_i | H_j] = ∫_{Ω_i} f_X(x | H_j) dx.

Using this in the average cost expression gives

C̄ = Σ_{i,j} ( ∫_{Ω_i} f(x | H_j) dx ) Pr[H_j] C_ij
  = Σ_{i} ∫_{Ω_i} ( Σ_{j} f(x | H_j) Pr[H_j] C_ij ) dx.

In the above expression the only free variables are the regions Ω_i, so the cost is minimised by the way each x is assigned to the disjoint regions Ω_i. Since x can only be an element of one Ω_i, it makes sense to assign x to the region Ω_i such that the integrand

Σ_{j} f(x | H_j) Pr[H_j] C_ij

is minimised over i. The optimal Bayes decision is then to accept H_i, or equivalently, assign x to Ω_i, when

Σ_{j} f(x | H_j) Pr[H_j] C_ij < Σ_{j} f(x | H_j) Pr[H_j] C_kj,  for all k ≠ i.

Using M = 2 in the above expression will give the Bayes test for binary hypotheses. Another interpretation of this rule is possible by making use of Bayes theorem to show that

f(x | H_j) Pr[H_j] = f(x, H_j) = Pr[H_j | x] f(x).

Substituting this into the Bayes decision rule and noting that f(x) is a common factor gives

Σ_{j} Pr[H_j | x] C_ij < Σ_{j} Pr[H_j | x] C_kj,  for all k ≠ i.

This is the Bayes decision rule expressed in terms of the a posteriori probabilities Pr[H_j | x], the probability that H_j is true after having observed x, i.e., after conditioning on the observations. To avoid confusion between the pdf of a RV, f(·), and the probability of a RV, Pr[·]: f(·) is used when the RV is (or may be) continuous, whereas Pr[·] is used when the RV is discrete. When the RV is discrete, its pdf f(·) is simply a set of Dirac delta functions weighted by the discrete probabilities Pr[·]. A special case which gives a very simple and intuitive interpretation is obtained when

C_ij = C_0 for i = j, and C_ij = C_1 for i ≠ j,

so that the cost of a correct decision is C_0 and the cost of an incorrect decision is C_1. Assuming that C_1 > C_0, this results in choosing H_i when

Pr[H_i | x] > Pr[H_j | x] for all j ≠ i.

Hence the Bayes decision rule maximises the a posteriori probability that H_i is true, and this is known as the maximum a posteriori probability (MAP) test.
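A compact illustration of the MAP rule for M hypotheses (an added sketch; the Gaussian means, common variance and priors are arbitrary assumptions):

```python
import numpy as np
from scipy.stats import norm

means = np.array([1.0, 2.0, 3.0, 4.0])   # M-ary PAM-style means, H_i: mean = i (assumed)
priors = np.array([0.4, 0.3, 0.2, 0.1])  # a priori probabilities Pr[H_i] (assumed)
sigma = 0.8                              # common noise standard deviation (assumed)

def map_decision(x):
    """Return the index i maximising Pr[H_i | x] ∝ f(x | H_i) Pr[H_i]."""
    posteriors = norm.pdf(x, loc=means, scale=sigma) * priors
    return int(np.argmax(posteriors))    # f(x) is a common factor, so it can be dropped

rng = np.random.default_rng(8)
true_i = rng.choice(len(means), p=priors)
x = rng.normal(means[true_i], sigma)
print("true hypothesis:", true_i, "MAP decision:", map_decision(x))
```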

1.9 Terms to Know

Hypothesis tests, null hypothesis, alternative hypothesis, simple hypothesis, composite hypothesis, multiple hypothesis test, binary hypothesis test, Bayes test, ideal observer test, Neyman-Pearson test, false alarm, Type I error, missed detection, Type II error, correct detection, P_FA, P_D, α, β, level of a test, power of a test, probability of a missed detection, P_E, cost of a decision, decision regions, acceptance region, rejection region, critical region, a priori probabilities, a posteriori probabilities, likelihood ratio, threshold, test statistic, most powerful tests, uniformly most powerful tests, receiver operating characteristics, log-likelihood ratio, matched filter, linear correlator, pre-whitening, maximum a posteriori probability test.

