Nonparametric Detection of Signals by Information Theoretic Criteria: Performance Analysis and An Improved Estimator

2746 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO.
5, MAY 2010
Nonparametric Detection of Signals by Information

Theoretic Criteria: Performance Analysis and an
Improved Estimator
Boaz Nadler
Abstract—Determining the number of sources from observed the array manifold, nor on the desired signal waveforms. While
data is a fundamental problem in many scientific fields. In this paper some of the techniques employed in our analysis could poten-
we consider the nonparametric setting, and focus on the detection tially be used to analyze more sophisticated estimators, such
performance of two popular estimators based on information
theoretic criteria, the Akaike information criterion (AIC) and min- analysis is beyond the scope of the current paper.
imum description length (MDL). We present three contributions
on this subject. First, we derive a new expression for the detection A. Problem Formulation
performance of the MDL estimator, which exhibits a much closer
fit to simulations in comparison to previous formulas. Second, we We consider the standard model for signals impinging on an
present a random matrix theory viewpoint of the performance of the array of sensors,
AIC estimator, including approximate analytical formulas for its
overestimation probability. Finally, we show that a small increase (1)
in the penalty term of AIC leads to an estimator with a very good
detection performance and a negligible overestimation probability. where is the complex observation vector received at the
Index Terms—Information theoretic criteria, performance anal- sensors, is the steering matrix, composed
ysis, random matrix theory, source enumeration. of linearly independent vectors, is a vector containing
the zero mean random signals and is a -dimensional complex
I. INTRODUCTION Gaussian white noise vector. The parameter is the unknown
noise variance.
ETECTION of the number of sources measured by an
D array of passive sensors is a fundamental problem in sta-
tistical signal processing, which has received considerable at-
We assume the signals have a full rank covariance matrix
(e.g., there are no fully correlated signals), and denote by
the noise-free population signal eigen-
tention in the past 20 years, see [3], [5], [24], [27], [29], [32],
values. That is, the eigenvalues of the population covariance ma-
and [33]. In the nonparametric setting, where no assumptions
trix of are
on the array manifold or on the desired signal waveforms are
made, two of the most common estimators for this problem are (2)
the AIC and the MDL estimators, both derived from informa-
tion theoretic considerations [29]. Information theoretic criteria For simplicity of the analysis and without loss of generality, we
have also been used to derive estimators in parametric settings, assume that for the rest of this paper. Furthermore, for
where the array manifold is assumed known, as in [28] and [30], the analysis of the MDL estimator we assume that the smallest
or when other information is available, such as explicit knowl- signal eigenvalue has multiplicity one
edge of the waveform of the desired signal or some shift invari- or equivalently ), and as described explic-
ance properties of the sensor array response, as in [9] and [10]. itly below, it is well separated from the next eigenvalue. The
Most of these estimators require the eigendecomposition of the
smallest signal eigenvalue will have multiplicity one, for ex-
sample covariance matrix. We note that in certain parametric
ample, whenever the emitting sources are at markedly different
settings, this step can be avoided, leading to more computation-
distances from the sensors. Our analysis can be generalized to
ally efficient estimators, see, for example, [11].
the case of multiplicity larger than one, or to two signal eigen-
In this paper, we focus on the detection performance of the
values which are not well separated, but these extensions are
original AIC and MDL estimators as proposed in [29]. In partic-
beyond the scope of this paper.
ular, we analyze the performance of these estimators in the non-
Let denote i.i.d. observations from the model
parametric setting, where no assumptions are made neither on
(1). The problem of interest is to determine the number of
Manuscript received January 22, 2009; revised December 29, 2009. Date of sources . In the nonparametric setting, where no knowl-
publication February 05, 2010; date of current version April 14, 2010. The as- edge of the desired signal waveforms is assumed, and no
sociate editor coordinating the review of this manuscript and approving it for
publication was Dr. Xavier Mestre.
assumptions on the steering matrix are made beyond it being
The author is with the Department of Computer Science and Applied Math- of rank , most methods to perform this task use the eigen-
ematics, Weizmann Institute of Science, Rehovot, 76100, Israel (e-mail: boaz. values of the sample covariance matrix
nadler@weizmann.ac.il). of the observations, see [31] for a review.
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. Some of the most common nonparametric estimators for the
Digital Object Identifier 10.1109/TSP.2010.2042481 number of signals are based on information theoretic criteria
1053-587X/$26.00 © 2010 IEEE
NADLER: NONPARAMETRIC DETECTION OF SIGNALS 2747
[29], [32]. In this paper, we consider the MDL and AIC esti- Asymptotically as this random variable follows a
mators, given by Gaussian distribution with some mean and standard
deviation , where both of these quantities depend on the
parameters , , and . It readily follows that
(3) (6)
and
where is the cumulative dis-
tribution function of a standard Gaussian. Both references de-
rived the following asymptotic approximations for the parame-
ters and
(4)
where
(7)
(8)
Note that these two estimators differ only by their penalty terms In [35, Fig. 4] and in [7, Fig. 2] it is suggested that these formulas
for AIC, and for MDL. Also are in very close agreement with simulation results.
note that our definition of and is slightly dif- The performance of the AIC estimator was also studied in var-
ferent from the standard notation in the literature, as for con- ious works, see [15], [35] and references therein. It was shown
venience later on, we divide these terms by (which does not that its main source of error is model order overestimation by
change the estimate of the number of sources). exactly one signal
The main focus of this paper is on the performance analysis
of these two estimators (9)
where is given by
In particular, we focus on the underestimation probability of
MDL and overestimation probability of AIC by exactly one
signal, as these are the leading terms governing their respective
detection performances for large sample sizes. (10)
B. Previous Work
and . In [34] and [35], expressions for
Various works analyzed the performance of these two esti-
this overestimation probability were derived, which require
mators, see [7], [20], [34], and [35]. The MDL estimator was
numerical evaluation of a possibly high dimensional integral
proven to be strongly consistent [36]
(although [34] also presented approximations to these high di-
mensional integrations). Both simulations and theory show that
for number of sensors the overestimation probability of
AIC for complex valued signals is nonnegligible, of the order
For finite sample sizes, it was observed empirically that the main
of 5%–10%.
source of error in the MDL estimator is underestimation of the
number of signals by exactly one signal. Following this obser-
vation, both [7] and [35] studied the properties of II. MAIN RESULTS
In this paper we present three main contributions related to the
AIC and MDL estimators. First, regarding the MDL estimator,
we show that despite the above results, (6) for MDL’s detec-
tion performance with given by (7) is not as accurate as
claimed. We explain the source of this discrepancy and derive a
new more accurate formula which provides a much better fit to
simulations. Our first main result is the following lemma.
Lemma 1: Assume has multiplicity one, and
. Then, an accurate expression
for the detection performance of the MDL estimator is given by
(5) (6), but with a modified expression for as follows [see
2748 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 5, MAY 2010
(11) at the bottom of the page], where for real valued In addition, let be a Wishart matrix with identity covariance
signals and noise, and in the complex valued case. matrix and parameters and , and let denote its
Note that the difference between (7) and the new formula (11) largest eigenvalue.
are the additional terms, neglected in the calculation Then, for , in the joint limit as both with
of the expectation in previous works. We , an approximate expression for the overestimation
note that independent of our work, a similar observation was probability of the AIC estimator in the presence of signals is
made recently in [8]. While the underlying analysis is similar, given by
our results are different as we obtain an explicit formula for
the correction terms, (11), whereas [8] require the numerical
solution of some nonlinear equations.
We now show that for practical values of , and signal
eigenvalues, these terms are nonnegligible. To this end,
assume is large and perform a Taylor expansion of the loga-
rithms in (11). This gives, up to terms
(15)
where is the Tracy-Widom distribution of order , and

for real valued noise or for complex valued noise. The
(12) centering and scaling parameters and are given by
Note that the second term in (12) is , (25), (26) below.
which is small with respect to the first term if . There are two key points in (15), beyond its possible use as
Hence, for reasonable values of (say , for an approximate expression for the overestimation probability of
which ), the difference is ap- the AIC estimator. First, it shows that the overestimation prob-
proximately , which is nonnegligible in ability is asymptotically independent of the signal eigenvalues.
comparison to the difference in the penalty term of MDL, Second, and perhaps more importantly, it provides a novel in-
. Further, for , from sight, from a random matrix point of view, of why this overesti-
(8) it follows that . Hence, the change mation probability is nonnegligible for practical values of , .
in the number of standard deviations due to these terms is Consider, for example the case of no signals, , and assume
, which again is large, so that . A Taylor expansion with respect to
may be significant for small signal strengths, see Section V for in (13) gives that
a numerical example.
The second contribution of this paper is a random matrix
view of the overestimation probability of the AIC estimator. We (16)
present a relation between this probability and the distribution
of the largest eigenvalue of a Wishart matrix with identity co- According to (14) and (15), the condition for AIC to overesti-
variance, and obtain the following result. mate the number of signals is
Lemma 2: Let be i.i.d. observations from (1)
with signals, noise variance , and noise-free signal
eigenvalues . Let be the solution
of
But, as is well known from random matrix theory [13], the
largest eigenvalue of a pure noise covariance matrix is
(13) , which is comparable to the overesti-
and define mation threshold . Thus, for finite small number of sen-
sors , the penalty term of AIC is not sufficiently large
(14)
for this event to have negligible probability. We stress that the
(11)
above is only an approximate analysis, whose purpose is to pro- For comparison purposes, to detect the same signal with proba-
vide insight, rather than exact results. In Lemma 3, we provide bility at least 1/2 by the MDL estimator, its eigenvalue must be
an explicit nonasymptotic bound on the overestimation proba- larger than , where
bility for a family of modified AIC estimators.
Since the AIC estimator has a nonnegligible model order (20)
overestimation probability, an interesting question is what is a
sufficient penalty term that would yield a negligible probability Hence, with a penalty coefficient as small as this modi-
of overestimation for large sample sizes and typical number fied AIC estimator enjoys both a much improved detection per-
of sensors, . As already discussed in the original formance in comparison to MDL, together with a negligible
paper by Wax and Kailath [29], (see also [36] for a corrected overestimation probability. For example, for ,
analysis), any penalty term of the form with an and the bound (18) is of the order of
increasing function such that but . In practice, this bound is highly conservative and the ac-
as sample size , leads to an asymptot- tual overestimation probability is orders of magnitude smaller,
ically consistent estimator. One solution is of course offered see Section V for simulation results. Finally, we mention that
by the MDL estimator with penalty term . if one would like the estimator to also be strongly consistent
However, this penalty is too large and leads to poor detection ( as ), then following the analysis in [29],
performance at low SNR, see [4], [17], and [35]. this can be achieved with a sample-dependent penalty term, such
In the third contribution of this paper, we show that even a as .
small increase in the penalty term of the original AIC estimator
is sufficient to obtain a new nonparametric estimator for the
III. MATHEMATICAL PRELIMINARIES
number of sources, with both a high detection performance at
low SNR and a negligible overestimation probability. In more To derive approximate expressions for the overestimation and
detail, we consider a modified AIC estimator of the form underestimation probabilities of the AIC and MDL estimators,
we shall use the following results regarding the distribution of
sample eigenvalues of both “signal plus noise” covariance ma-
(17) trices or of pure noise covariance matrices.
The following theorem characterizes the asymptotic variance
whose penalty term depends on a yet unspecified constant . and bias of sample eigenvalues. The expression for the variance
This specific form is chosen so that the difference in the penalty is well known, see, e.g., Anderson’s book [1, ch. 13]. The ex-
terms in is , which will pression for the bias, due to interaction terms with other eigen-
turn out to be convenient for the analysis later on. Note that values, was first derived by Lawley [18].
for this estimator has a penalty term comparable to that Theorem 1: Let denote the sample eigenvalues
of the original AIC estimator, and so we naturally focus on of a covariance matrix of multivariate real or complex
values . The following lemma provides a nonasymptotic valued Gaussian observations . Let denote
quantitative bound for the overestimation probability of this the (population) eigenvalues of . For each such that has
modified AIC estimator, and shows that a penalty constant multiplicity one, asymptotically in
as small as is sufficient to obtain a practically zero
overestimation probability. (21)
Lemma 3: Consider the modified AIC estimator (17) with
constant . In the presence of pure noise with no signals (22)
, this estimator has a negligible overestimation prob-
ability which can be bounded by an explicitly computable ex-
pression. Analysis of the resulting expression for large gives with for real or complex valued samples, respectively.
the following approximate bound which holds for any value of The following theorem, proven in [12] and [13] for com-
plex or real valued noise matrices, respectively, characterizes
the asymptotic distribution of the largest eigenvalue of a pure
noise matrix.
Theorem 2: Let denote the sample covariance matrix of
pure noise observation vectors distributed . In the joint
limit , with , the distribution
(18)
of the largest eigenvalue of converges to a Tracy-Widom
distribution. That is, for every
In addition, for a single signal to be detected with probability
at least 1/2 by this estimator, its noise-free eigenvalue must be (23)
larger than , where
where for real valued noise and for complex-
valued noise. The centering and scaling parameters, and
(19) , respectively, are functions of and only.
Furthermore, with carefully constructed expressions for Lemma 4: Let denote i.i.d. observations from
and , the convergence rate to the Tracy-Widom distribution (1) with signals, noise variance and noise-free signal
is . In particular, for eigenvalues . Define
(24)
For real valued observations, the following formulas provide

convergence rate, see [21]
(25)
Then, up to an error [see (29) at the bottom of the page].
This expression depends on all signal eigenvalues. However, as-
suming , and approximating
(26)
(30)
For complex valued observations, similar, though more involved
expressions appear in [6], Theorem 2. gives the expression in (11) in Lemma 1.
The last result we shall use is the following theorem, which Remark 1: By a similar analysis it is possible to derive the
provides a nonasymptotic bound on large deviations of the correction term to in (8). However, this requires
largest eigenvalue of a pure noise matrix for finite values of considerable more tedious algebraic calculations and is hence
(see [19, Eq. (2.4)]): omitted. Also, note that the correction term to
Theorem 3: Let be the largest eigenvalue of a covari- does not change the value of where ,
ance matrix of complex multivariate Gaussian observations since according to (6), this value is determined by the condition
distributed . Then the following nonasymptotic bound .
holds for all values of , Remark 2: The proof of this lemma uses Theorem 1 regarding
the bias of sample eigenvalues. Hence, while (22) and (29) are
(27) correct asymptotically as , they provide a good approx-
imation for finite only if the eigenvalue with multiplicity one
is sufficiently separated from the other eigenvalues, see [22]. A
where
necessary condition is that no eigenvalue crossover (or subspace
swap) occurs. Since the sample eigenvalue corresponding to a
(28)
population eigenvalue has asymptotically Gaussian fluctua-
tions with variance , the condition for a negligible prob-
with , , and . ability of eigenvalue crossover is . In
our case , and so (29) is a good approximation if
IV. ANALYSIS OF MDL AND AIC DETECTION PERFORMANCE .
In this section, we prove Lemmas 1–3 from Section II. Fol- Proof of Lemma 4: Equation (5) contains terms of the form
lowing [7] and [35], we assume the performance of the MDL for various random variables , whose distribution
estimator is governed by the underestimation probability by ex- depends on the number of samples , and whose variance is
actly one signal. Thus, we start our analysis with the random . Let
variable , given by (5). We
prove the following result, from which Lemma 1 follows.
(29)
For each we make the following change of variables Our first goal is to show that under the conditions of the
lemma, as this probability is to leading order in
independent of the signal eigenvalues and depends only
where is a random variable with on the eigenvalues of a random Wishart
some density , which in general depends on the number matrix with identity covariance. To this end, we denote the
of samples . Nonetheless, by construction signal subspace by , and its orthog-
so that . Further, note that onal complement by , such that . Let
. be a basis of whose first
Assuming is large, we compute the asymptotic expansion vectors are the eigenvectors of the population covariance
in of the quantity via a Taylor expansion of the matrix corresponding to the signal eigenvalues ,
logarithm function, as follows: and the remaining vectors diagonalize the projection of the
sample covariance matrix onto the noise subspace . That
is
(31) .. .. ..
. . .
To apply (31) to the various random variables appearing in
in (5) we thus need to know, up to terms, the
means and variances of the following three random variables: ..
, and . Applying (21) and (22) of The- .
orem 1 gives the equation at the bottom of the page, and
The diagonal entries are the sample variances in

the (unknown) directions that correspond to the population
signal eigenvalues . Hence, .
where for real valued observations and for The quantities are the eigenvalues of the projec-
complex valued observations. Combining these expressions tion of onto the noise subspace , of dimension .
with (31) and (5) yields the required result, (29). Note that the projection of onto the noise subspace gives a
We now turn our attention to the performance analysis of the sub-matrix which is independent of its pro-
AIC and modified AIC estimators. jection onto the signal subspace. Since this submatrix contains
Proof of Lemma 2: Following [34] and [35], we assume that no signal contributions, it is a random realization of a Wishart
the signals present are sufficiently strong so that the misdetec- matrix with identity covariance.
tion performance of AIC is governed by its overestimation prob- The matrix captures the interactions between the
ability by exactly one source. That is signal and the noise subspaces. We consider the case ,
(32) view this matrix as a small perturbation, and expand the eigen-
values and eigenvectors of , which correspond to the
with given by (10). noise subspace, in terms of the matrix (see [22, The-
orem 2.2] for a similar analysis). We thus write for where , and the zero mean random
variable is defined as follows:
Note that under the conditions of the lemma that , it

follows that and . Similarly, we
where the expansion coefficients and are quan- obtain up to terms
tities yet to be determined. Inserting these expressions into the
eigenvalue/eigenvector system , and equating
powers of gives linear equations for the unknown quan-
tities above. Solving these equations gives that for each
(33)
where and . Inserting

The random variable is given by
all of the above expressions into (10) the terms cancel out,
and up to terms we obtain
It captures, to leading order in , the interaction between the

signal subspace direction with variance and the direction
corresponding to the noise subspace eigenvalue .
Since and
, we can write (36)
where the random variable is a linear combination of , for

, explicitly given by
(34)
where and .
Hence, the sum in (33) can be written as (37)
The key point thus far is that to leading order in , the

effect of the signals is captured solely by the fluctuations
Next, we study the properties of the random variables . in the random variable , as all other terms in (36) de-
By definition, each such variable is proportional to the correla- pend only on the eigenvalues of the noise subspace ,
tion coefficient between the projection of the data onto a fixed . A similar conclusion was also reached
direction and its projection onto the direction . Since by [35], where the random variable was neglected alto-
the directions and are orthogonal, the random vari- gether. We now show that asymptotically as , the
ables are independent of . Furthermore, since the direction random variable can indeed be neglected. To this end, note
is fixed and independent of the signal and noise realizations, that both and
under the assumption of Gaussian sources and noise, the vector . Hence,
is uniformly distributed on the -dimensional is indeed negligible with
unit sphere. In particular, we have that respect to the term in (36).
and for . Our analysis departs from [35] by introducing the random
Similarly, the properties of the random variable rele- variable , so that
vant to our proof are that ,
and . Furthermore, since
(38)
, we have that
whereas for the noise eigenvalues .
Hence Recall that . Hence, by definition
(35)
Inserting these expressions into (36) gives that the condition for where is the solution of the following equation, which de-
AIC to overestimate the number of signals is, up to pends on the penalty constant ,
(42)
(39)
Let be the solution with equality in (39). Then, asymp- For future use, we note that similar to the analysis made in
totically in , overestimation occurs when (16), for large we have that .
Also, recall that the random variable is the trace of
(40) a pure noise covariance matrix divided by its size, and is thus
distributed as a random variable.
Thus, asymptotically, the overestimation probability of the AIC Let be some positive number. Then, [see (43) at the
estimator depends on the tail statistics of the ratio bottom of the page]. In principle, both terms on the right-hand
side (RHS) above can be computed numerically for any value
of , , , and , and one can also find the value of where
the bound attains a minimum. As we now show, the key point
is that both terms in (43) are exponentially small in .
of the largest eigenvalue of a Wishart matrix with identity co-
For the first term, we use the following inequality (see [14]):
variance and parameters and to the mean of all its eigen-
values. For small values of (say ), the tail distribution
(44)
of this random variable was tabulated, see [25], and in principle
could be numerically calculated for larger values of with the
aid of a computer. Applying this bound with and , gives
To derive an approximate explicit expression for this overes- that the first term in (43) is bounded by . As
timation probability, we first note that similar to Theorem 2, in for the second term, according to Theorem 3 it is bounded
the joint limit as both , it can be proven that with by (27), with .
the same centering and scaling parameters, as for the To understand the dependence of this bound on the number
largest eigenvalue of sensors and on the penalty coefficient , we con-
sider in more detail the probability of an event of the form
. According to (27), this
probability is bounded by , where
However, as discussed in [23], a more accurate expression for
tail probabilities of the ratio, with an error is
Evaluating the integral gives that for any , [see (45) at the
Combining this expression with (40) proves the lemma. bottom of the next page]. Hence
Proof of Lemma 3: We perform an analysis similar to the
one for the AIC estimator. Note that for the case of no signals, (46)
there are no interactions between signal and noise eigenvalues.
Hence, the exact condition for is
In our case, , and so
. Hence, combining (44), (46), and (43) gives (18).
Finally, to prove (19), we note that the condition for the mod-
(41) ified estimator to detect a single signal with probability at least
(43)
Fig. 1. Comparison of empirical performance of the MDL estimator to the theoretical formula (7) and to the improved formula (11). Parameters: = 1, complex
valued signals and noise (a) p=7 one signal. (b) p =7
two signals. (c) p = 20
three signals. (d) p = 20
, five signals.
Fig. 2. Analysis of the difference between (7) and (11) for [1 ] (

. (a) Difference n 0 ) as function of sample size for p = 7, q = 1; = 0:2.
(
(b) Difference in standard deviations 0 )= as a function of sample size.
1/2 is . Similar to the analysis of the puted from (7) and the empirical results, whereas the theoret-
MDL estimator, this gives ical curve computed with the modified expression (11) is more
accurate.
Next, we show that the difference is nonneg-
ligible for practical values of . Consider sensors, and
For large and small , a Taylor expansion of the logarithms a single signal of strength , as in the upper
gives the approximate condition left simulation in Fig. 1. In Fig. 2(a), we plot the difference
as a function of sample size . As expected
from (12), this difference converges to . In Fig. 2(b),
we plot as a function of sample size .
from which (19) follows. Note that the difference is of the order of one standard deviation
for the values of considered. This nonnegligible difference
V. SIMULATIONS explains the discrepancy in Fig. 1 between the empirical sim-
In this section, we compare simulation results to our theoret- ulation results and the theoretical performance curve based on
ical predictions regarding the probability of misdetection of both the original expression for , (7).
the MDL, the AIC and the modified AIC estimators. We start Next, we consider the misdetection probability of the AIC
with the MDL estimator. In Fig. 1 we show the result of com- estimator. In Fig. 3, we present the result of computer simula-
puter simulations for the misdetection probability of the MDL tions of the probability of overestimation of the AIC estimator,
estimator, in comparison to the theoretical prediction of (6) for in comparison to the theoretical prediction of (15), for
or sensors and for various numbers of signals, ei- or samples as a function of the number of sensors
ther with the original expression for , (7), or with our modified , and for different number of sources, . In all cases
expression, (11). As shown in the figure, there is a consider- the theoretical prediction is quite accurate with an error of up to
able discrepancy between the original theoretical curve com- 1%–2%.
(45)
Fig. 3. Comparison of empirical misdetection probability of the AIC estimator to the theoretical formula (15). Parameters: =1 , complex valued signals and
= 200
noise, each point is the average of 100 000 simulations (a) n , no signals. (b) n = 200, two signals. (c) n = 1500, two signals. (d) n = 1500 , four signals.
TABLE I and we thus denote it as RMT. Note that even though the modi-
MISDETECTION PROBABILITY OF MODIFIED AIC ESTIMATOR WITH PENALTY
C =2. EMPIRICAL AVERAGE RESULTS OF 15 000 000 SIMULATIONS,
fied-AIC and the RMT algorithms are not consistent, their over-
estimation probabilities are so small that on the scale of the
INCLUDING ONE STANDARD DEVIATION CONFIDENCE INTERVALS.
figure, their misdetection error appears to reach a value of zero.
VI. SUMMARY AND CONCLUSION

In this paper, we presented a statistical performance analysis
of the MDL and AIC estimators, and of a modified AIC-type
estimator with a penalty slightly larger than that of the AIC es-
timator but significantly smaller than that of the MDL estimator.
We showed that this modified AIC estimator has a much better
detection performance than MDL with a negligible probability
of overestimation.
The two key quantities used in our analysis were: i) the distri-
bution of the largest eigenvalue of a pure noise matrix; and ii) the
interaction of signal and noise eigenvalues. These quantities also
played a key role in our previous work [16], [17], where we de-
veloped a source enumeration estimator based on a sequence of
hypothesis tests and analyzed its performance and that of two
other recent estimators [24], [26].
Since source enumeration involves choosing amongst many
composite hypothesis, in general there is no universally optimal
detection procedure. Indeed, comparing the detection perfor-
mance of the modified AIC estimator, (19), with that of the RMT
Fig. 4. Comparison of misdetection probability of various algorithms as a func- estimator, (25) in [17], no estimator dominates the other over all
tion of sample size n.
parameter ranges. In particular, the modified AIC algorithm may
be useful in legacy systems where only minimal changes to an
Finally, we consider the misdetection probability of the mod- existing code can be made.
ified AIC estimator. Table I presents simulation results for the In this paper, we focused on analysis in the nonparametric
probability of misdetection of the modified AIC estimator with setting and relied on recent results from random matrix theory.
a penalty constant , various values for and different In a similar fashion, both analysis and development of improved
number of sources. As predicted theoretically, the overestima- estimators in a parametric setting, with an explicit knowledge
tion probability is extremely small, and it slightly increases with of the array manifold, are possible, this time using results
the number of signals , as this decreases the noise subspace di- from extreme value theory on the maxima of random pro-
mension from to . Since the penalty term of the modified cesses and fields. This analysis will be described in a separate
AIC estimator is significantly smaller than that of the MDL esti- publication [2].
mator, it enjoys a much improved signal detection performance
ACKNOWLEDGMENT
at low SNR, compared to the MDL estimator. We illustrate this
in Fig. 4, where we compare the misdetection probability of the The author would like to thank the three anonymous referees
MDL, AIC, and modified AIC estimator with , as well for valuable suggestions which greatly improved the exposition
as the algorithm recently developed in [16], with a confidence of the paper.
level of . The latter algorithm performs a sequence of REFERENCES
hypothesis tests for the significance of the largest eigenvalue as [1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis,
arising from a signal, using results from random matrix theory, 3rd ed. New York: Wiley, 2003.
[2] N. Arkind and B. Nadler, Parametric joint detection-estimation of the [22] B. Nadler, “Finite sample approximation results for principal compo-
number of sources in array processing, unpublished. nent analysis: A matrix perturbation approach,” Ann. Stat., vol. 36, no.
[3] J. F. Böhme, “Array processing,” in Advances in Spectrum Analysis and 6, pp. 2791–2817, 2008.
Array Processing, S. Haykin, Ed. Englewood Cliffs, NJ: Prentice- [23] B. Nadler, “On the distribution of the ratio of the largest eigenvalue to
Hall, 1991, pp. 1–63. the trace of a Wishart matrix,” submitted for publication.
[4] W. Chen, K. M. Wong, and J. P. Reilly, “Detection of the number [24] N. R. Rao and A. Edelman, “Sample eigenvalue based detection of high
of signals: A predicted eigen-threshold approach,” IEEE Trans. Signal dimensional signals in white noise using relatively few samples,” IEEE
Process., vol. 39, no. 5, pp. 1088–1098, 1991. Trans. Signal Process., vol. 56, no. 7, pp. 2625–2638, 2008.
[5] P. Chen, M. C. Wicks, and R. S. Adve, “Development of a statis- [25] F. J. Schuurmann, P. R. Krishnaiah, and A. K. Chattopadhyah, “On the
tical procedure for detecting the number of signals in a radar measure- distribution of ratios of the extreme roots to the trace of the Wishart
ment,” Inst. Elect. Eng. Proc. Radar Sonar Navig., vol. 148, no. 4, pp. matrix,” J. Mult. Anal., vol. 3, pp. 445–453, 1973.
219–226, 2001. [26] J. Schott, “A high-dimensional test for the equality of the smallest
[6] N. El Karoui, “A rate of convergence result for the largest eigenvalue eigenvalues of a covariance matrix,” J. Mult. Anal., vol. 97, no. 4, pp.
of complex white Wishart matrices,” Ann. Prob., vol. 36, no. 6, pp. 827–843, 2006.
2077–2117, 2006. [27] M. O. Ulfarsson and V. Solo, “Dimension estimation in noisy PCA with
[7] E. Fishler, M. Grosmann, and H. Messer, “Detection of signals by infor- sure and random matrix theory,” IEEE Trans. Signal Process., vol. 54,
mation theoretic criteria: General asymptotic performance analysis,” no. 12, pp. 5804–5816, 2008.
IEEE Trans. Signal Process., vol. 50, no. 5, pp. 1027–1036, 2002. [28] S. Valaee and P. Kabal, “An information theoretic approach to source
[8] F. Haddadi, M. Malek-Mohamaddi, M. M. Nayebi, and M. M. Aref, enumeration in array signal processing,” IEEE Trans. Signal Process.,
“Statistical performance analysis of MDL source enumeration in array vol. 52, no. 5, pp. 1171–1178, 2004.
processing,” IEEE Trans. Signal Process., vol. 58, no. 1, pp. 452–457, [29] M. Wax and T. Kailath, “Detection of signals by information theoretic
2010. criteria,” IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 2,
[9] L. Huang and S. Wu, “Low-complexity MDL method for accurate pp. 387–392, Apr. 1985.
source enumeration,” IEEE Signal Process. Lett., vol. 14, no. 9, pp. [30] M. Wax and I. Ziskind, “Detection of the number of coherent signals
581–584, 2007. by the MDL principle,” IEEE Trans. Acoust. Speech, Signal Process.,
[10] L. Huang, S. Wu, and X. Li, “Reduced-rank MDL method for source vol. 37, no. 8, pp. 1190–1196, 1989.
enumeration in high-resolution array processing,” IEEE Trans. Signal [31] D. B. Williams, “Detection: Determining the number of sources,”
Process., vol. 55, no. 12, pp. 5658–5667, 2007. in Digital Signal Processing Handbook, V. K. Madisetti and D. B.
[11] L. Huang, T. Long, and S. Wu, “Source enumeration for high-res- Williams, Eds. Boca Raton, FL: CRC , 1999.
olution array processing using improved Gershgorin radii without [32] K. M. Wong, Q.-T. Zhang, J. P. Reilly, and P. C. Yip, “On information
eigendecomposition,” IEEE Trans. Signal Process., vol. 56, no. 12, theoretic criteria for determining the number of signals in high reso-
pp. 5916–5925, 2008. lution array processing,” IEEE Trans. Signal Process., vol. 38, no. 11,
[12] K. Johansson, “Shape fluctuations and random matrices,” Comm. Math. pp. 1959–1971, 1990.
Phys., vol. 209, no. 12, pp. 437–474, 2000. [33] J. Xin, N. Zheng, and A. Sano, “Simple and efficient nonparametric
[13] I. M. Johnstone, “On the distribution of the largest eigenvalue in prin- method for estimating the number of signals without eigendecomposi-
cipal component analysis,” Ann. Stat., vol. 29, pp. 295–327, 2001. tion,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1405–1420, 2007.
[14] I. M. Johnstone and A. Lu, “On consistency and sparsity for principal [34] W. Xu and M. Kaveh, “Analysis of the performance and sensitivity
component analysis in high dimensions,” J. Amer. Stat. Assoc., vol. of eigendecomposition-based detectors,” IEEE Trans. Signal Process.,
104, pp. 682–693, 2009. vol. 43, pp. 1413–1426, 1995.
[15] M. Kaveh, H. Wang, and H. Hung, “On the theoretical performance [35] Q. T. Zhang, K. M. Wong, P. C. Yip, and J. P. Reilly, “Statistical anal-
of a class of estimators of the number of narrow-band sources,” IEEE ysis of the performance of information theoretic criteria in the detec-
Trans. Acoust. Speech, Signal Process., vol. 35, pp. 1350–1352, 1987. tion of the number of signals in array processing,” IEEE Trans. Acoust.
[16] S. Kritchman and B. Nadler, “Determining the number of components Speech Signal Process., vol. 37, no. 10, pp. 1557–1567, 1989.
in a factor model from limited noisy data,” Chem. Intell. Lab. Syst., vol. [36] L. C. Zhao, P. R. Krishnaiah, and Z. D. Bai, “Remarks on certain cri-
94, pp. 19–32, 2008. teria for detection of number of signals,” IEEE Trans. Acoust., Speech,
[17] S. Kritchman and B. Nadler, “Non-parametric detection of the number Signal Process., vol. ASSP-35, pp. 129–132, 1987.
of signals, hypothesis tests and random matrix theory,” IEEE Trans.
Signal Process., vol. 57, no. 10, pp. 3930–3941, 2009.
[18] D. N. Lawley, “Tests of significance for the latent roots of covariance
and correlation matrices,” Biometrika, vol. 43, pp. 128–136, 1956. Boaz Nadler was born in Israel in 1971. He received the B.Sc. degree in
[19] M. Ledoux, “Deviation inequalities on largest eigenvalues, in geo- mathematics and physics (cum laude), the M.Sc. degree in applied mathematics
metric aspects of functional analysis,” in Lecture Notes in Mathematics, (summa cum laude), and the Ph.D. degree in applied mathematics, all from Tel
V. D. Milman and G. Schechtman, Eds. New York: Springer, 2007, Aviv University (TAU), Tel Aviv, Israel.
vol. 1910. From 2002 to 2005, he was a Gibbs Instructor/Assistant Professor with the
[20] A. P. Liavas and P. A. Regalia, “On the behavior of information theo- Department of Mathematics, Yale University, New Haven, CT. He is currently
retic criteria for model order selection,” IEEE Trans. Signal Process., a Senior Research Scientist with the Department of Computer Science and Ap-
vol. 49, no. 8, pp. 1689–1695, 2001. plied Mathematics, Weizmann Institute of Science, Rehovot, Israel. His research
[21] Z. Ma, “Accuracy of the Tracy-Widom limit for the extreme eigen- interests are in mathematical statistics, machine learning, stochastic processes,
values in white Wishart matrices,” submitted for publication. and their applications in chemometrics and in signal processing.

Nonparametric Detection of Signals by Information Theoretic Criteria: Performance Analysis and An Improved Estimator

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Nonparametric Detection of Signals by Information Theoretic Criteria: Performance Analysis and An Improved Estimator

Uploaded by

Copyright:

Available Formats

2746 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO.

Nonparametric Detection of Signals by Information

where is the Tracy-Widom distribution of order , and

For real valued observations, the following formulas provide

The diagonal entries are the sample variances in

Note that under the conditions of the lemma that , it

where and . Inserting

It captures, to leading order in , the interaction between the

where the random variable is a linear combination of , for

The key point thus far is that to leading order in , the

Fig. 2. Analysis of the difference between (7) and (11) for [1 ] (

VI. SUMMARY AND CONCLUSION

You might also like