
Minimax threshold for denoising complex signals with Waveshrink


Sylvain Sardy, Swiss Federal Institute of Technology (EPFL-DMA), 1015 Lausanne, Switzerland
email: Sylvain.Sardy@epfl.ch; Tel: +41 21 693 5503; Fax: +41 21 693 4250. EDICS: SP 3.5

Abstract — For the problem of signal extraction from noisy data, Waveshrink has proven to be a powerful tool both from an empirical and asymptotic point of view. Waveshrink is especially efficient at estimating spatially inhomogeneous signals. A key step of the procedure is the selection of the threshold parameter. Donoho and Johnstone propose a selection of the threshold based on a minimax principle. Their derivation is specifically for real signals and real wavelet transforms. In this article we propose to extend the use of Waveshrink to denoising complex signals with complex wavelet transforms. We illustrate the problem of denoising complex signals with an Electronic Surveillance application.

Key Words: Nonparametric denoising; Waveshrink; Minimax; Complex signals; Complex Wavelet Transform; Electronic Surveillance.

I. Background

Suppose we observe a univariate real signal s = (s_1, s_2, ..., s_N) ∈ R^N at equispaced locations x_n according to the model

    s_n = f(x_n) + z_n,   n = 1, ..., N,

where the z_n are identically and independently distributed standard Gaussian random variables. So we assume that the variance of the noise is known and unity (i.e., σ = 1). In practice the variance can be estimated by taking the Median Absolute Deviation of the high-level wavelet coefficients, as proposed in [1]. Our goal is to find a good estimate f̂ of the underlying signal f = (f(x_1), ..., f(x_N)). The predictive performance of f̂ is measured by the risk (also called Mean Squared Error), defined as

    R(f̂, f) = (1/N) E ||f̂ − f||²,   (1)

where E stands for the expectation over the observed noisy signal s. Expansion-based nonparametric estimators assume that the underlying signal can be well approximated by a linear combination of P known basis functions φ_p (e.g., splines, Fourier trigonometric functions), namely that f(x) ≈ Σ_{p=1}^P β_p φ_p(x). Once a set of basis functions is chosen, the only quantities to estimate are the basis function coefficients β = (β_1, ..., β_P). The signal estimate is then f̂ = Φ β̂, where Φ is the matrix of discretized φ_p. The hat on top of a variable is the notation used throughout this article to indicate the estimate of the corresponding variable.

A. Waveshrink

Waveshrink is an expansion-based estimator proposed by Donoho and Johnstone [2]. The expansion is on wavelets, a set of P = N compactly supported

and orthonormal functions on the real line. Under the hypothesis that the underlying signal is periodic, the wavelets are orthonormal on the support of the signal and the matrix Φ is orthonormal. Let us denote by Φ' the transpose of Φ. The orthonormality property has two important consequences. First, the Least Squares estimate of the wavelet coefficients,

    β̂ = Φ' s ~ Normal(β, I_N),   (2)

is unbiased, and the coefficients are independent of each other with the same variance. Secondly, the risk in function values (1) equals the risk in coefficient values: for any estimate β̃ of β, we have R(f̂, f) = R(β̃, β). So we can concentrate on estimating the wavelet coefficients and measure the predictive performance of an estimator in the coefficient values. The Least Squares estimate is unbiased, but does not denoise the original signal since f̂ = Φ(Φ' s) = s. The corresponding risk equals the variance of the noise, namely R(β̂, β) = 1. To estimate the wavelet coefficients with a smaller risk, at the cost of introducing some bias, Donoho and Johnstone [2] propose to apply component-wise a function δ_λ(·) that shrinks the Least Squares estimate towards zero, to obtain the estimate

    β̂_λ = δ_λ(β̂),   (3)

where λ is the threshold parameter of the shrinkage function, a meta parameter of the Waveshrink procedure. The risk of the estimate (3) is the sum of the component-wise risks, namely,

    R(λ; β) = Σ_{n=1}^N ρ(λ; β_n),   (4)

with ρ(λ; β_n) = E(β̂_{λ,n} − β_n)². Because a wide class of functions f can be well approximated by a linear combination of a few wavelets, Donoho and Johnstone [2] propose to enforce sparsity by using either the hard shrinkage function

    δ_λ^(hard)(x) = x 1(|x| > λ),

where 1(x ∈ A) is the indicator function of A, or the soft shrinkage function

    δ_λ^(soft)(x) = sign(x) (|x| − λ)_+,   (5)

where x_+ is x for x > 0 and zero otherwise. We see that any value of the Least Squares coefficients between −λ and λ is set to zero, hence enforcing sparsity. Other shrinkage methods have been proposed, e.g., Non-Negative Garrote [3], Firm [4], or Bayesian [5]. But more important than the choice of a shrinkage function, the selection of λ is the crucial step of the Waveshrink procedure. As in [2], we concentrate in this article on the selection of the threshold λ for the soft shrinkage function (5).
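The two shrinkage rules are one-liners in practice. The sketch below (our code, not the paper's) applies both to a few coefficients; note how soft shrinkage also shrinks the surviving coefficients by λ, while hard shrinkage leaves them untouched.

```python
import numpy as np

def hard_shrink(x, lam):
    # keep coefficients whose magnitude exceeds the threshold, kill the rest
    return x * (np.abs(x) > lam)

def soft_shrink(x, lam):
    # shrink magnitudes towards zero by lam; anything with |x| <= lam becomes 0
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

coeffs = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
assert np.allclose(hard_shrink(coeffs, 1.0), [-3.0, 0.0, 0.0, 1.5, 4.0])
assert np.allclose(soft_shrink(coeffs, 1.0), [-2.0, 0.0, 0.0, 0.5, 3.0])
```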

B. Threshold selection

The threshold parameter λ controls the bias-variance trade-off of the risk (4). Its selection is crucial for Waveshrink to give a good estimation of the underlying signal. Donoho and Johnstone [2] proposed a selection of the meta parameter λ based on a minimax principle. Their approach can be summarized in three steps:

Oracle risk for diagonal linear projection. They considered a "diagonal linear projection" estimator that keeps or kills each Least Squares coefficient with a different meta parameter ω_n, namely, β̂_ω = diag(ω_1, ..., ω_N) β̂, where ω ∈ {0, 1}^N. This estimator has a total of N meta parameters, one

for each Least Squares coefficient. It would be difficult in practice to estimate the N meta parameters. Donoho and Johnstone invoked an oracle (i.e., the knowledge of the quantity β to estimate) and considered selecting ω by minimizing the risk. The optimal risk is

    R(DP; β) := min_ω R(β̂_ω, β) = Σ_{n=1}^N min(|β_n|², 1).   (6)

In practice one would like to approach this ideal risk closely.

Universal threshold. They considered the Waveshrink estimate (3) with the soft shrinkage function, and selected the single meta parameter λ_N^U = √(2 log N), so that the estimator achieves the performance of the oracle estimator within a factor of essentially 2 log N for all possible true coefficients, namely,

    R(β̂_{λ_N^U}, β) ≤ (2 log N + 1) (1 + R(DP; β))   (7)

for all β ∈ R^N. The universal threshold also has the advantage that, if the signal is in fact white noise (i.e., s_n = z_n), then the signal will be estimated as zero with high probability, since

    P(max_n |z_n| > √(2 log N)) → 0 as N → ∞.

Hence the universal threshold has the advantage of giving a signal estimate with a nice visual appearance (see for instance [1]).

Minimax threshold. They improved the bound Λ_N^U = 2 log N + 1 and selected the threshold λ_N^* to achieve the minimum bound Λ_N^*, namely,

    R(β̂_{λ_N^*}, β) ≤ Λ_N^* (1 + R(DP; β))

for all β ∈ R^N, with Λ_N^* ≤ 2 log N + 1 and Λ_N^* ~ 2 log N asymptotically. The minimax threshold does not usually give an estimate with a nice visual appearance. However, it has the advantage of giving good predictive performance. For details on the properties of these two thresholds, we refer to [2] and [1].
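Both properties of the universal threshold are easy to check by simulation: with pure noise, every coefficient falls below √(2 log N) with high probability, and the Median Absolute Deviation of the coefficients recovers σ. A sketch of ours (0.6745 is the standard normal consistency constant for the MAD, as used in [1]):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2048
z = rng.standard_normal(N)           # pure-noise "wavelet coefficients"

lam_univ = np.sqrt(2 * np.log(N))    # universal threshold, about 3.905 here
assert abs(lam_univ - 3.905) < 0.01

# soft thresholding at the universal level kills (essentially) all pure noise
denoised = np.sign(z) * np.maximum(np.abs(z) - lam_univ, 0.0)
assert np.count_nonzero(denoised) <= 5   # typically exactly 0

# robust scale estimate of the noise from the coefficients
sigma_hat = np.median(np.abs(z - np.median(z))) / 0.6745
assert abs(sigma_hat - 1.0) < 0.1
```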

II. Waveshrink for complex signals

So far it has been assumed that the signal is real-valued. In some applications, however, the signal is complex-valued (see for instance our application in Section III) and the wavelets have real and imaginary parts. Examples of complex wavelets include the complex Daubechies wavelets [6], chirplets [7] and brushlets [8]. To denoise the signal, both the real and imaginary parts of the Least Squares coefficients have to be shrunk towards zero. A simple shrinkage procedure would consist in independently shrinking the real and imaginary parts of the Least Squares wavelet coefficients (2). One drawback of this procedure is that the underlying signal estimate is not guaranteed to have a sparse wavelet representation (a shrunk coefficient may have its real part set to zero but not its imaginary part, and vice versa). Moreover, the phases of the Least Squares coefficients are changed by this kind of shrinkage. If instead the moduli of the Least Squares coefficients are shrunk, then both the real and imaginary parts are guaranteed to be set to zero together, and the phases remain the same. Shrinking the moduli is the natural generalization of Waveshrink to complex signals since it forces sparsity on the wavelet representation.

To be more specific, suppose we observe a univariate complex signal s = (s_1, s_2, ..., s_N) ∈ C^N at equispaced locations x_n according to the model

    s_n = f(x_n) + z_n,   n = 1, ..., N,

where the z_n = z_{1n} + i z_{2n} are identically and independently distributed complex random variables with (z_{1n}, z_{2n})' ~ Normal(0, I_2). And let us again assume that the underlying complex function can be well approximated by a linear combination of P wavelets, namely, f(x) ≈ Σ_{p=1}^P β_p φ_p(x), where now the β's are complex coefficients and the φ_p(·)'s are complex wavelets. As in the real case (3), we define the soft-Waveshrink estimate by β̂_λ = δ_λ^(soft)(β̂), where β̂ = Φ' s is the (complex) Least Squares estimate. We generalize the soft shrinkage function to complex values as

    δ_λ^(soft)(β) = (β/|β|) (|β| − λ)_+,   (8)

where |β| is now the modulus of the complex number β. After applying the soft shrinkage function to the Least Squares coefficients, the moduli are shrunk towards zero and the phases of the Least Squares coefficients are unchanged. Any complex Least Squares coefficient whose modulus is less than λ is set to zero: the sparsity in the wavelet representation of the underlying complex signal is ensured.

As in the real case, the key step is the selection of the threshold parameter λ. The selection of the meta parameter must however be adapted to complex values, since the distribution of the moduli of the Least Squares wavelet coefficients is no longer Gaussian. In the following we essentially follow the three steps of Section I-B for deriving the universal and minimax thresholds: in Section II-A.1, we derive the oracle risk for diagonal linear projection; in Section II-A.2, we achieve the oracle risk to a factor of 1 + log(N log N) with the universal threshold λ_N^U; finally, in Section II-A.3, we find the minimax threshold λ_N^* that achieves the optimal factor Λ_N^*.

A. Threshold selection

As in [2], we restrict our attention to one component ρ(λ; θ) of the risk (4). Let us call Y a Least Squares component, and Y_1, Y_2 its real and imaginary parts. Their distribution is (Y_1, Y_2)' ~ Normal(θ, I_2), where θ = (θ_1, θ_2)'. For any estimate θ̂ of θ, the risk of the component is

    ρ(λ; θ) = E[(θ̂_1 − θ_1)² + (θ̂_2 − θ_2)²].   (9)

For clarity, we postpone the mathematical derivations to Appendix A and Appendix B.

A.1 Oracle predictive performance for diagonal linear projection

One component of the estimate is θ̂ = ω Y with ω ∈ {0, 1}. The corresponding risk is

    ρ(ω; θ) = |θ|²  if ω = 0;   2  if ω = 1.

So the oracle binary meta parameter is ω = 1{|θ| > √2} and the corresponding oracle risk is ρ(DP; θ) = min(|θ|², 2). Note that the Least Squares estimate has risk ρ(1; θ) = 2. After repeating the oracle projection for each wavelet coefficient independently, we find that the oracle risk is

    R(DP; β) = Σ_{n=1}^N min(|β_n|², 2).

This is analogous to equation (6) for complex signals, and it constitutes our reference predictive performance.
A.2 Universal threshold

In Appendix A, we rewrite the risk defined in equation (9) in the convenient form of equation (15). Note that the three terms of the second and third lines of equation (15) are negative. So for ε > 0 small enough and letting λ_ε = √(2 log(ε⁻¹ log ε⁻¹)), we have on the one hand:

    ρ(λ_ε; θ) ≤ 2 + λ_ε² = 2 + 2 log(ε⁻¹ log ε⁻¹) ≤ (1 + log(ε⁻¹ log ε⁻¹))(2ε + 2).

On the other hand, from Property (P1) and the second point of Property (P2), we have:

    ρ(λ_ε; θ) ≤ ρ(λ_ε; (0, 0)) + |θ|²
             = 2 exp(−λ_ε²/2) − 2√(2π) λ_ε Φ(−λ_ε) + |θ|²
             ≤ 2 exp(−λ_ε²/2)(1 + λ_ε²/2) + |θ|²
             = (2ε/log ε⁻¹)(1 + log(ε⁻¹ log ε⁻¹)) + |θ|²
             ≤ (2ε + |θ|²)(1 + log(ε⁻¹ log ε⁻¹)).

Putting both inequalities together, we have

    ρ(λ_ε; θ) ≤ (1 + log(ε⁻¹ log ε⁻¹))(2ε + min(|θ|², 2))

for ε small enough such that log ε⁻¹ ≥ 1, for instance ε = 1/N with N ≥ 3. Hence, we define the universal threshold λ_N^U = √(2 log(N log N)). With that choice of the threshold parameter, soft-Waveshrink approaches the oracle risk by a factor of essentially Λ_N^U = 1 + log(N log N), namely,

    R(λ_N^U; β) = Σ_{n=1}^N ρ(λ_N^U; β_n) ≤ (1 + log(N log N)) (2 + R(DP; β)).

This equation is analogous to equation (7). As in the real case, as the number of observations N becomes large, only the predominant features of a signal remain after denoising. Indeed, for N independent and identically distributed standard Gaussian complex random variables Z_n, we have:

    P(max_n |Z_n| > √(2 log(N log N))) → 0 as N → ∞.   (10)

A.3 Minimax threshold

Define the minimax quantities

    Λ_N^* = inf_λ sup_θ  ρ(λ; θ) / (2/N + min(|θ|², 2)),

and λ_N^* the largest λ attaining Λ_N^* above. Then the overall risk for the N wavelet coefficients is

    Σ_{n=1}^N ρ(λ_N^*; β_n) ≤ Λ_N^* Σ_{n=1}^N (2/N + min(|β_n|², 2)) = Λ_N^* (2 + R(DP; β)).

To find λ_N^*, consider the analogous quantity where the supremum over [0, ∞) × [0, ∞) is replaced by the supremum over the endpoints {(0, 0), (∞, ∞)}, namely,

    Λ'_N = inf_λ sup_{θ ∈ {(0,0), (∞,∞)}}  ρ(λ; θ) / (2/N + min(|θ|², 2)),   (11)

and let λ'_N be the largest λ attaining Λ'_N. In Appendix B we show that Λ_N^* = Λ'_N and λ_N^* = λ'_N. Now ρ(λ; (∞, ∞)) = 2 + λ² is strictly increasing in λ and ρ(λ; (0, 0)) = 2√(2π)(φ(λ) − λΦ(−λ)) is strictly decreasing in λ, so that at the solution of (11),

    (N + 1) ρ(λ_N^*; (0, 0)) = ρ(λ_N^*; (∞, ∞)).

And this last equation defines λ_N^* uniquely. Table I lists the universal λ_N^U and minimax λ_N^* thresholds and the corresponding bounds Λ_N^U and Λ_N^* for N = 2^J with J between 6 and 16. The table is analogous to Table 2 of [2].

III. Application

In the previous section we have derived the universal and minimax thresholds for denoising complex signals with Waveshrink. We now use the procedure for an Electronic Surveillance application. In this Electronic Surveillance application, we are

TABLE I
Coefficient λ_N^* and related quantities

        N    λ_N^*   √(2 log(N log N))   Λ_N^*   1 + log(N log N)
       64    1.763        3.342          2.514         6.584
      128    1.973        3.586          2.924         7.431
      256    2.176        3.810          3.355         8.258
      512    2.371        4.017          3.804         9.069
     1024    2.560        4.211          4.271         9.868
     2048    2.741        4.395          4.754        10.656
     4096    2.917        4.569          5.252        11.436
     8192    3.086        4.735          5.762        12.209
    16384    3.251        4.894          6.285        12.977
    32768    3.411        5.048          6.817        13.739
    65536    3.566        5.195          7.360        14.496
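The minimax column of Table I can be reproduced numerically: λ_N^* is the unique root of (N + 1) ρ(λ; (0, 0)) = 2 + λ², with ρ(λ; (0, 0)) = 2√(2π)(φ(λ) − λΦ(−λ)). A bisection sketch of ours, checked against the N = 2048 row:

```python
import math

def risk_at_zero(lam):
    # rho(lam; (0,0)) = 2*sqrt(2*pi) * (phi(lam) - lam*Phi(-lam))
    phi = math.exp(-lam * lam / 2) / math.sqrt(2 * math.pi)
    Phi_neg = 0.5 * math.erfc(lam / math.sqrt(2))
    return 2 * math.sqrt(2 * math.pi) * (phi - lam * Phi_neg)

def minimax_threshold(N):
    # p(N; lam) = (N+1)*rho(lam; (0,0)) - (2 + lam^2) is decreasing in lam,
    # positive at 0 and negative for large lam, so bisection finds the root
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (N + 1) * risk_at_zero(mid) > 2 + mid * mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

N = 2048
lam_minimax = minimax_threshold(N)
lam_universal = math.sqrt(2 * math.log(N * math.log(N)))
assert abs(lam_minimax - 2.741) < 0.01    # Table I, N = 2048
assert abs(lam_universal - 4.395) < 0.01  # Table I, N = 2048
```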

interested in the problem of passive detection and fingerprinting of incoming radar signatures from an electronic surveillance platform. The observed complex signal plotted in Figure 1 is N = 2048 samples from a chirped RF source at 3 db with jamming interference. We propose to model the underlying complex signal as a linear combination of wavelets. It is natural for this application to use chirplets [7], a collection of locally supported basis functions whose frequency changes linearly with time. The collection is however "over-complete" in the sense that its cardinal P is larger than the number of observations N. The matrix Φ has more columns than rows and is therefore not orthonormal.

Fig. 1. Real (bottom) and imaginary (top) parts of a chirped RF source at 3 db with jamming interference.

Chen, Donoho and Saunders [9] proposed an extension of soft-Waveshrink, called Basis Pursuit, to estimate the coefficients in the over-complete situation. The coefficient estimate β̂_λ is defined as the solution to the Basis Pursuit optimization problem:

    min_β (1/2) ||s − Φβ||₂² + λ ||β||₁,   (12)

where ||β||₁ = Σ_{p=1}^P |β_p|. This definition of the coefficients is a generalization of Waveshrink because, interestingly, the soft-Waveshrink estimate β̂_λ = δ_λ^(soft)(Φ' s) is the closed-form solution to the Basis Pursuit optimization problem when Φ is orthonormal. The non-trivial Basis Pursuit optimization problem (when Φ is over-complete) can be solved by an Interior Point algorithm [9] in the real case, or by a Block Coordinate Relaxation algorithm [10] in the real and complex cases. The latter is guaranteed to converge and has been found to be empirically more efficient.
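The orthonormal-case equivalence is easy to verify numerically. In the sketch below (our code), a unitary DFT matrix stands in for an orthonormal complex wavelet basis: the modulus-soft-thresholded Least Squares coefficients attain a Basis Pursuit objective no larger than that of any competing coefficient vector we try.

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam = 16, 0.8
Phi = np.fft.ifft(np.eye(N), norm="ortho")  # unitary stand-in for orthonormal wavelets
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def objective(beta):
    # Basis Pursuit criterion (12): 0.5*||s - Phi beta||_2^2 + lam*||beta||_1
    return 0.5 * np.linalg.norm(s - Phi @ beta) ** 2 + lam * np.sum(np.abs(beta))

ls = Phi.conj().T @ s                       # Least Squares coefficients
# closed-form soft-Waveshrink solution: shrink moduli, keep phases
beta_hat = ls * np.maximum(np.abs(ls) - lam, 0.0) / np.maximum(np.abs(ls), 1e-12)

assert objective(beta_hat) <= objective(ls) + 1e-9      # beats no shrinkage
assert objective(beta_hat) <= objective(0 * ls) + 1e-9  # beats the zero solution
for _ in range(100):                                    # beats random perturbations
    pert = beta_hat + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    assert objective(beta_hat) <= objective(pert) + 1e-9
```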

As for Waveshrink, the estimate of the meta parameter λ in (12) is a crucial point of the Basis Pursuit procedure. In practice, the threshold (universal or minimax) developed for Waveshrink gives Basis Pursuit a good denoising performance. Because our application is concerned with feature extraction, we propose to use the universal threshold. By property (10), most interferences will be erased and the main features of the underlying signal will be revealed. Using the estimate of the standard deviation of the noise proposed by Donoho and Johnstone in [1] for Gaussian noise, we find that σ̂ = 0.51. So our selection of the meta parameter is λ = σ̂ √(2 log(N log N)) = 2.22. After solving the Basis Pursuit optimization problem, we obtain the denoised estimate f̂ = Φ β̂_λ.

Figure 2 shows the spectrogram of the original signal on top, and the spectrogram of the denoised signal at the bottom. While a chirp with linearly decreasing frequency is already slightly visible in the original signal, it is interesting to see how Basis Pursuit with the universal selection of the meta parameter has "cleaned up" the signal from most of the jamming interference. For even better results, one can choose the meta parameter by hand. For instance, Figure 3 shows the Basis Pursuit denoised signal with λ = 4. An automatic procedure will more easily detect the fingerprint of an incoming radar signature on the denoised signal than on the original one.

Fig. 2. The spectrogram for a chirped RF source at 3 db with jamming interference. The top spectrogram is of the original signal and the bottom spectrogram is of the denoised signal using the universal meta parameter (λ = 2.22).

IV. Conclusions

We have generalized Waveshrink and Basis Pursuit to complex signals. Both procedures need a selection of the meta parameter λ that controls the amount of denoising. For using the two procedures on complex signals, we have derived the universal and minimax threshold selections of λ in a similar fashion as previously proposed for real signals by Donoho and Johnstone [2]. We then used the Ba-

sis Pursuit procedure with the proposed universal threshold to successfully denoise a complex signal in an Electronic Surveillance application.

Fig. 3. The spectrogram for a chirped RF source at 3 db with jamming interference. The top spectrogram is of the original signal and the bottom spectrogram is of the denoised signal for a meta parameter chosen by hand (λ = 4) to remove most of the interference.

Appendix

I. Risk properties of soft thresholding

The risk of the soft-thresholding estimator can be rewritten as

    ρ(λ; θ) = E[((|Y| − λ)_+ Y_1/|Y| − θ_1)²   (13)
             + ((|Y| − λ)_+ Y_2/|Y| − θ_2)²]   (14)
            = 2 + λ² − E[(λ² − |Y|²) 1(|Y| < λ)] − 4 P(|Y| < λ)
              − 2λ ∫₀^{2π} ∫_λ^∞ φ_{θ_1}(r cos α) φ_{θ_2}(r sin α) dr dα,   (15)

where φ_{θ_i}(x) = φ(x − θ_i) is the Gaussian density function with mean θ_i. We deduce the following properties of the risk:

P1. ρ(λ; θ) ≤ ρ(λ; (0, 0)) + |θ|².
Proof: from (14), we easily obtain:

    ρ(λ; θ) = ρ(λ; (0, 0)) + |θ|² − 2 E[((|Y| − λ)_+/|Y|) (θ_1 Y_1 + θ_2 Y_2)],

and the third term q = E[((|Y| − λ)_+/|Y|) (θ_1 Y_1 + θ_2 Y_2)] is positive. (To see this last point, note that we can restrict our attention to θ_1 > 0 and θ_2 > 0. Then define the line L_1: θ_1 y_1 + θ_2 y_2 = 0 and L_2 the line parallel to L_1 that passes through (θ_1, θ_2), and let P_1 be the half-plane below L_1, P_3 the half-plane above L_2 and P_2 the remaining band. One can easily show that q = q_1 + q_2 + q_3 is positive since q_2 ≥ 0 and q_3 ≥ −q_1 ≥ 0.)

P2. ρ(λ; (θ_1, θ_2)) reaches its maximum for a fixed λ at (θ_1, θ_2) = (∞, ∞), and ρ(λ; (∞, ∞)) = 2 + λ² is strictly increasing in λ.
Proof: the last three terms of the right side of equation (15) are negative. So the proof is done if we can show that their limits are zero when θ_1 → ∞ and θ_2 → ∞. It is straightforward for the first two terms, whose integrals are defined on the compact disc of radius λ. The third term is also negative, but showing that its limit is zero when θ_1 = θ_2 = θ_0 → ∞ demands more work. After some algebra, one can show that, for θ_1 = θ_2 = θ_0,

    q(θ_0) = 2λ ∫₀^{2π} ∫_λ^∞ φ_{θ_0}(r cos α) φ_{θ_0}(r sin α) dr dα
           ≤ 2λ √(2π) exp(−θ_0²/2) I_0(θ_0²/2),

where I_0 is the modified Bessel function of the first kind with zero order. In [11], an asymptotic expansion of this function is given by I_0(x) = e^x/√(2πx) (1 + 1/(8x) + 1·3²/(2!(8x)²) + ···), so that

    0 ≤ q(θ_0) ≤ (2√2 λ/θ_0) (1 + 1/(4θ_0²) + 1·3²/(2!(4θ_0²)²) + ···).

So lim_{θ_0→∞} q(θ_0) = 0. Hence the maximum of ρ(λ; (θ_1, θ_2)) occurs at (∞, ∞), and ρ(λ; (∞, ∞)) = 2 + λ².

ρ(λ; (0, 0)) = 2√(2π)(φ(λ) − λΦ(−λ)) is strictly decreasing in λ. From these two points, we get that p(N; λ) = (N + 1) ρ(λ; (0, 0)) − ρ(λ; (∞, ∞)) is decreasing in λ. Moreover, p(N; 0) > 0 and p(N; ∞) = −∞, so that the root λ_N^* of p(N; λ) = 0 is unique.

P3. λ_N^* > √(2/N) for N ≥ 4.
Proof: p(N; λ) is decreasing in λ. Moreover, p(N; √(2/N)) is increasing in N and positive for N = 4. So the root λ_N^* of the equation is larger than √(2/N).

II. Proof: equivalence of Λ_N^* and Λ'_N

It is clear that Λ'_N(λ) ≤ Λ_N(λ) for all λ. So if we can establish that Λ_N(λ'_N) = Λ'_N(λ'_N), then λ'_N will also minimize Λ_N. We must prove that

    L_N(θ; λ'_N) = ρ(λ'_N; θ) / (2/N + min(|θ|², 2))

attains its maximum at either (0, 0) or (∞, ∞). We split the problem into two cases:

for |θ|² > 2, L_N(θ; λ'_N) = ρ(λ'_N; θ)/(2/N + 2), and the numerator (see equation (15)) attains its maximum at (∞, ∞);

for |θ|² ≤ 2, L_N(θ; λ'_N) ≤ (ρ(λ'_N; (0, 0)) + |θ|²)/(2/N + |θ|²) by Property (P1). Looking now at the right hand side of the inequality as a function of r = |θ|², the sign of its derivative is the sign of 2/N − ρ(λ'_N; (0, 0)), which in turn is negative because ρ(λ'_N; (0, 0)) = (2 + λ'_N²)/(N + 1) > 2/N for N ≥ 4 by Property (P3). So L_N(θ; λ'_N) reaches its maximum at the endpoint θ = (0, 0).

III. Acknowledgments

This work was partially supported by Navy Phase I SBIR Award No. N68936-97-C-0150. I am thankful to Iain M. Johnstone for helpful comments. Three anonymous referees contributed numerous improvements to the manuscript.

References

[1] David L. Donoho and Iain M. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," Journal of the American Statistical Association, vol. 90, no. 432, pp. 1200-1224, December 1995.
[2] David L. Donoho and Iain M. Johnstone, "Ideal spatial adaptation via wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
[3] H.-Y. Gao, "Wavelet shrinkage denoising using the non-negative garrote," Journal of Computational and Graphical Statistics, vol. 4, pp. 469-488, 1998.
[4] Hong-Ye Gao and Andrew Bruce, "WaveShrink with firm shrinkage," Tech. Rep. 39, Statistical Sciences Division, MathSoft, Inc., 1996.
[5] Brani Vidakovic, Statistical Modeling by Wavelets, New York: Wiley, 1999.
[6] Jean-Marc Lina and Michel Mayrand, "Complex Daubechies wavelets," Applied and Computational Harmonic Analysis, vol. 2, pp. 219-229, 1995.
[7] Steve Mann and Simon Haykin, "Adaptive 'chirplet' transform: an adaptive generalization of the wavelet transform," Optical Engineering, vol. 31, no. 6, pp. 1243-1256, June 1992.
[8] Francois G. Meyer and Ronald R. Coifman, "Biorthogonal brushlet bases for directional image compression," in 12th International Conference on Analysis and Optimization of Systems, Springer-Verlag, 1996.
[9] Shaobing Chen, David L. Donoho, and Michael Saunders, "Atomic decomposition by basis pursuit," Tech. Rep., Stanford University, 1996.
[10] S. Sardy, A. Bruce, and P. Tseng, "Block coordinate relaxation methods for nonparametric wavelet denoising," Journal of Computational and Graphical Statistics, to appear.
[11] F. Bowman, Introduction to Bessel Functions, Dover Publications Inc., New York, 1958.
