
IMAGE DENOISING BASED ON SCALE-SPACE MIXTURE MODELING OF WAVELET COEFFICIENTS Juan Liu and Pierre Moulin

U. of Illinois, Beckman Inst. and ECE Dept.
405 N. Mathews Ave., Urbana, IL 61801
Email: {j-liuf, moulin}@ifp.uiuc.edu
ABSTRACT
In this paper, we propose a novel hierarchical statistical model for image wavelet coefficients. A simple classification scheme is used to construct a model that captures interscale and intrascale dependencies of wavelet coefficients. Applications to image denoising are presented. We develop a simple algorithm that outperforms other wavelet denoising schemes that exploit first-order statistics, or inter- or intrascale dependencies alone.

1. INTRODUCTION

Various statistical wavelet models for images have been proposed in the literature. A simple and popular model is the independent and identically distributed (iid) generalized Gaussian distribution (GGD) model for wavelet coefficients [1, 2]. This model has been successfully used in image denoising and restoration. It approximates first-order statistics of wavelet coefficients fairly well, but does not take higher-order statistics into account and thus presents some limitations.

The dependencies that exist between wavelet coefficients have been studied for many years in the image compression community. These dependencies can be formulated explicitly (e.g., the EQ coder [3]) or implicitly (e.g., the morphological coder [4]). Moreover, most wavelet models can be loosely classified into two categories: those exploiting interscale dependencies, and those exploiting intrascale dependencies.

1. Interscale models. For typical images, the magnitudes of wavelet coefficients are strongly correlated across scales [5]. Formally, wavelet coefficients can be organized into a quad-tree structure. If a parent node has small magnitude, its children are very likely to be small too. This property is exploited in Shapiro's zerotree coder [6]. Likewise, the hidden Markov tree (HMT) model by Crouse et al. [7] captures dependencies between a parent and its children. The distribution of coefficients in each subband is an iid Gaussian mixture.

2. Intrascale models.
Compression algorithms such as the EQ coder [3] and the morphological coder [4] exploit spatial clustering of wavelet coefficients. For example, the EQ coder models wavelet coefficients as independent GGD with zero mean and slowly varying variance. This model has recently found useful applications to denoising [8]; local statistics are estimated from the data. However, the local homogeneity assumption is questionable in the vicinity of edges. Another method involving spatially varying variance was developed in [9]. The model in [9] is also able to account for interscale dependencies.

2. A SCALE-SPACE MIXTURE MODEL

Considering the advantages and limitations of the statistical models above, we propose a new model which explicitly combines interscale and intrascale dependencies of image wavelet coefficients, using a simple classification technique. Our approach is based on the following empirical observations: wavelet coefficients with large magnitude are typically representative of edges or some textures, while those with small magnitude are associated with smooth regions such as the background. Also, there is a strong correlation between magnitudes of coefficients across scales [5]. The design of our new model is primarily motivated by the zerotree coding technique.

Our classification technique is illustrated in Fig. 1. We define a significance threshold T. In each subband except the first fine scale, we partition the coefficients into two classes based on the magnitude of their parents: Gsig is the set of coefficients that have significant (magnitude > T) parents, and Ginsig is the set of coefficients that have insignificant parents. The binary significance map in Fig. 2 shows the classification of Lena's wavelet coefficients.
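The parent-based partition described above can be sketched in a few lines (an illustrative sketch, not the authors' Matlab code; the function name, toy subband sizes, and threshold value are our own assumptions):

```python
# Sketch of the parent-significance classification (Section 2).
# Each coefficient in a fine subband is assigned to G_sig or G_insig
# according to the magnitude of its parent in the next-coarser subband.
# In the standard quad-tree, the parent of child (r, c) sits at (r // 2, c // 2).

def classify_by_parent(child, parent, T):
    """Return two lists of (r, c) indices: (G_sig, G_insig)."""
    g_sig, g_insig = [], []
    for r in range(len(child)):
        for c in range(len(child[0])):
            if abs(parent[r // 2][c // 2]) > T:   # significant parent
                g_sig.append((r, c))
            else:                                 # insignificant parent
                g_insig.append((r, c))
    return g_sig, g_insig

# Toy example: a 2x2 parent subband and its 4x4 child subband.
parent = [[70.0, 5.0],
          [10.0, -80.0]]
child = [[1.0] * 4 for _ in range(4)]
g_sig, g_insig = classify_by_parent(child, parent, T=60.0)
# The children of the two significant parents (|70| > 60, |-80| > 60)
# form G_sig: the top-left and bottom-right 2x2 blocks, 8 coefficients.
```

Note that the classification needs only the parent subband, which is why the algorithm of Section 3 can proceed top-down, coarse to fine.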

0-7803-5467-2/99/$10.00 © 1999 IEEE



Figure 1: Partition of fine subbands: for every coefficient x (marked with a small black box), the algorithm refers to its parent coefficient. If the magnitude of the parent is larger than T, x is included in Gsig; otherwise, it is included in Ginsig.
The class Gsig is shown as the set of bright points; it mostly represents high-activity regions. The second class, Ginsig, corresponds mostly to smooth regions. The size of the two classes is controlled by the significance threshold T.

Figure 2: Significance map for Lena wavelet coefficients. The significance threshold is T = 60. The class Gsig is shown as the set of bright points, and Ginsig as the set of dark points.
The two classes have quite different statistics. In general, the histogram of the coefficients in Ginsig is highly concentrated around zero, while the histogram of Gsig is more spread out (see Fig. 3a and Fig. 3b). We have observed that a Laplacian distribution provides a good fit for Gsig. Likewise, for Ginsig, which corresponds mostly to homogeneous regions, the intrascale model used in the EQ coder [3] is appropriate: it provides a good fit for the first-order statistics of wavelet coefficients and models well the nonstationary nature of low-activity regions.

This scale-space mixture model is more accurate than purely intra- or interscale models in its ability to discriminate between edges and smooth areas. Unlike purely interscale models such as HMT [7], our model captures dependencies between neighbors and the local homogeneity of wavelet coefficients in smooth regions. Unlike purely intrascale models such as EQ [3], our model captures cross-scale dependencies in high-activity regions.

3. APPLICATION TO IMAGE DENOISING

We now apply the statistical model designed in the previous section to image denoising. Assume that the clean image x is contaminated with additive white Gaussian noise (AWGN) w with zero mean and variance σ_w^2. We seek a good estimate of x given the noisy data y = x + w. The estimation problem can be formulated in the wavelet domain: the image coefficients x̃ are to be estimated from the empirical coefficients ỹ = x̃ + w̃. Here w̃ is the noise in the wavelet domain, and is still AWGN if the wavelet transform used is orthonormal.
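The orthonormality argument can be checked numerically: an orthonormal transform preserves the energy of any signal, so white noise keeps its variance in the wavelet domain. The sketch below uses a single orthonormal Haar analysis step as a stand-in for the paper's Daubechies filters (the function name and test vector are illustrative):

```python
import math

# One level of the orthonormal Haar transform: each pair (a, b) maps to
# ((a + b)/sqrt(2), (a - b)/sqrt(2)).  Because the transform is orthonormal,
# it preserves the sum of squares of any input (Parseval) -- in particular,
# AWGN in the image domain remains AWGN with the same variance.

def haar_step(x):
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx + detail

x = [4.0, -2.0, 3.0, 1.0, 0.0, 5.0, -1.0, 2.0]
y = haar_step(x)
energy_in = sum(v * v for v in x)    # 60.0 for this test vector
energy_out = sum(v * v for v in y)   # equal up to floating-point rounding
```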


For each wavelet coefficient, depending on the subband it belongs to, our algorithm does the following:

- The coarse-band coefficients are not processed, because the coarse subband has very high SNR; these coefficients are considered reliable.

- For each of the first three fine subbands (with horizontal, vertical, and diagonal orientations), coefficients within the subband are modeled as iid Laplacian with zero mean and variance σ_{x,j}^2 (j indicates the subband). The variance estimate is computed from the noisy coefficients in subband j as

    σ_{x,j}^2 = max(0, Var{ỹ_i, i ∈ subband j} − σ_w^2).

  This estimate does not have any special optimality property but is chosen for simplicity.



Figure 3: Log-histograms of Gsig (a) and Ginsig (b) in the second high horizontal subband (LLLH) of Lena, using significance threshold T = 60. The dash-dotted lines are the GGD fits to the histograms. For Gsig, the GGD fit has shape parameter ν = 0.98 and standard deviation 36.72; for Ginsig, ν = 0.43 and the standard deviation is 13.82.

Under this iid Laplacian model, maximum a posteriori (MAP) estimates of x̃ are obtained by applying a soft threshold λ = √2 σ_w^2 / σ_{x,j} to each noisy coefficient [2].

- In each of the other high subbands, coefficients are assigned to one of the two classes Gsig and Ginsig, depending on the magnitude of their estimated parent relative to the significance threshold T.
- Coefficients in Gsig are modeled as iid Laplacian with zero mean. Their variance is estimated from the noisy coefficients in Gsig as max(0, Var{ỹ_i, i ∈ Gsig} − σ_w^2). Again the MAP estimator is a simple soft-thresholding scheme, with the threshold value adjusted to the signal variance.

- Coefficients in Ginsig typically have small magnitude and represent smooth areas. We use the technique in [3, 8] and compute the maximum-likelihood estimate of the variance σ̂_i^2 at each location i using a 5 × 5 local window centered at i. Instead of using all coefficients inside the window, only those that belong to Ginsig are used; those in Gsig are excluded. Assuming σ̂_i^2 is the true variance of x̃_i, the MAP estimator is given by

    x̂_i = σ̂_i^2 / (σ̂_i^2 + σ_w^2) · ỹ_i.

For each high subband, the significance threshold T controls the assignment of coefficients to Gsig and Ginsig. For a given value of T, the variance σ_{x,sig}^2 and the variance field {σ_i^2, i ∈ Ginsig} are estimated as described above. The likelihood of the data is evaluated and viewed as a function of T. We then choose T by numerically searching through a range of possible values so as to maximize this likelihood.

The denoising approach is top-down. Starting from the coarse scale, the algorithm goes from parent to children subbands, and terminates in the finest subbands. After completion of a subband, we obtain not only coefficient estimates, but also the parent significance information necessary to process the next high subband.
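The two per-class estimators can be sketched as follows (a sketch under the paper's model: the soft threshold λ = √2 σ_w^2/σ_x for the Laplacian MAP estimate and the Wiener-type rule for Ginsig follow the text, but the function names and toy values are our own):

```python
import math

def soft_threshold(y, lam):
    # MAP estimate under an iid Laplacian prior: shrink toward zero by lam.
    return math.copysign(max(abs(y) - lam, 0.0), y)

def denoise_gsig(y, sigma_x, sigma_w):
    # Coefficients with significant parents: Laplacian model,
    # soft threshold lambda = sqrt(2) * sigma_w^2 / sigma_x.
    lam = math.sqrt(2.0) * sigma_w**2 / sigma_x
    return soft_threshold(y, lam)

def denoise_ginsig(y, local_var, sigma_w):
    # Coefficients with insignificant parents: zero-mean Gaussian with
    # locally estimated variance; the MAP estimate is the Wiener-type
    # shrinkage x_hat = local_var / (local_var + sigma_w^2) * y.
    return local_var / (local_var + sigma_w**2) * y

# Toy usage with sigma_w = 15: a "significant" coefficient with
# sigma_x = 30, and an "insignificant" one with local variance 25.
x_sig = denoise_gsig(50.0, sigma_x=30.0, sigma_w=15.0)        # shrinks 50 by ~10.6
x_insig = denoise_ginsig(10.0, local_var=25.0, sigma_w=15.0)  # scales 10 by 25/250 = 0.1
```

Both rules shrink noisy coefficients toward zero, but the Gsig rule preserves large coefficients up to a constant offset, while the Ginsig rule scales them down in proportion to the local signal-to-noise ratio.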

4. EXPERIMENTAL RESULTS

We have applied our denoising scheme to various images, and report results for the 512 × 512 Lena and Barbara images. We use Daubechies 8-tap filters [10] and a 5-level wavelet decomposition. We have compared our method with other methods using different statistical models, including:

- Universal hard thresholding [11]. This denoising scheme is based on a minimax criterion, and does not involve specific signal statistics.


The Matlab code for the denoising algorithm is available at http://www.ifp.uiuc.edu/~j-liuf/codes.html.


- MAP estimator under an iid GGD model for each subband. For simplicity, we consider only the Laplacian model, which is nearly optimal [2]. The MAP estimator takes the form of a soft threshold applied to the noisy coefficients.

- Maximum-likelihood (ML) estimator under the hidden Markov tree (HMT) model [7]. This model exploits interscale dependencies. The expectation-maximization (EM) algorithm is used to estimate the hidden states and the original wavelet coefficients.

- Adaptive Wiener filtering in the wavelet domain [8]. This method exploits intrascale dependencies only. We use the baseline method, which models coefficients as independent Gaussian with zero mean and slowly varying variance [3]. We then perform Wiener filtering based on local statistics evaluated in a 5 × 5 neighborhood.
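As a point of reference, the first baseline can be sketched as follows (the universal threshold λ = σ√(2 ln n) of Donoho and Johnstone [11]; the function name and toy data are illustrative, and the published experiments may differ in detail):

```python
import math

def universal_hard_threshold(coeffs, sigma):
    # Donoho-Johnstone universal threshold: lambda = sigma * sqrt(2 ln n).
    # Hard thresholding keeps a coefficient unchanged if |y| > lambda
    # and sets it to zero otherwise (no shrinkage of the survivors).
    n = len(coeffs)
    lam = sigma * math.sqrt(2.0 * math.log(n))
    return [y if abs(y) > lam else 0.0 for y in coeffs]

# Toy usage: n = 4, sigma = 1 gives lambda = sqrt(2 ln 4), about 1.665.
out = universal_hard_threshold([3.0, -0.5, 1.0, -2.0], sigma=1.0)
# Coefficients with |y| <= lambda are zeroed; the rest pass through.
```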

The mean-squared error (MSE) values of the denoised images obtained with these methods are reported in Table 1. Denoised images are shown in Fig. 4. Our method consistently outperforms the other methods in terms of MSE. Compared to the HMT model, the improvement is 8% to 16% for Lena, and 21% to 26% for Barbara. The improvement for Barbara is more pronounced, probably because Barbara has more spatial variation, which our model takes into account. The MSE improvement over adaptive Wiener filtering is only marginal (about 1% to 7% for both images). However, the visual quality is noticeably better using our method (see Fig. 4d and Fig. 4e). The separation of subbands into significant and insignificant classes, and the different treatment of the two classes, helps discriminate between high-activity areas and smooth regions. In particular, we obtain better smoothing in homogeneous regions, e.g., the background and Lena's face, and edges are better preserved.

We also compare our results with the adaptive thresholding with context modeling of Chang et al. [12]. This method considers both inter- and intrascale dependencies in a different way: a context is defined for each coefficient using its parent and neighbors, and a large context value indicates that the coefficient is likely to represent an edge. Their denoising method applies soft thresholding, with the threshold made spatially adaptive using context information. Their MSEs are about 15% higher than ours.

5. REFERENCES

[1] S. G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. on PAMI, vol. 11, pp. 674-693, July 1989.
[2] P. Moulin and J. Liu, Analysis of multiresolution image denoising schemes using generalized-Gaussian and complexity priors, IEEE Trans. on Info. Theory, vol. 45, pp. 909-919, Apr. 1999.
[3] S. LoPresto, K. Ramchandran, and M. T. Orchard, Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework, in Data Compression Conference '97, (Snowbird, UT), pp. 221-230, 1997.
[4] S. Servetto, K. Ramchandran, and M. T. Orchard, Morphological representation of wavelet data for image coding, in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., (Detroit, MI), pp. 2229-2232, 1995.
[5] E. P. Simoncelli and J. Portilla, Texture characterization via joint statistics of wavelet coefficient magnitudes, in Proc. ICIP'98, (Chicago, IL), pp. I.62-66, Oct. 1998.
[6] J. M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Proc., vol. 41, pp. 3445-3462, Dec. 1993.
[7] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, Wavelet-based statistical signal processing using hidden Markov models, IEEE Trans. on Signal Processing, vol. 46, pp. 886-902, Apr. 1998.
[8] M. K. Mihcak, I. Kozintsev, and K. Ramchandran, Local statistical modeling of wavelet image coefficients and its application to denoising, in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., (Phoenix, AZ), pp. 3253-3256, Mar. 1999.
[9] R. L. Joshi, H. Jafarkhani, J. H. Kasner, T. R. Fischer, N. Farvardin, M. W. Marcellin, and R. H. Bamberger, Comparison of different methods of classification in subband coding of images, IEEE Trans. on Image Processing, vol. 6, pp. 1473-1486, Nov. 1997.
[10] I. Daubechies, Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, 1992.
[11] D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika, vol. 81,
pp. 425-455, 1994.

[12] S. G. Chang, B. Yu, and M. Vetterli, Spatially adaptive wavelet thresholding with context modeling for image denoising, in Proc. ICIP'98, (Chicago, IL), pp. I.535-539, Oct. 1998.


Table 1: MSE of the denoised Lena and Barbara images for noise standard deviations σ_w = 7, 10, and 15.

                                 |         Lena          |        Barbara
                                 | σ_w=7  σ_w=10  σ_w=15 | σ_w=7  σ_w=10  σ_w=15
Universal hard thresholding [11] | 33.58   51.52   77.20 | 74.30  120.29  202.66
MAP, iid Laplacian model [2]     | 17.91   26.74   41.30 | 30.62   51.90   90.59
ML, HMT model [7]                | 15.73   26.69   39.06 | 27.83   47.53   87.67
Adaptive Wiener (EQ) [8]         | 14.52   22.44   36.98 | 22.20   39.07   68.98
Our method                       | 14.44   21.47   36.02 | 22.03   37.34   64.73

Figure 4: Denoising results for Lena: (a) noisy image with noise standard deviation σ_w = 15; (b) MAP estimates assuming an iid Laplacian distribution, MSE = 41.30; (c) denoised image using the HMT model [7], MSE = 39.06; (d) denoised image using adaptive Wiener filtering in the wavelet domain with 5 × 5 windows, MSE = 36.98; (e) denoised image using our model, MSE = 36.02.

