
New Tight Frames of Curvelets and
Optimal Representations of Objects with $C^2$ Singularities
Emmanuel J. Candès
Applied and Computational Mathematics
California Institute of Technology
Pasadena, California 91125
David L. Donoho
Department of Statistics
Stanford University
Stanford, California, 94305
November 2002
Abstract
This paper introduces new tight frames of curvelets to address the problem of finding
optimally sparse representations of objects with discontinuities along $C^2$ edges.
Conceptually, the curvelet transform is a multiscale pyramid with many directions and
positions at each length scale, and needle-shaped elements at fine scales. These elements
have many useful geometric multiscale features that set them apart from classical
multiscale representations such as wavelets. For instance, curvelets obey a parabolic scaling
relation which says that at scale $2^{-j}$, each element has an envelope which is aligned
along a "ridge" of length $2^{-j/2}$ and width $2^{-j}$.
We prove that curvelets provide an essentially optimal representation of typical
objects $f$ which are $C^2$ except for discontinuities along $C^2$ curves. Such representations
are nearly as sparse as if $f$ were not singular and turn out to be far more sparse than the
wavelet decomposition of the object. For instance, the n-term partial reconstruction
$f_n^C$ obtained by selecting the $n$ largest terms in the curvelet series obeys
$$\|f - f_n^C\|_{L_2}^2 \le C \cdot n^{-2} \cdot (\log n)^3, \qquad n \to \infty.$$
This rate of convergence holds uniformly over a class of functions which are $C^2$ except for
discontinuities along $C^2$ curves and is essentially optimal. In comparison, the squared
error of n-term wavelet approximations only converges as $n^{-1}$ as $n \to \infty$, which is
considerably worse than the optimal behavior.
Keywords. Curvelets, wavelets, second dyadic decomposition, edges, nonlinear
approximation, singularities, thresholding, Radon transform.
Acknowledgments. This research was supported by National Science Foundation
grants DMS 9872890 (KDI) and by an Alfred P. Sloan Fellowship. E. C. thanks the
Institute for Pure and Applied Mathematics at UCLA and especially Mark Green and
Eilish Hathaway for their warm hospitality. E. C. would also like to acknowledge fruitful
conversations with Laurent Demanet.
1 Introduction
1.1 The Problem of Edges
Edges are prominent features of the visual world; from some points of view a visual scene
contains little of importance besides edges. In fact, neuroscientists have identified edge
processing neurons in the earliest and most fundamental stages of the processing pipeline
upon which mammalian visual processing is built. Edges are also ubiquitous in synthetic
and real digital imagery, and a great deal of technological research aims to find and represent
edges. Thus, in medical imaging, detecting and enhancing boundaries between different
cavities is of prime importance. Finally, edge-like phenomena exist outside vision, for
example in certain physical systems, where shock fronts occur naturally. Research in
scientific computing concerns the efficient representation and propagation of such fronts.
This article is motivated by fundamental questions concerning the mathematical
representation of objects containing edges: what is the sparsest representation of functions
$f(x_1, x_2)$ which contain smooth regions, but also edges? We give a quantitative content
to this question, by using a simple mathematical model of images, asking a fundamental
approximation-theoretic question about this model, and using harmonic analysis techniques
to answer the question. We construct and analyze a new tight frame for representing
functions $f(x_1, x_2)$, and establish an essential optimality of this system. Underlying these results
is a mathematical insight concerning the central role, for the analysis and synthesis of
objects with discontinuities along curves (i.e. edges), played by parabolic scaling, in which
analysis elements are supported in elongated regions obeying the relation
width $\approx$ length$^2$.
The theoretical results established here should be of considerable interest for a wide
variety of technological fields. We mention two specific areas where the implications seem
most immediate:
• Image Coding. Today the most advanced image coders are transform coders; these
  typically apply a linear transform to the image data, yielding coefficients that are
  then quantized. Our theoretical results suggest, in a mathematically precise model
  of image encoding, that popular classical transforms such as cosine transforms and
  wavelet transforms can be substantially outperformed by the new class of transforms
  we describe here. Since those classical transforms underlie JPEG and JPEG-2000,
  this fact may be of substantial interest.
• Image Reconstruction. Digitally acquired data which are blurred, noisy, and indirectly
  measured, are of interest in technological and scientific fields ranging from medical
  imagery to extragalactic astronomy. Our theoretical results, see the companion paper
  [8], show that, because images have edges, many of the standard approaches to
  image restoration and enhancement (e.g. method of regularization, wavelet-vaguelette
  decomposition) are suboptimal, particularly in very noisy situations; whereas, in our
  mathematical model of image restoration, the new schemes we introduce are
  essentially optimal.
In essence, in both areas, parabolic scaling ought to be extremely helpful in better
resolving the edgelike components of the images; it gives better accuracy in the vicinity of
edges while using many fewer terms in an approximation. When compared to non-parabolic
scaling methods like Fourier analysis and wavelets, this can lead to better compression in
image coding, and better image restoration in the presence of noise.
1.2 Quantifying the Approximation Performance
To quantify the performance of various representations, we will take the viewpoint of
nonlinear approximation. We consider as a model problem the case where typical objects are
functions of two variables with discontinuities along edges and which are otherwise smooth.
To make things concrete, consider representing in a Fourier basis a binary object $f$, i.e. the
indicator function of a set with a $C^2$ boundary. Then the number of Fourier coefficients of
$f$ exceeding $1/n$ in absolute value grows as rapidly as $c \cdot n^2$ as $n \to \infty$. This rapid rate of
growth means that many different terms are needed to obtain good partial reconstructions.
Let $f_n^F$ be the best partial reconstruction obtained by selecting the $n$ largest terms in the
Fourier series; then the squared error of such an n-term expansion would obey
$$\|f - f_n^F\|_{L_2}^2 \asymp n^{-1/2}, \qquad n \to \infty. \tag{1.1}$$
The asymptotics are quite different if we now consider representing $f$ in a nice wavelet
basis. For $n$ tending to infinity, the number of coefficients above the threshold $1/n$ now
only grows as $c \cdot n$ and the best n-term wavelet approximation would obey
$$\|f - f_n^W\|_{L_2}^2 \asymp n^{-1}, \qquad n \to \infty, \tag{1.2}$$
which is significantly better. Nevertheless, this is nowhere close to being optimal.
Despite the fact that wavelets have had a wide impact in image processing, they fail to
efficiently represent objects with edges for the simple reason that the wavelet transform does
not take advantage of the geometry of the underlying edge curve. Suppose that the edge
curve is of length one, say. Then at each fixed scale $2^{-j}$, there are about $2^j$ wavelets which
interact with the edge, yielding coefficients of size about $2^{-j}$. Although this is very crude,
this analysis is essentially correct and explains why the wavelet coefficients only decay like
$1/n$. In other words, using a wavelet basis we need about $2^j$ coefficients to reconstruct the
frequency content of an edge up to the subband $|\xi| \sim 2^j$. In comparison, this paper will
exhibit a construction in which one can achieve a similar feat with only $O(2^{j/2})$ coefficients!
The limitation here is that wavelets are non-geometrical and do not exploit the regularity
of the edge curve. To obtain nearly optimal approximation rates, we need new multiscale
ideas and basis functions with a very different geometry.
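
The crude counting argument above can be checked numerically. The following is a minimal sketch, assuming an idealized coefficient sequence with exactly $2^j$ coefficients of size $2^{-j}$ at each scale $j$ (all constants set to one); the function name `idealized_edge_coefficients` and the truncation at `max_scale=16` are illustrative choices, not part of the paper.

```python
import numpy as np

def idealized_edge_coefficients(max_scale):
    """Heuristic model from the text: 2**j coefficients of size 2**-j at scale j."""
    blocks = [np.full(2**j, 2.0**-j) for j in range(max_scale + 1)]
    return np.concatenate(blocks)  # already sorted in decreasing order

coeffs = idealized_edge_coefficients(max_scale=16)
n = np.arange(1, coeffs.size + 1)

# If the n-th largest coefficient decays like 1/n, then n * |c|_(n) stays bounded.
scaled_coeff = n * coeffs

# Squared error of keeping the n largest terms = energy of the discarded tail.
energy = coeffs**2
tail = energy.sum() - np.cumsum(energy)
# Look only at n far below the truncation level, where the model is meaningful.
scaled_tail = n[:1024] * tail[:1024]

print("n * |c|_(n) range:", scaled_coeff.min(), scaled_coeff.max())
print("n * tail(n) range:", scaled_tail.min(), scaled_tail.max())
```

Both products remain within fixed bounds, which is the $1/n$ coefficient decay and the $n^{-1}$ squared-error rate of (1.2) in this toy model.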
In fact, one can easily imagine more geometrical means of approximations based on
adapted triangulations. Consider a dictionary of indicator functions of triangles with
arbitrary shapes and locations. Then it is quite clear [19] that for each $n$, there exists a
superposition of $n$ triangles $f_n^T = \sum_{i=1}^n 1_{T_i}$ with the property
$$\|f - f_n^T\|_{L_2}^2 \asymp n^{-2}, \qquad n \to \infty. \tag{1.3}$$
These types of approximation are adaptive, and are of course very different from
thresholding ideas in a fixed basis. In this direction, it is important to underline the conceptual
problems with the foundations of such results. First, the best or near-best approximation
is the solution of a complicated and abstract minimization problem. How to construct such
approximations is totally unclear if not intractable. Second, these types of results hardly
lead to any kind of realistic implementation in any practical setting. When presented with
an array of pixel intensities, it is unclear how to extract an adapted triangulation. One
would perhaps need to perform some kind of edge detection which for complicated imagery
is already quite problematic. Although this is a very important issue, we choose not to
dwell further on this topic and simply refer the reader to the companion paper [7] which
contains a comprehensive discussion on this theme.
Despite the lack of constructive character of such results, we nevertheless find them
useful because they provide an objective performance benchmark.
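
To get a feel for how far apart these benchmarks are, the rates can simply be tabulated. This is a rough sketch only: every constant is set to one (an assumption, since the text gives rates up to constants), the curvelet entry uses the rate $n^{-2}(\log n)^3$ stated in the abstract, and the helper name `squared_error_rates` is hypothetical.

```python
import math

def squared_error_rates(n):
    """Asymptotic squared n-term errors, with all constants set to 1 (assumption)."""
    return {
        "fourier (1.1)": n ** -0.5,
        "wavelet (1.2)": n ** -1.0,
        "curvelet (abstract)": n ** -2.0 * math.log(n) ** 3,
        "triangulation (1.3)": n ** -2.0,
    }

for n in (10**3, 10**4, 10**6):
    rates = squared_error_rates(n)
    print(n, {name: f"{value:.2e}" for name, value in rates.items()})
```

With unit constants, the curvelet rate drops below the wavelet rate once $(\log n)^3 < n$, that is, from roughly $n = 100$ onward.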
1.3 Optimality
The asymptotic convergence rate (1.3) is actually the correct optimal behavior for
approximating general smooth objects having discontinuities along $C^2$ curves. Consider an image
model $\mathcal{E}$ containing binary (black and white) objects supported in the unit square, for which
the curvature of the boundary curve separating black from white is bounded by some
constant $C$.
• No orthogonal basis can yield approximation rates which are better than $n^{-2}$. We
  shall not give the argument here and simply say that this follows from information
  and approximation theoretic arguments which may be found in [19] and [27].
• Even if one considers finite linear combinations of arbitrary dictionaries of waveforms
  (which do not necessarily build up orthobases or near orthogonal systems), there is no
  depth-search limited dictionary which can achieve a better rate than $n^{-2}$, see [19]. By
  depth-search limited, we mean that we are allowed sequences of dictionaries whose
  size grows polynomially in the number of terms to be kept in the approximation, see
  also [7].
• No pre-existing basis comes even close to the optimal convergence rate. In fact, the
  wavelet convergence rate (1.2) is the best published nonadaptive result.
These itemized facts raise a fundamental question: is there a basis which does nearly
this well? That is, is there a basis or tight frame in which simple thresholding achieves
the optimal rate of convergence? This article argues that the answer is yes.
1.4 New Tight Frames of Curvelets
In this paper, we construct new tight frames of curvelets to address the problem of finding
optimally sparse representations of objects with discontinuities along $C^2$ edges. These tight
frames are different from that introduced in [7] and are roughly defined as follows. We let
$\mu$ be the triple $(j, \ell, k)$; here, $j = 0, 1, 2, \ldots$ is a scale parameter; $\ell = 0, 1, \ldots, 2^j$ is an
orientation parameter; and $k = (k_1, k_2)$, $k_1, k_2 \in \mathbb{Z}$, is a translation parameter. Introduce
1. the parabolic scaling matrix $D_j$
   $$D_j = \begin{pmatrix} 2^{2j} & 0 \\ 0 & 2^{j} \end{pmatrix}, \tag{1.4}$$
2. the rotation angle $\theta_J = 2\pi \cdot 2^{-j} \cdot \ell$, with $J$ indexing the scale/angle pair $J = (j, \ell)$,
3. and the translation parameter $k_\delta = (k_1\delta_1, k_2\delta_2)$ (see Section 2 for the numerical
   values of the parameters $\delta_1, \delta_2 > 0$).
With these notations, we define curvelets as functions of $x \in \mathbb{R}^2$ by
$$\gamma_\mu(x) = 2^{3j/2}\, \gamma(D_j R_{\theta_J} x - k_\delta). \tag{1.5}$$
Here the waveform $\gamma$ is smooth and oscillatory in the horizontal direction and bell-shaped
(nonoscillatory) along the vertical direction. (We will see that the waveform $\gamma$ actually
depends on the scale parameter $j$ but only very weakly.) Continuing at this informal level of
discussion, it will be useful to think of $\gamma$ as being roughly of the form
$\gamma(x_1, x_2) = \psi(x_1)\varphi(x_2)$, where $\psi$ is a smooth wavelet and $\varphi$ a smooth scaling
function (in fact $\gamma$ is not such a direct product).
Hence, curvelet frame elements are obtained by anisotropic dilations, rotations and
translations of a collection of unit-scale oscillatory blobs. Some properties are immediate:
• The parabolic scaling (1.4) yields an Anisotropy Scaling Law: the system is
  well-localized in space and obeys approximately the relationships
  length $\approx 2^{-j}$, width $\approx 2^{-2j}$
  and, therefore, the width and length of a curvelet obey the anisotropy scaling relation
  width $\approx$ length$^2$.
• Directional Sensitivity: the elements are oriented in the co-direction $\theta_J = \pi \ell\, 2^{-j}$.
  Identifying the curvelet width $2^{-2j}$ with the scale, there are $2^j$ directions at scale
  $2^{-2j}$; that is,
  # orientations $= 1/\sqrt{\text{scale}}$.
• Spatial Localization. For a given scale and orientation, curvelets are obtained by
  two-dimensional translations; those translations form a Cartesian grid with a spacing
  proportional to the length in the direction $\theta_J$ and width in the normal direction.
• Oscillatory Nature. Curvelet elements display oscillatory components across the
  ridge.
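
The anisotropy scaling law can be read directly off the affine map in (1.5). The sketch below assumes that a unit-scale waveform is concentrated on the unit disc, so that a curvelet at index $(j, \ell)$ lives roughly where $|D_j R_{\theta_J} x| \le 1$; the function names and the sign convention chosen for the rotation matrix are assumptions made for illustration.

```python
import numpy as np

def parabolic_scaling(j):
    """D_j = diag(2^{2j}, 2^j), as in (1.4)."""
    return np.diag([2.0 ** (2 * j), 2.0 ** j])

def rotation(theta):
    """Planar rotation by theta; the sign convention is an assumption."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def envelope_axes(j, ell):
    """Semi-axes of {x : |D_j R x| <= 1}, a rough model of a curvelet's support."""
    theta = 2 * np.pi * 2.0 ** (-j) * ell
    A = parabolic_scaling(j) @ rotation(theta)
    # The singular values of A^{-1} are the extents of the preimage of the unit disc,
    # returned in decreasing order: (length, width).
    length, width = np.linalg.svd(np.linalg.inv(A), compute_uv=False)
    return length, width

for j in (1, 3, 5):
    length, width = envelope_axes(j, ell=1)
    print(j, length, width, width / length**2)
```

The last printed column is the ratio width / length$^2$, which equals one at every scale in this model.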
As in wavelet theory, we also have coarse scale elements which are of the form
$$\Phi_{k_1,k_2}(x) = \Phi(x - k_\delta), \qquad k_1, k_2 \in \mathbb{Z},$$
i.e. coarse scale curvelets are translates of a waveform $\Phi(x_1, x_2)$ that we shall take to be
bandlimited and rapidly decaying.
One of the results of this paper is to show that one can select profiles $\gamma$ and $\Phi$ such that
the system $(\gamma_\mu)_{\mu \in M}$ obeys the Parseval relation
$$\sum_\mu |\langle f, \gamma_\mu \rangle|^2 = \|f\|_{L_2(\mathbb{R}^2)}^2, \qquad \forall f \in L_2(\mathbb{R}^2). \tag{1.6}$$
This equality says that $(\gamma_\mu)_{\mu \in M}$ is a tight frame and standard arguments give the
reconstruction formula
$$f = \sum_\mu \langle f, \gamma_\mu \rangle\, \gamma_\mu, \tag{1.7}$$
with equality holding in an $L_2$-sense. The reconstruction formula says that one can analyze
and synthesize any square integrable function as a superposition of curvelet elements in a
very concrete way.
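
The passage from a Parseval relation such as (1.6) to a reconstruction formula such as (1.7) can be seen in the smallest possible setting. The sketch below is not a curvelet frame: it uses the classical three-vector "Mercedes-Benz" tight frame of $\mathbb{R}^2$ as a stand-in, purely to illustrate that a redundant, non-orthogonal system can still analyze and synthesize with the same elements.

```python
import numpy as np

# Three unit directions 120 degrees apart, scaled by sqrt(2/3) to make the frame tight.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
frame = np.sqrt(2.0 / 3.0) * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (3, 2)

f = np.array([0.7, -1.3])                 # an arbitrary vector to analyze

coefficients = frame @ f                  # analysis: <f, phi_k> for k = 0, 1, 2
parseval_gap = np.sum(coefficients**2) - np.dot(f, f)

reconstruction = frame.T @ coefficients   # synthesis: sum_k <f, phi_k> phi_k

print("Parseval gap:", parseval_gap)
print("reconstruction:", reconstruction)
```

Three coefficients describe a two-dimensional vector here, so the representation is redundant, yet no dual frame or matrix inversion is needed to recover `f`.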
Figure 1: Typical element from our edge model.
1.5 Functions which are $C^2$ Away from $C^2$ Edges
We now formally specify a class of objects with discontinuities along edges which is inspired
by [18, 21, 19]. It is clear that nothing in the arguments below would depend on the specific
assumptions we make here, but the precision allows us to make our arguments uniform over
classes of such objects.
We follow [18] and introduce Star$^2(A)$, a class of indicator functions of sets $B$ with $C^2$
boundaries $\partial B$. In polar coordinates, we let $\rho(\theta) : [0, 2\pi) \to [0, 1]$ be a radius function and
define $B$ by $x \in B$ iff $|x| \le \rho(\theta)$. In particular, the boundary $\partial B$ is given by the curve
$$\beta(\theta) = (\rho(\theta)\cos\theta,\ \rho(\theta)\sin\theta). \tag{1.8}$$
The class of boundaries of interest to us is defined by
$$\rho \le \rho_0, \qquad \|\rho\|_{\dot{C}^2} = \sup |\rho''(\theta)| \le A. \tag{1.9}$$
To fix ideas, take $\rho_0 = 1/10$. We say that a set $B \in$ Star$^2(A)$ if $B \subset [0, 1]^2$ and if $B$ is a
translate of a set obeying (1.8) and (1.9).
The geometrical regularity of the members of the class Star$^2(A)$ is useful; it forces very
simple interactions of the boundary with dyadic squares at sufficiently fine scales. We use
this to guarantee that "sufficiently fine" has a uniform meaning for every $B$ of interest.
The actual objects of interest to us are functions which are twice continuously
differentiable except for discontinuities along edges $\partial B$ of sets in Star$^2(A)$. We define $C_0^2(A)$
to be the collection of twice continuously differentiable functions supported strictly inside
$[0, 1]^2$.
Definition 1.1 Let $\mathcal{E}^2(A)$ denote the collection of functions $f$ on $\mathbb{R}^2$ which are supported
in the square $[0, 1]^2$ and obey
$$f = f_0 + f_1 1_B \tag{1.10}$$
where $B \in$ Star$^2(A)$, and each $f_i \in C_0^2(A)$. We speak of $\mathcal{E}^2(A)$ as consisting of functions
which are $C^2$ away from a $C^2$ edge.
Figure 1 gives a graphical indication of a typical element of $\mathcal{E}^2(A)$.
1.6 Sparsity and Nonlinear Approximation
Let $f$ be an object which is $C^2$ away from a $C^2$ edge. The main result of this paper is that
the curvelet coefficient sequence $(\theta_\mu)_{\mu \in M}$ of $f$ is, in some sense, as sparse as if $f$ were not
singular.
Theorem 1.2 Let $\mathcal{E}^2(A)$ be the collection (1.10) of objects which are $C^2$ away from a $C^2$
curve. Define $|\theta|_{(n)}$ to be the n-th largest entry in the coefficient sequence $(|\theta_\mu|)_{\mu \in M}$ in the
curvelet system. Then
$$\sup_{f \in \mathcal{E}^2(A)} |\theta|_{(n)} \le C \cdot n^{-3/2} \cdot (\log n)^{3/2}. \tag{1.11}$$
There is a natural companion to this theorem. Let $f_n^C$ be the n-term approximation of
$f$ obtained by extracting from the curvelet series (1.7) the terms corresponding to the $n$
largest coefficients. The approximation error obeys
$$\|f - f_n^C\|_{L_2}^2 \le \sum_{m > n} |\theta|_{(m)}^2,$$
and, therefore, the rate of decay (1.11) gives the following result.
Theorem 1.3 Under the assumptions of Theorem 1.2, the n-term approximation $f_n^C$
obtained by simple thresholding in a curvelet frame achieves
$$\|f - f_n^C\|_{L_2}^2 \le C \cdot n^{-2} \cdot (\log n)^3. \tag{1.12}$$
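
The step from the coefficient decay (1.11) to the error bound (1.12) is a tail-sum estimate. It can be sketched as follows, assuming $n \ge 3$ so that $t \mapsto t^{-3}(\log t)^3$ is decreasing on $[n, \infty)$; the constant $C'$ is introduced here for illustration and is not the paper's notation.

```latex
\begin{align*}
\sum_{m>n} |\theta|_{(m)}^2
  &\le C^2 \sum_{m>n} m^{-3} (\log m)^3
   \le C^2 \int_n^\infty t^{-3} (\log t)^3 \, dt \\
  &= C^2 \, n^{-2} \Big( \tfrac12 (\log n)^3 + \tfrac34 (\log n)^2
       + \tfrac34 \log n + \tfrac38 \Big)
   \le C' \, n^{-2} (\log n)^3 .
\end{align*}
```

Since $\log n \ge 1$ for $n \ge 3$, every lower-order term is dominated by $(\log n)^3$, and one may take $C' = \tfrac{19}{8} C^2$.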
Simply put, ignoring log-like factors, there is no basis in which coefficients of an object
with an arbitrary $C^2$ singularity would decay faster than in a curvelet frame. Moreover,
naive thresholding in a fixed curvelet frame achieves convergence rates rivaling those
attainable by adaptive approximation procedures which would attempt to track the discontinuity;
a result that the companion paper [7] qualified as quite surprising. We quote from that
paper: "In short, in a problem of considerable applied relevance, where one would have
thought that adaptive representation was essentially more powerful than fixed nonadaptive
representation, it turns out that a new fixed nonadaptive representation is essentially as
good as adaptive representation, from the point of view of asymptotic n-term approximation
errors."
1.7 Significance
The potential for sparsity is now well-understood for data compression, statistical estima-
tion [15, 22], etc., to the point that the sparsity concept has become a real paradigm in
certain research communities.
For instance, consider encoding a function $f$ by the method of wavelet transform coding.
First, one quantizes its wavelet coefficients $\langle f, \psi_\lambda \rangle$ into integers $k_\lambda$ using a uniform quantum
$q$:
$$k_\lambda = \mathrm{sgn}(\langle f, \psi_\lambda \rangle) \cdot \lfloor |\langle f, \psi_\lambda \rangle| / q \rfloor.$$
One encodes the position of the nonzero coefficients and the values of the nonzero coefficients
as bit strings by standard devices (run-length coding and so forth). Later, an approximate
reconstruction of $f$ can be obtained from $\tilde f_q = \sum_\lambda q\, k_\lambda \psi_\lambda$. Here we retain the index $q$ to
remind us that the quantization stepsize $q$ controls the behavior of the algorithm. This
coding method has distortion $\varepsilon(q) = \|f - \tilde f_q\|_{L_2}$; by picking $q$ appropriately, we can arrange
that $\varepsilon(q; f) = \varepsilon$ for any desired distortion level $\varepsilon > 0$. In return, the number of bits
required for a distortion level $\varepsilon$ is the description length $L(\varepsilon) = L(\varepsilon; f, \text{Wavelets})$ for
wavelet transform coding. Of course for typical functions $f$, $L(\varepsilon) \to \infty$ as $\varepsilon \to 0$.
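
The quantization step of such a transform coder is easy to sketch. The example below is a minimal illustration under stated assumptions: the coefficient array is synthetic (random signs with a $1/n$ decay, not an actual wavelet transform), the basis is taken to be orthonormal so that the $L_2$ distortion equals the Euclidean norm of the coefficient error, and the names `quantize` and `dequantize` are hypothetical. The bit-string encoding stage is not modeled.

```python
import numpy as np

def quantize(coeffs, q):
    """k = sgn(c) * floor(|c| / q): uniform scalar quantization with stepsize q."""
    return (np.sign(coeffs) * np.floor(np.abs(coeffs) / q)).astype(int)

def dequantize(k, q):
    """Approximate coefficients q * k used in the reconstruction."""
    return q * k

rng = np.random.default_rng(0)
# Hypothetical decaying coefficient sequence standing in for wavelet coefficients.
coeffs = rng.standard_normal(1000) / np.arange(1, 1001)

for q in (0.5, 0.1, 0.02):
    k = quantize(coeffs, q)
    # In an orthonormal basis, the L2 distortion is the norm of the coefficient error.
    distortion = np.linalg.norm(coeffs - dequantize(k, q))
    print(f"q={q}: nonzero integers={np.count_nonzero(k)}, distortion={distortion:.4f}")
```

Shrinking `q` lowers the distortion but increases the number of nonzero integers to be encoded, which is the trade-off that the description length $L(\varepsilon)$ measures.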
If we encode a function $f$ of the above type, which is smooth away from $C^2$ edges, by
wavelet transform coding, we get that the wavelet description length $L(\varepsilon; f, \text{Wavelets})$
grows as $\varepsilon \to 0$ at least as rapidly as $c\,\varepsilon^{-2}$. Because Fourier series are much denser than
wavelet expansions, the Fourier description length is significantly worse; $L(\varepsilon; f, \text{Fourier})$
grows as $\varepsilon \to 0$ at least as rapidly as $c\,\varepsilon^{-4}$. Now a strategy identical to that developed in
[17] shows that one can exploit the sparsity (1.11) and develop a curvelet transform coder
based on simple ideas such as scalar quantization and run-length coding which, ignoring
log-like factors, yields a curvelet description length $L(\varepsilon; f, \text{Curvelets})$ growing more slowly
than $\varepsilon^{-1}$.
In fact, $L(\varepsilon; f, \text{Curvelets})$ is asymptotically nearly optimal in the sense that there is
no strategy which can encode elements taken from our edge models with fewer bits than
$c\,\varepsilon^{-1}$ as $\varepsilon \to 0$, see [19] for details.
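
These exponents line up with the n-term approximation rates of Section 1.2. As a heuristic only (assuming the description length is comparable, up to log-like factors, to the number $n$ of retained terms, which the text does not state in this form), a squared error decaying like $n^{-\alpha}$ translates into a term count as follows.

```latex
\[
\varepsilon^2 \asymp n^{-\alpha}
\quad\Longrightarrow\quad
n \asymp \varepsilon^{-2/\alpha},
\qquad\text{so}\qquad
\alpha = \tfrac12 \;(\text{Fourier}) \mapsto \varepsilon^{-4},
\quad
\alpha = 1 \;(\text{wavelets}) \mapsto \varepsilon^{-2},
\quad
\alpha = 2 \;(\text{curvelets}) \mapsto \varepsilon^{-1}.
\]
```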
Another implication concerns statistical estimation. Consider the problem of recovering
a function $f(x_1, x_2)$ from noisy data. The function $f$ to be recovered is assumed smooth
apart from a discontinuity along a $C^2$ edge. We use the continuum white noise model
and observe $y = f + \varepsilon n$ where $n$ is white noise with noise level $\varepsilon$. Then a near corollary
of Theorem 1.2 gives that simple strategies based on the shrinkage of curvelet coefficients
yielding an estimator $\hat f$ achieve, ignoring log-like factors, a Mean Squared Error (MSE)
obeying
$$\sup_{f \in \mathcal{F}} E\|\hat f - f\|_{L_2}^2 \asymp \varepsilon^{4/3}, \qquad \varepsilon \to 0.$$
In fact, this is essentially the optimal rate of convergence as the minimax rate scales like
$\varepsilon^{4/3}$. In other words, there is no other estimating procedure which, in an asymptotic
sense, gives fundamentally better MSEs. In comparison, wavelet shrinkage methods only
achieve an MSE which scales like $\varepsilon$ as $\varepsilon \to 0$.
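
The mechanism behind shrinkage estimators can be illustrated on a toy problem. The sketch below is not a curvelet experiment: it assumes a synthetic sparse coefficient vector observed in an orthonormal basis with i.i.d. Gaussian noise, and it uses hard thresholding at the level $\varepsilon\sqrt{2 \log N}$, a common choice that is assumed here rather than taken from the text.

```python
import numpy as np

def hard_threshold(y, t):
    """Keep coefficients whose magnitude exceeds t, set the others to zero."""
    return np.where(np.abs(y) > t, y, 0.0)

rng = np.random.default_rng(0)
N, eps = 4096, 0.5

# Hypothetical sparse coefficient sequence: 20 large entries, the rest zero.
theta = np.zeros(N)
theta[rng.choice(N, size=20, replace=False)] = 5.0

y = theta + eps * rng.standard_normal(N)   # noisy coefficients
t = eps * np.sqrt(2 * np.log(N))           # threshold level (assumed choice)
theta_hat = hard_threshold(y, t)

err_raw = np.sum((y - theta) ** 2)         # error if the data are left untouched
err_shrunk = np.sum((theta_hat - theta) ** 2)
print("raw squared error:   ", err_raw)
print("shrunk squared error:", err_shrunk)
```

The sparser the coefficient sequence, the more noise the threshold removes for free, which is why the sparsity result of Theorem 1.2 bears on estimation.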
The situation is a little different when one considers adaptive methods which somehow
try to estimate the location and size of the discontinuities. Edge detection is a delicate topic
and realistic existing methods which are amenable to rigorous optimality results are nearly
nonexistent (we are of course aware of [18]). The curvelet shrinkage approach avoids these
issues as it does not use edge detectors or any other problematic schemes. The algorithm
simply extracts the large curvelet coefficients.
1.8 Relationship with Other Curvelets
A previous article [7] used a radically different machinery to construct tight frames also
known under the name of curvelets. For clarity, we shall call the former tight frame curvelets
99. We now briefly review the curvelet 99 transform and explain how it operates on a square
integrable object $f$. This will make explicit the connections and differences with the frames
presented in this paper. Before we do so, we need to introduce orthonormal ridgelets. We
quote from [6]: Let $(\psi_{j,k}(t) : j \in \mathbb{Z}, k \in \mathbb{Z})$ be an orthonormal basis of Meyer wavelets
for $L^2(\mathbb{R})$ [28], and let $(w^0_{i_0,\ell}(\theta),\ \ell = 0, \ldots, 2^{i_0} - 1;\ w^1_{i,\ell}(\theta),\ i \ge i_0,\ \ell = 0, \ldots, 2^i - 1)$ be an
orthonormal basis for $L^2[0, 2\pi)$ made of periodized Lemarié scaling functions $w^0_{i_0,\ell}$ at level $i_0$
and periodized Meyer wavelets $w^1_{i,\ell}$ at levels $i \ge i_0$. (We suppose a particular normalization
of these functions.) Let $\hat\psi_{j,k}(\omega)$ denote the Fourier transform of $\psi_{j,k}(t)$, and define ridgelets
$\rho_\lambda(x)$, $\lambda = (j, k; i, \ell, \varepsilon)$, as functions of $x \in \mathbb{R}^2$ using the frequency-domain definition
$$\hat\rho_\lambda(\xi) = |\xi|^{-1/2} \left( \hat\psi_{j,k}(|\xi|)\, w^\varepsilon_{i,\ell}(\theta) + \hat\psi_{j,k}(-|\xi|)\, w^\varepsilon_{i,\ell}(\theta + \pi) \right) / 2. \tag{1.13}$$
Here the indices run as follows: $j, k \in \mathbb{Z}$, $\ell = 0, \ldots, 2^{i-1} - 1$; $i \ge i_0$, $i \ge j$. Notice the
restrictions on the range of $\ell$ and on $i$. Let $\Lambda$ denote the set of all such indices $\lambda$. It turns
out that $(\rho_\lambda)_{\lambda \in \Lambda}$ is a complete orthonormal system for $L^2(\mathbb{R}^2)$.
The curvelet 99 transform makes use of multiscale partitions of unity to localize an
object in space. Let $Q$ denote a dyadic square $Q = [k_1/2^s, (k_1+1)/2^s) \times [k_2/2^s, (k_2+1)/2^s)$
and let $\mathcal{Q}$ be the collection of all such dyadic squares. The notation $\mathcal{Q}_s$ will correspond to
all dyadic squares of scale $s$. Let $w_Q$ be a window centered near $Q$, obtained after dilation
and translation of a single $w$, such that the $w_Q^2$'s, $Q \in \mathcal{Q}_s$, make up a partition of unity.
We define multiscale ridgelets by $\{\rho_{Q,\lambda} : s \ge s_0,\ Q \in \mathcal{Q}_s,\ \lambda \in \Lambda\}$,
$$\rho_{Q,\lambda} = w_Q \cdot T_Q\, \rho_\lambda,$$
where
$$T_Q f = 2^s f(2^s x_1 - k_1,\ 2^s x_2 - k_2).$$
The discrete curvelet 99 transform also employs a bank of filters $(P_0 f, \Delta_1 f, \Delta_2 f, \ldots)$ with
the property that the passband filter $\Delta_s$ is concentrated near the frequencies $[2^{2s}, 2^{2s+2}]$, e.g.
$\Delta_s(f) = \Psi_{2s} * f$, $\hat\Psi_{2s}(\xi) = \hat\Psi(2^{-2s}\xi)$. Note that the coronization is nonstandard.
With these preliminaries, the curvelet 99 transform operates as follows.
• Subband Decomposition. The object $f$ is filtered into subbands:
  $$f \mapsto (P_0 f, \Delta_1 f, \Delta_2 f, \ldots).$$
• Smooth Partitioning. Each subband is smoothly windowed into "squares" of an
  appropriate scale:
  $$\Delta_s f \mapsto (w_Q \Delta_s f)_{Q \in \mathcal{Q}_s}.$$
• Renormalization. Each resulting square is renormalized to unit scale
  $$g_Q = (T_Q)^{-1}(w_Q \Delta_s f), \qquad Q \in \mathcal{Q}_s.$$
• Ridgelet Analysis. Each square is analyzed in the orthonormal ridgelet system.
  $$\alpha_\mu = \langle g_Q, \rho_\lambda \rangle, \qquad \mu = (Q, \lambda).$$
With our notations, we have available a formula for curvelet 99 frame elements, i.e.
$\alpha_\mu = \langle f, \gamma_\mu \rangle$ with
$$\gamma_\mu = \Delta_s\, \rho_{Q,\lambda}, \qquad \mu = (\lambda \in \Lambda,\ Q \in \mathcal{Q}_s).$$
By linking the filter passband $|\xi| \approx 2^{2s}$ to the scale of spatial localization $2^{-s}$, we
impose that (1) most curvelets 99 are negligible in norm (most multiscale ridgelets do not
survive the bandpass filtering $\Delta_s$); (2) the nonnegligible curvelets 99 obey length $\approx 2^{-s}$
while width $\approx 2^{-2s}$. In short, the system obeys approximately the scaling relationship
width $\approx$ length$^2$.
Note: it is at this last step that the $2^{2s}$ coronization scheme comes fully into play. Despite
exhibiting novel and interesting properties, the original curvelet construction presents some
disadvantages which we now describe.
First, the construction involves a seven-index structure $\mu = (s, k_1, k_2; j, k; i, \ell, \varepsilon)$ whose
indices include parameters for scale $s$, location $K = (k_1, k_2)$, ridge scale $j$, ridge location
$k$, angular scale $i \ge \max(j, i_0)$, angular location $\ell$, and a gender token $\varepsilon$. In addition, we
already mentioned that the scaling ratio width $\approx$ length$^2$ is actually a distortion of the
reality. In truth, curvelets 99 assume a wide range of aspect ratios; only their energy
decays as the scaling ratio is increasingly less parabolic. The geometry and aspect ratio of
orthonormal ridgelets is itself unclear as they are not true ridge functions. This and other
facts, together with the complicated index structure, make any kind of mathematical and
quantitative analysis especially delicate, see for instance the structure of the proof in [8].
For example, when proving results about the sparsity sequence one has to worry about a
myriad of coefficients, which may sometimes be quite daunting.
In contrast, the new definition exhibits a much simpler structure as it is indexed by
only three parameters; namely, scale, orientation (angle) and location, a byproduct being
that mathematical analysis is then considerably simpler. Easier manipulation is certainly
highly desirable but we would like to emphasize that the alteration is uncompromising;
every published curvelet result would hold true with our new system and we are simply
not aware of any significant mathematical result, starting with the main result of this
paper, which would hold true for one system and not for the other.
Second, the curvelet 99 transform is in some sense a lapped transform as it involves
spatial localization with multiscale windows. In practical settings, to overcome blocking
effects, one would need to use overlapping windows, thereby increasing the redundancy
of a digital implementation. The new curvelet transform, however, does not exhibit this
phenomenon and suggests a new digital implementation which shall be discussed briefly in
Section 9.
In short, we believe that our new tight frames yield a system which improves upon the
original construction while obeying its philosophy.
1.9 Inspiration and Relation to Other Work
Underlying our work is the inspiration of the original curvelet transform as already
discussed. Of interest here, however, is the connection between applied harmonic analysis and
a central problem in approximation theory, which is new. Indeed, this paper gives the first
proof of the optimality result for otherwise smooth objects with edges although results like
Theorem 1.2 have been claimed without proofs elsewhere [7].
The ideas underlying the curvelet transform are also loosely related with the theory
of affine wavelets. Several researchers [1, 26] have proposed to study the decompositions
of objects as superpositions of affine wavelets of the form $\psi(Ax + b)$, where $A \in GL(\mathbb{R}^2)$
and $b \in \mathbb{R}^2$. This literature is mainly about continuous transformations and is connected
to the theory of square-integrable group representations [11]. Here, we suggest studying
representations where $A$ is of the form $D_j R_\theta$, with $D_j$ a parabolic scaling and $R_\theta$ rotation
matrices with an angular step proportional to the square root of the scale. First, the
particular geometry of curvelets does not allow the identification of the parameterization
with a linear group representation. And second, guided by the theory of wavelets, we
are especially interested in obtaining discrete representations, namely tight frames, with
provably optimal approximation properties.
There are deep connections between the curvelet transform and ideas from the field
of mathematical analysis. In the seventies, Fefferman [23, 36] studied the boundedness of
Riesz spherical means and introduced the so-called Second Dyadic Decomposition (SDD).
The SDD is a principle for localizing objects in the frequency plane which goes beyond
the classical Littlewood-Paley theory [25]. In fact, curvelets imply a tiling of the frequency
plane which is that suggested by SDD, see Section 2 and Figure 3 for details. We would
also like to remark that in the early nineties, SDD proved to be a very useful tool for the
study of Fourier Integral Operators, see [36] and references therein.
Finally, we recently became aware of the work of Do and Vetterli on contourlets which
is also directly inspired by the curvelet transform. We will comment on this line of research
in the discussion section.
2 Second Generation of Curvelets
This section introduces new tight frames we shall call curvelets. Unlike the original curvelet
transform [7], this construction does not use ridgelets.
2.1 Scale/Angle Localization
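For each pair $(j, \ell)$, $j \ge 0$ and $\ell = 0, 1, 2, \ldots, 2^j - 1$, we let $\nu_{j,\ell}$ be the angular window
$\nu_{j,\ell}(\theta) = \nu(2^j\theta - \pi\ell)$. Note that for $\ell = 0, 1, \ldots, 2^j - 1$, $\nu_{j,\ell}(\theta + \pi) = \nu_{j,\ell+2^j}(\theta)$. Then define
the symmetric window $\chi_{j,\ell}(\xi)$ in the polar coordinates system by
$$\chi_{j,\ell}(\xi) = w(2^{-2j}|\xi|)\, \left( \nu_{j,\ell}(\theta) + \nu_{j,\ell}(\theta + \pi) \right). \tag{2.1}$$
Here, we will assume that $\nu$ is an even, $C^\infty$ angular window which is supported on
$[-\pi, \pi]$ and obeys
$$|\nu(\theta)|^2 + |\nu(\theta - \pi)|^2 = 1, \qquad \theta \in [0, 2\pi), \tag{2.2}$$
where in the above equation, it is understood that we take the $2\pi$-periodization of the
function $\nu$, see Figure 2. It is not hard to deduce from our assumptions that for each $j \ge 0$,
$$\sum_{\ell=0}^{2^{j+1}-1} |\nu(2^j\theta - \pi\ell)|^2 = 1, \tag{2.3}$$
where again we have assumed $2\pi$-periodization of the translates $\nu(2^j\theta - \pi\ell)$.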
As for the radial window, we will suppose that $w$ is compactly supported and obeys
$$|w_0(t)|^2 + \sum_{j \ge 0} |w(2^{-2j} t)|^2 = 1, \qquad t \in \mathbb{R}. \tag{2.4}$$
A possible choice is to select $w$ as in the construction of Meyer wavelets [28, 30]. With $v$
a $C^\infty$ window whose support is included in $[2\pi/3, 8\pi/3]$, Meyer introduces the partition of
unity
$$|v_0(t)|^2 + \sum_{j \ge 0} |v(2^{-j} t)|^2 = 1, \qquad t \ge 0;$$
here, $v_0$ is a $C^\infty$ window which is identically equal to one on $[0, 2\pi/3)$ and vanishes on
$[4\pi/3, \infty)$. Define $w$ as
$$|w(t)|^2 = |v(t)|^2 + |v(t/2)|^2, \qquad w_0(t) = v_0(t). \tag{2.5}$$
Figure 2: Basic angular window.
Then $w$ obeys (2.4). Note that $w$ is smoothly increasing on the interval $[2\pi/3, 4\pi/3]$, is
constant and equal to one on $[4\pi/3, 8\pi/3]$ and smoothly decreasing on $[8\pi/3, 16\pi/3]$. In
the remainder of this paper, we will assume this special choice of window.
Put $\chi_0^2(\xi) = w_0^2(|\xi|) + w^2(|\xi|)$. For $j \ge 1$, $\nu_{j,\ell}(\theta)$ and $\nu_{j,\ell}(\theta + \pi)$ have non-overlapping
supports and, therefore, (2.3) and (2.4) give that the family $(\chi_{j,\ell})$ is a family of orthogonal
and compactly supported windows in the sense that
$$|\chi_0(\xi)|^2 + \sum_{j \ge 1} \sum_{\ell=0}^{2^j - 1} |\chi_{j,\ell}(\xi)|^2 = 1. \tag{2.6}$$
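
The partition-of-unity identities (2.3), (2.4) and (2.6) can be verified numerically for one concrete choice of windows. The sketch below is an illustration under assumptions: it builds Meyer-type windows from the polynomial step `beta`, which is only finitely smooth (the text requires $C^\infty$ windows), and all function names are hypothetical. Periodization is done in the variable $\theta$ by wrapping the angle to $[-\pi, \pi)$.

```python
import numpy as np

A0 = 2 * np.pi / 3

def beta(x):
    """Polynomial step with beta(x) + beta(1 - x) = 1; finitely smooth (assumption)."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def v0(t):
    """Equal to one on [0, 2pi/3], vanishing beyond 4pi/3."""
    return np.cos(np.pi / 2 * beta(t / A0 - 1))

def v(t):
    """Meyer-type window supported in [2pi/3, 8pi/3]."""
    return np.sin(np.pi / 2 * beta(t / A0 - 1)) * np.cos(np.pi / 2 * beta(t / (2 * A0) - 1))

def w_sq(t):
    """|w(t)|^2 = |v(t)|^2 + |v(t/2)|^2, as in (2.5)."""
    return v(t) ** 2 + v(t / 2) ** 2

def nu(theta):
    """Even angular window supported on [-pi, pi]."""
    return np.cos(np.pi / 2 * beta(np.abs(theta) / np.pi))

def nu_jl(theta, j, ell):
    """nu(2^j theta - pi ell), 2pi-periodized in theta."""
    d = (theta - np.pi * ell / 2**j + np.pi) % (2 * np.pi) - np.pi
    return nu(2**j * d)

def chi_sq(r, theta, j, ell):
    """|chi_{j,ell}|^2 for the symmetric window (2.1), in polar coordinates."""
    angular = nu_jl(theta, j, ell) + nu_jl(theta + np.pi, j, ell)
    return w_sq(r / 4.0**j) * angular**2

rng = np.random.default_rng(0)
r = rng.uniform(0.0, 500.0, size=2000)          # radii |xi|
theta = rng.uniform(0.0, 2 * np.pi, size=2000)  # polar angles
J_MAX = 6                                        # enough scales to cover r <= 500

angular_sum = sum(nu_jl(theta, 3, ell) ** 2 for ell in range(2**4))        # (2.3), j = 3
radial_sum = v0(r) ** 2 + sum(w_sq(r / 4.0**j) for j in range(J_MAX + 1))  # (2.4)
total = v0(r) ** 2 + w_sq(r) + sum(
    chi_sq(r, theta, j, ell) for j in range(1, J_MAX + 1) for ell in range(2**j)
)                                                                          # (2.6)

print(np.abs(angular_sum - 1).max(), np.abs(radial_sum - 1).max(), np.abs(total - 1).max())
```

All three deviations from one are at the level of floating-point rounding, which reflects the fact that the cross terms between $\nu_{j,\ell}(\theta)$ and $\nu_{j,\ell}(\theta + \pi)$ vanish for $j \ge 1$.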
We will use such windows to localize the Fourier transform near symmetric wedges of length
about $2^{2j}$ and width about $2^j$. Indeed, $\chi_{j,\ell}$ is localized near the symmetric wedge
$$W_{j,\ell} = \left\{ \pm\xi,\ 2^{2j} \le |\xi| \le 2^{2(j+1)},\ |\theta - \pi \cdot \ell \cdot 2^{-j}| \le \frac{\pi}{2} \cdot 2^{-j} \right\}, \tag{2.7}$$
and note that for each $\ell$, $\chi_{j,\ell}$ is obtained from $\chi_{j,0}$ by applying a rotation. Figure 3 gives
a graphical representation of these wedges and the associated tiling.
2.2 New Tight Frames of Curvelets
We now introduce some notations that we will use throughout the remainder of this article.
We put $J$ to be the pair of indices $J = (j, \ell)$, $j \ge 0$, $\ell = 0, 1, \ldots, 2^j - 1$, and let
$\theta_J = \pi \cdot 2^{-j} \cdot \ell$. Next, we let $M_J$ denote the set of coefficients $\mu = (j, \ell, k)$ with a fixed value of
the scale/angle pair $J = (j, \ell)$.
For each $j \ge 1$, the support of $w(2^{-2j}|\xi|)\,\nu(2^j\theta)$ is contained in the rectangle
$R_j = I_{1j} \times I_{2j}$ where
$$I_{1j} = \{\xi_1,\ t_j \le \xi_1 \le t_j + L_j\}, \qquad I_{2j} = \{\xi_2,\ |\xi_2| \le l_j/2\};$$
$R_j$ is symmetric around the axis $\theta = 0$. We will write the length $L_j$ and width $l_j$ as
$L_j = \delta_1 \pi 2^{2j}$ and $l_j = \delta_2\, 2\pi\, 2^j$. It is not difficult to verify that our assumptions about
localizing windows imply that $\delta_1$ and $\delta_2$ obey $\delta_1 = 14/3\,(1 + O(2^{-j}))$ and $\delta_2 = 10/9$
respectively.
Figure 3: Curvelet Tiling of the Frequency Plane. In the frequency domain, curvelets are
supported near symmetric parabolic wedges. The shaded area represents such a generic
wedge.
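We let $\tilde I_{1j}$ be $\pm I_{1j}$ and set $\tilde R_j = \tilde I_{1j} \times I_{2j}$. It is well-known that $e^{i\pi(k_1 + 1/2)\xi_1/L_j}/\sqrt{2L_j}$, $k_1 \in \mathbb{Z}$, is an orthobasis for $L^2(\tilde I_{1j})$. Since $e^{i 2\pi k_2 \xi_2/l_j}/\sqrt{l_j}$ is an orthobasis for $L^2(I_{2j})$, the sequence $(u_{j,k})_{k \in \mathbb{Z}^2}$ defined as
$$u_{j,k}(\xi_1, \xi_2) = \frac{2^{-3j/2}}{2\pi\sqrt{\delta_1\delta_2}}\, e^{i(k_1 + 1/2)2^{-2j}\xi_1/\delta_1}\, e^{i k_2 2^{-j}\xi_2/\delta_2}, \qquad k_1, k_2 \in \mathbb{Z}, \tag{2.8}$$
is then an orthobasis for $L^2(\tilde R_j)$.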
We are now in position to introduce curvelets using the frequency-domain definition. Letting $R_{\theta_J}$ be the rotation by $\theta_J$, we define
$$\hat\gamma_\mu(\xi) = (2\pi)\,\chi_J(\xi)\,u_{j,k}(R_{\theta_J}\xi), \qquad \mu = (j, \ell, k). \tag{2.9}$$
With the same notation as in Section 1, we also define coarse scale curvelets $\hat\gamma_{\mu_0}(\xi) = (2\pi)\,\chi_0(\xi)\,u_k(\xi)$ where $u_k(\xi) = (2\pi\delta_0)^{-1} e^{i(k_1\xi_1/\delta_0 + k_2\xi_2/\delta_0)}$. Here, $\delta_0$ is chosen small enough for $(u_k)_{k \in \mathbb{Z}^2}$ to be an orthobasis for $L^2$ functions with a compact support containing that of $\chi_0$, e.g. $\delta_0 = 32/3$.
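Observe that
$$\sum_{\mu \in M_J} |\langle F, \hat\gamma_\mu\rangle|^2 = (2\pi)^2 \int |F(\xi)|^2\,|\chi_J(\xi)|^2\,d\xi$$
since by construction $(u_{j,k}(R_{\theta_J}\xi))_k$ is an orthobasis over the support of $\chi_J$. It then follows from (2.6) that for any $F \in L^2(\mathbb{R}^2)$,
$$\sum_{\mu} |\langle F, \hat\gamma_\mu\rangle|^2 = (2\pi)^2\,\|F\|^2_{L^2}$$
and, therefore, $(\hat\gamma_\mu)$ is a tight frame for $L^2(\mathbb{R}^2)$. In conclusion, the Plancherel formula gives that $(\gamma_\mu)_{\mu \in M}$ obeys
$$\sum_{\mu} |\langle f, \gamma_\mu\rangle|^2 = \|f\|^2_{L^2(\mathbb{R}^2)}. \tag{2.10}$$
This last equality says $(\gamma_\mu)_{\mu \in M}$ is a tight frame and standard arguments imply that the decomposition (1.7) holds.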
We would like to remark that the construction presented here was briefly introduced by Candès and Guo in [9]. Since the writing of that paper, Candès became aware of the work of Smith. In [32], Smith introduces a tight frame which is nearly identical to that described above for the purpose of studying parametrices of general hyperbolic equations.
2.3 Space-Side Picture
The point of our construction is that curvelets are real-valued objects. Indeed, let $\gamma_j$ be the inverse Fourier transform of $2^{-3j/2}\,\chi_{j,0}(\xi)\,e^{i 2^{-2j}\xi_1/(2\delta_1)}$. This function is real-valued and
$$\gamma_{j,0,k}(x) = \gamma_j(x_1 - 2^{-2j}k_1/\delta_1,\ x_2 - 2^{-j}k_2/\delta_2).$$
Now, the envelope of $\gamma_j$ is concentrated near a vertical ridge of length about $2^{-j}$ and width $2^{-2j}$. Define $\gamma^{(j)}$ by
$$\gamma_j(x) = 2^{3j/2}\,\gamma^{(j)}(D_j x)$$
where $D_j$ is the diagonal matrix
$$D_j = \begin{pmatrix} 2^{2j} & 0 \\ 0 & 2^{j} \end{pmatrix}. \tag{2.11}$$
In other words, the envelope $\gamma^{(j)}$ is supported near a disk of radius about one, and owing to the fact that $\chi_{j,0}$ is supported away from the axis $\xi_1 = 0$, $\gamma^{(j)}$ oscillates along the horizontal direction. In short, $\gamma^{(j)}$ resembles a 2-dimensional wavelet of the form $\psi(x_1)\varphi(x_2)$ where $\varphi$ and $\psi$ are respectively father and mother-gendered wavelets. Let $k_\delta$ be the Cartesian grid $(k_1/\delta_1, k_2/\delta_2)$. With these notations,
$$\gamma_{j,0,k}(x) = 2^{3j/2}\,\gamma^{(j)}(D_j x - k_\delta),$$
and the relationship $\hat\gamma_{j,\ell,k}(\xi) = \hat\gamma_{j,0,k}(R_{\theta_J}\xi)$ gives
$$\gamma_\mu(x) = 2^{3j/2}\,\gamma^{(j)}(D_j R_{\theta_J} x - k_\delta). \tag{2.12}$$
Hence, we defined a tight frame of elements which are obtained by anisotropic dilations, rotations and translations of a collection of unit-scale oscillatory blobs. Curvelets occur at all dyadic lengths and exhibit an anisotropy increasing with decreasing scale like a power law; curvelets obey a scaling relation which says that the width of a curvelet element is about the square of its length; width $\sim$ length$^2$. Conceptually, we may think of the curvelet transform as a multiscale pyramid with many directions and positions at each length scale, and needle-shaped elements (or fat segments) at fine scales.
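The change of variables in (2.12) can be illustrated with a minimal Python sketch. It maps a point $x$ to the argument $D_j R_{\theta_J} x - k_\delta$ at which the unit-scale envelope is evaluated, and reports the ridge length $2^{-j}$ and width $2^{-2j}$ so that the relation width $\sim$ length$^2$ can be checked. The counterclockwise sign convention for the rotation and the default values $\delta_1 = \delta_2 = 1$ are assumptions made for illustration; the function names are hypothetical.

```python
import math


def parabolic_scaling(j):
    """Diagonal entries of D_j = diag(2^(2j), 2^j) from (2.11)."""
    return 2.0 ** (2 * j), 2.0 ** j


def rotate(x, theta):
    """Counterclockwise rotation of the point x by theta (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def curvelet_argument(x, j, ell, k, delta1=1.0, delta2=1.0):
    """Return D_j R_{theta_J} x - k_delta, the argument appearing in (2.12)."""
    theta_j = math.pi * ell * 2.0 ** (-j)       # theta_J = pi * 2^-j * ell
    y = rotate(x, theta_j)
    d1, d2 = parabolic_scaling(j)
    return (d1 * y[0] - k[0] / delta1, d2 * y[1] - k[1] / delta2)


def ridge_size(j):
    """Approximate (length, width) of the ridge carrying gamma_j."""
    return 2.0 ** (-j), 2.0 ** (-2 * j)
```

For instance, at $j = 3$ and $\ell = 0$ the corner $(2^{-6}, 2^{-3})$ of the ridge is mapped to $(1, 1)$, that is, to the boundary of the unit-scale envelope.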
2.4 Split at Every Other Scale
Variations about the definition are of course possible. For instance, note that in the frequency plane, our tight frames are supported near coronae of the form $2^{2j} \le |\xi| \le 2^{2(j+1)}$. This coronization is non-standard; these are not dyadic coronae as in wavelet theory. It is, of course, possible to adapt the construction and define dyadic curvelets by choosing windows of the form
$$\chi_{j,\ell}(\xi) = w(2^{-j}|\xi|)\,\bigl(\nu_{\lfloor j/2\rfloor,\ell}(\theta) + \nu_{\lfloor j/2\rfloor,\ell}(\theta + \pi)\bigr). \tag{2.13}$$
The exact same construction works with this choice of windows and we conclude this section with a brief summary of the main points of the curvelet transform:
• We decompose the frequency domain into dyadic annuli $|x| \in [2^j, 2^{j+1})$.
• We decompose each annulus into wedges $\theta = \pi\ell \cdot 2^{-j/2}$. That is, we divide at every other scale as shown on Figure 3.
• We use oriented local Fourier bases on each wedge.
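The first two points of this summary can be sketched in Python as a lookup that assigns a frequency $\xi$ to a scale $j$ and a wedge index $\ell$. The sketch assumes hard (non-overlapping) cells, whereas the actual windows overlap smoothly; it also assumes $2^{\lfloor j/2 \rfloor}$ symmetric wedges per annulus, which is how the index $\lfloor j/2 \rfloor$ in (2.13) is read here. The function name is hypothetical.

```python
import math


def locate_wedge(xi1, xi2):
    """Return (j, ell) for the dyadic annulus and symmetric wedge containing xi.

    Frequencies with |xi| < 1 are treated as coarse scale and yield None.
    """
    r = math.hypot(xi1, xi2)
    if r < 1.0:
        return None
    j = math.floor(math.log2(r))              # |xi| in [2^j, 2^(j+1))
    n_wedges = 2 ** (j // 2)                  # split at every other scale
    spacing = math.pi / n_wedges              # angular distance between centers
    theta = math.atan2(xi2, xi1) % math.pi    # wedges are symmetric: xi ~ -xi
    ell = int(round(theta / spacing)) % n_wedges
    return j, ell
```

Because the wedges are symmetric about the origin, `locate_wedge(0, 20)` and `locate_wedge(0, -20)` return the same pair.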
Important remark. In the remainder of this paper, we will assume this special choice so that at scale $2^{-j}$, curvelets have length about $2^{-j/2}$ and width $2^{-j}$ and, in the frequency plane, live near the dyadic subband $|\xi| \in [2^j, 2^{j+1}]$. We find this choice more consistent with the standard literature which emphasizes partitions of the frequency plane near dyadic subbands instead of coronae of the form $[2^{2j}, 2^{2j+2}]$.
3 Geometry and Tilings: Ridgelet Packets
The previous section makes clear that there is a general machinery for designing tight frames. Instead of considering smooth segmentations of each subband into a fixed number of wedges, i.e. roughly $2^{j/2}$, we might consider arbitrary dyadic segmentations and thereby design tight frames with arbitrary aspect ratios at arbitrary scales. When the number of segmentations is (1) independent of scale, we essentially obtain tight frames of wavelet-like elements or steerable wavelets [31]; (2) increasing like $1/\sqrt{\text{scale}}$, we obtain tight frames of curvelets; and (3) increasing like $1/\text{scale}$, we obtain tight frames of ridgelets [2, 6].
In [24], the authors followed this organization principle and constructed a family of tight frames they call ridgelet packets. One problem with this work is that the family of tight frames or orthobases they exhibit is missing a key ingredient, a translation parameter, which may limit their applicability. In response to this, the ideas we exposed in the previous section have of course an explicit translation index.
4 Why Does This Work?
Theorem 1.2 claims that the curvelet coefficients of an object which is singular along a $C^2$ curve but otherwise smooth decay at nearly the rate $n^{-3/2}$. This section presents a heuristic argument which explains the $-3/2$ exponent.
Figure 4: Schematic decomposition of a subband. The top figures represent an object with an edge and that same object after applying a bandpass filter which keeps details at scale $2^{-j}$. The bottom picture represents a bandpassed edge fragment together with the three types of curvelets.
4.1 Heuristic Argument
Curvelets are not compactly supported. However, at scale $2^{-j}$, they are of rapid decay away from a ridge of length about $2^{-j/2}$ and width $2^{-j}$ so that we can talk about such a ridge as being their effective support. With this in mind, curvelet coefficients come in essentially three types.
1. Type A. Those curvelets whose essential support does not overlap with the discontinuity.
2. Type B. Those curvelets whose essential support overlaps with the discontinuity but which are not tangent to the singularity.
3. Type C. Those curvelets which overlap with the singularity and are nearly tangent to the singularity.
These three types are schematically represented on Figure 4. We will argue that coefficients of type A and B are in some sense negligible.
First, coefficients of type A do not feel the singularity and are basically those one would collect if we were to analyze a banal smooth, i.e. $C^2$, function. The decay exponent of these coefficients is $-3/2$, which is that of the coefficients of an arbitrary $C^2$ function, a fact which will be formally established in Section 8. Note that this decay rate is the one we might actually expect as this is also the rate of other classical expansions such as Fourier or wavelet series. Therefore, from the point of view of smooth $C^2$ functions, curvelets are as good as Fourier or wavelet bases. To understand this phenomenon, observe that at scale $2^{-j}$, curvelet coefficients isolate details of length $2^{-j}$, to employ a terminology borrowed from the wavelet literature. Let $\Psi_j$ be a bandpass filter which extracts frequencies near the dyadic subband $2^j \le |\xi| \le 2^{j+1}$ and is identically equal to one over the support of $\hat\gamma_\mu$. From $\Psi_j(\gamma_\mu) = \gamma_\mu$, it follows that
$$\alpha_\mu = \langle \Psi_j f, \gamma_\mu\rangle.$$
Effectively, the bandpass object $\Psi_j f$ is nearly vanishing everywhere except along a ridge of width about $2^{-j}$ (the width of the filter $\Psi_j$) and whose spatial position of course coincides with that of the underlying edge. Figure 4 schematically represents this bandpassed image. Then coefficients of type A are negligible simply because $\Psi_j f$ nearly vanishes over the essential support of those curvelets.
Second, coefficients of type B are negligible because of the finer frequency localization of curvelet elements. Consider a portion of a bandpassed edge as illustrated in Figure 5 which has been spatially localized with a smooth window of radius about $2^{-j/2}$. In the frequency domain, the bandpassed edge fragment is supported near a wedge whose orientation is orthogonal to that of the edge. This is interesting because we have seen that in the frequency domain, curvelets are supported near dyadic wedges of length $2^{j}$ and width $2^{j/2}$, where again the orientations of such wedges are normal to the spatial orientation of our curvelets. Figure 5 represents the essential support of the bandpassed edge fragment and that of a curvelet. Then unless the curvelet orientation is nearly parallel to the edge, these wedges are disjoint and associated coefficients are small.
In short, coefficients of type A are small because the spatial supports of the edge and of the curvelet do not overlap whereas coefficients of type B are small because their frequency supports are disjoint. This microlocalization is what actually explains the sparsity of curvelet expansions of objects with edges.
We now focus our attention on the last group of coefficients, namely, coefficients of type C. The singularity is a $C^2$ curve of finite length and it is clear that for a fixed scale $2^{-j}$, there are at most $O(2^{j/2})$ coefficients of such type. We now estimate the size of each coefficient of type C. We have
$$|\alpha_\mu| = |\langle f, \gamma_\mu\rangle| \le \|f\|_{L^\infty}\,\|\gamma_\mu\|_{L^1}.$$
Curvelets are $L^2$ normalized so that $\|\gamma_\mu\|_{L^2} \sim 1$ and essentially supported in a box of side-length $2^{-j/2}$ and width $2^{-j}$. Therefore, they obey
$$\|\gamma_\mu\|_{L^1} \le B \cdot 2^{-3j/4},$$
uniformly over the index $\mu$. (Note that this easily and rigorously follows from the definitions (2.9) or (2.12).) Since $f$ is a bounded function, the coefficients $\alpha_\mu$ then verify the a priori estimate
$$|\alpha_\mu| \le B \cdot 2^{-3j/4}\,\|f\|_{L^\infty}. \tag{4.1}$$
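The exponent $3/4$ in the $L^1$ bound can be seen on a toy model. The following Python sketch replaces a curvelet by a constant-height box of side-length $2^{-j/2}$ and width $2^{-j}$ (an assumption made purely for illustration; a real curvelet oscillates and has decaying tails), normalizes it in $L^2$, and returns its $L^1$ and $L^2$ norms. The function name is hypothetical.

```python
def box_norms(j):
    """L1 and L2 norms of an L2-normalized indicator of a 2^(-j/2) x 2^(-j) box."""
    length = 2.0 ** (-j / 2)
    width = 2.0 ** (-j)
    area = length * width
    height = area ** -0.5                 # chosen so that the L2 norm equals 1
    l2_norm = (height ** 2 * area) ** 0.5
    l1_norm = height * area               # equals sqrt(area) = 2^(-3j/4)
    return l1_norm, l2_norm
```

For every $j$, the first returned value agrees with $2^{-3j/4}$ up to rounding, which is the scaling used in (4.1).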
To summarize, at each scale $2^{-j}$, we have $O(2^{j/2})$ coefficients of type C which are bounded by $C \cdot 2^{-3j/4}$. Assuming that the other coefficients (of type A and B) are negligible, the $n$th largest coefficient $|\alpha|_{(n)}$ is then bounded by
$$|\alpha|_{(n)} \le C \cdot n^{-3/2}.$$
Figure 5: Microlocal behavior. The left figure is a spatial representation of a bandpassed edge fragment and of a curvelet (of type B) while the right figure portrays its frequency representation. The shaded area represents the essential frequency support of the bandpassed edge fragment while the curvelet is supported on the wedge centered around the radial line.
Further, note that the above decay would also give the $O(n^{-2})$ convergence rate for the nonlinear $n$-term approximation $f_n$ defined by keeping the $n$ largest terms in the curvelet expansion as in Theorem 1.3. Indeed, $f_n$ would obey
$$\|f - f_n\|^2_{L^2} \le \sum_{m > n} |\alpha|^2_{(m)} \le C \cdot n^{-2}.$$
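The counting behind these two rates can be reproduced on synthetic data. The Python sketch below builds a model sequence with $\lceil 2^{j/2} \rceil$ entries of size exactly $2^{-3j/4}$ at each scale $j$ (an idealization: all constants are set to one and type A and B coefficients are ignored), then reports the largest values of $n^{3/2}|\alpha|_{(n)}$ and of $n^2 \sum_{m>n} |\alpha|^2_{(m)}$. Both stay bounded as more scales are added, which is the content of the heuristic. Function names are hypothetical.

```python
import math


def type_c_magnitudes(max_scale):
    """Model magnitudes: ceil(2^(j/2)) coefficients of size 2^(-3j/4) per scale."""
    mags = []
    for j in range(max_scale + 1):
        count = math.ceil(2 ** (j / 2))
        mags.extend([2.0 ** (-0.75 * j)] * count)
    return mags  # nonincreasing by construction


def decay_constants(mags):
    """Return (max_n n^1.5 * a_(n), max_n n^2 * sum_{m>n} a_(m)^2)."""
    worst_coeff = max(n ** 1.5 * a for n, a in enumerate(mags, start=1))
    tail = 0.0           # running value of sum_{m>n} a_(m)^2
    worst_tail = 0.0
    for n in range(len(mags), 0, -1):
        worst_tail = max(worst_tail, n * n * tail)
        tail += mags[n - 1] ** 2
    return worst_coeff, worst_tail
```

Calling `decay_constants(type_c_magnitudes(J))` for increasing `J` gives values that level off rather than grow, in line with $|\alpha|_{(n)} \le C n^{-3/2}$ and $\|f - f_n\|^2_{L^2} \le C n^{-2}$ for this model.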
4.2 Necessary Refinements
The above arguments only suggest why we may expect the $-3/2$ exponent, and a careful proof should take into account several important facts.
• First, curvelets are not compactly supported and, therefore, it is inaccurate to claim that curvelets of type A do not feel the singularity; only their rapid spatial decay will control the edge effect as the distance between the edge curve and the center of the curvelets increases.
• Second, it is inaccurate to claim that the frequency support of an edge does not overlap with that of a curvelet if they are not parallel. A rigorous argument should articulate this fact and quantify the overlap. In some sense, curvelet coefficients decay as the angle between their orientation and that of the edge increases; quantifying this phenomenon with the best possible accuracy is the central part of the proof.
5 Architecture of the Proof
In this section, we prove Theorem 1.2, our main result. However, the proof relies on a key estimate which is the object of a separate section.
With the notations of Section 2, we let $M_j$ be the set of indices $(j, \ell, k)$, $\ell = 0, 1, \ldots$ and $k \in \mathbb{Z}^2$ so that $(\gamma_\mu)_{\mu \in M_j}$ is the set of all curvelets at scale $2^{-j}$. Abusing notation slightly, let $\alpha_j$ denote the subsequence of coefficients $(\alpha_\mu)_{\mu \in M_j}$.
To measure the sparsity of a sequence $(\theta_n)$, we will use the weak-$\ell_p$ or Marcinkiewicz quasi-norm, defined as follows: let $|\theta|_{(n)}$ be the $n$th largest entry in the sequence $(|\theta_n|)$; we set
$$|\theta|_{w\ell_p} = \sup_{n > 0}\, n^{1/p}\,|\theta|_{(n)}. \tag{5.1}$$
There are other equivalent definitions; for instance the weak-$\ell_p$ norm may also be defined as
$$\sup_{\epsilon > 0}\ \#\{n,\ |\theta_n| > \epsilon\}\cdot \epsilon^p.$$
Note that the latter definition shows that the weak-$\ell_p$ norm obeys $|\theta|_{w\ell_p} \le \|\theta\|_{\ell_p}$. Equipped with this definition, the main result of this section is as follows.
Theorem 5.1 The sequence $\alpha_j$ obeys
$$\|\alpha_j\|_{w\ell_{2/3}} \le C, \tag{5.2}$$
for some constant $C$ independent of scale.
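For readers who want to experiment with the quasi-norm (5.1) used in this theorem, here is a minimal Python sketch for finite sequences. It also evaluates the counting form $\sup_\epsilon \epsilon^p\,\#\{n, |\theta_n| > \epsilon\}$, which for a finite sequence reduces to $\max_n n\,|\theta|^p_{(n)}$, that is, the $p$th power of (5.1), and the ordinary $\ell_p$ quasi-norm for comparison. Function names are hypothetical.

```python
def weak_lp_quasinorm(seq, p):
    """sup_n n^(1/p) |theta|_(n), with |theta|_(n) the n-th largest magnitude."""
    mags = sorted((abs(x) for x in seq), reverse=True)
    return max(n ** (1.0 / p) * a for n, a in enumerate(mags, start=1))


def counting_form(seq, p):
    """sup_eps eps^p * #{n : |theta_n| > eps}, evaluated as max_n n * |theta|_(n)^p."""
    mags = sorted((abs(x) for x in seq), reverse=True)
    return max(n * a ** p for n, a in enumerate(mags, start=1))


def lp_quasinorm(seq, p):
    """(sum_n |theta_n|^p)^(1/p)."""
    return sum(abs(x) ** p for x in seq) ** (1.0 / p)
```

As a usage example, the sequence $\theta_n = n^{-3/2}$ has weak-$\ell_{2/3}$ quasi-norm equal to one, whereas its $\ell_{2/3}$ quasi-norm grows with the length of the truncation.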
Fix the scale parameter $j$. To analyze the coefficient sequence of an object $f$ at a given scale $2^{-j}$, we first smoothly localize this function near dyadic squares with a prescribed radius. We define a partition of unity $(w_Q)_{Q \in \mathcal{Q}}$,
$$\sum_{Q \in \mathcal{Q}} w_Q(x) = 1,$$
so that with the index $Q$ indicating a dyadic square of the form $Q = [k_1/2^{j/2}, (k_1 + 1)/2^{j/2}) \times [k_2/2^{j/2}, (k_2 + 1)/2^{j/2})$, by $w_Q = w(2^{j/2}x_1 - k_1,\ 2^{j/2}x_2 - k_2)$. Here $w$ is a nonnegative $C^\infty$ function vanishing outside of the square $[-1, 1]^2$, say. We use this partition to smoothly localize the function $f$ near each dyadic square $Q$ and define $f_Q$ by
$$f_Q = f \cdot w_Q.$$
Note how the scale of the dyadic squares depends upon the scale $2^{-j}$; we use dyadic squares of sidelength about the length of curvelets. For $\mu \in M_j$ and each dyadic square $Q$, define $\alpha_Q$ to be the curvelet coefficient sequence of $f_Q$, i.e.
$$\alpha_{Q,\mu} = \langle f_Q, \gamma_\mu\rangle, \qquad \mu \in M_j.$$
Note the restriction on $\mu$, namely, $\mu \in M_j$. Our strategy is simply to establish a series of results about the sparsity of the curvelet coefficient sequence $\alpha_Q$ and combine them to derive our claim (5.2).
5.1 Partition of Dyadic Squares
The sequences $\alpha_Q$ of course exhibit a very different behavior depending on whether or not the edge curve has a nonempty intersection with the support of $w_Q$. Accordingly, we partition the collection of dyadic squares $\mathcal{Q}$ into two sets $\mathcal{Q}_0$ and $\mathcal{Q}_1$ and define $\mathcal{Q}_0$ to be the collection of those squares such that the edge curve intersects with the support of $w_Q$. Clearly, the cardinality of $\mathcal{Q}_0$ obeys
$$|\mathcal{Q}_0| \le A_0 \cdot 2^{j/2}, \tag{5.3}$$
for some constant $A_0$ independent of scale. Note that since we assume $f$ to be compactly supported, there is a maximum of $2^j + 4 \cdot 2^{j/2}$ squares for which $f_Q$ is possibly nonvanishing. We prove two results.
Theorem 5.2 Let $Q$ be a dyadic square such that $Q \in \mathcal{Q}_0$. The curvelet coefficient sequence $\alpha_Q$ of $f_Q$ obeys
$$\|\alpha_Q\|_{w\ell_{2/3}} \le C \cdot 2^{-3j/4}, \tag{5.4}$$
for some constant $C$ independent of $Q$.
Theorem 5.3 Let $Q$ be a dyadic square such that $Q \in \mathcal{Q}_1$. The curvelet coefficient sequence $\alpha_Q$ of $f_Q$ obeys
$$\|\alpha_Q\|_{w\ell_{2/3}} \le C \cdot 2^{-3j/2},$$
for some constant $C$ independent of $Q$. Actually, the stronger inequality $\|\alpha_Q\|_{\ell_{2/3}} \le C \cdot 2^{-3j/2}$ also holds.
5.2 Proof of Theorem 5.1
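The proof of Theorem 5.1 is a simple consequence of Theorems 5.2 and 5.3. Recall the $p$-triangle inequality for weak-$\ell_p$, $p \le 1$,
$$\|a + b\|^p_{w\ell_p} \le \|a\|^p_{w\ell_p} + \|b\|^p_{w\ell_p}.$$
Since $\alpha_j = \sum_Q \alpha_Q$, we have
$$\|\alpha_j\|^{2/3}_{w\ell_{2/3}} \le \sum_Q \|\alpha_Q\|^{2/3}_{w\ell_{2/3}} \le |\mathcal{Q}_0| \cdot \sup_{\mathcal{Q}_0} \|\alpha_Q\|^{2/3}_{w\ell_{2/3}} + |\mathcal{Q}_1| \cdot \sup_{\mathcal{Q}_1} \|\alpha_Q\|^{2/3}_{w\ell_{2/3}}.$$
The claim follows from Theorems 5.2 and 5.3 together with the earlier observation $|\mathcal{Q}_0| \le A_0 \cdot 2^{j/2}$ and $|\mathcal{Q}_1| \le 2^j + 4 \cdot 2^{j/2}$.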
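The last step is pure bookkeeping of exponents, which the following Python sketch makes explicit: it evaluates the right-hand side of the displayed inequality using the bounds of Theorems 5.2 and 5.3 and the two cardinality estimates. The numerical values of $A_0$ and $C$ are placeholders chosen for illustration; the function name is hypothetical.

```python
def combined_bound(j, a0=1.0, c=1.0):
    """Upper bound for ||alpha_j||_{w l_{2/3}}^{2/3} obtained from Theorems 5.2 and 5.3."""
    p = 2.0 / 3.0
    n_edge = a0 * 2.0 ** (j / 2)                 # |Q_0| <= A_0 2^(j/2)
    n_smooth = 2.0 ** j + 4.0 * 2.0 ** (j / 2)   # |Q_1| <= 2^j + 4 * 2^(j/2)
    edge_term = n_edge * (c * 2.0 ** (-0.75 * j)) ** p     # equals A_0 C^(2/3)
    smooth_term = n_smooth * (c * 2.0 ** (-1.5 * j)) ** p  # equals C^(2/3) (1 + 4 * 2^(-j/2))
    return edge_term + smooth_term
```

The edge squares contribute $A_0 C^{2/3}$ at every scale and the remaining squares contribute at most $5\,C^{2/3}$, so the sum is bounded independently of $j$.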
5.3 Proof of Theorem 1.2
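The proof of Theorem 1.2 now easily follows from Theorem 5.1. Indeed, observe that, on the one hand, the latter theorem established that
$$\#\{\mu \in M_j,\ |\alpha_\mu| > \epsilon\} \le C \cdot \epsilon^{-2/3},$$
and on the other, a previous section argued that there exists a constant $B$ with the property
$$|\alpha_\mu| = |\langle f, \gamma_\mu\rangle| \le B \cdot 2^{-3j/4}\,\|f\|_{L^\infty}.$$
As a consequence, there is a scale $j_\epsilon$ such that for each $j \ge j_\epsilon$, $|\alpha_\mu| < \epsilon$. Formally,
$$B \cdot 2^{-3j/4}\,\|f\|_{L^\infty} < \epsilon \ \Rightarrow\ \#\{\mu \in M_j,\ |\alpha_\mu| > \epsilon\} = 0;$$
thus, the number of scales $j$ such that $\#\{\mu \in M_j,\ |\alpha_\mu| > \epsilon\}$ is possibly nonzero is bounded by
$$\tfrac{4}{3}\bigl(\log_2(\epsilon^{-1}) + \log_2(\|f\|_{\infty}) + \log_2(B)\bigr) \le 2\log_2(\epsilon^{-1}),$$
for $\epsilon$ sufficiently small. We then showed that
$$\#\{\mu \in M,\ |\alpha_\mu| > \epsilon\} \le \sum_j \#\{\mu \in M_j,\ |\alpha_\mu| > \epsilon\} \le C \cdot \epsilon^{-2/3}\log(\epsilon^{-1}),$$
which is what we sought. Theorem 1.2 is proved.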
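The role of the logarithmic factor can be checked numerically. The Python sketch below finds the first scale at which the a priori bound $B \cdot 2^{-3j/4}\|f\|_{L^\infty}$ drops below a threshold $\epsilon$, so that only the scales before it can contribute, and multiplies this number of scales by the per-scale count $C\epsilon^{-2/3}$. The constants passed in are placeholders for illustration; function names are hypothetical.

```python
import math


def first_silent_scale(eps, b, f_sup):
    """Smallest j >= 0 with b * 2^(-3j/4) * f_sup < eps."""
    j = 0
    while b * 2.0 ** (-0.75 * j) * f_sup >= eps:
        j += 1
    return j


def total_count_bound(eps, c, b, f_sup):
    """(number of contributing scales) * (per-scale bound c * eps^(-2/3))."""
    return first_silent_scale(eps, b, f_sup) * c * eps ** (-2.0 / 3.0)
```

For small $\epsilon$ the number of contributing scales grows like $\tfrac{4}{3}\log_2(\epsilon^{-1})$, which is where the factor $\log(\epsilon^{-1})$ in the final count comes from.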
5.4 The Coarse Scales
The careful reader will point out that we have not treated the coarse scale coefficients. At coarse scales, curvelets are of the form $\Phi(x_1 - k_1, x_2 - k_2)$. Since $\Phi$ is of rapid decay, i.e. for each $m \ge 0$, $\Phi$ obeys $|\Phi(x)| \le C_m \cdot (1 + |x|)^{-m}$, and $f$ is supported on $[0, 1]^2$, standard arguments give that for each $m \ge 0$, these coarse scale coefficients obey
$$|\alpha_{k_1,k_2}| \le C_m \cdot (1 + |k|)^{-m},$$
for some constant $C_m$. Hence, their $\ell_p$ summability is not an issue.
6 Fourier Analysis of Edge Fragments
6.1 Edge Fragments
Suppose we are given an object with an edge along a $C^2$ curve. We window the object, multiplying by $w_Q(x) = w(2^{j/2}x - k)$, where $w$ is smooth and compactly supported with support included in $[-1, 1]^2$. We then translate the domain so that the resulting object is supported near the origin in a set contained in the square $-2^{-j/2} \le x_1, x_2 \le 2^{-j/2}$. We will call the result an edge fragment.
We suppose that the scale $2^{-j}$ is small enough such that over the support of $w_Q$, the edge curve may be parameterized as a graph either of the form $(x_1, x_2 = E(x_1))$ or $(x_1 = E(x_2), x_2)$. Indeed, the edge has a very simple interaction with dyadic squares at sufficiently fine scales; for $j \ge j_0$, the sidelength $2^{-j/2}$ of a square is too short to prevent one of the aforementioned parameterizations. (Here, $j_0$ may be a function of the maximum curvature of our edge curve.) Assume without loss of generality that the latter parameterization holds; then an edge fragment is a function of the form
$$f(x_1, x_2) = w(2^{j/2}x_1, 2^{j/2}x_2)\,g(x_1, x_2)\,1_{\{x_1 \ge E(x_2)\}}. \tag{6.1}$$
To make things concrete, we suppose that the edge goes through the origin and that at this point, its tangent is pointing in the vertical direction, i.e.
$$E(0) = 0, \qquad E'(0) = 0. \tag{6.2}$$
Figure 6: Schematic representation of an edge fragment and associated notation.
We would like to emphasize that this is not a loss of generality and that nothing in the arguments below depends on this specific assumption; see the discussion at the end of this section. It follows that $E$ deviates little from zero:
$$\sup_{|x_2| \le 2^{-j/2}} |E(x_2)| \le \frac{1}{2}\,\sup_{|x_2| \le 2^{-j/2}} |E''(x_2)| \cdot 2^{-j}.$$
In this sense, the edge curve $(E(x_2), x_2)$, $|x_2| \le 2^{-j/2}$, is very nearly straight. Figure 6 gives a sketch of an edge fragment.
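The near-straightness estimate is a second-order Taylor bound, and it can be checked numerically on an example. The Python sketch below uses the hypothetical edge profile $E(x) = 1 - \cos x$, which satisfies $E(0) = 0$ and $E'(0) = 0$ as in (6.2); this profile and the grid-based supremum are assumptions made only for illustration.

```python
import math


def edge(x):
    """Hypothetical edge profile with E(0) = 0 and E'(0) = 0."""
    return 1.0 - math.cos(x)


def edge_second_derivative(x):
    """Second derivative of the hypothetical profile."""
    return math.cos(x)


def sup_abs(fn, half_width, n=1000):
    """Maximum of |fn| over a symmetric grid on [-half_width, half_width]."""
    return max(abs(fn(i * half_width / n)) for i in range(-n, n + 1))


def edge_deviation(j):
    """Return (sup |E|, (1/2) sup |E''| 2^-j) over |x_2| <= 2^(-j/2)."""
    h = 2.0 ** (-j / 2)
    lhs = sup_abs(edge, h)
    rhs = 0.5 * sup_abs(edge_second_derivative, h) * 2.0 ** (-j)
    return lhs, rhs
```

For each scale, the first returned value is no larger than the second, and both shrink like $2^{-j}$: over a window of sidelength $2^{-j/2}$, the edge departs from its tangent by an amount comparable to the curvelet width.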
6.2 Fourier Analysis
We wish to study the localization of the Fourier transform of an edge fragment. Because the singularity is nearly vertical, it is quite clear that the Fourier transform of the edge fragment will have slow decay along the horizontal axis $\xi = (\xi_1, 0)$. In this section, however, we wish to understand the decay of the Fourier transform along radial lines of the form $(\lambda\cos\theta, \lambda\sin\theta)$, $\lambda \in \mathbb{R}$, as $\theta$ moves away from the singular co-direction $\theta = 0, \pi$. Our goal is to quantify this decay at a finite distance of the origin, namely, for $|\lambda| \sim 2^{j}$.
Theorem 6.1 Let $I_j$ be a dyadic interval $[2^{j-\alpha}, 2^{j+\beta}]$ with $\alpha, \beta \in \{0, 1, 2, 3\}$. The Fourier transform of the edge fragment obeys
$$\int_{|\lambda| \in I_j} |\hat f(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot 2^{-2j} \cdot (1 + 2^{j/2}|\sin\theta|)^{-5}. \tag{6.3}$$
We briefly discuss the relevance of this theorem for our problem. Curvelets are compactly supported near parabolic wedges in the frequency plane and Theorem 6.1 quantifies the frequency localization of an edge fragment as it gives bounds on the energy an edge fragment puts on each such parabolic wedge. With $f$ an edge fragment and $\chi_J$ a frequency window as in Section 2, (6.3) gives
$$\int |\hat f(\xi)|^2\,|\chi_J(\xi)|^2\,d\xi \le C \cdot 2^{-3j/2} \cdot (1 + 2^{j/2}|\sin\theta_J|)^{-5}. \tag{6.4}$$
Such bounds are sharp and cannot be improved. In a nutshell, (6.4) controls the size of the coefficients for a fixed value of the scale/angle pair $J$.
Let $F$ be the renormalization of an edge fragment to the unit square
$$F(x) = f(2^{-j/2}x), \qquad x \in [-1, 1]^2. \tag{6.5}$$
Note that $F$ is of the form
$$F(x) = w(x)\,g(2^{-j/2}x)\,1_{\{x_1 \ge E_j(x_2)\}}, \qquad E_j(x_2) = 2^{j/2}E(2^{-j/2}x_2). \tag{6.6}$$
The edge curve $E_j$ is again nearly straight. The Fourier transform of $F$ is of course given by $\hat F(\xi) = 2^{j}\hat f(2^{j/2}\xi)$, and, therefore, (6.3) is equivalent to
$$\int_{|\lambda| \in 2^{-j/2}I_j} |\hat F(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot 2^{-j/2} \cdot (1 + 2^{j/2}|\sin\theta|)^{-5}. \tag{6.7}$$
From now on, we will use the nicer notation $|\lambda| \sim 2^{j/2}$ to indicate $|\lambda| \in 2^{-j/2}I_j = [2^{j/2-\alpha}, 2^{j/2+\beta}]$.
The Radon transform provides a convenient tool to study integrals of the type (6.3) and (6.7). We recall that the Radon transform of an object $f$ is the collection of line integrals indexed by $(\theta, t) \in [0, 2\pi) \times \mathbb{R}$ given by
$$Rf(\theta, t) = \int f(x_1, x_2)\,\delta(x_1\cos\theta + x_2\sin\theta - t)\,dx_1\,dx_2, \tag{6.8}$$
where $\delta$ is the Dirac distribution. The Radon transform is linked to the polar Fourier transform of an object $f$ because the Projection Slice Theorem states that the Fourier transform along radial lines may be obtained by applying the 1-dimensional Fourier transform to the slices of the Radon transform
$$\hat f(\lambda\cos\theta, \lambda\sin\theta) = \int Rf(\theta, t)\,e^{-i\lambda t}\,dt.$$
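The definition (6.8) and the Projection Slice Theorem can be tried out on a case with closed-form answers. The Python sketch below approximates the line integral by a midpoint rule and then takes a 1-dimensional Fourier integral of the resulting slice. The test object is the Gaussian $f(x) = e^{-|x|^2/2}$, for which $Rf(\theta, t) = \sqrt{2\pi}\,e^{-t^2/2}$ and, with the convention $\hat f(\xi) = \int f(x)e^{-ix\cdot\xi}dx$ assumed here, $\hat f(\xi) = 2\pi e^{-|\xi|^2/2}$. The truncation length and grid sizes are illustrative choices, and function names are hypothetical.

```python
import math


def radon(f, theta, t, half_length=10.0, n=400):
    """Midpoint-rule approximation of the integral of f along the line
    x1*cos(theta) + x2*sin(theta) = t, truncated to |u| <= half_length."""
    c, s = math.cos(theta), math.sin(theta)
    du = 2.0 * half_length / n
    total = 0.0
    for i in range(n):
        u = -half_length + (i + 0.5) * du
        # the point at signed position u along the line
        total += f(t * c - u * s, t * s + u * c)
    return total * du


def fourier_slice(f, theta, lam, half_length=10.0, n=400):
    """Real part of the 1-D Fourier integral of t -> Rf(theta, t) at frequency lam."""
    dt = 2.0 * half_length / n
    total = 0.0
    for i in range(n):
        t = -half_length + (i + 0.5) * dt
        total += radon(f, theta, t, half_length, n) * math.cos(lam * t)
    return total * dt


def gaussian(x1, x2):
    """Test object exp(-|x|^2 / 2)."""
    return math.exp(-0.5 * (x1 * x1 + x2 * x2))
```

Since the Gaussian is even, the imaginary part of the slice vanishes and only the cosine term is computed; a non-symmetric object would require the full complex exponential.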
6.3 The Radon Transform of an Edge Fragment
In this section, we will view $RF(t, \theta)$ as a function of the variable $t$ while $\theta$ will merely play the role of a parameter. From now on, the subscript $t$ shall denote partial derivatives with respect to the $t$ variable.
Lemma 6.2 Set $\eta = 2^{-j/2}\|E''\|_{L^\infty}$ and assume $|\sin\theta| \ge \max(2^{-j/2}, 2\eta)$. The Radon transform $RF(\cdot, \theta)$ is twice differentiable and admits the following decomposition
$$(RF)_{tt}(t, \theta) = F_0(t, \theta) + F_1(t, \theta);$$
$F_0$ obeys
$$\|F_0\|^2_{L^2} \le C \cdot 2^{-j} \cdot |\sin\theta|^{-5}, \tag{6.9}$$
and $F_1$ is differentiable and obeys
$$\|(F_1)_t\|^2_{L^2} \le C \cdot |\sin\theta|^{-5}. \tag{6.10}$$
The Radon transform $RF$ is the line integral along $L_{t,\theta} = \{(x_1, x_2),\ x_1\cos\theta + x_2\sin\theta - t = 0\}$ which may or may not intersect with the edge $\mathcal{E} = \{(E_j(u), u),\ |u| \le 1\}$. Note that an intersection point is the solution of
$$E_j(u)\cos\theta + u\sin\theta = t. \tag{6.11}$$
Recall our assumption $|\sin\theta| \ge 2\eta$. The function $u \mapsto E_j(u)\cos\theta + u\sin\theta$ is then strictly monotone and we let $a(\theta) = E_j(-1)\cos\theta - \sin\theta$, and $b(\theta) = E_j(1)\cos\theta + \sin\theta$. We denote by $I(\theta)$ the interval with endpoints $a(\theta)$, $b(\theta)$; that is, $I(\theta)$ is the range of $E_j(u)\cos\theta + u\sin\theta$ as $u$ varies in the interval $[-1, 1]$. Observe that the size of this interval obeys
$$|I(\theta)| \le 2|\sin\theta| + 2\eta \le 3|\sin\theta|. \tag{6.12}$$
Lemma 6.3 Let $E_j(x) \in C^2[-1, 1]$ with $E_j(0) = 0$, $E_j'(0) = 0$, $\|E_j''\|_{L^\infty[-1,1]} \le \eta$. For $|\sin\theta| > 2\eta$,
(1) Each line $L_{t,\theta}$ intersects $\mathcal{E}$ in at most one point.
(2) The intersection is empty if $t \notin I(\theta)$.
(3) Each $x_2 \in [-1, 1]$ generates a point $(E_j(x_2), x_2)$ which is the intersection $L_{t,\theta} \cap \mathcal{E}$ for exactly one value of $t \in I(\theta)$.
We let $u(t, \theta)$ denote the value of $x_2$ named in part (3) of this lemma. Viewing this as a function of $(t, \theta)$, we observe the following behavior.
Lemma 6.4 Let $|\sin\theta| \ge 2\eta$. For each $t \in I(\theta)$, the function $u(t, \theta)$ is defined by (6.11). It is $C^2$, with partial derivatives
$$u_t = (\sin\theta + E_j'(u)\cos\theta)^{-1},$$
$$u_{tt} = -E_j''(u)\cos\theta\,/\,(\sin\theta + E_j'(u)\cos\theta)^{3}.$$
Note that the partial derivatives then obey the following estimates:
$$|u_t| \le 2 \cdot |\sin\theta|^{-1}, \tag{6.13}$$
$$|u_{tt}| \le 8\eta \cdot |\sin\theta|^{-3}. \tag{6.14}$$
These lemmas are elementary and we omit the proofs. Figure 6 gives a graphical indication of some of the objects just described.
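Although the proofs are omitted, the content of Lemma 6.4 is easy to test numerically. The Python sketch below takes the hypothetical rescaled edge $E_j(u) = \eta u^2/2$ with $\eta = 0.1$ (so that $E_j(0) = E_j'(0) = 0$ and $\|E_j''\|_{L^\infty} = \eta$), solves (6.11) for $u(t, \theta)$ by bisection, and evaluates the closed-form derivatives so that they can be compared with finite differences and with the bounds (6.13) and (6.14). The edge profile, the value of $\eta$, and all names are assumptions made for illustration.

```python
import math

ETA = 0.1  # assumed bound on |E_j''|


def edge_j(u):
    """Hypothetical rescaled edge with E_j(0) = E_j'(0) = 0 and E_j'' = ETA."""
    return 0.5 * ETA * u * u


def edge_j_prime(u):
    return ETA * u


def line_offset(u, theta):
    """Left-hand side of (6.11): E_j(u) cos(theta) + u sin(theta)."""
    return edge_j(u) * math.cos(theta) + u * math.sin(theta)


def solve_u(t, theta):
    """Solve (6.11) on [-1, 1] by bisection; needs |sin(theta)| >= 2 * ETA and t in I(theta)."""
    lo, hi = -1.0, 1.0
    increasing = line_offset(hi, theta) > line_offset(lo, theta)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (line_offset(mid, theta) < t) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def u_t(u, theta):
    """First derivative of u with respect to t, from Lemma 6.4."""
    return 1.0 / (math.sin(theta) + edge_j_prime(u) * math.cos(theta))


def u_tt(u, theta):
    """Second derivative of u with respect to t, from Lemma 6.4."""
    denom = math.sin(theta) + edge_j_prime(u) * math.cos(theta)
    return -ETA * math.cos(theta) / denom ** 3
```

Monotonicity of the map $u \mapsto E_j(u)\cos\theta + u\sin\theta$ under the assumption $|\sin\theta| \ge 2\eta$ is what makes the bisection well defined.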
We now let $F^\theta$ be the function obtained by composing $F$ with the rotation by an angle $\theta$, namely,
$$F^\theta(x) = F(x_1\cos\theta - x_2\sin\theta,\ x_1\sin\theta + x_2\cos\theta).$$
With these notations the Radon transform of $F$ is given by
$$(RF)(t, \theta) = \int F^\theta(t, u)\,du.$$
Define $(t, a(t, \theta))$ to be the coordinates of the point $(E_j(u), u)$ in the orthogonal coordinates system rotated by an angle $\theta$; $a = -E_j(u)\sin\theta + u\cos\theta$. Set $G(x) = g(2^{-j/2}x)\,w(x)$ so that $F(x) = G(x)\,1_{\{x_1 \ge E_j(x_2)\}}$. For $t \in I(\theta)$, the Radon transform of $F$ is then given by
$$(RF)(t, \theta) = \int_{-\infty}^{a} G^\theta(t, u)\,du, \tag{6.15}$$
while for $t \notin I(\theta)$, the same expression holds but with an integral of the form $\int_{-\infty}^{\infty}$.
The Radon transform $RF$ is twice differentiable with respect to the $t$-variable and for $t \in I(\theta)$, we calculate
$$(RF)_t(t, \theta) = a_t\,G^\theta(t, a) + \int_{-\infty}^{a} G^\theta_1(t, u)\,du,$$
where the subscript 1 (resp. 2) indicates differentiation with respect to the first (resp. second) variable. Further,
$$(RF)_{tt} = a_{tt}\,G^\theta(t, a) + 2a_t\,G^\theta_1(t, a) + (a_t)^2\,G^\theta_2(t, a) + \int_{-\infty}^{a} G^\theta_{11}(t, u)\,du = T_1 + 2T_2 + T_3 + T_4,$$
while for $t \notin I(\theta)$, the second derivative is simply given by
$$(RF)_{tt} = \int G^\theta_{11}(t, u)\,du.$$
Checking the differentiability of $RF$ at the endpoints of $I(\theta) = (a(\theta), b(\theta))$ is not an issue since $G^\theta(t, a)$ is identically zero for $t$ in the neighborhood of both $a(\theta)$ and $b(\theta)$. In other words, both calculations agree for $t$ near $a(\theta)$ or $b(\theta)$.
The proof then consists in expressing each term $T_m$ as a sum of terms $T_{m,n}$ obeying either (6.9) or being further differentiable with respect to $t$ and with a derivative obeying (6.10). Our calculations only use the following two facts: first, $a$ is supported on the interval $I(\theta)$ which is of length at most $3|\sin\theta|$ and obeys $|a_t| \le C \cdot |\sin\theta|^{-1}$ and $|a_{tt}| \le C \cdot 2^{-j/2} \cdot |\sin\theta|^{-3}$; second, letting $D$ be either $\partial/\partial x_1$ or $\partial/\partial x_2$, $|D^m(g(2^{-j/2}x))| \le C_m \cdot 2^{-jm/2}$. In the remainder of the proof, we will abuse notations and let $g$ actually denote the rescaled object $g(2^{-j/2}x)$.
Consider $T_1$. This function obeys $\|T_1\|_{L^\infty} \le C \cdot 2^{-j/2} \cdot |\sin\theta|^{-3}$ and is supported in $I(\theta)$. Therefore,
$$\|T_1\|^2_{L^2(\mathbb{R})} \le C \cdot 2^{-j} \cdot |\sin\theta|^{-5}.$$
Consider $T_2$. Express $T_2$ as
$$T_2 = a_t\,(g_1 w + w_1 g) = T_{2,1} + T_{2,2};$$
$T_{2,1}$ is supported on $I(\theta)$ and obeys $\|T_{2,1}\|_{L^\infty} \le C \cdot 2^{-j/2} \cdot |\sin\theta|^{-1}$. Hence, $\|T_{2,1}\|^2_{L^2(\mathbb{R})} \le C \cdot 2^{-j} \cdot |\sin\theta|^{-1}$ and, therefore, (6.9). $T_{2,2}$ is differentiable and
$$(T_{2,2})_t = a_{tt}\,w_1 g + a_t\,(w_{11} g + w_1 g_1) + (a_t)^2\,(w_{12} g + w_1 g_2).$$
Similar arguments give $\|(T_{2,2})_t\|^2_{L^2(\mathbb{R})} \le C \cdot 2^{-j} \cdot |\sin\theta|^{-5}$, i.e. (6.10).
Consider $T_3$. Express $T_3$ as
$$T_3 = (a_t)^2\,(g_2 w + w_2 g) = T_{3,1} + T_{3,2}.$$
Then $T_{3,1}$ is supported on $I(\theta)$ and obeys $\|T_{3,1}\|_{L^\infty} \le C \cdot 2^{-j/2} \cdot |\sin\theta|^{-2}$. Hence, $\|T_{3,1}\|^2_{L^2(\mathbb{R})} \le C \cdot 2^{-j} \cdot |\sin\theta|^{-3}$ and, therefore, (6.9). $T_{3,2}$ is differentiable and
$$(T_{3,2})_t = 2a_{tt}a_t\,(w_2 g) + (a_t)^2\,(w_{12} g + w_2 g_1) + (a_t)^3\,(w_{22} g + w_2 g_2).$$
Our basic arguments now give
$$\|(T_{3,2})_t\|^2_{L^2(\mathbb{R})} \le C \cdot (1 + 2^{-j} \cdot |\sin\theta|^{-2}) \cdot |\sin\theta|^{-5},$$
and, therefore, (6.10) since we assumed $|\sin\theta| \ge 2^{-j/2}$.
At last, consider $T_4$.
$$T_4 = \int_{-\infty}^{a} (g_{11} w + 2g_1 w_1 + g w_{11})\,du = T_{4,1} + 2T_{4,2} + T_{4,3}. \tag{6.16}$$
Then $|T_{4,1}| \le C \cdot 2^{-j}$ and therefore $\|T_{4,1}\|^2_2 \le C \cdot 2^{-2j}$. Likewise, $|T_{4,2}| \le C \cdot 2^{-j/2}$ and therefore $\|T_{4,2}\|^2_2 \le C \cdot 2^{-j}$. Finally $T_{4,3}$ is differentiable and its derivative is given by
$$(T_{4,3})_t = a_t\,g w_{11} + \int_{-\infty}^{a} (g w_{111} + g_1 w_{11})\,du = T_{4,3,1} + T_{4,3,2}.$$
The function $T_{4,3,1} = a_t\,g w_{11}$ is supported on $I(\theta)$ and obeys $\|T_{4,3,1}\|_{L^\infty} \le C \cdot |\sin\theta|^{-1}$. Hence, $\|T_{4,3,1}\|^2_{L^2} \le C \cdot |\sin\theta|^{-1}$. As far as the other term is concerned, $\|T_{4,3,2}\|_{L^\infty} \le C$ and hence $\|T_{4,3,2}\|^2_{L^2} \le C$.
For the sake of completeness, we just briefly mention how the proof adapts when the integration line has an empty intersection with the edge curve. In this case, the second derivative $(RF)_{tt}$ is simply given by
$$(RF)_{tt} = \int (g_{11} w + 2g_1 w_1 + g w_{11})\,du = T_{4,1} + 2T_{4,2} + T_{4,3}.$$
The estimates we collected for both $T_{4,1}$ and $T_{4,2}$ of course still hold. As far as the third term is concerned, we have
$$(T_{4,3})_t = \int (g w_{111} + g_1 w_{11})\,du$$
and it is then clear that $\|(T_{4,3})_t\|_{L^\infty} \le C$ and consequently $\|(T_{4,3})_t\|^2_{L^2} \le C$. This finishes the proof of Lemma 6.2.
Suppose the object we wish to analyze is now of the form $\tilde F(x) = x_1^{m_1} F(x)$, where $F$ is an edge fragment as before and $m_1$ a nonnegative integer. Then $\tilde F$ is of course an edge fragment and therefore obeys the decomposition (6.9)-(6.10). However, these estimates can be improved upon (the reason why we need sharper bounds will become apparent below). Indeed, observe that along the edge curve $\mathcal{E}$, $x_1^{m_1}F(x)$ obeys
$$|x_1^{m_1} F(x)| \le C \cdot 2^{-j m_1/2}, \qquad x \in \mathcal{E}.$$
This is because $E_j$ deviates little from zero and obeys $\sup_{|x_2| \le 1} |E_j(x_2)| \le \eta/2$, and thus for $x \in \mathcal{E}$, $|x_1| \le C \cdot 2^{-j/2}$.
Corollary 6.5 Let $F$ be an edge fragment as in Lemma 6.2 and consider $\tilde F(x) = x_1^{m_1} F(x)$. Then $(R\tilde F)_{tt}$ admits the following decomposition:
$$(R\tilde F)_{tt} = F_0 + F_1 + F_2;$$
$F_0$ obeys
$$\|F_0\|^2_{L^2} \le C \cdot 2^{-j m_1} \cdot 2^{-j} \cdot |\sin\theta|^{-5} + C \cdot 2^{-2j}; \tag{6.17}$$
$F_1$ is differentiable and obeys
$$\|(F_1)_t\|^2_{L^2} \le C \cdot 2^{-j m_1} \cdot |\sin\theta|^{-5} + C \cdot 2^{-j}; \tag{6.18}$$
$F_2$ is twice differentiable and obeys
$$\|(F_2)_{tt}\|^2_{L^2} \le C. \tag{6.19}$$
The proof proceeds as that of Lemma 6.2 and we write
$$(R\tilde F)_{tt} = T_1 + T_2 + T_3 + T_4.$$
For each $i = 1, 2, 3$, $T_i$ may be expressed as a sum of terms such that each verifies either (6.9) or (6.10) but for an additional multiplicative factor $2^{-j m_1}$ since they all involve the value of the edge fragment along the edge curve. This is the content of (6.17)-(6.18).
Now write $T_4$ as before, i.e. (6.16). For $T_{4,1}$, we use the same estimate as before, namely $\|T_{4,1}\|^2_2 \le C \cdot 2^{-2j}$, and, therefore, it obeys (6.17). For $T_{4,2}$, we observe that this term is differentiable and the derivative is given by
$$(T_{4,2})_t = a_t\,(g_1 w_1) + \int (g_1 w_{11} + g_{11} w_1)\,du = T_{4,2,1} + T_{4,2,2}.$$
The first term $T_{4,2,1} = a_t\,g_1 w_1$ obeys $\|T_{4,2,1}\|_{L^\infty} \le C \cdot 2^{-j/2} \cdot 2^{-j m_1/2} \cdot |\sin\theta|^{-1}$ and as a consequence $\|T_{4,2,1}\|^2_{L^2} \le C \cdot 2^{-j m_1} \cdot 2^{-j} \cdot |\sin\theta|^{-1}$, which is acceptable for (6.18). The second term $T_{4,2,2}$ verifies $\|T_{4,2,2}\|_{L^\infty} \le C \cdot 2^{-j/2}$ and then $\|T_{4,2,2}\|^2_{L^2} \le C \cdot 2^{-j}$ which is also acceptable. Finally
$$(T_{4,3})_t = a_t\,(g w_{11}) + \int (g_1 w_{11} + g w_{111})\,du = T_{4,3,1} + T_{4,3,2} + T_{4,3,3}.$$
Similar arguments give $\|T_{4,3,1}\|^2_{L^2} \le C \cdot 2^{-j m_1} \cdot |\sin\theta|^{-1}$, which is acceptable, and $\|T_{4,3,2}\|^2_{L^2} \le C \cdot 2^{-j}$ which is also acceptable. Simple calculations show that the derivative of $T_{4,3,3}$ obeys (6.19) which concludes the proof.
6.4 Proof of Theorem 6.1
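The proof of Theorem 6.1 is now one step away. To establish (6.7), observe that the Fourier transform of $(RF)_{tt}$ is $-\lambda^2\hat F(\lambda\cos\theta, \lambda\sin\theta)$ which then gives the decomposition
$$-\lambda^2\hat F(\lambda\cos\theta, \lambda\sin\theta) = \hat F_0(\lambda\cos\theta, \lambda\sin\theta) + \hat F_1(\lambda\cos\theta, \lambda\sin\theta).$$
Now, it follows from Lemma 6.2 that
$$\int |\hat F_0(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda = 2\pi\,\|F_0(\cdot, \theta)\|^2_{L^2(\mathbb{R})} \le C \cdot 2^{-j} \cdot |\sin\theta|^{-5}.$$
Likewise, since $i\lambda\hat F_1(\lambda\cos\theta, \lambda\sin\theta)$ is the Fourier transform of the derivative of $F_1(\cdot, \theta)$, the bound (6.10) gives
$$\int |\lambda|^2\,|\hat F_1(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot |\sin\theta|^{-5}$$
and
$$\int_{|\lambda| \sim 2^{j/2}} |\hat F_1(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot 2^{-j} \cdot |\sin\theta|^{-5}.$$
(We recall that $|\lambda| \sim 2^{j/2}$ means $|\lambda| \in [2^{j/2-\alpha}, 2^{j/2+\beta}]$.) Dividing the decomposition by $\lambda^2$, which is of size about $2^{j}$ on this range, we proved that
$$\int_{|\lambda| \sim 2^{j/2}} |\hat F(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot 2^{-3j} \cdot |\sin\theta|^{-5}, \tag{6.20}$$
as claimed.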
To conclude the proof of the theorem, we need to address the decay of the Fourier transform in the directions $(\cos\theta, \sin\theta)$, for $|\sin\theta| \le \max(2^{-j/2}, 2\eta)$; that is, for directions which are nearly normal to the singularity.
Write the edge fragment as $F(x) = F_0(x) + \Delta(x)$ where $F_0(x) = 1_{\{x_1 \ge 0\}}\,g(2^{-j/2}x)\,w(x)$ and $\Delta(x) = F(x) - F_0(x)$; observe that $F_0$ is much like $F$ except that we substituted the edge curve with a vertical straight line and that the difference function $\Delta$ is supported in a vertical strip whose width is at most $\eta$. Then write $\hat F = \hat F_0 + \hat\Delta$. The object $\hat F_0$ is the Fourier transform of a smooth function with a straight discontinuity and $\hat F_0(\lambda, 0)$ decays like $1/|\lambda|$ as $|\lambda| \to \infty$ and obeys
$$\int_{|\lambda| \sim 2^{j/2}} |\hat F_0(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot 2^{-j/2}.$$
We refer the reader to [3] for a proof, although this is an elementary fact. Next, for each $\theta$ obeying $|\sin\theta| \le \max(2^{-j/2}, 2\eta)$, note that the Radon transform $R\Delta(\cdot, \theta)$ is $L^\infty$-bounded and supported in an interval of length at most $C \cdot 2^{-j/2}$ and thus obeys $\|R\Delta(\cdot, \theta)\|^2_{L^2(\mathbb{R})} \le C \cdot 2^{-j/2}$. Therefore,
$$\int_{|\lambda| \sim 2^{j/2}} |\hat\Delta(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot 2^{-j/2}$$
and this last estimate proves that for each $\theta$ obeying $|\sin\theta| \le \max(2^{-j/2}, 2\eta)$,
$$\int_{|\lambda| \sim 2^{j/2}} |\hat F(\lambda\cos\theta, \lambda\sin\theta)|^2\,d\lambda \le C \cdot 2^{-j/2}.$$
This last inequality together with (6.20) finishes the proof of the theorem.
To establish the sparsity of curvelet coecients of an edge fragment, it will prove to be
useful to develop bounds on the derivatives of the Fourier transform of an edge fragment.
Corollary 6.6 Suppose that $f$ is an edge fragment as in (6.1). For each $m = (m_1, m_2)$, $m_1, m_2 = 0, 1, 2, \ldots$, let $D^m$ be the mixed derivative $\partial_1^{m_1}\partial_2^{m_2}$. Then the derivative of the Fourier transform of an edge fragment obeys
$$\int_{|\lambda| \in I_j} |D^m \hat f(\lambda\cos\theta, \lambda\sin\theta)|^2 \, d\lambda \le C_m \cdot 2^{-j|m|} \cdot 2^{-j m_1} \cdot I_j(\theta) + C_m \cdot 2^{-j|m|} \cdot 2^{-5j}, \tag{6.21}$$
with $I_j(\theta)$ as in Theorem 6.1, i.e. $I_j(\theta) = 2^{-2j}(1 + 2^{j/2}|\sin\theta|)^{-5}$.
Consider $i^{|m|} D^m \hat f$. This object is the Fourier transform of $x^m f(x)$, which we may rewrite as
$$x^m f(x) = 2^{-j|m|/2}\, g(x)\, w_m(2^{j/2}x)\, 1_{\{x_1 \ge E_j(x_2)\}}, \qquad w_m(x) = x^m w(x).$$
In other words, $x^m f(x) = 2^{-j|m|/2} f_m(x)$ where $f_m(x)$ is an edge fragment. Therefore, $D^m \hat f$ obeys
$$\int_{2^j}^{2^{j+1}} |D^m \hat f(\lambda\cos\theta, \lambda\sin\theta)|^2 \, d\lambda \le C_m \cdot 2^{-j|m|} \cdot 2^{-2j}(1 + 2^{j/2}|\sin\theta|)^{-5}.$$
This is of course a naive upper bound, and we already argued that better estimates are available, i.e. (6.17)–(6.19). Arguments identical to those developed above would turn the size estimates (6.17)–(6.19) into (6.21). The proof is a mere repeat of that of Theorem 6.1 and is omitted.
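The factor $2^{-j|m|/2}$ in the proof of Corollary 6.6 comes from a pure rescaling identity, $x^m w(2^{j/2}x) = 2^{-j|m|/2} w_m(2^{j/2}x)$ with $w_m(y) = y^m w(y)$. The following sketch checks it numerically. It is only an illustration: the Gaussian `window` is a hypothetical stand-in for $w$ (the paper's window is compactly supported), and the function names are not taken from the paper.

```python
import math


def window(y1, y2):
    """Hypothetical smooth window standing in for w; the identity holds for any w."""
    return math.exp(-(y1 * y1 + y2 * y2))


def monomial_times_window(x1, x2, m1, m2, j):
    """Left-hand side: x^m * w(2^(j/2) x)."""
    s = 2.0 ** (j / 2.0)
    return (x1 ** m1) * (x2 ** m2) * window(s * x1, s * x2)


def rescaled_form(x1, x2, m1, m2, j):
    """Right-hand side: 2^(-j|m|/2) * w_m(2^(j/2) x), with w_m(y) = y^m w(y)."""
    s = 2.0 ** (j / 2.0)
    y1, y2 = s * x1, s * x2
    w_m = (y1 ** m1) * (y2 ** m2) * window(y1, y2)
    return 2.0 ** (-j * (m1 + m2) / 2.0) * w_m


if __name__ == "__main__":
    for point in [(0.1, -0.2), (0.03, 0.07)]:
        for m in [(0, 0), (1, 2), (3, 1)]:
            lhs = monomial_times_window(point[0], point[1], m[0], m[1], 6)
            rhs = rescaled_form(point[0], point[1], m[0], m[1], 6)
            print(point, m, lhs, rhs)
```

Squaring the prefactor gives the $2^{-j|m|}$ that appears in the naive bound above.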
6.5 Arbitrary Edge Curves
As mentioned earlier, Theorem 6.1 does not depend upon the assumption that the edge obeys (6.2) and yields a more general result. Consider a typical edge curve $\mathcal{C}$ (and associated typical edge fragment) such that the point $x_0 = (x_{0,1}, x_{0,2}) \in \mathcal{C}$ and that at that point the tangent is pointing in the direction $(\sin\theta_0, \cos\theta_0)$. Then the edge fragment (6.1) would obey Theorem 6.1 with of course $|\sin(\theta - \theta_0)|$ in place of $|\sin\theta|$ in the right-hand side of (6.3).
Let $R_{\theta_0}$ be the rotation by the angle $\theta_0$. Technically speaking, although a typical edge fragment $f$ is not of the form $f(x) = f_0(R_{\theta_0}(x - x_0))$ with $f_0$ a standard edge fragment because of the windowing (6.1), our size estimates (6.3)–(6.21) behave as if this were the case. Let us explain.
Corollary 6.7 Consider a typical edge fragment $f$ as described above. Then its Fourier transform may be expressed as
$$\hat f(\xi) = e^{-i x_0 \cdot \xi}\, \hat f_0(R_{\theta_0}\xi),$$
where $\hat f_0$ obeys Theorem 6.1 and Corollary 6.6.
We omit the proof of this intuitive corollary as it is a mere repeat of the arguments presented above.
7 Curvelet Analysis of Edge Fragments
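Let $M_J$ denote the set of coefficients $\mu = (j, \ell, k)$ with a fixed value of the scale/angle pair $J = (j, \ell)$. In the previous section, we developed a key inequality (6.4) which gives a very precise bound on the $\ell_2$ norm of the curvelet coefficients $\langle f, \gamma_\mu\rangle$ of an edge fragment $f$. Indeed,
$$\sum_{\mu \in M_J} |\langle f, \gamma_\mu\rangle|^2 = \int |\hat f(\xi)|^2 |\chi_J(\xi)|^2 \, d\xi \le C \cdot 2^{-3j/2} \cdot (1 + 2^{j/2}|\sin\theta_J|)^{-5}. \tag{7.1}$$
For a fixed $J$, set $\ell_J = 1 + 2^{j/2}|\sin\theta_J|$ and let $N_J(\varepsilon)$ be the number of indices $\mu \in M_J$ such that $|\langle f, \gamma_\mu\rangle| > \varepsilon$. Roughly speaking, at scale $2^{-j}$ and for a fixed orientation $J$, there are only about $O(\ell_J)$ curvelets whose support overlaps significantly with the edge curve. Assuming that the other coefficients are negligible, it would follow from
$$N_J(\varepsilon) \cdot \varepsilon^2 \le \sum_{\mu \in M_J} |\langle f, \gamma_\mu\rangle|^2$$
together with (7.1) that
$$N_J(\varepsilon) \le C \cdot \min\left(\ell_J,\; 2^{-3j/2}\, \varepsilon^{-2}\, \ell_J^{-5}\right).$$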
Recall that
J
= 2 2
j/2
, = 0, 1, . . . , 2
j/2
1 or we may assume a slightly dierent
parameterization and set = 2
j/4
, . . . , 2
j/4
1. For [/2, /2], 2[[/ [ sin [ [[
and, therefore,
$$N_J(\varepsilon) \le C \cdot \min\left(1 + |\ell|,\; 2^{-3j/2}\, \varepsilon^{-2}\, (1 + |\ell|)^{-5}\right).$$
Hence
$$\sum_{\ell} N_J(\varepsilon) \le C \cdot 2^{-j/2} \cdot \varepsilon^{-2/3},$$
which is (5.4). This is of course a rough sketch aimed at quantitatively explaining why the coefficients of an edge fragment obey (5.4). Indeed, we assumed that most coefficients were negligible, and a rigorous proof must of course quantify the size of these individual coefficients.
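The passage from the per-orientation bound to the rate $2^{-j/2}\varepsilon^{-2/3}$ is a crossover computation: the two terms of the minimum balance when $(1+|\ell|)^6 = 2^{-3j/2}\varepsilon^{-2}$, and the sum is of the order of the square of that crossover value. The sketch below evaluates the sum numerically and compares it with the predicted rate. The range of $\ell$ and the function names are assumptions made for illustration only, and the constant $C$ is not tracked.

```python
def crossover_sum(j, eps, half_range):
    """Sum over ell in [-half_range, half_range] of
    min(1 + |ell|, 2^(-3j/2) * eps^(-2) * (1 + |ell|)^(-5))."""
    a = 2.0 ** (-1.5 * j) / (eps * eps)
    total = 0.0
    for ell in range(-half_range, half_range + 1):
        m = 1.0 + abs(ell)
        total += min(m, a * m ** (-5.0))
    return total


def predicted_rate(j, eps):
    """The rate 2^(-j/2) * eps^(-2/3), without the constant C."""
    return 2.0 ** (-0.5 * j) * eps ** (-2.0 / 3.0)


if __name__ == "__main__":
    for j in (4, 8, 12):
        for eps in (1e-2, 1e-4, 1e-6):
            # half_range = 2^(j//2) is an illustrative choice of angular range
            ratio = crossover_sum(j, eps, 2 ** (j // 2)) / predicted_rate(j, eps)
            print(f"j={j:2d} eps={eps:.0e} ratio={ratio:.3f}")
```

The printed ratios stay bounded, which is the content of the heuristic; when the crossover lies beyond the available range of $\ell$ the sum is simply smaller than the prediction.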
7.1 Proof of Theorem 5.2
Recall the frequency domain definition of a curvelet (2.9):
$$\hat\gamma_\mu(\xi) = (2\pi)\, \chi_J(\xi)\, u_{j,k}(R^*_{\theta_J}\xi)$$
with $u_{j,k}$ as in (2.8). We recall that the functions $u_{j,k}(R^*_{\theta_J}\xi)$ form an orthogonal basis for $L_2$ of a rectangle containing the support of $\chi_J$. In the Fourier domain, curvelet coefficients are thus given by
$$\langle f, \gamma_\mu\rangle = \frac{1}{2\pi} \int \hat f(\xi)\, \chi_J(\xi)\, u_{j,k}(R^*_{\theta_J}\xi) \, d\xi.$$
We now let $D_1$ be the partial derivative in the direction $(\cos\theta_J, \sin\theta_J)$ and $D_2$ be the derivative in the orthogonal direction, namely, $(-\sin\theta_J, \cos\theta_J)$. With $\ell_J$ as before, set $L = (1 - (2^j/\ell_J)^2 D_1^2)(1 - 2^j D_2^2)$. Then a simple calculation shows
$$L(u_{j,k} \circ R^*_{\theta_J}) = (1 + \ell_J^{-2}(k_1 + 1/2)^2)(1 + k_2^2)\, (u_{j,k} \circ R^*_{\theta_J}),$$
and integrating by parts gives
$$\langle f, \gamma_\mu\rangle = \frac{1}{2\pi}\, (1 + \ell_J^{-2}(k_1 + 1/2)^2)^{-1} (1 + k_2^2)^{-1} \int L(\hat f \chi_J)(\xi)\, u_{j,k}(R^*_{\theta_J}\xi) \, d\xi. \tag{7.2}$$
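The "simple calculation" is the statement that complex exponentials are eigenfunctions of $L$. A one-dimensional sketch of the first factor: for $u(t) = e^{ict}$ with $c = (k_1 + 1/2)\,2^{-j}$, the operator $1 - (2^j/\ell_J)^2\, d^2/dt^2$ multiplies $u$ by $1 + \ell_J^{-2}(k_1 + 1/2)^2$. The code below verifies this with a central finite difference. The normalisation of the frequency $c$ (no extra constants) is an assumption made here for illustration; the exact constants are those of definition (2.8) in the paper.

```python
import cmath


def second_derivative(func, t, h=1e-2):
    """Central finite-difference approximation of func''(t)."""
    return (func(t + h) - 2.0 * func(t) + func(t - h)) / (h * h)


def multiplier_check(j, ell_J, k1, t=0.3):
    """Apply 1 - (2^j / ell_J)^2 d^2/dt^2 to exp(i c t) and return
    (numerical result, predicted eigenvalue times exp(i c t))."""
    c = (k1 + 0.5) * 2.0 ** (-j)      # assumed frequency of the exponential
    a = (2.0 ** j / ell_J) ** 2       # weight in front of D_1^2

    def u(s):
        return cmath.exp(1j * c * s)

    applied = u(t) - a * second_derivative(u, t)
    predicted = (1.0 + (k1 + 0.5) ** 2 / ell_J ** 2) * u(t)
    return applied, predicted


if __name__ == "__main__":
    for params in [(4, 1.0, 0), (6, 3.0, 2), (8, 1.0, 5)]:
        applied, predicted = multiplier_check(*params)
        print(params, abs(applied - predicted))
```

The second factor $1 - 2^j D_2^2$ behaves the same way in the orthogonal variable and produces $1 + k_2^2$ under the same assumed normalisation.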
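Let $K = (K_1, K_2) \in \mathbb{Z}^2$ and define $R_K$ to be the set of indices $k = (k_1, k_2)$ such that $\ell_J^{-1}(k_1 + 1/2) \in [K_1, K_1 + 1)$ and $k_2 = K_2$. It then follows from the orthogonality property of the system $(u_{j,k}(R^*_{\theta_J}\xi))_k$ that
$$\sum_{k \in R_K} |\langle f, \gamma_\mu\rangle|^2 \le C \cdot (1 + |K_1|^2)^{-2} (1 + |K_2|^2)^{-2} \int |L(\hat f \chi_J)(\xi)|^2 \, d\xi.$$
In the Appendix, we show that $L(\hat f \chi_J)$ obeys the same estimate as the edge fragment, namely,
$$\int |L(\hat f \chi_J)(\xi)|^2 \, d\xi \le C \cdot 2^{-3j/2} \cdot (1 + 2^{j/2}|\sin\theta_J|)^{-5}. \tag{7.3}$$
In short, we have available the following bound:
$$\sum_{k \in R_K} |\langle f, \gamma_\mu\rangle|^2 \le C \cdot L_K^{-2} \cdot 2^{-3j/2} \cdot (1 + 2^{j/2}|\sin\theta_J|)^{-5}, \qquad L_K = (1 + |K_1|^2)(1 + |K_2|^2). \tag{7.4}$$
The rest of the proof mimics the estimates introduced at the beginning of this section. For a fixed orientation $J$, let $N_{J,K}(\varepsilon)$ be the number of indices $\mu \in M_J$ such that $k \in R_K$ and $|\langle f, \gamma_\mu\rangle| > \varepsilon$. $N_{J,K}(\varepsilon)$ is of course bounded by $|R_K| \lesssim \ell_J$ and obeys
$$N_{J,K}(\varepsilon) \le C \cdot \min\left(\ell_J,\; 2^{-3j/2}\, L_K^{-2}\, \varepsilon^{-2}\, \ell_J^{-5}\right)$$
as this follows from (7.4). Put $\varepsilon_{j,K} = L_K\, 2^{3j/4}\, \varepsilon$. Then the same calculations as before now give
$$\sum_{|J| = j} N_{J,K}(\varepsilon) \le C \cdot (\varepsilon_{j,K})^{-2/3} = C \cdot 2^{-j/2} \cdot \varepsilon^{-2/3} \cdot L_K^{-2/3}.$$
Since $\sum_{K \in \mathbb{Z}^2} L_K^{-2/3} \le A$, we proved that
$$|\{\mu \in M_j,\; |\langle f, \gamma_\mu\rangle| > \varepsilon\}| \le C \cdot 2^{-j/2} \cdot \varepsilon^{-2/3}.$$
This finishes the proof of Theorem 5.2.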
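The last step relies on the summability of $L_K^{-2/3}$ over $K \in \mathbb{Z}^2$. Since $L_K = (1+|K_1|^2)(1+|K_2|^2)$ factorises, the double sum is the square of a one-dimensional sum whose terms decay like $|n|^{-4/3}$. The sketch below computes partial sums; the truncation levels and function names are illustrative choices, and the bound $81$ used in the comment comes from the elementary integral test, not from the paper.

```python
def one_dim_sum(n_max):
    """Partial sum over |n| <= n_max of (1 + n^2)^(-2/3)."""
    return sum((1.0 + n * n) ** (-2.0 / 3.0) for n in range(-n_max, n_max + 1))


def lattice_sum(n_max):
    """Partial sum of L_K^(-2/3) over the square |K_1|, |K_2| <= n_max.
    L_K is a product, so the double sum is the square of the 1-D sum."""
    return one_dim_sum(n_max) ** 2


if __name__ == "__main__":
    # Integral test: the 1-D sum is at most 1 + 2 * (1 + 3) = 9, so the
    # lattice sum never exceeds 81, whatever the truncation level.
    for n_max in (10, 100, 1000, 10000):
        print(n_max, round(lattice_sum(n_max), 4))
```

The partial sums increase slowly (the tail of the one-dimensional sum decays like $n_{\max}^{-1/3}$) but remain bounded, which is all the argument needs.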
7.2 Arbitrary Edge Fragments
The previous calculations assumed that the edge fragment obeys (6.2). We hope that it is clear that nothing in the curvelet transform crucially depends upon this specific location and orientation. To see how the argument effortlessly adapts to arbitrary edge fragments, we follow Section 6.5. Recall that the Fourier transform of a typical edge fragment $f$ (the edge curve has location $x_0$ and orientation $\theta_0$) is of the form $\hat f(\xi) = e^{-i x_0 \cdot \xi}\, \hat f_0(R_{\theta_0}\xi)$ where $\hat f_0$ obeys Theorem 6.1 and Corollary 6.6; see Corollary 6.7.
We introduce some notation and let $u_{J,k}$ be the function defined by $u_{j,k}(R^*_{\theta_J} x)$. Then
$$\langle f, \gamma_\mu\rangle = \frac{1}{2\pi} \int \hat f_0(R_{\theta_0}\xi)\, e^{-i x_0 \cdot \xi}\, \chi_J(\xi)\, u_{J,k}(\xi) \, d\xi = \frac{1}{2\pi} \int \hat f_0(\xi)\, \chi_J(R^*_{\theta_0}\xi)\, u_{J,k-k_0}(R^*_{\theta_0}\xi) \, d\xi;$$
observe the shifted index $k - k_0$ with $k_0$ defined as $k_0 = R^*_{\theta_J} x_0$. In effect, the angle $\theta_J$ of the curvelet has also been shifted, i.e. by $\theta_0$. Since $\hat f_0$ obeys Theorem 6.1 and Corollary 6.6, all of our estimates remain the same but for a shift in the parameters of the curvelet transform. For instance, (7.1) becomes
$$\sum_{\mu \in M_J} |\langle f, \gamma_\mu\rangle|^2 = \int |\hat f_0(\xi)|^2 |\chi_J(R^*_{\theta_0}\xi)|^2 \, d\xi \le C \cdot 2^{-3j/2} \cdot (1 + 2^{j/2}|\sin(\theta_J - \theta_0)|)^{-5}.$$
Next, letting $\ell_J = 1 + 2^{j/2}|\sin(\theta_J - \theta_0)|$ and setting $L = (1 - (2^j/\ell_J)^2 D_1^2)(1 - 2^j D_2^2)$ with $D_1$ being now the partial derivative in the direction $(\cos(\theta_J - \theta_0), \sin(\theta_J - \theta_0))$ would give
$$\langle f, \gamma_\mu\rangle = \frac{1}{2\pi}\, (1 + \ell_J^{-2}(k_1 - k_{0,1} + 1/2)^2)^{-1} (1 + (k_2 - k_{0,2})^2)^{-1} \int L\big(\hat f_0\, \chi_J(R^*_{\theta_0}\,\cdot)\big)(\xi)\, u_{J,k-k_0}(R^*_{\theta_0}\xi) \, d\xi$$
just as before in (7.2), and of course the right-hand side of (7.3) would be replaced by $C \cdot 2^{-3j/2} \cdot (1 + 2^{j/2}|\sin(\theta_J - \theta_0)|)^{-5}$. From here, it is clear how the rest of the argument would proceed.
7.3 Coarse Scale Edge Fragments
We would like to conclude this section by remarking that, strictly speaking, an edge fragment assumes that the scale is fine enough so that $j \ge j_0$, where $j_0$ is simply related to the maximum curvature of the edge curve. In all rigor, we have therefore not covered the cases where $j < j_0$, although these are trivial since they morally involve only a finite number of coefficients. In the next section, we will prove that for any edge fragment $f_Q$, the coefficient sequence $\theta_Q = (\langle f_Q, \gamma_\mu\rangle)_\mu$ obeys $\|\theta_Q\|_{\ell_{2/3}} \le C \cdot 2^{j/2}$, which then establishes Theorem 5.2 for $j < j_0$.
8 Curvelet Analysis of Smooth Functions
8.1 Proof of Theorem 5.3
We begin with a lemma.
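Lemma 8.1 Consider a dyadic corona of the form $2^{j-\alpha} \le |\xi| \le 2^{j+\beta}$ with $\alpha, \beta = 0, 1, 2, 3$, written $|\xi| \sim 2^j$ for short. Suppose that $Q \in \mathcal{Q}_1$; then $f_Q$ obeys
$$\int_{|\xi| \sim 2^j} |\hat f_Q(\xi)|^2 \, d\xi \le C \cdot 2^{-5j}. \tag{8.1}$$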
Proof of Lemma. The function $f_Q = g\, w_Q$ is supported in a square of sidelength $2 \cdot 2^{-j/2}$, is twice differentiable and its second partial derivative with respect to $x_1$, say, is given by
$$(f_Q)_{11} = g_{11}\, w_Q + 2 g_1 (w_Q)_1 + g\, (w_Q)_{11} = T_1 + 2T_2 + T_3.$$
1. The function $T_1$ obeys $\|T_1\|_{L_\infty} \le C$ and, therefore, $\|T_1\|^2_{L_2} \le C \cdot 2^{-j}$.
2. The function $T_2$ is differentiable and a trivial calculation gives $\|(T_2)_1\|_{L_\infty} \le C \cdot 2^{j}$; therefore, $\|(T_2)_1\|^2_{L_2} \le C \cdot 2^{j}$.
3. The function $T_3$ is twice differentiable and a trivial calculation gives $\|(T_3)_{11}\|_{L_\infty} \le C \cdot 2^{2j}$; therefore, $\|(T_3)_{11}\|^2_{L_2} \le C \cdot 2^{3j}$.
Let $I_j$ be a dyadic interval of the form $[2^{j-\alpha}, 2^{j+\beta}]$ with $\alpha, \beta = 0, 1, 2, 3$. Using arguments similar to those deployed in the proof of Theorem 6.1, we obtain that for each $n = 1, 2, 3$, $\hat T_n$ obeys
$$\int_{|\xi_1| \in I_j} \int_{\xi_2} |\hat T_n(\xi_1, \xi_2)|^2 \, d\xi_1\, d\xi_2 \le C \cdot 2^{-j}.$$
Since $-\xi_1^2\, \hat f_Q(\xi) = \hat T_1(\xi) + 2 \hat T_2(\xi) + \hat T_3(\xi)$, we proved that
$$\int_{|\xi_1| \in I_j} \int_{\xi_2} |\hat f_Q(\xi_1, \xi_2)|^2 \, d\xi_1\, d\xi_2 \le C \cdot 2^{-5j}. \tag{8.2}$$
Of course, a similar bound would hold with the dyadic strip $\{\xi,\ |\xi_2| \in I_j\}$ (instead of $\{\xi,\ |\xi_1| \in I_j\}$) as a domain of integration.
The overall structure of the proof of Theorem 5.3 is now analogous to that of Theorem 5.2. We first turn the upper bound (8.1) into a size estimate about the $\ell_2$-norm of the coefficients of $f_Q$ at a fixed scale $2^{-j}$. Indeed, the sequence $\theta_Q = (\theta_{Q,\mu})$, $\theta_{Q,\mu} = \langle f_Q, \gamma_\mu\rangle$, obeys
$$\sum_{\mu \in M_j} |\theta_{Q,\mu}|^2 \le \int_{|\xi| \sim 2^j} |\hat f_Q(\xi)|^2 \, d\xi \le C \cdot 2^{-5j}. \tag{8.3}$$
We then turn this $\ell_2$-estimate into an $\ell_p$ type of estimate. To do this, recall the interpolation inequality
$$\|\theta\|_{\ell_p} \le n^{1/p - 1/2}\, \|\theta\|_{\ell_2} \tag{8.4}$$
valid for $0 < p \le 2$ and arbitrary finite sequences of length $n$. Roughly speaking, at scale $2^{-j}$, there are only about $2^j$ curvelets whose support overlaps significantly with $f_Q$. Assuming that the other coefficients are negligible, (8.4) would give
$$\|\theta_Q\|_{\ell_p} \le 2^{j(1/p - 1/2)}\, \|\theta_Q\|_{\ell_2},$$
and, therefore, for $p = 2/3$, $\theta_Q$ would obey
$$\|\theta_Q\|_{\ell_{2/3}} \le C \cdot 2^{-3j/2}$$
since $\|\theta_Q\|_{\ell_2} \le C \cdot 2^{-5j/2}$. This is the content of Theorem 5.3.
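The interpolation inequality (8.4) is Hölder's inequality in disguise and is easy to test numerically. The following sketch draws random finite sequences and checks that the $\ell_p$ quasi-norm never exceeds $n^{1/p-1/2}$ times the $\ell_2$ norm for $p = 2/3$; the helper names and the random test data are illustrative only.

```python
import random


def lp_quasi_norm(seq, p):
    """(sum |x|^p)^(1/p); a norm for p >= 1 and a quasi-norm for 0 < p < 1."""
    return sum(abs(x) ** p for x in seq) ** (1.0 / p)


def interpolation_bound(seq, p):
    """Right-hand side of (8.4): n^(1/p - 1/2) * ||seq||_2."""
    n = len(seq)
    return n ** (1.0 / p - 0.5) * lp_quasi_norm(seq, 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    p = 2.0 / 3.0
    for n in (1, 10, 1000):
        seq = [rng.gauss(0.0, 1.0) for _ in range(n)]
        print(n, lp_quasi_norm(seq, p), interpolation_bound(seq, p))
    # Equality case: all entries of equal magnitude.
    flat = [1.0] * 64
    print(lp_quasi_norm(flat, p), interpolation_bound(flat, p))
```

With $n \approx 2^j$ and $\|\theta_Q\|_{\ell_2} \le C\,2^{-5j/2}$, the bound reads $2^{j(3/2 - 1/2)} \cdot 2^{-5j/2} = 2^{-3j/2}$, matching the display above.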
Proof of Theorem 5.3. In the Fourier domain, curvelet coefficients are again given by
$$\theta_\mu = \frac{1}{2\pi} \int \hat f(\xi)\, \chi_J(\xi)\, u_{j,k}(R^*_{\theta_J}\xi) \, d\xi.$$
Set $L = (1 - 2^j \Delta)$ with the usual Laplacian $\Delta = \sum_{i=1}^{2} \partial^2/\partial\xi_i^2$. Then
$$L(u_{j,k} \circ R^*_{\theta_J}) = (1 + 2^{-j}(k_1 + 1/2)^2 + k_2^2)\, (u_{j,k} \circ R^*_{\theta_J}).$$
Hence, an integration by parts gives
$$\theta_\mu = \frac{1}{2\pi}\, (1 + 2^{-j}(k_1 + 1/2)^2 + k_2^2)^{-2} \int L^2(\hat f \chi_J)(\xi)\, u_{j,k}(R^*_{\theta_J}\xi) \, d\xi.$$
Let $K = (K_1, K_2) \in \mathbb{Z}^2$ and define $R_K$ to be the set of indices $(k_1, k_2)$ such that $(k_1 + 1/2)\, 2^{-j/2} \in [K_1, K_1 + 1)$ and $k_2 = K_2$. It then follows from the orthogonality property of the system $(u_{j,k}(R^*_{\theta_J}\xi))_k$ that
$$\sum_{k \in R_K} |\theta_\mu|^2 \le C \cdot (1 + |K|^2)^{-4} \int |L^2(\hat f \chi_J)(\xi)|^2 \, d\xi.$$
We now sum this last inequality over all angular wedges $J = (j, \ell)$ at a fixed scale $|J| = j$ and obtain
$$\sum_{|J| = j} \sum_{k \in R_K} |\theta_\mu|^2 \le C \cdot (1 + |K|^2)^{-4} \int \sum_{|J| = j} |L^2(\hat f \chi_J)(\xi)|^2 \, d\xi.$$
The Appendix develops a bound for the right-hand side of this last inequality, namely,
$$\int \sum_{|J| = j} |L^2(\hat f \chi_J)(\xi)|^2 \, d\xi \le C \cdot 2^{-5j}. \tag{8.5}$$
The remainder of the proof mimics the argument presented above. We first observe that the number of terms in $R_K$ is bounded by $1 + 2^{j/2}$, and thus the cardinality of the set of indices $\mu \in M_j$ such that $k \in R_K$ and $\ell = 1, 2, \ldots, 2^{j/2}$ is less than or equal to $2^j + 2^{j/2}$. The interpolation inequality (8.4) gives
$$\sum_{k \in R_K} |\theta_{Q,\mu}|^p \le C \cdot 2^{j(1 - p/2)} \cdot 2^{-5jp/2} \cdot (1 + |K|^2)^{-2p}.$$
Since for $p > 1/2$, $\sum_{K \in \mathbb{Z}^2} (1 + |K|^2)^{-2p} \le A_p$, we proved that for each $p > 1/2$,
$$\sum_{\mu \in M_j} |\theta_{Q,\mu}|^p \le C \cdot 2^{j(1 - 3p)}.$$
In particular, $\|\theta_Q\|_{\ell_{2/3}} \le C \cdot 2^{-3j/2}$. This finishes the proof of our theorem.
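The exponent bookkeeping in the last step can be verified with exact rational arithmetic. The sketch below assumes only what the text states: about $2^j$ active coefficients, a squared $\ell_2$ norm of order $2^{-5j}$, and the interpolation inequality (8.4). The function names are illustrative.

```python
from fractions import Fraction


def lp_sum_exponent(p):
    """Exponent e with sum |theta|^p <= C * 2^(j*e):
    (1 - p/2) from the count 2^j, and -5p/2 from the l2 bound 2^(-5j/2)."""
    p = Fraction(p)
    return (1 - p / 2) - 5 * p / 2


def quasi_norm_exponent(p):
    """Exponent of 2^j in the bound on ||theta||_{l_p} (take the 1/p-th power)."""
    p = Fraction(p)
    return lp_sum_exponent(p) / p


if __name__ == "__main__":
    p = Fraction(2, 3)
    print(lp_sum_exponent(p))      # exponent of the l_p sum
    print(quasi_norm_exponent(p))  # exponent of the l_p quasi-norm
```

For every $p$ the first function returns $1 - 3p$, and at $p = 2/3$ the quasi-norm exponent is $-3/2$, in agreement with Theorem 5.3.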
Remark. Let $f_Q$ be an edge fragment. Then $f_Q$ of course obeys $\|f_Q\|_{L_2} \le C \cdot 2^{-j/2}$ and, therefore, $\|\theta_Q\|_{\ell_2} \le C \cdot 2^{-j/2}$. The above analysis would then give $\|\theta_Q\|_{\ell_p} \le C \cdot 2^{j(1/p - 1)}$, and in particular $\|\theta_Q\|_{\ell_{2/3}} \le C \cdot 2^{j/2}$ as claimed in the previous section.
8.2 Sparsity of Smooth Functions
The argument we presented above is general and shows that curvelets are just as effective as any other classical system for representing smooth objects (as claimed in Section 4). To make this statement precise, recall the definition of the Sobolev norm of an object $f$:
$$\|f\|^2_{W_2^s} = \int |\hat f(\xi)|^2 (1 + |\xi|^{2s}) \, d\xi,$$
which is equivalent to $\|f\|^2_{L_2} + \|D^s f\|^2_{L_2}$, where $D^s f$ is the $s$th derivative of $f$, provided that $s$ is an integer.
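Theorem 8.2 Suppose that $g \in W_2^s$, $s > 0$, and $\operatorname{supp} g \subset [0, 1]^2$. Let $\theta = (\theta_\mu)$, $\theta_\mu = \langle g, \gamma_\mu\rangle$, be the curvelet coefficient sequence of $g$. Then,
$$\sum_{\mu} 2^{2js} |\theta_\mu|^2 \le C \cdot \|g\|^2_{W_2^s}, \tag{8.6}$$
and
$$\|\theta\|_{w\ell_{p^*}} \le C \cdot \|g\|_{W_2^s}, \qquad 1/p^* = (s + 1)/2. \tag{8.7}$$
We only sketch the proof of this result as this is an easy modification of the ideas we already exposed. With the same notations as before, by definition the Fourier transform obeys
$$\int_{|\xi| \sim 2^j} |\hat g(\xi)|^2 \, d\xi \le C \cdot 2^{-2js},$$
which can be turned into the estimate
$$\sum_{\mu \in M_j} |\theta_\mu|^2 \le C \cdot 2^{-2js}.$$
This gives the first part of the result. Next, at scale $2^{-j}$, we only have about $2^{2j}$ curvelets which interact with the unit square $[0, 1]^2$. Ignoring the other coefficients (they can be handled using the rapid spatial decay of $\gamma_\mu$), we have
$$\#\{\mu \in M_j,\; |\theta_\mu| > \varepsilon\} \le C \cdot \min\left(2^{2j},\; \varepsilon^{-2}\, 2^{-2js}\right).$$
Summing this inequality across $j \ge 0$ gives
$$\#\{\mu,\; |\theta_\mu| > \varepsilon\} \le C \cdot \varepsilon^{-p^*},$$
which proves the second part.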
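The summation across scales is again a crossover argument: the two terms of the minimum balance when $2^{2j(1+s)} = \varepsilon^{-2}$, and both geometric tails are dominated by the value at the crossover, $\varepsilon^{-2/(s+1)} = \varepsilon^{-p^*}$. The sketch below evaluates the sum numerically. The truncation `j_max`, the restriction to $s \ge 1$ in the quoted constant, and the function names are assumptions made for illustration.

```python
def count_bound(eps, s, j_max=60):
    """Sum over j = 0..j_max of min(2^(2j), eps^(-2) * 2^(-2 j s))."""
    total = 0.0
    for j in range(j_max + 1):
        total += min(4.0 ** j, (eps ** -2) * 4.0 ** (-j * s))
    return total


def weak_lp_rate(eps, s):
    """Predicted rate eps^(-p*), with 1/p* = (s + 1)/2, without the constant C."""
    p_star = 2.0 / (s + 1.0)
    return eps ** (-p_star)


if __name__ == "__main__":
    for s in (1, 2, 3):
        for eps in (1e-2, 1e-4, 1e-6):
            ratio = count_bound(eps, s) / weak_lp_rate(eps, s)
            # For s >= 1 the ratio is at most 4/3 + 1/(1 - 4^(-s)) <= 8/3.
            print(f"s={s} eps={eps:.0e} ratio={ratio:.3f}")
```

For $s = 2$ the predicted rate is $\varepsilon^{-2/3}$, the same exponent as for edge fragments.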
Remark. The above result shows that it suffices to prove Theorem 1.2 for objects $f$ of the form $f = f_1 1_B$. This follows from the fact that the curvelet coefficient sequence $(\theta_\mu)$ of an arbitrary object $f = f_0 + f_1 1_B$ may be decomposed as follows:
$$\theta_\mu = \langle f, \gamma_\mu\rangle = \langle f_0, \gamma_\mu\rangle + \langle f_1 1_B, \gamma_\mu\rangle = \theta^0_\mu + \theta^1_\mu,$$
where $\theta^0$ obeys (8.7), i.e. for $s = 2$, $\|\theta^0\|_{w\ell_{2/3}} \le C$.
9 Discussion
This paper introduced new tight frames of curvelets and proved that curvelets provide optimally sparse representations of objects with singularities along $C^2$ edges. This result motivated the whole construction and in itself explains its appeal. Short of this result, the construction would be an interesting new multiscale architecture, but simply one among many possibilities.
In fact, just as any transform may be applied to a wide spectrum of problems, there is a range of possible applications of curvelet systems which is much wider than the types of approximation theoretic problems studied in this paper. In particular, we would like to point out applications in image processing [9, 34, 33], statistical estimation [8] and possible applications in partial differential equations and scientific computing. For instance, some recent work [4] shows that some classical types of operators, namely Fourier Integral Operators, admit optimally sparse decompositions in curvelet frames.
Space limitations prevent extensive discussion of these applications. However, it is worth recalling that the sparsity estimates proven here are directly related to performance metrics in these applications. In all these applications, sparser expansions theoretically lead to better reconstructions and faster algorithms, and so the estimates proven here form a central motivating factor in efforts towards applications.
10 Appendix
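Proof of (7.3). With the notations of Section 7, note that for each pair $m = (m_1, m_2)$, $m_1, m_2 = 0, 1, 2, \ldots$, the mixed derivative of $\chi_J$ obeys
$$D_1^{m_1} D_2^{m_2} \chi_J(\xi) = O(2^{-j m_1} \cdot 2^{-j m_2/2}). \tag{10.1}$$
Next, from the definition of the partial derivatives $D_1$ and $D_2$, namely,
$$D_1 \hat f = \cos\theta_J\, \partial_1 \hat f + \sin\theta_J\, \partial_2 \hat f, \qquad D_2 \hat f = -\sin\theta_J\, \partial_1 \hat f + \cos\theta_J\, \partial_2 \hat f,$$
we deduce a formula for higher order derivatives
$$D_1^m \hat f = \sum_{\alpha + \beta = m} c_{\alpha,\beta}\, (\cos\theta_J)^{\alpha} (\sin\theta_J)^{\beta}\, \partial_1^{\alpha} \partial_2^{\beta} \hat f,$$
and similarly for $D_2^m \hat f$. As in Section 2, assume that $R_J$ is a symmetric rectangle containing the support of $\chi_J$. Corollary 6.6 gives bounds on the $L_2$-norm of partial derivatives of the function $\hat f$, namely, letting $\ell_J = 1 + 2^{j/2}|\sin\theta_J|$,
$$\|\partial_1^{\alpha} \partial_2^{\beta} \hat f\|^2_{L_2(R_J)} \le C_{\alpha,\beta} \cdot 2^{-j(\alpha+\beta)} \cdot \left(2^{-j\alpha} \cdot 2^{-3j/2} \cdot \ell_J^{-5} + 2^{-5j}\right).$$
Now, from
$$\|D_1^m \hat f\|^2_{L_2(R_J)} \le c_m \sum_{\alpha + \beta = m} |\sin\theta_J|^{2\beta}\, \|\partial_1^{\alpha} \partial_2^{\beta} \hat f\|^2_{L_2(R_J)},$$
and bounds on each individual term $\|\partial_1^{\alpha} \partial_2^{\beta} \hat f\|_{L_2(R_J)}$, we then derive the size estimate
$$\|D_1^m \hat f\|^2_{L_2(R_J)} \le C_m \cdot 2^{-jm} \left(2^{-5j} + 2^{-jm} \cdot 2^{-3j/2} \cdot \ell_J^{2m-5}\right).$$
The above inequality used the relationship $|\sin\theta_J| \le 2^{-j/2}\, \ell_J$. For $m \le 2$, note that $2^{-5j} \le C \cdot 2^{-j(m+3/2)}\, \ell_J^{2m-5}$ since $1 \le \ell_J \le 1 + 2^{j/2}$ and, therefore,
$$\|D_1^m \hat f\|^2_{L_2(R_J)} \le C_m \cdot (2^{j} \ell_J^{-1})^{-2m} \cdot 2^{-3j/2} \cdot \ell_J^{-5}. \tag{10.2}$$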
There is an analogous estimate for $D_2^m \hat f$ and indeed, similar calculations now give
$$\|D_2^m \hat f\|^2_{L_2(R_J)} \le C_m \cdot 2^{-jm} \cdot 2^{-3j/2} \cdot \ell_J^{-5}. \tag{10.3}$$
We now prove that
$$L(\hat f \chi_J) = \left(I - 2^{2j} \ell_J^{-2} D_1^2 - 2^{j} D_2^2 + 2^{3j} \ell_J^{-2} D_1^2 D_2^2\right)(\hat f \chi_J)$$
obeys (7.3). First, observe that
$$D_1^2(\hat f \chi_J) = (D_1^2 \hat f)\, \chi_J + 2 (D_1 \hat f)(D_1 \chi_J) + \hat f\, (D_1^2 \chi_J).$$
Then (10.2), together with (10.1), gives
$$(2^{j} \ell_J^{-1})^2\, \|D_1^2(\hat f \chi_J)\|_{L_2} \le C \cdot 2^{-3j/4} \cdot \ell_J^{-5/2}.$$
Second, (10.3), together with (10.1), gives
$$2^{j}\, \|D_2^2(\hat f \chi_J)\|_{L_2} \le C \cdot 2^{-3j/4} \cdot \ell_J^{-5/2}.$$
And third, it is easy to verify that similar calculations also give
$$(2^{3j} \ell_J^{-2})\, \|D_1^2 D_2^2(\hat f \chi_J)\|_{L_2} \le C \cdot 2^{-3j/4} \cdot \ell_J^{-5/2}.$$
Our claim (7.3) now follows from these last three inequalities.
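Proof of (8.5). Let $\Delta$ be the usual Laplacian. We argue that for each $m = 0, 1, 2, \ldots$, there is a constant $C_m$ with the property
$$\int \sum_{|J| = j} |\Delta^m(\hat f \chi_J)(\xi)|^2 \, d\xi \le C_m \cdot 2^{-2jm} \cdot 2^{-5j}. \tag{10.4}$$
Note that for $m = 0$, (10.4) holds because of Lemma 8.1. Further, (10.4) would give (8.5) since $L^2 = I - 2 \cdot 2^{j} \Delta + 2^{2j} \Delta^2$.
The proof will use the following basic fact about the window $\chi_J$ (see Section 2). For $\alpha = (\alpha_1, \alpha_2)$, we let $D^{\alpha}$ be the mixed derivative $\partial_1^{\alpha_1} \partial_2^{\alpha_2}$. Then for each $\alpha$, the $D^{\alpha} \chi_J$'s obey
$$\sum_{|J| = j} |D^{\alpha} \chi_J(\xi)|^2 \le C_{\alpha} \cdot 2^{-j|\alpha|}, \tag{10.5}$$
for some constant $C_{\alpha}$.
Now begin with
$$\Delta^m(\hat f \chi_J) = \sum_{|\alpha| + |\beta| = 2m} c_{\alpha,\beta}\, D^{\alpha} \hat f\, D^{\beta} \chi_J.$$
We then use (10.5) and obtain
$$\int \sum_{|J| = j} |D^{\alpha} \hat f(\xi)|^2 |D^{\beta} \chi_J(\xi)|^2 \, d\xi \le C_{\beta} \cdot 2^{-j|\beta|} \int_{2^{j-1} \le |\xi| \le 2^{j+2}} |D^{\alpha} \hat f(\xi)|^2 \, d\xi.$$
Recall that here $f(x)$ is of the form $f(x) = g(x)\, w(2^{j/2} x)$ where $w$ is a window with support in $[-1, 1]^2$. Now, up to a constant of modulus one, $D^{\alpha} \hat f$ is the Fourier transform of $x^{\alpha} f(x)$. Write $x^{\alpha} f(x)$ as $2^{-j|\alpha|/2}\, g(x)\, w_{\alpha}(2^{j/2} x)$ where $w_{\alpha}(x) = x^{\alpha} w(x)$. The window $w_{\alpha}$ is of course $C^{\infty}$ and compactly supported in $[-1, 1]^2$, and the Fourier transform of $g(x)\, w_{\alpha}(2^{j/2} x)$ obeys the decay estimate (8.1). Hence,
$$\int_{2^{j-1} \le |\xi| \le 2^{j+2}} |D^{\alpha} \hat f(\xi)|^2 \, d\xi \le C_{\alpha} \cdot 2^{-j|\alpha|} \cdot 2^{-5j}.$$
To conclude, for each $\alpha, \beta$ with $|\alpha| + |\beta| = 2m$, we proved that
$$\int \sum_{|J| = j} |D^{\alpha} \hat f(\xi)|^2 |D^{\beta} \chi_J(\xi)|^2 \, d\xi \le C_m \cdot 2^{-2jm} \cdot 2^{-5j},$$
and, therefore, (10.4) follows.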
References
[1] J. P. Antoine and R. Murenzi. Two-dimensional directional wavelets and the scale-angle representation. Signal Processing, 52:259–281, 1996.
[2] E. J. Candès. Harmonic analysis of neural networks. Applied and Computational Harmonic Analysis, 6:197–218, 1999.
[3] E. J. Candès. Ridgelets and the representation of mutilated Sobolev functions. SIAM J. Math. Anal., 33:347–368, 2001.
[4] E. J. Candès and L. Demanet. Curvelets and Fourier integral operators. http://www.acm.caltech.edu/emmanuel/publications.html, 2002.
[5] E. J. Candès and L. Demanet. Curvelets, warpings, and optimally sparse representations of Fourier integral operators. Submitted and available at http://www.acm.caltech.edu/emmanuel/publications.html, 2002.
[6] E. J. Candès and D. L. Donoho. Ridgelets: the Key to Higher-dimensional Intermittency? Phil. Trans. R. Soc. Lond. A., 357:2495–2509, 1999.
[7] E. J. Candès and D. L. Donoho. Curvelets – a surprisingly effective nonadaptive representation for objects with edges. In C. Rabut, A. Cohen, and L. L. Schumaker, editors, Curves and Surfaces, pages 105–120, Vanderbilt University Press, Nashville, TN, 2000.
[8] E. J. Candès and D. L. Donoho. Recovering edges in ill-posed inverse problems: Optimality of curvelet frames. Ann. Statist., 30:784–842, 2002.
[9] E. J. Candès and F. Guo. New multiscale transforms, minimum total variation synthesis: Applications to edge-preserving image reconstruction. Signal Processing, 82:1519–1543, 2002.
[10] I. Daubechies. Ten lectures on wavelets. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.
[11] I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonal expansions. J. Math. Phys., 27:1271–1283, 1986.
[12] L. Demanet. Personal communication. 2002.
[13] M. N. Do. Directional Multiresolution Image Representations. PhD thesis, Swiss Federal Institute of Technology, Lausanne, November 2001.
[14] M. N. Do and M. Vetterli. Contourlets. In J. Stoeckler and G. V. Welland, editors, Beyond Wavelets. Academic Press, 2002. To appear.
[15] D. L. Donoho. Unconditional bases are optimal bases for data compression and for statistical estimation. Applied and Computational Harmonic Analysis, 1:100–115, 1993.
[16] D. L. Donoho. De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41:613–627, 1995.
[17] D. L. Donoho. Unconditional bases and bit-level compression. Applied and Computational Harmonic Analysis, 3:388–392, 1996.
[18] D. L. Donoho. Wedgelets: nearly-minimax estimation of edges. Ann. Statist., 27:859–897, 1999.
[19] D. L. Donoho. Sparse components of images and optimal atomic decomposition. Constr. Approx., 17:353–382, 2001.
[20] D. L. Donoho and M. R. Duncan. Digital curvelet transform: Strategy, implementation, experiments. Technical report, Stanford University, 1999.
[21] D. L. Donoho and I. M. Johnstone. Empirical atomic decomposition. Manuscript, 1995.
[22] D. L. Donoho, M. Vetterli, R. A. DeVore, and I. Daubechies. Data compression and harmonic analysis. IEEE Trans. Inform. Theory, 44:2435–2476, 1998.
[23] C. Fefferman. A note on spherical summation multipliers. Israel J. Math., 15:44–52, 1973.
[24] A. G. Flesia, H. Hel-Or, A. Averbuch, E. J. Candès, R. R. Coifman, and D. L. Donoho. Digital implementation of ridgelet packets. In J. Stoeckler and G. V. Welland, editors, Beyond Wavelets. Academic Press, 2002. To appear.
[25] M. Frazier, B. Jawerth, and G. Weiss. Littlewood-Paley theory and the study of function spaces, volume 79 of NSF-CBMS Regional Conf. Ser. in Mathematics. American Math. Soc., Providence, RI, 1991.
[26] P. Gressman, D. Labate, G. Weiss, and E. Wilson. Affine, quasi-affine and co-affine wavelets. In J. Stoeckler and G. V. Welland, editors, Beyond Wavelets. Academic Press, 2002. To appear.
[27] B. S. Kashin. Approximation properties of complete orthonormal systems. Trudy Mat. Inst. Steklov., 353:187–191, 1985. English translation in Proc. Steklov Inst. Math. (1987).
[28] P. G. Lemarié and Y. Meyer. Ondelettes et bases Hilbertiennes. Rev. Mat. Iberoamericana, 2:1–18, 1986.
[29] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.
[30] Y. Meyer. Wavelets: Algorithms and Applications. SIAM, Philadelphia, 1993.
[31] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger. Shiftable multi-scale transforms [or what's wrong with orthonormal wavelets]. IEEE Trans. Information Theory, Special Issue on Wavelets, 38:587–607, 1992.
[32] H. A. Smith. A parametrix construction for wave equations with $C^{1,1}$ coefficients. Ann. Inst. Fourier (Grenoble), 48:797–835, 1998.
[33] J. L. Starck, E. J. Candès, and D. L. Donoho. Very high quality image restoration. In A. Aldroubi, A. F. Laine, and M. A. Unser, editors, Wavelet Applications in Signal and Image Processing IX, Proc. SPIE 4478, 2001.
[34] J. L. Starck, E. Candès, and D. L. Donoho. The curvelet transform for image denoising. IEEE Transactions on Image Processing, 11:670–684, 2002.
[35] J. L. Starck, F. Murtagh, and A. Bijaoui. Image Processing and Data Analysis: The Multiscale Approach. Cambridge University Press, Cambridge (GB), 1998.
[36] E. M. Stein. Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals, volume 30. Princeton University Press, Princeton, N.J., 1993.
[37] D. Taubman. High performance scalable image compression with EBCOT. IEEE Transactions on Image Processing, 9:1158–1170, 2000.