
lecture notes on

Inverse Problems
University of Göttingen
Summer 2002
Thorsten Hohage
Contents
1 Introduction
2 Algorithms for the solution of linear inverse problems
3 Regularization methods
4 Spectral theory
5 Convergence analysis of linear regularization methods
6 Interpretation of source conditions
7 Nonlinear operators
8 Nonlinear Tikhonov regularization
9 Iterative regularization methods
Bibliography
1. Introduction
In [Kel76] Keller formulated the following very general definition of inverse problems, which
is often cited in the literature:
We call two problems inverses of one another if the formulation of each involves
all or part of the solution of the other. Often, for historical reasons, one of the
two problems has been studied extensively for some time, while the other is
newer and not so well understood. In such cases, the former problem is called
the direct problem, while the latter is called the inverse problem.
In many cases one of the two problems is not well-posed in the following sense:
Definition 1.1. (Hadamard) A problem is called well-posed if
1. there exists a solution to the problem (existence),
2. there is at most one solution to the problem (uniqueness),
3. the solution depends continuously on the data (stability).
A problem which is not well-posed is called ill-posed. If one of two problems which are inverse to each other is ill-posed, we call it the inverse problem and the other one the direct problem. All inverse problems we will consider in the following are ill-posed.
If the data space is defined as the set of solutions to the direct problem, existence of a solution to the inverse problem is clear. However, a solution may fail to exist if the data are perturbed by noise. This problem will be addressed below. Uniqueness of a solution to an inverse problem is often not easy to show. Obviously, it is an important issue. If uniqueness is not guaranteed by the given data, then either additional data have to be observed or the set of admissible solutions has to be restricted using a-priori information on the solution. In other words, a remedy against non-uniqueness can be a reformulation of the problem.
Among the three Hadamard criteria, a failure to meet the third one is the most delicate to deal with. In this case inevitable measurement and round-off errors can be amplified by an arbitrarily large factor and make a computed solution completely useless. Until the beginning of the last century it was generally believed that for natural problems the solution will always depend continuously on the data (natura non facit saltus). If this was not the case, the mathematical model of the problem was believed to be inadequate. Therefore, these problems were called ill- or badly posed. Only in the second half of the last century it was realized that a huge number of problems arising in science and technology are ill-posed in any reasonable mathematical setting. This initiated a large amount of research in stable and accurate methods for the numerical solution of ill-posed problems. Today inverse and ill-posed problems are still an active area of research. This is reflected in a large number of journals (Inverse Problems, Inverse and Ill-Posed Problems, Inverse Problems in Engineering) and monographs ([BG95, Bau87, EHN96, Gla84, Gro84, Hof86, Isa98, Kir96, Kre89, Lav67, Lou89, Mor84, Mor93, TA77, TGSY95]).
Let us consider some typical examples of inverse problems. Many more can be found
in the references above.
Example 1.2. (Numerical differentiation)
Differentiation and integration are two problems which are inverse to each other. Although symbolic differentiation is much simpler than symbolic integration, we will call differentiation the inverse problem since it is ill-posed in the setting considered below. For this reason differentiation turns out to be the more delicate problem from a numerical point of view.

We define the direct problem to be the evaluation of the integral
$$(T_D\varphi)(x) := \int_0^x \varphi(t)\,dt \quad \text{for } x \in [0,1]$$
for a given $\varphi \in C([0,1])$. The inverse problem consists in solving the equation
$$T_D\varphi = g \tag{1.1}$$
for a given $g \in C([0,1])$ satisfying $g(0) = 0$, or equivalently computing $\varphi = g'$.

Obviously, (1.1) has a solution in $C([0,1])$ if and only if $g \in C^1([0,1])$. The inverse problem (1.1) would be well-posed if the data $g$ were measured with the $C^1$-norm. Although this is certainly a natural setting, it is of no use if the given data contain measurement or round-off errors, which can only be estimated with respect to the supremum norm.
Let us assume that we are given noisy data $g^\delta \in C([0,1])$ satisfying
$$\|g^\delta - g\|_\infty \le \delta$$
with noise level $0 < \delta < 1$. The functions
$$g_n^\delta(x) := g(x) + \delta\,\sin\frac{nx}{\delta}, \qquad x \in [0,1],$$
$n = 2, 3, 4, \ldots$ satisfy this error bound, but for the derivatives
$$(g_n^\delta)'(x) = g'(x) + n\,\cos\frac{nx}{\delta}, \qquad x \in [0,1],$$
we find that
$$\|(g_n^\delta)' - g'\|_\infty = n.$$
Hence, the error in the solutions blows up without bound as $n \to \infty$ although the error in the data is bounded by $\delta$. This shows that (1.1) is ill-posed with respect to the supremum norm.
Let us look at the approximate solution of (1.1) by the central difference quotients
$$(R_h g)(x) := \frac{g(x+h) - g(x-h)}{2h}, \qquad x \in [0,1],$$
for $h > 0$. To make $(R_h g)(x)$ well defined for $x$ near the boundaries, we assume for simplicity that $g$ is periodic with period 1. A Taylor expansion of $g$ yields
$$\|g' - R_h g\|_\infty \le \frac{h}{2}\,\|g''\|_\infty \qquad \text{or} \qquad \|g' - R_h g\|_\infty \le \frac{h^2}{6}\,\|g'''\|_\infty$$
if $g \in C^2([0,1])$ or $g \in C^3([0,1])$, respectively. For noisy data the total error can be estimated by
$$\|g' - R_h g^\delta\|_\infty \le \|g' - R_h g\|_\infty + \|R_h g - R_h g^\delta\|_\infty \le \frac{h}{2}\,\|g''\|_\infty + \frac{\delta}{h} \tag{1.2}$$
if $g \in C^2([0,1])$. In this estimate we have split the total error into an approximation error $\|g' - R_h g\|_\infty$, which tends to 0 as $h \to 0$, and a data noise error $\|R_h g - R_h g^\delta\|_\infty$, which explodes as $h \to 0$. To get a good approximation we have to balance these two error terms by a good choice of the discretization parameter $h$. The minimum of the right hand side of the error estimate is attained at $h = (2/\|g''\|_\infty)^{1/2}\,\delta^{1/2}$. With this choice of $h$ the total error is of the order
$$\|g' - R_h g^\delta\|_\infty = O\!\left(\delta^{1/2}\right). \tag{1.3a}$$
If $g \in C^3([0,1])$, a similar computation shows that for $h = (3/\|g'''\|_\infty)^{1/3}\,\delta^{1/3}$ we get the better convergence rate
$$\|g' - R_h g^\delta\|_\infty = O\!\left(\delta^{2/3}\right). \tag{1.3b}$$
More regularity of $g$ does not improve the order $\delta^{2/3}$ for the central difference quotient $R_h$. It can be shown that even for higher order difference schemes the convergence rate is always smaller than $O(\delta)$. This order can only be achieved for well-posed problems.
The convergence rate (1.3a) reflects the fact that stability can be restored to the inverse problem (1.1) if the a-priori information
$$\|g''\|_\infty \le E \tag{1.4}$$
is given. In other words, the restriction of the differentiation operator to the set $W_E := \{g : \|g''\|_\infty \le E\}$ is continuous with respect to the maximum norm. This follows from the estimate
$$\|g_1' - g_2'\|_\infty \le \left\|\left(\tfrac{d}{dx} - R_h\right)(g_1 - g_2)\right\|_\infty + \|R_h(g_1 - g_2)\|_\infty \le \frac{h}{2}\,\|g_1'' - g_2''\|_\infty + \frac{\|g_1 - g_2\|_\infty}{h} \le 2\sqrt{E\,\|g_1 - g_2\|_\infty},$$
which holds for $g_1, g_2 \in W_E$ with the choice $h = \sqrt{\|g_1 - g_2\|_\infty / E}$.
In studying this first example we have seen a number of typical properties of ill-posed problems:
- amplification of high frequency errors,
- dependence of ill-posedness on the choice of norms, which is often determined by practical needs,
- restoration of stability by a-priori information,
- a trade-off between accuracy and stability in the choice of the discretization parameter,
- dependence of the optimal choice of the discretization parameter and the convergence rate on the smoothness of the solution.
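These properties can be observed in a few lines of code. The following sketch (my own illustration, not part of the original notes; the test function $g(x) = \sin(2\pi x)$, the uniform noise model, and the grid size are assumptions made here) applies the central difference quotient $R_h$ of Example 1.2 to noisy data and compares step sizes, including the order-optimal choice $h = (2/\|g''\|_\infty)^{1/2}\delta^{1/2}$ from (1.3a).

```python
import numpy as np

# Illustration of Example 1.2: central differences applied to noisy data.
# Exact data g(x) = sin(2*pi*x) on a periodic grid, so g'(x) = 2*pi*cos(2*pi*x).
rng = np.random.default_rng(0)
N = 2000
x = np.linspace(0.0, 1.0, N, endpoint=False)
g = np.sin(2 * np.pi * x)
dg = 2 * np.pi * np.cos(2 * np.pi * x)

delta = 1e-3                               # noise level in the supremum norm
g_delta = g + delta * rng.uniform(-1.0, 1.0, N)

def central_diff(f, h, dx=1.0 / N):
    """Central difference quotient R_h with stencil width h (multiple of dx)."""
    k = max(1, int(round(h / dx)))
    return (np.roll(f, -k) - np.roll(f, k)) / (2 * k * dx)

E = np.max(np.abs(-(2 * np.pi) ** 2 * g))  # bound on |g''|
h_opt = np.sqrt(2.0 * delta / E)           # order-optimal choice from (1.3a)

for h in [1.0 / N, h_opt, 0.1]:
    err = np.max(np.abs(central_diff(g_delta, h) - dg))
    print(f"h = {h:8.5f}   sup-error = {err:8.5f}")
# The error explodes for h << h_opt (noise amplification ~ delta/h) and grows
# again for large h (approximation error ~ h/2 * E); h_opt balances the two.
```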
Example 1.3. (Backwards heat equation)
Direct Problem: Given $\varphi \in L^2([0,1])$ find $g(x) = u(x,T)$ ($T > 0$) where $u : [0,1] \times [0,T] \to \mathbb{R}$ satisfies
$$\frac{\partial}{\partial t}u(x,t) = \frac{\partial^2}{\partial x^2}u(x,t), \qquad x \in (0,1),\ t \in (0,T), \tag{1.5a}$$
$$u(0,t) = u(1,t) = 0, \qquad t \in (0,T], \tag{1.5b}$$
$$u(x,0) = \varphi(x), \qquad x \in [0,1]. \tag{1.5c}$$
$\varphi$ may describe a temperature profile at time $t = 0$. On the boundaries of the interval $[0,1]$ the temperature is kept at 0. The task is to find the temperature at time $t = T$. The inverse problem consists in finding the initial temperature given the temperature at time $t = T$.
Inverse Problem: Given $g \in L^2([0,1])$, find $\varphi \in L^2([0,1])$ such that $u(\cdot,T) = g$ and $u$ satisfies (1.5).
Let $\varphi_n := \sqrt{2}\int_0^1 \sin(n\pi x)\,\varphi(x)\,dx$ denote the Fourier coefficients of $\varphi$ with respect to the complete orthonormal system $\{\sqrt{2}\sin(n\pi\,\cdot) : n = 1, 2, \ldots\}$ of $L^2([0,1])$. A separation of variables leads to the formal solution
$$u(x,t) = \sqrt{2}\sum_{n=1}^\infty \varphi_n\,e^{-\pi^2 n^2 t}\,\sin(n\pi x) \tag{1.6}$$
of (1.5). It is a straightforward exercise to show that $u$ defined by (1.6) belongs to $C^\infty([0,1] \times (0,T])$ and satisfies (1.5a) and (1.5b). Moreover, the initial condition (1.5c) is satisfied in the $L^2$ sense, i.e. $u(\cdot,t) \to \varphi$ as $t \to 0$ in $L^2([0,1])$.

Introducing the operator $T_{BH} : L^2([0,1]) \to L^2([0,1])$ by
$$(T_{BH}\varphi)(x) := \int_0^1\left(2\sum_{n=1}^\infty e^{-\pi^2 n^2 T}\sin(n\pi x)\sin(n\pi y)\right)\varphi(y)\,dy, \tag{1.7}$$
we may formulate the inverse problem as an integral equation of the first kind
$$T_{BH}\varphi = g. \tag{1.8}$$
Note that the direct solution operator $T_{BH}$ damps out high frequency components with an exponentially decreasing factor $e^{-\pi^2 n^2 T}$. Therefore, in the inverse problem a data error in the $n$th Fourier component of $g$ is amplified by the factor $e^{\pi^2 n^2 T}$! This shows that the inverse problem is severely ill-posed.

Also note that the inverse problem does not have a solution for arbitrary $g \in L^2([0,1])$. For more information on inverse problems in diffusion processes we refer to [ER95].
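To get a feeling for the size of the amplification factors $e^{\pi^2 n^2 T}$, the following sketch (my own illustration; the value $T = 0.1$ is an assumption) prints the forward damping and backward amplification of the first few Fourier modes.

```python
import numpy as np

# Example 1.3: forward damping and backward amplification factors for the
# heat equation; the nth factor is exp(-pi**2 * n**2 * T).
T = 0.1   # final time (assumed here for illustration)
for n in range(1, 6):
    damping = np.exp(-np.pi**2 * n**2 * T)
    print(f"mode n = {n}: forward factor {damping:.3e}, "
          f"backward amplification {1.0 / damping:.3e}")
# Already for n = 5 a data error is amplified by about 5e10, which is why
# the backwards heat equation is called severely ill-posed.
```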
Example 1.4. (Sideways heat equation and deconvolution problems)
We again consider the heat equation on the interval $[0,1]$, but this time with an infinite time interval:
$$u_t(x,t) = u_{xx}(x,t), \qquad x \in (0,1),\ t \in \mathbb{R}. \tag{1.9a}$$
The interval $[0,1]$ may describe some heat conducting medium, e.g. the wall of a furnace. We assume that the exterior side of the wall is insulated,
$$u_x(0,t) = 0, \qquad t \in \mathbb{R}, \tag{1.9b}$$
and that the interior side has the temperature of the interior of the furnace,
$$u(1,t) = \varphi(t), \qquad t \in \mathbb{R}. \tag{1.9c}$$
Direct Problem: Given $\varphi \in L^2(\mathbb{R})$ find $u : [0,1] \times \mathbb{R} \to \mathbb{R}$ satisfying (1.9).
In practice the interior of a furnace is not accessible. Temperature measurements can only be taken at the exterior side of the wall. This leads to the
Inverse Problem: Given $g \in L^2(\mathbb{R})$, find $\varphi \in L^2(\mathbb{R})$ such that the solution to (1.9) satisfies
$$u(0,t) = g(t), \qquad t \in \mathbb{R}. \tag{1.10}$$
The Fourier transform of $u$ with respect to the time variable,
$$(\mathcal{F}u)(x,\omega) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-i\omega t}\,u(x,t)\,dt,$$
formally satisfies
$$\mathcal{F}u_{xx} = (\mathcal{F}u)_{xx} \qquad \text{and} \qquad \mathcal{F}u_t = i\omega\,\mathcal{F}u$$
for $0 < x < 1$. Hence, $\mathcal{F}u$ is characterized by the equations
$$(\mathcal{F}u)_{xx} = i\omega\,\mathcal{F}u, \qquad (\mathcal{F}u)_x(0,\omega) = 0, \qquad (\mathcal{F}u)(1,\omega) = \mathcal{F}\varphi,$$
which have the explicit solution
$$(\mathcal{F}u)(x,\omega) = \frac{\cosh(\sqrt{i\omega}\,x)}{\cosh\sqrt{i\omega}}\,(\mathcal{F}\varphi)(\omega)$$
where $\sqrt{i\omega} = \sqrt{\tfrac{\omega}{2}} + i\sqrt{\tfrac{\omega}{2}}$. It is easily checked that the inverse Fourier transform of the right hand side of this equation really defines a solution to (1.9). Therefore, we can reformulate our problem as an operator equation $T_{SH}\varphi = g$ with the operator $T_{SH} : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ defined by
$$T_{SH} := \mathcal{F}^{-1}\,(\cosh\sqrt{i\omega})^{-1}\,\mathcal{F}. \tag{1.11}$$
By the Convolution Theorem $T_{SH}$ can be written as a convolution operator
$$(T_{SH}\varphi)(t) = \int_{-\infty}^\infty k(t-s)\,\varphi(s)\,ds$$
with kernel $k = \frac{1}{\sqrt{2\pi}}\,\mathcal{F}^{-1}\!\left((\cosh\sqrt{i\omega})^{-1}\right)$.
Integral equations of convolution type also arise in other areas of the applied sciences,
e.g. in deblurring of images.
Example 1.5. (Deblurring of images)
In early 1990 the Hubble Space Telescope was launched into low-earth orbit outside of the disturbing atmosphere in order to provide images with an unprecedented spatial resolution. Unfortunately, soon after launch a manufacturing error in the main mirror was detected, causing severe spherical aberrations in the images. Therefore, before the space shuttle Endeavour visited the telescope in 1993 to fix the error, astronomers employed inverse problem techniques to improve the blurred images (cf. [Ado95, LB91]).

The true image $\varphi$ and the blurred image $g$ are related by a first kind integral equation
$$\int_{\mathbb{R}^2} k(x,y;x',y')\,\varphi(x',y')\,dx'\,dy' = g(x,y) \tag{1.12}$$
where $k$ is the blurring function. $k(\,\cdot\,;x_0,y_0)$ describes the blurred image of a point source at $(x_0,y_0)$. It is usually assumed that $k$ is spatially invariant, i.e.
$$k(x,y;x',y') = h(x-x',\,y-y'), \qquad x, x', y, y' \in \mathbb{R}. \tag{1.13}$$
$h$ is called point spread function. Under the assumption (1.13) the direct problem is described by the convolution operator
$$(T_{DB}\varphi)(x,y) := \int_{\mathbb{R}^2} h(x-x',\,y-y')\,\varphi(x',y')\,dx'\,dy'.$$
The exact solution to the problem
$$T_{DB}\varphi = g \tag{1.14}$$
can in principle be computed by Fourier transformation as for the sideways heat equation. We get $\varphi = (2\pi)^{-1}\mathcal{F}^{-1}(1/\hat{h})\,\mathcal{F}g$. Again, the multiplication by $1/\hat{h}$ is unstable since $\hat{h} := \mathcal{F}h$ vanishes asymptotically for large arguments. Therefore the inverse problem to determine $\varphi \in L^2(\mathbb{R}^2)$ given $g \in L^2(\mathbb{R}^2)$ is ill-posed.

In many image restoration applications, a crucial problem is to find a suitable point spread function. For the Hubble Space Telescope it turned out that the blurring function was to some extent spatially varying, i.e. that assumption (1.13) was violated. Moreover, the functions $k(\,\cdot\,;x_0,y_0)$ had a comparatively large support.
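A one-dimensional toy computation (my own illustration, with an assumed Gaussian point spread function; not from the original notes) shows the instability of naive Fourier division: where $\hat{h}$ is tiny, even very small noise in $g$ is blown up.

```python
import numpy as np

# Naive deconvolution phi = F^{-1}( F(g) / F(h) ) for a Gaussian blur (1D toy).
rng = np.random.default_rng(1)
N = 512
x = np.linspace(-1, 1, N, endpoint=False)
phi = (np.abs(x) < 0.3).astype(float)            # true "image": a box
h = np.exp(-x**2 / (2 * 0.02**2)); h /= h.sum()  # assumed Gaussian PSF

H = np.fft.fft(np.fft.ifftshift(h))              # transfer function
g = np.real(np.fft.ifft(np.fft.fft(phi) * H))    # blurred data
g_delta = g + 1e-6 * rng.standard_normal(N)      # tiny measurement noise

recon = np.real(np.fft.ifft(np.fft.fft(g_delta) / H))
print("min |H| =", np.abs(H).min())              # nearly 0 at high frequencies
print("reconstruction error:", np.max(np.abs(recon - phi)))
# Noise of size 1e-6 is amplified by 1/|H| wherever |H| is small, so the
# naive reconstruction is useless without regularization.
```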
Example 1.6. (Computerized tomography)
Computerized tomography is used in medical imaging and other applications and has been studied intensively for a long time (cf. [Hel80, Nat86, RK96]). In medical X-ray tomography one tries to determine the density $\varphi$ of a two-dimensional cross section of the human body by measuring the attenuation of X-rays. We assume that the support of $\varphi$ is contained in the disc of radius 1. Let $I(t) = I_{\theta,s}(t)$ denote the intensity of an X-ray traveling in direction $\theta$ along the line $t \mapsto s\theta^\perp + t\theta$ where $\theta \in S^1 := \{x \in \mathbb{R}^2 : |x| = 1\}$ and
$$\theta^\perp := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\theta$$
(cf. Fig. 1.1). Then $I$ satisfies the differential equation
$$I'(t) = -\varphi(s\theta^\perp + t\theta)\,I(t)$$
with the explicit solution
$$I(t) = I(-\infty)\,\exp\left(-\int_{-\infty}^t \varphi(s\theta^\perp + \tilde{t}\,\theta)\,d\tilde{t}\right).$$
If $\varphi$ has compact support, then $I(t)$ is constant for $|t|$ sufficiently large. The asymptotic values $I(\pm\infty)$ of $I(t)$ are the given data. We have
$$\ln\frac{I_{\theta,s}(-\infty)}{I_{\theta,s}(\infty)} = (R\varphi)(\theta, s)$$
where the Radon transform is defined by
$$(R\varphi)(\theta,s) := \int_{-\infty}^\infty \varphi(s\theta^\perp + t\theta)\,dt.$$
The direct problem consists in evaluating the Radon transform $(R\varphi)(\theta,s)$ for a given density distribution $\varphi \in L^2(B_1)$, $B_1 := \{x \in \mathbb{R}^2 : |x| \le 1\}$, at all $\theta \in S^1$ and $s \in \mathbb{R}$. The inverse problem is to find $\varphi \in L^2(B_1)$ given $g = R\varphi \in L^2(S^1 \times \mathbb{R})$.

In the special case that $\varphi$ is radially symmetric we only need rays from one direction since $(R\varphi)(\theta,s)$ is constant with respect to the direction $\theta$. If $\varphi(x) = \psi(|x|^2)$, we obtain
$$(R\varphi)(\theta,s) = \int_{-\sqrt{1-s^2}}^{\sqrt{1-s^2}}\varphi(s\theta^\perp + t\theta)\,dt = 2\int_0^{\sqrt{1-s^2}}\psi(t^2+s^2)\,dt = \int_\sigma^1\frac{\psi(\tau)}{\sqrt{\tau - \sigma}}\,d\tau
[Fig. 1.1: The Radon transform — an X-ray travels from the source to the detector along a line with offset s and direction θ.]
where we have used the substitutions $\tau = t^2 + s^2$ and $\sigma = s^2$. Hence our problem reduces to the Abel integral equation
$$(T_{CT}\psi)(\sigma) = g(\sigma), \qquad \sigma \in [0,1], \tag{1.15}$$
with the Volterra integral operator
$$(T_{CT}\psi)(\sigma) := \int_\sigma^1\frac{\psi(\tau)}{\sqrt{\tau-\sigma}}\,d\tau. \tag{1.16}$$
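The operator (1.16) is easy to discretize, which makes this example convenient for experiments. The sketch below (my own illustration; the midpoint grid and the test function $\psi \equiv 1$ are assumptions) assembles $T_{CT}$ as a matrix by integrating the weakly singular kernel exactly over each cell, and checks it against the exact value $\int_\sigma^1 (\tau-\sigma)^{-1/2}\,d\tau = 2\sqrt{1-\sigma}$.

```python
import numpy as np

# Discretization of the Abel operator (1.16):
# (T_CT psi)(sigma_i) ~ sum_j w_ij psi(tau_j) with piecewise-constant psi.
n = 200
tau = (np.arange(n) + 0.5) / n          # midpoints of the cells of [0,1]
sigma = tau.copy()
dt = 1.0 / n

# Integrate the kernel exactly over each cell to handle the singularity at
# tau = sigma:  int_a^b (t - s)^(-1/2) dt = 2*(sqrt(b-s) - sqrt(a-s)).
A = np.zeros((n, n))
for i, s in enumerate(sigma):
    a = np.maximum(tau - dt / 2, s)     # cell boundaries clipped at sigma_i
    b = tau + dt / 2
    A[i] = 2.0 * (np.sqrt(np.maximum(b - s, 0)) - np.sqrt(np.maximum(a - s, 0)))

psi = np.ones(n)                        # test function psi = 1
g = A @ psi                             # exact answer: 2*sqrt(1 - sigma)
print("max quadrature error:", np.max(np.abs(g - 2 * np.sqrt(1 - sigma))))
# A is upper triangular (Volterra structure) and ill-conditioned, reflecting
# the (mild) ill-posedness of the Abel equation.
```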
Example 1.7. (Electrical Impedance Tomography, EIT)
Let $D \subset \mathbb{R}^d$, $d = 2, 3$, describe an electrically conducting medium with spatially varying conductivity $\sigma(x)$. We denote the voltage by $u$ and assume that the electric field $E = -\mathrm{grad}\,u$ is stationary, i.e. $\partial_t E = 0$. By Ohm's law, the current density $j$ satisfies $j = \sigma E = -\sigma\,\mathrm{grad}\,u$. Applying the div operator to the Maxwell equation $\mathrm{rot}\,H = j + \partial_t(\epsilon E)$ yields
$$\mathrm{div}(\sigma\,\mathrm{grad}\,u) = 0 \quad \text{in } D, \tag{1.17a}$$
since div rot = 0. On the boundary a current distribution
$$\sigma\frac{\partial u}{\partial\nu} = I \quad \text{on } \partial D \tag{1.17b}$$
is prescribed. By Gauss' law (or the conservation of charge) it must satisfy
$$\int_{\partial D} I\,ds = 0. \tag{1.17c}$$
Since the voltage is only determined up to a constant, we can normalize $u$ by
$$\int_{\partial D} u\,ds = 0. \tag{1.17d}$$
Direct problem: Given $\sigma$, determine the voltage $u|_{\partial D}$ on the boundary for all current distributions $I$ satisfying (1.17c) by solving the elliptic boundary value problem (1.17a), (1.17b), (1.17d). In other words, determine the Neumann-to-Dirichlet map $\Lambda_\sigma : H^{-1/2}(\partial D) \to H^{1/2}(\partial D)$ defined by $\Lambda_\sigma I := u|_{\partial D}$.
Inverse problem: Given measurements of the voltage distribution $u|_{\partial D}$ for all current distributions $I$, i.e. given the Neumann-to-Dirichlet map $\Lambda_\sigma$, reconstruct $\sigma$.
This problem has been studied intensively due to numerous applications in medical imaging and nondestructive testing. Note that even though the underlying differential equation (1.17a) is linear, the mapping $\sigma \mapsto \Lambda_\sigma$ is nonlinear. Some important papers on theoretical aspects of this problem are [Ale88, Nac88, Nac95, SU87].

For $d = 1$ the Neumann-to-Dirichlet map is determined by one real number, which is certainly not enough information to reconstruct the function $\sigma(x)$. If we assume that interior measurements of $u$ are available, i.e. that $u(x)$ is known for say $x \in [0,1]$ in addition to $\sigma(0)u'(0)$, then $\sigma$ can be computed explicitly. Since $(\sigma u')' = 0$, we have $\sigma(x)u'(x) - \sigma(0)u'(0) = \int_0^x(\sigma u')'\,d\tilde{x} = 0$, and hence
$$\sigma(x) = \frac{\sigma(0)u'(0)}{u'(x)}$$
provided $u'(x)$ vanishes nowhere. This problem is (mildly) ill-posed as it involves differentiation of the given data $u$ (see Example 1.2). If $u'$ is small in some areas, this may cause additional instability.

We have introduced two problems where a coefficient in a partial differential equation is to be determined from (partial) knowledge of the solution of the equation. Such problems are referred to as parameter identification problems (cf. [BK89, Isa90, Isa98]).
Example 1.8. (Inverse obstacle scattering problems)
Another particularly important class of inverse problems are inverse scattering problems. Such problems arise in acoustics, electromagnetics, elasticity, and quantum theory. The aim is to identify properties of inaccessible objects by measuring waves scattered by them. For simplicity we only look at the acoustic case. Let $U(x,t) = \mathrm{Re}(u(x)e^{-i\omega t})$ describe the velocity potential of a time harmonic wave with frequency $\omega$ and space-dependent part $u$ propagating through a homogeneous medium. Then the wave equation
$$\frac{1}{c^2}\frac{\partial^2}{\partial t^2}U = \Delta U$$
reduces to the Helmholtz equation
$$\Delta u + k^2 u = 0 \quad \text{in } \mathbb{R}^d \setminus K. \tag{1.18a}$$
Here $k = \omega/c$ is the wave number, and $K$ describes an impenetrable, smooth, bounded obstacle. The boundary condition on $\partial K$ depends on the surface properties. For sound-soft obstacles we have the Dirichlet condition
$$u = 0 \quad \text{on } \partial K. \tag{1.18b}$$
We assume that the total field $u = u^i + u^s$ is the superposition of a known incident field $u^i(x) = e^{ik\langle x,d\rangle}$ with direction $d \in \{x \in \mathbb{R}^d : |x| = 1\}$ and a scattered field $u^s$. The scattered field satisfies the Sommerfeld radiation condition
$$\lim_{r\to\infty} r^{(d-1)/2}\left(\frac{\partial u^s}{\partial r} - iku^s\right) = 0, \qquad r = |x|, \text{ uniformly for all } \hat{x} = x/r, \tag{1.18c}$$
which guarantees that asymptotically energy is transported away from the origin. Moreover, this condition implies that the scattered field behaves asymptotically like an outgoing wave:
$$u^s(x) = \frac{e^{ikr}}{r^{(d-1)/2}}\left(u_\infty(\hat{x}) + O\!\left(\frac{1}{|x|}\right)\right), \qquad |x| \to \infty. \tag{1.19}$$
The function $u_\infty : S^{d-1} \to \mathbb{C}$ defined on the sphere $S^{d-1} := \{x \in \mathbb{R}^d : |x| = 1\}$ is called far field pattern or scattering amplitude of $u^s$.
Direct Problem: Given a smooth bounded obstacle $K$ and an incident wave $u^i$, find the far field pattern $u_\infty \in L^2(S^{d-1})$ of the scattered field satisfying (1.18).
Inverse Problem: Given the far field pattern $u_\infty \in L^2(S^{d-1})$ and the incident wave $u^i$, find the obstacle $K$ (e.g. a parametrization of its boundary $\partial K$).
For more information on inverse scattering problems we refer to the monographs [CK83, CK97b, Ram86] and the literature therein.
All the examples we have considered can be formulated as operator equations
$$F(\varphi) = g \tag{1.20}$$
for operators $F : X \to Y$ between normed spaces $X$ and $Y$. For the linear problems 1.2, 1.3, 1.4, 1.5, and 1.6, $F$ is given by a linear integral operator
$$(F(\varphi))(x) = (T\varphi)(x) = \int k(x,y)\,\varphi(y)\,dy$$
with some kernel $k$, i.e. these problems can be formulated as integral equations of the first kind. For operator equations of the form (1.20) Hadamard's criteria of well-posedness in Definition 1.1 are
1. $F(X) = Y$.
2. $F$ is one-to-one.
3. $F^{-1}$ is continuous.
The third criterion cannot be satisfied if $F$ is a compact operator (i.e. it maps bounded sets to relatively compact sets) and if $\dim X = \infty$. Otherwise the unit ball $B = F^{-1}(F(B))$ in $X$ would be relatively compact, which is false.
If $g$ in equation (1.20) does not describe measured data, but a desired state, and if $\varphi$ describes control parameters, one speaks of an optimal control problem. Such problems are related to inverse problems, but the point of view is different in some respects. First of all, one is mainly interested in convergence in the image space $Y$ rather than the pre-image space $X$. Often one does not even care about uniqueness of a solution to (1.20). Whereas in inverse problems it is assumed that $g$ (not $g^\delta$!) belongs to the range of $F$, this is not true for optimal control problems. If the desired state $g$ is not attainable, one tries to get as close as possible to $g$, i.e. (1.20) is replaced by the minimization problem
$$\|F(\varphi) - g\| = \min!, \qquad \varphi \in X.$$
2. Algorithms for the solution of linear inverse problems

In this chapter we introduce methods for the stable solution of linear ill-posed operator equations
$$T\varphi = g. \tag{2.1}$$
Here $T : X \to Y$ is a bounded linear injective operator between Hilbert spaces $X$ and $Y$, and $g \in R(T)$. The algorithms we are going to discuss have to take care of the fact that the solution of (2.1) does not depend continuously on the data, i.e. that $\|T^{-1}\| = \infty$, and that the data may be perturbed by noise. We assume that only noisy data $g^\delta$ are at our disposal and that
$$\|g^\delta - g\| \le \delta. \tag{2.2}$$
Tikhonov regularization
The problem to solve (2.1) with noisy data $g^\delta$ may be equivalently reformulated as finding the minimum of the functional $\varphi \mapsto \|T\varphi - g^\delta\|^2$ in $X$. Of course, the solution to this minimization problem again does not depend continuously on the data. One possibility to restore stability is to add a penalty term to the functional involving the distance of $\varphi$ to some initial guess $\varphi_0$:
$$J_\alpha(\varphi) := \|T\varphi - g^\delta\|^2 + \alpha\|\varphi - \varphi_0\|^2.$$
The parameter $\alpha > 0$ is called regularization parameter. If no initial guess is known, we take $\varphi_0 = 0$.

Theorem 2.1. The Tikhonov functional $J_\alpha$ has a unique minimum $\varphi_\alpha$ in $X$ for all $\alpha > 0$, $g^\delta \in Y$, and $\varphi_0 \in X$. This minimum is given by
$$\varphi_\alpha = (T^*T + \alpha I)^{-1}(T^*g^\delta + \alpha\varphi_0). \tag{2.3}$$
The operator $T^*T + \alpha I$ is boundedly invertible, so $\varphi_\alpha$ depends continuously on $g^\delta$.
To prove this theorem we need some preparations.

Definition 2.2. Let $X, Y$ be normed spaces, and let $U$ be an open subset of $X$. A mapping $F : U \to Y$ is called Fréchet differentiable at $\varphi \in U$ if there exists a bounded linear operator $F'[\varphi] : X \to Y$ such that
$$\lim_{h\to 0}\frac{1}{\|h\|}\left(F(\varphi + h) - F(\varphi) - F'[\varphi]h\right) = 0. \tag{2.4}$$
$F'[\varphi]$ is called the Fréchet derivative of $F$ at $\varphi$. $F$ is called Fréchet differentiable if it is Fréchet differentiable at every point $\varphi \in U$.

Lemma 2.3. Let $U$ be an open subset of a normed space $X$. If $F : U \to \mathbb{R}$ is Fréchet differentiable at $\varphi$ and if $\varphi \in U$ is a local minimum of $F$, then $F'[\varphi] = 0$.

Proof. Assume on the contrary that $F'[\varphi]h \ne 0$ for some $h$. After possibly changing the sign of $h$ we may assume that $F'[\varphi]h < 0$. Then
$$\lim_{t\searrow 0}\frac{F(\varphi + th) - F(\varphi)}{t} = F'[\varphi]h < 0.$$
This contradicts the assumption that $\varphi$ is a local minimum of $F$.

Lemma 2.4. The Tikhonov functional is Fréchet differentiable for every $\alpha \ge 0$, and the Fréchet derivative is given by
$$J_\alpha'[\varphi]h = 2\,\mathrm{Re}\left\langle T^*(T\varphi - g^\delta) + \alpha(\varphi - \varphi_0),\, h\right\rangle.$$

Proof. The assertion follows from the identity
$$J_\alpha(\varphi + h) - J_\alpha(\varphi) - J_\alpha'[\varphi]h = \|Th\|^2 + \alpha\|h\|^2.$$

Proof of Theorem 2.1. Assume that $\varphi_\alpha$ minimizes the Tikhonov functional $J_\alpha$. Then $J_\alpha'[\varphi_\alpha]h = 0$ for all $h \in X$ by Lemmas 2.3 and 2.4. The choice $h = T^*(T\varphi_\alpha - g^\delta) + \alpha(\varphi_\alpha - \varphi_0)$ implies that
$$(T^*T + \alpha I)\varphi_\alpha = T^*g^\delta + \alpha\varphi_0.$$
The bounded invertibility of $T^*T + \alpha I$ follows from the Lax-Milgram lemma and the inequality
$$\mathrm{Re}\left\langle(T^*T + \alpha I)\varphi, \varphi\right\rangle = \|T\varphi\|^2 + \alpha\|\varphi\|^2 \ge \alpha\|\varphi\|^2.$$
To show that $\varphi_\alpha$ defined by (2.3) minimizes $J_\alpha$, note that for all $h \in X \setminus \{0\}$ the function $\psi(t) := J_\alpha(\varphi_\alpha + th)$ is a polynomial of degree 2 with $\psi'' > 0$ and $\psi'(0) = 0$. Hence, $\psi(t) \ge \psi(0)$ for all $t \in \mathbb{R}$ with equality only for $t = 0$.
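In finite dimensions formula (2.3) can be formed directly. The following sketch (my own illustration; the test matrix, an assumed discretization of the integration operator from Example 1.2, and the noise level are not from the notes) shows the dependence of the reconstruction error on α.

```python
import numpy as np

# Tikhonov regularization (2.3) with phi_0 = 0 for a discretized version of
# the integration operator T_D from Example 1.2 (assumed test problem).
rng = np.random.default_rng(2)
n = 200
t = (np.arange(n) + 0.5) / n
T = np.tril(np.ones((n, n))) / n        # (T phi)(x_i) ~ integral of phi up to x_i
phi = np.sin(2 * np.pi * t)             # exact solution
g = T @ phi
delta = 1e-3
g_delta = g + delta * rng.standard_normal(n) / np.sqrt(n)

for alpha in [1e-1, 1e-4, 1e-7, 1e-10]:
    # phi_alpha = (T*T + alpha I)^{-1} T* g_delta, cf. (2.3)
    phi_alpha = np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ g_delta)
    err = np.linalg.norm(phi_alpha - phi) / np.sqrt(n)
    print(f"alpha = {alpha:.0e}   error = {err:.4f}")
# Too large alpha over-smooths (approximation error dominates); too small
# alpha amplifies the data noise. Some intermediate alpha is best.
```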
Landweber iteration
Another idea is to minimize the functional $J_0(\varphi) = \|T\varphi - g^\delta\|^2$ by the steepest descent method. According to Lemma 2.4 the direction of steepest descent is $h = -T^*(T\varphi - g^\delta)$. This leads to the recursion formula
$$\varphi_0 = 0, \tag{2.5a}$$
$$\varphi_{n+1} = \varphi_n - \mu T^*(T\varphi_n - g^\delta), \qquad n \ge 0, \tag{2.5b}$$
known as Landweber iteration. We will see later that the step size parameter $\mu$ has to be chosen such that $\mu\|T^*T\| \le 1$. It can be shown by induction that the $n$th Landweber iterate is given by
$$\varphi_n = \mu\sum_{j=0}^{n-1}(I - \mu T^*T)^j\,T^*g^\delta. \tag{2.6}$$
In fact, the equation is obviously correct for $n = 0$, and if (2.6) holds true for $n$, then
$$\varphi_{n+1} = (I - \mu T^*T)\varphi_n + \mu T^*g^\delta = \mu\sum_{j=0}^n(I - \mu T^*T)^j\,T^*g^\delta.$$
If some initial guess $\varphi_0$ to the solution is known, the iteration should be started at $\varphi_0$. This case can be reduced to (2.5) by introducing the data $\tilde{g}^\delta = g^\delta - T\varphi_0$. The Landweber iterates $\tilde\varphi_n$ corresponding to these data are related to $\varphi_n$ by $\tilde\varphi_n = \varphi_n - \varphi_0$, $n \ge 0$.

By possibly rescaling the norm in $Y$, we may also assume for theoretical purposes that
$$\|T\| \le 1. \tag{2.7}$$
Then we may set $\mu = 1$, and the formulas above become a bit simpler. In practice it is necessary to estimate the norm $\|T^*T\|$ since the speed of convergence depends sensitively on the value $\mu\|T^*T\|$, which should not be much smaller than 1. This can be done by a few steps of the power method $\varphi_{n+1} := T^*T\varphi_n / \|T^*T\varphi_n\|$. Usually after about 5 steps with a random starting vector $\varphi_0$ the norm $\|T^*T\varphi_n\|$ is sufficiently close to $\|T^*T\|$.
Discrepancy principle
A crucial problem concerning Tikhonov regularization as well as other regularization methods is the choice of the regularization parameter. For Landweber iteration the number of iterations plays the role of the regularization parameter. For Tikhonov regularization with $\varphi_0 = 0$ the total error $\varphi_\alpha - \varphi$ can be decomposed as follows:
$$\varphi_\alpha - \varphi = \left((\alpha I + T^*T)^{-1}T^*T - I\right)\varphi + (T^*T + \alpha I)^{-1}T^*(g^\delta - g) = -\alpha(\alpha I + T^*T)^{-1}\varphi + (T^*T + \alpha I)^{-1}T^*(g^\delta - g).$$
The first term on the right hand side of this identity is called the approximation error, and the second term is called the data noise error. Formally (at least if $T$ is a non-zero multiple of the identity operator) the approximation error tends to 0 as $\alpha \to 0$. We will show later that this is true for arbitrary $T$ and $\varphi \in N(T)^\perp$. On the other hand, the operator $(T^*T + \alpha I)^{-1}T^*$ formally tends to the unbounded operator $T^{-1}$, so we expect that the propagated data noise error explodes as $\alpha \to 0$. This situation is illustrated in Figure 2.1. We have a trade-off between accuracy and stability: If $\alpha$ is too large, we get a poor approximation of the exact solution even for exact data, and if $\alpha$ is too small, the reconstruction becomes unstable. The optimal value of $\alpha$ depends both on the data noise level and the exact solution $\varphi$.
[Fig. 2.1: Dependence of error terms on the regularization parameter: approximation error, propagated data noise error, and total error.]
There exist a variety of strategies for choosing the regularization parameter. The most well-known is Morozov's discrepancy principle. It implies that one should not try to satisfy the operator equation more accurately than the data noise error. More precisely, it consists in taking the largest regularization parameter $\alpha = \alpha(\delta, g^\delta)$ such that the residual $\|T\varphi_\alpha - g^\delta\|$ is less than or equal to $\tau\delta$, where $\tau \ge 1$ is some fixed parameter:
$$\alpha(\delta, g^\delta) := \sup\left\{\alpha > 0 : \|T\varphi_\alpha - g^\delta\| \le \tau\delta\right\}. \tag{2.8}$$
We will prove later that for Tikhonov regularization and most other regularization methods the function $\alpha \mapsto \|T\varphi_\alpha - g^\delta\|$ is monotonically increasing. Usually it is sufficient to find $\alpha$ such that
$$\tau_1\delta \le \|T\varphi_\alpha - g^\delta\| \le \tau_2\delta$$
for given constants $1 \le \tau_1 < \tau_2$. This can be done by a simple bisection algorithm. Faster convergence can be achieved by Newton's method applied to $1/\alpha$ as unknown.

For iterative methods such as Landweber iteration, the discrepancy principle consists in stopping the iteration at the first index $N$ for which $\|T\varphi_N - g^\delta\| \le \tau\delta$.
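A bisection sketch of the rule $\tau_1\delta \le \|T\varphi_\alpha - g^\delta\| \le \tau_2\delta$ for Tikhonov regularization (my own illustration; bisection is done in $\log\alpha$, relying on the monotonicity of the residual stated above, and the bracket $[10^{-14}, 10^{2}]$ is an assumption):

```python
import numpy as np

def tikhonov(T, g_delta, alpha):
    n = T.shape[1]
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ g_delta)

def discrepancy_alpha(T, g_delta, delta, tau1=1.0, tau2=2.0,
                      lo=1e-14, hi=1e2, max_iter=60):
    """Bisection in log(alpha) for tau1*delta <= ||T phi_a - g_d|| <= tau2*delta."""
    residual = lambda a: np.linalg.norm(T @ tikhonov(T, g_delta, a) - g_delta)
    mid = np.sqrt(lo * hi)
    for _ in range(max_iter):
        mid = np.sqrt(lo * hi)            # geometric midpoint
        r = residual(mid)
        if r < tau1 * delta:
            lo = mid                      # residual too small -> larger alpha
        elif r > tau2 * delta:
            hi = mid                      # residual too large -> smaller alpha
        else:
            return mid
    return mid                            # fall back to the last midpoint

# Usage with the assumed test problem from the Tikhonov sketch:
# alpha = discrepancy_alpha(T, g_delta, delta)
```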
Other implicit methods
Once we have computed the Tikhonov solution $\varphi_\alpha$ defined by (2.3) we may find a better approximation by applying Tikhonov regularization again using $\varphi_\alpha$ as initial guess $\varphi_0$. This leads to iterated Tikhonov regularization:
$$\varphi_{\alpha,0} := 0, \tag{2.9a}$$
$$\varphi_{\alpha,n+1} := (T^*T + \alpha I)^{-1}(T^*g^\delta + \alpha\varphi_{\alpha,n}), \qquad n \ge 0. \tag{2.9b}$$
Note that only one operator $T^*T + \alpha I$ has to be inverted to compute $\varphi_{\alpha,n}$ for any $n \in \mathbb{N}$. If we use, e.g., the LU factorization to apply $(T^*T + \alpha I)^{-1}$, the computation of $\varphi_{\alpha,n}$ for $n \ge 2$ is not much more expensive than the computation of $\varphi_{\alpha,1}$.

The following expression for $\varphi_{\alpha,n}$ is easily derived by induction:
$$\varphi_{\alpha,n} = (\alpha I + T^*T)^{-n}\,(T^*T)^{-1}\left((\alpha I + T^*T)^n - \alpha^n I\right)T^*g^\delta. \tag{2.10}$$
Whereas in iterated Tikhonov regularization $n$ is fixed and $\alpha$ is determined by some parameter choice rule (e.g. the discrepancy principle), we may also hold $\alpha > 0$ fixed and interpret (2.9) as an iterative method. In this case one speaks of Lardy's method.
In his original paper on integral equations of the first kind Tikhonov [Tik63] suggested to include a derivative operator $L$ in the penalty term in order to damp out high frequencies, i.e. to replace $J_\alpha$ by the functional
$$\|T\varphi - g^\delta\|^2 + \alpha\|L(\varphi - \varphi_0)\|^2. \tag{2.11}$$
Since a minimum $\varphi_\alpha$ of this functional must belong to the domain of the differential operator, and since $\|L\varphi_\alpha\|$ must not be too large, the incorporation of $L$ has a smoothing effect. In many cases this leads to a considerable improvement of the results. If $N(L) = \{0\}$, the situation can usually be reduced to the situation considered before by introducing a Hilbert space $X_L := \mathcal{D}(L) \subset X$ with the norm $\|\varphi\|_{X_L} := \|L\varphi\|_X$¹ (cf. [Kre89, Section 16.5]). Typically, $X_L$ is a Sobolev space. If $0 < \dim N(L) < \infty$ and $\|T\varphi\|^2 + \|L\varphi\|^2 \ge \gamma\|\varphi\|^2$ for all $\varphi \in \mathcal{D}(L)$ and some constant $\gamma > 0$, then the regularization (2.11) still works, but technical complications occur (cf. [EHN96, Chapter 8]).

Other penalty terms have been investigated for special situations. E.g. if jumps of the unknown solution have to be detected, the smoothing effect of (2.11) is most undesirable. Much better results are obtained with the bounded variation norm, which is defined by $\|\varphi\|_{BV} = \|\varphi\|_{L^1} + \|\mathrm{grad}\,\varphi\|_{L^1}$ for smooth functions $\varphi$. In general, if $\varphi \in L^1(\Omega)$, $\Omega \subset \mathbb{R}^d$, is not smooth, the weak formulation
$$\|\varphi\|_{BV} := \|\varphi\|_{L^1} + \sup\left\{\int_\Omega\varphi\,\mathrm{div}\,f\,dx : f \in C_0^1(\Omega),\ |f| \le 1\right\} \tag{2.12}$$
has to be used. For more information on functions of bounded variation we refer to [Giu84]. The non-differentiability of the bounded variation norm causes difficulties both in the convergence analysis and the numerical implementation of this method (cf. [CK97a, NS98, VO96]).

Maximum entropy regularization is defined by
$$\|T\varphi - g^\delta\|^2 + \alpha\int_a^b\varphi(x)\log\frac{\varphi(x)}{\varphi_0(x)}\,dx \tag{2.13}$$
where $\varphi_0 > 0$ is some initial guess. It is assumed that $\varphi_0$ and $\varphi$ are nonnegative functions satisfying $\int\varphi\,dx = \int\varphi_0\,dx = 1$ (cf. [EHN96, SG85]). In other words, $\varphi$ and $\varphi_0$ can be interpreted as probability densities. Note that the penalty term is nonlinear here. For a convergence analysis of the maximum entropy method we refer to [EL93].

¹It can be shown by results from spectral theory of unbounded operators that $X_L$ is complete if $L$ is self-adjoint and there exists a constant $\gamma > 0$ such that $\|L\varphi\| \ge \gamma\|\varphi\|$ for all $\varphi \in \mathcal{D}(L)$.
Other explicit methods
Implicit methods are characterized by the fact that some operator involving $T$ has to be inverted. For many problems the application of the operator $T$ to a vector $\varphi \in X$ can be implemented without the need to set up a matrix corresponding to $T$. For example, the application of convolution operators can be implemented by FFT using $O(N\log N)$ flops where $N$ is the number of unknowns. This is much faster than a matrix-vector multiplication with $O(N^2)$ flops. Other examples include the application of inverses of elliptic differential operators by multigrid methods and the application of boundary integral operators by multipole or panel-clustering methods. For these problems explicit schemes such as Landweber iteration are very attractive since they do not require setting up a matrix for $T$. To apply such schemes we only need routines implementing the application of the operators $T$ and $T^*$ to given vectors.

Note that the $n$th Landweber iterate belongs to the Krylov subspace defined by
$$\mathcal{K}_n(T^*T,\, T^*g^\delta) := \mathrm{span}\left\{(T^*T)^j\,T^*g^\delta : j = 0, \ldots, n-1\right\}.$$
Since the computation of any element of $\mathcal{K}_n(T^*T, T^*g^\delta)$ requires only (at most) $n$ applications of $T^*T$, one may try to look for better approximations in the Krylov subspace $\mathcal{K}_n(T^*T, T^*g^\delta)$.

The conjugate gradient method applied to the normal equation $T^*T\varphi = T^*g^\delta$ is characterized by the optimality condition
$$\|T\varphi_n^\delta - g^\delta\| = \min_{\varphi\in\mathcal{K}_n(T^*T,\,T^*g^\delta)}\|T\varphi - g^\delta\|.$$
The algorithm is defined as follows:
$$\varphi_0 = 0; \qquad d_0 = g^\delta; \qquad p_1 = s_0 = T^*d_0 \tag{2.14a}$$
for $k = 1, 2, \ldots$, unless $s_{k-1} = 0$:
$$q_k = Tp_k \tag{2.14b}$$
$$\alpha_k = \|s_{k-1}\|^2/\|q_k\|^2 \tag{2.14c}$$
$$\varphi_k = \varphi_{k-1} + \alpha_k p_k \tag{2.14d}$$
$$d_k = d_{k-1} - \alpha_k q_k \tag{2.14e}$$
$$s_k = T^*d_k \tag{2.14f}$$
$$\beta_k = \|s_k\|^2/\|s_{k-1}\|^2 \tag{2.14g}$$
$$p_{k+1} = s_k + \beta_k p_k \tag{2.14h}$$
Note that $\varphi_n^\delta$ depends nonlinearly on $g^\delta$. Moreover, it can be shown that the function $g^\delta \mapsto \varphi_n^\delta$ is discontinuous. Both of these facts make the analysis of the conjugate gradient method more difficult than the analysis of other methods.
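The recursion (2.14) translates directly into code. The following sketch (my own transcription of (2.14) for real data; the discrepancy-based stopping rule is an assumption consistent with the principle described above) again needs only the actions of $T$ and $T^*$:

```python
import numpy as np

def cgne(T_mv, Tt_mv, g_delta, tau_delta, max_iter=200):
    """Conjugate gradients (2.14) for the normal equation T*T phi = T* g.
    Stops when ||T phi_k - g_delta|| <= tau_delta (discrepancy principle)."""
    phi = np.zeros_like(Tt_mv(g_delta))      # (2.14a)
    d = g_delta.copy()                       # d_k = g_delta - T phi_k
    s = Tt_mv(d)
    p = s.copy()
    for k in range(1, max_iter + 1):
        if np.linalg.norm(s) == 0 or np.linalg.norm(d) <= tau_delta:
            break
        q = T_mv(p)                          # (2.14b)
        a = np.dot(s, s) / np.dot(q, q)      # (2.14c)
        phi = phi + a * p                    # (2.14d)
        d = d - a * q                        # (2.14e)
        s_new = Tt_mv(d)                     # (2.14f)
        b = np.dot(s_new, s_new) / np.dot(s, s)  # (2.14g)
        p = s_new + b * p                    # (2.14h)
        s = s_new
    return phi

# Usage with the assumed test problem from the earlier sketches:
# phi_N = cgne(lambda v: T @ v, lambda v: T.T @ v, g_delta, tau_delta=2*delta)
```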
Quasi solutions
A different method for the solution of general (linear and nonlinear) ill-posed problems is based on the following topological result:

Theorem 2.5. Let $K$ be a compact topological space, and let $Y$ be a topological space satisfying the Hausdorff separation axiom. Moreover, let $f : K \to Y$ be continuous and injective. Then $f^{-1} : f(K) \to K$ is continuous with respect to the topology of $f(K)$ induced by $Y$.

Proof. We first show that $f$ maps closed sets to closed sets. Let $A \subset K$ be closed. Since $K$ is compact, $A$ is compact as well. Due to the continuity of $f$, $f(A)$ is compact. Since $Y$ satisfies the Hausdorff axiom, $f(A)$ is closed.

Now let $U \subset K$ be open. Then $f(K \setminus U)$ is closed by our previous argument. It follows that $f(U)$ is open in $f(K)$ since $f(U) \cup f(K \setminus U) = f(K)$ and $f(U) \cap f(K \setminus U) = \emptyset$ due to the injectivity of $f$. Hence, $f^{-1}$ is continuous.

By Theorem 2.5 the restriction of any injective, continuous operator to a compact set is boundedly invertible on its range.

Definition 2.6. Let $F : X \to Y$ be a continuous (not necessarily linear) operator and let $K \subset X$. $\varphi_K \in K$ is called a quasi-solution of $F(\varphi) = g^\delta$ with constraint $K$ if
$$\|F(\varphi_K) - g^\delta\| = \inf\left\{\|F(\varphi) - g^\delta\| : \varphi \in K\right\}. \tag{2.15}$$
Obviously, a quasi-solution exists if and only if there exists a best approximation $Q_{F(K)}g^\delta$ to $g^\delta$ in $F(K)$. If $F(K)$ is convex, then $Q_{F(K)}g^\delta$ exists for all $g^\delta \in Y$, and $Q_{F(K)}$ is continuous. If, moreover, $K \subset X$ is compact, then $\varphi_K = (F|_K)^{-1}Q_{F(K)}g^\delta$ depends continuously on $g^\delta$ by Theorem 2.5. Note that $F(K)$ is convex if $F$ is linear and $K$ is convex.

Since the embedding operators of function spaces with higher regularity into function spaces with lower regularity are typically compact, convex and compact subsets $K$ of $X$ are given by imposing a bound on the corresponding stronger norm. E.g., if $X = L^2([0,1])$, then the sets $\{\varphi \in C^1([0,1]) : \|\varphi\|_{C^1} \le R\}$ and $\{\varphi \in H_0^1([0,1]) : \|\varphi'\|_{L^2} \le R\}$ are compact in $X$ for all $R > 0$. Note that in the latter case (2.11) with $L = d/dx$ can be interpreted as a penalty method for solving the constrained optimization problem (2.15) with $F = T$. In the former case the constrained optimization problem (2.15) is more difficult to solve.
Regularization by discretization
We have seen in Example 1.2 that inverse problems can be regularized by discretization. In fact, the restriction of $T$ to any finite dimensional subspace $X_n \subset X$ yields a well-posed problem since linear operators on finite dimensional spaces are always bounded. However, the condition number of the finite dimensional problem may be very large unless $X_n$ is chosen properly. Whereas for differentiation and some other important problems it is well understood how to choose finite dimensional subspaces such that the condition number can be controlled, appropriate finite dimensional subspaces are not always known a-priori, and the numerical computation of such spaces is usually too expensive.

In regularization by discretization the size of the finite dimensional subspace acts as a regularization parameter. Therefore, asymptotically the solution becomes less reliable as the discretization gets finer. We refer to [Nat77], [EHN96, Section 3.3], and [Kre89, Chapter 17] as general references.

Of course, if other regularization methods such as Tikhonov regularization are used, the regularized problems have to be discretized for a numerical implementation, and the effects of this discretization have to be investigated ([Neu88, PV90, Hoh00]). This is referred to as regularization with discretization as opposed to regularization by discretization discussed before. In the former case the choice of the discretization scheme is not as critical, and for any reasonable method the quality of the solution does not deteriorate as the discretization gets finer. In this lecture we will neither consider regularization by nor regularization with discretization.
3. Regularization methods

Let $X$ and $Y$ be Hilbert spaces, and let $L(X,Y)$ denote the space of linear bounded operators from $X$ to $Y$. We define $L(X) := L(X,X)$. For $T \in L(X,Y)$, the null-space and the range of $T$ are denoted by $N(T) := \{\varphi \in X : T\varphi = 0\}$ and $R(T) := T(X)$.

Orthogonal projections
Theorem 3.1. Let $U$ be a closed linear subspace of $X$. Then for each $\varphi \in X$ there exists a unique vector $\tilde\varphi \in U$ satisfying
$$\|\tilde\varphi - \varphi\| = \inf_{u\in U}\|u - \varphi\|. \tag{3.1}$$
$\tilde\varphi$ is called the best approximation to $\varphi$ in $U$. $\tilde\varphi$ is the unique element of $U$ satisfying
$$\langle\varphi - \tilde\varphi, u\rangle = 0 \quad \text{for all } u \in U. \tag{3.2}$$

Proof. We abbreviate the right hand side of (3.1) by $d$ and choose a sequence $(u_n) \subset U$ such that
$$\|\varphi - u_n\|^2 \le d^2 + \frac{1}{n}, \qquad n \in \mathbb{N}. \tag{3.3}$$
Then, by the parallelogram law,
$$\|(\varphi - u_n) + (\varphi - u_m)\|^2 + \|u_n - u_m\|^2 = 2\|\varphi - u_n\|^2 + 2\|\varphi - u_m\|^2 \le 4d^2 + \frac{2}{n} + \frac{2}{m}$$
for all $n, m \in \mathbb{N}$. Since $\frac{1}{2}(u_n + u_m) \in U$ it follows that
$$\|u_n - u_m\|^2 \le 4d^2 + \frac{2}{n} + \frac{2}{m} - 4\left\|\varphi - \frac{1}{2}(u_n + u_m)\right\|^2 \le \frac{2}{n} + \frac{2}{m}.$$
This shows that $(u_n)$ is a Cauchy sequence. Since $U$ is complete, it has a unique limit $\tilde\varphi \in U$. Passing to the limit $n \to \infty$ in (3.3) shows that $\tilde\varphi$ is a best approximation to $\varphi$.

From the equality
$$\|(\varphi - \tilde\varphi) + tu\|^2 = \|\varphi - \tilde\varphi\|^2 + 2t\,\mathrm{Re}\langle\varphi - \tilde\varphi, u\rangle + t^2\|u\|^2, \tag{3.4}$$
which holds for all $u \in U$ and all $t \in \mathbb{R}$, it follows that $2|\mathrm{Re}\langle\varphi - \tilde\varphi, u\rangle| \le t\|u\|^2$ for all $t > 0$. This implies (3.2). Going back to (3.4) shows that $\tilde\varphi$ is the only best approximation to $\varphi$ in $U$.
Remark 3.2. An inspection of the proof shows that Theorem 3.1 remains valid if $U$ is a closed convex subset of $X$. In this case (3.2) has to be replaced by
$$\mathrm{Re}\langle\varphi - \tilde\varphi,\, u - \tilde\varphi\rangle \le 0 \quad \text{for all } u \in U. \tag{3.5}$$
Theorem 3.3. Let $U \ne \{0\}$ be a closed linear subspace of $X$ and let $P : X \to U$ denote the orthogonal projection onto $U$, which maps a vector $\varphi \in X$ to its best approximation in $U$. Then $P$ is a linear operator with $\|P\| = 1$ satisfying
$$P^2 = P \quad \text{and} \quad P^* = P. \tag{3.6}$$
$I - P$ is the orthogonal projection onto the closed subspace $U^\perp := \{v \in X : \langle v, u\rangle = 0 \text{ for all } u \in U\}$. Moreover,
$$X = U \oplus U^\perp \quad \text{and} \quad U^{\perp\perp} = U. \tag{3.7}$$

Proof. Linearity of $P$ follows from the characterization (3.2). Since $P\varphi = \varphi$ for $\varphi \in U$, we have $P^2 = P$ and $\|P\| \ge 1$. (3.2) with $u = P\varphi$ implies $\|\varphi\|^2 = \|P\varphi\|^2 + \|(I-P)\varphi\|^2$. Hence, $\|P\| \le 1$. Since
$$\langle P\varphi, \psi\rangle = \langle P\varphi,\, P\psi + (I-P)\psi\rangle = \langle P\varphi, P\psi\rangle$$
and analogously $\langle\varphi, P\psi\rangle = \langle P\varphi, P\psi\rangle$, the operator $P$ is self-adjoint.

To show that $U^\perp$ is closed, let $(\varphi_n)$ be a convergent sequence in $U^\perp$ with limit $\varphi \in X$. Then $\langle\varphi, u\rangle = \lim_{n\to\infty}\langle\varphi_n, u\rangle = 0$ for all $u \in U$, so $\varphi \in U^\perp$. It follows from (3.2) that $(I-P)\varphi \in U^\perp$ for all $\varphi \in X$. Moreover, $\langle\varphi - (I-P)\varphi, v\rangle = \langle P\varphi, v\rangle = 0$ for all $v \in U^\perp$. By Theorem 3.1 this implies that $(I-P)\varphi$ is the best approximation to $\varphi$ in $U^\perp$.

It follows immediately from the definition of $U^\perp$ that $U \cap U^\perp = \{0\}$. Moreover, $U + U^\perp = X$ since $\varphi = P\varphi + (I-P)\varphi$ for all $\varphi \in X$ with $P\varphi \in U$ and $(I-P)\varphi \in U^\perp$. Finally, $U^{\perp\perp} = R(I - (I-P)) = R(P) = U$.
Theorem 3.4. If $T \in L(X,Y)$ then
$$N(T) = R(T^*)^\perp \quad \text{and} \quad \overline{R(T)} = N(T^*)^\perp. \tag{3.8}$$

Proof. If $\varphi \in N(T)$, then $\langle\varphi, T^*\psi\rangle = \langle T\varphi, \psi\rangle = 0$ for all $\psi \in Y$, so $\varphi \in R(T^*)^\perp$. Hence, $N(T) \subset R(T^*)^\perp$. If $\varphi \in R(T^*)^\perp$, then $0 = \langle\varphi, T^*\psi\rangle = \langle T\varphi, \psi\rangle$ for all $\psi \in Y$. Hence $T\varphi = 0$, i.e. $\varphi \in N(T)$. This shows that $R(T^*)^\perp \subset N(T)$ and completes the proof of the first equality in (3.8). Interchanging the roles of $T$ and $T^*$ gives $N(T^*) = R(T)^\perp$. Hence, $N(T^*)^\perp = R(T)^{\perp\perp} = \overline{R(T)}$. The last equality follows from (3.7) since $R(T)^\perp = \overline{R(T)}^\perp$.
The Moore-Penrose generalized inverse
We consider an operator equation
$$T\varphi = g \tag{3.9}$$
with an operator $T \in L(X,Y)$. Since we also want to consider optimal control problems, we neither assume that $T$ is injective nor that $g \in R(T)$. First we have to define what is meant by a solution of (3.9) in this case. This leads to a generalization of the notion of the inverse of $T$.

Definition 3.5. $\varphi$ is called a least-squares solution of (3.9) if
$$\|T\varphi - g\| = \inf\{\|T\psi - g\| : \psi \in X\}. \tag{3.10}$$
$\varphi \in X$ is called a best-approximate solution of (3.9) if $\varphi$ is a least-squares solution of (3.9) and if
$$\|\varphi\| = \inf\{\|\psi\| : \psi \text{ is a least-squares solution of } T\psi = g\}. \tag{3.11}$$

Theorem 3.6. Let $Q : Y \to \overline{R(T)}$ denote the orthogonal projection onto $\overline{R(T)}$. Then the following three statements are equivalent:
$$\varphi \text{ is a least-squares solution to } T\varphi = g. \tag{3.12}$$
$$T\varphi = Qg. \tag{3.13}$$
$$T^*T\varphi = T^*g. \tag{3.14}$$

Proof. Since $\langle T\varphi - Qg,\,(I-Q)g\rangle = 0$ by Theorem 3.3, we have
$$\|T\varphi - g\|^2 = \|T\varphi - Qg\|^2 + \|Qg - g\|^2.$$
This shows that (3.13) implies (3.12). Vice versa, if $\varphi_0$ is a least-squares solution, the last equation shows that $\varphi_0$ is a minimum of the functional $\varphi \mapsto \|T\varphi - Qg\|$. Since $\inf_{\varphi\in X}\|T\varphi - Qg\| = 0$ by the definition of $Q$, $\varphi_0$ must satisfy (3.13).

Since $N(T^*) = R(T)^\perp = R(I-Q)$ by Theorems 3.3 and 3.4, the identity $T^*(I-Q) = 0$ holds true. Hence, (3.13) implies (3.14). Vice versa, assume that (3.14) holds true. Then $T\varphi - g \in N(T^*) = R(T)^\perp$. Hence, $0 = Q(T\varphi - g) = T\varphi - Qg$.

Equation (3.14) is called the normal equation of (3.9). Note that a least-squares solution may not exist since the infimum in (3.10) may not be attained. It follows from Theorem 3.6 that a least-squares solution of (3.9) exists if and only if $g \in R(T) + R(T)^\perp$.

Let $P : X \to N(T)$ denote the orthogonal projection onto the null-space $N(T)$ of $T$. If $\varphi_0$ is a least-squares solution to (3.9) then the set of all least-squares solutions to (3.9) is given by $\{\varphi_0 + u : u \in N(T)\}$. Since
$$\|\varphi_0 + u\|^2 = \|(I-P)(\varphi_0 + u)\|^2 + \|P(\varphi_0 + u)\|^2 = \|(I-P)\varphi_0\|^2 + \|P\varphi_0 + u\|^2,$$
a best-approximate solution of (3.9) is given by $(I-P)\varphi_0$. Moreover, a best-approximate solution of (3.9) is unique if it exists: Namely, each best-approximate solution must be contained in $N(T)^\perp$, since otherwise the element $\varphi - P\varphi$ (that is obviously a least-squares solution) would have smaller norm by the identity $\|\varphi - P\varphi\|^2 = \|\varphi\|^2 - \|P\varphi\|^2 < \|\varphi\|^2$, in contradiction to the minimality of $\|\varphi\|$ among all least-squares solutions. So for two best-approximate solutions $\varphi, \psi$ we have on one hand $\varphi - \psi \in N(T)^\perp$. On the other hand, (3.13) yields $\varphi - \psi \in N(T)$. Altogether we have $\varphi - \psi = 0$.

After these preparations we introduce the Moore-Penrose inverse as follows:
Definition 3.7. The Moore-Penrose (generalized) inverse $T^\dagger : \mathcal{D}(T^\dagger) \to X$ of $T$, defined on $\mathcal{D}(T^\dagger) := R(T) + R(T)^\perp$, maps $g \in \mathcal{D}(T^\dagger)$ to the best-approximate solution of (3.9).

Obviously, $T^\dagger = T^{-1}$ if $R(T)^\perp = \{0\}$ and $N(T) = \{0\}$. Note that $T^\dagger g$ is not defined for all $g \in Y$ if $R(T)$ is not closed.

Let $\tilde{T} : N(T)^\perp \to R(T)$, $\tilde{T}\varphi := T\varphi$, denote the restriction of $T$ to $N(T)^\perp$. Since $N(\tilde{T}) = N(T) \cap N(T)^\perp = \{0\}$ and $R(\tilde{T}) = R(T)$, the operator $\tilde{T}$ is invertible. By the remarks above, the Moore-Penrose inverse can be written as
$$T^\dagger g = \tilde{T}^{-1}Qg \quad \text{for all } g \in \mathcal{D}(T^\dagger). \tag{3.15}$$
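In finite dimensions $T^\dagger$ is the usual matrix pseudoinverse. A quick numerical check (my own illustration with an assumed rank-deficient test matrix) confirms that numpy's pinv returns the minimum-norm least-squares solution characterized above.

```python
import numpy as np

# The Moore-Penrose inverse in finite dimensions: T^dagger g is the
# least-squares solution of minimal norm.
T = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])       # rank 1: N(T) and R(T)^perp are nontrivial
g = np.array([1.0, 0.0, 3.0])    # g does not lie in R(T)

phi_dagger = np.linalg.pinv(T) @ g
print("best-approximate solution:", phi_dagger)

# Check the normal equation (3.14): T*T phi = T* g
print("normal equation residual:",
      np.linalg.norm(T.T @ T @ phi_dagger - T.T @ g))

# Any least-squares solution differs from phi_dagger by an element of N(T);
# adding u = (2, -1) preserves the residual but increases the norm.
u = np.array([2.0, -1.0])        # T u = 0
print("same residual:", np.isclose(np.linalg.norm(T @ (phi_dagger + u) - g),
                                   np.linalg.norm(T @ phi_dagger - g)))
print("larger norm:", np.linalg.norm(phi_dagger + u) > np.linalg.norm(phi_dagger))
```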
Definition and properties of regularization methods
Definition 3.8. Let $R_\alpha : Y \to X$ be a family of continuous (not necessarily linear) operators defined for $\alpha$ in some index set $A$, and let $\alpha : (0,\infty) \times Y \to A$ be a parameter choice rule. For given noisy data $g^\delta \in Y$ and noise level $\delta > 0$ such that $\|g^\delta - g\| \le \delta$ the exact solution $T^\dagger g$ is approximated by $R_{\alpha(\delta,g^\delta)}g^\delta$. The pair $(R, \alpha)$ is called a regularization method for (3.9) if
$$\lim_{\delta\to 0}\sup\left\{\left\|R_{\alpha(\delta,g^\delta)}g^\delta - T^\dagger g\right\| : g^\delta \in Y,\ \|g^\delta - g\| \le \delta\right\} = 0 \tag{3.16}$$
for all $g \in \mathcal{D}(T^\dagger)$. $\alpha$ is called an a-priori parameter choice rule if $\alpha(\delta, g^\delta)$ depends only on $\delta$. Otherwise $\alpha$ is called an a-posteriori parameter choice rule.

In most cases the operators $R_\alpha$ are linear. E.g., for Tikhonov regularization we have $A = (0,\infty)$ and $R_\alpha = (\alpha I + T^*T)^{-1}T^*$. The discrepancy principle is an example of an a-posteriori parameter choice rule since the regularization parameter depends on quantities arising in the computations, which in turn depend on the observed data $g^\delta$. An example of an a-priori parameter choice rule for Tikhonov regularization is $\alpha(\delta, g^\delta) = \delta$. For iterative methods the number of iterations plays the role of the regularization parameter, i.e. we have $A = \mathbb{N}$. We will show later that Tikhonov regularization, Landweber iteration and other methods introduced in the previous chapter together with appropriate parameter choice rules are regularization methods in the sense of Definition 3.8.

The number
$$\sup\left\{\left\|R_{\alpha(\delta,g^\delta)}g^\delta - T^\dagger g\right\| : g^\delta \in Y,\ \|g^\delta - g\| \le \delta\right\}$$
in (3.16) is the worst case error of the regularization method $(R, \alpha)$ for the exact data $g \in \mathcal{D}(T^\dagger)$ and the noise level $\delta$. We require that the worst case error tends to 0 as the noise level tends to 0.

If regularization with discretization is considered, the regularization parameter is of the form $\alpha = (\tilde\alpha, h)$, where $\tilde\alpha$ is the regularization parameter of the infinite dimensional method and $h$ is a discretization parameter.

Note that an arbitrary reconstruction method $T^\dagger g \approx S(\delta, g^\delta)$ with a (not necessarily continuous) mapping $S : (0,\infty) \times Y \to X$ can be written in the form of Definition 3.8 by setting $A = X$ and defining $R_\varphi g := \varphi$ for all $g \in Y$. This is important concerning the negative results in Theorems 3.9 and 3.11 below.
Since it is sometimes hard or impossible to obtain information on the size of the noise level in practice, parameter choice rules have been devised which only require knowledge of the measured data $g^\delta$, but not of the noise level (cf. [EHN96, Section 4.5] and the references therein). Although such so-called error-free parameter choice rules give good results in many cases, the following result by Bakushinskii shows that for an ill-posed problem one cannot get convergence in the sense of Definition 3.8.

Theorem 3.9. Assume there exists a regularization method $(R_\alpha, \alpha)$ for (3.9) with a parameter choice rule $\alpha(\delta, g^\delta)$ which depends only on $g^\delta$, but not on $\delta$. Then $T^\dagger$ is continuous.

Proof. Choosing $g^\delta = g$ in (3.16) shows that $R_{\alpha(g)}g = T^\dagger g$ for all $g \in \mathcal{D}(T^\dagger)$. Let $(g_n)_{n\in\mathbb{N}}$ be a sequence in $\mathcal{D}(T^\dagger)$ which converges to $g \in \mathcal{D}(T^\dagger)$ as $n \to \infty$. Then $R_{\alpha(g_n)}g_n = T^\dagger g_n$ for all $n \in \mathbb{N}$. Using (3.16) again with $g^\delta = g_n$ gives $0 = \lim_{n\to\infty}\|R_{\alpha(g_n)}g_n - T^\dagger g\| = \lim_{n\to\infty}\|T^\dagger g_n - T^\dagger g\|$. This proves the assertion.
Whereas we are considering the error as a deterministic quantity, it may be more appropriate in some applications to consider it as a probabilistic quantity. The standard method to determine the regularization parameter in this setting is generalized cross validation (cf. [Wah90]). Usually it is assumed in this setting that the image space $Y$ is finite dimensional and that some information on the distribution of the error is known.

Let us consider regularization methods $(R_\alpha, \alpha)$ which satisfy the following assumption: $R_\alpha : Y \to X$, $\alpha \in A \subset (0,\infty)$, is a family of linear operators and
$$\lim_{\delta\to 0}\sup\left\{\alpha(\delta, g^\delta) : g^\delta \in Y,\ \|g^\delta - g\| \le \delta\right\} = 0. \tag{3.17}$$
It follows from (3.16) and (3.17) with $g^\delta = g$ that $R_\alpha$ converges pointwise to $T^\dagger$:
$$\lim_{\alpha\to 0}R_\alpha g = T^\dagger g \quad \text{for all } g \in \mathcal{D}(T^\dagger). \tag{3.18}$$
Theorem 3.10. Assume that (3.17) holds true and $T^\dagger$ is unbounded. Then the operators $R_\alpha$ cannot be uniformly bounded with respect to $\alpha$, and the operators $R_\alpha T$ cannot be norm convergent as $\alpha \to 0$.

Proof. For the first statement, assume on the contrary that $\|R_\alpha\| \le C$ for all $\alpha \in A$. Then (3.18) implies $\|T^\dagger\| \le C$, which contradicts our assumption that $\|T^\dagger\| = \infty$.

For the second statement, assume that we have norm convergence. Then there exists $\alpha \in A$ such that $\|R_\alpha T - I\| \le 1/2$. It follows that
$$\|T^\dagger g\| \le \|T^\dagger g - R_\alpha TT^\dagger g\| + \|R_\alpha Qg\| \le \frac{1}{2}\|T^\dagger g\| + \|R_\alpha\|\,\|g\|$$
for all $g \in \mathcal{D}(T^\dagger)$, which implies $\|T^\dagger g\| \le 2\|R_\alpha\|\,\|g\|$. Again, this contradicts our assumption.

We have required that (3.16) holds true for all $g \in \mathcal{D}(T^\dagger)$. Our next result shows that it is not possible to get a uniform convergence rate in (3.16) for all $g \in \mathcal{D}(T^\dagger)$.
Theorem 3.11. Assume that there exist a regularization method $(R_\alpha, \alpha)$ for (3.9) and a continuous function $f : [0,\infty) \to [0,\infty)$ with $f(0) = 0$ such that
$$\sup\left\{\left\|R_{\alpha(\delta,g^\delta)}g^\delta - T^\dagger g\right\| : g^\delta \in Y,\ \|g^\delta - g\| \le \delta\right\} \le f(\delta) \tag{3.19}$$
for all $g \in \mathcal{D}(T^\dagger)$ with $\|g\| \le 1$ and all $\delta > 0$. Then $T^\dagger$ is continuous.

Proof. Let $(g_n)_{n\in\mathbb{N}}$ be a sequence in $\mathcal{D}(T^\dagger)$ converging to $g \in \mathcal{D}(T^\dagger)$ as $n \to \infty$ such that $\|g_n\| \le 1$ for all $n$. With $\delta_n := \|g_n - g\|$ we obtain
$$\left\|T^\dagger g_n - T^\dagger g\right\| \le \left\|T^\dagger g_n - R_{\alpha(\delta_n,g_n)}g_n\right\| + \left\|R_{\alpha(\delta_n,g_n)}g_n - T^\dagger g\right\|.$$
The first term on the right hand side of this inequality can be estimated by taking both the exact and the noisy data equal to $g_n$ in (3.19), and the second term by taking $g$ as exact data and $g^\delta = g_n$. We obtain
$$\left\|T^\dagger g_n - T^\dagger g\right\| \le 2f(\delta_n).$$
Since $f$ is continuous and $f(0) = 0$, this implies that $T^\dagger g_n \to T^\dagger g$ as $n \to \infty$. Therefore, $T^\dagger$ is continuous at all points $g$ with $\|g\| \le 1$. This implies that $T^\dagger$ is continuous everywhere.

Theorem 3.11 shows that for any regularization method for an ill-posed problem convergence can be arbitrarily slow. However, we have seen in Example 1.2 for the central difference quotient that convergence rates $f(\delta) = C\delta^{1/2}$ or $f(\delta) = C\delta^{2/3}$ can be shown if smoothness properties of the exact solution $\varphi = T^\dagger g$ are known a-priori. Later we will get to know generalized smoothness conditions which yield convergence rates in the general setting considered here.
4. Spectral theory

Compact operators in Hilbert spaces
Lemma 4.1. Let $A \in L(X)$ be self-adjoint and assume that $X \ne \{0\}$. Then
$$\|A\| = \sup_{\|\varphi\|=1}|\langle A\varphi, \varphi\rangle|. \tag{4.1}$$

Proof. Let $a$ denote the right hand side of equation (4.1). It follows from the Cauchy-Schwarz inequality that $a \le \|A\|$. To show that $\|A\| \le a$ first note that
$$\|A\| = \sup_{\|\varphi\|=1}\|A\varphi\| \le \sup_{\|\varphi\|=\|\psi\|=1}|\langle A\varphi, \psi\rangle|$$
since we may choose $\psi = A\varphi/\|A\varphi\|$. Let $\varphi, \psi \in X$ with $\|\varphi\| = \|\psi\| = 1$. We choose $\lambda \in \mathbb{C}$ with $|\lambda| = 1$ such that $|\langle A\varphi, \psi\rangle| = \lambda\langle A\varphi, \psi\rangle$ and set $\tilde\psi := \bar\lambda\psi$. Using the polarization identity and the fact that $\langle A\chi, \chi\rangle \in \mathbb{R}$ for all $\chi \in X$ we obtain
$$|\langle A\varphi, \psi\rangle| = \frac{1}{4}\sum_{k=0}^3 i^k\left\langle A(\varphi + i^k\tilde\psi),\, \varphi + i^k\tilde\psi\right\rangle = \frac{1}{4}\left\langle A(\varphi + \tilde\psi),\, \varphi + \tilde\psi\right\rangle - \frac{1}{4}\left\langle A(\varphi - \tilde\psi),\, \varphi - \tilde\psi\right\rangle \le \frac{a}{4}\left(\|\varphi + \tilde\psi\|^2 + \|\varphi - \tilde\psi\|^2\right) = \frac{a}{4}\left(2\|\varphi\|^2 + 2\|\tilde\psi\|^2\right) = a.$$
This shows that $\|A\| \le a$.
Lemma 4.2. Let $A \in L(X)$ be compact and self-adjoint and assume that $X \ne \{0\}$. Then there exists an eigenvalue $\lambda$ of $A$ such that $\|A\| = |\lambda|$.

Proof. By virtue of Lemma 4.1 there exists a sequence $(\varphi_n)$ with $\|\varphi_n\| = 1$ for all $n$ such that $\langle A\varphi_n, \varphi_n\rangle \to \lambda$, $n \to \infty$, where $\lambda \in \{-\|A\|, \|A\|\}$. To prove that $\lambda$ is an eigenvalue, we note that
$$0 \le \|A\varphi_n - \lambda\varphi_n\|^2 = \|A\varphi_n\|^2 - 2\lambda\langle A\varphi_n, \varphi_n\rangle + \lambda^2\|\varphi_n\|^2 \le \|A\|^2 - 2\lambda\langle A\varphi_n, \varphi_n\rangle + \lambda^2 = 2\lambda\left(\lambda - \langle A\varphi_n, \varphi_n\rangle\right) \to 0, \qquad n \to \infty,$$
so
$$A\varphi_n - \lambda\varphi_n \to 0, \qquad n \to \infty. \tag{4.2}$$
Since $A$ is compact, there exists a subsequence $(\varphi_{n(k)})$ such that $A\varphi_{n(k)} \to \psi$, $k \to \infty$, for some $\psi \in X$. We may assume that $A \ne 0$ since for $A = 0$ the statement of the theorem is trivial. It follows from (4.2) that $\lambda\varphi_{n(k)} \to \psi$, $k \to \infty$. Therefore, $\varphi_{n(k)}$ converges to $\varphi := \psi/\lambda$ as $k \to \infty$, and $A\varphi = \lambda\varphi$.
Theorem 4.3. (Spectral theorem for compact self-adjoint operators) Let $A \in L(X)$ be compact and self-adjoint. Then there exists a complete orthonormal system $E = \{\varphi_j : j \in I\}$ of $X$ consisting of eigenvectors of $A$. Here $I$ is some index set, and $A\varphi_j = \lambda_j\varphi_j$ for $j \in I$. The set $J = \{j \in I : \lambda_j \ne 0\}$ is countable, and
$$A\varphi = \sum_{j\in J}\lambda_j\langle\varphi, \varphi_j\rangle\varphi_j \tag{4.3}$$
for all $\varphi \in X$. Moreover, for any $\epsilon > 0$ the set $J_\epsilon := \{j \in I : |\lambda_j| \ge \epsilon\}$ is finite.

Proof. By Zorn's lemma there exists a maximal orthonormal set $E$ of eigenfunctions of $A$. Let $U$ denote the closed linear span of $E$. Obviously, $A(U) \subset U$. Moreover, $A(U^\perp) \subset U^\perp$ since $\langle Au, \varphi\rangle = \langle u, A\varphi\rangle = 0$ for all $u \in U^\perp$ and all $\varphi \in U$. As $U^\perp$ is closed, $A|_{U^\perp}$ is compact, and of course $A|_{U^\perp}$ is self-adjoint. Hence, if $U^\perp \ne \{0\}$, there exists an eigenvector $\varphi \in U^\perp$ of $A$ due to Lemma 4.2. Since this contradicts the maximality of $E$, we conclude that $U^\perp = \{0\}$. Therefore $U = X$, i.e. the orthonormal system $E$ is complete.

To show (4.3) we apply $A$ to the Fourier representation
$$\varphi = \sum_{j\in I}\langle\varphi, \varphi_j\rangle\varphi_j \tag{4.4}$$
with respect to the Hilbert basis $E$.¹

Assume that $J_\epsilon$ is infinite for some $\epsilon > 0$. Then there exists a sequence $(\varphi_n)_{n\in\mathbb{N}}$ of orthonormal eigenvectors such that $|\lambda_n| \ge \epsilon$ for all $n \in \mathbb{N}$. Since $A$ is compact, there exists a subsequence $(\varphi_{n(k)})$ of $(\varphi_n)$ such that $(A\varphi_{n(k)}) = (\lambda_{n(k)}\varphi_{n(k)})$ is a Cauchy sequence. This is a contradiction since $\|\lambda_n\varphi_n - \lambda_m\varphi_m\|^2 = \lambda_n^2 + \lambda_m^2 \ge 2\epsilon^2$ for $m \ne n$ due to the orthonormality of the vectors $\varphi_n$.

¹Recall that only a countable number of terms in (4.4) can be non-zero even if $E$ is not countable. By Bessel's inequality, we have $\sum_{\varphi_j\in E}|\langle\varphi, \varphi_j\rangle|^2 \le \|\varphi\|^2 < \infty$ where $\sum_{\varphi_j\in E}|\langle\varphi, \varphi_j\rangle|^2 := \sup\left\{\sum_{\varphi_j\in G}|\langle\varphi, \varphi_j\rangle|^2 : G \subset E,\ \#G < \infty\right\}$. Therefore, for each $n \in \mathbb{N}$ the set $S_n := \left\{\varphi_j \in E : |\langle\varphi, \varphi_j\rangle| \in \left(\tfrac{1}{n+1}, \tfrac{1}{n}\right]\right\}$ is finite. Hence, $S = \bigcup_{n\in\mathbb{N}}S_n = \{\varphi_j \in E : |\langle\varphi, \varphi_j\rangle| > 0\}$ is countable.
Theorem and Definition 4.4. (Singular value decomposition) Let $T \in L(X,Y)$ be compact, and let $P \in L(X)$ denote the orthogonal projection onto $N(T)$. Then there exist singular values $\sigma_0 \ge \sigma_1 \ge \ldots > 0$ and orthonormal systems $\varphi_0, \varphi_1, \ldots \in X$ and $g_0, g_1, \ldots \in Y$ such that
$$\varphi = \sum_{n=0}^\infty\langle\varphi, \varphi_n\rangle\varphi_n + P\varphi \tag{4.5}$$
and
$$T\varphi = \sum_{n=0}^\infty\sigma_n\langle\varphi, \varphi_n\rangle g_n \tag{4.6}$$
for all $\varphi \in X$. A system $(\sigma_n, \varphi_n, g_n)$ with these properties is called a singular system of $T$. If $\dim R(T) < \infty$, the series in (4.5) and (4.6) degenerate to finite sums. The singular values $\sigma_n = \sigma_n(T)$ are uniquely determined by $T$ and satisfy
$$\sigma_n(T) \to 0 \quad \text{as } n \to \infty. \tag{4.7}$$
If $\dim R(T) < \infty$ and $n \ge \dim R(T)$, we set $\sigma_n(T) := 0$.
Proof. We use the notation of Theorem 4.3 with $A = T^*T$. Then $\lambda_n = \lambda_n\langle\varphi_n, \varphi_n\rangle = \|T\varphi_n\|^2 > 0$ for all $n$. Set
$$\sigma_n := \sqrt{\lambda_n} \quad \text{and} \quad g_n := T\varphi_n/\|T\varphi_n\|. \tag{4.8}$$
The vectors $g_n$ are orthogonal since $\langle T\varphi_n, T\varphi_m\rangle = \langle T^*T\varphi_n, \varphi_m\rangle = \lambda_n\langle\varphi_n, \varphi_m\rangle = 0$ for $n \ne m$. (4.5) follows from (4.4) since $P\varphi = \sum_{j\in I\setminus J}\langle\varphi, \varphi_j\rangle\varphi_j$. Applying $T$ to both sides of (4.5) gives (4.6) by the continuity of the scalar product. Now let $(\sigma_n, \varphi_n, g_n)$ be any singular system of $T$. It follows from (4.6) with $\varphi = \varphi_n$ that $T\varphi_n = \sigma_n g_n$ for all $n$. Moreover, since $T^*g_n \in N(T)^\perp$ and $\langle T^*g_n, \varphi_m\rangle = \langle g_n, T\varphi_m\rangle = \sigma_m\langle g_n, g_m\rangle = \sigma_n\delta_{n,m}$ for all $n, m$, we have
$$T^*g_n = \sigma_n\varphi_n. \tag{4.9}$$
Hence, $T^*T\varphi_n = \sigma_n^2\varphi_n$. This shows that the singular values $\sigma_n(T)$ are uniquely determined as positive square roots of the eigenvalues of $A = T^*T$. (4.7) follows from the fact that 0 is the only possible accumulation point of the eigenvalues of $A$.
Example 4.5. For the operator $T_{BH}$ defined in Example 1.3 a singular system is given by
$$\sigma_n(T_{BH}) = \exp(-\pi^2(n+1)^2 T), \qquad \varphi_n(x) = g_n(x) = \sqrt{2}\sin((n+1)\pi x), \quad x \in [0,1],$$
for $n \in \mathbb{N}_0$.
Theorem 4.6. The singular values of a compact operator $T \in L(X,Y)$ satisfy
$$\sigma_n(T) = \inf\left\{\|T - F\| : F \in L(X,Y),\ \dim R(F) \le n\right\}, \qquad n \in \mathbb{N}_0. \tag{4.10}$$

Proof. Let $\tilde\sigma_n(T)$ denote the right hand side of (4.10). To show that $\tilde\sigma_n(T) \le \sigma_n(T)$, we choose $F\varphi := \sum_{j=0}^{n-1}\sigma_j\langle\varphi, \varphi_j\rangle g_j$, where $(\sigma_n, \varphi_n, g_n)$ is a singular system of $T$. Then by Parseval's equality
$$\|(T - F)\varphi\|^2 = \sum_{j=n}^\infty\sigma_j^2\,|\langle\varphi, \varphi_j\rangle|^2 \le \sigma_n^2\|\varphi\|^2.$$
This implies $\tilde\sigma_n(T) \le \sigma_n(T)$. To prove the reverse inequality, let $F \in L(X,Y)$ with $\dim R(F) \le n$ be given. Then the restriction of $F$ to $\mathrm{span}\{\varphi_0, \ldots, \varphi_n\}$ has a non-trivial null-space. Hence, there exists $\varphi = \sum_{j=0}^n\beta_j\varphi_j$ with $\|\varphi\| = 1$ and $F\varphi = 0$. It follows that
$$\|T - F\|^2 \ge \|(T - F)\varphi\|^2 = \|T\varphi\|^2 = \left\|\sum_{j=0}^n\sigma_j\beta_j g_j\right\|^2 = \sum_{j=0}^n\sigma_j^2\,|\beta_j|^2 \ge \sigma_n^2.$$
Hence, $\tilde\sigma_n(T) \ge \sigma_n(T)$.
Theorem 4.7. (Picard) Let $T \in L(X,Y)$ be compact, and let $(\sigma_n, \varphi_n, g_n)$ be a singular system of $T$. Then the equation
$$T\varphi = g \tag{4.11}$$
is solvable if and only if $g \in N(T^*)^\perp$ and if the Picard criterion
$$\sum_{n=0}^\infty\frac{1}{\sigma_n^2}\,|\langle g, g_n\rangle|^2 < \infty \tag{4.12}$$
is satisfied. Then the solution is given by
$$\varphi = \sum_{n=0}^\infty\frac{1}{\sigma_n}\langle g, g_n\rangle\varphi_n. \tag{4.13}$$

Proof. Assume that $g \in N(T^*)^\perp$ and that (4.12) holds true. Then $g = \sum_{n=0}^\infty\langle g, g_n\rangle g_n$, and the series in (4.13) converges. Applying $T$ to both sides of (4.13) gives (4.11). Vice versa, assume that (4.11) holds true. Then $g \in N(T^*)^\perp$ by Theorem 3.4. Since $\langle g, g_n\rangle = \langle T\varphi, g_n\rangle = \langle\varphi, T^*g_n\rangle = \sigma_n\langle\varphi, \varphi_n\rangle$ by (4.9), it follows that
$$\sum_{n=0}^\infty\frac{1}{\sigma_n^2}\,|\langle g, g_n\rangle|^2 = \sum_{n=0}^\infty|\langle\varphi, \varphi_n\rangle|^2 \le \|\varphi\|^2 < \infty.$$
The solution formula (4.13) nicely illustrates the ill-posedness of linear operator equations with a compact operator: Since $1/\sigma_n \to \infty$ by (4.7), the large Fourier modes are amplified without bound. Typically, as in Example 4.5, the large Fourier modes correspond to high frequencies. Obviously, the faster the decay of the singular values, the more severe is the ill-posedness of the problem.

We say that (4.11) is mildly ill-posed if the singular values decay to 0 at a polynomial rate, i.e. if there exist constants $C, p > 0$ such that $\sigma_n \ge Cn^{-p}$ for all $n \in \mathbb{N}$. Otherwise (4.11) is called severely ill-posed. If there exist constants $C, p > 0$ such that $\sigma_n \le C\exp(-n^p)$, we call the problem (4.11) exponentially ill-posed.

One possibility to restore stability in (4.13) is to truncate the series, i.e. to compute
$$R_\alpha g := \sum_{n:\,\sigma_n\ge\alpha}\frac{1}{\sigma_n}\langle g, g_n\rangle\varphi_n \tag{4.14}$$
for some regularization parameter $\alpha > 0$. This is called truncated singular value decomposition. A modified version is
$$R_\alpha g := \sum_{n:\,\sigma_n\ge\alpha}\frac{1}{\sigma_n}\langle g, g_n\rangle\varphi_n + \sum_{n:\,\sigma_n<\alpha}\frac{1}{\alpha}\langle g, g_n\rangle\varphi_n. \tag{4.15}$$
This method is usually only efficient if a singular system of the operator is known explicitly since a numerical computation of the singular values and vectors is too expensive for large problems.
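In finite dimensions (4.14) can be written down directly with an SVD. A minimal sketch (my own illustration; the cutoff rule σ_n ≥ α mirrors (4.14), and the test matrix, a discrete integration operator as in Example 1.2, is an assumption):

```python
import numpy as np

def tsvd_solve(T, g_delta, alpha):
    """Truncated SVD (4.14): invert only the singular values >= alpha."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    coeff = U.T @ g_delta                 # coefficients <g, g_n>
    keep = s >= alpha
    return Vt.T[:, keep] @ (coeff[keep] / s[keep])

# Assumed test problem: discrete integration operator with noisy data.
n = 100
T = np.tril(np.ones((n, n))) / n
rng = np.random.default_rng(3)
t = (np.arange(n) + 0.5) / n
phi = np.sin(2 * np.pi * t)
g_delta = T @ phi + 1e-4 * rng.standard_normal(n) / np.sqrt(n)

for alpha in [1e-1, 1e-3, 1e-6]:
    err = np.linalg.norm(tsvd_solve(T, g_delta, alpha) - phi) / np.sqrt(n)
    print(f"alpha = {alpha:.0e}   error = {err:.4f}")
# Small alpha keeps too many tiny singular values and amplifies the noise;
# large alpha truncates too much information about the solution.
```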
The spectral theorem for bounded self-adjoint operators
Our next aim is to find a generalization of Theorem 4.3 for bounded self-adjoint operators. This theorem may be reformulated as follows: We define the operator $W : l^2(I) \to X$ by $W(f) := \sum_{j\in I}f(j)\varphi_j$. Here $l^2(I)$ is the Hilbert space of all functions $f : I \to \mathbb{C}$ with the norm $\|f\|^2 := \sum_{j\in I}|f(j)|^2$. By Parseval's equality, $W$ is a unitary operator. Its inverse is given by $(W^{-1}\varphi)(j) = \langle\varphi, \varphi_j\rangle$ for $j \in I$. Eq. (4.3) is equivalent to
$$W^{-1}AW = M_\lambda$$
where $M_\lambda \in L(l^2(I))$ is defined by $(M_\lambda f)(j) = \lambda_j f(j)$, $j \in I$. In other words, there exists a unitary map $W$ such that $A$ is transformed to the multiplication operator $M_\lambda$ on $l^2(I)$.

An important class of operators, which occurred in Examples 1.4 and 1.5, are convolution operators. Let $k \in L^1(\mathbb{R}^d)$ satisfy $k(-x) = \overline{k(x)}$ for $x \in \mathbb{R}^d$. Then
$$(A\varphi)(x) := \int_{\mathbb{R}^d}k(x - y)\,\varphi(y)\,dy$$
defines a self-adjoint operator $A \in L(X)$. Recall that the Fourier transform
$$(\mathcal{F}\varphi)(\xi) := (2\pi)^{-d/2}\int_{\mathbb{R}^d}e^{-i\langle\xi,x\rangle}\varphi(x)\,dx, \qquad \xi \in \mathbb{R}^d,$$
is unitary on $L^2(\mathbb{R}^d)$ and that the inverse operator is given by
$$(\mathcal{F}^{-1}f)(x) := (2\pi)^{-d/2}\int_{\mathbb{R}^d}e^{i\langle\xi,x\rangle}f(\xi)\,d\xi, \qquad x \in \mathbb{R}^d.$$
Due to the assumed symmetry of $k$, the function $\mathcal{F}k$ is real-valued, and it is bounded since $k \in L^1(\mathbb{R}^d)$. Introducing the function $\lambda := (2\pi)^{d/2}\mathcal{F}k$ and the multiplication operator $M_\lambda \in L(L^2(\mathbb{R}^d))$, $(M_\lambda f)(\xi) := \lambda(\xi)f(\xi)$, the Convolution Theorem implies that
$$\mathcal{F}A\mathcal{F}^{-1} = M_\lambda.$$
Thus we have again found a unitary map which transforms $A$ to a multiplication operator. The following theorem shows that this is always possible for a bounded self-adjoint operator.
Theorem 4.8. (spectral theorem for bounded self-adjoint operators) Let A
L(X) be self-adjoint. Then there exist a locally compact space , a positive Borel measure
on , a unitary map
W : L
2
(, d) X, (4.16)
and a real-valued function C() such that
W
1
AW = M

, (4.17)
where M

L(L
2
(, d)) is the multiplication operator dened by (M

f)() := ()f()
for f L
2
(, d) and .
Note that we have already proved this theorem for all the linear examples in Chapter
(1) by the remarks above. Our proof follows the presentation in [Tay96, Section 8.1] and
makes use of the following special case of the representation theorem of Riesz, the proof of
which can be found e.g. in [Bau90, MV92]. For other versions of the spectral theorem we
refer to [HS71, Rud73, Wei76].
Theorem 4.9. (Riesz) Let be a locally compact space, and let C
0
() denote the space
of continuous, compactly supported functions on . Let L : C
0
() R be a positive linear
functional, i.e. L(f) 0 for all f 0. Then there exists a positive Borel measure on
such that for all f C
0
()
L(f) =
_
f d.
Lemma 4.10. Let A L(X) be self-adjoint. Then the initial value problem
d
dt
U(t) = iAU(t), U(0) = I (4.18)
has a unique solution U C
1
(R, L(X)) given by
U(t) =

n=0
1
n!
(itA)
n
. (4.19)
U(t) : t R is a group of unitary operators with the multiplication
U(s +t) = U(s)U(t). (4.20)
Proof. The series (4.19) together with its term by term derivate converges uniformly on
bounded intervals t [a, b] with respect to the operator norm since

n=0
1/n! |itA|
n
=
exp(t|A|) < and

n=1
n/n! |iA| |itA|
n1
= |A| exp(t|A|) < . Hence, t U(t)
is dierentiable, and
U

(t) = iA

n=1
n
n!
(itA)
n1
= iAU(t).
4. Spectral theory 34
If

U C
1
(R, L(X)) is another solution to (4.18), then Z = U

U satises Z

(t) = iAZ(t)
and Z(0) = 0. Since A is self-adjoint
d
dt
|Z(t)|
2
= 2 Re Z

(t), Z(t)) = 2 Re iAZ(t), Z(t)) = 0


for all X. Hence Z(t) = 0 and U(t) =

U(t) for all t.
To establish (4.20) we observe that Z(s) := U(s+t)U(s)U(t) satises Z

(s) = iAZ(s)
and Z(0) = 0. It follows from the argument above that Z(s) = 0 for all s. By the denition
(4.19), U(t)

= U(t) for all t R. Now (4.20) implies that U(t)

U(t) = U(t)U(t)

= I,
i.e. U(t) is unitary.
For X we call
X

:= spanU(t) : t R (4.21)
the cyclic subspace generated by . We say that X is a cyclic vector of X if X

= X.
Lemma 4.11. If U(t) is a unitary group on a Hilbert space, then X is an orthogonal direct
sum of cyclic subspaces.
Proof. An easy application of Zorns lemma shows that there exists a maximal set
j
:
j I of vectors X such that the cyclic subspaces X

j
are pairwise orthogonal. Let
V :=

jI
X

j
. Obviously, U(t)V V for all t R. Assume that there exists a vector
V

with ,= 0. Since for all t R and all V


U(t), ) = , U(t)

) = , U(t)) = 0,
it follows that X

V

. This contradicts the maximality of the set
j
: j I. It
follows that V

= 0 and X = V .
Lemma 4.12. If U(t) : t R is a continuous unitary group of operators on a Hilbert
space X, having a cyclic vector , then there exist a positive Borel measure on R and a
unitary map W : L
2
(R, d) X such that
W
1
U(t)Wf() = e
it
f(), R (4.22)
for all f L
2
(R, d) and t R.
Proof. We consider the function
(t) := (2)
1/2
U(t), ) , t R
and dene the linear functional
L(f) :=
_
R

f(t)(t) dt
for f C
2
0
(R), where

f = Tf denotes the Fourier transform of f. We aim to show that
L can be extended to a positive linear functional on C
0
(R) to obtain a positive functional
4. Spectral theory 35
representing L by Theorem 4.9. If T L
2
(R) exists, then L(f) =
_
fT dt, so
d = T d . However, since [(t)[ is only bounded, but does not necessarily decay to 0
as [t[ , T may not be dened. If f C
2
0
(R) we have

f(t) = (2)
1/2
_

e
it
f() d = (2)
1/2
(it)
2
_

e
it
f

() d
for t ,= 0 by partial integration, so [(Tf)(t)[ =
O
([t[
2
) as [t[ . Therefore, Lf is well
dened for f C
2
0
(R). Moreover,
W(f) := (2)
1/2
_

(Tf)(t)U(t)dt
is well dened, and
|W(f)|
2
= (2)
1
__

f(s)U(s)ds,
_

f(t)U(t)dt
_
= (2)
1
_ _

f(s)

f(t) U(s t), ) dt ds
= (2)
1/2
_ _

f(s)

f(t s) ds (t) dt (4.23)
=
_
T(ff)(t) (t) dt
= L([f[
2
).
This shows that L([f[
2
) 0 for f C
2
0
(R).
Our next aim is to show that for each interval [a, b] R, L has a unique extension to a
bounded, positive linear functional on C
0
([a, b]). For this end we choose a smooth cut-o
function C
2
0
(R) satisfying 0 1 and () = 1 for [a, b]. Let g C
2
0
([a, b])
with |g|

< 1. Then
_

2
g C
2
0
(R). Hence, L(
2
g) 0 by (4.23). Analogously,
L(
2
+ g) 0. Both inequalities and the linearity of L imply that [L(g)[ L(
2
).
Therefore, L is continuous on C
2
([a, b]) with respect to the maximum norm. To show that
L(g) 0 for g C
2
0
(R) with g 0, we introduce f

:= g/

g + for all > 0. Note that


f

C
2
0
(R) and that
|g f
2

| =
_
_
_
_
g
2
+g g
2
g +
_
_
_
_

.
Hence, L(g) = lim
0
L(f
2

) 0. As C
2
0
([a, b]) is dense in C
0
([a, b]), L has a unique
extension to a bounded, positive linear functional on C
0
([a, b]).
Now Theorem 4.9 implies that L(f) =
_
f d, f C
0
(R) for some positive Borel
measure on R. Eq. (4.23) shows that W can be extended to an isometry W : L
2
(R, )
X. Assume that R(W)

. Then 0 = , W(f)) = (2)


1/2
_

f(t) , U(t)) dt for all
f C
2
0
(R). Hence , U(t)) = 0 for all t R. Since is assumed to be cyclic, = 0.
Therefore, R(W)

= 0, i.e. W is unitary.
4. Spectral theory 36
For s R and f C
2
0
(R) we have
W(e
is
f) =
_

f(t s)U(t)dt =
_

f(t)U(t +s)dt
= U(s)
_

f(t)U(t)dt = U(s)W(f).
Applying W
1
to both sides of this equation gives (4.22) for f C
2
0
(R). Then a density
argument shows that (4.22) holds true for all f L
2
(R, ).
Proof of Theorem 4.8. If X is cyclic, then we apply id/dt to (4.22) with U dened in
Lemma 4.10 and obtain (4.17) with () = and = R.
By Lemma 4.11 there exists a family of Borel measures
j
: j I dened on
j
= R, a
family of continuous function
j
C(
j
) and familiy of mappings W
j
: L
2
(
j
, d
j
) X

j
such that W
1
j
AW
j
= M

j
for all j I and X =

jI
X
j
. Let denote the sum of the
measures
j
: j I dened on the topological sum of the spaces
j
. Obviously,
is locally compact. Moreover, L
2
(, ) can be identied with

jI
L
2
(
j
, d
j
). Dening
W : L
2
(, d) X as the sum of the operators W
j
, and M

L(L
2
(, d)) as sum of the
operators M

j
completes the proof.
Since we have shown that all bounded self-adjoint operators can be transformed to
multiplication operators, we will now look at this class of operators more closely.
Lemma 4.13. Let be a Borel measure on a locally compact space , and let f C().
Then the norm of M
f
L(L
2
(, d)) dened by (M
f
g)() := f()g() for g L
2
(, d)
and is given by
|M
f
| = |f|
, supp
(4.24)
where supp =

S open, (S)=0
S.
Proof. Since
|M
f
g|
2
=
_

[fg[
2
d =
_
supp
[fg[
2
d |f|
2
,supp
_
supp
[g[
2
d = |f|
2
,supp
|g|
2
we have |M
f
| |f|
, supp
. To show the reverse inequality, we may assume that
|f|
,supp
> 0 and choose 0 < < |f|
,supp
. By continuity of f,

:= (f[
supp
)
1
( :
[[ > |f|
, supp
) is open, and [f()[ |f|
, supp
for all

. Since is
locally compact, there exists a compact environment K to any point in

, and (K) <


since is a Borel measure. Moreover, (K

) > 0 as

supp and the interior of


K

is not empty. Hence, the characteristic function g of K

belongs to L
2
(, d),
g ,= 0, and |M
f
g| (|f|
supp
)|g|. Therefore, |M
f
| |f|
, supp
for any > 0.
This completes the proof of (4.24).
Next we dene the spectrum of an operator, which is a generalization of the set of
eigenvalues of a matrix.
Denition 4.14. Let X be a Banach space and A L(X). The resolvent set (A) of
A is the set of all

C for which N(

I A) = 0, R(

I A) = X, and (

I A) is
boundedly invertible. The spectrum of A is dened as (A) := C (A).
4. Spectral theory 37
It follows immediately from the denition that the spectrum is invariant under unitary
transformations. In particular, (A) = (M

) with the notation of Theorem 4.8. By


(4.24), (

I M

)
1
exists and is bounded if and only if the function (

())
1
is
well-dened for -almost all and if |(

)
1
|
, supp
< . Since is continuous,
the latter condition is equivalent to

/ (supp ). In other words, (M

) = C (supp )
or
(A) = (supp ). (4.25)
It follows from (4.24) and (4.25) that (A) is closed and bounded and hence compact.
If p() =

n
j=0
p
j

j
is a polynomial, it is natural to dene
p(A) :=
n

j=0
p
j
A
j
. (4.26)
The next theorem generalizes this denition to continuous functions on (A).
Theorem 4.15. (functional calculus) With the notation of Theorem 4.8 dene
f(A) := WM
f
W
1
(4.27)
for a real-valued function f C((A)). Here (f )() := f(()). Then f(A) L(X)
is self-adjoint and satises (4.26) if f is a polynomial. The mapping f f(A), which is
called the functional calculus at A, is an isometric algebra homomorphism from C((A))
to L(X), i.e. for f, g C((A)) and , R we have
(f +g)(A) = f(A) +g(A), (4.28a)
(fg)(A) = f(A)g(A), (4.28b)
|f(A)| = |f|

. (4.28c)
The functional calculus is uniquely determined by the properties (4.26) and (4.28).
Proof. By (4.24) and (4.25), f(A) is well-dened and bounded. It is self-adjoint since
f is real-valued. For a polynomial p we have p(A) = Wp(M

)W
1
with the denition
(4.26). Since p(M

) = M
p
, this coincides with denition (4.27). The proof of (4.28a) is
straightforward. To show (4.28b), we write
f(A)g(A) = WM
f
W
1
WM
g
W
1
= WM
(fg)
W
1
= (fg)(A).
Finally, using (4.24), (4.25), and the continuity of f,
|f(A)| = |M
f
| = |f |
,supp
= |f|
,(A)
.
Let
A
: C((A)) L(X) be any isometric algebra homomorphism satisfying
A
(p) =
p(A) for all polynomials. By the Weierstra Approximation Theorem, for any f C((A))
there exists a sequence of polynomials (p
n
) such that |f p
n
|
,(A)
0 as n .
4. Spectral theory 38
Using (4.28c) we obtain
A
(f) = lim
n

A
(p
n
) = lim
n
p
n
(A) = f(A). Therefore, the
functional calculus is uniquely determined by (4.26) and (4.28).
Theorem 4.15 will be a powerful tool for the convergence analysis in the next section as
it allows to reduce the estimation of operator norms to the estimation of functions dened
on an interval. We will also need the following
Lemma 4.16. If T L(X, Y ) and f C([0, |T

T|]), then
Tf(T

T) = f(TT

)T. (4.29)
Proof. It is obvious that (4.29) holds true if f is a polynomial. By the Weierstra Approx-
imation Theorem, for any f C([0, |T

T|]) there exists a sequence (p


n
) of polynomials
such that |f p
n
|
,[0,|T

T|]
0 as n . Hence, Tf(T

T) = lim
n
Tp
n
(T

T) =
lim
n
p
n
(TT

)T = f(TT

)T by virtue of (4.28c).
Finally we show that the functional calculus can be extended to the algebra /((A))
of bounded, Borel-measureable functions on (A) with the norm |f|

:= sup
t(A)
[f(t)[.
Theorem 4.17. The mapping f f(A) := WM
f
W
1
is a norm-decreasing algebra
homomorphism from /((A)) to L(X), i.e. for f, g /((A)) and , R we have
(f + g)(A) = f(A) +g(A), (4.30a)
(fg)(A) = f(A)g(A), (4.30b)
|f(A)| |f|

. (4.30c)
If (f
n
) is a sequence in /((A)) converging pointwise to a function f /((A)) such
that sup
nN
|f
n
| < , then
|f
n
(A) f(A)| 0 as n (4.31)
for all X.
Proof. (4.30a) and (4.30b) are shown in the same way as (4.28a) and (4.28b). For the proof
of (4.30c) we estimate |f(A)| = |M
f
| |f|

. To proof (4.31), we dene g := W


1

and C := sup
nN
|f
n
|. Since the functions [f
n
f [
2
[g[
2
converge pointwise to 0
and since they are dominated by the integrable function 4C
2
[g[
2
, Lebesgues Dominated
Convergence Theorem implies that
|f
n
(A) f(A)|
2
= |M
fn
g M
f
g|
2
=
_
[f
n
(()) f(())[
2
[g()[
2
d 0
as n .
5. Convergence analysis of linear
regularization methods
We consider an operator equation
T = g (5.1)
where T L(X, Y ) and X and Y are Hilbert spaces. We assume that g T(T

) and that
the data g

Y satisfy
|g g

| . (5.2)
In this chapter we are going to consider the convergence of regularization methods of the
form
R

:= q

(T

T)T

(5.3)
with some functions q

C([0, |T

T|])) depending on some regularization parameter >


0. We denote the reconstructions for exact and noisy data by

:= R

g and

:= R

,
respectively and use the symbol

:= T

g for the exact solution. Since T

g = T

Qg =
T

, the reconstruction error for exact data is given by

= (I q

(T

T)T

T)

= r

(T

T)

(5.4)
with
r

() := 1 q

(), [0, |T

T|]. (5.5)
The following table lists the functions q

and r

for some regularization methods.


Tikhonov regularization q

() =
1
+
r

() =

+
iterated Tikhonov regu-
larization (2.9), n 1
q

() =
(+)
n

n
(+)
n
r

() =
_

+
_
n
truncated singular value
decomposition (4.14)
q

() =
_

1
,
0, <
r

() =
_
0,
1, <
truncated singular value
decomposition (4.15)
q

() =
_

1
,

1
, <
r

() =
_
0,
1 /, <
Landweber iteration
with = 1
q
n
() =

n1
j=0
(1 )
j
r
n
() = (1 )
n
39
5. Convergence analysis of linear regularization methods 40
In all these cases the functions r

satisfy
lim
0
r

() =
_
0, > 0
1, = 0
(5.6)
[r

()[ C
r
for [0, |T

T|]. (5.7)
with some constant C
r
> 0. The limit function dened by the right hand side of (5.6) is
denoted by r
0
(). For Landweber iteration we set = 1/n and assume that the normal-
ization condition (2.7) holds true. Note that (5.6) is equivalent to lim
0
q

() = 1/ for
all > 0. Hence, q

explodes near 0. For all methods listed in the table above this growth
is bounded by
[q

()[
C
q

for [0, |T

T|] (5.8)
with some constant C
q
> 0.
Theorem 5.1. If (5.6) and (5.7) hold true, then the operators R

dened by (5.3) converge


pointwise to T

on T(T

) as 0. With the additional assumption (5.8) the norm of the


regularization operators can be estimated by
|R

|
_
(C
r
+ 1)C
q

. (5.9)
If (, g

) is a parameter choice rule satisfying


(, g

) 0, and /
_
(, g

) 0 as 0, (5.10)
then (R

, ) is a regularization method in the sense of Denition 3.8.


Proof. We rst aim to show the pointwise convergence R

. Let g T(T

),

= T

g,
and A := T

T. Recall from (5.4) that T

g R

g = r

(A)

. Using the boundedness con-


dition (5.7), it follows from (4.31) in Theorem 4.17 that lim
0
r

(A)

= r
0
(A)

. Since
r
0
is real-valued and r
2
0
= r
0
, the operator r
0
(A) is an orthogonal projection. Moreover,
R(r
0
(A)) N(A) since r
0
() = 0 for all and hence Ar
0
(A) = 0. By (3.8) we have
N(T) = N(A). Hence, |r
0
(A)

|
2
=

r
0
(A)

_
= 0 as

N(T)

= N(A)

. This
shows that |R

g T

g| 0 as 0.
Using Lemma 4.16 and the Cauchy-Schwarz inequality we obtain that
|R

|
2
= TT

(TT

), q

(TT

)) |q

()|

|q

()|

||
2
for Y . Now (5.9) follows from the assumptions (5.7) and (5.8).
To prove that (R, ) is a regularization method, we estimate
|

| |

| +|

|. (5.11)
The approximation error |

| tends to 0 due to the pointwise convergence of R

and
the rst assumption in (5.10). The propagated data noise error |

| = |R
()
(gg

)|
vanishes asymptotically as 0 by (5.2), (5.9), and the second assumption in (5.10).
5. Convergence analysis of linear regularization methods 41
Source conditions
We have seen in Theorem 3.11 that the convergence of any regularization method can be
arbitrarily slow in general. On the other hand, we have seen in Example 1.2 that estimates
on the rate of convergence as the noise level tends to 0 can be obtained under a-priori
smoothness assumptions on the solution. In a general Hilbert space setting such conditions
have the form

= f(T

T)w, w X, |w| (5.12)


with a continuous function f satisfying f(0) = 0. (5.12) is called a source condition. The
most common choice f() =

with > 0 leads to source conditions of Holder type,

= (T

T)

w, w X, |w| . (5.13)
Since T is typically a smoothing operator, (5.12) and (5.13) can be seen as abstract smooth-
ness conditions. In the next chapter we will show for some important problems that source
conditions can be interpreted as classical smoothness conditions in terms of Sobolev spaces.
In (5.13) the case = 1/2 is of special importance, since
R((T

T)
1/2
) = R(T

) (5.14)
as shown in the exercises. To take advantage of the source condition (5.13) we assume that
there exist constants 0
0
and C

> 0 such that


sup
[0,|T

T|]
[

()[ C

for 0
0
. (5.15)
The constant
0
is called the qualication of the family of regularization operators (R

)
dened by (5.3). A straightforward computation shows that the qualication of (iterated)
Tikhonov regularization
0
= 1 (or
0
= n, respectively), and that the qualication of
Landweber iteration and the truncated singular value decomposition is
0
= . By the
following theorem,
0
is a measure of the maximal degree of smoothness, for which the
method converges of optimal order.
Theorem 5.2. Assume that (5.13) and (5.15) hold. Then the approximation error and its
image under T satisfy
|

| C

, for 0
0
, (5.16)
|T

| C
+1/2

+1/2
, for 0
0

1
2
. (5.17)
Under the additional assumptions (5.2) and (5.8) and with the a-priori parameter choice
rule = c
2
2+1
, c > 0, the total error is bounded by
|

| c

2
2+1
(5.18)
with some constant c

> 0 independent of g

. For the parameter choice rule = c(/)


2
2+1
,
c > 0 we have
|

| c

1
2+1

2
2+1
(5.19)
with c

independent of g

and .
5. Convergence analysis of linear regularization methods 42
Proof. Using (5.4), (5.13), (5.15), and the isometry of the functional calculus (4.28c), we
obtain
|

| = |r

(T

T)

| = |r

(T

T)(T

T)

w| |

()|

.
(5.17) follows from (5.16) together with the identify
|T|
2
= T, T) = T

T, ) =

(T

T)
1/2
, (T

T)
1/2

_
= |(T

T)
1/2
|
2
. (5.20)
Due to the assumption (5.2) and (5.9), this gives the estimate
|

| |

| +|

| C

+
_
(C
r
+ 1)C
q

1/2
.
This shows (5.18) and (5.19).
The choice f() =

in (5.12) is not always appropriate. Whereas Holder-type source


conditions (5.13) have natural interpretations for many mildly ill-posed problems, they are
far too restrictive for most exponentially ill-posed problems. (5.13) often implies that

in an analytic function in such situations. As we will see, an appropriate choice of f for


most exponentially ill-posed problems is
f
p
() :=
_
(ln )
p
, 0 < exp(1)
0, = 0
(5.21)
with p > 0, i.e.

= f
p
(T

T)w, |w| . (5.22a)


We call (5.22) a logarithmic source condition. In order to avoid the singularity of f
p
() at
= 1, we always assume in this context that the norm in Y is scaled such that
|T

T| = |T|
2
exp(1). (5.22b)
The equality in (5.22b) is a consequence of Lemma 4.1. Of course, scaling the norm of Y
has the same eect as scaling the operator T.
Theorem 5.3. Assume that (5.22) with p > 0 and (5.15) with
0
> 0 hold true. Then
there exist constants
p
> 0 such that the approximation error is bounded by
|

|
p
f
p
() (5.23)
for all [0, exp(1)]. Under the additional assumptions (5.2) and (5.8) and with the
a-priori parameter choice rule = , the total error is bounded by
|

| c
p
f
p
() for exp(1) (5.24)
with some constant c
p
> 0 independent of g

. If = /, then
|

| c
p
f
p
(/), for / exp(1) (5.25)
with some constant c
p
> 0 independent of g

and .
5. Convergence analysis of linear regularization methods 43
Proof. It follows from (5.4), (5.22), and (4.28c) that
|

| = |r

(T

T)

| = |r

(T

T)f
p
(T

T)w| |r

f
p
|
,[0,exp(1)]
.
If we can show that (5.15) implies
[r

()[f
p
()
p
f
p
() (5.26)
for all , [0, exp(1)], we have proved (5.23). To show (5.26) we make the substitution
= / and write r(, ) := r

() with [0, 1/(e)]. (5.15) and (5.26) are equivalent


to
r(, ) C

, (5.27)
r(, )
p
f
p
()
f
p
()
, (5.28)
respectively. Suppose that (5.27) holds true, and dene
p
:= sup
0<1
C

(ln +1)
p
<
. Then
r(, ) C



p
(ln + 1)
p

p
_
ln
ln
+ 1
_
p
=
p
f
p
()
f
p
()
for 0 < 1 and 0 < exp(1) since ln 1. For 1 < 1/(e), condition (5.15)
with = 0 implies that
C
0
f
p
()
f
p
()
> C
0
r(, ).
This proves (5.26) with
p
= max(C
0
,
p
).
Using (5.11), (5.9), and (5.23) we obtain
|

|
p
f
p
() +
_
(1 +C
0
)C
q

.
This implies (5.24) with c
p
=
p
+
_
(1 +C
0
)C
q
sup
0<exp(1)

/f
p
() and (5.25) with
c
p
=
p
+
_
(1 +C
0
)C
q
sup
0<exp(1)

/f
p
().
Optimality and an abstract stability result
Assume we want to nd the best approximate solution

= T

g to (5.1), and at our


disposal are noisy data g

satisfying (5.2) and the a-priori information that

satises
(5.12). Dene
M
f,
=

X :

satises (5.12),
and let R : Y X be an arbitrary mapping to approximately recover

from g

. Then
the worst case error of the method R is

R
(, M
f,
, T) := sup|Rg

| :

M
f,
, g

Y, |T

Qg

| .
The best possible error bound is the inmum over all mappings R : Y X:
(, M
f,
, T) := inf
R

R
(, M
f,
, T) (5.29)
5. Convergence analysis of linear regularization methods 44
Theorem 5.4. Let (, M
f,
, T) := sup|

| :

M
f,
, |T

| denote the modulus


of continuity of (T[
M
f,
)
1
. Then
(, M
f,
, T) (, M
f,
, T). (5.30)
Proof. Let R : Y X be an arbitrary mapping, and let

M
f,
such that |T

| .
Choosing g

= 0 in the denition of
R
gives

R
(, M
f,
, T) |R(0)

|.
Since also

M
f,
and | T

| , we get

R
(, M
f,
, T) |R(0) +

|,
and hence
2|

| |R(0)

| +|R(0) +

| 2
R
(, M
f,
, T).
This implies (, M
f,
, T)
R
(, M
f,
, T) after taking the supremum over all

M
f,
with |T

| . Since R was arbitrary, we have proved (5.30).


It can be shown that actually equality holds in (5.30). The proof of the reverse in-
equality, which is much more dicult than the proof of Theorem 5.4, proceeds by the
construction of an optimal method for the set M
f,
(cf. [MM79, Lou89, GP95]). These
methods require the a-priori knowledge of both f and .
Our next aim is to nd a computable expression for (, M
f,
, T) which allows us to
compare error bounds for practical regularization methods to the best possible error bound.
The proof of the next theorem is based on
Lemma 5.5. (Jensens inequality) Assume that C
2
([, ]) with , R
is convex, and let be a nite measure on some measure space . Then

_
_
d
_
d
_

_
d
_
d
(5.31)
holds for all L
1
(, d) satisfying almost everywhere d. The right hand
side may be innite if = or = .
Proof. W.l.o.g. we may assume that
_
d = 1. Let M :=
_
d, and consider the Taylor
expansion () = (M) +

(M)( M) +
_

M
( M)

(M) dx. Since

(M) 0, we have
(M) +

(M)( M) () for all [a, b]. Hence, with = (x),


(M) +

(M)( M)
for almost all x . An integration d yields (5.31).
5. Convergence analysis of linear regularization methods 45
For the special case (t) = t
p
, p > 1, Jensens inequality becomes
_
d
__

p
d
_1
p
__
d
_1
q
with q =
p
p1
. From this form, we easily obtain Holders inequality
_
[a[[b[d
__
[a[
p
d
_1
p
__
[b[
q
d
_1
q
for positive measures on , a L
p
( ), and b L
q
( ) by setting = [b[
q
and =
[a[[b[

1
p1
.
Theorem 5.6. Let (5.12) hold, and assume that f C([0, ]), = |T|
2
, is strictly mono-
tonically increasing with f(0) = 0. Moreover, assume that the function : [0, f()
2
]
[0, f()
2
] dened by
() := (f f)
1
() (5.32)
is convex and twice continuously dierentiable. Then the stability estimate
|

|
2

2

1
_
|T

|
2

2
_
. (5.33)
holds. Consequently, for

f(),
(, M
f,
, T)
_

1
(
2
/
2
). (5.34)
Proof. By linearity, we may assume that = 1. With the notation of Theorem 4.8 let

w
:= [W
1
w[
2
. Then (5.31) and (5.32) yield

_
|

|
2
|w|
2
_
=
_
|W
1

|
2
L
2
(,d)
|W
1
w|
2
L
2
(,d)
_
=
_
_
f()
2
d
w
_
d
w
_

_
(f()
2
)d
w
_
d
w
=
_
f()
2
d
w
|w|
2
=
|(T

T)
1/2
f(T

T)w|
2
|w|
2
=
|T

|
2
|w|
2
.
By the convexity of , the fact that (0) = 0, and |w| 1, this estimate implies
(|

|
2
) |T

|
2
. (5.35)
Since f is strictly increasing, so are f f, (f f)
1
, , and
1
. Hence, applying
1
to
(5.35) yields (5.33). (5.34) follows from (5.33) and the denition.
The estimate (5.33) is a stability estimate for (5.1) based on the a-priori information
that

M
f,
. It corresponds to the stability estimate derived in Example 1.2, which was
based on an a-priori bound on the second derivative of the solution.
5. Convergence analysis of linear regularization methods 46
Remark 5.7. We discuss when equality holds in (5.33) and (5.34). Let w be an eigenvector
of T

T such that |w| = = 1 and T

Tw =

w. Then

= f(

)w and
|

|
2
= f(

)
2
=
1
(

f(

)
2
) =
1
(|T

|
2
).
In the second equality we have used the denition (5.32) with = f(

)
2
, and in the last
equality the identity (5.20). Hence, (5.33) is sharp in this case. Moreover, equality holds
in (5.34) if (/)
2
is an eigenvalue of T

Tf(T

T)
2
. An exact expression for (, M
f,
, T)
for all is derived in [Hoh99].
Denition 5.8. Let (R

, ) be a regularization method for (5.1), and let the assumptions


of Theorem 5.6 be satised. Convergence on the source sets M
f,
is said to be
optimal if

R
(, M
f,
, T)
_

1
(
2
/
2
)
asymptotically optimal if

R
(, M
f,
, T) =
_

1
(
2
/
2
) (1 + o (1)), 0
of optimal order if there is a constant C 1 such that

R
(, M
f,
, T) C
_

1
(
2
/
2
)
for / suciently small.
For f() =

, > 0, the assumptions of Theorem 5.6 are satised, and () =


1+2
2
.
We get
Corollary 5.9. (5.13) implies
|

|
1
1+2
|T

|
2
1+2
.
Moreover,
(, M

,
, T)
1
2+1

2
1+2
.
Corollary 5.9 implies that the method in Theorem 5.2 is of optimal order. Note, how-
ever, that this method requires knowledge of the parameter , i.e. the degree of smoothness
of the unknown solution. We will see below that a-posteriori parameter choice rules can
lead to order-optimal methods, which do not require such a-priori knowledge.
Usually, Corollary 5.9 is proved by interpolation rather instead of Jensens inequality
(cf.,e.g.,[EHN96, Lou89]). We have chosen the latter approach since it also allows to treat
the case f = f
p
(cf. (5.21)) (cf. [Mai94]):
5. Convergence analysis of linear regularization methods 47
Corollary 5.10. The assumptions of Theorem 5.6 are satised for f = f
p
, and the inverses
of the corresponding functions
p
have the asymptotic behavior
_

1
p
() = f
p
()(1 + o (1)), 0. (5.36)
Consequently,
|

| f
p
_
|T

|
2

2
_
(1 + o (1)), |T

|/ 0, (5.37)
(, M
fp,
, T) f
p
_

2
/
2
_
(1 + o (1)), / 0. (5.38)
Proof. By (5.22b), we have = exp(1). It is obvious that f
p
is continuous on [0, ] and
strictly monotonically increasing.
p
: [0, 1] [0, exp(1)] is given by

p
() = exp(

1
2p
).
From

p
() = exp(
1/2p
)

1
1
2p
(2p)
2
(2p 1 +

1
2p
)
it is easily seen that

p
() 0 for [0, 1], i.e.
p
is convex.
To prove the estimate on
_

1
p
(), rst note that =
1
p
() implies
ln = ln

1
2p
.
Therefore,
= (ln ln )
2p
= (ln )
2p
_
1
ln
ln
_
2p
= f
p
()
2
_
1
ln
ln

1
2p
_
2p
Since lim
0
ln
ln
1/2p
= 0 and lim
0
= lim
0

1
p
() = 0, the assertion follows.
As f
p
(
2
/
2
) = 2
p
f
p
(/), Corollary 5.10 implies that the method in Theorem 5.3 is
of optimal order.
The discrepancy principle
We assume in this section the exact data g are attainable, i.e.
g R(T). (5.39)
5. Convergence analysis of linear regularization methods 48
Assume that (5.7) holds true and choose
> C
r
. (5.40)
Instead of (2.8) we consider a version of the discrepancy principle which allows some
sloppyness in the choice of the regularization paramter. If |g

| > , i.e. if the signal-


to-noise ration |g

|/ is greater than , we assume that = (, g

) is chosen such that


|g

| |g

| (5.41a)
for some

[, 2]. Otherwise, if |g

| , we set
(, g

) = (5.41b)
and

:= 0. In the latter case, when there is more noise than data, there may not exist

such that (5.41a) is satised. Note that under assumption (5.8),


|

| = |q

(T

T)T

|
C
q

|T

| 0 as , (5.42)
which explains the notation

:= 0. Finally note that


g

= (I Tq

(T

T)T

)g

= r

(TT

)g

(5.43)
for all > 0 by virtue of Lemma 4.16. Hence, lim
0
|g

| = |r
0
(TT

)g

|
|g

| by Theorem 4.17. Together with (5.42) this shows that (5.41a) can be satised if
|g

| .
Theorem 5.11. Let r

and q

satisfy (5.7), (5.8), and (5.15) with


0
> 1/2, and let
(, g

) be a parameter choice rule satisfying (5.40) and (5.41). Then (R

, ) is a regular-
ization method in the sense of Dention 3.8.
Proof. We will repeatedly use the inequalities
|g T

| ( +C
r
) (5.44a)
( C
r
) |g T

| (5.44b)
which follow from (5.41a), (5.40), and the estimate
|(g T

) (g

)| = |r

(T

T)(g g

)| C
r
. (5.45)
Eq. (5.45), which holds for all > 0, is a consequence of (5.2) and the identity (5.43).
The proof is again based on the splitting (5.11) of the total error into an approximation
error and a propagated data error. Let (g
n
)
nN
be a sequence in Y such that |gg
n
|
n
and
n
0 as n , and let
n
:= (
n
, g
n
). Assume that the approximation error
does not tend to 0. Then, after passing to a subsequence, we may assume that there exists
> 0 such that
|

n
| for all n. (5.46)
5. Convergence analysis of linear regularization methods 49
Using the notation of Theorem 4.8 with T

T = A = WM

W
1
, let

:= W
1

. It
follows from (5.4) and (5.20), and (5.44a) that
_
[r
n
[
2
[

[
2
d = |

Ar
n
(A)

|
2
= |T(

n
)|
2
0
as n . Since 0 |A|, it follows from the Riesz-Fischer Theorem that there
exists a subsequence of n(k), such that [r

n(k)
[
2
[

[
2
0 pointwise almost everywhere
d as k . Hence, [r

n(k)
r
0
[
2
[

[
2
0 pointwise almost everywhere d
as k , where r
0
denotes the limit function dened by the right-hand side of (5.6).
With assumption (5.7), it follows from Lebesques Dominated Convergence Theorem that
_
[r

n(k)
r
0
[
2
[

[
2
d 0 as k . Since r
0
(A)

= 0 as shown in the proof of


Theorem 5.1, this implies that
|

n(k)
| = |r

n(k)
(A)

| = |r

n(k)
(A)

r
0
(A)

| 0
as k . This contradicts (5.46). Hence, |

n
| 0 as n .
Now assume that the propagated data noise error does not tend to 0. In analogy to
(5.46) we may assume that there exists > 0 such that
|
n

n
n
| for all n. (5.47)
It follows from (5.9) that
|
n

n
| C

n

n
C

n

n
(5.48)
with a generic constant C independent of n. This implies that (5.48)

n
0 as n .
Now it follows from (5.44b), (5.17), and the assumption
0
> 1/2 that
|
n

n
| C

n

n
C
|T(

n
)|

n
C(

n
)

0
1/2
0 (5.49)
as n with a generic constant C. This contradicts (5.47). Hence, both the approxi-
mation error and the propagated data noise error tend to 0, and the proof is complete.
The next theorem shows that the discrepancy principle leads to order optimal conver-
gence rates.
Theorem 5.12. Under the assumptions of Theorem 5.11 let

satisfy the Holder source


condition (5.13) with 0 <
0
1/2. Then there exists a constant c

> 0 independent
of , , and

such that
|

(,g

)
| c

1
2+1

2
2+1
. (5.50)
Proof. To estimate the approximation error we use Corollary 5.9 with

replaced by

= r

(T

T)

= (T

T)

(T

T)w. Since |r

(T

T)w| C
r
, this gives
|

| (C
r
)
1
1+2
|T(

)|
2
2+1
(C
r
)
1
1+2
(( +C
r
))
2
1+2
. (5.51)
5. Convergence analysis of linear regularization methods 50
Here the second inequality is a consequence of (5.44a). To estimate the propagated data
noise error, note that (5.17) and (5.44b) imply
( C
r
) |g T

| C
+1/2
(

)
+1/2
.
Together with (5.2) and (5.9) it follows that
|

|
_
(C
r
+ 1)C
q


_
2(C
r
+ 1)C
q

1
2+1

2
2+1
with a constant c

independent of , , and

. This together with (5.51) shows (5.50).


By Theorem 5.12, the discrepancy principle leads to order-optimal convergence only for

0
1/2, whereas the a-priori rule in Theorem 5.2 yields order-optimal convergence
for all
0
. It can be shown that (5.50) cannot be extended to (
0
1/2,
0
]
(cf. [Gro83] for Tikhonov regularization). This fact has initiated a considerable amount of
research in improved a-posteriori rule, which yield order-optimal convergence for all
0
without a-priori knowledge of (cf. [EG88, Gfr87] and references in [EHN96]).
For logarithmic source conditions, the discrepancy principle also leads to order-optimal
convergence. With a modied version of the discrepancy principle even asymptotically
optimal convergence rates can be achieved without a-priori knowledge of p (cf. [Hoh00]).
6. Interpretation of source conditions
In the previous chapter we have established convergence rates of regularization methods
for ill-posed operator equations T = g if the exact solution

satises a source condition

= f(T

T)w, |w|
with a continuous function f : [0, |T

T|] [0, ) satisfying f(0) = 0. It is usually not


obvious what such a condition means for a specic inverse problem. The aim of this chapter
is to interpret source condition for some important inverse problems.
Sobolev spaces of periodic functions
Let
f
n
(x) :=
1

2
exp(inx), n Z
denote the standard orthonormal basis of L
2
([0, 2]) with respect to the inner product
, ) :=
_
2
0
(x)(x) dx. For L
2
([0, 2]) we denote the Fourier coecients of by
(n) := , f
n
) , n Z.
Then ||
2
L
2
=

nZ
[ (n)[
2
by Parsevals equality, and =

nZ
(n)f
n
.
Denition 6.1. For 0 s < we dene
||
H
s :=
_

nZ
(1 +n
2
)
s
[ (n)[
2
_
1/2
(6.1)
and
H
s
([0, 2]) := L
2
([0, 2]) : ||
H
s < .
H
s
([0, 2]) is called a Sobolev space of index s. Note that H
0
([0, 2]) = L
2
([0, 2]).
Theorem 6.2. H
s
([0, 2]) is a Hilbert space for s 0.
Proof. It is easy to show that H
s
([0, 2]) is a linear space and that
, )
H
s
:=

nZ
(1 +n
2
)
s
(n)

(n)
51
6. Interpretation of source conditions 52
is an inner product on H
s
([0, 2]) satisfying ||
2
H
s = , )
H
s
for all H
s
([0, 2]). To
show completeness, let (
k
)
kN
be a Cauchy sequence in H
s
([0, 2]). Then (
k
(n))
kN
is a
Cauchy sequence for each n Z, and (n) := lim
k

k
(n) is well dened. We have to
show that :=

nZ
f
n
(n) belongs to H
s
([0, 2]) and that |
k
|
H
s 0 as k .
Given > 0 there exists k > 0 such that |
k

l
|
2
H
s for all l k. Hence,
N

n=N
(1 +n
2
)
s
[ (n)
k
(n)[
2
= lim
l
N

n=N
(1 +n
2
)
s
[
l
(n)
k
(n)[
2
sup
lk
|
l

k
|
2
H
s
for all N > 0. Taking the supremum over N N shows that |
k
|
H
s . This implies
that ||
H
s < and
k
in H
s
as k .
Remark 6.3. Another denition of Sobolev spaces of integral index s N uses the concept
of weak derivatives. A 2-periodic function L
2
([0, 2]) is said to have a weak (or
distributional) derivative D
k
L
2
([0, 2]) of order k N if the periodic continuation of
D
k
satises
_
R

(k)
dx = (1)
k
_
R
(D
k
)dx
for all C

(R) with compact support. It can be shown that H


k
([0, 2]) is the set of
all functions in L
2
([0, 2]) which have weak derivatives in L
2
([0, 2]) of order k (see
exercises). This alternative denition can be used to introduce Sobolev spaces on more
general domains.
For more information on Sobolev spaces of periodic functions, we refer to [Kre89, Chap-
ter 8].
Holder-type source conditions
Our rst example is numerical dierention. Since dierentiation annihilates constant func-
tion we dene the Hilbert space
L
2
0
([0, 2]) :=
_
L
2
([0, 2]) :
_
2
0
(x) dx = 0
_
of L
2
function with zero mean. The inverse of the dierentiation operator on L
2
0
([0, 2]) is
given by
(T
D
)(x) :=
_
x
0
(t) dt +c(), x [0, 2]
where c() :=
1
2
_
2
0
_
x
0
(t) dt dx is dened such that T
D
L
2
0
([0, 2]) for all
L
2
0
([0, 2]), i.e. T
D
L(L
2
0
([0, 2])).
6. Interpretation of source conditions 53
Theorem 6.4. R((T

D
T
D
)

) = H
2
([0, 2])L
2
0
([0, 2]) for all 0. Moreover, there exist
constants c, C > 0 such that c|w|
L
2 |(T

D
T
D
)

w|
H
2 C|w|
L
2 for all w L
2
([0, 2]),
i.e. |(T

D
T
D
)

w|
H
2 |w|
L
2.
Proof. Note that L
2
0
([0, 2]) = spanf
n
: n Z, n ,= 0. A straightforward computation
shows that T
D
f
n
=
1
in
f
n
for n Z 0. It follows that T

D
f
n
=
1
in
f
n
since T, f
n
) =
1
in
(n) =

,
1
in
f
n
_
. Hence,
T

D
T
D
f
n
=
1
n
2
f
n
, n Z 0,
i.e. f
n
: n Z 0 is complete orthonormal system of eigenfunctions of T

D
T
D
. It
follows from Theorem 4.15 with W : l
2
(Z 0) L
2
0
([0, 2]), W :=

n,=0
(n)f
n
and
(n) = n
2
that
(T

D
T
D
)

w =

n,=0
1
n
2
w(n)f
n
for all w L
2
0
([0, 2]). Therefore,
|(T

D
T
D
)

w|
2
H
2 =

n,=0
_
1 +n
2
n
2
_
2
[ w(n)[
2
2
2
| w(n)|
2
L
2 <
which implies R((T

D
T
D
)

) H
2
([0, 2]) L
0
([0, 2]). To prove the reverse inclusion, let
H
2
([0, 2]) and dene w :=

n,=0
n
2
(n)f
n
. Then w L
2
0
([0, 2]) is well dened
since |w|
2
L
2
=

n,=0
(n
2
)
2
[ (n)[
2
||
2
H
2
< , and (T

D
T
D
)

w = .
Theorem 6.4 implies that if

H
2
([0, 2]) and if the regularization method satises
the assumptions of either Theorem 5.2 or Theorem 5.12, then the error can be estimated
by
|

(,g

)
|
L
2 c

1
2+1

2
2+1
.
with = |

|
H
2. Replacing the Sobolev spaces H
2
([0, 2]) by the classical function
spaces C
2
([0, 2]) with 0, 1/2, 1, this corresponds to the convergence rates (1.3)
obtained for the central dierence quotient.
Our second example is related to solution of the boundary value problem
u = 0 in , u = f on (6.2)
where R
2
is a bounded, simply connected region with smooth boundary and
f C(). The single layer potential
u(x) =
1

(y) ln[x y[ ds(y), x


solves (6.2) if and only if the density C() solves Symms equation

(y) ln[x y[ ds(y) = f(y), x


6. Interpretation of source conditions 54
(cf. [Kre89]). It is a prototype of a one-dimensional rst-kind integral equation with a
logarithmic singularity. We consider the special case that is a circle of radius a > 0
parametrized by z(t) = a(cos t, sin t), t [0, 2]. Then [z(t) z()[
2
= 4a
2
sin
2 t
2
, and
setting (t) := a(z(t)) and g(t) := f(z(t)) we get

_
2
0
()
_
ln a +
1
2
ln
_
4 sin
2
t
2
__
d = g(t), t [0, 2]. (6.3)
The left hand side of this equation is denoted by (T
Sy
)(t). We now consider (6.3) as an
integral equation in L
2
([0, 2]). Since the integral operator T
Sy
has a square integrable
kernel, it is a compact operator in L(L
2
([0, 2])). Therefore, (6.3) is ill-posed.
Theorem 6.5. If a ,= 1, then R((T

Sy
T
Sy
)

) = H
2
([0, 2]) for all 0. Moreover,
|(T

Sy
T
Sy
)

w|
H
2 |w|
L
2 for w L
2
([0, 2]).
Proof. Using the integrals
1
2
_
2
0
e
int
ln
_
4 sin
2
t
2
_
dt =
_
1/[n[, n Z 0
0, n = 0
(cf. [Kir96, Lemma 3.17]) we obtain
T
Sy
f
n
=
_
1/[n[f
n
, n Z 0
2 lnaf
0
, n = 0
As T

Sy
= T
Sy
it follows that
(T

Sy
T
Sy
)f
n
=
_
1/[n[
2
f
n
, n Z 0
(2 lna)
2
f
0
, n = 0
Now the proof is almost identical to that of Theorem 6.4.
Logarithmic source conditions
We consider the initial value problem for the heat equation with periodic boundary condi-
tions:

t
u(x, t) =

2
x
2
u(x, t), x (0, 2), t (0, T] (6.4a)
u(0, t) = u(2, t), t (0, T) (6.4b)
u(x, 0) = (x), x [0, 2]. (6.4c)
This is equivalent to the heat equation on a circle. In analogy to (1.6), the solution to this
evolution problem is given by
u(x, t) =

nZ
exp(n
2
t) (n)f
n
(x)
6. Interpretation of source conditions 55
We dene the operator T
BH
L(X, Y ) with X = Y = L
2
([0, 2]) by (T
BH
)(x) := u(x, T)
where u satises (6.4), i.e.
T
BH
=

nZ
exp(n
2
T) (n)f
n
. (6.5)
In order to meat the normalization condition (5.22b), we dene the norm in Y to be
||
Y
:= exp(1/2)||
L
2.
Theorem 6.6. R(f
p
(T

BH
T
BH
)) = H
2p
([0, 2]) for all p > 0, and |f
p
(T

BH
T
BH
)w|
H
2p
|w|
L
2.
Proof. It follows from (6.5) and the denition of the norm in Y that
T

BH
T
BH
=

nZ
exp(1) exp(2Tn
2
) (n)f
n
.
Hence,
f
p
(T

BH
T
BH
)w =

nZ
f
p
(exp(1) exp(2Tn
2
)) w(n)f
n
=

nZ
(1 + 2Tn
2
)
p
w(n)f
n
for w L
2
([0, 2]) and
|f
p
(T

BH
T
BH
)w|
2
H
2p =

nZ
_
1 +n
2
1 + 2Tn
2
_
2p
[ w(n)[
2
T
2p
|w|
2
L
2.
Therefore, R(f
p
(T

BH
T
BH
)) H
2p
([0, 2]). Vice versa, let H
2p
([0, 2]), and dene
w :=

nZ
(1 + 2Tn
2
)
p
(n)f
n
such that f
p
(T

BH
T
BH
)w = . It follows from |w|
L
2 =

nZ
(1+2Tn
2
)
2p
[ (n)[
2
(2T)
2p
||
2
H
2p
< that w L
2
([0, 2]). Hence, H
2p
([0, 2])
R(f
p
(T

BH
T
BH
)).
Theorem 6.6 shows that if

H
2p
([0, 2]) and the regularization method satises the
assumptions of Theorem 5.3, then the error can be estimated by
|

| c
p
f
p
(/)
with = |

|
H
2p.
Analogous results for the heat equation in more general domains, for the sideways heat
equation, and for some other problems can be found in [Hoh00].
7. Nonlinear operators
The Frechet derivative
Most regularization methods for nonlinear problems are based on a linearization of the
operator equation. Recall the denition of the Frechet derivative from Chapter 2:
Denition 7.1. Let X, Y be normed spaces, and let U be an open subset of X. A mapping
F : U Y is called Frechet dierentiable at U if there exists a bounded linear operator
F

[] : X Y such that
|F( +h) F() F

[]h| = o (|h|) (7.1)


uniformly as |h| 0. F

[] is called the Frechet derivative of F at . F is called Frechet


dierentiable if it is Frechet dierentiable at every point U. F is called continuously
dierentiable if F is dierentiable and if F

: U L(X, Y ) is continuous.
We collect some basic properties of the Frechet derivative.
Theorem 7.2. Let F : U X Y be Frechet dierentiable, and let Z be a normed space.
1. The Frechet derivative of F is uniquely determined.
2. If G : U Y is Frechet dierentiable, then F +G is dierentiable for all , R
(or C) and
(F +G)

[] = F

[] +G

[], U. (7.2)
3. (Chain rule) If G : Y Z is Frechet dierentiable, the G F : U Z if Frechet
dierentiable, and
(G F)

[] = G

[F()]F

, U. (7.3)
4. (Product rule) A bounded bilinear mapping b : XY Z is Frechet dierentiable,
and
b

[(
1
,
2
)](h
1
, h
2
) = b(
1
, h
2
) +b(h
1
,
2
) (7.4)
for all
1
, h
1
X and
2
, h
2
Y .
5. (Derivative of the operator inverse) Let X, Y be Banach spaces, and assume that
the set U L(X, Y ) of operators which have a bounded inverse is not empty. Then
the mapping inv : U L(Y, X) dened by inv(T) := T
1
. is Frechet dierentiable,
and
inv

[T]H = T
1
HT
1
(7.5)
for T U and H L(X, Y ).
56
7. Nonlinear operators 57
Remark 7.3. If Q is a normed space and F
1
: Q X and F
2
Y are Frechet dieren-
tiable, then the product rule and the chain rule with G = b and F = (F
1
, F
2
) : Q XY
imply that b(F
1
, F
2
) is Frechet dierentiable with
b(F
1
, F
2
)

[q]h = b(F
1
(q), F

2
[q]h) +b(F

1
[q]h, F
2
(q)).
If b is the product of real numbers, this is the ordinary product rule. Similarly, part 5
and the chain rule imply that for a mapping T : Q U the function inv T is Frechet
dierentiable with
(inv T)

[q]h = T(q)
1
(T

[q]h) T(q)
1
.
If X = Y = R, this is the quotient rule.
Proof of Theorem 7.2 1) Let

F

be another Frechet derivative of F. Then for all U,


h X and > 0 we have
|F

[]h

F

[]h| |F( +h) F() F

[]h|
+| F( +h) F()

F

[]h| = o ()
as 0. Dividing by shows that F

[]h =

F

[]h.
2) This is obvious.
3) We have
|G(F( +h)) G(F()) G

[F()]F

[]h|
|G(F( +h)) G(F()) G

[F()](F( +h) F())|


+|G

[]| |F( +h) F() F

[]h|.
It follows from the Frechet dierentiability of F that |F( + h) F()| = |F

[]h| +
o (|h|) =
O
(|h|) as |h| 0. Since G is Frechet dierentiable, the rst term on the right
hand side is of order o (|F( +h) F()|) = o (|h|). Due to the boundedness of |G

[]|
and the Frechet dierentiability of F, the second term is also of order o (|h|).
4) This follows from the identity
b(
1
+h
1
,
2
+h
2
) b(
1
,
2
) b(
1
, h
2
) b(h
1
,
2
) = b(h
1
, h
2
)
and the boundedness of b.
5) By the Neumann series
(T +H)
1
= T
1
(I +HT
1
)
1
= T
1
T
1
HT
1
+R
for |H| < |T
1
| with R =

j=2
T
1
(HT
1
)
j
. As |R| =
O
(|T
1
(HT
1
)
2
|) =
O
(|H|
2
), this shows (7.5).
Next we want to dene the integral over a continuous function G : [a, b] X dened on
a bounded interval [a, b] R with values in a Hilbert space X. The functional L : X R
(or L : X C if X is a complex Hilbert space) given by
L() :=
_
b
a
G(t), ) dt
7. Nonlinear operators 58
is (anti)linear and bounded since the t |G(t)| is continuous on the compact interval
[a, b] and hence bounded. Using the Riesz representation theorem, we dene
_
1
0
G(t) dt to
be the unique element in X satisfying
__
b
a
G(t) dt,
_
= L()
for all X. It follows immediately from this denition that
|
_
b
a
G(t) dt|
_
b
a
|G(t)| dt. (7.6)
Lemma 7.4. Let X, Y be Hilbert spaces and U X open. Moreover, let U and h X
such that +th U for all 0 t 1. If F : U Y is Frechet dierentiable, then
F( +h) F() =
_
1
0
F

[ +th]h dt. (7.7)


Proof. For a given X consider the function
f(t) := F( +th), ) , 0 t 1.
By the chain rule, f is dierentiable and
f

(t) = F

[ +th]h, ) .
Hence, by the Fundamental Theorem of Calculus, f(1) f(0) =
_
1
0
f

(t) dt or
F( +h) F(), ) =
__
1
0
F

[ +th]h dt,
_
.
Since this holds true for all X, we have proved the assertion.
Lemma 7.5. Let X, Y be Hilbert spaces and U X open. Let F : U Y be Frechet
dierentiable and assume that there exists a Lipschitz constant L such that
|F

[] F

[]| L| | (7.8)
for all , U. If +th U for all t [0, 1], then (7.1) can be improved to
|F( +h) F() F

[]h|
L
2
|h|
2
.
Proof. Using Lemma 7.4, (7.6) and (7.8) we get
|F( +h) F() F

[]h| =
_
_
_
_
_
1
0
(F

[ +th]h F

[]h) dt
_
_
_
_

_
1
0
|F

[ +th]h F

[]h| dt

_
1
0
Lt|h|
2
dt =
L
2
|h|
2
.
7. Nonlinear operators 59
Compactness
Denition 7.6. Let X, Y be normed spaces, and let U be a subset of X. An operator
F : U Y is called compact if it maps bounded sets to relatively compact sets. It is called
completely continuous if it is continuous and compact.
Unlike in the linear case, a nonlinear compact operator is not necessarily continuous (see
exercises). We mention that the terms compact and complete continuous are sometimes
used dierently in the literature.
Let Z be another normed space, and let F = H G with G : U Z and H : Z Y .
It follows immediately from the denition that F is compact if G is compact and H is
continuous, or if G maps bounded sets to bounded sets and H is compact. This facts can
often be used to prove the compactness of nonlinear operators.
Theorem 7.7. Let X, Y be normed spaces, and let U X be open. If F : U Y is
compact and X is innite dimensional, then F
1
cannot be continuous, i.e. the equation
F() = g is ill-posed.
Proof. If F
1
does not exist, there is nothing to show. Assume that F
1
exists as a
continuous operator. Then F
1
maps relatively compact sets to relatively compact sets.
Hence every ball B = X : |
0
| < contained in U is relatively compact since
F(B) is relatively compact and B = F
1
(F(B)). This is not possible since dimX = .
The idea of Newton-type methods, which will be discussed in the next section, is to
replace F() = g by the linearized equation
F

[
n
]h
n
= g F(
n
) (7.9)
in each step and update
n
by
n+1
:=
n
+ h
n
. The following result implies that (7.9)
inherits the ill-posedness of the original equation if F is completely continuous.
Theorem 7.8. Let X be a normed space, Y a Banach space, and let U X be open. If
F : U Y is completely continuous and Frechet dientiable, then F

[] is compact for all


U.
Proof. The proof relies on the fact that a subset K of the Banach space Y is relatively
compact if and only if it is totally bounded, i.e. if for any > 0 there exist
1
, . . . ,
n
Y
such that min
1jn
|
j
| for all K (cf. e.g. [Kre89, Section 1.4]).
We have to show that
K := F

[]h : h X, |h| 1
is relatively compact. Let > 0. By the denition of the Frechet derivative there exists
> 0 such that +h U and
|F( +h) F() F

[]h|

3
|h| (7.10)
7. Nonlinear operators 60
for all |h| . Since F is compact, the set F( + h) : h X, |h| 1 is relatively
compact and hence totally bounded, i.e. there exist h
1
, . . . , h
n
X with |h
j
| 1, j =
1, . . . , n such that for all h X satisfying |h| 1 there exists and index j 1, . . . , n
such that
|F( +h) F( +h
j
)|

3
. (7.11)
Using (7.10) and (7.11) we obtain
|F

[]h F

[]h
j
| |F( +h) F( +h
j
)|
+| F( +h) +F() +F

[]h| +|F( +h
j
) F() F

[]h
j
| ,
i.e. K is totally bounded.
Apart from Theorem 7.10 the connection between the ill-posedness of a nonlinear
problem and its linearization is less close than one may expect as counter-examples in
[EKN89, Sch02] show.
The inhomogeneous medium scattering problem
A classical inverse problem is to determine the refractive index of a medium from mea-
surements of far eld patterns of scattered time-harmonic acoustic waves in this medium.
Let u
i
be an incident eld satisfying the Helmholtz equation u
i
+ k
2
u
i
= 0, e.g. u
i
(x) =
exp(ikx ) with S
d1
:= x R
d
: [x[ = 1. The direct scattering problem is
described by the system of equations
u +k
2
nu = 0, x R
d
, (7.12a)
u
i
+u
s
= 0, (7.12b)
lim
r
r
(d1)/2
_
u
s
r
iku
s
_
= 0, uniformly for all x =
x
[x[
. (7.12c)
Here k > 0 denotes the wave number, n is the refractive index of the medium, u
s
is the
scattered eld, and u is the total eld. Absorbing media are modelled by complex-valued
refractive indices n. We assume that Re n 0, Imn 0. Moreover, we assume that n is
constant and equal to 1 outside of the ball B

:= x R
3
: [x[ , > 0, i.e.
n = 1 a
with supp a B

. The fundamental solution to the Helmholtz equation is given by


(x, y) =
1
4
exp(k[x y[)
[x y[
, for d = 3, (7.13)
(x, y) =
i
4
H
(1)
0
(k[x y[), for d = 2 (7.14)
for x ,= y where H
(1)
0
denotes the Hankel function of the rst kind of order 0.
7. Nonlinear operators 61
A basic tool for the analysis of the direct medium scattering problem is a reformulation
of the system (7.12) as an integral equation of the second kind:
u(x) +k
2
_
B
(x y)a(y)u(y) dy = u
i
(x), x R
d
. (7.15)
(7.15) is called the Lippmann-Schwinger equation.
Theorem 7.9. Assume that a C
1
(R
d
) satises supp a B

. Then any solution u


C
2
(R
d
) to (7.12) satises (7.15). Vice versa, let u C(B

) be a solution to (7.15). Then


u
s
(x) := k
2
_
B
(x y)a(y)u(y) dy, x R
d
(7.16)
belongs to C
2
(R
d
) and satises (7.12).
To prove Theorem 7.9 we need a few preparations. All function v C
2
(B

) C
1
(B

)
satisfy Greens representation formula
v(x) =
_
B
_
v

(y)(x, y) v(y)
(x, y)
(y)
_
ds(y) (7.17)

_
B
_
v(y) +k
2
v(y)
_
(x, y) dy, x B

where denotes the outer normal vector on B

(cf. [CK97b, Theorem 2.1]).


Moreover, we need the following properties of the volume potential
(V )(x) :=
_
R
d
(x, y)(y) dy, x R
d
. (7.18)
Theorem 7.10. Let supp B

. If C(R
d
), then V C
1
(R
d
), and if C
1
(R
d
),
then V C
2
(R
d
). In the latter case
( +k
2
)(V ) = (7.19)
holds true.
The mapping properties of V stated in this theorem are crude, but sucient for our
purposes. In terms of Holder and Sobolev spaces, V is a smoothing operator of order 2.
Proof of Theorem 7.9. Let u C
2
(R
d
) be a solution to (7.12), and let x B

. It
follows from Greens representation formula (7.17) with v = u and u +k
2
u = k
2
au that
u(x) =
_
B
_
u

(y)(x, y) u(y)
(x, y)
(y)
_
ds(y) k
2
_
B
(x, y)a(y)u(y) dy (7.20)
where denotes the outer normal vector on B

. Greens formula (7.17) applied to u


i
gives
u
i
(x) =
_
B
_
u
i

(x, y) u
i
(y)
(x, y)
(y)
_
ds(y). (7.21)
7. Nonlinear operators 62
Finally, we choose R > and apply Greens second theorem to (x, ) and u
s
in B
R
B

_
B
_
u
s

(y)(x, y) u
s
(y)
(x, y)
(y)
_
ds(y)
=
_
B
R
_
u
s

(y)(x, y) u
s
(y)
(x, y)
(y)
_
ds(y)
=
_
B
R
_
u
s

(y) iku
s
(y)
_
(x, y) ds(y) +
_
B
R
u
s
(y)
_
ik(x, y)
(x, y)
(y)
_
ds(y)
where on B
R
is the outer normal vector. Since both u
s
and (x, ) satisfy the Sommerfeld
radiation condition and since (7.12c) implies that [u
s
(y)[ =
O
_
[y[
(d1)/2
_
as [y[ , the
right hand side of the last equation tends to 0 as R . Combining this result with
(7.20) and (7.21) and using (7.12b) shows that (7.15) is satised.
Vice versa, let u C(B

) be a solution to (7.15). Since (, y) satises the Sommerfeld


radiation condition uniformly for y B

, u
s
dened by (7.16) satises (7.12c). Moreover,
u
s
C
1
(R
d
) by Theorem 7.10. Now the second statement in Theorem 7.10 and the
assumption a C
1
c
(R
d
) imply that u
s
C
2
(R
d
) and that
u
s
+k
2
u
s
k
2
au = 0.
Since u
i
+k
2
u
i
= 0, u = u
s
+u
i
satises (7.12a).
Theorem 7.11. The Lippmann-Schwinger equation has a unique solution u C(B

) if
|a|

< (k
2
|V |

)
1
.
Proof. Under the given assumption we have |k
2
V M
a
|

< 1 where M
a
: C(B

) C(B

)
is dened by M
a
v := av. Therefore, the operator I +k
2
V M
a
has a bounded inverse given
by the Neumann series.
Using Riesz theory it can be shown that the smallness assumption on |a|

in Theorem
7.11 can be dropped. However, it is not trivial to show uniqueness of a solution to (7.15)
or equivalently, a solution to (7.12) (cf. [CK97b, Hah98, Kir96]).
Recall the denition of the far-eld pattern from (1.19). As straightforward com-
putation shows that for d = 3 the far eld pattern of (, y) is given by

( x, y) =

3
exp(ik x y),
3
= 1/(4) and that (, y) satises (1.19) uniformly with respect
to y B

. By the asymptotic behavior of the Hankel functions for large arguments


(cf. [CK97b, Leb65]), the same holds true for d = 2 with
2
= e
i/4
/

8k. Hence, for both


d = 2 and d = 3 the far eld pattern of u
s
is given by
u

( x) = k
2

d
_
B
exp(ik x y)a(y)u(y) dy, x S
d1
. (7.22)
The right hand side of this equation denes a linear integral operator E : C(B

)
L
2
(S
d1
) with u

= E(au).
7. Nonlinear operators 63
Now we turn to the inverse problem to recover a from measurements of u

. Since for
d = 3 the far eld pattern u

is a function of two variables whereas the unknown coecient


a is a function of three variables, we cannot expect to be able to reconstruct a from far-eld
measurements corresponding to just one incident eld. A similar argument holds for d = 2.
Therefore, we consider incident elds u
i
(x) = u
i
(x, ) = exp(ik x) from all directions
S
d1
and denote the corresponding solutions to the direct problem by u(x, ), u
s
(x, ),
and u

(x, ). The direct solution operator F


IM
: D(F
IM
) C
1
c
(B

) L
2
(S
d1
S
d1
) is
dened by
(F
IM
(a))( x, ) := u

( x, ), x, S
d1
(7.23)
where u

(, ) is the far eld pattern corresponding to the solution u


s
(, ) of (7.12) with
n = 1a and u
i
= u
i
(, ). Here C
1
c
(B

) is the space of all continuously dierentiable func-


tion on B

with supp a B

, and the domain of F


IM
incorporates the physical restrictions
on n = 1 a:
D(F
IM
) := a C
1
c
(B

) : Re(1 a) 0, Im(a) 0.
It can be shown that F
IM
is one-to-one, i.e. that the far-eld patterns u

(, ) for all
directions S
2
of the incident wave determine the refractive index n = 1 a uniquely
(cf. [CK97b, Hah98, Kir96]).
Theorem 7.12. 1. The operator G : D(F
IM
) C(R
d
S
d1
) is Frechet dierentiable
with respect to the supremum norm on D(F
IM
), and u

:= G

[a]h satises the integral


equation
u

+k
2
V (au

) = k
2
V (hu) (7.24)
for all a D(F
IM
) and h C
1
c
(B

) with u := G(a).
2. F
IM
is Frechet dierentiable with respect to the maximum norm on D(F
IM
).
Proof. 1) The mapping D(F
IM
) L(C(B

S
d1
)), a M
a
where (M
a
v)(x, ) :=
a(x)v(x, ) is linear and bounded with respect to the supremum norm and hence Frechet
dierentiable. It follows from Theorem 7.2, Parts 3 and 5 that the the mapping D(F
IM
)
L(C(B

S
d1
)) a (I +k
2
V M
a
)
1
is Frechet-dierentiable. An application of the chain
rule yields the Frechet dierentiability of G(a) = (I +k
2
V M
a
)
1
u
i
. Now (7.24) can be de-
rived by dierentiating both sides of the Lippmann-Schwinger equation (7.15) with respect
to a using the product rule.
2) Since F(a) = E(M
a
G(a)), this follows from the rst part, the product and the chain
rule.
8. Nonlinear Tikhonov regularization
Let X, Y be Hilbert spaces and F : D(F) X Y a continuous operator. We want to
solve the operator equation
F() = g (8.1)
given noisy data g

Y satisfying |g

g| . Let

denote the exact solution. We


assume that the solution to (8.1) with exact data g = F(

) is unique, i.e. that


F() = g =

(8.2)
although many of the results below can be obtained in a modied form without this as-
sumption.
The straightforward generalization of linear Tikhonov regularization leads to the min-
imization problem
|F() g

|
2
+|
0
|
2
= min! (8.3)
over D(F) where
0
denotes some initial guess of

. The mapping D(F) R,


|F() g

|
2
+ |
0
|
2
is called (nonlinear) Tikhonov functional. Note that
as opposed to the linear case, the element 0 X does not have a special role any more.
Therefore,
0
= 0 is as good as any other initial guess.
As opposed to the linear case it is not clear under the given assumptions if the mini-
mization problem (8.3) has a solution. We will have to impose additional assumptions on
F to ensure existence. Moreover, even if (8.3) has a unique solution for = 0 there may
be more than one global minimizer of (8.3) for > 0.
As in the linear case, it is sometimes useful to consider other penalty terms in (8.3)
than |
0
|
2
.
Weak convergence in Hilbert spaces
Denition 8.1. We say that a sequence (
n
) in a Hilbert space X converges weakly to
X and write
n
as n if

n
, ) , ) , n (8.4)
for all X.
If is another weak limit of the sequence (
n
), then , ) = 0 for all
X. Choosing = shows that = , i.e. a weak limit of a sequence is uniquely
determined. It follows from the Cauchy-Schwarz inequality that strong convergence implies
64
8. Nonlinear Tikhonov regularization 65
weak convergence. The following example shows that the reverse implication is not true in
general.
Example 8.2. Let
n
: n N be an orthonormal system in X. Then |
n

m
| =

2
for m ,= n, i.e. the sequence (
n
) cannot be strongly convergent. On the other hand, it
follows from Bessels inequality

n=1
[
n
, ) [
2
||
2
for X that [
n
, ) [
2
0 as n . Therefore,
n
0.
Lemma 8.3. If $T \in L(X, Y)$, then $T$ is weakly continuous, i.e. $\varphi_n \rightharpoonup \varphi$ implies $T\varphi_n \rightharpoonup T\varphi$ as $n \to \infty$.
Proof. Let $\varphi_n \rightharpoonup \varphi$. Then for any $\psi \in Y$ we have
\[ \langle T\varphi_n, \psi \rangle = \langle \varphi_n, T^*\psi \rangle \to \langle \varphi, T^*\psi \rangle = \langle T\varphi, \psi \rangle. \]
Lemma 8.4. If $\varphi_n \rightharpoonup \varphi$, then $\limsup_{n\to\infty} \|\varphi_n\| \ge \|\varphi\|$, i.e. the norm is weakly lower semicontinuous.
Proof. We have $\langle \varphi_n, \varphi \rangle \to \|\varphi\|^2$ as $n \to \infty$. It follows from the Cauchy-Schwarz inequality that $\limsup_{n\to\infty} \|\varphi_n\| \, \|\varphi\| \ge \|\varphi\|^2$. This implies the assertion.
Theorem 8.5. Every bounded sequence has a weakly convergent subsequence.
Proof. After rescaling we may assume that $(\varphi_n)_{n\in\mathbb{N}}$ is a sequence in $X$ with $\|\varphi_n\| \le 1$ for all $n \in \mathbb{N}$. Let $\{e_j : j \in \mathbb{N}\}$ be a complete orthonormal system in $\tilde{X} := \overline{\operatorname{span}}\{\varphi_n : n \in \mathbb{N}\}$. Since $\langle \varphi_n, e_1 \rangle$ is a bounded sequence of complex numbers, there exists a convergent subsequence $\langle \varphi_{n_1(k)}, e_1 \rangle$. Since $\langle \varphi_{n_1(k)}, e_2 \rangle$ is bounded, there exists a subsequence $n_2(k)$ of $n_1(k)$ such that $\langle \varphi_{n_2(k)}, e_2 \rangle$ is convergent. By continuing this process, we obtain subsequences $n_l(k)$ for all $l \in \mathbb{N}$ such that $\langle \varphi_{n_l(k)}, e_l \rangle$ is convergent and $n_{l+1}(k)$ is a subsequence of $n_l(k)$. The diagonal sequence $(\varphi_{n_l(l)})_{l\in\mathbb{N}}$ has the property that $\langle \varphi_{n_l(l)}, e_k \rangle$ converges to some $\lambda_k \in \mathbb{C}$ as $l \to \infty$ for all $k \in \mathbb{N}$. Then $\varphi := \sum_{k\in\mathbb{N}} \lambda_k e_k$ defines an element of $\tilde{X}$ with $\|\varphi\| \le 1$ since for all $K \in \mathbb{N}$ we have
\[ \sum_{k=1}^{K} |\lambda_k|^2 = \lim_{l\to\infty} \sum_{k=1}^{K} |\langle \varphi_{n_l(l)}, e_k \rangle|^2 \le \limsup_{l\to\infty} \|\varphi_{n_l(l)}\|^2 \le 1. \]
We have to show that $\langle \varphi_{n_l(l)}, \psi \rangle \to \langle \varphi, \psi \rangle$ for all $\psi \in X$. It suffices to consider $\psi \in \tilde{X}$ since $\varphi_{n_l(l)}, \varphi \in \tilde{X}$. Let $\epsilon > 0$ and choose $K \in \mathbb{N}$ such that $\sum_{k=K+1}^{\infty} |\langle \psi, e_k \rangle|^2 \le (\epsilon/4)^2$. There exists $L > 0$ such that
\[ \Big| \Big\langle \varphi_{n_l(l)} - \varphi, \sum_{k=1}^{K} \langle \psi, e_k \rangle e_k \Big\rangle \Big| \le \frac{\epsilon}{2} \]
for $l \ge L$. Now it follows from the triangle inequality and the Cauchy-Schwarz inequality that $|\langle \varphi_{n_l(l)} - \varphi, \psi \rangle| \le \epsilon$ for $l \ge L$.
Definition 8.6. A subset $K$ of a Hilbert space $X$ is called weakly closed if it contains the weak limits of all weakly convergent sequences contained in $K$.
An operator $F : D(F) \subset X \to Y$ is called weakly closed if its graph $\operatorname{gr} F := \{(\varphi, F(\varphi)) : \varphi \in D(F)\}$ is weakly closed in $X \times Y$, i.e. if $\varphi_n \rightharpoonup \varphi$ and $F(\varphi_n) \rightharpoonup g$ imply that $\varphi \in D(F)$ and $F(\varphi) = g$.
Note that if $F$ is weakly continuous and if $D(F)$ is weakly closed, then $F$ is weakly closed. The following result gives sufficient conditions for the weak closedness of $D(F)$.
Theorem 8.7. If $K \subset X$ is convex and closed, then $K$ is weakly closed.
Proof. Let $(\varphi_n)$ be some sequence in $K$ converging weakly to some $\varphi \in X$. By Remark 3.2 there exists a best approximation $\psi \in K$ to $\varphi$ in $K$ satisfying
\[ \operatorname{Re} \langle u - \psi, \varphi - \psi \rangle \le 0 \]
for all $u \in K$. Substituting $u = \varphi_n$ and taking the limit $n \to \infty$ shows that $\|\varphi - \psi\|^2 \le 0$. Hence, $\varphi = \psi$, and in particular $\varphi \in K$.
Convergence analysis
Theorem 8.8. Assume that $F$ is weakly closed. Then the Tikhonov functional (8.3) has a global minimum for all $\alpha > 0$.
Proof. Let $I := \inf_{\varphi \in D(F)} \big( \|F(\varphi) - g^\delta\|^2 + \alpha\|\varphi - \varphi_0\|^2 \big)$ denote the infimum of the Tikhonov functional and choose a sequence $(\varphi_n)$ in $D(F)$ such that
\[ \|F(\varphi_n) - g^\delta\|^2 + \alpha\|\varphi_n - \varphi_0\|^2 \le I + \frac{1}{n}. \tag{8.5} \]
Since $\alpha > 0$, $(\varphi_n)$ is bounded. Hence, by Theorem 8.5 there exists a weakly convergent subsequence $(\varphi_{n(k)})$ with a weak limit $\varphi \in X$. Moreover, it follows from (8.5) that $(F(\varphi_{n(k)}))$ is bounded. Therefore, there exists a further subsequence such that $(F(\varphi_{n(k(l))}))$ is weakly convergent. Now the weak closedness of $F$ implies that $\varphi \in D(F)$ and that $F(\varphi_{n(k(l))}) \rightharpoonup F(\varphi)$ as $l \to \infty$. It follows from Lemma 8.4 that
\[ \|F(\varphi) - g^\delta\|^2 + \alpha\|\varphi - \varphi_0\|^2 \le \limsup_{n\to\infty} \big( \|F(\varphi_n) - g^\delta\|^2 + \alpha\|\varphi_n - \varphi_0\|^2 \big) \le I. \]
Hence, $\varphi$ is a global minimum of the Tikhonov functional.
We do not know if a solution to (8.3) is unique. Nevertheless, it can be shown that an arbitrary sequence of minimizers converges to the exact solution $\varphi^\dagger$ as the noise level $\delta$ tends to 0. In analogy to Definition 3.8 this means that nonlinear Tikhonov regularization is a regularization method.
Theorem 8.9. Assume that $F$ is weakly closed and that (8.2) holds true. Let $\alpha = \alpha(\delta)$ be chosen such that
\[ \alpha(\delta) \to 0 \quad\text{and}\quad \delta^2/\alpha(\delta) \to 0 \quad\text{as } \delta \to 0. \tag{8.6} \]
If $(g^{\delta_k})$ is some sequence in $Y$ such that $\|g^{\delta_k} - g\| \le \delta_k$ and $\delta_k \to 0$ as $k \to \infty$, and if $\varphi_k$ denotes a solution to (8.3) with $g^\delta = g^{\delta_k}$ and $\alpha = \alpha_k = \alpha(\delta_k)$, then $\|\varphi_k - \varphi^\dagger\| \to 0$ as $k \to \infty$.
Proof. Since $\varphi_k$ minimizes the Tikhonov functional, we have
\[ \|F(\varphi_k) - g^{\delta_k}\|^2 + \alpha_k \|\varphi_k - \varphi_0\|^2 \le \|F(\varphi^\dagger) - g^{\delta_k}\|^2 + \alpha_k \|\varphi^\dagger - \varphi_0\|^2 \le \delta_k^2 + \alpha_k \|\varphi^\dagger - \varphi_0\|^2. \]
The assumptions $\delta_k \to 0$ and $\alpha_k \to 0$ imply that
\[ \lim_{k\to\infty} F(\varphi_k) = g, \tag{8.7} \]
and the assumption $\delta_k^2/\alpha_k \to 0$ yields
\[ \limsup_{k\to\infty} \|\varphi_k - \varphi_0\|^2 \le \limsup_{k\to\infty} \big( \delta_k^2/\alpha_k + \|\varphi^\dagger - \varphi_0\|^2 \big) = \|\varphi^\dagger - \varphi_0\|^2. \tag{8.8} \]
It follows from (8.8) and Theorem 8.5 that there exists a weakly convergent subsequence of $(\varphi_k)$ with some weak limit $\varphi \in X$. By virtue of the weak closedness of $F$ we have $\varphi \in D(F)$ and $F(\varphi) = g$, so $\varphi = \varphi^\dagger$ by (8.2).
It remains to show that $\|\varphi_k - \varphi^\dagger\| \to 0$. Assume on the contrary that there exists $\epsilon > 0$ such that
\[ \|\varphi_k - \varphi^\dagger\| \ge \epsilon \tag{8.9} \]
for some subsequence of $(\varphi_k)$, which may be assumed to be identical to $(\varphi_k)$ without loss of generality. By the argument above, we may further assume that $\varphi_k \rightharpoonup \varphi^\dagger$. Since
\[ \|\varphi_k - \varphi^\dagger\|^2 = \|\varphi_k - \varphi_0\|^2 + \|\varphi_0 - \varphi^\dagger\|^2 + 2 \operatorname{Re} \langle \varphi_k - \varphi_0, \varphi_0 - \varphi^\dagger \rangle, \]
it follows from (8.8) that
\[ \limsup_{k\to\infty} \|\varphi_k - \varphi^\dagger\|^2 \le 2\|\varphi^\dagger - \varphi_0\|^2 + 2 \operatorname{Re} \langle \varphi^\dagger - \varphi_0, \varphi_0 - \varphi^\dagger \rangle = 0. \]
This contradicts (8.9).
As in the linear case we need a source condition to establish estimates on the rate of convergence as $\delta \to 0$. Source conditions for nonlinear problems usually involve $\varphi^\dagger - \varphi_0$ instead of $\varphi^\dagger$ because of the loss of the special role of $0 \in X$.
Theorem 8.10. Assume that $F$ is weakly closed and Fréchet differentiable, that $D(F)$ is convex, and that there exists a Lipschitz constant $L > 0$ such that
\[ \|F'[\varphi] - F'[\psi]\| \le L \|\varphi - \psi\| \tag{8.10} \]
for all $\varphi, \psi \in D(F)$. Moreover, assume that the source condition
\[ \varphi^\dagger - \varphi_0 = F'[\varphi^\dagger]^* w, \tag{8.11a} \]
\[ L \|w\| < 1 \tag{8.11b} \]
is satisfied for some $w \in Y$ and that a parameter choice rule $\alpha = c\delta$ with some $c > 0$ is used. Then there exists a constant $C > 0$ independent of $\delta$ such that every global minimum $\varphi_\alpha^\delta$ of the Tikhonov functional satisfies the estimates
\[ \|\varphi_\alpha^\delta - \varphi^\dagger\| \le C \sqrt{\delta}, \tag{8.12a} \]
\[ \|F(\varphi_\alpha^\delta) - g\| \le C \delta. \tag{8.12b} \]
Proof. As in the proof of Theorem 8.9 we use the inequality
\[ \|F(\varphi_\alpha^\delta) - g^\delta\|^2 + \alpha \|\varphi_\alpha^\delta - \varphi_0\|^2 \le \delta^2 + \alpha \|\varphi^\dagger - \varphi_0\|^2 \]
for a global minimum $\varphi_\alpha^\delta$ of the Tikhonov functional. Since this estimate would not give the optimal rate for $\|F(\varphi_\alpha^\delta) - g^\delta\|^2$, we add $\alpha\|\varphi_\alpha^\delta - \varphi^\dagger\|^2 - \alpha\|\varphi_\alpha^\delta - \varphi_0\|^2$ on both sides to obtain
\[ \|F(\varphi_\alpha^\delta) - g^\delta\|^2 + \alpha \|\varphi_\alpha^\delta - \varphi^\dagger\|^2 \le \delta^2 + 2\alpha \operatorname{Re} \langle \varphi^\dagger - \varphi_0, \varphi^\dagger - \varphi_\alpha^\delta \rangle = \delta^2 + 2\alpha \operatorname{Re} \langle w, F'[\varphi^\dagger](\varphi^\dagger - \varphi_\alpha^\delta) \rangle. \]
Here (8.11a) has been used in the second step. Using the Cauchy-Schwarz inequality and inserting the inequality
\[ \|F'[\varphi^\dagger](\varphi^\dagger - \varphi_\alpha^\delta)\| \le \frac{L}{2} \|\varphi^\dagger - \varphi_\alpha^\delta\|^2 + \|F(\varphi^\dagger) - F(\varphi_\alpha^\delta)\| \le \frac{L}{2} \|\varphi^\dagger - \varphi_\alpha^\delta\|^2 + \|F(\varphi_\alpha^\delta) - g^\delta\| + \delta, \]
which follows from Lemma 7.5, yields
\[ \|F(\varphi_\alpha^\delta) - g^\delta\|^2 + \alpha \|\varphi_\alpha^\delta - \varphi^\dagger\|^2 \le \delta^2 + 2\alpha\delta\|w\| + 2\alpha\|w\| \, \|F(\varphi_\alpha^\delta) - g^\delta\| + \alpha L \|w\| \, \|\varphi^\dagger - \varphi_\alpha^\delta\|^2, \]
and hence
\[ \big( \|F(\varphi_\alpha^\delta) - g^\delta\| - \alpha\|w\| \big)^2 + \alpha (1 - L\|w\|) \|\varphi_\alpha^\delta - \varphi^\dagger\|^2 \le (\delta + \alpha\|w\|)^2. \]
Therefore,
\[ \|F(\varphi_\alpha^\delta) - g^\delta\| \le \delta + 2\alpha\|w\| \]
and, due to (8.11b),
\[ \|\varphi_\alpha^\delta - \varphi^\dagger\| \le \frac{\delta + \alpha\|w\|}{\sqrt{\alpha (1 - L\|w\|)}}. \]
With the parameter choice rule $\alpha = c\delta$, this yields (8.12).
By virtue of (5.14), condition (8.11a) is equivalent to
\[ \varphi^\dagger - \varphi_0 = (F'[\varphi^\dagger]^* F'[\varphi^\dagger])^{1/2} \tilde{w}, \qquad L \|\tilde{w}\| < 1 \]
for some $\tilde{w} \in X$. Hence, for linear problems Theorem 8.10 reduces to a special case of Theorem 5.2.
Theorem 8.10 was obtained by Engl, Kunisch and Neubauer [EKN89]. Other Hölder-type source conditions with a priori parameter choice rules are treated in [Neu89], and a posteriori parameter choice rules were investigated in [SEK93]. For further references and results we refer to [EHN96, Chapter 10].
9. Iterative regularization methods
Let $X, Y$ be Hilbert spaces, $D(F) \subset X$ open, and let $F : D(F) \to Y$ be a continuously Fréchet differentiable operator. We consider the nonlinear operator equation
\[ F(\varphi) = g \tag{9.1} \]
and assume that for the given exact data $g$ there exists a unique solution $\varphi^\dagger \in D(F)$ to (9.1). Moreover, we assume that some initial guess $\varphi_0 \in D(F)$ to $\varphi^\dagger$ is given, which will serve as starting point of the iteration. Finally, as before, $g^\delta \in Y$ denotes noisy data satisfying
\[ \|g - g^\delta\| \le \delta. \tag{9.2} \]
Examples
Since the gradient of the cost functional $I(\varphi) := \|F(\varphi) - g^\delta\|^2$ is given by $I'[\varphi]h = 2 \operatorname{Re} \langle F'[\varphi]^* (F(\varphi) - g^\delta), h \rangle$, the minimization of $I$ by moving in the direction of steepest descent leads to the iteration formula
\[ \varphi_{n+1}^\delta = \varphi_n^\delta - \mu F'[\varphi_n^\delta]^* (F(\varphi_n^\delta) - g^\delta), \qquad n = 0, 1, \dots \tag{9.3} \]
known as nonlinear Landweber iteration. The step size parameter $\mu > 0$ should be chosen such that $\mu \|F'[\varphi_n^\delta]^* F'[\varphi_n^\delta]\| \le 1$ for all $n$.
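A minimal Python sketch of (9.3), stopped by the discrepancy principle (9.12) introduced below; the callables F and F_prime (the latter returning the Jacobian matrix), and the values of tau and mu, are assumptions of this sketch.

import numpy as np

def landweber(F, F_prime, g_delta, phi0, delta, tau=3.0, mu=1.0, nmax=100000):
    # nonlinear Landweber iteration (9.3) with discrepancy principle stopping
    phi = phi0.copy()
    for n in range(nmax):
        residual = F(phi) - g_delta
        if np.linalg.norm(residual) <= tau * delta:
            return phi, n                              # stopping index N(delta, g^delta)
        phi = phi - mu * F_prime(phi).T @ residual     # steepest descent step
    return phi, nmax

Here the adjoint $F'[\varphi]^*$ is realized as the transposed Jacobian, which is adequate for real Euclidean spaces; mu must respect the step size restriction stated above.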
It is well known that the convergence of Landweber iteration is very slow. We may expect faster convergence from Newton-type methods. Recall that Newton's method consists in replacing (9.1) in the $n$th step by the linearized equation
\[ F'[\varphi_n^\delta] h_n = g^\delta - F(\varphi_n^\delta) \tag{9.4} \]
and updating $\varphi_n^\delta$ by
\[ \varphi_{n+1}^\delta = \varphi_n^\delta + h_n. \tag{9.5} \]
If (9.1) is ill-posed, then in general $F'[\varphi_n^\delta]^{-1}$ will not be bounded and the range of $F'[\varphi_n^\delta]$ will not be closed (cf. Theorem 7.8). Therefore, the standard Newton method may not be well defined even for exact data since we cannot guarantee that $g - F(\varphi_n) \in R(F'[\varphi_n])$ in each step. Even if $\varphi_n$ is well defined for $n \ge 1$, it does not depend continuously on the data. Therefore, some sort of regularization has to be employed. In principle any regularization method for linear ill-posed problems can be used to compute a stable solution to (9.4). Tikhonov regularization with regularization parameter $\alpha_n > 0$ leads to the iteration formula
\[ \varphi_{n+1}^\delta = \varphi_n^\delta + (\alpha_n I + F'[\varphi_n^\delta]^* F'[\varphi_n^\delta])^{-1} F'[\varphi_n^\delta]^* (g^\delta - F(\varphi_n^\delta)). \tag{9.6} \]
By Theorem 2.1 this is equivalent to solving the minimization problem
\[ h_n = \operatorname{argmin}_{h \in X} \big( \|F'[\varphi_n^\delta] h + F(\varphi_n^\delta) - g^\delta\|^2 + \alpha_n \|h\|^2 \big) \tag{9.7} \]
using the update formula (9.5). This method is known as the Levenberg-Marquardt algorithm. The original idea of the Levenberg-Marquardt algorithm is to minimize $\|F(\varphi) - g^\delta\|^2$ within a trust region $\{\varphi \in X : \|\varphi - \varphi_n^\delta\| \le \rho_n\}$ in which the approximation $F(\varphi) \approx F(\varphi_n^\delta) + F'[\varphi_n^\delta](\varphi - \varphi_n^\delta)$ is assumed to be valid. Here $\alpha_n$ plays the role of a Lagrange parameter. Depending on the agreement of the actual residual $\|F(\varphi_{n+1}^\delta) - g^\delta\|$ and the predicted residual $\|F(\varphi_n^\delta) + F'[\varphi_n^\delta](\varphi_{n+1}^\delta - \varphi_n^\delta) - g^\delta\|$, the trust region radius $\rho_n$ is enlarged or reduced. If $\|\varphi_0 - \varphi^\dagger\|$ is sufficiently small, then a simple parameter choice rule of the form
\[ \alpha_n = \alpha_0 \Big( \frac{1}{2} \Big)^n \tag{9.8} \]
is usually sufficient and computationally more effective for ill-posed problems (cf. [Han97a, Hoh99]).
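The following sketch implements (9.6) with the geometric parameter choice (9.8) and the discrepancy principle as stopping rule; the dense linear solve and all parameter values are illustrative assumptions.

import numpy as np

def levenberg_marquardt(F, F_prime, g_delta, phi0, delta,
                        alpha0=1.0, tau=2.0, nmax=50):
    phi = phi0.copy()
    for n in range(nmax):
        residual = g_delta - F(phi)
        if np.linalg.norm(residual) <= tau * delta:   # discrepancy principle
            break
        J = F_prime(phi)
        alpha_n = alpha0 * 0.5**n                     # parameter choice (9.8)
        h = np.linalg.solve(alpha_n * np.eye(phi.size) + J.T @ J,
                            J.T @ residual)           # regularized Newton step (9.6)
        phi = phi + h
    return phi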
In [Bak92] Bakushinskii suggested a related method called the iteratively regularized Gauss-Newton method where (9.7) is replaced by
\[ h_n = \operatorname{argmin}_{h \in X} \big( \|F'[\varphi_n^\delta] h + F(\varphi_n^\delta) - g^\delta\|^2 + \alpha_n \|h + \varphi_n^\delta - \varphi_0\|^2 \big). \tag{9.9} \]
The penalty term in (9.9) involves the distance of the new iterate $\varphi_{n+1}^\delta = h_n + \varphi_n^\delta$ from the initial guess $\varphi_0$. This yields additional stability since it is not possible that noise components sum up over the iteration. Of course, to achieve convergence, the regularization parameters must be chosen such that $\alpha_n \to 0$ as $n \to \infty$. By Theorem 2.1, the unique solution to (9.9) is given by
\[ h_n = (\alpha_n I + F'[\varphi_n^\delta]^* F'[\varphi_n^\delta])^{-1} \big( F'[\varphi_n^\delta]^* (g^\delta - F(\varphi_n^\delta)) + \alpha_n (\varphi_0 - \varphi_n^\delta) \big). \tag{9.10} \]
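In code, a single step of (9.10) differs from the Levenberg-Marquardt step only by the extra term $\alpha_n(\varphi_0 - \varphi_n^\delta)$ on the right hand side; a sketch under the same assumptions as before:

import numpy as np

def irgnm_step(F, F_prime, g_delta, phi_n, phi0, alpha_n):
    # one iteratively regularized Gauss-Newton update (9.10)
    J = F_prime(phi_n)
    rhs = J.T @ (g_delta - F(phi_n)) + alpha_n * (phi0 - phi_n)
    h = np.linalg.solve(alpha_n * np.eye(phi_n.size) + J.T @ J, rhs)
    return phi_n + h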
Let us compare the Levenberg-Marquardt algorithm (9.6) and the iteratively regularized Gauss-Newton method (9.10) if $F = T$ is linear. In this case performing $n$ steps with constant $\alpha_n = \alpha$ is equivalent to applying iterated Tikhonov regularization with $n$ iterations because (9.7) reduces to
\[ \varphi_{n+1}^\delta = \operatorname{argmin}_{\varphi \in X} \big( \|T\varphi - g^\delta\|^2 + \alpha \|\varphi - \varphi_n^\delta\|^2 \big). \]
Letting $n$ tend to infinity, the Levenberg-Marquardt algorithm becomes Lardy's method. The iteratively regularized Gauss-Newton method (9.9) applied to linear problems is
\[ \varphi_{n+1}^\delta = \operatorname{argmin}_{\varphi \in X} \big( \|T\varphi - g^\delta\|^2 + \alpha_n \|\varphi - \varphi_0\|^2 \big), \]
i.e. it reduces to ordinary Tikhonov regularization with initial guess $\varphi_0$ and regularization parameter $\alpha_n$. In particular, $\varphi_{n+1}^\delta$ is independent of all the previous iterates.
The convergence of Newton-type methods is much faster than the convergence of Landweber iteration. It can be shown that the number of Landweber steps which is necessary to achieve an accuracy comparable to $n$ Newton steps increases exponentially with $n$ (cf. [DES98]). On the other hand, a disadvantage of the Levenberg-Marquardt algorithm and the iteratively regularized Gauss-Newton method is that the (typically full) matrix corresponding to the operator $F'[\varphi_n^\delta] \in L(X, Y)$ has to be computed, and $\alpha_n I + F'[\varphi_n^\delta]^* F'[\varphi_n^\delta]$ has to be inverted in each step. For large scale problems, e.g. the inhomogeneous medium scattering problem discussed in Chapter 7, this can be prohibitively expensive since the computation of one column $F'[\varphi_n^\delta] h$ of this matrix is typically as expensive as an evaluation of the operator $F$.
Setting up a matrix for $F'[\varphi_n^\delta]$ can be avoided by using an iterative method to solve (9.4). Then only the applications of $F'[\varphi_n^\delta]$ and $F'[\varphi_n^\delta]^*$ to a given vector are needed. For many problems highly efficient implementations of the applications of $F'[\varphi_n^\delta]$ and $F'[\varphi_n^\delta]^*$ are available, e.g. by FFT, fast multipole or finite element techniques. The regularization of (9.4) is then achieved by an early stopping of the inner iteration. Using Landweber iteration, a $\nu$-method or the conjugate gradient method for the normal equation as inner iteration leads to the Newton-Landweber method, the Newton-$\nu$-method or the Newton-CG method, respectively (cf. [Han97b, Kal97, Rie99]). In all cases we have a choice of using either $0$ or $\varphi_0 - \varphi_n^\delta$ as starting point of the inner iteration, corresponding to either a Levenberg-Marquardt or an iteratively regularized Gauss-Newton scheme.
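A matrix-free sketch of such an inner iteration: plain conjugate gradients applied to the normal equation $F'[\varphi_n^\delta]^* F'[\varphi_n^\delta] h = F'[\varphi_n^\delta]^* b$, using only the products jvp (with $F'[\varphi_n^\delta]$) and vjp (with its adjoint), both assumed to be supplied by an efficient implementation; the regularization comes from the small number n_inner of inner steps. The function names are assumptions of this sketch.

import numpy as np

def cg_normal_equation(jvp, vjp, b, n_inner):
    # n_inner CG steps on F'* F' h = F'* b, started at h = 0; early stopping
    # of this inner iteration acts as the regularization of (9.4)
    r = vjp(b)                      # initial residual of the normal equation
    h = np.zeros_like(r)
    d = r.copy()
    rho = r @ r
    for _ in range(n_inner):
        Ad = vjp(jvp(d))            # F'* F' d via two matrix-free products
        gamma = rho / (d @ Ad)
        h = h + gamma * d
        r = r - gamma * Ad
        rho_new = r @ r
        d = r + (rho_new / rho) * d
        rho = rho_new
    return h

With b equal to the Newton residual $g^\delta - F(\varphi_n^\delta)$ this realizes a Newton-CG step; starting the inner iteration at $\varphi_0 - \varphi_n^\delta$ instead of 0 would give the Gauss-Newton variant mentioned above.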
Another possibility is to solve the regularized equation
\[ (\alpha_n I + F'[\varphi_n^\delta]^* F'[\varphi_n^\delta]) h_n = r_n \tag{9.11} \]
by some iterative method, e.g. the conjugate gradient method. For exponentially ill-posed problems it is possible to construct special preconditioners for this inner iteration which increase the speed of convergence significantly (cf. [Hoh01]).
Convergence
As in the linear case, the quality of the iterates deteriorates in the presence of noise as the number of iterations tends to infinity. Therefore, the iteration has to be stopped before the propagated data noise error becomes too large. An iteration scheme has to be accompanied by an appropriate stopping rule to give a regularization method. The most commonly used stopping rule is the discrepancy principle, which requires to stop the iteration at the first iteration index $N = N(\delta, g^\delta)$ for which
\[ \|F(\varphi_N^\delta) - g^\delta\| \le \tau \delta \tag{9.12} \]
with some fixed constant $\tau \ge 1$.
Definition 9.1. An iterative method $\varphi_{n+1}^\delta := \Phi_n(\varphi_n^\delta, \dots, \varphi_0^\delta; g^\delta)$ together with a stopping rule $N(\delta, g^\delta)$ is called an iterative regularization method for $F$ if for all $\varphi^\dagger \in D(F)$, all $g^\delta$ satisfying (9.2) with $g := F(\varphi^\dagger)$ and all initial guesses $\varphi_0$ sufficiently close to $\varphi^\dagger$ the following conditions hold:
1. (Well-definedness) $\varphi_n^\delta$ is well defined for $n = 1, \dots, N(\delta, g^\delta)$, and $N(\delta, g^\delta) < \infty$ for $\delta > 0$.
2. (Convergence for exact data) For exact data ($\delta = 0$) either $N = N(0, g) < \infty$ and $\varphi_N = \varphi^\dagger$, or $N = \infty$ and $\|\varphi_n - \varphi^\dagger\| \to 0$ for $n \to \infty$.
3. (Convergence for noisy data) The following regularization property holds:
\[ \sup \{ \|\varphi_{N(\delta, g^\delta)}^\delta - \varphi^\dagger\| : g^\delta \in Y, \ \|g^\delta - g\| \le \delta \} \to 0 \quad\text{as } \delta \to 0. \tag{9.13} \]
The last property in Definition 9.1 can often be established with the help of the following theorem.
Theorem 9.2. Assume that the first two properties in Definition 9.1 are satisfied and that the discrepancy principle (9.12) is used as stopping rule. Moreover, assume that the monotonicity condition
\[ \|\varphi_n^\delta - \varphi^\dagger\| \le \|\varphi_{n-1}^\delta - \varphi^\dagger\| \tag{9.14} \]
and the stability condition
\[ \|\varphi_n^\delta - \varphi_n\| \to 0 \quad\text{as } \delta \to 0 \tag{9.15} \]
are satisfied for $1 \le n \le N(\delta, g^\delta)$. Then the regularization property (9.13) holds true.
Proof. Let $(g^{\delta_k})$ be a sequence in $Y$ such that $\|g^{\delta_k} - g\| \le \delta_k$ and $\delta_k \to 0$ as $k \to \infty$, and define $N_k := N(\delta_k, g^{\delta_k})$.
We first assume that $N$ is a finite accumulation point of $(N_k)$. Without loss of generality we may assume that $N_k = N$ for all $k$. The stability assumption (9.15) implies that
\[ \varphi_N^{\delta_k} \to \varphi_N \quad\text{as } k \to \infty. \tag{9.16} \]
It follows from (9.12) that $\|g^{\delta_k} - F(\varphi_N^{\delta_k})\| \le \tau \delta_k$ for all $k$. Taking the limit $k \to \infty$ shows that $F(\varphi_N) = g$, i.e. $\varphi_N$ is a solution to (9.1). Since we assume the solution to (9.1) to be unique, this implies (9.13).
It remains to consider the case that $N_k \to \infty$ as $k \to \infty$. Without loss of generality we may assume that $N_k$ increases monotonically with $k$. Then (9.14) yields
\[ \|\varphi_{N_k}^{\delta_k} - \varphi^\dagger\| \le \|\varphi_{N_l}^{\delta_k} - \varphi^\dagger\| \le \|\varphi_{N_l}^{\delta_k} - \varphi_{N_l}\| + \|\varphi_{N_l} - \varphi^\dagger\| \tag{9.17} \]
for $l \le k$. Given $\epsilon > 0$, it follows from the second property in Definition 9.1 that the second term on the right hand side of (9.17) is $\le \epsilon/2$ for some $l = l(\epsilon)$. Moreover, it follows from (9.15) that there exists $K \ge l(\epsilon)$ such that the first term on the right hand side of (9.17) is $\le \epsilon/2$ for $k \ge K$. This shows (9.13).
All known proofs establishing the properties in Definition 9.1 for some method require some condition restricting the degree of nonlinearity of the operator $F$. A commonly used condition is
\[ \|F(\varphi) - F(\psi) - F'[\psi](\varphi - \psi)\| \le \eta \|F(\varphi) - F(\psi)\|, \qquad \eta < \frac{1}{2} \tag{9.18} \]
for $\|\varphi - \psi\|$ sufficiently small. At first glance, (9.18) may look like a weak condition since the Taylor remainder can be estimated by
\[ \|F(\varphi) - F(\psi) - F'[\psi](\varphi - \psi)\| \le \frac{L}{2} \|\varphi - \psi\|^2 \tag{9.19} \]
under the Lipschitz condition (8.10) (cf. Lemma 7.5), and the right hand side of (9.19) involves a second power of the distance between $\varphi$ and $\psi$ whereas the right hand side of (9.18) only involves a first power of the distance of the images under $F$. However, if (9.1) is ill-posed, then $\|F(\varphi) - F(\psi)\|$ may be much smaller than $\|\varphi - \psi\|$. In this sense (9.18) is more restrictive than (9.19).
Whereas (9.18) could be shown for a number of parameter identification problems in partial differential equations involving distributed measurements, it has not been possible to show (9.18) for other interesting problems such as inverse scattering problems or impedance tomography, yet.
Our next theorem shows how (9.18) can be used to establish the monotonicity condition (9.14) for Landweber iteration:
Theorem 9.3. Assume that (9.18) is satisfied in some ball $B_\rho := \{\varphi : \|\varphi - \varphi^\dagger\| \le \rho\}$ contained in $D(F)$ and that $\|F'[\varphi]\| \le 1$ for all $\varphi \in B_\rho$. Then the Landweber iteration with $\mu = 1$ together with the discrepancy principle (9.12) with
\[ \tau = 2 \, \frac{1 + \eta}{1 - 2\eta} \]
is well-defined and satisfies (9.14).
Proof. Assume that $\varphi_n^\delta \in B_\rho$ for some $n < N(\delta, g^\delta)$. Then
\[ \begin{aligned}
\|\varphi_{n+1}^\delta - \varphi^\dagger\|^2 - \|\varphi_n^\delta - \varphi^\dagger\|^2
&= 2 \operatorname{Re} \langle \varphi_{n+1}^\delta - \varphi_n^\delta, \varphi_n^\delta - \varphi^\dagger \rangle + \|\varphi_{n+1}^\delta - \varphi_n^\delta\|^2 \\
&= 2 \operatorname{Re} \langle F'[\varphi_n^\delta](\varphi_n^\delta - \varphi^\dagger), g^\delta - F(\varphi_n^\delta) \rangle + \operatorname{Re} \langle g^\delta - F(\varphi_n^\delta), F'[\varphi_n^\delta] F'[\varphi_n^\delta]^* (g^\delta - F(\varphi_n^\delta)) \rangle \\
&= 2 \operatorname{Re} \langle g^\delta - F(\varphi_n^\delta) + F'[\varphi_n^\delta](\varphi_n^\delta - \varphi^\dagger), g^\delta - F(\varphi_n^\delta) \rangle \\
&\qquad - \operatorname{Re} \langle g^\delta - F(\varphi_n^\delta), (I - F'[\varphi_n^\delta] F'[\varphi_n^\delta]^*)(g^\delta - F(\varphi_n^\delta)) \rangle - \|g^\delta - F(\varphi_n^\delta)\|^2 \\
&\le 2 \operatorname{Re} \langle g^\delta - F(\varphi_n^\delta) + F'[\varphi_n^\delta](\varphi_n^\delta - \varphi^\dagger), g^\delta - F(\varphi_n^\delta) \rangle - \|g^\delta - F(\varphi_n^\delta)\|^2
\end{aligned} \]
since $I - F'[\varphi_n^\delta] F'[\varphi_n^\delta]^*$ is positive semidefinite by the assumption $\|F'[\varphi]\| \le 1$. Using (9.18) and the Cauchy-Schwarz inequality we obtain
\[ \begin{aligned}
\|\varphi_{n+1}^\delta - \varphi^\dagger\|^2 - \|\varphi_n^\delta - \varphi^\dagger\|^2
&\le \|g^\delta - F(\varphi_n^\delta)\| \big( 2\delta + 2\eta \|g - F(\varphi_n^\delta)\| - \|g^\delta - F(\varphi_n^\delta)\| \big) \\
&\le \|g^\delta - F(\varphi_n^\delta)\| \big( (2\eta - 1) \|g^\delta - F(\varphi_n^\delta)\| + 2(1 + \eta)\delta \big) \le 0,
\end{aligned} \]
where in the second step we also used $\|g - F(\varphi_n^\delta)\| \le \|g^\delta - F(\varphi_n^\delta)\| + \delta$. Here the last inequality follows from $n < N(\delta, g^\delta)$, the definition (9.12) of the discrepancy principle, and the definition of $\tau$. In particular, $\varphi_{n+1}^\delta \in B_\rho$. The assertion now follows by induction on $n$.
The stability condition (9.15) is obvious from the continuity of $F$ and $F'[\cdot]^*$. Hence, to prove that Landweber iteration is an iterative regularization method in the sense of Definition 9.1, it remains to show that it terminates after a finite number of steps for $\delta > 0$, i.e. $N(\delta, g^\delta) < \infty$, and that it converges for exact data. For this we refer to the original paper [HNS95] or the monograph [EHN96].
Based on Theorem 9.2 it has been shown by Hanke [Han97a, Han97b] that the Levenberg-Marquardt algorithm and the Newton-CG method are iterative regularization methods in the sense of Definition 9.1 if the operator $F$ satisfies the nonlinearity condition
\[ \|F(\varphi) - F(\psi) - F'[\psi](\varphi - \psi)\| \le c \, \|\varphi - \psi\| \, \|F(\varphi) - F(\psi)\| \tag{9.20} \]
for $\|\varphi - \psi\|$ sufficiently small. Although (9.20) is formally stronger than (9.18), for most problems where (9.18) can be shown, (9.20) can be established as well.
Convergence rates
Let
\[ e_n := \varphi_n^\delta - \varphi^\dagger, \qquad n = 0, 1, \dots, \]
denote the error of the $n$th iterate of the iteratively regularized Gauss-Newton method. It follows from (9.10) that
\[ \begin{aligned}
e_{n+1} &= e_n + (T_n^* T_n + \alpha_n I)^{-1} \big( T_n^* (g^\delta - F(\varphi_n^\delta)) + \alpha_n (e_0 - e_n) \big) \\
&= (T_n^* T_n + \alpha_n I)^{-1} \big( \alpha_n e_0 + T_n^* (g^\delta - F(\varphi_n^\delta) + T_n e_n) \big)
\end{aligned} \]
with $T_n := F'[\varphi_n^\delta]$. After a few further manipulations we see that $e_{n+1}$ is the sum of an approximation error $e^{\rm app}_{n+1}$, a data noise error $e^{\rm noi}_{n+1}$ and a nonlinearity error $e^{\rm nl}_{n+1}$ given by
\[ \begin{aligned}
e^{\rm app}_{n+1} &= \alpha_n (T^* T + \alpha_n I)^{-1} e_0, \\
e^{\rm noi}_{n+1} &= (T_n^* T_n + \alpha_n I)^{-1} T_n^* (g^\delta - g), \\
e^{\rm nl}_{n+1} &= (T_n^* T_n + \alpha_n I)^{-1} T_n^* \big( F(\varphi^\dagger) - F(\varphi_n^\delta) + T_n e_n \big) \\
&\qquad + \alpha_n (T_n^* T_n + \alpha_n I)^{-1} (T^* T - T_n^* T_n)(T^* T + \alpha_n I)^{-1} e_0
\end{aligned} \]
where $T := F'[\varphi^\dagger]$. If $F$ is linear, then $e^{\rm nl}_{n+1} = 0$ and we can use the error analysis of Chapter 5. The error $e^{\rm nl}_{n+1}$ caused by the nonlinearity of the operator can be estimated separately. This possibility makes the iteratively regularized Gauss-Newton method easier to analyze than other methods.
Lemma 9.4. Assume that the source condition
\[ \varphi^\dagger - \varphi_0 = (F'[\varphi^\dagger]^* F'[\varphi^\dagger])^\nu w, \qquad \|w\| \le \rho \tag{9.21} \]
is satisfied for some $\frac{1}{2} \le \nu \le 1$ and $w \in X$, and that there exists a Lipschitz constant $L$ such that
\[ \|F'[\varphi] - F'[\psi]\| \le L \|\varphi - \psi\| \tag{9.22} \]
for all $\varphi, \psi \in D(F)$. Moreover, assume that the stopping index $N$ is chosen such that
\[ \alpha_N^{\nu + 1/2} \le \eta \delta < \alpha_n^{\nu + 1/2}, \qquad 0 \le n < N \tag{9.23} \]
with some constant $\eta > 0$, and that $\varphi_n^\delta$ is well-defined. Then
\[ \|e_{n+1}\| \le \Big( \rho + \frac{1}{2\eta} \Big) \alpha_n^\nu + L \rho \Big( \frac{\alpha_n^{\nu - 1/2}}{2} + \|T^* T\|^{\nu - 1/2} \Big) \|e_n\| + \frac{L}{4 \sqrt{\alpha_n}} \|e_n\|^2. \tag{9.24} \]
Proof. It follows from (5.20) that
\[ \|(T_n^* T_n + \alpha_n I)^{-1} T_n^*\| \le \sup_{\lambda \ge 0} \frac{\sqrt{\lambda}}{\lambda + \alpha_n} = \frac{1}{2 \sqrt{\alpha_n}}. \tag{9.25} \]
By virtue of (9.21) and Theorem 5.2 we have $\|e^{\rm app}_{n+1}\| \le \rho \alpha_n^\nu$ as $\nu \le 1$. It follows from (9.25) and (9.23) that $\|e^{\rm noi}_{n+1}\| \le \alpha_n^{-1/2} \delta / 2 \le \alpha_n^\nu / (2\eta)$ for $n < N$. The first term in $e^{\rm nl}_{n+1}$ can be estimated by $\alpha_n^{-1/2} (L/4) \|e_n\|^2$ due to Lemma 7.5 and (9.25). For the second term in $e^{\rm nl}_{n+1}$ we get
\[ \begin{aligned}
\|\alpha_n &(T_n^* T_n + \alpha_n I)^{-1} (T^* T - T_n^* T_n)(T^* T + \alpha_n I)^{-1} e_0\| \\
&= \|\alpha_n (T_n^* T_n + \alpha_n I)^{-1} \big( T_n^* (T - T_n) + (T^* - T_n^*) T \big)(T^* T + \alpha_n I)^{-1} e_0\| \\
&\le L \|e_n\| \, \rho \Big( \frac{\alpha_n^{\nu - 1/2}}{2} + \|T^* T\|^{\nu - 1/2} \Big)
\end{aligned} \]
using (9.22), (9.25), (9.21) and Theorem 5.2. Putting all these estimates together gives the assertion.
Theorem 9.5. Let the assumptions of Lemma 9.4 be satisfied, and assume that
\[ 1 \le \frac{\alpha_n}{\alpha_{n+1}} \le r, \qquad \lim_{n\to\infty} \alpha_n = 0, \qquad \alpha_n > 0 \tag{9.26} \]
with $\alpha_0 \le 1$ and $r > 1$. Moreover, assume that $\eta$ is sufficiently large and $\rho$ is sufficiently small. Then for exact data the error satisfies
\[ \|\varphi_n - \varphi^\dagger\| = O(\alpha_n^\nu), \qquad n \to \infty. \tag{9.27} \]
For noisy data we have
\[ \|\varphi_{N(\delta, g^\delta)}^\delta - \varphi^\dagger\| = O\big( \delta^{\frac{2\nu}{2\nu + 1}} \big) \quad\text{as } \delta \to 0. \tag{9.28} \]
Proof. Under the assumptions of Lemma 9.4 the quantities $\gamma_n := \alpha_n^{-\nu} \|e_n\|$ fulfill the inequality
\[ \gamma_{n+1} \le a + b \gamma_n + c \gamma_n^2, \tag{9.29} \]
where the coefficients are defined by $a := r^\nu (\rho + 1/(2\eta))$, $b := r^\nu L \rho \big( \tfrac{1}{2} + \|T^* T\|^{\nu - 1/2} \big)$, and $c := r^\nu L / 4$. Here we have used that $\nu \ge 1/2$ and $\alpha_n \le \alpha_0 \le 1$, and hence $\alpha_n^{\nu - 1/2} \le 1$. Let $t_1$ and $t_2$ be the solutions to the fixed point equation $a + bt + ct^2 = t$, i.e.
\[ t_1 := \frac{2a}{1 - b + \sqrt{(1 - b)^2 - 4ac}}, \qquad t_2 := \frac{1 - b + \sqrt{(1 - b)^2 - 4ac}}{2c}, \]
let the stopping index $N$ be given by (9.23) and define $C := \max(\gamma_0, t_1)$. We will show by induction that
\[ \gamma_n \le C \tag{9.30} \]
for $0 \le n \le N$ if
\[ b + 2 \sqrt{ac} < 1, \tag{9.31a} \]
\[ \gamma_0 \le t_2, \tag{9.31b} \]
\[ \{\varphi \in X : \|\varphi - \varphi^\dagger\| \le C \alpha_0^\nu\} \subset D(F). \tag{9.31c} \]
It is easy to see that the conditions (9.31) are satisfied if $\eta$ is sufficiently large and $\rho$ is sufficiently small. For $n = 0$, (9.30) is true by the definition of $C$. Assume that (9.30) is true for some $k < N$. Then (9.31c) implies that $\varphi_k^\delta \in D(F)$, i.e. $\varphi_{k+1}^\delta \in X$ is well defined, and (9.29) is true for $n = k$. By (9.31a) we see that $t_1, t_2 \in \mathbb{R}$ and $t_1 < t_2$, and by (9.30) we have $0 \le \gamma_k \le t_1$ or $t_1 < \gamma_k \le \gamma_0$. In the first case, since $a, b, c \ge 0$, we conclude that
\[ \gamma_{k+1} \le a + b \gamma_k + c \gamma_k^2 \le a + b t_1 + c t_1^2 = t_1, \]
and in the second case we use (9.31b) and the fact that $a + (b - 1) t + c t^2 \le 0$ for $t_1 \le t \le t_2$ to show that
\[ \gamma_{k+1} \le a + b \gamma_k + c \gamma_k^2 \le \gamma_k \le \gamma_0. \]
Thus in both cases there holds $\gamma_{k+1} \le C$. This completes the proof of (9.30).
(9.30) immediately implies (9.27). The convergence rate (9.28) follows from the first inequality in (9.23) since
\[ \|\varphi_N^\delta - \varphi^\dagger\| \le C \alpha_N^\nu \le C (\eta \delta)^{\frac{2\nu}{2\nu + 1}}. \]
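The rate (9.28) can be checked empirically. The following sketch runs the iteratively regularized Gauss-Newton method on a toy problem for several noise levels, stopping via (9.23); the operator, the values of nu and eta, and the assumption that the source condition (9.21) holds for this toy problem are all illustrative, so the observed decay need only roughly match $\delta^{2\nu/(2\nu+1)}$.

import numpy as np

rng = np.random.default_rng(1)
m = 60
A = np.tril(np.ones((m, m))) / m                 # smoothing linear part (assumption)
F = lambda p: A @ p + 0.05 * (A @ p)**2          # mildly nonlinear forward map
Fp = lambda p: A + 0.1 * np.diag(A @ p) @ A      # its Jacobian F'[p]
phi_true = np.sin(np.linspace(0.0, np.pi, m))
g = F(phi_true)

def irgnm(g_delta, delta, phi0, alpha0=1.0, r=1.5, eta=1.0, nu=0.5):
    phi, n = phi0.copy(), 0
    # iterate while alpha_n^(nu+1/2) > eta*delta, i.e. the rule (9.23)
    while alpha0 / r**n > (eta * delta)**(2.0 / (2.0 * nu + 1.0)):
        alpha_n = alpha0 / r**n
        J = Fp(phi)
        rhs = J.T @ (g_delta - F(phi)) + alpha_n * (phi0 - phi)
        phi = phi + np.linalg.solve(alpha_n * np.eye(m) + J.T @ J, rhs)
        n += 1
    return phi

for delta in [1e-2, 1e-3, 1e-4, 1e-5]:
    noise = rng.standard_normal(m)
    g_delta = g + delta * noise / np.linalg.norm(noise)
    err = np.linalg.norm(irgnm(g_delta, delta, np.zeros(m)) - phi_true)
    print(delta, err)   # expect decay roughly like delta**(2*nu/(2*nu+1))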
In [BNS97] it has been shown that the iteratively regularized Gauß-Newton method is an iterative regularization method in the sense of Definition 9.1 if the operator $F$ satisfies the following nonlinearity condition: For $\varphi, \tilde\varphi$ in a neighborhood of $\varphi^\dagger$ there exist operators $R(\tilde\varphi, \varphi) \in L(Y)$ and $Q(\tilde\varphi, \varphi) \in L(X, Y)$ and constants $C_Q, C_R > 0$ such that
\[ F'[\tilde\varphi] = R(\tilde\varphi, \varphi) F'[\varphi] + Q(\tilde\varphi, \varphi), \]
\[ \|I - R(\tilde\varphi, \varphi)\| \le C_R, \qquad \|Q(\tilde\varphi, \varphi)\| \le C_Q \|F'[\varphi^\dagger](\tilde\varphi - \varphi)\|. \tag{9.32} \]
Under this assumption order optimal convergence rates have been shown for $\nu < \frac{1}{2}$ using the discrepancy principle. Convergence rates under logarithmic source conditions
\[ \varphi^\dagger - \varphi_0 = f_p(F'[\varphi^\dagger]^* F'[\varphi^\dagger]) w, \qquad \|w\| \le \rho, \]
(cf. (5.21)) have been obtained in [Hoh97]. Such conditions are appropriate for exponentially ill-posed problems such as inverse obstacle scattering problems and can often be roughly interpreted as smoothness conditions in terms of Sobolev spaces.
Bibliography
[Ado95] H.-M. Adorf. Hubble space telescope image restoration in its fourth year. Inverse Problems, 11:639–653, 1995.
[Ale88] G. Alessandrini. Stable determination of conductivities by boundary measurements. Appl. Anal., 27:153–172, 1988.
[Bak92] A. B. Bakushinskii. The problem of the convergence of the iteratively regularized Gauss-Newton method. Comput. Maths. Math. Phys., 32:1353–1359, 1992.
[Bau87] J. Baumeister. Stable Solution of Inverse Problems. Vieweg, Braunschweig, 1987.
[Bau90] H. Bauer. Maß- und Integrationstheorie. de Gruyter, Berlin, New York, 1990.
[BG95] A. Bakushinskii and A. Goncharsky. Ill-Posed Problems: Theory and Applications. Kluwer, Dordrecht, 1995.
[BK89] H. Banks and K. Kunisch. Parameter Estimation Techniques for Distributed Systems. Birkhäuser, Boston, 1989.
[BNS97] B. Blaschke, A. Neubauer, and O. Scherzer. On convergence rates for the iteratively regularized Gauß-Newton method. IMA Journal of Numerical Analysis, 17:421–436, 1997.
[CK83] D. Colton and R. Kreß. Integral Equation Methods in Scattering Theory. Wiley, New York, 1983.
[CK97a] G. Chavent and K. Kunisch. Regularization of linear least squares problems by total bounded variation. ESAIM: Control, Optimisation and Calculus of Variations, 2:359–376, 1997.
[CK97b] D. Colton and R. Kreß. Inverse Acoustic and Electromagnetic Scattering Theory. Springer Verlag, Berlin Heidelberg New York, second edition, 1997.
[DES98] P. Deuflhard, H. Engl, and O. Scherzer. A convergence analysis of iterative methods for the solution of nonlinear ill-posed problems under affinely invariant conditions. Inverse Problems, 14:1081–1106, 1998.
[EG88] H. W. Engl and H. Gfrerer. A posteriori parameter choice for general regularization methods for solving linear ill-posed problems. Appl. Numer. Math., 4:395–417, 1988.
[EHN96] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic Publishers, Dordrecht Boston London, 1996.
[EKN89] H. W. Engl, K. Kunisch, and A. Neubauer. Convergence rates for Tikhonov regularization of nonlinear ill-posed problems. Inverse Problems, 5:523–540, 1989.
[EL93] H. W. Engl and G. Landl. Convergence rates for maximum entropy regularization. SIAM J. Numer. Anal., 30:1509–1536, 1993.
[ER95] H. W. Engl and W. Rundell, eds. Inverse Problems in Diffusion Processes. SIAM, Philadelphia, 1995.
[Gfr87] H. Gfrerer. An a posteriori parameter choice for ordinary and iterated Tikhonov regularization of ill-posed problems leading to optimal convergence rates. Math. Comp., 49:507–522 and S5–S12, 1987.
[Giu84] E. Giusti. Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, Boston, 1984.
[Gla84] V. B. Glasko. Inverse Problems of Mathematical Physics. American Institute of Physics, New York, 1984.
[GP95] R. D. Grigorieff and R. Plato. On a minimax equality for seminorms. Linear Algebra Appl., 221:227–243, 1995.
[Gro83] C. W. Groetsch. Comments on Morozov's discrepancy principle. In: G. Hämmerlin and K. H. Hoffmann, eds, Improperly Posed Problems and Their Numerical Treatment, pp. 97–104. Birkhäuser, Basel, 1983.
[Gro84] C. W. Groetsch. The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Pitman, Boston, 1984.
[Hah98] P. Hähner. On acoustic, electromagnetic, and elastic scattering problems in inhomogeneous media. Habilitation thesis, Göttingen, 1998.
[Han97a] M. Hanke. A regularizing Levenberg-Marquardt scheme, with applications to inverse groundwater filtration problems. Inverse Problems, 13:79–95, 1997.
[Han97b] M. Hanke. Regularizing properties of a truncated Newton-CG algorithm for nonlinear inverse problems. Numer. Funct. Anal. Optim., 18:971–993, 1997.
[Hel80] S. Helgason. The Radon Transform. Birkhäuser Verlag, Boston, 1980.
[HNS95] M. Hanke, A. Neubauer, and O. Scherzer. A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numer. Math., 72:21–37, 1995.
[Hof86] B. Hofmann. Regularization of Applied Inverse and Ill-Posed Problems. Teubner, Leipzig, 1986.
[Hoh97] T. Hohage. Logarithmic convergence rates of the iteratively regularized Gauß-Newton method for an inverse potential and an inverse scattering problem. Inverse Problems, 13:1279–1299, 1997.
[Hoh99] T. Hohage. Iterative Methods in Inverse Obstacle Scattering: Regularization Theory of Linear and Nonlinear Exponentially Ill-Posed Problems. PhD thesis, University of Linz, 1999.
[Hoh00] T. Hohage. Regularization of exponentially ill-posed problems. Numer. Funct. Anal. Optim., 21:439–464, 2000.
[Hoh01] T. Hohage. On the numerical solution of a three-dimensional inverse medium scattering problem. Inverse Problems, 17:1743–1763, 2001.
[HS71] F. Hirzebruch and W. Scharlau. Einführung in die Funktionalanalysis. Spektrum Akademischer Verlag, Heidelberg Berlin Oxford, 1971.
[Isa90] V. Isakov. Inverse Source Problems, Vol. 34 of Mathematical Surveys and Monographs. Amer. Math. Soc., Providence, 1990.
[Isa98] V. Isakov. Inverse Problems for Partial Differential Equations. Springer, 1998.
[Kal97] B. Kaltenbacher. Some Newton-type methods for the regularization of nonlinear ill-posed problems. Inverse Problems, 13:729–753, 1997.
[Kel76] J. B. Keller. Inverse problems. Am. Math. Mon., 83:107–118, 1976.
[Kir96] A. Kirsch. An Introduction to the Mathematical Theory of Inverse Problems. Springer, New York, Berlin, Heidelberg, 1996.
[Kre89] R. Kreß. Linear Integral Equations. Springer Verlag, Berlin Heidelberg New York, 1989.
[Lav67] M. Lavrentiev. Some Improperly Posed Problems of Mathematical Physics. Springer, New York, 1967.
[LB91] R. L. Lagendijk and J. Biemond. Iterative Identification and Restoration of Images. Kluwer, Dordrecht, London, 1991.
[Leb65] N. N. Lebedev. Special Functions and Their Applications. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.
[Lou89] A. K. Louis. Inverse und schlecht gestellte Probleme. Teubner Verlag, Stuttgart, 1989.
[Mai94] B. A. Mair. Tikhonov regularization for finitely and infinitely smoothing operators. SIAM J. Math. Anal., 25:135–147, 1994.
[MM79] A. A. Melkman and C. A. Micchelli. Optimal estimation of linear operators in Hilbert spaces from inaccurate data. SIAM J. Numer. Anal., 16:87–105, 1979.
[Mor84] V. A. Morozov. Methods for Solving Incorrectly Posed Problems. Springer Verlag, New York, Berlin, Heidelberg, 1984.
[Mor93] V. A. Morozov. Regularization Methods for Ill-Posed Problems. CRC Press, Boca Raton, FL, 1993.
[MV92] R. Meise and D. Vogt. Einführung in die Funktionalanalysis. Vieweg Verlag, Braunschweig Wiesbaden, 1992.
[Nac88] A. Nachman. Reconstructions from boundary measurements. Annals of Math., 128:531–576, 1988.
[Nac95] A. Nachman. Global uniqueness for the two-dimensional inverse boundary-value problem. Ann. of Math., 142:71–96, 1995.
[Nat77] F. Natterer. Regularisierung schlecht gestellter Probleme durch Projektionsverfahren. Numer. Math., 28:329–341, 1977.
[Nat86] F. Natterer. The Mathematics of Computerized Tomography. Teubner, Stuttgart, 1986.
[Neu88] A. Neubauer. An a posteriori parameter choice for Tikhonov regularization in the presence of modelling error. Appl. Numer. Math., 4:507–519, 1988.
[Neu89] A. Neubauer. Tikhonov regularization for nonlinear ill-posed problems: optimal convergence rates and finite-dimensional approximation. Inverse Problems, 5:541–557, 1989.
[NS98] M. Z. Nashed and O. Scherzer. Least squares and bounded variation regularization with nondifferentiable functionals. Numer. Funct. Anal. and Optimiz., 19:873–901, 1998.
[PV90] R. Plato and G. Vainikko. On the regularization of projection methods for solving ill-posed problems. Numer. Math., 57:63–79, 1990.
[Ram86] A. Ramm. Scattering by Obstacles. Reidel, Dordrecht, 1986.
[Rie99] A. Rieder. On the regularization of nonlinear ill-posed problems via inexact Newton iterations. Inverse Problems, 15:309–327, 1999.
[RK96] A. G. Ramm and A. I. Katsevich. The Radon Transform and Local Tomography. CRC Press, Boca Raton, FL, 1996.
[Rud73] W. Rudin. Functional Analysis. McGraw-Hill, New York, 1973.
[Sch02] E. Schock. Non-linear ill-posed problems: counter-examples. Inverse Problems, 18:715–717, 2002.
[SEK93] O. Scherzer, H. W. Engl, and K. Kunisch. Optimal a posteriori parameter choice for Tikhonov regularization for solving nonlinear ill-posed problems. SIAM J. Numer. Anal., 30:1796–1838, 1993.
[SG85] C. R. Smith and W. T. Grandy, eds. Maximum-Entropy and Bayesian Methods in Inverse Problems. Fundamental Theories of Physics. Reidel, Dordrecht, 1985.
[SU87] J. Sylvester and G. Uhlmann. A global uniqueness theorem for an inverse boundary value problem. Annals of Math., 125:153–169, 1987.
[TA77] A. N. Tikhonov and V. Arsenin. Solutions of Ill-Posed Problems. Wiley, New York, 1977.
[Tay96] M. Taylor. Partial Differential Equations, Vol. 2. Springer Verlag, New York, 1996.
[TGSY95] A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov, and A. G. Yagola. Numerical Methods for the Solution of Ill-Posed Problems. Kluwer Academic Press, Dordrecht, 1995.
[Tik63] A. N. Tikhonov. On the solution of incorrectly formulated problems and the regularization method. Soviet Math. Doklady, 4:1035–1038, 1963. English translation.
[VO96] C. Vogel and M. Oman. Iterative methods for total variation denoising. SIAM J. Sci. Comput., 17:227–238, 1996.
[Wah90] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, 1990.
[Wei76] J. Weidmann. Lineare Operatoren in Hilberträumen. Teubner, Stuttgart, 1976.