6.1 Introduction
The field of optimization uses linear operators and their adjoints extensively.
Examples of linear operators: differentiation, convolution, the Fourier transform, and the Radon transform, among others.
Example. If $A$ is an $n \times m$ matrix, an example of a linear operator, then we know that $\|y - Ax\|_2$ is minimized when $x = [A'A]^{-1} A'y$ (assuming $A'A$ is invertible).
We want to solve such problems for linear operators between more general spaces. To do so, we need to generalize transpose
and inverse.
6.2 Fundamentals
We write $T : D \to \mathcal{Y}$ when $T$ is a transformation from a set $D$ in a vector space $\mathcal{X}$ to a vector space $\mathcal{Y}$.
The set $D$ is called the domain of $T$. The range of $T$ is denoted
\[ R(T) = \{ y \in \mathcal{Y} : y = T(x) \text{ for } x \in D \} . \]
If $S \subseteq D$, then the image of $S$ is given by
\[ T(S) = \{ y \in \mathcal{Y} : y = T(s) \text{ for } s \in S \} . \]
If $P \subseteq \mathcal{Y}$, then the inverse image of $P$ is given by
\[ T^{-1}(P) = \{ x \in D : T(x) \in P \} . \]
Notation: for a linear operator $A$, we often write $Ax$ instead of $A(x)$.
For linear operators, we can always just use $D = \mathcal{X}$, so we largely ignore $D$ hereafter.
$\forall x \in \mathcal{X}$.
Fact. For a linear operator $A$, an equivalent expression (used widely!) for the operator norm is
\[ |||A||| = \sup_{\|x\| \le 1} \|Ax\| . \]
Fact. If $\mathcal{X}$ is the trivial vector space consisting only of the vector $0$, then $|||A||| = 0$ for any linear operator $A$.
Fact. If $\mathcal{X}$ is a nontrivial vector space, then for a linear operator $A$ we have the following equivalent expressions:
\[ |||A||| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \sup_{\|x\| = 1} \|Ax\| . \]
......................................................................................................................
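Example. Consider $\mathcal{X} = (\mathbb{R}^n, \|\cdot\|_\infty)$ and $\mathcal{Y} = \mathbb{R}$.
Clearly $Ax = a_1 x_1 + \cdots + a_n x_n$, so
\[ \|Ax\|_{\mathcal{Y}} = |Ax| = |a_1 x_1 + \cdots + a_n x_n| \le |a_1||x_1| + \cdots + |a_n||x_n| \le (|a_1| + \cdots + |a_n|) \|x\|_\infty . \]
In fact, if we choose $x$ such that $x_i = \operatorname{sgn}(a_i)$, then $\|x\|_\infty = 1$ and we get equality above. So we conclude $|||A||| = |a_1| + \cdots + |a_n|$.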
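The equality case above can be checked numerically. The following is a minimal sketch, assuming a hypothetical coefficient vector `a` chosen only for illustration; it compares the claimed norm with the value attained at $x_i = \operatorname{sgn}(a_i)$ and with a brute-force search over the vertices of the unit ball of $\|\cdot\|_\infty$ (where a linear functional attains its maximum).

```python
from itertools import product

def functional(a, x):
    """Apply the linear functional Ax = a_1 x_1 + ... + a_n x_n."""
    return sum(ai * xi for ai, xi in zip(a, x))

def sgn(t):
    """Sign of t as -1, 0, or +1."""
    return (t > 0) - (t < 0)

a = [2.0, -3.0, 0.5]  # hypothetical coefficients, for illustration only
claimed_norm = sum(abs(ai) for ai in a)

# The maximizer suggested in the text: x_i = sgn(a_i), with ||x||_inf = 1.
x_star = [sgn(ai) for ai in a]
attained = abs(functional(a, x_star))

# Brute-force check over the vertices of the unit ball of the infinity norm.
best_vertex = max(abs(functional(a, x))
                  for x in product([-1.0, 1.0], repeat=len(a)))
```

All three quantities should agree, which is what the argument in the example predicts.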
Example. What if $\mathcal{X} = (\mathbb{R}^n, \|\cdot\|_p)$? ??
Proposition. A linear operator is bounded iff it is continuous.
$(T_1 + T_2)(x) = T_1(x) + T_2(x)$
$(\alpha T_1)(x) = \alpha (T_1(x))$.
Lemma. With the preceding definitions, when X and Y are normed spaces the following space of operators (!) is a vector space:
B(X , Y) = {bounded linear transformations from X to Y} .
(The proof that this is a vector space is within the next proposition.)
This space is analogous to certain types of dual spaces (see Ch. 5).
Not only is B(X , Y) a vector space, it is a normed space when one uses the operator norm |||A||| defined above.
Proposition. (B(X , Y), ||| |||) is a normed space when X and Y are normed spaces.
Proof. (sketch)
Claim 1. B(X , Y) is a vector space.
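Suppose $T_1, T_2 \in B(\mathcal{X}, \mathcal{Y})$ and $\alpha \in \mathbb{F}$.
\[ \|(\alpha T_1 + T_2)(x)\|_{\mathcal{Y}} = \|\alpha T_1(x) + T_2(x)\|_{\mathcal{Y}} \le |\alpha| \|T_1(x)\|_{\mathcal{Y}} + \|T_2(x)\|_{\mathcal{Y}} \le |\alpha| \, |||T_1||| \, \|x\|_{\mathcal{X}} + |||T_2||| \, \|x\|_{\mathcal{X}} = K \|x\|_{\mathcal{X}} , \]
where $K \triangleq |\alpha| \, |||T_1||| + |||T_2|||$. So $\alpha T_1 + T_2$ is a bounded operator. Clearly $\alpha T_1 + T_2$ is a linear operator.
Claim 2. $|||\cdot|||$ is a norm on $B$.
The hardest part is verifying the triangle inequality:
\[ |||T_1 + T_2||| = \sup_{\|x\|=1} \|(T_1 + T_2)x\|_{\mathcal{Y}} \le \sup_{\|x\|=1} \|T_1 x\|_{\mathcal{Y}} + \sup_{\|x\|=1} \|T_2 x\|_{\mathcal{Y}} = |||T_1||| + |||T_2||| . \]
$\Box$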
Are there other valid norms for B(X , Y)? ??
Remark. We did not really need linearity in this proposition. We could have shown that the space of bounded transformations
from X to Y with ||| ||| is a normed space.
Proof.
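Suppose $\{T_n\}$ is a Cauchy sequence (in $B(\mathcal{X}, \mathcal{Y})$) of bounded linear operators, i.e., $|||T_n - T_m||| \to 0$ as $n, m \to \infty$.
Claim 0. $\forall x \in \mathcal{X}$, the sequence $\{T_n(x)\}$ is Cauchy in $\mathcal{Y}$.
\[ \|T_n(x) - T_m(x)\|_{\mathcal{Y}} = \|(T_n - T_m)(x)\|_{\mathcal{Y}} \le |||T_n - T_m||| \, \|x\|_{\mathcal{X}} \to 0 \text{ as } n, m \to \infty . \]
Since $\mathcal{Y}$ is complete, for any $x \in \mathcal{X}$ the sequence $\{T_n(x)\}$ converges to some point in $\mathcal{Y}$. (This is called pointwise convergence.)
So we can define an operator $T : \mathcal{X} \to \mathcal{Y}$ by $T(x) \triangleq \lim_{n \to \infty} T_n(x)$.
To show $B$ is complete, we must first show $T \in B$, i.e., 1) $T$ is linear, 2) $T$ is bounded.
Then we show 3) $T_n \to T$ (convergence w.r.t. the norm $|||\cdot|||$).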
Claim 1. T is linear
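\[ T(x + z) = \lim_{n \to \infty} T_n(x + z) = \lim_{n \to \infty} [T_n(x) + T_n(z)] = T(x) + T(z) . \]
Claim 3. $T_n \to T$.
Since $\{T_n\}$ is Cauchy, $\forall \varepsilon > 0$, $\exists N \in \mathbb{N}$ s.t. $n, m > N$
$\implies |||T_n - T_m||| \le \varepsilon$
$\implies \|T_n(x) - T_m(x)\|_{\mathcal{Y}} \le \varepsilon \|x\|_{\mathcal{X}}$, $\forall x \in \mathcal{X}$
$\implies \|T_n(x) - T(x)\|_{\mathcal{Y}} \le \varepsilon \|x\|_{\mathcal{X}}$, $\forall x \in \mathcal{X}$ (letting $m \to \infty$), so $|||T_n - T||| \le \varepsilon$.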
Corollary. (B(X , R), ||| |||) is a Banach space for any normed space X .
Why? ??
We write $A \in B(\mathcal{X}, \mathcal{Y})$ as shorthand for "$A$ is a bounded linear operator from normed space $\mathcal{X}$ to normed space $\mathcal{Y}$."
......................................................................................................................
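Definition. In general, if $S : \mathcal{X} \to \mathcal{Y}$ and $T : \mathcal{Y} \to \mathcal{Z}$, then we can define the product operator or composition as a transformation $TS : \mathcal{X} \to \mathcal{Z}$ by $(TS)(x) = T(S(x))$.
Proposition. If $S \in B(\mathcal{X}, \mathcal{Y})$ and $T \in B(\mathcal{Y}, \mathcal{Z})$, then $TS \in B(\mathcal{X}, \mathcal{Z})$.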
6.3 Inverse operators
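Definition. $T : \mathcal{X} \to \mathcal{Y}$ is called a one-to-one mapping of $\mathcal{X}$ into $\mathcal{Y}$ iff $x_1, x_2 \in \mathcal{X}$ and $x_1 \ne x_2 \implies T(x_1) \ne T(x_2)$.
Equivalently, $T$ is one-to-one if the inverse image of any point $y \in \mathcal{Y}$ is at most a single point in $\mathcal{X}$, i.e., $|T^{-1}(\{y\})| \le 1$, $\forall y \in \mathcal{Y}$.
Definition. $T : \mathcal{X} \to \mathcal{Y}$ is called onto iff $T(\mathcal{X}) = \mathcal{Y}$. This is a stronger condition than into.
Fact. If $T : \mathcal{X} \to \mathcal{Y}$ is one-to-one and onto $\mathcal{Y}$, then $T$ has an inverse denoted $T^{-1}$ such that $T(x) = y$ iff $T^{-1}(y) = x$.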
Many optimization methods, e.g., Newton's method, require inversion of the Hessian matrix (or operator) corresponding to a cost function.
Lemma. [3, p. 171]
If $A : \mathcal{X} \to \mathcal{Y}$ is a linear operator between two vector spaces $\mathcal{X}$ and $\mathcal{Y}$, then $A$ is one-to-one iff $N(A) = \{0\}$.
Linearity of inverses
We first look at the algebraic aspects of inverse operators in vector spaces.
Proposition. If a linear operator $A : \mathcal{X} \to \mathcal{Y}$ (for vector spaces $\mathcal{X}$ and $\mathcal{Y}$) has an inverse, then that inverse $A^{-1}$ is also linear.
Proof. Suppose $A^{-1}(y_1) = x_1$, $A^{-1}(y_2) = x_2$, $A(x_1) = y_1$, and $A(x_2) = y_2$. Then by the linearity of $A$ we have $A(x_1 + x_2) = Ax_1 + Ax_2 = y_1 + y_2$, so $A^{-1}(y_1 + y_2) = x_1 + x_2 = A^{-1}(y_1) + A^{-1}(y_2)$.
$\Box$
6.4 Banach inverse theorem
Now we turn to the topological aspects, in normed spaces.
Lemma. (Baire) A Banach space X is not the union of countably many nowhere dense sets in X .
Proof. see text
Theorem. (Banach inverse theorem)
If $A$ is a continuous linear operator from a Banach space $\mathcal{X}$ onto a Banach space $\mathcal{Y}$ for which the inverse operator $A^{-1}$ exists, then $A^{-1}$ is continuous.
relation                  | operator                | requirements
isomorphic                | isomorphism             | linear, onto, 1-1
topologically isomorphic  | topological isomorphism | linear, onto, invertible, continuous
isometrically isomorphic  | isometric isomorphism   | linear, onto, norm preserving ($\implies$ 1-1, continuous)
unitarily equivalent      | unitary                 | isometric isomorphism that preserves inner products
Isomorphic spaces
Definition. Vector spaces $\mathcal{X}$ and $\mathcal{Y}$ are called isomorphic (think: "same structure") iff there exists a one-to-one, linear mapping $T$ of $\mathcal{X}$ onto $\mathcal{Y}$.
In such cases the mapping $T$ is called an isomorphism of $\mathcal{X}$ onto $\mathcal{Y}$.
Since an isomorphism $T$ is one-to-one and onto, $T$ is invertible, and by the linearity-of-inverses proposition in 6.3, $T^{-1}$ is linear.
Example. Consider $\mathcal{X} = \mathbb{R}^2$ and $\mathcal{Y} = \{ f(t) = a + bt \text{ on } [0, 1] : a, b \in \mathbb{R} \}$.
An isomorphism is $f = T(x) \iff f(t) = a + bt$ where $x = (a, b)$, with inverse $T^{-1}(f) = (f(0), f(1) - f(0))$.
Exercise. Any real $n$-dimensional vector space is isomorphic to $\mathbb{R}^n$ (problem 2.14 p. 44) [3, p. 268].
However, they need not be topologically isomorphic [3, p. 270].
......................................................................................................................
So far we have said nothing about norms. In normed spaces we can have a topological relationship too.
Definition. Normed spaces $\mathcal{X}$ and $\mathcal{Y}$ are called topologically isomorphic iff there exists a continuous linear transformation $T$ of $\mathcal{X}$ onto $\mathcal{Y}$ having a continuous inverse $T^{-1}$. The mapping $T$ is called a topological isomorphism.
Theorem. [3, p. 258] Two normed spaces are topologically isomorphic iff there exists a linear transformation $T$ with domain $\mathcal{X}$ and range $\mathcal{Y}$ and positive constants $m$ and $M$ s.t. $m \|x\|_{\mathcal{X}} \le \|Tx\|_{\mathcal{Y}} \le M \|x\|_{\mathcal{X}}$, $\forall x \in \mathcal{X}$.
Example. In the previous example, consider $(\mathcal{X}, \|\cdot\|_\infty)$ and $(\mathcal{Y}, \|\cdot\|_2)$. Then for the same $T$ described in that example:
\[ x = (a, b) \implies \|T(x)\|_{\mathcal{Y}}^2 = \int_0^1 (a + bt)^2 \, dt = a^2 + ab + b^2/3 \le a^2 + |a||b| + b^2/3 \le (1 + 1 + 1/3) \|x\|_\infty^2 = \tfrac{7}{3} \|x\|_\infty^2 . \]
Also
\[ \|T(x)\|_{\mathcal{Y}}^2 = a^2 + ab + b^2/3 = (a + b/2)^2 + b^2/12 = (a \sqrt{3}/2 + b/\sqrt{3})^2 + a^2/4 \ge \|x\|_\infty^2 / 12 . \]
So $\mathcal{X}$ and $\mathcal{Y}$ are topologically isomorphic for the given norms.
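As a sanity check on the two constants, the sketch below samples random pairs $(a, b)$ and records the ratio $\|T(x)\|_{\mathcal{Y}}^2 / \|x\|_\infty^2$ using the closed form derived above. The sampling range, seed, and sample count are arbitrary choices made for this illustration, not part of the notes.

```python
import random

def norm_sq_Tx(a, b):
    """Squared L2[0,1] norm of f(t) = a + b t, using the closed form above."""
    return a * a + a * b + b * b / 3.0

random.seed(0)  # fixed seed so the sketch is reproducible
ratios = []
for _ in range(10000):
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    x_inf_sq = max(abs(a), abs(b)) ** 2
    if x_inf_sq > 0:
        ratios.append(norm_sq_Tx(a, b) / x_inf_sq)

# Every sampled ratio should lie in the interval [1/12, 7/3].
lower, upper = min(ratios), max(ratios)
```

Both bounds are attainable: $(a, b) = (1, 1)$ gives the ratio $7/3$, and $(a, b) = (-1/2, 1)$ gives $1/12$.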
Exercise. $(C[0, 1], \|\cdot\|_\infty)$ and $(C[0, 1], \|\cdot\|_1)$ are not topologically isomorphic. Why? ??
Isometric spaces
Definition. Let $\mathcal{X}$ and $\mathcal{Y}$ be normed spaces. A mapping $T : \mathcal{X} \to \mathcal{Y}$ is called norm preserving iff $\|T(x)\|_{\mathcal{Y}} = \|x\|_{\mathcal{X}}$, $\forall x \in \mathcal{X}$.
In particular, if $T$ is norm preserving, then $|||T||| = 1$. What about the converse? ??
Proposition. If $T$ is linear and norm preserving, then $T$ is one-to-one, i.e., $T(x) = T(z) \implies x = z$.
Remark. To illustrate why we require onto here, consider $T : E^n \to \ell_2$ defined by $T(x) = (x_1, \ldots, x_n, 0, 0, \ldots)$.
This $T$ is linear, one-to-one, and norm preserving, but not onto.
Exercise. Every normed space $\mathcal{X}$ is isometrically isomorphic to a dense subset of a Banach space $\hat{\mathcal{X}}$.
(problem 2.15 on p. 44)
Normed spaces that are isometrically isomorphic can, in some sense, be treated as being identical, i.e., they have identical properties. However, the specific isomorphism can be important sometimes.
Example. Consider $\mathcal{Y} = \ell_p(\mathbb{N}) = \{ (a_1, a_2, \ldots) : \sum_{i=1}^{\infty} |a_i|^p < \infty \}$
and $\mathcal{X} = \ell_p(\mathbb{Z}) = \{ (\ldots, a_{-2}, a_{-1}, a_0, a_1, a_2, \ldots) : \sum_{i=-\infty}^{\infty} |a_i|^p < \infty \}$. Define the mapping $T : \mathcal{X} \to \mathcal{Y}$ by $y = (b_1, b_2, \ldots) = T(x)$ if $b_i = a_{z(i)}$ where $z(i) = (-1)^i \lfloor i/2 \rfloor$. Note that $z : \{1, 2, 3, \ldots\} \to \{0, 1, -1, 2, -2, \ldots\}$.
This mapping $T$ is clearly an isometric isomorphism, so $\ell_p(\mathbb{Z})$ and $\ell_p(\mathbb{N})$ are isometrically isomorphic. Hence we only bother to work with $\ell_p = \ell_p(\mathbb{N})$ since we know all algebraic and topological results will carry over to double-sided sequences.
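The index map $z(i)$ is easy to tabulate. The sketch below is only an illustration: it lists the first few values of $z$ and checks that $z$ restricted to $\{1, \ldots, 2K+1\}$ hits every integer in $[-K, K]$ exactly once, for an arbitrarily chosen $K$. The helper `T` and the test sequence used with it are hypothetical names introduced here.

```python
def z(i):
    """Index map from the text: z(i) = (-1)^i * floor(i/2), for i = 1, 2, 3, ..."""
    return (-1) ** i * (i // 2)

def T(x, n_terms):
    """First n_terms entries of y = T(x), where x maps an integer index to a value."""
    return [x(z(i)) for i in range(1, n_terms + 1)]

first_indices = [z(i) for i in range(1, 8)]

# z restricted to {1, ..., 2K+1} should enumerate the integers -K, ..., K once each.
K = 50
image = sorted(z(i) for i in range(1, 2 * K + 2))

# A two-sided sequence a_k = 2^{-|k|} reordered into a one-sided sequence.
y_head = T(lambda k: 2.0 ** (-abs(k)), 5)
```

Because $T$ only reorders entries, the sum $\sum |a_i|^p$ is unchanged, which is why the mapping preserves the $\ell_p$ norm.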
Fact. If $U$ is unitary, then $U$ is norm preserving since $\|Ux\|^2 = \langle Ux, Ux \rangle = \langle x, x \rangle = \|x\|^2$. Clearly $|||U||| = 1$.
Furthermore, since $U$ is onto, $U$ is an isometric isomorphism. Interestingly, the converse is also true [3, p. 332].
Theorem. A mapping $U$ of $\mathcal{X}$ onto $\mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are inner product spaces, is an isometric isomorphism iff $U$ is a unitary operator.
\[ 4 \langle Ux, Uy \rangle = \|Ux + Uy\|^2 - \|Ux - Uy\|^2 + i \|Ux + iUy\|^2 - i \|Ux - iUy\|^2 \]
\[ = \|U(x + y)\|^2 - \|U(x - y)\|^2 + i \|U(x + iy)\|^2 - i \|U(x - iy)\|^2 \]
$\Box$
= U in Hilbert spaces.
Exercise. Any complex $n$-dimensional inner product space is unitarily equivalent to $\mathbb{C}^n$ [3, p. 332].
Every separable Hilbert space is unitarily equivalent with $\ell_2$ or some $\mathbb{C}^n$ [3, p. 339].
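Example. Continue the previous $f(t) = a + bt$ example, but now use $E^n$ and $(\mathcal{Y}, \langle \cdot, \cdot \rangle_2)$. If $g(t) = c + dt$ then
\[ \langle T x_1, T x_2 \rangle = \langle f, g \rangle = \int_0^1 f(t) g(t) \, dt = \int_0^1 (a + bt)(c + dt) \, dt = ac + \frac{ad}{2} + \frac{bc}{2} + \frac{bd}{3} = [a \ b] \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} = x_1' G x_2 , \]
Example. Soon we will analyze the discrete-time Fourier transform (DTFT) operator, defined by
\[ G = \mathcal{F} g \iff G(\omega) = \sum_{n=-\infty}^{\infty} g_n e^{-\imath \omega n} . \]
We will show that $\mathcal{F} \in B(\ell_2, L_2[-\pi, \pi])$ and $\mathcal{F}$ is invertible. And Parseval's relation from Fourier analysis is that
\[ \left\langle \frac{1}{\sqrt{2\pi}} \mathcal{F} g, \frac{1}{\sqrt{2\pi}} \mathcal{F} h \right\rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} G(\omega) H^*(\omega) \, d\omega = \sum_{n=-\infty}^{\infty} g_n h_n^* = \langle g, h \rangle . \]
So $\ell_2$ and $L_2[-\pi, \pi]$ are unitarily equivalent, and the unitary operator needed is simply $U = \frac{1}{\sqrt{2\pi}} \mathcal{F}$.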
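Parseval's relation can be checked numerically for finitely supported sequences. The sketch below is an illustration under stated assumptions: the sequences `g` and `h` are hypothetical, sequences are stored as dictionaries mapping index to value, and the integral over $[-\pi, \pi]$ is replaced by a uniform $K$-point rule, which is exact (up to rounding) for trigonometric polynomials whose frequencies are smaller than $K$.

```python
import cmath
import math

def dtft(g, omega):
    """G(omega) = sum_n g_n exp(-i omega n) for a finitely supported g (dict n -> g_n)."""
    return sum(gn * cmath.exp(-1j * omega * n) for n, gn in g.items())

def inner_L2(g, h, K=64):
    """(1/2pi) * integral over [-pi, pi] of G(omega) conj(H(omega)) d omega,
    evaluated with a K-point uniform rule."""
    total = 0.0
    for k in range(K):
        omega = -math.pi + 2.0 * math.pi * k / K
        total += dtft(g, omega) * dtft(h, omega).conjugate()
    return total / K

def inner_l2(g, h):
    """<g, h> = sum_n g_n conj(h_n)."""
    return sum(gn * complex(h.get(n, 0)).conjugate() for n, gn in g.items())

# Hypothetical short sequences, for illustration only.
g = {-2: 1.0, 0: 2.0 - 1.0j, 3: 0.5j}
h = {-1: 1.0j, 0: 1.0, 3: -2.0}

lhs = inner_L2(g, h)   # frequency-domain inner product, scaled by 1/(2 pi)
rhs = inner_l2(g, h)   # sequence-domain inner product
```

The two values should agree to rounding error, consistent with $U = \frac{1}{\sqrt{2\pi}} \mathcal{F}$ preserving inner products.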
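The following extension theorem is useful in proving that every separable Hilbert space is isometrically isomorphic to $\ell_2$.
But
\[ \|y - \tilde{y}\|_{\mathcal{Y}} = \lim_{n \to \infty} \|T(x_n) - T(\tilde{x}_n)\| = \lim_{n \to \infty} \|T(x_n - \tilde{x}_n)\| \le \lim_{n \to \infty} |||T||| \, \|x_n - \tilde{x}_n\| \]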
......................................................................................................................
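Claim 6. If $T^{-1} : N \to M$ exists and is bounded, and if $\mathcal{X}$ is a Banach space, then $\tilde{T}$ is onto $\mathcal{Y}$.
Let $y \in \mathcal{Y}$. Since $\overline{N} = \mathcal{Y}$, $\exists \{y_n\} \subset N$ s.t. $y_n \to y$.
By the same reasoning as in Claim 1, $x_n = T^{-1}(y_n)$ is Cauchy in $\mathcal{X}$.
Since $\mathcal{X}$ is a Banach space, $\exists x \in \mathcal{X}$ s.t. $x_n \to x$.
Thus $\tilde{T}(x) = \lim_{n \to \infty} \tilde{T}(x_n) = \lim_{n \to \infty} T T^{-1}(y_n) = \lim_{n \to \infty} y_n = y$.
Thus, $\forall y \in \mathcal{Y}$, $\exists x \in \mathcal{X}$ s.t. $\tilde{T}(x) = y$.
$\Box$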
Proof.
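Let $M = [\{e_i\}_{i=1}^{\infty}]$.
Let $N = [\{\tilde{e}_i\}_{i=1}^{\infty}]$.
For $x = \sum_{i=1}^{n} c_i e_i$:
\[ \|T(x)\|_2 = \left\| T\!\left( \sum_{i=1}^{n} c_i e_i \right) \right\|_2 = \left\| \sum_{i=1}^{n} c_i T(e_i) \right\|_2 = \left\| \sum_{i=1}^{n} c_i \tilde{e}_i \right\|_2 = \sqrt{\sum_{i=1}^{n} |c_i|^2} = \left\| \sum_{i=1}^{n} c_i e_i \right\| = \|x\| , \quad \text{(6-1)} \]
so $|||T||| = 1$ on $M$.
Clearly $T^{-1}$ exists on $N$ and is given by $T^{-1}(\tilde{e}_i) = e_i$, so $|||T^{-1}||| = 1$ on $N$ by a similar argument.
Since $\ell_2$ is a Banach space, we have established all the conditions of the preceding theorem, so there exists a unique linear operator $\tilde{T}$ from $\mathcal{H}$ to $\ell_2$ that is onto $\ell_2$ with $|||\tilde{T}||| = |||T||| = 1$.
Since $\tilde{T}$ is bounded, it is continuous, so one can take limits as $n \to \infty$ in (6-1) to show that $\|\tilde{T} x\| = \|x\|$, so $\tilde{T}$ is norm preserving.
Thus $\tilde{T}$ is an isometric isomorphism for $\mathcal{H}$ and $\ell_2$, so $\mathcal{H}$ and $\ell_2$ are isometrically isomorphic.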
A practical consequence: usually one can search for counterexamples in $\ell_2$ (and its relatives) rather than $L_2$.
..................................................................................................
Example. The (different) spaces of odd and even functions in $L_2[-\pi, \pi]$ are isometrically isomorphic.
Example. Elaborating. Let $\mathcal{H} = L_2[-\pi, \pi]$ and define
\[ e_{2k} = \cos(kt) / c_{2k}, \qquad c_{2k} = \sqrt{\int_{-\pi}^{\pi} \cos^2(kt) \, dt} \]
\[ \mathcal{X} = [\{e_{2k}\}] = \{ f \in L_2 : f \text{ even} \} \]
\[ \mathcal{Y} = [\{e_{2k+1}\}] = \{ f \in L_2 : f \text{ odd} \} . \]
Then $\mathcal{H}$, $\mathcal{X}$, and $\mathcal{Y}$ are each Banach spaces that are isometrically isomorphic to each other!
And each is isometrically isomorphic to $\ell_2$.
Example. $\ell_2$ is isometrically isomorphic to $L_2[-\pi, \pi]$. (Just use the DTFT.)
......................................................................................................................
But an even stronger result holds than the above...
Every separable Hilbert space is unitarily equivalent with $\ell_2$ or some $\mathbb{C}^n$ [3, p. 339].
Summary
6.5 Adjoints in Hilbert spaces
When performing optimization in inner product spaces, often we need the transpose of a particular linear operator. The term
transpose only applies to matrices. The more general concept for linear operators is called the adjoint.
Luenberger presents adjoints in terms of general normed spaces. In my experience, adjoints most frequently arise in inner product
spaces, so in the interest of time and simplicity these notes focus on that case. The treatment here generalizes somewhat the
treatment in Naylor [3, p. 352].
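Recall the following fact from linear algebra. If $A : \mathbb{C}^n \to \mathbb{C}^m$ is an $m \times n$ matrix, then
$|g_y(x)| \le \|Ax\|_{\mathcal{Y}} \|y\|_{\mathcal{Y}} \le |||A||| \, \|x\|_{\mathcal{X}} \|y\|_{\mathcal{Y}}$ by Cauchy-Schwarz and since $A$ is bounded.
Thus $g_y$ is bounded, with $|||g_y||| \le |||A||| \, \|y\|_{\mathcal{Y}}$.
In other words, $g_y \in \mathcal{X}^\star$.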
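Definition. By the Riesz representation theorem (here is where we use completeness), for each such $y \in \mathcal{Y}$ there exists a unique $z = z_y \in \mathcal{X}$ such that
\[ g_y(x) = \langle x, z_y \rangle_{\mathcal{X}} , \quad \forall x \in \mathcal{X} . \]
So we can legitimately define a mapping $A^\star : \mathcal{Y} \to \mathcal{X}$, called the adjoint of $A$, by the relationship $z_y = A^\star(y)$.
The defining property of $A^\star$ is then:
\[ \langle Ax, y \rangle_{\mathcal{Y}} = \langle x, A^\star(y) \rangle_{\mathcal{X}} , \quad \forall x \in \mathcal{X}, \ \forall y \in \mathcal{Y} . \]
(At this point we should write $A^\star(y)$ rather than $A^\star y$ since we have not yet shown $A^\star$ is linear, though we will soon.)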
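In the finite-dimensional case the defining property reduces to the familiar Hermitian-transpose identity. The following sketch checks $\langle Ax, y \rangle = \langle x, A^\star y \rangle$ for a small complex matrix; the matrix and vectors are hypothetical values chosen for illustration, and the inner product is taken to be linear in its first argument, matching the conventions used in these notes.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def inner(u, v):
    """<u, v> = sum_i u_i conj(v_i), linear in the first argument."""
    return sum(ui * complex(vi).conjugate() for ui, vi in zip(u, v))

def hermitian_transpose(A):
    """Conjugate transpose of A, which plays the role of the adjoint here."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

# Hypothetical 2 x 3 matrix A : C^3 -> C^2 and test vectors.
A = [[1 + 2j, 0, -1j],
     [3, 1 - 1j, 2]]
x = [1j, 2, -1 + 1j]
y = [2 - 1j, 1j]

lhs = inner(matvec(A, x), y)                          # <Ax, y> in C^2
rhs = inner(x, matvec(hermitian_transpose(A), y))     # <x, A* y> in C^3
```

The two inner products live in different spaces ($\mathbb{C}^2$ and $\mathbb{C}^3$) but should give the same number.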
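Lemma. $A^\star$ is the only mapping of $\mathcal{Y}$ to $\mathcal{X}$ that satisfies the preceding equality.
Proof. For any $y \in \mathcal{Y}$, suppose $\forall x \in \mathcal{X}$ we have $\langle Ax, y \rangle_{\mathcal{Y}} = \langle x, T_1(y) \rangle_{\mathcal{X}} = \langle x, T_2(y) \rangle_{\mathcal{X}}$.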
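Exercise. Here are some simple facts about adjoints, all of which concur with those of Hermitian transpose in Euclidean space.
$\langle \alpha x, y \rangle = \langle x, \alpha^* y \rangle$, so for $A : \mathcal{X} \to \mathcal{X}$ defined by $Ax = \alpha x$ we have $A^\star y = \alpha^* y$. So reuse of the asterisk is acceptable.
$I^\star = I$,
$0^\star_{\mathcal{X} \to \mathcal{Y}} = 0_{\mathcal{Y} \to \mathcal{X}}$
$A^{\star\star} = A$ (see Thm below)
$(ST)^\star = T^\star S^\star$
$(S + T)^\star = S^\star + T^\star$
$(\alpha A)^\star = \alpha^* A^\star$
Note: these last two properties are unrelated to the question of whether $A^\star$ is a linear operator! ??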
Example. Consider $\mathcal{X} = L_2[0, 1]$, $\mathcal{Y} = \mathbb{C}^2$, and $A : \mathcal{X} \to \mathcal{Y}$ defined by
\[ y = Ax \iff y_1 = [Ax]_1 = \int_0^1 t \, x(t) \, dt, \quad y_2 = [Ax]_2 = \int_0^1 t^2 x(t) \, dt . \]
Proof.
Claim 1. A? is linear.
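\[ \langle x, A^\star(y + z) \rangle_{\mathcal{X}} = \langle Ax, y + z \rangle_{\mathcal{Y}} = \langle Ax, y \rangle_{\mathcal{Y}} + \langle Ax, z \rangle_{\mathcal{Y}} = \langle x, A^\star(y) \rangle_{\mathcal{X}} + \langle x, A^\star(z) \rangle_{\mathcal{X}} = \langle x, A^\star(y) + A^\star(z) \rangle_{\mathcal{X}} , \]
which holds for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, so $A^\star(y + z) = A^\star(y) + A^\star(z)$ by the usual Lemma once again.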
......................................................................................................................
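Claim 2. $A^\star$ is bounded and $|||A^\star||| \le |||A|||$.
\[ \|A^\star y\|_{\mathcal{X}}^2 = \langle A^\star y, A^\star y \rangle_{\mathcal{X}} = \langle A A^\star y, y \rangle_{\mathcal{Y}} \le \|A A^\star y\|_{\mathcal{Y}} \|y\|_{\mathcal{Y}} \le |||A||| \, \|A^\star y\|_{\mathcal{X}} \|y\|_{\mathcal{Y}} \]
so $\|A^\star y\|_{\mathcal{X}} \le |||A||| \, \|y\|_{\mathcal{Y}}$ and thus $|||A^\star||| \le |||A|||$ and hence $A^\star \in B(\mathcal{Y}, \mathcal{X})$.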
......................................................................................................................
Claim 3. A?? = A.
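Since we have shown $A^\star \in B(\mathcal{Y}, \mathcal{X})$, we can legitimately define the adjoint of $A^\star$, denoted $A^{\star\star}$, as the (bounded linear) operator that satisfies $\langle A^\star y, x \rangle_{\mathcal{X}} = \langle y, A^{\star\star} x \rangle_{\mathcal{Y}}$, $\forall x \in \mathcal{X}$, $y \in \mathcal{Y}$.
Since $\langle y, Ax \rangle_{\mathcal{Y}} = \langle Ax, y \rangle_{\mathcal{Y}}^* = \langle x, A^\star y \rangle_{\mathcal{X}}^* = \langle A^\star y, x \rangle_{\mathcal{X}} = \langle y, A^{\star\star} x \rangle_{\mathcal{Y}}$, by the previous uniqueness arguments we see $A^{\star\star} = A$.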
......................................................................................................................
Claim 4. $|||A^\star||| = |||A|||$.
From Claim 2 with $A^\star$: $|||A^{\star\star}||| \le |||A^\star|||$ or equivalently: $|||A||| \le |||A^\star|||$.
$\Box$
Corollary.
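Under the same conditions as above: $|||A^\star A||| = |||A A^\star||| = |||A|||^2 = |||A^\star|||^2$.
Proof. Recalling that $|||ST||| \le |||S||| \, |||T|||$ we have by the preceding theorem:
\[ |||A^\star A||| \le |||A^\star||| \, |||A||| = |||A|||^2 = |||A^\star|||^2 , \]
and
\[ \|Ax\|_{\mathcal{Y}}^2 = \langle Ax, Ax \rangle_{\mathcal{Y}} = \langle Ax, (A^\star)^\star x \rangle_{\mathcal{Y}} = \langle A^\star A x, x \rangle_{\mathcal{X}} \le \|A^\star A x\| \, \|x\| \le |||A^\star A||| \, \|x\|^2 \]
so $|||A|||^2 \le |||A^\star A|||$. Combining yields the equality $|||A|||^2 = |||A^\star A|||$. The rest is obvious using $A^{\star\star} = A$.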
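Proposition. If $A \in B(\mathcal{X}, \mathcal{Y})$ is invertible, where $\mathcal{X}$, $\mathcal{Y}$ are Hilbert spaces, then $A^\star$ has an inverse and $(A^\star)^{-1} = (A^{-1})^\star$.
Proof.
Claim 1. $A^\star : \mathcal{Y} \to \mathcal{X}$ is one-to-one into $\mathcal{X}$.
Consider $y_1, y_2 \in \mathcal{Y}$ with $y_1 \ne y_2$, but suppose $A^\star y_1 = A^\star y_2$, so $A^\star d = 0$ where $d = y_2 - y_1 \ne 0$.
Thus $\langle x, A^\star d \rangle_{\mathcal{X}} = 0$, $\forall x \in \mathcal{X}$ and hence $\langle Ax, d \rangle_{\mathcal{Y}} = 0$, $\forall x \in \mathcal{X}$.
Since $A$ is invertible, we can make the change of variables $z = Ax$ and hence $\langle z, d \rangle_{\mathcal{Y}} = 0$, $\forall z \in \mathcal{Y}$.
But this implies $d = 0$, contradicting the supposition that $d \ne 0$. So $A^\star$ is one-to-one.
Claim 2. $A^\star : \mathcal{Y} \to \mathcal{X}$ is onto $\mathcal{X}$.
By the Banach inverse theorem, since $A \in B(\mathcal{X}, \mathcal{Y})$ and $A$ is invertible, $A^{-1} \in B(\mathcal{Y}, \mathcal{X})$, so $A^{-1}$ has its own adjoint, $(A^{-1})^\star$.
Pick any $z \in \mathcal{X}$. Then for any $x \in \mathcal{X}$, $\langle x, z \rangle_{\mathcal{X}} = \langle A^{-1} A x, z \rangle_{\mathcal{X}} = \langle Ax, (A^{-1})^\star z \rangle_{\mathcal{Y}} = \langle x, A^\star (A^{-1})^\star z \rangle_{\mathcal{X}}$, so $z = A^\star (A^{-1})^\star z$.
Claim 3. $(A^\star)^{-1} = (A^{-1})^\star$.
Since $A^\star$ is one-to-one and onto $\mathcal{X}$, $A^\star$ is invertible.
Furthermore, $z = A^\star (A^{-1})^\star z \implies (A^\star)^{-1} z = (A^{-1})^\star z$, $\forall z \in \mathcal{X}$. Thus $(A^\star)^{-1} = (A^{-1})^\star$.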
1
= B.
Definition. $A \in B(\mathcal{H}, \mathcal{H})$, where $\mathcal{H}$ is a real Hilbert space, is called self adjoint if $A^\star = A$.
Exercise. An orthogonal projection $P_M : \mathcal{H} \to \mathcal{H}$, where $M$ is a closed subspace in a Hilbert space $\mathcal{H}$, is self adjoint. ??
Exercise. Conversely, if $P \in B(\mathcal{H}, \mathcal{H})$ and $P^2 = P$ and $P^\star = P$, then $P$ is an orthogonal projection operator. (L6.16)
Definition. A self-adjoint bounded linear operator $A$ on a Hilbert space $\mathcal{H}$ is positive semidefinite iff $\langle x, Ax \rangle \ge 0$, $\forall x \in \mathcal{H}$.
(Caution: the proof on [3, p. 358] is incomplete w.r.t. the onto aspects.)
$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, for which $U^\star U = I$ but $U$ is not onto.
Example.
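Consider $\mathcal{X} = \mathcal{Y} = \ell_2$ and the (linear) discrete-time convolution operator $A : \ell_2 \to \ell_2$ defined by
\[ z = Ax \iff z_n = \sum_{k=-\infty}^{\infty} h_{n-k} x_k , \quad n \in \mathbb{Z}, \]
where we assume that $h \in \ell_1$, which is equivalent to BIBO stability. We showed previously that $\|Ax\|_2 \le \|h\|_1 \|x\|_2$, so $A$ is bounded with $|||A||| \le \|h\|_1$, so $A$ has an adjoint.
(Later we will show $|||A||| = \|H\|_\infty$ where $H$ is the frequency response of $h$.)
Since $A$ is bounded, it is legitimate to search for its adjoint:
\[ \langle Ax, y \rangle = \sum_{n=-\infty}^{\infty} \left[ \sum_{k=-\infty}^{\infty} h_{n-k} x_k \right] y_n^* = \sum_{k=-\infty}^{\infty} x_k \left[ \sum_{n=-\infty}^{\infty} h_{n-k}^* y_n \right]^* = \langle x, A^\star y \rangle , \]
so
\[ [A^\star y]_n = \sum_{k=-\infty}^{\infty} h_{k-n}^* y_k , \]
which is convolution with $h_{-n}^*$.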
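The adjoint relationship for convolution can be checked on finitely supported sequences. This is a minimal sketch, assuming sequences are stored as dictionaries from integer index to value; the particular `h`, `x`, and `y` are hypothetical, and the helper names are introduced only for this illustration.

```python
def conv(h, x):
    """z_n = sum_k h_{n-k} x_k for finitely supported sequences stored as dicts."""
    z = {}
    for m, hm in h.items():
        for k, xk in x.items():
            z[m + k] = z.get(m + k, 0) + hm * xk
    return z

def adjoint_conv(h, y):
    """[A* y]_n = sum_k conj(h_{k-n}) y_k, i.e., convolution with conj(h_{-n})."""
    h_rev = {-m: complex(hm).conjugate() for m, hm in h.items()}
    return conv(h_rev, y)

def inner(u, v):
    """<u, v> = sum_n u_n conj(v_n) over the support of u."""
    return sum(un * complex(v.get(n, 0)).conjugate() for n, un in u.items())

# Hypothetical impulse response and test sequences.
h = {0: 1.0, 1: 0.5j, 2: -0.25}
x = {-1: 2.0, 0: 1j, 4: 1.0 - 1j}
y = {0: 1.0, 1: -1j, 3: 2.0, 6: 0.5}

lhs = inner(conv(h, x), y)            # <Ax, y>
rhs = inner(x, adjoint_conv(h, y))    # <x, A* y>
```

In signal-processing terms, the adjoint of filtering with $h$ is filtering with the conjugated, time-reversed response, which is correlation with $h$.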
6.6 Relations between the four spaces
The following theorem relates the null spaces and range spaces of a linear operator and its adjoint.
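Remark. Luenberger uses the notation $[R(A)]$ but this seems unnecessary since $R(A)$ is a subspace.
Theorem. If $A \in B(\mathcal{X}, \mathcal{Y})$ where $\mathcal{X}$, $\mathcal{Y}$ are Hilbert spaces, then
1. $\{R(A)\}^\perp = N(A^\star)$,
2. $\overline{R(A)} = \{N(A^\star)\}^\perp$,
3. $\{R(A^\star)\}^\perp = N(A)$,
4. $\overline{R(A^\star)} = \{N(A)\}^\perp$.
Proof.
Now pick $y \in \{R(A)\}^\perp$. Then for all $x \in \mathcal{X}$, $0 = \langle Ax, y \rangle_{\mathcal{Y}} = \langle x, A^\star y \rangle_{\mathcal{X}}$. So $A^\star y = 0$, i.e., $y \in N(A^\star)$.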
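Example. Consider $A : L_2[\mathbb{R}] \to L_2[\mathbb{R}]$ (which is complete [3, p. 589]) defined by the shifted filtering operator:
\[ y = Ax \iff y(t) = (Ax)(t) = \int_{-\infty}^{\infty} \operatorname{sinc}^2(t - 3 - \tau) \, x(\tau) \, d\tau , \]
where $\operatorname{sinc}(t) = \frac{\sin(\pi t)}{\pi t}$. The adjoint is
\[ (A^\star y)(t) = \int_{-\infty}^{\infty} \operatorname{sinc}^2(t + 3 - \tau) \, y(\tau) \, d\tau . \]
The nullspace of $A^\star$ consists of signals in $L_2$ whose spectrum is zero over the frequencies $(-1/2, 1/2)$. The range of $A$ is all signals in $L_2$ that are band-limited to that same range, so the orthogonal complement is the same as $N(A^\star)$. (Picture)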
Exercise. Why did I use sinc2 () rather than sinc()? ??
6.7 Duality relations for convex cones
skip
6.8 Geometric interpretation of adjoints
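When $A \in B(\mathcal{X}, \mathcal{Y})$ with $\mathcal{X}$ and $\mathcal{Y}$ Hilbert spaces, consider the following hyperplanes:
\[ V_1 = \{ x \in \mathcal{X} : \langle x, A^\star y \rangle_{\mathcal{X}} = 1 \} \text{ for some } y \in \mathcal{Y}, \]
\[ V_2 = \{ y \in \mathcal{Y} : \langle A x_0, y \rangle_{\mathcal{Y}} = 1 \} \text{ for some } x_0 \in \mathcal{X} . \]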
skip
6.9 The normal equations (No exact solutions, so we seek a minimum-norm, unconstrained approximation.)
We previously explored the normal equations in a setting where R(A) was finite dimensional. Now we have the tools to generalize.
The following theorem illustrates the fundamental role of adjoints in optimization.
Theorem. Let $A \in B(\mathcal{X}, \mathcal{Y})$ where $\mathcal{X}$ and $\mathcal{Y}$ are Hilbert spaces.
For a fixed $y \in \mathcal{Y}$, a vector $x \in \mathcal{X}$ minimizes $\|y - Ax\|_{\mathcal{Y}}$ iff $A^\star A x = A^\star y$.
Proof. Consider the subspace $M = R(A)$. Then the minimization problem is equivalent to $\inf_{m \in M} \|y - m\|$.
By the pre-projection theorem, $m_\star \in M$ achieves the infimum iff $y - m_\star \perp M$, i.e., $y - m_\star \in M^\perp = [R(A)]^\perp = N(A^\star)$, by a previous theorem. Thus, $0 = A^\star (y - m_\star) = A^\star y - A^\star A x$, for some $x \in \mathcal{X}$.
$\Box$
There is no claim of existence here, since $R(A)$ might not be closed.
There is no claim of uniqueness of $x_\star$ here, since although $m_\star$ will be unique, there may be multiple solutions to $m_\star = Ax$.
If a minimum distance solution $x_\star$ exists and $A^\star A$ is invertible, then the solution is unique and has the familiar form:
\[ x_\star = (A^\star A)^{-1} A^\star y . \]
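A small Euclidean instance makes the theorem concrete. The sketch below is an illustration with hypothetical data: a line fit at $t = 0, 1, 2$, so that $A^\star = A^T$. It solves $A^\star A x = A^\star y$ with exact rational arithmetic and then checks the orthogonality condition $A^\star (y - A x_\star) = 0$ from the proof.

```python
from fractions import Fraction as F

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve_2x2(G, b):
    """Solve G x = b for an invertible 2 x 2 matrix G by Cramer's rule."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(b[0] * G[1][1] - G[0][1] * b[1]) / det,
            (G[0][0] * b[1] - G[1][0] * b[0]) / det]

# Hypothetical data: fit y ~ x_1 + x_2 t at t = 0, 1, 2.
A = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)]]
y = [F(1), F(2), F(2)]

At = transpose(A)                                        # A* for a real matrix
G = matmul(At, A)                                        # A* A
b = [row[0] for row in matmul(At, [[v] for v in y])]     # A* y
x_hat = solve_2x2(G, b)                                  # solves the normal equations

# The residual y - A x_hat should be orthogonal to every column of A.
residual = [yi - sum(a * x for a, x in zip(row, x_hat)) for row, yi in zip(A, y)]
orth = [sum(c * r for c, r in zip(col, residual)) for col in At]
```

Here $A^\star A$ is invertible because the columns of $A$ are linearly independent, so the minimizer is unique.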
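Example. Consider approximating $y \in \mathcal{H}$ by a linear combination $\sum_{i=1}^{n} \alpha_i x_i$ of linearly independent vectors $x_1, \ldots, x_n \in \mathcal{H}$.
We know how to solve this from Ch. 3, but the operator notation provides a concise expression.
Define the operator $A \in B(\mathbb{C}^n, \mathcal{H})$ by
\[ A \alpha \triangleq \sum_{i=1}^{n} \alpha_i x_i \quad \text{where } \alpha = (\alpha_1, \ldots, \alpha_n) . \]
Note that $\|A\alpha\|^2 = \alpha' G \alpha \le \lambda_{\max}(G) \|\alpha\|^2$ where $G = A^\star A$ is the Gram matrix. Since the $x_i$'s are linearly independent, $G$ is symmetric positive definite so its eigenvalues are real and positive. So $A$ is bounded and $|||A||| = \sqrt{\lambda_{\max}(G)}$.
Our goal is to minimize $\|y - A\alpha\|$ over $\alpha \in \mathbb{C}^n$. By the preceding theorem, the optimal solution must satisfy $A^\star A \alpha = A^\star y$.
What is $A^\star : \mathcal{H} \to \mathbb{C}^n$ here? Recall we need $\langle A\alpha, y \rangle_{\mathcal{H}} = \langle \alpha, A^\star y \rangle_{\mathbb{C}^n}$, $\forall \alpha \in \mathbb{C}^n$, so
\[ \langle A\alpha, y \rangle_{\mathcal{H}} = \left\langle \sum_{i=1}^{n} \alpha_i x_i, y \right\rangle_{\mathcal{H}} = \sum_{i=1}^{n} \alpha_i \langle x_i, y \rangle_{\mathcal{H}} = \sum_{i=1}^{n} \alpha_i \langle y, x_i \rangle_{\mathcal{H}}^* = \sum_{i=1}^{n} \alpha_i [A^\star y]_i^* \implies A^\star y = ( \langle y, x_1 \rangle_{\mathcal{H}}, \ldots, \langle y, x_n \rangle_{\mathcal{H}} ) . \]
Thus one can easily show that $A^\star A \alpha = A^\star y$ is equivalent to the usual normal equations.
So no computational effort has been saved, but the notation is more concise. Furthermore, the notation $(A^\star A)^{-1} A^\star y$ is comfortingly similar to the notation $(A'A)^{-1} A'y$ that we use for the least-squares solution of linear systems of equations in Euclidean space. So with everything defined appropriately, the generalization to arbitrary Hilbert spaces is very natural.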
6.10 The dual problem (for minimum norm solutions)
If y = Ax has multiple solutions, then in some contexts it is reasonable to choose the solution that minimizes some type of norm.
However, the appropriate norm is not necessarily the norm induced by the inner product.
Exercise. Generalize Luenberger's treatment to weighted norms.
Theorem. Let $\mathcal{X}$ and $\mathcal{Y}$ be Hilbert spaces and $A \in B(\mathcal{X}, \mathcal{Y})$. Suppose $y \in R(A)$ is given, i.e., $y = A x_0$ for some $x_0 \in \mathcal{X}$.
Assume that $R(A^\star)$ is closed in $\mathcal{X}$. ERROR in Luenberger p. 161.
The unique vector $x_\star \in \mathcal{X}$ having minimum norm and satisfying $Ax = y$ is characterized by:
\[ x_\star = \{ A^\star z : A A^\star z = y, \ z \in \mathcal{Y} \} . \]
Proof. Since $x_0$ is one solution to $Ax = y$, the general solution has the form $x = x_0 - m$, where $m \in M \triangleq N(A)$.
In other words, we seek the minimum norm vector in the linear variety
\[ V = \{ x \in \mathcal{X} : Ax = y \} = \{ x_0 - m : m \in M \} . \]
Since $A$ is continuous, $N(A)$ is a closed subspace (homework). Thus $V$ is a closed linear variety in a Hilbert space, and as such has a unique element $x_\star$ of minimum norm by the (generalized) projection theorem, and that element is characterized by the two conditions $x_\star = x_0 - m_\star \perp M$, and $x_\star \in V$.
Since $R(A^\star)$ was assumed closed, by the previous 4-space theorem we have $M^\perp = \{N(A)\}^\perp = \overline{R(A^\star)} = R(A^\star)$.
Thus $x_\star \perp M \implies x_\star \in M^\perp = R(A^\star) \implies x_\star = A^\star z$ for some $z \in \mathcal{Y}$.
(There may be more than one such $z$.)
Furthermore, $x_\star \in V \implies A x_\star = y \implies A A^\star z = y$.
($x_\star$ is unique even if there are many such $z$ values!)
$\Box$
......................................................................................................................
If $A A^\star$ is invertible, then the minimum norm solution has the form:
\[ x_\star = A^\star (A A^\star)^{-1} y . \]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
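Example. $\mathcal{X} = \mathbb{R}^3$, $\mathcal{Y} = \mathbb{R}^2$, $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$, $y = (3, 3)$.
There are multiple solutions including $x_1 = (0, 0, 3)$ and $x_2 = (3, 3, 0)$ and convex combinations thereof.
Here $A^\star = A^T$ so $A A^\star = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, $[A A^\star]^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$, $z = [A A^\star]^{-1} y = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and $x_\star = A^\star z = (1, 1, 2)$.
Of course $x_\star$ has a smaller 2-norm than the other solutions above.
However, $x_1$ is more sparse which can be important in some applications.
Example. $\mathcal{X} = \mathbb{R}^1$, $\mathcal{Y} = \mathbb{R}^2$, $A = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $y = (2, 2)$. Then, using MATLAB's pinv function, $\hat{x} = 2$.
In this case, $A A^\star = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, so there are multiple solutions to $A A^\star z = y$, each of which leads to the same $\hat{x} = A^\star z$ however!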
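The first example can be reproduced with exact rational arithmetic. The sketch below follows the recipe $x_\star = A^\star (A A^\star)^{-1} y$ for the matrix and data given in the example; the variable names are choices made for this illustration.

```python
from fractions import Fraction as F

A = [[F(1), F(0), F(1)],
     [F(0), F(1), F(1)]]
y = [F(3), F(3)]

# A A* (2 x 2), using A* = A^T for a real matrix.
AAt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]

# Solve (A A*) z = y by Cramer's rule.
det = AAt[0][0] * AAt[1][1] - AAt[0][1] * AAt[1][0]
z = [(y[0] * AAt[1][1] - AAt[0][1] * y[1]) / det,
     (AAt[0][0] * y[1] - AAt[1][0] * y[0]) / det]

# Minimum norm solution x_star = A* z.
x_star = [sum(A[i][j] * z[i] for i in range(2)) for j in range(3)]

def norm_sq(v):
    """Squared Euclidean norm."""
    return sum(vi * vi for vi in v)
```

The squared norms of $x_\star$, $x_1$, and $x_2$ are 6, 9, and 18 respectively, which matches the remark that $x_\star$ has the smallest 2-norm among the listed solutions.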
......................................................................................................................
Example. The downsampling by averaging operator $A : \ell_2 \to \ell_2$ is defined by $Ax = (x_1/2 + x_2/2, \ x_3/2 + x_4/2, \ \ldots)$.
One can show that this is bounded with $|||A||| = 1/\sqrt{2}$, since $|||[1/2 \ \ 1/2]||| = 1/\sqrt{2}$.
The adjoint is $A^\star y = (y_1/2, y_1/2, y_2/2, y_2/2, \ldots)$, so $A A^\star z = (z_1/2, z_2/2, \ldots) = \frac{1}{2} z \implies A A^\star = \frac{1}{2} I$. So $z = 2y$.
Thus $x_\star = A^\star z = 2 A^\star y = (y_1, y_1, y_2, y_2, \ldots)$, which is a sensible solution in this application.
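A finite-length version of this example is easy to try. The sketch below truncates the sequences to finite lists (an assumption made for illustration; the notes work in $\ell_2$), checks the adjoint identity on hypothetical data, and forms the minimum norm solution $x_\star = 2 A^\star y$.

```python
def A(x):
    """Average adjacent pairs: (x1/2 + x2/2, x3/2 + x4/2, ...); len(x) must be even."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def A_adj(y):
    """Adjoint: (y1/2, y1/2, y2/2, y2/2, ...)."""
    out = []
    for yi in y:
        out += [yi / 2, yi / 2]
    return out

def inner(u, v):
    """Real Euclidean inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical finite-length data.
x = [1.0, 3.0, -2.0, 4.0, 0.5, 0.5]
y = [2.0, -1.0, 4.0]

adjoint_ok = abs(inner(A(x), y) - inner(x, A_adj(y))) < 1e-12
z = [2 * yi for yi in y]      # solves A A* z = y because A A* = (1/2) I
x_star = A_adj(z)             # minimum norm solution: each y_i repeated twice
```

Applying `A` to `x_star` returns `y`, so the "repeat each sample" upsampler is an exact right inverse of the averaging downsampler.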
6.11 Pseudo-inverse operators
This concept allows a more general treatment of finding solutions to Ax = y, regardless of how many such solutions.
Definition. Let $\mathcal{X}$ and $\mathcal{Y}$ be Hilbert spaces with $A \in B(\mathcal{X}, \mathcal{Y})$ and $R(A)$ closed in $\mathcal{Y}$. For any $y \in \mathcal{Y}$, define the following linear variety:
\[ V_y = \left\{ x_1 \in \mathcal{X} : \|A x_1 - y\|_{\mathcal{Y}} = \min_{x \in \mathcal{X}} \|Ax - y\|_{\mathcal{Y}} \right\} . \]
Among all vectors $x_1 \in V_y$, let $x_0$ be the unique vector of minimum norm $\|\cdot\|_{\mathcal{X}}$.
The pseudo-inverse $A^+$ of $A$ is the operator mapping each $y$ in $\mathcal{Y}$ into its corresponding $x_0$. So $A^+ : \mathcal{Y} \to \mathcal{X}$.
Note: closure of $R(A)$ usually arises from one of $\mathcal{X}$ or $\mathcal{Y}$ being finite dimensional.
This definition is legitimate since $\min_{x \in \mathcal{X}} \|y - Ax\| = \min_{m \in M = R(A)} \|y - m\|$ where $R(A)$ is assumed closed.
By the projection theorem there is a unique $\hat{y} \in M = R(A)$ of minimum distance to $y$.
However, the linear variety $V = \{ x \in \mathcal{X} : \hat{y} = Ax \}$ may nevertheless contain multiple points.
What about uniqueness of the vector having minimum norm?
The set $\{ x_1 \in \mathcal{X} : A x_1 = \hat{y} \}$ is a linear variety, a translate of $N(A)$, which is closed. Why? ??
So by the Ch. 3 theorem on minimizing norms within a linear variety, $x_0$ is unique. Thus $A^+$ is well defined.
If $A$ is invertible, then $x_0 = A^{-1} y$ will of course be the minimizer, in which case we have $A^+ = A^{-1}$.
[Figure: $y$, $R(A)$, $\{N(A)\}^\perp$, $x_0$, $N(A)$, the linear variety $V = \{ x \in \mathcal{X} : Ax = \hat{y} \}$, and the mapping $A^+$.]
Since $N(A)$ is a closed subspace in the Hilbert space $\mathcal{X}$, by the theorem on orthogonal complements we have
\[ \mathcal{X} = N(A) \oplus \{N(A)\}^\perp . \]
Similarly, since we have assumed that $R(A)$ is closed (and it is a subspace):
\[ \mathcal{Y} = R(A) \oplus \{R(A)\}^\perp . \]
When restricted to the subspace $\{N(A)\}^\perp$, the operator $A$ is a mapping from $\{N(A)\}^\perp$ to $R(A)$ (of course).
Between these spaces, $A$ is one-to-one, due to the following Lemma.
Thus $A$ has a linear inverse on $R(A)$ that maps each point in $R(A)$ back into a point in $\{N(A)\}^\perp$. This inverse defines $A^+$ on
[Figure: the four subspaces $[N(A)]^\perp$, $R(A)$, $[R(A)]^\perp$, and $N(A)$, with arrows labeled $A$ and $A^+$ and the point $0$.]
. . . . . . . . . . . . . . . . . . . . . . .. . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
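Example. Consider $A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$. Then
$N(A) = \{ (a, 0) : a \in \mathbb{R} \}$,
$\{N(A)\}^\perp = \{ (0, b) : b \in \mathbb{R} \}$,
$R(A) = \{ (b, b) : b \in \mathbb{R} \}$,
$\{R(A)\}^\perp = \{ (a, -a) : a \in \mathbb{R} \}$.
Recall that
\[ y = P_{[v]} x \iff y = \left\langle x, \frac{v}{\|v\|} \right\rangle \frac{v}{\|v\|} , \]
so
\[ P_{R(A)} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \frac{1}{\sqrt{2}} [1 \ \ 1] = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} . \]
On $R(A)$, the inverse of $A$ restricted to $\{N(A)\}^\perp$ maps $(b, b)$ to $(0, b)$, i.e., $\left[ A|_{\{N(A)\}^\perp} \right]^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$, so
\[ A^+ = \left[ A|_{\{N(A)\}^\perp} \right]^{-1} P_{R(A)} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1/2 & 1/2 \end{bmatrix} . \]
Example.
\[ (aI)^+ = \begin{cases} \frac{1}{a} I, & a \ne 0 \\ 0 I, & a = 0 \end{cases} \ = a^+ I . \]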
Proof. (Exercise)
......................................................................................................................
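L6.19:
\[ A^+ = \lim_{\varepsilon \to 0^+} [A^\star A + \varepsilon I]^{-1} A^\star = \lim_{\varepsilon \to 0^+} A^\star [A A^\star + \varepsilon I]^{-1} , \]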
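The limit formula can be tried on the singular $2 \times 2$ matrix $A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$, whose pseudo-inverse is $\begin{bmatrix} 0 & 0 \\ 1/2 & 1/2 \end{bmatrix}$. The sketch below evaluates $[A^\star A + \varepsilon I]^{-1} A^\star$ at one small value of $\varepsilon$; the value `1e-8` and the helper names are arbitrary choices for this illustration, and the code handles only real $2 \times 2$ matrices.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def regularized_pinv(A, eps):
    """[A* A + eps I]^{-1} A* for a real 2 x 2 matrix A (so A* = A^T)."""
    At = transpose(A)
    G = matmul(At, A)
    G_eps = [[G[i][j] + (eps if i == j else 0.0) for j in range(2)] for i in range(2)]
    return matmul(inv_2x2(G_eps), At)

A = [[0.0, 1.0],
     [0.0, 1.0]]
approx = regularized_pinv(A, 1e-8)   # should be close to [[0, 0], [1/2, 1/2]]
```

Note that $A^\star A$ itself is singular here, so the $\varepsilon I$ term is what makes the inverse exist; the approximation error shrinks as $\varepsilon \to 0^+$.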
......................................................................................................................
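One property that is missing: in general $(CB)^+ \ne B^+ C^+$, unlike with adjoints and inverses.
However, L 6.21 claims that if $B$ is onto and $C$ is one-to-one, then $(CB)^+ = B^+ C^+$.
Example.
$C = [1 \ \ 1]$, $C^\star = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $C^+ = C^\star (C C^\star)^{-1} = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$,
$B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $B^\star = [1 \ \ 0]$, $B^+ = (B^\star B)^{-1} B^\star = [1 \ \ 0]$.
In this case, $CB = 1$, so $(CB)^+ = 1$, but $B^+ C^+ = 1/2$.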
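The counterexample is small enough to verify by hand, but a short exact computation makes the mismatch explicit. The sketch below uses the full-row-rank and full-column-rank pseudo-inverse formulas quoted in the example; representing the row $C$ and the column $B$ as plain lists is a simplification made for this illustration.

```python
from fractions import Fraction as F

C = [F(1), F(1)]   # 1 x 2 row vector
B = [F(1), F(0)]   # 2 x 1 column vector

# C is onto R (full row rank): C+ = C* (C C*)^{-1}, a 2 x 1 column.
CCt = sum(c * c for c in C)
C_pinv = [c / CCt for c in C]

# B is one-to-one (full column rank): B+ = (B* B)^{-1} B*, a 1 x 2 row.
BtB = sum(b * b for b in B)
B_pinv = [b / BtB for b in B]

CB = sum(c * b for c, b in zip(C, B))          # the scalar C B
CB_pinv = 1 / CB if CB != 0 else F(0)          # pseudo-inverse of a scalar
B_pinv_C_pinv = sum(bp * cp for bp, cp in zip(B_pinv, C_pinv))
```

Here $B$ is one-to-one and $C$ is onto, which is the reverse of the condition in L 6.21 ($B$ onto and $C$ one-to-one), so the example does not contradict that claim.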
P3
k=0
k cos2 4 k .
Pn
Since $G$ is (Hermitian) symmetric nonnegative definite, it has an orthogonal eigenvector decomposition $G = Q D Q'$, and one can show that $G^+ = Q D^+ Q'$.
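\[ G = \mathcal{F} g \iff G(\omega) = \sum_{n=-\infty}^{\infty} g(n) \, e^{-\imath \omega n} . \quad \text{(6-2)} \]
This is an infinite series, so for a rigorous treatment we must find suitable normed spaces in which we can establish convergence.
The natural family of norms is $\ell_p$, for some $1 \le p \le \infty$. Why? What about the doubly infinite sum? ??
The logical meaning of the above definition is really
\[ G = \mathcal{F} g \iff G = \lim_{N \to \infty} \mathcal{F}_N g, \quad \text{where } G_N = \mathcal{F}_N g \iff G_N(\omega) \triangleq \sum_{n=-N}^{N} g(n) \, e^{-\imath \omega n}, \quad \text{where } N \in \mathbb{N} . \quad \text{(6-3)} \]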
Alternatively, one might also try to show that $\mathcal{F} = \lim_{N \to \infty} \mathcal{F}_N$, where the limit is with respect to the operator norm in $B(\ell_p, L_r[-\pi, \pi])$ for some $p$ and $r$, but this is in fact false! (See below.)
Since $G_N(\omega)$ is only a finite sum, clearly it is always well defined.
Furthermore, being a finite sum of complex exponentials, $G_N(\omega)$ is continuous in $\omega$, and hence Lebesgue integrable on $[-\pi, \pi]$.
So we could write $\mathcal{F}_N : \mathbb{R}^{2N+1} \to L_1[-\pi, \pi]$ or perhaps more usefully: $\mathcal{F}_N : \ell_p \to L_r[-\pi, \pi]$ for any $1 \le p, r \le \infty$.
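To elaborate, note that by Hölder's inequality (with $1/p + 1/q = 1$):
\[ |G_N(\omega)| = \left| \sum_{n=-N}^{N} g(n) \, e^{-\imath \omega n} \right| \le \sum_{n=-N}^{N} |g(n)| = \sum_{n=-\infty}^{\infty} |g(n) \, 1_{\{|n| \le N\}}| \le \|g\|_p \left\| 1_{\{|n| \le N\}} \right\|_q = \|g\|_p (2N + 1)^{1 - 1/p} . \]
Thus
\[ \|G_N\|_r = \left( \int_{-\pi}^{\pi} |G_N(\omega)|^r \, d\omega \right)^{1/r} \le (2\pi)^{1/r} (2N + 1)^{1 - 1/p} \|g\|_p . \]
Furthermore, for $p = 1$ the upper bound is achieved when $g(n) = \delta[n]$, so $|||\mathcal{F}_N|||_{1 \to r} = (2\pi)^{1/r}$.
Thus $\mathcal{F}_N \in B(\ell_p, L_r[-\pi, \pi])$ for any $1 \le p, r \le \infty$, and $|||\mathcal{F}_N|||_{p \to r} \le (2\pi)^{1/r} (2N + 1)^{1 - 1/p}$.
Remark. $f \in L_\infty[a, b] \implies f \in L_r[a, b]$ if $-\infty < a < b < \infty$ and $r \ge 1$.
But to make (6-2) rigorous we must have normed spaces in which the limit in (6-3) exists.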
Non-convergence of the operators
Note: treating FN : `p Lr [, ], by considering g0 (n) = [n (M + 1)] we have for N > M :
|||FN FM |||pr = sup k(FN FM )gkr k(FN FM )g0 kr =
e(M 1) 0
= (2)1/r .
g : kgkp 1
So {FN } is not Cauchy (and hence not convergent) in B(`p , Lr [, ]), no matter what p or r values one chooses.
So we must analyze convergence of the spectra GN () = FN g, rather than convergence of the operators FN themselves.
(6-4)
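The non-convergence argument can be reproduced numerically: for the shifted impulse $g_0(n) = \delta[n - (M+1)]$ and $N > M$, the difference $(\mathcal{F}_N - \mathcal{F}_M) g_0$ is a single complex exponential of unit modulus, so its $L_r$ norm stays at $(2\pi)^{1/r}$ however large $M$ is. The sketch below assumes $r = 2$ and approximates the integral by a Riemann sum; `lr_norm` and `operator_gap` are hypothetical helper names.

```python
import cmath
import math

def partial_dtft(g, N, omega):
    """G_N(omega) = sum over n = -N..N of g(n) * exp(-1j * omega * n)."""
    return sum(g(n) * cmath.exp(-1j * omega * n) for n in range(-N, N + 1))

def lr_norm(F, r, K=2048):
    """Riemann-sum approximation of the L_r[-pi, pi] norm of the function F."""
    h = 2 * math.pi / K
    return (h * sum(abs(F(-math.pi + k * h)) ** r for k in range(K))) ** (1.0 / r)

def operator_gap(N, M, r):
    """|| (F_N - F_M) g0 ||_r for g0(n) = delta[n - (M+1)], assuming N > M."""
    g0 = lambda n: 1.0 if n == M + 1 else 0.0
    diff = lambda w: partial_dtft(g0, N, w) - partial_dtft(g0, M, w)
    return lr_norm(diff, r)

# The gap does not shrink as M grows, so {F_N} cannot be Cauchy in operator norm.
gaps = [operator_gap(M + 1, M, 2) for M in (1, 5, 20)]
```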
$\ell_1$ analysis
Proposition. If $g \in \ell_1$, then $\{\mathcal{F}_N g\}$ is Cauchy in $L_r[-\pi, \pi]$ for any $1 \le r \le \infty$.
Proof.
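If $g \in \ell_1$, then defining
$$ I(N, M) \triangleq \{ n \in \mathbb{Z} : \min(N, M) < |n| \le \max(N, M) \} \tag{6-5} $$
and $G_N = \mathcal{F}_N g$ we have
$$ |G_N(\omega) - G_M(\omega)| = \left| \sum_{n=-N}^{N} g(n) \, e^{-\imath \omega n} - \sum_{n=-M}^{M} g(n) \, e^{-\imath \omega n} \right| = \left| \sum_{n \in I(N,M)} g(n) \, e^{-\imath \omega n} \right| \le \sum_{n \in I(N,M)} |g(n)| \le \sum_{|n| > \min(N,M)} |g(n)| \to 0 \text{ as } N, M \to \infty. \tag{6-6} $$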
So for each $\omega \in \mathbb{R}$, the sequence $\{G_N(\omega)\}_{N=1}^{\infty}$ is Cauchy in $\mathbb{C}$, and hence convergent by the completeness of $\mathbb{C}$, provided $g \in \ell_1$.
Thus for each $\omega$, $\{G_N(\omega)\}$ converges pointwise to some limit, call it $G(\omega)$, where (6-2) is shorthand for that limit.
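Furthermore, when $g \in \ell_1$, the bound in (6-6) does not depend on $\omega$, so
$$ \|G_N - G_M\|_\infty \le \sum_{|n| > \min(N,M)} |g(n)| \to 0 \text{ as } N, M \to \infty. $$
So the sequence of functions $\{G_N\}$ is Cauchy in $L_\infty[-\pi, \pi]$, which is complete, so $\{G_N\}$ converges to a limit $G \in L_\infty[-\pi, \pi]$.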
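More generally, using (6-6):
$$ \|G_N - G_M\|_r^r = \int_{-\pi}^{\pi} |G_N(\omega) - G_M(\omega)|^r \, d\omega \le 2\pi \left( \sum_{|n| > \min(N,M)} |g(n)| \right)^r \to 0 \text{ as } N, M \to \infty, \tag{6-7} $$
so $\{G_N\}$ is Cauchy in $L_r[-\pi, \pi]$ for $1 \le r < \infty$ as well.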
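A concrete $\ell_1$ signal makes the uniform convergence visible. The sketch below assumes $g(n) = 2^{-|n|}$, whose DTFT has the closed form $3 / (5 - 4 \cos \omega)$ (sum of two geometric series) and whose tail sum is $\sum_{|n| > N} |g(n)| = 2^{1-N}$. The signal, the grid, and the helper names are illustrative choices, not taken from the notes.

```python
import cmath
import math

def partial_dtft(g, N, omega):
    """G_N(omega) = sum over n = -N..N of g(n) * exp(-1j * omega * n)."""
    return sum(g(n) * cmath.exp(-1j * omega * n) for n in range(-N, N + 1))

def g(n):
    """Hypothetical absolutely summable signal."""
    return 0.5 ** abs(n)

def G_limit(omega):
    """Closed form of sum_n 0.5**|n| * exp(-1j*omega*n)."""
    return 3.0 / (5.0 - 4.0 * math.cos(omega))

def tail(N):
    """sum over |n| > N of |g(n)|, which equals 2 * 0.5**N for this g."""
    return 2.0 * 0.5 ** N

grid = [-math.pi + 2 * math.pi * k / 200 for k in range(201)]

def sup_error(N):
    """Largest |G_N - G| over the frequency grid."""
    return max(abs(partial_dtft(g, N, w) - G_limit(w)) for w in grid)

errors = {N: sup_error(N) for N in (2, 5, 10)}
```

Each entry of `errors` stays below the corresponding tail sum, in line with the bound in (6-6).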
$\ell_2$ analysis
Unfortunately, $\ell_1$ analysis is a bit restrictive; the class of signals is not as broad as we might like, and for least-squares problems we would rather work in $\ell_2$. This will allow us to apply Hilbert space methods.
However, if a signal is in $\ell_2$, it is not necessarily in $\ell_1$, so the above $\ell_1$ analysis does not apply to many signals in $\ell_2$. So we need a different approach.
Proposition. If $g \in \ell_2$, then $\{\mathcal{F}_N g\}$ is Cauchy in $L_2[-\pi, \pi]$.
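Proof. With the index set $I(N, M)$ of (6-5):
$$ \|G_N - G_M\|_2^2 = \int_{-\pi}^{\pi} \left| \sum_{n=-N}^{N} g(n) \, e^{-\imath \omega n} - \sum_{n=-M}^{M} g(n) \, e^{-\imath \omega n} \right|^2 d\omega = \int_{-\pi}^{\pi} \left| \sum_{n \in I(N,M)} g(n) \, e^{-\imath \omega n} \right|^2 d\omega $$
$$ = \sum_{n \in I(N,M)} \sum_{m \in I(N,M)} g(n) \, g^*(m) \int_{-\pi}^{\pi} e^{-\imath \omega (n - m)} \, d\omega = \sum_{n \in I(N,M)} \sum_{m \in I(N,M)} g(n) \, g^*(m) \, 2\pi \, 1_{\{n = m\}} $$
$$ = 2\pi \sum_{n \in I(N,M)} |g(n)|^2 \le 2\pi \sum_{|n| > \min(N,M)} |g(n)|^2 \to 0 \text{ as } N, M \to \infty, $$
because $g \in \ell_2$.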
Since $L_2[-\pi, \pi]$ is complete, $\{G_N\}$ is convergent (in the $L_2$ sense!) to some limit $G \in L_2[-\pi, \pi]$, and the expression in (6-2) is again a reasonable shorthand for that limit, whatever it may be, and now we can define $\mathcal{F} : \ell_2 \to L_2[-\pi, \pi]$ via (6-7). In other words,
$$ \|G_N - G\|_2^2 = \int_{-\pi}^{\pi} |G_N(\omega) - G(\omega)|^2 \, d\omega \to 0 \text{ as } N \to \infty. $$
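The identity $\|G_N - G_M\|_2^2 = 2\pi \sum_{n \in I(N,M)} |g(n)|^2$ can be checked on a signal that is in $\ell_2$ but not in $\ell_1$. The sketch below assumes $g(n) = 1/n$ for $n \ge 1$ (zero otherwise) and evaluates the integral with an equispaced rectangle rule, which is exact for trigonometric polynomials of degree below the number of grid points. The signal, `K`, and the function names are hypothetical choices.

```python
import cmath
import math

def partial_dtft(g, N, omega):
    """G_N(omega) = sum over n = -N..N of g(n) * exp(-1j * omega * n)."""
    return sum(g(n) * cmath.exp(-1j * omega * n) for n in range(-N, N + 1))

def g(n):
    """Hypothetical square-summable (but not absolutely summable) signal."""
    return 1.0 / n if n >= 1 else 0.0

def l2_gap_squared(N, M, K=256):
    """Rectangle-rule value of the integral of |G_N - G_M|^2 over [-pi, pi)."""
    h = 2 * math.pi / K
    total = 0.0
    for k in range(K):
        w = -math.pi + k * h
        total += abs(partial_dtft(g, N, w) - partial_dtft(g, M, w)) ** 2
    return h * total

def predicted(N, M):
    """2*pi times the sum of |g(n)|^2 over min(N,M) < |n| <= max(N,M)."""
    lo, hi = min(N, M), max(N, M)
    return 2 * math.pi * sum(abs(g(n)) ** 2 + abs(g(-n)) ** 2 for n in range(lo + 1, hi + 1))

gap_small = l2_gap_squared(40, 20)
gap_smaller = l2_gap_squared(80, 40)
```

The gap shrinks as both indices grow even though $\sum_n |g(n)|$ diverges, which is the point of moving from $\ell_1$ to $\ell_2$ analysis.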
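Moreover, $\mathcal{F}$ is bounded. For any $g \in \ell_2$:
$$ \|G_N\|_2^2 = \int_{-\pi}^{\pi} \left| \sum_{n=-N}^{N} g(n) \, e^{-\imath \omega n} \right|^2 d\omega = \sum_{n=-N}^{N} \sum_{m=-N}^{N} g(n) \, g^*(m) \int_{-\pi}^{\pi} e^{-\imath \omega (n - m)} \, d\omega = 2\pi \sum_{n=-N}^{N} |g(n)|^2 \le 2\pi \|g\|_2^2, $$
so, letting $N \to \infty$, $\|\mathcal{F} g\|_2 \le \sqrt{2\pi} \, \|g\|_2$.
For $g(n) = \delta[n]$ we have $G(\omega) = 1$. Thus $\|G\|_2^2 = \int_{-\pi}^{\pi} 1 \, d\omega = 2\pi$, which achieves the upper bound above. Hence $|||\mathcal{F}|||_{2 \to 2} = \sqrt{2\pi}$.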
Adjoint
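Since $\mathcal{F} \in B(\ell_2, L_2[-\pi, \pi])$ and both $\ell_2$ and $L_2[-\pi, \pi]$ are Hilbert spaces, $\mathcal{F}$ has an adjoint:
$$ \langle \mathcal{F} g, S \rangle_{L_2[-\pi,\pi]} = \int_{-\pi}^{\pi} \left( \sum_{n} g(n) \, e^{-\imath \omega n} \right) S^*(\omega) \, d\omega = \sum_{n} g(n) \left[ \int_{-\pi}^{\pi} S(\omega) \, e^{\imath \omega n} \, d\omega \right]^* = \langle g, \mathcal{F}^\star S \rangle_{\ell_2}, $$
so $[\mathcal{F}^\star S](n) = \int_{-\pi}^{\pi} S(\omega) \, e^{\imath \omega n} \, d\omega$.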
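The adjoint relation $\langle \mathcal{F} g, S \rangle = \langle g, \mathcal{F}^\star S \rangle$ can be verified numerically for simple inputs. The sketch below assumes a finitely supported $g$ and a trigonometric-polynomial $S$ so that an equispaced rectangle rule evaluates every integral exactly up to rounding. The particular $g$, $S$, grid size `K`, and helper names are hypothetical.

```python
import cmath
import math

K = 64                                   # number of quadrature points (assumed)
H = 2 * math.pi / K                      # grid spacing
GRID = [-math.pi + k * H for k in range(K)]

g = {0: 1.0, 1: 2.0 - 1.0j, -2: 0.5j}    # hypothetical finitely supported signal

def S(w):
    """Hypothetical test function in L_2[-pi, pi]."""
    return 1.0 + 2.0 * cmath.exp(-1j * w) - 1j * cmath.exp(2j * w)

def dtft(g, w):
    """[F g](w) = sum_n g(n) * exp(-1j * w * n) for a dict-valued signal."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in g.items())

def adjoint(S, n):
    """[F* S](n) = integral over [-pi, pi) of S(w) * exp(+1j * w * n) dw."""
    return H * sum(S(w) * cmath.exp(1j * w * n) for w in GRID)

# <F g, S> in L_2[-pi, pi] and <g, F* S> in l_2 (conjugate on the second argument).
lhs = H * sum(dtft(g, w) * S(w).conjugate() for w in GRID)
rhs = sum(v * adjoint(S, n).conjugate() for n, v in g.items())
```

Note that the adjoint integrates against $e^{+\imath \omega n}$ and carries no $1/(2\pi)$ factor, so it equals $2\pi$ times the inverse DTFT rather than the inverse itself.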
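The functions $\frac{1}{\sqrt{2\pi}} e_k$, where $e_k(\omega) \triangleq e^{-\imath \omega k}$, form an orthonormal basis for $L_2[-\pi, \pi]$, so any $G \in L_2[-\pi, \pi]$ satisfies
$$ G = \sum_{k} \frac{1}{2\pi} \langle G, e_k \rangle \, e_k, \qquad \frac{1}{2\pi} \sum_{k} |\langle G, e_k \rangle|^2 = \|G\|_2^2. $$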
Thus (since $\mathcal{F}$ is linear and invertible and hence an isomorphism), the normalized DTFT $U = \frac{1}{\sqrt{2\pi}} \mathcal{F}$ is unitary.
So $L_2[-\pi, \pi]$ and $\ell_2$ are unitarily equivalent.
Inverse DTFT
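Define a partial inverse DTFT operator $\mathcal{R}_N : L_2[-\pi, \pi] \to \ell_2$ by
$$ g_N = \mathcal{R}_N G \iff g_N(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} G(\omega) \, e^{\imath \omega n} \, d\omega \; 1_{\{|n| \le N\}} = \frac{1}{2\pi} \langle G, e_n \rangle \, 1_{\{|n| \le N\}}, $$
where $e_n(\omega) = e^{-\imath \omega n}$.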
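Proof (that $\{\mathcal{R}_N G\}$ is Cauchy in $\ell_2$ for $G \in L_2[-\pi, \pi]$).
$$ \|\mathcal{R}_N G - \mathcal{R}_M G\|_2^2 = \frac{1}{(2\pi)^2} \sum_{k \in I(N,M)} |\langle G, e_k \rangle|^2 \le \frac{1}{(2\pi)^2} \sum_{|k| > \min(N,M)} |\langle G, e_k \rangle|^2 \to 0 \text{ as } N, M \to \infty. \qquad \Box $$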
Since $\ell_2$ is complete, $\{\mathcal{R}_N G\}$ converges to some limit $g \in \ell_2$ and we define $\mathcal{R} G$ to be that limit: $\mathcal{R} G \triangleq \lim_{N \to \infty} \mathcal{R}_N G$.
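Proof (boundedness of $\mathcal{R}_N$).
$$ \|\mathcal{R}_N G\|_2^2 = \sum_{|n| \le N} \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} G(\omega) \, e^{\imath \omega n} \, d\omega \right|^2 = \frac{1}{(2\pi)^2} \sum_{|k| \le N} |\langle G, e_k \rangle|^2 \le \frac{1}{(2\pi)^2} \sum_{k} |\langle G, e_k \rangle|^2 = \frac{1}{2\pi} \|G\|_2^2. $$
When $G(\omega) = 1$ we have $\|\mathcal{R}_N G\|_2 = \|\delta[n]\|_2 = 1$ and $\|G\|_2 = \sqrt{2\pi}$, so $|||\mathcal{R}_N||| = 1/\sqrt{2\pi}$.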
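Proof 2. $\|\mathcal{R} G\|_2 = \lim_{N \to \infty} \|\mathcal{R}_N G\|_2 \le \lim_{N \to \infty} \frac{1}{\sqrt{2\pi}} \|G\|_2 = \frac{1}{\sqrt{2\pi}} \|G\|_2$.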
Proposition. $\mathcal{R} \mathcal{F} = I_{\ell_2}$ and $\mathcal{F} \mathcal{R} = I_{L_2}$, so $\mathcal{F}^{-1} = \mathcal{R}$, where $I_H$ denotes the identity operator for Hilbert space $H$.
Proof. Exercise.
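Short of a proof, the identity $\mathcal{R} \mathcal{F} = I$ can at least be sanity-checked on a finitely supported signal. The sketch below assumes the partial inverse takes the form $g_N(n) = \frac{1}{2\pi} \int G(\omega) e^{\imath \omega n} d\omega$ for $|n| \le N$ and uses an equispaced rectangle rule, which is exact here up to rounding. The signal, the grid size `K`, and the function names are hypothetical.

```python
import cmath
import math

K = 64                                   # number of quadrature points (assumed)
GRID = [-math.pi + 2 * math.pi * k / K for k in range(K)]

g = {-2: 1.0, 0: -3.0, 1: 2.0j}          # hypothetical finitely supported signal

def G(w):
    """DTFT of g: sum_n g(n) * exp(-1j * w * n)."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in g.items())

def partial_inverse_dtft(G, N):
    """g_N(n) = (1/(2*pi)) * integral of G(w) * exp(+1j*w*n) dw for |n| <= N.

    The rectangle-rule weight (2*pi/K) times 1/(2*pi) reduces to 1/K.
    Indices with |n| > N are omitted, meaning g_N is zero there.
    """
    return {n: sum(G(w) * cmath.exp(1j * w * n) for w in GRID) / K
            for n in range(-N, N + 1)}

g3 = partial_inverse_dtft(G, 3)   # N covers the support, so g is recovered
g1 = partial_inverse_dtft(G, 1)   # N too small: the sample at n = -2 is truncated
```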
Convolution revisited
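Using time-domain analysis, we showed previously that if $h \in \ell_1$ and $Ax = h * x$ then $A \in B(\ell_p, \ell_p)$.
We have shown $\mathcal{F} \in B(\ell_2, L_2[-\pi, \pi])$ and $\mathcal{F}^{-1} \in B(L_2[-\pi, \pi], \ell_2)$.
Consider the band-limiting linear operator $D : L_2[-\pi, \pi] \to L_2[-\pi, \pi]$ defined by
$$ y = Dx \iff y(\omega) = \begin{cases} x(\omega), & |\omega| \le \pi/2 \\ 0, & \text{otherwise.} \end{cases} $$
Clearly $D \in B(L_2[-\pi, \pi], L_2[-\pi, \pi])$ and in fact $|||D||| = 1$.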
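Proof (that $A_h$ is bounded on $\ell_2$ if and only if $\|\mathcal{F} h\|_\infty$ is finite). ($\Leftarrow$) Suppose $\|\mathcal{F} h\|_\infty$ is finite. Then using the convolution property of the DTFT and Parseval:
$$ \|h * x\|_2^2 = \frac{1}{2\pi} \|H X\|_2^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(\omega) X(\omega)|^2 \, d\omega \le \|H\|_\infty^2 \, \frac{1}{2\pi} \|X\|_2^2 = \|H\|_\infty^2 \, \|x\|_2^2, $$
so $|||A_h||| \le \|\mathcal{F} h\|_\infty = \|H\|_\infty$. The upper bound is achieved when $h(n) = \delta[n]$.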
($\Rightarrow$) If $H$ is unbounded, then for all $T$ there exists an interval over which $|H| \ge T$. Choose $x$ to be a signal whose spectrum is an indicator function on that interval, and then $\|x * h\|_2 \ge T \|x\|_2$, so $A_h$ would be unbounded. Take the contrapositive. $\Box$
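The bound $\|h * x\|_2 \le \|H\|_\infty \|x\|_2$ can be illustrated with a short FIR filter. In the sketch below, the filter taps, both input signals, and the helper names are assumptions chosen for illustration. The second input alternates in sign, so its spectrum concentrates near $\omega = \pi$ where $|H|$ peaks for this filter, mirroring the idea in the ($\Rightarrow$) argument of choosing an input concentrated where $|H|$ is large.

```python
import cmath
import math

def convolve(h, x):
    """Full linear convolution of two finite sequences (lists indexed from 0)."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y

def l2_norm(x):
    """Euclidean (l_2) norm of a finite sequence."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))

def sup_of_spectrum(h, K=1024):
    """Grid approximation of ||H||_inf = max over omega of |sum_n h(n) exp(-1j*omega*n)|."""
    grid = (-math.pi + 2 * math.pi * k / K for k in range(K))
    return max(abs(sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(h))) for w in grid)

h = [1.0, -0.5, 0.25]                # hypothetical filter; |H| peaks at omega = +/- pi
H_sup = sup_of_spectrum(h)           # the grid contains omega = -pi, so the peak is captured

x_generic = [1.0, 2.0, -1.0, 0.5]
x_peaked = [(-1.0) ** n for n in range(200)]   # spectrum concentrated near omega = pi

ratio_generic = l2_norm(convolve(h, x_generic)) / l2_norm(x_generic)
ratio_peaked = l2_norm(convolve(h, x_peaked)) / l2_norm(x_peaked)
```

Both ratios stay below `H_sup`, and the sign-alternating input comes close to it, which suggests the bound is tight for this filter.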
Continuous-time case
Fourier transform
convolution
Young's inequality