
Chapter 6

Linear operators and adjoints


Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1
Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2
Spaces of bounded linear operators . . . . . . . . . . . . . . . . . . 6.3
Inverse operators . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5
Linearity of inverses . . . . . . . . . . . . . . . . . . . . . . . . 6.5
Banach inverse theorem . . . . . . . . . . . . . . . . . . . . . . . . 6.5
Equivalence of spaces . . . . . . . . . . . . . . . . . . . . . . . . 6.5
Isomorphic spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6
Isometric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6
Unitary equivalence . . . . . . . . . . . . . . . . . . . . . . . . . 6.7
Adjoints in Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . 6.11
Unitary operators . . . . . . . . . . . . . . . . . . . . . . . . . . 6.13
Relations between the four spaces . . . . . . . . . . . . . . . . . . 6.14
Duality relations for convex cones . . . . . . . . . . . . . . . . . . 6.15
Geometric interpretation of adjoints . . . . . . . . . . . . . . . . . 6.15
Optimization in Hilbert spaces . . . . . . . . . . . . . . . . . . . . 6.16
The normal equations . . . . . . . . . . . . . . . . . . . . . . . . . 6.16
The dual problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.17
Pseudo-inverse operators . . . . . . . . . . . . . . . . . . . . . . . 6.18
Analysis of the DTFT . . . . . . . . . . . . . . . . . . . . . . . . . 6.21

6.1 Introduction
The field of optimization uses linear operators and their adjoints extensively.
Examples: differentiation, convolution, the Fourier transform, and the Radon transform, among others.
Example. If A is an m × n matrix (an example of a linear operator), then we know that ‖y − Ax‖₂ is minimized when x = [A′A]⁻¹ A′y (assuming A′A is invertible).
We want to solve such problems for linear operators between more general spaces. To do so, we need to generalize transpose
and inverse.
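To make the matrix case concrete, here is a small numerical sketch (the matrix and data here are arbitrary illustrations) checking that the normal-equations solution minimizes ‖y − Ax‖₂:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # a 5 x 3 matrix: a linear operator R^3 -> R^5
y = rng.standard_normal(5)

# Normal-equations solution x = [A'A]^{-1} A'y (A'A invertible: full column rank)
x_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Perturbing x_hat can only increase the residual norm ||y - Ax||_2
x_other = x_hat + 0.1 * rng.standard_normal(3)
r_hat = np.linalg.norm(y - A @ x_hat)
r_other = np.linalg.norm(y - A @ x_other)
```

The residual y − Ax̂ is orthogonal to the range of A, which is the geometric content of the normal equations.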
© J. Fessler, December 21, 2004, 13:3 (student version)
6.2 Fundamentals
We write T : D → Y when T is a transformation from a set D in a vector space X to a vector space Y.
The set D is called the domain of T. The range of T is denoted
R(T) = {y ∈ Y : y = T(x) for some x ∈ D}.
If S ⊆ D, then the image of S is given by
T(S) = {y ∈ Y : y = T(s) for some s ∈ S}.
If P ⊆ Y, then the inverse image of P is given by
T⁻¹(P) = {x ∈ D : T(x) ∈ P}.
Notation: for a linear operator A, we often write Ax instead of A(x).
For linear operators, we can always just use D = X , so we largely ignore D hereafter.

Definition. The nullspace of a linear operator A is N(A) = {x ∈ X : Ax = 0}.


It is also called the kernel of A, and denoted ker(A).

Exercise. For a linear operator A, the nullspace N (A) is a subspace of X .


Furthermore, if A is continuous (in a normed space X ), then N (A) is closed [3, p. 241].
Exercise. The range of a linear operator is a subspace of Y.
Proposition. A linear operator on a normed space X (to a normed space Y) is continuous at every point of X if it is continuous at a single point in X.

Proof. Exercise. [3, p. 240].


Luenberger does not mention that Y needs to be a normed space too.
Definition. A transformation T from a normed space X to a normed space Y is called bounded iff there is a constant M such that
‖T(x)‖ ≤ M ‖x‖, ∀x ∈ X.
Definition. The smallest such M is called the norm of T and is denoted ||| T |||. Formally:
||| T ||| ≜ inf {M ∈ R : ‖T(x)‖ ≤ M ‖x‖, ∀x ∈ X}.
Consequently:
‖T(x)‖_Y ≤ ||| T ||| ‖x‖_X, ∀x ∈ X.

Fact. For a linear operator A, an equivalent expression (used widely!) for the operator norm is
|||A||| = sup_{‖x‖ ≤ 1} ‖Ax‖.
Fact. If X is the trivial vector space consisting only of the vector 0, then |||A||| = 0 for any linear operator A.
Fact. If X is a nontrivial vector space, then for a linear operator A we have the following equivalent expressions:
|||A||| = sup_{x ≠ 0} ‖Ax‖ / ‖x‖ = sup_{‖x‖ = 1} ‖Ax‖.
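For X = Rⁿ and Y = Rᵐ with Euclidean norms, this supremum is the largest singular value of the matrix. A quick numerical sketch (the random matrix is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# For the l2 norm, |||A||| equals the largest singular value of A
sigma_max = np.linalg.svd(A, compute_uv=False)[0]

# ||Ax|| over random unit vectors never exceeds sigma_max ...
xs = rng.standard_normal((6, 1000))
xs /= np.linalg.norm(xs, axis=0)
ratios = np.linalg.norm(A @ xs, axis=0)

# ... and the top right singular vector attains it
v_top = np.linalg.svd(A)[2][0]
```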

......................................................................................................................
Example. Consider X = (Rⁿ, ‖·‖_∞) and Y = R, with Ax = a₁x₁ + ··· + aₙxₙ for some a ∈ Rⁿ.
Clearly ‖Ax‖_Y = |Ax| = |a₁x₁ + ··· + aₙxₙ| ≤ |a₁||x₁| + ··· + |aₙ||xₙ| ≤ (|a₁| + ··· + |aₙ|) ‖x‖_∞.
In fact, if we choose x such that xᵢ = sgn(aᵢ), then ‖x‖_∞ = 1 and we get equality above. So we conclude |||A||| = |a₁| + ··· + |aₙ|.
Example. What if X = (Rⁿ, ‖·‖_p)? ??
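A numerical sanity check of the sup-norm example above (the vector a is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(5)

A = lambda x: float(a @ x)        # the functional Ax = a1 x1 + ... + an xn

x_star = np.sign(a)               # maximizer: x_i = sgn(a_i), so ||x_star||_inf = 1
attained = abs(A(x_star))         # equals |a_1| + ... + |a_n| = ||a||_1

xs = rng.uniform(-1, 1, size=(1000, 5))   # random x with ||x||_inf <= 1
vals = np.abs(xs @ a)
```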
Proposition. A linear operator is bounded iff it is continuous.

Proof. Exercise. [3, p. 240].


More facts related to linear operators.
• If A : X → Y is linear and X is a finite-dimensional normed space, then A is continuous [3, p. 268].
• If A : X → Y is a transformation where X and Y are normed spaces, then A is linear and continuous iff A(Σ_{i=1}^∞ αᵢ xᵢ) = Σ_{i=1}^∞ αᵢ A(xᵢ) for all convergent series Σ_{i=1}^∞ αᵢ xᵢ [3, p. 237].
This is the superposition principle as described in introductory signals and systems courses.

Spaces of bounded linear operators


Definition. If T1 and T2 are both transformations with a common domain X and a common range Y, over a common scalar field, then we define natural addition and scalar multiplication operations as follows:
(T1 + T2)(x) = T1(x) + T2(x)
(αT1)(x) = α(T1(x)).

Lemma. With the preceding definitions, when X and Y are normed spaces the following space of operators (!) is a vector space:
B(X , Y) = {bounded linear transformations from X to Y} .
(The proof that this is a vector space is within the next proposition.)
This space is analogous to certain types of dual spaces (see Ch. 5).
Not only is B(X , Y) a vector space, it is a normed space when one uses the operator norm |||A||| defined above.
Proposition. (B(X , Y), ||| |||) is a normed space when X and Y are normed spaces.

Proof. (sketch)
Claim 1. B(X, Y) is a vector space.
Suppose T1, T2 ∈ B(X, Y) and α ∈ F.
‖(αT1 + T2)(x)‖_Y = ‖αT1(x) + T2(x)‖_Y ≤ |α| ‖T1(x)‖_Y + ‖T2(x)‖_Y ≤ |α| ||| T1 ||| ‖x‖_X + ||| T2 ||| ‖x‖_X = K ‖x‖_X, where
K ≜ |α| ||| T1 ||| + ||| T2 |||. So αT1 + T2 is a bounded operator. Clearly αT1 + T2 is a linear operator.
Claim 2. ||| · ||| is a norm on B.
The hardest part is verifying the triangle inequality:
||| T1 + T2 ||| = sup_{‖x‖=1} ‖(T1 + T2)x‖_Y ≤ sup_{‖x‖=1} ‖T1 x‖_Y + sup_{‖x‖=1} ‖T2 x‖_Y = ||| T1 ||| + ||| T2 |||. □

Are there other valid norms for B(X, Y)? ??

Remark. We did not really need linearity in this proposition. We could have shown that the space of bounded transformations
from X to Y with ||| ||| is a normed space.

Not only is B(X , Y) a normed space, but it is even complete if Y is.


Theorem. If X and Y are normed spaces with Y complete, then (B(X , Y), ||| |||) is complete.

Proof.
Suppose {Tn} is a Cauchy sequence (in B(X, Y)) of bounded linear operators, i.e., |||Tn − Tm||| → 0 as n, m → ∞.
Claim 0. ∀x ∈ X, the sequence {Tn(x)} is Cauchy in Y.
‖Tn(x) − Tm(x)‖_Y = ‖(Tn − Tm)(x)‖_Y ≤ |||Tn − Tm||| ‖x‖_X → 0 as n, m → ∞.
Since Y is complete, for any x ∈ X the sequence {Tn(x)} converges to some point in Y. (This is called pointwise convergence.)
So we can define an operator T : X → Y by T(x) ≜ lim_{n→∞} Tn(x).
To show B is complete, we must first show T ∈ B, i.e., 1) T is linear, 2) T is bounded.
Then we show 3) Tn → T (convergence w.r.t. the norm ||| · |||).
Claim 1. T is linear.
T(αx + z) = lim_{n→∞} Tn(αx + z) = lim_{n→∞} [αTn(x) + Tn(z)] = αT(x) + T(z).
(Recall that in a normed space, if un → u and vn → v, then αun + vn → αu + v.)
Claim 2. T is bounded.
Since {Tn} is Cauchy, it is bounded, so ∃K < ∞ s.t. |||Tn||| ≤ K, ∀n ∈ N. Thus, by the continuity of norms, for any x ∈ X:
‖T(x)‖_Y = lim_{n→∞} ‖Tn(x)‖_Y ≤ lim_{n→∞} |||Tn||| ‖x‖_X ≤ K ‖x‖_X.
Claim 3. Tn → T.
Since {Tn} is Cauchy,
∀ε > 0, ∃N ∈ N s.t. n, m > N
⇒ ε ≥ |||Tn − Tm|||
⇒ ε ‖x‖_X ≥ ‖Tn(x) − Tm(x)‖_Y, ∀x ∈ X
⇒ ε ‖x‖_X ≥ lim_{m→∞} ‖Tn(x) − Tm(x)‖_Y, ∀x ∈ X
⇒ ε ‖x‖_X ≥ ‖Tn(x) − T(x)‖_Y, ∀x ∈ X, by continuity of the norm
⇒ ε ≥ |||Tn − T|||.
We have shown that every Cauchy sequence in B(X, Y) converges to some limit in B(X, Y), so B(X, Y) is complete. □

Corollary. (B(X, R), ||| · |||) is a Banach space for any normed space X.
Why? ??
We write A ∈ B(X, Y) as shorthand for "A is a bounded linear operator from normed space X to normed space Y."
......................................................................................................................
Definition. In general, if S : X → Y and T : Y → Z, then we can define the product operator or composition as a
transformation T S : X → Z by (T S)(x) = T(S(x)).
Proposition. If S ∈ B(X, Y) and T ∈ B(Y, Z), then T S ∈ B(X, Z).

Proof. Linearity of the composition of linear operators is trivial to show.
To show that the composition is bounded: ‖T Sx‖_Z ≤ ||| T ||| ‖Sx‖_Y ≤ ||| T ||| |||S||| ‖x‖_X. □
Does it follow that ||| T S ||| = ||| T ||| |||S|||? ??
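In general the answer is no: only ||| T S ||| ≤ ||| T ||| |||S||| holds, and the inequality can be strict. A small matrix sketch (the two projection matrices are illustrative):

```python
import numpy as np

# Spectral norm: the operator norm |||M||| for the Euclidean norm
op_norm = lambda M: np.linalg.norm(M, 2)

S = np.array([[1.0, 0.0], [0.0, 0.0]])   # projection onto the first coordinate
T = np.array([[0.0, 0.0], [0.0, 1.0]])   # projection onto the second coordinate

# The composition annihilates every vector: |||TS||| = 0 < |||T||| |||S||| = 1
TS = T @ S
```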

6.3 Inverse operators
Definition. T : X → Y is called a one-to-one mapping of X into Y iff x1, x2 ∈ X and x1 ≠ x2 ⇒ T(x1) ≠ T(x2).
Equivalently, T is one-to-one iff the inverse image of any point y ∈ Y is at most a single point in X, i.e., |T⁻¹({y})| ≤ 1, ∀y ∈ Y.
Definition. T : X → Y is called onto iff T(X) = Y. This is a stronger condition than into.

Fact. If T : X → Y is one-to-one and onto Y, then T has an inverse, denoted T⁻¹, such that T(x) = y iff T⁻¹(y) = x.

Many optimization methods, e.g., Newton's method, require inversion of the Hessian matrix (or operator) corresponding to a cost function.
Lemma. [3, p. 171]
If A : X → Y is a linear operator between two vector spaces X and Y, then A is one-to-one iff N(A) = {0}.
Linearity of inverses
We first look at the algebraic aspects of inverse operators in vector spaces.
Proposition. If a linear operator A : X → Y (for vector spaces X and Y) has an inverse, then that inverse A⁻¹ is also linear.

Proof. Suppose A⁻¹(y1) = x1, A⁻¹(y2) = x2, A(x1) = y1, and A(x2) = y2. Then by the linearity of A we have A(αx1 + x2) = αAx1 + Ax2 = αy1 + y2, so A⁻¹(αy1 + y2) = αx1 + x2 = αA⁻¹(y1) + A⁻¹(y2). □
6.4 Banach inverse theorem
Now we turn to the topological aspects, in normed spaces.
Lemma. (Baire) A Banach space X is not the union of countably many nowhere dense sets in X .
Proof. see text
Theorem. (Banach inverse theorem)
If A is a continuous linear operator from a Banach space X onto a Banach space Y for which the inverse operator A1 exists,
then A1 is continuous.

Proof. see text


Combining with the earlier Proposition that linear operators are bounded iff they are continuous yields the following.
Corollary. X, Y Banach and A ∈ B(X, Y) and A invertible ⇒ A⁻¹ ∈ B(Y, X).

Equivalence of spaces (one way to use operators)

spaces         | relation                 | operator                | requirements
vector         | isomorphic               | isomorphism             | linear, onto, 1-1
normed         | topologically isomorphic | topological isomorphism | linear, onto, invertible, continuous
normed         | isometrically isomorphic | isometric isomorphism   | linear, onto, norm preserving (⇒ 1-1, continuous)
inner product  | unitarily equivalent     | unitary                 | isometric isomorphism that preserves inner products
6.6

c J. Fessler, December 21, 2004, 13:3 (student version)


Isomorphic spaces
Definition. Vector spaces X and Y are called isomorphic (think: same structure) iff there exists a one-to-one, linear mapping T of X onto Y.
In such cases the mapping T is called an isomorphism of X onto Y.
Since an isomorphism T is one-to-one and onto, T is invertible, and by the linearity-of-inverses proposition in 6.3, T⁻¹ is linear.
Example. Consider X = R² and Y = {f(t) = a + bt on [0, 1] : a, b ∈ R}.
An isomorphism is f = T(x) ⟺ f(t) = a + bt where x = (a, b), with inverse T⁻¹(f) = (f(0), f(1) − f(0)).
Exercise. Any real n-dimensional vector space is isomorphic to Rn (problem 2.14 p. 44) [3, p. 268].
However, they need not be topologically isomorphic [3, p. 270].
......................................................................................................................
So far we have said nothing about norms. In normed spaces we can have a topological relationship too.
Definition. Normed spaces X and Y are called topologically isomorphic iff there exists a continuous linear transformation T of X onto Y having a continuous inverse T⁻¹. The mapping T is called a topological isomorphism.
Theorem. [3, p. 258] Two normed spaces are topologically isomorphic iff there exists a linear transformation T with domain X and range Y and positive constants m and M s.t. m ‖x‖_X ≤ ‖T x‖_Y ≤ M ‖x‖_X, ∀x ∈ X.
Example. In the previous example, consider (X, ‖·‖_∞) and (Y, ‖·‖₂). Then for the same T described in that example, with x = (a, b):
‖T(x)‖²_Y = ∫₀¹ (a + bt)² dt = a² + ab + b²/3 ≤ a² + |a||b| + b²/3 ≤ (1 + 1 + 1/3) ‖x‖²_∞ = (7/3) ‖x‖²_∞.
Also ‖T(x)‖²_Y = a² + ab + b²/3 = (a + b/2)² + b²/12 = (a√3/2 + b/√3)² + a²/4 ≥ ‖x‖²_∞ / 12.
So X and Y are topologically isomorphic for the given norms.
Exercise. (C[0, 1], kk ) and (C[0, 1], kk1 ) are not topologically isomorphic. Why? ??
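A numerical check of the two constants m² = 1/12 and M² = 7/3 derived in the example above (random sampling is illustrative, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)

def T_norm_sq(a, b):
    # ||T(a,b)||_2^2 = \int_0^1 (a + b t)^2 dt = a^2 + a*b + b^2/3 (exact)
    return a * a + a * b + b * b / 3.0

ratios = []
for a, b in rng.standard_normal((1000, 2)):
    x_inf_sq = max(abs(a), abs(b)) ** 2      # ||x||_inf^2
    ratios.append(T_norm_sq(a, b) / x_inf_sq)
ratios = np.array(ratios)
```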
Isometric spaces
Definition. Let X and Y be normed spaces. A mapping T : X → Y is called norm preserving iff ‖T(x)‖_Y = ‖x‖_X, ∀x ∈ X.
In particular, if T is norm preserving, then ||| T ||| = 1. What about the converse? ??
Proposition. If T is linear and norm preserving, then T is one-to-one, i.e., T(x) = T(z) ⇒ x = z.

Proof. If T(x) = y and T(z) = y, then by linearity T(x − z) = 0.
So since T is norm preserving, ‖x − z‖ = ‖0‖ = 0, so x = z. □
......................................................................................................................
Definition. If T : X Y is both linear and norm preserving, then T is called a linear isometry.
If, in addition, T is onto Y, then X and Y are called isometrically isomorphic, and T is called an isometric isomorphism.

Remark. To illustrate why we require onto here, consider T : Eⁿ → ℓ₂ defined by T(x) = (x₁, ..., xₙ, 0, 0, ...).
This T is linear, one-to-one, and norm preserving, but not onto.
Exercise. Every normed space X is isometrically isomorphic to a dense subset of a Banach space X̃.
(problem 2.15 on p. 44)

Normed spaces that are isometrically isomorphic can, in some sense, be treated as identical, i.e., they have identical properties. However, the specific isomorphism can be important sometimes.
Example. Consider Y = ℓ_p(N) = {(a₁, a₂, ...) : Σ_{i=1}^∞ |aᵢ|^p < ∞}
and X = ℓ_p(Z) = {(..., a₋₂, a₋₁, a₀, a₁, a₂, ...) : Σ_{i=−∞}^∞ |aᵢ|^p < ∞}. Define the mapping T : X → Y by y = (b₁, b₂, ...) = T(x) with bᵢ = a_{z(i)}, where z(i) = (−1)^i ⌊i/2⌋. Note that z : {1, 2, 3, 4, 5, ...} → 0, 1, −1, 2, −2, ...
This mapping T is clearly an isometric isomorphism, so ℓ_p(Z) and ℓ_p(N) are isometrically isomorphic. Hence we only bother to work with ℓ_p = ℓ_p(N), since we know all algebraic and topological results will carry over to double-sided sequences.
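A sketch of the reindexing map z used above (the function name is mine); z is a bijection from the positive integers onto the integers, so it just relabels terms and preserves every ℓ_p norm:

```python
def z(i):
    # z : {1, 2, 3, ...} -> Z, mapping 1, 2, 3, 4, 5, ... to 0, 1, -1, 2, -2, ...
    return (-1) ** i * (i // 2)

first = [z(i) for i in range(1, 8)]       # -> [0, 1, -1, 2, -2, 3, -3]
image = {z(i) for i in range(1, 1001)}    # 1000 distinct integers: z is injective
```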

Unitary equivalence in inner product spaces


Definition. Two inner product spaces X and Y are unitarily equivalent iff there is an isomorphism U : X → Y of X onto Y
that preserves inner products: ⟨U x₁, U x₂⟩_Y = ⟨x₁, x₂⟩_X, ∀x₁, x₂ ∈ X. The mapping U is called a unitary operator.

Fact. If U is unitary, then U is norm preserving since ‖U x‖² = ⟨U x, U x⟩ = ⟨x, x⟩ = ‖x‖². Clearly |||U||| = 1.
Furthermore, since U is onto, U is an isometric isomorphism. Interestingly, the converse is also true [3, p. 332].
Theorem. A mapping U of X onto Y, where X and Y are inner product spaces, is an isometric isomorphism iff U is a unitary operator.

Proof. (⇐) was just argued above.
(⇒) Suppose U is an isometric isomorphism.
Using the parallelogram law, the linearity of U, and the fact that ‖U x‖ = ‖x‖, we have:
4 ⟨U x, U y⟩ = ‖U x + U y‖² − ‖U x − U y‖² + i ‖U x + iU y‖² − i ‖U x − iU y‖²
= ‖U(x + y)‖² − ‖U(x − y)‖² + i ‖U(x + iy)‖² − i ‖U(x − iy)‖²
= ‖x + y‖² − ‖x − y‖² + i ‖x + iy‖² − i ‖x − iy‖² = 4 ⟨x, y⟩.
Thus U an isometric isomorphism ⇒ U unitary. □

Remark. After defining adjoints we will show that U⁻¹ = U* in Hilbert spaces.

Exercise. Any complex n-dimensional inner product space is unitarily equivalent to Cn [3, p. 332].
Every separable Hilbert space is unitarily equivalent with `2 or some Cn [3, p. 339].
Example. Continue the previous f(t) = a + bt example, but now use E² and (Y, ⟨·,·⟩₂). If g(t) = c + dt then
⟨T x₁, T x₂⟩ = ⟨f, g⟩ = ∫₀¹ f(t) g(t) dt = ∫₀¹ (a + bt)(c + dt) dt = ac + ad/2 + bc/2 + bd/3
= [a b] [ 1, 1/2 ; 1/2, 1/3 ] [c ; d] = x₁′ G x₂,
so if we define U = T G^{−1/2}, then
⟨U x₁, U x₂⟩ = ⟨T G^{−1/2} x₁, T G^{−1/2} x₂⟩ = (G^{−1/2} x₁)′ G (G^{−1/2} x₂) = x₁′ x₂ = ⟨x₁, x₂⟩.
Example. The Fourier transform F(ν) = ∫ f(t) e^{−i2πνt} dt is a unitary mapping of L₂(R) onto itself.

Example. Soon we will analyze the discrete-time Fourier transform (DTFT) operator, defined by
G = F g ⟺ G(ω) = Σ_{n=−∞}^∞ g_n e^{−iωn}.
We will show that F ∈ B(ℓ₂, L₂[−π, π]) and F is invertible. And Parseval's relation from Fourier analysis is that
⟨(1/√(2π)) F g, (1/√(2π)) F h⟩ = (1/(2π)) ∫_{−π}^{π} G(ω) H*(ω) dω = Σ_{n=−∞}^∞ g_n h_n* = ⟨g, h⟩.
So ℓ₂ and L₂[−π, π] are unitarily equivalent, and the unitary operator needed is simply U = (1/√(2π)) F.

The following extension theorem is useful in proving that every separable Hilbert space is isometrically isomorphic to `2 .

Theorem. Let X be a normed space, Y a Banach space, and M ⊆ X and N ⊆ Y be subspaces.
Suppose that
• M̄ = X and N̄ = Y,
• T̃ : M → N is a bounded linear operator.
Then there exists a unique linear operator T : X → Y such that T|M = T̃. Moreover, ||| T ||| = ||| T̃ |||.
If, in addition,
• X is a Banach space,
• T̃⁻¹ : N → M exists and is bounded,
then T is also onto Y.

Proof. Note: T|M reads "T restricted to M".
T|M = T̃ means T(x) = T̃(x) for all x ∈ M.
......................................................................................................................
Claim 1. If {xn} ⊆ M is Cauchy in M, then {T̃(xn)} is Cauchy in N.
‖T̃(xn) − T̃(xm)‖_Y = ‖T̃(xn − xm)‖_Y ≤ ||| T̃ ||| ‖xn − xm‖_X → 0 as n, m → ∞.
......................................................................................................................
Claim 2. For {xn}, {x̃n} ⊆ M, suppose xn → x ∈ X and x̃n → x.
Then {T̃(xn)} and {T̃(x̃n)} both converge, and to the same limit.
{T̃(xn)} and {T̃(x̃n)} are both Cauchy in N by Claim 1, since {xn} and {x̃n} both converge.
Since Y is complete, ∃ y, ỹ ∈ Y s.t. T̃(xn) → y and T̃(x̃n) → ỹ.
But ‖y − ỹ‖_Y = lim_{n→∞} ‖T̃(xn) − T̃(x̃n)‖_Y = lim_{n→∞} ‖T̃(xn − x̃n)‖_Y ≤ lim_{n→∞} ||| T̃ ||| ‖xn − x̃n‖_X
= ||| T̃ ||| ‖x − x‖_X = 0, using norm continuity, linearity of T̃, and boundedness of T̃.

Now we define T : X → Y as follows.
For x ∈ X = M̄, ∃ {xn} ⊆ M such that xn → x, so we define T(x) ≜ lim_{n→∞} T̃(xn).
By Claim 2, T is well defined, and moreover, if x ∈ M, then T(x) = T̃(x), i.e., T|M = T̃.
......................................................................................................................
Claim 3. ||| T ||| = ||| T̃ |||.
∀x ∈ X, T(x) = lim_{n→∞} T̃(xn) where xn ∈ M and xn → x.
Thus ‖T(x)‖_Y = lim_{n→∞} ‖T̃(xn)‖_Y ≤ lim_{n→∞} ||| T̃ ||| ‖xn‖_X = ||| T̃ ||| ‖x‖_X. Thus ||| T ||| ≤ ||| T̃ |||.
However, ∀x ∈ M, T(x) = T̃(x) ⇒ ||| T ||| ≥ ||| T̃ |||, since T is defined on X ⊇ M.
Thus ||| T ||| = ||| T̃ |||.
......................................................................................................................
Claim 4. T is linear, which is trivial to prove.
......................................................................................................................
Claim 5. (uniqueness of T) If L1, L2 ∈ B(X, Y), then L1|M = L2|M ⇒ L1 = L2.
For x ∈ X = M̄, ∃ {xn} ⊆ M s.t. xn → x. Thus, by the continuity of L1 and L2,
L1(x) = lim_{n→∞} L1(xn) = lim_{n→∞} L2(xn) = L2(x), since L1(xn) = L2(xn) by L1|M = L2|M.
......................................................................................................................
Claim 6. If T̃⁻¹ : N → M exists and is bounded, and if X is a Banach space, then T is onto Y.
Let y ∈ Y. Since N̄ = Y, ∃ {yn} ⊆ N s.t. yn → y.
By the same reasoning as in Claim 1, xn = T̃⁻¹(yn) is Cauchy in X.
Since X is a Banach space, ∃ x ∈ X s.t. xn → x.
Thus T(x) = lim_{n→∞} T̃(xn) = lim_{n→∞} T̃(T̃⁻¹(yn)) = lim_{n→∞} yn = y.
Thus, ∀y ∈ Y, ∃x ∈ X s.t. T(x) = y. □

The following theorem is an important application of the previous theorem.


Theorem. Every separable Hilbert space H is isometrically isomorphic to `2 .

Proof.
H separable ⇒ ∃ a countable orthonormal basis {eᵢ}. (Homework.)
Let M = [{eᵢ}_{i=1}^∞].
Of course ℓ₂ has a countable orthonormal basis {ẽᵢ}, where ẽᵢⱼ = δᵢⱼ (Kronecker).
Let N = [{ẽᵢ}_{i=1}^∞].
Define T̃ : M → N to be the linear operator for which T̃(eᵢ) = ẽᵢ.
(Exercise. Think about why the [·] here — consider M̄.)
Then since any x ∈ M has the form x = Σ_{i=1}^n cᵢ eᵢ for some n ∈ N:
‖T̃(x)‖ = ‖T̃(Σ_{i=1}^n cᵢ eᵢ)‖ = ‖Σ_{i=1}^n cᵢ T̃(eᵢ)‖ = ‖Σ_{i=1}^n cᵢ ẽᵢ‖ = √(Σ_{i=1}^n |cᵢ|²) = ‖Σ_{i=1}^n cᵢ eᵢ‖ = ‖x‖,   (6-1)
so ||| T̃ ||| = 1 on M.
Clearly T̃⁻¹ exists on N and is given by T̃⁻¹(ẽᵢ) = eᵢ, so ||| T̃⁻¹ ||| = 1 on N by a similar argument.
Since ℓ₂ is a Banach space, we have established all the conditions of the preceding theorem, so there exists a
unique linear operator T from H to ℓ₂ that is onto ℓ₂ with ||| T ||| = ||| T̃ ||| = 1.
Since T is bounded, it is continuous, so one can take limits as n → ∞ in (6-1) to show that ‖T x‖ = ‖x‖,
so T is norm preserving.
Thus T is an isometric isomorphism for H and ℓ₂, so H and ℓ₂ are isometrically isomorphic. □

Corollary. L2 is isometrically isomorphic to `2 .


A remarkable result. (Perhaps most of what is remarkable here is that L2 is separable.)

A practical consequence: usually one can search for counterexamples in `2 (and its relatives) rather than L2 .
..................................................................................................
Example. The (different) spaces of odd and even functions in L2 [, ] are isometrically isomorphic.
Example. Elaborating. Let H = L₂[−π, π] and define
e_{2k} = cos(kt)/c_{2k},  e_{2k+1} = sin(kt)/c_{2k+1},  k = 0, 1, 2, ...,
c_{2k} = √(∫_{−π}^{π} cos²(kt) dt),  c_{2k+1} = √(∫_{−π}^{π} sin²(kt) dt),
X = closed span of {e_{2k}} = {f ∈ L₂ : f even},  Y = closed span of {e_{2k+1}} = {f ∈ L₂ : f odd}.
Then H, X, and Y are each Banach spaces that are isometrically isomorphic to each other!
And each is isometrically isomorphic to ℓ₂.
Example. `2 is isometrically isomorphic to L2 [, ]. (Just use the DTFT.)

......................................................................................................................
But an even stronger result holds than the above...
Every separable Hilbert space is unitarily equivalent with `2 or some Cn [3, p. 339].


Summary

Insert 5.1-5.3 here!

6.5 Adjoints in Hilbert spaces
When performing optimization in inner product spaces, often we need the "transpose" of a particular linear operator. The term transpose applies only to matrices; the more general concept for linear operators is called the adjoint.
Luenberger presents adjoints in terms of general normed spaces. In my experience, adjoints most frequently arise in inner product spaces, so in the interest of time and simplicity these notes focus on that case. The treatment here generalizes somewhat the treatment in Naylor [3, p. 352].
Recall the following fact from linear algebra. If A : Cⁿ → Cᵐ is an m × n matrix, then
⟨Ax, y⟩_{Cᵐ} = y′Ax = (A′y)′x = ⟨x, A′y⟩_{Cⁿ}.
This section generalizes the above relationship to general Hilbert spaces.


Let A ∈ B(X, Y) where X and Y are Hilbert spaces.
Let y ∈ Y be a fixed vector, and consider the following functional:
g_y : X → C, where g_y(x) ≜ ⟨Ax, y⟩_Y.
g_y is clearly linear, since A is linear and ⟨·, y⟩_Y is linear.
|g_y(x)| ≤ ‖Ax‖_Y ‖y‖_Y ≤ |||A||| ‖x‖_X ‖y‖_Y by Cauchy-Schwarz and since A is bounded.
Thus g_y is bounded, with |||g_y||| ≤ |||A||| ‖y‖_Y.
In other words, g_y ∈ X*.

Definition. By the Riesz representation theorem (here is where we use completeness), for each such y ∈ Y there exists a unique z = z_y ∈ X such that
g_y(x) = ⟨x, z_y⟩_X, ∀x ∈ X.
So we can legitimately define a mapping A* : Y → X, called the adjoint of A, by the relationship z_y = A*(y).
The defining property of A* is then:
⟨Ax, y⟩_Y = ⟨x, A*(y)⟩_X, ∀x ∈ X, ∀y ∈ Y.
(At this point we should write A*(y) rather than A*y since we have not yet shown A* is linear, though we will soon.)
Lemma. A* is the only mapping of Y to X that satisfies the preceding equality.
Proof. For any y ∈ Y, suppose ∀x ∈ X we have ⟨Ax, y⟩_Y = ⟨x, T1(y)⟩_X = ⟨x, T2(y)⟩_X.
Then 0 = ⟨x, T1(y) − T2(y)⟩_X, ∀x ∈ X, so T1(y) = T2(y), ∀y ∈ Y. □

Exercise. Here are some simple facts about adjoints, all of which concur with those of the Hermitian transpose in Euclidean space.
• ⟨αx, y⟩ = ⟨x, α*y⟩, so for A : X → X defined by Ax = αx we have A*y = α*y. So reuse of the asterisk is acceptable.
• I* = I, and the adjoint of the zero operator from X to Y is the zero operator from Y to X.
• A** = A (see Thm below)
• (ST)* = T*S*
• (S + T)* = S* + T*
• (αA)* = α*A*
Note: these last two properties are unrelated to the question of whether A* is a linear operator! ??
Example. Consider X = L₂[0, 1], Y = C², and A : X → Y defined by
y = Ax ⟺ y₁ = [Ax]₁ = ∫₀¹ t x(t) dt,  y₂ = [Ax]₂ = ∫₀¹ t² x(t) dt.
We can guess that the adjoint of A is defined by x = A*y ⟺ x(t) = (A*y)(t) = y₁ t + y₂ t².
This is verified easily:
⟨Ax, y⟩_Y = [Ax]₁ y₁* + [Ax]₂ y₂* = y₁* ∫₀¹ t x(t) dt + y₂* ∫₀¹ t² x(t) dt = ∫₀¹ x(t) [y₁ t + y₂ t²]* dt = ⟨x, A*y⟩_X.
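A discretized numerical check of this adjoint pair (grid size and names are illustrative; real scalars are used so conjugates can be ignored):

```python
import numpy as np

# Discretize L2[0,1] on a uniform grid; <x, z> is approximated by sum x(t) z(t) dt
m = 2000
t = (np.arange(m) + 0.5) / m
dt = 1.0 / m

def A(x):
    # [Ax]_1 = \int t x(t) dt, [Ax]_2 = \int t^2 x(t) dt (Riemann sums)
    return np.array([np.sum(t * x) * dt, np.sum(t**2 * x) * dt])

def A_star(y):
    # the guessed adjoint: (A*y)(t) = y1 t + y2 t^2
    return y[0] * t + y[1] * t**2

rng = np.random.default_rng(5)
x = rng.standard_normal(m)
y = rng.standard_normal(2)

lhs = A(x) @ y                      # <Ax, y> in R^2
rhs = np.sum(x * A_star(y)) * dt    # <x, A*y> in (discretized) L2[0,1]
```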

Theorem. Suppose A ∈ B(X, Y) where X and Y are Hilbert spaces. Then:
• The adjoint operator A* is linear and bounded, i.e., A* ∈ B(Y, X).
• |||A*||| = |||A|||
• (A*)* = A

Proof.
Claim 1. A* is linear.
⟨x, A*(αy + z)⟩_X = ⟨Ax, αy + z⟩_Y = α*⟨Ax, y⟩_Y + ⟨Ax, z⟩_Y = α*⟨x, A*(y)⟩_X + ⟨x, A*(z)⟩_X
= ⟨x, αA*(y) + A*(z)⟩_X, which holds for all x ∈ X and y, z ∈ Y, so A*(αy + z) = αA*(y) + A*(z) by the usual Lemma once again.
......................................................................................................................
Claim 2. A* is bounded and |||A*||| ≤ |||A|||.
‖A*y‖²_X = ⟨A*y, A*y⟩_X = ⟨AA*y, y⟩_Y ≤ ‖AA*y‖_Y ‖y‖_Y ≤ |||A||| ‖A*y‖_X ‖y‖_Y,
so ‖A*y‖_X ≤ |||A||| ‖y‖_Y and thus |||A*||| ≤ |||A||| and hence A* ∈ B(Y, X).
......................................................................................................................
Claim 3. A** = A.
Since we have shown A* ∈ B(Y, X), we can legitimately define the adjoint of A*, denoted A**, as the (bounded linear) operator
that satisfies ⟨A*y, x⟩_X = ⟨y, A**x⟩_Y, ∀x ∈ X, y ∈ Y.
Since ⟨y, Ax⟩_Y = ⟨Ax, y⟩*_Y = ⟨x, A*y⟩*_X = ⟨A*y, x⟩_X = ⟨y, A**x⟩_Y, by the previous uniqueness arguments we see A** = A.
......................................................................................................................
Claim 4. |||A*||| = |||A|||.
From Claim 2 applied to A*: |||A**||| ≤ |||A*|||, or equivalently: |||A||| ≤ |||A*|||. □

Corollary.
Under the same conditions as above: |||A*A||| = |||AA*||| = |||A|||² = |||A*|||².
Proof. Recalling that |||S T||| ≤ |||S||| ||| T |||, we have by the preceding theorem:
|||A*A||| ≤ |||A*||| |||A||| = |||A|||² = |||A*|||²,
and ‖Ax‖²_Y = ⟨Ax, Ax⟩_Y = ⟨Ax, (A*)*x⟩_Y = ⟨A*Ax, x⟩_X ≤ ‖A*Ax‖_X ‖x‖_X ≤ |||A*A||| ‖x‖²_X,
so |||A|||² ≤ |||A*A|||. Combining yields the equality |||A|||² = |||A*A|||. The rest is obvious using A** = A. □
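A matrix sketch of this corollary (illustrative random matrix; the spectral norm is the ℓ₂ operator norm):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 7))

op = lambda M: np.linalg.norm(M, 2)   # spectral norm = operator norm for l2

n_A, n_At = op(A), op(A.T)            # |||A||| and |||A*||| (real case: A* = A')
n_AtA, n_AAt = op(A.T @ A), op(A @ A.T)
```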

Proposition. If A ∈ B(X, Y) is invertible, where X, Y are Hilbert spaces, then A* has an inverse and (A*)⁻¹ = (A⁻¹)*.

Proof.
Claim 1. A* : Y → X is one-to-one into X.
Consider y₁, y₂ ∈ Y with y₁ ≠ y₂, but suppose A*y₁ = A*y₂, so A*d = 0 where d = y₂ − y₁ ≠ 0.
Thus ⟨x, A*d⟩_X = 0, ∀x ∈ X, and hence ⟨Ax, d⟩_Y = 0, ∀x ∈ X.
Since A is invertible, we can make the change of variables z = Ax and hence ⟨z, d⟩_Y = 0, ∀z ∈ Y.
But this implies d = 0, contradicting the supposition that d ≠ 0. So A* is one-to-one.

Claim 2. A* : Y → X is onto X.
By the Banach inverse theorem, since A ∈ B(X, Y) and A is invertible, A⁻¹ ∈ B(Y, X), so A⁻¹ has its own adjoint, (A⁻¹)*.
Pick any z ∈ X. Then for any x ∈ X,
⟨x, z⟩ = ⟨A⁻¹Ax, z⟩ = ⟨Ax, (A⁻¹)*z⟩ = ⟨x, A*(A⁻¹)*z⟩.
Thus z = A*[(A⁻¹)*z], showing that z is in the range of A*. Since z ∈ X was arbitrary, A* is onto X.

Claim 3. (A*)⁻¹ = (A⁻¹)*.
Since A* is one-to-one and onto X, A* is invertible.
Furthermore, z = A*(A⁻¹)*z ⇒ (A*)⁻¹z = (A⁻¹)*z, ∀z ∈ X. Thus (A*)⁻¹ = (A⁻¹)*. □

Remark. AB = I and BA = I ⇒ A, B invertible and A⁻¹ = B.
Remark. AB = I and A invertible ⇒ A⁻¹ = B.

c J. Fessler, December 21, 2004, 13:3 (student version)


6.13

Definition. A ∈ B(H, H), where H is a real Hilbert space, is called self-adjoint if A* = A.
Exercise. An orthogonal projection P_M : H → H, where M is a closed subspace of a Hilbert space H, is self-adjoint. ??

Exercise. Conversely, if P ∈ B(H, H) and P² = P and P* = P, then P is an orthogonal projection operator. (L6.16)

Definition. A self-adjoint bounded linear operator A on a Hilbert space H is positive semidefinite iff ⟨x, Ax⟩ ≥ 0, ∀x ∈ H.

Remark. It is easily shown that ⟨x, Ax⟩ is real when A is self-adjoint.

Example. When M is a Chebyshev subspace in an inner product space, is P_M a self-adjoint operator? ??
Unitary operators

(Caution: the proof on [3, p. 358] is incomplete w.r.t. the onto aspects.)

We previously defined unitary operators; we now examine the adjoints of these.


Theorem. Suppose U ∈ B(X, Y) where X, Y are Hilbert spaces. Then the following are equivalent:
1. U is unitary, i.e., U is an isomorphism (linear, onto, and one-to-one) and ⟨U x, U z⟩ = ⟨x, z⟩, ∀x, z ∈ X,
2. U*U = I and U U* = I,
3. U is invertible with U⁻¹ = U*.

Proof. (2 ⇒ 3) and (3 ⇒ 1) are obvious.
(1 ⇒ 2) If U is unitary then for all x, z ∈ X: ⟨x, z⟩_X = ⟨U x, U z⟩ = ⟨U*U x, z⟩, so U*U x = x, ∀x ∈ X, so U*U = I_X.
For any y ∈ Y, since U is onto there exists an x ∈ X s.t. U x = y. Thus U U* y = U U* U x = U I_X x = U x = y.
Since y ∈ Y was arbitrary, U U* = I_Y. □

Remark. A corollary is that if U is unitary, then so is U*.

Remark. To see why we need both U*U = I and U U* = I above, consider U = [1 ; 0], for which U*U = I but U is not onto.
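A matrix sketch of this remark:

```python
import numpy as np

U = np.array([[1.0], [0.0]])   # U = [1; 0] maps R^1 into R^2; isometric but not onto

UtU = U.T @ U   # 1x1 identity: U preserves norms, so U*U = I
UUt = U @ U.T   # 2x2 projection onto the first coordinate; NOT the identity
```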

Example.
Consider X = Y = ℓ₂ and the (linear) discrete-time convolution operator A : ℓ₂ → ℓ₂ defined by
z = Ax ⟺ z_n = Σ_{k=−∞}^∞ h_{n−k} x_k, n ∈ Z,
where we assume that h ∈ ℓ₁, which is equivalent to BIBO stability. We showed previously that ‖Ax‖₂ ≤ ‖h‖₁ ‖x‖₂, so A is
bounded with |||A||| ≤ ‖h‖₁, so A has an adjoint.
(Later we will show |||A||| = ‖H‖_∞ where H is the frequency response of h.)
Since A is bounded, it is legitimate to search for its adjoint:
⟨Ax, y⟩ = Σ_{n=−∞}^∞ (Σ_{k=−∞}^∞ h_{n−k} x_k) y_n* = Σ_{k=−∞}^∞ x_k (Σ_{n=−∞}^∞ h*_{n−k} y_n)* = Σ_{k=−∞}^∞ x_k ([A*y]_k)* = ⟨x, A*y⟩,
where the adjoint is
[A*y]_k = Σ_{n=−∞}^∞ h*_{n−k} y_n, i.e., [A*y]_n = Σ_{k=−∞}^∞ h*_{k−n} y_k,
which is convolution with h*_{−n}.

When is A self-adjoint? When h_l = h*_{−l}, i.e., h is Hermitian symmetric.

When is A unitary? When h_n ∗ h*_{−n} = δ[n], i.e., when |H(ω)|² = 1.
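A numerical sketch of this adjoint (real h for simplicity, so conjugates drop out; supports are kept away from the truncation boundary so finite vectors emulate ℓ₂ sequences):

```python
import numpy as np

rng = np.random.default_rng(7)
h = rng.standard_normal(5)            # real, finitely supported impulse response

N = 64
x = np.zeros(N); x[10:20] = rng.standard_normal(10)
y = np.zeros(N); y[10:30] = rng.standard_normal(20)

def A(x):
    # (Ax)_n = sum_k h_{n-k} x_k : convolution with h
    return np.convolve(x, h)[:N]

def A_star(y):
    # (A*y)_n = sum_k h_{k-n} y_k : convolution with the time-reversed h
    return np.convolve(y, h[::-1])[len(h) - 1: len(h) - 1 + N]

lhs = A(x) @ y        # <Ax, y>
rhs = x @ A_star(y)   # <x, A*y>
```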

c J. Fessler, December 21, 2004, 13:3 (student version)


6.6 Relations between the four spaces
The following theorem relates the null spaces and range spaces of a linear operator and its adjoint.

Remark. Luenberger uses the notation [R(A)]⊥, but this seems unnecessary since R(A) is a subspace.
Theorem. If A ∈ B(X, Y) where X, Y are Hilbert spaces, then
1. {R(A)}⊥ = N(A*),
2. closure(R(A)) = {N(A*)}⊥,
3. {R(A*)}⊥ = N(A),
4. closure(R(A*)) = {N(A)}⊥.

Proof.

Claim 1. {R(A)}⊥ = N(A*).

Pick z ∈ N(A*) and any y ∈ R(A), so y = Ax for some x ∈ X.

Now ⟨y, z⟩ = ⟨Ax, z⟩ = ⟨x, A*z⟩ = ⟨x, 0⟩ = 0.

Thus z ∈ N(A*) ⟹ z ∈ {R(A)}⊥, since y ∈ R(A) was arbitrary. So N(A*) ⊆ {R(A)}⊥.

Now pick y ∈ {R(A)}⊥. Then for all x ∈ X, 0 = ⟨Ax, y⟩ = ⟨x, A*y⟩. So A*y = 0, i.e., y ∈ N(A*).

Since y ∈ {R(A)}⊥ was arbitrary, {R(A)}⊥ ⊆ N(A*). Combining: {R(A)}⊥ = N(A*).
......................................................................................................................
Claim 2. closure(R(A)) = {N(A*)}⊥.

Taking the orthogonal complement of part 1 yields {{R(A)}⊥}⊥ = {N(A*)}⊥.

Recall from proposition 3.4-1 that {S⊥}⊥ = closure([S]) when S is a subset of a Hilbert space. Since R(A) is a subspace, {{R(A)}⊥}⊥ = closure(R(A)).
......................................................................................................................
Parts 3 and 4 follow by applying 1 and 2 to A* and using the fact that A** = A. □
Example. (To illustrate why we need closure in R(A) above. Modified from [4, p. 156].)
Consider the linear operator A : ℓ2 → ℓ2 defined by Ax = (x₁, x₂/2, x₃/3, . . .). Clearly x ∈ ℓ2 ⟹ Ax ∈ ℓ2 with ‖Ax‖ ≤ ‖x‖, so A is bounded (and hence continuous). In fact |||A||| = 1 (consider x = e₁).
Clearly R(A) includes all finitely nonzero sequences, so closure(R(A)) = ℓ2.
However, y = (1, 1/2, 1/3, . . .) ∉ R(A) (Why not?) yet y ∈ ℓ2, so R(A) is not closed.
This problem never arises in finite-dimensional spaces!
Example. Consider A : ℓ2 → ℓ2 defined by the upsampling operator: Ax = (x₁, 0, x₂, 0, x₃, 0, . . .).
It is easily shown that A is bounded and |||A||| = 1.
One can verify that the adjoint operation is downsampling: A*y = (y₁, y₃, y₅, . . .).

Clearly R(A*) = ℓ2 and N(A) = {0} = [R(A*)]⊥.

Furthermore, R(A) = [{e_{2i+1}}_{i=0}^{∞}] and N(A*) = [{e_{2i}}_{i=1}^{∞}], and these two spaces are orthogonal complements of one another.
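A minimal finite-dimensional sketch of this example: upsampling from R⁴ into R⁸, checking the adjoint identity and that A preserves norms.

```python
import numpy as np

def upsample(x):
    z = np.zeros(2 * len(x))
    z[0::2] = x          # (x1, 0, x2, 0, ...)
    return z

def downsample(y):
    return y[0::2]       # (y1, y3, y5, ...)

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
y = rng.standard_normal(8)

# Adjoint identity <Ax, y> = <x, A*y>:
print(np.isclose(upsample(x) @ y, x @ downsample(y)))                 # True
# A preserves norms, so |||A||| = 1 is attained by every nonzero x:
print(np.isclose(np.linalg.norm(upsample(x)), np.linalg.norm(x)))     # True
```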


Example. Consider A : L₂(R) → L₂(R) (which is complete [3, p. 589]) defined by the shifted filtering operator:

    y = Ax ⟺ y(t) = (Ax)(t) = ∫ sinc²(t − τ − 3) x(τ) dτ,

where sinc(t) = sin(πt)/(πt) for t ≠ 0, or unity if t = 0. The adjoint is

    (A*y)(t) = ∫ sinc²(t − τ + 3) y(τ) dτ.

The nullspace of A* consists of signals in L₂ whose spectrum is zero over the frequencies (−1/2, 1/2). The range of A is all signals in L₂ that are band-limited to that same range, so the orthogonal complement is the same as N(A*). (Picture)
Exercise. Why did I use sinc²(·) rather than sinc(·)? ??

6.7 Duality relations for convex cones

skip

6.8 Geometric interpretation of adjoints

(Presented in terms of general adjoints.)

When A ∈ B(X, Y) with X and Y Hilbert spaces, consider the following hyperplanes:

    V₁ = {x ∈ X : ⟨x, A*y⟩ = 1}  for some y ∈ Y,
    V₂ = {y ∈ Y : ⟨Ax₀, y⟩ = 1}  for some x₀ ∈ X.

skip


Optimization in Hilbert spaces

Consider the problem of solving y = Ax, where y ∈ Y is given, x ∈ X is unknown, A ∈ B(X, Y) is given, and X and Y are Hilbert spaces. For any such y, there are three possibilities for x:
• a unique solution,
• no solution,
• multiple solutions.
If A is invertible, then there is a unique solution x = A⁻¹y, which is the least interesting case.

6.9 The normal equations (No exact solutions, so we seek a minimum-norm, unconstrained approximation.)
We previously explored the normal equations in a setting where R(A) was finite dimensional. Now we have the tools to generalize.
The following theorem illustrates the fundamental role of adjoints in optimization.
Theorem. Let A ∈ B(X, Y) where X and Y are Hilbert spaces.
For a fixed y ∈ Y, a vector x ∈ X minimizes ‖y − Ax‖_Y iff A*Ax = A*y.

Proof. Consider the subspace M = R(A). Then the minimization problem is equivalent to inf_{m∈M} ‖y − m‖.
By the pre-projection theorem, m⋆ ∈ M achieves the infimum iff y − m⋆ ⊥ M, i.e., y − m⋆ ∈ M⊥ = [R(A)]⊥ = N(A*), by a previous theorem. Thus, 0 = A*(y − m⋆) = A*y − A*Ax for any x ∈ X with Ax = m⋆. □
There is no claim of existence here, since R(A) might not be closed.
There is no claim of uniqueness of x⋆ here, since although m⋆ will be unique, there may be multiple solutions to m⋆ = Ax.
If a minimum distance solution x⋆ exists and A*A is invertible, then the solution is unique and has the familiar form:

    x⋆ = (A*A)⁻¹A*y.
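A minimal finite-dimensional sketch of the familiar form: for a full-column-rank matrix A, solving the normal equations gives the least-squares x⋆, and the residual is orthogonal to R(A). The matrix and data below are arbitrary illustrations.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 1.0, 3.0])

# x* = (A*A)^{-1} A* y via the normal equations:
x_star = np.linalg.solve(A.T @ A, A.T @ y)

# The residual y - A x* is orthogonal to R(A), i.e., A*(y - A x*) = 0:
print(np.allclose(A.T @ (y - A @ x_star), 0))  # True
```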

Example. Find the minimum-norm approximation to y ∈ H of the form ŷ = Σ_{i=1}^{n} αᵢ xᵢ, where the xᵢ's are linearly independent vectors in H.

We know how to solve this from Ch. 3, but the operator notation provides a concise expression.
Define the operator A ∈ B(Cⁿ, H) by

    Aα ≜ Σ_{i=1}^{n} αᵢ xᵢ,   where α = (α₁, . . . , αₙ).

Note that ‖Aα‖² = α′Gα ≤ λ_max(G) ‖α‖², where G = A*A is the Gram matrix. Since the xᵢ's are linearly independent, G is (Hermitian) symmetric positive definite, so its eigenvalues are real and positive. So A is bounded and |||A||| = √(λ_max(G)).

Our goal is to minimize ‖y − Aα‖ over α ∈ Cⁿ. By the preceding theorem, the optimal solution must satisfy A*Aα = A*y.
What is A* : H → Cⁿ here? Recall we need ⟨Aα, y⟩_H = ⟨α, A*y⟩_Cⁿ, ∀α ∈ Cⁿ, so

    ⟨Aα, y⟩_H = ⟨Σ_{i=1}^{n} αᵢ xᵢ, y⟩_H = Σ_{i=1}^{n} αᵢ ⟨xᵢ, y⟩_H = Σ_{i=1}^{n} αᵢ ⟨y, xᵢ⟩*_H = Σ_{i=1}^{n} αᵢ [A*y]ᵢ* = ⟨α, A*y⟩_Cⁿ,

where we see that [A*y]ᵢ = ⟨y, xᵢ⟩_H and hence

    A*y = (⟨y, x₁⟩_H, . . . , ⟨y, xₙ⟩_H).

Thus one can easily show that A*Aα = A*y is equivalent to the usual normal equations.
So no computational effort has been saved, but the notation is more concise. Furthermore, the notation (A*A)⁻¹A*y is comfortingly similar to the notation (A′A)⁻¹A′y that we use for the least-squares solution of linear systems of equations in Euclidean space. So with everything defined appropriately, the generalization to arbitrary Hilbert spaces is very natural.
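As a concrete (hypothetical) instance of this Gram-matrix formulation: approximate y(t) = |t| on [−1, 1] by α₁·1 + α₂·t², with ⟨u, v⟩ = ∫ u(t)v(t) dt approximated on a midpoint grid. One can check by hand that the exact minimizer is α = (3/16, 15/16).

```python
import numpy as np

K = 20000
dt = 2.0 / K
t = -1.0 + dt * (np.arange(K) + 0.5)    # midpoint grid on [-1, 1]

def inner(u, v):
    # Midpoint-rule approximation to the L2[-1,1] inner product.
    return np.sum(u * v) * dt

xs = [np.ones_like(t), t ** 2]          # basis vectors x_1, x_2
y = np.abs(t)

G = np.array([[inner(xj, xi) for xj in xs] for xi in xs])  # Gram matrix A*A
b = np.array([inner(y, xi) for xi in xs])                  # A*y = (<y, x_i>)
alpha = np.linalg.solve(G, b)
print(alpha)   # close to (3/16, 15/16)
```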


6.10 The dual problem (for minimum norm solutions)
If y = Ax has multiple solutions, then in some contexts it is reasonable to choose the solution that minimizes some type of norm.
However, the appropriate norm is not necessarily the norm induced by the inner product.
Exercise. Generalize Luenberger's treatment to weighted norms.
Theorem. Let X and Y be Hilbert spaces and A ∈ B(X, Y). Suppose y ∈ R(A) is given, i.e., y = Ax₀ for some x₀ ∈ X.
Assume that R(A*) is closed in X. (ERROR in Luenberger p. 161.)
The unique vector x⋆ ∈ X having minimum norm and satisfying Ax = y is characterized by:

    x⋆ = A*z, where z is any solution of AA*z = y, z ∈ Y.

Proof. Since x₀ is one solution to Ax = y, the general solution has the form x = x₀ − m, where m ∈ M ≜ N(A).
In other words, we seek the minimum norm vector in the linear variety

    V = {x ∈ X : Ax = y} = {x₀ − m : m ∈ M}.

Since A is continuous, N(A) is a closed subspace (homework). Thus V is a closed linear variety in a Hilbert space, and as such has a unique element x⋆ of minimum norm by the (generalized) projection theorem, and that element is characterized by the two conditions x⋆ = x₀ − m⋆ ⊥ M and x⋆ ∈ V.

Since R(A*) was assumed closed, by the previous 4-space theorem we have M⊥ = {N(A)}⊥ = closure(R(A*)) = R(A*).
Thus x⋆ ⊥ M ⟹ x⋆ ∈ M⊥ = R(A*) ⟹ x⋆ = A*z for some z ∈ Y.
(There may be more than one such z.)
Furthermore, x⋆ ∈ V ⟹ Ax⋆ = y ⟹ AA*z = y.
(x⋆ is unique even if there are many such z values!) □
......................................................................................................................
If AA* is invertible, then the minimum norm solution has the form:

    x⋆ = A*(AA*)⁻¹y.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Example. X = R³, Y = R², A = [1 0 1; 0 1 1], y = (3, 3).
There are multiple solutions, including x₁ = (0, 0, 3) and x₂ = (3, 3, 0), and convex combinations thereof.
Here A* = Aᵀ, so

    AA* = [2 1; 1 2],   [AA*]⁻¹ = (1/3)[2 −1; −1 2],   z = [AA*]⁻¹y = (1, 1),   and x⋆ = A*z = (1, 1, 2).

Of course x⋆ has a smaller 2-norm than the other solutions above.
However, x₁ is more sparse, which can be important in some applications.
 
Example. X = R¹, Y = R², A = [1; 1], y = (2, 2). Then, using MATLAB's pinv function, x̂ = 2.
In this case, AA* = [1 1; 1 1] is singular, so there are multiple solutions to AA*z = y; each of them leads to the same x̂ = A*z, however!
......................................................................................................................
Example. The downsampling-by-averaging operator A : ℓ2 → ℓ2 is defined by Ax = (x₁/2 + x₂/2, x₃/2 + x₄/2, . . .).
One can show that this is bounded with |||A||| = 1/√2, since |||[1/2 1/2]||| = 1/√2.
The adjoint is A*y = (y₁/2, y₁/2, y₂/2, y₂/2, . . .), so AA*z = (z₁/2, z₂/2, . . .) = (1/2)z, i.e., AA* = (1/2)I. So z = 2y.
Thus x⋆ = A*z = 2A*y = (y₁, y₁, y₂, y₂, . . .), which is a sensible solution in this application.
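The minimum-norm formula x⋆ = A*(AA*)⁻¹y is easy to check numerically on the R³ example above (A = [1 0 1; 0 1 1], y = (3, 3)):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([3.0, 3.0])

z = np.linalg.solve(A @ A.T, y)   # AA* z = y
x_star = A.T @ z
print(x_star)                     # [1. 1. 2.]

# Agrees with the pseudo-inverse solution and beats the sparse solution in norm:
print(np.allclose(x_star, np.linalg.pinv(A) @ y))                  # True
print(np.linalg.norm(x_star) < np.linalg.norm([0.0, 0.0, 3.0]))    # True
```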


6.11 Pseudo-inverse operators
This concept allows a more general treatment of finding solutions to Ax = y, regardless of how many such solutions exist.
Definition. Let X and Y be Hilbert spaces with A ∈ B(X, Y) and R(A) closed in Y. For any y ∈ Y, define the following linear variety:

    V_y = {x₁ ∈ X : ‖Ax₁ − y‖_Y = min_{x∈X} ‖Ax − y‖_Y}.

Among all vectors x₁ ∈ V_y, let x₀ be the unique vector of minimum norm ‖·‖_X.
The pseudo-inverse A⁺ of A is the operator mapping each y ∈ Y into its corresponding x₀. So A⁺ : Y → X.
Note: closure of R(A) usually arises from one of X or Y being finite dimensional.
This definition is legitimate since min_{x∈X} ‖y − Ax‖ = min_{m∈M=R(A)} ‖y − m‖, where R(A) is assumed closed.
By the projection theorem there is a unique ŷ ∈ M = R(A) of minimum distance to y.
However, the linear variety V = {x ∈ X : ŷ = Ax} may nevertheless contain multiple points.
What about uniqueness of the vector having minimum norm?
The set {x₁ ∈ X : Ax₁ = ŷ} is a linear variety, a translate of N(A), which is closed. Why? ??
So by the Ch. 3 theorem on minimizing norms within a linear variety, x₀ is unique. Thus A⁺ is well defined.
If A is invertible, then x₀ = A⁻¹y will of course be the minimizer, in which case we have A⁺ = A⁻¹.

(Picture: the variety V = {x ∈ X : Ax = ŷ} is a translate of N(A); x₀ is its point in {N(A)}⊥, and A⁺ maps y to x₀.)

Often A⁺ is many-to-one since many y vectors will map to the same x₀.


Geometric interpretation

(The above definition is algebraic.)

Since N(A) is a closed subspace of the Hilbert space X, by the theorem on orthogonal complements we have

    X = N(A) ⊕ {N(A)}⊥.

Similarly, since we have assumed that R(A) is closed (and it is a subspace):

    Y = R(A) ⊕ {R(A)}⊥.

When restricted to the subspace {N(A)}⊥, the operator A is a mapping from {N(A)}⊥ to R(A) (of course).
Between these spaces, A is one-to-one, due to the following Lemma.

Thus A has a linear inverse on R(A) that maps each point in R(A) back into a point in {N(A)}⊥. This inverse defines A⁺ on R(A). To define A⁺ on all of Y, define A⁺y = 0 for y ∈ {R(A)}⊥.

One way to write this is: A⁺ = [A|_{N(A)⊥}]⁻¹ P_{R(A)}.

Lemma. Let A : X → Y be a linear operator on a Hilbert space X.
If S ⊆ {N(A)}⊥, then A is one-to-one on S.

Proof. Suppose As₁ = As₂, where s₁, s₂ ∈ S ⊆ {N(A)}⊥, which is a subspace, so s = s₁ − s₂ ∈ {N(A)}⊥.
By the linearity of A, we have As = 0, so s ∈ N(A). Thus s = 0 and hence s₁ = s₂. So A is one-to-one on S. □

(Picture: A maps [N(A)]⊥ one-to-one onto R(A); A⁺ inverts this map on R(A) and sends [R(A)]⊥ to 0.)
. . . . . . . . . . . . . . . . . . . . . . .. . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Example. Consider A = [0 1; 0 1]. Then

    N(A) = {(a, 0) : a ∈ R},      {N(A)}⊥ = {(0, b) : b ∈ R},
    R(A) = {(b, b) : b ∈ R},      {R(A)}⊥ = {(a, −a) : a ∈ R}.

Recall that

    y = P_{[v]} x ⟺ y = ⟨x, v/‖v‖⟩ v/‖v‖,

so

    P_{R(A)} = (1/2) [1; 1] [1 1] = (1/2) [1 1; 1 1].

Restricted to {N(A)}⊥, A maps (0, b) to (b, b), so [A|_{N(A)⊥}]⁻¹ maps (c, c) to (0, c). Hence

    A⁺ = [A|_{N(A)⊥}]⁻¹ P_{R(A)} = [0 0; 1/2 1/2].

Example. For the scalar operator A = aI, clearly [A|_{N(A)⊥}]⁻¹ P_{R(A)} yields

    (aI)⁺ = (1/a) I if a ≠ 0,   0I if a = 0,   i.e., (aI)⁺ = a⁺ I.
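The geometric construction above can be checked against numpy's SVD-based pseudo-inverse:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 1.0]])

A_plus = np.array([[0.0, 0.0],
                   [0.5, 0.5]])

print(np.allclose(np.linalg.pinv(A), A_plus))  # True
# A A+ is the orthogonal projector onto R(A),
# and A+ A is the orthogonal projector onto {N(A)} perp:
print(A @ A_plus)    # (1/2) [[1, 1], [1, 1]]
print(A_plus @ A)    # diag(0, 1)
```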

Here are algebraic properties of the pseudo-inverse.

Proposition. Let A ∈ B(X, Y) have closed range, with pseudo-inverse A⁺ : Y → X. Then
• A⁺ ∈ B(Y, X)
• (A⁺)⁺ = A
• A⁺AA⁺ = A⁺
• AA⁺A = A
• (A*)⁺ = (A⁺)*
• (A*A)⁺ = A⁺(A⁺)* = A⁺(A*)⁺
• A⁺ = (A*A)⁺A* = A*(AA*)⁺
• A⁺ = (A*A)⁻¹A* if A*A is invertible
• A⁺ = A*(AA*)⁻¹ if AA* is invertible
In finite-dimensional spaces, a simple formula is given in terms of the SVD of A.

Proof. (Exercise)
......................................................................................................................
L6.19:

    A⁺ = lim_{ε→0⁺} [A*A + εI]⁻¹ A* = lim_{ε→0⁺} A* [AA* + εI]⁻¹,

where the limits represent convergence with respect to what norm? ??
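Several of these properties, and the regularized limit, can be checked numerically. This sketch uses a rank-deficient A with known singular values, so that neither A*A nor AA* is invertible:

```python
import numpy as np

rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = U[:, :3] @ np.diag([3.0, 2.0, 1.0]) @ V[:, :3].T   # 4 x 5, rank 3

Ap = np.linalg.pinv(A)

print(np.allclose(A @ Ap @ A, A))                        # AA+A = A
print(np.allclose(Ap @ A @ Ap, Ap))                      # A+AA+ = A+
print(np.allclose(np.linalg.pinv(A.T), Ap.T))            # (A*)+ = (A+)*
print(np.allclose(np.linalg.pinv(A.T @ A) @ A.T, Ap))    # A+ = (A*A)+ A*

# Regularized limit: [A*A + eps I]^{-1} A*  ->  A+  as eps -> 0+.
eps = 1e-8
Ap_reg = np.linalg.solve(A.T @ A + eps * np.eye(5), A.T)
print(np.abs(Ap_reg - Ap).max())   # small (on the order of eps)
```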

......................................................................................................................
One property that is missing is (CB)⁺ = B⁺C⁺; in general (CB)⁺ ≠ B⁺C⁺, unlike with adjoints and inverses.
However, L 6.21 claims that if B is onto and C is one-to-one, then (CB)⁺ = B⁺C⁺.

Example. C = [1 1], C* = [1; 1], C⁺ = C*(CC*)⁻¹ = [1/2; 1/2];   B = [1; 0], B* = [1 0], B⁺ = (B*B)⁻¹B* = [1 0].
In this case, CB = 1, but B⁺C⁺ = 1/2.
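The counterexample above, checked numerically:

```python
import numpy as np

C = np.array([[1.0, 1.0]])     # 1 x 2, not one-to-one
B = np.array([[1.0], [0.0]])   # 2 x 1, not onto

CB_plus = np.linalg.pinv(C @ B)               # [[1.]]
BpCp = np.linalg.pinv(B) @ np.linalg.pinv(C)  # [[0.5]]
print(CB_plus, BpCp)   # (CB)+ != B+ C+
```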

Example. Something involving DTFT/filtering following DTFT analysis.

Example. Downsampling by averaging. (Handwritten notes.)

Example. (p. 165)
From regularization design in tomography, form a LS approximation of the form: f(φ) ≈ Σ_{k=0}^{3} αₖ cos²(φ − kπ/4).
But those cos² terms are linearly dependent!

More generally, if the xᵢ's may be linearly dependent, how do we work with

    min_{α∈Cⁿ} ‖y − Σ_{i=1}^{n} αᵢ xᵢ‖ ?

Define A : Cⁿ → H by Aα = Σ_{i=1}^{n} αᵢ xᵢ, where y, xᵢ ∈ H.

If the minimizing α is not unique, we could choose the one of minimum norm.
Here R(A) is closed since it is a finite-dimensional subspace of H.
So A⁺ exists and A⁺ = (A*A)⁺A*, where G = A*A is simply the n × n Gram matrix. So the minimum norm LS solution is

    α̂ = A⁺y = (A*A)⁺A*y.

Since G is (Hermitian) symmetric nonnegative definite, it has an orthogonal eigenvector decomposition G = QDQ′, and one can show that G⁺ = QD⁺Q′.
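A minimal sketch of G⁺ = QD⁺Q′ for a singular Gram matrix: invert only the nonzero eigenvalues. The vectors below are a hypothetical linearly dependent set.

```python
import numpy as np

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
x3 = x1 + x2                    # linearly dependent third vector
X = np.column_stack([x1, x2, x3])
G = X.T @ X                     # 3 x 3 Gram matrix, rank 2

d, Q = np.linalg.eigh(G)        # G = Q D Q'
d_plus = np.array([1.0 / v if v > 1e-12 else 0.0 for v in d])
G_plus = Q @ np.diag(d_plus) @ Q.T

print(np.allclose(G_plus, np.linalg.pinv(G)))  # True
```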


Analysis of the DTFT

Given a discrete-time signal g(n), i.e., g : Z → C, the discrete-time Fourier transform or DTFT is defined in introductory signal processing books as follows:

    G = Fg ⟺ G(ω) = Σ_{n=−∞}^{∞} g(n) e^{−jωn}.    (6-2)

This is an infinite series, so for a rigorous treatment we must find suitable normed spaces in which we can establish convergence.
The natural family of norms is ℓp, for some 1 ≤ p ≤ ∞. Why? What about the doubly infinite sum? ??
The logical meaning of the above definition is really

    G = Fg ⟺ G = lim_{N→∞} F_N g,  where  G_N = F_N g ⟺ G_N(ω) ≜ Σ_{n=−N}^{N} g(n) e^{−jωn},  N ∈ N.    (6-3)

Alternatively, one might also try to show that F = lim_{N→∞} F_N, where the limit is with respect to the operator norm in B(ℓp, L_r[−π, π]) for some p and r, but this is in fact false! (See below.)
Since G_N(ω) is only a finite sum, clearly it is always well defined.
Furthermore, being a finite sum of complex exponentials, G_N(ω) is continuous in ω, and hence Lebesgue integrable on [−π, π].
So we could write F_N : R^{2N+1} → L₁[−π, π], or perhaps more usefully: F_N : ℓp → L_r[−π, π] for any 1 ≤ p, r ≤ ∞.
To elaborate, note that by Hölder's inequality:

    |G_N(ω)| = |Σ_{n=−N}^{N} g(n) e^{−jωn}| ≤ Σ_{n=−N}^{N} |g(n)| = Σ_{n=−∞}^{∞} |g(n) 1_{|n|≤N}| ≤ ‖g‖_p ‖1_{|n|≤N}‖_q = ‖g‖_p (2N + 1)^{1−1/p}.

Thus

    ‖F_N g‖_r = ‖G_N‖_r = [∫_{−π}^{π} |G_N(ω)|^r dω]^{1/r} ≤ (2π)^{1/r} ‖g‖_p (2N + 1)^{1−1/p}.    (6-4)

Furthermore, for p = 1 the upper bound is achieved when g(n) = δ[n], so |||F_N|||_{1→r} = (2π)^{1/r}.
Thus F_N ∈ B(ℓp, L_r[−π, π]) for any 1 ≤ p, r ≤ ∞, and |||F_N|||_{p→r} ≤ (2π)^{1/r} (2N + 1)^{1−1/p}.
Remark. f ∈ L_∞[a, b] ⟹ f ∈ L_r[a, b] if −∞ < a < b < ∞ and r ≥ 1.
But to make (6-2) rigorous we must have normed spaces in which the limit in (6-3) exists.

Non-convergence of the operators
Note: treating F_N : ℓp → L_r[−π, π], by considering g₀(n) = δ[n − (M + 1)] we have for N > M:

    |||F_N − F_M|||_{p→r} = sup_{g : ‖g‖_p ≤ 1} ‖(F_N − F_M)g‖_r ≥ ‖(F_N − F_M)g₀‖_r = ‖e^{−jω(M+1)}‖_r = (2π)^{1/r} ↛ 0.

So {F_N} is not Cauchy (and hence not convergent) in B(ℓp, L_r[−π, π]), no matter what p or r values one chooses.
So we must analyze convergence of the spectra G_N = F_N g, rather than convergence of the operators F_N themselves.


ℓ1 analysis
Proposition. If g ∈ ℓ1, then {F_N g} is Cauchy in L_r[−π, π] for any 1 ≤ r ≤ ∞.

Proof.
If g ∈ ℓ1, then defining

    I(N, M) ≜ {n ∈ Z : min(N, M) < |n| ≤ max(N, M)}    (6-5)

and G_N = F_N g, we have

    |G_N(ω) − G_M(ω)| = |Σ_{n=−N}^{N} g(n) e^{−jωn} − Σ_{n=−M}^{M} g(n) e^{−jωn}| = |Σ_{n∈I(N,M)} g(n) e^{−jωn}|
                      ≤ Σ_{n∈I(N,M)} |g(n)| ≤ Σ_{|n|>min(N,M)} |g(n)| → 0 as N, M → ∞.    (6-6)

So for each ω ∈ R, the sequence {G_N(ω)}_{N=1}^{∞} is Cauchy in C, and hence convergent by the completeness of C, provided g ∈ ℓ1.
Thus for each ω, {G_N(ω)} converges pointwise to some limit, call it G(ω), where (6-2) is shorthand for that limit.
Furthermore, when g ∈ ℓ1:

    ‖G_N − G_M‖_∞ = sup_{|ω|≤π} |G_N(ω) − G_M(ω)| → 0 as N, M → ∞.

So the sequence of functions {G_N} is Cauchy in L_∞[−π, π], which is complete, so {G_N} converges to a limit G ∈ L_∞[−π, π].
More generally, using (6-6):

    ‖G_N − G_M‖_r^r = ∫_{−π}^{π} |G_N(ω) − G_M(ω)|^r dω ≤ 2π [Σ_{|n|>min(N,M)} |g(n)|]^r → 0 as N, M → ∞,

so {G_N} is Cauchy in L_r[−π, π] for any 1 ≤ r ≤ ∞. □
......................................................................................................................
Thus, due to completeness, {G_N} converges to a limit G ∈ L_r[−π, π].
So we can define the DTFT operator F : ℓ1 → L_r[−π, π] by

    Fg ≜ lim_{N→∞} F_N g.    (6-7)

Proposition. F ∈ B(ℓ1, L_r[−π, π]) for any 1 ≤ r ≤ ∞, with |||F|||_{1→r} = (2π)^{1/r}.

Proof. Linearity of F follows from linearity of F_N. For g ∈ ℓ1:

    ‖Fg‖_r = ‖lim_{N→∞} F_N g‖_r = lim_{N→∞} ‖F_N g‖_r ≤ lim_{N→∞} |||F_N|||_{1→r} ‖g‖_1 = (2π)^{1/r} ‖g‖_1,

using (6-4). Equality is achieved when g(n) = δ[n]. □
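The uniform-Cauchy behavior in (6-6) is easy to see numerically. This sketch uses a hypothetical geometric example g(n) = (1/2)^{|n|} ∈ ℓ1, whose tail sum bounds the sup-norm gap between partial-sum spectra:

```python
import numpy as np

omega = np.linspace(-np.pi, np.pi, 1001)

def G_partial(N):
    # G_N(w) = sum_{n=-N}^{N} g(n) exp(-j w n) for g(n) = (1/2)^{|n|}.
    n = np.arange(-N, N + 1)
    return np.exp(-1j * np.outer(omega, n)) @ (0.5 ** np.abs(n))

G20 = G_partial(20)
G40 = G_partial(40)

tail = 2 * 0.5 ** 20            # sum_{|n| > 20} (1/2)^{|n|}
print(np.max(np.abs(G40 - G20)) <= tail)  # True
```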


ℓ2 analysis
Unfortunately, ℓ1 analysis is a bit restrictive; the class of signals is not as broad as we might like, and for least-squares problems we would rather work in ℓ2. This will allow us to apply Hilbert space methods.
However, a signal in ℓ2 is not necessarily in ℓ1, so the above ℓ1 analysis does not apply to many signals in ℓ2. So we need a different approach.
Proposition. If g ∈ ℓ2, then {F_N g} is Cauchy in L₂[−π, π].

Proof. If g ∈ ℓ2, then using G_N = F_N g:

    ‖G_N − G_M‖₂² = ∫_{−π}^{π} |Σ_{n=−N}^{N} g(n) e^{−jωn} − Σ_{n=−M}^{M} g(n) e^{−jωn}|² dω = ∫_{−π}^{π} |Σ_{n∈I(N,M)} g(n) e^{−jωn}|² dω
                  = Σ_{n∈I(N,M)} Σ_{m∈I(N,M)} g(n) g*(m) ∫_{−π}^{π} e^{−jω(n−m)} dω = Σ_{n∈I(N,M)} Σ_{m∈I(N,M)} g(n) g*(m) 2π 1_{n=m}
                  = 2π Σ_{n∈I(N,M)} |g(n)|² ≤ 2π Σ_{|n|>min(N,M)} |g(n)|² → 0 as N, M → ∞,

since g ∈ ℓ2. Thus {F_N g} is Cauchy in L₂[−π, π]. □

Since L₂[−π, π] is complete, {G_N} is convergent (in the L₂ sense!) to some limit G ∈ L₂[−π, π], and the expression in (6-2) is again a reasonable shorthand for that limit, whatever it may be, and now we can define F : ℓ2 → L₂[−π, π] via (6-7). In other words,

    ‖G_N − G‖₂² = ∫_{−π}^{π} |G_N(ω) − G(ω)|² dω → 0.

This is often called mean square convergence.

Proposition. F ∈ B(ℓ2, L₂[−π, π]) with |||F|||_{2→2} = √(2π).

Proof. Linearity of F is easily shown.
Since {G_N} is convergent, it is bounded. In fact

    ‖G_N‖₂² = ∫_{−π}^{π} |Σ_{n=−N}^{N} g(n) e^{−jωn}|² dω = Σ_{n=−N}^{N} Σ_{m=−N}^{N} g(n) g*(m) ∫_{−π}^{π} e^{−jω(n−m)} dω = 2π Σ_{n=−N}^{N} |g(n)|² ≤ 2π ‖g‖₂²,

so ‖Fg‖₂ ≤ √(2π) ‖g‖₂ and hence |||F||| ≤ √(2π).
Furthermore, if we consider g(n) = δ[n], then G(ω) = 1.
Thus ‖G‖₂² = ∫_{−π}^{π} 1 dω = 2π, which achieves the upper bound above. Hence |||F|||_{2→2} = √(2π). □
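The Parseval-type identity ‖Fg‖₂² = 2π‖g‖₂² behind this proof can be checked numerically for a finite-support g. Since |G(ω)|² is then a trigonometric polynomial, an equispaced Riemann sum with enough points integrates it exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
g = rng.standard_normal(9)                 # support n = -4, ..., 4
n = np.arange(-4, 5)

# |G|^2 has harmonics up to degree 8, so K = 64 equispaced points suffice.
K = 64
omega = -np.pi + 2 * np.pi * np.arange(K) / K
G = np.exp(-1j * np.outer(omega, n)) @ g

lhs = np.sum(np.abs(G) ** 2) * (2 * np.pi / K)   # int |G(w)|^2 dw
rhs = 2 * np.pi * np.sum(g ** 2)                 # 2 pi ||g||_2^2
print(np.isclose(lhs, rhs))                      # True
```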


Adjoint
Since F ∈ B(ℓ2, L₂[−π, π]) and both ℓ2 and L₂[−π, π] are Hilbert spaces, F has an adjoint:

    ⟨Fg, S⟩_{L₂[−π,π]} = ∫_{−π}^{π} [Σ_{n=−∞}^{∞} g(n) e^{−jωn}] S*(ω) dω = Σ_{n=−∞}^{∞} g(n) [∫_{−π}^{π} S(ω) e^{jωn} dω]* = ⟨g, F*S⟩_{ℓ2},

where

    x = F*S ⟺ x(n) = ∫_{−π}^{π} S(ω) e^{jωn} dω,  n ∈ Z.

So the adjoint is almost the same as the inverse DTFT (defined below); it lacks the 1/(2π) factor.
And of course we know that F* ∈ B(L₂[−π, π], ℓ2).
Range
To apply the Hilbert space methods, we would like R(F) to be closed in L₂[−π, π]. It suffices for F to be onto L₂[−π, π], since of course L₂[−π, π] itself is closed.
Proposition. The DTFT F : ℓ2 → L₂[−π, π] defined by (6-2) is onto L₂[−π, π].

Proof. Let eₖ, k ∈ Z, denote the family of functions eₖ(ω) = (1/√2π) e^{−jωk}.

Recall that {eₖ} is a complete orthonormal basis for L₂[−π, π] [4, p. 62].

Thus, by Parseval's relation we have for any G ∈ L₂[−π, π]:

    G = Σₖ ⟨G, eₖ⟩ eₖ,  i.e.,  G(ω) = Σₖ ⟨G, eₖ⟩ (1/√2π) e^{−jωk},  where  Σₖ |⟨G, eₖ⟩|² = ‖G‖₂².

Thus if we define g(k) = ⟨G, eₖ⟩ / √2π, then g ∈ ℓ2 and G = Fg, so G ∈ R(F).

Since G ∈ L₂[−π, π] was arbitrary, we conclude R(F) = L₂[−π, π]. □

DTFT is (almost) unitary

One can show easily that

    ⟨Fu, Fv⟩_{L₂[−π,π]} = 2π ⟨u, v⟩_{ℓ2}.

Thus (since F is linear and invertible and hence an isomorphism), the normalized DTFT U = (1/√2π) F is unitary.
So L₂[−π, π] and ℓ2 are unitarily equivalent.


Inverse DTFT
Define a partial inverse DTFT operator R_N : L₂[−π, π] → ℓ2 by

    g_N = R_N G ⟺ g_N(n) = [(1/2π) ∫_{−π}^{π} G(ω) e^{jωn} dω] 1_{|n|≤N} = (1/√2π) ⟨G, eₙ⟩ 1_{|n|≤N}.

Proposition. If G ∈ L₂[−π, π], then {R_N G} is Cauchy in ℓ2.

Proof.

    ‖R_N G − R_M G‖² = (1/2π) Σ_{k∈I(N,M)} |⟨G, eₖ⟩|² ≤ (1/2π) Σ_{|k|>min(N,M)} |⟨G, eₖ⟩|² → 0 as N, M → ∞. □

Since ℓ2 is complete, {R_N G} converges to some limit g ∈ ℓ2 and we define RG to be that limit: RG ≜ lim_{N→∞} R_N G.

Proposition. R ∈ B(L₂[−π, π], ℓ2) and |||R||| = 1/√2π.

Proof.

    ‖R_N G‖₂² = Σ_{|n|≤N} |(1/2π) ∫_{−π}^{π} G(ω) e^{jωn} dω|² = (1/2π) Σ_{|k|≤N} |⟨G, eₖ⟩|² ≤ (1/2π) ‖G‖₂².

So |||R_N||| ≤ 1/√2π and R_N ∈ B(L₂[−π, π], ℓ2).
When G(ω) = 1 we have ‖R_N G‖² = ‖δ[n]‖² = 1 and ‖G‖₂² = 2π, so |||R_N||| = 1/√2π. □
Proof 2. ‖RG‖₂ = lim_{N→∞} ‖R_N G‖₂ ≤ lim_{N→∞} (1/√2π) ‖G‖₂. Consider G = 1 to show equality. □

Proposition. RF = I_{ℓ2} and FR = I_{L₂}, so F⁻¹ = R, where I_H denotes the identity operator for Hilbert space H.

Proof. Exercise.
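A numerical sketch of RF = I on finite-support signals: the analysis sum (6-2) followed by the inverse-DTFT integral recovers g(n). The ω integral is computed with an equispaced sum, which is exact here because the integrand is a trigonometric polynomial:

```python
import numpy as np

n = np.arange(-5, 6)
rng = np.random.default_rng(4)
g = rng.standard_normal(len(n))

K = 64
omega = -np.pi + 2 * np.pi * np.arange(K) / K
G = np.exp(-1j * np.outer(omega, n)) @ g      # forward DTFT samples G(w_m)

# (1/(2 pi)) int G(w) e^{j w n} dw  ~  (1/K) sum_m G(w_m) e^{j w_m n}:
g_rec = (np.exp(1j * np.outer(n, omega)) @ G).real / K
print(np.allclose(g_rec, g))  # True
```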


Convolution revisited
Using time-domain analysis, we showed previously that if h ∈ ℓ1 and Ax = h ∗ x, then A ∈ B(ℓp, ℓp).
We have shown F ∈ B(ℓ2, L₂[−π, π]) and F⁻¹ ∈ B(L₂[−π, π], ℓ2).
Consider the band-limiting linear operator D : L₂[−π, π] → L₂[−π, π] defined by

    y = Dx ⟺ y(ω) = { x(ω), |ω| ≤ π/2;  0, otherwise }.

Clearly D ∈ B(L₂[−π, π], L₂[−π, π]), and in fact |||D||| = 1.

Now consider A ≜ F⁻¹DF. We previously showed in the analysis of the composition of operators that |||ST||| ≤ |||S||| |||T|||.

So A ∈ B(ℓ2, ℓ2) with |||A||| ≤ |||F⁻¹||| |||D||| |||F||| = (1/√2π) · 1 · √2π = 1.

But this A represents an ideal lowpass filter, i.e., convolution with h(n) = (1/2) sinc(n/2). But this h ∉ ℓ1.
Evidently, the convolution operator, at least in ℓ2, has a looser requirement than h ∈ ℓ1.
In contrast, in ℓ∞, h ∈ ℓ1 is both necessary and sufficient for A_h ∈ B(ℓ∞, ℓ∞).
In ℓ2, a necessary and sufficient condition is that the frequency response be bounded.
Proposition. A_h ∈ B(ℓ2, ℓ2) (with |||A_h||| = ‖H‖_∞) ⟺ ‖Fh‖_∞ < ∞, i.e., Fh ∈ L_∞[−π, π].

Proof. (⟸) Suppose ‖Fh‖_∞ is finite. Then using the convolution property of the DTFT and Parseval:

    ‖h ∗ x‖₂² = (1/2π) ‖HX‖₂² = (1/2π) ∫_{−π}^{π} |H(ω)X(ω)|² dω ≤ ‖H‖_∞² (1/2π) ‖X‖₂² = ‖H‖_∞² ‖x‖₂²,

so |||A_h||| ≤ ‖Fh‖_∞ = ‖H‖_∞. The upper bound is achieved when h(n) = δ[n].
(⟹) If H is unbounded, then for all T there exists an interval over which |H| ≥ T. Choose x to be a signal whose spectrum is an indicator function on that interval; then ‖x ∗ h‖₂ ≥ T ‖x‖₂, so A_h would be unbounded. Take the contrapositive. □

Continuous-time case
• Fourier transform
• convolution
• Young's inequality
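The operator A = F⁻¹DF above can be sketched on a finite circular grid, with the DFT standing in for the DTFT (an approximation, not the operator itself): zero out frequencies with |ω| > π/2 and observe that the ℓ2 norm never increases, with equality on band-limited inputs.

```python
import numpy as np

K = 256
w = 2 * np.pi * np.fft.fftfreq(K)            # grid frequencies in [-pi, pi)
D = (np.abs(w) <= np.pi / 2).astype(float)   # band-limiting mask

rng = np.random.default_rng(5)
x = rng.standard_normal(K)
Ax = np.fft.ifft(D * np.fft.fft(x)).real

# |||A||| <= 1: band-limiting never increases the l2 norm...
print(np.linalg.norm(Ax) <= np.linalg.norm(x) + 1e-12)   # True
# ...and the bound is attained on signals already band-limited to |w| <= pi/2:
x_low = np.cos(np.pi / 4 * np.arange(K))
print(np.allclose(np.fft.ifft(D * np.fft.fft(x_low)).real, x_low))  # True
```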

1. P. Enflo. A counterexample to the approximation problem in Banach spaces. Acta Math., 130:309–317, 1973.
2. I. J. Maddox. Elements of functional analysis. Cambridge, 2nd edition, 1988.
3. A. W. Naylor and G. R. Sell. Linear operator theory in engineering and science. Springer-Verlag, New York, 2nd edition, 1982.
4. D. G. Luenberger. Optimization by vector space methods. Wiley, New York, 1969.
5. J. Schauder. Zur Theorie stetiger Abbildungen in Funktionenräumen. Math. Zeitsch., 26:47–65, 1927.
6. L. Grafakos. Classical and modern Fourier analysis. Pearson, NJ, 2004.
7. P. P. Vaidyanathan. Generalizations of the sampling theorem: Seven decades after Nyquist. IEEE Tr. Circ. Sys. I, Fundamental theory and applications, 48(9):1094–1109, September 2001.
8. A. M. Ostrowski. Solution of equations in Euclidean and Banach spaces. Academic, 3rd edition, 1973.
9. R. R. Meyer. Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J. Comput. System. Sci., 12(1):108–121, 1976.
10. M. Rosenlicht. Introduction to analysis. Dover, New York, 1985.
11. A. R. De Pierro. On the relation between the ISRA and the EM algorithm for positron emission tomography. IEEE Tr. Med. Imag., 12(2):328–333, June 1993.
12. A. R. De Pierro. On the convergence of the iterative image space reconstruction algorithm for volume ECT. IEEE Tr. Med. Imag., 6(2):174–175, June 1987.
13. A. R. De Pierro. Unified approach to regularized maximum likelihood estimation in computed tomography. In Proc. SPIE 3171, Comp. Exper. and Num. Meth. for Solving Ill-Posed Inv. Imaging Problems: Med. and Nonmed. Appl., pages 218–223, 1997.
14. J. A. Fessler. Grouped coordinate descent algorithms for robust edge-preserving image restoration. In Proc. SPIE 3170, Im. Recon. and Restor. II, pages 184–194, 1997.
15. A. R. De Pierro. A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography. IEEE Tr. Med. Imag., 14(1):132–137, March 1995.
16. J. A. Fessler and A. O. Hero. Penalized maximum-likelihood image reconstruction using space-alternating generalized EM algorithms. IEEE Tr. Im. Proc., 4(10):1417–1429, October 1995.
17. M. W. Jacobson and J. A. Fessler. Properties of MM algorithms on convex feasible sets. SIAM J. Optim., 2003. Submitted. # 061996.
18. P. L. Combettes and H. J. Trussell. Method of successive projections for finding a common point of sets in metric spaces. J. Optim. Theory Appl., 67(3):487–507, December 1990.
19. F. Deutsch. The convexity of Chebyshev sets in Hilbert space. In Th. M. Rassias, H. M. Srivastava, and A. Yanushauskas, editors, Topics in polynomials of one and several variables and their applications, pages 143–150. World Sci. Publishing, River Edge, NJ, 1993.
20. M. Jiang. On Johnson's example of a nonconvex Chebyshev set. J. Approx. Theory, 74(2):152–158, August 1993.
21. V. S. Balaganskii and L. P. Vlasov. The problem of convexity of Chebyshev sets. Russian Mathematical Surveys, 51(6):1127–1190, November 1996.
22. V. Kanellopoulos. On the convexity of the weakly compact Chebyshev sets in Banach spaces. Israel Journal of Mathematics, 117:61–69, 2000.
23. A. R. Alimov. On the structure of the complements of Chebyshev sets. Functional Analysis and Its Applications, 35(3):176–182, July 2001.
24. Y. Bresler, S. Basu, and C. Couvreur. Hilbert spaces and least squares methods for signal processing, 2000. Draft.
25. M. Vetterli and J. Kovacevic. Wavelets and subband coding. Prentice-Hall, New York, 1995.
26. D. C. Youla and H. Webb. Image restoration by the method of convex projections: Part I–Theory. IEEE Tr. Med. Imag., 1(2):81–94, October 1982.
27. M. Unser and T. Blu. Generalized smoothing splines and the optimal discretization of the Wiener filter. IEEE Tr. Sig. Proc., 2004. In press.