© A. Megretski, J. Wyatt, 2007. All rights reserved.
Preface
The course is aimed at filling gaps in the basic linear algebra and functional analysis background of graduate students interested in communication, control, signal processing, optimization, and related areas.
Principal topics include:
(a) field-independent linear algebra (vector spaces, bases, dimensions, matrix algebra,
linear transformations, linear equations, determinants, characteristic polynomials)
emphasizing the coordinate-free approach with applications in linear systems and
coding theory;
(c) convexity (convex functions and convex sets, convex optimization, cutting plane methods, Carathéodory theorem, Minkowski functionals, Krein-Milman theorem, Hahn-Banach theorem, minimax theorem) with applications in optimization;
(d) approximation and topology (norms, approximation, functional spaces, Fourier and Laplace transforms, compactness, fixed point theorems, differentiation, implicit mapping theorems, first and second order conditions of optimality) with applications in robustness analysis and optimization.
Contents
Preface
3 Determinants
3.1 Motivation
3.1.1 Parameter-Dependent Linear Equations
3.1.2 Determinants as Dynamical System Invariants
3.2 Construction of a Determinant
3.2.1 Signed Area
3.2.2 Multilinear Skew Symmetric Functions
3.2.3 Determinant as Signed Volume Gain
3.3 Basic Properties of Determinants
3.3.1 Elementary Properties
Determinant of Identity
Determinant of an Invertible Operator
Multiplicativity of Determinant
Determinants of Similar Operators
Determinants and Duality
3.3.2 Determinants and Block Decompositions
Direct Sums and Block Decompositions
Determinants of Block Triangular Matrices
Schur Identity and Cramer's Formula
4 Characteristic Polynomials
4.1 Motivation: LTI Systems
4.1.1 Autonomous Systems
4.1.2 Reachability of LTI State Space Models
4.2 Basic Properties
4.2.1 Companion Form
4.2.2 Invariant Subspaces and Irreducible Polynomials
4.2.3 Cayley-Hamilton Theorem
4.2.4 Schur Decomposition
5 Quadratic Forms and Scalar Products
5.1 Motivation
5.1.1 Euclidean Geometry
5.1.2 Quadratic Constraints and Uncertainty Modeling
5.1.3 Random Variables and Second Order Statistics
5.2 Positive Definiteness of Quadratic Forms
5.2.1 Cauchy-Bunyakovski-Schwarz Inequality
6 Linear-Quadratic Optimization
6.1 Motivation
6.1.1 Optimal Control
6.1.2 Kalman Filter
6.2 Basic Properties of LQ Optimization
6.2.1 Equivalence of LQ Optimization Problems
6.2.2 Well-Posedness
6.2.3 Necessary and Sufficient Conditions of Optimality
6.2.4 Optimizing Sequences
6.2.5 Optimal Cost in LQ Optimization
8 Convexity
8.1 Convex Sets and Convex Functions
8.1.1 Intersections and Maximums
8.1.2 Convexity and Differentiation
8.1.3 Convexity Preserving Operations
8.2 Basic Theorems of Convex Analysis
8.2.1 Hahn-Banach Theorem
A standard way of modeling both the physical and the virtual worlds is by writing systems of equations. General systems of equations are hard to deal with in a systematic fashion: they are tough to solve practically, and also difficult to analyze theoretically. The so-called linear equations turn out to be a nice exception: they are relatively straightforward to solve practically, and their theoretical analysis is supported by a rich and powerful theory. This chapter covers the most elementary aspects of linear equations, grouped around the statement that a system of n scalar linear equations with n scalar unknowns (counted properly) has a unique solution. It introduces the notions of vector space and linear transformation as mathematical abstractions of linearity, dimension of a vector space as an accurate measure of the number of equations or number of variables, and matrix of a linear transformation as a representation and practical computation tool.
1.1 Motivation
A linear equation with real variables has the form A(v) = u, where A : V → U is a given linear function, V and U are real vector spaces, u ∈ U is a given element of U, and v ∈ V is an element of V to be found. The main objective of this chapter is to teach recognition of real vector space structures, assessment and representation of linear functions, and counting of dimensions.
(a) the set V of all polynomials of degree less than m, as well as the set U of all columns
of n real numbers, are real vector spaces;
In addition, in order to calculate a solution p of (1.1) for a specific data set, one can use a matrix representation of A with respect to some bases in V and U.
p = p(t, h) of two real variables, of degree less than m with respect to each of them, with real coefficients p_{r,i} ∈ IR such that

$$p(t_k, h_k) = y_k \ (k = 1, \dots, n), \qquad p(t, h) = \sum_{r=0}^{m-1} \sum_{i=0}^{m-1} p_{r,i}\, t^r h^i. \qquad (1.3)$$
Since p is defined by m² independent real parameters, one can expect that the interpolation problem will have a unique solution whenever n = m² and the interpolation nodes are pairwise distinct, i.e.

$$(t_k - t_i)^2 + (h_k - h_i)^2 \ne 0 \quad \text{for } k \ne i. \qquad (1.4)$$
This, however, is not the case in general. We will consider this example in terms of real vector spaces, dimensions, and linear transformations, and show that a less trivial picture emerges.
where the positive integer m and the interpolation data [(t_k, y_k)]_{k=1}^{n} are given, while the coefficients p_i, q_i of the polynomials p, q are to be determined.
Interpolation by rational functions offers the possibility of much better use of the free
parameters to match given data. On the other hand, the theoretical analysis becomes
more involved compared to the polynomial case, as the function mapping (p, q) to the
column of the values of p/q is unlikely to be linear. We will show that finding a linear equation interpretation is more challenging in this case, but still possible, and that it yields nice conditions for existence and uniqueness of solutions of the rational interpolation problem.
How can one be sure that this system of equations has a non-zero solution? One way to see this is to recognize the set of all possible functions q : {1, . . . , n} → IR, assigning a real number to each node, as a vector space, and the transformation mapping q to the sequence of values appearing in (1.6) as a linear function. Comparing the dimensions of the two vector spaces will lead to the observation that (1.6) has n variables but only n − 1 equations (if counted properly). Hence, a non-zero solution of (1.6) does exist.
Definition 1.1 A set V and two functions p : V × V → V ("p" for "plus") and s : IR × V → V ("s" for "scale") are said to define a real vector space if there exists an element 0_V ∈ V (called the zero of V) such that conditions (V1)-(V8), listed below, are satisfied, where v + u and cv, for v, u ∈ V and c ∈ IR, are used as shortcuts for p(v, u) and s(c, v) respectively:

(V1) (v + u) + w = v + (u + w) for all v, u, w ∈ V;

(V2) c_1(c_2 v) = (c_1 c_2)v for all v ∈ V, c_1, c_2 ∈ IR;

(V3) v + u = u + v for all v, u ∈ V;

(V4) c(v + u) = (cv) + (cu) for all v, u ∈ V, c ∈ IR;

(V5) (c_1 + c_2)v = (c_1 v) + (c_2 v) for all v ∈ V, c_1, c_2 ∈ IR;

(V6) v + 0_V = v for all v ∈ V;

(V7) 0 · v = 0_V for all v ∈ V;

(V8) 1 · v = v for all v ∈ V.
To streamline notation, several conventions are typically used, as long as this does not
cause ambiguity:
(e) standard operation priority rules (do multiplication and division before addition and subtraction when not sure, etc.) are applied, so that, for example, c_1 v + c_2 u means (c_1 v) + (c_2 u) and not c_1(v + (c_2 u)).
Note that, once the addition function is fixed, only one element can be suitable to play the role of the zero of the vector space. Indeed, if v_1, v_2 ∈ V are such that v_1 + v = v and v_2 + v = v for every v ∈ V then

v_2 = v_1 + v_2 = v_2 + v_1 = v_1.
Mathematical formality requires one to make a distinction between the set V and the triplet 𝒱 = (V, p, s), because, on the same set, vector operations satisfying (V1)-(V8) can be defined in different ways. While 𝒱 is the "true" vector space, it is common to ignore the difference between V and 𝒱 unless this causes ambiguity, so that v ∈ 𝒱 should be interpreted as v ∈ V.
Multiplication of a vector by another vector is not included in the definition of a real vector space (another inadmissible operation is addition of a scalar to a vector). Though a multiplication of vectors satisfying the distributive, associative, and commutative laws can be defined on every vector space, there is usually no natural way of doing this.
For example, IR^1 is the same as IR, IR^2 is the set of all columns of two real numbers (essentially, the same thing as IR × IR), etc. The set IR^n appears naturally when analyzing interactions of n real parameters.
The elements of IR^n can be added (to each other) and scaled (by a real number) in a natural (component-wise) fashion, to get another element of IR^n:

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}, \qquad c \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} c x_1 \\ c x_2 \\ \vdots \\ c x_n \end{bmatrix}.$$
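As a quick numerical sanity check (not part of the original text), the axioms (V1)-(V8) can be spot-checked for these component-wise operations. The Python sketch below uses integer entries so that all equalities hold exactly; the helper names add, scale, and rand_vec are ad hoc choices.

```python
import random

def add(x, y):                      # component-wise addition in IR^n
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):                    # component-wise scaling in IR^n
    return tuple(c * a for a in x)

def rand_vec(n):                    # integer entries keep all checks exact
    return tuple(random.randint(-10, 10) for _ in range(n))

n = 4
zero = (0,) * n
for _ in range(1000):
    u, v, w = rand_vec(n), rand_vec(n), rand_vec(n)
    c1, c2 = random.randint(-5, 5), random.randint(-5, 5)
    assert add(add(v, u), w) == add(v, add(u, w))                    # (V1)
    assert scale(c1, scale(c2, v)) == scale(c1 * c2, v)              # (V2)
    assert add(v, u) == add(u, v)                                    # (V3)
    assert scale(c1, add(v, u)) == add(scale(c1, v), scale(c1, u))   # (V4)
    assert scale(c1 + c2, v) == add(scale(c1, v), scale(c2, v))      # (V5)
    assert add(v, zero) == v                                         # (V6)
    assert scale(0, v) == zero                                       # (V7)
    assert scale(1, v) == v                                          # (V8)
```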
Linear Subspaces
It is common to define an interesting real vector space V as a subset V U of a larger
real vector space U, with the addition and scaling operations inherited from U. Indeed,
for this construction to be meaningful, V must be closed under the addition and scaling
operations from U.
Definition 1.2 Let U be a real vector space. A subset V ⊂ U is called a linear subspace of U if v_1 + v_2 ∈ V and cv ∈ V whenever v, v_1, v_2 ∈ V and c ∈ IR.
Equipped with the addition and scaling operations inherited from U, a linear subspace
V becomes a real vector space in its own right.
For example, the subset C[0, 1] ⊂ IR^{[0,1]} of all continuous functions f : [0, 1] → IR is a linear subspace of IR^{[0,1]}, and hence is automatically a real vector space with respect to the usual operations of addition and scaling.
Elementary Geometry
A vector space can be associated naturally with the usual elementary geometry on the plane (or in space), according to the following set of definitions, which translates the geometric terms of points, lines, segments, distances, etc. into the language of real vector spaces.
(a) The set V is the set of all points on the plane (or in space). An arbitrarily selected point O is to be called zero.
[Figure: a line through O showing the points O, 0.5A, 0.75A, A, and 1.5A, together with the points B and A + B, illustrating addition and scaling of points on the plane.]
With this definition, the plane and the three-dimensional space become real vector spaces, and the axioms (V1)-(V8) become a set of relatively simple theorems of elementary geometry. It becomes possible to express geometric objects in terms of real vector space operations. For example, a line passing through two points A ≠ B can be viewed as the set

(AB) = {(1 − t)A + tB : t ∈ IR};

two lines (A_1 B_1) and (A_2 B_2) can be called parallel if and only if

A_1 − B_1 = c(A_2 − B_2)
for some c ∈ IR, etc. While not all geometric notions are covered by this association (for example, there are no means for defining angles or comparing lengths of non-parallel segments within the framework of real vector spaces), the addition of a scalar product operation, to be discussed in later chapters, makes a real vector space an accurate representation of elementary geometry. As a result, linear algebra becomes the most powerful and convenient tool for proving geometric theorems, though most people learn linear algebra too late to use it this way.
These operations, however, do not make X a real vector space. For example, the associativity law does not work, as, for v_1 = v_2 = 0.5 ∈ X
Abstract Statements
Here are two examples of statements proven at a very abstract level. Both represent obvious claims, which nevertheless have to be derived from the axioms.

The first statement is about cancelling identical terms on both sides of equalities between vector sums: if v + w = u + w then v = u.
Proof.

v = v + 0_V (by V6) = v + 0 · w (by V7) = v + (1 + (−1))w
  = v + (1 · w + (−1) · w) (by V5) = v + (w + (−1) · w) (by V8) = (v + w) + (−1) · w (by V1)
  = (u + w) + (−1) · w (by assumption) = u + (w + (−1) · w) (by V1) = u,

where the final equality repeats the first five steps in reverse order.
The second statement claims that scaling a zero vector results in a zero vector no
matter what the scaling parameter is.
Proof.

c · 0_V = c(0 · 0_V) (by V7) = (c · 0) · 0_V (by V2) = 0 · 0_V = 0_V (by V7).
In the rest of the presentation, we will not sink to this picky level of detail again.
[Figure: triangle A_1 A_2 A_3 with the midpoints M_1, M_2, M_3 of its sides and the three medians intersecting at the common point W.]
According to the definition of addition and scaling, for every two points A, B ∈ V such that A ≠ B the segment [AB] and the line (AB) are the sets

[AB] = {(1 − t)A + tB : t ∈ [0, 1]},   (AB) = {(1 − t)A + tB : t ∈ IR},

and the middle point C of segment [AB] is given by 0.5(A + B). Three points A_1, A_2, A_3 ∈ V define a triangle

A_1 A_2 A_3 = {t_1 A_1 + t_2 A_2 + t_3 A_3 : t_i ≥ 0, t_1 + t_2 + t_3 = 1}
if they do not belong to a single line. A median in A1 A2 A3 is one of the three segments
[Ai Mi ], where M1 is the middle of [A2 A3 ], M2 is the middle of [A1 A3 ], and M3 is the
middle of [A1 A2 ].
The theorem of interest claims that [A_1 M_1], [A_2 M_2], and [A_3 M_3] have a common point W. To prove this, note that W must be of the form W = t_1 A_1 + t_2 A_2 + t_3 A_3, where t_1 + t_2 + t_3 = 1, and, due to the symmetry (re-naming points A_1, A_2, A_3 should have no effect on W), one would expect that t_1 = t_2 = t_3. This yields W = (1/3)(A_1 + A_2 + A_3) as a guess for what W actually is. Now it remains to verify that W indeed belongs to all three medians. Since

(1/3)(A_1 + A_2 + A_3) = (1/3)A_1 + (2/3)((1/2)(A_2 + A_3)) = (1/3)A_1 + (2/3)M_1,

it follows that W ∈ [A_1 M_1]. The inclusions W ∈ [A_2 M_2] and W ∈ [A_3 M_3] are derived in a similar way.
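The verification is easy to replicate numerically. The following Python sketch (an illustration, not from the original text) checks, with exact rational arithmetic, that W = (1/3)(A_1 + A_2 + A_3) equals (1/3)A_i + (2/3)M_i for each i, i.e. that W divides each median in ratio 2:1; the triangle coordinates are arbitrary.

```python
from fractions import Fraction as Fr

# Vertices of an arbitrary triangle, with exact rational coordinates.
A1, A2, A3 = (Fr(0), Fr(0)), (Fr(4), Fr(1)), (Fr(1), Fr(5))

def comb(points, coeffs):           # affine combination of plane points
    return tuple(sum(c * p[i] for c, p in zip(coeffs, points)) for i in range(2))

M1 = comb((A2, A3), (Fr(1, 2), Fr(1, 2)))   # midpoints of the sides
M2 = comb((A1, A3), (Fr(1, 2), Fr(1, 2)))
M3 = comb((A1, A2), (Fr(1, 2), Fr(1, 2)))

W = comb((A1, A2, A3), (Fr(1, 3), Fr(1, 3), Fr(1, 3)))

# W = (1/3)A_i + (2/3)M_i lies on every median [A_i M_i].
for A, M in ((A1, M1), (A2, M2), (A3, M3)):
    assert W == comb((A, M), (Fr(1, 3), Fr(2, 3)))
```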
Definition 1.3 Let V and U be two real vector spaces. A function A : V → U is called linear if

A(v_1 + v_2) = A(v_1) + A(v_2),   A(cv) = cA(v)

for every v, v_1, v_2 ∈ V and c ∈ IR.
Definition 1.4 The null-space (or kernel) of a linear function A : V → U is the set

ker(A) := {v ∈ V : Av = 0}.
Example 1.1 If V is a linear subspace of a real vector space U, the inclusion function A : V → U which maps every v to itself is linear. An important special case is the identity function I_V : V → V defined by I_V(v) = v. When V = IR^n, the notation I_n is frequently used in place of I_V. Also, when the vector space V can be figured out from the context, I can be used instead of I_V.
Example 1.2 Let V be the real vector space C[0, 1] of all continuous functions f : [0, 1] → IR. The integration formula

(Af)(t) = ∫_0^t f(τ) dτ

defines a linear operator A : V → V. The operator maps f ∈ C[0, 1] to a continuously differentiable function g ∈ C[0, 1] such that g(0) = 0 and g′(t) = f(t) for all t. The proof of linearity simply refers to the properties of integration. In contrast, the formula Lf = f(0.5) defines a linear functional L : C[0, 1] → IR.
defines a functional H : C[0, 1] → IR which is not linear. To prove the absence of linearity, one typically needs a counterexample. For instance, let f(t) ≡ 1. Then H(f) = 1 but H(2f) = 4. Since H(2f) ≠ 2H(f), the function H is not linear.
Example 1.4 Let V be the real vector space of all polynomial functions f : IR → IR. Let A : V → V and B : V → V be the differentiation and multiplication by the independent variable operators, defined by

(Af)(t) = f′(t),   (Bf)(t) = t f(t).

Then BA maps f to g, where g(t) = t f′(t), while AB maps f to h, where h(t) = t f′(t) + f(t). Therefore AB − BA maps f to f, i.e.

AB − BA := AB + (−1)BA = I.
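The identity AB − BA = I is easy to spot-check with a computer algebra system. A minimal sketch (not from the original text), using sympy on an arbitrary polynomial:

```python
import sympy as sp

t = sp.symbols('t')
f = 3*t**4 - 2*t**2 + 7*t - 1        # an arbitrary polynomial test vector

ABf = sp.diff(t * f, t)              # (AB)f = (t f(t))' = t f'(t) + f(t)
BAf = t * sp.diff(f, t)              # (BA)f = t f'(t)

assert sp.expand(ABf - BAf) == sp.expand(f)   # (AB - BA)f = f
```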
In essence, the proof of Theorem 1.1 is simple. If V_0 = V then the statement is obvious. Otherwise, take a vector w ∈ V, w ∉ V_0, and consider the set

V_1 = {v + tw : v ∈ V_0, t ∈ IR},
Lemma 1.3 (Zorn's Lemma) Let X be a non-empty set. Let Y be a subset of X × X with the following properties:

Then there exists x_max ∈ X such that (x_max, x) ∉ Y for all x ≠ x_max.
Zorn's Lemma can be interpreted in the following way. The set Y defines a partial order on X, in which x_1 is said to be less than or equal to x_2 (notation x_1 ≤ x_2) whenever (x_1, x_2) ∈ Y. The order is called partial because it is possible for both inequalities x_1 ≤ x_2 and x_2 ≤ x_1 to be false. Assumptions (a) and (b) reflect the usual properties of ordering: the inequalities x_1 ≤ x_2 and x_2 ≤ x_1 are satisfied simultaneously if and only if x_1 = x_2, and the inequalities x_1 ≤ x_2, x_2 ≤ x_3 imply x_1 ≤ x_3. Condition (c) establishes that every completely ordered subset X_0 of X has an upper bound x_ub ∈ X, i.e. an element, not necessarily belonging to X_0, such that the inequality x ≤ x_ub holds for all x ∈ X_0. The conclusion of the Lemma is that the set X has at least one maximal element x_max, i.e. one for which x_max ≤ x holds for no x ≠ x_max.
To apply Zorn's Lemma to prove Theorem 1.1, define X as the set of all linear extensions F : W → U of A_0 (i.e. such that V_0 ⊂ W ⊂ V and F v = A_0 v for all v ∈ V_0). Since A_0 ∈ X, the set X is not empty. Define the partial order on X according to which F_1 : W_1 → U is less than or equal to F_2 : W_2 → U if and only if W_1 ⊂ W_2 and F_2 v = F_1 v for all v ∈ W_1. Then conditions (a), (b) of Lemma 1.3 are evidently satisfied. Moreover, condition (c) is satisfied as well because, for a completely ordered subset X_0 of extensions F : W → U, an upper bound F_ub can be chosen as the function mapping the union W_ub of the subspaces W to U (since each element w ∈ W_ub belongs to some W, the value of F_ub(w) is well defined for all w ∈ W_ub). Therefore X has a maximal element F_max.
Theorem 1.2 (and some of its generalizations) is frequently used in linear algebra and functional analysis related proofs. One of its interpretations is in terms of information recovery, where the transformation v ↦ Av is viewed as a measurement process, which is associated with some loss of the information contained in v. Assuming that Bv is the information to be recovered, the function C is the "filter" converting the measurement Av into the needed data Bv. Obviously, whatever is in the null-space of A is lost in the measurement process. Theorem 1.2, essentially, claims that the rest can be recovered by a linear filter.
Example 1.5 A linear function A : V → U such that ker(A) = {0} always has a left inverse A⁺: a linear function A⁺ : U → V such that A⁺A = I_V. To prove this, use Theorem 1.2 with W = V and B = I_V.
Definition 1.5 Let V be a real vector space. The dual space V^♯ is the real vector space of all linear functions f : V → IR (a subspace of IR^V).
and hence it is common to associate (IR^n)^♯ with the vector space IR^{1,n} of all 1-by-n real matrices.
The vector spaces IR^n and IR^{1,n} appear to be very similar. In particular, the linear function F : IR^n → IR^{1,n} defined by

$$F \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix}$$

establishes a linear bijection between IR^n and IR^{1,n}. In general, however, a linear bijection between V and V^♯ does not exist.
Definition 1.6 Let V, U be real vector spaces. For a linear function A : V → U, its dual A^♯ is the linear function A^♯ : U^♯ → V^♯ mapping each linear functional f : U → IR to the linear functional g = A^♯f : V → IR defined by g(v) = f(Av).
Example 1.7 A linear function A : IR^m → IR^n can be represented by its matrix a, which allows one to view A as multiplication by a matrix on the left: v ↦ av. If (IR^m)^♯ and (IR^n)^♯ are represented as the vector spaces of row matrices IR^{1,m} and IR^{1,n} respectively, the dual A^♯ becomes multiplication by the matrix a on the right: q ∈ IR^{1,n} ↦ qa ∈ IR^{1,m}.
It is easy to verify that the duality transformation A ↦ A^♯ satisfies the usual identities similar to those valid for transposition of matrices:

(A + B)^♯ = A^♯ + B^♯,   (AB)^♯ = B^♯A^♯,   (A^{−1})^♯ = (A^♯)^{−1},   etc.
While the notion of orthogonality is not generally available for pairs of vectors from the same real vector space, it can be applied to elements of a vector space and its dual. The following statement is a commonly used relation between the null space of a linear function and the range of its dual; it is actually a special case of Theorem 1.2.
Theorem 1.3 If V, U are real vector spaces and A : V → U is a linear function then

(ker(A))^⊥ = R(A^♯).
Proof. If f ∈ (ker(A))^⊥ then f v = 0 for all v such that Av = 0. According to Theorem 1.2, applied with B = f and W = IR, there exists C ∈ U^♯ such that f = CA, i.e. f = A^♯C ∈ R(A^♯). Conversely, if f ∈ R(A^♯) then there exists g ∈ U^♯ such that f v = gAv for every v ∈ V, and hence f v = 0 whenever Av = 0.
where v_i are elements of the same real vector space V, and c_i are real numbers, called the coefficients of the linear combination.
Linear Independence
Informally speaking, a finite sequence of vectors is called linearly independent if none of its elements can be represented as a linear combination of the others.
When a sequence of measurements is performed, some may become redundant: for example,
when n = 3, and three measurements are defined by
$$w(1) = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad w(2) = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \qquad w(3) = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix},$$
the third one does not produce any additional information, as the identity
is satisfied no matter what the values of the unknown parameters ai are. Hence, the outcome
of the third measurement can be predicted using the results from the first two.
The redundancy of the third measurement can be interpreted in terms of linear independence (more precisely, the lack of it): the identity (1.8) holds because

w(3) = w(1) − w(2),

which means that the sequence (w(1), w(2), w(3)) is not linearly independent. The interpretation
holds in the general case: a sequence (w(1), . . . , w(k)) of k measurements w(i) ∈ IR^n contains a redundant one if and only if it is not linearly independent.
Example 1.9 Let V ⊂ IR^IR be the real vector space of all polynomial functions f : IR → IR. The sequence of monomials (1, t, . . . , t^n) is linearly independent for every n. Indeed, every linear combination of these monomials is a polynomial whose coefficients are the coefficients of the linear combination. Since a polynomial equals zero identically only if all of its coefficients are zero, the conditions of Definition 1.8 are satisfied.
The definition only allows finite bases. While bases with an infinite number of elements
are very important, their proper use requires an adequate framework for approximation
and convergence of vectors, something that the concept of a general real vector space
does not provide. For vector spaces of functions, an intuitively appealing notion of a basis
would call for the possibility of arbitrarily good approximation of every vector by a linear
combination of its elements.
The following statement describes bases of a general real vector space.
Theorem 1.4 Let V be a real vector space, V ≠ {0}. Then one of the following conditions is satisfied:

(a) for every positive integer k there exists a linearly independent sequence (v_1, . . . , v_k) of k elements of V;

(b) there exists a positive integer n such that every linearly independent sequence (v_1, . . . , v_k) of v_i ∈ V has k ≤ n elements, and can be extended to a finite basis (v_1, . . . , v_n) of V.

The number n from case (b) of Theorem 1.4 is called the dimension of V, notation n = dim(V). By convention, dim(V) = ∞ when case (a) takes place, and dim(V) = 0 for V = {0}.
The proof of Theorem 1.4 is based on the following observation.
where a_{i,m} are real numbers. If u_0 = 0, the set Y is obviously not linearly independent. Otherwise a_{0,k} ≠ 0 for some k. For b_i = a_{i,k}/a_{0,k}, each vector u_i − b_i u_0 is a linear combination of the elements of the sequence S_w obtained by excluding v_k from S_v. Since S_w has n elements, by the inductive assumption the sequence

(u_1 − b_1 u_0, . . . , u_m − b_m u_0)
is not linearly independent, i.e. there exist real numbers c_1, . . . , c_m, not all of which equal zero, such that

0 = c_1(u_1 − b_1 u_0) + · · · + c_m(u_m − b_m u_0) = −(c_1 b_1 + · · · + c_m b_m)u_0 + c_1 u_1 + · · · + c_m u_m,

which proves Lemma 1.4 for m = n + 1.
Therefore a linearly independent sequence cannot have more elements than a basis. On
the other hand, if a linearly independent sequence Sv = (v1 , . . . , vk ) is not a basis, there
exists a vector vk+1 which is not a linear combination of Sv , and hence can be appended
to Sv to produce a longer linearly independent sequence (v1 , . . . , vk , vk+1). If this process
of extending Sv can be continued without termination, V has no finite basis. Otherwise,
the process terminates at a basis of V . This concludes the proof of Theorem 1.4.
Example 1.11 Let V be the set of all continuous functions f : [0, 1] → IR which are piecewise linear, in the sense that they can be represented in the form f(t) = a_k t + b_k on each of the intervals t ∈ [(k − 1)/n, k/n], where k = 1, . . . , n, and n is a fixed number. It is easy to see that V is a real vector space. What is the dimension of V?
Intuition tells us that an element f ∈ V is defined by n + 1 real parameters a_0, a_1, . . . , a_n representing the values a_i = f(i/n) of f(t) at t = 0, 1/n, 2/n, . . . , 1. Hence one would expect the dimension of V to be n + 1. To prove this, consider the sequence S_f of n + 1 functions

$$f_i(t) = \begin{cases} 1 - |nt - i|, & |nt - i| \le 1, \\ 0, & \text{otherwise}, \end{cases} \qquad i = 0, 1, \dots, n.$$
It is sufficient to establish that Sf is a basis in V .
(a) Each function fi belongs to V .
(b) The value of

f_c(t) = c_0 f_0(t) + c_1 f_1(t) + · · · + c_n f_n(t)

at t = i/n equals c_i for i = 0, 1, . . . , n. Hence f_c = 0 implies c_i = 0 for all i, which means that S_f is linearly independent.

(c) For every f ∈ V the function

g(t) = f(0)f_0(t) + f(1/n)f_1(t) + · · · + f(1)f_n(t)

belongs to V and equals f at the points t = i/n, i = 0, 1, . . . , n. Hence g(t) = f(t) for all t, which proves that S_f is a generating set.
As established in (a)-(c), Sf is a basis of V , and hence dim(V ) = n + 1.
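The "hat" basis is easy to reproduce numerically. The Python sketch below (illustrative, not part of the original text; the names hat, basis, and nodal are ad hoc) builds the functions f_i and checks that g = Σ f(i/n) f_i reproduces a piecewise linear f given by its nodal values, both at the grid points and in between.

```python
def hat(i, n):
    # f_i(t) = 1 - |nt - i| where |nt - i| <= 1, and 0 otherwise
    return lambda t: max(0.0, 1.0 - abs(n * t - i))

n = 5
basis = [hat(i, n) for i in range(n + 1)]

# A piecewise linear f specified by its nodal values f(i/n).
nodal = [0.0, 2.0, -1.0, 0.5, 3.0, 1.0]
g = lambda t: sum(c * fi(t) for c, fi in zip(nodal, basis))

for i in range(n + 1):                       # g matches f at every node
    assert abs(g(i / n) - nodal[i]) < 1e-12

# Between nodes, g is the linear interpolant; check at t = 0.3 (between 1/n and 2/n).
expected = 2.0 + (-1.0 - 2.0) * (0.3 * n - 1)
assert abs(g(0.3) - expected) < 1e-12
```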
(b) An n-by-m real matrix a can be scaled by a real number c ∈ IR. The result is the matrix of cA, where A : IR^m → IR^n is the linear function with matrix a. Algorithmically, the operation is component-wise multiplication by c.
(c) An n-by-m matrix a can be multiplied on the left by a k-by-n matrix b. The result ba is the matrix of BA, where A : IR^m → IR^n and B : IR^n → IR^k are the linear functions with matrices a and b respectively. Algorithmically, the operation means

$$(ba)_{r,q} = b_{r,1} a_{1,q} + b_{r,2} a_{2,q} + \dots + b_{r,n} a_{n,q} = \sum_{i=1}^{n} b_{r,i}\, a_{i,q}.$$
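A minimal Python sketch (an illustration, not from the text) implementing this multiplication rule directly; the function name matmul is an ad hoc choice.

```python
def matmul(b, a):
    # b is k-by-n, a is n-by-m; (ba)[r][q] = sum_i b[r][i] * a[i][q]
    k, n, m = len(b), len(a), len(a[0])
    assert all(len(row) == n for row in b), "inner dimensions must agree"
    return [[sum(b[r][i] * a[i][q] for i in range(n)) for q in range(m)]
            for r in range(k)]

b = [[1, 2], [0, 1], [3, 0]]        # 3-by-2
a = [[1, 0, 2], [0, 1, 1]]          # 2-by-3
print(matmul(b, a))                 # [[1, 2, 4], [0, 1, 1], [3, 0, 6]]
```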
It is easy to verify that the three elementary operations introduced for real matrices
satisfy the familiar laws of associativity and distributivity. It is important to remember
that commutativity, while valid for the addition of matrices, does not hold for multiplica-
tion. In fact, unless k = m, the product ab is not even defined for a k-by-n matrix b and
an n-by-m matrix a! Another important difference is that a product ba of two non-zero
matrices can be zero.
The association between operations on matrices and operations on linear functions ex-
tends to arbitrary real vector spaces of finite dimensions, as long as proper bases are used.
For scaling and addition, the conclusion is obvious. The following statement establishes
the connection for multiplication.
Theorem 1.6 Let V, U, W be real vector spaces with bases b_V, b_U, and b_W respectively. Let a, b be matrices of linear operators A : V → U and B : U → W with respect to the pairs of bases (b_V, b_U) and (b_U, b_W) respectively. Then ba is the matrix of BA with respect to the bases b_V and b_W.
Proof. A basis b_V = (v_1, . . . , v_m) of a real vector space defines a linear function T_V : IR^m → V according to

$$T_V \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} = x_1 v_1 + \dots + x_m v_m.$$

Since b_V is a basis, T_V is a bijection, and hence has a well defined inverse T_V^{−1}. The bases b_U and b_W define similar linear bijections T_U : IR^n → U and T_W : IR^k → W, where n and k are the dimensions of U and W respectively. By construction, the matrix a of A with respect to the bases b_V, b_U is the matrix of the linear function α : IR^m → IR^n defined by α = T_U^{−1}AT_V. Similarly, the matrix b of B with respect to the bases b_U, b_W is the matrix of the linear function β : IR^n → IR^k defined by β = T_W^{−1}BT_U, and the matrix c of BA is the same as the matrix of γ = T_W^{−1}BAT_V. Since

βα = T_W^{−1}BT_U T_U^{−1}AT_V = T_W^{−1}BAT_V = γ,

the conclusion c = ba follows.
Proof. If dim(ker(A)) = ∞ then dim(V) = ∞ and hence equality (1.10) holds. Otherwise, if dim(ker(A)) = k, where k < ∞, let b_0 = (v_1, . . . , v_k) be a basis in ker(A).
implies

c_{k+1}v_{k+1} + · · · + c_n v_n ∈ ker(A),

which means that the sequence b is not linearly independent.

If dim(V) = ∞, b_0 can be extended to an arbitrarily long linearly independent sequence b = (v_1, . . . , v_n), and the arguments used for dim(V) < ∞ can be used to show that dim(R(A)) = ∞.
it follows from (1.10) that dim(R(A)) = dim(U). According to Theorem 1.4, this means that R(A) = U, i.e. that the equation Av = u has a solution v ∈ V for every u ∈ U.
The number dim(R(A)) is important enough to have a special term for it.
The solution v of Av = u will be unique when ker(A) = {0}. Indeed, if Av_1 = u and Av_2 = u then A(v_1 − v_2) = 0, which means v_1 − v_2 ∈ ker(A). Conversely, if Av = u and w ∈ ker(A) then A(v + w) = u as well.
Example 1.14 Returning to the setup of interpolation by polynomials of a single real variable, one can use Theorem 1.7 to prove that for every sequence (t_1, . . . , t_m) of m different real numbers, and every sequence (y_1, . . . , y_m) of m real numbers, there exists a unique polynomial p = p(t) of degree less than m such that p(t_k) = y_k for all k = 1, . . . , m.

Indeed, let V be the real vector space of all polynomials of degree less than m. The function A : V → IR^m defined by

$$A(p) = \begin{bmatrix} p(t_1) \\ \vdots \\ p(t_m) \end{bmatrix}$$

is linear. Since V has the basis of m monomials (1, t, . . . , t^{m−1}), its dimension is m. Since the dimension of IR^m is m as well, it is sufficient to show that ker(A) = {0}, i.e. that Ap = 0 implies p = 0. Since Ap = 0 means that the m different numbers t_i are roots of the polynomial p of degree less than m, the equality p ≡ 0 is implied, which completes the proof of feasibility of polynomial interpolation.
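The argument translates directly into a computation: in the monomial basis, A is represented by the Vandermonde matrix, so the interpolating polynomial is obtained by solving one square linear system. A numpy sketch (illustrative only, with arbitrary data):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 4.0])    # m distinct interpolation nodes
y = np.array([1.0, 3.0, -2.0, 5.0])   # prescribed values

# Matrix of A in the monomial basis (1, t, ..., t^{m-1}): a Vandermonde matrix.
V = np.vander(t, increasing=True)

c = np.linalg.solve(V, y)             # unique coefficients, since the t_k differ
assert np.allclose(np.polyval(c[::-1], t), y)
```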
Example 1.15 Let V be the set of those real-valued functions f : IR × IR → IR of two real variables which can be represented in the polynomial form

$$f(t, h) = \sum_{i=0}^{2} \sum_{k=0}^{2} f_{i,k}\, t^i h^k,$$

where f_{i,k} are real constants, and satisfy the rotational symmetry constraint
Bi-Orthogonality
Let S_u = (u_1, . . . , u_n) be a sequence of elements of a real vector space U. A sequence S_f = (f_1, . . . , f_n) of linear functionals f_i ∈ U^♯ is called bi-orthogonal to S_u when

$$f_i(u_k) = \delta_{ik} := \begin{cases} 1, & i = k, \\ 0, & i \ne k. \end{cases}$$

v = f_1(u)v_1 + · · · + f_n(u)v_n,

A(c_1 v_1 + · · · + c_n v_n) = u

yields c_k = f_k u.
The following statement claims that for every linearly independent sequence of linear functionals there exists a bi-orthogonal sequence of vectors.
The proof of Theorem 1.8 is constructive and essentially describes the Gaussian elimination algorithm for solving systems of linear equations.
Proof. The "if" part is easy: if S_f and S_u are bi-orthogonal then applying the functional

f = c_1 f_1 + · · · + c_n f_n

is such that

f_n w = f u ≠ 0,   f_1 w = f_2 w = · · · = f_m w = 0.

Hence, for u_n = (f_n w)^{−1}w the sequence S_u = (u_1, . . . , u_n) is bi-orthogonal to S_f.
Proof. According to Theorem 1.9, for every linearly independent sequence in V there exists a linearly independent sequence in V^♯ (the bi-orthogonal one) of the same length. Conversely, according to Theorem 1.8, for every linearly independent sequence in V^♯ there exists a linearly independent sequence in V (the bi-orthogonal one) of the same length.
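In IR^n, with functionals represented as row vectors, a bi-orthogonal sequence can be computed by matrix inversion: if the columns of an invertible matrix U are u_1, . . . , u_n, the rows of U^{−1} are bi-orthogonal functionals, since U^{−1}U = I_n. A numpy sketch (an illustration, not from the text):

```python
import numpy as np

U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])      # columns are u_1, u_2, u_3

F = np.linalg.inv(U)                 # rows are the functionals f_1, f_2, f_3

assert np.allclose(F @ U, np.eye(3)) # bi-orthogonality: f_i(u_k) = delta_ik
```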
The next statement is a familiar relation between the dimensions of a space, its sub-
space, and the orthogonal complement of the subspace.
Proof. If dim(U) = ∞ then dim(V) = ∞ and hence the equality is satisfied no matter what the value of dim(U^⊥) is.

If dim(U) = k < ∞, let S_u = (u_1, . . . , u_k) be a basis in U, and let S_v = (u_1, . . . , u_n) be an extension of S_u to a linearly independent sequence in V. Let S_f = (f_1, . . . , f_n) be the corresponding bi-orthogonal sequence of functionals f_i ∈ V^♯. Then f_i u_r = 0 for i > k and r ≤ k, which means that (f_{k+1}, . . . , f_n) is a linearly independent sequence in U^⊥. Hence
The last statement is the well-known relation between the column and row ranks
of a matrix, translated into the language of general linear functions.
S_f = (f_1, . . . , f_k) = (A^♯g_1, . . . , A^♯g_k)
So far, the discussion of vector spaces was limited to the special case when real numbers
are used as scalars, which was reflected in the term real vector space. This chapter
adapts the standard definitions and theorems to the case of arbitrary fields of scalars. It
turns out that the basic methods of real vector spaces, presented in chapter 1, can be
generalized without modification to the case of a general field. As a side benefit, this
chapter should also give a fine opportunity to review basic matrix algebra constructions
in a new environment.
Some of the alternative fields of scalars are quite familiar. For example, vector spaces
with complex scalars are convenient for dealing with eigenvectors and eigenvalues. Vector
spaces with rational scalars are better suited for precise numerical manipulations with
matrices. The main attention of this chapter will be given to recognition and construction
of finite fields, which are used to extend the benefits of linearity to discrete mathematics.
2.1 Motivation
The mathematical notion of a field is an abstraction for data which can be handled by
operations of addition, subtraction, multiplication, and division satisfying the convenient
and familiar properties of associativity, distributivity, and commutativity. Among many
examples, the set IR of real numbers, equipped with the standard arithmetic operations, is
perhaps the most important field: its elements are used to represent values of continuously
varying, or "analog", physical parameters. Another classical example is the set C of complex numbers, originally invented to serve in intermediate calculations associated with real numbers. Practical calculations associated with real or complex numbers are typically performed using rational numbers (the set Q), which gives another example of a field.
While complex numbers appear to be perfect for representing parameters of the physical world, most of human and computer decision making is done in terms of discrete data, for which the set of admissible values is finite. Linearity of the "quantized" world is expressed by finite fields, which are finite sets on which operations of addition, subtraction,
multiplication, and division satisfying the usual properties are defined. Finite fields are
used in discrete data processing, such as coding and decoding in digital communications,
as well as to speed up calculations associated with analog data.
Error-Correcting Coding
Consider the task of using redundancy to protect a large portion of digital data from
inevitable corruption. The data could be some computer code to be stored on a CD
expected to be scratched and otherwise misused, or some scientific measurements transmitted by a weak and noisy radio signal from a deep space probe, etc. One model of the situation treats the original data as a binary sequence x = (x_1, . . . , x_m) of x_i ∈ {0, 1}, which is to be transformed (or encoded) into a longer binary sequence z = (z_1, . . . , z_n) of z_i ∈ {0, 1} in such a way that, even after z is modified (corrupted) into a different sequence y = (y_1, . . . , y_n), the original data x can still be recovered from y, provided that the difference between y and z is not too large, in a certain sense. The setup is shown in diagram (2.1), where the encoder E : {0, 1}^m → {0, 1}^n and decoder D : {0, 1}^n → {0, 1}^m are functions to be designed, while C : {0, 1}^n → {0, 1}^n is given.
x  ↦(E)  z = E(x)  ↦(C)  y = C(z, w)  ↦(D)  x = D(y)    (2.1)
When the bits z_i represent storage on a compact disc, the errors, caused by scratches and manufacturing defects, are expected to come in bursts, which makes the independent bit errors model inadequate.
A common sense idea for encoding is that of "mixing", in the sense that every single bit z_i of z = E(x) should depend on a large number of bits of x:

z = F(x) :  z_i = f_i(x_1, . . . , x_m)  (i = 1, . . . , n),    (2.2)
so that, when a few components of z are corrupted in the transition from z to y, there is still enough information to restore x from y. This raises the issue of complexity of decoding, since, even when no data corruption occurs (z = y), recovering x from y means solving system (2.2) of n equations with m variables. Since, in general, only linear systems are inexpensive to solve, one would want (2.2) to satisfy the requirements of linearity. However, the concept of real vector space linearity does not apply here because, by the nature of the application, the variables x_i are discrete, i.e. can take only a finite number of values.
A significant benefit of understanding vector spaces over general fields is the realization
that there is a way of treating the elements of {0, 1} (or, more generally, sequences of bits
of fixed length k, as well as some other finite sets) as elements of a finite field, which
allows one to extend the concept of linearity, with all of its benefits, to the discrete
variable setup. The coding schemes which use linear operations on finite fields allow for
a significant simplification of the decoding process. There is a cost associated with using
linear coding: in most cases the length n of the encoded message z tends to be larger than
the theoretical minimum achievable with general nonlinear codes, but the gap appears to
be shrinking as better linear codes are being developed.
Network Coding
A set G = {(a, b)} of ordered pairs (a, b) of numbers a, b ∈ {1, . . . , n}, such that a < b for all (a, b) ∈ G, and two positive integers k, m, such that k + m ≤ n, can be used as a simplified model for delivering information from k sources to m sinks over a network with n nodes, with the interpretation that node a can send information to node b if and only if (a, b) ∈ G.
A memoryless q-bit network communication algorithm for G is described by a collection of encoding functions f_{a,b}. For a given (a, b) ∈ G, the function f_{a,b} combines the information originally available at node a with all information received at a from other nodes to produce a q-bit word to be sent to node b. Formally, f_{a,b} maps B^{N(a)+1} to B, where B = {0, 1}^q and N(a) is the number of elements in the set

R(a) = {c : (c, a) ∈ G}.

The meaning of each function f_{a,b} is given by

y(a, b) = f_{a,b}(u(a), y(c_1, a), . . . , y(c_{N(a)}, a)),   c_1 < · · · < c_{N(a)},  c_i ∈ R(a),    (2.3)
[Figure 2.1: the "butterfly" network with nodes 1-6. Nodes 1 and 2 send y(1, 3) and y(2, 3) to node 3, and y(1, 5), y(2, 6) directly to sinks 5 and 6; node 3 sends y(3, 4) to node 4, which sends y(4, 5) and y(4, 6) to the sinks.]
where u(a) is the q-bit word representing the information originating at node a, and y(a, b) is the q-bit word sent from a to b. For example, in the so-called "butterfly" network shown in Figure 2.1:

u(a) = g_{a,d}(y(c_1, d), . . . , y(c_{N(d)}, d)),   c_1 < · · · < c_{N(d)},  c_i ∈ R(d)
As in the case of error-correcting coding, network coding benefits from "mixing", which means that some of the functions f_{a,b} do not just select one of their inputs for the output (as in routing) but actually depend on all of their arguments. For example, no routing scheme can produce an acceptable network coding algorithm for the butterfly network from Figure 2.1 with k = m = 2 (i.e. sending information from nodes 1 and 2 to nodes 5 and 6), as in this case either y(3, 4) = u(1), in which case u(2) cannot be recovered at node 5, or y(3, 4) = u(2), in which case u(1) cannot be recovered at node 6.
In contrast, defining
y(1, 3) = y(1, 5) = u(1), y(2, 3) = y(2, 6) = u(2), (2.4)
y(4, 5) = y(4, 6) = y(3, 4) = y(1, 3) + y(2, 3),
where the addition is done component-wise modulo 2, yields an acceptable network code.
The general difficulty associated with mixing strategies is that the decoder will have to solve a system of many equations with many unknowns. Linearity of the encoding functions f_{a,b} aids greatly both in assessing feasibility of decoding and in reducing complexity of the decoding algorithm. Since the alphabet B = {0, 1}^q is a finite set, the framework for linearity will have to be based on finite fields. For example, the network coding algorithm defined by (2.4) is linear over the binary field ZZ_2 = {0, 1}.
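For q-bit words, the component-wise addition modulo 2 in (2.4) amounts to bitwise XOR. The Python sketch below (illustrative, not from the text; variable names follow the link labels of Figure 2.1) checks that both sinks can recover both source words:

```python
# q-bit words as Python ints; addition over ZZ_2 per bit is XOR.
def butterfly(u1, u2):
    y13, y15 = u1, u1        # node 1 forwards its word
    y23, y26 = u2, u2        # node 2 forwards its word
    y34 = y13 ^ y23          # node 3 mixes: y(3,4) = y(1,3) + y(2,3)
    y45 = y46 = y34          # node 4 forwards the mixture
    u2_at_5 = y15 ^ y45      # sink 5: u(1) + (u(1) + u(2)) = u(2)
    u1_at_6 = y26 ^ y46      # sink 6: u(2) + (u(1) + u(2)) = u(1)
    return u2_at_5, u1_at_6

for u1 in range(8):          # exhaustive check over all 3-bit words
    for u2 in range(8):
        assert butterfly(u1, u2) == (u2, u1)
```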
and then substituting the right side for x_n in the remaining equations.
Despite the theoretical simplicity of the algorithm, its implementations (such as LAPACK, which is what MATLAB relies upon) frequently fail to produce an accurate solution, even for feasible systems of equations. The implementation failures are due to the round-off errors associated with the floating point arithmetic used in most engineering calculations. The issue of sensitivity of linear equation solving with respect to round-off errors will be studied later in the course, after the concept of distance between vectors is introduced. In this chapter, it is appropriate to address the topic of solving systems of linear equations exactly.
An appealing framework for exact linear equation solvers assumes that the entries of A and y (and hence those of the recursively generated reduced matrices) are rational numbers, and hence can be represented precisely in computer memory as pairs of sequences of bits (one for the numerator and one for the denominator). Within this framework, which relies on linearity with respect to the field Q of all rational numbers, a direct implementation of the exact Gaussian elimination algorithm is possible.
Such an implementation, however, is bound to be extremely inefficient for large n. To see the reason, one can estimate the number of bits needed to store the numerators and denominators of the entries of the reduced matrices involved. Assume that the original integer entries p_{i,k}, q_{i,k} of a_{i,k} = p_{i,k}/q_{i,k} require m bits each to store. Then the numerators and denominators of

$$\tilde a_{i,k} = a_{i,k} - \frac{a_{i,n}\, a_{n,k}}{a_{n,n}} = \frac{p_{i,k}\, q_{i,n}\, q_{n,k}\, p_{n,n} - p_{i,n}\, p_{n,k}\, q_{i,k}\, q_{n,n}}{q_{i,k}\, q_{i,n}\, q_{n,k}\, p_{n,n}}$$

require, roughly, 4m bits each to store. After k steps of the Gaussian elimination procedure, one can expect to need 4^k m bits to store each entry of the reduced matrix A, which actually indicates that the Gaussian elimination algorithm, when implemented directly, has exponential complexity growth.
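The growth is easy to observe experimentally. The Python sketch below (an illustration, not part of the text) runs fraction-based elimination on a random integer matrix and prints the largest bit size of an entry after each step. Note that the 4^k estimate applies to unreduced fractions; Python's Fraction reduces to lowest terms automatically, so the printed numbers grow more slowly, but the trend is still visible.

```python
import random
from fractions import Fraction

n = 8
a = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]
for i in range(n):
    a[i][i] += 50                     # keep the pivots away from zero

def bits(x):                          # bits in numerator plus denominator
    return x.numerator.bit_length() + x.denominator.bit_length()

for step in range(n - 1):
    for i in range(step + 1, n):
        r = a[i][step] / a[step][step]
        for k in range(step, n):
            a[i][k] -= r * a[step][k]
    print(step, max(bits(a[i][k]) for i in range(n) for k in range(n)))
```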
Using arithmetic calculations in finite fields can resolve the complexity growth issue. One way to apply the idea is by performing all arithmetic operations on integers modulo a sufficiently large prime number p, which means working in the finite field ZZ_p, where log(p) ∼ n(m + log(n)). This way, the number of bits required to store an entry of a reduced version of matrix A does not exceed log_2(p) + 1. A more efficient algorithm relies on finding a sufficiently good approximate solution in the field of q-adic numbers for an arbitrarily chosen prime number q, and then converting the approximation to the actual solution.
Q, IR, and C are familiar to the reader, uses them as examples, and introduces the finite fields F_q, where q = p^m and p is a prime number. A general classification of fields in terms of characteristic numbers and extensions is given, showing that every field, depending on its characteristic number, is an extension of either Q or F_p for some prime number p.
Note that, once the addition and multiplication functions are fixed, only one element can be suitable to play the role of the zero of the field. Indeed, if a, b ∈ F are such that a + x = x and b + x = x for every x ∈ F then

a = b + a = a + b = b.
Similarly, only one element can be suitable to play the role of unit of the field.
Mathematical formality requires one to make a distinction between the set F and the triplet ℱ = (F, a, m), because, on the same set, arithmetic operations satisfying the field axioms (F1)-(F9) can be defined in different ways. While ℱ is the "true" field, it is common to ignore the difference between F and ℱ (unless this causes ambiguity), so that x ∈ ℱ has the same meaning as x ∈ F.
(a) x + y_1 = x + y_2 = 0 implies y_1 = y_2;

y_1 = y_1 · 1 = y_1(x y_2) = y_2(x y_1) = y_2.
The second example states that the product of two non-zero elements of a field is not
zero.
z = z · 1 = z(xy) = (zx)y = 0 · y = 0.
(a) The element y in (F8), which, according to Lemma 2.1, is uniquely defined by x, is denoted by −x.
(b) The element y in (F9), which, according to Lemma 2.1, is uniquely defined by x, is denoted by x^{−1}.
(f) For a positive integer n, and x ∈ F, nx and x^n denote respectively the sum and the product of n copies of x;
(g) Standard operation priority rules (do multiplication and division before addition
and subtraction when not sure, etc.) are applied, so that, for example, x + yz
means x + (yz) and not (x + y)z.
For example, complex conjugation defines a homomorphism of the field C into itself.
a | 0 1 2        m | 0 1 2
0 | 0 1 2        0 | 0 0 0
1 | 1 2 0        1 | 0 1 2
2 | 2 0 1        2 | 0 2 1

a | 0 1 8        m | 0 1 8
0 | 0 1 8        0 | 0 0 0
1 | 1 8 0        1 | 0 1 8
8 | 8 0 1        8 | 0 8 1
It is easy to see that F3 and F are very much the same field, and the only difference is
in the ways the elements are labeled: F is obtained from F3 by re-naming 2 as 8. In
fact, fields F and F3 are equivalent, or isomorphic, according to the following definition.
In other words, field F is isomorphic to field G if G and the field operations of G are
obtained by re-naming the elements of F . For most practical purposes, isomorphic fields
should be treated as equal.
Example 2.2 The set Q[j] consisting of all complex numbers z of the form z = a + jb, where j = √−1 and a, b are rational numbers, with the standard 0, 1, and operations of addition and multiplication, is a field.

Proof. To verify this, use the identities
Example 2.3 The set ZZ of all integers, with the usual selection of zero and identity, and equipped with the standard operations of addition and multiplication, is not a field.

Proof. For x = 2 ∈ ZZ, there is no y ∈ ZZ such that xy = 1.
Example 2.4 The set F consisting of all real numbers z of the form z = a + bα, where α = ∛2 and a, b are rational numbers, with the standard 0, 1, and operations of addition and multiplication, is not a field.

Proof. Indeed, assume to the contrary that F is a field. Since α ∈ F, this implies α² ∈ F, i.e. α² = a + bα for some a, b ∈ Q. Multiplying the equality by α yields 2 = aα + α²b. Substituting a + bα in place of α² produces the equality 2 − ba = (a + b²)α. Since α is not a rational number, this implies 2 = ba and a + b² = 0, which yields 2 = −b³ (impossible, since −∛2 is not a rational number).
Finite Fields
A finite field can be constructed explicitly, by defining 0F , 1F , addition and multiplication
on a particular finite set F .
Example 2.5 Every field has at least two different elements: 0 = 0F and 1 = 1F . There is
a field, usually denoted by F2 (or sometimes ZZ2 ), which has no other elements. Here are the
addition and multiplication tables in ZZ2 :
a | 0 1        m | 0 1
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1
It is easy to see that the operations in F2 are the standard addition and multiplication of integers
done modulo 2.
Example 2.6 Let p be a positive integer not smaller than 2. Let F = {0, 1, 2, . . . , p − 1} be the set of p integers between 0 and p − 1. For every x_1, x_2 ∈ F let a(x_1, x_2) be defined as the remainder of the division of the usual integer sum x_1 + x_2 by p, i.e.

$$a(x_1, x_2) = \begin{cases} x_1 + x_2, & x_1 + x_2 < p, \\ x_1 + x_2 - p, & x_1 + x_2 \ge p. \end{cases}$$

Similarly, let m(x_1, x_2) be defined as the remainder of the division of the usual integer product x_1 x_2 by p. It can be shown that the definition satisfies conditions (F1)-(F9) if and only if p is a prime number. The resulting field is usually denoted by F_p (or ZZ_p). In the simplest case p = 2 we get the field from Example 2.5.
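A brute-force check of the "if and only if" claim is straightforward, since axioms (F1)-(F8) hold for any p ≥ 2 and only the existence of multiplicative inverses (F9) can fail. A Python sketch, illustrative only:

```python
# Addition and multiplication modulo p define a field exactly when every
# non-zero element has a multiplicative inverse, i.e. when p is prime.
def is_field(p):
    return all(any((x * y) % p == 1 for y in range(1, p)) for x in range(1, p))

print([p for p in range(2, 20) if is_field(p)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```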
Not every set with more than one element allows arithmetic operations which define a field. In particular, there is no way to complete the partially filled addition and multiplication tables below, defined on the five-element set {0, 1, b, c, d}, so that conditions (F1)-(F9) are satisfied:

a | 0 1 b c d        m | 0 1 b c d
0 | 0 1 b c d        0 | 0 0 0 0 0
1 | 1 0 · · ·        1 | 0 1 b c d
b | b · · · ·        b | 0 b · · ·
c | c · · · ·        c | 0 c · · ·
d | d · · · ·        d | 0 d · · ·
Theorem 2.1 Let q be a positive integer. A field F with q elements exists if and only if q = p^m, where p is a prime number and m is a positive integer. Every two fields X and Y with q elements are isomorphic.
where the sequence (x_k) is required (for the sake of uniqueness of representation) to have an infinite number of zero terms. The arithmetic operations on real numbers can be developed completely in terms of representations (2.6), though this may not be the most
(for example, 2^{−1} + 2^{−2} + 2^{−3} + · · · = 1).
is used, where it is required (for the sake of uniqueness of representation) that an infinite number of terms x_k are not equal to m − 1.
The fields of p-adic numbers are used in number theory, but are sometimes handy in common engineering calculations as well. An important application of p-adic numbers is exact and efficient solving of systems of linear equations in the field Q of rational numbers.
It is easy to see that Q_p contains a subfield isomorphic to the field of rational numbers. Therefore, for a system of equations Ax = y, where A and y are matrices with rational
entries, solving for x ∈ Q^n is equivalent to solving for x ∈ Q_p^n. Due to the nature of arithmetic operations in Q_p, it is possible to compute (sequentially and efficiently) the first few terms of the p-adic expansion of the vector x (shown here for p = 2):

x = x_0 + 2x_1 + 4x_2 + · · · .
It turns out that knowing a finite set of the first coefficients of a p-adic expansion of a rational number allows one to reconstruct the number. Therefore, the first few terms of a p-adic expansion of x are sufficient for computing it exactly. Calculation of exact rational solutions of systems of linear equations via p-adic approximations is one of the best linear equation solving algorithms available.
Proof. Let z_1, z_2, . . . , z_{n−1} be the list of all non-zero elements of F. For a non-zero element x ∈ F let y_k = x z_k. Since, for i ≠ k,

y_i − y_k = x(z_i − z_k) ≠ 0,

the elements y_1, . . . , y_{n−1} are distinct and non-zero, and hence form a permutation of z_1, . . . , z_{n−1}. Therefore, for

z = z_1 z_2 · · · z_{n−1},   y = y_1 y_2 · · · y_{n−1},

we have y = z and y = x^{n−1}z with z ≠ 0, and it follows that x^{n−1} = 1 for every non-zero element of F. Hence x^n − x = x(x^{n−1} − 1) = 0 for every element x of F.
Theorem 2.2, and some of its generalizations, are used to speed up checking whether a given large number p is a prime: instead of trying to factor p (no provably fast algorithm of such factorization is known so far), one computes the p-th power of a few integers x ∈ [2, p − 2], modulo p. According to Theorem 2.2, if the remainder of dividing x^p by p is not x for some x ∈ [2, p − 2] then p is not a prime number. The opposite is not true: for example, 4^{15} = 4 modulo 15, but 15 is not a prime number.
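In Python the test is one line thanks to built-in modular exponentiation. A sketch (not from the text) reproducing the 15 example; the name fermat_witness is an ad hoc choice:

```python
def fermat_witness(x, p):
    # True if x proves p composite: x^p mod p differs from x mod p (Theorem 2.2).
    return pow(x, p, p) != x % p

print(fermat_witness(2, 15))   # True: 2^15 = 8 (mod 15), so 15 is composite
print(fermat_witness(4, 15))   # False: 4^15 = 4 (mod 15), despite 15 = 3 * 5
```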
Definition 2.5 Let F be a field. The minimal positive integer q such that the sum of q copies of 1 ∈ F equals 0 is called the characteristic of F. If no such q exists, the characteristic of F is defined as zero.
Proof. The main idea is to look at the elements a1_F ∈ F, defined as sums of a copies of 1_F, where a ∈ IN is a positive integer, and to use the identity (ab)1_F = (a1_F)(b1_F) for all a, b ∈ IN, which follows from the basic field axioms.
To prove (a), assume to the contrary that q = ab, where a, b ∈ IN are smaller than q. Then, according to the definition of q, neither a1_F nor b1_F equals zero. However, (a1_F)(b1_F) = q1_F = 0_F, which contradicts the result of Lemma 2.2.
To prove (b), define the subfield F_0 by

$$F_0 := \{h1_F : h \in Q\}, \quad \text{where} \quad h1_F = \begin{cases} 0, & \text{for } h = 0, \\ (b1_F)/(a1_F), & \text{for } h = b/a, \ a, b \in IN, \\ -(b1_F)/(a1_F), & \text{for } h = -b/a, \ a, b \in IN. \end{cases}$$
(i) for a field of characteristic zero, a1_F ≠ 0_F for all a ∈ IN (this justifies division by a1_F);

(ii) (b_1 1_F)(a_2 1_F) = n1_F = (b_2 1_F)(a_1 1_F) whenever b_1 a_2 = n = b_2 a_1 (this shows that (b1_F)/(a1_F) is the same element of F for every representation h = b/a of a rational number h as a ratio of two integers a, b ∈ IN).
The map h ↦ h1_F is a bijection between Q and a subset F_0 of F which maps sums into sums and products into products. Hence F_0 is a subfield of F which is equivalent to Q.
To prove (c), note that the map h ∈ F_q ↦ h1_F ∈ F is a bijection between F_q and

F̃_q = {0_F, 1_F, 1_F + 1_F, . . . , (q − 1)1_F} ⊂ F,

which maps F_q-sums and F_q-products into F-sums and F-products respectively. Hence F̃_q is a subfield of F which is equivalent to F_q.
Theorem 2.3 can be interpreted in the following way: all fields are based on (or are extensions of) either Q or ZZ_p for some prime p. The fields of characteristic p ≠ 0 have a clear "finiteness" flavor (though they do not have to be finite in general). The extensions of Q include but are not limited to the usual number fields, such as IR or C.
The conventions used for real vector spaces are also applicable to vector spaces over
general fields.
Just as vector spaces over IR are called real vector spaces, vector spaces over the field
C are called complex vector spaces.
Linear Subspaces
As in the case of real vector spaces, a subset W of a vector space V over field F becomes
a vector space over F when it is closed under the operations of addition and scaling
inherited from V , i.e.
w_1 + w_2 ∈ W,  cw ∈ W   for all w_1, w_2, w ∈ W, c ∈ F.
where (f_1, . . . , f_m) is a basis in F, and the c_i ∈ F_p are arbitrary, the total number of elements in F is p^m.
Definition 2.7 Let V and U be two vector spaces over the same field F. A function A : V → U is called linear if

A(v_1 + v_2) = A(v_1) + A(v_2),   A(cv) = cA(v)

for every v, v_1, v_2 ∈ V and c ∈ F.

Definition 2.8 The null-space (or kernel) of a linear function A : V → U is the set

ker(A) := {v ∈ V : Av = 0}.
Example 2.8 An n-by-m matrix A with entries from a field F naturally defines a linear function A : F^m → F^n (it is mathematically incorrect but very convenient to denote both the matrix and the linear function by the same letter). For example, the matrix

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$

defines a linear function A : (F_3)^2 → (F_3)^2 with null space and range given by

$$\ker(A) = \left\{ \begin{bmatrix} c \\ c \end{bmatrix} : c \in F_3 \right\}, \qquad R(A) = \left\{ \begin{bmatrix} c \\ 2c \end{bmatrix} : c \in F_3 \right\}.$$
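Since (F_3)² has only nine elements, the null space and range can be found by brute force. A Python sketch (illustrative, not from the text):

```python
p = 3
A = [[1, 2], [2, 1]]

def apply(A, v):                    # matrix-vector product over F_p
    return tuple(sum(a * x for a, x in zip(row, v)) % p for row in A)

vectors = [(x1, x2) for x1 in range(p) for x2 in range(p)]
print([v for v in vectors if apply(A, v) == (0, 0)])   # ker: (0,0), (1,1), (2,2)
print(sorted({apply(A, v) for v in vectors}))          # range: (0,0), (1,2), (2,1)
```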
Example 2.9 It is still true for a general field F that, for a polynomial p of degree m with coefficients in F, the equation p(x) = 0 has not more than m different solutions x ∈ F. Therefore, when x_1, . . . , x_n are n different elements of F, and m < n, the linear function A mapping the vector space V (over F) of all polynomials p ∈ F[x] of degree not larger than m to F^n according to

$$Ap = \begin{bmatrix} p(x_1) \\ \vdots \\ p(x_n) \end{bmatrix}$$

has zero null space.
Example 2.10 Let F be a field. Let V be the vector space over F of all polynomials p with coefficients in F. Let A : V → V and B : V → V be the differentiation and multiplication by the independent variable operators defined by

$$Af = \sum_{k=1}^{n} k f_k x^{k-1}, \qquad Bf = \sum_{k=0}^{n} f_k x^{k+1} \qquad \text{for } f = \sum_{k=0}^{n} f_k x^k.$$
Then, just as in the case of polynomials with real coefficients, AB − BA maps f to f, i.e. AB − BA = I.
Definition 2.9 Let V be a vector space over field F. The dual space V^♯ is the vector space over F of all linear functions f : V → F (a subspace of F^V).
Example 2.11 Let F be a field. Every linear functional f : F^n → F has the form

$$f \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = c_1 x_1 + \dots + c_n x_n = \begin{bmatrix} c_1 & \dots & c_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix},$$
where c_i ∈ F, and hence it is natural to associate (F^n)^♯ with the vector space F^{1,n} of all 1-by-n matrices with entries from F. The vector spaces F^n and F^{1,n} appear to be very similar. In particular, the linear function F : F^n → F^{1,n} defined by

$$F \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix}$$
For every field F, the duality transformation A ↦ A♯ satisfies the usual identities similar to those valid for transposition of matrices:
(A + B)♯ = A♯ + B♯,  (AB)♯ = B♯A♯,  (A^{−1})♯ = (A♯)^{−1},  etc.
While the notion of orthogonality is not generally available for pairs of vectors from the same vector space, it can be applied to elements of a general vector space and its dual.
Definition 2.11 For a subspace U of a vector space V over field F,
U^⊥ := {f ∈ V♯ : f(v) = 0 ∀ v ∈ U}.
The following statement is a commonly used relation between the null space of a linear function and the range of its dual, which is actually a special case of Theorem 1.2.
Theorem 2.6 If V, U are vector spaces over field F and A : V → U is a linear function then
(ker(A))^⊥ = R(A♯).
Definition 2.12 Let V be a vector space over field F. A sequence (v1, . . . , vn) of elements vi ∈ V is called linearly independent if a linear combination
c1v1 + · · · + cnvn   (2.10)
of its elements equals zero only when all of its coefficients ci ∈ F are equal to zero.
The following statement describes bases of a vector space over a general field F .
Theorem 2.7 Let V be a vector space over field F, V ≠ {0}. Then one of the following conditions is satisfied:
(a) for every positive integer k there exists a linearly independent sequence (v1, . . . , vk) of k elements of V;
(b) there exists a positive integer n such that every linearly independent sequence (v1, . . . , vk) of vi ∈ V has k ≤ n elements, and can be extended to a finite basis (v1, . . . , vn) of V.
The number n from case (b) of Theorem 2.7 is called the dimension of V, notation n = dim(V). By convention, dim(V) = ∞ when case (a) takes place, and dim(V) = 0 for V = {0}.
Theorem 2.8 Let V, U be vector spaces over field F. Then every basis bV = (v1, . . . , vm) in V defines a one-to-one correspondence between linear functions A : V → U and sequences of m vectors g1, . . . , gm ∈ U, according to gk = Avk for k = 1, . . . , m.
The convention for representing linear functions is to form an n-by-m table (n rows and m columns) filled with the nm elements ai,k ∈ F, so that the first column is a1,1, a2,1, . . . , an,1, the second column is a1,2, a2,2, . . . , an,2, etc. The table is referred to as the matrix of A with respect to the bases bV, bU:
a = [ a1,1 a1,2 . . . a1,m
      a2,1 a2,2 . . . a2,m
      . . .
      an,1 an,2 . . . an,m ]
When A is a linear operator, i.e. when V = U, it is customary to use the same basis
for V and U: bV = bU .
Elementary operations on matrices with entries in a general field F are defined in the
same way as for matrices with real entries.
As in the case of real vector spaces, for general fields of scalars, the solution v of
Av = u will be unique when A is linear and ker(A) = {0}.
at all elements xk ∈ F. By construction, up to (2^r − m − 2)/2 errors in the sequence (qk) can be tolerated to allow recovery of the original data pk. Moreover, if each value of qk is stored as a straight sequence of r bits, the recovery algorithm will tolerate up to (2^r − m − 2)/4 bursts of errors, as long as each burst is not more than r bits long. The code is named Reed-Solomon after its inventors.
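The evaluation idea behind the code can be illustrated in a few lines; the sketch below is my own simplified illustration (a prime field and toy sizes chosen for convenience, not the parameters used in the notes): the data symbols are the coefficients of a polynomial, the codeword is its value at every field element, and any m + 1 intact values recover the data by Lagrange interpolation.

    p = 257                                  # a prime, so ZZ_p is a field (toy choice)

    def encode(data):                        # data symbols = polynomial coefficients
        return [sum(c * pow(x, k, p) for k, c in enumerate(data)) % p
                for x in range(p)]           # codeword = values at every field element

    def interpolate(points, x0):             # Lagrange interpolation mod p
        total = 0
        for xi, yi in points:
            num = den = 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x0 - xj) % p
                    den = den * (xi - xj) % p
            total = (total + yi * num * pow(den, p - 2, p)) % p   # Fermat inverse
        return total

    data = [17, 42, 0, 251]                       # m + 1 = 4 data symbols
    code = encode(data)
    four_samples = list(enumerate(code))[10:14]   # any m + 1 intact values suffice
    print(interpolate(four_samples, 0), data[0])  # p(0) = first data symbol recovered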
Gaussian Elimination
The algorithm of Gaussian elimination extends without a glitch to vector spaces over
arbitrary fields.
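For instance, a minimal Python sketch of the algorithm over Fp (my own illustration; it assumes the coefficient matrix is invertible) differs from the real-coefficient version only in that division is multiplication by a modular inverse:

    def solve_mod_p(A, b, p):
        n = len(A)
        M = [row[:] + [bi] for row, bi in zip(A, b)]     # augmented matrix
        for col in range(n):
            piv = next(r for r in range(col, n) if M[r][col] % p)   # nonzero pivot
            M[col], M[piv] = M[piv], M[col]
            inv = pow(M[col][col], p - 2, p)             # inverse via Fermat's theorem
            M[col] = [x * inv % p for x in M[col]]
            for r in range(n):
                if r != col and M[r][col]:
                    M[r] = [(x - M[r][col] * y) % p for x, y in zip(M[r], M[col])]
        return [M[r][n] for r in range(n)]

    # x1 + 2*x2 = 1, 2*x1 + 2*x2 = 0 over F_3:
    print(solve_mod_p([[1, 2], [2, 2]], [1, 0], 3))      # [2, 1]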
Let Su = (u1, . . . , un) be a sequence of elements of a vector space U over field F. A sequence Sf = (f1, . . . , fn) of linear functionals fi ∈ U♯ is called bi-orthogonal to Su when
fi(uk) = δik := { 1, i = k;  0, i ≠ k. }
In particular, when A is a linear function with Avk = uk for a basis (v1, . . . , vn), solving
A(c1v1 + · · · + cnvn) = u
yields ck = fk(u), i.e. the solution is v = f1(u)v1 + · · · + fn(u)vn.
The following statement generalizes the claim that, for every linearly independent sequence of vectors, there exists a bi-orthogonal sequence of linear functionals.
Theorems 2.10, 2.11 can be used to generalize a number of statements about the relation between dimension-counting and duality.
Chapter 3
Determinants
3.1 Motivation
Determinants play an important role in handling linear operators on finite dimensional
vector spaces. In particular, they are useful in figuring out how a solution of a system
of linear equations depends on its parameters. Among other things, determinants are
involved in the formulae for changing coordinates in multivariable integration, define barrier functions in interior point algorithms, and serve as invariants in proving infeasibility
of certain design tasks.
A direct analysis of the Gaussian elimination algorithm makes one expect that the number of digits needed to represent a denominator of x will grow exponentially with n.
Fortunately, this pessimistic upper bound is overly conservative. A more careful analysis
of the situation, conveniently performed using determinants, shows that the number of
digits in the numerator and denominator of the entries of x will actually grow not faster
than O(n log(n)).
stabilizes system (3.1) in the sense that every y(t) satisfying (3.1) and (3.2) simultaneously converges to zero exponentially as t → +∞. This can be shown by observing that, subject to (3.1) and (3.2),
d/dt {y(t)² + ẏ(t)²} = 2(1 − u(t))y(t)ẏ(t) = −0.5|y(t)ẏ(t)|,
and hence the Lyapunov function
V(y, ẏ) = y² + ẏ²
is monotonically non-increasing along the system trajectories.
While the dependence of the 2-by-2 matrix valued function M = M(t) on the coefficient function u = u(t) is complicated, the analysis of the determinants of the matrices involved shows that det(M(t)) = 1 for all t. Since M(t) would have to converge to zero as t → +∞ in the case when every solution of (3.1) converges to zero, the identity det(M(t)) = 1 proves that stabilization of (3.1) with program control is impossible. Thus, the use of the determinant as an invariant (something remaining constant as time progresses) of dynamical system (3.1) plays a critical role in establishing infeasibility of the stabilization problem.
[Figure: the parallelogram spanned by vectors A and B, with C = A + B, used to define the area function of the pair of vectors.]
A and B. With one of the vectors fixed, the function becomes almost linear, in the sense that
S0(cA, B) = cS0(A, B),  S0(A1 + A2, B) = S0(A1, B) + S0(A2, B)
when c ≥ 0 and the points A1, A2 lie on the same side of the line (OB). It is possible to get full linearity with respect to each of the arguments A, B by replacing S0 with its signed area version S = S(A, B), where S(A, B) = S0(A, B) when the points O, A, B do not belong to the same line and the line (OB) can be obtained by rotating the line (OA) around the origin O by an angle φ ∈ (0, π), taking S(A, B) = −S0(A, B) otherwise. The resulting signed area S = S(A, B) is not only multilinear (linear with respect to each of its individual arguments), but also skew symmetric, in the sense that S(A, B) = −S(B, A) for all A, B.
It can be proven that every other multilinear skew symmetric function Φ : V × V → IR has the form Φ(A, B) = c0S(A, B), where c0 ∈ IR is a real constant. This observation can be used to define the determinant det(L) of a linear function L : V → V as the constant λ such that
S(LA, LB) = λS(A, B)  ∀ A, B.
Indeed, since L is linear and S is multilinear and skew symmetric, the function SL(A, B) := S(LA, LB) is also multilinear and skew symmetric. Therefore there exists a constant λ ∈ IR such that S(LA, LB) = λS(A, B) for all A, B. Since S(A, B) ≠ 0 for some argument pairs (A, B), the constant λ ∈ IR is uniquely defined by L.
A similar definition of the determinant can be produced for an arbitrary vector space
V over an arbitrary field F .
Subject to the multilinearity assumption, the two conditions are equivalent when the
characteristic of F is not equal to 2, and the second condition is the proper specification
for the signed volume when F has characteristic 2.
To define a multilinear skew symmetric function by an explicit formula, one can use
the following classical construction.
For a positive integer n let Bn denote the set of all permutations of the elements of the set {1, . . . , n}, i.e. sequences
b = (b(1), b(2), . . . , b(n)) : b(i) ∈ {1, . . . , n},  b(i) ≠ b(k) for i ≠ k.
For example, (3, 1, 4, 2) is a permutation from B4, while (1, 2, 4) and (1, 2, 3, 1) are not.
A well known formula from combinatorics says that Bn has exactly n! elements.
Let
sign(a) = { 1, a > 0;  0, a = 0;  −1, a < 0 }
denote the sign of a real number a. For every n and every b ∈ Bn let
ε(b) = ∏_{i<k} sign(b(k) − b(i))   (3.3)
be the function which equals 1 when the number of inversions in b (i.e. pairs (i, k) of indexes such that i < k but b(i) > b(k)) is even, and equals −1 otherwise. For example, there are three inversions (b(1) > b(2), b(1) > b(4), and b(3) > b(4)) in b = (3, 1, 4, 2), and hence ε((3, 1, 4, 2)) = −1.
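The inversion count is straightforward to compute; a minimal Python check of (3.3), reproducing the example above:

    def eps(b):
        # +1 or -1 according to the parity of the number of inversions in b
        inversions = sum(1 for i in range(len(b)) for k in range(i + 1, len(b))
                         if b[i] > b[k])
        return 1 if inversions % 2 == 0 else -1

    print(eps((3, 1, 4, 2)))   # -1: three inversions, as computed in the text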
The most remarkable feature of the inversion counting function is that ε(b) changes its sign when two elements of the sequence b are interchanged to form a new permutation c.
Theorem 3.1 Let n be a positive integer. Let i, k ∈ {1, . . . , n} be such that i < k. For a b ∈ Bn let c ∈ Bn be defined according to
c(r) = { b(i), r = k;  b(k), r = i;  b(r), otherwise. }
Then ε(c) = −ε(b).
The following theorem shows how permutations and the inversion counting function can be used to construct multilinear skew symmetric forms.
Theorem 3.2 Let V be a vector space over field F. Let f = (f1, . . . , fn) be a sequence of elements fi ∈ V♯ (i.e. linear functionals fi : V → F). Then the function Φf : V^n → F defined by
Φf(v1, . . . , vn) = Σ_{b∈Bn} ε(b) ∏_{i=1}^n fi(vb(i))   (3.4)
is multilinear and skew symmetric.
Proof. Each term of the sum in (3.4) contains exactly one factor depending (linearly) on each of the arguments vi ∈ V, and hence is multilinear. Since a linear combination of multilinear functions is multilinear as well, the multilinearity of Φf follows.
To prove skew symmetry, assume that vi = vk for some i < k. Then the elements of the sum in (3.4) can be combined into pairs, where the permutations b, c defining the elements of a single pair can be obtained from each other by switching their i-th and k-th terms. Since ε(b) = −ε(c), and
f1(vb(1)) · · · fn(vb(n)) = f1(vc(1)) · · · fn(vc(n)),
the sum of terms in each pair equals zero. Hence the total value of Φf is zero as well.
Theorem 3.3 Let V be a vector space of dimension n < ∞ over field F. Let Φ0 : V^n → F be a multilinear skew symmetric function which is not identically equal to zero. Then every multilinear skew symmetric function Φ : V^n → F has the form
Φ(v1, . . . , vn) = c·Φ0(v1, . . . , vn)  ∀ vi ∈ V
for some constant c ∈ F.
Proof. First, let us show that every multilinear skew symmetric function g : V^n → F such that g(u1, . . . , un) = 0 for some basis u = (u1, . . . , un) in V is identically equal to zero.
Let us do this for n = 2 first. Since every v1, v2 ∈ V can be represented in the form v1 = c11u1 + c21u2, v2 = c12u1 + c22u2, expanding by multilinearity and using g(ui, ui) = 0 and g(u2, u1) = −g(u1, u2) yields g(v1, v2) = (c11c22 − c21c12)g(u1, u2) = 0.
The same expansion, using multilinearity and skew symmetry, can be used to show that in general, for
vk = Σ_{i=1}^n cik ui  (k = 1, . . . , n),
the representation
g(v1, . . . , vn) = h·g(u1, . . . , un)
takes place, where h = h(c11, . . . , cnn) is some function of the coefficients cik. Hence g ≡ 0 as soon as g(u1, . . . , un) = 0.
To finish the proof of Theorem 3.3, let u = (u1, . . . , un) be a basis in V. Since by assumption Φ0 is not identically equal to zero, Φ0(u1, . . . , un) ≠ 0. Then g = Φ − cΦ0, where
c = Φ(u1, . . . , un) / Φ0(u1, . . . , un),
is a multilinear skew symmetric function which takes zero value at u. Hence g ≡ 0, i.e. Φ ≡ cΦ0.
Definition 3.2 Let V be a vector space of dimension n over field F. Let A : V → V be a linear function. The determinant det(A) of A is defined by the identity
Φ(Av1, . . . , Avn) = det(A)·Φ(v1, . . . , vn)  ∀ vi ∈ V,
where Φ : V^n → F is an arbitrary multilinear skew symmetric function which is not identically equal to zero (by Theorem 3.3, the value of det(A) does not depend on the choice of Φ).
In other words, the determinant of a linear operator is the multiplicative gain in signed volume produced by its action.
Using the expression from (3.4) for the multilinear skew symmetric form Φ = Φf, where v = (v1, . . . , vn) and f = (f1, . . . , fn) are bi-orthogonal bases in V and V♯ respectively, yields an explicit, though rather complicated, formula
det(A) = Σ_{b∈Bn} ε(b) ∏_{i=1}^n fi(Avb(i)).   (3.5)
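Formula (3.5) can be evaluated literally for small matrices; the following Python sketch (an illustration, usable only for small n because of the n! terms) sums ε(b) times the product of the entries a[i][b(i)]:

    from itertools import permutations

    def eps(b):
        return 1 if sum(b[i] > b[k] for i in range(len(b))
                        for k in range(i + 1, len(b))) % 2 == 0 else -1

    def det(a):
        n = len(a)
        total = 0
        for b in permutations(range(n)):
            term = eps(b)
            for i in range(n):
                term *= a[i][b[i]]          # entry a_{i, b(i)}
            total += term
        return total

    print(det([[1, 2, 0], [3, 1, 4], [0, 2, 2]]))   # -18, in exact integer arithmetic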
Determinant of Identity
If V is a finite dimensional vector space and I : V → V is the identity operator I(v) ≡ v then the signed volume is not changed by the action of I, and hence det(I) = 1, where 1 is the unit in the field of scalars of V.
Theorem 3.4 Let V be a vector space of dimension n < ∞ over field F. Let A : V → V be a linear function. Then det(A) = 0 if and only if ker(A) ≠ {0}.
Proof. If ker(A) = {0} and (v1, . . . , vn) is a basis in V then (Av1, . . . , Avn) is a basis in V. Hence, for a skew symmetric multilinear form Φ0 : V^n → F which is not identically equal to zero, the values
b = Φ0(v1, . . . , vn),  a = Φ0(Av1, . . . , Avn)
are not zero, and therefore det(A) = a/b ≠ 0.
Conversely, if A is not invertible then every element in the range R(A) is a linear combination of a fixed sequence of n − 1 vectors u = (u1, . . . , un−1). Following the proof of Theorem 3.3, the value of Φ0(Av1, . . . , Avn) is a linear combination of the values of Φ0(w1, . . . , wn) where wi ∈ {u1, . . . , un−1} for every i. Since this means that at least two arguments of Φ0(w1, . . . , wn) are equal, Φ0(w1, . . . , wn) = 0, and hence Φ0(Av1, . . . , Avn) = 0.
Though, formally speaking, Theorem 3.4 can be used to verify feasibility of a linear equation, it is not always a good idea to follow this path. For instance, let us return to the setup of Example 1.14. Checking feasibility of a polynomial interpolation problem
p(ti) = yi  (i = 1, . . . , n),
for a given set of samples (ti, yi), where ti ≠ tk for i ≠ k, is equivalent to checking feasibility of the linear equation Mx = y, where y ∈ IR^n is the column vector with entries yi, and M : IR^n → IR^n is defined by
M[p0; . . . ; pn−1] = [p(t1); . . . ; p(tn)].
One way to solve the problem is by figuring out that the determinant of the matrix
[ 1 t1 . . . t1^{n−1}
  1 t2 . . . t2^{n−1}
  . . .
  1 tn . . . tn^{n−1} ]
of M in the standard basis is not equal to zero. The approach used in Example 1.14, though mathematically equivalent, appears to be more straightforward and concentrated on the fundamentals.
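As an aside, the determinant of the matrix above has the classical closed form ∏_{i<k}(tk − ti) (a known fact about Vandermonde matrices, not derived in these notes), which is non-zero exactly when the ti are distinct; a quick numerical cross-check in Python:

    import numpy as np

    t = np.array([0.5, 1.0, 2.0, 3.5])
    V = np.vander(t, increasing=True)       # rows (1, t_i, t_i^2, t_i^3)
    closed_form = np.prod([t[k] - t[i] for i in range(len(t))
                           for k in range(i + 1, len(t))])
    print(np.linalg.det(V), closed_form)    # both nonzero since the t_i are distinct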
Multiplicativity of Determinant
The following statement is the multiplicative property of the determinant.
Theorem 3.5 Let V be a vector space of dimension n < ∞ over field F. Let A : V → V and B : V → V be two linear functions. Then
det(AB) = det(A) det(B).
Proof. Since AB is the composition of A and B, the total signed volume gain produced by AB
is the product of the gains corresponding to separate actions of B and A.
Example 3.2 In the example from subsection 3.1.2, assume that the piecewise constant function u takes value u(τ) = uk for tk ≤ τ < tk+1, where 0 = t0 < t1 < · · · < tm+1 = t. Then the differential equation in (3.1) can be solved explicitly on each of the intervals [tk, tk+1]:
y(τ) = y(tk) cos(uk^{1/2}(τ − tk)) + ẏ(tk)uk^{−1/2} sin(uk^{1/2}(τ − tk))  (τ ∈ [tk, tk+1]),
so that the matrix M(t) mapping (y(0), ẏ(0)) to (y(t), ẏ(t)) has determinant det(M(t)) = 1. Hence it is impossible to make all components of M(t) converge to zero as t → +∞.
The following theorem states that similar functions have equal determinants.
Theorem 3.6 Let U and V be finite dimensional vector spaces over field F. Let A : U → U, B : V → V, and S : U → V be linear functions such that SA = BS. If S is a bijection then det(A) = det(B).
Proof. Let (u1, . . . , un) be a basis in U, and let ΦV : V^n → F be a multilinear skew symmetric form which is not identically equal to zero. Since S is a bijection, (Su1, . . . , Sun) is a basis in V, and ΦU(w1, . . . , wn) := ΦV(Sw1, . . . , Swn) defines a valid multilinear skew symmetric form on U^n. Hence
det(A) = ΦU(Au1, . . . , Aun) / ΦU(u1, . . . , un)
       = ΦV(SAu1, . . . , SAun) / ΦV(Su1, . . . , Sun)
       = ΦV(BSu1, . . . , BSun) / ΦV(Su1, . . . , Sun)
       = det(B).
Theorem 3.7 Let V be a finite dimensional vector space over field F, and let A : V → V be a linear function. Then det(A♯) = det(A).
In terms of matrices, Theorem 3.7 is the familiar claim that transposition does not change the determinant of a matrix.
Proof. Let n = dim(V). Consider the function Φ : (V♯)^n × V^n → F defined according to the proof of Theorem 3.2:
Φ(f1, . . . , fn, v1, . . . , vn) = Σ_{b∈Bn} ε(b) ∏_{i=1}^n fi(vb(i)).
With the first n arguments fi ∈ V♯ fixed, Φ is a multilinear skew symmetric function on V^n. With the last n arguments fixed, Φ is a multilinear skew symmetric function on (V♯)^n. Since (A♯fi)(v) = fi(Av), evaluating Φ(A♯f1, . . . , A♯fn, v1, . . . , vn) = Φ(f1, . . . , fn, Av1, . . . , Avn) in the two ways yields det(A♯)Φ = det(A)Φ, and hence det(A♯) = det(A).
It is easy to see that, due to the condition V1 ∩ V2 = {0}, the representation of a given v ∈ V in the form v = v1 + v2, where vi ∈ Vi, is unique. Moreover, the functions Π[V1, V2] : V → V1 and Π[V2, V1] : V → V2 defined according to
Π[V1, V2](v1 + v2) = v1,  Π[V2, V1](v1 + v2) = v2  (vi ∈ Vi)
are linear.
When vector spaces U, V have decompositions U = U1 ⊕ U2 and V = V1 ⊕ V2, a linear function A : U → V naturally defines four linear functions A1,1 : U1 → V1, A1,2 : U2 → V1, A2,1 : U1 → V2, and A2,2 : U2 → V2, defined according to
Ai,k uk = Π[Vi, V3−i](Auk)  for uk ∈ Uk.
Theorem 3.8 Let V be a vector space of dimension n < ∞ over field F. Let A : V → V be a linear function.
(a) If the block matrix representation of A with respect to a direct sum decomposition V = V1 ⊕ V2 is given by
A = [ A11 A12; 0 A22 ],
where 0 denotes the linear function 0 : V1 → V2 which maps every element v1 ∈ V1 to zero, then
det(A) = det(A11) det(A22).   (3.6)
(b) Identity (3.6) also holds if the block matrix representation of A with respect to a direct sum decomposition V = V1 ⊕ V2 is given by
A = [ A11 0; A21 A22 ].
Proof. The proofs for (a) and (b) are similar. Let us prove (a).
Let (v1, . . . , vr) and (vr+1, . . . , vn) be bases in V1 and V2 respectively. Let (f1, . . . , fn) be the sequence of functionals from V♯ which is bi-orthogonal to (v1, . . . , vn). Then
fi(Avk) = { fi(A11vk) for i, k ≤ r;  fi(A22vk) for i, k > r;  fi(A12vk) for i ≤ r, k > r;  0 for i > r, k ≤ r. }
Hence only the permutations b ∈ Bn which map {1, . . . , r} into itself contribute to (3.5); every such b is a combination of some c ∈ Br and d ∈ Bn−r, with ε(b) = ε(c)ε(d), so that
det(A) = Σ_{c∈Br, d∈Bn−r} ε(c)ε(d) ∏_{i=1}^r fi(Avc(i)) ∏_{t=1}^{n−r} ft+r(Avd(t)+r)
       = ( Σ_{c∈Br} ε(c) ∏_{i=1}^r fi(A11vc(i)) ) ( Σ_{d∈Bn−r} ε(d) ∏_{t=1}^{n−r} ft+r(A22vd(t)+r) )
       = det(A11) det(A22).
Theorem 3.8 allows one to calculate determinants of matrices associated with row and column operations, as in det T(c) = 1, where T(c) is the n-by-n matrix with units on the main diagonal and a single non-zero off-diagonal element c in the i-th row and k-th column (so that for every n-by-n matrix A the product AT(c) is obtained from A by adding its i-th column, scaled by c, to its k-th column, and T(c)A is obtained from A by adding its k-th row, scaled by c, to its i-th row). Similarly, Theorem 3.8 implies that the determinant det(A) of a diagonal matrix A equals the product of its diagonal entries.
Theorem 3.9 Let U, V be finite dimensional vector spaces over field F. Let A : U → V and B : V → U be linear functions. Then
det(I − AB) = det(I − BA).   (3.7)
Proof. By Theorem 3.8, det [ I 0; −B I ] = 1. Hence
det [ I A; B I ] = det( [ I A; B I ][ I 0; −B I ] ) = det [ I − AB  A; 0  I ] = det(I − AB),
det [ I A; B I ] = det( [ I 0; −B I ][ I A; B I ] ) = det [ I  A; 0  I − BA ] = det(I − BA),
which implies (3.7).
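A quick numerical sanity check of Theorem 3.9 in Python (an illustration with random data), using rectangular A and B so that AB and BA have different sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))
    B = rng.standard_normal((5, 3))
    d1 = np.linalg.det(np.eye(3) - A @ B)
    d2 = np.linalg.det(np.eye(5) - B @ A)
    print(np.isclose(d1, d2))   # True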
One of many applications of Theorem 3.9 is a well known explicit formula for solving the linear equation Mx = v with respect to x, sometimes referred to as the Kramers formula.
Theorem 3.10 Let V be a finite dimensional vector space over field F. Let M : V → V be a linear function, f ∈ V♯ a linear functional on V, and v, e ∈ V vectors from V, such that ker(M) = {0} and f(e) = 1. Then
f(M^{−1}v) = det(M − Mef + vf) / det(M).   (3.8)
Proof. By Theorems 3.5 and 3.9,
det(M − Mef + vf)/det(M) = det(I − ef + M^{−1}vf) = det(1F − f(e − M^{−1}v)) = 1 − f(e) + f(M^{−1}v) = f(M^{−1}v).
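In matrix terms, taking e to be a standard basis vector and f the matching coordinate functional (so that f(e) = 1), formula (3.8) extracts a single entry of M^{−1}v; a numerical check in Python (an illustration with random data):

    import numpy as np

    rng = np.random.default_rng(1)
    n, i = 4, 2
    M = rng.standard_normal((n, n))
    v = rng.standard_normal(n)
    e = np.eye(n)[:, i]                           # column vector e; f = e'
    N = M - np.outer(M @ e, e) + np.outer(v, e)   # M - M e f + v f
    print(np.linalg.det(N) / np.linalg.det(M),    # the Kramers ratio ...
          np.linalg.solve(M, v)[i])               # ... equals (M^{-1} v)_i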
Chapter 4
Characteristic Polynomials
This chapter uses characteristic polynomials and invariant subspaces to study the prop-
erties associated with iterations of linear operators: a topic highly relevant when working
with linear time invariant (LTI) dynamical systems.
Indeed, arranging the pairs of consecutive Fibonacci numbers into vectors yields (4.1), where
x(t) = [ y(t+1); y(t) ],  x0 = [ 1; 0 ],  A = [ 1 1; 1 0 ].
Studying the characteristic polynomial χA(s) = det(sI − A) of A (in particular, its roots) yields an explicit formula for y(t), as well as a conclusion about its asymptotic
behavior:
χA(s) = det [ s−1 −1; −1 s ] = s² − s − 1
has real roots
λ = (1 + √5)/2,  μ = (1 − √5)/2,
and, accordingly, the Fibonacci number y(t) is a linear combination of λ^t and μ^t:
y(t) = (λ^t − μ^t)/√5,
so that y(t) converges to +∞ and y(t + 1)/y(t) converges to λ as t → +∞.
It is instructive to do the analysis of the Fibonacci recurrence in fields of finite characteristic. For example, in F5 the characteristic polynomial of A has a double root s = 3, which is certified by the identity
s² − s − 1 = (s − 3)²  (in F5).
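Both descriptions of y(t) are easy to cross-check; the following Python sketch (my own illustration) compares the recursion with the root formula and verifies the double root in F5:

    # the recursion x(t+1) = A x(t): carry the pair (y(t+1), y(t))
    a, b = 1, 0
    seq = []
    for _ in range(10):
        seq.append(b)
        a, b = a + b, a
    print(seq)                                       # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34

    lam, mu = (1 + 5 ** 0.5) / 2, (1 - 5 ** 0.5) / 2
    print([round((lam ** t - mu ** t) / 5 ** 0.5) for t in range(10)])   # same numbers

    print(all((s * s - s - 1) % 5 == ((s - 3) ** 2) % 5 for s in range(5)))  # (s-3)^2 in F5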
x(t + 1) = Ax(t) + Bu(t),  y(t) = Cx(t) + Du(t),   (4.2)
where x(t), u(t), y(t) are elements of vector spaces X, U, Y over the same field F, and A : X → X, B : U → X, C : X → Y, D : U → Y are fixed linear functions.
The typical understanding of (4.2) is that it defines a linear system which transforms an
input time series u = (u(0), u(1), . . . ) into output sequence y = (y(0), y(1), . . . ), using the
vector x0 = x(0) of initial conditions as an additional input parameter. In the classical
applications of LTI systems the field F = IR of real numbers is used to describe (some-
times approximately) dynamical relations between time samples of physical parameters,
represented, naturally, by real numbers. However, LTI state space models over finite fields
extend linear system methods to finite automata, and can be used to simplify analysis of
discrete algorithms in communications and control.
Consider, for example, the issue of reachability and observability for the model in
(4.2).
Definition 4.1 System (4.2) is called reachable if for every x̄ ∈ X there exists a positive integer T and a sequence u(0), u(1), . . . of vectors u(t) ∈ U such that the solution of (4.2) with x(0) = 0 satisfies x(T) = x̄.
Naturally, reachability describes the ability of a control action u = u(t) to steer the state x = x(t) from a zero initial position to an arbitrary given position x̄ ∈ X, given enough time. Moreover, due to the linearity of equations (4.2), reachability also means the possibility to steer from any given initial position x0 ∈ X to an arbitrary terminal position x̄ ∈ X.
The question of reachability can be answered in terms of invariant subspaces. Specifically, system (4.2) is not reachable if and only if there exists a subspace X0 ⊂ X which is proper (i.e. X0 ≠ X), contains the range of B (i.e. R(B) ⊂ X0), and is A-invariant, in the sense that AX0 is contained in X0.
For example, consider the input-output recursive equation
y(t + 1) + y(t − 1) = u(t) + 2u(t − 1),   (4.3)
which can be represented in the form (4.2) with
x(t) = [ 2u(t−1) − y(t−1); y(t) ],  A = [ 0 −1; 1 0 ],  B = [ 2; 1 ],  C = [ 0 1 ],  D = 0.
When F = IR is the field of real numbers (i.e. when y(t) and u(t) are real), system (4.2) is reachable. This can be seen by noticing that the vectors
B = [ 2; 1 ],  AB = [ −1; 2 ]
span X = IR², and hence IR² is the only A-invariant subspace which contains the range of B. Interestingly, the reachability property does not necessarily extend to the interpretations of (4.3) in other fields F. For example, with F = F5, since
AB = [ −1; 2 ] = 2 [ 2; 1 ] = 2B  (in F5),
the span of B is a proper A-invariant subspace containing R(B), and hence the system is not reachable.
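A quick Python check of this computation (an illustration using the matrices above):

    import numpy as np

    A = np.array([[0, -1], [1, 0]])
    B = np.array([2, 1])
    R = np.column_stack([B, A @ B])        # reachability matrix [B, AB]
    print(np.linalg.det(R))                # 5.0: nonzero over IR
    print(round(np.linalg.det(R)) % 5)     # 0: singular over F_5
    print((A @ B) % 5, (2 * B) % 5)        # AB = 2B in F_5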
Theorem 4.1 For an integer n > 1 and a field F let (a0, . . . , an−1) be a sequence of elements of F. If v = (v1, . . . , vn) is a basis in a vector space V over F then the (uniquely defined) linear operator A such that
Avk = vk+1  (k = 1, . . . , n − 1),  Avn = −a0v1 − a1v2 − · · · − an−1vn
has characteristic polynomial
det(sI − A) = s^n + an−1s^{n−1} + · · · + a1s + a0.
Proof. It is sufficient to prove that for a multilinear skew symmetric form Φ : V^n → F such that Φ(v1, . . . , vn) = 1 the identity
Φ(sv1 − v2, sv2 − v3, . . . , svn−1 − vn, svn + a0v1 + · · · + an−1vn) = s^n + an−1s^{n−1} + · · · + a1s + a0   (4.4)
holds. For n = 2, direct expansion gives
Φ(sv1 − v2, sv2 + a0v1 + a1v2) = (s² + a1s + a0)Φ(v1, v2) = s² + a1s + a0.
Assume that (4.4) is valid for all n ≤ k, where k ≥ 2. Then, for n = k + 1, expanding with respect to the first argument yields
Φ(sv1 − v2, . . . ) = sΦ(v1, sv2 − v3, . . . ) − Φ(v2, sv2 − v3, . . . ).   (4.5)
Since
(u1, . . . , uk) ↦ Φ(v1, u1, . . . , uk)
is multilinear and skew symmetric, the inductive assumption can be applied to the first term in (4.5), to show that it equals s(s^{n−1} + an−1s^{n−2} + · · · + a1). In the second term the components proportional to v2, . . . , vn can be dropped one by one by skew symmetry, leaving
−Φ(v2, sv2 − v3, . . . , svn + a0v1 + · · · + an−1vn) = Φ(v2, v3, . . . , vn, a0v1) = a0,
and summing the two terms proves (4.4) for n = k + 1.
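Theorem 4.1 is easy to test numerically; the sketch below (an illustration with arbitrarily chosen coefficients) builds the companion matrix in the standard basis and reads off its characteristic polynomial:

    import numpy as np

    a = [6.0, -5.0, 2.0]                   # (a_0, a_1, a_2), so n = 3
    n = len(a)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)             # A v_k = v_{k+1} for k < n
    A[:, -1] = [-c for c in a]             # A v_n = -a_0 v_1 - ... - a_{n-1} v_n
    print(np.poly(A))                      # [1, 2, -5, 6]: det(sI - A) = s^3 + 2s^2 - 5s + 6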
Therefore, if a linear operator has a non-trivial invariant subspace, its characteristic poly-
nomial is reducible, i.e. can be factored into a product of two polynomials of positive
degree. Moreover, one of the two polynomials is the characteristic polynomial of the
restriction of A to the invariant subspace V1 .
Example 4.1 If a polynomial p = p(x) of degree m with coefficients from field F has a root a ∈ F, i.e. p(a) = 0, then p can be factored according to p(x) = (x − a)q(x), where q = q(x) is a polynomial of degree m − 1 with coefficients from F (this can be shown by performing long polynomial division of p(x) by x − a). Therefore a polynomial p = p(x) with complex coefficients is irreducible over field C if and only if it has the form p(x) = p1x + p0 with p1 ≠ 0. Similarly, a polynomial with real coefficients is irreducible over IR if p(x) = p2x² + p1x + p0 where either p2 = 0 ≠ p1 or 4p2p0 > p1².
The following theorem claims that every linear operator A on a finite dimensional vec-
tor space V has a non-zero invariant subspace V1 such that the characteristic polynomial
of the restriction A11 of A to V1 is irreducible.
Proof. For every v ∈ V, v ≠ 0, let ν = ν(v) be the minimal positive integer k for which A^k v is a linear combination of the vectors v, Av, . . . , A^{k−1}v. Since dim(V) = n, it follows that ν(v) ≤ n for all v ∈ V.
Let m = ν(u), u ≠ 0, be the minimal value of ν, i.e. the vectors u, Au, . . . , A^{m−1}u are linearly independent, and
A^m u = cm−1A^{m−1}u + · · · + c1Au + c0u
for some ci ∈ F. Equivalently,
p(A)u = (A^m + pm−1A^{m−1} + · · · + p1A + p0I)u = 0
for
p(x) = x^m + pm−1x^{m−1} + · · · + p1x + p0,  pi = −ci ∈ F.
Let us show that the polynomial p = p(x) must be irreducible over F. Indeed, if p = q1q2 where both polynomials q1, q2 have degree less than m then
q1(A)q2(A)u = p(A)u = 0.
Then either q2(A)u = 0, which means ν(u) < m, or q1(A)w = 0 for w = q2(A)u ≠ 0, which means ν(w) < m, with both equalities contradicting the minimality of m = ν(u).
To complete the proof, note that the vectors u, Au, . . . , A^{m−1}u form a basis in an A-invariant subspace V1 ⊂ V, and, according to Theorem 4.1, p(s) = det(sI − A11) is the characteristic polynomial of the restriction A11 of A to V1.
It turns out that the identity χA(A) = 0 holds for every linear operator A : V → V.
Proof. For every v ∈ V, v ≠ 0, let k be the smallest positive integer such that A^k v is a linear combination of v, Av, . . . , A^{k−1}v. As was shown in the proof of Theorem 4.2, this means p(A)v = 0 for a polynomial p of degree k, and the vectors v, Av, . . . , A^{k−1}v form a basis in an A-invariant subspace V1, such that p(s) = det(sI − A11) is the characteristic polynomial of the restriction A11 of A to V1. Hence χA(s) = q(s)p(s) for some polynomial q, and
χA(A)v = q(A)p(A)v = 0.
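A numerical spot check of the Cayley-Hamilton identity in Python (an illustration with random data; np.poly returns the coefficients of det(sI − A)):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))
    c = np.poly(A)                          # coefficients of det(sI - A), degree 4
    chi_A = sum(ck * np.linalg.matrix_power(A, 4 - k) for k, ck in enumerate(c))
    print(np.allclose(chi_A, 0))            # True up to rounding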
U = U1 ⊕ U2 ⊕ · · · ⊕ Um
If the characteristic polynomial of M22 is irreducible, set m = 2, M2,2 = M22 to finish the proof. Otherwise, applying Theorem 4.2 to M22 represents it in the block upper triangular form
M = [ M1,1 M12 M13
       0   M2,2 M23
       0    0   M33 ],
When F = C is the field of complex numbers, the dimensions of the spaces Uk must be equal to 1 for irreducibility, and hence the Mi,k are just scalars, which makes (4.7) an upper triangular form called the complex Schur decomposition. The Schur decomposition of complex matrices is an important operation, because it not only calculates the eigenvalues Mk,k of matrix M, but can be performed efficiently and in a numerically stable fashion. More specifically, in terms of matrix representations, it is always possible to find a complex Schur decomposition MS = TMT^{−1} of a matrix M for which the coordinate transformation T is unitary, i.e. preserves Hermitian length of vectors (to be discussed in the next chapter).
When F = IR, the dimensions of the spaces Uk must be equal to 1 or 2, and the corresponding upper triangular block representation is called the real Schur decomposition.
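For concreteness, a minimal illustration using SciPy's schur routine (an external library call, not part of these notes); it returns the triangular factor T and a unitary transformation Z with M = ZTZ*:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4))
    T, Z = schur(M, output='complex')       # M = Z T Z*, Z unitary
    print(np.allclose(Z @ T @ Z.conj().T, M),    # decomposition reproduces M
          np.allclose(np.tril(T, -1), 0))        # T is upper triangular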
Chapter 5
Quadratic Forms
and Scalar Products
The concepts discussed in this chapter apply to real vector spaces, which also allows for
an easy generalization to complex vector spaces, as every vector space over an extension
field of IR is naturally a vector space over IR as well. The main use of quadratic forms in
most applications is to define and to compare lengths of non-parallel vectors.
This chapter introduces bilinear functionals and scalar products, positive definiteness,
Cauchy-Schwarz inequality, Gramm-Schmidt orthogonalization, and other related math-
ematical constructions used to work with uncertainty of functional models, second order
statistics of random variables, elementary geometry proofs, and many other applications.
5.1 Motivation
Quadratic forms are defined by pushing the notion of linearity just a bit further.
While the definition can be easily generalized to introduce forms of every positive
integer power (cubic, fourth order, etc.), quadratic forms are remarkably easier to handle.
For example, one of the most commonly checked properties of a quadratic form σ : V → IR on a vector space V over the field F = IR of real numbers is its positive definiteness. While positive definiteness of a quadratic form on a finite dimensional real vector space can be verified easily, assessment of this property for forms of any other even order remains a much harder task.
Then, out of the three real numbers (v3, u), (v4, u), and (v5, u), at least two have the same sign (i.e. are either both non-negative or both non-positive). Assume, without loss of generality, that these two vectors are v3 and v4. If (v3, u) ≥ 0 and (v4, u) ≥ 0 then the hemisphere defined by (v, u) ≥ 0 contains v1, . . . , v4. Otherwise, the hemisphere defined by (v, u) ≤ 0 contains v1, . . . , v4.
or, equivalently,
x(t + 1) = Ax(t),  x(t) = [ y(t+1); y(t) ],  A = [ 2a −a²; 1 0 ].
Roughly speaking, the recursive equation (5.1) is called asymptotically stable if every
solution y = y(t) of (5.1) converges to zero as t +. Assessment of asymptotic
stability of recursive equations is a fundamental question in control systems.
Since the characteristic polynomial χ(s) = (s − a)² of A has a double root at s = a, every solution of (5.1) has the form y(t) = (c1 + c2t)a^t,
as, due to the presence of nonlinearity, it is impossible to produce a useful explicit formula
for y(t) as a function of t. While a complete stability analysis of (5.2) for all possible values
of a is difficult, playing with quadratic forms allows one to certify asymptotic stability or
lack thereof for some values of a.
First, let us show how asymptotic stability of (5.1) can be analyzed without referring to an explicit formula for its solutions. Consider the function σ1 : IR² → IR defined by
σ1([x1; x2]) = (x1 − ax2)² + rx2²,
where r > 0 is a parameter. Note that σ1 is a quadratic form on the real vector space V = IR². Indeed, σ1(v) = b1(v, v), where b1 : V × V → IR is the bilinear function defined by
b1([u1; u2], [v1; v2]) = (u1 − au2)(v1 − av2) + ru2v2.
For a fixed value of parameter λ, the function σ2 mapping v ∈ V to λσ1(v) − σ1(Av) ∈ IR is a quadratic form as well, because σ2(v) = b2(v, v) for the bilinear function b2(u, v) = λb1(u, v) − b1(Au, Av). It is easy to check that, for |a| < 1, there exist λ ∈ (0, 1) and r > 0 such that σ2 takes only non-negative values, so that σ1(x(t + 1)) ≤ λσ1(x(t)) for every t = 0, 1, 2, . . . for every solution of (5.1). Since σ1 itself takes non-negative values, this means that
y(t)² ≤ σ1(x(t)) ≤ λ^t σ1(x(0))
converges to zero as t → +∞.
Similarly, it is easy to establish that, for |a| > 1, there always exist λ > 1 and r > 0 such that −σ2 is positive definite. This means that σ1(x(t)) ≥ λ^t σ1(x(0)) > 0 for every non-zero solution of (5.1), which rules out asymptotic stability.
While the relation between w(t) and y(t) cannot be described by quadratic inequalities exactly, there are certain quadratic inequalities which are implied by the relation between w(t) and y(t). In particular, for every δ > 0 there exists ε = ε(δ) > 0 such that the inequality
σ(x(t), w(t)) ≥ 0
Using a standard criterion of positive definiteness for finite dimensional quadratic forms, it is easy to establish that for every a ∈ IR such that |a| > 1 there exist λ > 1, ε > 0, and τ ≥ 0 such that the corresponding quadratic form
is positive definite. This, in turn, can be used to prove that recursion (5.2) is not asymptotically stable when |a| > 1. Indeed, otherwise there would exist a solution y = y(t) of (5.2) which takes an infinite number of non-zero values, and converges to zero as t → +∞. Since the inequality |y(t)| ≤ δ would be valid for sufficiently large t, it follows that σ̃(x(t), w(t)) ≥ 0, and hence σ1(x(t + 1)) ≥ λσ1(x(t)) for sufficiently large t, which contradicts the assumption that y(t) → 0 while y(t) is not identically equal to zero.
The approach can also be used to establish (global) asymptotic stability for some values a ∈ (−1, 1). Indeed, since
0 ≤ y sin(y) ≤ hy²  ∀ y : |y| ≤ y0,   (5.3)
the corresponding function σ3 is a quadratic form on IR² × IR. If a ∈ IR is such that values of λ ∈ (0, 1) and τ ≥ 0 can be found making the quadratic form
positive definite then (5.2) is asymptotically stable. Indeed, since σ3(x(t), w(t)) ≥ 0 for all solutions of (5.2), positive definiteness of σ4 for some τ ≥ 0 implies that σ1(x(t + 1)) ≤ λσ1(x(t)) for all t.
are well defined and can be handled using the standard rules of integration concerning
linearity, inequalities, convergence, etc. Such functions v are referred to as (scalar) random
variables, and the quantity E[v] is called the expected value of v. While a complete proper
treatment of random variables requires a solid foundation in the theory of integration
(something not assumed in these lectures), certain aspects of the framework, typically
associated with treating linear combinations of a fixed finite family of random variables,
fit nicely into the subject of elementary linear algebra.
Consider the set V of all scalar real random variables v for which E[|v|²] < ∞, agreeing for convenience to consider two random variables u, v ∈ V equal when E[|v − u|²] = 0. With the natural operations of addition and scaling, V becomes a real vector space. Moreover, the bilinear form
b(u, v) = E[uv]
defines a positive definite quadratic form
σE(v) = E[|v|²].
A number of basic optimization questions in signal processing and control can be formu-
lated in terms of quadratic forms and random variables from V .
A classical example is a parameter estimation setup, in which the value of a real parameter a ∈ IR is to be estimated based on a finite number of uncorrelated noisy measurements. The measurements are represented by random variables yi (i = 1, . . . , n), related to a by the equations yi = a + wi, where wi are random variables with zero mean E[wi] = 0 and unit variance E[wi²] = 1. The assumption that wi are uncorrelated is represented by the equalities E[wiwk] = 0, to be satisfied for all pairs (i, k) with i ≠ k.
A typical objective is to find the sequence of coefficients c = (ci)_{i=1}^n of a linear estimator
â = c1y1 + · · · + cnyn,
such that the variance E[|â − a|²] of the estimation error â − a does not depend on parameter a ∈ IR, and, subject to this constraint, is minimal.
It turns out that the set U of all estimator coefficient sequences c = (ci)_{i=1}^n for which E[|â − a|²] does not depend on a ∈ IR is a linear subspace in IR^n. Moreover, the functional c ↦ E[|â − a|²] is a quadratic functional on U, in the sense that it is a sum of a quadratic form, a linear function, and a constant. The corresponding quadratic form is positive definite, which guarantees existence and uniqueness of the optimal estimator, which can be calculated easily using the basic properties of quadratic forms.
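For this particular setup the optimum is the sample mean; a minimal sketch (my own illustration, using the standard facts that independence from a forces c1 + · · · + cn = 1 and that the variance of the error is then c1² + · · · + cn²) compares a few admissible coefficient sequences:

    import numpy as np

    n = 5
    candidates = [np.full(n, 1 / n),                 # the sample mean, c_i = 1/n
                  np.array([0.5, 0.5, 0, 0, 0]),
                  np.eye(n)[0]]                      # use the first measurement only
    for c in candidates:
        print(c.sum(), (c ** 2).sum())   # constraint value 1; variance smallest at 1/n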
(u, v)σ := (σ(u + v) − σ(u − v))/4
defines a symmetric (in the sense that (u, v)σ = (v, u)σ) bilinear form (·, ·)σ : V × V → IR. Indeed, if b : V × V → IR is the bilinear form from the definition of σ(v) = b(v, v) then
(σ(u + v) − σ(u − v))/4 = (b(u, v) + b(v, u))/2.
Proof. Fix u, v ∈ V and consider the function f : IR → IR defined by f(t) = σ(u + tv). Since σ(v) = (v, v)σ for some bilinear symmetric form (·, ·)σ, f(·) is a quadratic function of t.
|u − v|σ ≤ |u|σ + |v|σ.
In particular, the triangle inequality states that, for a positive definite quadratic form σ, the quantity ρ(u, v) = |u − v|σ can serve as a measure of the distance between vectors u and v.
v = c1v1 + · · · + cnvn ∈ V
the value σ(v) is uniquely defined by the matrix Q and by the coefficients ci ∈ IR according to
σ(v) = σ(c1v1 + · · · + cnvn) = (c1v1 + · · · + cnvn, c1v1 + · · · + cnvn)σ = Σ_{i,k=1}^n cick(vi, vk)σ.
In particular, when V = IR^n, and Q is the matrix of a quadratic form σ in the standard basis, the values of σ(v) can be calculated according to the matrix algebra formula
σ(v) = v′Qv,
where v ∈ IR^n is an arbitrary n-by-1 matrix, and the prime ′ means transposition, so that v′ is a 1-by-n matrix, and, accordingly, the product v′Qv is a 1-by-1 matrix, i.e. a scalar v′Qv ∈ IR.
The following statement describes how the matrix of a quadratic form changes when
the basis with respect to which it is computed is replaced by another one.
Su = SvMuv,
where Muv is interpreted as a linear function Muv : IR^n → IR^n. By the definition of the matrix of a quadratic form,
(Svx, Svy)σ = x′Qvy,  (Sux, Suy)σ = x′Quy.
Hence
x′Quy = (SvMuvx, SvMuvy)σ = x′M′uvQvMuvy.
Since the equality holds for every pair of vectors x, y ∈ IR^n (in particular, for every pair of the standard basis vectors), Qu = M′uvQvMuv.
One useful implication of the coordinate change formula is that the sign of the determinant of the matrix of a quadratic form does not depend on the basis. Indeed, since Muv is invertible, det(Muv) ≠ 0, and hence the signs of det(Qv) and
det(Qu) = det(M′uvQvMuv) = det(M′uv) det(Qv) det(Muv) = det(Qv) det(Muv)²
coincide.
For v = c1v1 + · · · + cnvn ≠ 0 expanded in a basis orthonormal with respect to σ,
σ(v) = σ(c1v1 + · · · + cnvn) = c1² + · · · + cn² > 0,
The proof of Theorem 5.3 is constructive, and provides a very efficient way of checking positive definiteness of quadratic forms, usually referred to as Gram-Schmidt orthogonalization in a theoretical context, and as Choleski decomposition in a numerical linear algebra context.
Proof. Let us prove by induction with respect to k = 1, 2, . . . , n that there exists a set of vectors v1, . . . , vk such that ui is a linear combination of v1, . . . , vi for every i = 1, . . . , k, and (vi, vj)σ = δij for all i, j ∈ {1, . . . , k}.
For k = 1, setting v1 = σ(u1)^{−1/2}u1 (which is possible because by assumption σ(u1) > 0) proves the base of induction. Assume now that the statement is proven for all k ≤ m. Then for k = m + 1 let v1, . . . , vm be the orthonormal vectors existing according to the inductive assumption. Define
vm+1 = σ(em+1)^{−1/2}em+1,  where em+1 = um+1 − Σ_{i=1}^m (vi, um+1)σvi.
Then
(em+1, vj)σ = (um+1, vj)σ − (vj, um+1)σ = 0
for j = 1, . . . , m. Since the vectors ui form a basis, em+1 ≠ 0, and hence σ(em+1) > 0, which makes vm+1 well defined. By construction, σ(vm+1) = 1 and (vm+1, vi)σ = 0 for i ≤ m.
The inductive proof of Theorem 5.3 translates into a recursive algorithm in numerical calculations. Since the matrix M of the linear function A : V → V, defined by Avi = ui, is upper triangular in the basis (v1, . . . , vn), the original matrix Q of σ in the basis (u1, . . . , un) is represented in the form Q = M′M. Such a representation is called a Choleski factorization, and producing M for a given symmetric Q is one of the basic algorithms of numerical linear algebra.
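A minimal Python sketch of this construction (my own illustration, assuming a concrete positive definite Q): Gram-Schmidt applied to the standard basis with the scalar product (u, v) = u′Qv produces an upper triangular M with Q = M′M.

    import numpy as np

    Q = np.array([[4.0, 2.0, 0.0],
                  [2.0, 5.0, 2.0],
                  [0.0, 2.0, 5.0]])        # symmetric positive definite
    n = len(Q)
    V = np.zeros((n, n))                   # columns: orthonormal vectors v_1, ..., v_n
    for k in range(n):
        e = np.eye(n)[:, k]
        for i in range(k):                 # subtract the (v_i, u_k) v_i components
            e = e - (V[:, i] @ Q @ e) * V[:, i]
        V[:, k] = e / np.sqrt(e @ Q @ e)   # normalization: sigma(e) > 0 by definiteness
    M = np.linalg.inv(V)                   # matrix of A v_k = u_k, upper triangular
    print(np.allclose(M.T @ M, Q),         # the Choleski factorization Q = M'M
          np.allclose(np.tril(M, -1), 0))  # M is indeed upper triangular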
Proof. To prove the implication (a) ⇒ (b), note that the matrix Q of a positive definite quadratic form has a Choleski factorization Q = M′M where M is the matrix of an invertible linear operator, and hence has a non-zero determinant. Hence det(Q) = det(M)² > 0.
For n = 1 the statement is trivial. Assume the implication (b) ⇒ (a) is valid for all n ≤ m. Then for n = m + 1 consider the subspace Vm of V spanned by the first m = n − 1 vectors from the basis b. According to the inductive assumption, the restriction σm of σ to Vm is positive definite. Let u1, . . . , um be an orthonormal basis for σm. In the basis (u1, . . . , um, vm+1) the matrix of σ has the block form
Qu = [ Im h; h′ a ],
where h is an m-by-1 real vector, and a ∈ IR is a scalar.
Since the sign of the determinant of a matrix of a quadratic form does not depend on the basis,
are non-negative (they are equal to zero), but the corresponding quadratic form is not
positive semidefinite.
Chapter 6
Linear-Quadratic Optimization
This chapter presents basic properties and applications of linear quadratic (LQ) opti-
mization: existence, uniqueness, general properties, and explicit formulae for the optimal
value and optimal vector, constructions associated with Hilbert spaces, relation to optimal
control and optimal estimation.
6.1 Motivation
Maximization or minimization of a quadratic functional over a real vector space is per-
haps the simplest meaningful optimization setup imaginable. When well-posed, it can be
reduced to solving a linear equation. Quadratic optimization has a number of appealing
theoretical properties, and is extensively used in algorithmic and theoretical subjects,
such as optimal feedback control, parameter estimation, and signal processing, convex
and non-convex optimization, Hilbert space constructions, etc.
A general linear quadratic (LQ) optimization setup calls for maximization (or minimization) of a quadratic functional Φ : V → IR on a real vector space V:
Φ(v) = 2f(v) − σ(v) → sup,  v ∈ V,   (6.1)
Another common setup which can be reduced to (6.1) in a straightforward way is the minimization
σ(v) → inf,  v ∈ v0 + U := {v0 + u : u ∈ U},   (6.3)
of a quadratic form over the affine subspace v0 + U of V, where U is a fixed linear subspace of V, and v0 ∈ V is a given vector.
Here the boundary conditions imposed on y reflect an assumption of zero initial velocity, a requirement of zero terminal velocity, and the desire to change the coordinate by a unit of length. Most importantly, the definition of Φ(·) means that the cost is proportional to the square of the instantaneous acceleration.
One possible selection of y = y(t) is
y0(t) = { 2t², t ∈ [0, 0.5];  4t − 1 − 2t², t ∈ [0.5, 1]. }   (6.5)
is zero for all functions u ∈ V satisfying (6.6). Assuming for now that y is four times continuously differentiable, integration by parts transforms this into
∫₀¹ y⁽⁴⁾(t)u(t)dt = 0,
which means that the fourth derivative y⁽⁴⁾ of y must be identically equal to zero. Hence
y∗(t) = c0 + c1t + c2t² + c3t³
is a polynomial of degree less than four. Using the boundary conditions yields c0 = c1 = 0, c2 = 3 and c3 = −2, i.e.
y∗(t) = 3t² − 2t³
is a minimizer in (6.4).
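A quick numerical comparison of the two candidates (the cost values 12 and 16 below are computed here, not quoted from the notes):

    import numpy as np

    N = 200000
    t = (np.arange(N) + 0.5) / N                 # midpoint grid on [0, 1]
    ydd_star = 6 - 12 * t                        # second derivative of 3t^2 - 2t^3
    ydd_0 = np.where(t <= 0.5, 4.0, -4.0)        # second derivative of y_0 from (6.5)
    print(np.mean(ydd_star ** 2),                # approx 12.0: cost of y_*
          np.mean(ydd_0 ** 2))                   # approx 16.0: cost of y_0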
Is this y∗ the only minimizer? Since the formula for y∗ was derived from an assumption that y∗ is four times continuously differentiable, the issue has to be addressed in one way or another. It turns out, however, that a simple abstract argument yields uniqueness of the optimum in (6.4): since the restriction of σ to the subspace U of V is positive definite, the argument of minimum must be unique. Indeed, if v1 and v2 are two such minimizers then (v1, u)σ = 0 and (v2, u)σ = 0 for all u ∈ U. In particular, with u = v1 − v2 ∈ U, this implies σ(u) = (u, u)σ = 0, i.e. u = 0.
xt+1 = Atxt + Btwt,  yt = Ctxt + Dtwt,  x0 = x̄0 + e0,   (6.8)
where wt, e0, x̄0 are random variables (wt takes values in IR^{m(t)} while x̄0 and e0 take values in IR^{n(0)}) such that, in particular, the quadratic form
f ↦ f′Q0f := E[|f′e0|²] ≥ 0
is positive semidefinite.
It is assumed that the variables yt and x̄0 represent the measured data, and the task is to design two linear estimators of xt: one uses the measurements yτ with 0 ≤ τ < t, together with x̄0, and produces a before measurement yt estimate
x̂t = Gtx̄0 + Σ_{τ=0}^{t−1} Gt,τ yτ   (6.9)
of xt; the other uses the measurements yτ with 0 ≤ τ ≤ t, together with x̄0, and produces an after measurement yt estimate
x̂t⁺ = Ltx̄0 + Σ_{τ=0}^{t} Lt,τ yτ   (6.10)
of xt. The sequences of error variances
εt(Gt) = E[|x̂t − xt|²],  εt⁺(Lt) = E[|x̂t⁺ − xt|²]
are to be minimized.
σ(ξ) = E[|ξ|²],  (ξ, η)σ = E[ξ′η].
Let Ut and Ut⁺ be the linear subspaces of V consisting of all linear combinations (6.9) and (6.10) respectively (by construction, Ut ⊂ Ut⁺). Then the task of optimizing Gt is equivalent to the linear-quadratic problem of minimizing σ on the affine subspace xt + Ut of Vt, and optimization of Lt is equivalent to the linear-quadratic minimization of σ on the affine subspace xt + Ut⁺.
The orthogonality (projection) principle can be used to derive a nice recursive expression for computing the optimal x̂t and x̂t⁺ (essentially, the famous Kalman filter).
Recall that an estimate x̂t ∈ Ut is optimal if and only if
(x̂t − xt, u)σ = E[u′(x̂t − xt)] = 0
for all u ∈ Ut. Since, in particular, every random variable of the form Θyτ, where τ < t and Θ is an n(t)-by-k(τ) matrix with real coefficients, belongs to Ut, we conclude that the matrix equality E[etyτ′] = 0 must be true for et = x̂t − xt for every τ < t. Similarly, the matrix expected value E[etx̄0′] must be zero for the optimal estimate x̂t. To summarize, a linear estimate x̂t is optimal if and only if
E[(x̂t − xt)x̄0′] = 0,  E[(x̂t − xt)yτ′] = 0  ∀ τ < t,   (6.11)
and, similarly, x̂t⁺ is optimal if and only if
E[(x̂t⁺ − xt)x̄0′] = 0,  E[(x̂t⁺ − xt)yτ′] = 0  ∀ τ ≤ t.   (6.12)
Since (6.11) and (6.12) represent conditions of optimality which are both necessary and sufficient, it is possible to use them to check correctness of optimal estimation guesses. For example, since, by assumption, E[e0x̄0′] = 0, it follows that
x̂0 = x̄0   (6.13)
and
x̂t+1 = Atx̂t⁺.   (6.14)
Indeed, since x̂t⁺ satisfies (6.12), and, due to the assumptions made about wt,
E[(xt+1 − Atx̂t⁺)yτ′] = AtE[(xt − x̂t⁺)yτ′] + BtE[wtyτ′] = 0
for τ ≤ t, as well as
E[(xt+1 − Atx̂t⁺)x̄0′] = AtE[(xt − x̂t⁺)x̄0′] + BtE[wtx̄0′] = 0,
Atx̂t⁺ satisfies the orthogonality principle, and hence is a correct expression for x̂t+1.
A similar, though slightly more involved, derivation shows that
x̂t⁺ = x̂t + Ht(yt − Ctx̂t),   (6.15)
where Ht is any matrix satisfying
Ht(DtDt′ + CtQtCt′) = QtCt′,   (6.16)
and
Qt := E[etet′] = E[(x̂t − xt)(x̂t − xt)′].
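A hedged sketch of the resulting recursion (6.13)-(6.16) for a scalar, time-invariant special case of (6.8) (all concrete numbers below are illustrative; the error-variance update is derived from et+1 = A(1 − HC)et + (AHD − B)wt, which follows from the formulas above):

    import numpy as np

    A, B, C, D = 0.9, 1.0, 1.0, 0.5        # scalar system: x' = Ax + Bw, y = Cx + Dw
    rng = np.random.default_rng(4)
    x = 1.0 + rng.standard_normal()        # x_0 = xbar_0 + e_0, with E[e_0^2] = 1
    x_hat, Q = 1.0, 1.0                    # (6.13): x_hat_0 = xbar_0; Q_0 = E[e_0^2]
    for t in range(50):
        w = rng.standard_normal()
        y = C * x + D * w
        H = Q * C / (D * D + C * Q * C)            # (6.16): H(DD' + CQC') = QC'
        x_plus = x_hat + H * (y - C * x_hat)       # (6.15): after-measurement estimate
        x = A * x + B * w                          # true state propagation
        x_hat = A * x_plus                         # (6.14): before-measurement estimate
        Q = (A * (1 - H * C)) ** 2 * Q + (A * H * D - B) ** 2   # variance of e_{t+1}
    print(abs(x - x_hat), Q ** 0.5)        # realized error vs. predicted std deviation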
There are two different statements to prove here: first, the existence of a matrix Ht
satisfying (6.16), and, second, the orthogonality relations (6.12).
Once (6.16) is established, the orthogonality follows by inspection, since, according to (6.15),
et⁺ := x̂t⁺ − xt = et + HtDtwt − HtCtet,
and hence
E[et⁺yt′] = E[(et + HtDtwt − HtCtet)(Dtwt − Ctet + Ctx̂t)′]
         = E[(et + HtDtwt − HtCtet)(wt′Dt′ − et′Ct′ + x̂t′Ct′)]
         = Ht(DtDt′ + CtQtCt′) − QtCt′
         = 0,
where the last equality is (6.16).
To establish the feasibility of (6.16), we will use the following frequently used observation.
To continue proving the feasibility of (6.16), note that, for every f in the null space of DtDt′ + CtQtCt′,
0 = f′(DtDt′ + CtQtCt′)f = |Dt′f|² + (Ct′f)′Qt(Ct′f),
and hence QtCt′f = 0. Now, since
for every v V . In particular, the supremums in (6.1) and (6.2) are equal.
Proof. When (v) = 0 and f (v) = 0, both sides of the equality are zero. When (v) = 0 but
f (v) 6= 0, both sides of the equality are +. When (v) > 0, the left side is
|f (v)|2
sup{2f (v)t (v)t2 } = at t = f (v)(v)1 ,
tIR (v)
which is the same as the value on the right side:
|f (v)|2
sup t2 |f (v)|2 = at t = s(v)1/2 .
tIR, t2 (v)1 (v)
Another simple but important observation is that (6.3) can be reduced to a special case of (6.1). Indeed, let σ : V → IR be a quadratic form on a real vector space V. Let v0 ∈ V and U ⊂ V be an element and a linear subspace of V. Then for every u ∈ U
6.2.2 Well-Posedness
Well-posedness of an LQ optimization problem is related to the following two questions.
(a) Does the functional to be maximized (minimized) have a finite upper (respectively
lower) bound?
where a ∈ IR is a real parameter, defines a quadratic functional on the real vector space C[0, 1] of all continuous functions v : [0, 1] → IR if and only if a > −1 (otherwise the integral does not converge for some v ∈ C[0, 1]). Since
2rt^a − r² ≤ t^{2a}  ∀ r ∈ IR, t ∈ (0, ∞),
where the inequality becomes equality at r = t^a, the functional has a finite upper bound over C[0, 1] if and only if a > −1/2. A maximizer v∗ ∈ C[0, 1] exists if and only if a ≥ 0, in which case it is, naturally, given by v∗(t) = t^a.
While sometimes it is difficult to judge well-posedness of an LQ optimization setup
over an infinite dimensional vector space, the following simple statements are frequently
quite useful.
First, the quadratic functional from (6.1) has a finite upper bound only when the
corresponding quadratic form is positive semidefinite.
Proof. Let v1 ∈ V be a vector such that σ(v1) < 0. If f(v1) ≠ 0 then |f(tv1)|² → +∞ for t → ∞, while σ(tv1) ≤ 0 remains smaller than 1. If f(v1) = 0, let v0 ∈ V be a vector such that f(v0) ≠ 0. Then
|f(tv0 + t²v1)|² = t²|f(v0)|² → +∞  as t → +∞,
while
σ(tv0 + t²v1) = t²σ(v0) + 2t³(v0, v1)σ + t⁴σ(v1) → −∞  as t → +∞
remains smaller than 1.
we have
Φ(v) = 2f(v) − σ(v) = 2Lx − x′Qx.
Since Φ(v) has a finite upper bound, Q is positive semidefinite. Also, for every x ∈ IR^n such that Qx = 0 the equality Lx = 0 must hold, since otherwise Φ(tx) converges to +∞ as t converges to plus or minus infinity. Hence there exists a 1-by-n matrix H such that L = HQ, which implies that
2Lx − x′Qx = 2HQx − x′Qx = −(x − H′)′Q(x − H′) + HQH′
achieves its maximum at x = H′.
Φ(v∗) − Φ(v) = σ(v − v∗) ≥ 0,
where Q = Q′ and F are given real matrices of dimensions n-by-n and n-by-1 respectively. Assuming that Q = Q′ is a positive semidefinite matrix (i.e. defines a positive semidefinite quadratic form v ↦ v′Qv), the necessary and sufficient condition that a given vector v∗ ∈ IR^n maximizes Φ(v) is given by
F′u = v∗′Qu  ∀ u ∈ IR^n,
i.e. F′ = v∗′Q, which is equivalent to Qv∗ = F. When Q = Q′ > 0 is positive definite, the condition can be further simplified to v∗ = Q^{−1}F.
An equivalent formulation of Theorem 6.2 is given by the orthogonality principle, or
projection theorem.
[Figure 6.1: the affine subspace v0 + U containing the vectors v∗ and v∗ + u; the optimal v∗ is orthogonal to every u ∈ U.]
As shown on Figure 6.1, the projection theorem can be interpreted in terms of the distance and scalar product
dist(v, u) = σ(v − u)^{1/2},  (v, u)σ = (σ(v + u) − σ(v − u))/4
in V, induced by the positive semidefinite quadratic form σ. The theorem claims that a vector v∗ ∈ v0 + U has minimal distance to the origin 0 ∈ V if and only if it is orthogonal to every element of U.
The result of Theorem 6.4 classifies the optimizing sequences {vk}_{k=1}^∞ as those for which the linear functionals u ↦ (vk, u)σ converge to the linear functional u ↦ f(u) well enough, where the required type of convergence is characterized by condition (b).
Proof. The proof follows from the identity
Proof. It is easy to see why the statement is true when V = IR^n and σ is positive definite. Indeed, then σ(v) = v′Qv for some symmetric matrix Q = Q′, and every f ∈ V♯ has the form v ↦ F′v, where F ∈ IR^n is a fixed vector. Since σ is positive definite, Qv = 0 implies v′Qv = 0 and hence v = 0, i.e. ker(Q) = {0}. Hence the equation Qv = F has a unique solution v = Q^{−1}F for every F ∈ IR^n. Since
the maximum
Φ0(f) = F′Q^{−1}F  for f(v) = F′v
is indeed a quadratic form.
Now consider the general case. To prove that V∘ is a linear subspace of V♯, note that, according to Lemma 6.2, a linear functional f ∈ V♯ belongs to V∘ if and only if there exists a constant d = df ∈ IR such that |f(v)| ≤ df for all
v ∈ Ωσ := {v ∈ V : σ(v) ≤ 1}.
Lg(f) = Φ0(g + f) − Φ0(g − f)
Chapter 7
Bounded Linear Functions on Hilbert Spaces
Fixing a positive definite quadratic form σ on a real vector space V allows one to quantify the length |v|σ of every vector v ∈ V according to |v|σ = σ(v)^{1/2}. It can also be used as a basis for measuring the operator norm ‖A‖ of a linear function A as the minimal upper bound for the ratio of the lengths |Av|σ and |v|σ. The availability of vector length and the operator norm makes it possible to consider approximation and convergence of vectors and linear functions.
A major goal of this chapter is to introduce the techniques for finding an approximation Ar of a given linear function A, where Ar is restricted to be a linear function of rank less than r, which minimizes the operator norm ‖Ar − A‖ of the error function ∆ = Ar − A. In the case when A is a finite matrix with real or complex coefficients, and the standard Euclidean length of vectors is used, the procedure requires computing the dominant (largest) eigenvalues of A′A (where the prime means Hermitian conjugation), as well as the corresponding eigenvectors. When A has infinite rank, several complications may arise. First, it is not always possible to define the conjugate A′ of a linear operator A on a vector space V of infinite dimension: this, in general, requires boundedness of A and completeness of V with respect to the distance measure d(v, u) = |v − u|σ. Second, the operator A′A will not necessarily have any eigenvalues and eigenvectors: instead, A′A has a spectrum, of which eigenvalues and eigenvectors constitute a special case. The chapter explores these issues within the context of Hilbert spaces, operator norms, and the spectrum of a self-adjoint linear operator.
7.1 Motivation
Mathematical models based on Hilbert spaces and spectral decomposition of symmetric
operators are heavily utilized in a variety of applications, ranging from quantum mechanics
and distributed models to wavelets and model reduction.
Let V denote the real vector space of all such functions (where m is not fixed and can be arbitrary). The standard Fourier series can be used to represent the elements of V as limits of finite sums of sinusoids, according to
v(t) = lim_{n→∞} vn(t),  vn(t) = a0 + Σ_{k=1}^n {ak cos(kt) + bk sin(kt)},
where
a0 = (1/2π) ∫ v(t)dt,  ak = (1/π) ∫ cos(kt)v(t)dt,  bk = (1/π) ∫ sin(kt)v(t)dt,
with all integrals taken over a period of length 2π.
[Figure: a 2π-periodic function and a partial sum of its Fourier series, plotted for t ∈ [−4, 4].]
which is valid for all sequences of real numbers ak, bk, it is possible to establish that the length of the error vector h − hn decreases not slower than a constant times 1/√n, i.e.
|h − hn|σ = O(1/√n)  as n → ∞,
the length being measured by the quadratic form
σ(v) = (1/2π) ∫ |v(t)|²dt,
which is finite for every v ∈ V.
One can try to interpret this result as evidence to the point that the functions {cos(kt)}_{k=0}^∞, {sin(kt)}_{k=1}^∞ form a basis in V, in the sense that for every element v ∈ V there exist sequences of real numbers (ak)_{k=0}^∞ and (bk)_{k=1}^∞ such that
v(t) = a0 + Σ_{k=1}^∞ {ak cos(kt) + bk sin(kt)},
i.e.
lim_{n→∞} |v − vn|σ = 0  for vn(t) = a0 + Σ_{k=1}^n {ak cos(kt) + bk sin(kt)}.
the functions {cos(kt)}_{k=0}^∞, {sin(kt)}_{k=1}^∞ form a basis, in the sense that there is a one-to-one correspondence between the elements of L2[0, 1] and pairs of sequences of real numbers with finite sums of squares, mapping every u ∈ L2[0, 1] to (ak)_{k=0}^∞, (bk)_{k=1}^∞ in such a way that
lim_{n→∞} ∫ |u(t) − a0 − Σ_{k=1}^n {ak cos(kt) + bk sin(kt)}|² dt = 0.
While working directly with the real vector space L2[0, 1] requires a solid foundation in the theory of integration, it is possible to infer many of its properties from studying the quadratic form σ on the real vector space V.
The analysis of convergence of the Fourier series can be continued by studying the linear operators En : V → V mapping every function v ∈ V to the error of its approximation by a finite sum of its complex Fourier series:
(Env)(t) = v(t) − (1/2π) Σ_{k=−n}^n ∫ e^{jk(t−τ)} v(τ)dτ.
Calculation of the operator norm ‖En‖ of En, which measures the maximal (over v ≠ 0) ratio of lengths |Env|σ/|v|σ, performed using the Parseval formula, yields a partially disappointing outcome: ‖En‖ = 1 for all n! On one hand, this means that the length of the error vector Env can be as large as the length of v ≠ 0. On the positive side, the length of Env is never larger than the length of v, which is much better than can be claimed for many other approximation methods.
for some linear functionals f1, . . . , fk : IR^n → IR; computing all values of fi(v) for a given v takes O(kn) operations, and computing the sum of fi(v)vi takes another O(km) operations.
While this is not exactly the same as the original matrix model reduction question, the following relaxed formulation is important because it allows an elegant and efficient solution: given an m-by-n real matrix M and a positive integer r, find a matrix Mr of rank less than r for which the operator norm ‖∆r‖ of the difference ∆r = Mr − M is minimal.
To state a question like this, one has to fix first a way of measuring lengths of vectors in both IR^n and IR^m. By default, the positive definite quadratic form σ on IR^k is defined according to
σ([x1; . . . ; xk]) = x1² + · · · + xk²,
in which case |x|σ becomes the standard Euclidean length |x| of x.
A solution of the matrix rank reduction problem is given in terms of the Schur (eigenvalue) decomposition of the symmetric matrix A = M′M. Essentially, the r-th largest eigenvalue of M′M is the square of the minimum of ‖Mr − M‖ over the set of matrices Mr of rank less than r. Moreover, the eigenvectors of M′M corresponding to its r largest eigenvalues can be used to calculate an optimal approximation Mr.
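In computational terms this is the singular value decomposition, whose singular values squared are the eigenvalues of M′M; a minimal numerical check (an illustration with random data) that truncation achieves error equal to the r-th largest singular value:

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.standard_normal((6, 8))
    U, s, Vt = np.linalg.svd(M)
    r = 3
    Mr = U[:, :r - 1] @ np.diag(s[:r - 1]) @ Vt[:r - 1]   # truncation of rank r - 1
    print(np.linalg.norm(Mr - M, 2), s[r - 1])            # operator norm error = s_r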
The optimal rank reduction setup can be generalized to cover the important case of finding low rank approximations to linear functions M : V → U, where V and U are real vector spaces of infinite dimensions. Such approximations are quite helpful when numerical calculations with infinite dimensional vectors are to be performed. For example, while there is no explicit formula for computing the solution u : [−π, π] → IR of the differential equation
for a given function v : [−π, π] → IR, a good low rank approximation of the linear function v ↦ u can be computed using the ideas of matrix (linear function) rank reduction.
fu(v) := (u, v)σ = (σ(u + v) − σ(u − v))/4   (7.1)
satisfies the condition
|f| := sup{|f(v)| : σ(v) ≤ 1} < ∞.   (7.2)
Example 7.1 Let V = C[0, 1] be the real vector space of continuous functions v : [0, 1] → IR. Let σ, f : V → IR be the quadratic form and the linear functional defined by
σ(v) = ∫₀¹ v(t)²dt,  f(v) = ∫₀^{0.5} v(t)dt.
Then σ is positive definite, f is linear, and condition (7.2) is satisfied with |f| = 1/√2. Nevertheless, there is no continuous function u : [0, 1] → IR such that
f(v) = (u, v)σ = ∫₀¹ u(t)v(t)dt.
One can argue that the vector space V from Example 7.1 is incomplete with respect
to the metric defined by the quadratic form, in the sense that it does not contain enough
elements to define all continuous linear functionals on itself as scalar products.
Definition 7.1 A real Hilbert space is a pair (V, σ), where V is a real vector space, and σ : V → IR is a positive definite quadratic form such that every linear functional f : V → IR satisfying (7.2) can be represented in the form (7.1).
Actually, this definition is not quite standard. The following statement establishes the
equivalence between Definition 7.1 and a more general notion of completeness.
(a) for every γ ∈ (0, ∞) and every linear function f : V → IR satisfying |f(v)| ≤ γ|v|σ for all v ∈ V there exists u ∈ V such that |u|σ ≤ γ and f(v) = (u, v)σ for all v ∈ V;
rn = sup_{k,i>n} |vk − vi|.

|u − vk|^2 = |f(u − vk) − (vk, u − vk)σ| ≤ rn |u − vk|,

hence

lim_{min{k,i}→∞} sup{ |(v, vk)σ − (v, vi)σ|^2 : v ∈ V, σ(v) ≤ 1 } = 0,
As usual, a common abuse of notation allows one not to distinguish between the Hilbert space H = (V, σ) and the corresponding real vector space V, so that v ∈ H is to be understood as v ∈ V. As a rule, the scalar product (v, u)σ in a Hilbert space H = (V, σ) is denoted without the index, as (v, u) or, equivalently, ⟨v, u⟩. In these lecture notes, the length σ(v)^{1/2} of a vector v in a Hilbert space H = (V, σ) is denoted by |v| (the alternative commonly used notation ‖v‖ will be reserved for those length-like quantities (norms) for which the function v ↦ ‖v‖^2 is not a quadratic form).
i.e.

f(v) = (u, v)  where  u = f(v1)v1 + · · · + f(vn)vn.

In particular, the real vector space V = IRⁿ is usually treated as a real Hilbert space with the quadratic form

σ([x1; . . . ; xn]) = x1^2 + · · · + xn^2.
Establishing from scratch that a specific positive definite quadratic form on an infi-
nite dimensional real vector space defines a Hilbert space is usually tricky. The following
example is one of the easiest of its kind.
Example 7.3 The infinite dimensional Hilbert space ℓ2 is defined as the pair H = (V, σ), where V is the subset of the set IR^{ZZ+} consisting of all functions v : ZZ+ → IR (essentially, sequences v(0), v(1), . . . of real numbers) which are square summable, i.e. such that

σ(v) = Σ_{t=0}^{∞} |v(t)|^2 < ∞.

Obviously v ∈ V implies cv ∈ V, and the fact that for v, u ∈ V the inequality

|v(t) + u(t)|^2 ≤ 2(|v(t)|^2 + |u(t)|^2)

proves v + u ∈ V, which establishes that V is a real vector space. Similarly, since

|v(t)u(t)| ≤ 0.5(|v(t)|^2 + |u(t)|^2),

the function

(v, u) ↦ Σ_{t=0}^{∞} v(t)u(t)

is a well defined symmetric bilinear form on V, which establishes that σ : V → IR is a positive definite quadratic form.
To verify that ℓ2 is indeed a Hilbert space, take a Cauchy sequence {vk}_{k=1}^{∞} of elements vk ∈ ℓ2. Since |vk − vi| ≥ |vk(t) − vi(t)| for every t ∈ ZZ+, the sequence {vk(t)}_{k=1}^{∞} has a limit u(t) for every t ∈ ZZ+. Since

Σ_{t=0}^{∞} |vk(t) − u(t)|^2 ≤ rn  for k > n,

it follows that u ∈ ℓ2 and |vk − u| → 0 as k → ∞.
Example 7.4 Let V = C[0, 1] be the real vector space of all continuous functions v : [0, 1] → IR. The quadratic form σ : V → IR defined by

σ(u) = ∫_0^1 u(t)^2 dt  (7.3)

does not make (V, σ) a Hilbert space. Nevertheless, V can be viewed as a subset of the real Hilbert space V′ of all linear functionals f : V → IR such that |f(v)|^2 ≤ σ(v) for all v ∈ V, where an element u ∈ V is associated with the functional

fu : v ↦ ∫_0^1 u(t)v(t) dt.  (7.4)

Essentially, V′ consists of all linear functionals (7.4) where u(·) is not restricted to V, but is allowed to be a measurable function u : [0, 1] → IR which is also square integrable, in the sense that the integral (7.3) is finite. In fact, it is proper to associate V′ with the real Hilbert space L2[0, 1] of all measurable square integrable functions u : [0, 1] → IR. However, a proper discussion of L2[0, 1] relies on the theory of Lebesgue integration, and is not given here.
Under the assumptions of Theorem 7.2, every element u ∈ V can be associated with the linear functional fu ∈ V′ defined by

fu(v) = (u, v)σ.
the sequence of real numbers {fk(v)}_{k=1}^{∞} converges to a limit g(v) ∈ IR as k → ∞. Repeating the arguments from the proof of the implication (a)⇒(b) of Theorem 7.1 shows that g ∈ V′ and σ′(fk − g) → 0 as k → ∞.

To prove (b), for a given f ∈ V′ define the sequence {vk}_{k=1}^{∞} as in the proof of the implication (b)⇒(a) of Theorem 7.1, and follow its arguments to show that σ′(f − fk) → 0, where fk(v) = (vk, v)σ.
Definition 7.2 Let V be a complex vector space. A real valued quadratic form σ : V → IR is called a Hermitian form when

Just as every quadratic form σ on a real vector space defines a unique bilinear symmetric function (·, ·)σ : V × V → IR such that σ(v) = (v, v)σ, every Hermitian form σ : V → IR on a complex vector space V defines a unique function (·, ·)σ : V × V → C with the following properties:
Example 7.6 The complex vector space V of all functions v : ZZ+ → C which are square summable, in the sense that

σ(v) = Σ_{t=0}^{∞} |v(t)|^2 < ∞,

and the Hermitian form σ : V → IR define a complex Hilbert space ℓ2(C) = (V, σ) with the Hermitian inner product

(v, u) = (v, u)σ = Σ_{t=0}^{∞} v(t)'u(t).
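For finitely supported sequences stored as column vectors, this inner product is exactly the MATLAB primed product (a small illustration with made-up data; the MATLAB prime is the conjugate transpose, matching the v(t)'u(t) convention above):

v = [1+2i; -1i; 3]; u = [2; 1i; 1-1i];   % truncated elements of l2(C)
ip = v'*u;                % sum of v(t)'*u(t): the Hermitian inner product
sigma_v = v'*v;           % sigma(v) = (v,v) is real and non-negative
disp([ip, sigma_v])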
Definition 7.3 Let U, V be Hilbert spaces (either both real, or both complex). A linear function A : U → V is called bounded when there exists a constant c ∈ IR such that

|Au| ≤ c|u|  for all u ∈ U.

The minimal c ≥ 0 satisfying this condition is called the operator norm of A, denoted by ‖A‖.
It is easy to see that the opposite is also true: a linear function A : U → V is continuous with respect to the length metrics if and only if A is bounded. For nonlinear functions, the relation fails both ways: for example, the function A1 : IR → IR defined by A1(x) = x^2 is continuous but not bounded, and the function A2 : IR → IR defined by

A2(x) = 0 for x < 1,   A2(x) = 1 for x ≥ 1,
Proof. Consider the case of real scalars (the complex case is very similar). For every v ∈ V consider the linear functional fv : U → IR defined by fv(u) = (v, Au). Since

the functional is bounded, and hence can be represented as a scalar product with an element w of U: fv(u) = (w, u). Note that w is uniquely defined by v, and |w| ≤ ‖A‖ |v|. Therefore the map B : v ↦ w is linear and bounded.
The linear function B defined in Theorem 7.3 is called the conjugate of A. The conjugation operation is frequently denoted by the star sign, as in B = A*. However, deferring to MATLAB, this text will use the prime notation B = A' for conjugation.
Example 7.7 It is easy to see that the operation of conjugation is, in general, available only on Hilbert spaces. For example, let σ : U → IR be a positive definite quadratic form on a real vector space U. Let f : U → IR be a bounded linear function. Consider the positive definite quadratic form x ↦ x^2 on V = IR. Then a conjugate of f with respect to these two quadratic forms would be a map f' : IR → U such that

which is equivalent to f(u) = (f'(1), u)σ for all u ∈ U. In other words, existence of a conjugate for every bounded linear function f : U → IR is already equivalent to U being a Hilbert space with respect to σ!
Definition 7.4 A subset X of a Hilbert space H is called closed in H if for every sequence {vk}_{k=1}^{∞} of elements vk ∈ X the condition |u − vk| → 0 as k → ∞ implies u ∈ X.
Example 7.8 Let X be the set of all sequences x ∈ ℓ2 with a finite number of non-zero elements. While X is a linear subspace of ℓ2, it is not closed. To show this, let vk ∈ X be defined by

vk(t) = 2^{−t} for t < k,   vk(t) = 0 for t ≥ k.

Let u ∈ ℓ2 be defined by u(t) = 2^{−t} for all t ∈ ZZ+. Then |u − vk| converges to zero as k → ∞, but u is not an element of X, as it has an infinite number of non-zero entries. Hence X is not closed in H.

In contrast, the set of all x ∈ H such that x(t) = 0 for all odd t ∈ ZZ+ is a closed linear subspace of H.
Theorem 7.5 Let V be a linear subspace of a Hilbert space H (real or complex). The quadratic form v ↦ |v|^2 makes V a Hilbert space if and only if V is closed in H.
Theorem 7.5 can be used to prove the existence of an orthogonal complement to every
closed linear subspace of a Hilbert space.
Theorem 7.6 Let H be a Hilbert space (real or complex) with inner product (·, ·). Let V be a closed linear subspace of H. Then

Proof. To prove (a), note that V⊥ is invariant under scaling and addition, i.e. V⊥ is a linear subspace of H. Moreover, if |ek − g| → 0 as k → ∞ and (ek, v) = 0 for all k, then

i.e. V⊥ is closed.

To prove (b), note that every vector u ∈ H defines a bounded linear function fu : V → F (where F = IR or F = C) according to fu(v) = (u, v). By Theorem 7.5, V is a Hilbert space, and hence there exists w ∈ V such that fu(v) = (w, v). Equivalently, the vector e = u − w is orthogonal to V, in the sense that (e, v) = 0 for all v ∈ V. To prove uniqueness, assume e1 + v1 = e2 + v2, where e1, e2 ∈ V⊥ and v1, v2 ∈ V. Then e1 − e2 = v2 − v1, and hence

i.e. v2 = v1 and e2 = e1.
To prove (c), note that the uniqueness of the representation u = e + v, where e ∈ V⊥, v ∈ V, implies linearity of the resulting maps u ↦ e and u ↦ v, since cu = ce + cv follows from u = e + v for every scalar c ∈ F, and u1 + u2 = (e1 + e2) + (v1 + v2) follows from u1 = e1 + v1 and u2 = e2 + v2. Also, since

for e ∈ V⊥, v ∈ V, the operator norms of the functions PV and PE are not larger than 1.
While Theorem 7.6 appears to be obvious in the finite dimensional case, where every linear subspace of a Hilbert space is automatically closed, it is important to note that orthogonal projections are not well defined for non-closed subspaces. In particular, for u ∈ ℓ2 and V ⊂ ℓ2 defined in Example 7.8 there is no way to represent u as a sum u = e + v, where v ∈ V and e ∈ V⊥, because V⊥ = {0} and u ∉ V.
that every element of H is a limit of a sequence of vectors from V. The set V from Example 7.8 is a useful example of a dense linear subspace of ℓ2.

The following statement establishes the principle of extension for a bounded linear function.
Theorem 7.7 Let G and H be two Hilbert spaces (real or complex). Let U and V be dense linear subspaces of G and H respectively. Let A0 : U → V be a linear function which is bounded, in the sense that there exists a constant γ ≥ 0 such that |A0 u| ≤ γ|u| for every u ∈ U (naturally, the length |A0 u| of A0 u ∈ V is taken in H, and the length |u| of u ∈ U is taken in G). Then there exists a unique bounded linear function A : G → H such that Au = A0 u for all u ∈ U.
(Diagram: A0 : U → V sits above its extension A : G → H, with the vertical arrows the dense embeddings U ⊂ G and V ⊂ H.)
{A0 uk}_{k=1}^{∞} is a Cauchy sequence in H. Hence A0 uk → h as k → ∞ for some h ∈ H.

Let us show that h depends only on g ∈ G (and not on the selection of a specific sequence {uk}_{k=1}^{∞} converging to g). Indeed, if, in addition, |ũk − g| → 0 for some ũk ∈ U, then

converges to zero as k → ∞. Hence |A0 uk − A0 ũk| → 0, i.e. the sequences {A0 uk}_{k=1}^{∞} and {A0 ũk}_{k=1}^{∞} must have the same limit.

The fact that h is uniquely defined by g allows one to write h = A(g). Since scaling of a convergent sequence results in the same scaling of its limit, and the limit of a sum of two convergent sequences is the sum of their limits, A : G → H is a linear function. Also, since |A0 uk| ≤ γ|uk|, it follows that |Ag| ≤ γ|g|.

Finally, for every bounded linear extension B of A0 we have |Bg − Buk| → 0 whenever |uk − g| → 0. Since uk ∈ U, this yields |Bg − Buk| = |Bg − A0 uk| → 0, i.e. Bg = Ag.
The operation of continuous extension has a number of useful properties, listed in the
following statement.
|(A0 + C0)uk − (A + C)g| = |A(uk − g) + C(uk − g)| ≤ (‖A‖ + ‖C‖)|uk − g|.

To prove (d), note first that substituting u = D0 v into (v, A0 u) = (D0 v, u) yields

i.e. |D0 v| ≤ ‖A0‖|v|, which proves boundedness of D0. Let D : V → U be the continuous extension of D0. Since for |uk − g| → 0, uk ∈ U, and |vk − h| → 0, vk ∈ V, we have

(h, Ag) − (Dh, g) = lim_{k→∞} {(vk, Auk) − (Dvk, uk)} = lim_{k→∞} {(vk, A0 uk) − (D0 vk, uk)} = 0,

D = A'.
The convergence, however, is not of the ordinary point-wise type: referring to the value of f(t) for a specific t ∈ IR could be meaningless for a continuum of instances of t. This subsection explains how the summation in (7.5) can first be defined on a set of nice sequences {fk}_{k∈ZZ}, and then extended to cover the general case of sequences {fk}_{k∈ZZ} satisfying (7.6).

For completeness, the following result from the classical theory of continuous functions will be useful.
Theorem 7.9 If a continuous function f : [−π, π] → C is such that

∫_{−π}^{π} e^{jkt} f(t) dt = 0

Since f is a continuous function, r(δ) converges to zero as δ → 0, and hence the upper bound can be made arbitrarily close to zero by selecting large n and small δ. Therefore f(τ) = 0.
Let U be the complex vector space of all functions u : ZZ → C such that u(k) = 0 for all but a finite set of k ∈ ZZ. Let V be the set of all continuous functions v : [−π, π] → C such that only a finite number of the Fourier series coefficient integrals

vk = (1/2π) ∫_{−π}^{π} e^{−jkt} v(t) dt,  k ∈ ZZ  (7.7)

are not equal to zero. Let A0 : U → V be the Fourier series sum function mapping u ∈ U to v = A0(u) according to

v(t) = Σ_{k∈ZZ} e^{jkt} u(k).

Let

(u1, u2)σ = Σ_{k∈ZZ} u1(k)'u2(k),   (v1, v2)σ = (1/2π) ∫_{−π}^{π} v1(t)'v2(t) dt
Let G and H be the complex Hilbert spaces obtained by applying the process of completion, as described in Theorem 7.2, to U and V respectively. According to Theorem 7.8, the identities in (7.8) imply that the linear functions A0, B0 have uniquely defined continuous extensions A : G → H and B : H → G such that

B = A',  A'A = BA = I_G,  AA' = AB = I_H.

The linear function A extends the Fourier series sum operation to the elements of G, which can be easily associated with the Hilbert space of all functions f : ZZ → C satisfying the inequality (7.6).

In contrast, for a fixed t ∈ [−π, π] the linear function ft : U → C defined by

ft(u) = Σ_{k∈ZZ} e^{jkt} u(k)

is not bounded with respect to the Hermitian form σ, and hence cannot be extended to G.
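The isometry between G and H can be spot-checked numerically for a finitely supported u; the sketch below (test data arbitrary) compares the sum of |u(k)|^2 with the integral form of the norm of v = A0 u, using trapezoidal quadrature:

k = -2:2; u = [1; -0.5i; 2; 0.3; 1i];    % u(k) nonzero only for k = -2..2
t = linspace(-pi, pi, 10001).';
v = exp(1i*t*k)*u;                        % v(t) = sum_k e^{jkt} u(k)
lhs = trapz(t, abs(v).^2)/(2*pi);         % (1/(2*pi)) * int |v(t)|^2 dt
rhs = sum(abs(u).^2);                     % sum_k |u(k)|^2
disp([lhs, rhs])                          % agree up to quadrature error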
(b) if λ− > −∞, λ+ < ∞, and a sequence {vk}_{k=1}^{∞} of vectors vk ∈ V is such that σ(vk) = 1 and ψ(vk) converges to λ+ as k → ∞, then

To prove (a), note that, by assumption, φ(·) with w = v achieves its maximum at t = 0, and hence

0 = φ'(0) = 2 [(u, w)ψ σ(w) − ψ(w)(u, w)σ] / σ(w)^2 = 2 [(u, v)ψ − λ(u, v)σ] / σ(v).

To prove (b), note that, by assumption, the maximum of φ(t) − φ(0) converges to zero as k → ∞. Since the maximum is not smaller than

|(u, vk)ψ − λ+(u, vk)σ|^2 / (C σ(u)σ(w)),  where C = C(λ+, λ−),

the conclusion follows.
Example 7.9 Let us try to find the maximal value of x1x2 + x2x3 where x1, x2, x3 are real numbers such that x1^2 + x2^2 + x3^2 = 1. The setup corresponds to having V = IR^3 and σ : V → IR, ψ : V → IR defined by

σ([x1; x2; x3]) = x1^2 + x2^2 + x3^2,   ψ([x1; x2; x3]) = x1x2 + x2x3.

According to Theorem 7.10 (a), a set of values x1, x2, x3 is optimal only if the identity

λ(u1x1 + u2x2 + u3x3) = 0.5(u1x2 + u2x1 + u2x3 + u3x2)

holds for all u1, u2, u3 ∈ IR.
The conclusion, however, relies on the premise that ψ(v) does achieve its maximal value subject to σ(v) = 1. In this specific case, due to the fact that the dimension of V = IR^3 is finite, the assumption is indeed true. A proper way of establishing attainability of the maximum is by referring to compactness of the set

S = {v ∈ IR^3 : σ(v) = 1}

and continuity of the function ψ: compactness means that out of every sequence {vk}_{k=1}^{∞} of vk ∈ S such that ψ(vk) → λ, one can extract a subsequence wn = v_{k(n)} converging to a limit u ∈ S, which in turn implies that

The compactness and continuity arguments, which become less trivial, though still very useful, in the infinite dimensional case, will be studied in later chapters. Perhaps an easier way to see that the computed value λ = 1/√2 is indeed the maximum of ψ on S is by checking that the matrix of the quadratic form

(1/√2 + ε)σ(v) − ψ(v)

is positive semidefinite for every ε ≥ 0.
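The finite dimensional claims of this example are easy to confirm numerically; the matrix below is the representation of ψ worked out at the end of this section:

A = [0 0.5 0; 0.5 0 0.5; 0 0.5 0];   % v'*A*v = x1*x2 + x2*x3
[Q,D] = eig(A);
[lmax,i] = max(diag(D));
disp(lmax)                            % 1/sqrt(2) = 0.7071...
disp(Q(:,i))                          % proportional to [1/2; sqrt(2)/2; 1/2]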
Example 7.10 Let V = ℓ2 be the real Hilbert space of all functions v : ZZ+ → IR such that

|v|^2 = Σ_{k=0}^{∞} v(k)^2 < ∞.

Let σ(v) = |v|^2 be the quadratic form defining V as a Hilbert space. Let

ψ(v) = v(0)^2/2 + Σ_{k=1}^{∞} v(k)v(k+1) = v(0)^2/2 + v(1)v(2) + v(2)v(3) + . . . .

Since

v(k)v(k+1) ≤ 0.5{v(k)^2 + v(k+1)^2},

it follows that ψ(v)/σ(v) < 1 for all v ∈ ℓ2, v ≠ 0. Since, for wr ∈ ℓ2 defined by wr(k) = r^k, where r ∈ (0, 1) is a parameter,

σ(wr) = 1/(1 − r^2),   ψ(wr) = 1/2 + r^3/(1 − r^2),   lim_{r→1} ψ(wr)/σ(wr) = 1,

one can conclude that the minimal upper bound of ψ(v)/σ(v) equals 1.
According to Theorem 7.10 (a), the minimal upper bound λ of ψ(v)/σ(v), where v ≠ 0, can only be achieved at an element v ∈ ℓ2 satisfying the identity

λ Σ_{k=0}^{∞} u(k)v(k) = 0.5 [ u(0)v(0) + Σ_{k=1}^{∞} {u(k)v(k+1) + u(k+1)v(k)} ]

for all u ∈ ℓ2.
a(z) = z^2 − 2λz + 1.

Substituting (7.12) into the second equation in (7.11) yields c1 = c2 = c. Since z1z2 = 1, this implies that v(k) does not converge to zero as k → ∞, unless v(k) = 0 for all k > 0. Therefore the optimality condition from Theorem 7.10 (a) can only be satisfied for λ = 0.5, which is in obvious disagreement with the minimal upper bound value computed earlier at λ = 1!
The example demonstrates the dangers of relying on eigenvalues in finding maximal ratios of quadratic forms on infinite dimensional vector spaces. It also presents a setup in which the maximal value of such a ratio is not achieved. The absence of an optimizer takes place despite the fact that V is a Hilbert space defined by the quadratic form σ. This is in sharp contrast with the special case when ψ is defined by ψ(v) = |f(v)|^2, where f : V → IR is a linear function. According to the properties of linear quadratic optimization discussed in the previous chapter, the minimal upper bound of |f(v)|^2/σ(v) is achievable whenever it is finite and the pair (V, σ) defines a Hilbert space.
When (V, σ) is a Hilbert space, and A is bounded, (7.13) means that A = A', which explains the terminology.

When V is finite dimensional, the matrix of a self-adjoint operator A with respect to any orthonormal basis is symmetric (in the real case) or Hermitian (in the complex case).

We are used to representing finite dimensional quadratic forms by symmetric matrices. The following statement provides a coordinate free generalization of such representation (which applies in the infinite dimensional case as well), associating bounded quadratic forms ψ = ψ(v) on a Hilbert space H with self-adjoint operators A : H → H according to

(a) ψ is bounded, in the sense that there exists a constant γ > 0 such that |ψ(v)| ≤ γ|v|^2 for all v ∈ H;
By assumption, the quadratic form is positive semidefinite, and can be bounded by 2γ|v|^2. Hence the corresponding symmetric bilinear form satisfies

is bounded, and hence there exists a unique w = A(u) ∈ H such that ψ(u, v) = (w, v) for all v ∈ H. Since the constraints defining w are linear with respect to u and w, the function A : H → H is linear. Substituting v = Au into the identity

yields

|Au|^2 = ψ(u, Au) ≤ γ|u| |Au|,

which proves that A is bounded, and ‖A‖ ≤ γ. Since the bilinear form ψ(u, v) is symmetric, A is self-adjoint. Finally, substituting v = u into (7.15) yields ψ(v) = (v, Av).
Theorem 7.11 allows one to interpret the statements of Theorem 7.10 in terms of self-adjoint operators. Indeed, consider the setup of Theorem 7.10 where (V, σ) is a Hilbert space (i.e. σ(v) = |v|^2 on V) and ψ(v) = (v, Av) = v'Av for some bounded self-adjoint linear operator A : V → V. Then statement (a) of the theorem means that the minimal upper bound λ of the ratio ψ(v)/|v|^2 can only be achieved on eigenvectors of A which correspond to the eigenvalue λ. In turn, statement (b) claims that |Avk − λvk| → 0 as k → ∞ for every sequence {vk}_{k=1}^{∞} of vectors vk ∈ V such that |vk| = 1 and ψ(vk) → λ.
In Example 7.9, the self-adjoint operator A : IR^3 → IR^3 such that v'Av = ψ(v) is defined by

A [x1; x2; x3] = [0.5x2; 0.5(x1 + x3); 0.5x2],

and its largest eigenvalue λ = 1/√2 is the maximum of ψ(v)/|v|^2, achieved at the corresponding eigenvector

v = [1/2; √2/2; 1/2].
In Example 7.10, the self-adjoint operator A : ℓ2 → ℓ2 such that v'Av = ψ(v) is defined by

(Av)(k) = 0.5v(0) for k = 0,   (Av)(k) = 0.5v(2) for k = 1,   (Av)(k) = 0.5v(k−1) + 0.5v(k+1) for k > 1.

Operator A has a single eigenvalue λ = 0.5, which has nothing to do with the supremum (or, for that matter, infimum) of ψ(v)/|v|^2. The actual minimal upper bound of ψ(v)/|v|^2 equals 1. It is not achievable, but ψ(vk)/|vk|^2 → 1 for a sequence of vectors {vk}_{k=1}^{∞}, vk ∈ ℓ2 such that |vk| = 1 if and only if |Avk − vk| → 0.
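The gap between the eigenvalue 0.5 and the supremum 1 is easy to observe on finite truncations of A (a sketch under the block structure written out above; the truncation size is arbitrary): the largest eigenvalue of the N-by-N leading block creeps up towards 1 as N grows, but 1 is never attained.

N = 500;
A = zeros(N); A(1,1) = 0.5;          % index 1 corresponds to k = 0
for k = 2:N-1                        % couplings v(k)v(k+1) for k >= 1
    A(k,k+1) = 0.5; A(k+1,k) = 0.5;
end
disp(max(eig(A)))                    % just below 1 for large N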
has no eigenvalues, but its spectrum is the whole interval [−1, 1].

The spectrum of a bounded self-adjoint operator A : H → H on a Hilbert space H is closely related to the extrema of the functional φ(v) = v'Av/|v|^2, defined for all non-zero elements of H. According to Theorem 7.10, the minimal upper bound λ of φ belongs to the spectrum of A. Actually, since the condition |Avk − rvk| → 0 together with |vk| = 1 implies φ(vk) → r, the supremum of φ is the maximal element of σ(A). In general, eigenvectors w of A define extrema of the functional φ, in the sense that Aw = λw for w ≠ 0 implies

lim_{t→0} [φ(w + tu) − φ(w)]/t = 0  for all u ∈ H.
The following theorem provides justification for an important classification of spectrum: essentially it claims that a spectrum point is either an eigenvalue of finite multiplicity, isolated from the rest of the spectrum, or is a point of the essential spectrum, which is a generalization of the notion of eigenvalues of infinite multiplicity.

(b) m(r) = dim(Vr) < ∞, where Vr = ker(rI − A), and A has block representation

A = [ rI_{m(r)}, 0; 0, Ã ],  r ∉ σ(Ã),  (7.17)

dimension, its essential spectrum is empty. When dim(H) = ∞, σ_ess(A) is a non-empty closed subset of σ(A).
Proof. The implication (b)⇒(a) is easy, as one can define u1, . . . , un ∈ H as a basis in Vr. Then, using the block representations of matrices and vectors associated with the direct sum decomposition H = Vr ⊕ Vr⊥, we conclude that, for v = (0, ṽ) with ṽ ∈ Vr⊥,

|Av − rv| = |rṽ − Ãṽ|,

which means that |vk − vi| → 0 as k, i → ∞, i.e. {vk}_{k=1}^{∞} is a Cauchy sequence, and, as such, has a limit u ∈ V. Since Ar is bounded, Ar u = w, which proves that Ar V is a closed subspace of U.
Now let the quadratic form σ̃ : W → IR be defined by

By construction ker(Ar) = {0}, and hence σ̃ is positive definite. Since dim(W) < ∞, this implies the existence of ε > 0 such that σ̃(w) ≥ ε^2|w|^2. Hence, for w ∈ W and v ∈ V,

|Ar w + Ar v| ≥ ε|Ar w|/‖Ar‖ ≥ ε(|Ar v| − |Ar w + Ar v|)/‖Ar‖ ≥ ε(|v| − |Ar w + Ar v|)/‖Ar‖,

|Ar u| ≥ δ|u|  for all u ∈ U.
Example 7.11 Let V = C[0, 1] be the real vector space of all continuous functions v : [0, 1] → IR, equipped with the positive definite quadratic form

σ(v)  def=  ∫_0^1 |v(t)|^2 dt.

As was pointed out in earlier examples, (V, σ) is not a Hilbert space. Let H be a completion of V, i.e. a real Hilbert space which contains V as a dense subset, and such that σ(v) = |v|^2 for all v ∈ V. (It is known that H can be interpreted as the space L2[0, 1] of all square integrable measurable functions, but this fact will not be used here.)
Let A0 : V → V be the double integration operator mapping u ∈ V to y = A0(u), defined as the (unique) solution of the differential equation

ÿ = −u,  ẏ(0) = 0,  y(1) = 0.  (7.18)

Note that A0 is self-adjoint with respect to σ, since for every pair (u1, y1), (u2, y2) of solutions of (7.18), integration by parts yields

(y1, u2)σ = ∫_0^1 y1(t)u2(t) dt = ∫_0^1 ẏ1(t)ẏ2(t) dt = ∫_0^1 u1(t)y2(t) dt = (u1, y2)σ.
it follows that A0 is a bounded linear operator, and |A0 v| ≤ |v| for all v ∈ V. This makes it possible to consider the uniquely defined extension A : H → H of A0: a bounded linear operator such that Av = A0 v for all v ∈ V. Since A0 is self-adjoint, it follows that A = A'.

What is the spectrum of A? A point r ∈ IR belongs to σ(A) if and only if |ruk − Auk| converges to zero for some sequence of vectors uk ∈ H such that |uk| = 1. Since V is dense in H, one can assume that uk ∈ V without loss of generality, which makes the following calculations more straightforward.
For r ≠ 0 let yk = A0 uk be the corresponding solutions of (7.18). By assumption, |uk − r^{−1}yk| → 0 as k → ∞, i.e.

ÿk + r^{−1}yk = ek,  ẏk(0) = 0,  yk(1) = 0,  lim_{k→∞} ∫_0^1 |ek(t)|^2 dt = 0.  (7.19)
Since, for a given ek(·), a solution of (7.19) can be written explicitly as a convolution integral, we have |yk − y| → 0 and |uk − r^{−1}y| → 0 as k → ∞, where y = y(t) satisfies

the only non-zero points of the spectrum of A are the eigenvalues λ1, λ2, . . . (each of multiplicity one). A set of corresponding normalized eigenvectors {xk}_{k=1}^{∞} is given by

xk(t) = √2 cos((2k − 1)πt/2).
The point r = 0, as a limit point of the spectrum, is automatically in σ_ess(A). It is not very important, but instructive, to see how it is verified that r = 0 is not an eigenvalue of A.

To prove that r = 0 is not an eigenvalue of A0, note that y = A0 u = 0 means that u = −ÿ = 0. However, the elements of H are mystery vectors for us, and it is not right to jump into representing the relation between u ∈ H and y = Au as ÿ = −u.

To prove that r = 0 is not an eigenvalue of A, assume to the contrary that v ∈ H is a non-zero vector such that Av = 0. Then

which means that not every vector in V can be approximated arbitrarily well by the elements of A0 V in the metric of H. Since A0 V contains every two times continuously differentiable function y such that y(1) = ẏ(0) = 0, the opposite is true: A0 V is dense in H. The contradiction proves that ker(A) = {0}.
(c) the sum of multiplicities of those eigenvalues of L'L which are larger than σ^2 is smaller than k.

σ_{k−1}(L) > σ_k(L) = σ_{k+1}(L) = · · · = σ_{k+m−1}(L) > σ_{k+m}(L)

will also be orthonormal. They are called the left singular vectors of L.
When G = IRⁿ or G = Cⁿ, the linear function L can be represented in the form

L = Σ_k σ_k u_k v_k',  i.e.  Lx = Σ_k σ_k (v_k, x) u_k,

which in MATLAB is computed by

[U,S,V] = svd(L);

(u_k is the k-th column of U, v_k is the k-th column of V, σ_k is the k-th diagonal element of S).
Let L : G → H be a bounded linear function mapping one Hilbert space to another. Since linear functions of small rank (recall that the rank of a linear function is the dimension of its range) are usually much easier to deal with in practical calculations, it is frequently desirable to approximate L by a linear function Lr : G → H which has rank smaller than a given positive integer r. It is natural to measure the quality of such approximation in terms of the operator norm ‖L − Lr‖ of the error function.

The following theorem, which is the ultimate destination of this chapter, solves the bounded rank approximation problem in terms of singular values and singular vectors of L.
Theorem 7.13 Let L : G → H be a bounded linear function mapping one Hilbert space to another. Let

σ1 ≥ σ2 ≥ · · · ≥ σ_{r−1} ≥ σ_r ≥ 0

be the first r largest singular values of L. Then

(b) if σ_{r−1} > σ_r and v1, . . . , v_{r−1} are the corresponding orthonormal right singular vectors of L, then ‖L − Lr‖ = σ_r for

Lr = Σ_{k=1}^{r−1} (Lv_k) v_k'.
Example 7.12 Let V = C[0, 1] and its completion H be the real vector space and the Hilbert space considered in Example 7.11.

Let L0 : V → V be the linear integration operator mapping u ∈ V to v = L0(u), defined as the (unique) solution of the differential equation

v̇ = u,  v(0) = 0,  i.e.  v(t) = ∫_0^t u(τ) dτ.

Since |v(t)| ≤ |u| for all t ∈ [0, 1], it follows that |L0 u| ≤ |u| for all u ∈ V, i.e. L0 is bounded with respect to the metric of H. Therefore L0 has a unique bounded extension L : H → H.

Since

∫_0^1 y(t)u(t) dt = ∫_0^1 y(t)v̇(t) dt = y(1)v(1) − y(0)v(0) − ∫_0^1 ẏ(t)v(t) dt

for every continuously differentiable function y : [0, 1] → IR, it is natural to consider the linear operator M0 : V → V mapping w ∈ V to y ∈ V defined by

ẏ = −w,  y(1) = 0,  i.e.  y(t) = ∫_t^1 w(τ) dτ.

By the usual bounding argument, M0 is linear and bounded, and hence has a unique linear bounded extension M : H → H. Since, by construction, (w, L0 u)σ = (M0 w, u)σ for all w, u ∈ V, it follows that M = L'.
Since A = L'L for the self-adjoint operator A : H → H considered in Example 7.11, we can conclude that the singular values of L are given by

σ_k(L) = 2/((2k − 1)π),  (k = 1, 2, . . . ),

with

v_k(t) = √2 cos((2k − 1)πt/2)

being the corresponding right singular vectors.
The calculations allow one to draw conclusions about the possibility of approximating L by linear functions of finite rank. In particular, σ1(L) = 2/π is the operator norm ‖L‖ of L, i.e. the best error of approximating L by a linear function of rank zero (the only linear function of rank zero is zero). More interestingly, σ2(L) = 2/(3π) is the minimal possible error of approximating L by a rank one linear function L1. One optimal approximation of rank one is given by L1 = (Lv1)v1', i.e. it maps u ∈ V to w = L1 u defined by

w(t) = (4/π) sin(πt/2) ∫_0^1 cos(πτ/2) u(τ) dτ.
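The singular values computed in this example can be spot-checked by discretizing L0; the cumulative-sum matrix below is a crude quadrature, so the agreement is only approximate:

N = 400; h = 1/N;
L = h*tril(ones(N));              % (L*u)(t) approximates the integral from 0 to t
s = svd(L);
k = (1:4)';
disp([s(1:4), 2./((2*k-1)*pi)])   % computed vs predicted 2/((2k-1)*pi)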
Chapter 8
Convexity
i.e. if the segment connecting every two points on the graph of f lies above the graph,
and quasi-convex if all of its level sets
are convex.
Convexity is of paramount importance in studying optimization and game theory. It
is also a powerful tool of feasibility analysis for systems of equations and inequalities.
This can also be used to prove convexity of sets, as level sets of convex functions are convex. Finally, it is possible to derive convexity of sets and functions with complex definitions by combining information about convexity of simpler objects.
It is easy to see that every half-space is a convex set, and every affine function is convex
(on every convex subset of V ). It is also not difficult to establish that the intersection of
a family of open half-spaces is a convex set, and the minimal upper bound of a family of
linear functions is a convex function on every set over which it is finite, as stated by the
following theorem.
Theorem 8.1 Let K be a (non-empty) set of affine functionals on a real vector space V. Then

(a) the set

Ω_K^0 = {v ∈ V : h(v) < 0 for all h ∈ K}

is convex;

(b) the function

φ(v)  def=  sup_{h∈K} h(v)

is convex on every set over which it is finite.

In other words, a set defined by affine inequalities is convex, and the supremum of a family of affine functionals is a convex function.
Proof. To prove (a), let v1, v2 ∈ Ω_K^0 and t ∈ [0, 1]. Since h(v1) < 0 and h(v2) < 0 for all h ∈ K, we conclude that

h(tv1 + (1 − t)v2) = th(v1) + (1 − t)h(v2) < 0

for all h ∈ K, t ∈ [0, 1], which means tv1 + (1 − t)v2 ∈ Ω_K^0.
To prove (b), note that the supremum of a sum is not larger than the sum of the corresponding suprema, and hence
Example 8.1 Let V be the set of all Hermitian n-by-n matrices. Let Ω ⊂ V be the subset of V consisting of all positive semidefinite matrices. Is Ω a convex set?

Note that answering this question using the non-negative eigenvalues definition of positive semidefiniteness would be difficult, if not impossible. Luckily, there is another definition: a matrix M ∈ V is positive semidefinite if and only if x'Mx ≥ 0 for all x ∈ Cⁿ, x ≠ 0, or, equivalently,

h(M)  def=  −r − x'Mx < 0  for all x ∈ Cⁿ, x ≠ 0, r ∈ IR, r > 0.

Since the function h : M ↦ −r − x'Mx is affine for all x and r, the set Ω is convex.
Example 8.2 Let V be the real vector space of all Hermitian n-by-n matrices. Let Ω ⊂ V be the subset of V consisting of all (strictly) positive definite matrices. Let φ : Ω → IR be the function mapping M to the trace of M^{−1}. Is φ a convex function?

Once again, trying to rely on an explicit formula expressing the trace of M^{−1} in terms of the entries of M is likely to lead one nowhere.

To show that φ is a convex function, note that the trace of an n-by-n matrix can be defined by

trace(M) = e1'Me1 + · · · + en'Men = Σ_{k=1}^{n} ek'Mek,

where {ek}_{k=1}^{n} is the standard basis in Cⁿ. On the other hand, as follows readily from the properties of linear quadratic optimization, the identity

v'M^{−1}v = max_{u∈Cⁿ} {2Re(v'u) − u'Mu}
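The resulting convexity of φ is easy to spot-check numerically with a midpoint test (random positive definite test data; an illustration rather than a proof):

n = 5;
R1 = randn(n); M1 = R1*R1' + eye(n);      % two random positive definite matrices
R2 = randn(n); M2 = R2*R2' + eye(n);
f = @(M) trace(inv(M));
disp([f((M1+M2)/2), (f(M1)+f(M2))/2])     % the first value never exceeds the second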
where λ_k(Z), for a Hermitian n-by-n matrix Z and k ∈ {1, . . . , n}, denotes the k-th largest eigenvalue of Z. (Indeed, in an appropriate orthonormal basis, the matrix of X is a diagonal one, with the numbers X_kk = λ_k(X) on the diagonal. On the other hand, since 0 ≤ Y ≤ I, all diagonal elements Y_kk of Y are from the interval [0, 1], and, since Y has r positive eigenvalues,

Σ_{k=1}^{n} Y_kk = Σ_{k=1}^{n} λ_k(Y) ≤ r.

Hence

trace(XY) = Σ_{k=1}^{n} λ_k(X)Y_kk ≤ Σ_{k=1}^{r} λ_k(X),

where the maximum is achieved when Y11 = · · · = Y_rr = 1, Y_kk = 0 for k > r.)

Therefore the sum of the r largest eigenvalues is a convex function on the real vector space of all Hermitian matrices of fixed dimensions.
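The variational description is easy to verify numerically: with Y the orthogonal projector onto the eigenvectors of the r largest eigenvalues of X, trace(XY) reproduces their sum (random symmetric test data):

n = 6; r = 2;
X = randn(n); X = (X+X')/2;             % random symmetric matrix
[Q,D] = eig(X);
[e,idx] = sort(diag(D),'descend');
Y = Q(:,idx(1:r))*Q(:,idx(1:r))';       % 0 <= Y <= I with trace(Y) = r
disp([trace(X*Y), sum(e(1:r))])         % the two values coincide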
Example 8.4 Let V be the real vector space of all polynomial functions p : IRⁿ → IR of n real variables. Let φ : V → IR be the function mapping p to the maximum

φ(p) = max_t {p(t) e^{−|t|^2}}.

While the function φ(p) can be difficult to evaluate in terms of the coefficients of p, and finding an analytical expression for φ(p) also seems impossible, it is very easy to establish convexity of φ: since, for every fixed t ∈ IRⁿ, the function p ↦ p(t) exp(−|t|^2) is affine, φ is a maximum of a family of affine functions, and hence is convex.
f+(t)  def=  lim_{δ→0, δ>0} [f(t + δ) − f(t)]/δ

exists and is finite. Similarly, f is left differentiable if for every t ∈ (0, 1) the limit

f−(t)  def=  lim_{δ→0, δ<0} [f(t + δ) − f(t)]/δ
Theorem 8.2 For every function f : (0, 1) → IR the following conditions are equivalent:

(a) f is convex;

(b) f is continuous, right differentiable, and its right derivative f+ is monotonically non-decreasing.
Proof. To prove the implication (a)⇒(b), assume that f is convex. Then for every set of values a, b, c ∈ (0, 1) such that a < b < c we have

f(b) = f( ((b − a)/(c − a))c + ((c − b)/(c − a))a ) ≤ ((b − a)/(c − a))f(c) + ((c − b)/(c − a))f(a),

which is equivalent to

(f(c) − f(b))/(c − b) ≥ (f(c) − f(a))/(c − a) ≥ (f(b) − f(a))/(b − a).

In particular, for every t ∈ (0, 1) and 0 < δ1 < δ2 < 1 − t we have

which means that the limit in the definition of f+(t) is that of a monotonically non-increasing function with a finite lower bound, and hence exists. Also, since

inf_{t∈(a,b)} f+(t) ≤ (f(b) − f(a))/(b − a) ≤ sup_{t∈(a,b)} f+(t)
are satisfied (it is a very useful and not completely trivial exercise to see why this is true). Hence the monotonicity of f+ guarantees that, for 0 < v < u < 1,

are satisfied.
Let Ω be a subset of a real vector space V. A function φ : Ω → IR is convex if and only if its restriction to every segment in Ω is convex, i.e. if the function f_{u,v} : [0, 1] → IR defined by

f_{u,v}(t) = φ(tu + (1 − t)v)

is convex for every pair u, v ∈ Ω. In particular, it is sufficient to know that f = f_{u,v} has a non-negative second derivative at t = 0 for every pair u, v ∈ Ω to conclude that φ is convex.
When Ω is a subset of IRⁿ, and φ : IRⁿ → IR is two times continuously differentiable on an open subset of IRⁿ containing Ω, the second derivative of f_{u,v} is given by

(d²f_{u,v}/dt²)(0) = (u − v)'W(v)(u − v),

where W(v) is the Hessian of φ: the matrix of its partial second derivatives (which must be symmetric due to the continuity of the second derivatives of φ). Therefore, positive semidefiniteness of a continuous Hessian is a guarantee of convexity. In many examples (usually when Ω is a subset of a real vector space with no convenient representation as IRⁿ), instead of assembling the Hessian, it is easier to simply show that for all u, v ∈ Ω there exists a real number a such that the limit

lim_{t→0, t>0} [φ(tu + (1 − t)v) − φ(v) − at]/t²

exists and is non-negative.
Example 8.5 The function φ : [0, π] → IR defined by φ(v) = −sin(v) is convex. Indeed, φ''(v) = sin(v) is non-negative on [0, π]. As a byproduct, a representation of φ as a maximum of affine functions is given by
Example 8.6 Let Ω be the positive quadrant in IR², i.e. the set of vectors [x; y] ∈ IR² with positive components x > 0, y > 0. Obviously Ω is convex. Let the function φ : Ω → IR be defined by φ(x, y) = 1/(xy). By the Hessian criterion described above, the function φ is convex, because its Hessian

W(x, y) = [ d²φ/dx², d²φ/dxdy ; d²φ/dydx, d²φ/dy² ] = [ 2/(x^3 y), 1/(x^2 y^2) ; 1/(x^2 y^2), 2/(x y^3) ]

is positive definite on Ω. Moreover, the identity

1/(xy) = max_{x1>0, y1>0} { 1/(x1 y1) − (x − x1)/(x1^2 y1) − (y − y1)/(x1 y1^2) }

holds for all x, y > 0.
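Both the Hessian and the identity are easy to check at a sample point (the numbers below are arbitrary):

x = 2; y = 3;                           % any point in the positive quadrant
W = [2/(x^3*y),   1/(x^2*y^2);
     1/(x^2*y^2), 2/(x*y^3)];
disp(eig(W))                            % both eigenvalues are positive
x1 = 5; y1 = 0.7;                       % any tangent point gives a lower bound
disp([1/(x*y), 1/(x1*y1)-(x-x1)/(x1^2*y1)-(y-y1)/(x1*y1^2)])  % first >= second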
Example 8.7 Let V be the set of all real symmetric n-by-n matrices. Let Ω ⊂ V be the subset of all positive semidefinite matrices in V. Then for every positive integer n the function X ↦ trace(X^n) is convex on Ω. To prove this, for every X0 ∈ Ω and X1 ∈ V let f(t) = trace((X0 + tX1)^n). Since

trace((X0 + tX1)^n) = Σ_ν t^{s(ν)} trace( X_{ν(1)} X_{ν(2)} · · · X_{ν(n)} ),

where the sum is taken over the set H of all functions ν : {1, . . . , n} → {0, 1}, and the function s : H → {0, 1, . . . , n} maps ν ∈ H to the sum of its values, to prove convexity of f it is sufficient to show that the coefficient at t² in the expansion is non-negative. Indeed, since trace(AB) = trace(BA), the coefficient at t² is a sum of traces of matrices of the form X0^a X1 X0^b X1, where a, b are non-negative integers such that a + b = n − 2. Since X0 is a positive semidefinite symmetric matrix, it can be represented in the form X0 = Y², where Y = Y' ≥ 0. Then

trace(X0^a X1 X0^b X1) = trace((Y^b X1 Y^a)'(Y^b X1 Y^a)) ≥ 0.
Example 8.8 Let V be the set of all Hermitian n-by-n matrices. Let Ω ⊂ V be the subset of all positive definite matrices in V. Then the function φ : Ω → IR defined by

is convex. To prove this, it is sufficient to show that for arbitrary X ∈ Ω and ∆ ∈ V the function
L^{−1}(Ω) = {u ∈ U : L(u) ∈ Ω}

is a convex set.

is a convex set.

Ω0 = {v : f(v) < 0}

is a convex set.
and the corresponding functions φk : Ωk → IR defined by φk(x) = −sin(Lk x) are convex. The intersection Ω = [0, π]ⁿ ⊂ IRⁿ of all the sets Wk is convex as well (this is obvious, as Ω is a hypercube, but it also follows from Theorem 8.3 (c)). Hence for every ε > 0 the function φε : Ω → IR defined by

φε(x)  def=  −1 + ε Σ_{k=1}^{n} φk(x)

Xε = {x : φε(x) < 0}

are convex as well (Theorem 8.3 (d)). Finally, the set in question is convex as the intersection of all the sets Xε with ε > 0.
(a) φ is convex;

The implication (a)⇒(b) is trivial, as a level set of every convex function is convex. The other direction provides a useful way of proving convexity of homogeneous functions by checking convexity of a single level set.

Proof. Assume that assumption (b) holds. For u, v ∈ V and ε > 0 we have

φ( u/(ε + φ(u)) ) ≤ 1,   φ( v/(ε + φ(v)) ) ≤ 1.
Is the function φ : IRⁿ → [0, ∞) defined by φ(x) = (|x1|^a + · · · + |xn|^a)^{1/a} convex for all a ≥ 1? First, the function t ↦ |t|^a is convex on IR (verifiable by differentiation). Hence the function x ↦ φ(x)^a is convex. Hence the level set

Ω = {x ∈ IRⁿ : φ(x)^a ≤ 1}

is convex. Since Ω is also the level set of the homogeneous function φ : IRⁿ → [0, ∞), φ is convex as well.
The definition of an interior point given in Definition 8.1 is weaker than many alternatives (typically based on metrics and topology). For example, according to the definition, v = 0 is an interior point of the set

Ω = { [x1; x2] ∈ IR² : x2 ≤ 0 or x2 ≥ x1^2 },
Then there exists a linear function f ∈ V♯, f ≠ 0, such that f(w) ≤ f(v) for all w ∈ Ω.

Note how the inequality f(w) ≤ f(v), where f is a non-zero linear function, is used to describe mathematically a hyperplane separating v from the interior of Ω.

The proof and applications of the Hahn-Banach Theorem will be discussed in detail in the next chapter.
v = c1v1 + · · · + cmvm,  ck ∈ IR,  ck ≥ 0,  c1 + · · · + cm = 1.

For example, the segment [v, w] connecting two vectors v, w ∈ V is the set of all convex combinations of v and w.
In general, convex combinations provide a useful way of defining convex sets: as stated by the following result, the set of all convex combinations of elements from a given set is always convex.

Theorem 8.6 Let Ω0 be a subset of a real vector space V. Then the set Ω = co(Ω0) of all convex combinations of finite groups of elements from Ω0 is convex.

Proof. If

v = Σ_{k=1}^{m} a_k v_k,  u = Σ_{k=1}^{n} b_k u_k,  Σ_{k=1}^{m} a_k = Σ_{k=1}^{n} b_k = 1,

then, for t ∈ [0, 1], tv + (1 − t)u is a convex combination of v1, . . . , vm, u1, . . . , un, since ta_k ≥ 0, (1 − t)b_k ≥ 0, and

Σ_{k=1}^{m} ta_k + Σ_{k=1}^{n} (1 − t)b_k = 1.
Caratheodory's fundamental theorem states that, in a real vector space of finite dimension n, every convex combination of m > n + 1 vectors is a convex combination of a subset of n + 1 of those vectors.
Theorem 8.7
Proof. To prove (a), let d be the minimal number of elements e1, . . . , ed of the set {w1, . . . , wm} needed to represent w as a linear combination with non-negative coefficients. Let

w = a1e1 + · · · + ad ed,  ad > 0.  (8.2)

Let t0 be the smallest of the ratios a_k/c_k taken over k with c_k ≠ 0. By construction, all coefficients of the linear combination in (8.3) are non-negative, and at least one of them is zero, which contradicts the assumption of minimality of d.

To prove (b), let U = V × IR, a real vector space of dimension q = n + 1, elements of which are pairs (v, y), with v ∈ V, y ∈ IR, with point-wise addition and scaling operations. Define w, w1, . . . , wm by w = (v, 1), wk = (vk, 1). Application of statement (a) shows that there exists a subset u1, . . . , u_{n+1} of the set {v1, . . . , vm} such that
Example 8.11 One important use of Theorem 8.7 is to describe extremal probability distributions in optimization problems in which decision parameters are random variables.

Let v be a random variable which takes values in the interval [−π, π], has zero mean and unit variance. What is the maximal possible expected value of sin(v)? Formally speaking, the problem calls for maximizing the integral

E[sin(v)] = ∫_{−π}^{π} sin(t) dV(t),

where V : [−π, π] → [0, 1] is the monotonic function such that V(−π) = 0 and V(π) = 1, to be optimized.

The optimization can be simplified greatly by realizing that the set Ω of all possible values of the vector

[ E[v]; E[v²]; E[sin(v)] ] ∈ IR³

is the set of all convex combinations of vectors from the set

Ω0 = { [t; t²; sin(t)] : t ∈ [−π, π] }.
According to Theorem 8.7, every element of Ω is a convex combination of just four vectors from Ω0. Therefore, the problem can be reduced to maximizing

Σ_{k=1}^{4} c_k sin(t_k)

subject to

Σ_{k=1}^{4} c_k t_k = 0,  Σ_{k=1}^{4} c_k t_k² = 1,  Σ_{k=1}^{4} c_k = 1,  c_k ≥ 0,  t_k ∈ [−π, π],

with respect to the eight real variables c_k, t_k: still not easy, but manageable.
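A hedged numerical sketch of this reduced problem, assuming the Optimization Toolbox's fmincon (the starting point z0 is an arbitrary guess; the decision vector stacks c1..c4 on top of t1..t4):

obj = @(z) -(z(1:4)'*sin(z(5:8)));              % maximize sum c_k*sin(t_k)
Aeq = [ones(1,4) zeros(1,4)]; beq = 1;          % sum c_k = 1
lb = [zeros(4,1); -pi*ones(4,1)];               % c_k >= 0, t_k >= -pi
ub = [ones(4,1);   pi*ones(4,1)];               % c_k <= 1, t_k <= pi
moments = @(z) deal([], [z(1:4)'*z(5:8); z(1:4)'*(z(5:8).^2)-1]);  % mean 0, variance 1
z0 = [0.25*ones(4,1); [-1.5; -0.5; 0.5; 1.5]];
z = fmincon(obj, z0, [], [], Aeq, beq, lb, ub, moments);
disp(-obj(z))                                   % candidate maximal value of E[sin(v)]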
Definition 8.2 Let V be a real vector space. A subset Ω ⊂ V is called (weakly) bounded if for every u, v ∈ V, u ≠ 0, the set

E_{v,u}(Ω)  def=  {t ∈ IR : v + tu ∈ Ω}

is bounded. Similarly, the subset Ω is called (weakly) closed when E_{v,u}(Ω) is closed for all v, u ∈ V.

For convex subsets of a finite dimensional vector space, the notions of weak boundedness and closedness are the same as the usual ones.
Note that the assumptions of closedness and boundedness are really needed in statement (b) of Theorem 8.8. For example, every finite number of the closed convex subsets Ωk = [k, ∞) of IR, where k = 1, 2, . . . , have a common point, but the intersection of all the sets Ωk is empty. Similarly, every finite number of the bounded convex subsets Ωk = (0, 1/k) of IR, where k = 1, 2, . . . , have a common point, but the intersection of all the sets Ωk is empty.
Proof. To prove (a), it is sufficient to show that if every r > n sets of a family of r + 1 sets Ω1, . . . , Ω_{r+1} in an n-dimensional vector space V have a common point, then all r + 1 sets have a common point. To do this, for every k ∈ {1, . . . , r + 1} let v_k be a common point of all the sets Ω1, . . . , Ω_{r+1} except (possibly) Ω_k. Consider the vectors w_k = (v_k, 1) in V × IR. Since the dimension of V × IR is not larger than r, there exist real numbers c1, . . . , c_{r+1}, not all of which are equal to zero, such that

c1w1 + · · · + c_{r+1}w_{r+1} = 0.

The claim is that the vector

v = a1v1 + · · · + a_{r+1}v_{r+1},  where a_k = |c_k| / (|c1| + · · · + |c_{r+1}|),

belongs to the set Ω_k for every k = 1, . . . , r + 1.

To prove this, since v_k ∈ Ω_i whenever k ≠ i, it is sufficient to show that v is a convex combination of the vectors v1, . . . , v_{r+1} excluding v_k, for every k. Since, by construction, c1 + · · · + c_{r+1} = 0, so that there are both positive and negative elements among the c_k, we have

Σ_{c_k<0} a_k = Σ_{c_k>0} a_k = 1/2,   v = Σ_{c_k>0} (2a_k)v_k = Σ_{c_k<0} (2a_k)v_k,
The proof of the following statement (sometimes called the S-procedure losslessness theorem, and a very useful result in its own right) demonstrates a typical application of Helly's theorem.
Theorem 8.9 If σ, φ : V → IR are two quadratic forms on a real vector space V such that

is closed and bounded. We need to prove that the sets have a common point. According to Theorem 8.8, it is sufficient to show that every two of such sets have a non-empty intersection. Since a pair of sets Ωv, Ωw depends only on the values of σ and φ at three vectors u, v, w, it is sufficient to prove Theorem 8.9 for the linear span of u, v, w, i.e. in the case dim(V) = 3.

Since the application of Helly's theorem is done at this point in the proof, the rest is left as an exercise.
|v| ≤ R  for all v ∈ Ω,  (8.4)

and a feasibility oracle, which is a function h : IRⁿ → IRⁿ × IR taking vectors u ∈ IRⁿ as inputs, and producing pairs (w, c) ∈ IRⁿ × IR as outputs, in accordance with the following rule:

(a) if u ∈ Ω then w'u < c;

operations, whenever ε > 0 is such that the set Ω contains a ball of radius ε.
In abstract terms, the algorithm can be described as follows.

(a) Initialize u = 0 ∈ IRⁿ, H = R·I_n, go to (b).

(b) Apply the oracle to u to produce (w, c) = h(u). If w'u < c then go to (c). Otherwise update H and u according to H := H_e and u := u_e, where the non-singular n-by-n real matrix H_e and u_e ∈ IRⁿ are such that
Theorem 8.10 For a real non-singular n-by-n matrix H and a vector u ∈ IRⁿ let E denote the ellipsoid

E = {v ∈ IRⁿ : |H^{−1}(v − u)| < 1}.

For (w, c) ∈ IRⁿ × IR, where w ≠ 0, let X denote the half-space consisting of those v ∈ IRⁿ for which w'v < c. Then

(a)  r = (w'u − c)/|H'w| ∈ (0, 1);
(b) assuming condition (a) is satisfied, the ellipsoid defined by (8.5) with

u_e = u + ((1 + r)|e|τ/2) MM'w,   H_e = ((1 + 0.5pτ(1 − r))/√(1 + pτ)) M,

where

pτ = 2(1 + rn)/((1 − r)(n − 1)),   e = H'w,   τ = pτ/|e|²,   θ = (1 − 1/√(1 + pτ))/|e|²,   M = H − θHee',
Here is a sample MATLAB code for finding the minimal volume ellipsoid:
function [Hn,x0n]=ellips_cut(H,x0,L,r)
% function [Hn,x0n]=ellips_cut(H,x0,L,r)
%
% finds a minimum-volume ellipsoid {x: |inv(Hn)(x-x0n)|<1} containing the
% intersection of the ellipsoid {x: |inv(H)(x-x0)|<1} with the half-space
% {x: L(x-x0)>r*|LH|}, where L is a 1-by-n row vector and 0<r<1
n=length(x0);
LH=L*H;                                   % 1-by-n row vector
q=norm(LH);
p=q^2;
ptau=2*(1+r*n)/((1-r)*(n-1));
tau=ptau/p;
th=(1-1/sqrt(1+ptau))/p;
H1=H-(th*(H*LH'))*LH;                     % rank-one correction of H
x0n=x0+0.5*(1+r)*q*tau*(H1*(L*H1)');      % shift the center into the cut
Hn=H1*((1+0.5*ptau*(1-r))/sqrt(1+ptau));  % rescale to cover the intersection
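A minimal usage sketch (the data below are illustrative): one cut applied to the unit disk, with the volume ratio confirming that the new ellipsoid is strictly smaller:

H = eye(2); x0 = [0; 0];            % initial ellipsoid: the unit disk
L = [1 0]; r = 0.2;                 % cut along the first coordinate
[Hn, x0n] = ellips_cut(H, x0, L, r);
disp(x0n')                          % the new center moves into the cut
disp(abs(det(Hn))/abs(det(H)))      % volume ratio, strictly less than 1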
It can be shown that the volume of the ellipsoid E decreases at least by a factor of 1 − 0.5/n at each repetition of step (b), which proves the claimed convergence properties of the algorithm. It must be noted that, despite having remarkable provable convergence properties, most practical implementations of the ellipsoid algorithm turn out to be inferior to the alternatives, such as the interior point method.
Chapter 9
The Hahn-Banach Theorem
The Hahn-Banach Theorem, which states that a convex set with non-empty interior can be separated by a hyperplane from every point which is not an interior point of the set, is a very useful tool for proving results in the fields of linear algebra, functional analysis, and optimization. This chapter provides a proof of the theorem, as well as some counterexamples to its generalizations, and explores its applications in matrix algebra and optimization.
Definition 9.1 Let Ω be a convex subset of a real vector space V such that 0 is an interior point of Ω, in the sense that for every u ∈ V there exists r > 0 such that ru ∈ Ω. The Minkowski functional of Ω is the function p : V → [0, ∞) mapping every u ∈ V to

Ω̄ = {v ∈ V : p(v) ≤ 1}
Proof. If p(v) ≥ 1 then p(v + tv) = (1 + t)p(v) ≥ 1 + t > 1 (and hence v + tv ∉ Ω) for all t > 0, which means that v is not an interior point of Ω.

If p(v) < 1 then for every u ∈ V

p(v + tu) = (1 + t) p( (1/(1 + t))v + (t/(1 + t))u ) ≤ p(v) + tp(u) < 1
for every t ∈ IR, which means that p(u) ≥ f0(u) for all u ∈ U, where U is the one-dimensional linear subspace of V consisting of all vectors tu0 with t ∈ IR, and f0 : U → IR is the linear function defined by f0(tu0) = t. According to Theorem 9.2, there exists a linear function f : V → IR such that f(u0) = f0(u0) = 1, and p(v) ≥ f(v) for all v ∈ V. Hence f(v) ≤ p(v) ≤ 1 for all v ∈ Ω − w0, and also f(u0) = 1. Equivalently,

f(w) ≤ 1 + f(w0) = f(v0 − w0) + f(w0) = f(v0)  for all w ∈ Ω,

which means that f(v) = f(v0) defines the hyperplane separating Ω and v0.
Theorem 9.3 If a convex subset Ω of a real vector space V of finite dimension dim(V) = n < ∞ has no interior point, then there exists a linear function f : V → IR, not identically equal to zero, such that f(w1) = f(w2) for all w1, w2 ∈ Ω.
Proof. If Ω is empty, the conclusion holds automatically. Assuming that Ω ≠ ∅, fix an element w0 ∈ Ω, and let v1, . . . , vm be a linearly independent subset of Ω − w0 with the maximal number of elements (by construction m ≤ n).

Consider first the case when m = n. Let wk = vk + w0 for k = 1, . . . , n. Then every u ∈ V can be represented in the form

u = c1v1 + · · · + cnvn = −(c1 + · · · + cn)w0 + c1w1 + · · · + cnwn,

for

v = (1/(n + 1)) Σ_{k=0}^{n} wk

when t > 0 is small enough to satisfy the conditions

1/(n + 1) − t Σ_{k=1}^{n} ck ≥ 0,   1/(n + 1) + tck ≥ 0  (k = 1, . . . , n),
Theorem 9.3 can be used to strengthen the statement of the separation principle in the finite dimensional case.

Theorem 9.4 If Ω is a non-empty convex subset of a finite dimensional real vector space V and u ∈ V is not an element of Ω, then there exists a linear function f : V → IR such that f(w) ≤ f(u) for all w ∈ Ω, and f(w) ≠ f(u) for some w ∈ Ω.

Proof. Let U be the linear subspace of V containing all linear combinations of vectors w − u, where w ∈ Ω. Since 0 ∉ Ω − u, there exists a non-zero linear function g : U → IR such that g(w − u) ≤ 0 for all w ∈ Ω. By construction, g(w) ≤ g(u) for all w ∈ Ω, and g(w) ≠ g(u) for at least one w ∈ Ω. Let v1, . . . , vm be a basis in U. Extend it to a basis v1, . . . , vm, . . . , vn in V, and define f : V → IR according to

f(c1v1 + · · · + cnvn) = g(c1v1 + · · · + cmvm).

By construction, f is the desired linear functional.
which means that f (ek ) = 0 for all k, i.e. f (p) = 0 for all p.
is not completely satisfactory, though, as it does not describe well the robustness of the property being checked. For example, the linear equations in (9.4) frequently approximate a nonlinear system

ẋ(t) = Ax(t) + φ(x),  (9.5)

where φ : IRⁿ → IRⁿ is a continuous function bounded by

and ε > 0 is known to be small. Since there is no explicit formula for solving (9.5), the eigenvalue argument turns out to be insufficient.
An argument for robustness of the no return property can be given by obtaining a quadratic Lyapunov function V(x) = x'Px for system (9.4), which is strictly decreasing along all non-zero solutions of (9.4). Indeed, in this case the inequality V(x(T)) < V(x(0)) is guaranteed for x(t) ≢ 0, which makes the equality x(T) = x(0) impossible. The differentiation

clearly shows that V(x) = x'Px has the desired properties if and only if the n-by-n symmetric matrix P = P' satisfies the strict Lyapunov inequality

PA + A'P > 0.  (9.7)
Theorem 9.5 Let A be an n-by-n real matrix. Then the following conditions are equivalent:

(A) Let V be the real vector space of all real symmetric n-by-n matrices. Let Ω be the convex subset of V consisting of all matrices of the form PA + A'P − Q, where Q = Q' > 0 and P = P'.

(D) For every linear function f : V → IR there exists H ∈ V such that f(X) = trace(XH) for all X ∈ V.

(E) Let f : V → IR, f(X) = trace(XH), where H = H', be a non-zero linear function separating Ω from zero (it exists due to Theorem 8.5). Then

for all P = P' and for all Q = Q' > 0. Setting P = 0 yields trace(QH) ≥ 0 for every Q = Q' > 0, which means H ≥ 0. Setting P = tP0 for an arbitrary P0 ∈ V and letting t converge to ±∞ yields AH + HA' = 0. Our objective is to show that these conditions contradict the assumptions.
Consider A and H as linear operators on Cⁿ. Let U = R(H) ⊂ Cⁿ be the range of H. Since H ≠ 0, we have U ≠ {0}. Since AHv = −H(A'v) for every v ∈ Cⁿ, U is A-invariant. Let Hv be a non-zero eigenvector of the restriction of A to U, i.e. AHv = sHv, Hv ≠ 0. Then

0 = v'(AH + HA')v = v'(AHv) + (AHv)'v = 2Re(s)·v'Hv,

which, due to v'Hv ≠ 0, means that s is a purely imaginary eigenvalue of A. The contradiction proves Theorem 9.5.
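A hedged numerical companion to Theorem 9.5, assuming the Control System Toolbox's lyap (which solves A*X + X*A' + Q = 0): when A has no eigenvalues on the imaginary axis, a symmetric P satisfying (9.7) can be produced by solving a linear equation.

A = [-1 2; 0 -3];          % eigenvalues -1 and -3: none purely imaginary
P = lyap(A', -eye(2));     % solves A'*P + P*A = I
disp(eig(P*A + A'*P))      % both eigenvalues equal 1, so P*A + A'*P > 0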
as the task of finding the minimal upper bound p* of Cx subject to the inequality Ax ≤ B (understood component-wise), where x ranges over IRⁿ.

It is easy to see that every 1-by-m matrix p such that p ≥ 0 (component-wise) and pA = C provides an upper bound pB for p*, because Cx = pAx ≤ pB whenever Ax ≤ B. This leads to another optimization problem, called the dual of (9.8):
Example 9.3 If

A = [0; −1],  B = [−1; 0],  C = 1,

then there is no x ∈ IR such that Ax ≤ B, and there is no p ∈ IR^{1×2} such that pA = C and p ≥ 0.
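For a feasible instance the absence of a duality gap (established by the theorem below) is easy to observe numerically, e.g. with box constraints; the sketch assumes the Optimization Toolbox's linprog, which minimizes, so the primal objective is negated:

n = 3; C = [1 -2 3];                          % row vector, as in the text
A = [eye(n); -eye(n)]; B = ones(2*n,1);       % feasible region: |x_k| <= 1
x = linprog(-C', A, B);                       % primal: maximize C*x s.t. A*x <= B
p = linprog(B, [], [], A', C', zeros(2*n,1)); % dual: min p'*B, p >= 0, p'*A = C
disp([C*x, B'*p])                             % the two optimal values coincide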
The following statement, conveniently proven using the Hahn-Banach Theorem, establishes that there is no duality gap when the inequality Ax ≤ B has a solution.

Theorem 9.6 Let A, B, C be real matrices of dimensions m-by-n, m-by-1, and 1-by-n respectively, such that Ax0 ≤ B for some x0 ∈ IRⁿ. Then
Proof. To emphasize the essentials, the proof is given under a slightly stronger assumption that Ax0 < B for some x0 ∈ IRⁿ. Assume that y ∈ IR is such that y ≥ p*.
(A) Let

(D) For every linear function f : V → IR there exist q ∈ IR and h ∈ IR^{1×m} such that f(u, v) = qu + hv for all (u, v) ∈ V.
However, there are simple examples showing that the existence of Lagrange multipliers satisfying L'_p(x0) = 0 is not a necessary condition of optimality. For instance, with

the point x0 = 0 is the argument of the constrained minimum, but L'_p(0) = 1 for L_p(x) = x + px², for every p ∈ IR.

The following statement, proven using the separation principle, provides valid necessary conditions of optimality using Lagrange multipliers.
(a) there exist real scalars p_k ≥ 0 such that p_k g_k(x0) = 0 and L'_p(x0) = 0 for the function L_p : X → IR defined in (9.12);

where k(0) = 0.
(B) By assumption, 0 ∈ IR^{r+1} is not an element of Ω. Indeed, otherwise there exists v ∈ IRⁿ such that g'_{k(i)}(x0)v < 0 for all i = 0, 1, . . . , r, which means that, for a sufficiently small t > 0, all numbers

(E) If the functional in (9.15) is not identically equal to zero, and separates Ω and 0, in the sense that f(w) ≥ 0 for all w ∈ Ω, then

Σ_{k=0}^{m} d_k [g'_k(x0)v + ε_k] ≥ 0  for all ε_k > 0, v ∈ IRⁿ,

When d0 > 0, dividing the equality by d0 yields conclusion (a) with p_k = d_k/d0. When d0 = 0, dividing the equality by the sum d of the d_i yields (9.14) with p_k = d_k/d.
Chapter 10
Norms and Convergence
This chapter introduces major constructions associated with using limits of sequences of vectors. It covers strong convergence, defined in terms of norms, and weak convergence, defined in terms of duality (linear functionals), and explores applications of the standard notions of continuity, compactness, completeness, and separability. In contrast, the general issues of vector space topology (i.e. the abstract guidelines for defining and generalizing convergence while working with infinite dimensional vector spaces) are not discussed here.
10.1 Motivation
Properties of vector spaces associated with convergence are used extensively when dealing with existence of solutions of systems of linear and nonlinear equations, as well as optimization problems with an infinite number of variables. Convergence properties are also used as benchmarks for establishing and comparing performance of approximation and model reduction algorithms.
solution. For example, when x_k(0) = 0 for all k, one solution of (10.1) is obviously given by x_k(t) ≡ 0. However, for every function φ : IR → IR which is infinitely many times differentiable, and satisfies the condition φ^{(k)}(0) = 0 for all k, setting x_k = φ^{(k)} (the k-th derivative of φ) yields a valid set of continuously differentiable functions x_k satisfying the equations in (10.1). Since such a function can easily be not identically equal to zero, as in

φ(t) = exp(−1/t²) for t ≠ 0,   φ(t) = 0 for t = 0,

v̇(t) = lim_{δ→0, t+δ∈[0,T]} [v(t + δ) − v(t)]/δ.  (10.2)
Since V = ℓ2, as a Hilbert space, has the length |v| = (v, v)^{1/2} of its elements readily defined, the limit relation in (10.2) can be understood as the definition of v̇(t) as the element of V satisfying

lim_{δ→0, t+δ∈[0,T]} | [v(t + δ) − v(t)]/δ − v̇(t) | = 0.

It can be shown that, with this definition of the derivative, the differential equation v̇ = Av has a solution v : IR → ℓ2 satisfying the initial condition v(0) = v0 for every given v0 ∈ ℓ2.
The availability of existence and uniqueness of solution statements in this example is very much due to a proper selection of a real vector space and a convergence metric to represent sequences of real numbers. It is also possible to make a selection which, while reasonable in some other applications, leads to a lot of inconvenience in this case. For example, defining V as the real vector space of all real sequences {v(t)}_{t=0}^{∞}, with convergence defined component-wise (i.e. v_k → u in V iff v_k(t) → u(t) as k → ∞ for every fixed t ∈ ZZ+), brings one back to the original interpretation of (10.1), when each differential equation in (10.1) is considered separately, and uniqueness of solutions with fixed initial conditions does not take place.
In applications, Lemma 10.1 is used together with a (less trivial) classical statement
characterizing the compact subsets of IRn .
Theorem 10.1 A subset X ⊂ IRⁿ is compact if and only if it is closed (i.e. x_k → x and x_k ∈ X implies x ∈ X) and bounded (i.e. there exists r > 0 such that |x| ≤ r for all x ∈ X).
The proof of the following statement is a typical application of Lemma 10.1 and
Theorem 10.1.
Lemma 10.2 Let $a_1, \dots, a_m$ be a set of vectors in $\mathbb{R}^n$ such that every vector in $\mathbb{R}^n$ can be represented as a linear combination of the vectors $a_k$. Then there exists $\epsilon > 0$ such that
$$\sigma(x) \stackrel{\rm def}{=} \sum_{k=1}^{m} |a_k' x|^4 \ge \epsilon |x|^4 \qquad \forall\, x \in \mathbb{R}^n. \qquad (10.4)$$
Proof. Let $X$ be the unit sphere in $\mathbb{R}^n$, i.e. the set of all $x \in \mathbb{R}^n$ such that $|x| = 1$. Since $X$ is bounded and closed, it is compact. Since the function $\sigma$ is continuous on $X$, it achieves its minimum at a point $\bar x \in X$. Since $\bar x$ is a linear combination of the $a_k$, at least one real number $a_k' \bar x$ is not equal to zero, and hence $\sigma(\bar x) > 0$. Since both sides of (10.4) scale the same way when $x$ is replaced by $cx$, where $c \in \mathbb{R}$, the conclusion of the lemma follows with $\epsilon = \sigma(\bar x)$.
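The compactness argument can be checked numerically. The following sketch (added here for illustration; the vectors $a_k$ and the sampling scheme are arbitrary choices, not taken from the text) estimates $\epsilon$ by densely sampling the unit sphere:

```python
# Numerical sanity check of Lemma 10.2 (illustrative sketch):
# estimate eps = min of sigma(x) = sum_k |a_k' x|^4 over the unit sphere.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5
A = rng.standard_normal((m, n))   # rows a_k'; they span R^3 almost surely

X = rng.standard_normal((200000, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # points on the unit sphere
sigma_vals = np.sum(np.abs(X @ A.T) ** 4, axis=1)
print("estimated eps =", sigma_vals.min())      # strictly positive
```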
Examination of the proof of Lemma 10.1 shows that its statement has little to do with the fact that $X$ is a subset of $\mathbb{R}^n$: the only important thing is that a notion of convergence to a limit should be defined for sequences of elements of $X$, in such a way that, with respect to this definition of convergence, $X$ is sequentially compact, and the function being minimized is lower semi-continuous on $X$. It turns out that, in contrast with the finite dimensional real vector space $\mathbb{R}^n$, where all reasonable definitions of convergence turn out to be equivalent, infinite dimensional vector spaces allow substantially different definitions of convergence. Which definition of convergence to use depends on the specifics of a particular application: giving a definition in which more sequences have a limit leads to more sets being compact, at the expense of allowing fewer lower semi-continuous functions.
$$\Delta_v \stackrel{\rm def}{=} \{t \in \mathbb{R} : tv \in \Omega\} = [-r_v, r_v], \quad \text{where } r_v \in (0, \infty),$$
for every $v \in V$). This offers a convenient way of verifying that a specific function $p : V \to [0, \infty)$ is a norm.

According to a common convention, the values of norms are denoted using the double vertical bar signs, as in $p(v) = \|v\|$. This could be inconvenient when several norms on the same real vector space are used simultaneously. To avoid ambiguity in the notation, an index can be used to indicate a specific norm, as in $p(v) = \|v\|_p$ or $p(v) = \|v\|_V$.
where $p \in [1, \infty)$ is a parameter. The main step in proving that $\|\cdot\|_p$ is indeed a norm relies on convexity of the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(y) = |y|^p$ (hence $x \mapsto \|x\|_p^p$ is convex, hence the level set $\{x : \|x\|_p \le 1\}$ is convex, hence $\|\cdot\|_p$ is convex, as a homogeneous function). Symmetry and positive definiteness of $\|\cdot\|_p$ are obvious. Note that $x \mapsto \|x\|_p$ is not convex (and hence is not a norm) when $p < 1$.
The limit
$$\left\| \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \right\|_{\infty} \stackrel{\rm def}{=} \lim_{p \to +\infty} \left\| \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \right\|_{p} = \max_{k} |x_k|$$
is also a norm on $\mathbb{R}^n$. The norm $\|\cdot\|_{\infty}$ is the Minkowski functional of the hypercube
$$\left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n : \max_k |x_k| \le 1 \right\}.$$
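This limit is easy to observe numerically; the following added sketch (the test vector is an arbitrary choice) prints the $p$-norms of a fixed vector for growing $p$:

```python
# k.k_p approaches the max-norm as p grows (illustrative sketch).
import numpy as np

x = np.array([1.0, -3.0, 2.0])
for p in [1, 2, 4, 16, 64, 256]:
    print(p, np.sum(np.abs(x) ** p) ** (1.0 / p))
print("max-norm:", np.max(np.abs(x)))   # the limit, here 3.0
```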
Example 10.3 The idea of the $\|\cdot\|_p$-norms can be extended to sets of functions. For example, on the real vector space $V = C[0,1]$ of all continuous functions $v : [0,1] \to \mathbb{R}$ one can define the norms
$$\|v\|_p \stackrel{\rm def}{=} \left( \int_0^1 |v(t)|^p \, dt \right)^{1/p}$$
for all $p \in [1, \infty)$, with
$$\|v\|_{\infty} \stackrel{\rm def}{=} \max_{t \in [0,1]} |v(t)|$$
being the limit case. There is no a-priori preference in choosing which norm to use. The $\|v\|_{\infty}$ (L-Infinity) norm represents the peak value of $|v|$, which is suitable for worst case analysis problems. The $\|v\|_2$ (L2) norm, being generated by a quadratic form, is typically by far the easiest to work with. Using the $\|v\|_1$ (L1) norm as a measure of error in approximating a complex function by a large number of simpler ones is commonly believed to aid in reducing the number of non-zero coefficients in the resulting approximation.
Example 10.4 Let $U$ and $V$ be real vector spaces equipped with norms $\|\cdot\|_U$ and $\|\cdot\|_V$ respectively. A linear function $A : U \to V$ is called bounded (with respect to these norms) if there exists a constant $c > 0$ such that $\|Au\|_V \le c\|u\|_U$ for all $u \in U$. The maximal lower bound (infimum) of such constants $c$ is called the operator norm of $A$ (induced by $\|\cdot\|_U$ and $\|\cdot\|_V$). It is easy to check that the operator norm is indeed a norm on the real vector space $W = L(U,V)$ of all bounded linear functions $A : U \to V$. In most cases, operator norms cannot be defined by quadratic forms.
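In the finite dimensional case the induced norms can be computed explicitly; the following added sketch (the test matrix is arbitrary) uses the standard matrix-analysis formulas for the norms induced by $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_{\infty}$:

```python
# Induced operator norms of a matrix A (illustrative sketch).
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  0.5]])
norm_1   = np.abs(A).sum(axis=0).max()            # induced by k.k_1: max column sum
norm_inf = np.abs(A).sum(axis=1).max()            # induced by k.k_inf: max row sum
norm_2   = np.linalg.svd(A, compute_uv=False)[0]  # induced by k.k_2: top singular value
print(norm_1, norm_inf, norm_2)
```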
Example 10.5 Consider the sequence of elements $e_k \in \ell^2$, where $\ell^2$ is the standard real Hilbert space of all functions $v : \mathbb{Z}_+ \to \mathbb{R}$ such that
$$|v|^2 \stackrel{\rm def}{=} \sum_{t=0}^{\infty} |v(t)|^2 < \infty,$$
defined by
$$e_k(t) = \begin{cases} 1, & t = k, \\ 0, & t \neq k. \end{cases}$$
While the sequence is bounded, in the sense that $|e_k| = 1$ for all $k$, it has no (strongly) convergent subsequence, since $|e_i - e_k| = \sqrt{2}$ whenever $i \neq k$.
Definition 10.3 Let $F \subset V^{\sharp}$ be a total set of linear functionals on a real vector space $V$. A sequence $\{v_k\}_{k=1}^{\infty}$ of $v_k \in V$ is said to converge (weakly, with respect to $F$) to a limit $w \in V$ (notation $v_k \stackrel{F}{\to} w$) if and only if $f(v_k) \to f(w)$ as $k \to \infty$ for every $f \in F$.
Indeed, the expansion in (10.5) converges for every pair $v, w \in V$ because the function $g : \mathbb{R} \to \mathbb{R}$ defined by
$$g(y) = \frac{|y|}{1 + |y|}$$
takes values in $[0,1]$ only. Since $g(y) = 0$ implies $y = 0$, and $F$ is total, $\rho(v, w) = 0$ if and only if $v = w$. Since $g(y_1 + y_2) \le g(y_1) + g(y_2)$ for all real $y_1, y_2$, $\rho$ is a metric on $V$. Finally, by construction, $\rho(v_k, w) \to 0$ as $k \to \infty$ if and only if $f_n(v_k - w) \to 0$ for all $n$.
The most commonly used examples of duality-based convergence are associated with normed vector spaces. Let $V$ be a real vector space with norm $\|\cdot\| : V \to [0,\infty)$. Let $V^*$ be the real vector space of all bounded linear functionals $f : V \to \mathbb{R}$ (when $V$ is a Hilbert space defined by the quadratic form $\sigma(v) = \|v\|^2$, we know that $V^*$ is, essentially, the same thing as $V$). The weak convergence in $V$ (notation $v_k \stackrel{w}{\to} u$ for the sequence $\{v_k\}_{k=1}^{\infty}$ of vectors $v_k \in V$ to converge weakly to $u \in V$) is the duality convergence defined by $F = V^*$. The weak* convergence in $V^*$ (notation $v_k \stackrel{w*}{\to} u$ for the sequence $\{v_k\}_{k=1}^{\infty}$ of $v_k \in V^*$ to weak* converge to $u \in V^*$) is the convergence defined by the set $F$ of all functionals $g_v : V^* \to \mathbb{R}$ of the form $g_v(f) = f(v)$, where $v$ ranges over $V$. Since $V^*$, equipped with the operator norm, is a normed space itself, there is also the standard weak convergence of sequences from $V^*$ which, in general, is not the same as the weak* convergence.
It is an easy observation that weak convergence is always implied by the strong one. While in general the weak convergence $v_k \stackrel{w}{\to} u$ does not imply strong convergence, it does imply boundedness of the norms $\|v_k\|$.

Theorem 10.3 If $V$ is a normed real space and $v_k \stackrel{w}{\to} u$ in $V$ then there exists a constant $r \in \mathbb{R}$ such that $\|v_k\| \le r$ for all $k$.
Theorem 10.3 has a nice proof based on the closed graph theorem, a general principle of proving functional analysis statements to be studied in the following lecture.
Example 10.7 Let $V = \ell^2$ be the Hilbert space of square summable sequences $v = \{v(t)\}_{t=0}^{\infty}$ of real numbers $v(t) \in \mathbb{R}$ with the norm
$$\|v\| = |v| = \left( \sum_{t=0}^{\infty} |v(t)|^2 \right)^{1/2}.$$
The fact that for every bounded linear functional $f : \ell^2 \to \mathbb{R}$ there exists $u \in \ell^2$ such that $f(v) = (u, v)$ and $\|f\| = |u|$ can be interpreted as the identity $(\ell^2)^* = \ell^2$. Therefore the weak and weak* convergence definitions on $\ell^2$ are equivalent. Using the unit sample functions $e_k(t) = \delta_{kt}$, it is easy to see that $v_k \stackrel{w}{\to} u$ in $\ell^2$ implies $v_k(t) \to u(t)$ for every fixed $t \in \mathbb{Z}_+$.
According to Theorem 10.3, the weak convergence also implies the existence of a finite upper bound for $|v_k|$. A simple derivation shows that these two conditions are not only necessary but also sufficient for weak convergence:
$$v_k \stackrel{w}{\to} u \ \text{ in } \ell^2 \quad \Longleftrightarrow \quad \sup_k |v_k| < \infty \ \text{ and } \ \lim_{k \to \infty} v_k(t) = u(t) \ \ \forall\, t.$$
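The criterion can be observed numerically on the unit samples from Example 10.5; the sketch below (added here; truncated vectors stand in for elements of $\ell^2$) shows $(u, e_k) \to 0$ for a fixed $u \in \ell^2$ while $|e_k| = 1$:

```python
# Weak but not strong convergence of the unit samples e_k in l2 (sketch).
import numpy as np

T = 10000                              # truncation length standing in for infinity
u = 1.0 / np.arange(1, T + 1)          # a fixed square-summable sequence
for k in [1, 10, 100, 1000]:
    e_k = np.zeros(T)
    e_k[k] = 1.0
    print(k, np.dot(u, e_k), np.linalg.norm(e_k))   # (u, e_k) -> 0, |e_k| = 1
```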
The difference between many (though not all) possible ways of introducing convergence
on vector spaces does not manifest itself in the finite dimensional case, as demonstrated
by the following result.
Theorem 10.4 Let $V$ be a real vector space with a finite basis $b = (u_1, \dots, u_n)$. Let $p : V \to [0, \infty)$ be a norm. Let $F \subset V^{\sharp}$ be a total subset of linear functionals $f : V \to \mathbb{R}$. Then for every sequence $\{v_k\}_{k=1}^{\infty}$ of vectors $v_k \in V$ and for every $w \in V$ the following conditions are equivalent:

(a) $p(w - v_k) \to 0$ as $k \to \infty$;

(b) $f(w - v_k) \to 0$ as $k \to \infty$ for every $f \in F$;

(c) $c_{ik} \to 0$ as $k \to \infty$ for every $i \in \{1, \dots, n\}$, where $c_{ik} \in \mathbb{R}$ are the coefficients of the linear decomposition
$$w - v_k = \sum_{i=1}^{n} c_{ik} u_i.$$
$$\|h\|_{\infty} \stackrel{\rm def}{=} \sup_{t \in \mathbb{Z}_+} |h(t)| < \infty,$$
$$f(v) = \sum_{t=0}^{\infty} h(t) v(t) \quad \forall\, v \in \ell^1, \qquad (10.6)$$
$$\|f\| = \|h\|_{\infty}.$$
Essentially, Theorem 10.5 establishes that, for $V = \ell^1$, its dual space $V^*$ can be associated with the real vector space $\ell^{\infty}$ of all uniformly bounded real sequences $h = \{h(t)\}_{t=0}^{\infty}$, equipped with the norm $h \mapsto \|h\|_{\infty}$.
Proof. Let $f : \ell^1 \to \mathbb{R}$ be a bounded linear functional. For all $t \in \mathbb{Z}_+$ define $h(t) = f(e_t)$, where
$$e_t(k) = \begin{cases} 1, & k = t, \\ 0, & k \neq t, \end{cases}$$
so that
$$|h(t)| \le \|f\| \, \|e_t\|_1 \le \|f\|.$$
By construction,
$$\|v - P_T v\|_1 = \sum_{t > T} |v(t)|$$
for every $h \in \ell^{\infty}$. This criterion is not very useful, though, as it can be tricky to apply, due to the arbitrariness of the sequence $h \in \ell^{\infty}$. A more detailed analysis of weak convergence in $\ell^1$ reveals the following (slightly disappointing) result.

Theorem 10.6 Weak convergence $v_k \stackrel{w}{\to} u$ takes place in $\ell^1$ if and only if $v_k \to u$ in the $\|\cdot\|_1$ norm, i.e. $\|v_k - u\|_1 \to 0$ as $k \to \infty$.
Proof. Assume to the contrary that $v_k \stackrel{w}{\to} u$ but there exists $\epsilon > 0$ such that $\|v_k - u\|_1 \ge \epsilon$ for arbitrarily large $k$. Since by assumption $|v_k(t) - u(t)| \to 0$ for every fixed $t$, this means that for every $\tau \in \mathbb{Z}_+$ there exist $n = n(\tau)$ and $r = r(\tau) > \tau$ such that
$$\sum_{t=\tau}^{r} |v_n(t) - u(t)| \ge 0.9 \|v_n - u\|_1 \ge 0.5\epsilon.$$
Let $h \in \ell^{\infty}$ be defined by
$$h(t) = \begin{cases} 1, & v_{n(k)}(t) > u(t), \ \tau(k) \le t < \tau(k+1), \\ -1, & v_{n(k)}(t) < u(t), \ \tau(k) \le t < \tau(k+1), \\ 0, & v_{n(k)}(t) = u(t), \ \tau(k) \le t < \tau(k+1). \end{cases}$$
By construction,
$$\sum_{t=0}^{\infty} h(t)(v_{n(k)}(t) - u(t)) \ge 0.8 \|v_{n(k)} - u\|_1 \ge 0.4\epsilon$$
does not converge to zero. The contradiction proves Theorem 10.6.
Note that component-wise convergence $v_k(t) \to u(t)$ is not enough for the weak* convergence $v_k \stackrel{w*}{\to} u$ in $\ell^{\infty}$: uniform boundedness is another necessary condition.
Proof. The implication (b)⇒(a) is a standard application of convergence of infinite sums: for every $w \in \ell^1$ the sequences $t \mapsto w(t)(v_k(t) - u(t))$ have the same index-independent summable upper bound
$$|w(t)(v_k(t) - u(t))| \le |w(t)| (c + \|u\|_{\infty}),$$
and hence
$$\lim_{k \to \infty} \sum_{t=0}^{\infty} w(t)(v_k(t) - u(t)) = \sum_{t=0}^{\infty} \lim_{k \to \infty} w(t)(v_k(t) - u(t)) = 0.$$
To prove that (a) implies (b) assume that $v_k \stackrel{w*}{\to} u$, i.e.
$$\lim_{k \to \infty} \sum_{t=0}^{\infty} w(t)(v_k(t) - u(t)) = 0$$
for all $w \in \ell^1$. Using $w = e_k$ (where $e_k$ are defined as in the proof of Theorem 10.5) with this identity yields $v_k(t) \to u(t)$ for every fixed $t \in \mathbb{Z}_+$. The uniform boundedness of $\|v_k\|_{\infty}$ follows from Theorem 10.3.
converges to zero in the weak* sense, while the sequence $v_k = k u_k$ does not converge anywhere in the weak* sense.

Since, in contrast with the relations such as $(\ell^2)^* = \ell^2$ and $(\ell^1)^* = \ell^{\infty}$, there is no explicit characterization of the dual space $U^*$ for $U = \ell^{\infty}$, the weak topology of $\ell^{\infty}$, in contrast with its weak* topology, is difficult to describe. It is easy to see that, in $\ell^{\infty}$, weak convergence is a strictly stronger requirement than weak* convergence.
Example 10.8 The sequence $u_k$ defined in (10.7) does not converge to zero weakly in $\ell^{\infty}$. To prove this, consider a Banach limit functional, i.e. any linear function $f : \ell^{\infty} \to \mathbb{R}$ such that $|f(v)| \le \|v\|_{\infty}$ for all $v \in \ell^{\infty}$, and
$$f(v) = \lim_{t \to \infty} v(t)$$
whenever the limit on the right side exists.
10.3.1 Completeness
A real normed space $V$ is called complete if every Cauchy sequence in $V$ has a limit. A similar notion of completeness can be introduced in other situations when convergence of sequences to a limit is defined on a real (or complex) vector space, but the case of normed spaces is by far the most useful one. Completeness is a required assumption in many fundamental theorems of functional analysis. A complete normed space is frequently called a Banach space.
Hilbert spaces form a special class of Banach spaces: availability of a scalar product
is a key element of many proofs. On the other hand, many application problems (in
particular, those involving approximation of linear transformations) do not allow for an
adequate interpretation within a Hilbert space framework, which usually makes using
more general Banach spaces a requirement.
The standard examples of Banach spaces are the vector spaces $\ell^p$ and $L^p[0,1]$ (where $1 \le p \le \infty$), as well as $C[0,1]$ (equipped with the max-norm), which usually supplies enough building blocks for modeling with Banach spaces. Naturally, a closed subspace of a Banach space is a Banach space as well. The following theorem generalizes a similar Hilbert space result, and presents a less obvious construction of a complete normed space.
Theorem 10.8 For every normed real vector space $V$ its dual $V^*$ is complete.
$g$ is bounded and $\|g - f_k\| \to 0$.
Example 10.9 Completeness depends on the norm being used. In particular, the real vector space $C[0,1]$ of all continuous functions $v : [0,1] \to \mathbb{R}$ is complete with respect to the max-norm. To see the incompleteness with respect to the $|\cdot|$-norm, consider the sequence of functions
$$v_k(t) = \begin{cases} (2t)^k, & t \le 0.5, \\ 1, & t \ge 0.5. \end{cases}$$
It is easy to see that $\{v_k\}$ is a Cauchy sequence with respect to the $|\cdot|$ norm, but the pointwise limit of $v_k$ is the unit step function with a discontinuity at $t = 0.5$, which is not an element of $C[0,1]$.
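A numerical check of this example (an added sketch; the grid and indices are arbitrary choices): the L2 distances between the $v_k$ shrink, even though the pointwise limit leaves $C[0,1]$.

```python
# v_k from Example 10.9 is Cauchy in the L2 norm (illustrative sketch).
import numpy as np

t, dt = np.linspace(0.0, 1.0, 100001, retstep=True)

def v(k):
    return np.where(t <= 0.5, (2.0 * t) ** k, 1.0)

def l2_dist(f, g):
    return np.sqrt(np.sum((f - g) ** 2) * dt)   # Riemann-sum approximation

for k in [5, 10, 20, 40, 80]:
    print(k, l2_dist(v(k), v(2 * k)))           # -> 0, yet the limit is a step
```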
10.3.2 Continuity
A definition of sequential continuity of a function $\Phi : X \to Y$ is available whenever convergence is defined for sequences of elements of $X$ and $Y$: the function $\Phi : X \to Y$ is called continuous if and only if $x_k \to x$ in $X$ implies $\Phi(x_k) \to \Phi(x)$ in $Y$.
In this subsection, we are interested in applying this concept to subsets of real vector
spaces. Since infinite dimensional vector spaces usually have several different meaningful
types of convergence available, there are different notions of continuity as well.
For linear functions $f : U \to V$ mapping one normed space into another, there are important cases when continuity does not depend on the types of convergence being used.
(c) A is continuous with respect to the strong convergence in U and weak convergence
in V ;
(d) A is bounded.
In particular, Theorem 10.9 implies that the standard operations of addition and
scaling are continuous with respect to both strong and weak convergence (though not with
respect to the mixture of weak convergence of the arguments and the strong convergence
of the output).
which means that (d) implies both (a) and (c). Similarly, since $A$ is bounded, for every $g \in V^*$ the composition $g \circ A$, mapping $u \in U$ to $g(Au) \in \mathbb{R}$, is linear and bounded as well. Hence if $f(u_k) \to f(w)$ for every bounded linear function $f : U \to \mathbb{R}$ then
Example 10.10 The identity operator on $\ell^2$ is not completely continuous. Indeed, the sequence of $e_k \in \ell^2$ defined by $e_k(t) = \delta_{kt}$ converges weakly to zero, but does not converge strongly. In contrast, the function $A : \ell^2 \to \ell^2$ mapping a sequence $v(t)$ to the sequence $u(t) = e^{-t} v(t)$ is completely continuous.

A very important example of a completely continuous function on $L^2[0,1]$ is given by the integration operator, mapping $v = v(t)$ to
$$u(t) = \int_0^t v(\tau)\, d\tau.$$
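The contrast can be seen on the weakly-null sequence $e_k$; the following added sketch (with truncated vectors standing in for $\ell^2$ elements) shows that the multiplication operator sends it to a strongly-null sequence:

```python
# Complete continuity of (Av)(t) = exp(-t) v(t) on the sequence e_k (sketch).
import numpy as np

T = 50
w = np.exp(-np.arange(T))              # the multiplier exp(-t)
for k in [1, 5, 10, 20]:
    e_k = np.zeros(T)
    e_k[k] = 1.0
    print(k, np.linalg.norm(e_k), np.linalg.norm(w * e_k))  # 1.0 vs exp(-k) -> 0
```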
and hence $|v_k| \to |u|$. On the other hand, the unit sample vectors $e_k \in \ell^2$ defined by $e_k(t) = \delta_{kt}$ converge weakly to zero, while $|e_k| = 1$ for all $k$.
Example 10.12 Let $V$ be the normed space of all continuous functions $v : [0,1] \to \mathbb{R}$, equipped with the L2 norm
$$\|v\| = \left( \int_0^1 |v(t)|^2\, dt \right)^{1/2}.$$
(b) $A$ is continuous with respect to the weak convergence (in both argument and value) if and only if there exist constants $a_0, a_1 \in \mathbb{R}$ such that
$$\phi(y) = a_0 + a_1 y \quad \forall\, y \in \mathbb{R};$$

(c) $A$ is completely continuous if and only if there exists a constant $a_0$ such that
$$\phi(y) = a_0 \quad \forall\, y \in \mathbb{R};$$
Example 10.13 Let $V$ be the normed space from Example 10.12. Let $W$ be the real vector space $C[0,1]$ equipped with the $\|\cdot\|_{\infty}$ norm. The function $A : V \to W$ mapping $v \in V$ to $w = Av \in W$ according to
$$w(t) = \left( \int_0^t v(\tau)\, d\tau \right)^2$$
is continuous with respect to the weak convergence in $V$ and the strong convergence in $W$.
10.3.3 Compactness
A set $X$, for the elements of which a notion of convergence is available, is called sequentially compact if every sequence $\{x_k\}_{k=1}^{\infty}$ of elements $x_k \in X$ contains a subsequence $\{y_k\}_{k=1}^{\infty}$, where $y_k = x_{n(k)}$, $n(k) \to \infty$ as $k \to \infty$, converging to a limit $z \in X$.
For the subsets of $\mathbb{R}^n$, equipped with the usual definition of convergence, sequential compactness is equivalent to compactness, and a subset of $\mathbb{R}^n$ is compact if and only if it is both closed and bounded.
As is demonstrated by the example of $\ell^1$, where the unit ball is not sequentially compact with respect to either strong or weak convergence, establishing compactness is not as easy in the infinite dimensional case.
The following theorem is a major tool for proving compactness in infinite dimensional vector spaces.
Theorem 10.10 Let $V$ be a normed real vector space. Assume that $V$ contains a countable subset $\Delta$ such that every vector $v \in V$ is a limit of a sequence of elements of $\Delta$. Then the unit ball of $V^*$ is sequentially compact with respect to the weak* convergence.

A subset $\Delta$ satisfying the conditions of Theorem 10.10 is called a countable dense subset of $V$. A normed space $V$ which has a countable dense subset is sometimes referred to as separable.
Proof. Let $\{f_k\}_{k=0}^{\infty}$ be a sequence of linear functionals $f_k \in V^*$ such that $\|f_k\| \le 1$ for all $k$. We need to prove the existence of a subsequence $g_k = f_{n(k)}$, where $n(k) \to \infty$, such that $g_k(v) \to g(v)$ for all $v \in V$.

Let $\{w_m\}_{m=1}^{\infty}$ be an enumeration of all elements of $\Delta$. Let us construct a strictly monotonically increasing function $n = n(k)$ in such a way that $f_{n(k)}(w_m)$ converges to a limit $g_m$ as $k \to \infty$. To do this, use the standard diagonalization argument, by defining $r : \mathbb{Z}_+ \times \mathbb{Z}_+ \to \mathbb{Z}_+$ according to the following rules:
Example 10.14 The normed spaces $\ell^p$ are separable for $1 \le p < \infty$ (in particular, the countable set consisting of all sequences of rational numbers with a finite number of non-zero elements is dense in $\ell^p$ for $1 \le p < \infty$). In contrast, $\ell^{\infty}$ is not separable. To see this, consider the set $X \subset \ell^{\infty}$ of all binary sequences $v \in \ell^{\infty}$ (i.e. such that $v(t) \in \{0,1\}$ for all $t \in \mathbb{Z}_+$). Since $\|v - u\|_{\infty} = 1$ for every two different elements of $X$, a single vector $w \in \ell^{\infty}$ cannot satisfy the inequality $\|w - v\|_{\infty} < 0.5$ for more than one element of $X$. Since $X$ is uncountable, this means that the elements of a fixed countable subset of $\ell^{\infty}$ cannot approximate all elements of $X$ arbitrarily well.
Since $\ell^1$ and $\ell^2$ are separable normed spaces, the unit balls of the corresponding dual spaces (respectively $\ell^{\infty}$ and $\ell^2$) are sequentially compact with respect to the weak* topology. This observation can be used to prove the existence of an optimal solution in optimization problems involving an infinite number of decision parameters.
Example 10.15 Consider the task of designing a control sequence aimed at minimizing the maximal input/output amplitude of a given discrete time dynamical system given by the equations
$$y(t+2) + 3y(t+1) + y(t)^2 = u(t) \quad \forall\, t \in \mathbb{Z}_+, \qquad y(0) = 1, \ y(1) = 1. \qquad (10.8)$$
Formally, the task is to find sequences $u(\cdot)$ and $y(\cdot)$ satisfying (10.8) for which the value of the performance functional
$$\phi(u(\cdot), y(\cdot)) = \sup_{t \ge 0} \{|u(t)| + |y(t)|\} \qquad (10.9)$$
is minimal. The functional $\phi$ is the minimal upper bound of a family of bounded linear functionals. Since the minimal upper bound of a limit is never larger than the limit of minimal upper bounds, $\phi$ is lower semicontinuous with respect to the weak* convergence, which guarantees the existence of a minimizer $(\bar u, \bar y)$ with $\phi(\bar u, \bar y) = \inf \phi$.
Chapter 11
Feasibility Of Equations
This chapter presents methods for establishing feasibility and robustness of solutions of linear and nonlinear equations with a (potentially) infinite number of variables. The approach relies on explicit construction of Cauchy sequences of vectors, converging to a desired solution (when proving feasibility) or to a parameter vector for which the equation has no solution (when deriving robustness from feasibility by contradiction).
One idea to be introduced is that of uniform boundedness, claiming that a system of linear equations $Au = v$ (to be solved for an element $u$ of a Banach space $U$ for a given element $v$ of another Banach space $V$), defined by a bounded linear function $A$, and feasible for every $v \in V$, admits a solution $u$ bounded by $\|u\| \le c\|v\|$, where the constant $c$ does not depend on $v$.
Another key statement to be presented is a version of the inverse function theorem, claiming, roughly speaking, that a nonlinear equation $a(u) = v$ has a solution $u \approx u_0$ for all $v \approx v_0 = a(u_0)$, as long as $a$ is continuously Frechet differentiable at $u_0$, and the range $R(A)$ of its Frechet derivative $A$ is the whole Banach space $V$ of all possible values of $v$. The inverse function theorem plays a key role in deriving necessary conditions of optimality in infinite dimensional minimization subject to a (possibly infinite) number of equality and inequality constraints.
Example 11.1 Let $\ell^1$ be the standard Banach space of all functions $x : \mathbb{Z}_+ \to \mathbb{R}$ such that
$$\|x\| \stackrel{\rm def}{=} \sum_{t=0}^{\infty} |x(t)| < \infty.$$
Informally, $\gamma(v)$ is the minimal norm of a solution $u$ of the linear equation $Au = v$ (if one ignores the fact that the minimum might not be achievable), unless the equation $Au = v$ has no solutions, in which case $\gamma(v) \stackrel{\rm def}{=} +\infty$. By construction, the function $\gamma$ achieves a minimal upper bound of $+\infty$ if and only if the equation $Au = v$ has no solution $u \in U$ for some $v \in V$. The following theorem, known as the interior mapping principle, claims that $\gamma$ achieves its minimal upper bound whenever $\sup \gamma = +\infty$, assuming that the linear function $A$ is bounded and the normed spaces $U, V$ are complete.
Theorem 11.1 Let $A : U \to V$ be a bounded linear function mapping one Banach space to another, such that $V = R(A) \stackrel{\rm def}{=} A(U)$. Then there exists $r \in \mathbb{R}$ such that for every $v \in V$ the equation $Au = v$ has a solution $u \in U$ with $\|u\| \le r\|v\|$.

In other words, Theorem 11.1 claims that a feasible system of linear equations defined by a bounded linear function on Banach spaces has a uniformly bounded solution. The statement can be appreciated better by realizing that, in general, its assumptions guarantee neither the existence of a minimizer of $\|u\|$ subject to $Au = v$, nor the existence of a bounded linear function $B : V \to U$ such that $ABv = v$ for all $v \in V$.
Proof. Let
$$D = \{v \in V : \|v\| \le 1\}, \qquad D_h = \{u \in U : \|u\| \le h\}$$
be the closed unit ball in $V$ and the closed ball of radius $h$ in $U$, centered at the origin. The first step is to prove that there exists $h > 0$ such that every $v \in D$ can be approximated arbitrarily well by $Au$ with $u \in D_h$, i.e. that the error $\|Au - v\|$ can be made arbitrarily small while using $u \in D_h$. This is done by assuming the contrary, i.e. that
(*) for every $h > 0$ there exist $\epsilon > 0$ and $v_h \in D$ such that $\|Au - v_h\| > \epsilon$ for all $u \in D_h$.

Note that (*) implies the following, seemingly stronger, statement:

(**) for every $h > 0$, every open ball $B_d(v_0) \subset D$ of positive radius $d > 0$ contains a vector $v$ which cannot be approximated arbitrarily well by $Au$ with $u \in D_h$.
Indeed, otherwise for every $w \in D$ the norms $\|v_0 - Au_0\|$ and $\|v_0 + dw - Au\|$ can be made arbitrarily small with some $u, u_0 \in D_h$, which means that
$$\|w - A\tilde u\| = \frac{\|v_0 + dw - Au - (v_0 - Au_0)\|}{d} \le \frac{\|v_0 + dw - Au\| + \|v_0 - Au_0\|}{d}$$
can be made arbitrarily small with
$$\tilde u \stackrel{\rm def}{=} \frac{u - u_0}{d} \in D_{2h/d},$$
which contradicts (*).
Using (**), construct a sequence of vectors $w_k \in V$ and positive numbers $d(k) \in \mathbb{R}$ ($k = 0, 1, \dots$) satisfying the inclusion $B_{d(k)}(w_k) \subset D$ for all $k$ by setting $w_0 = 0$, $d(0) = 1$, and for $k = 0, 1, 2, \dots$ defining $w_{k+1}$ as the vector in $B_{d(k)}(w_k)$ which cannot be approximated arbitrarily well by $Au$ with $u \in D_k$ (such $w = w_{k+1}$ exists due to (**)), and choosing $d(k+1) \in (0, d(k)/2)$ small enough so that $B_{d(k+1)}(w_{k+1}) \subset B_{d(k)}(w_k)$ and $\|Au - w_{k+1}\| > d(k+1)$ for all $u \in D_k$. By construction, $\{w_k\}_{k=0}^{\infty}$ is a Cauchy sequence in $V$ which converges to a limit $w$ such that $w \in B_{d(k)}(w_k)$ for all $k$. Since none of the elements of $B_{d(k)}(w_k)$ belongs to the closure of $AD_k$, this implies $w \notin R(A)$: a contradiction which proves that (*) is not valid.
The second part of the proof assumes that for every $v \in D$ there exists $u \in D_h$ such that $\|Au - v\| \le 0.5$ which, by homogeneity, is equivalent to the existence, for every $v \in V$, of $u \in U$ such that $\|u\| \le h\|v\|$ and $\|Au - v\| \le 0.5\|v\|$; it constructs, for every $v_* \in V$, a sequence $u_0, u_1, \dots$ converging to a vector $u_* \in U$ such that $Au_* = v_*$ and $\|u_*\| \le 2h\|v_*\|$. Let $u_0 = 0$. For $k = 0, 1, 2, \dots$ define $u_{k+1} = u_k + u$, where $u \in U$ is such that $\|u\| \le h\|v_* - Au_k\|$ and $\|Au - (v_* - Au_k)\| \le 0.5\|v_* - Au_k\|$. Then $\|v_* - Au_k\| \le 2^{-k}\|v_*\|$ and $\|u_{k+1} - u_k\| \le h\, 2^{-k}\|v_*\|$, which implies
$$\|u_k - u_m\| \le h\, 2^{1-k} \|v_*\| \quad \text{for } m > k > 0.$$
Therefore $\{u_k\}_{k=0}^{\infty}$ is a Cauchy sequence. Since $U$ is complete, $\|u_k - u_*\| \to 0$ for some $u_* \in U$ such that $\|u_*\| \le 2h\|v_*\|$. Since $A$ is bounded, $Au_* = v_*$.
Example 11.2 Let $V_1$ and $V_2$ be two closed linear subspaces of a Banach space $V$. Assume that every vector $v \in V$ can be represented as a sum $v = v_1 + v_2$, where $v_k \in V_k$ for $k = 1, 2$, though not necessarily in a unique way. Does it follow that the equation $v = v_1 + v_2$ can be solved for $v_k \in V_k$ with the solution $(v_1, v_2)$ being bounded, as in $\|v_k\| \le c\|v\|$?
Due to Theorem 11.1, the answer is affirmative. Indeed, as closed linear subspaces of a Banach space $V$, $V_1$ and $V_2$ are complete. Hence the real vector space $U = V_1 \times V_2$ of all pairs $(v_1, v_2)$, $v_k \in V_k$, equipped with the norm $\|(v_1, v_2)\| = \max\{\|v_1\|, \|v_2\|\}$, is a Banach space. The map $A : U \to V$ defined by $A(v_1, v_2) = v_1 + v_2$ is bounded (its operator norm is not larger than two). Since, by assumption, the equation $Au = v$ (which, for $u = (v_1, v_2)$, is equivalent to $v_1 + v_2 = v$) has a solution $u \in U$ for every $v \in V$, according to Theorem 11.1 it can be chosen in such a way that $\|v_k\| \le c\|v\|$, where the constant $c$ does not depend on $v$.
is linear and bounded. Moreover, the equation $Au = v$ has a solution $u = (q(0), \dot q) \in U$ for every $q \in V$. Nevertheless, for $q_n \in V$ defined by $q_n(t) \equiv t^n$, the solution $u_n = (c_n, p_n)$ of $Au_n = q_n$ has $p_n(t) \equiv n t^{n-1}$. Since $\|q_n\| = 1$ and $\|p_n\| = n$, there exists no constant $c$ such that the equation $Au = v$ has a solution $u \in U$ with $\|u\| \le c\|v\|$ for every $v \in V$.
Proof. According to Theorem 11.1, there exists $c \in \mathbb{R}$ such that the equation $Au = v$ has a solution $u \in U$ with $\|u\| \le c\|v\|$ for every $v \in V$. Since $A$ is a bijection, such a solution $u$ is unique, and is given by $u = A^{-1}v$, which implies $\|A^{-1}\| \le c$.
$$[\Phi] = \{(u, v) \in U \times V : v = \Phi(u)\}.$$

(b) $L$ is bounded.
The non-trivial part of Theorem 11.3 is the implication (a)⇒(b). One way to interpret it is by saying that, for a linear function with a closed graph, infinite amplification of input length is impossible, as it would have to be achieved at a specific input vector $u$.
Proof. Since $W = U \times V$ is a Banach space, and $[L]$ is a closed linear subspace of $W$, $[L]$ is a Banach space as well. The projection onto the first coordinate, $A : [L] \to U$, defined by
$$Aw = u \quad \text{for } w = (u, v) \in [L],$$
is a linear bijection which is bounded ($\|A\| \le 1$). Hence its inverse, mapping $u \in U$ to $(u, Lu) \in [L]$, is bounded. Therefore $L$ is bounded as well.
Example 11.4 Consider a time-varying linear dynamical system defined by the recurrent
equations
where $A(t), B(t), C(t)$ are given real matrices of dimensions n-by-n, n-by-1, and 1-by-n respectively, such that
$$\sup_t \|A(t)\| < \infty, \quad \sup_t \|B(t)\| < \infty, \quad \sup_t \|C(t)\| < \infty,$$
which define a unique output sequence $y = y(t)$ for every input sequence $v = v(t)$. There are several ways of defining stability of the model in (11.1): $\ell^2$-BIBO stability calls for $y$ to be square summable (i.e. $y \in \ell^2$) whenever $v \in \ell^2$, and bounded $\ell^2$ gain stability calls for the existence of a constant $\gamma > 0$ such that $\|y\| \le \gamma\|v\|$ for all input/output pairs, where $\|\cdot\|$ is the standard norm in $\ell^2$.
While it may appear that bounded $\ell^2$ gain stability is a stronger requirement than $\ell^2$-BIBO stability, the two notions are actually equivalent. Indeed, $\ell^2$-BIBO stability means that (11.1) defines a linear function $L : \ell^2 \to \ell^2$. Since (11.1) is defined in terms of scalar equations relating finite subsets of the scalar components of $v(\cdot)$ and $y(\cdot)$, the graph of $L$ is closed. According to Theorem 11.3, $L$ is bounded, which is equivalent to bounded $\ell^2$ gain stability.
(a) for every $v \in V$ there exists $c = c(v) \in \mathbb{R}$ such that $\|f_k(v)\| \le c\|v\|$ for all $k \in \mathbb{Z}_+$;

(b) there exists $c_0 \in \mathbb{R}$ such that $\|f_k(v)\| \le c_0\|v\|$ for every $v \in V$ and $k \in \mathbb{Z}_+$.

One way of interpreting Theorem 11.4 is that the function $\gamma : V \to \mathbb{R} \cup \{+\infty\}$ defined by
$$\gamma(v) = \sup_k \|f_k(v)\|$$
Let
$$U = U_0 \times U_1 \times U_2 \times \cdots \stackrel{\rm def}{=} \{u = (u_0, u_1, u_2, \dots) : u_k \in U_k, \ \|u\|_{\infty} = \sup_k \|u_k\| < \infty\}$$
be the real vector space of all uniformly bounded sequences of vectors $u_k \in U_k$. It is a straightforward exercise (following the proof of completeness for $\ell^{\infty}$) to show that, equipped with the norm $\|\cdot\|_{\infty}$, $U$ is complete.
By the assumption (a), the formula
$$Lv = (f_0 v, f_1 v, f_2 v, \dots)$$
defines a linear function $L : V \to U$. Since the functions $f_k$ are bounded, the graph of $L$ is closed. According to the closed graph theorem, $L$ is bounded, and hence $\|f_k\| \le c = \|L\| < \infty$.
Example 11.5 Let $V$ be a complex Banach space (same as the real one, except over the field of complex numbers). Let $A : V \to V$ be a bounded linear operator. The set $\sigma = \sigma(A)$ of all complex numbers $s \in \mathbb{C}$ for which $sI - A$ does not have a bounded inverse is called the spectrum of $A$. The spectrum of $A$ is a generalization of the set of eigenvalues to the infinite dimensional case.
Using the Jordan form decomposition, in the case when $\dim(V) < \infty$, it is relatively easy to establish that
$$r(A) \stackrel{\rm def}{=} \max\{|s| : s \in \sigma(A)\} = \lim_{n \to \infty} \|A^n\|^{1/n}. \qquad (11.2)$$
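In the finite dimensional case (11.2) is easy to observe numerically; the sketch below (added here, with an arbitrary triangular test matrix) compares $\|A^n\|^{1/n}$ with the largest eigenvalue magnitude:

```python
# ||A^n||^(1/n) converges to the spectral radius r(A) (illustrative sketch).
import numpy as np

A = np.array([[0.5, 1.0],
              [0.0, 0.6]])
r = max(abs(np.linalg.eigvals(A)))     # r(A) = 0.6 for this matrix
for n in [1, 5, 20, 100, 400]:
    An = np.linalg.matrix_power(A, n)
    print(n, np.linalg.norm(An, 2) ** (1.0 / n))
print("r(A) =", r)
```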
converges whenever $|s|$ is larger than the right side of (11.2), the right side of (11.2) is never smaller than $r(A)$. The principle of uniform boundedness is handy in establishing the inequality in the reverse direction in the general infinite dimensional case.
Due to the completeness of $V$, the expansion
$$u = B(s_0)v + (s_0 - s)B(s_0)^2 v + \cdots = \sum_{t=0}^{\infty} (s_0 - s)^t B(s_0)^{t+1} v$$
converges and defines a bounded inverse of $sI - A$, with
$$\|u\| \le \frac{\|B(s_0)\|}{1 - |s_0 - s|\, \|B(s_0)\|}\, \|v\|,$$
whenever $s_0 I - A$ has a bounded inverse $B(s_0)$ and $|s_0 - s|\, \|B(s_0)\| < 1$. Hence for every $f \in V^*$ and $v \in V$ the function $s \mapsto f((sI - A)^{-1} v)$ is analytic in the region $|s| > r(A)$. Since the expansion (11.3) is valid at least for $|s| > \|A\|$, this implies that the sequences $\{h^{-n} f(A^n v)\}$ converge to zero for $|h| > r(A)$. According to the principle of uniform boundedness, this means the existence of a constant $c_0 = c_0(h)$ such that
Theorem 11.5 If $\{v_k\}_{k=1}^{\infty}$ is a weakly convergent sequence of elements of a normed space $V$ then $\sup_k \|v_k\| < \infty$.

holds by the definition, and the existence of $g \in V^*$ satisfying $|f_k(g)| = \|v_k\|$ follows from the Hahn-Banach theorem). Since $f_k(g) = g(v_k)$ converges to a limit as $k \to \infty$, the sequence $\{|f_k(g)|\}_{k=1}^{\infty}$ is bounded for every fixed $g \in V^*$. Since $V^*$ is a Banach space, this means that $\sup \|f_k\| < \infty$, which implies $\sup \|v_k\| < \infty$.
Then the equation $v = F(v)$ has a unique solution $v \in X$. Moreover, this solution is the limit of the sequence $\{v_k\}_{k=0}^{\infty}$ of $v_k \in V$ constructed according to
$$v_{k+1} = F(v_k) \quad (k = 0, 1, 2, \dots),$$
hence
$$\|v_{k+1} - v_k\| \le \gamma^k \|F(v_0) - v_0\|$$
and
$$\|v_m - v_k\| \le \frac{\gamma^k}{1 - \gamma} \|F(v_0) - v_0\|,$$
which means that $\{v_k\}_{k=0}^{\infty}$ is a Cauchy sequence in $V$. Since by assumption $V$ is complete, there exists $u \in V$ such that $\|v_k - u\| \to 0$ as $k \to \infty$. By the closedness of $X$, $u \in X$, and
$$\|u - v\| = \|F(u) - F(v)\| \le \gamma \|u - v\|$$
$$x = \sigma_n(Ax) + w, \qquad (11.5)$$
where $A$ is a given n-by-n real matrix such that $\|A\| < 1$, and $\sigma_n : \mathbb{R}^n \to \mathbb{R}^n$ is a pointwise sigmoid function, defined by
$$\sigma_n \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} \sigma(x_1) \\ \vdots \\ \sigma(x_n) \end{bmatrix}, \qquad \sigma(y) \stackrel{\rm def}{=} \frac{y}{\max\{1, |y|\}},$$
to be solved with respect to $x \in \mathbb{R}^n$ for a given $w \in \mathbb{R}^n$. Since $\sigma_n : \mathbb{R}^n \to \mathbb{R}^n$ is contractive, in the sense that
$$|\sigma_n(v) - \sigma_n(u)| \le |v - u| \quad \forall\, v, u \in \mathbb{R}^n,$$
and the map $v \mapsto Av$ is strictly contractive with contraction coefficient $\gamma = \|A\|$, the function $F : \mathbb{R}^n \to \mathbb{R}^n$ defined by
$$F(v) = \sigma_n(Av) + w$$
is strictly contractive. According to Theorem 11.6, equation (11.5) has a unique solution $x \in \mathbb{R}^n$ for every $w \in \mathbb{R}^n$, which can be obtained as the limit of the exponentially converging sequence $v_k \in \mathbb{R}^n$ defined by
$$v_0 = 0, \quad v_{k+1} = \sigma_n(Av_k) + w \quad (k = 0, 1, 2, \dots).$$
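The iteration is easy to run; below is an added sketch with arbitrary data ($A$ scaled so that $\|A\| = 0.9 < 1$):

```python
# Fixed-point iteration for x = sigma_n(Ax) + w from Example 11.7 (sketch).
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)        # enforce ||A|| = 0.9 < 1
w = rng.standard_normal(n)

def sigma_n(y):                        # pointwise sigmoid y / max(1, |y|)
    return y / np.maximum(1.0, np.abs(y))

x = np.zeros(n)
for _ in range(300):                   # v_{k+1} = sigma_n(A v_k) + w
    x = sigma_n(A @ x) + w
residual = np.linalg.norm(x - (sigma_n(A @ x) + w))
print(x, residual)                     # residual is essentially zero
```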
Example 11.8 Existence and uniqueness of solutions of an ordinary differential equation with a Lipschitz right side is a remarkable application of the contractive map principle.

Let $a : \mathbb{R}^n \to \mathbb{R}^n$ be a Lipschitz function, i.e. such that there exists a constant $c_L \in \mathbb{R}$ satisfying
$$|a(u) - a(v)| \le c_L |u - v| \quad \forall\, u, v \in \mathbb{R}^n.$$
Our intention is to prove that for every $x_0 \in \mathbb{R}^n$ there exist $T > 0$ and a continuously differentiable function $x : [0, T] \to \mathbb{R}^n$ such that $\dot x(t) = a(x(t))$ for $t \in [0, T]$ and $x(0) = x_0$.

Let $T > 0$ be such that $\gamma = c_L T < 1$. Let $V = C([0,T] \to \mathbb{R}^n)$ be the real vector space of all continuous functions $v : [0,T] \to \mathbb{R}^n$. Equipped with the infinity norm
$$\|v\|_{\infty} \stackrel{\rm def}{=} \max_{t \in [0,T]} |v(t)|,$$
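The resulting Picard iteration can be carried out numerically; the sketch below (added; the right side $a(x) = \sin x$ is one concrete Lipschitz choice with $c_L = 1$) exhibits the contraction:

```python
# Picard iteration for x' = a(x), x(0) = x0, as in Example 11.8 (sketch).
import numpy as np

def a(x):                               # Lipschitz with c_L = 1
    return np.sin(x)

x0, T, N = 1.0, 0.5, 2001               # gamma = c_L * T = 0.5 < 1
t, dt = np.linspace(0.0, T, N, retstep=True)

x = np.full(N, x0)                      # initial guess: the constant function
for k in range(8):                      # (F x)(t) = x0 + int_0^t a(x(s)) ds
    f = a(x)
    integral = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) * dt / 2)))
    x_new = x0 + integral
    print(k, np.max(np.abs(x_new - x)))  # sup-distance shrinks by ~gamma per step
    x = x_new
```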
In the derivation of a generalized version of the inverse function theorem later in this section, we will use a slightly different version of the contractive map principle, adapted for finding solutions $v \in X$ of an implicit equation $H(v, v) = 0$, where $H : X \times X \to \mathbb{R}$ is a continuous function. Naturally, the fixed point equation $v = F(v)$ is a special case of $H(v, v) = 0$, obtained with $H(v, u) = \|u - F(v)\|$.
Proof. According to assumption (b), conditions (11.7) can be satisfied by choosing some $w_{k+1} \in X$ whenever $\|w_k - w_0\| < r$ and $H(w_k, w_{k-1}) = 0$. Since having $w_i$ well defined for $i \le k$ yields
$$\|w_{i+1} - w_i\| \le \gamma^i \|w_1 - w_0\|, \quad \text{hence} \quad \|w_k - w_0\| \le (1 + \gamma + \dots + \gamma^k)\|w_1 - w_0\| < r,$$
the $w_k$ are defined for all $k$, and form a Cauchy sequence with a limit $x \in W$ satisfying $\|x - w_0\| \le r$. Since $X$ is a closed subset of $W$ and $H$ is continuous, $x \in X$ and $H(x, x) = 0$.
Example 11.11 Let $V = C[0,1]$ be the normed space of continuous functions $v : [0,1] \to \mathbb{R}$, equipped with the max-norm $\|\cdot\|_{\infty}$. The function $g_1 : V \to \mathbb{R}$ defined by $g_1(v(\cdot)) = v(0)^2$ is differentiable at every point $v_0 \in V$, and its derivative $G_1 = \dot g_1(v_0)$ at $v_0$ is the linear functional $G_1 : V \to \mathbb{R}$ defined by
$$G_1 v = \dot g_1(v_0)(v) = 2 v_0(0) v(0).$$
Similarly, the function $g_2 : V \to V$ mapping $v = v(t)$ to $u = u(t) = v(t)^2$ is differentiable at every point $v_0 \in V$, and its derivative $G_2 = \dot g_2(v_0)$ at $v_0$ is the linear operator $G_2 : V \to V$ defined by
$$(G_2 v)(t) = (\dot g_2(v_0)(v))(t) = 2 v_0(t) v(t).$$
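The derivative formula for $g_2$ can be verified by a finite-difference test; in the sketch below (added here; $v_0$ and $v$ are arbitrary continuous functions sampled on a grid) the remainder has sup-norm of order $s^2$:

```python
# Finite-difference check of the Frechet derivative in Example 11.11 (sketch).
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
v0 = np.cos(3 * t)                     # base point v0 in C[0,1]
v  = np.sin(5 * t)                     # direction v

g2 = lambda u: u ** 2                  # g2(v)(t) = v(t)^2
G2v = 2 * v0 * v                       # (G2 v)(t) = 2 v0(t) v(t)
for s in [1e-1, 1e-2, 1e-3]:
    remainder = g2(v0 + s * v) - g2(v0) - s * G2v
    print(s, np.max(np.abs(remainder)))   # decays like s**2 (here exactly s^2 v^2)
```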
Though, according to the formal definition, the Frechet derivative $G = \dot g(u_0)$ does not have to be bounded (as a linear function) in general, it is easy to show that $\|\dot g(u_0)\| < \infty$ whenever $g$ is continuous at $u_0$, i.e. $\|g(u) - g(u_0)\| \to 0$ as $\|u - u_0\| \to 0$.

Proof. If $G$ is not bounded, there exists a sequence of vectors $w_k \in U$ such that $\|w_k - u_0\| \to 0$ but $\|G(w_k - u_0)\| = 1$ for all $k$. Then $w_k \in X$ for sufficiently large $k$, and $\|g(w_k) - g(u_0)\| \to 1$ as $k \to \infty$, which contradicts the assumed continuity of $g$.
Frechet derivatives of composition functions can be computed using the standard chain
rule.
Proof. By assumption, for every $\epsilon > 0$ there exists $\delta > 0$ such that $\|\dot g(u) - G\| < \epsilon$ whenever $\|u - u_0\| < \delta$. For every pair $u_1, u_2 \in U$ such that $\|u_i - u_0\| < \delta$ and $u_1 \neq u_2$ define the function $h : \mathbb{R} \to U$ according to $h(t) = t u_1 + (1-t) u_2$. By the definition of the Frechet derivative, every point $\tau \in [0,1]$ is contained in an open interval $S = S(\tau)$ such that
$$\|g(h(t)) - g(h(\tau)) - (t - \tau)\dot g(h(\tau))(u_1 - u_2)\| \le \epsilon |t - \tau|\, \|u_1 - u_2\|$$
for every $t \in S(\tau)$. Since $\|\dot g(h(\tau)) - G\| < \epsilon$, this implies
$$\|g(h(t)) - g(h(\tau)) - (t - \tau)G(u_1 - u_2)\| \le 2\epsilon |t - \tau|\, \|u_1 - u_2\|,$$
i.e.
$$\|e(t) - e(\tau)\| \le 2\epsilon |t - \tau|\, \|u_1 - u_2\|$$
for $e(t) = g(h(t)) - tG(u_1 - u_2)$. Since the segment $[0,1]$ is compact, it can be covered by a finite set of such intervals. Combining the bounds on the increments of $e(t)$ over the selected intervals yields
$$\|e(1) - e(0)\| \le 2\epsilon \|u_1 - u_2\|,$$
i.e.
$$\|g(u_1) - g(u_2) - G(u_1 - u_2)\| \le 2\epsilon \|u_1 - u_2\|.$$
Since $\epsilon \to 0$ as $\delta \to 0$, this completes the proof of Theorem 11.11.
Proof. According to the interior mapping principle, there exists $c > 0$ such that the equation $Gu = v$ has a solution $u \in U$ with $\|u\| \le c\|v\|$ for every $v \in V$. According to (11.10), there exists $\delta_0 > 0$ such that $\delta_0 < \delta$ and
$$\|g(u_1) - g(u_2) - G(u_1 - u_2)\| \le \frac{\|u_1 - u_2\|}{2c} \quad \text{whenever } \|u_k - u_0\| \le \delta_0. \qquad (11.12)$$
Let us show that the conclusion of Theorem 11.12 holds for $\epsilon = \delta_0/3$.

To do this, apply Theorem 11.7 with $W = U$, $X = B_d(u_0)$, $\gamma = 0.5$, $r = \delta_0$, $w_0 = 0$, and

According to (11.12), for $\|h\| \le \delta_0/3$ the right side in (11.13) with $v = w_0 = 0$ has norm not larger than $\delta_0/(6c)$, and hence (11.13) has a solution $u = w_1 \in U$ with $\|w_1\| \le \delta_0/6$. Similarly, (11.12) implies that the equation $H(w, e) = 0$ has a solution $e$ such that $\|e - v\| \le \gamma\|u - w\|$, where $\gamma = c/(2c) = 0.5$, whenever $H(u, v) = 0$, $\|u\| < r$, and $\|w\| < r$. Since $(1 - \gamma) r > \|w_1 - w_0\|$, Theorem 11.7 proves the existence of $w \in U$ such that $\|w\| \le \|h\|$ and $H(w, w) = 0$, i.e. $g(u_0 + h + w) = g(u_0) + Gh$.
$$X^2 + A_1 X + A_0 = 0, \qquad (11.14)$$
where $A_0, A_1, X$ are bounded linear operators on a Banach space $V$ ($A_0, A_1$ are given, and $X$ is to be found). The equation does not always have a solution: when $V$ is a real vector space, (11.14) is infeasible for $V = \mathbb{R}$, $A_1 = 0$, $A_0 = 1$. When $V$ is a complex vector space, (11.14) is infeasible for
$$V = \mathbb{C}^2, \quad A_1 = 0, \quad A_0 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Let us show how to use Theorem 11.12 to prove that, assuming $A_1$ has a right inverse $A_1^+ : V \to V$ (i.e. a bounded linear operator such that $A_1 A_1^+ = I_V$), there exists $\epsilon > 0$ such that equation (11.14) has a solution $X$ for every $A_0$ satisfying $\|A_0\| \le \epsilon$.
Indeed, let $U$ be the Banach space of all bounded linear operators on $V$. The function $g : U \to U$ defined by
$$g(X) = X^2 + A_1 X$$
is Frechet differentiable at every point $X_0 \in U$, with the Frechet derivative $\dot g(X_0)$ mapping $\Delta \in U$ to
$$\dot g(X_0)\Delta = X_0 \Delta + \Delta X_0 + A_1 \Delta.$$
At $X_0 = 0$ the range of $\dot g(0) : \Delta \mapsto A_1 \Delta$ is the whole $U$, as the equation $A_1 \Delta = Y$ has a solution $\Delta = A_1^+ Y$ for every $Y \in U$. According to Theorem 11.12, for every $\epsilon > 0$ there exists $\delta > 0$ such that the equation $g(H + W) = A_1 H$ has a solution $W$ with $\|W\| \le \epsilon \|H\|$ whenever $\|H\| \le \delta$.
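For a finite dimensional sanity check (an added sketch; to keep it simple, $A_1$ is taken invertible, so $A_1^+ = A_1^{-1}$, and $\|A_0\|$ is small), a solution can be produced by the contractive iteration $X \mapsto -A_1^{-1}(A_0 + X^2)$:

```python
# Solving X^2 + A1 X + A0 = 0 for small A0 by fixed-point iteration (sketch).
import numpy as np

rng = np.random.default_rng(2)
n = 3
A1 = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # invertible perturbation of I
A0 = 0.01 * rng.standard_normal((n, n))             # ||A0|| small
A1_inv = np.linalg.inv(A1)

X = np.zeros((n, n))
for _ in range(100):                   # X <- -A1^{-1} (A0 + X^2), a contraction here
    X = -A1_inv @ (A0 + X @ X)
print(np.linalg.norm(X @ X + A1 @ X + A0))          # residual ~ 0
```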
Another version of the implicit mapping theorem follows easily from Theorem 11.13.
where
$$L(u) = q\,\phi(u) + p\, g(u) \qquad (11.17)$$
is the Lagrangian defined by $p$ and $q$.
(a) $\Omega$ is convex and has a strong interior point $w_0$ (i.e. there exists $\epsilon > 0$ such that $w \in \Omega$ whenever $\|w - w_0\| < \epsilon$);

(c) there exists a bounded linear function $G : U \to V$ such that condition (11.10) is satisfied;

(d) $R(G) = V$.

Then there exist a bounded linear functional $p \in V^*$ and a number $q \in \{0, 1\}$ ($q = 1$ when $p = 0$) such that condition (11.16) is satisfied for the Lagrangian (11.17).
Conditions (a)-(d) of Theorem 11.15 are regularity assumptions: (a) means that $\Omega$ is a nice fat set; (b) means that $\Omega$ is linearizable around $u_0$; (c) is a more restrictive local linearizability assumption imposed on $g$ (condition (c) implies that $G = \dot g(u_0)$ is the Frechet derivative of $g$ at $u_0$, and is in turn implied by continuity and continuous Frechet differentiability of $g$ in a neighborhood of $u_0$); (d) ensures that the constraint $g(u) = 0$ is regular (i.e. defines a smooth manifold of solutions) near $u_0$. Condition (e) is the assumption of optimality of $u_0$ in the optimization setup from (11.15).
Proof. Let $C = \dot\phi(u_0)$. The first step of the proof establishes that $C(u - u_0) \ge 0$ whenever $u$ is a strong interior point of $\Omega$ such that $G(u - u_0) = 0$. Indeed, if $u + \eta \in \Omega$ whenever $\|\eta\| < d$ (where $d > 0$) then

whenever
$$\|w\| \le \epsilon_0 \|t(u - u_0)\|, \qquad \epsilon_0 = \frac{d}{\|u - u_0\|}.$$
Since, for every $\epsilon > 0$, the implicit mapping theorem guarantees, for $t > 0$ small enough, the existence of $w_t \in U$ such that $\|w_t\| < \epsilon \|t(u - u_0)\|$ and $g(u_0 + t(u - u_0) + w_t) = 0$, the optimality of $u_0$ implies $\phi(u_0 + t(u - u_0) + w_t) \ge \phi(u_0)$. Since
where $\operatorname{int}(\Omega)$ denotes the set of all strong interior points of $\Omega$. Since $\Omega$ is convex, so is $\operatorname{int}(\Omega)$, and hence so is $\Delta$. Since $\Omega$ has a strong interior point, so does $\operatorname{int}(\Omega)$. Since $R(G) = V$, this yields the existence of an interior point of $\Delta$. As was established in the first step of the proof, $0 \notin \Delta$. Hence, according to the Hahn-Banach theorem, there exists a non-zero linear functional $f : \mathbb{R} \times V \to \mathbb{R}$ such that $f(z) \ge 0$ for all $z \in \Delta$.

Since every linear functional $f : \mathbb{R} \times V \to \mathbb{R}$ has the form
The Lagrange multiplier optimality conditions of Theorem 11.15 can be used to prove
the following necessary condition of optimality for a given triple (T, u(), x()) from (11.19).
Theorem 11.16 Assume that the pair $(A, B)$ of real matrices of dimensions n-by-n and n-by-1 is controllable, and the function $h : \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable. Let $M$ be the set of all triplets $(T, u, x)$, where $T > 0$, $u : [0,T] \to [-1, 1]$ is integrable, and $x : [0,T] \to \mathbb{R}^n$ satisfies (11.19) and $h(x(T)) = 0$. Assume that a triplet $(T, u, x) \in M$ is optimal (in the sense that $T_1 \ge T$ for every $(T_1, u_1(\cdot), x_1(\cdot)) \in M$), and such that $\dot h(x(T))$ has rank $m$. Then there exists $b \in \mathbb{R}^m$, $|b| = 1$, such that
and
$$\operatorname{sgn}[y] = \begin{cases} \{1\}, & y > 0, \\ \{-1\}, & y < 0, \\ [-1, 1], & y = 0. \end{cases}$$
Once $T > 0$ and $b \in \mathbb{R}^m$, $|b| = 1$, are fixed, $u(t)$ is uniquely defined for all $t$ except, possibly, a finite number of instances where $B'\psi(t) = 0$. Hence $x = x(\cdot)$ is completely determined by $T$ and $b$, and the constraints $h(x(T)) = 0$, $|b| = 1$ become a system of $m + 1$ scalar equations with $m + 1$ scalar variables. In applications, the optimal control strategy is computed by solving these equations numerically.
Proof. Let $\Omega_0$ be the subset of the Banach space $U = \mathbb{R} \times L^{\infty}[0,1]$ consisting of all pairs $(T, v)$ with $T > 0$ and $v \in L^{\infty}[0,1]$. Let $\phi : \Omega_0 \to \mathbb{R}$ and $g : \Omega_0 \to \mathbb{R}^m$ be defined by

Let $\Omega \subset \Omega_0$ be the set of all pairs $(T, v) \in U$ with $T > 0$ and $\|v\|_{\infty} \le 1$.

Since the relations
$$v(\tau) = u(T\tau), \qquad z(\tau) = x(T\tau)$$
define a bijection between the triples $(T, u, x) \in M$ and the triples $(T, v, z)$ satisfying the conditions $(T, v) \in \Omega$ and (11.22), the original optimization task is equivalent to the problem of minimizing $\phi(T, v)$ over the pairs $(T, v) \in \Omega$ satisfying $g(T, v) = 0$.
Since $\phi$ is linear on its domain, and the set $\Omega_0$ contains a ball around every point $(T, v) \in \Omega_0$, $\phi$ is Frechet differentiable on $\Omega_0$, and its derivative is given by
$$\dot\phi(T, v)(\delta T, \delta v) = \delta T.$$
Let us show that the function $S : \Omega_0 \to \mathbb{R}^n$ mapping $(T, v) \in \Omega_0$ to $z(1)$ according to (11.22) is Frechet differentiable at every point of its domain, and that its derivative is given by

Indeed, (11.23) defines a bounded linear function $U \to \mathbb{R}^n$, and the difference

which makes $\Delta = o(\|(\delta T, \delta v)\|)$. Note also that $\dot S(T, v)$ depends continuously on $(T, v) \in \Omega_0$.

Since $h$ is assumed continuously differentiable, the function $g$ is continuously Frechet differentiable on $\Omega_0$. Moreover, since the pair $(A, B)$ is controllable, $R(\dot S(T, v)) = \mathbb{R}^n$ for all $(T, v) \in \Omega_0$. Since it is assumed that $R(\dot h(x(T))) = \mathbb{R}^m$, it follows that $R(\dot g(T, v)) = \mathbb{R}^m$ at the optimum.
Applying Theorem 11.15 with $V = \mathbb{R}^m$ yields the existence of $q \in \{0, 1\}$ and $b \in \mathbb{R}^m$ ($b \neq 0$ when $q = 0$) such that
$$q\,\dot\phi(T, v)(\delta T, \delta v) + b'\,\dot g(T, v)(\delta T, \delta v) \ge 0 \qquad (11.24)$$
whenever $T + \delta T > 0$ and $\|v + \delta v\|_{\infty} \le 1$. Note that it is impossible to have $b = 0$, because in that case (11.24) would require $\delta T \ge 0$ whenever $T + \delta T > 0$, which is impossible for $T > 0$.
Rewriting (11.24) in a more explicit way, and setting $\delta T = 0$, yields
$$b'\,\dot h(x(T))\, x_{\delta}(T) \ge 0$$
subject to
$$\dot x_{\delta}(t) = A x_{\delta}(t) + B u_{\delta}(t), \qquad x_{\delta}(0) = 0, \qquad (11.25)$$
and $\|u + u_{\delta}\|_{\infty} \le 1$, where
$$x_{\delta}(t) \stackrel{\rm def}{=} z_{\delta}(t/T), \qquad u_{\delta}(t) \stackrel{\rm def}{=} v_{\delta}(t/T).$$
Let $\psi : [0,T] \to \mathbb{R}^n$ be defined by (11.21). Then, due to integration by parts and (11.25),
$$b'\,\dot h(x(T))\, x_{\delta}(T) = \int_0^T \psi(t)' B u_{\delta}(t)\, dt \ge 0$$
whenever $\|u + u_{\delta}\|_{\infty} \le 1$. Since $u_{\delta}(t)$ can be made positive when $u(t) = -1$, negative when $u(t) = 1$, and of either sign when $|u(t)| < 1$, it follows that $B'\psi(t)$ is non-positive when $u(t) = 1$, non-negative when $u(t) = -1$, and zero when $|u(t)| < 1$, which is equivalent to (11.20).
Index

Fermat's Little Theorem, 42
field
    characteristic, 43
    definition, 35
    equivalent (isomorphic), 38
    isomorphic (equivalent), 38
    shortcut notation, 36
    unit, 35
spectrum
    essential, 131
    of complex linear operator, 130
theorem
    Little Fermat's, 42
vector space
    complex, 45
    over a field, 44
    real, 4
    shortcut notation, 5
    zero, of, 4, 44