
Vector Calculus

Andrew Monnot

Contents

1 Vector Spaces
  1.1 Basic Terminology
  1.2 Bases and Dimension of Vector Spaces
  1.3 Functions between Vector Spaces

2 The Vector Space Rn
  2.1 Magnitude in R
  2.2 Magnitudes in Rn
  2.3 Dot Product and Distance in Rn
  2.4 Hyperplanes
  2.5 The Cross Product

3 Limits and Continuity of Functions between Vector Spaces
  3.1 Limits
  3.2 Continuity

4 Differentiation
  4.1 Differentiation in Vector Spaces
  4.2 Differentiation in Rn
  4.3 Differentials and Tangent Planes in Rn
  4.4 Optimization

5 Integration
  5.1 Integration in Vector Spaces?
  5.2 Integration in Rn
  5.3 Change of Variables
  5.4 Subordinate Integration

6 Vector Calculus in R3
  6.1 Gradient, Curl, and Divergence
  6.2 Main Theorems

7 Applications
  7.1 Vortex Dynamics
  7.2 Electrodynamics

1 Vector Spaces
1.1 Basic Terminology
We begin with the formal definition of a vector space.
Definition 1.1. A vector space over a field F (also called an F-vector space)
is a set V together with a binary operation + : V × V → V (called addition), a function
· : F × V → V (called scalar multiplication), and an element 0 ∈ V (called the zero
vector) such that

u + v = v + u (commutativity)
(u + v) + w = u + (v + w) (associativity)
v + 0 = v (additive identity)
v + (−v) = 0 (existence of an additive inverse −v)
r · (u + v) = r · u + r · v (distributivity over vector addition)
(r + s) · v = r · v + s · v (distributivity over scalar addition)
(rs) · v = r · (s · v) (associativity of scalar multiplication)
1 · v = v (scalar identity)

for all u, v, w ∈ V and r, s ∈ F, where 1 denotes the multiplicative identity of F. We will
call elements of V vectors and elements of F scalars.
For example, the real numbers R form a vector space over themselves: addition is just ordinary
addition, and scalar multiplication is ordinary multiplication. The complex numbers C form
a vector space over the real numbers (and over themselves). Recall that if we have two sets X
and Y , we can define the Cartesian product as the set

X × Y = {(x, y) : x ∈ X and y ∈ Y }.

Definition 1.2. Let U and V be vector spaces over a field F. We define the product
vector space U × V as the Cartesian product of U and V together with the following
definitions:

0 = (0, 0)
(u1 , v1 ) + (u2 , v2 ) = (u1 + u2 , v1 + v2 )
r · (u, v) = (ru, rv).

One can easily verify that U × V becomes an F-vector space with the above rules
(note that by ru we mean r · u, and we will omit the · from now on when performing scalar
multiplication). It follows that Rn = R × · · · × R (n factors) is a real vector space, and that Cn is both a
real and complex vector space (meaning its scalars may come from the field R or C).

1.2 Bases and Dimension of Vector Spaces


Definition 1.3. A subset S ⊆ V of vectors is called a spanning set (or is said to span
V ) if every v ∈ V can be written as a finite linear combination

v = c1 e1 + · · · + cn en

for some e1 , ..., en ∈ S and c1 , ..., cn ∈ F.

Example 1.4. Let V = R2 and S = {î, ĵ} where î = (1, 0) and ĵ = (0, 1). Then S spans
R2 since for any v = (a, b) ∈ R2 we can write

(a, b) = (a, 0) + (0, b) = a(1, 0) + b(0, 1) = aî + bĵ.

Definition 1.5. A subset L ⊆ V is said to be linearly independent (or: its elements
are linearly independent) if for any finite number of elements {v1 , ..., vn } in L,

c1 v1 + · · · + cn vn = 0 =⇒ c1 = · · · = cn = 0.

Definition 1.6. A subset B ⊆ V of vectors is called a basis of/for V if B spans V and
is linearly independent.

Example 1.7. We showed that {î, ĵ} spanned R2 . Now observe that

aî + bĵ = 0 ⇐⇒ a(1, 0) + b(0, 1) = (a, 0) + (0, b) = (a, b) = (0, 0)

and hence a = 0 and b = 0. So {î, ĵ} is a linearly independent set, and is hence a basis of
R2 .

Proposition 1.8. Let B and B′ be two bases for a vector space V. Then B and B′ have
the same number of elements (cardinality). Moreover, every vector space has a basis.

Definition 1.9. Let B be a basis of V. We define the dimension of V over its field F
by
dimF (V ) = |B|,
where |B| is the cardinality of B.

Example 1.10. Let V = Rn and ei = (0, ..., 1, ..., 0) be the vector in Rn with a 1 in the
ith component and 0s in each of the other components. Then {e1 , ..., en } is a basis of Rn
and hence
dimR (Rn ) = n.

1.3 Functions between Vector Spaces


Functions between two F-vector spaces U and V are just functions f : U → V. Unless
otherwise specified, we will assume all vector spaces we mention have scalars that come
from the same field. Sometimes a function between two vector spaces is special:

Definition 1.11. A function f : U → V is called a linear map (or more technically, a
homomorphism of vector spaces) if for all x, y ∈ U and r ∈ F we have

f (x + y) = f (x) + f (y)
f (rx) = rf (x).

Example 1.12. Any function f : R → R of the form f (x) = ax is a linear map since

f (x + y) = a(x + y) = ax + ay = f (x) + f (y)

and
f (rx) = a(rx) = r(ax) = rf (x).

Proposition 1.13. If f : U → V is a linear map between vector spaces, then f (0) = 0.

Proof. Since f is a linear map and every vector x ∈ U has an inverse −x such that
x + (−x) = 0, we have that

f (0) = f (x + (−x)) = f (x) + f (−x) = f (x) − f (x) = 0

for any choice of a vector x ∈ U. □


Now suppose U and V are finite dimensional vector spaces over a field F and f : U →
V is a function between them. Suppose dim(U ) = m and dim(V ) = n. Then for x ∈ U we
can write x = (x1 , ..., xm ) where xi ∈ F. Similarly we can write f (x) = (f1 (x), ..., fn (x))
such that fi : U → F.

2 The Vector Space Rn
2.1 Magnitude in R
For x ∈ R, we define its norm (or magnitude) as its absolute value |x|, which is defined
as

|x| = x if x > 0,    |x| = −x if x < 0,    |x| = 0 if x = 0.

And for two real numbers x, y ∈ R, we define their distance as the norm of the difference:
d(x, y) = |x − y|.

Proposition 2.1. For x, y ∈ R we have

(a) |x| ≥ 0 and |x| = 0 iff x = 0.

(b) |xy| = |x||y|.

(c) |x + y| ≤ |x| + |y|.

Proof. (a) This follows from the definition (if x < 0, then −x > 0.) (b) xy < 0 iff
x < 0 or y < 0 but not both. Hence |xy| = −xy = |x||y|. xy > 0 iff x, y > 0 or x, y < 0.
In either case, |xy| = xy = (−x)(−y) = |x||y|. And |xy| = 0 iff x = 0 or y = 0 and hence
iff |x||y| = 0. (c) Note that we have −|x| ≤ x ≤ |x| and −|y| ≤ y ≤ |y|. If we add these
two inequalities we obtain

−(|x| + |y|) ≤ x + y ≤ |x| + |y|,

or equivalently, |x + y| ≤ |x| + |y|. □

Proposition 2.2. For x, y, z ∈ R we have

(a) |x − y| ≥ 0 and |x − y| = 0 iff x = y.

(b) |x − y| = |y − x|.

(c) |x − y| ≤ |x − z| + |z − y|.

Proof. (a) Nonnegativity follows immediately from the previous proposition, as does
the fact that |x − y| = 0 iff x − y = 0 iff x = y. (b) We have

|x − y| = |(−1)(y − x)| = | − 1||y − x| = |y − x|.

(c) We have
|x − y| = |(x − z) + (z − y)| ≤ |x − z| + |z − y|
by the previous proposition. □

2.2 Magnitudes in Rn
Can we construct a notion of magnitude k · k on Rn that satisfies the same properties?
Namely, is there a function k · k : Rn → R such that
(a) kxk ≥ 0 and kxk = 0 iff x = 0,
(b) kcxk = |c|kxk, and
(c) kx + yk ≤ kxk + kyk,
which we will call a norm. Note that in condition (b) above we look at a vector x ∈ Rn
being multiplied by a scalar c ∈ R since we don’t yet have a notion of a product of vectors.
The first guess for x = (x1 , ..., xn ) ∈ Rn might be

kxk = kxk1 = |x1 | + · · · + |xn |.

Proposition 2.3. k · k1 is a norm on Rn .

Proof. As a sum of absolute values, clearly kxk1 ≥ 0. Moreover kxk1 = 0 iff xi = 0
for each xi (as we saw for absolute value) and hence iff x = 0 (since the zero vector in
Rn is the vector with 0 in every component). Note for c ∈ R we also have

kcxk1 = k(cx1 , ..., cxn )k1 = |cx1 | + · · · + |cxn | = |c|(|x1 | + · · · + |xn |) = |c|kxk1 .

And lastly

kx + yk1 = k(x1 , ..., xn ) + (y1 , ..., yn )k1
         = k(x1 + y1 , ..., xn + yn )k1
         = |x1 + y1 | + · · · + |xn + yn |
         ≤ (|x1 | + |y1 |) + · · · + (|xn | + |yn |)
         = (|x1 | + · · · + |xn |) + (|y1 | + · · · + |yn |)
         = kxk1 + kyk1 . □
Let us also define, for x ∈ Rm ,

kxkn = ( |x1 |^n + · · · + |xm |^n )^(1/n).

Proposition 2.4. k · kn is a norm on Rm .
Proof. kxkn ≥ 0 as it is a root of a sum of nonnegative numbers, and this sum is zero
iff each |xi | = 0, hence iff x = 0. We also have

kcxkn^n = |cx1 |^n + · · · + |cxm |^n = |c|^n ( |x1 |^n + · · · + |xm |^n ) = |c|^n kxkn^n

and hence kcxkn = |c|kxkn . The last part, kx + ykn ≤ kxkn + kykn , will be proved for
n = 2 once we discuss the dot product. □
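As a quick numerical sanity check of these norm axioms (a sketch of our own, not part of the original development; it assumes NumPy, and p_norm is a hypothetical helper name):

import numpy as np

def p_norm(x, p):
    """The p-norm ||x||_p = (sum |x_i|^p)^(1/p) defined above."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
c = -2.5

for p in (1, 2, 3):
    # homogeneity: ||c x|| = |c| ||x||
    assert np.isclose(p_norm(c * x, p), abs(c) * p_norm(x, p))
    # triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p)
print("norm axioms hold on this example")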
In R, |x − y| satisfied additional properties.
Proposition 2.5. For x, y, z ∈ Rm we have

(a) kx − ykn ≥ 0 and kx − ykn = 0 iff x = y.

(b) kx − ykn = ky − xkn .

(c) kx − ykn ≤ kx − zkn + kz − ykn .

Proof. (a) This again follows immediately from the fact that k · kn is a norm. (b)
We have

kx − ykn = ( |x1 − y1 |^n + · · · + |xm − ym |^n )^(1/n)
         = ( |(−1)(y1 − x1 )|^n + · · · + |(−1)(ym − xm )|^n )^(1/n)
         = ( | − 1|^n |y1 − x1 |^n + · · · + | − 1|^n |ym − xm |^n )^(1/n)
         = ( |y1 − x1 |^n + · · · + |ym − xm |^n )^(1/n)
         = ky − xkn .

(c) And this similarly follows immediately:

kx − ykn = k(x − z) + (z − y)kn ≤ kx − zkn + kz − ykn . □

2.3 Dot Product and Distance in Rn


Definition 2.6. The dot product is the function · : Rn × Rn → R defined by

x · y = x1 y1 + · · · + xn yn .

Note that we have kxk2^2 = x · x and hence

(x + y) · (x + y) = kx + yk2^2
                  = (x1 + y1 )^2 + · · · + (xn + yn )^2
                  = (x1^2 + 2x1 y1 + y1^2 ) + · · · + (xn^2 + 2xn yn + yn^2 )
                  = x · x + 2(x · y) + y · y
                  = kxk2^2 + 2(x · y) + kyk2^2 .

Because of this relation of the dot product to the norm k · k2 , we will henceforth denote
k · k2 simply by k · k.
Theorem 2.7. (Cauchy-Schwarz Inequality) If x, y ∈ Rn , then

|x · y| ≤ kxkkyk.

Proof. If y = 0, the result is trivial. Now let y ≠ 0 and note that for any
t ∈ R we have

0 ≤ kx − tyk^2 = kxk^2 − 2t(x · y) + t^2 kyk^2 .

Let t = x · y/kyk^2 ; then we obtain

0 ≤ kxk^2 − 2 (x · y)^2 /kyk^2 + (x · y)^2 kyk^2 /kyk^4
  = kxk^2 − (x · y)^2 /kyk^2
  = ( kxk^2 kyk^2 − (x · y)^2 ) / kyk^2 .

Thus 0 ≤ kxk^2 kyk^2 − (x · y)^2 , which implies (x · y)^2 ≤ kxk^2 kyk^2 and hence gives us
the result. □

Corollary 2.8. k · k is a norm.

Proof. We've already shown the first two conditions earlier for k · kn . It remains to show
that kx + yk ≤ kxk + kyk. We have

kx + yk^2 = kxk^2 + 2(x · y) + kyk^2
          ≤ kxk^2 + 2kxkkyk + kyk^2
          = (kxk + kyk)^2 ,

and hence kx + yk ≤ kxk + kyk. □

Definition 2.9. Two vectors x, y ∈ Rn are orthogonal if x · y = 0. We define the angle
θ ∈ [0, π] between nonzero x and y by the formula

cos θ = (x · y) / (kxkkyk).

Hence x and y are orthogonal iff θ = π/2.
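A quick numerical illustration of this definition (a sketch of our own, assuming NumPy; Cauchy-Schwarz puts the quotient in [−1, 1], and the clip only guards against floating-point round-off):

import numpy as np

def angle(x, y):
    """Angle between nonzero vectors via cos(theta) = x.y / (||x|| ||y||)."""
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

x = np.array([1.0, 0.0, 1.0])
y = np.array([0.0, 2.0, 0.0])
print(angle(x, y))   # pi/2: these vectors are orthogonal
print(abs(np.dot(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y))  # Cauchy-Schwarz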
Corollary 2.10. (Law of Cosines)

kx + yk^2 = kxk^2 + 2kxkkyk cos θ + kyk^2 .

Corollary 2.11. (Pythagorean Theorem) If x and y are orthogonal, then

kx − yk^2 = kxk^2 + kyk^2 .

Since ei and ej are orthogonal for i ≠ j in Rn , the Pythagorean theorem gives us the
intuition for defining distance.

Definition 2.12. We define the distance between two vectors x, y ∈ Rn by

d(x, y) = kx − yk.

2.4 Hyperplanes
To define a hyperplane in Rn , we want to know a direction n that it faces (a normal
direction) and one point p on the hyperplane. We then take all points x whose
displacement x − p from p is orthogonal to n.

Definition 2.13. We define the hyperplane through p with normal vector n ≠ 0 by

Πn (p) = {x ∈ Rm : (x − p) · n = 0}.

That is, n is orthogonal to the displacement x − p of every point x on the plane. It follows
that points x = (x1 , ..., xm ) on hyperplanes in Rm satisfy the equation

n1 (x1 − p1 ) + · · · + nm (xm − pm ) = 0

or equivalently

n1 x1 + · · · + nm xm = n1 p1 + · · · + nm pm = C

for a choice of n, p. Hence a hyperplane is a vector space iff C = 0, for in this case we'd
have n1 x1 + · · · + nm xm = 0 and hence

n1 (cx1 ) + · · · + nm (cxm ) = c(n1 x1 + · · · + nm xm ) = c0 = 0.

So cx ∈ Πn (p). And if y is also in such a hyperplane, then

n1 (x1 + y1 ) + · · · + nm (xm + ym ) = (n1 x1 + · · · + nm xm ) + (n1 y1 + · · · + nm ym ) = 0 + 0 = 0.

So x + y ∈ Πn (p). In this case (that is, when p · n = 0) we have

dimR (Πn (p)) = m − 1.

2.5 The Cross Product


For this section we will let V = R3 .

Definition 2.14. We define the cross product as the function × : R3 × R3 → R3
defined by

            | e1  e2  e3 |
x × y = det | x1  x2  x3 | = (x2 y3 − x3 y2 , x3 y1 − x1 y3 , x1 y2 − x2 y1 ).
            | y1  y2  y3 |

Proposition 2.15. For x, y, z ∈ R3 and c ∈ R we have

(a) x × x = 0

(b) x × y = −(y × x)

(c) (cx) × y = x × (cy) = c(x × y)

(d) x × (y + z) = (x × y) + (x × z)

(e) (x × y) · z = x · (y × z)

(f) x × (y × z) = (x · z)y − (x · y)z

(g) kx × yk^2 = kxk^2 kyk^2 − (x · y)^2

(h) If x × y ≠ 0, then x × y is orthogonal to x and y.


Proof. (a)-(c) are obvious. (d) We have

x × (y + z) = (x2 (y3 + z3 ) − x3 (y2 + z2 ), x3 (y1 + z1 ) − x1 (y3 + z3 ), x1 (y2 + z2 ) − x2 (y1 + z1 ))


= (x2 y3 − x3 y2 , x3 y1 − x1 y3 , x1 y2 − x2 y1 ) + (x2 z3 − x3 z2 , x3 z1 − x1 z3 , x1 z2 − x2 z1 )
= (x × y) + (x × z).

(e)

(x × y) · z = (x2 y3 − x3 y2 )z1 + (x3 y1 − x1 y3 )z2 + (x1 y2 − x2 y1 )z3


= x1 (y2 z3 − y3 z2 ) + x2 (y3 z1 − y1 z3 ) + x3 (y1 z2 − y2 z1 )
= x · (y × z).

(f) This is a direct componentwise computation, which we leave to the reader. (g)

kx × yk^2 = (x × y) · (x × y)
          = x · (y × (x × y))
          = x · ( kyk^2 x − (y · x)y )
          = kxk^2 kyk^2 − x · ((y · x)y)
          = kxk^2 kyk^2 − (x · y)^2 ,

using (e) in the second equality and (f) in the third. (h) Suppose x × y ≠ 0, then

(x × y) · x = −(y × x) · x = −y · (x × x) = −y · 0 = 0.

A similar argument works for y. □

Corollary 2.16.

x × (y × z) + z × (x × y) + y × (z × x) = 0.

Proof. If we let J denote the expression on the left, then by (f) above,

J = (x · z)y − (x · y)z + (z · y)x − (z · x)y + (y · x)z − (y · z)x = 0. □

The above identity is called the Jacobi identity.
Corollary 2.17.

kx × yk = kxkkyk sin θ.

Proof.

kx × yk^2 = kxk^2 kyk^2 − (x · y)^2
          = kxk^2 kyk^2 − (kxkkyk cos θ)^2
          = kxk^2 kyk^2 (1 − cos^2 θ)
          = kxk^2 kyk^2 sin^2 θ,

which yields the result (since sin θ ≥ 0 for θ ∈ [0, π]). □
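As a numerical spot check of Proposition 2.15 and its corollaries (a sketch of our own using NumPy, with arbitrarily chosen vectors):

import numpy as np

x = np.array([1.0, 2.0, -1.0])
y = np.array([3.0, 0.5, 2.0])
z = np.array([-2.0, 1.0, 4.0])

cx = np.cross(x, y)
# (h) x × y is orthogonal to both factors
assert np.isclose(np.dot(cx, x), 0) and np.isclose(np.dot(cx, y), 0)
# (f) the triple product expansion x × (y × z) = (x·z)y − (x·y)z
lhs = np.cross(x, np.cross(y, z))
rhs = np.dot(x, z) * y - np.dot(x, y) * z
assert np.allclose(lhs, rhs)
# (g) Lagrange's identity ||x × y||^2 = ||x||^2 ||y||^2 − (x·y)^2
assert np.isclose(np.dot(cx, cx),
                  np.dot(x, x) * np.dot(y, y) - np.dot(x, y) ** 2)
print("cross product identities verified on this example")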


While this cross product is only defined on R3 , we will later present another cross
product on R7 and explain why these are the only two dimensions with a nontrivial cross
product compatible with magnitude.

3 Limits and Continuity of Functions between Vector Spaces
In order to discuss limits and continuity on vector spaces, we need a notion of "closeness".
If our vector space has a norm k · k, then we will say two vectors x and y are ε-close for
a real number ε > 0 if there is some vector z ∈ V such that

x, y ∈ Bε (z) = {v ∈ V : kz − vk < ε}.

Limits can also be defined on more abstract vector spaces that only have a notion of
distance (metric vector spaces), or, even more generally, a more abstract notion of
closeness (topological vector spaces). But we will limit ourselves to vector spaces with
norms (normed spaces) whose field is R.

3.1 Limits
Definition 3.1. Let U and V be normed spaces with norms k · kU and k · kV respectively.
Then for a function f : U → V, we say L is the limit of f as x → a, denoted

lim_{x→a} f (x) = L

for x, a ∈ U and L ∈ V, if for every ε > 0 there is a δ > 0 such that

kx − akU < δ =⇒ kf (x) − LkV < ε.

Proposition 3.2. Let f, g : U → V be functions between normed spaces such that
lim_{x→a} f (x) and lim_{x→a} g(x) exist, and let c ∈ R. Then

(a) lim_{x→a} (f (x) + g(x)) = lim_{x→a} f (x) + lim_{x→a} g(x).

(b) lim_{x→a} (cf (x)) = c lim_{x→a} f (x).

(c) k lim_{x→a} f (x)k = lim_{x→a} kf (x)k.

Proof. We omit (a) and (b) as their proofs are essentially the same as those from
calculus. (c) Let L = lim_{x→a} f (x). Then for every ε > 0 there is a δL such that

kx − ak < δL =⇒ kf (x) − Lk < ε.

We wish to show lim_{x→a} kf (x)k = kLk. Let ε > 0 and let δ = δL . Then by the reverse
triangle inequality,

kx − ak < δL =⇒ | kf (x)k − kLk | ≤ kf (x) − Lk < ε,

which gives us the result. □

Proposition 3.3. Let f, g : Rn → Rm , then

lim_{x→a} (f (x) · g(x)) = ( lim_{x→a} f (x) ) · ( lim_{x→a} g(x) ).

Also if f, g : R3 → R3 , then

lim_{x→a} (f (x) × g(x)) = ( lim_{x→a} f (x) ) × ( lim_{x→a} g(x) ).

Proof. Let f (x) = (f (x)1 , ..., f (x)m ) and g(x) = (g(x)1 , ..., g(x)m ). Then

lim_{x→a} (f (x) · g(x)) = lim_{x→a} ( f (x)1 g(x)1 + · · · + f (x)m g(x)m )
                        = lim_{x→a} (f (x)1 g(x)1 ) + · · · + lim_{x→a} (f (x)m g(x)m )
                        = ( lim_{x→a} f (x)1 )( lim_{x→a} g(x)1 ) + · · · + ( lim_{x→a} f (x)m )( lim_{x→a} g(x)m )
                        = ( lim_{x→a} f (x) ) · ( lim_{x→a} g(x) ).

The latter claim follows from the fact that

lim_{x→a} f (x) = ( lim_{x→a} f (x)1 , lim_{x→a} f (x)2 , lim_{x→a} f (x)3 ). □

We also have a claim regarding composition of functions.

Proposition 3.4. Let f : U → V and g : V → W be such that lim_{x→a} f (x) = L and
lim_{y→L} g(y) = M. Then

lim_{x→a} (g ◦ f )(x) = lim_{y→L} g(y) = M.

Proof. Let ε > 0. There is a δ2 > 0 such that

ky − Lk < δ2 =⇒ kg(y) − M k < ε,

and in turn a δ1 > 0 such that

kx − ak < δ1 =⇒ kf (x) − Lk < δ2 .

Hence

kx − ak < δ1 =⇒ kg(f (x)) − M k < ε. □
Proposition 3.5. Let f : Rn → Rm . If lim_{x→a} f (x) exists and each of the iterated
limits below exists, then

lim_{x→a} f (x) = lim_{x1 →a1 } lim_{x2 →a2 } · · · lim_{xn →an } f (x),

and the iterated limits may be taken in any order. (Note that the existence of the iterated
limits alone does not guarantee that the joint limit exists.)

3.2 Continuity
Definition 3.6. Let f : U → V be a function between normed spaces and Ω ⊆ U. f is
continuous on Ω iff for all a ∈ Ω

lim_{x→a} f (x) = f (a).

In other words f is continuous on Ω iff for every ε > 0 and a ∈ Ω, there is a δa such
that
kx − akU < δa =⇒ kf (x) − f (a)kV < ε.
That is, the limits of f exist at all points a ∈ Ω and are equal to f (a).

Proposition 3.7. Let f, g : U → V be continuous and h : V → W be continuous on
ran f. Then

(a) f + g is continuous on U.

(b) cf is continuous on U for c ∈ R.

(c) the product f g is continuous on U (when V = R; for vector-valued functions,
the dot product f · g is continuous).

(d) h ◦ f is continuous on U.

We leave the proof to the reader.

4 Differentiation
4.1 Differentiation in Vector Spaces
Definition 4.1. Let f : U → V be a function between normed spaces. We say that f
is differentiable at x ∈ U if there is a bounded linear function Dfx : U → V (bounded
meaning kDfx k = sup_{h≠0} kDfx (h)kV /khkU < ∞) such that

lim_{h→0} kf (x + h) − f (x) − Dfx (h)kV / khkU = 0.

f is differentiable on U if it is differentiable at every x ∈ U (that is, for every x ∈ U,
there is a Dfx satisfying the above limit). In this case we call Df the derivative (or
Fréchet derivative) of f.

That is, f is differentiable at x if for every ε > 0, there is a δx > 0 such that

khkU < δx =⇒ kf (x + h) − f (x) − Dfx (h)kV / khkU < ε,

since the fraction given is a function Fx : U → R with

Fx (h) = kf (x + h) − f (x) − Dfx (h)kV / khkU

and we are looking at lim_{h→0} Fx (h).
Proposition 4.2. Let f : U → V be differentiable at a ∈ U; then f is continuous at a.

Proof. Let ε > 0; we wish to find a δ > 0 such that

kx − akU < δ =⇒ kf (x) − f (a)kV < ε.

Note that since f is differentiable at a, f (a) is defined, and for any ε′ > 0 we have a δa
such that

khkU < δa =⇒ kf (a + h) − f (a) − Dfa (h)kV / khkU < ε′,

or equivalently,

kf (a + h) − f (a) − Dfa (h)kV < ε′ khkU .

Taking ε′ = 1, the triangle inequality gives

kf (a + h) − f (a)kV ≤ kDfa (h)kV + khkU ≤ kDfa kkhkU + khkU = khkU (kDfa k + 1),

where kDfa k = sup_{h≠0} kDfa (h)kV /khkU . Since f is differentiable, kDfa k < ∞.
Hence if h = x − a and δ = min{δa , ε/(kDfa k + 1)} we obtain

kx − akU = khkU < δ =⇒ kf (x) − f (a)kV = kf (a + h) − f (a)kV
                                        ≤ khkU (kDfa k + 1)
                                        < ( ε/(kDfa k + 1) )(kDfa k + 1)
                                        = ε.

So f is continuous at a. □
Some standard properties follow, whose proofs we leave to the reader.

Proposition 4.3. Let f, g : U → V be differentiable on U and h : V → W be
differentiable on ran f with derivatives Df, Dg, and Dh respectively. Then

(i) cf is differentiable, and D(cf ) = cDf.

(ii) f + g is differentiable, and D(f + g) = Df + Dg.

(iii) (Chain Rule) h ◦ f is differentiable, and D(h ◦ f )x = Dhf (x) Dfx (where the
product is composition of linear maps; in coordinates, matrix multiplication).

4.2 Differentiation in Rn
Let f : Rn → R. Then the derivative Df of f (if it exists) would satisfy

lim_{h→0} |f (x + h) − f (x) − Dfx (h)| / khk = 0

for every x ∈ Rn .

Definition 4.4. Let f : Rn → R. We define the ith partial derivative of f (denoted
Di f, fxi , or ∂f /∂xi : Rn → R) by

∂f /∂xi (x) = lim_{h→0} ( f (x1 , ..., xi + h, ..., xn ) − f (x1 , ..., xn ) ) / h

if the limit exists.

We will often write ∂f /∂xi and omit the (x).

Proposition 4.5. Let f : Rn → Rm be differentiable with f (x) = (f1 (x), ..., fm (x)).
Then Di fj exists for each 1 ≤ i ≤ n and 1 ≤ j ≤ m.

Proof. Let h = tei with t > 0 (without loss of generality). Then since f is
differentiable (and hence each fj ) and ktei k = tkei k = t, we have

lim_{h→0} ( fj (x + h) − fj (x) − Dfj (tei ) ) / ktei k = lim_{t→0+} ( ( fj (x + tei ) − fj (x) ) / t − Dfj (ei ) ) = 0

and hence Dfj (ei ) = Di fj . □

Corollary 4.6. If f : Rn → Rm is differentiable, then its derivative Df (called the
Jacobian derivative of f ) is given by the m × n matrix of partial derivatives

      [ ∂f1 /∂x1 (x)  · · ·  ∂f1 /∂xn (x) ]
Dfx = [     ...        ...        ...     ]
      [ ∂fm /∂x1 (x)  · · ·  ∂fm /∂xn (x) ]

Corollary 4.7. If f : Rn → R, then its derivative, called the gradient, is given by

∇fx = ( ∂f /∂x1 (x), · · · , ∂f /∂xn (x) ).

Note, often in calculus/analysis texts, Dfx and ∇fx are written as Df (x) and ∇f (x).
We do not use this notation since we do not mean the matrix multiplication of Df and
x or ∇f and x. Contrastingly in the definition of the derivative we had Dfx (h) where we
did mean the matrix multiplication of Dfx and the vector h. Correspondingly, evaluation
of Dfx on a vector h ∈ Rn is defined by matrix multiplication (or in the case of the
gradient, via the dot product). We will henceforth leave out the (x) for convenience.
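To make the Jacobian concrete, here is a small numerical sketch (our own illustration, assuming NumPy; jacobian_fd is a hypothetical helper name) that approximates Dfx by central differences, one partial derivative per column, and can be checked against the hand-computed matrix of partials:

import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Approximate the Jacobian Df at x by central differences."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        J[:, i] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * eps)
    return J

# f : R^2 -> R^2, f(x, y) = (x^2 y, sin y)
f = lambda v: np.array([v[0] ** 2 * v[1], np.sin(v[1])])
x = np.array([1.0, np.pi / 4])
print(jacobian_fd(f, x))   # exact Jacobian: [[2xy, x^2], [0, cos y]]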

Theorem 4.8. If f : U → Rm with U ⊆ Rn is a function such that Di fj exists and is


continuous at a for all 1 ≤ i ≤ n and 1 ≤ j ≤ m, then f is differentiable at a.

Let us use the notation

Dij f = ∂^2 f /∂xi ∂xj = ∂/∂xi ( ∂f /∂xj ).

Theorem 4.9. If Dij f and Dji f exist and are continuous, then Dij f = Dji f.

Proposition 4.10. If f, g : Rn → Rm are differentiable, then

D(f · g) = g^T D(f ) + f ^T D(g)

(a 1 × n row vector, where the products are matrix multiplication). If f, g : Rn → R
are differentiable, then

∇(f g) = f ∇g + g∇f.

And if f, g : R3 → R3 are differentiable, then

D(f × g) = f × D(g) + D(f ) × g,

where the cross products are taken column by column.

Definition 4.11. Let f : Rn → R be differentiable and v ∈ Rn be nonzero. We define
the directional derivative of f in the direction of v as

∇v f = ∇f · (v/kvk).

Proposition 4.12. Let u = v/kvk. The directional derivative ∇v f satisfies

∇v f = lim_{t→0} ( f (x + tu) − f (x) ) / t.

Hence

∇ei f = ∂f /∂xi .

4.3 Differentials and Tangent Planes in Rn


Let f : Rn → R. We should want a differential of f, denoted df, to be a real value
representing the extent to which the function is changing. In the case n = 1, we have

df = f ′ dx = Df dx.

To still get a real value in n dimensions, we change the product to the dot product.

Definition 4.13. Let f : Rn → R. Then we define the differential of f as

df = Df · dx = ∇f · dx = (∂f /∂x1 ) dx1 + · · · + (∂f /∂xn ) dxn .

Similarly, if f : Rn → Rm , we define its differential via matrix multiplication as the vector

             [ ∂f1 /∂x1  · · ·  ∂f1 /∂xn ] [ dx1 ]   [ df1 ]
df = Df dx = [    ...     ...      ...    ] [ ... ] = [ ... ]
             [ ∂fm /∂x1  · · ·  ∂fm /∂xn ] [ dxn ]   [ dfm ]

Recall a hyperplane Π is given by the equation

n · (x − p) = 0

for a point p ∈ Π and a vector n orthogonal to points on the plane. Suppose we wish to
find a hyperplane tangent to f at point p for a differentiable function f. Then we need
only to find a normal vector n. In the case when f : R → R, we used the point-slope
method to obtain the tangent plane (which was a line) through the point (p, f (p)) that
also went through (a, f (a)) defined by

f (x) − f (a) = f ′(p)(x − a).

In this case, if we place the function in R2 so that f = {(x, f (x))}, we had a tangent line
at the point (p, f (p)). Hence

0 = n · ((x, f (x)) − (a, f (a))) = (n1 , n2 ) · ((x − a), (f (x) − f (a))).

Thus we had n = (f ′(p), −1). Analogously in Rn , consider the differential approximation
of a differentiable function f : Rn → R:

∆f = f (x) − f (a) = (∂f /∂x1 )(p)(x1 − a1 ) + · · · + (∂f /∂xn )(p)(xn − an ) = Σ_{i=1}^n (∂f /∂xi )(p) ∆xi .

Hence, placing the function into Rn+1 so that our second vector becomes A = (a1 , ..., an , f (a))
and the function becomes X = (x1 , ..., xn , f (x)), we would want

0 = n · (X − A) = −(f (x) − f (a)) + n1 (x1 − a1 ) + · · · + nn (xn − an ).

Hence

n = ( (∂f /∂x1 )(p), ..., (∂f /∂xn )(p), −1 ).
So, this characterizes an "(n−1)-dimensional" tangent hyperplane (with n+1 coordinates,
but note that fixing n of the coordinates of a point on the hyperplane constrains the last
one) in Rn+1 , while viewing f as an "n-dimensional" subset of Rn+1 consisting of all points
of the form (x1 , ..., xn , f (x)) (similarly, choosing the n values xi constrains f (x), hence
"n-dimensional").
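As a worked sketch (our own example, with f (x, y) = x^2 + y^2 and p = (1, 2)), the following Python code builds the tangent plane from the normal n = (∂f /∂x(p), ∂f /∂y(p), −1) and checks that it approximates f near p:

import numpy as np

f  = lambda x, y: x ** 2 + y ** 2
fx = lambda x, y: 2 * x          # partial derivatives, computed by hand here
fy = lambda x, y: 2 * y

p = np.array([1.0, 2.0])
P = np.array([p[0], p[1], f(*p)])        # point on the graph in R^3
n = np.array([fx(*p), fy(*p), -1.0])     # normal to the tangent plane

def tangent_plane(x, y):
    """Solve n . ((x, y, z) - P) = 0 for z."""
    return P[2] - (n[0] * (x - P[0]) + n[1] * (y - P[1])) / n[2]

# Near p the plane approximates f to first order:
print(f(1.1, 2.1), tangent_plane(1.1, 2.1))   # 5.62 vs 5.6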
The construction of a tangent hyperplane is the answer to the question: “Given a
differentiable function f : Rn → R and two points p ∈ Rn and a ∈ Rn+1 , can one

construct a hyperplane containing the points (p1 , ..., pn , f (p)) and a?” A weaker question
is: “Given a differentiable function f : Rn → R and a point a ∈ Rn+1 , can one find a
point p ∈ Rn such that there is a tangent hyperplane of f at p passing through a?” The
answer to this is also yes with some slight assumptions.
For two points a, b ∈ Rn , we define the line segment between them as

L(a, b) = {(1 − t)a + tb : t ∈ [0, 1]}.

Theorem 4.14. (Mean Value Theorem) Let U ⊆ Rn , f : U → R be differentiable,


a ∈ U, and L(x, a) ⊆ U. Then there exists a p ∈ L(x, a) such that

f (x) − f (a) = ∇f (p) · (x − a).

4.4 Optimization
Optimization is the study of critical points of differentiable functions. Like the differential
of a function, the notion of a critical point of a function doesn’t quite generalize to an
arbitrary differentiable function between vector spaces. We in turn only consider the
notion of critical points for differentiable functions f : Rn → Rm .
For motivation, if f : R → R is differentiable and f ′(x) = 0 for some x ∈ R, then we
call x a critical point. For functions f : Rn → R, if ∇f (x) = 0 for some x ∈ Rn , we will
say the same of x. But for f : Rn → Rm , we slightly weaken our requirements regarding
the Jacobian derivative Df.

Definition 4.15. Let f : Rn → Rm be a linear function. We define the rank of f by

rank f = dimR (im f ).

Definition 4.16. Let f : Rn → Rm be differentiable. x ∈ Rn is a critical point of f if

rank Df (x) < m.

A point which is not a critical point is called a regular point.

So for m = 1, we would need rank ∇f = 0, and hence that ∇f = 0.

Definition 4.17. Let f : Rn → R. A point p ∈ Rn is a

(a) local minimum of f if there is some r > 0 such that f (p) ≤ f (x) for all
x ∈ Br (p);

(b) local maximum of f if there is some r > 0 such that f (p) ≥ f (x) for all
x ∈ Br (p);

(c) local extremum of f if it is either a local minimum or local maximum of f.

Proposition 4.18. If p is a local extremum of a differentiable f, then ∇f (p) = 0 (and


hence p is a critical point).

Definition 4.19. If f : Rn → Rm , then x is a local (min/max/extreme) of f if it


is a local (min/max/extreme) of fi for some 1 ≤ i ≤ m. A critical point which is not a
local extremum is called a saddle point.

Theorem 4.20. (Second Derivatives Test) Let x be a critical point of a twice
continuously differentiable function f : Rn → R, and let D2 f (x) denote the Hessian
matrix of second partial derivatives at x.

(a) If all of the eigenvalues of D2 f (x) are negative, then x is a local maximum.

(b) If all of the eigenvalues of D2 f (x) are positive, then x is a local minimum.

What if we wish to find critical points of f : Rn → R subject to a constraint g(x) = 0?
The constraint set {g = 0} has tangent hyperplanes whose normal vectors are characterized
by the gradient ∇g. A point is a "critical point" of f along the constraint set when f does
not change to first order in any direction tangent to the constraint set; such points may
not actually be critical points of f, but at them ∇f must be normal to the constraint set.
In other words, we are looking for points where ∇f and ∇g are parallel. So the equation

∇f + λ∇g = 0

should be satisfied for some scalar λ. If we had multiple constraints gi (x) = 0 for 1 ≤ i ≤ k,
then we would similarly want ∇f to be a linear combination of the gradients ∇gi . Thus
the relative extrema (or extrema of f given {gi }) are the points that satisfy

∇f (x) + λ1 ∇g1 (x) + · · · + λk ∇gk (x) = 0,    gi (x) = 0,

where the λi 's are called the Lagrange multipliers of the constraints.
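As a small worked instance (our own choice of objective and constraint), the following SymPy sketch extremizes f (x, y) = xy subject to g(x, y) = x + y − 2 = 0 by solving the Lagrange system directly:

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x * y          # objective
g = x + y - 2      # constraint g = 0

# Lagrange condition: grad f + lam * grad g = 0, together with g = 0
eqs = [sp.diff(f, x) + lam * sp.diff(g, x),
       sp.diff(f, y) + lam * sp.diff(g, y),
       g]
print(sp.solve(eqs, [x, y, lam]))   # [(1, 1, -1)]: extremum at (1, 1)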

5 Integration
5.1 Integration in Vector Spaces?
Can we define a notion of integration in general vector spaces? It turns out that if f : U →
V is a function between vector spaces with U and V normed and V "complete" (Cauchy
sequences have limits), then there is a natural way to define the integral of f (if it exists).
The construction is not particularly intuitive, however, and is correspondingly not especially
illuminating to examine here. It requires some knowledge of measure theory: U has an
induced measure space structure with d-dimensional Hausdorff measure and Carathéodory-
measurable sets as the σ-algebra, and the completeness of V allows one to define limits of
step functions...but enough of that. Since these notes do not require measure theory as a
prerequisite, we in turn restrict ourselves to Rn .

5.2 Integration in Rn
Recall that for a function f : R → R, we define the integral of f over the interval [a, b]
(or equivalently (a, b), (a, b] or [a, b)) by

∫_a^b f (x) dx = lim_{n→∞} Σ_{i=1}^n f (ti ) ∆xi

where ∆xi = |xi − xi−1 |, x0 = a, xn = b, and ti ∈ [xi−1 , xi ]. The limit assumes that we
make further subdivisions of the interval [a, b] as n → ∞, where in each successive step
the xi might be different than before.
To amend this, one can discuss partitions of the interval [a, b] as sets P consisting of
points in [a, b] (the xi 's). One can then say a partition P ′ is finer than P if P ⊆ P ′ (i.e.
P ′ has all the same points as P and possibly some more). We can define a norm on
a partition P = {x1 , ..., xn−1 , xn } (with x0 = a and xn = b) by

kP k = max_{i≤n} ∆xi = max_{i≤n} |xi − xi−1 |.

Then if {Pn } is a sequence of partitions of [a, b] such that Pn+1 is finer than Pn for all n
and lim_{n→∞} kPn k = 0, we can then define the integral by

∫_a^b f (x) dx = lim_{n→∞} Σ_{xi ∈Pn } f (ti ) ∆xi

where ti ∈ [xi−1 , xi ]. In this case, one can show that the above definition does not depend
upon the ti 's chosen in each subinterval.
Now what if f : R2 → R? If we wanted to continue with the same reasoning, we
might want to look at sums of the form

Σ_i f (ti ) ∆Ai

where Ai is a rectangle in R2 , ∆Ai is the area of the rectangle, and ti is a point in the
rectangle. It turns out this is the correct way to define the integral. Now since ti ∈ R2 ,
we can write it as ti = (t1i , t2i ) for t1i , t2i ∈ R. Similarly since Ai is a rectangle in R2 , its
area can be written as the product |xi − xi−1 ||yi − yi−1 |. Hence our sum looks like

Σ_i f (ti ) ∆Ai = Σ_i f (t1i , t2i ) |xi − xi−1 ||yi − yi−1 | = Σ_i f (t1i , t2i ) ∆xi ∆yi .

Hence we are essentially just subdividing the x axis and y axis, or say, intervals on them
such as [a, b] and [c, d], with the intent of letting the number of rectangles go to infinity
(while their areas go to 0). Should we require [a, b] and [c, d] to have the same number
of subdivisions at each stage? It turns out that for most ordinary circumstances, it will
not matter. Hence we may index the subdivisions of the [a, b] and [c, d] axes separately
as follows:

Σ_j Σ_i f (t1i , t2j ) ∆xi ∆yj = Σ_j ( Σ_i f (t1i , t2j ) ∆xi ) ∆yj

and correspondingly sum over one and then the other. So we want a sequence of partitions
{Pn } of [a, b] and a sequence of partitions {Pm′ } of [c, d] such that lim_{n→∞} kPn k = 0 and
lim_{m→∞} kPm′ k = 0. We then define the integral of f over the rectangle [a, b] × [c, d] by

∫_c^d ∫_a^b f (x, y) dx dy = ∫_c^d ( ∫_a^b f (x, y) dx ) dy = lim_{m→∞} Σ_{yj ∈Pm′} ( lim_{n→∞} Σ_{xi ∈Pn } f (t1i , t2j ) ∆xi ) ∆yj .

Theorem 5.1. (Fubini's Theorem) Let f : R2 → R and R = [a, b] × [c, d] be a
rectangle in R2 such that f (x, ·) : R → R is integrable on [c, d] and f (·, y) : R → R is
integrable on [a, b] for all x, y. If ∫_c^d ( ∫_a^b f (x, y) dx ) dy exists and is finite, then

∫∫_R f (x, y) dA := ∫_c^d ( ∫_a^b f (x, y) dx ) dy = ∫_a^b ( ∫_c^d f (x, y) dy ) dx.

By induction, this theorem says that if f : Rn → R is a function and we integrate


on an n-dimensional rectangle such that any of the one-dimensional partial functions is
integrable, then if any iteration of the integrals exists and is finite, it is equal to all other
iterations of the integral. We can thus define the integral of f : Rn → R on a rectangle.
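A numerical sketch of Fubini's theorem (our own example, f (x, y) = xy^2 on [0, 1] × [0, 2], exact value 4/3): a midpoint Riemann sum agrees with both orders of iterated summation:

import numpy as np

f = lambda x, y: x * y ** 2
a, b, c, d, N = 0.0, 1.0, 0.0, 2.0, 400

xs = np.linspace(a, b, N, endpoint=False) + (b - a) / (2 * N)  # midpoints
ys = np.linspace(c, d, N, endpoint=False) + (d - c) / (2 * N)
dx, dy = (b - a) / N, (d - c) / N

X, Y = np.meshgrid(xs, ys)
riemann  = np.sum(f(X, Y)) * dx * dy                    # double Riemann sum
dx_first = np.sum(np.sum(f(X, Y), axis=1) * dx) * dy    # sum over x, then y
dy_first = np.sum(np.sum(f(X, Y), axis=0) * dy) * dx    # sum over y, then x
print(riemann, dx_first, dy_first)   # all ≈ 4/3, as Fubini predicts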
Definition 5.2. Let f : Rn → R and R = [a1 , b1 ] × · · · × [an , bn ] be an n-dimensional
rectangle. If each partial function f (x1 , ..., ·i , ..., xn ) is integrable on [ai , bi ], then we define

∫· · ·∫_R f (x1 , ..., xn ) dV = ∫_{a1}^{b1} · · · ∫_{an}^{bn} f (x1 , ..., xn ) dx1 · · · dxn .

For such R, we equate the following notation:

∫_R f dx = ∫· · ·∫_R f dV = ∫_{a1}^{b1} · · · ∫_{an}^{bn} f dx1 · · · dxn .

Like derivatives, if f : Rn → Rm with f (x) = (f1 (x), ..., fm (x)) for x ∈ Rn , we define

∫_R f dx = ( ∫_R f1 dx, ..., ∫_R fm dx ).
What if we wish to integrate over subsets of Rn that aren't rectangles? This requires
another definition of the integral, which turns out to include the one we previously had.

Say we want to integrate over a subset X ⊆ Rn . We will have to make an assumption
about X: that

sup_{S⊆X} Σ_{Ri ∈P (S)} |Ri | = inf_{X⊆S ′} Σ_{Rj′ ∈P (S ′)} |Rj′ |,

where the supremum (resp. infimum) is taken over all finite unions of rectangles S (resp.
S ′) contained in (resp. containing) X and over partitions P (S) of those into rectangles
Ri (with |Ri | denoting the volume of Ri ). In this case, we call X a Jordan region,
denote the common value by |X|, and call it the volume of X. Let us define the norm
of an n-dimensional partition by

kP (S)k = max_i |Ri |.

That is, the norm of the partition is the volume of the largest rectangle in the partition.

Definition 5.3. Let f : Rn → R and X ⊆ Rn , and let {P (S)n } and {P (S ′)n } be
sequences of partitions of rectangle unions S contained in X and S ′ containing X,
respectively. Also suppose P (S)n ⊆ P (S)n+1 and P (S ′)n ⊆ P (S ′)n+1 for all n and that
both kP (S)n k, kP (S ′)n k → 0 as n → ∞. Then we define the integral of f over X by

∫_X f dx = sup_{S⊆X} lim_{n→∞} Σ_{Ri ∈P (S)n } f (ti ) |Ri | = inf_{X⊆S ′} lim_{n→∞} Σ_{Rj′ ∈P (S ′)n } f (tj ) |Rj′ |

(with sample points ti ∈ Ri ), provided the two values agree.

5.3 Change of Variables


Theorem 5.4. (Change of Variables) Let Ω ⊆ Rn be a Jordan region and φ : Ω → Rn
be an injective continuously differentiable function such that det Dφ ≠ 0. If f : Rn → R
with f ◦ φ integrable on Ω and f integrable on φ(Ω), then

∫_{φ(Ω)} f (y) dy = ∫_Ω (f ◦ φ)(x) | det Dφ(x)| dx.

When f : R → R, this is simply the u-substitution rule.

Example 5.5. (Polar Coordinates in R2 ) Define φ : R+ × [0, 2π) → R2 (where R+
is the positive reals) by

φ(r, θ) = (r cos θ, r sin θ).

Then this map is continuously differentiable and injective. The magnitude of the
determinant of the Jacobian is

             | ∂x/∂r  ∂x/∂θ |       | cos θ  −r sin θ |
| det Dφ| =  | ∂y/∂r  ∂y/∂θ |  =    | sin θ   r cos θ |  = |r cos^2 θ + r sin^2 θ| = r.

Hence

∫∫_{R2} f (x, y) dx dy = ∫_0^{2π} ∫_0^∞ f (r cos θ, r sin θ) r dr dθ.
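A classic use of polar coordinates is the Gaussian integral ∫∫ e^{−(x^2 +y^2 )} dA = π, since the Jacobian factor r makes the radial integral elementary. A numerical sketch (our own illustration, assuming NumPy):

import numpy as np

# Integrate exp(-(x^2+y^2)) over R^2 two ways; both should approach pi.
N = 2000

# Cartesian grid on [-6, 6]^2 (the tails beyond are negligible)
xs = np.linspace(-6, 6, N)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
cart = np.sum(np.exp(-(X ** 2 + Y ** 2))) * dx * dx

# Polar: the factor r from the Jacobian appears in the integrand
rs = np.linspace(0, 6, N)
dr = rs[1] - rs[0]
polar = 2 * np.pi * np.sum(np.exp(-rs ** 2) * rs) * dr

print(cart, polar, np.pi)   # all ≈ 3.14159...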

Example 5.6. (Spherical Coordinates in R3 ) Define φ : R+ × [0, 2π) × [0, π] → R3
by

φ(ρ, θ, ϕ) = (ρ sin ϕ cos θ, ρ sin ϕ sin θ, ρ cos ϕ).

This map is also continuously differentiable and injective. We also have

                 | sin ϕ cos θ   −ρ sin ϕ sin θ   ρ cos ϕ cos θ |
| det Dφ| = det  | sin ϕ sin θ    ρ sin ϕ cos θ   ρ cos ϕ sin θ |  = ρ^2 sin ϕ.
                 | cos ϕ          0               −ρ sin ϕ      |

Thus

∫∫∫_{R3} f (x, y, z) dx dy dz = ∫_0^π ∫_0^{2π} ∫_0^∞ f (φ(ρ, θ, ϕ)) ρ^2 sin ϕ dρ dθ dϕ.

5.4 Subordinate Integration


We discussed integrating a function f : Rn → R over X ⊆ Rn . Suppose either that X
is a subspace of Rn with dimR (X) < n or that all elements x ∈ X have at least one
component the same (that is, for all x, y ∈ X, xi = yi for some fixed i). Let φ : X → Rn
be the identity map, which is certainly continuously differentiable and injective. Let us
suppose each element of X has the form (x1 , ..., xn−1 , c); then φ(x) = (f1 (x), ..., fn−1 (x), fn (x))
where fi (x) = xi for 1 ≤ i < n and fn (x) = c. Hence

∂fi /∂xj = 1 if i = j ≠ n,    and    ∂fi /∂xj = 0 if i ≠ j or i = j = n.

That is, Dφ is the identity matrix with its last diagonal entry replaced by 0, so

det Dφ = 0.

Hence if f : Rn → R and k < n, the change of variables theorem gives us that

∫_{Rk} f dx = 0.

This may seem to suggest that there is no nice way of integrating over subsets of strictly
smaller "dimension". Rather than integrating in the classical way (in which case we
essentially get hypervolumes being 0 since, say, one of the components is constant and
hence rectangles have 0 length in that component), we will come up with alternate methods
for subordinate integration.

Let γ : [0, 1] → Rn be a continuously differentiable and injective map (which we will
call a path) and f : Rn → R. We ask the question: what is the integral of f along γ?
Change of variables would suggest:

∫_γ f dγ := ∫_{γ([0,1])} f (x) dγ(x) = ∫_0^1 f (γ(t)) | det Dγ| dt.

But det Dγ doesn't make sense in this case since

Dγ = ( γ1′ (t), ..., γn′ (t) ).

It turns out that kDγk works as a substitute for | det Dγ|. Intuitively in Rn we have an
infinitesimal Pythagorean theorem:

dγ^2 = dγ1^2 + · · · + dγn^2 ,

which would yield

∫_γ f dγ = ∫_0^1 f (γ(t)) sqrt( (dγ1 /dt)^2 + · · · + (dγn /dt)^2 ) dt = ∫_0^1 f (γ(t)) kDγk dt.

Now suppose instead that f : Rn → Rn ; then we can define

∫_γ f · dγ = ∫_γ (f1 (x), ..., fn (x)) · (dγ1 , ..., dγn )
           = ∫_γ f1 (x) dγ1 + · · · + ∫_γ fn (x) dγn
           = ∫_0^1 f1 (γ(t)) γ1′ (t) dt + · · · + ∫_0^1 fn (γ(t)) γn′ (t) dt
           = ∫_0^1 (f1 (γ(t)), ..., fn (γ(t))) · (γ1′ (t), ..., γn′ (t)) dt
           = ∫_0^1 f (γ(t)) · γ ′ (t) dt.
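As a numerical sketch of the last formula (our own example): the line integral of f (x, y) = (−y, x) around the unit circle, whose exact value is 2π:

import numpy as np

# ∫_γ f · dγ = ∫_0^1 f(γ(t)) · γ'(t) dt for f(x, y) = (-y, x)
# along γ(t) = (cos 2πt, sin 2πt); exact value 2π.
t = np.linspace(0.0, 1.0, 100001)
gamma  = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
dgamma = np.stack([-2 * np.pi * np.sin(2 * np.pi * t),
                    2 * np.pi * np.cos(2 * np.pi * t)])
f = np.stack([-gamma[1], gamma[0]])
integrand = np.sum(f * dgamma, axis=0)      # f(γ(t)) · γ'(t)
print(np.trapz(integrand, t), 2 * np.pi)    # ≈ 6.2832 both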
The above two definitions (for f : Rn → R and f : Rn → Rn ) of integration with respect
to a path γ are called line integrals. What if we want to integrate with respect to a
surface instead of a line? Instead of γ : [0, 1] → Rn , suppose we have a continuously
differentiable S : X → R where X ⊆ Rn . We can call this a surface in Rn+1 . We want to
define

∫_S f dS
with f : Rn+1 → R. First we need the area of a small patch of the surface. The n-dimensional
area of the parallelepiped spanned by n vectors V = {v1 , ..., vn } in Rn+1 is k det A(V )k,
where A(V ) is the (n + 1) × (n + 1) array whose first row consists of the basis vectors {ei }
and whose (i + 1)st row is the vector A(vi ); its "determinant," expanded along the first
row exactly as for the cross product, is a vector. For a small patch of the image of S above
a coordinate rectangle with sides ∆xi , the edge vectors are

A(vi ) = ∆xi ei + (∂S/∂xi ) ∆xi en+1 .

Thus

det(A(V )) = ( −(∂S/∂x1 ) ∆x1 · · · ∆xn , ..., −(∂S/∂xn ) ∆x1 · · · ∆xn , ∆x1 · · · ∆xn ).

Thus

∆S = k det A(V )k
   = sqrt( ((∂S/∂x1 ) ∆x1 · · · ∆xn )^2 + · · · + ((∂S/∂xn ) ∆x1 · · · ∆xn )^2 + (∆x1 · · · ∆xn )^2 )
   = sqrt( (∂S/∂x1 )^2 + · · · + (∂S/∂xn )^2 + 1 ) ∆x1 · · · ∆xn .

Or in the limit:

dS = sqrt( (∂S/∂x1 )^2 + · · · + (∂S/∂xn )^2 + 1 ) dx1 · · · dxn .

Hence for f : Rn+1 → R and a surface S : X → R with X ⊆ Rn , we define the surface
integral of f with respect to S by

∫_S f dS = ∫_X f (x1 , ..., xn , S(x1 , ..., xn )) sqrt( (∂S/∂x1 )^2 + · · · + (∂S/∂xn )^2 + 1 ) dx1 · · · dxn .

If instead we have that f : Rn+1 → Rn+1 , we can define its surface integral with respect
to an oriented surface S (meaning it has a normal vector that changes continuously along
S) as

∫_S f · dS = ∫_S (f · n) dS

where n is a unit normal vector to S. Since S : Rn → R, consider the function

Σ(x1 , ..., xn , t) = t − S(x1 , ..., xn ),

which vanishes exactly on the surface (where t = S(x1 , ..., xn )). Its gradient in the n + 1
variables is

∇Σ = ( −∂S/∂x1 , ..., −∂S/∂xn , 1 ).

For any curve c(s) lying in the surface we have Σ(c(s)) = 0, and differentiating gives
∇Σ · c′ (s) = 0. So ∇Σ is orthogonal to every direction tangent to the surface, i.e. normal
to the surface. Hence we can define the unit normal vector

n = ∇Σ / k∇Σk = ( −∂S/∂x1 , ..., −∂S/∂xn , 1 ) / sqrt( (∂S/∂x1 )^2 + · · · + (∂S/∂xn )^2 + 1 ).

Hence we obtain

∫_S f · dS = ∫_S (f · n) dS
           = ∫_X f · ( ∇Σ / k∇Σk ) k∇Σk dx1 · · · dxn
           = ∫_X (f1 , ..., fn+1 ) · ( −∂S/∂x1 , ..., −∂S/∂xn , 1 ) dx1 · · · dxn
           = ∫_X ( −f1 ∂S/∂x1 − · · · − fn ∂S/∂xn + fn+1 ) dx1 · · · dxn

(the square-root factors from n and dS cancel), where each fi has the change of variables
in its arguments: fi (x1 , ..., xn , S(x1 , ..., xn )).

Theorem 5.7. (Fundamental Theorem for Line Integrals) Let γ : [a, b] → Rn be
a path (continuously differentiable) and f : Rn → R be differentiable. Then

∫_γ ∇f · dγ = f (γ(b)) − f (γ(a)).

Proof. By definition we have

∫_γ ∇f · dγ = ∫_a^b ∇f (γ(t)) · γ ′ (t) dt
            = ∫_a^b ( ∂f /∂x1 , ..., ∂f /∂xn ) · ( dγ1 /dt, ..., dγn /dt ) dt
            = ∫_a^b ( (∂f /∂x1 )(dγ1 /dt) + · · · + (∂f /∂xn )(dγn /dt) ) dt
            = ∫_a^b (d/dt) f (γ(t)) dt
            = f (γ(b)) − f (γ(a)). □
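A quick numerical check of this theorem (our own example, with f (x, y) = x^2 y and γ(t) = (t, t^2 ) on [0, 1], so the exact answer is f (1, 1) − f (0, 0) = 1):

import numpy as np

# Check ∫_γ ∇f · dγ = f(γ(b)) − f(γ(a)) for f(x, y) = x^2 y, γ(t) = (t, t^2).
t = np.linspace(0.0, 1.0, 100001)
x, y = t, t ** 2
grad   = np.stack([2 * x * y, x ** 2])        # ∇f = (2xy, x^2) along the path
dgamma = np.stack([np.ones_like(t), 2 * t])   # γ'(t) = (1, 2t)
lhs = np.trapz(np.sum(grad * dgamma, axis=0), t)
rhs = 1.0 ** 2 * 1.0 - 0.0                    # f(1,1) − f(0,0)
print(lhs, rhs)                               # ≈ 1.0 and 1.0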

6 Vector Calculus in R3
6.1 Gradient, Curl, and Divergence
Recall that for f : Rn → R, we have the gradient ∇f : Rn → Rn defined by

∇f (x) = ( ∂f /∂x1 (x), ..., ∂f /∂xn (x) ).

Definition 6.1. Let f : Rn → Rn . The divergence of f, denoted div f or ∇ · f, is the
map from Rn to R defined by

div f (x) = ∇ · f (x) = ∂f1 /∂x1 (x) + · · · + ∂fn /∂xn (x).

We also had the notion of a cross product of two vectors in R3 .

Definition 6.2. Let f : R3 → R3 . The curl (or vorticity) of f, denoted curl f or ∇ × f,
is a map from R3 to R3 defined by

                     | e1      e2      e3     |
curl f = ∇ × f = det | ∂/∂x1   ∂/∂x2   ∂/∂x3  | = ( ∂f3 /∂x2 − ∂f2 /∂x3 , ∂f1 /∂x3 − ∂f3 /∂x1 , ∂f2 /∂x1 − ∂f1 /∂x2 ).
                     | f1      f2      f3     |

Since we will be restricting ourselves to R3 in this section, we will simplify our notation
with x1 = x, x2 = y, and x3 = z:

∇f = ( ∂f /∂x, ∂f /∂y, ∂f /∂z ) = (fx , fy , fz )

curl f = ( ∂f3 /∂y − ∂f2 /∂z, ∂f1 /∂z − ∂f3 /∂x, ∂f2 /∂x − ∂f1 /∂y )

div f = ∂f1 /∂x + ∂f2 /∂y + ∂f3 /∂z.
Proposition 6.3.

curl ∇f = 0.

Proof. By the equality of mixed partials (Theorem 4.9),

curl ∇f = curl (fx , fy , fz ) = (fzy − fyz , fxz − fzx , fyx − fxy ) = 0. □
Proposition 6.4.

div curl f = 0.

Proof.

div curl f = ∂/∂x ( ∂f3 /∂y − ∂f2 /∂z ) + ∂/∂y ( ∂f1 /∂z − ∂f3 /∂x ) + ∂/∂z ( ∂f2 /∂x − ∂f1 /∂y ) = 0,

again by the equality of mixed partials. □
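Both identities can also be checked symbolically for arbitrary smooth components. A SymPy sketch of our own (curl and div written out by hand as above):

import sympy as sp

x, y, z = sp.symbols('x y z')
F = [sp.Function(n)(x, y, z) for n in ('f1', 'f2', 'f3')]  # arbitrary vector field
g = sp.Function('g')(x, y, z)                              # arbitrary scalar field

def curl(v):
    return [sp.diff(v[2], y) - sp.diff(v[1], z),
            sp.diff(v[0], z) - sp.diff(v[2], x),
            sp.diff(v[1], x) - sp.diff(v[0], y)]

def div(v):
    return sp.diff(v[0], x) + sp.diff(v[1], y) + sp.diff(v[2], z)

grad_g = [sp.diff(g, x), sp.diff(g, y), sp.diff(g, z)]
print([sp.simplify(c) for c in curl(grad_g)])  # [0, 0, 0]
print(sp.simplify(div(curl(F))))               # 0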
If curl f = 0, we say that f is irrotational (or conservative). If div f = 0, we say
that f is incompressible. We will also sometimes refer to functions f : R3 → R as
scalar fields and functions f : R3 → R3 as vector fields. Now if C ∞ (X, Y ) denotes the
smooth functions from X to Y with X, Y subsets of some Rn , then we have a chain of maps

C ∞ (R3 , R) −grad→ C ∞ (R3 , R3 ) −curl→ C ∞ (R3 , R3 ) −div→ C ∞ (R3 , R)

such that any double composition in the chain is 0 (i.e. curl of grad, or div of curl).
Such a chain is called a complex. Of interest might be the vector fields f : R3 → R3
that fall into two categories: (1) irrotational ones that aren't gradients of a scalar field,
and (2) incompressible ones that aren't the curl of a vector field. It turns out, however,
that all irrotational fields are the gradient of a scalar field, and all incompressible ones
are the curl of a vector field. These theorems follow from the fundamental theorem for
line integrals, Stokes' theorem, and Gauss' theorem (the latter two of which appear in
the next section).
We will also use the following notation:

∇^2 f = ∇ · ∇f = div (grad f ) = ∂^2 f /∂x^2 + ∂^2 f /∂y^2 + ∂^2 f /∂z^2

for a scalar field f, and

∇^2 f = ( ∇^2 f1 , ∇^2 f2 , ∇^2 f3 )

for a vector field f. In each case, we call this the Laplacian of f.

Proposition 6.5.

curl (curl f ) = ∇(div f ) − ∇^2 f.

Proof. Let P = f3y − f2z , Q = f1z − f3x , and R = f2x − f1y so that curl f = (P, Q, R).
Then

curl (curl f ) = ( Ry − Qz , Pz − Rx , Qx − Py )
              = ( (f2xy − f1yy ) − (f1zz − f3xz ), (f3yz − f2zz ) − (f2xx − f1yx ), (f1zx − f3xx ) − (f3yy − f2zy ) )
              = ( f1xx + f2xy + f3xz , f1yx + f2yy + f3yz , f1zx + f2zy + f3zz ) − ∇^2 f
              = ( ∂/∂x (f1x + f2y + f3z ), ∂/∂y (f1x + f2y + f3z ), ∂/∂z (f1x + f2y + f3z ) ) − ∇^2 f
              = ∇(f1x + f2y + f3z ) − ∇^2 f
              = ∇(div f ) − ∇^2 f. □

We will denote the above by curl^2 f.

6.2 Main Theorems


The theorems in this section generalize the fundamental theorem of calculus. In more
advanced mathematics courses, there are even more general versions of the fundamental
theorem of calculus:

Radon-Nikodym Theorem (Measure Theory). If µ is a σ-finite signed measure and ν is a
σ-finite positive measure (think of these as a generalization of differentials) on a measurable
space (X, Σ), then there exist unique σ-finite signed measures λ, ρ such that λ ⊥ µ, ρ ≪ µ,
and ν = λ + ρ. Moreover there is a µ-integrable function f such that

ρ(A) = ∫_A f dµ.

Here f is called the Radon-Nikodym derivative and is denoted f = dρ/dµ.

You can think of it as saying "given two differentials dy and dx, there is an integrable
function f such that ∫ dy = ∫ f dx," and we can write f = dy/dx. The actual definition
of a measure isn't quite like a differential, but it's a manageable generalization. Another
general form is:

Stokes' Theorem (Differentiable Manifold Theory). Let M be a smooth, compact, oriented,
n-dimensional manifold with boundary and ω be an (n − 1)-form on M. Then

∫_M dω = ∫_{∂M} ω.

An n-dimensional manifold can be thought of as a space that looks like Rn locally (i.e.
in small neighborhoods). For example, the interval [a, b] is a compact 1-dimensional
manifold with boundary (its boundary is {a, b}, the endpoints). A 0-form on this interval
is just a smooth function f defined on it (in general, an n-form looks like g dx1 ∧ · · · ∧ dxn ).
So the theorem applied to this example is (and recall df = f ′ (x) dx)

∫_{[a,b]} df = ∫_{[a,b]} f ′ (x) dx = ∫_{{a,b}} f = f (b) − f (a),

which is simply the fundamental theorem of calculus. It turns out that all of the theorems
in this section are special cases of the above Stokes' theorem. In what follows, a simple
path will be an injective one (so it doesn't intersect itself); an orientation on a closed
path will be a choice about whether one moves clockwise or counterclockwise. A closed
path with orientation is called an oriented path. An oriented surface is a surface with
a choice of normal vector at each point that changes continuously.

Theorem 6.6. (Kelvin-Stokes' Theorem) Let S : R2 → R be a smooth, bounded,
and oriented surface in R3 with a simple, closed, and piecewise-smooth path γ : [a, b] → R3
as its boundary, and let f : R3 → R3 be a continuously differentiable vector field. Then

∫_γ f · dγ = ∫∫_S curl f · dS.

This can be seen as a special case of the above Stokes’ theorem by the following
reasoning. S is a surface, so it looks like R2 in very small neighborhoods. That is, S is a
2-dimensional manifold. It also satisfies the assumptions we need. Also,

f · dγ = f1 dγ1 + f2 dγ2 + f3 dγ3 .

f · dγ is a sum of 1-forms, and is thus a 1-form itself. In the statement of the general
Stokes' theorem, dω refers to the exterior derivative of ω, and in our case it turns out
that

d(f · dγ) = curl f · dS.
But we can prove the Kelvin-Stokes’ theorem directly without showing it is a special case
of the more general Stokes’ theorem by noting the fact that since γ is a boundary of the
surface, we can write

γ(t) = (x(t), y(t), z(t)) = (x(t), y(t), S (x(t), y(t))) .

Proof. Writing γ(t) = (x(t), y(t), S(x(t), y(t))) as above,

∫_γ f · dγ = ∫_a^b f · γ ′ (t) dt
           = ∫_a^b ( f1 x′ (t) + f2 y ′ (t) + f3 z ′ (t) ) dt
           = ∫_a^b ( f1 x′ (t) + f2 y ′ (t) + f3 ( (∂S/∂x)(dx/dt) + (∂S/∂y)(dy/dt) ) ) dt
           = ∫_a^b ( ( f1 + f3 ∂S/∂x )(dx/dt) + ( f2 + f3 ∂S/∂y )(dy/dt) ) dt
           = ∫_{γ∩R2} ( f1 + f3 ∂S/∂x ) dx + ( f2 + f3 ∂S/∂y ) dy
           = ∫∫_S ( ∂/∂x ( f2 + f3 ∂S/∂y ) − ∂/∂y ( f1 + f3 ∂S/∂x ) ) dx dy
           = ∫∫_S ( −(f3y − f2z ) ∂S/∂x − (f1z − f3x ) ∂S/∂y + (f2x − f1y ) ) dx dy
           = ∫∫_S curl f · ( −∂S/∂x, −∂S/∂y, 1 ) dx dy
           = ∫∫_S curl f · dS,

where the sixth equality is Green's theorem in the plane (which can also be proven
directly), and the seventh uses the chain rule on the fi evaluated at (x, y, S(x, y)). □
Corollary 6.7. (Green's Theorem) Let γ : [a, b] → R2 be a simple, closed, and
piecewise-smooth path in the plane, where A denotes the region enclosed by the path.
Let f : R2 → R2 be continuously differentiable. Then

∫_γ f · dγ = ∫∫_A ( ∂f2 /∂x − ∂f1 /∂y ) dx dy.

Proof. Since A and γ are in R2 , they are in R3 . Take S to be the flat surface z = 0
over A, which is easily shown to be smooth, bounded (by γ), and oriented, with

dS = sqrt( 1 + (∂z/∂x)^2 + (∂z/∂y)^2 ) dx dy = dx dy

since z = 0, and with normal vector (0, 0, 1). Let f = (f1 , f2 , 0) and γ(t) =
(x(t), y(t), 0). Then Kelvin-Stokes' gives us

∫_γ f · dγ = ∫∫_A curl f · dS
           = ∫∫_A curl (f1 , f2 , 0) · (0, 0, 1) dx dy
           = ∫∫_A ( ∂f2 /∂x − ∂f1 /∂y ) dx dy. □
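A numerical sketch of Green's theorem (our own example, f = (−y, x) on the unit disk, where both sides equal 2 · area = 2π):

import numpy as np

# Line integral around the unit circle: f(γ(t)) · γ'(t) = sin^2 t + cos^2 t = 1.
t = np.linspace(0, 2 * np.pi, 200001)
line = np.trapz((-np.sin(t)) * (-np.sin(t)) + np.cos(t) * np.cos(t), t)

# Area integral of ∂f2/∂x − ∂f1/∂y = 2 over the disk, via an indicator grid.
N = 1500
xs = np.linspace(-1, 1, N)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
inside = X ** 2 + Y ** 2 <= 1.0
area_integral = np.sum(2.0 * inside) * dx * dx

print(line, area_integral, 2 * np.pi)   # all ≈ 6.2832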
Theorem 6.8. (Gauss' Theorem / Divergence Theorem) Let B be a simple body
and S be its boundary with positive (outward) orientation. If f : R3 → R3 is continuously
differentiable on B then

∫∫_S f · dS = ∫∫∫_B div f dx dy dz.

Proof. Writing n = (n1 , n2 , n3 ) for the outward unit normal,

∫∫_S f · dS = ∫∫_S (f1 , f2 , f3 ) · (n1 , n2 , n3 ) dS
            = ∫∫_S ( f1 n1 + f2 n2 + f3 n3 ) dS
            = ∫∫∫_B ∂f1 /∂x dx dy dz + ∫∫∫_B ∂f2 /∂y dx dy dz + ∫∫∫_B ∂f3 /∂z dx dy dz
            = ∫∫∫_B ( ∂f1 /∂x + ∂f2 /∂y + ∂f3 /∂z ) dx dy dz
            = ∫∫∫_B div f dx dy dz,

where the third equality follows by slicing B along each coordinate direction between its
lower and upper boundary surfaces and applying the fundamental theorem of calculus
(this is where we use that B is a simple body). □
Recall that curl of gradient is zero and divergence of curl is zero. Also recall that vector
fields with 0 curl are irrotational and vector fields with 0 divergence are incompressible.
We said that, in fact, every irrotational vector field is the gradient of some scalar field,
and that every incompressible vector field is the curl of another vector field. The first
step in showing this is to show that every vector field can be written as the sum of an
irrotational and an incompressible field.

Theorem 6.9. (Helmholtz / Fundamental Theorem of Vector Calculus) Let
f : R3 → R3 be a vector field that is twice continuously differentiable. Then f = C + R
where C is an incompressible vector field and R is an irrotational vector field.

It turns out that we can write

f = curl A − ∇B

for some vector field A and scalar field B. It follows immediately that C = curl A and
R = −∇B, for the divergence of a curl is 0 and the curl of a gradient is 0. In fact A and
B have the forms

A(x0 ) = (1/4π) ∫_{R3} curl f (x) / kx − x0 k dx

and

B(x0 ) = (1/4π) ∫_{R3} div f (x) / kx − x0 k dx

for each x0 ∈ R3 (assuming f decays fast enough at infinity for these integrals to converge).

Corollary 6.10. (grad-curl-div Exact Sequence Theorem)

(a) If f is a scalar field, then curl ∇f = 0.

(b) If f is a vector field, then div curl f = 0.

(c) If f is an irrotational vector field, then it is the gradient of some scalar field.

(d) If f is an incompressible vector field, then it is the curl of some vector field.

Proof. We have already proven (a) and (b). By Helmholtz we have f = curl A − ∇B
where A and B are defined by the integrals above. If f is irrotational, curl f = 0, so
A = 0 and thus f = −∇B. Similarly if f is incompressible, then div f = 0, so B = 0 and
hence f = curl A. □

In the Helmholtz decomposition of f, we sometimes call A the vector potential of
f and B the scalar potential of f. A natural question to ask at this point is: what can
we say about a vector field f which is both incompressible and irrotational?

If f is irrotational, then f = −∇B as we have just shown. But if it is also incompressible,
we have

div f = −div ∇B = −∇^2 B = 0 =⇒ ∇^2 B = 0.

Dually, we could have written f = curl A since it is incompressible. Since it is also
irrotational we have

curl f = curl^2 A = ∇(div A) − ∇^2 A = 0.

Since both equations involve the Laplacian (∇^2 ) and f can be written in either way, we
will stick to the simpler first case. Hence an irrotational and incompressible vector field
f has the form

f = −∇ϕ

where ϕ is a solution to Laplace's equation:

∇^2 ϕ = 0.

Solutions to Laplace's equation are called harmonic functions.

7 Applications
7.1 Vortex Dynamics
In fluid mechanics, we can let v : R3 → R3 be the velocity field of a fluid (where a fluid
is defined with the local property of being an amount of mass or energy per unit volume
and having a velocity vector at each point). Let us assume conservation of mass/energy
in a fixed volume V. So we have

ε = ∫∫∫_V ρ dV

where ρ denotes the mass/energy density and ε is the total mass/energy in the volume.
The change in energy over time depends on how much energy enters or leaves the volume
V. Hence if J denotes the energy flux field (in the sense that J = ρv), then

∂ε/∂t = −∫∫_{∂V} J · dS.

That is, a negative change in energy corresponds to how much of the energy flux field
exits through the boundary of the volume. We use partial derivative notation since we
think of the energy as a function ε : R4 → R with a time variable as well. By Gauss'
theorem we have

−∫∫∫_V div J dV = −∫∫_{∂V} J · dS = ∂ε/∂t = ∫∫∫_V ∂ρ/∂t dV.

Hence

∂ρ/∂t = −div J.

This is equivalent to saying

0 = ∂ρ/∂t + div (ρv) = ∂ρ/∂t + ∇ρ · v + ρ div v.
This is called the continuity equation. As a function of four variables, the total
derivative of the energy density with respect to time is

dρ/dt = ∂ρ/∂t + (∂ρ/∂x)(dx/dt) + (∂ρ/∂y)(dy/dt) + (∂ρ/∂z)(dz/dt)
      = ∂ρ/∂t + ∇ρ · v
      = −ρ div v,

where the last equality uses the continuity equation.

Now we can define the vorticity of the fluid velocity field v as its curl. We will write

ω = curl v.

Suppose we have a subvolume in which the vorticity of the fluid is nonzero (which we
will call a vortex). Then since div ω = 0, it follows that the rate of change in density of
the vortex is 0 by applying the equation we derived above. We can however ask how the
vortex changes as a whole over time in the fluid. A similar equation can be derived from
the Navier-Stokes equations for a Newtonian fluid:

dω/dt = ∂ω/∂t + ∇ω · v.
Since v : R3 → R3 , we also have ω : R3 → R3 , so what does ∂ω/∂t mean? Moreover,
what is the gradient of a vector field? Suppose we have a collection of velocity fields {vt }.
Then we have a collection of vorticity fields {ωt = curl vt }. We can in turn think of v and
ω as functions v, ω : R4 → R3 . Correspondingly we will have

∂ω/∂t = ∂/∂t ( ω1 (x, y, z, t), ω2 (x, y, z, t), ω3 (x, y, z, t) ) = ( ∂ω1 /∂t, ∂ω2 /∂t, ∂ω3 /∂t )

and ∇ω is simply the Jacobian of the original vector field:

          [ ω1x  ω1y  ω1z ]
∇ω = Dω = [ ω2x  ω2y  ω2z ]
          [ ω3x  ω3y  ω3z ]

Hence the equation becomes

dω/dt = (ω1t , ω2t , ω3t ) + (Dω) v
      = (ω1t , ω2t , ω3t ) + v1 (ω1x , ω2x , ω3x ) + v2 (ω1y , ω2y , ω3y ) + v3 (ω1z , ω2z , ω3z )
      = ( ω1t + Σ_j vj ω1j , ω2t + Σ_j vj ω2j , ω3t + Σ_j vj ω3j )

where (Dω) v is the matrix product of the Jacobian with the column vector v = (v1 , v2 , v3 ),
and ωij denotes ∂ωi /∂xj with (x1 , x2 , x3 ) = (x, y, z).

This equation, called the vorticity equation, describes how a vortex changes over time
as it moves through a velocity field. It gives the componentwise equations

dωi /dt = ∂ωi /∂t + Σ_{j=1}^3 vj ∂ωi /∂xj .

It proves to be useful to talk of the operator

D/Dt = ∂/∂t + v · ∇,

which is called the material differential operator. When applied to a scalar or vector
field, it is called the material derivative of the field.

7.2 Electrodynamics
In electrodynamics we have two vector fields E, B : R3 → R3 called the electric and
magnetic fields respectively. Contributions from Gauss, Faraday, Ampère, and Maxwell
led to the discovery of what are called Maxwell's equations:

div E = ρ/ε0
div B = 0
curl E = −∂B/∂t
curl B = µ0 ( J + ε0 ∂E/∂t )

where the partial derivatives with respect to time are defined as before for fluid velocity
fields, ρ = dQ/dV is the charge density, J = dI/dV is the current density, ε0 is the
permittivity of free space (or electric constant), and µ0 is the permeability of free space
(or magnetic constant).
Integrating the first equation over a closed volume and applying Gauss' theorem gives
us

∫∫∫_V div E dV = ∫∫_{∂V} E · dS = ∫∫∫_V (ρ/ε0 ) dV = Q/ε0

where S = ∂V and Q is the total charge inside V. Similarly for the second equation we
obtain

∫∫_{∂V} B · dS = 0.

Integrating the third equation over a surface Σ with boundary curve γ and applying
Kelvin-Stokes' gives us

∫∫_Σ curl E · dS = ∫_γ E · dγ = −∫∫_Σ (∂B/∂t) · dS.

Similarly for the fourth equation one can obtain

∫_γ B · dγ = ∫∫_Σ µ0 ( J + ε0 ∂E/∂t ) · dS = µ0 ( IΣ + ε0 ∫∫_Σ (∂E/∂t) · dS )

where IΣ is the net current through the surface. This gives us the integral forms of Maxwell's
equations:
∫∫_{∂V} E · dS = Q/ε0

∫∫_{∂V} B · dS = 0

∫_γ E · dγ = −∫∫_Σ (∂B/∂t) · dS = −(d/dt) ∫∫_Σ B · dS

∫_γ B · dγ = µ0 ( IΣ + ε0 ∫∫_Σ (∂E/∂t) · dS ) = µ0 ( IΣ + ε0 (d/dt) ∫∫_Σ E · dS )
where γ is a simple closed curve bounding the surface Σ. If we assume our region has no
charge or current, then ρ = 0 and J = 0, and Maxwell's equations are
div E = 0
div B = 0
curl E = −∂B/∂t
curl B = µ0 ε0 ∂E/∂t = (1/c^2 ) ∂E/∂t

where c is the speed of an electromagnetic wave in a vacuum, since

c = 1/√(µ0 ε0 ).
When there is no charge or current, we also have

curl^2 E = curl (curl E) = ∇(div E) − ∇^2 E = −∇^2 E,

and also

curl (curl E) = curl ( −∂B/∂t ) = −(∂/∂t) curl B = −(1/c^2 ) ∂^2 E/∂t^2 .

So

∂^2 E/∂t^2 = c^2 ∇^2 E.
Also

curl^2 B = curl (curl B) = ∇(div B) − ∇^2 B = −∇^2 B,

and

curl (curl B) = (1/c^2 ) curl ( ∂E/∂t ) = (1/c^2 ) (∂/∂t) curl E = −(1/c^2 ) ∂^2 B/∂t^2 .

So we also have

∂^2 B/∂t^2 = c^2 ∇^2 B.
Hence E and B both satisfy the equation

∂^2 ψ/∂t^2 = c^2 ∇^2 ψ

for a constant c and vector field ψ. This equation is called the wave equation, and it has
important applications in other areas of physics as well.
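As a symbolic sketch (our own example), one can check that a plane wave ψ(x, t) = sin(kx − ckt), i.e. a wave with angular frequency ω = ck, satisfies the one-dimensional wave equation:

import sympy as sp

x, t, k, c = sp.symbols('x t k c', positive=True)
psi = sp.sin(k * x - c * k * t)   # plane wave with angular frequency c*k

wave_eq = sp.diff(psi, t, 2) - c ** 2 * sp.diff(psi, x, 2)
print(sp.simplify(wave_eq))       # 0: the wave equation is satisfied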


