
Fedor Duzhin

Calculus of Several Variables


Contents
1 Vector-valued functions
1.1 Maps
1.2 How to visualize a map
1.3 Parametric equations
1.4 Tangents of curves
1.5 Exercises
2 Partial derivatives
2.1 Restriction of a map
2.2 Partial derivative and optimization
2.3 Limits and continuity in vector spaces
2.4 Exercises
3 Non-degenerate critical points
3.1 Second partial derivatives
3.2 Quadratic forms
3.3 Hessian matrix
3.4 Morse index and second derivative test
3.5 Exercises
4 Continuity and differentiability
4.1 Limits and continuity in vector spaces
4.2 Derivative
4.3 Jacobian matrix and tangent space
4.4 Exercises
5 Chain Rule
5.1 Differentiation laws
5.2 Directional derivative
5.3 Partial differential equations
5.4 Geometry of level curves and surfaces
5.5 Exercises
6 Constraint Optimization
6.1 Extreme values over a curve or surface
6.2 Extreme values over a region with a boundary
6.3 Exercises
7 Multiple Integral
7.1 Double integral over rectangular regions
7.2 Double integral over arbitrary regions
7.3 Triple integrals
7.4 Exercises
8 Change of variables
8.1 Substitution in single integral
8.2 Substitution in multiple integrals
8.3 Summary and more examples
8.4 Exercises
9 Vector fields and line integrals
9.1 Vector fields and operations on them
9.2 Curves
9.3 Arc length
9.4 Line integral of a function
9.5 Work of a vector field
9.6 Exercises
10 Newton-Leibniz and Green's Theorems
10.1 Newton-Leibniz Theorem
10.2 Green's Theorem
10.3 Exercises
11 Surface Integral
11.1 Parametric surfaces
11.2 Surface integral
11.3 Exercises
12 Stokes' Theorem and Applications of Integration
12.1 Stokes' Theorem
12.2 Applications of integration
12.3 Exercises
13 Gauss' Theorem
13.1 Gauss' Divergence Theorem
13.2 Summary on Multi-variate Integral Calculus
13.3 Exercises
Lecture 1
Vector-valued functions
1.1 Maps
Recall that the space $\mathbb{R}^m$ consists of $m$-tuples of real numbers. We denote them by bold font and write them either as columns or as rows:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \in \mathbb{R}^m \quad \text{or} \quad \mathbf{x} = (x_1, x_2, \dots, x_m) \in \mathbb{R}^m.$$
Its elements are called vectors. Two vectors can be added element by element. Also, a vector can be multiplied by a real number. These two basic operations are what defines a vector space: in fact, objects of any nature are vectors once they can be multiplied by numbers and added (subject to the usual axioms). However, $\mathbb{R}^m$ has more structure than a mere vector space. In particular, the dot product can be used to find the distance between points in $\mathbb{R}^m$ as follows.
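Definition 1.1 The Euclidean norm is given by $|\mathbf{x}| = \sqrt{x_1^2 + x_2^2 + \dots + x_m^2}$ for $\mathbf{x} \in \mathbb{R}^m$. The distance between two points $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$ is $|\mathbf{x} - \mathbf{y}|$.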
Exercise 1.1 How exactly is the dot product used in this definition?
Calculus of one variable focuses on functions $\mathbb{R} \to \mathbb{R}$ (or, more generally, $A \to \mathbb{R}$, where $A \subset \mathbb{R}$). In other words, the value of the variable is a single real number and the value of the function is a single real number. The graph of such a function is a curve on the plane.
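Example 1.2 Consider the function $f(x) = \frac{x}{\sin x} + 1$. It is defined whenever $\sin x \neq 0$, that is, for $x \neq \pi k$, $k \in \mathbb{Z}$. Thus its domain is $A = \mathbb{R} \setminus \{0, \pm\pi, \pm 2\pi, \dots\}$. The variable $x$ can be any real number $x \in A$, and the value of the function is some other real number $f(x) \in \mathbb{R}$. For instance,
$$f\left(\frac{\pi}{2}\right) = \frac{\pi}{2} + 1, \quad f\left(-\frac{\pi}{2}\right) = \frac{\pi}{2} + 1, \quad f\left(\frac{3\pi}{2}\right) = 1 - \frac{3\pi}{2}.$$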
Calculus of several variables deals with functions $A \to \mathbb{R}^n$, where $A \subset \mathbb{R}^m$.
In most of our examples, neither $m$ nor $n$ exceeds 3, which is enough to observe almost all important features of multivariate calculus. What are those features? Unlike real numbers, vectors cannot be divided or compared. Indeed, which is bigger: $(1, 0)$ or $(0, 1)$? And what could the ratio $\frac{(1,0,0)}{(0,1,0)}$ be? Recall that the definition of a limit is based on comparison (for any $\varepsilon > 0$ there is $\delta > 0$, etc.) and the definition of the derivative uses division. Thus it is not easy even to define limits and derivatives for maps of vector spaces.
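Definition 1.3 A function $\mathbb{R}^m \to \mathbb{R}^n$ is usually referred to as a map. The value of a map $f$ at a point is an $n$-vector $(y_1, \dots, y_n)$; its coordinates, regarded as functions of the variable, $y_i = f_i(\mathbf{x})$, are called the component functions of the map $f$.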
The words "map" and "function" mean the same thing: a rule that assigns an object of a set $B$ to each object of a set $A$. The difference between them lies only in the situations in which one uses either word, and it is quite vague. Usually, "function" is used when $n = 1$ and "map" when $n > 1$, but it doesn't really matter. However, one never speaks about component maps, only about component functions.
Example 1.4 Consider the map $\mathbb{R}^1 \to \mathbb{R}^2$ given by $f(t) = (\cos t, \sin t)$. In other words, the value of the variable $t$ is a real number while the value of the function is a pair of real numbers, that is, a vector $\mathbf{u} = (x, y) \in \mathbb{R}^2$. For instance,
$$f(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad f\left(\frac{\pi}{2}\right) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad f(\pi) = \begin{pmatrix} -1 \\ 0 \end{pmatrix}.$$
The graph of such a function is a curve in the space with coordinates $(x, y, t)$. The functions $x = \cos t$ and $y = \sin t$ are the component functions of the map $f$.
Example 1.5 Consider the function $\mathbb{R}^2 \to \mathbb{R}^1$ given by $z = f(x, y) = x^2 - y^2$. In other words, the value of the function is a real number while the value of the variable is a pair of real numbers, that is, a vector $\mathbf{u} = (x, y) \in \mathbb{R}^2$. For instance,
$$f(0, 0) = 0, \quad f(1, 2) = -3, \quad f(8, 7) = 15.$$
The graph of such a function is a surface in the space with coordinates
(x, y, z).
A function $\mathbb{R} \to \mathbb{R}$ can be even, periodic, increasing, etc. Some of these properties are well-defined for maps of vector spaces, some of them are not. In particular:
• A function $f: \mathbb{R} \to \mathbb{R}$ is one-to-one if $x \neq y$ implies $f(x) \neq f(y)$. This definition works in the general situation: a map $f: \mathbb{R}^m \to \mathbb{R}^n$ is one-to-one if $\mathbf{x} \neq \mathbf{y}$ implies $f(\mathbf{x}) \neq f(\mathbf{y})$.
• A function $f: \mathbb{R} \to \mathbb{R}$ is increasing (decreasing) if $x < y$ implies $f(x) < f(y)$ ($f(x) > f(y)$). This notion does not make sense in the multi-dimensional case because a vector cannot be bigger or smaller than another vector.
• A function $f: \mathbb{R} \to \mathbb{R}$ is even (resp. odd) if $f(-x) = f(x)$ (resp. $f(-x) = -f(x)$). This notion is well-defined for maps of vector spaces, although it is named differently: a map $f: \mathbb{R}^m \to \mathbb{R}^n$ is symmetric (resp. antisymmetric) if $f(-\mathbf{x}) = f(\mathbf{x})$ (resp. $f(-\mathbf{x}) = -f(\mathbf{x})$).
• A function $f: \mathbb{R} \to \mathbb{R}$ is periodic of period $T \neq 0$ if $f(x + T) = f(x)$ for all $x$. This notion makes sense in the multi-dimensional case: a map $f: \mathbb{R}^m \to \mathbb{R}^n$ is periodic of period $\mathbf{T} \in \mathbb{R}^m$, $\mathbf{T} \neq \mathbf{0}$, if $f(\mathbf{x} + \mathbf{T}) = f(\mathbf{x})$ for all $\mathbf{x}$.
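Example 1.6 Consider the map $\mathbb{R} \to \mathbb{R}^2$ given by $f(t) = (\sin 2t, \cos 3t)$. Let's check that it has period $2\pi$. We have
$$f(t + 2\pi) = \begin{pmatrix} \sin 2(t + 2\pi) \\ \cos 3(t + 2\pi) \end{pmatrix} = \begin{pmatrix} \sin(2t + 4\pi) \\ \cos(3t + 6\pi) \end{pmatrix} = \begin{pmatrix} \sin 2t \\ \cos 3t \end{pmatrix} = f(t).$$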
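As a numerical sanity check of Example 1.6, here is a minimal Python sketch; the names `f` and `is_period`, the sample grid and the tolerance are our own illustrative choices. Sampling can refute a candidate period, as it does for $\pi$ and $2\pi/3$ below, but it cannot prove one; that is what the algebraic computation above is for.

```python
import math

def f(t):
    """The map f(t) = (sin 2t, cos 3t) from Example 1.6."""
    return (math.sin(2 * t), math.cos(3 * t))

def is_period(T, samples=200, tol=1e-9):
    """Check f(t + T) == f(t), up to tol, at sample points t in [0, 10).

    A failed check disproves that T is a period; a passed check is only evidence.
    """
    for k in range(samples):
        t = 10 * k / samples
        (a1, a2), (b1, b2) = f(t), f(t + T)
        if abs(a1 - b1) > tol or abs(a2 - b2) > tol:
            return False
    return True

print(is_period(2 * math.pi))      # True
print(is_period(math.pi))          # False, since cos 3(t + pi) = -cos 3t
print(is_period(2 * math.pi / 3))  # False, since sin 2t does not have this period
```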

Finally, let us recall that the domain of a function (or a map) $f: A \to \mathbb{R}^n$ is the set $A$ where the function is defined. The range $f(A) \subset \mathbb{R}^n$ of a function (or a map) is the set of all values the function can take. The codomain is the target space $\mathbb{R}^n$.
1.2 How to visualize a map
Given a map $f: \mathbb{R}^m \to \mathbb{R}^n$, how can we see it? There are a few ways to present a function graphically.
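Definition 1.7 For a map $f: A \to \mathbb{R}^n$, $A \subset \mathbb{R}^m$, its graph is $G_f = \{(\mathbf{x}, f(\mathbf{x})) : \mathbf{x} \in A\}$. Since $\mathbf{x} \in \mathbb{R}^m$ and $f(\mathbf{x}) \in \mathbb{R}^n$, we get $G_f \subset \mathbb{R}^{m+n}$.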
In other words, to draw the graph we show possible values of the variable together with the corresponding values of the function. For $n = m = 1$, we get the usual graph of a function $f: \mathbb{R} \to \mathbb{R}$, that is, a curve on the plane. The graph of a function of two variables $f: \mathbb{R}^2 \to \mathbb{R}$ is a surface in $\mathbb{R}^3$, as shown in Figure 1.1.

[Figure 1.1: The graph of the function $f(x, y) = (x + y)\sin(x^2 + y^2)\arctan x$]
Another way to depict a function of two variables is to draw its level curves. Level curves can be seen as cross-sections of the function's graph by horizontal planes $z = c$, where $c$ is any fixed number. For example, Figure 1.2 shows a whole family of level curves of the function $f(x, y) = x^3 - 2xy + y^2$ and the particular level curve $x^3 - 2xy + y^2 = 0$ in 3D space.
Definition 1.8 For a function $f: \mathbb{R}^2 \to \mathbb{R}$, a level curve consists of the solutions to the equation $f(x, y) = c$ for some fixed constant $c \in \mathbb{R}$.
Different constants $c$ give different level curves.
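Level curves can sometimes be computed explicitly. For the function of Figure 1.2, the equation $x^3 - 2xy + y^2 = 0$ is quadratic in $y$, so $y = x \pm \sqrt{x^2 - x^3}$, which is real exactly when $x \le 1$. The Python sketch below (helper names are our own) generates points on this level curve and confirms that $f$ vanishes there up to rounding.

```python
import math

def f(x, y):
    """f(x, y) = x^3 - 2xy + y^2, the function of Figure 1.2."""
    return x**3 - 2 * x * y + y**2

def level_zero_points(xs):
    """Points (x, y) on the level curve f(x, y) = 0 for the given x-values.

    Solving y^2 - 2xy + x^3 = 0 for y gives y = x +/- sqrt(x^2 - x^3);
    x-values with x^2 - x^3 < 0 (that is, x > 1) contribute no points.
    """
    pts = []
    for x in xs:
        disc = x * x - x**3
        if disc < 0:
            continue
        r = math.sqrt(disc)
        pts.extend([(x, x - r), (x, x + r)])
    return pts

pts = level_zero_points([-1 + 0.25 * k for k in range(9)])  # x = -1, -0.75, ..., 1
print(len(pts))                           # 18
print(max(abs(f(x, y)) for x, y in pts))  # rounding error only
```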
Level curves are widely used in geography to draw elevation maps, atmospheric pressure maps, etc. They are known under many names, such as contour lines, isolines, or isobars, depending on what function is being represented.
What about a map $\mathbb{R}^2 \to \mathbb{R}^2$? The variable of such a map is a 2-vector, say $(x, y)$, and the value is also a 2-vector, say $(u, v)$. We can pick a grid in the $(x, y)$-plane and sketch the value vector $(u, v)$ at each point of the grid. What we get is a vector field, as shown in Figure 1.3.
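The grid recipe above can be sketched in a few lines of Python for the field $u = x - y$, $v = xy$ of Figure 1.3 (the function names and the grid size are illustrative choices). Passing the resulting pairs to a plotting routine such as matplotlib's `quiver` would draw the arrows; here we only print them.

```python
def field(x, y):
    """The vector field of Figure 1.3: (u, v) = (x - y, xy)."""
    return (x - y, x * y)

def grid_vectors(n=5, lo=-1.0, hi=1.0):
    """List of (grid point, field vector) pairs on an n-by-n grid over [lo, hi]^2."""
    step = (hi - lo) / (n - 1)
    out = []
    for i in range(n):
        for j in range(n):
            x, y = lo + i * step, lo + j * step
            out.append(((x, y), field(x, y)))
    return out

for (x, y), (u, v) in grid_vectors(3):
    print(f"at ({x:+.1f}, {y:+.1f}) draw the arrow ({u:+.1f}, {v:+.1f})")
```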
Be it a graph, level curves, or a vector field, the picture represents all the information about the map: it shows values of the variable together with values of the function. Sometimes it is useful to look only at the values of the function, that is, at the image of the map.
Definition 1.9 A curve in $\mathbb{R}^n$ is the range of a map $f: A \to \mathbb{R}^n$, where $A \subset \mathbb{R}$. A surface in $\mathbb{R}^n$ is the range of a map $f: A \to \mathbb{R}^n$, where $A \subset \mathbb{R}^2$.
[Figure 1.2: Level curves of the function $f(x, y) = x^3 - 2xy + y^2$]

[Figure 1.3: Vector field $u = x - y$, $v = xy$]

[Figure 1.4: A curve in $\mathbb{R}^2$ and a surface in $\mathbb{R}^3$]
Usually, when defining a curve or a surface, the map $f$ above is required to be continuous. Although we haven't yet defined the concept of continuity for maps, we will learn later that for a vector-valued function, continuity of the map is equivalent to continuity of all its component functions. For a curve, each component function is simply a real-valued function of one variable, and we already know what it means for such a function to be continuous. For a surface, each component function is a real-valued function of two real variables, and we will discuss later what continuity means in this case.
For example, Figure 1.4 shows the curve given by the map $f(t) = (x(t), y(t))$, where
$$x(t) = (4 + 2\sin 3t + 3\cos 2t)\cos t, \quad y(t) = (4 + 2\sin 3t + 3\cos 2t)\sin t,$$
and the surface given by the map $f(u, v) = (x, y, z)$, where
$$x(u, v) = (5 + \cos 3u)\sin v, \quad y(u, v) = (5 + \cos 3u)\cos v, \quad z(u, v) = \sin u.$$
1.3 Parametric equations
How do we describe a motion? For example, suppose you know that a treasure has been hidden in the forest around NTU and you want to walk around looking for it. You need to be sure that you visit every corner of the forest, so your trajectory should somehow fill the area, like the ones shown in Figure 1.5. Such curves cannot be graphs of functions because they do not pass the vertical line test (some values of the variable would be assigned more than one value of the function).
Nevertheless, we can use a map $\mathbb{R} \to \mathbb{R}^2$ to describe such a curve. Indeed, let $(x(t), y(t))$ be your coordinates $t$ minutes after you start walking. Then the function
$$f(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$$
shows your position at all times. The range of this function is the curve we're after.

[Figure 1.5: Examples of curves in $\mathbb{R}^2$]
Example 1.10 Taking the map $f(t) = (t\cos 100t, t\sin 100t)$ for $0 \le t \le 1$, we get the spiral shown in Figure 1.5 on the left. Thus the domain of the map $f$ is the interval $[0, 1]$ and the codomain is $\mathbb{R}^2$.
We already know that a curve in $\mathbb{R}^2$ is the range of a vector-valued function. Its domain is usually going to be a closed interval $[a, b]$ (otherwise the curve can be infinite). Also, we are not interested in discontinuous curves: if the curve we are looking at consists of several connected components, we consider each of them separately.
Definition 1.11 A map $f: [a, b] \to \mathbb{R}^n$ defines the curve $C = f([a, b])$. The parametric equations of the curve $C$ are the collection of component functions of the map $f$, that is,
$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \dots, \quad x_n = x_n(t),$$
where each component function $x_i$ is assumed to be continuous. If $f(a) = f(b)$, then the curve $C$ is said to be closed.
Given a closed curve, that is, one with $f(a) = f(b)$, notice that we can extend the function $f$ onto the whole line $\mathbb{R}$ by periodicity with period $b - a$. Thus we can say that a closed curve is the image of a periodic map $f: \mathbb{R} \to \mathbb{R}^n$.
[Figure 1.6: A hyperbola and a straight line through the points $(x_1, y_1)$ and $(x_2, y_2)$]
Example 1.12 The unit circle $x^2 + y^2 = 1$ is a closed curve given by the parametric equations
$$x = \cos t, \quad y = \sin t,$$
where $t \in \mathbb{R}$. Its period is $2\pi$, so we can take the domain to be any interval of length $2\pi$, such as $[-\pi, \pi]$ or $[0, 2\pi]$, whichever is convenient for us. On the other hand, the equations
$$x = \cos(t^3 + 1), \quad y = \sin(t^3 + 1)$$
define the same circle $x^2 + y^2 = 1$. Indeed, let $u = t^3 + 1$; then $t = \sqrt[3]{u - 1}$, and when $t$ runs from $-1$ to $\sqrt[3]{2\pi - 1}$, $u$ changes precisely from $0$ to $2\pi$.
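The reparametrization argument in Example 1.12 can be checked numerically: sample $t$ from $-1$ to $\sqrt[3]{2\pi - 1}$, confirm that $u = t^3 + 1$ increases from $0$ to $2\pi$, and confirm that every sampled point lies on the unit circle. A minimal sketch (the sample count and the names are our own choices):

```python
import math

def g(t):
    """Second parametrization from Example 1.12: (cos(t^3 + 1), sin(t^3 + 1))."""
    u = t**3 + 1
    return (math.cos(u), math.sin(u))

t_start = -1.0
t_end = (2 * math.pi - 1) ** (1 / 3)  # real cube root; 2*pi - 1 > 0

# u = t^3 + 1 is increasing, so sampling t in [t_start, t_end] sweeps u over [0, 2*pi] once.
n = 1000
ts = [t_start + (t_end - t_start) * k / n for k in range(n + 1)]
us = [t**3 + 1 for t in ts]

print(us[0], us[-1] / math.pi)  # 0.0 and approximately 2.0
print(max(abs(x * x + y * y - 1) for x, y in map(g, ts)))  # rounding error only
```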


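Exercise 1.2 Among the following parametric equations, which ones give the whole unit circle $x^2 + y^2 = 1$?
$$\begin{cases} x = \cos(t^2 + 1) \\ y = \sin(t^2 + 1) \end{cases}, \quad \begin{cases} x = \cos(\sin t) \\ y = \sin(\sin t) \end{cases}, \quad \begin{cases} x = \cos(\tan t) \\ y = \sin(\tan t) \end{cases}.$$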
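Example 1.13 The half-hyperbola $x^2 - y^2 = 1$, $x > 0$ (see Figure 1.6), can be given by either of the following parametric equations:
$$\begin{cases} x = \cosh t \\ y = \sinh t \end{cases}, \ t \in \mathbb{R}, \qquad \begin{cases} x = \sec t \\ y = \tan t \end{cases}, \ t \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right).$$
The domain of either of them is not a closed interval, and the curve is infinite.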
[Figure 1.7: Secant lines approaching the tangent]
Exercise 1.3 Given any function $f: \mathbb{R} \to \mathbb{R}$, define its graph by a parametric equation.
Example 1.14 Consider the straight line passing through the points $(x_1, y_1)$ and $(x_2, y_2)$ as shown in Figure 1.6. Let $\mathbf{s}$ be the vector from $(x_1, y_1)$ to $(x_2, y_2)$. Then an arbitrary point on the line can be obtained by shifting from $(x_1, y_1)$ in the direction parallel to the vector $\mathbf{s} = (x_2 - x_1, y_2 - y_1)$. Thus the parametric equation of the straight line is
$$x(t) = x_1 + (x_2 - x_1)t, \quad y(t) = y_1 + (y_2 - y_1)t,$$
which can be written as
$$\begin{cases} x = x_1 + at \\ y = y_1 + bt \end{cases}, \quad a = x_2 - x_1, \quad b = y_2 - y_1.$$
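How can we get an equation of the form $Ax + By + C = 0$ from here? If $a, b \neq 0$, we see that $t = \frac{x - x_1}{a} = \frac{y - y_1}{b}$, which gives us $0 = \frac{x - x_1}{a} - \frac{y - y_1}{b}$, so after multiplying by $ab$,
$$bx - ay + (ay_1 - bx_1) = 0.$$
This final equation also holds when $a = 0$ or $b = 0$ (a vertical or a horizontal line), as one checks directly.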
We know that a periodic function is usually given by a trigonometric formula. How do we find its period then? We need to express it as a sum of simple functions whose periods we know, like $\sin kx$ or $\cos kx$. The period of the whole sum is then the least common multiple of the summands' periods. For example, $\sin^2 x = \frac{1 - \cos 2x}{2}$, so its minimal period is $\pi$. Or
$$\sin\frac{x}{2}\cos x = \frac{1}{2}\left(\sin\frac{3x}{2} - \sin\frac{x}{2}\right),$$
so its minimal period is $4\pi$.
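The product-to-sum identity and the period claims can be spot-checked numerically. In this sketch the helper names, the sample grid and the tolerance are our own choices; a sampled check can rule out a candidate period but cannot prove minimality, which still requires the argument above.

```python
import math

def h(x):
    """The function sin(x/2) cos(x) from the text."""
    return math.sin(x / 2) * math.cos(x)

def h_as_sum(x):
    """Its product-to-sum form (sin(3x/2) - sin(x/2)) / 2."""
    return (math.sin(1.5 * x) - math.sin(0.5 * x)) / 2

def is_period(func, T, samples=400, tol=1e-9):
    """Check func(x + T) == func(x), up to tol, at sample points x in [0, 20)."""
    return all(abs(func(x + T) - func(x)) < tol
               for x in (20 * k / samples for k in range(samples)))

print(max(abs(h(x) - h_as_sum(x)) for x in range(-50, 51)))  # rounding error only
for T in (4 * math.pi / 3, 2 * math.pi, 4 * math.pi):
    print(round(T / math.pi, 3), is_period(h, T))  # only T = 4*pi passes
```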
Exercise 1.4 For $m, n \in \mathbb{N}$, what is the minimal period of the function $\sin\frac{x}{m}\sin\frac{x}{n}$? Use the formula $\sin x \sin y = \frac{\cos(x - y) - \cos(x + y)}{2}$ to simplify this expression.
1.4 Tangents of curves
One approach to defining the derivative of a function $\mathbb{R} \to \mathbb{R}$ is to consider the tangent line to the function's graph as the limit of secant lines (see Figure 1.7). We can do the same for a map $f: \mathbb{R} \to \mathbb{R}^2$ or $f: \mathbb{R} \to \mathbb{R}^n$. Let $f(t) = [x(t), y(t)]$ and, naturally, suppose that the component functions $x(t)$ and $y(t)$ are differentiable at $t = a$. Let the point $[x(a), y(a)]$ be fixed and consider the point $[x(a + h), y(a + h)]$ approaching it as $h$ tends to $0$. The parametric equation of the line $l_h$ passing through these two points is
$$x = x(a) + (x(a + h) - x(a))t, \quad y = y(a) + (y(a + h) - y(a))t.$$
Multiplying a vector by a nonzero constant doesn't change the line it spans, so the vectors $[x(a + h) - x(a), y(a + h) - y(a)]$ and $\frac{1}{h}[x(a + h) - x(a), y(a + h) - y(a)]$ are parallel to each other and to the straight line $l_h$. Thus the parametric equation
$$x = x(a) + \frac{x(a + h) - x(a)}{h}t, \quad y = y(a) + \frac{y(a + h) - y(a)}{h}t$$
defines the same straight line. Taking the limit of these expressions as $h \to 0$, we see that the parametric equation of the tangent to the curve at $t = a$ should be
$$x = x(a) + x'(a)t, \quad y = y(a) + y'(a)t.$$
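To see this limit numerically, the sketch below takes the circle $x = \cos t$, $y = \sin t$ (our choice of example), forms the scaled secant direction for shrinking $h$ and measures its distance to $(x'(a), y'(a)) = (-\sin a, \cos a)$ at the illustrative point $a = 0.7$. The helper names are our own.

```python
import math

def secant_direction(x, y, a, h):
    """Direction of the secant l_h scaled by 1/h, as in the derivation above."""
    return ((x(a + h) - x(a)) / h, (y(a + h) - y(a)) / h)

def tangent_error(a, h):
    """Distance from the scaled secant direction to (x'(a), y'(a)) on the unit circle."""
    dx, dy = secant_direction(math.cos, math.sin, a, h)
    return math.hypot(dx - (-math.sin(a)), dy - math.cos(a))

for h in (0.1, 0.01, 0.001):
    print(h, tangent_error(0.7, h))  # shrinks roughly like h / 2
```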
Definition 1.15 Given a curve $\mathcal{C}$ in $\mathbb{R}^2$ with parametric equation $x = x(t)$, $y = y(t)$, if $(x'(a), y'(a)) \neq (0, 0)$, the tangent line to $\mathcal{C}$ at the point $(x(a), y(a))$ is defined by
$$\begin{cases} x = x(a) + x'(a)t, \\ y = y(a) + y'(a)t. \end{cases} \qquad (1.1)$$
Example 1.16 Consider the circle given by $x = \cos t$, $y = \sin t$. According to the argument above, the tangent line at the point $t = a$ is
$$x = \cos a - t\sin a, \quad y = \sin a + t\cos a.$$
Thus it passes through the point $(\cos a, \sin a) = (x, y)$ and is parallel to the vector $(-\sin a, \cos a) = (-y, x)$, which is orthogonal to the vector $(x, y)$. This agrees with the familiar fact that a tangent to a circle is perpendicular to the radius at the point of tangency.
Recall that given a function $f: \mathbb{R} \to \mathbb{R}$, the tangent line to its graph at a point $[a, f(a)]$ is
$$y = f(a) + f'(a)(x - a). \qquad (1.2)$$
Theorem 1.17 For any function $f: \mathbb{R} \to \mathbb{R}$ differentiable at $x = a$, equations (1.1) and (1.2) give the same straight line.
Proof The graph of the function is given by the parametric equation $x = t$, $y = f(t)$. Since $x'(a) = 1 \neq 0$, formula (1.1) applies, and we need to check that $y = f(a) + f'(a)(x - a)$ and
$$x = a + t, \quad y = f(a) + f'(a)t$$
define the same straight line. Since a straight line is completely determined by any two of its points, it is enough to find two different points belonging to both lines. For $t = 0$, we have $x = a$, $y = f(a)$, which satisfies equation (1.2). For $t = 1$, we have $x = a + 1$, $y = f(a) + f'(a)$, which also satisfies (1.2). :)
Example 1.18 Consider the curve $x = \cos 2t\cos t$, $y = \cos 2t\sin t$, and let us find out how many times this curve passes through the origin and what the equations of the tangent lines at $(0, 0)$ are.
First, what is the period of this map? We have $\cos 2t\cos t = \frac{\cos t + \cos 3t}{2}$, so its period is $2\pi$. Also, $\cos 2t\sin t = \frac{\sin 3t - \sin t}{2}$, so its period is $2\pi$ as well, and so is the period of the whole map $f$. Further, if the point $(0, 0)$ belongs to the curve, then
$$\cos 2a\cos a = 0, \quad \cos 2a\sin a = 0.$$
Therefore $\cos 2a = 0$ or $\cos a = \sin a = 0$. The latter case is impossible, so $\cos 2a = 0$, which has the following four solutions within $[0, 2\pi)$:
$$a = \frac{\pi}{4}, \quad a = \frac{3\pi}{4}, \quad a = \frac{5\pi}{4}, \quad a = \frac{7\pi}{4}.$$
In other words, the curve passes through the origin four times per period.
Let's find the tangent lines. We have $x'(t) = -2\sin 2t\cos t - \cos 2t\sin t$ and $y'(t) = -2\sin 2t\sin t + \cos 2t\cos t$. Now we only need to plug these into formula (1.1). We get the following lines:
$$\begin{cases} x = -\sqrt{2}t \\ y = -\sqrt{2}t \end{cases}, \quad \begin{cases} x = -\sqrt{2}t \\ y = \sqrt{2}t \end{cases}, \quad \begin{cases} x = \sqrt{2}t \\ y = \sqrt{2}t \end{cases}, \quad \begin{cases} x = \sqrt{2}t \\ y = -\sqrt{2}t \end{cases}.$$
Noticing that the first and third coincide, and so do the second and fourth, we see that there are two different tangent lines at the origin: $x + y = 0$ and $x - y = 0$.
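The computations of Example 1.18 can be double-checked numerically: at each $a = k\pi/4$, $k = 1, 3, 5, 7$, the curve should be at the origin and the velocity should be $(\pm\sqrt{2}, \pm\sqrt{2})$ with the signs found above. A minimal sketch (function names are our own):

```python
import math

def point(t):
    """The curve of Example 1.18."""
    return (math.cos(2 * t) * math.cos(t), math.cos(2 * t) * math.sin(t))

def velocity(t):
    """(x'(t), y'(t)) as computed in Example 1.18."""
    dx = -2 * math.sin(2 * t) * math.cos(t) - math.cos(2 * t) * math.sin(t)
    dy = -2 * math.sin(2 * t) * math.sin(t) + math.cos(2 * t) * math.cos(t)
    return (dx, dy)

for k in (1, 3, 5, 7):
    a = k * math.pi / 4
    x, y = point(a)
    dx, dy = velocity(a)
    print(f"a = {k}pi/4: point ({x:+.1e}, {y:+.1e}), direction ({dx:+.4f}, {dy:+.4f})")
```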
1.5 Exercises
Questions on understanding the lecture
Exercise 1.5 Check that the following parametric equations give the same curve:
$$x = 1 + \cos t, \ y = 2 + \sin t, \ \pi \le t \le 2\pi; \qquad x = t, \ y = 2 - \sqrt{2t - t^2}, \ 0 \le t \le 2.$$
What kind of a curve is it? Find a single equation in $x, y$ that represents this curve.
Exercise 1.6 Check that the parametric equations
x = 2t, y = 3t;
x = 2 6t, y = 3 +9t.
give the same curve. What kind of a curve is it? Find a single equation in x, y
that represents this curve.
Exercise 1.7 A parametric equation of a 3D curve is, naturally, $x = x(t)$, $y = y(t)$, $z = z(t)$.
(i) What is a parametric equation of a straight line in $\mathbb{R}^3$?
(ii) Write down a parametric equation for the tangent line to a curve in $\mathbb{R}^3$ given by $x = x(t)$, $y = y(t)$, $z = z(t)$, at a point $(x_0, y_0, z_0)$.
(iii) Write down a parametric equation for the tangent line to the curve $x = t$, $y = t$, $z = t^2$ at the point $(1, 1, 1)$.
(iv) Write down the formula for the length of a curve in $\mathbb{R}^3$ given by $x = x(t)$, $y = y(t)$, $z = z(t)$ for $a \le t \le b$.
Questions on calculation
Exercise 1.8 Consider the closed curve in $\mathbb{R}^2$ given by $x = \sin 3t\cos t$, $y = \sin 3t\sin t$.
(i) What is the minimal period of the corresponding map $f: \mathbb{R}_t \to \mathbb{R}^2_{x,y}$?
(ii) How many times does this curve pass through the origin?
(iii) What are the tangent lines at the point $(0, 0)$?
Exercise 1.9 Find the lengths of the following curves:
(i) $y^2 = 2px$, $x \in [0, x_0]$ for some parameters $p, x_0$.
(ii) $y = \ln\cos x$, $x \in [0, a]$ for some positive parameter $a < \frac{\pi}{2}$.
(iii) $f(t) = \begin{pmatrix} \cos^2 t \\ \sin^2 t \end{pmatrix}$; here you have to find the minimal period first.
(iv) $x^{2/3} + y^{2/3} = 1$; here you have to find a parametrization first.
Questions on logical thinking
Exercise 1.10 Prove that any straight line is tangent to itself at each of its points.
Exercise 1.11 Prove that the angle between the tangent line to the curve $x = 3\cos t$, $y = 3\sin t$, $z = 2t$ at any point and the $z$-axis is constant (does not depend on the point).
Exercise 1.12 Find a formula for the length of the graph of a function $\varphi = \varphi(r)$, $r \in [a, b]$, in polar coordinates.
Lecture 2
Partial derivatives
2.1 Restriction of a map
Given a map $f: \mathbb{R} \to \mathbb{R}^2$, its value is a 2-vector $[x(t), y(t)]$, where $x(t)$ and $y(t)$ are the component functions of the map $f$. In fact, $x(t)$ and $y(t)$ are functions of one variable, so we know well how to deal with them. The graph of such a map is a curve in $\mathbb{R}^3_{t,x,y}$.
The situation is much more complicated for a function $f: \mathbb{R}^2 \to \mathbb{R}$ or, more generally, $f: A \to \mathbb{R}$ for $A \subset \mathbb{R}^2$. The variable is a vector $(x, y) \in \mathbb{R}^2$ and the value of the function is a real number $z = f(x, y)$, so the graph of the function is a surface in the space $\mathbb{R}^3_{x,y,z}$.
Let's try the same idea: decompose the function $f(x, y)$ into functions of one variable. Fix a point $(a, b) \in \mathbb{R}^2$ and consider the two functions $f_1(t) = f(t, b)$ and $f_2(t) = f(a, t)$. Geometrically, this means that we take cross-sections of the graph of $f$ by the planes $y = b$ and $x = a$, respectively.
Example 2.1 Consider the function $f(x, y) = x^3 - xy + y^2$ and the point $(a, b) = (0, 0)$. Then $f_1(t) = f(t, 0) = t^3$ and $f_2(t) = f(0, t) = t^2$. Figure 2.1 shows the graphs of the functions $f_1$ and $f_2$ as cross-sections of the surface by the planes $y = 0$ and $x = 0$.
Of course, $f_1$ and $f_2$ do not describe the function $f(x, y)$ fully. Indeed, look at the picture: their graphs do not cover the whole surface, since two curves are simply not enough to represent a 2-dimensional object.
Example 2.2 Consider the functions $f(x, y) = |xy|$, $g(x, y) = \sin(xy)$, and $h(x, y) = 0$ and the point $(a, b) = (0, 0)$. Notice that
$$f(t, 0) = f(0, t) = g(t, 0) = g(0, t) = h(t, 0) = h(0, t) = 0,$$
but the three functions are definitely different.

[Figure 2.1: $f(x, y) = x^3 - xy + y^2$]
When taking $f_1(t) = f(t, b)$, what we actually do is fix $y = b$ and look only at the values of the function $f$ on the line $y = b$. Instead of this line, we could take an arbitrary set.
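Definition 2.3 Let $f: A \to \mathbb{R}^n$ be a map defined on $A \subset \mathbb{R}^m$ and let $B \subset A$. The restriction of the map $f$ onto the set $B$ is the map $f|_B: B \to \mathbb{R}^n$ such that $f|_B(\mathbf{x}) = f(\mathbf{x})$ for any $\mathbf{x} \in B$, and $f|_B$ is not defined for $\mathbf{x} \notin B$.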
Example 2.4 The restriction of the function $f(x, y) = x^2 - y^3$ onto the parabola $y = x^2$ is $f(x, x^2) = x^2 - (x^2)^3 = x^2 - x^6$. Geometrically, we can see it as the intersection of the graph of the function $z = f(x, y) = x^2 - y^3$ with the parabolic cylinder $y = x^2$, as shown in Figure 2.2.
Example 2.5 Let's find the restriction of the function $f(x, y, z) = e^{z + x - y} - z$ onto the plane $x + y + z = 0$. On this plane we have $z = -x - y$, and therefore
$$f|_{x+y+z=0} = f(x, y, -x - y) = e^{-x - y + x - y} - (-x - y) = e^{-2y} + x + y.$$
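A quick way to catch algebra slips when restricting a function onto a surface is to compare the closed form with the original function at random points of that surface. A minimal sketch for Example 2.5 (the helper `restricted`, the seed and the sampling box are illustrative choices):

```python
import math
import random

def f(x, y, z):
    """The function of Example 2.5."""
    return math.exp(z + x - y) - z

def restricted(x, y):
    """The closed form of f on the plane x + y + z = 0 found above."""
    return math.exp(-2 * y) + x + y

random.seed(0)
worst = 0.0
for _ in range(1000):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    z = -x - y  # this choice puts (x, y, z) on the plane
    worst = max(worst, abs(f(x, y, z) - restricted(x, y)))
print(worst)  # rounding error only
```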
For a function $f(x, y)$ and a point $(a, b)$, the restrictions onto the lines $x = a$ and $y = b$ are $f_2(t) = f(a, t)$ and $f_1(t) = f(t, b)$, respectively. What if we consider restrictions onto all straight lines passing through the point $(a, b)$? An arbitrary straight line containing $(a, b)$ is given by a parametric equation
$$\begin{cases} x = a + At \\ y = b + Bt \end{cases}, \quad A^2 + B^2 > 0,$$
where the vector $(A, B) \in \mathbb{R}^2$ is parallel to the straight line. The function restricted onto this line is therefore $f(a + At, b + Bt)$.

[Figure 2.2: $f(x, y) = x^2 - y^3$ restricted onto the parabola $y = x^2$]
Example 2.6 For $f(x, y) = \cos(2x - y)$ and $(a, b) = (-2, 1)$, we get $f(a + At, b + Bt) = \cos(2(-2 + At) - (1 + Bt)) = \cos(-5 + 2At - Bt) = \cos(5 - 2At + Bt)$, since cosine is even.
Of course, the union of all these straight lines is the whole plane, so the collection of restricted functions $f(a + At, b + Bt)$ fully determines the original function $f(x, y)$. But in calculus we also have to deal with limits, and there straight lines alone can be misleading.
Example 2.7 Let $f: \mathbb{R}^2 \to \mathbb{R}$ be given by
$$f(x, y) = \begin{cases} 1, & y = x^2, \\ 0, & y \neq x^2. \end{cases}$$
For instance, $f(0, 1) = f(2, 3) = f(5, 5) = 0$ and $f(0, 0) = f(1, 1) = f(3, 9) = 1$. Thus, on a straight line through the origin,
$$f(At, Bt) = \begin{cases} 1, & Bt = A^2t^2, \\ 0, & Bt \neq A^2t^2. \end{cases}$$
The equation $Bt = A^2t^2$ has at most two roots, $t = 0$ and (if $A \neq 0$) $t = \frac{B}{A^2}$; hence the function $f(At, Bt)$ equals $0$ everywhere except at most two points. Therefore $\lim_{t \to 0} f(At, Bt) = 0$.
On the other hand, let's look at the restriction of $f$ onto the parabola $y = x^2$. We have $f(x, x^2) = 1$, therefore $\lim_{x \to 0} f(x, x^2) = 1$. Although the limit along any straight line through the origin is $0$, the limit along the parabola is not $0$.
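The contrast in Example 2.7 can be reproduced in a few lines. In this sketch the direction $(A, B) = (1, 2)$ and the sequence of $t$-values are illustrative choices; the exact floating-point comparison in `f` is safe here only because the parabola points are built with the same arithmetic that `f` uses.

```python
def f(x, y):
    """Example 2.7: equals 1 exactly on the parabola y = x^2 and 0 elsewhere."""
    return 1 if y == x * x else 0

ts = [10.0 ** -k for k in range(1, 8)]  # t = 0.1, 0.01, ..., 1e-7, approaching 0

# Restriction onto the line x = At, y = Bt with the illustrative choice (A, B) = (1, 2).
along_line = [f(1.0 * t, 2.0 * t) for t in ts]
# Restriction onto the parabola y = x^2.
along_parabola = [f(t, t * t) for t in ts]

print(along_line)      # [0, 0, 0, 0, 0, 0, 0]
print(along_parabola)  # [1, 1, 1, 1, 1, 1, 1]
```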
We see that even if the limit of the function restricted onto every straight line is the same, it doesn't yet mean that the limit along some other curve is the same. So what should we do? Should we just forget about the idea of restricting onto straight lines? No! Recall that one of the main applications of derivatives is optimization: given a function $f(x)$, the equation $f'(x) = 0$ helps to find maxima and minima of $f(x)$.
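Definition 2.8 Given a function $f: A \to \mathbb{R}$, where $A \subset \mathbb{R}^m$, a point $\mathbf{a} = (a_1, \dots, a_m) \in A$ is a point of
• global minimum if $f(\mathbf{x}) \ge f(\mathbf{a})$ for any $\mathbf{x} \in A$,
• local minimum if there is $\delta > 0$ such that $f(\mathbf{x}) \ge f(\mathbf{a})$ for all $\mathbf{x} \in A$ with $|\mathbf{x} - \mathbf{a}| < \delta$,
• global maximum if $f(\mathbf{x}) \le f(\mathbf{a})$ for any $\mathbf{x} \in A$,
• local maximum if there is $\delta > 0$ such that $f(\mathbf{x}) \le f(\mathbf{a})$ for all $\mathbf{x} \in A$ with $|\mathbf{x} - \mathbf{a}| < \delta$.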
Example 2.9 Let $f(x, y) = \cos(2e^x - \ln y)$. Then any point $(x, y)$ such that $f(x, y) = 1$ is a point of global maximum because cosine cannot exceed 1. Recall that $\cos 0 = 1$, so we can try $2e^x - \ln y = 0$. For instance, if $x = 0$, then $2 - \ln y = 0$ and $y = e^2$. Thus $(0, e^2)$ is a point of global maximum of the function $f(x, y)$.
Of course, a global minimum or maximum is always a local one, too. The converse is not true in general. Recall that in order to find the global minimum of a function $f(x)$ on an interval $[a, b]$, we consider all roots of the equation $f'(x) = 0$, all points where $f'(x)$ is not defined, and the endpoints $a$ and $b$, and pick the minimal value of $f(x)$ among them.
Lemma 2.10 Assume that a function $f: A \to \mathbb{R}$, $A \subset \mathbb{R}^m$, has a local/global minimum/maximum at a point $\mathbf{a} \in B \subset A$. Then the restriction $f|_B$ also has a local/global minimum/maximum at the point $\mathbf{a}$.
Proof Let us check the statement for a global minimum (the rest is similar). If $\mathbf{a}$ is a global minimum of the function $f$, then $f(\mathbf{x}) \ge f(\mathbf{a})$ for any $\mathbf{x} \in A$. In particular, $f(\mathbf{x}) \ge f(\mathbf{a})$ for any $\mathbf{x} \in B$ because $B \subset A$, and therefore $\mathbf{a}$ is a global minimum of $f|_B$, too. :)
2.2 Partial derivative and optimization
Consider a function of two variables $f: \mathbb{R}^2 \to \mathbb{R}$ and a point $(a, b) \in \mathbb{R}^2$. Recall that we have defined the functions $f_1(t) = f(t, b)$ by fixing $y = b$ and $f_2(t) = f(a, t)$ by fixing $x = a$. Now we can differentiate them. More generally, for $f: \mathbb{R}^m \to \mathbb{R}$, we have the following definition.
Definition 2.11 Let $f: \mathbb{R}^m \to \mathbb{R}$ be a function of several variables and let $\mathbf{a} = (a_1, \dots, a_m) \in \mathbb{R}^m$. Further, consider the function $f_i(t) = f(a_1, \dots, a_{i-1}, t, a_{i+1}, \dots, a_m)$. Its derivative
$$f_i'(a_i) = \frac{\partial f}{\partial x_i}(\mathbf{a}) = f_{x_i}(\mathbf{a})$$
is called the partial derivative of $f$ with respect to $x_i$ at the point $(a_1, \dots, a_m)$.
Example 2.12 Let $f(x, y) = e^{|xy|}y$ and $(a, b) = (0, 0)$. Then by fixing $y = 0$, we get $f_1(x) = e^{|x \cdot 0|} \cdot 0 = 0$. By fixing $x = 0$, we get $f_2(y) = e^{|0 \cdot y|}y = y$. Therefore $f_x(0, 0) = f_1'(0) = 0$ and $f_y(0, 0) = f_2'(0) = 1$.
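Example 2.13 Let
$$f(x, y) = \begin{cases} x\sin\frac{1}{y}, & y \neq 0, \\ x^2, & y = 0. \end{cases}$$
Then $f(x, 0) = x^2$, so $f_x(0, 0) = 2x|_{x=0} = 0$. Further,
$$f(0, y) = \begin{cases} 0 \cdot \sin\frac{1}{y} = 0, & y \neq 0, \\ 0^2 = 0, & y = 0 \end{cases} = 0,$$
so $f_y(0, 0) = 0$.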
In general, to nd partial derivatives of a function of several variables, we
treat all variables except one as constant parameters and differentiate as usual.
Example 2.14 Let $f(x, y, z) = e^{xy - z}\cos(2x + y)$. Let's find its partial derivatives. First,
$$f_x = \left(e^{xy - z}\right)_x\cos(2x + y) + e^{xy - z}\left(\cos(2x + y)\right)_x = ye^{xy - z}\cos(2x + y) - 2e^{xy - z}\sin(2x + y).$$
In the same manner,
$$f_y = \left(e^{xy - z}\right)_y\cos(2x + y) + e^{xy - z}\left(\cos(2x + y)\right)_y = xe^{xy - z}\cos(2x + y) - e^{xy - z}\sin(2x + y).$$
The situation with $z$ is easier because $\cos(2x + y)$ does not depend on $z$ and therefore can be treated as a constant when differentiating with respect to $z$:
$$f_z = \left(e^{xy - z}\right)_z\cos(2x + y) = -e^{xy - z}\cos(2x + y).$$
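Central differences offer a quick numerical check of the three partial derivatives: each difference quotient varies one variable and keeps the other two fixed, exactly as in Definition 2.11. The sketch below (the helper names, the step $h = 10^{-6}$ and the sample point are our own choices) compares them with the formulas of Example 2.14.

```python
import math

def f(x, y, z):
    """The function of Example 2.14."""
    return math.exp(x * y - z) * math.cos(2 * x + y)

def grad_formula(x, y, z):
    """(f_x, f_y, f_z) as computed in Example 2.14."""
    e = math.exp(x * y - z)
    c, s = math.cos(2 * x + y), math.sin(2 * x + y)
    return (y * e * c - 2 * e * s, x * e * c - e * s, -e * c)

def grad_numeric(x, y, z, h=1e-6):
    """Central differences: move one variable by +/- h, keep the other two fixed."""
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )

p = (0.3, -0.8, 0.5)
print(grad_formula(*p))
print(grad_numeric(*p))  # should agree with the line above to many decimal places
```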
What are partial derivatives for? First of all, partial derivatives are defined in terms of restrictions of the given function, and by Lemma 2.10, a maximum or minimum of the whole function must also be a maximum or minimum of a restricted one.
Theorem 2.15 Assume that a function $f(x_1, \dots, x_m)$ has a (local or global) minimum or maximum at a point $\mathbf{a} = (a_1, \dots, a_m)$. Then, if the partial derivatives at $\mathbf{a}$ exist, they all must vanish, that is,
$$f_{x_1}(\mathbf{a}) = f_{x_2}(\mathbf{a}) = \dots = f_{x_m}(\mathbf{a}) = 0.$$
Proof Let f
i
(t) = f (a
1
, . . . , a
i1
, t, a
i+1
, . . . , a
m
) be the function restricted onto
the line x
i
= t with x
j
= a
j
xed for j i. By Lemma 2.10, the function f
i
(t)
also has a minimum or a maximum at t = a
i
. Therefore f
x
i
(a) = f
/
i
(a
i
) = 0. :)
Example 2.16 Let $f(x, y) = x^2 - xy + y^2 - 2x + y$. Notice that $f$ doesn't have a global maximum because $f(x, 0) = x^2 - 2x$ can be arbitrarily large. Let's check if we can find a global minimum. In order to do it, we solve the system
$$\begin{cases} f_x = 2x - y - 2 = 0, \\ f_y = -x + 2y + 1 = 0 \end{cases} \iff x = 1, \; y = 0.$$
Thus, the only possible point of global minimum is $(1, 0)$. But a function might not have a global minimum at all, so we still need to check if that's our case. Since $x = 1$, let's factor out $(x - 1)$ as
$$f = x^2 - 2x - xy + y + y^2 = (x - 1)^2 - 1 - y(x - 1) + y^2 = \left(x - 1 - \frac{y}{2}\right)^2 + \frac{3y^2}{4} - 1,$$
so it is now obvious that $f$ has a global minimum $f(1, 0) = -1$ at $(x, y) = (1, 0)$, because $f + 1$ is a sum of squares that vanishes only at this point.
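The completed-square identity in Example 2.16 is easy to spot-check by machine. The Python sketch below (our illustration, not the notes' method) compares both expressions on a grid and reports the smallest grid value of $f$; the grid range and spacing are arbitrary assumptions, and a grid search only illustrates, never proves, a global minimum.

```python
def f(x, y):
    # Example 2.16: f(x, y) = x^2 - xy + y^2 - 2x + y
    return x**2 - x*y + y**2 - 2*x + y

def completed_square(x, y):
    # (x - 1 - y/2)^2 + 3y^2/4 - 1, the rewritten form of f
    return (x - 1 - y / 2) ** 2 + 3 * y**2 / 4 - 1

grid = [i / 10 for i in range(-30, 31)]          # points -3.0, -2.9, ..., 3.0
max_gap = max(abs(f(x, y) - completed_square(x, y)) for x in grid for y in grid)
smallest = min((f(x, y), x, y) for x in grid for y in grid)
print(max_gap)    # tiny: the two expressions differ only by rounding error
print(smallest)   # (value, x, y) of the smallest grid value
```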
Definition 2.17 Given a function $f(x_1, \ldots, x_n)$, a point $a = (a_1, \ldots, a_n)$ is a critical (or stationary) point of the function $f$ if
$$f_{x_1}(a) = f_{x_2}(a) = \cdots = f_{x_n}(a) = 0.$$
Figure 2.3: Level curves of the function $f(x, y) = (x - y^2)(2x - y^2)$.
Example 2.18 Let $f(x, y) = x^2 - y^2$. Let's find its critical points. We have $f_x = 2x = 0$, so $x = 0$, and $f_y = -2y = 0$, so $y = 0$. Is $(0, 0)$ a point of local minimum or of local maximum of the function $f$?
Let's consider the restrictions. The restriction $f(x, 0) = x^2$ has a minimum at $0$, while $f(0, y) = -y^2$ has a maximum at $0$. Thus $(0, 0)$ is neither a maximal nor a minimal point.
The following example shows that although restrictions to straight lines help to find maxima or minima, they cannot be used to determine the type of a critical point.
Example 2.19 Consider the function $f(x, y) = (x - y^2)(2x - y^2)$. Then
$$f_x = (2x - y^2) + 2(x - y^2) = 4x - 3y^2,$$
$$f_y = -2y(2x - y^2) - 2y(x - y^2) = -2y(3x - 2y^2).$$
Solving the system $f_x = f_y = 0$, we get a critical point $(0, 0)$ with $f(0, 0) = 0$. Is it a local minimum or a local maximum?
Let's consider its restriction to an arbitrary straight line $y = kx$. We then have
$$f(x, kx) = \left(x - k^2x^2\right)\left(2x - k^2x^2\right) = 2x^2 - 3k^2x^3 + k^4x^4,$$
which has a local minimum at $x = 0$. We missed the straight line $x = 0$, for which $f(0, y) = y^4$, which also has a local minimum at $0$.
However, for $\frac{y^2}{2} < x < y^2$, the value $f(x, y)$ is negative. In particular, consider now the restriction to the parabola $x = \frac{3}{4}y^2$. Then we have
$$f\left(\tfrac{3}{4}y^2, y\right) = \left(\tfrac{3}{4}y^2 - y^2\right)\left(\tfrac{3}{2}y^2 - y^2\right) = -\frac{y^4}{8},$$
which has a local maximum at $0$. Level curves of the function $f(x, y)$ are shown in Figure 2.3.
Thus $(0, 0)$ is a local minimum of the restriction to any straight line and at the same time it is a local maximum of the restriction to some parabola. Hence it can be neither a local minimum nor a local maximum of the whole function.
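Sampling points cannot prove anything about limits, but a small experiment makes the contrast in Example 2.19 concrete. The Python sketch below (our illustration; the chosen slopes `k` and the sampling range are arbitrary assumptions) checks that $f$ is positive near the origin along several straight lines and along $x = 0$, yet negative along the parabola $x = \frac{3}{4}y^2$.

```python
def f(x, y):
    # Example 2.19: f(x, y) = (x - y^2)(2x - y^2)
    return (x - y**2) * (2 * x - y**2)

small = [t / 1000 for t in range(-50, 51) if t != 0]   # points near 0, excluding 0

# Along each tested line y = kx (and along x = 0), f > 0 = f(0, 0) near the origin.
lines_positive = all(f(t, k * t) > 0 for k in (-3, -1, 0, 0.5, 2) for t in small)
axis_positive = all(f(0, t) > 0 for t in small)

# Along the parabola x = (3/4) y^2, f = -y^4/8 < 0.
parabola_negative = all(f(0.75 * t**2, t) < 0 for t in small)
print(lines_positive, axis_positive, parabola_negative)   # True True True
```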
2.3 Limits and continuity in vector spaces
The statement $\lim_{x \to a} f(x) = l$ means that the closer $x$ approaches $a$, the closer the value of $f(x)$ gets to $l$. What does "closer" mean? Closer means that the distance is small. Recall that for vectors $x, y \in \mathbb{R}^n$, the distance between them is
$$\|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$
Since one can measure distance in $\mathbb{R}^n$, it's possible to define limits. The definition is almost the same as the one from the 1st semester of calculus; the only difference is that $x, a \in \mathbb{R}^m$, $f(x), l \in \mathbb{R}^n$, and $\|\cdot\|$ is taken instead of $|\cdot|$.
Definition 2.20 Given a map $f : \mathbb{R}^m \to \mathbb{R}^n$, we say that
$$\lim_{x \to a} f(x) = l$$
if for any $\varepsilon > 0$ there is $\delta > 0$ such that $\|f(x) - l\| < \varepsilon$ whenever $0 < \|x - a\| < \delta$.
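Example 2.21 Let
$$f(x, y) = \begin{cases} x \sin \frac{1}{xy}, & xy \neq 0, \\ 0, & xy = 0. \end{cases}$$
Let's check that the limit as $(x, y) \to (0, 0)$ is $0$. Consider $\varepsilon > 0$. We need to find $\delta > 0$ such that $\sqrt{x^2 + y^2} < \delta$ implies $|f(x, y)| < \varepsilon$. We have
$$|f(x, y)| \le \left|x \sin \frac{1}{xy}\right| \le |x| \le \sqrt{x^2 + y^2}.$$
If we set $\delta = \varepsilon$, everything works just fine.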
In Definition 2.20 the domain of $f$ is assumed to be all of $\mathbb{R}^m$, but we will also have to define limits for functions with smaller domains. Suppose that $f$ has domain $D \subset \mathbb{R}^m$. The same basic idea still applies, so we want the statement
$$\lim_{x \to a} f(x) = l$$
to mean that we can make the function values $f(x)$ arbitrarily close to $l$ by choosing $x$ close enough to $a$, but with the obvious condition that $x$ must also belong to $D$. This will, however, only work if it's actually possible to get close to $a$ while remaining in $D$. If, for example, $D = \{(x, y) \in \mathbb{R}^2 : y < 0\} \cup \{(0, 1)\}$, we see that the point $(0, 1)$ belongs to $D$ but it's isolated from the rest of $D$. More exactly, there are no points in $D$ within distance $1$ from $(0, 1)$ other than the point $(0, 1)$ itself. In this situation, it would make no sense to try to uniquely define the limit $\lim_{(x,y) \to (0,1)} f(x, y)$, and we will not do so. Instead we will only consider limits at points which are "reachable" within $D$.
Definition 2.22 Suppose $D$ is a subset of $\mathbb{R}^m$. We say that $a \in \mathbb{R}^m$ is a limit point of $D$ if for every $\delta > 0$, there exists $x \in D$ such that $0 < \|x - a\| < \delta$.
We can now define limits at such points.
Definition 2.23 Given a map $f : D \to \mathbb{R}^n$ where $D \subset \mathbb{R}^m$, and a limit point $a$ of $D$, we say that
$$\lim_{x \to a} f(x) = l$$
if for each $\varepsilon > 0$ there is a $\delta > 0$ such that whenever $0 < \|x - a\| < \delta$, $x \in D$, we have
$$\|f(x) - l\| < \varepsilon.$$
Since the definition sounds the same, we hope that applying it would be similar to applying the one from the 1st semester of calculus. First, let's notice that if a map has a limit, then any of its restrictions has the same limit.
Lemma 2.24 Consider a map $f : A \to \mathbb{R}^n$ and $B \subset A$, where $a$ is a limit point of $B$. Then
$$\lim_{x \to a} f(x) = l \implies \lim_{x \to a} f|_B(x) = l.$$
Figure 2.4: Level curves and graph of the function $f(x, y) = \frac{xy}{x^2 + y^2}$.
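Proof Suppose $\lim_{x \to a} f(x) = l$.
Now take any $\varepsilon > 0$. We know that there exists $\delta > 0$ such that
$$0 < \|x - a\| < \delta, \; x \in A \implies \|f(x) - l\| < \varepsilon.$$
Since $B \subset A$, in particular we have
$$0 < \|x - a\| < \delta, \; x \in B \implies \|f|_B(x) - l\| < \varepsilon.$$
That is, $\lim_{x \to a} f|_B(x) = l$. :)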
Given a function $f(x_1, \ldots, x_m)$ and a point $a = (a_1, \ldots, a_m)$, assume that there are two straight lines $l_1$ and $l_2$ passing through the point $a$ such that
$$\lim_{x \to a} f|_{l_1} \neq \lim_{x \to a} f|_{l_2}.$$
Then, by Lemma 2.24, $\lim_{x \to a} f$ does not exist.
Example 2.25 Consider the function $f(x, y) = \frac{xy}{x^2 + y^2}$ shown in Figure 2.4. Let's find its restriction onto a line $y = kx$. It is
$$f(x, kx) = \frac{kx^2}{x^2 + k^2x^2} = \frac{k}{k^2 + 1},$$
so it has different limits along different lines. For instance, the limit along the line $y = 0$ is $0$ and the limit along the line $y = x$ is $\frac{1}{2}$. Thus the limit
$$\lim_{(x,y) \to (0,0)} f(x, y)$$
does not exist.
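Exact rational arithmetic shows the constant values along the lines in Example 2.25 without any rounding. In this Python sketch (our illustration; the slopes and sample points are arbitrary choices) `fractions.Fraction` evaluates $f$ at points $(t, kt)$ approaching the origin, and every sampled value equals $\frac{k}{k^2 + 1}$.

```python
from fractions import Fraction

def f(x, y):
    # Example 2.25: f(x, y) = xy / (x^2 + y^2), undefined at the origin
    return x * y / (x**2 + y**2)

constant_along_line = {}
for k in (0, 1, 2, -1):
    # exact values of f at (t, kt) for t = 1/10, 1/100, ..., 1/100000
    values = {f(Fraction(1, 10**j), k * Fraction(1, 10**j)) for j in range(1, 6)}
    constant_along_line[k] = values
    print(f"y = {k}x: values {sorted(values)}, predicted {Fraction(k, k**2 + 1)}")
```

Each set contains a single value, and different slopes give different values, which is exactly why the two-variable limit fails to exist.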
Example 2.26 Let $f(x, y) = \frac{x^2 - y^4}{x^2 + x^2y^2 + y^4}$. Consider restrictions onto straight lines. For $y = kx$, we have
$$f(x, kx) = \frac{x^2 - k^4x^4}{x^2 + k^2x^4 + k^4x^4} = \frac{1 - k^4x^2}{1 + k^2x^2 + k^4x^2},$$
so the limit as $x \to 0$ is $1$. For $x = 0$, we have $f(0, y) = -1$, so it's different. Thus this function doesn't have a limit as $(x, y) \to (0, 0)$.
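The same comparison can be made numerically for Example 2.26. The Python sketch below (our illustration; the slopes and the sequence $t = 10^{-1}, \ldots, 10^{-6}$ are arbitrary choices) evaluates $f$ along a few lines $y = kx$, where the values approach $1$, and along the axis $x = 0$, where $f$ is identically $-1$.

```python
def f(x, y):
    # Example 2.26: f(x, y) = (x^2 - y^4) / (x^2 + x^2 y^2 + y^4)
    return (x**2 - y**4) / (x**2 + x**2 * y**2 + y**4)

ts = [10.0**-j for j in range(1, 7)]
along_lines = {k: [f(t, k * t) for t in ts] for k in (0, 1, 5)}
along_y_axis = [f(0.0, t) for t in ts]
print(along_lines[5][-1], along_y_axis[-1])   # close to 1, and exactly -1
```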
Recall that if a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x = a$, then it is continuous at $x = a$. However, a function $f : \mathbb{R} \to \mathbb{R}$ might be continuous but not differentiable, for example $f(x) = |x|$ at $x = 0$. The following example shows that for functions of several variables the existence of partial derivatives doesn't even imply continuity.
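Example 2.27 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by
$$f(x, y) = \begin{cases} 1, & xy = 0, \\ 0, & xy \neq 0. \end{cases}$$
Then $f(x, 0) = 1$ and $f(0, y) = 1$. Hence $f_x(0, 0) = f_y(0, 0) = 0$.
On the other hand, consider the restriction onto the straight line $y = x$. We have
$$f(x, x) = \begin{cases} 1, & x^2 = 0, \\ 0, & x^2 \neq 0 \end{cases} = \begin{cases} 1, & x = 0, \\ 0, & x \neq 0, \end{cases}$$
which is discontinuous at $x = 0$: its limit as $x \to 0$ is $0 \neq 1 = f(0, 0)$. By Lemma 2.24, $f$ is therefore not continuous at $(0, 0)$, even though both of its partial derivatives exist there.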
2.4 Exercises
Questions on understanding the lecture
Exercise 2.1 What are the domain, the codomain, and the component functions of the following maps?
(i) $f(x, y, z) = \begin{pmatrix} x + y \\ y \sin z \end{pmatrix}$.
(ii) $f(x, y) = \sqrt{1 - x^2 - y^2}$.
(iii) $f(u, v) = \begin{pmatrix} uv^3 \\ \sin(uv) \\ \ln(uv) \end{pmatrix}$.
Exercise 2.2 Given a map $f : \mathbb{R}^m \to \mathbb{R}^n$, prove that $\lim_{x \to a} f(x) = l$ holds if and only if $\lim_{x \to a} \|f(x) - l\| = 0$ does.
Exercise 2.3 Let $a_1, a_2, a_3, \ldots$ be a sequence of vectors in $\mathbb{R}^m$. Formulate what it means for this sequence to converge to a vector $l \in \mathbb{R}^m$.
Questions on calculation
Exercise 2.4 Prove that $\lim_{(x,y) \to (0,0)} \frac{x^2y^2}{x^2y^2 + (x - y)^2}$ does not exist (by finding limits along different straight lines).
Exercise 2.5 For each of the following functions, find all of its critical points.
(i) $f(x, y) = (x - y + 1)^2$.
(ii) $f(x, y) = xy + \frac{50}{x} + \frac{20}{y}$, where $x, y > 0$.
(iii) $f(x, y) = xy \ln(x^2 + y^2)$.
(iv) $f(x, y, z) = x + \frac{y^2}{4x} + \frac{z^2}{y} + \frac{2}{z}$, where $x, y, z > 0$.
Questions on logical thinking
Exercise 2.6 Let $f(x, y) = \frac{xy^2}{x^2 + y^4}$.
(i) Prove that the limit of $f(x, y)$ as $(x, y)$ approaches $(0, 0)$ along any straight line is $0$.
(ii) Prove that $\lim_{(x,y) \to (0,0)} f(x, y)$, however, does not exist.
Exercise 2.7 Let $f(x, y) = \sqrt{|xy|}$. Find $f_x(0, 0)$ and $f_y(0, 0)$.
Lecture 3
Non-degenerate critical points
3.1 Second partial derivatives
Given a function $f : \mathbb{R} \to \mathbb{R}$ and its critical point $a \in \mathbb{R}$, so $f'(a) = 0$, how do we determine whether $a$ is a local maximum or a local minimum?
By Taylor's formula,
$$f(a + h) = f(a) + f'(a)h + \frac{f''(a)}{2}h^2 + \cdots$$
Thus everything is determined by $f''(a)$:
(i) For $f''(a) > 0$, we have a local minimum.
(ii) For $f''(a) < 0$, we have a local maximum.
(iii) If $f''(a) = 0$, then the situation is unclear.
In order to do the same in the multi-dimensional case, we need to study second partial derivatives. The first derivative is defined as a limit, so the second derivative is some kind of iterated limit. The following example shows that one has to be careful with iterated limits.
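Example 3.1 Let $f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}$. Then we have
$$\lim_{x \to 0} f(x, y) = f(0, y) = \frac{-y^2}{y^2} = -1 \quad \text{and} \quad \lim_{y \to 0} f(x, y) = f(x, 0) = \frac{x^2}{x^2} = 1.$$
Therefore,
$$\lim_{y \to 0}\left(\lim_{x \to 0} f(x, y)\right) = -1 \neq 1 = \lim_{x \to 0}\left(\lim_{y \to 0} f(x, y)\right).$$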
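A rough numerical picture of Example 3.1: if the inner variable is made much smaller than the outer one, $f$ is already close to the corresponding inner limit. This Python sketch is only a heuristic illustration (the magnitudes `1e-3` and `1e-9` are arbitrary assumptions), not a computation of a limit.

```python
def f(x, y):
    # Example 3.1: f(x, y) = (x^2 - y^2) / (x^2 + y^2)
    return (x**2 - y**2) / (x**2 + y**2)

fixed = 1e-3      # the outer variable, held at a small nonzero value
tiny = 1e-9       # the inner variable, sent (approximately) to 0 first

inner_x_first = f(tiny, fixed)   # approximates lim_{x->0} f(x, y) at y = fixed
inner_y_first = f(fixed, tiny)   # approximates lim_{y->0} f(x, y) at x = fixed
print(inner_x_first, inner_y_first)   # close to -1 and close to 1
```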
Of course, something like this happens only with discontinuous functions. If $f(x, y)$ is continuous everywhere, then we have
$$\lim_{(x,y) \to (a,b)} f(x, y) = f(a, b), \qquad \lim_{x \to a} f(x, y) = f(a, y), \qquad \lim_{y \to b} f(x, y) = f(x, b).$$
In this situation, iterated limits are limits of the restrictions onto the respective coordinate lines, and therefore they all must equal $f(a, b)$. However, one has to be very careful with discontinuous functions.
Exercise 3.1 Write a random formula for a function $f(x, y)$ and find the second partial derivatives $f_{xy} = \frac{\partial}{\partial y} f_x$ and $f_{yx} = \frac{\partial}{\partial x} f_y$. Most probably, you'll get that $f_{xy} = f_{yx}$.
In general, let's agree that the notation $f_{xy}$ means that we first differentiate with respect to $x$ and then with respect to $y$, that is, $f_{xy} = \frac{\partial}{\partial y} f_x$.
Theorem 3.2 (Clairaut) Given a function $f : \mathbb{R}^m \to \mathbb{R}$, assume that the partial derivatives $f_{x_ix_j}$ and $f_{x_jx_i}$ are continuous at $a \in \mathbb{R}^m$. Then
$$f_{x_ix_j}(a) = f_{x_jx_i}(a).$$
Proof Since only $x_i$ and $x_j$ are involved, let $x$ stand for $x_i$ and $y$ for $x_j$. Thus we have a function of two variables $f(x, y)$. By definition,
$$f_{xy}(a, b) = \frac{\partial}{\partial y} f_x(a, b) = \lim_{h \to 0} \frac{f_x(a, b + h) - f_x(a, b)}{h}. \qquad (3.1)$$
Further,
$$f_x(a, b + h) = \lim_{k \to 0} \frac{f(a + k, b + h) - f(a, b + h)}{k}, \qquad f_x(a, b) = \lim_{k \to 0} \frac{f(a + k, b) - f(a, b)}{k}.$$
Substituting into (3.1), we obtain
$$f_{xy}(a, b) = \lim_{h \to 0} \lim_{k \to 0} \frac{f(a + k, b + h) - f(a, b + h) - f(a + k, b) + f(a, b)}{hk}.$$
In a similar manner, we show that
$$f_{yx}(a, b) = \lim_{k \to 0} \lim_{h \to 0} \frac{f(a + k, b + h) - f(a, b + h) - f(a + k, b) + f(a, b)}{hk}.$$
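Let's look more carefully at the expression
$$\begin{aligned} A(h, k) &= f(a + k, b + h) - f(a, b + h) - f(a + k, b) + f(a, b) \\ &= [f(a + k, b + h) - f(a, b + h)] - [f(a + k, b) - f(a, b)]. \end{aligned}$$
Put $\varphi(y) = f(a + k, y) - f(a, y)$. Then $A = \varphi(b + h) - \varphi(b)$. By the Mean Value Theorem (applied first to $\varphi$ and then to $x \mapsto f_y(x, b + h_1)$), we have
$$\begin{aligned} A &= \varphi(b + h) - \varphi(b) \\ &= h\,\varphi'(b + h_1) \\ &= h\left[f_y(a + k, b + h_1) - f_y(a, b + h_1)\right] \\ &= kh\left[\frac{\partial}{\partial x} f_y(a + k_1, b + h_1)\right] \end{aligned}$$
for some $h_1$ between $0$ and $h$ and some $k_1$ between $0$ and $k$. Dividing by $hk$ and taking the limit as $(h, k) \to (0, 0)$, we obtain, by the continuity of $f_{yx} = \frac{\partial}{\partial x} f_y$ at $(a, b)$,
$$f_{yx}(a, b) = \lim_{(h,k) \to (0,0)} \frac{A(h, k)}{hk}.$$
In a similar manner, we can show that
$$f_{xy}(a, b) = \lim_{(h,k) \to (0,0)} \frac{A(h, k)}{hk},$$
which is the same. :)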
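The proof rests on the quotient $A(h, k)/(hk)$ approaching the mixed partial derivative. The Python sketch below illustrates this numerically for an arbitrarily chosen smooth test function $f(x, y) = \sin(xy^2)$ (an assumption of this sketch, not an example from the notes), whose mixed partials $f_{xy} = f_{yx} = 2y\cos(xy^2) - 2xy^3\sin(xy^2)$ were computed by hand.

```python
import math

def f(x, y):
    # A smooth test function, chosen only for this illustration
    return math.sin(x * y**2)

def mixed_exact(x, y):
    # f_xy = f_yx = 2y cos(xy^2) - 2xy^3 sin(xy^2), computed by hand
    return 2 * y * math.cos(x * y**2) - 2 * x * y**3 * math.sin(x * y**2)

def A(a, b, h, k):
    # The second difference A(h, k) from the proof of Theorem 3.2
    return f(a + k, b + h) - f(a, b + h) - f(a + k, b) + f(a, b)

a, b = 0.4, 1.1
for step in (1e-2, 1e-3, 1e-4):
    print(step, A(a, b, step, step) / step**2, mixed_exact(a, b))
```

The quotient drifts toward the exact value as the step shrinks; for much smaller steps, floating-point cancellation in $A$ would eventually spoil the estimate.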
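Generally, if a higher-order partial derivative is continuous, then it does not depend on the order of differentiation. For example, if all third-order partial derivatives of $f(x, y, z)$ are continuous, then
$$\frac{\partial}{\partial x}\frac{\partial}{\partial y} f_z = \frac{\partial}{\partial z}\frac{\partial}{\partial y} f_x.$$
However, in this course we don't need derivatives of more than second order.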
3.2 Quadratic forms
This section is a brief summary of what is supposed to be known from linear
algebra.
Definition 3.3 Given two vectors $x, y \in \mathbb{R}^m$, their dot product is
$$x \cdot y = x_1y_1 + \cdots + x_my_m.$$
Recall that the dot product is
(i) symmetric, that is, $x \cdot y = y \cdot x$,
(ii) bilinear, that is, $(kx + ly) \cdot z = k\, x \cdot z + l\, y \cdot z$ and $x \cdot (ky + lz) = k\, x \cdot y + l\, x \cdot z$,
(iii) positive definite, that is, $x \cdot x > 0$ for $x \neq 0$.
Definition 3.4 Given an $m \times m$ symmetric matrix $A$, that is, $a_{ij} = a_{ji}$ for all $1 \le i, j \le m$, the function of a vector variable $q(x) = \langle Ax, x \rangle = Ax \cdot x$ is called a quadratic form.
Specifically, we have
$$q(x) = \sum_{1 \le i, j \le m} a_{ij} x_i x_j.$$
It can be any expression consisting only of second-degree monomials, like
$$x^2 - 3xy, \qquad xy + 2yz + 8xz, \qquad x_1^2 + x_2^2 - x_3^2 - x_5^2.$$
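Example 3.5 Consider a quadratic form $q : \mathbb{R}^3 \to \mathbb{R}$ given by
$$q(x, y, z) = x^2 - 2xy + 8yz - 3z^2.$$
What is its matrix? First, let $x$ be $x_1$, $y$ be $x_2$, and $z$ be $x_3$. Further, we are supposed to have $a_{ij} = a_{ji}$, so we separate the coefficients as
$$q(x, y, z) = xx - xy - yx + 4yz + 4zy - 3zz + 0yy + 0xz + 0zx.$$
Thus
$$A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 0 & 4 \\ 0 & 4 & -3 \end{pmatrix}.$$
Let's check that $q(x, y, z) = A\begin{pmatrix} x \\ y \\ z \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix}$. Indeed,
$$A\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 0 & 4 \\ 0 & 4 & -3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - y \\ -x + 4z \\ 4y - 3z \end{pmatrix}.$$
Thus,
$$A\begin{pmatrix} x \\ y \\ z \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = (x - y)x + (-x + 4z)y + (4y - 3z)z = x^2 - 2xy + 8yz - 3z^2.$$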
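The rule used in Example 3.5 (diagonal coefficients go on the diagonal, each mixed coefficient is split in half between $a_{ij}$ and $a_{ji}$) can be written as a short Python function. This is a sketch under our own conventions: the input format (a dictionary from index pairs $(i, j)$ with $i \le j$ to coefficients) and the 0-based indices are assumptions made for the illustration.

```python
def form_matrix(coeffs, n):
    """Symmetric matrix of a quadratic form.

    `coeffs` maps index pairs (i, j) with i <= j (0-based) to the coefficient
    of x_i x_j, a hypothetical input format chosen for this sketch.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, j), c in coeffs.items():
        if i == j:
            A[i][i] += c          # square term: goes on the diagonal
        else:
            A[i][j] += c / 2      # mixed term: split evenly so that A stays symmetric
            A[j][i] += c / 2
    return A

def q(A, x):
    # <Ax, x> = sum over i, j of a_ij x_i x_j
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Example 3.5: x^2 - 2xy + 8yz - 3z^2 with (x, y, z) = (x_0, x_1, x_2)
A = form_matrix({(0, 0): 1, (0, 1): -2, (1, 2): 8, (2, 2): -3}, 3)
print(A)               # the matrix of Example 3.5, with float entries
print(q(A, (2, -1, 3)))
```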
A canonical shape of a quadratic form is
$$q(u_1, \ldots, u_m) = -u_1^2 - \cdots - u_k^2 + u_{k+1}^2 + \cdots + u_{k+l}^2,$$
where the coordinates $u_1, \ldots, u_m$ are obtained from $x_1, \ldots, x_m$ by a one-to-one linear transformation.
Example 3.6 For the quadratic form we already considered, we have
$$\begin{aligned} x^2 - 2xy + 8yz - 3z^2 &= x^2 - 2xy + y^2 - y^2 + 8yz - 3z^2 \\ &= (x - y)^2 - y^2 + 8yz - 16z^2 + 13z^2 \\ &= (x - y)^2 - (y - 4z)^2 + \left(\sqrt{13}\, z\right)^2. \end{aligned}$$
Thus if we put $u = y - 4z$, $v = x - y$, $w = \sqrt{13}\, z$, then
$$q(u, v, w) = -u^2 + v^2 + w^2.$$
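A completing-the-square computation like Example 3.6 is easy to get wrong by a sign, so a random spot-check is a useful habit. The Python sketch below (our illustration; the seed, sampling box and sample count are arbitrary) compares $q$ with $-u^2 + v^2 + w^2$ under the substitution $u = y - 4z$, $v = x - y$, $w = \sqrt{13}\, z$.

```python
import math
import random

def q(x, y, z):
    # The quadratic form of Examples 3.5 and 3.6
    return x**2 - 2*x*y + 8*y*z - 3*z**2

def canonical(x, y, z):
    # -u^2 + v^2 + w^2 with u = y - 4z, v = x - y, w = sqrt(13) z
    u, v, w = y - 4*z, x - y, math.sqrt(13) * z
    return -u**2 + v**2 + w**2

random.seed(0)
points = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(1000)]
worst = max(abs(q(*p) - canonical(*p)) for p in points)
print(worst)   # only floating-point rounding error remains
```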
Example 3.7 For $q(x, y) = xy$, we have
$$xy = \frac{(x + y)^2 - (x - y)^2}{4}.$$
Thus if we put $u = \frac{x - y}{2}$, $v = \frac{x + y}{2}$, then
$$q(u, v) = -u^2 + v^2.$$
Exercise 3.2 Can you apply the same idea of completing squares to find the canonical shape of the form $q(x, y, z) = xy + yz + xz$?
If we are interested just in what the canonical shape is and not in the linear substitution needed, then there is an easy method. We just need to find the eigenvalues of the matrix $A$ of the quadratic form, that is, to solve the characteristic equation
$$\det(A - tI) = 0.$$
The number of its positive roots (counted with multiplicity) is the same as the number of positive squares in the canonical shape. The number of negative roots is the same as the number of negative squares in the canonical shape.
Example 3.8 Let $q(x, y, z) = 2xy + 2yz + 2xz$, so
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$
The characteristic polynomial is
$$\det(A - tI) = \det\begin{pmatrix} -t & 1 & 1 \\ 1 & -t & 1 \\ 1 & 1 & -t \end{pmatrix} = -t^3 + 3t + 2.$$
Figure 3.1: A cubic polynomial with all real roots. Panels: all roots positive; 1 negative root; 2 negative roots; 3 negative roots.
Since $t = -1$ is an obvious root, we have
$$\begin{aligned} -t^3 + 3t + 2 &= -t^3 - t^2 + t^2 + 3t + 2 = -t^2(t + 1) + t(t + 1) + 2(t + 1) \\ &= (t + 1)(-t^2 + t + 2) = -(t + 1)^2(t - 2). \end{aligned}$$
It has a double negative root $t_1 = t_2 = -1$ and a single positive root $t_3 = 2$. Thus the canonical shape is $q(u, v, w) = -u^2 - v^2 + w^2$.
But since we are interested only in how many positive and negative roots the characteristic equation has, it's not necessary to solve it. It's enough to graph the characteristic polynomial or even guess the shape of the graph by checking some key points. For example, Figure 3.1 shows different cases of a cubic polynomial whose coefficient at $t^3$ is $-1$ and all of whose roots are real (which happens for the characteristic polynomial of a symmetric matrix).
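Example 3.9 Let $q(x, y, z) = 3x^2 + 5y^2 - 7z^2 + 4xy - 2xz$. Thus
$$A = \begin{pmatrix} 3 & 2 & -1 \\ 2 & 5 & 0 \\ -1 & 0 & -7 \end{pmatrix}, \qquad A - \lambda I = \begin{pmatrix} 3 - \lambda & 2 & -1 \\ 2 & 5 - \lambda & 0 \\ -1 & 0 & -7 - \lambda \end{pmatrix}.$$
Therefore we have $f(\lambda) = \det(A - \lambda I) = -\lambda^3 + \lambda^2 + 46\lambda - 82$. Finding the roots of this polynomial seems to be hard, but we don't need them. We only need to know how many of them are positive and how many of them are negative.
Notice that $f(0) = -82 < 0$, that is, the product of the roots is negative, so there are either 3 negative roots or 1 negative root. Also, $f''(0) = 2 > 0$, so the function is concave upward at $\lambda = 0$. Since $f''(\lambda) = -6\lambda + 2(t_1 + t_2 + t_3)$, the graph is concave upward exactly to the left of the average of the three roots; hence this average is positive, the roots cannot all be negative, and there is exactly 1 negative root.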
Recall that by Viète's formulae, for a $2 \times 2$ matrix $A$ we have
$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \qquad t_1 + t_2 = a + c, \qquad t_1t_2 = ac - b^2.$$
Thus if $\det A < 0$, it means that there is one negative and one positive root. If $\det A > 0$, then either both roots are positive or both are negative. In this case $ac > b^2 \ge 0$, so $a$ and $c$ have the same sign; hence $a > 0$ implies that $t_1, t_2 > 0$ and $a < 0$ implies that $t_1, t_2 < 0$.
Example 3.10 Let $q(x, y) = 2x^2 - 3xy - 6y^2$. Thus,
$$A = \begin{pmatrix} 2 & -1.5 \\ -1.5 & -6 \end{pmatrix},$$
so $t_1t_2 = \det A = -14.25 < 0$ and there is one positive and one negative root. Hence the canonical shape is $-u^2 + v^2$.
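For $3 \times 3$ matrices, the situation is more complicated. The relations are as follows:
$$t_1t_2t_3 = \det A,$$
$$t_1t_2 + t_1t_3 + t_2t_3 = \begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{13} \\ a_{13} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{22} & a_{23} \\ a_{23} & a_{33} \end{vmatrix},$$
$$t_1 + t_2 + t_3 = \operatorname{tr} A.$$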
Example 3.11 Let $q(x, y, z) = 2xy + 2xz + 2yz$, so
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$
Since $t_1t_2t_3 = \det A = 0 + 1 + 1 - 0 - 0 - 0 = 2$, no eigenvalue is zero, and since $t_1 + t_2 + t_3 = \operatorname{tr} A = 0 + 0 + 0 = 0$, the three eigenvalues cannot all have the same sign. Thus either one of them or two of them are negative. Since their product $\det A = 2$ is positive, it is the latter case and the canonical shape is $-u^2 - v^2 + w^2$.
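The sign counting can be automated for $3 \times 3$ matrices. The Python sketch below is our illustration and uses a technique the notes do not mention: it builds $p(t) = \det(tI - A) = t^3 - (\operatorname{tr} A)t^2 + s_2 t - \det A$ from the three relations above (with $s_2$ the sum of principal $2 \times 2$ minors), then counts sign changes of the coefficients of $p(t)$ and $p(-t)$ (Descartes' rule of signs). In general that rule only gives upper bounds, but it is exact when all roots are real, which is the case for a symmetric matrix.

```python
def char_poly_coeffs(A):
    """Coefficients (highest degree first) of p(t) = det(tI - A) for a 3x3 matrix,
    via t1+t2+t3 = tr A, t1t2+t1t3+t2t3 = s2, t1t2t3 = det A."""
    tr = A[0][0] + A[1][1] + A[2][2]
    s2 = (A[0][0] * A[1][1] - A[0][1] * A[1][0]
          + A[0][0] * A[2][2] - A[0][2] * A[2][0]
          + A[1][1] * A[2][2] - A[1][2] * A[2][1])
    det = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
           - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
           + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return [1, -tr, s2, -det]

def sign_changes(coeffs):
    signs = [c > 0 for c in coeffs if c != 0]   # zero coefficients are skipped
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def inertia(A):
    """(number of positive, number of negative) eigenvalues of a symmetric 3x3 A."""
    p = char_poly_coeffs(A)
    p_neg = [c * (-1) ** (3 - i) for i, c in enumerate(p)]   # coefficients of p(-t)
    return sign_changes(p), sign_changes(p_neg)

print(inertia([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))     # Example 3.8: (1, 2)
print(inertia([[3, 2, -1], [2, 5, 0], [-1, 0, -7]]))  # Example 3.9: (2, 1)
```

With exact integer entries the counts are exact; with floating-point entries, a coefficient that should vanish may come out as a tiny nonzero number, so the result would need care.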
For $m \times m$ matrices, the product of the eigenvalues still equals the determinant and the sum of the eigenvalues is the trace of the matrix, but the rest of the formulae are quite complicated and using them for determining the signs is not so easy (though possible).
3.3 Hessian matrix
We see that mixed partial derivatives are usually equal. From now on, we don't consider cases of different mixed partial derivatives.
Definition 3.12 Given a function $f : \mathbb{R}^m \to \mathbb{R}$, the matrix
$$H_f = \begin{pmatrix} f_{x_1x_1} & f_{x_1x_2} & f_{x_1x_3} & \cdots & f_{x_1x_m} \\ f_{x_2x_1} & f_{x_2x_2} & f_{x_2x_3} & \cdots & f_{x_2x_m} \\ f_{x_3x_1} & f_{x_3x_2} & f_{x_3x_3} & \cdots & f_{x_3x_m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ f_{x_mx_1} & f_{x_mx_2} & f_{x_mx_3} & \cdots & f_{x_mx_m} \end{pmatrix}$$
is the Hessian matrix of the function $f$.
Theorem 3.2 shows that the Hessian matrix is usually symmetric (for instance, whenever all second partial derivatives are continuous).
Definition 3.13 Given a function $f : \mathbb{R}^m \to \mathbb{R}$, its critical point $a \in \mathbb{R}^m$ is called non-degenerate if $\det H_f(a) \neq 0$.
Thus the following two conditions must hold at a non-degenerate critical point:
(i) $f_{x_1}(a) = f_{x_2}(a) = \cdots = f_{x_m}(a) = 0$,
(ii) $\det\left(f_{x_ix_j}(a)\right) \neq 0$.
Example 3.14 Let $f(x, y) = \sin(xy)$. Then $f_x = y\cos(xy)$, $f_y = x\cos(xy)$, so $(0, 0)$ is a critical point. Further, $f_{xx} = -y^2\sin(xy)$, $f_{xy} = \cos(xy) - xy\sin(xy)$, $f_{yy} = -x^2\sin(xy)$. Thus,
$$H_f(0, 0) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \det H_f(0, 0) = -1 < 0,$$
so $(0, 0)$ is a non-degenerate critical point.
Example 3.15 Consider the function $f(x, y) = (x - y^2)(2x - y^2)$ from the previous lecture. Recall that $(0, 0)$ is a critical point and that the restriction of this function onto any straight line passing through $(0, 0)$ has a local minimum at $(0, 0)$, but restrictions to some parabolas have a local maximum at $(0, 0)$.
Let's find the Hessian matrix. We have $f_x = 4x - 3y^2$, $f_y = -6xy + 4y^3$. Thus,
$$H_f(x, y) = \begin{pmatrix} 4 & -6y \\ -6y & -6x + 12y^2 \end{pmatrix}, \qquad H_f(0, 0) = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \det H_f(0, 0) = 0.$$
Therefore $(0, 0)$ is a degenerate critical point.
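Hessians can also be estimated with finite differences, which is handy for checking hand computations such as Examples 3.14 and 3.15. The Python sketch below is our illustration (the helper `hessian` and the step `h = 1e-4` are assumptions). Keep in mind that numerically an exactly vanishing determinant only shows up as a tiny number, so this can suggest degeneracy but never prove it.

```python
import math

def hessian(f, a, b, h=1e-4):
    """Central finite-difference estimate of the Hessian of f(x, y) at (a, b)."""
    fxx = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h**2
    fyy = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h**2
    fxy = (f(a + h, b + h) - f(a + h, b - h)
           - f(a - h, b + h) + f(a - h, b - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

def f(x, y):
    # Example 3.15: f(x, y) = (x - y^2)(2x - y^2)
    return (x - y**2) * (2 * x - y**2)

H = hessian(f, 0.0, 0.0)
det = H[0][0] * H[1][1] - H[0][1] ** 2
print(H, det)     # close to [[4, 0], [0, 0]] with determinant close to 0

H2 = hessian(lambda x, y: math.sin(x * y), 0.0, 0.0)   # Example 3.14
print(H2)         # close to [[0, 1], [1, 0]]
```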
3.4 Morse index and second derivative test
There is a multi-variable Taylor's formula. We don't need it in full generality; it's enough to know that
$$f(a + h) = f(a) + f'(a) \cdot h + \frac{1}{2} H_f(a)h \cdot h + \cdots,$$
where $f'(a)$ is the vector of partial derivatives at the point $a$ and $H_f(a)$ is the Hessian matrix.
If $a$ is a critical point, then all partial derivatives equal $0$ and we get that $f(a + h) = f(a) + \frac{1}{2} H_f(a)h \cdot h + \cdots$, so the type of the critical point is determined by the quadratic form $s(h) = H_f(a)h \cdot h$. Quadratic forms are classified by the canonical shape $u_1^2 + \cdots + u_p^2 - u_{p+1}^2 - \cdots - u_{p+q}^2$, so the local behavior of the function must depend on these $p$ and $q$.
Theorem 3.16 (Morse Lemma) Let $a \in \mathbb{R}^m$ be a non-degenerate critical point of a function $f(x_1, \ldots, x_m)$, that is, $p + q = m$. Then there is a (non-linear) substitution
$$\begin{aligned} h_1 &= h_1(u_1, \ldots, u_m), \\ h_2 &= h_2(u_1, \ldots, u_m), \\ &\;\;\vdots \\ h_m &= h_m(u_1, \ldots, u_m), \end{aligned}$$
such that in the new coordinates $u_1, \ldots, u_m$, we have
$$f(a_1 + h_1, \ldots, a_m + h_m) = f(a_1, \ldots, a_m) - u_1^2 - \cdots - u_q^2 + u_{q+1}^2 + \cdots + u_m^2.$$
This is a very hard theorem, so we won't prove it. Let's just clarify what it says. A substitution is a map $h = h(u)$, so $h : \mathbb{R}^m \to \mathbb{R}^m$. It must be one-to-one and smooth, and its inverse must also be smooth. Further, we assume for simplicity that $h(0) = 0$, so the point $h_1 = \cdots = h_m = 0$ is the same point as $u_1 = \cdots = u_m = 0$. Finally, this substitution is defined only locally, that is, for values of $h$ small enough.
Example 3.17 Consider the function $f(x, y) = e^{x^2} - \sin^2 y$ in a neighbourhood of the point $(0, 0)$. Then
$$f(x, y) = e^{x^2} - \sin^2 y = 1 + \sum_{n=1}^{\infty} \frac{x^{2n}}{n!} - (\sin y)^2 = 1 + x^2\left(1 + \sum_{n=2}^{\infty} \frac{x^{2n-2}}{n!}\right) - (\sin y)^2.$$
Let $u = x\sqrt{1 + \sum_{n=2}^{\infty} \frac{x^{2n-2}}{n!}}$ and let $v = \sin y$. Then $f = 1 + u^2 - v^2$. This substitution is defined only when $-1 \le v \le 1$ (equivalently, $-\frac{\pi}{2} \le y \le \frac{\pi}{2}$), so it works only locally in a neighbourhood of the point $(0, 0)$.
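The substitution of Example 3.17 can be checked numerically. For $x \neq 0$ the bracket equals $(e^{x^2} - 1)/x^2$ (an observation we add here), so $u = \operatorname{sign}(x)\sqrt{e^{x^2} - 1}$. The Python sketch below uses `math.expm1` for accuracy near $0$ and compares $f$ with $1 + u^2 - v^2$ on a small grid; the grid is an arbitrary choice.

```python
import math

def f(x, y):
    # Example 3.17: f(x, y) = e^(x^2) - sin^2 y
    return math.exp(x**2) - math.sin(y) ** 2

def u_of(x):
    # u = x * sqrt(1 + sum_{n>=2} x^(2n-2)/n!) = sign(x) * sqrt(e^(x^2) - 1)
    return math.copysign(math.sqrt(math.expm1(x**2)), x)

def v_of(y):
    return math.sin(y)

samples = [(x / 10, y / 10) for x in range(-10, 11) for y in range(-10, 11)]
worst = max(abs(f(x, y) - (1 + u_of(x) ** 2 - v_of(y) ** 2)) for x, y in samples)
print(worst)   # only floating-point rounding error remains
```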
The Morse Lemma doesn't give an explicit formula for such a substitution in general; it only says that it exists. That's enough for the purpose of detecting the type of a critical point.
Definition 3.18 Let $f : \mathbb{R}^m \to \mathbb{R}$ be a function and let $a \in \mathbb{R}^m$ be its non-degenerate critical point. Then the number $q$ of negative eigenvalues of the Hessian matrix $H_f(a)$ is called the Morse index of the critical point $a$.
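Notice that $a \in \mathbb{R}^m$ is supposed to be a non-degenerate critical point, that is, $\frac{\partial f}{\partial x_1}(a) = \frac{\partial f}{\partial x_2}(a) = \cdots = \frac{\partial f}{\partial x_m}(a) = 0$ and $\det\left(\frac{\partial^2 f}{\partial x_i \partial x_j}(a)\right) \neq 0$. Otherwise, there is no Morse index at all.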
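Example 3.19 Let $f(x, y) = \sin(xy)$. Then $f_x = y\cos(xy)$ and $f_y = x\cos(xy)$, so $(0, 0)$ is a critical point. Further,
$$H_f(0, 0) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad t_1t_2 = \det H_f(0, 0) = -1 < 0.$$
Thus the critical point is non-degenerate, and since exactly one of the two eigenvalues is negative, the Morse index is $1$.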
The following theorem is the multi-variable second derivative test.
Theorem 3.20 Assume that $a \in \mathbb{R}^m$ is a non-degenerate critical point of a function $f : \mathbb{R}^m \to \mathbb{R}$ and let $q$ be its Morse index. Then
(i) If $q = 0$, then $a$ is a local minimum.
(ii) If $q = m$, then $a$ is a local maximum.
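Proof Let's prove it for $q = 0$ (for $q = m$ it's similar). By the Morse Lemma, we have
$$f(a + h) - f(a) = u_1^2 + \cdots + u_m^2 > 0$$
for all $h \neq 0$ small enough (the substitution is one-to-one with $h(0) = 0$, so $h \neq 0$ implies $u \neq 0$), which means that $f(x) > f(a)$, where $x = a + h$. By definition, it's a local minimum. :)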
Example 3.21 Let's find all critical points of the function $u = x^3 + y^2 + z^2 + 12xy + 2z$ and calculate the Morse index of the non-degenerate ones.
Critical points are those where all the partial derivatives vanish, so we have
$$u_x = 3x^2 + 12y = 0, \qquad u_y = 2y + 12x = 0, \qquad u_z = 2z + 2 = 0.$$
The third equation implies that $z = -1$. From the second one, we have $y = -6x$. Let's substitute it into the first equation. It gives us $3x^2 - 72x = 0$, that is, $x^2 - 24x = 0$, so either $x = 0$ or $x = 24$. Therefore we get two critical points: $(0, 0, -1)$ and $(24, -144, -1)$.
Now let's find the second derivatives. We have
$$u_{xx} = 6x, \quad u_{yy} = 2, \quad u_{zz} = 2, \quad u_{xy} = 12, \quad u_{yz} = 0, \quad u_{xz} = 0,$$
so the Hessian matrix is
$$\begin{pmatrix} 6x & 12 & 0 \\ 12 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$
Thus the characteristic equation is
$$(2 - t)\left((6x - t)(2 - t) - 144\right) = 0.$$
At $(0, 0, -1)$ we have $(2 - t)(t^2 - 2t - 144) = 0$. One of the roots equals $2$; the product of the two other roots is $-144$, which is negative, so one of them is negative and the other is positive. Thus the Morse index is $1$.
At $(24, -144, -1)$ we have $(2 - t)\left((144 - t)(2 - t) - 144\right) = 0$, that is, $(2 - t)(t^2 - 146t + 144) = 0$. One of the roots is $2$; the product of the two other roots is $144$ and their sum is $146$, so they must both be positive. Thus the Morse index is $0$, which shows that this is a local minimum.
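The computation of Example 3.21 can be replayed in a few lines of Python (our sketch, not part of the notes). The Hessian is block diagonal, so its eigenvalues are $2$ together with the roots of $t^2 - (6x + 2)t + (12x - 144) = 0$ for the $2 \times 2$ block, which the quadratic formula gives directly; the helper names `grad` and `morse_index` are illustrative.

```python
import math

def grad(x, y, z):
    # Partial derivatives of u = x^3 + y^2 + z^2 + 12xy + 2z (Example 3.21)
    return (3 * x**2 + 12 * y, 2 * y + 12 * x, 2 * z + 2)

def morse_index(x):
    """Number of negative Hessian eigenvalues at a critical point with first coordinate x.

    Eigenvalues: 2, and the roots of t^2 - (6x + 2) t + (12x - 144) = 0,
    the characteristic equation of the block [[6x, 12], [12, 2]]."""
    tr, det = 6 * x + 2, 12 * x - 144
    disc = math.sqrt(tr**2 - 4 * det)      # positive for a symmetric block
    eigenvalues = [(tr - disc) / 2, (tr + disc) / 2, 2.0]
    return sum(1 for t in eigenvalues if t < 0)

for p in [(0, 0, -1), (24, -144, -1)]:
    print(p, grad(*p), morse_index(p[0]))
# (0, 0, -1) (0, 0, 0) 1
# (24, -144, -1) (0, 0, 0) 0
```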
Given a function of two variables $f(x, y)$ and a non-degenerate critical point $(a, b)$, there are technically three options: the Morse index can be $0$ (so it's a local minimum), $2$ (a local maximum), or $1$.
Definition 3.22 For a function of two variables f (x, y), a non-degenerate
critical point of Morse index 1 is called a saddle point.
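Example 3.23 Let $f(x, y) = \sin x \cos y$. We have then $f_x = \cos x \cos y$ and $f_y = -\sin x \sin y$. To find critical points, we solve the system $\cos x \cos y = 0$, $\sin x \sin y = 0$, from where we get that either $\cos x = \sin y = 0$, or $\sin x = \cos y = 0$. Thus we get two series of solutions:
$$(x, y) = \left(\frac{\pi}{2} + \pi m, \pi n\right), \qquad (x, y) = \left(\pi m, \frac{\pi}{2} + \pi n\right), \qquad m, n \in \mathbb{Z}.$$
Further, $f_{xx} = -\sin x \cos y$, $f_{xy} = -\cos x \sin y$, $f_{yy} = -\sin x \cos y$, so
$$H_f\left(\frac{\pi}{2} + \pi m, \pi n\right) = \begin{pmatrix} (-1)^{m+n+1} & 0 \\ 0 & (-1)^{m+n+1} \end{pmatrix}, \qquad \det H_f = 1$$
and, similarly,
$$H_f\left(\pi m, \frac{\pi}{2} + \pi n\right) = \begin{pmatrix} 0 & (-1)^{m+n+1} \\ (-1)^{m+n+1} & 0 \end{pmatrix}, \qquad \det H_f = -1.$$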
Figure 3.2: Non-degenerate critical points for a function $f(x, y)$. Panels: local minimum; saddle point; local maximum.
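We see that $\det H_f\left(\frac{\pi}{2} + \pi m, \pi n\right) > 0$ and $\det H_f\left(\pi m, \frac{\pi}{2} + \pi n\right) < 0$. Therefore all critical points are non-degenerate. Among them, the points $\left(\pi m, \frac{\pi}{2} + \pi n\right)$ are saddle points, and the points $\left(\frac{\pi}{2} + \pi m, \pi n\right)$ are local maxima or minima depending on $m$ and $n$. For $(-1)^{m+n+1} > 0$, or $m + n$ odd, it's a local minimum. For $(-1)^{m+n+1} < 0$, or $m + n$ even, it's a local maximum.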
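The parity rule of Example 3.23 can be spot-checked for a few values of $m$ and $n$. The Python sketch below (our illustration; the range $|m|, |n| \le 2$ and the tolerance `1e-9` are arbitrary choices) evaluates the second partial derivatives at both series of critical points and applies the $2 \times 2$ determinant and sign test.

```python
import math

def hessian(x, y):
    # Second partial derivatives of f(x, y) = sin x cos y (Example 3.23)
    fxx = -math.sin(x) * math.cos(y)
    fxy = -math.cos(x) * math.sin(y)
    fyy = -math.sin(x) * math.cos(y)
    return fxx, fxy, fyy

def classify(x, y):
    fxx, fxy, fyy = hessian(x, y)
    det = fxx * fyy - fxy**2
    if abs(det) < 1e-9:          # tolerance for floating-point round-off
        return "degenerate"
    if det < 0:
        return "saddle"
    return "minimum" if fxx > 0 else "maximum"

for m in range(-2, 3):
    for n in range(-2, 3):
        expected = "minimum" if (m + n) % 2 else "maximum"
        assert classify(math.pi / 2 + math.pi * m, math.pi * n) == expected
        assert classify(math.pi * m, math.pi / 2 + math.pi * n) == "saddle"
print("parity rule confirmed for |m|, |n| <= 2")
```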
The type of a non-degenerate critical point can be easily determined from the diagram of level curves. One has to keep in mind that a minimum looks like $x^2 + y^2$, a saddle like $-x^2 + y^2$, and a maximum like $-x^2 - y^2$. In general, this picture can be squeezed and rotated as in Figure 3.2.
Let $f(x_1, \ldots, x_m)$ be a function and let $a \in \mathbb{R}^m$ be its non-degenerate critical point. We can consider the restriction $f|_l$ onto an arbitrary straight line $l \ni a$. Then the Morse index $q$ geometrically means the maximal number of linearly independent directions $l$ such that $f|_l$ has a local maximum at the point $a$. If $q = m$, then $f|_l$ has a local maximum in any direction, and if $q = 0$, then $f|_l$ has a local minimum in any direction.
For a non-degenerate critical point, its type is completely determined by restrictions onto straight lines, so strange things like $f(x, y) = (x - y^2)(2x - y^2)$ cannot happen.
Finally, from the Morse Lemma it follows that a non-degenerate critical point $a \in \mathbb{R}^m$ of a function $f(x)$ is always isolated. It means that there is $R > 0$ such that $f$ has no other critical point within distance $R$ from $a$.
Example 3.24 Let $f(x, y) = x^2y^2(x + y - 2.5)$. For critical points, we have
$$f_x = 2xy^2(x + y - 2.5) + x^2y^2 = xy^2(3x + 2y - 5) = 0, \qquad f_y = x^2y(2x + 3y - 5) = 0.$$
Thus we get that either $x = 0$, or $y = 0$, or $3x + 2y - 5 = 2x + 3y - 5 = 0$, that is, $x = y = 1$.
As shown in Figure 3.3, the critical points of this function form two straight lines $x = 0$ and $y = 0$ (so they are non-isolated) and a single point $(1, 1)$, which is isolated. Thus the points $(x, 0)$ and $(0, y)$ are all automatically degenerate; there is no need to check.
Figure 3.3: Critical points of the function $f(x, y) = x^2y^2(x + y - 2.5)$.
For $(1, 1)$, we need to write the second partial derivatives. We have
$$f_{xx} = 6xy^2 + 2y^3 - 5y^2, \qquad f_{xy} = 6x^2y + 6xy^2 - 10xy, \qquad f_{yy} = 6x^2y + 2x^3 - 5x^2.$$
Thus,
$$H_f(1, 1) = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix},$$
so since $\det H_f(1, 1) = 5 > 0$ and $\operatorname{tr} H_f(1, 1) = 6 > 0$, we conclude that it's a non-degenerate local minimum.
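For two variables, the determinant and trace test of Example 3.24 fits in one small function. The Python sketch below (our illustration; `classify_2x2` is a hypothetical helper name) uses exact arithmetic via `fractions.Fraction`, so "degenerate" means the determinant is exactly zero, and applies the test both at $(1, 1)$ and at a point on the line $y = 0$.

```python
from fractions import Fraction

def f_second(x, y):
    # Second partial derivatives of f(x, y) = x^2 y^2 (x + y - 2.5) (Example 3.24)
    fxx = 6*x*y**2 + 2*y**3 - 5*y**2
    fxy = 6*x**2*y + 6*x*y**2 - 10*x*y
    fyy = 6*x**2*y + 2*x**3 - 5*x**2
    return fxx, fxy, fyy

def classify_2x2(fxx, fxy, fyy):
    """Second derivative test via det and trace of a symmetric 2x2 Hessian."""
    det = fxx * fyy - fxy**2
    if det == 0:
        return "degenerate"
    if det < 0:
        return "saddle (Morse index 1)"
    return "local minimum (Morse index 0)" if fxx + fyy > 0 else "local maximum (Morse index 2)"

print(classify_2x2(*f_second(1, 1)))                 # local minimum (Morse index 0)
print(classify_2x2(*f_second(Fraction(3, 2), 0)))    # degenerate
```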
Example 3.25 Let $f(x, y) = x^4 + y^4$. Then $f_x = 4x^3$ and $f_y = 4y^3$, so we get only one critical point $(0, 0)$, which is therefore isolated. However, it's degenerate, as $f_{xx}(0, 0) = f_{xy}(0, 0) = f_{yy}(0, 0) = 0$. From the formula we can easily see that $(0, 0)$ is a point of global minimum of the function $f(x, y)$.
3.5 Exercises
Questions on understanding the lecture
Exercise 3.3 For a function $f(x, y, z)$, assume that all its third-order partial derivatives are continuous. Explain why $f_{xyz} = f_{zxy}$.
Exercise 3.4 Give an example of a function with an isolated degenerate critical point.
Exercise 3.5 Without doing any calculation, explain why any critical point of the function $f(x, y, z) = \sin x \arctan(y - x) + \cos(y^2 - x^2)$ is degenerate.
Questions on calculation
Exercise 3.6 Find the critical points and determine their type (degenerate/non-degenerate, Morse index if there is any) for the following functions:
(i) $f(x, y) = (x - y + 1)^2$.
(ii) $f(x, y) = xy + \frac{50}{x} + \frac{20}{y}$, where $x, y > 0$.
(iii) $f(x, y) = xy \ln(x^2 + y^2)$.
(iv) $f(x, y, z) = x + \frac{y^2}{4x} + \frac{z^2}{y} + \frac{2}{z}$, where $x, y, z > 0$.
Questions on logical thinking
Exercise 3.7 A function $f(x, y)$ is called Morse if all its critical points are non-degenerate. A function $f(x, y)$ is called harmonic if the equation $f_{xx} + f_{yy} = 0$ holds for all $x, y$. Prove that a harmonic Morse function does not have local maxima or minima, so its critical points are always saddles.
Exercise 3.8 Let
$$f(x, y) = \begin{cases} \frac{xy(x^2 - y^2)}{x^2 + y^2}, & x^2 + y^2 > 0, \\ 0, & x = y = 0. \end{cases}$$
(i) Find $f_{xy}(0, 0)$ and $f_{yx}(0, 0)$.
(ii) Explain why the fact that $f_{xy}(0, 0) \neq f_{yx}(0, 0)$ does not contradict Clairaut's Theorem.
Lecture 4
Continuity and differentiability
4.1 Limits and continuity in vector spaces
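First of all, recall that the definition of limit for maps $f : M \to \mathbb{R}^n$, $M \subset \mathbb{R}^m$, is literally the same as for functions $\mathbb{R} \to \mathbb{R}$: if for any $\varepsilon > 0$ we can find $\delta(\varepsilon) > 0$ such that $0 < \|x - a\| < \delta$ implies $\|f(x) - l\| < \varepsilon$, then $\lim_{x \to a} f(x) = l$.
One can often use the inequality $|x_i| \le \|(x_1, \ldots, x_m)\|$ to apply this definition.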
Example 4.1 Given $f(x, y) = 2x$, so $f : \mathbb{R}^2 \to \mathbb{R}$, let us prove that
$$\lim_{(x,y) \to (0,0)} f(x, y) = 0.$$
We need to obtain the estimate $|f(x, y) - 0| < \varepsilon$. On the other hand, we have
$$|f(x, y) - 0| = |2x| \le 2\sqrt{x^2 + y^2} < 2\delta,$$
which becomes $\varepsilon$ by setting $\delta = \frac{\varepsilon}{2}$.
Example 4.2 Given $f(x, y) = (y, x)$, so $f : \mathbb{R}^2 \to \mathbb{R}^2$, let us prove that
$$\lim_{(x,y) \to (a,b)} f(x, y) = (b, a)$$
for any $(a, b) \in \mathbb{R}^2$. We need to obtain the estimate $\|f(x, y) - (b, a)\| < \varepsilon$. On the other hand, we have
$$\|f(x, y) - (b, a)\| = \|(y, x) - (b, a)\| = \sqrt{(y - b)^2 + (x - a)^2} = \|(x, y) - (a, b)\| < \delta,$$
so we just take $\delta(\varepsilon) = \varepsilon$.
Similarly, for continuity we have the following definition.
Definition 4.3 A map $f : A \to \mathbb{R}^n$, $A \subset \mathbb{R}^m$, is continuous at a point $a \in A$ if $\lim_{x \to a} f(x) = f(a)$. The set of functions or maps continuous at $a$ is denoted $C(a)$.
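Example 4.4 Let us prove that the function $f(x, y) = xy$, $f : \mathbb{R}^2 \to \mathbb{R}$, is continuous at any point $(a, b) \in \mathbb{R}^2$. Given $\varepsilon > 0$, we need to obtain the estimate $|f(x, y) - ab| < \varepsilon$. We have
$$|f(x, y) - ab| = |xy - ab| = |xy - xb + xb - ab| \le |x|\,|y - b| + |b|\,|x - a|.$$
If we manage to estimate each of the summands by $\frac{\varepsilon}{2}$, then the sum is smaller than $\varepsilon$ and we're done.
First,
$$|b|\,|x - a| \le |b|\sqrt{(x - a)^2 + (y - b)^2} \le |b|\,\delta,$$
so we need to make sure that $|b|\,\delta < \frac{\varepsilon}{2}$. We cannot just divide by $|b|$ because it might be $0$, but notice that if $\delta \le \frac{\varepsilon}{2} \cdot \frac{1}{|b| + 1}$, then, indeed, $|b|\,\delta \le \frac{\varepsilon}{2} \cdot \frac{|b|}{|b| + 1} < \frac{\varepsilon}{2}$.
Second,
$$|x|\,|y - b| \le |x|\sqrt{(x - a)^2 + (y - b)^2} \le |x|\,\delta,$$
so we need to make sure that $|x|\,\delta < \frac{\varepsilon}{2}$. Here we need to estimate $|x|$ first. Recall that $x$ should be close to $a$, so let's, for example, assume that $|x - a| < 1$ (which holds if $\delta \le 1$), that is, $-1 < x - a < 1$, $a - 1 < x < a + 1$, and we see that $|x| < |a| + 1$. Thus we have $|x|\,\delta < (|a| + 1)\,\delta$, so we need to get $\delta \le \frac{\varepsilon}{2(|a| + 1)}$.
Summarizing all the above estimates, we get
$$\delta(\varepsilon) = \min\left\{\frac{\varepsilon}{2|b| + 2}, \frac{\varepsilon}{2|a| + 2}, 1\right\}.$$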
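A random test cannot replace the $\varepsilon$-$\delta$ proof, but it can catch an algebra slip in a formula for $\delta(\varepsilon)$. The Python sketch below (our illustration; the sampling ranges, the values of $\varepsilon$ and the seed are arbitrary assumptions) draws base points $(a, b)$, picks random points at distance less than $\delta(\varepsilon)$ from them, and counts how often $|xy - ab| \ge \varepsilon$.

```python
import math
import random

def delta(eps, a, b):
    # delta(eps) from Example 4.4 for continuity of f(x, y) = xy at (a, b)
    return min(eps / (2 * abs(b) + 2), eps / (2 * abs(a) + 2), 1)

random.seed(1)
violations = 0
for _ in range(10000):
    a, b = random.uniform(-50, 50), random.uniform(-50, 50)
    eps = random.choice([1.0, 1e-2, 1e-4])
    d = delta(eps, a, b)
    # a random point at distance r < d from (a, b)
    r, theta = d * random.random(), random.uniform(0, 2 * math.pi)
    x, y = a + r * math.cos(theta), b + r * math.sin(theta)
    if abs(x * y - a * b) >= eps:
        violations += 1
print(violations)   # 0: no sampled point breaks the estimate
```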
Since the definitions of limit and continuity are the same for a function of several variables as for a function of one variable, the same limit laws hold. In particular, the sum, the difference, the product, the ratio (wherever the denominator is nonzero), and the composition of two continuous functions are always continuous. Generally, we have the following.
Theorem 4.5 Any elementary function of several variables is continuous
on its natural domain.
Recall that an elementary function is one dened by a formula involving
constants and variables; operations of addition, subtraction, multiplication,
43
division, power, and composition; exponential, logarithm, trigonometric, and
inverse trigonometric functions. Its natural domain is the set of values of the
variables where the formula makes sense (no division by zero, taking square
root of negative number etc.)
Example 4.6 A linear function $f : \mathbb{R}^m \to \mathbb{R}$ given by $f(x_1, \dots, x_m) = a_1 x_1 + \dots + a_m x_m$ for some constants $a_1, \dots, a_m \in \mathbb{R}$ is continuous.
This was about functions. But what about maps? A map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ assigns an $n$-tuple $(y_1, \dots, y_n)$ to an $m$-tuple $(x_1, \dots, x_m)$. Thus there are actually $n$ component functions $y_i : \mathbb{R}^m \to \mathbb{R}$, $i = 1, \dots, n$.
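Theorem 4.7 Let $\mathbf{f} : M \to \mathbb{R}^n$, $M \subset \mathbb{R}^m$, be a map, let $\mathbf{a} = (a_1, \dots, a_m) \in M$ be a point in $\mathbb{R}^m$, and let $y_i : \mathbb{R}^m \to \mathbb{R}$, $i = 1, \dots, n$, be the component functions. Then
$$\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{l} = (l_1, \dots, l_n)$$
if and only if
$$\lim_{\mathbf{x} \to \mathbf{a}} y_i(\mathbf{x}) = l_i, \quad i = 1, \dots, n.$$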
In other words, the limit of a map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ as a vector equals the component-by-component limit of the $n$ functions $y_1, \dots, y_n : \mathbb{R}^m \to \mathbb{R}$. That's why a significant part of Calculus IV is going to be about functions of several variables $f : \mathbb{R}^m \to \mathbb{R}$.
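Proof (i) First, assume that $\lim_{\mathbf{x} \to \mathbf{a}} y_i(\mathbf{x}) = l_i$ for $i = 1, \dots, n$, and let's prove that
$$\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{l} = (l_1, \dots, l_n).$$
We know that for any $\varepsilon > 0$ there is $\delta_1(\varepsilon)$ such that $0 < \|\mathbf{x} - \mathbf{a}\| < \delta_1$ implies $|y_i(\mathbf{x}) - l_i| < \varepsilon$ for all $i = 1, \dots, n$ (take the smallest of the $n$ values of $\delta$ provided by the individual limits). We need to find $\delta_2(\varepsilon)$ such that
$$\|\mathbf{f}(\mathbf{x}) - \mathbf{l}\| = \sqrt{\sum_{i=1}^{n} [y_i(\mathbf{x}) - l_i]^2} < \varepsilon$$
holds for $0 < \|\mathbf{x} - \mathbf{a}\| < \delta_2$. Setting
$$\delta_2(\varepsilon) = \delta_1\!\left(\frac{\varepsilon}{\sqrt{n}}\right)$$
does the job: then each $|y_i(\mathbf{x}) - l_i| < \varepsilon/\sqrt{n}$, so the sum under the square root is less than $n \cdot \varepsilon^2/n = \varepsilon^2$.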
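(ii) Second, assume that $\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{l} = (l_1, \dots, l_n)$ and let's prove that $\lim_{\mathbf{x} \to \mathbf{a}} y_i(\mathbf{x}) = l_i$ for $i = 1, \dots, n$.
We know that for any $\varepsilon > 0$ there is $\delta(\varepsilon)$ such that $0 < \|\mathbf{x} - \mathbf{a}\| < \delta$ implies
$$\|\mathbf{f}(\mathbf{x}) - \mathbf{l}\| = \sqrt{\sum_{i=1}^{n} [y_i(\mathbf{x}) - l_i]^2} < \varepsilon.$$
Notice that
$$|y_i(\mathbf{x}) - l_i| = \sqrt{[y_i(\mathbf{x}) - l_i]^2} \le \sqrt{\sum_{j=1}^{n} [y_j(\mathbf{x}) - l_j]^2} < \varepsilon,$$
which is needed. :)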
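Example 4.8 Let's find the limit
$$\lim_{t \to 0+} \begin{pmatrix} t^t \\ \dfrac{\cos t - 1}{t^2} \end{pmatrix}.$$
By Theorem 4.7, it equals the component-by-component limit. For the first component,
$$\lim_{t \to 0+} t^t = \lim_{t \to 0+} e^{t \ln t} = \lim_{t \to 0+} e^{\frac{\ln t}{1/t}}.$$
By L'Hôpital's Rule, $\lim_{t \to 0+} \frac{\ln t}{1/t} = \lim_{t \to 0+} \frac{1/t}{-1/t^2} = \lim_{t \to 0+} (-t) = 0$, so $\lim_{t \to 0+} t^t = e^0 = 1$. Also by L'Hôpital's Rule,
$$\lim_{t \to 0+} \frac{\cos t - 1}{t^2} = \lim_{t \to 0+} \frac{-\sin t}{2t} = -\frac{1}{2}.$$
Thus the answer is
$$\begin{pmatrix} 1 \\ -0.5 \end{pmatrix}.$$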
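A quick numerical look at Example 4.8 (a sketch, not part of the argument): evaluating both components for decreasing $t$ shows them approaching $1$ and $-0.5$. The function name `components` is a hypothetical helper. For the second component it uses the identity $\cos t - 1 = -2\sin^2(t/2)$, an implementation choice that avoids the cancellation in $1 - \cos t$ for tiny $t$.

```python
import math


def components(t):
    """The two components of the vector in Example 4.8, evaluated at t > 0."""
    first = t ** t
    # cos t - 1 = -2 sin^2(t/2) avoids the cancellation in 1 - cos t for tiny t.
    second = -2.0 * math.sin(t / 2) ** 2 / t ** 2
    return first, second


for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    first, second = components(t)
    print(f"t = {t:.0e}:  t^t = {first:.6f},  (cos t - 1)/t^2 = {second:.6f}")
```

Note how slowly $t^t$ approaches $1$ (the exponent $t \ln t$ tends to $0$ only logarithmically fast), while the second component is already very close to $-0.5$.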
4.2 Derivative
Let $f(x)$ be a function of one variable. Recall that
$$f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}. \tag{4.1}$$
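This formula works just fine if $f(x + \Delta x)$ and $f(x)$ are vectors, that is, for a map $\mathbf{f} : \mathbb{R} \to \mathbb{R}^n$. In this situation, we can apply Theorem 4.7. In other words, for $\mathbf{f} : \mathbb{R} \to \mathbb{R}^n$, $\mathbf{f} = (y_1, \dots, y_n)$, we get
$$\mathbf{f}'(x) = \lim_{\Delta x \to 0} \frac{\mathbf{f}(x + \Delta x) - \mathbf{f}(x)}{\Delta x} = \left(y_1'(x), \dots, y_n'(x)\right).$$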
But what if we try to use Equation 4.1 for a function $f : \mathbb{R}^m \to \mathbb{R}$, that is, when $\mathbf{x} \in \mathbb{R}^m$? The formula doesn't make sense any more because one cannot divide by a vector $\Delta\mathbf{x} \in \mathbb{R}^m$.
Let's try another approach. Recall that the initial idea behind the derivative of a function $f : \mathbb{R} \to \mathbb{R}$ is the tangent line. One wants to find a linear approximation, that is, to write $f(x + \Delta x) \approx f(x) + A \cdot \Delta x$ for small $\Delta x$. In particular, the remainder $r(\Delta x) = f(x + \Delta x) - f(x) - A \cdot \Delta x$ becomes as small as possible when $A = f'(x)$. In this situation, we have
$$\lim_{\Delta x \to 0} \frac{r(\Delta x)}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x) - f'(x)\,\Delta x}{\Delta x} = 0.$$
Now we can say that the derivative $f'(x)$ is a number $A$ such that
$$\lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x) - A\,\Delta x}{\Delta x} = 0.$$
This idea can be generalized to a map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$. We just need to understand what $A$ is. Notice that $\Delta\mathbf{x} \in \mathbb{R}^m$ and $A\,\Delta\mathbf{x} \in \mathbb{R}^n$. Thus $A$ must be an $n \times m$ matrix, and we need to write vectors as columns, that is,
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \in \mathbb{R}^m.$$
Another option would be to write vectors as rows but multiply by $A$ on the right. We choose columns and left multiplication.
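Definition 4.9 Let $\mathbf{f} : M \to \mathbb{R}^n$, $M \subset \mathbb{R}^m$, be a map and let $\mathbf{x} \in M \subset \mathbb{R}^m$. One says that $A = \mathbf{f}'(\mathbf{x})$ if
$$\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{\|\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{x}) - A\,\Delta\mathbf{x}\|}{\|\Delta\mathbf{x}\|} = 0.$$
The $n \times m$ matrix $A = \mathbf{f}'(\mathbf{x})$ is called the derivative of the map $\mathbf{f}$ at the point $\mathbf{x}$, or the Jacobian matrix. If $\mathbf{f}'(\mathbf{x})$ exists, then $\mathbf{f}$ is differentiable at $\mathbf{x}$. The set of maps differentiable at $\mathbf{x}$ is denoted $D(\mathbf{x})$.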
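Example 4.10 Let $f(x, y) = x + y$. Then the derivative $f'(x, y)$ must be a $1 \times 2$ matrix, that is, $(A, B)$, for which we would have
$$\lim_{(\Delta x, \Delta y) \to (0,0)} \frac{f(x + \Delta x, y + \Delta y) - f(x, y) - A\,\Delta x - B\,\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} = 0,$$
that is,
$$\lim_{(\Delta x, \Delta y) \to (0,0)} \frac{(x + \Delta x + y + \Delta y) - (x + y) - A\,\Delta x - B\,\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} = \lim_{(\Delta x, \Delta y) \to (0,0)} \frac{(1 - A)\,\Delta x + (1 - B)\,\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} = 0.$$
Taking $\Delta y = 0$, the ratio equals $\pm(1 - A)$, which tends to $0$ only if $A = 1$; similarly, taking $\Delta x = 0$ gives $B = 1$. Thus $f'(x, y) = (1, 1)$ for any $(x, y) \in \mathbb{R}^2$.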
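Example 4.11 Let $f(x, y) = xy$. Then the derivative $f'(x, y)$ must be a $1 \times 2$ matrix, that is, $(A, B)$, and we would have
$$\lim_{(\Delta x, \Delta y) \to (0,0)} \frac{f(x + \Delta x, y + \Delta y) - f(x, y) - A\,\Delta x - B\,\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} = 0,$$
that is,
$$\lim_{(\Delta x, \Delta y) \to (0,0)} \frac{(x + \Delta x)(y + \Delta y) - xy - A\,\Delta x - B\,\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} = \lim_{(\Delta x, \Delta y) \to (0,0)} \frac{(y - A)\,\Delta x + (x - B)\,\Delta y + \Delta x\,\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} = 0.$$
Thus $A = y$, $B = x$, and we need to check that
$$\lim_{(\Delta x, \Delta y) \to (0,0)} \frac{\Delta x\,\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} = 0.$$
Notice that $0 \le \frac{|\Delta x\,\Delta y|}{\sqrt{\Delta x^2 + \Delta y^2}} \le |\Delta y|$. Thus, by the Squeeze Theorem, the limit is, indeed, $0$. We get $f'(x, y) = (y, x)$ for any $(x, y) \in \mathbb{R}^2$.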
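The limit in Definition 4.9 can be watched numerically for Example 4.11. The sketch below (hypothetical helper `remainder_ratio`, an arbitrary base point $(2, -3)$, and increments along one fixed direction) shows that the remainder ratio shrinks like $0.48h$ for the correct choice $(A, B) = (y, x)$, but stays near $0.8$ for a wrong guess of $B$. This illustrates the definition along a single direction only; it does not prove differentiability.

```python
import math


def f(x, y):
    return x * y


def remainder_ratio(x, y, A, B, dx, dy):
    """|f(x+dx, y+dy) - f(x, y) - A*dx - B*dy| / ||(dx, dy)||, as in Definition 4.9."""
    num = f(x + dx, y + dy) - f(x, y) - A * dx - B * dy
    return abs(num) / math.hypot(dx, dy)


x0, y0 = 2.0, -3.0
for h in [1e-1, 1e-2, 1e-3]:
    dx, dy = 0.6 * h, 0.8 * h  # an increment of length h
    good = remainder_ratio(x0, y0, y0, x0, dx, dy)  # A = y, B = x
    bad = remainder_ratio(x0, y0, y0, x0 + 1.0, dx, dy)  # a deliberately wrong B
    print(f"h = {h:.0e}: correct (A, B) -> {good:.2e}, wrong B -> {bad:.2e}")
```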
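Theorem 4.12 If a map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ is differentiable at a point $\mathbf{x} \in \mathbb{R}^m$, then $\mathbf{f}$ is continuous at $\mathbf{x}$. In other words, $D(\mathbf{x}) \subset C(\mathbf{x})$.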
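Proof We are given that $\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{\|\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{x}) - A\,\Delta\mathbf{x}\|}{\|\Delta\mathbf{x}\|} = 0$. Multiplying by $\|\Delta\mathbf{x}\|$, which also tends to $0$, we obtain
$$\lim_{\Delta\mathbf{x} \to \mathbf{0}} \left[\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{x}) - A\,\Delta\mathbf{x}\right] = \mathbf{0}. \tag{4.2}$$
Recall that a linear map is defined by an elementary formula and therefore is continuous everywhere by Theorem 4.5. It means that $\lim_{\Delta\mathbf{x} \to \mathbf{0}} A\,\Delta\mathbf{x} = \mathbf{0}$. Substituting it into (4.2), we obtain
$$\lim_{\Delta\mathbf{x} \to \mathbf{0}} \left[\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{x})\right] = \mathbf{0},$$
which is the definition of continuity of the map $\mathbf{f}$ at the point $\mathbf{x}$. :)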
For a function $f : \mathbb{R} \to \mathbb{R}$, the derivative $f'(x)$ at any point $x$ is a $1 \times 1$ matrix, that is, just a number. Calculating this number at every point $x$ gives a function $f' : \mathbb{R} \to \mathbb{R}$. Further, $f'' : \mathbb{R} \to \mathbb{R}$, $f''' : \mathbb{R} \to \mathbb{R}$, etc.
But for a function of two variables $f(x, y) : \mathbb{R}^2 \to \mathbb{R}$, the derivative at any point $(x, y)$ is a $1 \times 2$ matrix, that is, $f'(x, y) = (A, B)$. Thus $f' : \mathbb{R}^2 \to \mathbb{R}^2$. In the same manner, $f'' : \mathbb{R}^2 \to \mathbb{R}^4$, $f''' : \mathbb{R}^2 \to \mathbb{R}^8$, etc.
Finally, let's mention that since the derivative is defined in terms of a limit, for which we have Theorem 4.7, differentiability of a map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ is equivalent to differentiability of the component functions $y_1, \dots, y_n : \mathbb{R}^m \to \mathbb{R}$. The rows of the matrix $\mathbf{f}'(\mathbf{x})$ are the derivatives $y_1'(\mathbf{x}), \dots, y_n'(\mathbf{x})$.
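Example 4.13 Consider a map $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ given by
$$\mathbf{f}(x, y) = \begin{pmatrix} x + y \\ xy \end{pmatrix}.$$
The component functions are $u = x + y$ and $v = xy$. We already know that $u'(x, y) = (1, 1)$ and that $v'(x, y) = (y, x)$. Therefore,
$$\mathbf{f}'(x, y) = \begin{pmatrix} 1 & 1 \\ y & x \end{pmatrix}.$$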
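Since the rows of the Jacobian are the derivatives of the component functions, it can be approximated one column at a time by perturbing one variable. The sketch below uses central differences (a standard numerical technique, not something introduced in the text) with a hypothetical helper `numerical_jacobian` and step `h`, and compares the result with the matrix of Example 4.13.

```python
def f(x, y):
    """The map of Example 4.13: (x, y) -> (x + y, x*y)."""
    return [x + y, x * y]


def numerical_jacobian(func, point, h=1e-6):
    """Approximate the n x m Jacobian of func at point by central differences.

    Column j is (func(p + h e_j) - func(p - h e_j)) / (2h); row i collects
    the approximate partial derivatives of the i-th component function.
    """
    m = len(point)
    n = len(func(*point))
    jac = [[0.0] * m for _ in range(n)]
    for j in range(m):
        plus = list(point)
        plus[j] += h
        minus = list(point)
        minus[j] -= h
        fp, fm = func(*plus), func(*minus)
        for i in range(n):
            jac[i][j] = (fp[i] - fm[i]) / (2 * h)
    return jac


x0, y0 = 1.5, -2.0
print(numerical_jacobian(f, [x0, y0]))  # compare with [[1, 1], [y0, x0]]
```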
4.3 Jacobian matrix and tangent space
Recall that the existence of partial derivatives does not even imply continuity of a function $f(x_1, \dots, x_m)$, let alone differentiability. Conversely, if a function $f(x_1, \dots, x_m)$ is differentiable, then its partial derivatives exist, as the following theorem shows.
Let $\mathbf{f} : M \to \mathbb{R}^n$, $M \subset \mathbb{R}^m$, be a map with component functions $y_1, \dots, y_n : M \to \mathbb{R}$.
Theorem 4.14 Assume that a map $\mathbf{f} : M \to \mathbb{R}^n$, $M \subset \mathbb{R}^m$, is differentiable at $\mathbf{x} \in \mathbb{R}^m$. Then all partial derivatives of all component functions are defined at $\mathbf{x}$, and the matrix $\mathbf{f}'(\mathbf{x})$ is precisely the matrix of partial derivatives evaluated at $\mathbf{x}$.
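In other words, we have
$$\mathbf{f}'(\mathbf{x}) = \begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_m} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_m} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_n}{\partial x_1} & \dfrac{\partial y_n}{\partial x_2} & \cdots & \dfrac{\partial y_n}{\partial x_m}
\end{pmatrix}. \tag{4.3}$$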
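Proof Since for maps $\mathbb{R}^m \to \mathbb{R}^n$ limits and derivatives are defined component by component, it is enough to check the case $f : \mathbb{R}^m \to \mathbb{R}$. So we consider a function $f(x_1, \dots, x_m)$ and let $f'(x_1, \dots, x_m) = (d_1, \dots, d_m)$. By definition, it means that
$$\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{\left|f(\mathbf{x} + \Delta\mathbf{x}) - f(\mathbf{x}) - \sum_{i=1}^{m} d_i\,\Delta x_i\right|}{\|\Delta\mathbf{x}\|} = 0. \tag{4.4}$$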
Since the limit is $0$, the limit along any line is also $0$. In particular, we can fix $\Delta x_j = 0$ for $j \ne i$ (so that $\|\Delta\mathbf{x}\| = |\Delta x_i|$) and then obtain
$$\lim_{\Delta x_i \to 0} \frac{f(x_1, \dots, x_{i-1}, x_i + \Delta x_i, x_{i+1}, \dots, x_m) - f(x_1, \dots, x_m) - d_i\,\Delta x_i}{\Delta x_i} = 0,$$
which implies that
$$d_i = \lim_{\Delta x_i \to 0} \frac{f(x_1, \dots, x_{i-1}, x_i + \Delta x_i, x_{i+1}, \dots, x_m) - f(x_1, \dots, x_m)}{\Delta x_i}.$$
This expression is just the definition of the partial derivative $\frac{\partial f}{\partial x_i}$, so the theorem is proved. :)
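Example 4.15 Consider a map given by
$$\mathbf{f}(u, v) = \begin{pmatrix} uv^2 \\ \sin(u^2 - v) \\ e^{u - 2v} \end{pmatrix}, \qquad \mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3,$$
and let us find its Jacobian matrix at the point $(3, 5)$. Denoting the components by $x$, $y$, $z$, we have the following partial derivatives:
$$\frac{\partial x}{\partial u} = v^2, \qquad \frac{\partial x}{\partial v} = 2uv,$$
$$\frac{\partial y}{\partial u} = 2u\cos(u^2 - v), \qquad \frac{\partial y}{\partial v} = -\cos(u^2 - v),$$
$$\frac{\partial z}{\partial u} = e^{u - 2v}, \qquad \frac{\partial z}{\partial v} = -2e^{u - 2v}.$$
It gives us the following matrix for $u = 3$, $v = 5$:
$$\begin{pmatrix} 25 & 30 \\ 6\cos 4 & -\cos 4 \\ e^{-7} & -2e^{-7} \end{pmatrix}.$$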
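A numerical cross-check of Example 4.15 (a sketch under the assumption that central differences with step $h = 10^{-6}$ are accurate enough here): the hand-computed Jacobian at $(3, 5)$ is compared with finite-difference estimates. The helpers `jacobian_exact` and `jacobian_central` are illustrative names.

```python
import math


def f(u, v):
    """The map of Example 4.15: (u, v) -> (u v^2, sin(u^2 - v), e^(u - 2v))."""
    return [u * v**2, math.sin(u**2 - v), math.exp(u - 2 * v)]


def jacobian_exact(u, v):
    """The matrix of partial derivatives computed by hand in Example 4.15."""
    return [
        [v**2, 2 * u * v],
        [2 * u * math.cos(u**2 - v), -math.cos(u**2 - v)],
        [math.exp(u - 2 * v), -2 * math.exp(u - 2 * v)],
    ]


def jacobian_central(u, v, h=1e-6):
    """Central-difference approximation: column 0 varies u, column 1 varies v."""
    du = [(a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v))]
    dv = [(a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h))]
    return [[du[i], dv[i]] for i in range(3)]


exact = jacobian_exact(3.0, 5.0)
approx = jacobian_central(3.0, 5.0)
for row_exact, row_approx in zip(exact, approx):
    print(row_exact, row_approx)
```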
Given a map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$, how do we check whether it is differentiable? Existence of partial derivatives is not enough. The following theorem gives a convenient sufficient condition.
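Theorem 4.16 Given a function $f : \mathbb{R}^m \to \mathbb{R}$ and a point $\mathbf{x} \in \mathbb{R}^m$, assume that all partial derivatives of $f$ exist and are continuous at every point $\mathbf{a}$ with $\|\mathbf{a} - \mathbf{x}\| < r$, where $r > 0$. Then $f \in D(\mathbf{x})$.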
Proof Let's prove it for $m = 2$; the general case is similar. We need to check that
$$\lim_{(\Delta x, \Delta y) \to (0,0)} \frac{f(x + \Delta x, y + \Delta y) - f(x, y) - \Delta x\, f_x(x, y) - \Delta y\, f_y(x, y)}{\sqrt{\Delta x^2 + \Delta y^2}} = 0.$$
We have
$$f(x + \Delta x, y + \Delta y) - f(x, y) = \underbrace{f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y)}_{\text{values of } f(t,\, y + \Delta y)} + \underbrace{f(x, y + \Delta y) - f(x, y)}_{\text{values of } f(x,\, t)}.$$
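Applying the Mean Value Theorem to the function $t \mapsto f(t, y + \Delta y)$, we get
$$f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y) = \Delta x\, f_x(x + \Delta x_1, y + \Delta y), \qquad |\Delta x_1| \le |\Delta x|.$$
In the same manner, applying the Mean Value Theorem to the function $t \mapsto f(x, t)$, we get
$$f(x, y + \Delta y) - f(x, y) = \Delta y\, f_y(x, y + \Delta y_1), \qquad |\Delta y_1| \le |\Delta y|.$$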
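Summarizing all the equations above, we get
$$\frac{f(x + \Delta x, y + \Delta y) - f(x, y) - \Delta x\, f_x(x, y) - \Delta y\, f_y(x, y)}{\sqrt{\Delta x^2 + \Delta y^2}} = \frac{\Delta x\left[f_x(x + \Delta x_1, y + \Delta y) - f_x(x, y)\right] + \Delta y\left[f_y(x, y + \Delta y_1) - f_y(x, y)\right]}{\sqrt{\Delta x^2 + \Delta y^2}}. \tag{4.5}$$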
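Finally, notice that $-1 \le \frac{\Delta x}{\sqrt{\Delta x^2 + \Delta y^2}} \le 1$ and $-1 \le \frac{\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} \le 1$. Also, since $f_x$ and $f_y$ are continuous, we have
$$\lim_{(\Delta x, \Delta y) \to (0,0)} \left[f_x(x + \Delta x_1, y + \Delta y) - f_x(x, y)\right] = 0, \qquad \lim_{(\Delta x, \Delta y) \to (0,0)} \left[f_y(x, y + \Delta y_1) - f_y(x, y)\right] = 0.$$
By the Squeeze Theorem, the limit of the expression (4.5) is $0$, which is the definition of differentiability. :)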
Differentiability means that the graph is smooth. In the case of a function $f(x, y)$, such a graph is a surface without any folds or corners; compare Figure 4.1, which shows graphs of functions that are not differentiable at $(0, 0)$. The graph of a differentiable map admits a tangent space defined in the same manner as for functions $\mathbb{R} \to \mathbb{R}$.
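Definition 4.17 Given a map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ differentiable at a point $\mathbf{a} \in \mathbb{R}^m$, the equation
$$\mathbf{y} = \mathbf{f}(\mathbf{a}) + \mathbf{f}'(\mathbf{a})(\mathbf{x} - \mathbf{a}) \tag{4.6}$$
defines the tangent space to the graph of the map $\mathbf{f}(\mathbf{x})$ at $\mathbf{x} = \mathbf{a}$. It is also called a tangent line for $m = 1$ and a tangent plane for $m = 2$.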
For instance, consider a function $f(x, y)$. Then in Equation 4.6, we have $\mathbf{x} = (x, y) \in \mathbb{R}^2$, $\mathbf{a} = (a, b) \in \mathbb{R}^2$, $\mathbf{y} = z \in \mathbb{R}$, $f(x, y) \in \mathbb{R}$, $f' = (f_x, f_y)$, and the tangent plane is defined by
$$z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b).$$
The tangent plane can lie below or above the surface, or it can even intersect it, as shown in Figure 4.2.
Example 4.18 Let $f(x, y) = x^2 \ln(1 + xy)$. Let's find out whether $f(x, y)$ is differentiable at $(2, 0)$ and, if so, find the equation of the tangent plane.
We have $f_x = 2x\ln(1 + xy) + \frac{x^2 y}{1 + xy}$ and $f_y = \frac{x^3}{1 + xy}$. Both partial derivatives are elementary functions and therefore are continuous wherever $1 + xy > 0$, in particular in a neighborhood of $(2, 0)$. Thus, by Theorem 4.16, the function is, indeed, differentiable at the point $(2, 0)$. Further, $f(2, 0) = 0$, $f_x(2, 0) = 0$, and $f_y(2, 0) = 8$. Finally, the equation of the tangent plane is
$$z = 0 + 0(x - 2) + 8(y - 0) = 8y.$$
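What makes $z = 8y$ the tangent plane is that the gap between the surface and the plane vanishes faster than the distance to $(2, 0)$. The sketch below checks this along one sample path $(2 + h, -h)$; the path and the helper `gap_ratio` are illustrative choices, and a single path illustrates rather than proves the limit.

```python
import math


def f(x, y):
    """f(x, y) = x^2 ln(1 + xy) from Example 4.18 (defined where 1 + xy > 0)."""
    return x**2 * math.log(1 + x * y)


def tangent_plane(x, y):
    """z = 8y, the tangent plane at (2, 0) found in Example 4.18."""
    return 8 * y


def gap_ratio(h):
    """(f - plane) / distance at (2 + h, -h), which lies at distance h*sqrt(2) from (2, 0)."""
    x, y = 2 + h, -h
    return (f(x, y) - tangent_plane(x, y)) / (h * math.sqrt(2))


print(f(2.0, 0.0))  # the plane passes through the point (2, 0, f(2, 0))
for h in [1e-1, 1e-2, 1e-3]:
    print(f"h = {h:.0e}: gap / distance = {gap_ratio(h):.2e}")
```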
The partial derivatives of a function $f(x, y)$ vanish at a critical point. It means that the tangent plane there is horizontal. For a non-degenerate critical point, there are three options: a local minimum when the tangent plane is below the function's graph, a saddle point when the tangent plane intersects the graph along two crossing curves, and a local maximum when the tangent plane is above the function's graph, as shown in Figure 4.3.

[Figure 4.1: Graphs of functions non-differentiable at (0, 0)]

[Figure 4.2: Graphs of differentiable functions with tangent planes]

[Figure 4.3: A local minimum, a saddle point, a local maximum with tangent planes]

[Figure 4.4: Tangent planes for the surface z = xy at points along the lines x = 0 and y = 0]
Example 4.19 Consider the surface $z = f(x, y) = xy$ and let's find the points $(a, b)$ such that the tangent plane at $(a, b)$ contains the origin $(0, 0, 0)$. Since $f_x = y$ and $f_y = x$ are continuous, the function $f(x, y)$ is differentiable everywhere, and we get $f_x(a, b) = b$ and $f_y(a, b) = a$. The equation of the tangent plane is
$$z = ab + b(x - a) + a(y - b) = -ab + bx + ay.$$
We need it to contain the point $(0, 0, 0)$, which means $0 = -ab + b \cdot 0 + a \cdot 0$, that is, $ab = 0$. In other words, $a = 0$ or $b = 0$. In either case, the point of tangency has $z = ab = 0$, and the set of all such points on the surface is the pair of intersecting straight lines $x = 0, z = 0$ and $y = 0, z = 0$, as shown in Figure 4.4.
Example 4.20 Let
$$f(x, y) = \begin{cases} (x^2 + y^2)\sin\dfrac{1}{x^2 + y^2}, & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}$$
Let's find the partial derivatives and test the function for differentiability. We have
$$f(x, 0) = \begin{cases} x^2 \sin\dfrac{1}{x^2}, & x \ne 0, \\ 0, & x = 0, \end{cases}$$
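so $f_x(0, 0) = \lim_{h \to 0} \frac{h^2 \sin(1/h^2) - 0}{h} = \lim_{h \to 0} h\sin\frac{1}{h^2} = 0$ by the definition of the derivative (the sine is bounded). Thus,
$$f_x(x, y) = \begin{cases} 2x\sin\dfrac{1}{x^2 + y^2} - \dfrac{2x}{x^2 + y^2}\cos\dfrac{1}{x^2 + y^2}, & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}$$
Here, the formula for $(x, y) \ne (0, 0)$ is obtained by the usual differentiation with the Product and Chain Rules.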
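It is clearly not bounded near $(0, 0)$. For instance,
$$f_x(x, 0) = 2x\sin\frac{1}{x^2} - \frac{2}{x}\cos\frac{1}{x^2},$$
where the first summand approaches $0$ and the second one can be arbitrarily large in absolute value when $x$ is small and $\cos\frac{1}{x^2} = 1$ (for example, at $x = 1/\sqrt{2\pi k}$, $k = 1, 2, \dots$). Therefore $f_x$ is not bounded near $(0, 0)$ (although it is defined there and $f_x(0, 0) = 0$).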
In the same manner, we would obtain that
$$f_y(x, y) = \begin{cases} 2y\sin\dfrac{1}{x^2 + y^2} - \dfrac{2y}{x^2 + y^2}\cos\dfrac{1}{x^2 + y^2}, & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0) \end{cases}$$
is defined but not bounded near $(0, 0)$. A function that is unbounded near a point cannot be continuous there. Hence the partial derivatives $f_x$ and $f_y$ are discontinuous at $(0, 0)$, though $f_x(0, 0) = f_y(0, 0) = 0$.
Let's test this function for differentiability. We need to check that
$$\lim_{(x,y) \to (0,0)} \frac{f(x, y) - f(0, 0) - f_x(0, 0)\,x - f_y(0, 0)\,y}{\sqrt{x^2 + y^2}} = 0.$$
As we already know, $f(0, 0) = f_x(0, 0) = f_y(0, 0) = 0$, so we need to check whether
$$\lim_{(x,y) \to (0,0)} \frac{(x^2 + y^2)\sin\frac{1}{x^2 + y^2}}{\sqrt{x^2 + y^2}} = \lim_{(x,y) \to (0,0)} \sqrt{x^2 + y^2}\,\sin\frac{1}{x^2 + y^2} = 0.$$
It is true by the Squeeze Theorem: $\lim \sqrt{x^2 + y^2} = 0$ and $\sin\frac{1}{x^2 + y^2}$ is bounded.
Thus a function might not have continuous partial derivatives but still be differentiable; in other words, the converse of Theorem 4.16 is false.
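The two competing facts in Example 4.20 can be seen side by side numerically. At the points $x = 1/\sqrt{2\pi k}$ the formula for $f_x(x, 0)$ is about $-2\sqrt{2\pi k}$, so it blows up, while the differentiability quotient $|f(x, y)|/\sqrt{x^2 + y^2}$ stays below $\sqrt{x^2 + y^2}$. The sample points (on the diagonal for the quotient) and the helper names are illustrative choices.

```python
import math


def f(x, y):
    """The function of Example 4.20."""
    r2 = x * x + y * y
    return 0.0 if r2 == 0 else r2 * math.sin(1.0 / r2)


def fx(x, y):
    """The formula for f_x away from the origin (Example 4.20)."""
    r2 = x * x + y * y
    return 2 * x * math.sin(1 / r2) - 2 * x / r2 * math.cos(1 / r2)


for k in [1, 100, 10_000]:
    x = 1 / math.sqrt(2 * math.pi * k)  # here cos(1/x^2) = 1
    quotient = abs(f(x, x)) / math.hypot(x, x)  # quotient from the differentiability test
    print(f"x = {x:.3e}: f_x(x, 0) = {fx(x, 0):.1f}, quotient = {quotient:.2e}")
```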
[Figure 4.5: A function differentiable at (0, 0) but whose partial derivatives are discontinuous at (0, 0)]
4.4 Exercises
Questions on understanding the lecture
Exercise 4.1 Let $\mathbf{f} : \mathbb{R}^3_{u,v,w} \to \mathbb{R}^3_{x,y,z}$ be a map given by
$$x = p(u, v, w), \qquad y = q(u, v, w), \qquad z = r(u, v, w),$$
where $p$, $q$, and $r$ are polynomials in the variables $u, v, w$.
(i) Prove that $\mathbf{f}$ is continuous on $\mathbb{R}^3$.
(ii) Is it true that $\mathbf{f}$ is also differentiable on $\mathbb{R}^3$? Justify your answer.
Exercise 4.2 What is the Jacobian matrix of a linear map $\mathbf{f}(\mathbf{x}) = A\mathbf{x}$, $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$? Here, $A$ is an $n \times m$ matrix and $\mathbf{x} \in \mathbb{R}^m$ is a column vector.
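Exercise 4.3 Given a map $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$, recall that it is called differentiable at a point $\mathbf{x} \in \mathbb{R}^m$ if there is an $n \times m$ matrix $A$ such that
$$\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{\|\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{x}) - A\,\Delta\mathbf{x}\|}{\|\Delta\mathbf{x}\|} = 0.$$
Prove that if $\mathbf{f}$ is differentiable at $\mathbf{x}$, then there exists exactly one matrix $A$ satisfying this condition.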
Questions on calculation
Exercise 4.4 Write down the equation of the tangent plane to each of the following surfaces at the given points.
(a) $z = x^2 + y^2$ at the point $(1, 2, 5)$.
(b) $x^2 + y^2 + z^2 = 169$ at the point $(3, 4, 12)$.
(c) $z = y + \ln\frac{x}{z}$ at the point $(1, 1, 1)$.
Questions on logical thinking
Exercise 4.5 One says that a function $f(x, y)$ is independent of the variable $x$ if $f(x_1, y) = f(x_2, y)$ for any $x_1$, $x_2$, and $y$. In a similar manner, one defines a function independent of $y$.
Let $f(x, y)$ be a function of two variables defined on the whole plane $\mathbb{R}^2$. Prove that
(i) If $f_x \equiv 0$, then $f$ is independent of the variable $x$,
(ii) If $f_y \equiv 0$, then $f$ is independent of the variable $y$,
(iii) If $f_x \equiv f_y \equiv 0$, then $f$ is constant.
Notice that the converse of each of these statements is obvious: if a function doesn't involve $x$, then of course its derivative with respect to $x$ is $0$.
Exercise 4.6 Let $f(x, y) = |xy|$. Determine all points $(x, y) \in \mathbb{R}^2$ such that $f$ is differentiable at $(x, y)$.
Lecture 5
Chain Rule
5.1 Differentiation laws
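Let $\mathbf{f} : M \to \mathbb{R}^n$, $M \subset \mathbb{R}^m$, be a map and let $\mathbf{x} \in M \subset \mathbb{R}^m$. Recall that $A = \mathbf{f}'(\mathbf{x})$ means
$$\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{\|\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{x}) - A\,\Delta\mathbf{x}\|}{\|\Delta\mathbf{x}\|} = 0,$$
where $A$ is the $n \times m$ matrix whose elements are the partial derivatives of the component functions $y_1, \dots, y_n$ with respect to the variables $x_1, \dots, x_m$.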
Partial derivatives are just like usual derivatives, so we have the sum, product, and ratio rules:
$$(f + g)_x = f_x + g_x, \qquad (fg)_x = f_x g + f g_x, \qquad \left(\frac{f}{g}\right)_x = \frac{f_x g - f g_x}{g^2},$$
and the same with other variables. For instance, for three variables and functions $f(x, y, z)$ and $g(x, y, z)$, we have $f' = (f_x, f_y, f_z)$ and $g' = (g_x, g_y, g_z)$, so
$$(fg)' = \left((fg)_x, (fg)_y, (fg)_z\right) = \left(f_x g + f g_x,\ f_y g + f g_y,\ f_z g + f g_z\right) = f' g + f g'.$$
In the same manner, we would obtain that $(f + g)' = f' + g'$ (sum rule) and $\left(\frac{f}{g}\right)' = \frac{f' g - f g'}{g^2}$ wherever $g \ne 0$ (ratio rule). The Chain Rule also works, but its proof is not so obvious since it requires multiplication of matrices.
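Theorem 5.1 (Chain Rule) Given two maps $\mathbf{f} : \mathbb{R}^k \to \mathbb{R}^m$ and $\mathbf{g} : \mathbb{R}^m \to \mathbb{R}^n$, let $\mathbf{h}(\mathbf{x}) = \mathbf{g}(\mathbf{f}(\mathbf{x}))$. If $\mathbf{f}$ is differentiable at $\mathbf{x}$ and $\mathbf{g}$ is differentiable at $\mathbf{f}(\mathbf{x})$, then
$$\mathbf{h}'(\mathbf{x}) = \mathbf{g}'(\mathbf{f}(\mathbf{x}))\,\mathbf{f}'(\mathbf{x}).$$
Recall that $\mathbf{g}'(\mathbf{f}(\mathbf{x}))$ is an $n \times m$ matrix, $\mathbf{f}'(\mathbf{x})$ is an $m \times k$ matrix, and $\mathbf{h}'(\mathbf{x})$ is an $n \times k$ matrix, so the formula agrees with matrix multiplication.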
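Proof Let $A = \mathbf{g}'(\mathbf{f}(\mathbf{x}))$ and let $B = \mathbf{f}'(\mathbf{x})$. Since $\mathbf{g}$ is differentiable, we have
$$\mathbf{g}(\mathbf{y} + \Delta\mathbf{y}) = \mathbf{g}(\mathbf{y}) + A\,\Delta\mathbf{y} + \mathbf{r}_1(\Delta\mathbf{y}), \tag{5.1}$$
where $\lim_{\Delta\mathbf{y} \to \mathbf{0}} \frac{\mathbf{r}_1(\Delta\mathbf{y})}{\|\Delta\mathbf{y}\|} = \mathbf{0}$. We need to apply it at the point $\mathbf{y} = \mathbf{f}(\mathbf{x})$, but first let's write the definition of differentiability of the map $\mathbf{f}$:
$$\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) = \mathbf{f}(\mathbf{x}) + B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x}),$$
where $\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{\mathbf{r}_2(\Delta\mathbf{x})}{\|\Delta\mathbf{x}\|} = \mathbf{0}$. Combining these equations, we obtain
$$\mathbf{h}(\mathbf{x} + \Delta\mathbf{x}) = \mathbf{g}(\mathbf{f}(\mathbf{x} + \Delta\mathbf{x})) = \mathbf{g}\big(\underbrace{\mathbf{f}(\mathbf{x})}_{\mathbf{y}} + \underbrace{B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x})}_{\Delta\mathbf{y}}\big),$$
so (5.1) implies that
$$\begin{aligned}
\mathbf{h}(\mathbf{x} + \Delta\mathbf{x}) &= \mathbf{g}(\mathbf{f}(\mathbf{x})) + A\big(B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x})\big) + \mathbf{r}_1\big(B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x})\big) \\
&= \underbrace{\mathbf{g}(\mathbf{f}(\mathbf{x}))}_{\mathbf{h}(\mathbf{x})} + AB\,\Delta\mathbf{x} + \underbrace{A\,\mathbf{r}_2(\Delta\mathbf{x}) + \mathbf{r}_1\big(B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x})\big)}_{\mathbf{r}_3(\Delta\mathbf{x})},
\end{aligned}$$
where we only need to check that $\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{\mathbf{r}_3(\Delta\mathbf{x})}{\|\Delta\mathbf{x}\|} = \mathbf{0}$. We have
$$\frac{\mathbf{r}_3(\Delta\mathbf{x})}{\|\Delta\mathbf{x}\|} = A\underbrace{\frac{\mathbf{r}_2(\Delta\mathbf{x})}{\|\Delta\mathbf{x}\|}}_{\to\,\mathbf{0}} + \underbrace{\frac{\mathbf{r}_1\big(B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x})\big)}{\|B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x})\|}}_{\to\,\mathbf{0}} \cdot \underbrace{\frac{\|B\,\Delta\mathbf{x} + \mathbf{r}_2(\Delta\mathbf{x})\|}{\|\Delta\mathbf{x}\|}}_{\text{bounded}},$$
so the limit is, indeed, $\mathbf{0}$. :)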
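Example 5.2 Consider maps $\mathbb{R}^2_{r,\theta} \xrightarrow{\ \mathbf{f}\ } \mathbb{R}^3_{x,y,z} \xrightarrow{\ g\ } \mathbb{R}_u$ given by the equations
$$u = x + y^2 + z^3, \qquad \begin{cases} x = r\cos\theta, \\ y = r\sin\theta, \\ z = r, \end{cases}$$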
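so for the composition $u = h(r, \theta) = g(x(r, \theta), y(r, \theta), z(r, \theta))$, we have the expression $h(r, \theta) = r\cos\theta + (r\sin\theta)^2 + r^3$. The Chain Rule says that
$$\begin{pmatrix} u_r & u_\theta \end{pmatrix} = \begin{pmatrix} u_x & u_y & u_z \end{pmatrix} \begin{pmatrix} x_r & x_\theta \\ y_r & y_\theta \\ z_r & z_\theta \end{pmatrix}. \tag{5.2}$$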
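Specifically, we have the equations $u_r = u_x x_r + u_y y_r + u_z z_r$ and $u_\theta = u_x x_\theta + u_y y_\theta + u_z z_\theta$. Let's check that this is true. Substituting the actual formulae into (5.2), we get
$$\begin{pmatrix} u_r & u_\theta \end{pmatrix} = \begin{pmatrix} 1 & 2y & 3z^2 \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \\ 1 & 0 \end{pmatrix},$$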
that is, $\cos\theta + 2r\sin^2\theta + 3r^2 = u_r = \cos\theta + 2y\sin\theta + 3z^2$. Recalling the expressions for $x$, $y$, and $z$, we obtain $\cos\theta + 2r\sin^2\theta + 3r^2 = \cos\theta + 2r\sin\theta\sin\theta + 3r^2$, which is true.
In the same manner, we have $-r\sin\theta + 2r^2\sin\theta\cos\theta = u_\theta = -r\sin\theta + 2r\sin\theta \cdot r\cos\theta$, which is also true.
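The same check can be done numerically at sample points: multiply the row $(u_x, u_y, u_z)$ by the $3 \times 2$ Jacobian of $(x, y, z)$ with respect to $(r, \theta)$, as in (5.2), and compare with the partial derivatives of $h(r, \theta)$ computed directly. The function names and sample points are illustrative.

```python
import math


def chain_rule_gradient(r, theta):
    """(u_r, u_theta) as the product (u_x u_y u_z) * Jacobian of (x, y, z) in (r, theta)."""
    y, z = r * math.sin(theta), r
    row = [1.0, 2 * y, 3 * z**2]  # (u_x, u_y, u_z) for u = x + y^2 + z^3
    jac = [
        [math.cos(theta), -r * math.sin(theta)],  # (x_r, x_theta)
        [math.sin(theta), r * math.cos(theta)],  # (y_r, y_theta)
        [1.0, 0.0],  # (z_r, z_theta)
    ]
    return [sum(row[i] * jac[i][j] for i in range(3)) for j in range(2)]


def direct_gradient(r, theta):
    """Partial derivatives of h(r, theta) = r cos(theta) + (r sin(theta))^2 + r^3."""
    u_r = math.cos(theta) + 2 * r * math.sin(theta) ** 2 + 3 * r**2
    u_theta = -r * math.sin(theta) + 2 * r**2 * math.sin(theta) * math.cos(theta)
    return [u_r, u_theta]


print(chain_rule_gradient(1.3, 0.7), direct_gradient(1.3, 0.7))
```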
Of course, the Chain Rule is not very useful for finding actual partial derivatives, because it is usually easier to differentiate the composed map directly. However, it is needed to prove theorems. It can also be applied to substitutions in partial differential equations.
There is a natural way of presenting the Chain Rule in terms of differentials.
Definition 5.3 Given a function $f(x_1, \dots, x_m)$, its differential at a point $\mathbf{x} = (x_1, \dots, x_m) \in \mathbb{R}^m$ is the linear function $df : \mathbb{R}^m \to \mathbb{R}$ given by
$$df = \sum_{i=1}^{m} \frac{\partial f}{\partial x_i}\,dx_i,$$
where $dx_1, dx_2, \dots, dx_m$ are coordinates in $\mathbb{R}^m$.
We take the Jacobian matrix $(f_{x_1}, \dots, f_{x_m})$ and use it to construct a linear function $\mathbb{R}^m \to \mathbb{R}$. Generally, a linear function is given by
$$l(x_1, \dots, x_m) = a_1 x_1 + \dots + a_m x_m, \qquad (a_1, \dots, a_m) \in \mathbb{R}^m.$$
For the differential, we take $a_i = f_{x_i}(\mathbf{x})$ and use $dx_1, \dots, dx_m$ for coordinates in the space $\mathbb{R}^m$ instead of $x_1, \dots, x_m$. The notation $dx_i$ emphasizes that the functions $f$ and $df$ are defined on different copies of the $m$-dimensional Euclidean space $\mathbb{R}^m$.
Basically, the differential and the derivative are just different aspects of the same idea: the linear approximation of a function at a certain point.
Historically, $dx_i$ was interpreted as an infinitely small increment of a variable. In the same manner, $df_{\mathbf{x}}$ was considered to be an infinitely small increment of the function, that is, $df_{\mathbf{x}} = f(\mathbf{x} + d\mathbf{x}) - f(\mathbf{x})$.
The notion of an infinitely small increment has no rigorous meaning in standard analysis. We can only think of it as $d\mathbf{x} = \Delta\mathbf{x}$ and $df = \Delta f = f(\mathbf{x} + \Delta\mathbf{x}) - f(\mathbf{x})$. Then the equation $df_{\mathbf{x}} = \sum_{i=1}^{m} f_{x_i}(\mathbf{x})\,dx_i$ is not precise; we can only say that $df_{\mathbf{x}} \approx \sum_{i=1}^{m} f_{x_i}(\mathbf{x})\,dx_i$, which is not very clear, because what does the symbol $\approx$ mean exactly?
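According to the modern definition, $df_{\mathbf{x}} : \mathbb{R}^m \to \mathbb{R}$ is a linear function satisfying
$$\lim_{\Delta\mathbf{x} \to \mathbf{0}} \frac{f(\mathbf{x} + \Delta\mathbf{x}) - f(\mathbf{x}) - df_{\mathbf{x}}(\Delta\mathbf{x})}{\|\Delta\mathbf{x}\|} = 0,$$
and $dx_i : \mathbb{R}^m \to \mathbb{R}$ is the linear function given by $dx_i(\mathbf{h}) = h_i$, where $\mathbf{h} = (h_1, \dots, h_m) \in \mathbb{R}^m$. The equation $df_{\mathbf{x}} = \sum_{i=1}^{m} f_{x_i}(\mathbf{x})\,dx_i$ expresses $df_{\mathbf{x}}$ as a linear combination of the basic linear functions $dx_i$. Besides, we have $f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + df_{\mathbf{x}}(\Delta\mathbf{x})$.¹
The function $y = f(\mathbf{x}) + df_{\mathbf{x}}(\Delta\mathbf{x})$ is the best linear approximation of the original function $f(\mathbf{x} + \Delta\mathbf{x})$, where $\Delta\mathbf{x}$ is a small increment of the variable vector. The graph of the differential, shifted so that its origin is at the point $(\mathbf{x}, f(\mathbf{x}))$, is the tangent space to the graph of the function.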
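Example 5.4 Let's approximately compute $\sin 29^\circ \tan 46^\circ$. Put $f(x, y) = \sin x \tan y$ and $(x, y) = \left(\frac{\pi}{6}, \frac{\pi}{4}\right)$. Thus we need to find $f(x + \Delta x, y + \Delta y)$, where $\Delta x = -\frac{\pi}{180}$ and $\Delta y = \frac{\pi}{180}$. Applying the differential, we write
$$f(x + \Delta x, y + \Delta y) \approx f(x, y) + f_x(x, y)\,\Delta x + f_y(x, y)\,\Delta y.$$
Here $f(x, y) = \frac{1}{2}$, $f_x(x, y) = \cos x \tan y = \frac{\sqrt{3}}{2}$, and $f_y(x, y) = \frac{\sin x}{\cos^2 y} = 1$.
Thus
$$\sin 29^\circ \tan 46^\circ \approx \frac{1}{2} - \frac{\sqrt{3}}{2} \cdot \frac{\pi}{180} + \frac{\pi}{180} \approx 0.502.$$
The error is, actually, in the fourth decimal place.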
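Example 5.4 can be reproduced in a few lines of Python: the sketch computes the linear approximation from the differential at $(\pi/6, \pi/4)$ and compares it with the value returned by the standard `math` functions. The variable names are illustrative.

```python
import math

deg = math.pi / 180
x, y = math.pi / 6, math.pi / 4  # base point (30 deg, 45 deg)
dx, dy = -deg, deg  # increments reaching (29 deg, 46 deg)

fx = math.cos(x) * math.tan(y)  # partial derivative in x: sqrt(3)/2
fy = math.sin(x) / math.cos(y) ** 2  # partial derivative in y: 1
linear = math.sin(x) * math.tan(y) + fx * dx + fy * dy
exact = math.sin(29 * deg) * math.tan(46 * deg)
print(f"linear approximation: {linear:.5f}")
print(f"exact value:          {exact:.5f}")
print(f"error:                {abs(linear - exact):.1e}")
```

The printed error is of order $10^{-4}$, consistent with the remark that the error shows up in the fourth decimal place.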
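The Chain Rule says that the $dx_i$, $i = 1, \dots, m$, in the expression $df = \sum_{i=1}^{m} f_{x_i}\,dx_i$ are in turn differentials of the functions $x_i$ when these variables themselves depend on other variables. Let $f(u_1, \dots, u_m)$, where $u_i = u_i(x_1, \dots, x_n)$. Then we have $df = \sum_{i=1}^{m} \frac{\partial f}{\partial u_i}\,du_i$, where $du_i = \sum_{j=1}^{n} \frac{\partial u_i}{\partial x_j}\,dx_j$. Thus,
$$df = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\partial f}{\partial u_i}\frac{\partial u_i}{\partial x_j}\,dx_j = \sum_{j=1}^{n}\left(\sum_{i=1}^{m} \frac{\partial f}{\partial u_i}\frac{\partial u_i}{\partial x_j}\right) dx_j.$$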
On the other hand, we have
$$df = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\,dx_j.$$

¹ The symbol $\approx$ has a precise mathematical meaning. One says that $f(x) \approx g(x)$ as $x \to a$ if $f(x) - g(x) = g(x)\,s(x)$ for some function $s(x)$ such that $\lim_{x \to a} s(x) = 0$. In other words, we have $f = g + o(g)$.
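Since $dx_1, \dots, dx_n$ form a basis of the space of linear functions on $\mathbb{R}^n$, the coefficients at $dx_j$ must be the same, that is,
$$\frac{\partial f}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial f}{\partial u_i}\frac{\partial u_i}{\partial x_j},$$
which is the Chain Rule.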
Example 5.5 Let $f(u, v) = u^2 - \sin v$, where $u(x, y) = xy$ and $v(x, y) = x + 2y$. Then
$$\begin{aligned}
df &= 2u\,du - \cos v\,dv \\
&= 2xy\,(y\,dx + x\,dy) - \cos(x + 2y)\,(dx + 2\,dy) \\
&= \left[2xy^2 - \cos(x + 2y)\right] dx + \left[2x^2 y - 2\cos(x + 2y)\right] dy,
\end{aligned}$$
which shows that $f_x = 2xy^2 - \cos(x + 2y)$ and $f_y = 2x^2 y - 2\cos(x + 2y)$.
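To double-check the coefficients read off from $df$ in Example 5.5, one can differentiate the composed function $f(xy, x + 2y)$ numerically. The sketch below uses central differences (an assumption that a step $h = 10^{-6}$ is adequate at the chosen sample point); all function names are illustrative.

```python
import math


def f_composed(x, y):
    """f(u, v) = u^2 - sin v with u = x*y and v = x + 2y (Example 5.5)."""
    u, v = x * y, x + 2 * y
    return u**2 - math.sin(v)


def partials_from_differential(x, y):
    """Coefficients of dx and dy read off from df in Example 5.5."""
    return (
        2 * x * y**2 - math.cos(x + 2 * y),
        2 * x**2 * y - 2 * math.cos(x + 2 * y),
    )


def central_partials(func, x, y, h=1e-6):
    """Central-difference estimates of (func_x, func_y) at (x, y)."""
    return (
        (func(x + h, y) - func(x - h, y)) / (2 * h),
        (func(x, y + h) - func(x, y - h)) / (2 * h),
    )


print(partials_from_differential(0.8, -1.2), central_partials(f_composed, 0.8, -1.2))
```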
5.2 Directional derivative
Recall that the partial derivatives $f_x$ and $f_y$ of a function $f(x, y)$ at a point $(a, b)$ are the derivatives of the restrictions $f(x, b)$ and $f(a, y)$ onto the lines $y = b$ and $x = a$. In general, a derivative shows how rapidly a function increases or decreases. Thus partial derivatives indicate the growth rate of a function in directions parallel to the coordinate axes. Of course, it is interesting to find the growth rate in an arbitrary direction.
For example, suppose we are receiving a radio signal from a terrorist base and we can measure the strength of the signal at any point $(x, y)$, so it is a function $f(x, y)$. Where is the transmitter? Naturally, to find it we should look for the direction in which $f(x, y)$ increases most rapidly. This leads us to the definition of a directional derivative.
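Definition 5.6 Given a function $f : \mathbb{R}^m \to \mathbb{R}$, a point $\mathbf{a} \in \mathbb{R}^m$, and a unit vector $\mathbf{v} \in \mathbb{R}^m$ (meaning $\|\mathbf{v}\| = 1$) showing a direction, the directional derivative of the function $f$ at the point $\mathbf{a}$ in the direction $\mathbf{v}$ is
$$D_{\mathbf{v}} f(\mathbf{a}) = \left[f(\mathbf{a} + \mathbf{v}t)\right]'_{t=0} = \lim_{t \to 0} \frac{f(\mathbf{a} + \mathbf{v}t) - f(\mathbf{a})}{t}. \tag{5.3}$$
If the vector $\mathbf{v}$ is not necessarily a unit vector, then $D_{\mathbf{v}} f(\mathbf{a})$ is called the derivative along the vector $\mathbf{v}$.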
Here, $\mathbf{x} = \mathbf{a} + \mathbf{v}t$ is a parametric equation of the straight line passing through the point $\mathbf{a}$ and parallel to the vector $\mathbf{v}$. Thus the directional derivative shows how rapidly $f(\mathbf{x})$ increases or decreases from the point $\mathbf{a}$ in the direction of the vector $\mathbf{v}$.
For $f(x, y)$, a point $(a, b) \in \mathbb{R}^2$, and a vector $(u, v) \in \mathbb{R}^2$, we get
$$D_{(u,v)} f(a, b) = \left[\frac{d}{dt} f(a + ut, b + vt)\right]_{t=0}.$$
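Example 5.7 Let $f(x, y) = x^2 y$, $(a, b) = (2, 1)$, and let's take the direction parallel to the line $y = x$ such that $x$ and $y$ are increasing.
We need the vector $(u, v)$ to be a unit vector, so $u^2 + v^2 = 1$. Also, we know that $u = v$ because it is parallel to the line $y = x$, so $u = v = \pm\frac{1}{\sqrt{2}}$. Since $x$ and $y$ are supposed to be increasing, we take the positive sign, so $(u, v) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$.
Further,
$$f\left(2 + \frac{t}{\sqrt{2}}, 1 + \frac{t}{\sqrt{2}}\right) = \left(2 + \frac{t}{\sqrt{2}}\right)^2 \left(1 + \frac{t}{\sqrt{2}}\right) = \frac{t^3}{2\sqrt{2}} + \frac{5t^2}{2} + 4\sqrt{2}\,t + 4,$$
$$\frac{d}{dt} f\left(2 + \frac{t}{\sqrt{2}}, 1 + \frac{t}{\sqrt{2}}\right) = \frac{3t^2}{2\sqrt{2}} + 5t + 4\sqrt{2},$$
so the derivative at $t = 0$ is $4\sqrt{2}$.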
Example 5.8 Let $f(x, y) = \sqrt{|x(x + y)|}$ and let's find $D_{(-1,1)} f(0, 0)$ and $D_{(1,1)} f(0, 0)$.
We have $f(0 - t, 0 + t) = \sqrt{|-t(-t + t)|} = 0$, so $\frac{d}{dt} f(-t, t) = 0$ and $D_{(-1,1)} f(0, 0) = 0$. Further, $f(0 + t, 0 + t) = \sqrt{|t(t + t)|} = \sqrt{2t^2} = \sqrt{2}\,|t|$, which is not differentiable at $0$, and therefore $D_{(1,1)} f(0, 0)$ is not defined.
How do we find a directional derivative in general? It is easy for a differentiable function, since we can use the Chain Rule.
Theorem 5.9 Suppose that a function $f(x_1, \dots, x_m)$ is differentiable at $\mathbf{a} = (a_1, \dots, a_m) \in \mathbb{R}^m$ and let $\mathbf{v} = (v_1, \dots, v_m)$ be a vector. Then we have
$$D_{\mathbf{v}} f(\mathbf{a}) = \sum_{i=1}^{m} \frac{\partial f}{\partial x_i}(\mathbf{a})\, v_i. \tag{5.4}$$
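Proof Let $g(t) = f(\mathbf{a} + \mathbf{v}t)$, so $D_{\mathbf{v}} f(\mathbf{a}) = g'(0)$. Then $g(t) = f(\mathbf{l}(t))$, where
$$\mathbf{l}(t) = \mathbf{a} + \mathbf{v}t = \begin{pmatrix} a_1 + v_1 t \\ a_2 + v_2 t \\ \vdots \\ a_m + v_m t \end{pmatrix}, \qquad \mathbf{l} : \mathbb{R} \to \mathbb{R}^m.$$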
[Figure 5.1: $f(x, y) = \sqrt[3]{xy}$]
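By the Chain Rule, we have
$$g'(0) = f'(\mathbf{a})\,\mathbf{l}'(0) = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_m} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix},$$
which completes the proof. :)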
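Example 5.10 Let $f(x, y) = x^2 y$ and let's again find $D_{\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)} f(2, 1)$. We have $f_x = 2xy$, $f_x(2, 1) = 4$, $f_y = x^2$, $f_y(2, 1) = 4$. Applying the formula, we get
$$D_{\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)} f(2, 1) = \frac{4}{\sqrt{2}} + \frac{4}{\sqrt{2}} = 4\sqrt{2}.$$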
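Examples 5.7 and 5.10 compute the same number in two ways, from Definition 5.6 and from formula (5.4). The sketch below does both numerically; approximating $g'(0)$ by a symmetric difference quotient with step `h` is our illustrative choice, not the method of the text.

```python
import math


def f(x, y):
    return x**2 * y


a, b = 2.0, 1.0
u = v = 1 / math.sqrt(2)  # unit vector along y = x, both coordinates increasing


def g(t):
    """Restriction of f to the line (a + u t, b + v t)."""
    return f(a + u * t, b + v * t)


h = 1e-5
from_definition = (g(h) - g(-h)) / (2 * h)  # symmetric quotient approximating g'(0)
from_gradient = (2 * a * b) * u + (a**2) * v  # formula (5.4): f_x v_1 + f_y v_2
print(from_definition, from_gradient, 4 * math.sqrt(2))
```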
Example 5.11 Let's prove that $f(x, y) = \sqrt{|xy|}$ is not differentiable at $(0, 0)$. By Theorem 5.9, if a function is differentiable at a point, then its restriction to any straight line through that point is differentiable there. But for $y = x$, we have $f(x, x) = \sqrt{|x^2|} = |x|$, which is not differentiable at $0$.
Example 5.12 Let $f(x, y) = \sqrt[3]{xy}$ (see the graph in Figure 5.1). Then
$$f_x(0, 0) = \left[f(x, 0)\right]'_{x=0} = [0]'_{x=0} = 0 \quad \text{and} \quad f_y(0, 0) = \left[f(0, y)\right]'_{y=0} = [0]'_{y=0} = 0.$$
Is $f$ differentiable at $(0, 0)$? Let's consider $f|_{x=y}$. We have $f(x, x) = \sqrt[3]{x^2} = x^{2/3}$, so $\frac{d}{dx} f(x, x) = \frac{2}{3x^{1/3}}$ for $x \ne 0$, which is unbounded near $x = 0$; in fact, $x^{2/3}$ is not differentiable at $x = 0$. Therefore the function is not differentiable at $(0, 0)$.
[Figure 5.2: A non-differentiable function with all directional derivatives defined.]
Notice that (5.4) can also be written as
$$D_{\mathbf{v}} f(\mathbf{a}) = df_{\mathbf{a}}(\mathbf{v})$$
and as
$$D_{\mathbf{v}} f(\mathbf{a}) = (f_{x_1}, f_{x_2}, \dots, f_{x_m}) \cdot (v_1, v_2, \dots, v_m).$$
The following example shows that sometimes derivatives in all directions
exist, but the function is not differentiable.
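Example 5.13 Consider the function
$$f(x, y) = \begin{cases} \dfrac{x^2 y}{x^2 + y^2}, & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0), \end{cases}$$
whose graph is shown in Figure 5.2. Then we have $-|y| \le f(x, y) \le |y|$, and thus, by the Squeeze Theorem, $f \in C(0, 0)$. Further, $f(ut, vt) = \frac{u^2 v\,t}{u^2 + v^2}$, and thus
$$D_{(u,v)} f(0, 0) = \left[f(ut, vt)\right]'_{t=0} = \frac{u^2 v}{u^2 + v^2}.$$
In particular, $f_x(0, 0) = f_y(0, 0) = 0$, so Equation (5.4) would give $D_{(u,v)} f(0, 0) = 0$ for every vector $(u, v)$; but, for instance, $D_{(1,1)} f(0, 0) = \frac{1}{2}$. Thus Equation (5.4) does not hold, and therefore $f(x, y)$ is not differentiable at $(0, 0)$.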
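A small numerical illustration of Example 5.13: the difference quotient from Definition 5.6 at the origin matches $\frac{u^2 v}{u^2 + v^2}$ in every sampled direction, whereas formula (5.4) with $f_x(0,0) = f_y(0,0) = 0$ would predict $0$ everywhere. The step `t`, the sampled angles, and the helper name are illustrative choices.

```python
import math


def f(x, y):
    """The function of Example 5.13."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y)


def directional_quotient(u, v, t=1e-6):
    """(f(ut, vt) - f(0, 0)) / t, the quotient of Definition 5.6 at the origin."""
    return (f(u * t, v * t) - f(0.0, 0.0)) / t


for angle_deg in [0, 45, 90, 135]:
    phi = math.radians(angle_deg)
    u, v = math.cos(phi), math.sin(phi)
    exact = u * u * v / (u * u + v * v)  # D_(u,v) f(0,0) from Example 5.13
    # If f were differentiable, (5.4) with f_x(0,0) = f_y(0,0) = 0 would give 0 here.
    print(angle_deg, round(directional_quotient(u, v), 6), round(exact, 6))
```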
5.3 Partial differential equations
A partial differential equation, or just PDE, involves an unknown function and its partial derivatives. Many natural processes are described in terms of partial differential equations. For example, the wave equation
$$u_{tt} = c^2 u_{xx} \quad \text{or} \quad u_{tt} = c^2(u_{xx} + u_{yy}) \quad \text{or} \quad u_{tt} = c^2(u_{xx} + u_{yy} + u_{zz})$$
describes the propagation of waves such as sound, light, and water waves. The Laplace equation
$$u_{xx} + u_{yy} + u_{zz} = 0$$
is used in astronomy, electricity, and fluid dynamics. The Chain Rule can be applied to simplify a partial differential equation by some clever substitution.
Example 5.14 Consider the PDE $y f_x - x f_y = 0$ and let's substitute $u = x$ and $v = x^2 + y^2$. By the Chain Rule,
$$f_x = f_u u_x + f_v v_x = f_u + 2x f_v, \qquad f_y = f_u u_y + f_v v_y = 2y f_v. \tag{5.5}$$
Thus the equation is $y(f_u + 2x f_v) - x \cdot 2y f_v = 0$, that is, $y f_u = 0$, so $f_u = 0$ for $y \ne 0$. Therefore $f$ does not actually depend on $u$. Thus $f$ is a function of $v = x^2 + y^2$ only, so $f = \varphi(x^2 + y^2)$, where $\varphi : \mathbb{R} \to \mathbb{R}$. Here, $\varphi$ can be any differentiable function of one variable, so $f(x, y) = \ln(x^2 + y^2)$, $f = \cos\sqrt{x^2 + y^2}$, $f = e^{\tan(x^2 + y^2)}$, or whatever.
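A numerical spot check of Example 5.14 (a sketch, assuming central differences with $h = 10^{-6}$ are accurate enough): the residual $y f_x - x f_y$ is essentially zero for the three sample solutions $\varphi(x^2 + y^2)$ named in the text, but not for a function such as $x + y$, which is not of that form. The helper `residual` and the sample point are illustrative.

```python
import math


def residual(f, x, y, h=1e-6):
    """y*f_x - x*f_y at (x, y), with partials estimated by central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return y * fx - x * fy


solutions = {
    "ln(x^2 + y^2)": lambda x, y: math.log(x**2 + y**2),
    "cos(sqrt(x^2 + y^2))": lambda x, y: math.cos(math.hypot(x, y)),
    "exp(tan(x^2 + y^2))": lambda x, y: math.exp(math.tan(x**2 + y**2)),
}
for name, f in solutions.items():
    print(name, residual(f, 0.6, -0.3))

# A function not of the form phi(x^2 + y^2) leaves a nonzero residual (here y - x).
print("x + y", residual(lambda x, y: x + y, 0.6, -0.3))
```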
Example 5.15 Consider the PDE $f_x + f_y = 1$. Let's substitute $u = x + y$, $v = x - y$ to solve the equation. For the first-order partial derivatives we then have
$$u_x = 1, \qquad u_y = 1, \qquad v_x = 1, \qquad v_y = -1.$$
Thus,
$$f_x = f_u u_x + f_v v_x = f_u + f_v, \qquad f_y = f_u u_y + f_v v_y = f_u - f_v.$$
The equation becomes $f_u + f_v + f_u - f_v = 1$, that is, $f_u = \frac{1}{2}$. Thus $f - \frac{u}{2}$ does not depend on $u$ and depends only on $v$. Hence $f(x, y) - \frac{x + y}{2} = \varphi(x - y)$, that is,
$$f(x, y) = \frac{x + y}{2} + \varphi(x - y).$$
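The general solution of Example 5.15 can be spot-checked numerically for a few choices of $\varphi$: the sketch estimates $f_x + f_y$ by central differences and should print values close to $1$. The helpers `make_solution` and `lhs`, the sample point, and the choices of $\varphi$ are illustrative.

```python
import math


def make_solution(phi):
    """f(x, y) = (x + y)/2 + phi(x - y), the general solution of Example 5.15."""
    return lambda x, y: (x + y) / 2 + phi(x - y)


def lhs(f, x, y, h=1e-6):
    """f_x + f_y estimated by central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx + fy


for phi in (math.sin, math.exp, lambda s: s**3 - 2 * s):
    f = make_solution(phi)
    print(lhs(f, 0.4, 1.7))  # close to 1 for every choice of phi
```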
Example 5.16 The one-dimensional wave equation $f_{tt} = f_{xx}$ describes the vibration of a one-dimensional string if the shape of the string at each moment $t$ is given by the function $y = f(x, t)$. To solve the wave equation, one substitutes $u = x + t$ and $v = x - t$. We then get
$$f_x = f_u + f_v, \qquad f_t = f_u - f_v.$$
Applying the Chain Rule to the functions $f_x$ and $f_t$, we get
$$f_{xx} = f_{uu} + 2f_{uv} + f_{vv}, \qquad f_{tt} = f_{uu} - 2f_{uv} + f_{vv}.$$
Thus the wave equation becomes $-4f_{uv} = 0$, that is, $f_{uv} = 0$ in the coordinates $u, v$. Hence $f_u$ depends only on $u$, and therefore the general solution is $f = \varphi(u) + \psi(v) = \varphi(x + t) + \psi(x - t)$, where $\varphi, \psi : \mathbb{R} \to \mathbb{R}$.
In the theory of differential equations, one is often interested in the initial value problem, which means that we know what happens at $t = 0$ and need to find out how the solution evolves for $t > 0$.
Example 5.17 Assume that at the moment $t = 0$ the string has the sinusoidal form $y = \sin x$. Also, assume that initially the string is not moving, that is, $f_t = 0$ when $t = 0$.
Since we know the general solution $f(x, t) = \varphi(x + t) + \psi(x - t)$, we conclude that
$$\begin{cases} \varphi(x + t) + \psi(x - t) = \sin x & \text{at } t = 0, \\ \varphi'(x + t) - \psi'(x - t) = 0 & \text{at } t = 0, \end{cases}$$
that is,
$$\varphi(x) + \psi(x) = \sin x, \qquad \varphi'(x) - \psi'(x) = 0.$$
Therefore, $\varphi(x) - \psi(x) = C$ and
$$\varphi(x) = \frac{\sin x + C}{2}, \qquad \psi(x) = \frac{\sin x - C}{2}.$$
Finally, we get
$$f(x, t) = \frac{1}{2}\sin(x + t) + \frac{1}{2}\sin(x - t).$$
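A finite-difference sanity check of Example 5.17 (a sketch, not a proof): at a sample point it estimates $f_{tt} - f_{xx}$ with central second differences and checks the initial shape and the initial velocity. The helper `second_diff`, the step sizes, and the sample point are illustrative assumptions.

```python
import math


def f(x, t):
    """Solution of the initial value problem in Example 5.17."""
    return 0.5 * math.sin(x + t) + 0.5 * math.sin(x - t)


def second_diff(g, s, h=1e-4):
    """Central second difference (g(s+h) - 2g(s) + g(s-h)) / h^2."""
    return (g(s + h) - 2 * g(s) + g(s - h)) / h**2


x0, t0 = 0.9, 0.35
f_xx = second_diff(lambda x: f(x, t0), x0)
f_tt = second_diff(lambda t: f(x0, t), t0)
f_t_at_0 = (f(x0, 1e-6) - f(x0, -1e-6)) / 2e-6  # initial velocity at x0
print(f_tt - f_xx)  # wave equation residual, close to 0
print(f(x0, 0.0) - math.sin(x0))  # initial shape: should be 0
print(f_t_at_0)  # initial velocity: should be 0
```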
Although it is usually impossible to find the general solution of a PDE, sometimes one can solve the initial value problem without knowing the general solution.
5.4 Geometry of level curves and surfaces
Definition 5.18 The vector $\operatorname{grad} f = (f_{x_1}, f_{x_2}, \dots, f_{x_m})$ is called the gradient of the function $f(x_1, \dots, x_m)$.
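Thus (5.4) shows that $D_{\mathbf{v}} f = \operatorname{grad} f \cdot \mathbf{v}$. In particular, we have linearity with respect to $\mathbf{v}$, that is,
$$D_{c\mathbf{u} + d\mathbf{v}} f = c\,D_{\mathbf{u}} f + d\,D_{\mathbf{v}} f, \qquad c, d \in \mathbb{R}.$$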
Theorem 5.19 Let $f(x_1, \dots, x_m)$ be a function differentiable at $\mathbf{a} = (a_1, \dots, a_m) \in \mathbb{R}^m$ with $\operatorname{grad} f(\mathbf{a}) \ne \mathbf{0}$. Then
(i) $\operatorname{grad} f(\mathbf{a})$ indicates the direction in which the function increases most rapidly.
(ii) $-\operatorname{grad} f(\mathbf{a})$ indicates the direction in which the function decreases most rapidly.
(iii) For any direction $\mathbf{u}$ perpendicular to $\operatorname{grad} f(\mathbf{a})$, the derivative along $\mathbf{u}$ is zero, which means that, to first order, the function $f$ neither increases nor decreases along $\mathbf{u}$.
Proof For a unit vector v, the derivative D
v
f shows the growth rate in the
direction v. Since D
v
f = grad f v is the dot product, its maximal value over
all v is when v has the same direction as grad f , its minimal value is when v
has the opposite direction and it is zero when v is perpendicular to grad f . :)
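Here is a numerical illustration of the theorem. It assumes the hypothetical function $f(x, y) = xy + y^2$ and the point $a = (1, 1)$, neither taken from the notes. The sketch scans unit vectors at half-degree steps and estimates the growth rate in each direction by a central difference. It then compares the best and worst directions with $\pm\operatorname{grad} f(a)/|\operatorname{grad} f(a)|$ and checks that the rate vanishes along a perpendicular direction.

```python
import math

def f(x, y):
    # Hypothetical example function (not from the notes): f(x, y) = x*y + y^2.
    return x * y + y * y

def grad_f(x, y):
    # Partial derivatives by hand: f_x = y, f_y = x + 2y.
    return (y, x + 2 * y)

def rate(a, v, h=1e-6):
    """Central-difference growth rate of f at a in the unit direction v."""
    ahead = f(a[0] + h * v[0], a[1] + h * v[1])
    behind = f(a[0] - h * v[0], a[1] - h * v[1])
    return (ahead - behind) / (2 * h)

a = (1.0, 1.0)
n = 720  # unit vectors at half-degree steps
directions = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
rates = [rate(a, v) for v in directions]

steepest_ascent = directions[rates.index(max(rates))]
steepest_descent = directions[rates.index(min(rates))]

g = grad_f(*a)
g_norm = math.hypot(*g)
unit_grad = (g[0] / g_norm, g[1] / g_norm)
perpendicular = (-unit_grad[1], unit_grad[0])
```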
Given a function $f(x_1, \ldots, x_m)$, recall that its level surfaces are given by the equation $f(x_1, \ldots, x_m) = c$ for different values of the constant $c \in \mathbb{R}$. In other words, the restriction of the function onto any level surface is constant and therefore has zero derivatives. If we move along a level surface, the value of the function does not change. But "along a level surface" means "along a vector tangent to the level surface". Thus derivatives along vectors tangent to the level surface are zero, which means that these tangent vectors are perpendicular to $\operatorname{grad} f$. Hence the gradient vector is perpendicular to level surfaces at each point, as shown in Figure 5.3. Of course, this is not a rigorous explanation, but the statement is true. We'll prove it only in an easy case.
Theorem 5.20 Assume that a function $f : \mathbb{R}^m \to \mathbb{R}$ is differentiable everywhere. Then the vector field $\operatorname{grad} f$ is perpendicular to the level curves or level surfaces of the function $f$ at any point $x = a$.
Proof We will prove this theorem in the easiest case, when each level curve is the graph of some function $g : \mathbb{R} \to \mathbb{R}$. The general case can be deduced from it by applying the Implicit Function Theorem, which is not included in the course.
So assume that $f(x, y) = y - g(x)$. Level curves are given by
$$y = g(x) + c, \qquad c \in \mathbb{R}.$$
[Figure 5.3: Vector field $\operatorname{grad} f$ together with level curves of a differentiable function]
Then $\operatorname{grad} f = (-g'(x), 1)$. The tangent to a level curve at a point $x = a$ is given by the equation $y = g'(a)(x - a) + g(a) + c$, or $-g'(a)x + y - g(a) - c + a\,g'(a) = 0$. It is, obviously, perpendicular to the vector $(-g'(a), 1)$. :)
Corollary 5.21 The equation of the tangent space to a surface given by $f(x_1, \ldots, x_m) = c$ at a point $x_1 = a_1, \ldots, x_m = a_m$ is
$$f_{x_1}(a)(x_1 - a_1) + \cdots + f_{x_m}(a)(x_m - a_m) = 0.$$
Proof Generally, the equation of any hyperplane in $\mathbb{R}^m$ is
$$A_1x_1 + A_2x_2 + \cdots + A_mx_m + B = 0,$$
where the vector $(A_1, \ldots, A_m)$ is orthogonal to the hyperplane. A vector perpendicular to the tangent space is $\operatorname{grad} f(a)$. Therefore, the equation of the tangent space is
$$f_{x_1}(a)\,x_1 + \cdots + f_{x_m}(a)\,x_m + B = 0, \qquad (5.6)$$
which is the required equation if
$$B = -\sum_{i=1}^{m} f_{x_i}(a)\,a_i. \qquad (5.7)$$
Since the point $x = a$ belongs to the tangent space, we find $B$ by substituting $x_1 = a_1, \ldots, x_m = a_m$ into (5.6) and get the formula (5.7). :)
Example 5.22 Consider the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ and let's find the tangent plane at a point $(x_0, y_0, z_0)$ on it.
First, we have $f(x, y, z) = \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} - 1$; let's find $\operatorname{grad} f(x_0, y_0, z_0)$. We have
$$f_x(x_0, y_0, z_0) = \frac{2x_0}{a^2}, \qquad f_y(x_0, y_0, z_0) = \frac{2y_0}{b^2}, \qquad f_z(x_0, y_0, z_0) = \frac{2z_0}{c^2}.$$
Now the equation of the tangent plane is
$$\frac{2x_0}{a^2}(x - x_0) + \frac{2y_0}{b^2}(y - y_0) + \frac{2z_0}{c^2}(z - z_0) = 0,$$
that is (dividing by 2 and using $\frac{x_0^2}{a^2} + \frac{y_0^2}{b^2} + \frac{z_0^2}{c^2} = 1$),
$$\frac{x_0x}{a^2} + \frac{y_0y}{b^2} + \frac{z_0z}{c^2} = 1.$$
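Here is a numerical check of the tangent-plane formula, sketched with hypothetical semi-axes $a = 2$, $b = 3$, $c = 1$. The angle parametrization $(a\sin\theta\cos\phi,\ b\sin\theta\sin\phi,\ c\cos\theta)$ of the ellipsoid is an assumption of this sketch, not something used in the notes. Its finite-difference tangent vectors should be orthogonal to $\left(\frac{x_0}{a^2}, \frac{y_0}{b^2}, \frac{z_0}{c^2}\right) = \frac{1}{2}\operatorname{grad} f$. Points moved along them should stay on the plane.

```python
import math

A, B, C = 2.0, 3.0, 1.0  # hypothetical semi-axes a, b, c

def point(theta, phi):
    """A point of x^2/a^2 + y^2/b^2 + z^2/c^2 = 1, parametrized by two angles."""
    return (A * math.sin(theta) * math.cos(phi),
            B * math.sin(theta) * math.sin(phi),
            C * math.cos(theta))

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def plane_lhs(p0, p):
    """Left-hand side x0*x/a^2 + y0*y/b^2 + z0*z/c^2 of the tangent-plane equation."""
    return dot((p0[0] / A**2, p0[1] / B**2, p0[2] / C**2), p)

theta, phi, h = 0.9, 0.4, 1e-6
p0 = point(theta, phi)
normal = (p0[0] / A**2, p0[1] / B**2, p0[2] / C**2)  # half of grad f at p0

# Tangent vectors at p0: central differences of the parametrization.
t_theta = [(u - w) / (2 * h) for u, w in zip(point(theta + h, phi), point(theta - h, phi))]
t_phi = [(u - w) / (2 * h) for u, w in zip(point(theta, phi + h), point(theta, phi - h))]
```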
Assume that a function $f(x, y)$ is differentiable at a point $(a, b)$. If $(a, b)$ is a non-degenerate critical point, then we know what level curves look like in its neighbourhood. They are ellipses for a local maximum or minimum, and two lines with hyperbolae between them for a saddle point. But if $(a, b)$ is a regular (that is, not critical) point, what do they look like? Assume additionally that the partial derivatives $f_x$ and $f_y$ are continuous at $(a, b)$. Then let $u = f_x(a, b)$ and $v = f_y(a, b)$. We have $\operatorname{grad} f(a, b) = (u, v)$ and, since the partial derivatives are continuous, $\operatorname{grad} f \approx (u, v)$ in the whole neighbourhood of the point $(a, b)$. Thus level curves look approximately like equidistant parallel lines perpendicular to the vector $(u, v)$, as shown in Figure 5.3.
5.5 Exercises
Questions on understanding the lecture
Exercise 5.1 What is the differential of a linear function $l : \mathbb{R}^m \to \mathbb{R}$?
Exercise 5.2 Sketch level curves of the function $f(x, y) = 2x - 3y$ and the vector field $\operatorname{grad} f$.
Exercise 5.3 Consider the function $f(x, y) = x^2 - xy$ and the initial point $(2, 1)$. In which direction does the function increase most rapidly? Decrease most rapidly?
Exercise 5.4 Given a differentiable function $f : \mathbb{R}^m \to \mathbb{R}$, vectors $u, v \in \mathbb{R}^m$ and scalars $\alpha, \beta \in \mathbb{R}$, prove that $D_{\alpha u + \beta v} f(a) = \alpha D_u f(a) + \beta D_v f(a)$.
Questions on calculation
Exercise 5.5 Find an equation of the tangent plane to the surface
2 sin
x
2
y + z
+sin
y
2
x + z
+sin
z
2
x + y
=

3
at the point x = , y = 2, z = 2.
Exercise 5.6 Prove that the following surfaces are mutually perpendicular:
$$x^2 + y^2 + z^2 = r^2, \qquad y = x\tan\varphi, \qquad x^2 + y^2 = z^2\tan^2\psi.$$
Here $r > 0$, $\varphi \in [0, 2\pi]$, $\psi \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ are some parameters. Specifically, check that for any point $(a, b, c)$ where two of these surfaces intersect, the normals to the surfaces are perpendicular.
Exercise 5.7 Prove that any tangent plane to the surface $xyz = a^3$ ($a > 0$) and the coordinate axes form a tetrahedron of a constant volume.
Questions on logical thinking
Exercise 5.8 Given a function $f(x_1, \ldots, x_m)$ differentiable at a point $a = (a_1, \ldots, a_m)$, what is the maximal growth rate $D_v f(a)$ over all unit vectors $v$?
Exercise 5.9 By an appropriate linear substitution, find the general solution to the PDE $a f_x + b f_y = 0$, where $a$ and $b$ are some parameters such that $a^2 + b^2 > 0$.
Exercise 5.10 Substitute $u = x + 2y + 2$ and $v = x - y - 1$ into the PDE
$$2f_{xx} + f_{xy} - f_{yy} + f_x + f_y = 0.$$
Lecture 6
Constraint Optimization
6.1 Extreme values over a curve or surface
We know how to find extreme values of a function $f : \mathbb{R}^m \to \mathbb{R}$ defined everywhere or inside some region like $x^2 + y^2 + z^2 < 1$ or $x > 0$. We also know how to find out whether a critical point is a local maximum or a local minimum. But what if $f$ is defined on a curve or a surface?
Example 6.1 Imagine that an alien spaceship crashes on Earth and the area becomes radioactive. The radiation level at a point $(x, y)$ is $f(x, y) = 10 - x^2 + xy - y^2$, where the origin is set at the spaceship's site. Our duty is to observe the crash site from a distance of 1 km, so we need to find a point with minimal radiation $f(x, y)$ on the circle $x^2 + y^2 = 1$. One can see that the radiation takes its highest and lowest values at points where the circle $x^2 + y^2 = 1$ touches level curves of the function $f(x, y)$, as shown in Figure 6.1.
To say that two curves touch each other is the same as to say that their normal vectors are collinear. The normal direction to a level curve or a level surface of a function is indicated by the gradient vector. Thus we need to find out where the vectors $\operatorname{grad}(x^2 + y^2)$ and $\operatorname{grad} f(x, y)$ are collinear.
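We have $\operatorname{grad} f = (-2x + y, x - 2y)$ and $\operatorname{grad}(x^2 + y^2) = (2x, 2y)$. The condition of collinearity is $(-2x + y, x - 2y) = \lambda(2x, 2y)$ for some $\lambda \in \mathbb{R}$. In other words, we have the following equations:
$$\begin{cases} x^2 + y^2 = 1 \\ -2x + y = 2\lambda x \\ x - 2y = 2\lambda y \end{cases} \qquad (6.1)$$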
Here, $y = 2(\lambda + 1)x$ (2nd equation) and $x = (2\lambda + 2)y = 4(1 + \lambda)^2x$ (3rd equation). Therefore either $x = 0$ or $4(1 + \lambda)^2 = 1$.
If $x = 0$, then $y = \pm 1$ and it's easy to see that $(-2x + y, x - 2y) \nparallel (2x, 2y)$, so it's not a solution.
[Figure 6.1: Level curves of the function $f(x, y) = 10 - x^2 + xy - y^2$]
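If $4(1 + \lambda)^2 = 1$, then either $\lambda = -\frac{1}{2}$ or $\lambda = -\frac{3}{2}$. First, let $\lambda = -\frac{1}{2}$. Then $y = 2(\lambda + 1)x = x$ and $2x^2 = 1$, that is, $x = y = \pm\frac{1}{\sqrt{2}}$ and $f(x, y) = 9.5$.
Second, let $\lambda = -\frac{3}{2}$. Then $y = -x = \pm\frac{1}{\sqrt{2}}$ and $f(x, y) = 8.5$. The lowest level of the radiation is 8.5, and it is achieved at the points $\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and $\left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$.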
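The answer can be cross-checked without Lagrange multipliers. In this sketch, the radiation on the circle $(\cos t, \sin t)$ equals $9 + \frac{1}{2}\sin 2t$. Scanning $t$ on a fine grid should therefore reproduce the extreme levels 8.5 and 9.5, and the points with $x = -y$.

```python
import math

def radiation(x, y):
    """Radiation level from Example 6.1: f(x, y) = 10 - x^2 + x*y - y^2."""
    return 10 - x**2 + x * y - y**2

n = 100_000
samples = [(radiation(math.cos(t), math.sin(t)), t)
           for t in (2 * math.pi * k / n for k in range(n))]

f_min, t_min = min(samples)
f_max, t_max = max(samples)
x_min, y_min = math.cos(t_min), math.sin(t_min)
```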
Generally, we have a function $f : \mathbb{R}^m \to \mathbb{R}$ and a set $M \subset \mathbb{R}^m$ given by constraints
$$g_1(x) = g_2(x) = \cdots = g_k(x) = 0,$$
where $g_1, \ldots, g_k : \mathbb{R}^m \to \mathbb{R}$. The question is to maximize or minimize the function's value on $M$, that is, to find extreme values of $f|_M$. For instance, the problem we just considered has $m = 2$, $k = 1$, $f(x, y) = 10 - x^2 + xy - y^2$, $g(x, y) = x^2 + y^2 - 1$.
The case $k = 1$ is the easiest one, so let's first think about what happens when there is only one constraint $g$. Recall that the vector $\operatorname{grad} f$ indicates the direction of maximal growth rate of the function $f$. Suppose that $a \in M$ is a point of maximum of $f|_M$ and $f(a) = c$. If $\operatorname{grad} f(a) \nparallel \operatorname{grad} g(a)$, we can shift a little from $a$ along $\operatorname{grad} f$ to a higher level $f(x, y) = c_1 > c$. The new level curve $f(x, y) = c_1$ has to intersect $M$ somewhere, so the value of $f$ there will be greater than $c$. On the contrary, if $\operatorname{grad} f(a) \parallel \operatorname{grad} g(a)$, then after we shift along $\operatorname{grad} f$, the level curve $f(x, y) = c_1$ no longer intersects $M$ near $a$, as shown in Figure 6.2.
[Figure 6.2 (left panel: $\operatorname{grad} f(a) \nparallel \operatorname{grad} g(a)$; right panel: $\operatorname{grad} f(a) \parallel \operatorname{grad} g(a)$): Shift to a close level curve along the vector $\operatorname{grad} f$. The curve $M$ given by the constraint $g(x, y) = 0$ is highlighted with blue.]
Thus, in the case of one constraint, we can always write the condition $\operatorname{grad} f \parallel \operatorname{grad} g$ and solve this equation. What about the general situation, when $M$ is given by $k$ constraints $g_1(x) = g_2(x) = \cdots = g_k(x) = 0$ and we need to find the extreme values of $f|_M$? The geometry is the same: we need to find a certain equation meaning that $\operatorname{grad} f \perp M$.
Theorem 6.2 (Method of Lagrange multipliers) Assume that we have a function $f : \mathbb{R}^m \to \mathbb{R}$ and a set $M \subset \mathbb{R}^m$ given by constraints
$$g_1(x) = g_2(x) = \cdots = g_k(x) = 0$$
and suppose that $f$ takes a minimal or a maximal value under these constraints at a point $a \in M$. Then there is a vector $\lambda = (\lambda_1, \ldots, \lambda_k) \in \mathbb{R}^k$ such that $(a, \lambda) \in \mathbb{R}^{m+k}$ is a critical point of the Lagrangian function $L : \mathbb{R}^{m+k} \to \mathbb{R}$ given by
$$L(x_1, \ldots, x_m, \lambda_1, \ldots, \lambda_k) = f(x_1, \ldots, x_m) + \sum_{i=1}^{k} \lambda_i g_i(x_1, \ldots, x_m).$$
Proof We need to check that the equations on critical points
$$L_{x_1} = \cdots = L_{x_m} = L_{\lambda_1} = \cdots = L_{\lambda_k} = 0 \qquad (6.2)$$
mean that $\operatorname{grad} f \perp M$. Proving it for arbitrary $m$ and $k$ requires some nontrivial linear algebra, so we'll just do the cases $k = 1$ and $m = 3$, $k = 2$ here.
If there is only one constraint $g(x) = 0$, then there is only one direction perpendicular to $M$, and it is $\operatorname{grad} g$. We have then $L = f + \lambda g$, and we need to check that equations (6.2) mean that $\operatorname{grad} f \parallel \operatorname{grad} g$. Here, $L_{x_i} = f_{x_i} + \lambda g_{x_i} = 0$ for $i = 1, \ldots, m$ if and only if $f_{x_i} = -\lambda g_{x_i}$, so together they mean exactly that $\operatorname{grad} f = -\lambda \operatorname{grad} g$. The last equation $L_\lambda = 0$ is just $g(x) = 0$, which is the constraint, so the case $k = 1$ is done.
For $m = 3$, $k = 2$, we are given a function $f(x, y, z)$ and two constraints $g(x, y, z) = 0$ and $h(x, y, z) = 0$; the Lagrangian is
$$L(x, y, z, \lambda, \mu) = f(x, y, z) + \lambda g(x, y, z) + \mu h(x, y, z).$$
Now let's look at equations (6.2). Here, $L_\lambda = g(x, y, z) = 0$ and $L_\mu = h(x, y, z) = 0$ just tell us that $(x, y, z) \in M$. Further,
$$L_x = f_x + \lambda g_x + \mu h_x = 0, \qquad L_y = f_y + \lambda g_y + \mu h_y = 0, \qquad L_z = f_z + \lambda g_z + \mu h_z = 0$$
together express the fact that $\operatorname{grad} f = -\lambda \operatorname{grad} g - \mu \operatorname{grad} h$.
What does it mean exactly? Let $M_1$ be the set given by $g(x, y, z) = 0$ and let $M_2$ be given by $h(x, y, z) = 0$. Then $M = M_1 \cap M_2$. The tangent space to $M$ is the intersection of the tangent spaces to $M_1$ and $M_2$. Thus the normal space to $M$ is spanned by the normal spaces to $M_1$ and $M_2$. On the other hand, the normal space to $M_1$ is spanned by $\operatorname{grad} g$ and the normal space to $M_2$ is spanned by $\operatorname{grad} h$. So $-\lambda \operatorname{grad} g - \mu \operatorname{grad} h$ is an arbitrary vector belonging to the normal space to $M$. Thus the equation $\operatorname{grad} f = -\lambda \operatorname{grad} g - \mu \operatorname{grad} h$ means exactly that $\operatorname{grad} f$ is perpendicular to $M$. :)
Example 6.3 Let's apply this method to our first example, that is, find the minimal value of the function $f(x, y) = 10 - x^2 + xy - y^2$ subject to the constraint $g(x, y) = x^2 + y^2 - 1 = 0$. The Lagrangian is $L(x, y, \lambda) = 10 - x^2 + xy - y^2 + \lambda(x^2 + y^2 - 1)$ and the equations on critical points are
$$L_x = -2x + y + 2\lambda x = 0, \qquad L_y = x - 2y + 2\lambda y = 0, \qquad L_\lambda = x^2 + y^2 - 1 = 0,$$
so they are the same as (6.1) with $\lambda$ replaced by $-\lambda$.
Example 6.4 Let's find extreme values of the function $f(x, y) = xy$ subject to the constraint $x + 2y = 1$ (see Figure 6.3). Notice that if $x$ is large positive, then $y$ is large negative, and the product $xy$ is not bounded from below. On the other hand, $xy$ can be positive only if $x$ and $y$ have the same sign. Both cannot be negative since $x + 2y = 1$, so both are positive, and then $0 < x < 1$, $0 < y < \frac{1}{2}$, so $xy < \frac{1}{2}$. Thus the product $xy$ is bounded from above and we can find its maximal value.
The Lagrangian is $L(x, y, \lambda) = xy + \lambda(x + 2y - 1)$. The conditions for a critical point are
$$L_x = y + \lambda = 0, \qquad L_y = x + 2\lambda = 0, \qquad L_\lambda = x + 2y - 1 = 0,$$
so $\frac{x}{2} = y = -\lambda$. From the last equation we have $x = 2y = \frac{1}{2}$, so the point of maximum is $\left(\frac{1}{2}, \frac{1}{4}\right)$ and the maximal value is $\frac{1}{8}$.
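Here is a brute-force sketch of the same computation. It assumes the parametrization $y = t$, $x = 1 - 2t$ of the line, so that $xy = t - 2t^2$. Scanning $t$ recovers the maximum at $t = \frac{1}{4}$ and shows that $xy$ keeps decreasing for large $|t|$.

```python
def product_on_line(t):
    """Value of f(x, y) = x*y on the line x + 2y = 1, parametrized by y = t, x = 1 - 2t."""
    x, y = 1 - 2 * t, t
    return x * y

ts = [k / 10000 for k in range(-20000, 20001)]   # y from -2 to 2 in steps of 0.0001
best_t = max(ts, key=product_on_line)
best_value = product_on_line(best_t)
```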
[Figure 6.3: Level curves of the function $f(x, y) = xy$. The line $x + 2y = 1$ is highlighted with blue.]
[Figure 6.4: Ellipsoid is bounded; hyperboloids are not.]
In general, for a function $f(x_1, \ldots, x_m)$ and a set $M$ given by constraints, how do we check whether $f|_M$ is bounded? Suppose $f$ is continuous and the set $M$ is both closed (given by equations and/or non-strict inequalities) and bounded (located in some finite region of $\mathbb{R}^m$). Then $f|_M$ is bounded and attains its maximal and minimal values. For example, an ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ in $\mathbb{R}^2$ and an ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ in $\mathbb{R}^3$ are bounded. A hyperbola $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$ in $\mathbb{R}^2$ or the hyperboloids
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1 \quad \text{and} \quad \frac{x^2}{a^2} - \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$
in $\mathbb{R}^3$ are not bounded. For a bounded set $M$, everything is easy: we just apply the Lagrange multipliers method and find extreme values. If $M$ is not bounded, then we have to think carefully about whether a maximal value is defined, or a minimal one, or both, or neither.
Example 6.5 Consider the ellipse given as the intersection of the plane $x + y + z = 12$ with the paraboloid $z = x^2 + y^2$ (Figure 6.5), and let's find the point on it closest to the origin.
[Figure 6.5: Plane $x + y + z = 12$ and paraboloid $z = x^2 + y^2$.]
The distance to the origin is $\sqrt{x^2 + y^2 + z^2}$, but it's easier to minimize its square $f(x, y, z) = x^2 + y^2 + z^2$. The constraints are $g(x, y, z) = x + y + z - 12$ and $h(x, y, z) = x^2 + y^2 - z$, so
$$L(x, y, z, \lambda, \mu) = x^2 + y^2 + z^2 + \lambda(x + y + z - 12) + \mu(x^2 + y^2 - z).$$
The equations for critical points are
$$L_x = 2x + \lambda + 2\mu x = 0, \qquad L_y = 2y + \lambda + 2\mu y = 0, \qquad L_z = 2z + \lambda - \mu = 0,$$
$$L_\lambda = x + y + z - 12 = 0, \qquad L_\mu = x^2 + y^2 - z = 0.$$
Subtracting the second one from the first one, we get $2(1 + \mu)(x - y) = 0$, so either $\mu = -1$ or $x = y$.
First, let's consider the case $\mu = -1$. Then the equations are
$$L_x = \lambda = 0, \qquad L_y = \lambda = 0, \qquad L_z = 2z + \lambda + 1 = 0,$$
$$L_\lambda = x + y + z - 12 = 0, \qquad L_\mu = x^2 + y^2 - z = 0,$$
so immediately $\lambda = 0$ and $z = -\frac{1}{2}$. This is impossible, because the last equation gives $z \ge 0$.
Therefore we always have $x = y$. And we see that it's enough to use the last two equations, that is,
$$L_\lambda = 2x + z - 12 = 0, \qquad L_\mu = 2x^2 - z = 0.$$
Substituting $z = 2x^2$ into the first one, we obtain $2x + 2x^2 - 12 = 0$, so $x^2 + x - 6 = 0$ and $x = -3$ or $x = 2$. Thus there are two solutions, $(-3, -3, 18)$ and $(2, 2, 8)$.
Finally, $f(-3, -3, 18) = 342$ and $f(2, 2, 8) = 72$, so $(2, 2, 8)$ is the closest point to the origin.
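Here is an independent check, offered as a sketch. Substituting $z = x^2 + y^2$ into $x + y + z = 12$ gives $\left(x + \frac{1}{2}\right)^2 + \left(y + \frac{1}{2}\right)^2 = \frac{25}{2}$. So the ellipse projects onto a circle in the $xy$-plane and can be parametrized by an angle, with $z = 12 - x - y$. Scanning the angle locates the closest and the farthest points.

```python
import math

R = math.sqrt(12.5)  # radius of the circle (x + 1/2)^2 + (y + 1/2)^2 = 25/2

def on_ellipse(t):
    """Point of the plane-paraboloid intersection lying above the angle t."""
    x = -0.5 + R * math.cos(t)
    y = -0.5 + R * math.sin(t)
    z = 12 - x - y  # the plane x + y + z = 12
    return (x, y, z)

def sq_dist(p):
    return sum(c * c for c in p)

n = 100_000
angles = [2 * math.pi * k / n for k in range(n)]
closest = min((on_ellipse(t) for t in angles), key=sq_dist)
farthest = max((on_ellipse(t) for t in angles), key=sq_dist)
```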
6.2 Extreme values over a region with a boundary
How do we find extreme values of a function over some region with a boundary? We divide the region into parts (its interior and the pieces of its boundary), each of them given by some constraints, and apply the method separately to each part.
Example 6.6 Let's find the minimal and the maximal value of the function $f(x, y) = 4x^2y$ on the disk $x^2 + y^2 \le 3$.
First, we find critical points inside the disk, that is, critical points of the function $f$ itself. We have $f_x = 8xy = 0$ and $f_y = 4x^2 = 0$, so there is a whole line $x = 0$ consisting of critical points. Further, $f(0, y) = 0$, while $f$ takes positive values (for $y > 0$, $x \ne 0$) and negative values (for $y < 0$, $x \ne 0$) in the disk. So the points of the line $x = 0$ give neither the maximal nor the minimal value of $f(x, y)$ on the disk.
On the boundary circle $x^2 + y^2 = 3$, we have $L = 4x^2y + \lambda(x^2 + y^2 - 3)$, so
$$L_x = 8xy + 2\lambda x = 0, \qquad L_y = 4x^2 + 2\lambda y = 0, \qquad L_\lambda = x^2 + y^2 - 3 = 0.$$
Further, $x(4y + \lambda) = 0$ implies either $x = 0$ (which is already done above) or $4y + \lambda = 0$. Substituting $\lambda = -4y$ into the second equation, we get $4x^2 - 8y^2 = 0$, that is, $x = \pm\sqrt{2}\,y$. Using the third equation, we obtain $y = \pm 1$, so the solutions are $(\sqrt{2}, 1)$, $(-\sqrt{2}, 1)$, $(\sqrt{2}, -1)$, $(-\sqrt{2}, -1)$.
The maximal value is $8$, at $(\pm\sqrt{2}, 1)$; the minimal value is $-8$, at $(\pm\sqrt{2}, -1)$.
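Here is a brute-force check of Example 6.6, as a sketch. It evaluates $f$ on a polar grid covering the disk, boundary included, and records where the largest and smallest values occur. The grid resolution is an arbitrary choice, so the values are only accurate to a few decimals.

```python
import math

def f(x, y):
    return 4 * x**2 * y

r_max = math.sqrt(3)
best = (-math.inf, None)   # (largest value found, point)
worst = (math.inf, None)   # (smallest value found, point)
for i in range(301):                 # radii 0, r_max/300, ..., r_max (boundary included)
    r = r_max * i / 300
    for j in range(720):             # angles in half-degree steps
        t = 2 * math.pi * j / 720
        x, y = r * math.cos(t), r * math.sin(t)
        value = f(x, y)
        if value > best[0]:
            best = (value, (x, y))
        if value < worst[0]:
            worst = (value, (x, y))
```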
Example 6.7 Let's find the maximal and the minimal value of the function $f(x, y, z) = x + y + z$ when $x^2 + y^2 \le z \le 1$ (the region is shown in Figure 6.6).
The boundary of this region consists of two components, $z = x^2 + y^2$ and $z = 1$. The first of them is an elliptical paraboloid, the second one is a plane. Their intersection is the circle $x^2 + y^2 = z = 1$.
Thus we need to find critical points in four places:
- inside the volume, which means no constraint;
- on the paraboloid $z = x^2 + y^2$;
- on the plane $z = 1$;
- on the circle where the plane intersects the paraboloid, which means two constraints, $z = x^2 + y^2$ and $z = 1$.
With no constraint, we have $f_x = 1 \ne 0$, so there is no critical point.
For $z = x^2 + y^2$, we have $L(x, y, z, \lambda) = x + y + z + \lambda(z - x^2 - y^2)$. To find critical points, we solve the system
$$\begin{cases} L_x = 1 - 2\lambda x = 0 \\ L_y = 1 - 2\lambda y = 0 \\ L_z = 1 + \lambda = 0 \\ L_\lambda = z - x^2 - y^2 = 0 \end{cases}$$
From the third equation, we get $\lambda = -1$. The first and second equations then give $x = y = -\frac{1}{2}$, and the fourth equation implies $z = \frac{1}{2}$.
[Figure 6.6: Paraboloidal region $x^2 + y^2 \le z \le 1$.]
For $z = 1$, we have $L(x, y, z, \lambda) = x + y + z + \lambda(z - 1)$, so $L_x = 1 \ne 0$ and there is no critical point.
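Finally, for $z = 1$ and $z = x^2 + y^2$, we have $L(x, y, z, \lambda, \mu) = x + y + z + \lambda(z - 1) + \mu(z - x^2 - y^2)$. To find critical points, we solve the system
$$\begin{cases} L_x = 1 - 2\mu x = 0 \\ L_y = 1 - 2\mu y = 0 \\ L_z = 1 + \lambda + \mu = 0 \\ L_\lambda = z - 1 = 0 \\ L_\mu = z - x^2 - y^2 = 0 \end{cases}$$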
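The first two equations give $x = y$ (since $2\mu x = 2\mu y = 1$). Substituting $x = y$ and $z = 1$ into the fifth equation, we get $x = y = \pm\frac{1}{\sqrt{2}}$. Thus we get two answers, $\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1\right)$ and $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 1\right)$. The values of $f$ at the three critical points we found are
$$f\left(-\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}\right) = -\tfrac{1}{2}, \qquad f\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 1\right) = 1 + \sqrt{2}, \qquad f\left(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}, 1\right) = 1 - \sqrt{2},$$
so the minimum is $-\frac{1}{2}$ and the maximum is $1 + \sqrt{2}$.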
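Here is a brute-force check of Example 6.7. The sketch assumes that a grid with step $0.01$ in $x$ and $y$ is fine enough, together with five equally spaced $z$-levels between the paraboloid and the plane. The extreme values over the sample points should then be close to $-\frac{1}{2}$ and $1 + \sqrt{2} \approx 2.414$.

```python
def f(x, y, z):
    return x + y + z

n = 200  # grid step 0.01 in x and y
f_min, f_max = float("inf"), float("-inf")
for i in range(n + 1):
    x = -1 + 2 * i / n
    for j in range(n + 1):
        y = -1 + 2 * j / n
        low = x * x + y * y          # the paraboloid z = x^2 + y^2
        if low > 1:
            continue                 # no z with low <= z <= 1
        for k in range(5):           # z from the paraboloid up to the plane z = 1
            z = low + (1 - low) * k / 4
            value = f(x, y, z)
            f_min = min(f_min, value)
            f_max = max(f_max, value)
```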
6.3 Exercises
Questions on understanding the lecture
Exercise 6.1 (i) Suppose we need to use the method of Lagrange multipliers to find extreme values of a function $f(x, y)$ in a triangle-shaped area in $\mathbb{R}^2$. How many times would we have to run the method? How many variables does each Lagrange function have?
(ii) What about a function $f(x, y, z)$ and its extreme values over a cube-shaped volume in $\mathbb{R}^3$?
Questions on calculation
Exercise 6.2 Find the maximal and the minimal values of each of the following functions subject to the indicated constraints:
(a) $f(x, y, z) = \sin x \sin y \sin z$ for $x \ge 0$, $y \ge 0$, $z \ge 0$, $x + y + z = \frac{\pi}{2}$.
(b) $f(x, y, z) = xyz$, constraints $x^2 + y^2 + z^2 = 1$ and $x + y + z = 0$.
Exercise 6.3 Find the shortest distance between points on the parabola $y = x^2$ and on the line $x - y - 2 = 0$.
Questions on logical thinking
Exercise 6.4 An alien lady misses each of her $m$ husbands so much that her anguish is proportional to the square of the distance from her to a husband. In other words, the total anguish is $d_1^2 + \cdots + d_m^2$, where $d_i$ is the distance between the lady and the $i$th husband.
The lady lives on the surface of the planet, that is, the unit sphere $x^2 + y^2 + z^2 = 1$. The husbands work in space and/or underground, so the $i$th husband is staying at the point $(x_i, y_i, z_i)$. Where on the planet's surface should the lady stay to minimize the total anguish?
Exercise 6.5 Find extreme values of a quadratic form $q(x) = Ax \cdot x$ (where $A$ is an $m \times m$ symmetric matrix), $x \in \mathbb{R}^m$, on the $(m-1)$-sphere $x_1^2 + x_2^2 + \cdots + x_m^2 = 1$.
Lecture 7
Multiple Integral
7.1 Double integral over rectangular regions
Given real numbers $a < b$, the interval $[a, b]$ consists of $x$ such that $a \le x \le b$. A rectangle $[a, b] \times [c, d]$ consists of $(x, y) \in \mathbb{R}^2$ such that $a \le x \le b$ and $c \le y \le d$. A rectangular box (or parallelepiped) $[a_1, b_1] \times [a_2, b_2] \times [a_3, b_3]$ consists of $(x, y, z) \in \mathbb{R}^3$ satisfying $a_1 \le x \le b_1$, $a_2 \le y \le b_2$, and $a_3 \le z \le b_3$.
Recall the one-variable set-up. Let $f : [a, b] \to \mathbb{R}$ be continuous or have only finitely many points of discontinuity, and let $n \in \mathbb{N}$. The points $x_i = a + i\frac{b - a}{n}$, $i = 0, 1, \ldots, n$ (or, more generally, any points $a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$) form a partition of the interval $[a, b]$. Points $x_i^* \in [x_{i-1}, x_i]$, $i = 1, \ldots, n$, are sample points. For instance, $x_i^*$ can be taken to be $x_i$ or $\frac{x_{i-1} + x_i}{2}$. Then we put $\Delta x_i = x_i - x_{i-1}$ and $\Delta x = \max \Delta x_i$. The Riemann sum
$$\sum_{i=1}^{n} f(x_i^*)\,\Delta x_i$$
approximates the area under the graph of the function $f$ (if $f > 0$) (see Figure 7.1). The integral is
$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x_i.$$
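The definition of the Riemann sum translates directly into code. Below is a minimal sketch for a uniform partition. The name `riemann_sum` and the `rule` parameter are illustrative rather than taken from the notes; `rule` picks the sample point $x_i^*$ as the left end, right end or midpoint of $[x_{i-1}, x_i]$.

```python
import math

def riemann_sum(f, a, b, n, rule="midpoint"):
    """Riemann sum of f over [a, b] using n equal subintervals.

    The sample point in [x_{i-1}, x_i] is its left end, right end, or midpoint.
    """
    dx = (b - a) / n
    offset = {"left": 0.0, "right": 1.0, "midpoint": 0.5}[rule]
    # k runs over 0..n-1, so a + (k + offset) * dx lies in the (k+1)-th subinterval.
    return sum(f(a + (k + offset) * dx) for k in range(n)) * dx

# The integral of sin over [0, pi] equals 2; the sums approach it as n grows.
for n in (10, 100, 1000):
    print(n, [round(riemann_sum(math.sin, 0, math.pi, n, rule), 6)
              for rule in ("left", "right", "midpoint")])
```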
In a similar manner, let $f : [a, b] \times [c, d] \to \mathbb{R}$ be a bounded function that is continuous or has only finitely many curves of discontinuity. Then we have the following set-up:
(a) a partition $a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$ of the interval $[a, b]$ with $\Delta x_i = x_i - x_{i-1}$ and $\Delta x = \max(\Delta x_1, \ldots, \Delta x_n)$;
(b) a partition $c = y_0 < y_1 < \cdots < y_{n-1} < y_n = d$ of the interval $[c, d]$ with $\Delta y_j = y_j - y_{j-1}$ and $\Delta y = \max(\Delta y_1, \ldots, \Delta y_n)$;
[Figure 7.1: Riemann sum for a function $f(x)$]
(c) sample points $(x_i^*, y_j^*) \in [x_{i-1}, x_i] \times [y_{j-1}, y_j]$;
(d) a Riemann sum
$$\sum_{i,j=1}^{n} f(x_i^*, y_j^*)\,\Delta x_i\,\Delta y_j.$$
Definition 7.1 The double integral of the function $f$ on the rectangle $[a, b] \times [c, d]$ is
$$\iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy = \lim_{(\Delta x, \Delta y) \to (0, 0)} \sum_{i,j=1}^{n} f(x_i^*, y_j^*)\,\Delta x_i\,\Delta y_j.$$
Like the 1D Riemann integral, the double one satisfies all the usual properties:
Linearity Given two functions $f, g : [a, b] \times [c, d] \to \mathbb{R}$ and two real numbers $\alpha, \beta$, we have
$$\iint_{[a,b]\times[c,d]} \big(\alpha f(x, y) + \beta g(x, y)\big)\,dx\,dy = \alpha \iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy + \beta \iint_{[a,b]\times[c,d]} g(x, y)\,dx\,dy.$$
[Figure 7.2: Riemann sum of a function $f(x, y)$]
Additivity Given two points $b' \in [a, b]$ and $d' \in [c, d]$, we have
$$\iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy = \iint_{[a,b']\times[c,d]} f(x, y)\,dx\,dy + \iint_{[b',b]\times[c,d]} f(x, y)\,dx\,dy$$
and
$$\iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy = \iint_{[a,b]\times[c,d']} f(x, y)\,dx\,dy + \iint_{[a,b]\times[d',d]} f(x, y)\,dx\,dy.$$
Inequalities If $f(x, y) \le g(x, y)$ for any $(x, y) \in [a, b] \times [c, d]$, then
$$\iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy \le \iint_{[a,b]\times[c,d]} g(x, y)\,dx\,dy.$$
Geometric meaning If $f \ge 0$, then
$$\iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy$$
is the volume under the graph of the function $f(x, y)$ on the rectangle $[a, b] \times [c, d]$ (see Figure 7.2).
Example 7.2 Let us apply the definition to integrate the constant 1. Any Riemann sum equals
$$\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_{i-1})(y_j - y_{j-1}) = \sum_{i=1}^{n} (x_i - x_{i-1}) \sum_{j=1}^{n} (y_j - y_{j-1}) = (b - a)(d - c).$$
Thus $\iint_{[a,b]\times[c,d]} dx\,dy = (b - a)(d - c)$. More generally,
$$\iint_{[a,b]\times[c,d]} f(x)g(y)\,dx\,dy = \int_a^b f(x)\,dx \int_c^d g(y)\,dy.$$
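Here is a sketch of the two-dimensional Riemann sum, with midpoint sample points on a uniform $n \times n$ grid. The test functions are hypothetical. The first is the constant 1 on $[0, 2] \times [1, 4]$, with exact value 6. The second is $x^2\cos y$ on $[0, 1] \times [0, \pi/2]$, whose integral should be $\frac{1}{3} \cdot 1$ by the product formula.

```python
import math

def double_riemann_sum(F, a, b, c, d, n):
    """Riemann sum of F over [a, b] x [c, d]: n x n equal cells, midpoint sample points."""
    dx, dy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += F(x, y) * dx * dy
    return total

area_sum = double_riemann_sum(lambda x, y: 1.0, 0, 2, 1, 4, 50)
product_sum = double_riemann_sum(lambda x, y: x**2 * math.cos(y), 0, 1, 0, math.pi / 2, 200)
```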
On the other hand, we could do the same trick as with limits and derivatives. Consider $f(x, y)$ as a function of $x$ alone for each fixed value of $y$, or as a function of $y$ alone for each fixed value of $x$. Then $\int_a^b f(x, y)\,dx$ is a function of $y$, which in turn can be integrated.
Definition 7.3 Given a function $f : [a, b] \times [c, d] \to \mathbb{R}$,
$$\int_c^d \left(\int_a^b f(x, y)\,dx\right) dy = \int_c^d dy \int_a^b f(x, y)\,dx$$
and
$$\int_a^b \left(\int_c^d f(x, y)\,dy\right) dx = \int_a^b dx \int_c^d f(x, y)\,dy$$
are iterated integrals of the function $f$ on the rectangle $[a, b] \times [c, d]$.
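Example 7.4 Let's find $\int_0^1 \int_0^1 e^{x-y}\,dx\,dy$. First,
$$\int_0^1 e^{x-y}\,dx = \int_0^1 e^{x}e^{-y}\,dx = e^{-y}\int_0^1 e^{x}\,dx = \Big[e^{-y}e^{x}\Big]_0^1 = e^{-y}(e - 1).$$
Thus,
$$\int_0^1 \int_0^1 e^{x-y}\,dx\,dy = (e - 1)\int_0^1 e^{-y}\,dy = \Big[-(e - 1)e^{-y}\Big]_0^1 = (e - 1)(1 - e^{-1}).$$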
Theorem 7.5 (Fubini) Assume that $f : [a, b] \times [c, d] \to \mathbb{R}$ is a bounded function that is continuous or has finitely many curves of discontinuity. Then
$$\int_a^b dx \int_c^d f(x, y)\,dy = \int_c^d dy \int_a^b f(x, y)\,dx = \iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy.$$
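Idea of Proof From the definition, we know that
$$\iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy \approx \sum_{i=1}^{n}\sum_{j=1}^{n} f(x_i^*, y_j^*)\,\Delta x_i\,\Delta y_j,$$
where $\Delta x_i = x_i - x_{i-1}$ and $\Delta y_j = y_j - y_{j-1}$. By the definition of the usual integral,
$$\int_c^d f(x_i^*, y)\,dy \approx \sum_{j=1}^{n} f(x_i^*, y_j^*)\,\Delta y_j.$$
Thus,
$$\int_a^b dx \int_c^d f(x, y)\,dy \approx \sum_{i=1}^{n} \left(\int_c^d f(x_i^*, y)\,dy\right)\Delta x_i \approx \sum_{i=1}^{n}\sum_{j=1}^{n} f(x_i^*, y_j^*)\,\Delta x_i\,\Delta y_j.$$
The same argument with the roles of $x$ and $y$ exchanged gives the other iterated integral.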
Remark 7.6 It is crucial in this theorem that the function and the rectangle are bounded. The situation with improper integrals is more complicated.
Example 7.7 By Fubini's theorem,
$$\iint_{[0,1]\times[0,1]} e^{x-y}\,dx\,dy = \int_0^1 dy \int_0^1 e^{x-y}\,dx = e + \frac{1}{e} - 2.$$
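Fubini's theorem can be watched at work numerically. In the sketch below, the helper `midpoint` (a hypothetical name) applies the midpoint rule in one variable. Nesting it in both orders gives two approximations of the iterated integrals of $e^{x-y}$ over $[0, 1]^2$. They contain the same terms summed in a different order, and both approach $e + \frac{1}{e} - 2 \approx 1.086$.

```python
import math

def midpoint(g, a, b, n=400):
    """Midpoint rule for the integral of g over [a, b] with n equal pieces."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f(x, y):
    return math.exp(x - y)

# Inner integral in x with outer in y, and the other way round.
dy_outer = midpoint(lambda y: midpoint(lambda x: f(x, y), 0, 1), 0, 1)
dx_outer = midpoint(lambda x: midpoint(lambda y: f(x, y), 0, 1), 0, 1)
exact = math.e + 1 / math.e - 2
```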
7.2 Double integral over arbitrary regions
Average value is a natural interpretation of the integral. The average value of a function $f : [a, b] \times [c, d] \to \mathbb{R}$ is
$$\frac{1}{(b - a)(d - c)} \iint_{[a,b]\times[c,d]} f(x, y)\,dx\,dy.$$
Often one needs to know the average value over a non-rectangular region, for example over the disk $x^2 + y^2 \le 1$. Say $f(x, y) = 1 + \sin(x - y^2)$. Then, in order to find its integral over the disk, we introduce a new function as follows:
$$g(x, y) = \begin{cases} 1 + \sin(x - y^2), & x^2 + y^2 \le 1, \\ 0, & \text{otherwise} \end{cases}$$
[Figure 7.3: A function inside the square and inside the circle (panels: function on $[-1, 1] \times [-1, 1]$; function on $x^2 + y^2 \le 1$)]
and put
$$\iint_{x^2 + y^2 \le 1} f(x, y)\,dx\,dy = \int_{-1}^{1}\int_{-1}^{1} g(x, y)\,dx\,dy$$
(see Figure 7.3).
Definition 7.8 Given a set $D \subset \mathbb{R}^2$, its characteristic function $\chi_D : \mathbb{R}^2 \to \mathbb{R}$ is
$$\chi_D(x, y) = \begin{cases} 1, & (x, y) \in D, \\ 0, & \text{otherwise.} \end{cases}$$
Definition 7.9 If $f : [a, b] \times [c, d] \to \mathbb{R}$ is some function and $D \subset [a, b] \times [c, d]$, put
$$\iint_D f(x, y)\,dx\,dy = \iint_{[a,b]\times[c,d]} f(x, y)\,\chi_D(x, y)\,dx\,dy.$$
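Definition 7.9 suggests a direct, if crude, way to approximate integrals over a region: multiply by $\chi_D$ and sum over a grid on the enclosing rectangle. The sketch below does this for the unit disk inside $[-1, 1]^2$. The test integrands 1 and $x^2 + y^2$ (exact values $\pi$ and $\frac{\pi}{2}$) are hypothetical choices. Because $\chi_D$ jumps along the circle, the error shrinks more slowly than for a smooth integrand.

```python
import math

def chi_disk(x, y):
    """Characteristic function of the unit disk D = {x^2 + y^2 <= 1}."""
    return 1.0 if x * x + y * y <= 1 else 0.0

def integral_over_region(f, chi, a, b, c, d, n=400):
    """Midpoint sum of f * chi over [a, b] x [c, d], following Definition 7.9."""
    dx, dy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += f(x, y) * chi(x, y)
    return total * dx * dy

area = integral_over_region(lambda x, y: 1.0, chi_disk, -1, 1, -1, 1)
moment = integral_over_region(lambda x, y: x * x + y * y, chi_disk, -1, 1, -1, 1)
```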
Applying Fubini's theorem to double integrals over non-rectangular areas, we obtain the following statements.
Corollary 7.10 Assume that the area of integration $D$ is given by $a \le x \le b$, $f_1(x) \le y \le f_2(x)$, where $f_1$ and $f_2$ are some continuous functions of $x$ (see Figure 7.4). Then
$$\iint_D f(x, y)\,dx\,dy = \int_a^b dx \int_{f_1(x)}^{f_2(x)} f(x, y)\,dy.$$
[Figure 7.4: Region given by $a \le x \le b$, $f_1(x) \le y \le f_2(x)$, bounded by the lines $x = a$, $x = b$ and the curves $y = f_1(x)$, $y = f_2(x)$]
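Example 7.11 Let the area $D$ be given by $0 \le x \le 1$, $0 \le y \le \sqrt{1 - x}$, and let us evaluate $\iint_D x\,dx\,dy$.
By Fubini's Theorem,
$$\iint_D x\,dx\,dy = \int_0^1 dx \int_0^{\sqrt{1-x}} x\,dy = \int_0^1 x\,dx \int_0^{\sqrt{1-x}} 1\,dy = \int_0^1 x\sqrt{1 - x}\,dx.$$
Substituting
$$u = 1 - x, \qquad x = 1 - u, \qquad dx = -du,$$
we obtain
$$-\int_1^0 (1 - u)\sqrt{u}\,du = \int_0^1 \left(u^{1/2} - u^{3/2}\right) du = \frac{4}{15}.$$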
[Figure 7.5: Region given by $c \le y \le d$, $g_1(y) \le x \le g_2(y)$, bounded by the lines $y = c$, $y = d$ and the curves $x = g_1(y)$, $x = g_2(y)$]
Corollary 7.12 Assume that the area of integration $D$ is given by $c \le y \le d$, $g_1(y) \le x \le g_2(y)$, where $g_1$ and $g_2$ are some continuous functions of $y$ (see Figure 7.5). Then
$$\iint_D f(x, y)\,dx\,dy = \int_c^d dy \int_{g_1(y)}^{g_2(y)} f(x, y)\,dx.$$
Example 7.13 Consider $D \subset \mathbb{R}^2$ given by $0 \le y \le 1$, $0 \le x \le 1 - y^2$, so it is the same region as in Example 7.11. Let us find the same integral $\iint_D x\,dx\,dy$.
By Fubini's Theorem,
$$\iint_D x\,dx\,dy = \int_0^1 dy \int_0^{1-y^2} x\,dx = \frac{1}{2}\int_0^1 (1 - y^2)^2\,dy = \frac{1}{2}\int_0^1 (1 - 2y^2 + y^4)\,dy = \frac{4}{15}.$$
We got the result with less effort: no roots, no substitutions, just integrating a polynomial.
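Both orders of integration from Examples 7.11 and 7.13 can be checked numerically. The sketch nests a hypothetical `midpoint` helper once per variable. The order of Example 7.11 converges more slowly because its outer integrand $x\sqrt{1 - x}$ has an infinite slope at $x = 1$.

```python
import math

def midpoint(g, a, b, n=400):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Example 7.11: outer x in [0, 1], inner y in [0, sqrt(1 - x)], integrand x.
as_in_7_11 = midpoint(lambda x: midpoint(lambda y: x, 0, math.sqrt(1 - x)), 0, 1)
# Example 7.13: outer y in [0, 1], inner x in [0, 1 - y^2], integrand x.
as_in_7_13 = midpoint(lambda y: midpoint(lambda x: x, 0, 1 - y * y), 0, 1)
```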
[Figure 7.6: A region divided into simple pieces]
Thus, in order to calculate a double integral over $D \subset \mathbb{R}^2$, we divide $D$ into pieces $D_1, \ldots, D_k$. Each piece is either of the kind $a \le x \le b$, $g_1(x) \le y \le g_2(x)$, or of the kind $c \le y \le d$, $h_1(y) \le x \le h_2(y)$, as shown in Figure 7.6. Then we use additivity:
$$\iint_D f(x, y)\,dx\,dy = \iint_{D_1} f(x, y)\,dx\,dy + \cdots + \iint_{D_k} f(x, y)\,dx\,dy.$$
Example 7.14 Given the double integral
$$\int_0^3 dx \int_{\sqrt{x/3}}^{1} e^{y^3}\,dy,$$
draw the region of integration, reverse the order of integration, and evaluate the integral. Starting with $y$, we see that $0 \le y \le 1$, and that the condition $y \ge \sqrt{x/3}$ means $0 \le x \le 3y^2$. Thus the integral is
$$\int_0^3 dx \int_{\sqrt{x/3}}^{1} e^{y^3}\,dy = \int_0^1 dy \int_0^{3y^2} e^{y^3}\,dx = \int_0^1 3y^2 e^{y^3}\,dy = \Big[e^{y^3}\Big]_0^1 = e - 1.$$
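In the original order, the inner integral $\int e^{y^3}\,dy$ has no elementary antiderivative, but it is easy to approximate numerically. The sketch below uses a hypothetical `midpoint` helper to evaluate both orders and compare them with $e - 1$.

```python
import math

def midpoint(g, a, b, n=400):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Original order: outer x in [0, 3], inner y in [sqrt(x/3), 1].
original = midpoint(lambda x: midpoint(lambda y: math.exp(y**3), math.sqrt(x / 3), 1), 0, 3)
# Reversed order: outer y in [0, 1], inner x in [0, 3y^2]; the integrand does not depend on x.
reversed_order = midpoint(lambda y: midpoint(lambda x: math.exp(y**3), 0, 3 * y * y), 0, 1)
exact = math.e - 1
```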
Definition 7.15 Given a function $f : D \to \mathbb{R}$, where $D \subset \mathbb{R}^2$ is some region in the plane, the average value of the function $f$ in $D$ is
$$\frac{1}{\operatorname{Area}(D)} \iint_D f(x, y)\,dx\,dy.$$
Now a question arises: what is the area of a region? Naturally, we want the average value of the constant 1 to equal 1, so we have to define the area as follows.
Definition 7.16 Given a region $D \subset \mathbb{R}^2$, its area is $\iint_D dx\,dy$.
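Example 7.17 Let's find the average value of the function $f(x, y) = y$ in the region given by $0 \le y \le 1 - x^2$. First,
$$\operatorname{Area}(D) = \iint_{0 \le y \le 1 - x^2} dx\,dy = \int_{-1}^{1} dx \int_0^{1-x^2} dy = \int_{-1}^{1} (1 - x^2)\,dx = \frac{4}{3}.$$
Second,
$$\iint_{0 \le y \le 1 - x^2} y\,dx\,dy = \int_{-1}^{1} dx \int_0^{1-x^2} y\,dy = \int_{-1}^{1} \frac{(1 - x^2)^2}{2}\,dx = \frac{8}{15}.$$
Thus the average value is
$$\frac{8/15}{4/3} = \frac{2}{5}.$$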
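Here is a numerical sketch of Example 7.17, again using nested midpoint sums with a hypothetical helper name. The area, the integral of $y$, and their quotient should come out close to $\frac{4}{3}$, $\frac{8}{15}$ and $\frac{2}{5}$.

```python
def midpoint(g, a, b, n=400):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# D = {-1 <= x <= 1, 0 <= y <= 1 - x^2}, integrated as in Example 7.17 (dy inside, dx outside).
area = midpoint(lambda x: midpoint(lambda y: 1.0, 0, 1 - x * x), -1, 1)
integral_y = midpoint(lambda x: midpoint(lambda y: y, 0, 1 - x * x), -1, 1)
average = integral_y / area
```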
7.3 Triple integrals
In a similar manner, a triple integral can be defined. Given a function $f : [a_1, b_1] \times [a_2, b_2] \times [a_3, b_3] \to \mathbb{R}$, its triple integral is
$$\iiint_{[a_1,b_1]\times[a_2,b_2]\times[a_3,b_3]} f(x, y, z)\,dx\,dy\,dz,$$
and the average value is
$$\frac{1}{(b_1 - a_1)(b_2 - a_2)(b_3 - a_3)} \iiint_{[a_1,b_1]\times[a_2,b_2]\times[a_3,b_3]} f(x, y, z)\,dx\,dy\,dz.$$
As before, Fubini's theorem applied to triple integrals gives us the following.
Corollary 7.18 Assume that a 3D region $V$ is given by
$$a \le x \le b, \qquad g_1(x) \le y \le g_2(x), \qquad h_1(x, y) \le z \le h_2(x, y).$$
Then
$$\iiint_V f(x, y, z)\,dx\,dy\,dz = \int_a^b dx \int_{g_1(x)}^{g_2(x)} dy \int_{h_1(x,y)}^{h_2(x,y)} f(x, y, z)\,dz.$$
[Figure 7.7: 3D region $x^2 + y^2 \le z \le 1$, $x \ge 0$]
Example 7.19 Let's find the integral
$$\iiint_V dx\,dy\,dz,$$
where $V$ is given by $x^2 + y^2 \le z \le 1$, $x \ge 0$.
First, for $x$, we have $x \ge 0$. Also, $x^2 + y^2 \le 1$, so the maximal possible $x$ is 1 and the limits for $x$ are $0 \le x \le 1$. To find the limits for $y$, we need to look at our region from the top, that is, from the direction of the $z$-axis, so that we see only $x$ and $y$ (Figure 7.7). The tricky part is to understand that we are going to see the half-disk located in the plane $z = 1$ given by
$$-\sqrt{1 - x^2} \le y \le \sqrt{1 - x^2}.$$
Finally, the limits for $z$ are given to be $x^2 + y^2 \le z \le 1$. Thus,
$$\iiint_V dx\,dy\,dz = \int_0^1 dx \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy \int_{x^2+y^2}^{1} dz$$
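and therefore we get
$$\int_0^1 dx \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy \int_{x^2+y^2}^{1} dz = \int_0^1 dx \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} (1 - x^2 - y^2)\,dy$$
$$= \int_0^1 \left[(1 - x^2)y - \frac{y^3}{3}\right]_{y=-\sqrt{1-x^2}}^{y=\sqrt{1-x^2}} dx = 2\int_0^1 \left((1 - x^2)\sqrt{1 - x^2} - \frac{\left(\sqrt{1 - x^2}\right)^3}{3}\right) dx = \frac{4}{3}\int_0^1 \left(1 - x^2\right)^{3/2} dx.$$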
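Substituting
$$x = \sin t, \quad t = \arcsin x, \quad dx = \cos t\,dt, \quad 1 - x^2 = \cos^2 t,$$
we obtain
$$\frac{4}{3}\int_0^{\pi/2} \cos^4 t\,dt = \frac{1}{3}\int_0^{\pi/2} (1 + \cos 2t)^2\,dt = \frac{1}{3}\int_0^{\pi/2} \left(1 + 2\cos 2t + \cos^2 2t\right) dt$$
$$= \frac{\pi}{6} + \frac{1}{3}\int_0^{\pi/2} \frac{1 + \cos 4t}{2}\,dt = \frac{\pi}{4}.$$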
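The value $\frac{\pi}{4}$ can be confirmed with three nested midpoint sums that follow the limits of Example 7.19. The sketch uses a hypothetical `midpoint` helper. The innermost integrand is the constant 1, so its sum is just the length $1 - x^2 - y^2$ of the $z$-interval.

```python
import math

def midpoint(g, a, b, n=300):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# V = {x^2 + y^2 <= z <= 1, x >= 0} with the limits of Example 7.19.
def inner_z(x, y):
    return midpoint(lambda z: 1.0, x * x + y * y, 1, n=4)

def inner_y(x):
    w = math.sqrt(1 - x * x)
    return midpoint(lambda y: inner_z(x, y), -w, w)

volume = midpoint(inner_y, 0, 1)
```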
7.4 Exercises
Questions on understanding the lecture
Exercise 7.1 Given a set $D \subset \mathbb{R}^2$, let $\chi_D$ be its characteristic function. What is the domain of $\chi_D$?
Exercise 7.2 What is $\chi_D^2$?
Exercise 7.3 Let $D \subset \mathbb{R}^2$ be the annulus given by the inequality $1 \le x^2 + y^2 \le 4$. By definition,
$$\iint_D \frac{dx\,dy}{x^2 + y^2} = \int_{-2}^{2}\int_{-2}^{2} \frac{\chi_D(x, y)}{x^2 + y^2}\,dx\,dy.$$
But the function $\frac{1}{x^2 + y^2}$ is not defined at $(0, 0)$, so the right integral seems not to be defined. Yet the left integral should be fine, since the bad point $(0, 0)$ does not belong to $D$. How should we define the integral $\iint_D \frac{dx\,dy}{x^2 + y^2}$ in this situation?
Exercise 7.4 State the definition of the volume of a 3D solid. What is the average value of a function $f(x, y, z)$ over a 3D region? Prove that the average value of a constant $C$ is $C$.
Questions on calculation
Exercise 7.5 For the following double integrals, draw the region of integration, change the order of integration, and evaluate them:
(i) $\int_0^2 dx \int_x^2 2e^{y^2}\,dy$
(ii) $\int_0^1 dy \int_y^1 3xe^{x^3}\,dx$
(iii) $\int_0^1 dx \int_{\sqrt{x}}^1 \frac{3}{4 + y^3}\,dy$
Exercise 7.6 Compute the volume of the solid bounded by the surfaces $x = y^2$, $x = 4$, $x + z = 6$, and $x + z = 8$.
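Exercise 7.7 For the triple integral $\int_0^2 dz \int_0^{\sqrt{4 - z^2}} dy \int_{\sqrt{y^2 + z^2}}^{2} dx$, sketch the solid whose volume is taken and rewrite the integral as $\int_{\square}^{\square} dy \int_{\square}^{\square} dx \int_{\square}^{\square} dz$.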
Exercise 7.8 Evaluate the triple integral $\iiint_V xy^2z^3\,dx\,dy\,dz$, where the 3D region $V$ is bounded by the surfaces $z = xy$, $y = x$, $x = 1$, $z = 0$.
Questions on logical thinking
Exercise 7.9 Consider a rectangle $[a, b] \times [c, d]$. Assume that it can be partitioned into smaller rectangles as
$$[a, b] \times [c, d] = \bigcup_{i=1}^{n} [a_i, b_i] \times [c_i, d_i],$$
where $\left((a_i, b_i) \times (c_i, d_i)\right) \cap \left((a_j, b_j) \times (c_j, d_j)\right) = \emptyset$ for $i \ne j$. Assume also that for each $i = 1, \ldots, n$ we either have $b_i - a_i \in \mathbb{Z}$ or $d_i - c_i \in \mathbb{Z}$. Prove that either $b - a \in \mathbb{Z}$ or $d - c \in \mathbb{Z}$.
Hint: think of a function $f(x)$ such that $\int_a^b f(x)\,dx = 0$ if and only if $b - a \in \mathbb{Z}$.
Lecture 8
Change of variables
8.1 Substitution in single integral
Recall that if $g : [\alpha, \beta] \to [a, b]$ is a bijection having continuous derivative, then for any continuous function $f : [a, b] \to \mathbb{R}$, we have
$$\int_a^b f(x)\,dx = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(t))\,g'(t)\,dt.$$
Example 8.1 Substituting $x = \sin t$, so that $dx = \cos t\,dt$, we get
$$\int_0^1 \sqrt{1 - x^2}\,dx = \int_0^{\pi/2} \cos^2 t\,dt.$$
Why do we multiply by $g'(t)$ here? Consider a Riemann sum for $f(x)$, that is,
$$\sum_{i=1}^{n} f(x_i^*)\,(x_i - x_{i-1}).$$
If now $x$ is, in turn, a continuously differentiable function of $t$, then by the Mean Value Theorem,
$$x_i - x_{i-1} = x'(t_i^*)\,(t_i - t_{i-1}), \qquad t_i^* \in [t_{i-1}, t_i].$$
Since the choice of sample points doesn't matter, we choose $x_i^* = x(t_i^*)$ and then obtain
$$\sum_{i=1}^{n} f(x_i^*)\,(x_i - x_{i-1}) = \sum_{i=1}^{n} f(x(t_i^*))\,x'(t_i^*)\,(t_i - t_{i-1}).$$
A small interval $[t, t + \Delta t] \subset [\alpha, \beta]$ is stretched $x'(t^*)$ times by the function $x(t)$ for some $t^* \in [t, t + \Delta t]$. But if $x'$ is continuous, then $x'(t^*) \approx x'(t)$, so the stretching factor is $x'(t)$.
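Here is a numerical sketch of this argument, using Example 8.1 with $x(t) = \sin t$. The illustration is ours; the function names and the midpoint choice of $t_i^*$ are assumptions. A uniform partition in $t$ induces a non-uniform partition $x_i = \sin t_i$ of $[0, 1]$. With sample points $x_i^* = x(t_i^*)$, two sums both approach $\pi/4$: the Riemann sum in $x$, and the sum in $t$ with stretching factor $x'(t) = \cos t$.

```python
import math

def f(x):
    # Integrand of Example 8.1; max() guards against tiny negative rounding
    return math.sqrt(max(0.0, 1.0 - x * x))

def sums(n):
    """Return (Riemann sum in x, transformed sum in t) for a uniform partition of [0, pi/2] in t."""
    ts = [i * (math.pi / 2) / n for i in range(n + 1)]
    xs = [math.sin(t) for t in ts]           # induced (non-uniform) partition of [0, 1]
    sum_x = sum_t = 0.0
    for i in range(1, n + 1):
        t_star = 0.5 * (ts[i - 1] + ts[i])   # sample point in [t_{i-1}, t_i]
        x_star = math.sin(t_star)            # corresponding sample point x_i^* = x(t_i^*)
        sum_x += f(x_star) * (xs[i] - xs[i - 1])
        sum_t += f(x_star) * math.cos(t_star) * (ts[i] - ts[i - 1])  # x'(t) = cos t
    return sum_x, sum_t

s_x, s_t = sums(10_000)
print(s_x, s_t, math.pi / 4)
```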
Figure 8.1: Level curves of polar coordinates $r$ and $\theta$.
8.2 Substitution in multiple integrals
Consider now a double integral
$$\iint_{[a,b] \times [c,d]} f(x, y)\,dx\,dy$$
and let's think about what happens if we substitute $x = x(u, v)$, $y = y(u, v)$, where $T : (u, v) \mapsto (x, y)$ is some bijective map.
Recall that a Riemann sum for the double integral is
$$\sum_{i,j=1}^{n} f\left(x_i^*, y_j^*\right) \Delta x_i\,\Delta y_j, \qquad \Delta x_i = x_i - x_{i-1}, \quad \Delta y_j = y_j - y_{j-1}.$$
The map $T$ transforms a small rectangle $[u, u + \Delta u] \times [v, v + \Delta v]$ into some curved figure in the $x, y$-plane.
Example 8.2 Consider the plane $\mathbb{R}^2_{r,\theta}$ and the map $T : \mathbb{R}^2_{r,\theta} \to \mathbb{R}^2_{x,y}$ given by polar coordinates $x = r\cos\theta$, $y = r\sin\theta$. A rectangular grid formed by lines $r = \mathrm{const}$ and $\theta = \mathrm{const}$ is transformed by $T$ into a non-rectangular grid formed by lines and circles, as shown in Figure 8.1.
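Notice that in general $T([u, u + \Delta u] \times [v, v + \Delta v])$ is not a rectangle. For small $\Delta u$ and $\Delta v$, however, it is close to the parallelogram obtained by applying the linear approximation of $T$, so we neglect the difference. Similarly to the 1D case, we need to find out how much $[u, u + \Delta u] \times [v, v + \Delta v]$ is stretched by the map $T$. Applying differentials, we see that $\Delta x \approx x_u \Delta u + x_v \Delta v$ and $\Delta y \approx y_u \Delta u + y_v \Delta v$. In other words,
$$\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} \approx \begin{pmatrix} x_u & x_v \\ y_u & y_v \end{pmatrix} \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix} \tag{8.1}$$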
and the question is: how does the linear map (8.1) change the area of rectangles in the plane $\mathbb{R}^2_{u,v}$? The answer is known from linear algebra. A linear transformation $\mathcal{A} : \mathbb{R}^n \to \mathbb{R}^n$ with matrix $A$ stretches or squeezes all areas by the factor $|\det A|$. Therefore the factor in the formula for a substitution in a multiple integral must be $|\det T'|$, where $T'$ is the Jacobian matrix of the substitution. Generally, the following statement holds:
Theorem 8.3 Given two regions $E \subset \mathbb{R}^m$ and $D \subset \mathbb{R}^m$, assume that $T : E \to D$ is a map having continuous partial derivatives. Besides, suppose that $T$ is bijective on the interiors of the regions $E$ and $D$ (on the boundaries it might fail to be). Then for any continuous function $f : D \to \mathbb{R}$, we have
$$\int_D f(x)\,dx = \int_E f(T(u))\,|J(u)|\,du,$$
where $J = \det T'$.
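The linear-algebra fact behind Theorem 8.3 can be illustrated with a small sketch. It is ours, and `shoelace` and `image_of_unit_square` are hypothetical helper names. A $2 \times 2$ matrix $A$ maps the unit square to a parallelogram. The area of that parallelogram, computed with the shoelace formula, equals $|\det A|$ whatever the sign of $\det A$.

```python
def shoelace(pts):
    """Area of a simple polygon given by its vertices in order (shoelace formula)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def image_of_unit_square(A):
    """Apply the 2x2 matrix A (a list of rows) to the corners of [0,1] x [0,1]."""
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [(A[0][0] * u + A[0][1] * v, A[1][0] * u + A[1][1] * v) for u, v in corners]

A = [[2.0, 1.0], [-3.0, 0.5]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # 2*0.5 - 1*(-3) = 4.0
area = shoelace(image_of_unit_square(A))
print(area, abs(det_A))                          # both 4.0
```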
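Remark 8.4 Notice that here we took the absolute value of the determinant, while in the 1D case we multiplied by the derivative itself. The difference is due to the convention for reversing the limits of a 1D integral:
$$\int_\beta^\alpha f(g(t))\,g'(t)\,dt = -\int_\alpha^\beta f(g(t))\,g'(t)\,dt.$$
If $g$ is decreasing, then $g' < 0$ and $g^{-1}(b) < g^{-1}(a)$. The two sign changes cancel, so the 1D formula also amounts to multiplying by $|g'(t)|$. For multiple integrals there is no such effect.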
Polar coordinates
Recall that polar coordinates $r, \theta$ (see Figure 8.2) are defined by $x = r\cos\theta$, $y = r\sin\theta$. We can also express $r$ and $\theta$ as $\theta = \arctan\frac{y}{x}$, $r = \sqrt{x^2 + y^2}$. This is not exactly true, because it works only for $x > 0$ (notice that one can make it work even for $x = 0$, but not for $x < 0$).
Let's find the Jacobian. We have
$$J = \det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r.$$
Thus the following statement holds:
Corollary 8.5 For a function $f : \mathbb{R}^2 \to \mathbb{R}$ and a region $D \subset \mathbb{R}^2_{x,y}$, we have
$$\iint_D f(x, y)\,dx\,dy = \iint_E r\,f(r\cos\theta, r\sin\theta)\,dr\,d\theta,$$
where $E$ is the same region in $r, \theta$.
Figure 8.2: Polar coordinates $r, \theta$
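As a check on $J = r$, the following sketch approximates the Jacobian determinant of $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$ at a few points. It rests on our own assumptions: central differences with step `h = 1e-6` and a hypothetical helper `jacobian_det_2d`.

```python
import math

def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det_2d(T, u, v, h=1e-6):
    """Approximate det T'(u, v) with central differences."""
    d_du = [(a - b) / (2 * h) for a, b in zip(T(u + h, v), T(u - h, v))]  # (x_u, y_u)
    d_dv = [(a - b) / (2 * h) for a, b in zip(T(u, v + h), T(u, v - h))]  # (x_v, y_v)
    return d_du[0] * d_dv[1] - d_dv[0] * d_du[1]   # x_u * y_v - x_v * y_u

for r, theta in [(0.5, 0.3), (2.0, 2.5), (3.0, -1.0)]:
    print(r, jacobian_det_2d(polar, r, theta))   # the second number is close to r
```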
Example 8.6 Let us find
$$\iint_{x^2+y^2 \le 2y} \sqrt{x^2 + y^2}\,dx\,dy.$$
The inequality $x^2 + y^2 \le 2y$ in polar form is $r^2 \le 2r\sin\theta$, that is, $0 \le r \le 2\sin\theta$. Since $0 \le \sin\theta$, we have $0 \le \theta \le \pi$. The limits for $\theta$ can also be seen geometrically: the equation $x^2 + y^2 = 2y$ is the same as $x^2 + (y - 1)^2 = 1$, which is a circle of radius 1 centered at $(0, 1)$. Further, for the function we have $\sqrt{x^2 + y^2} = r$, so
$$\iint_{x^2+y^2 \le 2y} \sqrt{x^2 + y^2}\,dx\,dy = \int_0^\pi d\theta \int_0^{2\sin\theta} r^2\,dr = \frac{8}{3}\int_0^\pi \sin^3\theta\,d\theta = \frac{32}{9}.$$
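Example 8.7 Let's evaluate $\iint_D e^{x^2+y^2}\,dx\,dy$, where $D$ is the solid half disk given by
$$x^2 + y^2 \le 4, \quad x + y \ge 0.$$
We see that here $-\frac{\pi}{4} \le \theta \le \frac{3\pi}{4}$ and $0 \le r \le 2$, so
$$\iint_D e^{x^2+y^2}\,dx\,dy = \int_{-\pi/4}^{3\pi/4} d\theta \int_0^2 r e^{r^2}\,dr = \frac{\pi}{2}\left(e^4 - 1\right).$$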
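Examples 8.6 and 8.7 can be verified numerically by evaluating the iterated integrals in $r, \theta$ with a midpoint rule. This is a minimal sketch: the helper `polar_double_integral` and its grid size are our own choices, and the factor `r` inside it is the Jacobian from Corollary 8.5.

```python
import math

def polar_double_integral(g, theta_lo, theta_hi, r_lo, r_hi, n=400):
    """Midpoint rule for the integral over theta of the integral over r of g(r, theta) * r."""
    total = 0.0
    dth = (theta_hi - theta_lo) / n
    for i in range(n):
        th = theta_lo + (i + 0.5) * dth
        a, b = r_lo(th), r_hi(th)
        dr = (b - a) / n
        for j in range(n):
            r = a + (j + 0.5) * dr
            total += g(r, th) * r * dr * dth   # the extra factor r is the Jacobian
    return total

# Example 8.6: integrand sqrt(x^2 + y^2) = r over 0 <= r <= 2 sin(theta), 0 <= theta <= pi
ex86 = polar_double_integral(lambda r, th: r, 0.0, math.pi,
                             lambda th: 0.0, lambda th: 2 * math.sin(th))
# Example 8.7: integrand e^{r^2} over 0 <= r <= 2, -pi/4 <= theta <= 3pi/4
ex87 = polar_double_integral(lambda r, th: math.exp(r * r), -math.pi / 4, 3 * math.pi / 4,
                             lambda th: 0.0, lambda th: 2.0)
print(ex86, 32 / 9)
print(ex87, math.pi / 2 * (math.e ** 4 - 1))
```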
Figure 8.3: Cylindrical coordinates $r, \theta, z$
Cylindrical coordinates
In $\mathbb{R}^3$, we can consider polar coordinates on the $x, y$-plane and keep $z$ as it is. This gives us the so-called cylindrical coordinates (see Figure 8.3):
$$x = r\cos\theta, \quad y = r\sin\theta, \quad z = z.$$
Let's find the Jacobian. We have
$$J = \det\begin{pmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r.$$
Thus it is also $r$, and the following statement holds:
Corollary 8.8 For a function $f : \mathbb{R}^3 \to \mathbb{R}$ and a region $V \subset \mathbb{R}^3$, we have
$$\iiint_V f(x, y, z)\,dx\,dy\,dz = \iiint_W r\,f(r\cos\theta, r\sin\theta, z)\,dr\,d\theta\,dz,$$
where $W$ is the same region in cylindrical coordinates.
Figure 8.4: The region in $\mathbb{R}^3$ given by inequalities $x^2 + y^2 \le z \le 1$ and $x \ge 0$
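Example 8.9 Let us find the volume of the solid $V$ given by $x^2 + y^2 \le z \le 1$, $x \ge 0$. The solid $V$ is shown in Figure 8.4. In cylindrical coordinates, it is defined by
$$r^2 \le z \le 1, \qquad r\cos\theta \ge 0 \;\Rightarrow\; -\frac{\pi}{2} \le \theta \le \frac{\pi}{2}.$$
Thus the volume is
$$\iiint_V dx\,dy\,dz = \int_{-\pi/2}^{\pi/2} d\theta \int_0^1 dr \int_{r^2}^{1} r\,dz = \pi\int_0^1 r\left(1 - r^2\right) dr = \pi\left[\frac{r^2}{2} - \frac{r^4}{4}\right]_0^1 = \frac{\pi}{4}.$$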
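Example 8.10 Let's evaluate $\iiint_V (x^2 + y^2)\,e^{(x^2+y^2)z}\,dx\,dy\,dz$, where the solid $V$ is given by $1 \le x^2 + y^2 \le 4$, $0 \le z \le 1$, $y \ge 0$. Here, we have $1 \le r \le 2$, $0 \le z \le 1$, and $\sin\theta \ge 0$, so $0 \le \theta \le \pi$ and
$$\begin{aligned}
\iiint_V (x^2 + y^2)\,e^{(x^2+y^2)z}\,dx\,dy\,dz &= \int_0^\pi d\theta \int_1^2 dr \int_0^1 r^3 e^{r^2 z}\,dz
= \pi\int_1^2 r^3 \left[\frac{e^{r^2 z}}{r^2}\right]_{z=0}^{z=1} dr = \pi\int_1^2 r\left(e^{r^2} - 1\right) dr \\
&= \pi\left[\frac{e^{r^2}}{2} - \frac{r^2}{2}\right]_1^2 = \frac{\pi}{2}\left(e^4 - e - 3\right).
\end{aligned}$$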
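Here is a numerical check of Example 8.10. It is our sketch, and the grid size and the helper name `midpoint_2d` are arbitrary. The cylindrical integrand $r^3 e^{r^2 z}$ does not depend on $\theta$. So the $\theta$-integral contributes the factor $\pi$, and only a double integral in $r, z$ remains.

```python
import math

def midpoint_2d(g, a, b, c, d, n=400):
    """Midpoint rule for the integral of g(r, z) over [a, b] x [c, d]."""
    hr, hz = (b - a) / n, (d - c) / n
    return hr * hz * sum(g(a + (i + 0.5) * hr, c + (j + 0.5) * hz)
                         for i in range(n) for j in range(n))

# In cylindrical coordinates (x^2+y^2) e^{(x^2+y^2) z} becomes r^2 e^{r^2 z};
# multiplied by the Jacobian r it gives r^3 e^{r^2 z}.
# Nothing depends on theta, so the theta-integral over [0, pi] contributes a factor pi.
value = math.pi * midpoint_2d(lambda r, z: r ** 3 * math.exp(r * r * z), 1.0, 2.0, 0.0, 1.0)
exact = math.pi / 2 * (math.e ** 4 - math.e - 3)
print(value, exact)
```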
Figure 8.5: Spherical coordinates $\rho, \theta, \varphi$
Spherical coordinates
Another standard substitution in $\mathbb{R}^3$ is given by spherical coordinates. Specifically, given a point $P \in \mathbb{R}^3$,
(i) $\rho$ is the distance to the origin, so $\rho \ge 0$,
(ii) $\theta$ is the polar angle in the $x, y$-plane, so $\theta$ is defined on any interval of length $2\pi$,
(iii) $\varphi$ is the angle between $OP$ and $Oz$, so $\varphi \in [0, \pi]$. In particular, the case $\varphi = \frac{\pi}{2}$ means the horizontal direction of the vector $OP$, that is, $P$ is on the $x, y$-plane. Also, $\varphi = 0$ would mean that the point is on the $Oz$-axis with $z > 0$, and $\varphi = \pi$ happens when the point is also on the $Oz$-axis but with $z < 0$.
It's not hard to get the explicit expressions
$$x = \rho\sin\varphi\cos\theta, \quad y = \rho\sin\varphi\sin\theta, \quad z = \rho\cos\varphi.$$
Conversely, spherical coordinates can be expressed via Cartesian ones as
$$\rho = \sqrt{x^2 + y^2 + z^2}, \quad \theta = \arctan\frac{y}{x}, \quad \varphi = \frac{\pi}{2} - \arctan\frac{z}{\sqrt{x^2 + y^2}},$$
but these formulae do not work for $x < 0$.
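Remark 8.11 The polar angle $\theta$ is $2\pi$-periodic, so the full range for $\theta$ is any interval of length $2\pi$. But $\varphi$ is not periodic: its full range is always $[0, \pi]$.
Let's find the Jacobian. With columns corresponding to $\rho$, $\varphi$, $\theta$, it is
$$\begin{aligned}
\det\begin{pmatrix} \sin\varphi\cos\theta & \rho\cos\varphi\cos\theta & -\rho\sin\varphi\sin\theta \\ \sin\varphi\sin\theta & \rho\cos\varphi\sin\theta & \rho\sin\varphi\cos\theta \\ \cos\varphi & -\rho\sin\varphi & 0 \end{pmatrix}
&= \rho^2\cos\varphi\left(\cos\varphi\sin\varphi\cos^2\theta + \cos\varphi\sin\varphi\sin^2\theta\right) \\
&\quad + \rho^2\sin\varphi\left(\sin^2\varphi\cos^2\theta + \sin^2\varphi\sin^2\theta\right) \\
&= \rho^2\left(\cos^2\varphi\sin\varphi + \sin^3\varphi\right) = \rho^2\sin\varphi.
\end{aligned}$$
It turns out to be non-negative because $0 \le \varphi \le \pi$. Of course, a Jacobian does not have to be non-negative in general. But if we remember that this one is, then it's easy to memorize whether it is sine or cosine in this expression. Indeed, if we put cosine there, then the integral with respect to $\varphi$ from $0$ to $\pi$ would be $0$.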
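The determinant above can also be verified numerically. The sketch below is our illustration; `jacobian_det_3d`, `det3` and the step size are assumptions, not from the text. It approximates the partial derivatives of $(\rho, \varphi, \theta) \mapsto (x, y, z)$ by central differences, using the same column order $\rho, \varphi, \theta$ as the matrix above. It then compares the determinant with $\rho^2\sin\varphi$.

```python
import math

def spherical(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def jacobian_det_3d(T, p, h=1e-6):
    """Approximate det T'(p) with central differences; column k is the derivative in p[k]."""
    cols = []
    for k in range(3):
        plus = list(p)
        minus = list(p)
        plus[k] += h
        minus[k] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(T(*plus), T(*minus))])
    rows = [[cols[k][i] for k in range(3)] for i in range(3)]   # transpose into rows
    return det3(rows)

for rho, phi, theta in [(1.0, 0.7, 0.2), (2.5, 2.0, 4.0)]:
    print(jacobian_det_3d(spherical, (rho, phi, theta)), rho ** 2 * math.sin(phi))
```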
Spherical coordinates are suitable for regions and functions defined by formulae containing $x^2 + y^2 + z^2 = \rho^2$ and $x^2 + y^2 = \rho^2\sin^2\varphi$. Sometimes both spherical and cylindrical substitutions can be used. Which one is better and easier? It's impossible to give a universal answer; intuition comes with practice.
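Corollary 8.12 For a function $f : \mathbb{R}^3 \to \mathbb{R}$ and a region $V \subset \mathbb{R}^3$, we have
$$\iiint_V f(x, y, z)\,dx\,dy\,dz = \iiint_W \rho^2\sin\varphi\,f(\rho\sin\varphi\cos\theta, \rho\sin\varphi\sin\theta, \rho\cos\varphi)\,d\rho\,d\theta\,d\varphi,$$
where $W$ is the same region in spherical coordinates.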
Example 8.13 Let us find the volume of a ball of radius $R$, that is, the one given by $\rho \le R$ in spherical coordinates. We must integrate the constant 1 over it, so the volume is
$$\iiint_{x^2+y^2+z^2 \le R^2} dx\,dy\,dz = \int_0^{2\pi} d\theta \int_0^\pi d\varphi \int_0^R \rho^2\sin\varphi\,d\rho = 2\pi \cdot 2 \cdot \frac{R^3}{3} = \frac{4\pi R^3}{3},$$
because there is no additional restriction on $\theta$ and $\varphi$.
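Example 8.14 Let's evaluate $\iiint_V z\,dx\,dy\,dz$, where $V$ is the solid upper half ball given by
$$x^2 + y^2 + z^2 \le 1, \quad z \ge 0.$$
Here, $0 \le \rho \le 1$, $\theta$ is not restricted, and $0 \le \varphi \le \frac{\pi}{2}$. Since $z = \rho\cos\varphi$ and the Jacobian is $\rho^2\sin\varphi$, the integral equals
$$\begin{aligned}
\iiint_V z\,dx\,dy\,dz &= \int_0^{2\pi} d\theta \int_0^{\pi/2} d\varphi \int_0^1 \rho^3\sin\varphi\cos\varphi\,d\rho
= 2\pi\int_0^{\pi/2} \sin\varphi\cos\varphi\,d\varphi \int_0^1 \rho^3\,d\rho \\
&= \left(\pi\left[-\frac{1}{2}\cos 2\varphi\right]_0^{\pi/2}\right)\left(\left[\frac{\rho^4}{4}\right]_0^1\right) = \frac{\pi}{4}.
\end{aligned}$$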
8.3 Summary and more examples
Thus in order to do a substitution in a multiple integral, we
(1) re-write the inequalities defining the region in new coordinates,
(2) re-write the function in new coordinates,
(3) multiply by the modulus of the Jacobian.
It's good to remember that the Jacobian is $r$ for polar and cylindrical substitutions and is $\rho^2\sin\varphi$ for the spherical one. It's not really a big deal, though, because the Jacobian is easy to find.
Let's now consider more examples of different substitutions.
Example 8.15 Let's evaluate the area of the ellipse with semi-axes $a$ and $b$. It is given by $\iint_{\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1} dx\,dy$. The ellipse is a stretched circle, so we naturally try to substitute stretched polar coordinates
$$x = ar\cos\theta, \quad y = br\sin\theta.$$
The Jacobian is
$$J = \det\begin{pmatrix} a\cos\theta & -ar\sin\theta \\ b\sin\theta & br\cos\theta \end{pmatrix} = abr \ge 0,$$
so there is no need to take the modulus. Further, $\frac{x^2}{a^2} + \frac{y^2}{b^2} = r^2$, so the whole ellipse is given by $0 \le r \le 1$, $0 \le \theta \le 2\pi$. Finally, we get
$$\iint_{\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1} dx\,dy = \int_0^{2\pi} d\theta \int_0^1 abr\,dr = \pi ab.$$
Example 8.16 Let's find the area inside the astroid $x^{2/3} + y^{2/3} = 1$. Notice that the astroid is obtained from the circle $x^2 + y^2 = 1$ by replacing $x$ and $y$ with their cube roots. Hence the idea is to cube polar coordinates as $x = p^3\cos^3\theta$, $y = p^3\sin^3\theta$. Polar coordinates are one-to-one and cubing is one-to-one. So the region is $0 \le p \le 1$, $0 \le \theta \le 2\pi$ in $p, \theta$. The Jacobian is
$$J = \det\begin{pmatrix} 3p^2\cos^3\theta & -3p^3\cos^2\theta\sin\theta \\ 3p^2\sin^3\theta & 3p^3\sin^2\theta\cos\theta \end{pmatrix} = 9p^5\cos^2\theta\sin^2\theta \ge 0.$$
Thus the answer is
$$\int_0^{2\pi} d\theta \int_0^1 9p^5\cos^2\theta\sin^2\theta\,dp = \frac{3\pi}{8}.$$
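Example 8.16 can be cross-checked against a direct Cartesian computation. The sketch is ours and uses a midpoint rule with an arbitrarily chosen number of subintervals. It takes the area as four times the area under the first-quadrant arc $y = (1 - x^{2/3})^{3/2}$. For the substitution formula, integrating $9p^5$ over $[0, 1]$ gives $\frac{3}{2}$, which leaves $\frac{3}{2}\int_0^{2\pi}\cos^2\theta\sin^2\theta\,d\theta$.

```python
import math

def midpoint(f, a, b, n=200_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Direct Cartesian computation: by symmetry the area is 4 times the area under
# y = (1 - x^{2/3})^{3/2} for 0 <= x <= 1 (the first-quadrant arc of the astroid).
area_cartesian = 4 * midpoint(lambda x: (1 - x ** (2 / 3)) ** 1.5, 0.0, 1.0)

# The p-integral of 9 p^5 over [0, 1] equals 3/2, so the substitution formula reduces to
# (3/2) * integral over [0, 2 pi] of cos^2(theta) sin^2(theta).
area_substitution = 1.5 * midpoint(lambda t: (math.cos(t) * math.sin(t)) ** 2, 0.0, 2 * math.pi)

print(area_cartesian, area_substitution, 3 * math.pi / 8)
```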
8.4 Exercises
Questions on understanding the lecture
Exercise 8.1 Why are spherical coordinates called spherical? Why are cylindrical coordinates called cylindrical?
Exercise 8.2 What is the Jacobian determinant of a linear map $f(x) = Ax$, $x \in \mathbb{R}^n$, where $A$ is an $n \times n$ matrix?
Exercise 8.3 What is the Jacobian determinant of the map $(\varphi, \theta) \mapsto (x, y, z)$ given by
$$x = \sin\varphi\cos\theta, \quad y = \sin\varphi\sin\theta, \quad z = \cos\varphi?$$
Exercise 8.4 Assume that a region $D \subset \mathbb{R}^2$ is bounded by the half-lines $\theta = \alpha$ and $\theta = \beta$ and the graph of a non-negative function $r = r(\theta)$ in polar coordinates. Find its area.
Questions on calculation
Exercise 8.5 Evaluate the area enclosed by the curve $(x^2 + y^2)^2 = a(x^3 - 3xy^2)$, where $a > 0$ is a parameter.
Exercise 8.6 Evaluate $\iint_{\mathbb{R}^2} e^{-x^2-y^2}\,dx\,dy$ and use it to compute $\int_{-\infty}^{\infty} e^{-x^2}\,dx$.
Exercise 8.7 Applying cylindrical or spherical coordinates, find the volume of the following solids:
(i) $x^2 + y^2 + z^2 \le 2az$, $x^2 + y^2 \le z^2$, where $a > 0$.
(ii) $(x^2 + y^2 + z^2)^2 \le a^2(x^2 + y^2 - z^2)$
(iii) $(x^2 + y^2 + z^2)^2 \le 3xyz$
Questions on logical thinking
Exercise 8.8 Evaluate the integral
$$\int_{x^2+y^2+u^2+v^2 \le 1} e^{x^2+y^2-u^2-v^2}\,dx\,dy\,du\,dv.$$
Hint: use two copies of polar coordinates, for $x, y$ and for $u, v$ respectively.
Exercise 8.9 Modify the spherical substitution to find the volume of the solid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^4}{c^4} \le 1$.
Lecture 9
Vector fields and line integrals
9.1 Vector fields and operations on them
Vector fields often occur in physics, engineering, or even in weather forecasts, as shown in Figure 9.1. For example, water currents on the surface of water or winds measured at a certain height are vector fields in $\mathbb{R}^2$. Velocities of liquid flows, 3D patterns of winds, and gravitational, electric, and magnetic forces are vector fields in $\mathbb{R}^3$.
Definition 9.1 Given a region $D \subset \mathbb{R}^m$, a vector field $V$ in $D$ is a map $D \to \mathbb{R}^m$.
What is this $D$? It can be the whole $\mathbb{R}^2$ or $\mathbb{R}^3$. It can also be an open subset of $\mathbb{R}^2$ or $\mathbb{R}^3$. In any case, the vector field is always supposed to have continuous partial derivatives.
What's an open set? Consider a region $D \subset \mathbb{R}^2$ bounded by one or several smooth curves, for example, a disk $x^2 + y^2 \le 1$. It consists of the boundary $x^2 + y^2 = 1$ and the interior $x^2 + y^2 < 1$. The interior $x^2 + y^2 < 1$ is an open set.
Definition 9.2 A set $S \subset \mathbb{R}^m$ is called open if for any point $x \in S$ there is a small number $r > 0$ such that the whole solid ball $B_r(x)$ of radius $r$ centred at $x$ is contained in $S$, that is, $B_r(x) \subset S$.
A vector field in $\mathbb{R}^3$ is a map $\mathbb{R}^3 \to \mathbb{R}^3$. A substitution in a triple integral is also a map $g : \mathbb{R}^3 \to \mathbb{R}^3$. What is the difference (except that a substitution must be bijective)? It is in notation and in interpretation. A vector field $V$ in $\mathbb{R}^3$ whose component functions are $P(x, y, z)$, $Q(x, y, z)$ and $R(x, y, z)$ is denoted by
$$V = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k},$$
where
$$\mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
are frame fields.
Figure 9.1: Map of winds for South East Asia, 14 April 2009
At each particular point, a vector field is a vector, so one can do all standard vector operations with vector fields. First, vector fields are added component-by-component, that is,
$$(P_1\mathbf{i} + Q_1\mathbf{j}) + (P_2\mathbf{i} + Q_2\mathbf{j}) = (P_1 + P_2)\mathbf{i} + (Q_1 + Q_2)\mathbf{j},$$
and the same in $\mathbb{R}^3$. Thus the sum of two vector fields is a vector field of the same dimension.
Second, a vector field can be multiplied by a function component-by-component, that is,
$$f(x, y, z)\,(P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}) = fP\mathbf{i} + fQ\mathbf{j} + fR\mathbf{k},$$
and the same for $\mathbb{R}^2$. Thus a vector field multiplied by a function yields a vector field of the same dimension.
Third, the dot product of vector fields is defined as
$$(P_1\mathbf{i} + Q_1\mathbf{j} + R_1\mathbf{k}) \cdot (P_2\mathbf{i} + Q_2\mathbf{j} + R_2\mathbf{k}) = P_1P_2 + Q_1Q_2 + R_1R_2.$$
Thus the dot product of two vector fields is a function.
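Finally, for vector fields in $\mathbb{R}^3$ there is the cross product given by
$$(P_1\mathbf{i} + Q_1\mathbf{j} + R_1\mathbf{k}) \times (P_2\mathbf{i} + Q_2\mathbf{j} + R_2\mathbf{k}) = (Q_1R_2 - Q_2R_1)\,\mathbf{i} - (P_1R_2 - P_2R_1)\,\mathbf{j} + (P_1Q_2 - P_2Q_1)\,\mathbf{k}.$$
One can remember it as
$$(P_1\mathbf{i} + Q_1\mathbf{j} + R_1\mathbf{k}) \times (P_2\mathbf{i} + Q_2\mathbf{j} + R_2\mathbf{k}) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ P_1 & Q_1 & R_1 \\ P_2 & Q_2 & R_2 \end{pmatrix}.$$
Thus the cross product of two vector fields in $\mathbb{R}^3$ is, again, a vector field in $\mathbb{R}^3$.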
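Example 9.3 Let's simplify $\left[(x\mathbf{i} + e^{xy}\mathbf{k}) \times (x\mathbf{j} - y^2\mathbf{k})\right] \cdot \left[z^2\mathbf{i} + (x - y)\mathbf{j}\right]$. We have
$$(x\mathbf{i} + e^{xy}\mathbf{k}) \times (x\mathbf{j} - y^2\mathbf{k}) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x & 0 & e^{xy} \\ 0 & x & -y^2 \end{pmatrix} = -xe^{xy}\,\mathbf{i} + xy^2\,\mathbf{j} + x^2\,\mathbf{k}.$$
Further,
$$\left(-xe^{xy}\,\mathbf{i} + xy^2\,\mathbf{j} + x^2\,\mathbf{k}\right) \cdot \left(z^2\mathbf{i} + (x - y)\mathbf{j}\right) = -xz^2e^{xy} + xy^2(x - y).$$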
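Example 9.4 By straightforward calculation, we can check that
$$\left[(A\mathbf{i} + B\mathbf{j} + C\mathbf{k}) \times (P\mathbf{i} + Q\mathbf{j} + R\mathbf{k})\right] \cdot \left[X\mathbf{i} + Y\mathbf{j} + Z\mathbf{k}\right] = \det\begin{pmatrix} A & B & C \\ P & Q & R \\ X & Y & Z \end{pmatrix}.$$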
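At each point these operations are ordinary vector algebra, so they can be sketched directly in code. The functions below are illustrative, their names are ours, and they act on 3-tuples of numbers. The sketch evaluates Example 9.3 at one sample point and checks the identity of Example 9.4 for sample vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product via the component formula (Q1R2 - Q2R1, -(P1R2 - P2R1), P1Q2 - P2Q1)."""
    (p1, q1, r1), (p2, q2, r2) = u, v
    return (q1 * r2 - q2 * r1, -(p1 * r2 - p2 * r1), p1 * q2 - p2 * q1)

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Example 9.3 evaluated at the sample point (x, y, z) = (0.5, -1.2, 2.0)
x, y, z = 0.5, -1.2, 2.0
lhs = dot(cross((x, 0.0, math.exp(x * y)), (0.0, x, -y * y)), (z * z, x - y, 0.0))
rhs = -x * z * z * math.exp(x * y) + x * y * y * (x - y)

# Example 9.4: the scalar triple product equals the 3x3 determinant
a, p, w = (1.0, 2.0, 3.0), (-2.0, 0.5, 4.0), (0.0, 1.0, -1.0)
triple = dot(cross(a, p), w)
print(lhs, rhs, triple, det3([a, p, w]))
```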
Further, since we are in the realm of Calculus now, there must be differential operations with vector fields. Recall that the gradient of a function is a vector field defined as
$$\operatorname{grad} f(x, y) = f_x\mathbf{i} + f_y\mathbf{j}, \qquad \operatorname{grad} f(x, y, z) = f_x\mathbf{i} + f_y\mathbf{j} + f_z\mathbf{k}.$$
One can think of it as follows. Let $\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)$ in $\mathbb{R}^2$ or $\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)$ in $\mathbb{R}^3$. That is, $\nabla$ (called nabla) is a vector whose components are partial differentiations. Then
$$\operatorname{grad} f = \nabla f,$$
where the product of a partial differentiation and a function is understood as the partial differentiation applied to the function.
Further, if we have a function and a vector field, then we can take the derivative of the function along the vector field. We have proved that it is the same as taking the dot product of the gradient with the vector field, that is, $D_V f = V \cdot \nabla f$. The result of this operation is a function, but a vector field is used in the process. Using $\nabla$, we can write the directional derivative as
$$D_V = V \cdot \nabla,$$
where the scalar operator $V \cdot \nabla$ transforms functions to functions.
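Also, we can formally take the dot and the cross product of $\nabla$ with a vector field. Specifically, the divergence is $\operatorname{div} V = \nabla \cdot V = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right) \cdot (P, Q, R)$, or in coordinates
$$\operatorname{div}(P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}) = P_x + Q_y + R_z.$$
Thus the divergence of a vector field in $\mathbb{R}^3$ is a function of $x, y, z$.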
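Similarly, the curl is $\operatorname{curl} V = \nabla \times V$, or in coordinates
$$\operatorname{curl}(P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}) = (R_y - Q_z)\,\mathbf{i} + (P_z - R_x)\,\mathbf{j} + (Q_x - P_y)\,\mathbf{k}.$$
Thus the curl of a vector field in $\mathbb{R}^3$ is a vector field in $\mathbb{R}^3$. It's convenient to memorize the formula for the curl in the determinant form as
$$\operatorname{curl}(P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ P & Q & R \end{pmatrix}.$$
The reason why they are called divergence and curl will become clear later from the Stokes and Gauss Theorems.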
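Example 9.5
$$\operatorname{curl}(x\mathbf{j} + \sin^2 y\,\mathbf{k}) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ 0 & x & \sin^2 y \end{pmatrix} = 2\sin y\cos y\,\mathbf{i} + \mathbf{k}.$$
Theorem 9.6 For any function $f$ on $\mathbb{R}^3$ with continuous second partial derivatives, we have $\operatorname{curl}(\operatorname{grad} f) = 0$. For any vector field $V = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ whose components have continuous second partial derivatives, we have $\operatorname{div}(\operatorname{curl} V) = 0$.
Figure 9.2: An astroid.
Proof It can be easily proved by direct calculation, using the equality of mixed partial derivatives. :)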
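Theorem 9.6 and Example 9.5 can be spot-checked numerically. This sketch rests on our own assumptions. Partial derivatives are replaced by central differences with step `H`. The helpers `partial`, `grad`, `curl` and `div` are hypothetical names, and the test function $f$ and field $V$ are arbitrary smooth examples. Central-difference operators in different variables commute, so the nested differences cancel up to rounding, mirroring the equality of mixed partials.

```python
import math

H = 1e-4  # finite-difference step (an arbitrary choice for this sketch)

def partial(F, k, p):
    """Central-difference approximation of dF/dx_k at the point p (F returns a float)."""
    plus, minus = list(p), list(p)
    plus[k] += H
    minus[k] -= H
    return (F(*plus) - F(*minus)) / (2 * H)

def grad(f):
    return [lambda *p, k=k: partial(f, k, p) for k in range(3)]

def curl(P, Q, R):
    """Components of curl(Pi + Qj + Rk) = (R_y - Q_z, P_z - R_x, Q_x - P_y), as functions."""
    return [lambda *p: partial(R, 1, p) - partial(Q, 2, p),
            lambda *p: partial(P, 2, p) - partial(R, 0, p),
            lambda *p: partial(Q, 0, p) - partial(P, 1, p)]

def div(P, Q, R):
    return lambda *p: partial(P, 0, p) + partial(Q, 1, p) + partial(R, 2, p)

pt = (0.3, -0.7, 1.1)

# Example 9.5: curl(x j + sin^2 y k) = 2 sin y cos y i + k
c = curl(lambda x, y, z: 0.0, lambda x, y, z: x, lambda x, y, z: math.sin(y) ** 2)
print([comp(*pt) for comp in c], 2 * math.sin(pt[1]) * math.cos(pt[1]))

# Theorem 9.6, checked numerically for a sample smooth f and V
f = lambda x, y, z: math.exp(x * y) * math.sin(z)
print([comp(*pt) for comp in curl(*grad(f))])          # all close to 0
V = (lambda x, y, z: x * y * z, lambda x, y, z: math.cos(x + z), lambda x, y, z: y ** 3)
print(div(*curl(*V))(*pt))                              # close to 0
```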
9.2 Curves
Recall that a curve in $\mathbb{R}^2$ is the image of a map $r : [a, b] \to \mathbb{R}^2$. For many applications, however, this definition is too general.
Example 9.7 Let's consider the astroid given by $x^{2/3} + y^{2/3} = 1$. It can be parameterized as $x = \cos^3 t$, $y = \sin^3 t$. Visually, the curve has cusps at $(0, \pm 1)$ and at $(\pm 1, 0)$, as seen in Figure 9.2, so we don't want to call it a smooth curve. We have
$$\frac{dx}{dt} = -3\cos^2 t\sin t, \qquad \frac{dy}{dt} = 3\sin^2 t\cos t,$$
so the tangent vector $(x', y')$ vanishes when $t$ is a multiple of $\frac{\pi}{2}$. No tangent vector: we see a cusp.
Example 9.8 Let's consider a curve given by $x = t^3 - t$, $y = 0$. Obviously, the image of this map is just the straight line $y = 0$. On the other hand, what's going to happen if a particle moves along it? We see that $x' > 0$ for $|t| > \frac{1}{\sqrt{3}}$, which means that the particle is moving to the right. Also $x' < 0$ for $-\frac{1}{\sqrt{3}} < t < \frac{1}{\sqrt{3}}$, which means that the particle is moving to the left. Thus it actually reverses its direction twice, at the extreme points of the function $x(t)$, and its trajectory is not exactly the same as the straight line.
Definition 9.9 Given an interval $I \subset \mathbb{R}$ and a map $r : I \to \mathbb{R}^n$ such that
(i) $r$ has continuous derivative,
(ii) $r'$ never vanishes on $I$,
the image $r(I)$ is called a smooth curve in $\mathbb{R}^n$.
Although this definition does not take care of all nuances, it is enough for integration, which is our purpose. The rigorous definition follows.
Definition 9.10 A smooth oriented curve in $\mathbb{R}^n$ is a map $r : I \to \mathbb{R}^n$ considered up to change of variables. That is, given any function $u : J \to I$ such that
(i) $r'$ never vanishes on $I$,
(ii) $J \subset \mathbb{R}$ is some interval,
(iii) $r'$ is continuous,
(iv) $u' > 0$,
the maps $r(u)$ and $r(u(t))$ are in the same equivalence class.
In other words, a smooth curve is an equivalence class of maps whose domain is some interval in $\mathbb{R}$ and whose values lie in $\mathbb{R}^n$. The equivalence relation is defined by $r \sim r \circ u$ for any increasing function $u(t)$.
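Example 9.11 The maps $x = u$, $y = \sqrt{1 - u^2}$, $-1 < u < 1$ and $x = \cos t$, $y = \sin t$, $0 < t < \pi$ define the same set in the plane. Indeed, $u = \cos t$ is the required substitution. Note, however, that $u = \cos t$ is decreasing on $(0, \pi)$, so the two parameterizations traverse the upper half circle in opposite directions. As oriented curves in the sense of Definition 9.10, they therefore differ by orientation.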
It is important to understand that the mere fact that $r'(t) = 0$ doesn't yet mean that the curve has a cusp or a turning point. It might be possible to re-parameterize it to get rid of the singularity.
Example 9.12 Consider the parametric equations $x = \sin(t^3)$, $y = \cos(t^3)$. Obviously, this is the unit circle centred at $(0, 0)$, so it looks perfectly smooth, though $x'(0) = y'(0) = 0$. Here we can re-parameterize it by letting $t = \sqrt[3]{u}$ to get smooth equations.
Curves are studied in differential geometry and topology. Right now we don't need to care much about cusps, turning points, etc. Even if $r'(t)$ vanishes at finitely many points, this doesn't affect integration: we can always divide the whole curve into several smooth pieces and integrate separately over each of them.
9.3 Arc length
Let $C = f([a, b])$ be a plane curve given by a parametric equation $x = x(t)$, $y = y(t)$, where the component functions $x(t)$ and $y(t)$ are continuous. How does one evaluate the length of the curve $C$?
Definition 9.13 Given $n + 1$ points $A_0, A_1, \ldots, A_n$ of the plane $\mathbb{R}^2$, the collection of line segments $A_0A_1, A_1A_2, \ldots, A_{n-1}A_n$ is called a broken line or a piecewise linear curve.
Let us divide the interval $[a, b]$ into $n$ small intervals. Assume that
$$a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b, \qquad \Delta t = \max_{1 \le i \le n}\,(t_i - t_{i-1}).$$
Then the broken line passing through the points $f(t_0), f(t_1), \ldots, f(t_n)$ approximates the curve $C$ as $\Delta t \to 0$ and $n \to \infty$, as shown in Figure 9.3.
Definition 9.14 The length of a curve $C = f([a, b])$ is
$$\lim_{\Delta t \to 0} \sum_{i=1}^{n} \left\| f(t_i) - f(t_{i-1}) \right\|.$$
The idea of dividing an interval into many small subintervals sounds like an integral. Indeed, applying the Mean Value Theorem, we get $f(t_i) - f(t_{i-1}) = f'(t_i^*)(t_i - t_{i-1})$ for some $t_i^* \in [t_{i-1}, t_i]$, so
$$\sum_{i=1}^{n} \left\| f(t_i) - f(t_{i-1}) \right\| = \sum_{i=1}^{n} \left\| f'(t_i^*) \right\| (t_i - t_{i-1}).$$
The limit of the right hand side is exactly the integral of the function $\| f'(t) \|$, that is, of $\sqrt{(x')^2 + (y')^2}$. Unfortunately, this is all wrong, because the Mean Value Theorem works only for functions from $\mathbb{R}$ to $\mathbb{R}$, so it cannot be applied to $f : \mathbb{R} \to \mathbb{R}^2$. The statement is true though.
Figure 9.3: A piecewise linear curve approximates a curve
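Definition 9.14 can be explored numerically. The sketch below is our own, and `polygon_length` is a hypothetical helper that uses a uniform partition and `math.dist`, which is available in Python 3.8+. It computes the length of the broken line through $f(t_0), \ldots, f(t_n)$ for the unit circle $f(t) = (\cos t, \sin t)$, $t \in [0, 2\pi]$. The lengths increase towards $2\pi$, the value obtained in Example 9.16.

```python
import math

def polygon_length(f, a, b, n):
    """Length of the broken line through f(t_0), ..., f(t_n) for a uniform partition of [a, b]."""
    pts = [f(a + i * (b - a) / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

circle = lambda t: (math.cos(t), math.sin(t))
for n in (4, 16, 64, 1024):
    print(n, polygon_length(circle, 0.0, 2 * math.pi, n))   # increases towards 2*pi
```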
Theorem 9.15 Let a plane curve $C = f([a, b])$ be given by a parametric equation $x = x(t)$, $y = y(t)$ and suppose that the component functions are continuously differentiable (that is, $x'(t)$ and $y'(t)$ are continuous). The length of the curve $C$ is
$$\int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt \tag{9.1}$$
Proof The proof is going to consist of two major steps. First, we find an inequality that relates the length of a curve to the derivatives of its component functions. Second, we consider the length of a curve as a function of its endpoint and differentiate this function.
First step. Let $a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b$ be a partition of the interval $[a, b]$ and consider any particular subinterval $[t_{i-1}, t_i]$. Let's apply the Mean Value Theorem to the functions $x(t)$ and $y(t)$ on the interval $[t_{i-1}, t_i]$. We then have
$$x(t_i) - x(t_{i-1}) = x'(t_i^*)\,(t_i - t_{i-1}), \qquad y(t_i) - y(t_{i-1}) = y'(t_i^{**})\,(t_i - t_{i-1}),$$
where $t_i^*, t_i^{**} \in [t_{i-1}, t_i]$. Thus,
$$\left\| f(t_i) - f(t_{i-1}) \right\| = \sqrt{\left(x(t_i) - x(t_{i-1})\right)^2 + \left(y(t_i) - y(t_{i-1})\right)^2} = \sqrt{\left[x'(t_i^*)\right]^2 + \left[y'(t_i^{**})\right]^2}\;(t_i - t_{i-1}).$$
This is already something like (9.1), but the problem is that, in general, $t_i^* \ne t_i^{**}$.
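Further, since $x'$ and $y'$ are continuous, they are bounded on $[t_{i-1}, t_i]$. Let
$$l_i^x = \min_{[t_{i-1}, t_i]} |x'(t)|, \quad L_i^x = \max_{[t_{i-1}, t_i]} |x'(t)|, \quad l_i^y = \min_{[t_{i-1}, t_i]} |y'(t)|, \quad L_i^y = \max_{[t_{i-1}, t_i]} |y'(t)|.$$
(We take absolute values so that squaring preserves the inequalities $l_i^x \le |x'(t_i^*)| \le L_i^x$ and $l_i^y \le |y'(t_i^{**})| \le L_i^y$.) We obtain then
$$(t_i - t_{i-1})\sqrt{\left(l_i^x\right)^2 + \left(l_i^y\right)^2} \le \left\| f(t_i) - f(t_{i-1}) \right\| \le (t_i - t_{i-1})\sqrt{\left(L_i^x\right)^2 + \left(L_i^y\right)^2}. \tag{9.2}$$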
Now let $l^x = \min_i l_i^x$ be the minimal value of $|x'(t)|$ on the whole interval $[a, b]$. Let $L^x = \max_i L_i^x$ be the maximal value of $|x'(t)|$ on $[a, b]$, and let $l^y = \min_i l_i^y$ and $L^y = \max_i L_i^y$. If we replace $l_i^x$ with $l^x$ and $l_i^y$ with $l^y$, the left hand side of (9.2) can only become smaller. Similarly, if we replace $L_i^x$ with $L^x$ and $L_i^y$ with $L^y$, the right hand side can only become bigger. Thus,
$$(t_i - t_{i-1})\sqrt{(l^x)^2 + (l^y)^2} \le \left\| f(t_i) - f(t_{i-1}) \right\| \le (t_i - t_{i-1})\sqrt{(L^x)^2 + (L^y)^2}.$$
Adding these inequalities for $i = 1, 2, \ldots, n$ together and taking the limit, we get
$$(b - a)\sqrt{(l^x)^2 + (l^y)^2} \le \lim_{\Delta t \to 0} \sum_{i=1}^{n} \left\| f(t_i) - f(t_{i-1}) \right\| \le (b - a)\sqrt{(L^x)^2 + (L^y)^2}. \tag{9.3}$$
Second step. Let $F(t)$ be the length of the curve $f([a, t])$, which is a piece of the whole curve $C$. Thus $F(t)$ is an increasing function, $F(a) = 0$, and $F(b)$ is the length of the whole curve $C$ we need to find. Let's find $F'(t)$. We know that $F'(t) = \lim_{h \to 0} \frac{F(t + h) - F(t)}{h}$. Here, $F(t + h) - F(t)$ is the length of the curve $f([t, t + h])$, which is a small piece of the curve $C$. Following the idea of the first step, let
$$l = \min_{s \in [t, t+h]} |x'(s)|, \quad L = \max_{s \in [t, t+h]} |x'(s)|, \quad m = \min_{s \in [t, t+h]} |y'(s)|, \quad M = \max_{s \in [t, t+h]} |y'(s)|.$$
Applying the inequality (9.3) on the interval $[t, t + h]$, we get
$$\sqrt{l^2 + m^2} \le \frac{F(t + h) - F(t)}{h} \le \sqrt{L^2 + M^2}.$$
Now take the limit as $h \to 0$. By continuity of $x'$ and $y'$, both bounds tend to $\sqrt{x'(t)^2 + y'(t)^2}$, so the Squeeze Theorem gives
$$F'(t) = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}.$$
Finally,
$$F(b) = F(b) - F(a) = \int_a^b F'(t)\,dt = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
:)
Exercise 9.1 What would be the length of a curve given by the parametric equations
$$x = x(t), \quad y = y(t), \quad z = z(t)$$
in $\mathbb{R}^3$?
Example 9.16 Let's find the length of the unit circle given by the parametric equation $x = \cos t$, $y = \sin t$ for $t \in [0, 2\pi]$. By (9.1), the length is
$$\int_0^{2\pi} \sqrt{(-\sin t)^2 + (\cos t)^2}\,dt = \int_0^{2\pi} dt = 2\pi.$$
Example 9.17 Let us find the length of the curve $y = x^{3/2}$, where $x \in [0, 4]$. First, we need to parameterize this curve. A straightforward way of doing it is $x = t$, $y = t^{3/2}$. Thus the answer is
$$\int_0^4 \sqrt{1 + \left(\frac{3}{2}\sqrt{t}\right)^2}\,dt = \int_0^4 \sqrt{1 + \frac{9}{4}t}\,dt.$$
Substituting $u = 1 + \frac{9}{4}t$, we get
$$\frac{4}{9}\int_1^{10} \sqrt{u}\,du = \frac{4}{9}\left[\frac{2}{3}u^{3/2}\right]_1^{10} = \frac{8}{27}\left(10\sqrt{10} - 1\right).$$
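Theorem 9.18 Given a function $r = r(\theta)$, $\alpha \le \theta \le \beta$, its graph in polar coordinates has length
$$\int_\alpha^\beta \sqrt{r^2(\theta) + \left(r'(\theta)\right)^2}\,d\theta. \tag{9.4}$$
Proof Recall that polar coordinates relate to Cartesian ones as $x = r\cos\theta$ and $y = r\sin\theta$. If $r = r(\theta)$, then by the Product Rule we get
$$\frac{dx}{d\theta} = r'(\theta)\cos\theta - r(\theta)\sin\theta, \qquad \frac{dy}{d\theta} = r'(\theta)\sin\theta + r(\theta)\cos\theta.$$
Squaring and adding, the cross terms $\mp 2rr'\sin\theta\cos\theta$ cancel, so $\left(\frac{dx}{d\theta}\right)^2 + \left(\frac{dy}{d\theta}\right)^2 = (r')^2 + r^2$. Substituting this into (9.1), we complete the proof. :)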
Example 9.19 Let's find the length of the curve $r = e^\theta$, $0 \le \theta \le 1$. Applying (9.4), we see that it equals
$$\int_0^1 \sqrt{e^{2\theta} + e^{2\theta}}\,d\theta = \sqrt{2}\,(e - 1).$$
Example 9.20 Let's find the length of the closed curve given by $r = \cos\theta$ in polar coordinates. First, we need to find the period of the map $\theta \mapsto (x, y)$ (in Cartesian coordinates!). We have
$$x = \cos^2\theta = \frac{1 + \cos 2\theta}{2}, \qquad y = \cos\theta\sin\theta = \frac{\sin 2\theta}{2},$$
so the minimal period is $\pi$. Thus the length of this curve is
$$\int_0^\pi \sqrt{\cos^2\theta + (-\sin\theta)^2}\,d\theta = \pi.$$
It is important to understand that the period of a curve is to be found in Cartesian coordinates, because polar coordinates are not in one-to-one correspondence with points on the plane.
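Formula (9.4) is straightforward to evaluate numerically. The minimal sketch below reproduces Examples 9.19 and 9.20. The midpoint rule and the helper name `polar_arc_length` are our choices, not the text's.

```python
import math

def polar_arc_length(r, dr, alpha, beta, n=100_000):
    """Midpoint-rule approximation of formula (9.4): integral of sqrt(r^2 + r'^2) d(theta)."""
    h = (beta - alpha) / n
    total = 0.0
    for i in range(n):
        th = alpha + (i + 0.5) * h
        total += math.sqrt(r(th) ** 2 + dr(th) ** 2)
    return total * h

# Example 9.19: r = e^theta on [0, 1]; exact length sqrt(2) (e - 1)
spiral = polar_arc_length(math.exp, math.exp, 0.0, 1.0)
# Example 9.20: r = cos(theta) over one period [0, pi]; exact length pi
circle = polar_arc_length(math.cos, lambda th: -math.sin(th), 0.0, math.pi)
print(spiral, math.sqrt(2) * (math.e - 1))
print(circle, math.pi)
```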
9.4 Line integral of a function
Let's think about a natural question: given a curve in $\mathbb{R}^2$ or in $\mathbb{R}^3$, how does one find the average value of some function on this curve? For example, the E2 highway joins KL and JB, and we need to find the average distance from us to Singapore if we are moving all the way from JB to KL.
We cannot just take the double integral over the curve and divide by its area. The curve has no area, and the double integral over anything of zero area would be zero as well.
We define the integral of a function over a curve in some other way. In fact, we do the same as for the one-dimensional integral. We define a Riemann sum to be the total area of rectangles based on the curve, as shown in Figure 9.4, and define the integral to be the limit of Riemann sums.
Figure 9.4: Riemann sum for a line integral
Definition 9.21 Assume that a curve $C$ is given by a parametric equation $x = r(t)$, where $r : [a, b] \to \mathbb{R}^m$. Let $f : \mathbb{R}^m \to \mathbb{R}$ be a function of $m$ variables. Then let's consider the following setting:
(i) $a = t_0 < t_1 < \cdots < t_n = b$ is a partition of $[a, b]$, with $\Delta t = \max(t_i - t_{i-1})$ for $i = 1, 2, \ldots, n$,
(ii) $t_i^* \in [t_{i-1}, t_i]$, $i = 1, \ldots, n$, are sample points on $[a, b]$,
(iii) $x_i^* = r(t_i^*)$, $i = 1, \ldots, n$, are the corresponding sample points on the curve,
(iv) $\Delta s_i = \left\| r(t_i) - r(t_{i-1}) \right\| = \left\| x_i - x_{i-1} \right\|$.
The line integral of the function $f$ over the curve $C$ is then
$$\int_C f(x)\,ds = \lim_{\Delta t \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta s_i.$$
From the definition, it follows that the value of the integral does not depend on the orientation of the curve. Indeed, if we reverse the orientation, then the order of partition and sample points is reversed. So instead of $t_0, t_1, \ldots, t_n$ we take $t_n, t_{n-1}, \ldots, t_0$. Then $f(x_i^*)$ remains the same (it only appears in the reversed order in the Riemann sum), and $\Delta s_i = \left\| x_i - x_{i-1} \right\|$ changes to $\left\| x_{i-1} - x_i \right\|$, which is the same.
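Theorem 9.22 Assume that a smooth curve $C$ is given by a parametric equation $x = r(t)$, where $r : [a, b] \to \mathbb{R}^m$. Then for any continuous function $f : \mathbb{R}^m \to \mathbb{R}$, we have
$$\int_C f(x)\,ds = \int_a^b f(r(t))\,\left\| r'(t) \right\|\,dt.$$
Specifically, in $\mathbb{R}^2$ we have
$$\int_C f(x, y)\,ds = \int_a^b f(x(t), y(t))\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
In $\mathbb{R}^3$ we have
$$\int_C f(x, y, z)\,ds = \int_a^b f(x(t), y(t), z(t))\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2}\,dt.$$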
We are not going to prove this theorem, because the proof is pretty much the same as that of the theorem about the length of a curve. Naturally, to find the length of a curve, we integrate the constant 1, which is exactly the case treated in Theorem 9.15.
Example 9.23 Let us find the integral of the function $f(x, y) = x^2y^2$ on the unit circle $x^2 + y^2 = 1$. First, we need to parameterize the circle, so $x = \cos t$, $y = \sin t$. Then $f(x(t), y(t)) = \sin^2 t\cos^2 t$. Thus we get
$$\int_{x^2+y^2=1} x^2y^2\,ds = \int_0^{2\pi} \sin^2 t\cos^2 t\sqrt{(-\sin t)^2 + (\cos t)^2}\,dt = \frac{\pi}{4}.$$
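Example 9.24 Let $C$ be the astroid $x^{2/3} + y^{2/3} = 1$ and let's evaluate
$$\int_C |x|\,ds.$$
The astroid is parameterized as $x = \cos^3 t$, $y = \sin^3 t$, so
$$\begin{aligned}
ds &= \sqrt{(x')^2 + (y')^2}\,dt = \sqrt{\left(-3\cos^2 t\sin t\right)^2 + \left(3\sin^2 t\cos t\right)^2}\,dt \\
&= \sqrt{9\cos^2 t\sin^2 t\left(\cos^2 t + \sin^2 t\right)}\,dt = 3|\cos t\sin t|\,dt.
\end{aligned}$$
Thus, using the symmetry (the four quarters of the astroid give equal contributions) and the substitution $u = \cos t$,
$$\int_C |x|\,ds = \int_0^{2\pi} |\cos^3 t| \cdot 3|\cos t\sin t|\,dt = 12\int_0^{\pi/2} \cos^4 t\sin t\,dt = 12\int_0^1 u^4\,du = \frac{12}{5}.$$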
Example 9.25 Let's evaluate the integral $\int_C x\,ds$, where $C$ is half the unit circle given by
$$x^2 + y^2 = 1, \quad x \le 0.$$
Here the parametrization is $x = \cos t$, $y = \sin t$ for $\frac{\pi}{2} \le t \le \frac{3\pi}{2}$. Further,
$$ds = \sqrt{(-\sin t)^2 + (\cos t)^2}\,dt = dt.$$
Thus,
$$\int_C x\,ds = \int_{\pi/2}^{3\pi/2} \cos t\,dt = \left[\sin t\right]_{\pi/2}^{3\pi/2} = -2.$$
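Theorem 9.22 turns a line integral of a function into an ordinary integral over $[a, b]$. The sketch below evaluates that integral with a midpoint rule, which is our choice. The helper `line_integral_scalar` and the derivative functions passed to it are hypothetical names. It reproduces Examples 9.23, 9.24 and 9.25.

```python
import math

def line_integral_scalar(f, r, dr, a, b, n=100_000):
    """Approximate the integral of f over C via Theorem 9.22: f(r(t)) * ||r'(t)|| dt (midpoint rule)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += f(*r(t)) * math.hypot(*dr(t))
    return total * h

circle = lambda t: (math.cos(t), math.sin(t))
circle_d = lambda t: (-math.sin(t), math.cos(t))
astroid = lambda t: (math.cos(t) ** 3, math.sin(t) ** 3)
astroid_d = lambda t: (-3 * math.cos(t) ** 2 * math.sin(t), 3 * math.sin(t) ** 2 * math.cos(t))

ex923 = line_integral_scalar(lambda x, y: x * x * y * y, circle, circle_d, 0.0, 2 * math.pi)
ex924 = line_integral_scalar(lambda x, y: abs(x), astroid, astroid_d, 0.0, 2 * math.pi)
ex925 = line_integral_scalar(lambda x, y: x, circle, circle_d, math.pi / 2, 3 * math.pi / 2)
print(ex923, math.pi / 4)
print(ex924, 12 / 5)
print(ex925, -2)
```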
9.5 Work of a vector field
Assume that a force $F$ moves a particle of mass $m$, and assume that the vector of displacement is $D$. Then the work is $F \cdot D$. Generally, the force is not constant, that is, $F$ is a vector field. Also, the particle moves along some curve $C$. The work is
$$\int_C F \cdot V\,ds,$$
where $V$ is the unit tangent vector to the curve, that is, $V = \frac{r'(t)}{\left\| r'(t) \right\|}$.
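Theorem 9.26 Given a field of force $F$ in $\mathbb{R}^n$ and a smooth curve $C$ defined by a parametric equation $x = r(t)$ for $r : [a, b] \to \mathbb{R}^n$, the work of the vector field $F$ over the curve $C$ is
$$\int_a^b F(r(t)) \cdot T\,dt,$$
where $T = r'(t)$ is the (non-unit) tangent vector.
Specifically, it equals
$$\int_a^b \left(P x'(t) + Q y'(t)\right) dt, \qquad \int_a^b \left(P x'(t) + Q y'(t) + R z'(t)\right) dt$$
for vector fields $F = P\mathbf{i} + Q\mathbf{j}$ in $\mathbb{R}^2$ and $F = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ in $\mathbb{R}^3$ respectively, with $P, Q, R$ evaluated at $r(t)$. The proof is by direct calculation, so we skip it.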
Definition 9.27 The line integral of a vector field $F$ along a curve $C$ is its work over $C$. The notation is (for $F = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$):
$$\int_a^b F \cdot T\,dt = \int_C P\,dx + Q\,dy + R\,dz.$$
Since $x'(t)\,dt = dx$, $y'(t)\,dt = dy$ and $z'(t)\,dt = dz$, the line integral of a vector field is easy to calculate: everything is done automatically by the notation. Notice that, unlike the line integral of a function, the line integral of a vector field changes sign if the orientation is reversed.
The curve does not have to be smooth everywhere. It might fail to be smooth at a few points, in which case it is called piecewise smooth. For instance, the astroid consists of four smooth pieces.
Example 9.28 Let $C$ be the circle in $\mathbb{R}^3$ given by $x^2 + y^2 = 1$, $z = 0$, and let's find the work of the vector field $x\mathbf{j} + \sin^2 y\,\mathbf{k}$ over this circle. Thus we need to evaluate
$$\int_C x\,dy + \sin^2 y\,dz.$$
The circle $C$ can be parameterized as $x = \cos t$, $y = \sin t$, $z = 0$. Then $dx = -\sin t\,dt$, $dy = \cos t\,dt$, $dz = 0$, and the integral is
$$\int_C x\,dy + \sin^2 y\,dz = \int_0^{2\pi} \cos t\cos t\,dt + \sin^2(\sin t) \cdot 0 = \int_0^{2\pi} \cos^2 t\,dt = \int_0^{2\pi} \frac{1 + \cos 2t}{2}\,dt = \pi.$$
Example 9.29 Let now $C$ be the half circle in $\mathbb{R}^3$ given by $x^2 + y^2 = 1$, $z = 0$, $x \le 0$, initiating at $(0, -1)$ and terminating at $(0, 1)$. Let's evaluate
$$\int_C x\,dy + \sin^2 y\,dz.$$
If we parameterize $C$ as $x = \cos t$, $y = \sin t$, $z = 0$ for $-\frac{\pi}{2} \le t \le \frac{\pi}{2}$, then it gives $x \ge 0$, which is not what we need. If we parameterize $C$ as $x = \cos t$, $y = \sin t$, $z = 0$ for $\frac{\pi}{2} \le t \le \frac{3\pi}{2}$, then the curve initiates at $(0, 1)$ and terminates at $(0, -1)$, which is not what we need either.
Thus we put $x = -\cos t$, $y = \sin t$, $z = 0$ for $-\frac{\pi}{2} \le t \le \frac{\pi}{2}$. Then $dx = \sin t\,dt$, $dy = \cos t\,dt$, $dz = 0$. Finally, we get
$$\int_C x\,dy + \sin^2 y\,dz = \int_{-\pi/2}^{\pi/2} (-\cos t)\cos t\,dt + \sin^2(\sin t) \cdot 0 = -\int_{-\pi/2}^{\pi/2} \cos^2 t\,dt = -\int_{-\pi/2}^{\pi/2} \frac{1 + \cos 2t}{2}\,dt = -\frac{\pi}{2}.$$
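In fact, a simple way to do Example 9.29 is to notice that we can actually use $x = \cos t$, $y = \sin t$, but then $t$ must be changing from $\frac{3\pi}{2}$ to $\frac{\pi}{2}$, so the integral is
$$\int_C x\,dy + \sin^2 y\,dz = \int_{3\pi/2}^{\pi/2} \cos t\cos t\,dt + \sin^2(\sin t)\cdot 0 = -\int_{\pi/2}^{3\pi/2} \cos^2 t\,dt = -\frac{\pi}{2},$$
the same value as before.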
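The same kind of check works for Example 9.29. The `simpson` helper and its step count below are assumptions made for illustration. In this sketch, both parametrizations of the half circle give $-\pi/2$, and traversing the curve in the opposite direction flips the sign, in line with the remark after Definition 9.27.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g from a to b (n even); a > b gives the signed value."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


# On the half circle dz = 0, so only the term x * y'(t) contributes.
# Parametrization 1: x = -cos t, y = sin t, t from -pi/2 to pi/2.
first = simpson(lambda t: -math.cos(t) * math.cos(t), -math.pi / 2, math.pi / 2)
# Parametrization 2: x = cos t, y = sin t, t running from 3*pi/2 down to pi/2.
second = simpson(lambda t: math.cos(t) * math.cos(t), 3 * math.pi / 2, math.pi / 2)
# Same curve traversed the other way: t running from pi/2 up to 3*pi/2.
backwards = simpson(lambda t: math.cos(t) * math.cos(t), math.pi / 2, 3 * math.pi / 2)
print(first, second, backwards)  # approximately -pi/2, -pi/2, pi/2
```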
Example 9.30 Let's evaluate $\int_C x\,dx + y\,dy$, where $C$ is the spiral $r = \theta$, $0 \le \theta \le 4\pi$. The spiral is parameterized as $x = t\cos t$, $y = t\sin t$, so
$$dx = (\cos t - t\sin t)\,dt, \qquad dy = (\sin t + t\cos t)\,dt.$$
Thus,
$$\int_C x\,dx + y\,dy = \int_0^{4\pi} t\cos t\,(\cos t - t\sin t)\,dt + t\sin t\,(\sin t + t\cos t)\,dt = \int_0^{4\pi} t\left(\cos^2 t + \sin^2 t\right)dt = \int_0^{4\pi} t\,dt = 8\pi^2.$$
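Here is a quick numerical confirmation of Example 9.30, again with a hypothetical `simpson` helper. The integrand $x\,x'(t) + y\,y'(t)$ simplifies to $t$, so the rule reproduces $8\pi^2$ up to rounding.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


def integrand(t):
    """x x'(t) + y y'(t) for the spiral x = t cos t, y = t sin t."""
    x, y = t * math.cos(t), t * math.sin(t)
    dx = math.cos(t) - t * math.sin(t)
    dy = math.sin(t) + t * math.cos(t)
    return x * dx + y * dy


value = simpson(integrand, 0.0, 4 * math.pi)
print(value, 8 * math.pi ** 2)
```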
9.6 Exercises
Questions on understanding the lecture
Exercise 9.2 Given a curve $C$, what is $\int_C 1\,ds$?
Exercise 9.3 Define the average value of a function over a curve.
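Exercise 9.4 Let $C$ be a curve in $\mathbb{R}^3$ given by a parametric equation $\mathbf{x} = \mathbf{r}(t)$ and let $\mathbf{F} = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ be a vector field. Defining the work of the vector field $\mathbf{F}$ along the curve $C$ as $\int_C \mathbf{F}\cdot\frac{\mathbf{r}'(t)}{|\mathbf{r}'(t)|}\,ds$, prove that it, indeed, equals $\int_C P\,dx + Q\,dy + R\,dz$.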
Questions on calculation
Exercise 9.5 Let $f: \mathbb{R}^3 \to \mathbb{R}$ be an arbitrary function having continuous second partial derivatives. Evaluate $\operatorname{curl}(\operatorname{grad} f)$.
Exercise 9.6 Let $\mathbf{v} = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ be an arbitrary vector field whose component functions have continuous second partial derivatives. Evaluate $\operatorname{div}(\operatorname{curl}\mathbf{v})$.
Exercise 9.7 Evaluate $\int_C y^2\,ds$, where $C$ is given by $x = a(t - \sin t)$, $y = a(1 - \cos t)$ for $0 \le t \le 2\pi$. Here, $a > 0$ is some constant.
Exercise 9.8 Evaluate $\int_C (2a - y)\,dx + x\,dy$, where $C$ is given by $x = a(t - \sin t)$, $y = a(1 - \cos t)$ for $0 \le t \le 2\pi$. Here, $a > 0$ is some constant.
Questions on logical thinking
Exercise 9.9 Prove that the value of a line integral of a function does not depend on the parametrization of the curve. Specifically, assume that a curve $C$ is given by a parametric equation $\mathbf{x} = \mathbf{x}(t)$ and let $t = t(u)$ be some increasing function. Prove that the value of the integral $\int_C f\,ds$ is the same whether we take $t$ or $u$ as a parameter.
Exercise 9.10 Prove that the value of a line integral of a vector field does not depend on the parametrization, as long as the orientation of the curve is preserved.
Lecture 10
Newton-Leibniz and Green's Theorems
10.1 Newton-Leibniz Theorem
The Newton-Leibniz Formula or the Fundamental Theorem of Calculus is
$$\int_a^b df = \int_a^b f'(x)\,dx = f(b) - f(a).$$
It relates the integral of $df$ over a whole interval with the values of $f$ itself at the endpoints of the interval. Let $\{a, b\}$ be the set of two elements, the endpoints of the interval $[a, b]$. Let's introduce the following new notations:
(i) $\int_{\{a,b\}} f = f(b) - f(a)$ is the integral of a function over a couple of points $\{a, b\}$,
(ii) $\partial[a, b] = \{a, b\}$ is the boundary of the interval $[a, b]$, which is the couple of points $a$, $b$.
Now we can re-write the Newton-Leibniz Theorem in the following integral form:
$$\int_{[a,b]} df = \int_{\partial[a,b]} f. \tag{10.1}$$
It's a mere notation, so it doesn't mean anything special now, but later we'll learn three more similar theorems. Green's, Stokes', and Gauss' formulae relate the integral of some sort of a function's differential over a region and the integral of the function itself over the boundary of this region.
Let's first try to find out whether the Newton-Leibniz Theorem is true for functions of several variables and line integrals of vector fields.
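Example 10.1 Let $f(x, y) = x + y^2$ and let's integrate the vector field $\operatorname{grad} f$ over some curve, for example the arc $C$ of the unit circle $r = 1$, $0 \le \theta \le 3\pi/4$, parameterized as $x = \cos t$, $y = \sin t$ for $0 \le t \le \frac{3\pi}{4}$. Then $\operatorname{grad} f = \mathbf{i} + 2y\mathbf{j}$ and we obtain
$$\int_C dx + 2y\,dy = \int_0^{3\pi/4} (-\sin t + 2\sin t\cos t)\,dt = -\frac{1}{\sqrt{2}} - 1 + \frac{1}{2} = -\frac{1}{\sqrt{2}} - \frac{1}{2}.$$
On the other hand,
$$f\left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) - f(1, 0) = -\frac{1}{\sqrt{2}} + \frac{1}{2} - 1 = -\frac{1}{\sqrt{2}} - \frac{1}{2},$$
so it works.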
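Example 10.1 can also be checked numerically. The sketch below is standard-library Python, and its `simpson` helper is an illustrative assumption. It compares the line integral of $\operatorname{grad} f$ along the arc with $f(\text{terminal point}) - f(\text{initial point})$.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


def f(x, y):
    return x + y ** 2


def integrand(t):
    """grad f = (1, 2y) at (cos t, sin t), dotted with r'(t) = (-sin t, cos t)."""
    y = math.sin(t)
    return 1.0 * (-math.sin(t)) + 2 * y * math.cos(t)


T = 3 * math.pi / 4
work = simpson(integrand, 0.0, T)
endpoint_difference = f(math.cos(T), math.sin(T)) - f(1.0, 0.0)
print(work, endpoint_difference, -1 / math.sqrt(2) - 0.5)
```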
Now let's try to prove the general statement. Recall that given $f: \mathbb{R}^m \to \mathbb{R}$, we have $df = f_{x_1}\,dx_1 + \dots + f_{x_m}\,dx_m$.
Theorem 10.2 (Newton-Leibniz) For a function $f: \mathbb{R}^m \to \mathbb{R}$ and a piecewise smooth curve $C \subset \mathbb{R}^m$ initiating at a point $a \in \mathbb{R}^m$ and terminating at $b \in \mathbb{R}^m$, we have
$$\int_C df = f(b) - f(a).$$
Proof We are given the following data:
(i) $\mathbf{x} = \mathbf{r}(t)$, $\mathbf{r}: [\alpha, \beta] \to \mathbb{R}^m$ is the curve $C$,
(ii) $\mathbf{r}(\alpha) = a$, $\mathbf{r}(\beta) = b$,
(iii) $f: \mathbb{R}^m \to \mathbb{R}$ is our function.
We need to integrate $df$ over the curve $C$, so it's natural to consider the restriction of $f$ onto $C$, that is, $g(t) = f(x_1(t), \dots, x_m(t))$. This is a composition of the maps $\mathbf{r}$ and $f$, so we can apply the Chain Rule. We get then
$$g'(t) = f_{x_1}\frac{dx_1}{dt} + \dots + f_{x_m}\frac{dx_m}{dt}.$$
Further, by definition of the line integral, we have
$$\int_C df = \int_C \sum_{i=1}^m f_{x_i}\,dx_i = \int_\alpha^\beta \left(f_{x_1}\frac{dx_1}{dt} + \dots + f_{x_m}\frac{dx_m}{dt}\right)dt = \int_\alpha^\beta g'(t)\,dt = g(\beta) - g(\alpha) = f(b) - f(a),$$
which is what was needed. :)
Corollary 10.3 If a vector field $\mathbf{F}$ is the gradient of some function, then its work over any curve depends only on the endpoints of the curve and does not depend on its shape.
Conservative vector fields
Definition 10.4 A vector field in $\mathbb{R}^m$ is called conservative if it equals the gradient of some function $f: \mathbb{R}^m \to \mathbb{R}$. The function $f$ is called a potential.
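Potentials are very important in physics. For example, the potential of the gravitational force is the potential energy, and the potential of the electric field is the voltage (physicists usually write $\mathbf{F} = -\operatorname{grad} U$, so these agree with our potential up to sign).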
For conservative vector fields, the line integral is path-independent: its value is determined by the endpoints of the curve and does not depend on its actual shape. Assume that $\mathbf{F}$ is a conservative field and a curve $C$ initiates at a point $a \in \mathbb{R}^m$ and terminates at a point $b \in \mathbb{R}^m$. Then we write
$$\int_a^b \mathbf{F} \quad\text{instead of}\quad \int_C \mathbf{F}.$$
On the other hand, if the line integral of a vector field $\mathbf{F}$ is path-independent, then the vector field is conservative. Indeed, consider for simplicity a vector field defined in the whole $\mathbb{R}^2$. Put
$$f(x, y) = \int_{(0,0)}^{(x,y)} P\,dx + Q\,dy,$$
which is well-defined because the line integral is path-independent, so we can integrate over any path from $(0, 0)$ to $(x, y)$: the result is the same. It's not hard to see that $f_x = P$ and $f_y = Q$.
Exercise 10.1 Prove rigorously that $f_x = P$ and $f_y = Q$ by choosing a specific path from $(0, 0)$ to $(x, y)$ and performing the integration.
It's easy to modify this idea for vector fields in $\mathbb{R}^m$. The result is
Theorem 10.5 A vector field in $\mathbb{R}^m$ is conservative if and only if its line integral is path-independent.
Example 10.6 Let's evaluate
$$\int_{(1,-2)}^{(5,3)} y\,dx + x\,dy.$$
First, we need to find the potential $f: \mathbb{R}^2 \to \mathbb{R}$. Since $f_x = y$, we get $f = xy + C(y)$ and $f_y = x + C'(y)$. On the other hand, $f_y = x$. Thus $C'(y) = 0$, so we can put $C = 0$ and $f(x, y) = xy$. Finally,
$$\int_{(1,-2)}^{(5,3)} y\,dx + x\,dy = \big[xy\big]_{(1,-2)}^{(5,3)} = 15 - (-2) = 17.$$
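To see path-independence in Example 10.6 concretely, the following sketch integrates $y\,dx + x\,dy$ along two paths with the same endpoints. One is the straight segment from $(1, -2)$ to $(5, 3)$. The other is a parabolic arc $x = 1 + 4t^2$, $y = -2 + 5t$. The arc and the helper names are hypothetical choices made for illustration. Both values agree with $[xy]_{(1,-2)}^{(5,3)} = 17$.

```python
def simpson(g, a=0.0, b=1.0, n=200):
    """Composite Simpson rule on [a, b] (n even); exact for cubic polynomials up to rounding."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


def work(P, Q, path, velocity):
    """Integral of P dx + Q dy along path(t), 0 <= t <= 1, where velocity(t) = path'(t)."""
    def g(t):
        x, y = path(t)
        dx, dy = velocity(t)
        return P(x, y) * dx + Q(x, y) * dy
    return simpson(g)


def P(x, y):
    return y


def Q(x, y):
    return x


# Straight segment from (1, -2) to (5, 3).
straight = work(P, Q, lambda t: (1 + 4 * t, -2 + 5 * t), lambda t: (4.0, 5.0))
# A parabolic arc with the same endpoints.
curved = work(P, Q, lambda t: (1 + 4 * t * t, -2 + 5 * t), lambda t: (8 * t, 5.0))
print(straight, curved)  # both equal 17 up to rounding
```

Both integrands are polynomials in $t$ of degree at most 3, so Simpson's rule is exact here up to rounding.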
Example 10.7 Let's find
$$\int_{(1,-2)}^{(5,3)} y^2\,dx + x^2\,dy. \tag{10.2}$$
What is the potential $f: \mathbb{R}^2 \to \mathbb{R}$? We have $f_x = y^2$ and hence $f = xy^2 + C(y)$, so $f_y = 2xy + C'(y)$. On the other hand, $f_y = x^2$. Thus $C'(y) = x^2 - 2xy$, which doesn't make sense because $C$ is a function of $y$ only: it cannot contain $x$. Therefore the field $y^2\mathbf{i} + x^2\mathbf{j}$ is not conservative and the expression (10.2) does not make sense. If you have such a question on the exam, you can immediately write to the Straits Times and complain about your calculus lecturer.
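By contrast, for the field $y^2\mathbf{i} + x^2\mathbf{j}$ of Example 10.7 the value does depend on the path. This sketch uses the same two hypothetical paths from $(1, -2)$ to $(5, 3)$ as for Example 10.6. The straight segment gives $61$ and the parabolic arc gives $47$; both values can be confirmed by integrating the polynomials by hand.

```python
def simpson(g, a=0.0, b=1.0, n=200):
    """Composite Simpson rule on [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


def work(P, Q, path, velocity):
    """Integral of P dx + Q dy along path(t), 0 <= t <= 1, where velocity(t) = path'(t)."""
    def g(t):
        x, y = path(t)
        dx, dy = velocity(t)
        return P(x, y) * dx + Q(x, y) * dy
    return simpson(g)


def P(x, y):
    return y ** 2


def Q(x, y):
    return x ** 2


straight = work(P, Q, lambda t: (1 + 4 * t, -2 + 5 * t), lambda t: (4.0, 5.0))
curved = work(P, Q, lambda t: (1 + 4 * t * t, -2 + 5 * t), lambda t: (8 * t, 5.0))
print(straight, curved)  # about 61 and 47: different paths, different values
```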
Example 10.8 Let's compute
$$\int_C x\,dx + y\,dy,$$
where $C$ is the spiral given by $r = \theta$, $0 \le \theta \le 4\pi$. First, we need to find the potential. We have $f_x = x$, so $f(x, y) = \frac{x^2}{2} + C(y)$. Thus $f_y = C'(y) = y$, so $C(y) = \frac{y^2}{2}$ and we get $f(x, y) = \frac{x^2}{2} + \frac{y^2}{2}$. Finally,
$$\int_C x\,dx + y\,dy = f(4\pi, 0) - f(0, 0) = 8\pi^2.$$
Generally, to find the potential of a vector field $(P_1, P_2, \dots, P_m)$, we just repeat integration in the following manner:
(a) Integrate $P_1$ with respect to $x_1$ as $f = g(x_1, \dots, x_m) + C(x_2, \dots, x_m)$.
(b) Differentiate with respect to $x_2$: $f_{x_2} = g_{x_2} + C_{x_2} = P_2$.
(c) Integrate $C_{x_2} = P_2 - g_{x_2}$ with respect to $x_2$: $C = h - g + D(x_3, \dots, x_m)$, where $h$ is an antiderivative of $P_2$ with respect to $x_2$, etc.
If it's not consistent, that is, the expression we get has more variables than it is meant to have, then the vector field is not conservative.
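The procedure (a)-(c) is done by hand. Once it produces a candidate potential, the candidate can be screened numerically. The sketch below uses hypothetical helper names `gradient` and `looks_like_potential`, and central differences with step $10^{-6}$ at random sample points are further assumptions. It checks whether $\operatorname{grad} f$ agrees with the given field. Passing such a test is evidence, not a proof.

```python
import random


def gradient(f, x, y, h=1e-6):
    """Central-difference approximation of (f_x, f_y) at (x, y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy


def looks_like_potential(f, P, Q, trials=100, tol=1e-5, seed=0):
    """Return True if grad f agrees with (P, Q) at random points of [-5, 5]^2."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-5, 5), rng.uniform(-5, 5)
        fx, fy = gradient(f, x, y)
        if abs(fx - P(x, y)) > tol or abs(fy - Q(x, y)) > tol:
            return False
    return True


# Example 10.8: f = x^2/2 + y^2/2 for the field x i + y j.
ok_10_8 = looks_like_potential(lambda x, y: x**2 / 2 + y**2 / 2, lambda x, y: x, lambda x, y: y)
# Example 10.6: f = xy for the field y i + x j.
ok_10_6 = looks_like_potential(lambda x, y: x * y, lambda x, y: y, lambda x, y: x)
# Example 10.7: step (a) gives f = x y^2, which fails for y^2 i + x^2 j.
ok_10_7 = looks_like_potential(lambda x, y: x * y**2, lambda x, y: y**2, lambda x, y: x**2)
print(ok_10_8, ok_10_6, ok_10_7)  # True True False
```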
Figure 10.1: Piecewise smooth curve
Figure 10.2: Simple smooth curve
Figure 10.3: Closed non-simple smooth curve
10.2 Green's Theorem
Curves
We have defined what a smooth curve is. More generally, a piecewise smooth curve $C$ consists of finitely many smooth pieces $C_1, C_2, \dots, C_k$ (see Figure 10.1). By definition, put
$$\int_C f\,ds = \sum_{i=1}^k \int_{C_i} f\,ds, \qquad \int_C \mathbf{V} = \sum_{i=1}^k \int_{C_i} \mathbf{V}.$$
Further, a simple curve does not intersect itself (see Figure 10.2). Finally, recall that a closed curve (Figure 10.3) is given by a periodic function. Specifically, this means that a closed curve initiates and terminates at the same point.
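Definition 10.9 A curve $C$ given by $\mathbf{x} = \mathbf{r}(t)$, $\mathbf{r}: [\alpha, \beta] \to \mathbb{R}^m$ is
(i) Smooth if $\mathbf{r}'$ is always continuous and never vanishes,
(ii) Piecewise smooth if $\mathbf{r}'$ may equal zero or fail to be continuous at finitely many points $t_1, \dots, t_{k-1} \in [\alpha, \beta]$,
(iii) Closed if $\mathbf{r}(\alpha) = \mathbf{r}(\beta)$ (if it is smooth closed, then $\mathbf{r}'(\alpha) = \mathbf{r}'(\beta)$ is also required),
(iv) Simple if $\mathbf{r}: [\alpha, \beta) \to \mathbb{R}^m$ is one-to-one.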
For us, a curve is always oriented, that is, we know the initial point, the
terminal point, and the direction.
All the above stuff about curves is relevant to curves in $\mathbb{R}^m$. From now on we consider curves in $\mathbb{R}^2$. Intuitively, it seems obvious that a simple closed curve divides $\mathbb{R}^2$ into two parts: the interior and the exterior. However, a rigorous proof of this statement (the Jordan Curve Theorem) requires topological techniques much more advanced than the mere Intermediate Value Theorem, so we'll just believe that it is so.
Definition 10.10 A region $D \subset \mathbb{R}^2$ is connected if any two points in it can be joined by a curve lying in $D$. A region $D \subset \mathbb{R}^2$ is called simply connected if, additionally, for any simple closed curve $C \subset D$, all points inside $C$ also belong to $D$.
Roughly speaking, the fact that $D$ is connected means that $D$ consists of a single piece, like a disk, an annulus, the whole plane etc. A non-connected region has several components. For example, the region defined by $xy \ne 0$ has four connected components corresponding to different choices of signs for $x$ and $y$. The fact that $D$ is simply connected means that it doesn't have any holes. For example, the unit disk $x^2 + y^2 < 1$ is simply connected, but the disk without the origin $0 < x^2 + y^2 < 1$ is not. Generally, any region enclosed by a simple closed curve is simply connected, but there are also simply connected unbounded regions whose boundary consists of several components, such as the strip $0 < y < 1$.
Figure 10.4: Simply connected region
Convention on orientation
Given a region $D \subset \mathbb{R}^2$ bounded by one or several piecewise smooth curves $C_1, \dots, C_k$, denote by $\partial D$ the union of these curves endowed with the orientation such that, when moving along each curve, the region remains on the left (see Figure 10.5). In other words, the outer boundary component is oriented positively (counterclockwise) while the inner ones are oriented negatively (clockwise).
Definition 10.11 For any region $D \subset \mathbb{R}^2$ whose boundary $\partial D$ is a union of several piecewise smooth curves oriented so that the region remains on the left when moving along each curve, the notation
$$\int_{\partial D} \mathbf{V}$$
means the sum of the line integrals over all oriented components of $\partial D$.
Figure 10.5: Orientation of the boundary of a region
The notation $\oint$ means that the integral is taken over a curve that bounds some region in $\mathbb{R}^2$ and the curve is oriented in the conventional way, that is, counterclockwise for the outer component and clockwise for the inner ones. Also, one uses $\ointctrclockwise$ for the integral over a simple closed curve oriented counterclockwise and $\ointclockwise$ for a simple closed curve oriented clockwise.
Green's Theorem
Green's formula relates the integral of a vector field along the boundary of a region and the integral of its derivative over the whole region. It is similar to the Newton-Leibniz Theorem in the form (10.1).
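Theorem 10.12 For any vector field $\mathbf{V} = P\mathbf{i} + Q\mathbf{j}$ whose components have continuous partial derivatives and any bounded open region $D \subset \mathbb{R}^2$ whose boundary is a union of several piecewise smooth curves, we have
$$\oint_{\partial D} P\,dx + Q\,dy = \iint_D \left(Q_x - P_y\right)dx\,dy. \tag{10.3}$$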
Proof First of all, the theorem is equivalent to the two equations
$$\oint_{\partial D} P\,dx = -\iint_D P_y\,dx\,dy, \qquad \oint_{\partial D} Q\,dy = \iint_D Q_x\,dx\,dy \tag{10.4}$$
considered separately. Indeed, the sum of equations (10.4) is the required (10.3). Conversely, if (10.3) is true for any $P$ and $Q$, then in particular, it is true for $P = 0$ and for $Q = 0$, which is (10.4).
Now the idea of the proof is as follows. Since $P_y$ is easy to integrate with respect to $y$ and $Q_x$ is easy to integrate with respect to $x$, we have to present $D$ in a way suitable for both integrations. The whole proof can be done in three stages:
(a) Prove (10.4) for regions where we can reverse the order of integration.
(b) Prove that if (10.3) is true on each of two regions $D_1$ and $D_2$, then it is true on their union $D_1 \cup D_2$ (it is something like an inductive step).
(c) Prove that any region can be divided into simple regions. :)
The following three lemmas are these (a)-(c). Unfortunately, (c) is too hard, so we only prove (a) and (b) rigorously and give a basic idea for (c).
Definition 10.13 A simple region is one that can be defined at the same time as $a \le x \le b$, $f_1(x) \le y \le f_2(x)$ and as $c \le y \le d$, $g_1(y) \le x \le g_2(y)$.
In other words, a simple region is anything like in the numerous exercises on reversing the order of integration in $dx$ and $dy$.
Lemma 10.14 For a simple region $D$, we have
$$\oint_{\partial D} P\,dx = -\iint_D P_y\,dx\,dy \quad\text{and}\quad \oint_{\partial D} Q\,dy = \iint_D Q_x\,dx\,dy.$$
Proof First,
$$-\iint_D P_y\,dy\,dx = -\int_a^b dx \int_{f_1(x)}^{f_2(x)} P_y(x, y)\,dy = \int_a^b P(x, f_1(x))\,dx - \int_a^b P(x, f_2(x))\,dx.$$
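Further, in order to find $\oint_{\partial D} P\,dx$, we need to parameterize $\partial D$. It has four components: $x = a$, $x = b$, $y = f_1(x)$, $y = f_2(x)$. For $x = a$ and $x = b$, we have $dx = 0$ and hence the integral is $0$. The lower curve $y = f_1(x)$ is traversed from left to right and the upper curve $y = f_2(x)$ from right to left, so that $D$ stays on the left. Thus,
$$\oint_{\partial D} P\,dx = \int_a^b P(x, f_1(x))\,dx + \int_b^a P(x, f_2(x))\,dx = -\iint_D P_y\,dy\,dx,$$
so the first equation is proved.
Figure 10.6: A region divided into two parts $D_1$ and $D_2$ (boundary pieces $C_1$, $C_2$, $C_3$)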
For the second one, we have
$$\iint_D Q_x\,dx\,dy = \int_c^d dy \int_{g_1(y)}^{g_2(y)} Q_x(x, y)\,dx = \int_c^d Q(g_2(y), y)\,dy - \int_c^d Q(g_1(y), y)\,dy.$$
Now $\partial D$ again has four components. For $y = c$ and $y = d$, the integral is zero because $dy = 0$. Thus,
$$\oint_{\partial D} Q\,dy = \int_d^c Q(g_1(y), y)\,dy + \int_c^d Q(g_2(y), y)\,dy = \iint_D Q_x\,dx\,dy.$$
:)
Lemma 10.15 Assume that a region $D \subset \mathbb{R}^2$ is a union of regions $D_1$ and $D_2$ that meet only along a common part of their boundaries, and assume that Green's theorem holds for each of the regions $D_1$ and $D_2$. Then Green's theorem holds for the whole region $D$.
Proof As shown in Figure 10.6, the boundaries of the regions $D_1$ and $D_2$ consist of components $C_1$, $C_2$, $C_3$ such that $\partial D_1 = C_1 + C_2$ and $\partial D_2 = -C_2 + C_3$. It means that $C_2$ has different orientations when being considered as a part of $\partial D_1$ and $\partial D_2$ respectively. Thus,
$$\oint_{\partial D} P\,dx + Q\,dy = \int_{C_1} P\,dx + Q\,dy + \int_{C_3} P\,dx + Q\,dy = \left(\int_{C_1} + \int_{C_2} - \int_{C_2} + \int_{C_3}\right) P\,dx + Q\,dy = \left(\oint_{\partial D_1} + \oint_{\partial D_2}\right) P\,dx + Q\,dy.$$
But for $D_1$ and $D_2$ Green's theorem is supposed to be true, so we have
$$\oint_{\partial D_1} P\,dx + Q\,dy + \oint_{\partial D_2} P\,dx + Q\,dy = \iint_{D_1}\left(Q_x - P_y\right)dx\,dy + \iint_{D_2}\left(Q_x - P_y\right)dx\,dy = \iint_D \left(Q_x - P_y\right)dx\,dy$$
due to additivity of the integral. :)
Figure 10.7: A region divided into simple parts
Lemma 10.16 Any region $D \subset \mathbb{R}^2$ with piecewise smooth boundary $\partial D$ can be divided into simple regions.
The idea of the proof is illustrated in Figure 10.7. This is, in fact, the hardest part of Green's Theorem and a careful proof requires some advanced techniques, so we skip the detailed justification.
Remark 10.17 If $D$ is a simply connected region with boundary $C$, then $C$ has just one component with counterclockwise orientation. Green's theorem in this case can be stated as
$$\oint_C P\,dx + Q\,dy = \iint_D \left(Q_x - P_y\right)dx\,dy.$$
Example 10.18 Let's evaluate the line integral
$$\oint_{x^2+y^2=1} \left(-x^2 y + e^y\right)dx + \left(xy^2 + xe^y\right)dy.$$
We have $Q_x - P_y = (y^2 + e^y) - (-x^2 + e^y) = x^2 + y^2$. By Green's formula, we have
$$\oint_{x^2+y^2=1} \left(-x^2 y + e^y\right)dx + \left(xy^2 + xe^y\right)dy = \iint_{x^2+y^2\le 1} \left(x^2 + y^2\right)dx\,dy.$$
Changing to polar coordinates, we obtain
$$\int_0^{2\pi} d\theta \int_0^1 r^3\,dr = 2\pi\cdot\frac{1}{4} = \frac{\pi}{2}.$$
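Green's formula in Example 10.18 can be checked by computing both sides numerically, taking $P = -x^2 y + e^y$ and $Q = xy^2 + xe^y$ as in the example. This sketch, whose `simpson` helper is an illustrative assumption, evaluates the line integral over the counterclockwise unit circle directly. It computes the double integral in polar coordinates. Both come out as $\pi/2$.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


def P(x, y):
    return -x**2 * y + math.exp(y)


def Q(x, y):
    return x * y**2 + x * math.exp(y)


def circulation_density(t):
    """P dx/dt + Q dy/dt on the unit circle x = cos t, y = sin t (counterclockwise)."""
    x, y = math.cos(t), math.sin(t)
    return P(x, y) * (-math.sin(t)) + Q(x, y) * math.cos(t)


line_integral = simpson(circulation_density, 0.0, 2 * math.pi)
# Right-hand side in polar coordinates: Q_x - P_y = r^2, times the Jacobian r.
double_integral = 2 * math.pi * simpson(lambda r: r**3, 0.0, 1.0)
print(line_integral, double_integral, math.pi / 2)
```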
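Example 10.19 Let's find $\oint_C (x + y)\,dx - (x - y)\,dy$, where $C$ is the ellipse given by $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, oriented counterclockwise. Here we have $Q_x - P_y = -1 - 1 = -2$. Further,
$$\oint_C (x + y)\,dx - (x - y)\,dy = \iint_{\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1} (-2)\,dx\,dy = -2\,\mathrm{Area}\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1\right) = -2\pi ab.$$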
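Here is a numerical check of Example 10.19 for the sample values $a = 3$, $b = 2$, which are chosen only for illustration. The ellipse is parameterized counterclockwise as $x = a\cos t$, $y = b\sin t$, so the result should be $-2\pi ab = -12\pi$.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


def ellipse_integral(a, b, n=2000):
    """(x + y) dx - (x - y) dy over x = a cos t, y = b sin t, 0 <= t <= 2*pi (counterclockwise)."""
    def g(t):
        x, y = a * math.cos(t), b * math.sin(t)
        dx, dy = -a * math.sin(t), b * math.cos(t)
        return (x + y) * dx - (x - y) * dy
    return simpson(g, 0.0, 2 * math.pi, n)


value = ellipse_integral(3.0, 2.0)
print(value, -2 * math.pi * 3.0 * 2.0)
```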
Potential in a simply connected region


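If a vector field $\mathbf{F} = P\mathbf{i} + Q\mathbf{j}$ in $\mathbb{R}^2$ is conservative, then $P = f_x$ and $Q = f_y$. Hence $Q_x - P_y = f_{yx} - f_{xy} = 0$ (provided the second partial derivatives of $f$ are continuous). Conversely, is it true that $Q_x - P_y = 0$ implies that the vector field is conservative?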
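The condition $Q_x - P_y = 0$ is easy to test numerically. The sketch below uses a hypothetical helper `scalar_curl` with central differences of step $10^{-5}$. It evaluates the condition at a few sample points for the fields of Examples 10.6 and 10.7. A nonzero value anywhere rules out a potential. A zero value everywhere is only a necessary condition, and the converse is exactly the question posed above.

```python
def scalar_curl(P, Q, x, y, h=1e-5):
    """Central-difference approximation of Q_x - P_y at the point (x, y)."""
    q_x = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    p_y = (P(x, y + h) - P(x, y - h)) / (2 * h)
    return q_x - p_y


def p_10_6(x, y):
    return y


def q_10_6(x, y):
    return x


def p_10_7(x, y):
    return y ** 2


def q_10_7(x, y):
    return x ** 2


points = [(0.5, -1.0), (2.0, 3.0), (-1.5, 0.25)]
# Example 10.6: Q_x - P_y = 1 - 1 = 0 everywhere.
curl_10_6 = [scalar_curl(p_10_6, q_10_6, x, y) for x, y in points]
# Example 10.7: Q_x - P_y = 2x - 2y, nonzero at these points.
curl_10_7 = [scalar_curl(p_10_7, q_10_7, x, y) for x, y in points]
print(curl_10_6)
print(curl_10_7)  # approximately [3.0, -2.0, -3.5]
```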
Theorem 10.20 Consider a vector field $\mathbf{F} = P\mathbf{i} + Q\mathbf{j}$ defined in a simply connected region $D \subset \mathbb{R}^2$ and suppose that $Q_x - P_y = 0$. Then
(i) $\oint_S P\,dx + Q\,dy = 0$ for any closed curve $S \subset D$,
(ii) the line integral of $\mathbf{F}$ is path-independent,
(iii) $\mathbf{F}$ is conservative.
Proof First, given a closed curve $S$, let $A$ be the region enclosed by it. By Green's Theorem, we have
$$\oint_S P\,dx + Q\,dy = \iint_A \left(Q_x - P_y\right)dx\,dy = \iint_A 0\,dx\,dy = 0.$$
We can apply Green's Theorem here because $A \subset D$ due to the assumption that $D$ is simply connected.
Second, given two curves $C_1$ and $C_2$ with the same initial and terminal points, together they form a closed curve $S$, but in order for $S$ to be oriented counterclockwise, the curves $C_1$ and $C_2$ must be traversed in opposite directions, that is,
$$\int_{C_1} P\,dx + Q\,dy - \int_{C_2} P\,dx + Q\,dy = \oint_S P\,dx + Q\,dy = 0,$$
so the line integral is, indeed, path-independent and hence $\mathbf{F}$ is conservative by Theorem 10.5. :)
The assumption that the region is simply connected is very important here
as the following example shows.
Example 10.21 Consider the field
$$-\frac{y}{x^2 + y^2}\,\mathbf{i} + \frac{x}{x^2 + y^2}\,\mathbf{j}$$
defined in the region $D = \mathbb{R}^2 \setminus \{(0, 0)\}$, which is not simply connected. For instance, the unit circle $x^2 + y^2 = 1$ surrounds the point $(0, 0)$, which does not belong to $D$. It's easy to check that $Q_x - P_y = 0$.
Let's find the integral
$$\oint_{x^2+y^2=1} \frac{-y\,dx}{x^2 + y^2} + \frac{x\,dy}{x^2 + y^2}.$$
We cannot apply Green's Theorem because $P$ and $Q$ are not defined at a point inside the given curve, so we have to find the line integral in the usual manner. The circle is parameterized by $x = \cos t$, $y = \sin t$, so the integral is
$$\int_0^{2\pi} \left(\frac{-\sin t\,(-\sin t)}{\sin^2 t + \cos^2 t} + \frac{\cos t\cos t}{\cos^2 t + \sin^2 t}\right)dt = \int_0^{2\pi} dt = 2\pi,$$
which could not happen in a simply connected region.
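Example 10.21 can be reproduced numerically. The helper names and the second circle in this sketch are illustrative assumptions. The unit circle around the origin gives $2\pi$, while a circle of radius 1 centred at $(3, 0)$ gives $0$. That circle lies in the half-plane $x > 0$, which is simply connected and does not contain the origin, so Theorem 10.20 applies there.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * g(a + k * h)
    return total * h / 3


def angle_form_integral(cx, cy, rho, n=2000):
    """(-y dx + x dy)/(x^2 + y^2) over the circle of radius rho about (cx, cy), counterclockwise."""
    def g(t):
        x, y = cx + rho * math.cos(t), cy + rho * math.sin(t)
        dx, dy = -rho * math.sin(t), rho * math.cos(t)
        return (-y * dx + x * dy) / (x * x + y * y)
    return simpson(g, 0.0, 2 * math.pi, n)


around_origin = angle_form_integral(0.0, 0.0, 1.0)  # the unit circle of Example 10.21
away_from_origin = angle_form_integral(3.0, 0.0, 1.0)  # a circle that does not surround (0, 0)
print(around_origin, 2 * math.pi, away_from_origin)
```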
Given a vector field $P\mathbf{i} + Q\mathbf{j}$ in $\mathbb{R}^2$, we can consider it in $\mathbb{R}^3$ by adding the zero third component (with $P$ and $Q$ independent of $z$). Notice that $\operatorname{curl}(P\mathbf{i} + Q\mathbf{j} + 0\mathbf{k}) = (Q_x - P_y)\mathbf{k}$. Thus $Q_x - P_y = 0$ if and only if the curl of this vector field is zero. Such vector fields are called irrotational. We see that irrotational vector fields are the same as conservative ones in a simply connected region, but for a region with holes there exist irrotational vector fields that are not conservative.
In fact, the difference between irrotational and conservative vector fields measures the number of holes in a region. Specifically, let $I(D)$ be the space of irrotational vector fields in a region $D \subset \mathbb{R}^2$ and let $C(D)$ be the space of conservative vector fields. Then $\dim I(D)/C(D)$ is exactly the number of holes in $D$.
10.3 Exercises
Questions on understanding the lecture
Exercise 10.2 The Newton-Leibniz formula $\int_a^b df = f(b) - f(a)$ was proved in lectures only for a smooth curve $C$. Does the formula hold for piecewise smooth curves?
Exercise 10.3 Give an example of a curve that is
(i) Smooth, simple, closed;
(ii) Piecewise smooth, simple, closed;
(iii) Smooth, not simple, closed;
(iv) Piecewise smooth, not simple, closed;
(v) Smooth, simple, not closed;
(vi) Piecewise smooth, simple, not closed;
(vii) Smooth, not simple, not closed;
(viii) Piecewise smooth, not simple, not closed.
Exercise 10.4 Prove that the work of a conservative vector field over any piecewise smooth closed curve is zero.
Exercise 10.5 Give an example of a region $D \subset \mathbb{R}^2$ that can be given as $a \le x \le b$, $f_1(x) \le y \le f_2(x)$, but not as $c \le y \le d$, $g_1(y) \le x \le g_2(y)$.
Exercise 10.6 Prove that the work of a vector field is path-independent, which means that it depends only on the endpoints of the curve (and does not depend on its actual shape), if and only if the work over any closed curve is zero.
Exercise 10.7 Prove that a vector field is conservative if and only if its work over any closed curve is zero.
Exercise 10.8 Prove that if a vector field $\mathbf{v} = P\mathbf{i} + Q\mathbf{j}$ is conservative, then it is irrotational.
Questions on calculation
Exercise 10.9 Evaluate
$$\int_{(x_1,y_1,z_1)}^{(x_2,y_2,z_2)} \frac{x\,dx + y\,dy + z\,dz}{\sqrt{x^2 + y^2 + z^2}},$$
where the point $(x_1, y_1, z_1)$ lies on the sphere $x^2 + y^2 + z^2 = a^2$ and the point $(x_2, y_2, z_2)$ is on the sphere $x^2 + y^2 + z^2 = b^2$ for $a > 0$ and $b > 0$.
Exercise 10.10 (i) Let $C$ be a simple closed curve oriented counterclockwise. Prove that the area inside $C$ equals
$$\oint_C x\,dy = -\oint_C y\,dx = \frac{1}{2}\oint_C x\,dy - y\,dx.$$
(ii) Apply this formula to find the area enclosed by the astroid $x^{2/3} + y^{2/3} = a^{2/3}$.
Exercise 10.11 Evaluate
$$\oint_{x^2+y^2=R^2} e^{x^2 - y^2}\left(\cos 2xy\,dx + \sin 2xy\,dy\right).$$
Questions on logical thinking
Exercise 10.12 Let $\mathbf{v} = P\mathbf{i} + Q\mathbf{j}$ be an irrotational vector field in a simply connected region $D \subset \mathbb{R}^2$. Green's theorem implies that the line integral $\int_C P\,dx + Q\,dy$ is then path-independent and hence
$$f(x, y) = \int_{(x_0,y_0)}^{(x,y)} P\,dx + Q\,dy$$
is a well-defined function of $(x, y)$, where $(x_0, y_0) \in D$ is any fixed point. Prove that $f(x, y)$ is the potential of the vector field $\mathbf{v}$. You may for simplicity assume that $D$ is a disk centred at $(x_0, y_0) = (0, 0)$.
Exercise 10.13 Evaluate the integral $\oint_C \frac{x\,dy - y\,dx}{x^2 + y^2}$, where
(a) $C$ does not enclose the origin,
(b) $C$ goes once around the origin,
(c) $C$ goes $n$ times around the origin.
Does the vector field $\frac{x\mathbf{j} - y\mathbf{i}}{x^2 + y^2}$ have a potential? Try to find it in the usual way.
Lecture 11
Surface Integral
11.1 Parametric surfaces
We know how to calculate the length of a curve. Let us try to find a similar formula for the area of a surface.
Definition 11.1 Given a map $\mathbf{r}: \mathbb{R}^2 \to \mathbb{R}^3$ and a region $D \subset \mathbb{R}^2$, the image $S = \mathbf{r}(D) \subset \mathbb{R}^3$ is called a surface.
Let $u$, $v$ be the coordinates in the space $\mathbb{R}^2$ and let $x$, $y$, $z$ be the coordinates in $\mathbb{R}^3$. Then the component functions $x = x(u, v)$, $y = y(u, v)$, $z = z(u, v)$ of the map $\mathbf{r}$ are called parametric equations of the surface $S$.
In our course we assume that $\mathbf{r}$ has continuous partial derivatives and $D \subset \mathbb{R}^2$ is some region whose boundary is a union of piecewise smooth curves.
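Example 11.2 The unit sphere is given by the spherical equation $\rho = 1$. Thus the parametric equations would be
$$x = \sin\varphi\cos\theta, \quad y = \sin\varphi\sin\theta, \quad z = \cos\varphi.$$
Here, $0 \le \theta \le 2\pi$ and $0 \le \varphi \le \pi$. Thus $D = [0, 2\pi] \times [0, \pi]$ is a rectangle. The map $\mathbf{r}: D \to \mathbb{R}^3$ is given by
$$\begin{pmatrix}\theta\\ \varphi\end{pmatrix} \overset{\mathbf{r}}{\longmapsto} \begin{pmatrix}\sin\varphi\cos\theta\\ \sin\varphi\sin\theta\\ \cos\varphi\end{pmatrix}.$$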
Figure 11.1: Cylinder $x^2 + y^2 = 1$, $0 \le z \le 1$.
Figure 11.2: Hyperboloid of one sheet $x^2 + y^2 - z^2 = 1$
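Example 11.3 Let's consider the cylinder $x^2 + y^2 = 1$, $0 \le z \le 1$ shown in Figure 11.1. Its parametric equations are
$$x = \cos\theta, \quad y = \sin\theta, \quad z = z.$$
Here, $0 \le \theta \le 2\pi$ and $0 \le z \le 1$. Thus $D = [0, 2\pi] \times [0, 1]$ is a rectangle. The map $\mathbf{r}: D \to \mathbb{R}^3$ is given by
$$\begin{pmatrix}\theta\\ z\end{pmatrix} \overset{\mathbf{r}}{\longmapsto} \begin{pmatrix}\cos\theta\\ \sin\theta\\ z\end{pmatrix}.$$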
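Example 11.4 Let's use spherical coordinates to parameterize the hyperboloid of one sheet $x^2 + y^2 - z^2 = 1$ shown in Figure 11.2. The spherical equation is $\rho^2\left(\sin^2\varphi - \cos^2\varphi\right) = 1$, that is, $\rho = \frac{1}{\sqrt{-\cos 2\varphi}}$. Thus the parametric equations are
$$x = \frac{\sin\varphi\cos\theta}{\sqrt{-\cos 2\varphi}}, \quad y = \frac{\sin\varphi\sin\theta}{\sqrt{-\cos 2\varphi}}, \quad z = \frac{\cos\varphi}{\sqrt{-\cos 2\varphi}}.$$
What is the domain $D$ of the parameters $\theta$ and $\varphi$? We must have here $\cos 2\varphi < 0$, so $\frac{\pi}{4} < \varphi < \frac{3\pi}{4}$. Thus $D = [0, 2\pi] \times \left(\frac{\pi}{4}, \frac{3\pi}{4}\right)$ is also a rectangle, but not a closed one as in the first two examples.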
Example 11.5 Consider again the hyperboloid $x^2 + y^2 - z^2 = 1$. Notice that while the sum of squares is easy to parameterize with cosine and sine, in the same manner the difference of squares can be parameterized with hyperbolic functions. Thus the parametric equations would be
$$x = \cosh u\cos\theta, \quad y = \cosh u\sin\theta, \quad z = \sinh u.$$
Here we have $0 \le \theta \le 2\pi$ and $-\infty < u < +\infty$. The region $D = [0, 2\pi] \times \mathbb{R}$ is an infinite strip.
Boundary of a surface
Definition 11.6 Given a surface $S \subset \mathbb{R}^3$, its boundary is the curve $C = \partial S$ such that
(i) $C$ lies in the surface $S$,
(ii) but $S$ is located on one side of $C$.
If the surface is bounded and has no boundary, then it's called closed.
We distinguish bounded surfaces and surfaces with boundary. A bounded surface is located in some finite region; for example, the whole surface lies inside a large cube $-C \le x, y, z \le C$. A surface with boundary is one whose boundary consists of at least one simple curve.
Example 11.7 The sphere $x^2 + y^2 + z^2 = 1$ is bounded because its equation implies that $-1 \le x, y, z \le 1$. The whole hyperboloid $x^2 + y^2 - z^2 = 1$ is not bounded because we can take an arbitrarily large $z$ and then set $y = 0$ and $x = \sqrt{z^2 + 1}$. However, we could consider only the part of the hyperboloid given by $-1 \le z \le 1$, and such a part would be bounded.
Exercise 11.1 On the contrary, explain why the part of the hyperboloid given by $x^2 + y^2 - z^2 = 1$, $-1 \le y \le 1$ is not bounded.
Figure 11.3: Closed surfaces
A bounded surface can have arbitrarily many boundary components. For instance, the sphere and the torus (Figure 11.3) are closed surfaces and have no boundary at all; the disk $x^2 + y^2 \le 1$, $z = 0$ has one boundary component, the circle $x^2 + y^2 = 1$, $z = 0$; the cylinder from Figure 11.1 has two boundary components etc.
Basically, a surface is given by a map $\mathbf{r}: D \to \mathbb{R}^3$. The region $D \subset \mathbb{R}^2$ has a boundary $\partial D$. Its image contains the boundary of the surface, that is, $\mathbf{r}(\partial D) \supset \partial S$, but it might be strictly bigger due to periodicity of the map $\mathbf{r}$.
Example 11.8 Consider again the cylinder $x = \cos\theta$, $y = \sin\theta$, $z = z$ for $0 \le \theta \le 2\pi$ and $0 \le z \le 1$ (Figure 11.1). Here $D = [0, 2\pi] \times [0, 1]$ is a rectangle whose boundary is one closed curve consisting of four edges. However, the parametric equations are periodic in $\theta$, so the edges $\theta = 0$ and $\theta = 2\pi$ are glued together in the actual surface, and the boundary of the surface has two components, $z = 0$ and $z = 1$.
Orientation
The next fundamental concept is orientation. To define it, we first need the notion of a smooth surface, which is similar to the notion of a smooth curve.
Definition 11.9 A smooth surface is the image of a map $\mathbf{r}: D \to \mathbb{R}^3$, where $D \subset \mathbb{R}^2$ is some region. Besides, we assume that
(i) $\mathbf{r}$ has continuous partial derivatives,
(ii) $\mathbf{N} = \mathbf{r}_u \times \mathbf{r}_v \ne 0$ inside the region $D$.
What does it mean geometrically? Differentiation means tangent, so $\mathbf{r}_u$ and $\mathbf{r}_v$ are tangent vectors to the surface. If they are linearly independent, then they generate the whole tangent space. But two vectors in $\mathbb{R}^3$ are linearly independent if and only if their cross product is not zero, so the condition $\mathbf{r}_u \times \mathbf{r}_v \ne 0$ means exactly that the tangent space to the surface is defined.
In this definition, $\mathbf{r}_u$ and $\mathbf{r}_v$ are tangent vectors to the surface, so $\mathbf{N} = \mathbf{r}_u \times \mathbf{r}_v$ is a normal vector to the surface.
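Example 11.10 Let's find the normal vector of the sphere parameterized with spherical coordinates $x = \sin\varphi\cos\theta$, $y = \sin\varphi\sin\theta$, $z = \cos\varphi$. We have
$$\mathbf{r}_\varphi \times \mathbf{r}_\theta = \det\begin{pmatrix}\mathbf{i} & \mathbf{j} & \mathbf{k}\\ \cos\varphi\cos\theta & \cos\varphi\sin\theta & -\sin\varphi\\ -\sin\varphi\sin\theta & \sin\varphi\cos\theta & 0\end{pmatrix} = \sin^2\varphi\cos\theta\,\mathbf{i} + \sin^2\varphi\sin\theta\,\mathbf{j} + \cos\varphi\sin\varphi\,\mathbf{k}.$$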
We see that $\mathbf{r}_\varphi \times \mathbf{r}_\theta = 0$ only at $\varphi = 0$ or $\varphi = \pi$, so this parametrization fails to be smooth at the points $(0, 0, \pm 1)$. Does it mean that the sphere has a cusp at these points? Of course not. We can use another parametrization, which will be smooth at these points. For instance, $y = \sin\varphi\cos\theta$, $z = \sin\varphi\sin\theta$, $x = \cos\varphi$ is also a parametrization of the same sphere, and this one fails to be smooth only at the points $(\pm 1, 0, 0)$.
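The cross product in Example 11.10 can be checked with finite differences. The sketch below uses hypothetical helpers `partials` and `cross`, with central differences of step $10^{-6}$. It compares the numerical $\mathbf{r}_\varphi \times \mathbf{r}_\theta$ with the formula at one sample point. It also confirms that the normal vanishes at the north pole $\varphi = 0$.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def partials(r, u, v, h=1e-6):
    """Central-difference approximations of r_u and r_v at (u, v)."""
    r_u = tuple((p - m) / (2 * h) for p, m in zip(r(u + h, v), r(u - h, v)))
    r_v = tuple((p - m) / (2 * h) for p, m in zip(r(u, v + h), r(u, v - h)))
    return r_u, r_v


def sphere(phi, theta):
    return (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))


phi, theta = 1.1, 0.4
N = cross(*partials(sphere, phi, theta))
formula = (math.sin(phi) ** 2 * math.cos(theta),
           math.sin(phi) ** 2 * math.sin(theta),
           math.cos(phi) * math.sin(phi))
N_pole = cross(*partials(sphere, 0.0, theta))  # phi = 0 is the point (0, 0, 1)
print(N, formula, N_pole)
```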
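Example 11.11 The upper hemisphere can also be parameterized as $x = x$, $y = y$, $z = \sqrt{1 - x^2 - y^2}$ for $x^2 + y^2 \le 1$. Let's find the normal vector for this parametrization. We have
$$\mathbf{r}_x = \begin{pmatrix}1\\ 0\\ -\frac{x}{\sqrt{1 - x^2 - y^2}}\end{pmatrix}, \qquad \mathbf{r}_y = \begin{pmatrix}0\\ 1\\ -\frac{y}{\sqrt{1 - x^2 - y^2}}\end{pmatrix}.$$
Thus,
$$\mathbf{r}_x \times \mathbf{r}_y = \det\begin{pmatrix}\mathbf{i} & \mathbf{j} & \mathbf{k}\\ 1 & 0 & -\frac{x}{\sqrt{1 - x^2 - y^2}}\\ 0 & 1 & -\frac{y}{\sqrt{1 - x^2 - y^2}}\end{pmatrix} = \frac{x}{\sqrt{1 - x^2 - y^2}}\,\mathbf{i} + \frac{y}{\sqrt{1 - x^2 - y^2}}\,\mathbf{j} + \mathbf{k},$$
which is never zero, but fails to be defined for $x^2 + y^2 = 1$. Thus these equations give a smooth parametrization of the open hemisphere, that is, the part of the upper hemisphere over the disk $x^2 + y^2 < 1$.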
Further, an orientation of a curve is a choice of one of its two directions. In the same manner, an orientation of a surface is a choice of one of its two sides. We can specify the direction along a curve by picking one of the two unit tangent vectors. In the same manner, a surface has two unit normal vectors at each point.
Figure 11.4: Two opposite orientations
Example 11.12 The sphere $x^2 + y^2 + z^2 = 1$ has two unit normal vectors at a point $(x, y, z)$: one points outside and another one inside. The former is $(x, y, z)$ and the latter is $(-x, -y, -z)$.
Definition 11.13 Given a smooth surface $S \subset \mathbb{R}^3$, its orientation is a continuous unit normal vector field $\mathbf{U}$ defined on $S$. If such $\mathbf{U}$ exists, then the surface is called orientable. An orientable surface always has two opposite orientations, $\mathbf{U}$ and $-\mathbf{U}$, as shown in Figure 11.4.
An orientation is the choice of either side of a surface. It might seem that all surfaces have two sides and therefore are orientable (like all curves are). Surprisingly, there are surfaces with only one side. The simplest of them is the Möbius band shown in Figure 11.5. It can be easily made from a paper strip: we just need to twist it and glue the ends. In fact, any non-orientable surface contains such a Möbius band as a subset.
The Möbius band is not closed: its boundary consists of one component. If we imagine that the whole thing is made of some kind of elastic material and glue the whole boundary into a point, we get a so-called projective plane. The projective plane is a closed non-orientable surface, but it can only be realized in $\mathbb{R}^3$ with self-intersections. In fact, any closed surface in $\mathbb{R}^3$ without self-intersections is orientable.
Figure 11.5: The Möbius strip is a non-orientable surface
Figure 11.6: A rectangle on the surface is composed of two triangles
11.2 Surface integral
Area of a surface
First, consider an arbitrary surface $\mathbf{r}(D)$, where $D = [a, b] \times [c, d]$ is a rectangle. Let's divide $D$ into $n^2$ small rectangles. In particular, consider partitions $a = u_0 < u_1 < \dots < u_n = b$ and $c = v_0 < v_1 < \dots < v_n = d$ and put $\Delta u_i = u_i - u_{i-1}$ and $\Delta v_i = v_i - v_{i-1}$ for $i = 1, 2, \dots, n$. Let $\Delta u$ and $\Delta v$ be the maximal values of all $\Delta u_i$ and all $\Delta v_j$ respectively.
Let's look at one of these small rectangles $[u_{i-1}, u_i] \times [v_{j-1}, v_j]$. It has four vertices in the plane $\mathbb{R}^2_{u,v}$. The corresponding points on the actual surface might not be vertices of any rectangle and they might not even lie in one plane. However, we can always consider them as two triangles, as shown in Figure 11.6. Let
$$A_{i,j} = \mathrm{Area}\left[\mathbf{r}(u_{i-1}, v_{j-1}), \mathbf{r}(u_{i-1}, v_j), \mathbf{r}(u_i, v_{j-1})\right],$$
$$B_{i,j} = \mathrm{Area}\left[\mathbf{r}(u_{i-1}, v_j), \mathbf{r}(u_i, v_j), \mathbf{r}(u_i, v_{j-1})\right].$$
Definition 11.14 The area of the parameterized surface $S$ is
$$\lim_{(\Delta u, \Delta v) \to (0,0)} \sum_{i,j=1}^n \left(A_{i,j} + B_{i,j}\right).$$
This definition can be modified to the case of an arbitrary surface $\mathbf{r}(D)$, where $D \subset \mathbb{R}^2$ is not necessarily a rectangle, but we are going to use the following theorem instead of the definition anyway.
Theorem 11.15 Let $S = \mathbf{r}(D)$ be a smooth surface, where $\mathbf{r}: \mathbb{R}^2 \to \mathbb{R}^3$ is some map having continuous partial derivatives. Then the area of the surface $S$ equals
$$\iint_D |\mathbf{r}_u \times \mathbf{r}_v|\,du\,dv. \tag{11.1}$$
We are not going to prove this theorem because it's too hard, so we'll just explain why it is true. Let $D = [a, b] \times [c, d]$ be a rectangle. Put $u_i = a + \frac{b - a}{n}i$ and $v_j = c + \frac{d - c}{n}j$, so
$$A_{i,j} = \mathrm{Area}\left[\mathbf{r}(u_{i-1}, v_{j-1}), \mathbf{r}(u_{i-1}, v_j), \mathbf{r}(u_i, v_{j-1})\right],$$
$$B_{i,j} = \mathrm{Area}\left[\mathbf{r}(u_{i-1}, v_j), \mathbf{r}(u_i, v_j), \mathbf{r}(u_i, v_{j-1})\right].$$
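The area of a triangle can be found using the cross product as
$$A_{i,j} = \frac{1}{2}\left|\left(\mathbf{r}(u_{i-1}, v_{j-1}) - \mathbf{r}(u_{i-1}, v_j)\right) \times \left(\mathbf{r}(u_{i-1}, v_{j-1}) - \mathbf{r}(u_i, v_{j-1})\right)\right|,$$
$$B_{i,j} = \frac{1}{2}\left|\left(\mathbf{r}(u_i, v_j) - \mathbf{r}(u_{i-1}, v_j)\right) \times \left(\mathbf{r}(u_i, v_j) - \mathbf{r}(u_i, v_{j-1})\right)\right|.$$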
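Now by the formula for the differential, we have
$$\mathbf{r}(u_i, v_j) - \mathbf{r}(u_{i-1}, v_j) \approx \mathbf{r}_u(u_i, v_j)\,\Delta u_i, \qquad \mathbf{r}(u_i, v_j) - \mathbf{r}(u_i, v_{j-1}) \approx \mathbf{r}_v(u_i, v_j)\,\Delta v_j,$$
so $B_{i,j} \approx \frac{1}{2}\Delta u_i\,\Delta v_j\,|\mathbf{r}_u(u_i, v_j) \times \mathbf{r}_v(u_i, v_j)|$. Due to continuity of the partial derivatives, we have $A_{i,j} \approx B_{i,j}$. Thus the expression
$$\sum_{i,j=1}^n \left(A_{i,j} + B_{i,j}\right)$$
is a Riemann sum for the integral (11.1).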
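The triangle construction behind Definition 11.14 can be run directly. This sketch is standard-library Python with illustrative helper names. It uses the uniform grid $u_i = a + \frac{b - a}{n}i$, $v_j = c + \frac{d - c}{n}j$ and computes each $A_{i,j}$ and $B_{i,j}$ with the cross-product formula above. Applied to the unit sphere of Example 11.2 (with $u = \varphi$, $v = \theta$), the sum should approach $4\pi$.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def triangle_sum(r, a, b, c, d, n):
    """Sum of A_ij + B_ij over the uniform n-by-n grid on [a, b] x [c, d]."""
    u = [a + (b - a) * i / n for i in range(n + 1)]
    v = [c + (d - c) * j / n for j in range(n + 1)]
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            p00, p01 = r(u[i - 1], v[j - 1]), r(u[i - 1], v[j])
            p10, p11 = r(u[i], v[j - 1]), r(u[i], v[j])
            A = 0.5 * norm(cross(sub(p00, p01), sub(p00, p10)))
            B = 0.5 * norm(cross(sub(p11, p01), sub(p11, p10)))
            total += A + B
    return total


def sphere(phi, theta):
    return (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))


area = triangle_sum(sphere, 0.0, math.pi, 0.0, 2 * math.pi, n=100)
print(area, 4 * math.pi)
```

The triangles are inscribed in the sphere, so the sum comes out slightly smaller than $4\pi$. The gap shrinks as $n$ grows.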
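Example 11.16 Let us evaluate the area of the parabolic bowl $z = x^2 + y^2 \le 1$. We can parameterize it with $x$ and $y$, that is, $x = u$, $y = v$, $z = u^2 + v^2$. The tangent vectors are $r_u = (1, 0, 2u)$ and $r_v = (0, 1, 2v)$. The normal vector is
$$N = r_u \times r_v = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & 2u \\ 0 & 1 & 2v \end{pmatrix} = -2u\,\mathbf{i} - 2v\,\mathbf{j} + \mathbf{k}.$$
Thus, switching to polar coordinates $u = r\cos\theta$, $v = r\sin\theta$ in the second step,
$$\operatorname{Area}(S) = \iint_{u^2+v^2 \le 1} \sqrt{1 + 4u^2 + 4v^2}\,du\,dv = \int_0^{2\pi} d\theta \int_0^1 r\sqrt{1 + 4r^2}\,dr = 2\pi \cdot \frac{1}{12}\left(1 + 4r^2\right)^{3/2}\Big|_0^1 = \frac{\pi}{6}\left(5\sqrt{5} - 1\right).$$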
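As a sanity check on the closed form, here is a short Python sketch (our own, using only the standard library) that applies the midpoint rule to the polar-coordinate integral above. The function name `bowl_area` and the grid sizes are arbitrary choices.

```python
import math


def bowl_area(n_r=2000, n_theta=16):
    """Midpoint rule for the area integral of the bowl z = x^2 + y^2 <= 1 in polar coordinates."""
    dr = 1.0 / n_r
    dth = 2 * math.pi / n_theta
    total = 0.0
    for k in range(n_theta):
        th = (k + 0.5) * dth
        for i in range(n_r):
            r = (i + 0.5) * dr
            u, v = r * math.cos(th), r * math.sin(th)
            # |N| = sqrt(1 + 4u^2 + 4v^2); the extra factor r is the polar Jacobian
            total += math.sqrt(1 + 4 * u * u + 4 * v * v) * r * dr * dth
    return total


exact = math.pi / 6 * (5 * math.sqrt(5) - 1)
numeric = bowl_area()
```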

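Example 11.17 Let's find the area of the sphere of radius $R$. It can be parameterized with the spherical angles $\varphi$ and $\theta$ as $x = R\sin\varphi\cos\theta$, $y = R\sin\varphi\sin\theta$, $z = R\cos\varphi$. We then have
$$N = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ R\cos\varphi\cos\theta & R\cos\varphi\sin\theta & -R\sin\varphi \\ -R\sin\varphi\sin\theta & R\sin\varphi\cos\theta & 0 \end{pmatrix} = R^2\sin^2\varphi\cos\theta\,\mathbf{i} + R^2\sin^2\varphi\sin\theta\,\mathbf{j} + R^2\cos\varphi\sin\varphi\,\mathbf{k}.$$
Now we need the length of the normal vector, that is,
$$\|N\| = R^2\sqrt{\sin^4\varphi\cos^2\theta + \sin^4\varphi\sin^2\theta + \cos^2\varphi\sin^2\varphi} = R^2\sin\varphi,$$
since $\sin\varphi \ge 0$ for $0 \le \varphi \le \pi$. Finally, the area is
$$\int_0^{2\pi} d\theta \int_0^{\pi} R^2\sin\varphi\,d\varphi = 4\pi R^2.$$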
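The same computation can be done without working out the cross product by hand. The following is a minimal sketch, assuming only that $r$ is smooth enough for central differences: it approximates $r_u$, $r_v$ numerically, forms $N = r_u \times r_v$, and applies the midpoint rule to (11.1). The names `normal` and `area`, the step `h = 1e-6`, and the test radius $R = 2$ are our own choices.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normal(r, u, v, h=1e-6):
    """N = r_u x r_v, with both partials taken by central differences."""
    ru = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
    rv = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
    return cross(ru, rv)


def area(r, a, b, c, d, n=200):
    """Midpoint rule for the double integral of |r_u x r_v| over [a, b] x [c, d]."""
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            N = normal(r, a + (i + 0.5) * du, c + (j + 0.5) * dv)
            total += math.sqrt(N[0] ** 2 + N[1] ** 2 + N[2] ** 2) * du * dv
    return total


R = 2.0


def sphere(phi, theta):
    return (R * math.sin(phi) * math.cos(theta),
            R * math.sin(phi) * math.sin(theta),
            R * math.cos(phi))


N = normal(sphere, 1.0, 0.3)                            # length should be close to R^2 sin(1)
approx = area(sphere, 0.0, math.pi, 0.0, 2 * math.pi)   # should be close to 4 pi R^2
```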
Corollary 11.18 Given a function $f(x, y)$, the area of its graph $z = f(x, y)$ for $(x, y) \in D \subset \mathbb{R}^2$ is
$$\iint_D \sqrt{1 + f_x^2 + f_y^2}\,dx\,dy. \tag{11.2}$$
Proof Let's parameterize the surface as $x = x$, $y = y$, $z = f(x, y)$. Then the normal vector is
$$N = r_x \times r_y = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & f_x \\ 0 & 1 & f_y \end{pmatrix} = -f_x\,\mathbf{i} - f_y\,\mathbf{j} + \mathbf{k}.$$
Its norm is $\|N\| = \sqrt{1 + f_x^2 + f_y^2}$, and integrating it we get exactly the expression (11.2). :)
Corollary 11.19 Consider a function $f(x)$ for $a \le x \le b$. The area of its surface of revolution about the $x$-axis equals
$$2\pi\int_a^b |f|\sqrt{1 + (f')^2}\,dx.$$
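Proof Let's use (modified) cylindrical coordinates for the parametrization. This gives us
$$x = x, \quad y = f(x)\cos\theta, \quad z = f(x)\sin\theta.$$
Thus the normal vector is
$$N = r_x \times r_\theta = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & f'\cos\theta & f'\sin\theta \\ 0 & -f\sin\theta & f\cos\theta \end{pmatrix} = f f'\,\mathbf{i} - f\cos\theta\,\mathbf{j} - f\sin\theta\,\mathbf{k}.$$
Its norm is $\|N\| = \sqrt{(f f')^2 + f^2} = |f|\sqrt{1 + (f')^2}$. Finally, the area is
$$\int_0^{2\pi} d\theta \int_a^b |f|\sqrt{1 + (f')^2}\,dx = 2\pi\int_a^b |f|\sqrt{1 + (f')^2}\,dx.$$
:)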
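A quick way to test Corollary 11.19 is to evaluate the one-dimensional integral numerically for curves whose surfaces of revolution have known areas. The sketch below (our own illustration; the function name `revolution_area` is hypothetical) uses the midpoint rule on a cone ($f(x) = x$ on $[0, 1]$, area $\pi\sqrt{2}$) and a catenoid ($f = \cosh$ on $[0, 1]$, where $|f|\sqrt{1 + (f')^2} = \cosh^2 x$ and the area is $\pi(1 + \tfrac12\sinh 2)$).

```python
import math


def revolution_area(f, fprime, a, b, n=10000):
    """2*pi times the integral over [a, b] of |f| sqrt(1 + f'^2), by the midpoint rule."""
    h = (b - a) / n
    s = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        s += abs(f(x)) * math.sqrt(1 + fprime(x) ** 2)
    return 2 * math.pi * s * h


cone = revolution_area(lambda x: x, lambda x: 1.0, 0.0, 1.0)   # exact: pi * sqrt(2)
catenoid = revolution_area(math.cosh, math.sinh, 0.0, 1.0)     # exact: pi * (1 + sinh(2) / 2)
```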
Surface integrals of functions and vector fields
First, recall some material on line integrals. Given a curve with parametric equation $x = r(t)$, $r : [a, b] \to \mathbb{R}^m$, let $T = r'(t)$ be its tangent vector. Then we have the following formulas.
(i) The length of the curve is $\int_a^b \|T\|\,dt$.
(ii) The line integral of a function $f : \mathbb{R}^m \to \mathbb{R}$ is $\int_a^b f(r(t))\,\|T\|\,dt$.
(iii) The line integral of a vector field $V$ is $\int_a^b V(r(t)) \cdot T\,dt$.
Now let $S \subset \mathbb{R}^3$ be a surface given by a parametric equation $x = r(u, v)$, where $r : D \to \mathbb{R}^3$ is some map defined in a region $D \subset \mathbb{R}^2_{u,v}$. Let $N = r_u \times r_v$ be the normal vector. Then
$$\operatorname{Area}(S) = \iint_D \|N\|\,du\,dv.$$
In a similar manner to line integrals, one defines surface integrals of functions and of vector fields. The only difference is in using the normal vector $N$ instead of the tangent vector $T$.
Definition 11.20 Let $S \subset \mathbb{R}^3$ be a surface given by a parametric equation $x = r(u, v)$, where $r : D \to \mathbb{R}^3$ is a map whose domain is a region $D \subset \mathbb{R}^2$. Let $N = r_u \times r_v$ be its normal vector. Then the surface integral of a function $f : \mathbb{R}^3 \to \mathbb{R}$ is
$$\iint_S f(x, y, z)\,dS = \iint_D f(r(u, v))\,\|N\|\,du\,dv$$
and the surface integral of a vector field $V = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ is
$$\iint_S V \cdot d\mathbf{S} = \iint_D V(r(u, v)) \cdot N\,du\,dv.$$
Here, $d\mathbf{S}$ and $dS$ are notations from Stewart. They mean
$$d\mathbf{S} = N\,du\,dv, \qquad dS = \|d\mathbf{S}\| = \|N\|\,du\,dv.$$
It is possible to define a better notation, something like
$$\int P\,dx + Q\,dy + R\,dz$$
for line integrals. This is the language of differential forms, which is studied in differential geometry.
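Definition 11.20 translates directly into a numerical recipe. The sketch below is an illustration under our own assumptions (central differences for $r_u$ and $r_v$, the midpoint rule on a rectangular parameter domain, and the hypothetical helper name `surface_integral`); it is checked against the values $\pi$ and $\pi/2$ obtained by hand in Examples 11.21 and 11.22.

```python
import math


def surface_integral(f, r, a, b, c, d, n=200, h=1e-6):
    """Midpoint approximation of the integral of f over S = r([a, b] x [c, d]),
    i.e. of the double integral of f(r(u, v)) * |r_u x r_v| du dv."""
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(n):
            v = c + (j + 0.5) * dv
            ru = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
            rv = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
            N = (ru[1] * rv[2] - ru[2] * rv[1],
                 ru[2] * rv[0] - ru[0] * rv[2],
                 ru[0] * rv[1] - ru[1] * rv[0])
            total += f(*r(u, v)) * math.sqrt(N[0] ** 2 + N[1] ** 2 + N[2] ** 2) * du * dv
    return total


def sphere(phi, theta):
    return (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))


def cylinder(theta, z):
    return (math.cos(theta), math.sin(theta), z)


# integral of z over the upper unit hemisphere (Example 11.21)
hemi = surface_integral(lambda x, y, z: z, sphere, 0.0, math.pi / 2, 0.0, 2 * math.pi)
# integral of (x + y)^2 z^3 over the cylinder x^2 + y^2 = 1, 0 <= z <= 1 (Example 11.22)
cyl = surface_integral(lambda x, y, z: (x + y) ** 2 * z ** 3, cylinder, 0.0, 2 * math.pi, 0.0, 1.0)
```

Since only $\|N\|$ enters, swapping the roles of $u$ and $v$ does not change the result, in line with the remark that the surface integral of a function does not depend on orientation.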
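Example 11.21 Let us evaluate the integral $\iint_S z\,dS$, where $S$ is the upper hemisphere given by $x^2 + y^2 + z^2 = 1$, $z \ge 0$. The parametric equations are $x = \sin\varphi\cos\theta$, $y = \sin\varphi\sin\theta$, $z = \cos\varphi$ for $0 \le \theta \le 2\pi$ and $0 \le \varphi \le \frac{\pi}{2}$. The length of the normal vector is
$$\|N\| = \left\|\sin^2\varphi\cos\theta\,\mathbf{i} + \sin^2\varphi\sin\theta\,\mathbf{j} + \cos\varphi\sin\varphi\,\mathbf{k}\right\| = \sin\varphi.$$
Thus,
$$\iint_S z\,dS = \int_0^{2\pi} d\theta \int_0^{\pi/2} \cos\varphi\sin\varphi\,d\varphi = 2\pi\left(-\frac{\cos 2\varphi}{4}\right)\Big|_0^{\pi/2} = \pi.$$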
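Example 11.22 Let's evaluate $\iint_S (x + y)^2 z^3\,dS$, where $S$ is the cylinder $x^2 + y^2 = 1$, $0 \le z \le 1$. The parametric equations are $x = \cos\theta$, $y = \sin\theta$, $z = z$ for $0 \le \theta \le 2\pi$, $0 \le z \le 1$. The normal vector is
$$N = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}.$$
Its length is
$$\|N\| = \sqrt{\cos^2\theta + \sin^2\theta} = 1.$$
Thus,
$$\iint_S (x + y)^2 z^3\,dS = \int_0^{2\pi} d\theta \int_0^1 (\cos\theta + \sin\theta)^2 z^3\,dz = \int_0^{2\pi}(1 + \sin 2\theta)\,d\theta \int_0^1 z^3\,dz = \frac{\pi}{2}.$$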

Notice that the surface integral of a function does not depend on the orientation of the surface, while the integral of a vector field does.
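Example 11.23 Let's evaluate $\iint_S V \cdot d\mathbf{S}$, where $S$ is the piece of the paraboloid $z = x^2 + y^2 \le 1$ oriented with the unit normal pointing downwards and $V = y\,\mathbf{i} - x\,\mathbf{j} + z^2\,\mathbf{k}$. We'll use $x$ and $y$ as parameters. The normal vector is
$$N = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & 2x \\ 0 & 1 & 2y \end{pmatrix} = -2x\,\mathbf{i} - 2y\,\mathbf{j} + \mathbf{k}.$$
It points upwards, while the surface is oriented with a normal vector pointing downwards. Thus, we need to reverse the orientation, which means taking the integral with a negative sign. Finally,
$$\iint_S V \cdot d\mathbf{S} = -\iint_{x^2+y^2 \le 1} \left(y\,\mathbf{i} - x\,\mathbf{j} + (x^2 + y^2)^2\,\mathbf{k}\right) \cdot \left(-2x\,\mathbf{i} - 2y\,\mathbf{j} + \mathbf{k}\right) dx\,dy$$
$$= -\iint_{x^2+y^2 \le 1} (x^2 + y^2)^2\,dx\,dy = -\int_0^{2\pi} d\theta \int_0^1 r^5\,dr = -\frac{\pi}{3}.$$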
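The orientation step is easy to get wrong, so here is a minimal numerical sketch of it (our own illustration). To keep the parameter domain a rectangle it uses polar parameters $(\rho, \theta)$ instead of $(x, y)$; for this choice $N = r_\rho \times r_\theta = \rho(-2x\,\mathbf{i} - 2y\,\mathbf{j} + \mathbf{k})$, the normal above times the polar Jacobian $\rho$. The helper names `normal` and `flux` are hypothetical.

```python
import math


def normal(r, u, v, h=1e-6):
    """N = r_u x r_v by central differences."""
    ru = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
    rv = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
    return (ru[1] * rv[2] - ru[2] * rv[1],
            ru[2] * rv[0] - ru[0] * rv[2],
            ru[0] * rv[1] - ru[1] * rv[0])


def flux(V, r, a, b, c, d, n=200):
    """Midpoint approximation of the double integral of V(r(u, v)) . N(u, v) du dv."""
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(n):
            v = c + (j + 0.5) * dv
            F = V(*r(u, v))
            N = normal(r, u, v)
            total += (F[0] * N[0] + F[1] * N[1] + F[2] * N[2]) * du * dv
    return total


def V(x, y, z):
    return (y, -x, z * z)


def bowl(rho, theta):
    # polar parameters for the paraboloid z = x^2 + y^2, x^2 + y^2 <= 1
    return (rho * math.cos(theta), rho * math.sin(theta), rho * rho)


up = normal(bowl, 0.5, 1.0)           # positive k-component: this N points upwards
upward_flux = flux(V, bowl, 0.0, 1.0, 0.0, 2 * math.pi)
downward_flux = -upward_flux          # the example orients S with the normal pointing down
```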

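Example 11.24 Evaluate $\iint_S V \cdot d\mathbf{S}$, where $S$ is the lower unit hemisphere $x^2 + y^2 + z^2 = 1$, $z \le 0$, oriented with a normal vector pointing upwards, and $V = y^2\,\mathbf{k}$. Let's again use spherical coordinates to parameterize it, now with $\frac{\pi}{2} \le \varphi \le \pi$. The normal vector is
$$N = \sin^2\varphi\cos\theta\,\mathbf{i} + \sin^2\varphi\sin\theta\,\mathbf{j} + \cos\varphi\sin\varphi\,\mathbf{k}.$$
It points downwards because $\cos\varphi = z \le 0$, so we have to take the integral with a negative sign. Further,
$$V \cdot N = \sin^2\varphi\sin^2\theta\cos\varphi\sin\varphi = \cos\varphi\sin^3\varphi\sin^2\theta.$$
Thus, substituting $u = \sin\varphi$ in the last step,
$$\iint_S V \cdot d\mathbf{S} = -\int_0^{2\pi} d\theta \int_{\pi/2}^{\pi} \cos\varphi\sin^3\varphi\sin^2\theta\,d\varphi = -\int_0^{2\pi}\sin^2\theta\,d\theta \int_{\pi/2}^{\pi}\cos\varphi\sin^3\varphi\,d\varphi$$
$$= -\int_0^{2\pi}\frac{1 - \cos 2\theta}{2}\,d\theta \int_1^0 u^3\,du = \frac{\pi}{4}.$$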
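A short numerical check of this example, written as a sketch under our own conventions: it reuses the exact normal $N = r_\varphi \times r_\theta$ from Example 11.17 with $R = 1$, applies the midpoint rule over $\frac{\pi}{2} \le \varphi \le \pi$, and then flips the sign to match the upward orientation. The helper names are hypothetical.

```python
import math


def normal(phi, theta):
    # N = r_phi x r_theta for the unit sphere (Example 11.17 with R = 1)
    s, c = math.sin(phi), math.cos(phi)
    return (s * s * math.cos(theta), s * s * math.sin(theta), c * s)


def flux_lower_hemisphere(n=200):
    """Flux of V = y^2 k through the lower unit hemisphere, using N = r_phi x r_theta."""
    dphi, dth = (math.pi / 2) / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = math.pi / 2 + (i + 0.5) * dphi      # pi/2 <= phi <= pi covers z <= 0
        for j in range(n):
            th = (j + 0.5) * dth
            y = math.sin(phi) * math.sin(th)
            total += y * y * normal(phi, th)[2] * dphi * dth   # V . N = y^2 N_z
    return total


outward = flux_lower_hemisphere()   # N_z = cos(phi) sin(phi) <= 0 here, so N points down
upward = -outward                   # reversed to match the upward orientation
```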

Physical meaning of surface integral
Just like work is the physical meaning of the line integral of a vector field, flux is the physical meaning of the surface integral of a vector field.
Given a flat region $D \subset \mathbb{R}^2$, assume that a liquid is flowing through it in the perpendicular direction such that its velocity at a point $(x, y)$ is $R(x, y)$. Then the integral
$$\iint_D R(x, y)\,dx\,dy$$
measures the volume of the liquid passing through $D$ per unit time.
More generally, if the velocity of the liquid is some arbitrary vector field $V$ in $\mathbb{R}^3$ and instead of a flat region we have a net whose shape is a surface $S$, then we have to take the component of the vector field $V$ orthogonal to the surface. Thus the flux is
$$\iint_S V \cdot U\,dS,$$
where $U = \frac{N}{\|N\|}$ is the unit normal to the surface. Finally,
$$\iint_S V \cdot U\,dS = \iint_S V \cdot \frac{N}{\|N\|}\,dS = \iint_D V \cdot \frac{N}{\|N\|}\,\|N\|\,du\,dv = \iint_D V \cdot N\,du\,dv = \iint_S V \cdot d\mathbf{S}.$$
11.3 Exercises
Questions on understanding the lecture
Exercise 11.2 A surface is called flat if it lies in a plane, for example $z = 0$. How can one parameterize a flat surface?
Exercise 11.3 Is a flat surface orientable? If so, what is its orientation?
Exercise 11.4 How many boundary components can a surface have?
Exercise 11.5 Given a parametric surface $S = r(D)$, what is the flux of its unit normal field $U$ across $S$?
Questions on calculation
Exercise 11.6 Evaluate $\iint_S (xy + yz + xz)\,dS$, where $S$ is the piece cut from the cone $z = \sqrt{x^2 + y^2}$ by the cylinder $x^2 + y^2 = 2x$.
Exercise 11.7 Evaluate $\iint_S \left(\frac{\mathbf{i}}{x} + \frac{\mathbf{j}}{y} + \frac{\mathbf{k}}{z}\right) \cdot d\mathbf{S}$, where $S$ is the ellipsoid
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1, \quad a, b, c > 0,$$
oriented with the unit normal pointing outwards.
Questions on logical thinking
Exercise 11.8 Let $C$ be a curve in $\mathbb{R}^2$ given by parametric equations $x = x(t)$, $y = y(t)$. Let $S$ be the surface of revolution of the curve $C$ about the $y$-axis (something like the one shown in Figure 11.7).
Figure 11.7: Surface obtained by revolving the astroid $x = 2 + \cos^3 t$, $y = \sin^3 t$ about the $y$-axis.
(a) Parameterize the surface $S$.
(b) Apply part (a) to parameterize the torus of revolution (see Figure 11.8) obtained by rotating the circle of radius $b$ centred at $(a, 0)$ about the $y$-axis.
(c) Find the area of the torus of revolution.
Exercise 11.9 The Möbius band can be given by a parametric equation as
$$x = \cos t\left(1 + r\cos\frac{t}{2}\right), \quad y = \sin t\left(1 + r\cos\frac{t}{2}\right), \quad z = r\sin\frac{t}{2}, \qquad -\frac{1}{2} \le r \le \frac{1}{2}, \quad 0 \le t \le 2\pi.$$
It seems that the function is periodic in $t$, so the boundary consists of two components: $r = -1/2$ and $r = 1/2$. Besides, it looks like we can find the normal vector field using the standard procedure.
How come we see in the picture that the boundary consists of only one component and the Möbius band is non-orientable?
Exercise 11.10 Prove that surface area does not depend on the parametrization. Specifically, check that if we substitute $u = u(p, q)$, $v = v(p, q)$, then
$$\iint_D \|r_u \times r_v\|\,du\,dv = \iint_E \|r_p \times r_q\|\,dp\,dq,$$
where $(u, v) \in D \subset \mathbb{R}^2$ and $(p, q) \in E \subset \mathbb{R}^2$.
Figure 11.8: A torus of revolution.
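Lecture 12
Stokes' Theorem and Applications of Integration
12.1 Stokes' Theorem
Green's formula and curl
Green's theorem involves a vector field $P(x, y)\mathbf{i} + Q(x, y)\mathbf{j}$ in $\mathbb{R}^2$. Consider it in $\mathbb{R}^3$ as $P\mathbf{i} + Q\mathbf{j} + 0\mathbf{k}$. The formula is then
$$\oint_{\partial D} P\,dx + Q\,dy = \iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy.$$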
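We can write the expression in the double integral as
$$\det\begin{pmatrix} \frac{\partial}{\partial x} & \frac{\partial}{\partial y} \\ P & Q \end{pmatrix} = \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}.$$
Recall that the curl is something similar:
$$\operatorname{curl}(P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ P & Q & R \end{pmatrix}.$$
In particular, the curl of our two-component vector field is
$$\operatorname{curl}(P\mathbf{i} + Q\mathbf{j} + 0\mathbf{k}) = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ P(x, y) & Q(x, y) & 0 \end{pmatrix} = \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)\mathbf{k}.$$
Thus the right-hand side of Green's formula is the flux across $D$ (with normal vector $\mathbf{k}$) of the vector field $\operatorname{curl}(P\mathbf{i} + Q\mathbf{j} + 0\mathbf{k})$, which is perpendicular to the region $D$.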
Stokes' Theorem
Theorem 12.1 (Stokes) Given a bounded oriented surface $S \subset \mathbb{R}^3$, whose boundary $\partial S$ carries the orientation induced by $S$, and a vector field $F = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$, we have
$$\oint_{\partial S} P\,dx + Q\,dy + R\,dz = \iint_S \operatorname{curl} F \cdot d\mathbf{S}.$$
Definition 12.2 The work of a vector field over a closed curve $C$ is called the circulation around $C$.
In other words, the circulation of a vector field around a closed contour equals the flux of its curl through a surface bounded by this contour.
The idea of the proof of Stokes' Theorem is as follows.
(a) Prove the formula if the surface $S$ is the graph of some function $z = f(x, y)$. Here, we can just use direct calculation (done in Stewart, for example).
(b) Prove that given two surfaces $S_1$ and $S_2$ meeting along a common piece of their boundaries, if the formula holds for each of them, then it holds for their union.
(c) Prove that any surface can be subdivided into graphs of functions $z = z(x, y)$, $y = y(x, z)$, and $x = x(y, z)$. This part involves the Implicit Function Theorem, which for some reason is not included in our course.
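Example 12.3 Let us evaluate the circulation
$$\oint_C (y + e^z)\,dx - (x - \sin y)\,dy + z^3\,dz,$$
where $C$ is the circle given by $x^2 + y^2 = 1$, $z = 0$. The circle is oriented positively when seen from above. The vector field being integrated is $F = (y + e^z)\mathbf{i} + (-x + \sin y)\mathbf{j} + z^3\mathbf{k}$. Further,
$$\operatorname{curl} F = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ y + e^z & -x + \sin y & z^3 \end{pmatrix} = 0\,\mathbf{i} + e^z\,\mathbf{j} - 2\mathbf{k}.$$
Let $D$ be the unit disk given by $x^2 + y^2 \le 1$, $z = 0$, with normal vector $\mathbf{k}$. By Stokes' Theorem, we get
$$\oint_C (y + e^z)\,dx - (x - \sin y)\,dy + z^3\,dz = \iint_D (e^z\,\mathbf{j} - 2\mathbf{k}) \cdot d\mathbf{D} = -\iint_{x^2+y^2 \le 1} 2\,dx\,dy = -2\pi.$$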
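Stokes' Theorem can be cross-checked here by computing the circulation directly. The following sketch is our own illustration; it assumes the integrand reads $F = (y + e^z)\mathbf{i} + (-x + \sin y)\mathbf{j} + z^3\mathbf{k}$ and uses the midpoint rule in $t$ for the counterclockwise circle $x = \cos t$, $y = \sin t$, $z = 0$ (the name `line_integral` is hypothetical). For smooth periodic integrands over a full period this rule converges very fast.

```python
import math


def line_integral(F, r, dr, t0, t1, n=10000):
    """Midpoint rule for the integral of F(r(t)) . r'(t) dt over [t0, t1]."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        P, Q, R = F(*r(t))
        dx, dy, dz = dr(t)
        total += (P * dx + Q * dy + R * dz) * h
    return total


def F(x, y, z):
    return (y + math.exp(z), -x + math.sin(y), z ** 3)


def circle(t):
    return (math.cos(t), math.sin(t), 0.0)       # counterclockwise when seen from above


def dcircle(t):
    return (-math.sin(t), math.cos(t), 0.0)


circulation = line_integral(F, circle, dcircle, 0.0, 2 * math.pi)
```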

Example 12.4 Evaluate the circulation
$$\oint_C (x + e^{yz})\,dx + (\cos x - \arctan y^7)\,dy + (z^3 + 5x)\,dz,$$
where the contour $C$ encloses the square $0 \le x \le \pi$, $0 \le y \le \pi$, $z = 0$. The contour is oriented counterclockwise when seen from above. Here,
$$\operatorname{curl} F = (-5 + y e^{yz})\,\mathbf{j} + (-\sin x - z e^{yz})\,\mathbf{k},$$
and substituting $z = 0$, $N = \mathbf{k}$, we see that the integral equals
$$-\int_0^{\pi} dx \int_0^{\pi} \sin x\,dy = -2\pi.$$
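Both sides of Stokes' formula can be evaluated numerically for this example. In the sketch below (our own; it assumes the $dy$-coefficient reads $\cos x - \arctan y^7$, and `segment_integral` is a hypothetical helper) the circulation is summed over the four edges of the square traversed counterclockwise, and the flux of the curl is the integral of $-\sin x$ over the square.

```python
import math


def F(x, y, z):
    return (x + math.exp(y * z), math.cos(x) - math.atan(y ** 7), z ** 3 + 5 * x)


def segment_integral(p0, p1, n=20000):
    """Midpoint rule for the line integral of F along the straight segment p0 -> p1."""
    d = [b - a for a, b in zip(p0, p1)]
    total = 0.0
    for k in range(n):
        s = (k + 0.5) / n
        P, Q, R = F(*(a + s * di for a, di in zip(p0, d)))
        total += (P * d[0] + Q * d[1] + R * d[2]) / n
    return total


pi = math.pi
corners = [(0.0, 0.0, 0.0), (pi, 0.0, 0.0), (pi, pi, 0.0), (0.0, pi, 0.0)]   # counterclockwise
circulation = sum(segment_integral(corners[k], corners[(k + 1) % 4]) for k in range(4))

# Stokes side: on z = 0 with N = k, curl F . N = -sin x, integrated over the square
m = 400
h = pi / m
flux_of_curl = sum(-math.sin((i + 0.5) * h) * h for i in range(m)) * pi
```

Only the $\cos x\,dy$ term survives on the closed contour: it contributes $-\pi$ on each vertical edge, which is why the answer does not depend on the complicated $\arctan y^7$ and $e^{yz}$ terms.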
Applying Stokes' Theorem the other way round
If we want to apply Stokes' Theorem to evaluate a surface integral
$$\iint_S V \cdot d\mathbf{S},$$
then we need to find a vector field $F$ such that $\operatorname{curl} F = V$. Recall that $\operatorname{div}(\operatorname{curl} F) = 0$ holds. It means that $\operatorname{div} V = 0$ is a necessary condition for the vector field $V$ to be the curl of some other vector field.
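The necessary condition $\operatorname{div} V = 0$ is easy to test numerically before searching for $F$. This is a minimal sketch with central differences (step `H = 1e-4`, helper names `partial`, `curl`, `div` are our own). Because central differences in different coordinates commute, the discrete $\operatorname{div}(\operatorname{curl} F)$ vanishes up to rounding, mirroring the identity above.

```python
import math

H = 1e-4


def partial(g, p, k):
    """Central-difference approximation of dg/dx_k at the point p = (x, y, z)."""
    q1, q2 = list(p), list(p)
    q1[k] += H
    q2[k] -= H
    return (g(*q1) - g(*q2)) / (2 * H)


def component(F, k):
    return lambda x, y, z: F(x, y, z)[k]


def curl(F):
    P, Q, R = component(F, 0), component(F, 1), component(F, 2)
    return lambda x, y, z: (partial(R, (x, y, z), 1) - partial(Q, (x, y, z), 2),
                            partial(P, (x, y, z), 2) - partial(R, (x, y, z), 0),
                            partial(Q, (x, y, z), 0) - partial(P, (x, y, z), 1))


def div(F):
    return lambda x, y, z: sum(partial(component(F, k), (x, y, z), k) for k in range(3))


def F(x, y, z):
    # an arbitrary smooth test field
    return (y * z * z, math.sin(x * y), x * math.exp(z))


def V_fish(x, y, z):
    # velocity field of Example 12.5; the constant in the k-component does not affect div
    return (1.0, math.exp(z) * math.cos(y), -1 + math.exp(z) * math.sin(y))


point = (0.3, -0.7, 0.5)
div_curl = div(curl(F))(*point)                        # ~0 up to rounding
div_fish = div(V_fish)(*point)                         # ~0, so V_fish may be a curl
div_radial = div(lambda x, y, z: (x, y, z))(*point)    # = 3, so (x, y, z) is not a curl
```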
It will be observed later that $\operatorname{div} V = 0$ is one of the equations for an incompressible liquid flow. For instance, water is almost incompressible, so this is quite a typical situation.
For example, imagine that someone wants to become the world dictator. He wants to achieve this great goal by catching magic Golden Fishes. A Golden Fish grants three wishes. It is known that if a man asks to rule the Sea, then the Golden Fish rejects such a request, so we are pretty sure that world dictatorship would be too much for one wish. Thus he cannot ask for more than to be the leader of a particular country. The UN has 192 member states, so, at three wishes per fish, one has to catch $192/3 = 64$ Golden Fishes to complete the quest.
Example 12.5 We are going to fish using a net of hemispherical shape shown in Figure 12.1. The equation defining the net is $x^2 + y^2 + z^2 = 9$, $x \ge 0$, with lengths measured in m. It is known that the field of velocities of the sea water in our magic area is $V = \mathbf{i} + e^z\cos y\,\mathbf{j} + (-1 + e^z\sin y)\,\mathbf{k}$, measured in cm/s. In order to catch a Golden Fish, one has to process 1 km³ of water. Let's find how long it takes to conquer the world in such a manner.
Figure 12.1: Hemispherical net
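We need to evaluate the integral $\iint_S V \cdot d\mathbf{S}$, where $S$ is the given hemisphere. Let's try to apply Stokes' Theorem.
We need to find $F = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ such that $\operatorname{curl} F = V$. This gives us the equations
$$\begin{cases} \dfrac{\partial R}{\partial y} - \dfrac{\partial Q}{\partial z} = 1 \\[2mm] \dfrac{\partial P}{\partial z} - \dfrac{\partial R}{\partial x} = e^z\cos y \\[2mm] \dfrac{\partial Q}{\partial x} - \dfrac{\partial P}{\partial y} = e^z\sin y - 1 \end{cases}$$
It's natural to try $P = e^z\cos y$. Thus, we get
$$\begin{cases} \dfrac{\partial R}{\partial y} - \dfrac{\partial Q}{\partial z} = 1 \\[2mm] \dfrac{\partial R}{\partial x} = 0 \\[2mm] \dfrac{\partial Q}{\partial x} = -1 \end{cases}$$
We can take $Q = -x$ and $R = y$, so $F = e^z\cos y\,\mathbf{i} - x\,\mathbf{j} + y\,\mathbf{k}$. Applying Stokes' Theorem, we obtain
$$\iint_S V \cdot d\mathbf{S} = \iint_S \operatorname{curl} F \cdot d\mathbf{S} = \oint_{\partial S} P\,dx + Q\,dy + R\,dz.$$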
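Here, $\partial S$ is the circle that bounds the disk of radius 3 m located in the plane $x = 0$. It can be parameterized as
$$x = 0, \quad y = 3\cos t, \quad z = 3\sin t,$$
which, for increasing $t$, runs counterclockwise when seen from the positive $x$-axis, matching the outward normal of the hemisphere. However, velocities are measured in cm/s, so in m/s we get
$$P = \frac{e^z\cos y}{100}, \quad Q = -\frac{x}{100}, \quad R = \frac{y}{100}.$$
Finally,
$$\oint_{\partial S} P\,dx + Q\,dy + R\,dz = \frac{1}{100}\int_0^{2\pi}\left(e^z\cos y \cdot 0 + 0 \cdot (-3\sin t) + 3\cos t \cdot 3\cos t\right) dt = \frac{9\pi}{100}.$$
Thus the flux through the net is $\frac{9\pi}{100} \approx 0.2827$ m³/s. To process 1 km³ of water, one needs $1000^3 / 0.2827 \approx 3.5 \cdot 10^9$ s. This is $3.5 \cdot 10^9 / (3600 \cdot 24 \cdot 365) \approx 112$ years. In total, the whole plan requires $112 \cdot 64 = 7168$ years.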
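The arithmetic at the end is easy to reproduce. The sketch below (our own; variable names are hypothetical) recomputes the line integral over the boundary circle with the midpoint rule, using $F = (e^z\cos y\,\mathbf{i} - x\,\mathbf{j} + y\,\mathbf{k})/100$ in m/s, and then converts the flux into years per fish and in total.

```python
import math

# boundary circle x = 0, y = 3 cos t, z = 3 sin t (metres); F in m/s after dividing by 100
n = 10000
h = 2 * math.pi / n
flux = 0.0
for k in range(n):
    t = (k + 0.5) * h
    x, y, z = 0.0, 3 * math.cos(t), 3 * math.sin(t)
    dx, dy, dz = 0.0, -3 * math.sin(t), 3 * math.cos(t)
    P, Q, R = math.exp(z) * math.cos(y) / 100, -x / 100, y / 100
    flux += (P * dx + Q * dy + R * dz) * h           # m^3/s

seconds_per_fish = 1000.0 ** 3 / flux                # 1 km^3 = 10^9 m^3
years_per_fish = seconds_per_fish / (3600 * 24 * 365)
fishes = 192 // 3                                    # three wishes per fish
total_years = round(years_per_fish) * fishes
```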
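Example 12.6 Let's find $F$ such that $\operatorname{curl} F = \mathbf{i} + \left(\frac{x}{1 + x^2z^2} - 2xe^{x^2}\right)\mathbf{j} - (3x^2 + 1)\mathbf{k}$ and evaluate the integral
$$\iint_S \left(\mathbf{i} + \left(\frac{x}{1 + x^2z^2} - 2xe^{x^2}\right)\mathbf{j} - (3x^2 + 1)\mathbf{k}\right) \cdot d\mathbf{S},$$
where the surface $S$ is given by $z = \ln(2 - x^2 - y^2)$, $x^2 + y^2 \le 1$. The surface is oriented with the unit normal pointing upwards.
One possible solution is
$$F = (y + \arctan xz)\,\mathbf{i} - x^3\,\mathbf{j} + (y + e^{x^2})\,\mathbf{k}.$$
The boundary of $S$ is the unit circle in the plane $z = \ln 1 = 0$. Since the orienting unit normal points upwards, the circle is to be oriented counterclockwise, that is,
$$x = \cos t, \quad y = \sin t, \quad z = 0.$$
By Stokes' Theorem, the integral is
$$\int_0^{2\pi}\left(-\sin^2 t - \cos^4 t\right) dt = -\frac{7\pi}{4}.$$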
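Two things can be checked numerically here: that the proposed $F$ really has the given curl, and the value of the boundary integral. The sketch below is our own illustration; it assumes the reading $F = (y + \arctan xz)\mathbf{i} - x^3\mathbf{j} + (y + e^{x^2})\mathbf{k}$, compares a central-difference curl (hypothetical helper `curl_fd`) with the target field at one sample point, and integrates over the counterclockwise unit circle.

```python
import math


def F(x, y, z):
    return (y + math.atan(x * z), -x ** 3, y + math.exp(x ** 2))


def V(x, y, z):
    return (1.0, x / (1 + x * x * z * z) - 2 * x * math.exp(x * x), -(3 * x * x + 1))


def curl_fd(F, x, y, z, h=1e-5):
    """curl F at (x, y, z) by central differences."""
    def d(k, comp):
        plus, minus = [x, y, z], [x, y, z]
        plus[k] += h
        minus[k] -= h
        return (F(*plus)[comp] - F(*minus)[comp]) / (2 * h)
    return (d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0))


# Stokes: the flux of V equals the circulation of F around x = cos t, y = sin t, z = 0
n = 20000
h = 2 * math.pi / n
circulation = 0.0
for k in range(n):
    t = (k + 0.5) * h
    P, Q, R = F(math.cos(t), math.sin(t), 0.0)
    circulation += (P * -math.sin(t) + Q * math.cos(t)) * h   # dz = 0 on the circle
```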

Remark 12.7 Recall that $\operatorname{curl}(\operatorname{grad} f) = 0$ for any function $f(x, y, z)$. Therefore,
$$\operatorname{curl}(F + \operatorname{grad} f) = \operatorname{curl} F.$$
Thus for any divergence-free vector field $V$, solutions of the equation $\operatorname{curl} F = V$ can differ by the summand $\operatorname{grad} f$, where $f(x, y, z)$ is an arbitrary function.
Figure 12.2: A flat disk and a flux across it
Physical meaning of the curl
Stokes' Theorem explains why $\left(\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z}\right)\mathbf{i} + \left(\frac{\partial P}{\partial z} - \frac{\partial R}{\partial x}\right)\mathbf{j} + \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)\mathbf{k}$ is called the curl. Let $V$ be the field of velocities in a liquid. Consider a circle $C$ of a small radius $r$ centred at a point $a \in \mathbb{R}^3$. Let $S$ be the disk bounded by $C$. Then the integral $\oint_C V$ shows how fast the liquid is rotating around $C$.
Since the circle $C$ is small, $\operatorname{curl} V$ is almost constant over its area, so everything looks like Figure 12.2. Applying Stokes' Theorem, we then get
$$\oint_C V = \iint_S \operatorname{curl} V \cdot d\mathbf{S} = \iint_{u^2+v^2 \le r^2} \operatorname{curl} V \cdot N\,du\,dv,$$
where $u$ and $v$ are orthonormal coordinates in the plane of the disk and $N$ is the normal vector to this plane. Since the disk is flat, we have $\|N\| = 1$ and $\operatorname{curl} V \cdot N = \|\operatorname{curl} V\|\cos\alpha$, where $\alpha$ is the angle between $\operatorname{curl} V$ and $N$. The maximal value of this integral is approximately $\pi r^2\|\operatorname{curl} V\|$, and it is attained when $\cos\alpha = 1$, that is, $\alpha = 0$. It means that $\operatorname{curl} V$ is perpendicular to the disk $S$. Thus we can say that the length of the curl measures the rotating effect in a liquid, while the direction of the curl is perpendicular to the plane of rotation.
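The argument can be watched numerically: divide the circulation around a small circle by $\pi r^2$ and compare it with $\operatorname{curl} V \cdot N$ for different tilts of the disk. The sketch below is our own; the sample field $V = (z - y)\mathbf{i} + x^2\mathbf{j}$ (with $\operatorname{curl} V = \mathbf{j} + (2x + 1)\mathbf{k}$), the centre $a = (0.5, 0, 0)$ and the helper names are assumptions chosen for illustration.

```python
import math


def V(x, y, z):
    # a sample velocity field; curl V = (0, 1, 2x + 1)
    return (z - y, x * x, 0.0)


def circulation_ratio(a, e1, e2, r=1e-2, n=2000):
    """Circulation of V around a + r(cos t e1 + sin t e2), divided by the disk area pi r^2."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        p = [a[i] + r * (math.cos(t) * e1[i] + math.sin(t) * e2[i]) for i in range(3)]
        dp = [r * (-math.sin(t) * e1[i] + math.cos(t) * e2[i]) for i in range(3)]
        v = V(*p)
        total += (v[0] * dp[0] + v[1] * dp[1] + v[2] * dp[2]) * h
    return total / (math.pi * r * r)


def ratio_for_angle(alpha, a=(0.5, 0.0, 0.0)):
    # orthonormal e1, e2 with unit normal e1 x e2 = (0, sin(alpha), cos(alpha))
    e1 = (1.0, 0.0, 0.0)
    e2 = (0.0, math.cos(alpha), -math.sin(alpha))
    return circulation_ratio(a, e1, e2)


# at a = (0.5, 0, 0) the curl is (0, 1, 2), of length sqrt(5)
flat = ratio_for_angle(0.0)                    # disk in the xy-plane: curl . k = 2
best = ratio_for_angle(math.atan2(1.0, 2.0))   # normal parallel to the curl: sqrt(5)
```

The ratio is largest, equal to $\|\operatorname{curl} V\| = \sqrt{5}$, exactly when the disk's normal is parallel to the curl.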
12.2 Applications of integration
Physics
Integration is used a lot in physics. For example, if we know that the speed is $v(t)$ for $0 \le t \le x$, then the distance travelled is $\int_0^x v(t)\,dt$.
Given a thin lamina, double integrals can be used to compute some of its physical characteristics. Assume that the lamina occupies a region $D \subset \mathbb{R}^2$. If the density is a constant $\rho$, then the mass is
$$M = \iint_D \rho\,dx\,dy.$$
If the density $\rho = \rho(x, y)$ is not constant, then
$$M = \iint_D \rho(x, y)\,dx\,dy.$$
Example 12.8 Assume that a lamina of density $\rho(x, y) = x^2 + y^2$ occupies the area between the hyperbola $xy = 1$ and the straight line $2x + y = 3$. First, we need to find the points of intersection, that is, to solve the system
$$\begin{cases} xy = 1 \\ 2x + y = 3 \end{cases}$$
Substituting $y = 3 - 2x$ gives $2x^2 - 3x + 1 = 0$, so $x = \frac{1}{2}$ and $x = 1$. The mass of the lamina is then
$$M = \iint_D \rho(x, y)\,dx\,dy = \int_{1/2}^{1} dx \int_{1/x}^{3 - 2x} (x^2 + y^2)\,dy,$$
which is a straightforward integral.
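For completeness, here is a small sketch that finishes the computation numerically (our own addition): the inner $y$-integral is done in closed form and the outer $x$-integral by the composite Simpson rule. Our own hand evaluation of the iterated integral gives $M = \frac{5}{32}$, which the test below checks; the helper names `inner` and `simpson` are hypothetical.

```python
def inner(x):
    # integral of x^2 + y^2 for y from 1/x to 3 - 2x, done exactly
    lo, hi = 1 / x, 3 - 2 * x
    return x * x * (hi - lo) + (hi ** 3 - lo ** 3) / 3


def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


M = simpson(inner, 0.5, 1.0)
```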
In a similar manner, given a 3D solid body of density $\rho(x, y, z)$ occupying a region $V \subset \mathbb{R}^3$, its mass is
$$M = \iiint_V \rho(x, y, z)\,dx\,dy\,dz.$$
Finally, if we have a thin lamina whose shape is a surface $S \subset \mathbb{R}^3$, then the mass is
$$M = \iint_S \rho(x, y, z)\,dS.$$
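Example 12.9 Assume that an octant of the unit ball given by $x^2 + y^2 + z^2 \le 1$, $x, y, z \ge 0$ has density $\rho(x, y, z) = 1 + z$, and let us find its mass. It equals
$$\iiint_V (1 + z)\,dx\,dy\,dz = \int_0^{\pi/2} d\theta \int_0^{\pi/2} d\varphi \int_0^1 (1 + \rho\cos\varphi)\,\rho^2\sin\varphi\,d\rho$$
$$= \frac{\pi}{2}\int_0^{\pi/2}\sin\varphi\,d\varphi \int_0^1 \rho^2\,d\rho + \frac{\pi}{2}\int_0^{\pi/2}\cos\varphi\sin\varphi\,d\varphi \int_0^1 \rho^3\,d\rho = \frac{\pi}{6} + \frac{\pi}{16} = \frac{11\pi}{48}.$$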
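A direct numerical check of this mass, as a sketch under our own choices: the midpoint rule in spherical coordinates with the Jacobian $\rho^2\sin\varphi$. Since the integrand does not depend on $\theta$, the $\theta$-integral just contributes the factor $\pi/2$. The function name `octant_mass` is hypothetical.

```python
import math


def octant_mass(n=200):
    """Midpoint rule for the mass of the unit-ball octant with density 1 + z,
    in spherical coordinates (Jacobian rho^2 sin(phi))."""
    dphi = (math.pi / 2) / n
    drho = 1.0 / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for k in range(n):
            rho = (k + 0.5) * drho
            z = rho * math.cos(phi)
            total += (1 + z) * rho * rho * math.sin(phi) * dphi * drho
    # the integrand does not depend on theta, so the theta-integral contributes pi/2
    return total * (math.pi / 2)


mass = octant_mass()
```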

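Example 12.10 Let's find the mass of the conical lamina $z = \sqrt{x^2 + y^2} \le 1$ of unit density. Let's parameterize it using cylindrical coordinates as
$$\begin{cases} x = r\cos\theta \\ y = r\sin\theta \\ z = r \end{cases}, \qquad 0 \le \theta \le 2\pi, \quad 0 \le r \le 1.$$
The normal vector is then
$$N = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -r\sin\theta & r\cos\theta & 0 \\ \cos\theta & \sin\theta & 1 \end{pmatrix} = r\cos\theta\,\mathbf{i} + r\sin\theta\,\mathbf{j} - r\,\mathbf{k}.$$
Its length is $\|N\| = \sqrt{2}\,r$, so the mass equals
$$\iint_S dS = \int_0^{2\pi} d\theta \int_0^1 \sqrt{2}\,r\,dr = \sqrt{2}\,\pi.$$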
The centre of mass of a finite system of points of equal mass is their mean coordinates. More generally, given a flat lamina of density $\rho(x, y)$ occupying a region $D \subset \mathbb{R}^2$, its centre of mass is defined by
$$X = \frac{1}{M}\iint_D x\,\rho(x, y)\,dx\,dy, \qquad Y = \frac{1}{M}\iint_D y\,\rho(x, y)\,dx\,dy,$$
where $M$ is the mass of the lamina.
In a similar manner, the centre of mass of a 3D solid body occupying a volume $V \subset \mathbb{R}^3$ is defined by
$$X = \frac{1}{M}\iiint_V x\,\rho(x, y, z)\,dx\,dy\,dz, \qquad Y = \frac{1}{M}\iiint_V y\,\rho(x, y, z)\,dx\,dy\,dz,$$
$$Z = \frac{1}{M}\iiint_V z\,\rho(x, y, z)\,dx\,dy\,dz,$$
where $M$ is the mass of the solid.
Example 12.11 Let us find the centre of mass of the solid upper hemisphere $x^2 + y^2 + z^2 \le 1$, $z \ge 0$ of constant density $\rho = 1$. First, let's notice that the hemisphere is symmetric about the $z$-axis. Therefore $X = Y = 0$. Second, since the density is 1, the mass equals the volume, which is half the volume of the unit ball, that is, $M = \frac{2\pi}{3}$. Thus,
$$Z = \frac{3}{2\pi}\iiint_V z\,dx\,dy\,dz = \frac{3}{2\pi}\int_0^{2\pi} d\theta \int_0^{\pi/2} d\varphi \int_0^1 \rho^3\cos\varphi\sin\varphi\,d\rho = \frac{3}{2\pi} \cdot 2\pi \cdot \frac{1}{2} \cdot \frac{1}{4} = \frac{3}{8}.$$
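Example 12.12 Let's find the centre of mass of a solid of constant density $\rho$ occupying the region $V \subset \mathbb{R}^3$ given by $x^2 + y^2 \le z \le 1$. First, the mass is
$$M = \rho\iiint_V dx\,dy\,dz = \rho\int_0^{2\pi} d\theta \int_0^1 dr \int_{r^2}^1 r\,dz = \frac{\pi\rho}{2}.$$
Second, due to the symmetry, we have $X = Y = 0$. Finally,
$$Z = \frac{1}{M}\iiint_V \rho z\,dx\,dy\,dz = \frac{2}{\pi}\int_0^{2\pi} d\theta \int_0^1 dr \int_{r^2}^1 rz\,dz = \frac{2}{3}.$$
Thus the centre of mass is $\left(0, 0, \frac{2}{3}\right)$.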
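The two iterated integrals above can be reproduced with a few lines of Python (a sketch of our own, with a hypothetical function name `moments`): the $z$-integrals over $r^2 \le z \le 1$ are done exactly, the $r$-integral by the midpoint rule, and the $\theta$-integral contributes $2\pi$. The constant density cancels in $Z$, so it is omitted.

```python
import math


def moments(n=10000):
    """Volume and z-moment of r^2 <= z <= 1 in cylindrical coordinates."""
    h = 1.0 / n
    vol = zmom = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        vol += r * (1 - r * r) * h              # integral of r dz for r^2 <= z <= 1
        zmom += r * (1 - r ** 4) / 2 * h        # integral of r z dz for r^2 <= z <= 1
    return 2 * math.pi * vol, 2 * math.pi * zmom


volume, z_moment = moments()
Z = z_moment / volume   # the constant density cancels
```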
Probability and statistics
In probability and statistics, a continuous random variable $X$ has a density $f$. In the 1D case, the density is a function $f(x)$ such that $f(x) \ge 0$ for all $x$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. The probability that the value of $X$ is between two numbers $a$ and $b$ is
$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$
If we have two random variables $X$ and $Y$, then the joint density function $f(x, y)$ is defined. It satisfies $f(x, y) \ge 0$ and $\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = 1$. The probability that the random point $(X, Y)$ belongs to some region $D \subset \mathbb{R}^2$ is
$$P\big((X, Y) \in D\big) = \iint_D f(x, y)\,dx\,dy.$$
For example, $X$ can be the height and $Y$ the weight of a random human.
If $X$ and $Y$ are random variables with joint density $f(x, y)$, then their expected values are
$$E_X = \iint_{\mathbb{R}^2} x f(x, y)\,dx\,dy, \qquad E_Y = \iint_{\mathbb{R}^2} y f(x, y)\,dx\,dy.$$
The expected value is the mean. For example, if $f(x, y)$ denotes the joint density of an adult female's height and mass, then $E_X$ is the average height and $E_Y$ the average mass.
The Gaussian distribution is very common. For $n$ independent random variables, each with expected value $\mu$ and standard deviation $\sigma$, the joint density is
$$f(x_1, \dots, x_n; \mu, \sigma) = \frac{1}{(\sqrt{2\pi})^n\,\sigma^n}\prod_{i=1}^{n} e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2}.$$
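To connect the joint density with the double integrals above, here is a small sketch (our own; the values $\mu = 1$, $\sigma = 0.5$, the truncation to $\mu \pm 6\sigma$ and the name `gaussian_density` are assumptions) that checks numerically, for $n = 2$, that the density integrates to 1 and that $E_X = \mu$.

```python
import math


def gaussian_density(xs, mu, sigma):
    """Joint density of len(xs) independent normal variables with mean mu and
    standard deviation sigma, following the product formula above."""
    n = len(xs)
    coef = 1.0 / (math.sqrt(2 * math.pi) ** n * sigma ** n)
    return coef * math.prod(math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs)


mu, sigma = 1.0, 0.5
L, m = 6 * sigma, 300            # the square [mu - L, mu + L]^2 misses only a negligible tail
h = 2 * L / m
total = expected_x = 0.0
for i in range(m):
    x = mu - L + (i + 0.5) * h
    for j in range(m):
        y = mu - L + (j + 0.5) * h
        f = gaussian_density((x, y), mu, sigma)
        total += f * h * h               # should be close to 1
        expected_x += x * f * h * h      # should be close to mu
```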
12.3 Exercises
Questions on understanding the lecture
Exercise 12.1 Assume that a surface $S \subset \mathbb{R}^3$ encloses a 3D region (something like a sphere enclosing a ball). Prove that $S$ is orientable.
Exercise 12.2 Given a closed surface $S \subset \mathbb{R}^3$ and a vector field $V = \operatorname{curl} F$, what is the flux of $V$ across $S$?
Exercise 12.3 What is the difference between orientable and oriented surfaces?
Exercise 12.4 Assume that two oriented surfaces $S_1 \subset \mathbb{R}^3$ and $S_2 \subset \mathbb{R}^3$ have the same boundary, a curve $C = \partial S_1 = \partial S_2$. Also, let $V = \operatorname{curl} F$. Prove that
$$\iint_{S_1} V \cdot d\mathbf{S}_1 = \iint_{S_2} V \cdot d\mathbf{S}_2.$$
Exercise 12.5 Assume that the shape of a heavy rope is a curve $C \subset \mathbb{R}^3$ such that the rope's density at a point $(x, y, z)$ is $\rho(x, y, z)$. What is the mass of the rope? Where is the centre of mass?
Questions on calculation
Exercise 12.6 Evaluate the line integral
$$\oint_C (y - z)\,dx + (z - x)\,dy + (x - y)\,dz,$$
where $C$ is the ellipse given as the intersection of the cylinder $x^2 + y^2 = a^2$ and the plane $\frac{x}{a} + \frac{z}{h} = 1$ ($a > 0$, $h > 0$). The ellipse is oriented counterclockwise when viewed from the positive direction of the $Ox$-axis.
Questions on logical thinking
Exercise 12.7 Let $f(x, y)$ be a function whose partial derivatives are continuous on $\mathbb{R}^2$. Suppose that $f(0, 0) = 0$, $|f_x| \le 2|x - y|$, and $|f_y| \le 2|x - y|$. Prove that $|f(5, 4)| \le 1$.
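Lecture 13
Gauss' Theorem
13.1 Gauss Divergence Theorem
Recall that Green's Theorem is
$$\oint_{\partial D} P\,dx + Q\,dy = \iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$$
and we proved the equations
$$\oint_{\partial D} P\,dx = -\iint_D \frac{\partial P}{\partial y}\,dx\,dy, \qquad \oint_{\partial D} Q\,dy = \iint_D \frac{\partial Q}{\partial x}\,dx\,dy$$
separately, for simple regions first, and then we generalized them to any kind of region.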
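Let us now consider a vector field $P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ in a region $V \subset \mathbb{R}^3$. Assume that $V$ is simple in the sense that it can be given as $(x, y) \in D \subset \mathbb{R}^2$, $f_1(x, y) \le z \le f_2(x, y)$, and at the same time as $(x, z) \in E \subset \mathbb{R}^2$, $g_1(x, z) \le y \le g_2(x, z)$, and at the same time as $(y, z) \in F \subset \mathbb{R}^2$, $h_1(y, z) \le x \le h_2(y, z)$.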
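Let us try to understand what information we can retrieve from these assumptions. First, let's use the presentation $(x, y) \in D$, $f_1(x, y) \le z \le f_2(x, y)$. The boundary $\partial V$ consists of three components:
(i) $(x, y) \in D$, $z = f_1(x, y)$,
(ii) $(x, y) \in D$, $z = f_2(x, y)$,
(iii) $(x, y) \in \partial D$, $f_1(x, y) \le z \le f_2(x, y)$.
On the third component, which is a vertical side surface, we have $N = n\mathbf{i} + m\mathbf{j}$. Thus, the flux integral $\iint_{\partial V} R\mathbf{k} \cdot d\mathbf{S}$ gets no contribution from it. Let $S_1$ be given by $(x, y) \in D$, $z = f_1(x, y)$, and let $S_2$ be given by $(x, y) \in D$, $z = f_2(x, y)$. The outward normal vector on $S_1$ is
$$\frac{\partial f_1}{\partial x}\mathbf{i} + \frac{\partial f_1}{\partial y}\mathbf{j} - \mathbf{k}.$$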
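The outward normal vector on $S_2$ is $-\frac{\partial f_2}{\partial x}\mathbf{i} - \frac{\partial f_2}{\partial y}\mathbf{j} + \mathbf{k}$. Thus,
$$\iint_{\partial V} R\mathbf{k} \cdot d\mathbf{S} = \iint_D R(x, y, f_2(x, y))\,dx\,dy - \iint_D R(x, y, f_1(x, y))\,dx\,dy$$
$$= \iint_D dx\,dy \int_{f_1(x,y)}^{f_2(x,y)} \frac{\partial R}{\partial z}\,dz = \iiint_V \frac{\partial R}{\partial z}\,dx\,dy\,dz.$$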
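In a similar manner, we could prove that
$$\iint_{\partial V} P\mathbf{i} \cdot d\mathbf{S} = \iiint_V \frac{\partial P}{\partial x}\,dx\,dy\,dz, \qquad \iint_{\partial V} Q\mathbf{j} \cdot d\mathbf{S} = \iiint_V \frac{\partial Q}{\partial y}\,dx\,dy\,dz.$$
Adding these equations together, we get $\iint_{S} F \cdot d\mathbf{S} = \iiint_V (P_x + Q_y + R_z)\,dx\,dy\,dz$, where $S = \partial V$ and $F = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$. Recall that $\operatorname{div} F = P_x + Q_y + R_z$.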
Theorem 13.1 (Gauss) Given a region $V \subset \mathbb{R}^3$ whose boundary $S = \partial V$ is a piecewise smooth surface oriented with the normal vector pointing outwards, and a vector field $F = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$, we have
$$\iint_{\partial V} F \cdot d\mathbf{S} = \iiint_V \operatorname{div} F\,dx\,dy\,dz.$$
In other words, the flux of a vector field across a closed surface equals the integral of its divergence over the solid 3D region bounded by the surface.
We have proved Gauss' Theorem for simple regions. In order to prove it in general, we need to show that any region can be divided into simple ones. This can be done in a similar manner as for planar regions, though it is even harder, so we won't go into details.
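Example 13.2 Let us find the flux of the vector field $F = (x^2 - y^2)\mathbf{i} + (x + z)\mathbf{j} + z^3\mathbf{k}$ across the boundary of the cylindrical region $V \subset \mathbb{R}^3$ given by $x^2 + y^2 \le 1$, $x - 1 \le z \le x + 1$. In order to apply Gauss' Theorem, we need to find the divergence, that is, $\operatorname{div} F = P_x + Q_y + R_z = 2x + 3z^2$. Thus,
$$\iint_S F \cdot d\mathbf{S} = \iint_{x^2+y^2 \le 1} dx\,dy \int_{x-1}^{x+1} (2x + 3z^2)\,dz = \int_0^{2\pi} d\theta \int_0^1 dr \int_{r\cos\theta - 1}^{r\cos\theta + 1} r(2r\cos\theta + 3z^2)\,dz$$
$$= \int_0^{2\pi} d\theta \int_0^1 \left[4r^2\cos\theta + r\left((r\cos\theta + 1)^3 - (r\cos\theta - 1)^3\right)\right] dr = \int_0^{2\pi} d\theta \int_0^1 \left[4r^2\cos\theta + 2\left(3r^3\cos^2\theta + r\right)\right] dr$$
$$= \int_0^{2\pi} d\theta \int_0^1 \left(6r^3\cos^2\theta + 2r\right) dr,$$
because the integral of odd powers of the cosine over $[0, 2\pi]$ is 0. Thus the answer is
$$6 \cdot \frac{1}{4} \cdot \pi + 2 \cdot \frac{1}{2} \cdot 2\pi = \frac{3\pi}{2} + 2\pi = \frac{7\pi}{2}.$$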
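Because the algebra here is error-prone, it helps to evaluate both sides of Gauss' Theorem independently. The sketch below is our own check; it assumes $F = (x^2 - y^2)\mathbf{i} + (x + z)\mathbf{j} + z^3\mathbf{k}$ (a plus sign in the first component would give the same divergence and the same total flux). It integrates $\operatorname{div} F$ over the region, and separately the flux through the top $z = x + 1$ (outward $N = -\mathbf{i} + \mathbf{k}$), the bottom $z = x - 1$ (outward $N = \mathbf{i} - \mathbf{k}$) and the side $x^2 + y^2 = 1$. Both numbers agree with $\frac{7\pi}{2} = \frac{3\pi}{2} + 2\pi$.

```python
import math


def F(x, y, z):
    return (x * x - y * y, x + z, z ** 3)


def disk_integral(g, n=200):
    """Midpoint rule over the unit disk in polar coordinates (Jacobian r)."""
    dr, dth = 1.0 / n, 2 * math.pi / n
    s = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            s += g(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return s


def div_integrated_in_z(x, y):
    # integral of div F = 2x + 3z^2 over x - 1 <= z <= x + 1, done exactly
    lo, hi = x - 1, x + 1
    return 2 * x * (hi - lo) + (hi ** 3 - lo ** 3)


def top(x, y):
    P, Q, R = F(x, y, x + 1)     # outward normal N = -i + k
    return -P + R


def bottom(x, y):
    P, Q, R = F(x, y, x - 1)     # outward normal N = i - k
    return P - R


def side(m=200):
    # x = cos t, y = sin t, cos t - 1 <= z <= cos t + 1; unit normal (cos t, sin t, 0), dS = dt dz
    ht, hz = 2 * math.pi / m, 2.0 / m
    s = 0.0
    for k in range(m):
        t = (k + 0.5) * ht
        c, sn = math.cos(t), math.sin(t)
        for q in range(m):
            P, Q, R = F(c, sn, c - 1 + (q + 0.5) * hz)
            s += (P * c + Q * sn) * ht * hz
    return s


volume_side = disk_integral(div_integrated_in_z)
surface_side = disk_integral(top) + disk_integral(bottom) + side()
```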

Example 13.3 Let's evaluate the flux of the vector field $F = x^3\mathbf{i} + y^3\mathbf{j} + z^3\mathbf{k}$ across the sphere given by $x^2 + y^2 + z^2 = a^2$, $a > 0$. By default, it is oriented with the normal vector pointing outwards. First, $\operatorname{div}(x^3\mathbf{i} + y^3\mathbf{j} + z^3\mathbf{k}) = 3(x^2 + y^2 + z^2)$. By Gauss' Theorem, the flux is
$$\iiint_{x^2+y^2+z^2 \le a^2} 3(x^2 + y^2 + z^2)\,dx\,dy\,dz = 3\int_0^{2\pi} d\theta \int_0^{\pi} d\varphi \int_0^a \rho^4\sin\varphi\,d\rho = \frac{12\pi a^5}{5}.$$
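Corollaries of Gauss' Theorem
Corollary 13.4 The volume of a region bounded by a surface $S$, oriented with the normal vector pointing outwards, equals
$$\iint_S x\mathbf{i} \cdot d\mathbf{S} = \iint_S y\mathbf{j} \cdot d\mathbf{S} = \iint_S z\mathbf{k} \cdot d\mathbf{S}.$$
Proof We have $\operatorname{div}(x\mathbf{i}) = \operatorname{div}(y\mathbf{j}) = \operatorname{div}(z\mathbf{k}) = 1$. By Gauss' Theorem,
$$\iint_S x\mathbf{i} \cdot d\mathbf{S} = \iint_S y\mathbf{j} \cdot d\mathbf{S} = \iint_S z\mathbf{k} \cdot d\mathbf{S} = \iiint_V 1\,dx\,dy\,dz,$$
which is exactly the volume of the region $V$. :)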
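Example 13.5 Let us find the volume of the ball bounded by the sphere $x^2 + y^2 + z^2 = 1$. As we already know, the normal vector is $N = \sin^2\varphi\cos\theta\,\mathbf{i} + \sin^2\varphi\sin\theta\,\mathbf{j} + \sin\varphi\cos\varphi\,\mathbf{k}$. Thus the volume is
$$\iint_S z\mathbf{k} \cdot d\mathbf{S} = \int_0^{2\pi} d\theta \int_0^{\pi} \sin\varphi\cos^2\varphi\,d\varphi = \frac{4\pi}{3}.$$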
Of course, the surface integral is usually harder to find than the triple integral, so this corollary is not very useful. The only reasonable example known to us is as follows.
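Example 13.6 Let's find the volume of the torus of revolution given by
$$\begin{cases} x = (a + b\cos\varphi)\cos\theta \\ y = (a + b\cos\varphi)\sin\theta \\ z = b\sin\varphi \end{cases}, \qquad 0 \le \varphi, \theta \le 2\pi.$$
Here, $a > 0$ is the radius of the big equator and $b > 0$, with $b < a$, is the radius of the small equator. First, we need to find the normal vector, that is,
$$N = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -(a + b\cos\varphi)\sin\theta & (a + b\cos\varphi)\cos\theta & 0 \\ -b\sin\varphi\cos\theta & -b\sin\varphi\sin\theta & b\cos\varphi \end{pmatrix} = b(a + b\cos\varphi)\left[\cos\varphi\cos\theta\,\mathbf{i} + \cos\varphi\sin\theta\,\mathbf{j} + \sin\varphi\,\mathbf{k}\right].$$
Thus the volume equals
$$\iint_S z\mathbf{k} \cdot d\mathbf{S} = \int_0^{2\pi} d\theta \int_0^{2\pi} b(a + b\cos\varphi)\sin\varphi \cdot b\sin\varphi\,d\varphi = 2\pi^2ab^2.$$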
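Corollary 13.4 can be tested on the torus without trusting the hand-computed normal. In this sketch (our own; the helper names and the test values $a = 3$, $b = 1$ are assumptions) the partials $r_\theta$ and $r_\varphi$ are taken by central differences, only the $\mathbf{k}$-component of $N = r_\theta \times r_\varphi$ is needed, and the flux of $z\mathbf{k}$ is compared with $2\pi^2ab^2$.

```python
import math


def torus(phi, th, a, b):
    w = a + b * math.cos(phi)
    return (w * math.cos(th), w * math.sin(th), b * math.sin(phi))


def volume_by_flux(a, b, n=200, h=1e-6):
    """Flux of z k through the torus, with N = r_theta x r_phi (the order used above)."""
    d = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * d
        for j in range(n):
            th = (j + 0.5) * d
            rt = [(p - q) / (2 * h) for p, q in zip(torus(phi, th + h, a, b), torus(phi, th - h, a, b))]
            rp = [(p - q) / (2 * h) for p, q in zip(torus(phi + h, th, a, b), torus(phi - h, th, a, b))]
            Nz = rt[0] * rp[1] - rt[1] * rp[0]      # k-component of r_theta x r_phi
            z = torus(phi, th, a, b)[2]
            total += z * Nz * d * d
    return total


vol = volume_by_flux(3.0, 1.0)
```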
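Corollary 13.7 Given a divergence-free vector field $F = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ defined in the whole of $\mathbb{R}^3$ and two oriented surfaces $S_1$, $S_2$ with the same oriented boundary $C = \partial S_1 = \partial S_2$, as shown in Figure 13.1, we have
$$\iint_{S_1} F \cdot d\mathbf{S}_1 = \iint_{S_2} F \cdot d\mathbf{S}_2.$$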
Proof Let $V$ denote the region enclosed by the surfaces $S_1$ and $S_2$. Then the boundary of the region $V$ geometrically consists of the surfaces $S_1$ and $S_2$, and it is oriented with the normal vector pointing outwards. But this orientation on $\partial V$ induces opposite orientations on the curve $C$. Since the original orientation on $C$ was supposed to be the same as that induced from either $S_1$ or $S_2$, the oriented boundary of the region $V$ consists of $S_1$ and $S_2$, one of which is taken with reversed orientation.
Figure 13.1: Two surfaces having the same boundary
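Now let's apply Gauss' Theorem (say $S_2$ is the surface taken with reversed orientation):
$$\iint_{\partial V} F \cdot d\mathbf{S} = \iint_{S_1} F \cdot d\mathbf{S}_1 - \iint_{S_2} F \cdot d\mathbf{S}_2 = \iiint_V \operatorname{div} F\,dx\,dy\,dz = 0.$$
Therefore,
$$\iint_{S_1} F \cdot d\mathbf{S}_1 = \iint_{S_2} F \cdot d\mathbf{S}_2.$$
:)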
Exercise 13.1 What if $S_1$ and $S_2$ intersect? Is Corollary 13.7 still true?
Notice that the proof does not require $F$ to be defined in the whole of $\mathbb{R}^3$. We have only used the fact that the 3D region enclosed by $S_1$ and $S_2$ is completely contained in the domain of $F$. Similarly to the 2D case, we could call such a region simply connected, but this term actually means something else for 3D regions, so one would need some other word. Anyway, it is true that if $\operatorname{div} F = 0$ and $F$ is defined in the whole region between two surfaces $S_1$ and $S_2$ with the same boundary, then the fluxes are equal.
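Example 13.8 Let's again evaluate the dictatorship integral $\iint_S V \cdot d\mathbf{S}$ from Example 12.5, where $V = \mathbf{i} + e^z\cos y\,\mathbf{j} + (-1 + e^z\sin y)\,\mathbf{k}$ and $S$ is the hemisphere given by $x^2 + y^2 + z^2 = 9$, $x \ge 0$. Notice that $\operatorname{div} V = 0$. Therefore $\iint_S V \cdot d\mathbf{S} = \iint_{S_2} V \cdot d\mathbf{S}_2$, where $S_2$ is the flat disk bounded by the same circle. Thus $S_2$ is given by $x = 0$, $y^2 + z^2 \le 9$. Its normal vector is $N = \mathbf{i} = (1, 0, 0)$. Finally,
$$\iint_S V \cdot d\mathbf{S} = \iint_{S_2} V \cdot d\mathbf{S}_2 = \iint_{y^2+z^2 \le 9} 1\,dy\,dz = 9\pi.$$
Since $V$ is measured in cm/s, this is $\frac{9\pi}{100}$ m³/s, as in Example 12.5.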
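Example 13.9 Let's evaluate $\iint_S F \cdot d\mathbf{S}$, where $F = e^{x-y}\mathbf{i} + e^{x-y}\mathbf{j} + \mathbf{k}$ and the surface $S$ is given by
$$z = y\sin\pi(x^2 + y), \qquad 0 \le y \le 1 - x^2,$$
and oriented with the normal vector pointing upwards. Again, notice that $\operatorname{div} F = 0$ and that $z = 0$ on the boundary of $S$ (where $y = 0$ or $x^2 + y = 1$), so instead of finding the flux across the surface $S$ we can evaluate the flux through the flat region
$$x = x, \quad y = y, \quad z = 0, \qquad 0 \le y \le 1 - x^2.$$
Here the normal vector is $N = \mathbf{k}$, so
$$\iint_S F \cdot d\mathbf{S} = \int_{-1}^{1} dx \int_0^{1 - x^2} dy = \frac{4}{3}.$$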
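The shortcut through the flat region can be verified by computing the flux through the curved surface directly. This sketch assumes our reading of the example, $F = e^{x-y}\mathbf{i} + e^{x-y}\mathbf{j} + \mathbf{k}$ and $z = g(x, y) = y\sin\pi(x^2 + y)$; for a graph the upward normal is $N = -g_x\mathbf{i} - g_y\mathbf{j} + \mathbf{k}$, and the midpoint rule is applied with $y$ rescaled to $[0, 1 - x^2]$. The helper names are hypothetical.

```python
import math


def integrand(x, y):
    # F . N for the graph z = g(x, y) = y sin(pi (x^2 + y)), with N = -g_x i - g_y j + k
    s = math.pi * (x * x + y)
    gx = y * math.cos(s) * 2 * math.pi * x
    gy = math.sin(s) + y * math.cos(s) * math.pi
    e = math.exp(x - y)          # F = e^{x-y} i + e^{x-y} j + k
    return -e * gx - e * gy + 1.0


def flux_curved(n=400):
    """Midpoint rule: x in [-1, 1], then y in [0, 1 - x^2]."""
    hx = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1 + (i + 0.5) * hx
        hy = (1 - x * x) / n
        for j in range(n):
            y = (j + 0.5) * hy
            total += integrand(x, y) * hx * hy
    return total


curved = flux_curved()
```

The terms involving $g_x$ and $g_y$ integrate to zero, which is the divergence-free argument at work: only the constant $\mathbf{k}$-component survives, leaving the area $\frac{4}{3}$ of the flat region.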
Gauss' Theorem in physics
Now we can explain the physical meaning of the divergence using Gauss' Theorem. Let $v = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$ be a field of velocities in a liquid; let's look at some point $a \in \mathbb{R}^3$ and consider a small ball centred at $a$. By Gauss' Theorem, the flux of $v$ out of the ball is approximately $\operatorname{div} v(a)$ times the volume of the ball.
If the liquid just flows through the ball, then the flux is zero because all the liquid that comes in then goes out. Therefore, $\operatorname{div} v(a) = 0$.
If $\operatorname{div} v(a) > 0$, then the flux must be positive. It means that some liquid appears at $a$, so $a$ is a source.
If $\operatorname{div} v(a) < 0$, then the liquid disappears at $a$, so $a$ is a sink.
Generally, liquids and gases are described by the Navier-Stokes equations. One of them is
$$\frac{\partial\rho}{\partial t} + \operatorname{div}(\rho v) = 0.$$
Here $\rho$ is the density function and $v$ is the field of velocities. In an incompressible liquid, we have $\rho = \text{const}$. So the equation is just $\operatorname{div} v = 0$, which we predicted by pure mathematical arguments!
Curl, divergence, and Stokes' and Gauss' theorems are used in Maxwell's equations describing electromagnetic fields. Specifically, if $E$ is the electric field and $H$ is the magnetic field, then in vacuum (if it's not vacuum, then the equations become more sophisticated) we have the following formulae:
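Gauss' law:
$$\operatorname{div} E = 0 \quad \text{and} \quad \operatorname{div} H = 0$$
Faraday's law of induction:
$$\operatorname{curl} E = -\mu_0\frac{\partial H}{\partial t}$$
Ampère's circuital law:
$$\operatorname{curl} H = \varepsilon_0\frac{\partial E}{\partial t}$$
Here, $\mu_0$ and $\varepsilon_0$ are some (known) constants.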
13.2 Summary on Multivariate Integral Calculus
Fubini's Theorem
To evaluate the integral of a function $f(x, y, z)$ over a region $V \subset \mathbb{R}^3$, we need to define $V$ by inequalities
$$a \le x \le b, \quad f_1(x) \le y \le f_2(x), \quad g_1(x, y) \le z \le g_2(x, y).$$
Then we have
$$\iiint_V f(x, y, z)\,dx\,dy\,dz = \int_a^b dx \int_{f_1(x)}^{f_2(x)} dy \int_{g_1(x,y)}^{g_2(x,y)} f(x, y, z)\,dz.$$
Double and other multiple integrals are done in the same manner.
A very serious mistake is to change the order of integration like
$$\int_{f_1(x)}^{f_2(x)} dy \int_a^b dx \int_{g_1(x,y)}^{g_2(x,y)} f(x, y, z)\,dz,$$
where the outer limits depend on a variable that has already been integrated out.
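Change of variables in a multiple integral
Given a one-to-one transformation $T : u \mapsto x$ with continuous partial derivatives, where $u \in \mathbb{R}^n$ and $x \in \mathbb{R}^n$, its Jacobian determinant is
$$J = \det\begin{pmatrix} \frac{\partial x_1}{\partial u_1} & \frac{\partial x_1}{\partial u_2} & \cdots & \frac{\partial x_1}{\partial u_n} \\ \frac{\partial x_2}{\partial u_1} & \frac{\partial x_2}{\partial u_2} & \cdots & \frac{\partial x_2}{\partial u_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial u_1} & \frac{\partial x_n}{\partial u_2} & \cdots & \frac{\partial x_n}{\partial u_n} \end{pmatrix}.$$
Then for any function $f(x)$, we have
$$\int_{T(E)} f(x)\,dx = \int_E f(T(u))\,|J|\,du.$$
In particular, $J = r$ for polar and cylindrical coordinates and $J = \rho^2\sin\varphi$ for spherical coordinates. Usually, polar and cylindrical coordinates are suitable for domains and functions containing $x^2 + y^2 = r^2$, and spherical coordinates are good for domains and functions that involve $x^2 + y^2 + z^2 = \rho^2$.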
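The Jacobians quoted above can be confirmed numerically. The sketch below (our own; the helper name `jacobian_det3` and the sample points are hypothetical) builds the $3 \times 3$ Jacobian matrix of a transformation by central differences and expands its determinant along the first row, for spherical coordinates in the order $(\rho, \varphi, \theta)$ and cylindrical coordinates in the order $(r, \theta, z)$.

```python
import math


def jacobian_det3(T, u, h=1e-6):
    """Determinant of the central-difference Jacobian matrix dx_i/du_j of T: R^3 -> R^3."""
    cols = []
    for j in range(3):
        up, um = list(u), list(u)
        up[j] += h
        um[j] -= h
        cols.append([(p - q) / (2 * h) for p, q in zip(T(*up), T(*um))])
    J = [[cols[j][i] for j in range(3)] for i in range(3)]   # J[i][j] = dx_i / du_j
    return (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
            - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
            + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))


def spherical(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


def cylindrical(r, theta, z):
    return (r * math.cos(theta), r * math.sin(theta), z)


J_sph = jacobian_det3(spherical, (2.0, 0.7, 1.1))     # expected rho^2 sin(phi) = 4 sin(0.7)
J_cyl = jacobian_det3(cylindrical, (1.5, 0.3, -2.0))  # expected r = 1.5
```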
A very serious mistake is forgetting about the Jacobian like
$$\int_E f(T(u))\,du.$$
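Line and surface integrals
A curve is given by a parametric equation $(x, y, z) = r(t)$, $r : \mathbb{R} \to \mathbb{R}^3$. Its tangent vector is
$$T = r'(t) = \frac{dx}{dt}\mathbf{i} + \frac{dy}{dt}\mathbf{j} + \frac{dz}{dt}\mathbf{k}.$$
A surface is given by a parametric equation $(x, y, z) = r(u, v)$, $r : \mathbb{R}^2 \to \mathbb{R}^3$. Its normal vector is
$$N = r_u \times r_v = \left(\frac{\partial x}{\partial u}\mathbf{i} + \frac{\partial y}{\partial u}\mathbf{j} + \frac{\partial z}{\partial u}\mathbf{k}\right) \times \left(\frac{\partial x}{\partial v}\mathbf{i} + \frac{\partial y}{\partial v}\mathbf{j} + \frac{\partial z}{\partial v}\mathbf{k}\right).$$
The line integral of a function $f$ is $\int_a^b f\,\|T\|\,dt$. The surface integral of a function $f$ is $\iint_D f\,\|N\|\,du\,dv$.
The line integral of a vector field $F$ is $\int_a^b F \cdot T\,dt$. The surface integral of a vector field $F$ is $\iint_D F \cdot N\,du\,dv$.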
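Theorems about the integral on the boundary
Newton-Leibniz Theorem is
$$\int_a^b df = f(b) - f(a).$$
Green's Theorem is
$$\iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \oint_{\partial D} P\,dx + Q\,dy.$$
Stokes' Theorem is
$$\iint_S \operatorname{curl} F \cdot d\mathbf{S} = \oint_{\partial S} P\,dx + Q\,dy + R\,dz.$$
Gauss' Theorem is
$$\iiint_V \operatorname{div} F\,dx\,dy\,dz = \iint_{\partial V} F \cdot d\mathbf{S}.$$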
13.3 Exercises
Exercise 13.2 For a function $f(x, y, z)$, its Laplacian is $\Delta f = f_{xx} + f_{yy} + f_{zz}$. Consider a region $V \subset \mathbb{R}^3$ and let $S = \partial V$ be its boundary. Further, let $U$ be the unit normal vector to $S$ pointing outwards. Prove that
$$\iint_S D_U f\,dS = \iiint_V \Delta f\,dx\,dy\,dz.$$
Here, $D_U f$ is the directional derivative of the function $f$ along the vector $U$. Thus this formula relates the surface integral of a function and the triple integral. In fact, this is a particular case of Green's first identity.
Exercise 13.3 A function $f(x, y, z)$ is called harmonic if $\Delta f = 0$ (in other words, if it's a solution to the Laplace equation). Let $f(x, y, z)$ be a harmonic function. Consider a 3D region $V$ enclosed by a piecewise smooth surface $S = \partial V$. Prove that
$$\iiint_V \left(f_x^2 + f_y^2 + f_z^2\right) dx\,dy\,dz = \iint_S f\,(D_U f)\,dS,$$
where $U$ is the unit normal vector to the surface $S$ pointing outwards.
Use this equation to prove that a function harmonic in a bounded region $V$ is uniquely determined by its values on $\partial V$.
Figure 13.2: Surface for Exercise 13.4.
Exercise 13.4 Let $a, b, c > 0$ be fixed real numbers. Find the volume enclosed by the surface
$$x = a\cos u\cos v + b\sin u\sin v, \quad y = a\cos u\sin v - b\sin u\cos v, \quad z = c\sin u$$
and the planes $z = \pm c$. The picture of the surface is attached.
You might also like