
Chapter 2

Differentiation in higher dimensions

2.1 The Total Derivative

Recall that if $f : \mathbb{R} \to \mathbb{R}$ is a 1-variable function, and $a \in \mathbb{R}$, we say that $f$ is differentiable at $x = a$ if and only if the ratio $\frac{f(a+h)-f(a)}{h}$ tends to a finite limit, denoted $f'(a)$, as $h$ tends to $0$.
There are two possible ways to generalize this for vector fields
\[
f : D \to \mathbb{R}^m, \quad D \subseteq \mathbb{R}^n,
\]
for points $a$ in the interior $D^0$ of $D$. (The interior of a set $X$ is defined to be the subset $X^0$ obtained by removing all the boundary points. Since every point of $X^0$ is an interior point, it is open.) The reader seeing this material for the first time will be well advised to stick to vector fields $f$ with domain all of $\mathbb{R}^n$ in the beginning. Even in the one-dimensional case, if a function is defined on a closed interval $[a, b]$, say, then one can properly speak of differentiability only at points in the open interval $(a, b)$.
The first thing one might do is to fix a vector $v$ in $\mathbb{R}^n$ and say that $f$ is differentiable along $v$ iff the following limit makes sense:
\[
\lim_{h \to 0} \frac{1}{h}\bigl(f(a + hv) - f(a)\bigr).
\]
When it does, we write $f'(a; v)$ for the limit. Note that this definition makes sense because $a$ is an interior point. Indeed, under this hypothesis, $D$ contains a basic open set $U$ containing $a$, and so $a + hv$ will, for small enough $h$, fall into $U$, allowing us to speak of $f(a + hv)$. This
derivative behaves exactly like the one variable derivative and has analogous properties. For
example, we have the following
Mean Value Theorem. Assume $f'(a + tv; v)$ exists for all $0 \le t \le 1$. Then there exists $t_0 \in [0, 1]$ such that $f'(a + t_0 v; v) = f(a + v) - f(a)$.
Proof. Put $\varphi(t) = f(a + tv)$. By hypothesis, $\varphi$ is differentiable at every $t$ in $[0, 1]$, and $\varphi'(t) = f'(a + tv; v)$. By the one variable mean value theorem, there exists a $t_0$ such that
\[
\varphi'(t_0) = \frac{\varphi(1) - \varphi(0)}{1 - 0} = \varphi(1) - \varphi(0) = f(a + v) - f(a).
\]
Done.
When $v$ is a unit vector, $f'(a; v)$ is called the directional derivative of $f$ at $a$ in the direction of $v$.
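To see the definition in action numerically, here is a minimal sketch (not part of the notes; it assumes NumPy, and the helper name and example field are arbitrary choices) that approximates $f'(a; v)$ by a difference quotient at a small $h$:

```python
import numpy as np

def directional_derivative(f, a, v, h=1e-6):
    """Approximate f'(a; v) by the symmetric difference quotient
    (f(a + h v) - f(a - h v)) / (2 h) for a small h."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    return (f(a + h * v) - f(a - h * v)) / (2 * h)

# Example field f : R^2 -> R^2, point a = (1, 0), direction v = e_2.
f = lambda p: np.array([p[0] ** 2 * p[1], np.sin(p[1])])
print(directional_derivative(f, [1.0, 0.0], [0.0, 1.0]))  # approx. [1.0, 1.0]
```

The symmetric quotient is used only for numerical accuracy; the definition above uses the one-sided quotient.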
The disadvantage of this construction is that it forces us to study the change of $f$ in one direction at a time. So we revisit the one-dimensional definition and note that the condition for differentiability there is equivalent to requiring that there exists a constant $c$ ($= f'(a)$) such that
\[
\lim_{h \to 0} \frac{f(a + h) - f(a) - ch}{h} = 0.
\]
If we put $L(h) = f'(a)h$, then $L : \mathbb{R} \to \mathbb{R}$ is clearly a linear map. We generalize this idea in higher dimensions as follows:
Definition. Let $f : D \to \mathbb{R}^m$ ($D \subseteq \mathbb{R}^n$) be a vector field and $a$ an interior point of $D$. Then $f$ is differentiable at $x = a$ if and only if there exists a linear map $L : \mathbb{R}^n \to \mathbb{R}^m$ such that
\[
(*) \qquad \lim_{u \to 0} \frac{\|f(a + u) - f(a) - L(u)\|}{\|u\|} = 0.
\]
Note that the norm $\|\cdot\|$ denotes the length of vectors in $\mathbb{R}^m$ in the numerator and in $\mathbb{R}^n$ in the denominator; this should not lead to any confusion, however.
Lemma 1. Such an $L$, if it exists, is unique.
Proof. Suppose we have $L, M : \mathbb{R}^n \to \mathbb{R}^m$ satisfying $(*)$ at $x = a$. Then
\[
\lim_{u \to 0} \frac{\|L(u) - M(u)\|}{\|u\|} = \lim_{u \to 0} \frac{\|L(u) + f(a) - f(a + u) + (f(a + u) - f(a) - M(u))\|}{\|u\|}
\]
\[
\le \lim_{u \to 0} \frac{\|L(u) + f(a) - f(a + u)\|}{\|u\|} + \lim_{u \to 0} \frac{\|f(a + u) - f(a) - M(u)\|}{\|u\|} = 0.
\]
Pick any non-zero $v \in \mathbb{R}^n$, and set $u = tv$, with $t \in \mathbb{R}$. Then the linearity of $L, M$ implies that $L(tv) = tL(v)$ and $M(tv) = tM(v)$. Consequently, we have
\[
0 = \lim_{t \to 0} \frac{\|L(tv) - M(tv)\|}{\|tv\|} = \lim_{t \to 0} \frac{|t|\,\|L(v) - M(v)\|}{|t|\,\|v\|} = \frac{1}{\|v\|}\,\|L(v) - M(v)\|.
\]
Then $L(v) - M(v)$ must be zero.


Definition. If the limit condition $(*)$ holds for a linear map $L$, we call $L$ the total derivative of $f$ at $a$, and denote it by $T_a f$.
It is mind-boggling at first to think of the derivative as a linear map. A natural question which arises immediately is to know what the value of $T_a f$ is at any vector $v$ in $\mathbb{R}^n$. We will show in section 2.3 that this value is precisely $f'(a; v)$, thus linking the two generalizations of the one-dimensional derivative.
Sometimes one can guess what the answer should be, and if (*) holds for this choice, then
it must be the derivative by uniqueness. Here are four examples which illustrate this.
(1) Let $f$ be a constant vector field, i.e., there exists a vector $w \in \mathbb{R}^m$ such that $f(x) = w$, for all $x$ in the domain $D$. Then we claim that $f$ is differentiable at any $a \in D^0$ with derivative zero. Indeed, if we put $L(u) = 0$, for any $u \in \mathbb{R}^n$, then $(*)$ is satisfied, because $f(a + u) - f(a) = w - w = 0$.
(2) Let $f$ be a linear map. Then we claim that $f$ is differentiable everywhere with $T_a f = f$. Indeed, if we put $L(u) = f(u)$, then by the linearity of $f$, $f(a + u) - f(a) - L(u)$ will be zero for any $u \in \mathbb{R}^n$, so that $(*)$ holds trivially.
(3) Let $f(x) = \langle x, x\rangle = \|x\|^2$. Then
\[
f(a + u) - f(a) = \langle a + u, a + u\rangle - \langle a, a\rangle = \langle u, a\rangle + \langle a, u\rangle + \langle u, u\rangle = 2\langle a, u\rangle + \|u\|^2,
\]
using the linearity of $\langle\cdot,\cdot\rangle$ in each argument as well as its symmetry. Defining $L$ by $L(u) = 2\langle a, u\rangle$, we have
\[
\frac{\|f(a + u) - f(a) - L(u)\|}{\|u\|} = \frac{\|u\|^2}{\|u\|} = \|u\|,
\]
and this tends to zero as $u$ tends to zero. Identifying linear maps from $\mathbb{R}^n$ to $\mathbb{R}$ with row vectors, we get $T_a f = 2a^t$.
(4) Here is another variation on the theme that the derivative of $x^2$ is $2x$. Let $f(X) = X^2 = X \cdot X$, where $X$ is an $n \times n$ matrix and $\cdot$ denotes matrix multiplication. So $f$ is a function from $\mathbb{R}^{n^2}$ to $\mathbb{R}^{n^2}$, where we view the space of $n \times n$ matrices as $\mathbb{R}^{n^2}$. Again, just using the bilinearity of matrix multiplication, we find $f(A + U) - f(A) = A \cdot U + U \cdot A + U^2$. Using the fact (not proven in this class) that $\|XY\| \le \|X\|\,\|Y\|$ for matrices $X$ and $Y$, where $\|X\| = \sqrt{\sum_{i,j} |X_{i,j}|^2}$, we find that $T_A f$ is the linear map $U \mapsto A \cdot U + U \cdot A$. But this time we cannot rewrite this as $2A \cdot U$, since matrix multiplication is not commutative.
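As an informal numerical check of example (4) (not in the notes; assumes NumPy), the remainder $f(A + U) - f(A) - (A \cdot U + U \cdot A)$ equals $U^2$, hence is of second order in $\|U\|$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
U = 1e-4 * rng.standard_normal((3, 3))   # a small perturbation

f = lambda X: X @ X                      # f(X) = X . X
L_U = A @ U + U @ A                      # candidate value (T_A f)(U)

remainder = f(A + U) - f(A) - L_U        # equals U @ U exactly
print(np.allclose(remainder, U @ U))     # True
print(np.linalg.norm(remainder))         # O(||U||^2), tiny compared to ||U||
```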
This concludes our list of examples where we can show directly that $T_a f$ exists. Theorem 1(d) below will give a powerful criterion for the existence of $T_a f$ in many more examples.
Before we leave this section, it will be useful to take note of the following:
Lemma 2. Let $f_1, \ldots, f_m$ be the component (scalar) fields of $f$. Then $f$ is differentiable at $a$ iff each $f_i$ is differentiable at $a$.
An easy consequence of this lemma is that, when $n = 1$, $f$ is differentiable at $a$ iff the following familiar looking limit exists in $\mathbb{R}^m$:
\[
\lim_{h \to 0} \frac{f(a + h) - f(a)}{h},
\]
allowing us to suggestively write $f'(a)$ instead of $T_a f$. Clearly, $f'(a)$ is given by the vector $(f_1'(a), \ldots, f_m'(a))$, so that $(T_a f)(h) = f'(a)h$, for any $h \in \mathbb{R}$.
Proof. Let $f$ be differentiable at $a$. For each $v \in \mathbb{R}^n$, write $L_i(v)$ for the $i$-th component of $(T_a f)(v)$. Then $L_i$ is clearly linear. Since $f_i(a + u) - f_i(a) - L_i(u)$ is the $i$-th component of $f(a + u) - f(a) - L(u)$, the norm of the former is less than or equal to that of the latter. This shows that $(*)$ holds with $f$ replaced by $f_i$ and $L$ replaced by $L_i$. So $f_i$ is differentiable for any $i$. Conversely, suppose each $f_i$ is differentiable. Put $L(v) = ((T_a f_1)(v), \ldots, (T_a f_m)(v))$. Then $L$ is a linear map, and by the triangle inequality,
\[
\|f(a + u) - f(a) - L(u)\| \le \sum_{i=1}^{m} |f_i(a + u) - f_i(a) - (T_a f_i)(u)|.
\]
It follows easily that $(*)$ holds and so $f$ is differentiable at $a$.

2.2 Partial Derivatives

Let $\{e_1, \ldots, e_n\}$ denote the standard basis of $\mathbb{R}^n$. The directional derivatives along the unit vectors $e_j$ are of special importance.
Definition. Let $j \le n$. The $j$th partial derivative of $f$ at $x = a$ is $f'(a; e_j)$, denoted by $\frac{\partial f}{\partial x_j}(a)$ or $D_j f(a)$.
Just as in the case of the total derivative, it can be shown that $\frac{\partial f}{\partial x_j}(a)$ exists iff $\frac{\partial f_i}{\partial x_j}(a)$ exists for each coordinate field $f_i$.


Example: Define $f : \mathbb{R}^3 \to \mathbb{R}^2$ by
\[
f(x, y, z) = (e^{x \sin(y)}, z \cos(y)).
\]
All the partial derivatives exist at any $a = (x_0, y_0, z_0)$. We will show this for $\frac{\partial f}{\partial y}$ and leave it to the reader to check the remaining cases. Note that
\[
\frac{1}{h}\bigl(f(a + he_2) - f(a)\bigr) = \left( \frac{e^{x_0 \sin(y_0 + h)} - e^{x_0 \sin(y_0)}}{h},\; z_0\,\frac{\cos(y_0 + h) - \cos(y_0)}{h} \right).
\]
We have to understand the limit as $h$ goes to $0$. Then the methods of one variable calculus show that the right hand side tends to the finite limit $(x_0 \cos(y_0) e^{x_0 \sin(y_0)}, -z_0 \sin(y_0))$, which is $\frac{\partial f}{\partial y}(a)$. In effect, the partial derivative with respect to $y$ is calculated like a one variable derivative, keeping $x$ and $z$ fixed. Let us note without proof that $\frac{\partial f}{\partial x}(a)$ is $(\sin(y_0) e^{x_0 \sin(y_0)}, 0)$ and $\frac{\partial f}{\partial z}(a)$ is $(0, \cos(y_0))$.
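The same partials can be checked symbolically; the following sketch (an aside, not from the notes, assuming the sympy library) differentiates each component while holding the other variables fixed:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Matrix([sp.exp(x * sp.sin(y)), z * sp.cos(y)])

# The j-th column of the Jacobian is the partial derivative along e_j.
J = f.jacobian([x, y, z])
print(J[:, 1])  # df/dy = (x*cos(y)*exp(x*sin(y)), -z*sin(y))
```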
It is easy to see from the definition that $f'(a; tv)$ equals $t f'(a; v)$, for any $t \in \mathbb{R}$. We also have the following
Lemma 3. Suppose the derivatives of $f$ along any $v \in \mathbb{R}^n$ exist near $a$ and are continuous at $a$. Then
\[
f'(a; v + v') = f'(a; v) + f'(a; v'),
\]
for all $v, v'$ in $\mathbb{R}^n$. In particular, the directional derivatives of $f$ are all determined by the $n$ partial derivatives.

Proof. If $\alpha, \beta$ are functions of $h \in \mathbb{R}$, let us write
\[
\alpha(h) \sim \beta(h) \iff \lim_{h \to 0} \frac{\alpha(h) - \beta(h)}{h} = 0.
\]
Check that $\sim$ is an equivalence relation. Then by definition, we have, for all $a \in D^0$ and $u$ in $\mathbb{R}^n$,
\[
f(a + hu) \sim f(a) + h f'(a; u).
\]
Then $f(a + h(v + v'))$ is equivalent to $f(a) + h f'(a; v + v')$ on the one hand, and to
\[
f(a + hv) + h f'(a + hv; v') \sim f(a) + h\bigl(f'(a; v) + f'(a + hv; v')\bigr)
\]
on the other. Moreover, the continuity hypothesis shows that $f'(a + hv; v')$ tends to $f'(a; v')$ as $h$ goes to $0$. Consequently, we get the equivalence of $f'(a; v + v')$ with $f'(a; v) + f'(a; v')$. Since they are independent of $h$, they must in fact be equal.
Finally, since $\{e_j \mid j \le n\}$ is a basis of $\mathbb{R}^n$, we can write any $v$ as $\sum_j \lambda_j e_j$, and by what we have just shown, $f'(a; v)$ is determined as $\sum_j \lambda_j \frac{\partial f}{\partial x_j}(a)$.
In the next section we will show that the conclusion of this lemma remains valid without
the continuity hypothesis if we assume instead that f has a total derivative at a.
The gradient of a scalar field $g$ at an interior point $a$ of its domain in $\mathbb{R}^n$ is defined to be the following vector in $\mathbb{R}^n$:
\[
\nabla g(a) = \operatorname{grad} g(a) = \left( \frac{\partial g}{\partial x_1}(a), \ldots, \frac{\partial g}{\partial x_n}(a) \right).
\]
Given a vector field $f$ as above, we can then put together the gradients of its component fields $f_i$, $1 \le i \le m$, and form the following important matrix, called the Jacobian matrix at $a$:
\[
Df(a) = \left( \frac{\partial f_i}{\partial x_j}(a) \right)_{1 \le i \le m,\; 1 \le j \le n} \in M_{m,n}(\mathbb{R}).
\]
The $i$-th row is given by $\nabla f_i(a)$, while the $j$-th column is given by $\frac{\partial f}{\partial x_j}(a)$.
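For a computational illustration (not from the notes; assumes NumPy, and the helper name is ours), one can assemble $Df(a)$ column by column from finite-difference estimates of the partials $\frac{\partial f}{\partial x_j}(a)$, here applied to the example field of this section:

```python
import numpy as np

def jacobian(f, a, h=1e-6):
    """Finite-difference approximation of Df(a): the j-th column
    estimates the partial derivative of f along e_j."""
    a = np.asarray(a, dtype=float)
    cols = [(f(a + h * e) - f(a - h * e)) / (2 * h) for e in np.eye(a.size)]
    return np.column_stack(cols)

# f(x, y, z) = (e^{x sin y}, z cos y) at a = (1, pi/2, 2).
f = lambda p: np.array([np.exp(p[0] * np.sin(p[1])), p[2] * np.cos(p[1])])
print(jacobian(f, [1.0, np.pi / 2, 2.0]))  # rows ~ grad f_1(a), grad f_2(a)
```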

2.3 The main theorem

In this section we collect the main properties of the total and partial derivatives.

Theorem 1. Let $f : D \to \mathbb{R}^m$ be a vector field, and $a$ an interior point of its domain $D \subseteq \mathbb{R}^n$.

(a) If $f$ is differentiable at $a$, then for any vector $v$ in $\mathbb{R}^n$,
\[
(T_a f)(v) = f'(a; v).
\]
In particular, since $T_a f$ is linear, we have
\[
f'(a; \alpha v + \beta v') = \alpha f'(a; v) + \beta f'(a; v'),
\]
for all $v, v'$ in $\mathbb{R}^n$ and $\alpha, \beta$ in $\mathbb{R}$.
(b) Again assume that $f$ is differentiable. Then the matrix of the linear map $T_a f$ relative to the standard bases of $\mathbb{R}^n$, $\mathbb{R}^m$ is simply the Jacobian matrix of $f$ at $a$.
(c) $f$ differentiable at $a$ $\Rightarrow$ $f$ continuous at $a$.
(d) Suppose all the partial derivatives of $f$ exist near $a$ and are continuous at $a$. Then $T_a f$ exists.
(e) (chain rule) Consider
\[
\mathbb{R}^n \xrightarrow{\;f\;} \mathbb{R}^m \xrightarrow{\;g\;} \mathbb{R}^k, \qquad a \mapsto b = f(a).
\]
Suppose $f$ is differentiable at $a$ and $g$ is differentiable at $b = f(a)$. Then the composite function $h = g \circ f$ is differentiable at $a$ and moreover,
\[
T_a h = T_b g \circ T_a f.
\]
In terms of the Jacobian matrices, this reads as
\[
Dh(a) = Dg(b) \cdot Df(a) \in M_{k,n}(\mathbb{R}),
\]
where $\cdot$ indicates a matrix product.
(f) Assume $T_a f$ and $T_a g$ exist. Then:

(i) $T_a(f + g)$ exists and $T_a(f + g) = T_a f + T_a g$ (additivity).

Now assume $m = 1$, i.e., $f$, $g$ are scalar fields, differentiable at $a$. Then:

(ii) $T_a(fg) = f(a)\,T_a g + g(a)\,T_a f$ (product rule);

(iii) $T_a\!\left(\dfrac{f}{g}\right) = \dfrac{g(a)\,T_a f - f(a)\,T_a g}{g(a)^2}$ if $g(a) \ne 0$ (quotient rule).
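Before turning to the corollary, here is an unofficial numerical illustration of part (e) (assumes NumPy; the maps $f$ and $g$ are arbitrary choices): the finite-difference Jacobian of $h = g \circ f$ matches the product $Dg(b) \cdot Df(a)$.

```python
import numpy as np

def jacobian(f, a, h=1e-6):
    # Finite-difference Jacobian, as in the sketch of section 2.2.
    a = np.asarray(a, dtype=float)
    cols = [(f(a + h * e) - f(a - h * e)) / (2 * h) for e in np.eye(a.size)]
    return np.column_stack(cols)

f = lambda p: np.array([p[0] * p[1], np.sin(p[0])])  # f : R^2 -> R^2
g = lambda q: np.array([q[0] ** 2 + q[1]])           # g : R^2 -> R^1
a = np.array([0.5, -1.0])
b = f(a)

Dh = jacobian(lambda p: g(f(p)), a)       # Dh(a), computed directly
DgDf = jacobian(g, b) @ jacobian(f, a)    # Dg(b) . Df(a)
print(np.allclose(Dh, DgDf, atol=1e-6))   # True
```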

The following corollary is an immediate consequence of the theorem, and we will make use of it in the next chapter on normal vectors and extrema.
Corollary 1. Let $g$ be a scalar field, differentiable at an interior point $b$ of its domain $D$ in $\mathbb{R}^n$, and let $v$ be any vector in $\mathbb{R}^n$. Then we have
\[
\nabla g(b) \cdot v = g'(b; v).
\]
Furthermore, let $\varphi$ be a function from a subset of $\mathbb{R}$ into $D \subseteq \mathbb{R}^n$, differentiable at an interior point $a$ mapping to $b$. Put $h = g \circ \varphi$. Then $h$ is differentiable at $a$ with
\[
h'(a) = \nabla g(b) \cdot \varphi'(a).
\]
Here is a simple observation before we begin the proof. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be a vector field such that $f_1(x, y) = \varphi(x)$, $f_2(x, y) = \psi(y)$, with $\varphi, \psi$ differentiable everywhere. Then, clearly, the Jacobian matrix $Df(x, y)$ is the diagonal matrix
\[
\begin{pmatrix} \varphi'(x) & 0 \\ 0 & \psi'(y) \end{pmatrix}.
\]
Conversely, suppose we know a priori that $Df$ is diagonal (at all points), say
\[
Df(x, y) = \begin{pmatrix} \alpha(x) & 0 \\ 0 & \beta(y) \end{pmatrix}.
\]
Then $\frac{\partial f_1}{\partial x} = \alpha(x)$, $\frac{\partial f_1}{\partial y} = 0 = \frac{\partial f_2}{\partial x}$, and $\frac{\partial f_2}{\partial y} = \beta(y)$, whence $f_1(x, y) = \int \alpha(x)\,dx$ and $f_2(x, y) = \int \beta(y)\,dy$. So $f_1$ is independent of $y$ and $f_2$ is independent of $x$.
Proof of main theorem. (a) It suffices to show that $(T_a f_i)(v) = f_i'(a; v)$ for each $i \le m$, and this is clear if $v = 0$ (both sides are zero by definition). So assume $v \ne 0$. By definition,
\[
\lim_{u \to 0} \frac{\|f_i(a + u) - f_i(a) - (T_a f_i)(u)\|}{\|u\|} = 0.
\]

This means that we can write, for $u = hv$, $h \in \mathbb{R}$,
\[
\lim_{h \to 0} \frac{f_i(a + hv) - f_i(a) - h(T_a f_i)(v)}{h\,\|v\|} = 0,
\]
multiply by $\|v\|$ and deduce the existence of
\[
f_i'(a; v) = \lim_{h \to 0} \frac{f_i(a + hv) - f_i(a)}{h} = \lim_{h \to 0} (T_a f_i)(v) = (T_a f_i)(v).
\]

(b) By part (a), each partial derivative exists at $a$ (since $f$ is assumed to be differentiable at $a$). The matrix of the linear map $T_a f$ is determined by its effect on the standard basis vectors. Let $\{e_i' \mid 1 \le i \le m\}$ denote the standard basis in $\mathbb{R}^m$. Then we have, by definition,
\[
(T_a f)(e_j) = \sum_{i=1}^{m} (T_a f_i)(e_j)\, e_i' = \sum_{i=1}^{m} \frac{\partial f_i}{\partial x_j}(a)\, e_i'.
\]
The matrix obtained is easily seen to be the Jacobian matrix $Df(a)$.


(c) Suppose $f$ is differentiable at $a$. This certainly implies that the limit of the function $f(a + u) - f(a) - (T_a f)(u)$, as $u$ tends to $0 \in \mathbb{R}^n$, is $0 \in \mathbb{R}^m$ (from the very definition of $T_a f$, $\|f(a + u) - f(a) - (T_a f)(u)\|$ tends to zero faster than $\|u\|$; in particular it tends to zero). Since $T_a f$ is linear, $T_a f$ is continuous (everywhere), so that $\lim_{u \to 0} (T_a f)(u) = 0$. Hence $\lim_{u \to 0} f(a + u) = f(a)$, which means that $f$ is continuous at $a$.
(d) By hypothesis, all the partial derivatives exist near $a = (a_1, \ldots, a_n)$ and are continuous there. Write $u = (u_1, \ldots, u_n)$ and define a linear map $L$ by
\[
L(u) = \sum_{j=1}^{n} u_j \frac{\partial f}{\partial x_j}(a).
\]
We can write
\[
f(a + u) - f(a) = \sum_{j=1}^{n} \bigl(\varphi_j(a_j + u_j) - \varphi_j(a_j)\bigr),
\]
where each $\varphi_j$ is a one variable function (depending on $u$) defined by
\[
\varphi_j(t) = f(a_1 + u_1, \ldots, a_{j-1} + u_{j-1}, t, a_{j+1}, \ldots, a_n).
\]
By the mean value theorem,
\[
\frac{\varphi_j(a_j + u_j) - \varphi_j(a_j)}{u_j} = \varphi_j'(t_j(u)) = \frac{\partial f}{\partial x_j}(y_j(u)),
\]
for some $t_j(u) \in [a_j, a_j + u_j]$, with
\[
y_j(u) = (a_1 + u_1, \ldots, a_{j-1} + u_{j-1}, t_j(u), a_{j+1}, \ldots, a_n).
\]
Putting these together, we see that it suffices to show that the following limit is zero:
\[
\lim_{u \to 0} \frac{1}{\|u\|} \sum_{j=1}^{n} \left| u_j \left( \frac{\partial f}{\partial x_j}(a) - \frac{\partial f}{\partial x_j}(y_j(u)) \right) \right|.
\]
Clearly, $|u_j| \le \|u\|$, for each $j$. So it follows, by the triangle inequality, that this limit is bounded above by the sum over $j$ of $\lim_{u \to 0} \left| \frac{\partial f}{\partial x_j}(a) - \frac{\partial f}{\partial x_j}(y_j(u)) \right|$, which is zero by the continuity of the partial derivatives at $a$. Here we are using the fact that each $y_j(u)$ approaches $a$ as $u = (u_1, \ldots, u_n)$ goes to $0$. Done.
(e) First we need the following simple
Lemma 4. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then there exists $c > 0$ such that $\|Tv\| \le c\|v\|$ for any $v \in \mathbb{R}^n$.
Proof of Lemma. Let $A$ be the matrix of $T$ relative to the standard bases. Put $C = \max_j \{\|T(e_j)\|\}$. If $v = \sum_{j=1}^{n} \alpha_j e_j$, then
\[
\|T(v)\| = \Bigl\|\sum_j \alpha_j T(e_j)\Bigr\| \le C \sum_{j=1}^{n} |\alpha_j| \cdot 1 \le C \Bigl(\sum_{j=1}^{n} |\alpha_j|^2\Bigr)^{1/2} \Bigl(\sum_{j=1}^{n} 1\Bigr)^{1/2} = C\sqrt{n}\,\|v\|,
\]
by the Cauchy–Schwarz inequality. We are done by setting $c = C\sqrt{n}$.


Note that the Lemma implies that linear maps $T$ are continuous. The optimal choice of $c$ is
\[
c = \sup\left\{ \frac{\|Tv\|}{\|v\|} \;\Big|\; v \in \mathbb{R}^n \setminus \{0\} \right\} = \sup\{\,\|Tv\| \mid \|v\| = 1\,\}.
\]
Note that the Lemma implies that the first set is bounded, so the sup exists. The second set is even compact, so the sup is attained, i.e., there is always a vector $v$ of norm one for which $\|Tv\|$ is the optimal constant $c$.
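As an aside (not in the notes; assumes NumPy), this optimal constant is the operator 2-norm of the matrix $A$ of $T$, which numerical libraries compute as the largest singular value; the proof's constant $C\sqrt{n}$ is a cruder upper bound:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))  # matrix of some T : R^4 -> R^3

C = max(np.linalg.norm(A[:, j]) for j in range(A.shape[1]))  # max_j ||T(e_j)||
crude = C * np.sqrt(A.shape[1])                              # c = C sqrt(n)
optimal = np.linalg.norm(A, 2)   # sup of ||Tv|| over ||v|| = 1

print(optimal <= crude)          # True: the lemma's bound is weaker
```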
Proof of (e) (contd.). Write $L = T_a f$, $M = T_b g$, $N = M \circ L$. To show: $T_a h = N$.
Define $F(x) = f(x) - f(a) - L(x - a)$, $G(y) = g(y) - g(b) - M(y - b)$, and $H(x) = h(x) - h(a) - N(x - a)$. Then we have
\[
\lim_{x \to a} \frac{\|F(x)\|}{\|x - a\|} = 0 = \lim_{y \to b} \frac{\|G(y)\|}{\|y - b\|}.
\]

So we need to show:
\[
\lim_{x \to a} \frac{\|H(x)\|}{\|x - a\|} = 0.
\]
But
\[
H(x) = g(f(x)) - g(b) - M(L(x - a)).
\]
Since $L(x - a) = f(x) - f(a) - F(x)$, we get
\[
H(x) = \bigl[g(f(x)) - g(b) - M(f(x) - f(a))\bigr] + M(F(x)) = G(f(x)) + M(F(x)).
\]
Therefore it suffices to prove:

(i) $\displaystyle\lim_{x \to a} \frac{\|G(f(x))\|}{\|x - a\|} = 0$, and

(ii) $\displaystyle\lim_{x \to a} \frac{\|M(F(x))\|}{\|x - a\|} = 0$.

By Lemma 4, we have $\|M(F(x))\| \le c\|F(x)\|$, for some $c > 0$. Then
\[
\frac{\|M(F(x))\|}{\|x - a\|} \le c\,\frac{\|F(x)\|}{\|x - a\|} \to 0 \quad \text{as } x \to a,
\]
yielding (ii).
On the other hand, we know $\lim_{y \to b} \frac{\|G(y)\|}{\|y - b\|} = 0$. So we can find, for every $\epsilon > 0$, a $\delta > 0$ such that $\|G(f(x))\| < \epsilon\,\|f(x) - b\|$ if $\|f(x) - b\| < \delta$. But since $f$ is continuous, $\|f(x) - b\| < \delta$ whenever $\|x - a\| < \delta_1$, for a small enough $\delta_1 > 0$. Hence
\[
\|G(f(x))\| < \epsilon\,\|f(x) - b\| = \epsilon\,\|F(x) + L(x - a)\| \le \epsilon\bigl(\|F(x)\| + \|L(x - a)\|\bigr),
\]
by the triangle inequality. Since $\lim_{x \to a} \frac{\|F(x)\|}{\|x - a\|}$ is zero, we get
\[
\lim_{x \to a} \frac{\|G(f(x))\|}{\|x - a\|} \le \epsilon \lim_{x \to a} \frac{\|L(x - a)\|}{\|x - a\|}.
\]
Applying Lemma 4 again, we get $\|L(x - a)\| \le c'\|x - a\|$, for some $c' > 0$. Now (i) follows easily.
(f) (i) We can think of $f + g$ as the composite $h = s \circ (f, g)$, where $(f, g)(x) = (f(x), g(x))$ and $s(u, v) = u + v$ (sum). Set $b = (f(a), g(a))$. Applying (e), we get
\[
T_a(f + g) = T_b(s) \circ T_a(f, g) = T_a f + T_a g.
\]
Done. The proofs of (ii) and (iii) are similar and will be left to the reader.
QED.
Remark. It is important to take note of the fact that a vector field $f$ may be differentiable at $a$ without the partial derivatives being continuous. We have a counterexample already when $n = m = 1$, as seen by taking
\[
f(x) = x^2 \sin\left(\frac{1}{x}\right) \quad \text{if } x \ne 0,
\]
and $f(0) = 0$. This is differentiable everywhere. The only question is at $x = 0$, where the relevant limit $\lim_{h \to 0} \frac{f(h)}{h}$ is clearly zero, so that $f'(0) = 0$. But for $x \ne 0$, we have by the product and chain rules,
\[
f'(x) = 2x \sin\left(\frac{1}{x}\right) - \cos\left(\frac{1}{x}\right),
\]
which does not tend to $f'(0) = 0$ as $x$ goes to $0$. So $f'$ is not continuous at $0$.
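A brief numerical aside (not in the notes; assumes NumPy): sampling $f'$ at the points $x_k = 1/(k\pi)$, where $\sin(1/x_k) = 0$ and $\cos(1/x_k) = (-1)^k$, shows the values alternating near $\pm 1$ rather than approaching $f'(0) = 0$:

```python
import numpy as np

fprime = lambda x: 2 * x * np.sin(1 / x) - np.cos(1 / x)  # f'(x) for x != 0

for k in range(1, 6):
    x = 1 / (k * np.pi)        # here f'(x) = -cos(k*pi) = (-1)^(k+1)
    print(f"x = {x:.5f}   f'(x) = {fprime(x):+.5f}")
```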

2.4 Mixed partial derivatives

Let $f$ be a scalar field, and $a$ an interior point in its domain $D \subseteq \mathbb{R}^n$. For $j, k \le n$, we may consider the second partial derivative
\[
\frac{\partial^2 f}{\partial x_j \partial x_k}(a) = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_k}\right)(a),
\]
when it exists. It is called the mixed partial derivative when $j \ne k$, in which case it is of interest to know whether we have the equality
\[
(3.4.1) \qquad \frac{\partial^2 f}{\partial x_j \partial x_k}(a) = \frac{\partial^2 f}{\partial x_k \partial x_j}(a).
\]

Proposition 1. Suppose $\frac{\partial^2 f}{\partial x_j \partial x_k}$ and $\frac{\partial^2 f}{\partial x_k \partial x_j}$ both exist near $a$ and are continuous there. Then the equality (3.4.1) holds.

The proof is similar to the proof of part (d) of Theorem 1.
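To see the proposition concretely, here is a small symbolic check (an aside, not from the notes; assumes the sympy library and an arbitrarily chosen smooth field):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x * y) + sp.sin(x) * y ** 2   # a smooth scalar field on R^2

d_xy = sp.diff(f, x, y)  # differentiate in x, then in y
d_yx = sp.diff(f, y, x)  # differentiate in y, then in x
print(sp.simplify(d_xy - d_yx))  # 0, as equality (3.4.1) predicts
```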
