
Notes on Optimization and Pareto Optimality

John Duggan
University of Rochester
September 4, 2012
Contents
1 Opening Remarks
2 Differentiability
  2.1 Preliminaries
  2.2 Directional Derivatives
  2.3 Continuous Partial Differentiability
  2.4 Second Derivatives and Concavity
  2.5 Optimization Problems
3 Unconstrained Optimization
  3.1 First and Second Order Conditions
  3.2 Solution Methods
  3.3 Envelope Theorem
4 Pareto Optimality
  4.1 Existence of Pareto Optimals
  4.2 Characterization with Concavity
  4.3 Characterization without Concavity
5 Single Equality Constraint
  5.1 First Order Analysis
  5.2 Convex Structure
  5.3 Consumer Example
  5.4 Second Order Analysis
6 Multiple Equality Constraints
  6.1 First Order Analysis
  6.2 Convex Structure
  6.3 Consumers Example
  6.4 Second Order Analysis
7 Inequality Constraints
  7.1 First Order Analysis
  7.2 Concave Programming
  7.3 Second Order Analysis
8 Pareto Optimality Revisited
9 Mixed Constraints
1 Opening Remarks
These notes were written for PSC 408, the second semester of the formal modeling
sequence in the political science graduate program at the University of Rochester. I
hope they will be a useful reference on optimization and Pareto optimality for political
scientists, who otherwise would see very little of these subjects, and economists
wanting deeper coverage than one gets in a typical first-year micro class. I do not
invent any new theory, but I try to draw together results in a systematic way and to
build up gradually from the basic problems of unconstrained optimization and optimization
with a single equality constraint. That said, Theorem 9.2 may be a slightly
new way of presenting results on convex optimization, and I've strived for quantity
and quality of figures to aid intuition.
2 Differentiability
2.1 Preliminaries
Recall that we can multiply a vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ by a scalar $\alpha \in \mathbb{R}$ (applying $\alpha$ to each coordinate of $x$) and add two vectors $x, y \in \mathbb{R}^n$ (summing coordinate by coordinate), writing the new vectors as $\alpha x$ and $x + y$, respectively. Given $m$ vectors $x^1, \dots, x^m$ and corresponding scalars $\alpha_1, \dots, \alpha_m$, we call
\[ \alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m \]
a linear combination of the vectors; it is a convex combination if the coefficients are non-negative and sum to one. The dot product of two vectors is $x \cdot y = \sum_{i=1}^n x_i y_i$, and geometrically $x \cdot y = 0$ means that the vectors are orthogonal, i.e., they form a right angle. The Euclidean norm in $\mathbb{R}^n$ is defined by
\[ \|x - y\| = \sqrt{(x - y) \cdot (x - y)} = \sqrt{\sum_{i=1}^n (x_i - y_i)^2} \]
for all $x, y \in \mathbb{R}^n$, which gives us a measure of distance between $x$ and $y$. A fundamental fact about Euclidean space, known as the Cauchy-Schwartz inequality, is that for all $x, y \in \mathbb{R}^n$, $x \cdot y \le \|x\|\,\|y\|$, where equality holds if and only if there exist $\alpha, \beta \in \mathbb{R}$ with $\alpha x = \beta y$. Furthermore, a vector $t \in \mathbb{R}^n$ is a direction if it is norm one, i.e., $\|t\| = 1$. The open ball of radius $r > 0$ around a vector $x \in \mathbb{R}^n$ is defined as $B_r(x) = \{y \in \mathbb{R}^n \mid \|y - x\| < r\}$, depicted to the right for the two-dimensional case.
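The following is a minimal numerical sketch of these definitions, assuming NumPy is available (the vectors chosen are arbitrary illustrations, not from the notes): it computes a dot product, a Euclidean distance, checks the Cauchy-Schwartz inequality, and normalizes a vector into a direction.

```python
import numpy as np

x = np.array([3.0, -1.0, 2.0])
y = np.array([1.0, 4.0, 1.0])

dot = x @ y                          # x . y = sum_i x_i y_i
dist = np.linalg.norm(x - y)         # Euclidean distance ||x - y||
print(dot, dist)

# Cauchy-Schwartz: |x . y| <= ||x|| ||y||, with equality only for collinear vectors
print(abs(dot) <= np.linalg.norm(x) * np.linalg.norm(y))   # True

t = x / np.linalg.norm(x)            # a direction: x rescaled to norm one
print(np.linalg.norm(t))             # 1.0
```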
A sequence in $\mathbb{R}^n$ is a countable set $\{x_m\}$ of vectors indexed by the natural numbers $m \in \mathbb{N}$, and we say a sequence $\{x_m\}$ converges to a vector $x$ if it eventually grows arbitrarily close to $x$: for all $\epsilon > 0$, there exists $\overline{m}$ such that for all $m \ge \overline{m}$, we have $x_m \in B_\epsilon(x)$. This is written $x_m \to x$ or $\lim_{m \to \infty} x_m = x$, the sequence is convergent, and $x$ is the limit of the sequence. If a sequence $\{\lambda_m\}$ of real numbers is decreasing, i.e., higher indices are assigned to lower numbers, and converges to $\lambda$, we write $\lambda_m \downarrow \lambda$; and if it increases to $\lambda$, we write $\lambda_m \uparrow \lambda$. A sequence $\{\lambda_k\}$ of real numbers converges asymptotically to $\lambda$ if it converges to $\lambda$ but does not visit it infinitely often: $\lambda_k \to \lambda$ and there exists $\overline{k}$ such that for all $k \ge \overline{k}$, we have $\lambda_k \ne \lambda$.
Recall that a vector $x$ is an interior point of a set $X \subseteq \mathbb{R}^n$ if there exists $\epsilon > 0$ such that $B_\epsilon(x) \subseteq X$; and it is a boundary point of $X$ if for all $\epsilon > 0$, $B_\epsilon(x) \cap X \ne \emptyset \ne B_\epsilon(x) \setminus X$. The interior of $X$, denoted $\mathrm{int}X$, consists of all of its interior points; the boundary of $X$, denoted $\mathrm{bd}X$, consists of all of its boundary points; and the closure of $X$, denoted $\mathrm{clos}X$, consists of all interior and boundary points, i.e., $\mathrm{clos}X = (\mathrm{int}X) \cup (\mathrm{bd}X)$. A set $X$ is open if $X = \mathrm{int}X$; it is closed if $X = \mathrm{clos}X$, or equivalently, if the limits of all convergent sequences in $X$ belong to $X$. We say $X$ is bounded if there exists $r > 0$ such that $X \subseteq B_r(0)$. It is compact if it is closed and bounded; equivalently, $X$ is compact if every sequence in $X$ has a subsequence (possibly after deleting some elements from the original sequence) that converges to an element of $X$.
A set $X \subseteq \mathbb{R}^n$ is convex if for all $x, y \in X$ and all $\alpha \in (0,1)$, we have $\alpha x + (1 - \alpha)y \in X$. A special case is the convex hull of a set $\{x^1, \dots, x^n\}$ of vectors, which consists of all convex combinations of $x^1, \dots, x^n$. A function $f: \mathbb{R}^n \to \mathbb{R}$ is linear if there exists a fixed gradient $a \in \mathbb{R}^n$ such that for all $x \in \mathbb{R}^n$, we have $f(x) = a \cdot x$. A hyperplane is any level set of a linear function with non-zero gradient. Let two sets $X, Y \subseteq \mathbb{R}^n$ be convex and such that $Y$ has nonempty interior disjoint from $X$, i.e., $\mathrm{int}Y \ne \emptyset$ and $X \cap \mathrm{int}Y = \emptyset$; then the separating hyperplane theorem establishes that the two sets can be separated by a linear function with non-zero gradient, i.e., there is a linear function $f$ with gradient $a \ne 0$ such that for all $x \in X$ and all $y \in Y$, we have $f(x) = a \cdot x \ge a \cdot y = f(y)$. Moreover, if $X$ is compact, $Y$ is closed, and $X \cap Y = \emptyset$, then $f$ can be chosen so that the previous weak inequalities hold strictly.
Given $X \subseteq \mathbb{R}^n$ and $x \in X$, a function $f: X \to \mathbb{R}$ is continuous at $x$ if for every sequence $\{x_m\}$ converging to $x \in X$, we have $\lim_{m \to \infty} f(x_m) = f(x)$. It is continuous if it is continuous at every $x \in X$. When the domain $X$ is open, an equivalent definition is that for every open set $Y \subseteq \mathbb{R}$, the preimage $f^{-1}(Y) = \{x \in X \mid f(x) \in Y\}$ is open. If $K \subseteq X$ is compact and $f$ is continuous, then the image $f(K)$ is compact; thus, there exists $x \in K$ such that for all $y \in K$, we have $f(x) \ge f(y)$; in this case, $x$ maximizes the function $f$ over the set $K$. Given a convex set $X \subseteq \mathbb{R}^n$, we say $f: X \to \mathbb{R}$ is concave if for all distinct $x, y \in X$ and all $\alpha \in (0,1)$, we have $f(\alpha x + (1 - \alpha)y) \ge \alpha f(x) + (1 - \alpha)f(y)$; and the definition of a strictly concave function is identical but with strict inequality replacing weak. The definition of convex and strictly convex function is obtained by replacing the weak and strict greater than relation, respectively, with weak and strict less than. We say $f: X \to \mathbb{R}$ is quasi-concave if for all distinct $x, y \in X$ and all $\alpha \in (0,1)$, we have $f(\alpha x + (1 - \alpha)y) \ge \min\{f(x), f(y)\}$; and the definition of strictly quasi-concave function is identical but with strict inequality replacing weak. And, as above, the definition of quasi-convex and strictly quasi-convex function is obtained by replacing greater than with less than.
Given vectors $x^1, \dots, x^m \in \mathbb{R}^n$, we say $\{x^1, \dots, x^m\}$ is linearly dependent if there exist scalars $\alpha_1, \dots, \alpha_m \in \mathbb{R}$ (not all zero) such that
\[ \alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0, \]
and the set is linearly independent if it is not linearly dependent. The span of $\{x^1, \dots, x^m\}$ is the set of linear combinations of the vectors $x^i$, $i = 1, \dots, m$, and it turns out the set is linearly independent if no vector $x^j$ belongs to the span of the other vectors. The rank of the set is the dimensionality of its span, which is the same as the size of the largest linearly independent set contained in $\{x^1, \dots, x^m\}$.
A matrix is a rectangular array of numbers, such as
\[ A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{bmatrix}, \]
with entry $a_{i,j}$ in row $i$, column $j$. The above matrix is dimension $m \times n$. Given a vector $t = (t_1, \dots, t_n)$, we may sometimes view $t$ as a $n \times 1$ column matrix and the transpose $t'$ as the equivalent $1 \times n$ row matrix. We can right-multiply $A$ by $t = (t_1, \dots, t_n)$ or left-multiply by the transpose of $s = (s_1, \dots, s_m)$ to obtain, respectively, a new $m \times 1$ column matrix and $1 \times n$ row matrix
\[ At = \begin{bmatrix} \sum_{j=1}^n t_j a_{1,j} \\ \vdots \\ \sum_{j=1}^n t_j a_{m,j} \end{bmatrix}
\qquad\text{and}\qquad
s'A = \begin{bmatrix} \sum_{i=1}^m s_i a_{i,1} & \cdots & \sum_{i=1}^m s_i a_{i,n} \end{bmatrix}. \]
These operations are associative, i.e., $(s'A)t = s'(At)$, and produce a $1 \times 1$ matrix, i.e., a number. When $A$ is square, say $n \times n$, the product is simply
\[ t'At = \sum_{i=1}^n \sum_{j=1}^n t_i t_j a_{i,j}. \]
The row rank of the matrix $A$ is the rank of the set consisting of its rows, and the column rank of $A$ is the rank of its columns; a fundamental theorem of linear algebra establishes that these quantities are the same, and we can then unambiguously refer to the rank of a matrix. It has full column (resp. row) rank if the columns (resp. rows) are linearly independent. We say an $n \times n$ matrix $A$ is symmetric if $a_{i,j} = a_{j,i}$ for all $i, j = 1, \dots, n$; it is negative semi-definite if for all non-zero $t \in \mathbb{R}^n$, we have $t'At \le 0$; and it is negative definite if the latter inequality always holds strictly. When an $n \times n$ matrix $A$ has full rank, it has a unique inverse matrix $A^{-1}$ such that for all $x \in \mathbb{R}^n$, $x = AA^{-1}x = A^{-1}Ax$. Note that a negative definite matrix necessarily has full rank. A square matrix $A$ is diagonally dominant if for each row $i$, we have $|a_{i,i}| \ge \sum_{j \ne i} |a_{i,j}|$, and it is strictly diagonally dominant if the latter inequality holds strictly for each row. Every symmetric, diagonally dominant matrix with non-positive entries along the diagonal is negative semi-definite; and every symmetric, strictly diagonally dominant matrix with negative entries along the diagonal is negative definite.
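As a quick numerical sanity check on these matrix notions, the sketch below (assuming NumPy; the function names are mine) tests a symmetric matrix for negative semi-definiteness by inspecting its eigenvalues, and separately applies the diagonal-dominance sufficient condition stated above.

```python
import numpy as np

def is_negative_semidefinite(A, tol=1e-10):
    # t'At <= 0 for all t iff all eigenvalues of the symmetric matrix A are <= 0
    return bool(np.all(np.linalg.eigvalsh(A) <= tol))

def is_diagonally_dominant_nsd(A):
    # sufficient condition: symmetric, diagonally dominant, non-positive diagonal
    if not np.allclose(A, A.T):
        return False
    diag = np.diag(A)
    off_diag_sums = np.sum(np.abs(A), axis=1) - np.abs(diag)
    return bool(np.all(diag <= 0) and np.all(np.abs(diag) >= off_diag_sums))

A = np.array([[-2.0, 1.0],
              [ 1.0, -2.0]])
print(is_negative_semidefinite(A))    # True: eigenvalues are -1 and -3
print(is_diagonally_dominant_nsd(A))  # True: |-2| >= |1| in each row
```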
2.2 Directional Derivatives
Given a set $X \subseteq \mathbb{R}^n$, a function $f: X \to \mathbb{R}$, and an element $x \in \mathrm{int}X$, we say $f$ is differentiable at $x$ in direction $t$ if for every sequence $\{\lambda_k\}$ in $\mathbb{R}$ converging asymptotically to zero, the sequence
\[ \left\{ \frac{f(x + \lambda_k t) - f(x)}{\lambda_k} \right\} \]
converges. In this case, the limit, which we can write as
\[ \lim_{\lambda \to 0} \frac{f(x + \lambda t) - f(x)}{\lambda}, \]
must in fact be unique; we call the limit the derivative of $f$ at $x$ in direction $t$, and we denote it $D_t f(x)$. We say $f$ is directionally differentiable at $x$ if it is differentiable at $x$ in every direction. It is directionally differentiable if it is directionally differentiable at every $x \in \mathrm{int}X$. See Figures 1 and 2 for geometric insight into the nature of the directional derivative, which is the slope of the straight line tangent to the graph in the direction $t$.
Note that we define derivatives only at interior points of the domain. Although we could be more general, the reason is that an element $x$ could be on the boundary of the domain $X$ yet not be approachable along any straight lines in the domain; even if $X$ is convex, there could be lines along which an element $x \in X$ could not be approached. See Figure 3. By considering interior points, this is no longer an issue, and we can approach elements $x \in \mathrm{int}X$ along straight lines in all directions.
[Figure 1: Directional derivative]
[Figure 2: Sideview]
[Figure 3: Unapproachable elements]

The derivative of $f$ at $x$ in the direction of one of the unit coordinate vectors, such as $e_i$, would be $D_{e_i} f(x)$, but these derivatives play an important role and have a special designation. We call this the $i$th partial derivative of $f$ at $x$, and we use the notation $D_i f(x)$ for the $i$th partial derivative of $f$ at $x$. An alternative and common notation for the $i$th partial derivative is $\frac{\partial f}{\partial x_i}(x)$. Less common is $f_i(x)$. We say $f$ is partially differentiable at $x$ if it is differentiable at $x$ in direction $e_i$ for each $i = 1, \dots, n$. It is partially differentiable if it is partially differentiable at every $x \in \mathrm{int}X$.
It is useful to consider the $i$th partial derivative at $x$ in light of the definition of directional derivative. Formally, it is the limit of
\[ \frac{f(x + \lambda_k e_i) - f(x)}{\lambda_k} = \frac{f(x_1, \dots, x_{i-1}, x_i + \lambda_k, x_{i+1}, \dots, x_n) - f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n)}{\lambda_k} \]
as $\lambda_k$ converges asymptotically to zero. Note that the only coordinate that varies in this limit is the $i$th. That is, we are only considering how the values of the function vary with $x_i$, which is a real number. This suggests a simple reformulation of the partial derivative and a straightforward way to calculate this quantity.

Consider the function $h: (a,b) \to \mathbb{R}$, defined on an open interval around $x_i$, as follows: for all $z \in (a,b)$,
\[ h(z) = f(x_1, \dots, x_{i-1}, z, x_{i+1}, \dots, x_n). \]
That is, $h$ gives us the values of $f$ as we vary the $i$th coordinate around $x_i$, holding all other coordinates fixed. By definition, the derivative of $h$ at $x_i$ is the limit of
\[ \frac{h(x_i + \lambda_k) - h(x_i)}{\lambda_k} = \frac{f(x_1, \dots, x_{i-1}, x_i + \lambda_k, x_{i+1}, \dots, x_n) - f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n)}{\lambda_k} \]
as $\lambda_k$ converges to zero asymptotically. In other words, it is exactly the $i$th partial derivative of $f$ at $x$. In conclusion, to calculate the $i$th partial derivative of $f$ at $x$, we can simply use the standard rules for differentiation applied to $x_i$ in the expression for $f(x_1, \dots, x_n)$, but now fixing $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n$ as constants.
Example If $f$ is linear with gradient $a$, then $X = \mathbb{R}^n$ and
\[ f(x) = a_1 x_1 + \cdots + a_{i-1}x_{i-1} + a_i x_i + a_{i+1}x_{i+1} + \cdots + a_n x_n. \]
Viewing this as a function of $x_i$ alone, with all other variables held constant, the derivative is simply $a_i$, the coefficient of $x_i$ in the above expression. Thus, $D_i f(x) = a_i$.

For another example, fixing $\bar{x} \in \mathbb{R}^n$, let $f$ be the quadratic function $f(x) = -\|x - \bar{x}\|^2$. To calculate the partial derivatives of $f$, it is useful to expand it as
\[ f(x) = -(x_1 - \bar{x}_1)^2 - \cdots - (x_n - \bar{x}_n)^2. \]
Treating all other coordinates as constant and differentiating only with respect to $x_i$, we find that
\[ D_i f(x) = 2(\bar{x}_i - x_i). \]

For yet another example, let $f: \mathbb{R}^n_+ \to \mathbb{R}$ be Cobb-Douglas, so there exist coefficients $\alpha_1, \dots, \alpha_n \ge 0$ (not all zero) such that
\[ f(x_1, \dots, x_n) = \prod_{i=1}^n x_i^{\alpha_i}. \]
Then the $i$th partial derivative is
\[ D_i f(x) = \alpha_i x_i^{\alpha_i - 1} \prod_{j \ne i} x_j^{\alpha_j} = \frac{\alpha_i}{x_i} f(x). \]

For a final example, let $X = \mathbb{R}^2_+$ and $f(x_1, x_2) = x_1 x_2 - x_2^2 - 2x_1^4$. Then the partial derivatives at $x = (x_1, x_2)$ are
\[ D_1 f(x) = x_2 - 8x_1^3 \qquad\qquad D_2 f(x) = x_1 - 2x_2. \]
The vector of partial derivatives at $x$ is called the gradient of $f$ at $x$, and it is denoted by $Df(x)$, or commonly by $\nabla f(x)$. Thus,
\[ Df(x) = (D_1 f(x), \dots, D_n f(x)). \]

[Figure 4: Gradient of quadratic function]
Example If $f$ is linear with gradient $a$, then $Df(x) = a$, which happens to be independent of $x$. For another example, fixing $\bar{x} \in \mathbb{R}^n$, the gradient of the quadratic function $f(x) = -\|x - \bar{x}\|^2$ is
\[ Df(x) = (2(\bar{x}_1 - x_1), \dots, 2(\bar{x}_n - x_n)) = 2(\bar{x} - x). \]
Thus, the gradient is a vector that, if translated so that its tail is at $x$, points from $x$ in the direction of $\bar{x}$. See Figure 4, where we draw the gradient $Df(x)$ as an arrow emanating from the vector $x$, rather than the zero vector. This translation is helpful because it identifies the element of the domain at which we are taking the gradient. This convention is widespread, if not universal, and we will adhere to it. For yet another example, if $f$ is Cobb-Douglas, then
\[ Df(x) = f(x)\left(\frac{\alpha_1}{x_1}, \dots, \frac{\alpha_n}{x_n}\right). \]
For a final example, let $X = \mathbb{R}^2_+$ and $f(x_1, x_2) = x_1 x_2 - x_2^2 - 2x_1^4$, so the gradient at $x$ is $Df(x) = (x_2 - 8x_1^3,\ x_1 - 2x_2)$.
A fundamental result from univariate calculus, which is of great use in multivariate analysis, is the mean value theorem, next.

Theorem 2.1 Let $a, b \in \mathbb{R}$ satisfy $a < b$, and let $f: [a,b] \to \mathbb{R}$ be differentiable on $(a,b)$ and continuous at $a$ and $b$. Then there is some $x \in (a,b)$ such that
\[ Df(x) = \frac{f(b) - f(a)}{b - a}. \]

Concave functions have nice differentiability properties. Of course, the function $f: \mathbb{R} \to \mathbb{R}$ defined by
\[ f(x) = \begin{cases} x & \text{if } x < 0, \\ 0 & \text{else} \end{cases} \]
is concave but not differentiable at zero; and it is easy to modify this example to produce a strictly concave function that is not differentiable at zero. Note that the slope of the graph of the above function is well-defined on either side of zero, the problem being that these slopes are not equal. Given a set $X \subseteq \mathbb{R}^n$, a function $f: X \to \mathbb{R}$, and an element $x \in \mathrm{int}X$, we say $f$ is one-sided differentiable at $x$ in direction $t$ if for every sequence $\{\lambda_m\}$ decreasing to zero, the sequence
\[ \left\{ \frac{f(x + \lambda_m t) - f(x)}{\lambda_m} \right\} \]
converges. In this case, the limit is independent of the particular sequence $\{\lambda_m\}$ chosen, and we may denote it $D_t^+ f(x)$. It turns out that concave functions are always differentiable in this more limited sense.

Theorem 2.2 Let $X \subseteq \mathbb{R}^n$ be convex, let $f: X \to \mathbb{R}$ be concave, and let $x \in \mathrm{int}X$. Then for every direction $t$, $f$ is one-sided differentiable at $x$ in direction $t$.

So for concave functions, the only possible non-differentiabilities are "kinks," as in the example described above. In fact, although it is beyond the scope of these notes, it is further known that the subset of the domain of a concave function on which it violates full differentiability is exceedingly small; in a precise sense, a concave function is directionally differentiable at almost all elements of its domain.
2.3 Continuous Partial Differentiability

If $f$ is partially differentiable, then for each $x \in \mathrm{int}X$ and each coordinate $i = 1, \dots, n$, the partial derivative $D_i f(x)$ is well-defined. Then we may consider the mapping $D_i f: \mathrm{int}X \to \mathbb{R}$, which gives the $i$th partial derivative of $f$ at every interior point of the domain. We call $D_i f$ the $i$th partial derivative of $f$. If all partial derivatives $D_i f$ are well-defined and continuous, then we say $f$ is continuously partially differentiable, or $C^1$.

In Figure 4, and later in Figure 5, we draw level sets of differentiable functions as lower dimensional, "smooth" curves. These properties can be formalized precisely, and they are more or less implied by differentiability: to be a little more precise, if a function $f: \mathbb{R}^n \to \mathbb{R}$ is $C^2$ and $Df(x) \ne 0$, then the level set of $f$ through $x$ must be a lower dimensional, smooth curve in an open set around $x$. That the condition of a non-zero gradient is required for this result is demonstrated in the following example.

Example Defining $f: \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) = x^2 - y^2$, the level set of $f$ at zero is not smooth: it consists of the two lines $y = x$ and $y = -x$, which cross at the origin, where the gradient is zero.
Since partial derivatives are relatively easy to calculate, the next result gives easily verifiable conditions for a function to be directionally differentiable: it is sufficient that the function be partially differentiable and that the partial derivatives be continuous in $x$. Furthermore, it gives a straightforward way of computing the derivative of $f$ in any direction $t$: we simply take the dot product of the gradient $Df(x)$ and the direction $t$. This generalizes our insight for linear functions.

Theorem 2.3 Let $X \subseteq \mathbb{R}^n$ and $f: X \to \mathbb{R}$. If $f$ is $C^1$, then $f$ is directionally differentiable, and for all $x \in \mathrm{int}X$ and every direction $t$,
\[ D_t f(x) = Df(x) \cdot t. \]
Proof Let $\{\lambda_k\}$ converge asymptotically to zero, and note that
\[ \begin{aligned}
\frac{f(x + \lambda_k t) - f(x)}{\lambda_k}
&= \frac{\big[f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, x_n + \lambda_k t_n) - f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, x_n)\big]\, t_n}{\lambda_k t_n} \\
&\quad + \frac{\big[f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, x_n) - f(x_1 + \lambda_k t_1, \dots, x_{n-2} + \lambda_k t_{n-2}, x_{n-1}, x_n)\big]\, t_{n-1}}{\lambda_k t_{n-1}} \\
&\quad + \cdots + \frac{\big[f(x_1 + \lambda_k t_1, x_2, \dots, x_n) - f(x_1, x_2, \dots, x_n)\big]\, t_1}{\lambda_k t_1}.
\end{aligned} \]
By the mean value theorem, there exists $z^k_n$ between $x_n$ and $x_n + \lambda_k t_n$ such that
\[ D_n f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, z^k_n)
= \frac{f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, x_n + \lambda_k t_n) - f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, x_n)}{\lambda_k t_n}. \]
Similarly, the mean value theorem yields $z^k_{n-1}$ between $x_{n-1}$ and $x_{n-1} + \lambda_k t_{n-1}$ such that
\[ D_{n-1} f(x_1 + \lambda_k t_1, \dots, x_{n-2} + \lambda_k t_{n-2}, z^k_{n-1}, x_n)
= \frac{f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, x_n) - f(x_1 + \lambda_k t_1, \dots, x_{n-2} + \lambda_k t_{n-2}, x_{n-1}, x_n)}{\lambda_k t_{n-1}}. \]
We can continue to apply the mean value theorem $n - 2$ more times, and ultimately we have $z^k_1$ between $x_1$ and $x_1 + \lambda_k t_1$ such that
\[ D_1 f(z^k_1, x_2, \dots, x_n) = \frac{f(x_1 + \lambda_k t_1, x_2, \dots, x_n) - f(x_1, x_2, \dots, x_n)}{\lambda_k t_1}. \]
Therefore, we have
\[ \frac{f(x + \lambda_k t) - f(x)}{\lambda_k}
= D_1 f(z^k_1, x_2, \dots, x_n)\, t_1 + D_2 f(x_1 + \lambda_k t_1, z^k_2, x_3, \dots, x_n)\, t_2 + \cdots + D_n f(x_1 + \lambda_k t_1, \dots, x_{n-1} + \lambda_k t_{n-1}, z^k_n)\, t_n. \]
Note that for all $i = 1, \dots, n$, we have
\[ (x_1 + \lambda_k t_1, \dots, x_{i-1} + \lambda_k t_{i-1}, z^k_i, x_{i+1}, \dots, x_n) \to (x_1, \dots, x_n). \]
Thus, taking limits and using continuity of the partial derivatives of $f$, we have
\[ \lim_{k \to \infty} \frac{f(x + \lambda_k t) - f(x)}{\lambda_k} = D_1 f(x) t_1 + \cdots + D_n f(x) t_n = Df(x) \cdot t, \]
as required.
Example If $f$ is linear with gradient $a$, then the slope in direction $t$ is $Df(x) \cdot t = a \cdot t$. For another example, fixing $\bar{x} \in \mathbb{R}^n$, the slope of the quadratic function $f(x) = -\|x - \bar{x}\|^2$ in direction $t$ is $Df(x) \cdot t = 2(\bar{x} - x) \cdot t$. For yet another example, if $f$ is Cobb-Douglas, then the derivative in any direction $t$ is
\[ Df(x) \cdot t = f(x) \sum_{i=1}^n \frac{\alpha_i t_i}{x_i}. \]
For a final example, let $X = \mathbb{R}^2_+$ and $f(x_1, x_2) = x_1 x_2 - x_2^2 - 2x_1^4$. Then the derivative at $x$ in direction $t$ is
\[ Df(x) \cdot t = (x_2 - 8x_1^3)t_1 + (x_1 - 2x_2)t_2. \]
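As an illustration of Theorem 2.3, the sketch below (assuming NumPy; the helper names are mine) approximates the directional derivative of the final example by finite differences and compares it with the dot product $Df(x) \cdot t$.

```python
import numpy as np

def f(x):
    return x[0]*x[1] - x[1]**2 - 2*x[0]**4

def grad_f(x):
    # analytic gradient from the example above
    return np.array([x[1] - 8*x[0]**3, x[0] - 2*x[1]])

def directional_derivative(f, x, t, h=1e-6):
    # finite-difference approximation of D_t f(x)
    return (f(x + h*t) - f(x - h*t)) / (2*h)

x = np.array([0.5, 0.3])
t = np.array([1.0, 1.0]) / np.sqrt(2)      # a unit direction
print(directional_derivative(f, x, t))     # approx -0.5657
print(grad_f(x) @ t)                       # Df(x).t = (-0.7 - 0.1)/sqrt(2), approx -0.5657
```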
Assume that $f$ is $C^1$, and that it has a non-zero gradient at $x$. In which direction is the derivative of $f$ greatest? In other words, for which direction $t$ is the dot product $Df(x) \cdot t$ maximized? By the Cauchy-Schwartz inequality, it is maximized for the direction
\[ t = \frac{1}{\|Df(x)\|} Df(x). \]
We conclude that the gradient of the function points in the direction of steepest ascent, generalizing our earlier insight for linear functions. It should not then be a surprise that, geometrically speaking, the gradient of $f$ at $x$ is orthogonal to the level set of $f$ through $x$. Thus, the level sets of the function contain considerable, though not all, information about the gradient of $f$: given the level set of $f$ through $x$, we know that the gradient may point in one of only two possible directions. See Figure 5. Note we follow the usual convention of drawing the gradient $Df(x)$ as an arrow emanating from the vector $x$, rather than the zero vector. We cannot infer from the level sets alone which of those directions the gradient points toward, but we will usually imagine that the graphs of functions are roughly hill-shaped, and so we will typically draw gradients pointing more or less in the direction of the top of the hill. Also, we cannot infer the norm, or length, of the gradient.

[Figure 5: Gradients]
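To see the steepest-ascent property numerically, the following sketch (assuming NumPy; variable names are mine) compares the directional derivative in the normalized-gradient direction against many random unit directions.

```python
import numpy as np

def grad_f(x):
    # gradient of f(x1, x2) = x1*x2 - x2^2 - 2*x1^4
    return np.array([x[1] - 8*x[0]**3, x[0] - 2*x[1]])

rng = np.random.default_rng(0)
x = np.array([0.5, 0.3])
g = grad_f(x)
steepest = g / np.linalg.norm(g)           # candidate direction of steepest ascent

# directional derivatives Df(x).t for 1000 random unit directions
random_dirs = rng.normal(size=(1000, 2))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
print(max(random_dirs @ g))                # close to, but no larger than ...
print(steepest @ g)                        # ... ||Df(x)||, attained at the normalized gradient
```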
It turns out that if a function is $C^1$, then it is continuous on the interior of its domain.

Theorem 2.4 Let $X \subseteq \mathbb{R}^n$ and $f: X \to \mathbb{R}$. If $f$ is $C^1$, then $f$ is continuous at every $x \in \mathrm{int}X$.
Proof Consider any $x \in \mathrm{int}X$ and any sequence $\{x^k\}$ in $\mathrm{int}X$ converging to $x$. For simplicity, we will assume that $x^k$ differs from $x$ in every coordinate, i.e., $x_i \ne x^k_i$ for all $i = 1, \dots, n$. Note that
\[ \begin{aligned}
f(x^k) - f(x) &= \frac{f(x^k_1, x_2, \dots, x_n) - f(x_1, x_2, \dots, x_n)}{x^k_1 - x_1}\,(x^k_1 - x_1) \\
&\quad + \frac{f(x^k_1, x^k_2, x_3, \dots, x_n) - f(x^k_1, x_2, \dots, x_n)}{x^k_2 - x_2}\,(x^k_2 - x_2) \\
&\quad\ \vdots \\
&\quad + \frac{f(x^k_1, \dots, x^k_{n-1}, x^k_n) - f(x^k_1, \dots, x^k_{n-1}, x_n)}{x^k_n - x_n}\,(x^k_n - x_n).
\end{aligned} \]
By the mean value theorem, there exists $z^k_1$ between $x^k_1$ and $x_1$ such that
\[ \frac{f(x^k_1, x_2, \dots, x_n) - f(x_1, x_2, \dots, x_n)}{x^k_1 - x_1} = D_1 f(z^k_1, x_2, \dots, x_n). \]
Similarly, the mean value theorem yields $z^k_2$ between $x^k_2$ and $x_2$ such that
\[ \frac{f(x^k_1, x^k_2, x_3, \dots, x_n) - f(x^k_1, x_2, \dots, x_n)}{x^k_2 - x_2} = D_2 f(x^k_1, z^k_2, x_3, \dots, x_n), \]
and so on. Thus, we can write
\[ \begin{aligned}
f(x^k) - f(x) &= D_1 f(z^k_1, x_2, \dots, x_n)(x^k_1 - x_1) \\
&\quad + D_2 f(x^k_1, z^k_2, x_3, \dots, x_n)(x^k_2 - x_2) \\
&\quad\ \vdots \\
&\quad + D_n f(x^k_1, \dots, x^k_{n-1}, z^k_n)(x^k_n - x_n).
\end{aligned} \]
Note that $(z^k_1, x_2, \dots, x_n) \to x$, and $(x^k_1, z^k_2, x_3, \dots, x_n) \to x$, and so on. Therefore, by continuous partial differentiability, we have
\[ \begin{aligned}
\lim_{k \to \infty} f(x^k) - f(x)
&= \Big(\lim_{k \to \infty} D_1 f(z^k_1, x_2, \dots, x_n)\Big)\Big(\lim_{k \to \infty} x^k_1 - x_1\Big) \\
&\quad + \Big(\lim_{k \to \infty} D_2 f(x^k_1, z^k_2, x_3, \dots, x_n)\Big)\Big(\lim_{k \to \infty} x^k_2 - x_2\Big) \\
&\quad\ \vdots \\
&\quad + \Big(\lim_{k \to \infty} D_n f(x^k_1, \dots, x^k_{n-1}, z^k_n)\Big)\Big(\lim_{k \to \infty} x^k_n - x_n\Big) \\
&= D_1 f(x)\cdot 0 + D_2 f(x)\cdot 0 + \cdots + D_n f(x)\cdot 0 = 0,
\end{aligned} \]
as required.
Given $X \subseteq \mathbb{R}^n$, a $C^1$ function $f: X \to \mathbb{R}$, an element $x \in \mathrm{int}X$, and distinct coordinates $i$ and $j$ with $D_j f(x) \ne 0$, we define the marginal rate of substitution of $j$ for $i$ as
\[ MRS_{ji}(x) = \frac{D_i f(x)}{D_j f(x)}. \]
To see the intuitive meaning of this quantity, suppose we start from the vector $x$ and increase the $i$th coordinate by the amount $\Delta x_i$. How much do we need to adjust the $j$th coordinate in order to maintain the value of $f$ at $f(x)$? That is, we seek the quantity $\Delta x_j$ solving
\[ f(x_1, \dots, x_{i-1}, x_i + \Delta x_i, x_{i+1}, \dots, x_{j-1}, x_j + \Delta x_j, x_{j+1}, \dots, x_n) = f(x). \]

[Figure 6: Slope of level set]

If we are considering small changes to the coordinates of $x$, then the change in the value of $f$ when we increase the $i$th coordinate is roughly $D_i f(x)\Delta x_i$. When we compensate by adjusting the $j$th coordinate by $\Delta x_j$, the change in the value of $f$ is roughly $D_j f(x)\Delta x_j$. Then we want to solve $D_i f(x)\Delta x_i + D_j f(x)\Delta x_j = 0$, which has the solution
\[ \Delta x_j = -\left(\frac{D_i f(x)}{D_j f(x)}\right)\Delta x_i, \]
assuming $D_j f(x) \ne 0$. Thus, the marginal rate of substitution gives us an indication of the rate at which we need to adjust the $j$th coordinate of $x$ in order to compensate for a change in the $i$th coordinate. Put another way, it gives us a measure of the sensitivity of $f$ to changes in the $i$th coordinate relative to changes in the $j$th coordinate.

For the case of $n = 2$, the marginal rate of substitution has an interesting interpretation in terms of the slope of the level set of $f$ through $(x^*_1, x^*_2)$. Note that the line through $(x^*_1, x^*_2)$ and $(x^*_1 + \Delta x_1, x^*_2 + \Delta x_2)$ has slope $\Delta x_2 / \Delta x_1$. We have argued that when $\Delta x_1$ and $\Delta x_2$ are small, this quantity approaches $-MRS_{21}(x^*_1, x^*_2)$. On the other hand, when $\Delta x_1$ and $\Delta x_2$ are small, the slope of the line through $(x^*_1, x^*_2)$ and $(x^*_1 + \Delta x_1, x^*_2 + \Delta x_2)$ approximates the slope of the level set of $f$ at $(x^*_1, x^*_2)$, and therefore we have
\[ \text{slope of level set of } f \text{ at } (x^*_1, x^*_2) = -MRS_{21}(x^*_1, x^*_2). \]
See Figure 6.
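As a numerical illustration (a sketch assuming NumPy; mrs_21 is my name for the helper, and the Cobb-Douglas function is an illustrative choice), the code below computes $MRS_{21}$ for $f(x_1, x_2) = x_1^{1/2} x_2^{1/2}$ and checks that moving along the level set by $\Delta x_2 \approx -MRS_{21}\,\Delta x_1$ approximately preserves the value of $f$.

```python
import numpy as np

def f(x):
    # Cobb-Douglas with alpha = (1/2, 1/2)
    return x[0]**0.5 * x[1]**0.5

def grad_f(x):
    return np.array([0.5 * x[0]**(-0.5) * x[1]**0.5,
                     0.5 * x[0]**0.5 * x[1]**(-0.5)])

def mrs_21(x):
    # D_1 f(x) / D_2 f(x)
    g = grad_f(x)
    return g[0] / g[1]

x = np.array([4.0, 1.0])
dx1 = 1e-4
dx2 = -mrs_21(x) * dx1                 # compensating change in coordinate 2
x_new = x + np.array([dx1, dx2])
print(mrs_21(x))                       # 0.25 here: equals x2/x1 for these exponents
print(f(x), f(x_new))                  # values agree to first order
```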
2.4 Second Derivatives and Concavity
If $f$ is directionally differentiable and each directional derivative $D_t f: \mathrm{int}X \to \mathbb{R}$ is also directionally differentiable, then we say $f$ is twice directionally differentiable. For the derivative of $D_t f$ at $x$ in direction $s$, we use the special notation
\[ D_{st} f(x) = D_s(D_t f)(x). \]
When $s = t$, we write $D^2_t f(x)$ for the second derivative in direction $t$.

If $f$ is partially differentiable and each partial derivative $D_i f: \mathrm{int}X \to \mathbb{R}$ is itself partially differentiable, then we say $f$ is twice partially differentiable. In that case, for the $i$th partial derivative of $D_j f(x)$, we use the special notation
\[ D_{ij} f(x) = D_i(D_j f)(x), \]
and we refer to this as the $ij$th cross partial derivative of $f$ at $x$.

The Hessian of $f$ at $x$ is the $n \times n$ matrix of cross partial derivatives,
\[ D^2 f(x) = \begin{bmatrix}
D_{11} f(x) & D_{12} f(x) & \cdots & D_{1n} f(x) \\
D_{21} f(x) & D_{22} f(x) & \cdots & D_{2n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_{n1} f(x) & D_{n2} f(x) & \cdots & D_{nn} f(x)
\end{bmatrix}. \]
That is, for each of the $n$ partial derivatives of $f$, we take $n$ partial derivatives, for a total of $n^2$, at $x$ to generate the Hessian matrix. We will see that the Hessian has close connections to the concept of second derivative for multivariate functions.
Example Let $f$ be a linear function with gradient $a$. Then $D_j f(x) = a_j$, and $D_{ij} f(x) = 0$ for all $x$. That is, the Hessian is just the zero matrix. For another example, let $f(x) = -\|x - \bar{x}\|^2$ for all $x \in \mathbb{R}^n$. Then we have seen that $D_i f(x) = 2(\bar{x}_i - x_i)$. Therefore,
\[ D_{ij} f(x) = \begin{cases} -2 & \text{if } i = j \\ 0 & \text{else}, \end{cases} \]
and the Hessian is a matrix with $-2$ down the diagonal and zeroes elsewhere.

In both of the previous examples, the Hessian was independent of the element $x$ at which it was evaluated. That is an atypical result. Normally, $D^2 f(x)$ will in fact depend on $x$.
Example If $f$ is Cobb-Douglas, then the Hessian matrix has elements
\[ D^2_i f(x) = \alpha_i(\alpha_i - 1)x_i^{\alpha_i - 2}\prod_{j \ne i} x_j^{\alpha_j} = \frac{\alpha_i(\alpha_i - 1)}{x_i^2}\, f(x) \]
along the diagonal, and off the diagonal (for $i \ne j$) it has elements
\[ D_{ij} f(x) = \alpha_i \alpha_j\, x_i^{\alpha_i - 1} x_j^{\alpha_j - 1} \prod_{k \ne i,j} x_k^{\alpha_k} = \frac{\alpha_i \alpha_j}{x_i x_j}\, f(x). \]
For another example, let $X = \mathbb{R}^2_+$, and let $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$. Then
\[ D_{11} f(x) = -24x_1^2 \qquad D_{12} f(x) = D_{21} f(x) = 1 \qquad D_{22} f(x) = -2. \]
Thus,
\[ D^2 f(x) = \begin{bmatrix} -24x_1^2 & 1 \\ 1 & -2 \end{bmatrix}, \]
which clearly depends on $x$.
The following result, known as Young's theorem, simplifies the computation of cross partial derivatives by giving weak conditions under which the order in which partial derivatives is taken is irrelevant. Thus, the Hessian matrix of $f$ at $x$ is symmetric. If $f$ is twice partially differentiable and for all $i, j = 1, \dots, n$, the cross partial $D_{ij} f: \mathrm{int}X \to \mathbb{R}$ is continuous, then we say $f$ is continuously twice partially differentiable, or $C^2$.

Theorem 2.5 Let $X \subseteq \mathbb{R}^n$ and $f: X \to \mathbb{R}$. If $f$ is $C^2$, then for all $i, j = 1, \dots, n$ and all $x \in \mathrm{int}X$, we have
\[ D_{ij} f(x) = D_{ji} f(x). \]
Under the conditions of Young's theorem, the Hessian gives us a relatively simple way to compute the second directional derivative $D_{st} f(x)$. Defining the function $h: \mathrm{int}X \to \mathbb{R}$ by $h(x) = D_t f(x)$, Theorem 2.3 implies that
\[ h(x) = Df(x) \cdot t = \sum_{j=1}^n D_j f(x)\, t_j = \Big(\sum_{j=1}^n t_j D_j f\Big)(x), \]
where in the latter equality we take the linear combination of functions $D_j f$ without changing the value of $h$. Since the mapping $\sum_{j=1}^n t_j D_j f(x)$ is $C^1$ as a function of $x$, we can again use Theorem 2.3 to differentiate $h$ in the direction $s$; we have
\[ D_s h(x) = \sum_{i=1}^n D_i h(x)\, s_i = \sum_{i=1}^n D_i\Big(\sum_{j=1}^n t_j D_j f\Big)(x)\, s_i = \sum_{i=1}^n \sum_{j=1}^n D_{ij} f(x)\, s_i t_j. \]
Thus, we have proved the following theorem.

Theorem 2.6 Let $X \subseteq \mathbb{R}^n$ and $f: X \to \mathbb{R}$. If $f$ is $C^2$, then for all directions $s$ and $t$, we have
\[ D_{st} f(x) = \sum_{i=1}^n \sum_{j=1}^n D_{ij} f(x)\, s_i t_j. \]

Of course, the second directional derivative is just the Hessian matrix pre- and post-multiplied by the directions $s$ and $t$: the expression for $D_{st} f(x)$ can be written
\[ \begin{bmatrix} s_1 & s_2 & \cdots & s_n \end{bmatrix}
\begin{bmatrix}
D_{11} f(x) & D_{12} f(x) & \cdots & D_{1n} f(x) \\
D_{21} f(x) & D_{22} f(x) & \cdots & D_{2n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_{n1} f(x) & D_{n2} f(x) & \cdots & D_{nn} f(x)
\end{bmatrix}
\begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{bmatrix}, \]
or more concisely, viewing $s$ and $t$ as column vectors and $s'$ as the transpose of $s$, we have $D_{st} f(x) = s' D^2 f(x) t$.
Example Let $f$ be a linear function with gradient $a$. Then all second directional derivatives are zero: $D_{st} f(x) = 0$. For another example, let $f(x) = -\|x - \bar{x}\|^2$ for all $x \in \mathbb{R}^n$. Then the second directional derivative $D_{st} f(x)$ is just equal to $-2 s \cdot t$. For yet another example, let $f$ be Cobb-Douglas. Then the second directional derivative $D_{st} f(x)$ is
\[ D_{st} f(x) = f(x) \sum_{i=1}^n \left[\frac{\alpha_i(\alpha_i - 1)}{x_i^2}\, s_i t_i + \sum_{j \ne i} \frac{\alpha_i \alpha_j}{x_i x_j}\, s_i t_j\right]. \]
For a final example, let $X = \mathbb{R}^2_+$ and let $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$. Then the second directional derivative is
\[ D_{st} f(x) = -24x_1^2\, s_1 t_1 + s_1 t_2 + s_2 t_1 - 2 s_2 t_2. \]
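The following sketch (assuming NumPy; the helper names are mine) builds the Hessian of the final example and checks the identity $D_{st} f(x) = s' D^2 f(x) t$ against a finite-difference approximation of the second directional derivative.

```python
import numpy as np

def f(x):
    return x[0]*x[1] - 2*x[0]**4 - x[1]**2

def hessian_f(x):
    # analytic Hessian from the example above
    return np.array([[-24*x[0]**2, 1.0],
                     [        1.0, -2.0]])

def second_directional(f, x, s, t, h=1e-4):
    # finite-difference approximation of D_st f(x) = D_s(D_t f)(x)
    def d_t(y):
        return (f(y + h*t) - f(y - h*t)) / (2*h)
    return (d_t(x + h*s) - d_t(x - h*s)) / (2*h)

x = np.array([0.5, 0.3])
s = np.array([3.0, 4.0]) / 5.0
t = np.array([1.0, 0.0])
print(s @ hessian_f(x) @ t)            # exact: -24*0.25*0.6 + 0.8 = -2.8
print(second_directional(f, x, s, t))  # approximately -2.8
```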
We can give characterizations of concavity and strict concavity in terms of second directional derivatives.

Theorem 2.7 Let $X \subseteq \mathbb{R}^n$ be convex with nonempty interior, and let $f: X \to \mathbb{R}$ be $C^2$ and continuous at every $x \in \mathrm{bd}X$. Then

(1) $f$ is concave if and only if for all $x \in \mathrm{int}X$ and all directions $t$, $D^2_t f(x) \le 0$.

(2) If $D^2_t f(x) < 0$ for all $x \in \mathrm{int}X$ and all directions $t$, then $f$ restricted to $\mathrm{int}X$ is strictly concave.
Proof I will prove part (1). Suppose $f$ is concave, and consider any $x \in \mathrm{int}X$ and any direction $t$. Note that $A_{x,t} = \{\lambda \in \mathbb{R} \mid x + \lambda t \in \mathrm{int}X\}$ contains zero in its interior, and that the function $f_{x,t}: A_{x,t} \to \mathbb{R}$ defined by $f_{x,t}(\lambda) = f(x + \lambda t)$ is concave. Therefore, $D^2 f_{x,t}(0) \le 0$. Note also that the chain rule implies
\[ Df_{x,t}(\lambda) = \sum_{i=1}^n D_i f(x + \lambda t)\, t_i = D_t f(x + \lambda t). \]
Given a sequence $\{\lambda_k\}$ asymptotically converging to zero, we have
\[ D^2 f_{x,t}(0) = \lim_{k \to \infty} \frac{Df_{x,t}(\lambda_k) - Df_{x,t}(0)}{\lambda_k} = \lim_{k \to \infty} \frac{D_t f(x + \lambda_k t) - D_t f(x)}{\lambda_k} = D^2_t f(x), \]
which implies $D^2_t f(x) \le 0$. Now suppose that for all $x \in \mathrm{int}X$ and all directions $t$, $D^2_t f(x) \le 0$. Choose $x \in \mathrm{int}X$ and $t$ arbitrarily, and choose any $\lambda \in A_{x,t}$. Let $\{\lambda_k\}$ be a sequence asymptotically converging to zero. Then
\[ D^2 f_{x,t}(\lambda) = \lim_{k \to \infty} \frac{Df_{x,t}(\lambda + \lambda_k) - Df_{x,t}(\lambda)}{\lambda_k} = \lim_{k \to \infty} \frac{D_t f(x + (\lambda + \lambda_k) t) - D_t f(x + \lambda t)}{\lambda_k} = D^2_t f(x + \lambda t), \]
which implies $D^2 f_{x,t}(\lambda) \le 0$. Therefore, $f_{x,t}$ is concave, and since $x \in \mathrm{int}X$ and $t$ were arbitrary, this implies that $f$ is concave on the interior of $X$. A continuity argument (which is omitted) then establishes that $f$ is concave on its entire domain.
Example The linear function $f(x) = a \cdot x$ is concave, as given any $x \in \mathbb{R}^n$ and any direction $t$, we have $D^2_t f(x) = 0$. The function $f(x) = -\|x - \bar{x}\|^2$ is strictly concave, as we have $D^2_t f(x) = -2\|t\|^2 = -2 < 0$. For another example, the function $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$ from above is not concave. Indeed, we have shown
\[ D^2_t f(x) = -24x_1^2 t_1^2 + 2t_1 t_2 - 2t_2^2, \]
so given any vector $x$ with $x_1$ very small and a direction $t$ pointing more in the direction of the horizontal axis than the vertical axis, e.g., $t = (\sqrt{3}/2, 1/2)$, we have $D^2_t f(x) > 0$.
The Cobb-Douglas case is an instructive one: concavity then depends on whether the coefficients $\alpha_1, \dots, \alpha_n$ sum to less than (or equal to) one.

Theorem 2.8 Let $f: \mathbb{R}^n_+ \to \mathbb{R}$ be Cobb-Douglas with coefficients $\alpha_1, \dots, \alpha_n \ge 0$ (not all zero). Then

(1) $f$ is quasi-concave,

(2) $f$ is concave if and only if $\sum_{i=1}^n \alpha_i \le 1$,

(3) if $\alpha_i > 0$ for all $i = 1, \dots, n$, then $f$ is strictly quasi-concave,

(4) the restriction of $f$ to $\mathbb{R}^n_{++}$ is strictly concave if and only if $\sum_{i=1}^n \alpha_i < 1$ and $\alpha_i > 0$ for all $i = 1, \dots, n$.
Proof For part 2, first assume that $f$ is concave, and define the function $g: \mathbb{R}_+ \to \mathbb{R}$ by $g(c) = f(c, \dots, c) = c^{\sum_i \alpha_i}$. If $\sum_{i=1}^n \alpha_i > 1$, then $g$ is strictly convex, a contradiction. Thus, the sum is less than or equal to one. Conversely, assume $\sum_{i=1}^n \alpha_i \le 1$. Define the matrix
\[ A = \begin{bmatrix}
\alpha_1(\alpha_1 - 1) & \alpha_1\alpha_2 & \cdots & \alpha_1\alpha_n \\
\alpha_2\alpha_1 & \alpha_2(\alpha_2 - 1) & \cdots & \alpha_2\alpha_n \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_n\alpha_1 & \alpha_n\alpha_2 & \cdots & \alpha_n(\alpha_n - 1)
\end{bmatrix}, \]
and note that $A$ is square and symmetric. Furthermore, under the assumptions of the theorem, it is diagonally dominant, because $|\alpha_i(\alpha_i - 1)| \ge \alpha_i \sum_{j \ne i} \alpha_j$, and it has non-positive entries along the diagonal; therefore, it is negative semi-definite. Now consider any $x \in \mathbb{R}^n_{++}$ and any $t \in \mathbb{R}^n$, and define the vector
\[ s = \left(\frac{t_1}{x_1}, \dots, \frac{t_n}{x_n}\right). \]
Note that
\[ t' D^2 f(x) t = \left[f(x)\sum_{i=1}^n \frac{\alpha_i(\alpha_i - 1)\, t_i^2}{x_i^2}\right] + \left[f(x)\sum_{i,j:\, i \ne j} \frac{\alpha_i \alpha_j\, t_i t_j}{x_i x_j}\right] = f(x)(s'As) \le 0, \]
so the Hessian of $f$ is negative semi-definite. Then Theorem 2.7 implies that $f$ is concave. Part 4 follows similarly, using the observation that when $\sum_{i=1}^n \alpha_i < 1$ and the coefficients are positive, the matrix $A$ is strictly diagonally dominant with negative entries along the diagonal. For part 1, let $\alpha = \sum_{i=1}^n \alpha_i > 0$, and define the function $g: \mathbb{R}_{++} \to \mathbb{R}$ by $g(z) = z^{1/(2\alpha)}$. Note that $g$ is strictly increasing, and that $h(x) \equiv g(f(x)) = \prod_{i=1}^n x_i^{\beta_i}$, where $\beta_i = \alpha_i/(2\alpha)$, so the function $h$ is Cobb-Douglas with coefficients summing to one half. By part 2, $h$ is concave. Note that the inverse function $g^{-1}$ is strictly increasing, and that $f(x) = g^{-1}(h(x))$, so $f$ is a monotonic transformation of a concave function. Consider distinct $x, y \in \mathbb{R}^n_+$ and $\alpha \in (0,1)$, and note that $f(\alpha x + (1 - \alpha)y) \ge \min\{f(x), f(y)\}$ if and only if $h(\alpha x + (1 - \alpha)y) \ge \min\{h(x), h(y)\}$, which follows because $h$ is concave. Thus, $f$ is quasi-concave. Finally, part 3 follows similarly, now using the fact that by $\beta_i > 0$ and part 4, $h$ is strictly concave, which yields strict quasi-concavity of $f$.
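As a quick check on part 2 of Theorem 2.8, the sketch below (assuming NumPy; helper names are mine) builds the Cobb-Douglas Hessian at an interior point and confirms that it is negative semi-definite when the exponents sum to at most one, but not when they sum to more than one.

```python
import numpy as np

def cobb_douglas_hessian(x, alpha):
    # D_ij f(x) = a_i a_j f(x)/(x_i x_j) off the diagonal, a_i(a_i - 1) f(x)/x_i^2 on it
    fx = np.prod(x**alpha)
    H = np.outer(alpha, alpha) * fx / np.outer(x, x)
    np.fill_diagonal(H, alpha*(alpha - 1) * fx / x**2)
    return H

def is_nsd(H, tol=1e-10):
    return bool(np.all(np.linalg.eigvalsh(H) <= tol))

x = np.array([1.5, 0.7, 2.0])
print(is_nsd(cobb_douglas_hessian(x, np.array([0.3, 0.4, 0.2]))))  # True: sum = 0.9 <= 1
print(is_nsd(cobb_douglas_hessian(x, np.array([0.8, 0.9, 0.5]))))  # False: sum = 2.2 > 1
```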
Note that part 1 of Theorem 2.7 provides a complete characterization of concavity, but part 2 gives only a sufficient condition for strict concavity. The next example shows that this limitation is unavoidable.

Example Define $f: \mathbb{R} \to \mathbb{R}$ by $f(x) = -x^4$. This function is strictly concave, but $D^2 f(0) = 0$.

Part 2 of Theorem 2.7 gives a sufficient condition for strict concavity of the function $f$ restricted to the interior of its domain. To see that the result cannot be strengthened, consider the Cobb-Douglas function with positive coefficients summing to strictly less than one. As in the proof of Theorem 2.8, this function has negative definite Hessian at all vectors $x \in \mathbb{R}^n_{++}$, but it is not strictly concave on $\mathbb{R}^n_+$: when $x_i = 0$ for some $i$, the value of the function is zero, so it is constant on the boundary of $\mathbb{R}^n_{++}$.
2.5 Optimization Problems
Given a domain $X \subseteq \mathbb{R}^n$ and an objective function $f: X \to \mathbb{R}$, the corresponding unconstrained maximization problem is the attempt to find elements of the domain at which $f$ attains its highest value. We write this problem as
\[ \max_{x \in X} f(x). \]
That is, we want the vectors $x \in X$ such that for all $y \in X$, $f(x) \ge f(y)$, in which case we say $x$ is a maximizer of $f$. We say $x$ is a local maximizer of $f$ if there is some $\epsilon > 0$ such that for all $y \in X \cap B_\epsilon(x)$, $f(x) \ge f(y)$. And $x$ is a strict local maximizer of $f$ if the latter inequality holds strictly: there is some $\epsilon > 0$ such that for all $y \in X \cap B_\epsilon(x)$ with $y \ne x$, $f(x) > f(y)$. We sometimes use the term global maximizer to refer to a maximizer of $f$. If $f$ is continuous and the domain is compact, it will have at least one maximizer.

A constrained maximization problem is one in which we search for a maximizer within a constraint set $C \subseteq \mathbb{R}^n$. Given domain $X \subseteq \mathbb{R}^n$, constraint set $C \subseteq \mathbb{R}^n$, and objective function $f: X \to \mathbb{R}$, the problem is
\[ \max_{x \in X} f(x) \quad \text{subject to } x \in C. \]
That is, we want a vector $x \in X \cap C$ such that for all $y \in X \cap C$, $f(x) \ge f(y)$. An element $x \in X \cap C$ is a constrained local maximizer of $f$ subject to $C$ if there exists some $\epsilon > 0$ such that for all $y \in B_\epsilon(x) \cap X \cap C$, $f(x) \ge f(y)$. Similarly, an element $x \in X \cap C$ is a constrained strict local maximizer of $f$ subject to $C$ if there exists some $\epsilon > 0$ such that for all $y \in B_\epsilon(x) \cap X \cap C$ with $y \ne x$, we have $f(x) > f(y)$. As long as $f$ is continuous and $X \cap C$ is nonempty and compact, there is at least one (global) constrained maximizer.
After investigating unconstrained optimization, we will consider constraint sets $C$ taking the form of a single equality constraint:
\[ C = \{x \in \mathbb{R}^n \mid g(x) = c\}, \]
where $g: \mathbb{R}^n \to \mathbb{R}$ is any function, and $c \in \mathbb{R}$ is a fixed value of $g$. We write a maximization problem subject to such a constraint as
\[ \max_{x \in X} f(x) \quad \text{s.t. } g(x) = c. \]
One might think of $g(x)$ as a cost and $c$ as a pre-determined budget. The latter formulation is unrestrictive, but we will impose more structure (e.g., differentiability) on $g$. Then we will allow multiple equality constraints $g_1: \mathbb{R}^n \to \mathbb{R}, \dots, g_m: \mathbb{R}^n \to \mathbb{R}$, so that the constraint set takes the form
\[ C = \{x \in \mathbb{R}^n \mid g_1(x) = c_1, \dots, g_m(x) = c_m\}, \]
where $c_j$ is a fixed value of the $j$th constraint for $j = 1, \dots, m$.

We then consider the maximization problem with multiple inequality constraints,
\[ C = \{x \in \mathbb{R}^n \mid g_1(x) \le c_1, \dots, g_m(x) \le c_m\}, \]
which is written
\[ \max_{x \in \mathbb{R}^n} f(x) \quad \text{s.t. } g_1(x) \le c_1,\ \dots,\ g_m(x) \le c_m, \]
now defining $f$ on the entire Euclidean space and building any restrictions on the domain into the constraints of the problem. Of course, it may be that $g_j(x) = -x_j$ and $c_j = 0$, so the $j$th constraint is just a non-negativity constraint: $x_j \ge 0$. Note that the problem of equality constraints is a special case of inequality constraints: we can always convert $g(x) = c$ into two inequalities $g(x) \le c$ and $g(x) \ge c$.

Finally, we consider the general problem with mixed constraints, where we are given equality constraints $g_1: \mathbb{R}^n \to \mathbb{R}, \dots, g_\ell: \mathbb{R}^n \to \mathbb{R}$ and inequality constraints $h_1: \mathbb{R}^n \to \mathbb{R}, \dots, h_m: \mathbb{R}^n \to \mathbb{R}$. Then the form of the constraint set is
\[ C = \left\{ x \in \mathbb{R}^n \,\middle|\, \begin{array}{l} g_1(x) = c_1, \dots, g_\ell(x) = c_\ell \\ h_1(x) \le d_1, \dots, h_m(x) \le d_m \end{array} \right\}, \]
and we consider the hybrid optimization
\[ \max_{x \in \mathbb{R}^n} f(x) \quad \text{s.t. } g_j(x) = c_j,\ j = 1, \dots, \ell, \qquad h_j(x) \le d_j,\ j = 1, \dots, m, \]
again defining $f$ on the entire Euclidean space.

All of the above optimization problems have been framed as maximization problems, rather than minimization problems, but the choice is purely rhetorical. Any minimization problem with objective function $f$ can be re-interpreted as a maximization problem with objective function $-f$. The remainder of these notes focus exclusively on maximization problems, but all of the results can be reformulated (using the latter equivalence) with suitable care to keep track of the occasional negative sign.
3 Unconstrained Optimization
We first consider $X \subseteq \mathbb{R}^n$ and $f: X \to \mathbb{R}$ and the unconstrained maximization problem
\[ \max_{x \in X} f(x). \]
Our analysis begins with conditions involving first derivatives ("first order conditions") and second derivatives ("second order conditions"), both necessary conditions and stronger sufficient conditions. We then apply these conditions to provide a systematic method for solving unconstrained optimization problems. And we analyze the behavior of solutions as we vary parameters of the problem.
3.1 First and Second Order Conditions
Our first result establishes a straightforward necessary condition for an interior local maximizer of a function: the derivative of the function at the local maximizer must be equal to zero.

Theorem 3.1 Let $X \subseteq \mathbb{R}^n$, let $x \in \mathrm{int}X$, and let $f: X \to \mathbb{R}$ be directionally differentiable at $x$. If $x$ is a local maximizer of $f$, then for every direction $t$, $D_t f(x) = 0$.

Proof Suppose $x$ is an interior local maximizer, and let $t$ be an arbitrary direction. Pick $\epsilon > 0$ such that $B_\epsilon(x) \subseteq X$ and for all $y \in B_\epsilon(x)$, $f(x) \ge f(y)$. Note that for $\lambda \in \mathbb{R}$ sufficiently small in magnitude, we have $x + \lambda t \in B_\epsilon(x)$, so $f(x) \ge f(x + \lambda t)$ for such $\lambda$. Then
\[ D_t f(x) = \lim_{\lambda \downarrow 0} \frac{f(x + \lambda t) - f(x)}{\lambda} \le 0
\qquad\text{and}\qquad
D_t f(x) = \lim_{\lambda \uparrow 0} \frac{f(x + \lambda t) - f(x)}{\lambda} \ge 0, \]
and we conclude $D_t f(x) = 0$, as required.

[Figure 7: First order condition]

Of course, an implication of the necessary first order condition is that all partial derivatives are equal to zero, i.e., $D_i f(x) = 0$ for all $i = 1, \dots, n$. Assuming $f$ is $C^1$, the first order condition is written $Df(x) \cdot t = 0$, so the converse direction holds as well: an element $x$ satisfies the first order condition if and only if all partial derivatives at $x$ are equal to zero. This is a generalization of the well-known first order condition from univariate calculus: if the graph of a function peaks at one point in the domain, then the graph of the function has slope equal to zero at that point. In general, we see that the derivative of the function in any direction (in multiple dimensions, there are many directions) must be zero. See Figure 7.
Example If $f$ is quadratic, then $Df(x) = 2(\bar{x} - x)$, and the first order condition is $2(\bar{x}_i - x_i) = 0$ for each $i = 1, \dots, n$, i.e., $x = \bar{x}$. Letting $X = \mathbb{R}^2_+$ and $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$, the first order condition is
\[ x_2 - 8x_1^3 = 0 \qquad\qquad x_1 - 2x_2 = 0. \]
Solving the second equation for $x_1$, we have $x_1 = 2x_2$. Substituting this into the first equation, we have $x_2 - 64x_2^3 = 0$, which has three solutions: $x_2 = 0, 1/8, -1/8$. Then the first order condition has three solutions,
\[ (x_1, x_2) = (0,0),\ (1/4, 1/8),\ (-1/4, -1/8), \]
but the last of these is not in the domain of $f$, and the first is on the boundary of the domain. Thus, we have a unique solution in the interior of the domain: $(x_1, x_2) = (1/4, 1/8)$.
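For problems where the first order condition cannot be solved by hand, a numerical root finder can locate candidate interior maximizers. Below is a sketch using scipy.optimize.fsolve (assuming SciPy is available) applied to the system above; the starting point is arbitrary.

```python
from scipy.optimize import fsolve

def first_order_condition(x):
    # gradient of f(x1, x2) = x1*x2 - 2*x1^4 - x2^2
    return [x[1] - 8*x[0]**3, x[0] - 2*x[1]]

candidate = fsolve(first_order_condition, x0=[0.2, 0.1])
print(candidate)                         # approximately [0.25, 0.125]
print(first_order_condition(candidate))  # both components near zero
```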
The usual necessary second order condition from univariate calculus extends as well.

Theorem 3.2 Let $X \subseteq \mathbb{R}^n$, let $x \in \mathrm{int}X$, and let $f: X \to \mathbb{R}$ be twice directionally differentiable. If $x$ is a local maximizer of $f$, then for every direction $t$, we have $D^2_t f(x) \le 0$.

Proof Assume $x$ is an interior local maximizer of $f$, let $t$ be an arbitrary direction, and let $\epsilon > 0$ be such that $B_\epsilon(x) \subseteq X$ and for all $y \in B_\epsilon(x)$, $f(x) \ge f(y)$. Consider a sequence $\{\lambda_n\}$ such that $\lambda_n \downarrow 0$, so for sufficiently high $n$, we have $x + \lambda_n t \in B_\epsilon(x)$, and therefore $f(x + \lambda_n t) \le f(x)$. For each such $n$, the mean value theorem yields $\mu_n \in (0, \lambda_n)$ such that
\[ D_t f(x + \mu_n t) = \frac{f(x + \lambda_n t) - f(x)}{\lambda_n}, \]
and therefore $D_t f(x + \mu_n t) \le 0$. Note that $\mu_n \to 0$. Furthermore, using $D_t f(x) = 0$, we have
\[ \frac{D_t f(x + \mu_n t) - D_t f(x)}{\mu_n} \le 0. \]
Taking limits, we have $D^2_t f(x) \le 0$, as required.

In matrix terms, assuming $f$ is $C^2$, the inequality in the previous theorem is written $t' D^2 f(x) t \le 0$. That is, the Hessian of $f$ at $x$ is negative semi-definite. It is easy to see that the necessary first and second order conditions are not sufficient.

Example Let $X = \mathbb{R}$ and $f(x) = x^3 - x^4$. Then $Df(0) = D^2 f(0) = 0$, but $x = 0$ is not a local maximizer.

As for functions of one variable, we can give sufficient conditions for a strict local maximizer. Although this result is of limited usefulness in finding a global maximizer, we will see that it can be of great use in comparative statics analysis, which investigates the dependence of maximizers on parameters.
Theorem 3.3 Let $X \subseteq \mathbb{R}^n$, let $x \in \mathrm{int}X$, and let $f: X \to \mathbb{R}$ be $C^2$. If $D_t f(x) = 0$ and $D^2_t f(x) < 0$ for every direction $t$, then $x$ is a strict local maximizer of $f$.

Proof Assume $x$ is interior, $D_t f(x) = 0$ and $D^2_t f(x) < 0$ for every direction $t$, and suppose that $x$ is not a strict local maximizer. Then for all $m$, there exists $x^m \in B_{1/m}(x)$ with $x^m \ne x$ such that $f(x^m) \ge f(x)$. For $m$ high enough, we have $B_{1/m}(x) \subseteq X$, and for such $m$, define the direction $t^m = \frac{1}{\|x^m - x\|}(x^m - x)$ and the function $g_m: [0, 1/m] \to \mathbb{R}$ by $g_m(\lambda) = f(x + \lambda t^m)$. Since $g_m$ is continuous, it has at least one minimizer, say $\lambda_m$. Since zero is a local maximizer of $g_m$ (by the univariate sufficient second order conditions, because $Dg_m(0) = D_{t^m} f(x) = 0$ and $D^2 g_m(0) = D^2_{t^m} f(x) < 0$), and since $g_m(\|x^m - x\|) = f(x^m) \ge f(x) = g_m(0)$ by assumption, we can assume $0 < \lambda_m < 1/m$ without loss of generality. Then $\lambda_m$ is an interior minimizer of $g_m$, so it is an interior maximizer of $-g_m$, so Theorem 3.2 implies that $D^2 g_m(\lambda_m) \ge 0$. Furthermore, since $\{t^m\}$ lies in the closed unit ball, which is compact, we may consider a convergent subsequence (still indexed by $m$ for simplicity) with limiting direction $t$. Setting $y^m = x + \lambda_m t^m$, we therefore have
\[ 0 \le D^2 g_m(\lambda_m) = D^2_{t^m} f(y^m) = \sum_{i=1}^n \sum_{j=1}^n t^m_i t^m_j D_{ij} f(y^m) \to \sum_{i=1}^n \sum_{j=1}^n t_i t_j D_{ij} f(x) = D^2_t f(x), \]
where the first equality follows by definition of directional derivative, the second and third from Theorem 2.6, and the limit from $t^m \to t$, $y^m \to x$, and the assumption that $f$ is $C^2$. We conclude that $D^2_t f(x) \ge 0$, a contradiction. Therefore, $x$ is a strict local maximizer, as required.
In matrix terms, the inequality in the previous theorem is written $t' D^2 f(x) t < 0$. That is, the Hessian of $f$ at $x$ is negative definite. (See Simon and Blume (1994) for methods to check negative definiteness of a symmetric matrix.) Obviously, the sufficient second order conditions are not necessary.

Example Letting $X = \mathbb{R}$ and $f(x) = -x^4$, we see that $x = 0$ maximizes $f$, but $D^2 f(0) = 0$.
Next, we return to one of our running examples.

Example Letting $X = \mathbb{R}^2_+$ and $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$, we saw that the first order condition has a unique interior solution, $(x_1, x_2) = (1/4, 1/8)$. Then
\[ D^2_t f(1/4, 1/8) = -\tfrac{3}{2}t_1^2 + 2t_1 t_2 - 2t_2^2 < -t_1^2 + 2t_1 t_2 - t_2^2 = -(t_1 - t_2)^2 \le 0, \]
where the strict inequality holds because either $t_1 \ne 0$ or $t_2 \ne 0$ (or both). Thus, $D^2_t f(1/4, 1/8) < 0$ for every direction $t$, so the sufficient condition of Theorem 3.3 is satisfied, and we conclude that $(1/4, 1/8)$ is a strict local maximizer. It is open for now, however, whether $(1/4, 1/8)$ is a global maximizer. Note that $f(1/4, 1/8) = 1/128 > 0$, while for an element on the boundary of $X$, we have $x_1 = 0$ or $x_2 = 0$, in which case $f(x_1, x_2) \le 0$. So the only issue is whether $f$ attains values larger than $f(1/4, 1/8)$ in the interior of the domain.
3.2 Solution Methods
The usefulness of the necessary first order condition from Theorem 3.1 in solving an optimization problem is that it transforms the problem to that of solving a system of equations. Instead of solving the optimization problem directly, we solve the first order condition to find all possible candidates for interior maximizers of a function; and once we have the solutions to the first order condition, we can either proceed to necessary second order conditions to narrow down the set, or we can directly compute the value of the function at each solution.

More precisely, the first order condition implies that each partial derivative is equal to zero,
\[ \begin{aligned}
D_1 f(x_1, \dots, x_n) &= 0 \\
D_2 f(x_1, \dots, x_n) &= 0 \\
&\ \vdots \\
D_n f(x_1, \dots, x_n) &= 0,
\end{aligned} \]
a system of $n$ equations in $n$ unknowns $(x_1, \dots, x_n)$. Solving such a system can be straightforward or impossible, hopefully the former. Note that the necessary first order condition is that derivatives in all directions are zero, but additional equations corresponding to other directions are redundant: the $n$ restrictions above imply via $D_t f(x) = Df(x) \cdot t$ that all directional derivatives are zero.

Importantly, the solutions to the first order condition will include all interior maximizers of a function, but there are two caveats. First, these solutions may contain other elements, such as local (but not global) maximizers, local (and global) minimizers, and inflection points. Second, even if the function attains a higher value on one solution to the first order condition than all other solutions, that need not be a maximizer of the function: the function may attain a higher value on the boundary of its domain (such maximizers need not satisfy the first order condition), or it may not have a maximizer at all (if the domain is not compact). Thus, care must be taken in employing the calculus approach to optimization.

To give some guidance in applying this approach, it may be useful to describe a systematic method for solving the unconstrained maximization problem
\[ \max_{x \in X} f(x). \]
To this end, assume for simplicity that $X \subseteq \mathbb{R}^n$ is closed (though not necessarily compact) and convex, and let $f: X \to \mathbb{R}$ be $C^2$. To represent the values of $f$ at the extremes of the domain, if it is unbounded, let
\[ y^\infty = \lim_{k \to \infty} \sup\big\{f(x) : x \in X \text{ and } \|x\| \ge k\big\}, \]
where we assume for simplicity that this limit exists, and let
\[ \overline{y} = \sup\{f(x) : x \in \mathrm{bd}X\} \]
represent the highest value of $f$ on the boundary of its domain. Assume for simplicity that the function $f$ has at most a finite number of critical points.

1. Find all interior solutions, say $x^1, \dots, x^k \in \mathrm{int}X$, to the first order condition.

2. Check the second order necessary condition at each solution $x^j$; if there is a direction $t$ such that $D^2_t f(x^j) > 0$, then $x^j$ is not a maximizer. Let $x^1, \dots, x^\ell$ denote the solutions remaining after this step.

3. A solution $x^j$ from Steps 1 and 2 is a maximizer if and only if
\[ f(x^j) \ge \max\{y^\infty, \overline{y}, f(x^1), \dots, f(x^\ell)\}. \]

4. A boundary point $x \in X$ is a maximizer if and only if
\[ f(x) \ge \max\{y^\infty, \overline{y}, f(x^1), \dots, f(x^\ell)\}. \]

In the above, Step 2 is optional; often, the gain from narrowing down the solutions to the first order condition is outweighed by the cost of checking the second order condition, and this step is omitted. In general, there may be multiple maximizers or no maximizers; but if $X$ is compact, then it has at least one maximizer, simplifying the situation somewhat.
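The following is a minimal sketch of Steps 1-3 for the running example, assuming NumPy and SciPy are available (all names are mine): it finds interior critical points from several starting values, screens them with the Hessian, and compares the surviving candidates, keeping in mind that the boundary and limiting values for this function are at most zero (as shown in the example that follows).

```python
import numpy as np
from scipy.optimize import fsolve

def f(x):
    return x[0]*x[1] - 2*x[0]**4 - x[1]**2

def foc(x):
    return [x[1] - 8*x[0]**3, x[0] - 2*x[1]]

def hessian(x):
    return np.array([[-24*x[0]**2, 1.0], [1.0, -2.0]])

# Step 1: interior solutions to the first order condition (interior of R^2_+)
candidates = []
for start in [(0.2, 0.1), (0.6, 0.4), (1.0, 1.0)]:
    sol, info, ok, _ = fsolve(foc, start, full_output=True)
    if ok == 1 and sol[0] > 1e-8 and sol[1] > 1e-8:
        if not any(np.allclose(sol, c) for c in candidates):
            candidates.append(sol)

# Step 2: drop candidates whose Hessian has a direction of positive curvature
candidates = [c for c in candidates if np.all(np.linalg.eigvalsh(hessian(c)) <= 0)]

# Step 3: compare candidate values with the boundary/infinity values (at most 0 here)
best = max(candidates, key=f)
print(best, f(best))   # approximately (0.25, 0.125) with value 1/128
```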
We can now complete the analysis of one of our running examples.

Example Returning again to $X = \mathbb{R}^2_+$ and $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$, we have noted that $(1/4, 1/8)$ is the unique interior solution to the first order condition and that the second directional derivatives of $f$ are negative at this point. Note that when $x_1 = 0$ or $x_2 = 0$, we have $f(x_1, x_2) \le 0$, and $f(0) = 0$. Thus, $\overline{y} = 0$. Furthermore, I claim that $y^\infty = -\infty$. To see this, consider any value $c < 0$, let $k$ be sufficiently high that

(i) $k^2 - k^4 \le 2c$,

(ii) $\left(-\frac{c}{2}\right)^{1/4}\sqrt{k^2 - \sqrt{-\frac{c}{2}}} - k^2 + \sqrt{-\frac{c}{2}} \le c$,

(iii) $\sqrt{k^2 - \sqrt{-\frac{c}{2}}} \ge \frac{1}{2}\left(-\frac{c}{2}\right)^{1/4}$,

and consider $x \in \mathbb{R}^2_+$ with $\|x\| \ge k$. If $x_1 \ge x_2$, then $x_1 \ge \frac{k\sqrt{2}}{2}$, and by (i) we have
\[ f(x) \le x_1^2 - 2x_1^4 \le \left(\frac{k\sqrt{2}}{2}\right)^2 - 2\left(\frac{k\sqrt{2}}{2}\right)^4 \le c. \]
If $x_1 < x_2$, then $f(x) \le -2x_1^4$, so it is clear that $f(x) \le c$ holds when $x_1 \ge \left(-\frac{c}{2}\right)^{1/4}$. Last, we consider the case in which $x_1 < x_2$ and $x_1 < \left(-\frac{c}{2}\right)^{1/4}$, which implies
\[ x_2 \ge \sqrt{k^2 - \sqrt{-\frac{c}{2}}}, \]
and therefore by (ii) and (iii), we have
\[ f(x) \le \left(-\frac{c}{2}\right)^{1/4} x_2 - x_2^2 \le \left(-\frac{c}{2}\right)^{1/4}\sqrt{k^2 - \sqrt{-\frac{c}{2}}} - k^2 + \sqrt{-\frac{c}{2}} \le c, \]
establishing the claim. Then $f(1/4, 1/8) > 0 = \max\{y^\infty, \overline{y}\}$, and we conclude that $(1/4, 1/8)$ is the unique maximizer of this function.
Matters are greatly simplified when we consider concave functions. Recall that a necessary condition for $f$ to be concave is that it have non-positive second derivatives in every direction, i.e., $t' D^2 f(x) t \le 0$ for all $x \in \mathrm{int}X$ and all directions $t$; and when the domain is open, the more stringent requirement of strictly negative second directional derivatives is actually sufficient for strict concavity. The next result shows that for quite general concave functions, the first order necessary condition is actually sufficient for a global maximizer.

Theorem 3.4 Let $X \subseteq \mathbb{R}^n$ be convex, let $x \in \mathrm{int}X$, and let $f: X \to \mathbb{R}$ be $C^1$ and concave. If $Df(x) = 0$, then $x$ is a global maximizer of $f$.

Proof Suppose $Df(x) = 0$ but $f(y) > f(x)$ for some $y \in X$. Let $t = \frac{1}{\|y - x\|}(y - x)$. Consider any sequence $\lambda_m \downarrow 0$, let $y^m = x + \lambda_m t$, and note that
\[ y^m = \left(1 - \frac{\lambda_m}{\|y - x\|}\right)x + \left(\frac{\lambda_m}{\|y - x\|}\right)y, \]
so when $\lambda_m < \|y - x\|$, it follows that $y^m$ is a convex combination of $x$ and $y$. By concavity, we then have
\[ \frac{f(y^m) - f(x)}{\|y^m - x\|} \ge \frac{\left(1 - \frac{\lambda_m}{\|y - x\|}\right)f(x) + \left(\frac{\lambda_m}{\|y - x\|}\right)f(y) - f(x)}{\lambda_m} = \frac{f(y) - f(x)}{\|y - x\|} > 0. \]
Taking limits, we have
\[ 0 = Df(x) \cdot t = D_t f(x) \ge \frac{f(y) - f(x)}{\|y - x\|} > 0, \]
a contradiction.

Example Assume $X = \mathbb{R}^n$, and note that $f(x) = -\|x - \bar{x}\|^2$ is strictly concave. The first order condition has the unique solution $x = \bar{x}$, which is the unique maximizer of the function. (Of course, we could have verified that directly.)
3.3 Envelope Theorem
The next result lays the foundation for comparative statics analysis, in which we consider how local maximizers vary with respect to underlying parameters of the problem. Specifically, we study the effect of letting a parameter, say $\theta$, vary in the objective function. Of course, if $x$ is a local maximizer given parameter $\theta$, and the value of the parameter changes to $\theta'$, then $x$ may no longer be a local maximizer. But we will see that under the first order and second order sufficient condition, its location will vary smoothly as we vary the parameter.

In what follows, we consider an open set $\Theta \subseteq \mathbb{R}^m$, a set $X \subseteq \mathbb{R}^n$, and a function $f: X \times \Theta \to \mathbb{R}$, so the value of the function is $f(x, \theta)$, where $x = (x_1, \dots, x_n) \in X$ and $\theta = (\theta_1, \dots, \theta_m) \in \Theta$. If we want to keep $\theta$ fixed while $x$ is variable, we write this as $f(x|\theta)$; alternatively, fixing $x$ and varying $\theta$, we write $f(\theta|x)$. Thus, the derivative at $x$ in direction $t$ (keeping $\theta$ fixed) is $D_t f(x|\theta)$, and the derivative at $x$ (keeping $\theta$ fixed) is $Df(x|\theta)$; alternatively, we write $Df(\theta|x)$ for the derivative with respect to $\theta$ for fixed $x$. The proof of the next result is relatively advanced.
Theorem 3.5 Let $\Theta \subseteq \mathbb{R}^m$ be open and $X \subseteq \mathbb{R}^n$, and let $f: X \times \Theta \to \mathbb{R}$ be $C^2$. Consider $(x^*, \theta^*) \in X \times \Theta$ with $x^* \in \mathrm{int}X$. Given $\theta^*$, assume $x^*$ satisfies the first order condition, i.e., $Df(x^*|\theta^*) = 0$, and the second order sufficient condition, i.e., $D^2_t f(x^*|\theta^*) < 0$ for all directions $t$. Then there are open sets $Y \subseteq \mathbb{R}^n$ with $x^* \in Y$ and $\Omega \subseteq \Theta$ with $\theta^* \in \Omega$, and $C^1$ mappings $\psi_i: \Omega \to \mathbb{R}$, $i = 1, \dots, n$, such that for all $\theta \in \Omega$, (i) $\psi(\theta) = (\psi_1(\theta), \dots, \psi_n(\theta))$ is the unique solution to $\max_{x \in Y} f(x, \theta)$ belonging to $Y$, (ii) $\psi(\theta)$ satisfies the first order condition at $\theta$, i.e., $Df(x|\theta) = 0$ when evaluated at $x = \psi(\theta)$, and (iii) $\psi(\theta)$ satisfies the second order sufficient condition at $\theta$, i.e., $D^2_t f(x|\theta) < 0$ for all directions $t$ when evaluated at $x = \psi(\theta)$.

Proof By continuous differentiability, there exist open sets $Y' \subseteq X$ and $\Omega' \subseteq \Theta$ such that $(x^*, \theta^*) \in Y' \times \Omega'$ and for all $(x, \theta) \in Y' \times \Omega'$ and all directions $t$, $D^2_t f(x|\theta) < 0$. In particular, the Hessian matrix $D^2 f(x^*|\theta^*)$ has full rank. Since $x^*$ is a local maximizer of $f(\cdot, \theta^*)$, the first order condition holds at $x^*$: $Df(x^*|\theta^*) = 0$. By the implicit function theorem, there exist open sets $Y'' \subseteq X$ and $\Omega'' \subseteq \Theta$ with $(x^*, \theta^*) \in Y'' \times \Omega''$ and a $C^1$ function $\psi: \Omega'' \to Y''$ such that for all $\theta \in \Omega''$, $\psi(\theta)$ is the unique solution to $Df(x|\theta) = 0$ belonging to $Y''$. Define $Y$ to be a nonempty, convex, open subset of the intersection $Y' \cap Y''$ containing $x^*$, and define $\Omega = \psi^{-1}(Y)$. This is possible because $x^* \in Y' \cap Y''$ and the intersection $Y' \cap Y''$ is open, so we can choose $Y$ to be any open ball containing $x^*$ and included in $Y' \cap Y''$. Note that $Y$ is open and $\psi$ is continuous, so $\psi^{-1}(Y)$ is open, so $\Omega$ is open. Furthermore, (i) $\psi$ maps $\Omega$ into $Y$, (ii) for all $\theta \in \Omega$, $\psi(\theta)$ is the unique solution to $Df(x|\theta) = 0$ belonging to $Y$, and (iii) for all $(x, \theta) \in Y \times \Omega$ and all directions $t$, $D^2_t f(x|\theta) < 0$. Note that (iii) implies that $f(\cdot, \theta)$ is strictly concave on $Y$. Taken together, given any $\theta \in \Omega$, (i)-(iii) imply that $\psi(\theta)$ is the unique solution to $\max_{y \in Y} f(y, \theta)$ belonging to $Y$.
See Figure 8, where given parameter $\theta^*$, the level sets of $f(\cdot, \theta^*)$ are in blue, and the mapping $\psi$ selects the local maximizer $x^*$. Varying $\theta$ within $\Omega$ changes the level sets of the function and moves the local maximizer accordingly.

[Figure 8: Parameterized local maximizer]

With the help of some matrix notation, we can give the precise form of the derivative of the mapping $\psi: \Omega \to Y$ defined by $\psi(\theta) = (\psi_1(\theta), \dots, \psi_n(\theta))$ for all $\theta \in \Omega$. In the next corollary, we view $Df(x|\theta)$ as an $n \times 1$ matrix consisting of the partial derivatives with respect to the coordinates of $x$; we view $Df(x, \theta)$ as an $n \times m$ matrix of cross partial derivatives (derivatives with respect to $x$ expanded across rows and derivatives with respect to $\theta$ expanded across columns) and $D^2 f(x|\theta)$ as an $n \times n$ Hessian matrix with respect to the coordinates of $x$; and we view $D\psi(\theta)$ as an $n \times m$ matrix with row $i$ consisting of the derivative of $\psi_i(\theta)$ with respect to the $m$ coordinates of $\theta$.

Corollary 3.6 Under the conditions of Theorem 3.5, we have for all $\theta \in \Omega$,
\[ D\psi(\theta) = -[D^2 f(x|\theta)]^{-1} Df(x, \theta), \]
where derivatives on the right-hand side are evaluated at $x = \psi(\theta)$.

Proof Define the mapping $\phi: \Omega \to \mathbb{R}^n$ by $\phi(\theta) = Df(x|\theta)$ for all $\theta \in \Omega$, where the derivative on the right-hand side is evaluated at $x = \psi(\theta)$. Since $\psi(\theta)$ satisfies the first order condition, we have $\phi(\theta) = 0$ for all $\theta \in \Omega$; in particular, $\phi$ is constant on $\Omega$. Then $D\phi(\theta) = 0$ on $\Omega$, and using the chain rule, this becomes
\[ D^2 f(x|\theta)\, D\psi(\theta) + Df(x, \theta) = 0, \]
where the derivatives are evaluated at $x = \psi(\theta)$. Since $D^2 f(x|\theta)$ is non-singular at $x = \psi(\theta)$, we can premultiply the above equation by its inverse to obtain the desired expression.
Figure 8: Parameterized local maximizer

In case the matrix algebra in the preceding corollary is a bit hard to digest, we can easily state the derivative of φ in terms of partial derivatives when m = n = 1. Using the assumption that for all θ in an open set Θ̂, D_1 f(x, θ) = 0, where the derivative is evaluated at x = φ(θ), it follows that D_1 f(φ(θ), θ) is constant. Thus, its derivative is zero. Using the chain rule, this implies

D²_1 f(x, θ)Dφ(θ) + D_{1,2} f(x, θ) = 0,

which implies

Dφ(θ) = − D_{1,2} f(x, θ) / D²_1 f(x, θ),

assuming (as we do) that D²_1 f(x, θ) is non-zero.
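The scalar formula above lends itself to a quick numerical check. The sketch below uses an objective f(x, θ) = θx − x⁴ of my own choosing (not taken from the notes) and compares a finite-difference estimate of Dφ(θ) with the formula −D_{1,2}f / D²_1 f.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy objective (illustrative choice, not from the notes): f(x, theta) = theta*x - x**4.
def f(x, theta):
    return theta * x - x**4

def phi(theta):
    # numerically compute the maximizer of f(., theta)
    res = minimize_scalar(lambda x: -f(x, theta), bounds=(-5, 5), method="bounded")
    return res.x

theta, h = 2.0, 1e-4
x_star = phi(theta)

# finite-difference derivative of the maximizer mapping
dphi_fd = (phi(theta + h) - phi(theta - h)) / (2 * h)

# formula from the corollary: Dphi = -D_{1,2}f / D^2_1 f; here D_{1,2}f = 1 and D^2_1 f = -12 x^2
dphi_formula = -1.0 / (-12.0 * x_star**2)

print(dphi_fd, dphi_formula)   # the two numbers should agree closely
```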
Given the parameterization φ of a local maximizer from the previous theorem, we can consider the locally maximized value of the objective function,

F(θ) = f(φ(θ), θ),

as a function of θ. Since f is C² and each φ_i is C¹, it follows that f(φ(θ), θ) is C¹ as a function of θ. The next result, known as the envelope theorem, provides a simple way of calculating the rate of change of the locally maximized objective function as a function of θ: basically, we take a simple partial derivative of the objective function with respect to θ, holding x = φ(θ) fixed. That is, although the location of the local maximizer may in fact change when we vary θ, we can ignore that variation, treating the local maximizer as fixed in taking the derivative.

Theorem 3.7 Let Θ ⊆ R^m be open and X ⊆ R^n, and let f: X × Θ → R be C¹. Consider (x*, θ*) ∈ X × Θ with x* ∈ intX. Let φ_i: Θ → R, i = 1, ..., n, be C¹ mappings such that x* = φ(θ*) = (φ_1(θ*), ..., φ_n(θ*)) and such that for all θ ∈ Θ, φ(θ) is a local maximizer of f at θ. Define the mapping F: Θ → R by

F(θ) = f(φ(θ), θ)

for all θ ∈ Θ. Then F is C¹, and

DF(θ*) = Df(θ*|x*).

Proof For all θ ∈ Θ, the chain rule implies

DF(θ) = ∑_{i=1}^n D_i f(x|θ) Dφ_i(θ) + Df(θ|x),

where the derivatives on the right-hand side are evaluated at x = φ(θ). Since φ(θ) is a local maximizer at θ, the first order condition Df(x|θ) = 0 holds, and the above expression simplifies as required.
Figure 9: Envelope theorem
The intuition for the envelope theorem is actually quite simple and has a graphical explanation; although the idea holds generally, consider the case where m = 1. For a given value of θ*, let x* = φ(θ*) be a maximizer of f(x, θ*) as x ranges over the open set Y. If we then perturb θ* to θ, the maximizer over Y is now φ(θ), and of course given θ, the value of f at φ(θ) is at least as great as it is at x*, i.e., f(x*, θ) ≤ F(θ). We depict this in Figure 9; there, fixing x* and varying θ, it follows that the graph of f(x*, ·) lies below the graph of F, and of course f(x*, θ*) = F(θ*). So the graph of f(x*, ·) is tangent to the graph of F at θ*, so the derivative of F is equal to the second partial of f(x*, ·) at θ = θ*. Repeating this argument for different starting values of θ* in Θ, say θ** with x** = φ(θ**), we see that the graph of F envelopes the graphs of f with x fixed at φ(θ).
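As a sanity check on Theorem 3.7, the sketch below reuses the illustrative objective f(x, θ) = θx − x⁴ from the earlier sketch (again, not from the notes) and compares a finite-difference estimate of DF(θ) with the partial derivative of f with respect to θ holding x fixed at the maximizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Same toy objective as before (illustrative, not from the notes).
def f(x, theta):
    return theta * x - x**4

def solve(theta):
    res = minimize_scalar(lambda x: -f(x, theta), bounds=(-5, 5), method="bounded")
    return res.x, -res.fun          # maximizer phi(theta) and maximized value F(theta)

theta, h = 2.0, 1e-5
x_star, _ = solve(theta)

dF_fd = (solve(theta + h)[1] - solve(theta - h)[1]) / (2 * h)   # numerical DF(theta)
dF_envelope = x_star   # envelope theorem: d f(x, theta)/d theta = x, evaluated at x = phi(theta)

print(dF_fd, dF_envelope)   # should agree closely
```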
4 Pareto Optimality
A set N = {1, 2, ..., n} of individuals must choose from a nonempty set A of alternatives. Assume each individual i's preferences over alternatives are represented by a utility function u_i: A → R. One alternative y Pareto dominates another alternative x if u_i(y) ≥ u_i(x) for all i, with strict inequality for at least one individual. An alternative is Pareto optimal if there is no alternative that Pareto dominates it.

Consider the case of two individuals, X = R², and quadratic utility, i.e., u_i(x) = −||x − x̂_i||², and an alternative x, as in Figure 10. It is clear that any alternative in the shaded lens is strictly preferred to x by both individuals, which implies that x is Pareto dominated and, therefore, not Pareto optimal. In fact, this will be true whenever the individuals' indifference curves through an alternative create a lens shape like this. The only way that the individuals' indifference curves won't create such a lens is if they meet at a tangency at the alternative x, and this happens only when x lies directly between the two individuals' ideal points. We conclude that, when there are just two individuals and both have Euclidean preferences, the set of Pareto optimal alternatives is the line connecting the two ideal points. See Figure 11 for elliptical indifference curves, in which case the set of Pareto optimal alternatives is a curve connecting the two ideal points. This motivates the standard terminology: when there are just two individuals, we refer to the set of Pareto optimal alternatives as the contract curve.

Figure 10: Pareto optimals with Euclidean preferences

Figure 11: More Pareto optimals
4.1 Existence of Pareto Optimals
It is straightforward to provide a sufficient condition for Pareto optimality of an alternative in terms of social welfare maximization with weights λ_1, ..., λ_n on the utilities of individuals.

Theorem 4.1 Let x ∈ A, and let λ_1, ..., λ_n > 0 be positive weights for each individual. If x solves

max_{y∈A} ∑_{i=1}^n λ_i u_i(y),

then x is Pareto optimal.

Proof Suppose x solves the above maximization problem but there is some alternative y that Pareto dominates it. Since u_i(y) ≥ u_i(x) for each i, each term λ_i u_i(y) is at least as great as λ_i u_i(x). And since there is some individual, say j, such that u_j(y) > u_j(x), and since λ_j > 0, there is at least one y-term that is strictly greater than the corresponding x-term. But then

∑_{i=1}^n λ_i u_i(y) > ∑_{i=1}^n λ_i u_i(x),

a contradiction.
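For a concrete feel for Theorem 4.1, the sketch below discretizes the interval example used later in this section (utilities u_1(x) = −x², u_2(x) = −(1 − x)²; the weights are my own illustrative choice) and verifies on the grid that the weighted welfare maximizer is not Pareto dominated.

```python
import numpy as np

# Finite approximation of A = [0, 1] with quadratic utilities; weights are illustrative.
A = np.linspace(0.0, 1.0, 1001)
u = np.column_stack([-A**2, -(1.0 - A)**2])   # one row of utilities per alternative
lam = np.array([0.7, 0.3])                    # strictly positive weights

star = np.argmax(u @ lam)                     # index of the welfare maximizer

# Pareto domination test: weakly better for all individuals, strictly better for at least one
dominated = np.any(np.all(u >= u[star], axis=1) & np.any(u > u[star], axis=1))
print(A[star], dominated)                     # dominated should be False
```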
From the preceding sufficient condition, we can then deduce the existence of at least one Pareto optimal alternative very generally.

Corollary 4.2 Assume A ⊆ R^d is compact and each u_i is continuous. Then there exists a Pareto optimal alternative.

Proof Define the function f: A → R by f(x) = ∑_{i=1}^n u_i(x) for all x. Note that f is continuous, and so it achieves a maximum over the compact set A. Letting x* be a maximizer, this alternative is Pareto optimal by Theorem 4.1 (with all weights equal to one).

We have shown that if an alternative maximizes the sum of utilities for strictly positive weights, then it is Pareto optimal. The next result imposes more structure on the set of alternatives and individual utilities, namely convexity of the set of alternatives and strict quasi-concavity of utilities, and strengthens the result of Theorem 4.1 by weakening the sufficient condition to allow some weights to be zero.

Theorem 4.3 Assume A ⊆ R^d is convex and each u_i is strictly quasi-concave. If there exist weights λ_1, ..., λ_n ≥ 0 (not all zero) such that x solves

max_{y∈A} ∑_{i=1}^n λ_i u_i(y),

then it is Pareto optimal.

Proof Suppose x maximizes the weighted sum of utilities over A but is Pareto dominated by some alternative z ≠ x. In particular, u_i(z) ≥ u_i(x) for each i. Define w = (1/2)x + (1/2)z, and note that convexity of A implies w ∈ A. Furthermore, strict quasi-concavity implies u_i(w) > min{u_i(x), u_i(z)} = u_i(x) for all i. Since the weights λ_i are non-negative, we have λ_i u_i(w) ≥ λ_i u_i(x) for all i, and since λ_i > 0 for at least one individual, the latter inequality is strict for at least one individual. But then

∑_{i=1}^n λ_i u_i(w) > ∑_{i=1}^n λ_i u_i(x),

a contradiction. We conclude that x is Pareto optimal.
Our sufficient condition for Pareto optimality for general utilities, Theorem 4.1, relies on all coefficients λ_i being strictly positive, while Theorem 4.3 weakens this for strictly quasi-concave utilities to at least one positive λ_i. In general, we cannot state a sufficient condition that allows some coefficients to be zero, even if we replace strict quasi-concavity with concavity.

Example Let there be two individuals, A = [0, 1], u_1(x) = x, and u_2(x) = 0. These utilities are concave, and x = 0 maximizes λ_1 u_1(x) + λ_2 u_2(x) with weights λ_1 = 0 and λ_2 = 1, but it is obviously not Pareto optimal.

In the latter example, of course, the problem max_{x∈[0,1]} λ_1 u_1(x) + λ_2 u_2(x) (with λ_1 = 0 and λ_2 = 1) has multiple (in fact, an infinite number of) solutions. Next, we provide a different sort of sufficient condition, relying on uniqueness of solutions to the social welfare problem, for Pareto optimality.

Theorem 4.4 Assume that for weights λ_1, ..., λ_n ≥ 0 (not all zero), the problem

max_{y∈A} ∑_{i=1}^n λ_i u_i(y)

has a unique solution. If x solves the above maximization problem, then it is Pareto optimal.

The proof is trivial. Suppose that the conditions of the theorem hold and x solves the problem but is not Pareto optimal; then there is a distinct alternative y that provides each individual with utility no lower than at x, so y is another solution to the problem, a contradiction.
4.2 Characterization with Concavity
As yet, we have derived sufficient, but not necessary, conditions for Pareto optimality. To provide a more detailed characterization of the Pareto optimal alternatives under convexity and concavity conditions, we first define two vector dominance relations: given x, z ∈ R^n, we write x ≥ z if x_i ≥ z_i for each coordinate i = 1, ..., n, and we write x > z if x_i > z_i for each coordinate. Define the set of utility imputations as

U = { z ∈ R^n : there exists x ∈ A s.t. (u_1(x), ..., u_n(x)) ≥ z }.

Intuitively, given an alternative x, we may consider the vector (u_1(x), ..., u_n(x)) of utilities generated by x. Note that this vector lies in R^n, which has number of dimensions equal to the number of individuals. The set of utility imputations consists of all such utility vectors, as well as any vectors less than or equal to them. See Figure 12 for the n = 2 case.

Figure 12: Utility imputations
Example Suppose there are two individuals who must choose an alternative in the unit interval, A = [0, 1], with quadratic utilities: u_1(x) = −x² and u_2(x) = −(1 − x)². Consider the mapping u: A → R² defined by u(x) = (u_1(x), u_2(x)). We can describe the range of this mapping by a function f: [−1, 0] → R that specifies individual 2's utility as a function of individual 1's. In particular, if individual 1's utility is a, then the alternative is x = √(−a), and 2's utility is b = −(1 − √(−a))², so we define f(a) = −(1 − √(−a))². Then the set U of utility imputations consists of the graph of f along with all vectors to the southwest, depicted in Figure 13.

Figure 13: Example of utility imputations

The next lemma gives some useful technical properties of the set of utility imputations. In particular, assuming the set of alternatives is convex and utilities are concave, it establishes that the set U of imputations is convex.
Lemma 4.5 Assume A ⊆ R^d is convex and each u_i is concave. Then U is convex. Furthermore, if each u_i is strictly concave, then for all distinct x, y ∈ A and all α ∈ (0, 1), there exists z ∈ U such that

z > α(u_1(x), ..., u_n(x)) + (1 − α)(u_1(y), ..., u_n(y)).

Proof Take distinct z, z′ ∈ U, so there exist x, x′ ∈ A such that

(u_1(x), ..., u_n(x)) ≥ z and (u_1(x′), ..., u_n(x′)) ≥ z′.

Since A is convex, we have x″ = αx + (1 − α)x′ ∈ A. By concavity of u_i, we have

u_i(x″) ≥ αu_i(x) + (1 − α)u_i(x′) ≥ αz_i + (1 − α)z′_i

for all i ∈ N. Setting z″ = (u_1(x″), ..., u_n(x″)), we have z″ ≥ αz + (1 − α)z′, which implies αz + (1 − α)z′ ∈ U. See Figure 12. Therefore, U is convex. Now assume each u_i is strictly concave, and consider any distinct x, x′ ∈ A and any α ∈ (0, 1). Borrowing the above notation, strict concavity implies

u_i(x″) > αu_i(x) + (1 − α)u_i(x′),

which implies

z″ > α(u_1(x), ..., u_n(x)) + (1 − α)(u_1(x′), ..., u_n(x′)),

as required.
Next, assuming utilities are concave, we derive a necessary condition for Pareto optimality: if an alternative x* is Pareto optimal, then there is a vector of non-negative weights λ = (λ_1, ..., λ_n) (not all zero) such that x* maximizes the sum of individual utilities with those weights. Note that we do not claim that x* must maximize the sum of utilities with strictly positive weights.

Theorem 4.6 Assume A ⊆ R^d is convex and each u_i is concave. If x is Pareto optimal, then there exist weights λ_1, ..., λ_n ≥ 0 (not all zero) such that x solves

max_{y∈A} ∑_{i=1}^n λ_i u_i(y).
Proof Assume that x* is Pareto optimal, and define the set

V = { z ∈ R^n : z > (u_1(x*), ..., u_n(x*)) }

of vectors strictly greater than the utility vector (u_1(x*), ..., u_n(x*)) in each coordinate. For the remainder of the proof, let z* = (u_1(x*), ..., u_n(x*)) be the utility vector associated with x*. The set V is nonempty, convex, and open (and so has nonempty interior). The set U of imputations is nonempty and, by Lemma 4.5, convex. Note that U ∩ V = ∅, for suppose otherwise. Then there exists z ∈ U ∩ V, which implies the existence of x ∈ A such that

(u_1(x), ..., u_n(x)) ≥ z > z*.

But then we have xP_i x* for all i ∈ N, contradicting our assumption that x* is Pareto optimal. Therefore, by the separating hyperplane theorem, there is a linear function f that separates U and V. Let λ = (λ_1, ..., λ_n) ∈ R^n be the non-zero gradient of f. Then we may assume without loss of generality that for all z ∈ U and all w ∈ V, we have f(z) ≤ c ≤ f(w), i.e., λ·z ≤ c ≤ λ·w. We claim that λ·z* = c, and in particular that x* solves the maximization problem in the theorem. Since z* ∈ U, it follows immediately that f evaluated at this vector is less than or equal to c. Suppose it is strictly less, i.e., λ·z* < c. Given ε > 0, define w = z* + ε(1, 1, ..., 1), and note that w ∈ V, and therefore λ·w ≥ c. But for ε sufficiently small, we in fact have λ·w < c, a contradiction. That x* solves the maximization problem in the theorem then follows immediately: for all x ∈ A, we have (u_1(x), ..., u_n(x)) ∈ U, and then

λ·(u_1(x), ..., u_n(x)) ≤ c = λ·z*,

or equivalently,

∑_{i∈N} λ_i u_i(x) ≤ ∑_{i∈N} λ_i u_i(x*),

as claimed. Finally, we claim that λ ∈ R^n_+, i.e., λ_i ≥ 0 for all i ∈ N. To see this, suppose that λ_i < 0 for some i. Then we may define the vector w̄ = z* + βe_i, and for β > 0 high enough, we have

λ·w̄ = λ·z* + βλ_i < λ·z*.

For all ε > 0, we have w = w̄ + ε(1, 1, ..., 1) ∈ V, and therefore λ·w ≥ c. But we may choose ε > 0 sufficiently small that λ·w < λ·z* = c, a contradiction. Thus, λ ∈ R^n_+ \ {0}.
The proof of the previous result uses the separating hyperplane theorem and the following insight. We can think of the social welfare function above as merging two steps: first we apply individual utility functions to an alternative x to get a vector, say z = (z_1, ..., z_n), of individual utilities, and then we take the dot product λ·z to get the social welfare from x. Of course, dot products are equivalent to linear functions, so we can view the second step as applying a linear function f: R^n → R to the vector of utilities. Geometrically, when n = 2, we can draw the level sets of the linear function, and if the vector of utilities from x*, denoted (u_1(x*), ..., u_n(x*)), maximizes the linear function over the set U of utility imputations, then x* maximizes social welfare with weights λ. See Figure 12.

Theorem 4.6, with Theorem 4.3, provides a complete characterization of Pareto optimality (under appropriate convexity/concavity conditions) in terms of optimization theory. The "only if" direction follows directly from Theorem 4.6; the "if" direction follows from Theorem 4.3 because strict concavity implies strict quasi-concavity.

Corollary 4.7 Assume A ⊆ R^d is convex and each u_i is strictly concave. Then x is Pareto optimal if and only if there exist weights λ_1, ..., λ_n ≥ 0 (not all zero) such that x solves

max_{y∈A} ∑_{i=1}^n λ_i u_i(y).

The condition that the weights are non-negative but not all zero cannot be strengthened to the condition that they are all strictly positive in the necessary condition of Theorem 4.6 and Corollary 4.7.
Example Return to the above example with two individuals, A = [0, 1], and quadratic utilities: u_1(x) = −x² and u_2(x) = −(1 − x)². Then x = 1 is Pareto optimal, yet there do not exist strictly positive weights λ_1, λ_2 > 0 such that x maximizes λ_1 u_1(y) + λ_2 u_2(y). See Figure 14. Given any strictly positive weights λ_1 and λ_2, the level set through (u_1(1), u_2(1)) = (−1, 0) of the linear function with gradient (λ_1, λ_2) cuts through the set of utility imputations; thus, (u_1(1), u_2(1)) does not maximize the linear function over the set of imputations.

Figure 14: No strictly positive weights
The previous corollary uses the assumption of strict concavity to provide a full characterization of Pareto optimality. It is simple to deduce a more general conclusion that relies instead on the uniqueness condition of Theorem 4.4.

Corollary 4.8 Assume A ⊆ R^d is convex and each u_i is concave. Furthermore, assume that for all weights λ_1, ..., λ_n ≥ 0 (not all zero), the problem

max_{y∈A} ∑_{i=1}^n λ_i u_i(y)

has a unique solution. Then x is Pareto optimal if and only if there exist weights λ_1, ..., λ_n ≥ 0 (not all zero) such that x solves the above maximization problem.
One direction follows immediately from Theorem 4.6. Under the conditions of the
corollary, suppose x solves the maximization problem for some non-negative weights
(not all zero). Then Theorem 4.4 implies x is Pareto optimal, as required.
The necessary condition for Pareto optimality established in Theorem 4.6 immediately implies, via the first order necessary condition, that if x ∈ intA is Pareto optimal, then

∑_{i=1}^n λ_i Du_i(x) = 0

for some non-negative weights (not all zero). Letting μ_i = λ_i / ∑_{j=1}^n λ_j, it follows that ∑_{i=1}^n μ_i = 1 and

∑_{i=1}^n μ_i Du_i(x) = 0,

or in other words, the zero vector belongs to the convex hull of the individuals' gradients at x, i.e., 0 ∈ conv{Du_1(x), ..., Du_n(x)}. Moreover, recall from Theorem 3.4 that the first order condition is essentially sufficient for a global maximizer when the objective function is concave. Assuming each u_i is C¹ and strictly concave, Corollary 4.7 then yields a characterization of Pareto optimality in terms of gradients.

Corollary 4.9 Assume A ⊆ R^d is convex and each u_i is C¹ and strictly concave. Then x ∈ intA is Pareto optimal if and only if there exist weights λ_1, ..., λ_n ≥ 0 (not all zero) such that

∑_{i=1}^n λ_i Du_i(x) = 0.
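Corollary 4.9 turns the Pareto test into a question about the convex hull of gradients, which is easy to check numerically. The sketch below does this for quadratic utilities u_i(x) = −||x − x̂_i||²; the ideal points and the feasibility-LP formulation are my own illustrative choices, not from the notes.

```python
import numpy as np
from scipy.optimize import linprog

# Quadratic utilities u_i(x) = -||x - xhat_i||^2 with illustrative ideal points.
ideal = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])

def gradients(x):
    return 2.0 * (ideal - x)               # Du_i(x) = 2(xhat_i - x), one row per individual

def zero_in_convex_hull(G):
    # feasibility LP: find mu >= 0 with sum(mu) = 1 and G' mu = 0
    n, d = G.shape
    A_eq = np.vstack([G.T, np.ones((1, n))])
    b_eq = np.concatenate([np.zeros(d), [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

print(zero_in_convex_hull(gradients(np.array([1.0, 0.5]))))   # inside conv of ideal points: True (Pareto optimal)
print(zero_in_convex_hull(gradients(np.array([5.0, 5.0]))))   # outside: False (not Pareto optimal)
```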
The preceding result has obvious implications when there are just two individuals. Let x* ∈ intA be Pareto optimal, so there exist non-negative weights (not all zero) such that for all coordinates j, k = 1, ..., d, we have

λ_1 D_j u_1(x*) + λ_2 D_j u_2(x*) = 0
λ_1 D_k u_1(x*) + λ_2 D_k u_2(x*) = 0.

Note that when D_k u_1(x*) ≠ 0 and D_k u_2(x*) ≠ 0, we have

D_j u_1(x*) / D_k u_1(x*) = D_j u_2(x*) / D_k u_2(x*).

That is, the marginal rates of substitution of k for j are equal for the two individuals, i.e., their indifference curves are tangent, as in Figures 10 and 11. And although the machinery we have developed thus far requires the utilities u_1 and u_2 in the preceding discussion to be concave, we will see that the analysis extends more generally.
Example Suppose A = R^d and each u_i is quadratic. Since quadratic utilities are strictly concave, it follows that x is Pareto optimal if and only if there exist weights λ_1, ..., λ_n ≥ 0 (not all zero) such that x solves

max_{y∈A} ∑_{i=1}^n λ_i u_i(y).

Furthermore, since each u_i is strictly concave, the function ∑_{i=1}^n λ_i u_i(x) is strictly concave, so x is a solution to the above maximization problem if and only if it solves the first order condition

0 = D ∑_{i=1}^n λ_i u_i(x) = ∑_{i=1}^n 2λ_i (x̂_i − x),

or

x = ∑_{i=1}^n ( λ_i / ∑_{j=1}^n λ_j ) x̂_i.

Finally, writing μ_i = λ_i / ∑_{j=1}^n λ_j, we have μ_i ≥ 0 for all i, ∑_{i=1}^n μ_i = 1, and

x = ∑_{i=1}^n μ_i x̂_i,

i.e., x is a convex combination of ideal points with weights μ_i. This gives us a characterization of all of the Pareto optimal alternatives: an alternative is Pareto optimal if and only if it is a convex combination of individual ideal points. That is, we connect the "exterior" ideal points to create an enclosed space, and the Pareto optimals consist of that line and the area within. See Figure 15.
Figure 15: Convex hull of ideal points
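The closed form in this example is easy to confirm numerically: the weighted welfare maximizer should be the λ-weighted convex combination of ideal points. A minimal sketch follows, with ideal points and weights of my own choosing.

```python
import numpy as np
from scipy.optimize import minimize

# Quadratic utilities u_i(x) = -||x - xhat_i||^2; ideal points and weights are illustrative.
ideal = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
lam = np.array([1.0, 2.0, 3.0])

# Closed form from the example: the welfare maximizer is the weighted convex combination of ideal points.
x_formula = (lam / lam.sum()) @ ideal

# Direct numerical maximization of the weighted sum of utilities for comparison.
def neg_welfare(x):
    return np.sum(lam * np.sum((ideal - x) ** 2, axis=1))

x_numeric = minimize(neg_welfare, x0=np.zeros(2)).x
print(x_formula, x_numeric)   # should coincide up to solver tolerance
```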
In the previous example, a vector of weights λ = (λ_1, ..., λ_n) determined a particular Pareto optimal alternative, namely, the convex combination of ideal points with weights μ_i. If we consider only vectors of strictly positive weights, we can view this as a mapping π: R^n_{++} → R^d defined by

π(λ) = ∑_{i=1}^n ( λ_i / ∑_{j=1}^n λ_j ) x̂_i.

Writing π(λ) = (π_1(λ), ..., π_d(λ)), each coordinate mapping π_k is clearly continuous and directionally differentiable, and it is linear in the normalized weights μ_i = λ_i / ∑_{j=1}^n λ_j.
For more general utilities, the mapping from weight vectors to Pareto optimal alternatives will not be linear; but viewing the weights as parameters, we can use Theorem 3.5 to provide conditions under which the mapping is well-defined and C¹. In the next result, we add the assumption that utility functions are C² and strengthen strict concavity by assuming that the second order sufficient condition holds at every interior alternative. To apply the theorem, we set Θ = R^n_{++}, X = A, and define f(x, λ) = ∑_{i=1}^n λ_i u_i(x). Consider an interior Pareto optimal alternative x* ∈ intA, and let x* solve max_{x∈A} ∑_{i=1}^n λ*_i u_i(x) with weights λ* = (λ*_1, ..., λ*_n). Then the first order condition holds, i.e., Df(x*|λ*) = ∑_{i=1}^n λ*_i Du_i(x*) = 0, and we have

D²_t f(x*|λ*) = ∑_{j=1}^d ∑_{k=1}^d t_j t_k D_{j,k} f(x*|λ*) = ∑_{j=1}^d ∑_{k=1}^d t_j t_k D_{j,k} ( ∑_{i=1}^n λ*_i u_i(x*) )
= ∑_{i=1}^n λ*_i ∑_{j=1}^d ∑_{k=1}^d t_j t_k D_{j,k} u_i(x*) = ∑_{i=1}^n λ*_i D²_t u_i(x*) < 0

for every direction t. Then Theorems 3.5 and 3.7 yield the following.

Theorem 4.10 Assume that A ⊆ R^d is convex and that each u_i is C² and that for all x ∈ intA and every direction t, we have D²_t u_i(x) < 0. Then there are C¹ mappings π_k: R^n_{++} → R, k = 1, ..., d, such that for all λ ∈ R^n_{++}, π(λ) = (π_1(λ), ..., π_d(λ)) is Pareto optimal and is the unique solution to

max_{x∈A} ∑_{i=1}^n λ_i u_i(x).

Moreover, the mapping F: R^n_{++} → R defined by

F(λ) = ∑_{i=1}^n λ_i u_i(π(λ))

is C¹ and

DF(λ) = (u_1(π(λ)), ..., u_n(π(λ))).
We rely only on ordinal information contained in utility representations, and any utility representation u_i is equivalent, for our purposes, to an infinite number of others resulting from monotonic transformations of u_i. This may seem to run counter to the result just described: if x maximizes social welfare with weights (λ_1, ..., λ_n) for one specification of utility representations, u_1, ..., u_n, then there is no cause to think it will maximize social welfare with those weights for a different specification, say 5u_1, u_2³, ln(u_3), .... Indeed, it may not. But if we take monotonic transformations of the original utility functions, x will still be Pareto optimal, and there will still exist weights, say (λ′_1, ..., λ′_n), for which x maximizes social welfare. In short, Theorem 4.6 says that a Pareto optimal alternative will maximize social welfare for suitably chosen weights, but those weights may depend on the precise specification of utility functions.
4.3 Characterization without Concavity
When utilities are differentiable, we can sharpen the characterization of the previous subsection. We first note that at an interior Pareto optimal alternative, the gradients of the individuals are linearly dependent.

Theorem 4.11 Assume A ⊆ R^d, and let x ∈ intA be Pareto optimal. Assume each u_i is differentiable at x. Then there exist λ_1, ..., λ_n ≥ 0 (not all zero) such that ∑_{i=1}^n λ_i Du_i(x) = 0.

Proof If there do not exist such weights, then 0 ∉ conv{Du_1(x), ..., Du_n(x)}. Then by the separating hyperplane theorem, there is a non-zero vector p ∈ R^d such that p·Du_1(x) > 0, ..., p·Du_n(x) > 0. Then there exists ε > 0 such that x + εp ∈ A and u_i(x + εp) > u_i(x) for all i, contradicting Pareto optimality of x.
We can take a geometric perspective by defining the mapping u: A → R^n from alternatives to vectors of utilities, i.e., u(x) = (u_1(x), ..., u_n(x)). Then the derivative of u at x, suitably extended to vector-valued functions, is the matrix

Du(x) =
[ ∂u_1/∂x_1(x)  ...  ∂u_1/∂x_d(x) ]
[      ...               ...       ]
[ ∂u_n/∂x_1(x)  ...  ∂u_n/∂x_d(x) ]

The span of the columns is a linear subspace of R^n called the tangent space of u at x. Theorem 4.11 implies that at a Pareto optimal alternative, the rank of this derivative is n − 1 or less. By Pareto optimality, u(x) belongs to the boundary of u(A). Furthermore, the theorem implies

(λ_1, ..., λ_n) Du(x) = 0,

so the tangent space has a normal vector (λ_1, ..., λ_n) with non-negative coordinates.

The weights in Theorem 4.11 cannot be unique: if weights (λ_1, ..., λ_n) fulfill the theorem, then any positive scaling of the weights does as well. But when the derivative Du(x) has rank n − 1, the weights are unique up to a positive scalar. Indeed, when the derivative has rank n − 1, the tangent space at u(x) is a hyperplane of dimension n − 1, e.g., it is a tangent line when n = 2 and a tangent plane when n = 3. See Figure 16 for the three-individual case. Then the normal space is one-dimensional, and the uniqueness claim follows.
Theorem 4.12 Assume A ⊆ R^d, and let x ∈ intA be Pareto optimal. Assume each u_i is differentiable at x and that Du(x) has rank n − 1. Then there exist λ_1, ..., λ_n ≥ 0 (not all zero) such that ∑_{i=1}^n λ_i Du_i(x) = 0, and these weights are unique up to a positive scaling.

The rank condition used in the previous result, while reasonable in some contexts, is restrictive; it implies, for example, that the set of alternatives has dimension at least n − 1. Note that the condition that the weights are non-negative and not all zero implies that the tangent line at u(x) is downward sloping when n = 2, and it formalizes the idea that the boundary of u(A) at u(x) is downward sloping for any number of individuals.
Figure 16: Unique weights
5 Single Equality Constraint
We now consider maximization subject to a single equality constraint. That is, letting f: X → R and g: R^n → R, we consider

max_{x∈X} f(x)
s.t. g(x) = c.

Following the analysis of unconstrained optimization, we derive necessary first order conditions, then consider second order conditions and the implications of convex structure for the optimization problem.
5.1 First Order Analysis
As in the case of unconstrained optimization, we can derive restrictions on constrained local maximizers in terms of directional derivatives. In contrast to the unconstrained case, however, we can only consider moves in a small range of directions: only in directions t that are tangent to the level set of g at c, or equivalently, orthogonal to the gradient of the constraint function. In truth, we can only move along the level set of g at c; moving from a constrained local maximizer in such a direction t can violate the constraint if g is non-linear, a fact that is the source of some complexity of the analysis. But we can almost move in the directions orthogonal to the gradient of the constraint function, and that is enough for our purposes.
Figure 17: Constrained local maximizer

Figure 18: Not a constrained local maximizer
In problems with equality constraints, constrained local maximizers look something like x in Figure 17. Note that such a vector need not be a constrained global maximizer: here, f takes a strictly higher value at y, which also satisfies the constraint. Note that the level sets of f and g are tangent at x. In other words, their gradients are collinear (maybe pointing in opposite directions). When that's not the case, we can find a point like y in Figure 18 with g(y) = c and f(y) > f(x). Moreover, we can find such vectors arbitrarily close to x, so x can't be a local maximizer.

The next result is the analogue of Theorem 3.1, providing a necessary first order condition for problems with a single equality constraint.

Theorem 5.1 (Lagrange) Let X ⊆ R^n, x ∈ intX, f: X → R, and g: R^n → R. Assume f and g are C¹. Assume Dg(x) ≠ 0. If x is a constrained local maximizer of f subject to g(x) = c, then there is a unique multiplier λ ∈ R such that

Df(x) = λDg(x).  (1)

Figure 19: Proof of Lagrange
Proof I provide a heuristic argument for the case of two variables. The idea is to transform the constrained problem into an unconstrained one. The theorem assumes that Dg(x) ≠ 0, and (only to simplify notation) we will assume x = 0 and D_2 g(x) ≠ 0. The implicit function theorem implies that in an open interval I around x_1 = 0, we may then view the level set of g at c as the graph of a function γ: I → R such that for all z ∈ I, g(z, γ(z)) = c. See Figure 19. Note that 0 = x = (0, γ(0)). Furthermore, γ is C¹ with derivative

Dγ(z) = − D_1 g(z, γ(z)) / D_2 g(z, γ(z)).  (2)

Because x is interior to X, we can choose the interval I small enough that each (z, γ(z)) belongs to the domain X of the objective function. Then z = 0 is a local maximizer of the unconstrained problem

max_{z∈I} f(z, γ(z)),

and we know the first order condition holds, i.e., differentiating with respect to z and using the chain rule, we have

D_1 f(0) + D_2 f(0)Dγ(0) = 0,

which implies

D_1 f(0) − D_2 f(0) · D_1 g(0)/D_2 g(0) = 0.

Defining λ = D_2 f(0)/D_2 g(0), we have Df(0) = λDg(0), as desired.
The number λ is the Lagrange multiplier corresponding to the constraint. The condition Dg(x) ≠ 0 is called the constraint qualification. Without it, the result would not be true.

Example Consider X = R, f(x) = (x + 1)², and g(x) = x². Consider the problem of maximizing f subject to g(x) = 0. The maximizer is clearly x = 0. But Dg(0) = 0 and Df(0) = 2, so there can be no λ such that Df(0) = λDg(0).

Note the following implication of Lagrange's theorem: at a constrained local maximizer x with multiplier λ ≠ 0, and given a coordinate j with ∂g/∂x_j(x) ≠ 0, we have

(∂f/∂x_i)(x) / (∂f/∂x_j)(x) = (∂g/∂x_i)(x) / (∂g/∂x_j)(x)

for all i. The left-hand side is the marginal rate of substitution telling us the value of x_i in terms of x_j. The right-hand side tells us the cost of x_i in terms of x_j. Lagrange tells us that, at an interior local maximizer, those have to be the same.

There is an easy way to remember the conditions in Lagrange's theorem: if x is an interior constrained local maximizer of f subject to g(x) = c, and if Dg(x) ≠ 0, then there exists λ ∈ R such that (x, λ) is a critical point of the function L: X × R → R defined by

L(x, λ) = f(x) + λ(c − g(x)).

That is, there exists λ ∈ R such that

∂L/∂x_1(x, λ) = ∂f/∂x_1(x) − λ ∂g/∂x_1(x) = 0
...
∂L/∂x_n(x, λ) = ∂f/∂x_n(x) − λ ∂g/∂x_n(x) = 0
∂L/∂λ(x, λ) = c − g(x) = 0,

which is equivalent to the first order condition (1). The function L is called the Lagrangian function.

Of course, the first order condition from Lagrange's theorem can be written in terms of partial derivatives:

∂f/∂x_1(x_1, ..., x_n) = λ ∂g/∂x_1(x_1, ..., x_n)
...
∂f/∂x_n(x_1, ..., x_n) = λ ∂g/∂x_n(x_1, ..., x_n).

The theorem gives us n + 1 equations (including the constraint) in n + 1 unknowns (including λ), and if we can solve for all of the solutions of this system, then we have an upper bound on the interior constrained local maximizers. Thus, we have converted the optimization problem into that of solving a system of n + 1 equations in n + 1 unknowns.
Example Let X = R²_+ and f(x_1, x_2) = x_1 x_2 − 2x_1^4 − x_2², but now consider the maximization problem subject to the constraint 6x_1 + 3x_2 = 12. Note that the constraint qualification holds. The first order condition is:

x_2 − 8x_1³ = 6λ
x_1 − 2x_2 = 3λ
6x_1 + 3x_2 = 12.

From the last equation, we have x_2 = 4 − 2x_1. Substituting the latter into the second equation, we have λ = (5x_1 − 8)/3. Substituting this into the first equation, we have 8x_1³ + 12x_1 = 20. This has unique solution x_1 = 1, and then x_2 = 2 and λ = −1. Thus, the only possible interior constrained local maximizer is (x_1, x_2) = (1, 2). The only elements on the boundary of the domain satisfying the constraint are (2, 0) and (0, 4), and f(2, 0) = −32 < f(0, 4) = −16 < f(1, 2) = −4, so there are no constrained maximizers on the boundary of the domain. Because X ∩ C is compact, we know f has at least one constrained maximizer, and we conclude that the unique constrained maximizer is (1, 2).
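The three first order equations above can also be handed to a numerical root finder. A minimal sketch (the use of fsolve is my own illustrative choice) recovers the solution x_1 = 1, x_2 = 2, λ = −1.

```python
from scipy.optimize import fsolve

# First order system for f(x1, x2) = x1*x2 - 2*x1**4 - x2**2 subject to 6*x1 + 3*x2 = 12.
def foc(v):
    x1, x2, lam = v
    return [x2 - 8*x1**3 - 6*lam,   # df/dx1 - lam*dg/dx1 = 0
            x1 - 2*x2 - 3*lam,      # df/dx2 - lam*dg/dx2 = 0
            6*x1 + 3*x2 - 12]       # the constraint

x1, x2, lam = fsolve(foc, [1.0, 1.0, 0.0])
print(x1, x2, lam)                  # expected: 1.0, 2.0, -1.0
```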
Recall that Lagrange's theorem only gives us a necessary, not a sufficient, condition for a constrained local maximizer. The next example shows that the first order condition is not generally sufficient. In fact, it shows more than that: even if the objective function is strictly quasi-concave, the constraint is linear, and the first order condition holds at an element x of the domain, x need not be a local maximizer.

Example Consider X = R², f(x_1, x_2) = (x_2 − x_1²)³, g(x_1, x_2) = x_1, and

max_{x∈R²} f(x_1, x_2)
s.t. g(x_1, x_2) = 0.

To see that the objective function is strictly quasi-concave, note that (x_1, x_2) belongs to the level set at c if and only if x_2 = x_1² + c^{1/3}, so level sets contain no straight lines and the upper contour sets of f are convex. Note that x* = 0 satisfies the constraint g(x*) = 0, and the constraint qualification is also satisfied. Furthermore, the first order condition from Lagrange's theorem is satisfied at x*. Indeed, Df(x) = (−6x_1(x_2 − x_1²)², 3(x_2 − x_1²)²) and Dg(x) = (1, 0). Thus, the equality Df(x*) = λDg(x*) is obtained by setting λ = 0, as in Lagrange's theorem. But x* is not a constrained local maximizer: for arbitrarily small ε > 0, (0, ε) satisfies g(0, ε) = 0 and f(0, ε) = ε³ > 0 = f(x*).

Though it's not quite technically correct, it's as though we've converted a constrained maximization problem into an unconstrained one: maximizing the Lagrangian L(x, λ) with respect to x. Imagine allowing x's that violate the constraint; for example, suppose, at a constrained maximizer x*, that we could increase the value of f by moving from x* to a nearby point x with g(x) > c. Since this x violates the constraint, we don't want this to be profitable, so the Lagrangian has to impose a cost of doing so in the amount λ(g(x) − c) (here, λ has to be positive). Then λ is like a price of violating the constraint imposed by the Lagrangian. The reason why this is not technically correct is that given the multiplier λ, a constrained local maximizer need not be a local maximizer of L(·, λ).

Example Consider X = R, f(x) = (x − 1)³ + x, g(x) = x, and

max_{x∈R} f(x)
s.t. g(x) = 1.

The unique solution to the constraint, and therefore to the maximization problem, is x = 1. Note that Df(x) = 3(x − 1)² + 1 and Dg(x) = 1, and evaluating at the solution x = 1, we have Df(1) = 1 = Dg(1). Thus, the multiplier for this problem is λ = 1. The Lagrangian is

L(x, λ) = (x − 1)³ + x + λ(1 − x),

and evaluated at λ = 1, this becomes

L(x, 1) = (x − 1)³ + 1.

But note that this function is strictly increasing at x = 1, i.e., for arbitrarily small ε > 0, we have L(1 + ε, 1) > L(1, 1), so x = 1 is not a local maximizer of L(·, 1).
5.2 Convex Structure
We have seen that the necessary first order condition is not sufficient for a local maximizer, even if the objective function is strictly quasi-concave and the constraint is linear: when f(x_1, x_2) = (x_2 − x_1²)³ and the constraint is x_1 = 0, the first order condition is satisfied at x* = 0, but it is not a constrained local maximizer. Two features of this example are noteworthy: the objective function is not concave, and the gradient of the objective function is zero at x* = 0. It turns out that ruling out either of those features (assuming the objective is concave, or assuming a quasi-concave objective with non-zero gradient) makes the first order condition sufficient for not only a constrained local maximizer, but a constrained global maximizer. The next theorem actually follows from a more general result, Theorem 7.2, for inequality constrained maximization, so I defer the proof until then. Note that the constraint qualification is not assumed.

Theorem 5.2 Let X ⊆ R^n be convex, let x ∈ intX, let f: X → R be C¹, and let g: R^n → R be linear. Assume x satisfies the constraint g(x) = c and the first order condition (1) with multiplier λ. Then x is a constrained global maximizer of f subject to g(x) = c provided either of two conditions holds:

1. f is quasi-concave and Df(x) ≠ 0, or
2. f is concave.

If the goal is to establish sufficient conditions for a constrained global maximizer, then an alternative to assuming a quasi-concave objective with non-zero gradient is to strengthen the first order condition to the assumption that x is a local maximizer. The next example shows that this approach does not work.

Example Consider X = R², define f: R² → R by

f(x_1, x_2) = x_2² if x_2 > 0, and f(x_1, x_2) = 0 if x_2 ≤ 0,

and let g(x_1, x_2) = x_1. Note that f is indeed C¹. For the problem

max_{x∈R²} f(x_1, x_2)
s.t. g(x_1, x_2) = 0,

the vector (0, −1) is a constrained local maximizer, but it is not a constrained global maximizer.

The next result strengthens these assumptions even further, deducing an even stronger conclusion: if f is quasi-concave and x is a constrained strict local maximizer, then x is the unique constrained global maximizer.

Theorem 5.3 Let X ⊆ R^n be convex, let f: X → R be quasi-concave, and let g: R^n → R be linear. If x ∈ X is a constrained strict local maximizer of f subject to g(x) = c, then it is the unique constrained global maximizer.

Proof Assume x ∈ X is a constrained strict local maximizer, and suppose there exists y ∈ X with y ≠ x such that g(y) = c and f(y) ≥ f(x). Let ε > 0 be such that for all z ∈ X ∩ C ∩ B_ε(x) with z ≠ x, we have f(x) > f(z). Given any α with 0 < α < 1, define z(α) = αy + (1 − α)x. Then quasi-concavity implies f(z(α)) ≥ min{f(x), f(y)} = f(x). Furthermore, with g(x) = g(y) = c, linearity of g implies g(z(α)) = c. But for small enough α > 0, we have z(α) ∈ X ∩ C ∩ B_ε(x) and f(z(α)) ≥ f(x), a contradiction.
Of course, if f is strictly quasi-concave and x is a constrained local maximizer, then
it is a constrained strict local maximizer, and the theorem can be applied.
5.3 Consumer Example
A consumer purchases a bundle (x_1, x_2) to maximize utility. His income is I > 0 and prices are p_1 > 0 and p_2 > 0. His utility function is u: R²_+ → R. We assume u is differentiable and monotonic in the following sense: for all (x_1, x_2) and (y_1, y_2) with x_1 ≥ y_1 and x_2 ≥ y_2, at least one inequality strict, we have u(x_1, x_2) > u(y_1, y_2). The consumer's problem is:

max_{(x_1,x_2)∈R²_+} u(x_1, x_2)
s.t. p_1 x_1 + p_2 x_2 = I.

Note that we impose the constraint that the consumer must spend all of his income; since we assume monotonicity, this is without loss of generality. The set X ∩ C = R²_+ ∩ {(x_1, x_2) | p_1 x_1 + p_2 x_2 = I} is compact (since p_1, p_2 > 0), and u is continuous, so the maximization problem has a solution. We can apply Lagrange's theorem with

f(x_1, x_2) = u(x_1, x_2)
g(x_1, x_2) = p_1 x_1 + p_2 x_2
c = I

to find all the constrained local maximizers (x_1, x_2) interior to R²_+ (i.e., x_1, x_2 > 0) satisfying Dg(x_1, x_2) ≠ 0. In fact, for all (x_1, x_2) ∈ R²_+,

Dg(x_1, x_2) = (p_1, p_2) ≠ 0,

so the constraint qualification is always met. Letting (x_1, x_2) be an interior constrained local maximizer, there exists λ ∈ R such that (x_1, x_2, λ) is a critical point of the Lagrangian:

L(x_1, x_2, λ) = u(x_1, x_2) + λ(I − p_1 x_1 − p_2 x_2).
That is,

∂L/∂x_1(x_1, x_2, λ) = ∂u/∂x_1(x_1, x_2) − λp_1 = 0
∂L/∂x_2(x_1, x_2, λ) = ∂u/∂x_2(x_1, x_2) − λp_2 = 0
∂L/∂λ(x_1, x_2, λ) = I − p_1 x_1 − p_2 x_2 = 0.

Solving these equations gives us the critical points of the Lagrangian, and if a maximizer (x*_1, x*_2) is interior to R²_+ (x*_1, x*_2 > 0), then it will be one of these critical points. Note that

(∂u/∂x_1)(x*_1, x*_2) / (∂u/∂x_2)(x*_1, x*_2) = p_1 / p_2,

i.e., the relative value of x_1 in terms of x_2 equals the relative price. Consider the Cobb-Douglas special case u(x_1, x_2) = x_1^α x_2^β, where α, β > 0. It's clear that every maximizer must be interior to R²_+. (Right?) The critical points of the Lagrangian satisfy

α x_1^{α−1} x_2^β − λ p_1 = 0
β x_1^α x_2^{β−1} − λ p_2 = 0.

Divide to get (α/β)(x_2/x_1) = p_1/p_2, or x_2 = (β/α)(p_1/p_2) x_1. Plug into p_1 x_1 + p_2 x_2 = I to get

p_1 x_1 + p_2 (β/α)(p_1/p_2) x_1 = I,

so the unique critical point of the Lagrangian is

x_1 = [α/(α + β)] (I/p_1)  and  x_2 = [β/(α + β)] (I/p_2).
Since this critical point is unique, it is the unique maximizer, and we call

x_1(p_1, p_2, I) = [α/(α + β)] (I/p_1)
x_2(p_1, p_2, I) = [β/(α + β)] (I/p_2)

demand functions. They tell us the consumer's consumption for different prices and incomes. Fixing p_2 and I, we can graph x_1 as a function of p_1, which gives us the demand curve for good 1. We can also solve for λ by substituting into α x_1^{α−1} x_2^β = λ p_1. This gives us

λ = (α/p_1) [ (α/(α + β)) (I/p_1) ]^{α−1} [ (β/(α + β)) (I/p_2) ]^β = (α/p_1)^α (β/p_2)^β [ I/(α + β) ]^{α+β−1}.

If α + β = 1, then the last term drops out. Note that we can always take a strictly increasing transformation of Cobb-Douglas utilities to obtain α + β = 1 without altering the consumer's demand functions, but such a transformation can affect the Lagrange multiplier.
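The closed-form demands can be cross-checked against a generic constrained optimizer. The sketch below, with parameter values chosen only for illustration, maximizes Cobb-Douglas utility on the budget line with scipy and compares the result to the formulas above.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative parameter values (not from the notes).
alpha, beta = 0.4, 0.6
p1, p2, I = 2.0, 3.0, 10.0

def neg_u(x):
    return -(x[0]**alpha * x[1]**beta)

budget = {"type": "eq", "fun": lambda x: I - p1*x[0] - p2*x[1]}
res = minimize(neg_u, x0=[1.0, 1.0], constraints=[budget],
               bounds=[(1e-6, None), (1e-6, None)])

x1_formula = alpha/(alpha + beta) * I/p1
x2_formula = beta/(alpha + beta) * I/p2
print(res.x, (x1_formula, x2_formula))   # both should be roughly (2.0, 2.0)
```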
5.4 Second Order Analysis
Second order necessary and sufficient conditions are more complicated than they were in unconstrained optimization. As in the first order analysis, a condition on second order directional derivatives needs to be satisfied at an interior constrained local maximizer, but only in directions t that are tangent to the level set of g at c, i.e., orthogonal to the gradient of g. Once again, we must deal with the complication that we can only move along the level set of g at c; moving from a constrained local maximizer in such a direction t can violate the constraint if g is non-linear. Once again, the insight is to convert the constrained optimization problem into an unconstrained one using the implicit function theorem.
Theorem 5.4 Let X ⊆ R^n, let x ∈ intX, and let f: X → R and g: R^n → R be C². Assume Dg(x) ≠ 0. Assume x is a constrained local maximizer of f subject to g(x) = c and satisfies the first order condition (1) with multiplier λ. Then

t′[D²f(x) − λD²g(x)]t ≤ 0  (3)

for all directions t with Dg(x)·t = 0.
Proof We give a heuristic proof for the two-variable case similar to that of Lagrange's theorem. As above, we have Dg(x) ≠ 0, and we assume for simplicity that x = 0 and D_2 g(x) ≠ 0. To further simplify matters, assume that the gradient of g at x points straight up, so D_1 g(x) = 0. (This just amounts to a rotation of axes that doesn't affect the second order analysis.) Again, we have an open interval I around x_1 = 0 and a C¹ function γ: I → R such that for all z ∈ I, we have g(z, γ(z)) = c. Because we assume Dg(0) = (0, D_2 g(0)), this means Dg(x)·t = 0 for exactly two directions, i.e., t = (1, 0) and t = (−1, 0). In either case, the necessary second order condition is

D²_1 f(0) ≤ λD²_1 g(0).

To obtain this, we note again that z = 0 is a local maximizer of the unconstrained problem

max_{z∈I} f(z, γ(z)).

That is, defining the function ψ: I → R by ψ(z) = f(z, γ(z)), z = 0 is a local maximizer of ψ. Thus, the first order necessary condition holds,

Dψ(0) = D_1 f(0) + D_2 f(0)Dγ(0) = 0,

where the derivative of γ has the form (2) given by the implicit function theorem. Furthermore, the necessary second order condition holds, i.e., D²ψ(0) ≤ 0. To expand this we use the chain rule to calculate the derivatives of ψ as follows:

Dψ(z) = D_1 f(z, γ(z)) + D_2 f(z, γ(z))Dγ(z)
D²ψ(z) = D²_1 f + D_{1,2}f Dγ + [D_{1,2}f + D_{2,2}f Dγ]Dγ + D_2 f D²γ,

where f is evaluated at (z, γ(z)) and γ at z. To unpack this further, we must expand the second derivative of γ using (2) and the chain rule:

D²γ(z) = − { D_2 g [D²_1 g + D_{1,2}g Dγ] − D_1 g [D_{1,2}g + D_{2,2}g Dγ] } / (D_2 g)²,

again evaluating g at (z, γ(z)) and γ at z. These derivatives simplify considerably at z = 0, because D_1 g(0) = 0 and (2) imply Dγ(0) = 0. Then

D²ψ(0) = D²_1 f(0) + D_2 f(0)D²γ(0) ≤ 0
D²γ(0) = − D_2 g(0)D²_1 g(0) / (D_2 g(0))² = − D²_1 g(0) / D_2 g(0).

Substituting the latter expression for D²γ(0) into the inequality D²ψ(0) ≤ 0, we have

D²_1 f(0) − D_2 f(0) · D²_1 g(0)/D_2 g(0) ≤ 0.

Finally, recall that the first order condition (1) implies D_2 f(0) = λD_2 g(0), so the preceding inequality becomes D²_1 f(0) ≤ λD²_1 g(0), as required.
We can write the necessary second order condition in (3) in terms of the Lagrangian. Recall the Lagrangian of an equality constrained maximization problem is defined as

L(x, λ) = f(x) + λ(c − g(x)).

Suppose the first order condition (1) holds at x with multiplier λ, i.e.,

DL(x|λ) = Df(x) − λDg(x) = 0,

and define the Hessian of the Lagrangian with respect to x as

D²_x L(x|λ) =
[ ∂²L/∂x_1²(x|λ)      ∂²L/∂x_1∂x_2(x|λ)   ...   ∂²L/∂x_1∂x_n(x|λ) ]
[ ∂²L/∂x_2∂x_1(x|λ)   ∂²L/∂x_2²(x|λ)      ...   ∂²L/∂x_2∂x_n(x|λ) ]
[        ...                  ...                       ...         ]
[ ∂²L/∂x_n∂x_1(x|λ)   ∂²L/∂x_n∂x_2(x|λ)   ...   ∂²L/∂x_n²(x|λ)    ]

Then the necessary second order condition is

t′ D²_x L(x|λ) t ≤ 0

for all t with Dg(x)·t = 0.

How do we check whether the Hessian satisfies these inequalities? We can form the bordered Hessian of the Lagrangian,

[ 0         Dg(x)       ]
[ Dg(x)′    D²_x L(x|λ) ],

and then check signs of the last n − 1 leading principal minors of the matrix. But this takes us beyond the scope of these notes. See Simon and Blume (1994) for a nice explanation.
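An alternative to the minor test, and closer to condition (3) itself, is to check the quadratic form t′D²_xL(x|λ)t directly on a basis of the directions with Dg(x)·t = 0. The sketch below does this for the earlier example at the candidate (1, 2) with λ = −1; the use of a null-space basis is my own illustrative device.

```python
import numpy as np
from scipy.linalg import null_space

# Example from above: f(x1, x2) = x1*x2 - 2*x1**4 - x2**2, g(x1, x2) = 6*x1 + 3*x2,
# candidate (1, 2) with multiplier lam = -1.
x1, x2, lam = 1.0, 2.0, -1.0

D2f = np.array([[-24.0 * x1**2, 1.0],
                [1.0, -2.0]])       # Hessian of f
D2g = np.zeros((2, 2))              # g is linear, so its Hessian vanishes
Dg = np.array([[6.0, 3.0]])         # gradient of g, as a 1 x 2 row

D2L = D2f - lam * D2g               # Hessian of the Lagrangian in x

T = null_space(Dg)                  # basis of directions t with Dg t = 0
for t in T.T:
    print(t @ D2L @ t)              # negative values confirm the second order sufficient condition
```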
Of course, the latter result provides only a necessary second order condition, not a sufficient one. Strengthening the condition to strict inequality, we have a stronger second order condition that is sufficient for a constrained strict local maximizer. Note that, in contrast to the analysis of necessary conditions, the next result does not rely on the constraint qualification.

Theorem 5.5 Let X ⊆ R^n, let x ∈ intX, and let f: X → R and g: R^n → R be C². Assume x satisfies the constraint g(x) = c and the first order condition (1) with multiplier λ. If

t′[D²f(x) − λD²g(x)]t < 0  (4)

for all directions t with Dg(x)·t = 0, then x is a constrained strict local maximizer of f subject to g(x) = c.

Again, we can write the sufficient second order condition in terms of the Lagrangian as t′ D²_x L(x|λ) t < 0 for all t with Dg(x)·t = 0.
In fact, with the first order condition (1) from Lagrange's theorem, the second order sufficient condition implies much more. It implies that x is locally isolated, i.e., there is an open set Y ⊆ R^n around x such that x is the unique constrained local maximizer belonging to Y. Furthermore, following the analysis of unconstrained optimization, we can consider the possibility that the objective function f and the constraint function g contain parameters, notationally suppressed until now, and we can study the effect of letting one parameter, say θ, vary. Of course, if x is a constrained local maximizer given parameter θ, and then the value of the parameter changes a small amount to θ′, then x may no longer be a constrained local maximizer, but assuming the second order sufficient condition, the new constrained local maximizer will be close to x, and its location will vary smoothly as we vary the parameter. Note that the constraint qualification is reinstated in the next result.

Theorem 5.6 Let Θ ⊆ R^m be open and X ⊆ R^n, and let f: X × Θ → R and g: R^n × Θ → R be C². Consider (x*, θ*) ∈ X × Θ with x* ∈ intX. Assume Dg(x*|θ*) ≠ 0. Given θ*, assume x* satisfies the constraint g(x*, θ*) = c and the first order condition, i.e.,

Df(x*|θ*) − λ*Dg(x*|θ*) = 0,

with multiplier λ*. If

t′[D²f(x*|θ*) − λ*D²g(x*|θ*)]t < 0

for all t with Dg(x*|θ*)·t = 0, then there are open sets Y ⊆ R^n with x* ∈ Y and Θ̂ ⊆ Θ with θ* ∈ Θ̂ and C¹ mappings φ_i: Θ̂ → R, i = 1, ..., n, and Λ: Θ̂ → R such that for all θ ∈ Θ̂, (i) φ(θ) = (φ_1(θ), ..., φ_n(θ)) is the unique maximizer of f(·, θ) subject to g(x, θ) = c belonging to Y, (ii) φ(θ) satisfies the first order condition (1) at θ with unique multiplier Λ(θ), and (iii) φ(θ) satisfies the second order sufficient condition (4) at θ with multiplier Λ(θ).

The preceding result lays the theoretical groundwork necessary for studying the effect of a parameter on the solution to a given optimization problem. This exercise is referred to as comparative statics. For example, under the conditions of the preceding theorem, we can take partial derivatives,

∂x_1/∂p_1, ∂x_1/∂p_2, ∂x_1/∂I, etc.,

that tell us how the consumer's maximizer changes with respect to market parameters.

Example Recall the solution for the Cobb-Douglas consumer's demands:

x_1(p_1, p_2, I) = [α/(α + β)] (I/p_1)  and  x_2(p_1, p_2, I) = [β/(α + β)] (I/p_2).

The conditions of Theorem 5.6 are satisfied here, and indeed we can directly compute partial derivatives of demand for good 1 as

∂x_1/∂p_1 (p_1, p_2, I) = − [α/(α + β)] (I/p_1²)
∂x_1/∂p_2 (p_1, p_2, I) = 0
∂x_1/∂I (p_1, p_2, I) = α / [p_1(α + β)].

Obviously, partial derivatives for good 2 are similar. Interesting features of Cobb-Douglas demands are that demand curves are downward-sloping and the demand for any good is invariant with respect to price changes in other goods. Indeed, the share of income spent on good 1 is always α/(α + β) and the share spent on good 2 is β/(α + β).
Given the preceding result and the mapping φ, which specifies a constrained local maximizer as a C¹ function of the parameter θ, the locally maximized value

F(θ) = f(φ(θ), θ)

is itself a C¹ mapping. The next result is an extension of the envelope theorem to equality constrained maximization problems; it provides a simple technique for performing comparative statics on this maximized value function: basically, we can take a simple partial derivative of the parameterized Lagrangian,

L(x, λ, θ) = f(x, θ) + λ(c − g(x, θ)),

with respect to θ, differentiating only through the θ argument. That is, although the location of the constrained local maximizer may indeed change when we vary θ, we can ignore that variation, treating the constrained local maximizer as fixed in taking the derivative.

Theorem 5.7 Let Θ ⊆ R^m be open and X ⊆ R^n, and let f: X × Θ → R and g: R^n × Θ → R be C¹. Consider (x*, θ*) ∈ X × Θ with x* ∈ intX. Let φ_i: Θ → R, i = 1, ..., n, and Λ: Θ → R be C¹ mappings such that x* = φ(θ*) and λ* = Λ(θ*) and such that for all θ ∈ Θ, φ(θ) is a constrained local maximizer satisfying the first order condition (1) at θ with multiplier Λ(θ). Define the mapping F: Θ → R by

F(θ) = f(φ(θ), θ)

for all θ ∈ Θ. Then F is C¹, and

DF(θ*) = DL(θ*|x*, λ*).
Proof Write the partial derivative of the Lagrangian with respect to θ_i as

D_i L(θ*|x*, λ*) = D_i f(θ*|x*) − λ* D_i g(θ*|x*).

Using the chain rule, we have

D_i F(θ*) = ∑_{j=1}^n D_j f(x*|θ*) D_i φ_j(θ*) + D_i f(θ*|x*),

so the result then follows if

∑_{j=1}^n D_j f(x*|θ*) D_i φ_j(θ*) = −λ* D_i g(θ*|x*).

To verify the latter equality, we write G(θ) = g(φ(θ), θ) and use the chain rule to conclude

D_i G(θ) = ∑_{j=1}^n D_j g(x|θ) D_i φ_j(θ) + D_i g(θ|x) = 0,

where derivatives of g are evaluated at x = φ(θ), and the second equality above follows since g(φ(θ), θ) takes a constant value of c on Θ, so its derivative is zero. Then

−λ* D_i g(θ*|x*) = λ* ∑_{j=1}^n D_j g(x*|θ*) D_i φ_j(θ*),

so the result follows if

∑_{j=1}^n [ D_j f(x*|θ*) − λ* D_j g(x*|θ*) ] D_i φ_j(θ*) = 0,

which follows from the first order condition (1).
The previous analysis looks less general than it is, and in fact, it provides an intuitive interpretation of the Lagrange multiplier. Although the parameter θ does not explicitly enter the value of the constraint, c, we can consider a simple linear specification, g(x, θ) = g(x) − θ, so the Lagrangian becomes

L(x, λ, θ) = f(x) + λ(c + θ − g(x)),

so by changing θ, we are effectively varying the value of the constraint, now c + θ. By Theorem 5.7, the rate of change of the locally maximized value of the objective function as we increase θ is

DF(θ*) = ∂L/∂θ(x*, λ*, θ*) = λ*.

That is, the value of the multiplier at a constrained local maximizer tells us the marginal effect of increasing the value of the constraint.
Example In the consumer's problem, given prices and income p_1, p_2, and I, let x_1(p_1, p_2, I) and x_2(p_1, p_2, I) be demands satisfying the first order condition and second order sufficient condition. Then the consumer's maximum utility is

U(p_1, p_2, I) = u(x_1(p_1, p_2, I), x_2(p_1, p_2, I)),

and the function U(·) is called the consumer's indirect utility function. How does this vary with respect to prices and income? Consider I. According to the envelope theorem, we take the partial derivative of the Lagrangian,

u(x*_1, x*_2) + λ*(I − p_1 x*_1 − p_2 x*_2),

with respect to I, where x*_1 and x*_2 are fixed at their maximized values and λ* is the associated multiplier. That's just λ*! Thus, we see that the Lagrange multiplier measures the rate at which the consumer's utility increases with her income, i.e., it is the marginal utility of money. How does the consumer's maximum utility vary with the price p_1? It is simply −λ*x*_1.
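The two envelope facts in this example (∂U/∂I = λ* and ∂U/∂p_1 = −λ*x*_1) can be checked numerically. The sketch below uses the Cobb-Douglas closed forms with parameter values chosen only for illustration.

```python
# Envelope-theorem check on the Cobb-Douglas indirect utility function (illustrative parameters).
alpha, beta = 0.5, 0.5

def demands(p1, p2, I):
    return alpha/(alpha + beta) * I/p1, beta/(alpha + beta) * I/p2

def U(p1, p2, I):
    x1, x2 = demands(p1, p2, I)
    return x1**alpha * x2**beta

p1, p2, I, h = 2.0, 4.0, 12.0, 1e-5
x1, x2 = demands(p1, p2, I)
lam = alpha/p1 * x1**(alpha - 1) * x2**beta     # multiplier from the first order condition

dU_dI = (U(p1, p2, I + h) - U(p1, p2, I - h)) / (2*h)
dU_dp1 = (U(p1 + h, p2, I) - U(p1 - h, p2, I)) / (2*h)

print(dU_dI, lam)            # marginal utility of income: the two should agree
print(dU_dp1, -lam * x1)     # effect of p1: the two should agree
```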
6 Multiple Equality Constraints
More generally, a maximization problem may be subject to several constraints. Let f: X → R and g_1: R^n → R, ..., g_m: R^n → R. Consider

max_{x∈X} f(x)
s.t. g_1(x) = c_1
     ...
     g_m(x) = c_m.

The results for problems with multiple equality constraints are very similar to the case with one constraint.
6.1 First Order Analysis
The first order condition from Theorem 5.1 carries over in much the same form.

Theorem 6.1 (Lagrange) Let X ⊆ R^n, let x ∈ intX, and let f: X → R, g_1: R^n → R, ..., g_m: R^n → R be C¹. Assume the gradients of the constraints, {Dg_1(x), ..., Dg_m(x)}, are linearly independent. If x is a constrained local maximizer of f subject to g_1(x) = c_1, ..., g_m(x) = c_m, then there are unique multipliers λ_1, λ_2, ..., λ_m ∈ R such that

Df(x) = ∑_{j=1}^m λ_j Dg_j(x).  (5)

Of course, the first order condition can be written in terms of partial derivatives:

∂f/∂x_i(x) = ∑_{j=1}^m λ_j ∂g_j/∂x_i(x)

for all i = 1, ..., n. Thus, it gives us n + m equations in n + m unknowns. If we can solve for all of the solutions of this system, then we have an upper bound on the interior constrained local maximizers. The numbers λ_1, ..., λ_m are the Lagrange multipliers corresponding to the constraints. The linear independence condition is the general constraint qualification.
The main difference when we move to multiple equality constraints is the form of the constraint qualification. Previously, we assumed only that Dg(x) ≠ 0, and now we assume that {Dg_1(x), ..., Dg_m(x)} is linearly independent, i.e., there do not exist scalars β_1, ..., β_m (not all zero) such that

β_1 Dg_1(x) + β_2 Dg_2(x) + ... + β_m Dg_m(x) = 0.

When there is one constraint (m = 1), the requirement is that Dg_1(x) ≠ 0 (the same as before). When there are multiple constraints, the linear independence condition means that no gradient Dg_i(x) can be written as a linear combination of the remaining ones, i.e., there do not exist scalars β_j for all j ≠ i such that

Dg_i(x) = ∑_{j≠i} β_j Dg_j(x).

When there are two constraints (m = 2), the requirement is that the two gradients, Dg_1(x) and Dg_2(x), are not collinear; when there are three constraints (m = 3), the gradients of the constraints do not lie on the same plane.

As for the case of a single equality constraint, the constraint qualification is needed for the result, for both existence and uniqueness of the multipliers.
To provide some geometric insight into the condition, consider the case of two constraints in Figure 20. Here, x is a constrained local maximizer of f (in fact, it is the unique element of the constraint set), but Df(x) cannot be written as a linear combination of Dg_1(x) and Dg_2(x), which are linearly dependent. In Figure 21, Df(x) can be written as a linear combination of Dg_1(x) and Dg_2(x), but now the coefficients on the gradients of the constraints are indeterminate.

Figure 20: Constraint qualification needed for existence

Figure 21: Constraint qualification needed for uniqueness
Put differently, Lagrange's theorem says that if x is an interior constrained local maximizer, there exist λ_1, ..., λ_m ∈ R such that (x, λ_1, ..., λ_m) is a critical point of the Lagrangian function L: X × R^m → R, now defined by

L(x, λ_1, ..., λ_m) = f(x) + ∑_{j=1}^m λ_j (c_j − g_j(x)).
The analysis of second order conditions and envelope theorems is very much the same as with a single equality constraint. Indeed, the interpretation of the Lagrange multipliers is the same: λ_j tells us the rate at which the maximized value of f changes if we increase c_j in the j-th constraint.
6.2 Convex Structure
Next, we note the implications of a quasi-concave objective and linear constraints. As for the case of a single equality constraint, the result follows from Theorem 7.2.

Theorem 6.2 Let X ⊆ R^n be convex, let x ∈ intX, let f: X → R be C¹, and let g_1: R^n → R, ..., g_m: R^n → R be linear. Assume x satisfies the constraints g_1(x) = c_1, ..., g_m(x) = c_m, and the first order condition (5) holds with multipliers λ_1, ..., λ_m. Then x is a constrained global maximizer of f subject to g_1(x) = c_1, ..., g_m(x) = c_m provided either of two conditions holds:

1. f is quasi-concave and Df(x) ≠ 0, or
2. f is concave.

With quasi-concavity, a constrained strict local maximizer is the unique constrained (global) maximizer.

Theorem 6.3 Let X ⊆ R^n be convex, let f: X → R be quasi-concave, and let g_1: R^n → R, ..., g_m: R^n → R be linear. If x ∈ X is a constrained strict local maximizer of f subject to g_1(x) = c_1, ..., g_m(x) = c_m, then it is the unique constrained global maximizer.
6.3 Consumers' Example

Consider the problem of a social planner in an exchange economy. There are n consumers and K commodities. The social endowment (the amount in existence) of good k is W_k. The planner has to decide on an allocation of the goods to consumers, where x^i = (x^i_1, x^i_2, ..., x^i_K) is the bundle for consumer i and x^1_k + x^2_k + ··· + x^n_k = W_k for each good k. Given C^1 utility functions u_1, u_2, ..., u_n representing the preferences of the consumers and non-negative weights α_1, ..., α_n (not all zero), the planner solves

    max_{x^i_k ∈ R_+, i=1,...,n, k=1,...,K}  Σ_{i=1}^n α_i u_i(x^i_1, x^i_2, ..., x^i_K)
    s.t.  Σ_{i=1}^n x^i_k = W_k,  k = 1, 2, ..., K.

This is a maximization problem subject to multiple equality constraints, one for each commodity. The Lagrangian for the problem is

    L(x^1, ..., x^n, λ) = Σ_{i=1}^n α_i u_i(x^i_1, ..., x^i_K) + Σ_{k=1}^K λ_k (W_k − Σ_{i=1}^n x^i_k).

By Lagrange's theorem, an interior constrained maximizer must satisfy ∂L/∂x^i_k = 0 and the constraints, i.e.,

    α_i ∂u_i/∂x^i_k (x^i_1, ..., x^i_K) = λ_k,  i = 1, ..., n, k = 1, ..., K
    Σ_{i=1}^n x^i_k = W_k,  k = 1, ..., K.

Three observations can be made. First, these conditions imply

    [∂u_i/∂x_k (x^i_1, ..., x^i_K)] / [∂u_i/∂x_ℓ (x^i_1, ..., x^i_K)] = λ_k / λ_ℓ.

That is, if we look at any i's marginal rate of substitution between any goods k and ℓ (measuring the value of k for i in terms of ℓ), it is λ_k/λ_ℓ. This is independent of i, so the marginal rates of substitution of the consumers are equal. Indeed, recall that λ_k is the rate at which maximized social welfare increases with an increase in the total amount of good k (and similarly for ℓ). So λ_k/λ_ℓ is the social value of good k in terms of good ℓ, i.e., an extra unit of good k is worth roughly λ_k/λ_ℓ units of good ℓ. The first order condition says that the planner must equate the individual values of the goods to the social value. Second, the first order conditions imply

    [α_i ∂u_i/∂x_k (x^i_1, ..., x^i_K)] / [α_j ∂u_j/∂x_k (x^j_1, ..., x^j_K)] = 1.

The lefthand side is the marginal rate of substitution between i's consumption of good k and j's consumption (measuring the value of i's consumption in terms of j's). Interestingly, this is equal to one for all pairs of consumers and for all goods. Third, rewriting the above formulation of the first order condition, we have

    [∂u_i/∂x_k (x^i_1, ..., x^i_K)] / [∂u_j/∂x_k (x^j_1, ..., x^j_K)] = α_j / α_i.

The lefthand side compares the increased utility consumer i would get from more of good k to the increased utility consumer j would get. If it is high, i's weight in the welfare function must be low compared to j's. This may be the opposite of what you expected: if i's weight weren't relatively low, the planner would give i more of good k, and that would raise social welfare; but then the original allocation couldn't have been optimal.

To continue the example, recall the definition of a Walrasian equilibrium allocation (x̂^1, ..., x̂^n): there exist prices p_1, p_2, ..., p_K such that

1. for all i = 1, ..., n, x̂^i = (x̂^i_1, ..., x̂^i_K) solves

    max_{x^i_1, ..., x^i_K ≥ 0}  u_i(x^i_1, ..., x^i_K)
    s.t.  p_1 x^i_1 + ··· + p_K x^i_K ≤ p_1 w^i_1 + ··· + p_K w^i_K.

2. for all k = 1, ..., K, W_k = Σ_{i=1}^n x̂^i_k.

Assuming each consumer's solution is interior, x̂^i = (x̂^i_1, ..., x̂^i_K) satisfies

    ∂u_i/∂x^i_1 (x̂^i_1, ..., x̂^i_K) = μ_i p_1
    ...
    ∂u_i/∂x^i_K (x̂^i_1, ..., x̂^i_K) = μ_i p_K,

where μ_i is the Lagrange multiplier for i's problem. Now reconsider the planning problem with welfare weights α_i = 1/μ_i (one over i's marginal utility of money) for each consumer:

    max_{x^i_k ∈ R_+, i=1,...,n, k=1,...,K}  Σ_{i=1}^n α_i u_i(x^i_1, x^i_2, ..., x^i_K)
    s.t.  Σ_{i=1}^n x^i_k = W_k,  k = 1, 2, ..., K.

The first order conditions of this problem imply

    α_i ∂u_i/∂x_k (x^i_1, ..., x^i_K) = λ_k,

or equivalently,

    ∂u_i/∂x_k (x^i_1, ..., x^i_K) = μ_i λ_k.

Clearly, the Walrasian allocation (x̂^1, ..., x̂^n) satisfies the first order conditions with multipliers λ_k = p_k. Adding concavity of consumer utilities (see Theorem 6.2), we can conclude that the Walrasian allocation is indeed the social optimum given weights 1/μ_i, and then the multiplier p_k represents the social value of good k.
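A small numerical sketch of the planner's problem may help fix ideas. The parameters below are my own; to keep the code short, the resource constraints are substituted out (consumer 2 receives the remainder of each endowment), which turns the problem into a box-constrained one. The check is that the two consumers' marginal rates of substitution coincide at the optimum, as the first order conditions require.

```python
# A minimal sketch (my own parameters) of the planner's problem with n = 2
# consumers, K = 2 goods, log Cobb-Douglas utilities, and weights alpha = (1, 2).
import numpy as np
from scipy.optimize import minimize

W = np.array([10.0, 6.0])            # social endowments of the two goods
alpha = np.array([1.0, 2.0])         # welfare weights

def u(x):                            # same Cobb-Douglas utility for both consumers
    return 0.5 * np.log(x[0]) + 0.5 * np.log(x[1])

def welfare(z):                      # z = consumer 1's bundle; consumer 2 gets the rest
    return alpha[0] * u(z) + alpha[1] * u(W - z)

res = minimize(lambda z: -welfare(z), x0=W / 2, method='SLSQP',
               bounds=[(1e-6, W[0] - 1e-6), (1e-6, W[1] - 1e-6)])
x1, x2 = res.x, W - res.x

def mrs(x):                          # marginal rate of substitution between goods 1 and 2
    return (0.5 / x[0]) / (0.5 / x[1])

print(mrs(x1), mrs(x2))              # equalized across consumers at the planner's optimum
```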
6.4 Second Order Analysis

Moving to the second order analysis, the necessary condition is again that the second directional derivative of the Lagrangian be non-positive, now in every direction that is orthogonal to the gradient of each constraint function.

Theorem 6.4 Let X ⊆ R^n, let x ∈ int X, and let f: X → R and g_1: R^n → R, ..., g_m: R^n → R be C^2. Assume the gradients {Dg_1(x), ..., Dg_m(x)} are linearly independent. Assume x is a constrained local maximizer of f subject to g_1(x) = c_1, ..., g_m(x) = c_m and satisfies the first order condition (5) with multipliers λ_1, ..., λ_m. Then

    t′ [ D²f(x) − Σ_{j=1}^m λ_j D²g_j(x) ] t ≤ 0

for all directions t satisfying Dg_j(x)t = 0 for all j = 1, ..., m.

Again, strengthening the weak inequality to strict gives us a second order condition that, in combination with the first order condition, is sufficient for a constrained strict local maximizer. Note that, in contrast to the analysis of necessary conditions, the next result does not rely on the constraint qualification.

Theorem 6.5 Let X ⊆ R^n, let x ∈ int X, and let f: X → R and g_1: R^n → R, ..., g_m: R^n → R be C^2. Assume x satisfies the constraints g_1(x) = c_1, ..., g_m(x) = c_m and the first order condition (5) with multipliers λ_1, ..., λ_m. If

    t′ [ D²f(x) − Σ_{j=1}^m λ_j D²g_j(x) ] t < 0    (6)

for all directions t satisfying Dg_j(x)t = 0 for all j = 1, ..., m, then x is a constrained strict local maximizer of f subject to g_1(x) = c_1, ..., g_m(x) = c_m.
As with a single constraint, we can consider the parameterized optimization problem and can provide conditions under which a constrained local maximizer is a well-defined, smooth function of the parameter. We now reinstate the constraint qualification.

Theorem 6.6 Let Θ ⊆ R^m be open and X ⊆ R^n, and let f: X × Θ → R and g_1: R^n × Θ → R, ..., g_m: R^n × Θ → R be C^2. Consider (x*, θ*) ∈ X × Θ with x* ∈ int X. Assume the gradients {D_x g_1(x*, θ*), ..., D_x g_m(x*, θ*)} are linearly independent. Given θ*, assume x* satisfies the constraints g_1(x*, θ*) = c_1, ..., g_m(x*, θ*) = c_m and the first order condition, i.e.,

    D_x f(x*, θ*) − Σ_{j=1}^m λ*_j D_x g_j(x*, θ*) = 0,

with multipliers λ*_1, ..., λ*_m. Assume that for all directions t with D_x g_j(x*, θ*)t = 0 for all j = 1, ..., m, we have

    t′ [ D²_x f(x*, θ*) − Σ_{j=1}^m λ*_j D²_x g_j(x*, θ*) ] t < 0.

Then there are open sets Y ⊆ R^n with x* ∈ Y and Θ′ ⊆ Θ with θ* ∈ Θ′, and C^1 mappings φ_i: Θ′ → R, i = 1, ..., n, and λ_j: Θ′ → R, j = 1, ..., m, such that for all θ ∈ Θ′, (i) φ(θ) = (φ_1(θ), ..., φ_n(θ)) is the unique maximizer of f(·, θ) subject to g_1(x, θ) = c_1, ..., g_m(x, θ) = c_m belonging to Y, (ii) φ(θ) satisfies the first order condition (5) at θ with unique multipliers λ_1(θ), ..., λ_m(θ), and (iii) φ(θ) satisfies the second order sufficient condition (6) at θ with multipliers λ_1(θ), ..., λ_m(θ).
Fortunately, the statement of the envelope theorem carries over virtually unchanged.

Theorem 6.7 Let Θ ⊆ R^m be open and X ⊆ R^n, and let f: X × Θ → R and g_1: R^n × Θ → R, ..., g_m: R^n × Θ → R be C^1. Consider (x*, θ*) ∈ X × Θ with x* ∈ int X. Let φ_i: Θ → R, i = 1, ..., n, and λ_j: Θ → R, j = 1, ..., m, be C^1 mappings such that x* = φ(θ*) and λ*_j = λ_j(θ*), j = 1, ..., m, and such that for all θ ∈ Θ, φ(θ) is a constrained local maximizer satisfying the first order condition (5) at θ with multipliers λ_1(θ), ..., λ_m(θ). Define the mapping F: Θ → R by

    F(θ) = f(φ(θ), θ)

for all θ ∈ Θ. Then F is C^1, and

    DF(θ*) = DL(θ* | x*, λ*).

Again, we can use the envelope theorem to characterize λ*_j as the marginal effect of increasing the value of the jth constraint.
7 Inequality Constraints

We now consider maximization problems subject to multiple inequality constraints,

    max_{x ∈ R^n}  f(x)
    s.t.  g_1(x) ≤ c_1
          ...
          g_m(x) ≤ c_m,

now defining f on the entire Euclidean space and building any restrictions on the domain into the constraints of the problem. Recall that it may be that g_j(x) = −x_j and c_j = 0, so the jth constraint is just a non-negativity constraint: x_j ≥ 0. And recall that the problem of equality constraints is a special case of inequality constraints: we can always convert g(x) = c into two inequalities g(x) ≤ c and −g(x) ≤ −c.
[Figure 22: Kuhn-Tucker conditions]
7.1 First Order Analysis

We now consider maximization subject to multiple inequality constraints. Now, there are different possible first order conditions for a constrained local maximizer, depending on which constraints are met with equality. Given x ∈ R^n, we say the jth constraint is binding if g_j(x) = c_j, and if g_j(x) < c_j, then the constraint is slack. Figure 22 illustrates a problem with two inequality constraints and depicts three possibilities, depending on whether none, one, or two constraints are binding. In the first case, we could have a constrained local maximizer such as x, for which no constraints bind. Such a vector must be a critical point of the objective function. In the second case, we could have a single constraint binding at a constrained local maximizer such as y, and here the gradients of the objective and constraint are collinear. Interestingly, these gradients actually point in the same direction. Lastly, we could have a constrained local maximizer such as z, where both constraints bind. Here, the gradient of the objective is not collinear with the gradient of either constraint, and it may appear that no gradient restriction is possible. But in fact, Df(z) can be written as a linear combination of Dg_1(z) and Dg_2(z) with non-negative weights; and if the picture were a three-dimensional picture of optimization over three variables, then we would have drawn the three gradients lying on the same plane.

The restrictions evident in Figure 22 are formalized in the next theorem. Although often attributed to Kuhn and Tucker, the result was derived independently by Karush. Note that we now assume the domain of f is the entire Euclidean space R^n. To capture maximization over a smaller domain X ⊆ R^n, we would formalize X in terms of inequality constraints. For example, if we want the domain of the objective function to be R^n_+, then we impose the non-negativity restrictions x_1 ≥ 0, ..., x_n ≥ 0 by adding them explicitly as inequality constraints: we specify g_1(x) = −x_1, ..., g_m(x) = −x_m and c_1 = ··· = c_m = 0.
Theorem 7.1 (Karush-Kuhn-Tucker Theorem) Let f: R^n → R, g_1: R^n → R, ..., g_m: R^n → R be C^1. Suppose the first k constraints are the binding ones at x ∈ R^n, and assume the gradients of the binding constraints, {Dg_1(x), ..., Dg_k(x)}, are linearly independent. If x is a constrained local maximizer of f subject to g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m, then there are unique multipliers λ_1, ..., λ_m ∈ R such that

    Df(x) = Σ_{j=1}^m λ_j Dg_j(x)    (7)
    λ_j (c_j − g_j(x)) = 0,  j = 1, ..., m    (8)
    λ_j ≥ 0,  j = 1, ..., m.    (9)

Proof Maintaining the conditions of the theorem, assume that x is a constrained local maximizer. By Gale's (1960) Theorem 2.9, either there exist μ_0, μ_1, ..., μ_k ≥ 0 (not all zero) such that

    μ_0 Df(x) + Σ_{j=1}^k μ_j (−Dg_j(x)) = 0,

or there exists a direction t such that Df(x)t > 0 and −Dg_j(x)t > 0 for all j = 1, ..., k. In the latter case, however, we can choose ε > 0 sufficiently small so that f(x + εt) > f(x) and g_j(x + εt) < g_j(x) = c_j for all j = 1, ..., k, but then x is not a constrained local maximizer, a contradiction. In the former case, note that linear independence of {Dg_1(x), ..., Dg_k(x)} implies that μ_0 ≠ 0, and so we can define λ_j = μ_j/μ_0, j = 1, ..., k, to fulfill (7) and λ_{k+1} = ··· = λ_m = 0 to fulfill (8) and (9). Again, linear independence implies that these coefficients are unique.
Geometrically, the first order condition from the Karush-Kuhn-Tucker theorem means that the gradient of the objective function, Df(x), is contained in the semi-positive cone generated by the gradients of the binding constraints, i.e., it is contained in the set

    { Σ_{j=1}^m λ_j Dg_j(x) | λ_1, ..., λ_m ≥ 0 (not all zero) },

depicted in Figure 23. The technology of the proof is essentially a form of the separating hyperplane theorem, but one known as a "theorem of the alternative" that is especially adapted for problems exhibiting a polyhedral structure. In turn, there are different versions of the theorem of the alternative, depending on the types of inequalities involved. (Some versions involve all strict inequalities, some all weak, etc.) We use Gale's (1960) Theorem 2.9, which states that the zero vector lies in the semi-positive cone of a collection {a_1, ..., a_k} if and only if it is not the case that there exists a vector t such that a_j · t > 0 for all j = 1, ..., k.

[Figure 23: Cones and alternatives]
The numbers λ_1, ..., λ_m are still referred to as Lagrange multipliers, and the linear independence condition is still referred to as the constraint qualification. To see why the constraint qualification is needed for existence of the multipliers, consider Figure 24, a simple adaptation of Figure 20. Again, x is a local constrained maximizer of f (in fact, it is the unique element of the constraint set), but Df(x) cannot be written as a non-negative linear combination of Dg_1(x) and Dg_2(x), because they are linearly dependent. In Figure 25, an adaptation of Figure 21, the gradient of the objective function can be written as a non-negative linear combination of the gradients of the constraints for an infinite range of weights.

As with equality constraints, we can define the Lagrangian L: R^n × R^m → R by

    L(x, λ_1, ..., λ_m) = f(x) + Σ_{j=1}^m λ_j (c_j − g_j(x)),

and then condition (7) from Theorem 7.1 is the requirement that x is a critical point of the Lagrangian given multipliers λ_1, ..., λ_m. A loose interpretation of the terms λ_j (c_j − g_j(x)) is the cost of violating the jth constraint, in which case the multiplier λ_j is the price for violating the constraint; for this reason, the multipliers are sometimes referred to as "shadow prices." This interpretation is not entirely justified, because it suggests that a solution of the constrained optimization problem should be an unconstrained maximizer of the Lagrangian. This is true under some conditions, but it is not completely general; see Theorem 7.4. Nevertheless, our earlier interpretation of λ_j carries over: it tells us the rate of change of the maximized value of the objective function as we increase the constrained value c_j of the jth constraint.
[Figure 24: Constraint qualification needed for existence]

[Figure 25: Constraint qualification needed for uniqueness]
An important difference from the case of equality constraints is that the constraint qualification now holds only for the gradients of binding constraints. (With equality constraints, every constraint is binding, but now some may not be.) Another important difference, touched on above, is that the multipliers are non-negative. This is consistent with our interpretation of λ_j, which gives the rate of change of the maximized value of the objective function as we increase the constrained value c_j of the jth constraint: now only the inequality g_j(x) ≤ c_j needs to be maintained, so increasing c_j can't hurt, so the multipliers are non-negative. Yet another difference is that the equality constraints g_j(x) = c_j have been replaced in (8) by the conditions λ_j (c_j − g_j(x)) = 0, j = 1, ..., m, which are called the complementary slackness conditions. Put differently, complementary slackness says that

    λ_j > 0  implies  c_j − g_j(x) = 0,
    c_j − g_j(x) > 0  implies  λ_j = 0.

In words, the multiplier of every slack constraint is zero, and every constraint with a positive multiplier is binding.

Referring back to Figure 22, first consider a constrained local maximizer such as x. Assuming the constraint qualification holds, the Karush-Kuhn-Tucker theorem says that the gradient of f at x can be written as

    Df(x) = λ_1 Dg_1(x) + λ_2 Dg_2(x),

and since both constraints are slack, complementary slackness implies λ_1 = λ_2 = 0, which gives us Df(x) = 0. At y, the second constraint is slack, so λ_2 = 0, and we have Df(y) = λ_1 Dg_1(y) for λ_1 ≥ 0, as depicted. At z, the first order condition (7) implies that the gradient of the objective lies in the semi-positive cone generated by the gradients of the binding constraints, as depicted.

In practical terms, the first order conditions (7) and (8) give us n + m equations in n + m unknowns. If we can solve for all of the solutions of this system, then we have an upper bound on the constrained local maximizers. Typically, one goes through all combinations of binding constraints; given one set of binding constraints meeting the constraint qualification, solve the problem as though it were just one of multiple equality constraints. Furthermore, if any solutions involve λ_j < 0, then the non-negativity conditions (9) allow us to discard them as possible constrained local maximizers. Typically, some combinations of binding constraints will be impossible, so those can be skipped. After doing this for all possible combinations of binding constraints, one hopefully has a small set of possible candidates for constrained local maximizers satisfying the constraint qualification.
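The enumeration procedure just described can be sketched in code. The problem below is a toy example of my own; for each combination of binding constraints, the corresponding equality-constrained first order conditions are solved symbolically, and candidates with negative multipliers or violated slack constraints are discarded.

```python
# A sketch (toy problem of my own) of the enumeration over binding constraints.
import itertools
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = -(x1 - 2)**2 - (x2 - 2)**2            # objective
constraints = [(x1 + x2, 3), (x1, 2)]     # pairs (g_j(x), c_j), meaning g_j(x) <= c_j

candidates = []
for binding in itertools.chain.from_iterable(
        itertools.combinations(range(2), r) for r in range(3)):
    lams = [sp.Symbol(f'l{j}', real=True) for j in range(len(binding))]
    # stationarity of the Lagrangian in x, plus the binding constraints as equalities
    grad_L = [sp.diff(f, v) - sum(l * sp.diff(constraints[j][0], v)
                                  for l, j in zip(lams, binding))
              for v in (x1, x2)]
    eqs = grad_L + [constraints[j][0] - constraints[j][1] for j in binding]
    for sol in sp.solve(eqs, [x1, x2, *lams], dict=True):
        lam_ok = all(sol[l] >= 0 for l in lams)
        slack_ok = all(sp.simplify(constraints[j][0].subs(sol) - constraints[j][1]) <= 0
                       for j in range(2) if j not in binding)
        if lam_ok and slack_ok:
            candidates.append(sol)

print(candidates)   # surviving candidates for constrained local maximizers
```

Since the objective here is concave and the constraints are linear, Theorem 7.2 implies the surviving candidate is in fact the constrained global maximizer.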
Example The consumer's problem is most accurately formulated in terms of inequality constraints. We can now think of u defined on all of R^2 and impose non-negativity constraints on the consumer's choice. The problem is

    max_{x ∈ R^2}  u(x_1, x_2)
    s.t.  p_1 x_1 + p_2 x_2 ≤ I
          x_1 ≥ 0
          x_2 ≥ 0.

Defining g_1(x_1, x_2) = p_1 x_1 + p_2 x_2, g_2(x_1, x_2) = −x_1, and g_3(x_1, x_2) = −x_2, this is a maximization problem subject to the inequality constraints (1) g_1(x_1, x_2) ≤ I, (2) g_2(x_1, x_2) ≤ 0, and (3) g_3(x_1, x_2) ≤ 0. Note the three constraints cannot bind simultaneously. First, consider the possibility that only (2) binds, i.e., p_1 x_1 + p_2 x_2 < I, x_1 = 0, and x_2 > 0. Note that Dg_2(x) = (−1, 0) ≠ 0, so the constraint qualification is met. By complementary slackness, it follows that λ_1 = λ_3 = 0, so the first order condition becomes

    ∂u/∂x_1 (x_1, x_2) = λ_2 ∂g_2/∂x_1 (x_1, x_2) = −λ_2
    ∂u/∂x_2 (x_1, x_2) = λ_2 ∂g_2/∂x_2 (x_1, x_2) = 0
    g_2(x_1, x_2) = 0,  λ_2 ≥ 0,

but this is incompatible with monotonicity of u, so we discard this case. Similarly for the case in which only (3) binds, the case in which (2) and (3) both bind, and the case in which no constraints bind. Next, consider the case in which (1) and (2) bind, i.e., p_1 x_1 + p_2 x_2 = I, x_1 = 0, x_2 > 0. Note that Dg_1(x) = (p_1, p_2) and Dg_2(x) = (−1, 0) are linearly independent, so the constraint qualification is met. Since x_2 > 0, complementary slackness implies λ_3 = 0, so the first order conditions are

    ∂u/∂x_1 (x_1, x_2) = λ_1 ∂g_1/∂x_1 (x_1, x_2) + λ_2 ∂g_2/∂x_1 (x_1, x_2)
    ∂u/∂x_2 (x_1, x_2) = λ_1 ∂g_1/∂x_2 (x_1, x_2) + λ_2 ∂g_2/∂x_2 (x_1, x_2)
    g_1(x_1, x_2) = I,  g_2(x_1, x_2) = 0,  λ_1, λ_2 ≥ 0.

We substitute x_1 = 0 into p_1 x_1 + p_2 x_2 = I to solve for x_2 = I/p_2, and we conclude that the bundle (0, I/p_2) is one possible optimal bundle for the consumer. Similarly, when (1) and (3) bind, we find the possible bundle (I/p_1, 0). Finally, we consider the case in which only (1) binds. Then complementary slackness implies λ_2 = λ_3 = 0, and the first order conditions are

    ∂u/∂x_1 (x_1, x_2) = λ_1 ∂g_1/∂x_1 (x_1, x_2)
    ∂u/∂x_2 (x_1, x_2) = λ_1 ∂g_1/∂x_2 (x_1, x_2)
    g_1(x_1, x_2) = I,  λ_1 ≥ 0.

When u is Cobb-Douglas with parameters α and β (technically, the Cobb-Douglas utility function is not defined on the entire Euclidean space R^2, but we can still apply the KKT theory on the interior of the domain, R^2_{++}), these equations yield x_1 = [α/(α+β)] (I/p_1) and x_2 = [β/(α+β)] (I/p_2), and checking the three possible solutions, you'll see that this one indeed solves the consumer's problem. Assume, instead, that the two goods are perfect substitutes, i.e., u(x_1, x_2) = a x_1 + b x_2 with a, b > 0, and consider the case in which only (1) binds. The first order conditions imply a = λ_1 p_1 and b = λ_1 p_2, so this case is only possible when the consumer's marginal rate of substitution (measuring the value of good 1 in terms of good 2) is equal to the relative price of good 1: a/b = p_1/p_2. Then every bundle (x_1, x_2) satisfying the budget constraint with equality yields utility

    a x_1 + b (I − p_1 x_1)/p_2 = a x_1 + (a p_2/p_1) (I − p_1 x_1)/p_2 = aI/p_1 = bI/p_2,

so all such bundles are optimal. If the razor's edge condition on marginal rates of substitution and relative prices does not hold, then either a/b > p_1/p_2 or the opposite obtains, and the only possible optimal bundles are the corner solutions. In the former case,

    u(I/p_1, 0) = aI/p_1 > bI/p_2 = u(0, I/p_2),

so the consumer optimally spends all of his money on good 1, and in the remaining case he spends everything on good 2.
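For the Cobb-Douglas case, the closed form above is easy to confirm numerically. The sketch below (parameter values are mine, and the utility is written in log form) solves the budget-constrained problem with an off-the-shelf solver and compares the answer with x_1 = [α/(α+β)](I/p_1), x_2 = [β/(α+β)](I/p_2).

```python
# A numerical sketch (my own parameters) of the consumer's problem with
# u(x1, x2) = alpha*log(x1) + beta*log(x2), compared against the closed form.
import numpy as np
from scipy.optimize import minimize

alpha, beta, p1, p2, I = 0.3, 0.7, 2.0, 5.0, 100.0

res = minimize(lambda x: -(alpha * np.log(x[0]) + beta * np.log(x[1])),
               x0=[1.0, 1.0], method='SLSQP',
               bounds=[(1e-9, None), (1e-9, None)],
               constraints=[{'type': 'ineq',             # scipy wants fun(x) >= 0
                             'fun': lambda x: I - p1 * x[0] - p2 * x[1]}])

closed_form = np.array([alpha * I / ((alpha + beta) * p1),
                        beta * I / ((alpha + beta) * p2)])
print(res.x, closed_form)   # the two bundles should agree up to solver tolerance
```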
7.2 Concave Programming

Like optimization subject to equality constraints, optimization problems subject to inequality constraints are simplified under concavity conditions. In fact, such problems are even more amenable to this structure. We first establish a general result that implies our earlier results for quasi-concave objective and linear equality constraints. Now, it is enough that the constraints are quasi-convex: the full strength of linearity is not needed for inequality constraints.
Theorem 7.2 Let f: R^n → R be C^1, and let g_1: R^n → R, ..., g_m: R^n → R be C^1 and quasi-convex. Assume x ∈ R^n satisfies the constraints g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m and the first order conditions (7)-(9) with multipliers λ_1, ..., λ_m. Then x is a constrained global maximizer of f subject to g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m provided either of two conditions holds:

1. f is quasi-concave and Df(x) ≠ 0, or
2. f is concave.
Proof Note that either Df(x) ≠ 0 or, under the assumptions of the theorem, Df(x) = 0 and f is concave, which implies that x is an unconstrained (and therefore a constrained) global maximizer. Thus, we consider the Df(x) ≠ 0 case. Let y be any element of the constraint set C, i.e., y satisfies g_j(y) ≤ c_j for j = 1, ..., m, and let t = (1/||y − x||)(y − x) be the direction pointing to the vector y from x. Given α ∈ (0, 1], define

    z(α) = x + α(y − x) = (1 − α)x + αy,

a convex combination of x and y. Note that g_j(x) ≤ c_j and g_j(y) ≤ c_j for each j, and so by quasi-convexity, we have

    g_j(z(α)) ≤ max{g_j(x), g_j(y)} ≤ c_j.

For each binding constraint j, we then have g_j(z(α)) ≤ c_j = g_j(x), and therefore

    D_t g_j(x) = lim_{α→0} [g_j(z(α)) − g_j(x)] / (α ||y − x||) ≤ 0,

and of course, for each slack constraint, we have λ_j = 0. Combining these observations, we conclude

    D_t f(x) = Σ_{j=1}^m λ_j D_t g_j(x) ≤ 0.

Now suppose in order to derive a contradiction that f(y) > f(x). Then there exists α ∈ (0, 1] such that

    D_t f(z(α)) = Df(z(α)) t > 0,

and by quasi-concavity of f, we have f(z(α)) ≥ f(x). See Figure 26 for a visual depiction. By continuity of the dot product, there exists δ > 0 sufficiently small that Df(z(α))(t − δ Df(x)) > 0. Letting t′ = (1/||t − δ Df(x)||)(t − δ Df(x)) point in the direction of the perturbed vector t − δ Df(x), it follows that the derivative of f at z(α) in this direction is positive, i.e., D_{t′} f(z(α)) > 0. This means that for sufficiently small γ > 0, we can define w = z(α) + γ t′ such that f(w) > f(z(α)) ≥ f(x). Given β ∈ (0, 1], define

    v(β) = x + β(w − x) = (1 − β)x + βw,

a convex combination of x and w. Let s = (1/||w − x||)(w − x) be the direction pointing from x to w, and note that

    D_s f(x) = lim_{β→0} [f(v(β)) − f(x)] / (β ||w − x||) ≥ 0.    (10)

But we also have

    D_s f(x) = Df(x)s
             = (1/||w − x||) Df(x)[w − x]
             = (1/||w − x||) Df(x)[z(α) + γt′ − x]
             = (1/||w − x||) Df(x)[α(y − x) + γt′]
             ≤ (γ/||w − x||) Df(x)t′
             = (γ/(||w − x|| ||t − δDf(x)||)) Df(x)[t − δDf(x)]
             ≤ −(γδ/(||w − x|| ||t − δDf(x)||)) [Df(x) · Df(x)]
             < 0,

where the first line follows from the definition of the directional derivative, the second from the definition of s, the third from the definition of w, the fourth from the definition of z(α), the fifth from Df(x)t ≤ 0, the sixth from the definition of t′, the seventh from Df(x)t ≤ 0, and the final line follows from Df(x) ≠ 0. This contradicts D_s f(x) ≥ 0 from (10), and we conclude that f(x) ≥ f(y), i.e., x is a constrained global maximizer.
Returning to Theorems 5.2 and 6.2, note that a linear equality constraint g(x) = c can be reformulated as two linear inequality constraints: g(x) ≤ c and −g(x) ≤ −c. Since Theorem 7.2 does not use a constraint qualification, we can map the earlier results to the framework of this section and apply the current theorem. The only slight gap is that in the earlier results, we assumed an open, convex domain X ⊆ R^n, rather than assuming f is defined on the entire Euclidean space, but that difference is inconsequential.

As in the analysis of equality constraints, if f is quasi-concave and x is a constrained strict local maximizer, then it is the unique global maximizer. A difference from equality constraints is that we can allow the constraints to be quasi-convex, rather than actually linear.

[Figure 26: Proof of Theorem 7.2]
Theorem 7.3 Let f: R^n → R be quasi-concave, and let g_1: R^n → R, ..., g_m: R^n → R be quasi-convex. If x ∈ R^n is a constrained strict local maximizer, then it is the unique constrained global maximizer of f subject to g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m.
We end this section with an analysis that is particular to inequality constraints. Under a weak version of the constraint qualification, and with a concave objective and convex constraints, solutions to the constrained maximization problem can be re-cast as unconstrained maximizers of the Lagrangian, with appropriately chosen multipliers. Formally, writing λ = (λ_1, ..., λ_m) for a vector of multipliers, we say (x*, λ*) is a saddlepoint of the Lagrangian if for all x ∈ R^n and all λ ∈ R^m_+,

    L(x, λ*) ≤ L(x*, λ*) ≤ L(x*, λ).

In words, given x*, λ* minimizes Σ_{j=1}^m λ_j (c_j − g_j(x*)); and given λ*, x* maximizes f(x) + Σ_{j=1}^m λ*_j (c_j − g_j(x)). Note that the maximization problem over x is unconstrained, but if (x*, λ*) is a saddlepoint, then x* will indeed satisfy g_j(x*) ≤ c_j for each j; indeed, if c_j − g_j(x*) < 0, then the term λ_j (c_j − g_j(x*)) could be made arbitrarily negative by choice of arbitrarily large λ_j, so (x*, λ*) could not be a saddlepoint.
Theorem 7.4 Let f: R^n → R be concave, let g_1: R^n → R, ..., g_m: R^n → R be convex, and let x* ∈ R^n. If there exist λ*_1, ..., λ*_m ∈ R such that (x*, λ*) is a saddlepoint of the Lagrangian, then x* is a global constrained maximizer of f subject to g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m. Conversely, assume there is some x̃ ∈ R^n that satisfies the constraints strictly, i.e., g_1(x̃) < c_1, ..., g_m(x̃) < c_m. If x* is a constrained local maximizer of f subject to g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m, then there exist λ*_1, ..., λ*_m ∈ R such that (x*, λ*) is a saddlepoint of the Lagrangian.
The condition g_1(x̃) < c_1, ..., g_m(x̃) < c_m is called Slater's condition. To gain an intuition for the saddlepoint theorem and the need for Slater's condition, consider Figure 27. Here, we consider maximizing a function of any number of variables, but to illustrate the problem in a two-dimensional graph, we assume there is a single inequality constraint, g(x) ≤ c. On the horizontal axis, we graph values of f(x) as x varies over R^n, and on the vertical axis, we graph c − g(x) as x varies over the Euclidean space. When f is concave and g is convex (so c − g(x) is concave), you can check that the set {(f(x), c − g(x)) | x ∈ R^n}, which is shaded in the figure, is convex. The values (f(x), c − g(x)) corresponding to vectors x satisfying the constraint g(x) ≤ c lie above the horizontal axis, the darker shaded regions in the figure. The ordered pairs (f(x*), c − g(x*)) corresponding to solutions of the constrained maximization problem are indicated by the black dots.

Consider the problem of minimizing f(x*) + λ(c − g(x*)) with respect to λ, holding x* fixed. This simply means that at a saddlepoint, (i) if c − g(x*) > 0, then λ* = 0, and (ii) if c − g(x*) = 0, then λ* can be any non-negative number. Figure 27 depicts the first possibility in Panel (a) and the second possibility in Panels (b) and (c). Now consider the problem of maximizing f(x) + λ*(c − g(x)) with respect to x, holding λ* fixed. Let's write the objective function as a dot product: (1, λ*) · (f(x), c − g(x)). Viewed this way, we can understand the problem as choosing the ordered pair (f(x), c − g(x)) in the shaded region that maximizes the linear function with gradient (1, λ*). This is depicted in Panels (a) and (b). The difference in the two panels is that in (a), the constraint is not binding at the solution to the optimization problem (so Df(x*) = λ* Dg(x*) = 0), while in (b) it is (so λ* may be positive).

The difference between Panels (b) and (c) is that Slater's condition is not satisfied in the latter: there is no x such that g(x) < c; graphically, the shaded region does not contain any points above the horizontal axis. The pair (f(x*), c − g(x*)) corresponding to the solution of the maximization problem is indicated by the black dot; we then must choose λ* such that (f(x*), c − g(x*)) maximizes the linear function with gradient (1, λ*). The difficulty is that for any finite λ*, the pair (f(x*), c − g(x*)) does not maximize the linear function; instead, the maximizing pair will correspond to a vector x that violates the constraint, i.e., c − g(x) < 0. To make (f(x*), c − g(x*)) the maximizing pair, the gradient of the linear function must be pointing straight up, which would correspond to something like an infinite λ* (whatever that would mean). In other words, if Slater's condition is not satisfied, then there may be no way to choose a multiplier to solve the saddlepoint problem.

Example For a formal example demonstrating the need for Slater's condition, let n = 1, f(x) = x, m = 1, c_1 = 0, and g_1(x) = x². The only point in R satisfying g_1(x) ≤ 0 is x = 0, so this is trivially the constrained maximizer of f. But Df(0) = 1 and Dg_1(0) = 0, so there is no λ ≥ 0 such that Df(0) = λ Dg_1(0).
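A quick symbolic check of this example (assuming sympy) makes the failure concrete: for any finite λ > 0, the Lagrangian x + λ(0 − x²) is maximized at x = 1/(2λ), where it takes the value 1/(4λ) > 0 = L(0, λ), so (0, λ) is never a saddlepoint.

```python
# A small check of the example above: f(x) = x, g(x) = x^2, c = 0.
import sympy as sp

x, lam = sp.symbols('x lambda', positive=True)
L = x + lam * (0 - x**2)                       # Lagrangian of the example

x_hat = sp.solve(sp.diff(L, x), x)[0]          # unconstrained maximizer of L(., lambda)
print(x_hat, sp.simplify(L.subs(x, x_hat)))    # 1/(2*lambda) and 1/(4*lambda) > L(0, lambda) = 0
```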
[Figure 27: Saddlepoint theorem]
Note that Slater's condition is implied by the usual constraint qualification. Indeed, suppose the gradients of the constraints {Dg_1(x), ..., Dg_m(x)} are linearly independent; in particular, there do not exist non-negative coefficients β_1, ..., β_m summing to one such that Σ_{j=1}^m β_j Dg_j(x) = 0. In geometric terms, the zero vector does not belong to the convex hull of the set of gradients. By the separating hyperplane theorem, there is a direction p such that p · Dg_j(x) > 0 for j = 1, ..., m, and this means the derivative in direction −p is negative for each constraint: D_{−p} g_j(x) < 0. Then we can choose ε > 0 sufficiently small that z = x − εp satisfies g_j(z) < g_j(x) ≤ c_j for j = 1, ..., m, fulfilling Slater's condition. In fact, this argument shows that we can fulfill Slater's condition using vectors arbitrarily close to the constrained local maximizer.

The above observation leads to the following corollary, a differentiable version of the saddlepoint theorem.
Corollary 7.5 Let f: R^n → R be C^1 and concave, let g_1: R^n → R, ..., g_m: R^n → R be C^1 and convex, and let x* ∈ R^n. Assume x* satisfies the constraints g_1(x*) ≤ c_1, ..., g_m(x*) ≤ c_m. Then x* satisfies the first order conditions (7)-(9) with multipliers λ* = (λ*_1, ..., λ*_m) if and only if (x*, λ*) is a saddlepoint of the Lagrangian.

Proof Assume the assumptions of the theorem; in particular, the constraint qualification implies Slater's condition. For one direction, suppose x* satisfies the first order conditions (7)-(9) with multipliers λ*_1, ..., λ*_m. By Theorem 7.2, x* is a constrained maximizer, and then Theorem 7.1 implies that these multipliers are unique. Furthermore, Theorem 7.4 implies there exists λ = (λ_1, ..., λ_m) such that (x*, λ) is a saddlepoint. In particular, λ_j ≥ 0 and λ_j (c_j − g_j(x*)) = 0, j = 1, ..., m, and x* maximizes L(·, λ), and by Theorem 3.1, the first order condition

    DL(x* | λ) = Df(x*) − Σ_{j=1}^m λ_j Dg_j(x*) = 0    (11)

holds. This means that the first order conditions (7)-(9) hold with multipliers λ_1, ..., λ_m, and we conclude λ = λ*; therefore, (x*, λ*) is a saddlepoint. For the other direction, assume (x*, λ*) is a saddlepoint. Then (11) holds, which implies (7)-(9).

A corollary of the corollary is that under the above concavity and differentiability conditions, the multipliers defining a saddlepoint (x*, λ*) are unique.
Corollary 7.6 Let f: R^n → R be C^1 and concave, let g_1: R^n → R, ..., g_m: R^n → R be C^1 and convex, and let x* ∈ R^n. Assume x* satisfies the constraints g_1(x*) ≤ c_1, ..., g_m(x*) ≤ c_m. If (x*, λ) and (x*, λ*) are saddlepoints of the Lagrangian, then λ = λ*.

Note that if (x*, λ) and (x*, λ*) are saddlepoints, then x* satisfies the first order conditions (7)-(9) with multipliers λ_1, ..., λ_m and with λ*_1, ..., λ*_m. By Theorem 7.1, we have λ_j = λ*_j, j = 1, ..., m.
7.3 Second Order Analysis

The second order analysis parallels that for multiple equality constraints, modified to accommodate the different first order conditions. Again, the necessary condition is that the second directional derivative of the Lagrangian be non-positive in a restricted set of directions. A difference is that now the inequality must hold only for directions orthogonal to the gradients of binding constraints.

Theorem 7.7 Let f: R^n → R, g_1: R^n → R, ..., g_m: R^n → R be C^2. Suppose the first k constraints are the binding ones at x ∈ R^n, and assume the gradients of the binding constraints, {Dg_1(x), ..., Dg_k(x)}, are linearly independent. Assume x is a constrained local maximizer of f subject to g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m and satisfies the first order conditions (7)-(9) with multipliers λ_1, ..., λ_m. Consider any direction t such that Dg_j(x)t = 0 for all binding constraints j = 1, ..., k. Then

    t′ [ D²f(x) − Σ_{j=1}^m λ_j D²g_j(x) ] t ≤ 0.

Note that the range of directions for which the above inequality must hold is the set of directions that are orthogonal to the gradients of the binding constraints. One might think it should hold as well for directions t such that Dg_j(x)t ≤ 0 for all j = 1, ..., m, since any direction with Dg_j(x)t < 0 is feasible for that constraint. In fact, the stronger version of the condition (using the larger range of directions) is not necessary.

Example Let n = 1, f(x) = e^x, m = 1, g_1(x) = x, and c_1 = 0. Clearly, x = 0 maximizes f subject to g_1(x) ≤ 0, and the first order condition Df(0) = λ_1 Dg_1(0) holds with λ_1 = 1. Furthermore, the direction t = −1 satisfies Dg_1(0)t = −1 < 0. Nevertheless,

    D²f(0) − λ_1 D²g_1(0) = 1 > 0,

violating the stronger version of the condition.
Again, strengthening the weak inequality to strict gives us a second order condition that, in combination with the first order condition, is sufficient for a constrained strict local maximizer. In contrast to the analysis of necessary conditions, the next result does not rely on the constraint qualification.

Theorem 7.8 Let f: R^n → R, g_1: R^n → R, ..., g_m: R^n → R be C^2. Assume x ∈ R^n satisfies the constraints g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m and the first order conditions (7)-(9) with multipliers λ_1, ..., λ_m. Assume that for all directions t with Dg_j(x)t ≤ 0 for all binding constraints j = 1, ..., k, we have

    t′ [ D²f(x) − Σ_{j=1}^m λ_j D²g_j(x) ] t < 0.    (12)

Then x is a constrained strict local maximizer of f subject to g_1(x) ≤ c_1, ..., g_m(x) ≤ c_m.

Note that, in contrast to Theorem 7.7, the range of directions over which the inequality must hold in Theorem 7.8 is now larger, also including directions in which binding constraints are decreasing: it holds for all t such that Dg_j(x)t ≤ 0 rather than Dg_j(x)t = 0. This subtlety does not arise in the analysis of equality constraints, and the next example demonstrates that it plays a critical role.

Example Let n = 1, f(x) = x², m = 1, c_1 = 0, and g_1(x) = x. Obviously, x = 0 is not a local maximizer of f subject to g_1(x) ≤ 0, and the first order condition from Theorem 7.1 holds with λ = 0. Nevertheless, it is vacuously true that for all directions t such that Dg_1(0)t = 0, the inequality (12) holds.
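In practice, the subspace version of the second order check (the condition of Theorems 6.4-6.5, over directions with Dg_j(x)t = 0) can be verified numerically by restricting the Hessian of the Lagrangian to the null space of the binding gradients. The sketch below does this for a toy problem of my own; note that it does not check the larger cone of directions required by (12), which is exactly the subtlety the example above illustrates.

```python
# A sketch (toy data of my own) of the subspace second order check: restrict
# the Hessian of the Lagrangian to {t : Dg_j(x) t = 0 for binding j} and test
# negative definiteness there.  Checking the full cone condition (12) requires more.
import numpy as np
from scipy.linalg import null_space

def second_order_check(hess_f, hess_g_list, grad_g_list, lam):
    """hess_f, hess_g_list: Hessians of f and of the binding g_j at x;
    grad_g_list: gradients of the binding g_j at x; lam: their multipliers."""
    H = hess_f - sum(l * Hg for l, Hg in zip(lam, hess_g_list))
    Z = null_space(np.vstack(grad_g_list))      # basis of {t : Dg_j(x) t = 0}
    if Z.size == 0:
        return True                             # no admissible directions: vacuously true
    eigenvalues = np.linalg.eigvalsh(Z.T @ H @ Z)
    return bool(np.all(eigenvalues < 0))

# toy problem: maximize -x1^2 - x2^2 + x3 subject to x3 <= 0, binding at x = (0, 0, 0)
hess_f = np.diag([-2.0, -2.0, 0.0])
print(second_order_check(hess_f, [np.zeros((3, 3))], [np.array([0.0, 0.0, 1.0])], [1.0]))
```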
As with equality constraints, we can consider the parameterized optimization problem and can provide conditions under which a constrained local maximizer is a well-defined, smooth function of the parameter. As before, we reinstate the constraint qualification. A change from the previous result is that we strengthen the first order condition by assuming strict complementary slackness, which entails that λ_j > 0 if and only if g_j(x) = c_j. That is, whereas complementary slackness means g_j(x) = c_j if λ_j > 0, we now add the converse direction of this statement.
Theorem 7.9 Let Θ ⊆ R^m be open, and let f: R^n × Θ → R and g_1: R^n × Θ → R, ..., g_m: R^n × Θ → R be C^2. Consider (x*, θ*) ∈ R^n × Θ. Given θ*, assume that x* satisfies the constraints g_1(x*, θ*) ≤ c_1, ..., g_m(x*, θ*) ≤ c_m, that the first k constraints are the binding ones at x*, that the gradients of the binding constraints, {D_x g_1(x*, θ*), ..., D_x g_k(x*, θ*)}, are linearly independent, that x* satisfies the first order conditions at θ*, i.e.,

    D_x f(x*, θ*) = Σ_{j=1}^m λ*_j D_x g_j(x*, θ*)
    λ*_j (c_j − g_j(x*, θ*)) = 0,  j = 1, ..., m
    λ*_j ≥ 0,  j = 1, ..., m,

with multipliers λ*_1, ..., λ*_m, and that strict complementary slackness holds, i.e., λ*_j > 0 if and only if j ≤ k. Assume that for all directions t with D_x g_j(x*, θ*)t ≤ 0 for all binding constraints j = 1, ..., k, we have

    t′ [ D²_x f(x*, θ*) − Σ_{j=1}^m λ*_j D²_x g_j(x*, θ*) ] t < 0.

Then there are open sets Y ⊆ R^n with x* ∈ Y and Θ′ ⊆ Θ with θ* ∈ Θ′, and C^1 mappings φ_i: Θ′ → R, i = 1, ..., n, and λ_j: Θ′ → R, j = 1, ..., m, such that for all θ ∈ Θ′, (i) φ(θ) = (φ_1(θ), ..., φ_n(θ)) is the unique maximizer of f(·, θ) subject to g_1(x, θ) ≤ c_1, ..., g_m(x, θ) ≤ c_m belonging to Y, (ii) φ(θ) satisfies the first order conditions (7)-(9) at θ with unique multipliers λ_1(θ), ..., λ_m(θ), and (iii) φ(θ) satisfies the second order sufficient condition (12) at θ with multipliers λ_1(θ), ..., λ_m(θ).
Fortunately, the statement of the envelope theorem carries over virtually unchanged.

Theorem 7.10 Let Θ ⊆ R^m be open, and let f: R^n × Θ → R and g_1: R^n × Θ → R, ..., g_m: R^n × Θ → R be C^1. Consider (x*, θ*) ∈ R^n × Θ. Let φ_i: Θ → R, i = 1, ..., n, and λ_j: Θ → R, j = 1, ..., m, be C^1 mappings such that x* = φ(θ*) and λ*_j = λ_j(θ*), j = 1, ..., m, and such that for all θ ∈ Θ, φ(θ) is a constrained local maximizer satisfying the first order conditions (7)-(9) at θ with multipliers λ_1(θ), ..., λ_m(θ). Define the mapping F: Θ → R by

    F(θ) = f(φ(θ), θ)

for all θ ∈ Θ. Then F is C^1, and

    DF(θ*) = DL(θ* | x*, λ*).

Again, we can use the envelope theorem to characterize λ*_j as the marginal effect of increasing the value of the jth constraint; with inequality constraints, of course, this cannot diminish the maximized value of the objective, so the multipliers are non-negative.
8 Pareto Optimality Revisited

We now return to the topic of characterizing Pareto optimal alternatives and explore an alternative approach using the framework of constrained optimization. First, we give a general characterization in terms of inequality constrained optimization. Second, we establish a necessary first order condition for Pareto optimality that adds a rank condition on gradients of individual utilities to the assumptions of Theorem 4.11 to deduce strictly positive coefficients, and provides an interpretation of the coefficients in terms of shadow prices of utilities. Finally, we establish that with quasi-concave utilities, the first order condition is actually sufficient for Pareto optimality as well. This gives us a full characterization that, in comparison with Corollary 4.7, weakens the assumption of concavity to quasi-concavity but adds the rank condition on gradients.

The next result is structure free, extending our earlier analysis by dropping all convexity, concavity, and differentiability conditions. It gives a full characterization: an alternative is Pareto optimal if and only if it solves n different maximization problems (one for each individual) subject to inequality constraints. The proof follows directly from definitions and is omitted.

Theorem 8.1 Let x ∈ A be an alternative, and let ū_i = u_i(x) for all i. Then x is Pareto optimal if and only if it solves

    max_{y ∈ X}  u_i(y)
    s.t.  u_j(y) ≥ ū_j,  j = 1, ..., i − 1, i + 1, ..., n

for all i.

Note that the sufficiency direction of Theorem 8.1 uses the fact that the alternative x solves n constrained optimization problems, one for each individual. Figure 28 demonstrates that this feature is needed for the result: there, x̂ maximizes u_2(y) subject to u_1(y) ≥ u_1(x̂), but it is Pareto dominated by x̃. Obviously, x̃ is Pareto optimal, as it maximizes u_1(y) subject to u_2(y) ≥ u_2(x̃) and it maximizes u_2(y) subject to u_1(y) ≥ u_1(x̃).
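Theorem 8.1 also suggests a direct numerical test for Pareto optimality when n is small: solve each individual's constrained problem and see whether anyone can be made better off. The sketch below (two quadratic utilities of my own choosing; the comparison is only up to solver tolerance) implements this test.

```python
# A numerical sketch of Theorem 8.1 with n = 2: x is Pareto optimal iff it
# maximizes each u_i subject to the other utility being held at its current level.
import numpy as np
from scipy.optimize import minimize

u = [lambda x: -(x[0] - 1)**2 - x[1]**2,      # individual 1's utility (bliss point (1, 0))
     lambda x: -x[0]**2 - (x[1] - 1)**2]      # individual 2's utility (bliss point (0, 1))

def pareto_check(x, tol=1e-4):
    for i in (0, 1):
        j = 1 - i
        bar = u[j](x)
        res = minimize(lambda y: -u[i](y), x0=x, method='SLSQP',
                       constraints=[{'type': 'ineq', 'fun': lambda y: u[j](y) - bar}])
        if -res.fun > u[i](x) + tol:          # i can be made better off
            return False
    return True

print(pareto_check(np.array([0.5, 0.5])))     # on the segment between bliss points: True
print(pareto_check(np.array([2.0, 2.0])))     # Pareto dominated: False
```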
Of course, we can use our analysis of maximization subject to multiple inequality constraints to draw implications of Theorem 8.1. Consider a Pareto optimal alternative x, as in Figure 28, for which the constraint qualification holds for the optimization problem corresponding to each i. In this context, note that all constraints are binding by construction: u_j(x) = ū_j for all j ≠ i. Thus, the constraint qualification is that the gradients

    Du_1(x), ..., Du_{i−1}(x), Du_{i+1}(x), ..., Du_n(x)

are linearly independent. One implication of the constraint qualification is that the set of alternatives has dimension at least n − 1. Furthermore, an implication of the constraint qualification holding for all i is that all individuals' gradients are non-zero at x. When there are just two individuals, the qualification becomes Du_2(x) ≠ 0 for individual 1's problem and Du_1(x) ≠ 0 for individual 2's problem, i.e., the condition of non-zero gradients is necessary and sufficient for the constraint qualification.

[Figure 28: Pareto optimality without concavity]

Note that for each i, x is a constrained local maximizer of u_i subject to u_j(x) ≥ ū_j, j ≠ i. Then the first order condition from Theorem 7.1 holds, as stated in the next theorem, where we omit the complementary slackness conditions because all constraints are binding.
Theorem 8.2 Assume A ⊆ R^d, x ∈ int A, and each u_i: R^d → R is C^1. Let i be such that the gradients {Du_j(x) | j ≠ i} are linearly independent. If x is Pareto optimal, then there exist unique multipliers λ^i_j ≥ 0, j ≠ i, such that

    Du_i(x) = −Σ_{j≠i} λ^i_j Du_j(x).    (13)

Recall that the multiplier on a constraint has the interpretation of giving the rate of change of the maximized objective function as we increase the value of the constraint. In this context, the multiplier λ^i_j has a special meaning: it is the rate at which we can increase i's utility by taking utility away from individual j. Put differently, it is the rate at which i's utility would decrease if we increase j's utility (holding all other individuals at the constraint). Thus, it is the shadow price of utility for j in terms of utility for i. Geometrically, viewed in R^d, the gradient Du_i(x) of individual i lies on the (n − 1)-dimensional hyperplane spanned by the other individuals' gradients.
Now recall the mapping u: X → R^n defined by u(x) = (u_1(x), ..., u_n(x)). Then u(X) is the set of possible utility vectors, and the linear independence assumption in Theorem 8.2 is equivalent to the requirement that the derivative of u at x, which is the n × d matrix

    Du(x) = [ ∂u_i/∂x_ℓ (x) ]  (rows i = 1, ..., n, columns ℓ = 1, ..., d),

has rank n − 1. This means that there is a uniquely defined hyperplane that is tangent to u(X) at the point u(x). When there are just two individuals, this implies there is a unique tangent line at (u_1(x), u_2(x)), as in Figure 28. See Figure 16 for the case of three individuals. This hyperplane has a normal vector p that is uniquely defined up to a non-zero scalar. The first order condition (13) from Theorem 8.2 can be written in matrix terms as

    (λ^i_1, ..., λ^i_{i−1}, 1, λ^i_{i+1}, ..., λ^i_n) Du(x) = 0,

and we conclude that p is, up to a non-zero scalar, equal to the vector of multipliers (with a coefficient of one for i) for individual i's problem.

An implication of the above analysis is that the vectors (λ^i_1, ..., λ^i_{i−1}, 1, λ^i_{i+1}, ..., λ^i_n) of multipliers corresponding to individuals i = 1, ..., n are collinear. Indeed, they are each normal to the tangent hyperplane at u(x), and the set of normal vectors is one-dimensional, so the claim follows. The claim can also be verified mechanically by multiplying both sides of

    Du_i(x) = −Σ_{j≠i} λ^i_j Du_j(x)

by 1/λ^i_j and manipulating to obtain

    Du_j(x) = −(1/λ^i_j) Du_i(x) − Σ_{k≠i,j} (λ^i_k/λ^i_j) Du_k(x).

Since the multipliers for j's problem are unique, this implies λ^j_i = 1/λ^i_j and, for all k ≠ i, j, λ^j_k = λ^i_k/λ^i_j, as claimed.
[Figure 29: Violation of Pareto optimality]
Three interesting conclusions follow from these observations. First, the multipliers from Theorem 8.2 are actually strictly positive. Second, the utility shadow prices for any two individuals are reciprocal: we can transfer utility from j to i at rate λ^i_j, and we can transfer utility from i to j at rate λ^j_i = 1/λ^i_j. Third, the relative prices of any two individuals are independent of the problem we consider. To see this, consider any two individuals h, i, and let j and k be any two individuals. Then from the analysis in the preceding paragraph, we have

    λ^i_j / λ^i_k = (λ^h_j / λ^h_i) / (λ^h_k / λ^h_i) = λ^h_j / λ^h_k.

If, for example, it is twice as expensive, in terms of i's utility, to increase j's utility as it is to increase k's utility, then it is also twice as expensive in terms of h's utility.

A final, and important, geometric insight stems from the sign of the multipliers; they are all non-negative, and at least one is strictly positive. Thus, the tangent hyperplane to u(X) has a normal vector with all non-negative coordinates, at least one positive. When there are just two individuals, this means that the utility frontier is sloping downward at (u_1(x), u_2(x)), as in Figure 28, and the idea extends to a general number of individuals, as in Figure 16. We conclude that at a Pareto optimal alternative for which the constraint qualification is satisfied, the boundary of u(X) is sloping downward, in a precise sense.

This is only a necessary condition, as Figure 29 illustrates: the boundary of u(X) is downward sloping at (u_1(x), u_2(x)), but x is Pareto dominated by y. Although conceptually possible, however, the anomaly depicted in the figure is precluded under the typical assumption of quasi-concave utility. Recall that, by Theorem 7.2, the first order condition is sufficient for a maximizer when the objective and constraints are quasi-concave. With Theorem 8.1, this yields the following result, which does not rely on a constraint qualification.

Theorem 8.3 Assume A ⊆ R^d is convex, x ∈ A, and each u_i: R^d → R is C^1 and quasi-concave. Assume that for each i, Du_i(x) ≠ 0 and there exist multipliers λ^i_j ≥ 0, j ≠ i, such that

    Du_i(x) = −Σ_{j≠i} λ^i_j Du_j(x).

Then x is Pareto optimal.
Thus, under quite general conditions, the first order condition (13) is necessary and sufficient for Pareto optimality.

Corollary 8.4 Assume A ⊆ R^d is convex, x ∈ int A, and each u_i: R^d → R is C^1 and quasi-concave. Suppose that for each i, the gradients {Du_j(x) | j ≠ i} are linearly independent. Then x is Pareto optimal if and only if there exist strictly positive multipliers λ_1, ..., λ_n > 0 such that

    Σ_{i=1}^n λ_i Du_i(x) = 0.

As discussed above, Pareto optimality implies strictly positive coefficients through the rank condition, and then we select any i in Theorem 8.2 and manipulate (13) to obtain the simpler first order condition in the above corollary. For the other direction, obviously all gradients must be non-zero, and we can manipulate the first order condition and set λ^i_j = λ_j/λ_i to fulfill the assumptions of Theorem 8.3.
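Computationally, the condition in Corollary 8.4 amounts to finding a strictly positive vector in the left null space of the matrix of stacked gradients. A minimal sketch (the gradients below are made up for illustration):

```python
# A sketch of the weight condition in Corollary 8.4: stack the utility gradients
# Du_i(x) as rows and look for a strictly positive vector lam with lam @ Du(x) = 0.
import numpy as np
from scipy.linalg import null_space

gradients = np.array([[ 1.0,  2.0],     # Du_1(x)  (made-up values)
                      [-2.0,  1.0],     # Du_2(x)
                      [ 1.0, -3.0]])    # Du_3(x)

N = null_space(gradients.T)             # vectors lam with sum_i lam_i Du_i(x) = 0
lam = N[:, 0]
lam = lam if lam[0] > 0 else -lam       # fix the sign of the one-dimensional null space
print(lam, np.all(lam > 0))             # candidate weights, and whether all are positive
```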
9 Mixed Constraints

The goal of this section is simply to draw together results for equality constrained and inequality constrained maximization into a general framework. Conceptually, nothing new is added.

Let f: R^n → R, g_1: R^n → R, ..., g_ℓ: R^n → R, and h_1: R^n → R, ..., h_m: R^n → R. We consider the hybrid optimization problem

    max_{x ∈ R^n}  f(x)
    s.t.  g_j(x) = c_j,  j = 1, ..., ℓ
          h_j(x) ≤ d_j,  j = 1, ..., m,

incorporating any restrictions on the domain of f into the constraints.
The first order analysis extends from the previous sections. See Theorem 1 and Corollary 3 of Fiacco and McCormick (1968).

Theorem 9.1 Let f: R^n → R, g_1: R^n → R, ..., g_ℓ: R^n → R, h_1: R^n → R, ..., h_m: R^n → R be C^1. Suppose the first k inequality constraints are the binding ones at x ∈ R^n, and assume the gradients of the binding constraints,

    {Dg_1(x), ..., Dg_ℓ(x), Dh_1(x), ..., Dh_k(x)},

are linearly independent. If x is a constrained local maximizer of f subject to g_1(x) = c_1, ..., g_ℓ(x) = c_ℓ and h_1(x) ≤ d_1, ..., h_m(x) ≤ d_m, then there are unique multipliers λ_1, ..., λ_ℓ, μ_1, ..., μ_m ∈ R such that

    Df(x) = Σ_{j=1}^ℓ λ_j Dg_j(x) + Σ_{j=1}^m μ_j Dh_j(x)    (14)
    μ_j (d_j − h_j(x)) = 0,  j = 1, ..., m    (15)
    μ_j ≥ 0,  j = 1, ..., m.    (16)
As above, we can define the Lagrangian L: R^n × R^ℓ × R^m → R by

    L(x, λ_1, ..., λ_ℓ, μ_1, ..., μ_m) = f(x) + Σ_{j=1}^ℓ λ_j (c_j − g_j(x)) + Σ_{j=1}^m μ_j (d_j − h_j(x)),

and condition (14) from Theorem 9.1 is then the requirement that x is a critical point of the Lagrangian given multipliers λ_1, ..., λ_ℓ, μ_1, ..., μ_m.
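Numerically, mixed problems are handled the same way as before; for instance, scipy's SLSQP accepts equality and inequality constraints side by side. A minimal sketch with a toy problem of my own:

```python
# A minimal sketch of a mixed problem: maximize
#   f(x) = -(x1 - 2)^2 - (x2 - 1)^2 - x3^2
# subject to g1(x) = x1 + x2 + x3 = 2 and h1(x) = x1 <= 1.
import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: (x[0] - 2)**2 + (x[1] - 1)**2 + x[2]**2,
               x0=np.zeros(3), method='SLSQP',
               constraints=[{'type': 'eq',   'fun': lambda x: x[0] + x[1] + x[2] - 2},
                            {'type': 'ineq', 'fun': lambda x: 1 - x[0]}])
print(res.x)   # the inequality binds here: x1 = 1 at the solution
```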
Our results for quasi-concave objective functions with non-zero gradient go through
in the general setting, now with the assumption that all equality constraints are linear
and all inequality constraints are quasi-convex. Again, we rely on Theorem 7.2 for
the proof.
Theorem 9.2 Let f: R^n → R be C^1, let g_1: R^n → R, ..., g_ℓ: R^n → R be linear, and let h_1: R^n → R, ..., h_m: R^n → R be quasi-convex. Assume x ∈ R^n satisfies the constraints g_1(x) = c_1, ..., g_ℓ(x) = c_ℓ and h_1(x) ≤ d_1, ..., h_m(x) ≤ d_m and the first order conditions (14)-(16) with multipliers λ_1, ..., λ_ℓ, μ_1, ..., μ_m. Then x is a constrained global maximizer of f subject to g_1(x) = c_1, ..., g_ℓ(x) = c_ℓ and h_1(x) ≤ d_1, ..., h_m(x) ≤ d_m provided either of two conditions holds:

1. f is quasi-concave and Df(x) ≠ 0, or
2. f is concave.

With the above convexity conditions on the objective and constraints, if x is a constrained strict local maximizer, then it is the unique global maximizer.
Theorem 9.3 Let f: R^n → R be quasi-concave, let g_1: R^n → R, ..., g_ℓ: R^n → R be linear, and let h_1: R^n → R, ..., h_m: R^n → R be quasi-convex. If x ∈ R^n is a constrained strict local maximizer, then it is the unique constrained global maximizer of f subject to g_1(x) = c_1, ..., g_ℓ(x) = c_ℓ and h_1(x) ≤ d_1, ..., h_m(x) ≤ d_m.
Next, we have the standard second order necessary condition. See Theorems 2 and 3 in Fiacco and McCormick (1968).

Theorem 9.4 Let f: R^n → R, g_1: R^n → R, ..., g_ℓ: R^n → R, h_1: R^n → R, ..., h_m: R^n → R be C^2. Suppose the first k inequality constraints are the binding ones at x ∈ R^n, and assume the gradients of the binding constraints,

    {Dg_1(x), ..., Dg_ℓ(x), Dh_1(x), ..., Dh_k(x)},

are linearly independent. Assume x is a constrained local maximizer of f subject to g_1(x) = c_1, ..., g_ℓ(x) = c_ℓ and h_1(x) ≤ d_1, ..., h_m(x) ≤ d_m, and the first order conditions (14)-(16) hold with multipliers λ_1, ..., λ_ℓ, μ_1, ..., μ_m. Consider any direction t such that Dg_j(x)t = 0 for all j = 1, ..., ℓ and Dh_j(x)t = 0 for all j = 1, ..., k. Then

    t′ [ D²f(x) − Σ_{j=1}^ℓ λ_j D²g_j(x) − Σ_{j=1}^m μ_j D²h_j(x) ] t ≤ 0.

Again, strengthening the weak inequality to strict yields the second order sufficient condition for a local maximizer. See Theorem 4 of Fiacco and McCormick (1968).
Theorem 9.5 Let f: R^n → R, g_1: R^n → R, ..., g_ℓ: R^n → R, h_1: R^n → R, ..., h_m: R^n → R be C^2. Assume x ∈ R^n satisfies the constraints g_1(x) = c_1, ..., g_ℓ(x) = c_ℓ and h_1(x) ≤ d_1, ..., h_m(x) ≤ d_m and the first order conditions (14)-(16) with multipliers λ_1, ..., λ_ℓ, μ_1, ..., μ_m. Assume that for all directions t with Dg_j(x)t = 0 for all j = 1, ..., ℓ and Dh_j(x)t ≤ 0 for all binding inequality constraints j = 1, ..., k, we have

    t′ [ D²f(x) − Σ_{j=1}^ℓ λ_j D²g_j(x) − Σ_{j=1}^m μ_j D²h_j(x) ] t < 0.    (17)

Then x is a constrained strict local maximizer of f subject to g_1(x) = c_1, ..., g_ℓ(x) = c_ℓ and h_1(x) ≤ d_1, ..., h_m(x) ≤ d_m.
Adding strict complementary slackness, we obtain conditions under which solutions to mixed problems vary smoothly with respect to parameters. See Theorem 5.1 in Fiacco and Ishizuka (1990).

Theorem 9.6 Let Θ ⊆ R^m be open, and let f: R^n × Θ → R, g_1: R^n × Θ → R, ..., g_ℓ: R^n × Θ → R, h_1: R^n × Θ → R, ..., h_m: R^n × Θ → R be C^2. Consider (x*, θ*) ∈ R^n × Θ. Given θ*, assume that x* satisfies the constraints g_1(x*, θ*) = c_1, ..., g_ℓ(x*, θ*) = c_ℓ and h_1(x*, θ*) ≤ d_1, ..., h_m(x*, θ*) ≤ d_m, that the first k inequality constraints are the binding ones at x*, and that the gradients,

    {D_x g_1(x*, θ*), ..., D_x g_ℓ(x*, θ*), D_x h_1(x*, θ*), ..., D_x h_k(x*, θ*)},

are linearly independent; assume that x* satisfies the first order conditions at θ*, i.e.,

    D_x f(x*, θ*) = Σ_{j=1}^ℓ λ*_j D_x g_j(x*, θ*) + Σ_{j=1}^m μ*_j D_x h_j(x*, θ*)
    μ*_j (d_j − h_j(x*, θ*)) = 0,  j = 1, ..., m
    μ*_j ≥ 0,  j = 1, ..., m,

with multipliers λ*_1, ..., λ*_ℓ, μ*_1, ..., μ*_m, and that strict complementary slackness holds, i.e., μ*_j > 0 if and only if j ≤ k. Assume that for all directions t with D_x g_j(x*, θ*)t = 0 for all j = 1, ..., ℓ and D_x h_j(x*, θ*)t ≤ 0 for all j = 1, ..., k, we have

    t′ [ D²_x f(x*, θ*) − Σ_{j=1}^ℓ λ*_j D²_x g_j(x*, θ*) − Σ_{j=1}^m μ*_j D²_x h_j(x*, θ*) ] t < 0.

Then there are open sets Y ⊆ R^n with x* ∈ Y and Θ′ ⊆ Θ with θ* ∈ Θ′, and C^1 mappings φ_h: Θ′ → R, h = 1, ..., n, λ_i: Θ′ → R, i = 1, ..., ℓ, and μ_j: Θ′ → R, j = 1, ..., m, such that for all θ ∈ Θ′, (i) φ(θ) = (φ_1(θ), ..., φ_n(θ)) is the unique maximizer of f(·, θ) subject to g_1(x, θ) = c_1, ..., g_ℓ(x, θ) = c_ℓ and h_1(x, θ) ≤ d_1, ..., h_m(x, θ) ≤ d_m belonging to Y, (ii) φ(θ) satisfies the first order conditions (14)-(16) at θ with unique multipliers λ_1(θ), ..., λ_ℓ(θ), μ_1(θ), ..., μ_m(θ) with complementary slackness, and (iii) φ(θ) satisfies the second order sufficient condition (17) at θ with multipliers λ_1(θ), ..., λ_ℓ(θ), μ_1(θ), ..., μ_m(θ).
Next is our last version of the envelope theorem. See Theorem 4.1 in Fiacco and Ishizuka (1990).

Theorem 9.7 Let Θ ⊆ R^m be open, and let f: R^n × Θ → R, g_1: R^n × Θ → R, ..., g_ℓ: R^n × Θ → R, h_1: R^n × Θ → R, ..., h_m: R^n × Θ → R be C^1. Consider (x*, θ*) ∈ R^n × Θ. Let φ_h: Θ → R, h = 1, ..., n, λ_i: Θ → R, i = 1, ..., ℓ, and μ_j: Θ → R, j = 1, ..., m, be C^1 mappings such that x* = φ(θ*), λ* = (λ_1(θ*), ..., λ_ℓ(θ*)), and μ* = (μ_1(θ*), ..., μ_m(θ*)), and such that for all θ ∈ Θ, φ(θ) is a constrained local maximizer satisfying the first order conditions (14)-(16) at θ with multipliers λ_1(θ), ..., λ_ℓ(θ), μ_1(θ), ..., μ_m(θ). Define the mapping F: Θ → R by

    F(θ) = f(φ(θ), θ)

for all θ ∈ Θ. Then F is C^1, and

    DF(θ*) = DL(θ* | x*, λ*, μ*).

Finally, we can again use the envelope theorem to characterize λ*_j as the marginal effect of increasing the value of the jth equality constraint, and μ*_j as the marginal effect of increasing the value of the jth inequality constraint, which of course cannot reduce the maximized value of the objective.
References

[1] A. Fiacco and Y. Ishizuka (1990) "Sensitivity and Stability Analysis for Nonlinear Programming," Annals of Operations Research, 27: 215-236.

[2] A. Fiacco and G. McCormick (1968) Nonlinear Programming: Sequential Unconstrained Minimization Techniques, McLean, VA: Research Analysis Corporation.

[3] D. Gale (1960) The Theory of Linear Economic Models, Chicago, IL: University of Chicago Press.