
© 2012 Jeffrey A. Miron

Review of Calculus Tools

Outline
1. Derivatives
2. Optimization of a Function of a Single Variable
3. Partial Derivatives
4. Optimization of a Function of Several Variables
5. Optimization Subject to Constraints

Derivatives

The basic tool we need to review is the derivative. The intuitive definition of a
derivative is that it is the rate of change of a function in response to a change in its
argument. Let's take an example and look at it more slowly.
Say we have some variable y that is a function of another variable x, e.g.,

y = f(x)

For example, we could have

y = x^2

or

y = 7x + 3

or

y = ln x
Graphically, I am just assuming that we have something that looks like the following:

Graph: A Standard Differentiable Function with a Maximum (y = -(x - 3)^2 + 8)

Now say that we are interested in knowing how y will change if we change x.
Let's say that y is test scores, and x is hours of studying.
Assume we are initially at some amount of x; e.g., you have been in the habit of
studying 20 hours per week. You want to know how much higher your test scores
would be at some other amount of x, x + h.
One thing you could do, if you know the formula, is take this alternate x + h and
compute f(x) as well as f(x + h). You could then look at the difference:

f(x + h) - f(x)

This would be the change in y. For some purposes, that might be exactly what you
care about.
In other instances, however, you might care about not just how much of a change
there would be, but how much per amount of change in x, i.e., per h. That is also
easy to calculate:

(f(x + h) - f(x)) / h

Now look at this graphically:

Graph: Calculating the Rate of Change in f Over a Discrete Interval (y = 4x^(1/2); the vertical leg of the triangle is f(x + h) - f(x), and the horizontal leg is h, between x and x + h)

As you can see, we are just calculating the ratio of two legs of a triangle; that
ratio is the slope of the line that connects the two points, as seen above.
The problem is that this calculation would give a different answer if we calculated
it at a different point:

Graph: Calculating the Rate of Change at a Different Point (y = 4x^(1/2); the vertical leg is f(x + h') - f(x), and the horizontal leg is h', between x and x + h')

So, what if we calculated all this, but for a smaller h? Then we would get a
different rate of change/slope:

Graph: Rate of Change as h Shrinks (y = 4x^(1/2))

So, let's think about the limiting case of this. Say we examine

lim_{h→0} (f(x + h) - f(x)) / h

At one level, this expression might seem a bit confusing or ill-defined. The numerator
obviously goes to zero as h gets small. But the denominator also goes to zero. So,
why should we expect the limit to converge to anything?
The proof is outside this course. But, looking at the graph, we can see that it
seems plausible that as we let h go to zero, the ratio should approach the slope of
the line that is tangent to the function.
This is indeed the case, and it can be proven, but we will just accept it as
reasonable.
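A concrete example makes this plausible. Take f(x) = x^2. Then

(f(x + h) - f(x)) / h = (x^2 + 2xh + h^2 - x^2) / h = 2x + h

which approaches 2x as h goes to zero, even though the numerator and the denominator each go to zero separately.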
To summarize, we have shown that the rate of change of a function at a given
point (assuming it has a well-defined rate of change) is equal to the slope of the line
that is tangent to the curve at that point.

So, we simply want to define the derivative as

dy/dx = f'(x) = lim_{h→0} (f(x + h) - f(x)) / h

The key thing to keep in your head is that the derivative is both:
1) the rate of change of the function at that point, and
2) the slope of the tangent line at that point.
Here are a few additional things to consider:
1) The derivative is usually different at different points.
2) Some functions do not have derivatives at all points:

Graph: Functions with Non-Differentiabilities (two panels: a function with a kink at x', and y = (x + 3)/x + 4, which has no derivative at x' = 0)

3) We know the formula for the derivatives of a lot of functions:

constant
linear
polynomial
x to any power
ln x
e^x

and many more, but we will only need the ones above.
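For reference, here are the formulas for those cases (c, a, b, and n are constants):

d/dx [c] = 0
d/dx [ax + b] = a
d/dx [x^n] = nx^(n-1)
d/dx [ln x] = 1/x
d/dx [e^x] = e^x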
4) We also know some rules about combinations of functions.
The product rule: if

f(x) = g(x)h(x)

then

f'(x) = g(x)h'(x) + h(x)g'(x)
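For example, applying this with g(x) = x^2 and h(x) = ln x, so that f(x) = x^2 ln x:

f'(x) = x^2 * (1/x) + (ln x)(2x) = x + 2x ln x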


The chain rule: if

f(x) = g(h(x))

then

f'(x) = g'(h(x))h'(x)
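For instance, with g(u) = u^2 and h(x) = 3x + 1, so that f(x) = (3x + 1)^2:

f'(x) = 2(3x + 1) * 3 = 18x + 6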

Optimization of a Function of a Single Variable

So far we have talked about the idea that the change in a variable y that depends
on a variable x, per unit of x, might be a useful thing to measure in some settings.
And we have seen that the derivative, which we have defined as the change in y per
unit of x for small changes in x, seems to measure that concept.
But we have not been that explicit about why derivatives are useful in economics.
We'll take a step in that direction now.
So, imagine that we have some y that depends on x, and we control x. We know
that different values of x lead to different values of y, and we want to choose the x
that gives us the highest y.
For example, assume y is a measure of happiness, and x is the number of pints
of Ben and Jerry's that a consumer eats each night. You might think that for
small values of x, y increases with x. But at some point, as x increases, happiness
decreases (because you can feel your arteries clogging as you eat your 8th pint that
night).
Graphically, we have

Graph: A Single-Peaked Function (y = -(x - 3)^2 + 8)

So, graphically, it's easy to pick the right point.
The key thing about this point, other than the fact that it is where y
is highest, is that the slope at that point, i.e., the derivative, is zero.
So, this suggests a strategy for finding the x that leads to the maximum y: take
the derivative, set it equal to zero, and then solve.
That is, compute

f'(x)

set this to zero

f'(x) = 0

and solve for x.
This kind of equation is known as the first-order condition (FOC) for a maximum.


The phrase "first-order" is important; it suggests that this is not the whole story,
and that there may be second-order things we have to worry about. Let's leave
that aside for a second.
Intuitively, it seems clear (and one can prove rigorously under some assumptions)
that the x that satisfies this condition is the x at which the maximum y occurs.
There are some caveats, but ignore them for a moment and look at an example.
Let's say

y = f(x) = -x^2 + 6x + 4.

Then the problem we want to solve can be written as

max_x [-x^2 + 6x + 4]

We therefore compute the derivative

f'(x) = -2x + 6

set this to zero

-2x + 6 = 0

and solve for x; x turns out to be 3.
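If you ever want to check this kind of calculation mechanically, here is a minimal sketch using Python's sympy library (the library choice is just an illustration; any computer algebra system would do):

    import sympy as sp

    x = sp.symbols('x')
    y = -x**2 + 6*x + 4

    # First-order condition: differentiate, set equal to zero, solve.
    foc = sp.Eq(sp.diff(y, x), 0)
    print(sp.solve(foc, x))    # [3]

    # The second derivative is -2 < 0, so x = 3 is indeed a maximum.
    print(sp.diff(y, x, 2))    # -2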

2.1 Caveats

The graph that I drew, and the example I considered, had "nice" features:
1. exactly one peak
2. definitely had a max
3. everywhere differentiable
This is not true for all functions:


Graph: Functions Without Well-Defined Maxima (three panels: a function with no max or min; a flat function with infinitely many points where max = min; a function with a min but no max)

So, the condition we have stated, the FOC, is not sufficient for a point to be a
maximum.
Indeed, it is not even necessary, if we allow for functions that are not differentiable.
For differentiable functions, there is a standard approach that handles these weird
cases. This method is known as the second-order conditions (SOCs).
It basically says that the second derivative has to be negative at the candidate
point, i.e., f''(x) < 0, for that point to be a maximum.
What is a second derivative? It's just a derivative of a derivative. And you
probably remember, or can at least see intuitively, why this makes sense: if the
second derivative is negative, the derivative is getting smaller.
Don't worry about this for now. I will review it again in a few examples where
it is relevant later.
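In the example above, for instance, f'(x) = -2x + 6, so f''(x) = -2 < 0 everywhere; the point x = 3 that satisfies the FOC is therefore indeed a maximum.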
Most, although not all, of the problems we examine are "nice." For now, I want
you to be aware of the fact that some problems are not; we will see some examples
where this is relevant later. But it's not the key thing to focus on now. Just be
sure to understand the intuition and mechanics of the FOC.
To be clear, it is very important that you be aware that the FOC is not a sufficient
condition; there are special cases where the point that satisfies the FOC is not the
maximizing point. But we're not going to worry about the details yet, or to a
significant degree in this course overall.
NB: everything I've said is applicable for finding minima instead of maxima. That
is one reason we have to check the SOCs. But again, in most applications that we
will consider, this will take care of itself.

Partial Derivatives

The next, and basically last, calculus topic that we need is partial derivatives.
The reason is that many interesting economics examples relate one variable, say
y, to two (or more) other variables, say k and l. A common example can be found
in a production function:

y = f(k, l)

or in a utility function:

u = u(x1, x2)

So, the standard calculus of one variable is not sufficient.
Imagine that we have a function of two variables, e.g.,

y = f(x, z)

Now, this is a bit more of a pain graphically.


But, in principle, we can draw this:

Graph: A Function of Two Variables (z = -(x - 4)^2/8 - (y - 4)^2/8 + 8)

So, y changes in response to both x and z.


If we held one variable constant, that is, looked at a particular slice of this
picture in either the x or z direction, we would see a univariate function.
If we were only working with that, then we might just apply the standard approach from before.
So, we might consider the rate of change of y with respect to either one of those
variables.
It is therefore natural to define what are called partial derivatives:

∂y/∂x = lim_{h→0} (f(x + h, z) - f(x, z)) / h

Now, this might look messy. But it simply treats z as a constant, and then takes
a standard derivative.
This is easiest to see by considering examples. Assume

y = xz

Then

∂y/∂x = z.
Why? Because if we treat z as a constant, then y equals just a constant times x,
and we know how to take that derivative.
What exactly is this partial telling us? It is telling us the rate at which y changes
as we change x, holding z constant.
Furthermore, it makes sense that this depends on the value of z. Take z = 0: then changing x has no effect on y.
Of course, we could also think about the effect of z on y. To calculate that, we
take the derivative of y with respect to z, treating x as a constant:

∂y/∂z = x.
So, if we have a function

y = f(x1, x2, ..., xn)

i.e., a function of n variables, there will be exactly n partial derivatives.
More examples: Let

y = ax + bz + cq

Then

∂y/∂x = a
∂y/∂z = b
∂y/∂q = c.


Now say

y = x^2 z^3

Then

∂y/∂x = 2xz^3
∂y/∂z = 3x^2 z^2.
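These computations are also easy to verify mechanically; here is a minimal sketch, again using the sympy library purely as an illustration:

    import sympy as sp

    x, z = sp.symbols('x z')
    y = x**2 * z**3

    # Each partial treats the other variable as a constant.
    print(sp.diff(y, x))    # 2*x*z**3
    print(sp.diff(y, z))    # 3*x**2*z**2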
Or, let

u(x1, x2) = x1 x2 - x1^2

Then

∂u(x1, x2)/∂x1 = x2 - 2x1
∂u(x1, x2)/∂x2 = x1

3.1 Discussion

You need to know two things about partials.
First, given a general function or some specific function, you should know how to
calculate them.
That should be pretty straightforward: once you understand the approach (treat
all other variables as constants, and then apply standard rules from univariate
calculus), it's a totally mechanical application of univariate calculus.
Second, you need to know how to interpret partials.
This again should not be hard; it is just a tad different from the univariate case,
but in a way that matters.

In words, the partial of a function with respect to one argument is the rate of
change in the function in response to a small change in that argument, holding the
other arguments fixed.
This is different from adjusting both arguments.
For example, increasing a consumer's consumption of goods 1 and 2 is normally
going to have a different effect on utility than just increasing, say, good 1.
As a second example, increasing both K and L will have a different effect than,
say, increasing L and holding K constant.
We'll see this in practice soon.

Optimization of Functions of Several Variables

The last topic we need to consider is how to find the maximizing values for functions
of several variables.
Indeed, this is the case of real interest, since key examples in economics are of
this variety.
That is what creates all the tension about how much math to use in intermediate
courses.
Everyone agrees that it's nice to be able to use calculus. But it turns out that
we need just a little bit of multivariate calculus.
Virtually all basic calculus courses, however, focus only on univariate, rather than
multivariate, calculus; in particular, they do not teach partial derivatives. Thus, in
most sequences, you do linear algebra and then multivariate calculus. This makes
sense, since you need linear algebra (but only a tiny amount) for some parts of
multivariate calculus. But this standard approach makes life difficult.
So, the key tool we need to do micro theory with calculus is partial derivatives.
That means that if we cannot use partials, the benefits of using calculus are not
large; that's why most books put calculus in an appendix, or skip it entirely.

That's also why many departments do not require calculus for an econ major;
Harvard did not until 10 or 15 years ago.
But this seems nutty to me: for good students who have had some introduction
to basic calculus, learning a partial derivative is not a big deal; it's really just a baby
step away from what you already know. Indeed, if you think about it the right way,
you already know what a partial is, as we have seen.
Now we can see why it is useful.
Let's first consider an abstract example, because there's one small wrinkle, which
comes in when we get to the economics examples, that I want to leave aside for the
moment.
Say we have

y = f(x1, x2)

We know that if f is a "smooth" function, it could look something like:


Graph: A Smooth Function of Two Variables (z = -(x - 4)^2/8 - (y - 4)^2/8 + 8)

We also know we could think about this in only one of two dimensions.
Then this would look like:


Graph: A Slice of the Graph Above (z = -(x - 4)^2/8 - (4 - 4)^2/8 + 8, i.e., the slice at y = 4)

So, intuitively, we want to make sure we're at a peak from either angle.
Well, looking from either angle is like holding one of x1 or x2 fixed.
So, say we do the following: calculate

∂y/∂x1

and

∂y/∂x2

set both to zero, and find the combination of x1 and x2 that simultaneously
solves the two equations.
I am assuming you can see intuitively that this is analogous to the univariate
case.
In words, if we are at the combination of x1 and x2 that produces the maximum
y, then two things must be true:

1) A small change in x1 leads to a decrease in y, whichever direction we go in.


2) A small change in x2 leads to a decrease in y, whichever direction we go in.
Thus we have two conditions involving two unknowns, and we can solve these.
These two equations need not be linear.
Under some regularity assumptions, there will be an x1 and an x2 that works.
Take for an example

y = -3x1^2 - 2x2^2 + 5x1x2 + x1 + x2

Then, the FOCs are

∂y/∂x1 = -6x1 + 5x2 + 1 = 0

∂y/∂x2 = -4x2 + 5x1 + 1 = 0

This is just two linear equations in two unknowns. We can easily solve for x1 and x2.
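Solving them gives x1 = -9 and x2 = -11. As a mechanical check, here is a minimal sympy sketch (again just an illustration):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    y = -3*x1**2 - 2*x2**2 + 5*x1*x2 + x1 + x2

    # Set both partials to zero and solve the resulting system.
    focs = [sp.Eq(sp.diff(y, v), 0) for v in (x1, x2)]
    print(sp.solve(focs, [x1, x2]))    # {x1: -9, x2: -11}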
Of course, these are just first-order conditions. As with the univariate case, we
have to worry about whether we're getting a max or a min; we also have to worry
about kinks and boundaries, etc.
Ignore all this for the time being, but be aware that there could be an issue. We'll
look more carefully at some particular cases as needed.

Optimization Subject to Constraints

So far, we have talked about the multivariate case without allowing for the
possibility of constraints.
We are going to finesse that issue for the most part, in ways you will see shortly.
So, it is again something we will have to worry about a bit, but it's best handled
case-by-case with specific examples, rather than with general theory.
