Advanced Mathematics 2014 2015 TOTAAL Week 1 - Week 7 26 March 2015

Advanced Mathematics
2014/2015
Weeks 1-7
This version: 26 March 2015
Utrecht University School of Economics
Prof. dr. Wolter Hassink (lectures and coordinator)

w.h.j.hassink@uu.nl
Oke Onemu, MSc (tutorials) O.A.Onemu@uu.nl
Jochem Zweerink, MSc. (tutorials) J.R.Zweerink@uu.nl

ADVANCED MATHEMATICS CONTENTS
General information
Week 1 Introductory material
  Lecture
  Tutorial
  Take home assignment
  Additional exercises
Week 2 Linear algebra (I)
  Lecture
  Tutorial
  Take home assignment week 2
Week 3 Linear algebra (II)
  Lecture
  Tutorial
  Take home assignment week 3
Week 4 Calculus
  Lecture
  Tutorial
Week 5 Optimization (I)
  Lecture
  Tutorials
Week 6 Optimization (II) and integrals (I)
  Lecture
  Tutorials
Week 7 Integral calculus and dynamic analysis (I)
  Lecture
  Tutorials
Week 8 Dynamic analysis (II)
ADVANCED MATHEMATICS GENERAL INFORMATION
Assessment method
Midterm (50%) on Friday 6 March 2015, 1:30 p.m. - 4:30 p.m.
Location: Educatorium Gamma (on the material of weeks 1-4)
Endterm (50%) on Friday 10 April 2015, 1:30 p.m. - 4:30 p.m.
Location: Olympos Hall 3 (on the material of weeks 5-8)
In the week of 2 March: no education
Retakes: in the retake week after period 4
Replacement retake exam (4.0 <= final grade < 5.0)
o On all material (weeks 1-8): no restriction on the revised grade after the retake.
Supplementary retake exams (5.0 <= final grade < 5.5)
o On all material (weeks 1-8): no restriction on the revised grade after the retake.
o On the material of the endterm (weeks 5-8): the revised grade can be 6 at maximum.
Effort requirement
In weeks 2-8, students have to hand in individual
assignments (deadline: Friday 4:30 p.m. of that week), in which
the exercises are on the material of the week before. At the end
of the course, the assignments of two randomly drawn weeks
will be checked. If both assignments are of sufficient quality, the
effort requirement is met.
Group    Week             Day        Time           Building         Room
Lecture  6 - 9, 11 - 14   Monday     11.00 - 12.45  Androclus        C101
1        6 - 9, 11 - 14   Wednesday  09.00 - 10.45  Adam Smith Hall  113
3        6 - 9, 11 - 14   Wednesday  09.00 - 10.45  Descartes Hall   210
1        6 - 9, 11 - 13*  Friday     09.00 - 10.45  Adam Smith Hall  113
3        6 - 9, 11 - 13*  Friday     09.00 - 10.45  Descartes Hall   210
In week of 2 March (week 10): no education

No tutorial on April 3 (week 14)!
Academic Skills
Problem solving
Effective teamwork.
Please contact Prof. Wolter Hassink if you would like to sign up
for academic skills.
The course
Week 1 Introductory material

Week 2 Linear Algebra (I)
Week 3 Linear Algebra (II)
Week 4 Calculus - differentiation
Week 5 + first half week 6 Optimization
Second half week 6 Constrained optimization and Integral
calculus
Week 7 Constrained optimization and Integral calculus
Week 8 Dynamic analysis
ADVANCED MATHEMATICS LECTURE WEEK 1
Week 1 - Introductory material

The material in the slides (compulsory for the exam) is more extensive than the book.

Topic                                                    Reference
Functional notation (domain, range etc.)                 Klein 2.1 [K 2.1]
Graphs of univariate and multivariate functions          K 2.1
Limits, continuity; continuous functions                 K 2.1
Properties of functions: monotonous, convex, concave,    K 2.2
  injective, surjective, inverse, homogeneous
Necessary conditions, sufficient conditions              K 2.2
Exponential function                                     K 2.3, K 3.2
Rules of exponential functions                           K 2.3
Logarithm (as inverse of exponential function)           K 3.3
Rules of logarithms                                      K 3.3
Structure of mathematical proofs                         Supplemental material
Summations and multiplications                           Supplemental material
Sets
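Definition: Set: collection of elements
Example 1:
let
$\mathbb{N}$: the set of all non-negative integers, $\mathbb{N} = \{0, 1, 2, \ldots\}$
Set of positive integers: $\mathbb{N}^{+} = \{1, 2, \ldots\}$
$\mathbb{Z}$: the set of all integers, $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$
$\mathbb{Q}$: the set of all rational numbers of the form $a/b$, where $a$ and $b$ are both integers and $b \neq 0$
$\mathbb{R}$: the set of all real numbers (includes both rational and irrational numbers).
Definition: Irrational numbers: real numbers that are not rational, e.g.: $\sqrt{3}$, $1/\sqrt{2}$, $\pi$, $e$
Example 2:
let $S = \{0, 1, 2, \ldots, 10\}$
An element may belong to a set (or: may be a member of a set). Thus $x \in S$
Example 3:
integer $1 \in S$
integer $-1 \in \mathbb{Z}$
Irrational number $\pi$ (= 3.141...) $\in \mathbb{R}$ and $e$ (= 2.71...) $\in \mathbb{R}$
However, $0.5 \notin S$ and $\pi \notin \mathbb{Q}$ and $e \notin \mathbb{Q}$
Definition: Sub-set
Example 4: let $T = \{0, 1\}$
T is a sub-set of S: $T \subset S$ or $S \supset T$
$\subset$: inclusion symbol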
Definition: union of sets
Example 5: $S \cup T = S$
All elements that belong to S or T (or both).
Definition: intersection of sets:
Example 6: $S \cap T = T$
Elements that belong to both S and T.
Definition: empty set
Example 7: $V = \{9, 10\}$
$V \cap T = \emptyset$
The sets V and T have no elements in common. They are disjoint.
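The set operations of Examples 2 to 7 can be tried out directly. The following is a minimal sketch using Python's built-in set type; the variable names are chosen here for illustration and are not part of the slides.

```python
# Sets from Examples 2, 4 and 7 of the slides.
S = set(range(0, 11))   # S = {0, 1, ..., 10}
T = {0, 1}
V = {9, 10}

union_ST = S | T                      # elements that belong to S or T (or both)
intersection_ST = S & T               # elements that belong to both S and T
T_is_subset_of_S = T <= S             # sub-set test
V_and_T_disjoint = V.isdisjoint(T)    # True when V and T share no elements
```

Because T is a sub-set of S, the union collapses to S and the intersection to T, as in Examples 5 and 6.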
Functions
Definition: Function, Mapping (or transformation): assigns to each element of set X an element of set Y
Function: $f: X \to Y$
Set X: domain on which the function is defined.
Set Y: range of the function
Note: the range can be broad, i.e. Y may contain more values than f actually attains.
Univariate functions
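Definition: one member of the domain is related to one member of the range
$y = f(x)$
Definition: x is the argument of the function f(x)
Example 8:
$y = 2 + 3x$
Domain: $\mathbb{R}$
Range: $\mathbb{R}$
Example 9:
$y = 2 + 3|x|$
Domain: $\mathbb{R}$
Range: $[2, \infty)$
Example 10:
$y = 2 + 3\sqrt{x}$
Domain: $[0, \infty)$
Range: $[2, \infty)$
Example 11:
$y = 2 + \frac{3}{x}$
Domain: $(0, \infty)$
Range: $(2, \infty)$
Example 12:
$y = \alpha + \frac{\beta}{x}$
Domain: $(0, \infty)$
Range: $(\alpha, \infty)$ (for $\beta > 0$)
Definition: $\alpha$ and $\beta$ are parameters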
Multivariate functions
Definition: several independent variables and one dependent variable
$y = f(x_1, x_2, \ldots, x_n)$
The subscript of x identifies the variable.
Necessary and sufficient conditions: some logic
Definition:
Whenever P is true, Q is necessarily true.
Q is a necessary condition for P:
P is a sufficient condition for Q:
$P \Rightarrow Q$
Read: "if P then Q", or "Q is the consequence of P"
Example 13:
X is a square $\Rightarrow$ X is a rectangle
A sufficient condition for X to be a rectangle is that X be a square.
or
A necessary condition for X to be a square is that X be a rectangle.
Example 14:
Person is healthy $\Rightarrow$ Person breathes without difficulty
Sufficiency: A sufficient condition for "Person breathes without difficulty" is that "Person is healthy"
Necessity: A necessary condition for "Person is healthy" is that "Person breathes without difficulty"
Wrong implications (reverse implication of above):
A person who breathes without difficulty is necessarily healthy.
and
Breathing without difficulty is a sufficient condition for a person to be healthy.
Example 15:
$x = 5 \Rightarrow x^2 = 25$
$x = -5 \Rightarrow x^2 = 25$
Example 16:
$xy = 0 \Rightarrow x = 0$ or $y = 0$
Example 17:
$x = 0$ or $y = 0 \Rightarrow xy = 0$
Definition:
P is a necessary and sufficient condition for Q:
$P \Leftrightarrow Q$ is equivalent to
$P \Rightarrow Q$ and $Q \Rightarrow P$
Read: P if and only if Q
Implication
$P \Leftrightarrow Q$ is equivalent to
both $P \Rightarrow Q$ and not $P \Rightarrow$ not $Q$
Example 18:
$x = 0$ or $y = 0 \Leftrightarrow xy = 0$
Example 19:
$x = 5$ or $x = -5 \Leftrightarrow x^2 = 25$
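The difference between an implication and an equivalence can be checked on a small grid of integers. This is only an illustrative sketch: the helper `implies` and the grid `xs` are assumptions made here, and a finite check is not a proof.

```python
def implies(p: bool, q: bool) -> bool:
    """Truth value of the implication P => Q."""
    return (not p) or q

xs = range(-10, 11)  # a small, arbitrary grid of integers

# Example 15: x = 5 => x^2 = 25 holds for every x on the grid.
forward_holds = all(implies(x == 5, x**2 == 25) for x in xs)

# The reverse implication x^2 = 25 => x = 5 fails somewhere on the grid.
reverse_holds = all(implies(x**2 == 25, x == 5) for x in xs)
counterexamples = [x for x in xs if x**2 == 25 and x != 5]

# Example 19: (x = 5 or x = -5) <=> x^2 = 25 holds in both directions.
equivalence_holds = all((x == 5 or x == -5) == (x**2 == 25) for x in xs)
```

The single counterexample to the reverse implication is exactly the second root, which is why Example 19 needs both values on the left-hand side.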
Structure of a mathematical proof I - direct proof
Direct proof:
$P \Rightarrow Q$
P: set of propositions. Also referred to as premise (what we know)
Q: set of propositions. Also referred to as conclusion (what we want to know)
Indirect proof (proof by contrapositive):
Equivalent structure of a direct proof:
$P \Rightarrow Q$ is equivalent to not $Q \Rightarrow$ not $P$
$P \Rightarrow Q$ is equivalent to $\neg Q \Rightarrow \neg P$
Example 20:
Direct structure: If it is raining, the grass is getting wet.
Indirect structure: If the grass is not getting wet, then it is not raining.
Proof by contradiction - general structure
$P$ and not $Q$ together are impossible, or they lead to a contradiction
Proof: Step 1
Assume that the result (Q) which must be proved is false
Step 2
Combine this assumption with the information that was given (P) and
any other useful statements that are true.
Step 3
Deduce from step 2 a statement which contradicts a known fact.
Structure of a mathematical proof II - proof by contradiction
Usually, proofs by contradiction are used for negative results
Example 21:
Proposition: There is no largest number in $\mathbb{Z}$
Proof:
Step 1 Assume that there is a largest number in $\mathbb{Z}$, call it x.
Step 2 Consider $x + 1$. Since the sum of two integers is again an integer, $(x + 1) \in \mathbb{Z}$. But $x + 1 > x$. This contradicts the assumption that x is the largest number in $\mathbb{Z}$.
Step 3 Hence there is no largest number in $\mathbb{Z}$.
Structure of a mathematical proof - proof by induction
Structure
Step 1
Show that the statement is true for n = 1
Step 2
Assume that the statement is true for an arbitrary n = k and demonstrate that it
is also true for n = k + 1
Step 3
Conclusion: Statement is valid for all n.
Example 24
$1 + 2 + \ldots + n = \frac{n(n+1)}{2}$
Step 1
The statement is true for n = 1: $1 = \frac{1 \cdot 2}{2}$
Step 2
Assume that the statement is true for n = k:
$1 + 2 + \ldots + k = \frac{k(k+1)}{2}$
then
$1 + 2 + \ldots + k + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{k(k+1)}{2} + \frac{2(k+1)}{2} = \frac{k^2 + k + 2k + 2}{2} = \frac{k^2 + 3k + 2}{2} = \frac{(k+1)(k+2)}{2}$
Step 3
The statement is valid for all n
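A numerical spot check can complement the induction argument of Example 24, although it does not replace the proof. The sketch below compares the term-by-term sum with the closed form; the function names and the upper bound of 200 are arbitrary choices made here.

```python
def sum_first_n(n: int) -> int:
    """Left-hand side: 1 + 2 + ... + n, added term by term."""
    return sum(range(1, n + 1))

def closed_form(n: int) -> int:
    """Right-hand side of Example 24: n(n + 1)/2."""
    return n * (n + 1) // 2

# Step 1 (n = 1) and the cases up to n = 200.
formula_holds = all(sum_first_n(n) == closed_form(n) for n in range(1, 201))

# Step 2 checked numerically: adding k + 1 to the formula for k
# gives the formula for k + 1.
step_holds = all(closed_form(k) + (k + 1) == closed_form(k + 1) for k in range(1, 201))
```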
Functions and limits
Definition: The limit
$\lim_{x \to a} f(x) = L$
means that for any arbitrarily small number $\varepsilon > 0$ there exists a small number $\delta > 0$ such that
$|f(x) - L| < \varepsilon$ whenever $0 < |x - a| < \delta$
Thus, when calculating the limit, we must consider the $\delta > 0$ for which this is a true statement.
Example 25:
Take a constant function $f(x) = 10$
$\lim_{x \to 5} f(x) = \lim_{x \to 5} 10 = 10$
Take any $\varepsilon > 0$
$|f(x) - 10| = 0 < \varepsilon$ whenever $0 < |x - 5| < \delta$
True, because $|f(x) - 10| = 0$ and 0 is smaller than $\varepsilon$ for any positive $\varepsilon$, whatever value of $\delta$ is taken.
Example 26:
Take the identity function $f(x) = x$
$\lim_{x \to 3} f(x) = \lim_{x \to 3} x = 3$
$|x - 3| < \varepsilon$ whenever $0 < |x - 3| < \delta$
This is true if we take $\delta = \varepsilon$
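The definition can be explored numerically by sampling points with $0 < |x - a| < \delta$ and testing whether $|f(x) - L| < \varepsilon$. The helper `delta_works` below is a hypothetical name introduced for this sketch; passing the check on finitely many sample points suggests, but does not prove, that a given $\delta$ works.

```python
def delta_works(f, a, L, eps, delta, samples=1000):
    """Check |f(x) - L| < eps at sample points with 0 < |x - a| < delta."""
    for i in range(1, samples + 1):
        h = delta * i / (samples + 1)      # 0 < h < delta
        for x in (a - h, a + h):
            if not abs(f(x) - L) < eps:
                return False
    return True

def identity(x):
    return x            # Example 26

def constant(x):
    return 10           # Example 25

# Example 26: delta = eps works for the identity function at a = 3.
identity_ok = all(delta_works(identity, 3, 3, eps, eps) for eps in (1.0, 0.1, 0.001))

# Example 25: for the constant function any delta works, here delta = 100.
constant_ok = all(delta_works(constant, 5, 10, eps, 100.0) for eps in (1.0, 0.1, 0.001))

# A delta that is too large fails: eps = 0.1 with delta = 1 for the identity function.
too_large_fails = not delta_works(identity, 3, 3, 0.1, 1.0)
```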
Continuity of functions (I)
Definition: a function $f(x)$ is continuous at a point x = a if
a) $\lim_{x \to a} f(x) = f(a)$
and
b) the function $f(x)$ is defined at x = a.
Definition: If a function is continuous at every point of its domain, it
is called a continuous function.
Example 27
A constant function $f(x) = c$ is continuous at every point of its domain. Take $f(x) = 10$ and any point x = a. We know that $f(a) = 10$ for any point x = a.
$\lim_{x \to a} f(x) = \lim_{x \to a} 10 = 10 = f(a)$
Example 28
The identity function $f(x) = x$ is continuous at every point of its domain. Take any point x = a. We know that $f(a) = a$ for any point x = a.
$\lim_{x \to a} f(x) = \lim_{x \to a} x = a = f(a)$
which is true for all values of a.
Example 29
Let $f(x) = \frac{1}{x - 8}$ for $x \neq 8$ and $f(x) = 1$ for $x = 8$.
The function is discontinuous at x = 8, because $\frac{1}{x - 8}$ is unbounded near x = 8, so the limit there does not exist.
This slide and the next six slides: A graphic interpretation of $\lim_{x \to 3} 2x = 6$ (figures not reproduced).
First we pick a (random) $\varepsilon > 0$.
Then we find a suitable $\delta > 0$. Note that $\delta$ is not unique: any smaller value certainly would also have sufficed. All that matters is that we can find one that works.
The outcomes within the range of $\delta$ (indicated by the red line) are all inside the range of $\varepsilon$, which is what we want for our limit.
Note that $\lim_{x \to 3} 2x = 6$ is unique. We can't make the same graph for $\lim_{x \to 3} 2x = 7$, because it isn't true:
Now if we pick a certain $\varepsilon > 0$:
There is no $\delta > 0$ that works:
The red line, the outcomes from within the range of $\delta$, does not fall within the range of $\varepsilon$. Hence 7 is not the correct limit. Of course here we show this only for a particular $\delta$, but with some thought you should be able to see that things don't work out for any $\delta$.
More on limits (I)
If $\lim_{x \to a} f(x) = A$ and $\lim_{x \to a} g(x) = B$
o $\lim_{x \to a} [f(x) + g(x)] = \lim_{x \to a} f(x) + \lim_{x \to a} g(x) = A + B$
o $\lim_{x \to a} [f(x) - g(x)] = \lim_{x \to a} f(x) - \lim_{x \to a} g(x) = A - B$
o $\lim_{x \to a} f(x) g(x) = A \cdot B$
o $\lim_{x \to a} f(x) / g(x) = A / B$ if $B \neq 0$
Which functions are continuous functions?
If f, g are continuous functions, then so are the functions
$f + g$, $f - g$, $f \cdot g$, $f / g$. For the last one take care that it is well defined, i.e. that $g(x) \neq 0$.
Any polynomial function is continuous.
Any rational function (a polynomial divided by a polynomial, e.g. $\frac{x^3 + 3x}{x^2 + 2}$) is continuous wherever its denominator is non-zero.
The exponential and the logarithmic function are continuous.
More on limits (II)
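$\lim_{x \to \infty} x^a = \infty$ if $a > 0$; $\lim_{x \to \infty} x^a = 0$ if $a < 0$
$\lim_{x \to \infty} e^x = \infty$; $\lim_{x \to -\infty} e^x = 0$
$\lim_{x \to \infty} \log(x) = \infty$; $\lim_{x \downarrow 0} \log(x) = -\infty$
If f is a continuous function, then for $\lim_{x \to \infty} f(x)$ (or $\lim_{x \to -\infty} f(x)$) you can try to plug in $\infty$ (or $-\infty$) and treat them as normal numbers. This may or may not give you an answer. See the rules for infinity (below) to see how you can treat them as numbers.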
Rules for infinity
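For any real number a
$a + \infty = \infty$
$a - \infty = -\infty$
$a \cdot \infty = \infty$ if $a > 0$, $a \cdot \infty = -\infty$ if $a < 0$
$a^{\infty} = \infty$ if $a > 1$, $a^{\infty} = 0$ if $0 < a < 1$
$\infty^{a} = \infty$ if $a > 0$, $\infty^{a} = 0$ if $a < 0$
Expressions such as $\infty - \infty$, $0 \cdot \infty$, $\infty / \infty$, $\infty^{0}$ and $1^{\infty}$ are not defined.
Note that if an expression in infinities is not defined, it does not necessarily mean that the limit that gave rise to it is also not defined. For example, plugging in gives $\lim_{x \to \infty} (x^2 - x) = \infty - \infty$, which is not defined. However, rewriting it gives $\lim_{x \to \infty} (x^2 - x) = \lim_{x \to \infty} x(x - 1) = \infty \cdot (\infty - 1) = \infty$. If you find an expression like $\infty - \infty$ it only means that you cannot solve your limit in the way that you tried.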
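Polynomial functions - many examples (Example 30):
$\lim_{x \to \infty} \frac{x^2 + 1}{x^3} = \lim_{x \to \infty} \frac{x^2}{x^3} + \lim_{x \to \infty} \frac{1}{x^3} = \lim_{x \to \infty} \frac{1}{x} + \lim_{x \to \infty} \frac{1}{x^3} = 0$
$\lim_{x \to \infty} \frac{x^3 + 1}{x^3} = \lim_{x \to \infty} \frac{x^3}{x^3} + \lim_{x \to \infty} \frac{1}{x^3} = \lim_{x \to \infty} 1 + \lim_{x \to \infty} \frac{1}{x^3} = 1 + 0 = 1$
$\lim_{x \to \infty} \frac{a x^3 + 1}{x^3} = a$
$\lim_{x \to \infty} \frac{x^4 + 1}{x^3} = \lim_{x \to \infty} x + \lim_{x \to \infty} \frac{1}{x^3} = \infty$
$\lim_{x \downarrow 0} \frac{x^3 + 1}{x^3} = \lim_{x \downarrow 0} \frac{x^3}{x^3} + \lim_{x \downarrow 0} \frac{1}{x^3} = \lim_{x \downarrow 0} 1 + \lim_{x \downarrow 0} \frac{1}{x^3} = \infty$
$\lim_{x \uparrow 0} \frac{x^3 + 1}{x^3} = \lim_{x \uparrow 0} \frac{x^3}{x^3} + \lim_{x \uparrow 0} \frac{1}{x^3} = \lim_{x \uparrow 0} 1 + \lim_{x \uparrow 0} \frac{1}{x^3} = -\infty$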
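Exponential functions - many examples (Example 31):
$\lim_{x \to \infty} e^{5x} = \infty$
$\lim_{x \to -\infty} e^{5x} = 0$
$\lim_{x \to \infty} \frac{e^{5x}}{1 + e^{5x}} = \lim_{x \to \infty} \frac{e^{5x} e^{-5x}}{(1 + e^{5x}) e^{-5x}} = \lim_{x \to \infty} \frac{1}{e^{-5x} + 1} = \frac{\lim_{x \to \infty} 1}{\lim_{x \to \infty} e^{-5x} + \lim_{x \to \infty} 1} = \frac{1}{0 + 1} = 1$
$\lim_{x \to -\infty} \frac{e^{5x}}{1 + e^{5x}} = \lim_{x \to -\infty} \frac{e^{5x} e^{-5x}}{(1 + e^{5x}) e^{-5x}} = \lim_{x \to -\infty} \frac{1}{e^{-5x} + 1} = \frac{\lim_{x \to -\infty} 1}{\lim_{x \to -\infty} e^{-5x} + \lim_{x \to -\infty} 1} = \frac{1}{\infty + 1} = 0$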
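The two limits of $\frac{e^{5x}}{1 + e^{5x}}$ can be made plausible by evaluating the function at a few points far to the right and far to the left. This is a minimal sketch with arbitrarily chosen evaluation points; it uses the rewritten form $\frac{1}{e^{-5x} + 1}$ from the derivation above.

```python
import math

def g(x: float) -> float:
    """g(x) = e^{5x} / (1 + e^{5x}), evaluated as 1 / (e^{-5x} + 1)."""
    return 1.0 / (math.exp(-5.0 * x) + 1.0)

values_right = [g(x) for x in (1, 5, 10)]      # x growing large: values approach 1
values_left = [g(x) for x in (-1, -5, -10)]    # x growing very negative: values approach 0
```

At x = 0 the function equals 1/2, halfway between the two limits.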
Properties of functions
Definition:
Let $f(x)$ be a function and take any $x_B > x_A$ in its domain
$f(x)$ is increasing if $f(x_B) \geq f(x_A)$
$f(x)$ is strictly increasing if $f(x_B) > f(x_A)$
$f(x)$ is decreasing if $f(x_B) \leq f(x_A)$
$f(x)$ is strictly decreasing if $f(x_B) < f(x_A)$
Definition: a monotone function is either increasing or decreasing.
Definition: a strictly monotone function is either strictly increasing or strictly decreasing.
Example 32
The function $f(x) = 3(x - 1)^2$ is not a monotone function. However, the function $f(x) = 3(x - 1)$ is a monotone function
Definition: Strictly monotone functions are one-to-one functions (or injective functions).
If $f(x_A) = f(x_B)$ then $x_A = x_B$
Other formulation: $f(x_A) \neq f(x_B)$ whenever $x_A \neq x_B$
Definition:
Any strictly monotone function has an inverse function. Notation:
$y = f(x)$ has the inverse function $y = f^{-1}(x)$
Example 33
$y = 3(x - 1)$
can be rewritten as $x = \frac{1}{3}(y + 3)$
Thus the function $y = \frac{1}{3}(x + 3)$
is the inverse function of $y = 3(x - 1)$
Definition: Composite function
Argument x of the function $y = f(x)$
is also a function $x = g(z)$
so that $y = f(g(z))$
Property
$f(f^{-1}(x)) = x$ and $f^{-1}(f(x)) = x$
Example 34
$y = 3(x - 1)$ and $y = \frac{1}{3}(x + 3)$ are inverse functions.
thus $3[\frac{1}{3}(x + 3) - 1] = x$
and $\frac{1}{3}[3(x - 1) + 3] = x$
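The inverse pair of Examples 33 and 34 can be composed in both orders to confirm that each composition returns its argument. The sketch below uses exact fractions to avoid rounding; `compose` and `test_points` are illustrative names introduced here.

```python
from fractions import Fraction

def f(x):
    """f(x) = 3(x - 1), the function of Examples 33 and 34."""
    return 3 * (x - 1)

def f_inv(x):
    """Inverse found in Example 33: (x + 3)/3."""
    return (x + 3) / Fraction(3)

def compose(outer, inner):
    """Return the composite function z -> outer(inner(z))."""
    return lambda z: outer(inner(z))

test_points = [Fraction(n, 4) for n in range(-20, 21)]

f_after_inverse = all(compose(f, f_inv)(x) == x for x in test_points)
inverse_after_f = all(compose(f_inv, f)(x) == x for x in test_points)
```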
Secant lines
Definition:
Secant line: line between the points $(x_A, y_A)$ and $(x_B, y_B)$
where $y_A = f(x_A)$ and $y_B = f(x_B)$
$y' = y_A + \frac{f(x_B) - f(x_A)}{x_B - x_A} (x' - x_A)$
For any point $(x', y')$ on this line segment, $x'$ is within $[x_A, x_B]$ and $y'$ lies between $y_A$ and $y_B$
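The secant-line formula translates directly into a small function. In this sketch, `secant` and the example function `square` are assumptions made for illustration; the slides do not prescribe a particular function.

```python
def secant(f, x_a, x_b):
    """Return the secant line of f through (x_a, f(x_a)) and (x_b, f(x_b))."""
    slope = (f(x_b) - f(x_a)) / (x_b - x_a)
    return lambda x: f(x_a) + slope * (x - x_a)

def square(x):
    return x**2                    # illustrative function, not from the slides

line = secant(square, 1.0, 3.0)    # passes through (1, 1) and (3, 9)
midpoint_value = line(2.0)         # value of the secant line halfway between the points
```

Comparing `line(2.0)` with `square(2.0)` shows the secant lying above the graph of this function between the two points, which is the geometric picture behind the definition of convexity.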
Concavity and convexity
Definition:
A function is strictly concave in an interval if for any distinct points $x_A$ and $x_B$ in that interval, and for all values $\lambda$ in the open interval (0,1)
$f(\lambda x_A + (1 - \lambda) x_B) > \lambda f(x_A) + (1 - \lambda) f(x_B)$
Definition:
A function is strictly convex in an interval if for any distinct points $x_A$ and $x_B$ in that interval, and for all values $\lambda$ in the open interval (0,1)
$f(\lambda x_A + (1 - \lambda) x_B) < \lambda f(x_A) + (1 - \lambda) f(x_B)$
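The two definitions can be tested on a finite grid of points and weights. This is a hedged sketch: the helper names, the grid and the example functions are chosen here, and passing a grid check is only evidence, not a proof, of concavity or convexity.

```python
from fractions import Fraction

def strictly_concave_on_grid(f, points, lambdas):
    """Check f(l*xa + (1-l)*xb) > l*f(xa) + (1-l)*f(xb) for all distinct grid points."""
    for xa in points:
        for xb in points:
            if xa == xb:
                continue
            for lam in lambdas:
                lhs = f(lam * xa + (1 - lam) * xb)
                rhs = lam * f(xa) + (1 - lam) * f(xb)
                if not lhs > rhs:
                    return False
    return True

def strictly_convex_on_grid(f, points, lambdas):
    """f is strictly convex exactly when -f is strictly concave."""
    return strictly_concave_on_grid(lambda x: -f(x), points, lambdas)

points = [Fraction(n, 2) for n in range(-6, 7)]
lambdas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

neg_square_concave = strictly_concave_on_grid(lambda x: -(x**2), points, lambdas)
square_concave = strictly_concave_on_grid(lambda x: x**2, points, lambdas)
square_convex = strictly_convex_on_grid(lambda x: x**2, points, lambdas)
linear_concave = strictly_concave_on_grid(lambda x: 2 * x + 1, points, lambdas)
```

A linear function satisfies the defining relation with equality, so it is neither strictly concave nor strictly convex.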
A menu of functions: power function
Definition: power function
$y = f(x) = k x^p$
for which p is referred to as the exponent of the function
Rules of exponents:
$x^0 = 1$
$x^1 = x$
$x^{-p} = \frac{1}{x^p}$
$x^{m/n} = \sqrt[n]{x^m}$
$x^{a+b} = x^a x^b$
$x^{a-b} = \frac{x^a}{x^b}$
$(x^a)^b = x^{ab}$
$x^a y^a = (xy)^a$
$\frac{x^a}{y^a} = \left(\frac{x}{y}\right)^a$ $(y \neq 0)$
Definition: polynomial function
$y = f(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n$
Degree of the polynomial function: highest exponent of the function (= n)
A menu of functions: exponential function
Definition: exponential function
$y = f(x) = k b^x$
b: base of the function
Example 35:
$\lim_{x \to \infty} b^x = 0$ if $|b| < 1$
and $\lim_{x \to \infty} b^x = 1$ if $b = 1$
and $\lim_{x \to \infty} b^x = \infty$ if $b > 1$
Example 36:
$\lim_{x \to -\infty} b^x = 0$ if $|b| > 1$
Which is equivalent to
$\lim_{x \to \infty} \frac{1}{b^x} = 0$ if $|b| > 1$
Summations
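Definition:
The sum from i = 1 to i = 5 of $x_i$:
$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$
i: summation index (an integer).
This summation is the same as
$\sum_{j=1}^{5} x_j = x_1 + x_2 + x_3 + x_4 + x_5$
Example 37:
$\sum_{i=1}^{6} i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 91$
$\sum_{j=0}^{2} \frac{1}{(j+1)(j+3)} = \frac{1}{3} + \frac{1}{8} + \frac{1}{15} = \frac{21}{40}$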
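Both sums of Example 37 can be reproduced with a generator expression. The sketch below uses exact fractions for the second sum so that the result is not rounded; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Sum of i^2 for i = 1, ..., 6.
sum_of_squares = sum(i**2 for i in range(1, 7))

# Sum of 1/((j + 1)(j + 3)) for j = 0, 1, 2.
terms = [Fraction(1, (j + 1) * (j + 3)) for j in range(0, 3)]
sum_of_fractions = sum(terms)
```

Note that `range(1, 7)` stops at 6: the upper bound of a Python range is exclusive, whereas the upper limit of a summation sign is inclusive.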
Summations: properties
Additive property:
$\sum_{i=1}^{n} (a_i + b_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i$
Homogeneity property:
$\sum_{i=1}^{n} c a_i = c \sum_{i=1}^{n} a_i$
So that:
$\sum_{i=1}^{n} c = nc$
Double summations
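Property:
$\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} \right) = \sum_{j=1}^{n} a_{1j} + \sum_{j=1}^{n} a_{2j} + \ldots + \sum_{j=1}^{n} a_{mj}$
or
$\sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij} = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ij} \right) = \sum_{i=1}^{m} a_{i1} + \sum_{i=1}^{m} a_{i2} + \ldots + \sum_{i=1}^{m} a_{in}$
thus
$\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij}$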
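The summation properties and the interchange of the order of a double summation can be illustrated with a small array. All numbers below are arbitrary and serve only as an example.

```python
a = [[1, 2, 3],
     [4, 5, 6]]          # a[i][j] with m = 2 rows and n = 3 columns
m, n = len(a), len(a[0])

# Double summation: first over j within each row, then over i ...
rows_first = sum(sum(a[i][j] for j in range(n)) for i in range(m))
# ... or first over i within each column, then over j.
cols_first = sum(sum(a[i][j] for i in range(m)) for j in range(n))

c = 7
xs = [2, 4, 6]
ys = [1, 3, 5]

additive_lhs = sum(x + y for x, y in zip(xs, ys))   # sum of (a_i + b_i)
additive_rhs = sum(xs) + sum(ys)                    # sum of a_i plus sum of b_i
homogeneous_lhs = sum(c * x for x in xs)            # sum of c * a_i
homogeneous_rhs = c * sum(xs)                       # c times sum of a_i
constant_sum = sum(c for _ in range(n))             # sum of the constant c, n times
```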
Logarithmic functions
Definition:
$y = b^x$
has a counterpart in logarithmic form: $x = \log_b(y)$
Example 38
$b^{\log_b(x)} = x$
Rule 1:
$\log_b(xy) = \log_b(x) + \log_b(y)$
Rule 2:
$\log_b(x / y) = \log_b(x) - \log_b(y)$
Rule 3:
$\log_b(x^{\alpha}) = \alpha \log_b(x)$
Property:
$\log_J H = \frac{\log_W H}{\log_W J}$
Natural Logarithms
Rule 1: $\ln(e^z) = z$
Rule 2: $e^{\ln x} = x$
Rule 3: $\ln(xy) = \ln(x) + \ln(y)$
Rule 4: $\ln(x / y) = \ln(x) - \ln(y)$
Rule 5: $\ln(x^z) = z \ln(x)$
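The rules of logarithms can be checked numerically with Python's `math` module. The test values below are arbitrary positive numbers chosen for this sketch, and `math.isclose` is used because floating-point results are only approximately equal.

```python
import math

x, y, z, b = 8.0, 2.0, 3.0, 2.0   # arbitrary positive test values

checks = {
    # log_b(xy) = log_b(x) + log_b(y)
    "product": math.isclose(math.log(x * y, b), math.log(x, b) + math.log(y, b)),
    # log_b(x/y) = log_b(x) - log_b(y)
    "quotient": math.isclose(math.log(x / y, b), math.log(x, b) - math.log(y, b)),
    # ln(x^z) = z ln(x)
    "power": math.isclose(math.log(x ** z), z * math.log(x)),
    # e^{ln x} = x and ln(e^z) = z
    "inverse": math.isclose(math.exp(math.log(x)), x)
               and math.isclose(math.log(math.exp(z)), z),
    # change of base: log_y(x) = log_10(x) / log_10(y)
    "base_change": math.isclose(math.log(x, y), math.log10(x) / math.log10(y)),
}
```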
ADVANCED MATHEMATICS TUTORIAL WEEK 1

Advanced Mathematics Tutorials week 1 - Solutions week 1
Exercises with an asterisk (*) are meant to deepen the knowledge of the material of that
week. Hence, these exercises are less likely to be asked on the exam.
Technical tutorial (Wednesday)

Exercise 1.
Consider the following graph of a function f (graph not reproduced).
Graphically assess the domain and the range of f, its limits for $x \to 0$, $x \to A$, $x \to B$, $x \to C$ and its
continuity at these points. By graphically assessing, we mean that the exact function values do
not matter, but the procedure followed should be clear.
Solution:
The domain is the set on which the function is defined. f appears to be defined everywhere, except on the interval [-1, -0.5] and at x = 0, where f has no values. So its domain is $\mathbb{R} \setminus ([-1, -0.5] \cup \{0\})$ (this means the set $\mathbb{R}$, which denotes the real numbers, with the interval [-1, -0.5] and the point x = 0 cut out).
The range is the set of outcomes of the function. In this case all numbers from $\mathbb{R}$ seem to be reached except for the small interval at about $[-2, -1]$. So the range of the function is $\mathbb{R} \setminus [-2, -1]$. Note that we can't tell from the picture whether this is equal to the co-domain of f, i.e., if $f: X \to Y$, whether $Y = \mathrm{range}(f)$. For instance, Y here could be the whole of $\mathbb{R}$, or it could be $\mathbb{R} \setminus [-2, -1]$. In the latter case, the function would be called onto, or surjective. A function which is onto or surjective has range equal to its codomain.
For the limits and continuity we start with point C. Recall that a function is continuous if you can draw it without lifting your pencil from the paper. Clearly around point C this is possible, so f is continuous at C. Continuity at a point means that the function value at that point is equal to its limit, so $\lim_{x \to C} f(x) = f(C) \approx 0.9$, where the last approximate equality is our guess looking at the graph.

We turn to point B. The function has a jump here, so it is not continuous at B. Furthermore, from the left the function tends to a different value than from the right (from the right it goes to, say, 2, whereas from the left it appears to go to something like 1.2). Therefore a limit at B is not defined.
At point A the function also has a jump, so it is again discontinuous. However from the left and from the right it tends towards the same value, so it does have a limit: $\lim_{x \to A} f(x) \approx 1$. Do note that $\lim_{x \to A} f(x) \neq f(A)$.
Finally, at point 0 something strange happens. The function is not defined at this point. However, it does have a left-hand and a right-hand limit. From the left it goes to infinity of one sign, from the right to infinity of the opposite sign. These values are different, so the limit at 0 is not defined. Even if it were, we could not decide whether f is continuous at 0, because the condition $\lim_{x \to 0} f(x) = f(0)$ is not well-defined: f(0) does not exist.
Hopefully this rather convoluted example displays all the conceptual pitfalls of limits and
continuity.
Exercise 2.
Let A and B be two sets and $A \subset B$. We wonder if $x \in B$. Would $x \in A$ be a sufficient condition for that? And a necessary condition? And if we knew $x \in B$ and wondered whether $x \in A$?
Solution:
If $x \in A$, then certainly $x \in B$, so it is sufficient. However, it is not necessary, because x could be in B without being in A.
If $x \in B$, then it might still be that $x \notin A$, so it is not a sufficient condition. However, it is a necessary condition, for if $x \notin B$, then certainly $x \notin A$.
Exercise 3.
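Let $A = \{1, 3, 5, 7, 11\}$ and $B = \{7, 11, 13\}$
a) Consider $A \cap B$ and $A \cup B$ and $A \cap \emptyset$ and $A \cup \emptyset$
$A \cap B = \{7, 11\}$ and $A \cup B = \{1, 3, 5, 7, 11, 13\}$ and $A \cap \emptyset = \emptyset$ and $A \cup \emptyset = A$
$A \setminus B = \{1, 3, 5\}$ and $B \setminus A = \{13\}$
Consider the following statements (true or not true):
b) $1 \in A \cap B$ NOT TRUE
c) $15 \notin A \cup B$ TRUE
d) $A \subset B$ NOT TRUE
e) Let $C = \{11\}$. Consider the following expressions:
$11 \in \mathbb{N}$ TRUE
$C \in \mathbb{N}$ WRONG. It must be $C \subset \mathbb{N}$
$C \subset \mathbb{N}$ TRUE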
f) and
Exercise 4.
Exercise 2.2.2. from Klein.
Which of the following functions are one-to-one:
a) A function relating countries to their citizens
b) A function relating street addresses to zip codes
c) A function relating library call numbers to books.
d) A function relating a student's identification number to a course grade in a specific class.
Solution:
a) No, because Pierre and Jacques are both from France, but are not the same person.
b) No, because several street addresses share the same zip code.
c) This depends on how library call numbers work. If there is only one call number per book (and one book per library call number), then it is one-to-one.
d) No, if two different students obtain the same grade.
Exercise 5.
Prove the following propositions by induction
a) $2^n > 2n$, $n \in \mathbb{N}$ and $n \geq 3$
b) $\sum_{i=1}^{n} (2i - 1) = n^2$, $n \in \mathbb{N}$
c) $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$, $n \in \mathbb{N}$
d) *Every integer greater than 1 can be written as the product of prime numbers. Remember that a prime number is a number that cannot be written as the product of two integers larger than 1. So 2, 3, 5, 7, 11 are prime numbers, but, e.g. 9 is not, as $9 = 3 \cdot 3$.
e) * $\sum_{i=1}^{n} i^2 = \frac{(2n + 1)(n + 1)n}{6}$, $n \in \mathbb{N}$
Solution:
a) $2^n > 2n$, $n \in \mathbb{N}$ and $n \geq 3$
Proof:
We first check that the proposition holds for $n = 3$: $2^3 = 8 > 6 = 2 \cdot 3$. Next we assume that the proposition holds for some $k \geq 3$ (we call this the induction assumption) and prove that it then also holds for $k + 1$. From the induction assumption we know that $2^k > 2k$.
Now let's check the proposition for $k + 1$:
$2^{k+1} = 2 \cdot 2^k > 2 \cdot (2k)$ (by the induction assumption) $= 2k + 2k > 2k + 2 = 2(k + 1)$ (because $k \geq 3$)
So we see that the proposition holds for $k + 1$.
b) $\sum_{i=1}^{n} (2i - 1) = n^2$, $n \in \mathbb{N}$
Proof:
We first check that it holds for $n = 1$:
$\sum_{i=1}^{1} (2i - 1) = 2 \cdot 1 - 1 = 1 = 1^2$, so it works out.
Next we assume that the proposition holds for $k \in \mathbb{N}$ and prove that it holds for $k + 1$. The induction assumption is now: $\sum_{i=1}^{k} (2i - 1) = k^2$. Let's check it for $k + 1$:
$\sum_{i=1}^{k+1} (2i - 1) = \sum_{i=1}^{k} (2i - 1) + 2(k + 1) - 1 = k^2 + 2(k + 1) - 1$ (by the induction assumption) $= k^2 + 2k + 1 = (k + 1)^2$
In this last expression we recognize precisely the proposition for $k + 1$.
c) $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$, $n \in \mathbb{N}$
Proof:
We first check that it holds for $n = 0$:
$\sum_{i=0}^{0} 2^i = 2^0 = 1 = 2^1 - 1$
Next we assume that the proposition holds for $k \in \mathbb{N}$ and prove that it holds for $k + 1$. The induction assumption is now: $\sum_{i=0}^{k} 2^i = 2^{k+1} - 1$.
Let's check for $k + 1$:
$\sum_{i=0}^{k+1} 2^i = \sum_{i=0}^{k} 2^i + 2^{k+1} = 2^{k+1} - 1 + 2^{k+1}$ (by the induction assumption) $= 2 \cdot 2^{k+1} - 1 = 2^{k+2} - 1$.
In this last expression we recognize precisely the proposition for $k + 1$.
d) *Every integer greater than 1 can be written as the product of prime numbers.
Proof:
Here we are going to use the induction hypothesis more intensively. We will not only use the fact that it holds for the last step, but for all steps so far.
First we check again if the proposition holds for the smallest case: $n = 2$. Because 2 is a prime number, the proposition clearly holds: write $2 = 2 \cdot 1$.
Now let's assume that the proposition holds for all numbers smaller than n. We now show that it must then also hold for n. If n is a prime, the proposition holds. Suppose n is not prime, then, by definition, it can be written as a product of two integers $n = m \cdot l$, $m \in \mathbb{N}$, $l \in \mathbb{N}$, $m > 1$, $l > 1$. Because both l and m are larger than one, they must both be smaller than n. But because they are smaller than n, the induction hypothesis applies and they can be written as a product of prime numbers. Hence n can be written as the product of those products and therefore as a product of prime numbers.
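e) * $\sum_{i=1}^{n} i^2 = \frac{(2n + 1)(n + 1)n}{6}$, $n \in \mathbb{N}$
Proof:
A slightly trickier one to top things off.
We first check for $n = 1$:
$\sum_{i=1}^{1} i^2 = 1^2 = 1 = \frac{(2 \cdot 1 + 1)(1 + 1) \cdot 1}{6}$, so it works out. Next we assume that the proposition holds for $k \in \mathbb{N}$ and prove that it holds for $k + 1$. The induction assumption is now: $\sum_{i=1}^{k} i^2 = \frac{(2k + 1)(k + 1)k}{6}$. We check for $k + 1$:
$\sum_{i=1}^{k+1} i^2 = \sum_{i=1}^{k} i^2 + (k + 1)^2 = \frac{(2k + 1)(k + 1)k}{6} + (k + 1)^2 = \frac{(2k + 1)(k + 1)k}{6} + \frac{6(k + 1)(k + 1)}{6}$
$= \frac{(2k + 1)(k + 1)k + 6(k + 1)(k + 1)}{6} = \frac{((2k + 1)k + 6(k + 1))(k + 1)}{6} = \frac{(2k^2 + 7k + 6)(k + 1)}{6}$
$= \frac{(2k + 3)(k + 2)(k + 1)}{6} = \frac{(2(k + 1) + 1)(k + 2)(k + 1)}{6}$
In this last expression we recognize precisely the proposition for $k + 1$.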
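The propositions of Exercise 5 can be spot-checked numerically for the first hundred cases; this complements, but does not replace, the induction proofs. The helper names below are introduced for this sketch, and `prime_factors` mirrors the splitting argument of part d) as one possible illustration rather than an efficient factorisation routine.

```python
import math

def check(n_values, lhs, rhs):
    """True when lhs(n) == rhs(n) for every n in n_values."""
    return all(lhs(n) == rhs(n) for n in n_values)

# a) 2^n > 2n for n >= 3
inequality_ok = all(2**n > 2 * n for n in range(3, 101))
# b) sum of the first n odd numbers equals n^2
odd_sum_ok = check(range(1, 101),
                   lambda n: sum(2 * i - 1 for i in range(1, n + 1)),
                   lambda n: n**2)
# c) sum of 2^i for i = 0..n equals 2^(n+1) - 1
powers_ok = check(range(0, 101),
                  lambda n: sum(2**i for i in range(0, n + 1)),
                  lambda n: 2**(n + 1) - 1)
# e) sum of i^2 for i = 1..n equals (2n + 1)(n + 1)n / 6
squares_ok = check(range(1, 101),
                   lambda n: sum(i**2 for i in range(1, n + 1)),
                   lambda n: (2 * n + 1) * (n + 1) * n // 6)

def prime_factors(n: int) -> list:
    """d) If n = m * l with m, l > 1, factor m and l recursively; otherwise n is prime."""
    for m in range(2, math.isqrt(n) + 1):
        if n % m == 0:
            return prime_factors(m) + prime_factors(n // m)
    return [n]
```

The recursion in `prime_factors` only ever calls itself on numbers smaller than n, which is exactly why the proof of d) needs the induction hypothesis for all smaller numbers and not just for n - 1.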
Exercise 6.
Prove the following limits from the $\varepsilon$, $\delta$ definition:
a) $\lim_{x \to 3} (x + 5) = 8$
b) $\lim_{x \to 4} (2x + 3) = 11$
c) * $\lim_{x \to 4} f(x) = \lim_{x \to 4} x^2 = 16$. Would your answer change if $f(x) = x^2$ for $x \neq 4$ and $f(x) = 0$ for $x = 4$?
d) * $\lim_{x \to 3} (x^2 + 3x - 1) = 17$
We'll also do two examples of how to prove that something is not a limit. Show that:
e) * $\lim_{x \to 3} (x + 2) \neq 6$
f) * $\lim_{x \to 3} x^2 \neq 10$
Solution:
Remember that in general $\lim_{x \to a} f(x) = L$ means that for any $\varepsilon > 0$ we can find a $\delta > 0$ such that if $0 < |x - a| < \delta$ (meaning x is close enough to a), then $|f(x) - L| < \varepsilon$, meaning $f(x)$ is close to L.
a) In the end we have to write a nice little proof, but to get there, we first have to write some stuff on scratch paper. This is very normal in mathematics. The proofs you read are very nice and economical, because it isn't necessary to show all the hard work that was required to get there.
In the end we want to get the expression $|(x + 5) - 8| < \varepsilon$ from $|x - 3| < \delta$, by imposing a relation between $\varepsilon$ and $\delta$. Let's play around with the expression $|(x + 5) - 8|$ to see what we might need to impose.
$|(x + 5) - 8| = |x - 3|$
We already see the expression $|x - 3|$ pop up immediately. So we can just take $\delta = \varepsilon$ and prove the limit.
Proof of a):
Let $\varepsilon > 0$. Let $\delta = \varepsilon$ and let $|x - 3| < \delta$. Then $|(x + 5) - 8| = |x - 3| < \delta = \varepsilon$. The first equality follows by simple algebra, the inequality and the second equality follow by the assumptions. If we read the extreme left hand and extreme right hand of the equations, we get: $|(x + 5) - 8| < \varepsilon$, which is what we wanted. So the proof is complete.
In this first example finding the $\delta$ was so straightforward that it became a little confusing. Let's try b) and see if things become clearer.
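b) Here we want to get the expression $|(2x + 3) - 11| < \varepsilon$ from the expression $|x - 4| < \delta$.
Let's manipulate the first expression to see how it relates to the second.
$|(2x + 3) - 11| = |2x - 8| = |2(x - 4)| = |2| |x - 4| = 2|x - 4| < \varepsilon$. Here we used the rule $|ab| = |a| |b|$. Now we can see that if this must hold, then, dividing both sides by 2, $|x - 4| < \frac{\varepsilon}{2}$. Now if we set $\delta = \frac{\varepsilon}{2}$, then this will hold. So we can write our proof.
Proof of b):
Let $\varepsilon > 0$. Let $\delta = \frac{\varepsilon}{2}$ and let $|x - 4| < \delta$. Then
$|(2x + 3) - 11| = |2x - 8| = |2(x - 4)| = |2| |x - 4| = 2|x - 4| < 2\delta = \varepsilon$ $\blacksquare$
The little black box is a notation that mathematicians use to indicate that a proof is complete.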
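c) *This one is slightly trickier. We want to get the expression $|x^2 - 16| < \varepsilon$ from $|x - 4| < \delta$. Let's manipulate the first expression again:
$|x^2 - 16| = |(x - 4)(x + 4)| = |x - 4| |x + 4|$. Here we first factorised $x^2 - 16$ and then used the rule $|ab| = |a| |b|$ again. Now we see already the $|x - 4|$ in here that we want, but there is still a term with an x left: $|x + 4|$. We cannot just set $\delta = \frac{\varepsilon}{|x + 4|}$, because $\delta$ cannot be a function of x, only of $\varepsilon$. But what we can do is restrict our $\delta$ and see what that implies for $|x + 4|$. Let's say that $\delta \leq 1$, then from $|x - 4| < \delta \leq 1$, it follows that $-1 < x - 4 < 1 \Rightarrow 3 < x < 5 \Rightarrow 7 < x + 4 < 9$, so $|x + 4| < 9$ (make sure you understand this last conclusion!). So from $\delta \leq 1$ we get $|x - 4| |x + 4| < 9|x - 4| < \varepsilon$, or $|x - 4| < \frac{\varepsilon}{9}$.
We are ready for our proof again. But first a bit of notation: with y = min{a,b} we mean that y is the minimum of the two values a and b. Notice that $\min\{a, b\} \leq a$ and $\min\{a, b\} \leq b$, a property that we will use quite often in this exercise.
Proof of c) *
Let $\varepsilon > 0$. Let $\delta = \min\{1, \frac{\varepsilon}{9}\}$ and let $|x - 4| < \delta$. First observe that $|x - 4| < 1$, so $-1 < x - 4 < 1 \Rightarrow 7 < x + 4 < 9$ and $|x + 4| < 9$.
Now
$|x^2 - 16| = |(x - 4)(x + 4)| = |x - 4| |x + 4| < 9|x - 4| < 9\delta = 9 \min\{1, \frac{\varepsilon}{9}\} = \min\{9, \varepsilon\} \leq \varepsilon$ $\blacksquare$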
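d) *The trickiest for last. We want to get $|(x^2 + 3x - 1) - 17| < \varepsilon$ from $|x - 3| < \delta$. Let's start as usual with manipulating the first expression:
$|(x^2 + 3x - 1) - 17| = |x^2 + 3x - 18| = |(x - 3)(x + 6)| = |x - 3| |x + 6|$. We factorised our expression, but it turns out that this is always possible for a limit like this. So we weren't just lucky. Now we see our $|x - 3|$ and by now we know how to handle the $|x + 6|$ part. We restrict our $\delta$, say $\delta \leq 1$ (note that the 1 isn't special and things would have worked out for other numbers as well). Then $|x - 3| < 1$, so $-1 < x - 3 < 1 \Rightarrow 8 < x + 6 < 10$, so $|x + 6| < 10$.
This gives $|x - 3| |x + 6| < 10|x - 3| < \varepsilon$, or $|x - 3| < \frac{\varepsilon}{10}$. We are ready for the proof.
Proof of d)
Let $\varepsilon > 0$. Let $\delta = \min\{1, \frac{\varepsilon}{10}\}$ and let $|x - 3| < \delta$. First observe that $|x - 3| < 1$, so $-1 < x - 3 < 1 \Rightarrow 8 < x + 6 < 10$ and $|x + 6| < 10$.
Now
$|(x^2 + 3x - 1) - 17| = |x^2 + 3x - 18| = |(x - 3)(x + 6)| = |x - 3| |x + 6| < 10|x - 3| < 10\delta = 10 \min\{1, \frac{\varepsilon}{10}\} = \min\{10, \varepsilon\} \leq \varepsilon$ $\blacksquare$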
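The choices $\delta = \min\{1, \frac{\varepsilon}{9}\}$ and $\delta = \min\{1, \frac{\varepsilon}{10}\}$ from the proofs of c) and d) can be sanity-checked numerically. The helper `holds` and the list of test values for $\varepsilon$ are assumptions made for this sketch; sampling finitely many points illustrates the proofs but does not replace them.

```python
def delta_for_square(eps):
    """delta = min{1, eps/9}, as chosen in the proof of c)."""
    return min(1.0, eps / 9.0)

def delta_for_quadratic(eps):
    """delta = min{1, eps/10}, as chosen in the proof of d)."""
    return min(1.0, eps / 10.0)

def holds(f, a, L, eps, delta, samples=2000):
    """Test |f(x) - L| < eps at sample points with 0 < |x - a| < delta."""
    for i in range(1, samples + 1):
        h = delta * i / (samples + 1)          # 0 < h < delta
        if abs(f(a - h) - L) >= eps or abs(f(a + h) - L) >= eps:
            return False
    return True

def square(x):
    return x**2

def quadratic(x):
    return x**2 + 3 * x - 1

epsilons = [50.0, 1.0, 0.1, 1e-3]
square_ok = all(holds(square, 4.0, 16.0, e, delta_for_square(e)) for e in epsilons)
quadratic_ok = all(holds(quadratic, 3.0, 17.0, e, delta_for_quadratic(e)) for e in epsilons)

# Taking delta = eps instead (ignoring the factor |x + 4|) fails for eps = 1.
naive_fails = not holds(square, 4.0, 16.0, 1.0, 1.0)
```

The large value $\varepsilon = 50$ is included because it exercises the cap $\delta \leq 1$, while the small values exercise the $\frac{\varepsilon}{9}$ and $\frac{\varepsilon}{10}$ branches.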
e) * This is first of all a little exercise in logic. The usual claim for a limit is like this: for all $\varepsilon$ there exists a certain $\delta$ such that for all x (with certain properties) something is true. Now we want to prove the negation of this statement. It follows from normal logical considerations that this would be: there is an $\varepsilon$ such that for all $\delta$ there exists an x (with certain properties) such that the previous something is not true. So now we want to prove a statement of this form. Remember that $\varepsilon$ is basically a challenge. To prove that something is a limit, we want to show that we can meet any challenge. To show that something is not the limit we have to show that there is a challenge that cannot be met.
So much for the preparation. We want to find an $\varepsilon$ for which things go wrong. Let's take $\varepsilon = \frac{1}{10}$. Then for any $\delta > 0$ there must be an x such that $|x - 3| < \delta$ and $|(x + 2) - 6| \geq \varepsilon$. Let's take $x = 3 + \min\{\frac{\delta}{2}, \frac{1}{2}\}$. Clearly
$|x - 3| = |3 + \min\{\frac{\delta}{2}, \frac{1}{2}\} - 3| = \min\{\frac{\delta}{2}, \frac{1}{2}\} < \delta$, but
$|(x + 2) - 6| = |(3 + \min\{\frac{\delta}{2}, \frac{1}{2}\} + 2) - 6| = |\min\{\frac{\delta}{2}, \frac{1}{2}\} + 5 - 6| = |\min\{\frac{\delta}{2}, \frac{1}{2}\} - 1| = 1 - \min\{\frac{\delta}{2}, \frac{1}{2}\} \geq 1 - \frac{1}{2} = \frac{1}{2} > \frac{1}{10} = \varepsilon$.
So the $\varepsilon$ poses a challenge that cannot be met and 6 is not the limit.
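f) * Again we want to find an $\varepsilon$ for which things go wrong. Let's take $\varepsilon = \frac{1}{10}$. Then for any $\delta > 0$ there must be an x such that $|x - 3| < \delta$ and $|x^2 - 10| \geq \varepsilon$. Let's take $x = 3 + \min\{\frac{1}{10}, \frac{\delta}{2}\}$. Clearly $|x - 3| = |3 + \min\{\frac{1}{10}, \frac{\delta}{2}\} - 3| = \min\{\frac{1}{10}, \frac{\delta}{2}\} < \delta$, but
$|x^2 - 10| = |(3 + \min\{\frac{1}{10}, \frac{\delta}{2}\})^2 - 10| = |9 + 6 \min\{\frac{1}{10}, \frac{\delta}{2}\} + (\min\{\frac{1}{10}, \frac{\delta}{2}\})^2 - 10|$
$= |6 \min\{\frac{1}{10}, \frac{\delta}{2}\} + (\min\{\frac{1}{10}, \frac{\delta}{2}\})^2 - 1| \geq 1 - 6 \cdot \frac{1}{10} - (\frac{1}{10})^2 = 1 - \frac{61}{100} = \frac{39}{100} > \frac{1}{10} = \varepsilon$
Again we conclude that the challenge cannot be met and 10 cannot be the limit.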
Exercise 7.
Exercise 3.3.1 from Klein.
a) $10^{\log_{10}(100)}$
b) $\ln(e^x) - e^{\ln(x)}$
c) $\log_{10}(\frac{1}{x^5})$
d) $\log_2(a + b)$
e) $\ln(e^{a + bx + cz})$
f) $\ln(4x)^3$
g) $\ln(\frac{1}{e^5} [xy]^2)$
e
Solution:
a) $10^{\log_{10}(100)} = 100$
b) $\ln(e^x) - e^{\ln(x)} = x - x = 0$
c) $\log_{10}(\frac{1}{x^5}) = -5 \log_{10}(x)$
d) Cannot be simplified further.
e) $\ln(e^{a + bx + cz}) = a + bx + cz$
f) $\ln(4x)^3$ Unclear, has two interpretations:
Either:
$\ln((4x)^3) = 3 \ln(4x) = 3(2 \ln(2) + \ln(x))$
Or:
$(\ln(4x))^3$ Which cannot be simplified further.
g) $\ln(\frac{1}{e^5} [xy]^2) = \ln(e^{-5}) + \ln([xy]^2) = -5 + 2 \ln(xy) = -5 + 2(\ln(x) + \ln(y))$
Advanced Mathematics - Broad tutorial (Friday of week 1)
Exercise 1.
Show from the definitions that for $f(x) = -x^2$
a) f(.) is a concave function,
b) f(.) is a homogeneous function (of a degree to be determined by you).
Solution:
1a. Recall the definition of concavity: f(.) is concave if and only if, for $0 < \lambda < 1$,
$\lambda f(x_A) + (1 - \lambda) f(x_B) \leq f(\lambda x_A + (1 - \lambda) x_B)$
or
(1) $\lambda f(x_A) + (1 - \lambda) f(x_B) - f(\lambda x_A + (1 - \lambda) x_B) \leq 0$
Given our function f(.), we know that:
(2a) $\lambda f(x_A) = -\lambda x_A^2$
(2b) $(1 - \lambda) f(x_B) = -(1 - \lambda) x_B^2$
(2c) $f(\lambda x_A + (1 - \lambda) x_B) = -(\lambda x_A + (1 - \lambda) x_B)^2$
{so that $-f(\lambda x_A + (1 - \lambda) x_B) = (\lambda x_A + (1 - \lambda) x_B)^2$}
Writing out equation (1), by substituting (2a), (2b) and (2c) in this equation, we get:
$-\lambda x_A^2 - (1 - \lambda) x_B^2 + (\lambda x_A + (1 - \lambda) x_B)^2$
$= -\lambda x_A^2 - (1 - \lambda) x_B^2 + \lambda^2 x_A^2 + 2\lambda(1 - \lambda) x_A x_B + (1 - \lambda)^2 x_B^2$ (3)
Next, we rearrange equation (3) in terms of $x_A^2$, $x_B^2$ and $x_A x_B$:
$-\lambda(1 - \lambda) x_A^2 - \lambda(1 - \lambda) x_B^2 + 2\lambda(1 - \lambda) x_A x_B$
$= -\lambda(1 - \lambda)(x_A^2 + x_B^2 - 2 x_A x_B)$
$= -\lambda(1 - \lambda)(x_A - x_B)^2 \leq 0$
The last term consists of a minus sign, two positive factors ($\lambda > 0$ and $1 - \lambda > 0$), and a square ($(x_A - x_B)^2 \geq 0$), so that the last term is non-positive, which proves our result.
1b. Recall the definition of homogeneity:
The function f(.) is homogeneous of degree k if, for $\lambda > 0$, $f(\lambda x) = \lambda^k f(x)$
We write out the definition:
$f(\lambda x) = -(\lambda x)^2 = -\lambda^2 x^2 = \lambda^2 f(x)$
So we find that f(.) is homogeneous of degree 2.
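The algebra of part 1a can be double-checked with exact arithmetic: the left-hand side of inequality (1) should equal $-\lambda(1 - \lambda)(x_A - x_B)^2$ at every test point. The sketch below assumes $f(x) = -x^2$ as in the exercise; the names `gap` and `closed_form` and the grids are illustrative choices.

```python
from fractions import Fraction

def f(x):
    """f(x) = -x^2, the function of Exercise 1."""
    return -(x**2)

def gap(lam, xa, xb):
    """Left-hand side of inequality (1): l*f(xa) + (1-l)*f(xb) - f(l*xa + (1-l)*xb)."""
    return lam * f(xa) + (1 - lam) * f(xb) - f(lam * xa + (1 - lam) * xb)

def closed_form(lam, xa, xb):
    """Result of the rearrangement: -l(1-l)(xa - xb)^2."""
    return -lam * (1 - lam) * (xa - xb)**2

points = [Fraction(n, 3) for n in range(-9, 10)]
lambdas = [Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)]
grid = [(lam, xa, xb) for lam in lambdas for xa in points for xb in points]

identity_holds = all(gap(*t) == closed_form(*t) for t in grid)
gap_nonpositive = all(gap(*t) <= 0 for t in grid)

# Part 1b: f(lam * x) = lam^2 * f(x) for lam > 0.
degree_two = all(f(lam * x) == lam**2 * f(x) for lam in lambdas for x in points)
```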
Exercise 2.
Suppose f(x) and g(x) are both monotonically increasing functions on the same domain. Show that
a. f(.) + g(.) is also a monotonically increasing function.
b. And if f(x) and g(x) are both concave, can you show that f(.) + g(.) is concave?
c. And if f(x) and g(x) are both homogeneous, of degrees l and m respectively, can you then show that $f(x) \cdot g(x)$ is homogeneous, and of what degree?
Solution:
2a. Monotonicity:
We know that, if $x_A \ge x_B$, then $f(x_A) \ge f(x_B)$ and $g(x_A) \ge g(x_B)$, for any
$x_A, x_B \in \mathrm{Dom}(f) \cap \mathrm{Dom}(g)$
(meaning that $x_A$ and $x_B$ are both on the domain of both f(.) and g(.)).
Now we have to prove that, if we let $h(x) = f(x) + g(x)$, then $h(x_A) \ge h(x_B)$.
Thus:
$h(x_A) = f(x_A) + g(x_A) \ge f(x_B) + g(x_B) = h(x_B)$
This proves what we want.
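2b. Concavity: We know that for any $0 < \lambda < 1$ and $x_A, x_B \in \mathrm{Dom}(f) \cap \mathrm{Dom}(g)$,
$\lambda f(x_A) + (1-\lambda) f(x_B) \le f(\lambda x_A + (1-\lambda) x_B)$
and
$\lambda g(x_A) + (1-\lambda) g(x_B) \le g(\lambda x_A + (1-\lambda) x_B)$
Now we want to show that, if we define again $h(x) = f(x) + g(x)$, then
(1) $\lambda h(x_A) + (1-\lambda) h(x_B) \le h(\lambda x_A + (1-\lambda) x_B)$.
Thus we start with the left-hand side of equation (1) and we rewrite it in terms of the functions f(.)
and g(.):
$\lambda h(x_A) + (1-\lambda) h(x_B)$
$= \lambda (f(x_A) + g(x_A)) + (1-\lambda)(f(x_B) + g(x_B))$
$= \lambda f(x_A) + (1-\lambda) f(x_B) + \lambda g(x_A) + (1-\lambda) g(x_B) \le f(\lambda x_A + (1-\lambda) x_B) + g(\lambda x_A + (1-\lambda) x_B)$
$= h(\lambda x_A + (1-\lambda) x_B)$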
That proves what we want.
2c. Homogeneity:
We know that $f(\lambda x) = \lambda^l f(x)$ and $g(\lambda x) = \lambda^m g(x)$.
Now we want that, if we define $k(x) = f(x) \cdot g(x)$, then $k(\lambda x) = \lambda^p k(x)$ for some p
still to be determined.
$k(\lambda x) = f(\lambda x) \cdot g(\lambda x) = \lambda^l f(x) \cdot \lambda^m g(x) = \lambda^{l+m} f(x) \cdot g(x) = \lambda^{l+m} k(x)$.
So we see that the function k(.) is homogeneous of degree l+m.
Exercise 3. Proposition
There is no such thing as the smallest positive number in $\mathbb{R}$.
Proof:
A positive number in $\mathbb{R}$ means some $x \in \mathbb{R}$, $x > 0$. Let's assume that there is an $x' \in \mathbb{R}$, $x' > 0$,
that is the smallest positive number. Consider $\frac{x'}{2}$. Because $x' > 0$, we have $0 < \frac{x'}{2} < x'$. This
contradicts that $x'$ is the smallest positive number. Hence there can be no smallest positive
number in $\mathbb{R}$.
Exercise 4.
Find the following limits from the rules about limits that you know. Also draw a graph to
illustrate your findings:
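a) $\lim_{x \to \infty} \frac{1}{x+2}$
b) $\lim_{x \to \infty} \frac{2x+1}{x}$
c) $\lim_{x \to \infty} \frac{x^2+2x+1}{x}$
d) $\lim_{x \downarrow -2} \frac{1}{x+2}$
e) * $\lim_{x \to \infty} (x^2 - e^x)$
f) $\lim_{x \to \infty} \frac{2+e^x}{e^x}$
g) $\lim_{x \to \infty} \frac{2+e^{2x}}{e^x}$
h) $\lim_{x \to -\infty} \frac{2+e^{2x}}{e^x}$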
3x 2
i) lim 2
x x 100 x3
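j) $\lim_{x \to \infty} \left( (\tfrac{1}{3})^x + 1^x \right)$
k) $\lim_{x \to \infty} \frac{5x^{10} + 10^{10}}{10x^{12}}$
l) $\lim_{x \to \infty} \frac{5e^{-3x} + 2}{e^{-5x} + 4}$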
Solution:
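a) $\lim_{x \to \infty} \frac{1}{x+2} = \frac{1}{\infty+2} = \frac{1}{\infty} = 0$
[Graph of $\frac{1}{x+2}$]
We can see that as x grows larger, the graph goes to zero.

b) $\lim_{x \to \infty} \frac{2x+1}{x} = \lim_{x \to \infty} (\frac{2x}{x} + \frac{1}{x}) = \lim_{x \to \infty} (2 + \frac{1}{x}) = 2 + \frac{1}{\infty} = 2$
[Graph of $\frac{2x+1}{x}$]
We can see that as x grows larger, the graph goes to two.
c) $\lim_{x \to \infty} \frac{x^2+2x+1}{x} = \lim_{x \to \infty} (\frac{x^2}{x} + \frac{2x}{x} + \frac{1}{x}) = \lim_{x \to \infty} (x + 2 + \frac{1}{x}) = \infty + 2 + \frac{1}{\infty} = \infty + 2 + 0 = \infty$
[Graph of $\frac{x^2+2x+1}{x}$]
We can see that as x grows, the graph increases indefinitely.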

d) * $\lim_{x \downarrow -2} \frac{1}{x+2} = \frac{1}{-2^+ + 2} = \frac{1}{0^+} = \infty$
This may seem slightly confusing. It's best to think of $a^+$ as a number slightly larger
than a. So a number slightly larger than -2, plus 2, gives a number slightly larger than
0.
We can see in the graph we drew in a) that if we approach -2 from the right, we
increase indefinitely. Note that the answer would not be the same if we approached -2
from the left; then we would go to minus infinity.
e) $\lim_{x \to \infty} (x^2 - e^x) = (\infty)^2 - e^{\infty} = \infty - \infty = ?$. We cannot evaluate this limit based on our rules,
because $\infty - \infty$ might be anything. In this case, the outcome happens to be $-\infty$, but if
we switched the formula around, $\lim_{x \to \infty} (e^x - x^2)$, the answer would be $\infty$.
We can see that the outcome should be minus infinity in a graph:
[Graph of $x^2 - e^x$]
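f) $\lim_{x \to \infty} \frac{2+e^x}{e^x} = \lim_{x \to \infty} \frac{2+e^x}{e^x} \cdot \frac{e^{-x}}{e^{-x}} = \lim_{x \to \infty} \frac{2e^{-x}+1}{1} = \frac{2e^{-\infty}+1}{1} = \frac{0+1}{1} = 1$
[Graph of $\frac{2+e^x}{e^x}$]
g) $\lim_{x \to \infty} \frac{2+e^{2x}}{e^x} = \lim_{x \to \infty} \frac{2+e^{2x}}{e^x} \cdot \frac{e^{-x}}{e^{-x}} = \lim_{x \to \infty} \frac{2e^{-x}+e^{x}}{1} = 2e^{-\infty} + e^{\infty} = 0 + \infty = \infty$
[Graph of $\frac{2+e^{2x}}{e^x}$]
h) $\lim_{x \to -\infty} \frac{2+e^{2x}}{e^x} = \lim_{x \to -\infty} \frac{2+e^{2x}}{e^x} \cdot \frac{e^{-x}}{e^{-x}} = \lim_{x \to -\infty} \frac{2e^{-x}+e^{x}}{1} = 2e^{-(-\infty)} + e^{-\infty} = \infty + 0 = \infty$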
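j) $\lim_{x \to \infty} \left( (\tfrac{1}{3})^x + 1^x \right) = \lim_{x \to \infty} (\tfrac{1}{3})^x + \lim_{x \to \infty} 1^x = 0 + 1 = 1$
Note that $\lim_{x \to \infty} 0.999999999999^x = 0$ (convergence is slow)
Reason:
$\frac{0.999999999999^{x+1}}{0.999999999999^x} = \frac{0.999999999999^x \cdot 0.999999999999}{0.999999999999^x} = 0.999999999999 < 1$
On the other hand $\lim_{x \to \infty} 1.000000000001^x = \infty$ (again: divergence is slow)
k) $\lim_{x \to \infty} \frac{5x^{10} + 10^{10}}{10x^{12}} = \lim_{x \to \infty} \frac{5x^{10}}{10x^{12}} + \lim_{x \to \infty} \frac{10^{10}}{10x^{12}} = \lim_{x \to \infty} \frac{1}{2x^2} + \lim_{x \to \infty} \frac{10^9}{x^{12}} = 0 + 0 = 0$
l) $\lim_{x \to \infty} \frac{5e^{-3x} + 2}{e^{-5x} + 4} = \frac{\lim_{x \to \infty} (5e^{-3x} + 2)}{\lim_{x \to \infty} (e^{-5x} + 4)} = \frac{\lim_{x \to \infty} 5e^{-3x} + \lim_{x \to \infty} 2}{\lim_{x \to \infty} e^{-5x} + \lim_{x \to \infty} 4} = \frac{0+2}{0+4} = \frac{1}{2}$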
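To get a feeling for how slowly $0.999999999999^x$ goes to zero, one can simply evaluate it for increasingly large x. The sketch below uses ordinary floating-point arithmetic, so the values are approximations; the helper name `power` is illustrative.

```python
import math

BASE = 0.999999999999  # the base from the note on slow convergence


def power(x):
    """Evaluate BASE ** x in floating-point arithmetic."""
    return BASE ** x


# The value barely moves until x is of the order 10^12, and only then
# starts to drop towards zero.
for exponent in (10**6, 10**12, 10**13, 10**14):
    print(exponent, power(exponent))
```

Since $\ln(0.999999999999) \approx -10^{-12}$, the value at $x = 10^{12}$ is roughly $e^{-1}$, which is why the decay only becomes visible for very large x.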
Exercise 5.
Write out the following:
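a) $\sum_{i=2}^{4} \sum_{j=1}^{4} i \cdot 2^i$
b) $\sum_{i=1}^{3} \sum_{j=1}^{2} (i+j)^2$
Solution:
a) $\sum_{i=2}^{4} \sum_{j=1}^{4} i \cdot 2^i = \sum_{i=2}^{4} (i \cdot 2^i + i \cdot 2^i + i \cdot 2^i + i \cdot 2^i) = \sum_{i=2}^{4} 4 (i \cdot 2^i) = 4 \sum_{i=2}^{4} (i \cdot 2^i)$
$= 4 (2 \cdot 2^2 + 3 \cdot 2^3 + 4 \cdot 2^4) = 384$
We could also have shown:
$\sum_{j=1}^{4} \sum_{i=2}^{4} i \cdot 2^i = \sum_{j=1}^{4} (2 \cdot 2^2 + 3 \cdot 2^3 + 4 \cdot 2^4) = \sum_{j=1}^{4} 96 = 4 \cdot 96 = 384$
Thus both outcomes are equal:
$\sum_{j=1}^{4} \sum_{i=2}^{4} i \cdot 2^i = \sum_{i=2}^{4} \sum_{j=1}^{4} i \cdot 2^i$
b)
$\sum_{i=1}^{3} \sum_{j=1}^{2} (i+j)^2 = \sum_{i=1}^{3} \left( (i+1)^2 + (i+2)^2 \right)$
$= (1+1)^2 + (1+2)^2 + (2+1)^2 + (2+2)^2 + (3+1)^2 + (3+2)^2$
$= 4 + 9 + 9 + 16 + 16 + 25 = 79$
Again, one can show that
$\sum_{i=1}^{3} \sum_{j=1}^{2} (i+j)^2 = \sum_{j=1}^{2} \sum_{i=1}^{3} (i+j)^2$
Note that the indices always drop out once you have evaluated the sums. They are
only useful within the sum and for that reason are sometimes called dummies.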
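The two double sums of this exercise, and the fact that the order of summation does not matter, can be verified with a few lines of Python. This is only a sketch; the function names are made up for the illustration.

```python
def sum_a(i_outer=True):
    """Double sum of i * 2**i for i = 2..4 and j = 1..4, in either order."""
    if i_outer:
        return sum(i * 2**i for i in range(2, 5) for j in range(1, 5))
    return sum(i * 2**i for j in range(1, 5) for i in range(2, 5))


def sum_b(i_outer=True):
    """Double sum of (i + j)**2 for i = 1..3 and j = 1..2, in either order."""
    if i_outer:
        return sum((i + j) ** 2 for i in range(1, 4) for j in range(1, 3))
    return sum((i + j) ** 2 for j in range(1, 3) for i in range(1, 4))


print(sum_a(), sum_a(i_outer=False))  # both orders give 384
print(sum_b(), sum_b(i_outer=False))  # both orders give 79
```

Note that `range(2, 5)` runs over 2, 3, 4: the upper bound is exclusive, unlike the upper limit of a summation sign.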
Exercise 6.
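Prove by induction that:
$\sum_{i=0}^{n} a^i = \frac{1-a^{n+1}}{1-a}$ for $|a| < 1$, and that $\sum_{i=0}^{\infty} a^i = \frac{1}{1-a}$ (which is one of the fundamental formulas
of finance)
Step 1: It is true for n=0, because $\frac{1-a}{1-a} = 1$.
Step 2: Assume it is true for any n=k and compute $\sum_{i=0}^{k+1} a^i$:
$\sum_{i=0}^{k+1} a^i = \sum_{i=0}^{k} a^i + a^{k+1} = \frac{1-a^{k+1}}{1-a} + a^{k+1} = \frac{1-a^{k+1}}{1-a} + \frac{a^{k+1} - a^{k+2}}{1-a} = \frac{1-a^{k+2}}{1-a}$
So, we can conclude that it is true for all non-negative integers.
Next, we consider the following limit:
$\lim_{n \to \infty} \sum_{i=0}^{n} a^i = \lim_{n \to \infty} \frac{1-a^{n+1}}{1-a} = \lim_{n \to \infty} \frac{1}{1-a} - \lim_{n \to \infty} \frac{a^{n+1}}{1-a} = \frac{1}{1-a} - 0 = \frac{1}{1-a}$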
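The closed form for the partial sums can be checked exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the choice $a = 1/2$ is just an example value with $|a| < 1$.

```python
from fractions import Fraction


def partial_sum(a, n):
    """Sum of a**i for i = 0..n, computed term by term."""
    return sum(a**i for i in range(n + 1))


def closed_form(a, n):
    """The formula (1 - a**(n + 1)) / (1 - a) proved by induction."""
    return (1 - a ** (n + 1)) / (1 - a)


a = Fraction(1, 2)  # example value with |a| < 1
for n in range(20):
    assert partial_sum(a, n) == closed_form(a, n)

# The partial sums approach 1 / (1 - a) = 2 as n grows.
print(float(partial_sum(a, 10)), float(partial_sum(a, 60)), float(1 / (1 - a)))
```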
Exercise 7.
Let k be some constant and f(.) some function. Show, or at least make clear, that
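$\sum_{i=1}^{n} k f(i) = k \sum_{i=1}^{n} f(i)$ and $\sum_{i=1}^{n} k = nk$.
Solution:
$\sum_{i=1}^{n} k f(i) = k f(1) + k f(2) + \dots + k f(n) = k (f(1) + f(2) + \dots + f(n)) = k \sum_{i=1}^{n} f(i)$
$\sum_{i=1}^{n} k = \underbrace{(k + k + \dots + k)}_{n \text{ times}} = nk$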
*Exercise 8.
Exercise 3.3.5. from Klein.
The theory of consumer behaviour is one of the foundations of economic analysis. The linear
logarithmic utility function is one of the original functions developed to measure consumer
utility and is still widely used by economists. It is written as
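$u = \ln U = \sum_{i=1}^{n} \alpha_i \ln q_i$
where u is the index of utility, $q_i$ is the quantity of good i, and $0 < \alpha_i < 1$. Transform the
function back to its original form, where U is utility.
Solution:
$U = e^{\ln(U)} = e^{\sum_{i=1}^{n} \alpha_i \ln(q_i)} = \prod_{i=1}^{n} e^{\alpha_i \ln(q_i)} = \prod_{i=1}^{n} (e^{\ln(q_i)})^{\alpha_i} = \prod_{i=1}^{n} q_i^{\alpha_i}$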
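Please note that we applied the following notation:
$\prod_{i=1}^{4} i = 1 \cdot 2 \cdot 3 \cdot 4 = 24$ and that $\sum_{i=1}^{4} i = 1 + 2 + 3 + 4 = 10$
Consequently
$\ln(\prod_{i=1}^{4} i) = \sum_{i=1}^{4} \ln(i)$ and $e^{\sum_{i=1}^{4} i} = e^{1+2+3+4} = \prod_{i=1}^{4} e^i$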
ADVANCED MATH TAKE HOME ASSIGNMENT WEEK 1
ADVANCED MATHEMATICS TAKE HOME ASSIGNMENT

MATERIAL OF WEEK 1
1.
Find the following limits from the rules about limits that you know.
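$\lim_{x \to \infty} \frac{e^{5x^2} + 1}{e^{3x^2}}$
Solution:
$\lim_{x \to \infty} \frac{e^{5x^2} + 1}{e^{3x^2}} = \lim_{x \to \infty} \frac{e^{5x^2}}{e^{3x^2}} + \lim_{x \to \infty} \frac{1}{e^{3x^2}} = \lim_{x \to \infty} e^{2x^2} + \lim_{x \to \infty} \frac{1}{e^{3x^2}} = \infty + 0 = \infty$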
2 2
( x 1) ( x 1)3
lim
x 1 3
( x 1)
Solution:
2 2
( x 1) ( x 1)3 2 ( x 1) 2
( x 1) 2 2 2
lim lim lim lim
x 1 3 x 1 ( x 1) 3 x 1 ( x 1) 3 3
3 x 1 3( x 1) 2
3
( x 1)
2. Show that
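$\log(\prod_{i=1}^{100} \frac{i^3}{(i+1)^2}) = 3 \sum_{j=0}^{99} \log(j+1) - 2 \sum_{j=0}^{99} \log(j+2)$
Solution:
$\log(\prod_{i=1}^{100} \frac{i^3}{(i+1)^2}) = \sum_{i=1}^{100} \log(\frac{i^3}{(i+1)^2}) = \sum_{i=1}^{100} \log(i^3) - \sum_{i=1}^{100} \log((i+1)^2) = 3 \sum_{i=1}^{100} \log(i) - 2 \sum_{i=1}^{100} \log(i+1)$
Substituting $i = j + 1$ (so that $i = 1, \dots, 100$ corresponds to $j = 0, \dots, 99$):
$3 \sum_{i=1}^{100} \log(i) - 2 \sum_{i=1}^{100} \log(i+1) = 3 \sum_{j=0}^{99} \log(j+1) - 2 \sum_{j=0}^{99} \log(j+2)$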
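The identity can be sanity-checked numerically. The sketch below assumes the base-10 logarithm (the identity holds for any base) and compares both sides in floating-point arithmetic; the function names are illustrative.

```python
import math

N = 100  # upper limit of the product


def left_hand_side():
    """log of the product of i^3 / (i + 1)^2 for i = 1..N."""
    product = math.prod(i**3 / (i + 1) ** 2 for i in range(1, N + 1))
    return math.log10(product)


def right_hand_side():
    """3 * sum log(j + 1) - 2 * sum log(j + 2) for j = 0..N-1."""
    first = sum(math.log10(j + 1) for j in range(N))
    second = sum(math.log10(j + 2) for j in range(N))
    return 3 * first - 2 * second


# Both sides agree up to floating-point rounding.
print(left_hand_side(), right_hand_side())
```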
ADVANCED MATH ADDITIONAL EXERCISES WEEK 1
* Exercise 1.
The following are all graphs of functions . Determine whether they are one-to-one
(i.e. injective) and whether they are onto (i.e. surjective).
a)
Solution:
We check for injectivity. Clearly the function does not take the same value twice, so the
function is injective.
We check for surjectivity. The range for the function is only about [ 3, ) , while it is given
that Y . So the function is not surjective.
65
b)
Solution:
We check for injectivity. We see that, for instance, both at x=2 and at x=-2 f(x)=1, so the
function is not injective.
We check for surjectivity. The range for the function is only about [ 3, ) , while it is given
that Y . So the function is not surjective.
66
c)
Solution:
We check for injectivity. We see that the function does not take the same value twice, so it is
injective.
We check for surjectivity. We see that the range of the function is Y , so it is surjective.
d)
Solution:
We check for injectivity. We see that for instance at x=0 and at x=40 the function takes the
value f(x)=0, so it is not injective.
We check for surjectivity. We see that the range of the function is Y , so it is surjective.
67
Exercise 2.
Write out:
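$\sum_{i=1}^{2} \sum_{j=1}^{4} (2i + 3j)$
Solution:
$\sum_{i=1}^{2} \sum_{j=1}^{4} (2i + 3j) = \sum_{j=1}^{4} (2 + 3j) + \sum_{j=1}^{4} (4 + 3j) = \sum_{j=1}^{4} (6 + 6j) = (6+6) + (6+12) + (6+18) + (6+24) = 14 \cdot 6 = 84$
Or the other way around:
$\sum_{i=1}^{2} \sum_{j=1}^{4} (2i + 3j) = \sum_{i=1}^{2} ((2i+3) + (2i+6) + (2i+9) + (2i+12)) = \sum_{i=1}^{2} (8i + 30) = (8 + 30) + (16 + 30) = 84$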
Of course, not all of these brackets are necessary, they are mostly to show what comes from
what.
* Exercise 3.
Determine whether the following function is homogeneous. If it is, determine the degree.
$f(x) = h(x^3)$, where $h(x)$ is homogeneous of degree 7. (Hint: if you find this confusing, first
try it with $h(x) = x^7$, which is a homogeneous function of degree 7.)
Solution:
First the general problem:
We check if $f(tx) = t^m f(x)$ for some m.
$f(tx) = h((tx)^3) = h(t^3 x^3)$
Now we know that h is homogeneous of degree 7, i.e. $h(ry) = r^7 h(y)$. Take $r = t^3$ and
$y = x^3$ to find:
$h(t^3 x^3) = (t^3)^7 h(x^3) = t^{21} h(x^3) = t^{21} f(x)$
So f is homogeneous of degree 21.
The hint is solved similarly, but now we take $h(x) = x^7$:
$f(tx) = h((tx)^3) = (t^3 x^3)^7 = t^{21} (x^3)^7 = t^{21} h(x^3) = t^{21} f(x)$
Which is not really easier, I suppose.
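For the special case of the hint, $h(x) = x^7$, the degree can be confirmed with exact integer and rational arithmetic. This is a small sketch for that one example function, not a check of the general statement.

```python
from fractions import Fraction


def h(x):
    """Example from the hint: homogeneous of degree 7."""
    return x**7


def f(x):
    """f(x) = h(x^3)."""
    return h(x**3)


# f(t * x) should equal t**21 * f(x) for every t > 0 and every x.
for t in (2, 3, Fraction(1, 2)):
    for x in (1, 3, Fraction(5, 7)):
        assert f(t * x) == t**21 * f(x)
```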
* Exercise 4.
Of course you all know intuitively what the derivative of a function f(x) is: it is the very small
change that occurs in f(x) when you very slightly change x. The picture illustrates this. The
blue line is the graph of the function f(x). If you take $\Delta x$ ever smaller, you will approach ever
more closely the slope of the red line of the derivative.
For this approaching ever more closely, we naturally think of the limit (in fact, it was in the
context of derivatives that the notion of limit was first developed).
We define: $\frac{d}{dx} f(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$. (Note that $f(x + \Delta x) - f(x) = \Delta f(x)$.)
Now you must prove that for $f(x) = x^2$ it indeed holds that $\frac{d}{dx} f(x) = 2x$ by writing out the
limit. Do this in three steps: first, before evaluating the limit, observe that
$f(x + \Delta x) - f(x) = (x + \Delta x)^2 - x^2$. Then, still before evaluating the limit, show that
$\frac{f(x + \Delta x) - f(x)}{\Delta x}$ simplifies to $2x + \Delta x$. Then evaluate the limit $\lim_{\Delta x \to 0} (2x + \Delta x)$ directly from
the definition (either doing it from the left or the right hand side is enough).
If you succeed, you have proved a rule that you've already known for a long time. Isn't that
fun!
Solution to exercise 4:
So, let the fun begin:
We first write out the definition. To emphasize that $\Delta x$ is a single number and not a
multiplication of $\Delta$ and x, I will now define $\Delta x = h$. (So this is just giving it a new name.)
$\frac{d}{dx} x^2 = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h}$
Now, before we touch the limit, we just apply algebra to what is inside the limit. This is
allowed, because we're basically not changing the expression over which we take the limit.
$\lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h)$
Strictly speaking, for our last step, we should observe that $h \neq 0$, because otherwise it would
not be allowed to divide by h. However, if you recall the definition of a limit, you will see that
h never actually takes the value to which it goes, i.e. 0 in this case. Therefore our last step is
valid and is obtained by dividing both the denominator and the numerator by h. For this last
expression we can now apply the definition of a limit.
How does it work again? In general, the idea is that, if
$c = \lim_{x \to a} f(x)$
then f(x) will get ever closer to c, as x gets closer to a. This was formalised thus: if you say
how close to c you want to get, then I should be able to give a distance from a so that you will
indeed get that close or closer to c. You saying how close you want to get is setting an $\varepsilon$, me
providing you with this distance is picking the $\delta$.
Let's apply this to our case. The function over which we are taking a limit is $2x + h$, where x is
now just some given number; we write $f(h) = 2x + h$ for this function of h. Intuitively, we would expect that as h goes to zero, this function
will just go to 2x. So let's make that our guess for the limit.
Now, since this function is very simple, we can do the right hand limit and the left hand limit
at the same time.
You provide me with $\varepsilon > 0$ and I decide to pick $\delta = \frac{1}{2}\varepsilon$. (Why? It turns out that it works. This is
basically backward engineering.) Then if I only look at h whose distance to 0 is less than $\delta$,
i.e. $0 < |h - 0| = |h| < \delta$, I hope to find that my distance to 2x is smaller than $\varepsilon$, i.e. $|f(h) - 2x| < \varepsilon$.
Let's see:
$|f(h) - 2x| = |2x + h - 2x| = |h| < \delta = \frac{1}{2}\varepsilon < \varepsilon$
Comparing the left hand side and the right hand side, we see that we indeed are close enough,
so the limit is as we specified. This proves that $\frac{d}{dx} f(x) = 2x$. Yay!
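The algebra inside the limit can be illustrated with exact arithmetic: for $f(x) = x^2$ the difference quotient equals $2x + h$ for every $h \neq 0$, so it approaches $2x$ as h shrinks. The sketch below uses `fractions.Fraction` to avoid rounding; the point $x = 3$ is an arbitrary example.

```python
from fractions import Fraction


def square(x):
    return x * x


def difference_quotient(f, x, h):
    """(f(x + h) - f(x)) / h for a nonzero step h."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x + h) - f(x)) / h


x = Fraction(3)  # arbitrary example point
for k in range(1, 6):
    h = Fraction(1, 10**k)
    quotient = difference_quotient(square, x, h)
    # Exactly 2x + h, so the distance to 2x is exactly |h|.
    assert quotient == 2 * x + h
    print(float(h), float(quotient))
```

The distance between the quotient and 2x is exactly |h|, which is why any $\delta \le \varepsilon$ works in the argument above.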
Exercise 5.
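Proof by induction:
$\sum_{i=1}^{n} i = \frac{n^2}{2} + \frac{n}{2}$ (hint: use $\sum_{i=1}^{n} (2i - 1) = n^2$ and $\sum_{i=1}^{n} 1 = n$)
Solution 1 to exercise 5 (proof by induction):
Step 1:
For n=1: $\sum_{i=1}^{1} i = 1 = \frac{1^2}{2} + \frac{1}{2}$
Step 2:
Assume the equality holds true for n=k and check whether it is also true for n=k+1:
$\sum_{i=1}^{k+1} i = \sum_{i=1}^{k} i + (k+1) = \frac{k^2}{2} + \frac{k}{2} + k + 1 = \frac{k^2}{2} + \frac{k}{2} + \frac{2(k+1)}{2} = \frac{k^2 + 2k + 1}{2} + \frac{k+1}{2} = \frac{(k+1)^2}{2} + \frac{k+1}{2}$
Step 3:
The equality holds true for all $n \ge 1$.
Alternative solution 2 to exercise 5 (no proof by induction):
$\sum_{i=1}^{n} i = \frac{1}{2} \sum_{i=1}^{n} 2i = \frac{1}{2} \sum_{i=1}^{n} (2i - 1 + 1) = \frac{1}{2} \sum_{i=1}^{n} (2i - 1) + \frac{1}{2} \sum_{i=1}^{n} 1 = \frac{1}{2} n^2 + \frac{1}{2} n$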
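Both routes to the formula can be checked for many values of n at once. The sketch below compares the direct sum, the closed form and the route via the hint; all names are illustrative.

```python
def sum_direct(n):
    """1 + 2 + ... + n, added term by term."""
    return sum(range(1, n + 1))


def sum_formula(n):
    """n^2 / 2 + n / 2; n^2 + n is always even, so integer division is exact."""
    return (n * n + n) // 2


def sum_via_hint(n):
    """Half of (sum of the odd numbers 2i - 1) plus half of (sum of n ones)."""
    odd_sum = sum(2 * i - 1 for i in range(1, n + 1))
    assert odd_sum == n * n  # the first part of the hint
    return (odd_sum + n) // 2


for n in range(1, 201):
    assert sum_direct(n) == sum_formula(n) == sum_via_hint(n)
```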
Exercise 6.
Find the following limits from the rules about limits that you know.
2
ex 2x 1
6a) lim 2
x
e3 x
Solution to exercise 6a):

2 2
ex 2x 1 ex 2x 1 2 x2 2 x 1 2 x2
lim 3 x2
lim 3 x2 lim 3 x2 lim e ( ) lim e lim e2 x 0
x x x x x x
e e e
0
It is wrong to conclude that this term is 0, because it depends which of both limits dominates.
2x 2 or 2x ?
2x 1 2
It can be shown that lim 2 lim 0 , so that the term lim e 2 x dominates.
x 2x x x x
2 x2 2x
Hence lim e lim e 0
x x
5 3
6b) lim x x2
x 0 2x
Solution to exercise 6b):

5 3
2 1 5 1 3 5 3
lim x x lim lim 2
lim lim
x 0 2x x 0 2x x x 0 2x x x 0 2x2 x 0 2 x3
Exercise 7. Show that
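$\log(\prod_{i=1}^{100} \frac{i^2}{i+1}) = 2 \sum_{i=1}^{100} \log(i) - \sum_{i=1}^{100} \log(i+1)$
Solution to exercise 7:
$\log(\prod_{i=1}^{100} \frac{i^2}{i+1}) = \sum_{i=1}^{100} \log(\frac{i^2}{i+1}) = \sum_{i=1}^{100} \log(i^2) - \sum_{i=1}^{100} \log(i+1) = \sum_{i=1}^{100} 2\log(i) - \sum_{i=1}^{100} \log(i+1) = 2 \sum_{i=1}^{100} \log(i) - \sum_{i=1}^{100} \log(i+1)$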
ADVANCED MATHEMATICS SLIDES WEEK 2
WEEK 2: LINEAR ALGEBRA (I)
Vectors
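Definition:
A vector x, $x \in \mathbb{R}^n$, is defined as
$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$
The vector x has a (n X 1) format, which means that it is a column
vector with n elements $x_1, x_2, \dots, x_n$
Definition:
Two vectors $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$, $x, y \in \mathbb{R}^n$, can be added:
$x + y = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}$
Example 3
$x, y \in \mathbb{R}^3$
$x = \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}$ and $y = \begin{pmatrix} 1 \\ 0 \\ 5 \end{pmatrix}$ so that
$x + y = \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}$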
Multiplication of a vector by a real number
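Definition:
A vector x can be multiplied by a real number a:
$ax = \begin{pmatrix} ax_1 \\ \vdots \\ ax_n \end{pmatrix}$
Example 4
If $x = \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}$ then:
a) $3x = \begin{pmatrix} 6 \\ 3 \\ -3 \end{pmatrix}$ (multiplication of vector x by scalar 3)
b) $-3x = \begin{pmatrix} -6 \\ -3 \\ 3 \end{pmatrix}$ (multiplication of vector x by scalar -3)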
Line through the origin
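Definition
Every $x \in \mathbb{R}^2$ which satisfies
$x = a \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} ac_1 \\ ac_2 \end{pmatrix}$
for which $a \in \mathbb{R}$, is on a line through the origin
$O = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and the point $\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$
Example 5
Any $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2a \\ 3a \end{pmatrix}$
for which $a \in \mathbb{R}$ is on a line that goes through the origin $O = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and
the point $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$
Thus, for instance $\begin{pmatrix} 6 \\ 9 \end{pmatrix}$ is on this line (a=3).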
Length of a vector
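Definition:
The length of vector x:
$\|x\| = \sqrt{x_1^2 + x_2^2}$
Implication:
The length of vector ax:
$\|ax\| = \sqrt{a^2 x_1^2 + a^2 x_2^2} = |a| \sqrt{x_1^2 + x_2^2} = |a| \, \|x\|$
Note that we take the absolute value of a, because the length cannot be
a negative number.
If $x = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ then
a) $\|x\| = \sqrt{4 + 9} = \sqrt{13}$ (length of x)
b) $\|3x\| = 3 \sqrt{4 + 9} = 3 \sqrt{13}$ (length of 3x)
c) $\|-3x\| = |-3| \sqrt{4 + 9} = 3 \sqrt{13}$ (length of -3x)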
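The rule $\|ax\| = |a| \, \|x\|$ is easy to try out numerically. The sketch below represents vectors as plain Python lists; `length` and `scale` are illustrative helper names.

```python
import math


def length(v):
    """Euclidean length: square root of the sum of squared elements."""
    return math.sqrt(sum(c * c for c in v))


def scale(a, v):
    """Multiply every element of the vector v by the real number a."""
    return [a * c for c in v]


x = [2, 3]
for a in (3, -3, 0.5, 0):
    # |a| * ||x|| and ||a x|| agree up to floating-point rounding.
    assert abs(length(scale(a, x)) - abs(a) * length(x)) < 1e-12
```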
Example 6
If $x = \begin{pmatrix} b \\ 3 \end{pmatrix}$
a) For which real number b is the length of x equal to 5?
$\|x\| = \sqrt{b^2 + 9} = 5$
Thus for b = 4 or for b = -4
b) For which real number b is the length of x equal to 1?
$\|x\| = \sqrt{b^2 + 9} = 1$
Thus, there is no such b, because $b^2 + 9 \ge 9 > 1$.
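Example 7
If $x = \begin{pmatrix} b \\ 0.5 \end{pmatrix}$
For which real numbers b is the length of the vector x equal to 1?
$\|x\| = \sqrt{b^2 + 0.25} = 1$
Thus $b^2 = 3/4$. Which means that $b = \pm \frac{1}{2}\sqrt{3}$
Thus the vectors $x = \begin{pmatrix} \frac{1}{2}\sqrt{3} \\ \frac{1}{2} \end{pmatrix}$ and $x = \begin{pmatrix} -\frac{1}{2}\sqrt{3} \\ \frac{1}{2} \end{pmatrix}$ have a length of 1.
Definition:
In $\mathbb{R}^2$ the unit vectors $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ have a length of 1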
Circles
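Definition:
Vectors $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
are on the unit circle if their length is equal to 1:
$\|x\| = \sqrt{x_1^2 + x_2^2} = 1$
Thus the centre of this circle is the origin $O = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
Consequence:
In $\mathbb{R}^2$ the unit vectors $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ are on the unit circle.
Definition:
A vector $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ is on a circle with centre $c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$ and with a non-negative
radius ($r \ge 0$) if it satisfies the restriction:
$(x_1 - c_1)^2 + (x_2 - c_2)^2 = r^2$
Example 8
$(x_1 - 1)^2 + (x_2 - 2)^2 = 25$ describes a circle with centre $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and radius
5.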
Inner product
Definition: inner product
The inner product of two vectors
$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$
is defined as
$x \cdot y = x_1 y_1 + x_2 y_2$
Orthogonal vectors
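Definition
Two vectors are orthogonal (perpendicular) if their inner product is
equal to zero:
$x \cdot y = 0$
Example 9
In $\mathbb{R}^2$ the unit vectors $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ are orthogonal:
$e_1 \cdot e_2 = 1 \cdot 0 + 0 \cdot 1 = 0$
Definition:
In $\mathbb{R}^2$ the unit vectors $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ are referred to as
orthonormal vectors (they are perpendicular and they have a length
of 1).
Example 10
In $\mathbb{R}^2$ the vectors $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $a e_2 = \begin{pmatrix} 0 \\ a \end{pmatrix}$, $a \in \mathbb{R}$, are orthogonal:
Reason:
$e_1 \cdot (a e_2) = 1 \cdot 0 + 0 \cdot a = 0$
Consequence: $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ is orthogonal to any point on the line
$a e_2 = \begin{pmatrix} 0 \\ a \end{pmatrix}$ for $a \in \mathbb{R}$
Example 11
In $\mathbb{R}^2$ any point on the line $a e_1 = \begin{pmatrix} a \\ 0 \end{pmatrix}$, $a \in \mathbb{R}$, and any point on the line $b e_2 = \begin{pmatrix} 0 \\ b \end{pmatrix}$,
$b \in \mathbb{R}$, are orthogonal:
Reason:
$(a e_1) \cdot (b e_2) = a \cdot 0 + 0 \cdot b = 0$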
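Linear dependence of two vectors
(Informal) definition of dependence:
In $\mathbb{R}^2$ two vectors x and y are linearly dependent if the first vector is a
multiple of the second vector. Thus $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$ are linearly dependent
if $x = ky$, i.e. $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} ky_1 \\ ky_2 \end{pmatrix}$
Formal definition of independence:
In $\mathbb{R}^2$ two vectors x and y are linearly independent if
$k_1 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + k_2 \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
holds only for $k_1 = 0$ and $k_2 = 0$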
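Examples of linear dependence
Example 12
$x = \begin{pmatrix} 4 \\ -1 \end{pmatrix}$ and $y = \begin{pmatrix} 12 \\ -3 \end{pmatrix}$ are linearly dependent (k=1/3)
Reason: $4 = k_1 \cdot 12$, $-1 = k_2 \cdot (-3)$, thus $k_1 = k_2 = 1/3$
Example 13
$x = \begin{pmatrix} 10 \\ 0 \end{pmatrix}$ and $y = \begin{pmatrix} 5 \\ 0 \end{pmatrix}$ are linearly dependent (k=2)
Reason: $10 = k_1 \cdot 5$, $0 = k_2 \cdot 0$, which holds for $k_1 = k_2 = 2$
Example 14
$x = \begin{pmatrix} 10 \\ 0 \end{pmatrix}$ and $y = \begin{pmatrix} 3 \\ 8 \end{pmatrix}$ are linearly independent
Reason: $10 = k_1 \cdot 3$, $0 = k_2 \cdot 8$, thus $k_1 \neq k_2$
Example 15
$x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $y = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$ are two linearly independent and orthogonal
vectors
Reason (for independent vectors): $2 = k_1 \cdot (-1)$, $1 = k_2 \cdot 2$, thus $k_1 \neq k_2$
Reason (for orthogonal vectors): $2 \cdot (-1) + 1 \cdot 2 = 0$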
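Example 16
$x = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $y = \frac{1}{\sqrt{5}} \begin{pmatrix} -1 \\ 2 \end{pmatrix}$ are linearly independent and orthonormal
vectors
Because:
Reason 1) $2 = k_1 \cdot (-1)$, $1 = k_2 \cdot 2$ (the common factor $\frac{1}{\sqrt{5}}$ cancels)
So that
$k_1 = -2$, $k_2 = 1/2$, $k_1 \neq k_2$
Reason 2) $\frac{1}{5} (2 \cdot (-1) + 1 \cdot 2) = 0$
Reason 3) $\|x\| = \frac{1}{\sqrt{5}} \sqrt{2^2 + 1^2} = 1$ and $\|y\| = \frac{1}{\sqrt{5}} \sqrt{1^2 + 2^2} = 1$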
Linear independence of two vectors: important consequences
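Consequence 1:
All vectors $z \in \mathbb{R}^2$ can be written as a linear combination of two
linearly independent vectors $x, y \in \mathbb{R}^2$
Thus: $z = k_1 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + k_2 \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$
for which $k_1, k_2 \in \mathbb{R}$
Thus: $\mathbb{R}^2$ can be spanned by two linearly independent vectors!
Consequence 2:
Let $x, y \in \mathbb{R}^2$. We write both vectors as $x = \begin{pmatrix} a \\ b \end{pmatrix}$ and $y = \begin{pmatrix} c \\ d \end{pmatrix}$
Both vectors are linearly dependent if
$a \cdot d - b \cdot c = 0$
Both vectors are linearly independent if
$a \cdot d - b \cdot c \neq 0$
Proof: both vectors x and y are linearly dependent if
1) $a = k_1 \cdot c$ so that $k_1 = \frac{a}{c}$
2) $b = k_2 \cdot d$ so that $k_2 = \frac{b}{d}$
3) Linearly dependent so that $k_1 = k_2$, or $\frac{a}{c} = \frac{b}{d}$, or $a \cdot d - b \cdot c = 0$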
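The criterion $a \cdot d - b \cdot c = 0$ translates directly into a small test function. The sketch below applies it to the vectors of Examples 12 to 15; the minus signs in Examples 12 and 15 are read here as $(4, -1)$, $(12, -3)$ and $(-1, 2)$, which is an assumption about the slides. Function names are illustrative.

```python
def determinant(x, y):
    """a*d - b*c for x = (a, b) and y = (c, d)."""
    a, b = x
    c, d = y
    return a * d - b * c


def linearly_dependent(x, y):
    """Two vectors in R^2 are linearly dependent exactly when a*d - b*c = 0."""
    return determinant(x, y) == 0


examples = {
    "Example 12": ((4, -1), (12, -3)),
    "Example 13": ((10, 0), (5, 0)),
    "Example 14": ((10, 0), (3, 8)),
    "Example 15": ((2, 1), (-1, 2)),
}
for name, (x, y) in examples.items():
    print(name, "dependent" if linearly_dependent(x, y) else "independent")
```

Unlike the ratio argument in the proof, this test needs no division, so it also works when c or d is zero (as in Example 13).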
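Matrices
Definition:
The (2 X 2) matrix A:
$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$
It consists of two rows and two columns.
Definitions:
$a_{ij}$ is an element of the matrix A.
The diagonal of the matrix consists of the elements $a_{11}$ and $a_{22}$.
The elements $a_{11}$ and $a_{22}$ are referred to as the diagonal elements
of the matrix A.
The elements $a_{21}$ and $a_{12}$ are referred to as the off-diagonal elements
of the matrix A.
The (2 X 2) matrix A can be (post)multiplied by a (2 X 1)-vector x.
$Ax = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
For the multiplication it is required that the number of columns of A is
equal to the number of rows of the vector x.
How to determine Ax ?
1) The first element of the product is the first row of A times x:
$\begin{pmatrix} a_{11} & a_{12} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a_{11} x_1 + a_{12} x_2$
2) The second element of the product is the second row of A times x:
$\begin{pmatrix} a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a_{21} x_1 + a_{22} x_2$
So that it can be combined as:
$Ax = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a_{11} x_1 + a_{12} x_2 \\ a_{21} x_1 + a_{22} x_2 \end{pmatrix}$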
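Example 17
a) $\begin{pmatrix} 1 & 3 \\ -2 & 5 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 11 \end{pmatrix}$
b) $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$
c) $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$
d) $\begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 10 \\ 15 \end{pmatrix}$
e) $\begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 7 \\ 14 \end{pmatrix}$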
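How to determine (A+B)x ?
Format of both matrices must be equal:
$A + B = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{pmatrix}$
Note that $A + B = B + A$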
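How to calculate (AB)x ?
Requirement of matrix multiplication:
Number of columns of matrix A = Number of rows of matrix B
1) Row 1 times column 1:
$\begin{pmatrix} a_{11} & a_{12} \end{pmatrix} \begin{pmatrix} b_{11} \\ b_{21} \end{pmatrix} = a_{11} b_{11} + a_{12} b_{21}$
2) Row 1 times column 2:
$\begin{pmatrix} a_{11} & a_{12} \end{pmatrix} \begin{pmatrix} b_{12} \\ b_{22} \end{pmatrix} = a_{11} b_{12} + a_{12} b_{22}$
3) Row 2 times column 1:
$\begin{pmatrix} a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} \\ b_{21} \end{pmatrix} = a_{21} b_{11} + a_{22} b_{21}$
4) Row 2 times column 2:
$\begin{pmatrix} a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{12} \\ b_{22} \end{pmatrix} = a_{21} b_{12} + a_{22} b_{22}$
Thus all together:
$AB = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{pmatrix}$
Note that
a) in general $AB \neq BA$
b) $A(BC) = (AB)C$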
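Example 18
If $A = \begin{pmatrix} 1 & 3 \\ -2 & 5 \end{pmatrix}$ and $B = \begin{pmatrix} -1 & 2 \\ 0 & 4 \end{pmatrix}$
The sum of both matrices (both have the same format)
$A + B = \begin{pmatrix} 1 & 3 \\ -2 & 5 \end{pmatrix} + \begin{pmatrix} -1 & 2 \\ 0 & 4 \end{pmatrix} = \begin{pmatrix} 0 & 5 \\ -2 & 9 \end{pmatrix}$
The product of both matrices (number of columns of A equals the
number of rows of B):
$AB = \begin{pmatrix} 1 & 3 \\ -2 & 5 \end{pmatrix} \begin{pmatrix} -1 & 2 \\ 0 & 4 \end{pmatrix} = \begin{pmatrix} -1 & 14 \\ 2 & 16 \end{pmatrix}$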
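How to interpret Ax?
$Ax = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} + x_2 \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix}$
Thus Ax is a linear combination of the vectors $\begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix}$ and $\begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix}$
Example 19:
a) $\begin{pmatrix} 1 & 3 \\ -2 & 5 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ -2 \end{pmatrix} + 3 \begin{pmatrix} 3 \\ 5 \end{pmatrix}$
b) $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 3 \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
c) $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = 2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + 3 \begin{pmatrix} 1 \\ 0 \end{pmatrix}$
d) $\begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = 2 \begin{pmatrix} 5 \\ 0 \end{pmatrix} + 3 \begin{pmatrix} 0 \\ 5 \end{pmatrix}$
e) $\begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = 2 \begin{pmatrix} 2 \\ 4 \end{pmatrix} + 3 \begin{pmatrix} 1 \\ 2 \end{pmatrix}$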
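How to interpret the identity matrix?
Definition:
The square matrix I is an identity matrix and it has the following
property
$Ix = x$
where $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
I is a diagonal matrix with ones on the diagonal (all off-diagonal
elements are zero).
For a 2 X 2 matrix A, we have the following consequence:
Consequence 1: IA = A
Furthermore, I times I equals I:
Consequence 2: II = I
Example 20:
See examples 17b and 19b:
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 3 \begin{pmatrix} 0 \\ 1 \end{pmatrix}$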
How to interpret the inverse matrix of A?
The inverse of the square matrix A is referred to as $A^{-1}$
It has the following properties:
1) $A A^{-1} = I$
2) $A^{-1} A = I$
Consequence: $I^{-1} I = I$
Why is it important to calculate the inverse matrix?
$Ax = b$
can be rewritten as:
$A^{-1} A x = A^{-1} b$
or
$x = A^{-1} b$
Hence, $Ax = b$ can be solved as $x = A^{-1} b$
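How to calculate the inverse matrix of A?
Definition:
The inverse of the square matrix A
$A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$
equals
$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -c \\ -b & a \end{pmatrix}$
Proof:
$A^{-1} A = \frac{1}{ad - bc} \begin{pmatrix} d & -c \\ -b & a \end{pmatrix} \begin{pmatrix} a & c \\ b & d \end{pmatrix}$
$= \frac{1}{ad - bc} \begin{pmatrix} da - bc & dc - dc \\ -ba + ab & -bc + ad \end{pmatrix}$
$= \frac{1}{ad - bc} \begin{pmatrix} da - bc & 0 \\ 0 & -bc + ad \end{pmatrix}$
$= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Note: the inverse of matrix A does not exist if $ad - bc = 0$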
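How to calculate the inverse matrix?
$(A \mid I) = \left( \begin{array}{cc|cc} 2 & 4 & 1 & 0 \\ 3 & 1 & 0 & 1 \end{array} \right)$
Round 1: Divide row 1 by 2
$\left( \begin{array}{cc|cc} 1 & 2 & 1/2 & 0 \\ 3 & 1 & 0 & 1 \end{array} \right)$
Round 2: Row 2 new: row 2 - 3 times row 1
$\left( \begin{array}{cc|cc} 1 & 2 & 1/2 & 0 \\ 0 & -5 & -3/2 & 1 \end{array} \right)$
Round 3: Divide row 2 by (-5)
$\left( \begin{array}{cc|cc} 1 & 2 & 1/2 & 0 \\ 0 & 1 & 3/10 & -1/5 \end{array} \right)$
Round 4: Row 1 new: row 1 - 2 times row 2
$(I \mid A^{-1}) = \left( \begin{array}{cc|cc} 1 & 0 & -1/10 & 2/5 \\ 0 & 1 & 3/10 & -1/5 \end{array} \right)$
Check:
$A A^{-1} = \begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} -1/10 & 2/5 \\ 3/10 & -1/5 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$
$A^{-1} A = \begin{pmatrix} -1/10 & 2/5 \\ 3/10 & -1/5 \end{pmatrix} \begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$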
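The rounds above follow a fixed recipe (scale the pivot row, then clear the rest of the column), which can be written as a short routine. The sketch below is one possible way to code it, using `fractions.Fraction` so the entries stay exact; the function name `inverse_by_sweeping` is made up for this illustration.

```python
from fractions import Fraction


def inverse_by_sweeping(matrix):
    """Invert a square matrix by row operations on the augmented matrix (A | I)."""
    n = len(matrix)
    # Build (A | I) with exact fractions.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Divide the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Subtract multiples of the pivot row from all other rows.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


A = [[2, 4], [3, 1]]
A_inv = inverse_by_sweeping(A)
print(A_inv)
```

For the 2 X 2 matrix of this slide the routine performs the same four rounds in the same order.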
ADVANCED MATHEMATICS TUTORIALS WEEK 2
Solutions Tutorials Week 2

Advanced Mathematics - Technical tutorial Week 2 (Wednesday)
Exercise 1.
Exercise 4.2.5. from Klein (below).
Determine if the following are conformable for matrix multiplication, and if so, indicate the
dimension of the resulting product.
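a) $\begin{pmatrix} 2 & 4 & 6 & 8 \\ 1 & 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} 2 & 4 \\ 1 & 3 \end{pmatrix}$
Solution:
The matrices are non-conformable, because the number of columns of the first matrix (4) is not equal to the
number of rows of the second matrix (2). However, by swapping both matrices, the product
$\begin{pmatrix} 2 & 4 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 2 & 4 & 6 & 8 \\ 1 & 3 & 4 & 5 \end{pmatrix}$
leads to two conformable matrices. This product has the format 2 x 4 (2 is the number of rows
in the first matrix and 4 is the number of columns of the second matrix).
b) $\begin{pmatrix} a & b & c \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix}$
Solution:
This is an important example, because it can be interpreted for instance as an objective
function in optimization problems. The product gives a scalar (a 1 x 1 number), which is
non-negative because it is the sum of three squared numbers.
$\begin{pmatrix} a & b & c \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = a^2 + b^2 + c^2$ (a scalar)
Remarkably, the product $\begin{pmatrix} a \\ b \\ c \end{pmatrix} \begin{pmatrix} a & b & c \end{pmatrix} = \begin{pmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{pmatrix}$ has a 3 x 3 dimension.
The features of this product are:
1) It is a square matrix
2) All diagonal elements are non-negative
3) The matrix is symmetric: $\begin{pmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{pmatrix}^T = \begin{pmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{pmatrix}$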
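b additional) I would like to give some further motivation for this example. An example of an
application of this is as follows:
$\begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 1^2 + 1^2 + 1^2 = 3$
whereas
$\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$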
m p s v
c) n q t w
o r u x
Solution:
The matrices are non-conformable in this order of multiplication.
d) A row vector of dimension 1 x k and a matrix of dimension k x l
Solution:
The product has the dimension 1 x l (a 1 by l row vector)
e) (bonus exercise from us, not in Klein)

What is the dimension of the product
$A B B^T C$
for which the matrices have the following format:
A is a p x q matrix
B is a q x q matrix (and the matrix $B^T$ is the transpose of the matrix B, which also has a q x q
format).
C is a q x r matrix
Of course we can generalize such a multiplication further. What you should learn is that the
number of rows of the first matrix (in this example A) and the number of columns of the last
matrix in the matrix product (in this example C) determine the format of the product.
Solution:
The product $A B B^T C$ is a p x r matrix
Exercise 2.
Determine in which order the following matrices can be multiplied and carry out the
multiplication.
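$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 1 & 2 & 3 \end{pmatrix}$, $\begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}$
Solution:
$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix} = \begin{pmatrix} 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 & 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 \\ 4 \cdot 1 + 5 \cdot 2 + 6 \cdot 3 & 4 \cdot 4 + 5 \cdot 5 + 6 \cdot 6 \\ 7 \cdot 1 + 8 \cdot 2 + 9 \cdot 3 & 7 \cdot 4 + 8 \cdot 5 + 9 \cdot 6 \\ 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 & 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 \end{pmatrix} = \begin{pmatrix} 14 & 32 \\ 32 & 77 \\ 50 & 122 \\ 14 & 32 \end{pmatrix}$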
Exercise 3.
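For the matrices $A = \begin{pmatrix} 1 & 4 \\ 4 & 5 \end{pmatrix}$, $B = \begin{pmatrix} 8 & 0 \\ 3 & 1 \end{pmatrix}$, show that $AB \neq BA$. Do the same, but without
calculation, for $\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}^T = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix}$.
Solution:
Observe that A is a symmetric matrix, because $A = A^T$ (A is equal to its transpose $A^T$). Note
that $B \neq B^T$, because $B^T = \begin{pmatrix} 8 & 3 \\ 0 & 1 \end{pmatrix}$. It doesn't help your calculations, but you should know what
it is.
$AB = \begin{pmatrix} 1 & 4 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} 8 & 0 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 20 & 4 \\ 47 & 5 \end{pmatrix} \neq BA = \begin{pmatrix} 8 & 0 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 4 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 8 & 32 \\ 7 & 17 \end{pmatrix}$
For the second pair of vectors, observe that one order of the multiplication gives rise to a 5x5
matrix, while the other leads to a 1x1 matrix, often called a number (or a scalar). Clearly those
cannot be the same. In fact the version with the number as an outcome, $\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}^T \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}$,
is just another way of writing the inner product. (Check that it's the same thing!)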
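The row-times-column rule is short enough to code directly, which makes it easy to see that the order of multiplication matters. The sketch below stores matrices as lists of rows; `matmul` is an illustrative helper, not a library function.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("number of columns of a must equal number of rows of b")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[1, 4], [4, 5]]
B = [[8, 0], [3, 1]]
print(matmul(A, B))  # AB
print(matmul(B, A))  # BA, a different matrix

column = [[1], [2], [3], [4], [5]]  # 5 x 1
row = [[1, 2, 3, 4, 5]]             # 1 x 5, the transpose of column
print(matmul(row, column))          # 1 x 1: the inner product
print(len(matmul(column, row)))     # 5 rows: a 5 x 5 matrix
```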
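3b).
Show for the matrix $A = \begin{pmatrix} 4 & 3 \\ 1 & 5 \end{pmatrix}$, with inverse $A^{-1} = \begin{pmatrix} \frac{5}{17} & -\frac{3}{17} \\ -\frac{1}{17} & \frac{4}{17} \end{pmatrix}$, that the general rule
$(A^{-1})^T = (A^T)^{-1}$
holds.
Solution:
We compute directly $(A^{-1})^T = \begin{pmatrix} \frac{5}{17} & -\frac{3}{17} \\ -\frac{1}{17} & \frac{4}{17} \end{pmatrix}^T = \begin{pmatrix} \frac{5}{17} & -\frac{1}{17} \\ -\frac{3}{17} & \frac{4}{17} \end{pmatrix}$.
If this is equal to $(A^T)^{-1}$, then it must hold that $I = A^T (A^T)^{-1} = A^T (A^{-1})^T$. We check:
$A^T (A^{-1})^T = \begin{pmatrix} 4 & 1 \\ 3 & 5 \end{pmatrix} \begin{pmatrix} \frac{5}{17} & -\frac{1}{17} \\ -\frac{3}{17} & \frac{4}{17} \end{pmatrix} = \frac{1}{17} \begin{pmatrix} 20 - 3 & -4 + 4 \\ 15 - 15 & -3 + 20 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Yippee.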
Exercise 4.
Write out the following sets of equations in matrix form. Solve by sweeping.
a)
$x_1 + x_2 - 3x_3 - 7x_4 = -5$
$2x_1 + 2x_2 - 4x_3 - 6x_4 = -4$
$x_1 - x_2 - 3x_3 + 6x_4 = 0$
$x_1 + 2x_3 - x_4 = 4$
b)
x1 3x2 x3 3
x1 2 x2 5 x3 4
x2 4 x3 1
c)
x1 3x2 x3 3
x1 2 x2 5 x3 4
x2 4 x3 0
Solution:
a)
Sweeping is basically adding and subtracting and multiplying equations until you get
something from which you can easily read the result. Ideal would be the identity matrix, but
in practice we settle for something a bit messier. It is best seen in an example. We write the
equation in a matrix as follows:
$\begin{pmatrix} 1 & 1 & -3 & -7 \\ 2 & 2 & -4 & -6 \\ 1 & -1 & -3 & 6 \\ 1 & 0 & 2 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} -5 \\ -4 \\ 0 \\ 4 \end{pmatrix}$
If you multiply this out, you indeed get the equations back (try it!). This is of course why
matrix multiplication is defined the way it is: it makes it very easy to write sets of equations
compactly. However, for actual solving we will write things down slightly differently.
Suppose we subtract the first equation $x_1 + x_2 - 3x_3 - 7x_4 = -5$ from the second
$2x_1 + 2x_2 - 4x_3 - 6x_4 = -4$, then we get:
$2x_1 + 2x_2 - 4x_3 - 6x_4 = -4$
$x_1 + x_2 - 3x_3 - 7x_4 = -5$ (subtract)
$x_1 + x_2 - x_3 + x_4 = 1$
Notice that the x's don't change, we only have to look at the value in front of them. That's
why we write down the set as follows:
$\left( \begin{array}{cccc|c} 1 & 1 & -3 & -7 & -5 \\ 2 & 2 & -4 & -6 & -4 \\ 1 & -1 & -3 & 6 & 0 \\ 1 & 0 & 2 & -1 & 4 \end{array} \right)$
Now we rework this augmented matrix, as it is called, to get something that we can interpret
quickly. Each time we rewrite the matrix, we indicate this with the sign ~ rather than =, as the
matrices are not equal. Notice that we not only subtract, add and multiply, but also
interchange rows. We get:
$\left( \begin{array}{cccc|c} 1 & 1 & -3 & -7 & -5 \\ 2 & 2 & -4 & -6 & -4 \\ 1 & -1 & -3 & 6 & 0 \\ 1 & 0 & 2 & -1 & 4 \end{array} \right) \sim \left( \begin{array}{cccc|c} 0 & 1 & -5 & -6 & -9 \\ 0 & 2 & -8 & -4 & -12 \\ 0 & -1 & -5 & 7 & -4 \\ 1 & 0 & 2 & -1 & 4 \end{array} \right) \sim \left( \begin{array}{cccc|c} 1 & 0 & 2 & -1 & 4 \\ 0 & 1 & -4 & -2 & -6 \\ 0 & 0 & -9 & 5 & -10 \\ 0 & 0 & -1 & -4 & -3 \end{array} \right)$
Here in the first step we subtracted the last row from the first row once, twice from the second
row and once from the third. In the second we interchanged the first and the last row, then
divided the second row by 2 and then added it to the third row and subtracted it from the
fourth. From now on we don't describe our steps, as this is very cumbersome and confusing.
We continue:
$\left( \begin{array}{cccc|c} 1 & 0 & 2 & -1 & 4 \\ 0 & 1 & -4 & -2 & -6 \\ 0 & 0 & -9 & 5 & -10 \\ 0 & 0 & -1 & -4 & -3 \end{array} \right) \sim \left( \begin{array}{cccc|c} 1 & 0 & 2 & -1 & 4 \\ 0 & 1 & -4 & -2 & -6 \\ 0 & 0 & 1 & 4 & 3 \\ 0 & 0 & 0 & 41 & 17 \end{array} \right) \sim \left( \begin{array}{cccc|c} 1 & 0 & 2 & -1 & 4 \\ 0 & 1 & -4 & -2 & -6 \\ 0 & 0 & 1 & 4 & 3 \\ 0 & 0 & 0 & 1 & \frac{17}{41} \end{array} \right)$
From the last matrix we can now easily solve the equations. The last line reads: $x_4 = \frac{17}{41}$. We
plug this in in the line above to get: $x_3 + 4(\frac{17}{41}) = 3 \Rightarrow x_3 = 3 - 4(\frac{17}{41}) = \frac{55}{41}$. We use this again in
the line above that: $x_2 - 4(\frac{55}{41}) - 2(\frac{17}{41}) = -6 \Rightarrow x_2 = -6 + 4(\frac{55}{41}) + 2(\frac{17}{41}) = \frac{8}{41}$ and finally
$x_1 + 2(\frac{55}{41}) - (\frac{17}{41}) = 4 \Rightarrow x_1 = 4 - 2(\frac{55}{41}) + (\frac{17}{41}) = \frac{71}{41}$.
Notice that once we have all zeros in the lower corner, life is quite easy. Of course, in general,
sweeping is just a systematic way of solving by substitution. The operations we used to
change the augmented matrix (addition, subtraction, etc.) are called elementary operations.
They are very useful in the study of linear algebra.
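Sweeping is mechanical enough to hand to a computer. The sketch below solves system a) with exact fractions. The minus signs of the coefficients are an assumption: they are chosen so that the elimination steps and the solution $x_4 = 17/41$, $x_3 = 55/41$, $x_2 = 8/41$, $x_1 = 71/41$ described above are reproduced. The function name `solve_by_sweeping` is illustrative.

```python
from fractions import Fraction


def solve_by_sweeping(coefficients, rhs):
    """Solve a square linear system by elementary row operations."""
    n = len(coefficients)
    # Augmented matrix (A | b) with exact fractions.
    aug = [
        [Fraction(v) for v in row] + [Fraction(b)]
        for row, b in zip(coefficients, rhs)
    ]
    for col in range(n):
        # Interchange rows if needed to get a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution (row of zeros appears)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row, then clear the column in all other rows.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# System a), with the signs assumed as described in the lead-in.
A = [
    [1, 1, -3, -7],
    [2, 2, -4, -6],
    [1, -1, -3, 6],
    [1, 0, 2, -1],
]
b = [-5, -4, 0, 4]
solution = solve_by_sweeping(A, b)
print(solution)
```

This routine sweeps all the way to the identity matrix, so no back substitution is needed; the hand calculation stops earlier and substitutes instead, which is less work on paper.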
101
b)
1 3 1 x1 3
1 2 5 x2 4
0 1 4 x3 1
1 3 1 3 1 3 1 3 1 3 1 3
1 2 5 4 0 1 4 1 0 1 4 1
0 1 4 1 0 1 4 1 0 0 0 0
Hmm, so whats going on here? We have an equation saying 0 x1 0 x2 0 x3 0 , which doesnt
tell us anything we did not already know. And after that we have only two equations left for
three unknowns. So what we get is infinitely many solutions. (If you find this confusing, think
back to the simpler case x1 x2 0 , which has infinitely many solutions also: one for each x1
such that x1 x2 .) We have one free variable. We could pick any of our xs to be the free one
and express the others in terms of it. We pick x3 . Then we get x2 4 x3 1 x2 1 4 x3
and x1 3( 1 4 x3 ) x3 3 x1 6 13x3 .
A system of linear equations which allows multiple solutions like this is called
underdetermined. You get such an underdetermined system if and only if you get a row of all
zeros in your augmented matrix. The number of free variables you get is the number of
columns minus the number of nonzero rows you get in the end, i.e. 3-2=1 in this case.
c)
1 3 1 x1 3
1 2 5 x2 4
0 1 4 x3 0
1 3 1 3 1 3 1 3 1 3 1 3
1 2 5 4 0 1 4 1 0 1 4 1
0 1 4 0 0 1 4 0 0 0 0 1
Hmm, this seems especially problematic: 0 x1 0 x2 0 x3 1 . What we have here is an
inconsistent system: it has no solutions. It is no coincidence that the matrix (not augmented)
here is the same as under b). Non-augmented matrices which can get a row of all zeros give
rise to either an underdetermined or an inconsistent system. Only matrices such as that under
a), which cannot have a row of zeros under elementary operations, have one and only one
solution for every possible vector on the right hand side. The number of rows with not all
zeros in the end is called the rank of the matrix. Thus this matrix has rank 2, whereas that
under a) had rank 4.
Exercise 5.
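Find vectors b and c that pick out element $a_{ij}$ from matrix $A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$, i.e.
$a_{ij} = b \cdot A \cdot c$.
Solution:
Consider the vectors: $b = e_i^T = (0 \; \cdots \; 1 \; \cdots \; 0)$, $c = e_j = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}$, where $e_j$ has a 1 only in its j-th
spot. These e's are called unit vectors. Then:
$b \cdot A \cdot c = (0 \; \cdots \; 1 \; \cdots \; 0) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = (0 \; \cdots \; 1 \; \cdots \; 0) \begin{pmatrix} a_{1j} \\ \vdots \\ a_{ij} \\ \vdots \\ a_{mj} \end{pmatrix} = a_{ij}$
In general: pre-multiplication with a unit vector gives you a row from the matrix,
post-multiplication gives you a column.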
* Exercise 6.
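Find vectors b and c such that $b \cdot A \cdot c$ gives the average of all elements of A.
Solution:
Consider $b = (1 \; 1 \; \cdots \; 1)$, $c = \begin{pmatrix} \frac{1}{mn} \\ \vdots \\ \frac{1}{mn} \end{pmatrix}$, then:
$b \cdot A \cdot c = (1 \; 1 \; \cdots \; 1) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} \frac{1}{mn} \\ \vdots \\ \frac{1}{mn} \end{pmatrix} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$
This is the sum of all the elements in A divided by the number of elements in A, i.e. the
average. The term $\frac{1}{mn}$ could also have been in front of the first vector, or shared between
them.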
* Exercise 7.
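Write out the matrix product $A \cdot B$ in terms of their typical elements $a_{ij}$, $b_{ij}$, assuming A and B
are conformable (which means that the number of columns of A is equal to the number of
rows of B). I.e., find the typical element $c_{ij}$ of $C = A \cdot B$, where we write $C = (c_{ij})$.
Solution:
With A an m x n matrix and B an n x p matrix, the element $c_{ij}$ is the i-th row of A times the j-th column of B:
$c_{ij} = \begin{pmatrix} a_{i1} & \cdots & a_{in} \end{pmatrix} \begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix} = a_{i1} b_{1j} + \dots + a_{in} b_{nj} = \sum_{k=1}^{n} a_{ik} b_{kj}$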
* Exercise 8.
Show that the length formula $\|x\| = \sqrt{x_1^2 + x_2^2}$ holds in two dimensions.
Solution:
The picture shows a general vector x and its components $x_1, x_2$. Pythagoras' theorem now states (as you hopefully remember from high school) that for a triangle with a right angle like this, $\|x\|^2 = x_1^2 + x_2^2$. Taking the square root gives the result.
* Exercise 9.
Show that the circle formula $(x_1 - c_1)^2 + (x_2 - c_2)^2 = R^2$ indeed gives a circle with radius R and centre $(c_1, c_2)$.
Solution:
The figure tells the basic story. We start with point $(c_1, c_2)$ and we pick a point $(x_1, x_2)$ such that the condition $(x_1 - c_1)^2 + (x_2 - c_2)^2 = R^2$ holds. We did this in the figure. But then, by Pythagoras' theorem, the distance between $(c_1, c_2)$ and $(x_1, x_2)$ must be R. This must hold for all $(x_1, x_2)$ we can find this way, so the set of points $(x_1, x_2)$ for which the condition holds is the set of points that has distance R to $(c_1, c_2)$. This is the circle drawn.
Exercise 10.
Find all vectors in $\mathbb{R}^2$ orthogonal to $(1, 2)$. Do the same for $(1, 2, 3)$ in $\mathbb{R}^3$.
Solution:
For an orthogonal vector $(a, b)$ we want $(1, 2) \cdot (a, b) = 0$. We find $(1, 2) \cdot (a, b) = a + 2b = 0$. This is a single equation in two unknowns, so we have one free variable. Let's take b free. Then $a = -2b$, so all vectors $(-2b, b)$, $b \in \mathbb{R}$, are orthogonal to $(1, 2)$.
For $(1, 2, 3)$ we want $(1, 2, 3) \cdot (a, b, c) = 0$. We find $(1, 2, 3) \cdot (a, b, c) = a + 2b + 3c = 0$. This is a single equation in 3 unknowns, so we have two free variables. Let's take b and c free. Then $a = -2b - 3c$, so all vectors $(-2b - 3c, b, c)$, $b, c \in \mathbb{R}$, are orthogonal to $(1, 2, 3)$. If you try to imagine this in space, it makes sense that an orthogonal vector in $\mathbb{R}^3$ has two free variables. If you find an orthogonal vector you can not only extend it, like in $\mathbb{R}^2$, but also rotate it.
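A quick numerical check of both families of orthogonal vectors, as a sketch in plain Python. The function names and the sampled values of the free variables are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def orthogonal_2d(b):
    """Member of the family (-2b, b), orthogonal to (1, 2)."""
    return (-2 * b, b)


def orthogonal_3d(b, c):
    """Member of the family (-2b - 3c, b, c), orthogonal to (1, 2, 3)."""
    return (-2 * b - 3 * c, b, c)


# Every sampled member of each family should give an inner product of zero.
checks_2d = [dot((1, 2), orthogonal_2d(b)) for b in (-3, 0, 1, 5)]
checks_3d = [dot((1, 2, 3), orthogonal_3d(b, c))
             for b in (-1, 0, 2) for c in (-2, 1)]
print(checks_2d, checks_3d)
```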
* Exercise 11. Bonus:
Show that the formula for orthogonality coincides with our intuitive notion of orthogonality.
Solution:
As a preliminary, convince yourself that the vector indicated in the following figure as B-A is
indeed B-A.
The easiest way to see this (much easier than I explained in class) is that in the figure A and
B - A add up to B, which they clearly also do algebraically.
Now look at the following picture:
Convince yourself that the vectors are as indicated. Now, from the figure we see that A and B can only be orthogonal if A + B and A - B have the same length, like so:
A + B having the same length as B - A means $\|B - A\| = \|A + B\|$. We now manipulate this expression to derive that then $A \cdot B = 0$, as we want. First note that for any vector C it holds that $\|C\|^2 = C \cdot C$. This can be verified easily by writing out the definitions.
Now, squaring our equation, we get:
$$\|B - A\|^2 = \|A + B\|^2 \;\Rightarrow\; (B - A) \cdot (B - A) = (A + B) \cdot (A + B)$$
Working this out:
$$B \cdot B - 2 A \cdot B + A \cdot A = B \cdot B + 2 A \cdot B + A \cdot A$$
$$-2 A \cdot B = 2 A \cdot B \;\Rightarrow\; 4 A \cdot B = 0 \;\Rightarrow\; A \cdot B = 0$$
Which is what we wanted.
Broad tutorial Friday week 2
Exercise 1.
For vectors $a = (a_1, a_2)$, $b = (b_1, b_2)$ from $\mathbb{R}^2$, draw them and $a + b$. Also calculate $a + b$.
Solution:
In the figure, the red lines denote the vectors a and b, while the green lines denote their translations (by b and a respectively). The yellow line is the resultant vector a+b, which has coordinates $(a_1 + b_1, a_2 + b_2)$.
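* Exercise 2.
Consider the matrix $A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$ working on vector $b = (b_1, \ldots, b_n)'$. Show that this constitutes a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$.
Solution:
Let $T(b) = A \cdot b$ be the transformation indicated. Then we have to show that $T(\lambda b) = \lambda T(b)$ for all $\lambda \in \mathbb{R}$, $b \in \mathbb{R}^n$, and $T(b_1 + b_2) = T(b_1) + T(b_2)$ for all $b_1, b_2 \in \mathbb{R}^n$. So:
$$T(\lambda b) = A \cdot (\lambda b) = \lambda (A \cdot b) = \lambda T(b), \quad \forall \lambda \in \mathbb{R}, b \in \mathbb{R}^n$$
$$T(b_1 + b_2) = A \cdot (b_1 + b_2) = A \cdot b_1 + A \cdot b_2 = T(b_1) + T(b_2), \quad \forall b_1, b_2 \in \mathbb{R}^n$$
This is so easy that it becomes confusing. Have a good look at it: you're gazing into the minds of mathematicians. From the outset we did not know that a matrix function was always a linear map, nor that the reverse holds (as we showed in class). Now however, we have proved that this is the case and that matrix functions and linear maps are the same thing.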
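* Exercise 3.
Show that $(A \cdot B)^{-1} = B^{-1} \cdot A^{-1}$.
Solution:
$$(A B)^{-1} (A B) = I \;\Rightarrow\; (A B)^{-1} (A B) B^{-1} = I \cdot B^{-1}$$
$$\Rightarrow\; (A B)^{-1} A = B^{-1}$$
$$\Rightarrow\; (A B)^{-1} A A^{-1} = B^{-1} A^{-1}$$
$$\Rightarrow\; (A B)^{-1} = B^{-1} A^{-1}$$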
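The rule for the inverse of a product can be checked with exact fractions. The sketch below uses plain Python; the helper names and the two 2 x 2 matrices are illustrative assumptions, not part of the exercise.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Matrix product via c_ij = sum over k of a_ik * b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix: swap the diagonal, negate the off-diagonal, divide by the determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
lhs = inverse_2x2(mat_mul(A, B))                    # (AB)^{-1}
rhs = mat_mul(inverse_2x2(B), inverse_2x2(A))       # B^{-1} A^{-1}
wrong_order = mat_mul(inverse_2x2(A), inverse_2x2(B))  # A^{-1} B^{-1}
print(lhs == rhs, lhs == wrong_order)
```

The second comparison illustrates that the order of the factors matters: for these two matrices the product of the inverses in the original order does not give the inverse of the product.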
Exercise 4.
Determine the dimension of the span of the following vectors:
1 0 3 2
2 7 5 11
3 , 4 , 3 , 14
4 2 7 7
5 5 0 5
Solution:
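The span of a set of vectors is the set of all linear combinations (multiples and sums) of these vectors. Geometrically, you can think of it as everywhere you can get by taking steps in the direction of these vectors. The dimension of a span is the maximum number of linearly independent vectors within the span. So we have to determine how many linearly independent vectors there are in the set of four vectors given.
Linear independence was defined as follows: $v_1, v_2, \ldots, v_n$ are linearly independent if $\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n = 0$ only if $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$. We can write this in matrix notation as follows:
$$(v_1 \; v_2 \; \cdots \; v_n) \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} = 0 \quad \text{only if} \quad \lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$$
This equation has the associated augmented matrix:
$$\left( \begin{array}{cccc|c} & & & & 0 \\ v_1 & v_2 & \cdots & v_n & \vdots \\ & & & & 0 \end{array} \right)$$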
Let's think about this for a second. If we start sweeping this matrix, the right hand side will never change, as we would be adding zeros.
Now, in general for a set of linear equations there were three possibilities: well-determined, underdetermined, or inconsistent. However, in this case we already know that the system is not inconsistent (that is called consistent), because $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$ is certainly a solution. So the question becomes whether it is underdetermined and to what extent (how many free variables are there). If there are no free variables, then all the vectors are linearly independent. If there are free variables, then as many vectors are redundant (linearly dependent on the others) as there are free variables.
So what this boils down to is just an ordinary matrix sweep, made simpler by the fact that the right hand side contains only zeros, and then counting the number of lines that do not contain only zeros.
It will be clearer after our example, so let's turn to that. We sweep:
1 0 3 2 0 1 0 3 2 0 1 0 3 2 0
2 7 5 11 0 2 7 5 11 0 0 7 1 15 0
2 4 3 14 0 2 4 3 14 0 0 4 9 18 0
5 11 5 23 0 3 7 8 9 0 1 0 3 2 0
7 15 2 37 0 2 4 3 14 0 0 0 0 0 0
110
1 0 3 2 0 1 0 3 2 0
1 15 1 15
0 1 0 0 1 0
7 7 7 7
59 111 111
0 0 0 0 0 1 0
7 7 59
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
We see that from the four $\lambda$s we set out to find, one will be free, while the other three are fixed in terms of the fourth. We don't care about their actual values, so we stop solving here. The dimension of the span of the set of vectors is three.
We have determined the dimension of the span of the column vectors of a matrix (this span is often called the column space of the matrix). However, some reflection will show that we also determined the dimension of the span of the row vectors of our matrix (called the row space). The reason is as follows: vectors are linearly independent if none of them can be obtained through elementary operations (addition, subtraction, multiplication by a number) on the others. That is precisely what we check by sweeping. We therefore see that the row space also has dimension three. In general the dimension of the row space is equal to the dimension of the column space. It is also equal to the rank of the matrix, as we defined it in class. So now we have a few ways of thinking about the rank of a matrix.
Exercise 5.
Consider the linear map $T: \mathbb{R}^n \to \mathbb{R}^m$ with associated matrix $A = (V_1 \; V_2 \; \cdots \; V_n)$, where $V_i$ is the i-th column vector. Write the outcome of the map in terms of the column vectors and a general vector $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. What does this mean for the relation between the column space (the space spanned by the column vectors of A) and the range of T?
Solution:
$$T: (x_1, x_2, \ldots, x_n) \mapsto A \cdot x = (V_1 \; V_2 \; \cdots \; V_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 V_1 + \cdots + x_n V_n$$
So the image of T (the possible outcomes that T could give) is equal to the set of all linear combinations of the column vectors of A. But that set is just the column space of A. So the column space of A is the range of T.
* Exercise 6.
Consider linear maps $T_1: \mathbb{R}^n \to \mathbb{R}^m$ and $T_2: \mathbb{R}^m \to \mathbb{R}^p$, with associated matrices $A_1, A_2$. Show that the matrix associated with the linear map $T_2 \circ T_1$ is $A_2 \cdot A_1$.
Solution:
What does $T_2 \circ T_1$ mean again? If $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, then $T_1(x) \in \mathbb{R}^m$, so that we can apply $T_2$: $T_2(T_1(x))$. This is meant by $T_2 \circ T_1$. It defines a new linear map $T_3: \mathbb{R}^n \to \mathbb{R}^p$ which has an associated matrix also. Let's call this matrix C and derive what it is:
$$T_3(x) = C \cdot x = T_2(T_1(x)) = T_2(A_1 \cdot x) = A_2 \cdot A_1 \cdot x \;\Rightarrow\; C = A_2 \cdot A_1.$$
Advanced mathematics
Additional exercises of week 2
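Question 1
Given is the vector $x = (1, -2, 0)' \in \mathbb{R}^3$.
Make a system of orthonormal vectors based on the vector x that span the entire space $\mathbb{R}^3$.
Solution
In a three dimensional space there are at maximum three independent vectors. First, we consider the three unit vectors $e_1 = (1, 0, 0)'$, $e_2 = (0, 1, 0)'$, $e_3 = (0, 0, 1)'$.
They have the following features:
1) The vectors are perpendicular, which means that the inner product for a pair of these vectors is zero. For instance for $e_1$ and $e_2$, the inner product is $e_1 \cdot e_2 = 1 \cdot 0 + 0 \cdot 1 + 0 \cdot 0 = 0$. This also holds true for the other combinations ($e_1$ and $e_3$, $e_2$ and $e_3$).
2) The vectors have a length of one. For instance, the length of the vector $e_1$ is $\sqrt{1^2 + 0^2 + 0^2} = 1$.
3) The three vectors span the entire space $\mathbb{R}^3$. It implies that each vector in $\mathbb{R}^3$ can be written as a linear combination of $e_1$, $e_2$ and $e_3$. For instance:
$$\begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix} = 1 \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - 2 \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + 0 \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
Next, we consider $x = (1, -2, 0)'$. Our strategy is the following. Step 1: we construct two vectors that are perpendicular to x. Step 2: we normalize the length of the vectors to one.
Step 1: The vector $y = (2, 1, 0)'$ is perpendicular to $x = (1, -2, 0)'$.
Reason: $x \cdot y = 1 \cdot 2 + (-2) \cdot 1 + 0 \cdot 0 = 0$
(Thus $(-b, a, 0)'$ is perpendicular to $(a, b, 0)'$.)
The vector $z = (0, 0, 1)'$ is perpendicular to $x = (1, -2, 0)'$ and $y = (2, 1, 0)'$.
Reason: $x \cdot z = 1 \cdot 0 + (-2) \cdot 0 + 0 \cdot 1 = 0$ and $y \cdot z = 2 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 = 0$
(Thus $(0, 0, 1)'$ is perpendicular to $(a, b, 0)'$ and $(-b, a, 0)'$.)
Step 2:
a) The length of $x = (1, -2, 0)'$ is $\sqrt{1^2 + (-2)^2 + 0^2} = \sqrt{5}$.
Hence, the length of $\frac{1}{\sqrt{5}} x$ is $\sqrt{\frac{1}{5} + \frac{4}{5} + \frac{0}{5}} = \sqrt{\frac{5}{5}} = 1$.
b) The length of $y = (2, 1, 0)'$ is $\sqrt{2^2 + 1^2 + 0^2} = \sqrt{5}$.
Hence, the length of $\frac{1}{\sqrt{5}} y$ is $\sqrt{\frac{4}{5} + \frac{1}{5} + \frac{0}{5}} = \sqrt{\frac{5}{5}} = 1$.
c) The length of $z = (0, 0, 1)'$ is 1.
Step 3. The orthonormal vectors
$$\begin{pmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \\ 0 \end{pmatrix}, \quad \begin{pmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
span the entire space $\mathbb{R}^3$. It means that each vector in $\mathbb{R}^3$ can be written as a linear combination of these three vectors.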
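The two defining properties of an orthonormal system (unit length, pairwise inner product zero) can be verified numerically. This is a minimal sketch in plain Python, assuming x = (1, -2, 0) as in the length computation of Step 2; the function names and the tolerance are illustrative choices.

```python
import math


def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def is_orthonormal(vectors, tol=1e-12):
    """True if every vector has length one and every pair is perpendicular."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(u, v) - expected) > tol:
                return False
    return True


s = 1 / math.sqrt(5)
basis = [(s, -2 * s, 0.0), (2 * s, s, 0.0), (0.0, 0.0, 1.0)]
unnormalized = [(1, -2, 0), (2, 1, 0), (0, 0, 1)]
print(is_orthonormal(basis), is_orthonormal(unnormalized))
```

The second call shows why Step 2 is needed: the unnormalized vectors are mutually perpendicular but two of them do not have length one.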
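Question 2
Given is the vector $x = (1, -2, 0)'$
and $A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ -1 & 4 & 2 \end{pmatrix}$
Compute $x'x$ and $x'Ax$
Solution:
$$x'x = (1 \; -2 \; 0) \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix} = 1 \cdot 1 + (-2) \cdot (-2) + 0 \cdot 0 = 5$$
$$x'Ax = (1 \; -2 \; 0) \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ -1 & 4 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix} = (1 \; -2 \; 0) \begin{pmatrix} 1 \cdot 1 + 1 \cdot (-2) + 0 \cdot 0 \\ 0 \cdot 1 + 2 \cdot (-2) + 1 \cdot 0 \\ (-1) \cdot 1 + 4 \cdot (-2) + 2 \cdot 0 \end{pmatrix}$$
$$= (1 \; -2 \; 0) \begin{pmatrix} -1 \\ -4 \\ -9 \end{pmatrix} = 1 \cdot (-1) + (-2) \cdot (-4) + 0 \cdot (-9) = 7$$
Alternative solution:
$$x'Ax = \left[ (1 \; -2 \; 0) \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ -1 & 4 & 2 \end{pmatrix} \right] \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}$$
$$= \left( 1 \cdot 1 + (-2) \cdot 0 + 0 \cdot (-1) \quad 1 \cdot 1 + (-2) \cdot 2 + 0 \cdot 4 \quad 1 \cdot 0 + (-2) \cdot 1 + 0 \cdot 2 \right) \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}$$
$$= (1 \; -3 \; -2) \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix} = 1 \cdot 1 + (-3) \cdot (-2) + (-2) \cdot 0 = 7$$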
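Both quantities are instances of the quadratic form x'Mx (with M the identity for x'x). The sketch below is plain Python; the signs of the entries of x and A are assumptions chosen so that the results agree with the values 5 and 7 in the solution, and the function name is illustrative.

```python
def quadratic_form(x, M):
    """Compute x' M x by first forming M x and then taking the inner product with x."""
    n = len(x)
    Mx = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Mx[i] for i in range(n))


# Assumed signs: x = (1, -2, 0)' and a_31 = -1.
x = [1, -2, 0]
A = [[1, 1, 0],
     [0, 2, 1],
     [-1, 4, 2]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(quadratic_form(x, identity))  # x'x
print(quadratic_form(x, A))         # x'Ax
```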
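Question 3
Let $A = \begin{pmatrix} 1 & 2 \\ 2 & t \end{pmatrix}$ and $x = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$.
a) For which t does the determinant of the matrix A have a negative value?
b) For which t is $x'Ax > 0$?
c) For which t is $x'A^{-1}x > 0$?
Does it matter for these results that A is a symmetric matrix, so that $A = A'$?
So, check whether the results are different for e.g. $B = \begin{pmatrix} 1 & -1 \\ 2 & t \end{pmatrix}$.
Solution
a) $\det(A) = 1 \cdot t - 2 \cdot 2 = t - 4$
The determinant of A is negative if $t < 4$.
b)
$$x'Ax = (1 \; -2) \begin{pmatrix} 1 & 2 \\ 2 & t \end{pmatrix} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = (-3 \quad 2 - 2t) \begin{pmatrix} 1 \\ -2 \end{pmatrix} = -3 - 2(2 - 2t) > 0$$
Thus, $-7 + 4t > 0$, so that $t > 7/4$.
c) $A = \begin{pmatrix} 1 & 2 \\ 2 & t \end{pmatrix}$, so that
$$A^{-1} = \frac{1}{t - 4} \begin{pmatrix} t & -2 \\ -2 & 1 \end{pmatrix}$$
$$x'A^{-1}x = (1 \; -2) \frac{1}{t - 4} \begin{pmatrix} t & -2 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \frac{1}{t - 4} (t + 4 \quad -4) \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \frac{t + 12}{t - 4}$$
It is positive if $t > 4$ or $t < -12$.
We check the results for $B = \begin{pmatrix} 1 & -1 \\ 2 & t \end{pmatrix}$:
$$B^{-1} = \frac{1}{t + 2} \begin{pmatrix} t & 1 \\ -2 & 1 \end{pmatrix}$$
$$x'B^{-1}x = (1 \; -2) \frac{1}{t + 2} \begin{pmatrix} t & 1 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \frac{1}{t + 2} (t + 4 \quad -1) \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \frac{t + 6}{t + 2}$$
It is positive if $t > -2$ or $t < -6$.
We do not observe any major difference between A and B.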
116
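Question 4
Solve the following systems of equations.
a)
$$\begin{pmatrix} 1 & 2 & -2 \\ 5 & 7 & -8 \\ 6 & 3 & -4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 7 \end{pmatrix}$$
Solution:
We write the augmented matrix form:
$$\left( \begin{array}{ccc|c} 1 & 2 & -2 & 3 \\ 5 & 7 & -8 & 2 \\ 6 & 3 & -4 & 7 \end{array} \right) \sim \left( \begin{array}{ccc|c} 1 & 2 & -2 & 3 \\ 0 & -3 & 2 & -13 \\ 0 & -9 & 8 & -11 \end{array} \right) \sim \left( \begin{array}{ccc|c} 1 & 2 & -2 & 3 \\ 0 & -3 & 2 & -13 \\ 0 & 0 & 2 & 28 \end{array} \right)$$
So $2z = 28 \Rightarrow z = 14$, and $-3y + 2z = -13 \Rightarrow -3y + 28 = -13 \Rightarrow y = \frac{41}{3}$, and $x + 2y - 2z = 3 \Rightarrow x + \frac{82}{3} - 28 = 3 \Rightarrow x = 31 - \frac{82}{3} = \frac{11}{3}$. We check our results:
$$\begin{pmatrix} 1 & 2 & -2 \\ 5 & 7 & -8 \\ 6 & 3 & -4 \end{pmatrix} \begin{pmatrix} \frac{11}{3} \\ \frac{41}{3} \\ 14 \end{pmatrix} = \begin{pmatrix} \frac{11}{3} + \frac{82}{3} - 28 \\ \frac{55}{3} + \frac{287}{3} - 112 \\ \frac{66}{3} + \frac{123}{3} - 56 \end{pmatrix} = \begin{pmatrix} \frac{93}{3} - \frac{84}{3} \\ \frac{342}{3} - \frac{336}{3} \\ \frac{189}{3} - \frac{168}{3} \end{pmatrix} = \begin{pmatrix} \frac{9}{3} \\ \frac{6}{3} \\ \frac{21}{3} \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 7 \end{pmatrix}$$
Yippee.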
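The sweep can also be carried out mechanically with exact fractions. The following sketch implements forward elimination followed by back substitution in plain Python; the function name `solve` is an assumption, and the signs of the coefficients are those that reproduce the solution x = 11/3, y = 41/3, z = 14 found in a).

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by sweeping the augmented matrix, then back-substituting."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Create zeros below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    # Back substitution, starting from the last equation.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


solution = solve([[1, 2, -2], [5, 7, -8], [6, 3, -4]], [3, 2, 7])
print([str(value) for value in solution])
```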
b)
3 2 1 4 w 1
2 3 4 5 x 2
9 8 7 6 y 3
1 8 3 4 z 4
Solution:
Just the bare calculations:
117
3 2 1 4 1 1 8 3 4 4 1 8 3 4 4
2 3 4 5 2 2 3 4 5 2 0 13 2 13 6
~ ~ ~
9 8 7 6 3 9 8 7 6 3 0 64 34 30 33
1 8 3 4 4 3 2 1 4 1 0 26 8 8 11
1 8 3 4 4
0 13 2 13 6
13 64 2 64 13 64 6 64 ~
0 0( 64 ) 34 30 33
13 13 13 13
0 0 4 18 1
1 8 3 4 4
1 8 3 4 4
0 13 2 13 6
0 13 2 13 6
314 442 45 ~ ~
0 0 0 0 4 18 1
13 13 13
0 0 314 442 45
0 0 4 18 1
1 8 3 4 4 1 8 3 4 4
0 13 2 13 6 0 13 2 13 6
0 0 4 18 1 ~ 0 0 4 18 1
18 314 314 247
0 0 0 442 45 0 0 0 971
4 4 2
247 247 247 313
So 971z , 4 y 18 z 1
z 4 y 18 y ,
2 1942 1942 971
313 247 553
13x 2 y 13z 6 13x 2 13 x ,
971 1942 1942
553 313 247 239
w 8x 3 y 4 z 4 w 8 3 4 w
1942 971 1942 971
Let's check the result:
239 239 553 313 247
3 2 4
971 971 1942 971 1942
3 2 1 4 553 239 553 313 247 1
2 3 4 5
2 3 4 5 1942 971 1942 971 1942 2
9 8 7 6 313 239 553 313 247 3
9 8 7 6
1 8 3 4 971 971 1942 971 1942 4
247 239 553 313 247
8 3 4
1942 971 1942 971 1942
Phew. What a chore. Let's never sweep a matrix again! (Until the exam, of course)
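Take home assignments on material of week 2
1.
Compute the a, b and c for which the three three-dimensional vectors form a system of orthonormal vectors.
$$\begin{pmatrix} c \\ 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} -1 \\ 2 \\ b \end{pmatrix}, \quad \begin{pmatrix} 2 \\ a \\ 1 \end{pmatrix}$$
Solution:
Our method of solution is that we first determine the a, b, and c for which the three vectors are mutually perpendicular. Next, we normalize the three vectors by multiplying each vector with one over the norm of the vector (= 1/length).
We consider three pairs of vectors that must have an inner product of zero.
$-c + 2 + b = 0$ (vectors 1 and 2 are orthogonal)
$2c + a + 1 = 0$ (vectors 1 and 3 are orthogonal)
$-2 + 2a + b = 0$ (vectors 2 and 3 are orthogonal)
It can be shown that $a = 3$, $b = -4$, $c = -2$.
The length (or norm) of the three vectors is respectively:
$$\left\| \begin{pmatrix} c \\ 1 \\ 1 \end{pmatrix} \right\| = \sqrt{(-2)^2 + 1^2 + 1^2} = \sqrt{6}$$
$$\left\| \begin{pmatrix} -1 \\ 2 \\ b \end{pmatrix} \right\| = \sqrt{(-1)^2 + 2^2 + (-4)^2} = \sqrt{21}$$
$$\left\| \begin{pmatrix} 2 \\ a \\ 1 \end{pmatrix} \right\| = \sqrt{2^2 + 3^2 + 1^2} = \sqrt{14}$$
Hence it would be possible to normalize the three vectors by multiplying them by $\frac{1}{\sqrt{6}} = \frac{1}{6}\sqrt{6}$, $\frac{1}{\sqrt{21}} = \frac{1}{21}\sqrt{21}$ and $\frac{1}{\sqrt{14}} = \frac{1}{14}\sqrt{14}$, respectively.
(note: I am rewriting all of these three numbers, such as $\frac{1}{\sqrt{6}} = \frac{1}{6}\sqrt{6}$, because I don't want to have an irrational number (square root) in the denominator of the ratio)
Thus an orthonormal system of three vectors is
$$\frac{1}{6}\sqrt{6} \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}, \quad \frac{1}{21}\sqrt{21} \begin{pmatrix} -1 \\ 2 \\ -4 \end{pmatrix}, \quad \frac{1}{14}\sqrt{14} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}$$
However, after this normalization the second and third elements of the first vector are no longer equal to 1. It is not possible to choose the c of the vector $(c, 1, 1)'$ such that the vector has length one without changing the other two elements of the vector. This is also the case for the other two vectors. So, the formal answer to this question is that there are no a, b and c for which the three given vectors are orthonormal; with $a = 3$, $b = -4$, $c = -2$ they are only orthogonal, and the orthonormal system based on them is the one given above.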
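2.
For $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}$ demonstrate that
a) $(A \cdot B)^T = B^T \cdot A^T$
b) $((A \cdot B)^T)^{-1} = ((A \cdot B)^{-1})^T$
The superscript T refers to the transpose of the matrix.
Solution:
$$A \cdot B = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 2 + 2 & 0 + 1 \\ 0 + 6 & 0 + 3 \end{pmatrix} = \begin{pmatrix} 4 & 1 \\ 6 & 3 \end{pmatrix}$$
$$(A \cdot B)^T = \begin{pmatrix} 4 & 6 \\ 1 & 3 \end{pmatrix}$$
$$B^T \cdot A^T = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 2 + 2 & 0 + 6 \\ 0 + 1 & 0 + 3 \end{pmatrix} = \begin{pmatrix} 4 & 6 \\ 1 & 3 \end{pmatrix}$$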
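Both identities, including part b) for which no computation is written out above, can be checked numerically. This is a sketch in plain Python with exact fractions; the helper names are illustrative assumptions, and the entries of A and B are taken as all positive, consistent with the product computed in the solution.

```python
from fractions import Fraction


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


def mat_mul(A, B):
    """Matrix product via c_ij = sum over k of a_ik * b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via its determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [0, 3]]
B = [[1, 0], [2, 1]]
AB = mat_mul(A, B)

part_a = transpose(AB) == mat_mul(transpose(B), transpose(A))
part_b = inverse_2x2(transpose(AB)) == transpose(inverse_2x2(AB))
print(part_a, part_b)
```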
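3.
Determine the span of the four vectors in $\mathbb{R}^3$
$$\begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} -1 \\ 2 \\ 4 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ -1 \\ -3 \end{pmatrix}$$
We are interested in the linear combinations of the four vectors (addition, subtraction, multiplication by a scalar (= number)), and we want to determine the dimension of the collection of all of these linear combinations, which is a (sub)space of $\mathbb{R}^3$. It may be a single point (dimension = 0), a line spanned by one three-dimensional vector (dimension = 1), a plane spanned by two independent three-dimensional vectors (dimension = 2) or the entire space $\mathbb{R}^3$, which is spanned by three independent three-dimensional vectors (dimension = 3).
For a linear combination of four three-dimensional vectors, at maximum three vectors should suffice. The four vectors are mutually dependent.
$$\lambda_1 \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} + \lambda_2 \begin{pmatrix} -1 \\ 2 \\ 4 \end{pmatrix} + \lambda_3 \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} + \lambda_4 \begin{pmatrix} 4 \\ -1 \\ -3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
$\lambda_1 + 4\lambda_2 + 5\lambda_3 - 3\lambda_4 = 0$   eq(1)
$3\lambda_1 - \lambda_2 + 2\lambda_3 + 4\lambda_4 = 0$   eq(2)
$\lambda_1 + 2\lambda_2 + 3\lambda_3 - \lambda_4 = 0$   eq(3)

$\lambda_1 + 4\lambda_2 + 5\lambda_3 - 3\lambda_4 = 0$   eq(1)' = eq(1)
$0 + 13\lambda_2 + 13\lambda_3 - 13\lambda_4 = 0$   eq(2)' = 3 eq(1) - eq(2)
$0 + 2\lambda_2 + 2\lambda_3 - 2\lambda_4 = 0$   eq(3)' = eq(1) - eq(3)

$\lambda_1 + 4\lambda_2 + 5\lambda_3 - 3\lambda_4 = 0$   eq(1)'' = eq(1)'
$0 + \lambda_2 + \lambda_3 - \lambda_4 = 0$   eq(2)'' = eq(2)'/13
$0 + \lambda_2 + \lambda_3 - \lambda_4 = 0$   eq(3)'' = eq(3)'/2
The parameters $\lambda_1$ and $\lambda_2$ can be rewritten in terms of $\lambda_3$ and $\lambda_4$, so that for any arbitrary value of $\lambda_3$ and $\lambda_4$ we can immediately determine the corresponding values of $\lambda_1$ and $\lambda_2$.
According to equation (2)'' and (3)'': $\lambda_2 = -\lambda_3 + \lambda_4$
According to equation (1):
$\lambda_1 + 4\lambda_2 + 5\lambda_3 - 3\lambda_4 = 0$ (eq(1))
$\lambda_1 = -4\lambda_2 - 5\lambda_3 + 3\lambda_4 = -4(-\lambda_3 + \lambda_4) - 5\lambda_3 + 3\lambda_4 = -\lambda_3 - \lambda_4$
If we take for instance $\lambda_3 = 1$, $\lambda_4 = 1$, we have that $\lambda_2 = -\lambda_3 + \lambda_4 = -1 + 1 = 0$ and $\lambda_1 = -\lambda_3 - \lambda_4 = -1 - 1 = -2$.
Check: for any values of $\lambda_3$ and $\lambda_4$, the linear combination of the four vectors becomes the origin (vector of zeros).
$$\lambda_1 \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} + \lambda_2 \begin{pmatrix} -1 \\ 2 \\ 4 \end{pmatrix} + \lambda_3 \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} + \lambda_4 \begin{pmatrix} 4 \\ -1 \\ -3 \end{pmatrix} = (-\lambda_3 - \lambda_4) \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} + (-\lambda_3 + \lambda_4) \begin{pmatrix} -1 \\ 2 \\ 4 \end{pmatrix} + \lambda_3 \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} + \lambda_4 \begin{pmatrix} 4 \\ -1 \\ -3 \end{pmatrix}$$
$$= \lambda_3 \begin{pmatrix} -3 + 1 + 2 \\ -1 - 2 + 3 \\ -1 - 4 + 5 \end{pmatrix} + \lambda_4 \begin{pmatrix} -3 - 1 + 4 \\ -1 + 2 - 1 \\ -1 + 4 - 3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
Thus two out of four vectors are redundant. It implies that the rank of the matrix is equal to 2. It means that the set of outcomes is a plane that can be spanned by two vectors. It does not matter which two we take, because the four vectors are all pairwise independent. Thus the outcome space is the space spanned by for instance the two vectors
$$\begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -1 \\ 2 \\ 4 \end{pmatrix},$$
But we could have taken also two other vectors out of these four vectors.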
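Counting the non-zero rows that remain after sweeping can be automated. The sketch below computes the rank by forward elimination with exact fractions in plain Python; the function name `rank` is an assumption, and the four vectors carry the signs that are consistent with the elimination steps of this exercise (an assumption where the printed signs are unclear).

```python
from fractions import Fraction


def rank(rows):
    """Number of non-zero rows left after sweeping the matrix to echelon form."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it corresponds to a free variable
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r


vectors = [(3, 1, 1), (-1, 2, 4), (2, 3, 5), (4, -1, -3)]
matrix = [list(row) for row in zip(*vectors)]  # the vectors are the columns
print(rank(matrix))
```

Because row rank equals column rank, applying `rank` to the list of vectors directly (vectors as rows) gives the same number.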
Week 3 - Linear Algebra (II)
Klein: Chapter 5 (it excludes Cramer's rule)
Determinant of matrix Section 5.1
Rank of matrix See lecture slides
Eigenvalues and eigenvectors Section 5.3
Diagonalization of a matrix Section 5.3
Cramer's rule is not part of the material
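Linear Maps
As we have seen last week, the matrix $A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$ can be interpreted as a function $f(.): \mathbb{R}^n \to \mathbb{R}^m$, in particular:
$$f\left( \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right) = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
We now look at functions $T(.): \mathbb{R}^n \to \mathbb{R}^m$ such that, for $x, y \in \mathbb{R}^n$, $c \in \mathbb{R}$, $T(x + y) = T(x) + T(y)$ and $T(cx) = cT(x)$.
Functions that satisfy both criteria are called linear functions (or linear maps or linear transformations).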
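Example 1:
$$T(.): \mathbb{R}^3 \to \mathbb{R}^2, \quad T\left( \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \right) = \begin{pmatrix} x_1 + x_2 \\ x_1 + 2x_3 \end{pmatrix},$$
then
$$T\left( \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \right) = T\left( \begin{pmatrix} w_1 + y_1 \\ w_2 + y_2 \\ w_3 + y_3 \end{pmatrix} \right) = \begin{pmatrix} w_1 + y_1 + w_2 + y_2 \\ w_1 + y_1 + 2(w_3 + y_3) \end{pmatrix}$$
$$= \begin{pmatrix} w_1 + w_2 + y_1 + y_2 \\ w_1 + 2w_3 + y_1 + 2y_3 \end{pmatrix} = \begin{pmatrix} w_1 + w_2 \\ w_1 + 2w_3 \end{pmatrix} + \begin{pmatrix} y_1 + y_2 \\ y_1 + 2y_3 \end{pmatrix} = T\left( \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} \right) + T\left( \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \right)$$
And
$$T\left( c \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \right) = T\left( \begin{pmatrix} cy_1 \\ cy_2 \\ cy_3 \end{pmatrix} \right) = \begin{pmatrix} cy_1 + cy_2 \\ cy_1 + 2cy_3 \end{pmatrix} = c \begin{pmatrix} y_1 + y_2 \\ y_1 + 2y_3 \end{pmatrix} = cT\left( \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \right)$$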
Mapping and matrices
Theorem:
Any linear map can be represented by a matrix and any matrix is a linear map. That is, they are the same thing.
Matrix representation of a linear map:
Let $e_1, \ldots, e_n$ be the unit vectors in $\mathbb{R}^n$, then a matrix representation of a linear map $T(.): \mathbb{R}^n \to \mathbb{R}^m$ is:
$$A = \left( T(e_1) \; T(e_2) \; \cdots \; T(e_n) \right)$$
This shows (if we proved it) that every linear map has a matrix representation. The other way around (that every matrix represents a linear map) is done in the tutorial.
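Example 2
We represent the linear map from example 1 as a matrix:
$$T\left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right) = \begin{pmatrix} 1 + 0 \\ 1 + 2 \cdot 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad T\left( \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right) = \begin{pmatrix} 0 + 1 \\ 0 + 2 \cdot 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
$$T\left( \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right) = \begin{pmatrix} 0 + 0 \\ 0 + 2 \cdot 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}$$
So the map T is represented by the matrix:
$$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}$$
Let's check:
$$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 \\ x_1 + 2x_3 \end{pmatrix}$$
This is indeed our original map T.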
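The recipe "the columns of the matrix are the images of the unit vectors" can be written as a short routine. This is a sketch in plain Python; the map `T` below is an assumed illustrative linear map from R^3 to R^2 (its signs are an assumption), and `matrix_of` and `apply` are hypothetical helper names.

```python
def T(x):
    """Illustrative linear map from R^3 to R^2 (assumed form)."""
    x1, x2, x3 = x
    return [x1 + x2, x1 + 2 * x3]


def matrix_of(linear_map, n):
    """Build the matrix whose k-th column is the image of the k-th unit vector."""
    columns = [linear_map([1 if i == k else 0 for i in range(n)])
               for k in range(n)]
    return [list(row) for row in zip(*columns)]


def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = matrix_of(T, 3)
print(A)
# The matrix reproduces the map on an arbitrary vector.
print(apply(A, [2, -1, 5]) == T([2, -1, 5]))
```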
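Example 3 of linear mapping: a counter clockwise rotation of 90 degrees
We are interested in a matrix $A: \mathbb{R}^2 \to \mathbb{R}^2$, which represents a counter clockwise rotation.
The matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ can be understood as follows:
First column of the matrix A. Rotating the unit vector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ counter clockwise we get $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
Second column of the matrix A. Rotating the unit vector $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ counter clockwise we get $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$.
Thus, the rotation is represented by: $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$
Thus the matrix can be used to rotate any vector counter clockwise.
For instance the vector $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$:
$$Ax = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \end{pmatrix} = y$$
The rotation implies that the vector $x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ is perpendicular to its mapping $y = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$.
Because:
1) $x \cdot y = 2 \cdot (-1) + 1 \cdot 2 = 0$ (the inner product of x and y is zero)
2) $\|x\| = \sqrt{5}$ and $\|y\| = \sqrt{5}$, so that both vectors x and y have equal length.
Implication 1
The matrix $B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ represents a clockwise rotation. Thus
$$B \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix} \quad \text{and} \quad B \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
Now we show graphically that $T(x + y) = T(x) + T(y)$. It can be seen in the picture that it does not matter whether you first rotate your vectors and then add them, or the other way around, i.e. first adding them and only then rotating them.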
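The properties of the rotation (perpendicular image, equal length, undone by the clockwise rotation, additivity) can be checked numerically. The following is a minimal sketch in plain Python; the constant and function names are illustrative, and the second vector `w` is an arbitrary choice.

```python
def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


ROT_CCW = [[0, -1], [1, 0]]   # counter clockwise rotation by 90 degrees
ROT_CW = [[0, 1], [-1, 0]]    # clockwise rotation by 90 degrees

x = [2, 1]
y = apply(ROT_CCW, x)
print(y, dot(x, y), dot(x, x) == dot(y, y))

# Rotating back clockwise recovers the original vector.
print(apply(ROT_CW, y) == x)

# Additivity: rotating a sum equals the sum of the rotated vectors.
w = [-3, 4]
rotated_sum = apply(ROT_CCW, [x[0] + w[0], x[1] + w[1]])
sum_of_rotated = [a + b for a, b in zip(apply(ROT_CCW, x), apply(ROT_CCW, w))]
print(rotated_sum == sum_of_rotated)
```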
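Example 4: Interpretation of matrix multiplication (I)
We have two matrices A and B:
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}$$
Question: what is the meaning of ABx?
1) First multiplication Bx: implies a multiplication of both elements of the vector x by a factor 3.
$$Bx = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} x, \quad Be_1 = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix} \quad \text{and} \quad Be_2 = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 3 \end{pmatrix}$$
2) Second multiplication ABx: counter clockwise rotation.
Thus:
$$C = AB = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 0 & -3 \\ 3 & 0 \end{pmatrix}$$
Mapping C:
Step 1: Bx: three times larger length of the vector
Step 2: ABx: counter clockwise rotation by 90 degrees
We apply the mapping on $x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$:
$$\begin{pmatrix} 0 & -3 \\ 3 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} -3 \\ 6 \end{pmatrix}$$
$x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ is perpendicular to $y = \begin{pmatrix} -3 \\ 6 \end{pmatrix}$.
Because:
1) $x \cdot y = 2 \cdot (-3) + 1 \cdot 6 = 0$
2) $\|x\| = \sqrt{5}$ and $\|y\| = \sqrt{45} = 3\sqrt{5}$, so that $\|y\| = 3\|x\|$
Finally, we show graphically that $T(cx) = cT(x)$.
Also here, it does not matter whether we first multiply our vector with a number and then rotate it or the other way around.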
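Multiplication by a common factor
Example 5:
$A: \mathbb{R}^3 \to \mathbb{R}^3$, $Ax = 3x$
$$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 6 \\ 3 \\ 9 \end{pmatrix}$$
Example 6:
$A: \mathbb{R}^3 \to \mathbb{R}^3$, $Ax = \lambda x$, $\lambda \in \mathbb{R}$
$$\begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 2\lambda \\ \lambda \\ 3\lambda \end{pmatrix} = \lambda \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$$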
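Interpretation of matrix multiplication (II)
Example 7:
To make the length of a vector in $\mathbb{R}^3$ three times larger:
$$A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}$$
The product AA makes the length of a vector 9 times larger:
$$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 3^2 & 0 & 0 \\ 0 & 3^2 & 0 \\ 0 & 0 & 3^2 \end{pmatrix}$$
The product AAA makes the length of a vector 27 times larger:
$$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 3^3 & 0 & 0 \\ 0 & 3^3 & 0 \\ 0 & 0 & 3^3 \end{pmatrix}$$
Et cetera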
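Space and subspace
$\mathbb{R}^3$ has a dimension of 3.
It means that it can be spanned by three independent vectors; it contains three independent vectors at maximum.
Example 8:
In $\mathbb{R}^3$ the line spanned by $\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$ has a dimension of 1. The line is referred to as a subspace of $\mathbb{R}^3$. It means that $\lambda \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$, $\lambda \in \mathbb{R}$, is part of this subspace.
Example 9:
In $\mathbb{R}^3$ the sub-space spanned by the vectors $\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}$ has a dimension of 2. A vector in this subspace can be characterized as
$$\lambda_1 \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} + \lambda_2 \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \quad \lambda_1, \lambda_2 \in \mathbb{R}$$
Example 10:
$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 0 & 0 \\ 3 & 2 & 0 \end{pmatrix}$ will be a mapping into the subspace spanned by the vectors $\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}$. Thus, the image of each vector will be part of this subspace. The matrix A is non-invertible. It has a rank of 2.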
Interpretation of the determinant of a matrix
The determinant measures how much a linear map blows up the image.
Consider the map in the figure. It maps the blue square into the green one.
The determinant is the ratio between the area of the green square and the blue one.
Since the blue one is the unit square, while the green has area 8, the determinant of the matrix associated with this map is 8.
Negative determinants occur when the map reverses the orientation of the shape (for instance a reflection). The ratio between the areas is then the absolute value of the determinant.
Note that for any shape to which we apply a linear map, this relation between area before and after the map is the same. In higher dimensions we do not talk of the area but of the volume.
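Thus the interpretation of the determinant for $\mathbb{R}^2$ (no proof):
Let's consider the mapping $\begin{pmatrix} a & c \\ b & d \end{pmatrix}$
We consider the size of the area spanned by:
1) $\begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
2) $\begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}$
3) $\begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} a + c \\ b + d \end{pmatrix}$
4) $\begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} c \\ d \end{pmatrix}$
It can be shown that the area spanned by $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} a \\ b \end{pmatrix}$, $\begin{pmatrix} a + c \\ b + d \end{pmatrix}$, $\begin{pmatrix} c \\ d \end{pmatrix}$ is $ad - bc$ (up to sign), which is the determinant of the matrix $\begin{pmatrix} a & c \\ b & d \end{pmatrix}$.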
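The claim that the image of the unit square has area |ad - bc| can be checked with the shoelace formula for the area of a polygon. Note that the shoelace formula is a technique brought in here for the check, not something used in the lecture; the function names and the sample values of a, b, c, d are illustrative assumptions.

```python
def det2(a, b, c, d):
    """Determinant of the matrix with columns (a, b) and (c, d)."""
    return a * d - b * c


def image_of_unit_square(a, b, c, d):
    """Images of the corners (0,0), (1,0), (1,1), (0,1) under the map."""
    return [(0, 0), (a, b), (a + c, b + d), (c, d)]


def shoelace_area(points):
    """Area of a polygon from its corner points, taken in order."""
    total = 0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


# A map that stretches and shears, and a reflection with negative determinant.
for a, b, c, d in [(3, 1, 1, 2), (0, 1, 1, 0)]:
    area = shoelace_area(image_of_unit_square(a, b, c, d))
    print(det2(a, b, c, d), area)
```

For the reflection the determinant is negative while the area is positive, which is why the comparison has to be made with the absolute value of the determinant.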
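Example 11:
The determinant of
$$A = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}$$
equals nine. Thus det(A) = 9. (The area of the mapping spanned by the four vectors $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 3 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 3 \\ 3 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 3 \end{pmatrix}$ equals 9).
For the matrix:
$$B = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix}$$
we have det(B) = 1/9.
Properties of determinants:
1) It can be shown that in general $\det(AB) = \det(A)\det(B)$
2) It can be shown that $\det(A^{-1}) = \frac{1}{\det(A)}$
Note that for the particular example 11 AB = I:
$$\begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
And that the determinant of the unit matrix equals 1.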
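Determinants in $\mathbb{R}^3$
The determinant of $A: \mathbb{R}^3 \to \mathbb{R}^3$,
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$
can be calculated as follows:
The determinant can be calculated by expanding along each of the three rows or each of the three columns.
1) For the first row of the matrix:
$$|A| = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
For which $\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}$ is the minor of $a_{11}$ (the determinant of the sub-matrix that remains after deleting the row and the column of $a_{11}$),
$\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}$ is the minor of $a_{12}$,
$\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$ is the minor of $a_{13}$.
2) One can also take the second row of the matrix A:
$$|A| = -a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{22} \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} - a_{23} \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}$$
etc.
Thus each element times its minor is multiplied by the sign at the corresponding position:
$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}$$
Conclusion: The determinant of an upper-triangular matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{pmatrix}$$
equals $|A| = a_{11} a_{22} a_{33}$
And for a diagonal matrix:
$$A = \begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}, \quad |A| = a_{11} a_{22} a_{33}$$
Hence: the determinant of such a matrix A is equal to zero if one of the diagonal elements equals zero.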
Example 12
2 1 1
The determinant of the matrix A 1 4 4
1 0 2
equals 6.
For a 3x3 matrix, there are six possibilities to calculate the

determinant of the matrix:
First row of A:
4 4 1 4 1 4
2 1 1 6
0 2 1 2 1 0
or (second row of A):

1 1 2 1 2 1
1 4 4 6
0 2 1 2 1 0
or (third row of A):

1 1 2 1
1 2 6
4 4 1 4
or (first column of A):

4 4 1 4 1 4
2 1 1 6
0 2 1 2 1 0
or (second column of A):

1 4 2 1
1 4 6
1 2 1 2
or (third column of A):

1 4 2 1 2 1
1 4 2 6
1 0 1 0 1 4
143
Example 13
The determinant of the matrix $A = \begin{pmatrix} a & 1 & 0 \\ 2 & a & 2 \\ 0 & 1 & a \end{pmatrix}$
equals $a^3 - 4a$.
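Eigenvalues and eigenvectors
Compute the eigenvalues of the n x n square matrix A as follows:
$$Ax = \lambda x$$
$$(A - \lambda I)x = 0$$
For which x is a non-zero vector, and I is an n x n identity matrix. In order for a non-zero vector x to satisfy this equation, $(A - \lambda I)$ must not be invertible. If $(A - \lambda I)$ had an inverse, then:
$$(A - \lambda I)^{-1}(A - \lambda I)x = (A - \lambda I)^{-1} 0$$
or
$$x = (A - \lambda I)^{-1} 0$$
thus
x = 0.
The matrix is non-invertible if its determinant is zero:
$$|A - \lambda I| = 0$$
If A is a 2 x 2 matrix: $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$
Then $|A - \lambda I| = 0$ becomes
$$\begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = 0$$
The characteristic equation is a second-order polynomial:
$$(a_{11} - \lambda)(a_{22} - \lambda) - a_{21} a_{12} = 0$$
or
$$\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11} a_{22} - a_{21} a_{12}) = 0$$
For which: the trace of the matrix A is defined as the sum of the diagonal elements: $\mathrm{tr}(A) = a_{11} + a_{22}$, and $|A| = a_{11} a_{22} - a_{21} a_{12}$, so that
$$\lambda_{1,2} = \frac{\mathrm{tr} A \pm \sqrt{(\mathrm{tr} A)^2 - 4|A|}}{2}$$
Thus two solutions: two real eigenvalues at maximum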
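The trace and determinant formula for the eigenvalues of a 2 x 2 matrix can be sketched as follows in plain Python. The function name is an assumption, and the two test matrices are illustrative: a symmetric matrix with trace 7 and determinant 6, and the counter clockwise rotation by 90 degrees.

```python
import math


def eigenvalues_2x2(A):
    """Real eigenvalues from lambda = (tr A +/- sqrt((tr A)^2 - 4 |A|)) / 2."""
    (a11, a12), (a21, a22) = A
    trace = a11 + a22
    det = a11 * a22 - a21 * a12
    discriminant = trace ** 2 - 4 * det
    if discriminant < 0:
        return []  # no real eigenvalues
    root = math.sqrt(discriminant)
    return [(trace - root) / 2, (trace + root) / 2]


print(eigenvalues_2x2([[2, 2], [2, 5]]))    # symmetric matrix, two real eigenvalues
print(eigenvalues_2x2([[0, -1], [1, 0]]))   # rotation by 90 degrees, none
```

The empty result for the rotation matrix corresponds to a negative discriminant: a rotation by 90 degrees leaves no direction unchanged.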
A matrix is diagonalizable:
The square matrix A can be diagonalized as follows:
$$AP = P\Lambda$$
For which $\Lambda$ is a diagonal matrix with the eigenvalues of A on the main diagonal. For a 2 x 2 matrix A, the diagonal matrix is
$$\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$
P is a matrix whose columns are the eigenvectors, $P = [p_1, p_2]$.
For which $\lambda_1$ corresponds to the vector $p_1$ and $\lambda_2$ corresponds to the vector $p_2$.
Notation:
The diagonalization $AP = P\Lambda$ can also be written as
a) $A = P \Lambda P^{-1}$
b) $\Lambda = P^{-1} A P$
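Example 14
The matrix $A = \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}$ has the eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 6$, so that
$$\Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix}$$
The eigenvector belonging to $\lambda_1 = 1$ is $p_1 = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$ and the eigenvector belonging to $\lambda_2 = 6$ is $p_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$.
Note that the eigenvectors are orthogonal (because the matrix A is symmetric, which we will not prove here)
Example 15
The matrix $A = \begin{pmatrix} 2 & 4 \\ 1 & -1 \end{pmatrix}$ has the eigenvalues $\lambda_1 = 3$ and $\lambda_2 = -2$, so that
$$\Lambda = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}$$
The eigenvector belonging to $\lambda_1 = 3$ is $p_1 = \begin{pmatrix} 4 \\ 1 \end{pmatrix}$ and the eigenvector for $\lambda_2 = -2$ is $p_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$.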
Example 16 (it is hard example because of the third order

polynomial characteristic equation)
2 1 1
The matrix A 2 3 4 has the characteristic equation
1 1 2
2 1 1
|A I| 2 3 4 ( 1)( 1)( 3) 0
1 1 2
The eigenvalues are 1, -1 and 3.
The associated eigenvectors are:
1
1 1: p1 1
0
0
2 1: p 2 1
1
2
2 3 : p3 3
1
149
Wrapping up: under which conditions is a matrix A invertible?
Think of matrices as maps, then they are invertible if they are both one-to-one (injective) and onto (surjective).
This map is not injective.
This map is not surjective.
This map is both injective and surjective, and is therefore invertible.
Thus: A square matrix $A: \mathbb{R}^n \to \mathbb{R}^n$ is invertible if:
The matrix A is of full rank n;
The columns of the matrix A are linearly independent.
None of the eigenvalues is equal to zero.
The determinant of A is nonzero.

Technical tutorial of week 3 - Solutions

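Exercise 1.
Let $A = \begin{pmatrix} 1 & 0 & t \\ 2 & 1 & t \\ 0 & 1 & 1 \end{pmatrix}$
For what value of t does A have an inverse?
Solution:
The matrix A has no inverse if
$$\begin{vmatrix} 1 & 0 & t \\ 2 & 1 & t \\ 0 & 1 & 1 \end{vmatrix} = 0$$
Which gives (expanding along the first row)
$$1 \cdot \begin{vmatrix} 1 & t \\ 1 & 1 \end{vmatrix} + t \cdot \begin{vmatrix} 2 & 1 \\ 0 & 1 \end{vmatrix} = 0$$
So that
$$(1 - t) + 2t = 0$$
Thus the matrix A has an inverse if $t \neq -1$.
Note that for $t = -1$
a) the determinant of A: $\begin{vmatrix} 1 & 0 & -1 \\ 2 & 1 & -1 \\ 0 & 1 & 1 \end{vmatrix} = 0$
b) one of the eigenvalues of A equals zero:
$Ax = \lambda x$, which has a solution $x \neq 0$ if $|A - \lambda I| = 0$:
$$\begin{vmatrix} 1 - \lambda & 0 & -1 \\ 2 & 1 - \lambda & -1 \\ 0 & 1 & 1 - \lambda \end{vmatrix} = 0$$
$$(1 - \lambda)((1 - \lambda)^2 + 1) - 1 \cdot (2 - 0 \cdot (1 - \lambda)) = 0$$
$$-\lambda^3 + 3\lambda^2 - 4\lambda = 0$$
There is one real eigenvalue: $\lambda = 0$
c) The rank of the matrix $A = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 1 & -1 \\ 0 & 1 & 1 \end{pmatrix}$ is equal to 2, because its col 3 = col 2 - col 1 (see below).
$$\begin{pmatrix} 1 & 0 & -1 \\ 2 & 1 & -1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + x_3 \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}$$
$$= x_1 \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + x_3 \left[ \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \right] = (x_1 - x_3) \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + (x_2 + x_3) \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
It means that the range of A consists of the linear combinations of the vectors $\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$ (which is a 2 dimensional plane). The dimension of the range of A is 2, which is smaller than the number of columns of A (=3).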
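The determinant as a function of t can be checked with the first-row expansion. This is a minimal sketch in plain Python; the function names `det3` and `A` are illustrative, and the code only confirms the pattern det(A) = 1 + t on a range of integer values of t.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def A(t):
    """The matrix of the exercise for a given value of t."""
    return [[1, 0, t],
            [2, 1, t],
            [0, 1, 1]]


for t in (-2, -1, 0, 1):
    print(t, det3(A(t)))
```

Only t = -1 gives a determinant of zero, so that is the single value of t for which A has no inverse.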
* Exercise 2.
Find the determinant of $\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$.
Solution:
We explained the procedure in class.
* Exercise 3.
Given an object C of a certain volume and a linear map T with associated matrix A, find Vol(T(C)).
Solution:
$$\mathrm{Vol}(T(C)) = |\det(A)| \cdot \mathrm{Vol}(C)$$
* Exercise 4.
Consider the following maps and show that they are linear, without deriving their matrix representation. Also derive and show their eigenvectors (if any).
a) Blow-up of a vector along the x-axis by 100%, while the y-axis remains unchanged.
b) Projection of a vector on the y-axis.
c) A counter-clockwise rotation of a vector by 90 degrees.
Solution:
a)
In the figure we drew the transformation for a specific vector. Note that algebraically, this transformation amounts to $T: (a_1, a_2) \mapsto (2a_1, a_2)$. In green and yellow are two eigenvectors for this transformation, we return to them shortly. First we show that the map T is linear. For that we have to show that $T(a + b) = T(a) + T(b)$ and $T(ra) = rT(a)$. We do this both graphically and algebraically.
In the figure we constructed $T(a + b)$ and it can be seen to be equal to $T(a) + T(b)$ (what does this mean? Try to do the transformation T on $a + b$ and see that it gives the same result as adding $T(a) + T(b)$.) Algebraically, we have:
$$T(a + b) = T((a_1, a_2) + (b_1, b_2)) = T(a_1 + b_1, a_2 + b_2) = (2(a_1 + b_1), a_2 + b_2)$$
$$T(a) + T(b) = T(a_1, a_2) + T(b_1, b_2) = (2a_1, a_2) + (2b_1, b_2) = (2(a_1 + b_1), a_2 + b_2)$$
So they are equal.
Again, we see in the figure that $T(ra) = rT(a)$. Algebraically we have:
$$T(ra) = T(r(a_1, a_2)) = T(ra_1, ra_2) = (2ra_1, ra_2)$$
$$rT(a) = rT(a_1, a_2) = r(2a_1, a_2) = (2ra_1, ra_2)$$
We see again that they are equal.
For the eigenvectors, we return to the first figure. Two examples of them are drawn in green and yellow. First consider a vector along the y-axis. What would happen to it under this transformation? Absolutely nothing. So it is an eigenvector with eigenvalue 1.
$$T(0, y) = 1 \cdot (0, y)$$
Now consider a vector along the x-axis. What will happen to it under T? It will get doubled. So it is an eigenvector with eigenvalue 2.
$$T(x, 0) = (2x, 0) = 2 \cdot (x, 0)$$
b)
To make my life easier, all in one picture this time. A projection simply takes any vector and only keeps the y-part of it: $T: (x, y) \mapsto (0, y)$. From the picture it is again clear that the map is a linear one. Note that this time we checked $T(ra) = rT(a)$ for r<1.
Algebraically we have:
$$T(a + b) = T((a_1, a_2) + (b_1, b_2)) = T(a_1 + b_1, a_2 + b_2) = (0, a_2 + b_2)$$
$$T(a) + T(b) = T(a_1, a_2) + T(b_1, b_2) = (0, a_2) + (0, b_2) = (0, a_2 + b_2)$$
and
$$T(ra) = T(r(x, y)) = T(rx, ry) = (0, ry)$$
$$rT(a) = rT(x, y) = r(0, y) = (0, ry)$$
Finally, this map has only one type of eigenvector: any vector along the y-axis is an eigenvector with eigenvalue 1, as it is unchanged by the map. However, any other vector is not an eigenvector.
c)
The figure shows that the first condition for linearity holds.
This figure shows that the second condition for linearity also holds. We drew it here for r<0.
It is a bit beyond the scope of this course to derive the map algebraically, so we leave that to the interested reader (It is actually not very hard. Give it a try).
This map is interesting in that it has no eigenvectors at all. Because it is a rotation, there is no vector that does not change direction under the map.
* Exercise 5.
Suppose v is an eigenvector of a matrix A, with associated eigenvalue $\lambda$. Show that, for $\alpha \neq 0$, $\alpha v$ is also an eigenvector with eigenvalue $\lambda$.
Solution:
We know from the fact that v is an eigenvector with eigenvalue $\lambda$ that $A \cdot v = \lambda v$. Now we use the linearity of a matrix: $A \cdot (\alpha v) = \alpha (A \cdot v) = \alpha (\lambda v) = \lambda (\alpha v)$. So $\alpha v$ is also an eigenvector with eigenvalue $\lambda$.
Exercise 6.
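Calculate the eigenvectors and the associated eigenvalues of the following matrix:
$A = \begin{pmatrix} 2 & 0 & 0 \\ -1 & 3 & 5 \\ -1 & 1 & -1 \end{pmatrix}$
Solution:
We start by solving the characteristic equation:
$P(\lambda) = \begin{vmatrix} 2-\lambda & 0 & 0 \\ -1 & 3-\lambda & 5 \\ -1 & 1 & -1-\lambda \end{vmatrix} = (2-\lambda)((3-\lambda)(-1-\lambda) - 5) = (2-\lambda)(\lambda^2 - 2\lambda - 8) = (2-\lambda)(\lambda-4)(\lambda+2) = 0$
So we find $\lambda_1 = 2, \lambda_2 = 4, \lambda_3 = -2$. (You have to be lucky to be able to solve a cubic equation
this way. Don't worry; on an exam you will always be lucky.)
Now we find the associated eigenvectors $v_1, v_2, v_3$ by solving the equation:
$(A - \lambda_1 I) v_1 = 0$
We solve by sweeping:
$\left(\begin{array}{ccc|c} 2-2 & 0 & 0 & 0 \\ -1 & 3-2 & 5 & 0 \\ -1 & 1 & -1-2 & 0 \end{array}\right) = \left(\begin{array}{ccc|c} 0 & 0 & 0 & 0 \\ -1 & 1 & 5 & 0 \\ -1 & 1 & -3 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & -1 & -5 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$
So we find $v_1 = \begin{pmatrix} p \\ p \\ 0 \end{pmatrix}$ for any p, with associated eigenvalue $\lambda_1 = 2$.
We check our result:
$\begin{pmatrix} 2 & 0 & 0 \\ -1 & 3 & 5 \\ -1 & 1 & -1 \end{pmatrix} \begin{pmatrix} p \\ p \\ 0 \end{pmatrix} = \begin{pmatrix} 2p \\ -p + 3p \\ -p + p \end{pmatrix} = 2 \begin{pmatrix} p \\ p \\ 0 \end{pmatrix}$
Such a relief!
We move on to $v_2$ with associated eigenvalue $\lambda_2 = 4$:
$\left(\begin{array}{ccc|c} 2-4 & 0 & 0 & 0 \\ -1 & 3-4 & 5 & 0 \\ -1 & 1 & -1-4 & 0 \end{array}\right) = \left(\begin{array}{ccc|c} -2 & 0 & 0 & 0 \\ -1 & -1 & 5 & 0 \\ -1 & 1 & -5 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & -1 & 5 & 0 \\ 0 & 1 & -5 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -5 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$
So we find $v_2 = \begin{pmatrix} 0 \\ 5q \\ q \end{pmatrix}$ for any q, with associated eigenvalue $\lambda_2 = 4$.
$\begin{pmatrix} 2 & 0 & 0 \\ -1 & 3 & 5 \\ -1 & 1 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ 5q \\ q \end{pmatrix} = \begin{pmatrix} 0 \\ 20q \\ 4q \end{pmatrix} = 4 \begin{pmatrix} 0 \\ 5q \\ q \end{pmatrix}$
Hurrah!
We move on to $v_3$ and $\lambda_3 = -2$.
$\left(\begin{array}{ccc|c} 2+2 & 0 & 0 & 0 \\ -1 & 3+2 & 5 & 0 \\ -1 & 1 & -1+2 & 0 \end{array}\right) = \left(\begin{array}{ccc|c} 4 & 0 & 0 & 0 \\ -1 & 5 & 5 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 4 & 0 & 0 & 0 \\ 0 & 5 & 5 & 0 \\ 0 & 1 & 1 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$
So we find $v_3 = \begin{pmatrix} 0 \\ -r \\ r \end{pmatrix}$ for any r, with associated eigenvalue $\lambda_3 = -2$.
$\begin{pmatrix} 2 & 0 & 0 \\ -1 & 3 & 5 \\ -1 & 1 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ -r \\ r \end{pmatrix} = \begin{pmatrix} 0 \\ 2r \\ -2r \end{pmatrix} = -2 \begin{pmatrix} 0 \\ -r \\ r \end{pmatrix}$
Again it works out and we have found all our eigenvectors.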
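The three checks above can also be done mechanically. The sketch below assumes that A has rows (2, 0, 0), (-1, 3, 5), (-1, 1, -1), which is the sign pattern consistent with the eigenvalues 2, 4 and -2, and takes p = q = r = 1. The helper name `matvec` is an illustrative choice.

```python
# Matrix of Exercise 6 (sign pattern assumed as stated in the lead-in).
A = [[2, 0, 0],
     [-1, 3, 5],
     [-1, 1, -1]]

def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Eigenvalue / eigenvector pairs with p = q = r = 1.
pairs = [(2, [1, 1, 0]), (4, [0, 5, 1]), (-2, [0, -1, 1])]

# A v should equal lambda * v for every pair.
checks = [matvec(A, v) == [lam * x for x in v] for lam, v in pairs]
print(checks)
```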
Broad tutorial (Friday)
* Exercise 1.
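Consider the Markov matrix $A = \begin{pmatrix} \frac{2}{3} & 0 & 0 \\ \frac{1}{6} & \frac{3}{4} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3} \end{pmatrix}$. Diagonalize it and use your result to determine
the long term state of the population (no matter what the starting state was) by calculating
$\lim_{k \to \infty} A^k \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ for general population $\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$.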
Solution:
Background
At the tutorial there was more explanation about the background of Markov transition
matrices. It describes transition in the labour market, for which there are three states (e.g. state
1: employment; state 2: unemployment; state 3: non-participation).
The matrix describes the probabilities in the transitions across the three states between period
t and period t+1.
Note that the numbers in the matrix should be read as conditional probabilities.
2/3 = Pr(employed in period t+1 | someone was employed in period t)
1/6 = Pr(unemployed in period t+1 | someone was employed in period t)
1/6 = Pr(non-participant in period t+1 | someone was employed in period t)
These probabilities add up to one exactly.
3/4 = Pr(unemployed in period t+1 | someone was unemployed in period t)

1/4 = Pr(non-participant in period t+1 | someone was unemployed in period t)
These probabilities add up to one exactly
2/3 = Pr(non participant in period t+1 | someone was non participant in period t)
1/3 = Pr(unemployed in period t+1 | someone was non participant in period t)
These probabilities add up to one exactly
Note that $x_1 + x_2 + x_3 = 1$.
Thus $Ax = \begin{pmatrix} \frac{2}{3} & 0 & 0 \\ \frac{1}{6} & \frac{3}{4} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$
is informative about the states in period t+1.
To diagonalize the transition matrix A, we have to start by finding the eigenvectors and
eigenvalues. We look at the characteristic equation (we showed in the lecture why this
equation matters):
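$P(\lambda) = \begin{vmatrix} \frac{2}{3}-\lambda & 0 & 0 \\ \frac{1}{6} & \frac{3}{4}-\lambda & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3}-\lambda \end{vmatrix} = (\tfrac{2}{3}-\lambda)\left((\tfrac{3}{4}-\lambda)(\tfrac{2}{3}-\lambda) - \tfrac{1}{12}\right) = (\tfrac{2}{3}-\lambda)\left(\lambda^2 - \tfrac{17}{12}\lambda + \tfrac{1}{2} - \tfrac{1}{12}\right) = 0$
Multiplying by 12:
$0 = (\tfrac{2}{3}-\lambda)(5 - 17\lambda + 12\lambda^2) = (\tfrac{2}{3}-\lambda)(1-\lambda)(5-12\lambda)$
This gives us three eigenvalues: $\lambda_1 = 1, \lambda_2 = \frac{2}{3}, \lambda_3 = \frac{5}{12}$. (You have to be lucky to be able to
solve a cubic equation this way. Don't worry; on an exam you will always be lucky.)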
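To get the associated eigenvectors $v_1, v_2, v_3$, we use the equation:
$(A - \lambda_1 I) v_1 = 0$
We sweep:
$\left(\begin{array}{ccc|c} \frac{2}{3}-1 & 0 & 0 & 0 \\ \frac{1}{6} & \frac{3}{4}-1 & \frac{1}{3} & 0 \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3}-1 & 0 \end{array}\right) = \left(\begin{array}{ccc|c} -\frac{1}{3} & 0 & 0 & 0 \\ \frac{1}{6} & -\frac{1}{4} & \frac{1}{3} & 0 \\ \frac{1}{6} & \frac{1}{4} & -\frac{1}{3} & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -\frac{4}{3} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$
So $v_1 = \begin{pmatrix} 0 \\ 4r \\ 3r \end{pmatrix}$ for any r. Of course, if $v_1$ is to represent a population, then $r = \frac{1}{7}$.
We check if $v_1$ is indeed an eigenvector with eigenvalue 1:
$\begin{pmatrix} \frac{2}{3} & 0 & 0 \\ \frac{1}{6} & \frac{3}{4} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} 0 \\ 4r \\ 3r \end{pmatrix} = \begin{pmatrix} 0 \\ 3r + r \\ r + 2r \end{pmatrix} = 1 \cdot \begin{pmatrix} 0 \\ 4r \\ 3r \end{pmatrix}$ So it is as we wanted.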
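The eigenvector for $\lambda_2 = \frac{2}{3}$:
$\left(\begin{array}{ccc|c} \frac{2}{3}-\frac{2}{3} & 0 & 0 & 0 \\ \frac{1}{6} & \frac{3}{4}-\frac{2}{3} & \frac{1}{3} & 0 \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3}-\frac{2}{3} & 0 \end{array}\right) = \left(\begin{array}{ccc|c} 0 & 0 & 0 & 0 \\ \frac{1}{6} & \frac{1}{12} & \frac{1}{3} & 0 \\ \frac{1}{6} & \frac{1}{4} & 0 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 3 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$
So $v_2 = \begin{pmatrix} -3p \\ 2p \\ p \end{pmatrix}$ for any p. We check if $v_2$ is indeed an eigenvector with eigenvalue $\frac{2}{3}$.
$\begin{pmatrix} \frac{2}{3} & 0 & 0 \\ \frac{1}{6} & \frac{3}{4} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} -3p \\ 2p \\ p \end{pmatrix} = \begin{pmatrix} -2p \\ -\frac{1}{2}p + \frac{3}{2}p + \frac{1}{3}p \\ -\frac{1}{2}p + \frac{1}{2}p + \frac{2}{3}p \end{pmatrix} = \begin{pmatrix} -2p \\ \frac{4}{3}p \\ \frac{2}{3}p \end{pmatrix} = \frac{2}{3} \begin{pmatrix} -3p \\ 2p \\ p \end{pmatrix}$
So again we made no error in calculation.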
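Finally we solve for the eigenvector associated with $\lambda_3 = \frac{5}{12}$:
$\left(\begin{array}{ccc|c} \frac{2}{3}-\frac{5}{12} & 0 & 0 & 0 \\ \frac{1}{6} & \frac{3}{4}-\frac{5}{12} & \frac{1}{3} & 0 \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3}-\frac{5}{12} & 0 \end{array}\right) = \left(\begin{array}{ccc|c} \frac{1}{4} & 0 & 0 & 0 \\ \frac{1}{6} & \frac{1}{3} & \frac{1}{3} & 0 \\ \frac{1}{6} & \frac{1}{4} & \frac{1}{4} & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{3} & \frac{1}{3} & 0 \\ 0 & \frac{1}{4} & \frac{1}{4} & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$
So $v_3 = \begin{pmatrix} 0 \\ q \\ -q \end{pmatrix}$ for any q. We check if $v_3$ is indeed an eigenvector with eigenvalue $\frac{5}{12}$.
$\begin{pmatrix} \frac{2}{3} & 0 & 0 \\ \frac{1}{6} & \frac{3}{4} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} 0 \\ q \\ -q \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{3}{4}q - \frac{1}{3}q \\ \frac{1}{4}q - \frac{2}{3}q \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{5}{12}q \\ -\frac{5}{12}q \end{pmatrix} = \frac{5}{12} \begin{pmatrix} 0 \\ q \\ -q \end{pmatrix}$
Yippee.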
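Now we're almost ready to diagonalize our matrix. Recall that we want to write
$A = CDC^{-1}$, where C is the matrix of eigenvectors, so $C = \begin{pmatrix} 0 & -3 & 0 \\ 4 & 2 & 1 \\ 3 & 1 & -1 \end{pmatrix}$, where we picked
easy values for r, p and q, and D is the diagonal matrix of eigenvalues, so $D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{2}{3} & 0 \\ 0 & 0 & \frac{5}{12} \end{pmatrix}$.
We still need to find $C^{-1}$. Because we do not show how to find an inverse of a matrix in this
course (it's not hard, but we can only do so much) we simply postulate that
$C^{-1} = \begin{pmatrix} \frac{1}{7} & \frac{1}{7} & \frac{1}{7} \\ -\frac{1}{3} & 0 & 0 \\ \frac{2}{21} & \frac{3}{7} & -\frac{4}{7} \end{pmatrix}$ and check if this is indeed true:
$C C^{-1} = \begin{pmatrix} 0 & -3 & 0 \\ 4 & 2 & 1 \\ 3 & 1 & -1 \end{pmatrix} \begin{pmatrix} \frac{1}{7} & \frac{1}{7} & \frac{1}{7} \\ -\frac{1}{3} & 0 & 0 \\ \frac{2}{21} & \frac{3}{7} & -\frac{4}{7} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ How good of us.
Finally we have diagonalized A: $A = CDC^{-1}$.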
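Now we want to use this to make it easy to raise A to a certain power. Notice that:
$A^k = (CDC^{-1})^k = (CDC^{-1})(CDC^{-1}) \cdots (CDC^{-1}) = C D^k C^{-1}$
(every inner product $C^{-1}C$ cancels to the identity matrix).
So we only have to raise D to the power k, and D is a diagonal matrix, so:
$A^k = C D^k C^{-1} = C \begin{pmatrix} 1^k & 0 & 0 \\ 0 & \left(\frac{2}{3}\right)^k & 0 \\ 0 & 0 & \left(\frac{5}{12}\right)^k \end{pmatrix} C^{-1}$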
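What we wanted was to take the limit of this for k to infinity, to see what would happen after
infinitely many periods, i.e. in the long run. But, because two of our eigenvalues are smaller
than one in absolute value, their powers tend to zero as k tends to infinity. So:
$\lim_{k \to \infty} A^k = \lim_{k \to \infty} C \begin{pmatrix} 1^k & 0 & 0 \\ 0 & \left(\frac{2}{3}\right)^k & 0 \\ 0 & 0 & \left(\frac{5}{12}\right)^k \end{pmatrix} C^{-1} = C \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} C^{-1}$
$= \begin{pmatrix} 0 & -3 & 0 \\ 4 & 2 & 1 \\ 3 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{7} & \frac{1}{7} & \frac{1}{7} \\ -\frac{1}{3} & 0 & 0 \\ \frac{2}{21} & \frac{3}{7} & -\frac{4}{7} \end{pmatrix} = \begin{pmatrix} 0 & -3 & 0 \\ 4 & 2 & 1 \\ 3 & 1 & -1 \end{pmatrix} \begin{pmatrix} \frac{1}{7} & \frac{1}{7} & \frac{1}{7} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ \frac{4}{7} & \frac{4}{7} & \frac{4}{7} \\ \frac{3}{7} & \frac{3}{7} & \frac{3}{7} \end{pmatrix}$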
So now we can calculate:
$\begin{pmatrix} 0 & 0 & 0 \\ \frac{4}{7} & \frac{4}{7} & \frac{4}{7} \\ \frac{3}{7} & \frac{3}{7} & \frac{3}{7} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{4}{7}(x_1 + x_2 + x_3) \\ \frac{3}{7}(x_1 + x_2 + x_3) \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{4}{7} \\ \frac{3}{7} \end{pmatrix}$
The last equality follows from the fact that a population vector has $x_1 + x_2 + x_3 = 1$. So in the long run the
population will be divided over states 2 and 3 in the proportions 4:3, while nobody will be in
state 1. (Can you understand just by looking at matrix A why that might be the case?)
Finally, note that the long run population vector that we found is also an eigenvector with
eigenvalue 1. That is no coincidence: it is almost always the case with Markov chains, in fact
always if the long-run state is well defined. The reason is as follows: In the long run, we
expect a steady state, so nothing changes anymore. So we want a vector such that, if A works
on it, we get our vector back. But that is just an eigenvector with eigenvalue 1.
Don't get angry, we did not sweat for nothing. Although it is true that it is much easier to find
the long run state by looking for an eigenvector with eigenvalue 1, our method is the only way
(I know of) to fairly easily find $A^k$ for any large k.
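As a numerical cross-check of the long-run result, the sketch below raises A to the power 50 directly. It uses repeated squaring with exact fractions rather than the diagonalization of the solution, and the helper names `matmul` and `matpow` are illustrative. Every column of the result should be close to (0, 4/7, 3/7).

```python
from fractions import Fraction as F

# Markov matrix of the exercise; each column sums to one.
A = [[F(2, 3), F(0), F(0)],
     [F(1, 6), F(3, 4), F(1, 3)],
     [F(1, 6), F(1, 4), F(2, 3)]]

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(M, k):
    """Compute M**k for k >= 1 by repeated squaring."""
    result, base = None, M
    while k > 0:
        if k & 1:
            result = base if result is None else matmul(result, base)
        base = matmul(base, base)
        k >>= 1
    return result

A50 = matpow(A, 50)
approx = [[float(x) for x in row] for row in A50]
for row in approx:
    print(row)
```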
*Exercise 2.
We show that the determinant-volume formula holds in a special case and discuss the general
proof.
Solution:
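We start with a unit square C, characterized by the vectors (1,0),(0,1) and investigate what
happens under transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ with associated matrix $\begin{pmatrix} a & c \\ b & d \end{pmatrix}$. If we let T work
on our two vectors, we get
$\begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}$
$\begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} c \\ d \end{pmatrix}$
So our transformation on the unit square looks like this: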
Now we know the area of the unit square is 1, so to calculate the determinant of the matrix, all
we have to do is calculate the area of the resulting parallelogram. To calculate this, we need
one geometric fact, which we now illustrate.
The parallelogram with the blue sides has the same area as the parallelogram with the red
sides. In general, if you keep one side of a parallelogram fixed and you move the opposing
side along a parallel line, the area of the resulting parallelogram is the same as that of the
original.
We now use this fact to transform our parallelogram given by (a,b) and (c,d) into a more
manageable one with the same area. We actually use it twice, to transform it into a rectangle:
So we see that the rectangle given by (p,0) and (0,q) has the same area. Clearly that area equals
pq. So what we have still to do is calculate p and q. We start with p. What did we do in the
first step, the first shift of parallelograms? We took our point (a,b) and went in the direction of
(c,d) until we reached the x-axis, so we have $(a, b) + r(c, d) = (p, 0)$ for some r to be
determined. We know $b + rd = 0 \Rightarrow r = -\frac{b}{d}$, so $p = a + rc = a - \frac{bc}{d}$.
That's one out of the way. Actually, finding q is easier. We see in the figure that going from
(c,d) to (0,q) is a horizontal shift, so the y-coordinate does not change: q=d.
So the area of our rectangle and therefore also our original parallelogram is
$pq = \left(a - \frac{bc}{d}\right) d = ad - bc$
This is indeed the determinant formula for the two-dimensional case.
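The construction can be replayed with numbers. This minimal sketch follows the two steps of the argument (find r, then p, then use q = d) and compares the result with ad - bc. The function names are illustrative, and the sketch assumes d is nonzero, as the argument itself does.

```python
from fractions import Fraction

def area_via_shift(a, b, c, d):
    """Area pq of the rectangle obtained by the two shifts in the text."""
    r = Fraction(-b, d)   # solves b + r*d = 0, so (a, b) + r(c, d) lies on the x-axis
    p = a + r * c         # x-coordinate of the shifted point (p, 0)
    q = d                 # the horizontal shift keeps the y-coordinate of (c, d)
    return p * q

def det2(a, b, c, d):
    """Determinant of the matrix with columns (a, b) and (c, d)."""
    return a * d - b * c

print(area_via_shift(3, 1, 1, 2), det2(3, 1, 1, 2))
```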
Of course, this is no proof of the formula in general. For that we would have to show that it
holds for all shapes we could start with, not just the unit square. The way to do that is not by
extending the argument we gave above (just imagine doing this for general parallelograms in
higher-dimensional spaces). Instead what mathematicians do is very different: they look at
volume as a function of a shape and show that it must have certain properties (for instance, if
you translate a shape, its volume does not change). Then they show that there can be only one
such function. And then they show that the determinant also has these properties. Then they
can conclude that the determinant indeed gives the factor by which a transformation scales volume. We won't trace
their steps here, as that would take us much too far afield, but you might be interested to see
how you can handle such a seemingly awesome problem.
* Exercise 3.
Prove that $\det(A^{-1}) = \frac{1}{\det(A)}$ (if $A^{-1}$ exists of course).
Solution:
If $A^{-1}$ exists, then we can say $A A^{-1} = I$, so $\det(A A^{-1}) = \det(I) = 1$. Now recall that
$\det(CB) = \det(C)\det(B)$, so $1 = \det(A A^{-1}) = \det(A)\det(A^{-1}) \Rightarrow \det(A^{-1}) = \frac{1}{\det(A)}$.
* Exercise 4.
Prove that $\det(A) = \prod_i \lambda_i$, if A is diagonalizable, where $\lambda_i$ are the eigenvalues of A.
Solution:
We first establish the intuition. Suppose A is 2x2. Because A is diagonalizable, we know it has
2 eigenvectors v, w with associated eigenvalues $\lambda, \mu$. Now consider the parallelogram given
by v, w and consider what would happen to it when multiplied by A. We call the original
parallelogram P and the new one Q.
In the figure it is quite clear that $Vol(Q) = \lambda \mu \, Vol(P)$. Therefore we should find that indeed
$\det(A) = \prod_i \lambda_i$. We now proceed to prove this.
From class we know that for a diagonalizable matrix A the following holds:
$AC = CD$, where C is the matrix with for every column an eigenvector of A, and D is a
diagonal matrix with the associated eigenvalues on the diagonal. Now we know:
$\det(AC) = \det(CD) \Rightarrow \det(A)\det(C) = \det(C)\det(D) \Rightarrow \det(A) = \det(D)$
(we may divide by $\det(C)$ because C is invertible, so $\det(C) \neq 0$).
But D is a diagonal matrix, so $\det(A) = \det(D) = \begin{vmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{vmatrix} = \prod_i \lambda_i$.
We have quite a powerful apparatus by now. This was not such an easy theorem to
understand, but the proof is just a few lines.
Consequence: if one of the eigenvalues of A equals zero, the determinant of the matrix A will
be zero. If one of the eigenvalues of A equals zero, the inverse of the matrix A does not exist.
1.
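a) Compute the a for which the matrix
$B = \begin{pmatrix} 3 & a \\ -6 & 4 \end{pmatrix}$
has a determinant of zero.
Solution:
$|B| = \begin{vmatrix} 3 & a \\ -6 & 4 \end{vmatrix} = 3 \cdot 4 - (-6) \cdot a = 0$
$a = -2$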
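b, c and d) Show that for this specific value a, the matrix B is not of full rank. What will be the
dimension of the outcome space (or mapping) of the matrix B for this value of a? What is the
outcome space (or mapping) of the matrix B?
Solution:
The outcome (image or mapping) of B is
$\lambda_1 \begin{pmatrix} 3 \\ -6 \end{pmatrix} + \lambda_2 \begin{pmatrix} -2 \\ 4 \end{pmatrix}, \quad \lambda_1, \lambda_2 \in \mathbb{R}$
Consider the $\lambda_1$ and $\lambda_2$ for which
$\lambda_1 \begin{pmatrix} 3 \\ -6 \end{pmatrix} + \lambda_2 \begin{pmatrix} -2 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
Both vectors are linearly independent if the only solution is $\lambda_1 = 0$ and $\lambda_2 = 0$. Thus we solve both equations:
$3\lambda_1 - 2\lambda_2 = 0$   eq(1)
$-6\lambda_1 + 4\lambda_2 = 0$   eq(2)
$\lambda_1 - (2/3)\lambda_2 = 0$   eq(1)' = eq(1) / 3
$\lambda_1 - (2/3)\lambda_2 = 0$   eq(2)' = eq(2) / (-6)
Thus for $\lambda_1 = (2/3)\lambda_2$ the linear combination of both vectors is zero, also for nonzero $\lambda_2$:
$\lambda_1 \begin{pmatrix} 3 \\ -6 \end{pmatrix} + \lambda_2 \begin{pmatrix} -2 \\ 4 \end{pmatrix} = (2/3)\lambda_2 \begin{pmatrix} 3 \\ -6 \end{pmatrix} + \lambda_2 \begin{pmatrix} -2 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
So the two columns are linearly dependent and B is not of full rank.
It means that the dimension of the mapping is equal to 1, because it is the sub-space of $\mathbb{R}^2$
which is spanned by one vector. This sub-space is any linear combination of the vectors
$\begin{pmatrix} 3 \\ -6 \end{pmatrix}$ and $\begin{pmatrix} -2 \\ 4 \end{pmatrix}$, so that it can be characterized as $\mu \begin{pmatrix} 3 \\ -6 \end{pmatrix}$ where $\mu \in \mathbb{R}$.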
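e) Compute the eigenvalues of B and show that the product of its eigenvalues is equal to the
determinant of B.
Solution:
$|B - \lambda I| = \begin{vmatrix} 3-\lambda & -2 \\ -6 & 4-\lambda \end{vmatrix} = 0$
$(3-\lambda)(4-\lambda) - 12 = 0$
$\lambda^2 - 7\lambda = 0$
$\lambda(\lambda - 7) = 0$
Thus there are two eigenvalues: $\lambda = 0$ and $\lambda = 7$.
The product of both eigenvalues is equal to zero, which is the determinant of B.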
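f) Compute the corresponding eigenvectors and the corresponding matrix decomposition of B.
Solution:
For the eigenvalue $\lambda = 0$:
$\begin{pmatrix} 3 & -2 \\ -6 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
The corresponding eigenvector: $3x_1 - 2x_2 = 0$
Eigenvector for $\lambda = 0$: $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$
For $\lambda = 7$:
$\begin{pmatrix} 3 & -2 \\ -6 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 7 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 7x_1 \\ 7x_2 \end{pmatrix}$
The corresponding eigenvector: $-4x_1 - 2x_2 = 0$
Eigenvector for $\lambda = 7$: $\begin{pmatrix} 2 \\ -4 \end{pmatrix}$
Thus:
$B = P \Lambda P^{-1}$
$\begin{pmatrix} 3 & -2 \\ -6 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 3 & -4 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 7 \end{pmatrix} \begin{pmatrix} 2 & 2 \\ 3 & -4 \end{pmatrix}^{-1} = \begin{pmatrix} 2 & 2 \\ 3 & -4 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 7 \end{pmatrix} \begin{pmatrix} 4/14 & 2/14 \\ 3/14 & -2/14 \end{pmatrix}$
Here P is the matrix of eigenvectors (its columns are the eigenvectors), $\Lambda$ is the diagonal matrix
with the eigenvalues 0 and 7 on the main diagonal, and the determinant of P is -14.
$B = \begin{pmatrix} 0 & 14 \\ 0 & -28 \end{pmatrix} \begin{pmatrix} 4/14 & 2/14 \\ 3/14 & -2/14 \end{pmatrix} = \begin{pmatrix} 3 & -2 \\ -6 & 4 \end{pmatrix}$
g) Compute the trace of the matrix B and demonstrate that it is equal to the sum of the
eigenvalues.
Trace (B) = 3 + 4 = 7 (which is equal to the sum of the eigenvalues, 0 + 7)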
ADV MATH ADDITIONAL ASSIGNMENTS WEEK 3
ADVANCED MATHEMATICS ADDITIONAL EXERCISES

WEEK 3
Question 1 matrix decomposition

i) Compute the matrix decomposition $P^{-1}AP$ for $A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}$
ii) Using the decomposition, compute $A^4$
Solution
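Compute the eigenvalues of A:
$|A - \lambda I| = \begin{vmatrix} 1-\lambda & 2 \\ 3 & -\lambda \end{vmatrix} = (1-\lambda)(-\lambda) - 6 = 0$
So that the characteristic equation is $(\lambda - 3)(\lambda + 2) = 0$
For $\lambda = 3$, the eigenvector is
$-2x_1 + 2x_2 = 0$ and $3x_1 - 3x_2 = 0$, so that $v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is an eigenvector
For $\lambda = -2$, the eigenvector is
$3x_1 + 2x_2 = 0$ and $3x_1 + 2x_2 = 0$, so that $v_2 = \begin{pmatrix} 2 \\ -3 \end{pmatrix}$ is an eigenvector
Thus:
$\Lambda = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}$
$P = \begin{pmatrix} 1 & 2 \\ 1 & -3 \end{pmatrix}$
$P^{-1} = \begin{pmatrix} 1 & 2 \\ 1 & -3 \end{pmatrix}^{-1} = -\frac{1}{5} \begin{pmatrix} -3 & -2 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & -\frac{1}{5} \end{pmatrix}$
Check: $P P^{-1} = \begin{pmatrix} 1 & 2 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & -\frac{1}{5} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Thus: $P^{-1}AP = \begin{pmatrix} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & -\frac{1}{5} \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 1 & -3 \end{pmatrix} = \begin{pmatrix} \frac{9}{5} & \frac{6}{5} \\ -\frac{2}{5} & \frac{2}{5} \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 1 & -3 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}$
Using the decomposition, compute $A^4$
$A = P \Lambda P^{-1}$
$A^4 = P \Lambda P^{-1} P \Lambda P^{-1} P \Lambda P^{-1} P \Lambda P^{-1} = P \Lambda^4 P^{-1}$
$A^4 = \begin{pmatrix} 1 & 2 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} 3^4 & 0 \\ 0 & (-2)^4 \end{pmatrix} \begin{pmatrix} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & -\frac{1}{5} \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} 81 & 0 \\ 0 & 16 \end{pmatrix} \begin{pmatrix} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & -\frac{1}{5} \end{pmatrix} = \begin{pmatrix} 81 & 32 \\ 81 & -48 \end{pmatrix} \begin{pmatrix} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & -\frac{1}{5} \end{pmatrix}$
$= \begin{pmatrix} \frac{275}{5} & \frac{130}{5} \\ \frac{195}{5} & \frac{210}{5} \end{pmatrix} = \begin{pmatrix} 55 & 26 \\ 39 & 42 \end{pmatrix}$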
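The result for the fourth power can be confirmed by multiplying A with itself. The sketch below compares that direct product with the decomposition route, using P = [[1, 2], [1, -3]] and eigenvalues 3 and -2; the helper name `matmul` is an illustrative choice.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 0]]

# Direct route: A^4 = (A^2)(A^2).
A2 = matmul(A, A)
A4_direct = matmul(A2, A2)

# Decomposition route: A^4 = P * Lambda^4 * P^{-1}.
P = [[1, 2], [1, -3]]
P_inv = [[F(3, 5), F(2, 5)], [F(1, 5), F(-1, 5)]]
Lambda4 = [[3**4, 0], [0, (-2)**4]]
A4_decomp = matmul(matmul(P, Lambda4), P_inv)

print(A4_direct)
```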
Question 2 Span of the matrix

1 1 1
Determine the dimension of the span of the following vectors: u 0 v 2 ,w 1 .
1 1 0
Solution:
The dimension of the span of a set of vectors is equal to the number of linearly independent
vectors in the set. Vectors are linearly independent if the following equation has only the
1 1 1 0
solution 1 2 3 0 . 1u 2v 3 w 1 0 2 2 3 1 0 .
1 1 0 0
We can rewrite this equation as:
1 1 1 1 0
0 2 1 2 0 . We sweep the matrix:
1 1 0 3 0
1 1 1 0 1 1 1 0 1 1 1 0
0 2 1 0 ~ 0 2 1 0 ~ 0 2 1 0
1 1 0 0 0 2 1 0 0 0 0 0
We can stop here, since we are not interested in the explicit solution and it is clear that we
have all the zero-rows that we will get. The number of unknowns (the three lambdas) minus
the number of zero rows is the number of independent equations that remain, which equals the
number of linearly independent vectors. Each zero row gives one free variable, i.e. one
lambda that we can pick non-zero. This means that there is one linearly dependent vector in
the three and two linearly independent. So the dimension of the span of u, v and w is 2.
Question 3 - Eigenvalues and eigenvectors

Find the eigenvalues and eigenvectors of the following matrices:
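a) $\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix}$
Solution:
We start with the characteristic equation:
$\begin{vmatrix} -5-\lambda & 0 & 4 \\ -19 & 2-\lambda & 10 \\ -8 & 0 & 7-\lambda \end{vmatrix} = 0 \Rightarrow (2-\lambda)((-5-\lambda)(7-\lambda) + 4 \cdot 8) = (2-\lambda)(\lambda^2 - 2\lambda - 35 + 32)$
$= (2-\lambda)(\lambda^2 - 2\lambda - 3) = (2-\lambda)(\lambda - 3)(\lambda + 1) = 0$
So we find eigenvalues $\lambda_1 = 2, \lambda_2 = 3, \lambda_3 = -1$. Next we solve the equation $(A - \lambda I)x = 0$ for
each eigenvalue $\lambda$. We start with $\lambda_1 = 2$.
$\left(\begin{array}{ccc|c} -7 & 0 & 4 & 0 \\ -19 & 0 & 10 & 0 \\ -8 & 0 & 5 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & -\frac{4}{7} & 0 \\ 0 & 0 & \frac{70 - 19 \cdot 4}{7} & 0 \\ 0 & 0 & 5 - \frac{4 \cdot 8}{7} & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right)$
Remember that, because the right hand side of the equation is all zeros, our life is a bit easier.
For the last step here, we know that we can simply multiply our equations by the appropriate
number to go from $\left(\frac{70 - 19 \cdot 4}{7}\right) x_3 = 0$ to $x_3 = 0$. So we find $x_1 = 0$ and $x_3 = 0$ and $x_2$ free, so
$\begin{pmatrix} 0 \\ x_2 \\ 0 \end{pmatrix}$ is an eigenvector with eigenvalue 2 for any number $x_2$.
Let's check: $\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix} \begin{pmatrix} 0 \\ x_2 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 2x_2 \\ 0 \end{pmatrix} = 2 \begin{pmatrix} 0 \\ x_2 \\ 0 \end{pmatrix}$, yay!
Next $\lambda_2 = 3$:
$\left(\begin{array}{ccc|c} -8 & 0 & 4 & 0 \\ -19 & -1 & 10 & 0 \\ -8 & 0 & 4 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -\frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$, so $x_2 - \frac{1}{2}x_3 = 0 \Rightarrow x_3 = 2x_2$ and
$x_1 - \frac{1}{2}x_3 = 0 \Rightarrow x_1 = \frac{1}{2}x_3 = x_2$, so, letting $x_2$ be our free variable, we get $\begin{pmatrix} x_2 \\ x_2 \\ 2x_2 \end{pmatrix}$ is our
eigenvector with eigenvalue 3. Let's check.
$\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix} \begin{pmatrix} x_2 \\ x_2 \\ 2x_2 \end{pmatrix} = \begin{pmatrix} -5x_2 + 8x_2 \\ -19x_2 + 2x_2 + 20x_2 \\ -8x_2 + 14x_2 \end{pmatrix} = \begin{pmatrix} 3x_2 \\ 3x_2 \\ 6x_2 \end{pmatrix} = 3 \begin{pmatrix} x_2 \\ x_2 \\ 2x_2 \end{pmatrix}$. Hurrah!
Finally $\lambda_3 = -1$.
$\left(\begin{array}{ccc|c} -4 & 0 & 4 & 0 \\ -19 & 3 & 10 & 0 \\ -8 & 0 & 8 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 3 & -9 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$, so $x_1 - x_3 = 0 \Rightarrow x_1 = x_3$ and
$x_2 - 3x_3 = 0 \Rightarrow x_2 = 3x_3$, so our eigenvector is $\begin{pmatrix} x_3 \\ 3x_3 \\ x_3 \end{pmatrix}$ with eigenvalue -1. Let's check:
$\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix} \begin{pmatrix} x_3 \\ 3x_3 \\ x_3 \end{pmatrix} = \begin{pmatrix} -5x_3 + 4x_3 \\ -19x_3 + 6x_3 + 10x_3 \\ -8x_3 + 7x_3 \end{pmatrix} = \begin{pmatrix} -x_3 \\ -3x_3 \\ -x_3 \end{pmatrix} = -1 \begin{pmatrix} x_3 \\ 3x_3 \\ x_3 \end{pmatrix}$. Phew, we're done.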
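b) $\begin{pmatrix} 5 & 0 & 0 \\ 2 & -1 & -2 \\ 1 & -3 & 4 \end{pmatrix}$
Solution:
Just the bare calculations this time.
$\begin{vmatrix} 5-\lambda & 0 & 0 \\ 2 & -1-\lambda & -2 \\ 1 & -3 & 4-\lambda \end{vmatrix} = (5-\lambda)((-1-\lambda)(4-\lambda) - (-2)(-3)) = (5-\lambda)(\lambda^2 - 3\lambda - 10)$
$= (5-\lambda)(\lambda - 5)(\lambda + 2) = 0$
$\lambda_1 = 5, \lambda_2 = -2$. A strange thing happens here. The value 5 is twice a solution. We'll see
what this entails in a minute. We call this phenomenon the multiplicity of the eigenvalue. The
multiplicity of the eigenvalue 5 is 2, the multiplicity of the eigenvalue -2 is 1 (it is only once a
solution).
Let's find the eigenvectors for $\lambda_1 = 5$:
$\left(\begin{array}{ccc|c} 0 & 0 & 0 & 0 \\ 2 & -6 & -2 & 0 \\ 1 & -3 & -1 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & -3 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$, $x_1 - 3x_2 - x_3 = 0 \Rightarrow x_1 = 3x_2 + x_3$, so the eigenvectors are $\begin{pmatrix} 3x_2 + x_3 \\ x_2 \\ x_3 \end{pmatrix}$
We see that we have two free variables now! This is because of the multiplicity of two that we
found for this eigenvalue. It doesn't really matter for the calculations, though. Let's check our
finding.
$\begin{pmatrix} 5 & 0 & 0 \\ 2 & -1 & -2 \\ 1 & -3 & 4 \end{pmatrix} \begin{pmatrix} 3x_2 + x_3 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 15x_2 + 5x_3 \\ 6x_2 + 2x_3 - x_2 - 2x_3 \\ 3x_2 + x_3 - 3x_2 + 4x_3 \end{pmatrix} = \begin{pmatrix} 15x_2 + 5x_3 \\ 5x_2 \\ 5x_3 \end{pmatrix} = 5 \begin{pmatrix} 3x_2 + x_3 \\ x_2 \\ x_3 \end{pmatrix}$ So it works out.
On to $\lambda_2 = -2$:
$\left(\begin{array}{ccc|c} 7 & 0 & 0 & 0 \\ 2 & 1 & -2 & 0 \\ 1 & -3 & 6 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & -3 & 6 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$, $x_1 = 0$, $x_2 - 2x_3 = 0 \Rightarrow x_2 = 2x_3$, so the eigenvector is $\begin{pmatrix} 0 \\ 2x_3 \\ x_3 \end{pmatrix}$
We check the result:
$\begin{pmatrix} 5 & 0 & 0 \\ 2 & -1 & -2 \\ 1 & -3 & 4 \end{pmatrix} \begin{pmatrix} 0 \\ 2x_3 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ -2x_3 - 2x_3 \\ -6x_3 + 4x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ -4x_3 \\ -2x_3 \end{pmatrix} = -2 \begin{pmatrix} 0 \\ 2x_3 \\ x_3 \end{pmatrix}$.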
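c) $\begin{pmatrix} -2 & -1 & 1 \\ 9 & 4 & -3 \\ -3 & -1 & 2 \end{pmatrix}$
Solution:
Again just the bare calculations.
$\begin{vmatrix} -2-\lambda & -1 & 1 \\ 9 & 4-\lambda & -3 \\ -3 & -1 & 2-\lambda \end{vmatrix} = 0 \Rightarrow (-2-\lambda)((4-\lambda)(2-\lambda) - 3) + (9(2-\lambda) - (-3)(-3)) + (-9 - (-3)(4-\lambda))$
$= (-2-\lambda)(\lambda^2 - 6\lambda + 5) + (9 - 9\lambda) + (3 - 3\lambda) = -2\lambda^2 + 12\lambda - 10 - \lambda^3 + 6\lambda^2 - 5\lambda + 9 - 9\lambda + 3 - 3\lambda$
$= -\lambda^3 + 4\lambda^2 - 5\lambda + 2 = -(\lambda - 2)(\lambda - 1)(\lambda - 1) = 0$
Don't worry if you couldn't do the last step, just check that it works out. We get $\lambda_1 = 2, \lambda_2 = 1$,
the latter again with multiplicity 2. We find the eigenvectors, first for $\lambda_2 = 1$:
$\left(\begin{array}{ccc|c} -3 & -1 & 1 & 0 \\ 9 & 3 & -3 & 0 \\ -3 & -1 & 1 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} -3 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$, so $-3x_1 - x_2 + x_3 = 0 \Rightarrow x_3 = 3x_1 + x_2$, where we let $x_1, x_2$
be our free variables. So $\begin{pmatrix} x_1 \\ x_2 \\ 3x_1 + x_2 \end{pmatrix}$ is an eigenvector for any value for $x_1, x_2$. Let's check:
$\begin{pmatrix} -2 & -1 & 1 \\ 9 & 4 & -3 \\ -3 & -1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ 3x_1 + x_2 \end{pmatrix} = \begin{pmatrix} -2x_1 - x_2 + 3x_1 + x_2 \\ 9x_1 + 4x_2 - 9x_1 - 3x_2 \\ -3x_1 - x_2 + 6x_1 + 2x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ 3x_1 + x_2 \end{pmatrix}$
On to $\lambda_1 = 2$:
$\left(\begin{array}{ccc|c} -4 & -1 & 1 & 0 \\ 9 & 2 & -3 & 0 \\ -3 & -1 & 0 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} -4 & -1 & 1 & 0 \\ 0 & -\frac{1}{4} & -\frac{3}{4} & 0 \\ 0 & -\frac{1}{4} & -\frac{3}{4} & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} -4 & -1 & 1 & 0 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$ so, letting $x_3$ be our free variable,
$x_2 + 3x_3 = 0 \Rightarrow x_2 = -3x_3$ and $-4x_1 - x_2 + x_3 = 0 \Rightarrow -4x_1 + 3x_3 + x_3 = 0 \Rightarrow x_1 = x_3$. So $\begin{pmatrix} x_3 \\ -3x_3 \\ x_3 \end{pmatrix}$ is an
eigenvector for any value $x_3$. Let's check:
$\begin{pmatrix} -2 & -1 & 1 \\ 9 & 4 & -3 \\ -3 & -1 & 2 \end{pmatrix} \begin{pmatrix} x_3 \\ -3x_3 \\ x_3 \end{pmatrix} = \begin{pmatrix} -2x_3 + 3x_3 + x_3 \\ 9x_3 - 12x_3 - 3x_3 \\ -3x_3 + 3x_3 + 2x_3 \end{pmatrix} = \begin{pmatrix} 2x_3 \\ -6x_3 \\ 2x_3 \end{pmatrix} = 2 \begin{pmatrix} x_3 \\ -3x_3 \\ x_3 \end{pmatrix}$.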
Exercise 4 - Diagonalization
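Calculate $\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix}^{349}$. Hint: use the previous question (a).
Solution:
In a) we found the eigenvalues $\lambda_1 = 2, \lambda_2 = 3, \lambda_3 = -1$ and the respective eigenvectors $\begin{pmatrix} 0 \\ x_2 \\ 0 \end{pmatrix}$,
$\begin{pmatrix} x_2 \\ x_2 \\ 2x_2 \end{pmatrix}$ and $\begin{pmatrix} x_3 \\ 3x_3 \\ x_3 \end{pmatrix}$. To diagonalize the matrix, we plug in some values in the eigenvectors, say:
$\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}$. Then we want to write $\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix} = CDC^{-1}$, where $D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ and
$C = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 3 \\ 0 & 2 & 1 \end{pmatrix}$. We won't go into how to calculate $C^{-1}$ here, we just give it to be
$C^{-1} = \begin{pmatrix} -5 & 1 & 2 \\ -1 & 0 & 1 \\ 2 & 0 & -1 \end{pmatrix}$. Indeed $\begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 3 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} -5 & 1 & 2 \\ -1 & 0 & 1 \\ 2 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.
So $\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix} = CDC^{-1} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 3 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} -5 & 1 & 2 \\ -1 & 0 & 1 \\ 2 & 0 & -1 \end{pmatrix}$ and
$\begin{pmatrix} -5 & 0 & 4 \\ -19 & 2 & 10 \\ -8 & 0 & 7 \end{pmatrix}^{349} = C D^{349} C^{-1} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 3 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2^{349} & 0 & 0 \\ 0 & 3^{349} & 0 \\ 0 & 0 & (-1)^{349} \end{pmatrix} \begin{pmatrix} -5 & 1 & 2 \\ -1 & 0 & 1 \\ 2 & 0 & -1 \end{pmatrix}$, which is
something you could easily calculate (but I won't bother).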
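For readers who do want to bother, Python's integers have arbitrary precision, so the power can be computed exactly. The sketch below assumes the sign pattern A = [[-5, 0, 4], [-19, 2, 10], [-8, 0, 7]] and the matrices C and C^{-1} with the signs that make C C^{-1} = I. It compares the diagonalization route with plain repeated multiplication; the function names are illustrative.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[-5, 0, 4], [-19, 2, 10], [-8, 0, 7]]
C = [[0, 1, 1], [1, 1, 3], [0, 2, 1]]            # eigenvectors as columns
C_inv = [[-5, 1, 2], [-1, 0, 1], [2, 0, -1]]

def power_by_diagonalization(k):
    """A^k = C D^k C^{-1} with D = diag(2, 3, -1)."""
    Dk = [[2**k, 0, 0], [0, 3**k, 0], [0, 0, (-1)**k]]
    return matmul(matmul(C, Dk), C_inv)

def power_by_multiplication(M, k):
    """A^k by k successive multiplications (slow, but independent of the eigenvalues)."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for _ in range(k):
        result = matmul(result, M)
    return result

A349 = power_by_diagonalization(349)
print(A349[0][0])
```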
Exercise 5 - Eigenvalues
Please give the matrix A for which
1 1 0
Ae1 t , Ae2 3 , Ae3 2
0 1 1
Please give the value of t of the matrix A for which the eigenvalue of 0 is one of the
eigenvalues of the matrix A. Show that 0 is an eigenvalue of this matrix.
Solution:
1 1 0
The matrix is A t 3 2 , because it can be shown that
0 1 1
1 1 0 1 1 1 1 0 0 1 1 1 0 0 0
Ae1 t 3 2 0 t , Ae2 t 3 2 1 3 , Ae3 t 3 2 0 2
0 1 1 0 0 0 1 1 0 1 0 1 1 1 1
We need to compute the t for which the determinant of the matrix A is zero.
1 1 0
det( A) | A | t 3 2 0
0 1 1
3 2 1 0
1 t 0
1 1 1 1
( 3 2) t ( 1 0) 0
t 1
Next, for t=1, we need to compute the eigenvalues of the matrix A.

A I 0
1 1 0
t 3 2 0
0 1 1
3 2 1 0
(1 ) 1 0
1 1 1 1
The characteristic equation is
$\lambda^3 - 3\lambda^2 = 0$
which has the solutions $\lambda = 0$ or $\lambda = 3$. So $\lambda = 0$ is an eigenvalue of the matrix A.
Exercise 6 Determinant and eigenvalues
7 0 3
Given B 9 2 3
18 0 8
Demonstrate that for this particular matrix det( B) 1 2 3
(thus the determinant of B is equal to the product of the eigenvalues; this is a general
property)
Solution:
7 0 3
7 3
det( B) 9 2 3 2 2( 56 54) 4
18 8
18 0 8
Again, we compute the eigenvalues of the matrix B.
B I 0
7 0 3
9 2 3 0
18 0 8
7 3
( 2 ) 0
18 8
So that the characteristic equation is
3
3 2 4 0
which has the solutions
1 2, 2 2, 3 1
The product of the eigenvalues of B is equal to the determinant of the matrix B:
1 2 3 4 det( B)
Exercise 7 Eigenvalues and trace of a matrix
1 0 1
Given C 1 1 3
1 0 3
If the trace of matrix C is defined as the sum of the diagonal elements, show that for this
particular matrix trace(C ) 1 2 3
(thus the trace of C is equal to the sum of its eigenvalues). This is a general property of
matrices.
Solution:
The trace of matrix C is 1+1+3=5
For the eigenvalues of the matrix C, we must compute
C I 0
1 0 1
1 1 3 0
1 0 3
1 1
(1 ) 0
1 3
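Characteristic equation:
$(1-\lambda)((1-\lambda)(3-\lambda) + 1) = (1-\lambda)(3 - 4\lambda + \lambda^2 + 1) = (1-\lambda)(\lambda - 2)^2 = 0$
Which has the solutions:
$\lambda_1 = 1, \lambda_2 = 2, \lambda_3 = 2$
$\lambda_1 + \lambda_2 + \lambda_3 = 1 + 2 + 2 = 5 = trace(C)$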
Exercise 8 Trace and determinant

We have a 2 X 2 matrix D, which has two eigenvalues $\lambda_1$ and $\lambda_2$, a determinant
Det(D), and a trace Trace(D). Please use the general requirements of questions 2
and 3 (regarding the determinant and trace of a matrix). Starting with the expressions
of question 2 and 3, please derive the general relationship between Trace(D) and
Det(D) for (2X2) matrices that have
a) two equal eigenvalues $\lambda_1 = \lambda_2$
Solution:
Trace: $\lambda_1 + \lambda_2 = T$
Determinant: $\lambda_1 \lambda_2 = D$
For equal eigenvalues $\lambda_1 = \lambda_2$ we can compute the trace and determinant. There are two
equations with three unknowns ($\lambda_1$, T and D).
$2\lambda_1 = T \Rightarrow \lambda_1 = \frac{T}{2}$
$\lambda_1^2 = D \Rightarrow \left(\frac{T}{2}\right)^2 = D$
or $D = \frac{T^2}{4}$
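b) two unequal eigenvalues $\lambda_1 \neq \lambda_2$
Solution:
Trace: $\lambda_1 + \lambda_2 = T$
Determinant: $\lambda_1 \lambda_2 = D$
We express $\lambda_1$ in terms of T and $\lambda_2$, and we solve the expression for $\lambda_2$.
Thus for the first equation:
$\lambda_1 + \lambda_2 = T$
$\lambda_1 = T - \lambda_2$
Which is substituted in the second equation:
$\lambda_1 \lambda_2 = D$
$(T - \lambda_2)\lambda_2 = D$
Which is a second-order polynomial function in $\lambda_2$:
$\lambda_2^2 - T\lambda_2 + D = 0$ (in general: $a\lambda_2^2 + b\lambda_2 + c = 0$)
This polynomial function has two unequal (real) solutions if the discriminant $b^2 - 4ac > 0$
Thus: $T^2 - 4D > 0$
or $T^2 > 4D$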
c) Two negative eigenvalues $\lambda_1 < 0$ and $\lambda_2 < 0$
Solution:
It is a combination of questions a and b:
$T^2 \geq 4D$
$T < 0$
$D > 0$
d) Two positive eigenvalues $\lambda_1 > 0$ and $\lambda_2 > 0$
Solution:
It is a combination of questions a and b:
$T^2 \geq 4D$
$T > 0$
$D > 0$
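The conditions of parts a) to d) can be collected in one small helper. This is only a sketch: the function name `eigenvalue_pattern` and the returned labels are illustrative choices, and the "complex" case (T^2 < 4D) is added for completeness although the exercise does not ask for it.

```python
def eigenvalue_pattern(T, D):
    """Classify the eigenvalues of a 2x2 matrix from its trace T and determinant D."""
    disc = T * T - 4 * D          # discriminant of lambda^2 - T*lambda + D = 0
    if disc < 0:
        return "complex"
    kind = "equal" if disc == 0 else "unequal"
    if D > 0 and T > 0:
        sign = "both positive"
    elif D > 0 and T < 0:
        sign = "both negative"
    else:
        sign = "mixed or zero"
    return kind + ", " + sign

# Eigenvalues 2 and 3 give T = 5 and D = 6.
print(eigenvalue_pattern(5, 6))
```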
ADV MATH LECTURE WEEK 4
Klein: Chapters 6, 7, and 8
Derivatives as limits K.6.3.

Differentiability K.6.3.
Differentials K.6.4.
Rules of differentiation (up to chain rule) K.7.1. K.7.2.
Second order derivative K.7.3.
Multivariate functions K.8.1.
Partial derivatives K.8.2.
Young's rule K.8.2.
Chain rule again K.8.3.
Total differentials K.8.4.
Implicit differentiation K.8.4.
Marginal concept vs derivative K.6.3.
Concavity and second order derivatives K.7.3.
Homogeneous functions K.8.3.
Eulers equation See slides
Homothetic functions See slides
Taylors approximation See slides
Gradient and directional derivative See slides
Average rate of change of function
Definition:
Secant line: line between the points $(x_A, y_A)$ and $(x_B, y_B)$
where $y_A = f(x_A)$ and $y_B = f(x_B)$
$y' = y_A + \frac{f(x_B) - f(x_A)}{x_B - x_A}(x' - x_A)$
For any point $(x', y')$ on this line, $x'$ is within $[x_A, x_B]$ and $y'$ is
within $[y_A, y_B]$
Definition: The average rate of change of the function $y = f(x)$
over the closed interval $[x_A, x_B]$ is
$\frac{\Delta y}{\Delta x} = \frac{f(x_B) - f(x_A)}{x_B - x_A}$
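Differential calculus
Difference quotient:
Let $y = f(x)$
$x_0$: initial value
$\frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$
Example 1:
$y = a + bx + cx^2$
The difference quotient becomes:
$\frac{\Delta y}{\Delta x} = \frac{a + bx_0 + b\Delta x + c(x_0 + \Delta x)^2 - (a + bx_0 + cx_0^2)}{\Delta x} = b + 2cx_0 + c\Delta x$
Derivative:
Let $y = f(x)$
$x_0$: initial value
$\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$
Also denoted by: $f'(x_0)$
Example 1 (continued)
$\frac{dy}{dx} = \lim_{\Delta x \to 0} (b + 2cx_0 + c\Delta x) = b + 2cx_0$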
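The limit in Example 1 can be watched numerically. In the sketch below the coefficients a = 1, b = 2, c = 3 and the point x0 = 2 are hypothetical values chosen for illustration; exact fractions are used so that the quotient equals b + 2c x0 + c Δx without rounding error.

```python
from fractions import Fraction

def difference_quotient(f, x0, dx):
    """(f(x0 + dx) - f(x0)) / dx for a nonzero step dx."""
    return (f(x0 + dx) - f(x0)) / dx

a, b, c = 1, 2, 3                     # hypothetical coefficients

def f(x):
    return a + b * x + c * x * x

x0 = Fraction(2)
steps = [Fraction(1, 10**k) for k in range(1, 5)]
quotients = [difference_quotient(f, x0, dx) for dx in steps]

# The quotients approach the derivative b + 2*c*x0 = 14 as dx shrinks.
print([float(q) for q in quotients])
```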
Second derivative
Gives the curvature of the function f, which is the change of the
steepness of f.
$\frac{d^2 y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right)$
Sum-difference rule
For any two functions $f(x)$ and $g(x)$
$\frac{d(f(x) \pm g(x))}{dx} = f'(x) \pm g'(x)$
Proof (with $h(x) = f(x) \pm g(x)$)
$h'(x) = \lim_{\Delta x \to 0} \frac{h(x + \Delta x) - h(x)}{\Delta x}$
$= \lim_{\Delta x \to 0} \frac{f(x + \Delta x) \pm g(x + \Delta x) - (f(x) \pm g(x))}{\Delta x}$
$= \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \pm \lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x)}{\Delta x}$
$= f'(x) \pm g'(x)$
Scalar rule
Let $g(x) = k f(x)$, $k \in \mathbb{R}$
$g'(x) = k f'(x)$
Proof:
$g'(x) = \lim_{\Delta x \to 0} \frac{k f(x + \Delta x) - k f(x)}{\Delta x}$
$= k \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$
$= k f'(x)$
Product rule
For $f(x) = g(x) h(x)$
$f'(x) = g'(x) h(x) + h'(x) g(x)$
Power rule
For $f(x) = k x^n$, $k, n \in \mathbb{R}$
$f'(x) = n k x^{n-1}$
Exponential function rule
For $f(x) = e^{kx}$, $k \in \mathbb{R}$
$f'(x) = k e^{kx}$
Otherwise stated:
For $f(x) = \exp(kx)$
$f'(x) = k \exp(kx)$
Chain rule
The derivative of the composite function
$y = f(x) = g(h(x))$
where $u = h(x)$
$g(h(x)) = g(u)$
and both $h(x)$ and $g(u)$ are differentiable functions
$\frac{df(x)}{dx} = g'(h(x)) \, h'(x)$
Or
$\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}$
Natural logarithmic function rule
$f(x) = \ln(x)$
$f'(x) = \frac{d \ln(x)}{dx} = \frac{1}{x}$
Definition:
A function is differentiable in an interval if a derivative exists for
each point in that interval.
Requirement: the function must be continuous.
Example 2
The function
$y = 3x^2$
is continuous at $x = 0$
$f(0) = 3 \cdot 0^2 = 0$
$\lim_{x \to 0^-} 3x^2 = 0$
$\lim_{x \to 0^+} 3x^2 = 0$
and it is also differentiable at $x = 0$.
$\frac{dy}{dx} = 6x$
$\lim_{x \to 0^-} f'(x) = \lim_{x \to 0^-} 3 \cdot 2x = 0$
$\lim_{x \to 0^+} f'(x) = \lim_{x \to 0^+} 3 \cdot 2x = 0$
Example 3
$f(x) = |3x| = \begin{cases} 3x & x > 0 \\ 0 & x = 0 \\ -3x & x < 0 \end{cases}$
The function is continuous at $x = 0$
The function is non-differentiable at $x = 0$ because
$\lim_{x \to 0^+} f'(x) = \lim_{x \to 0^+} 3 = 3$
$\lim_{x \to 0^-} f'(x) = \lim_{x \to 0^-} (-3) = -3$
Thus $f'(0)$ is undefined.
Absolute minimum/maximum
A function f(.) is defined on a domain S. The function has an absolute
maximum (or global maximum) if there exists at least one point c in S such that
$f(x) \leq f(c)$ for all $x \in S$
The function has an absolute minimum (or global minimum) if there
exists at least one point c in S such that
$f(x) \geq f(c)$ for all $x \in S$
Example 4
$f(x) = \ln(x)$ on $S = (0, 3]$
The function has no absolute minimum on S. The absolute maximum
is $\ln(3)$.
Definition:
Global maximum: largest value of function over range
Global minimum: smallest value of function over range
Example 5:
Minimum:
It says that we minimize the function $5 + (x - 8)^2$ with respect to x. The
minimum function value (at the argument x=8) is equal to 5.
$\min_x \; 5 + (x - 8)^2 = 5$
The minimum function value is at the argument x=8:
$\arg\min_x \; 5 + (x - 8)^2 = 8$
Maximum:
It says that we maximize the function $3 - 2(x - 9)^2$ with respect to x.
The maximum function value (at x=9) is equal to 3.
$\max_x \; 3 - 2(x - 9)^2 = 3$
The argument at which the function has a maximum:
$\arg\max_x \; 3 - 2(x - 9)^2 = 9$
Rolle's theorem
If the function f(x) is continuous in the closed (and bounded) interval
$[a, b]$ and differentiable in the open interval $(a, b)$ and $f(a) = f(b)$,
then there exists at least one point c in the open interval $(a, b)$ such
that
$f'(c) = 0$
Example 6
For the function
$f(x) = 1 - \exp(x^2 - x)$
we have $f(0) = 0$ and $f(1) = 0$, so that we may apply Rolle's
theorem.
The derivative of the function is
$f'(x) = -(2x - 1)\exp(x^2 - x)$
$f'(x) = 0$ for $x = \frac{1}{2}$
So that there is an extremum in the open interval (0,1)
We can also compute the second derivative of f(.) at $x = \frac{1}{2}$
$f''(x) = -2\exp(x^2 - x) - (2x - 1)^2 \exp(x^2 - x)$
$f''(x) = [-2 - (2x - 1)^2]\exp(x^2 - x)$
Note that $f''(\frac{1}{2}) = -2\exp(-\frac{1}{4}) < 0$, so the extremum is a maximum.
Mean value theorem
If the function f(x) is continuous in the closed (and bounded) interval
$[a, b]$ and differentiable on each point in the open interval $(a, b)$ then
there exists at least one point c in $(a, b)$ such that
$f'(c) = \frac{f(b) - f(a)}{b - a}$
Alternative formulation: $f(b) - f(a) = f'(c)(b - a)$
Example 7:
The function $f(x) = 3x^2 - 5x$ can be tested in $[0, 3]$. It is a continuous
function on $[0, 3]$ and differentiable on $(0, 3)$. We test the mean value
theorem.
$\frac{f(3) - f(0)}{3 - 0} = \frac{(27 - 15) - 0}{3 - 0} = 4$
$f'(x) = 6x - 5$
The equation $f'(x) = 4$ has one solution: $x^* = \frac{9}{6} = \frac{3}{2}$, which indeed lies in $(0, 3)$.
Theorem
If $f'(x) > 0$ for every $x \in (a, b)$ then f is strictly increasing on $[a, b]$
If $f'(x) < 0$ for every $x \in (a, b)$ then f is strictly decreasing on $[a, b]$
Theorem
Let f be continuous on a closed interval $[a, b]$ and assume that the
derivative $f'$ exists everywhere in the open interval $(a, b)$ except possibly at
a point c.
If $f'(x) > 0$ for every $x < c$ and $f'(x) < 0$ for every $x > c$, then f
has a relative maximum at c.
If $f'(x) < 0$ for every $x < c$ and $f'(x) > 0$ for every $x > c$, then f
has a relative minimum at c.
Critical points
If f has a derivative everywhere on $[a, b]$, an extremum can occur only:
i) At the endpoints a and b
ii) At the interior points where $f'(x) = 0$
Theorem
Let c be a critical point on $(a, b)$, so that $a < c < b$ and $f'(c) = 0$. We
assume that the second derivative exists in $(a, b)$. Then
i) If $f'' < 0$ in $(a, b)$, f has a relative maximum at c.
ii) If $f'' > 0$ in $(a, b)$, f has a relative minimum at c.
Example 8
$f(x) = \frac{1}{x^2 + 1}$
$f'(x) = \frac{-2x}{(x^2 + 1)^2}$
$f'(x) > 0$ if $x < 0$ and $f'(x) = 0$ if $x = 0$ and $f'(x) < 0$ if $x > 0$
Thus for $x = 0$, the function has a relative maximum.
$f''(x) = \frac{2(3x^2 - 1)}{(x^2 + 1)^3}$
The second derivative changes sign for $x^2 = \frac{1}{3}$. The points of
inflection are $x = -\frac{1}{3}\sqrt{3}$ or $x = \frac{1}{3}\sqrt{3}$
Theorem
Assume that f is a continuous function on $(a, b)$. A function is
strictly concave in $(a, b)$ if $f'$ exists and $f'' < 0$ for all values of x in
that interval.
Theorem
Assume that f is a continuous function on $(a, b)$. A function is
strictly convex in $(a, b)$ if $f'$ exists and $f'' > 0$ for all values of x in
that interval.
Example 9
$f(x) = x^4 + 8x^2 + 3$
$f'(x) = 4x^3 + 16x$
$f'(x) = 0$ for $x = 0$
$f''(x) = 12x^2 + 16 > 0$
Thus $f''(0) > 0$ so that the function has a minimum at $x = 0$
Because $f(x)$ is a convex function, the function reaches at $x = 0$ a
global minimum.
Taylor Series expansion
The function $y = f(x)$ can be approximated around x = a by:
Linear approximation:
$h(x) = f(a) + b(x - a)$
Take $b = f'(a)$
$h(x) = f(a) + f'(a)(x - a)$
Quadratic approximation around x=a:
$j(x) = f(a) + f'(a)(x - a) + c(x - a)^2$
Take c such that $j''(a) = f''(a)$
$j''(x) = 2c$
$c = \frac{1}{2} f''(a)$
$j(x) = f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2$
n-th degree approximation around x=a:
$m(x) = \frac{f(a)}{0!} + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x - a)^n$
Example 10
Take the exponential function $f(x) = e^x$ and let's consider $a = 0$. We
know that
$f(a) = f'(a) = \dots = f^{(n)}(a) = 1$
The Taylor approximation becomes
$e^x \approx 1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!}$
Example 11
Take the exponential function $f(x) = e^{-x}$ and let's consider $a = 0$. We
know that
$f(a) = 1, \; f'(a) = -1, \; \dots, \; f^{(n)}(a) = (-1)^n$
$e^{-x} \approx 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \dots + (-1)^n \frac{x^n}{n!}$ as $x \to 0$
Example 12
Take the function $f(x) = \sin(x)$ and let's consider $a = 0$.
We know that
$f(x) = \sin(x)$, $f'(x) = \cos(x)$, $f''(x) = -\sin(x)$, $f'''(x) = -\cos(x)$,
$f^{(4)}(x) = \sin(x)$, etc.
$\sin(0) = 0$
$\cos(0) = 1$
$\sin(x) \approx x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots + (-1)^n \frac{x^{2n+1}}{(2n+1)!}$ as $x \to 0$
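Example 13
$m(x) = \frac{f(a)}{0!} + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x - a)^n$
$f(x) = \log(1 + x)$
$f'(x) = \frac{1}{1 + x}$
$f''(x) = -\frac{1}{(1 + x)^2}$
$f'''(x) = \frac{2}{(1 + x)^3}$
$f^{(4)}(x) = -\frac{2 \cdot 3}{(1 + x)^4}$
$f^{(n)}(x) = (-1)^{n-1} \frac{(n - 1)!}{(1 + x)^n}$
$\log(1 + x) \approx \frac{f(0)}{0!} + \frac{f'(0)}{1!} x + \frac{f''(0)}{2!} x^2 + \dots + \frac{f^{(n)}(0)}{n!} x^n$
$= \log(1) + \frac{1}{1 + 0} x - \frac{1}{(1 + 0)^2} \frac{1}{2!} x^2 + \dots + (-1)^{n-1} \frac{(n - 1)!}{(1 + 0)^n} \frac{1}{n!} x^n$
$= x - \frac{x^2}{2} + \dots + (-1)^{n-1} \frac{x^n}{n}$
$\log(1 + x) \approx x - \frac{x^2}{2} + \frac{x^3}{3} - \dots + (-1)^{n-1} \frac{x^n}{n}$ as $x \to 0$
Example 14
Using the first-order approximation $\log(1 + 2x) \approx 2x$, we can show that $\lim_{x \to 0} \frac{\log(1 + 2x)}{x} = \lim_{x \to 0} \frac{2x}{x} = 2$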
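The approximations of Examples 10 to 14 are easy to try out. The sketch below evaluates the truncated series around a = 0 and compares them with the functions in Python's `math` module; the function names and the evaluation points are illustrative choices.

```python
import math

def taylor_exp(x, n):
    """1 + x + x^2/2! + ... + x^n/n!"""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def taylor_sin(x, n):
    """x - x^3/3! + ... + (-1)^n x^(2n+1)/(2n+1)!"""
    return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n + 1))

def taylor_log1p(x, n):
    """x - x^2/2 + x^3/3 - ... + (-1)^(n-1) x^n/n"""
    return sum((-1)**(k - 1) * x**k / k for k in range(1, n + 1))

print(taylor_exp(0.5, 10) - math.exp(0.5))
print(taylor_sin(0.5, 5) - math.sin(0.5))
print(taylor_log1p(0.1, 10) - math.log(1.1))

# Example 14: the ratio log(1 + 2x) / x is close to 2 for small x.
small = 1e-6
ratio = math.log(1 + 2 * small) / small
print(ratio)
```

The approximations are only good near a = 0; for the logarithm the series is not useful for x far from zero.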
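Partial derivatives
The partial derivative of $f(x_1, x_2, \dots, x_n)$ with respect to $x_i$ is
$\frac{\partial y}{\partial x_i} = \lim_{\Delta x_i \to 0} \frac{f(x_1, \dots, x_i + \Delta x_i, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{\Delta x_i}$
Notation: $f_i(x_1, x_2, \dots, x_n)$
Second-order and cross derivatives:
$f_{11}(x_1, x_2) = \frac{\partial^2 y}{\partial x_1^2}$
$f_{22}(x_1, x_2) = \frac{\partial^2 y}{\partial x_2^2}$
$f_{12}(x_1, x_2) = \frac{\partial^2 y}{\partial x_1 \partial x_2}$
$f_{21}(x_1, x_2) = \frac{\partial^2 y}{\partial x_2 \partial x_1}$
Young's theorem
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function for which all of the partial derivatives
exist and are themselves differentiable with continuous derivatives,
then
$\frac{\partial^2 f(x_1, x_2, \dots, x_n)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x_1, x_2, \dots, x_n)}{\partial x_j \partial x_i}, \quad i, j = 1, \dots, n$
or
$f_{ji}(x_1, x_2, \dots, x_n) = f_{ij}(x_1, x_2, \dots, x_n)$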
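Example 15
$y = f(x_1, x_2) = e^{3x_1^2} + 5x_2^5 + 3x_2 e^{5x_1}$
$\frac{\partial y}{\partial x_1} = 6x_1 e^{3x_1^2} + 15x_2 e^{5x_1}$
$\frac{\partial y}{\partial x_2} = 25x_2^4 + 3e^{5x_1}$
$\frac{\partial^2 y}{\partial x_1^2} = (6 + 36x_1^2) e^{3x_1^2} + 75x_2 e^{5x_1}$
$\frac{\partial^2 y}{\partial x_2^2} = 100x_2^3$
$\frac{\partial^2 y}{\partial x_2 \partial x_1} = \frac{\partial}{\partial x_2}\left(\frac{\partial y}{\partial x_1}\right) = 15e^{5x_1}$
$\frac{\partial^2 y}{\partial x_1 \partial x_2} = \frac{\partial}{\partial x_1}\left(\frac{\partial y}{\partial x_2}\right) = 15e^{5x_1}$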
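Young's theorem can be checked numerically for Example 15. The sketch below assumes the function is y = e^{3 x1^2} + 5 x2^5 + 3 x2 e^{5 x1} (all signs positive), takes the two first-order partial derivatives in closed form, and differentiates each once more with a central difference. The evaluation point (0.1, 0.2), the step size and the helper names are illustrative.

```python
import math

def dy_dx1(x1, x2):
    """First-order partial derivative with respect to x1."""
    return 6 * x1 * math.exp(3 * x1**2) + 15 * x2 * math.exp(5 * x1)

def dy_dx2(x1, x2):
    """First-order partial derivative with respect to x2."""
    return 25 * x2**4 + 3 * math.exp(5 * x1)

def central_difference(g, x, h=1e-5):
    """Numerical derivative of a one-variable function g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)

x1, x2 = 0.1, 0.2
# Differentiate dy/dx1 with respect to x2, and dy/dx2 with respect to x1.
cross_a = central_difference(lambda b: dy_dx1(x1, b), x2)
cross_b = central_difference(lambda a: dy_dx2(a, x2), x1)
exact = 15 * math.exp(5 * x1)

print(cross_a, cross_b, exact)
```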
Composite functions
Multivariate chain rule (I)

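If the arguments of the function
\[ y = f(x_1, x_2, \dots, x_n) \]
are themselves differentiable functions of the variable $t$, such that
\[ x_1 = g_1(t),\quad x_2 = g_2(t),\quad \dots,\quad x_n = g_n(t) \]
then:
\[ \frac{dy}{dt} = f_1 \frac{dx_1}{dt} + f_2 \frac{dx_2}{dt} + \dots + f_n \frac{dx_n}{dt} \]
with
\[ f_i = \frac{\partial y}{\partial x_i} \]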
Example 16
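\[ y = f(x_1, x_2) = x_1^2 + 2x_2^4 \]
\[ x_1 = 5t^2 + 2e^{3t} \]
\[ x_2 = t^3 + e^{-5t} \]
We compute the derivatives of $x_1$ and $x_2$ with respect to $t$ and the partial derivatives of $y$:
\[ \frac{dx_1}{dt} = 10t + 6e^{3t}, \qquad \frac{dx_2}{dt} = 3t^2 - 5e^{-5t} \]
\[ \frac{\partial y}{\partial x_1} = 2x_1, \qquad \frac{\partial y}{\partial x_2} = 8x_2^3 \]
The chain rule then gives
\[ \frac{dy}{dt} = 2x_1 (10t + 6e^{3t}) + 8x_2^3 (3t^2 - 5e^{-5t}) \]
\[ \frac{dy}{dt} = 2(5t^2 + 2e^{3t})(10t + 6e^{3t}) + 8(t^3 + e^{-5t})^3 (3t^2 - 5e^{-5t}) \]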
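The chain-rule result of Example 16 can be compared with a direct numerical derivative of the composite function. This is a sketch under one assumption: the signs in $x_2 = t^3 + e^{-5t}$ are read as shown here (they are not legible in every copy of the slides). The evaluation point t = 0.3 is arbitrary.

```python
import math

def x1(t):
    return 5 * t**2 + 2 * math.exp(3 * t)

def x2(t):
    return t**3 + math.exp(-5 * t)  # sign convention assumed, see lead-in

def y(t):
    """Composite function y(t) = x1(t)^2 + 2 * x2(t)^4."""
    return x1(t)**2 + 2 * x2(t)**4

def dy_dt_chain(t):
    """dy/dt from the multivariate chain rule: f1 * dx1/dt + f2 * dx2/dt."""
    dx1 = 10 * t + 6 * math.exp(3 * t)
    dx2 = 3 * t**2 - 5 * math.exp(-5 * t)
    return 2 * x1(t) * dx1 + 8 * x2(t)**3 * dx2

def dy_dt_numeric(t, h=1e-6):
    """Central-difference derivative of the composite function."""
    return (y(t + h) - y(t - h)) / (2 * h)

t0 = 0.3
print(dy_dt_chain(t0), dy_dt_numeric(t0))
```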
Multivariate chain rule (II)
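If the arguments of the function
\[ y = f(x_1, x_2, \dots, x_n) \]
are themselves differentiable functions of the variables $t_1, \dots, t_m$, such that
\[ x_1 = g_1(t_1, \dots, t_m),\quad x_2 = g_2(t_1, \dots, t_m),\quad \dots,\quad x_n = g_n(t_1, \dots, t_m) \]
then, for $i = 1, \dots, m$:
\[ \frac{\partial y}{\partial t_i} = f_1 \frac{\partial x_1}{\partial t_i} + f_2 \frac{\partial x_2}{\partial t_i} + \dots + f_n \frac{\partial x_n}{\partial t_i} \]
with
\[ f_j = \frac{\partial y}{\partial x_j} \]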
Homogeneous function
A multivariate function $y = f(x_1, x_2, \dots, x_n)$ is homogeneous of degree $k$ if for any number $s > 0$:
\[ s^k y = f(sx_1, sx_2, \dots, sx_n) \]
Example 17
y f ( x1 , x2 , x3 , x4 ) 3x14 8x23 x33 2 x43 4 x12 x2 2 6 x32 x4 2 5x1 x2 x3 x4
It can be shown that $s^4 y = f(sx_1, sx_2, sx_3, sx_4)$, so that the function is homogeneous of degree 4.
Euler's theorem
For any multivariate differentiable function $y = f(x_1, x_2, \dots, x_n)$ that is homogeneous of degree $k$:
\[ k f(x_1, x_2, \dots, x_n) = x_1 f_1(x_1, x_2, \dots, x_n) + \dots + x_n f_n(x_1, x_2, \dots, x_n) \]
Proof:
The function is homogeneous of degree $k$, so for any number $s > 0$:
\[ s^k f(x_1, x_2, \dots, x_n) = f(sx_1, sx_2, \dots, sx_n) \]
Take the derivative with respect to $s$ (keeping $(x_1, x_2, \dots, x_n)$ fixed) on both sides of the equation:
\[ k s^{k-1} f(x_1, x_2, \dots, x_n) = x_1 f_1(sx_1, sx_2, \dots, sx_n) + \dots + x_n f_n(sx_1, sx_2, \dots, sx_n) \]
and take $s = 1$.
Consequence of Euler's theorem
The sum of the $n$ partial elasticities is equal to $k$:
\[ k = \underbrace{x_1 \frac{f_1(x_1, x_2, \dots, x_n)}{f(x_1, x_2, \dots, x_n)}}_{\text{partial elasticity wrt } x_1} + \dots + \underbrace{x_n \frac{f_n(x_1, x_2, \dots, x_n)}{f(x_1, x_2, \dots, x_n)}}_{\text{partial elasticity wrt } x_n} \]
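A small numerical illustration of Euler's theorem and of the elasticity result, using the function $y = x_1^4 x_2^5$ (homogeneous of degree 9). The evaluation point and the scale factor s are arbitrary values picked for this sketch.

```python
import math

def f(x1, x2):
    """y = x1^4 * x2^5, homogeneous of degree k = 9."""
    return x1**4 * x2**5

def f1(x1, x2):
    """Partial derivative of f with respect to x1."""
    return 4 * x1**3 * x2**5

def f2(x1, x2):
    """Partial derivative of f with respect to x2."""
    return 5 * x1**4 * x2**4

a, b = 1.3, 0.7   # illustrative evaluation point
s, k = 2.0, 9     # illustrative scale factor and the degree of homogeneity

scaled = f(s * a, s * b)                # f(s*x1, s*x2)
euler_rhs = a * f1(a, b) + b * f2(a, b) # x1*f1 + x2*f2
elasticity_1 = a * f1(a, b) / f(a, b)   # partial elasticity wrt x1
elasticity_2 = b * f2(a, b) / f(a, b)   # partial elasticity wrt x2

print(scaled, s**k * f(a, b))
print(euler_rhs, k * f(a, b))
print(elasticity_1, elasticity_2, elasticity_1 + elasticity_2)
```

The partial elasticities are simply the exponents 4 and 5, and they add up to the degree of homogeneity.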
Homothetic function
Definition: a homothetic function is a monotone transformation of a homogeneous function. If
\[ y = f(x_1, x_2, \dots, x_n) \]
is homogeneous, then
\[ z = g(y) \]
is a homothetic function if $g(y)$ is strictly monotonic: $g'(y) > 0$ for all $y$ or $g'(y) < 0$ for all $y$.
Example 18
\[ y = x_1^4 x_2^5 \]
is homogeneous of degree 9.
The function $z = \ln(y) = 4\ln(x_1) + 5\ln(x_2)$ is a homothetic function.
The function $z$ is not a homogeneous function.
Property: every homogeneous function is a homothetic function (take $g(y) = y$).
Total differential
The total differential of
\[ y = f(x_1, x_2, \dots, x_n) \]
evaluated at the point $(x_1^0, x_2^0, \dots, x_n^0)$ is
\[ dy = f_1(x_1^0, x_2^0, \dots, x_n^0)\,dx_1 + f_2(x_1^0, x_2^0, \dots, x_n^0)\,dx_2 + \dots + f_n(x_1^0, x_2^0, \dots, x_n^0)\,dx_n \]
Implicit functions
Explicit function:
\[ y = f(x_1, x_2, \dots, x_n) \]
Implicit function:
\[ F(y, x_1, x_2, \dots, x_n) = k \]
Implicit function theorem
For $F(y, x_1, x_2, \dots, x_n) = k$ that is defined at $(y^0, x_1^0, x_2^0, \dots, x_n^0)$, that has continuous derivatives at this point, and for which $F_y(y^0, x_1^0, x_2^0, \dots, x_n^0) \neq 0$, there is a function $y = f(x_1, x_2, \dots, x_n)$ such that:
1) $F(f(x_1^0, x_2^0, \dots, x_n^0), x_1^0, x_2^0, \dots, x_n^0) = k$
2) $y^0 = f(x_1^0, x_2^0, \dots, x_n^0)$
3) $f_i(x_1^0, x_2^0, \dots, x_n^0) = -\dfrac{F_{x_i}(y^0, x_1^0, x_2^0, \dots, x_n^0)}{F_y(y^0, x_1^0, x_2^0, \dots, x_n^0)}$
Example 19
\[ 6y + 2x^2 = 10 \]
We assume that $y = f(x)$ and we take the derivative with respect to $x$ on the left-hand side and the right-hand side of the equation:
\[ 6\frac{dy}{dx} + 4x = 0 \]
or
\[ \frac{dy}{dx} = -\frac{4x}{6} = -\frac{2x}{3} \]
Example 20
For the curve given by F ( x, y) x3 x 2 y 2 y 2 11y 5
compute the slope at the point ( x, y) (2,1)
We assume that y is a function of x.
F ( x, y) x3 x 2 y 2 y 2 11y 5
dy 2
3x 2 2 xy ( x 4 y 11) 0
dx
dy (3x 2 2 xy )
2
for which x 2 4 y 11 0
dx x 4 y 11
dy (3 4 2 2 1) 16
At ( x, y) (2,1) , 2
dx 2 4 1 11 11
Suppose we are given a function $f: \mathbb{R}^2 \to \mathbb{R}$ and an equation $f(x, y) = 0$. Then this equation implicitly defines a relation between $x$ and $y$: for any particular $x$ only certain $y$ obey the equation (perhaps one, perhaps a few, perhaps none, perhaps infinitely many; call them $y^*$). The implicit function theorem states that under certain conditions this relation can be locally represented as a function (so $y^* = g(x)$ for some function $g$) and it states what the derivative of this function is, i.e.
\[ \frac{dy^*}{dx} = \frac{dg(x)}{dx} = -\frac{\partial f(x, y)/\partial x}{\partial f(x, y)/\partial y} \]
In this exercise we will see what "locally represented by a function" means, as well as three examples of what may go wrong with this local representation when the conditions of the implicit function theorem do not hold.
Consider first the function $f(x, y) = x^2 + y^2 - 1$. We know from the first week that the equation $f(x, y) = 0$ now represents a circle with radius 1 and centre (0,0). That is, if we graph all the points $(x, y)$ for which the equation $f(x, y) = 0$ holds, we get that circle.
This graph sort of looks like the graph of a function, but it is not, because for a function we want every x-value to give only one y-value. Here, however, for every $x \in (-1, 1)$ there are two corresponding y-values. But if we zoomed in on the graph, we would get something that looks like a function.
The part in the zoom is perfectly well behaved: for every x-value there is just one y-value. So in this part of the graph we can talk about a function $y^* = g(x)$.
Now let's revisit the conditions of the implicit function theorem. There are two: the partial derivatives $\partial f(x_0, y_0)/\partial x$ and $\partial f(x_0, y_0)/\partial y$ must exist at the point $(x_0, y_0)$ (the point on which we're zooming in), and $\partial f(x_0, y_0)/\partial y \neq 0$ (otherwise we would be dividing by zero in the formula
\[ \frac{dy^*}{dx} = \frac{dg(x)}{dx} = -\frac{\partial f(x, y)/\partial x}{\partial f(x, y)/\partial y} \]
). Let's check these conditions for $f(x, y) = x^2 + y^2 - 1$:
\[ \frac{\partial f(x_0, y_0)}{\partial x} = 2x_0, \qquad \frac{\partial f(x_0, y_0)}{\partial y} = 2y_0 \]
These both exist everywhere (we'll see in the third example a case where this isn't so). However, for $y_0 = 0$ we get $\partial f(x_0, y_0)/\partial y = 0$, so our second condition is violated. What points on the curve are we talking about? Well, let's check: $f(x, 0) = x^2 + 0^2 - 1 = x^2 - 1 = 0$, so $x = -1 \vee x = 1$. Let's look at these points (-1,0) and (1,0).
So what goes wrong here? At these points, no matter how far we zoom in, there will always be two y-values. The problem is that at $y = 0$ the graph goes straight up. A rough way of thinking about this is that, as $\partial f(x_0, 0)/\partial y = 0$, the slope $dg(x)/dx = -\dfrac{\partial f(x, y)/\partial x}{\partial f(x, 0)/\partial y}$ is unbounded, so the curve goes straight up or down, leading to two y-values for a particular x-value.
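The circle example can be reproduced numerically. The sketch below compares the slope given by the implicit function theorem with a finite-difference slope of the explicit upper half of the circle, and shows that the formula breaks down where $\partial f / \partial y = 0$. The test point (0.6, 0.8) and the function names are illustrative choices.

```python
import math

def slope_ift(x0, y0):
    """dy/dx = -(df/dx) / (df/dy) for f(x, y) = x^2 + y^2 - 1."""
    fx, fy = 2 * x0, 2 * y0
    if fy == 0:
        # Second condition of the theorem is violated at (-1, 0) and (1, 0)
        raise ZeroDivisionError("df/dy = 0: no local function y = g(x) here")
    return -fx / fy

def g(x):
    """Upper half of the circle: the local explicit function near (0.6, 0.8)."""
    return math.sqrt(1 - x**2)

x0, y0 = 0.6, 0.8
h = 1e-6
slope_numeric = (g(x0 + h) - g(x0 - h)) / (2 * h)
print(slope_ift(x0, y0), slope_numeric)
```

Calling `slope_ift(1.0, 0.0)` raises the error, which corresponds to the points where zooming in never produces a function.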
Another example is the equation $f(x, y) = 4x^2(1 - x^2) - y^2 = 0$.
Just looking at its graph, we see immediately that things will go wrong at three points: (-1,0), (0,0) and (1,0). So we imagine that our conditions will fail there. Let's check:
\[ \frac{\partial f(x_0, y_0)}{\partial x} = \left.\frac{\partial}{\partial x}\left(4x^2(1 - x^2) - y^2\right)\right|_{x = x_0} = 8x_0(1 - x_0^2) - 8x_0^3 \]
\[ \frac{\partial f(x_0, y_0)}{\partial y} = \left.\frac{\partial}{\partial y}\left(4x^2(1 - x^2) - y^2\right)\right|_{y = y_0} = -2y_0 \]
Both partial derivatives are well-defined, but $y_0 = 0 \Rightarrow \partial f(x_0, y_0)/\partial y = 0$. For what values of $x$ does this hold? Well:
\[ f(x, 0) = 4x^2(1 - x^2) - 0^2 = 0 \Leftrightarrow 4x^2(1 - x^2) = 0 \Leftrightarrow x = -1 \vee x = 0 \vee x = 1 \]
So we indeed find that we have trouble at the points (-1,0), (0,0) and (1,0).
As a final example, let's see what can go wrong if the partial derivative is not defined. Let's first consider why a derivative might not be defined. Consider the function $f(x) = |x - 2|$, that is the absolute value of $x - 2$.
Now let's think about the derivative of this function at $x = 2$. This should be the slope of the tangent line at $x = 2$, but because of the kink in the function, there is no clear-cut tangent line. Therefore, this function does not have a derivative at $x = 2$.
Now consider the implicit relation $f(x, y) = |y + 2| - x^2 = 0$.
Clearly, here we have trouble at (0,-2): no matter how far we zoom in, we never get a function. So let's check our conditions:
\[ \frac{\partial f(x_0, y_0)}{\partial x} = \left.\frac{\partial}{\partial x}\left(|y + 2| - x^2\right)\right|_{x = x_0} = -2x_0 \]
\[ \frac{\partial f(x_0, y_0)}{\partial y} = \left.\frac{\partial}{\partial y}\left(|y + 2| - x^2\right)\right|_{y = y_0} = \begin{cases} 1 & \text{if } y_0 > -2 \\ \text{not defined} & \text{if } y_0 = -2 \\ -1 & \text{if } y_0 < -2 \end{cases} \]
So for $y_0 = -2$ we run into trouble. The associated $x$ is: $f(x, y = -2) = |-2 + 2| - x^2 = 0 \Leftrightarrow x = 0$, so we find that it is indeed (0,-2) that is causing us headaches.
Gradient and directional derivative

If we have a function $f: \mathbb{R}^n \to \mathbb{R}$, we now know how to find all its partial derivatives. It turns out that the vector of all these partial derivatives gives some interesting information about our function $f$. We call this vector the gradient and we write it as follows:
\[ \nabla f = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{pmatrix} \]
It turns out that if you evaluate this gradient at a particular point (i.e. you plug in values for $x_1, \dots, x_n$), then the vector gives the direction in which the function increases the most. Furthermore, the length of the vector indicates how fast the function increases in that direction. Let's see what this means with a particular example.
Let $f: \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = \dfrac{x^2 + 2y^2}{3}$, then
\[ \nabla f(x, y) = \begin{pmatrix} \dfrac{\partial f}{\partial x}(x, y) \\ \dfrac{\partial f}{\partial y}(x, y) \end{pmatrix} = \begin{pmatrix} \dfrac{2x}{3} \\ \dfrac{4y}{3} \end{pmatrix} \]
If we now take any point $(x, y)$ and plug it in, we get a vector which gives the direction of greatest increase. Before we start drawing this in a graph, it is useful to consider one further aspect of $f$. We are often interested in the level curve of a function. The level curve of $f$ is the set
\[ \{\mathbf{x} \in \mathrm{dom}(f) \mid f(\mathbf{x}) = c\} \]
for any real number $c$. In words, it gives the set of all points which give the same outcome w.r.t. $f$, which often looks like a nice curve.
It turns out that the gradient vector at a point is perpendicular to the level curve at this point. Let's find level curves for our example. These would be points $(x, y)$ such that $\dfrac{x^2 + 2y^2}{3} = c$. This rather reminds us of the formula for a circle and indeed it is an elongated circle, or ellipse. Let's draw the level curve
\[ \frac{x^2}{3} + \frac{2y^2}{3} = 1 \]
and draw a few gradients on this level curve.
Figure 1. The blue line is the level curve, the arrows are gradients for a few selected points. Notice that we drew all of this in $\mathbb{R}^2$, the domain of our function.
If we zoom in on one of our points, we do indeed see that our

gradients are perpendicular to the level curve.
Figure 2.
Finally, we can also relate the gradient to something called the directional derivative. The partial derivative of $f$ with respect to, for example, $y$ gives the marginal increase of $f$ if we follow the function in the direction of $y$, i.e. along the y-axis. We would be quite interested in the marginal increase in any direction. We can define the directional derivative in the direction of vector $\mathbf{s}$ quite in line with the ordinary derivative like this:
\[ f'(\mathbf{x}, \mathbf{s}) = \lim_{t \to 0} \frac{f(\mathbf{x} + t\mathbf{s}) - f(\mathbf{x})}{t} \]
Intuitively this means that we take a very small step in the direction of vector $\mathbf{s}$ and see how much the function then increases. The relation between this directional derivative and the gradient is really neat and explains why we only focus on the gradient:
\[ f'(\mathbf{x}, \mathbf{s}) = \nabla f(\mathbf{x}) \cdot \mathbf{s} \]
This means that we just take the inner product of the two vectors, the gradient of $f$ and $\mathbf{s}$. It also means that we can describe the marginal increase in any direction by just focussing on the gradient.
Let's apply it to our particular example. Let's take the directional derivative at $(1, 1)$ in the direction $(-2, 1)$:
\[ f'(\mathbf{x}, \mathbf{s}) = f'((1,1), (-2,1)) = \nabla f(1,1) \cdot (-2, 1) = \left.\left(\frac{2x}{3}, \frac{4y}{3}\right)\right|_{x=1,\,y=1} \cdot (-2, 1) = \left(\frac{2}{3}, \frac{4}{3}\right) \cdot (-2, 1) = -\frac{4}{3} + \frac{4}{3} = 0 \]
(Here we wrote the gradient as a row vector instead of a column vector. Here it doesn't matter, because we take the inner product of two vectors, which is the same for column and row vectors.) So this means that our function does not increase in the direction $(-2, 1)$ at the point $(1, 1)$. Let's check the picture to see if that makes sense.
Figure 3. The function $f$ does not increase in the direction of $\mathbf{s}$ (we shortened $\mathbf{s}$ a bit in the picture, for sake of clarity). This makes some sense: $\mathbf{s}$ is tangent to the level curve, so, intuitively, if we go in the direction $\mathbf{s}$ with a very small step, we remain on the level curve and $f$ does not change value.
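The gradient and the directional derivative of the example $f(x, y) = (x^2 + 2y^2)/3$ can be computed in a few lines. The sketch below also approximates the limit definition with a small step t; the function names and the step size are choices made for this illustration.

```python
def f(x, y):
    return (x**2 + 2 * y**2) / 3

def grad_f(x, y):
    """Gradient of f: (2x/3, 4y/3)."""
    return (2 * x / 3, 4 * y / 3)

def directional(x, y, s):
    """Directional derivative as the inner product of the gradient and s."""
    gx, gy = grad_f(x, y)
    return gx * s[0] + gy * s[1]

point = (1.0, 1.0)
s = (-2.0, 1.0)

d_grad = directional(*point, s)

# Limit definition with a small but finite step t
t = 1e-6
d_limit = (f(point[0] + t * s[0], point[1] + t * s[1]) - f(*point)) / t

# Moving along the gradient itself gives the steepest increase
d_steepest = directional(*point, grad_f(*point))

print(d_grad, d_limit, d_steepest)
```

The first two numbers are (approximately) zero because s is tangent to the level curve through (1,1), while the third equals the squared length of the gradient, 20/9.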
Technical tutorial Advanced Mathematics, Week 4
Exercise 1.
Calculate the derivatives of the following functions:
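a) $e^{-5x^2 + \sqrt{2x}}$
Solution:
\[ \frac{d}{dx} e^{-5x^2 + \sqrt{2x}} = \frac{d}{d\,\square} e^{\square} \cdot \frac{d}{dx}\left(-5x^2 + \sqrt{2x}\right) = e^{\square}\left(-10x + \frac{d}{dx}(2x)^{\frac{1}{2}}\right) \]
\[ = e^{-5x^2 + \sqrt{2x}}\left(-10x + \tfrac{1}{2}(2x)^{-\frac{1}{2}} \cdot 2\right) = e^{-5x^2 + \sqrt{2x}}\left(-10x + \frac{1}{\sqrt{2x}}\right) \]
where $\square$ stands for the exponent $-5x^2 + \sqrt{2x}$.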
x 1
b) log( ) (log: natural logarithm)
x2
Solution:
d x 1 d d x 1 1 ( x 2 ( x 1)2 x)
log( 2 ) log( )
dx x d dx x 2 ( x 2 )2
x 2 ( x 2 ( x 1)2 x) ( x 2 ( x 1)2 x)
x 1 ( x 2 )2 ( x 1) x 2
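c) $e^{\log(x) + 2x}$
Solution:
\[ \frac{d}{dx} e^{\log(x) + 2x} = \frac{d}{dx}\left(e^{\log(x)} e^{2x}\right) = \frac{d}{dx}\left(x e^{2x}\right) = e^{2x} + 2x e^{2x} \]
Alternatively:
\[ \frac{d}{dx} e^{\log(x) + 2x} = \frac{d}{d\,\square} e^{\square} \cdot \frac{d}{dx}\left(\log(x) + 2x\right) = e^{\log(x) + 2x}\left(\frac{1}{x} + 2\right) \]
\[ = e^{\log(x)} e^{2x}\left(\frac{1}{x} + 2\right) = x e^{2x}\left(\frac{1}{x} + 2\right) = e^{2x} + 2x e^{2x} \]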
2
3x 7 log( x)
d)
( x 1) 4
Solution:
7
2 ( x 1)4 (6 x ) (3x 2 7 log( x))4( x 1)3
d 3x 7 log( x) x
dx ( x 1) 4 ( x 1)8
e) $(e^3)^{\log(3x^4)}$
Solution:
\[ \frac{d}{dx} (e^3)^{\log(3x^4)} = \frac{d}{dx} e^{3\log(3x^4)} = \frac{d}{dx} (3x^4)^3 = 3(3x^4)^2 \cdot 12x^3 = 324x^{11} \]
f) x5 3x 2
Solution:
1 1
d d 5 1 5 5x4 6 x
x5 3x 2 ( x 3x 2 ) 2 ( x 3x 2 ) 2 (5 x 4 6 x)
dx dx 2 2 x5 3x 2
3
3x
ax 3x 2
g)
3x 1
Solution:
3
x3 3 x 2 3(a x 3 x 3x 2 )
3 3x 1(log(a)a (3x 3) 6 x)
d a x 3 x 3x 2 2 3x 1
dx 3x 1 3x 1
3 3
h) e(3 x 6)
Solution:
d (3 x3 6)3 d d d 3
6)3
e e ( )3 (3x3 6) e 3( )2 (9 x 2 ) 27e(3 x (3x3 6)2 ( x 2 )
dx d d dx
i) $\log(ax)$, where $a > 0$ is some constant.
Solution:
\[ \frac{d}{dx} \log(ax) = \frac{1}{ax} \cdot a = \frac{1}{x} \]
Alternatively:
\[ \frac{d}{dx} \log(ax) = \frac{d}{dx}\left(\log(a) + \log(x)\right) = \frac{1}{x} \]
*Exercise 2.
Compute the partial derivative with respect to x and y of the following functions (they are
called the Cobb-Douglas and the Constant Elasticity of Substitution (CES) function
respectively, and you will see them often in Microeconomics as well as more mathematical
Macroeconomics):
a) $x^a y^{1-a}$
Solution:
\[ \frac{\partial}{\partial x}\, x^a y^{1-a} = a x^{a-1} y^{1-a} \]
and
\[ \frac{\partial}{\partial y}\, x^a y^{1-a} = (1-a) x^a y^{-a} \]
s 1 s 1 s
b) R ( x s
(1 )y s
)
s 1
Solution:
s 1 s 1 s s 1 s 1 s
s 1 s 1 ss1 1
R ( x s
(1 )y )
s s 1
R( )( x s
(1 )y )
s s 1
x
x s 1 s
s 1 s 1 s s 1 s 1 s
s 1 s 1 ss1 1
R ( x s
(1 )y s s 1
) R( )( x s (1 )y )
s s 1
x
x s 1 s
s 1 s 1 s s 1 s 1 s s 1
s 1 s 1 1
R ( x s
(1 )y s s 1
) R( )( x s (1 )y )
s s 1
(1 ) y s
y s 1 s
s 1 s 1 1 1
R( x s
(1 )y )
s s 1
(1 )y s
Exercise 3.
Calculate the Hessian of the following function. Verify by computation that Youngs rule
holds.
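\[ f(x, y) = y e^{x + 2y} + x^2 y \]
Solution:
The first-order partial derivatives are
\[ \frac{\partial f}{\partial x} = y e^{x+2y} + 2xy, \qquad \frac{\partial f}{\partial y} = e^{x+2y} + 2y e^{x+2y} + x^2 = (2y+1)e^{x+2y} + x^2 \]
The second-order partial derivatives are
\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(y e^{x+2y} + 2xy\right) = y e^{x+2y} + 2y \]
\[ \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}\left(y e^{x+2y} + 2xy\right) = (2y+1)e^{x+2y} + 2x \]
\[ \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left((2y+1)e^{x+2y} + x^2\right) = (2y+1)e^{x+2y} + 2x \]
\[ \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left((2y+1)e^{x+2y} + x^2\right) = (4y+4)e^{x+2y} \]
Note that $\dfrac{\partial^2}{\partial y \partial x} f(x, y) = \dfrac{\partial^2}{\partial x \partial y} f(x, y)$, verifying Young's rule.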
Exercise 4.
Verify that the second order derivative(s) of the following concave functions is/are indeed
negative (be careful: on the domain of the functions!):
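a) $-x^2$
Solution:
\[ \frac{d^2}{dx^2}\left(-x^2\right) = \frac{d}{dx}\left(-2x\right) = -2 < 0 \]
b) $\sqrt{x}$
Solution:
\[ \frac{d^2}{dx^2} \sqrt{x} = \frac{d}{dx} \frac{1}{2\sqrt{x}} = -\frac{1}{4x\sqrt{x}} < 0 \]
as on the domain of the function $x > 0$.
c) $\log(x)$
Solution:
\[ \frac{d^2}{dx^2} \log(x) = \frac{d}{dx} \frac{1}{x} = -\frac{1}{x^2} < 0 \]
d) $x^a y^{1-a}$, $0 < a < 1$
Solution:
\[ \frac{\partial^2}{\partial x^2}\, x^a y^{1-a} = \frac{\partial}{\partial x}\, a x^{a-1} y^{1-a} = (a-1)\,a\,x^{a-2} y^{1-a} < 0 \]
as $a - 1 < 0$, $a > 0$ and $x, y > 0$. The result is similar for $y$.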
Exercise 5.
Establish if the following functions are homogeneous and, if so, of what degree:
a) $f(x, y) = x^2 + y^2$
Solution:
\[ f(tx, ty) = (tx)^2 + (ty)^2 = t^2 x^2 + t^2 y^2 = t^2 (x^2 + y^2) = t^2 f(x, y) \]
So the function is homogeneous of degree 2.
b) $f(x, y) = x^2 y^2$
Solution:
\[ f(tx, ty) = (tx)^2 (ty)^2 = t^4 x^2 y^2 = t^4 f(x, y) \]
So the function is homogeneous of degree 4.
c) $f(x, y) = \sqrt{x + y}$
Solution:
\[ f(tx, ty) = \sqrt{tx + ty} = \sqrt{t(x + y)} = \sqrt{t}\sqrt{x + y} = \sqrt{t}\, f(x, y) \]
So the function is homogeneous of degree $\tfrac{1}{2}$.
d) $f(x, y) = x^2 y^2 + \sqrt{x + y}$
Solution:
\[ f(tx, ty) = (tx)^2 (ty)^2 + \sqrt{tx + ty} = t^4 x^2 y^2 + \sqrt{t}\sqrt{x + y} \]
We can't go any further with this: there is no single power of $t$ that can be factored out. The function is not homogeneous (even though it is the sum of two homogeneous functions).
e) $f(x, y) = \log(x + y)$
Solution:
\[ f(tx, ty) = \log(tx + ty) = \log(t(x + y)) = \log(t) + \log(x + y) = \log(t) + f(x, y) \]
Clearly, this is not a homogeneous function either.
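The conclusions of Exercise 5 can be probed numerically: if a positive function is homogeneous of degree k, then log(f(tx, ty) / f(x, y)) / log(t) must equal k for every t. The sketch below evaluates this at one point for two scale factors. It is a heuristic check at a single illustrative point (1, 2), not a proof, and the helper names are invented for this example.

```python
import math

def implied_degree(f, x, y, t):
    """The k for which f(tx, ty) = t**k * f(x, y) at this point (needs f > 0)."""
    return math.log(f(t * x, t * y) / f(x, y)) / math.log(t)

def looks_homogeneous(f, x=1.0, y=2.0, scales=(2.0, 3.0), tol=1e-9):
    """Return (flag, degree): flag is True if both scales imply the same degree."""
    degrees = [implied_degree(f, x, y, t) for t in scales]
    return abs(degrees[0] - degrees[1]) < tol, degrees[0]

functions = {
    "a": lambda x, y: x**2 + y**2,
    "b": lambda x, y: x**2 * y**2,
    "c": lambda x, y: math.sqrt(x + y),
    "d": lambda x, y: x**2 * y**2 + math.sqrt(x + y),
    "e": lambda x, y: math.log(x + y),
}

for label, func in functions.items():
    print(label, looks_homogeneous(func))
```

For a), b) and c) the implied degree is the same for both scale factors (2, 4 and 0.5); for d) and e) it changes with t, so no single degree exists.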
Exercise 6.
For the following functions, calculate the level sets $f(\mathbf{x}) = 1$, $f(\mathbf{x}) = 4$ and the gradients at a point on each of these level curves. Draw a graph of your findings.
a) $f(x, y) = x + 3y$
b) $f(x, y) = x^2 + y^2$
c) $f(x, y) = e^{x + 2y^2}$
a) $f(x, y) = x + 3y$
Solution:
For the first level curve we solve the equation $x + 3y = 1 \Leftrightarrow y = \dfrac{1 - x}{3}$. Similarly, for the second level curve we find $x + 3y = 4 \Leftrightarrow y = \dfrac{4 - x}{3}$. The level curves are thus straight lines.
For the gradient we calculate:
\[ \nabla f(x, y) = \begin{pmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} \]
The gradient is independent of $x$ and $y$. This is just like it is for the derivative in the one-variable case: if the function is linear, the derivative is constant.
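b) $f(x, y) = x^2 + y^2$
Solution:
For the first level curve, we calculate: $f(x, y) = x^2 + y^2 = 1$. Actually, there is little to calculate. From week 1, we know this is a circle with radius 1, centred at the origin. The second level curve is similar: a circle with radius $\sqrt{4} = 2$, centred at the origin. For the gradient we find:
\[ \nabla f(x, y) = \begin{pmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \end{pmatrix} \]
Clearly, in this case the gradient does depend on $x$ and $y$. We have to calculate it at a point on each level curve. For the first point, let's take $x = \sqrt{\tfrac{1}{4}}$, $y = \sqrt{\tfrac{3}{4}}$; then $x^2 + y^2 = \tfrac{1}{4} + \tfrac{3}{4} = 1$, so we are on our level curve. The gradient then becomes:
\[ \nabla f\left(\sqrt{\tfrac{1}{4}}, \sqrt{\tfrac{3}{4}}\right) = \begin{pmatrix} 2\sqrt{\tfrac{1}{4}} \\ 2\sqrt{\tfrac{3}{4}} \end{pmatrix} = \begin{pmatrix} 1 \\ \sqrt{3} \end{pmatrix} \]
For the second point, let's take $x = y = \sqrt{2}$. Then $x^2 + y^2 = 4$, so we are on our second level curve. The gradient becomes:
\[ \nabla f(\sqrt{2}, \sqrt{2}) = \begin{pmatrix} 2\sqrt{2} \\ 2\sqrt{2} \end{pmatrix} \]
We draw it all in one graph.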
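c) $f(x, y) = e^{x + 2y^2}$
Solution:
For the first level curve, we calculate:
\[ f(x, y) = e^{x + 2y^2} = 1 \Leftrightarrow \log(e^{x + 2y^2}) = x + 2y^2 = \log(1) = 0 \Leftrightarrow y^2 = -\frac{x}{2} \Leftrightarrow y = \sqrt{-\frac{x}{2}} \vee y = -\sqrt{-\frac{x}{2}} \]
So now we have two y-values for each x-value. Also, our $x$ cannot be positive. Our level curve will be a rotated parabola. Let's calculate the second level curve.
\[ f(x, y) = e^{x + 2y^2} = 4 \Leftrightarrow \log(e^{x + 2y^2}) = x + 2y^2 = \log(4) \Leftrightarrow y^2 = \frac{\log(4) - x}{2} \Leftrightarrow y = \sqrt{\frac{\log(4) - x}{2}} \vee y = -\sqrt{\frac{\log(4) - x}{2}} \]
Similarly, here too we have two y-values for each x-value. In this case our $x$ cannot be larger than $\log(4)$.
Let's calculate the gradient:
\[ \nabla f(x, y) = \begin{pmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{pmatrix} = \begin{pmatrix} e^{x + 2y^2} \\ 4y\,e^{x + 2y^2} \end{pmatrix} \]
Finally, we have to find some points on our level curves. For the first level curve, let's take $x = -8$; then $y = \sqrt{\tfrac{8}{2}} = 2 \vee y = -\sqrt{\tfrac{8}{2}} = -2$. Let's pick $y = -2$. The gradient then becomes:
\[ \nabla f(-8, -2) = \begin{pmatrix} e^{-8 + 2(-2)^2} \\ 4(-2)\,e^{-8 + 2(-2)^2} \end{pmatrix} = \begin{pmatrix} e^0 \\ -8e^0 \end{pmatrix} = \begin{pmatrix} 1 \\ -8 \end{pmatrix} \]
For the second level curve let's pick $x = \log(4) - 2$, so that $y = \sqrt{\tfrac{\log(4) - \log(4) + 2}{2}} = 1 \vee y = -\sqrt{\tfrac{\log(4) - \log(4) + 2}{2}} = -1$. Let's pick $y = 1$. Our gradient then becomes:
\[ \nabla f(\log(4) - 2, 1) = \begin{pmatrix} e^{\log(4) - 2 + 2(1)^2} \\ 4(1)\,e^{\log(4) - 2 + 2(1)^2} \end{pmatrix} = \begin{pmatrix} e^{\log(4)} \\ 4e^{\log(4)} \end{pmatrix} = \begin{pmatrix} 4 \\ 16 \end{pmatrix} \]
We draw the graph. In the graph I halved the length of the gradients to make the picture clearer.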
Exercise 1.
a) Find the second-order Taylor approximation about the point $x = 0$ of the function
\[ f(x) = e^{x^2 + 3x} - \log(x + 1) \]
Solution:
The general formula for the Taylor approximation is
\[ m(x) = \frac{f(a)}{0!} + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x - a)^n \]
in which $n!$ is the factorial of the non-negative integer $n$, which is defined as the product of all positive integers up to and including $n$:
\[ 1! = 1, \qquad 2! = 1 \cdot 2, \qquad 3! = 1 \cdot 2 \cdot 3, \qquad n! = 1 \cdot 2 \cdots (n-1) \cdot n \]
In addition we define $0! = 1$ (and please note that we can alternatively write $n! = \prod_{i=1}^{n} i$).
So the second-order approximation becomes:
\[ \frac{f(a)}{0!} + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 \]
Let's first calculate the first- and second-order derivatives:
\[ f(x) = e^{x^2 + 3x} - \log(x + 1) \]
\[ f'(x) = (2x + 3)e^{x^2 + 3x} - \frac{1}{x + 1} \]
\[ f''(x) = (2x + 3)^2 e^{x^2 + 3x} + 2e^{x^2 + 3x} + \frac{1}{(x + 1)^2} \]
We have to evaluate these at the point $x = 0$:
\[ f(0) = e^0 - \log(1) = 1 \]
\[ f'(0) = 3e^0 - \frac{1}{1} = 2 \]
\[ f''(0) = 3^2 e^0 + 2e^0 + \frac{1}{(0 + 1)^2} = 12 \]
Finally, we plug this all into our formula:
\[ f(x) \approx \frac{f(0)}{0!} + \frac{f'(0)}{1!}(x - 0) + \frac{f''(0)}{2!}(x - 0)^2 = \frac{1}{1} + \frac{2}{1}x + \frac{12}{2}x^2 = 6x^2 + 2x + 1 \]
So this final, nice quadratic function is our second-order approximation of the original, much nastier function. Let's see how it performs in a graph.
In the graph we see that the approximation (the red line) resembles the original function (the blue line) quite well when we're pretty close to 0.
However, in a picture with a wider range we see that things quickly turn sour as we move away from 0.
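In place of the graphs, the quality of the approximation can be tabulated. The sketch below assumes the function is $f(x) = e^{x^2+3x} - \log(x+1)$, which is the sign combination that reproduces the values $f(0) = 1$, $f'(0) = 2$ and $f''(0) = 12$ used in the solution; the sample points are arbitrary.

```python
import math

def f(x):
    """Original function (signs assumed as described in the lead-in)."""
    return math.exp(x**2 + 3 * x) - math.log(x + 1)

def m(x):
    """Second-order Taylor approximation about x = 0."""
    return 1 + 2 * x + 6 * x**2

for x in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(f"x={x:<5} f(x)={f(x):10.4f} m(x)={m(x):10.4f} "
          f"error={abs(f(x) - m(x)):10.6f}")
```

The error is tiny near 0 and grows rapidly further away, which is the behaviour described for the two pictures.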
b) Find the second-order Taylor approximation about the point $x = a$, $a \in \mathbb{R}$, of the function
\[ f(x) = 7x^2 + 3x + 1 \]
Solution:
The general formula for the Taylor approximation is
\[ m(x) = \frac{f(a)}{0!} + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x - a)^n \]
We first calculate the derivatives:
\[ f'(x) = 14x + 3, \qquad f''(x) = 14 \]
Now we have to evaluate these at a general point $x = a$. If we plug that into our Taylor formula, we get:
\[ \frac{7a^2 + 3a + 1}{0!} + \frac{14a + 3}{1!}(x - a) + \frac{14}{2!}(x - a)^2 \]
\[ = 7a^2 + 3a + 1 + 14ax + 3x - 14a^2 - 3a + 7(x^2 - 2ax + a^2) \]
\[ = 7x^2 + 3x + 1 + (7a^2 - 14a^2 + 7a^2) + (14ax - 14ax) + (3a - 3a) \]
\[ = 7x^2 + 3x + 1 \]
Psych! We just get our original function back, no matter around which point we approximate the original function. If you think about it for a little bit, this should make sense. A second-order Taylor approximation is a quadratic function that best approximates the original function. Now, if the original function is quadratic itself, then what better quadratic function than the original function! Basically, what we checked here is that the Taylor formula really behaves like we want it to.
c) Find the Taylor approximation of order n about the point x 1 of the function
f ( x) 1 log( x) 0
Solution:
So now we have to calculate all the derivatives of the function. Lets hope there is some nice
repetitive pattern, otherwise this will be quite the chore!
1
f '( x) 1
x
1
f ''( x) 1 2
x
2
f (3) ( x) 1 3
x
2 3
f (4) ( x) 1
x4
2 3 4
f (5) ( x) 1
x5
n 1 ( n 1)!
f ( n ) ( x) 1 ( 1)
xn
Ok, that was easier than expected. There is a nice pattern to the derivatives. Notice that
( 1)n 1 is just a little trick to indicate that the derivatives alternate in sign. Check that the
formula for the n-th order derivative indeed works for orders 1 through 5. Now we have to
approximate about the point x 1 , so lets plug that in:
n 1 ( n 1)!
f ( n ) (1) 1 ( 1) 1 ( 1)
n 1
(n 1)!
1n
That gets even easier. Lets plug this into the Taylor formula:
f (1) f '(1) f ''(1) f ( n ) (1)
( x 1) ( x 1) 2 ( x 1) n
0! 1! 2! n!
n 1
1 log(1) 1 (0!) 1 (1!) 1 ( 1) ( n 1)!
0
( x 1) ( x 1) 2 ... ( x 1) n
0! 1! 2! n!
2 3 n 1
1 ( x 1) 1 ( x 1) ( 1) ( x 1) n
0 1x ... 1
2 3 n
Ok, so this was doable and we get a quite workable outcome. However, we had an ulterior
motive in treating this example. Note that we can rewrite our result as:
1 log x 0 1x 1R as x 1 , where 0 1 and
( x 1)2 ( x 1)3 ( 1) n 1 ( x 1) n . This term R becomes very small as x 1 , so that

R ...
2 3 n
if x is very close to 1 we may even write
1 log x 0 1x
This result is invoked very often in econometrics. There it is often assumed that the true
relation between two variables is something like y 1x , but instead the relation
y 1 log x 0 is estimated. What we see now is that you will indeed find the same values
230
for 1if your x is not too far away from 1.

1 log x 0 1x 1R
f ( x) f ( x) R R
%error 1
f ( x) 1x x
Exercise 2.
Calculate the Marginal Rate of Substitution (MRS) by total differential and by implicit
differentiation and show its relation to marginal utility. Do it first for U xy and then for
the general case.
Solution:
1. Total differential
The MRS between two goods is the amount you have to receive from one good if you give up
the other good to keep utility constant. The total differential of a function is the very small
change (called infinitesimal change) in that function for infinitesimal changes in its variables.
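Given that $U = \sqrt{xy}$,
\[ dU = \frac{\partial U}{\partial x}dx + \frac{\partial U}{\partial y}dy = \frac{\partial \sqrt{xy}}{\partial x}dx + \frac{\partial \sqrt{xy}}{\partial y}dy = \frac{1}{2}\sqrt{\frac{y}{x}}\,dx + \frac{1}{2}\sqrt{\frac{x}{y}}\,dy \]
Because we are interested in the MRS, we want to keep utility constant, so we impose $dU = 0$. Furthermore, since the MRS is the change in $y$ for a given change in $x$, $MRS = \dfrac{dy}{dx}$. We solve for that:
\[ 0 = \frac{1}{2}\sqrt{\frac{y}{x}}\,dx + \frac{1}{2}\sqrt{\frac{x}{y}}\,dy \Leftrightarrow \sqrt{\frac{x}{y}}\,dy = -\sqrt{\frac{y}{x}}\,dx \Leftrightarrow \frac{dy}{dx} = -\frac{y}{x} \]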
So what does this mean? Well, if you have, say, 5 units of good y and 10 units of good x, then,
5 1
if you had to give up an infinitesimal amount of x (say ), you would require of y
10 2
as compensation to keep your utility unchanged. The minus indicates the opposite directions:
you receive one and relinquish the other.
Implicit differentiation
The implicit differentiation method to derive this is very similar. In fact, it is a bit more
precise, since differentials are not completely well-defined: it is not clear what exactly an
infinitesimal change is. However, intuitively the total differential is easier to grapple with.
Since the method is more precise anyway, we make one other change in the direction of
dy
precision. It is slightly misleading to speak of , since we appear not even to have defined
dx
a relationship between x and y. And how could there be such a relationship: x and y are just
amounts of goods; you could have as many as you like. Of course the relationship between the
two comes from the fact that we impose that utility is fixed. In effect we imposed a relation
between x and y when we imposed dU=0 (keeping utility fixed). For implicit differentiation,
we will distinguish the general approach from the implicit function theorem. The latter
approach is more mechanical.
Implicit differentiation general approach

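We start with the implicit relation between $x$ and $y$, which is equal to a specific level of utility $\bar{U}$. In addition, we have imposed that $y$ is a function of $x$:
\[ \bar{U} = \sqrt{x\,y(x)} \]
Taking the derivative with respect to $x$ of the left-hand side and the right-hand side of the equation:
\[ \frac{d\bar{U}}{dx} = \frac{d}{dx}\sqrt{x\,y(x)} \]
The right-hand side consists of the partial derivative with respect to $x$ (keeping $y$ constant) plus the partial derivative with respect to $y$ (keeping $x$ constant) times $dy/dx$, because $y$ is an implicit function of $x$:
\[ \frac{d}{dx}\sqrt{x\,y(x)} = \frac{1}{2}\sqrt{\frac{y}{x}} + \frac{1}{2}\sqrt{\frac{x}{y}}\,\frac{dy}{dx} \]
Since $\bar{U}$ is a constant, $\dfrac{d\bar{U}}{dx} = 0$, so that
\[ \frac{1}{2}\sqrt{\frac{y}{x}} + \frac{1}{2}\sqrt{\frac{x}{y}}\,\frac{dy}{dx} = 0 \]
\[ \frac{1}{2}\sqrt{\frac{x}{y}}\,\frac{dy}{dx} = -\frac{1}{2}\sqrt{\frac{y}{x}} \]
\[ \frac{dy}{dx} = -\frac{\sqrt{y/x}}{\sqrt{x/y}} = -\left(\frac{y}{x}\right)^{\frac{1}{2}}\left(\frac{y}{x}\right)^{\frac{1}{2}} = -\frac{y}{x} \]
Lo and behold, the result is the same.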
Implicit differentiation implicit function theorem

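Remember that the implicit function theorem gives, provided $\partial U(x, y)/\partial y \neq 0$:
\[ \frac{dy}{dx} = -\frac{\partial U(x, y)/\partial x}{\partial U(x, y)/\partial y} = -\frac{\frac{1}{2}\sqrt{y/x}}{\frac{1}{2}\sqrt{x/y}} = -\frac{y}{x} \]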
Some generalization
Finally we derive a similar result for general utility functions. This is primarily to show that more abstract calculations, although they may seem a bit more confusing, are often easier than concrete examples.
For the total differential approach we again have:
\[ 0 = dU = \frac{\partial U}{\partial x}dx + \frac{\partial U}{\partial y}dy \Leftrightarrow \frac{\partial U}{\partial y}dy = -\frac{\partial U}{\partial x}dx \Leftrightarrow \frac{dy}{dx} = -\frac{\partial U/\partial x}{\partial U/\partial y} \]
In fact this last expression is just the implicit function theorem (the constant $\bar{U}$ drops out after differentiation). Indeed the total differential approach is one way of proving the implicit function theorem. It just remains to link this result to the marginal utilities, but that is easy. The marginal utility $MU_x$ of $x$ is just $\dfrac{\partial U}{\partial x}$, and similarly for $y$. So
\[ MRS = \frac{dy}{dx} = -\frac{MU_x}{MU_y} \]
That result was easier to derive than the specific case!
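The MRS for $U = \sqrt{xy}$ can be checked numerically in two ways: from the ratio of marginal utilities, and as the slope of the indifference curve through a given bundle. The bundle (10, 5) is the one used in the text; the function names and the step size are illustrative.

```python
import math

def utility(x, y):
    return math.sqrt(x * y)

def mrs(x, y):
    """MRS = dy/dx = -MU_x / MU_y for U = sqrt(x * y)."""
    mu_x = 0.5 * math.sqrt(y / x)
    mu_y = 0.5 * math.sqrt(x / y)
    return -mu_x / mu_y

x0, y0 = 10.0, 5.0
u_bar = utility(x0, y0)

def indifference_y(x):
    """y as a function of x along the indifference curve U = u_bar."""
    return u_bar**2 / x

h = 1e-6
slope = (indifference_y(x0 + h) - indifference_y(x0 - h)) / (2 * h)
print(mrs(x0, y0), slope, -y0 / x0)
```

All three numbers equal -0.5: giving up a small amount of x requires half that amount of y in compensation.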
Exercise 3.
Derive the derivative of $\log(x)$ by differentiating $e^{\log(x)} = x$ and using your knowledge of the derivative of $e^y$ and the inverse of the exponential function.
Solution:
This may seem like a silly question, since we know the derivative of $\log(x)$ just as much as we know the derivative of $e^x$. However, $\log(x)$ is actually defined as the inverse of $e^x$, so all the information we have on $\log(x)$ comes from our knowledge of $e^x$. To work then:
\[ \frac{d}{dx} e^{\log(x)} = \frac{d}{dx} x \]
Working out the left-hand side we get:
\[ \frac{d}{dx} e^{\log(x)} = \frac{d}{d\,\square} e^{\square} \cdot \frac{d}{dx}\log(x) = e^{\square}\,\frac{d}{dx}\log(x) = e^{\log(x)}\,\frac{d}{dx}\log(x) = x\,\frac{d}{dx}\log(x) \]
whilst the right-hand side gives:
\[ \frac{d}{dx} x = 1 \]
So:
\[ x\,\frac{d}{dx}\log(x) = 1 \Leftrightarrow \frac{d}{dx}\log(x) = \frac{1}{x} \]
So now we have actually proved that the derivative of $\log(x)$ is as we always assumed it was.
Exercise 4.
d log( y )
Show that y,x , where y,x is the elasticity of y with respect to x, by using
d log( x)
differentials.
Solution:
This is actually not that hard, but it turns out to be rather useful in many areas. We derive the
total differential of f(y)=log(y).
log( y) 1 1
df ( y) d log( y) dy dy . Similarly d log( x) dx . Dividing the two, we get:
y y x
dy
( )
d log( y ) y dy x dy x
y,x
d log( x) ( dx ) y dx dx y
x
233
Exercise 5.
Show that the demand function Q P ( and constants) exhibits constant elasticity,
as well as the derived log-linear demand function log(Q) log( ) log( P) . Next week we
will see that this demand function arises from Cobb-Douglas utility functions.
Solution:
1
dQ P 1 P P P
Q,P P , which is constant (independent of
dP Q P P
price).
d log Q
log( Q ),log( P )
d log P
Exercise 6.
Estimate the effect of a change in x on f(x,y(x)), where:
a) x is ability, y is education and f(x,y(x)) is income.
b) ( p, D( p)) pD( p) cD( p)
Solution:
a) $\dfrac{df}{dx} = \dfrac{\partial f}{\partial x} + \dfrac{\partial f}{\partial y}\dfrac{dy}{dx}$. What does this mean? The total effect of ability on income is composed of two separate effects: the direct effect of ability on income $\dfrac{\partial f}{\partial x}$, which is positive (if you're smarter you'll generally earn more money), plus the effect of ability on education $\dfrac{dy}{dx}$ (which is presumably also positive) times the effect of education on income $\dfrac{\partial f}{\partial y}$, which is again positive. Since all terms are positive, the effect of ability on income will also be positive. Should it be the case that very able people actually get less education (say because they think it beneath them, or because they're so smart they don't function in a rigid system), then for these high values of $x$, $\dfrac{dy}{dx}$ becomes negative, while $\dfrac{\partial f}{\partial y}$ is still positive. Then for this range of ability the sign of the marginal effect of ability is unclear: on the one hand it increases your income directly, on the other it decreases your education and through that effect reduces income.
The point here is that writing down this equation allows you to see all the partial effects. Your analysis will then be as convincing as your explanation of the signs of the derivatives is.
d
b) ( p, D( p)) D( p) ( p c) D '( p) (Note that we sometimes denote the derivative of
dp
d
functions of one variable as f ( x) : f '( x) ).
dx
Here p is the price, D is demand, is profit and c is the constant unit cost of production.
234
So
d ( p, D( p))
dp
is the effect of a change in price on the profit. A marginal price increase allows you to get a little more money from the people you sell to (D(p)), while it costs you some demand D'(p), which in turn costs you p - c per customer, as you don't get your money, but you also don't incur the costs of production for them. In fact the total effect is unambiguously positive if p - c < 0, which makes sense: if your price is so low that you make a loss each time you sell, you can increase profits by increasing the price. Otherwise the effect depends on whether you think the loss in customers will be outweighed by the extra profit per customer you make.
Question 1
Find the second-order Taylor approximation for the function $f(x) = e^{x^2 - 3x} + \log(3x)$ around the point $x = 2$. Draw a graph of the original function and its approximation in one figure.
(Hint: you can use an online aid (or any aid at your disposal) to draw the graphs. There are many available, for instance: http://rechneronline.de/function-graphs/. On that site you can enter the original function as exp(x^2-3x)+log(3x).)
Solution:
The formula for the second-order approximation around the point $x = 2$ is:
\[ f(2) + f'(2)(x - 2) + \frac{f''(2)(x - 2)^2}{2} \]
Let's first calculate the first- and second-order derivatives.
\[ f(x) = e^{x^2 - 3x} + \log(3x) \]
\[ f'(x) = (2x - 3)e^{x^2 - 3x} + \frac{1}{x} \]
\[ f''(x) = (2x - 3)^2 e^{x^2 - 3x} + 2e^{x^2 - 3x} - \frac{1}{x^2} \]
Therefore we get:
\[ f(2) = e^{-2} + \log(6) \approx 1.93 \]
\[ f'(2) = (1)e^{-2} + \frac{1}{2} \approx 0.64 \]
\[ f''(2) = (1)^2 e^{-2} + 2e^{-2} - \frac{1}{4} \approx 0.16 \]
And for our Taylor approximation:
\[ f(2) + f'(2)(x - 2) + \frac{f''(2)(x - 2)^2}{2} \approx 1.93 + 0.64(x - 2) + 0.08(x - 2)^2 \]
In a graph it looks like this:
The approximation is in red. It works pretty well in the vicinity of x 2 , but badly further
away.
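The numbers in this solution can be reproduced with a short script. One assumption is made explicit here: the sign in front of $\log(3x)$ is taken to be a plus, because that is the choice that reproduces the quoted values 1.93, 0.64 and 0.16. The function names are illustrative.

```python
import math

def f(x):
    """f(x) = exp(x^2 - 3x) + log(3x); the plus sign is an assumption."""
    return math.exp(x**2 - 3 * x) + math.log(3 * x)

def f_prime(x):
    return (2 * x - 3) * math.exp(x**2 - 3 * x) + 1 / x

def f_second(x):
    e = math.exp(x**2 - 3 * x)
    return (2 * x - 3)**2 * e + 2 * e - 1 / x**2

def taylor2(x, a=2.0):
    """Second-order Taylor approximation of f about the point a."""
    return f(a) + f_prime(a) * (x - a) + f_second(a) * (x - a)**2 / 2

print(round(f(2), 2), round(f_prime(2), 2), round(f_second(2), 2))
for x in (1.5, 2.0, 2.5, 3.5):
    print(x, f(x), taylor2(x))
```

Close to x = 2 the two columns agree well; at x = 3.5 they are far apart, in line with the remark about the graph.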
Question 2
The formula 5( x 1)2 (1 ( x 1)2 ) ( y 2)2 0 defines a relation between x and y. In a graph
the function looks like this:
Find by implicit differentiation the derivative of the relation between x and y. Use the implicit
f
dy x, f f f
differentiation theorem from the slides: if and exist and 0. Here
dx f x y y
y
2 2 2
f ( x, y) 5( x 1) (1 ( x 1) ) ( y 2) . You can leave the derivative as a function of x and y
both. Comment (by reference to the graph) on those points where the relation does not look
locally like a function. (Hint: see the slides from week 4, page 28 onwards, particularly the
second example.) (Second hint: guess the problematic points and show that they are
problematic. Thats easier than finding them from the formulas directly.)
Solution:
We first calculate the partial derivatives that the theorem needs:
$f(x, y) = 5(x-1)^2(1-(x-1)^2) - (y-2)^2$
$\frac{\partial f}{\partial x} = 10(x-1)(1-(x-1)^2) - 10(x-1)^3 = 10(x-1) - 20(x-1)^3$
$\frac{\partial f}{\partial y} = -2(y-2)$
This gives for the derivative:
$\frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y} = \frac{10(x-1) - 20(x-1)^3}{2(y-2)} = \frac{5(x-1) - 10(x-1)^3}{y-2}$. For any point $(x, y)$ on the curve
we could now plug in these values and find the derivative at that point.
Finally we have to check where this goes wrong. There are three conditions for the implicit
function theorem: that both partial derivatives exist and that $\frac{\partial f}{\partial y} \neq 0$. Clearly the first two
always hold, but $\frac{\partial f}{\partial y} = 0 \iff y - 2 = 0 \iff y = 2$. So any point on the curve where $y = 2$ will give us
trouble. Let's look at the graph and guess what the corresponding x-values might be.
It seems like the points are $(0, 2)$, $(1, 2)$, $(2, 2)$ (and they do look problematic, especially $(1, 2)$).
Indeed, if we plug them in:
$5(0-1)^2(1-(0-1)^2) - (2-2)^2 = 0$
$5(1-1)^2(1-(1-1)^2) - (2-2)^2 = 0$
$5(2-1)^2(1-(2-1)^2) - (2-2)^2 = 0$
So at these three points we cannot locally represent our relation as the graph of a function, and the derivative
that we found is problematic.
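As a numerical cross-check of the implicit derivative, here is a minimal Python sketch. It assumes the relation $5(x-1)^2(1-(x-1)^2) - (y-2)^2 = 0$ and compares the implicit formula with a finite-difference slope on the upper half of the curve. The function names and the test point $x = 1.5$ are chosen for illustration only.

```python
import math

def f_x(x, y):
    """Partial derivative of f with respect to x."""
    return 10 * (x - 1) - 20 * (x - 1) ** 3

def f_y(x, y):
    """Partial derivative of f with respect to y."""
    return -2 * (y - 2)

def dydx(x, y):
    """Implicit derivative dy/dx = -f_x / f_y (only valid where f_y != 0)."""
    return -f_x(x, y) / f_y(x, y)

def upper_branch(x):
    """Explicit upper half of the curve: y = 2 + sqrt(5(x-1)^2 (1-(x-1)^2))."""
    u = x - 1
    return 2 + math.sqrt(5 * u**2 * (1 - u**2))

x0 = 1.5
y0 = upper_branch(x0)
h = 1e-6
# Central difference along the explicit branch
numeric = (upper_branch(x0 + h) - upper_branch(x0 - h)) / (2 * h)
print(dydx(x0, y0), numeric)

# At the three suspected points f_y vanishes, so the formula cannot be used there
print([f_y(x, 2) for x in (0, 1, 2)])
```

The two printed slopes should agree closely, while the last line shows that the denominator is zero at the three problematic points.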
Question 3
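Find and draw the level curve $f(x, y) = 2x + 3y = 1$ and draw a gradient on this level curve.
Solution:
The level curve is $y = \frac{1 - 2x}{3}$. The gradient is $\nabla f = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ for every point. If we draw
the gradient on the level curve at the point $(0, \frac{1}{3})$ (which is on the level curve, as
$2 \cdot 0 + 3 \cdot \frac{1}{3} = 1$), we get the vector which goes from the point $(0, \frac{1}{3})$ to the point
$(0, \frac{1}{3}) + (2, 3) = (2, 3\frac{1}{3})$. In a graph, it looks like this:
[Figure: the level curve $2x + 3y = 1$ with the gradient vector drawn at $(0, \frac{1}{3})$]
Of course you can draw the gradient at another point of the level curve too, if you want.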
Question 4
Suppose that $f(x_1, x_2)$ is a homogeneous function of degree 2 and $g(x_1, x_2)$ is a homogeneous
function of degree 6. Show that the function $h(x_1, x_2) = f(x_1, x_2)^3 + g(x_1, x_2)$ is homogeneous
and determine the degree.
Solution:
It is given that
$f(sx_1, sx_2) = s^2 f(x_1, x_2)$
$g(sx_1, sx_2) = s^6 g(x_1, x_2)$
$h(sx_1, sx_2) = f(sx_1, sx_2)^3 + g(sx_1, sx_2) = [s^2 f(x_1, x_2)]^3 + s^6 g(x_1, x_2) = s^6[f(x_1, x_2)^3 + g(x_1, x_2)] = s^6 h(x_1, x_2)$
Hence, $h(\cdot)$ is homogeneous of degree 6.
* Question 5
Consider the implicit relation between x and y defined by:
$(x - 3)^2 + (y + 3)^2 = 9$. Use the implicit function theorem to find the derivative $\frac{dy}{dx}$.
You will get an outcome that depends on both x and y. Use the original relation between x and
y to determine the value of the derivative $\frac{dy}{dx}$ at x=3 and at x=1. For the latter case you will get
two possible outcomes.
Also find the two points where the relation cannot be represented as a function y(x).
Finally, draw a picture to elucidate your findings.
Solution:
The relation $(x - 3)^2 + (y + 3)^2 = 9$ is a circle with centre (3, -3) and radius 3.
We construct an implicit function: $z = (x - 3)^2 + (y + 3)^2 - 9$
$dz = 2(x - 3)dx + 2(y + 3)dy$
$dz = 0$ gives $\frac{dy}{dx} = -\frac{x - 3}{y + 3}$
At x=3 we get y=0 and y=-6, and $\frac{dy}{dx} = 0$ at both points.
At x=1 we get $(y + 3)^2 = 5$, so $y = -3 + \sqrt{5} \approx -0.764$ with $\frac{dy}{dx} = \frac{2}{\sqrt{5}} \approx 0.894$, and $y = -3 - \sqrt{5} \approx -5.236$ with $\frac{dy}{dx} = -\frac{2}{\sqrt{5}} \approx -0.894$.
$\frac{dy}{dx}$ does not exist at (0,-3) and (6,-3).
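The values in this solution can be checked with a short Python sketch. It assumes the circle $(x - 3)^2 + (y + 3)^2 = 9$; the helper name `circle_slopes` and its default arguments are illustrative choices, not part of the course material.

```python
import math

def circle_slopes(x, cx=3.0, cy=-3.0, r=3.0):
    """Return (y, dy/dx) pairs on the circle (x-cx)^2 + (y-cy)^2 = r^2 at a given x."""
    disc = r**2 - (x - cx) ** 2
    if disc < 0:
        return []  # no point of the circle has this x-coordinate
    results = []
    for sign in (1, -1):
        y = cy + sign * math.sqrt(disc)
        if y == cy:
            # Vertical tangent: the implicit derivative does not exist here
            results.append((y, None))
        else:
            results.append((y, -(x - cx) / (y - cy)))
    return results

for x in (3.0, 1.0, 0.0, 6.0):
    print(x, circle_slopes(x))
```

At x=0 and x=6 the sketch reports `None` for the slope, which corresponds to the two points where the relation cannot be written as a function y(x).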
239
Klein: Chapters 9, 10, and 11

Local versus global optima, strict optimum K.9.1.
Univariate Calculus
First order condition, stationary point K.9.1.
Second order condition K.9.1.
Concavity K.9.1.
Multivariate calculus
First order condition, stationary/saddle point K.10.1.
Hessian matrix K.10.3.
Second order condition in terms of semidefiniteness K.10.3.
Concavity, convexity and semidefiniteness See lecture slides
Constrained optimization
Substitution K.11.1.
Lagrange K.11.2.
Multipliers K.11.2.
Value function K.11.2.
Envelope theorem K.11.2.
Convex constraints, multiple constraints, slackness See lecture slides
240
Chapter 9: Extreme values of univariate functions
Stationary point
For a differentiable function $f(x)$,
$x^*$ is a stationary point if $f'(x^*) = 0$.
Example 1
$y = f(x) = 10 - (x - 5)^2$
Stationary point: $f'(x) = -2(x - 5) = 0$ at $x^* = 5$
First-order necessary condition
If the function is everywhere differentiable on an interval and reaches
a minimum or a maximum at the point $x^*$, then $x^*$ is a stationary
point.
It is a necessary but not sufficient condition for identifying a local
maximum or a local minimum.
Example 2
$y = f(x) = |x - 5|$
Note that the function reaches its minimum $f(5) = 0$, but the derivative does not exist at $x = 5$. Thus the
first-order condition is not a necessary condition if the function is not
everywhere differentiable.
241
Extremum: minimum or maximum? (I)
Global maximum
Suppose the function $f(x)$ is everywhere differentiable and $x^*$ is a
stationary point. The stationary point is a global maximum if
$f'(x) \ge 0$ for $x \le x^*$ and $f'(x) \le 0$ for $x \ge x^*$
Global minimum
Suppose the function $f(x)$ is everywhere differentiable and $x^*$ is a
stationary point. The stationary point is a global minimum if
$f'(x) \le 0$ for $x \le x^*$ and $f'(x) \ge 0$ for $x \ge x^*$
Example 3
$y = f(x) = 10 - (x - 5)^2$
$f'(x) = -2(x - 5) = 0$
$f'(x) > 0$ for $x < 5$
$f'(x) < 0$ for $x > 5$
Thus $x^* = 5$ is a global maximum.
242
Extremum: minimum or maximum? (II)
Let's consider a Taylor approximation of a function $f(x)$ about a
point $x^*$:
$m(x) = \frac{f(x^*)}{0!} + \frac{f'(x^*)}{1!}(x - x^*) + \frac{f''(x^*)}{2!}(x - x^*)^2$
or
$m(x) = \frac{f(x^*)}{1} + \frac{f'(x^*)}{1}(x - x^*) + \frac{f''(x^*)}{2}(x - x^*)^2$
At $x^*$ the approximation has the same
function value $f(x^*)$,
the first-order derivative $f'(x^*)$,
the second-order derivative $f''(x^*)$.
Because $f'(x^*) = 0$ at the extremum,
$m(x) = \frac{f(x^*)}{1} + \frac{f''(x^*)}{2}(x - x^*)^2$
Case 1
The extremum $x^*$ is a maximum: $m(x) \le f(x^*)$ for any x close to the
maximum $x^*$. In other words, the second derivative of f(.) must be
negative: $f''(x^*) < 0$.
Case 2
The extremum $x^*$ is a minimum: $m(x) \ge f(x^*)$ for any x close to $x^*$.
In other words, the second derivative of f(.) must be positive:
$f''(x^*) > 0$.
Example 4
$y = f(x) = 10 - (x - 5)^2$
$f'(x) = -2(x - 5) = 0$
$f''(x) = -2$
At $x^* = 5$ the second derivative is $f''(5) = -2$
Thus $x^* = 5$ is a maximum, because $f''(5) < 0$
243
Extremum: global or local minimum or maximum? (II)
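Global maximum
Suppose the function $f(x)$ is everywhere twice differentiable and $x^*$ is a
stationary point. The stationary point is a global maximum if
$f''(x) \le 0$ for all x.
Global minimum
Suppose the function $f(x)$ is everywhere twice differentiable and $x^*$ is a
stationary point. The stationary point is a global minimum if
$f''(x) \ge 0$ for all x.
Example 5
$y = f(x) = 10 - (x - 5)^2$
$f'(x) = -2(x - 5) = 0$
$f''(x) = -2$
The second derivative is $f''(x) = -2 < 0$ for all x, not only at $x^* = 5$.
Thus $x^* = 5$ is a global maximum.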
244
Example 6
$f(x) = \frac{x}{x^2 + 4}$
$f'(x) = \frac{(x^2 + 4) - 2x^2}{(x^2 + 4)^2} = -\frac{x^2 - 4}{(x^2 + 4)^2} = 0$
$f''(x) = \frac{-2x(x^2 + 4)^2 + 4x(x^2 - 4)(x^2 + 4)}{(x^2 + 4)^4}$
$= \frac{-2x(x^4 + 8x^2 + 16) + 4x(x^4 - 16)}{(x^2 + 4)^4}$
$= \frac{2x^5 - 16x^3 - 96x}{(x^2 + 4)^4}$
$= \frac{2x(x^4 - 8x^2 - 48)}{(x^2 + 4)^4}$
$= \frac{2x(x^2 - 12)(x^2 + 4)}{(x^2 + 4)^4}$
There is a stationary point at $x = -2$ and at $x = 2$.
Minimum or maximum?
Possibility 1
$f'(x) < 0$ for $x < -2$
$f'(x) > 0$ for $-2 < x < 2$
$f'(x) < 0$ for $x > 2$
Possibility 2
$f''(-2) = \frac{2(-2)(4 - 12)(4 + 4)}{(4 + 4)^4} = \frac{256}{8^4} = \frac{1}{16} > 0$
$f''(2) = \frac{2(2)(4 - 12)(4 + 4)}{(4 + 4)^4} = -\frac{256}{8^4} = -\frac{1}{16} < 0$
$f(-2) = -\frac{1}{4}$ is a local minimum and $f(2) = \frac{1}{4}$ is a local maximum.
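The derivatives in this example are easy to get wrong by hand, so a quick numerical check may help. The sketch below assumes $f(x) = x/(x^2+4)$ and simply evaluates the closed-form derivatives at the two stationary points; the function names are illustrative.

```python
def f(x):
    return x / (x**2 + 4)

def f_prime(x):
    # f'(x) = (4 - x^2) / (x^2 + 4)^2
    return (4 - x**2) / (x**2 + 4) ** 2

def f_second(x):
    # f''(x) = 2x (x^2 - 12)(x^2 + 4) / (x^2 + 4)^4
    return 2 * x * (x**2 - 12) * (x**2 + 4) / (x**2 + 4) ** 4

for x_star in (-2, 2):
    kind = "local minimum" if f_second(x_star) > 0 else "local maximum"
    print(x_star, f(x_star), f_prime(x_star), f_second(x_star), kind)
```

Both stationary points give a first derivative of zero, and the sign of the second derivative separates the minimum from the maximum.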
245
Example 7
$f(x) = e^{x-1} - x$
$f'(x) = e^{x-1} - 1 = 0$ for $x = 1$
$f''(x) = e^{x-1} > 0$ for all x, so that $f(x)$ is convex.
Inflection point: the twice-differentiable function $f(x)$ has an
inflection point at x if and only if the sign of the second derivative
switches from negative (positive) in some interval $(m, x)$ to positive
(negative) in some interval $(x, n)$. Note that if the second derivative is
negative, the function is concave. If the second derivative is positive, the function is
convex. Thus, at the inflection point, the curvature changes from
convex to concave (or from concave to convex).
Example 7:
$f(x) = x^4$ does not have an inflection point at x=0.
$f''(x) = 12x^2$
The second derivative does not change sign at x=0, because $f''(x) > 0$ for both positive
and negative x.
Example 8:
$f(x) = \frac{1}{9}x^3 - \frac{1}{6}x^2 - \frac{2}{3}x + 1$
$f'(x) = \frac{1}{3}x^2 - \frac{1}{3}x - \frac{2}{3} = \frac{1}{3}(x^2 - x - 2) = \frac{1}{3}(x - 2)(x + 1)$
$f''(x) = \frac{2}{3}x - \frac{1}{3} = \frac{1}{3}(2x - 1)$
The function has an inflection point at x=1/2, because $f''(\frac{1}{2}) = 0$,
$f''(x) < 0$ for $x < \frac{1}{2}$, and
$f''(x) > 0$ for $x > \frac{1}{2}$.
For x=-1 the function has a local maximum:
$f'(-1) = 0$ and $f''(-1) < 0$
For x=2 the function has a local minimum:
$f'(2) = 0$ and $f''(2) > 0$
Example 9:
$f(x) = x^6 - 10x^4$
$f'(x) = 6x^5 - 40x^3$
Second derivative of f(x):
$f''(x) = 30x^4 - 120x^2$
$= 30x^2(x^2 - 4)$
$= 30x^2(x - 2)(x + 2)$
The function has an inflection point at x=-2 and x=2. There is no inflection
point at x=0, because $f''(x)$ does not change sign there.
247
Wrapping up: Procedure of one variable optimization
1) Start with $y = f(x)$ and determine the first-order necessary
condition: $\frac{dy}{dx} = f'(x) = 0$
2) Check second-order sufficient conditions:
$\frac{d^2y}{dx^2} = f''(x) < 0$ (maximum)
$\frac{d^2y}{dx^2} = f''(x) > 0$ (minimum)
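The two-step procedure can be sketched numerically. The Python code below is only an illustration under stated assumptions: derivatives are approximated by finite differences, the stationary point is located with Newton's method applied to $f'(x) = 0$, and the example function is $f(x) = 10 - (x - 5)^2$ from the slides. All function names are hypothetical.

```python
def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def find_stationary_point(f, x0, steps=50):
    """Step 1: solve f'(x) = 0 with Newton's method, starting from x0."""
    x = x0
    for _ in range(steps):
        d2 = second_derivative(f, x)
        if d2 == 0:
            break  # Newton step undefined; give up at the current x
        x = x - derivative(f, x) / d2
    return x

def classify(f, x_star):
    """Step 2: apply the second-order sufficient condition."""
    d2 = second_derivative(f, x_star)
    if d2 < 0:
        return "maximum"
    if d2 > 0:
        return "minimum"
    return "inconclusive"

def f(x):
    return 10 - (x - 5) ** 2

x_star = find_stationary_point(f, x0=0.0)
print(x_star, classify(f, x_star))
```

Newton's method is a convenience here, not part of the course procedure; for the examples in these notes the first-order condition can be solved by hand.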
248
Chapter 10: Multivariable optimization without constraints
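Next we consider a multivariable function. For example, let's consider
$y = f(x_1, x_2)$
A necessary condition for the optimum is that
$\nabla f(x_1, x_2) = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
Definition:
The Hessian of the function $y = f(x_1, x_2)$ is the matrix of second-order
derivatives:
$H = \begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{pmatrix}$
Remember: Young's theorem (lecture 4): $f_{12} = f_{21}$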
249
Intermezzo: Positive definite matrices (and negative definite
matrices)
Definition:
A matrix $A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$ is positive definite if $x'Ax > 0$ for all $x \in \mathbb{R}^2$ with $x \neq 0$.
Definition:
A matrix $A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$ is negative definite if $x'Ax < 0$ for all $x \in \mathbb{R}^2$ with $x \neq 0$.
Important:
1) A matrix A is positive definite if and only if all eigenvalues of the
matrix A are positive.
2) A matrix A is negative definite if and only if all eigenvalues of the
matrix A are negative.
3) A matrix A is indefinite if it has both positive eigenvalues and
negative eigenvalues.
Note
that the determinant of A is equal to the product of its eigenvalues
(week 3). As a consequence, if the determinant of the 2 x 2 matrix A has a negative
sign, it must have one positive and one negative eigenvalue. Hence, the
matrix A will be indefinite. The matrix A will be either positive
definite or negative definite if it has a positive determinant.
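The eigenvalue characterization above can be turned into a small classifier. The following Python sketch is restricted to symmetric 2 x 2 matrices (such as a Hessian with $f_{12} = f_{21}$), for which the eigenvalues have a simple closed form. The function names are illustrative.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues (smallest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b**2)
    return mean - radius, mean + radius

def classify(a, b, d):
    """Classify [[a, b], [b, d]] by the signs of its eigenvalues."""
    low, high = eigenvalues_sym2(a, b, d)
    if low > 0:
        return "positive definite"
    if high < 0:
        return "negative definite"
    if low < 0 < high:
        return "indefinite"
    return "semidefinite"  # at least one eigenvalue is zero

print(classify(2, 0, 2))    # both eigenvalues equal 2
print(classify(-2, 0, -2))  # both eigenvalues equal -2
print(classify(2, 0, -2))   # eigenvalues -2 and 2, determinant negative
```

The third call illustrates the note on determinants: a negative determinant forces one eigenvalue of each sign.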
250
The unconstrained optimum: minimum or maximum?
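Consider again the Hessian of the function $y = f(x_1, x_2)$:
$H = \begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{pmatrix}$
Consider the Hessian at $\begin{pmatrix} x_1^* \\ x_2^* \end{pmatrix}$, for which the gradient
$\nabla f(x_1, x_2) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$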
251
For a function of more variables, say n, the Taylor approximation at the point a looks like
this: $f(x) \approx f(a) + \nabla f(a) \cdot (x - a) + \frac{1}{2}(x - a)^T H_f(a)(x - a) + \dots$
Here we showed the first three terms. Remember that $\nabla f(a)$ is a vector with n components,
while $H_f(a)$ is an $n \times n$ matrix. Now suppose that a is a critical point, so that $\nabla f(a) = 0$, and
that $H_f(a)$ is positive semi-definite, so that $y^T H_f(a) y \ge 0$ for any vector y. For points x in
the direct vicinity of a, the first three terms of the Taylor approximation are very close to the
original function, so we can focus on them. So we get
$f(x) \approx f(a) + \nabla f(a) \cdot (x - a) + \frac{1}{2}(x - a)^T H_f(a)(x - a)$
$= f(a) + 0 \cdot (x - a) + \frac{1}{2}(x - a)^T H_f(a)(x - a)$
$= f(a) + \frac{1}{2}(x - a)^T H_f(a)(x - a)$
$\ge f(a)$
by the positive semi-definiteness of $H_f(a)$. Because $f(x) \ge f(a)$ for all points in the
vicinity of a, a is a local minimum. Similarly, if $H_f(a)$ were negative semi-definite, a would
be a local maximum.
Case 1:
If the Hessian H is negative definite, $x'Hx < 0$ for all $x \in \mathbb{R}^2$ with $x \neq 0$ (all
eigenvalues of H are negative), the function will be at a maximum.
Case 2:
If the Hessian H is positive definite, $x'Hx > 0$ for all $x \in \mathbb{R}^2$ with $x \neq 0$ (all
eigenvalues of H are positive), the function will be at a minimum.
252
Example 10:
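$y = f(x_1, x_2) = x_1^3 - x_1^2 - x_2^2 + 8$
The first-order necessary conditions:
$\nabla f = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix} = \begin{pmatrix} 3x_1^2 - 2x_1 \\ -2x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
and the Hessian is $H = \begin{pmatrix} 6x_1 - 2 & 0 \\ 0 & -2 \end{pmatrix}$
Thus the stationary points are $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 2/3 \\ 0 \end{pmatrix}$
The Hessian at $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$:
$H = \begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix}$
The only eigenvalue of the matrix H is $\lambda = -2$. Reason:
$|H - \lambda I| = \begin{vmatrix} -2 - \lambda & 0 \\ 0 & -2 - \lambda \end{vmatrix} = (-2 - \lambda)^2$
$|H - \lambda I|$ is zero for $\lambda = -2$. Because all eigenvalues are negative, the
matrix is negative definite, so that at $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ the function f(.) reaches a
maximum.
For the second stationary point, the Hessian at $\begin{pmatrix} 2/3 \\ 0 \end{pmatrix}$:
$H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$
$|H - \lambda I| = \begin{vmatrix} 2 - \lambda & 0 \\ 0 & -2 - \lambda \end{vmatrix} = (2 - \lambda)(-2 - \lambda)$
Because the eigenvalues ($\lambda = 2$ and $\lambda = -2$) have a positive and a negative sign, the matrix
is indefinite, so that at $\begin{pmatrix} 2/3 \\ 0 \end{pmatrix}$ the function reaches neither a maximum
nor a minimum.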
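A direct way to see the difference between the two stationary points is to compare function values at nearby points. The Python sketch below assumes $f(x_1, x_2) = x_1^3 - x_1^2 - x_2^2 + 8$; the helper `neighbour_values` and the step size 0.01 are illustrative choices.

```python
def f(x1, x2):
    return x1**3 - x1**2 - x2**2 + 8

def neighbour_values(x1, x2, step=0.01):
    """Change in f when moving a small step in each of four directions."""
    base = f(x1, x2)
    moves = [(step, 0), (-step, 0), (0, step), (0, -step)]
    return [f(x1 + dx, x2 + dy) - base for dx, dy in moves]

# At (0, 0) every small move lowers f: consistent with a local maximum
print(neighbour_values(0, 0))

# At (2/3, 0) moves along x1 raise f and moves along x2 lower it: a saddle point
print(neighbour_values(2 / 3, 0))
```

This is only a spot check in four directions; the eigenvalue argument in the example is what actually establishes the classification.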
254
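Domain and range of the function $f: \mathbb{R}^2 \to \mathbb{R}$
Let's reconsider the optimization problem
$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2$
$f(x) = x_1^2 + x_2^2$
$f: \mathbb{R}^2 \to \mathbb{R}$. Thus: $\mathbb{R}^2$ is the domain of f and $\mathbb{R}$ is the range of the
function f.
What does it mean if we graph $f(x)$ in a two-dimensional space $\mathbb{R}^2$?
An iso-contour curve consists of the points of the domain of the function f
that have equal function value.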
255
256
257
258
What does moving in a domain of f mean?
We are moving in the domain of the function f. Consider the function
from above
$f(x) = x_1^2 + x_2^2$
and, for example, the point
$x = \begin{pmatrix} 5 \\ 0 \end{pmatrix}$
Definition
If we move from the point $x = \begin{pmatrix} 5 \\ 0 \end{pmatrix}$ in the direction of $y = \begin{pmatrix} 10 \\ 0 \end{pmatrix}$
it means that:
we take a line through the origin $O = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and the point $y = \begin{pmatrix} 10 \\ 0 \end{pmatrix}$.
The line through the new point and $x = \begin{pmatrix} 5 \\ 0 \end{pmatrix}$ is parallel to the line
through the origin O and the point y.
Three questions:
Question 1: What is the direction such that the function value of f does not
change? This is the iso-curve (points of the domain that have a
constant function value).
Question 2: What is the direction such that the function value of f will
increase? (Points of the domain that have a higher function value.)
Question 3: In which direction is the increase of the function value
the steepest? This is the vector in the domain that indicates the direction in which the
increase has the highest value.
259
Gradient
Definition:
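The gradient is the direction (the vector in the domain of f) in which
the function at a particular point has the steepest slope.
$\nabla f(x) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix}$
Consequence:
The gradient is perpendicular to the set of points of the domain for
which the function has a constant value (the so-called iso-curve).
Example
For the function $f(x) = x_1 + x_2$ the gradient is
$\nabla f(x) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$
The iso-curve for $f(x) = 2$ connects the points $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 2 \end{pmatrix}$.
Implication: The gradient vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is perpendicular to the iso-curve
through both points. At the point $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$, we should move in the
direction of $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ (it means that we are moving parallel to the line
through the origin $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$) to induce the steepest ascent.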
260
Example
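For the function $g(x) = x_1^2 + x_2^2$ the gradient is
$\nabla g(x) = \begin{pmatrix} \partial g/\partial x_1 \\ \partial g/\partial x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}$
The iso-curve for $g(x) = 4$ is a circle with centre $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and
radius 2. It means that at the point $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$ the gradient is $\begin{pmatrix} 4 \\ 0 \end{pmatrix}$. At the
point $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$ we should move along the line through the points $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and
$\begin{pmatrix} 4 \\ 0 \end{pmatrix}$ to have the steepest increase of the function value of $g(x)$.
Implication: The gradient vector $\begin{pmatrix} 4 \\ 0 \end{pmatrix}$ is perpendicular to the iso-curve
$x_1^2 + x_2^2 = 4$ at the point $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$.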
261
Constrained optimization: subject to an inequality
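We consider the following problem
max $f(x) = x_1 + x_2$
subject to $g(x) = x_1^2 + x_2^2 - 4 \le 0$
$f(x)$: the objective function
$g(x)$: the constraint
Note that the gradients of both functions are
$\nabla f(x) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\nabla g(x) = \begin{pmatrix} \partial g/\partial x_1 \\ \partial g/\partial x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}$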
262
What does the maximization imply?
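max $f(x) = x_1 + x_2$
means that we take the highest function value of f at an x that is a
feasible solution. We will have an iso-contour for each specific value of
$f(x)$: for instance $f(x) = 0$, $f(x) = 1$, $f(x) = 2$, or $f(x) = 3$
Definition:
Feasible point: specific value of $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ in the domain of the objective
function f which satisfies the constraint
$g(x) = x_1^2 + x_2^2 - 4 \le 0$
Thus $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$ are feasible points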
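However, feasible points can differ in their function value: $f(x) = 2$ at both $\begin{pmatrix} 0 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$, whereas the feasible point $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ gives only $f(x) = 0$.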
Definition:
Feasible set: the set of all of the feasible points in the domain of the
objective function f (so that they satisfy the constraint)
In this particular example, the feasible set is the disc (the circle and its interior) with centre $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$
and radius 2.
263
$\max_{x \in \mathbb{R}^2} f(x) = x_1 + x_2$
Possible outcome 1:
Unconstrained optimum is inside the feasible region.
Possible outcome 2:
Unconstrained optimum is outside the feasible region. We will have a
corner solution (a solution at the constraint).
264
Constrained optimization: how to move in the domain of the
function?
Next, we take a feasible point (subscript f) at the constraint (for
instance $x_f = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$)
$g(x_f) = 2^2 + 0^2 - 4 = 0$
We consider the function value at this point: $f(x_f) = 2 + 0 = 2$
Definition:
The upper level set of a function f at $\alpha$ is defined as the part of the domain of f
for which the function value is at least $\alpha$.
In mathematical language:
for every real number $\alpha$ ($\alpha \in \mathbb{R}$):
$U_{f,\alpha} = \{x \in \mathrm{Dom}(f) \mid f(x) \ge \alpha\}$
Read: the set U of the function f for a specific value of $\alpha$ ($U_{f,\alpha}$) is
defined as the points x of the domain of f (Dom(f)) such that ($\mid$)
the function value of f is at least $\alpha$ ($f(x) \ge \alpha$)
Note: the set must be formulated in the curly brackets { and }
Definition:
The lower level set of a function f is defined as
for every real number $\alpha$ ($\alpha \in \mathbb{R}$):
$L_{f,\alpha} = \{x \in \mathrm{Dom}(f) \mid f(x) \le \alpha\}$
265
Important implication 1:
By moving in the direction of the gradient, we will move to the points
in the domain that correspond to the upper level set. More specifically,
we will move to the points in the upper level set that have the highest
increase of the function value.
Important implication 2:
For a maximization problem, we must move in the direction of the
gradient of the objective function.
For a minimization problem, we must move in the direction of the
negative of the gradient of the objective function.
266
Constrained optimization: subject to an inequality
In what follows we will only consider maximization problems subject
to an inequality constraint.
We consider the following specific problem
max $f(x) = x_1 + x_2$
subject to $g(x) = x_1^2 + x_2^2 - 4 \le 0$
We will focus on maximization problems that have the following
structure:
First, the objective function must be a concave function.
Second, the constraint must be a convex function.
1) $f(x)$ is a concave function. Reason: the function f is linear in $x_1$
and $x_2$. All linear functions are concave functions.
2) The constraint $g(x)$ is a convex function. Reason: the Hessian is, for
all combinations of $x_1$ and $x_2$,
$H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, which is a positive semi-definite matrix, because the
eigenvalues are $\lambda = 2$. Note that linear constraints are convex
functions also.
267
Step 1) The function $f(x) = x_1 + x_2$ has no unconstrained optimum
Conclusion 1: the optimum will be at the constraint $x_1^2 + x_2^2 - 4 = 0$
Conclusion 2: the optimization problem reduces to
max $f(x) = x_1 + x_2$ subject to $g(x) = 0$
Step 2) Is the objective function (of the maximization problem) a
concave function and the constraint a convex function?
Answer: Yes. See slide above.
Step 3) Solve:
$\nabla f(x) = \lambda \nabla g(x)$
$g(x) = 0$
$\lambda \ge 0$
which will be a system of equations in three unknowns $x_1$, $x_2$ and $\lambda$.
After solving the system of equations, the optimal $x_1^*$, $x_2^*$ and $\lambda^*$ must
satisfy the following:
a) The constraint $g(x^*) = (x_1^*)^2 + (x_2^*)^2 - 4 = 0$
b) $\lambda^*$ is positive, because the maximization problem has a concave
objective function and a convex constraint (the reason will be
explained below).
Important
The KKT conditions are a sufficient condition for a local maximum. If
a point satisfies the KKT conditions, it will be a local maximum.
268
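Why should the gradient of the objective function be a positive
multiple of the gradient of the constraint?
Remember:
any positive multiple of the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is on the same line through the origin
$O = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
max $f(x) = x_1 + x_2$
$\nabla f(x) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\nabla g(x) = \begin{pmatrix} \partial g/\partial x_1 \\ \partial g/\partial x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}$
$\nabla f(x) = \lambda \nabla g(x)$ for some positive $\lambda$. The vectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}$ are
on the same line through the origin.
Outcome 1:
The gradient is the steepest ascent of the objective function in a
particular point x.
Outcome 2:
The gradient is the steepest ascent of the constraint in a particular
point x.
Conclusion: We cannot move further. Hence, the point is a
constrained local optimum of the function f.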
269
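Thus, we can solve the equations now:
max $f(x) = x_1 + x_2$
$\nabla f(x) = \lambda \nabla g(x)$
$g(x) = 0$
$\lambda \ge 0$
$1 = 2\lambda x_1$
$1 = 2\lambda x_2$
$x_1^2 + x_2^2 - 4 = 0$
If a point satisfies the conditions above, it must be a local
maximum.
The system can be solved as
Solution 1) $x_1^* = \sqrt{2}$, $x_2^* = \sqrt{2}$ and $\lambda = \frac{1}{4}\sqrt{2}$
Solution 2) $x_1^* = -\sqrt{2}$, $x_2^* = -\sqrt{2}$ and $\lambda = -\frac{1}{4}\sqrt{2}$
Hence, there are two possible solutions. However, note that the
second solution is not admissible, because it delivers a negative $\lambda$ ($\lambda < 0$).
Conclusion: the solution is $x_1^* = \sqrt{2}$, $x_2^* = \sqrt{2}$ and $\lambda^* = \frac{1}{4}\sqrt{2}$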
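The two candidate solutions can be verified numerically. The Python sketch below assumes the problem max $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 4 \le 0$; it uses the hand-derived fact that the first two equations force $x_1 = x_2 = 1/(2\lambda)$. The function names and the tolerance are illustrative.

```python
import math

def kkt_candidates():
    """Solutions of 1 = 2*lam*x1, 1 = 2*lam*x2, x1^2 + x2^2 = 4."""
    candidates = []
    # x1 = x2 and the circle equation give x1 = +sqrt(2) or x1 = -sqrt(2)
    for x in (math.sqrt(2), -math.sqrt(2)):
        lam = 1 / (2 * x)
        candidates.append((x, x, lam))
    return candidates

def is_kkt_point(x1, x2, lam, tol=1e-9):
    """Check stationarity, feasibility, complementary slackness and lam >= 0."""
    g = x1**2 + x2**2 - 4
    stationarity = abs(1 - 2 * lam * x1) < tol and abs(1 - 2 * lam * x2) < tol
    feasible = g <= tol
    slackness = abs(lam * g) < tol
    return stationarity and feasible and slackness and lam >= 0

for x1, x2, lam in kkt_candidates():
    print(x1, x2, lam, x1 + x2, is_kkt_point(x1, x2, lam))
```

Only the candidate with the positive multiplier passes all checks, and it is also the one with the larger objective value.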
270
Wrapping up: Karush-Kuhn-Tucker conditions
$\max_{x \in \mathbb{R}^2} f(x)$
subject to $g(x) \le 0$
The function $f(x)$ is a concave function and the function $g(x)$ is a
convex function.
If there exists a local maximum $x^*$, then there exists a unique $\lambda^*$
such that
1) $\nabla f(x^*) = \lambda^* \nabla g(x^*)$
2) $\lambda^* \ge 0$
3) $\lambda^* g(x^*) = 0$
Note:
1) The maximum $x^*$ is referred to as a Karush-Kuhn-Tucker point
(abbreviated as a KKT-point).
2) If the unconstrained optimum is feasible (the unconstrained
optimum satisfies the constraint anyway), then $\lambda^* = 0$. Consequently,
$\lambda^* g(x^*) = 0$.
3) If the unconstrained optimum is not feasible, then $\lambda^* > 0$.
271
Optimization subject to multiple inequality constraints: how to
relate it to the formulation of the Lagrangian?
Consider the problem
max $f(x)$
s.t. $g_i(x) \le 0$, $i \in I = \{1, 2, \dots, p\}$
Let $L(x, \lambda_i, \lambda_j, \dots) = f(x) - \lambda_i g_i(x) - \lambda_j g_j(x) - \dots$ (the Lagrangian for
certain constraints $i, j, \dots$ binding). Then $\nabla L(x^*) = 0$ works out to
$\begin{pmatrix} \nabla f(x) - \lambda_i \nabla g_i - \lambda_j \nabla g_j - \dots \\ g_i(x) \\ g_j(x) \\ \vdots \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \end{pmatrix}$, which implies:
$\begin{pmatrix} \nabla f(x) \\ g_i(x) \\ g_j(x) \\ \vdots \end{pmatrix} = \begin{pmatrix} \lambda_i \nabla g_i + \lambda_j \nabla g_j + \dots \\ 0 \\ 0 \\ \vdots \end{pmatrix}$. This last set of equations is precisely
the KKT-condition followed by the binding constraints. So if you want
to check the KKT-conditions for a certain set of binding constraints
you can look at the Lagrangian for that set of constraints. Then repeat
the process for the other sets of binding constraints you may want to
try.
272
Wrapping up: Optimization subject to inequality constraints
Step 1 Check if objective function is concave

Step 2 Check if constraints are convex
Step 3 Pick a set of constraints to be binding
Step 4 Try to solve for a (strict) KKT-point under those constraints
Step 5 Check if i 0 and the non-binding constraints hold. (And if
f (x* ) 0 )
Step 6 If you find in step 4 and 5 that not all the conditions for a
(strict) KKT-point hold, go back to step 3 and pick another
combination of binding constraints. If you have no more
combinations left, go to step 7.
If you find that all the conditions do hold go to step 7.
Step 7 If you still have combinations to check, see if you can find a
smart argument to see that the KKT-point you found is the
unique maximizer (see example 4). Then you don't have to
check the other combinations. If you cannot find such an
argument, check the other combinations for KKT-points (so
go back to step 3).
If you have checked all possible combinations, you are done.
If you found (a) KKT-point(s) then that is/those are your
solution(s). If you found none, then the problem has no solution
(or you made a mistake, of course).
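Steps 3 to 6 amount to a loop over all combinations of binding constraints. The Python sketch below illustrates that loop on an assumed example problem, max $-x^2 - y^2$ subject to $x + 5 \le 0$ and $x + y + 7 \le 0$. Because the objective is quadratic and the constraints are linear, each combination leads to a linear system, solved here with exact fractions. The names `CONSTRAINTS`, `solve_linear` and `kkt_points` are hypothetical, and steps 1 and 2 (concavity and convexity) are taken for granted.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is stored as (a1, a2, b), meaning a1*x + a2*y + b <= 0.
CONSTRAINTS = [(1, 0, 5), (1, 1, 7)]

def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with exact fractions; None if singular."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)]
            for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def kkt_points():
    found = []
    for size in range(len(CONSTRAINTS) + 1):
        for binding in combinations(range(len(CONSTRAINTS)), size):
            # Unknowns: x, y, then one multiplier per binding constraint.
            m = 2 + size
            matrix = [[0] * m for _ in range(m)]
            rhs = [0] * m
            matrix[0][0] = -2  # df/dx = -2x
            matrix[1][1] = -2  # df/dy = -2y
            for k, i in enumerate(binding):
                a1, a2, b = CONSTRAINTS[i]
                matrix[0][2 + k] = -a1   # -2x - sum(lam * a1) = 0
                matrix[1][2 + k] = -a2   # -2y - sum(lam * a2) = 0
                matrix[2 + k][0] = a1    # binding: a1*x + a2*y = -b
                matrix[2 + k][1] = a2
                rhs[2 + k] = -b
            solution = solve_linear(matrix, rhs)
            if solution is None:
                continue
            x, y, multipliers = solution[0], solution[1], solution[2:]
            feasible = all(a1 * x + a2 * y + b <= 0 for a1, a2, b in CONSTRAINTS)
            if feasible and all(lam >= 0 for lam in multipliers):
                found.append((binding, x, y, multipliers))
    return found

print(kkt_points())
```

For this assumed problem only the combination in which both constraints bind survives the checks of step 5; the other three combinations are rejected because a non-binding constraint is violated.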
273
Answers tutorial week 5 Advanced Mathematics
Exercise 1.
Find the critical points of the following functions and assess whether they are minima,
maxima or inflection points:
a) $ax^2 + bx + c$, where a, b, c are constants.
Solution:
$\frac{d(ax^2 + bx + c)}{dx} = 2ax + b = 0 \Rightarrow x = -\frac{b}{2a}$, so the function has one critical point.
$\frac{d^2(ax^2 + bx + c)}{dx^2} = 2a$. The second derivative is 2a, independent of x. If a < 0, the function
takes a maximum, if a > 0 it takes a minimum.
b) $\log(x) - 4x$
Solution:
$\frac{d(\log(x) - 4x)}{dx} = \frac{1}{x} - 4 = 0 \Rightarrow x = \frac{1}{4}$
so the function has one critical point. The second derivative is
$\frac{d^2(\log(x) - 4x)}{dx^2} = -\frac{1}{x^2}$
We can now plug in the optimal value for x:
$-\frac{1}{(\frac{1}{4})^2} = -16 < 0$
so the function takes a maximum. (We could also observe that $-\frac{1}{x^2}$ is negative for all x).
* c) $x^{2n}$, $n \in \mathbb{N}$
Solution:
$\frac{dx^{2n}}{dx} = 2nx^{2n-1} = 0 \Rightarrow x = 0$.
$\frac{d^2x^{2n}}{dx^2} = 2n(2n-1)x^{2n-2} = \begin{cases} 2n(2n-1)x^{2n-2} & \text{if } n \ge 2 \\ 2 & \text{if } n = 1 \end{cases}$ If we evaluate this at the optimal value for
x, we find:
$\left.\frac{d^2x^{2n}}{dx^2}\right|_{x=0} = \begin{cases} 0 & \text{if } n \ge 2 \\ 2 & \text{if } n = 1 \end{cases}$
So we have found a minimum for n=1. But for the other cases we don't know yet. We'll graph
them to see what is going on:
[Figure: graphs of $x^4$, $x^6$ and $x^8$ for x between -2 and 2]
Here we graphed $x^4$, $x^6$, $x^8$. Clearly they all have a minimum at x=0. This shows that the
conditions we use for finding a minimum are only a sufficient requirement: whenever we find
a critical point that obeys the second order condition, it is a minimum, but not all minima
obey the second order condition.
d) $e^x$
Solution:
$\frac{de^x}{dx} = e^x > 0$, so this function has no critical points. It is always increasing, or monotonically
increasing, as it is called.
e) $e^{x^2 - 3x + 2}$
Solution:
Given that we saw in the last exercise that $e^x$ is always increasing, we expect to find a
minimum here at the minimum of the exponent. Indeed, this is what comes out:
$\frac{de^{x^2 - 3x + 2}}{dx} = (2x - 3)e^{x^2 - 3x + 2} = 0$; since the exponential function is always positive, this implies:
$2x - 3 = 0 \Rightarrow x = \frac{3}{2}$. This is indeed the minimum of the exponent.
$\frac{d^2e^{x^2 - 3x + 2}}{dx^2} = (2x - 3)^2 e^{x^2 - 3x + 2} + 2e^{x^2 - 3x + 2} > 0$. We could plug in the optimal value of x, but we
see that only positive numbers (a square, an exponential function twice, and the factor 2) occur, so we see
immediately that this is positive and therefore the critical point is a minimum.
* f) $e^x - x^2$
Solution:
$\frac{d(e^x - x^2)}{dx} = e^x - 2x \overset{?}{=} 0$. We don't know how to solve this; in fact it is even unclear if this has a
solution. Is it possible that $e^x > 2x$ for all x? One way to check that is to see if $e^x - 2x > 0$
even at its minimum. As it happens, this is precisely the next exercise. (We'll come back to
this one when we're done with it).
275
g) $e^x - 2x$
Solution:
$\frac{d(e^x - 2x)}{dx} = e^x - 2 = 0 \Rightarrow e^x = 2 \Rightarrow x = \log(2)$
$\frac{d^2(e^x - 2x)}{dx^2} = e^x > 0$, so we find a minimum. Now at the minimum:
$\left.(e^x - 2x)\right|_{x = \log(2)} = e^{\log(2)} - 2\log(2) = 2 - 2\log(2) > 0$, as log(2) < 1. So, coming back to the
previous sub-question f), we find that even at its minimum, the first derivative of $e^x - x^2$ is
positive, so it is always increasing and has no critical points.
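The argument linking f) and g) can be checked numerically. The Python sketch below evaluates $e^x - 2x$ (the first derivative of $e^x - x^2$) at $x = \log(2)$ and on a coarse grid; the grid bounds and the variable names are arbitrary illustrative choices.

```python
import math

def first_derivative_of_f(x):
    """e^x - 2x: the function of g), and the derivative of e^x - x^2 from f)."""
    return math.exp(x) - 2 * x

x_min = math.log(2)
minimum_value = first_derivative_of_f(x_min)

# Coarse grid from -5 to 5 in steps of 0.01
grid = [i / 100 for i in range(-500, 501)]
smallest_on_grid = min(first_derivative_of_f(x) for x in grid)

print(minimum_value, 2 - 2 * math.log(2), smallest_on_grid)
```

The grid search never finds a value below the analytic minimum $2 - 2\log(2)$, which is positive, in line with the conclusion that $e^x - x^2$ has no critical points.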
Exercise 2.
Determine whether the following matrices are positive definite, negative definite, or neither:
* a) $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$
Solution:
Because the matrix is very small, we can apply the definition of positive definiteness directly.
We focus on symmetric matrices, because it can be shown that if a matrix A is positive
definite, one can always find a symmetric positive definite matrix S that gives the same
outcomes, i.e.: $x^T A x = x^T S x$ for all vectors x.
For a positive definite matrix, it must hold that:
$\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} a & b \\ b & c \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} ax + by \\ bx + cy \end{pmatrix} = ax^2 + 2bxy + cy^2 > 0$ for all possible x and y (not both zero).
Clearly, if we take either x = 0 or y = 0, we find that both a and c must be positive. To find the
final condition, we rewrite our expression:
$ax^2 + 2bxy + cy^2 = a(x^2 + 2\frac{b}{a}xy) + cy^2$
$= a(x^2 + 2\frac{b}{a}xy + \frac{b^2}{a^2}y^2) + cy^2 - \frac{b^2}{a}y^2$
$= a(x + \frac{b}{a}y)^2 + (c - \frac{b^2}{a})y^2 > 0$
The second and third step here are not at all obvious, but we take them because we end up
with something nice. Indeed, in the last line we have two squares ($(x + \frac{b}{a}y)^2$ and $y^2$) which
are never negative, and a, which we also demanded be positive. So the final term, $(c - \frac{b^2}{a})$,
must also be positive. But $(c - \frac{b^2}{a}) > 0 \iff ac - b^2 > 0$. This means that the determinant of our
matrix should also be positive. This is actually the familiar principal minor condition.
276
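b) $\begin{pmatrix} \frac{2}{3} & 0 & 0 \\ \frac{1}{6} & \frac{3}{4} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3} \end{pmatrix}$
Solution:
We have seen this matrix before; it is the Markov chain example we analyzed in the tutorial of
week 3. There we saw that it had eigenvalues $\lambda_1 = 1$, $\lambda_2 = \frac{2}{3}$, $\lambda_3 = \frac{5}{12}$, so all eigenvalues are
positive. This is one characterization of a positive definite (PD) matrix. We double-check our
result by looking at the leading principal minors:
$\det\begin{pmatrix} \frac{2}{3} & 0 & 0 \\ \frac{1}{6} & \frac{3}{4} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{4} & \frac{2}{3} \end{pmatrix} = \frac{2}{3}\left(\frac{3}{4} \cdot \frac{2}{3} - \frac{1}{4} \cdot \frac{1}{3}\right) = \frac{5}{18} > 0$
$\det\begin{pmatrix} \frac{2}{3} & 0 \\ \frac{1}{6} & \frac{3}{4} \end{pmatrix} = \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{2} > 0$
$\frac{2}{3} > 0$
They are all positive, confirming our results.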
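c) $\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}$
Solution:
This time we have to check by direct computation:
$\det\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix} = 1(2 \cdot 3 - 1 \cdot 1) - 0 + 1(0 \cdot 1 - 2 \cdot 1) = 5 - 2 = 3 > 0$
$\det\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} = 2 > 0$
$1 > 0$
It is positive definite (PD).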
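The leading principal minors used in b) and c) can be recomputed with exact arithmetic. The Python sketch below is a minimal illustration using cofactor expansion, which is fine for 3 x 3 matrices but inefficient for large ones. The names `det`, `leading_principal_minors`, `B` and `C` are illustrative.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_principal_minors(m):
    """Determinants of the top-left 1x1, 2x2, ..., nxn submatrices."""
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

F = Fraction
B = [[F(2, 3), 0, 0], [F(1, 6), F(3, 4), F(1, 3)], [F(1, 6), F(1, 4), F(2, 3)]]
C = [[1, 0, 1], [0, 2, 1], [1, 1, 3]]

print(leading_principal_minors(B))
print(leading_principal_minors(C))
```

All minors being positive matches the hand computations for both matrices.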
* Exercise 3.
Find the critical points of the following functions and assess whether they are minima,
maxima or saddle points:
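a) $f(x, y) = \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} a & b \\ b & c \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} d & e \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$, where a, b, c, d, e are constants
Solution:
From 2a) we know that the function written out becomes:
$f(x, y) = \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} a & b \\ b & c \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} d & e \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2bxy + cy^2 + dx + ey$
The two first-order conditions become:
$\frac{\partial f}{\partial x} = 2ax + 2by + d = 0$
$\frac{\partial f}{\partial y} = 2bx + 2cy + e = 0$
Note that we can write this as:
$2\begin{pmatrix} a & b \\ b & c \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} d \\ e \end{pmatrix} = 0$
Basically, f is a quadratic function, but now in two dimensions (in fact, functions of the form
$x^T A x + b^T x + c$ are called quadratic functions). The rule for its derivative is very similar to
the one-dimensional case. We could solve for the critical point (x,y) by solving this matrix
equation, but we don't really care about the outcome, so we move on to the second order
condition.
We have to check the Hessian:
$H = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = 2\begin{pmatrix} a & b \\ b & c \end{pmatrix}$
Again, the Hessian of this function looks very much like the second derivative of a
one-dimensional quadratic function. Furthermore, we have seen in question 2a) under what conditions this
matrix is positive definite ($ac - b^2 > 0$, a > 0 and c > 0) and when it is negative definite
($ac - b^2 > 0$, a < 0 and c < 0). The first case corresponds to a minimum, the second to a
maximum.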
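*b) $f(x, y, z) = \begin{pmatrix} x & y & z \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} + 5$
Solution:
We can apply the same result that we saw in question 3a): the derivative of a quadratic
function is:
$\nabla f(x, y, z) = 2\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 0$
Here we write $\nabla f(x, y, z)$ for the column vector of partial derivatives of f(x,y,z). This is here
nothing more than a short-hand, although $\nabla f(x, y, z)$, called the gradient of f, is a useful thing
with interesting properties, which we do not study here.
Our equation leads to the matrix equation:
$\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} \\ -1 \\ -\frac{3}{2} \end{pmatrix}$
which we can solve by sweeping to find: $x = -\frac{1}{6}$, $y = -\frac{1}{3}$, $z = -\frac{1}{3}$. So our critical point is
$\begin{pmatrix} -\frac{1}{6} \\ -\frac{1}{3} \\ -\frac{1}{3} \end{pmatrix}$. We know that the Hessian is equal to twice the matrix $\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}$. We saw in question 2c) that this is a
positive definite matrix, so we have a minimum here.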
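The critical point can be double-checked by solving the first-order conditions exactly. The Python sketch below assumes the quadratic function has matrix A = [[1, 0, 1], [0, 2, 1], [1, 1, 3]] and linear term (1, 2, 3), so that the conditions read 2Ax + (1, 2, 3)' = 0. It uses Cramer's rule in place of the "sweeping" (Gaussian elimination) mentioned in the solution; the names `det3` and `cramer` are illustrative.

```python
from fractions import Fraction

A = [[1, 0, 1], [0, 2, 1], [1, 1, 3]]
# Right-hand side of A x = -(1/2) * (1, 2, 3)'
b = [Fraction(-1, 2), Fraction(-1), Fraction(-3, 2)]

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cramer(m, rhs):
    """Solve m x = rhs for a 3x3 system with non-zero determinant."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution

x, y, z = cramer(A, b)
print(x, y, z)

# The gradient 2 A x + (1, 2, 3)' should vanish at the critical point
point = (x, y, z)
gradient = [2 * sum(A[r][c] * point[c] for c in range(3)) + (r + 1) for r in range(3)]
print(gradient)
```

The gradient evaluates to the zero vector at the computed point, confirming the first-order conditions.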
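*c) $f(x, y, z) = xyz - x^2 - 3y^2 - \log(z)$
Solution:
We compute the partial derivatives and set them equal to zero:
$\frac{\partial f(x, y, z)}{\partial x} = yz - 2x = 0$
$\frac{\partial f(x, y, z)}{\partial y} = xz - 6y = 0$
$\frac{\partial f(x, y, z)}{\partial z} = xy - \frac{1}{z} = 0 \Rightarrow xyz = 1$, $yz = \frac{1}{x}$, $xz = \frac{1}{y}$
Plugging these last two equalities back into the first two partial derivatives, we get
$\frac{1}{x} - 2x = 0 \Rightarrow x^2 = \frac{1}{2} \Rightarrow x = \sqrt{\frac{1}{2}}$ or $x = -\sqrt{\frac{1}{2}}$
$\frac{1}{y} - 6y = 0 \Rightarrow y^2 = \frac{1}{6} \Rightarrow y = \sqrt{\frac{1}{6}}$ or $y = -\sqrt{\frac{1}{6}}$
Using now the fact that $xyz = 1$, we find $z^2 = 12 \Rightarrow z = \pm 2\sqrt{3}$. If we make sure that the signs
work out correctly (for xyz to be positive, we must have an even number of negative
numbers), we find that there are 4 critical points:
$\begin{pmatrix} \sqrt{1/2} \\ \sqrt{1/6} \\ 2\sqrt{3} \end{pmatrix}$, $\begin{pmatrix} -\sqrt{1/2} \\ -\sqrt{1/6} \\ 2\sqrt{3} \end{pmatrix}$, $\begin{pmatrix} -\sqrt{1/2} \\ \sqrt{1/6} \\ -2\sqrt{3} \end{pmatrix}$, $\begin{pmatrix} \sqrt{1/2} \\ -\sqrt{1/6} \\ -2\sqrt{3} \end{pmatrix}$.
(Strictly speaking, $\log(z)$ is only defined for z > 0, so only the two points with $z = 2\sqrt{3}$ lie in the domain of f.)
To see whether these are maxima, minima or saddle points, we have to calculate the Hessian:
$H = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \end{pmatrix} = \begin{pmatrix} -2 & z & y \\ z & -6 & x \\ y & x & \frac{1}{z^2} \end{pmatrix}$
In general, it could be a lot of work to check for all four points whether this matrix will be
positive or negative definite. However, in this case we observe that the first two diagonal
entries are negative, while the third is positive (it is a square). Since positive definiteness
requires all diagonal elements positive, while negative definiteness requires all diagonal
elements negative, clearly these Hessians can be neither. Therefore all critical points are
saddle points.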
d) $f(x, y) = \log(x) + \log(y) + \log(1 - y - x)$ (hint: check if it is concave first)
Solution:
As per the hint we check for concavity. We first compute the partial derivatives:
$\frac{\partial f(x, y)}{\partial x} = \frac{1}{x} + \frac{1}{y + x - 1}$
$\frac{\partial f(x, y)}{\partial y} = \frac{1}{y} + \frac{1}{y + x - 1}$
We can show that the partial derivatives are zero for $x = y = \frac{1}{3}$
Next the Hessian of the function:
$H = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} -\frac{1}{x^2} - \frac{1}{(x + y - 1)^2} & -\frac{1}{(x + y - 1)^2} \\ -\frac{1}{(x + y - 1)^2} & -\frac{1}{y^2} - \frac{1}{(x + y - 1)^2} \end{pmatrix}$
We compute the eigenvalues of the Hessian at $x = y = \frac{1}{3}$:
$\begin{vmatrix} -18 - \lambda & -9 \\ -9 & -18 - \lambda \end{vmatrix} = 0$
$(-18 - \lambda)^2 = 81$
$\lambda_1 = -27$, $\lambda_2 = -9$
Both eigenvalues of H are negative, so that the matrix H is negative definite (ND).
Consequently, at $x = y = \frac{1}{3}$ the function reaches a maximum.
One could also argue that H is a negative definite matrix, because the diagonal entries are
negative (minus squares), while the determinant is:
$\det H = \det\begin{pmatrix} a + z & z \\ z & b + z \end{pmatrix} = (a + z)(b + z) - z^2 = ab + az + bz$
Here I define $a = -\frac{1}{x^2}$, $b = -\frac{1}{y^2}$ and $z = -\frac{1}{(x + y - 1)^2}$ to simplify the expression. Because a, b and z are all negative, this
determinant must be positive. So we see the Hessian is negative definite everywhere and
therefore the function is concave. This means that whatever critical point we find, it will be a
maximum. Furthermore, for a strictly concave function, there is only 1 critical point.
Looking at the symmetry of the function, we might wish to guess what the critical point is. If
we do this successfully and find that all the partial derivatives are zero there, we are done.
We pick $x = y = \frac{1}{3}$. When we evaluate the partial derivatives we found earlier, we see that we
indeed get zero, so we are done.
e) $f(x, y) = x^4 + 2x^2y^2 + y^4$
Solution:
We first rewrite our function:
$f(x, y) = x^4 + 2x^2y^2 + y^4 = (x^2 + y^2)^2$
Next we compute the first order conditions:
$\frac{\partial f(x, y)}{\partial x} = 4x(x^2 + y^2) = 0$
$\frac{\partial f(x, y)}{\partial y} = 4y(x^2 + y^2) = 0$
From these it follows that x=y=0 is the only critical point. Now let us compute the Hessian at
this point:
$H = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 4(x^2 + y^2) + 8x^2 & 8xy \\ 8xy & 4(x^2 + y^2) + 8y^2 \end{pmatrix}$
Evaluated at zero, this becomes:
$H = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$
This matrix is not positive definite, but still the function is at a minimum. This is another
example of the fact (already observed in question 1c)) that the second-order condition is
sufficient, but not necessary for a minimum.
But we still have to show that the function is indeed at a minimum. Fortunately, this is not so hard
to do. Recall that for a point (x,y), $x^2 + y^2$ is the square of the distance to the origin, i.e. the
point (0,0). So what the function f(x,y) does is give us the square of that number. Clearly, for
any point but (0,0) itself, our critical point, this will give a positive number. So the function is
everywhere positive, except at the point (0,0), where it is 0. Thus (0,0) must be a minimum.
Tutorial of Friday
Question 4
$\max f(\mathbf{x}) = -x^2 - y^2$
s.t. $g_1(\mathbf{x}) = x + 5 \le 0$ and $g_2(\mathbf{x}) = x + y + 7 \le 0$
Solution
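Let's first try the case where only the first constraint is binding, i.e. $x + 5 = 0$, but $x + y + 7 < 0$.
Then (1.4) becomes:
\[
\frac{\partial f(\mathbf{x}^*)}{\partial x} = \sum_{i \in I_{\mathbf{x}^*}} \lambda_i \frac{\partial g_i(\mathbf{x}^*)}{\partial x} = \lambda \frac{\partial g_1(\mathbf{x}^*)}{\partial x}
\quad \text{and} \quad
\frac{\partial f(\mathbf{x}^*)}{\partial y} = \sum_{i \in I_{\mathbf{x}^*}} \lambda_i \frac{\partial g_i(\mathbf{x}^*)}{\partial y} = \lambda \frac{\partial g_1(\mathbf{x}^*)}{\partial y},
\]
which gives
\[ -2x = \lambda \quad \text{and} \quad -2y = 0 \]
We find that $y = 0$ and from $x + 5 = 0 \Rightarrow x = -5$ that $\lambda = 10 > 0$, so $(-5, 0)$ appears to be a
KKT point. However, note that at $(-5, 0)$ the second constraint does not hold:
$x + y + 7 = -5 + 7 = 2 > 0$. So $(-5, 0)$ is not a KKT point after all and we conclude that there
are none with only the first constraint binding.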
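Let's suppose instead that only the second constraint is binding, i.e. $x + y + 7 = 0$, but
$x + 5 < 0$. Then (1.4) becomes:
\[
\frac{\partial f(\mathbf{x}^*)}{\partial x} = \lambda \frac{\partial g_2(\mathbf{x}^*)}{\partial x}
\quad \text{and} \quad
\frac{\partial f(\mathbf{x}^*)}{\partial y} = \lambda \frac{\partial g_2(\mathbf{x}^*)}{\partial y}
\]
\[ -2x = \lambda \quad \text{and} \quad -2y = \lambda \;\Rightarrow\; x = y \]
Together with $x + y + 7 = 0$ this gives $x = y = -3\tfrac{1}{2}$ and hence $\lambda = 7 > 0$. However, now the
other constraint is not met: $x + 5 = 1\tfrac{1}{2} > 0$. We conclude that there is no KKT point with only
the last constraint binding.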
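Finally, let's assume that both constraints are binding, i.e. $x + 5 = 0$ and $x + y + 7 = 0$. Notice
that this immediately gives us that $x = -5$ and $y = -2$. So we only have to check if this point
is KKT:
\[
\frac{\partial f(\mathbf{x}^*)}{\partial x} = \sum_{i \in I_{\mathbf{x}^*}} \lambda_i \frac{\partial g_i(\mathbf{x}^*)}{\partial x} = \lambda_1 \frac{\partial g_1(\mathbf{x}^*)}{\partial x} + \lambda_2 \frac{\partial g_2(\mathbf{x}^*)}{\partial x}
\quad \text{and} \quad
\frac{\partial f(\mathbf{x}^*)}{\partial y} = \sum_{i \in I_{\mathbf{x}^*}} \lambda_i \frac{\partial g_i(\mathbf{x}^*)}{\partial y} = \lambda_1 \frac{\partial g_1(\mathbf{x}^*)}{\partial y} + \lambda_2 \frac{\partial g_2(\mathbf{x}^*)}{\partial y}
\]
\[ -2x = \lambda_1 + \lambda_2 \quad \text{and} \quad -2y = \lambda_2 \]
\[ \lambda_2 = 4 > 0, \quad \lambda_1 = 6 > 0 \]
We find that both multipliers are positive and, by assumption, all constraints hold, so $(-5, -2)$
is a solution to (1.3). Notice that if we had started checking the last case, it would hardly have
been necessary to check the first two. After all, we already found a maximizer and, with some
smart reasoning, we could have excluded the possibility of other maximizers. This is true in
general: if you are either lucky or smart in checking the right case first, you will have less
work to do.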
We claimed above that $(-5, -2)$ is a solution to (1.3). However, for that we have to check one
more thing. We have to see if the level sets are indeed convex. In general that can be a hard
problem, so we again develop a sufficient condition.
With the lemma at our disposal, all we have to do is check if our objective function is concave
and our constraints (at least the binding ones) are convex. This can be a bit of a chore, but in
economics the functions in which we are interested are almost always appropriately concave
or convex. This is not just for mathematical convenience, but because the problems we study
in economics often have unique maximizing outcomes where our constraints are binding. This
basically just reflects the fact that economics is the study of scarcity (optimizing under binding
constraints).
In (1.3) checking these conditions isn't too hard. First note that the constraints are linear and
therefore both concave and convex. They are certainly not a problem. For the objective
function, notice that it is a quadratic function in two variables:
\[
f(x, y) = -x^2 - y^2 = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
\]
As we will see in exercise 3, such a function is concave if the matrix
$\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ is negative definite. We observe that the diagonal elements
are negative and the determinant is $(-1)^2 = 1 > 0$, so it is negative definite. (Alternatively
and more interestingly we could consider the eigenvalues of $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$. Notice that the matrix
is minus the identity matrix, so it gives minus the input vector as an outcome.
Therefore any vector is an eigenvector with eigenvalue -1. This eigenvalue is negative, so we
again conclude that the matrix is negative definite).
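The case-by-case search used in this question can be sketched in code. The following Python fragment is an illustration only, under the assumption that the problem reads max -x^2 - y^2 subject to x + 5 <= 0 and x + y + 7 <= 0; the helper names `solve_linear`, `kkt_candidate` and `is_kkt` are hypothetical. For every choice of binding constraints it solves the stationarity conditions together with the binding constraints (a linear system here, because the objective is quadratic and the constraints are linear) and then checks feasibility and the sign of the multipliers.

```python
from fractions import Fraction
from itertools import combinations

# Constraints a*x + b*y + c <= 0, as read from the problem statement.
CONSTRAINTS = [(1, 0, 5), (1, 1, 7)]  # g1 = x + 5, g2 = x + y + 7

def solve_linear(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

def kkt_candidate(active):
    """Stationarity of f = -x^2 - y^2 plus the active constraints as equalities."""
    k = len(active)
    A, rhs = [], []
    # -2x = sum(lam_i * a_i)  and  -2y = sum(lam_i * b_i)
    A.append([2, 0] + [CONSTRAINTS[i][0] for i in active]); rhs.append(0)
    A.append([0, 2] + [CONSTRAINTS[i][1] for i in active]); rhs.append(0)
    for i in active:
        a, b, c = CONSTRAINTS[i]
        A.append([a, b] + [0] * k); rhs.append(-c)
    sol = solve_linear(A, rhs)
    return sol[0], sol[1], sol[2:]

def is_kkt(x, y, lams):
    """Feasible for all constraints and all multipliers non-negative."""
    feasible = all(a * x + b * y + c <= 0 for a, b, c in CONSTRAINTS)
    return feasible and all(l >= 0 for l in lams)

results = {}
for k in (1, 2):
    for active in combinations(range(len(CONSTRAINTS)), k):
        x, y, lams = kkt_candidate(active)
        results[active] = (x, y, lams, is_kkt(x, y, lams))
        print(active, x, y, lams, results[active][3])
```

Only the case with both constraints binding passes both checks, which matches the hand calculation.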
Question 5
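\[
\max f(x, y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} -1 & -0.5 \\ -0.5 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} -x - 0.5y & -0.5x - y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
= -x^2 - xy - y^2
\]
subject to $x + y \le -2$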
Solution:
First we have to check whether our objective function f is concave and our constraint is
convex. Notice that we wrote f as a matrix product, but that it is just a normal function
$f: \mathbb{R}^2 \to \mathbb{R}$. To check for concavity we compute the Hessian:
\[
H_f = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}
= \begin{pmatrix} -2 & -1 \\ -1 & -2 \end{pmatrix}.
\]
We recognize this as twice the matrix from the original
function. This is a general result. A function $\mathbf{x}^T A \mathbf{x}$ (with A symmetric) has Hessian $2A$. Such a function is called
quadratic. That is a generalization of ordinary one-variable quadratic functions to functions of
more variables. Anyway, let's check if the Hessian is negative (semi)-definite. One way of
doing this is computing the eigenvalues and checking if they are non-positive.
\[
\begin{vmatrix} -2 - \lambda & -1 \\ -1 & -2 - \lambda \end{vmatrix}
= (-2 - \lambda)^2 - 1 = \lambda^2 + 4\lambda + 4 - 1 = \lambda^2 + 4\lambda + 3 = 0
\]
\[ (\lambda + 3)(\lambda + 1) = 0 \;\Rightarrow\; \lambda = -3 \,\lor\, \lambda = -1 \]
So we find that both eigenvalues are negative, therefore the Hessian is negative definite
(and in particular negative semi-definite) and the function is concave.
For the constraint function, $g(x, y) = x + y + 2$, we observe that this is a linear function,
which is always convex and so the constraint is convex. Our functions are appropriately
concave and convex, so all we have to do is find KKT-points and we're done.
There is only one constraint, so let's try it with that constraint binding. Then we must have:
\[
\nabla f(x^*, y^*) = \lambda \nabla g(x^*, y^*) \;\Rightarrow\;
\begin{cases} -2x^* - y^* = \lambda \\ -x^* - 2y^* = \lambda \end{cases}
\]
This gives us two equations with three unknowns, but we also have the fact that we want our
constraint binding, to which we return shortly. We usually first try to eliminate $\lambda$ from the
first two equations.
\[ -2x^* - y^* = -x^* - 2y^* \;\Rightarrow\; x^* = y^* \]
We combine this with the binding constraint: $x^* + y^* = -2 \Rightarrow 2y^* = -2 \Rightarrow y^* = -1$, so we find
that $x^* = -1, y^* = -1$ is our candidate KKT-point. Finally we have to check if $\lambda \ge 0$:
$\lambda = -2x^* - y^* = 2 \cdot 1 + 1 = 3 > 0$. So we have found a maximum (strictly speaking we have to
check that f ( x* , y* ) 0 , but that is rather trivial).
For our elucidation, we draw a picture of the level curves at our maximum.
Figure 13 A picture of the feasible set and the upper level curve at our maximum, together with the (minus)
gradients.
Question 6. (an economic example with multiple constraints)
A firm has a fixed budget of 100 million euro for the production of socks. As is well known,
the production of socks involves either the highly environmentally damaging method X or the
much more neutral method Y. In particular the production function looks like this:
$Q(X, Y) = 15\sqrt{X} + \sqrt{Y}$, where X and Y are expressed in millions of hours worked with the
respective methods. As can be seen from the production function, method X is much more
effective. However, working with method X is rather dangerous (don't take your socks for
granted!) and hence requires a higher wage compensation for labour. This gives rise to a wage
constraint: $40X + 10Y \le 100$. In other words, labour in method X costs 40 euro per hour,
while labour in method Y costs 10 euro per hour. Furthermore, the government, worried about
the detrimental effects of method X, imposes a strict cap: $X \le 1$. What is the firm's optimal
division of labour between the two methods?
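Solution:
In terms of (1.2) our problem looks like this:
$\max Q(X, Y) = 15\sqrt{X} + \sqrt{Y}$
subject to $40X + 10Y \le 100$ and $X \le 1$
Let's first check if our functions are appropriate:
\[
\nabla Q(X, Y) = \begin{pmatrix} \frac{15}{2} X^{-1/2} \\ \frac{1}{2} Y^{-1/2} \end{pmatrix}
\quad \text{and the Hessian is:} \quad
H_Q = \begin{pmatrix} -\frac{15}{4} X^{-3/2} & 0 \\ 0 & -\frac{1}{4} Y^{-3/2} \end{pmatrix}.
\]
The diagonal elements are negative, while
\[
\begin{vmatrix} -\frac{15}{4} X^{-3/2} & 0 \\ 0 & -\frac{1}{4} Y^{-3/2} \end{vmatrix} = \frac{15}{16} X^{-3/2} Y^{-3/2} > 0,
\]
so the matrix is negative definite and the production
function is strictly concave. Both constraints are linear and thus no problem. Hence we only
have to check the KKT-conditions. But we still have several possibilities as for which are
binding. Let's start by assuming that both are binding, so $40X + 10Y = 100$ and $X = 1$.[1]
Together they imply that $Y = 6$. So we have to check whether the point (1, 6) is a strict KKT-
point.
\[
\nabla Q(X, Y) = \begin{pmatrix} \frac{15}{2} X^{-1/2} \\ \frac{1}{2} Y^{-1/2} \end{pmatrix} = \begin{pmatrix} \frac{15}{2} \\ \frac{1}{2\sqrt{6}} \end{pmatrix}
= \lambda_1 \nabla g_1(X, Y) + \lambda_2 \nabla g_2(X, Y)
= \lambda_1 \begin{pmatrix} 40 \\ 10 \end{pmatrix} + \lambda_2 \begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 40\lambda_1 + \lambda_2 \\ 10\lambda_1 \end{pmatrix}
\]
From this we obtain:
\[
\lambda_1 = \frac{1}{20\sqrt{6}} > 0, \qquad \lambda_2 = \frac{15}{2} - \frac{2}{\sqrt{6}} = \frac{15\sqrt{6} - 4}{2\sqrt{6}} > 0.
\]
We find that both our multipliers are
positive and hence we have a maximum. That basically means we're done! Might there not be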
other KKT-points where fewer of the constraints are binding, you ask? Well, the point that
we found is a global maximizer, so any other point that is also a global maximizer should give
the same outcome (the same amount of socks produced). But if we keep one of the constraints
binding, while relaxing the other one, we move away from our optimal point $(X^*, Y^*)$ along a
straight line (e.g. the line $40X + 10Y = 100$). Our function will not keep the same value along
this straight line, because it is not linear. It will change value, and because we are at a
maximum, it can only decrease. Now you might still wonder: could it not increase again, so
much so that we obtain another global maximum? Well, no. Let's imagine that it did, so that at
another point $(X', Y')$ we obtain again our maximum. But, as we know, somewhere along the
straight line $40X + 10Y = 100$ between $(X^*, Y^*)$ and $(X', Y')$ there is a point $(\tilde{X}, \tilde{Y})$ that gives a
value less than that of the maximum. But we know that the upper level set $U_{Q, Q(X^*, Y^*)}$ is
convex, because Q is concave, and we know that $(X^*, Y^*)$ and $(X', Y')$ are both elements of that
upper level set. $(\tilde{X}, \tilde{Y})$ is on a straight line between them and hence by the convexity of
$U_{Q, Q(X^*, Y^*)}$, $(\tilde{X}, \tilde{Y}) \in U_{Q, Q(X^*, Y^*)}$, but this contradicts the fact that $(\tilde{X}, \tilde{Y})$ is not a maximizer.
Hence we find that there can be no other maximizer than $(X^*, Y^*)$. What we did here was use
the strict concavity of Q to show that our maximum is unique. This is a useful rule to
remember: if your function is strictly concave then any maximum that you find is unique.
[1] For the rest of this example we drop the * from $X^*, Y^*$, for ease of reading (and typing!), but
do recall that the relations that we derive hold only for special points and not for general X
and Y.
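As a rough numerical cross-check of this question, the sketch below (illustrative Python, not part of the original solution) recomputes the two multipliers at the point (1, 6) and compares that point with a crude grid search over the feasible set. The grid resolution is an arbitrary assumption, and the production function is taken to be Q(X, Y) = 15 sqrt(X) + sqrt(Y) with constraints 40X + 10Y <= 100 and X <= 1.

```python
import math

def Q(X, Y):
    """Production function Q(X, Y) = 15*sqrt(X) + sqrt(Y)."""
    return 15.0 * math.sqrt(X) + math.sqrt(Y)

# Candidate point with both constraints binding.
X_star, Y_star = 1.0, 6.0

# Solve grad Q = lam1 * (40, 10) + lam2 * (1, 0) at the candidate point.
dQ_dX = 7.5 / math.sqrt(X_star)
dQ_dY = 0.5 / math.sqrt(Y_star)
lam1 = dQ_dY / 10.0
lam2 = dQ_dX - 40.0 * lam1

# Crude grid search over the feasible set (step 0.01 in both directions).
best = max(
    (Q(i / 100.0, j / 100.0), i / 100.0, j / 100.0)
    for i in range(0, 101)            # 0 <= X <= 1
    for j in range(0, 1001)           # 0 <= Y <= 10
    if 40.0 * (i / 100.0) + 10.0 * (j / 100.0) <= 100.0 + 1e-9
)
print(lam1, lam2)   # both positive
print(best)         # grid maximum, to compare with Q(1, 6)
```

On this grid the best feasible point coincides with (1, 6), which is what the uniqueness argument predicts.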
ADV MATH ADDITIONAL EXERCISES WEEK 5
Additional exercise 1.
$\max f(x, y) = x + 3y$
subject to $(x - 3)^2 + (y - 4)^2 \le 4$
Solution:
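We first have to check whether our functions are appropriately concave and convex again. We
see that f is linear and hence we can immediately conclude that it is concave.
For the constraint function $g(x, y) = (x - 3)^2 + (y - 4)^2 - 4$ we first note that we can rewrite this
as
\[
g(x, y) = (x - 3)^2 + (y - 4)^2 - 4 = x^2 - 6x + 9 + y^2 - 8y + 16 - 4
= \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -6 & -8 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + 21
\]
This function is of a more general quadratic form than we encountered in the previous
example. The general form is $h(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c$, where A is a matrix, b is a vector and c
is a number. Here too, as in the previous case, the Hessian is $2A$, while the gradient is
$\nabla h(\mathbf{x}) = 2A\mathbf{x} + \mathbf{b}$. (This should rather remind you of the first and second derivatives of a
quadratic function of 1 variable). Let's check in this case. First we calculate the gradient
(which will come in handy in a bit):
\[
\nabla g(x, y) = \begin{pmatrix} \frac{\partial g}{\partial x} \\ \frac{\partial g}{\partial y} \end{pmatrix}
= \begin{pmatrix} 2(x - 3) \\ 2(y - 4) \end{pmatrix} = \begin{pmatrix} 2x - 6 \\ 2y - 8 \end{pmatrix}
= 2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -6 \\ -8 \end{pmatrix}
\]
(Observe that the general rule works out here). Next we calculate the Hessian:
\[
H_g = \begin{pmatrix} \frac{\partial^2 g}{\partial x^2} & \frac{\partial^2 g}{\partial x \partial y} \\ \frac{\partial^2 g}{\partial x \partial y} & \frac{\partial^2 g}{\partial y^2} \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
\]
(This one also obeys the general rule. That rule will considerably simplify our future
calculations. Quadratic functions are easy to handle).
By now we can immediately observe that this Hessian is positive definite and hence the
function convex. So our functions have the appropriate properties and all that is left to do is
find KKT-points.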
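\[
\nabla f(x^*, y^*) = \lambda \nabla g(x^*, y^*) \;\Rightarrow\;
\begin{cases} 1 = \lambda(2x - 6) \\ 3 = \lambda(2y - 8) \end{cases}
\]
As a matter of technique we usually want to get rid of the $\lambda$ first and a good way of doing
that is often to divide the equations on both sides:
\[
\frac{1}{3} = \frac{\lambda(2x - 6)}{\lambda(2y - 8)} = \frac{2x - 6}{2y - 8}
\;\Rightarrow\; 2y - 8 = 6x - 18 \;\Rightarrow\; y = 3x - 5
\]
Next we use the binding constraint:
\[
(x - 3)^2 + (y - 4)^2 = 4 \;\Rightarrow\; x^2 - 6x + 9 + (3x - 9)^2 = 4 \;\Rightarrow\; 10x^2 - 60x + 86 = 0
\]
With the abc-formula we find:
\[
x = \frac{60 \pm \sqrt{3600 - 4 \cdot 10 \cdot 86}}{20} \;\Rightarrow\; x = 3 + \frac{\sqrt{10}}{5} \,\lor\, x = 3 - \frac{\sqrt{10}}{5}
\]
and respectively $y = 4 + \frac{3\sqrt{10}}{5} \,\lor\, y = 4 - \frac{3\sqrt{10}}{5}$. So we have two possible KKT-points. Let's check $\lambda$
for each. We get:
\[
1 = \lambda(2x - 6) \;\Rightarrow\; \lambda = \frac{1}{2x - 6}
\]
\[
\lambda = \frac{1}{2\left(3 + \frac{\sqrt{10}}{5}\right) - 6} = \frac{5}{2\sqrt{10}} > 0, \qquad
\lambda = \frac{1}{2\left(3 - \frac{\sqrt{10}}{5}\right) - 6} = -\frac{5}{2\sqrt{10}} < 0
\]
So $\lambda$ is positive only for the point $x^* = 3 + \frac{\sqrt{10}}{5}$, $y^* = 4 + \frac{3\sqrt{10}}{5}$, which is the solution to our
problem. We once again make a graph: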
Figure 14
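The two candidate points of additional exercise 1 can be verified numerically. This is a small illustrative Python sketch (the names `g` and `candidates` are introduced here, and the constraint is assumed to be (x - 3)^2 + (y - 4)^2 <= 4): for each root it checks that the constraint binds, that the second first-order condition holds, and which multiplier is positive.

```python
import math

def g(x, y):
    """Constraint function g(x, y) = (x - 3)^2 + (y - 4)^2 - 4."""
    return (x - 3.0) ** 2 + (y - 4.0) ** 2 - 4.0

candidates = []
for sign in (+1.0, -1.0):
    x = 3.0 + sign * math.sqrt(10.0) / 5.0
    y = 3.0 * x - 5.0                 # from dividing the two first-order conditions
    lam = 1.0 / (2.0 * x - 6.0)       # from 1 = lam * (2x - 6)
    candidates.append((x, y, lam))

for x, y, lam in candidates:
    # g should vanish (binding constraint) and 3 = lam * (2y - 8) should hold.
    print(x, y, lam, g(x, y), lam * (2.0 * y - 8.0))

# The KKT point is the candidate with the positive multiplier.
x_star, y_star, lam_star = max(candidates, key=lambda c: c[2])
```

Only the candidate with the plus sign has a positive multiplier, so it is the one that survives the KKT check.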
Additional exercise 2.
$\max f(x, y) = \log(x) + 3\log(y)$
subject to $x + y \le 3$ and $\log(2x) \ge -2$
Solution:
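We first check whether our functions are concave and convex. We start with f.
\[
\nabla f(x, y) = \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix} = \begin{pmatrix} \frac{1}{x} \\ \frac{3}{y} \end{pmatrix}
\]
Hence the Hessian is:
\[
H_f = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}
= \begin{pmatrix} -\frac{1}{x^2} & 0 \\ 0 & -\frac{3}{y^2} \end{pmatrix}.
\]
The diagonal elements are negative for any x, y (notice
that the function is only defined for positive x and y, but we don't have to invoke that here, as
squares are always positive) and the determinant of the matrix is
\[
\begin{vmatrix} -\frac{1}{x^2} & 0 \\ 0 & -\frac{3}{y^2} \end{vmatrix} = 3x^{-2}y^{-2} > 0,
\]
so the Hessian is negative definite and f is concave.
The first constraint is linear and thus certainly convex. For the second constraint we have to
be careful in rewriting it: the function should be of the form $g(\mathbf{x}) \le 0$, so we must write:
$\log(2x) \ge -2 \Rightarrow g_2(x) = -2 - \log(2x) = -2 - \log(2) - \log(x) \le 0$. This is a function of only one
variable, so we only have to check the ordinary second order derivative.
$g_2'(x) = -\frac{1}{x} \Rightarrow g_2''(x) = \frac{1}{x^2} > 0$, so the function is convex.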
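Everything is in order and we proceed to finding KKT-points. We have several constraints and
we don't know in advance which will be binding. So we have to systematically work our way
through the possible combinations. Let's first assume that only the second constraint is
binding, so $\log(2x) = -2$, but $x + y < 3$. We obtain for the KKT-condition:
\[
\nabla f(x^*, y^*) = \lambda_2 \nabla g_2(x^*, y^*) \;\Rightarrow\;
\begin{cases}
\frac{\partial f(x^*, y^*)}{\partial x} = \frac{1}{x^*} = \lambda_2 \frac{\partial g_2(x^*, y^*)}{\partial x} = -\frac{\lambda_2}{x^*} \\
\frac{\partial f(x^*, y^*)}{\partial y} = \frac{3}{y^*} = \lambda_2 \frac{\partial g_2(x^*, y^*)}{\partial y} = 0
\end{cases}
\]
We see immediately that the second equation, $\frac{3}{y^*} = 0$, has no solution and that the first
equation gives $\lambda_2 = -1 < 0$. Because $\lambda_2 < 0$, we conclude that
we cannot have a KKT-point here and thus that there is no solution with only the second
constraint binding.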
For the second case, let's assume both constraints are binding, so $\log(2x) = -2$ and $x + y = 3$,
which implies $x = \frac{1}{2e^2}$, $y = \frac{6e^2 - 1}{2e^2}$. So we only have to check the KKT-condition for this
point, but the KKT-condition is a bit more complicated.
\[
\nabla f(x^*, y^*) = \lambda_1 \nabla g_1(x^*, y^*) + \lambda_2 \nabla g_2(x^*, y^*) \;\Rightarrow\;
\begin{pmatrix} \frac{1}{x^*} \\ \frac{3}{y^*} \end{pmatrix} = \begin{pmatrix} 2e^2 \\ \frac{6e^2}{6e^2 - 1} \end{pmatrix}
= \lambda_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \lambda_2 \begin{pmatrix} -\frac{1}{x^*} \\ 0 \end{pmatrix}
= \begin{pmatrix} \lambda_1 - 2e^2 \lambda_2 \\ \lambda_1 \end{pmatrix}
\]
From this we obtain:
\[
\lambda_1 = \frac{6e^2}{6e^2 - 1} > 0,
\]
\[
2e^2 = \frac{6e^2}{6e^2 - 1} - 2e^2 \lambda_2 \;\Rightarrow\; 2\lambda_2 = \frac{6}{6e^2 - 1} - 2 = 0.14\ldots - 2 \;\Rightarrow\; \lambda_2 < 0
\]
So we find a negative multiplier, which means our point is not KKT. Therefore there is no
maximum when both constraints are binding.
Let's finally assume that only the first constraint is binding, i.e. $x + y = 3$, but $\log(2x) > -2$.
We obtain for the KKT-condition:
\[
\nabla f(x^*, y^*) = \begin{pmatrix} \frac{1}{x^*} \\ \frac{3}{y^*} \end{pmatrix} = \lambda_1 \nabla g_1(x^*, y^*) = \lambda_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\;\Rightarrow\; \frac{1/x^*}{3/y^*} = \frac{y^*}{3x^*} = 1 \;\Rightarrow\; y^* = 3x^*
\]
Together with the binding constraint this implies:
\[
y^* = 3x^*, \; x^* + y^* = 3 \;\Rightarrow\; y^* = 3(3 - y^*) \;\Rightarrow\; y^* = \frac{9}{4}
\]
From this we obtain $x^* = 3 - y^* = \frac{3}{4}$. Let's check if this point is feasible, i.e. whether $\log(2x^*) = \log(\frac{3}{2}) > -2$.
Indeed $\log(\frac{3}{2}) = 0.41\ldots > -2$. So far so good, let's find the multiplier.
$\lambda_1 = \frac{1}{x^*} = \frac{4}{3} > 0$, so all the conditions for a maximum are met and we have our solution. Again,
if we would have been lucky enough to start with the third case, our work would have been
finished much sooner. (In fact I was so lucky when I solved this problem, but I didn't want to
keep you from having fun with the other cases).
Additional exercise 3.
Maximize $U(x, y) = x^{\alpha} y^{\beta}$, $\alpha + \beta = 1$, subject to $px + y = I$. Do this first by substitution and
then by Lagrange multipliers.
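Solution:
Substitution:
$px + y = I \Rightarrow y = I - px$, so
\[ U(x, y) = x^{\alpha} y^{\beta} \;\Rightarrow\; U(x) = x^{\alpha}(I - px)^{\beta} \]
\[ \frac{\partial U}{\partial x} = \alpha x^{\alpha - 1}(I - px)^{\beta} - \beta p x^{\alpha}(I - px)^{\beta - 1} = 0 \]
\[ \alpha x^{\alpha - 1}(I - px)^{\beta} = \beta p x^{\alpha}(I - px)^{\beta - 1} \]
\[ \alpha(I - px) = \beta p x \;\Rightarrow\; \alpha I = (\alpha + \beta) p x \]
\[ x = \frac{\alpha I}{p} \quad (\text{remember } \alpha + \beta = 1) \]
\[ y = I - px = (1 - \alpha) I = \beta I \]
Lagrange:
\[ L(x, y, \lambda) = x^{\alpha} y^{\beta} - \lambda(px + y - I) \]
\[ \frac{\partial L}{\partial x} = \alpha x^{\alpha - 1} y^{\beta} - \lambda p = 0 \;\Rightarrow\; \alpha x^{\alpha - 1} y^{\beta} = \lambda p \]
\[ \frac{\partial L}{\partial y} = \beta x^{\alpha} y^{\beta - 1} - \lambda = 0 \;\Rightarrow\; \beta x^{\alpha} y^{\beta - 1} = \lambda \]
For the problems encountered in economics, it is almost always useful to divide the two
partial derivative constraints of the Lagrangian:
\[ \frac{\alpha x^{\alpha - 1} y^{\beta}}{\beta x^{\alpha} y^{\beta - 1}} = \frac{\lambda p}{\lambda} \;\Rightarrow\; \frac{\alpha}{\beta} \frac{y}{x} = p \]
We plug this into the budget constraint:
\[ px = \frac{\alpha}{\beta} y, \quad px + y = I \;\Rightarrow\; \frac{\alpha}{\beta} y + y = I \;\Rightarrow\; \frac{(\alpha + \beta) y}{\beta} = I \;\Rightarrow\; y = \beta I \]
\[ px + y = I \;\Rightarrow\; px = I - \beta I = (1 - \beta) I = \alpha I \;\Rightarrow\; x = \frac{\alpha I}{p} \]
The results are the same.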
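However, the Lagrangian multiplier method does allow us to say one more thing: we can now
interpret the multiplier $\lambda$. Let's derive the marginal utility of income. That is the amount of
extra utility derived from more income. So if we call our optimal consumption solutions
$x^*(I) = \frac{\alpha I}{p}$, $y^*(I) = \beta I$, our utility at optimal consumption is $U^*(I) = U(x^*, y^*)$ and the
marginal utility of income is:
\[ \frac{\partial U^*(I)}{\partial I} = \frac{\partial U(x^*, y^*)}{\partial x} \frac{\partial x^*}{\partial I} + \frac{\partial U(x^*, y^*)}{\partial y} \frac{\partial y^*}{\partial I}. \]
Now we know from the Lagrange conditions that $\frac{\partial U(x^*, y^*)}{\partial x} = \lambda p$ and $\frac{\partial U(x^*, y^*)}{\partial y} = \lambda$, so we get:
\[ \frac{\partial U(x^*, y^*)}{\partial x} \frac{\partial x^*}{\partial I} + \frac{\partial U(x^*, y^*)}{\partial y} \frac{\partial y^*}{\partial I} = \lambda p \frac{\partial x^*}{\partial I} + \lambda \frac{\partial y^*}{\partial I} = \lambda \left( p \frac{\partial x^*}{\partial I} + \frac{\partial y^*}{\partial I} \right). \]
Let us have a look at the term $p \frac{\partial x^*}{\partial I} + \frac{\partial y^*}{\partial I}$. It measures marginal expenditure as income
increases. But since expenditure and income are equal at optimal consumption (we don't leave
money lying around), this term must be 1 (this can also be seen by implicitly differentiating
the budget constraint w.r.t. income). So $\frac{\partial U^*(I)}{\partial I} = \lambda$.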
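The statement that the multiplier equals the marginal utility of income can be illustrated numerically. The sketch below is a hedged Python example; the parameter values and the function names `demand` and `indirect_utility` are assumptions chosen for illustration, and the utility function is taken to be U = x^alpha * y^beta with alpha + beta = 1. It compares the multiplier from the first-order conditions with a finite-difference derivative of the utility at the optimum.

```python
def demand(alpha, p, income):
    """Optimal bundle for U = x^alpha * y^(1 - alpha) with budget p*x + y = income."""
    x = alpha * income / p
    y = (1.0 - alpha) * income
    return x, y

def indirect_utility(alpha, p, income):
    """Utility evaluated at the optimal bundle."""
    x, y = demand(alpha, p, income)
    return x**alpha * y ** (1.0 - alpha)

# Illustrative parameter values (assumptions, not taken from the exercise).
alpha, p, income = 0.3, 2.0, 100.0
beta = 1.0 - alpha
x, y = demand(alpha, p, income)

# Multiplier from the second first-order condition: lam = beta * x^alpha * y^(beta - 1).
lam = beta * x**alpha * y ** (beta - 1.0)

# Central finite difference of the indirect utility with respect to income.
h = 1e-4
marginal_utility = (indirect_utility(alpha, p, income + h)
                    - indirect_utility(alpha, p, income - h)) / (2.0 * h)
print(lam, marginal_utility)  # the two numbers should agree closely
```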
Additional Exercise 4.
Maximize $f(x) = x^3 - 3x^2 + x$ s.t. $x \le 6$ and $x \ge -10$
Solution:
Let's start by drawing the graph on the indicated domain:
[Figure: graph of f(x) on the interval from -10 to 6]
Let's also zoom in a bit on the action:
[Figure: close-up of the graph of f(x) for x roughly between -1 and 3]
Looking at the figures, we can expect to find four points of interest with the Kuhn-Tucker
method: the two points where the function becomes flat and the two points where one of the
restrictions becomes binding. Let's compute and see if it works out.
We construct the Lagrangian:
\[ L(x) = x^3 - 3x^2 + x - \lambda_1(x - 6) - \lambda_2(-x - 10) \]
Note that we always construct the Lagrangian with constraints of the form $g(x) \le c$, so we
had to rewrite $x \ge -10$ as $-x \le 10$.
The first-order condition:
\[ \frac{\partial L}{\partial x}(x) = 3x^2 - 6x + 1 - \lambda_1 + \lambda_2 = 0 \]
Now there are two possibilities: either a constraint is binding, meaning it holds with equality,
or it is slack, meaning that it holds with strict inequality. If the constraint is slack, we want the
associated multiplier (lambda) to equal zero. For maximization, we want that the multiplier is
positive if the constraint is binding. Let's first try it with the multipliers equal to zero and see
if we end up with points in which the constraints are slack.
\[ 3x^2 - 6x + 1 = 0 \;\Rightarrow\; x = \frac{6 - \sqrt{24}}{6} \,\lor\, x = \frac{6 + \sqrt{24}}{6} \]
We get two critical points that both lie strictly between -10 and 6 (they are then called interior
points). So these are candidate optima and we have to check the second order conditions to
see if they are local minima or maxima or inflection points:
\[ \frac{\partial^2 f(x)}{\partial x^2} = 6x - 6, \]
which we evaluate at our critical points:
\[ 6 \cdot \frac{6 - \sqrt{24}}{6} - 6 = -\sqrt{24} < 0, \qquad 6 \cdot \frac{6 + \sqrt{24}}{6} - 6 = \sqrt{24} > 0 \]
So the first point is a maximum and the second a minimum.
Now let's check the solutions at which the constraints are binding. Since clearly x cannot
equal -10 and 6 at the same time, only one of the constraints can bind.
Let's first try x = 6. Then it must hold that $\lambda_1 \ge 0$ if this is an actual constrained maximum
(remember that the other multiplier should still be zero, as it is not binding).
\[ \left. 3x^2 - 6x + 1 - \lambda_1 \right|_{x = 6} = 0 \;\Rightarrow\; 73 - \lambda_1 = 0 \;\Rightarrow\; \lambda_1 = 73 > 0 \]
So this is in fact a maximum.
Now let's try the other constraint: x = -10:
\[ \left. 3x^2 - 6x + 1 + \lambda_2 \right|_{x = -10} = 0 \;\Rightarrow\; 361 + \lambda_2 = 0 \;\Rightarrow\; \lambda_2 = -361 < 0 \]
So here the Kuhn-Tucker conditions do not hold and this is not a local maximum (it is in fact
a minimum).
We can now check the two local maxima we found to see which is bigger and is the global
maximum, but we will not do so, as this is only cumbersome and the answer is obvious from
the graph. x=6 is the global maximum.
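The comparison of the two local maxima that the text skips is quick to do by machine. The following Python sketch is an illustration, assuming f(x) = x^3 - 3x^2 + x on the interval from -10 to 6; the variable names are introduced here. It collects the interior and boundary candidates that pass the Kuhn-Tucker sign checks and picks the best one.

```python
import math

def f(x):
    return x**3 - 3.0 * x**2 + x

def df(x):
    return 3.0 * x**2 - 6.0 * x + 1.0

def d2f(x):
    return 6.0 * x - 6.0

LOWER, UPPER = -10.0, 6.0
local_maxima = []

# Interior candidates: roots of f'(x) = 0, kept if f''(x) < 0.
for x in ((6.0 - math.sqrt(24.0)) / 6.0, (6.0 + math.sqrt(24.0)) / 6.0):
    if LOWER < x < UPPER and d2f(x) < 0:
        local_maxima.append(x)

# Boundary candidates: the multiplier of the binding constraint must be non-negative.
lam1 = df(UPPER)       # from f'(6) - lam1 = 0
lam2 = -df(LOWER)      # from f'(-10) + lam2 = 0
if lam1 >= 0:
    local_maxima.append(UPPER)
if lam2 >= 0:
    local_maxima.append(LOWER)

x_best = max(local_maxima, key=f)
print(lam1, lam2)              # 73.0 and -361.0
print(x_best, f(x_best))       # 6.0 and 114.0
```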
Now, where do these Kuhn-Tucker conditions come from?
Remember that the first order condition in general says:
\[ \frac{\partial f}{\partial x} - \lambda \frac{\partial g}{\partial x} = 0 \]
For a positive lambda (which the Kuhn-Tucker conditions demand) this means that the
derivative of the objective function and the derivative of the constraint must have the same
sign. Let's think about this. For a maximum at a constraint we want that the function is
increasing if the constraint is an upper bound; it's only a maximum if we could increase by
relaxing the constraint. But if the constraint is an upper bound then the constraint function
must also be increasing
(see the figure, where we zoomed in our function at x=6 and scaled it down for clarity. The
figure also shows the constraint).
[Figure: the rescaled function near x = 6, together with the upper-bound constraint]
Similarly, if the constraint is a lower bound, we want our function to be decreasing there (so
that by going below the lower bound, we would increase our value). But for a constraint to be
a lower bound, the constraint function must be decreasing. In the case of our function and the
lower bound constraint, this was not the case.
[Figure: the rescaled function near x = -10, together with the lower-bound constraint]
Clearly, though, this is a local constrained minimum. What Kuhn-Tucker, for all its
complexity, boils down to, is simply checking that the signs of the derivatives of the function
and the constraint(s) are equal for a maximum and opposite for a minimum.
Maximize $f(x, y) = x^{\alpha} y$, where $\alpha > 0$, subject to $x^2 + 2y^2 \le 10$, $2x^2 + y^2 \le 10$, $x + y \le 3$.
Determine which constraints are binding for which values of $\alpha$.
Solution:
Let's first get a picture of what these constraints mean. The ones with squares in them remind
us of the circle formulas we have seen. In fact they are ellipses: elongated circles (in
this particular case you could obtain them from circles by applying the linear map that we
discussed in week 3, where you double your vectors in the x-direction and leave them
unchanged in the y-direction). The third constraint is of course just a straight line. The whole
thing then looks like this:
[Figure: the two ellipses and the straight line, with the feasible set shaded red]
The red area is the area where all the constraints hold. You could think of our function as a hill
landscape over the plane. When we optimize it subject to our constraints we restrict ourselves
to looking for peaks and valleys on the red area.
Now let's have a look at our objective function, $f(x, y) = x^{\alpha} y$. Because $x^{\alpha}$ is only defined for
negative x if $\alpha$ is an integer (a whole number), we restrict our attention to the case where x is
positive. Then f(x,y) is increasing in x and y as long as y is positive. So, because we are
maximizing, we can restrict our attention to the following area:
[Figure: the part of the feasible set with positive x and y]
Because we want to maximize and f(x,y) is increasing in both our variables, we expect to end
up somewhere on the outer edge of the red area. But where? This will depend on $\alpha$. The
higher $\alpha$, the more x contributes to a higher outcome, so the more we want to move to a
higher x, at the expense of a lower y. So what we expect is that, as we increase $\alpha$ from 0 up to
infinity, we will move along the outer edge of the red area, starting at the y-axis and moving
towards the x-axis.
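For $\alpha = 3$, we obtain a case where two of the constraints are binding. We write down the
Lagrangian:
\[ L(x, y) = x^3 y - \lambda_1(x^2 + 2y^2 - 10) - \lambda_2(2x^2 + y^2 - 10) - \lambda_3(x + y - 3) \]
From our calculations, we expect the second and third constraint to be binding here. As we
saw, this happens at the point $(x, y) = \left( \frac{3 + 2\sqrt{3}}{3}, \frac{6 - 2\sqrt{3}}{3} \right) \approx (2.15, 0.85)$. What we have to
check now is that both the associated lambdas are positive at this point (with $\lambda_1 = 0$, because
the first constraint is slack). So let's calculate:
\[ \left. \frac{\partial L(x, y)}{\partial x} \right|_{x = 2.15,\, y = 0.85} = \left. 3x^2 y - 4\lambda_2 x - \lambda_3 \right|_{x = 2.15,\, y = 0.85} \approx 12 - 9\lambda_2 - \lambda_3 = 0 \]
\[ \left. \frac{\partial L(x, y)}{\partial y} \right|_{x = 2.15,\, y = 0.85} = \left. x^3 - 2\lambda_2 y - \lambda_3 \right|_{x = 2.15,\, y = 0.85} \approx 10 - 2\lambda_2 - \lambda_3 = 0 \]
\[ \lambda_2 = \frac{2}{7} > 0, \quad \lambda_3 = \frac{66}{7} > 0 \]
So we do indeed find that the multipliers are positive, so the Kuhn-Tucker conditions hold, so
we have a local constrained maximum.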
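The multipliers 2/7 and 66/7 come from coefficients rounded to whole numbers. As a hedged cross-check, the Python sketch below (illustrative only, for the case alpha = 3 with the second and third constraints binding) solves the same two first-order conditions with the unrounded coordinates. The resulting values differ slightly from the rounded ones, but the signs are the same.

```python
import math

# Intersection of 2x^2 + y^2 = 10 and x + y = 3 with the larger x.
x = (3.0 + 2.0 * math.sqrt(3.0)) / 3.0
y = 3.0 - x

# First-order conditions with lam1 = 0 (first constraint assumed slack):
#   3x^2 y - 4 lam2 x - lam3 = 0
#   x^3    - 2 lam2 y - lam3 = 0
# Subtracting the second from the first eliminates lam3.
lam2 = (3.0 * x**2 * y - x**3) / (4.0 * x - 2.0 * y)
lam3 = x**3 - 2.0 * lam2 * y

slack_first = x**2 + 2.0 * y**2 - 10.0   # should be negative (constraint not binding)
print(x, y)
print(lam2, lam3, slack_first)
```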
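Find and draw the level curve $f(x, y) = 2x + 3y = 1$ and draw a gradient on this level curve.
Solution:
The level curve is $y = \frac{1 - 2x}{3}$. The gradient is $\nabla f = \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ for every point. If we draw
the gradient on the level curve at the point $(0, \frac{1}{3})$ (which is on the level curve, as
$2 \cdot 0 + 3 \cdot \frac{1}{3} = 1$), we get the vector which goes from the point $(0, \frac{1}{3})$ to the point
$(0, \frac{1}{3}) + (2, 3) = (2, 3\tfrac{1}{3})$. In a graph, it looks like this:
[Figure: the level curve 2x + 3y = 1 with the gradient vector drawn at the point (0, 1/3)]
Of course you can draw the gradient at another point of the level curve too, if you want.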
Find the critical points of the function $f(x, y) = xy + 3x + 4y - \log(xy)$ and determine if they
are maxima, minima or saddle-points.
Solution:
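\[
\nabla f(x^*, y^*) = \begin{pmatrix} y^* + 3 - \frac{1}{x^*} \\ x^* + 4 - \frac{1}{y^*} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\]
\[ y^* + 3 = \frac{1}{x^*} \;\Rightarrow\; x^* = \frac{1}{y^* + 3} \]
\[ x^* + 4 = \frac{1}{y^*} \;\Rightarrow\; \frac{1}{y^* + 3} + 4 = \frac{1}{y^*} \;\Rightarrow\; \frac{1 + 4(y^* + 3)}{y^* + 3} = \frac{1}{y^*} \]
\[ y^* + 4y^*(y^* + 3) = y^* + 3 \;\Rightarrow\; 4{y^*}^2 + 12y^* - 3 = 0 \]
\[ y^* = \frac{-12 \pm \sqrt{144 + 48}}{8} \]
Because there is a logarithm in the function, we must have that either both x and y are
positive, or both are negative ($\log(xy)$ is only defined when the product xy is positive).
We check the two possibilities:
\[ y_1^* = \frac{-12 + \sqrt{144 + 48}}{8} \approx 0.232, \qquad x_1^* = \frac{1}{y_1^* + 3} \approx 0.309 \]
\[ y_2^* = \frac{-12 - \sqrt{144 + 48}}{8} \approx -3.23, \qquad x_2^* = \frac{1}{y_2^* + 3} \approx -4.31 \]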
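So both critical points are within the domain. We check the Hessian:
\[
H_f(x, y) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}
= \begin{pmatrix} \frac{1}{x^2} & 1 \\ 1 & \frac{1}{y^2} \end{pmatrix}
\]
Evaluated at our first critical point this becomes:
\[
H_f(x_1^*, y_1^*) = \begin{pmatrix} \frac{1}{{x_1^*}^2} & 1 \\ 1 & \frac{1}{{y_1^*}^2} \end{pmatrix} \approx \begin{pmatrix} 10.45 & 1 \\ 1 & 18.57 \end{pmatrix}
\]
Because all the diagonal elements are positive, we only have to check for positive
definiteness. We check the leading principal minors:
$LPM_1 = 10.45 > 0$
$LPM_2 = 10.45 \cdot 18.57 - 1 > 0$
All the leading principal minors are positive, so the Hessian is positive definite and this
critical point is a minimum.
We check the Hessian at the second critical point:
\[
H_f(x_2^*, y_2^*) = \begin{pmatrix} \frac{1}{{x_2^*}^2} & 1 \\ 1 & \frac{1}{{y_2^*}^2} \end{pmatrix} \approx \begin{pmatrix} 0.054 & 1 \\ 1 & 0.096 \end{pmatrix}
\]
Both diagonal elements are positive, so we only have to check for positive definiteness.
However,
$LPM_1 = 0.054 > 0$
$LPM_2 = 0.054 \cdot 0.096 - 1 < 0$
So this point is neither a minimum, nor a maximum, but a saddle point.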
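The classification of the two critical points can be reproduced with a few lines of code. This Python sketch is an illustration; the function names `critical_points` and `classify` are introduced here, and f is assumed to be xy + 3x + 4y - log(xy). It solves the quadratic for y, recovers x, checks that the gradient vanishes, and classifies each point by the sign of the Hessian determinant.

```python
import math

def critical_points():
    """Solve 4y^2 + 12y - 3 = 0 and recover x = 1 / (y + 3)."""
    disc = math.sqrt(144.0 + 48.0)
    points = []
    for y in ((-12.0 + disc) / 8.0, (-12.0 - disc) / 8.0):
        points.append((1.0 / (y + 3.0), y))
    return points

def classify(x, y):
    """Classify a critical point of f = xy + 3x + 4y - log(xy) via its Hessian."""
    h11, h12, h22 = 1.0 / x**2, 1.0, 1.0 / y**2
    det = h11 * h22 - h12**2
    if det < 0:
        return "saddle point"
    if det > 0:
        return "minimum" if h11 > 0 else "maximum"
    return "inconclusive"

for x, y in critical_points():
    grad = (y + 3.0 - 1.0 / x, x + 4.0 - 1.0 / y)
    print(round(x, 3), round(y, 3), grad, classify(x, y))
```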
$\max f(\mathbf{x}) = \sqrt{x} + 2\sqrt{y}$
s.t. $2x + y \le 5$ and $x + 2y \le 4$
Solution:
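We first check if the objective function is concave and the constraints are convex. The
constraints are linear, so they are no problem. For the objective function we check the
Hessian:
\[
\nabla f(x, y) = \begin{pmatrix} \frac{1}{2\sqrt{x}} \\ \frac{1}{\sqrt{y}} \end{pmatrix}, \qquad
H_f(x, y) = \begin{pmatrix} -\frac{1}{4x^{3/2}} & 0 \\ 0 & -\frac{1}{2y^{3/2}} \end{pmatrix}
\]
The diagonal elements of the matrix are negative, while the determinant is
\[
\begin{vmatrix} -\frac{1}{4x^{3/2}} & 0 \\ 0 & -\frac{1}{2y^{3/2}} \end{vmatrix} = \frac{1}{8(xy)^{3/2}} > 0,
\]
so the matrix is negative definite and the function is concave.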
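All that remains is to find a KKT-point. We first assume that only the first constraint is
binding: $2x + y = 5$, $x + 2y < 4$. Then we have
\[
\nabla f(x^*, y^*) = \begin{pmatrix} \frac{1}{2\sqrt{x^*}} \\ \frac{1}{\sqrt{y^*}} \end{pmatrix} = \lambda_1 \nabla g_1(x^*, y^*) = \lambda_1 \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
\]
We divide the equations:
\[
\frac{1 / (2\sqrt{x^*})}{1 / \sqrt{y^*}} = \frac{\sqrt{y^*}}{2\sqrt{x^*}} = 2 \;\Rightarrow\; \frac{y^*}{x^*} = 4^2 = 16 \;\Rightarrow\; y^* = 16x^*.
\]
Together with the constraint this gives:
$2x^* + y^* = 18x^* = 5 \Rightarrow x^* = \frac{5}{18}$, $y^* = 16 \cdot \frac{5}{18} = \frac{40}{9}$. We check whether the second constraint holds:
$x^* + 2y^* = \frac{5}{18} + 2 \cdot \frac{40}{9} = \frac{165}{18} > 4$. The second constraint does not hold, so we conclude that
there can be no KKT-point with only the first constraint binding.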
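Next we try only the second constraint binding, so $x + 2y = 4$, but $2x + y < 5$. This gives
\[
\nabla f(x^*, y^*) = \begin{pmatrix} \frac{1}{2\sqrt{x^*}} \\ \frac{1}{\sqrt{y^*}} \end{pmatrix} = \lambda_2 \nabla g_2(x^*, y^*) = \lambda_2 \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]
We divide the equations again:
\[
\frac{\sqrt{y^*}}{2\sqrt{x^*}} = \frac{1}{2} \;\Rightarrow\; \frac{y^*}{x^*} = 1^2 = 1 \;\Rightarrow\; y^* = x^*.
\]
Together with the constraint this gives $x^* + 2y^* = 3x^* = 4 \Rightarrow x^* = y^* = \frac{4}{3}$. We check the first
constraint: $2x^* + y^* = 4 < 5$, so it is met and slack, as assumed. The multiplier is
$\lambda_2 = \frac{1}{2\sqrt{x^*}} = \frac{\sqrt{3}}{4} > 0$. So $(\frac{4}{3}, \frac{4}{3})$ is a KKT-point and, because the objective function is
concave and the constraints are convex, it is the solution to our problem, with $f(\frac{4}{3}, \frac{4}{3}) = 2\sqrt{3} \approx 3.46$.
For completeness we also try both constraints binding: $2x^* + y^* = 5$, $x^* + 2y^* = 4 \Rightarrow y^* = 1$, $x^* = 2$. We check
if this is a KKT-point.
\[
\nabla f(x^*, y^*) = \begin{pmatrix} \frac{1}{2\sqrt{2}} \\ 1 \end{pmatrix} = \lambda_1 \nabla g_1(x^*, y^*) + \lambda_2 \nabla g_2(x^*, y^*)
= \lambda_1 \begin{pmatrix} 2 \\ 1 \end{pmatrix} + \lambda_2 \begin{pmatrix} 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 2\lambda_1 + \lambda_2 \\ \lambda_1 + 2\lambda_2 \end{pmatrix}
\]
So $2\lambda_1 + \lambda_2 = \frac{1}{2\sqrt{2}}$ and $\lambda_1 + 2\lambda_2 = 1$, which gives
\[
\lambda_2 = \frac{4\sqrt{2} - 1}{6\sqrt{2}} > 0, \qquad \lambda_1 = \frac{2 - 2\sqrt{2}}{6\sqrt{2}} < 0.
\]
One multiplier is negative, so $(2, 1)$ is not a KKT-point; indeed $f(2, 1) = \sqrt{2} + 2 \approx 3.41$ is lower
than the value found above. In general it's a good strategy to start with the case with both constraints
binding, because it is the easiest to check, even though here it does not deliver the solution.
I gave all possibilities, so that you could check your answers.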
$\max f(x, y) = \log(x) + 2y$
subject to $x + 2y \le c$
and show that the derivative of the value function is equal to the multiplier.
Solution:
We first check that the objective function is concave (the linear restraint is certainly convex).
\[
\nabla f(x, y) = \begin{pmatrix} \frac{1}{x} \\ 2 \end{pmatrix}, \qquad
H_f(x, y) = \begin{pmatrix} -\frac{1}{x^2} & 0 \\ 0 & 0 \end{pmatrix}
\]
The diagonal elements of the matrix are non-positive, while the determinant is $-\frac{1}{x^2} \cdot 0 = 0$. So the
matrix is negative semi-definite and the function is concave (we don't say that the matrix is
negative definite in this case (because of the zero), but the concavity is still there). We find the
KKT-point:
\[
\nabla f(x^*, y^*) = \begin{pmatrix} \frac{1}{x^*} \\ 2 \end{pmatrix} = \lambda \nabla g(x, y) = \lambda \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]
The second equation immediately implies $\lambda = 1 > 0$,
so $\frac{1}{x^*} = 1 \Rightarrow x^* = 1$ and from the constraint $y^* = \frac{c - 1}{2}$. Because the multiplier is positive, we
have a KKT-point. The value function is the objective function evaluated at the optimal point:
$f^*(c) = f(x^*, y^*) = \log(x^*) + 2y^* = \log(1) + 2\left(\frac{c - 1}{2}\right) = c - 1$. If we take the derivative with
respect to c we find: $\frac{df^*}{dc} = 1 = \lambda$, as we should.
Week 6 Constrained optimization and Integral calculus
Klein: Chapter 11
Value function K.11.2.
Envelope theorem K.11.2.
Convex constraints, multiple constraints, slackness See lecture slides
Klein: Chapter 12
Area under curve K.12.1.
Anti-derivative, fundamental theorem of calculus K.12.1.
Simple rules K.12.2.
Substitution K.12.2.
Integration by parts K.12.2.
Improper integrals K.12.2.
Informal interpretation of the Lagrange multiplier $\lambda$ (the multiplier
of the KKT problem)
The polynomial function that we consider is the following:
$y = f(x) = -x^2 + 3$
Case 1 unconstrained optimum
$\max_x \; -x^2 + 3$ for $x \in \mathbb{R}$
$f'(x) = -2x = 0$
$x = 0$ (extremum)
$f''(x) = -2 < 0$
So, the extremum is a maximum, because the second-order derivative
is negative. In addition, $f(x)$ is a concave function.
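Case 2 we introduce a non-binding constraint
We introduce the non-binding constraint:
$\max_x \; -x^2 + 3$ for $x \in \mathbb{R}$
subject to $g(x) = x - 2 \le 0$
The function g(.) is a convex function. The constraint
does not lead to a change of the optimum.
The KKT conditions become (we do not need to apply the $\nabla$ symbol,
because $f(x)$ is a univariate function):
\[
\begin{cases} \nabla f(x) = \lambda \nabla g(x) \\ g(x) = 0 \\ \lambda \ge 0 \end{cases}
\;\Rightarrow\;
\begin{cases} f'(x) = \lambda g'(x) \\ g(x) = 0 \\ \lambda \ge 0 \end{cases}
\]
or
\[
\begin{cases} -2x = \lambda \cdot 1 \\ x - 2 = 0 \end{cases}
\;\Rightarrow\;
\begin{cases} \lambda = -4 \\ x = 2 \end{cases}
\quad \text{which violates } \lambda \ge 0
\]
This solution is non-binding, because we find a negative value for the
multiplier $\lambda$. Check: the unconstrained optimum $x = 0$ satisfies the
constraint $g(x) = x - 2 \le 0$.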
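Case 3 we introduce a binding constraint
We introduce the binding constraint to the optimization of a concave
function
$\max_x \; -x^2 + 3$ for $x \in \mathbb{R}$
subject to $g(x) = x + 2 \le 0$
The function g(.) is a convex function. The maximization is with respect
to a concave objective function, whereas the constraint is convex. We
may apply the KKT conditions.
\[
\begin{cases} f'(x) = \lambda g'(x) \\ g(x) = 0 \\ \lambda \ge 0 \end{cases}
\;\Rightarrow\;
\begin{cases} -2x = \lambda \cdot 1 \\ x + 2 = 0 \\ \lambda \ge 0 \end{cases}
\;\Rightarrow\;
\begin{cases} \lambda = 4 \\ x = -2 \\ \lambda > 0 \end{cases}
\]
It means that the constraint does lead to the optimum $x = -2$. This is a
binding constraint because $\lambda > 0$.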
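Case 4 we introduce a binding constraint in a more general form
We generalize the binding constraint
$\max_x \; -x^2 + 3$ for $x \in \mathbb{R}$
subject to $g(x) = x \le -c$ (c is a positive number)
or $g(x) = x + c \le 0$,
which is a convex function. The maximization is with respect to a
concave objective function, whereas the constraint is convex. We may
apply the KKT conditions.
\[
\begin{cases} f'(x) = \lambda g'(x) \\ g(x) = 0 \\ \lambda \ge 0 \end{cases}
\;\Rightarrow\;
\begin{cases} -2x = \lambda \cdot 1 \\ x + c = 0 \\ \lambda \ge 0 \end{cases}
\;\Rightarrow\;
\begin{cases} \lambda = 2c \\ x = -c \\ \lambda > 0 \end{cases}
\]
It means that the constraint does lead to the optimum $x = -c$. This is a
binding constraint because $\lambda > 0$.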
Conclusion 1:
The multiplier $\lambda$ is increasing in c.
Thus, for larger values of $\lambda$ we are further away from the optimum
value of $f(x)$.
Conclusion 2:
From an economic perspective, this is an important result. It means
that the constraint becomes more important for larger values of
c!
Conclusion 3:
We optimize the function by taking the derivative of $f(x)$ at the
constraint $x = -c$. If the derivative at this point is close to zero, we are
not far away from the optimum of the unconstrained function. This is
reflected by the value of $\lambda$.
If $\lambda$ is zero: the constrained optimum is equal to the unconstrained
optimum (the non-binding optimum).
If $\lambda$ is large and positive: the constrained optimum is very different
from the unconstrained optimum. Thus, the constraint is important
for the outcome of the maximization problem.
If $\lambda$ is negative: the constraint is irrelevant. It does not change the
outcome of the non-binding optimum.
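The four cases can be summarised in a few lines of code. This is a minimal Python sketch, assuming the objective f(x) = -x^2 + 3 and a single constraint of the form x <= bound; the function name `constrained_max` is introduced here for illustration. It returns the optimum together with the multiplier, which is set to zero when the trial multiplier turns out negative (the non-binding case).

```python
def constrained_max(bound):
    """Maximise f(x) = -x^2 + 3 subject to x <= bound.

    Returns (x_star, lam), where lam is the KKT multiplier of the constraint
    g(x) = x - bound <= 0 (zero when the constraint is slack).
    """
    trial_lam = -2.0 * bound          # from f'(x) = lam * g'(x) at x = bound
    if trial_lam > 0:                 # binding: the unconstrained optimum x = 0 is infeasible
        return bound, trial_lam
    return 0.0, 0.0                   # slack: keep the unconstrained optimum

print(constrained_max(2.0))    # Case 2: (0.0, 0.0)
print(constrained_max(-2.0))   # Case 3: (-2.0, 4.0)
for c in (1.0, 2.0, 3.0):      # Case 4: x <= -c gives lam = 2c
    print(c, constrained_max(-c))
```

The loop at the end shows the multiplier growing with c, as stated in Conclusion 1.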
Formal interpretation of the Lagrange multiplier $\lambda$:
Objective function: $f(x_1, x_2)$ and the constraint: $g(x_1, x_2) = c$. The
optimal values of $(x_1, x_2)$ are $(x_1^*(c), x_2^*(c))$, which depend on c.
Step 1 Take the derivative of the constraint $g(x_1^*(c), x_2^*(c)) = c$ with
respect to c:
\[
\frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_1} \frac{dx_1^*(c)}{dc} + \frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_2} \frac{dx_2^*(c)}{dc} = 1 \tag{1}
\]
Step 2 Take the derivative of the objective function $f(x_1, x_2)$ with
respect to c (chain rule):
\[
\frac{df(x_1^*(c), x_2^*(c))}{dc} = \frac{\partial f(x_1^*(c), x_2^*(c))}{\partial x_1} \frac{dx_1^*(c)}{dc} + \frac{\partial f(x_1^*(c), x_2^*(c))}{\partial x_2} \frac{dx_2^*(c)}{dc} \tag{2}
\]
Step 3 We consider the first-order conditions
\[ \nabla f = \lambda \nabla g, \qquad g(x_1, x_2) = c \]
\[
\frac{\partial f(x_1^*(c), x_2^*(c))}{\partial x_1} = \lambda \frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_1}, \qquad
\frac{\partial f(x_1^*(c), x_2^*(c))}{\partial x_2} = \lambda \frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_2} \tag{3}
\]
Step 4 Substituting the first-order conditions (3) in the derivative of
the objective function (equation (2)):
\[
\frac{df(x_1^*(c), x_2^*(c))}{dc} = \underbrace{\lambda \frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_1}}_{\text{step 3}} \frac{dx_1^*(c)}{dc} + \underbrace{\lambda \frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_2}}_{\text{step 3}} \frac{dx_2^*(c)}{dc} \tag{4}
\]
Step 5 Because of equation (1), equation (4) can be rewritten as:
\[
\frac{df(x_1^*(c), x_2^*(c))}{dc} = \lambda \underbrace{\left[ \frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_1} \frac{dx_1^*(c)}{dc} + \frac{\partial g(x_1^*(c), x_2^*(c))}{\partial x_2} \frac{dx_2^*(c)}{dc} \right]}_{= 1 \text{ (step 1)}}
\]
\[ \frac{df(x_1^*(c), x_2^*(c))}{dc} = \lambda \]
Thus: the interpretation of $\lambda$ is:
The amount by which the value of the objective function f(.)
increases if the constraint $g(.) = c$ is relaxed by one unit.
The maximum amount that the economic agent would be willing to
pay (in units of the objective function) for a relaxation of the
constraint.
Definition: $\lambda$ is the shadow price of the constraint $g(.) = c$
307
We apply this formal procedure to our simple problem:
Maximize the concave objective function $y = f(x) = ax^2$, subject to the constraint $x = c$. We assume that $a$ and $c$ are negative real numbers.
Step 1.
Take the first derivative of the constraint $g(x^*(c)) = x^*(c) = c$ with respect to $c$:
$$\frac{dx^*(c)}{dc} = 1 \quad (1)$$
Step 2.
Take the first derivative of the unconstrained function $f(x)$ (the objective function) with respect to $c$. We apply the chain rule:
$$\frac{df(x^*(c))}{dc} = \frac{df(x^*)}{dx^*}\frac{dx^*(c)}{dc}$$
$$\frac{df(x^*(c))}{dc} = 2ax^*(c)\frac{dx^*(c)}{dc} \quad (2)$$
Step 3.
The first-order conditions $f'(x) = \lambda g'(x)$ are
$$2ax^*(c) = \lambda \quad (3)$$
Check: $\lambda$ is positive (because $a < 0$ and $x^*(c) = c < 0$).
Step 4.
Substituting equation (3) in equation (2):
$$\frac{df(x^*(c))}{dc} = 2ax^*(c)\frac{dx^*(c)}{dc} = \lambda \frac{dx^*(c)}{dc} \quad (4)$$
Step 5.
Because of equation (1), equation (4) can be rewritten as:
$$\frac{df(x^*(c))}{dc} = \lambda \frac{dx^*(c)}{dc} = \lambda \cdot 1 = \lambda$$
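The five steps above can be checked numerically. The following is a minimal sketch, not part of the original notes: the values of `a` and `c` are illustrative assumptions, and the helper names are hypothetical. It compares the multiplier from the first-order condition with a finite-difference derivative of the value function.

```python
def value_function(a, c):
    """f evaluated at the constrained optimum x*(c) = c, for f(x) = a*x**2."""
    return a * c ** 2


def multiplier(a, c):
    """Lagrange multiplier from the first-order condition 2*a*x*(c) = lambda."""
    return 2 * a * c


def numeric_derivative(func, point, step=1e-6):
    """Central finite-difference approximation of d func / d point."""
    return (func(point + step) - func(point - step)) / (2 * step)


# Illustrative values (an assumption); both negative, as required in the text.
a, c = -1.5, -2.0
lam = multiplier(a, c)
slope = numeric_derivative(lambda c_: value_function(a, c_), c)
print(f"lambda = {lam:.4f}, df*/dc = {slope:.4f}")
```

Both numbers should agree up to rounding, which is the shadow-price interpretation of the multiplier.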
Envelope theorem
Take a specific profit function of a firm. Question: for which $Q$ are profits highest? The profit function is based on a demand function and a cost function:
Demand function:
$$P = 3 - 0.5Q$$
Cost function:
$$C = 2Q$$
The objective function (= profit function) becomes:
$$\pi(Q) = PQ - C = (3 - 2)Q - 0.5Q^2$$
Maximization of the objective function yields:
$$Q^* = \frac{3 - 2}{2 \cdot 0.5} = 1$$
Next, we generalize this simple exercise. We introduce the parameters $a$, $b$, and $c$ and we reconsider the maximization.
Demand function:
$$P = a - bQ$$
Cost function:
$$C = cQ$$
The objective function (profit function) becomes:
$$\pi(Q, a, b, c) = PQ - C = (a - c)Q - bQ^2 \quad (1)$$
$$\frac{\partial \pi}{\partial Q} = 0 \;\Rightarrow\; (a - c) - 2bQ^* = 0$$
$$Q^* = \frac{a - c}{2b} \quad (2)$$
Substitute $Q^*$ in the objective function $\pi(\cdot)$ (equation (1)). The maximum value function is the following:
$$\pi(Q^*, a, b, c) = (a - c)Q^* - bQ^{*2} = (a - c)\frac{a - c}{2b} - b\frac{(a - c)^2}{4b^2} = \frac{2(a - c)^2}{4b} - \frac{(a - c)^2}{4b} = \frac{(a - c)^2}{4b}$$
So that the maximum value function becomes:
$$F(a, b, c) = \frac{(a - c)^2}{4b} \quad (3)$$
Thus the maximum value function is a general function of the parameters $a$, $b$, and $c$. The interpretation of $F$ is that it is an indirect objective function.
Advantage: we can apply either of the following 2 methods:
Method 1: We can take the derivative of the function (1) with respect to $a$, $b$ or $c$ and substitute the optimum $Q^* = \frac{a - c}{2b}$ (equation (2)) afterwards. Let's consider for instance the derivative of (1) with respect to $a$:
$$\frac{\partial \pi}{\partial a} = \frac{\partial}{\partial a}\left[(a - c)Q - bQ^2\right] = Q \quad (4)$$
Next, we substitute $Q^* = \frac{a - c}{2b}$ in equation (4):
$$\left.\frac{\partial \pi}{\partial a}\right|_{Q = Q^*} = Q^* = \frac{a - c}{2b}$$
Method 2: We can take the derivative of the maximum value function (3) with respect to $a$, $b$ or $c$. Let's consider for instance the derivative of (3) with respect to $a$:
$$\frac{\partial F}{\partial a} = \frac{\partial}{\partial a}\left[\frac{(a - c)^2}{4b}\right] = \frac{2(a - c)}{4b} = \frac{a - c}{2b} \quad (5)$$
Conclusion: equation (4), evaluated at $Q^*$, is equal to equation (5). This is referred to as the Envelope Theorem.
Note: The second method is easier to apply.
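The equality of the two methods can also be verified numerically. The sketch below is an illustration under stated assumptions: the parameter values are the ones from the specific example (a = 3, b = 0.5, c = 2), the function names are hypothetical, and the derivatives are approximated by finite differences.

```python
def profit(q, a, b, c):
    """Profit function (1): (a - c) * Q - b * Q**2."""
    return (a - c) * q - b * q ** 2


def q_star(a, b, c):
    """Optimal quantity (2)."""
    return (a - c) / (2 * b)


def max_value(a, b, c):
    """Maximum value function (3), obtained by substituting Q* into (1)."""
    return profit(q_star(a, b, c), a, b, c)


def numeric_derivative(func, point, step=1e-6):
    """Central finite-difference approximation of d func / d point."""
    return (func(point + step) - func(point - step)) / (2 * step)


a, b, c = 3.0, 0.5, 2.0
q_opt = q_star(a, b, c)

# Method 1: differentiate the profit function with Q held fixed at Q*.
method_1 = numeric_derivative(lambda a_: profit(q_opt, a_, b, c), a)
# Method 2: differentiate the maximum value function, letting Q* adjust.
method_2 = numeric_derivative(lambda a_: max_value(a_, b, c), a)

print(f"Q* = {q_opt}, method 1 = {method_1:.6f}, method 2 = {method_2:.6f}")
```

Both derivatives should equal $Q^*$, because the indirect effect through $Q^*$ vanishes at the optimum.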
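Envelope Theorem in general terms
$z = f(x, y; \alpha)$ is the objective function; $x$ and $y$ are choice variables.
$x^* = x^*(\alpha)$ and $y^* = y^*(\alpha)$ are the optimal $x$ and $y$. Both depend on a single parameter $\alpha$.
The maximum value function is
$$z^* = F(\alpha) = f(x^*(\alpha), y^*(\alpha); \alpha)$$
And we can take the derivative of the maximum value function with respect to $\alpha$:
$$\frac{\partial z^*}{\partial \alpha} = \frac{\partial F(\alpha)}{\partial \alpha} = \frac{\partial f(x^*(\alpha), y^*(\alpha); \alpha)}{\partial \alpha}$$
Proof:
$$\frac{dF(\alpha)}{d\alpha} = \frac{\partial f(x^*, y^*; \alpha)}{\partial x}\frac{\partial x^*}{\partial \alpha} + \frac{\partial f(x^*, y^*; \alpha)}{\partial y}\frac{\partial y^*}{\partial \alpha} + \frac{\partial f(x^*, y^*; \alpha)}{\partial \alpha}$$
$$= 0 \cdot \frac{\partial x^*}{\partial \alpha} + 0 \cdot \frac{\partial y^*}{\partial \alpha} + \frac{\partial f(x^*, y^*; \alpha)}{\partial \alpha} = \frac{\partial f(x^*, y^*; \alpha)}{\partial \alpha}$$
The first two terms vanish because the first-order conditions $\partial f / \partial x = 0$ and $\partial f / \partial y = 0$ hold at the optimum.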
Example of Envelope Theorem (Hotelling's lemma)
Production function: $Q = f(K, L)$
where $Q$: output, $K$: capital and $L$: labour.
Profit function: $\pi(K, L, p, r, w) = p f(K, L) - rK - wL$
Maximize profit with respect to $K$, $L$, keeping $p$, $r$, and $w$ fixed.
Optimal $K$ and $L$ are: $K^* = K^*(p, r, w)$ and $L^* = L^*(p, r, w)$, so that
$$\pi^* = \pi(K^*, L^*, p, r, w) = p f(K^*, L^*) - rK^* - wL^*$$
Hence:
$$\frac{\partial \pi^*}{\partial p} = f(K^*, L^*) = Q^* > 0$$
and
$$\frac{\partial \pi^*}{\partial r} = -K^* < 0$$
(how much profit is lost if the price of capital increases by a small amount?)
and
$$\frac{\partial \pi^*}{\partial w} = -L^* < 0$$
Integral Calculus
We compute the area under the function between the lower limit $a$ and the upper limit $b$:
$$\int_a^b f(x)\,dx$$
Note that:
1) $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$
2) $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$
3) $\int_a^b \left[f(x) + g(x)\right] dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$
We compute the antiderivative of $f(x)$, which is $F(x)$, such that
$$\frac{dF(x)}{dx} = f(x)$$
Take $f(x) = 5x^2$:
$$F(x) = \frac{5}{3}x^3 + C \quad \text{where } C \text{ is a constant}$$
$$\int_a^b 5x^2\,dx = \left[\frac{5}{3}x^3\right]_{x=a}^{b} = \frac{5}{3}b^3 - \frac{5}{3}a^3 = \frac{5}{3}(b^3 - a^3)$$
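As an illustration of the idea that the definite integral is an area, the closed form above can be compared with a numerical approximation. This is a sketch only: Simpson's rule is a technique brought in for the comparison (it is not used in the notes), and the limits 1 and 4 are illustrative assumptions.

```python
def simpson(func, lower, upper, intervals=100):
    """Composite Simpson's rule for the integral of func on [lower, upper]."""
    if intervals % 2:
        intervals += 1  # Simpson's rule needs an even number of intervals
    width = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight * func(lower + i * width)
    return total * width / 3


def closed_form(lower, upper):
    """Antiderivative F(x) = 5/3 * x**3 evaluated between the limits."""
    return 5.0 / 3.0 * (upper ** 3 - lower ** 3)


a, b = 1.0, 4.0  # illustrative limits of integration
numeric = simpson(lambda x: 5 * x ** 2, a, b)
exact = closed_form(a, b)
print(f"numeric = {numeric:.6f}, closed form = {exact:.6f}")
```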
Rules of integration
1) Polynomial function
$$\int ax^n\,dx = \frac{a}{n+1}x^{n+1} + C \quad (n \ne -1)$$
2) How to treat a constant in the integral?
$$\int k f(x)\,dx = k \int f(x)\,dx$$
Example:
$$\int_0^3 16\sqrt{x}\,dx = 16\left[\frac{x^{3/2}}{3/2}\right]_{x=0}^{3} = \frac{32}{3}\left(3\sqrt{3} - 0\right) = 32\sqrt{3}$$
3) Exponential function
$$\frac{de^{kx}}{dx} = ke^{kx}$$
Thus
$$\int ae^{kx}\,dx = \frac{a}{k}e^{kx} + C$$
1) Logarithmic function - I
$$\frac{d\ln(x)}{dx} = \frac{1}{x}$$
Thus
$$\int \frac{a}{x + k}\,dx = a\ln(x + k) + C \quad (x + k > 0)$$
2) Logarithmic function - II
$$\frac{d\ln(f(x))}{dx} = \frac{f'(x)}{f(x)}$$
Thus
$$\int \frac{f'(x)}{f(x)}\,dx = \ln(f(x)) + C \quad (f(x) > 0)$$
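3) Integration by parts
Let's consider the product rule for differentiation:
$$\frac{d\left[f(x)g(x)\right]}{dx} = f'(x)g(x) + f(x)g'(x)$$
Thus the analogue for integration is:
$$f(x)g(x) + C = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx$$
or
$$\int f'(x)g(x)\,dx = f(x)g(x) - \int f(x)g'(x)\,dx + C$$
Thus: $\int u\,dv = uv - \int v\,du + C$
Example ($f'(x) = x^2$; $g(x) = \ln(x)$; $f(x) = \frac{1}{3}x^3$; $g'(x) = \frac{1}{x}$):
$$\int x^2\ln(x)\,dx = \frac{x^3}{3}\ln(x) - \int \frac{x^3}{3}\cdot\frac{1}{x}\,dx$$
$$= \frac{x^3}{3}\ln(x) - \int \frac{x^3}{3x}\,dx$$
$$= \frac{x^3}{3}\ln(x) - \frac{1}{3}\int x^2\,dx$$
$$= \frac{x^3}{3}\ln(x) - \frac{1}{9}x^3 + C$$
Alternative of example:
$$\int x^2\ln(x)\,dx = \int \ln(x)\,d\!\left(\frac{x^3}{3}\right) = \frac{x^3}{3}\ln(x) - \int \frac{x^3}{3}\,d\ln(x)$$
$$= \frac{x^3}{3}\ln(x) - \int \frac{x^3}{3x}\,dx$$
$$= \frac{x^3}{3}\ln(x) - \frac{1}{3}\int x^2\,dx$$
$$= \frac{x^3}{3}\ln(x) - \frac{1}{9}x^3 + C$$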
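A result obtained by integration by parts can always be checked by differentiating it again. The sketch below does this numerically for the example above; the sample points and helper names are illustrative assumptions, and the derivative is approximated by a central finite difference.

```python
import math


def antiderivative(x):
    """Result of the integration by parts: x**3/3 * ln(x) - x**3/9."""
    return x ** 3 / 3 * math.log(x) - x ** 3 / 9


def integrand(x):
    """Original integrand x**2 * ln(x)."""
    return x ** 2 * math.log(x)


def numeric_derivative(func, point, step=1e-5):
    """Central finite-difference approximation of d func / d point."""
    return (func(point + step) - func(point - step)) / (2 * step)


# Illustrative sample points in the domain x > 0.
sample_points = (0.5, 1.0, 2.0, 3.5)
errors = [abs(numeric_derivative(antiderivative, x) - integrand(x))
          for x in sample_points]
print(max(errors))  # should be very close to zero
```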
4) Substitution method:
It is an application of the reverse of the chain rule to integrals:
$$g(h(x)) + C = \int g'(h(x))\,h'(x)\,dx$$
or, with
$$u = h(x), \qquad du = h'(x)\,dx,$$
$$\int g'(h(x))\,h'(x)\,dx = \int g'(u)\,du$$
Example: $u = 3x^2 + 1$; $\frac{du}{dx} = 6x$; $du = 6x\,dx$:
$$\int (3x^2 + 1)\,6x\,dx = \int u\,du = \frac{1}{2}u^2 + C = \frac{1}{2}(3x^2 + 1)^2 + C$$
Alternative:
$$\int (3x^2 + 1)\,6x\,dx = \int (3x^2 + 1)\,d(3x^2 + 1) = \frac{1}{2}(3x^2 + 1)^2 + C$$
Advanced Mathematics Week 6 Technical tutorial (Wednesday)
1.Maximize f ( x) x3 3x 2 x s.t. x 6 and x 10

Solution: see additional exercises of week 5, exercise 4.
2. Maximize f ( x, y) x y , subject to x2 2 y 2 10, 2 x2 y2 10, x y 3 . Determine

which constraints are binding for which values of .
Solution: see additional exercises of week 5, exercise 5.
Exercise 3. Find the following integrals:

x3 2 x 2 1
a) dx
x
Solution:
We can work out the fraction and split up the parts:
x3 2 x 2 1 1 1
dx ( x2 2x )dx x 2 dx 2 xdx dx
x x x
1 3
x x 2 log( x) C
3
We check by taking the derivative with respect to x:
d 1 3 2 1
( x x log( x) C ) x 2 2 x
dx 3 x
This, as we saw, is what we started out with under the integral sign.
5
x3 2 x 2 1
b) dx
2
x
Solution:
We already found the indefinite integral under a), so now we just use the fundamental theorem
of calculus to plug in:
5 5
x3 2 x 2 1 1 3 1 1
dx x x 2 log( x) 125 25 log(5) 8 4 log(2)
2
x 3 x 2 3 3
This has some outcome we don't care about.
c) $\int \frac{3x^2 + 4x - 1}{2x^3 + 4x^2 - 2x + 6}\,dx$
Solution:
This is a lucky integral. We just so happen to be able to apply a substitution (Note 1):
$$\int \frac{3x^2 + 4x - 1}{2x^3 + 4x^2 - 2x + 6}\,dx = \int \frac{3x^2 + 4x - 1}{y}\cdot\frac{1}{6x^2 + 8x - 2}\,dy$$
$$= \int \frac{3x^2 + 4x - 1}{2(3x^2 + 4x - 1)}\cdot\frac{1}{y}\,dy = \frac{1}{2}\int \frac{1}{y}\,dy = \frac{1}{2}\log(y) + C$$
$$= \frac{1}{2}\log(2x^3 + 4x^2 - 2x + 6) + C$$
Note 1 (we apply the method of substitution):
$$y = 2x^3 + 4x^2 - 2x + 6; \qquad \frac{dy}{dx} = 6x^2 + 8x - 2; \qquad dx = \frac{1}{6x^2 + 8x - 2}\,dy$$
We check by taking the derivative of the result with respect to $x$:
$$\frac{d}{dx}\left[\frac{1}{2}\log(2x^3 + 4x^2 - 2x + 6)\right] = \frac{1}{2}\cdot\frac{6x^2 + 8x - 2}{2x^3 + 4x^2 - 2x + 6} = \frac{3x^2 + 4x - 1}{2x^3 + 4x^2 - 2x + 6}$$
This is exactly what we started out with. Note how lucky we were. If we change the integrand even slightly, say changing the -1 to -2, there is no easy way to find the solution anymore.
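d) $\int \sqrt{1 + 2x}\,dx$
Solution:
Here we again apply the method of substitution (Note 1):
$$\int \sqrt{1 + 2x}\,dx = \int \sqrt{y}\,\frac{1}{2}\,dy = \frac{1}{2}\int y^{1/2}\,dy = \frac{1}{2}\cdot\frac{1}{3/2}\,y^{3/2} + C$$
$$= \frac{2}{3}\cdot\frac{1}{2}\,y^{3/2} + C = \frac{1}{3}y^{3/2} + C = \frac{1}{3}(1 + 2x)^{3/2} + C$$
Note 1 (we apply the method of substitution):
$$y = 1 + 2x; \qquad \frac{dy}{dx} = 2; \qquad dx = \frac{1}{2}\,dy$$
We check the result by taking the derivative with respect to $x$:
$$\frac{d}{dx}\left[\frac{1}{3}(1 + 2x)^{3/2}\right] = \frac{1}{3}\cdot\frac{3}{2}(1 + 2x)^{1/2}\cdot 2 = \sqrt{1 + 2x}$$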
3
e) ( 2 x 2 )dx
x 12
Solution:
Here we see a square root to which we would like to apply a substitution, but there is the other
term there. But this is no problem; we can just split it off:
3 3 3 2 3
( 2 x 2 )dx dx 2 x 2 dx dx x C
x 12 x 12 x 12 3
Now we can apply a similar substitution as in the previous exercise

3 (Note 1) 3 3 1/ 2
dx dy 3 y 1/ 2 dy y C 6 x 12 C
x 12 y 1/ 2
dx
x 12
dy
y x 12 ; 1 ; dy dx
dx
Now we can add the two solutions. Note that we just add the two integration constants into one new constant. This doesn't matter, since the constants could be anything anyway. Strictly speaking, we should give all these constants new names, but that would be very cumbersome.
3 3 2 3 2 3
( 2 x 2 )dx dx x C 6 x 12 x C
x 12 x 12 3 3
We check our result by taking the derivative with respect to x:

1
d 2 3 1 3
(6 x 12 x ) 6( x 12) 2 2 x 2 2x2
dx 3 2 x 12
f) log(3x 7)dx
Solution:
We try our luck with another substitution:
(Note 1) 1 1
log(3x 7)dx log( y ) dy log( y)dy =
3 3
3x 7
dx

dy 1
y 3x 7 ; 3 ; dy dx
dx 3
We now have to find an integral for the logarithmic function. We can do this by an application
of integration by parts. Remember that integration by parts is the following:
f ( y) g '( y)dy f ( y) g ( y) f '( y) g ( y )dy
In our solution, we will rewrite it differently
f ( y)dg ( y) f ( y) g ( y) g ( y)df ( y)
Thus the integral can be solved as follows:
1 1 (Note 2)
log( y )dy [ y log( y ) yd log( y )]
3 3
1 1 1
[ y log( y ) y dy ] [ y log( y ) 1dy]
3 y 3
d log( y )
1 1
[ y log( y ) y ] C [(3x 7) log(3x 7) (3x 7)] C
3 3 y y y
Note 2
As part of the method of integration by parts, we need to rewrite g ( y)df ( y ) as
f '( y) g ( y)dy :
df ( y ) d log( y) 1 1
; d log( y) dy
dy dy y y
1
Thus y d log( y ) y dy
y
g ( y) f ( y) g ( y)
f '( y )
We check by differentiating the solution with respect to x:

d 1
( (3x 7) log(3x 7) 3 x 7 )
dx 3
1 3x 7
(3log(3x 7) 3 3) log(3 x 7)
3 3x 7
g) $\int x^2 e^x\,dx$
We present two solutions. The first solution is based on the expression $\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx$, which I find harder to memorize. The second solution is based on $\int f(x)\,dg(x) = f(x)g(x) - \int g(x)\,df(x)$. This works very easily once you realize that $\frac{df(x)}{dx} = f'(x)$, so $df(x) = f'(x)\,dx$. I prefer the second solution, because it is much easier to memorize. Some students have seen the first solution before at high school.
Solution 1 - using $\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx$
We take $f(x) = x^2$, $g'(x) = e^x$:
$$\int x^2 e^x\,dx = x^2 e^x - \int 2x e^x\,dx$$
Now we take for the next integral $f(x) = 2x$, $g'(x) = e^x$:
$$x^2 e^x - \int 2x e^x\,dx = x^2 e^x - \left[2x e^x - \int 2e^x\,dx\right] = (x^2 - 2x + 2)e^x + C$$
Alternative solution 2 - using $\int f(x)\,dg(x) = f(x)g(x) - \int g(x)\,df(x)$
$$\int x^2 e^x\,dx = \int x^2\,d(e^x) \quad \text{(Note 1)}$$
$$= x^2 e^x - \int e^x\,d(x^2) = x^2 e^x - \int e^x\,2x\,dx \quad \text{(Note 2)}$$
$$= x^2 e^x - \int 2x\,d(e^x) \quad \text{(Note 1)}$$
$$= x^2 e^x - \left[2x e^x - \int e^x\,d(2x)\right] = x^2 e^x - 2x e^x + \int 2e^x\,dx \quad \text{(Note 3)}$$
$$= x^2 e^x - 2x e^x + 2e^x + C = (x^2 - 2x + 2)e^x + C$$
Note 1
$$\frac{de^x}{dx} = e^x; \qquad de^x = e^x\,dx$$
Note 2
$$\frac{dx^2}{dx} = 2x; \qquad dx^2 = 2x\,dx$$
Note 3
$$\frac{d(2x)}{dx} = 2; \qquad d(2x) = 2\,dx$$
We check the derivative of the solution with respect to $x$:
$$\frac{d}{dx}\left[(x^2 - 2x + 2)e^x\right] = (x^2 - 2x + 2)e^x + (2x - 2)e^x = x^2 e^x$$
h) ( x 2 1)e3 x 2 dx

This works exactly the same as the previous exercise, and we again present two solutions.
( x 2 1) e3 x 2 dx
f ( x) g '( x )
1 1 3x 2
( x 2 1) e3 x 2
2x e dx
3 f '( x ) 3
f ( x)
g ( x) g ( x)
1 2 1
( x 1)e3 x 2
2 x e3 x 2 dx
3 3 f '( x ) g '( x )
1 2 1 1 1 3x 2
( x 1)e3 x 2
[2 x e3 x 2
2 e dx
3 3 3 f ''( x ) 3
f '( x ) g ( x ) g ( x)
1 2 2 3x 2 2 3x
( x 1)e3 x 2 xe e 2
C
3 9 27
1 2 2 3x 2
( ( x 2 1) x )e C
3 9 27

(Note 1) 1
( x 2 1)e3 x 2 dx ( x 2 1) de3 x 2
3
1
( x 2 1) d e3 x 2
3 g ( x)
f ( x)
1 2 1 3x 2
( x 1) e3 x 2 e d ( x 2 1)
3 g ( x) 3 g ( x)
f ( x) f ( x)
(Note 2) 1 1 3x 2
( x 2 1)e3 x 2
e 2 xdx
3 3
(Note 1) 1 1 1
( x 2 1)e3 x 2
2 x de3 x 2
3 3 3
1 2 1
( x 1)e3 x 2
2 x d e3 x 2
3 9 h( x) g ( x)
(Note 4) 1 1
( x 2 1)e3 x 2
[2 xe3 x 2 e3 x 2 d 2 x ]
3 9 h( x) g ( x) g ( x) h( x)
1 2 2 3x 2 1
(x 1)e3 x 2
xe e3 x 2 2dx
3 9 9
1 2 2 3x 2 2
(x 1)e3 x 2 xe e3 x 2 dx
3 9 9
1 2 2 3x 2 2 1 3x
(x 1)e3 x 2 xe e 2
3 9 9 3
1 2 2 3x
( ( x2 1) x )e 2
C
3 9 27
Note 1
de3 x 2 1 3x
3e3 x 2 ; de3 x 2 3e3 x 2 dx; e3 x 2 dx de 2
dx 3
Note 2
d ( x 2 1)
2 x; d ( x 2 1) 2 xdx
dx
Note 3
d 2x
2 ; d 2 x 2dx
dx
Note 4
d 2x
2 ; d 2 x 2dx
dx

d 1 2 2 2 3x 2 1 2 2 3x 2 2 3x
( ( x 1) x )e 3 ( ( x 2 1) x )e 2
( x )e 2
( x 2 1)e3 x 2
dx 3 9 27 3 9 27 3 9
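i) $\int x^2 \log^2(x)\,dx$
This also works by repeated integration by parts. Again, we present two solutions.

Solution 1. We take $f'(x) = x^2$, $g(x) = \log^2(x)$, so that $f(x) = \frac{1}{3}x^3$ and $g'(x) = 2\log(x)\frac{1}{x}$:
$$\int x^2 \log^2(x)\,dx = \frac{1}{3}x^3\log^2(x) - \int \frac{1}{3}x^3 \cdot 2\log(x)\frac{1}{x}\,dx = \frac{1}{3}x^3\log^2(x) - \frac{2}{3}\int x^2\log(x)\,dx$$
For the remaining integral we take $f'(x) = x^2$, $h(x) = \log(x)$:
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{3}\left[\frac{1}{3}x^3\log(x) - \int \frac{1}{3}x^3\cdot\frac{1}{x}\,dx\right]$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{3}\left[\frac{1}{3}x^3\log(x) - \frac{1}{9}x^3\right] + C$$
$$= x^3\left(\frac{1}{3}\log^2(x) - \frac{2}{9}\log(x) + \frac{2}{27}\right) + C$$

Solution 2.
$$\int x^2\log^2(x)\,dx = \int \log^2(x)\,d\!\left(\tfrac{1}{3}x^3\right) \quad \text{(Note 1)}$$
$$= \frac{1}{3}x^3\log^2(x) - \int \frac{1}{3}x^3\,d\log^2(x)$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{1}{3}\int x^3\cdot 2\log(x)\frac{1}{x}\,dx \quad \text{(Note 2)}$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{3}\int \log(x)\,x^2\,dx$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{3}\int \log(x)\,\frac{1}{3}\,dx^3 \quad \text{(Note 3)}$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{9}\int \log(x)\,d(x^3)$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{9}\left[x^3\log(x) - \int x^3\,d\log(x)\right]$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{9}\left[x^3\log(x) - \int x^3\frac{1}{x}\,dx\right] \quad \text{(Note 4)}$$
$$= \frac{1}{3}x^3\log^2(x) - \frac{2}{9}\left[x^3\log(x) - \frac{1}{3}x^3\right] + C$$
$$= x^3\left(\frac{1}{3}\log^2(x) - \frac{2}{9}\log(x) + \frac{2}{27}\right) + C$$
Note 1
$$\frac{d\left(\frac{1}{3}x^3\right)}{dx} = x^2; \qquad x^2\,dx = d\!\left(\tfrac{1}{3}x^3\right)$$
Note 2
$$\frac{d\log^2(x)}{dx} = 2\log(x)\frac{1}{x}; \qquad d\log^2(x) = 2\log(x)\frac{1}{x}\,dx$$
Note 3
$$\frac{dx^3}{dx} = 3x^2; \qquad x^2\,dx = \frac{1}{3}\,dx^3$$
Note 4
$$\frac{d\log(x)}{dx} = \frac{1}{x}; \qquad d\log(x) = \frac{1}{x}\,dx$$
We check the derivative of the solution with respect to $x$:
$$\frac{d}{dx}\left[x^3\left(\frac{1}{3}\log^2(x) - \frac{2}{9}\log(x) + \frac{2}{27}\right)\right] = 3x^2\left(\frac{1}{3}\log^2(x) - \frac{2}{9}\log(x) + \frac{2}{27}\right) + x^3\left(\frac{2}{3}\log(x)\frac{1}{x} - \frac{2}{9x}\right) = x^2\log^2(x)$$
Quite a relief, to be honest.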
j) $\int_2^7 \frac{x}{1 - \sqrt{2 + x}}\,dx$, for which you must make use of long (polynomial) division.
Solution:
This is an example where we apply our familiar substitution, but we also have to take into account the limits of integration. We take the substitution: $w = x + 2$; $\frac{dw}{dx} = 1$; $dw = dx$.
For the limits we get: $x = 2 \Rightarrow w = 4$, $x = 7 \Rightarrow w = 9$, so:
$$\int_2^7 \frac{x}{1 - \sqrt{2 + x}}\,dx = \int_4^9 \frac{w - 2}{1 - \sqrt{w}}\,dw$$
We take the substitution: $y = \sqrt{w}$; $\frac{dy}{dw} = \frac{1}{2\sqrt{w}}$; $2\sqrt{w}\,dy = dw$; $2y\,dy = dw$. For the limits we get: $w = 4 \Rightarrow y = 2$, $w = 9 \Rightarrow y = 3$:
$$\int_4^9 \frac{w - 2}{1 - \sqrt{w}}\,dw = \int_2^3 \frac{y^2 - 2}{1 - y}\,d(y^2) = \int_2^3 \frac{y^2 - 2}{1 - y}\,2y\,dy = \int_2^3 \frac{2y^3 - 4y}{1 - y}\,dy$$
The term $\frac{2y^3 - 4y}{1 - y}$ can be simplified using long division (polynomial division), which is explained below:
$$\int_2^3 \frac{2y^3 - 4y}{1 - y}\,dy = \int_2^3 \left(-2y^2 - 2y + 2 - \frac{2}{1 - y}\right)dy = -2\int_2^3 y^2\,dy - 2\int_2^3 y\,dy + 2\int_2^3 dy + 2\int_2^3 \frac{1}{y - 1}\,dy$$
$$= -2\left[\frac{1}{3}y^3 + \frac{1}{2}y^2 - y - \log(y - 1)\right]_{y=2}^{3} = -2\left[\left(\frac{1}{3}\cdot 27 + \frac{1}{2}\cdot 9 - 3 - \log(2)\right) - \left(\frac{1}{3}\cdot 8 + \frac{1}{2}\cdot 4 - 2 - \log(1)\right)\right]$$
$$= -2\left[\frac{19}{3} + \frac{3}{2} - \log(2)\right] = -\frac{38}{3} - 3 + 2\log(2) = -\frac{47}{3} + 2\log(2)$$
Long division (polynomial division):

Divide a polynomial function of order $k$ by another polynomial function that has a lower order. We take the ratio $\frac{2y^3 - 4y}{1 - y}$, for which the numerator has the highest power 3, whereas the denominator has the highest power 1. We can simplify the ratio as follows.
Step 1:
We divide the numerator $2y^3 - 4y$ by the denominator $(1 - y)$.
We want to split $\frac{2y^3 - 4y}{1 - y}$ into $by^2 + \frac{X}{1 - y}$, for which $X$ is a polynomial function in $y$ that has the highest order 2.
We take two substeps 1a and 1b.
Step 1a: Take a number $b$ such that the product of $by^2$ and the denominator $(1 - y)$ eliminates the term $2y^3$. The product is $by^2(1 - y) = -by^3 + by^2$. For $b = -2$, the highest order is eliminated.
Step 1b:
For the term $X$ we calculate:
$$\frac{X}{1 - y} = \frac{2y^3 - 4y}{1 - y} - by^2 = \frac{2y^3 - 4y - by^2(1 - y)}{1 - y} \overset{b = -2}{=} \frac{2y^2 - 4y}{1 - y}$$
Result step 1a and step 1b together:
$$\frac{2y^3 - 4y}{1 - y} = -2y^2 + \frac{2y^2 - 4y}{1 - y}$$
Step 2:
So, we have reduced the ratio $\frac{2y^3 - 4y}{1 - y}$ to the sum of $-2y^2$ and $\frac{2y^2 - 4y}{1 - y}$. For the latter ratio the power of the numerator is not lower than that of the denominator. We can repeat step 1:
$$\frac{2y^2 - 4y}{1 - y} = cy + \frac{X}{1 - y}$$
$$\frac{X}{1 - y} = \frac{2y^2 - 4y - cy(1 - y)}{1 - y} \overset{c = -2}{=} \frac{-2y}{1 - y}$$
Thus for $c = -2$ the power of 2 in the numerator is eliminated.
Result step 2:
$$\frac{2y^2 - 4y}{1 - y} = -2y - \frac{2y}{1 - y}$$
Step 3:
$$\frac{2y}{1 - y} = d + \frac{X}{1 - y}$$
$$\frac{X}{1 - y} = \frac{2y - d(1 - y)}{1 - y} \overset{d = -2}{=} \frac{2}{1 - y}$$
Thus for $d = -2$ the power of 1 in the numerator is eliminated.
Result step 3:
$$\frac{2y}{1 - y} = -2 + \frac{2}{1 - y}$$
Important: we cannot take another simplification, because the power in the numerator (= 0) is smaller than the power in the denominator (= 1).
Step 4 (combination of steps 1, 2 and 3):
$$\frac{2y^3 - 4y}{1 - y} = -2y^2 - 2y + 2 - \frac{2}{1 - y}$$
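The long-division steps above follow a fixed recipe: eliminate the highest power, then repeat on what is left. A minimal sketch of that recipe is given below. It is an illustration, not part of the original solution; the function name `poly_divmod` and the coefficient-list representation (highest power first) are assumptions made for this example.

```python
def poly_divmod(numerator, denominator):
    """Polynomial long division on coefficient lists (highest power first).

    Returns (quotient, remainder) such that
    numerator = quotient * denominator + remainder.
    """
    num = [float(value) for value in numerator]
    den = [float(value) for value in denominator]
    if not den or den[0] == 0:
        raise ValueError("leading coefficient of the denominator must be non-zero")
    quotient = []
    while len(num) >= len(den):
        # Choose the factor that eliminates the current highest power.
        factor = num[0] / den[0]
        quotient.append(factor)
        for i, coefficient in enumerate(den):
            num[i] -= factor * coefficient
        num.pop(0)  # the leading term is now zero
    return quotient, num


# (2y^3 - 4y) divided by (1 - y), written as -y + 1.
quotient, remainder = poly_divmod([2, 0, -4, 0], [-1, 1])
print(quotient, remainder)
```

The quotient coefficients correspond to $-2y^2 - 2y + 2$ and the remainder to $-2$, which matches Step 4.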
k) x 2 ( x 3)12 dx
Solution:
This is just a nice trick. We could manually compute this beast by just expanding the
expression raised to the power 12. That would be a lot of work. By applying the substitution
y x 3, dy dx , we get:
x y 3
x 2 ( x 3)12 dx ( y 3)2 y12 dy
This is a lot easier to expand!
( y 3) 2 y12 dy ( y 2 6 y 9) y12 dy ( y 2 y12 6 yy12 9 y12 dy ( y14 6 y13 9 y12 )dy

y14 dy 6 y13dy 9 y12 dy
1 15 6 14 9 13 y x 3
y y y C
15 14 13
1 6 9
( x 3)15 ( x 3)14 ( x 3)13 C
15 14 13
Some additional integral exercises:

i) Apply the substitution method to solve
2x 4
2
dx
x 4x 8
Substitution method:
f g ( x) g '( x)dx f y dy and y g ( x)
Usually, I take one step in between, to have a better understanding. It is as follows:
f g ( x) g '( x)dx f g ( x) dg ( x) f y dy
Solution:
dy
Lets take y x2 4x 8 , 2 x 4 thus dy (2 x 4)dx
dx
2x 4 1
2
dx dy ln( y ) C for y > 0.
x 4x 8 y
Thus the solution: ln( x2 4 x 8) C for x2 4 x 8 0
m) Apply the substitution method to solve

6 x( x 1)4 dx
Solution
y x 1 so that x y 1 and dx dy
x y 1 6 5 6
6 x( x 1)4 dx 6( y 1) y 4 dy (6 y 5 6 y 4 )dy y 6 y C ( x 1)6 ( x 1)5 C
5 5
n) Apply the substitution method to solve
cx 2
xe dx
Solution 1:
dx 2 1 2
2 x 2 xdx dx 2 xdx dx
dx 2
2 1 2
dx2 d ( x2 ) 1
cx 2
xe cx dx e cx dx 2 e d x2
2 2
2 2
1 2 t cx 1 t et t cx2 e cx
e cx d cx 2 e dt C C
2c 2c 2c 2c
Alternative solution 2:
dt
Substitution: t cx 2 and 2cx or dt 2cxdx
dx
2 2
cx 21 t et t cx e cx
xe dx e dt C C
2c 2c 2c
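o) Apply the substitution method to solve
$$\int_0^2 xe^{-x^2}\,dx$$
Solution:
Let's take $t = -x^2$, $dt = -2x\,dx$. For $x = 0$, we have $t = 0$. For $x = 2$, we have $t = -4$.
$$\int_0^2 xe^{-x^2}\,dx = -\frac{1}{2}\int_0^{-4} e^t\,dt = -\frac{1}{2}\left(e^{-4} - e^0\right) = -\frac{1}{2}\left(e^{-4} - 1\right)$$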
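p) Apply the substitution method to solve
$$\int_1^e \frac{1 + \ln x}{x}\,dx$$
Solution:
$$\int_1^e \frac{1 + \ln x}{x}\,dx = \int_1^e (1 + \ln(x))\,d\ln(x) = \int_1^e (1 + \ln(x))\,d(1 + \ln(x))$$
For the substitution method, we also need to change the limits of integration.
We will apply $t = 1 + \ln(x)$.
Lower bound of integral: $x = 1$, so that $t = 1 + \ln(1) = 1 + 0 = 1$
Upper bound of integral: $x = e$, so that $t = 1 + \ln(e) = 1 + 1 = 2$
$$\int_1^e (1 + \ln(x))\,d(1 + \ln(x)) = \int_1^2 t\,dt = \left[\frac{1}{2}t^2\right]_{t=1}^{2} = \frac{1}{2}(4 - 1) = \frac{3}{2}$$
Alternatively, take $t = 1 + \ln(x)$, $dt = \frac{1}{x}\,dx$. Limits of integration: $t = 1$ (for $x = 1$; $t = 1 + \ln(1) = 1 + 0 = 1$) and $t = 2$ (for $x = e$; $t = 1 + \ln(e) = 1 + 1 = 2$):
$$\int_1^e \frac{1 + \ln x}{x}\,dx = \int_1^2 t\,dt = \left[\frac{1}{2}t^2\right]_{t=1}^{2} = \frac{1}{2}(4 - 1) = \frac{3}{2}$$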
q) Apply the method of integration by parts to solve

$$\int 4xe^{2x}\,dx$$
Solution:
$$\int 4xe^{2x}\,dx = \int 2x\,d(e^{2x}) = 2xe^{2x} - 2\int e^{2x}\,dx = 2xe^{2x} - e^{2x} + C$$
r) Apply the method of integration by parts to solve
2x
dx
( x 8)3
Solution:
2x 2 2
dx xd ( x 8) x( x 8) ( x 8) 2 dx x( x 8) 2
( x 8) 1
C
( x 8)3
Check:
d
x( x 8) 2 ( x 8) 1 C ( x 8) 2 2 x( x 8) 3
( x 8) 2
2 x( x 8) 3
dx
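s) Apply the method of integration by parts to solve
$$\int_0^{10} (1 + 0.4t)e^{-0.05t}\,dt$$
Solution:
$$\int_0^{10} (1 + 0.4t)e^{-0.05t}\,dt = -\frac{1}{0.05}\int_0^{10} (1 + 0.4t)\,d(e^{-0.05t})$$
$$= -\frac{1}{0.05}\left[(1 + 0.4t)e^{-0.05t}\right]_{t=0}^{10} + \frac{1}{0.05}\int_0^{10} e^{-0.05t}\,d(1 + 0.4t)$$
$$= -100e^{-0.5} + 20 + 20\cdot 0.4\int_0^{10} e^{-0.05t}\,dt = -100e^{-0.5} + 20 + 8\left[-\frac{1}{0.05}e^{-0.05t}\right]_{t=0}^{10}$$
$$= -100e^{-0.5} + 20 - 160\left(e^{-0.5} - 1\right) \approx 22.3$$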
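The numerical outcome of exercise s) can be cross-checked without any calculus. The sketch below approximates the area with the trapezoid rule, which is a technique brought in only for this comparison; the number of intervals is an illustrative assumption.

```python
import math


def integrand(t):
    """Integrand of exercise s): (1 + 0.4 t) * exp(-0.05 t)."""
    return (1 + 0.4 * t) * math.exp(-0.05 * t)


def trapezoid(func, lower, upper, intervals=10000):
    """Composite trapezoid rule for the integral of func on [lower, upper]."""
    width = (upper - lower) / intervals
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, intervals):
        total += func(lower + i * width)
    return total * width


# Closed form obtained by integration by parts in the solution above.
closed_form = -100 * math.exp(-0.5) + 20 - 160 * (math.exp(-0.5) - 1)
numeric = trapezoid(integrand, 0.0, 10.0)
print(f"closed form = {closed_form:.4f}, numeric = {numeric:.4f}")
```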
t) Apply both the method of integration by parts and the substitution method to solve
2 x ln( x 2 b2 )dx
Solution:
2 x ln( x 2 b 2 )dx ln( x 2 b 2 )d ( x 2 b 2 ) ( x 2 b 2 ) ln( x 2 b 2 ) ( x 2 b 2 )d ln( x 2 b 2 )
t
( x 2 b 2 ) ln( x 2 b 2 ) td ln(t ) ( x 2 b 2 ) ln( x 2 b 2 ) dt ( x 2 b 2 ) ln( x 2 b 2 ) t C
t
( x 2 b 2 ) ln( x 2 b 2 ) ( x 2 b 2 ) C
1.
Two individuals have the following utility function:
$U_i(x_1, x_2, y_i) = \log(x_i) + \log(y_i) - x_1 - x_2$, subject to $g_i(x_i, y_i) = x_i + y_i \le 10$ for $i = 1, 2$. Find
their utility maximizing consumption pattern. Next suppose that a king wishes to maximize
the sum of their utilities. Find the consumption that he would ordain.
There are two possible ways of solving this. Either you can use the Lagrangian approach or
you can just look at the KKT-condition (equating the gradients). Both approaches lead to the
same result and are almost indistinguishable. Nonetheless, we will give them both here, so
you can familiarize yourself with their similarities.
Solution 1 (KKT):
We optimize for person 1, person two is completely the same. We first observe that the
objective function is concave (we know the logarithms to be concave, and the addition of a
linear function does not alter this) and that the constraint function is linear and hence convex.
So we can apply our standard technique and look for a point such that
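$\nabla U_1(x_1^*, y_1^*) = \lambda \nabla g_1(x_1^*, y_1^*)$. We drop the stars from the notation for convenience.
$$\nabla U_1(x_1, y_1) = \lambda \nabla g_1(x_1, y_1) \;\Rightarrow\; \begin{pmatrix} \frac{1}{x_1} - 1 \\ \frac{1}{y_1} \end{pmatrix} = \lambda \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
$$\frac{1}{x_1} - 1 = \frac{1 - x_1}{x_1} = \frac{1}{y_1}$$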
From this point on the solution of the single person case is the same as below. We continue
the second question in KKT form below.
Solution 2 (Lagrangian):
We have already observed that the functions are suitably concave and convex. The individual
1 simply maximizes:
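$$L_1(x_1, y_1) = \log(x_1) + \log(y_1) - x_1 - x_2 - \lambda(x_1 + y_1 - 10)$$
$$\frac{\partial L_1}{\partial x_1} = \frac{1}{x_1} - 1 - \lambda = 0$$
$$\frac{\partial L_1}{\partial y_1} = \frac{1}{y_1} - \lambda = 0 \;\Rightarrow\; \lambda = \frac{1}{y_1}$$
$$\frac{1}{x_1} - 1 = \frac{1 - x_1}{x_1} = \frac{1}{y_1} \;\Rightarrow\; y_1 = \frac{x_1}{1 - x_1}$$
$$x_1 + y_1 = x_1 + \frac{x_1}{1 - x_1} = \frac{x_1(1 - x_1) + x_1}{1 - x_1} = \frac{(2 - x_1)x_1}{1 - x_1} = 10$$
$$-x_1^2 + 2x_1 = 10 - 10x_1 \;\Rightarrow\; x_1^2 - 12x_1 + 10 = 0$$
$$x_1 = \frac{12 \pm \sqrt{144 - 40}}{2} = 6 \pm \sqrt{26}$$
$$y_1 = 10 - x_1 = 4 \mp \sqrt{26}$$
Since $y$ cannot be negative, we are left with the single possibility:
$$x_1 = 6 - \sqrt{26} \approx 0.9, \qquad y_1 = 4 + \sqrt{26} \approx 9.1$$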
Individual 2 maximizes the exact same function, so he chooses the same consumption levels.
However, neither individual takes into account the fact that his consumption of x leads to a
lower utility for his neighbour. Each regards the consumption of his neighbour as a given and
neither cares directly about the utility of his neighbour.
As you are probably aware this situation is not Pareto-optimal; there exists a consumption
pattern that leaves both individuals better off. One way to see this is by introducing a
benevolent dictator. He will optimize total welfare, which we take here to be simply the sums
of the individual utilities, by choosing the consumption levels of the individuals for them,
while respecting their budget constraints (so without redistributing). We omit the possibility
of redistribution because we are interested in a Pareto-improvement: everybody must be made
better off, so we don't take anything away from anybody.
This means that the dictator faces two constraints. Again we can solve his problem with the
Lagrange method or with the KKT method and again they are almost completely the same.
KKT:
We call welfare W and it looks like this:
$$W(x_1, x_2, y_1, y_2) = \log(x_1) + \log(y_1) + \log(x_2) + \log(y_2) - 2x_1 - 2x_2$$
which we optimize subject to the two constraints $g_i(x_i, y_i) = x_i + y_i \le 10$, $i = 1, 2$.
We observe that the functions are again concave and convex as they should be. Because we
have several constraints, we could check many possible cases here, which each constraint
either binding (meaning that it holds with equality) or not. As a rule of thumb, it is usually
best to start with the case where all constraints bind, because it is easiest to check. Here we
would also expect on general grounds that both constraints will bind. After all, the constraint says that a consumer cannot spend more than his income, but in this model it would make sense that he spends exactly his income.
Taking both constraints binding, the KKT-condition becomes:
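$$\nabla W(x_1, x_2, y_1, y_2) = \lambda_1 \nabla g_1(x_1, x_2, y_1, y_2) + \lambda_2 \nabla g_2(x_1, x_2, y_1, y_2)$$
Note that we interpret the constraints here as functions of all four variables, even though only two actually occur in the formula. This is to make the vectors of the same dimension. With the components ordered as $(x_1, y_1, x_2, y_2)$:
$$\begin{pmatrix} \frac{1}{x_1} - 2 \\ \frac{1}{y_1} \\ \frac{1}{x_2} - 2 \\ \frac{1}{y_2} \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \lambda_2 \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}$$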
If we write this out, we get the exact same set of equations as below:
Lagrange method:
The Lagrangian looks as follows:
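$$L_D(x_1, x_2, y_1, y_2) = \log(x_1) + \log(y_1) + \log(x_2) + \log(y_2) - 2x_1 - 2x_2 - \lambda_1(x_1 + y_1 - 10) - \lambda_2(x_2 + y_2 - 10)$$
This gives first order conditions:
$$\frac{\partial L_D(x_1, x_2, y_1, y_2)}{\partial x_1} = \frac{1}{x_1} - 2 - \lambda_1 = 0$$
$$\frac{\partial L_D(x_1, x_2, y_1, y_2)}{\partial y_1} = \frac{1}{y_1} - \lambda_1 = 0$$
$$\frac{\partial L_D(x_1, x_2, y_1, y_2)}{\partial x_2} = \frac{1}{x_2} - 2 - \lambda_2 = 0$$
$$\frac{\partial L_D(x_1, x_2, y_1, y_2)}{\partial y_2} = \frac{1}{y_2} - \lambda_2 = 0$$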
Notice that we get two completely symmetric sets of two equations: the first two and the last
two. We can just solve one of these; the other will be exactly the same.
Furthermore, the steps of the solution are almost the same as in the individual case, only a 1 is
changed to a 2 (because of the internalization of the effects of consumption).
We race through it:
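$$\frac{\partial L_D}{\partial x_1} = \frac{1}{x_1} - 2 - \lambda_1 = 0$$
$$\frac{\partial L_D}{\partial y_1} = \frac{1}{y_1} - \lambda_1 = 0 \;\Rightarrow\; \lambda_1 = \frac{1}{y_1}$$
$$\frac{1}{x_1} - 2 = \frac{1 - 2x_1}{x_1} = \frac{1}{y_1} \;\Rightarrow\; y_1 = \frac{x_1}{1 - 2x_1}$$
$$x_1 + y_1 = x_1 + \frac{x_1}{1 - 2x_1} = \frac{x_1(1 - 2x_1) + x_1}{1 - 2x_1} = \frac{(2 - 2x_1)x_1}{1 - 2x_1} = 10$$
$$-2x_1^2 + 2x_1 = 10 - 20x_1 \;\Rightarrow\; 2x_1^2 - 22x_1 + 10 = 0$$
$$x_1 = \frac{22 \pm \sqrt{484 - 80}}{4} = \frac{22 \pm \sqrt{404}}{4}$$
$$y_1 = 10 - x_1 = \frac{18 \mp \sqrt{404}}{4}$$
Once again, because $y$ cannot be negative, we are left with a single possibility:
$$x_1 = \frac{22 - \sqrt{404}}{4} \approx 0.48, \qquad y_1 = \frac{18 + \sqrt{404}}{4} \approx 9.52$$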
So we find that, as expected, consumption of x is reduced, because the externality is now
taken into account. Finally, we check that utility is indeed higher now.
Utility before was:
$$U_i(x_1, x_2, y_i) = \log(x_i) + \log(y_i) - x_1 - x_2 = \log(0.9) + \log(9.1) - 0.9 - 0.9 \approx 0.3$$
Utility afterwards is:
$$U_i(x_1, x_2, y_i) = \log(x_i) + \log(y_i) - x_1 - x_2 = \log(0.48) + \log(9.52) - 0.48 - 0.48 \approx 0.56$$
So our individuals are better off.
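The comparison can be reproduced with a few lines of code. This is a sketch under the assumptions of the exercise (natural logarithms, income 10, symmetric individuals); the helper names are hypothetical and only the two quadratic equations derived above are used.

```python
import math


def smaller_root(a, b, c):
    """Smaller root of a*x**2 + b*x + c = 0 for a > 0 (keeps y = 10 - x positive here)."""
    discriminant = b * b - 4 * a * c
    return (-b - math.sqrt(discriminant)) / (2 * a)


def utility(x_own, y_own, x_other):
    """U_i = log(x_i) + log(y_i) - x_1 - x_2, using natural logarithms."""
    return math.log(x_own) + math.log(y_own) - x_own - x_other


income = 10.0
x_individual = smaller_root(1.0, -12.0, 10.0)  # x^2 - 12x + 10 = 0
x_dictator = smaller_root(2.0, -22.0, 10.0)    # 2x^2 - 22x + 10 = 0

# Both individuals are symmetric, so the neighbour consumes the same x.
u_individual = utility(x_individual, income - x_individual, x_individual)
u_dictator = utility(x_dictator, income - x_dictator, x_dictator)
print(f"individual: x = {x_individual:.2f}, U = {u_individual:.2f}")
print(f"dictator:   x = {x_dictator:.2f}, U = {u_dictator:.2f}")
```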
In what follows we use the KKT and Lagrange methods interchangeably. We advise that you use the former yourself, but you will encounter both in the literature, so don't be confused.
2.
f* f*
Maximize f ( x, y) x2 y2 yx x . Find and by direct computation and by the
envelope theorem.
Solution:
From the first-order conditions we obtain:
f ( x, y )
2x y 0
x
f ( x, y ) x
2y x 0 y
y 2
2
2
2x y 2 x x 2
,y 2
2 4 4
2 2 2 2 2 2
4 2 2
f *( , ) 2
( 4) 2 ( 2
4) 2 ( 2
4) 2 2
4
2 2 2 2
( 4) 2
2
( 4) 2 2
4 2
4
*
f
We can now calculate directly:
f* 2
2
2 2
4 4
Lets compare this to the envelope theorem:
f* f 2
( x2 y 2 yx x) x* 2
x x* , y y* x x* , y y* 4
The results are the same. Please notice that for the envelope theorem, we did not have to
calculate f * at all. That saves a lot of time.
f*
Lets do :
f* 2
2 2
2
4 ( 2 4)2
With the envelope theorem (dropping the cumbersome conditions x x* etc. for ease of
writing):
f* f 2 2
( x2 y 2 yx x ) y * x*
( 2 4)2
Hurrah, they are the same!
3.
f*
Maximize f ( x, y) x y1 a
subject to x py I . Find both by direct computation and
p
by the envelope theorem.
Solution:
This optimization of a Cobb-Douglas function should look familiar by now:
L ( x , y ) x y1 a (x py I )
L
x 1 y1 a 0
x
L
(1 ) x y a
p 0
y
( x* ) 1 ( y* )1 a
We divide the constraints to get:
y 1 py
x
1 x p 1
py py (1 ) I
py I y ,x I
1 1 p
For the direct computation we have to plug this into f(x,y), to get something fiercely ugly:
(1 ) I 1 1
f * ( I) ( ) ( )1 I (1 )1 p 1I
p p
This yields:
f*
( 1) p 2 (1 )1 I p 2 (1 )2
p
Lets see what the envelope theorem gives us.

f* L
y* ( x* ) 1 ( y* )1 a y* ( x* ) 1
( y* )2 a
p p x x* , y y*
1 (1 )I 1
( I) ( )2 a
( )2 I
p p
Once more, they are the same.
4.
A firm has the profit function ( p, A) ( p c) Q( p, A) A , where p is price, Q demand, A
the amount of advertising, c the constant unit cost of production and the cost of advertising.
Find the effect of on pricing and on profits.
Solution:
This is now easy-peasy. We just apply the envelope theorem:
*
A*
p p* , A A*
So a marginal increase in advertising costs decreases your profits by the (optimal) initial amount of advertising. At the margin, we don't have to take into account the fact that you will also change your amount of advertising.
Since the problem is posed so abstractly, this is all we can really say; we cannot derive any meaningful result on $A^*$.
5. Maximize f ( x, y) x2 y 2 , s.t. x y c , and calculate the shadow price. Show that

f*
, where f * (c) f ( x* , y* ) , the value function.
c
Solution:
We check the KKT condition:
2x 1
f g , which gives us:
2y 1
2x 2y
c c
x* y* c x* , y* 2 c
2 2
Now our value function f * (c) is the objective function f evaluated at the optimal values
x* (c), y* (c) :
c c c2
f * (c ) f ( x* (c), y* (c)) ( ) 2 ( ) 2
2 2 2
We can now take the derivative of this with respect to c. This shows us how much our optimal
value of f changes when we change the constraint c:
f * (c ) c2
( ) c
c c 2
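6. Maximize $f(x, y) = x + y$, s.t. $x^2 + y = c$, and calculate the shadow price.
Show that $\frac{\partial f^*}{\partial c} = \lambda$.
Solution:
$$L(x, y) = x + y + \lambda(c - x^2 - y)$$
$$\frac{\partial L}{\partial x} = 1 - 2\lambda x = 0, \quad \frac{\partial L}{\partial y} = 1 - \lambda = 0 \;\Rightarrow\; \lambda = 1, \; x = \frac{1}{2}, \; y = c - \frac{1}{4}$$
Now the value function becomes:
$$f^*(c) = f(x^*(c), y^*(c)) = \frac{1}{2} + c - \frac{1}{4} = c + \frac{1}{4}$$
and
$$\frac{\partial f^*(c)}{\partial c} = 1 = \lambda$$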
7. In one variable it is a bit confusing, but it still works:

2 f*
Maximize f ( x) x , s.t. x c . Show that . (the shadow price).
c
Solution:
L( x ) x 2 (c x )
L
2x 0 2 x, x c 2c
x
Our value function becomes:
f * (c) f ( x* (c)) c2 and
f * (c )
2c
c
ADVANCED MATH ADDITIONAL ASSIGNMENTS WEEK 6
Additional assignments Week 6 advanced mathematics
Additional assignment 1.
An example with several constraints:
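$$\max \; x + y$$
subject to $x + 2y \le c_1$ and $3x + y \le c_2$.
Show that $\frac{\partial f^*}{\partial c_1} = \lambda_1$, $\frac{\partial f^*}{\partial c_2} = \lambda_2$.
Solution:
We observe that the objective function is concave and the constraints convex. We first try both constraints binding, so
$$x + 2y = c_1, \quad 3x + y = c_2 \;\Rightarrow\; y = \frac{3c_1 - c_2}{5}, \quad x = \frac{2c_2 - c_1}{5}$$
We check if this is a KKT-point:
$$L(x, y) = x + y - \lambda_1(x + 2y - c_1) - \lambda_2(3x + y - c_2)$$
$$\frac{\partial L}{\partial x} = 1 - \lambda_1 - 3\lambda_2 = 0$$
$$\frac{\partial L}{\partial y} = 1 - 2\lambda_1 - \lambda_2 = 0$$
$\lambda_1 = \frac{2}{5} > 0$, $\lambda_2 = \frac{1}{5} > 0$, so this is a KKT-point and a maximum. Now let's consider our value function:
$$f^*(c_1, c_2) = x^* + y^* = \frac{2c_2 - c_1}{5} + \frac{3c_1 - c_2}{5} = \frac{2c_1 + c_2}{5}$$
Let's take the partial derivatives:
$$\frac{\partial f^*}{\partial c_1} = \frac{2}{5} = \lambda_1$$
$$\frac{\partial f^*}{\partial c_2} = \frac{1}{5} = \lambda_2$$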
Everything is as it should be!
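The same result can be checked numerically. The sketch below is an illustration only: it solves the two binding constraints and the two first-order conditions with Cramer's rule and compares the multipliers with finite-difference derivatives of the value function. The values of `c1` and `c2` and the helper names are assumptions made for the example.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system is singular")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def value_function(c1, c2):
    """x* + y* when both constraints x + 2y = c1 and 3x + y = c2 bind."""
    x, y = solve_2x2(1, 2, 3, 1, c1, c2)
    return x + y


# First-order conditions: lambda1 + 3*lambda2 = 1 and 2*lambda1 + lambda2 = 1.
lambda_1, lambda_2 = solve_2x2(1, 3, 2, 1, 1, 1)

c1, c2, step = 4.0, 7.0, 1e-6  # illustrative constraint levels
slope_c1 = (value_function(c1 + step, c2) - value_function(c1 - step, c2)) / (2 * step)
slope_c2 = (value_function(c1, c2 + step) - value_function(c1, c2 - step)) / (2 * step)
print(lambda_1, lambda_2, slope_c1, slope_c2)
```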
Maximize f ( x) x2 x 1 . Denote the solution by x* and the value function, or indirect
f*
objective function f * ( ) f ( x* ) . Find by direct computation and by the envelope
theorem.
Solution:
We check the first-order condition:
df
2 x* 0 x*
dx *
x x 2
2 2 2
f *( ) x*2 x* 1 1 1
4 2 4
f*
We now compute directly:
f* 2
1
4 2
Instead we could have used the envelope theorem, which says:
f* f
[ x2 x 1]x x* x x x*
x x* 2
The two methods give the same outcome. In this case, they are equally easy, but in general,
the envelope theorem saves you a lot of work, as we will see in the next exercise.
f*
Maximize f ( x) x 2 subject to x 10 . Find by direct computation and by the
envelope theorem.
Solution:
Clearly this problem is a little silly if we view it as a maximization problem: the constraint
f*
already fixes x: x 10 . Still, for the computation of it will be worthwhile to write
this down with a Lagrangian:
L( x) x2 x 10
This gives us FO:
L
2 x 0 2 x
x
From the constraint we got x 10 , so:
f *( ) (10 )2
From this we obtain directly:
f*
(10 )2 2 (10 )
We could also have gotten this by the envelope theorem, which for a constrained optimization
problem is as follows:
f* L
x*2 (10 )2 2 x* (10 ) 2 2 (10 )
x x*
The outcome is once more the same.
Week 7 Integral calculus and dynamic analysis (I)
Integral calculus
Further issues of integration See lecture slides
Differential calculus
Solution concept (solution is a function) K.14.1.
Monotonic, oscillatory (convergence, divergence) K.14.1.
Steady state K.14.1.
Phase diagram K.14.1.
Stability K.14.1.
Integral Calculus
Remember from week 6 that we may calculate the area under a function. However, a function may not be well defined everywhere. E.g. take the following function, which is only defined for $x \in (0, \infty)$:
$$f(x) = \frac{1}{\sqrt{x}}$$
The antiderivative is $F(x) = 2\sqrt{x}$.
Example 1:
$$f(x) = \frac{1}{\sqrt{x}}$$
Note that f(x) has a vertical asymptote at x=0. Yet, we can compute the following area:
$$\int_0^{16} \frac{1}{\sqrt{x}}\,dx = \left[2\sqrt{x}\right]_{x \to 0}^{16} = 8 - \lim_{x \to 0} 2\sqrt{x} = 8 - 0 = 8$$
Example 2:
$$f(x) = \frac{1}{\sqrt{x}}$$
We can calculate an improper integral, in which the lower limit approaches negative infinity or the upper limit approaches positive infinity. The following integral has no finite limit. It diverges.
$$\int_0^{\infty} \frac{1}{\sqrt{x}}\,dx = \left[2\sqrt{x}\right]_{x \to 0}^{x \to \infty} = \lim_{x \to \infty} 2\sqrt{x} - \lim_{x \to 0} 2\sqrt{x} = \infty - 0 = \infty$$
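Example 3:
$$f(x) = \frac{1}{x\sqrt{x}}$$
The integral does not exist:
$$\int_0^{16} \frac{1}{x\sqrt{x}}\,dx = \left[-\frac{2}{\sqrt{x}}\right]_{x \to 0}^{16} = -\frac{2}{\sqrt{16}} + \lim_{x \to 0} \frac{2}{\sqrt{x}} = \infty$$
Example 4:
$$f(x) = \frac{1}{x\sqrt{x}}$$
The integral does not exist:
$$\int_0^{\infty} \frac{1}{x\sqrt{x}}\,dx = \left[-\frac{2}{\sqrt{x}}\right]_{x \to 0}^{x \to \infty} = \lim_{x \to \infty}\left(-\frac{2}{\sqrt{x}}\right) + \lim_{x \to 0} \frac{2}{\sqrt{x}} = 0 + \infty = \infty$$
Example 5:
$$f(x) = \frac{1}{x\sqrt{x}}$$
The integral does exist:
$$\int_1^{\infty} \frac{1}{x\sqrt{x}}\,dx = \left[-\frac{2}{\sqrt{x}}\right]_{x = 1}^{x \to \infty} = \lim_{x \to \infty}\left(-\frac{2}{\sqrt{x}}\right) + \frac{2}{\sqrt{1}} = 2$$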
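Example 6: exponential distribution
Consider the following function:
$$f(x) = \lambda e^{-\lambda x}$$
The integral does exist and the area is exactly equal to one:
$$\int_0^{\infty} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_{t = 0}^{t \to \infty} = \lim_{t \to \infty}\left(-e^{-\lambda t}\right) - (-1) = 0 + 1 = 1$$
t may be considered as a random variable, which is the duration of a certain event. It can be shown that $\frac{1}{\lambda}$ is the expected duration of t.
Example 7: exponential distribution
For an exponential distribution of
$$f(x) = 2e^{-2x}$$
(so $\lambda = 2$ and the expected value is $\frac{1}{\lambda}$, half a day), what is the probability of having a duration longer than one day?
$$\int_1^{\infty} 2e^{-2t}\,dt = \left[-e^{-2t}\right]_{t = 1}^{t \to \infty} = \lim_{t \to \infty}\left(-e^{-2t}\right) - \left(-e^{-2}\right) = 0 + e^{-2} \approx 0.135$$
Thus the probability is 13.5 percent.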
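The tail probability of Example 7 can be cross-checked with a crude numerical integration. The sketch below uses a midpoint rule and cuts the infinite upper limit off at a large finite value; the function name, the cut-off 50 and the number of slices are illustrative assumptions.

```python
import math

def tail_probability(rate, start, upper=50.0, n=200_000):
    """Midpoint-rule estimate of the integral of rate*exp(-rate*t) over [start, upper]."""
    width = (upper - start) / n
    total = sum(rate * math.exp(-rate * (start + (i + 0.5) * width)) for i in range(n))
    return total * width

approx = tail_probability(rate=2.0, start=1.0)
exact = math.exp(-2.0)  # closed form from the example: e^(-2)
print(round(approx, 4), round(exact, 4))
```

The part of the integral beyond the cut-off is of order e^(-100) and can be ignored here.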
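Integration and differentiation
The continuous function $f(x; \alpha)$ is a function of x, but it also depends on the parameter $\alpha$.
If the range of integration does not depend on $\alpha$, integration and differentiation are interchangeable:
$$\frac{d}{d\alpha}\int_a^b f(x; \alpha)\,dx = \int_a^b \frac{\partial}{\partial \alpha} f(x; \alpha)\,dx$$
Example 8:
Take the integral, which depends on the unknown $\alpha$:
$$\int_1^4 (3t^2 + \alpha t)\,dt = \left[t^3 + \frac{1}{2}\alpha t^2\right]_{t = 1}^{4} = (64 + 8\alpha) - (1 + 0.5\alpha) = 63 + 7.5\alpha$$
$$\frac{d}{d\alpha}\int_1^4 (3t^2 + \alpha t)\,dt = \frac{d}{d\alpha}(63 + 7.5\alpha) = 7.5$$
$$\int_1^4 \frac{\partial}{\partial \alpha}(3t^2 + \alpha t)\,dt = \int_1^4 t\,dt = \left[\frac{1}{2}t^2\right]_{t = 1}^{4} = 8 - 0.5 = 7.5$$
Thus
$$\frac{d}{d\alpha}\int_1^4 (3t^2 + \alpha t)\,dt = \int_1^4 \frac{\partial}{\partial \alpha}(3t^2 + \alpha t)\,dt$$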
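A small numerical experiment illustrates the interchange of differentiation and integration in Example 8. The sketch assumes the integrand 3t^2 + alpha*t on [1, 4]; the helper names and the value alpha = 2 are hypothetical.

```python
def integral(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / n
    return sum(g(a + (i + 0.5) * width) for i in range(n)) * width

def area(alpha):
    """Integral of 3t^2 + alpha*t over [1, 4] as a function of alpha."""
    return integral(lambda t: 3 * t**2 + alpha * t, 1.0, 4.0)

alpha, h = 2.0, 1e-4
# Differentiate the integral with respect to alpha (central difference).
outside = (area(alpha + h) - area(alpha - h)) / (2 * h)
# Integrate the partial derivative of the integrand, which is simply t.
inside = integral(lambda t: t, 1.0, 4.0)
print(outside, inside)
```

Both routes should return a value close to 7.5, independent of the chosen `alpha`.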
Integration and differentiation
Fundamental theorem of calculus
Let $f(x)$ be a function that is integrable on $[a, x]$ for each x in $[a, b]$. Let c be such that $a \le c \le b$. Define $F(x)$ as follows:
$$F(x) = \int_c^x f(t)\,dt$$
which is a function of the upper limit of the integral.
The derivative of $F(x)$ becomes: $F'(x) = f(x)$
Example 9:
$$F(x) = \int_1^x t^2\,dt = \frac{1}{3}x^3 - \frac{1}{3}$$
$$F'(x) = x^2$$
Example 10:
A statistical probability density function is $f(x)$. The cumulative function is $F(x) = \int_{-\infty}^{x} f(t)\,dt$.
We have the following features:
$$\int_{-\infty}^{\infty} f(t)\,dt = 1$$
$$F(-\infty) = 0$$
$$F(\infty) = 1$$
We can derive the density function from the cumulative function:
$$F'(x) = f(x)$$
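Multiple integrals
We can also compute multiple integrals, in which we keep the other variable constant.
Example 11:
$$\int_0^1\int_1^2 (x^2 + xy)\,dy\,dx = \int_0^1\left[\int_1^2 (x^2 + xy)\,dy\right]dx = \int_0^1\left[x^2 y + \frac{1}{2}xy^2\right]_{y = 1}^{2}dx$$
$$= \int_0^1\left[(2x^2 + 2x) - \left(x^2 + \frac{1}{2}x\right)\right]dx = \int_0^1\left(x^2 + \frac{3}{2}x\right)dx = \left[\frac{1}{3}x^3 + \frac{3}{4}x^2\right]_{x = 0}^{1} = \frac{1}{3} + \frac{3}{4} = \frac{13}{12}$$
Theorem
Let f be a continuous function defined on the rectangle $R = [a, b] \times [c, d]$.
Then
$$\int_a^b\left[\int_c^d f(x, y)\,dy\right]dx = \int_c^d\left[\int_a^b f(x, y)\,dx\right]dy$$
Explanation:
On the left-hand side: we first integrate over $[c, d]$ with respect to y; next we integrate over $[a, b]$ with respect to x.
On the right-hand side: we first integrate over $[a, b]$ with respect to x; next we integrate over $[c, d]$ with respect to y.
Example 12:
$$\int_1^2\int_0^1 (x^2 + xy)\,dx\,dy = \int_1^2\left[\int_0^1 (x^2 + xy)\,dx\right]dy = \int_1^2\left[\frac{1}{3}x^3 + \frac{1}{2}x^2 y\right]_{x = 0}^{1}dy$$
$$= \int_1^2\left(\frac{1}{3} + \frac{1}{2}y\right)dy = \left[\frac{1}{3}y + \frac{1}{4}y^2\right]_{y = 1}^{2} = \left(\frac{2}{3} + \frac{4}{4}\right) - \left(\frac{1}{3} + \frac{1}{4}\right) = \frac{13}{12}$$
This outcome is equal to that of example 11.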
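The equality of the two orders of integration in Examples 11 and 12 can also be seen numerically. The sketch below nests a simple midpoint rule; the helper name `midpoint` and the grid size are illustrative choices.

```python
def midpoint(g, a, b, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / n
    return sum(g(a + (i + 0.5) * width) for i in range(n)) * width

def f(x, y):
    return x**2 + x * y

# Inner integral over y in [1, 2], outer integral over x in [0, 1].
dy_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 1.0, 2.0), 0.0, 1.0)
# Inner integral over x in [0, 1], outer integral over y in [1, 2].
dx_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 1.0, 2.0)
print(dy_dx, dx_dy, 13 / 12)
```

Both approximations should lie close to 13/12, in line with the theorem for continuous functions on a rectangle.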
Differential equations (Chapter 14)
In a differential equation, the unknown is a function, not a number.
The equation contains one or more of the derivatives of the function.
We consider ordinary differential equations: the unknown is a function of only one variable.
We focus on linear first-order differential equations.
Example 13
Autonomous equation (time does not appear explicitly).
The differential equation describes natural growth:
$$\frac{dx(t)}{dt} = ax(t)$$
Otherwise formulated: $\dot{x}(t) = ax(t)$ (the dot above x denotes the derivative with respect to time)
General solution: $x(t) = Ce^{at}$
Because:
$$\frac{dx(t)}{dt} = aCe^{at} = ax(t)$$
Definite solution: $x(0) = Ce^{a \cdot 0} = Ce^0 = C$ ($x(0)$ is the initial value)
The solution is stable if a < 0:
$$\lim_{t \to \infty} x(t) = x(0)\lim_{t \to \infty} e^{at} = 0$$
The solution is unstable if a > 0:
$$\lim_{t \to \infty} x(t) = x(0)\lim_{t \to \infty} e^{at} = \pm\infty \quad (\text{if } x(0) \neq 0)$$
Example 14
Equation with a constant term. Next, we consider the general solution of
$$\frac{dx(t)}{dt} + ax(t) = b, \qquad a \neq 0$$
or
$$\dot{x}(t) + ax(t) = b$$
We can multiply this equation by $e^{at}$:
$$\frac{dx(t)}{dt}e^{at} + ax(t)e^{at} = be^{at}$$
Which can be rewritten as:
$$\frac{d}{dt}\left(x(t)e^{at}\right) = be^{at}$$
Thus:
$$x(t)e^{at} = \int be^{at}\,dt = \frac{b}{a}e^{at} + C$$
Thus:
$$x(t) = \frac{b}{a} + Ce^{-at}$$
$x(t)$ converges to $\frac{b}{a}$ as $t \to \infty$, provided that $a > 0$.
Example 15
Find the general solution of $\frac{dx(t)}{dt} + 4x(t) = 12$
Solution: $x(t) = 3 + Ce^{-4t}$
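To see the convergence towards b/a = 3 at work, the equation of Example 15 can be stepped forward in time. The sketch below uses the explicit Euler method, which is our choice of numerical scheme and not part of the lecture; the starting value 10 and the step count are arbitrary.

```python
import math

def euler(x0, t_end, steps):
    """Explicit Euler for dx/dt = 12 - 4x, starting from x(0) = x0."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += dt * (12 - 4 * x)
    return x

x0, t_end = 10.0, 2.0
numeric = euler(x0, t_end, steps=200_000)
# Closed form 3 + C*exp(-4t) with C = x(0) - 3 from the initial value.
exact = 3 + (x0 - 3) * math.exp(-4 * t_end)
print(numeric, exact)
```

With a small step the simulated path stays close to the closed-form solution, and both approach 3 as t grows.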
Example 16
Next, we consider the general solution of an equation that contains a time-varying term b(t):
$$\frac{dx(t)}{dt} + ax(t) = b(t), \qquad a \neq 0$$
or
$$\dot{x}(t) + ax(t) = b(t)$$
We can multiply this equation by $e^{at}$:
$$\frac{dx(t)}{dt}e^{at} + ax(t)e^{at} = b(t)e^{at}$$
Which can be rewritten as:
$$\frac{d}{dt}\left(x(t)e^{at}\right) = b(t)e^{at}$$
Thus:
$$x(t)e^{at} = \int b(t)e^{at}\,dt + C$$
Thus:
$$x(t) = e^{-at}\int b(t)e^{at}\,dt + Ce^{-at}$$
Separable equations
Let's suppose that
$$\frac{d}{dt}x(t) = F(t, x)$$
(or: $\dot{x}(t) = F(t, x)$)
for which $F(t, x)$ is the product of two functions:
$$F(t, x) = f(t)\,g(x)$$
Solution:
Step 1: write the differential equation as:
$$\frac{dx}{dt} = f(t)\,g(x)$$
Step 2: separate the variables:
$$\frac{dx}{g(x)} = f(t)\,dt$$
Step 3: integrate each side:
$$\int\frac{dx}{g(x)} = \int f(t)\,dt$$
Solve both integrals.
Example 17
Find the general solution of the differential equation:
$$\frac{dx}{dt} = -2tx^2$$
Step 2: Separate the equation.
$$\frac{dx}{x^2} = -2t\,dt$$
Step 3: Integrate
$$\int\frac{dx}{x^2} = \int -2t\,dt$$
Which has the solution:
$$\frac{1}{x} = t^2 + C \quad (C \text{ is a constant})$$
or
$$x = \frac{1}{t^2 + C}$$
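A solution obtained by separation of variables is easy to verify by substituting it back. The sketch below assumes the equation of Example 17 reads dx/dt = -2*t*x^2 with solution x = 1/(t^2 + C); it approximates the derivative by a central difference and evaluates the residual at a few hypothetical test points.

```python
def x(t, C):
    """Candidate solution x(t) = 1 / (t^2 + C)."""
    return 1.0 / (t**2 + C)

def residual(t, C, h=1e-5):
    """dx/dt + 2*t*x^2, which should vanish if x solves dx/dt = -2*t*x^2."""
    dxdt = (x(t + h, C) - x(t - h, C)) / (2 * h)
    return dxdt + 2 * t * x(t, C) ** 2

worst = max(abs(residual(t, C)) for t in (0.0, 0.5, 1.0, 2.0) for C in (1.0, 3.0))
print(worst)
```

A residual that is numerically zero for several values of C supports that the whole family is a solution.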
Example 18
Find the general solution of the differential equation:
$$\frac{dx}{dt} = \frac{t^3}{x^6 + 1}$$
Separate: $(x^6 + 1)\,dx = t^3\,dt$
Integrate: $\int (x^6 + 1)\,dx = \int t^3\,dt$
which has the solution: $\frac{1}{7}x^7 + x = \frac{1}{4}t^4 + C$ (C is a constant)
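Tutorial of Week 7 Advanced mathematics
Technical tutorial (Wednesday)
1. Compute the following integrals:
a) $\int_{-3}^{\infty}\frac{1}{\sqrt{t + 3}}\,dt$
Solution:
$$\int_{-3}^{\infty}\frac{1}{\sqrt{t + 3}}\,dt = \int_{-3}^{\infty}\frac{1}{\sqrt{t + 3}}\,d(t + 3)$$
We apply the substitution method, by substituting t+3=x.
For t = -3 (lower bound of integral), x = 0. The upper bound of the integral does not change.
$$\int_{-3}^{\infty}\frac{1}{\sqrt{t + 3}}\,d(t + 3) = \int_0^{\infty}\frac{1}{\sqrt{x}}\,dx = \left[2\sqrt{x}\right]_{x \to 0}^{x \to \infty} = \lim_{x \to \infty} 2\sqrt{x} - \lim_{x \to 0} 2\sqrt{x} = \infty$$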
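b) $\int_{-\infty}^{1/3} e^{-3t}\,dt$
Solution:
$$\int_{-\infty}^{1/3} e^{-3t}\,dt = \left[-\frac{1}{3}e^{-3t}\right]_{t \to -\infty}^{1/3} = -\frac{1}{3}e^{-3(1/3)} + \frac{1}{3}\lim_{t \to -\infty} e^{-3t} = -\frac{1}{3e} + \infty = \infty$$
b additional exercise)
$$\int_0^{\infty} e^{-3t}\,dt = \left[-\frac{1}{3}e^{-3t}\right]_{t = 0}^{t \to \infty} = -\frac{1}{3}\lim_{t \to \infty} e^{-3t} + \frac{1}{3}e^{0} = 0 + \frac{1}{3} = \frac{1}{3}$$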
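c) $\int_1^{\infty} xe^{-x^2}\,dx$
Solution:
$$\int_1^{\infty} xe^{-x^2}\,dx = \int_1^{\infty}\frac{1}{2}e^{-x^2}\,d(x^2)$$
We apply the substitution method, by substituting $x^2 = t$.
For x=1 (lower bound of integral), t = 1. The upper bound of the integral does not change.
$$\int_1^{\infty}\frac{1}{2}e^{-x^2}\,d(x^2) = \frac{1}{2}\int_1^{\infty} e^{-t}\,dt = \left[-\frac{1}{2}e^{-t}\right]_{t = 1}^{t \to \infty} = -\frac{1}{2}\lim_{t \to \infty} e^{-t} + \frac{1}{2}e^{-1} = 0 + \frac{1}{2e} = \frac{1}{2e}$$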
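d) $\int_0^1\int_e^3\left(xy + \frac{y}{x}\right)dx\,dy$ {and show that it is equal to $\int_e^3\int_0^1\left(xy + \frac{y}{x}\right)dy\,dx$}
Solution:
$$\int_0^1\int_e^3\left(xy + \frac{y}{x}\right)dx\,dy = \int_0^1\left[\int_e^3\left(xy + \frac{y}{x}\right)dx\right]dy = \int_0^1\left[\frac{1}{2}x^2 y + y\ln(x)\right]_{x = e}^{3}dy$$
$$= \int_0^1\left(\frac{9}{2}y - \frac{1}{2}e^2 y + y\ln(3) - y\right)dy$$
$$= \left[\frac{9}{4}y^2 - \frac{1}{4}e^2 y^2 + \frac{1}{2}y^2\ln(3) - \frac{1}{2}y^2\right]_{y = 0}^{1} = \frac{7}{4} - \frac{1}{4}e^2 + \frac{1}{2}\ln(3)$$
In the other order:
$$\int_e^3\int_0^1\left(xy + \frac{y}{x}\right)dy\,dx = \int_e^3\left[\int_0^1\left(xy + \frac{y}{x}\right)dy\right]dx = \int_e^3\left[\frac{1}{2}xy^2 + \frac{1}{2}\frac{y^2}{x}\right]_{y = 0}^{1}dx$$
$$= \int_e^3\left(\frac{1}{2}x + \frac{1}{2x}\right)dx = \left[\frac{1}{4}x^2 + \frac{1}{2}\ln(x)\right]_{x = e}^{3} = \frac{9}{4} + \frac{1}{2}\ln(3) - \frac{1}{4}e^2 - \frac{1}{2} = \frac{7}{4} - \frac{1}{4}e^2 + \frac{1}{2}\ln(3)$$
e) $\int f'(t)\,dt$
Solution:
$\int f'(t)\,dt = f(t) + C$, where C is a constant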
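2. Please compute the first-order derivative of the following functions
a. $F(x) = \int_1^x e^{-3t}\,dt$
Solution:
$$F'(x) = e^{-3x}$$
Check by direct computation:
$$F(x) = \int_1^x e^{-3t}\,dt = \left[-\frac{1}{3}e^{-3t}\right]_{t = 1}^{x} = -\frac{1}{3}e^{-3x} + \frac{1}{3}e^{-3}$$
$$F'(x) = -\frac{1}{3}(-3)e^{-3x} = e^{-3x}$$
b. $F(x) = \int_1^{2x} e^{-3t}\,dt$ (apply chain rule)
Solution:
$$F'(x) = e^{-3(2x)}\cdot\frac{d(2x)}{dx} = 2e^{-6x}$$
Check by direct computation:
$$F(x) = \int_1^{2x} e^{-3t}\,dt = \left[-\frac{1}{3}e^{-3t}\right]_{t = 1}^{2x} = -\frac{1}{3}e^{-6x} + \frac{1}{3}e^{-3}$$
$$F'(x) = -\frac{1}{3}(-6)e^{-6x} = 2e^{-6x}$$
c. $F(x) = \int_x^3 e^{-3t}\,dt$
Solution:
$$F(x) = \int_x^3 e^{-3t}\,dt = -\int_3^x e^{-3t}\,dt$$
$$F'(x) = -e^{-3x}$$
Check by direct computation:
$$F(x) = \int_x^3 e^{-3t}\,dt = -\int_3^x e^{-3t}\,dt = -\left[-\frac{1}{3}e^{-3t}\right]_{t = 3}^{x} = \frac{1}{3}e^{-3x} - \frac{1}{3}e^{-9}$$
$$F'(x) = -e^{-3x}$$
d. $F(x) = \int_{-\infty}^{x} e^{-3t}\,dt$
Solution:
$$F'(x) = e^{-3x}$$
Formally (the lower limit contributes a term that does not depend on x):
$$F(x) = \int_{-\infty}^{x} e^{-3t}\,dt = \left[-\frac{1}{3}e^{-3t}\right]_{t \to -\infty}^{x} = -\frac{1}{3}e^{-3x} + \frac{1}{3}\lim_{t \to -\infty} e^{-3t}$$
$$F'(x) = e^{-3x}$$
e. $F(t) = \int_0^t\frac{1}{x + 1}\,dx$
Solution:
$$F'(t) = \frac{1}{t + 1}$$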
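f. $F(\alpha) = \int_0^1\frac{1}{\alpha x + 1}\,dx$
Solution:
We know that $\int\frac{1}{t}\,dt = \ln(t)$ for positive t. However the denominator is not t, but $\alpha x + 1$. So we need to rearrange the integral a bit:
$$F(\alpha) = \int_0^1\frac{1}{\alpha x + 1}\,dx = \frac{1}{\alpha}\int_0^1\frac{1}{\alpha x + 1}\,d(\alpha x) = \frac{1}{\alpha}\int_0^1\frac{1}{\alpha x + 1}\,d(\alpha x + 1)$$
We apply the substitution method here, $\alpha x + 1 = t$, so that the upper bound (x=1) becomes $\alpha \cdot 1 + 1 = \alpha + 1$, and the lower bound (x=0) is $\alpha \cdot 0 + 1 = 1$:
$$\frac{1}{\alpha}\int_0^1\frac{1}{\alpha x + 1}\,d(\alpha x + 1) = \frac{1}{\alpha}\int_1^{\alpha + 1}\frac{1}{t}\,dt$$
Next, we apply the product rule of differentiation, $(fg)' = f'g + fg'$:
$$F'(\alpha) = \frac{d}{d\alpha}\left[\frac{1}{\alpha}\int_1^{\alpha + 1}\frac{1}{t}\,dt\right] = -\frac{1}{\alpha^2}\int_1^{\alpha + 1}\frac{1}{t}\,dt + \frac{1}{\alpha}\cdot\frac{1}{\alpha + 1} = -\frac{1}{\alpha^2}\left[\ln(t)\right]_{t = 1}^{\alpha + 1} + \frac{1}{\alpha(\alpha + 1)} = -\frac{1}{\alpha^2}\ln(\alpha + 1) + \frac{1}{\alpha(\alpha + 1)}$$
Alternative:
$$F'(\alpha) = \frac{d}{d\alpha}\int_0^1\frac{1}{\alpha x + 1}\,dx = \int_0^1\frac{\partial}{\partial\alpha}\frac{1}{\alpha x + 1}\,dx = \int_0^1\frac{-x}{(\alpha x + 1)^2}\,dx$$
At this stage, I decided to apply integration by parts, because I know that
$$\int\frac{t}{(t + 1)^2}\,dt = -\int t\,d\left(\frac{1}{t + 1}\right) = -\frac{t}{t + 1} + \int\frac{1}{t + 1}\,dt = -\frac{t}{t + 1} + \ln(t + 1)$$
for positive t+1.
{However, there is one problem: it is not t, but $(\alpha x + 1)$. Hence:
$$\frac{1}{(\alpha x + 1)^2}\,dx = -\frac{1}{\alpha}\,d\left(\frac{1}{\alpha x + 1}\right), \quad\text{because}\quad \frac{d}{dx}\left(\frac{1}{\alpha x + 1}\right) = \frac{-\alpha}{(\alpha x + 1)^2}\;\}$$
$$\int_0^1\frac{-x}{(\alpha x + 1)^2}\,dx = \frac{1}{\alpha}\int_0^1 x\,d\left(\frac{1}{\alpha x + 1}\right) = \frac{1}{\alpha}\left[\frac{x}{\alpha x + 1}\right]_{x = 0}^{1} - \frac{1}{\alpha}\int_0^1\frac{1}{\alpha x + 1}\,dx$$
$$= \frac{1}{\alpha(\alpha + 1)} - \frac{1}{\alpha^2}\int_0^1\frac{1}{\alpha x + 1}\,d(\alpha x + 1)$$
$$= \frac{1}{\alpha(\alpha + 1)} - \frac{1}{\alpha^2}\int_1^{\alpha + 1}\frac{1}{t}\,dt = \frac{1}{\alpha(\alpha + 1)} - \frac{1}{\alpha^2}\left[\ln(t)\right]_{t = 1}^{\alpha + 1} = \frac{1}{\alpha(\alpha + 1)} - \frac{1}{\alpha^2}\ln(\alpha + 1)$$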
3. Sketch the direction diagram and the phase diagram (if possible) for the following
differential equations. If possible, give an explicit solution.
a. $\dot{x}(t) = 3x(t)$
Solution:
$x(t) = Ce^{3t}$
The direction diagram, with some solutions drawn in, looks thus:
[Figure: direction diagram with solution curves]
What we see is a picture of widely diverging functions as time progresses. This can be
confirmed by looking at the phase diagram, which, for autonomous equations (those that do
not explicitly depend on time; exercise 3e is an example of one that does), plots the relation between $x$ and $\dot{x}$.
[Figure: phase diagram, x on the horizontal axis and $\dot{x}$ on the vertical axis]
As we can see in the phase diagram, if x is positive, then $\dot{x}$ is also positive. That means that if
x is positive, it will increase over time. Similarly, if it is negative, it will decrease over time.
Only if x is zero, the time-derivative of x is also zero, and x does not change over time. This is
what we saw in the first picture. The solutions are either always increasing or always
decreasing. Only the zero-solution does not change over time. It is called an unstable
equilibrium: if you are there, you stay there, but if you are near, no matter how close you are,
you'll never get there.
b. $\dot{x}(t) = -\frac{1}{2}x(t)$
Solution:
$x(t) = Ce^{-\frac{1}{2}t}$
[Figure: direction diagram with solution curves]
From the picture we see that these solutions do converge to 0. We can again confirm this in
the phase diagram.
[Figure: phase diagram, x on the horizontal axis and $\dot{x}$ on the vertical axis]
This time, a positive x implies a negative $\dot{x}$ and vice versa, so a positive x means that x is
decreasing over time. Again zero is an equilibrium, as it does not change over time. However,
this time, it is a stable equilibrium: if you start somewhere near it, you'll get ever closer. In
fact, in this particular case, it doesn't matter where you start, you always get ever closer to
zero.
c. $\dot{x}(t) = -2x(t) + 1$
Solution:
From the lecture slides we obtain:
$x = \frac{1}{2} + Ce^{-2t}$
[Figure: direction diagram with solution curves]
We see all solutions converging to 0.5. The phase diagram confirms:
[Figure: phase diagram, x on the horizontal axis and $\dot{x}$ on the vertical axis]
0.5 is an equilibrium, because $\dot{x}(\frac{1}{2}) = 0$, and it is stable, because when x is smaller than 0.5, $\dot{x} > 0$ and x
is increasing, and it is decreasing if x is larger than 0.5.
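d. $\dot{x}(t) = 2x(t) + 1$
Solution:
We obtain from the lecture slides:
$x = -\frac{1}{2} + Ce^{2t}$
[Figure: direction diagram with solution curves]
We see -0.5 is an equilibrium, but it is unstable. Again, the phase diagram confirms:
[Figure: phase diagram, x on the horizontal axis and $\dot{x}$ on the vertical axis]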
e. $\dot{x}(t) + tx(t) = 0$
Solution:
This differential equation is of the form:
$\dot{x}(t) + f(t)x(t) = 0$, where $f(t) = t$. These equations have the following general solution:
If $F(t)$ is such that $f(t) = \frac{dF(t)}{dt}$ on a certain interval, then:
$x(t) = Ce^{-F(t)}$ on that interval. Here, the simplest F that works is $F(t) = \frac{1}{2}t^2$ ($F(t) = \frac{1}{2}t^2 + 43$ would also work, but it would make life more complicated). We get:
$$x(t) = Ce^{-\frac{1}{2}t^2}$$
[Figure: direction diagram with solution curves]
Because our differential equation also depends on t, we cannot draw a phase diagram; the
relation between $x$ and $\dot{x}$ changes with time. We can observe however that x=0 is the only
equilibrium. We know that in equilibrium $\dot{x} = 0$ for all t; from this and our differential
equation it follows immediately that x=0.
f. $t\dot{x}(t) + x(t) = 0$
Solution:
This is not of the form $\dot{x}(t) + f(t)x(t) = 0$ that we discussed in 3e), but we can make it so by
dividing both sides by t. We obtain:
$\dot{x}(t) + \frac{x(t)}{t} = 0$, which is of the form we want, with $f(t) = \frac{1}{t}$. We can find antiderivatives of f on
the positive and the negative interval: $F(t) = \log(t)$ and $F(t) = \log(-t)$ respectively.
So we get on the two intervals:
$$x(t) = \begin{cases} Ce^{-\log(t)} & \text{if } t > 0 \\ Ce^{-\log(-t)} & \text{if } t < 0 \end{cases} = \begin{cases} \dfrac{C}{t} & \text{if } t > 0 \\ -\dfrac{C}{t} & \text{if } t < 0 \end{cases}$$
To be sure, let's check this by plugging it into the original equation:
$t\dot{x}(t) + x(t) = -\frac{Ct}{t^2} + \frac{C}{t} = 0$, so it works for t>0. Similarly for t<0.
[Figure: direction diagram with solution curves]
g. $\dot{x}(t) + x(t) = t$
Solution:
The nicest way to solve this is in 2 steps. This is a linear differential equation, meaning that
x and its derivatives appear only linearly (their coefficients could be non-linear, although
that's not the case here). It is not homogeneous; it would be if it was $\dot{x}(t) + x(t) = 0$ instead. In
general, it is homogeneous if, when you write all the terms involving x and its derivatives on
one side, the other side is 0. Now the equation is called inhomogeneous. The solution to an
inhomogeneous linear differential equation can be obtained as follows. Find a simple
particular solution and find the general solution to the homogeneous counterpart. Add the two
and you have your general solution. It will become clear in a minute:
Finding the particular solution can be tricky, but here it's rather easy. If we try $x(t) = t$, we'd
get: $\dot{x}(t) + x(t) = 1 + t \neq t$, but this gives us the hint we need: $x(t) = t - 1$ will work:
$\dot{x}(t) + x(t) = 1 + t - 1 = t$. By being clever like this, you can often find the particular solution.
The homogeneous equation is:
$\dot{x}(t) + x(t) = 0$, which has solution:
$x(t) = Ce^{-t}$.
So our total solution becomes:
$x(t) = Ce^{-t} + t - 1$. We check our result:
$$\dot{x}(t) + x(t) = -Ce^{-t} + 1 + Ce^{-t} + t - 1 = t$$
[Figure: direction diagram with solution curves]
h. $\dot{x}(t) + x(t) = t^2$
Solution:
We apply the same trick. In fact, the homogeneous solution is the same as before:
$x(t) = Ce^{-t}$
(Strictly speaking, we should not write $x(t)$, because this is not a solution to our equation. It
would however be a bit formal and a bit too much work to introduce new notation for this.)
For the particular solution, we first try $x(t) = t^2$, only to see that it should be $x(t) = t^2 - 2t + 2$:
then $\dot{x}(t) + x(t) = 2t - 2 + t^2 - 2t + 2 = t^2$.
So the general solution becomes:
$x(t) = Ce^{-t} + t^2 - 2t + 2$
[Figure: direction diagram with solution curves]
Notice that, although the solutions do not converge to a constant, they all start to resemble the
function $x(t) = t^2 - 2t + 2$, the particular solution, more and more over time. The same thing
happened in exercise 3g); the solutions converged to the particular solution. This is because
the homogeneous solutions converge to 0.
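A general solution of the form "homogeneous part plus particular part" can be checked by substitution. The sketch below does this for the equation x' + x = t^2 with candidate x(t) = C*exp(-t) + t^2 - 2t + 2; the derivative is written out by hand and the test points are arbitrary.

```python
import math

def x(t, C):
    """Candidate general solution: homogeneous part plus particular part."""
    return C * math.exp(-t) + t**2 - 2 * t + 2

def x_dot(t, C):
    """Derivative of the candidate, taken by hand."""
    return -C * math.exp(-t) + 2 * t - 2

# The left-hand side x' + x should equal t^2 for every t and every constant C.
for t in (0.0, 1.0, 3.5):
    for C in (-4.0, 0.0, 2.5):
        assert abs(x_dot(t, C) + x(t, C) - t**2) < 1e-12
print("x' + x = t^2 holds at all test points")
```

The constant term 2 in the particular solution matters: without it the left-hand side would come out as t^2 - 2.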
i. $\dot{x}(t) + tx(t) = t$
Solution:
Again, we try to find a particular solution. It turns out to be particularly easy in this case:
$x(t) = 1$ works.
The solution to the homogeneous equation $\dot{x}(t) + tx(t) = 0$ we know to be (similarly to
exercise 3e):
$$x(t) = Ce^{-\frac{1}{2}t^2}$$
So we obtain:
$$x(t) = Ce^{-\frac{1}{2}t^2} + 1$$
Let's check:
$$\dot{x}(t) + tx(t) = -tCe^{-\frac{1}{2}t^2} + t\left(Ce^{-\frac{1}{2}t^2} + 1\right) = t,$$
it worked!
[Figure: direction diagram with solution curves]
j. $\dot{x}(t) + tx(t) = t^2$
Solution:
Well, sort of. It turns out that there is no handy particular solution to this equation. This is quite
a common occurrence with differential equations. However, we can still see what is going on
by drawing our regular pictures. Only, now the solutions were found numerically by a
computer.
[Figure: direction diagram with numerically computed solution curves]
Notice that the solutions converge to the line x=t. According to the differential equation,
$\dot{x} = 0$ along that line. Unfortunately the line itself is not a solution, because along it $\dot{x} = 1$.
k. $\dot{x}(t) = t^3 e^{-x(t)}$
Solution:
This can be handled by separating the variables, as it is called. The right hand side is a
product of a term containing only x and a term containing only t. So we can rewrite it like
this:
$$\dot{x}(t) = \frac{dx}{dt} = t^3 e^{-x(t)} \quad\Rightarrow\quad e^{x(t)}\,dx = t^3\,dt$$
$$\int e^{x(t)}\,dx = e^{x(t)} = \int t^3\,dt = \frac{1}{4}t^4 + C$$
$$x(t) = \log\left(\frac{1}{4}t^4 + C\right)$$
l. $\dot{x}(t) = \frac{t^2 + 1}{x^5 + 1}$
Solution:
This works in the same way, but the solution stays in implicit form.
$$\int (x^5 + 1)\,dx = \frac{1}{6}x^6 + x = \int (t^2 + 1)\,dt = \frac{1}{3}t^3 + t + C$$
So we end up with the implicit relation between x and t: $\frac{1}{6}x^6 + x = \frac{1}{3}t^3 + t + C$. We can't
simply rewrite this as an explicit function, but we know from week 5 how to handle implicit relations.
m. x(t ) x2 (t ) t
Solution:
No simple solution exists and no phase diagram can be drawn. But we can see the solution
graphically.
[Figure: numerically computed solution curves]
n. x(t ) x(t ) log(t )

Again we can only show the solution via a computer simulation:
[Figure: numerically computed solution curves]
o. $\dot{x}(t) + t^2 x(t) = t^3 + 2t^2 + 1$
Solution:
Particular: $x = t + 2$
Homogeneous: $x = Ce^{-\frac{1}{3}t^3}$
p. x t 3x t 4e t
Solution:
Particular: x 2e t
Homogeneous: x C e 3t
2
q. x t (2t 2) x t e t
Solution:
1 t2
Particular: x e
2
( t 2 2t )
Homogeneous: x C e
Friday broad tutorial
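4. Consider the following probability density function (the exponential distribution):
$f(x) = \lambda e^{-\lambda x}$ for $x \in [0, \infty)$
a. Please calculate the expected value of x.
b. Compute the cumulative distribution of x.
Solution
a. We know that $EX = \int_0^{\infty} x f(x)\,dx$
Solution 1:
Hence, we apply the method of integration by parts:
$$\int_0^{\infty} x\lambda e^{-\lambda x}\,dx = -\int_0^{\infty} x\,d\left(e^{-\lambda x}\right) = \left[-xe^{-\lambda x}\right]_{x = 0}^{x \to \infty} + \int_0^{\infty} e^{-\lambda x}\,dx$$
$$= -\left(\lim_{x \to \infty} xe^{-\lambda x} - 0\right) - \frac{1}{\lambda}\left[e^{-\lambda x}\right]_{x = 0}^{x \to \infty} = 0 - \frac{1}{\lambda}\left(\lim_{x \to \infty} e^{-\lambda x} - 1\right) = \frac{1}{\lambda}$$
It can be shown that $\lim_{x \to \infty} xe^{-\lambda x} = 0$.
Alternatively, substitute first:
$$\int_0^{\infty} x\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}\int_0^{\infty}\lambda x e^{-\lambda x}\,d(\lambda x) \overset{\text{(note 1)}}{=} \frac{1}{\lambda}\int_0^{\infty}\underbrace{t}_{f}\,\underbrace{e^{-t}}_{g'}\,dt$$
$$\overset{\text{(note 2)}}{=} \frac{1}{\lambda}\Big[\underbrace{t}_{f}\,\underbrace{(-e^{-t})}_{g}\Big]_{t = 0}^{t \to \infty} - \frac{1}{\lambda}\int_0^{\infty}\underbrace{1}_{f'}\cdot\underbrace{(-e^{-t})}_{g}\,dt$$
$$= -\frac{1}{\lambda}\left(\lim_{t \to \infty} te^{-t} - 0\right) - \frac{1}{\lambda}\left[e^{-t}\right]_{t = 0}^{t \to \infty} = 0 - \frac{1}{\lambda}\left(\lim_{t \to \infty} e^{-t} - 1\right) = \frac{1}{\lambda}$$
Note 1:
$\lambda x = t$; $\frac{dt}{dx} = \lambda$; $dx = \frac{1}{\lambda}dt$
$x = 0 \Rightarrow t = 0$
$x \to \infty \Rightarrow t \to \infty$
Note 2:
$\int (f g')\,dt = f g - \int (f' g)\,dt$
$f = t$, $g' = e^{-t}$
$f' = 1$, $g = -e^{-t}$
The same steps without the substitution, with $f = x$ and $g' = \lambda e^{-\lambda x}$ (so that $f' = 1$ and $g = -e^{-\lambda x}$):
$$\int_0^{\infty}\underbrace{x}_{f}\,\underbrace{\lambda e^{-\lambda x}}_{g'}\,dx = \Big[\underbrace{x}_{f}\,\underbrace{(-e^{-\lambda x})}_{g}\Big]_{x = 0}^{x \to \infty} - \int_0^{\infty}\underbrace{1}_{f'}\cdot\underbrace{(-e^{-\lambda x})}_{g}\,dx$$
$$= -\left(\lim_{x \to \infty} xe^{-\lambda x} - 0\right) + \left[-\frac{1}{\lambda}e^{-\lambda x}\right]_{x = 0}^{x \to \infty} = 0 - \frac{1}{\lambda}\left(\lim_{x \to \infty} e^{-\lambda x} - e^0\right) = \frac{1}{\lambda}$$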
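The result E[X] = 1/lambda can be cross-checked numerically for a few rates. The sketch below integrates x*lam*exp(-lam*x) with a midpoint rule on a truncated range; the cut-off 60, the number of slices and the chosen rates are illustrative assumptions.

```python
import math

def expected_value(lam, upper=60.0, n=100_000):
    """Midpoint-rule estimate of the integral of x*lam*exp(-lam*x) over [0, upper]."""
    width = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * width
        total += x * lam * math.exp(-lam * x)
    return total * width

for lam in (0.5, 1.0, 2.0):
    print(lam, expected_value(lam), 1 / lam)
```

For each rate the numerical value should be close to 1/lambda; for much smaller rates the cut-off would have to be increased.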
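Additional question 4b.
Compute $\frac{d}{d\lambda}\int_0^{\infty}\lambda e^{-\lambda x}\,dx$
Solution:
We can show that
$$\frac{d}{d\lambda}\int_0^{\infty}\lambda e^{-\lambda x}\,dx = \frac{d}{d\lambda}\,1 = 0$$
But we are also allowed to interchange integration and differentiation:
$$\frac{d}{d\lambda}\int_0^{\infty}\lambda e^{-\lambda x}\,dx = \int_0^{\infty}\frac{\partial}{\partial\lambda}\Big(\underbrace{\lambda}_{f}\,\underbrace{e^{-\lambda x}}_{g}\Big)dx = \int_0^{\infty}\Big(\underbrace{1\cdot e^{-\lambda x}}_{f'g} + \underbrace{\lambda\left(-xe^{-\lambda x}\right)}_{fg'}\Big)dx$$
$$= \int_0^{\infty} e^{-\lambda x}\,dx - \int_0^{\infty}\lambda xe^{-\lambda x}\,dx = \left[-\frac{1}{\lambda}e^{-\lambda x}\right]_{x = 0}^{x \to \infty} - \frac{1}{\lambda} = \frac{1}{\lambda} - \frac{1}{\lambda} = 0$$
(for the second integral, see exercise a)
c. The cumulative distribution is
$$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$$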
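5. The probability of observing a t is $f(t) = e^{-t}$ for $t \in [0, \infty)$
Compute the probability of t larger than 1.
Solution:
$$\Pr(T > 1) = \int_1^{\infty} e^{-t}\,dt = \left[-e^{-t}\right]_{t = 1}^{t \to \infty} = -\lim_{t \to \infty} e^{-t} + \frac{1}{e} = \frac{1}{e}$$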
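6. Consider the Solow economic growth model:
$X = AK^{1-\alpha}L^{\alpha}$ (Cobb Douglas production function)
$\dot{K} = sX$ (investment is proportional to output)
$L = L_0 e^{\lambda t}$ (exponential growth of labour force)
where $X = X(t)$ is the national product, $K = K(t)$ is the capital stock and $L = L(t)$ is the
number of employees at time t. The model contains the following constants: $A$, $\alpha$, $s$, $\lambda$ and $L_0$.
Derive the differential equation to determine $K = K(t)$ when $K_0 = K(0) > 0$
Solution
$$\dot{K} = \frac{dK}{dt} = sAK^{1-\alpha}\left(L_0 e^{\lambda t}\right)^{\alpha} = sA(L_0)^{\alpha}K^{1-\alpha}e^{\alpha\lambda t}$$
It is a separable differential equation. So we have on the left-hand side a function of dK and
K. On the right hand side we have a function of dt and t.
$$K^{\alpha - 1}\,dK = sAL_0^{\alpha}e^{\alpha\lambda t}\,dt$$
We take the integral on both sides of this equation:
$$\int K^{\alpha - 1}\,dK = \int sAL_0^{\alpha}e^{\alpha\lambda t}\,dt$$
which becomes
$$\frac{1}{\alpha}K^{\alpha} = \frac{1}{\alpha\lambda}sAL_0^{\alpha}e^{\alpha\lambda t} + C$$
Next, we rewrite this equation, so that K is a function of t. Thus, we multiply both sides by $\alpha$ ($C_1 = \alpha C$):
$$K^{\alpha} = \frac{sA}{\lambda}L_0^{\alpha}e^{\alpha\lambda t} + C_1$$
If $K = K_0$ for t=0:
$$C_1 = K_0^{\alpha} - \frac{sA}{\lambda}L_0^{\alpha}$$
Thus the solution becomes:
$$K = \left[K_0^{\alpha} + \frac{sA}{\lambda}L_0^{\alpha}\left(e^{\alpha\lambda t} - 1\right)\right]^{1/\alpha}$$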
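The closed-form capital path can be compared with a direct numerical integration of the differential equation. The sketch below assumes the model K' = s*A*K^(1-alpha)*(L0*exp(lam*t))^alpha and uses a classical fourth-order Runge-Kutta scheme, which is our choice and not part of the tutorial; all parameter values are made up for illustration.

```python
import math

# Illustrative parameter values only (not taken from the exercise).
A, s, alpha, lam, L0, K0 = 1.0, 0.2, 0.5, 0.02, 1.0, 1.0

def K_dot(t, K):
    """Right-hand side of the capital accumulation equation."""
    return s * A * K ** (1 - alpha) * (L0 * math.exp(lam * t)) ** alpha

def K_closed(t):
    """Closed-form solution obtained by separating the variables."""
    growth = (s * A / lam) * L0 ** alpha * (math.exp(alpha * lam * t) - 1)
    return (K0 ** alpha + growth) ** (1 / alpha)

def rk4(t_end, steps=1000):
    """Classical Runge-Kutta integration of K_dot from t = 0 to t_end."""
    dt, t, K = t_end / steps, 0.0, K0
    for _ in range(steps):
        k1 = K_dot(t, K)
        k2 = K_dot(t + dt / 2, K + dt * k1 / 2)
        k3 = K_dot(t + dt / 2, K + dt * k2 / 2)
        k4 = K_dot(t + dt, K + dt * k3)
        K += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return K

print(rk4(10.0), K_closed(10.0))
```

Agreement between the two numbers is a useful sanity check on the constant of integration.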
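Additional assignments week 7
Exercise 1
Solve the following integral:
$$\int_0^1\int_1^2\left(\frac{1}{2}xy\right)dx\,dy$$
Solution
$$\int_0^1\int_1^2\left(\frac{1}{2}xy\right)dx\,dy = \int_0^1\left[\int_1^2\left(\frac{1}{2}xy\right)dx\right]dy = \int_0^1\left[\frac{1}{4}x^2 y\right]_{x = 1}^{2}dy = \int_0^1\frac{3}{4}y\,dy = \left[\frac{3}{8}y^2\right]_{y = 0}^{1} = \frac{3}{8}$$
Alternative:
$$\int_1^2\int_0^1\left(\frac{1}{2}xy\right)dy\,dx = \int_1^2\left[\int_0^1\left(\frac{1}{2}xy\right)dy\right]dx = \int_1^2\left[\frac{1}{4}xy^2\right]_{y = 0}^{1}dx = \int_1^2\frac{1}{4}x\,dx = \left[\frac{1}{8}x^2\right]_{x = 1}^{2} = \frac{4}{8} - \frac{1}{8} = \frac{3}{8}$$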
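Exercise 2
Solve the following integral:
$$\int_0^{\infty}\frac{1}{x^2}\,dx$$
Solution:
$$\int_0^{\infty}\frac{1}{x^2}\,dx = \left[-\frac{1}{x}\right]_{x \to 0}^{x \to \infty} = \left(\lim_{x \to \infty} -\frac{1}{x}\right) - \left(\lim_{x \to 0} -\frac{1}{x}\right) = 0 + \infty = \infty$$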
Exercise 3
Differentiate the following function:
$$g(x) = \int_1^{x^2} t^3\log^2(t + 12)\,dt$$
Solution:
The upper limit is a function of x.
$$\frac{dg(x)}{dx} = \underbrace{(x^2)^3\log^2(x^2 + 12)}_{\text{integrand evaluated at the upper limit}}\cdot\underbrace{2x}_{\text{derivative of the upper limit}} = 2x^{2\cdot 3 + 1}\log^2(x^2 + 12) = 2x^7\log^2(x^2 + 12)$$
Note that we applied the chain rule for the upper limit of the integral.
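Exercise 4 (this is material of Week 6, but very relevant for the exam)
Compute $\int\frac{t}{(t + 3)^2}\,dt$, for which you may assume that t is positive.
Solution
We apply the method of integration by parts:
$$\int\frac{t}{(t + 3)^2}\,dt = -\int t\,d\left(\frac{1}{t + 3}\right) = -\frac{t}{t + 3} + \int\frac{1}{t + 3}\,dt = -\frac{t}{t + 3} + \ln(t + 3) + C$$
Two remarks:
Remark 1) We applied in the first step that
$$\frac{d}{dt}\left(\frac{1}{t + 3}\right) = -\frac{1}{(t + 3)^2} \quad\Rightarrow\quad \frac{1}{(t + 3)^2}\,dt = -d\left(\frac{1}{t + 3}\right)$$
Remark 2) We can check that:
$$\frac{d}{dt}\left[-\frac{t}{t + 3} + \ln(t + 3) + C\right] = \frac{d}{dt}\left[-t(t + 3)^{-1} + \ln(t + 3) + C\right] = -(t + 3)^{-1} + t(t + 3)^{-2} + (t + 3)^{-1} = t(t + 3)^{-2}$$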
Exercise 5 (this is material of Week 6, but very relevant for the exam)
Calculate the following integral:
$$\int_2^7 6x^2 e^{x^3}\,dx$$
Solution:
We apply the method of substitution: $t = x^3$, so that $dt = 3x^2\,dx$. The lower limit becomes
$t = 2^3 = 8$, while the upper limit becomes $t = 7^3 = 343$. Plugging this in we find:
$$\int_2^7 6x^2 e^{x^3}\,dx = \int_8^{343} 2e^t\,dt = 2e^t\Big|_8^{343} = 2\left(e^{343} - e^8\right).$$
Alternatively, we could have used the substitution $u = e^{x^3}$, giving $du = 3x^2 e^{x^3}\,dx$. The lower
limit becomes $u = e^{(2)^3} = e^8$, while the upper limit becomes $u = e^{(7)^3} = e^{343}$. We plug all this
in:
$$\int_2^7 6x^2 e^{x^3}\,dx = \int_{e^8}^{e^{343}} 2\,du = 2u\Big|_{e^8}^{e^{343}} = 2\left(e^{343} - e^8\right)$$
Exercise 6
Give the complete solution to the following differential equation. If possible, draw a phase
diagram and discuss.
$\dot{x}(t) - 3x(t) = e^{2t}$.
Hint: for the particular solution, try a multiple of the inhomogeneous part, $e^{2t}$.
Solution:
For a particular solution, as per the hint, we try something of the form of the inhomogeneous
part: $x(t) = ce^{2t}$ for some constant c. We plug this into the equation:
$$\dot{x}(t) - 3x(t) = 2ce^{2t} - 3ce^{2t} = -ce^{2t} = e^{2t} \quad\Rightarrow\quad c = -1 \quad\Rightarrow\quad x(t) = -e^{2t}.$$
We have found our particular solution; let's move on to the homogeneous solution:
$\dot{x}(t) - 3x(t) = 0$ (notice that strictly speaking we should give the function $x(t)$ a new name,
because it doesn't solve the original equation. We don't do so to avoid clogging notation).
Here we know that this is a homogeneous linear first-order differential equation, which we
know to have the solution: $x(t) = Ce^{3t}$, for any constant C. So the total solution becomes:
$$x(t) = Ce^{3t} - e^{2t}$$
$$\dot{x}(t) = 3Ce^{3t} - 2e^{2t}$$
$$\dot{x}(t) - 3x(t) = 3Ce^{3t} - 2e^{2t} - 3\left(Ce^{3t} - e^{2t}\right) = e^{2t}$$
It all worked out.
Notice that we cannot draw a phase diagram, as the relation between $x$ and $\dot{x}$ changes over
time.
Exercise 7
$\dot{x}(t) + 4x(t) = 3$
Solution:
For the particular solution, we try a constant. We see that $x(t) = \frac{3}{4}$ works:
$$\dot{x}(t) + 4x(t) = 0 + 4\cdot\frac{3}{4} = 3$$
For the homogeneous solution we find $x(t) = Ce^{-4t}$, so that the total solution becomes
$x(t) = Ce^{-4t} + \frac{3}{4}$.
Here we can draw a phase diagram. The relation between $x$ and $\dot{x}$ can be rewritten as
$\dot{x}(t) = 3 - 4x(t)$ and graphed:
[Figure: phase diagram of $\dot{x} = 3 - 4x$]
What we see in the phase diagram is that $\dot{x} > 0$ if $x < \frac{3}{4}$, but $\dot{x} < 0$ if $x > \frac{3}{4}$. So x always
moves towards $\frac{3}{4}$ over time and that is a stable equilibrium.
Exercise 8
Sketch the direction diagram and the phase diagram (if possible) for the following differential
equation. If possible, give an explicit solution.
$\dot{x}(t) = 2x(t)$
Solution:
Let's first solve it exactly. We could either rewrite the equation in a form to which we know
the solution, or we could try to guess the solution, because the equation is so simple. We will
start with the guessing approach. $\dot{x}(t) = 2x(t)$ requires that we find functions whose derivative
is twice the function itself. We know that the exponential function is equal to its own derivative, so
it makes sense to see if we can manipulate it to get a solution to our equation. We could try
multiplying the exponential function by a constant, but then it would still be equal to its own derivative:
$\frac{d}{dt}Ce^t = Ce^t$. But if we multiply the exponent, we do get our desired result: $\frac{d}{dt}e^{2t} = 2e^{2t}$.
This gives us exactly what we want. But we know that it will also hold for any multiple of this
function, so we get as a general solution: $x(t) = Ce^{2t}$.
We can also obtain this result in a more procedural fashion: if we rewrite the equation as
$\dot{x}(t) - 2x(t) = 0$, we see that it is a homogeneous linear equation, i.e. of the form
$\dot{x}(t) + f(t)x(t) = 0$, with $f(t) = -2$. This general form has the solution:
$x(t) = Ce^{-F(t)}$, where F is an antiderivative of f, $F'(t) = f(t)$. Here $F(t) = -2t$ and we
obtain as our solution $x(t) = Ce^{2t}$, as before.
We draw the direction diagram (or vector field) and some solutions:
[Figure: direction diagram with solution curves]
Finally, we draw a phase diagram. Here we set $x$ on the horizontal axis and $\dot{x}$ on the vertical
axis.
[Figure: phase diagram of $\dot{x} = 2x$]
From the phase diagram we can also infer the qualitative behaviour of our solutions. In
particular, we see that $\dot{x} = 0$ when x=0. This implies that x=0 is an equilibrium: when x=0, the
x value does not change over time. If x>0, then $\dot{x} > 0$, according to the phase diagram. This
means that if x is positive, it will grow over time. Similarly, if x<0, then $\dot{x} < 0$, so x will
decrease over time. We see that solutions will move away from the equilibrium-value 0 over
time, so that x=0, while an equilibrium, is not stable. All this is corroborated by our actual
solutions.
Week 8 Dynamic analysis (II)
Differential equations
Equilibrium of homogenous equation K.14.1.
Stability of system of equations K.14.3, pages 481-486 (phase diagram)
Difference equations
Introduction to first-order difference equations K. 13.1
Again linear differential equations
We consider the following differential equation, which we solve as follows:
$$\dot{x} = f(x, t)$$
So, let's take a simple case:
$$\dot{x} = f(t)$$
$$\frac{dx}{dt} = f(t)$$
$$dx = f(t)\,dt$$
$$\int dx = \int f(t)\,dt$$
$$x = \int f(t)\,dt + C \quad (C \text{ is some constant})$$
First-order linear differential equations
$$\dot{x} + A(t)x = B(t)$$
Solution:
Step 1
Find the general solution to the homogeneous equation (or reduced equation)
$$\dot{x} + A(t)x = 0$$
$$\dot{x} = -A(t)x$$
$$\frac{\dot{x}}{x} = -A(t)$$
$$\int\frac{\dot{x}}{x}\,dt = -\int A(t)\,dt$$
$$\int\frac{1}{x}\,dx = -\int A(t)\,dt$$
$$\ln x = -\int A(t)\,dt + C$$
$$x = e^{-\int A(t)\,dt + C}$$
Step 2
Find the particular solution for the non-homogeneous differential equation. This can be considered as a steady-state value.
$$\dot{x} + A(t)x = B(t)$$
Try for the particular solution of x:
If $B(t)$ is constant, a constant
If $B(t)$ is a polynomial of degree n, try for x a polynomial of degree n
If $B(t)$ is $e^{at}$, try a multiple of $e^{at}$.
For linear combinations of the above, try linear combinations.
If this fails, try multiplying with a factor t, before you throw in the
towel. Don't be upset if it doesn't work; this might be one of many
insoluble differential equations.
Step 3
The general solution of the non-homogeneous equation is the sum of the results of step 1 and step 2.
Step 4
A definite solution fixes the constant C by means of the initial value $x(0)$. Thus
substitute t=0 in the general solution (step 3) and solve for C.
Step 5
Study the limit of the solution of step 4 as t gets infinitely large.
Example: First-order linear differential equation
We consider the following equation:
$$\dot{x} + ax = b$$
Note that x is a function of t. We find the general solution by following the steps above.
Step 1 (homogeneous equation: right-hand side of the differential equation is zero). The homogeneous equation is a separable differential equation.
$$\dot{x} + ax = 0$$
$$\dot{x} = -ax$$
$$\frac{dx}{dt} = -ax$$
$$\frac{dx}{x} = -a\,dt$$
$$\int\frac{dx}{x} = -\int a\,dt$$
$$\ln(x) = -\int a\,dt + C \quad (\text{we assume a positive } x;\ C \text{ is some real number})$$
$$e^{\ln(x)} = e^{-\int a\,dt + C} \quad\Rightarrow\quad x = e^{C}e^{-\int a\,dt} = C_1 e^{-at} \quad (C_1 = e^{C})$$
Step 2:
$x = \frac{b}{a}$ is the steady state
Step 3:
General solution is the solution of the homogeneous equation plus the steady state:
$$x = C_1 e^{-at} + \frac{b}{a}$$
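Step 4:
We solve for $C_1$ of step 3 by substituting the initial value, t=0, in the equation of step 3.
$$x(0) = C_1 e^{-a \cdot 0} + \frac{b}{a} = C_1 \cdot 1 + \frac{b}{a}$$
Thus:
$$C_1 = x(0) - \frac{b}{a}$$
Next we substitute $C_1$ in the equation.
Thus solution:
$$x = \left(x(0) - \frac{b}{a}\right)e^{-at} + \frac{b}{a}$$
Step 5
We study the solution as t becomes infinitely large.
If the initial value $x(0)$ is $\frac{b}{a}$, then the limit will be equal to the initial value:
$$\lim_{t \to \infty} x = \left(\frac{b}{a} - \frac{b}{a}\right)e^{-at} + \frac{b}{a} = \frac{b}{a}$$
If the initial value $x(0)$ is not equal to $\frac{b}{a}$ and a > 0, then
$$\lim_{t \to \infty} x = \lim_{t \to \infty}\left[\left(x(0) - \frac{b}{a}\right)e^{-at} + \frac{b}{a}\right] = \left(x(0) - \frac{b}{a}\right)\lim_{t \to \infty} e^{-at} + \frac{b}{a} = \left(x(0) - \frac{b}{a}\right)\cdot 0 + \frac{b}{a} = \frac{b}{a}$$
If the initial value $x(0)$ is not equal to $\frac{b}{a}$ and a < 0, then
$$\lim_{t \to \infty} x = \lim_{t \to \infty}\left[\left(x(0) - \frac{b}{a}\right)e^{-at} + \frac{b}{a}\right] = \pm\infty$$
(the sign is that of $x(0) - \frac{b}{a}$), so the solution diverges.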
Adjustment towards equilibrium
We consider the process of adjustment towards equilibrium. The major question here is whether there is rapid or slow adjustment.
We start again with the differential equation:
$$\dot{x} + ax = b$$
Note that x is a function of t; so the outcome of variable x depends on time. We calculated that the equilibrium is $x^* = \frac{b}{a}$.
We rewrite the differential equation as a function of equilibrium:
$$\dot{x} = -ax + b = -ax + a\frac{b}{a} = -ax + ax^* = -a(x - x^*)$$
Convergence:
If a is positive, and $x > x^*$, $\dot{x}$ becomes negative. It means that x is
too large (relative to equilibrium value). A negative $\dot{x}$ ensures
adjustment towards equilibrium, so that x becomes smaller.
If a is positive, and $x < x^*$, $\dot{x}$ is positive. Hence x is too small. The
positive $\dot{x}$ gives an adjustment towards equilibrium.
Finally, we can show that a more positive a gives a faster
convergence (more rapid adjustment) towards equilibrium. We
consider the solution of the differential equation. A larger a gives
an $e^{-at}$ closer to zero, so that x is closer to $x^*$. It implies a more
rapid adjustment.
$$x = \left(x(0) - \frac{b}{a}\right)e^{-at} + \frac{b}{a} = \left(x(0) - x^*\right)e^{-at} + x^* = x(0)e^{-at} + x^*\left(1 - e^{-at}\right)$$
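The speed of adjustment can be made concrete with a half-life: the time it takes for the gap between x and the equilibrium to halve. The half-life ln(2)/a is our illustration and does not appear in the lecture; the sketch assumes the equation x' + a*x = b with a > 0 and uses made-up numbers.

```python
import math

def x_path(t, x0, a, b):
    """Solution of dx/dt + a*x = b with x(0) = x0 (a != 0)."""
    x_star = b / a
    return (x0 - x_star) * math.exp(-a * t) + x_star

def half_life(a):
    """Time until the gap x - x* has halved (requires a > 0)."""
    return math.log(2) / a

for a in (0.5, 2.0):
    b = 3 * a                      # keeps the equilibrium at x* = 3
    t_half = half_life(a)
    gap = x_path(t_half, 10.0, a, b) - 3.0
    print(a, t_half, gap)          # the gap is half of the initial gap of 7
```

A larger `a` gives a shorter half-life, which is the "more rapid adjustment" described above.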
System of two differential equations (two-variable phase diagram)
Let's consider the system of two differential equations, for which $\dot{x}$
and $\dot{y}$ depend on x and y. The equations contain six parameters a, b, c,
d, $e_1$ and $e_2$. Both of the variables x and y depend on t.
$$\dot{x} = ax + by + e_1$$
$$\dot{y} = cx + dy + e_2$$
Case 1 - Global stability of the system of equations (figure 14.7a).
For this case we assume that $a < 0$, $b > 0$, $c < 0$ and $d < 0$.
We consider a two-variable phase diagram, for which the x-variable is on the horizontal axis and the y-variable is on the vertical axis (see Figure 14.7 of Klein).
First: we consider the upward sloping $\dot{x} = 0$ equation:
For this equation, we take $a < 0$ and $b > 0$.
Consequence: the line $\dot{x} = 0$ is upward sloping in the (y, x) phase diagram.
Reason: we are interested in the combinations of x and y for which $\dot{x} = 0$. Thus:
$$0 = ax + by + e_1$$
$$y = -\frac{a}{b}x - \frac{e_1}{b}$$
Because $a < 0$, $b > 0$, the slope $-\frac{a}{b}$ of the equation $\dot{x} = 0$ becomes positive.
For any point below the line $\dot{x} = 0$, $\dot{x}$ is negative. A negative $\dot{x}$
implies that x is becoming smaller, thus the horizontal arrows are
pointing in leftward direction in Figure 14.7 a.
o Reason:
$\dot{x} < 0$ corresponds to
$$ax + by + e_1 < 0$$
$$by < -ax - e_1$$
$$y < -\frac{a}{b}x - \frac{e_1}{b}$$
(division by a positive number b, so that the inequality sign does not change).
For any point above the line $\dot{x} = 0$, $\dot{x}$ is positive. The positive $\dot{x}$
means that x is becoming larger. Thus, the horizontal arrows point
in rightward direction.
o Reason:
$\dot{x} > 0$ corresponds to
$$ax + by + e_1 > 0$$
$$y > -\frac{a}{b}x - \frac{e_1}{b}$$
Second: we consider the downward sloping $\dot{y} = 0$ equation:
For this equation we take the parameters $c < 0$ and $d < 0$.
Consequence: the line $\dot{y} = 0$ is downward sloping in the (y, x) phase diagram.
o Reason:
$$\dot{y} = 0$$
$$0 = cx + dy + e_2$$
$$y = -\frac{c}{d}x - \frac{e_2}{d}$$
o Because $c < 0$, $d < 0$, the slope $-\frac{c}{d}$ of the equation $\dot{y} = 0$ becomes negative.
o For any point below the line $\dot{y} = 0$, $\dot{y}$ is positive (y is becoming larger, thus the vertical arrows are upwardly pointing).
o Reason:
$$\dot{y} > 0$$
$$cx + dy + e_2 > 0$$
$$dy > -cx - e_2$$
$$y < -\frac{c}{d}x - \frac{e_2}{d}$$
(the inequality sign turns around because the left-hand side and the right-hand side of the inequality are divided by d, which is a negative number)
o For any point above the line $\dot{y} = 0$, $\dot{y}$ is negative (y is becoming smaller; thus, the vertical arrows point in downward direction).
o Reason:
$$\dot{y} < 0$$
$$cx + dy + e_2 < 0$$
$$dy < -cx - e_2$$
$$y > -\frac{c}{d}x - \frac{e_2}{d}$$
(the inequality sign turns around because the left-hand side and the right-hand side of the inequality are divided by d, which is a negative number)
Conclusion: as a result of the directions of the vertical and horizontal
arrows, there will be convergence towards equilibrium, for all initial
conditions of x and y.
Case 2 (saddlepath stability; Figure 14.7c):
For the x 0 equation we take a 0 and b 0 . So, consider the

case 1 of above:
o For any point below the line x 0 , x is negative. A negative
x implies that x is becoming smaller, thus the horizontal
arrows are pointing in leftward direction in Figure 14.7 c.
o For any point above the line x 0 , x is positive. The positive
x means that x is becoming larger. Thus, the horizontal
arrows point in rightward direction.
For the y 0 equation we take c 0 and d 0 . The line y 0 is

downward sloping in the (y x) phase diagram.
o Reason:
y 0
0 cx dy e2
c e
y x 2
d d
c
o Because c 0 , d 0 , the slope of the equation y 0
d
becomes negative.
For any point below the line $\dot{y} = 0$, $\dot{y}$ is positive ($y$ becomes larger, arrow up).
For any point above the line $\dot{y} = 0$, $\dot{y}$ is negative ($y$ becomes smaller, arrow down).
Consequence: for particular initial conditions, there will be convergence towards equilibrium. Starting north and south (N and S) of the equilibrium (see Figure 14.7c), there will be convergence towards equilibrium. Starting east and west of the equilibrium (E and W in Figure 14.7c), there will be divergence.
Stability and eigenvalues
For stability, we can consider the eigenvalues ($\lambda_1$ and $\lambda_2$) of the matrix
$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$
1) Globally stable if both eigenvalues of $A$ are negative:
$\lambda_1 < 0$, $\lambda_2 < 0$
2) Saddlepath stable if the eigenvalues of $A$ have different signs:
$\lambda_1 < 0$, $\lambda_2 > 0$ (or the reverse)
3) Globally unstable if both eigenvalues of $A$ are positive:
$\lambda_1 > 0$, $\lambda_2 > 0$
We will not pursue this matter further in this course.
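As an illustration only, the three cases can be checked for a given matrix with a few lines of code. The sketch below computes the eigenvalues of a 2x2 matrix from its trace and determinant. The function names and the example matrices are hypothetical. For complex eigenvalues it looks at the sign of the real part, which is an assumption that goes beyond the three cases listed above.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the trace and the determinant."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def classify(a, b, c, d):
    """Classify stability by the signs of the real parts of the eigenvalues."""
    lam1, lam2 = eigenvalues_2x2(a, b, c, d)
    if lam1.real < 0 and lam2.real < 0:
        return "globally stable"
    if lam1.real > 0 and lam2.real > 0:
        return "globally unstable"
    if lam1.real * lam2.real < 0:
        return "saddlepath stable"
    return "borderline (an eigenvalue has zero real part)"


# Hypothetical example matrices, one for each of the three cases.
print(classify(-1.0, 1.0, -1.0, -4.0))
print(classify(-1.0, 1.0, 1.0, 1.0))
print(classify(2.0, 0.0, 0.0, 3.0))
```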
Difference equations (Chapter 13)
We consider discrete periods of time. Consequently, we can have a sequence $\{x_t\}$ for $x$ and a sequence $\{y_t\}$ for $y$:
$x_1, x_2, x_3, \ldots$
$y_1, y_2, y_3, \ldots$
First-order difference equation:
$x_t = a x_{t-1} + y_t$
Second-order difference equation:
$x_t = a x_{t-1} + b x_{t-2} + y_t$
There will be a monotonic sequence if a > 0.

There will be a sequence that alternates in sign if a < 0.
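A sequence $\{x_t\}$ is bounded if there is a finite number $M$ such that for any $t$: $|x_t| \le M$
The sequence $x_t = a x_{t-1} + y_t$ converges if:
$\lim_{t \to \infty} x_t = \bar{x}$
The sequence diverges if:
$\lim_{t \to \infty} x_t = \pm\infty$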
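Thus, in the steady state (with a constant $y$):
$\bar{x} = a\bar{x} + y$
$(1 - a)\bar{x} = y$
$\bar{x} = \frac{1}{(1 - a)} y$
Convergence of the sequence to the steady state (regardless of the initial value $x_0$) if $|a| < 1$
The steady state is not well defined if $a = 1$
Divergence of the sequence if $|a| > 1$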
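A small numerical sketch may help to see these conditions at work. It iterates $x_t = a x_{t-1} + y$ for three hypothetical values of $a$ (the values of $y$ and $x_0$ are also assumptions) and compares the result after 50 periods with the steady state $y / (1 - a)$.

```python
def iterate(a, y, x0, periods):
    """Return [x_0, x_1, ..., x_periods] for x_t = a * x_{t-1} + y."""
    path = [x0]
    for _ in range(periods):
        path.append(a * path[-1] + y)
    return path


y, x0 = 2.0, 10.0  # hypothetical values
for a in (0.5, -0.5, 1.5):
    path = iterate(a, y, x0, 50)
    steady_state = y / (1.0 - a)
    print("a =", a, "steady state:", steady_state, "x_50:", path[-1])
```

For $a = 0.5$ and $a = -0.5$ the sequence ends up at the steady state; for $a = 1.5$ it moves ever further away from it.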
Solutions to first-order difference equations
1) Repeated iteration
$x_t = a x_{t-1} + y$
$x_{t-1} = a x_{t-2} + y$
so that
$x_t = a(a x_{t-2} + y) + y = a^2 x_{t-2} + ay + y$
thus
$x_t = a^t x_0 + y \sum_{i=0}^{t-1} a^i$
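2) Forward solution
$u_t = b u_{t+1} + v_t$
$u_{t+1} = b u_{t+2} + v_{t+1}$
so that
$u_t = b(b u_{t+2} + v_{t+1}) + v_t = b^2 u_{t+2} + b v_{t+1} + v_t$
thus
$u_t = \lim_{n \to \infty} \left( b^n u_{t+n} + \sum_{i=0}^{n-1} b^i v_{t+i} \right)$
If $|b| < 1$ and $\{u_t\}$ is bounded:
$\lim_{n \to \infty} b^n u_{t+n} = 0$
If $\{v_t\}$ is bounded then the solution to the difference equation is
$u_t = \sum_{i=0}^{\infty} b^i v_{t+i}$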
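The forward solution can be checked numerically. The sketch below approximates the infinite sum by a truncated sum and verifies that the result satisfies $u_t = b u_{t+1} + v_t$ up to a small truncation error. The value of $b$, the sequence $v_t$ and the number of terms are all hypothetical choices.

```python
def forward_solution(b, v, t, terms=200):
    """Approximate u_t = sum over i >= 0 of b**i * v(t + i) by a truncated sum."""
    return sum(b ** i * v(t + i) for i in range(terms))


b = 0.9  # hypothetical value with |b| < 1


def v(t):
    """A bounded, hypothetical driving sequence that alternates between 1.5 and 0.5."""
    return 1.0 + 0.5 * (-1) ** t


# Compare u_t with b * u_{t+1} + v_t for a few periods.
for t in range(3):
    u_t = forward_solution(b, v, t)
    u_next = forward_solution(b, v, t + 1)
    print(t, u_t, b * u_next + v(t))
```

The two printed columns agree closely because $|b| < 1$ and $v_t$ is bounded, so the terms left out of the truncated sum are negligible.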
General solution
We consider the difference equation below. The procedure resembles the one used above for the differential equation.
$x_t = a x_{t-1} + y$
Step 1:
Solve the homogeneous equation (so, the right-hand side is zero):
$x_t - a x_{t-1} = 0$
Solution to this equation (we need to determine $A$ and $k$):
$x_t = A k^t$
which is substituted in the homogeneous equation:
$A k^t - a A k^{t-1} = 0$
so that $k = a$
Solution to the homogeneous equation:
$x_t = A a^t$
Step 2:
Find a particular solution of
$x_t = a x_{t-1} + y$
which is
$\bar{x} = \frac{1}{(1 - a)} y$
Step 3:
General solution:
$x_t = A a^t + \frac{1}{(1 - a)} y$
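Step 4:
Determine $A$ of the general solution
$x_0 = A a^0 + \frac{1}{(1 - a)} y$
$A = x_0 - \frac{1}{(1 - a)} y$
So that we substitute $A$ in the general solution
$x_t = \left( x_0 - \frac{1}{(1 - a)} y \right) a^t + \frac{1}{(1 - a)} y = x_0 a^t + \frac{1 - a^t}{1 - a} y$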
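The result of the four steps can be compared with direct iteration of the difference equation. The sketch below is only an illustration: the function names and the values of $a$, $y$ and $x_0$ are hypothetical.

```python
def general_solution(a, y, x0, t):
    """x_t = x0 * a**t + y * (1 - a**t) / (1 - a), valid for a != 1."""
    return x0 * a ** t + y * (1.0 - a ** t) / (1.0 - a)


def iterated(a, y, x0, t):
    """Apply x_t = a * x_{t-1} + y a total of t times, starting from x0."""
    x = x0
    for _ in range(t):
        x = a * x + y
    return x


a, y, x0 = 0.5, 2.0, 10.0  # hypothetical values
for t in range(5):
    print(t, iterated(a, y, x0, t), general_solution(a, y, x0, t))
```

Both columns coincide, and with these values the path moves towards the steady state $y / (1 - a) = 4$.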