Preface
These notes have been prepared for the Math Review Class for graduate students joining the Ph.D. program in the field of Economics at Cornell University. In preparing these notes we have referred to the material used in previous years' classes.
The objective of the Math Review class is to present elementary concepts from set theory, multivariable calculus, linear algebra, elementary probability, real analysis and optimization theory. I have used examples and problem sets to explain the concepts, definitions and techniques which are useful in the fall-semester graduate economics classes.
These notes could serve to refresh the memory of those incoming students who are familiar with the material. For others, these notes could be a ready reckoner of the math techniques they will need to know in the first few weeks of the graduate classes in Economics (Econ 6090, Econ 6130, Econ 6190) before those techniques are treated more rigorously in Econ 6170.
The topics have been arranged so that the entire material can be covered in thirteen classes of three hours each. Additional problem sets with solutions are provided on each day's material. Three additional sessions of three hours each are sufficient to go over the questions in the problem sets. It is hoped that they will help the reader better understand the material in the lecture notes.
Earlier versions have been used for the Math Review Classes during 2009–17. My sincere thanks go to the participants for their comments and for pointing out typos and errors.
Ram Sewak Dubey
Chapter 1
Syllabus
1.1. Overview
The Field of Economics offers the August Math Review Course for incoming first-year Ph.D.
students. The aim of this review is to refresh students’ mathematical skills and introduce concepts
that are critical to success in the first-year economics core courses, i.e., Econ 6090, Econ 6130,
Econ 6170, and Econ 6190. The emphasis is on rigorous treatment of proof techniques, underlying
concepts and illustrative examples.
There is usually a great deal of variation in the mathematical background of incoming first-year
students. However, almost all students have something to gain from the review course. For those
who do not have an adequate mathematics background (by a US Ph.D. standard), the course offers
an opportunity to catch up on critical concepts and get a head start on the fall classes. For those
who took their core undergraduate courses in analysis and algebra some years ago, the course is a
good refresher. For those who do not have significant experience with technical courses taught in
English, the review offers an opportunity to pick up the math vocabulary that will be in use from
the first day of regular instruction.
The Math Review Course is funded by the Department of Economics. There is no charge for students matriculating into the Economics Ph.D. Program. Students matriculating into other Ph.D. programs should contact the Director of Graduate Studies in their Field. There will be a charge for these students, and the DGS in the student's Field must make arrangements to pay that charge before the student may attend the Math Review Course.
The Math Review Course is not linked to Econ 6170, Intermediate Mathematical Economics I.
There is no course grade, and no record will be kept of your performance. However, the Economics
Ph.D. program strongly encourages you to attend. Most students who have taken this course in past
years have found it useful, regardless of their prior mathematics training. Perhaps most importantly,
the review period is an excellent time to get acquainted with other incoming students, meet the
faculty and settle into Ithaca.
1.4. Textbook
There is no textbook for the math review course; however, the following books may be helpful.
The textbook ? is used in the Microeconomics course sequence. ? and ? are useful textbooks for
Mathematical Economics. It will be useful to refer to ? for understanding the material. Copies of
this textbook are available in the libraries. ? will be our reference book for analysis. ? contains
many useful examples. ? is the set of Lecture Notes used in Econ 6170.
Bridges, D., Ray, M., 1984. What is constructive mathematics. The Mathematical Intelligencer
6 (4), 32–38.
Dixit, A. K., 1990. Optimization in Economic Theory, 2nd Edition. Oxford University Press, USA.
Mas-Colell, A., Whinston, M., Green, J., 1995. Microeconomic Theory. Oxford University Press,
USA.
Mitra, T., 2013. Lectures on Mathematical Analysis for Economists. Campus Book Store.
Olsen, L., October 2004. A new proof of Darboux’s Theorem. The American Mathematical
Monthly 111, 173–175.
Royden, H. L., 1988. Real Analysis. Prentice Hall.
Simon, C. P., Blume, L., 1994. Mathematics for Economists. W. W. Norton & Co., New York.
Strichartz, R. S., 2000. The Way of Analysis. Jones and Bartlett.
Wainwright, E. K., Chiang, A., 2005. Fundamental Methods of Mathematical Economics, 4th Edition. McGraw Hill, New York.
Chapter 2
Introduction to Logic
2.1. Introduction
The theory that you’ll learn during the first year is built on a foundation borrowed from engineering
and pure mathematics. You will be required to both understand and reproduce certain key proofs,
particularly in microeconomics. On some problem sets and exams you’ll be asked to produce your
own proofs.
If you haven’t taken any pure math courses, you might be thinking “I don’t even know what a
proof is”. That is completely fine. There are plenty of very accomplished Ph.D. students at Cornell
who had no idea how to write a proof when they arrived. It’s important not to get discouraged
because it takes time to learn how to write good proofs. There is a standard bag of tricks that will
get you through almost any proof in the first year sequence, but it takes exposure and then practice
for you to learn and become comfortable with these tricks. Math majors are at an advantage here, more than in most areas, but by the end of the year they'll have forgotten the fancier proof techniques and you'll have learned the necessary ones, so the playing field will be surprisingly level.
A proof is a series of statements that demonstrates the truth of a proposition. In writing a proof you make use of (i) the rules of logic and (ii) definitions, theorems, and other propositions that have already been proved, or that you are told you can take as given.
The rules of logic are obviously fixed and unchanging. The components of the second point,
however, will vary depending on the task at hand. The most important question to ask yourself
when attempting to prove a proposition is “What do I already know”? It will often be the case that
if you write down all of the relevant mathematical definitions, the theorems or results that you were
given or that you know you can take as given, and any result that you just proved in a previous
problem, a straightforward rearrangement of everything on the page will give you the proof that
you want.
In this chapter we will discuss the principles of logic that are essential for problem solving in mathematics. The ability to reason using the principles of logic is key to seeking the truth, which is our goal in mathematics. Before we explore and study logic, let us start by spending some time
motivating this topic. Mathematicians reduce problems to the manipulation of symbols using a set
of rules. As an illustration, let us consider the following problem.
Example 2.1. Joe is 7 years older than John. Six years from now Joe will be twice John’s age.
How old are Joe and John?
Solution 2.1. To answer the above question, we reduce the problem using symbolic formulation.
We let John’s age be x. Then Joe’s age is x + 7. We are given that six years from now Joe will be
twice John’s age. In symbols, (x + 7) + 6 = 2(x + 6). Solving for x yields x = 1. Therefore, John
is 1 year old and Joe is 8.
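The symbolic solution can be double-checked mechanically. A minimal sketch in Python (the language choice is ours, not the notes'), searching small integer ages for the one satisfying the equation:

```python
# Example 2.1 as a brute-force search: John's age x must satisfy
# (x + 7) + 6 == 2 * (x + 6), i.e. Joe's age in six years is twice John's.
solutions = [x for x in range(100) if (x + 7) + 6 == 2 * (x + 6)]
print(solutions)  # [1] -- John is 1, so Joe is 8
```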
Our objective is to reduce the process of mathematical reasoning, i.e., logic, to the manipulation
of symbols using a set of rules. The central concept of deductive logic is the concept of argument
form. An argument is a sequence of statements aimed at demonstrating the truth of an assertion (a
“claim”). Consider the following two arguments.
Argument 1. If x is a real number such that x < −3 or x > 3, then x² > 9. Therefore, if x² ≤ 9, then x ≥ −3 and x ≤ 3.
Argument 2. If it is raining or I am sick, then I stay at home. Therefore, if I do not stay at home,
then it is not raining and I am not sick.
Although the content of the above two arguments is very different, their logical form is the
same. To illustrate the logical form of these arguments, we use letters of the alphabet (such as p, q
and r) to represent the component sentences and the expression “not p” to refer to the sentence “It
is not the case that p.”. Then the common logical form of both the arguments above is as follows:
If p or q, then r. Therefore, if not r, then not p and not q.
We start by identifying and giving names to the building blocks which make up an argument. In
Arguments 1 and 2, we identified the building blocks as follows:
Argument 1. If x is a real number such that x < −3 (p) or x > 3 (q), then x² > 9 (r). Therefore, if x² ≤ 9 (not r), then x ≥ −3 (not p) and x ≤ 3 (not q).
Argument 2. If it is raining (p) or I am sick (q), then I stay at home (r). Therefore, if I do not
stay at home (not r), then it is not raining (not p) and I am not sick (not q).
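The claim that both arguments share a valid form can itself be verified exhaustively. A small Python sketch (our own illustration, not part of the notes) encodes "if … then" as material implication and checks all eight truth assignments:

```python
from itertools import product

def implies(a, b):
    # Material implication: a => b is false only when a is true and b is false.
    return (not a) or b

# The common form: premise "if p or q, then r";
# conclusion "if not r, then not p and not q".
valid = all(
    implies(implies(p or q, r), implies(not r, (not p) and (not q)))
    for p, q, r in product([True, False], repeat=3)
)
print(valid)  # True: the argument form is valid
```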
2.2. Statements
The study of logic is concerned with the truth or falsity of statements.
Definition 2.1 (Statement). A statement is a sentence which can be classified as true or false
without ambiguity. The truth or falsity of the statement is known as the truth value.
For a sentence to be a statement, it is not necessary for us to know whether it is true or false.
However, it must be clear that it is one or the other.
Example 2.2. Consider the following examples.
(a) One plus two equals three. It is a statement which is true.
(b) One plus one equals three. It is also a statement which is not true.
(c) He is a university student. This sentence is neither true nor false. The truth or falsity depends
on the reference for the pronoun he. For some values of he the sentence is true; for others it is
false, and so it is not a statement.
(d) “Every continuous function is differentiable.” is a statement with truth value being false.
(e) “x < 1 ” is true for some values of x and false for some others. It is a statement if we have
some particular context in mind. Otherwise, it is not a statement.
(f) Goldbach’s Conjecture “Every even number greater than 2 is the sum of two prime numbers”
is a statement whose truth value is not known yet.
(g) “There are infinitely many prime numbers of the form 2n + 1, where n is a natural number.” is
another statement whose truth value is not known till now.
Every statement has a truth value, namely true (denoted by T) or false (denoted by F). We often use
p, q and r to denote statements, or perhaps p₁, p₂, · · · , pₙ if there are several statements involved.
Exercise 2.1. Which of the following sentences are statements?
(a) If x is a real number, then x² ≥ 0.
(b) 11 is a prime number.
(c) This sentence is false.
The possible truth values of a statement are often given in a table, called a truth table. The truth
values for two statements p and q are given below. Since there are two possible truth values for
each of p and q, there are four possible combinations of truth values for p and q. It is customary to
consider the four combinations of truth values in the order of TT, TF, FT, FF from top to bottom.
p q
T T
(2.1) T F
F T
F F
A B A∧B
T T T
(2.3) T F F
F T F
F F F
In words, if both A and B are true, then the conjunction A ∧ B is true. For all other assignments
of logical values to A and to B the conjunction A ∧ B is false.
For example, consider the statements
p : The integer 2 is even.
q : 4 is less than 3.
The conjunction of p and q, namely,
p ∧ q : The integer 2 is even and 4 is less than 3,
is a false statement since q is false (even though p is true).
Suppose first that p is true and q is true. That is, you meet the month-end deadline and you
do get a bonus. Did your supervisor tell the truth? Yes, indeed. So if p and q are both true,
then so too is p ⇒ q, which agrees with the first row of the truth table of (2.5).
Second, suppose that p is true and q is false. That is, you meet the month-end deadline
and you did not get a bonus. Then your supervisor did not do as he / she promised. What your
supervisor said was false, which agrees with the second row of the truth table of (2.5).
Third, suppose that p is false and q is true. That is, you did not meet the month-end
deadline and you did get a bonus. Your supervisor (who was most generous) did not lie (your
supervisor promised nothing if you did not meet the month-end deadline); so he/she told the
truth. This agrees with the third row of the truth table of (2.5).
Finally, suppose that p and q are both false. That is, you did not meet the month-end
deadline and you did not get a bonus. Your supervisor did not lie here either. Your supervisor
only promised you a bonus if you met the month-end deadline. So your supervisor told the
truth. This agrees with the fourth row of the truth table of (2.5).
In summary, the implication p ⇒ q is false only when p is true and q is false.
A conditional (or implication) statement that is true by virtue of the fact that its hypothesis
is false is said to be vacuously true or true by default. Thus the statement: “If you meet
the month-end deadline, then you will get a bonus” is vacuously true if you do not meet the
month-end deadline!
So A ≡ B is true if A and B have the same truth value (both true or both false), and false if they
have different truth values.
Recall that the given statement can be written as p ⇒ q where p and q are the statements:
p: You meet the month-end deadline;
q: You get a bonus.
(a) The converse of this implication is q ⇒ p: If you get a bonus, then you have met the month-end
deadline.
(b) The inverse of this implication is ∼ p ⇒∼ q: If you do not meet the month-end deadline, then
you will not get a bonus.
(c) The contrapositive of this implication is ∼ q ⇒∼ p: If you do not get a bonus, then you will
not have met the month-end deadline.
The following theorem is extremely useful.
Theorem 2.1. (A ⇒ B) ⇔ (∼ B ⇒∼ A).
Remark 2.1. It is an exercise to see that A ⇒ B is not logically equivalent to its converse, B ⇒ A.
One should avoid the very common mistake of claiming the opposite.
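Theorem 2.1 and Remark 2.1 can both be confirmed by checking all four truth assignments; the following Python sketch is our own illustration:

```python
from itertools import product

def implies(a, b):
    # Material implication: a => b is false only when a is true and b is false.
    return (not a) or b

cases = list(product([True, False], repeat=2))
# Theorem 2.1: an implication always agrees with its contrapositive.
assert all(implies(a, b) == implies(not b, not a) for a, b in cases)
# Remark 2.1: the converse can disagree (take A false, B true).
assert not all(implies(a, b) == implies(b, a) for a, b in cases)
print("contrapositive equivalent; converse not equivalent")
```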
Example 2.8. Consider the following two statements,
(A) Cornell is in Ithaca.
(B) Cornell is in NY state.
and the compound statements:
(a) Implication : A ⇒ B : If Cornell is in Ithaca, then Cornell is in NY state.
(b) Contrapositive : ∼ B ⇒∼ A : If Cornell is NOT in NY state, then Cornell is NOT in Ithaca.
(c) Converse : B ⇒ A : If Cornell is in NY state, then Cornell is in Ithaca.
Note that the converse statement is FALSE. This leads us to another important interpretation
of the implication A ⇒ B. It means that every time A is true, then B must be true. Hence A is a
sufficient condition for B. If we know that A is true then we can always conclude that B is also
true. The contrapositive ∼ B ⇒∼ A showed us that when B is not true then A cannot be true either.
Hence B is a necessary condition for A. If A is true we must necessarily have that B is true, because
if B isn't true then A cannot be true either. Thus we have the following ways of reading
(2.10) A ⇒ B : A implies B; If A then B; A is sufficient for B; B is necessary for A.
Remark 2.2. Note that for equivalence relation (the if and only if) A ⇔ B, the implication goes
in both the directions. In this case A and B are necessary and sufficient conditions for each other.
A ⇔ B means that both the statement A ⇒ B and its converse B ⇒ A are true.
2.4. Quantifiers
In the previous sections, we learnt some definitions and basic properties of compound statements.
We were interested in whether a particular statement was true or false. This logic is called propositional logic or statement logic. However, there are many arguments whose validity cannot be
verified using propositional logic. Consider, for example, the sentence
p : x is an even integer.
This sentence is neither true nor false. The truth or falsity depends on the value of the variable x.
For some values of x the sentence is true; for others it is false. Thus this sentence is not a statement.
However, let us denote this sentence by P(x), i.e.,
P(x) : x is an even integer.
Then, P(5) is false, while P(6) is true. To study the properties of such sentences, we need to extend
the framework of propositional logic to what is called first-order logic.
Definition 2.4. A predicate or propositional function is a sentence that contains a finite number of variables and becomes a statement when specific values are substituted for the variables. The domain of a predicate variable is the set of all values that may be substituted in place of the variables.
is a propositional function with domain D, the set of integers; since for each x ∈ D, P(x) is a
statement, i.e., for each x ∈ D, P(x) is true or false, but not both.
(a) The sentence “P(x) : x + 3 is an even integer” with domain D the set of positive integers.
(b) The sentence “P(x) : x + 3 is an even integer” with domain D the set of integers.
(c) The sentence “P(x, y, z) : x² + y² = z²” with domain D the set of positive integers.
Before proceeding further, we introduce the following notation. A more comprehensive list of notation will be described later.
is a statement. To see this, notice that either P(x) is true at each value x ∈ D (the notation x ∈ D indicates that x is in the set D, while x ∉ D means that x is not in D) or P(x) is false for at least one value of x ∈ D. If P(x) is true at each value x ∈ D, then Q(x) is true. However, if P(x) is false for at least one value of x ∈ D, then Q(x) is false. Hence, Q(x) is a statement because it is either true or false (but not both).
Definition 2.5. Each of the phrases “every”, “for every”, “for each”, and “for all” is referred
to as the universal quantifier and is expressed by the symbol ∀. Let P(x) be a statement with
domain D. A universal statement is a statement of the form ∀x ∈ D, P(x). It is false if P(x) is
false for at least one x ∈ D; otherwise, it is true.
The statement
∀x ∈ D, x > 0
means “For all x that are elements of D, x is positive.”
Example 2.11. Let P(x) be the predicate “P(x) : x² ≥ x.”
Determine whether the following universal statements are true or false.
(i) ∀x ∈ R, P(x);
(ii) ∀x ∈ Z, P(x).
(i) Let x = 1/2 ∈ R. Then, (1/2)² = 1/4 < 1/2, and so P(1/2) is false. Therefore, “∀x ∈ R, P(x)” is false.
(ii) For all integers x, x² ≥ x is true, and so P(x) is true for all x ∈ Z. Hence, “∀x ∈ Z, P(x)” is true.
(b) The Existential Quantifier:
Each of the phrases “there exists”, “there is”, “for some”, and “for at least one” is referred
to as the existential quantifier and is denoted in symbols ∃. Let P(x) be a predicate with domain
D. An existential statement is a statement of the form ∃x ∈ D such that P(x): It is true if P(x)
is true for at least one x ∈ D; otherwise, it is false.
Example 2.12. As before let D be a set.
The statement
∃x ∈ D, x > 0
tells us that “There exists an element x of D such that x is positive.”
Example 2.13. Let P(x) be the predicate “P(x) : x² < x.”
Determine whether the following existential statements are true or false.
(i) ∃x ∈ R, P(x);
(ii) ∃x ∈ Z, P(x).
(i) Let x = 1/2 ∈ R. Then, (1/2)² = 1/4 < 1/2, and so P(1/2) is true. Therefore, “∃x ∈ R, P(x)” is true.
(ii) For all integers x, x² ≥ x is true, and so there is no x ∈ Z such that P(x) is true. Hence, “∃x ∈ Z, P(x)” is false.
(c) Universal Conditional Statements
Recall that a conditional statement has a contrapositive, a converse, and an inverse. These
definitions can be extended to universal conditional statements. Consider a universal conditional statement of the form ∀x ∈ D, P(x) ⇒ Q(x).
(i) Its contrapositive is the statement,
∀x ∈ D, ∼Q(x) ⇒ ∼P(x).
(ii) Its converse is the statement,
∀x ∈ D, Q(x) ⇒ P(x)
2.5.1. More Examples. We can use truth tables to prove the following negation equivalences.
∼ (A ∧ B) ⇔ ∼A ∨ ∼B
∼ (A ∨ B) ⇔ ∼A ∧ ∼B
∼ (x > y) ⇔ x ≤ y
∼ (A ⇒ B) ⇔ A ∧ ∼B
∼ (∼A) ⇔ A.
Try proving them (Good Exercise).
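As a complement to the suggested exercise, the propositional equivalences above can be checked by machine. This Python sketch (ours, not the notes') runs through every truth assignment, and spot-checks the inequality rule on sample numbers:

```python
from itertools import product

def implies(a, b):
    # Material implication: a => b is false only when a is true and b is false.
    return (not a) or b

for a, b in product([True, False], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))    # ~(A and B) <=> ~A or ~B
    assert (not (a or b)) == ((not a) and (not b))    # ~(A or B) <=> ~A and ~B
    assert (not implies(a, b)) == (a and (not b))     # ~(A => B) <=> A and ~B
    assert (not (not a)) == a                         # ~(~A) <=> A
for x, y in [(1, 2), (2, 1), (3, 3)]:
    assert (not (x > y)) == (x <= y)                  # ~(x > y) <=> x <= y
print("all five negation rules verified")
```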
2.5.2. Negation of statement with one quantifier. The universal statement in the Example 2.10
contains a universal quantifier term and the statement x > 0. To negate a universal statement we
need to find only one counterexample. In this example, if we can find just one x in D that is nonpositive, we know that it is not true that all x are positive. Thus the negation of the universal
statement
∀x ∈ D, x > 0
is an existential statement,
∃x ∈ D, x ≤ 0.
To negate an existential statement we must show that every possible instance is false. The existen-
tial statement
∃x ∈ D, x > 0
is false if there are no positive elements of D. Thus the negation of the existential statement is a
universal statement
∀x ∈ D, x ≤ 0.
Insight from these examples can be generalized into rules of negation. Note that “such that” always follows ∃ (the existential quantifier).
Rule 2.1. For negating the statement, [quantifier term, statement], first change the quantifier: ∀
becomes ∃, ∃ becomes ∀ and then negate the statement.
Rule 2.2. To negate a statement with a string of quantifiers, change the type of each quantifier,
preserve their order and negate the statement that follows the quantifiers.
(2.13) ∀ε > 0 ∃N ∀n, if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε.
Negation: ∃ε > 0 ∼[∃N ∀n, if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε],
or ∃ε > 0 ∀N, ∼[∀n, if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε],
(2.14) or ∃ε > 0 ∀N, ∃n ∼[if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε],
or ∃ε > 0 ∀N, ∃n, n > N and ∼[∀x ∈ D, |fₙ(x) − f(x)| < ε],
or ∃ε > 0 ∀N, ∃n > N and ∃x ∈ D, |fₙ(x) − f(x)| ≥ ε.
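A statement of this negated shape is exactly what one exhibits to show that convergence is not uniform. As a hypothetical illustration (the sequence is our choice, not the notes'), take fₙ(x) = xⁿ on D = [0, 1) with pointwise limit f = 0; the witness ε = 1/2 works because (1 − 1/(2n))ⁿ ≥ 1/2 by Bernoulli's inequality:

```python
from fractions import Fraction

# For every N we produce n > N and x in D = [0, 1) with |f_n(x) - f(x)| >= 1/2,
# matching the pattern of the negated statement. Exact rational arithmetic
# avoids any floating-point doubt.
eps = Fraction(1, 2)
for N in range(1, 25):
    n = N + 1
    x = 1 - Fraction(1, 2 * n)        # x lies in [0, 1)
    assert 0 <= x < 1
    assert abs(x ** n - 0) >= eps     # (1 - 1/(2n))**n >= 1/2 by Bernoulli
print("f_n(x) = x**n fails to converge uniformly on [0, 1)")
```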
De Morgan's Laws can be expressed in words as follows: “The negation of an and statement is logically equivalent to the or statement in which each component is negated, while the negation of an or statement is logically equivalent to the and statement in which each component is negated.”
Operator Meaning
∀ For all, for every, for each
∃ There exists, there is
∈ In, a member of
∋ Owns, contains
∨ Or
∧ And
∴ Therefore
∼ or ¬ Not
∅ Empty set
⊂ Subset, is a subset of
⊃ Contains the set
∪ Union (of sets)
∩ Intersection (of sets)
⇒ Implies
⇐⇒ or iff If and only if, each implies the other
s.t., |, or : Such that
Q.E.D. Quod erat demonstrandum (Proof complete)
(f) Hypothesis A hypothesis is a proposition that is consistent with known data, but has been
neither verified nor shown to be false.
(g) Definition A definition tells us precisely what a term means.
Chapter 3
Proof Techniques
Definition 3.1. An integer n is even if and only if n = 2k for some integer k. An integer n is odd if
and only if n = 2k + 1 for some integer k.
Using the quotient-remainder theorem, we can show that every integer is either even or odd.
Definition 3.2. An integer n is prime if and only if n > 1 and for all positive integers r and s, if
n = r · s then r = 1 or s = 1. An integer n is composite if and only if n = r · s for some positive
integers r and s, with r ̸= 1 and s ̸= 1.
The first three prime numbers are 2, 3, and 5. The first six composite numbers are 4, 6, 8, 9, 10 and 12. Every integer greater than 1 is either prime or composite, since the two definitions are negations of each other.
Definition 3.3. Two integers m and n are said to be of the same parity if m and n are both even or
are both odd, while m and n are said to be of the opposite parity if one of m and n is even and the
other is odd. Two integers are consecutive if one is one more than the other.
The integers 2 and 8 are of the same parity, while 5 and 10 are of opposite parity.
Definition 3.4. Let n and d be integers with d ̸= 0. Then n is said to be divisible by d if n = d · k for
some integer k. In such case we say that n is a multiple of d, or d is a factor of n, or d is a divisor
of n, or d divides n.
Proof. Consider the two statements P(x) : x > −3 and Q(x) : x² + 1 > 0. Since x² ≥ 0 for every x ∈ R, it follows that x² + 1 ≥ 0 + 1 > 0 for every x ∈ R. Thus P(x) → Q(x) is true for every x ∈ R and hence for x > −3.
Claim 3.2. If n is an odd integer, then 6n³ + 4n + 3 is an odd integer.
Observe that the fact that 6n³ + 4n + 3 is odd does not depend on n being odd. It would have been better to replace the statement of the claim by “if n is an integer, then 6n³ + 4n + 3 is odd.”
Proof. Let P(x) : x² − 2x + 1 < 0 and Q(x) : x > 1. Since x² − 2x + 1 = (x − 1)² ≥ 0 for every x ∈ R, we have that (x − 1)² < 0 is false for every x ∈ R. Hence, P(x) is false for every x ∈ R. Thus, P(x) → Q(x) is true for every x ∈ R.
f(n) = n² + n + 17, where n ∈ N. If we evaluate this function, it seems that we always get a prime number. For
instance
f (1) = 19
f (2) = 23
f (3) = 29
f (15) = 257.
We can verify that all these numbers are prime. Then we might conjecture that
Conjecture 1. The function f (n) = n2 + n + 17 generates prime numbers for all n ∈ N.
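Before attempting a proof, it pays to test the conjecture beyond the values listed above. A short Python search (our own sketch, with a naive trial-division primality test) shows that the conjecture fails:

```python
def is_prime(m):
    # Naive trial-division primality test; fine for small m.
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def f(n):
    return n * n + n + 17

counterexamples = [n for n in range(1, 30) if not is_prime(f(n))]
print(counterexamples[:2])   # [16, 17]: f(16) = 289 = 17 * 17
```

So f(n) is prime for n = 1, …, 15 but composite at n = 16, a reminder that a pattern in finitely many cases is not a proof.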
We want to show that (i) the sum of two even numbers is even,
∀x, y ∈ NE , x + y ∈ NE
and (ii) the sum of an odd number and an even number is odd
∀x ∈ NE , ∀y ∈ NO , x + y ∈ NO .
g(n, m) = n² + n + m where m, n ∈ N.
g(1, 2) = 1² + 1 + 2 = 2²
g(2, 3) = 2² + 2 + 3 = 3²
g(12, 13) = 12² + 12 + 13 = 13²
On the basis of the above, we can form a conjecture,
Conjecture 2.
(3.2) ∀n ∈ N, g (n, n + 1) = (n + 1)2 .
Proof. By construction.
g(n, n + 1) = n² + n + (n + 1)
= n² + 2n + 1
= (n + 1)².
Having proved the general statement, we know that
g(15, 16) = 16².
This is an example of deductive reasoning.
Example 3.5. If the sum of two integers is even, then so is their difference.
Proof. Assume that the integers m and n are such that m + n is even. Then m + n = 2k for some
integer k. So, m = 2k − n and m − n = 2k − n − n = 2(k − n) = 2l, where l = k − n is an integer.
Thus m − n is even.
m2 > 0 ⇒ m > 0
is false but its converse
m > 0 ⇒ m2 > 0
is true.
To show that A ⇒ B, we can instead show that ∼ B ⇒∼ A. We have already shown before that
implication and its contrapositive are logically equivalent.
Proof.
m ∈ NE ⇔ ∃k ∈ N, m = 2k;
7m = 7(2k) = 2(7k), 7k ∈ N ⇒ 7m ∈ NE.
This is much easier than trying to show directly that 7m being odd implies that m is odd.
(3.3) x² ∈ NE ⇒ x ∈ NE
Its contrapositive is
(3.4) x ∈ NO ⇒ x² ∈ NO
Proof. Assume, to the contrary, that there is a greatest integer, say N. Then, N ≥ n for every integer
n. Let m = N + 1. Now m is an integer since it is the sum of two integers. Also, m > N. Thus, m is
an integer that is greater than the greatest integer, which is a contradiction. Hence our assumption
that there is a greatest integer is false. Thus there is no greatest integer.
Definition 3.5. A real number r is a rational number if r = m/n for some integers m and n with n ≠ 0. A real number that is not a rational number is called an irrational number.
Proof. Assume, to the contrary, that there is a least positive rational number x. Then, x ≤ y for every positive rational number y. Consider the number x/2. Since x is a positive rational number, so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is positive, gives x/2 < x. Hence, x/2 is a positive rational number that is less than x, which is a contradiction. Hence our assumption that there is a least positive rational number is false. Thus there is no least positive rational number.
Example 3.12. The sum of a rational number and an irrational number is irrational.
Proof. Assume, to the contrary, that there exists a rational number p and an irrational number q whose sum is a rational number. Thus, by definition of rational numbers, p = a/b and p + q = r = c/d for some integers a, b, c and d with b ≠ 0 and d ≠ 0. Hence,
q = r − p = c/d − a/b = (bc − ad)/(bd).
Now, bc − ad ∈ Z and bd ∈ Z since a, b, c and d ∈ Z. Since b ≠ 0 and d ≠ 0, bd ≠ 0. Hence, q ∈ Q, which is a contradiction. Hence our assumption that there exists a rational number and an irrational number whose sum is a rational number is false. Thus, the sum of a rational number and an irrational number is irrational.
We end this section with a proof of the classical result that √2 is irrational.
Example 3.13. The real number √2 is irrational.
Proof. Assume, to the contrary, that √2 is rational. Then,
√2 = m/n,
where m, n ∈ Z and n ≠ 0. By dividing m and n by any common factors, if necessary, we may further assume that m and n have no common factors, i.e., m/n has been expressed in (or reduced to) lowest terms. Then, 2 = m²/n², and so m² = 2n². Thus, m² is even. Hence, m is even, and so m = 2k, where k ∈ Z. Substituting this into our earlier equation m² = 2n², we have (2k)² = 2n², and so 4k² = 2n². Therefore, n² = 2k². Thus, n² is even, and so n is even. Therefore each of m and n has 2 as a factor, which contradicts our assumption that m/n has been reduced to lowest terms and therefore that m and n have no common factors. We deduce, therefore, that our assumption that √2 is rational is incorrect. Hence, √2 is irrational.
Exercise 3.1. The square root of any prime number is irrational.
Remark 3.1. One should be very careful when writing proof by contradiction. Here is a very
strong word of caution which can be found in ?, page 3.
“All students are enjoined in the strongest possible terms to eschew proofs by contradiction! There are two reasons for the prohibition: First such proofs are very often fallacious, the contradiction on the final page arising from an erroneous deduction on an earlier page, rather than from the incompatibility of p with ¬q. Second, even when correct, such a proof gives little insight into the connection between p and q whereas both the direct proof and the proof by contraposition construct a chain of argument connecting p and q. One reason why mistakes are so much more likely in proofs by contradiction than in direct proofs is that in a direct proof (assuming the hypotheses is not always false) all deduction from the hypothesis are true in those cases where hypothesis holds. One is dealing with true statements, and one's intuition and knowledge about what is true help to keep one from making erroneous statements. In proofs by contradiction, however, you are (assuming the theorem is true) in the unreal world where any statement can be derived, and so the falsity of a statement is no indication of an erroneous deduction.”
Proof. By Induction.
(a) Base of induction:
then for
f(x) = xⁿ⁺¹ = xⁿ · x,
f′(x) = n xⁿ⁻¹ · x + xⁿ · 1
= n xⁿ + xⁿ
(3.8) = (n + 1) xⁿ
(3.9) 7¹ − 4¹ = 7 − 4 = 3
Statement is true.
(b) Inductive transition:
Observe that in the inductive hypothesis of our proof above, we assume that P(k) is true for an
arbitrary, but fixed, positive integer k. We certainly do not assume that P(k) is true for all positive
integers k, for this is precisely what we wish to prove! It is important to understand that our aim is
to establish the truth of the implication “If P(k) is true, then P(k + 1) is true.” which together with
the truth of the statement P(1) allows us to conclude that an infinite number of statements (namely,
P(1), P(2),P(3), · · · ) are true.
Example 3.17. For every positive integer n,
n(n + 1)(2n + 1)
12 + 22 + · · · + n2 = .
6
n(n+1)(2n+1)
Proof. For every integer n ≥ 1, let P(n) be the statement P(n) : 12 + 22 + · · · + n2 = 6 .
(a) Base of induction:
When n = 1, the statement P(1) : 12 = 1(1+1)(2·1+1)
6 is certainly true since 1(1+1)(2·1+1)
6 =
6
6 = 1. This establishes the base case when n = 1.
(b) For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1^2 + · · · + k^2 = k(k + 1)(2k + 1)/6. For the inductive step, we need to show that P(k + 1) is true. That is, we show that

    1^2 + 2^2 + · · · + k^2 + (k + 1)^2 = (k + 1)(k + 2)(2k + 3)/6.

Evaluating the left-hand side of this equation, we have

    1^2 + 2^2 + · · · + k^2 + (k + 1)^2 = (1^2 + 2^2 + · · · + k^2) + (k + 1)^2
        = k(k + 1)(2k + 1)/6 + (k + 1)^2        (by the inductive hypothesis)
        = k(k + 1)(2k + 1)/6 + 6(k + 1)^2/6
        = (k + 1)(2k^2 + k + 6k + 6)/6
        = (k + 1)(2k^2 + 7k + 6)/6 = (k + 1)(2k^2 + 4k + 3k + 6)/6
        = (k + 1)(k + 2)(2k + 3)/6;

thus verifying that P(k + 1) is true.
(c) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,

    1^2 + 2^2 + · · · + n^2 = n(n + 1)(2n + 1)/6.
Recall that in a geometric sequence, each term is obtained from the preceding one by multiplying by a constant factor. If the first term is 1 and the constant factor is r, then the sequence is 1, r, r^2, r^3, · · · , r^n, · · · . The sum of the first n terms of this sequence is given by a simple formula which we shall verify using mathematical induction. This is left as an exercise.
Induction can also be used to solve problems involving divisibility, as the next example illustrates.
Example 3.18. For all integers n ≥ 1, 2^{2n} − 1 is divisible by 3.
Proof. We proceed by mathematical induction. When n = 1, the result is true since in this case 2^{2n} − 1 = 2^2 − 1 = 3 and 3 is divisible by 3. Hence, the base case when n = 1 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that the property holds for n = k, i.e., suppose that 2^{2k} − 1 is divisible by 3. For the inductive step, we must show that the property holds for n = k + 1. That is, we must show that 2^{2(k+1)} − 1 is divisible by 3. Since 2^{2k} − 1 is divisible by 3, there exists, by definition of divisibility, an integer m such that 2^{2k} − 1 = 3m, and so 2^{2k} = 3m + 1. Now,

    2^{2(k+1)} − 1 = 2^{2k} · 2^2 − 1
        = 4 · 2^{2k} − 1
        = 4(3m + 1) − 1
        = 12m + 3
        = 3(4m + 1).

Since m ∈ Z, we know that 4m + 1 ∈ Z. Hence, 2^{2(k+1)} − 1 is an integer multiple of 3; that is, 2^{2(k+1)} − 1 is divisible by 3, as desired. Hence, by the principle of mathematical induction, the property holds for all integers n ≥ 1.
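A quick computational check of Example 3.18 (illustrative only) confirms both the claim and the recursion 2^{2(k+1)} − 1 = 3(4m + 1) used in the inductive step.

```python
# Check that 2^(2n) - 1 is divisible by 3 for the first fifty values of n.
for n in range(1, 51):
    assert (2 ** (2 * n) - 1) % 3 == 0

# Reproduce the inductive step for a sample k: if 2^(2k) - 1 = 3m,
# then 2^(2(k+1)) - 1 = 3(4m + 1).
k = 5
m = (2 ** (2 * k) - 1) // 3
assert 2 ** (2 * (k + 1)) - 1 == 3 * (4 * m + 1)
```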
Induction can also be used to verify certain inequalities, as the next example illustrates.
Example 3.19. For all integers n ≥ 2,

    √n < 1/√1 + 1/√2 + · · · + 1/√n.

Proof. We proceed by mathematical induction. To show the inequality holds for n = 2, we must show that

    √2 < 1/√1 + 1/√2.

But this inequality is true if and only if 2 < √2 + 1 (multiply both sides by √2), which is true if and only if 1 < √2. Since 1 < √2 is true, so too is √2 < 1/√1 + 1/√2. Hence the inequality holds for n = 2. This establishes the base case. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2 and assume that the inequality holds for n = k, i.e., suppose that

    √k < 1/√1 + 1/√2 + · · · + 1/√k.

For the inductive step, we must show that the inequality holds for n = k + 1. That is, we must show that

    √(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1).

Since k < k + 1, we have √k < √(k + 1), and so multiplying both sides by √k,

    k < √k · √(k + 1).

Adding 1 to both sides, k + 1 < √k · √(k + 1) + 1; and so dividing both sides by √(k + 1) we have

    √(k + 1) < √k + 1/√(k + 1).

Hence, by the inductive hypothesis,

    √(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1);

as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers n ≥ 2.
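The inequality of Example 3.19 can likewise be spot-checked numerically (an illustration only, in floating-point arithmetic):

```python
import math

# Check sqrt(n) < 1/sqrt(1) + 1/sqrt(2) + ... + 1/sqrt(n) for n = 2, ..., 500.
partial_sum = 1.0  # the n = 1 term, 1/sqrt(1)
for n in range(2, 501):
    partial_sum += 1 / math.sqrt(n)
    assert math.sqrt(n) < partial_sum
```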
In some cases, it is possible to prove an existential statement in an indirect way without actually producing any specific element of the set. One indirect method is to use the contrapositive and another is to use a proof by contradiction. Consider the following example to show this aspect.
Example 3.21. Let f be a continuous function. If

(3.13)    ∫₀¹ f(x) dx ≠ 0,

then there exists a point x ∈ [0, 1] such that f(x) ≠ 0.
Claim 3.4. There exist irrational numbers a and b such that a^b is rational.
Proof. Consider the real number (√2)^{√2}. This number is either rational or irrational. We consider each case in turn.
(1) (√2)^{√2} is rational. Let a = √2 and b = √2. Thus a and b are irrational, and by assumption, a^b is rational.
(2) (√2)^{√2} is irrational. Let a = (√2)^{√2} and b = √2. Thus a and b are irrational. Moreover, a^b = ((√2)^{√2})^{√2} = (√2)^{√2 · √2} = (√2)^2 = 2 is rational.
In both cases, we proved the existence of irrational numbers a and b such that a^b is rational, and so we have the desired result.
We remark that, as it stands, this proof does not enable us to pinpoint which of the two choices of the pair (a, b) has the required property. In order to determine the correct choice of (a, b), we would need to decide whether (√2)^{√2} is rational or irrational. It is not a constructive proof. The following would be a constructive proof of this claim. Let a = √2 and b = log₂ 9. Then b is an irrational number, for if it were rational, then log₂ 9 = m/n where m and n are integers with no common factor. This implies 2^m = 9^n, which is a contradiction as 2^m is an even number and 9^n is an odd number. Finally, a^b = (√2)^{log₂ 9} = 2^{(1/2) log₂ 9} = 2^{log₂ 3} = 3, which is rational.
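The constructive choice can be checked in floating-point arithmetic (up to rounding error; a numerical illustration, not a proof of irrationality):

```python
import math

a = math.sqrt(2)       # irrational
b = math.log2(9)       # irrational, as argued above
# a^b = 2^((1/2) log2 9) = 2^(log2 3) = 3, up to floating-point error
assert abs(a ** b - 3) < 1e-9
```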
In a proof by cases, the argument splits into several cases which are mutually exclusive (i.e., no two of them can hold simultaneously). The following example illustrates this technique.
Before going over some examples, we state the following theorem.
Theorem 3.1. (Quotient-Remainder Theorem) For every given integer n and positive integer d,
there exist unique integers q and r such that
n = d ·q+r and 0 ≤ r < d.
Definition 3.6. Let n be a nonnegative integer and let d be a positive integer. By the Quotient-
Remainder Theorem, there exist unique integers q and r such that n = d · q + r; where 0 ≤ r < d.
We define,
n div d = q (read as “n divided by d ”), and
n mod d = r (read as “n modulo d ”).
Thus n div d and n mod d are the integer quotient and integer remainder, respectively, obtained
when n is divided by d.
Observe that given a nonnegative integer n and a positive integer d, we have that n mod d ∈
{0, · · · , d − 1} (since 0 ≤ r ≤ d − 1) and that n mod d = 0 if and only if n is divisible by d.
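Most programming languages expose these two operations directly. In Python, for instance, the built-in divmod returns the pair (n div d, n mod d), and for positive d the remainder satisfies 0 ≤ r < d even when n is negative, a convention matching the theorem's (the definition above is stated for nonnegative n only):

```python
# n div d and n mod d via the built-in divmod.
q, r = divmod(17, 5)
assert (q, r) == (3, 2)        # 17 = 5*3 + 2
q, r = divmod(-7, 3)
assert (q, r) == (-3, 2)       # -7 = 3*(-3) + 2, with 0 <= r < 3
assert 18 % 6 == 0             # n mod d = 0 iff d divides n
```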
Result 3.1. Every integer is either even or odd.
Proof. Let n be an integer. By the Quotient-Remainder Theorem with d = 2, there exist unique integers q and r such
that n = 2 · q + r and 0 ≤ r < 2. Hence, r = 0 or r = 1. Therefore, n = 2q or n = 2q + 1 for some
integer q depending on whether r = 0 or r = 1, respectively. In the case that n = 2q, the integer n
is even. In the other case that n = 2q + 1, the integer n is odd. Hence, n is either even or odd.
Example 3.24. Let m, n ∈ Z. If m and n are of the same parity (either both even or both odd), then
m + n is even.
Proof. We use a proof by cases, depending on whether m and n are both even or both odd.
(1) m and n are both even.
Then, m = 2k and n = 2l for some integers k and l. Thus, m + n = 2k + 2l = 2(k + l). Since
k + l ∈ Z, the integer m + n is even.
(2) m and n are both odd.
Then, m = 2k + 1 and n = 2l + 1 for some integers k and l. Thus, m + n = (2k + 1) + (2l +
1) = 2(k + l + 1). Since k + l + 1 ∈ Z, the integer m + n is even.
Proof. We shall combine two proof techniques and use both a proof by contrapositive and a proof by cases. Suppose that n is not a multiple of 3. We wish to show then that n^2 is not a multiple of 3. By the Quotient-Remainder Theorem with d = 3, there exist unique integers q and r such that n = 3 · q + r and 0 ≤ r < 3. Hence, r ∈ {0, 1, 2}. Therefore, n = 3q or n = 3q + 1 or n = 3q + 2 for some integer q depending on whether r = 0, 1, or 2, respectively. Since n is not a multiple of 3, either n = 3q + 1 or n = 3q + 2 for some integer q. We consider each case in turn.
(1) n = 3q + 1 for some integer q.
Then, n^2 = (3q + 1)^2 = 9q^2 + 6q + 1 = 3(3q^2 + 2q) + 1, and so n^2 is not a multiple of 3.
(2) n = 3q + 2 for some integer q.
Then, n^2 = (3q + 2)^2 = 9q^2 + 12q + 4 = 3(3q^2 + 4q + 1) + 1, and so n^2 is not a multiple of 3.
Proof. We shall use both a direct proof and a proof by cases. Assume that n is an odd integer. By the Quotient-Remainder Theorem with d = 4, there exist unique integers q and r such that n = 4 · q + r and 0 ≤ r < 4. Hence, r ∈ {0, 1, 2, 3}. Therefore, n = 4q or n = 4q + 1 or n = 4q + 2 or n = 4q + 3 for some integer q depending on whether r = 0, 1, 2, or 3, respectively. Since n is odd, and since 4q and 4q + 2 are both even, either n = 4q + 1 or n = 4q + 3 for some integer q. We consider each case in turn.
(1) n = 4q + 1 for some integer q.
Then, n^2 = (4q + 1)^2 = 16q^2 + 8q + 1 = 8(2q^2 + q) + 1 = 8m + 1, where m = 2q^2 + q. Since q ∈ Z, we must have m ∈ Z. Hence, n^2 = 8m + 1 for some integer m.
(2) n = 4q + 3 for some integer q.
Then, n^2 = (4q + 3)^2 = 16q^2 + 24q + 9 = 8(2q^2 + 3q + 1) + 1 = 8m + 1, where m = 2q^2 + 3q + 1. Since q ∈ Z, we must have m ∈ Z. Hence, n^2 = 8m + 1 for some integer m.
We remark that the last conclusion can be restated as follows: For every odd integer n, we have n^2 mod 8 = 1. Here are some additional illustrative examples.
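A short loop (illustrative only) confirms the restated conclusion n^2 mod 8 = 1 over a range of odd integers:

```python
# Every odd integer n satisfies n^2 mod 8 = 1.
for n in range(-99, 100, 2):   # the odd integers from -99 to 99
    assert (n * n) % 8 == 1
```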
Example 3.27. If x is a real number, then x ≤ |x|.
Proof. Let x be an arbitrary real number. Then either x ≥ 0 or x < 0. If x ≥ 0, then by definition |x| = x. If x < 0, then −x > 0, so that

    x < 0 < −x = |x|.

In either case, x ≤ |x|.
Chapter 4
Problem Set 1
(1) Prove or give a counterexample for the following claims. Capital letters refer to propositions
or sets, depending on the context.
(a)
∼ (A ∧ B) ⇔ ∼ A ∨ ∼ B
(b)
∼ (A ∨ B) ⇔∼ A ∧ ∼ B.
(c)
∼ (A ⇒ B) ⇔ A ∧ ∼ B.
(d)
((A ∨ B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C)).
(e) If n and n + 1 are consecutive integers, then both cannot be even.
(f) Give a counterexample to the proposed statement: If n ∈ N then n^2 > n.
(g) If x is odd then x^2 is odd.
(2) Write the negation of the following statements
(a) If S is closed and bounded, then S is compact.
(b) If S is compact, then S is closed and bounded.
(c) If a function is continuous then it is differentiable.
(3) Find the contrapositive of
(a) If x^2 ≠ 3 ∧ y^2 > 5 then xy is a rational number.
(b) If x ≠ 0 then ∃y such that xy = 1.
(4) Find the mistake in the “proof” of the following results, and provide correct proofs.
(a) If m is an even integer and n is an odd integer, then 2m + 3n is an odd integer.
Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2k + 1 for some
integer k. Therefore, 2m + 3n = 2(2k) + 3(2k + 1) = 10k + 3 = 2(5k + 1) + 1 = 2l + 1;
Chapter 5
Set Theory, Sequence
We define a set as a “well-specified collection”in order to emphasize that there must be a clear
rule or group of rules that determine membership in the set. Essentially all mathematical objects
can be gathered into sets: numbers, variables, functions, other sets, etc. Examples of sets can be
found everywhere around us. For example, we can speak of the set of all living human beings,
the set of all cities in Europe, the set of all propositions, the set of all prime numbers, and so on.
Each living human being is an element of the set of all living human beings. Similarly each prime
number is an element of the set of all prime numbers. If A is a set and a is an element of A, then
we write a ∈ A. If it so happens that a is not an element of A, then we write a ∈ / A. If S is the set
whose elements are s, t, and u, then we write S = {s;t; u}. The left brace and right brace visually
indicate the “bounds” of the set, while what is written within the bounds indicates the elements
of the set. For example, if S = {1; 2; 3; 5}, then 2 ∈ S, but 4 ∈ / S. Sets are determined by their
elements. The order in which the elements of a given set are listed does not matter. For example,
{1; 2; 3} and {3; 1; 2} are the same set. It also does not matter whether some elements of a given
set are listed more than once. For instance, {1; 2; 2; 2; 3; 3} is still the set {1; 2; 3}. Many sets are
given a shorthand notation in mathematics as they are used so frequently. A set may be defined by
a property. For instance, the set of all true propositions, the set of all even integers, the set of all
odd integers, and so on. Formally, if P(x) is a property, we write A = {x ∈ S : P(x)} to indicate that
the set A consists of all elements x of S having the property P(x). The colon : is commonly read as
“such that” and is also written as “| ”. So {x ∈ S|P(x)} is an alternative notation for {x ∈ S : P(x)}.
For a concrete example, consider A = {x ∈ R : x^2 = 2}. Here the property P(x) is x^2 = 2. Thus, A is the set of all real numbers whose square is two.
[Figure: a set A and its complement A^C]
D = {2, 4, 10},
B = {x ∈ R s.t. x ≥ 10}
S = The set of all real-valued functions on R.
The last set R^2 is shorthand notation for the Cartesian product R × R. This notation is acceptable for any number n ∈ Z_{++} of sets. You will often encounter proofs and theorems defined on the set R^n, which is the general way of describing the space of n-vectors, each element of which is a real number.
Definition 5.5. Union: The union of n sets is the set containing all elements from all n sets. We write

    A ∪ B = {x : x ∈ A ∨ x ∈ B}.
    ∪_{i=1}^{n} A_i = A_1 ∪ A_2 ∪ · · · ∪ A_n = {x : for some i = 1, · · · , n, x ∈ A_i}.
Definition 5.6. Intersection: The intersection of n sets is the set containing the elements common to all n sets. We write

    A ∩ B = {x : x ∈ A ∧ x ∈ B}.
    ∩_{i=1}^{n} A_i = A_1 ∩ A_2 ∩ · · · ∩ A_n = {x : for all i = 1, · · · , n, x ∈ A_i}.
De Morgan's Laws:

    (∪_{j=1}^{n} A_j)^C = ∩_{j=1}^{n} A_j^C ;    (∩_{j=1}^{n} A_j)^C = ∪_{j=1}^{n} A_j^C.
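These identities can be checked on concrete finite sets. The following Python sketch (with arbitrarily chosen sets, illustrative only) uses set difference from a universal set U to play the role of complement:

```python
# De Morgan's laws checked on subsets of a finite universal set U.
U = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {5, 6, 7}

# (A union B union C)^C = A^C intersect B^C intersect C^C
assert U - (A | B | C) == (U - A) & (U - B) & (U - C)
# (A intersect B intersect C)^C = A^C union B^C union C^C
assert U - (A & B & C) == (U - A) | (U - B) | (U - C)
```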
Definition 5.7. Exclusion: The exclusion of the set B from the set A is the set of all elements in A that are, in addition, not elements of B. We write

    A \ B = {x ∈ A | x ∉ B}.
[Figures: the set differences A \ B (also written A − B) and B \ A (also written B − A)]
Proposition 1. (A \ B) ∩ (B \ A) = ∅.
Proof.

    A \ B = A ∩ B^C ⊆ B^C,
    B \ A = B ∩ A^C ⊆ B,
    B ∩ B^C = ∅.

Hence (A \ B) ∩ (B \ A) ⊆ B^C ∩ B = ∅.
Exercise 5.2. Let B, and A_1, · · · , A_n be subsets of X. Then,

    B − ∪_{j=1}^{n} A_j = ∩_{j=1}^{n} (B − A_j) ;    B − ∩_{j=1}^{n} A_j = ∪_{j=1}^{n} (B − A_j).
Next we consider sets whose elements are sets themselves. For example, let A, B, and C be subsets of X; then the collection A = {A, B, C} is a set whose elements are A, B and C. We call a set whose elements are subsets of X a family of subsets of X, or a collection of subsets of X. The notation we follow is: lower case letters refer to elements of X, upper case letters refer to subsets of X, and script letters refer to families of subsets of X.
Any subset of the empty set is empty. Observe that the empty set ∅ is a subset of every set X. It is possible to form a non-empty set whose only element is the empty set, i.e., {∅}. In this case {∅} is a singleton. Also ∅ ⊂ {∅} and ∅ ∈ {∅}.
Specifically,

    P(A) = {B : B ⊆ A}.

The power set of the empty set is P(∅) = {∅}, i.e., the singleton of ∅. The power set of a singleton is P({a}) = {∅, {a}}. Note that the power set of A always contains A and ∅. In general, if A is a finite set with n elements, then P(A) contains 2^n elements.
Exercise 5.3. Prove that if A is a finite set with n elements, then P (A) contains 2n elements.
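One way to make the count 2^n concrete is to enumerate the power set. The helper below (a small illustration, not the intended proof for the exercise) builds P(A) and checks its size:

```python
from itertools import combinations

def power_set(A):
    """Return all subsets of A as a list of frozensets."""
    elems = list(A)
    return [frozenset(c)
            for k in range(len(elems) + 1)
            for c in combinations(elems, k)]

P = power_set({1, 2, 3})
assert len(P) == 2 ** 3                # 2^n subsets
assert frozenset() in P                # contains the empty set
assert frozenset({1, 2, 3}) in P       # contains A itself
```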
There are a number of set identities that the set operations of union, intersection, and set difference
satisfy. They are very useful in calculations with sets. Below we give a table of such set identities,
where U is a universal set and A, B, and C are subsets of U.
• Commutative Laws: A ∪ B = B ∪ A ; A ∩ B = B ∩ A
• Idempotent Laws: A ∪ A = A ; A ∩ A = A
• Absorption Laws: A ∩ (A ∪ B) = A ; A ∪ (A ∩ B) = A
• Identity Laws: A ∪ ∅ = A ; A ∩ U = A
• Complement Laws: A ∪ A^c = U ; A ∩ A^c = ∅
• Complements of U and ∅: U^c = ∅ ; ∅^c = U
(b) (A ∪ B) \ (C \ A) = A ∪ (B \ C).
(c) A ∩ (((B ∪ C^c) ∪ (D ∩ E^c)) ∩ ((B ∪ B^c) ∩ A^c)) = ∅.
We will discuss additional concepts in set theory after we have gone over some elementary
exposition of functions and sequences.
5.3. Functions
(c) A mapping f (x) which assigns at least one element from R to each element x ∈ D.
Definition 5.10. A function consists of:
(c) A mapping f (x) which assigns exactly one element from R to each element x ∈ D.
The set of all functions is a strict subset of the set of all correspondences. This is the same as saying that all functions are correspondences, but not the other way around. From here onwards it's critical that you specify the domain and the range when defining or using a function. For example, a function f with domain R and range R, and a function g with domain R and range R_{+}, defined by the same mapping, are not the same function, even though in practice they produce identical results.²
Definition 5.11. The argument of a function is the element from the domain that is mapped into
the range and the value of a function is the element from the range that is the destination of the
mapping.
Definition 5.12. A real-valued function is a function whose range is the set R or any subset of R.
From the above definition 5.12, the definitions of integer-valued functions, complex-valued
functions, etc., should be clear.
Definition 5.13. Let f : D → R and let A ⊆ D. We let f(A) represent the subset {f(x) : x ∈ A} of R. The set f(A) is called the image of A in R. If B ⊆ R, we let f^{−1}(B) represent the subset {x ∈ D : f(x) ∈ B} of D. The set f^{−1}(B) is called the pre-image of B in D.
Note that the image of a function may be equivalent to the range, or it may be a strict subset of
the range. In the above example, the image of the function f is a strict subset of its range, but the
image of g is equal to its range.
The vector space is defined over a field which is a set on which two operations + and · (called
addition and multiplication respectively) are defined. The formal definition of field is as follows:
Definition 5.14. A field F is a set on which two operations, called addition (+) and multiplication (·), are defined so that for each pair of elements x, y in F there are unique elements x + y and x · y in F, such that the following conditions hold for all a, b, c in F.
(iii) Existence of identity elements for addition and multiplication: There exist elements 0 and 1 in F such that

    0 + a = a, and 1 · a = a.
2The difference between the two is that the range of f is all real numbers, and the range of g is the set of non-negative real numbers.
This is inconsequential, since the mapping in both cases takes all elements from the domain and assigns them to a non-negative real
number. But the two functions are still not the same.
(iv) Existence of inverses for addition and multiplication: For each element a in F and for each
non-zero element b in F, there exist elements c and d in F such that
a + c = 0, and b · d = 1
Examples of fields include the set of real numbers R with the usual definitions of addition and multiplication, and the set of rational numbers Q with the usual definitions of addition and multiplication.
Definition 5.15. A vector space V over a field F consists of a set on which two operations, called
addition (+) and scalar multiplication (·), are defined so that for each pair of elements x, y in V
there is a unique element x + y in V , and for each element a in the field F and for each element x in
V , there is a unique element ax in V, such that the following conditions hold.
In order to show that any space is a vector space, we simply need to show that the properties in
the above definition are satisfied.
Definition 5.16. The Cartesian Product of sets A and B is the set of pairs (a, b) satisfying a ∈
A ∧ b ∈ B. We write
A × B = {(a, b) | a ∈ A ∧ b ∈ B}.
The Cartesian product is the two set case of the general “cross product” of sets, which is the
same concept defined for any number of sets. For example using sets A, B, C and D we could define
E = A × B × C × D, and a typical element of E would be (a, b, c, d) for some a ∈ A, b ∈ B, c ∈ C
and d ∈ D.
Example 5.2.

    R^3 = R × R × R = {(x, y, z) | x ∈ R ∧ y ∈ R ∧ z ∈ R}
    R^2_{+} = R_{+} × R_{+} ;    R^2_{++} = R_{++} × R_{++}.
The order of the sets in the cross-product does matter as the following example shows.
Example 5.3. Let

    A = {1, 2, 3}, B = {2, 4}.

Then

    A × B = {(1, 2), (1, 4), (2, 2), (2, 4), (3, 2), (3, 4)},
    B × A = {(2, 1), (2, 2), (2, 3), (4, 1), (4, 2), (4, 3)}.
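The same computation can be done in Python via itertools.product (illustrative only):

```python
from itertools import product

A = {1, 2, 3}
B = {2, 4}
AxB = set(product(A, B))
BxA = set(product(B, A))

assert len(AxB) == len(A) * len(B) == 6
assert (1, 2) in AxB and (1, 2) not in BxA   # order of the sets matters
assert AxB != BxA
```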
(a) The nonzero vectors u and v are parallel if there exists a ∈ R such that u = av.
(b) The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if
u · v = 0.
(c) The angle between vectors u and v is arccos( (u · v) / (∥u∥ · ∥v∥) ).
5.4.1. Metric.
Definition 5.17. A distance function is a real-valued function d : V × V → R which satisfies
(i) Non-negativity:

    ∀x, y ∈ V, d(x, y) ≥ 0, with equality if and only if x = y,
(ii) Symmetry:

    ∀x, y ∈ V, d(x, y) = d(y, x),

(iii) Triangle inequality:

    ∀x, y, z ∈ V, d(x, z) ≤ d(x, y) + d(y, z).

Any function satisfying these three properties is a distance function. A distance function is also
called a metric. The space V with elements x, y, which would be called points, is a metric space if
we can associate a distance function to it.
Example 5.4.
(a) The set of real numbers R with the distance function d(x, y) ≡ |x − y|.
(b) The set of complex numbers C with the distance function d(w, z) ≡ |w − z|.
(e) In V = R^2,

    d(x, y) = max{ |x_1 − y_1| , |x_2 − y_2| }.
(g) Let X be a set of people of same generation with a common ancestor, for example all grandchil-
dren of a grandmother. The distance d(x, y) between any two individuals x and y is the number
of generations one has to go back along the female lines to find the first common ancestor. For
example, distance between two sisters is one.
(h) Let X be the set of n letter words in an m-character alphabet A = {a_1, a_2, · · · , a_m}, meaning
X = {(x1 , x2 , · · · , xn )|xi ∈ A}. We define the distance d(x, y) between two words x = (x1 , · · · , xn )
and y = (y1 , · · · , yn ) to be the number of places in which the words have different letters. That
is,
d(x, y) = #{i|xi ̸= yi }.
Exercise 5.5. Try to show the last two examples are indeed metric functions.
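The metric in example (h) is commonly called the Hamming distance. A minimal sketch, with spot checks of the metric properties on sample words (illustrative, not a proof):

```python
def hamming(x, y):
    """d(x, y) = number of positions at which the two words differ."""
    assert len(x) == len(y), "words must have the same length n"
    return sum(1 for a, b in zip(x, y) if a != b)

assert hamming("karolin", "kathrin") == 3
assert hamming("abc", "abc") == 0                       # d(x, x) = 0
assert hamming("abc", "abd") == hamming("abd", "abc")   # symmetry
# triangle inequality on a sample triple
assert hamming("aaa", "ccc") <= hamming("aaa", "bbb") + hamming("bbb", "ccc")
```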
5.4.2. Norm.
Definition 5.18. A norm is a real-valued function written ∥ · ∥ : V → R, defined on a vector space V, which satisfies
(i) Non-negativity:

    ∀x ∈ V, ∥x∥ ≥ 0, with equality if and only if x = 0,

(ii) Homogeneity:

    ∀x ∈ V, α ∈ R, ∥α · x∥ = |α| · ∥x∥,

(iii) Triangle inequality:

    ∀x, y ∈ V, ∥x + y∥ ≤ ∥x∥ + ∥y∥.

5.4.3. Inner Product.
Definition 5.19. An inner product on a vector space V is a real-valued function ⟨·, ·⟩ : V × V → R which satisfies
(i) Symmetry:

    ∀x, y ∈ V, ⟨x, y⟩ = ⟨y, x⟩,

(ii) Positive definiteness:

    ∀x ∈ V, ⟨x, x⟩ ≥ 0, with equality if and only if x = 0,

(iii) Bilinearity:

    ∀x, y, z ∈ V, ∀α, β ∈ R, ⟨αx + βy, z⟩ = ⟨αx, z⟩ + ⟨βy, z⟩.
Example 5.6. V = Rn . Dot Product
∀x, y ∈ V, x · y = x1 y1 + · · · + xn yn .
Definition 5.20. A metric space (V, d) is a space V equipped with a distance function d. A normed vector space (V, ∥·∥) is a vector space V equipped with a norm ∥·∥. An inner product space (V, ⟨·, ·⟩) is a vector space V equipped with an inner product ⟨·, ·⟩.
5.4.4. Cauchy-Schwarz Inequality. The Cauchy-Schwarz inequality states that for all vectors x and y of an inner product space,

    |⟨x, y⟩|^2 ≤ ⟨x, x⟩ · ⟨y, y⟩,

where ⟨·, ·⟩ is the inner product. Equivalently, by taking the square root of both sides, and referring to the norms of the vectors, the inequality is written as

    |⟨x, y⟩| ≤ ∥x∥ · ∥y∥.

Moreover, the two sides are equal if and only if x and y are linearly dependent (or, in a geometrical sense, they are parallel or one of the vectors is equal to zero).
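A numerical illustration of the inequality and its equality case for the dot product on R^n (floating-point, so the comparisons carry a small tolerance):

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

# |<x, y>| <= ||x|| * ||y|| on random vectors in R^4
random.seed(0)
for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12

# Equality when y is a scalar multiple of x (linear dependence).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
assert abs(abs(dot(x, y)) - norm(x) * norm(y)) < 1e-9
```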
5.5. Sequences
Another notation for a sequence is ⟨x_n⟩ where ⟨x_n⟩ ≡ (x_1, x_2, · · · ). As we saw above, sets are unordered collections of elements. Even if there is an intuitive ordering to the elements of a set, with respect to the definition of the set itself there is no “first element” or “last element”. Sequences, however, are collections in which the elements are assigned a particular order.
Example 5.7.

    S_1 = {1/n, n ∈ N} is a sequence in R.
    S_2 = {(n, 1/n), n ∈ N} is a sequence in R^2.

The interpretation of S_1 is that the nth element of the sequence is given by 1/n. So we could also have written S_1 = {1, 1/2, 1/3, 1/4, · · · }. Similarly S_2 = {(1, 1), (2, 1/2), · · · }. Note the implication of this definition is that the elements of the sequence are numbered from 1 onwards, not from 0. It's usually assumed in the first year courses that the first element of a sequence is numbered “1” not “0”, but this need not always be the case. Note that the order of appearance of elements matters,

    {1, 2, 3, 4, · · · } ≠ {2, 1, 3, 4, · · · },

and elements can be repeated,

    S = {1, 1, 1, · · · } is a sequence.
(d) Let x_n = n^{(−1)^n}. This sequence has a limit point a = 0.
Definition 5.23 is a source of a lot of difficulty. However it’s one of the most important defini-
tions in macroeconomic theory and in parts of micro, and it’s worth forcing yourself to fully absorb
it before the end of the Review. The intuition behind limits is not as difficult as the formal defini-
tion. A sequence converges to x if after choosing any very, very tiny number (ε), you can identify a
point in the sequence (N) after which all of the remaining members of the sequence are no farther
than ε from some particular value x. This concept is only well-defined for infinite sequences. In
most economic theory, the elements of a convergent sequence never actually reach their limiting
value. They simply get closer and closer to it as the sequence progresses.
Example 5.9. The sequence x_n = 1/n is a convergent sequence.
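For this sequence the definition can be made operational: given any ε > 0, the choice N = ⌈1/ε⌉ works, since n > N implies |1/n − 0| < ε. A quick check (illustrative only):

```python
import math

# For x_n = 1/n, the limit is 0: given eps, take N = ceil(1/eps).
for eps in (0.1, 0.01, 0.001):
    N = math.ceil(1 / eps)
    # every term beyond N is within eps of the limit 0
    assert all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1001))
```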
Next let us assume that the tail of the sequence {xn } is bounded and show that {xn } is bounded.
Let
B′ = max {x1 , x2 , · · · , xm−1 , B} .
Then B′ is a bound for {xn },
∀n ∈ N, |xn | ≤ B′ .
Definition 5.27. If {x_n}_{n=1}^{∞} is a sequence, a subsequence {x_{n_k}}_{k=1}^{∞} is obtained from {x_n} by crossing out some (possibly infinitely many) elements, while preserving the order.
Example 5.11. Sequence: {x_n} = {1, −1, 1/2, −1, 1/3, −1, · · · }.
Subsequence: {x_{n_k}} = {−1, −1, −1, · · · } or {1, 1/2, 1/3, · · · }.
Proof. Note that we have a_1 < a_2 < a_3 < · · · < a_n < · · · < b_n < · · · < b_2 < b_1. Then each b_i is an upper bound for the set A = {a_1, a_2, · · · }. In other words, the sequence {a_n} is monotone increasing and bounded. Therefore, lim_{n→∞} a_n = a exists and a = sup{a_n} ≤ b_k for each natural number k. Hence a_k ≤ a ≤ b_k for every k ∈ N, or a is contained in each I_k. Now let b be contained in I_n for all n ∈ N. Then a_n ≤ b ≤ b_n for every n ∈ N, or 0 ≤ (b − a_n) ≤ (b_n − a_n) for each n. Then lim_{n→∞}(b − a_n) = 0. It follows that b = lim_{n→∞} a_n = a, and so a is the only real number common to all intervals.
Theorem 5.2. (Bolzano-Weierstrass Theorem) Every bounded sequence {x_n} has a convergent subsequence.
then x_i ∈ [a_n, c_n] or x_i ∈ [c_n, b_n], possibly both. Thus at least one of the sets {i : x_i ∈ [a_n, c_n]} or {i : x_i ∈ [c_n, b_n]} is infinite. If the first set is infinite, we let a_{n+1} = a_n and b_{n+1} = c_n. If the second is infinite, we let a_{n+1} = c_n and b_{n+1} = b_n. Let I_{n+1} = [a_{n+1}, b_{n+1}]. Then (i) and (ii) are satisfied. By the Nested Interval Property, there exists a ∈ ∩_{n=1}^{∞} I_n.
Step 2 We next find a subsequence converging to a. Choose i_1 ∈ N such that x_{i_1} ∈ I_1. Suppose we have i_n. We know that {i : x_i ∈ I_{n+1}} is infinite. Thus we can choose i_{n+1} > i_n such that x_{i_{n+1}} ∈ I_{n+1}. This allows us to construct a sequence of natural numbers i_1 < i_2 < i_3 < · · · where x_{i_n} ∈ I_n for all n ∈ N.
Remark 5.1. Every bounded sequence {xn } has at least one limit point x̄.
Definition 5.29. A sequence {x_n} is a Cauchy sequence if

    ∀ε > 0, ∃N, such that ∀n, m > N, d(x_n, x_m) < ε.
After N, each element is close to every other element or in other words, the elements lie within
a distance of ε from each other.
(i) Every convergent sequence {x_n} (with limit x, say) is a Cauchy sequence, since, given any real number ε > 0, beyond some fixed point, every term of the sequence is within distance ε/2 of x, so any two terms of the sequence are within distance ε of each other.
(ii) Every Cauchy sequence of real numbers is bounded (since for some N, all terms of the se-
quence from the N-th position onwards are within distance 1 of each other, and if M is the
largest absolute value of the terms up to and including the N-th, then no term of the sequence
has absolute value greater than M + 1).
(iii) In any metric space, a Cauchy sequence which has a convergent subsequence with limit x is itself convergent (with the same limit), since, given any real number ε > 0, beyond some fixed point in the original sequence, every term of the subsequence is within distance ε/2 of x, and any two terms of the original sequence are within distance ε/2 of each other, so every term of the original sequence is within distance ε of x.
Theorem 5.3. Every sequence has at most one limit.
Proof. By contradiction. We use the intuition that all points end up being close to say r1 and r2
at the same time which is not possible. Let sequence {xn } converge to two limits r1 and r2 . It is
enough to show that there is one ε for which this does not hold. Let us choose ε = d(r_1, r_2)/4 = |r_1 − r_2|/4.
(a) Every convergent sequence is bounded BUT a bounded sequence may not be convergent. For
example {1, −1, 1, −1, · · · }.
(d) x is a limit point of {x_n} if and only if ∃ a subsequence {x_{n(k)}}_{k=1}^{∞} of the sequence {x_n} such that x_{n(k)} → x.
(e) A sequence of vectors {x^n} = (x_1^n, x_2^n, · · · , x_N^n) ∈ R^N converges to a limit x = (x_1, x_2, · · · , x_N) if and only if

    x_i^n → x_i, ∀i = 1, 2, · · · , N.
5.6. Sets in Rn
Now we are ready for additional useful concepts in set theory. We begin with some definitions.
Definition 5.31. A set A on the real line is bounded if ∃B ∈ R such that ∀x ∈ A, ∥x∥ ≤ B.
Theorem 5.4. For every non-empty bounded set A ⊂ R, ∃ a real number sup A such that (i) sup A is an upper bound of A, and (ii) sup A ≤ b for every upper bound b of A.
This example shows that the sup and inf of a set need not belong to the set. If sup A belongs to the set A, it is called max{A}, and if inf{A} belongs to the set A, it is called min{A}.
Definition 5.32. Point x is a limit point of a set A if every neighborhood of x contains a point of A
different from x : x is a limit point of A if
∀ε > 0, ∃y ∈ A, y ̸= x ∧ d (x, y) < ε.
Theorem 5.5. (Bolzano-Weierstrass Theorem for sets) Every bounded infinite set has at least one limit point.
Example 5.13. For the set A = (0, 1), x = 0 is a limit point of the set A.
This shows that a limit point of a set need not belong to the set.
Theorem 5.6. Point x is a limit point of a set A ⊆ R^n if and only if ∃ a sequence {x_n} such that

    ∀n ∈ N, x_n ≠ x ∧ x_n ∈ A ∧ x_n → x.
Definition 5.33. An open ball in R^n centered at x with radius r > 0 is

    B_r(x) = {y ∈ R^n | d(x, y) < r}.
Note that the open ball does not include its boundary points.
Around any point in an open set, one can draw an open ball which is completely contained in
the set.
Example 5.15. The following sets are closed:

    A = [2, 5], since A^C = (−∞, 2) ∪ (5, ∞) is open; R; ∅.

There are two sets which are both open and closed: the empty set and the universal set. The empty set ∅ is open since

    int ∅ = ∅

and ∅ is closed since

    bd ∅ = ∅ ⊆ ∅.

The universal set is the complement of the empty set and so is both open and closed. There can be sets which are neither open nor closed: A = (0, 1]. The following theorem characterizes closed sets using convergent sequences.
Theorem 5.8. A set A ⊆ R^n is closed if and only if every convergent sequence of points {x_n} in A has its limit x ∈ A.
Example 5.17. The budget set

    B(p, I) = {y ∈ R^n_{+} | p · y ≤ I},

where p ∈ R^n_{++} and I ∈ R_{++}, is closed.
Figure 5.5. Budget set B(p, I): Good 1 on the horizontal axis and Good 2 on the vertical axis; the budget line has |slope| = p_1/p_2 and intercepts I/p_1 and I/p_2.
It will be useful to draw some sets to differentiate between convex and non-convex sets.
Chapter 6
Problem Set 2
(b) For x, y ∈ R^2,

(6.2)    d(x, y) = max{ |x_1 − y_1|, |x_2 − y_2| }.

(c) Let d(·, ·) be a metric, then

(6.3)    d_1(x, y) = d(x, y) / (1 + d(x, y)).
(4) Suppose A, B, and C are sets which satisfy both of the following two conditions
(a) A ∪C = B ∪C,
(b) A ∩C = B ∩C.
Prove that A = B.
(7) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V
and c ∈ R, define
(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 − b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).
Is V a vector space over R with these operations? Justify your answer.
(8) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V
and c ∈ R, define
(a1 , a2 ) + (b1 , b2 ) = (a1 + 2b1 , a2 + 3b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).
Is V a vector space over R with these operations? Justify your answer.
(9) Prove
(J ∩ K)c = J c ∪ K c
(J ∪ K)c = J c ∩ K c
(13) Prove that the sequence {x_n} = {2 − 1/n : n ∈ N} is not convergent to 1.
(15) Determine whether the following sets are open, closed, neither or both:
(i) S = (0, 1);
(ii) S = [0, 1];
(iii) S = R;
(iv) S = [0, 1).
Chapter 7

Linear Algebra
Linear algebra is the branch of mathematics dealing with (among many other things) matrices and
vectors. It’s intuitively easy to see why linear algebra is important for econometrics and statistics.
Economic data is arranged in matrix format (rows corresponding to observations, columns corre-
sponding to variables), so the body of theory governing matrices should help us analyze data. It
is harder to see the connection between matrix theory and the optimization that we do in micro
theory, but there are some important links. We’ll cover the basics and some of the necessary detail
here, but more detailed coverage will be offered in the core courses.
7.1. Vectors
You may be familiar with vectors from physics courses, in which a vector is a pair giving the mag-
nitude and direction of a moving body. The vectors we use in economics are more general, in that
they can have any finite number of elements (rather than just 2), and the meaning of each element
can vary with the context (rather than always signifying magnitude and direction). Formally speak-
ing a vector can be defined as a member of a vector space, but we don’t need to deal with such a
definition here. For our purposes:
Definition 7.1. A vector is an ordered array of elements with either one row or one column.
The elements are usually numbers. A vector is an n × k matrix for which either n = 1, k = 1 or
both (see the definition of a matrix below). A general vector, for which the number of elements is
not specified but left as n, will sometimes be called an “n-vector”. We also refer to these as “vectors
in Rn ”. A vector can be written in either row or column form:
Row Vector: x ∈ Rn, x = (x1 x2 . . . xn);    Column Vector: x ∈ Rn, x = (x1, x2, . . . , xn)′.
Although you will sometimes be able to switch between thinking of a vector as a row or a
column without restriction, there are certain operations that require a vector to be oriented in a
certain way, so it is good to distinguish between row and column vectors whenever possible. Most
people use x to refer to the vector in column form and x′ to refer to it in row form, but this is not
universal. Also, we usually use lowercase letters for vectors and uppercase letters for matrices.
Sum vector: u_{n×1} = (1, 1, . . . , 1)′, the column vector of n ones.
(a) Equality :
Vectors x ∈ Rn , y ∈ Rm are equal if n = m and xi = yi ∀ i.
(c) Addition :
∀x, y ∈ Rn , x + y = z ∈ Rn where zi = xi + yi , ∀i.
(e) Vector Multiplication : This is essentially an inner product rule applied to Rn . See the rules
for matrix multiplication below, as they also apply for vectors.
7.2. Matrices
Definition 7.3. A matrix is a rectangular array of elements (usually numbers, for our purposes).
It’s worthwhile to check your understanding of each of the above definitions by writing out a
matrix that satisfies each. Then note this next definition carefully:
7.2.1.1. Addition. Matrix addition is only defined for matrices of the same size. If A is n × k and
B is n × k then
(7.4) [A]_{n×k} + [B]_{n×k} = [C]_{n×k}
where
(7.5) ci j = ai j + bi j ∀ i = 1, · · · , n, j = 1, · · · , k.
We say that matrix addition occurs “element wise” because we move through each element of the
matrix A, adding the corresponding element from B.
7.2.1.2. Scalar Multiplication. Scalar multiplication is also an element wise operation. That is,
(7.6) ∀ λ ∈ R, λ · [A]_{n×k} is the n × k matrix whose (i, j) entry is λa_ij:
λ · A = [λa11 λa12 · · · λa1k; λa21 λa22 · · · λa2k; . . . ; λan1 λan2 · · · λank].
7.2.1.3. Matrix Multiplication. Matrix multiplication is defined for matrices [A]_{m×j} and [B]_{n×k} if j = n or m = k. That is, the number of columns in one of the matrices must be equal to the number of rows in the other. If matrices A and B satisfy this condition, so that A is m × j and B is j × k, their product [C]_{m×k} ≡ [A]_{m×j} · [B]_{j×k} is given by c_ij = A_i · B_j, where A_i is the ith row of A and B_j is the jth column of B. For example, suppose
[A]_{2×2} = [1 2; 3 4] and [B]_{2×3} = [6 5 4; 3 2 1].
Multiplication between A and B is only defined if A is on the left and B is on the right. It must
always be the case that the number of columns in the left hand matrix is the same as the number of
rows in the right hand matrix. In this case, if we say AB = C, then element
c11 = [1 2] · [6; 3] = 1 · 6 + 2 · 3 = 12.
Likewise
c12 = 1 · 5 + 2 · 2 = 9
c13 = 1 · 4 + 2 · 1 = 6
c21 = 3 · 6 + 4 · 3 = 30
c22 = 3 · 5 + 4 · 2 = 23
c23 = 3 · 4 + 4 · 1 = 16
which gives
[A]_{2×2} · [B]_{2×3} = [C]_{2×3} = [12 9 6; 30 23 16].
Note that matrix multiplication is not a commutative operation. In general, AB ≠ BA, and in fact it is often the case that the operation is only defined in one direction. In our example BA is not defined because the number of columns of B (3) is not equal to the number of rows of A (2).
For both AB and BA to be defined,
[A]_{n×k} · [B]_{k×n} = [C]_{n×n}
and
[B]_{k×n} · [A]_{n×k} = [D]_{k×k}.
(i) Even if n = k,
AB ̸= BA.
A = [1 2; 3 4],  B = [0 −1; 6 7],  AB = [12 13; 24 25],  BA = [−3 −4; 27 40].
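The example above is easy to reproduce numerically; a minimal check with numpy (assumed available):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, -1], [6, 7]])

AB = A @ B   # rows of A dotted with columns of B
BA = B @ A

assert np.array_equal(AB, np.array([[12, 13], [24, 25]]))
assert np.array_equal(BA, np.array([[-3, -4], [27, 40]]))
assert not np.array_equal(AB, BA)   # multiplication is not commutative
```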
(7.8) (A + B)′ = A′ + B′
(7.9) (AB)′ = B′ A′
Note the reversal of the order of the matrices in the last operation.
The first and the third columns are linearly dependent: each element of column 3 is three times the corresponding entry in column 1. Now take columns 1 and 2:
λ1 [1; 0; 2] + λ2 [2; 1; 4] = [0; 0; 0]
⇔ λ1 + 2λ2 = 0,  λ2 = 0,  2λ1 + 4λ2 = 0
⇔ λ1 = 0, λ2 = 0
is the only solution. So the first two columns are linearly independent. We found two linearly independent columns, so the rank of matrix A is 2. We could have done the exercise taking rows instead of columns and obtained the same answer (please verify).
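The rank computation can be confirmed numerically; a sketch with numpy (assumed available), using the matrix whose columns are (1, 0, 2), (2, 1, 4), (3, 0, 6):

```python
import numpy as np

# columns: (1, 0, 2), (2, 1, 4), (3, 0, 6); column 3 = 3 * column 1
A = np.array([[1, 2, 3],
              [0, 1, 0],
              [2, 4, 6]])

assert np.linalg.matrix_rank(A) == 2      # two linearly independent columns
assert np.linalg.matrix_rank(A.T) == 2    # row rank equals column rank
```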
Theorem 7.1. (i) Rank of [A]_{n×k} ≤ min{# rows, # columns} = min{n, k};
(ii) Rank of AB ≤ min{Rank(A), Rank(B)}.
Definition 7.7. A square matrix [A]_{n×n} is called non-singular or of full rank if rank(A) = n.
Definition 7.8. A square matrix [A]_{n×n} is invertible if there exists [B]_{n×n} such that [A] · [B] = [B] · [A] = [I]_{n×n}. Then B is called the inverse of A.
Definition 7.9. A square matrix [A]_{n×n} is called orthogonal if A⁻¹ = A′, i.e., AA′ = I.
7.3. Determinant of a Matrix

The determinant is defined only for square matrices. The determinant is a function, depending on n, that associates a scalar, det(A), to an n × n square matrix A. The determinant of a 1 × 1 matrix A is the only entry of that matrix: det(A) = A11. The determinant of a 2 × 2 matrix
A = [a b; c d]
is det(A) = ad − bc. For a 3 × 3 matrix
A = [a b c; d e f; g h i],
expanding along the first row gives
det(A) = a(−1)^{1+1} det[e f; h i] + b(−1)^{1+2} det[d f; g i] + c(−1)^{1+3} det[d e; g h]
= a(ei − fh) − b(di − fg) + c(dh − eg).
(a) (7.15) det(A) = det(A′)
(b) Interchanging any two rows will alter the sign but not the numerical value of the determinant.
(c) Multiplication of any one row by a scalar k will change the determinant k-fold.
(e) The addition of a multiple of one row to another row will leave the determinant unchanged.
(g) Properties (b)–(e) remain valid if we replace rows by columns everywhere.
A = [1 2; 3 4], det(A) = −2;  A′ = [1 3; 2 4], det(A′) = −2;
B = [3 4; 1 2] (rows of A interchanged), det(B) = 2.
Result 7.1. Let A be an n × n upper triangular matrix, i.e., ai j = 0 whenever i > j. The determinant
of the matrix A is given by:
det A = ∏_{i=1}^{n} a_ii.
(1) Base case: Let n = 1. If A is a 1 × 1 matrix, then det A = a11 = ∏1i=1 aii by the definition of a
determinant.
(2) Inductive case: Let n > 1. Assume that for any (n − 1) × (n − 1) matrix A with a_ij = 0 for all i > j, we have det A = ∏_{i=1}^{n−1} a_ii. Now consider any n × n matrix A with a_ij = 0 for all i > j.
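Result 7.1 can be spot-checked numerically; a small numpy sketch (numpy assumed available):

```python
import numpy as np

# an upper triangular matrix: entries below the diagonal are zero
A = np.array([[2., 5., 1.],
              [0., 3., 4.],
              [0., 0., 7.]])

assert np.isclose(np.linalg.det(A), 2 * 3 * 7)   # det = product of diagonal entries

# zeroing out a diagonal entry makes the matrix singular
B = A.copy()
B[1, 1] = 0.
assert np.isclose(np.linalg.det(B), 0.)
```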
Result 7.2. The upper triangular square matrix A is non-singular if and only if aii ̸= 0 for each
i ∈ {1, · · · , n}.
As an "if and only if" statement, this requires proofs in both directions.
Claim 7.1. If the upper triangular matrix A is non-singular, then aii ̸= 0 for all i = 1, . . . , n.
Proof. Let A be non-singular. Then A has an inverse, A⁻¹. Since 1 = det I = det[A⁻¹A] = (det A⁻¹)(det A), we know that det A ≠ 0. If a_ii = 0 for some i ∈ {1, . . . , n}, then by Result 7.1 we would have det A = 0, a contradiction. So it must be that a_ii ≠ 0 for all i = 1, . . . , n.
Claim 7.2. If A is upper triangular and aii ̸= 0 for all i = 1, . . . , n, then A is non-singular.
Proof. Let a_ii ≠ 0 for all i = 1, . . . , n. Seeking contradiction, suppose A is singular. Then some column of A is a linear combination of the remaining columns; without loss of generality, A1 = ∑_{i=2}^{n} αi Ai. Let
B = [A1 − ∑_{i=2}^{n} αi Ai   A2  · · ·  An] = [0  A2  · · ·  An].
We know, by the properties of determinants, that det B = det A. But, expanding B by the first column, we have det B = 0. This gives det A = 0, a contradiction. So A is non-singular.
7.4. An Application of Matrix Algebra

An important application of matrix algebra is the Markov process, or Markov chain. Markov processes are used to model movements across states over time. A Markov process involves a Markov transition matrix: each entry of the transition matrix is the probability of moving from one state to another. It also specifies a vector containing the initial distribution across these states. By repeatedly multiplying the initial distribution vector by the transition matrix, we can trace the distribution across states over time.
Consider the problem of movement of employees within a firm at different branches. In the
simple case, we take two locations, namely Ithaca and Cortland to demonstrate the basic elements
of a Markov process.
To determine the number of employees in Ithaca tomorrow, we take the probability that the
employees will stay in Ithaca branch multiplied by the total number of employees currently in
Ithaca. We add to this the number of Cortland employees transferring to Ithaca, which is equal
to total number of employees in Cortland multiplied by the probability of Cortland employees
transferring to Ithaca.
We follow the same process to determine the number of employees in Cortland tomorrow, made
up of the employees who choose to remain at Cortland and the Ithaca employees who transfer into
Cortland.
There are four probabilities involved which can be arranged in a Markov transition matrix.
Let At and Bt denote the populations of Ithaca and Cortland locations at some time t. The
transition probabilities are defined as follows.
pAA ≡ probability that a current A remains an A, and similarly for pAB, pBA, and pBB.
Then the distribution of employees across the two locations next period (t + 1) is x′_t · M = x′_{t+1}, which is
[At Bt] [pAA pAB; pBA pBB] = [(At pAA + Bt pBA) (At pAB + Bt pBB)] = [At+1 Bt+1].
In a similar manner we can determine the distribution of employees after two periods, x′_{t+1} · M = x′_{t+2}:
[At+1 Bt+1] [pAA pAB; pBA pBB] = [At+2 Bt+2]
[At Bt] [pAA pAB; pBA pBB] [pAA pAB; pBA pBB] = [At+2 Bt+2]
[At Bt] [pAA pAB; pBA pBB]² = [At+2 Bt+2]
Let
M = [pAA pAB; pBA pBB] = [0.8 0.2; 0.4 0.6].
Then the distribution of employees in the next period t = 1 is
[200 200] [0.8 0.2; 0.4 0.6] = [240 160] = [A1 B1].
Observe that when the transition matrix is raised to higher powers, the new transition matrix con-
verges to a matrix whose rows are identical. This is referred to as the steady state. In this example,
the steady state would be
lim_{n→∞} M^n = [2/3 1/3; 2/3 1/3].
Try computing this value.
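Both the one-period computation and the convergence of Mⁿ can be checked directly; a numpy sketch (numpy assumed available):

```python
import numpy as np

M = np.array([[0.8, 0.2],
              [0.4, 0.6]])       # transition matrix from the text
x0 = np.array([200.0, 200.0])    # initial distribution across the two locations

assert np.allclose(x0 @ M, [240.0, 160.0])   # distribution after one period

Mk = np.linalg.matrix_power(M, 50)           # a high power of M
assert np.allclose(Mk, [[2/3, 1/3], [2/3, 1/3]])   # rows converge to the steady state
```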
7.4.1. Absorbing Markov Chains. We can extend the previous model by adding a third choice: employees can exit the firm, with
pAE ≡ probability that a current A chooses to exit, E,
and so on. After n periods,
[A0 B0 E0] [pAA pAB pAE; pBA pBB pBE; 0 0 1]^n = [An Bn En].
This type of Markov process is referred to as an absorbing Markov chain. The transition probabilities assigned in the third row are such that once an employee goes to state E, he or she remains in that state forever. As n goes to infinity, An and Bn will approach zero and En will approach the total number of employees at time zero (i.e., A0 + B0 + E0).
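A numerical illustration of absorption; only the absorbing third row (0, 0, 1) is fixed by the text, and the remaining transition probabilities below are made up for this sketch (numpy assumed available):

```python
import numpy as np

# illustrative transition probabilities; only the third row, (0, 0, 1), is
# dictated by the absorbing-state setup in the text
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.0, 0.0, 1.0]])
x0 = np.array([100.0, 100.0, 0.0])

xn = x0 @ np.linalg.matrix_power(P, 200)
assert np.allclose(xn, [0.0, 0.0, 200.0])   # everyone ends up in the exit state E
```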
7.5. System of Linear Equations

Claim 7.3. A homogeneous system Ax = 0 always has a solution (the trivial one, x = 0). But there might be other solutions (the solution may not be unique).
Claim 7.4. For a non-homogeneous system Ax = b, a solution may not exist.
Example 7.6. The following system of two linear equations
2x + 4y = 5
x + 2y = 2
does not have a solution. Multiply the second equation by 2: the left-hand sides of the two equations then coincide, which leads to 5 = 4, a contradiction.
Example 7.7. The following system of two linear equations
2x + 4y = 2
x + 2y = 1
has infinitely many solutions.
Given [A]_{n×k} and b_{n×1}, the n × (k + 1) matrix [Ab]_{n×(k+1)} = [A1 A2 · · · Ak b] is called the augmented matrix. Note that Ai is the ith column of A.
Example 7.8. Let A = [5 3; 6 1], b = [1; 2] ⇒ Ab = [5 3 1; 6 1 2].
gives us
A = [2 1; 2 2], b = [0; 0], Ab = [2 1 0; 2 2 0].
It is easy to verify that
rank(A) = 2 = rank(Ab).
Hence a solution exists and is unique.
Example 7.10. The system of linear equations
2x + y = 0
4x + 2y = 0
leads to
A = [2 1; 4 2], b = [0; 0], Ab = [2 1 0; 4 2 0].
It is again easy to verify that
rank(A) = 1 = rank(Ab).
However,
rank(A) = rank(Ab) < k = 2.
Hence a solution exists but is not unique¹.
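The rank criterion illustrated in Examples 7.6–7.10 can be checked mechanically; a numpy sketch (numpy assumed available):

```python
import numpy as np

def ranks(A, b):
    """Return (rank of A, rank of the augmented matrix Ab)."""
    Ab = np.column_stack([A, b])
    return np.linalg.matrix_rank(A), np.linalg.matrix_rank(Ab)

# Example 7.6: rank A < rank Ab, so no solution exists
A1 = np.array([[2, 4], [1, 2]])
b1 = np.array([5, 2])
assert ranks(A1, b1) == (1, 2)

# Example 7.10: rank A = rank Ab = 1 < k = 2, so infinitely many solutions
A2 = np.array([[2, 1], [4, 2]])
b2 = np.array([0, 0])
assert ranks(A2, b2) == (1, 1)
```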
Now, we revert to the problem of computing the inverse of a non-singular matrix. We first note
the following result.
Theorem 7.4. Matrix [A]_{n×n} is invertible ⇔ det(A) ≠ 0. Also, if [A]_{n×n} is invertible, then det(A⁻¹) = 1/det(A).
Suppose, next, that A is not invertible. Then, A is singular and so one of its columns (say, A1 )
can be expressed as a linear combination of its other columns A2 , · · · , An . That is,
n
A1 = ∑ αi Ai
i=2
¹A row or column vector of zeros is always linearly dependent on the other vectors.
Consider the matrix B whose first column is A1 − ∑_{i=2}^{n} αi Ai and whose other columns are the same as those of A. Then, the first column of B is zero, and so |B| = 0. By the property of determinants, |B| = |A|, and so |A| = 0.
For a square matrix [A]_{n×n}, we define the co-factor matrix of A to be the n × n matrix given by
C = [A11 A12 . . . A1n; . . . ; An1 An2 . . . Ann].
The transpose of C is called the adjoint of A, and denoted by adj A.
AC′ = [∑_{j=1}^{n} a1j A1j  ∑_{j=1}^{n} a1j A2j  · · ·  ∑_{j=1}^{n} a1j Anj; . . . ; ∑_{j=1}^{n} anj A1j  ∑_{j=1}^{n} anj A2j  · · ·  ∑_{j=1}^{n} anj Anj] = [|A| 0 · · · 0; . . . ; 0 0 · · · |A|] = |A| I.
This yields the equation
C′ = |A| A⁻¹.
Since A is non-singular, we have |A| ≠ 0, and
(7.24) A⁻¹ = C′/|A| = adj A/|A|.
Thus (7.24) gives us a formula for computing the inverse of a non-singular matrix in terms of the
determinant and cofactors of A.
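The formula (7.24) is short to implement; a minimal sketch with numpy (assumed available), checked against numpy's built-in inverse on the coefficient matrix of the three-good system in Example 7.11:

```python
import numpy as np

def inverse_via_adjugate(A):
    """A^{-1} = adj(A) / det(A), with adj(A) the transpose of the cofactor matrix."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cofactor A_ij
    return C.T / np.linalg.det(A)

# coefficient matrix of the three-good system in Example 7.11 (det = 178)
A = np.array([[5., 1., -1.],
              [-2., 5., -1.],
              [-1., -1., 7.]])
assert np.allclose(inverse_via_adjugate(A), np.linalg.inv(A))
```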
7.6. Cramer's Rule

Recall that we wanted to calculate the (unique) solution of a system of n equations in n unknowns given by
(7.25) Ax = c
where A is an n × n matrix, and c is a vector in Rn .
To obtain a unique solution, we saw that we must have A non-singular, which now translates to
the condition “|A| ̸= 0”. The unique solution to (7.25) is then
adj A
(7.26) x = A−1 c = c
|A|
Let us evaluate x1 using (7.26). This can be done by taking the inner product of x with the first unit vector, e1 = (1, 0, · · · , 0). Thus,
x1 = e1 · x = (e1 · adj A) c / |A|
= |A|⁻¹ det [c1 a12 · · · a1n; . . . ; cn an2 · · · ann].
This gives us an easy way to compute x1. In general, in order to calculate xi, replace the ith column of A by the vector c and find the determinant of this matrix. Dividing this number by the determinant of A yields the solution xi. This rule is known as Cramer's Rule.
Example 7.11. General Market Equilibrium with three goods
Consider a market for three goods. Demand and supply for each good are given by:
D1 = 5 − 2P1 + P2 + P3
S1 = −4 + 3P1 + 2P2
D2 = 6 + 2P1 − 3P2 + P3
S2 = 3 + 2P2
D3 = 20 + P1 + 2P2 − 4P3
S3 = 3 + P2 + 3P3
where Pi is the price of good i, i = 1, 2, 3. The equilibrium conditions are Di = Si, i = 1, 2, 3, that is,
5P1 + P2 − P3 = 9
−2P1 + 5P2 − P3 = 3
−P1 − P2 + 7P3 = 17
This system of linear equations can be solved in at least two ways. Using the inverse of the coefficient matrix,
P = (1/178) [34 −6 4; 15 34 7; 7 4 27] · [9; 3; 17] = [2; 2; 3].
Again, P1* = 2, P2* = 2, and P3* = 3.
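Cramer's Rule itself is only a few lines of code; a sketch with numpy (assumed available), verified on the three-good system above:

```python
import numpy as np

def cramer(A, c):
    """Solve Ax = c by Cramer's Rule: x_i = det(A with column i replaced by c) / det(A)."""
    detA = np.linalg.det(A)
    x = np.empty(len(c))
    for i in range(len(c)):
        Ai = A.copy()
        Ai[:, i] = c                    # replace the ith column of A by c
        x[i] = np.linalg.det(Ai) / detA
    return x

# the three-good equilibrium system from Example 7.11
A = np.array([[5., 1., -1.],
              [-2., 5., -1.],
              [-1., -1., 7.]])
c = np.array([9., 3., 17.])

x_star = cramer(A, c)
assert np.allclose(x_star, [2., 2., 3.])            # P1* = P2* = 2, P3* = 3
assert np.allclose(x_star, np.linalg.solve(A, c))   # agrees with a direct solve
```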
Definition 7.13. A principal minor of order k (1 ≤ k ≤ n) of [A]_{n×n} is the determinant of the k × k submatrix that remains when (n − k) rows and columns with the same indices are deleted from A.
Example 7.12. Let
A = [1 2 3; 0 8 1; 2 5 9].
The leading principal minor of order 2 is
det [1 2; 0 8] = 8,
and the leading principal minor of order 3 is
det [1 2 3; 0 8 1; 2 5 9] = 23.
A quadratic form consists of a square matrix [A]_{n×n} which is pre- and post-multiplied by an n-vector; the result is a scalar:
(7.27) Q(x, A) = x′Ax
Example 7.13. Let
A = [a b; c d], x = [x1; x2].
Then
Q(x, A) = [x1 x2] · [a b; c d] · [x1; x2] = a x1² + (b + c) x1 x2 + d x2².
[A]_{n×n} is ND if and only if every leading principal minor of order k of A has sign (−1)^k.
[A]_{n×n} is PSD if and only if all principal minors of A are non-negative.
[A]_{n×n} is NSD if and only if every principal minor of order k of A has sign (−1)^k or is 0.
Example 7.14. Let
A = [a11 a12; a21 a22].
Then A is
positive definite: a11 > 0, a11 a22 − a12 a21 > 0;
negative definite: a11 < 0, a11 a22 − a12 a21 > 0;
positive semi-definite: a11 ≥ 0, a22 ≥ 0, a11 a22 − a12 a21 ≥ 0;
negative semi-definite: a11 ≤ 0, a22 ≤ 0, a11 a22 − a12 a21 ≥ 0.
Note that a negative definite matrix necessarily has full rank: indeed, if the zero vector could be obtained as a linear combination of columns of A with weights α1, · · · , αn (not all zero), then we could define t = (α1, · · · , αn) ≠ 0 to obtain t′At = 0, contradicting negative definiteness.
Definition 7.15. Let A be a symmetric n × n matrix. Matrix A is diagonally dominant if for each
row i, we have |ai,i | ≥ ∑ j̸=i |ai, j |, and it is strictly diagonally dominant if the latter inequality holds
strictly for each row.
Every symmetric, diagonally dominant matrix with non-positive entries along the diagonal is
negative semi-definite; and every symmetric, strictly diagonally dominant matrix with negative
entries along the diagonal is negative definite.
7.9. Eigenvalues and Eigenvectors

Given an n × n real matrix A, an eigenvalue of A is a number λ which, when subtracted from each of the diagonal entries of A, converts A into a singular matrix. Subtracting a scalar λ from each diagonal entry of A is the same as subtracting λ times the identity matrix I from A. Hence, λ is an eigenvalue of A if and only if A − λI is a singular matrix.
This is also equivalent to asking for what non-zero vectors x ∈ Rn , and for what complex
numbers λ, is it true that
(7.32) Ax = λx
This is known as the eigenvalue problem.
Therefore, 4 is an eigenvalue of matrix A. Also, subtracting 2 from each diagonal entry transforms A into the singular matrix
[2 0; 0 0].
Therefore, 2 is also an eigenvalue of matrix A.
The above example illustrates a general principle about the eigenvalues of a diagonal matrix.
Theorem 7.5. The diagonal entries of a diagonal matrix A are the eigenvalues of A.
Theorem 7.6. A square matrix A is singular if and only if 0 is an eigenvalue of A.
Example 7.17. Consider the 2 × 2 matrix A given by
A = [4 −4; −4 4].
Since the first row is the negative of the second row, matrix A is singular. Hence 0 is an eigenvalue of A. Also, subtracting 8 from each diagonal entry transforms A into the singular matrix
[−4 −4; −4 −4].
Therefore, 8 is also an eigenvalue of matrix A.
Example 7.18. Consider the 2 × 2 matrix A given by
A = [2 1; 1 2].
Then equation (7.34) becomes
(7.36) det [2 − λ  1; 1  2 − λ] = (2 − λ)² − 1 = 0,
with roots λ = 1 and λ = 3. For λ = 1, the system (A − λI)x = 0 yields
x1 + x2 = 0.
Thus the general solution of the eigenvector corresponding to the eigenvalue λ = 1 is given by
(x1, x2) = θ(1, −1) for θ ≠ 0
Similarly, corresponding to the eigenvalue λ = 3, we have the eigenvector given by
(x1 , x2 ) = θ(1, 1) for θ ̸= 0.
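These eigenpairs can be verified directly from the defining equation Ax = λx; a quick check with numpy (assumed available):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
v1 = np.array([1., -1.])   # eigenvector for lambda = 1
v2 = np.array([1., 1.])    # eigenvector for lambda = 3

assert np.allclose(A @ v1, 1.0 * v1)
assert np.allclose(A @ v2, 3.0 * v2)
```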
Example 7.19. A square matrix A whose entries are non-negative and whose rows (or columns)
each add to 1 is called a Markov matrix. These matrices play a major role in economic dynamics.
Consider the 2 × 2 matrix A given by
A = [a  1 − a; b  1 − b],
where a ≥ 0 and b ≥ 0. Then subtracting 1 from the diagonal entries leads to the matrix
A − I = [a − 1  1 − a; b  −b].
Notice that each row of this matrix adds to 0. But if each row of a square matrix adds to zero, the columns are linearly dependent (they sum to the zero vector) and the matrix is singular. This shows that 1 is an eigenvalue of the Markov matrix. The same argument shows that 1 is an eigenvalue of every Markov matrix.
For the case of a symmetric matrix A, we can show that all the eigenvalues of A are real.
Theorem 7.7. Let A be a symmetric n × n matrix. Then all the eigenvalues of A are real.
Proof. Suppose λ is a complex eigenvalue, with associated complex eigenvector, x. Then we have
(7.37) Ax = λx
Define x∗ to be the complex conjugate of x, and λ∗ to be the complex conjugate of λ. Then
(7.38) Ax∗ = λ∗ x∗
Pre-multiply (7.37) by (x∗ )′ and (7.38) by x′ to get
(7.39) (x∗ )′ Ax = λ(x∗ )′ x
(7.40) x′ Ax∗ = λ∗ x′ x∗
Subtracting (7.40) from (7.39),
(7.41) (x∗)′Ax − x′Ax∗ = (λ − λ∗)x′x∗.
Since A is symmetric, the scalars (x∗)′Ax and x′Ax∗ are transposes of one another and hence equal, so the left-hand side of (7.41) is zero. Since x ≠ 0, x′x∗ = ∑_i |xi|² > 0, and therefore λ = λ∗, i.e., λ is real.
On the other hand, if λ1 , ..., λn are the eigenvalues of A, then the characteristic equation (7.34)
can be written as
(7.44) 0 = (λ1 − λ)(λ2 − λ)....(λn − λ)
Using (7.34), (7.43), and (7.44) and “comparing coefficients” we can conclude that
bn−1 = λ1 + λ2 + ... + λn
and
b0 = λ1 λ2 ...λn
Also, by looking at the terms in the characteristic polynomial of A which would involve
(−λ)n−1 , we can conclude that
bn−1 = a11 + a22 + ... + ann
Finally, putting λ = 0 in (7.43), we get
b0 = |A|
Thus we note two interesting relationships between the characteristic values, the trace and the determinant of A:
tr A = ∑_{i=1}^{n} λi
and
|A| = ∏_{i=1}^{n} λi
(1) A is positive definite if and only if all the eigenvalues of A are positive.
(2) A is negative definite if and only if all the eigenvalues of A are negative.
(3) A is positive semidefinite if and only if all the eigenvalues of A are non-negative.
(4) A is negative semidefinite if and only if all the eigenvalues of A are non-positive.
(5) A is indefinite if and only if A has a positive eigenvalue and a negative eigenvalue.
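These eigenvalue characterizations, together with the trace and determinant identities above, are easy to verify numerically; a sketch with numpy (assumed available), reusing the symmetric matrix from Example 7.18:

```python
import numpy as np

# the symmetric matrix from Example 7.18, with eigenvalues 1 and 3
A = np.array([[2., 1.],
              [1., 2.]])

lam = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, ascending order
assert np.allclose(lam, [1., 3.])
assert np.isclose(lam.sum(), np.trace(A))          # tr A = sum of eigenvalues
assert np.isclose(lam.prod(), np.linalg.det(A))    # |A| = product of eigenvalues
assert np.all(lam > 0)        # all eigenvalues positive: A is positive definite
```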
Chapter 8
Problem Set 3
(1) Let
A = [1 −1 7; 0 8 10],  B = [9 6 5 4; 1 −2 −3 3; 0 1 −1 2].
(3) Let
A = [1 6 2; −1 5 3],  B = [8 4; 0 −2; 7 −3].
(7) What is the definiteness of the following matrices? (Hint: Use the principal minors)
A = [2 −1; −1 1],  B = [2 4; 4 8],  C = [−3 4; 4 5],  D = [−3 4; 4 −6].
(8) Consider the situation of a mass layoff (i.e. a firm goes out of business) where 2000 people
become unemployed and now begin a job search. There are two states: employed (E) and
unemployed (U) with an initial vector
x0′ = [E U] = [0 2000].
Suppose that in any given period an unemployed person will find a job with probability 0.7
and will therefore remain unemployed with a probability 0.3. Additionally, persons who find
themselves employed in any given period may lose their job with a probability of 0.1 (and will
continue to remain employed with probability 0.9).
(i) Set up the Markov transition matrix for this problem.
(ii) What will be the number of unemployed people after (a) two periods; (b) four periods;
(c) six periods; (d) ten periods.
(iii) What is the steady-state level of unemployment?
A^k = A × A × · · · × A (k times) = O.
(10) (a) Prove that the eigenvalues of an upper or lower triangular matrix are precisely its diagonal
entries.
(b) Suppose that A is an invertible matrix. Show that (A − λI)x = 0 implies that (A⁻¹ − (1/λ)I)x = 0. Conclude that for an invertible matrix A, λ is an eigenvalue of A if and only if 1/λ is an eigenvalue of A⁻¹.
(c) Let A be an invertible matrix and let x be an eigenvector of A. Show it is also an eigenvector
of A2 and A−2 . What are the corresponding eigenvalues?
Chapter 9

Single and Multivariable Calculus
9.1. Functions
Recall the definition of functions discussed earlier. Now we discuss some features of functions which are useful in optimization exercises.
9.2. Surjective and Injective Functions

Definition 9.1. A function f : D → R is called surjective (or is said to map D onto R) if f(D) = R, i.e., if the image f(D) of the function is equal to the entire range.
Definition 9.2. A function f : D → R is called injective or one to one if
(9.1) f (x) = f (y) ⇔ x = y.
It is not surjective, as there exists no element in the domain which is mapped to −1.
Next let us also restrict the domain of the function to R+. The function is
h : R+ → R+, h(x) = x².
It is both surjective and injective. Hence it is bijective.
Example 9.2. Let A be a non-empty set and let S be a subset of A. We define a function χS : A →
{0, 1} by
(9.2) χS(a) = 1 if a ∈ S; 0 if a ∉ S.
Then
(a) If f is injective, then f⁻¹[f(A)] = A;
(b) If f is surjective, then f[f⁻¹(B)] = B;
(c) If f is injective, then f(A1 ∩ A2) = f(A1) ∩ f(A2).
Proof. You should try and prove (a) and (b) on your own. I will provide proof for (c) here. We
need to prove that f (A1 ∩ A2 ) ⊆ f (A1 ) ∩ f (A2 ) and f (A1 ) ∩ f (A2 ) ⊆ f (A1 ∩ A2 ).
Step 1. Show
f (A1 ∩ A2 ) ⊆ f (A1 ) ∩ f (A2 )
Let
y ∈ f (A1 ∩ A2 ) .
Then there exists x ∈ A1 ∩ A2 such that f(x) = y. Since x ∈ A1 ∩ A2, x ∈ A1 and x ∈ A2. But then f(x) ∈ f(A1) and f(x) ∈ f(A2). So y = f(x) ∈ f(A1) ∩ f(A2). Observe that we have not used the fact that f is injective, so this part of the result holds for any function.
f (x1 ) = y = f (x2 ) .
Definition 9.4.
(a) A function f is odd if and only if for every x, − f (x) = f (−x). Example: f (x) = x.
(b) A function f is even if and only if for every x, f (x) = f (−x). Example: f (x) = x2 .
(c) A function f is periodic if and only if there exists a k > 0 such that for every x, f (x + k) = f (x).
Example: f (x) = sin x, since sin (x + 2π) = sin x.
(d) A function f is increasing if and only if for every x and every y, if x ≤ y, then f (x) ≤ f (y).
Example: f (x) = x.
(e) A function f is decreasing if and only if for every x and every y, if x ≤ y, then f (x) ≥ f (y).
Example: f (x) = −x.
Definition 9.5. Composition of Functions: If f : A → B and g : B → C are two functions, then for any a ∈ A, f(a) ∈ B. But B is the domain of g, so the mapping g can be applied to f(a), which yields g(f(a)), an element in C. This establishes a correspondence between a in A and c = g(f(a)) in C. This correspondence is called the composition function of f and g and is denoted by g ◦ f (read "g of f"). Thus we have
(9.5) (g ◦ f)(a) = g(f(a)).
Remark 9.1. Composition of two functions need not be commutative,
(g ◦ f ) (a) ̸= ( f ◦ g) (a)
as the following example shows.
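The example referred to here appears to have been lost from these notes; a minimal illustration (the functions f and g below are our own choices, not the text's):

```python
def f(x):          # hypothetical f
    return x + 1

def g(x):          # hypothetical g
    return 2 * x

def gof(x):        # (g ∘ f)(x) = g(f(x)) = 2(x + 1)
    return g(f(x))

def fog(x):        # (f ∘ g)(x) = f(g(x)) = 2x + 1
    return f(g(x))

assert gof(3) == 8
assert fog(3) == 7
assert gof(3) != fog(3)   # composition is not commutative
```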
Proof. (a) Since g is surjective, range of g = C. That is for any element c ∈ C, there exists an
element b ∈ B such that g (b) = c. Since f is also surjective, there exists an element a ∈ A such
that f(a) = b. But then
(g ◦ f)(a) = g(f(a)) = g(b) = c.
So, (g ◦ f) is surjective.
(b) Since g is injective, for all b and b′ in B, if g(b) = g(b′) = c ∈ C then b = b′; and since f is injective, for all a and a′ in A, if f(a) = f(a′) = b ∈ B then a = a′. Then
(g ◦ f)(a) = (g ◦ f)(a′)
⇒ g(f(a)) = g(f(a′))
⇒ f(a) = f(a′)
⇒ a = a′.
So, (g ◦ f) is injective.
9.4. Continuous Functions
It is easy to draw examples of functions which are not continuous. An intuitive way of under-
standing continuity of function is that we should be able to draw its graph without lifting pencil
from paper. If a function has a point of discontinuity say, x0 , then as we approach x0 from the left
hand side and from right hand side, the function attains different values.
For a function to be continuous at x0 , both the LHS and RHS limits must exist and converge to
the function value.
(9.7) lim_{x→x0⁻} f(x) = lim_{x→x0⁺} f(x) = f(x0)
Theorem 9.4. A function f : D → R is continuous if and only if for every convergent sequence of
points {xn } ∈ D with limit x ∈ D, the sequence f (xn ) → f (x).
Example 9.4. If
lim_{x→x0⁻} f(x) = lim_{x→x0⁺} f(x) ≠ f(x0),
then the function is not continuous. Take
y = x for 0 ≤ x < 1/2;  y = 0 for x = 1/2;  y = 1 − x for 1/2 < x ≤ 1.
Definition 9.8. Given f : D → R, let A ⊆ R be any subset of the range. The inverse image of A under f, f⁻¹(A), is the set of points x in the domain D such that f(x) ∈ A:
(9.8) f⁻¹(A) = {x ∈ D | f(x) ∈ A}.
Conversely, assume that f⁻¹(V) is open in D for every open set V in R. Fix p ∈ D and ε > 0, and let V be the set of all y ∈ R such that d(f(p), y) < ε. Then V is open and hence f⁻¹(V) is open, and so there exists δ > 0 such that x ∈ f⁻¹(V) as soon as d(p, x) < δ. But if x ∈ f⁻¹(V), then f(x) ∈ V, and so d(f(p), f(x)) < ε.
The next theorem (stated without proof) considers the inverse image of the closed subsets of the range R to characterize continuous functions.
Theorem 9.6. A function f : D → R is continuous if and only if the inverse image of every closed
set is closed.
This follows from Theorem 9.5, since a set is closed if and only if its complement is open, and
since f −1 (V c ) = [ f −1 (V )]c for every V ⊂ R.
Claim 9.2. If f is a continuous function of two variables f (x1 , x2 ), then the functions of one
variable obtained by holding the other variable constant f (·, x̄2 ) and f (x̄1 , ·) are also continuous.
Theorem 9.7. Intermediate Value Theorem for continuous functions: Let f be a continuous func-
tion on a domain containing [a, b], with say f (a) < f (b). Then for any y in between, f (a) < y <
f (b), there exists c in (a, b) with f (c) = y.
[Figure: the Intermediate Value Theorem — the graph of y = f(x) rises from f(a) to f(b) over [a, b]; the horizontal line y = u, with f(a) < u < f(b), crosses it at some c ∈ (a, b).]
We can apply the Intermediate Value Theorem to prove the existence of a fixed point for the following function.
Theorem 9.8. Consider a continuous function f : [0, 1] → [0, 1]. Then there exists c ∈ [0, 1] such
that f (c) = c.
Proof. Define a function g(x) = f (x) − x. It is continuous since it is sum of two continuous func-
tions, f (x) and −x. If f (0) = 0, then x = 0 is a fixed point. If not, then f (0) > 0, or g(0) > 0.
If f (1) = 1, then x = 1 is a fixed point. If not, then f (1) < 1, or g(1) < 0.
Now we apply the Intermediate Value Theorem to claim that there exists a point c ∈ [0, 1] such
that g(c) = 0. This implies g(c) = f (c) − c = 0 or f (c) = c or c is a fixed point.
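The proof of Theorem 9.8 is constructive enough to turn into a root-finding sketch: bisect on g(x) = f(x) − x. A minimal Python illustration (the choice f = cos, which maps [0, 1] into itself, is ours, not the text's):

```python
import math

def fixed_point(f, a=0.0, b=1.0, tol=1e-10):
    """Bisection on g(x) = f(x) - x, mirroring the proof of Theorem 9.8."""
    g = lambda x: f(x) - x
    if g(a) == 0:
        return a
    if g(b) == 0:
        return b
    # otherwise g(a) > 0 and g(b) < 0, so a sign change lies in [a, b]
    while b - a > tol:
        m = (a + b) / 2
        if g(a) * g(m) > 0:
            a = m          # sign change is in [m, b]
        else:
            b = m          # sign change is in [a, m]
    return (a + b) / 2

c = fixed_point(math.cos)  # cos maps [0, 1] into [0, 1]
assert abs(math.cos(c) - c) < 1e-8
```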
Definition 9.9. The function f : D → R attains a local maximum at x0 if there exists a neighborhood of x0 such that f(x) ≤ f(x0) for all x in the neighborhood.
Definition 9.10. The function f : D → R attains a strict local maximum at x0 if there exists a
neighborhood of x0 such that f (x) < f (x0 ) for all x not equal to x0 in the neighborhood.
Proof. We first claim that the function f is bounded on the domain D. If not, then there exists a sequence {xn}_{n=1}^∞ in D such that f(xn) → ∞ as n → ∞. Since D is compact, there exists a subsequence {yn}_{n=1}^∞ of {xn}_{n=1}^∞ which converges to ȳ in D. Since {yn}_{n=1}^∞ is a subsequence of {xn}_{n=1}^∞ and f(xn) → ∞, it must be true that f(yn) → ∞. However, since {yn}_{n=1}^∞ converges to ȳ and f is a continuous function, f(yn) must converge to the finite real number f(ȳ). These two observations lead to a contradiction. Thus we have proved the claim.
To prove the theorem, we again assume that f does not attain its maximum value in D. Since f is bounded on D, let M be the least upper bound of the values f takes in D. Clearly M is finite. Also, there exists a sequence {zn}_{n=1}^∞ in D such that f(zn) → M. Note that even though f(zn) approaches the least upper bound M as n → ∞, the sequence {zn}_{n=1}^∞ itself need not converge. Since D is compact, there exists a subsequence {un}_{n=1}^∞ of {zn}_{n=1}^∞ which converges to ū in D. Since f is a continuous function, f(un) must converge to the finite real number f(ū). Since a convergent sequence has only one limit, f(ū) = M and ū is the point of global maximum of f in D.
This is the theorem we will be using to show the existence of optimal bundles for consumers
and producers. So we need to understand it and be comfortable with using it.
9.6. An application of the Extreme Value Theorem
The following examples show why the function domain must be closed and bounded in order
for the theorem to apply. In each of the following examples, the function fails to attain a maximum
on the given interval.
(a) f (x) = x defined over [0, ∞) (domain being unbounded) is not bounded from above.
(b) f (x) = x/(1 + x) defined over [0, ∞) (domain being unbounded) is bounded but does not attain its least upper bound, i.e., 1.
(c) f (x) = 1/x defined over (0, 1] (domain is bounded but not closed) is not bounded from above.
(d) f (x) = 1 − x defined over (0, 1] (domain is bounded but not closed) is bounded but never attains its least upper bound, i.e., 1.
(e) Extending the functions in (c) and (d) to the compact interval [0, 1] by setting f (0) = 0 makes the domain closed and bounded, but the extended functions are then discontinuous at 0; this shows that continuity of f is also required.
If we are given two norms ∥·∥a and ∥·∥b on some finite-dimensional vector space V over R,
a very useful fact is that they are always within a constant factor of one another. In other words,
there exists a pair of real numbers 0 < C1 < C2 such that, for all x ∈ V , the following inequality
holds:
C1 ∥x∥b ≤ ∥x∥a ≤ C2 ∥x∥b .
Note that any finite-dimensional vector space, by definition, is spanned by a basis e1 , e2 , · · · , en
where n is the dimension of the vector space. The basis is often chosen to be orthonormal if we
have an inner product. That is, any vector x can be written as
x = ∑_{i=1}^{n} αi ei.
Now, we can prove equivalence of norms in four steps, the last of which requires application
of the Extreme Value Theorem.
Step 1: It is sufficient to consider ∥·∥b = ∥·∥1, since equivalence of norms is transitive.
where u ≡ x/∥x∥1 has norm ∥u∥1 = 1,
and
∥x′ ∥a − ∥x∥a = ∥x − (x − x′ )∥a − ∥x∥a ≤ ∥x − x′ ∥a ,
and therefore,
|∥x∥a − ∥x′∥a| ≤ ∥x − x′∥a.
Second, applying the triangle inequality again, and writing x = ∑_{i=1}^{n} αi ei and x′ = ∑_{i=1}^{n} α′i ei, we obtain
∥x − x′∥a ≤ ∑_{i=1}^{n} |αi − α′i| ∥ei∥a ≤ ∥x − x′∥1 (max_i ∥ei∥a).
Therefore, if we choose
δ = ε / (max_i ∥ei∥a),
it immediately follows that |∥x∥a − ∥x′∥a| < ε whenever ∥x − x′∥1 < δ; that is, ∥·∥a is continuous with respect to ∥·∥1.
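As a numerical spot check (a sketch added here, not from the notes): for ∥·∥a = ∥·∥2 and ∥·∥b = ∥·∥1 on Rⁿ, the constants C1 = 1/√n and C2 = 1 work, which random trials confirm.

```python
import math
import random

# Check C1*||x||_1 <= ||x||_2 <= C2*||x||_1 with C1 = 1/sqrt(n), C2 = 1.
n = 5
C1, C2 = 1.0 / math.sqrt(n), 1.0

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    norm1 = sum(abs(xi) for xi in x)
    norm2 = math.sqrt(sum(xi * xi for xi in x))
    assert C1 * norm1 <= norm2 + 1e-12   # lower bound (Cauchy-Schwarz)
    assert norm2 <= C2 * norm1 + 1e-12   # upper bound (triangle inequality)
```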
9.7. Differentiability
We follow the steps listed below to determine whether a derivative exists and, if so, its value.
(c) If the secant slope ∆f/h = [f (x0 + h) − f (x0)]/h has a limit as h → 0, then f is differentiable at x0, and the derivative is equal to this limit.
We can see that the derivative is equal to the slope of the tangent to the graph at x0 . Note that
the tangent can be used to approximate the function in the neighborhood of x0 .
f (x0 + h) ≈ f (x0) + h · f ′(x0).
It is the best linear approximation.
Definition 9.14. A function f : R → R is differentiable on a set S ⊆ R, if it is differentiable at each
point x ∈ S. It is called differentiable if it is differentiable at each point of the domain.
Example 9.5. Let f : R → R be f (x) = x². This function is differentiable at all x ∈ R.
αsec = [f (x0 + h) − f (x0)]/h = [(x0 + h)² − x0²]/h = [2x0h + h²]/h = 2x0 + h,
lim_{h→0} αsec = 2x0 ⇒ f ′(x0) = 2x0.
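The computation above can be mirrored numerically (a small illustrative sketch, not from the notes): the secant slopes of f (x) = x² approach 2x0 as h shrinks.

```python
# Secant slopes of f(x) = x^2 at x0 converge to f'(x0) = 2*x0 as h -> 0.

def secant_slope(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x * x
x0 = 3.0
slopes = [secant_slope(f, x0, 10.0 ** (-k)) for k in range(1, 7)]

# Each slope equals 2*x0 + h exactly, so the error shrinks with h.
assert abs(slopes[0] - (2 * x0 + 0.1)) < 1e-9
assert abs(slopes[-1] - 2 * x0) < 1e-5
```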
Definition 9.15. Second derivative: Let function f : R → R be differentiable with f ′ (·) denoting
its first derivative. If f ′ (·) is differentiable, its derivative is denoted by f ′′ (·) and is called the
second derivative of f .
Definition 9.16. A function whose derivative exists and is continuous is called continuously dif-
ferentiable or of class C 1 . A function whose second derivative exists and is continuous is called
twice continuously differentiable or of class C 2 .
Note that this claim does not hold in the other direction: not all continuous functions are differentiable. Consider the example of the absolute value function f : R → R defined by
f (x) = |x| .
The absolute value |x| of x is defined by
|x| = x if x ≥ 0, and |x| = −x if x < 0.
It is easy to check that f is continuous on R. However, it is not differentiable at x0 = 0 (Please
verify).
(f ◦ g)′(x) = (1/x²) · 2x = 2/x.
Theorem 9.12. If f is differentiable and has a local maximum or minimum at x0, then f ′(x0) = 0.
Note the converse is not true. Take f (x) = x3 (See Figure 9.3). The first derivative is zero at
x0 = 0 which is a point of inflection.
Figure 9.3: graph of f (x) = x³, which has f ′(0) = 0 at the inflection point x0 = 0.
9.7.2. L'Hôpital's Rule. Sometimes we need to determine the limit of a quotient in which both the numerator and the denominator go to zero. We use L'Hôpital's rule in such cases. If f (a) = g(a) = 0 and g′(a) ≠ 0, then
lim_{x→a} f (x)/g(x) = f ′(a)/g′(a).
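A quick numerical sketch of the rule (added here for illustration; the choice of f and g is mine, not the notes'): for f (x) = sin x and g(x) = eˣ − 1 at a = 0, the rule predicts the limit f ′(0)/g′(0) = 1.

```python
import math

# L'Hopital's rule check for f(x) = sin(x), g(x) = exp(x) - 1 at a = 0:
# f(a) = g(a) = 0 and g'(a) = 1 != 0, so the limit is f'(0)/g'(0) = 1.

f = lambda x: math.sin(x)
g = lambda x: math.exp(x) - 1.0

ratio_near_zero = f(1e-8) / g(1e-8)
assert abs(ratio_near_zero - 1.0) < 1e-6
```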
9.8. Mean Value Theorem
Theorem 9.13 (Mean Value Theorem). Let f be a continuous function on the compact interval [a, b] and differentiable on (a, b). Then there exists a point c ∈ (a, b) where
f ′(c) = [f (b) − f (a)]/(b − a).
The following claim is helpful in proving the Mean Value Theorem. The proof of the claim relies on the Weierstrass Theorem and is thus another example of an application of the Weierstrass Theorem.
Claim 9.3. Let f (·) and g(·) be continuous functions on [a, b] and differentiable on (a, b). Then there exists x ∈ (a, b) such that
[ f (b) − f (a)]g′ (x) = [g(b) − g(a)] f ′ (x).
Proof. Define,
h(s) = [ f (b) − f (a)]g(s) − [g(b) − g(a)] f (s).
Then, it is easy to check h(a) = f (b)g(a) − f (a)g(b) = h(b). We need to show that h′ (x) = 0 for
some x ∈ (a, b). If h(x) is a constant function, then h′ (x) = 0 for every point in (a, b). If not, then
consider without loss of generality h(x) > h(a) for some x ∈ (a, b). Since h(·) is a continuous function defined on the compact domain [a, b], the Weierstrass Theorem can be applied to claim that it attains a maximum; since the maximum value exceeds h(a) = h(b), the maximum is attained at some interior point s ∈ (a, b). Also, since h(·) is differentiable on (a, b) and attains its maximum at s ∈ (a, b), h′(s) = 0. The case where h(x) < h(a) for some x ∈ (a, b) can be proved in a similar manner, as in this case the function h(·) attains a minimum at some interior point.
To prove the Mean Value Theorem, we consider g(x) = x. Then g′(x) = 1 leads to
[f (b) − f (a)](1) = (b − a) f ′(x), or f ′(x) = [f (b) − f (a)]/(b − a),
for some x ∈ (a, b).
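The mean value point can be located numerically in a concrete case (an illustrative sketch of mine, not from the notes): for f (x) = x³ on [0, 1] the secant slope is 1, and f ′(c) = 3c² = 1 at c = 1/√3.

```python
# Mean Value Theorem: for f(x) = x^3 on [a, b] = [0, 1], the secant slope is
# (f(1) - f(0))/(1 - 0) = 1, and f'(c) = 3c^2 = 1 at c = 1/sqrt(3) in (0, 1).

def fprime(x):
    return 3.0 * x * x

target = (1.0 ** 3 - 0.0 ** 3) / (1.0 - 0.0)   # secant slope = 1

# Bisection on f'(c) - target, which is increasing in c on (0, 1).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if fprime(mid) < target:
        lo = mid
    else:
        hi = mid
c = 0.5 * (lo + hi)

assert abs(c - 3.0 ** (-0.5)) < 1e-9
assert abs(fprime(c) - target) < 1e-8
```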
Figure: illustration of the Mean Value Theorem — the tangent line at c has slope f ′(c) equal to the secant slope [f (b) − f (a)]/(b − a).
f (x) ≤ f (y).
(f) Sometimes we also say f is increasing at c if there exists some δ > 0 such that c − δ < x < c < y < c + δ implies that
f (x) ≤ f (c) ≤ f (y).
(g) f is decreasing at c if there exists some δ > 0 such that c − δ < x < c < y < c + δ implies that
f (x) ≥ f (c) ≥ f (y).
(a) If f ′(x) ≥ 0 for all x ∈ (a, b), then f is non-decreasing on [a, b].
(b) If f ′(x) > 0 for all x ∈ (a, b), then f is strictly increasing on [a, b].
(c) Similarly, if f ′(x) ≤ 0 for all x ∈ (a, b), then f is non-increasing on [a, b].
(d) If f ′(x) < 0 for all x ∈ (a, b), then f is strictly decreasing on [a, b].
(e) If f ′(x) = 0 for all x ∈ (a, b), then f is constant on [a, b].
(9.16) If f is monotone increasing (resp. monotone decreasing) at x0, then f ′(x0) ≥ 0 (resp. f ′(x0) ≤ 0).
Theorem 9.14 (Darboux's Theorem: Intermediate Value Theorem for the derivative). If f is differentiable on (a, b), then its derivative has the intermediate value property: if x1 < x2 are any two points in the interval (a, b), and y lies between f ′(x1) and f ′(x2), then there exists a number x in the interval [x1, x2] such that f ′(x) = y.
Proof. Assume y lies strictly between f ′(x1) and f ′(x2). Define a function g : (a, b) → R by
g(t) = f (t) − yt.
Then g′ (x1 ) = f ′ (x1 ) − y and g′ (x2 ) = f ′ (x2 ) − y. Then either (i) g′ (x1 ) > 0 and g′ (x2 ) < 0 or (ii)
g′(x1) < 0 and g′(x2) > 0. Take the first case, i.e., g′(x1) > 0 and g′(x2) < 0. It is clear that neither x1 nor x2 can be a point where g attains even a local maximum. Since g is a continuous function on the closed and bounded interval [x1, x2], it attains its maximum there by the Weierstrass Theorem, and by the preceding observation the maximum is attained at an interior point x ∈ (x1, x2). So we conclude that
0 = g′ (x) = f ′ (x) − y, or f ′ (x) = y.
Alternative proof. We can clearly assume that y lies strictly between f ′(x1) and f ′(x2). Define continuous functions fx1, fx2 : [x1, x2] → R by
fx1(t) = f ′(x1) if t = x1, and fx1(t) = [f (x1) − f (t)]/(x1 − t) if t ≠ x1,
and
fx2(t) = f ′(x2) if t = x2, and fx2(t) = [f (t) − f (x2)]/(t − x2) if t ≠ x2.
Observe that fx1 (x1 ) = f ′ (x1 ), fx2 (x2 ) = f ′ (x2 ) and fx1 (x2 ) = fx2 (x1 ). Hence, y lies between fx1 (x1 )
and fx1 (x2 ); or y lies between fx2 (x1 ) and fx2 (x2 ). If y lies between fx1 (x1 ) and fx1 (x2 ), then (by
continuity of fx1 ) there exists s in (x1 , x2 ] with
y = fx1(s) = [f (s) − f (x1)]/(s − x1).
Then by Mean Value Theorem there exists x ∈ [x1 , s] such that
y = [f (s) − f (x1)]/(s − x1) = f ′(x).
Similarly if y lies between fx2 (x1 ) and fx2 (x2 ), then (by continuity of fx2 ) there exists s in [x1 , x2 )
and x ∈ [s, x2 ] such that
y = [f (x2) − f (s)]/(x2 − s) = f ′(x).
Definition 9.18. The function f (x) is differentiable at the point x if there exists an n-dimensional vector D f (x), called the differential or total derivative of f at x, such that
∀ε > 0, ∃δ > 0 such that ∥x − y∥ < δ ⇒ |f (x) − f (y) − D f (x) · (x − y)| ≤ ε · ∥x − y∥.
9.10.1. Partial Derivative. To us the more important concept is that of partial derivative which
we define now.
Definition 9.19. Let f : D → R, where D ⊆ Rn, be a function of n variables. If the limit
lim_{h→0} [f (x1, · · · , xi + h, · · · , xn) − f (x1, · · · , xi, · · · , xn)]/h
exists, it is called the ith (first order) partial derivative of f at x and is denoted by ∂f (x)/∂xi or fi(x).
The function f (x) is then said to be partially differentiable with respect to xi . The function
f (x) is said to be partially differentiable if it is partially differentiable with respect to every xi .
Note that ∂f (x)/∂xi is the derivative of f (x1, · · · , xn) with respect to xi, holding all other variables constant. When all the partial derivatives exist, the vector of partial derivatives
∇f (x) = [∂f (x)/∂x1, · · · , ∂f (x)/∂xn]
is called the Jacobian vector or the gradient vector. For functions of one variable, ∇f (x) = f ′(x).
Result 9.4. If a function is differentiable at x0 then it is partially differentiable at x0 .
However, the existence of all the partial derivatives does not guarantee even the continuity of the function, as the following example shows.
Example 9.10. Let f (x, y) be defined as
f (x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f (x, y) = 0 otherwise.
We can prove that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although
f is not continuous at (0, 0).
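The discontinuity claimed in Example 9.10 can be seen numerically (a sketch added here, not from the notes): along the diagonal y = x the function is identically 1/2, while along the x-axis it is 0, yet both partial difference quotients at the origin vanish.

```python
# f(x, y) = xy/(x^2 + y^2) with f(0, 0) = 0 has both partial derivatives at
# the origin, yet is discontinuous there: along y = x the value is 1/2,
# along y = 0 it is 0, however close we come to (0, 0).

def f(x, y):
    return x * y / (x * x + y * y) if (x, y) != (0, 0) else 0.0

for t in [1e-2, 1e-4, 1e-8]:
    assert abs(f(t, t) - 0.5) < 1e-12   # along the diagonal
    assert f(t, 0.0) == 0.0             # along the x-axis

# Partial derivatives at the origin exist (both difference quotients are 0).
h = 1e-6
assert (f(h, 0.0) - f(0.0, 0.0)) / h == 0.0
assert (f(0.0, h) - f(0.0, 0.0)) / h == 0.0
```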
If f is a real valued function defined on an open set D in Rn , and the partial derivatives are
bounded in D, then f is continuous on D.
Example 9.11. Let f : R² → R be
f (x1, x2) = x1³ + 2x1x2 + 3x2³.
Then
∂f (x)/∂x1 = 3x1² + 2x2, ∂f (x)/∂x2 = 2x1 + 9x2²,
∇f (x) = [3x1² + 2x2, 2x1 + 9x2²], ∀x ∈ R².
For functions of one variable we have seen earlier that we could approximate the function
around a point by the tangent to the function at the point. We can do something similar in case of
functions of several variables. Instead of approximation by a line (the tangent), we now approxi-
mate by the tangent hyperplane.
Definition 9.20. Given f : D → R with gradient ∇ f (x) at x0 , the tangent hyperplane to f at x0 is
given by
f (x) = f (x0 ) + ∇ f (x0 ) · (x − x0 ) .
9.10.2. Second Order Partial Derivatives. Let us look at the example above again. For
f (x1, x2) = x1³ + 2x1x2 + 3x2³,
∂f (x)/∂x1 = 3x1² + 2x2 and ∂f (x)/∂x2 = 2x1 + 9x2² are themselves differentiable functions of x1 and x2. When we take partial derivatives of these functions we get the second partial derivatives:
∂²f (x)/∂x1² = 6x1, ∂²f (x)/∂x2² = 18x2, ∂²f (x)/∂x1∂x2 = ∂²f (x)/∂x2∂x1 = 2.
This example can be generalized.
Definition 9.21. Let f : Rn → R be twice differentiable. For each of the n partial derivatives, we get n second-order partial derivatives,
(∂/∂xj)(∂f (x)/∂xi) = ∂²f (x)/∂xj∂xi = fij(x).
We organize the second-order derivatives in a matrix, called the Hessian Matrix:
(9.18) H f (x) = D²f (x) =
[ ∂²f (x)/∂x1²     · · ·   ∂²f (x)/∂xn∂x1 ]
[ ∂²f (x)/∂x1∂x2   · · ·   · · ·           ]
[ ⋮                 ⋱      ⋮               ]
[ ∂²f (x)/∂x1∂xn   · · ·   ∂²f (x)/∂xn²    ]
If all the partial derivatives of the first order exist and are continuous then f is called C 1 or contin-
uously differentiable. If all the partial derivatives of second order exist and are continuous then f
is called C 2 or twice continuously differentiable and so forth.
Theorem 9.15 (Young's Theorem). If f is twice continuously differentiable, then
∂²f (x)/∂xj∂xi = ∂²f (x)/∂xi∂xj,
i.e., the Hessian of f is a symmetric matrix.
Example 9.12. For the example above,
H f (x) =
[ 6x1   2    ]
[ 2     18x2 ]
The off-diagonal elements of the Hessian are also called cross-partials. For functions of one variable, H f (x) = f ′′(x).
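Young's Theorem can be spot-checked by finite differences on the example function (an illustrative sketch of mine, not from the notes); the numerical cross-partials agree with each other and with the closed forms derived above.

```python
# Finite-difference check of Young's theorem for
# f(x1, x2) = x1^3 + 2*x1*x2 + 3*x2^3: the cross-partials agree and match
# the closed forms 6*x1, 18*x2 and 2 derived in the text.

def f(x1, x2):
    return x1 ** 3 + 2 * x1 * x2 + 3 * x2 ** 3

def second_partial(i, j, x1, x2, h=1e-4):
    # Central-difference approximation of d^2 f / dx_i dx_j.
    def d(k, x1, x2):  # first partial in direction k
        if k == 1:
            return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
        return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    if i == 1:
        return (d(j, x1 + h, x2) - d(j, x1 - h, x2)) / (2 * h)
    return (d(j, x1, x2 + h) - d(j, x1, x2 - h)) / (2 * h)

x1, x2 = 1.0, 2.0
assert abs(second_partial(1, 1, x1, x2) - 6 * x1) < 1e-4
assert abs(second_partial(2, 2, x1, x2) - 18 * x2) < 1e-4
assert abs(second_partial(1, 2, x1, x2) - 2.0) < 1e-4
assert abs(second_partial(1, 2, x1, x2) - second_partial(2, 1, x1, x2)) < 1e-6
```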
Example 9.13. Let f : R³ → R be
f (x) = 5x1² + x1x2³ − x2²x3² + x3³.
Then
∇f (x) = [10x1 + x2³, 3x1x2² − 2x2x3², −2x2²x3 + 3x3²]
and
H f (x) =
[ 10      3x2²            0            ]
[ 3x2²    6x1x2 − 2x3²    −4x2x3       ]
[ 0       −4x2x3          −2x2² + 6x3  ]
We now provide three very useful theorems on continuous and differentiable functions on
convex sets in Rn for n ≥ 1. They are the Intermediate Value theorem, the Mean Value theorem
and Taylor’s theorem.
Theorem 9.16 (Intermediate Value Theorem). Suppose A is a convex subset of Rn, and f : A → R
is a continuous function on A. Suppose x1 and x2 are in A, and f (x1 ) > f (x2 ). Then given any
c ∈ R such that f (x1 ) > c > f (x2 ), there is 0 < θ < 1 such that f [θx1 + (1 − θ)x2 ] = c.
Example 9.14. Suppose X ≡ [a, b] is a closed interval in R (with a < b). Suppose f is a continuous
function on X. By Weierstrass theorem, there will exist x1 and x2 in X such that f (x1 ) ≥ f (x) ≥
f (x2 ) for all x ∈ X. If f (x1 ) = f (x2 ) [this is the trivial case], then f (x) = f (x1 ) for all x ∈ X, and
so f (X) is the single point, f (x1 ). If f (x1 ) > f (x2 ), then using the fact that X is a convex set, we
can conclude from the Intermediate Value Theorem that every value between f (x1 ) and f (x2 ) is
attained by the function f at some point in X. This shows that f (X) is itself a closed interval.
Theorem 9.17 (Mean Value Theorem). Suppose A is an open convex subset of Rn , and f : A → R
is continuously differentiable on A. Suppose x1 and x2 are in A. Then there is 0 ≤ θ ≤ 1 such that
f (x2 ) − f (x1 ) = (x2 − x1 )∇ f (θx1 + (1 − θ)x2 )
Example 9.15. Let f : R → R be a continuously differentiable function with the property that
f ′ (x) > 0 for all x ∈ R. Then given any x1 , x2 in R, with x2 > x1 we have by the Mean-Value
Theorem (since R is open and convex), the existence of 0 ≤ θ ≤ 1, such that
f (x2 ) − f (x1 ) = (x2 − x1 ) f ′ (θx1 + (1 − θ)x2 )
Now f ′ (θx1 + (1 − θ)x2 ) > 0 by assumption, and x2 > x1 by hypothesis. So f (x2 ) > f (x1 ). This
shows that f is an increasing function on R.
Observe that a function f : R → R can be increasing without satisfying f ′ (x) > 0 at all x ∈ R.
For example, f (x) = x3 is increasing on R, but f ′ (0) = 0.
Theorem 9.18 (Taylor’s Expansion up to Second-Order). Suppose A is an open, convex subset of
Rn , and f : A → R is twice continuously differentiable on A. Suppose x1 and x2 are in A. Then
there exists 0 ≤ θ ≤ 1, such that
f (x2) − f (x1) = (x2 − x1)′∇f (x1) + (1/2)(x2 − x1)′ H f (θx1 + (1 − θ)x2)(x2 − x1).
The “Chain Rule” of differentiation provides us with a formula for finding the partial deriva-
tives of a composite function, F, in terms of the partial derivatives of the individual functions, f
and h.
Theorem 9.19 (Chain Rule of differentiation). Let h : A → Rm be a function with component
functions hi : A → R(i = 1, · · · , m) which are continuously differentiable on an open set A ⊂ Rn .
Let f : B → R be a continuously differentiable function on an open set B ⊂ Rm which contains the
set h(A). If F : A → R is defined by F(x) = f [h(x)] on A, and a ∈ A, then F is differentiable at a
and we have, for i = 1, · · · , n,
Di F(a) = ∑_{j=1}^{m} Dj f (h1(a), · · · , hm(a)) · Di hj(a).
Example 9.16. Let m = 2, n = 1. Let h1(x) = x³ on R, and h2(x) = 10 + x on R; and let f (y1, y2) = y1 + y2⁴ on R². Then
F(x) = f [h(x)] = f [h1(x), h2(x)] = h1(x) + [h2(x)]⁴ = x³ + (10 + x)⁴
is a composite function on R. If a ∈ R,
F′(a) = D1F(a) = D1 f (h1(a), h2(a)) · D1h1(a) + D2 f (h1(a), h2(a)) · D1h2(a)
= 1 · (3a²) + 4(h2(a))³ · 1 = 3a² + 4(10 + a)³.
Example 9.17. Take m = 1, n = 2. Let h1(x) = h1(x1, x2) = x1² + x2 on R²; f (y) = 2y on R. Then F(x) = F(x1, x2) = f [h1(x1, x2)] = 2[x1² + x2]. Then if a ∈ R²,
D1F(a) = D1 f [h1(a1, a2)] D1h1(a1, a2),
D2F(a) = D1 f [h1(a1, a2)] D2h1(a1, a2).
Thus, D1F(a) = 2(2a1) = 4a1; and D2F(a) = 2(1) = 2.
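The chain-rule formula of Example 9.16 can be verified numerically (a small sketch added here, not from the notes), comparing the closed-form derivative with a central finite difference.

```python
# Finite-difference check of Example 9.16: F(x) = x^3 + (10 + x)^4 has
# F'(a) = 3a^2 + 4(10 + a)^3 by the chain rule.

def F(x):
    return x ** 3 + (10.0 + x) ** 4

def Fprime(a):
    return 3.0 * a ** 2 + 4.0 * (10.0 + a) ** 3

a, h = 2.0, 1e-6
numeric = (F(a + h) - F(a - h)) / (2 * h)
assert abs(numeric - Fprime(a)) < 1e-3
```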
Chapter 10
Problem Set 4
(3) Let f : R → R be
f (x) = x² − 1 if x ≤ 0, and f (x) = −x² if x > 0,
and g : R → R be
g(x) = 3x − 2 if x ≤ 2, and g(x) = −x + 6 if x > 2.
(a) Is f continuous at x = 0?
(b) Is g continuous at x = 2?
(4) Find
(10.3) lim_{x→0} f (x)/g(x) = lim_{x→0} [exp(x²) + exp(−x) − 2]/(2x).
(7) This exercise gives an example of a function with D12 f (x, y) ≠ D21 f (x, y). Let f (x, y) be defined as
f (x, y) = xy(x² − y²)/(x² + y²) if (x, y) ≠ (0, 0), and f (x, y) = 0 otherwise.
(a) We can prove that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in
(x, y) ∈ R2 and f is continuous on R2 .
(b) The partial derivatives D1 f (x, y) and D2 f (x, y) are continuous at every point in R2 .
(c) The second order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in
R2 and are continuous everywhere in R2 except at (0, 0).
(d) D21 f (0, 0) = +1 and D12 f (0, 0) = −1.
Chapter 11
Convex Analysis
For functions of a real variable, we say that a function is concave if and only if, informally, its slope is weakly decreasing. If the function is differentiable, then its derivative is weakly decreasing.
Theorem 11.1. Let X ⊂ R be an open interval. Then f : X → R is concave if and only if for any
a, b, c ∈ X with a < b < c
[f (b) − f (a)]/(b − a) ≥ [f (c) − f (b)]/(c − b), and
[f (c) − f (a)]/(c − a) ≥ [f (c) − f (b)]/(c − b).
Proof. First we assume that f is concave and show that the two inequalities hold. Note b − a > 0
and c − b > 0. Hence the first inequality holds if and only if
[ f (b) − f (a)](c − b) ≥ [ f (c) − f (b)](b − a).
Figure 11.1. A concave function of one variable: f ′(d) < [f (d) − f (c)]/(d − c) < f ′(c).
f (b) ≥ [(b − a)/(c − a)] f (c) + [1 − (b − a)/(c − a)] f (a),
with (b − a)/(c − a) ∈ (0, 1). This holds true since f is concave and b = [(b − a)/(c − a)] c + [1 − (b − a)/(c − a)] a.
To show that if the two inequalities hold then f is concave, we can take any a < c and any
λ ∈ (0, 1), and let b = λa + (1 − λ)c so that a < b < c holds.
Proof. Let the function f be concave. Let (x1, α1) ∈ C and (x2, α2) ∈ C. Then f (x1) ≥ α1 and f (x2) ≥ α2. Since f is concave, and x1, x2 ∈ A, for every λ ∈ [0, 1],
f [λx1 + (1 − λ)x2] ≥ λ f (x1) + (1 − λ) f (x2) ≥ λα1 + (1 − λ)α2,
which implies (λx1 + (1 − λ)x2, λα1 + (1 − λ)α2) ∈ C. Hence C is convex.
Next we assume C to be convex. Note that for x1, x2 ∈ A, we have (x1, f (x1)) ∈ C and (x2, f (x2)) ∈ C. Since C is convex, for every λ ∈ [0, 1],
λ · (x1, f (x1)) + (1 − λ) · (x2, f (x2)) ∈ C.
This implies
f (λx1 + (1 − λ)x2 ) ≥ λ · f (x1 ) + (1 − λ) · f (x2 ),
or f is concave.
In general, a concave function on a convex set in Rn need not be continuous as the following
example shows.
Figure: a concave function on a convex domain that fails to be continuous at the boundary point x = 0.
However, if the set A is open and convex, then the concave function f is continuous on A.
The following theorem can be proved using Theorem 11.1 for functions of a real variable.
Theorem 11.3. Let X ⊂ R be open and convex and let f : X → R be a concave function. Then f is continuous on X.
Proof. Assume f is concave. Theorem 11.1 implies that for any a, b, c ∈ X with a < b < c, the graph of the function f lies between the line through the points (a, f (a)) and (b, f (b)) and the line through the points (b, f (b)) and (c, f (c)). Thus for any x ∈ [a, b],
f (b) − [f (b) − f (a)]/(b − a) · (b − x) ≤ f (x) ≤ f (b) − [f (c) − f (b)]/(c − b) · (b − x),
and for any x ∈ [b, c],
f (b) + [f (b) − f (a)]/(b − a) · (x − b) ≥ f (x) ≥ f (b) + [f (c) − f (b)]/(c − b) · (x − b).
These two inequalities imply that f is continuous at b.
11.1. Concave, Convex Functions
If the function is continuously differentiable on an open convex set, then the following theorem characterizes the concave functions.
Theorem 11.4. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable on A. Then f is concave on A if and only if
(11.2) f (x2) − f (x1) ≤ ∇f (x1) · (x2 − x1) for all x1, x2 ∈ A.
Next we assume (11.2) holds true for all x2 , x1 ∈ A. Then for any λ ∈ [0, 1], let x = λx2 + (1 −
λ)x1 . Since A is convex, x ∈ A. Note
x2 − x = x2 − λx2 − (1 − λ)x1 = (1 − λ)(x2 − x1 ).
Also
x1 − x = x1 − λx2 − (1 − λ)x1 = −λ(x2 − x1 ).
Applying (11.2), we get
f (x2) − f (x) ≤ ∇f (x) · (x2 − x) = ∇f (x) · (1 − λ)(x2 − x1),
and
f (x1) − f (x) ≤ ∇f (x) · (x1 − x) = ∇f (x) · (−λ)(x2 − x1).
We multiply the first inequality by λ and the second inequality by (1 − λ) and add to obtain
λ · f (x2) + (1 − λ) · f (x1) − f (x) ≤ 0,
which implies
λ · f (x2) + (1 − λ) · f (x1) ≤ f (x) = f (λx2 + (1 − λ)x1).
So f is concave.
The function is strictly concave if the weak inequality is replaced by a strict inequality for distinct points.
Theorem 11.5. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable on A. Then f is strictly concave on A if and only if
f (x2) − f (x1) < ∇f (x1) · (x2 − x1) for all x1, x2 ∈ A with x1 ≠ x2.
Now we consider twice continuously differentiable functions. The following two theorems characterize concave and strictly concave functions.
Theorem 11.6. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously
differentiable on A. Then f is concave on A if and only if H f (x) is negative semi-definite for all
x ∈ A.
If H f (x) is negative definite whenever x ∈ A, then the function is strictly concave, but the
converse is not true.
Theorem 11.7. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously
differentiable on A. If H f (x) is negative definite for all x ∈ A then f is strictly concave on A.
The following example shows that the converse implication does not hold.
Example 11.2. Let f : R → R be defined by f (x) = −x4 for all x ∈ R (See Figure 2). This is a
twice continuously differentiable function on the open, convex set R. We can verify that f is strictly
concave on R, but since f ′′ (x) = −12x2 , f ′′ (0) = 0. This shows that the converse implication is not
valid.
Claim 11.1. If f : A → R is a function of one variable and is twice continuously differentiable, then
∀x ∈ A, f ′′(x) ≤ 0 ⇔ f is concave.
Definition 11.2. Function f : A → R is convex if ∀x, y ∈ A, ∀λ ∈ [0, 1],
(11.3) λ f (x) + (1 − λ) f (y) ≥ f [λx + (1 − λ)y].
Function f is strictly convex if the inequality is strict for all λ ∈ (0, 1) and x ≠ y.
Figure 2: graph of f (x) = −x⁴, a strictly concave function with f ′′(0) = 0.
Claim 11.2. If f : A → R is a function of one variable and is twice continuously differentiable, then ∀x ∈ A, f ′′(x) ≥ 0 ⇔ f is convex.
Note that a local maximum (minimum) of a concave (convex) function is a global maximum (minimum) as well.
Theorem 11.8. Let f : A → R (where A ⊆ Rn is open and convex) be twice continuously differentiable. Then f is convex on A if and only if H f (x) is positive semi-definite for all x ∈ A; if H f (x) is positive definite for all x ∈ A, then f is strictly convex on A.
Take f (x) = x⁴, with f ′′(x) = 12x². It is strictly convex everywhere, but f ′′(0) = 0. We would need f ′′(x) > 0 for all x ∈ A for the Hessian to be positive definite.
(b) If f (x) is concave (convex) and F(u) is concave (convex) and increasing, then U(x) = F(f (x)) is concave (convex).
(3) f (x) = 1/x is strictly convex on R++ and strictly concave on R−−.
Note that for functions of one variable, any monotone function is quasi-concave. This however does NOT apply to functions of more than one variable. Also, quasi-concave functions need not be concave. Take f (x) = x² on R+: it is monotone increasing, hence quasi-concave. But it is not concave; rather, it is convex. For functions of one variable, the following theorem characterizes the quasi-concave functions.
Theorem 11.12. A function f of a single variable is quasiconcave if and only if either (a) it is
non-decreasing, (b) it is non-increasing, or (c) there exists x∗ such that f is non-decreasing for
x < x∗ and non-increasing for x > x∗ .
Let Br(x) denote the submatrix consisting of the first (r + 1) rows and columns of the bordered Hessian B(x), i.e., Br(x) is an (r + 1) × (r + 1) matrix.
Condition 1. A necessary condition for f to be quasiconcave is that (−1)^r det(Br(x)) ≥ 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.
Condition 2. A sufficient condition for f to be quasiconcave is that (−1)^r det(Br(x)) > 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.
When we check for quasi-concavity, we have to check the sufficient conditions. We need
det [ 0  f1 ; f1  f11 ] < 0,
det [ 0  f1  f2 ; f1  f11  f12 ; f2  f21  f22 ] > 0, etc.
Remark 11.1. When we have to check whether a function is quasi-concave, start out checking
whether it is concave because it is easier to check for concavity and concavity implies quasi-
concavity.
Remark 11.2. Quasi-concavity is preserved under monotone transformation whereas concavity
need not be preserved.
Example 11.5. Let f (x, y) = √(xy) for (x, y) ∈ R²++. Then
H f (x, y) =
[ −(1/4)√(y/x³)    1/(4√(xy))     ]
[ 1/(4√(xy))       −(1/4)√(x/y³)  ]
The principal minors of order one are negative and the principal minor of order two is zero. Hence f is concave and so quasi-concave.
Let us take the monotone transformation g(x, y) = (f (x, y))⁴ = x²y², for (x, y) ∈ R²++. The bordered Hessian is
B(x, y) =
[ 0      2xy²   2x²y ]
[ 2xy²   2y²    4xy  ]
[ 2x²y   4xy    2x²  ]
det(B1(x, y)) = det [ 0  2xy² ; 2xy²  2y² ] = −4x²y⁴ < 0
⇒ (−1)¹ det(B1(x, y)) > 0, ∀(x, y) ∈ R²++.
det(B2(x, y)) = −2xy²(4x³y² − 8x³y²) + 2x²y(8x²y³ − 4x²y³)
= 8x⁴y⁴ + 8x⁴y⁴ = 16x⁴y⁴ > 0, ∀(x, y) ∈ R²++
⇒ g(x, y) is quasi-concave.
Note however, g (x, y) is not concave.
Hg(x, y) =
[ 2y²   4xy ]
[ 4xy   2x² ]
The principal minors of order one are strictly positive, and the principal minor of order two is −12x²y², which is strictly negative. Thus g(x, y) is not concave.
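The determinant formulas in Example 11.5 can be spot-checked numerically (an illustrative sketch of mine, not from the notes), along with a midpoint test confirming that g is not concave.

```python
# g(x, y) = x^2 * y^2 on R^2_++: the bordered-Hessian determinant signs
# computed in the text (det B1 = -4 x^2 y^4 < 0, det B2 = 16 x^4 y^4 > 0)
# can be spot-checked, and a midpoint test confirms g is not concave.

def g(x, y):
    return x * x * y * y

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x, y = 2.0, 3.0
B = [[0.0, 2 * x * y ** 2, 2 * x ** 2 * y],
     [2 * x * y ** 2, 2 * y ** 2, 4 * x * y],
     [2 * x ** 2 * y, 4 * x * y, 2 * x ** 2]]

assert det2([row[:2] for row in B[:2]]) == -4 * x ** 2 * y ** 4
assert det3(B) == 16 * x ** 4 * y ** 4

# Concavity would require g(midpoint) >= average of endpoint values; it fails.
p, q = (1.0, 1.0), (3.0, 3.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
assert g(*mid) < 0.5 * (g(*p) + g(*q))
```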
Chapter 12
Problem Set 5
(1) Prove or give a counterexample: The sum of two concave functions is concave.
(3) Suppose f : [0, 1] → R+ and g : [0, 1] → R+ are increasing, convex functions on [0, 1]. Define
the function h : [0, 1] → R+ by:
(a) f (x) = 3x + 4;
(b) g(x, y) = yex , y > 0;
(c) h(x, y) = −x2 y3 .
(5) Show using an example that the sum of two quasi-concave functions need not be quasi-concave
(in general).
This idea can be extended to the domains of the function, A, being subsets of Rn , with the
function f defined from A to R. Then f is one-to-one on A if for all x1 , x2 ∈ A, x1 ̸= x2 , we have
f (x1 ) ̸= f (x2 ). In this case, if there is a function g, from f (A) to A, such that g[ f (x)] = x for each
x ∈ A, then g is called the inverse function of f on f (A).
defined “locally” around f (a). The important restriction to carry out the kind of analysis noted
above is that f ′ (a) ̸= 0.
We note here that f ′(a) ≠ 0 is not a necessary condition for f to have a unique inverse function. For example, if f : R → R is defined by f (x) = x³, then f is continuously differentiable on R, with f ′(0) = 0. However, f is an increasing function, and clearly has a unique inverse function g(y) = y^{1/3} on R, and hence locally around f (0).
The following theorem deals with the existence and properties of inverse functions.
Theorem 13.1 (Inverse Function Theorem). Let A be an open set of Rn, and f : A → Rn be continuously differentiable on A. Suppose a ∈ A and the Jacobian determinant of f at a is non-zero. Then there is an open set X ⊂ A containing a, an open set Z ⊂ Rn containing f (a), and a unique function h : Z → X such that:
(i) f (X) = Z;
(ii) f is one-to-one on X;
(iii) h[f (x)] = x for every x ∈ X, and h is continuously differentiable on Z.
The following example shows that continuity of f ′ is needed in the inverse function theorem, even in the case n = 1.
Example 13.1. Let
f (t) = t + 2t² sin(1/t) for t ≠ 0, and f (0) = 0.
Then f ′(0) = 1 and f ′ is bounded in (−1, 1), but f is not one-to-one in any neighborhood of 0.
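Why f fails to be one-to-one can be seen from its derivative (a numerical sketch added here, not from the notes): f ′(t) = 1 + 4t sin(1/t) − 2 cos(1/t) takes both signs arbitrarily close to 0, so f is not monotone on any neighborhood of 0.

```python
import math

# f(t) = t + 2 t^2 sin(1/t) has f'(t) = 1 + 4 t sin(1/t) - 2 cos(1/t) for
# t != 0. At t_k = 1/(2*pi*k), cos(1/t) = 1 and sin(1/t) = 0, so f'(t_k) is
# near -1; at s_k = 1/((2k+1)*pi), cos(1/t) = -1, so f'(s_k) is near 3.
# Hence f' changes sign in every neighborhood of 0.

def fprime(t):
    return 1.0 + 4.0 * t * math.sin(1.0 / t) - 2.0 * math.cos(1.0 / t)

for k in [10, 100, 1000]:
    t_k = 1.0 / (2.0 * math.pi * k)
    s_k = 1.0 / ((2 * k + 1) * math.pi)
    assert fprime(t_k) < 0   # close to -1
    assert fprime(s_k) > 0   # close to 3
```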
13.2. The Linear Implicit Function Theorem
For the system of simultaneous linear equations Ax = b, we have seen earlier that there exists a unique solution for every choice of right hand side column vector b if and only if the rank of A is equal to the number of rows of A, which is equal to the number of columns of the matrix A.
In economic models, the vector b represents some externally determined (exogenous) parameters
while the linear equations constitute some equilibrium conditions which determine the vector x
which is the set of internal (endogenous) variables.
In this sense it is possible to divide the set of variables into two disjoint subsets of endogenous and exogenous variables. Thus a general linear economic model will have m equations in n unknowns:
a11 x1 + a12 x2 + · · · + a1n xn = b1
··· ··· ··· ··· ···
am1 x1 + am2 x2 + · · · + amn xn = bm
In general it will be possible to divide the set of variables into endogenous variables and exoge-
nous variables. Such a division will be useful only if after substituting the values of the exogenous
variables in the m equations, it is possible to obtain a solution of the system for the remaining en-
dogenous variables. For this two conditions must hold. The number of endogenous variables must
be equal to the number of equations m and the square matrix corresponding to the endogenous
variables must have maximal rank m.
A formal statement of the above observation is known as the linear version of Implicit Function
Theorem.
Exercise 13.1.
Find an explicit formula for the endogenous variables in terms of the exogenous variables.
Exercise 13.2.
It is possible to apply the quadratic formula to the implicit function xy² − 3y − 2 exp x = 0 to obtain an explicit function for y as
y = [3 ± √(9 + 8x exp x)] / (2x).
However, it could turn out that the explicit function is more difficult to work with than the original implicit function.
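The explicit roots can be verified against the implicit equation (a small check added here, not from the notes); the discriminant of the quadratic x y² − 3y − 2 exp x in y is 9 + 8x exp x.

```python
import math

# Spot check: y = (3 +/- sqrt(9 + 8*x*exp(x)))/(2*x) satisfies the implicit
# equation x*y^2 - 3*y - 2*exp(x) = 0 (here at x = 1).

def F(x, y):
    return x * y * y - 3.0 * y - 2.0 * math.exp(x)

x = 1.0
disc = math.sqrt(9.0 + 8.0 * x * math.exp(x))
y_plus = (3.0 + disc) / (2.0 * x)
y_minus = (3.0 - disc) / (2.0 * x)

assert abs(F(x, y_plus)) < 1e-9
assert abs(F(x, y_minus)) < 1e-9
```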
13.3. Implicit Function Theorem for R2
We consider implicit functions in R² of the form F(x, y) = c and analyze the following questions. For a given implicit function F(x, y) = c and a specified solution (x0, y0):
(a) Does F(x, y) = c determine y as a continuous function of x for points (x, y) such that x is near x0 and y is near y0?
(b) Does there exist a continuous function y = f (x) defined on an interval I around x0 so that:
(1) F(x, f (x)) = c for all x ∈ I, and
(2) y0 = f (x0)?
(c) If such an f exists and is differentiable, then
f ′(x0) = − [∂F(x, y)/∂x]|(x0,y0) / [∂F(x, y)/∂y]|(x0,y0).
Chapter 14
Homogeneous and Homothetic Functions
Most of us have come across homogeneous functions in the elementary algebra courses. For exam-
ple f (x) = ax is homogeneous of degree 1, f (x) = axm is homogeneous of degree m, f (x) = ax + 1
is not a homogeneous function, and so on. First we define the homogeneous function formally.
Definition 14.1. For any scalar k, a real valued function f (x1, · · · , xn) is homogeneous of degree k on Rn+ if for all x ∈ Rn+ and all t > 0,
(14.1) f (tx1, · · · , txn) = t^k f (x1, · · · , xn).
(a) Consider f : R²+ → R given by f (x1, x2) = x1²x2³. Then if t > 0, we have f (tx1, tx2) = (tx1)²(tx2)³ = t^{2+3} x1²x2³ = t⁵ f (x1, x2). So, f is homogeneous of degree 5.
(b) The function f (x1, x2) = x1^a x2^b is homogeneous of degree a + b. This function illustrates returns to scale based on the values of a and b, which we assume to be non-negative. If a + b = 1, the function displays constant returns to scale. If a + b > 1, the function displays increasing returns to scale. If a + b < 1, the function displays decreasing returns to scale.
(c) The Cobb-Douglas function f (x1, x2, · · · , xn) = x1^{a1} x2^{a2} · · · xn^{an} is homogeneous of degree a1 + a2 + · · · + an.
(d) The constant elasticity of substitution function f(x1, x2) = A (a1 x1^p + a2 x2^p)^(q/p) is homogeneous of degree q.
(e) The function f(x1, x2) = √(x1^3 + x2^3) is homogeneous of degree 3/2.
(f) The function f : R2+ → R given by f(x1, x2) = x1^2 x2 + 3x1 x2^2 + x2^3 is homogeneous of degree 3, since each term is homogeneous of degree 3.
(i) In consumer theory, the demand function is a homogeneous function of degree zero (in prices and income).
(j) The only function of one variable that is homogeneous of degree k is f(x) = a x^k for some constant a.
(k) The only function of one variable that is homogeneous of degree zero is the constant function f(x) = a for some constant a.
(l) In more than one variable, there exist non-constant functions that are homogeneous of degree zero. Consider for example f(x, y) = x/y, y ≠ 0.
(m) If functions f and g are homogeneous of degree k, then the sum function f + g is also homo-
geneous of degree k.
(n) The function f : R2+ → R given by f(x1, x2) = 3x1^2 x2^3 − 6x1^5 x2^2 is not homogeneous, since the first term is homogeneous of degree 5 but the second term is homogeneous of degree 7.
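A quick numerical sanity check for these examples: if f(tx) = t^k f(x) and f(x) > 0 at the test point, then k = log(f(tx)/f(x)) / log t. The helper below is our own sketch (the test point is an arbitrary choice); it recovers the degrees claimed in (a), (e) and (f):

```python
import math

def homogeneity_degree(f, x, t=2.0):
    """Recover k from f(t*x) = t^k f(x): k = log(f(tx)/f(x)) / log t.
    Assumes f(x) > 0 at the chosen test point."""
    return math.log(f([t * xi for xi in x]) / f(x)) / math.log(t)

fa = lambda x: x[0]**2 * x[1]**3                               # example (a): degree 5
fe = lambda x: math.sqrt(x[0]**3 + x[1]**3)                    # example (e): degree 3/2
ff = lambda x: x[0]**2 * x[1] + 3 * x[0] * x[1]**2 + x[1]**3   # example (f): degree 3

point = [1.3, 0.7]
print([round(homogeneity_degree(f, point), 6) for f in (fa, fe, ff)])
# [5.0, 1.5, 3.0]
```

Running the same check on example (n) would give different answers at different test points, which is another way to see that that function is not homogeneous.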
Let us look at the function f(x1, x2) = x1^a x2^b again. We can calculate the partial derivatives of f on R2++. Thus,
∂f(x1, x2)/∂x1 = a x1^(a−1) x2^b;  ∂f(x1, x2)/∂x2 = b x1^a x2^(b−1).
Now, if t > 0, then
∂f(tx1, tx2)/∂x1 = a (tx1)^(a−1) (tx2)^b = t^(a+b−1) a x1^(a−1) x2^b = t^(a+b−1) ∂f(x1, x2)/∂x1.
So ∂f(x1, x2)/∂x1 is homogeneous of degree (a + b − 1). Similarly, one can check that ∂f(x1, x2)/∂x2 is homogeneous of degree (a + b − 1). More generally, whenever a function f is homogeneous of degree k, its partial derivatives are homogeneous of degree (k − 1).
The partial derivative with respect to x1 of the function on the right hand side of (14.1) is t^k ∂f(x1, · · · , xn)/∂x1. Equality of the two expressions leads to
(14.3) D1 f(tx1, · · · , txn) · t = t^k ∂f(x1, · · · , xn)/∂x1.
Dividing by t, we get
(14.4) D1 f(tx1, · · · , txn) = t^(k−1) ∂f(x1, · · · , xn)/∂x1.
Thus the partial derivatives are homogeneous functions of degree k − 1.
Theorem 14.2 (Euler's Theorem). Suppose f : Rn+ → R is homogeneous of degree k on Rn+ and continuously differentiable on Rn++. Then,
x1 · ∂f(x1, · · · , xn)/∂x1 + · · · + xn · ∂f(x1, · · · , xn)/∂xn = k f(x),
i.e., x · ∇f(x) = k f(x) for all x ∈ Rn++.
f(tx) = t^k f(x1, · · · , xn),
and, differentiating both sides with respect to t,
(14.6) d f(tx)/dt = x · ∇f(tx) = k t^(k−1) f(x1, · · · , xn).
Take t = 1 to complete the proof.
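Euler's identity x · ∇f(x) = k f(x) can be spot-checked numerically. A minimal sketch (the test function, the point, and the finite-difference gradient are our own choices, not from the notes):

```python
def gradient(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# f(x1, x2) = x1^2 x2^3 is homogeneous of degree k = 5.
f = lambda x: x[0]**2 * x[1]**3
x = [1.5, 0.8]
lhs = sum(xi * gi for xi, gi in zip(x, gradient(f, x)))  # x . grad f(x)
rhs = 5 * f(x)                                           # k f(x)
print(abs(lhs - rhs) < 1e-4)  # True
```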
Theorem 14.3 (Converse of Euler's Theorem). Suppose f : Rn+ → R is a continuous function on Rn+ and continuously differentiable on Rn++. Also suppose
x1 · ∂f(x1, · · · , xn)/∂x1 + · · · + xn · ∂f(x1, · · · , xn)/∂xn = k f(x)
for all x ∈ Rn++. Then f is homogeneous of degree k.
A useful geometric property of homogeneous functions is as follows. Let f(x) be a homogeneous function of degree one and consider the level set f(x) = 1. In producer theory, the function f could be a constant returns to scale production function, and the level sets would then be the isoquants. Let x be a point on the isoquant f(x) = 1. If we scale the point x by a factor r along the ray joining x and the origin, we obtain a point z = rx on the isoquant f(z) = r.
Similarly, if the function f is homogeneous of degree k, then scaling points on the isoquant q = 1 by a factor r along the ray joining each point and the origin generates the isoquant q = r^k, since f(rx) = r^k f(x) = r^k when f(x) = 1. Thus the level sets of a homogeneous function are radial expansions and contractions of each other. This observation leads to the following consequence.
The converse of this theorem is also true and is stated here for the sake of completeness.
Theorem 14.7. Suppose f : Rn+ → R is continuously differentiable on Rn++ . If (14.7) holds for all
x in Rn++ , for every i and j and for all t > 0, then f is homothetic.
Chapter 15

Separating Hyperplane Theorem
A hyperplane in Rn is a set of the form [p = a] := {x ∈ Rn : p · x = a}, where p ≠ 0. We can visualize the vector p ∈ Rn as a vector normal (orthogonal) to the hyperplane at each point. The hyperplane does not change when we multiply both the vector p and the real number a by a non-zero scalar α.
Example 15.1. Consider p = (1, 2) ∈ R2 and a = 4. Then the hyperplane [p = a] is the set of points in R2 on the straight line x1 + 2x2 = 4. For p = (1, 2, 3) ∈ R3 and a = 6, the hyperplane [p = a] is the set of points in R3 on the plane x1 + 2x2 + 3x3 = 6. For p = (4) ∈ R1 and a = 8, the hyperplane [p = a] is the singleton {2} in R1, since 4x1 = 8 gives x1 = 2.
A weak half space or closed half space is a set of the form [p ≥ α] := {x ∈ Rn : p · x ≥ α} or [p ≤ α]. A strict half space or open half space is a set of the form [p > α] or [p < α]. We say that a non-zero p, or the
hyperplane [p = α] separates A and B if either
A ⊂ [p ≥ α], and B ⊂ [p ≤ α],
or
B ⊂ [p ≥ α], and A ⊂ [p ≤ α]
holds. We will write p · A ≥ p · B to mean p · x ≥ p · y for all x ∈ A and y ∈ B.
We state and prove one of the versions of the separating hyperplane theorems.
Theorem 15.1. Let A and B be disjoint non-empty convex subsets of Rn. Let A be compact and B be closed. Then there exists a non-zero p ∈ Rn that strongly separates A and B.
Further (here f(x) = inf_{y∈B} d(x, y) denotes the distance from x to the set B),
f(x) ≤ d(x, y) ≤ d(x, x′) + d(x′, y)
for all y ∈ B. Consider a sequence {yn} ⊆ B such that d(x′, yn) → f(x′); then
f(x) ≤ d(x, x′) + f(x′).
Similarly,
f(x′) ≤ d(x′, y) ≤ d(x, x′) + d(x, y)
for all y ∈ B. Consider again a sequence {yn} ⊆ B such that d(x, yn) → f(x); then
f(x′) ≤ d(x, x′) + f(x).
Thus,
−d(x, x′) ≤ f(x) − f(x′) ≤ d(x, x′),
or
|f(x) − f(x′)| ≤ d(x, x′).
Thus f(x) is a continuous function on A.
Since A is a compact subset of Rn and f(x) is continuous, by the Weierstrass Theorem there exists x̄ ∈ A at which f attains its minimum, i.e.,
f(x̄) ≤ f(x), for all x ∈ A.
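The bound |f(x) − f(x′)| ≤ d(x, x′) says the distance function is 1-Lipschitz, which is exactly why it is continuous. A numerical spot-check on a sampled set B (the particular B, a circle, and the random test points are our own assumptions):

```python
import math
import random

def dist_to_set(x, B):
    """Distance from point x to the (finite, sampled) set B."""
    return min(math.dist(x, y) for y in B)

# Sample B: a circle of radius 1 centred at (4, 0).
B = [(4 + math.cos(t / 100), math.sin(t / 100)) for t in range(629)]

random.seed(0)
ok = True
for _ in range(200):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    xp = (random.uniform(-2, 2), random.uniform(-2, 2))
    # 1-Lipschitz property: |f(x) - f(x')| <= d(x, x')
    ok = ok and abs(dist_to_set(x, B) - dist_to_set(xp, B)) <= math.dist(x, xp) + 1e-12
print(ok)  # True
```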
Chapter 16
Problem Set 6
(3) Show that the equation x^2 − xy^3 + y^5 = 19 determines y as an implicit function of x in a neighborhood of (x, y) = (5, 2). Then estimate the value of y which corresponds to x = 4.9.
(5) Consider the profit maximizing firm described in Example 13.2. If p increases by ∆p and w increases by ∆w, what will be the change in the optimal input amount x?
(6) Consider 3x^2 yz + xyz^2 = 96 as defining x as an implicit function of y and z around the point x = 2, y = 3, z = 2.
(a) If y increases to 3.1 and z remains the same at 2, use the Implicit Function Theorem to estimate the corresponding x.
(b) Use the quadratic formula to solve 3x^2 yz + xyz^2 = 96 for x as an explicit function of y and z.
(c) Use the approximation by differentials on the explicit formula to estimate x when y = 3.1
and z = 2.
(d) Which of the two methods is easier?
(8) Let f : Rn+ → R be a non-decreasing, quasi-concave and homogeneous of degree one function.
Show that f must be concave on Rn+ .
(9) Let f be a continuous function from Rn+ to R which is twice continuously differentiable on Rn++. Suppose f is homogeneous of degree m, where m is a positive integer ≥ 2. Show that
x′ H f(x) x = m(m − 1) f(x)
for all x ∈ Rn++, where H f(x) is the Hessian of f evaluated at x.
Chapter 17

Unconstrained Optimization
We call
(17.1) max f(x), x ∈ D ⊆ Rn,
or
(17.2) min f(x), x ∈ D ⊆ Rn,
where the domain D is an open set, unconstrained optimization problems. There are no restrictions on x within the domain. Furthermore, there are no boundary solutions, because the domain does not include its boundary (recall the definition of an open set). Note that max f(x), x ∈ Rn, or min f(x), x ∈ Rn, are unconstrained optimization problems, since Rn is an open set. While solving unconstrained optimization problems, we want to use the tools we developed earlier, i.e., find points where ∇f(x) = 0 and investigate the curvature / shape of the function.
Remark 17.1. An unconstrained optimization problem may not have a solution.
Example 17.1. Let f(x) = x^2. Then,
(17.3) max f(x), x ∈ R
does not have a solution. See the graph of f(x) = x^2.
[Figure: graph of f(x) = x^2, unbounded above on R.]
Remark 17.2. A minimization problem can always be turned into a maximization problem and vice versa:
(17.4) min_{x∈D} f(x) ⇔ max_{x∈D} [−f(x)].
We will see several examples of unconstrained optimization in these notes. Also there are
additional exercises in the problem set.
Theorem 17.1. First order necessary condition for local maxima / minima: Let A be an open set in Rn, and let f : A → R be a continuously differentiable function on A. If f has a local maximum / minimum at x∗, then
∇f(x∗) = 0,
where 0 is the n × 1 null vector.
Remark 17.3. The converse is not true.
Theorem 17.2. Second order necessary condition for local maxima / minima: Let A be an open set in Rn, and let f : A → R be a twice continuously differentiable function on A. If f has a local maximum (minimum) at x∗ ∈ A, then ∇f(x∗) = 0 and the Hessian H f(x∗) is negative (positive) semi-definite.
The first order and second order necessary conditions are useful tools to help us in ruling out
the points where a local maximum or local minimum cannot occur. This narrows down our search
for points where a local maximum or local minimum does occur. Examples below explain this
further.
It is easy to see that the necessary first and second order conditions are not sufficient.
Example 17.4. Let X = R be the domain and f(x) = x^3 − x^4. Then df(x)/dx = 3x^2 − 4x^3 and d^2 f(x)/dx^2 = 6x − 12x^2 are both 0 at x = 0. But x = 0 is not a local maximizer for f(x).
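This failure is easy to confirm numerically: both derivatives vanish at x = 0, yet f takes larger values just to the right of 0 and smaller values just to the left (a quick sketch, our own check):

```python
f = lambda x: x**3 - x**4
df = lambda x: 3 * x**2 - 4 * x**3    # first derivative
d2f = lambda x: 6 * x - 12 * x**2     # second derivative

# Both necessary conditions hold at x = 0 ...
print(df(0.0), d2f(0.0))  # 0.0 0.0

# ... but 0 is neither a local max nor a local min:
eps = 1e-3
print(f(eps) > f(0.0) > f(-eps))  # True
```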
Theorem 17.3. Sufficient conditions for local maxima / minima: Let A be an open set in Rn , and
let f : A → R be a twice continuously differentiable function on A.
(a) If x∗ ∈ A is such that H f (x∗ ) is negative definite and ∇ f (x∗ ) = 0 then f has local maximum
at x∗ .
(b) If x∗ ∈ A is such that H f (x∗ ) is positive definite and ∇ f (x∗ ) = 0 then f has local minimum at
x∗ .
It should be noted that the sufficient condition in Theorem 17.3 cannot be weakened to the
necessary condition in the statement of Theorem 17.2. The following example explains this point.
Example 17.5. Let f : R → R be given by f(x) = x^3 for all x ∈ R. Then A = R is an open set, and f is a twice continuously differentiable function on A. At x∗ = 0,
f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0,
so first order necessary condition and second order necessary condition are satisfied. But x∗ is
clearly not a point of local maximum of f since f is an increasing function on A.
It may also be observed that the second order necessary condition in Theorem 17.2 cannot be
strengthened to the sufficient condition in the statement of Theorem 17.3. The following example
illustrates this point.
Example 17.6. Let f : R → R be given by f(x) = −x^4 for all x ∈ R. Then A = R is an open set, and f is a twice continuously differentiable function on R. Clearly, x∗ = 0 is a point of local maximum of f, since f(0) = 0, while f(x) < 0 for all x ≠ 0. We can calculate that
f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0.
Thus the first order necessary condition (in Theorem 17.1) and the second order necessary condition (in Theorem 17.2) are satisfied, but the second order sufficient condition (in Theorem 17.3) is violated.
The above discussion shows that the second-order necessary conditions for a local maximum
are different from (weaker than) the second-order sufficient conditions for a local maximum. This
demonstrates the fact that, in general, the first and second derivatives of a function at a point do not
capture all aspects relevant to the occurrence of a local maximum of the function at that point.
Theorem 17.4. Concavity (convexity) and global maxima (minima): Let A be an open and con-
vex set in Rn , and let f : A → R be a continuously differentiable function on A.
(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is concave on A, then f has global maximum at x∗ .
(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is convex on A, then f has global minimum at x∗ .
This is very easy to show. Note that concavity along with continuous differentiability of f implies that for all x ∈ A,
f(x) − f(x∗) ≤ ∇f(x∗) · (x − x∗).
So f(x) − f(x∗) ≤ 0, i.e., x∗ is a point of global maximum of f on A.
Theorem 17.5. Let A be an open and convex set in Rn , and let f : A → R be a twice continuously
differentiable function on A.
(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is negative semi-definite for all x ∈ A, then f
has global maximum at x∗ .
(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is positive semi-definite for all x ∈ A, then f has
global minimum at x∗ .
It is worth noting that Theorem 17.4 or Theorem 17.5 might be applicable in cases where Theorem 17.3 is not applicable, as the following example shows.
Example 17.7. Let f : R → R be given by f(x) = −x^4. Here, we note that f′(0) = 0 and f′′(x) = −12x^2 ≤ 0 for all x ∈ R. Thus we can apply Theorem 17.4 or Theorem 17.5 and conclude that x = 0 is a point of global maximum, and hence also a point of local maximum. But the conclusion that x = 0 is a point of local maximum cannot be derived from Theorem 17.3, since f′′(0) = 0.
Now we explain the steps in applying these theorems via several examples.
Example 17.8. Consider X = R2+ and f(x) = x1 x2 − 2x1^4 − x2^2. The optimization exercise is to maximize the objective function f(x) by choosing x ∈ X. The two first order conditions are
x2 − 8x1^3 = 0, and x1 − 2x2 = 0.
Solving the second equation for x1, we have x1 = 2x2. Substituting this into the first equation, we have x2 − 64x2^3 = 0, which has three solutions:
x2 = 0, 1/8, and −1/8.
Then the first order conditions have three solutions,
(x1, x2) = (0, 0), (1/4, 1/8), and (−1/4, −1/8),
but the last of these is not in the domain of f, and the first is on the boundary of the domain, giving f(0, 0) = 0. Thus, we have a unique solution in the interior of the domain:
(x1∗, x2∗) = (1/4, 1/8).
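One can confirm that (1/4, 1/8) satisfies both first order conditions exactly and dominates nearby points of the domain (a brute-force sketch; the grid radius is an arbitrary choice of ours):

```python
f = lambda x1, x2: x1 * x2 - 2 * x1**4 - x2**2

x1s, x2s = 0.25, 0.125
# First order conditions from the text:
foc1 = x2s - 8 * x1s**3   # df/dx1 = x2 - 8 x1^3
foc2 = x1s - 2 * x2s      # df/dx2 = x1 - 2 x2
print(foc1, foc2)  # 0.0 0.0

# (1/4, 1/8) beats every nearby grid point:
vals = [f(x1s + i * 1e-3, x2s + j * 1e-3)
        for i in range(-5, 6) for j in range(-5, 6) if (i, j) != (0, 0)]
print(all(v < f(x1s, x2s) for v in vals))  # True
```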
The only solution is (x, y, z) = (0, 0, 0). So we have one candidate for local maximum or
minimum.
Step 2 Compute H f:
H f(x, y, z) =
[ 2 2 2 ]
[ 2 4 0 ]
[ 2 0 6 ].
Note that in this example H f is independent of (x, y, z), so whatever definiteness property H f has will hold globally.
Step 3 Determine the curvature. Begin by computing the leading principal minors:
D1 = 2 > 0, D2 = 2 · 4 − 2 · 2 = 4 > 0, and
D3 = 2(24 − 0) − 2(12 − 0) + 2(0 − 8) = 48 − 24 − 16 = 8 > 0.
All leading principal minors are strictly positive, so H f is positive definite for all (x, y, z), including (0, 0, 0), which implies that f is strictly convex.
Step 4 Conclude, using Theorem 17.4, that we have a global minimum at (0, 0, 0).
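The leading-principal-minor computation in Step 3 can be mechanized. The helper below is our own sketch, using Laplace expansion along the first row:

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1)**j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_principal_minors(H):
    """D_1, ..., D_n: determinants of the top-left k x k submatrices of H."""
    return [det([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]

H = [[2, 2, 2],
     [2, 4, 0],
     [2, 0, 6]]
minors = leading_principal_minors(H)
print(minors)                      # [2, 4, 8]
print(all(d > 0 for d in minors))  # positive definite: True
```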
Example 17.10. Let us find maxima / minima for f : R2 → R,
f(x, y) = −x^3 + xy − y^3.
Step 1 The first order conditions are −3x^2 + y = 0 and x − 3y^2 = 0, with solutions (x, y) = (1/3, 1/3) and (x, y) = (0, 0).
Step 2 Compute H f:
H f(x, y) =
[ −6x  1 ]
[ 1  −6y ],
so
H f(1/3, 1/3) =
[ −2  1 ]
[ 1  −2 ]
and
H f(0, 0) =
[ 0  1 ]
[ 1  0 ].
Step 3 Determine the curvature. For (1/3, 1/3), the leading principal minors are
D1 = −2 < 0, D2 = 3 > 0 ⇔ H f(1/3, 1/3) is negative definite.
For (0, 0), the first order principal minors are 0 and 0, and D2 = −1 < 0 ⇒ H f(0, 0) is neither negative semi-definite nor positive semi-definite.
Step 4 Then Theorem 17.3 on second order sufficient conditions applies, and we have a strict local maximum at (1/3, 1/3). The contrapositive of the second order necessary conditions (Theorem 17.2) shows that (0, 0) is neither a point of local maximum nor of local minimum. It is a saddle point.
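The 2 × 2 definiteness test used in Steps 3 and 4 can be packaged as a small helper (our own sketch; the string labels follow the minor tests stated in the text):

```python
def classify_2x2(H):
    """Classify a symmetric 2x2 matrix via its principal minors:
    D1 = H[0][0], D2 = det(H)."""
    d1 = H[0][0]
    d2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if d1 < 0 and d2 > 0:
        return "negative definite"
    if d1 > 0 and d2 > 0:
        return "positive definite"
    if d2 < 0:
        return "indefinite"
    return "inconclusive"

print(classify_2x2([[-2, 1], [1, -2]]))  # at (1/3, 1/3): negative definite
print(classify_2x2([[0, 1], [1, 0]]))    # at (0, 0): indefinite
```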
Example 17.11. Let us find maxima / minima for f : R2 → R,
f(x, y) = 2x^3 + xy^2 + 5x^2 + y^2.
Step 1 The first order conditions are 6x^2 + y^2 + 10x = 0 and 2y(x + 1) = 0, with solutions (x, y) = (0, 0), (−5/3, 0), (−1, 2) and (−1, −2).
Step 2 Compute H f:
H f(x, y) =
[ 12x + 10  2y ]
[ 2y  2x + 2 ].
Step 3
H f(0, 0) =
[ 10  0 ]
[ 0  2 ],  D1 = 10 > 0, D2 = 20 > 0
⇒ H f(0, 0) is positive definite.
H f(−1, 2) =
[ −2  4 ]
[ 4  0 ],  first order principal minors −2 and 0, D2 = −16 < 0
⇒ H f(−1, 2) is neither positive semi-definite nor negative semi-definite.
H f(−1, −2) =
[ −2  −4 ]
[ −4  0 ],  first order principal minors −2 and 0, D2 = −16 < 0
⇒ H f(−1, −2) is neither positive semi-definite nor negative semi-definite.
H f(−5/3, 0) =
[ −10  0 ]
[ 0  −4/3 ],  D1 = −10 < 0, D2 = 40/3 > 0
⇒ H f(−5/3, 0) is negative definite.
Step 4 Then Theorem 17.3 on sufficient conditions applies at (0, 0) and (−5/3, 0): we have a strict local minimum at (0, 0) and a strict local maximum at (−5/3, 0). The contrapositive of the second order necessary conditions (Theorem 17.2) implies that neither a local maximum nor a local minimum exists at (−1, 2) or (−1, −2). They are saddle points.
(17.5) Σ_{i=1}^{n} [f(xi) − yi]^2
is minimized. Thus the coefficients are such that the sum of the squares of the residuals (the error terms, i.e., the differences between the estimates and the actual observations) is minimized.
Define f : R2 → R by
f(a, b) = −Σ_{i=1}^{n} [a xi + b − yi]^2.
The principal minors of order one of the Hessian are f11 = −2 Σ_{i=1}^{n} xi^2 < 0 and f22 = −2n < 0. We need to check that the principal minor of order two (the determinant) is non-negative. The determinant of the Hessian of f is
det(H f(a, b)) = 4n Σ_{i=1}^{n} xi^2 − 4 [Σ_{i=1}^{n} xi]^2.
Recall the Cauchy-Schwarz inequality:
|x · y| ≤ ∥x∥ · ∥y∥.
Taking the vector x = (x1, · · · , xn) and the vector of ones u = (1, · · · , 1), the inequality gives
|x · u| ≤ ∥x∥ · ∥u∥,
|x · u|^2 ≤ ∥x∥^2 · ∥u∥^2,
[Σ_{i=1}^{n} xi]^2 ≤ [Σ_{i=1}^{n} xi^2] · n.
Therefore, det(H f(a, b)) ≥ 0. Since f11(a, b) ≤ 0, f22(a, b) ≤ 0, and det(H f(a, b)) ≥ 0, H f(a, b) is negative semi-definite. Consequently, if (a∗, b∗) satisfies the first-order conditions, then (a∗, b∗) is a point of global maximum of f, i.e., the corresponding line minimizes the sum of squared residuals.
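The inequality [Σ xi]^2 ≤ n Σ xi^2, which makes det(H f(a, b)) ≥ 0, can be spot-checked on random data (our own quick check; the sample sizes and ranges are arbitrary):

```python
import random

random.seed(1)
for _ in range(1000):
    n = random.randint(1, 10)
    xs = [random.uniform(-5.0, 5.0) for _ in range(n)]
    s = sum(xs)
    sq = sum(x * x for x in xs)
    # Cauchy-Schwarz with u = (1, ..., 1): (sum xi)^2 <= n * sum xi^2
    assert s * s <= n * sq + 1e-9

print("det(Hf) = 4n*sum(xi^2) - 4*(sum xi)^2 >= 0 on all random samples")
```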
In the next exercise, we provide an alternative proof that the determinant of the Hessian is non-negative.
Next, we assume that the claim holds for some k ∈ N and show that it holds for n = k + 1. That is, suppose
(x1 + · · · + xk)^2 ≤ k (x1^2 + · · · + xk^2).
Then
(x1 + · · · + xk + xk+1)^2 = (x1 + · · · + xk)^2 + 2(x1 + · · · + xk) xk+1 + xk+1^2.
Chapter 18

Problem Set 7
(3) A monopolist producing a single output has two types of buyers. If it produces Q1 units for
buyers of type 1, then the buyers are willing to pay a price of 100 − 5Q1 dollars per unit. If it
produces Q2 units for buyers of type 2, then the buyers are willing to pay a price of 50 − 10Q2
dollars per unit. The monopolist's cost of producing Q units of output is 50 + 10Q. How many units should the monopolist produce to maximize profit?
(4) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of w and r for its labor (L) and capital (K) inputs, and operates with the production function Q = L^a K^b.
(a) Write profits as a function of L, and K. Derive the first order conditions. Provide an eco-
nomic interpretation of the first order conditions.
(b) Solve for the optimal levels of L, and K.
(c) Check the second order conditions. What restrictions on the values of a and b are necessary for a profit maximum? Provide an economic interpretation of these restrictions.
(d) Find the signs of the partial derivatives of L with respect to P, w, and r.
(e) Derive the firm’s long run supply curve, i.e., Q as a function of the exogenous parameters.
Find the elasticities of supply with respect to w, r, and P. Do these elasticities sum to zero?
Provide an economic explanation for this fact.
(5) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of w, v, and r for its labor (L), natural resource (R) and capital (K) inputs, and operates with the production function Q = A L^a K^b + ln R.
(a) Write profits as a function of L, R and K. Derive the first order conditions. Provide an
economic interpretation of the first order conditions.
Now take A = 3, a = b = 1/3 for the remainder of the problem.
(b) Check the second order conditions.
(c) [Optional] Solve for L∗. Find the change in L∗ for a change in r when all other parameters are constant by taking the partial derivative of L∗ with respect to r.
(d) [Optional] Find the change in L∗ for a change in v when all other parameters are constant by taking the partial derivative of L∗ with respect to v.
(e) [Optional] It is also possible to determine the changes in L∗ when r or v change, without explicitly solving for L∗, by using the Implicit Function Theorem. You might like to use a more general version of the Implicit Function Theorem (than what we stated in class) to complete this exercise.
(i) Find the change in L for a change in r when all other parameters are constant.
(ii) Find the change in L for a change in v when all other parameters are constant.
Chapter 19

Optimization Theory: Equality Constraints
The optimization problems we encounter in economics are, in general, constrained problems, where there are some restrictions on the set from which we can choose x. Some examples of constrained optimization problems are:
Consumer Theory
(19.1) max_x u(x) subject to x ∈ B(p, I),
where B(p, I) is the budget set at prices p and income I.
Producer Theory
(19.2) max_{y,x} p y − w · x subject to (y, x) ∈ Y,
where
Y = {(y, x) ∈ R × Rn | y ≤ f(x)}
is the production possibility set, with f(x) being the production function (one output, many inputs).
We will work with the maximization problem, as it is easy to turn a minimization problem into a maximization problem. A constrained maximization problem has the following form:
max_x f(x) subject to x ∈ G(x),
where f(x) is called the objective function, x is called the choice variable, and G(x) is called the constraint set.
We assume the objective function to be C2 so that we can use differential calculus techniques.
(19.3) max_x f(x) subject to x ∈ [a, b].
Does a solution exist? Note f is continuous (because it is C2) and [a, b] is a non-empty compact set. We can use the Weierstrass Theorem to show the existence of a maximum and a minimum. Having shown existence, there are two possibilities:
(a) The solution is interior, x∗ ∈ (a, b). Then x∗ must also be a local maximum, i.e.,
(19.5) f′(x∗) = 0 ∧ f′′(x∗) ≤ 0.
(b) The solution is at a boundary point, x∗ ∈ {a, b}.
In general, constrained optimization problems are of two categories, (a) with equality con-
straint and (b) with inequality constraint. We discuss them next.
Note that g(x) = (g1(x), · · · , gk(x)) is a k-dimensional row vector. The interesting case is k < n, as the following example shows.
Definition 19.1. A point x∗ ∈ G(x) is a point of local maximum of f subject to the constraint g(x) = 0 if there is δ > 0 such that x ∈ G(x) ∩ B(x∗, δ) implies f(x) ≤ f(x∗).
Definition 19.2. A point x∗ ∈ G(x) is a point of global maximum of f subject to the constraint g(x) = 0 if x∗ solves the problem
max f (x)
subject to g (x) = 0.
Theorem 19.1 (Necessary condition for a constrained local maximum; Lagrange Theorem). Let A ⊆ Rn be open and f : A → R, g : A → Rk be C1 functions. Suppose x∗ is a point of local maximum of f subject to the constraint g(x) = 0. Suppose further that ∇g(x∗) ≠ 0. Then there is λ∗ ∈ Rk such that
(19.7) ∇f(x∗) = λ∗ ∇g(x∗).
Remark 19.1. The condition ∇g(x∗) ≠ 0 is called the constraint qualification.
It is important to check the constraint qualification condition ∇g(x∗) ≠ 0 before applying the conclusion of Lagrange's theorem. Without this condition, the conclusion of Lagrange's theorem need not be valid, as the following example shows.
Example 19.5.
Let f : R2 → R be given by
f (x1 , x2 ) = 4x1 + 3x2 for all (x1 , x2 ) ∈ R2 ;
and let g : R2 → R be given by
g(x1, x2) = x1^2 + x2^2.
Consider the constraint set C = {(x1, x2) ∈ R2 : g(x1, x2) = 0}. The only element of this set is (0, 0), so (x1∗, x2∗) = (0, 0) is a point of local maximum of f subject to the constraint g(x) = 0. Observe that
the conclusion of Lagrange’s theorem does not hold here. For, if it did, there would exist λ∗ ∈ R
such that
∇ f (0, 0) = λ∗ ∇g(0, 0)
But this means that
(4, 3) = λ∗ (0, 0)
which is a contradiction. The problem here is that
∇g(x1∗ , x2∗ ) = ∇g(0, 0) = (0, 0),
so the constraint qualification condition is violated.
In the next Theorem, we use the notation C to denote the constraint set, i.e.,
C = {x ∈ Rn : g(x) = 0}.
Theorem 19.2 (Sufficient Conditions for a Global Maximum). Let A ⊆ Rn be an open convex set and f : A → R, g : A → Rk be C1 functions. Suppose (x∗, λ∗) ∈ C × Rk satisfies
(19.8) ∇f(x∗) = λ∗ ∇g(x∗).
If L(x, λ∗) = f(x) − λ∗ · g(x) is concave in x on A, then x∗ is a point of global maximum of f subject to the constraint g(x) = 0.
We use the following steps to solve the optimization problem with equality constraint. Let f
and gi , i = 1, · · · , k, be C 1 functions.
Necessity Route:
Step 1 Existence of solution can be shown by using Weierstrass Theorem. For this we need to
show that the constraint set is closed and bounded.
Step 2 Form the Lagrangian L(x, λ) = f(x) − λ · g(x).
Step 3 Take the partial derivatives with respect to each variable x1, · · · , xn and each Lagrange multiplier λ1, · · · , λk.
Step 5 Let
M = {(x, λ) ∈ Rn+k | x satisfies gi(x) = 0, i = 1, · · · , k, and the FOCs hold}.
Verify that ∇g(x∗) ≠ 0 holds at each point in the set M. Then evaluate f at each (x, λ) ∈ M and find the maximum.
Sufficiency Route: We know that if f and λ1 g1(x), · · · , λk gk(x) are such that L(x, λ) is concave, then the FOCs are sufficient for a maximum. Hence if we can show concavity, then any point satisfying the FOCs will be a solution. We illustrate the use of the two routes through the following examples.
Remark 19.2. Note that if f is not concave, we have to compare the points in M.
Example 19.6.
max_{x∈R2+} f(x1, x2) = −x1^2 − x2^2
subject to 5x1 + 10x2 = 10.
The constraint set consists of the points with x2 = 1 − 0.5x1 and non-negative values of x1 and x2. To get the constraint in g(x) = 0 form, we rearrange it as
5x1 + 10x2 − 10 = 0.
The constraint set is closed: take any convergent sequence {xn} ⊆ G(x) with xn → x̄. Since 5x1n + 10x2n − 10 = 0, x1n ≥ 0, x2n ≥ 0 for all n ∈ N, and weak inequalities are preserved in the limit,
5x̄1 + 10x̄2 − 10 = 0, x̄1 ≥ 0, x̄2 ≥ 0.
So x̄ ∈ G(x).
The solution then is x∗ = (2/5, 4/5).
Sufficiency Route
∇f(x) = [ −2x1  −2x2 ],
H f(x) =
[ −2  0 ]
[ 0  −2 ],
D1 = −2 < 0, D2 = 4 > 0.
So H f(x) is negative definite for all x; hence f is concave. The constraint g(x) is concave, as it is linear. Also −λ > 0. Then f(x) − λ g(x) is concave as a sum of concave functions. Then we know that the FOCs are sufficient for a maximum. So the point x∗ = (2/5, 4/5) is our solution.
The constraint set is an ellipse and can be rewritten as 3 − 2x1^2 − x2^2 = 0. Here the sufficiency route will not work, as the objective function is not concave:
H f(x) =
[ 2x2  2x1 ]
[ 2x1  0 ],
D1 = 2x2, D2 = −4x1^2, D2 < 0 for all x ≠ 0,
which means that H f(x) is indefinite for all x ≠ 0. So f is not concave. Hence we have to use the necessity route.
So ∥x∥ ≤ ∥(√3, √3)∥ = √(3 + 3) = √6. So the constraint set is compact and non-empty and the objective function f is continuous; hence the Weierstrass theorem is applicable and a solution exists.
Case (ii)
λ = −x2/2 → x1^2 − x2^2 = 0
→ x1 = x2 ∨ x1 = −x2
→ 3 − 2x1^2 − x2^2 = 0
gives x1 = 1 ∨ x1 = −1. If
x1 = 1 → x2 = 1 ∨ x2 = −1, λ = −1/2 ∨ λ = 1/2.
Similarly for x1 = −1. We get four more candidates for a solution:
m3 = (1, 1, −1/2), m4 = (1, −1, 1/2),
m5 = (−1, −1, 1/2), m6 = (−1, 1, −1/2).
Thus
M = {m1, m2, · · · , m6}.
The constraint qualification
∇g(x∗) = [ −4x1∗  −2x2∗ ] ≠ 0
is satisfied at every candidate, since x = (0, 0) does not satisfy 3 − 2x1^2 − x2^2 = 0.
The Hessian is
H f(x) =
[ 0  1 ]
[ 1  0 ],
which is indefinite for all values of x ∈ R2+. Hence the objective function is not concave.
Observe that x is restricted to R2+ and the equality constraint holds. This constraint set is non-empty, as (0, 4) is contained in it, and compact. A solution to this problem exists, as f is continuous and the constraint set is non-empty and compact; hence the Weierstrass theorem is applicable.
The constraint qualification is satisfied trivially for m1. Comparing with the corners (0, 4), (16, 0), verify that
f(0, 4) = 0 = f(16, 0), f(8, 2) = 16.
The solution then is x = (8, 2).
Example 19.9.
max_{x∈R2+} f(x1, x2) = ln x1 + ln x2
subject to x1 + 4x2 = 16, or 16 − x1 − 4x2 = 0.
Here the necessity route does not work, as the objective function is not defined at the corners of the constraint set, x = (16, 0) or x = (0, 4), because ln y is not defined for y = 0. The Weierstrass Theorem cannot be applied. Let us use the sufficiency route. Since ln is not defined at the corners, the problem can be modified as follows:
max_{x∈R2++} f(x1, x2) = ln x1 + ln x2
subject to 16 − x1 − 4x2 = 0.
The Lagrangian and the FOCs are
L(x, λ) = ln x1 + ln x2 − λ(16 − x1 − 4x2),
∂L(x, λ)/∂x1 = 1/x1 + λ = 0 → λ x1 = −1,
∂L(x, λ)/∂x2 = 1/x2 + 4λ = 0 → 4λ x2 = −1,
∂L(x, λ)/∂λ = −(16 − x1 − 4x2) = 0.
So x1 = 4x2 from the first two FOCs. Substituting this in the third FOC, we get x1 = 8, x2 = 2, λ = −1/8.
The Hessian is
H f(x) =
[ −1/x1^2  0 ]
[ 0  −1/x2^2 ],
D1 = −1/x1^2 < 0, D2 = 1/(x1^2 x2^2) > 0, for all x ∈ R2++.
Hence H f(x) is negative definite for all x ∈ R2++, so f is concave. Also g(x) = 16 − x1 − 4x2 is linear, hence concave. Lastly, −λ > 0. So L(x, λ) is concave and the FOCs are sufficient for a maximum. Hence x∗ = (8, 2) is the solution.
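Substituting the constraint x1 = 16 − 4x2 reduces the problem to one variable, so the solution can be confirmed by a grid search (a sketch; the grid resolution is our own choice):

```python
import math

# Objective along the constraint x1 = 16 - 4*x2, for x2 in (0, 4):
g = lambda x2: math.log(16 - 4 * x2) + math.log(x2)

grid = [i / 1000 for i in range(1, 4000)]   # x2 in (0.001, 3.999)
best_x2 = max(grid, key=g)
best_x1 = 16 - 4 * best_x2
print(best_x1, best_x2)  # 8.0 2.0
```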
Example 19.10. Application: Arithmetic mean-Geometric mean inequality. Consider
(19.9) max_{(a,b)∈R2+} f(a, b) = ab
subject to a + b = 2.
Note the constraint set C = {a ≥ 0, b ≥ 0, a + b = 2} is non-empty, since (2, 0) is contained in it; closed, since weak inequalities are preserved in the limit; and bounded, as
∥(a, b)∥ ≤ ∥(2, 2)∥ = 2√2.
Note that at the solution a > 0, b > 0. Hence we can rewrite the problem as follows:
max_{(a,b)∈R2++} f(a, b) = ab
subject to g(a, b) = 2 − a − b = 0.
The Lagrangian and the FOCs are
L(a, b, λ) = ab − λ(2 − a − b),
∂L(a, b, λ)/∂a = b + λ = 0,
∂L(a, b, λ)/∂b = a + λ = 0,
∂L(a, b, λ)/∂λ = −(2 − a − b) = 0.
Now
a = b → a = b = 1 = −λ.
We get one candidate for a solution:
m1 = (1, 1, −1).
The constraint qualification
∇g(x∗) = [ −1  −1 ] ≠ 0
is satisfied trivially for m1. Compare it with the corners (0, 2), (2, 0) and verify that
f(0, 2) = 0 = f(2, 0), f(1, 1) = 1.
The solution then is (1, 1). In other words, we have shown that
(19.10) ab ≤ 1 whenever a ≥ 0, b ≥ 0, and a + b = 2.
Now let x1 ≥ 0, x2 ≥ 0 be arbitrary with
x1 + x2 = x > 0.
Then
2x1 + 2x2 = 2x,
(2x1)/x + (2x2)/x = 2.
Chapter 20

Optimization Theory: Inequality Constraints

The more general constrained optimization problem deals with inequality constraints. Note that the equality constraint g(x) = 0 can be expressed as the pair of inequality constraints g(x) ≥ 0 and g(x) ≤ 0.
The constrained maximization problem with which we are concerned is the following:
max f(x)
subject to gj(x) ≥ 0 for j = 1, · · · , m,
and x ∈ Rn+.
We illustrate the application of this Theorem through examples. First we take a linear objective
function.
Example 20.1. Solve
max_{(x,y)∈R2+} f(x, y) = ax + by
subject to p1 x + p2 y ≤ M,
where a, b, p1, p2 and M are positive parameters. Find a solution to the problem for the following parameter configurations:
(i) a/b > p1/p2, (ii) a/b < p1/p2,
using the Kuhn-Tucker sufficiency theorem.
(i) Let
X = {(x, y) ∈ R2 | x > −1, y > −1}.
Then X is open, as its complement
X^C = {(x, y) ∈ R2 | x ≤ −1 or y ≤ −1}
is closed.
(ii) The function f(x, y) is continuous, as ax and by are continuous and f(·, ·) is obtained by taking the sum of two continuous functions.
The functions g1(x, y) = M − p1 x − p2 y, g2(x, y) = x, g3(x, y) = y are linear and hence continuous. Further, fx(x, y) = a and fy(x, y) = b are continuous functions. Hence f and gj (j = 1, · · · , 3) are continuously differentiable on X.
The function f(x, y) is concave as the sum of two concave functions, and the gj (j = 1, · · · , 3) are concave, being linear functions. Hence for the following problem,
max_{(x,y)∈X} f(x, y) = ax + by
subject to p1 x + p2 y ≤ M, x ≥ 0, y ≥ 0,
all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x∗, y∗), λ∗) ∈ X × R3+ that satisfies the Kuhn-Tucker conditions:
(i) Di f(x∗) + Σ_{j=1}^{m} λj∗ · Di gj(x∗) = 0, i = 1, · · · , n,
(ii) g(x∗) ≥ 0 and λ∗ · g(x∗) = 0.
They are
a − λ1 p1 + λ2 = 0,
b − λ1 p2 + λ3 = 0,
M − p1 x − p2 y ≥ 0, λ1 (M − p1 x − p2 y) = 0,
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.
If λ1 = 0, then a − λ1 p1 + λ2 = 0 → λ2 = −a < 0, which contradicts λ2 ≥ 0. Hence
λ1 > 0 → M − p1 x − p2 y = 0.
So x = y = 0 is ruled out. We now consider the two cases.
Figure 20.1. Case (i): a/b > p1/p2. Optimal Consumption Bundle = (M/p1, 0).
Case (i): a/b > p1/p2. Consider x > 0, y = 0. Note λ2 = 0, x = M/p1, and
a/p1 = λ1, b − (a/p1) p2 + λ3 = 0,
so that
λ3 = (a/p1) p2 − b = b [ (a p2)/(b p1) − 1 ] > 0,
since a/b > p1/p2, i.e., (a p2)/(b p1) > 1. Hence
x = M/p1, y = 0, λ1 = a/p1, λ2 = 0, λ3 = b [ (a p2)/(b p1) − 1 ] > 0
is a solution.
Case (ii): a/b < p1/p2. Consider x = 0, y > 0. Note λ3 = 0, y = M/p2, and
b/p2 = λ1, a − (b/p2) p1 + λ2 = 0,
so that
λ2 = (b/p2) p1 − a = a [ (b p1)/(a p2) − 1 ] > 0,
since a/b < p1/p2, i.e., (b p1)/(a p2) > 1. Hence
x = 0, y = M/p2, λ1 = b/p2, λ2 = a [ (b p1)/(a p2) − 1 ] > 0, λ3 = 0
is a solution.
Figure 20.2. Case (ii): a/b < p1/p2. Optimal Consumption Bundle = (0, M/p2).
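The corner solutions in the two cases are easy to check numerically. The following sketch (parameter values are illustrative, not from the text) exploits the fact that with a linear objective and a single linear budget constraint the maximum lies at a corner of the budget set:

```python
def maximize_linear_utility(a, b, p1, p2, M):
    # With f(x, y) = a*x + b*y linear and one budget constraint
    # p1*x + p2*y <= M, the maximum is at a corner of the budget set,
    # so it suffices to compare (M/p1, 0) and (0, M/p2).
    corners = [(M / p1, 0.0), (0.0, M / p2)]
    return max(corners, key=lambda xy: a * xy[0] + b * xy[1])

# Case (i): a/b > p1/p2 -> all income spent on good x
print(maximize_linear_utility(3, 1, 1, 2, 10))   # (10.0, 0.0)
# Case (ii): a/b < p1/p2 -> all income spent on good y
print(maximize_linear_utility(1, 3, 2, 1, 10))   # (0.0, 10.0)
```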
Consider next the problem of maximizing f(x, y) = x/(1 + x) + y over the budget set {(x, y) ∈ R^2_+ : x + 4y ≤ 16}.
(i) Let
X = {(x, y) ∈ R^2 | x > −1, y > −1}.
Then X is open as its complement
X^C = {(x, y) ∈ R^2 | x ≤ −1 or y ≤ −1}
is closed.
(ii) Function f(x, y) is continuous as x, y and 1 + x are continuous, 1 + x > 0 on X, and f(·, ·) is obtained by taking the quotient of the two continuous functions x and 1 + x, with non-vanishing denominator, and then adding the continuous function y.
Function f(x, y) is concave as the sum of two concave functions (exercise), and g_j (j = 1, · · · , 3) are concave being linear functions. Hence for the following problem
max_{(x,y) ∈ X} f(x, y) = x/(1 + x) + y subject to x + 4y ≤ 16, x ≥ 0, y ≥ 0,
all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x*, y*), λ*) ∈ X × R^3_+ that satisfies the Kuhn-Tucker conditions. They are
1/(1 + x)² − λ1 + λ2 = 0,
1 − 4λ1 + λ3 = 0,
16 − x − 4y ≥ 0, λ1 (16 − x − 4y) = 0,
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.
If λ1 = 0, then 1 − 4λ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence
λ1 > 0 → 16 − x − 4y = 0.
In the above example, let the price of good y be p > 0 and income be I > 0. We can redo the exercise by going over the Kuhn-Tucker conditions again. They are
1/(1 + x)² − λ1 + λ2 = 0,
1 − pλ1 + λ3 = 0,
I − x − py ≥ 0, λ1 (I − x − py) = 0,
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.
If λ1 = 0, then 1 − pλ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence
λ1 > 0 → I − x − py = 0,
and x = y = 0 is ruled out because I > 0. There are three remaining cases.
If p/(1 + I)² − 1 ≥ 0, i.e., p ≥ (I + 1)², then λ3 ≥ 0. So the solution is (I, 0, 1/(1 + I)², 0, p/(1 + I)² − 1) if p ≥ (I + 1)².
Combining them, the solution (x*, y*, λ1*, λ2*, λ3*) is
(I, 0, 1/(1 + I)², 0, p/(1 + I)² − 1) if p ≥ (I + 1)²,
(0, I/p, 1/p, 1/p − 1, 0) if p ≤ 1, and
(√p − 1, (I + 1 − √p)/p, 1/p, 0, 0) if 1 < p < (I + 1)².
The Kuhn-Tucker Sufficiency Theorem asserts that this solution is a global maximum and therefore solves the problem.
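As a sanity check on the three-branch solution, one can search the budget line numerically (a sketch using only the standard library; since f is strictly increasing in y, the budget constraint binds at the optimum):

```python
def f(x, y):
    return x / (1 + x) + y

def demand_formula(p, I):
    # the closed-form solution (x*, y*) from the case analysis above
    if p >= (I + 1) ** 2:
        return I, 0.0
    if p <= 1:
        return 0.0, I / p
    return p ** 0.5 - 1, (I + 1 - p ** 0.5) / p

def demand_grid(p, I, n=100000):
    # brute-force search over x on the budget line x + p*y = I
    k = max(range(n + 1), key=lambda j: f(I * j / n, (I - I * j / n) / p))
    x = I * k / n
    return x, (I - x) / p

# one parameter pair from each of the three branches
for p, I in [(10.0, 2.0), (0.5, 2.0), (4.0, 16.0)]:
    xf, yf = demand_formula(p, I)
    xg, yg = demand_grid(p, I)
    assert abs(xf - xg) < 1e-2 and abs(yf - yg) < 1e-2
```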
20.2. Global maximum and constrained local maximum

We know from the definitions that if x̂ is a point of global maximum, then x̂ is also a point of local maximum. The situations under which the converse is true are given by the following theorems.

Theorem 20.2. Suppose A is an open convex set in Rⁿ, f is a function from A to R, and x̄ ∈ A is a point of local maximum of f on A.
(a) If f is concave, then x̄ is a point of global maximum of f on A.
(b) If f is strictly quasi-concave, then x̄ is the unique point of global maximum of f on A.
(c) If f is quasi-concave and x̄ is the unique point of local maximum, then x̄ is the unique point of global maximum of f on A.

(a) Assume that x̄ is not a global maximum of f on A. Then there exists another point x̂ ∈ A such that x̂ ≠ x̄ and f(x̂) > f(x̄).
Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all
x ∈ A ∩ B(x̄, δ).
Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,
x = λx̂ + (1 − λ)x̄,
for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. By concavity of f , we have for all
λ ∈ [0, 1]
f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄).
Since f (x̂) > f (x̄), we also have for all λ ∈ (0, 1] that
f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄) > λ f (x̄) + (1 − λ) f (x̄) = f (x̄).
We wish to take λ sufficiently close to zero (but not equal to zero) so that
x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).
For this, let us denote d(x̂, x̄) = d and note
d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ|d(x̂, x̄) = λ · d.
If we set λ = δ/(2d), then we know
d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,
or x′ ∈ B(x̄, δ).
196 20. Optimization Theory: Inequality Constraints
Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such
that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄
must be a global maximum of f on A.
(b) Assume that x̄ is not a point of global maximum of f on A. Then there exists another point
x̂ ∈ A such that x̂ ̸= x̄ and f (x̂) > f (x̄).
Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all
x ∈ A ∩ B(x̄, δ).
Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,
x = λx̂ + (1 − λ)x̄,
for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is strictly quasi-concave, we
have for all λ ∈ (0, 1)
f(λx̂ + (1 − λ)x̄) > min { f(x̂), f(x̄) } = f(x̄).
We wish to take λ > 0 sufficiently small so that
x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).
For this, let us denote d(x̂, x̄) = d and note
d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ|d(x̂, x̄) = λ · d.
If we set λ = δ/(2d), then we know
d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,
or x′ ∈ B(x̄, δ).
Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such
that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄
must be a global maximum of f on A.
To show uniqueness, if not, then there exists x′′ ∈ A with x′′ ≠ x̄ such that
f(x̄) = f(x′′).
But then, since f is strictly quasi-concave and A is convex,
f(0.5x̄ + 0.5x′′) > min { f(x̄), f(x′′) } = f(x′′) = f(x̄).
This contradicts the fact that x̄ is a point of global maximum.
(c) Assume that x̄ is not the unique point of global maximum of f on A. Then there exists another point x̂ ∈ A such that x̂ ≠ x̄ and f(x̂) ≥ f(x̄).
Since x̄ is the unique point of local maximum in the open ball B(x̄, δ), f(x̄) > f(x) for all x ∈ A ∩ B(x̄, δ) with x ≠ x̄.
Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,
x = λx̂ + (1 − λ)x̄,
20.2. Global maximum and constrained local maximum 197
for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is quasi-concave, we have for
all λ ∈ (0, 1)
f(λx̂ + (1 − λ)x̄) ≥ min { f(x̂), f(x̄) } = f(x̄).
We wish to take λ > 0 sufficiently small so that
x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).
For this, let us denote d(x̂, x̄) = d and note
d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.
If we set λ = δ/(2d), then we know
d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,
or x′ ∈ B(x̄, δ).
Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such
that f (x′ ) ≥ f (x̄), which contradicts that x̄ was the unique point of local maximum. It follows
that x̄ must be the unique point of global maximum of f on A.
This theorem shows that there is an important difference between concavity and quasi-concavity
in going from the local maximum property to the global maximum property. With quasi-concavity,
we need something more (some “strictness”) to make the arguments work. In (b), this additional
condition takes the form of strict quasi-concavity. In (c), it takes the form of assuming that the
point of local maximum is unique. This underlying theme (that one needs something in addition
to quasi-concavity to make the arguments and results work) recurs in Arrow-Enthoven's theory
of quasi-concave programming, where the attempt is made to replace the concavity conditions of
Kuhn-Tucker with quasi-concavity.
The following example shows that in Theorem 20.2(a), we cannot replace concavity of f by
quasi-concavity of f , and still preserve the conclusion.
Example 20.4. Let A be the interval (0, 6) in R. Clearly, A is an open, convex set. Let f : A → R be defined as follows:
f(x) = x for x ∈ (0, 2); f(x) = 2 for x ∈ [2, 4]; f(x) = x − 2 for x ∈ (4, 6).
Then, f is a non-decreasing function on A, and therefore quasi-concave. The point x̄ = 3 is clearly
a point of local maximum, since f (x̄) = 2 ≥ f (x) for all x ∈ A ∩ B(x̄, 1). However, x̄ is not a point
of global maximum of f on A, since (for example), f (5) = 3 > 2 = f (x̄).
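The counterexample is easy to verify directly; this small sketch evaluates f near x̄ = 3 and at x = 5:

```python
def f(x):
    # the piecewise quasi-concave function of Example 20.4 on A = (0, 6)
    if 0 < x < 2:
        return float(x)
    if 2 <= x <= 4:
        return 2.0
    if 4 < x < 6:
        return x - 2.0
    raise ValueError("x is outside the domain (0, 6)")

# x = 3 is a local maximum: f(3) >= f(x) on the ball of radius 1 around 3 ...
local_max = all(f(3) >= f(3 + d / 100) for d in range(-99, 100))
# ... but it is not a global maximum, since f(5) = 3 > 2 = f(3)
print(local_max, f(5) > f(3))   # True True
```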
The following theorem describes the conditions under which a point of constrained local maximum x̂ is also a point of constrained global maximum.
Theorem 20.3. Let X be a convex set in Rⁿ. Let f, g_j (j = 1, · · · , m) be concave functions on X, and let C ≡ {x ∈ X : g_j(x) ≥ 0 for j = 1, · · · , m} denote the constraint set. Suppose x̂ ∈ C is a point of constrained local maximum of f on C. Then, x̂ is a point of constrained global maximum.
Since x̂ is a point of constrained local maximum, there exists δ > 0 such that f(x̂) ≥ f(x) for all x ∈ B(x̂, δ) ∩ C. Now, if x̂ is not a point of constrained global maximum, then there is some x̄ ∈ C such that f(x̄) > f(x̂). One can choose 0 < θ < 1 with θ sufficiently close to zero, such that
x̃ ≡ [θx̄ + (1 − θ)x̂] ∈ B(x̂, δ).
For this, we need
∥θx̄ + (1 − θ)x̂ − x̂∥ = θ ∥x̄ − x̂∥ < δ.
This holds if
θ < δ/∥x̄ − x̂∥.
Take
θ = δ/(2∥x̄ − x̂∥)
so that x̃ ∈ B(x̂, δ). Since X is convex and g j ( j = 1, · · · , m) are concave, we claim that C is a convex
set, and x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ C.
Let y ∈ C and y′ ∈ C be two arbitrary points. By definition of the constraint set C, y and y′ are
in X and therefore, ŷ ≡ [λ y + (1 − λ)y′ ] ∈ X for all λ ∈ [0, 1]. Also by concavity of the constraint
functions,
g j (ŷ) = g j (λ y + (1 − λ)y′ ) ≥ λ g j (y) + (1 − λ)g j (y′ ) ≥ λ · 0 + (1 − λ) · 0 = 0,
for all j = 1, · · · , m.
Therefore, x̃ ≡ [θx̄ + (1 − θ)x̂] ∈ C. Thus
x̃ = [θx̄ + (1 − θ)x̂] ∈ B(x̂, δ) ∩ C.
By concavity of f,
f(x̃) ≥ θ f(x̄) + (1 − θ) f(x̂) > θ f(x̂) + (1 − θ) f(x̂) = f(x̂),
which contradicts the fact that x̂ is a point of constrained local maximum. Hence x̂ is a point of constrained global maximum.
Observe that we did not need to assume that the objective function is differentiable on the
domain X in this proof.
Chapter 21
Problem Set 8
(1) (a) Show, by using the Weierstrass theorem, that there exists x̄ ∈ R³ which solves (21.1).
(b) Use Lagrange's theorem to show that
(21.2) Σ_{i=1}^{3} c_i x̄_i = ∥C∥.
(c) Let p, q be arbitrary non-zero vectors in Rⁿ. Using the result in (b), show that |p · q| ≤ ∥p∥ · ∥q∥.
Solve the following constrained optimization problems.
(2) Let f : R² → R.
(21.3) max_{(x,y) ∈ R^2_+} f(x, y) = x² − 3xy subject to x + 2y = 10.
(6) Let X be a non-empty, convex set in R2 . Let g be a continuous function from X to R, and
let f be a strictly quasi-concave function from X to R. Consider the following constrained
optimization problem.
max f (x)
(21.7) subject to g(x) ≥ 0
and x∈X
(7) Suppose that a consumer has the utility function U(x, y) = xa yb and faces the budget constraint
px x + py y ≤ I.
(A) Utility Maximization
(a) What are the first order conditions for utility maximization?
(b) Solve for the consumer’s demands for goods x and y.
(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an
increasing, decreasing or constant function of income?
(d) Show that the second order conditions hold.
(e) Show that the value of dx*/dI obtained from the implicit function theorem is identical to the value obtained by taking the partial derivative of x* with respect to I.
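For part (b), the standard Cobb-Douglas demands are x* = aI/((a + b)p_x) and y* = bI/((a + b)p_y); the reader should derive them. The following sketch (with illustrative parameter values) checks this candidate answer against a brute-force search along the budget line:

```python
def utility(x, y, a, b):
    return (x ** a) * (y ** b)

def candidate_demand(a, b, px, py, I):
    # conjectured answer to part (b): spend the share a/(a+b) of income on x
    return a * I / ((a + b) * px), b * I / ((a + b) * py)

def grid_demand(a, b, px, py, I, n=50000):
    # brute-force search along the budget line px*x + py*y = I
    k = max(range(1, n), key=lambda j: utility(I * j / n / px, (I - I * j / n) / py, a, b))
    x = I * k / n / px
    return x, (I - px * x) / py

a, b, px, py, I = 0.3, 0.7, 2.0, 5.0, 100.0
xc, yc = candidate_demand(a, b, px, py, I)
xg, yg = grid_demand(a, b, px, py, I)
# both approaches give approximately (15, 14)
```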
(8) Suppose a consumer has the utility function U = a ln(x − x0 ) + b ln(y − y0 ) where a, b, x0 and
y0 are positive parameters. Assume that the usual budget constraint applies.
(a) Solve for the consumer’s demand for good x.
(b) Find the elasticities of demand for good x with respect to income and prices.
(c) Show that the utility function U = 45(x − x0)^{3.5a} (y − y0)^{3.5b} would have yielded the same demand for good x.
where a > 0, b > 0 and c > 0 are such that a + b + c = 1. The budget constraint is
px + qy + rz ≤ I.
In other words, the prices of good x, y and z are p, q and r respectively and the consumer has
an income I. The prices and income are positive.
In addition, the consumer faces a rationing constraint. He is not allowed to buy more than
k > 0 units of good x.
(a) Solve the optimization problem.
(b) Under what condition on the various parameters is the rationing constraint binding?
(c) Show that when the rationing constraint binds, the income that the consumer would have
liked to spend on good x but cannot do so now is split between good y and z in proportions
b : c.
(d) Would you expect rationing of bread purchases to affect demand for butter and rice in this
way? If not, how would you expect the bread-butter-rice case to differ from the result in
(c)?
Chapter 22
Envelope Theorem
Let f (x, α) be a continuously differentiable function of x ∈ Rn and a parameter α. For each choice
of α, consider the unconstrained maximization problem:
max_x f(x, α),
where the choice variable is x. It is of interest to us how the maximized value f(x*(α), α) changes as the parameter value α changes.
Theorem 22.1. Let x∗ (α) be a solution of this problem and also assume that x∗ (α) is a continuously
differentiable function of α. Then,
d f(x*(α), α)/dα = ∂ f(x*(α), α)/∂α.
Example 22.1. Consider the problem of maximizing the function f (x, a) = −2x2 + 2ax + 4a2 with
respect to x for any given value of a. What is the effect of a unit increase in the value of a on the
maximum value of f (x, a).
This can be done directly by computing the x* which maximizes f. The first order condition yields
f_x(x, a) = −4x + 2a = 0,
so x* = 0.5a. Plugging this into f(x, a) leads to
f(x*(a), a) = −2(0.5a)² + 2a(0.5a) + 4a² = 4.5a².
Observe that f(x*(a), a) increases at the rate of 9a as a increases. Alternatively we could apply the Envelope Theorem to get
d f*/da = ∂ f(x*(a), a)/∂a = 2x* + 8a = 9a,
since x*(a) = 0.5a.
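A quick numerical check of Example 22.1 (a sketch): the total derivative of the value function and the partial derivative of f with respect to a, holding x = x*(a) fixed, agree at 9a.

```python
def f(x, a):
    return -2 * x ** 2 + 2 * a * x + 4 * a ** 2

def value(a):
    # maximum value of f, using x*(a) = 0.5a
    return f(0.5 * a, a)

a, h = 3.0, 1e-6
total = (value(a + h) - value(a - h)) / (2 * h)              # d f*/da
partial = (f(0.5 * a, a + h) - f(0.5 * a, a - h)) / (2 * h)  # df/da at x*(a) fixed
print(round(total, 3), round(partial, 3))   # both approximately 9a = 27.0
```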
Consider a firm with production function f(x), where x ≥ 0 is the input level; the output sells at price p > 0 and the input costs w > 0 per unit, so profit is π(x) = p f(x) − wx. Let us denote the input level at which the maximum profit is attained by x*. We observe that x* is a function of the parameters p and w. The maximum profit is the value function of this exercise and we call it the profit function, π*(p, w).
By the Envelope Theorem,
∂π*(p, w)/∂p = f(x*(p, w)) > 0.
Thus the profit function is increasing in the price of the output. Also
∂π*(p, w)/∂w = −x*(p, w) < 0.
So the profit function is decreasing in the price of the input. Further, it also shows that
x*(p, w) = −∂π*(p, w)/∂w.
The profit maximizing input level can thus be obtained by taking the negative of the partial derivative of the profit function with respect to w (a result known as Hotelling's Lemma).
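Hotelling's Lemma can be illustrated with a concrete production function, say f(x) = √x (an assumption for this sketch; the text leaves f general). Then x*(p, w) = (p/2w)² and π*(p, w) = p²/4w:

```python
def input_and_profit(p, w):
    # with f(x) = sqrt(x): the FOC p/(2*sqrt(x)) = w gives x* = (p/(2w))^2
    x_star = (p / (2 * w)) ** 2
    profit = p * x_star ** 0.5 - w * x_star
    return x_star, profit

p, w, h = 4.0, 0.5, 1e-6
x_star, _ = input_and_profit(p, w)
# numerical derivative of the profit function with respect to w
dpi_dw = (input_and_profit(p, w + h)[1] - input_and_profit(p, w - h)[1]) / (2 * h)
print(round(-dpi_dw, 3) == round(x_star, 3))   # Hotelling: dpi*/dw = -x*
```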
22.2. Meaning of the Lagrange multiplier
In this section we will see that the multipliers measure the sensitivity of the optimal value of the
objective function to the changes in the right-hand sides (parameters) of the constraints. In this
sense, they provide a natural measure of the value for scarce resources in economic maximization
problems.
Consider a simple maximization problem with two variables and one equality constraint. Let f : R² → R be denoted as f(x, y). The problem is
(22.2) max_{(x,y) ∈ R^2_+} f(x, y) subject to h(x, y) = a.
Let (x∗ (a), y∗ (a)) be a solution to the above problem for any given parameter value a. Thus
f (x∗ (a), y∗ (a)) is the corresponding optimal value of the objective function. Let the Lagrange
multiplier be denoted by λ*(a). The following theorem shows that λ*(a) measures the rate of change
of the optimal value of the objective function f with respect to a.
Theorem 22.2. Let f and h be continuously differentiable functions of two variables. For any fixed
value of the parameter a, let (x∗ (a), y∗ (a)) be the solution of the optimization problem (22.2) with
the corresponding Lagrange multiplier λ∗ (a). Assume that x∗ (a), y∗ (a) and λ∗ (a) are continuously
differentiable functions of a and the constraint qualification holds at (x∗ (a), y∗ (a)). Then,
λ*(a) = d f(x*(a), y*(a))/da.
Form the Lagrangian
L ≡ f(x, y) − λ (h(x, y) − a),
where a is a parameter. The solution of this problem, (x*(a), y*(a)), λ*(a), satisfies the first order conditions
∂f/∂x (x*(a), y*(a)) = λ*(a) ∂h/∂x (x*(a), y*(a)), ∂f/∂y (x*(a), y*(a)) = λ*(a) ∂h/∂y (x*(a), y*(a)),
for all values of a. Also, since h(x*(a), y*(a)) = a for all a, we get
(∂h/∂x) · dx*(a)/da + (∂h/∂y) · dy*(a)/da = 1
for all a, with the partial derivatives evaluated at (x*(a), y*(a)). Now we can use the Chain Rule and the two first order conditions:
d f(x*(a), y*(a))/da = (∂f/∂x) · dx*(a)/da + (∂f/∂y) · dy*(a)/da
= λ* (∂h/∂x) · dx*(a)/da + λ* (∂h/∂y) · dy*(a)/da
= λ* [ (∂h/∂x) · dx*(a)/da + (∂h/∂y) · dy*(a)/da ]
= λ* · 1 = λ*.
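The shadow-price interpretation can be checked on a small illustrative problem (not from the text): maximize f(x, y) = xy subject to x + y = a. The Lagrange conditions give x* = y* = a/2 and λ* = a/2, and indeed the value function V(a) = a²/4 satisfies V′(a) = a/2 = λ*:

```python
def value(a):
    # f(x*(a), y*(a)) with x* = y* = a/2 for max xy s.t. x + y = a
    return (a / 2) * (a / 2)

a, h = 6.0, 1e-6
lam_star = a / 2                                  # multiplier from the FOCs
dV_da = (value(a + h) - value(a - h)) / (2 * h)   # numerical dV/da
print(round(dV_da, 6), lam_star)   # 3.0 3.0
```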
The general envelope theorem arises in the case of constrained optimizations where both the ob-
jective function as well as the constraint functions are functions of some parameters. Consider for
example the optimization exercise as follows:
max f(x, a)
subject to g_j(x, a) = 0 for j = 1, · · · , m
(22.3) and x ∈ R^n_+.
In this case, the objective function f as well as the constraints g1, · · · , gm depend on the parameter a. The following theorem shows that the rate of change of f(x*(a), a) with respect to a equals the partial derivative with respect to a, not of f but of the corresponding Lagrangian function L.
Theorem 22.3. Let f , g1 , · · · , gm be continuously differentiable functions and let
x∗ (a) = (x1∗ (a), x2∗ (a), · · · , xn∗ (a))
denote the solution of the optimization problem (22.3) for any fixed value of the parameter a.
Assume that x∗ (a), and the Lagrange multipliers λ∗1 (a), · · · , λ∗m (a) are continuously differentiable
functions of a and the constraint qualification condition holds. Then,
(22.4) d f(x*(a), a)/da = ∂L(x*(a), λ*(a), a)/∂a
(22.5) = ∂f(x*(a), a)/∂a − λ1* ∂g1(x*(a), a)/∂a − · · · − λm* ∂gm(x*(a), a)/∂a.
Chapter 23
Elementary Concepts in
Probability
Probability theory deals with random events, events whose occurrence cannot be predicted with certainty. There are at least three sources of randomness. First, many features of our world are by nature stochastic; the evolution of such a diverse variety of life is witness to the unpredictability in the universe and the environment. Second, many events are the result of a very large number of actions and decisions. Third, some variables may appear random because they are measured with error.
Even though we are not sure about the outcomes of a random event, we can attach to each
outcome a number called probability.
23.1. Discrete Probability Model

We first describe the set of outcomes of a random event, i.e., a set whose elements are all possible
outcomes of a random event. It is known as the sample space and denoted by Ω.
Example 23.1. The set of possible outcomes of flipping a fair coin is
Ω = {H, T }.
The set of possible outcomes of rolling two dice is Ω = {(i, j) : i, j ∈ {1, · · · , 6}}, where the outcome (i, j) is said to occur if i appeared on the first die and j appeared on the second die.
The set of outcomes for measuring the lifetime of a car consists of the non-negative real numbers:
Ω = [0, ∞).
Next we form the set F that contains the events of interest as well as their unions and complements. Thus if A and B are in F, so are A ∪ B, A^c and B^c. The set F, which is closed under the operations of union and complementation, is known as an algebra.
Example 23.2. The algebra for the outcomes of flipping a fair coin is
F = {∅, Ω, {H}, {T}}.
The algebra for the outcomes of flipping two coins is
F = {∅, Ω, {TT}, {HH}, {HT, TH}, {HH, TT}, {HH, HT, TH}, {TT, HT, TH}}.
We can now define a probability measure by assigning to each element of sample space Ω, a
probability P.
Definition 23.1. The set function P is called a probability measure if
(i) P(∅) = 0;
(ii) P(Ω) = 1;
(iii) P(A ∪ B) = P(A) + P(B) for all A, B ∈ F with A ∩ B = ∅.
The three conditions listed above are the axioms of probability theory.
Example 23.3. For the outcomes of flipping two fair coins,
P(HH) = P(HT ) = P(T T ) = P(T H) = 0.25.
The triple of the set of outcomes, the algebra, and the probability measure (Ω, F , P) is referred
to as a probability model.
In the next step, we assign probabilities to the random events. Three sources of attaching probabilities to the outcomes of random events are (a) equally likely events, (b) long run frequencies and (c) degree of confidence (the subjective or Bayesian approach). Observe that even though we assign probabilities to different events in different ways, the mathematical theory for dealing with the random events and their probabilities remains the same.
We define the random variable next. The rule that assigns a real number to each outcome is called a random variable. More formally,
Definition 23.2. A random variable is a function that maps the set of outcomes of a random event to the set of real numbers.
Such a function is not unique and, depending on the purpose at hand, we may define one or many random variables on the same random event.
Example 23.4. For the outcomes of flipping two fair coins, let us define a random variable X as
the number of heads. Then, we have
X(HH) = 2; X(HT ) = X(T H) = 1, X(T T ) = 0.
We could have defined the random variable X as the number of tails. Then, we have
X(HH) = 0; X(HT ) = X(T H) = 1, X(T T ) = 2.
In collecting labor statistics, we are interested in the characteristics of the respondents. For
example, we may ask if a person is in the labor force or not, employed or unemployed. We could
also be interested to learn the demographic characteristics of the respondents like gender, race, age
etc. For each of these answers we can define one or more binary variables. For example let X = 1 if
a respondent who is in the labor force is unemployed and X = 0 if employed. We can define Y = 1
if the respondent is a woman and employed, Y = 0 otherwise.
Probability distributions become unwieldy as the number of outcomes becomes large or infinite. One way to summarize the information about a probability distribution is through its moments, such as the mean, which measures the central tendency, and the variance, which measures the dispersion or variability of the distribution. A further moment reflects the skewness of the distribution to the left or to the right, and the kurtosis is an indicator of the bunching of the outcomes near the mean: the more values are concentrated near the mean, the taller is the peak of the distribution.
The first moment of the distribution which is the expected value or the mean of the distribution
is defined as
E(X) = μ = Σ_{i=1}^{n} x_i P(x_i).
Example 23.6. For the distribution of the number of heads in three flips of a coin, we have,
μ = 0 · P(X = 0) + 1 · P(X = 1) + 2 · P(X = 2) + 3 · P(X = 3),
which yields the mean as
µ = 0 + 0.375 + 0.750 + 0.375 = 1.50
Another measure (which is of great importance) is the variance or the second moment around
the mean :
E(X − μ)² = σ² = Σ_{i=1}^{n} (x_i − μ)² P(x_i).
The formula for the variance can be rewritten, using the binomial expansion, as
E(X − μ)² = Σ_{i=1}^{n} (x_i − μ)² P(x_i)
= Σ_{i=1}^{n} x_i² P(x_i) − 2μ Σ_{i=1}^{n} x_i P(x_i) + μ²
= Σ_{i=1}^{n} x_i² P(x_i) − 2μ² + μ² = Σ_{i=1}^{n} x_i² P(x_i) − μ²,
since Σ_{i=1}^{n} x_i P(x_i) = μ.
Example 23.8. For the distribution of the number of heads in three flips of a coin, the variance is
σ² = (0 − 1.5)²(0.125) + (1 − 1.5)²(0.375) + (2 − 1.5)²(0.375) + (3 − 1.5)²(0.125) = 0.75.
The mean is a measure of central tendency of a distribution, showing its center of gravity, whereas the variance and its square root, called the standard deviation, measure the dispersion or the volatility of the distribution. The advantage of using the standard deviation is that it measures the dispersion in the same measurement units as the original variable.
sion in the same measurement units as the original variable. In finance, variance of returns of an
asset is used as a measure of risk.
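The mean and variance computed in Examples 23.6 and 23.8 can be reproduced by enumerating the eight equally likely outcomes of three fair coin flips:

```python
from itertools import product

# enumerate the 8 equally likely outcomes of three fair coin flips
outcomes = list(product("HT", repeat=3))
X = [seq.count("H") for seq in outcomes]   # number of heads per outcome

mean = sum(X) / len(X)
var = sum((x - mean) ** 2 for x in X) / len(X)
# the "second moment minus mean squared" form derived above
second_moment_form = sum(x * x for x in X) / len(X) - mean ** 2
print(mean, var)   # 1.5 0.75
```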
23.2. Marginal and Conditional Distribution

As we have observed before, a random event may give rise to a number of random variables, each defined by a different function on the same set of outcomes. In the table below we present such a situation, where random variables X and Y and their probabilities are reported. Think of Y as the annual income, in units of thousand dollars, of a profession and X as gender, with X = 0 denoting men and X = 1 denoting women. The information contained in the table is the probability of joint events, i.e., the probability of X and Y each taking a particular value. For instance the probability of X = 1 and Y = 120 is 0.11, which is denoted as
P(X = 1, Y = 120) = 0.11.
Such a probability is referred to as a joint probability because it shows the probability of the joint event that a respondent is a woman and earns $120,000 a year.
X Y P
0 60 0.02
0 70 0.04
0 80 0.07
0 90 0.09
0 100 0.10
0 110 0.06
0 120 0.03
0 130 0.02
0 140 0.01
0 150 0.01
1 70 0.01
1 80 0.02
1 90 0.04
1 100 0.08
1 110 0.11
1 120 0.11
1 130 0.09
1 140 0.05
1 150 0.03
1 160 0.01
If we are interested only in X, then we can sum over all the relevant values of Y and get the marginal probability of X. For example,
P(X = 1) = P(X = 1,Y = 70) + · · · + P(X = 1,Y = 160) = 0.01 + 0.02 + · · · + 0.03 + 0.01 = 0.55.
In a similar manner, we can calculate the probability of X = 0, which would be 0.45. Thus the
marginal distribution of X is
X P(X)
0 0.45
1 0.55
Observe that in this example, the marginal distribution of X shows the distribution of men and
women in that profession (45% men and 55% women), whereas the marginal distribution of Y
would show the distribution of income for both men and women, i.e., profession as a whole.
Sometimes we may be interested to know the probability of Y = 110 when we already know
that X = 1. Thus we want to know the conditional probability of Y = 110, given that X = 1.
P(Y = 110|X = 1) = P(Y = 110, X = 1)/P(X = 1) = 0.11/0.55 = 0.20.
In general,
P(Y = y_j |X = x_k) = P(Y = y_j, X = x_k)/P(X = x_k).
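The marginal and conditional probabilities above can be computed directly from the joint table:

```python
# joint distribution from the table (X: 0 = man, 1 = woman; Y: income in $000)
joint = {
    (0, 60): 0.02, (0, 70): 0.04, (0, 80): 0.07, (0, 90): 0.09, (0, 100): 0.10,
    (0, 110): 0.06, (0, 120): 0.03, (0, 130): 0.02, (0, 140): 0.01, (0, 150): 0.01,
    (1, 70): 0.01, (1, 80): 0.02, (1, 90): 0.04, (1, 100): 0.08, (1, 110): 0.11,
    (1, 120): 0.11, (1, 130): 0.09, (1, 140): 0.05, (1, 150): 0.03, (1, 160): 0.01,
}

def marginal_X(k):
    # sum the joint probabilities over all values of Y
    return sum(p for (x, y), p in joint.items() if x == k)

def conditional_Y(yj, k):
    # P(Y = yj | X = k) = P(Y = yj, X = k) / P(X = k)
    return joint.get((k, yj), 0.0) / marginal_X(k)

print(round(marginal_X(1), 2))            # 0.55
print(round(conditional_Y(110, 1), 2))    # 0.2
```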
We have computed the conditional distribution of Y |X = 0 and Y |X = 1.
A conditional distribution has a mean, variance and other moments. The mean is
n
E(Y |X = xk ) = ∑ y j P(y j |X = xk ).
j=1
Variance and other higher moments of the conditional distribution can be computed similarly.
For the table above, E(Y|X = 1) = 64.0/0.55 ≈ 116.4.
Many variables we come across in economics are continuous in nature, as opposed to discrete. In assigning probabilities to continuous variables, we face the problem that no matter how small the interval of values of the continuous variable is, there are infinitely many points in it. If we assigned a positive probability to each point, the sum of such probabilities would diverge, which violates the axiom of probability theory that the probabilities should add up to one.
This problem is circumvented by assigning probabilities to segments of the interval within which the random variable is defined, for example
P(X ≤ 5), or P(−4 < X ≤ 2).
Example 23.9. A simple example of a continuous random variable is the uniform distribution.
Variable X can take any value between a and b and the probability of X falling within the segment
[a, c] is proportional to the length of the interval compared to the interval [a, b].
P(a < X ≤ c) = (c − a)/(b − a).
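The defining property of the uniform distribution is easy to confirm by simulation (a sketch with illustrative endpoints):

```python
import random

random.seed(0)
a, b, c = 2.0, 10.0, 5.0          # illustrative endpoints with a < c < b
n = 200000
draws = [random.uniform(a, b) for _ in range(n)]
empirical = sum(1 for x in draws if x <= c) / n
theoretical = (c - a) / (b - a)   # = 0.375
# the empirical frequency should be close to the theoretical probability
print(abs(empirical - theoretical) < 0.01)   # True
```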
(c) F(−∞) = lim_{x→−∞} F(x) = 0, and F(∞) = lim_{x→∞} F(x) = 1.
These conditions are the counterpart of the discrete case and entail that probability is always non-negative and that the probabilities sum to one.
Now we define the probability model for continuous random variables. Consider the extended
real line R = R ∪ {−∞, ∞} which shall play the same role for the continuous variables as Ω plays
for the discrete variables, (the set of all possible outcomes). Consider the half closed intervals on
R,
(a, b] = {x ∈ R : a < x ≤ b},
and form finite sums of such intervals provided the intervals are disjoint:
A = Σ_{j=1}^{n} (a_j, b_j], n < ∞.
The set consisting of all such sums plus the empty set ∅ is an algebra, but it is not a σ-algebra. The smallest σ-algebra that contains this set is called the Borel σ-algebra and is denoted by B(R). Finally we define the probability measure via the distribution function
F(x) = P((−∞, x]).
The triple (R, B(R), P) is our probability model for continuous random variables.
Chapter 24
Solution to PS 1
A B A ∧ B A ∨ B ∼ (A ∧ B) ∼ A ∼ B ∼ A∨ ∼ B ∼ (A ∨ B) ∼ A∧ ∼ B
1 2 3 4 5 6 7 8 9 10
T T T T F F F F F F
T F F T T F T T F F
F T F T T T F T F F
F F F F T T T T T T
A B A ⇒ B ∼ (A ⇒ B) ∼ B A∧ ∼ B
1 2 3 4 5 6
T T T F F F
T F F T T T
F T T F F F
F F T F T F
(4) (a) The mistake is in assuming the same value of k for m and n. The correct proof should be
Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2p + 1 for some
integers k and p. Therefore, 2m + 3n = 2(2k) + 3(2p + 1) = 4k + 6p + 3 = 2(2k + 3p +
1) + 1 = 2l + 1; where l = 2k + 3p + 1. Since k, p ∈ Z, l ∈ Z. Hence, 2m + 3n = 2l + 1 for
some integer l, whence 2m + 3n is an odd integer.
(b) The mistake is in showing the claim for one particular value of n. The claim holds for all
positive integers. The correct proof should be
Proof. Assume, to the contrary, that there is a greatest negative real number x. Then, x ≥ y
for every negative real number y. Consider the number x/2. Since x is a negative real number, so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is negative, gives x/2 > x. Hence, x/2 is a negative real number that is greater than x, which is a contradiction.
Hence our assumption that there is a greatest negative real number is false. Thus there is
no greatest negative real number.
(c) The product of an irrational number and a nonzero rational number is irrational.
Proof. Assume, to the contrary, that there exists a non-zero rational number p and an irrational number q whose product is a rational number. Thus, by the definition of rational numbers, p = a/b and p · q = r = c/d for some integers a, b, c and d with a ≠ 0, b ≠ 0 and d ≠ 0. Hence,
q = r/p = (c/d)/(a/b) = bc/ad.
Since bc and ad are integers with ad ≠ 0, q is a rational number, which contradicts the assumption that q is irrational.
and assume that P(k) is true; that is, assume that 1³ + · · · + k³ = [k(k + 1)/2]². For the inductive step, we need to show that P(k + 1) is true. That is, we show that
1³ + 2³ + · · · + k³ + (k + 1)³ = [(k + 1)(k + 2)/2]².
Evaluating the left-hand side of this equation, we have
1³ + · · · + k³ + (k + 1)³ = (1³ + · · · + k³) + (k + 1)³
= [k(k + 1)/2]² + (k + 1)³ (by the inductive hypothesis)
= (k + 1)² [k²/4 + 4(k + 1)/4]
= [(k + 1)/2]² [k² + 4k + 4] = [(k + 1)/2]² (k + 2)²
= [(k + 1)(k + 2)/2]²,
thus verifying that P(k + 1) is true.
(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,
1³ + · · · + n³ = [n(n + 1)/2]²
is true for every positive integer n.
(d) It is an example of an arithmetic-geometric series. Let us denote the sum by S, i.e.
(7) To show that the formula holds for n = 0, we must show that
Σ_{i=0}^{0} r^i = (r^{0+1} − 1)/(r − 1).
The left-hand side of this equation is Σ_{i=0}^{0} r^i = r⁰ = 1, while the right-hand side is (r^{0+1} − 1)/(r − 1) = 1, since r ≠ 1. Hence the formula holds for n = 0. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 0 and assume that Σ_{i=0}^{k} r^i = (r^{k+1} − 1)/(r − 1). For the inductive step, we need to show that Σ_{i=0}^{k+1} r^i = (r^{k+2} − 1)/(r − 1). Evaluating the left-hand side of this equation, we have
Σ_{i=0}^{k+1} r^i = Σ_{i=0}^{k} r^i + r^{k+1} (writing the (k + 1)st term separately)
= (r^{k+1} − 1)/(r − 1) + r^{k+1} (by the inductive hypothesis)
= (r^{k+1} − 1)/(r − 1) + (r − 1) r^{k+1}/(r − 1)
= (r^{k+1} − 1 + r^{k+2} − r^{k+1})/(r − 1)
= (r^{k+2} − 1)/(r − 1),
thus verifying the claim. Hence, by the principle of mathematical induction, the formula is true
for all integers n ≥ 0.
In the limiting case of n → ∞, the sum is well-defined for |r| < 1, and in this case it equals 1/(1 − r). In case |r| ≥ 1 it is not well defined as n → ∞, though it is defined for all n ∈ N.
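The finite-sum formula and its |r| < 1 limit are easy to check numerically:

```python
def geometric_sum(r, n):
    # direct evaluation of 1 + r + r^2 + ... + r^n
    return sum(r ** i for i in range(n + 1))

r = 0.5
closed_form = (r ** 21 - 1) / (r - 1)   # the formula with n = 20
print(abs(geometric_sum(r, 20) - closed_form) < 1e-12)    # True
# for |r| < 1 the sum approaches 1/(1 - r) = 2 as n grows
print(abs(geometric_sum(r, 200) - 1 / (1 - r)) < 1e-9)    # True
```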
(8) (a) We proceed by mathematical induction. When n = 2, the result is true since in this case
n3 − n = 23 − 2 = 8 − 2 = 6 and 6 is divisible by 6. Hence, the base case when n = 2 is
true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2
and assume that the property holds for n = k, i.e., suppose that k3 − k is divisible by 6. For
the inductive step, we must show that the property holds for n = k + 1. That is, we must
show that (k + 1)3 − (k + 1) is divisible by 6. Since k3 − k is divisible by 6, there exists,
by definition of divisibility, an integer r such that k3 − k = 6r. Now, by the laws of algebra
and the inductive hypothesis, it follows that
(k + 1)³ − (k + 1) = k³ + 3k² + 3k + 1 − k − 1 = (k³ − k) + 3k(k + 1) = 6r + 3k(k + 1).
Now, k(k + 1) is a product of two consecutive integers, and is therefore even. Hence,
k(k + 1) = 2s for some integer s. Thus, 6r + 3k(k + 1) = 6r + 3(2s) = 6(r + s), and so, by
substitution, (k + 1)3 − (k + 1) = 6(r + s), which is divisible by 6. Therefore, (k + 1)3 −
(k + 1) is divisible by 6, as desired. Hence, by the principle of mathematical induction, the
property holds for all integers n ≥ 2.
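A brute-force check over a finite range can complement the induction argument:

```python
# Direct verification that n^3 - n is divisible by 6 for n = 2, ..., 999.
for n in range(2, 1000):
    assert (n**3 - n) % 6 == 0
```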
(b) We proceed, as before, by mathematical induction. When n = 3, the inequality holds since
in this case 2n = 23 = 8 and 2n + 1 = 2 · 3 + 1 = 7, and 8 > 7. Hence, the base case when
n = 3 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that
$k \ge 3$ and assume that the inequality holds for n = k, i.e., suppose that $2^k > 2k + 1$. For
the inductive step, we must show that the inequality holds for n = k + 1. That is, we must
show that $2^{k+1} > 2(k+1) + 1$. Now,
$$\begin{aligned}
2^{k+1} &= 2 \cdot 2^k \\
&> 2 \cdot (2k + 1) && \text{(by the inductive hypothesis)} \\
&= 2(k+1) + 2k \\
&> 2(k+1) + 1 && \text{(since } k \ge 3\text{)},
\end{aligned}$$
as desired. Hence, by the principle of mathematical induction, the inequality holds for all
integers n ≥ 3.
(9) By the Quotient Remainder theorem, with d = 6, every natural number m can be written as m = 6n + r, where
n is an integer and r ∈ {0, 1, 2, 3, 4, 5}. Since m is prime (and greater than 3), it cannot be of the form 6n (divisible
by 6), 6n + 2 or 6n + 4 (divisible by 2), or 6n + 3 (divisible by 3). Thus the only remaining
possibilities are 6n + 1 and 6n + 5.
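The same conclusion can be confirmed by direct enumeration; `is_prime` below is a simple trial-division helper written only for this check.

```python
# Every prime p > 3 leaves remainder 1 or 5 when divided by 6.

def is_prime(m):
    if m < 2:
        return False
    return all(m % d for d in range(2, int(m**0.5) + 1))

for p in (x for x in range(5, 10_000) if is_prime(x)):
    assert p % 6 in (1, 5)
```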
(10) We solve $|9 - 5x| \le 11$, which holds if and only if both $9 - 5x \le 11$ and $-(9 - 5x) \le 11$:
$$9 - 5x \le 11 \;\Rightarrow\; -5x \le 2 \;\Rightarrow\; x \ge -\frac{2}{5},$$
$$-(9 - 5x) \le 11 \;\Rightarrow\; 5x \le 20 \;\Rightarrow\; x \le 4.$$
Hence the solution set is $-\frac{2}{5} \le x \le 4$.
Chapter 25
Solution to PS 2
(1) We need to verify that it satisfies three conditions of the distance function.
(a) (i) Non-negativity is obvious, as the absolute value is non-negative. If $x = y$, then $d(x,y) = 0$. Also if
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i| = 0,$$
then $x_i - y_i = 0$ for all $i = 1, \cdots, n$. This implies that $x = y$.
(ii) Symmetry is obvious too, since the absolute value function is symmetric: $|a - b| = |b - a|$.
(iii) Triangle Inequality I: Note that $\max\{a, b\} \ge a$ and $\max\{a, b\} \ge b$. Using this we have
$$d(x, y) \ge |x_1 - y_1| \quad\text{and}\quad d(x, y) \ge |x_2 - y_2|,$$
$$d(y, z) \ge |y_1 - z_1| \quad\text{and}\quad d(y, z) \ge |y_2 - z_2|,$$
$$d(x, y) + d(y, z) \ge |x_1 - y_1| + |y_1 - z_1| \ge |x_1 - z_1|,$$
$$d(x, y) + d(y, z) \ge |x_2 - y_2| + |y_2 - z_2| \ge |x_2 - z_2|.$$
It follows that
$$d(x, y) + d(y, z) \ge \max\{|x_1 - z_1|, |x_2 - z_2|\} = d(x, z).$$
Hence it is a distance function.
(iv) Triangle Inequality II: Consider the case when d(x, z) = |x1 − z1 |, i.e., |x1 − z1 | ≥
|x2 − z2 |.
Then using triangle inequality for the absolute value function,
d(x, z) = |x1 − z1 | ≤ |x1 − y1 | + |y1 − z1 |
≤ d(x, y) + d(y, z)
The inequality in second line follows from the fact that either d(x, y) = |x1 − y1 | or
d(x, y) > |x1 − y1 |. Similar observations hold for d(y, z). The second case of d(x, z) =
|x2 − z2 | will be similar. Hence it is a distance function.
(c) (i) Non-negativity: d(x, y) ≥ 0 for all x, y in Rn , and thus 1 + d(x, y) ≥ 1 for all x, y in
Rn . As a result, d1 (x, y) ≥ 0 for all x, y in Rn .
By the definition of d1 (x, y), d1 (x, y) = 0 if and only if d(x, y) = 0. But d(x, y) = 0 if
and only if x = y.
(ii) Since d(x, y) = d(y, x), it is straightforward to see that d1 (x, y) = d1 (y, x).
(iii) Triangle Inequality I
$$d_1(x,z) \le d_1(x,y) + d_1(y,z) \iff \frac{d(x,z)}{1 + d(x,z)} \le \frac{d(x,y)}{1 + d(x,y)} + \frac{d(y,z)}{1 + d(y,z)}$$
$$\iff d(x,z)[1 + d(x,y)][1 + d(y,z)] \le d(x,y)[1 + d(x,z)][1 + d(y,z)] + d(y,z)[1 + d(x,y)][1 + d(x,z)]$$
$$\iff d(x,z) \le d(x,y) + d(y,z) + 2d(x,y)d(y,z) + d(x,y)d(y,z)d(x,z).$$
Since $d(x,y) + d(y,z) \ge d(x,z)$ and $d(a,b) \ge 0$ for any $(a,b) \in \mathbb{R}^n \times \mathbb{R}^n$, the last inequality is always true. Thus $d_1(x,z) \le d_1(x,y) + d_1(y,z)$ for all x, y, z in $\mathbb{R}^n$.
Alternatively, note that for $a, b \ge 0$,
$$a \le b \;\Rightarrow\; a + ab \le b + ab \;\Rightarrow\; a(1+b) \le b(1+a) \;\Rightarrow\; \frac{a}{1+a} \le \frac{b}{1+b}.$$
Applying this with $a = d(x,z)$ and $b = d(x,y) + d(y,z)$,
$$\frac{d(x,z)}{1 + d(x,z)} \le \frac{d(x,y) + d(y,z)}{1 + d(x,y) + d(y,z)} \le \frac{d(x,y)}{1 + d(x,y)} + \frac{d(y,z)}{1 + d(y,z)},$$
so that
$$d_1(x,z) \le d_1(x,y) + d_1(y,z).$$
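A random spot-check of the triangle inequality for $d_1 = d/(1+d)$, here with the absolute-value metric on $\mathbb{R}$, can be run as follows (sample size and range are arbitrary):

```python
import random

random.seed(0)
d = lambda a, b: abs(a - b)                # base metric
d1 = lambda a, b: d(a, b) / (1 + d(a, b))  # bounded metric d/(1+d)

for _ in range(10_000):
    x, y, z = (random.uniform(-50, 50) for _ in range(3))
    # small tolerance guards against floating-point rounding
    assert d1(x, z) <= d1(x, y) + d1(y, z) + 1e-12
```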
(2) It is bounded: take $B = 2$; then $\|x\| \le 2$ for all $x \in \bigcup_{n=1}^{\infty} \left[\frac{1}{n}, \frac{2}{n}\right]$. But it is NOT closed, as
$$\bigcup_{n=1}^{\infty} \left[\frac{1}{n}, \frac{2}{n}\right] = (0, 2].$$
So it is not compact.
(3)
$(A \cup B)^c \subseteq A^c \cup B^c$ is TRUE. Let $x \in (A \cup B)^c$. Then
$$x \notin (A \cup B) \;\Rightarrow\; x \notin A \wedge x \notin B \;\Rightarrow\; x \in A^c \wedge x \in B^c \;\Rightarrow\; x \in A^c \cup B^c.$$
$(A \cup B)^c \supseteq A^c \cup B^c$ is FALSE. Let $x \in A^c \cup B^c$ with $x \in A^c$ and $x \notin B^c$. Then
$$x \notin A \wedge x \in B \;\Rightarrow\; x \in A \cup B \;\Rightarrow\; x \notin (A \cup B)^c.$$
(6) It is enough to show that one of the properties of the vector space is not satisfied by this space.
Take scalar multiplication by 2. Let $(x_1, x_2) \in C$ and let $\alpha = 2$ be a scalar. Then
$$(2x_1)^2 + (2x_2)^2 = 4(x_1^2 + x_2^2) = 4 \neq 1.$$
Hence $(2x_1, 2x_2) \notin C$ and so $C$ is not a vector space.
(7) In this case the commutative property of the sum of vectors does not hold. Consider a = (2, 3)
and b = (4, 5).
Then a + b = (2 + 4, 3 − 5) = (6, −2) and b + a = (4 + 2, 5 − 3) = (6, 2). Hence
(2, 3) + (4, 5) ̸= (4, 5) + (2, 3).
So V is not a vector space.
(8) In this case also, the commutative property of the sum of vectors does not hold. Consider as
before, a = (2, 3) and b = (4, 5).
Then a + b = (2 + 2 × 4, 3 + 3 × 5) = (10, 18) and b + a = (4 + 2 × 2, 5 + 3 × 3) = (8, 14).
Hence
(2, 3) + (4, 5) ̸= (4, 5) + (2, 3).
So V is not a vector space.
(i)
(25.1) $(J \cap K)^c \subseteq J^c \cup K^c$.
Let $x \in (J \cap K)^c$. Then
$$x \notin (J \cap K) \;\Rightarrow\; x \notin J \vee x \notin K \;\Rightarrow\; x \in J^c \vee x \in K^c \;\Rightarrow\; x \in J^c \cup K^c.$$
(ii) Next, $J^c \cup K^c \subseteq (J \cap K)^c$. Let $x \in J^c \cup K^c$. Then
$$x \in J^c \vee x \in K^c \;\Rightarrow\; x \notin J \vee x \notin K \;\Rightarrow\; x \notin J \cap K \;\Rightarrow\; x \in (J \cap K)^c.$$
(b) $(J \cup K)^c = J^c \cap K^c$.
(25.2) $(J \cup K)^c \subseteq J^c \cap K^c$: Let $x \in (J \cup K)^c$. Then
$$x \notin (J \cup K) \;\Rightarrow\; x \notin J \wedge x \notin K \;\Rightarrow\; x \in J^c \wedge x \in K^c \;\Rightarrow\; x \in J^c \cap K^c.$$
Next, $J^c \cap K^c \subseteq (J \cup K)^c$: Let $x \in J^c \cap K^c$. Then
$$x \in J^c \wedge x \in K^c \;\Rightarrow\; x \notin J \wedge x \notin K \;\Rightarrow\; x \notin J \cup K \;\Rightarrow\; x \in (J \cup K)^c.$$
(11) We know that if a sequence is convergent then it is bounded. The contrapositive statement
will be, “If a sequence is not bounded then it is not convergent.”. The sequence xn = n, n ∈ N
is NOT bounded. No matter which B we choose as a bound, there will be a natural number
greater than it. We now use the contrapositive to conclude that $\{x_n\}_{n=1}^{\infty}$ is not convergent.
(12) Since {xn } is a Cauchy sequence, for ∀ε > 0, there exist N ∈ N such that ∀m, n > N implies
that |xn − xm | < ε. Choose ε = 1, m = N, then
second sequence $\{y_n\}$ converges to zero. Hence, the sequence, being the sum of two convergent sequences, converges to the sum of the limits, which is equal to 2 + 0 = 2. Since the limit of a convergent sequence is unique, 1 cannot be a limit.
(14) We consider a monotone increasing sequence $x_n \le x_{n+1}$. The proof is analogous for the monotone decreasing case. Let $\{x_n\}$ be a convergent sequence and let $\lim_{n \to \infty} x_n = x$. From the definition of convergence, with $\varepsilon = 1$, we get $N \in \mathbb{N}$ such that $n > N$ implies $|x_n - x| < 1$. Then,
$$x_n < 1 + |x|, \quad \forall n > N.$$
Let
$$B = \max\left\{|x_1|, |x_2|, \cdots, 1 + |x|\right\},$$
then $x_n \le B$, $\forall n \in \mathbb{N}$. Now let the sequence be bounded. Let $x$ be the least upper bound. Then $x_n \le x$ $\forall n \in \mathbb{N}$. For every $\varepsilon > 0$, there exists an $N \in \mathbb{N}$ such that $x - \varepsilon < x_N \le x$; otherwise $x - \varepsilon$ would be an upper bound for the sequence. Since $x_n$ is increasing, $n > N$ implies
$$x - \varepsilon < x_n \le x,$$
which shows that $x_n$ converges to $x$.
(15) (i) S = (0, 1) Open: For any x ∈ (0, 1), open ball with radius min {x, 1 − x} is contained in S.
(ii) S = [0, 1] Closed: Use the theorem: A set $S \subseteq \mathbb{R}^n$ is closed if and only if every convergent sequence of points $\{x_n\} \in S$ has its limit $x \in S$. Let $\{x_n\}$ be a convergent sequence with limit $x$, contained in $S$; then for all $n$, $x_n \ge 0$ and $x_n \le 1$. Since weak inequalities are preserved in the limit, $x \le 1$ and $x \ge 0$. So $x \in S$ and $S$ is closed.
(iii) S = [0, 1): Neither open nor closed. It is not closed, since the limit of the convergent sequence $\left\{1 - \frac{1}{n}\right\}$ is not contained in S; and it is not open, since $x = 0$ is contained in S but is not an interior point of S.
Chapter 26
Solution to PS 3
(1)
$$AB = \begin{bmatrix} 1 & -1 & 7 \\ 0 & 8 & 10 \end{bmatrix} \cdot \begin{bmatrix} 9 & 6 & 5 & 4 \\ 1 & -2 & -3 & 3 \\ 0 & 1 & -1 & 2 \end{bmatrix}$$
$$= \begin{bmatrix} 1 \cdot 9 - 1 \cdot 1 + 7 \cdot 0 & 1 \cdot 6 + 1 \cdot 2 + 7 \cdot 1 & 1 \cdot 5 + 1 \cdot 3 - 7 \cdot 1 & 1 \cdot 4 - 1 \cdot 3 + 7 \cdot 2 \\ 0 \cdot 9 + 8 \cdot 1 + 10 \cdot 0 & 0 \cdot 6 - 8 \cdot 2 + 10 \cdot 1 & 0 \cdot 5 - 8 \cdot 3 - 10 \cdot 1 & 0 \cdot 4 + 8 \cdot 3 + 10 \cdot 2 \end{bmatrix}$$
(26.1)
$$= \begin{bmatrix} 8 & 15 & 1 & 15 \\ 8 & -6 & -34 & 44 \end{bmatrix}$$
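The product (26.1) can be confirmed with NumPy:

```python
import numpy as np

A = np.array([[1, -1,  7],
              [0,  8, 10]])
B = np.array([[9,  6,  5, 4],
              [1, -2, -3, 3],
              [0,  1, -1, 2]])

AB = A @ B
expected = np.array([[8, 15,   1, 15],
                     [8, -6, -34, 44]])
assert np.array_equal(AB, expected)
```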
(3) Recall the property of determinant: If we multiply any column of the matrix by scalar k, then
the determinant of the new matrix is k times the determinant of the original matrix.
Since the matrix $-2A$ is obtained by multiplying each column of the matrix $A$ (having five rows and five columns) by $-2$, the determinant of $-2A$ is $(-2)^5$ times the determinant of $A$. Thus
$$\det(-2A) = (-2)^5 \det A = (-32)(-1) = 32.$$
(4) Recall the rank of a matrix A is the number of linearly independent column vectors of A. It is
also equal to the number of linearly independent row vectors of A.
$$A = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 1 & 7 \\ 5 & 4 & -1 \end{bmatrix}$$
Taking the first two columns, $\lambda_1 (3, 0, 5)' + \lambda_2 (2, 1, 4)' = 0$ gives $3\lambda_1 + 2\lambda_2 = 0$, $\lambda_2 = 0$ and $5\lambda_1 + 4\lambda_2 = 0$, so
(26.4) $\lambda_1 = 0, \; \lambda_2 = 0$
is the only solution. So the first two columns are linearly independent. Now let's take all three columns,
$$\lambda_1 \begin{pmatrix} 3 \\ 0 \\ 5 \end{pmatrix} + \lambda_2 \begin{pmatrix} 2 \\ 1 \\ 4 \end{pmatrix} + \lambda_3 \begin{pmatrix} 1 \\ 7 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
3λ1 + 2λ2 + λ3 = 0 (i)
⇔ λ2 + 7λ3 = 0 (ii)
5λ1 + 4λ2 − λ3 = 0 (iii)
(i) $- 2$ (ii): $3\lambda_1 - 13\lambda_3 = 0$
(iii) $- 4$ (ii): $5\lambda_1 - 29\lambda_3 = 0$
(26.5) So $\lambda_1 = 0, \; \lambda_3 = 0 \;\rightarrow\; \lambda_2 = 0$
is the only solution. So all three columns are linearly independent. This implies that the rank
of matrix A is 3.
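NumPy confirms that the rank is 3:

```python
import numpy as np

A = np.array([[3, 2,  1],
              [0, 1,  7],
              [5, 4, -1]])

assert np.linalg.matrix_rank(A) == 3
# Equivalently, a square matrix has full rank iff its determinant is non-zero.
assert abs(np.linalg.det(A)) > 1e-9
```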
(6)
A11 = 2 > 0, A11 A22 − A12 A21 = 2 · 1 − 1 = 1 > 0: PD
B11 > 0, B22 > 0, B11 B22 − B12 B21 = 2 · 8 − 16 = 0: PSD
C11 < 0, C11C22 −C12C21 = −3 · 5 − 16 < 0 : Indefinite
D11 < 0, D11 D22 − D12 D21 = −3 · (−6) − 16 > 0: ND
(7) Let $A_t$ and $B_t$ denote the number of employed and unemployed people, respectively, in some period t. The transition probabilities are defined as follows.
$p_{AA}$ ≡ probability that a current A remains an A,
$p_{AB}$ ≡ probability that a current A moves to B,
$p_{BA}$ ≡ probability that a current B moves to A,
$p_{BB}$ ≡ probability that a current B remains a B,
which is
$$[A_t \;\; B_t] \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix} = [(0.9A_t + 0.7B_t) \;\; (0.1A_t + 0.3B_t)] = [A_{t+1} \;\; B_{t+1}].$$
In a similar manner we can determine the distribution of employees after two periods, $x'_{t+1} \cdot M = x'_{t+2}$:
$$[A_{t+1} \;\; B_{t+1}] \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix} = [A_{t+2} \;\; B_{t+2}]$$
$$[A_t \;\; B_t] \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix}^2 = [A_{t+2} \;\; B_{t+2}]$$
In general, for n periods,
(26.10)
$$[A_t \;\; B_t] \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix}^n = [A_{t+n} \;\; B_{t+n}].$$
The initial distribution of employees across the two states at time t = 0 is
$$x_0' = [A_0 \;\; B_0] = [0 \;\; 2000].$$
Then the distribution of employees in the next period t = 1 is
$$[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix} = [1400 \;\; 600] = [A_1 \;\; B_1].$$
The distribution after two periods is
$$[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix}^2 = [0 \;\; 2000] \begin{bmatrix} 0.88 & 0.12 \\ 0.84 & 0.16 \end{bmatrix} = [1680 \;\; 320] = [A_2 \;\; B_2].$$
The distribution after four periods is
$$[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix}^4 = [0 \;\; 2000] \begin{bmatrix} 0.8752 & 0.1248 \\ 0.8736 & 0.1264 \end{bmatrix} = [1747 \;\; 253] = [A_4 \;\; B_4].$$
Observe that when the transition matrix is raised to higher powers, the new transition matrix
converges to a matrix whose rows are identical. This is referred to as the steady state. In this
example, the steady state would be
$$\bar{M} = \begin{bmatrix} \frac{7}{8} & \frac{1}{8} \\[2pt] \frac{7}{8} & \frac{1}{8} \end{bmatrix}.$$
(b) Note
$$\det A' = \det A.$$
Also, matrix $-A$ is obtained by multiplying each row (or each column) of matrix $A$ by $-1$. Hence,
$$\det(-A) = (-1)^n \det A = -\det A,$$
if n is an odd number. Thus, since $A' = -A$,
$$\det A = \det A' = \det(-A) = -\det A.$$
This leads to
$$\det A = 0,$$
and therefore A is not invertible.
(c) Note
$$\det A' = \det A,$$
and since
$$\det[AA'] = \det A \times \det A' = \det A \times \det A = [\det A]^2 = \det I = 1,$$
we get
$$\det A = \pm 1.$$
(d) As we have seen in part (b), for n an odd integer,
det AB = det A × det B = (−1)n det BA = (−1)n det B det A = − det A × det B,
implies
det A × det B = 0.
This means either det A = 0 (i. e., A is not invertible) or det B = 0 (i.e., B is not invertible).
(e) Since
det AB = det A × det B = det I = 1,
det A ̸= 0 and therefore A is invertible. Pre-multiplying both sides by A−1 , we get
A−1 AB = IB = B = A−1 I = A−1 ,
showing that A−1 = B.
(9) (a) The characteristic polynomial is obtained by taking the determinant of the matrix
$$A - \lambda I = \begin{bmatrix} 4 - \lambda & 4 & 4 \\ -2 & -3 - \lambda & -6 \\ 1 & 3 & 6 - \lambda \end{bmatrix}$$
This is equal to
$$(4 - \lambda)(-18 - 3\lambda + \lambda^2 + 18) - 4(-12 + 2\lambda + 6) + 4(-6 + 3 + \lambda) = 0.$$
On simplification we get
$$(4 - \lambda)(-3\lambda + \lambda^2) - 4(-6 + 2\lambda) + 4(-3 + \lambda) = 0,$$
or
$$-12\lambda + 4\lambda^2 + 3\lambda^2 - \lambda^3 + 24 - 8\lambda - 12 + 4\lambda = 0,$$
$$12 - 16\lambda + 7\lambda^2 - \lambda^3 = 0.$$
(b) The characteristic polynomial is of degree three and hence has three solutions (possibly
repeated). The solutions are λ1 = 3, λ2 = 2 and λ3 = 2.
(c) Eigenvector for $\lambda = 3$:
$$[A - \lambda I]x = \begin{bmatrix} 1 & 4 & 4 \\ -2 & -6 & -6 \\ 1 & 3 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
It is easy to check that $x_1 = 0, x_2 = -1, x_3 = 1$ is a solution. Hence the eigenvector family is given by
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \quad t \neq 0.$$
Eigenvector for $\lambda = 2$:
$$[A - \lambda I]x = \begin{bmatrix} 2 & 4 & 4 \\ -2 & -5 & -6 \\ 1 & 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
It is easy to check that $x_1 = 2, x_2 = -2, x_3 = 1$ is a solution. Hence the eigenvector family is given by
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}, \quad t \neq 0.$$
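The eigenvalues and the eigenvectors found above can be checked with NumPy; a looser tolerance is used on the eigenvalues because $\lambda = 2$ is a repeated (defective) root, which numerical routines recover only approximately.

```python
import numpy as np

A = np.array([[ 4,  4,  4],
              [-2, -3, -6],
              [ 1,  3,  6]])

eigvals = np.sort(np.linalg.eigvals(A).real)
assert np.allclose(eigvals, [2, 2, 3], atol=1e-5)

v3 = np.array([0, -1, 1])   # eigenvector for lambda = 3
v2 = np.array([2, -2, 1])   # eigenvector for lambda = 2
assert np.allclose(A @ v3, 3 * v3)
assert np.allclose(A @ v2, 2 * v2)
```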
(10) (a) We use the Result 7.1 to prove this. The determinant of the upper triangular matrix is equal
to the product of all the diagonal terms. By definition of eigenvalue, it is clear that if we
take λi = aii , then the determinant of the matrix [A − λi I] is zero since the diagonal entry
in row i or column i is zero.
Similar arguments can be used to prove the result for the lower triangular matrix.
(b) Since A is an invertible matrix, $A^{-1}$ exists and we can pre-multiply the equation $(A - \lambda I)x = 0$ by $A^{-1}$. This yields $(I - \lambda A^{-1})x = 0$, or $(\frac{1}{\lambda} I - A^{-1})x = 0$, or $(A^{-1} - \frac{1}{\lambda} I)x = 0$, as desired. Thus for an invertible matrix A, $\lambda$ is an eigenvalue of A if and only if $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$.
(c) Assume $\lambda$ is the eigenvalue for the eigenvector x; we know
$$Ax = \lambda x.$$
Pre-multiplying both sides by A, we get
$$A \cdot Ax = A \cdot \lambda x = \lambda Ax = \lambda \cdot \lambda x = \lambda^2 x.$$
Chapter 27
Solution to PS 4
(1) (a)
$$f(x) = \left[\frac{2x+1}{x-1}\right]^{\frac{1}{2}}$$
$$f'(x) = \frac{1}{2}\left[\frac{2x+1}{x-1}\right]^{-\frac{1}{2}} \cdot \frac{(x-1) \cdot 2 - (2x+1) \cdot 1}{(x-1)^2} = -\frac{3}{2}\left[\frac{x-1}{2x+1}\right]^{\frac{1}{2}} \frac{1}{(x-1)^2}$$
(27.1)
$$= -\frac{3}{2(2x+1)^{\frac{1}{2}}(x-1)^{\frac{3}{2}}}$$
(b)
y = f (x0 ) + f ′ (x0 ) (x − x0 ) .
(3) (a)
$$\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) \neq f(x_0)$$
(5)
$$\nabla f(x, y) = \left[\, 2xy + y^2 - 2y + 3 \;\;\; x^2 + 2xy - 2x \,\right]$$
$$H_f(x, y) = \begin{bmatrix} 2y & 2x + 2y - 2 \\ 2x + 2y - 2 & 2x \end{bmatrix}$$
(27.5)
$$H_f(1, 2) = \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix}$$
Show that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although f
is not continuous at (0, 0).
(a) Observe that for all (x, y) ̸= (0, 0), we get
(7) This exercise gives an example of a function with $D_{12} f(x,y) \neq D_{21} f(x,y)$. Let $f(x,y)$ be defined as
$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0) \\[4pt] 0 & \text{otherwise.} \end{cases}$$
and
$$D_2 f(x, 0) = \lim_{h \to 0} \frac{\dfrac{xh(x^2 - h^2)}{x^2 + h^2} - 0}{h} = \lim_{h \to 0} \frac{x(x^2 - h^2)}{x^2 + h^2} = x.$$
Therefore, the partial derivatives D1 f (x, y) and D2 f (x, y) are continuous at every point
in R2 . Since the real-valued function f has continuous partial derivatives at every point
(x, y) ∈ R2 , it is continuous at every point (x, y) ∈ R2 .
(c) Since f (x, y) is a rational function with non-zero denominator for (x, y) ̸= (0, 0), the second
order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in R2 and are
continuous everywhere in R2 except at (0, 0).
(d) Given D2 f (x, 0) = x we get D21 f (0, 0) = +1 and from D1 f (0, y) = −y we get D12 f (0, 0) =
−1.
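The asymmetry of the cross partials at the origin can be seen with finite differences; the step sizes below are arbitrary choices, with the outer step kept much larger than the inner one so the inner quotient is an accurate approximation of the first partial.

```python
def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

h = 1e-6   # inner difference step
k = 1e-2   # outer difference step

d1f = lambda y: (f(h, y) - f(-h, y)) / (2 * h)   # approximates D1 f(0, y)
d2f = lambda x: (f(x, h) - f(x, -h)) / (2 * h)   # approximates D2 f(x, 0)

d12 = (d1f(k) - d1f(-k)) / (2 * k)   # approximates D12 f(0, 0)
d21 = (d2f(k) - d2f(-k)) / (2 * k)   # approximates D21 f(0, 0)

assert abs(d12 - (-1.0)) < 1e-3
assert abs(d21 - 1.0) < 1e-3
```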
Chapter 28
Solution to PS 5
(1) Let $f(x)$ and $g(x)$ be two concave functions and let $h(x) = f(x) + g(x)$. Concavity of f and g imply, $\forall x, y \in D$, $\forall \lambda \in [0, 1]$,
$$\lambda f(x) + (1 - \lambda) f(y) \le f\left(\lambda x + (1 - \lambda) y\right),$$
$$\lambda g(x) + (1 - \lambda) g(y) \le g\left(\lambda x + (1 - \lambda) y\right).$$
Adding the two inequalities gives $\lambda h(x) + (1 - \lambda) h(y) \le h\left(\lambda x + (1 - \lambda) y\right)$, so h is concave.
(2) (a) False. Consider $A, B \subseteq \mathbb{R}$, $A = [0, 2]$, $B = [4, 6]$, $A \cup B = [0, 2] \cup [4, 6]$. Then $1 \in A \cup B$, $5 \in A \cup B$, but $\frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 5 = 3 \notin A \cup B$.
(b) True. If A and B are convex sets, then $A \cap B$ is convex. Let $x \in A \cap B$, $y \in A \cap B$ and $\lambda \in [0, 1]$. Then,
(28.1) λx + (1 − λ) y ∈ A as x, y ∈ A
(28.2) λx + (1 − λ) y ∈ B as x, y ∈ B
(28.3) ⇒ λx + (1 − λ) y ∈ A ∩ B.
Hence A ∩ B is convex.
(c) True. Let z, z′ ∈ C, and let 0 ≤ λ ≤ 1. By definition of C, there exist x, x′ ∈ A and y, y′ ∈ B,
such that z = x + y and z′ = x′ + y′ . We will show that λz + (1 − λ)z′ belongs to C. This
will establish that C is a convex set in Rn .
(3) Let x, y ∈ [0, 1] and let 0 ≤ λ ≤ 1. Clearly [λx + (1 − λ)y] ∈ [0, 1]. In order to prove the claim,
we will show that:
h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)
Using the definition of h,
(28.5) h(λx + (1 − λ)y) = f (λx + (1 − λ)y)g(λx + (1 − λ)y)
Since f and g are convex functions on [0, 1]
f (λx + (1 − λ)y) ≤ λ f (x) + (1 − λ) f (y), and
g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y).
Since f and g are non-negative valued functions,
f (λx + (1 − λ)y)g(λx + (1 − λ)y) ≤ [λ f (x) + (1 − λ) f (y)][λg(x) + (1 − λ)g(y)]
= λ2 f (x)g(x) + λ(1 − λ){ f (x)g(y) + g(x) f (y)}
(28.6) + (1 − λ)2 f (y)g(y)
Since f and g are increasing functions on [0, 1],
{ f (x) − f (y)}{g(x) − g(y)} ≥ 0
and so:
(28.7) f (x)g(x) + f (y)g(y) ≥ f (x)g(y) + f (y)g(x)
Using (28.7) in (28.6),
f (λx + (1 − λ)y)g(λx + (1 − λ)y) ≤ λ2 f (x)g(x) + λ(1 − λ){ f (x)g(x) + f (y)g(y)} + (1 − λ)2 f (y)g(y)
= λ f (x)g(x) + (1 − λ) f (y)g(y)
(28.8) = λh(x) + (1 − λ)h(y)
Using (28.8) in (28.5), we obtain:
h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)
which is the desired result.
(4) (a) Recall a monotone function of one variable is quasi-concave. Since f (x) = 3x + 4 is mono-
tone increasing, it is quasi-concave.
$$B(x, y) = \begin{bmatrix} 0 & y\exp(x) & \exp(x) \\ y\exp(x) & y\exp(x) & \exp(x) \\ \exp(x) & \exp(x) & 0 \end{bmatrix}$$
$$\det\big(B_1(x, y)\big) = \det\begin{bmatrix} 0 & y\exp(x) \\ y\exp(x) & y\exp(x) \end{bmatrix} = -y^2\exp(2x) < 0;$$
$$\det\big(B_2(x, y)\big) = \det\begin{bmatrix} 0 & y\exp(x) & \exp(x) \\ y\exp(x) & y\exp(x) & \exp(x) \\ \exp(x) & \exp(x) & 0 \end{bmatrix} = y\exp(3x) > 0.$$
These satisfy the sufficient condition for quasi-concavity,
$$(-1)^r \det\big(B_r(x)\big) > 0, \quad \forall r = 1, 2, \cdots, n; \; \forall x \in D.$$
$$B(x, y) = \begin{bmatrix} 0 & -2xy^3 & -3x^2y^2 \\ -2xy^3 & -2y^3 & -6xy^2 \\ -3x^2y^2 & -6xy^2 & -6x^2y \end{bmatrix}$$
(28.9)
$$\det\big(B_1(x, y)\big) = \det\begin{bmatrix} 0 & -2xy^3 \\ -2xy^3 & -2y^3 \end{bmatrix} = -4x^2y^6 \le 0$$
(28.10)
$$\det\big(B_2(x, y)\big) = \det\begin{bmatrix} 0 & -2xy^3 & -3x^2y^2 \\ -2xy^3 & -2y^3 & -6xy^2 \\ -3x^2y^2 & -6xy^2 & -6x^2y \end{bmatrix} = -30x^4y^7$$
Note the sign of $\det\big(B_2(x, y)\big)$ is not positive. Hence it is not quasi-concave.
(5) Let
(28.11)
$$f(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } 0 \le x \le \frac{1}{2} \\ 1 - x & \text{for } \frac{1}{2} \le x \le 1 \\ 0 & \text{for } x > 1 \end{cases}$$
(28.12)
$$g(x) = \begin{cases} 0 & \text{for } x \le 1 \\ x - 1 & \text{for } 1 \le x \le \frac{3}{2} \\ 2 - x & \text{for } \frac{3}{2} \le x \le 2 \\ 0 & \text{for } x > 2 \end{cases} \quad\text{and}$$
(28.13)
$$h(x) = f(x) + g(x).$$
[Fig. 1: graph of f(x); Fig. 2: graph of g(x).]
In the figures, the Fig. 1 and Fig. 2 functions are quasiconcave (each of them is first non-decreasing, then non-increasing), whereas the Fig. 3 function, which is the sum of the two, is not quasiconcave (it is not non-decreasing, is not non-increasing, and is not non-decreasing then non-increasing).
[Fig. 3: graph of h(x) = f(x) + g(x).]
(6) (i)
$$\nabla f(x, y, z) = \left[\, 24x^2 + 2y^2 \;\;\; 4xy \;\;\; -3z^2 \,\right]$$
(28.14)
$$H_f(x, y, z) = \begin{bmatrix} 48x & 4y & 0 \\ 4y & 4x & 0 \\ 0 & 0 & -6z \end{bmatrix}$$
Then f is not concave, as the principal minor $D_1 = 48x > 0$ (for x > 0). The bordered Hessian is
$$B(x, y, z) = \begin{bmatrix} 0 & 24x^2 + 2y^2 & 4xy & -3z^2 \\ 24x^2 + 2y^2 & 48x & 4y & 0 \\ 4xy & 4y & 4x & 0 \\ -3z^2 & 0 & 0 & -6z \end{bmatrix}$$
(28.15)
$$\det\big(B_1(x, y)\big) = \det\begin{bmatrix} 0 & 24x^2 + 2y^2 \\ 24x^2 + 2y^2 & 48x \end{bmatrix} = -576x^4 - 96x^2y^2 - 4y^4 \le 0$$
(28.16)
$$\det\big(B_2(x, y)\big) = \det\begin{bmatrix} 0 & 24x^2 + 2y^2 & 4xy \\ 24x^2 + 2y^2 & 48x & 4y \\ 4xy & 4y & 4x \end{bmatrix} = -2304x^5 - 384x^3y^2 + 48xy^4,$$
which could take both positive and negative values. Hence f(x, y, z) is neither quasiconcave nor quasiconvex.
(ii)
$$\nabla g(x, y) = \left[\, 1 - \exp(x) - \exp(x+y) \;\;\; 1 - \exp(x+y) \,\right]$$
(28.17)
$$H_g(x, y) = \begin{bmatrix} -\exp(x) - \exp(x+y) & -\exp(x+y) \\ -\exp(x+y) & -\exp(x+y) \end{bmatrix}.$$
Chapter 29
Solution to PS 6
(a) The rank of matrix A can be at most 2. This means that there can be at most two endogenous
variables. Also the second column of matrix A is a multiple (three times) of the first column
and the fourth column is a multiple of column one (−2 times). The remaining two columns
are one and three. The sub-matrix consisting of columns one and three has full rank, as its determinant is $-4$. So we can choose x and z as endogenous variables and the remaining two, y and w, as exogenous variables.
(b) The system of linear equations can be rewritten as under (with the choice of exogenous and endogenous variables made above):
(29.2)
$$\begin{bmatrix} 1 & 1 \\ 2 & -2 \end{bmatrix} \cdot \begin{Bmatrix} x \\ z \end{Bmatrix} = \begin{bmatrix} 1 - 3y + 2w \\ 3 - 6y + 4w \end{bmatrix}$$
Multiply the first equation by two and add it to the second to get,
(29.3)
$$4x = 5 - 12y + 8w$$
(29.4)
$$x = \frac{5 - 12y + 8w}{4} = \frac{5}{4} - 3y + 2w$$
Substitute the value of x in the first equation to get
$$z = 1 - 3y + 2w - \left(\frac{5}{4} - 3y + 2w\right) = -\frac{1}{4}.$$
The rank of matrix A can be at most 3. However, we observe that the third row is equal to the
sum of twice the second row and the first row. This means that the rank of matrix A cannot be
three. The sub matrix obtained by eliminating the third row of A (call it matrix B) is
(29.6)
$$B = \begin{bmatrix} -1 & 3 & -1 & 1 \\ 4 & -1 & 1 & 1 \end{bmatrix}$$
The determinant of the sub matrix of B obtained by eliminating the third and fourth column is
−11 which is non-zero. This sub-matrix has full rank. So we can choose x and y as endogenous
variables and the remaining two z and w as exogenous variables.
We can solve the set of equations to obtain
(29.7)
$$\begin{bmatrix} -1 & 3 \\ 4 & -1 \end{bmatrix} \cdot \begin{Bmatrix} x \\ y \end{Bmatrix} = \begin{bmatrix} z - w \\ 3 - z - w \end{bmatrix}$$
Solving the two equations we get
$$x = \frac{9 - 2z - 4w}{11}, \quad\text{and}\quad y = \frac{3 + 3z - 5w}{11}.$$
$$\left(\frac{dz}{dx}\right)_{(6,3,-3)} = -\left(\frac{D_1 f(x,y,z)}{D_3 f(x,y,z)}\right)_{(6,3,-3)} = -\left(\frac{2x}{3z^2}\right)_{(6,3,-3)} = -\frac{2(6)}{3(9)} = -\frac{4}{9},$$
and
$$\left(\frac{dz}{dy}\right)_{(6,3,-3)} = -\left(\frac{D_2 f(x,y,z)}{D_3 f(x,y,z)}\right)_{(6,3,-3)} = -\left(\frac{-2y}{3z^2}\right)_{(6,3,-3)} = \frac{2(3)}{3(9)} = \frac{2}{9}.$$
$$z = g(6,3) + \left(\frac{dz}{dx}\right)_{(6,3)}(6.1 - 6) + \left(\frac{dz}{dy}\right)_{(6,3)}(2.8 - 3) = -3 + \left(-\frac{4}{9}\right)(0.1) + \left(\frac{2}{9}\right)(-0.2) = -\frac{139}{45}.$$
(5) Consider the profit maximizing firm described in the Example 13.2. If p increases by ∆p and
w increases by ∆w, what will be the change in the optimal input amount x?
Note the first order condition for the profit maximization is
p f ′ (x) − w = 0.
Let $x^*$ denote the profit-maximizing input quantity and let $F(p, w, x) = p f'(x) - w$. Then
$$x = x^* + \left(\frac{dx}{dp}\right)\Delta p + \left(\frac{dx}{dw}\right)\Delta w = x^* - \left(\frac{D_1 F(p,w,x)}{D_3 F(p,w,x)}\right)\Delta p - \left(\frac{D_2 F(p,w,x)}{D_3 F(p,w,x)}\right)\Delta w$$
$$= x^* - \left(\frac{f'(x)}{p f''(x)}\right)\Delta p - \left(\frac{-1}{p f''(x)}\right)\Delta w = x^* - \left(\frac{f'(x)}{p f''(x)}\right)\Delta p + \left(\frac{1}{p f''(x)}\right)\Delta w.$$
(6) Consider $3x^2yz + xyz^2 = 96$ as defining x as an implicit function of y and z around the point $x = 2, y = 3, z = 2$.
(a) Let $F(x, y, z) = 3x^2yz + xyz^2 - 96 = 0$. Then
$$D_1 F(x, y, z) = 6xyz + yz^2 = 6(2)(3)(2) + 3(4) = 84 \neq 0,$$
and F(x, y, z) is a continuously differentiable function (being a polynomial). Hence we can apply the IFT to claim that there exists a function $x = f(y)$ in terms of y which is continuously differentiable, in the neighborhood of $(x, y, z) = (2, 3, 2)$. Also
$$\left(\frac{dx}{dy}\right)_{(2,3,2)} = -\left(\frac{D_2 F(x,y,z)}{D_1 F(x,y,z)}\right) = -\left(\frac{3x^2z + xz^2}{6xyz + yz^2}\right) = -\frac{3x^2 + xz}{6xy + yz} = -\frac{3(4) + 2(2)}{6(2)(3) + 3(2)} = -\frac{16}{42} = -\frac{8}{21}.$$
Then
$$x = 2 + \left(\frac{dx}{dy}\right)_{(2,3,2)}(3.1 - 3) = 2 + \left(-\frac{8}{21}\right)(0.1) = 2 - \frac{4}{105} = \frac{206}{105}.$$
(b) Solving $3x^2yz + xyz^2 = 96$ as a quadratic in x,
$$x = \frac{-yz^2 \pm \sqrt{y^2z^4 + 4 \cdot 3yz \cdot 96}}{2 \cdot 3yz} = -\frac{z}{6} \pm \sqrt{\frac{z^2}{36} + \frac{32}{yz}} = -\frac{1}{3} \pm \sqrt{\frac{1}{9} + \frac{16}{y}}$$
(substituting z = 2), which implies that
$$x = -\frac{1}{3} + \sqrt{\frac{1}{9} + \frac{16}{y}}$$
in the neighborhood of (2, 3, 2).
(c)
$$\left(\frac{dx}{dy}\right)_{(2,3,2)} = \frac{1}{2\sqrt{\frac{1}{9} + \frac{16}{y}}} \cdot \left(-\frac{16}{y^2}\right) = \frac{1}{2 \cdot \frac{7}{3}} \cdot \left(-\frac{16}{9}\right) = -\frac{8}{21}.$$
Then
$$x = 2 + \left(-\frac{8}{21}\right)(3.1 - 3) = 2 - \frac{8}{210} = \frac{412}{210}.$$
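Parts (a)-(c) can be cross-checked by solving the quadratic numerically:

```python
# Positive root of 3*y*z*x**2 + y*z**2*x - 96 = 0, with z fixed at 2.

def x_of_y(y, z=2.0):
    a, b, c = 3 * y * z, y * z**2, -96.0
    return (-b + (b**2 - 4 * a * c) ** 0.5) / (2 * a)

assert abs(x_of_y(3.0) - 2.0) < 1e-12    # passes through (x, y) = (2, 3)

# Central finite difference recovers dx/dy = -8/21 at y = 3.
eps = 1e-6
slope = (x_of_y(3.0 + eps) - x_of_y(3.0 - eps)) / (2 * eps)
assert abs(slope - (-8 / 21)) < 1e-6
```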
(d) The second method involves more computations.
Take $x$ and $x'$ such that $f(x) = y > 0$ and $f(x') = y' > 0$. Then
$$f(x) = y \;\Rightarrow\; \frac{1}{y} f(x) = 1 \;\Rightarrow\; f\left(\frac{x}{y}\right) = 1.$$
Similarly
$$f\left(\frac{x'}{y'}\right) = 1.$$
Take $\lambda \in (0, 1)$ and define
$$\theta = \frac{\lambda y}{\lambda y + (1 - \lambda) y'}.$$
Then
$$1 - \theta = \frac{(1 - \lambda) y'}{\lambda y + (1 - \lambda) y'}$$
and $\theta \in (0, 1)$. Function f is quasi-concave. So
$$f\left(\theta \frac{x}{y} + (1 - \theta) \frac{x'}{y'}\right) \ge \min\left\{f\left(\frac{x}{y}\right), f\left(\frac{x'}{y'}\right)\right\}$$
$$f\left(\frac{\lambda y}{\lambda y + (1 - \lambda) y'} \cdot \frac{x}{y} + \frac{(1 - \lambda) y'}{\lambda y + (1 - \lambda) y'} \cdot \frac{x'}{y'}\right) \ge \min\{1, 1\}$$
$$f\left(\frac{\lambda x}{\lambda y + (1 - \lambda) y'} + \frac{(1 - \lambda) x'}{\lambda y + (1 - \lambda) y'}\right) \ge 1$$
$$\frac{1}{\lambda y + (1 - \lambda) y'} f\left(\lambda x + (1 - \lambda) x'\right) \ge 1$$
$$f\left(\lambda x + (1 - \lambda) x'\right) \ge \lambda y + (1 - \lambda) y' = \lambda f(x) + (1 - \lambda) f(x'),$$
( )
it is concave. If $f(x')$ is zero, since f is non-decreasing,
$$f\left(\lambda x + (1 - \lambda) x'\right) \ge f(\lambda x) = f(\lambda x) + 0 = \lambda f(x) + (1 - \lambda) f(x').$$
If both $f(x)$ and $f(x')$ are zero, then
$$f\left(\lambda x + (1 - \lambda) x'\right) \ge \min\left\{f(x), f(x')\right\} = 0 = \lambda f(x) + (1 - \lambda) f(x').$$
(9) Since function f is homogeneous of degree m and is twice continuously differentiable, each of
the partial derivatives are homogeneous of degree m − 1.
Further, the partial derivatives are also continuously differentiable and are homogeneous
of degree m − 2 > 0.
Applying Euler’s theorem for second order partial derivatives of the partial derivative
D1 f (x), we get,
Chapter 30
Solution to PS 7
(1)
$$\nabla g(x, y) = \left[\, 3x^2 - 3 \;\;\; 3y^2 - 2 \,\right]$$
(30.1)
$$H_g(x, y) = \begin{bmatrix} 6x & 0 \\ 0 & 6y \end{bmatrix}$$
Then
$$g^*\left(1, \sqrt{\frac{2}{3}}\right) = -2 - \frac{4}{3}\sqrt{\frac{2}{3}} \approx -3.09.$$
(2) We know that $f'(x) = 0$ is a necessary condition for f to have a local maximum or minimum. Setting
(30.5)
$$f'(x) = 4x^3 - 12x^2 + 8x = 0,$$
(30.6)
$$4x(x^2 - 3x + 2) = 0,$$
(30.7)
$$x = 0, \; x = 1, \; x = 2.$$
If we plot the graph of this function, we can see that x = 0 and x = 2 are local minima and x = 1 is a local maximum. Also x = 0 and x = 2 are global minima and there is no global maximum.
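The derivative in (30.5) corresponds, up to a constant, to $f(x) = x^4 - 4x^3 + 4x^2$; this recovered form is an assumption, used here only to verify the classification of the critical points.

```python
f = lambda x: x**4 - 4 * x**3 + 4 * x**2
fp = lambda x: 4 * x**3 - 12 * x**2 + 8 * x

for c in (0.0, 1.0, 2.0):
    assert fp(c) == 0.0                      # the three critical points

assert f(0) == 0 and f(2) == 0               # global minima, value 0
assert f(1) > f(0.9) and f(1) > f(1.1)       # x = 1 is a local maximum
assert f(10) > f(1)                          # unbounded above: no global max
```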
maximizing supply plan for the firm. The maximum profit is π∗ = 9 × 55 + 2 × 30 − 50 − 110 =
395.
(4) (a) The profit for the firm, when it uses K and L units of capital and labor to produce output $Q = L^a K^b$, given the output and input prices (P, w, r), is
$$\Pi(K, L) = P \cdot Q - wL - rK.$$
The firm maximizes its profit by choosing K and L such that both the FOC and SOC are satisfied.
The FOCs are as under:
$$\frac{d\Pi}{dL} = P \cdot aL^{a-1}K^b - w = 0 \;\Rightarrow\; P \cdot aL^{a-1}K^b = w;$$
$$\frac{d\Pi}{dK} = P \cdot bL^{a}K^{b-1} - r = 0 \;\Rightarrow\; P \cdot bL^{a}K^{b-1} = r.$$
The FOC with respect to L leads to the condition that the value of the marginal product
of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the
condition that the value of the marginal product of capital is equal to the rental rate r.
(b) To solve for the optimal levels of L and K, we divide the first FOC by the second and get
$$\frac{aK}{bL} = \frac{w}{r} \;\Rightarrow\; K = \frac{wb}{ra}L.$$
Observe that the ratio of the $MP_L$ and $MP_K$ is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The
value of K can be substituted in any of the two FOCs to get the expression for L:
$$P \cdot aL^{a-1}K^b = w;$$
$$P \cdot aL^{a-1}\left(\frac{wb}{ra}L\right)^b = w;$$
$$P \cdot L^{a+b-1}\left(\frac{wb}{ra}\right)^b = \frac{w}{a};$$
$$P \cdot \left(\frac{a}{w}\right)^{1-b}\left(\frac{b}{r}\right)^b = L^{1-a-b};$$
$$L^* = \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}}.$$
We compute the optimal value of $K^*$ from the last equation as under:
$$K^* = \frac{wb}{ra}L^* = \frac{wb}{ra}\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}} = \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}-1}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}+1} P^{\frac{1}{1-a-b}} = \left(\frac{a}{w}\right)^{\frac{a}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{1-a}{1-a-b}} P^{\frac{1}{1-a-b}}.$$
(c) For the SOC, we first write down the Hessian (the matrix of second order partial derivatives), using the FOCs:
$$H = \begin{bmatrix} PF_{LL} & PF_{LK} \\ PF_{KL} & PF_{KK} \end{bmatrix} = \begin{bmatrix} Pa(a-1)L^{a-2}K^b & PabL^{a-1}K^{b-1} \\ PabL^{a-1}K^{b-1} & Pb(b-1)L^aK^{b-2} \end{bmatrix}.$$
For the SOC to be satisfied, the leading principal minor of order one needs to be negative and the leading principal minor of order two needs to be positive. Thus $Pa(a-1)L^{a-2}K^b < 0$, which implies that $a - 1 < 0$, or $a < 1$. The LPM of order two is the determinant of the Hessian matrix:
$$\det H = \det\begin{bmatrix} Pa(a-1)L^{a-2}K^b & PabL^{a-1}K^{b-1} \\ PabL^{a-1}K^{b-1} & Pb(b-1)L^aK^{b-2} \end{bmatrix} = P^2 ab(a-1)(b-1)L^{2a-2}K^{2b-2} - (PabL^{a-1}K^{b-1})^2$$
$$= P^2 ab[(a-1)(b-1) - ab]L^{2a-2}K^{2b-2} = P^2 ab[1 - a - b]L^{2a-2}K^{2b-2} > 0,$$
which holds true if and only if 1 − a − b > 0. Note that this condition also implies that
b < 1.
Thus the production function is such that it displays diminishing marginal product in each
of the two inputs (a < 1 and b < 1) and also it displays diminishing returns to scale as the
production function is homogeneous of degree a + b < 1.
(d) We use the expression for $L^*$ derived earlier to find the partial derivatives:
$$\frac{\partial L^*}{\partial P} = \left(\frac{1}{1-a-b}\right)\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}-1} > 0,$$
$$\frac{\partial L^*}{\partial w} = -\left(\frac{1-b}{1-a-b}\right)(a)^{\frac{1-b}{1-a-b}}(w)^{-\frac{1-b}{1-a-b}-1}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}} < 0,$$
$$\frac{\partial L^*}{\partial r} = -\left(\frac{b}{1-a-b}\right)\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}(b)^{\frac{b}{1-a-b}}(r)^{-\frac{b}{1-a-b}-1} P^{\frac{1}{1-a-b}} < 0.$$
(e) The output is obtained by noting that the profit maximizing inputs are $K^*$ and $L^*$:
$$Q^* = (L^*)^a (K^*)^b = \left(\frac{a}{w}\right)^{\frac{a(1-b)+ab}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{ab+b(1-a)}{1-a-b}} P^{\frac{a+b}{1-a-b}} = \left(\frac{a}{w}\right)^{\frac{a}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{a+b}{1-a-b}} = \left[\left(\frac{a}{w}\right)^a\left(\frac{b}{r}\right)^b P^{a+b}\right]^{\frac{1}{1-a-b}}.$$
For computing the price elasticity of supply with respect to output price, note that
$$Q^* = \left[\left(\frac{a}{w}\right)^a\left(\frac{b}{r}\right)^b P^{a+b}\right]^{\frac{1}{1-a-b}} = AP^{\frac{a+b}{1-a-b}},$$
where $A = \left[\left(\frac{a}{w}\right)^a\left(\frac{b}{r}\right)^b\right]^{\frac{1}{1-a-b}}$ is a constant independent of P. It is easy to see that the elasticity will be $\varepsilon_P = \frac{a+b}{1-a-b}$. [Note that for $Q = AP^b$, $\varepsilon_P = \frac{dQ}{dP}\cdot\frac{P}{Q} = AbP^{b-1}\frac{P}{Q} = b$.]
Similarly, $\varepsilon_w = -\frac{a}{1-a-b}$ and $\varepsilon_r = -\frac{b}{1-a-b}$. Thus,
$$\varepsilon_P + \varepsilon_w + \varepsilon_r = \frac{a+b}{1-a-b} + \frac{-a}{1-a-b} + \frac{-b}{1-a-b} = \frac{a+b-a-b}{1-a-b} = 0.$$
The economic interpretation is that if we change all the prices by the same factor, then the
profit maximizing quantity does not change. In other words, the profit maximizing output
is homogeneous of degree zero in the prices (P,w,r).
(f) You may like to write down the expression for the profit function explicitly in terms of P,
w and r, on your own.
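The closed forms for $L^*$, $K^*$ and $Q^*$ can be spot-checked numerically; the parameter values below are arbitrary choices satisfying $a + b < 1$.

```python
P, w, r = 5.0, 2.0, 3.0
a, b = 0.3, 0.4
s = 1 - a - b

L = (a / w) ** ((1 - b) / s) * (b / r) ** (b / s) * P ** (1 / s)
K = (a / w) ** (a / s) * (b / r) ** ((1 - a) / s) * P ** (1 / s)

# Both first-order conditions hold at (L*, K*).
assert abs(P * a * L ** (a - 1) * K ** b - w) < 1e-9
assert abs(P * b * L ** a * K ** (b - 1) - r) < 1e-9

# Q* matches the closed form derived above.
Q = L ** a * K ** b
assert abs(Q - ((a / w) ** a * (b / r) ** b * P ** (a + b)) ** (1 / s)) < 1e-9
```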
(5) (a) The profit for the firm, when it uses K, L and R units of capital, labor and natural resources
to produce output Q = ALa K b + ln R, given the output and input prices (P,w,v,r), is
Π(K, L) = P · Q − wL − rK − vR = P · ALa K b + P ln R − wL − rK − vR.
The firm maximizes its profit by choosing K, L and R such that both the FOC and SOC
are satisfied.
The FOCs are as under:
$$\frac{d\Pi}{dL} = P \cdot AaL^{a-1}K^b - w = PF_L - w = 0 \;\Rightarrow\; P \cdot AaL^{a-1}K^b = w;$$
$$\frac{d\Pi}{dK} = P \cdot AbL^{a}K^{b-1} - r = PF_K - r = 0 \;\Rightarrow\; P \cdot AbL^{a}K^{b-1} = r;$$
$$\frac{d\Pi}{dR} = \frac{P}{R} - v = PF_R - v = 0 \;\Rightarrow\; \frac{P}{R} = v.$$
The FOC with respect to L leads to the condition that the value of the marginal product of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the condition
that the value of the marginal product of capital is equal to the rental rate r. Lastly, the FOC
with respect to R leads to the condition that the value of the marginal product of natural
resource is equal to the price of the natural resource v.
Now take $A = 3$, $a = b = \frac{1}{3}$ for the remainder of the problem.
(b) With the given parameter values, the FOCs are (note $Aa = 1 = Ab$):
$$P \cdot L^{-\frac{2}{3}}K^{\frac{1}{3}} = w; \quad P \cdot L^{\frac{1}{3}}K^{-\frac{2}{3}} = r; \quad \frac{P}{R} = v.$$
For the SOC, we first write down the Hessian (the matrix of second order partial derivatives), using the FOCs:
$$H = \begin{bmatrix} PF_{LL} & PF_{LK} & PF_{LR} \\ PF_{KL} & PF_{KK} & PF_{KR} \\ PF_{RL} & PF_{RK} & PF_{RR} \end{bmatrix} = \begin{bmatrix} -\frac{2}{3}PL^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0 \\ \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}PL^{\frac{1}{3}}K^{-\frac{5}{3}} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}.$$
For the SOC to be satisfied, the leading principal minor of order one needs to be negative, the leading principal minor of order two needs to be positive, and the leading principal minor of order three needs to be negative.
The LPM of order 1 is negative, as $-\frac{2}{3}PL^{-\frac{5}{3}}K^{\frac{1}{3}} < 0$ (given that P > 0, K > 0 and L > 0).
The LPM of order two is the determinant of the matrix obtained by removing the third row and the third column:
$$\det H_2 = \det\begin{bmatrix} -\frac{2}{3}PL^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} \\ \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}PL^{\frac{1}{3}}K^{-\frac{5}{3}} \end{bmatrix} = \frac{4}{9}P^2 L^{-\frac{4}{3}}K^{-\frac{4}{3}} - \frac{1}{9}P^2 L^{-\frac{4}{3}}K^{-\frac{4}{3}} = \frac{1}{3}P^2 L^{-\frac{4}{3}}K^{-\frac{4}{3}} > 0.$$
The LPM of order three is the determinant of the Hessian matrix. We compute the determinant using the third row to get
$$\det H = \det\begin{bmatrix} -\frac{2}{3}PL^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0 \\ \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}PL^{\frac{1}{3}}K^{-\frac{5}{3}} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix} = -\frac{P}{R^2}\left[\frac{4}{9}P^2 L^{-\frac{4}{3}}K^{-\frac{4}{3}} - \frac{1}{9}P^2 L^{-\frac{4}{3}}K^{-\frac{4}{3}}\right] = -\frac{1}{3}\frac{P^3}{R^2} L^{-\frac{4}{3}}K^{-\frac{4}{3}} < 0.$$
Hence the SOC is satisfied.
Hence the SOC is satisfied.
(c) To solve for the optimal levels of L and K, we divide the first FOC by the second and get (note $a = b = \frac{1}{3}$):
$$\frac{P \cdot MP_L}{P \cdot MP_K} = \frac{MP_L}{MP_K} = \frac{P \cdot aL^{a-1}K^b}{P \cdot bL^a K^{b-1}} = \frac{aK}{bL} = \frac{w}{r} \;\Rightarrow\; K = \frac{w}{r}L.$$
Observe that the ratio of the MPL and MPK is the MRTS (Marginal rate of technical substi-
tution, i.e., the rate at which one can substitute labor for capital along an iso-quant.) The
value of K can be substituted in any of the two FOCs to get the expression for L:
$$P \cdot L^{-\frac{2}{3}}K^{\frac{1}{3}} = w;$$
$$P \cdot L^{-\frac{2}{3}}\left(\frac{w}{r}L\right)^{\frac{1}{3}} = w;$$
$$P \cdot \left(\frac{w}{r}\right)^{\frac{1}{3}} = wL^{\frac{1}{3}};$$
$$P \cdot \left(\frac{1}{rw^2}\right)^{\frac{1}{3}} = L^{\frac{1}{3}};$$
$$L^* = \frac{P^3}{rw^2}.$$
Taking the derivative of $L^*$ with respect to r we obtain
$$\frac{dL^*}{dr} = -\frac{P^3}{r^2w^2}.$$
Totally differentiating the three FOCs,
$$L^{-\frac{2}{3}}K^{\frac{1}{3}}\,dP - \frac{2}{3}PL^{-\frac{5}{3}}K^{\frac{1}{3}}\,dL + \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}}\,dK = dw,$$
$$L^{\frac{1}{3}}K^{-\frac{2}{3}}\,dP + \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}}\,dL - \frac{2}{3}PL^{\frac{1}{3}}K^{-\frac{5}{3}}\,dK = dr,$$
$$\frac{dP}{R} - \frac{P}{R^2}\,dR = dv.$$
We can write this in matrix form as under:
$$A = \begin{bmatrix} -\frac{2}{3}PL^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0 \\ \frac{1}{3}PL^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}PL^{\frac{1}{3}}K^{-\frac{5}{3}} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}, \quad q = \begin{bmatrix} dL \\ dK \\ dR \end{bmatrix}, \quad b = \begin{bmatrix} dw - L^{-\frac{2}{3}}K^{\frac{1}{3}}\,dP \\ dr - L^{\frac{1}{3}}K^{-\frac{2}{3}}\,dP \\ dv - \frac{dP}{R} \end{bmatrix}.$$
Then $Aq = b$. Note that the matrix $A$ is the same as the Hessian. Solving for $dL$ when $dP = dw = dv = 0$ and $dr \neq 0$, using Cramer's rule, we get
$$dL = \frac{\det\begin{bmatrix} 0 & \frac{1}{3}P\,L^{-2/3}K^{-2/3} & 0 \\ dr & -\frac{2}{3}P\,L^{1/3}K^{-5/3} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}}{\det\begin{bmatrix} -\frac{2}{3}P\,L^{-5/3}K^{1/3} & \frac{1}{3}P\,L^{-2/3}K^{-2/3} & 0 \\ \frac{1}{3}P\,L^{-2/3}K^{-2/3} & -\frac{2}{3}P\,L^{1/3}K^{-5/3} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}} = \frac{\left(-\frac{P}{R^2}\right)(-dr)\,\frac{1}{3}P\,L^{-2/3}K^{-2/3}}{-\frac{1}{3}\,\frac{P^3}{R^2}\,L^{-4/3}K^{-4/3}} = -\frac{dr \cdot L^{2/3}K^{2/3}}{P} < 0.$$
$$\frac{dL^*}{dr} = -\frac{L^{2/3}K^{2/3}}{P} < 0.$$
Thus, $L^*$ decreases as $r$ increases. To see that we obtain an identical expression for $\frac{dL^*}{dr}$ as in the previous part, observe
$$K^* = \frac{P^3}{r^2 w}, \qquad L^* \cdot K^* = \frac{P^6}{r^3 w^3}, \qquad (L^* \cdot K^*)^{2/3} = \frac{P^4}{r^2 w^2},$$
so that
$$\frac{dL^*}{dr} = -\frac{L^{2/3}K^{2/3}}{P} = -\frac{(L^* K^*)^{2/3}}{P} = -\frac{P^3}{r^2 w^2}.$$
(ii) Solving for $dL$ when $dP = dw = dr = 0$ and $dv \neq 0$, using Cramer's rule, we get
$$dL = \frac{\det\begin{bmatrix} 0 & \frac{1}{3}P\,L^{-2/3}K^{-2/3} & 0 \\ 0 & -\frac{2}{3}P\,L^{1/3}K^{-5/3} & 0 \\ dv & 0 & -\frac{P}{R^2} \end{bmatrix}}{\det\begin{bmatrix} -\frac{2}{3}P\,L^{-5/3}K^{1/3} & \frac{1}{3}P\,L^{-2/3}K^{-2/3} & 0 \\ \frac{1}{3}P\,L^{-2/3}K^{-2/3} & -\frac{2}{3}P\,L^{1/3}K^{-5/3} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}} = \frac{0}{-\frac{1}{3}\,\frac{P^3}{R^2}\,L^{-4/3}K^{-4/3}} = 0.$$
$$\frac{dL^*}{dv} = 0.$$
Since L∗ does not depend on v, this conclusion is obvious.
Chapter 31
Solution to PS 8
therefore C is
(i) bounded, since $C \subset B\left((0,0,0), 2\right)$: indeed, $x \in C \Rightarrow d(x, 0) = 1 < 2 \Rightarrow x \in B(0, 2)$,
(ii) closed in $\mathbb{R}^3$, since it is defined as a level set of the polynomial and therefore continuous function $\sum_{i=1}^{3} x_i^2$ (use the characterization of closed sets in terms of convergent sequences),
(iii) non-empty, since $(1, 0, 0) \in C$.
Since the objective function $\sum_{i=1}^{3} c_i x_i$ is linear, and therefore continuous on $\mathbb{R}^3$, the Weierstrass theorem is applicable and yields $\bar{x} \in C$ such that $\sum_{i=1}^{3} c_i x_i \le \sum_{i=1}^{3} c_i \bar{x}_i$ for any $(x_1, x_2, x_3) \in C$.
(b) The optimization problem can be rewritten as
$$\max f(x) \quad \text{subject to } g(x) = 0 \text{ and } x \in \mathbb{R}^3, \tag{31.1}$$
where
$$f(x) = \sum_{i=1}^{3} c_i x_i \quad \text{and} \quad g(x) = \sum_{i=1}^{3} x_i^2 - 1.$$
Both functions f and g are polynomials and therefore continuously differentiable on the open set $\mathbb{R}^3$. Since $\bar{x}$ is a point of global maximum of f subject to the constraint $g(x) = 0$, it is also a local maximum of f subject to the constraint $g(x) = 0$. Since $g(0) = -1 \neq 0$ we have
$\bar{x} \neq 0$. Now
$$\nabla g(x) = 2(x_1, x_2, x_3)' \neq 0 \text{ for } x \neq 0,$$
and $\bar{x} \neq 0$, hence the constraint qualification $\nabla g(\bar{x}) \neq 0$ holds. Therefore by Lagrange's theorem there exists $\lambda \in \mathbb{R}$ such that $\nabla f(\bar{x}) = \lambda \nabla g(\bar{x})$, or
$$(c_1, c_2, c_3)' = 2\lambda\,(\bar{x}_1, \bar{x}_2, \bar{x}_3)'. \tag{31.2}$$
If we premultiply (31.2) by the row vector $(\bar{x}_1, \bar{x}_2, \bar{x}_3)$, we will get
$$\sum_{i=1}^{3} c_i \bar{x}_i = 2\lambda \sum_{i=1}^{3} \bar{x}_i^2 = 2\lambda\left(g(\bar{x}) + 1\right) = 2\lambda(0 + 1) = 2\lambda. \tag{31.3}$$
If we premultiply (31.2) by the row vector $(c_1, c_2, c_3)$, equation (31.3) yields
$$\|c\|^2 = \sum_{i=1}^{3} c_i^2 = 2\lambda \sum_{i=1}^{3} c_i \bar{x}_i = \left(\sum_{i=1}^{3} c_i \bar{x}_i\right)^2. \tag{31.4}$$
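Equations (31.2)-(31.4) imply that the maximized value is $\|c\|$, attained at $\bar{x} = c/\|c\|$. A quick numerical illustration; the vector c below is an arbitrary choice (here with $\|c\|^2 = 125$), and the random comparison points are only a sanity check, not a proof.

```python
import math
import random

# The maximizer of sum c_i x_i on the unit sphere should be x̄ = c/||c||,
# with maximized value ||c||.  c is an arbitrary sample vector.
c = [3.0, 4.0, 10.0]                       # ||c||^2 = 9 + 16 + 100 = 125
norm_c = math.sqrt(sum(ci**2 for ci in c))
x_bar = [ci / norm_c for ci in c]          # candidate maximizer

assert math.isclose(sum(xi**2 for xi in x_bar), 1.0)              # feasible
assert math.isclose(sum(ci * xi for ci, xi in zip(c, x_bar)), norm_c)

# No random point on the unit sphere should do better.
rng = random.Random(0)
for _ in range(1000):
    v = [rng.gauss(0, 1) for _ in range(3)]
    s = math.sqrt(sum(vi**2 for vi in v))
    x = [vi / s for vi in v]
    assert sum(ci * xi for ci, xi in zip(c, x)) <= norm_c + 1e-9
```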
√125. So the constraint set is compact and non-empty and the objective function f is continuous, hence the Weierstrass theorem is applicable and a solution exists. The Lagrangian and the FOCs are
$$\mathcal{L}(x, y, \lambda) = x^2 - 3xy + \lambda(2y + x - 10) \tag{31.5}$$
$$\frac{\partial \mathcal{L}(x, y, \lambda)}{\partial x} = 2x - 3y + \lambda = 0 \tag{31.6}$$
$$\frac{\partial \mathcal{L}(x, y, \lambda)}{\partial y} = -3x + 2\lambda = 0 \;\Rightarrow\; \lambda = \frac{3}{2}x \tag{31.7}$$
$$\frac{\partial \mathcal{L}(x, y, \lambda)}{\partial \lambda} = 2y + x - 10 = 0. \tag{31.8}$$
Now
$$2x - 3y + \lambda = 2x - 3y + \frac{3}{2}x = 0 \;\Rightarrow\; \frac{7}{2}x = 3y \;\Rightarrow\; y = \frac{7}{6}x,$$
$$2y + x - 10 = 0 \;\Rightarrow\; \frac{7}{3}x + x - 10 = 0 \;\Rightarrow\; \frac{10}{3}x = 10 \;\Rightarrow\; x = 3,$$
$$y = \frac{7}{6}\cdot 3 = \frac{7}{2}, \qquad \lambda = \frac{9}{2}.$$
We get an interior candidate for a solution,
$$m_1 = \left(3, \frac{7}{2}, \frac{9}{2}\right).$$
The constraint qualification
$$\nabla g(x^*, y^*) = \begin{bmatrix} 1 & 2 \end{bmatrix} \neq 0$$
is satisfied.
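The candidate can be checked directly against the three FOCs (31.6)-(31.8):

```python
import math

# Verify that m1 = (3, 7/2, 9/2) satisfies the FOCs for
# L(x, y, λ) = x^2 - 3xy + λ(2y + x - 10).
x, y, lam = 3.0, 7/2, 9/2

assert math.isclose(2*x - 3*y + lam, 0.0, abs_tol=1e-12)   # ∂L/∂x
assert math.isclose(-3*x + 2*lam, 0.0, abs_tol=1e-12)      # ∂L/∂y
assert math.isclose(2*y + x - 10, 0.0, abs_tol=1e-12)      # ∂L/∂λ
```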
3. Necessity route: A solution exists by arguments similar to the earlier problem. The Lagrangian and the FOCs are
$$\mathcal{L}(x, y, \lambda) = x^{1/3}y^{2/3} + \lambda(4 - 2x - y) \tag{31.9}$$
$$\frac{\partial \mathcal{L}(x, y, \lambda)}{\partial x} = \frac{1}{3}x^{-2/3}y^{2/3} - 2\lambda = 0 \tag{31.10}$$
$$\frac{\partial \mathcal{L}(x, y, \lambda)}{\partial y} = \frac{2}{3}x^{1/3}y^{-1/3} - \lambda = 0 \tag{31.11}$$
$$\frac{\partial \mathcal{L}(x, y, \lambda)}{\partial \lambda} = 4 - 2x - y = 0. \tag{31.12}$$
Now
$$\frac{\frac{1}{3}x^{-2/3}y^{2/3}}{\frac{2}{3}x^{1/3}y^{-1/3}} = \frac{2\lambda}{\lambda} \;\Rightarrow\; \frac{y}{2x} = 2 \;\Rightarrow\; y = 4x,$$
$$4 - 2x - y = 4 - 2x - 4x = 0 \;\Rightarrow\; x = \frac{2}{3},\; y = \frac{8}{3},\; \lambda = \frac{2}{3}\left(\frac{1}{4}\right)^{1/3}.$$
We get an interior candidate for a solution,
$$m_1 = \left(\frac{2}{3}, \frac{8}{3}, \frac{2}{3}\left(\frac{1}{4}\right)^{1/3}\right).$$
$$H_f(x, y) = \begin{bmatrix} -\frac{2}{9}x^{-5/3}y^{2/3} & \frac{2}{9}x^{-2/3}y^{-1/3} \\ \frac{2}{9}x^{-2/3}y^{-1/3} & -\frac{2}{9}x^{1/3}y^{-4/3} \end{bmatrix}.$$
The principal minors of order one,
$$-\frac{2}{9}x^{-5/3}y^{2/3} \le 0, \qquad -\frac{2}{9}x^{1/3}y^{-4/3} \le 0,$$
and the principal minor of order two,
$$\det H_f = \frac{4}{81}x^{-4/3}y^{-2/3} - \frac{4}{81}x^{-4/3}y^{-2/3} = 0 \ge 0,$$
for all $(x, y) \in \mathbb{R}^2_{++}$. Hence f is concave. The constraint is linear and so concave. Also $\lambda = \frac{2}{3}\left(\frac{1}{4}\right)^{1/3} > 0$. $\mathcal{L}(x, y, \lambda)$ is concave and the FOCs are sufficient for a maximum. Therefore $(x^*, y^*) = \left(\frac{2}{3}, \frac{8}{3}\right)$, which satisfies the FOCs, is the solution.
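The solution can be sanity-checked by substituting the constraint $y = 4 - 2x$ into the objective and scanning the feasible interval; this brute-force comparison is illustrative, not part of the sufficiency argument.

```python
# Verify (x*, y*) = (2/3, 8/3) maximizes f(x, y) = x^{1/3} y^{2/3}
# subject to 2x + y = 4, by scanning x on (0, 2) with y = 4 - 2x.
def f(x, y):
    return x**(1/3) * y**(2/3)

x_star, y_star = 2/3, 8/3
best = f(x_star, y_star)

n = 100_000
for i in range(1, n):
    x = 2 * i / n                      # grid over the feasible interval (0, 2)
    assert f(x, 4 - 2*x) <= best + 1e-9
```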
4. Let $f : \mathbb{R}^2 \to \mathbb{R}$,
$$\max f(x, y) = \sqrt{xy} \quad \text{subject to } x + y \le 6,\; x \ge 0,\; y \ge 0. \tag{31.13}$$
This problem has an inequality constraint, so we will use the Kuhn-Tucker sufficiency theorem. We need to check that all conditions of the theorem are satisfied.
(i) Let
$$X = \mathbb{R}^2_{++} = \left\{(x, y) \in \mathbb{R}^2 \mid x > 0,\; y > 0\right\}.$$
Then X is open, as its complement
$$X^C = \left\{(x, y) \in \mathbb{R}^2 \mid x \le 0 \text{ or } y \le 0\right\}$$
is closed.
(ii) The function $f(x, y) = \sqrt{xy}$ is continuous, as $\sqrt{x}$ and $\sqrt{y}$ are continuous and $f(\cdot)$ is the product of these two continuous functions. The constraints $g^1(x, y) = 6 - x - y$, $g^2(x, y) = x$, $g^3(x, y) = y$ are linear and hence continuous functions. Further, $f_x(x, y) = \frac{1}{2}\sqrt{\frac{y}{x}}$ and $f_y(x, y) = \frac{1}{2}\sqrt{\frac{x}{y}}$ are continuous functions. Hence $f, g^j$ ($j = 1, \cdots, 3$) are continuously differentiable on X.
(iii) The set X is convex: if $(x_1, y_1), (x_2, y_2) \in X$, then
$$x_1 > 0,\; x_2 > 0 \;\Rightarrow\; \lambda x_1 + (1 - \lambda) x_2 > 0 \quad \forall \lambda \in (0, 1),$$
$$y_1 > 0,\; y_2 > 0 \;\Rightarrow\; \lambda y_1 + (1 - \lambda) y_2 > 0 \quad \forall \lambda \in (0, 1),$$
$$\Rightarrow\; \left(\lambda x_1 + (1 - \lambda) x_2,\; \lambda y_1 + (1 - \lambda) y_2\right) \in X.$$
(iv) The function $f(x, y)$ is concave, as
$$\nabla f(x, y) = \begin{bmatrix} \frac{1}{2}\sqrt{\frac{y}{x}} & \frac{1}{2}\sqrt{\frac{x}{y}} \end{bmatrix}, \qquad H_f(x, y) = \begin{bmatrix} -\frac{1}{4}\sqrt{\frac{y}{x^3}} & \frac{1}{4}\frac{1}{\sqrt{xy}} \\ \frac{1}{4}\frac{1}{\sqrt{xy}} & -\frac{1}{4}\sqrt{\frac{x}{y^3}} \end{bmatrix};$$
the diagonal entries are negative and $\det H_f = \frac{1}{16xy} - \frac{1}{16xy} = 0$, so $H_f$ is negative semidefinite.
Hence all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $\left((x^*, y^*), \lambda^*\right) \in X \times \mathbb{R}^3_+$ that satisfies the Kuhn-Tucker conditions
$$\text{(i)}\quad D_i f(x^*) + \sum_{j=1}^{m} \lambda_j^* D_i g^j(x^*) = 0, \quad i = 1, \cdots, n,$$
$$\text{(ii)}\quad g(x^*) \ge 0 \;\text{ and }\; \lambda^* \cdot g(x^*) = 0.$$
They are
$$\frac{1}{2}\sqrt{\frac{y}{x}} - \lambda_1 + \lambda_2 = 0 \tag{31.14}$$
$$\frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \tag{31.15}$$
$$6 - x - y \ge 0, \quad \lambda_1(6 - x - y) = 0 \tag{31.16}$$
$$x \ge 0,\; \lambda_2 x = 0; \qquad y \ge 0,\; \lambda_3 y = 0. \tag{31.17}$$
If $\lambda_1 = 0$, then $\frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \Rightarrow \lambda_3 = -\frac{1}{2}\sqrt{\frac{x}{y}} < 0$, which contradicts $\lambda_3 \ge 0$. Hence
$$\lambda_1 > 0 \;\Rightarrow\; 6 - x - y = 0.$$
Since $x > 0$, $y > 0$, we have $\lambda_2 = 0$, $\lambda_3 = 0$, so
$$\frac{1}{2}\sqrt{\frac{y}{x}} - \lambda_1 + \lambda_2 = \frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \;\Rightarrow\; \frac{1}{2}\sqrt{\frac{x}{y}} = \lambda_1 = \frac{1}{2}\sqrt{\frac{y}{x}}$$
$$\Rightarrow\; x = y \;\Rightarrow\; 6 - x - y = 0 \;\Rightarrow\; x = y = 3 > 0.$$
Note that all conditions are satisfied. Hence it is a global maximum on X. Observe that it is also a global maximum on $\mathbb{R}^2_+$, as
$$f(x, y) = 0 \text{ for } (x, y) \in \mathbb{R}^2_+ \setminus X$$
and $f(3, 3) > 0$. Hence, $(3, 3)$ solves the optimization problem.
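A brute-force grid scan over the feasible set confirms the answer; this is only an illustrative check of the Kuhn-Tucker solution.

```python
import math

# Verify that (3, 3) maximizes f(x, y) = sqrt(xy) on the feasible set
# x + y <= 6, x >= 0, y >= 0, by scanning a grid of feasible points.
best = math.sqrt(3 * 3)                    # f(3, 3) = 3

n = 300
for i in range(n + 1):
    for j in range(n + 1):
        x, y = 6 * i / n, 6 * j / n
        if x + y <= 6:
            assert math.sqrt(x * y) <= best + 1e-9
```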
5. Let $f : \mathbb{R}^2 \to \mathbb{R}$,
$$\max f(x, y) = x + \ln(1 + y) \quad \text{subject to } x \ge 0,\; y \ge 0 \text{ and } x + py \le m. \tag{31.18}$$
Again we will use the Kuhn-Tucker sufficiency theorem. We need to check that all conditions of the theorem are satisfied.
(i) Let
$$X = \left\{(x, y) \in \mathbb{R}^2 \mid x > -1,\; y > -1\right\}.$$
Then X is open, as its complement
$$X^C = \left\{(x, y) \in \mathbb{R}^2 \mid x \le -1 \text{ or } y \le -1\right\}$$
is closed.
(ii) The function $f(x, y)$ is continuous, as $x$ and $\ln(1 + y)$ (for $y > -1$) are continuous and $f(\cdot)$ is the sum of two continuous functions. The constraints $g^1(x, y) = m - x - py$, $g^2(x, y) = x$, $g^3(x, y) = y$ are linear and hence continuous functions. Further, $f_x(x, y) = 1$ and $f_y(x, y) = \frac{1}{1+y}$ are continuous functions. Hence $f, g^j$ ($j = 1, \cdots, 3$) are continuously differentiable on X.
(iii) The set X is convex: if $(x_1, y_1), (x_2, y_2) \in X$, then
$$x_1 > -1,\; x_2 > -1 \;\Rightarrow\; \lambda x_1 + (1 - \lambda) x_2 > -1 \quad \forall \lambda \in (0, 1),$$
$$y_1 > -1,\; y_2 > -1 \;\Rightarrow\; \lambda y_1 + (1 - \lambda) y_2 > -1 \quad \forall \lambda \in (0, 1),$$
$$\Rightarrow\; \left(\lambda x_1 + (1 - \lambda) x_2,\; \lambda y_1 + (1 - \lambda) y_2\right) \in X.$$
(iv) The function $f(x, y)$ is concave, as
$$\nabla f(x, y) = \begin{bmatrix} 1 & \frac{1}{1+y} \end{bmatrix}, \qquad H_f(x, y) = \begin{bmatrix} 0 & 0 \\ 0 & -\frac{1}{(1+y)^2} \end{bmatrix},$$
which is negative semidefinite.
They are
$$1 - \lambda_1 + \lambda_2 = 0 \tag{31.19}$$
$$\frac{1}{1+y} - p\lambda_1 + \lambda_3 = 0 \tag{31.20}$$
$$m - x - py \ge 0, \quad \lambda_1(m - x - py) = 0 \tag{31.21}$$
$$x \ge 0,\; \lambda_2 x = 0; \qquad y \ge 0,\; \lambda_3 y = 0. \tag{31.22}$$
From (31.19), $\lambda_1 = 1 + \lambda_2 \ge 1 > 0$, so
$$\lambda_1 > 0 \;\Rightarrow\; m - x - py = 0,$$
and $x = y = 0$ is ruled out because $m > 0$. There are three remaining cases.
(i) $x > 0$, $y = 0$. Then $\lambda_2 = 0$, $x = m$, and
$$1 = \lambda_1, \qquad 1 - p + \lambda_3 = 0 \;\Rightarrow\; \lambda_3 = p - 1.$$
For $\lambda_3 \ge 0$ we need $p \ge 1$, so the solution is $(m, 0, 1, 0, p - 1)$ if $p \ge 1$.
(ii) $x = 0$, $y > 0$. Then $\lambda_3 = 0$, $y = \frac{m}{p}$, and
$$\frac{1}{p\left(1 + \frac{m}{p}\right)} = \frac{1}{p + m} = \lambda_1,$$
$$1 - \lambda_1 + \lambda_2 = 0 \;\Rightarrow\; 1 - \frac{1}{p+m} + \lambda_2 = 0 \;\Rightarrow\; \lambda_2 = \frac{1}{p+m} - 1.$$
If $\frac{1}{p+m} - 1 \ge 0$, i.e., $p + m \le 1$, then $\lambda_2 \ge 0$. So the solution is $\left(0, \frac{m}{p}, \frac{1}{p+m}, \frac{1}{p+m} - 1, 0\right)$ if $p + m \le 1$.
(iii) $x > 0$, $y > 0$. Then $\lambda_2 = 0$, $\lambda_3 = 0$, and
$$1 = \lambda_1, \qquad \frac{1}{1+y} = p \;\Rightarrow\; y = \frac{1}{p} - 1 > 0, \tag{31.23}$$
$$m - x - py = 0 \;\Rightarrow\; x = m - 1 + p > 0. \tag{31.24}$$
Hence for $1 > p > 1 - m$, the solution is $\left(m - 1 + p, \frac{1}{p} - 1, 1, 0, 0\right)$. Combining them, the solution $\left(x^*, y^*, \lambda_1^*, \lambda_2^*, \lambda_3^*\right)$ is
$$(m, 0, 1, 0, p - 1) \quad \text{if } p \ge 1,$$
$$\left(0, \frac{m}{p}, \frac{1}{p+m}, \frac{1}{p+m} - 1, 0\right) \quad \text{if } p \le 1 - m, \text{ and}$$
$$\left(m - 1 + p, \frac{1}{p} - 1, 1, 0, 0\right) \quad \text{if } 1 - m < p < 1.$$
The Kuhn-Tucker sufficiency theorem asserts that this candidate is a global maximum and therefore solves the problem.
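The piecewise solution can be checked against a brute-force search along the budget line (since the objective is strictly increasing in both goods, the budget binds at any optimum). The three $(p, m)$ pairs below are arbitrary, one per case.

```python
import math

# Check the piecewise solution of max x + ln(1+y) s.t. x, y >= 0, x + py <= m.
def solve(p, m):
    if p >= 1:
        return m, 0.0                   # case (i)
    if p <= 1 - m:
        return 0.0, m / p               # case (ii)
    return m - 1 + p, 1 / p - 1         # case (iii)

def f(x, y):
    return x + math.log(1 + y)

for p, m in [(2.0, 1.0), (0.2, 0.5), (0.5, 1.0)]:   # one sample per case
    xs, ys = solve(p, m)
    best = f(xs, ys)
    n = 100_000
    for i in range(n + 1):
        y = (m / p) * i / n             # feasible y along the budget line
        x = m - p * y
        assert f(x, y) <= best + 1e-6
```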
7. Suppose that a consumer has the utility function $U(x, y) = x^a y^b$ and faces the budget constraint $p_x x + p_y y \le I$.
(A) Utility Maximization
(a) What are the first order conditions for utility maximization?
Observe that the utility function makes sense only if $a > 0$ and $b > 0$. The Lagrangean for the optimization problem is
$$\mathcal{L}(x, y, \lambda) = U(x, y) + \lambda(I - p_x x - p_y y) = x^a y^b + \lambda(I - p_x x - p_y y).$$
The first order conditions are
$$\frac{\partial \mathcal{L}}{\partial x} = a x^{a-1} y^b - \lambda p_x = 0,$$
$$\frac{\partial \mathcal{L}}{\partial y} = b x^a y^{b-1} - \lambda p_y = 0,$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = I - p_x x - p_y y = 0.$$
(b) Solve for the consumer's demands for goods x and y.
From the first two FOCs, we get
$$a x^{a-1} y^b = \lambda p_x, \qquad b x^a y^{b-1} = \lambda p_y.$$
Dividing the first equation by the second, we get
$$\frac{a x^{a-1} y^b}{b x^a y^{b-1}} = \frac{\lambda p_x}{\lambda p_y} \;\Rightarrow\; \frac{a y}{b x} = \frac{p_x}{p_y} \;\Rightarrow\; p_y y = \frac{b}{a}\,p_x x.$$
We use this in the third FOC to get
$$p_x x + p_y y = I \;\Rightarrow\; p_x x + \frac{b}{a}\,p_x x = I \;\Rightarrow\; \frac{a+b}{a}\,p_x x = I,$$
$$p_x x^* = \frac{a}{a+b}\,I \;\Rightarrow\; x^* = \frac{a}{a+b}\,\frac{I}{p_x}.$$
This gives
$$p_y y^* = \frac{b}{a+b}\,I \;\Rightarrow\; y^* = \frac{b}{a+b}\,\frac{I}{p_y}.$$
(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an increasing, decreasing or constant function of income?
We use the first FOC (with respect to x) to get
$$a x^{a-1} y^b = \lambda p_x \;\Rightarrow\; \lambda^* = \frac{a (x^*)^{a-1} (y^*)^b}{p_x} = \frac{a\left(\frac{a}{a+b}\frac{I}{p_x}\right)^{a-1}\left(\frac{b}{a+b}\frac{I}{p_y}\right)^{b}}{p_x} = \left(\frac{a}{p_x}\right)^a \left(\frac{b}{p_y}\right)^b \left(\frac{I}{a+b}\right)^{a+b-1} > 0.$$
Here $\lambda^*$ is the marginal utility of income: the increase in maximized utility from a marginal increase in I. Since $\lambda^*$ is proportional to $I^{a+b-1}$, it is increasing in income if $a + b > 1$, constant if $a + b = 1$, and decreasing if $a + b < 1$.
The border-preserving leading principal minor of order two is the Hessian matrix itself. For the second order condition to be satisfied, the determinant of the Hessian needs to be positive.
(e) Show that the implicit function theorem value of $\frac{dx}{dI}$ is identical to the value of taking the partial derivative of $x^*$ with respect to I.
Using $x^*$, we get
$$\frac{\partial x^*}{\partial I} = \frac{a}{a+b}\,\frac{1}{p_x}.$$
(B) Expenditure Minimization
(a) What are the first order conditions for expenditure minimization?
First, we write down the minimization problem as
$$\min\; p_x x + p_y y \quad \text{subject to } x^a y^b \ge u_0,$$
which can be converted into a maximization exercise as follows:
$$\max\; -p_x x - p_y y \quad \text{subject to } x^a y^b \ge u_0.$$
The Lagrangean for the maximization problem is
$$\mathcal{L}(x, y, \lambda) = -p_x x - p_y y + \lambda(x^a y^b - u_0).$$
The FOCs imply $y = \frac{b p_x}{a p_y}\,x$; substituting into the binding constraint $x^a y^b = u_0$ gives $x^* = \left(\frac{a p_y}{b p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}}$ and
$$y^* = \frac{b p_x}{a p_y}\,x^* = \frac{b p_x}{a p_y}\left(\frac{a p_y}{b p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}} = \left(\frac{b p_x}{a p_y}\right)^{\frac{a}{a+b}} u_0^{\frac{1}{a+b}}.$$
$$e(p_x, p_y, u_0) = p_x x^* + p_y y^* = p_x\left(\frac{a p_y}{b p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}} + p_y\left(\frac{b p_x}{a p_y}\right)^{\frac{a}{a+b}} u_0^{\frac{1}{a+b}}$$
$$= \left(p_x^a p_y^b u_0\right)^{\frac{1}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right].$$
Substituting the indirect utility $u_0 = \left(\frac{a}{(a+b)p_x}\right)^a\left(\frac{b}{(a+b)p_y}\right)^b I^{a+b}$, we get
$$e = \left(p_x^a p_y^b\right)^{\frac{1}{a+b}}\left[\left(\frac{a}{(a+b)p_x}\right)^a\left(\frac{b}{(a+b)p_y}\right)^b\right]^{\frac{1}{a+b}} I\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right]$$
$$= \left[\left(\frac{a}{a+b}\right)^a\left(\frac{b}{a+b}\right)^b\right]^{\frac{1}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right] I = I.$$
This shows that the minimum expenditure required to attain utility equal to the indirect utility function is the same as the income I. Thus the two approaches are equivalent.
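The duality result is easy to confirm numerically: compute the Marshallian demands, feed the maximized utility into the Hicksian demands, and check that the resulting expenditure equals I. Parameter values are arbitrary positives.

```python
import math

# Numerical check of the duality result: e(p_x, p_y, v(p_x, p_y, I)) = I.
a, b, px, py, I = 0.4, 0.7, 2.0, 3.0, 10.0

# Marshallian demands and maximized utility
x_star = a * I / ((a + b) * px)
y_star = b * I / ((a + b) * py)
u0 = x_star**a * y_star**b

# Hicksian demands from the expenditure-minimization problem
h_x = ((a * py) / (b * px))**(b / (a + b)) * u0**(1 / (a + b))
h_y = ((b * px) / (a * py))**(a / (a + b)) * u0**(1 / (a + b))

expenditure = px * h_x + py * h_y
assert math.isclose(expenditure, I)
assert math.isclose(h_x**a * h_y**b, u0)   # the utility target is met exactly
```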
(e) To avoid confusion, let us call the solution for good x in utility maximization $x^*$ and the solution for good x in expenditure minimization $h^*$. Prove that
$$\frac{\partial x^*}{\partial p_x} = \frac{\partial h^*}{\partial p_x} - x^*\,\frac{\partial x^*}{\partial I}.$$
Interpret this answer.
Observe that we can rewrite $h^*$ as $h^* = \theta\,(p_x)^{-\frac{b}{a+b}}$, where $\theta \equiv \left(\frac{a}{b}\,p_y\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}}$. This gives us
$$\frac{\partial h^*}{\partial p_x} = \theta\left(-\frac{b}{a+b}\right)(p_x)^{-\frac{b}{a+b}-1} = -\frac{b}{a+b}\,\frac{h^*}{p_x}.$$
Also, from the utility maximization, we get
$$\frac{\partial x^*}{\partial p_x} = \frac{aI}{a+b}\left(-(p_x)^{-2}\right) = \frac{-x^*}{p_x},$$
and
$$x^*\,\frac{\partial x^*}{\partial I} = x^*\left(\frac{a}{a+b}\right)(p_x)^{-1}.$$
Therefore, evaluating $h^*$ at $u_0$ equal to the maximized utility (so that $h^* = x^*$),
$$\frac{\partial x^*}{\partial p_x} + x^*\,\frac{\partial x^*}{\partial I} = \frac{-x^*}{p_x} + \left(\frac{a}{a+b}\right)\frac{x^*}{p_x} = -\frac{b}{a+b}\,\frac{x^*}{p_x} = \frac{\partial h^*}{\partial p_x}.$$
The change in $x^*$ due to a change in its own price $p_x$ (the total effect) is the sum of the substitution effect $\left(\frac{\partial h^*}{\partial p_x}\right)$ and the income effect $\left(-x^*\,\frac{\partial x^*}{\partial I}\right)$.
8. Suppose a consumer has the utility function $U = a \ln(x - x_0) + b \ln(y - y_0)$, where $a, b, x_0$ and $y_0$ are positive parameters. Assume that the usual budget constraint applies.
(a) Solve for the consumer's demand for good x.
Observe that the utility maximization exercise makes sense only if the consumption bundle $(x_0, y_0)$ is feasible. Let us denote $x - x_0$ by $x'$ and $y - y_0$ by $y'$. Then the utility function can be written as $U(x', y') = a \ln(x') + b \ln(y')$. The budget constraint $p_x x + p_y y = I$ can be written as $p_x x' + p_y y' = I - p_x x_0 - p_y y_0 = I'$. The utility maximization exercise can therefore be formulated as
$$\max\; a \ln(x') + b \ln(y') \quad \text{subject to } p_x x' + p_y y' = I'.$$
The FOCs are
$$\frac{\partial \mathcal{L}}{\partial x'} = \frac{a}{x'} - \lambda p_x = 0,$$
$$\frac{\partial \mathcal{L}}{\partial y'} = \frac{b}{y'} - \lambda p_y = 0,$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = I' - p_x x' - p_y y' = 0.$$
From the first two,
$$\frac{a}{x'} = \lambda p_x, \quad \frac{b}{y'} = \lambda p_y \;\Rightarrow\; \frac{a y'}{b x'} = \frac{p_x}{p_y} \;\Rightarrow\; p_y y' = \frac{b}{a}\,p_x x'.$$
The bordered Hessian is
$$H = \begin{bmatrix} \frac{\partial^2 \mathcal{L}}{\partial x'^2} & \frac{\partial^2 \mathcal{L}}{\partial x' \partial y'} & \frac{\partial^2 \mathcal{L}}{\partial x' \partial \lambda} \\ \frac{\partial^2 \mathcal{L}}{\partial x' \partial y'} & \frac{\partial^2 \mathcal{L}}{\partial y'^2} & \frac{\partial^2 \mathcal{L}}{\partial y' \partial \lambda} \\ \frac{\partial^2 \mathcal{L}}{\partial x' \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial y' \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial \lambda^2} \end{bmatrix} = \begin{bmatrix} -\frac{a}{(x')^2} & 0 & -p_x \\ 0 & -\frac{b}{(y')^2} & -p_y \\ -p_x & -p_y & 0 \end{bmatrix}.$$
The border-preserving leading principal minor of order two is the Hessian matrix itself. For the second order condition to be satisfied, the determinant of the Hessian needs to be positive:
$$\det H = (-p_x)\left[-(-p_x)\left(-\frac{b}{(y')^2}\right)\right] - (-p_y)\left[(-p_y)\left(-\frac{a}{(x')^2}\right)\right] = \frac{b\,p_x^2}{(y')^2} + \frac{a\,p_y^2}{(x')^2} > 0.$$
Thus the SOC holds and we have a maximum. The optimum consumption bundle is
$$x^* = x'^* + x_0 = \frac{a}{a+b}\,\frac{I - p_x x_0 - p_y y_0}{p_x} + x_0 = \frac{a}{a+b}\,\frac{I - p_y y_0}{p_x} + \frac{b}{a+b}\,x_0,$$
$$y^* = \frac{b}{a+b}\,\frac{I - p_x x_0}{p_y} + \frac{a}{a+b}\,y_0.$$
(b) Find the elasticities of demand for good x with respect to income and prices.
It is easy to compute the price and income elasticity using the definitions. Please let me
know if you have any questions on this.
(c) Show that the utility function $V = 45(x - x_0)^{3.5a}(y - y_0)^{3.5b}$ would have yielded the same demand for good x.
If we take a positive monotone transformation of the given utility by taking its natural log, then we get a function which is similar to the utility function in (a):
$$\ln V = \ln 45 + 3.5a\,\ln(x - x_0) + 3.5b\,\ln(y - y_0) = \ln 45 + 3.5\,U.$$
This implies that the consumption bundle $(x^*, y^*)$ will maximize the utility function V also.
The determinant of the leading principal minor of order one is $-\frac{a}{x^2} < 0$; of the leading principal minor of order two is $\frac{ab}{x^2 y^2} > 0$; and of the leading principal minor of order three is $-\frac{abc}{x^2 y^2 z^2} < 0$, for all $(x, y, z) \in X$. Hence f is concave. Further, $g^j$ ($j = 1, \cdots, 5$) are concave, being linear functions.
Hence all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $\left((x^*, y^*, z^*), \lambda^*\right) \in X \times \mathbb{R}^5_+$ that satisfies the Kuhn-Tucker conditions
$$D_i f(x^*, y^*, z^*) + \sum_{j=1}^{5} \lambda_j^* D_i g^j(x^*, y^*, z^*) = 0, \quad i = 1, \cdots, 3.$$
They are
$$\frac{a}{x} - \lambda_1 p - \lambda_2 + \lambda_3 = 0 \tag{31.29}$$
$$\frac{b}{y} - \lambda_1 q + \lambda_4 = 0 \tag{31.30}$$
$$\frac{c}{z} - \lambda_1 r + \lambda_5 = 0 \tag{31.31}$$
$$I - px - qy - rz \ge 0, \quad \lambda_1(I - px - qy - rz) = 0 \tag{31.32}$$
$$k - x \ge 0, \quad \lambda_2(k - x) = 0 \tag{31.33}$$
$$x \ge 0,\; \lambda_3 x = 0; \quad y \ge 0,\; \lambda_4 y = 0; \quad z \ge 0,\; \lambda_5 z = 0. \tag{31.34}$$
If $\lambda_1 = 0$, then $\frac{b}{y} - \lambda_1 q + \lambda_4 = 0 \Rightarrow \lambda_4 = -\frac{b}{y} < 0$, which contradicts $\lambda_4 \ge 0$. Hence
$$\lambda_1 > 0 \;\Rightarrow\; I - px - qy - rz = 0.$$
Also, $x > 0$, $y > 0$, and $z > 0$ for the three FOCs to hold with equality. Thus $\lambda_3 = \lambda_4 = \lambda_5 = 0$.
(i) If $\lambda_2 > 0$, then $x = k$, and
$$I - pk = qy + rz = \frac{b}{\lambda_1} + \frac{c}{\lambda_1}.$$
Thus $\lambda_1 = \frac{b+c}{I - pk}$, which leads to
$$y = \frac{b(I - pk)}{q(b+c)} \quad \text{and} \quad z = \frac{c(I - pk)}{r(b+c)}.$$
We need to verify $\lambda_2 > 0$, which will hold if $\lambda_2 = \frac{a}{k} - \frac{(b+c)p}{I - pk} > 0$, or
$$\frac{a}{b+c} > \frac{pk}{I - pk}.$$
(ii) If $\lambda_2 = 0$, then
$$x = \frac{aI}{p(a+b+c)}, \quad y = \frac{bI}{q(a+b+c)}, \quad z = \frac{cI}{r(a+b+c)}, \quad \lambda_1 = \frac{a+b+c}{I}$$
satisfies the KT conditions (please verify).
(b)
$$\frac{a}{b+c} > \frac{pk}{I - pk}.$$
(c)
$$\frac{qy}{rz} = \frac{\frac{b(I-pk)}{b+c}}{\frac{c(I-pk)}{b+c}} = \frac{b}{c}.$$
(d) No, it is more likely that one buys more rice and less butter.