Math Review 2018

Ram Sewak Dubey

Department of Economics, Feliciano School of Business, Montclair State University, Montclair, New Jersey 07043
E-mail address: dubeyr@mail.montclair.edu
Contents

Preface v

Chapter 1. Syllabus vii


§1.1. Overview vii
§1.2. Course Schedule viii
§1.3. Topics covered viii
§1.4. Textbook ix
§1.5. Mathematics Proficiency Test ix

Chapter 2. Introduction to Logic 1


§2.1. Introduction 1
§2.2. Statements 2
§2.3. Logical Connective 3
§2.4. Quantifiers 8
§2.5. Rules of Negation of statements with quantifiers 12
§2.6. Logical Equivalences 14
§2.7. Some Math symbols and Definitions 15

Chapter 3. Proof Techniques 17


§3.1. Methods of Proof 17
§3.2. Trivial Proofs 18
§3.3. Vacuous Proofs 18
§3.4. Proof by Construction 19


§3.5. Proof by Contraposition 21


§3.6. Proof by Contradiction 22
§3.7. Proof by Induction 24
§3.8. Additional Notes on Proofs 28
§3.9. Decomposition or proof by cases 30
Chapter 4. Problem Set 1 35
Chapter 5. Set Theory, Sequence 37
§5.1. Set Theory 37
§5.2. Set Identities 42
§5.3. Functions 44
§5.4. Vector Space 45
§5.5. Sequences 50
§5.6. Sets in Rn 56
Chapter 6. Problem Set 2 63
Chapter 7. Linear Algebra 67
§7.1. Vectors 67
§7.2. Matrices 69
§7.3. Determinant of a matrix 74
§7.4. An application of matrix algebra 77
§7.5. System of Linear Equations 80
§7.6. Cramer’s Rule 84
§7.7. Principal Minors 86
§7.8. Quadratic Form 87
§7.9. Eigenvalue and Eigenvectors 88
§7.10. Eigenvalues of symmetric matrix 91
§7.11. Eigenvalues, Trace and Determinant of a Matrix 92
Chapter 8. Problem Set 3 95
Chapter 9. Single and Multivariable Calculus 99
§9.1. Functions 99
§9.2. Surjective and Injective Functions 99
§9.3. Composition of Functions 102
§9.4. Continuous Functions 103

§9.5. Extreme Values 106


§9.6. An application of Extreme Values Theorem 107
§9.7. Differentiability 110
§9.8. Mean Value Theorem 115
§9.9. Monotone Functions 116
§9.10. Functions of Several Variables 118
§9.11. Composite Functions and the Chain Rule 122
Chapter 10. Problem Set 4 125
Chapter 11. Convex Analysis 127
§11.1. Concave, Convex Functions 127
§11.2. Quasi-concave Functions 135
Chapter 12. Problem Set 5 139
Chapter 13. Inverse and Implicit Function Theorems 141
§13.1. Inverse Function Theorem 141
§13.2. The Linear Implicit Function Theorem 142
§13.3. Implicit Function Theorem for R2 144
Chapter 14. Homogeneous and Homothetic Functions 147
§14.1. Homogeneous Functions 147
§14.2. Homothetic Functions 151
Chapter 15. Separating Hyperplane Theorem 153
§15.1. Separation by hyperplanes 153
§15.2. Separating Hyperplane Theorem 154
Chapter 16. Problem Set 6 157
Chapter 17. Unconstrained Optimization 159
§17.1. Optimization Problem 159
§17.2. Maxima / Minima for C2 functions of n variables 160
§17.3. Application: Ordinary Least Square Analysis 166
Chapter 18. Problem Set 7 171
Chapter 19. Optimization Theory: Equality Constraints 173
§19.1. Constrained Optimization 173
§19.2. Equality Constraint 175

Chapter 20. Optimization Theory: Inequality Constraints 187


§20.1. Inequality Constraint 187
§20.2. Global maximum and constrained local maximum 194
Chapter 21. Problem Set 8 201
Chapter 22. Envelope Theorem 205
§22.1. Envelope Theorem for Unconstrained Problems 205
§22.2. Meaning of the Lagrange multiplier 207
§22.3. Envelope Theorem for Constrained Optimization 208
Chapter 23. Elementary Concepts in Probability 209
§23.1. Discrete Probability Model 209
§23.2. Marginal and Conditional Distribution 213
§23.3. The Law of Iterated Expectation 216
§23.4. Continuous Random Variables 216
Chapter 24. Solution to PS 1 219
Chapter 25. Solution to PS 2 227
Chapter 26. Solution to PS 3 235
Chapter 27. Solution to PS 4 243
Chapter 28. Solution to PS 5 249
Chapter 29. Solution to PS 6 255
Chapter 30. Solution to PS 7 263
Chapter 31. Solution to PS 8 273
Preface

These notes have been prepared for the Math Review class for graduate students joining the Ph.D.
program in the field of economics at Cornell University. In preparing these notes I have referred
to the material used in previous years’ classes.
The objective of the Math Review class is to present elementary concepts from set theory, multivariable
calculus, linear algebra, elementary probability, real analysis and optimization theory.
I have used examples and problem sets to explain the concepts, definitions and techniques
which are useful in the fall-semester graduate economics classes.
These notes may serve to refresh the memory of those incoming students who are already familiar
with the material. For others, these notes can be a ready reckoner of the math techniques they will
need to know in the first few weeks of the graduate classes in economics (Econ 6090, Econ 6130,
Econ 6190) before they are discussed in Econ 6170 in a more rigorous way.
The topics have been arranged so that the entire material can be covered in thirteen classes
of three hours each. Problem sets with solutions are provided on each day’s material. Three
additional sessions of three hours each are sufficient to go over the questions in the problem sets.
It is hoped that they will help the reader better understand the material in the lecture notes.
Earlier versions of these notes were used for the Math Review classes during 2009-17. My sincere
thanks go to the participants for their comments and for pointing out typos and errors.
Ram Sewak Dubey

Chapter 1

Syllabus

Math Review 2018


Field of Economics
Cornell University

Instructor: Ram Sewak Dubey Office Room: 474F, Uris Hall


Office Hour: 12:15-1:15 pm E-mail: rsd28@cornell.edu

1.1. Overview
The Field of Economics offers the August Math Review Course for incoming first-year Ph.D.
students. The aim of this review is to refresh students’ mathematical skills and introduce concepts
that are critical to success in the first year economics core courses, i.e., Econ 6090, Econ 6130,
Econ 6170, and Econ 6190. The emphasis is on rigorous treatment of proof techniques, underlying
concepts and illustrative examples.
There is usually a great deal of variation in the mathematical background of incoming first-year
students. However, almost all students have something to gain from the review course. For those
who do not have an adequate mathematics background (by a US Ph.D. standard), the course offers
an opportunity to catch up on critical concepts and get a head start on the fall classes. For those
who took their core undergraduate courses in analysis and algebra some years ago, the course is a
good refresher. For those who do not have significant experience with technical courses taught in
English, the review offers an opportunity to pick up the math vocabulary that will be in use from
the first day of regular instruction.
The Math Review Course is funded by the Department of Economics. There is no charge for
students matriculating into the Economics Ph.D. Program. Students matriculating into other
Ph.D. programs should contact the Director of Graduate Studies in their Field. There will be a charge


for these students, and the DGS in the student’s Field must make arrangements to pay that charge
before the student may attend the Math Review Course.
The Math Review Course is not linked to Econ 6170, Intermediate Mathematical Economics I.
There is no course grade, and no record will be kept of your performance. However, the Economics
Ph.D. program strongly encourages you to attend. Most students who have taken this course in past
years have found it useful, regardless of their prior mathematics training. Perhaps most importantly,
the review period is an excellent time to get acquainted with other incoming students, meet the
faculty and settle into Ithaca.

1.2. Course Schedule


The course duration will be July 30 - August 17. There will be a lecture session each working day.
The room for all the sessions is URIS 202.
(A) Session Time:
July 30 - August 3, August 6 - 10, and August 13 - 17, 9 am - Noon.
(B) There will be a handout of some basic definitions distributed at each session, and practice
problems will be assigned on each topic. You are strongly encouraged to at least attempt every
problem, as this is the best way to understand the material. The problem sets will be due the
following day in class (for example, the problem set given in class on Monday will be due on
Tuesday) and I intend to grade some of the questions in each problem set. We will go over the
solutions to the problem sets in class.

1.3. Topics covered


A. Elements of Logic: Statements, Truth tables, Implications, Tautologies, Contradictions, Logical
Equivalence, Quantifiers, Negation of Quantified Statements
B. Proof Techniques: Trivial Proofs, Vacuous Proofs, Direct Proofs, Proof by Contrapositive,
Proof by Cases, Proof by Contradiction, Existence Proofs, Proof by Mathematical Induction
C. Set Theory: Definitions, Set Equality, Set Operations, Venn Diagrams, Set Identities, Cartesian
Products, Properties of the Set of Real Numbers
D. Sequences: Convergent Sequences, Subsequences, Cauchy Sequences, Upper and Lower Limits,
Algebraic Properties of Limits, Monotone Sequences
E. Functions of One Variable: Limits of Functions, Continuous Functions, Monotone Functions,
Properties of Exponential and Logarithmic Functions
F. Linear Algebra: System of Linear Equations, Solution by Substitution or Elimination of Variables,
Systems with Many or No Solutions
G. Vectors I: Addition, Subtraction, Scalar Multiplication, Length, Distance, Inner Product
H. Matrix Algebra I: Addition, Subtraction, Scalar and Matrix Multiplication, Transpose, Laws of
Matrix Algebra
I. Determinants: Definition, Computation, Properties, Use of Determinants, Matrix Inverse,
Cramer’s Rule
J. Vectors II: Linear Independence, Rn as an example of Vector Space, Basis and Dimension in
Rn
K. Matrix Algebra II: Algebra of Square Matrices, Eigenvalues, Eigenvectors, Properties of
Eigenvalues
L. Differential Calculus: Derivative of a Real Function, Mean Value Theorem, Continuity of
Derivatives, L’Hospital’s Rule, Higher Order Derivatives, Taylor’s Theorem
M. Functions of Several Variables: Graphs of Functions of Two Variables, Level Curves, Continuous
Functions, Total Derivative, Chain Rule, Partial Derivatives
N. Unconstrained Optimization: First Order Conditions, Global Maxima and Minima, Examples
O. Constrained Optimization with equality constraints: First Order Conditions, Constrained
Minimization Problems, Examples
P. Constrained Optimization with inequality constraints: Kuhn-Tucker conditions, Interpreting
the Multipliers, Envelope Theorem

1.4. Textbook
There is no textbook for the math review course; however, the following books may be helpful.
The textbook of Mas-Colell et al. (1995) is used in the Microeconomics course sequence. Simon
and Blume (1994) and Wainwright and Chiang (2005) are useful textbooks for Mathematical
Economics. It will be useful to refer to ? for understanding the material. Copies of
this textbook are available in the libraries. ? will be our reference book for analysis. ? contains
many useful examples. Mitra (2013) is the set of Lecture Notes used in Econ 6170.

1.5. Mathematics Proficiency Test


A Mathematics Proficiency Test will be given on Friday, August 17, 2018, from 12:30 pm - 3:30
pm in URIS 202. The test will be based on the course material of Economics 6170. If you pass
this test, you have satisfied the mathematics proficiency requirement of the field of economics,
and need not take the Economics 6170 course. If you fail this test, or if you do not take this test,
you can complete the mathematics proficiency requirement of the field of economics by taking the
Economics 6170 course for credit, and getting a course grade of B- or better.
If you would like any more information, you can contact me at rsd28@cornell.edu. Enjoy
your summer and I look forward to meeting you in August.
Bibliography

Bridges, D., Ray, M., 1984. What is constructive mathematics. The Mathematical Intelligencer
6 (4), 32–38.
Dixit, A. K., 1990. Optimization in Economic Theory, 2nd Edition. Oxford University Press, USA.
Mas-Colell, A., Whinston, M., Green, J., 1995. Microeconomic Theory. Oxford University Press,
USA.
Mitra, T., 2013. Lectures on Mathematical Analysis for Economists. Campus Book Store.
Olsen, L., October 2004. A new proof of Darboux’s Theorem. The American Mathematical
Monthly 111, 173–175.
Royden, H. L., 1988. Real Analysis. Prentice Hall.
Simon, C. P., Blume, L., 1994. Mathematics for Economists. W. W. Norton & Co., New York.
Strichartz, R., 2000. The Way of Analysis. Jones and Bartlett.
Wainwright, E. K., Chiang, A., 2005. Fundamental Methods of Mathematical Economics, 4th Edi-
tion. McGraw Hill, New York.

Chapter 2

Introduction to Logic

2.1. Introduction
The theory that you’ll learn during the first year is built on a foundation borrowed from engineering
and pure mathematics. You will be required to both understand and reproduce certain key proofs,
particularly in microeconomics. On some problem sets and exams you’ll be asked to produce your
own proofs.
If you haven’t taken any pure math courses, you might be thinking “I don’t even know what a
proof is”. That is completely fine. There are plenty of very accomplished Ph.D. students at Cornell
who had no idea how to write a proof when they arrived. It’s important not to get discouraged
because it takes time to learn how to write good proofs. There is a standard bag of tricks that will
get you through almost any proof in the first year sequence, but it takes exposure and then practice
for you to learn and be comfortable with these tricks. Math majors are at an advantage here, more
than in most areas, but by the end of the year they’ll have forgotten the fancier proof techniques
and you’ll have learned the necessary ones, so the field will be surprisingly leveled.
A proof is a series of statements that demonstrates the truth of a proposition. In writing a proof
you make use of (i) the rules of logic and (ii) definitions, theorems, and other propositions that
have already been proved, or that you are told you can take as given.
The rules of logic are obviously fixed and unchanging. The components of the second point,
however, will vary depending on the task at hand. The most important question to ask yourself
when attempting to prove a proposition is “What do I already know”? It will often be the case that
if you write down all of the relevant mathematical definitions, the theorems or results that you were
given or that you know you can take as given, and any result that you just proved in a previous
problem, a straightforward rearrangement of everything on the page will give you the proof that
you want.


In this chapter we will discuss the principles of logic that are essential for problem solving in
mathematics. The ability to reason using the principles of logic is key to seeking the truth, which is
our goal in mathematics. Before we explore and study logic, let us start by spending some time
motivating this topic. Mathematicians reduce problems to the manipulation of symbols using a set
of rules. As an illustration, let us consider the following problem.
Example 2.1. Joe is 7 years older than John. Six years from now Joe will be twice John’s age.
How old are Joe and John?
Solution 2.1. To answer the above question, we reduce the problem using symbolic formulation.
We let John’s age be x. Then Joe’s age is x + 7. We are given that six years from now Joe will be
twice John’s age. In symbols, (x + 7) + 6 = 2(x + 6). Solving for x yields x = 1. Therefore, John
is 1 year old and Joe is 8.
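The symbolic solution can be checked mechanically. A minimal Python sketch (a brute-force search standing in for solving the equation by hand; the variable names are ours):

```python
# Joe is 7 years older than John; six years from now Joe will be
# twice John's age: (x + 7) + 6 == 2 * (x + 6), where x is John's age.
solutions = [x for x in range(0, 120) if (x + 7) + 6 == 2 * (x + 6)]

assert solutions == [1]  # John is 1 year old
print("John:", solutions[0], "Joe:", solutions[0] + 7)
```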

Our objective is to reduce the process of mathematical reasoning, i.e., logic, to the manipulation
of symbols using a set of rules. The central concept of deductive logic is the concept of argument
form. An argument is a sequence of statements aimed at demonstrating the truth of an assertion (a
“claim”). Consider the following two arguments.
Argument 1. If x is a real number such that x < −3 or x > 3, then x² > 9. Therefore, if x² ≤ 9,
then x ≥ −3 and x ≤ 3.
Argument 2. If it is raining or I am sick, then I stay at home. Therefore, if I do not stay at home,
then it is not raining and I am not sick.

Although the content of the above two arguments is very different, their logical form is the
same. To illustrate the logical form of these arguments, we use letters of the alphabet (such as p, q
and r) to represent the component sentences and the expression “not p” to refer to the sentence “It
is not the case that p.” Then the common logical form of both the arguments above is as follows:
If p or q, then r. Therefore, if not r, then not p and not q.
We start by identifying and giving names to the building blocks which make up an argument. In
Arguments 1 and 2, we identified the building blocks as follows:
Argument 1. If x is a real number such that x < −3 (p) or x > 3 (q), then x² > 9 (r). Therefore,
if x² ≤ 9 (not r), then x ≥ −3 (not p) and x ≤ 3 (not q).
Argument 2. If it is raining (p) or I am sick (q), then I stay at home (r). Therefore, if I do not
stay at home (not r), then it is not raining (not p) and I am not sick (not q).

2.2. Statements
The study of logic is concerned with the truth or falsity of statements.
Definition 2.1 (Statement). A statement is a sentence which can be classified as true or false
without ambiguity. The truth or falsity of the statement is known as the truth value.

For a sentence to be a statement, it is not necessary for us to know whether it is true or false.
However, it must be clear that it is one or the other.
Example 2.2. Consider following examples.
(a) One plus two equals three. It is a statement which is true.
(b) One plus one equals three. It is also a statement which is not true.
(c) He is a university student. This sentence is neither true nor false. The truth or falsity depends
on the reference for the pronoun he. For some values of he the sentence is true; for others it is
false, and so it is not a statement.
(d) “Every continuous function is differentiable.” is a statement with truth value being false.
(e) “x < 1 ” is true for some values of x and false for some others. It is a statement if we have
some particular context in mind. Otherwise, it is not a statement.
(f) Goldbach’s Conjecture “Every even number greater than 2 is the sum of two prime numbers”
is a statement whose truth value is not known yet.
(g) “There are infinitely many prime numbers of the form 2n + 1, where n is a natural number.” is
another statement whose truth value is not known till now.
Every statement has a truth value, namely true (denoted by T) or false (denoted by F). We often use
p, q and r to denote statements, or perhaps p1 , p2 , · · · , pn if there are several statements involved.
Exercise 2.1. Which of the following sentences are statements?
(a) If x is a real number, then x² ≥ 0.
(b) 11 is a prime number.
(c) This sentence is false.
The possible truth values of a statement are often given in a table, called a truth table. The truth
values for two statements p and q are given below. Since there are two possible truth values for
each of p and q, there are four possible combinations of truth values for p and q. It is customary to
consider the four combinations of truth values in the order of TT, TF, FT, FF from top to bottom.
        p   q
        T   T
(2.1)   T   F
        F   T
        F   F

2.3. Logical Connective


A logical connective (also called a logical operator) is a symbol or word used to connect two or
more statements such that the compound statement produced has a truth value dependent on the
respective truth values of the original statements.
We discuss some of the elementary logical operators (connectives) first.

(1) Logical Negation


Logical negation is an operation on one logical value, typically, the value of a proposition,
that produces a value of true if its operand is false and a value of false if its operand is true.
The truth table for ¬A (also written as NOT A or ∼ A) is as follows:

        A   ¬A
(2.2)   T    F
        F    T
For example, consider the statement,
p : The integer 2 is even.
Then the negation of p is the statement
∼ p : It is not the case that the integer 2 is even.
It would be better to write,
∼ p : The integer 2 is not even.
Or better yet to write,
∼ p : The integer 2 is odd.
(2) Logical Conjunction
Logical conjunction is an operation on the values of two propositions, that produces a value
of true if and only if both of its operands are true. The truth table for A ∧ B (also written as
A AND B) is as follows:

        A   B   A ∧ B
        T   T     T
(2.3)   T   F     F
        F   T     F
        F   F     F
In words, if both A and B are true, then the conjunction A ∧ B is true. For all other assignments
of logical values to A and to B the conjunction A ∧ B is false.
For example, consider the statements
p : The integer 2 is even.

q : 4 is less than 3.
The conjunction of p and q, namely,
p ∧ q : The integer 2 is even and 4 is less than 3,
is a false statement since q is false (even though p is true).

(3) Logical Disjunction


Logical disjunction is an operation on the values of two propositions, that produces a value
of false if and only if both of its operands are false. The truth table for A ∨ B (also written as
A OR B) is as follows:

        A   B   A ∨ B
        T   T     T
(2.4)   T   F     T
        F   T     T
        F   F     F
Thus for the statements p and q described earlier, the disjunction of p and q, namely,
p ∨ q : The integer 2 is even or 4 is less than 3,
is a true statement since at least one of p and q is true (in this case, p is true).
(4) Logical Implication
Logical implication is associated with an operation on the values of two propositions, that
produces a value of false only in the case that the first operand is true and the second operand
is false. The truth table associated with A ⇒ B is as follows:

        A   B   A ⇒ B
        T   T     T
(2.5)   T   F     F
        F   T     T
        F   F     T
The last row of the table may appear to be counterintuitive. Note, however, that the use of “if
· · · then ” as a connective is quite different from that of day-to-day language.
Consider the following example.
Example 2.3. Suppose your supervisor makes you the following promise:
“If you meet the month-end deadline, then you will get a bonus.”
Under what circumstances are you justified in saying that your supervisor spoke falsely?
The answer is: You do meet the month-end deadline and you do not get a bonus. Your
supervisor’s promise only says that you will get a bonus if a certain condition (you meet the
month-end deadline) is met; it says nothing about what will happen if the condition is not met.
So if the condition is not met, your supervisor did not lie (your supervisor promised nothing if
you did not meet the month-end deadline); so your supervisor told the truth in this case. Are
you convinced? Good! If not, let us then check the truth and falsity of the implication based
on the various combinations of the truth values of the statements
p: You meet the month-end deadline;
q: You get a bonus.
The given statement can be written as p ⇒ q.

Suppose first that p is true and q is true. That is, you meet the month-end deadline and you
do get a bonus. Did your supervisor tell the truth? Yes, indeed. So if p and q are both true,
then so too is p ⇒ q, which agrees with the first row of the truth table of (2.5).
Second, suppose that p is true and q is false. That is, you meet the month-end deadline
and you did not get a bonus. Then your supervisor did not do as he / she promised. What your
supervisor said was false, which agrees with the second row of the truth table of (2.5).
Third, suppose that p is false and q is true. That is, you did not meet the month-end
deadline and you did get a bonus. Your supervisor (who was most generous) did not lie (your
supervisor promised nothing if you did not meet the month-end deadline); so he/she told the
truth. This agrees with the third row of the truth table of (2.5).
Finally, suppose that p and q are both false. That is, you did not meet the month-end
deadline and you did not get a bonus. Your supervisor did not lie here either. Your supervisor
only promised you a bonus if you met the month-end deadline. So your supervisor told the
truth. This agrees with the fourth row of the truth table of (2.5).
In summary, the implication p ⇒ q is false only when p is true and q is false.
A conditional (or implication) statement that is true by virtue of the fact that its hypothesis
is false is said to be vacuously true or true by default. Thus the statement: “If you meet
the month-end deadline, then you will get a bonus” is vacuously true if you do not meet the
month-end deadline!
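The summary above ("p ⇒ q is false only when p is true and q is false") can be checked by brute force, encoding the implication as "not p or q". A minimal Python sketch; the helper name `implies` is ours:

```python
from itertools import product

def implies(p, q):
    # p => q has the same truth table as (not p) or q
    return (not p) or q

# Enumerate the four rows in the customary TT, TF, FT, FF order.
for p, q in product([True, False], repeat=2):
    print(p, q, implies(p, q))

# The implication is false in exactly one row: p true and q false.
false_rows = [(p, q) for p, q in product([True, False], repeat=2)
              if not implies(p, q)]
assert false_rows == [(True, False)]
```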

Example 2.4. Consider the expression 4 + 1 = 9 ⇒ 8 − 1 = 3. It may not be apparent why this
statement is assigned a truth value of T. But that it is indeed true can be seen as follows: if
4 + 1 = 9, then subtracting 4 from both sides gives 1 = 5, and therefore 8 − 1 = 8 − 5 = 3.

(5) Logical Equality


Logical equality is an operation on the values of two propositions, that produces a value of
true if and only if both operands are false or both operands are true. The truth table for A ≡ B
is as follows:

        A   B   A ≡ B
        T   T     T
(2.6)   T   F     F
        F   T     F
        F   F     T

So A ≡ B is true if A and B have the same truth value (both true or both false), and false if they
have different truth values.

Definition 2.2. A compound statement (a statement with connectives) is said to be a tautology if it is
always true regardless of the truth values of the simple statements from which it is constructed. It is
a contradiction if it is always false. Thus a tautology and a contradiction are negations of each other.

Example 2.5. A ∨ (¬A) is a tautology, while A ∧ (¬A) is a contradiction.

        A   ¬A   A ∨ (¬A)   A ∧ (¬A)
(2.7)   T    F       T          F
        F    T       T          F

Example 2.6. [A ∧ (A ⇒ B)] ⇒ B is a tautology.

        A   B   A ⇒ B   A ∧ (A ⇒ B)   [A ∧ (A ⇒ B)] ⇒ B
        T   T     T          T                 T
(2.8)   T   F     F          F                 T
        F   T     T          F                 T
        F   F     T          F                 T
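A brute-force check complements the table: a formula is a tautology exactly when it evaluates to true under every assignment of truth values. A small Python sketch (the function and variable names are ours):

```python
from itertools import product

def is_tautology(formula, n_vars):
    # True iff the Boolean formula holds under all 2**n_vars assignments.
    return all(formula(*vals)
               for vals in product([True, False], repeat=n_vars))

# [A and (A => B)] => B, writing X => Y as (not X) or Y.
modus_ponens = lambda a, b: (not (a and ((not a) or b))) or b
assert is_tautology(modus_ponens, 2)

# By contrast, A and (not A) is a contradiction: true under no assignment.
contradiction = lambda a: a and (not a)
assert not any(contradiction(a) for a in [True, False])
```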
Definition 2.3.
(a) The converse of A ⇒ B is B ⇒ A.
(b) The inverse of A ⇒ B is ∼ A ⇒∼ B.
(c) The contrapositive of A ⇒ B is ∼ B ⇒∼ A.
Example 2.7. Write the converse, inverse and contrapositive of the statement in Example 2.3.

Recall that the given statement can be written as p ⇒ q where p and q are the statements:
p: You meet the month-end deadline;
q: You get a bonus.
(a) The converse of this implication is q ⇒ p: If you get a bonus, then you have met the month-end
deadline.
(b) The inverse of this implication is ∼ p ⇒∼ q: If you do not meet the month-end deadline, then
you will not get a bonus.
(c) The contrapositive of this implication is ∼ q ⇒∼ p: If you do not get a bonus, then you will
not have met the month-end deadline.
The following theorem is extremely useful.
Theorem 2.1. (A ⇒ B) ⇔ (∼ B ⇒∼ A).

Proof. Using the truth table,

        A   B   A ⇒ B   ∼B   ∼A   ∼B ⇒ ∼A
        T   T     T      F    F       T
(2.9)   T   F     F      T    F       F
        F   T     T      F    T       T
        F   F     T      T    T       T
The entries in the third and sixth columns are identical. □
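The column comparison in the proof can also be carried out mechanically: tabulate A ⇒ B and ∼B ⇒ ∼A over all four rows and check that the columns agree. A minimal Python sketch (variable names are ours):

```python
from itertools import product

rows = list(product([True, False], repeat=2))  # (A, B) in TT, TF, FT, FF order

# Write X => Y as (not X) or Y and tabulate each column.
col_implication    = [(not a) or b for a, b in rows]              # A => B
col_contrapositive = [(not (not b)) or (not a) for a, b in rows]  # ~B => ~A

# Theorem 2.1: the two columns agree row by row.
assert col_implication == col_contrapositive

# The converse B => A is NOT equivalent: its column differs.
col_converse = [(not b) or a for a, b in rows]
assert col_implication != col_converse
```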

Remark 2.1. It is an exercise to see that A ⇒ B is not logically equivalent to its converse, B ⇒ A.
One should avoid the very common mistake of claiming the opposite.
Example 2.8. Consider following two statements,
(A) Cornell is in Ithaca.
(B) Cornell is in NY state.
and the compound statements:
(a) Implication : A ⇒ B : If Cornell is in Ithaca, then Cornell is in NY state.
(b) Contrapositive : ∼ B ⇒∼ A : If Cornell is NOT in NY state, then Cornell is NOT in Ithaca.
(c) Converse : B ⇒ A : If Cornell is in NY state, then Cornell is in Ithaca.
Note that the converse statement is FALSE. This leads us to another important interpretation
of the implication A ⇒ B. It means that every time A is true, then B must be true. Hence A is a
sufficient condition for B. If we know that A is true then we can always conclude that B is also
true. The contrapositive ∼ B ⇒∼ A showed us that when B is not true then A cannot be true either.
Hence B is a necessary condition for A. If A is true we must necessarily have that B is true, because
if B isn’t true then A cannot be true either. Thus we have the following ways of reading A ⇒ B:

(2.10)   A ⇒ B :   A implies B;
                   if A, then B;
                   A is sufficient for B;
                   B is necessary for A.

Remark 2.2. Note that for the equivalence (“if and only if”) A ⇔ B, the implication goes
in both directions. In this case A and B are necessary and sufficient conditions for each other.
A ⇔ B means that both the statement A ⇒ B and its converse B ⇒ A are true.

2.4. Quantifiers
In the previous sections, we learnt some definitions and basic properties of compound statements.
We were interested in whether a particular statement was true or false. This logic is called
propositional logic or statement logic. However, there are many arguments whose validity cannot be
verified using propositional logic. Consider, for example, the sentence
p : x is an even integer.
This sentence is neither true nor false. The truth or falsity depends on the value of the variable x.
For some values of x the sentence is true; for others it is false. Thus this sentence is not a statement.
However, let us denote this sentence by P(x), i.e.,
P(x) : x is an even integer.
Then, P(5) is false, while P(6) is true. To study the properties of such sentences, we need to extend
the framework of propositional logic to what is called first-order logic.

Definition 2.4. A predicate or propositional function is a sentence that contains a finite number of
variables and becomes a statement when specific values are substituted for the variables. The
domain of a predicate variable is the set of all values that may be substituted in place of the variables.

In our earlier example, the sentence

P(x) : x is an even integer

is a propositional function with domain D, the set of integers; since for each x ∈ D, P(x) is a
statement, i.e., for each x ∈ D, P(x) is true or false, but not both.

Example 2.9. The following are examples of predicate or propositional functions:

(a) The sentence “P(x) : x + 3 is an even integer” with domain D the set of positive integers.
(b) The sentence “P(x) : x + 3 is an even integer” with domain D the set of integers.
(c) The sentence “P(x, y, z) : x² + y² = z²” with domain D the set of positive integers.
Before proceeding further, we introduce the following notation. A more comprehensive list of
notation will be described later.

∈ : “is an element of”,
∋ : “such that”,
∧ : AND, in the sense that A ∧ B means both A and B,
∨ : OR, in the sense that A ∨ B means either A or B or both,
∀ : Universal quantifier, “for all”,
∃ : Existential quantifier, “there exists” (one or more).
(a) The Universal Quantifier:
Let P(x) be a predicate with domain D. Then the sentence

Q(x) : for all x, P(x)

is a statement. To see this, notice that either P(x) is true at each value x ∈ D (the notation x ∈ D
indicates that x is in the set D, while x ∉ D means that x is not in D) or P(x) is false for at least
one value of x ∈ D. If P(x) is true at each value x ∈ D, then Q(x) is true. However, if P(x) is
false for at least one value of x ∈ D, then Q(x) is false. Hence, Q(x) is a statement because it is
either true or false (but not both).

Definition 2.5. Each of the phrases “every”, “for every”, “for each”, and “for all” is referred
to as the universal quantifier and is expressed by the symbol ∀. Let P(x) be a predicate with
domain D. A universal statement is a statement of the form ∀x ∈ D, P(x). It is false if P(x) is
false for at least one x ∈ D; otherwise, it is true.

Example 2.10. Let D be a set.



The statement
∀x ∈ D, x > 0
means “For all x that are elements of D, x is positive.”
Example 2.11. Let P(x) be the predicate “P(x) : x² ≥ x.”
Determine whether the following universal statements are true or false.
(i) ∀x ∈ R; P(x);
(ii) ∀x ∈ Z; P(x).
(i) Let x = 1/2 ∈ R. Then (1/2)² = 1/4 < 1/2, and so P(1/2) is false. Therefore, “∀x ∈ R; P(x)”
is false.
(ii) For all integers x, x² ≥ x is true, and so P(x) is true for all x ∈ Z. Hence, “∀x ∈ Z; P(x)”
is true.
(b) The Existential Quantifier:
Each of the phrases “there exists”, “there is”, “for some”, and “for at least one” is referred
to as the existential quantifier and is denoted by the symbol ∃. Let P(x) be a predicate with domain
D. An existential statement is a statement of the form ∃x ∈ D such that P(x). It is true if P(x)
is true for at least one x ∈ D; otherwise, it is false.
Example 2.12. As before let D be a set.
The statement
∃x ∈ D s.t. x > 0
tells us that “There exists an element x of D such that x is positive.”
Example 2.13. Let P(x) be the predicate “P(x) : x² < x.”
Determine whether the following existential statements are true or false.
(i) ∃x ∈ R; P(x);
(ii) ∃x ∈ Z; P(x).
(i) Let x = 1/2 ∈ R. Then, (1/2)² = 1/4 < 1/2, and so P(1/2) is true. Therefore, “∃x ∈ R; P(x)” is
true.
(ii) For every integer x, x² ≥ x is true, and so there is no x ∈ Z such that P(x) is true. Hence, “∃x ∈
Z; P(x)” is false.
(c) Universal Conditional Statements
Recall that a conditional statement has a contrapositive, a converse, and an inverse. These
definitions can be extended to universal conditional statements. Consider a universal condi-
tional statement of the form ∀x ∈ D; P(x) ⇒ Q(x).
(i) Its contrapositive is the statement,
∀x ∈ D; ∼ Q(x) ⇒∼ P(x).
(ii) Its converse is the statement,
∀x ∈ D; Q(x) ⇒ P(x)

(iii) Its inverse is the statement,


∀x ∈ D; ∼ P(x) ⇒∼ Q(x).
Example 2.14. Write the contrapositive, converse, and inverse of the statement: If a real num-
ber is greater than 3, then its square is greater than 9.
Solution 2.2. Symbolically, the statement can be written as:
∀x ∈ R; if x > 3 then x² > 9.
Here P(x) is the statement x > 3 and Q(x) the statement x² > 9.
(i) The contrapositive is:
∀x ∈ R; if x² ̸> 9 then x ̸> 3,
or, equivalently,
∀x ∈ R; if x² ≤ 9 then x ≤ 3.
(ii) The converse is:
∀x ∈ R; if x² > 9 then x > 3.
Note that the converse is false; take, for example, x = −4. Then (−4)² > 9 is true but
−4 > 3 is false, so the implication fails at x = −4, and hence the universal statement
“∀x ∈ R; if x² > 9 then x > 3” is false.
(iii) The inverse is:
∀x ∈ R; if x ̸> 3 then x² ̸> 9,
or, equivalently,
∀x ∈ R; if x ≤ 3 then x² ≤ 9.
(d) Order of quantifiers:
If the quantifiers are of the same type, the order in which they appear does not matter:
∀x ∀y : x + y = y + x,
∃x ∃y : x + y = 2 ∧ x + 2y = 3.
But if the quantifiers are of different types we have to be careful. For the set of real numbers,
the statement
(2.11) ∀x ∃y s.t. y > x
is TRUE; that is, given any real number x, there is always a real number y that is greater than x.
But the statement
(2.12) ∃y s.t. ∀x, y > x
is FALSE, since there is no fixed real number y that is greater than every real number.
Example 2.15. The statement [∃y ∈ U s.t. ∀x ∈ V, statement A] means that one y will make A
true regardless of what x is. The statement [∀x ∈ V, ∃y ∈ U s.t. statement A] means that A can be
made true by choosing y depending on x.
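The difference between the two quantifier orders can be checked mechanically on a finite domain. The text's predicate y > x needs an unbounded domain to make (2.11) true, so this sketch uses the predicate x + y = 0 instead; the domain D is an illustrative choice, not from the text.

```python
# Finite stand-in domain for the set of real numbers.
D = range(-3, 4)

# "for all x there exists y": y may be chosen depending on x (here y = -x).
forall_exists = all(any(x + y == 0 for y in D) for x in D)

# "there exists y such that for all x": one fixed y must serve every x.
exists_forall = any(all(x + y == 0 for x in D) for y in D)

assert forall_exists and not exists_forall
```

The first statement is true because y can depend on x; the second is false because no single y works for every x, exactly the asymmetry described above.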

2.5. Rules of Negation of statements with quantifiers


Fact 1. The negation of a universal statement of the form ∀x ∈ D; P(x) is logically equivalent to an
existential statement of the form ∃x ∈ D; such that ∼ P(x). Symbolically,
∼ [∀x ∈ D; P(x)] ≡ ∃x ∈ D; such that ∼ P(x)
Consider the universal statement ∀x ∈ D; P(x). It is false if P(x) is false for at least one x ∈ D;
otherwise, it is true. Hence it is false if and only if P(x) is false for at least one x ∈ D, or, if and
only if ∼ P(x) is true for at least one x ∈ D. Thus the negation of this statement is the statement
∃x ∈ D such that ∼ P(x).
Example 2.16. What is the negation of the statement “All mathematicians wear glasses”?
Solution 2.3. Let us write this statement symbolically. Let D be the set of all mathematicians and
let P(x) be the predicate “x wears glasses” with domain D. The given statement can be written as
∀x ∈ D; P(x). The negation is ∃x ∈ D such that ∼ P(x). In words, the negation is “There exists a
mathematician who does not wear glasses” or “Some mathematicians do not wear glasses”.
Fact 2. The negation of an existential statement of the form ∃x ∈ D such that P(x) is logically
equivalent to a universal statement of the form ∀x ∈ D; ∼ P(x). Symbolically,
∼ (∃x ∈ D such that P(x)) ≡ ∀x ∈ D; ∼ P(x).
Consider the existential statement, ∃x ∈ D such that P(x). It is true if P(x) is true for at least
one x ∈ D; otherwise, it is false. Hence it is false if and only if P(x) is false for all x ∈ D, in other
words, if and only if ∼ P(x) is true for all x ∈ D. Thus the negation of this statement is the statement
∀x ∈ D; ∼ P(x).
Example 2.17. What is the negation of the statement “Some politicians are honest”?
Solution 2.4. Let us write this statement symbolically. Let D be the set of all politicians and let
P(x) be the predicate “x is honest” with domain D. The given statement can be written as ∃x ∈ D
such that P(x). The negation is ∀x ∈ D; ∼ P(x). In words, the negation is “Every politician is
dishonest”, that is, “No politician is honest”.
Consider next the negation of a universal conditional statement. By Fact 1, we have
that ∼ (∀x ∈ D; (P(x) ⇒ Q(x))) ≡ ∃x ∈ D such that ∼ (P(x) ⇒ Q(x)). But the negation of an “if
p then q” statement is logically equivalent to a “p and not q” statement. Hence, ∼ (P(x) ⇒
Q(x)) ≡ P(x)∧ ∼ Q(x). Therefore we have the following fact:
Fact 3. The negation of a universal conditional statement of the form ∀x ∈ D; (P(x) ⇒ Q(x)) is
logically equivalent to the existential statement of the form ∃x ∈ D such that (P(x)∧ ∼ Q(x)).
Symbolically,
∼ (∀x ∈ D; (P(x) ⇒ Q(x))) ≡ ∃x ∈ D such that (P(x)∧ ∼ Q(x)).
Written less symbolically, this becomes
∼ (∀x ∈ D; if P(x) then Q(x)) ≡ ∃x ∈ D such that P(x) and ∼ Q(x).

2.5.1. More Examples. We can use truth tables to prove the following examples of negations:
∼ (A ∧ B) ⇔ ∼ A ∨ ∼ B,
∼ (A ∨ B) ⇔ ∼ A ∧ ∼ B,
∼ (x > y) ⇔ x ≤ y,
∼ (A ⇒ B) ⇔ A ∧ ∼ B,
∼ (∼ A) ⇔ A.
Try proving them (a good exercise).
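Besides a pencil-and-paper truth table, these equivalences can be verified by exhaustive enumeration of truth values. A minimal Python sketch (illustrative only, encoding A ⇒ B as ∼A ∨ B):

```python
from itertools import product

# Exhaustive truth-table check of the propositional equivalences above.
for A, B in product([True, False], repeat=2):
    assert (not (A and B)) == ((not A) or (not B))   # ~(A ∧ B) ⇔ ~A ∨ ~B
    assert (not (A or B)) == ((not A) and (not B))   # ~(A ∨ B) ⇔ ~A ∧ ~B
    assert (not ((not A) or B)) == (A and not B)     # ~(A ⇒ B) ⇔ A ∧ ~B
    assert (not (not A)) == A                        # ~(~A) ⇔ A

# ~(x > y) ⇔ x ≤ y involves numbers, so it is checked on sample pairs.
for x, y in [(1, 2), (2, 1), (3, 3)]:
    assert (not (x > y)) == (x <= y)
```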

2.5.2. Negation of a statement with one quantifier. The universal statement in Example 2.10
contains a universal quantifier term and the statement x > 0. To negate a universal statement we
need to find only one counterexample. In this example, if we can find just one x in D that is
nonpositive, we know that it is not true that all x are positive. Thus the negation of the universal
statement
∀x ∈ D, x > 0
is an existential statement,
∃x ∈ D s.t. x ≤ 0.
To negate an existential statement we must show that every possible instance is false. The existen-
tial statement
∃x ∈ D s.t. x > 0
is false if there are no positive elements of D. Thus the negation of the existential statement is a
universal statement,
∀x ∈ D, x ≤ 0.
Insight from these examples can be generalized to rules of negation. Note that “such that” always
follows ∃ (the existential quantifier).

Rule 2.1. For negating the statement, [quantifier term, statement], first change the quantifier: ∀
becomes ∃, ∃ becomes ∀ and then negate the statement.

2.5.3. Negation with more than one quantifier.

Rule 2.2. To negate a statement with a string of quantifiers, change the type of each quantifier,
preserve their order and negate the statement that follows the quantifiers.

Example 2.18. Statement:

(2.13) ∀ε > 0 ∃N s.t. ∀n, if n > N, then ∀x ∈ D, | fn (x) − f (x)| < ε.


Negation: ∃ε > 0 s.t. ∼ [∃N s.t. ∀n, if n > N, then ∀x ∈ D, | fn (x) − f (x)| < ε],

or ∃ε > 0 s.t. ∀N, ∼ [∀n, if n > N, then ∀x ∈ D, | fn (x) − f (x)| < ε],

(2.14) or ∃ε > 0 s.t. ∀N, ∃n s.t. ∼ [if n > N, then ∀x ∈ D, | fn (x) − f (x)| < ε],

or ∃ε > 0 s.t. ∀N, ∃n s.t. n > N and ∼ [∀x ∈ D, | fn (x) − f (x)| < ε],

or ∃ε > 0 s.t. ∀N, ∃n s.t. n > N and ∃x ∈ D s.t. | fn (x) − f (x)| ≥ ε.

2.6. Logical Equivalences


There are many fundamental logical equivalences that we often encounter. Several of these are
listed in the theorem below; they may prove useful for future reference.
Theorem 2.2. Let p, q and r be statements. Then the following logical equivalences hold.
(1) Commutative Laws
(i) p ∧ q ≡ q ∧ p;
(ii) p ∨ q ≡ q ∨ p.
(2) Associative Laws
(i) (p ∧ q) ∧ r ≡ p ∧ (q ∧ r);
(ii) (p ∨ q) ∨ r ≡ p ∨ (q ∨ r).
(3) Distributive Laws
(i) p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r);
(ii) p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r).
(4) De Morgan's Laws
(i) ∼ (p ∨ q) ≡ (∼ p) ∧ (∼ q);
(ii) ∼ (p ∧ q) ≡ (∼ p) ∨ (∼ q).
(5) Idempotent Laws
(i) p ∧ p ≡ p;
(ii) p ∨ p ≡ p.
(6) Negation Laws
(i) p ∨ (∼ p) ≡ T ;
(ii) p ∧ (∼ p) ≡ F;
where T: True; F: False.
(7) Universal Bound Laws
(i) p ∨ T ≡ T ;
(ii) p ∧ F ≡ F.
(8) Identity Laws
(i) p ∨ F ≡ p;
(ii) p ∧ T ≡ p.
(9) Double Negation Law ∼ (∼ (p)) ≡ p.
2.7. Some Math symbols and Definitions 15

De Morgan's Laws can be expressed in words as follows: “The negation of an and statement is
logically equivalent to the or statement in which each component is negated, while the negation of
an or statement is logically equivalent to the and statement in which each component is negated.”
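Since each law in Theorem 2.2 involves at most three statement letters, a truth table has at most 2³ = 8 rows, and the whole theorem can be checked by exhaustion. A sketch in Python (an illustrative aid, not part of the text):

```python
from itertools import product

# Each law involves at most three letters p, q, r, so 8 rows suffice.
for p, q, r in product([True, False], repeat=3):
    assert (p and q) == (q and p)                        # commutative
    assert ((p and q) and r) == (p and (q and r))        # associative
    assert (p or (q and r)) == ((p or q) and (p or r))   # distributive
    assert (p and (q or r)) == ((p and q) or (p and r))
    assert (p or (not p)) == True                        # negation laws
    assert (p and (not p)) == False
    assert (p or False) == p                             # identity laws
    assert (p and True) == p
```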

2.7. Some Math symbols and Definitions


This is a very brief list of some of the mathematical shorthand that will be used in this course and
in the first year courses. Some of these symbols will be explained in more detail as we go.

Operator Meaning
∀ For all, for every, for each
∃ There exists, there is
∈ In, a member of
∋ Owns, contains
∨ Or
∧ And
∴ Therefore
∼ or ¬ Not
∅ Empty set
⊂ Subset, is a subset of
⊃ Contains the set
∪ Union (of sets)
∩ Intersection (of sets)
⇒ Implies
⇐⇒ or iff If and only if, each implies the other
s.t., | or : Such that
Q.E.D. Quod erat demonstrandum (Proof complete)

Next we define some of the commonly used mathematical terms.


(a) Theorem A statement which can be demonstrated to be true by accepted mathematical
operations and arguments.
In general, a theorem is an embodiment of some general principle that makes it part of a
larger theory. The process of showing a theorem to be correct is called a proof.
(b) Proposition A statement which is required to be proved.
(c) Axiom A proposition regarded as self-evidently true without proof. The word “axiom” is
synonym for postulate.
(d) Corollary An immediate consequence of a result already proved. Corollaries usually restate
a more complicated theorem in a form that is simpler to use and apply.
(e) Lemma A short theorem used in proving a larger theorem.

(f) Hypothesis A hypothesis is a proposition that is consistent with known data, but has been
neither verified nor shown to be false.
(g) Definition A precise statement of the meaning of a mathematical term.
Chapter 3

Proof Techniques

3.1. Methods of Proof


A proof is a method of establishing the truth of an implication. An example would be to
prove a proposition of the form “If H1 , · · · , Hn , then T.” The statements H1 , · · · , Hn are referred to
as the hypotheses of the proof and the proposition T is referred to as the conclusion. A formal proof would
consist of a sequence of valid propositions ending with the conclusion T. By valid proposition,
we mean the proposition in the sequence must either be one of the hypotheses H1 , · · · , Hn , or an
axiom, a definition, a tautology or a proposition proved earlier, or it must be derived from previous
propositions using either logical implication or substitution.
Before we present proof techniques, we describe some elementary definitions in number theory.

Definition 3.1. An integer n is even if and only if n = 2k for some integer k. An integer n is odd if
and only if n = 2k + 1 for some integer k.

Using the quotient-remainder theorem, we can show that every integer is either even or odd.

Definition 3.2. An integer n is prime if and only if n > 1 and for all positive integers r and s, if
n = r · s then r = 1 or s = 1. An integer n is composite if and only if n = r · s for some positive
integers r and s, with r ̸= 1 and s ̸= 1.

The first three prime numbers are 2, 3, and 5. The first six composite numbers are 4, 6, 8, 9, 10 and 12.
Every integer greater than 1 is either prime or composite, since the two definitions are negations of
each other.

Definition 3.3. Two integers m and n are said to be of the same parity if m and n are both even or
are both odd, while m and n are said to be of the opposite parity if one of m and n is even and the
other is odd. Two integers are consecutive if one is one more than the other.


The integers 2 and 8 are of the same parity, while 5 and 10 are of opposite parity.
Definition 3.4. Let n and d be integers with d ̸= 0. Then n is said to be divisible by d if n = d · k for
some integer k. In such case we say that n is a multiple of d, or d is a factor of n, or d is a divisor
of n, or d divides n.

The notation “d|n” is read as “d divides n”.


We discuss the following techniques of writing proofs. Our emphasis here will be on showing how
each of them is used through several examples.

3.2. Trivial Proofs


Let P(x) and Q(x) be statements with domain D. If Q(x) is true for every x ∈ D, then the universal
statement
∀x ∈ D, P(x) → Q(x)
is true regardless of the truth value of P(x). Such a proof is called a trivial proof.
Claim 3.1. For x ∈ R, if x > −3, then x² + 1 > 0.

Proof. Consider the two statements P(x) : x > −3 and Q(x) : x² + 1 > 0. Since x² ≥ 0 for every
x ∈ R, it follows that x² + 1 ≥ 0 + 1 > 0 for every x ∈ R. Thus P(x) → Q(x) is true for every x ∈ R
and hence for x > −3. 
Claim 3.2. If n is an odd integer, then 6n³ + 4n + 3 is an odd integer.

Proof. Since 6n³ + 4n + 3 = 2(3n³ + 2n + 1) + 1, where 3n³ + 2n + 1 ∈ Z (i.e., 6n³ + 4n + 3 = 2k + 1
with k = 3n³ + 2n + 1 ∈ Z), the integer 6n³ + 4n + 3 is odd for every integer n. 

Observe that the fact that 6n³ + 4n + 3 is odd does not depend on n being odd. It would have been
better to state the claim as “if n is an integer, then 6n³ + 4n + 3 is odd.”

3.3. Vacuous Proofs


Let P(x) and Q(x) be statements with domain D. If P(x) is false for every x ∈ D, then the
universal statement
∀x ∈ D, P(x) → Q(x)
is true regardless of the truth value of Q(x). Such a proof is called vacuous proof.
Claim 3.3. For x ∈ R, if x² − 2x + 1 < 0, then x > 1.

Proof. Let P(x) : x² − 2x + 1 < 0 and Q(x) : x > 1. Since x² − 2x + 1 = (x − 1)² ≥ 0 for every
x ∈ R, the inequality (x − 1)² < 0 is false for every x ∈ R. Hence, P(x) is false for every x ∈ R. Thus,
P(x) → Q(x) is true for every x ∈ R. 

3.4. Proof by Construction


In a proof by construction we work straight from the set of assumptions.
Example 3.1. Consider the function
(3.1) f (n) = n² + n + 17,

where n ∈ N. If we evaluate this function, it seems that we always get a prime number. For
instance,
f (1) = 19,
f (2) = 23,
f (3) = 29,
f (15) = 257.
We can verify that all these numbers are prime. We might then conjecture:
Conjecture 1. The function f (n) = n² + n + 17 generates prime numbers for all n ∈ N.

Drawing such a conclusion is an example of inductive reasoning. It is important to note that
we have NOT proved the conjecture made above in the example. In fact, this conjecture is false.
Take n = 17: f (17) = 17² + 17 + 17 = 17 · 19, which is not a prime number.
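The failure of the conjecture can also be found by brute-force search. The sketch below (Python, with a naive trial-division primality test, purely illustrative) confirms that f(n) is prime for n = 1, …, 15, and that the first counterexample is in fact n = 16, since f(16) = 289 = 17², with n = 17 failing as well:

```python
def is_prime(n):
    """Naive trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def f(n):
    return n * n + n + 17

assert all(is_prime(f(n)) for n in range(1, 16))    # prime for n = 1..15
first_fail = min(n for n in range(1, 100) if not is_prime(f(n)))
assert first_fail == 16                             # f(16) = 289 = 17 * 17
assert not is_prime(f(17))                          # f(17) = 17 * 19
```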
Example 3.2. Let NE be the set of even natural numbers and NO be the set of odd natural numbers.
We want to show that (i) the sum of two even numbers is even,
∀x, y ∈ NE , x + y ∈ NE ,
and (ii) the sum of an odd number and an even number is odd,
∀x ∈ NE , ∀y ∈ NO , x + y ∈ NO .

Proof. (By construction)

(i) Let x, y ∈ NE . Then ∃m, n ∈ N s.t. x = 2m ∧ y = 2n, so
x + y = 2m + 2n = 2 (m + n) ∈ NE , since m + n ∈ N.
(ii) Let x ∈ NE and y ∈ NO . Then ∃m ∈ N s.t. x = 2m and ∃n ∈ N s.t. y = 2n + 1, so
x + y = 2m + 2n + 1 = 2 (m + n) + 1, where m + n ∈ N, and hence x + y ∈ NO .


Example 3.3. Consider the function
g (n, m) = n² + n + m, where m, n ∈ N.
g (1, 2) = 1² + 1 + 2 = 2²
g (2, 3) = 2² + 2 + 3 = 3²
g (12, 13) = 12² + 12 + 13 = 13²
On the basis of the above, we can form a conjecture.

Conjecture 2.
(3.2) ∀n ∈ N, g (n, n + 1) = (n + 1)².

It turns out that this conjecture is true.

Proof. By construction.
g (n, n + 1) = n² + n + (n + 1)
= n² + 2n + 1
= (n + 1)².
Having proved the general statement, we know that
g (15, 16) = 16².
This is an example of deductive reasoning. 
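The deduction can be spot-checked numerically; a finite check is of course no substitute for the proof, which covers all n, but it is a useful sanity test (Python, illustrative only):

```python
def g(n, m):
    return n * n + n + m

# The proved identity g(n, n+1) = (n+1)^2, spot-checked on a range.
assert all(g(n, n + 1) == (n + 1) ** 2 for n in range(1, 1000))
assert g(15, 16) == 16 ** 2     # the instance deduced in the text
```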

Example 3.4. Show that if x is odd then x² is odd.

Proof. By construction. Let x > 1. Then
x ∈ NO ⇔ ∃n ∈ N s.t. x = 2n + 1,
x² = (2n + 1)²
= 4n² + 4n + 1
= 2 (2n² + 2n) + 1
⇒ x² ∈ NO .
For x = 1, x² = 1, which is odd. 

Example 3.5. If the sum of two integers is even, then so is their difference.

Proof. Assume that the integers m and n are such that m + n is even. Then m + n = 2k for some
integer k. So, m = 2k − n and m − n = 2k − n − n = 2(k − n) = 2l, where l = k − n is an integer.
Thus m − n is even. 

3.5. Proof by Contraposition


Note that A ⇒ B is not logically equivalent to its converse statement B ⇒ A. It is possible for an
implication to be false while its converse is true. Hence we cannot prove A ⇒ B by showing B ⇒ A.

Example 3.6. The implication

m² > 0 ⇒ m > 0
is false, but its converse
m > 0 ⇒ m² > 0
is true.
To show that A ⇒ B, we can instead show that ∼ B ⇒ ∼ A. We have already shown that an
implication and its contrapositive are logically equivalent.

Example 3.7. Consider the theorem:

“If 7m is an odd number, then m is an odd number.”

Its contrapositive is “If m is not an odd number, then 7m is not an odd number,” or, equivalently,
“If m is an even number, then 7m is an even number.”
We are talking about integers here. Using the contrapositive, we can construct a proof of the theorem
as follows:

Proof.

m ∈ NE ⇔ ∃k ∈ N s.t. m = 2k,
7m = 7 (2k) = 2 (7k) , 7k ∈ N ⇒ 7m ∈ NE .

This is much easier than trying to show directly that 7m being odd implies that m is odd. 

Example 3.8. Show that if x² is even, then x is even:

(3.3) x² ∈ NE ⇒ x ∈ NE .

Its contrapositive is

(3.4) x ∈ NO ⇒ x² ∈ NO .

This we have already shown in an example above.



3.6. Proof by Contradiction


To prove that a statement C is true, suppose that ∼ C is true and show that this leads to a
contradiction. To show that A ⇒ B, we can use
(3.5) ∼ (A ⇒ B) ⇔ A∧ ∼ B.
So assume A and ∼ B to be true and derive a contradiction. This shows that A∧ ∼ B is false, and
hence A ⇒ B is true.

Example 3.9. In the last example,

x² ∈ NE ⇒ x ∈ NE .

We can prove the statement by contradiction as follows.

Proof. Assume x² is even and x is odd. Then

x² ∈ NE ⇔ ∃m ∈ N s.t. x² = 2m,
x ∈ NO ⇔ ∃n ∈ N s.t. x = 2n + 1
⇒ x² = 4n² + 4n + 1, which is odd.
This contradicts the initial assumption that x² is even. 

Example 3.10. There is no greatest integer.

Proof. Assume, to the contrary, that there is a greatest integer, say N. Then, N ≥ n for every integer
n. Let m = N + 1. Now m is an integer since it is the sum of two integers. Also, m > N. Thus, m is
an integer that is greater than the greatest integer, which is a contradiction. Hence our assumption
that there is a greatest integer is false. Thus there is no greatest integer. 

For the next example, we first define rational numbers.

Definition 3.5. A real number r is a rational number if r = m/n for some integers m and n with n ̸= 0.
A real number that is not a rational number is called an irrational number.

Example 3.11. There is no smallest positive rational number.

Proof. Assume, to the contrary, that there is a least positive rational number x. Then, x ≤ y for
every positive rational number y. Consider the number x/2. Since x is a positive rational number,
so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is positive, gives x/2 < x.
Hence, x/2 is a positive rational number that is less than x, which is a contradiction. Hence our
assumption that there is a least positive rational number is false. Thus there is no least positive
rational number. 

Example 3.12. The sum of a rational number and an irrational number is irrational.

Proof. Assume, to the contrary, that there exist a rational number p and an irrational number q
whose sum r = p + q is a rational number. Thus, by the definition of rational numbers, p = a/b and
r = c/d for some integers a, b, c and d with b ̸= 0 and d ̸= 0. Hence,
q = r − p = c/d − a/b = (bc − ad)/(bd).
Now, bc − ad ∈ Z and bd ∈ Z since a, b, c, d ∈ Z. Since b ̸= 0 and d ̸= 0, bd ̸= 0. Hence,
q ∈ Q, which contradicts the assumption that q is irrational. Hence our assumption that there exist
a rational number and an irrational number whose sum is rational is false. Thus, the sum of a
rational number and an irrational number is irrational. 

We end this section with a proof of the classical result that √2 is irrational.

Example 3.13. The real number √2 is irrational.

Proof. Assume, to the contrary, that √2 is rational. Then,
√2 = m/n
where m, n ∈ Z and n ̸= 0. By dividing m and n by any common factors, if necessary, we may
further assume that m and n have no common factors, i.e., m/n has been expressed in (or reduced to)
lowest terms. Then, 2 = m²/n², and so m² = 2n². Thus, m² is even. Hence, m is even, and so m = 2k,
where k ∈ Z. Substituting this into our earlier equation m² = 2n², we have (2k)² = 2n², and so
4k² = 2n². Therefore, n² = 2k². Thus, n² is even, and so n is even. Therefore each of m and n has
2 as a factor, which contradicts our assumption that m/n has been reduced to lowest terms and
therefore that m and n have no common factors. We deduce, therefore, that our assumption that √2
is rational is incorrect. Hence, √2 is irrational. 
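No finite computation can prove irrationality, but a search can corroborate the key step: the equation m² = 2n² has no integer solution. The sketch below (Python, illustrative only) confirms this for all denominators n below 10⁴:

```python
from math import isqrt

# If sqrt(2) = m/n, then m^2 = 2 n^2. For each n, the only integer
# candidate for m is isqrt(2 n^2); it never works.
for n in range(1, 10_000):
    m = isqrt(2 * n * n)
    assert m * m != 2 * n * n
```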
Exercise 3.1. The square root of any prime number is irrational.
Remark 3.1. One should be very careful when writing proof by contradiction. Here is a very
strong word of caution which can be found in ?, page 3.

“All students are enjoined in the strongest possible terms to eschew proofs by contradiction!
There are two reasons for the prohibition: First such proofs are very often fallacious, the contra-
diction on the final page arising from an erroneous deduction on an earlier page, rather than from
the incompatibility of p with ¬q. Second, even when correct, such a proof gives little insight into
the connection between p and q whereas both the direct proof and the proof by contraposition con-
struct a chain of argument connecting p and q. One reason why mistakes are so much more likely
in proofs by contradiction than in direct proofs is that in a direct proof (assuming the hypotheses is
not always false) all deduction from the hypothesis are true in those cases where hypothesis holds.
One is dealing with true statements, and one’s intuition and knowledge about what is true help to
keep one from making erroneous statements. In proofs by contradiction, however, you are (assum-
ing the theorem is true) in the unreal world where any statement can be derived, and so the falsity
of a statement is no indication of an erroneous deduction.”.

3.7. Proof by Induction


A proof by induction involves three steps.
(a) Base of induction: check that the statement is true for n = 1.
(b) Inductive transition: assume that the statement is true for some n and show that it is then also
true for n + 1.
(c) Inductive conclusion: the statement is true for all n ≥ 1.

Example 3.14. Show that if f (x) = xⁿ, then f ′ (x) = nxⁿ⁻¹ for n ∈ N.

Proof. By Induction.
(a) Base of induction:

(3.6) f (x) = x, f ′ (x) = 1 = x⁰ = 1 · x¹⁻¹.

(b) Inductive transition:

Assume that for

(3.7) f (x) = xⁿ, f ′ (x) = nxⁿ⁻¹;

then for f (x) = xⁿ⁺¹ = xⁿ · x, the product rule gives

f ′ (x) = nxⁿ⁻¹ · x + xⁿ · 1
= nxⁿ + xⁿ
(3.8) = (n + 1) xⁿ.

(c) Inductive conclusion:

∀n ∈ N, if f (x) = xⁿ then f ′ (x) = nxⁿ⁻¹. 


Example 3.15. Prove by induction that 7ⁿ − 4ⁿ is a multiple of 3, for n ∈ N.

Proof. (a) Base of induction: for n = 1,

(3.9) 7¹ − 4¹ = 7 − 4 = 3,

so the statement is true.
(b) Inductive transition:

Assume that 7ⁿ − 4ⁿ = 3m where m ∈ N. Then

7ⁿ⁺¹ − 4ⁿ⁺¹ = 7 · 7ⁿ − 4 · 4ⁿ
= 7 · 7ⁿ − 7 · 4ⁿ + 7 · 4ⁿ − 4 · 4ⁿ
= 7 · (7ⁿ − 4ⁿ) + (7 − 4) · 4ⁿ
= 7 · (3m) + 3 · 4ⁿ
= 3 · (7m + 4ⁿ).
Since m and n are natural numbers, so is 7m + 4ⁿ. So 7ⁿ⁺¹ − 4ⁿ⁺¹ is a multiple of 3.
(c) Inductive conclusion:
7ⁿ − 4ⁿ is a multiple of 3, for all n ∈ N. 
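A finite spot-check of the claim, separate from the induction (which is what actually covers all n); Python is just the checking tool here:

```python
# Verify that 7^n - 4^n is a multiple of 3 for the first 200 exponents.
multiple_of_3 = all((7 ** n - 4 ** n) % 3 == 0 for n in range(1, 200))
assert multiple_of_3
```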

Example 3.16. Prove the Binomial Theorem, (a + b)ⁿ = ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ bᵏ, by induction.
Here C(n, k) denotes the binomial coefficient “n choose k”.

Proof. (a) Base of induction:

For n = 1, the claim is trivially true.
(b) Inductive transition:
Assume that the Binomial Theorem holds for n. Then
(a + b)ⁿ⁺¹ = (a + b)(a + b)ⁿ = (a + b) ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ bᵏ
= ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ⁺¹ bᵏ + ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ bᵏ⁺¹
= ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ⁺¹ bᵏ + ∑_{l=1}^{n+1} C(n, l − 1) aⁿ⁻ˡ⁺¹ bˡ (by the change of variable l = k + 1)
= C(n, 0) aⁿ⁺¹ + ∑_{l=1}^{n} {C(n, l) + C(n, l − 1)} aⁿ⁻ˡ⁺¹ bˡ + C(n, n) bⁿ⁺¹
= C(n + 1, 0) aⁿ⁺¹ + ∑_{l=1}^{n} C(n + 1, l) aⁿ⁻ˡ⁺¹ bˡ + C(n + 1, n + 1) bⁿ⁺¹
= ∑_{k=0}^{n+1} C(n + 1, k) a⁽ⁿ⁺¹⁾⁻ᵏ bᵏ.
In the fifth line we have used the fact that
C(n, l) + C(n, l − 1) = C(n + 1, l).
It is a good exercise to verify this.
(c) Inductive conclusion:
The Binomial Theorem holds for all n ∈ N.

Observe that in the inductive hypothesis of the proof above, we assume that P(k) is true for an
arbitrary, but fixed, positive integer k. We certainly do not assume that P(k) is true for all positive
integers k, for this is precisely what we wish to prove! It is important to understand that our aim is
to establish the truth of the implication “If P(k) is true, then P(k + 1) is true,” which, together with
the truth of the statement P(1), allows us to conclude that an infinite number of statements (namely,
P(1), P(2), P(3), · · · ) are true.
Example 3.17. For every positive integer n,
1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.

Proof. For every integer n ≥ 1, let P(n) be the statement P(n) : 1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.
(a) Base of induction:
When n = 1, the statement P(1) : 1² = 1(1 + 1)(2 · 1 + 1)/6 is certainly true since
1(1 + 1)(2 · 1 + 1)/6 = 6/6 = 1. This establishes the base case n = 1.
(b) Inductive transition:
For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume
that P(k) is true; that is, assume that 1² + · · · + k² = k(k + 1)(2k + 1)/6. For the inductive step, we
need to show that P(k + 1) is true. That is, we show that
1² + 2² + · · · + k² + (k + 1)² = (k + 1)(k + 2)(2k + 3)/6.
Evaluating the left-hand side of this equation, we have
1² + 2² + · · · + k² + (k + 1)² = (1² + 2² + · · · + k²) + (k + 1)²
= k(k + 1)(2k + 1)/6 + (k + 1)² (by the inductive hypothesis)
= k(k + 1)(2k + 1)/6 + 6(k + 1)²/6
= (k + 1)(2k² + k + 6k + 6)/6
= (k + 1)(2k² + 7k + 6)/6 = (k + 1)(2k² + 4k + 3k + 6)/6
= (k + 1)(k + 2)(2k + 3)/6,
thus verifying that P(k + 1) is true.
(c) Inductive conclusion:
Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,
1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6

is true for every positive integer n.
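The closed form can be compared against direct summation for a range of n; a quick computational sanity check (Python, illustrative only), not a replacement for the induction:

```python
# Compare the closed form n(n+1)(2n+1)/6 against a direct summation.
for n in range(1, 500):
    assert sum(k * k for k in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
```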




Recall that in a geometric sequence, each term is obtained from the preceding one by multiply-
ing by a constant factor. If the first term is 1 and the constant factor is r, then the sequence is 1, r,
r², r³, · · · , rⁿ, · · · . The sum of the first n terms of this sequence is given by a simple formula, which
can be verified using mathematical induction. This is left as an exercise.
Induction can also be used to solve problems involving divisibility, as the next example
illustrates.
Example 3.18. For all integers n ≥ 1, 2²ⁿ − 1 is divisible by 3.

Proof. We proceed by mathematical induction. When n = 1, the result is true since in this case
2²ⁿ − 1 = 2² − 1 = 3 and 3 is divisible by 3. Hence, the base case n = 1 is true. For the
inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that the
property holds for n = k, i.e., suppose that 2²ᵏ − 1 is divisible by 3. For the inductive step, we must
show that the property holds for n = k + 1. That is, we must show that 2²⁽ᵏ⁺¹⁾ − 1 is divisible by
3. Since 2²ᵏ − 1 is divisible by 3, there exists, by the definition of divisibility, an integer m such that
2²ᵏ − 1 = 3m, and so 2²ᵏ = 3m + 1. Now,
2²⁽ᵏ⁺¹⁾ − 1 = 2²ᵏ · 2² − 1
= 4 · 2²ᵏ − 1
= 4(3m + 1) − 1
= 12m + 3
= 3(4m + 1).
Since m ∈ Z, we know that 4m + 1 ∈ Z. Hence, 2²⁽ᵏ⁺¹⁾ − 1 is an integer multiple of 3; that is,
2²⁽ᵏ⁺¹⁾ − 1 is divisible by 3, as desired. Hence, by the principle of mathematical induction, the
property holds for all integers n ≥ 1. 
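A computational spot-check of this divisibility property, exhibiting the integer quotient as well (Python, illustrative only; the induction above is what establishes the result for all n):

```python
# 2^(2n) - 1 = 3 * (integer) for the first 200 values of n.
for n in range(1, 200):
    q, r = divmod(2 ** (2 * n) - 1, 3)
    assert r == 0          # divisible by 3, with integer quotient q
```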

Induction can also be used to verify certain inequalities, as the next example illustrates.
Example 3.19. For all integers n ≥ 2,
√n < 1/√1 + 1/√2 + · · · + 1/√n.
Proof. We proceed by mathematical induction. To show the inequality holds for n = 2, we must
show that
√2 < 1/√1 + 1/√2.
Multiplying both sides by √2, this inequality is true if and only if 2 < √2 + 1, which is true if and
only if 1 < √2. Since 1 < √2 is true, so too is √2 < 1/√1 + 1/√2. Hence the inequality holds
for n = 2. This establishes the
base case. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2 and
assume that the inequality holds for n = k, i.e., suppose that
√k < 1/√1 + 1/√2 + · · · + 1/√k.
For the inductive step, we must show that the inequality holds for n = k + 1. That is, we must show
that
√(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1).
Since k < k + 1, we have √k < √(k + 1), and so multiplying both sides by √k,
k < √k √(k + 1).
Adding 1 to both sides, k + 1 < √k √(k + 1) + 1; and so dividing both sides by √(k + 1) we have
√(k + 1) < √k + 1/√(k + 1).
Hence, by the inductive hypothesis,
√(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1),
as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers
n ≥ 2. 
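The inequality can be spot-checked numerically, keeping a running sum so the whole check is linear in n. Floating-point arithmetic is only indicative, but safely so here, since the gap between the two sides grows on the order of √n:

```python
from math import sqrt

partial = 0.0
for n in range(1, 2000):
    partial += 1 / sqrt(n)      # 1/sqrt(1) + ... + 1/sqrt(n)
    if n >= 2:
        assert sqrt(n) < partial
```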

3.8. Additional Notes on Proofs


To prove a universal statement
(3.10) ∀x ∈ D, p (x)
we let x represent an arbitrary element of the set D and then show that statement p (x) is true. The
only properties we can use about x are those that apply to all elements of D. For example, if the set
D consists of the natural numbers, then we cannot assume x to be odd as not all natural numbers
are odd. To prove an existential statement,
(3.11) ∃x ∈ D, p (x)
all we need to do is to show that there exists at least one member of D for which p (x) is true. We
illustrate these techniques through the following examples.
Example 3.20. For every ε > 0, there exists a δ > 0 such that

(3.12) 1 − δ < x < 1 + δ ⇒ 5 − ε < 2x + 3 < 5 + ε


In this example we are asked to prove that the statement is true for each positive number ε. We
begin with an arbitrary ε and use it to find a δ which is positive and has the property that the
implication holds true. We give a particular value of δ which could possibly depend on ε and show
that the statement is true.

Proof. Let ε > 0 be arbitrary and let δ = ε/2. Note that δ > 0. Then

1 − δ < x < 1 + δ
⇒ 1 − ε/2 < x < 1 + ε/2
⇒ 2 − ε < 2x < 2 + ε
⇒ 5 − ε < 2x + 3 < 5 + ε.
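The choice δ = ε/2 can be exercised on random samples; a passing check does not prove the implication, but a failure would refute it. The sampling parameters below are illustrative choices:

```python
import random

random.seed(0)
for _ in range(1000):
    eps = random.uniform(1e-6, 10.0)
    delta = eps / 2                                # the delta from the proof
    x = 1 + random.uniform(-0.99, 0.99) * delta    # a point with |x - 1| < delta
    assert 5 - eps < 2 * x + 3 < 5 + eps
```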


In some cases, it is possible to prove an existential statement in an indirect way, without actually
producing any specific element of the set. One indirect method is to use the contrapositive and another
is to use a proof by contradiction. The following example illustrates this aspect.
Example 3.21. Let f be a continuous function. If

(3.13) ∫₀¹ f (x) dx ̸= 0,

then there exists a point x ∈ [0, 1] such that
f (x) ̸= 0.

Proof. The contrapositive implication can be written as

(3.14) if ∀x ∈ [0, 1], f (x) = 0, then ∫₀¹ f (x) dx = 0.

This is a lot easier to prove. Instead of having to conclude the existence of an x having a particular
property, we are given that all x have a different property. The proof follows directly from the
definition of the integral, since each of the terms in any Riemann sum will be zero. 
Example 3.22. Let x be a real number. If x > 0 then 1/x > 0.

Proof. Note that p ⇒ q is equivalent to (p∧ ∼ q) ⇒ contradiction. We begin by assuming x > 0
and

(3.15) 1/x ≤ 0.

Since x > 0, we can multiply both sides by x:

(3.16) (x) (1/x) ≤ (x) · 0, or 1 ≤ 0.

This is a contradiction. 

Consider the proof of the following existential statement.



Claim 3.4. There exist irrational numbers a and b such that a^b is rational.

Proof. Consider the real number (√2)^√2. This number is either rational or irrational. We consider
each case in turn.
(1) (√2)^√2 is rational. Let a = √2 and b = √2. Thus a and b are irrational, and by assumption, a^b
is rational.
(2) (√2)^√2 is irrational. Let a = (√2)^√2 and b = √2. Thus a and b are irrational. Moreover,
a^b = ((√2)^√2)^√2 = (√2)^(√2·√2) = (√2)² = 2 is rational.
In both cases, we proved the existence of irrational numbers a and b such that a^b is rational, and so
we have the desired result.

We remark that as it stands, this proof does not enable us to pinpoint which of the two choices
of the pair (a, b) has the required property. In order to determine the correct choice of (a, b),
we would need to decide whether (√2)^√2 is rational or irrational. It is not a constructive proof.
The following would be a constructive proof of this claim. Let a = √2 and b = log₂ 9. Then b is
an irrational number, for if it were rational, then log₂ 9 = m/n where m and n are positive integers with no
common factor. This implies 9ⁿ = 2ᵐ, which is a contradiction as 2ᵐ is an even number and 9ⁿ is
an odd number. This gives a^b = (√2)^(log₂ 9) = 2^((log₂ 9)/2) = 2^(log₂ 3) = 3, which is rational.¹
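A quick floating-point check of this constructive choice in Python (illustrative only; floating-point equality is tested up to a tolerance):

```python
import math

# Numeric check of the constructive choice a = sqrt(2), b = log_2(9):
# a**b = 2**(log_2(9)/2) = 2**(log_2(3)) = 3.
a = math.sqrt(2)
b = math.log2(9)
print(math.isclose(a ** b, 3.0, rel_tol=1e-9))  # True
```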

3.9. Decomposition or proof by cases


Let P(x) be a statement. If x possesses certain properties, and if we can verify that P(x) is true
regardless of which of these properties x has, then P(x) is true. Such a proof is called a proof by
cases.
Some proofs naturally divide themselves into consideration of two or more cases. For example
positive integers are either even or odd. Real numbers are positive, negative or zero. It may be that
different arguments are required for each case.
More rigorously, suppose we want to prove that p ⇒ q, and that p can be decomposed into two
disjoint propositions p1 , p2 such that p1 ∧ p2 is a contradiction. Then p ≡ (p1 ∨ p2 ) ∧ ¬(p1 ∧ p2 ) ≡
(p1 ∨ p2 ).
With this choice of p1 and p2 , we have,
(p ⇒ q) ⇔ (¬p ∨ q) ⇔ [¬(p1 ∨ p2 ) ∨ q]
⇔ [(¬p1 ∧ ¬p2 ) ∨ q] ⇔ [(¬p1 ∨ q) ∧ (¬p2 ∨ q)]
⇔ [(p1 ⇒ q) ∧ (p2 ⇒ q)].
This means that we only need to show that p1 ⇒ q and p2 ⇒ q. Note that this method also works
if we decompose p into more than two propositions, as long as these propositions are
1There is an extensive literature on constructive mathematics. You may like to do a google search for easy to read articles on the
subject. A classic reference is ?.

mutually exclusive (i.e., every pair of them is a contradiction). The following examples illustrate this
technique.
Before going over some examples, we state the following theorem.
Theorem 3.1. (Quotient-Remainder Theorem) For every given integer n and positive integer d,
there exist unique integers q and r such that
n = d ·q+r and 0 ≤ r < d.
Definition 3.6. Let n be a nonnegative integer and let d be a positive integer. By the Quotient-
Remainder Theorem, there exist unique integers q and r such that n = d · q + r; where 0 ≤ r < d.
We define,
n div d = q (read as “n divided by d ”), and
n mod d = r (read as “n modulo d ”).
Thus n div d and n mod d are the integer quotient and integer remainder, respectively, obtained
when n is divided by d.

Observe that given a nonnegative integer n and a positive integer d, we have that n mod d ∈
{0, · · · , d − 1} (since 0 ≤ r ≤ d − 1) and that n mod d = 0 if and only if n is divisible by d.
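In Python, the integer quotient and remainder of the Quotient-Remainder Theorem are returned directly by the built-in `divmod` (a concrete illustration of Definition 3.6):

```python
# n div d and n mod d of Definition 3.6, computed with Python's divmod:
# n = d*q + r with 0 <= r < d.
n, d = 17, 5
q, r = divmod(n, d)
print(q, r)            # 3 2
print(n == d * q + r)  # True
print(0 <= r < d)      # True
```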
Result 3.1. Every integer is either even or odd.

Proof. By the Quotient-Remainder Theorem with d = 2, there exist unique integers q and r such
that n = 2 · q + r and 0 ≤ r < 2. Hence, r = 0 or r = 1. Therefore, n = 2q or n = 2q + 1 for some
integer q depending on whether r = 0 or r = 1, respectively. In the case that n = 2q, the integer n
is even. In the other case that n = 2q + 1, the integer n is odd. Hence, n is either even or odd. 

Let Z denote the set of integers.


Example 3.23. If n ∈ Z, then n² + 5n + 3 is an odd integer.

Proof. We use a proof by cases, depending on whether n is even or odd.


(1) n is even.
Then n = 2k for some integer k. Thus, n² + 5n + 3 = (2k)² + 5(2k) + 3 = 4k² + 10k + 3 =
2(2k² + 5k + 1) + 1 = 2m + 1, where m = 2k² + 5k + 1. Since k ∈ Z, we must have m ∈ Z.
Hence, n² + 5n + 3 = 2m + 1 for some integer m, and so the integer n² + 5n + 3 is odd.
(2) n is odd.
Then n = 2k + 1 for some integer k. Thus, n² + 5n + 3 = (2k + 1)² + 5(2k + 1) + 3 =
4k² + 14k + 9 = 2(2k² + 7k + 4) + 1 = 2m + 1, where m = 2k² + 7k + 4. Since k ∈ Z, we must
have m ∈ Z. Hence, n² + 5n + 3 = 2m + 1 for some integer m, and so the integer n² + 5n + 3 is
odd.
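A finite spot-check in Python (illustration only; the proof by cases above covers every integer):

```python
# Spot-check of Example 3.23 over a finite range of integers.
print(all((n * n + 5 * n + 3) % 2 == 1 for n in range(-50, 51)))  # True
```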


Example 3.24. Let m, n ∈ Z. If m and n are of the same parity (either both even or both odd), then
m + n is even.

Proof. We use a proof by cases, depending on whether m and n are both even or both odd.
(1) m and n are both even.
Then, m = 2k and n = 2l for some integers k and l. Thus, m + n = 2k + 2l = 2(k + l). Since
k + l ∈ Z, the integer m + n is even.
(2) m and n are both odd.
Then, m = 2k + 1 and n = 2l + 1 for some integers k and l. Thus, m + n = (2k + 1) + (2l +
1) = 2(k + l + 1). Since k + l + 1 ∈ Z, the integer m + n is even.


Example 3.25. Let n ∈ Z. If n² is a multiple of 3, then n is a multiple of 3.

Proof. We shall combine two proof techniques and use both a proof by contrapositive and a proof
by cases. Suppose that n is not a multiple of 3. We wish to show then that n² is not a multiple of
3. By the Quotient-Remainder Theorem with d = 3, there exist unique integers q and r such that
n = 3 · q + r and 0 ≤ r < 3. Hence, r ∈ {0, 1, 2}. Therefore, n = 3q or n = 3q + 1 or n = 3q + 2
for some integer q depending on whether r = 0, 1 or 2, respectively. Since n is not a multiple of 3,
either n = 3q + 1 or n = 3q + 2 for some integer q. We consider each case in turn.
(1) n = 3q + 1 for some integer q.
Then, n² = (3q + 1)² = 9q² + 6q + 1 = 3(3q² + 2q) + 1, and so n² is not a multiple of 3.
(2) n = 3q + 2 for some integer q.
Then, n² = (3q + 2)² = 9q² + 12q + 4 = 3(3q² + 4q + 1) + 1, and so n² is not a multiple of
3.


Example 3.26. Let n ∈ Z. If n is an odd integer, then n² = 8m + 1 for some integer m.

Proof. We shall use both a direct proof and a proof by cases. Assume that n is an odd integer.
By the Quotient-Remainder Theorem with d = 4, there exist unique integers q and r such that
n = 4 · q + r and 0 ≤ r < 4. Hence, r ∈ {0, 1, 2, 3}. Therefore, n = 4q or n = 4q + 1 or n = 4q + 2
or n = 4q + 3 for some integer q depending on whether r = 0, 1, 2 or 3, respectively. Since n is
odd, and since 4q and 4q + 2 are both even, either n = 4q + 1 or n = 4q + 3 for some integer q. We
consider each case in turn.
(1) n = 4q + 1 for some integer q.
Then, n² = (4q + 1)² = 16q² + 8q + 1 = 8(2q² + q) + 1 = 8m + 1, where m = 2q² + q.
Since q ∈ Z, we must have m ∈ Z. Hence, n² = 8m + 1 for some integer m.
(2) n = 4q + 3 for some integer q.

Then, n² = (4q + 3)² = 16q² + 24q + 9 = (16q² + 24q + 8) + 1 = 8(2q² + 3q + 1) + 1 =
8m + 1, where m = 2q² + 3q + 1. Since q ∈ Z, we must have m ∈ Z. Hence, n² = 8m + 1 for
some integer m.
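A quick Python spot-check of this conclusion over a finite range (not a substitute for the proof):

```python
# Spot-check of Example 3.26: n**2 mod 8 equals 1 for odd n, and fails for even n.
print(all(n * n % 8 == 1 for n in range(1, 100, 2)))  # True
print(all(n * n % 8 != 1 for n in range(2, 100, 2)))  # True
```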


We remark that the last conclusion can be restated as follows: For every odd integer n, we have
n² mod 8 = 1. Here are some additional illustrative examples.
Example 3.27. If x is a real number, then
x ≤ |x|.

Recall the definition of absolute value:


(3.17) |x| = x if x ≥ 0, and |x| = −x if x < 0.
Since this definition is divided into two parts, it makes sense to divide the proof also in two parts.

Proof. Let x be an arbitrary real number. Then either x ≥ 0 or x < 0. If x ≥ 0, then by definition
|x| = x. If x < 0, then −x > 0, so that
x < 0 < −x = |x|.
In either case,
x ≤ |x|.

Chapter 4

Problem Set 1

(1) Prove or give a counterexample for the following claims. Capital letters refer to propositions
or sets, depending on the context.
(a)
∼ (A ∧ B) ⇔ ∼ A ∨ ∼ B
(b)
∼ (A ∨ B) ⇔∼ A ∧ ∼ B.
(c)
∼ (A ⇒ B) ⇔ A ∧ ∼ B.
(d)
((A ∨ B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C)).
(e) If n and n + 1 are consecutive integers, then both cannot be even.
(f) Give a counterexample to the proposed statement: If n ∈ N then n² > n.
(g) If x is odd then x² is odd.
(2) Write the negation of the following statements
(a) If S is closed and bounded, then S is compact.
(b) If S is compact, then S is closed and bounded.
(c) If a function is continuous then it is differentiable.
(3) Find the contrapositive of
(a) If x² ≠ 3 ∧ y² > 5 then xy is a rational number.
(b) If x ≠ 0 then ∃y such that xy = 1.
(4) Find the mistake in the “proof”of the following results, and provide correct proofs.
(a) If m is an even integer and n is an odd integer, then 2m + 3n is an odd integer.
Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2k + 1 for some
integer k. Therefore, 2m + 3n = 2(2k) + 3(2k + 1) = 10k + 3 = 2(5k + 1) + 1 = 2l + 1;


where l = 5k + 1. Since k ∈ Z, l ∈ Z. Hence, 2m + 3n = 2l + 1 for some integer l, whence


2m + 3n is an odd integer. 
(b) For all integers n ≥ 1, n² + 2n + 1 is composite.
Proof. Let n = 4. Then, n² + 2n + 1 = 4² + 2(4) + 1 = 25 and 25 is composite.
(5) Prove the following claims:
(a) An integer that is not divisible by 2, cannot be divisible by 4. (Try proving this twice, once
with contraposition and once with contradiction).
(b) There is no greatest negative real number.
(c) The product of an irrational number and a nonzero rational number is irrational.
(6) Prove that for n ∈ N,
(a)
1 + 3 + 5 + · · · + (2n − 1) = n2 .
(b)
n(n + 1)
1+2+···+n = .
2
(c)
1³ + 2³ + · · · + n³ = [n(n + 1)/2]².
(d) For q ≠ 1 and n > 1,
∑_{k=0}^{n−1} (a + kr)q^k = (a − [a + (n − 1)r]q^n)/(1 − q) + rq(1 − q^{n−1})/(1 − q)².
(7) (Sum of a Geometric Sequence): For all integers n ≥ 0 and all real numbers r with r ≠ 1,
∑_{i=0}^{n} r^i = (r^{n+1} − 1)/(r − 1).
What can we say when n → ∞ for arbitrary values of r? For what values of r is the sum well
defined? What is the sum for such values of r?
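A small Python sketch comparing partial sums with the closed form, and probing the n → ∞ question for |r| < 1 (the helper name `geometric_sum` is ours):

```python
import math

# Partial sums of the geometric sequence versus the closed form
# (r**(n+1) - 1) / (r - 1), and the limit 1/(1 - r) when |r| < 1.
def geometric_sum(r: float, n: int) -> float:
    return sum(r ** i for i in range(n + 1))

for r in (0.5, 2.0, -0.3):
    assert math.isclose(geometric_sum(r, 10), (r ** 11 - 1) / (r - 1))

# For |r| < 1 the partial sums approach 1/(1 - r) as n grows.
print(abs(geometric_sum(0.5, 60) - 1 / (1 - 0.5)) < 1e-12)  # True
```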
(8) (a) For all integers n ≥ 2, n³ − n is divisible by 6.
(b) For all integers n ≥ 3, 2^n > 2n + 1.
(9) All prime numbers greater than 6 are either of the form 6n + 1 or 6n + 5, where n is some
natural number.
(10) If |9 − 5x| ≤ 11 then show that x ≥ −2/5 and x ≤ 4.
Chapter 5

Set Theory, Sequence

5.1. Set Theory


5.1.1. Basic Definitions.

Definition 5.1. A set is a well-specified collection of elements.

We define a set as a “well-specified collection”in order to emphasize that there must be a clear
rule or group of rules that determine membership in the set. Essentially all mathematical objects
can be gathered into sets: numbers, variables, functions, other sets, etc. Examples of sets can be
found everywhere around us. For example, we can speak of the set of all living human beings,
the set of all cities in Europe, the set of all propositions, the set of all prime numbers, and so on.
Each living human being is an element of the set of all living human beings. Similarly each prime
number is an element of the set of all prime numbers. If A is a set and a is an element of A, then
we write a ∈ A. If it so happens that a is not an element of A, then we write a ∉ A. If S is the set
whose elements are s, t, and u, then we write S = {s, t, u}. The left brace and right brace visually
indicate the “bounds” of the set, while what is written within the bounds indicates the elements
of the set. For example, if S = {1, 2, 3, 5}, then 2 ∈ S, but 4 ∉ S. Sets are determined by their
elements. The order in which the elements of a given set are listed does not matter. For example,
{1, 2, 3} and {3, 1, 2} are the same set. It also does not matter whether some elements of a given
set are listed more than once. For instance, {1, 2, 2, 2, 3, 3} is still the set {1, 2, 3}. Many sets are
given a shorthand notation in mathematics as they are used so frequently. A set may be defined by
a property. For instance, the set of all true propositions, the set of all even integers, the set of all
odd integers, and so on. Formally, if P(x) is a property, we write A = {x ∈ S : P(x)} to indicate that
the set A consists of all elements x of S having the property P(x). The colon : is commonly read as
“such that” and is also written as “|”. So {x ∈ S | P(x)} is an alternative notation for {x ∈ S : P(x)}.
For a concrete example, consider A = {x ∈ R : x² = 2}. Here the property P(x) is x² = 2. Thus, A
is the set of all real numbers whose square is two.


Figure 5.1. Set B is a strict subset of set A: B ⊂ A

Definition 5.2. If A is a set, then B is a subset of A if every element of B is also an element of A.


We write B ⊆ A or A ⊇ B.
Definition 5.3. If A is a set, then B is a strict subset of A if every element of B is also an element
of A, and there exists at least one element of A which is not an element of B.

We write B ⊂ A or A ⊃ B. In shorthand we could write these as: B is a subset of A if


b∈B⇒b∈A
and B is a strict subset of A if
b ∈ B ⇒ b ∈ A ∧ ∃ a ∈ A s.t. a ∉ B.
Technically we should differentiate between subsets and strict subsets, but economists are usually
sloppy about this. In most courses you will see the operator ⊂ used for both, and you will not be
required to differentiate between the two concepts. Now let X be a universal set, such that we are
interested in subsets of this set.
Definition 5.4. The complement of the set A is the set A^c containing all elements not in A.
We write A^c = {x : x ∉ A}.
For the complement of a set to be clearly understood, we need to know what the relevant
universe is. For example, we can define the set J as all real numbers between 2 and 4, inclusive:
J = {x ∈ R | 2 ≤ x ≤ 4}.¹
In this context, the set J^c is the set of all real numbers strictly less than 2 or strictly greater than 4:
J^c = {x ∈ R | x < 2 ∨ x > 4}.
The “universe” in this case is the set of real numbers. The complement of J doesn’t include all
mathematical objects not in J, nor does it include all numbers not in J (because complex numbers
are excluded). In most cases the universe is clear from the context.
¹This can also be written as J = [2, 4], where the square brackets indicate the closed interval between the first entry and the second.

Figure 5.2. Complement of Set A

Example 5.1. Some examples of sets are:

D = {2, 4, 10},
B = {x ∈ R s.t. x ≥ 10}
S = The set of all real-valued functions on R.

5.1.2. A Few Common Sets.


R Set of real numbers
R+ Set of non-negative real numbers ≥ 0
R++ Set of positive real numbers > 0
Z Set of integers (−10, 0, 2, 451, etc.)
Z+ Set of non-negative integers ≥ 0 (also called N)
Z++ Set of integers > 0 (sometimes also called N)
Q Set of rational numbers (numbers that can be expressed as fractions)
C Set of complex numbers
∅ Empty set or null set
Ω Universal set
R² Set of pairs of real numbers

The last set, R², is shorthand notation for the Cartesian product R × R. This notation extends to
any number n ∈ Z++ of sets. You will often encounter proofs and theorems defined on the
set Rⁿ, which is the general way of describing the space of n-vectors, each element of which is a
real number.

5.1.3. Set Operations.



Definition 5.5. Union : The union of n sets is the set containing all elements from all n sets. We
write

A ∪ B = {x : x ∈ A ∨ x ∈ B}.
∪_{i=1}^{n} A_i = A₁ ∪ A₂ ∪ · · · ∪ Aₙ = {x : x ∈ A_i for some i = 1, · · · , n}

Union of two sets: A ∪ B

Definition 5.6. Intersection : The intersection of n sets is the set containing the elements common
to all n sets. We write

A ∩ B = {x : x ∈ A ∧ x ∈ B}.
∩_{i=1}^{n} A_i = A₁ ∩ A₂ ∩ · · · ∩ Aₙ = {x : x ∈ A_i for all i = 1, · · · , n}

Exercise 5.1. Let A₁, · · · , Aₙ be subsets of X. Then,
(∪_{j=1}^{n} A_j)^C = ∩_{j=1}^{n} A_j^C ; (∩_{j=1}^{n} A_j)^C = ∪_{j=1}^{n} A_j^C.

Intersection of two sets: A ∩ B

Definition 5.7. Exclusion : The exclusion of the set B from the set A is the set of all elements in
A that are, in addition, not elements of B. We write

A \ B = {x ∈ A | x ∉ B}.

Figure 5.3. Proposition 1: Sets A \ B and B \ A have empty intersection.

Proposition 1. (A \ B) ∩ (B \ A) = ∅

Proof.
A \ B = A ∩ B^C ⊆ B^C
B \ A = B ∩ A^C ⊆ B
Hence (A \ B) ∩ (B \ A) ⊆ B^C ∩ B = ∅.

Here is a pictorial representation (Fig. 5.3) of this proof.




Exercise 5.2. Let B, and A₁, · · · , Aₙ be subsets of X. Then,
B − ∪_{j=1}^{n} A_j = ∩_{j=1}^{n} (B − A_j) ; B − ∩_{j=1}^{n} A_j = ∪_{j=1}^{n} (B − A_j).

Next we consider the sets whose elements are sets themselves. For example, let A, B, and C be
subsets of X, then the collection A = {A, B,C} is a set, whose elements are A, B and C. We call a
set whose elements are subsets of X, a family of subsets of X, or a collection of subsets of X. The
notation we follow is: lower-case letters refer to elements of X, upper-case letters refer to subsets
of X, and script letters refer to families of subsets of X.

Any subset of the empty set is empty. Observe that the empty set ∅ is a subset of every set X. It is
possible to form a non-empty set whose only element is the empty set, i.e., {∅}. In this case {∅} is
a singleton. Also ∅ ⊂ {∅} and ∅ ∈ {∅}.

There is a special family of subsets of X with a special name.


Definition 5.8. Let A be any subset in X. The power class of A or the power set of A is the family
of all subsets of A. We denote the power set of A by P (A).

Specifically,
P(A) = {B : B ⊆ A}.
The power set of the empty set is P(∅) = {∅}, i.e., the singleton of ∅. The power set of a singleton
is P({a}) = {∅, {a}}. Note that the power set of A always contains A and ∅. In general, if A is a
finite set with n elements, then P(A) contains 2^n elements.
Exercise 5.3. Prove that if A is a finite set with n elements, then P(A) contains 2^n elements.
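As a hint toward this exercise, one can enumerate the power set of a small finite set in Python and count its elements (the helper name `power_set` is ours):

```python
from itertools import combinations

# The power set of a finite set, illustrating |P(A)| = 2**n.
def power_set(A):
    elems = list(A)
    return [set(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

A = {1, 2, 3}
P = power_set(A)
print(len(P) == 2 ** len(A))  # True
print(set() in P and A in P)  # True (P(A) contains the empty set and A itself)
```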

5.2. Set Identities

There are a number of set identities that the set operations of union, intersection, and set difference
satisfy. They are very useful in calculations with sets. Below we give a table of such set identities,
where U is a universal set and A, B, and C are subsets of U.

• Commutative Laws: A ∪ B = B ∪ A ; A ∩ B = B ∩ A

• Associative Laws: (A ∪ B) ∪C = A ∪ (B ∪C) ; (A ∩ B) ∩C = A ∩ (B ∩C)

• Distributive Laws: A ∩ (B ∪C) = (A ∩ B) ∪ (A ∩C) ; A ∪ (B ∩C) = (A ∪ B) ∩ (A ∪C)

• Idempotent Laws: A ∪ A = A ; A ∩ A = A

• Absorption Laws: A ∩ (A ∪ B) = A ; A ∪ (A ∩ B) = A

• Identity Laws: A ∪ ∅ = A ; A ∩ U = A

• Universal Bound Laws: A ∪ U = U ; A ∩ ∅ = ∅

• De Morgan’s Laws: (A ∪ B)^c = A^c ∩ B^c ; (A ∩ B)^c = A^c ∪ B^c

• Complement Laws: A ∪ A^c = U ; A ∩ A^c = ∅

• Complements of U and ∅: U^c = ∅ ; ∅^c = U

• Double Complement Law: (A^c)^c = A

• Set Difference Law: A \ B = A ∩ B^c
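A few of these identities can be checked on a concrete finite universe using Python's built-in set operations (a spot-check on one example, not a proof):

```python
# Checking several identities from the table on a small universe U.
U = set(range(10))
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}
Ac, Bc = U - A, U - B  # complements relative to U

print(U - (A | B) == Ac & Bc)  # De Morgan: (A union B)^c = A^c intersect B^c
print(U - (A & B) == Ac | Bc)  # De Morgan: (A intersect B)^c = A^c union B^c
print(A - B == A & Bc)         # Set Difference Law
print(A & (A | B) == A)        # Absorption Law
```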

De Morgan’s Laws: (A ∩ B)^c = A^c ∪ B^c

Exercise 5.4. Prove the following using only set identities:

(a) (A ∪ B) \ C = (A \ C) ∪ (B \ C).

(b) (A ∪ B) \ (C \ A) = A ∪ (B \ C).

(c) A ∩ (((B ∪ C^c) ∪ (D ∩ E^c)) ∩ ((B ∪ B^c) ∩ A^c)) = ∅.

We will discuss additional concepts in set theory after we have gone over some elementary
exposition of functions and sequences.

5.3. Functions

First we define a correspondence.


Definition 5.9. A correspondence consists of:

(a) A set D called the domain;

(b) A set R called the range; and

(c) A mapping f (x) which assigns at least one element from R to each element x ∈ D.
Definition 5.10. A function consists of:

(a) A set D called the domain;

(b) A set R called the range; and

(c) A mapping f (x) which assigns exactly one element from R to each element x ∈ D.

Here are some examples of functions.


f (x) = x³, D = R, R = R
f (x) = 0, D = R, R = R.
The range need not be exhausted but the domain must be.

The set of all functions is a strict subset of the set of all correspondences. This is the same as
saying that all functions are correspondences, but not the other way around. From here onwards it’s
critical that you specify the domain and the range when defining or using a function. For example
these two functions:

f : R → R such that f (x) = x²

g : R → R+ such that g(x) = x²



are not the same function, even though in practice they produce identical results.²
Definition 5.11. The argument of a function is the element from the domain that is mapped into
the range and the value of a function is the element from the range that is the destination of the
mapping.
Definition 5.12. A real-valued function is a function whose range is the set R or any subset of R.

From the above definition 5.12, the definitions of integer-valued functions, complex-valued
functions, etc., should be clear.
Definition 5.13. Let f : D → R and let A ⊆ D. We let f (A) represent the subset {f (x) : x ∈ A}
of R. The set f (A) is called the image of A in R. If B ⊆ R, we let f⁻¹(B) represent the subset
{x ∈ D : f (x) ∈ B} of D. The set f⁻¹(B) is called the pre-image of B in D.

Note that the image of a function may be equivalent to the range, or it may be a strict subset of
the range. In the above example, the image of the function f is a strict subset of its range, but the
image of g is equal to its range.
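The image and pre-image of Definition 5.13 can be illustrated in Python on a finite sample of a domain; the helper names `image` and `preimage` are ours:

```python
# Image and pre-image in the sense of Definition 5.13, computed on a finite
# sample of the domain of g(x) = x**2.
def image(f, A):
    return {f(x) for x in A}

def preimage(f, D, B):
    return {x for x in D if f(x) in B}

g = lambda x: x ** 2
D = {-3, -2, -1, 0, 1, 2, 3}
print(image(g, {-2, 1, 3}) == {4, 1, 9})         # True
print(preimage(g, D, {1, 4}) == {-2, -1, 1, 2})  # True
```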

5.4. Vector Space

A vector space is defined over a field, which is a set on which two operations + and · (called
addition and multiplication, respectively) are defined. The formal definition of a field is as follows:
Definition 5.14. A field F is a set on which two operations, called addition (+) and multiplication
(·), are defined so that for each pair of elements x, y in F there are unique elements x + y and x · y
in F, such that the following conditions hold for all a, b, c in F.

(i) Commutativity of addition and multiplication:


a + b = b + a, and a · b = b · a.

(ii) Associativity of addition and multiplication:


(a + b) + c = a + (b + c), and (a · b) · c = a · (b · c)

(iii) Existence of identity elements for addition and multiplication: There exist elements 0 and 1
in F such that
0 + a = a, and 1 · a = a
²The difference between the two is that the range of f is all real numbers, and the range of g is the set of non-negative real
numbers. This is inconsequential, since the mapping in both cases takes all elements from the domain and assigns them to a
non-negative real number. But the two functions are still not the same.

(iv) Existence of inverses for addition and multiplication: For each element a in F and for each
non-zero element b in F, there exist elements c and d in F such that
a + c = 0, and b · d = 1

(v) Distributivity of multiplication over addition:


a · (b + c) = a · b + a · c.

Examples of fields include the set of real numbers R and the set of rational numbers Q, each with
the usual definitions of addition and multiplication.

Definition 5.15. A vector space V over a field F consists of a set on which two operations, called
addition (+) and scalar multiplication (·), are defined so that for each pair of elements x, y in V
there is a unique element x + y in V , and for each element a in the field F and for each element x in
V , there is a unique element ax in V, such that the following conditions hold.

(i) Commutativity of addition:


∀x, y ∈ V, x + y = y + x

(ii) Associativity of addition:


∀x, y, z ∈ V, (x + y) + z = x + (y + z)

(iii) Existence of additive identity:


∃ an element 0 ∈ V such that x + 0 = x ∀x ∈ V

(iv) Existence of additive inverse:


∀x ∈ V ∃ some element y ∈ V such that x + y = 0

(v) Identity element for scalar multiplication:


1 · x = x ∀ x ∈ V.

(vi) Scalar association:


∀ α, β ∈ F, ∀ x ∈ V, (αβ) · x = α · (β · x)

(vii) Distributivity of scalar multiplication:


∀ α ∈ F, ∀ x, y ∈ V, α · (x + y) = (α · x) + (α · y)

(viii) Scalar distribution:


∀ α, β ∈ F, ∀ x ∈ V, (α + β) · x = α · x + β · x

In order to show that any space is a vector space, we simply need to show that the properties in
the above definition are satisfied.
Definition 5.16. The Cartesian Product of sets A and B is the set of pairs (a, b) satisfying a ∈
A ∧ b ∈ B. We write
A × B = {(a, b) | a ∈ A ∧ b ∈ B}.

The Cartesian product is the two set case of the general “cross product” of sets, which is the
same concept defined for any number of sets. For example using sets A, B, C and D we could define
E = A × B × C × D, and a typical element of E would be (a, b, c, d) for some a ∈ A, b ∈ B, c ∈ C
and d ∈ D.
Example 5.2.
R³ = R × R × R = {(x, y, z) | x ∈ R ∧ y ∈ R ∧ z ∈ R}
R²+ = R+ × R+ ; R²++ = R++ × R++.

The order of the sets in the cross-product does matter as the following example shows.
Example 5.3. Let
A = {1, 2, 3}, B = {2, 4}
A × B = {(1, 2), (1, 4), (2, 2), (2, 4), (3, 2), (3, 4)}
B × A = {(2, 1), (2, 2), (2, 3), (4, 1), (4, 2), (4, 3)}.
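The same example in Python, using `itertools.product` for the Cartesian product:

```python
from itertools import product

# Example 5.3 in code: A x B and B x A have the same size but are
# different sets of ordered pairs.
A = {1, 2, 3}
B = {2, 4}
AxB = set(product(A, B))
BxA = set(product(B, A))
print(len(AxB), len(BxA))  # 6 6
print(AxB == BxA)          # False
```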

Let u and v be vectors in Rⁿ.

(a) The nonzero vectors u and v are parallel if there exists a ∈ R such that u = av.

(b) The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if
u · v = 0.

(c) The angle between the vectors u and v is arccos( (u · v) / (∥u∥ · ∥v∥) ).

5.4.1. Metric.
Definition 5.17. A distance function is a real-valued function d : V ×V → R which satisfies

(i) Non-negativity:
∀ x, y ∈ V, d(x, y) ≥ 0, with equality if and only if x = y

(ii) Symmetry:
∀x, y ∈ V, d(x, y) = d(y, x)

(iii) Triangle Inequality:


∀x, y, z ∈ V, d(x, z) ≤ d(x, y) + d(y, z).

Any function satisfying these three properties is a distance function. A distance function is also
called a metric. The space V with elements x, y, which would be called points, is a metric space if
we can associate a distance function to it.
Example 5.4.

(a) The set of real numbers R with the distance function d(x, y) ≡ |x − y|.

(b) The set of complex numbers C with the distance function d(w, z) ≡ |w − z|.

(c) Euclidean Distance:
d(x, y) = √( (x₁ − y₁)² + · · · + (xₙ − yₙ)² ),
where V = Rⁿ.

(d) Discrete metric:
d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y,
where V is any vector space.

(e) In V = R²,
d(x, y) = max{ |x₁ − y₁|, |x₂ − y₂| }.

(f) In space V if d(·, ·) is a metric then


d₁(x, y) = d(x, y) / (1 + d(x, y))
is also a metric. This allows us to construct any number of metrics from any given metric d(x, y).

(g) Let X be a set of people of the same generation with a common ancestor, for example all
grandchildren of a grandmother. The distance d(x, y) between any two individuals x and y is the
number of generations one has to go back along the female lines to find the first common ancestor.
For example, the distance between two sisters is one.

(h) Let X be the set of n-letter words in an m-character alphabet A = {a₁, a₂, · · · , aₘ}, meaning
X = {(x₁, x₂, · · · , xₙ) | xᵢ ∈ A}. We define the distance d(x, y) between two words x = (x₁, · · · , xₙ)
and y = (y₁, · · · , yₙ) to be the number of places in which the words have different letters. That
is,
d(x, y) = #{i | xᵢ ≠ yᵢ}.
Exercise 5.5. Try to show the last two examples are indeed metric functions.
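As a starting point for this exercise, the word metric of example (h) is straightforward to compute; `hamming` is our name for the helper (this distance is commonly called the Hamming distance):

```python
# The word metric of example (h): the number of positions at which two
# equal-length words differ.
def hamming(x: str, y: str) -> int:
    assert len(x) == len(y)
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

print(hamming("cat", "cot"))  # 1
print(hamming("abc", "abc"))  # 0  (d(x, y) = 0 iff x = y)
print(hamming("abc", "xyz"))  # 3
```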

5.4.2. Norm.
Definition 5.18. A norm is a real-valued function written ∥ · ∥: V → R, defined on vector space V ,
which satisfies

(i) Non-negativity:
∀x ∈ V, ∥x∥ ≥ 0, with equality if and only if x = 0,

(ii) Homogeneity:
∀x ∈ V, α ∈ R, ∥ α · x ∥ = | α | · ∥ x ∥,

(iii) Triangle Inequality:


∀x, y ∈ V, ∥x + y∥ ≤ ∥x∥ + ∥y∥.
Example 5.5.

(a) Euclidean Norm:
∀x ∈ Rⁿ, ∥x∥ = √( x₁² + · · · + xₙ² )

(b) Taxicab Norm:
∀x ∈ Rⁿ, ∥x∥ = ∑_{i=1}^{n} |xᵢ|
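Both norms are easy to compute directly; a small Python sketch (the helper names are ours):

```python
import math

# The Euclidean and taxicab norms of Example 5.5 on a sample vector.
def euclidean_norm(x):
    return math.sqrt(sum(xi ** 2 for xi in x))

def taxicab_norm(x):
    return sum(abs(xi) for xi in x)

x = [3.0, -4.0]
print(euclidean_norm(x))  # 5.0
print(taxicab_norm(x))    # 7.0
```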

5.4.3. Inner Product.


Definition 5.19. An inner product is a real valued function ⟨·, ·⟩ : V × V → R, defined on vector
space V , which satisfies

(i) Symmetry:
∀x, y ∈ V, ⟨x, y⟩ = ⟨y, x⟩ ,

(ii) Positive definiteness:


∀x ∈ V, ⟨x, x⟩ ≥ 0 with equality if and only if x = 0,

(iii) Bilinearity:
∀x, y, z ∈ V, ∀α, β ∈ R, ⟨αx + βy, z⟩ = α ⟨x, z⟩ + β ⟨y, z⟩.
Example 5.6. V = Rn . Dot Product
∀x, y ∈ V, x · y = x1 y1 + · · · + xn yn .
Definition 5.20. A metric space (V, d) is a space V equipped with a distance function d.

A normed metric space (V, ∥·∥) is a metric space V equipped with a norm ∥·∥. An inner product
space (V, ⟨·, ·⟩) is a space V and an inner product ⟨·, ·⟩.

5.4.4. Cauchy-Schwarz Inequality. The Cauchy-Schwarz inequality states that for all vectors x
and y of an inner product space,
|⟨x, y⟩|² ≤ ⟨x, x⟩ · ⟨y, y⟩,
where ⟨·, ·⟩ is the inner product. Equivalently, by taking the square root of both sides, and referring
to the norms of the vectors, the inequality is written as
|⟨x, y⟩| ≤ ∥x∥ · ∥y∥.
Moreover, the two sides are equal if and only if x and y are linearly dependent (or, in a geometrical
sense, they are parallel or one of the vectors is equal to zero).

If x₁, · · · , xₙ ∈ R and y₁, · · · , yₙ ∈ R are any real numbers, the inequality may be restated in a
more explicit way as follows:
|x₁y₁ + · · · + xₙyₙ|² ≤ (x₁² + · · · + xₙ²) · (y₁² + · · · + y₂ₙ... corrected: (y₁² + · · · + yₙ²).
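A numeric spot-check of the inequality and its equality case for the dot product (an illustration on one example, not a proof):

```python
# Cauchy-Schwarz for the dot product on R^3, including the equality case
# for linearly dependent vectors.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1.0, 2.0, -3.0]
y = [4.0, 0.0, 2.0]
print(dot(x, y) ** 2 <= dot(x, x) * dot(y, y))  # True
z = [2.0, 4.0, -6.0]  # z = 2x, linearly dependent with x
print(dot(x, z) ** 2 == dot(x, x) * dot(z, z))  # True (equality holds)
```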

5.5. Sequences

Definition 5.21. A sequence is a function


{xₙ} : N → Rᵐ
that gives us an ordered infinite list of points in Rᵐ.

Another notation for sequence is ⟨xn ⟩ where ⟨xn ⟩ ≡ (x1 , x2 , · · · ). As we saw above, sets are
unordered collections of elements. Even if there is an intuitive ordering to the elements of a set,
with respect to the definition of the set itself there is no “first element” or “last element”. Sequences,
however, are sets for which the elements are assigned a particular order.

Example 5.7.
S₁ = {1/n, n ∈ N} is a sequence in R
S₂ = {(n, 1/n), n ∈ N} is a sequence in R²

The interpretation of S₁ is that the nth element of the sequence is given by 1/n. So we could also
have written S₁ = {1, 1/2, 1/3, 1/4, · · · }. Similarly S₂ = {(1, 1), (2, 1/2), · · · }. Note the implication
of this definition is that the elements of the sequence are numbered from 1 onwards, not from 0.
It’s usually assumed in the first year courses that the first element of a sequence is numbered “1”
not “0”, but this need not always be the case. Note that order of appearance of elements matters
{1, 2, 3, 4, · · · } ̸= {2, 1, 3, 4, · · · }
and elements can be repeated,
S = {1, 1, 1, · · · } is a sequence.

5.5.1. Convergence and Limits.


Definition 5.22. We say that x is a limit point of {xₙ}, n ∈ N, if
∀ε > 0, there exist infinitely many terms xₙ such that d(x, xₙ) < ε.
Example 5.8. (a) Let xₙ = (−1)ⁿ. This sequence has two limit points: a = −1 and a = 1.

(b) Let xₙ = sin(πn/2). This sequence has three limit points: a = −1, 0, 1.

(c) The sequence {1, −1, 1/2, −1, 1/3, −1, · · · } has two limit points: 0 and −1.

(d) Let xₙ = n^((−1)ⁿ). This sequence has a limit point a = 0.

(e) Let {xn } be a convergent sequence: xn → x as n → ∞. Then xn has a limit point x.


Definition 5.23. The sequence {xn } converges to x (has a limit) x if
∀ ε > 0, ∃ N ∈ N such that d (xn , x) < ε ∀ n > N

In this case we write


x = lim_{n→∞} xₙ.

Definition 5.23 is a source of a lot of difficulty. However it’s one of the most important defini-
tions in macroeconomic theory and in parts of micro, and it’s worth forcing yourself to fully absorb
it before the end of the Review. The intuition behind limits is not as difficult as the formal defini-
tion. A sequence converges to x if after choosing any very, very tiny number (ε), you can identify a
point in the sequence (N) after which all of the remaining members of the sequence are no farther
than ε from some particular value x. This concept is only well-defined for infinite sequences. In
most economic theory, the elements of a convergent sequence never actually reach their limiting
value. They simply get closer and closer to it as the sequence progresses.
Example 5.9. The sequence xₙ = 1/n is a convergent sequence.

Proof. Let ε > 0 be given. We have to find N such that
∀n > N, d(xₙ, 0) = |xₙ| < ε
⇔ 1/n < ε ⇔ n > 1/ε.
So by choosing N to be any natural number greater than 1/ε, we have
∀n > N, d(xₙ, 0) = |xₙ| = 1/n < 1/N < ε.
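The choice of N in the proof can be mirrored in Python: given ε, take any natural number greater than 1/ε (`find_N` is our helper name; a sketch, not part of the formal argument):

```python
import math

# For x_n = 1/n and a given eps > 0, any natural number N > 1/eps works.
def find_N(eps: float) -> int:
    return math.floor(1 / eps) + 1

eps = 0.01
N = find_N(eps)
print(N)  # 101
print(all(1 / n < eps for n in range(N + 1, N + 1000)))  # True
```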

Definition 5.24. A sequence {xn } is bounded if
∃ B ∈ R such that d (xn , 0) ≤ B, ∀ n ∈ N.
Definition 5.25. A sequence {xn } is unbounded if
∀ B ∈ R, ∃ n ∈ N such that d (xn , 0) > B.
Example 5.10. The sequence {1, 0, 1, 0, · · · } is bounded. The sequence {xn } , xn = n, n ∈ N is
unbounded.
Definition 5.26. The tail of a sequence {xn } is the continuation of {xn } after some m ∈ N, that is
{xm+1 , xm+2 , · · · }.
Theorem 5.1. A sequence {xn } is bounded if and only if the tail of {xn } is bounded.

Proof. {xn } is bounded ⇒ the tail of {xn } is bounded. (TRIVIAL)

Next let us assume that the tail of the sequence {xn } is bounded and show that {xn } is bounded.
Fix some m. The tail of {xn } being bounded means
∃ B such that |xn | ≤ B, ∀ n > m.
Let
B′ = max {|x1 | , |x2 | , · · · , |xm | , B} .
Then B′ is a bound for {xn },
∀n ∈ N, |xn | ≤ B′ .

Definition 5.27. If {xn }_{n=1}^{∞} is a sequence, a subsequence {xn_k }_{k=1}^{∞} is obtained from {xn } by crossing out some (possibly infinitely many) elements, while preserving the order.

Example 5.11. Sequence: {xn } = {1, −1, 1/2, −1, 1/3, −1, · · · }.
Subsequences: {xn_k } = {−1, −1, −1, · · · } or {1, 1/2, 1/3, · · · }.

Definition 5.28. A sequence is monotone increasing if
∀n ∈ N, xn+1 ≥ xn
and is monotone decreasing if
∀n ∈ N, xn+1 ≤ xn .

The following claim characterizes the convergence of monotone sequences.
Claim 5.1. Let {xn } be monotonic. Then it is convergent if and only if it is bounded.

The following proposition is useful in proving the Bolzano-Weierstrass Theorem, which we discuss next.
Proposition 2. Nested Interval Property Suppose that I1 = [a1 , b1 ], I2 = [a2 , b2 ], · · · , where I1 ⊇
I2 ⊇ · · · , and limn→∞ (bn − an ) = 0. Then there exists exactly one real number common to all
intervals In .

Proof. Note that we have a1 ≤ a2 ≤ a3 ≤ · · · ≤ an ≤ · · · ≤ bn ≤ · · · ≤ b2 ≤ b1 . Then each bi is an
upper bound for the set A = {a1 , a2 , · · · }. In other words, the sequence {an } is monotone increasing
and bounded. Therefore, limn→∞ an = a exists and a = sup{an } ≤ bk for each natural
number k. Hence ak ≤ a ≤ bk for every k ∈ N, i.e., a is contained in each Ik . Now let b be contained
in In for all n ∈ N. Then an ≤ b ≤ bn for every n ∈ N, so 0 ≤ (b − an ) ≤ (bn − an ) for each n. Then
limn→∞ (b − an ) = 0. It follows that b = limn→∞ an = a, and so a is the only real number common
to all intervals. 
Theorem 5.2. Bolzano-Weierstrass Theorem Every bounded sequence {xn } has a convergent sub-
sequence.

Proof. Let {xn }_{n=1}^{∞} be bounded. There is B ∈ R such that |xn | ≤ B for all n ∈ N. We prove the
theorem in the following steps.

Step 1 We inductively construct a sequence of intervals I0 ⊇ I1 ⊇ I2 ⊇ · · · such that:

(i) In is a closed interval [an , bn ] where bn − an = 2B/2^n ; and
(ii) {i : xi ∈ In } is infinite.

We let I0 = [−B, B]. This closed interval has length 2B and xi ∈ I0 for all i ∈ N. Suppose we
have In = [an , bn ] satisfying (i) and (ii). Let cn be the midpoint (an + bn )/2. Each of the intervals
[an , cn ] and [cn , bn ] is half the length of In ; thus they both have length (1/2) · (2B/2^n) = 2B/2^{n+1} . If xi ∈ In ,
then xi ∈ [an , cn ] or xi ∈ [cn , bn ], possibly both. Thus at least one of the sets {i : xi ∈ [an , cn ]}
or {i : xi ∈ [cn , bn ]} is infinite. If the first set is infinite, we let an+1 = an and bn+1 = cn . If
the second is infinite, we let an+1 = cn and bn+1 = bn . Let In+1 = [an+1 , bn+1 ]. Then (i) and
(ii) are satisfied. By the Nested Interval Property, there exists a ∈ ∩_{n=1}^{∞} In .

Step 2 We next find a subsequence converging to a. Choose i1 ∈ N such that xi1 ∈ I1 . Suppose we
have in . We know that {i : xi ∈ In+1 } is infinite. Thus we can choose in+1 > in such that
xin+1 ∈ In+1 . This allows us to construct a sequence of natural numbers i1 < i2 < i3 < · · ·
where xin ∈ In for all n ∈ N.

Step 3 We complete the proof by showing that the subsequence (xi_n )_{n=1}^{∞} converges to a. Let ε > 0.
Choose N such that 2B/2^N < ε. Suppose n > N. Then xi_n ∈ In and a ∈ In . Thus
|xi_n − a| ≤ 2B/2^n ≤ 2B/2^N < ε
for all n > N.


Remark 5.1. Every bounded sequence {xn } has at least one limit point x̄.
Definition 5.29. A sequence {xn } is a Cauchy sequence if
∀ ε > 0, ∃N such that ∀n, m > N, d (xn , xm ) < ε.

After N, each element is close to every other element or in other words, the elements lie within
a distance of ε from each other.

Some properties of Cauchy sequences are

(i) Every convergent sequence {xn } (with limit x, say) is a Cauchy sequence, since, given any
real number ε > 0, beyond some fixed point, every term of the sequence is within distance ε/2
of x, so any two terms of the sequence are within distance ε of each other.

(ii) Every Cauchy sequence of real numbers is bounded (since for some N, all terms of the se-
quence from the N-th position onwards are within distance 1 of each other, and if M is the
largest absolute value of the terms up to and including the N-th, then no term of the sequence
has absolute value greater than M + 1).

(iii) In any metric space, a Cauchy sequence which has a convergent subsequence with limit x is
itself convergent (with the same limit), since, given any real number ε > 0, beyond some fixed
point in the original sequence, every term of the subsequence is within distance ε/2 of x, and
any two terms of the original sequence are within distance ε/2 of each other, so every term of
the original sequence is within distance ε of x.
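Property (i) can be illustrated for xn = 1/n. The sketch below is our own (the helper name and the choice N > 2/ε are ours): it checks that all pairs of terms beyond N lie within ε of each other.

```python
# Sketch: checking the Cauchy property of x_n = 1/n numerically for a few
# tolerances, using N > 2/eps so that any two terms beyond N are within eps.
def is_cauchy_beyond(N, eps, count=500):
    terms = [1 / n for n in range(N + 1, N + 1 + count)]
    return all(abs(s - t) < eps for s in terms for t in terms)

for eps in (0.1, 0.01):
    N = int(2 / eps) + 1
    print(eps, is_cauchy_beyond(N, eps))  # True for each eps
```

The finite check mirrors the argument in (i): beyond N every term is within ε/2 of the limit 0, hence any two terms are within ε of each other.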
Theorem 5.3. Every sequence has at most one limit.

Proof. By contradiction. We use the intuition that all points end up being close to both r1 and r2
at the same time, which is not possible. Let the sequence {xn } converge to two distinct limits r1 and r2 . It is
enough to show that there is one ε for which the definition fails. Let us choose
ε = d (r1 , r2 ) /4 = |r1 − r2 | /4,
so that |r1 − r2 | = 4ε. Since r1 is a limit,
∃N1 , ∀n > N1 , |xn − r1 | < ε
and since r2 is a limit,
∃N2 , ∀n > N2 , |xn − r2 | < ε.
Let N = max {N1 , N2 }. Then
∀n > N, |xn − r1 | + |xn − r2 | < 2ε.
By the triangle inequality, for any n > N,
4ε = |r1 − r2 | = |(xn − r2 ) − (xn − r1 )| ≤ |xn − r1 | + |xn − r2 | < 2ε,
which is a contradiction. 
Remark 5.2. A sequence can have more than one limit point.

5.5.2. Some Results on Sequences.

(a) Every convergent sequence is bounded BUT a bounded sequence may not be convergent. For
example {1, −1, 1, −1, · · · }.

(b) If xn → x and yn → y, then
xn + yn → x + y,
xn · yn → x · y,
and if yn ≠ 0 ∀n and y ≠ 0,
xn /yn → x/y.

(c) Weak inequalities are preserved in the limit: if {xn } → x and xn ≥ b (respectively xn ≤ b) for all
n ∈ N, then x ≥ b (respectively x ≤ b). Even when the inequality xn > b (or xn < b) is strict for
every n, only the weak inequality x ≥ b (or x ≤ b) survives in the limit: 1/n > 0 for all n, yet
lim 1/n = 0.
(d) x is a limit point of {xn } if and only if ∃ a subsequence {xn(k) }_{k=1}^{∞} of the sequence {xn } such
that {xn(k) } → x.

(e) A sequence of vectors {x^n } = (x1^n , x2^n , · · · , xN^n )′ ∈ R^N converges to a limit x = (x1 , x2 , · · · , xN )′
if and only if
xi^n → xi , ∀i = 1, 2, · · · , N.

(f) Every convergent sequence is also a Cauchy sequence.


Definition 5.30. A metric space (X, d) is complete if every Cauchy sequence in X converges to a
limit in X.

5.6. Sets in Rn

Now we are ready for additional useful concepts in set theory. We begin with some definitions.
Definition 5.31. A set A on the real line is bounded if ∃B ∈ R such that ∀x ∈ A, ∥x∥ ≤ B.
Theorem 5.4. For every non-empty bounded set A ⊂ R, ∃ a real number sup A such that

(a) sup A is an upper bound for A,

∀ x ∈ A, x ≤ sup A;

(b) if y is any upper bound for A, then

y ≥ sup A,
i.e., sup A is the least upper bound for A.

Similarly, inf A is the greatest lower bound.



Figure 5.4. Open ball Br (x) in R2 (center x, radius r)

Example 5.12. For the sets
A = [0, 1] , B = (0, 1) , C = [0, 1), D = (0, 1],
we have sup = 1 and inf = 0 in each case.

This example shows that the sup and inf of a set need not belong to the set. If sup A belongs to the
set A, it is called max {A}, and if inf {A} belongs to the set A, it is called min {A}.
Definition 5.32. A point x is a limit point of a set A if every neighborhood of x contains a point of A
different from x; that is, x is a limit point of A if
∀ε > 0, ∃y ∈ A such that y ≠ x ∧ d (x, y) < ε.
Theorem 5.5. Bolzano-Weierstrass Theorem for sets Every bounded infinite set has at least one
limit point.
Example 5.13. For the set A = (0, 1), x = 0 is a limit point of the set A.

This shows that limit point of a set need not belong to the set.
Theorem 5.6. A point x is a limit point of a set A ⊆ Rn if and only if ∃ a sequence
{xn } such that ∀n ∈ N, xn ≠ x ∧ xn ∈ A ∧ xn → x.
Definition 5.33. An open ball in Rn centered at x with radius r > 0 is
{ }
Br (x) = y ∈ Rn | d (x, y) < r .

Note that the open ball does not include its boundary points.

Example 5.14. An open ball in R2 centered at x = (0, 0) with radius 1 is


{ }
y ∈ R2 | y21 + y22 < 1 .

Definition 5.34. The set A is open if


∀x ∈ A, ∃r > 0  Br (x) ⊆ A.

Around any point in an open set, one can draw an open ball which is completely contained in
the set.
Example 5.15. The following sets are open:

A = (0, 1) ∪ (5, 10) ;
B = (−∞, 0) ; R; ∅.
Definition 5.35. The set A is closed if A contains all its limit points. (contains its borders)
Theorem 5.7. Set A ⊆ Rn is closed if and only if AC is open.
Example 5.16. The following sets are closed:

A = [2, 5], since A^C = (−∞, 2) ∪ (5, ∞) is open; R; ∅.
There are two sets which are both open and closed: the empty set and the universal set. The empty set
∅ is open since
int ∅ = ∅
and ∅ is closed since
bd ∅ = ∅ ⊆ ∅.
The universal set is the complement of the empty set and so is both open and closed. There can be sets
which are neither open nor closed: A = (0, 1]. The following theorem characterizes closed sets using
convergent sequences.
Theorem 5.8. A set A ⊆ Rn is closed if and only if every convergent sequence of points {xn } ∈ A
has its limit x ∈ A.
Example 5.17. The budget set
{ }
B (p, I) = y ∈ Rn+ | p · y 6 I ,
where p ∈ Rn++ and I ∈ R++ , is closed.

Figure 5.5. Budget set B(p, I): intercepts I/p1 and I/p2 on the Good 1 and Good 2 axes, with |slope| = p1 /p2

Proof. Take any sequence
{xn } such that xn ∈ B (p, I) ∀n ∧ xn → x.
Then
xn ≥ 0, ∀n ⇒ x ≥ 0,
p · xn ≤ I, ∀n ⇒ p · x ≤ I
⇒ x ∈ B (p, I) ⇒ B (p, I) is closed.
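The closedness argument can be illustrated with concrete numbers. The sketch below is our own example (the values p = (1, 2), I = 10 and the particular sequence are assumptions): a sequence inside the budget set converges to a point that again satisfies both constraints.

```python
# Sketch: the limit of a sequence inside the budget set stays inside,
# using p = (1, 2), I = 10 and x_n = (2 + 1/n, 4 - 1/n), which lies in
# B(p, I) for every n (since p.x_n = 10 - 1/n <= 10) and converges to (2, 4).
p, I = (1.0, 2.0), 10.0

def in_budget(x):
    return all(xi >= 0 for xi in x) and p[0] * x[0] + p[1] * x[1] <= I

seq = [(2 + 1 / n, 4 - 1 / n) for n in range(1, 1001)]
limit = (2.0, 4.0)

print(all(in_budget(x) for x in seq), in_budget(limit))  # True True
```

The limit lies on the budget line p · x = I, which is exactly why weak inequalities (preserved in the limit) are needed in the proof.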

Theorem 5.9.

(a) The union of any number of open sets is open.

(b) The intersection of a finite number of open sets is open.



(c) A singleton set is a closed set.

(d) The union of a finite number of closed sets is closed.

(e) The intersection of any number of closed sets is closed.


Remark 5.3. The finiteness requirement in (b) and (d) is necessary, as the following examples show.

For (b), An = (−1/n, 1/n), n ∈ N: ∩_{n=1}^{∞} An = {0}, which is closed, not open.

For (d), Bn = [1/n, 2], n ∈ N: ∪_{n=1}^{∞} Bn = (0, 2], which is not closed.

Definition 5.36. A set A ⊆ Rn is compact if and only if A is closed and bounded.


Example 5.18.
A = [1, 2] is compact.
R is closed but not bounded. NOT compact.
B = (1, 2] is bounded but not closed. NOT compact.
Definition 5.37. A set A ⊆ Rn is compact if every sequence of points {xn } in A has a subsequence
converging to a limit x ∈ A.
Definition 5.38. A set A ⊆ Rn is convex if ∀x, y ∈ A, ∀λ ∈ (0, 1),
λx + (1 − λ) y ∈ A.

It will be useful to draw some sets to differentiate between convex and non-convex sets.

Figure 5.6. A Non-convex Set

Figure 5.7. Not a convex set



Figure 5.8. A Convex Set


Chapter 6

Problem Set 2

(1) Verify that the following are distance functions (metrics).


(a) The Manhattan distance: for x, y ∈ Rn ,
(6.1) d(x, y) = ∑_{i=1}^{n} | xi − yi | .

(b) For x, y ∈ R2 ,
(6.2) d(x, y) = max{| x1 − y1 |, | x2 − y2 |}
(c) Let d(·, ·) be a metric; then
(6.3) d1 (x, y) = d(x, y) / (1 + d(x, y)) .

(2) Determine whether


(6.4) ∪_{n=1}^{∞} [1/n, 2/n]
is compact.

(3) Which of the following is true? Prove or give a counterexample.


(a) (A ∪ B)c ⊆ Ac ∪ Bc
(b) (A ∪ B)c ⊇ Ac ∪ Bc .

(4) Suppose A, B, and C are sets which satisfy both of the following two conditions
(a) A ∪C = B ∪C,
(b) A ∩C = B ∩C.


Prove that A = B.

(5) Let A and B be sets.


(a) Prove that P (A) ∩ P (B) = P (A ∩ B).
(b) Prove that P (A) ∪ P (B) ⊂ P (A ∪ B).
(c) Give an example of sets A and B such that P (A) ∪ P (B) ̸= P (A ∪ B).

(6) Define the set C of ordered pairs of real numbers (x1 , x2 ) as follows.


{ }
(6.5) C = (x1 , x2 ) ∈ R2 : x12 + x22 = 1

If (a1 , a2 ) and (b1 , b2 ) are elements of C and c ∈ R, define


(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).
Is C a vector space? Justify your answer.

(7) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V
and c ∈ R define
(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 − b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).
Is V a vector space over R with these operations? Justify your answer.

(8) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V
and c ∈ R define
(a1 , a2 ) + (b1 , b2 ) = (a1 + 2b1 , a2 + 3b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).
Is V a vector space over R with these operations? Justify your answer.

(9) Prove
(J ∩ K)c = J c ∪ K c
(J ∪ K)c = J c ∩ K c

(10) Consider sequences {xn }_{n=1}^{∞} and {yn }_{n=1}^{∞} such that {xn }_{n=1}^{∞} → x and {yn }_{n=1}^{∞} → y. Show that
{xn + yn }_{n=1}^{∞} → x + y.

(11) Let xn = n, n ∈ N. Show that {xn }_{n=1}^{∞} is not convergent.

(12) Prove that every Cauchy sequence {xn } is bounded.



(13) Prove that the sequence {xn } = {2 − 1/n : n ∈ N} is not convergent to 1.

(14) Prove or disprove: A monotone sequence is convergent if and only if it is bounded.

(15) Determine whether the following sets are open, closed, neither or both:
(i) S = (0, 1);
(ii) S = [0, 1];
(iii) S = R;
(iv) S = [0, 1).

(16) Let An , Bn and Cn be the intervals in R defined by
An = [0, 1/n] , Bn = (0, 1/n] , Cn = (−1/n, n) ,
where n is a positive integer. Obtain
∪_{n=1}^{∞} An , ∩_{n=1}^{∞} An , ∪_{n=1}^{∞} Bn , ∩_{n=1}^{∞} Bn , ∪_{n=1}^{∞} Cn , and ∩_{n=1}^{∞} Cn .
Chapter 7

Linear Algebra

Linear algebra is the branch of mathematics dealing with (among many other things) matrices and
vectors. It’s intuitively easy to see why linear algebra is important for econometrics and statistics.
Economic data is arranged in matrix format (rows corresponding to observations, columns corre-
sponding to variables), so the body of theory governing matrices should help us analyze data. It
is harder to see the connection between matrix theory and the optimization that we do in micro
theory, but there are some important links. We’ll cover the basics and some of the necessary detail
here, but more detailed coverage will be offered in the core courses.

7.1. Vectors

You may be familiar with vectors from physics courses, in which a vector is a pair giving the mag-
nitude and direction of a moving body. The vectors we use in economics are more general, in that
they can have any finite number of elements (rather than just 2), and the meaning of each element
can vary with the context (rather than always signifying magnitude and direction). Formally speak-
ing a vector can be defined as a member of a vector space, but we don’t need to deal with such a
definition here. For our purposes:
Definition 7.1. A vector is an ordered array of elements with either one row or one column.

The elements are usually numbers. A vector is an n × k matrix for which either n = 1, k = 1 or
both (see the definition of a matrix below). A general vector, for which the number of elements is
not specified but left as n, will sometimes be called an “n-vector”. We also refer to these as “vectors
in Rn ”. A vector can be written in either row or column form:


 
Row Vector: x ∈ Rn = (x1 x2 . . . xn ); Column Vector: x ∈ Rn = (x1 , x2 , . . . , xn )′ .

Although you will sometimes be able to switch between thinking of a vector as a row or a
column without restriction, there are certain operations that require a vector to be oriented in a
certain way, so it is good to distinguish between row and column vectors whenever possible. Most
people use x to refer to the vector in column form and x′ to refer to it in row form, but this is not
universal. Also, we usually use lowercase letters for vectors and uppercase letters for matrices.

7.1.1. Special Vectors.

Null vector: 0_{n×1} = (0, . . . , 0)′ .

Sum vector: u_{n×1} = (1, . . . , 1)′ .

Unit vector: In Rn there are n unit vectors. The ith unit vector, called ei , has all elements 0 except
for the ith, which is equal to 1. The definition of a unit vector is specific to the vector space in
which it sits. For example:

(7.1) e2 ∈ R3 = (0, 1, 0)′

and

(7.2) e2 ∈ R4 = (0, 1, 0, 0)′ .

7.1.2. Vector Relations and Operations.


Definition 7.2.

(a) Equality :
Vectors x ∈ Rn , y ∈ Rm are equal if n = m and xi = yi ∀ i.

(b) Inequalities : ∀x, y ∈ Rn :


x ≥ y if xi ≥ yi ∀ i = 1, · · · , n;
x > y if xi ≥ yi ∀ i = 1, · · · , n and xi > yi for at least one i;
x ≫ y if xi > yi ∀i = 1, · · · , n.

(c) Addition :
∀x, y ∈ Rn , x + y = z ∈ Rn where zi = xi + yi , ∀i.

(d) Scalar Multiplication: ∀x ∈ Rn and α ∈ R, we define the scalar product as

(7.3) αx = (αx1 , αx2 , . . . , αxn )′

(e) Vector Multiplication : This is essentially an inner product rule applied to Rn . See the rules
for matrix multiplication below, as they also apply for vectors.

7.2. Matrices

Definition 7.3. A matrix is a rectangular array of elements (usually numbers, for our purposes).

A matrix is characterized as n × k when it has n rows and k columns. To represent the n × k


matrix A, we can write:

[A]_{n×k} = [ai j ]_{n×k} = [ a11 a12 . . . a1k ; a21 a22 . . . a2k ; . . . ; an1 an2 . . . ank ]

The matrix An×k is a null matrix if ai j = 0 for i = 1, · · · , n, j = 1, · · · , k.

The matrix An×k is square if n = k. In this case we refer to it as an n × n matrix.

The square matrix An×n is symmetric if ai j = a ji ∀ i, j.



The symmetric matrix An×n is diagonal if ai j = 0 whenever i ̸= j.

The diagonal matrix An×n is an identity matrix if ai j = 1 whenever i = j.

The square matrix An×n is lower triangular if ai j = 0 ∀ i < j.

The square matrix An×n is upper triangular if ai j = 0 ∀ i > j.

It’s worthwhile to check your understanding of each of the above definitions by writing out a
matrix that satisfies each. Then note this next definition carefully:

The k × n matrix B is called the transpose of An×k if bi j = a ji ∀ i, j.

We write the transpose of A as either AT or A′ . If A is symmetric then A′ = A. This is an


obvious statement, but you could try proving it formally. It should only take a few lines.

7.2.1. Matrix Operations.

7.2.1.1. Addition. Matrix addition is only defined for matrices of the same size. If A is n × k and
B is n × k then
(7.4) [A]_{n×k} + [B]_{n×k} = [C]_{n×k}

where
(7.5) ci j = ai j + bi j ∀ i = 1, · · · , n, j = 1, · · · , k.
We say that matrix addition occurs “element wise” because we move through each element of the
matrix A, adding the corresponding element from B.

7.2.1.2. Scalar Multiplication. Scalar multiplication is also an element wise operation. That is,
 
(7.6) ∀ λ ∈ R, λ · [A]_{n×k} = [ λa11 λa12 · · · λa1k ; λa21 λa22 · · · λa2k ; · · · ; λan1 λan2 · · · λank ]

7.2.1.3. Matrix Multiplication. Matrix multiplication is defined for matrices [A]_{m×j} and [B]_{n×k} if j = n
or m = k. That is, the number of columns in one of the matrices must be equal to the number of
rows in the other. If matrices A and B satisfy this condition, so that A is m × j and B is j × k, their
product [C]_{m×k} ≡ [A]_{m×j} · [B]_{j×k} is given by ci j = Ai · B j , where Ai is the ith row of A and B j is the jth column
of B. For example, suppose
[A]_{2×2} = [ 1 2 ; 3 4 ] and [B]_{2×3} = [ 6 5 4 ; 3 2 1 ]
Multiplication between A and B is only defined if A is on the left and B is on the right. It must
always be the case that the number of columns in the left hand matrix is the same as the number of
rows in the right hand matrix. In this case, if we say AB = C, then element
c11 = [1 2] · [ 6 ; 3 ] = 1 · 6 + 2 · 3 = 12
Likewise
c12 = 1 · 5 + 2 · 2 = 9
c13 = 1 · 4 + 2 · 1 = 6
c21 = 3 · 6 + 4 · 3 = 30
c22 = 3 · 5 + 4 · 2 = 23
c23 = 3 · 4 + 4 · 1 = 16
which gives
[A]_{2×2} · [B]_{2×3} = [C]_{2×3} = [ 12 9 6 ; 30 23 16 ]
Note that matrix multiplication is not a symmetric operation. In general, AB ̸= BA, and in fact it
is often the case that the operation will only be defined in one direction. In our example BA is not
defined because the number of columns of B = (3) is not equal to the number of rows of A = (2).
For both AB and BA to be defined
[A]_{n×k} · [B]_{k×n} = [C]_{n×n} ,
and
[B]_{k×n} · [A]_{n×k} = [D]_{k×k} .
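The row-by-column rule can be written out in a few lines of plain Python. The sketch below is our own helper (not from the text) and reproduces the worked 2 × 2 times 2 × 3 example.

```python
# Sketch: the row-by-column rule for C = AB in plain Python, checked against
# the worked 2x2 times 2x3 example from the text.
def matmul(A, B):
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[6, 5, 4], [3, 2, 1]]
print(matmul(A, B))  # [[12, 9, 6], [30, 23, 16]]
```

The assertion enforces the conformability condition: the product is only attempted when the number of columns of the left matrix equals the number of rows of the right one.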

7.2.2. Some Fun facts about matrix multiplication.

(i) Even if n = k,
AB ̸= BA.
A = [ 1 2 ; 3 4 ] , B = [ 0 −1 ; 6 7 ] , AB = [ 12 13 ; 24 25 ] , BA = [ −3 −4 ; 27 40 ] .

(ii) AB may be null matrix even when A ̸= 0 and B ̸= 0.


A = [ 2 4 ; 1 2 ] , B = [ −2 4 ; 1 −2 ] , AB = [ 0 0 ; 0 0 ] .

(iii) CD = CE ; D = E even when C ̸= 0.


C = [ 2 3 ; 6 9 ] , D = [ 1 1 ; 1 2 ] , E = [ −2 1 ; 3 2 ] , CD = CE = [ 5 8 ; 15 24 ] .
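The three facts above can be verified directly with the text's matrices. The sketch below reuses a plain-Python matrix product (our own helper).

```python
# Sketch: checking the three "fun facts" with the matrices from the text.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# (i) AB != BA in general
A, B = [[1, 2], [3, 4]], [[0, -1], [6, 7]]
print(matmul(A, B) != matmul(B, A))          # True

# (ii) AB can be the null matrix with A != 0 and B != 0
A, B = [[2, 4], [1, 2]], [[-2, 4], [1, -2]]
print(matmul(A, B))                          # [[0, 0], [0, 0]]

# (iii) CD = CE does not force D = E
C = [[2, 3], [6, 9]]
D, E = [[1, 1], [1, 2]], [[-2, 1], [3, 2]]
print(matmul(C, D) == matmul(C, E), D != E)  # True True
```

Fact (iii) shows why matrices cannot in general be "cancelled" from both sides of an equation: C here is singular.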

7.2.3. Rules for matrix operations.


A+B = B+A
A + (B +C) = (A + B) +C
(AB)C = A(BC)
(A + B)C = AC + BC
A(B +C) = AB + AC
Check that you have a clear understanding of the restrictions needed on the number of rows and
columns of A, B and C in order for the above to work. More matrix rules, involving the transpose:
(7.7) (A′ )′ = A

(7.8) (A + B)′ = A′ + B′

(7.9) (AB)′ = B′ A′
Note the reversal of the order of the matrices in the last operation.

7.2.4. Rank of a matrix.


Definition 7.4. A set of vectors x1 , · · · , xn in Rm is linearly dependent if there exist λ1 , · · · , λn , not
all zero, such that
(7.10) λ1 x1 + · · · + λn xn = 0.
Definition 7.5. A set of vectors x1 , · · · , xn in Rm is linearly independent if it is not linearly depen-
dent.
Definition 7.6. The rank of a matrix A is the maximum number of linearly independent column
vectors of A. It is also equal to the number of linearly independent row vectors of A.
Example 7.1. Let  
1 2 3
 
A= 0 1 0 
2 4 6

The first and the third columns are linearly dependent. The elements of column 3 are three
times the corresponding entry in the column 1. Now take Columns 1 and 2.
     
λ1 (1, 0, 2)′ + λ2 (2, 1, 4)′ = (0, 0, 0)′
⇔ λ1 + 2λ2 = 0, λ2 = 0, 2λ1 + 4λ2 = 0
⇔ λ1 = 0, λ2 = 0
is the only solution. So the first two columns are linearly independent. We found two linearly
independent columns so the rank of matrix A is 2. We could have done the exercise taking rows
instead of columns and still got the same answer. (Please verify).
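Example 7.1's reasoning translates directly into code. In the sketch below (our own illustration), column 3 is checked to be three times column 1, and a nonzero 2 × 2 minor certifies that columns 1 and 2 are linearly independent, so rank(A) = 2.

```python
# Sketch of Example 7.1: column 3 equals 3 times column 1 (a dependent pair),
# while columns 1 and 2 contain a 2x2 submatrix with nonzero determinant,
# so exactly two columns are linearly independent and rank(A) = 2.
A = [[1, 2, 3],
     [0, 1, 0],
     [2, 4, 6]]

col = lambda j: [row[j] for row in A]
print(col(2) == [3 * v for v in col(0)])   # True: columns 1 and 3 dependent

# rows 1 and 2, columns 1 and 2 form the minor [[1, 2], [0, 1]]
minor_det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(minor_det)                            # 1, nonzero
```

A nonzero minor of order r guarantees r independent columns, which is one standard way to compute the rank by hand.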

Theorem 7.1. (i) Rank of [A]_{n×k} ≤ min {# rows, # columns} = min {n, k};

(ii) Rank of AB ≤ min {Rank (A) , Rank (B)}.

Definition 7.7. A square matrix [A]_{n×n} is called non-singular or of full rank if rank (A) = n.

It is called singular if rank (A) < n.

Exercise 7.1. Prove: If A is a 2 × 1 matrix and B is a 1 × 2 matrix, then matrix AB is singular.

Definition 7.8. A square matrix [A]_{n×n} is invertible if there exists [B]_{n×n} such that [A] · [B] = [B] · [A] = [I]_{n×n} . Then B is called the inverse of A.

7.2.5. Rules for the inverse:


( )−1
(7.11) A−1 = A

(7.12) (AB)−1 = B−1 A−1


( ′ )−1 ( )′
(7.13) A = A−1

Definition 7.9. A square matrix [A]_{n×n} is called orthogonal if A−1 = A′ , i.e., AA′ = I.

Theorem 7.2. [A]_{n×n} is invertible ⇔ [A]_{n×n} is non-singular.

7.3. Determinant of a matrix

Determinants are defined only for square matrices. The determinant is a function that associates a
scalar, det (A), to each n × n square matrix A. The determinant of a 1-by-1 matrix A is
the only entry of that matrix: det (A) = A11 . The determinant of a 2 by 2 matrix
[ ]
a b
A=
c d

is det (A) = ad − bc.


Definition 7.10. The cofactor Ai j of the element ai j is defined as (−1)i+ j times the determinant of
the sub matrix obtained from A after deleting row i and column j.
Example 7.2. Let
[ ]
1 2
A =
3 4
A11 = (−1)1+1 · 4 = 4, A12 = (−1)1+2 · 3 = −3
A21 = (−1)2+1 · 2 = −2, A22 = (−1)2+2 · 1 = 1.
Definition 7.11. The determinant of an n × n matrix A is given by
(7.14) det (A) = ∑_{j=1}^{n} a1 j A1 j = ∑_{i=1}^{n} ai1 Ai1 .

Example 7.3. Let  


a b c
 
A =  d e f .
g h i

Then
[ ] [ ] [ ]
e f d f d e
det (A) = a (−1)1+1 det + b (−1)1+2 det + c (−1)1+3 det
h i g i g h
= a (ei − f h) − b (di − f g) + c (dh − eg) .
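Definition 7.11's expansion along the first row can be implemented recursively. The sketch below (our own implementation, with an assumed 3 × 3 test matrix) checks it against the a(ei − fh) − b(di − fg) + c(dh − eg) formula.

```python
# Sketch: cofactor expansion along the first row, implemented recursively
# and checked on a 3x3 example against the closed-form formula above.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        sub = [row[:j] + row[j + 1:] for row in A[1:]]  # delete row 1, column j+1
        total += (-1) ** j * A[0][j] * det(sub)
    return total

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
formula = 1 * (5 * 10 - 6 * 8) - 2 * (4 * 10 - 6 * 7) + 3 * (4 * 8 - 5 * 7)
print(det(A), formula)  # -3 -3
```

The recursion is exponential in n, so it is only a pedagogical device; practical computation uses elimination instead.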

7.3.1. Properties of Determinants.

(a)
( )
(7.15) det (A) = det A′

(b) Interchanging any two rows will alter the sign but not the numerical value of the determinant.

(c) Multiplication of any one row by a scalar k will change the determinant k− fold.

(d) If one row is a multiple of another row, the determinant is zero.

(e) The addition of a multiple of one row to another row will leave the determinant unchanged.

(f) If A and B are n × n matrices, then


det (AB) = det (A) · det (B) .

(g) Properties (b) − (e) are valid if we replace row by columns everywhere.

For example,
A = [ 1 2 ; 3 4 ] , det (A) = −2; A′ = [ 1 3 ; 2 4 ] , det (A′) = −2;
B = [ 3 4 ; 1 2 ] (rows of A interchanged), det (B) = 2.

Result 7.1. Let A be an n × n upper triangular matrix, i.e., ai j = 0 whenever i > j. The determinant
of the matrix A is given by:
det A = ∏_{i=1}^{n} aii .

Proof. The matrix A is upper triangular and is described as under:

A = [ a11 a12 · · · a1,n−1 a1n ; 0 a22 · · · a2,n−1 a2n ; · · · ; 0 0 · · · an−1,n−1 an−1,n ; 0 0 · · · 0 ann ]

We prove the result by induction.

(1) Base case: Let n = 1. If A is a 1 × 1 matrix, then det A = a11 = ∏_{i=1}^{1} aii by the definition of a
determinant.

(2) Inductive case: Let n > 1. Assume that for any (n − 1) × (n − 1) matrix A with ai j = 0 for all
i > j, we have det A = ∏_{i=1}^{n−1} aii . Now consider any n × n matrix A with ai j = 0 for all i > j.
Expanding by the last row, and using an1 = · · · = a n,n−1 = 0, we have

det A = an1 An1 + · · · + ann Ann
= ann (−1)^{n+n} det [ a11 a12 · · · a1,n−1 ; 0 a22 · · · a2,n−1 ; · · · ; 0 0 · · · an−1,n−1 ]
= ann ∏_{i=1}^{n−1} aii
= ∏_{i=1}^{n} aii ,

where the second-to-last equality follows from the inductive hypothesis.

(3) The result holds for all n by induction.

Using this result we can show the following.

Result 7.2. The upper triangular square matrix A is non-singular if and only if aii ̸= 0 for each
i ∈ {1, · · · , n}.

As an ”if and only if” statement, this requires proofs in both directions.

Claim 7.1. If the upper triangular matrix A is non-singular, then aii ̸= 0 for all i = 1, . . . , n.

Proof. Let A be non-singular. Then A has an inverse, A−1 . Since 1 = det I = det [A−1 A] =
(det A−1 )(det A), we know that det A ̸= 0. If aii = 0 for any i ∈ 1, . . . , n, then by the Result
(7.1) we would have det A = 0, a contradiction. So it must be that aii ̸= 0 for all i = 1, . . . , n. 

Claim 7.2. If A is upper triangular and aii ̸= 0 for all i = 1, . . . , n, then A is non-singular.

Proof. Let aii ≠ 0 for all i = 1, . . . , n. Seeking contradiction, suppose A is singular. Then the
columns of A are linearly dependent and, without loss of generality, we can write A1 = ∑_{i=2}^{n} αi Ai . Let

B = [ A1 − ∑_{i=2}^{n} αi Ai   A2 · · · An ] = [ 0   A2 · · · An ] .

We know, by the properties of determinants, that det B = det A. But, expanding B by the first
column, we have det B = 0. This gives det A = 0, contradicting det A = ∏_{i=1}^{n} aii ≠ 0 from Result
(7.1). So we have that A is non-singular. 

7.4. An application of matrix algebra

An application of matrix algebra is the Markov process, or Markov chain. Markov processes
are used to model movements across states over time, by means of a Markov transition matrix. Each entry
of the transition matrix is the probability of moving from one state to another state. The model also specifies a
vector containing the initial distribution across each of these states. By repeatedly multiplying the
initial distribution vector by the transition matrix, we can estimate changes across states over time.

Consider the problem of movement of employees within a firm at different branches. In the
simple case, we take two locations, namely Ithaca and Cortland to demonstrate the basic elements
of a Markov process.

To determine the number of employees in Ithaca tomorrow, we take the probability that the
employees will stay in Ithaca branch multiplied by the total number of employees currently in
Ithaca. We add to this the number of Cortland employees transferring to Ithaca, which is equal
to total number of employees in Cortland multiplied by the probability of Cortland employees
transferring to Ithaca.

We follow the same process to determine the number of employees in Cortland tomorrow, made
up of the employees who choose to remain at Cortland and the Ithaca employees who transfer into
Cortland.

There are four probabilities involved which can be arranged in a Markov transition matrix.

Let At and Bt denote the populations of Ithaca and Cortland locations at some time t. The
transition probabilities are defined as follows.
pAA ≡ probability that a current A remains an A,

pAB ≡ probability that a current A moves to B,


pBB ≡ probability that a current B remains a B,
pBA ≡ probability that a current B moves to A.
The distribution of employees at time t is denoted by the vector xt′ = [At Bt ] and the transition
probabilities in matrix form as
(7.16) M = [ pAA pAB ; pBA pBB ] .

Then the distribution of employees across the two locations next period (t + 1) is xt′ · M = x′t+1 ,
which is
[At Bt ] [ pAA pAB ; pBA pBB ] = [(At pAA + Bt pBA ) (At pAB + Bt pBB )] = [At+1 Bt+1 ].

In a similar manner we can determine the distribution of employees after two periods:
x′t+1 · M = x′t+2
[At+1 Bt+1 ] [ pAA pAB ; pBA pBB ] = [At+2 Bt+2 ]
[At Bt ] [ pAA pAB ; pBA pBB ] [ pAA pAB ; pBA pBB ] = [At+2 Bt+2 ]
[At Bt ] [ pAA pAB ; pBA pBB ]^2 = [At+2 Bt+2 ]

In general, for n periods,

(7.17) [At Bt ] [ pAA pAB ; pBA pBB ]^n = [At+n Bt+n ]

When n is exogenous, the process is known as a finite Markov chain.


Example 7.4.

Consider the initial distribution of employees across two locations at time t = 0 as


x0′ = [A0 B0 ] = [200 200]

Let
M = [ pAA pAB ; pBA pBB ] = [ 0.8 0.2 ; 0.4 0.6 ] .
Then the distribution of employees in the next period t = 1 is
[200 200] [ 0.8 0.2 ; 0.4 0.6 ] = [240 160] = [A1 B1 ].

The distribution after two periods is

[200 200] M^2 = [200 200] [ 0.72 0.28 ; 0.56 0.44 ] = [256 144] = [A2 B2 ].

The distribution after six periods is

[200 200] M^6 = [200 200] [ 0.668 0.332 ; 0.664 0.336 ] = [266.4 133.6] = [A6 B6 ].

Observe that when the transition matrix is raised to higher powers, the new transition matrix con-
verges to a matrix whose rows are identical. This is referred to as the steady state. In this example,
the steady state would be
[ 2/3 1/3 ; 2/3 1/3 ] .
Try computing this value.
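The convergence to the steady state can be watched numerically. The sketch below is our own code reusing the example's transition matrix and initial distribution; it iterates x′t+1 = x′t M and approaches [800/3, 400/3] ≈ [266.7, 133.3].

```python
# Sketch: iterating the example's Markov chain. One step is the row vector
# x' times M, i.e. x'_j = sum_i x_i * M[i][j].
def step(x, M):
    return [x[0] * M[0][0] + x[1] * M[1][0],
            x[0] * M[0][1] + x[1] * M[1][1]]

M = [[0.8, 0.2], [0.4, 0.6]]
x = [200.0, 200.0]
history = [x]
for _ in range(20):
    x = step(x, M)
    history.append(x)

print([round(v, 1) for v in history[1]])   # [240.0, 160.0]
print([round(v, 1) for v in history[2]])   # [256.0, 144.0]
print([round(v, 1) for v in history[-1]])  # [266.7, 133.3]
```

The deviation from the steady state shrinks by a factor of 0.4 (the second eigenvalue of M) each period, which is why twenty iterations already agree with [800/3, 400/3] to many decimal places.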

7.4.1. Absorbing Markov Chains. We can extend the previous model by adding a third choice:
employees can exit the firm, with
pAE ≡ probability that a current A chooses to exit, E,

pBE ≡ probability that a current B chooses to exit, E.


Let us assume that
pEA = 0, pEB = 0, pEE = 1
where pEA , pEB , and pEE are the probabilities that an employee who is currently in state E will go
to A, B or E respectively. The values assigned to pEA , pEB , and pEE mean that nobody who leaves
the firm ever returns. It is also implied by these restrictions that the firm never replaces employees
that leave. Starting at time t = 0, the Markov chain becomes,
[A0 B0 E0 ] [ pAA pAB pAE ; pBA pBB pBE ; pEA pEB pEE ]^n = [An Bn En ]
or
[A0 B0 E0 ] [ pAA pAB pAE ; pBA pBB pBE ; 0 0 1 ]^n = [An Bn En ]

This type of Markov process is referred to as an absorbing Markov chain. The values of the transition
probabilities assigned in the third row are such that once an employee goes to state E, he or she
remains in that state forever. As n goes to infinity, An and Bn will approach zero and En will
approach the total number of employees at time zero (i.e., A0 + B0 + E0 ).
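A sketch of the absorbing chain. The exit probabilities pAE = pBE = 0.1 are illustrative assumptions (the text leaves them unspecified); the third row is the absorbing row (0, 0, 1). Iterating many periods shows A and B drained into E.

```python
def vec_mat(x, M):
    """Row vector times matrix."""
    return [sum(x[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]

# Hypothetical transition probabilities, including exit with probability 0.1
P = [[0.7, 0.2, 0.1],
     [0.4, 0.5, 0.1],
     [0.0, 0.0, 1.0]]   # once in E, an employee stays in E forever

x = [200.0, 200.0, 0.0]
for _ in range(200):     # iterate many periods
    x = vec_mat(x, P)

print(x)                 # A and B approach 0; E approaches 400 = A0 + B0 + E0
```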

7.5. System of Linear Equations

A system of linear equations is


(7.18) Ax = b
where matrix A is of dimension n × k, x is a column vector k × 1 and b is column vector n × 1. This
is a system of n equations with k unknowns.
Example 7.5. The system of two linear equations,
5x + 3y = 1
6x + y = 2
can be written as
\[ \begin{bmatrix} 5 & 3 \\ 6 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. \]

When b = 0, the system is called a homogeneous system. When b ≠ 0, it is called a non-homogeneous system.
Definition 7.12. Column vector x∗ is called a solution to the system if Ax∗ = b.

There are three important questions in this context.

(a) Does a solution exist?

(b) If there exists a solution, is it unique?

(c) If a solution exists, how do we compute it?



Claim 7.3. A homogeneous system Ax = 0 always has a solution (Trivial x = 0). But there might
be other solutions (solution may not be unique).
Claim 7.4. For a non-homogeneous system Ax = b, a solution may not exist.
Example 7.6. The following system of two linear equations
2x + 4y = 5
x + 2y = 2
does not have a solution. Multiplying the second equation by 2 makes the left-hand sides of the
two equations identical, which leads to 5 = 4, a contradiction.
Example 7.7. The following system of two linear equations
2x + 4y = 2
x + 2y = 1
has infinitely many solutions.

Given A (of dimension n × k) and b (of dimension n × 1), the n × (k + 1) matrix Ab = [A1 A2 · · · Ak b] is called the augmented
matrix. Note that Ai is the ith column of A.
Example 7.8. Let
\[ A = \begin{bmatrix} 5 & 3 \\ 6 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \;\Rightarrow\; A_b = \begin{bmatrix} 5 & 3 & 1 \\ 6 & 1 & 2 \end{bmatrix}. \]

Theorem 7.3. The system of equations
\[ \underset{n \times k}{A} \cdot \underset{k \times 1}{x} = \underset{n \times 1}{b} \]
has a solution if and only if
(7.19) rank (A) = rank (Ab ).
The solution is unique if and only if
(7.20) rank (A) = rank (Ab ) = k = # of columns of A = # of unknowns.

Consider the case of n equations in n unknowns. In this case, A is n × n. If Ax = b has a solution
and if det (A) ≠ 0, then the solution is characterized by
(7.21) x∗ = A−1 b.

Example 7.9. The system of linear equations
2x + y = 0
2x + 2y = 0
gives us
\[ A = \begin{bmatrix} 2 & 1 \\ 2 & 2 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad A_b = \begin{bmatrix} 2 & 1 & 0 \\ 2 & 2 & 0 \end{bmatrix}. \]
It is easy to verify that
rank (A) = 2 = rank (Ab ).
Hence a solution exists and is unique.
Example 7.10. The system of linear equations
2x + y = 0
4x + 2y = 0
leads to
\[ A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad A_b = \begin{bmatrix} 2 & 1 & 0 \\ 4 & 2 & 0 \end{bmatrix}. \]
It is again easy to verify that
rank (A) = 1 = rank (Ab ).
However,
rank (A) = rank (Ab ) < k = 2.
Hence a solution exists but is not unique.1
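The rank condition of Theorem 7.3 can be checked mechanically. Below is a small Gaussian-elimination rank function (a sketch, not a numerically robust implementation) applied to Examples 7.9 and 7.10.

```python
def rank(M, tol=1e-12):
    """Rank of a matrix (list of rows) by row reduction."""
    M = [row[:] for row in M]          # work on a copy
    r = 0
    for col in range(len(M[0])):
        # find a pivot in this column at or below row r
        piv = next((i for i in range(r, len(M)) if abs(M[i][col]) > tol), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and abs(M[i][col]) > tol:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A1, Ab1 = [[2, 1], [2, 2]], [[2, 1, 0], [2, 2, 0]]   # Example 7.9
A2, Ab2 = [[2, 1], [4, 2]], [[2, 1, 0], [4, 2, 0]]   # Example 7.10

print(rank(A1), rank(Ab1))   # 2 2 -> a unique solution
print(rank(A2), rank(Ab2))   # 1 1 -> solutions exist but are not unique
```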

Now, we revert to the problem of computing the inverse of a non-singular matrix. We first note
the following result.

Theorem 7.4. An n × n matrix A is invertible ⇔ det (A) ≠ 0. Also, if A is invertible, then
det (A−1 ) = 1/det (A).

Proof. Suppose A is invertible. Then


A · A−1 = I
so
1 = det I = det(AA−1 ) = det(A) · det(A−1 )
using properties of determinants, noted above. Consequently det(A) ̸= 0, and det(A−1 ) = [det(A)]−1 .

Suppose, next, that A is not invertible. Then A is singular, and so one of its columns (say, A1 )
can be expressed as a linear combination of its other columns A2 , · · · , An . That is,
\[ A_1 = \sum_{i=2}^{n} \alpha_i A_i. \]

1 A row or column vector of zeros is always linearly dependent on the other vectors.

Consider the matrix B whose first column is \( A_1 - \sum_{i=2}^{n} \alpha_i A_i \) and whose other columns are the same
as those of A. Then the first column of B is zero, and so |B| = 0. By the property of determinants,
|B| = |A|, and so |A| = 0. 

For a square n × n matrix A, we define the co-factor matrix of A to be the n × n matrix given by
\[ C = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \dots & A_{nn} \end{bmatrix} \]
where Ai j is the cofactor of ai j . The transpose of C is called the adjoint of A, and denoted by adj A.

Now, by the rules of matrix multiplication,
\[ AC' = \begin{bmatrix} \sum_{j=1}^{n} a_{1j} A_{1j} & \sum_{j=1}^{n} a_{1j} A_{2j} & \cdots & \sum_{j=1}^{n} a_{1j} A_{nj} \\ \vdots & & \vdots \\ \sum_{j=1}^{n} a_{nj} A_{1j} & \sum_{j=1}^{n} a_{nj} A_{2j} & \cdots & \sum_{j=1}^{n} a_{nj} A_{nj} \end{bmatrix} = \begin{bmatrix} |A| & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{bmatrix} \]
This yields the equation

(7.22) AC′ = |A| I


If A is non-singular (that is invertible) then there is A−1 such that

(7.23) AA−1 = A−1 A = I


Pre-multiplying (7.22) by A−1 and using (7.23),

C′ = |A| A−1 .
Since A is non-singular, we have |A| ≠ 0, and
(7.24) A−1 = C′ / |A| = adj A / |A| .

Thus (7.24) gives us a formula for computing the inverse of a non-singular matrix in terms of the
determinant and cofactors of A.
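A sketch of formula (7.24) in code: building the inverse of a non-singular 3 × 3 matrix from its cofactors. As an assumed test case it uses the coefficient matrix that appears later in Example 7.11.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minor(M, i, j):
    """Delete row i and column j of a 3x3 matrix."""
    return [[M[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

def det3(M):
    """Cofactor expansion along the first row."""
    return sum((-1) ** j * M[0][j] * det2(minor(M, 0, j)) for j in range(3))

def inverse3(M):
    d = det3(M)
    cof = [[(-1) ** (i + j) * det2(minor(M, i, j)) for j in range(3)]
           for i in range(3)]
    # adj A is the transpose of the cofactor matrix; A^{-1} = adj A / |A|
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]

A = [[5, 1, -1], [-2, 5, -1], [-1, -1, 7]]
Ainv = inverse3(A)
print(det3(A))   # 178
```

Multiplying `A` by `Ainv` recovers the identity matrix, which is the defining property (7.23).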

7.6. Cramer’s Rule

Recall that we wanted to calculate the (unique) solution of a system of n equations in n unknowns
given by

(7.25) Ax = c
where A is an n × n matrix, and c is a vector in Rn .

To obtain a unique solution, we saw that we must have A non-singular, which now translates to
the condition “|A| ̸= 0”. The unique solution to (7.25) is then
adj A
(7.26) x = A−1 c = c
|A|
Let us evaluate x1 , using (7.26). This can be done by taking the inner product of x with the first
unit vector, e1 = (1, 0, · · · , 0). Thus,
\[ x_1 = e_1 \cdot x = \frac{e_1 \cdot \operatorname{adj} A}{|A|} \, c = \frac{[A_{11} \; A_{21} \; \cdots \; A_{n1}] \, c}{|A|} = \frac{c_1 A_{11} + c_2 A_{21} + \cdots + c_n A_{n1}}{|A|} = \frac{1}{|A|} \begin{vmatrix} c_1 & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ c_n & a_{n2} & \cdots & a_{nn} \end{vmatrix} \]
This gives us an easy way to compute x1 . In general, in order to calculate xi , replace
the ith column of A by the vector c and find the determinant of this matrix. Dividing this number
by the determinant of A yields the solution xi . This rule is known as Cramer’s Rule.
Example 7.11. General Market Equilibrium with three goods

Consider a market for three goods. Demand and supply for each good are given by:
D1 = 5 − 2P1 + P2 + P3
S1 = −4 + 3P1 + 2P2
D2 = 6 + 2P1 − 3P2 + P3
S2 = 3 + 2P2
D3 = 20 + P1 + 2P2 − 4P3
S3 = 3 + P2 + 3P3
where Pi is the price of good i, i = 1, 2, 3. The equilibrium conditions are Di = Si , i = 1, 2, 3, that
is
5P1 + P2 − P3 = 9
−2P1 + 5P2 − P3 = 3
−P1 − P2 + 7P3 = 17
This system of linear equations can be solved at least in two ways.

(a) Using Cramer’s rule:
\[ A_1 = \det \begin{bmatrix} 9 & 1 & -1 \\ 3 & 5 & -1 \\ 17 & -1 & 7 \end{bmatrix} = 356, \qquad A = \det \begin{bmatrix} 5 & 1 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 7 \end{bmatrix} = 178, \]
\[ P_1^{*} = \frac{A_1}{A} = \frac{356}{178} = 2. \]
Similarly P2∗ = 2 and P3∗ = 3. The vector (P1∗ , P2∗ , P3∗ ) describes the general market equilib-
rium.

(b) Using the inverse matrix rule. Let us denote
\[ A = \begin{bmatrix} 5 & 1 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 7 \end{bmatrix}, \quad P = \begin{bmatrix} P_1 \\ P_2 \\ P_3 \end{bmatrix}, \quad B = \begin{bmatrix} 9 \\ 3 \\ 17 \end{bmatrix}. \]

The matrix form of the system is AP = B, which implies P = A−1 B.


 
\[ A^{-1} = \frac{1}{\det A} \begin{bmatrix} 34 & -6 & 4 \\ 15 & 34 & 7 \\ 7 & 4 & 27 \end{bmatrix} \]
\[ P = \frac{1}{178} \begin{bmatrix} 34 & -6 & 4 \\ 15 & 34 & 7 \\ 7 & 4 & 27 \end{bmatrix} \begin{bmatrix} 9 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} \]
Again, P1∗ = 2, P2∗ = 2, and P3∗ = 3.
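Cramer's rule for the three-good equilibrium above can be sketched in a few lines: for each unknown, replace the corresponding column of A by the right-hand side and divide determinants.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

A = [[5, 1, -1], [-2, 5, -1], [-1, -1, 7]]
b = [9, 3, 17]

def cramer(A, b):
    d = det3(A)
    sol = []
    for i in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][i] = b[r]          # replace the i-th column of A with b
        sol.append(det3(Ai) / d)
    return sol

P = cramer(A, b)
print(P)   # [2.0, 2.0, 3.0]
```

The output reproduces the equilibrium prices P1∗ = 2, P2∗ = 2, P3∗ = 3 computed above.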

7.7. Principal Minors

Let A be a square n × n matrix.

Definition 7.13. A principal minor of order k (1 ≤ k ≤ n) of A is the determinant of the k × k
submatrix that remains when (n − k) rows and columns with the same indices are deleted from A.
Example 7.12. Let
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix} \]

(a) Principal minors of order 1 are 1, 8, 9.

(b) Principal minors of order 2 are
\[ \det \begin{bmatrix} 1 & 2 \\ 0 & 8 \end{bmatrix} = 8; \quad \det \begin{bmatrix} 8 & 1 \\ 5 & 9 \end{bmatrix} = 67; \quad \det \begin{bmatrix} 1 & 3 \\ 2 & 9 \end{bmatrix} = 3. \]

(c) The principal minor of order 3 is
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix} = 23. \]

7.7.1. Leading Principal Minor.

Definition 7.14. A leading principal minor of order k (1 ≤ k ≤ n) of A is the principal minor of
order k obtained by deleting the last (n − k) rows and columns.

In the previous example, the leading principal minor of order 1 is 1. The leading principal minor
of order 2 is
\[ \det \begin{bmatrix} 1 & 2 \\ 0 & 8 \end{bmatrix} = 8 \]
and the leading principal minor of order 3 is
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix} = 23. \]
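The deletion rule in Definitions 7.13 and 7.14 can be sketched directly: a principal minor of order k keeps the rows and columns with a common index set of size k, and the leading one keeps indices 0, …, k−1.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]]) for j in range(n))

def principal_minors(A, k):
    """All principal minors of order k (the first one listed is the leading one)."""
    n = len(A)
    return [det([[A[i][j] for j in idx] for i in idx])
            for idx in combinations(range(n), k)]

A = [[1, 2, 3], [0, 8, 1], [2, 5, 9]]
print(principal_minors(A, 1))   # [1, 8, 9]
print(principal_minors(A, 2))   # [8, 3, 67]
print(principal_minors(A, 3))   # [23]
```

The values match Example 7.12: {1, 8, 9}, {8, 67, 3}, and 23, with the leading minors 1, 8, 23 appearing first in each list.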

7.8. Quadratic Form

A quadratic form consists of a square n × n matrix A pre- and post-multiplied by an n-vector. It
is a scalar:
(7.27) Q (x, A) = x′ Ax
Example 7.13. Let
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \]
Then
\[ Q (x, A) = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a x_1^2 + (b + c) x_1 x_2 + d x_2^2. \]

7.8.1. Matrix Definiteness. Let A be a symmetric n × n matrix.

(a) A is positive definite (PD) if

(7.28) Q (z, A) = z′ Az > 0, ∀ z ∈ Rn , z ≠ 0.

(b) A is negative definite (ND) if

(7.29) Q (z, A) = z′ Az < 0, ∀ z ∈ Rn , z ≠ 0.

(c) A is positive semidefinite (PSD) if

(7.30) Q (z, A) = z′ Az ≥ 0, ∀ z ∈ Rn .

(d) A is negative semidefinite (NSD) if

(7.31) Q (z, A) = z′ Az ≤ 0, ∀ z ∈ Rn .

(e) A is indefinite if none of the above conditions hold true.

7.8.2. Test for definiteness of symmetric matrices:

A is PD if and only if all leading principal minors of A are strictly positive.

A is ND if and only if, for each k, the leading principal minor of order k has sign (−1)k .

A is PSD if and only if all principal minors of A are non-negative.

A is NSD if and only if, for each k, every principal minor of order k has sign (−1)k or is 0.
Example 7.14. Let
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}. \]
Then A is
positive definite: a11 > 0, a11 a22 − a12 a21 > 0;
negative definite: a11 < 0, a11 a22 − a12 a21 > 0;
positive semi-definite: a11 ≥ 0, a22 ≥ 0, a11 a22 − a12 a21 ≥ 0;
negative semi-definite: a11 ≤ 0, a22 ≤ 0, a11 a22 − a12 a21 ≥ 0.
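A sketch of the 2 × 2 test in code, mirroring the conditions of Example 7.14 (the definite cases use only the leading minors; the semidefinite cases need the diagonal entries as well, since all principal minors must be checked).

```python
def classify_2x2(A):
    """Classify the definiteness of a symmetric 2x2 matrix via its minors."""
    a11, a12 = A[0]
    a21, a22 = A[1]
    d = a11 * a22 - a12 * a21            # leading principal minor of order 2
    if a11 > 0 and d > 0:
        return "positive definite"
    if a11 < 0 and d > 0:                # signs (-1)^1 < 0 and (-1)^2 > 0
        return "negative definite"
    if a11 >= 0 and a22 >= 0 and d >= 0:
        return "positive semidefinite"
    if a11 <= 0 and a22 <= 0 and d >= 0:
        return "negative semidefinite"
    return "indefinite"

print(classify_2x2([[2, 1], [1, 2]]))     # positive definite
print(classify_2x2([[-2, 1], [1, -2]]))   # negative definite
print(classify_2x2([[1, 2], [2, 1]]))     # indefinite (determinant is -3)
```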

Note that a negative definite matrix necessarily has full rank: indeed, if the zero vector could be
obtained as a linear combination of the columns of A with weights α1 , · · · , αn (not all zero), then
setting t = (α1 , · · · , αn ) would give At = 0 and hence t ′ At = 0, contradicting z′ Az < 0 for all z ≠ 0.
Definition 7.15. Let A be a symmetric n × n matrix. Matrix A is diagonally dominant if for each
row i, we have |ai,i | ≥ ∑ j̸=i |ai, j |, and it is strictly diagonally dominant if the latter inequality holds
strictly for each row.

Every symmetric, diagonally dominant matrix with non-positive entries along the diagonal is
negative semi-definite; and every symmetric, strictly diagonally dominant matrix with negative
entries along the diagonal is negative definite.

7.9. Eigenvalue and Eigenvectors

Given an n × n real matrix A, an eigenvalue of A is a number λ which when subtracted from each
of the diagonal entries of A converts A into a singular matrix. Subtracting a scalar λ from each
diagonal entry of A is the same as subtracting λ times the identity matrix I from A. Hence, λ is an
eigenvalue of A if and only if A − λI is a singular matrix.

This is also equivalent to asking for what non-zero vectors x ∈ Rn , and for what complex
numbers λ, is it true that
(7.32) Ax = λx
This is known as the eigenvalue problem.

If x ≠ 0 and λ satisfy equation (7.32), then λ is called an eigenvalue of A, and x is called an
eigenvector of A.

Clearly (7.32) holds if and only if


(7.33) (A − λI)x = 0
But (7.33) is a homogeneous system of n equations in n unknowns. It has a non-zero solution for x
if and only if (A − λI) is singular; that is, if and only if
(7.34) |A − λI| = 0

This equation is called the characteristic equation of A. If we look at the expression


(7.35) f (λ) ≡ |A − λI|
we note that f is a polynomial in λ; it is called the characteristic polynomial of A.
Example 7.15. Consider the 3 × 3 matrix A given by
\[ A = \begin{bmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{bmatrix} \]
Then subtracting 3 from each diagonal entry transforms A into the singular matrix
\[ \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}. \]
Therefore, 3 is an eigenvalue of matrix A.
Example 7.16. Consider the 2 × 2 matrix A given by
\[ A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \]
Then subtracting 4 from each diagonal entry transforms A into the singular matrix
\[ \begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}. \]

Therefore, 4 is an eigenvalue of matrix A. Also, subtracting 2 from each diagonal entry transforms
A into the singular matrix
\[ \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}. \]
Therefore, 2 is also an eigenvalue of matrix A.

The above example illustrates a general principle about the eigenvalues of a diagonal matrix.
Theorem 7.5. The diagonal entries of a diagonal matrix A are the eigenvalues of A.
Theorem 7.6. A square matrix A is singular if and only if 0 is an eigenvalue of A.
Example 7.17. Consider the 2 × 2 matrix A given by
\[ A = \begin{bmatrix} 4 & -4 \\ -4 & 4 \end{bmatrix} \]
Since the first row is the negative of the second row, matrix A is singular. Hence 0 is an eigenvalue of
A. Also, subtracting 8 from each diagonal entry transforms A into the singular matrix
\[ \begin{bmatrix} -4 & -4 \\ -4 & -4 \end{bmatrix}. \]
Therefore, 8 is also an eigenvalue of matrix A.
Example 7.18. Consider the 2 × 2 matrix A given by
\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]
Then equation (7.34) becomes
(7.36) \[ \begin{vmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{vmatrix} = 0 \]
So (4 − 4λ + λ2 ) − 1 = 0, which yields

(1 − λ)(3 − λ) = 0.
Thus, the eigenvalues are λ = 1 and λ = 3. In this case it was also possible to see that λ = 1 is an
eigenvalue, as subtracting 1 from the diagonal entries converts matrix A into a singular matrix.

Putting λ = 1 in (7.33), we get
\[ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

which yields
x1 + x2 = 0
Thus the general solution of the eigenvector corresponding to the eigenvalue λ = 1 is given by
(x1 , x2 ) = θ(1, −1) for θ ̸= 0
Similarly, corresponding to the eigenvalue λ = 3, we have the eigenvector given by
(x1 , x2 ) = θ(1, 1) for θ ̸= 0.
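A sketch solving Example 7.18's characteristic equation directly: for a 2 × 2 matrix, |A − λI| = λ² − (tr A)λ + |A| = 0, which the quadratic formula solves, and the eigenvector (1, −1) for λ = 1 can be verified by multiplication.

```python
import math

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(tr * tr - 4 * det)    # real for a symmetric A
    return sorted([(tr - disc) / 2, (tr + disc) / 2])

A = [[2, 1], [1, 2]]
lams = eigenvalues_2x2(A)
print(lams)                                # [1.0, 3.0]

# check the eigenvector (1, -1) for lambda = 1:  A x = 1 * x
x = [1, -1]
Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
print(Ax)                                  # [1, -1]
```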
Example 7.19. A square matrix A whose entries are non-negative and whose rows (or columns)
each add to 1 is called a Markov matrix. These matrices play a major role in economic dynamics.
Consider the 2 × 2 matrix A given by
\[ A = \begin{bmatrix} a & 1 - a \\ b & 1 - b \end{bmatrix} \]
where 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1. Then subtracting 1 from the diagonal entries leads to the matrix
\[ A - I = \begin{bmatrix} a - 1 & 1 - a \\ b & -b \end{bmatrix} \]
Notice that each row of this matrix adds to 0. But if the rows of a square matrix add to zero,
the columns are linearly dependent and the matrix is singular. This shows that 1 is an eigenvalue
of this Markov matrix, and the same argument shows that 1 is an eigenvalue of every Markov matrix.

7.10. Eigenvalues of symmetric matrix

For the case of a symmetric matrix A, we can show that all the eigenvalues of A are real.
Theorem 7.7. Let A be a symmetric n × n matrix. Then all the eigenvalues of A are real.

Proof. Suppose λ is a complex eigenvalue, with associated complex eigenvector, x. Then we have
(7.37) Ax = λx
Define x∗ to be the complex conjugate of x, and λ∗ to be the complex conjugate of λ. Then
(7.38) Ax∗ = λ∗ x∗
Pre-multiply (7.37) by (x∗ )′ and (7.38) by x′ to get
(7.39) (x∗ )′ Ax = λ(x∗ )′ x

(7.40) x′ Ax∗ = λ∗ x′ x∗
Subtracting (7.40) from (7.39)
(7.41) (x∗ )′ Ax − x′ Ax∗ = (λ − λ∗ )x′ x∗

since (x∗ )′ x = x′ x∗ . Also,


x′ Ax∗ = (x′ Ax∗ )′ = (x∗ )′ A′ x = (x∗ )′ Ax
since A′ = A (by symmetry). Thus (7.41) yields
(7.42) (λ − λ∗ )x′ x∗ = 0
Since x ̸= 0, we know that x′ x∗ is real and positive. Hence (7.42) implies that λ = λ∗ , so λ is
real. 

7.11. Eigenvalues, Trace and Determinant of a Matrix

If A is an n × n matrix, the trace of A, denoted by tr (A), is the number defined by
\[ \operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii} \]
The following properties of the trace can be verified easily [Here A, B and C are n × n matrices,
and λ ∈ R].

(a) tr (A + B) = tr (A) + tr (B)

(b) tr (λA) = λ tr (A)

(c) tr (AB) = tr (BA)

(d) tr (ABC) = tr (BCA) = tr (CAB)

Let A be an n × n matrix. The characteristic polynomial of A, defined in (7.35) above can


generally be written as
(7.43) |A − λI| = (−λ)n + bn−1 (−λ)n−1 + · · · + b1 (−λ) + b0
where b0 , ..., bn−1 are the coefficients of the polynomial which are determined by the coefficients
of the A-matrix.

On the other hand, if λ1 , ..., λn are the eigenvalues of A, then the characteristic equation (7.34)
can be written as
(7.44) 0 = (λ1 − λ)(λ2 − λ) · · · (λn − λ)

Using (7.34), (7.43), and (7.44) and “comparing coefficients” we can conclude that
bn−1 = λ1 + λ2 + ... + λn

and
b0 = λ1 λ2 ...λn

Also, by looking at the terms in the characteristic polynomial of A which would involve
(−λ)n−1 , we can conclude that
bn−1 = a11 + a22 + ... + ann
Finally, putting λ = 0 in (7.43), we get
b0 = |A|

Thus we might note two interesting relationships between the characteristic values, the trace
and the determinant of A:
\[ \operatorname{tr} A = \sum_{i=1}^{n} \lambda_i \quad \text{and} \quad |A| = \prod_{i=1}^{n} \lambda_i \]
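A quick numerical check (a sketch) of tr A = Σλi and |A| = Πλi, using the symmetric matrix of Example 7.18 whose eigenvalues were found to be 1 and 3.

```python
A = [[2, 1], [1, 2]]
eigen = [1, 3]                                 # eigenvalues from Example 7.18

trace = A[0][0] + A[1][1]                      # tr A = 4
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]    # |A| = 3

print(trace == sum(eigen))           # True: trace equals the sum of eigenvalues
print(det == eigen[0] * eigen[1])    # True: determinant equals their product
```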

7.11.1. Eigenvalues and Definiteness of Quadratic Forms.


Theorem 7.8. Let A be a symmetric matrix. Then,

(1) A is positive definite if and only if all the eigenvalues of A are positive.

(2) A is negative definite if and only if all the eigenvalues of A are negative.

(3) A is positive semidefinite if and only if all the eigenvalues of A are non-negative.

(4) A is negative semidefinite if and only if all the eigenvalues of A are non-positive.

(5) A is indefinite if and only if A has a positive eigenvalue and a negative eigenvalue.
Chapter 8

Problem Set 3

(1) Let
\[ A = \begin{bmatrix} 1 & -1 & 7 \\ 0 & 8 & 10 \end{bmatrix}, \quad B = \begin{bmatrix} 9 & 6 & 5 & 4 \\ 1 & -2 & -3 & 3 \\ 0 & 1 & -1 & 2 \end{bmatrix} \]

Compute AB. Is BA defined?


(2) Are the vectors \( \begin{pmatrix} 1 \\ 2 \end{pmatrix} \) and \( \begin{pmatrix} 1 \\ 3 \end{pmatrix} \) linearly independent?

(3) Let
\[ A = \begin{bmatrix} 1 & 6 & 2 \\ -1 & 5 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 8 & 4 \\ 0 & -2 \\ 7 & -3 \end{bmatrix}. \]

Compute AB and BA.

(4) What is the determinant of
\[ A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 1 & 2 \\ 1 & 3 & 5 & 7 \\ 2 & 1 & 4 & 1 \end{bmatrix}? \]


(5) What is the rank of
\[ A = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 1 & 7 \\ 5 & 4 & -1 \end{bmatrix}? \]

(6) Consider the system of three linear equations in three unknowns:


x+y+z = 6
x + 2y + 3z = 10
x + 2y + λz = µ.
For what values of λ and µ does the system of equations have
(a) no solution,
(b) a unique solution,
(c) infinitely many solutions?

(7) What is the definiteness of the following matrices? (Hint: Use the principal minors)
\[ A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix}, \quad C = \begin{bmatrix} -3 & 4 \\ 4 & 5 \end{bmatrix}, \quad D = \begin{bmatrix} -3 & 4 \\ 4 & -6 \end{bmatrix}. \]

(8) Consider the situation of a mass layoff (i.e. a firm goes out of business) where 2000 people
become unemployed and now begin a job search. There are two states: employed (E) and
unemployed (U) with an initial vector
x0′ = [E U] = [0 2000].
Suppose that in any given period an unemployed person will find a job with probability 0.7
and will therefore remain unemployed with a probability 0.3. Additionally, persons who find
themselves employed in any given period may lose their job with a probability of 0.1 (and will
continue to remain employed with probability 0.9).
(i) Set up the Markov transition matrix for this problem.
(ii) What will be the number of unemployed people after (a) two periods; (b) four periods;
(c) six periods; (d) ten periods.
(iii) What is the steady-state level of unemployment?

(9) (a) An n × n matrix A is called nilpotent if for some positive integer k
\[ A^k = \underbrace{A \times A \times \cdots \times A}_{k \text{ times}} = O \]
where O is the n × n null matrix. Prove that if A is nilpotent, then det A = 0.



(b) An n × n matrix A is called skew-symmetric if A′ = −A. Prove that if A is skew-symmetric
and n is odd, then A is not invertible.
(c) An n × n matrix A is called orthogonal if AA′ = I. Prove that if A is orthogonal, then
det A = ±1.
(d) Let n × n matrices A and B be such that AB = −BA. Prove that if n is an odd number, then
either A or B is not invertible.
(e) Let n × n matrices A and B be such that AB = I. Use determinants to prove that A is
invertible (and hence B = A−1 ).

(10) (a) Prove that the eigenvalues of an upper or lower triangular matrix are precisely its diagonal
entries.
(b) Suppose that A is an invertible matrix. Show that (A − λI)x = 0 implies that (A−1 − λ−1 I)x = 0.
Conclude that for an invertible matrix A, λ is an eigenvalue of A if and only if 1/λ is an
eigenvalue of A−1 .
(c) Let A be an invertible matrix and let x be an eigenvector of A. Show it is also an eigenvector
of A2 and A−2 . What are the corresponding eigenvalues?
Chapter 9

Single and Multivariable


Calculus

9.1. Functions

Recall the definition of functions discussed earlier. Now we discuss some features of functions
which are useful in optimization exercises.

9.2. Surjective and Injective Functions

Definition 9.1. A function f : D → R is called surjective (or is said to map D onto R) if f (D) = R,
i.e., if the image f (D) of the function is equal to the entire range.
Definition 9.2. A function f : D → R is called injective or one to one if
(9.1) f (x) = f (y) ⇔ x = y.

A function f : D → R is called a bijection if it is both surjective and injective.


Example 9.1. Consider the function
f : R → R : f (x) = x2 .

It is not surjective, as there exists no element in the domain which gets mapped into −1.


Let us restrict the range to R+ . The new function is

g : R → R+ : g (x) = x2 .
Now this function is surjective as each non-negative real number has a pre-image (square root) in
R. However, this function is not injective as the pre-image of 4 is both −2 and 2.

Next, let us also restrict the domain of the function to R+ . The function is

h : R+ → R+ : h (x) = x2 .
It is both surjective and injective. Hence it is bijective.
Example 9.2. Let A be a non-empty set and let S be a subset of A. We define a function χS : A →
{0, 1} by
(9.2) \[ \chi_S(a) = \begin{cases} 1, & \text{if } a \in S; \\ 0, & \text{if } a \notin S. \end{cases} \]

This function is called the characteristic function or indicator function of S. It is widely used in
probability and statistics. If S is a non-empty proper subset of A, then χS is surjective. If S = ∅ or
S = A, then χS is not surjective.
Definition 9.3. Inverse Function: Consider f : D → R. If ∃ g : R → D such that ∀x ∈ D,
(9.3) g ( f (x)) = x,
then g is called the inverse function of f and is written as f −1 : R → D. Alternatively, we can also
define the inverse function as follows. Let f : D → R be bijective. The inverse function of f is the
function f −1 : R → D such that ∀x ∈ D,
(9.4) f −1 ( f (x)) = x.
Theorem 9.1. Let f : D → R be bijective. Then f −1 : R → D is bijective.
Example 9.3. f (x) = 2x, f −1 (x) = x/2, f −1 ( f (x)) = f (x)/2 = 2x/2 = x.

Theorem 9.2. Suppose f : D → R. Let A, A1 , A2 be subsets of D and let B be a subset of R.
Then
(a) If f is injective, then f −1 [ f (A)] = A,
(b) If f is surjective, then f [ f −1 (B)] = B,
(c) If f is injective, then f (A1 ∩ A2 ) = f (A1 ) ∩ f (A2 ) .

Proof. You should try and prove (a) and (b) on your own. I will provide proof for (c) here. We
need to prove that f (A1 ∩ A2 ) ⊆ f (A1 ) ∩ f (A2 ) and f (A1 ) ∩ f (A2 ) ⊆ f (A1 ∩ A2 ).

Step 1. Show
f (A1 ∩ A2 ) ⊆ f (A1 ) ∩ f (A2 )
Let
y ∈ f (A1 ∩ A2 ) .
Then ∃x ∈ A1 ∩ A2 such that f (x) = y. Since x ∈ A1 ∩ A2 , x ∈ A1 and x ∈ A2 . But then f (x) ∈ f (A1 ) and
f (x) ∈ f (A2 ). So f (x) ∈ f (A1 ) ∩ f (A2 ). Observe that we have not used the fact that f is injective.
So this part of the result holds for any function.

Step 2. We need to show


f (A1 ) ∩ f (A2 ) ⊆ f (A1 ∩ A2 ) .
Let y ∈ f (A1 ) ∩ f (A2 ). Then y ∈ f (A1 ) and y ∈ f (A2 ). Hence there exist a point x1 ∈ A1 and a
point x2 ∈ A2 such that f (x1 ) = y and f (x2 ) = y. Or

f (x1 ) = y = f (x2 ) .

Since f is injective, we must have x1 = x2 , or x1 ∈ A1 ∩ A2 . But then y = f (x1 ) ∈ f (A1 ∩ A2 ).

Here are some more definitions related to functions.

Definition 9.4.

(a) A function f is odd if and only if for every x, − f (x) = f (−x). Example: f (x) = x.

(b) A function f is even if and only if for every x, f (x) = f (−x). Example: f (x) = x2 .

(c) A function f is periodic if and only if there exists a k > 0 such that for every x, f (x + k) = f (x).
Example: f (x) = sin x, since sin (x + 2π) = sin x.

(d) A function f is increasing if and only if for every x and every y, if x ≤ y, then f (x) ≤ f (y).
Example: f (x) = x.

(e) A function f is decreasing if and only if for every x and every y, if x ≤ y, then f (x) ≥ f (y).
Example: f (x) = −x.

9.3. Composition of Functions

Definition 9.5. Composition of Functions: If f : A → B and g : B → C are two functions, then for
any a ∈ A, f (a) ∈ B. But B is the domain of g, so the mapping g can be applied to f (a), which yields
g ( f (a)), an element in C. This establishes a correspondence between a in A and c in C. This
correspondence is called the composition function of f and g and is denoted by g ◦ f (read g of f ).
Thus we have
(9.5) (g ◦ f ) (a) = g ( f (a)) .
Remark 9.1. Composition of two functions need not be commutative,
(g ◦ f ) (a) ̸= ( f ◦ g) (a)
as the following example shows.

Let f (x) = x2 , g (x) = x + 1. Then


(g ◦ f ) (x) = x2 + 1 but
( f ◦ g) (x) = (x + 1)2 .
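Remark 9.1's example in code (a sketch): composing f(x) = x² and g(x) = x + 1 in the two orders gives different functions, so composition is not commutative.

```python
def f(x):
    return x * x          # f(x) = x^2

def g(x):
    return x + 1          # g(x) = x + 1

g_of_f = lambda x: g(f(x))    # (g o f)(x) = x^2 + 1
f_of_g = lambda x: f(g(x))    # (f o g)(x) = (x + 1)^2

print(g_of_f(2))   # 5
print(f_of_g(2))   # 9
```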
Theorem 9.3. Let f : A → B, and g : B → C.
(a) If f and g are surjective, then g ◦ f is surjective,
(b) If f and g are injective, then g ◦ f is injective,
(c) If f and g are bijective, then g ◦ f is bijective.

Proof. (a) Since g is surjective, range of g = C. That is for any element c ∈ C, there exists an
element b ∈ B such that g (b) = c. Since f is also surjective, there exists an element a ∈ A such
that f (a) = b. But then
( )
(g ◦ f ) (a) = g f (a) = g (b) = c.
So, (g ◦ f ) is surjective.
( )
(b) Since g is injective, for all b and b′ in B, if g (b) = g (b′ ) then b = b′ ; and since f is
injective, for all a and a′ in A, if f (a) = f (a′ ) then a = a′ . Then
(g ◦ f ) (a) = (g ◦ f ) (a′ )
⇒ g ( f (a)) = g ( f (a′ ))
⇒ f (a) = f (a′ )
⇒ a = a′
So, (g ◦ f ) is injective.

(c) Proof of this result follows from (a) and (b).

9.4. Continuous Functions

Definition 9.6. The real number L is the limit of the function f : D → R at the point c ∈ D if and only
if for each ε > 0, there exists a δ > 0 such that | f (x) − L| < ε whenever x ∈ D and 0 < |x − c| < δ.
Definition 9.7. A function f : D → R is continuous at x0 ∈ D, if
(9.6) ∀ε > 0, ∃δ > 0 such that d (x, x0 ) < δ ⇒ d ( f (x) , f (x0 )) < ε.
A function f : D → R is continuous if it is continuous at all x0 ∈ D.

It is easy to draw examples of functions which are not continuous. An intuitive way of under-
standing continuity of a function is that we should be able to draw its graph without lifting the
pencil from the paper. If a function has a point of discontinuity, say x0 , then as we approach x0
from the left hand side and from the right hand side, the function attains different values.

For a function to be continuous at x0 , both the LHS and RHS limits must exist and converge to
the function value:
(9.7) \[ \lim_{x \to x_0^-} f (x) = \lim_{x \to x_0^+} f (x) = f (x_0) \]

Theorem 9.4. A function f : D → R is continuous if and only if for every convergent sequence of
points {xn } in D with limit x ∈ D, the sequence f (xn ) → f (x).
Example 9.4. If
\[ \lim_{x \to x_0^-} f (x) = \lim_{x \to x_0^+} f (x) \ne f (x_0) \]
then the function is not continuous. Take
\[ y = \begin{cases} x & \text{for } 0 \le x < \frac{1}{2} \\ 0 & \text{for } x = \frac{1}{2} \\ 1 - x & \text{for } \frac{1}{2} < x \le 1 \end{cases} \]
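A numerical look (a sketch) at the discontinuity of Example 9.4 at x0 = 1/2: approaching from either side the function tends to 1/2, but the function value there is 0.

```python
def f(x):
    """The piecewise function of Example 9.4 on [0, 1]."""
    if 0 <= x < 0.5:
        return x
    if x == 0.5:
        return 0
    return 1 - x              # for 1/2 < x <= 1

left = f(0.5 - 1e-9)          # approximately 0.5 from the left
right = f(0.5 + 1e-9)         # approximately 0.5 from the right
print(left, right, f(0.5))    # both one-sided limits differ from f(1/2) = 0
```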

Definition 9.8. Given f : D → R, let A ⊆ R be any subset of the range. The inverse image of A
under f , f −1 (A), is the set of points x in the domain D such that f (x) ∈ A:
(9.8) f −1 (A) = { x ∈ D | f (x) ∈ A } .

We give two more theorems on continuity of functions.


Theorem 9.5. A function f : D → R is continuous if and only if the inverse image of every open
set is open.

Proof. Suppose f is continuous on D and V is an open set in R. We have to show that f −1 (V ) is
open in D (i.e., every point of f −1 (V ) is an interior point of f −1 (V )). Let p ∈ f −1 (V ), so that f (p) ∈ V .
Since V is open, there exists ε > 0 such that y ∈ V whenever d( f (p), y) < ε. Also, since f is continuous at p,
there exists a δ > 0 such that d( f (p), f (x)) < ε whenever d(p, x) < δ. Thus x ∈ f −1 (V ) as soon as d(p, x) < δ,
and hence f −1 (V ) is open.

Conversely, assume that f −1 (V ) is open in D for every open set V in R. Fix p ∈ D and ε > 0,
and let V be the set of all y ∈ R such that d( f (p), y) < ε. Then V is open and hence f −1 (V ) is
open, and so there exists δ > 0 such that x ∈ f −1 (V ) as soon as d(p, x) < δ. But if x ∈ f −1 (V ), then
f (x) ∈ V , and so d( f (p), f (x)) < ε. 

The next theorem (stated without proof) considers the inverse image of the closed subsets of the
range R to characterize continuous functions.
Theorem 9.6. A function f : D → R is continuous if and only if the inverse image of every closed
set is closed.

This follows from Theorem 9.5, since a set is closed if and only if its complement is open, and
since f −1 (V c ) = [ f −1 (V )]c for every V ⊂ R.

9.4.1. Properties of Continuous Functions.


Claim 9.1. If f and g are continuous functions, then
f ± g, f · g, f /g (if g ≠ 0), max { f , g} , and min { f , g}
are continuous.

Claim 9.2. If f is a continuous function of two variables f (x1 , x2 ), then the functions of one
variable obtained by holding the other variable constant f (·, x̄2 ) and f (x̄1 , ·) are also continuous.
Theorem 9.7. Intermediate Value Theorem for continuous functions: Let f be a continuous func-
tion on a domain containing [a, b], with say f (a) < f (b). Then for any y in between, f (a) < y <
f (b), there exists c in (a, b) with f (c) = y.

[Figure 9.1: Intermediate Value Theorem — the graph of y = f (x) on [a, b] crossing the horizontal line y = u at some c between a and b.]

We can apply the Intermediate Value Theorem to prove the existence of a fixed point for the fol-
lowing class of functions.

Theorem 9.8. Consider a continuous function f : [0, 1] → [0, 1]. Then there exists c ∈ [0, 1] such
that f (c) = c.

Proof. Define a function g(x) = f (x) − x. It is continuous since it is the sum of two continuous func-
tions, f (x) and −x. If f (0) = 0, then x = 0 is a fixed point. If not, then f (0) > 0, or g(0) > 0.

If f (1) = 1, then x = 1 is a fixed point. If not, then f (1) < 1, or g(1) < 0.

Now we apply the Intermediate Value Theorem to claim that there exists a point c ∈ [0, 1] such
that g(c) = 0. This implies g(c) = f (c) − c = 0 or f (c) = c or c is a fixed point. 
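The fixed-point argument above, made computational (a sketch): bisection on g(x) = f(x) − x locates the point whose existence the Intermediate Value Theorem guarantees. The test map f(x) = (x + 1)/3 is an illustrative choice (it maps [0, 1] into itself and has fixed point 1/2).

```python
def fixed_point(f, lo=0.0, hi=1.0, tol=1e-10):
    """Find c in [lo, hi] with f(c) = c, assuming g(lo) >= 0 >= g(hi)."""
    g = lambda x: f(x) - x
    if g(lo) == 0:
        return lo
    if g(hi) == 0:
        return hi
    # g(lo) > 0 and g(hi) < 0, exactly as in the proof above
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid          # the sign change (and the root) lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

c = fixed_point(lambda x: (x + 1) / 3)   # fixed point of f(x) = (x+1)/3
print(c)                                 # approximately 0.5
```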

9.5. Extreme Values

Definition 9.9. The function f : D → R attains a local maximum at x0 if there exists a neighborhood
of x0 such that f (x) ≤ f (x0 ) for all x in the neighborhood.
Definition 9.10. The function f : D → R attains a strict local maximum at x0 if there exists a
neighborhood of x0 such that f (x) < f (x0 ) for all x not equal to x0 in the neighborhood.

Local minima are defined by reversing the inequalities.


Definition 9.11. The function f : D → R attains a global maximum at x0 if f (x) ≤ f (x0 ) , ∀x ∈ D.
Definition 9.12. The function f : D → R attains a strict global maximum at x0 if f (x) < f (x0 ),
∀x ∈ D\{x0 }.

Global minima are defined by reversing the inequalities.


Remark 9.2. A global maximum (minimum) is also a local maximum (minimum).
Theorem 9.9. Weierstrass Theorem: Suppose D is a non-empty closed and bounded subset of
Rn . If f : D → R is continuous on D, then there exist \( x^{*} \) and \( x_{*} \) in D such that
(9.9) \( f (x^{*}) \ge f (x) \ge f (x_{*}) \), ∀ x ∈ D.

Proof. We first claim that the function f is bounded on the domain D. If not, then there exists a
sequence {xn }∞n=1 in D such that f (xn ) → ∞ as n → ∞. Since D is compact, there exists a subse-
quence {yn }∞n=1 of the sequence {xn }∞n=1 which converges to ȳ in D. Since {yn }∞n=1 is a subsequence
of the sequence {xn }∞n=1 and f (xn ) → ∞, it must be true that f (yn ) → ∞. However, since {yn }∞n=1 converges
to ȳ and f is a continuous function, f (yn ) must converge to the finite real number f (ȳ). These two
observations lead to a contradiction. Thus we have proved the claim.

To prove the theorem, we again assume that f does not attain its maximum value in D. Since f
is bounded on D, let M be the least upper bound of the values f takes in D. Clearly M is finite. Also,
there exists a sequence {zn }∞n=1 in D such that f (zn ) → M. Note that even though f (zn ) approaches
the least upper bound M as n → ∞, the sequence {zn }∞n=1 itself need not converge. Since
D is compact, there exists a subsequence {un }∞n=1 of the sequence {zn }∞n=1 which converges to ū in
D. Since f is a continuous function, f (un ) must converge to the finite real number f (ū). Since a
convergent sequence has only one limit, f (ū) = M and ū is the point of global maximum of f in
D. 

This is the theorem we will be using to show the existence of optimal bundles for consumers
and producers. So we need to understand it and be comfortable with using it.

The following examples show why the function domain must be closed and bounded in order
for the theorem to apply. In each of the following examples, the function fails to attain a maximum
on the given interval.

(a) f (x) = x defined over [0, ∞) (domain being unbounded) is not bounded from above.

(b) f (x) = x/(1 + x) defined over [0, ∞) (domain being unbounded) is bounded but does not attain its
least upper bound, i. e., 1.

(c) f (x) = 1/x defined over (0, 1] (domain is bounded but not closed) is not bounded from above.

(d) f (x) = 1 − x defined over (0, 1] (domain is bounded but not closed) is bounded but never attains
its least upper bound, i. e., 1.

(e) Extending the last two examples to [0, 1] by defining f (0) = 0 makes the domain closed and bounded,
yet each function still fails to attain its least upper bound. This shows that the theorem also requires
continuity of f on [a, b].

9.6. An application of Extreme Values Theorem

Result 9.1. Equivalence of norms in finite dimensional vector space.

If we are given two norms ∥·∥a and ∥·∥b on some finite-dimensional vector space V over R,
a very useful fact is that they are always within a constant factor of one another. In other words,
there exists a pair of real numbers 0 < C1 ≤ C2 such that, for all x ∈ V , the following inequality
holds:
C1 ∥x∥b ≤ ∥x∥a ≤ C2 ∥x∥b .
Note that any finite-dimensional vector space, by definition, is spanned by a basis e1 , e2 , · · · , en
where n is the dimension of the vector space. The basis is often chosen to be orthonormal if we
have an inner product. That is, any vector x can be written
x = ∑_{i=1}^n αi ei ,

where the αi are some real numbers depending on x.

Now, we can prove equivalence of norms in four steps, the last of which requires application
of the Extreme Value Theorem.

Step 1 It is sufficient to consider ∥·∥b = ∥·∥1 (the transitivity property for norms holds).

First, let us define the taxi-cab norm by

∥x∥1 = ∑_{i=1}^n |αi| .

We have seen earlier in a problem set that this is indeed a norm. The linear independence of
any basis {ei} implies that x ≠ 0 ⟺ |αj| > 0 for some j ⟺ ∥x∥1 > 0. The triangle
inequality and the scaling property follow from the corresponding properties of the absolute
value on R.
We will show that it is sufficient for us to prove that ∥·∥a is equivalent to ∥·∥1 , because
norm equivalence is transitive: if two norms are equivalent to ∥·∥1 , then they are equivalent
to each other.
In particular, suppose both ∥·∥a and ∥·∥a′ are equivalent to ∥·∥1 with constants 0 < C1 ≤ C2
and 0 < C1′ ≤ C2′ , respectively:
C1 ∥x∥1 ≤ ∥x∥a ≤ C2 ∥x∥1 ,
C1′ ∥x∥1 ≤ ∥x∥a′ ≤ C2′ ∥x∥1 .
Multiply the ∥x∥a inequalities by C1′/C2 to get
(C1′ C1/C2) ∥x∥1 ≤ (C1′/C2) ∥x∥a ≤ C1′ ∥x∥1 ,
and multiply the ∥x∥a inequalities by C2′/C1 to get
C2′ ∥x∥1 ≤ (C2′/C1) ∥x∥a ≤ (C2′ C2/C1) ∥x∥1 ,
and combine with the ∥x∥a′ inequalities to get
(C1′/C2) ∥x∥a ≤ C1′ ∥x∥1 ≤ ∥x∥a′ ≤ C2′ ∥x∥1 ≤ (C2′/C1) ∥x∥a .
Then it immediately follows that
(C1′/C2) ∥x∥a ≤ ∥x∥a′ ≤ (C2′/C1) ∥x∥a ,
and hence ∥·∥a and ∥·∥a′ are equivalent.

Step 2 It is sufficient to consider only x with ∥x∥1 = 1.


We want to show that
C1 ∥x∥1 ≤ ∥x∥a ≤ C2 ∥x∥1
is true for all x ∈ V for some C1 , C2 . It is trivially true for x = 0, so we need only consider
x ≠ 0, in which case we can divide by ∥x∥1 to obtain the condition
C1 ≤ ∥x/∥x∥1∥a ≤ C2 ,
where u ≡ x/∥x∥1 has norm ∥u∥1 = 1.

Step 3 Any norm ∥x∥a is continuous under ∥x∥1 .


We wish to show that any norm ∥·∥a is a continuous function on V under the topology
induced by the norm ∥·∥1 . That is, we wish to show that for any ε > 0, there exists a δ > 0
such that
∥x − x′∥1 < δ ⇒ |∥x∥a − ∥x′∥a| < ε .
We prove this in two steps. First, by the triangle inequality on ∥·∥a , it follows that

∥x∥a − ∥x′ ∥a = ∥x′ + (x − x′ )∥a − ∥x′ ∥a ≤ ∥x − x′ ∥a ,

and
∥x′ ∥a − ∥x∥a = ∥x − (x − x′ )∥a − ∥x∥a ≤ ∥x − x′ ∥a ,
and therefore,
|∥x∥a − ∥x′∥a| ≤ ∥x − x′∥a .
Second, applying the triangle inequality again, and writing x = ∑ni=1 αi ei and x′ = ∑ni=1 α′i ei ,
we obtain
∥x − x′∥a ≤ ∑_{i=1}^n |αi − α′i| ∥ei∥a ≤ ∥x − x′∥1 (max_i ∥ei∥a) .

Therefore, if we choose
δ = ε / (max_i ∥ei∥a) ,
it immediately follows that
∥x − x′∥1 < δ ⇒ |∥x∥a − ∥x′∥a| < ε .

Step 4 The maximum and minimum of ∥x∥a on the unit sphere.

Now we have a continuous function (the norm ∥·∥a) on a compact (closed and bounded)
non-empty domain, the unit sphere, and can apply the Weierstrass Theorem. By the extreme
value theorem, the function must achieve a maximum and minimum value on the set (it
cannot merely approach them). Let

C1 = min_{∥u∥1 = 1} ∥u∥a , and C2 = max_{∥u∥1 = 1} ∥u∥a .

Since u ≠ 0 for ∥u∥1 = 1, it follows that C2 ≥ C1 > 0, and C1 ≤ ∥u∥a ≤ C2 as required in
Step 2. This completes the proof.
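For a concrete instance of Step 4, the constants can be estimated numerically. The Python sketch below (an illustrative aside, not part of the proof) samples the unit sphere {u : ∥u∥1 = 1} in R² and takes ∥·∥a to be the max norm; the exact constants are C1 = 1/2 and C2 = 1.

```python
# Estimate C1 = min ||u||_inf and C2 = max ||u||_inf over {u : ||u||_1 = 1}
# in R^2 by sampling the sphere; the exact values are 1/2 and 1.
def norm_inf(u):
    return max(abs(c) for c in u)

samples = []
n = 2000
for k in range(n + 1):
    t = -1 + 2 * k / n                        # first coordinate in [-1, 1]
    for s in (1, -1):
        samples.append((t, s * (1 - abs(t))))  # ||u||_1 = |t| + (1-|t|) = 1

vals = [norm_inf(u) for u in samples]
C1, C2 = min(vals), max(vals)
print(C1, C2)   # 0.5 1.0
```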

9.7. Differentiability

Definition 9.13. A function f : R → R is differentiable at x0 ∈ R, if

(9.10) lim_{h→0} [f (x0 + h) − f (x0)]/h exists.

If this limit exists, we call it the derivative of f at x0 and denote it by f ′(x0) or (d f (x)/dx)|_{x=x0}.

We follow the steps listed below to determine whether a derivative exists and, if yes, its value.

(a) ∆f = f (x0 + h) − f (x0) is the change in functional value.

(b) The slope of the secant is ∆f/h = [f (x0 + h) − f (x0)]/h.

(c) If the secant slope ∆f/h has a limit as h → 0, then f is differentiable at x0 , and the derivative is
equal to this limit.

We can see that the derivative is equal to the slope of the tangent to the graph at x0 . Note that
the tangent can be used to approximate the function in a neighborhood of x0 :
f (x0 + h) ≈ f (x0) + h · f ′(x0) .
It is the best linear approximation.
Definition 9.14. A function f : R → R is differentiable on a set S ⊆ R, if it is differentiable at each
point x ∈ S. It is called differentiable if it is differentiable at each point of the domain.
Example 9.5. Let f : R → R be f (x) = x². This function is differentiable at all x ∈ R.

αsec = [f (x0 + h) − f (x0)]/h = [(x0 + h)² − x0²]/h = [2x0 h + h²]/h = 2x0 + h ,

lim_{h→0} αsec = 2x0 ⇒ f ′(x0) = 2x0 .
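The secant slopes in Example 9.5 can also be tabulated directly; this Python fragment (an illustrative aside) shows them approaching 2x0:

```python
# Secant slopes (f(x0+h) - f(x0))/h for f(x) = x^2; they equal 2*x0 + h,
# so they converge to the derivative f'(x0) = 2*x0 as h -> 0.
def secant(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x * x
x0 = 3.0
for h in [1.0, 0.1, 0.001, 1e-6]:
    print(h, secant(f, x0, h))     # slopes shrink toward 6.0 = 2*x0
```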

Definition 9.15. Second derivative: Let function f : R → R be differentiable with f ′ (·) denoting
its first derivative. If f ′ (·) is differentiable, its derivative is denoted by f ′′ (·) and is called the
second derivative of f .
Definition 9.16. A function whose derivative exists and is continuous is called continuously dif-
ferentiable or of class C 1 . A function whose second derivative exists and is continuous is called
twice continuously differentiable or of class C 2 .

Result 9.2. If function f : R → R is differentiable at x0 then it is continuous at x0 .

Proof. Since f : R → R is differentiable at x0 ∈ R, the limit
lim_{h→0} [f (x0 + h) − f (x0)]/h
exists and equals f ′(x0). Consider
lim_{x→x0} [f (x) − f (x0)] = lim_{x→x0} { [f (x) − f (x0)]/(x − x0) } · (x − x0)
= lim_{x→x0} (x − x0) · lim_{x→x0} [f (x) − f (x0)]/(x − x0)
= 0 · f ′(x0) = 0 ,
so that lim_{x→x0} f (x) = f (x0).
Hence f is continuous at x0 . 

Note this claim does not hold in the other direction: not all continuous functions are differentiable.
Consider the example of the absolute value function f : R → R defined by
f (x) = |x| ,
where
|x| = x if x ≥ 0, and |x| = −x if x < 0.
It is easy to check that f is continuous on R. However, it is not differentiable at x0 = 0 (Please
verify).
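The failure at x0 = 0 is visible in the secant slopes; a small Python check (an illustrative aside):

```python
# For f(x) = |x| the secant slope at 0 is +1 for h > 0 and -1 for h < 0,
# so the limit defining f'(0) does not exist.
f = abs
right = (f(1e-8) - f(0)) / 1e-8      # h -> 0 from the right
left = (f(-1e-8) - f(0)) / (-1e-8)   # h -> 0 from the left
print(right, left)                   # 1.0 -1.0
assert right != left                 # the one-sided limits disagree
```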

9.7.1. Rules of Differentiation.


Theorem 9.10. If f and g are differentiable functions then

(9.11) f ± g is differentiable with (f ± g)′(x) = f ′(x) ± g′(x) ,

(9.12) f · g is differentiable with (f · g)′(x) = f ′(x) g(x) + f (x) g′(x) ,

(9.13) if g(x) ≠ 0, then f/g is differentiable with (f/g)′(x) = [f ′(x) g(x) − f (x) g′(x)] / [g(x)]² .

Figure 9.2. Continuity does not imply differentiability: f (x) = |x| is not differentiable at x = 0.

Theorem 9.11. Chain Rule: If f and g are differentiable, then

(9.14) f ◦ g is differentiable with (f ◦ g)′(x) = f ′(g(x)) · g′(x) .

Example 9.6. Let f (y) = ln y and g(x) = x². Then (f ◦ g)(x) = ln x² and
(f ◦ g)′(x) = (1/x²) · 2x = 2/x .
Theorem 9.12. If f is differentiable and has a local maximum or minimum at x0 , then f ′(x0) = 0.

Note the converse is not true. Take f (x) = x³ (see Figure 9.3). The first derivative is zero at
x0 = 0, which is a point of inflection.

The following two examples illustrate differentiability and continuous differentiability of a function.

Example 9.7. Let f be defined by
f (x) = x sin(1/x) for x ≠ 0, and f (0) = 0.

Figure 9.3. Graph of x³.

We know that the derivative of sin(x) is cos(x). Using this,

f ′(x) = sin(1/x) + x cos(1/x) · (−1/x²) = sin(1/x) − (1/x) cos(1/x) for x ≠ 0.

At x = 0, this does not work, as 1/x is not defined there. We use the definition: for h ≠ 0, the secant slope is

[f (h) − f (0)]/h = [h sin(1/h) − 0]/h = sin(1/h) .

As h → 0, sin(1/h) does not tend to any limit, so f ′(0) does not exist.

Figure 9.4. Graph of f (x) = sin(1/x).

Example 9.8. Let f be defined by
f (x) = x² sin(1/x) for x ≠ 0, and f (0) = 0.

We know that the derivative of sin(x) is cos(x). Using this,

f ′(x) = 2x sin(1/x) + x² cos(1/x) · (−1/x²) = 2x sin(1/x) − cos(1/x) for x ≠ 0.

At x = 0, we use the definition as before: for h ≠ 0, the secant slope is

[f (h) − f (0)]/h = [h² sin(1/h) − 0]/h = h sin(1/h) ,

and
|[f (h) − f (0)]/h| = |h sin(1/h)| ≤ |h| .
As h → 0, we see that f ′(0) = 0. Thus f (x) is differentiable everywhere, but f ′(x) is not continuous
at 0, as cos(1/x) does not tend to a limit as x → 0.
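A numerical look at Example 9.8 (Python; an illustrative aside): the secant slope at 0 is squeezed to 0, while f ′ keeps oscillating near 0:

```python
import math

# f(x) = x^2 sin(1/x), f(0) = 0. The secant at 0 is h*sin(1/h), squeezed
# between -|h| and |h|, so f'(0) = 0; but f'(x) = 2x sin(1/x) - cos(1/x)
# keeps oscillating (roughly between -1 and 1) as x -> 0.
def fprime(x):
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

for h in [1e-2, 1e-4, 1e-6]:
    assert abs(h * math.sin(1 / h)) <= abs(h)   # the secant tends to 0

vals = [fprime(x) for x in (1e-3, 1.1e-3, 1.2e-3, 1.3e-3)]
print(min(vals), max(vals))    # widely spread values: f' has no limit at 0
```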

9.7.2. L'Hospital's Rule. Sometimes we need to determine the limit of a ratio of functions where both the
numerator and the denominator go to zero. We use L'Hospital's rule in such cases. If f (a) = g (a) = 0
and g′ (a) ≠ 0, then
lim_{x→a} f (x)/g (x) = f ′(a)/g′(a) .

Example 9.9. Find lim_{x→4} (x² − 16)/(4√x − 8).

f (x) = x² − 16, g (x) = 4√x − 8 ,
f (4) = g (4) = 0, f ′(x) = 2x, g′(x) = 2/√x .

Then
lim_{x→4} (x² − 16)/(4√x − 8) = f ′(4)/g′(4) = 8/1 = 8.
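We can confirm Example 9.9 numerically (Python; an illustrative aside):

```python
import math

# The ratio (x^2 - 16)/(4*sqrt(x) - 8) from Example 9.9 approaches
# f'(4)/g'(4) = 8/1 = 8 as x -> 4.
def ratio(x):
    return (x * x - 16) / (4 * math.sqrt(x) - 8)

for x in [4.1, 4.001, 4.000001]:
    print(x, ratio(x))             # values tend to 8
```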

9.8. Mean Value Theorem

Theorem 9.13. Mean Value Theorem: Let f be a continuous function on the compact interval [a, b]
and differentiable on (a, b). Then there exists a point c ∈ (a, b) where
f ′(c) = [f (b) − f (a)]/(b − a) .

The following claim is helpful in proving the Mean Value Theorem. The proof of the claim relies on
the Weierstrass Theorem and is thus another example of an application of the Weierstrass Theorem.
Claim 9.3. Let f (·) and g(·) be continuous functions on [a, b] and differentiable on (a, b). Then
there exists x ∈ (a, b) such that
[ f (b) − f (a)]g′ (x) = [g(b) − g(a)] f ′ (x).

Proof. Define
h(s) = [ f (b) − f (a)]g(s) − [g(b) − g(a)] f (s).
Then it is easy to check that h(a) = f (b)g(a) − f (a)g(b) = h(b). We need to show that h′(x) = 0 for
some x ∈ (a, b). If h(x) is a constant function, then h′(x) = 0 at every point of (a, b). If not, then
consider, without loss of generality, h(x) > h(a) for some x ∈ (a, b). Since h(·) is a continuous
function defined on the compact domain [a, b], the Weierstrass Theorem can be applied to claim that it
attains a maximum; since h(a) = h(b), that maximum is not attained at an endpoint, so it is attained at
some s ∈ (a, b). Also, since h(·) is differentiable on (a, b) and attains its maximum at s ∈ (a, b),
h′(s) = 0. The case where h(x) < h(a) for some x ∈ (a, b) can be proved in a similar manner, as
then the function h(·) attains a minimum at some interior point. □

To prove the Mean Value Theorem, we apply the claim with g(x) = x. Then g′(x) = 1 leads to
[ f (b) − f (a)] · 1 = [b − a] f ′(x), or f ′(x) = [f (b) − f (a)]/(b − a) ,
for some x ∈ (a, b).
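As a concrete illustration (Python; an aside, using a sample function not taken from the notes): for f (x) = x³ on [0, 2] the theorem guarantees a point c with f ′(c) = [f (2) − f (0)]/2 = 4, and here c can be solved for explicitly:

```python
# Mean Value Theorem for f(x) = x^3 on [0, 2]: some c in (0, 2) satisfies
# f'(c) = (f(2) - f(0))/(2 - 0) = 4, i.e. 3*c^2 = 4.
f = lambda x: x ** 3
fprime = lambda x: 3 * x * x

a, b = 0.0, 2.0
target = (f(b) - f(a)) / (b - a)     # 4.0
c = (target / 3) ** 0.5              # solve 3*c^2 = 4 on (0, 2)
print(c)                             # about 1.1547
assert a < c < b
assert abs(fprime(c) - target) < 1e-12
```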

Figure 9.5. Mean Value Theorem: f ′(c) = [f (b) − f (a)]/(b − a).

9.9. Monotone Functions

Definition 9.17. Let f : [a, b] → R. We say that the function f is

(a) strictly increasing on (a, b) if a < x < y < b implies that f (x) < f (y).

(b) increasing or non-decreasing on (a, b) if a < x < y < b implies that f (x) ≤ f (y).

(c) strictly decreasing on (a, b) if a < x < y < b implies that f (x) > f (y).

(d) decreasing or non-increasing on (a, b) if a < x < y < b implies that f (x) ≥ f (y).

(e) monotone on (a, b) if it is increasing on (a, b) or decreasing on (a, b).



(f) Sometimes we also say f is increasing at c if there exists some δ > 0 such that c − δ < x < c <
y < c + δ implies that
f (x) ≤ f (c) ≤ f (y) .

(g) f is decreasing at c if there exists some δ > 0 such that c − δ < x < c < y < c + δ implies that
f (x) ≥ f (c) ≥ f (y) .

Some properties of the derivative of monotone functions are as follows.

Result 9.3. Suppose f : [a, b] → R is continuous on [a, b] and differentiable on (a, b).

(a) If f ′(x) ≥ 0 for all x ∈ (a, b), then f is non-decreasing on [a, b].

(b) If f ′(x) > 0 for all x ∈ (a, b), then f is strictly increasing on [a, b].

(c) Similarly, if f ′(x) ≤ 0 for all x ∈ (a, b), then f is non-increasing on [a, b].

(d) If f ′(x) < 0 for all x ∈ (a, b), then f is strictly decreasing on [a, b].

(e) If f ′(x) = 0 for all x ∈ (a, b), then f is constant on [a, b].

Let f : (a, b) → R be differentiable at x0 ∈ (a, b). Then

(9.15) f ′(x0) > 0 (resp. < 0) ⇒ f is strictly increasing (resp. strictly decreasing) at x0 .

(9.16) f ′(x0) ≥ 0 (resp. ≤ 0) ⇔ f is monotone increasing (resp. monotone decreasing) at x0 .

Theorem 9.14. [Darboux’s Theorem] Intermediate Value Theorem for derivative: If f is differ-
entiable on (a, b) then its derivative has the intermediate value property. If x1 < x2 are any two
points in the interval (a, b), and y lies between f ′ (x1 ) and f ′ (x2 ), then there exists a number x in
the interval [x1 , x2 ] such that f ′ (x) = y.

Proof. Using Weierstrass Theorem

Assume y lies strictly between f ′ (x1 ), and f ′ (x2 ). Define a function g : (a, b) → R by
g(t) = f (t) − yt.

Then g′ (x1 ) = f ′ (x1 ) − y and g′ (x2 ) = f ′ (x2 ) − y. Then either (i) g′ (x1 ) > 0 and g′ (x2 ) < 0 or (ii)
g′ (x1 ) < 0 and g′ (x2 ) > 0. Take the first case, i.e. g′ (x1 ) > 0 and g′ (x2 ) < 0. It is clear that neither
x1 nor x2 can be a point where g attains even a local maximum. Since g is a continuous function, it
must therefore attain its maximum at an interior point x of the closed and bounded interval [x1 , x2 ]
by Weierstrass Theorem. So we conclude that
0 = g′ (x) = f ′ (x) − y, or f ′ (x) = y.

Alternate proof using the Mean Value Theorem

We can clearly assume that y lies strictly between f ′(x1) and f ′(x2). Define continuous functions
fx1 , fx2 : [x1 , x2] → R by

fx1(t) = f ′(x1) for t = x1 , and fx1(t) = [f (x1) − f (t)]/(x1 − t) for t ≠ x1 ,

and

fx2(t) = f ′(x2) for t = x2 , and fx2(t) = [f (t) − f (x2)]/(t − x2) for t ≠ x2 .
Observe that fx1(x1) = f ′(x1), fx2(x2) = f ′(x2) and fx1(x2) = fx2(x1). Hence, y lies between fx1(x1)
and fx1(x2), or y lies between fx2(x1) and fx2(x2). If y lies between fx1(x1) and fx1(x2), then (by
continuity of fx1) there exists s in (x1 , x2] with
y = fx1(s) = [f (s) − f (x1)]/(s − x1) .
Then by the Mean Value Theorem there exists x ∈ [x1 , s] such that
y = [f (s) − f (x1)]/(s − x1) = f ′(x).
Similarly, if y lies between fx2(x1) and fx2(x2), then (by continuity of fx2) there exists s in [x1 , x2)
and x ∈ [s, x2] such that
y = [f (x2) − f (s)]/(x2 − s) = f ′(x).


9.10. Functions of Several Variables

Let f : D → R where D ⊆ Rn be a function of n variables.


(9.17) f (x) = f (x1 , x2 , · · · xn )
Examples of such functions are utility functions for several goods, the production functions for
many inputs etc.

Definition 9.18. The function f (x) is differentiable at the point x if there exists an n-dimensional
vector D f (x), called the differential or total derivative of f at x, such that
∀ε > 0, ∃δ > 0 such that ∥x − y∥ < δ
⇒ |f (x) − f (y) − D f (x) · (x − y)| < ε · ∥x − y∥ .

9.10.1. Partial Derivative. To us, the more important concept is that of the partial derivative, which
we define now.
Definition 9.19. Let f : D → R, where D ⊆ Rn , be a function of n variables. If the limit
lim_{h→0} [f (x1 , · · · , xi + h, · · · , xn) − f (x1 , · · · , xi , · · · , xn)]/h
exists, it is called the ith (first order) partial derivative of f at x and is denoted by ∂f (x)/∂xi or fi (x).

The function f (x) is then said to be partially differentiable with respect to xi . The function
f (x) is said to be partially differentiable if it is partially differentiable with respect to every xi .

Note that ∂f (x)/∂xi is the derivative of f (x1 , · · · , xn) with respect to xi holding all other variables
constant. When all the partial derivatives exist, the vector of partial derivatives
∇f (x) = [∂f (x)/∂x1 , · · · , ∂f (x)/∂xn]
is called the Jacobian vector or the gradient vector. For functions of one variable, ∇f (x) = f ′(x).
Result 9.4. If a function is differentiable at x0 then it is partially differentiable at x0 .

However, existence of all the partial derivatives does not guarantee even the continuity of the
function, as the following example shows.
Example 9.10. Let f (x, y) be defined as
f (x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f (x, y) = 0 otherwise.
We can prove that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although
f is not continuous at (0, 0).

If f is a real valued function defined on an open set D in Rn , and the partial derivatives are
bounded in D, then f is continuous on D.
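Example 9.10 can be probed numerically (Python; an illustrative aside): along the diagonal y = x the function is constant at 1/2, while both partials at the origin are 0:

```python
# f(x,y) = xy/(x^2+y^2) with f(0,0) = 0: discontinuous at the origin even
# though both partial derivatives exist there.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x * x + y * y)

for t in [1e-1, 1e-4, 1e-8]:
    assert f(t, t) == 0.5      # constant 1/2 along y = x, so f has no limit 0
    assert f(t, 0.0) == 0.0    # secants defining D1 f(0,0) and D2 f(0,0) vanish
print("partials exist at (0,0), but f is not continuous there")
```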
Example 9.11. Let f : R² → R be
f (x1 , x2) = x1³ + 2x1 x2 + 3x2³ .
Then
∂f (x)/∂x1 = 3x1² + 2x2 , ∂f (x)/∂x2 = 2x1 + 9x2² ,
and
∇f (x) = [3x1² + 2x2 , 2x1 + 9x2²] , ∀x ∈ R².

For functions of one variable we have seen earlier that we could approximate the function
around a point by the tangent to the function at the point. We can do something similar in case of
functions of several variables. Instead of approximation by a line (the tangent), we now approxi-
mate by the tangent hyperplane.
Definition 9.20. Given f : D → R with gradient ∇f (x0) at x0 , the tangent hyperplane to f at x0 is
the graph of the affine function
x ↦ f (x0) + ∇f (x0) · (x − x0) .

Note that in an n dimensional world, the tangent hyperplane is an (n − 1) dimensional object.

9.10.2. Second Order Partial Derivatives. Let us look at the example above again. For
f (x1 , x2) = x1³ + 2x1 x2 + 3x2³ ,
∂f (x)/∂x1 = 3x1² + 2x2 and ∂f (x)/∂x2 = 2x1 + 9x2² are differentiable functions of x1 and x2 themselves. When
we take partial derivatives of these functions we get the second partial derivatives:
∂²f (x)/∂x1² = 6x1 , ∂²f (x)/∂x2² = 18x2 , ∂²f (x)/∂x1 ∂x2 = ∂²f (x)/∂x2 ∂x1 = 2 .
This example can be generalized.
Definition 9.21. Let f : Rn → R be twice differentiable. For each of the n partial derivatives, we
get n partial derivatives of second order,
∂/∂xj (∂f (x)/∂xi) = ∂²f (x)/∂xj ∂xi = fij (x) .
We organize the second order derivatives in a matrix, called the Hessian Matrix:

(9.18) H f (x) = D² f (x) =
[ ∂²f (x)/∂x1²     · · ·   ∂²f (x)/∂xn ∂x1 ]
[ ...              · · ·   ...              ]
[ ∂²f (x)/∂x1 ∂xn  · · ·   ∂²f (x)/∂xn²    ]

If all the partial derivatives of the first order exist and are continuous then f is called C 1 or contin-
uously differentiable. If all the partial derivatives of second order exist and are continuous then f
is called C 2 or twice continuously differentiable and so forth.
Theorem 9.15. Young's Theorem: If f is twice continuously differentiable then
∂²f (x)/∂xj ∂xi = ∂²f (x)/∂xi ∂xj ,
i.e., the Hessian of f is a symmetric matrix.
Example 9.12. For the example above,

H f (x) =
[ 6x1   2    ]
[ 2     18x2 ]

The off-diagonal elements of the Hessian are also called cross-partials. For functions of one
variable, H f (x) = f ′′(x).
Example 9.13. Let f : R³ → R be
f (x) = 5x1² + x1 x2³ − x2² x3² + x3³ .
Then
∇f (x) = [10x1 + x2³ , 3x1 x2² − 2x2 x3² , −2x2² x3 + 3x3²]
and

H f (x) =
[ 10      3x2²            0           ]
[ 3x2²    6x1 x2 − 2x3²   −4x2 x3     ]
[ 0       −4x2 x3         −2x2² + 6x3 ]
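Young's Theorem can be spot-checked by finite differences (Python; an illustrative aside, using the function of Example 9.13 at a sample point chosen here for illustration):

```python
# Central finite-difference estimate of the cross partials of
# f(x) = 5*x1^2 + x1*x2^3 - x2^2*x3^2 + x3^3; analytically f_12 = f_21 = 3*x2^2.
def f(x1, x2, x3):
    return 5 * x1 ** 2 + x1 * x2 ** 3 - x2 ** 2 * x3 ** 2 + x3 ** 3

def cross(i, j, p, h=1e-4):
    def shift(q, k, d):                 # move coordinate k of point q by d
        r = list(q)
        r[k] += d
        return tuple(r)
    pp = f(*shift(shift(p, i, h), j, h))
    pm = f(*shift(shift(p, i, h), j, -h))
    mp = f(*shift(shift(p, i, -h), j, h))
    mm = f(*shift(shift(p, i, -h), j, -h))
    return (pp - pm - mp + mm) / (4 * h * h)

p = (1.0, 2.0, 3.0)
f12, f21 = cross(0, 1, p), cross(1, 0, p)
print(f12, f21)        # both close to 3*x2^2 = 12
```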

We now provide three very useful theorems on continuous and differentiable functions on
convex sets in Rn for n ≥ 1. They are the Intermediate Value theorem, the Mean Value theorem
and Taylor’s theorem.
Theorem 9.16 (Intermediate Value Theorem:). Suppose A is a convex subset of Rn , and f : A → R
is a continuous function on A. Suppose x1 and x2 are in A, and f (x1 ) > f (x2 ). Then given any
c ∈ R such that f (x1 ) > c > f (x2 ), there is 0 < θ < 1 such that f [θx1 + (1 − θ)x2 ] = c.
Example 9.14. Suppose X ≡ [a, b] is a closed interval in R (with a < b). Suppose f is a continuous
function on X. By Weierstrass theorem, there will exist x1 and x2 in X such that f (x1 ) ≥ f (x) ≥
f (x2 ) for all x ∈ X. If f (x1 ) = f (x2 ) [this is the trivial case], then f (x) = f (x1 ) for all x ∈ X, and
so f (X) is the single point, f (x1 ). If f (x1 ) > f (x2 ), then using the fact that X is a convex set, we
can conclude from the Intermediate Value Theorem that every value between f (x1 ) and f (x2 ) is
attained by the function f at some point in X. This shows that, f (X) is itself a closed interval.

Theorem 9.17 (Mean Value Theorem). Suppose A is an open convex subset of Rn , and f : A → R
is continuously differentiable on A. Suppose x1 and x2 are in A. Then there is 0 ≤ θ ≤ 1 such that
f (x2 ) − f (x1 ) = (x2 − x1 )∇ f (θx1 + (1 − θ)x2 )
Example 9.15. Let f : R → R be a continuously differentiable function with the property that
f ′ (x) > 0 for all x ∈ R. Then given any x1 , x2 in R, with x2 > x1 we have by the Mean-Value
Theorem (since R is open and convex), the existence of 0 ≤ θ ≤ 1, such that
f (x2 ) − f (x1 ) = (x2 − x1 ) f ′ (θx1 + (1 − θ)x2 )
Now f ′ (θx1 + (1 − θ)x2 ) > 0 by assumption, and x2 > x1 by hypothesis. So f (x2 ) > f (x1 ). This
shows that f is an increasing function on R.

Observe that a function f : R → R can be increasing without satisfying f ′ (x) > 0 at all x ∈ R.
For example, f (x) = x3 is increasing on R, but f ′ (0) = 0.
Theorem 9.18 (Taylor’s Expansion up to Second-Order). Suppose A is an open, convex subset of
Rn , and f : A → R is twice continuously differentiable on A. Suppose x1 and x2 are in A. Then
there exists 0 ≤ θ ≤ 1, such that
f (x2) − f (x1) = (x2 − x1)′ ∇f (x1) + (1/2) (x2 − x1)′ H f (θx1 + (1 − θ)x2) (x2 − x1)

9.11. Composite Functions and the Chain Rule

Let h : A → Rm be a function with component functions hi : A → R (i = 1, · · · , m) which are defined
on an open set A ⊂ Rn . Let f : B → R be a function defined on an open set B ⊂ Rm which contains
the set h(A). Then, we can define F : A → R by F(x) ≡ f [h(x)] ≡ f [h1 (x), · · · , hm (x)] for each
x ∈ A. This function is known as a composite function [of f and h].

The "Chain Rule" of differentiation provides us with a formula for finding the partial derivatives
of the composite function F in terms of the partial derivatives of the individual functions f
and h.
Theorem 9.19 (Chain Rule of differentiation). Let h : A → Rm be a function with component
functions hi : A → R(i = 1, · · · , m) which are continuously differentiable on an open set A ⊂ Rn .
Let f : B → R be a continuously differentiable function on an open set B ⊂ Rm which contains the
set h(A). If F : A → R is defined by F(x) = f [h(x)] on A, and a ∈ A, then F is differentiable at a
and we have for i = 1, · · · , n,
Di F(a) = ∑_{j=1}^m Dj f (h1 (a), · · · , hm (a)) · Di hj (a)

Example 9.16. Let m = 2, n = 1. Let h1 (x) = x3 on R, and h2 (x) = 10 + x on R; and let f (y1 , y2 ) =
y1 + y42 on R2 . Then
F(x) = f [h(x)] = f [h1 (x), h2 (x)] = h1 (x) + [h2 (x)]4 = x3 + (10 + x)4
is a composite function on R. If a ∈ R,
F ′ (a) = D1 F(a) = D1 f (h1 (a), h2 (a)) · D1 h1 (a) + D2 f (h1 (a), h2 (a)) · D1 h2 (a)
= 1 · (3a2 ) + 4(h2 (a))3 · 1 = 3a2 + 4(10 + a)3
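Example 9.16 is easy to verify numerically (Python; an illustrative aside):

```python
# F(x) = x^3 + (10+x)^4 from Example 9.16; the chain rule gives
# F'(a) = 3*a^2 + 4*(10+a)^3.
F = lambda x: x ** 3 + (10 + x) ** 4
Fprime = lambda a: 3 * a ** 2 + 4 * (10 + a) ** 3

a, h = 2.0, 1e-6
numeric = (F(a + h) - F(a - h)) / (2 * h)    # central difference estimate
print(numeric, Fprime(a))                    # both about 6924
```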
Example 9.17. Take m = 1, n = 2. Let h1 (x) = h1 (x1 , x2 ) = x12 + x2 on R2 ; f (y) = 2y on R. Then
F(x) = F(x1 , x2 ) = f [h1 (x1 , x2 )] = 2[x12 + x2 ]. Then if a ∈ R2 ,
D1 F(a) = D1 f [h1 (a1 , a2 )]D1 h1 (a1 , a2 )
D2 F(a) = D1 f [h1 (a1 , a2 )]D2 h1 (a1 , a2 )
Thus, D1 F(a) = 2(2a1 ) = 4a1 ; and D2 F(a) = 2(1) = 2.
Chapter 10

Problem Set 4

(1) Find the derivative of the following functions from R → R:


(10.1) f (x) = [(2x + 1)/(x − 1)]^{1/2}

(10.2) f (x) = ln(3x² − 5x)

(2) Find the equation for the tangent to f (x) = 5x2 + 3x − 2 at x = 2.

(3) Let f : R → R be
f (x) = x² − 1 for x ≤ 0, and f (x) = −x² for x > 0 ,
and g : R → R be
g (x) = 3x − 2 for x ≤ 2, and g (x) = −x + 6 for x > 2 .
(a) Is f continuous at x = 0?
(b) Is g continuous at x = 2?

(4) Find

(10.3) lim_{x→0} f (x)/g (x) = lim_{x→0} [exp(x²) + exp(−x) − 2]/(2x)


(5) Evaluate the Hessian of the function f : R² → R,

f (x, y) = x² y + y² x − 2xy + 3x ,

at the point (1, 2).

(6) Let f (x, y) be defined as

f (x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f (x, y) = 0 otherwise.
Show that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although f
is not continuous at (0, 0).

(7) This exercise gives an example of a function with D12 f (x, y) ≠ D21 f (x, y). Let f (x, y) be defined
as
f (x, y) = xy(x² − y²)/(x² + y²) if (x, y) ≠ (0, 0), and f (x, y) = 0 otherwise.
(a) Show that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point
(x, y) ∈ R² and that f is continuous on R².
(b) Show that the partial derivatives D1 f (x, y) and D2 f (x, y) are continuous at every point in R².
(c) Show that the second order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in
R² and are continuous everywhere in R² except at (0, 0).
(d) Show that D21 f (0, 0) = +1 and D12 f (0, 0) = −1.
Chapter 11

Convex Analysis

11.1. Concave, Convex Functions

Definition 11.1. A function f : D → R is concave if ∀x, y ∈ D, ∀λ ∈ [0, 1],

(11.1) λ f (x) + (1 − λ) f (y) ≤ f (λx + (1 − λ) y) .

The function f is strictly concave if the inequality is strict for all x ≠ y and all λ ∈ (0, 1).

For functions of a real variable, we say, informally, that a function is concave if and only if its
slope is weakly decreasing. If the function is differentiable, then the derivative is weakly decreasing.

Theorem 11.1. Let X ⊂ R be an open interval. Then f : X → R is concave if and only if for any
a, b, c ∈ X with a < b < c,

[f (b) − f (a)]/(b − a) ≥ [f (c) − f (b)]/(c − b) , and

[f (b) − f (a)]/(b − a) ≥ [f (c) − f (a)]/(c − a) .

Proof. First we assume that f is concave and show that the two inequalities hold. Note b − a > 0
and c − b > 0. Hence the first inequality holds if and only if
[ f (b) − f (a)](c − b) ≥ [ f (c) − f (b)](b − a).


Figure 11.1. A concave function of one variable: f ′(d) < [f (d) − f (c)]/(d − c) < f ′(c).

This will be true if and only if

(c − b + b − a) f (b) ≥ (b − a) f (c) + (c − b) f (a) ,
or
f (b) ≥ [(b − a)/(c − a)] f (c) + [(c − b)/(c − a)] f (a) .
Observe that
b = [(b − a)/(c − a)] c + [(c − b)/(c − a)] a ,
where
(b − a)/(c − a) = λ > 0, and (c − b)/(c − a) = 1 − λ > 0.
Since f is concave,
f (b) = f (cλ + a(1 − λ)) ≥ λ f (c) + (1 − λ) f (a)

holds true. The second inequality holds if and only if

f (b) ≥ [(b − a)/(c − a)] f (c) + (1 − (b − a)/(c − a)) f (a) ,

with (b − a)/(c − a) ∈ (0, 1). This holds true since f is concave.

To show that if the two inequalities hold then f is concave, take any a < c in X and any
λ ∈ (0, 1), and let b = λa + (1 − λ)c so that a < b < c holds. Reversing the steps above, the first
inequality then yields f (b) ≥ λ f (a) + (1 − λ) f (c), which is exactly the concavity inequality. □
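The slope inequalities of Theorem 11.1 can be checked on a familiar concave function (Python; an illustrative aside, using f = log):

```python
import math

# Chord-slope test for the concave function log at sample points a < b < c:
# slope(a,b) >= slope(b,c) and slope(a,b) >= slope(a,c).
def slope(f, u, v):
    return (f(v) - f(u)) / (v - u)

f = math.log
for (a, b, c) in [(1.0, 2.0, 3.0), (0.5, 1.0, 4.0), (2.0, 5.0, 9.0)]:
    assert slope(f, a, b) >= slope(f, b, c)
    assert slope(f, a, b) >= slope(f, a, c)
print("chord slopes of log decrease, as Theorem 11.1 predicts")
```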

The following theorem gives a characterization of concave functions.


Theorem 11.2. Suppose A is a convex subset of Rn and f is a real-valued function on A. Then f
is a concave function if and only if the set
C ≡ {(x, α) ∈ A × R : f (x) ≥ α}
is a convex set in Rn+1 .

Proof. Let the function f be concave. Let (x1 , α1) ∈ C and (x2 , α2) ∈ C. Then f (x1) ≥ α1 and
f (x2) ≥ α2 . Since f is concave, and x1 , x2 ∈ A, for every λ ∈ [0, 1],
f (λx1 + (1 − λ) x2) ≥ λ f (x1) + (1 − λ) f (x2) ≥ λα1 + (1 − λ) α2 ,
which implies (λx1 + (1 − λ) x2 , λα1 + (1 − λ) α2) ∈ C. Hence C is convex.

Next we assume C to be convex. Note that for x1 , x2 ∈ A, we have (x1 , f (x1)) ∈ C and (x2 , f (x2)) ∈
C. Since C is convex, for every λ ∈ [0, 1],
λ · (x1 , f (x1)) + (1 − λ) · (x2 , f (x2)) ∈ C.

This implies
f (λx1 + (1 − λ)x2) ≥ λ · f (x1) + (1 − λ) · f (x2),
so f is concave. □

In general, a concave function on a convex set in Rn need not be continuous as the following
example shows.

Figure 11.2. A concave function need not be continuous: here f is not continuous at x = 0.

Example 11.1. Let f : [0, ∞) → R be defined by
f (x) = 1 + x for x > 0, and f (0) = 0.
This function is concave but it is not continuous at x = 0.

However, if the set A is open and convex, then a concave function f is continuous on A. The
following theorem can be proved, using Theorem 11.1, for functions of a real variable.
Theorem 11.3. Let X ⊂ R be open and convex and let f : X → R be a concave function. Then f
is continuous on X.

Proof. Assume f is concave. Theorem 11.1 implies that for any a, b, c ∈ X with a < b < c the
graph of the function f lies between the line through the points (a, f (a)) and (b, f (b)) and
the line through the points (b, f (b)) and (c, f (c)). Thus for any x ∈ [a, b],

f (b) − [(f (b) − f (a))/(b − a)] (b − x) ≤ f (x) ≤ f (b) − [(f (c) − f (b))/(c − b)] (b − x) ,

and for any x ∈ [b, c],

f (b) + [(f (b) − f (a))/(b − a)] (x − b) ≥ f (x) ≥ f (b) + [(f (c) − f (b))/(c − b)] (x − b) .
These two inequalities imply that f is continuous at b. 

If the function is continuously differentiable on an open convex set, then the following theorem
characterizes the concave functions.
Theorem 11.4. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable
on A. Then f is concave on A if and only if

(11.2) f (x2) − f (x1) ≤ ∇f (x1)(x2 − x1)

whenever x1 and x2 are in A.

Proof. We assume f to be a concave function and x1 , x2 ∈ A. For λ ∈ [0, 1],

f (λx2 + (1 − λ)x1) ≥ λ · f (x2) + (1 − λ) · f (x1) = λ · f (x2) − λ · f (x1) + f (x1).

Then
f (x1 + λ(x2 − x1)) − f (x1) ≥ λ · (f (x2) − f (x1)) .
Dividing both sides by λ (for λ > 0), we get
[f (x1 + λ(x2 − x1)) − f (x1)]/λ ≥ f (x2) − f (x1).
Taking λ → 0, we get
∇f (x1) · (x2 − x1) ≥ f (x2) − f (x1),
which proves the inequality.

Next we assume (11.2) holds true for all x1 , x2 ∈ A. For any λ ∈ [0, 1], let x = λx2 + (1 −
λ)x1 . Since A is convex, x ∈ A. Note
x2 − x = x2 − λx2 − (1 − λ)x1 = (1 − λ)(x2 − x1) ,
and
x1 − x = x1 − λx2 − (1 − λ)x1 = −λ(x2 − x1) .
Applying (11.2), we get
f (x2) − f (x) ≤ ∇f (x) · (x2 − x) = ∇f (x) · (1 − λ)(x2 − x1) ,
and
f (x1) − f (x) ≤ ∇f (x) · (x1 − x) = ∇f (x) · (−λ)(x2 − x1) .

We multiply the first inequality by λ and the second inequality by 1 − λ and add to obtain
λ · f (x2) + (1 − λ) · f (x1) − f (x) ≤ 0 ,

which implies
λ · f (x2) + (1 − λ) · f (x1) ≤ f (x) = f (λx2 + (1 − λ)x1).
So f is concave. □
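Inequality (11.2) says the tangent at x1 lies above the graph of a concave function; a one-variable spot check (Python; an illustrative aside, with f (x) = −x²):

```python
# Gradient inequality (11.2) for the concave function f(x) = -x^2:
# f(x2) - f(x1) <= f'(x1)*(x2 - x1), since the gap equals (x1 - x2)^2 >= 0.
f = lambda x: -x * x
fprime = lambda x: -2 * x

for x1, x2 in [(-2.0, 3.0), (0.5, -1.5), (1.0, 1.0), (4.0, -4.0)]:
    assert f(x2) - f(x1) <= fprime(x1) * (x2 - x1)
print("the tangent at x1 lies above the graph, as (11.2) requires")
```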

Also, the function will be strictly concave if we change the weak inequality to a strict inequality.
Theorem 11.5. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable
on A. Then f is strictly concave on A if and only if

f (x2) − f (x1) < ∇f (x1)(x2 − x1)

whenever x1 and x2 are distinct points of A.

Now we consider twice continuously differentiable functions. The following two theorems
characterize concave and strictly concave functions.
Theorem 11.6. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously
differentiable on A. Then f is concave on A if and only if H f (x) is negative semi-definite for all
x ∈ A.

If H f (x) is negative definite whenever x ∈ A, then the function is strictly concave, but the
converse is not true.
Theorem 11.7. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously
differentiable on A. If H f (x) is negative definite for all x ∈ A then f is strictly concave on A.

The following example shows that the converse implication does not hold.
Example 11.2. Let f : R → R be defined by f (x) = −x4 for all x ∈ R (See Figure 2). This is a
twice continuously differentiable function on the open, convex set R. We can verify that f is strictly
concave on R, but since f ′′ (x) = −12x2 , f ′′ (0) = 0. This shows that the converse implication is not
valid.
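The failure of the converse can also be checked numerically. The following Python sketch (the sampled pairs of points are our own choices, not from the text) verifies that f (x) = −x4 has f ′′ (0) = 0 yet satisfies the strict midpoint-concavity inequality on every sampled pair:

```python
# Sanity check for Example 11.2: f(x) = -x^4 is strictly concave although f''(0) = 0.
def f(x):
    return -x**4

def f2(x):
    # second derivative f''(x) = -12 x^2, computed by hand
    return -12 * x**2

# f''(0) = 0 even though f is strictly concave:
assert f2(0) == 0

# Strict concavity: f(la*x + (1-la)*y) > la*f(x) + (1-la)*f(y) for x != y, la in (0,1).
points = [(-2.0, 1.5), (0.0, 1.0), (-0.5, 3.0)]   # arbitrary sample pairs
for x, y in points:
    for la in (0.25, 0.5, 0.75):
        lhs = f(la * x + (1 - la) * y)
        rhs = la * f(x) + (1 - la) * f(y)
        assert lhs > rhs
print("strict concavity holds on the sampled points")
```

A finite sample can of course only fail to refute strict concavity; the proof that it holds everywhere is the calculus argument in the text.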

Claim 11.1. If f : A → R is a function of one variable and is twice continuously differentiable then
∀ x ∈ A, f ′′ (x) ≤ 0 ⇔ f is concave.
Definition 11.2. Function f : A → R is convex if ∀x, y ∈ A, ∀λ ∈ [0, 1],
(11.3) λ f (x) + (1 − λ) f (y) ≥ f (λx + (1 − λ) y)
Function f is strictly convex if the inequality is strict for all λ ∈ (0, 1).

Figure 11.3. Graph of −x⁴

Claim 11.2. If f : A → R is a function of one variable and is twice continuously differentiable then

∀ x ∈ A, f ′′ (x) ≥ 0 ⇔ f is convex.

Note that a local maximum (minimum) of a concave (convex) function is a global maximum (minimum) as well.

11.1.1. Hessian, Concavity and Convexity.

Theorem 11.8. Let f : A → R (where A ⊆ Rn is open and convex) be twice continuously differen-
tiable. Then,

f is concave if and only if H f (x) is NSD ∀x ∈ A.


f is convex if and only if H f (x) is PSD ∀x ∈ A.
H f (x) is ND ∀x ∈ A ⇒ f is strictly concave.
H f (x) is PD ∀x ∈ A ⇒ f is strictly convex.
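The sign conditions in Theorem 11.8 reduce to eigenvalue checks, since a symmetric matrix is ND/NSD/PD/PSD according to the signs of its eigenvalues. A minimal Python sketch; the example function f (x, y) = −x² + xy − y² and its hand-computed (constant) Hessian are our own illustration:

```python
import numpy as np

# Classify definiteness of a Hessian by eigenvalue signs (Theorem 11.8).
# Hessian of f(x, y) = -x^2 + x*y - y^2, computed by hand:
H = np.array([[-2.0, 1.0],
              [1.0, -2.0]])

eig = np.linalg.eigvalsh(H)   # symmetric matrix -> real eigenvalues, ascending
if np.all(eig < 0):
    verdict = "ND: f is strictly concave"
elif np.all(eig <= 0):
    verdict = "NSD: f is concave"
elif np.all(eig > 0):
    verdict = "PD: f is strictly convex"
elif np.all(eig >= 0):
    verdict = "PSD: f is convex"
else:
    verdict = "indefinite: neither concave nor convex"
print(verdict)   # prints "ND: f is strictly concave"
```

For a non-quadratic f the Hessian depends on x, so the eigenvalue check must hold at every point of A, not just one sample.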

Corollary 1. For a function of one variable, this means,



Figure 11.4. Graph of x⁴

f is concave if and only if f ′′ (x) ≤ 0 ∀x ∈ A.
f is convex if and only if f ′′ (x) ≥ 0 ∀x ∈ A.
f ′′ (x) < 0 ∀x ∈ A ⇒ f is strictly concave.
f ′′ (x) > 0 ∀x ∈ A ⇒ f is strictly convex.
Example 11.3. The implication
f is strictly convex ⇒ f ′′ (x) > 0, ∀x ∈ A
does not hold.

Take f (x) = x4 , f ′′ (x) = 12x2 . It is strictly convex everywhere but f ′′ (0) = 0. We would need
f ′′ (x) > 0, ∀x ∈ A for the Hessian to be PD.

11.1.2. Some Useful Results.


Proposition 3.

(a) If f and g are concave (convex) and a ≥ 0, b ≥ 0, then a f + bg is concave (convex).



(b) If f (x) is concave (convex) and F (u) is concave (convex) and increasing, then U (x) = F ( f (x)) is concave (convex).

(c) Function f is concave if and only if − f is convex.

Next example describes some convex / concave functions.


Example 11.4.

(1.) eˣ is strictly convex on R.

(2.) ln x is strictly concave on R++ .

(3.) 1/x is strictly convex on R++ and strictly concave on R−− .

(4.) x^α , where α is an integer greater than 1, is strictly convex on R+ . On R− , x^α is strictly convex for α an even integer and strictly concave for α an odd integer.

(5.) x^α , where α is a real number in (0, 1), is strictly concave on R+ .

11.2. Quasi-concave Functions

Definition 11.3. Function f : A → R is quasi-concave if ∀x, y ∈ A, ∀λ ∈ [0, 1],
f (λx + (1 − λ) y) ≥ min { f (x) , f (y)} .
Theorem 11.9. Function f : A → R is quasi-concave if and only if ∀a ∈ R, the set fa⁺ = {x ∈ A | f (x) ≥ a} is a convex set. The set fa⁺ is called the upper contour set.
Definition 11.4. Function f : A → R is quasi-convex if the function − f is quasi-concave.
Theorem 11.10. Function f : A → R is quasi-convex if and only if ∀a ∈ R, the set fa⁻ = {x ∈ A | f (x) ≤ a} is a convex set. The set fa⁻ is called the lower contour set.
Theorem 11.11.
f : A → R concave ⇒ f is quasi-concave,
f : A → R convex ⇒ f is quasi-convex.

Note that for functions of one variable, any monotone function is quasi-concave. This however does NOT apply to functions of more than one variable. Also, quasi-concave functions need not be concave. Take f (x) = x² on R+ : it is monotone increasing, hence quasi-concave, but it is not concave; rather it is convex. For functions of one variable, following theorem characterizes the quasi-concave functions.
Theorem 11.12. A function f of a single variable is quasiconcave if and only if either (a) it is
non-decreasing, (b) it is non-increasing, or (c) there exists x∗ such that f is non-decreasing for
x < x∗ and non-increasing for x > x∗ .
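Theorem 11.12 suggests a simple finite test on a grid of sampled values: a sample is consistent with quasi-concavity exactly when the values weakly rise and then weakly fall. A Python sketch; the grid and the test functions are our own choices:

```python
# Finite version of Theorem 11.12: values should be non-decreasing, then non-increasing.
def looks_quasiconcave(values):
    i = 0
    n = len(values)
    while i + 1 < n and values[i + 1] >= values[i]:   # non-decreasing part
        i += 1
    while i + 1 < n and values[i + 1] <= values[i]:   # non-increasing part
        i += 1
    return i == n - 1          # the two phases must exhaust the sample

xs = [x / 10 for x in range(-30, 31)]                  # grid on [-3, 3]
assert looks_quasiconcave([-(x * x) for x in xs])      # -x^2: concave, so quasi-concave
assert looks_quasiconcave([x ** 3 for x in xs])        # x^3: monotone, so quasi-concave
assert not looks_quasiconcave([abs(x) for x in xs])    # |x|: falls then rises, fails
```

As with any sampled test, passing on a grid is only evidence, not a proof, of quasi-concavity.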

11.2.1. Bordered Hessian. To check quasi-concavity of a C 2 function, we use the bordered Hessian matrix.
Definition 11.5 (Bordered Hessian). Let f be a C 2 function. The bordered Hessian is

           ⎡ 0             ∂f(x)/∂x₁        ···    ∂f(x)/∂xₙ       ⎤
           ⎢ ∂f(x)/∂x₁     ∂²f(x)/∂x₁²      ···    ∂²f(x)/∂x₁∂xₙ   ⎥
  B (x) =  ⎢    ⋮              ⋮             ⋱         ⋮           ⎥ .
           ⎣ ∂f(x)/∂xₙ     ∂²f(x)/∂xₙ∂x₁    ···    ∂²f(x)/∂xₙ²     ⎦

Let Br (x) denote the sub-matrix of the first (r + 1) rows and columns of B (x), i.e., Br (x) is a (r + 1) × (r + 1) matrix.
Condition 1. A necessary condition for f to be quasiconcave is that (−1)ʳ det (Br (x)) ≥ 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.

Condition 2. A sufficient condition for f to be quasiconcave is that (−1)ʳ det (Br (x)) > 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.

When we check for quasi-concavity, we have to check the sufficient conditions. We need

        ⎡ 0    f1  ⎤
  det   ⎣ f1   f11 ⎦  < 0,

        ⎡ 0    f1    f2  ⎤
  det   ⎢ f1   f11   f12 ⎥  > 0, etc.
        ⎣ f2   f21   f22 ⎦
Remark 11.1. When we have to check whether a function is quasi-concave, start out checking
whether it is concave because it is easier to check for concavity and concavity implies quasi-
concavity.
Remark 11.2. Quasi-concavity is preserved under monotone transformation whereas concavity
need not be preserved.


Example 11.5. Let f (x, y) = √(xy) for (x, y) ∈ R²₊₊ . Then

              ⎡ −(1/4)√(y/x³)      1/(4√(xy))     ⎤
  H f (x) =   ⎣   1/(4√(xy))      −(1/4)√(x/y³)   ⎦ .

The principal minors of order one are negative and of order two is zero. Hence f (x) is concave and so quasi-concave.
Let us take the monotone transformation g (x, y) = ( f (x, y))⁴ = x²y², for (x, y) ∈ R²₊₊ . Then

              ⎡ 0       2xy²    2x²y ⎤
  B (x, y) =  ⎢ 2xy²    2y²     4xy  ⎥
              ⎣ 2x²y    4xy     2x²  ⎦

                          ⎡ 0       2xy² ⎤
  det (B₁ (x, y)) = det   ⎣ 2xy²    2y²  ⎦ = −4x²y⁴ < 0

  ⇒ (−1)¹ det (B₁ (x, y)) > 0, ∀ (x, y) ∈ R²₊₊ .

  det (B₂ (x, y)) = −2xy² (4x³y² − 8x³y²) + 2x²y (8x²y³ − 4x²y³) = 8x⁴y⁴ + 8x⁴y⁴ = 16x⁴y⁴ > 0, ∀ (x, y) ∈ R²₊₊

  ⇒ g(x, y) is quasi-concave.
Note however, g (x, y) is not concave:

               ⎡ 2y²    4xy ⎤
  Hg (x, y) =  ⎣ 4xy    2x²  ⎦ .

The principal minors of order one are strictly positive and the principal minor of order two is −12x²y², which is strictly negative. Thus g (x, y) is not concave.
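The determinant computations of Example 11.5 can be replayed numerically. In this Python sketch the sample point (1, 2) ∈ R²₊₊ is our own choice, and the gradient and Hessian entries of g(x, y) = x²y² are coded by hand:

```python
import numpy as np

# Bordered Hessian of g(x, y) = x^2 y^2, entries computed by hand.
def bordered_hessian(x, y):
    gx, gy = 2 * x * y**2, 2 * x**2 * y              # gradient
    gxx, gxy, gyy = 2 * y**2, 4 * x * y, 2 * x**2    # second derivatives
    return np.array([[0.0, gx, gy],
                     [gx, gxx, gxy],
                     [gy, gxy, gyy]])

B = bordered_hessian(1.0, 2.0)       # arbitrary sample point in R^2_{++}
d1 = np.linalg.det(B[:2, :2])        # det B_1
d2 = np.linalg.det(B)                # det B_2

# Sufficient condition for quasi-concavity: (-1)^r det B_r > 0.
assert (-1) ** 1 * d1 > 0 and (-1) ** 2 * d2 > 0
print(int(round(d1)), int(round(d2)))   # prints: -64 256
```

At (1, 2) the closed-form values from the example, −4x²y⁴ = −64 and 16x⁴y⁴ = 256, match the numeric determinants.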
Chapter 12

Problem Set 5

(1) Prove or give a counterexample: The sum of two concave functions is concave.

(2) Which of the following is true? Prove or give a counterexample.


Suppose A and B are convex sets in Rn .
(a) The set A ∪ B is a convex set in Rn .
(b) The set A ∩ B is a convex set in Rn .
(c) Define the set
C = {x + y : x ∈ A and y ∈ B}.
The set C is a convex set in Rn .

(3) Suppose f : [0, 1] → R+ and g : [0, 1] → R+ are increasing, convex functions on [0, 1]. Define
the function h : [0, 1] → R+ by:

h(x) = f (x)g(x) for all x ∈ [0, 1]

Show that h is a convex function on [0, 1].

(4) Determine whether each of the following functions is quasi-concave:

(a) f (x) = 3x + 4;
(b) g(x, y) = yex , y > 0;
(c) h(x, y) = −x2 y3 .

(5) Show using an example that the sum of two quasi-concave functions need not be quasi-concave
(in general).


(6) Consider the functions:


(i) f (x, y, z) = 8x3 + 2xy2 − z3
(ii) g(x, y) = x + y − ex − ex+y
Write out the gradient vectors and Hessian matrices ∇ f (x, y, z), H f (x, y, z), ∇g(x, y) and Hg (x, y). State whether f is concave, quasi-concave, or quasi-convex. What about the function g?

(7) Prove the following theorem from the lecture notes.


Theorem 12.1. Let X ⊂ R be an open interval. Then f : X → R is concave if and only if for
any a, b, c ∈ X with a < b < c
f (b) − f (a) f (c) − f (b)
≥ , and
b−a c−b
f (b) − f (a) f (c) − f (a)
≥ .
b−a c−a

(8) Prove the following theorem from the lecture notes.


Theorem 12.2. Let X ⊂ R be open and convex and let f : X → R be a concave function. Then
f is continuous on X.
Chapter 13

Inverse and Implicit


Function Theorems

13.1. Inverse Function Theorem

Recall the earlier discussions on inverse functions. Consider a real-valued function, f : R → R defined by f (x) = 4x. It is one-to-one on R, and we can define a function g : R → R by g(y) = y/4. The function g(y) satisfies the property g[ f (x)] = x and is called the inverse function of f on R. Furthermore g′ [ f (x)] = 1/ f ′ (x) for all x ∈ R.

This idea can be extended to the domains of the function, A, being subsets of Rn , with the
function f defined from A to R. Then f is one-to-one on A if for all x1 , x2 ∈ A, x1 ̸= x2 , we have
f (x1 ) ̸= f (x2 ). In this case, if there is a function g, from f (A) to A, such that g[ f (x)] = x for each
x ∈ A, then g is called the inverse function of f on f (A).

More generally, let A be an open set in R, and f : A → R be continuously differentiable on A.


Let a ∈ A, and suppose that f ′ (a) ̸= 0. If f ′ (a) > 0, then there is an open interval B(a, r) such that f ′ (x) > 0 for all x in B(a, r), and f is increasing on B(a, r). Thus, for every z ∈ f [B(a, r)], there is a unique x in B(a, r) such that f (x) = z. That is, there is a unique function h : f [B(a, r)] → B(a, r) such that h[ f (x)] = x for all x ∈ B(a, r). Thus, h is an inverse function of f on f [B(a, r)]. In other words, h is the inverse of f “locally” around the point f (a). We have not guaranteed that the inverse
function is defined on the entire set f (A). Similarly, if f ′ (a) < 0, an inverse function could be


defined “locally” around f (a). The important restriction to carry out the kind of analysis noted
above is that f ′ (a) ̸= 0.

To illustrate this, consider f : R → R+ given by f (x) = x2 . Consider the point a = 0. Clearly f


is continuously differentiable on R, but f ′ (0) = 0. Now, we cannot define a unique inverse function
of f even “locally” around f (0). If we choose any open ball B(0, r), and consider any point y ̸= 0
in the set f [B(0, r)], then there will be two values x, x′ in B(0, r), x ̸= x′ , such that f (x) = y = f (x′ ).

We note here that f ′ (a) ̸= 0 is not a necessary condition to get a unique inverse function of f .
For example if f : R → R is defined by f (x) = x3 , then we have f to be continuously differentiable
on R, with f ′ (0) = 0. However f is an increasing function, and clearly has a unique inverse function
g(y) = y1/3 on R, and hence locally around f (0).

Following theorem deals with the existence and properties of inverse functions.
Theorem 13.1 (Inverse Function Theorem). Let A be an open set of Rn , and f : A → Rn be con-
tinuously differentiable on A. Suppose a ∈ A and the Jacobian of f at a is non-zero. Then there is
an open set X ⊂ A containing a, and an open set Z ⊂ Rn containing f (a), and a unique function
h : Z → X, such that:

(i) f (X) = Z;

(ii) f is one-to-one on X;

(iii) h(Z) = X, and h[ f (x)] = x for all x ∈ X.

Further, h is continuously differentiable on Z.
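Although Theorem 13.1 does not give a formula for h, one can compute h(z) numerically by Newton's method once the Jacobian is available. A Python sketch; the map f (x, y) = (x + y, xy) and the base point are our own illustrative choices, not from the text:

```python
import numpy as np

# Compute the local inverse h(z) of f near a point with non-zero Jacobian.
def f(v):
    x, y = v
    return np.array([x + y, x * y])

def Jf(v):
    # Jacobian matrix of f, computed by hand
    x, y = v
    return np.array([[1.0, 1.0],
                     [y, x]])

a = np.array([2.0, 1.0])                 # det Jf(a) = x - y = 1 != 0
z = f(a) + np.array([0.05, -0.02])       # a target close to f(a)

v = a.copy()                             # Newton iteration for f(v) = z
for _ in range(20):
    v = v - np.linalg.solve(Jf(v), f(v) - z)

assert np.allclose(f(v), z)              # so h(z) = v satisfies f(h(z)) = z
```

Newton's method is only guaranteed to converge for z close enough to f (a), which mirrors the purely local character of the theorem.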

Following example shows that continuity of f ′ is needed in the inverse function theorem, even
in the case n = 1.
Example 13.1. Let
f (t) = t + 2t² sin(1/t) for t ̸= 0, and f (0) = 0;
then f ′ (0) = 1, f ′ is bounded in (−1, 1), but f is not one-to-one in any neighborhood of 0.

13.2. The Linear Implicit Function Theorem

For the system of simultaneous linear equations Ax = b, we have seen earlier, that there exists a
unique solution for every choice of right hand side column vector b, if and only if the rank of A

is equal to the number of rows of A which is equal to the number of columns of the matrix A.
In economic models, the vector b represents some externally determined (exogenous) parameters
while the linear equations constitute some equilibrium conditions which determine the vector x
which is the set of internal (endogenous) variables.

In this sense it is possible to divide the set of variables in two disjoint subsets of endogenous and
exogenous variables. Thus a general linear economic model will have m equations in n unknowns:
a11 x1 + a12 x2 + · · · + a1n xn = b1
··· ··· ··· ··· ···
am1 x1 + am2 x2 + · · · + amn xn = bm

In general it will be possible to divide the set of variables into endogenous variables and exoge-
nous variables. Such a division will be useful only if after substituting the values of the exogenous
variables in the m equations, it is possible to obtain a solution of the system for the remaining en-
dogenous variables. For this two conditions must hold. The number of endogenous variables must
be equal to the number of equations m and the square matrix corresponding to the endogenous
variables must have maximal rank m.

A formal statement of the above observation is known as the linear version of Implicit Function
Theorem.

Theorem 13.2. Let x1 , · · · , x j ; x j+1 , · · · , xn be a partition of the n variables in the system of equations above into endogenous and exogenous variables respectively. Then there exists, for every choice of the exogenous variables, x̄ j+1 , · · · , x̄n , a unique set of values, x̄1 , · · · , x̄ j , if and only if

(a) j = m, i.e., number of endogenous variables = number of equations;

(b) the rank of the j × j square matrix

            ⎡ a11    a12    . . .    a1 j ⎤
  (13.1)    ⎢ a21    a22    . . .    a2 j ⎥
            ⎢  ⋮      ⋮               ⋮  ⎥
            ⎣ a j1    a j2   . . .    a j j ⎦

corresponding to the endogenous variables is j.
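The rank condition and the resulting solution can be illustrated numerically. The following Python sketch uses an assumed 2-equation system (our own example, not from the text), x + 2y − w = 1 and 3x − y + w = 3, with x, y endogenous and w exogenous:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, -1.0]])       # coefficients of the endogenous variables x, y
b = np.array([1.0, 3.0])
cw = np.array([1.0, -1.0])        # coefficients of w, moved to the right hand side

# Condition (b) of Theorem 13.2: the endogenous block has full rank j = m = 2.
assert np.linalg.matrix_rank(A) == 2

w = 0.5                           # any choice of the exogenous variable
sol = np.linalg.solve(A, b + cw * w)
x, y = sol

# The recovered (x, y) solve the original equations at this w:
assert np.allclose([x + 2 * y - w, 3 * x - y + w], b)
```

Repeating the solve for other values of w traces out the endogenous variables as (here linear) functions of the exogenous one.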

Here is an example for this theorem.

Exercise 13.1.

Let the system of equations be


x+ 2y+ z− w = 1
3x− y− 4z+ 2w = 3
0x+ y+ z+ w = 0
Determine how many variables can be endogenous at any one time and show a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution.

Find an explicit formula for the endogenous variables in terms of the exogenous variables.
Exercise 13.2.

Let the system of equations be


−x+ 3y− z+ w = 0
4x− y+ 2z+ w = 3
7x+ y+ z+ 3w = 6
Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?

13.3. Implicit Function Theorem for R2

Consider the following example of a non-linear implicit function.


y2 − 6xy + 5x2 = 0.
Given any value of x, we can solve this equation for y. For example if x = 0, then y = 0; if x = 1 the equation takes the form y² − 6y + 5 = 0 and yields y = 1 or y = 5 as solutions. Observe that it is possible to solve for y explicitly in terms of x (it turns out to be a correspondence) by applying the quadratic formula:
y = ( 6x ± √(36x² − 20x²) ) / 2,
or y = 5x or y = x.

It is possible to apply the quadratic formula to the implicit function xy² − 3y − 2 exp x = 0 to obtain an explicit function for y as
y = ( 3 ± √(9 + 8x exp x) ) / (2x).
However it could turn out to be the case that the explicit functions are more difficult to work with than the original implicit function.

If we come across an implicit function


y5 − 5xy + 4x2 = 0
then it is not possible to solve it in explicit form as there is no general formula for solving a quintic
equation. Note however that the equation still defines y as an implicit function of x. For x = 0, we
get y = 0, for x = 1 we get y = 1 and so on.
Example 13.2. A profit maximizing firm uses single input x (with unit cost w per unit) to produce
an output y using production function y = f (x). Let the price of the output be p per unit. Then the
profit function for this firm given p and w is
Π(x) = p · f (x) − w · x.
To obtain the optimal input x which maximizes the profit, we take the first order condition, which is
p · f ′ (x) − w = 0.
We can treat p and w as exogenous variables and then this equation defines x as a function of p and
w. The equation need not yield x as an explicit function of p and w. However, it does define x as an
implicit function of p and w and we can use it to estimate the change in x in response to changes in
p and w.
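To make the comparative statics concrete, the following Python sketch assumes the particular production function f (x) = √x (the example leaves f general, so this choice is ours) and compares the implicit-function-theorem derivative dx/dp with the closed-form solution:

```python
import math

# First order condition: p * f'(x) - w = 0 with the assumed f(x) = sqrt(x).
p, w = 2.0, 1.0
x_star = (p / (2 * w)) ** 2            # closed form: p/(2 sqrt(x)) = w
fp  = lambda x: 0.5 / math.sqrt(x)     # f'(x), by hand
fpp = lambda x: -0.25 * x ** (-1.5)    # f''(x), by hand

# Implicit differentiation of F(x; p, w) = p f'(x) - w = 0:
#   dx/dp = -F_p / F_x = -f'(x) / (p f''(x))
dx_dp = -fp(x_star) / (p * fpp(x_star))

# Cross-check against the closed form: d/dp (p/(2w))^2 = p/(2w^2) = 1 here.
assert abs(dx_dp - 1.0) < 1e-12
```

The same implicit-differentiation step works even when the first order condition cannot be solved for x in closed form, which is the point of the example.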

Consider functions of the form


y = G(x1 , · · · , xn ).
In this the endogenous variable y is an explicit function of the exogenous variables (x1 , · · · , xn ).
Such an ideal situation need not occur in every case. More frequently we come across functions of
the form
(13.2) F(x1 , · · · , xn ; y) = 0.
If Eq. (13.2) determines a value of y for each set of values (x1 , · · · , xn ), then we say that Eq. (13.2) defines the endogenous variable y as an implicit function of the exogenous variables (x1 , · · · , xn ).

We consider implicit functions in R2 of the form F(x, y) = c and analyze following question.
For a given implicit function F(x, y) = c and a specified solution (x0 , y0 ),

(a) Does F(x, y) = c determine y as a continuous function of x for points (x, y) such that x is near
x0 and y is near y0 ?

(b) If so, how do the changes in x affect the corresponding values of y?

More formally the two questions can be rephrased as under:



(a) Given the implicit function F(x, y) = c and a point (x0 , y0 ) such that F(x0 , y0 ) = c, does there exist a continuous function y = f (x) defined on an interval I around x0 so that:
(1) F(x, f (x)) = c for all x ∈ I, and
(2) y0 = f (x0 )?

(b) If y = f (x) exists and is differentiable, what is f ′ (x0 )?


Theorem 13.3. Let F(x, y) be a continuously differentiable function on an open ball around (x0 , y0 ) in R². Suppose F(x0 , y0 ) = c, and consider the expression
F(x, y) = c.
If ∂F(x, y)/∂y |(x0 ,y0 ) ̸= 0, then there exists a continuously differentiable function y = f (x) defined on an open interval I around x0 such that:

(a) F(x, f (x)) = c for all x ∈ I,

(b) y0 = f (x0 ), and

(c) f ′ (x0 ) = − [ ∂F(x, y)/∂x |(x0 ,y0 ) ] / [ ∂F(x, y)/∂y |(x0 ,y0 ) ] .
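Part (c) can be checked on the quadratic example y² − 6xy + 5x² = 0 from the beginning of this section. In this Python sketch the solution point (1, 1), which lies on the explicit branch y = x, is our own choice:

```python
# Implicit derivative f'(x0) = -F_x / F_y for F(x, y) = y^2 - 6xy + 5x^2 at (1, 1).
x0, y0 = 1.0, 1.0
Fx = -6 * y0 + 10 * x0        # partial of F with respect to x, by hand
Fy = 2 * y0 - 6 * x0          # partial of F with respect to y, by hand (nonzero here)

slope = -Fx / Fy
assert slope == 1.0           # matches dy/dx = 1 along the explicit branch y = x
```

At a point on the other branch y = 5x, the same formula would return 5, so the theorem picks out the slope of whichever branch passes through the chosen solution.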

Example 13.3. Consider the function F : R² → R given by
F(x, y) = x² + y² − 1 = 0
(the graph of this equation is the circle with radius r = 1). If we choose (a, b) with
F(a, b) = a² + b² − 1 = 0, and a ̸= ±1,
then there are open intervals I ⊂ R containing a, and Y ⊂ R containing b, such that if x ∈ I, there is a unique y ∈ Y with
F(x, y) = 0.
Thus, we can define a unique function
f : I → Y such that F(x, f (x)) = 0
for all x ∈ I. If a > 0 and b > 0, then f (x) = √(1 − x²) on I. We say such a function is defined implicitly by the equation F(x, y) = 0, with y = f (x). Note that if a = 1 and b = 0, so that D2 F(a, b) = 0, we cannot find such a unique function f .
Chapter 14

Homogeneous and
Homothetic Functions

14.1. Homogeneous Functions

Most of us have come across homogeneous functions in the elementary algebra courses. For exam-
ple f (x) = ax is homogeneous of degree 1, f (x) = axm is homogeneous of degree m, f (x) = ax + 1
is not a homogeneous function, and so on. First we define the homogeneous function formally.

Definition 14.1. For any scalar k, a real valued function f (x1 , · · · , xn ) is homogeneous of degree k
on Rn+ if for all x ∈ Rn+ , and all t > 0,

f (tx1 , · · · ,txn ) = t k f (x1 , · · · , xn ).

Some examples of homogeneous functions are:

(a) Consider f : R2+ → R given by f (x1 , x2 ) = x12 x23 . Then if t > 0, we have f (tx1 , tx2 ) = (tx1 )2 (tx2 )3 =
t 2+3 x12 x23 = t 5 f (x1 , x2 ). So, f is homogeneous of degree 5.

(b) The function f (x1 , x2 ) = x1a x2b is homogeneous of degree a + b. This function is an example of
returns to scale based on the value of a and b which we assume to be non-negative. If a + b = 1,
the function displays constant returns to scale. If a + b > 1, the function displays increasing
returns to scale. If a + b < 1, the function displays decreasing returns to scale.


(c) The Cobb-Douglas function f (x1 , x2 , · · · , xn ) = x1a1 x2a2 · · · xnan is homogeneous of degree a1 +
a2 + · · · + an .
(d) The constant elasticity of substitution function f (x1 , x2 ) = A (a1 x1^p + a2 x2^p )^(q/p) is homogeneous of degree q.

(e) The function f (x1 , x2 ) = √(x1³ + x2³) is homogeneous of degree 3/2.

(f) The function f : R2+ → R given by f (x1 , x2 ) = x12 x2 + 3x1 x22 + x23 is homogeneous of degree 3,
since each term is homogeneous of degree 3.

(g) A linear function, f (x1 , · · · , xn ) = a1 x1 + · · · + an xn , is homogeneous of degree 1.

(h) A quadratic form, Q(x, A) = x′ Ax = ∑ ai j xi x j , is homogeneous of degree 2.

(i) In consumer theory, the demand function is homogeneous function of degree zero.

(j) The only homogeneous of degree k function of one variable is f (x) = axk for some constant a.

(k) The only homogeneous of degree zero function of one variable is the constant function f (x) = a
for some constant a.

(l) There exist non-constant homogeneous of degree zero functions. Consider for example f (x, y) = x/y, y ̸= 0.

(m) If functions f and g are homogeneous of degree k, then the sum function f + g is also homo-
geneous of degree k.

(n) The function f : R2+ → R given by f (x1 , x2 ) = 3x12 x23 − 6x15 x22 is not homogeneous since the first
term is homogeneous of degree 5 but the second term is homogeneous of degree 7.
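Definition 14.1 is easy to test numerically. The following Python sketch checks example (f), f (x1 , x2 ) = x1²x2 + 3x1 x2² + x2³, at an arbitrary point and for several scale factors (the point and factors are our own choices):

```python
# Numeric check of Definition 14.1: f(t x) = t^k f(x) with k = 3,
# since every term of f has degree 3.
def f(x1, x2):
    return x1**2 * x2 + 3 * x1 * x2**2 + x2**3

x1, x2, k = 1.5, 0.7, 3
for t in (0.5, 2.0, 10.0):
    assert abs(f(t * x1, t * x2) - t**k * f(x1, x2)) < 1e-9
```

The same loop applied to the non-homogeneous example (n) would fail, because its two terms scale with different powers of t.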

Let us look at the function f (x1 , x2 ) = x1^a x2^b again. We can calculate the partial derivatives of f on R²₊₊ . Thus,
∂ f (x1 , x2 )/∂x1 = a x1^(a−1) x2^b ;  ∂ f (x1 , x2 )/∂x2 = b x1^a x2^(b−1) .
Now, if t > 0, then
∂ f (tx1 , tx2 )/∂x1 = a (tx1 )^(a−1) (tx2 )^b = t^(a+b−1) a x1^(a−1) x2^b = t^(a+b−1) ∂ f (x1 , x2 )/∂x1 .

So ∂ f (x1 , x2 )/∂x1 is homogeneous of degree (a + b − 1). Similarly, one can check that ∂ f (x1 , x2 )/∂x2 is homogeneous of degree (a + b − 1). More generally, whenever a function, f , is homogeneous of degree k, its partial derivatives are homogeneous of degree (k − 1).

Theorem 14.1. Suppose f is homogeneous of degree k on Rn+ , and continuously differentiable on Rn++ . Then for each i = 1, · · · , n, ∂ f (x1 , · · · , xn )/∂xi is homogeneous of degree (k − 1) on Rn++ .

Proof. To prove this let t > 0 be given. Then,


(14.1) f (tx1 , · · · ,txn ) = t k f (x1 , · · · , xn )
We can consider f (tx) to be a function of n + 1 variables, t, x1 , · · · , xn . We will show this result for
the partial derivative with respect to x1 . In this case the remaining variables t, x2 , · · · , xn are held
as constant. Applying the Chain Rule, we have for i = 1, the partial derivative of the expression on
the left hand side of (14.1) is
(14.2) ∂ f (tx1 , · · · , txn )/∂(tx1 ) · ∂(tx1 )/∂x1 = D1 f (tx1 , · · · , txn ) · t

The partial derivative of the function on the right hand side of (14.1) is t^k ∂ f (x1 , · · · , xn )/∂x1 . Equality of the two expressions leads to
(14.3) D1 f (tx1 , · · · , txn ) · t = t^k ∂ f (x1 , · · · , xn )/∂x1
Dividing by t, we get
(14.4) D1 f (tx1 , · · · , txn ) = t^(k−1) ∂ f (x1 , · · · , xn )/∂x1
Thus the partial derivatives are homogeneous functions of degree k − 1. □

We can also verify that
x1 D1 f (x1 , x2 ) + x2 D2 f (x1 , x2 ) = a x1^a x2^b + b x1^a x2^b = (a + b) x1^a x2^b = (a + b) f (x1 , x2 ).
More generally, when a function, f , is homogeneous of degree k, then x · ∇ f (x) = k f (x), a result known as Euler’s theorem.

Theorem 14.2 (Euler’s Theorem). Suppose f : Rn+ → R is homogeneous of degree k on Rn+ and
continuously differentiable on Rn++ . Then,
x1 · ∂ f (x1 , · · · , xn )/∂x1 + · · · + xn · ∂ f (x1 , · · · , xn )/∂xn = k f (x),
i.e., x · ∇ f (x) = k f (x) for all x ∈ Rn++ .

Proof. To prove this, note that homogeneity of degree k gives
f (tx) = t^k f (x1 , · · · , xn ).
Then, applying the Chain Rule to the left hand side, we have
(14.5) d f (tx)/dt = ∂ f (tx)/∂x1 · x1 + · · · + ∂ f (tx)/∂xn · xn ,
while differentiating the right hand side with respect to t gives
(14.6) d f (tx)/dt = k t^(k−1) f (x1 , · · · , xn ).
Take t = 1 to complete the proof. □
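Euler's Theorem can be verified numerically with finite-difference partials. The Python sketch below uses the Cobb-Douglas function f (x1 , x2 ) = x1^0.3 x2^0.5, homogeneous of degree k = 0.8 (the function, the evaluation point and the step size are our own choices):

```python
# Numeric check of Euler's Theorem: x . grad f(x) = k f(x).
def f(x1, x2):
    return x1**0.3 * x2**0.5

x1, x2, k = 2.0, 3.0, 0.8
h = 1e-6
d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)   # central-difference partials
d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

euler = x1 * d1 + x2 * d2
assert abs(euler - k * f(x1, x2)) < 1e-6
```

Here the identity is exact (x1 ∂1 f = 0.3 f and x2 ∂2 f = 0.5 f ), so the only error comes from the finite-difference approximation.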

Following is a converse of the Euler Theorem.

Theorem 14.3 (Converse of Euler’s Theorem). Suppose f : Rn+ → R is a continuous function on Rn+ and continuously differentiable on Rn++ . Also suppose,
x1 · ∂ f (x1 , · · · , xn )/∂x1 + · · · + xn · ∂ f (x1 , · · · , xn )/∂xn = k f (x)
for all x ∈ Rn++ . Then, f is homogeneous of degree k.

A useful geometric property of homogeneous functions is as follows. Let f (x) be a homogeneous function of degree one and consider the level set f (x) = 1. In producer theory, the function f could be a constant returns to scale production function and the level sets would then be the iso-quants. Let x be a point on the iso-quant f (x) = 1. If we scale the point x by a factor r along the ray joining x and the origin, we obtain a point z = rx on the iso-quant f (z) = r.

Similarly, if the function f is homogeneous of degree k, then scaling points on the iso-quant q = 1 by a factor r along the ray joining x and the origin generates the iso-quant q = r^k , since f (rx) = r^k f (x) = r^k as f (x) = 1. Thus the level sets of a homogeneous function are radial expansions and contractions of each other. This observation leads to following consequence.

Theorem 14.4. Suppose f : Rn+ → R is a homogeneous function which is continuously differen-


tiable on Rn++ . Then, the tangent planes of the level sets of f have constant slope along each ray
from the origin.

14.2. Homothetic Functions

Definition 14.2. A function f : Rn+ → R is homothetic function if it is a monotone transformation


of a homogeneous function.

Thus if there is a monotone transformation, g : R → R and a homogeneous function h : Rn+ → R


such that f (x) = g(h(x)) holds for all x in the domain, then f is a homothetic function.

The function f (x, y) = (xy)³ + xy is homothetic, as h(x, y) = xy is a homogeneous function of degree 2 and g(z) = z³ + z is a monotone transformation of z = xy.

Following theorem characterizes the homothetic function.


Theorem 14.5. Suppose f : Rn+ → R be a strictly monotonic function. Then, f is homothetic if
and only if for all x and y in Rn+ ,
f (x) ≥ f (y) ⇔ f (θx) ≥ f (θy) for all θ > 0.

Following theorem provides a necessary condition for a function to be homothetic in terms of


the partial derivatives.
Theorem 14.6. Suppose f : Rn+ → R is continuously differentiable on Rn++ . If f is homothetic then the tangent planes to the level sets of f are constant along rays from the origin; in other words, for every i and j and for every x in Rn++ ,
(14.7) [ ∂ f (tx)/∂xi ] / [ ∂ f (tx)/∂x j ] = [ ∂ f (x)/∂xi ] / [ ∂ f (x)/∂x j ] for all t > 0.

The converse of this theorem is also true and is stated here for the sake of completeness.
Theorem 14.7. Suppose f : Rn+ → R is continuously differentiable on Rn++ . If (14.7) holds for all
x in Rn++ , for every i and j and for all t > 0, then f is homothetic.
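Condition (14.7) can be tested numerically for the homothetic function f (x, y) = (xy)³ + xy from earlier in this section. In this Python sketch the base point and the scale factors t are our own choices, and the partials are computed by hand:

```python
# Check (14.7): the ratio of partials is constant along the ray t -> (t x, t y).
def grad(x, y):
    # partials of f(x, y) = (xy)^3 + xy, by hand
    fx = 3 * (x * y) ** 2 * y + y
    fy = 3 * (x * y) ** 2 * x + x
    return fx, fy

x, y = 1.2, 0.5
fx0, fy0 = grad(x, y)
for t in (0.5, 2.0, 7.0):
    fx, fy = grad(t * x, t * y)
    assert abs(fx / fy - fx0 / fy0) < 1e-12
```

For this f the ratio simplifies algebraically to y/x, which is obviously unchanged along rays; the numeric loop simply confirms it.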
Chapter 15

Separating Hyperplane
Theorem

15.1. Separation by hyperplanes

Given p ∈ Rn and a ∈ R, we let
[p ≥ a]
denote the set
{x ∈ Rn : p · x ≥ a} ,
where p · x is the Euclidean inner product p1 x1 + · · · + pn xn .

A hyperplane in Rn is a set of the form [p = a], where p ̸= 0. We can visualize the vector p ∈ Rn as a vector normal (orthogonal) to the hyperplane at each point. The hyperplane does not change when we multiply both the vector p and the real number a by a non-zero scalar α.
Example 15.1. Consider p = (1, 2) ∈ R² and a = 4. Then the hyperplane [p = a] is the set of points in R² on the line x1 + 2x2 = 4, which is a straight line. For p = (1, 2, 3) ∈ R³ and a = 6, the hyperplane [p = a] is the set of points in R³ on the plane x1 + 2x2 + 3x3 = 6. For p = (4) ∈ R¹ and a = 8, the hyperplane [p = a] is the singleton {2} in R¹ , given by 4x1 = 8, i.e., x1 = 2.

Thus in R² a hyperplane is a straight line, in R³ it is an ordinary plane, and in R¹ it is just a point.


A weak half space or closed half space is a set of the form [p ≥ α] or [p ≤ α]. A strict half
space or open half space is a set of the form [p > α] or [p < α]. We say that a non-zero p, or the
hyperplane [p = α] separates A and B if either
A ⊂ [p ≥ α], and B ⊂ [p ≤ α],
or
B ⊂ [p ≥ α], and A ⊂ [p ≤ α]
holds. We will write p · A ≥ p · B to mean p · x ≥ p · y for all x ∈ A and y ∈ B.

A strict notion of separation we consider is as follows. A non-zero p, or the hyperplane [p = α]


strongly separates A and B if A and B are in disjoint closed half spaces, i.e., there exists some ε > 0,
such that either
A ⊂ [p ≥ α + ε], and B ⊂ [p ≤ α],
or
B ⊂ [p ≥ α + ε], and A ⊂ [p ≤ α]
holds. An equivalent way to state strong separation is that
inf_{x∈A} p · x > sup_{y∈B} p · y, or inf_{y∈B} p · y > sup_{x∈A} p · x.

15.2. Separating Hyperplane Theorem

We state and prove one of the versions of the separating hyperplane theorems.
Theorem 15.1. Let A and B be disjoint non-empty convex subsets of Rn . Let A be compact and B be closed. Then there exists a non-zero p ∈ Rn that strongly separates A and B.

Proof. Let d(x, y) = ∥x − y∥ be the Euclidean distance on Rn . We define the function f : A → R by
f (x) = inf {d(x, y) : y ∈ B} ,
which is the distance from x ∈ A to the set B. We claim that the function f (x) is continuous. Observe
that for any x, x′ ∈ A and y ∈ B, the distance function satisfies the triangle inequality,
( ) ( )
d(x, y) ≤ d x, x′ + d x′ , y ,
and ( )
d(x′ , y) ≤ d x, x′ + d (x, y) .
Thus ( ) ( ) ( )
−d x, x′ ≤ d(x, y) − d x′ , y ≤ d x, x′ ,
( )
where the first inequality is obtained from the triangle inequality for d x′ , y , and the second in-
equality follows from the triangle inequality for d (x, y). Together they imply
( ) ( )
|d(x, y) − d x′ , y | ≤ d x, x′ .

Further,
f (x) ≤ d(x, y) ≤ d(x, x′ ) + d(x′ , y)
for all y ∈ B. Taking a sequence {yn } in B with d(x′ , yn ) → f (x′ ), we get
f (x) ≤ d(x, x′ ) + f (x′ ).
Similarly,
f (x′ ) ≤ d(x′ , y) ≤ d(x, x′ ) + d(x, y)
for all y ∈ B. Taking a sequence {yn } in B with d(x, yn ) → f (x), we get
f (x′ ) ≤ d(x, x′ ) + f (x).
Thus,
( ) ( ) ( )
−d x, x′ ≤ f (x) − f x′ ≤ d x, x′ ,
or
| f (x) − f (x′ )| ≤ d(x, x′ ).
Thus f (x) is a continuous function on A.

Since A is a compact subset of Rn and f (x) is continuous, using Weierstrass Theorem, there
exists x̄ such that f (x) attains its minimum, i.e.,
f (x̄) ≤ f (x), for all x ∈ A.

Next, we claim that there exists ȳ ∈ B such that


f (x̄) = d(x̄, ȳ).
Define a family of sets Bn for each n ∈ N:
Bn = { y ∈ B : d(x̄, y) ≤ f (x̄) + 1/n } .
Each set Bn is a non-empty, closed and convex subset of B, and Bn+1 ⊂ Bn for each n. Further,
f (x̄) = inf {d(x̄, y) : y ∈ Bn },
i.e., if such a ȳ exists, it must be in Bn for each n. Also we note that the set B1 is compact. Since Bn is non-empty, we can choose a sequence yn ∈ Bn , which is a bounded sequence. Then, by the Heine-Borel theorem, there exists a subsequence which converges to some point ȳ. Moreover, since B is convex, the parallelogram law gives, for any y1 , y2 ∈ Bn with midpoint z ∈ B,
d(y1 , y2 )² = 2 d(x̄, y1 )² + 2 d(x̄, y2 )² − 4 d(x̄, z)² ≤ 4 ( f (x̄) + 1/n)² − 4 f (x̄)²,
so that
diam Bn = sup {d(y1 , y2 ) : y1 , y2 ∈ Bn } → 0
as n → ∞. Thus ȳ ∈ Bn for every n, and we have found the ȳ having the desired property.
156 15. Separating Hyperplane Theorem

Choose p = x̄ − ȳ. Since A and B are disjoint, p ̸= 0. Therefore,


∥p∥2 > 0,
implies
∥p∥2 = p · p = p · (x̄ − ȳ) = p · x̄ − p · ȳ > 0,
so p · x̄ > p · ȳ. It still remains to show that
p · ȳ ≥ p · y, for all y ∈ B,
and
p · x̄ ≤ p · x, for all x ∈ A.
We will show the first inequality; the other one can be shown using similar arguments. Consider y ∈ B. Since ȳ minimizes the distance (hence the square of the distance) to x̄ over all points of B, and B is convex, for any point z = ȳ + λ(y − ȳ) (with λ ∈ (0, 1]) on the line segment joining ȳ and y, we have
[d(x̄, z)]2 = (x̄ − z) · (x̄ − z) ≥ (x̄ − ȳ) · (x̄ − ȳ) = [d(x̄, ȳ)]2 .
Observe that
x̄ − z = x̄ − ȳ − λ(y − ȳ) = p − λ(y − ȳ).
Thus
(x̄ − z) · (x̄ − z) = p · p − 2λp · (y − ȳ) + (λ)2 (y − ȳ) · (y − ȳ) ≥ p · p.
This simplifies (after canceling terms on both sides) to
0 ≥ 2p · (y − ȳ) − (λ)(y − ȳ) · (y − ȳ).
Since the inequality holds for all λ ∈ (0, 1], taking the limit λ → 0, we get
0 ≥ p · (y − ȳ),
or
p · ȳ ≥ p · y,
for all y ∈ B. □
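The construction in the proof can be illustrated numerically for two disjoint discs (our own example): the nearest points determine p = x̄ − ȳ, and sampled points confirm the strong separation. A Python sketch:

```python
import numpy as np

# A = closed disc of radius 1 at (0, 0) (compact), B = closed disc of radius 1
# at (4, 0) (closed). Their nearest points lie on the segment of centers:
xbar = np.array([1.0, 0.0])
ybar = np.array([3.0, 0.0])
p = xbar - ybar                       # p = (-2, 0), as in the proof

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
r = np.sqrt(rng.uniform(0, 1, 500))   # uniform samples of the unit disc
disc = np.c_[r * np.cos(theta), r * np.sin(theta)]
A = disc                              # samples of the disc at (0, 0)
B = disc + np.array([4.0, 0.0])       # samples of the disc at (4, 0)

# Strong separation: inf_{x in A} p.x >= p.xbar > p.ybar >= sup_{y in B} p.y
assert (A @ p).min() >= p @ xbar - 1e-9
assert (B @ p).max() <= p @ ybar + 1e-9
assert p @ xbar > p @ ybar
```

The gap p · x̄ − p · ȳ = ∥p∥² = 4 is exactly the quantity that appears at the end of the proof.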


Chapter 16

Problem Set 6

(1) Let the system of equations be


x+ 3y+ z− 2w = 1
2x+ 6y− 2z− 4w = 3
(a) Determine how many variables can be endogenous at any one time and show a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution.
(b) Find an explicit formula for the endogenous variables in terms of the exogenous variables.

(2) Let the system of equations be


−x+ 3y− z+ w = 0
4x− y+ z+ w = 3
7x+ y+ z+ 3w = 6
Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?

(3) Show that the equation x2 − xy3 + y5 = 17 defines y as an implicit function of x in the neighborhood of (x, y) = (5, 2). Then estimate the value of y which corresponds to x = 4.9.

(4) Consider the function f (x, y, z) = x² − y² + z³.
(a) If x = 6 and y = 3, find a value of z which satisfies the equation f (x, y, z) = 0.
(b) Verify whether this equation defines z as an implicit function of x and y near x = 6 and y = 3.
(c) If it does, compute (∂z/∂x)(6,3) and (∂z/∂y)(6,3).
(d) If x increases to 6.1 and y decreases to 2.8, estimate the corresponding change in z.


(5) Consider the profit maximizing firm described in Example 13.2. If p increases by ∆p and
w increases by ∆w, what will be the change in the optimal input amount x?

(6) Consider 3x2 yz + xyz2 = 96 as defining x as an implicit function of y and z around the point
x = 2, y = 3, z = 2.
(a) If y increases to 3.1 and z remains the same at 2, use the Implicit Function Theorem to estimate
the corresponding x.
(b) Use the quadratic formula to solve 3x2 yz + xyz2 = 96 for x as an explicit function of y and
z.
(c) Use the approximation by differentials on the explicit formula to estimate x when y = 3.1
and z = 2.
(d) Which of the two methods is easier?

(7) Let f : R+ → R be homogeneous of degree one. Prove that for all x, y ∈ R+ ,


f (x + y) = f (x) + f (y).

(8) Let f : Rn+ → R be a non-decreasing, quasi-concave and homogeneous of degree one function.
Show that f must be concave on Rn+ .

(9) Let f be a continuous function from Rn+ to R, which is twice continuously differentiable on
Rn++ . Suppose f is homogeneous of degree m, where m is a positive integer ≥ 2. Show that
x′ H f (x)x = m(m − 1) f (x)
for all x ∈ Rn++ where H f (x) is the Hessian of f evaluated at x.
Chapter 17

Unconstrained
Optimization

17.1. Optimization Problem

We call
(17.1) max f (x) , x ∈ D ⊆ Rn ,
or
(17.2) min f (x) , x ∈ D ⊆ Rn ,
where domain D is an open set, unconstrained optimization problems. There are no restrictions on
x within the domain. Furthermore, there are no boundary solutions, because the domain does not
include its boundary (recall the definition of open set). Note max f (x) , x ∈ Rn or min f (x) , x ∈ Rn
are unconstrained optimization problems since Rn is an open set. While solving unconstrained op-
timization problem, we want to use the tools we developed earlier, i.e., find points where ∇ f (x) = 0
and investigate the curvature / shape of the function.
Remark 17.1. An unconstrained optimization problem may not have a solution.
Example 17.1. Let f (x) = x2 . Then,
(17.3) max f (x) , x ∈ R
does not have a solution. See the graph of f (x) = x2 .


Figure 17.1. Graph of x²

Remark 17.2. A minimization problem can always be turned into a maximization problem and
vice versa:
(17.4) min f (x), x ∈ D ⇔ max − f (x), x ∈ D.

We will see several examples of unconstrained optimization in these notes. Also there are
additional exercises in the problem set.

17.2. Maxima / Minima for C 2 functions of n variables

Theorem 17.1. First order necessary condition for local maxima / minima: Let A be an open
set in Rn , and let f : A → R be a continuously differentiable function on A. If function f has local
maximum / minimum at x∗ , then
∇ f (x∗) = 0,
where 0 is an n × 1 null vector.
Remark 17.3. The converse is not true.
Theorem 17.2. Second order necessary condition for local maxima / minima: Let A be an open
set in Rn , and let f : A → R be a twice continuously differentiable function on A.

(a) If function f has local maximum at x∗ then H f (x∗ ) is negative semi-definite.



(b) If function f has local minimum at x∗ then H f (x∗ ) is positive semi-definite.

The first order and second order necessary conditions are useful tools to help us in ruling out
the points where a local maximum or local minimum cannot occur. This narrows down our search
for points where a local maximum or local minimum does occur. Examples below explain this
further.

Example 17.2. Let f : R → R be given by f (x) = 4 − x2 for all x ∈ R. Then A = R is an open


set, and f is a continuously differentiable function on A with f ′(x) = −2x. Consider the point x∗ = 1. Then f ′(x∗) = f ′(1) = −2(1) = −2 ≠ 0. We apply the contrapositive of Theorem 17.1 to conclude that x∗ = 1 is not a point of local maximum of f .

Example 17.3. Let f : R → R be given by f (x) = 4 − 4x + x2 for all x ∈ R. Then A = R is an open


set, and f a twice continuously differentiable function on A. Consider the point x∗ = 2. We can
calculate f ′ (x∗ ) = f ′ (2) = −4 + 2(2) = 0, so the necessary condition of Theorem 17.1 is satisfied.
However this theorem in itself fails to provide any additional information at this stage. In other
words, we cannot conclude from Theorem 17.1 that x∗ = 2 is a point of local maximum. Also, we
cannot conclude from Theorem 17.1 that x∗ = 2 is not a point of local maximum. Theorem 17.2 is
useful at this point. We can calculate
f ′′ (x∗ ) = f ′′ (2) = 2 > 0,
and so the necessary condition of Theorem 17.2 is violated. Consequently, by Theorem 17.2, we
can conclude that x∗ = 2 is not a point of local maximum of f .

It is easy to see that the necessary first and second order conditions are not sufficient.
Example 17.4. Let X = R be the domain and f (x) = x³ − x⁴. Then d f (x)/dx = 3x² − 4x³ and d² f (x)/dx² = 6x − 12x² are both 0 at x = 0. But x = 0 is not a local maximizer for f (x).
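Example 17.4 is easy to confirm numerically. A throwaway sketch (the sample point 10⁻³ is arbitrary): f crosses its value at 0 in both directions, so x = 0 is neither a local maximum nor a local minimum.

```python
# f(x) = x^3 - x^4 has f'(0) = f''(0) = 0, yet x = 0 is neither a local
# maximum nor a local minimum: f changes sign on either side of 0.
f = lambda x: x**3 - x**4

eps = 1e-3
assert f(0.0) == 0.0
assert f(eps) > 0.0    # f exceeds f(0) just to the right: not a local max
assert f(-eps) < 0.0   # f falls below f(0) just to the left: not a local min
```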

Theorem 17.3. Sufficient conditions for local maxima / minima: Let A be an open set in Rn , and
let f : A → R be a twice continuously differentiable function on A.

(a) If x∗ ∈ A is such that H f (x∗ ) is negative definite and ∇ f (x∗ ) = 0 then f has local maximum
at x∗ .

(b) If x∗ ∈ A is such that H f (x∗ ) is positive definite and ∇ f (x∗ ) = 0 then f has local minimum at
x∗ .

It should be noted that the sufficient condition in Theorem 17.3 cannot be weakened to the
necessary condition in the statement of Theorem 17.2. The following example explains this point.

Example 17.5. Let f : R → R be given by f (x) = x3 for all x ∈ R. Then A = R is an open set, and
f is a twice continuously differentiable function on A. At x∗ = 0,
f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0,
so first order necessary condition and second order necessary condition are satisfied. But x∗ is
clearly not a point of local maximum of f since f is an increasing function on A.

It may also be observed that the second order necessary condition in Theorem 17.2 cannot be
strengthened to the sufficient condition in the statement of Theorem 17.3. The following example
illustrates this point.
Example 17.6. Let f : R → R be given by f (x) = −x4 for all x ∈ R. Then A = R is an open set, and
f is a twice continuously differentiable function on R. Clearly, x∗ = 0 is a point of local maximum
of f , since f (0) = 0, while f (x) < 0 for all x ̸= 0. We can calculate that
f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0.
Thus the first order necessary condition (in Theorem 17.1) and the second order necessary condition (in Theorem 17.2) are satisfied, but the second order sufficient condition (in Theorem 17.3) is violated.

The above discussion shows that the second-order necessary conditions for a local maximum
are different from (weaker than) the second-order sufficient conditions for a local maximum. This
demonstrates the fact that, in general, the first and second derivatives of a function at a point do not
capture all aspects relevant to the occurrence of a local maximum of the function at that point.
Theorem 17.4. Concavity (convexity) and global maxima (minima): Let A be an open and con-
vex set in Rn , and let f : A → R be a continuously differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is concave on A, then f has global maximum at x∗ .

(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is convex on A, then f has global minimum at x∗ .

This is very easy to show. Note that concavity, along with continuous differentiability of f , implies that for all x ∈ A,
f (x) − f (x∗) ≤ ∇ f (x∗) · (x − x∗).
So f (x) − f (x∗) ≤ 0, i.e., x∗ is a point of global maximum of f on A.
Theorem 17.5. Let A be an open and convex set in Rn , and let f : A → R be a twice continuously
differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is negative semi-definite for all x ∈ A, then f
has global maximum at x∗ .

(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is positive semi-definite for all x ∈ A, then f has
global minimum at x∗ .

It is worth noting that Theorem 17.4 or Theorem 17.5 might be applicable in cases where Theorem 17.3 is not applicable, as the following example shows.

Example 17.7. Let f : R → R be given by f (x) = −x4 . Here, we note that f ′ (0) = 0 and f ′′ (x) =
−12x2 ≤ 0 for all x ∈ R. Thus we can apply Theorem 17.4 or Theorem 17.5 and conclude that
x = 0 is a point of global maximum, and hence also a point of local maximum. But the conclusion
that x = 0 is a point of local maximum cannot be derived from Theorem 17.3, since f ′′ (0) = 0.

Now we explain the steps in applying these theorems via several examples.

Example 17.8. Consider X = R2+ and f (x) = x1 x2 − 2x1⁴ − x2². The optimization exercise is to maximize the objective function f (x) by choosing x ∈ X. The two first order conditions are
x2 − 8x1³ = 0, and x1 − 2x2 = 0.
Solving the second equation for x1 , we have x1 = 2x2 . Substituting this into the first equation, we
have x2 − 64x23 = 0, which has three solutions:
x2 = 0, 1/8, and −1/8.
Then the first order conditions have three solutions,
(x1, x2) = (0, 0), (1/4, 1/8), and (−1/4, −1/8),
but the last of these is not in the domain of f , and the first is on the boundary of the domain, giving f (0, 0) = 0. Thus, we have a unique solution in the interior of the domain:
(x1∗, x2∗) = (1/4, 1/8).
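The first order conditions of Example 17.8 can be verified in exact arithmetic. A minimal sketch using Python's fractions module (nothing here is assumed beyond the example's own data):

```python
from fractions import Fraction as F

# f(x1, x2) = x1*x2 - 2*x1^4 - x2^2 on R^2_+.
def f(x1, x2):
    return x1 * x2 - 2 * x1**4 - x2**2

def grad(x1, x2):
    return (x2 - 8 * x1**3, x1 - 2 * x2)

x_star = (F(1, 4), F(1, 8))
assert grad(*x_star) == (0, 0)     # first order conditions hold exactly
assert f(*x_star) == F(1, 128)     # value at the interior candidate
assert f(*x_star) > f(0, 0)        # beats the boundary candidate (0, 0)
```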

Example 17.9. Let us find maxima / minima for f : R3 → R


f (x, y, z) = x² + 2y² + 3z² + 2xy + 2xz.

Step 1 Find ∇ f (x, y, z) and set it equal to zero vector.


∇ f (x, y, z) = [2x + 2y + 2z  4y + 2x  6z + 2x] = [0  0  0].

The only solution is (x, y, z) = (0, 0, 0). So we have one candidate for local maximum or
minimum.

Step 2 Compute H f .
 
H f (x, y, z) =
[ 2 2 2 ]
[ 2 4 0 ]
[ 2 0 6 ].
Note that in this example, H f is independent of (x, y, z), so whatever definiteness property H f has, it holds globally.

Step 3 Determine the curvature. Begin with computing the leading principal minors.
D1 = 2 > 0, D2 = 2 · 4 − 2 · 2 = 4 > 0, and
D3 = 2(24 − 0) − 2(12 − 0) + 2(0 − 8) = 48 − 24 − 16 = 8 > 0.
All leading principal minors are strictly positive ⇒ H f is positive definite ∀ (x, y, z), including (0, 0, 0), which implies that f is strictly convex.

Step 4 Conclude, using Theorem 17.4, that we have a global minimum at (0, 0, 0).
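Steps 2–3 can be cross-checked mechanically. A minimal sketch with NumPy (assuming NumPy is available; the Hessian is the constant matrix from Step 2), using the fact that a symmetric matrix is positive definite iff all leading principal minors — equivalently, all eigenvalues — are positive:

```python
import numpy as np

# Hessian of f(x, y, z) = x^2 + 2y^2 + 3z^2 + 2xy + 2xz (constant in (x, y, z)).
H = np.array([[2.0, 2.0, 2.0],
              [2.0, 4.0, 0.0],
              [2.0, 0.0, 6.0]])

# Leading principal minors D1, D2, D3.
minors = [np.linalg.det(H[:k, :k]) for k in (1, 2, 3)]
assert all(d > 0 for d in minors)          # all positive => positive definite

# Cross-check via eigenvalues: positive definite iff all eigenvalues > 0.
assert np.all(np.linalg.eigvalsh(H) > 0)
```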
Example 17.10. Let us find maxima / minima for f : R2 → R

f (x, y) = −x³ + xy − y³.

Step 1 Find ∇ f (x, y) and set equal to zero vector.


∇ f (x, y) = [−3x² + y  −3y² + x] = [0  0].
There are two solutions: (x, y) = (0, 0) and (x, y) = (1/3, 1/3).

Step 2 Compute H f .
H f (x, y) =
[ −6x  1 ]
[ 1  −6y ]
⇒ H f (1/3, 1/3) =
[ −2  1 ]
[ 1  −2 ]
and H f (0, 0) =
[ 0  1 ]
[ 1  0 ].

Step 3 Determine the curvature. For (1/3, 1/3), the leading principal minors are
D1 = −2 < 0, D2 = 3 > 0 ⇔ H f (1/3, 1/3) is negative definite.
For (0, 0), the first order principal minors are D1 = 0, 0 and D2 = −1 < 0 ⇒ H f (0, 0) is neither negative semi-definite nor positive semi-definite.

Step 4 Then Theorem 17.3 on sufficient conditions applies and we have a strict local maximum at (1/3, 1/3). The contrapositive of the second order necessary conditions (Theorem 17.2) shows that (0, 0) is neither a point of local maximum nor of local minimum. It is a saddle point.
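An equivalent way to carry out Step 3 is to inspect the eigenvalues of the Hessian at each critical point. A sketch with NumPy (assuming NumPy is available; the Hessian is the one computed in Step 2):

```python
import numpy as np

def hessian(x, y):
    # H_f for f(x, y) = -x^3 + xy - y^3
    return np.array([[-6.0 * x, 1.0], [1.0, -6.0 * y]])

# At (1/3, 1/3): both eigenvalues negative => negative definite => strict local max.
ev_max = np.linalg.eigvalsh(hessian(1/3, 1/3))
assert np.all(ev_max < 0)

# At (0, 0): eigenvalues of opposite sign => indefinite => saddle point.
ev_saddle = np.linalg.eigvalsh(hessian(0.0, 0.0))
assert ev_saddle[0] < 0 < ev_saddle[1]
```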
Example 17.11. Let us find maxima / minima for f : R2 → R
f (x, y) = 2x³ + xy² + 5x² + y².

Step 1 Find ∇ f (x, y) and set equal to zero vector.


∇ f (x, y) = [6x² + y² + 10x  2xy + 2y] = [0  0].
From 2xy + 2y = 0, y = 0 ∨ x = −1;
for x = −1, 6x² + y² + 10x = y² − 4 = 0 ⇒ y = 2 ∨ y = −2;
for y = 0, 6x² + y² + 10x = 6x² + 10x = 0 ⇒ x = 0 ∨ x = −5/3.
There are four solutions:
(x, y) = (0, 0), (−1, 2), (−1, −2), and (−5/3, 0).

Step 2 Compute H f .
H f =
[ 12x + 10  2y ]
[ 2y  2x + 2 ].

Step 3
H f (0, 0) =
[ 10  0 ]
[ 0  2 ],  D1 = 10 > 0, D2 = 20 > 0
⇒ H f (0, 0) is positive definite.
H f (−1, 2) =
[ −2  4 ]
[ 4  0 ],  first order principal minors: −2 < 0 and 0; D2 = −16 < 0
⇒ H f (−1, 2) is neither positive semi-definite nor negative semi-definite.
H f (−1, −2) =
[ −2  −4 ]
[ −4  0 ],  first order principal minors: −2 < 0 and 0; D2 = −16 < 0
⇒ H f (−1, −2) is neither positive semi-definite nor negative semi-definite.
H f (−5/3, 0) =
[ −10  0 ]
[ 0  −4/3 ],  D1 = −10 < 0, D2 = 40/3 > 0
⇒ H f (−5/3, 0) is negative definite.

Step 4 Then Theorem 17.3 on sufficient conditions applies at (0, 0) and (−5/3, 0): we have a strict local minimum at (0, 0) and a strict local maximum at (−5/3, 0). The contrapositive of the second order necessary conditions (Theorem 17.2) implies that neither a local maximum nor a local minimum exists at (−1, 2) or (−1, −2). They are saddle points.
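All four candidates of Example 17.11 can be checked at once. A sketch with NumPy (assuming NumPy is available; the gradient and Hessian are the ones computed in Steps 1–2):

```python
import numpy as np

def grad(x, y):
    return np.array([6 * x**2 + y**2 + 10 * x, 2 * x * y + 2 * y])

def hessian(x, y):
    return np.array([[12 * x + 10, 2 * y], [2 * y, 2 * x + 2]], dtype=float)

points = [(0.0, 0.0), (-1.0, 2.0), (-1.0, -2.0), (-5/3, 0.0)]
for pt in points:
    assert np.allclose(grad(*pt), 0)       # every candidate satisfies the FOCs

assert np.all(np.linalg.eigvalsh(hessian(0, 0)) > 0)      # strict local min
assert np.all(np.linalg.eigvalsh(hessian(-5/3, 0)) < 0)   # strict local max
lo, hi = np.linalg.eigvalsh(hessian(-1, 2))
assert lo < 0 < hi                                         # saddle point
```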

17.3. Application: Ordinary Least Square Analysis

We describe a nice application of the unconstrained optimization technique in the determination of


regression coefficients in the method of ordinary least squares.

Suppose there are n points (xi , yi ), i = 1, · · · , n in R2 . Let f : R → R be given by f (x) = ax + b


for all x ∈ R. Our objective is to find a function f (i.e., we want to choose a ∈ R and b ∈ R) such
that the quantity

n
(17.5) ∑ [ f (xi ) − yi ]2
i=1

is minimized. Thus the coefficients are such that the sum of the squares of the residuals (error
terms, i.e., the difference between the estimates and the actual observations) is minimized.

We can set up the problem as an unconstrained maximization problem as follows.

Define f : R2 → R by
n
f (a, b) = − ∑ [axi + b − yi ]2
i=1

The maximization problem then is

max f (a, b).


(a,b)

The function f is twice continuously differentiable on R2 (being a polynomial function), and we


can calculate
n n
f1 = −2 ∑ [axi + b − yi ]xi = −2 ∑ [axi2 + bxi − xi yi ]
i=1 i=1
n
f2 = −2 ∑ [axi + b − yi ]
i=1
n
f11 = −2 ∑ xi2 ;
i=1
n
f12 = −2 ∑ xi ;
i=1
n
f21 = −2 ∑ xi ;
i=1
f22 = −2n

The Hessian matrix, H f , is
H f (a, b) =
[ −2 ∑ni=1 xi²  −2 ∑ni=1 xi ]
[ −2 ∑ni=1 xi  −2n ].

The principal minors of order one for the Hessian are f11 = −2 ∑ni=1 xi² < 0 (assuming not all the xi are zero) and f22 = −2n < 0. We also need the determinant of the principal minor of order two to be non-negative. Thus, the
determinant of the Hessian of f is
det(H f (a, b)) = 4n ∑ni=1 xi² − 4 (∑ni=1 xi)².

Recall the Cauchy-Schwarz inequality,

|x · y| ≤ ∥x∥ · ∥y∥.

Taking x = (x1, · · · , xn) and the vector of ones u = (1, · · · , 1), the inequality gives
|x · u| ≤ ∥x∥ · ∥u∥,
|x · u|² ≤ ∥x∥² · ∥u∥²,
(∑ni=1 xi)² ≤ (∑ni=1 xi²) · n.

Therefore, det(H f (a, b)) ≥ 0. Since f11 (a, b) ≤ 0, f22 (a, b) ≤ 0, and det(H f (a, b)) ≥ 0, H f (a, b) is
negative semi-definite. Consequently, if (a∗ , b∗ ) satisfies the first-order conditions, then (a∗ , b∗ ) is

a point of global maximum of f . The first-order conditions are


n n n
a ∑ xi2 + b ∑ xi = ∑ xi yi
i=1 i=1 i=1
n n
a ∑ xi + bn = ∑ yi
i=1 i=1
Denoting (∑ni=1 xi)/n by x̄ and (∑ni=1 yi)/n by ȳ (the mean of the xi and the mean of the yi , respectively), the second equation gives
(17.6) a x̄ + b = ȳ.
Using this in the first equation leads to
(17.7) a ∑ni=1 xi² + (ȳ − a x̄) n x̄ = ∑ni=1 xi yi .
Thus,
a = [(∑ni=1 xi yi)/n − x̄ ȳ] / [(∑ni=1 xi²)/n − x̄²],
b = ȳ − a x̄
solves the problem. Note the solution is meaningful provided not all the xi are the same.
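The closed-form coefficients derived above can be coded directly. A minimal sketch (the data points on the line y = 2x + 1 are made up for the illustration; points lying exactly on a line should be recovered exactly):

```python
# Least-squares slope and intercept from the closed-form solution derived above:
# a = (sum(x_i y_i)/n - xbar*ybar) / (sum(x_i^2)/n - xbar^2),  b = ybar - a*xbar.
def ols(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    a = (sum(x * y for x, y in zip(xs, ys)) / n - xbar * ybar) \
        / (sum(x * x for x in xs) / n - xbar**2)
    b = ybar - a * xbar
    return a, b

# Points that lie exactly on y = 2x + 1 should give back a = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]
a, b = ols(xs, ys)
assert abs(a - 2) < 1e-12 and abs(b - 1) < 1e-12
```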

In the next exercise, we provide an alternative proof that the determinant of the Hessian is non-negative.

(a) We will first show that


2αβ ≤ α2 + β2 .
Observe,
(α − β)2 = α2 + β2 − 2αβ ≥ 0
which shows that the desired inequality holds.

(b) Next we show by induction that


(x1 + x2 + · · · + xn )2 ≤ n(x12 + x22 + · · · + xn2 ).
For n = 2, we know
2x1 x2 ≤ x12 + x22
x12 + x22 + 2x1 x2 ≤ 2(x12 + x22 )
(x1 + x2 )2 ≤ 2(x12 + x22 ).

Next, we assume that the claim holds for some k ∈ N and show that it holds for n = k + 1. Let
(x1 + · · · + xk)² ≤ k(x1² + · · · + xk²).
Then
(x1 + · · · + xk + xk+1)² = (x1 + · · · + xk)² + 2(x1 + · · · + xk)xk+1 + xk+1²
≤ k(x1² + · · · + xk²) + 2(x1 xk+1 + · · · + xk xk+1) + xk+1²
≤ k(x1² + · · · + xk²) + (x1² + xk+1²) + · · · + (xk² + xk+1²) + xk+1²
= (k + 1)(x1² + · · · + xk+1²).
Hence, the claim holds true for all n ∈ N.

There is yet another short proof of this inequality. Observe that


4n ∑ni=1 xi² − 4 (∑ni=1 xi)² = 4n ∑ni=1 xi² − 4 (∑ni=1 xi) · (∑ni=1 xi / n) · n
= 4n [ ∑ni=1 xi² − (∑ni=1 xi) · x̄ ],
where x̄ = (∑ni=1 xi)/n. Hence, it is sufficient to show that
∑ni=1 xi² − (∑ni=1 xi) · x̄ ≥ 0.
Note
∑ni=1 xi² − (∑ni=1 xi) · x̄ = ∑ni=1 xi (xi − x̄) = ∑ni=1 (xi − x̄ + x̄)(xi − x̄)
= ∑ni=1 (xi − x̄)² + x̄ · ∑ni=1 (xi − x̄) ≥ 0,
since the first term, being a sum of squares, is non-negative, and the second term is zero because ∑ni=1 (xi − x̄) = 0.
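The inequality just proved is easy to sanity-check numerically. A throwaway sketch (the test vectors are arbitrary):

```python
# Check (x1 + ... + xn)^2 <= n (x1^2 + ... + xn^2),
# with equality exactly when all the x_i are equal.
def lhs_rhs(xs):
    n = len(xs)
    return sum(xs) ** 2, n * sum(x * x for x in xs)

l, r = lhs_rhs([1, 2, 3, 4])
assert l == 100 and r == 120 and l <= r

l, r = lhs_rhs([3, 3, 3])
assert l == r == 81   # equality when all entries coincide
```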
Chapter 18

Problem Set 7

(1) Consider the function g defined for all x ≥ 0, y ≥ 0 by


g(x, y) = x³ + y³ − 3x − 2y.
Write out ∇g(x, y) and Hg (x, y). Show that g is convex in its domain and find its (global)
minimum.

(2) Find all the local maxima and minima of


(18.1) f (x) = x⁴ − 4x³ + 4x² + 4.
Which, if any, of them are global maxima or minima?

(3) A monopolist producing a single output has two types of buyers. If it produces Q1 units for
buyers of type 1, then the buyers are willing to pay a price of 100 − 5Q1 dollars per unit. If it
produces Q2 units for buyers of type 2, then the buyers are willing to pay a price of 50 − 10Q2
dollars per unit. The monopolist’s cost of producing Q units of output is 50 + 10Q. How many units should the monopolist produce to maximize profit?

(4) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of
w, and r for its labor (L), and capital inputs (K), and operates with the production function
Q = LᵃKᵇ.
(a) Write profits as a function of L, and K. Derive the first order conditions. Provide an eco-
nomic interpretation of the first order conditions.
(b) Solve for the optimal levels of L, and K.
(c) Check the second order conditions. What restrictions on the values of a, and b are necessary
for a profit maximum. Provide an economic interpretation of these restrictions.


(d) Find the signs of the partial derivatives of L with respect to P, w, and r.
(e) Derive the firm’s long run supply curve, i.e., Q as a function of the exogenous parameters.
Find the elasticities of supply with respect to w, r, and P. Do these elasticities sum to zero?
Provide an economic explanation for this fact.

(5) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of
w, v, and r for its labor (L), natural resource (R) and capital inputs (K), and operates with the
production function Q = A LᵃKᵇ + ln R.
(a) Write profits as a function of L, R and K. Derive the first order conditions. Provide an
economic interpretation of the first order conditions.
Now take A = 3, a = b = 1/3 for the remainder of the problem.
(b) Check the second order conditions.
(c) [Optional] Solve for L∗ . Find the change in L∗ for a change in r when all other parameters
are constant by taking the partial derivatives of L∗ with respect to r.
(d) [Optional] Find the change in L∗ for a change in v when all other parameters are constant
by taking the partial derivatives of L∗ with respect to v.
(e) [Optional] It is also possible to determine the changes in L∗ when r or v values change
without explicitly solving for L∗ by using the Implicit Function Theorem. You might like
to use a more general version of the Implicit Function Theorem (than what we stated in
class) to complete this exercise.
(i) Find the change in L for a change in r when all other parameters are constant.
(ii) Find the change in L for a change in v when all other parameters are constant.
Chapter 19

Optimization Theory:
Equality Constraints

19.1. Constrained Optimization

The optimization problems we encounter in economics are, in general, constrained problems where
there are some restrictions on the set we can choose x from. Some examples of constrained opti-
mization problems we see are,

Example 19.1. Consumer Theory

max u (x)
(19.1) x
subject to x ∈ B (p, I)

where B (p, I) is the budget set.

Producer Theory

max py − w · x
(19.2) y,x
subject to (y, x) ∈ Y

where
Y = {(y, x) ∈ R × Rn | y ≤ f (x)}

is the production possibility set with f (x) being the production function (one output, many inputs).


We will work with the maximization problem, as it is easy to turn a minimization problem into a maximization problem. A constrained maximization problem has the following form.

max f (x)
x
subject to x ∈ G (x)


where f (x) is called the objective function, x is called the choice variable, and G (x) is called the constraint set.

We assume the objective function to be C 2 so that we can use differential calculus techniques.

Example 19.2. Consider following optimization problem.

max f (x)
(19.3) x
subject to x ∈ [a, b]

A solution to this problem is


(19.4) x∗ ∈ X∗ ⊂ [a, b] ∧ f (x∗) ≥ f (x) ∀x ∈ [a, b].

The first question to answer is:

Does a solution exist? Note f is continuous (because it is C 2 ) and [a, b] is a non-empty compact
set. We can use Weierstrass Theorem to show existence of a maximum and minimum. Having
shown the existence, there are two possibilities:

(a) The solution is interior, x∗ ∈ (a, b). Then x∗ must also be a local maximum, i.e.,
(19.5) f ′(x∗) = 0 ∧ f ′′(x∗) ≤ 0.

Hence we are able to apply earlier theorems to interior solutions.

(b) We have corner (boundary) solution, i.e. x∗ = a, or x∗ = b or both.


If x∗ = a, then f ′(a) ≤ 0.
If x∗ = b, then f ′(b) ≥ 0.

A sufficient condition for a differentiable function f to have a minimum at an interior point x0


of its domain is that f ′ (x0 ) = 0 and that there exists an interval (a, b) around x0 such that f ′ (x) < 0
in (a, x0) and f ′(x) > 0 in (x0, b). The following example shows that the conditions are not necessary.

Example 19.3. Consider
f (x) = 2x² + x² sin(1/x) for x ≠ 0, and f (0) = 0.
Function f has a minimum at 0, since f (x) ≥ 2x² − x² = x² > 0 = f (0) for all x ≠ 0. We can verify that f ′(0) = 0. However, for x ≠ 0,
f ′(x) = 4x + 2x sin(1/x) − cos(1/x),
which changes sign arbitrarily close to the origin because of the term cos(1/x).
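Example 19.3 can be probed numerically. A sketch (the sample points 1/(2πk) and 1/((2k+1)π) with k = 100 are chosen because cos(1/x) equals +1 and −1 there, respectively):

```python
import math

def f(x):
    return 2 * x**2 + x**2 * math.sin(1 / x) if x != 0 else 0.0

def fprime(x):
    return 4 * x + 2 * x * math.sin(1 / x) - math.cos(1 / x)

# f(x) >= x^2 > 0 = f(0) for x != 0, so 0 is a global minimum.
for x in (0.1, -0.1, 1e-3, -1e-3):
    assert f(x) >= x**2 > 0

# Yet f' keeps changing sign near 0: at x = 1/(2*pi*k), cos(1/x) = 1 and f' < 0;
# at x = 1/((2k+1)*pi), cos(1/x) = -1 and f' > 0.
k = 100
assert fprime(1 / (2 * math.pi * k)) < 0
assert fprime(1 / ((2 * k + 1) * math.pi)) > 0
```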

In general, constrained optimization problems fall into two categories: (a) those with equality constraints and (b) those with inequality constraints. We discuss them next.

19.2. Equality Constraint

In this case the constraint set G (x) is described by k equality constraints.



g1 (x) = 0, · · · , gk (x) = 0, where x ∈ Rn, or,
(19.6) G (x) = {x ∈ Rn | g (x) = 0}.

Note that g (x) = (g1 (x), · · · , gk (x)) is a k-dimensional row vector. The interesting case is k < n, as the following example shows.

Example 19.4. Consider


max f (x), x ∈ R²
subject to
x1 + x2 − 2 = 0 : g1 (x) = 0
(1/3) x1 + x2 − 1 = 0 : g2 (x) = 0.
The only point in the constraint set is (x1, x2) = (3/2, 1/2). Maximizing over this set is trivial. The solitary point in the constraint set is also the solution.

Definition 19.1. A point x∗ ∈ G (x) is a point of local maximum of f subject to the constraint g (x) = 0 if there is δ > 0 such that x ∈ G (x) ∩ B (x∗, δ) implies f (x) ≤ f (x∗).

Definition 19.2. A point x∗ ∈ G (x) is a point of global maximum of f subject to the constraint g (x) = 0 if x∗ solves the problem
max f (x)
subject to g (x) = 0.
Theorem 19.1. Necessary condition for a constrained local maximum (Lagrange Theorem) Let
A ⊆ Rn be open and f : A → R, g : A → Rk be C 1 functions. Suppose x∗ is a point of local
maximum of f subject to the constraint g (x) = 0. Suppose further that ∇g (x∗ ) ̸= 0. Then there is
λ∗ ∈ Rk such that
(19.7) ∇ f (x∗) = λ∗ ∇g (x∗).
Remark 19.1. The condition ∇g (x∗ ) ̸= 0 is called constraint qualification.

It is important to check the constraint qualification condition ∇g(x∗ ) ̸= 0, for applying the
conclusion of Lagrange’s theorem. Without this condition, the conclusion of Lagrange’s theorem
would not be valid, as the following example shows.
Example 19.5.

Let f : R2 → R be given by
f (x1 , x2 ) = 4x1 + 3x2 for all (x1 , x2 ) ∈ R2 ;
and let g : R2 → R be given by
g(x1, x2) = x1² + x2².
Consider the constraint set C = {(x1 , x2 ) ∈ R2 : g(x1 , x2 ) = 0}. The only element of this set is (0,0),
so (x1∗ , x2∗ ) = (0, 0) is a point of local maximum of f subject to the constraint g(x) = 0. Observe that
the conclusion of Lagrange’s theorem does not hold here. For, if it did, there would exist λ∗ ∈ R
such that
∇ f (0, 0) = λ∗ ∇g(0, 0)
But this means that
(4, 3) = λ∗ (0, 0)
which is a contradiction. The problem here is that
∇g(x1∗ , x2∗ ) = ∇g(0, 0) = (0, 0),
so the constraint qualification condition is violated.

In the next Theorem, we use notation C to denote the constraint set, i.e.,
C = {x ∈ Rn : g(x) = 0}.

Theorem 19.2. Sufficient Conditions for a Global maximum: Let A ⊆ Rn be an open convex set
and f : A → R, g : A → Rk be C 1 functions. Suppose (x∗ , λ∗ ) ∈ C × Rk satisfies
(19.8) ∇ f (x∗) = λ∗ ∇g (x∗).
If L (x, λ∗ ) = f (x) − λ∗ · g (x) is concave in x on A, then x∗ is a point of global maximum of f
subject to constraint g (x) = 0.

Proof. Let x ∈ C. Then,


L (x, λ∗ ) − L (x∗ , λ∗ ) ≤ [∇ f (x∗ ) − λ∗ ∇g(x∗ )] · (x − x∗ )
by concavity of L in x on A. Using the first-order condition, the term on the right hand side of the
inequality [∇ f (x∗ ) − λ∗ ∇g(x∗ )] · (x − x∗ ) is zero and we get
f (x) − λ∗ g(x) = L (x, λ∗ ) ≤ L (x∗ , λ∗ ) = f (x∗ ) − λ∗ g(x∗ ).
Since x ∈ C, and x∗ ∈ C, we have g(x) = g(x∗ ) = 0. Thus, f (x) ≤ f (x∗ ), and so x∗ is a point of
global maximum of f subject to the constraint g(x) = 0. 

We use the following steps to solve the optimization problem with equality constraint. Let f
and gi , i = 1, · · · , k, be C 1 functions.

Necessity Route:

Step 1 Existence of solution can be shown by using Weierstrass Theorem. For this we need to
show that the constraint set is closed and bounded.

Step 2 Define the Lagrangian function as


L (x, λ) = f (x) − λ · g(x) = f (x) − λ1 g1 (x) − · · · − λk gk (x)
where λi , i = 1, · · · , k are Lagrange multipliers.

Step 3 Take the partial derivative with respect to each variable x1 , · · · xn , and Lagrange multipliers
λ1 , · · · , λk .

Step 4 Solve the following equations:


∂L (x, λ)
= 0, i = 1, · · · , n;
∂xi
∂L (x, λ)
= 0, i = 1, · · · , k.
∂λi
These are n + k first order conditions (FOCs) for n + k unknowns.

Figure 19.1. Constraint Set x2 = 1 − 0.5 · x1

Step 5 Let
M = {(x, λ) ∈ Rn+k | x satisfies gi (x) = 0, i = 1, · · · , k, and the FOCs hold}.

Verify that ∇g (x∗ ) ̸= 0 holds at each point in the set M. Then evaluate f at each (x, λ) ∈ M
and find the maximum.

Sufficiency Route: We know that if f and λ1 g1 (x) , · · · , λk gk (x) are such that L (x, λ) is con-
cave, then the FOCs are sufficient for a maximum. Hence if we can show concavity, then any point
satisfying the FOC will be a solution. We illustrate the use of the two routes through following
examples.
Remark 19.2. Note that if f is not concave, we have to compare the values of f at the points in M.
Example 19.6.
max f (x1, x2) = −x1² − x2²
x ∈ R2+
subject to 5x1 + 10x2 = 10

The constraint set consists of the points with x2 = 1 − 0.5x1 and non-negative values of x1 and x2. To get the constraint in g (x) = 0 form, we rearrange it as
5x1 + 10x2 − 10 = 0.

Necessity Route Constraint set is non-empty as (2, 0) is contained in it.

Constraint set is closed. Take any convergent sequence {xⁿ} in G (x) with xⁿ → x̄. Since 5x1ⁿ + 10x2ⁿ − 10 = 0, x1ⁿ ≥ 0, x2ⁿ ≥ 0, ∀n ∈ N, and weak inequalities are preserved in the limit,
5x̄1 + 10x̄2 − 10 = 0, x̄1 ≥ 0, x̄2 ≥ 0.
So x̄ ∈ G (x).

Constraint set is bounded. Note that for all x ∈ G (x),
x1 ≤ 2 and x2 ≤ 1 ⇒ ∥x∥ ≤ ∥(2, 1)∥ = √(2² + 1²) = √5.
So √5 will serve as a bound. The constraint set is thus compact and non-empty and the objective function f is continuous, hence the Weierstrass theorem is applicable and a solution exists.

The Lagrangian and the FOCs are


L (x, λ) = −x1² − x2² − λ(5x1 + 10x2 − 10)
∂L (x, λ)
= −2x1 − 5λ = 0
∂x1
∂L (x, λ)
= −2x2 − 10λ = 0
∂x2
∂L (x, λ)
= −(5x1 + 10x2 − 10) = 0.
∂λ
Now from the first two FOCs,
4x1 = 2x2 ⇔ 2x1 = x2,
and from the third FOC,
5x1 + 20x1 − 10 = 0,
so x1 = 10/25 = 2/5, x2 = 4/5, λ = −4/25.
We get a candidate for solution:
m1 = (2/5, 4/5, −4/25).
Since we know a solution exists, it must necessarily be either m1 or one of the corners (2, 0) or
(0, 1). The constraint qualification
∇g (x∗) = [5  10] ≠ 0
is verified trivially.

We also see that
f (2, 0) = −4, f (0, 1) = −1, f (2/5, 4/5) = −4/5.

The solution then is x∗ = (2/5, 4/5).

Sufficiency Route
∇ f (x) = [−2x1  −2x2]
H f (x) =
[ −2  0 ]
[ 0  −2 ]
D1 = −2 < 0, D2 = 4 > 0.
So H f (x) is negative definite ∀x, and hence f is concave. The constraint g(x) is concave as it is linear. Also −λ > 0. Then f (x) − λ g (x) is concave as a sum of concave functions. Then we know that the FOCs are sufficient for a maximum. So the point x∗ = (2/5, 4/5) is our solution.
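The candidate found above can be verified in exact arithmetic. A minimal sketch with Python's fractions module (nothing here is assumed beyond the example's own FOCs and corner points):

```python
from fractions import Fraction as F

# Candidate from the FOCs: (x1, x2, lambda) = (2/5, 4/5, -4/25).
x1, x2, lam = F(2, 5), F(4, 5), F(-4, 25)

# The three FOCs of L = -x1^2 - x2^2 - lam*(5x1 + 10x2 - 10):
assert -2 * x1 - 5 * lam == 0
assert -2 * x2 - 10 * lam == 0
assert 5 * x1 + 10 * x2 - 10 == 0

# The candidate beats both corners of the constraint set.
f = lambda a, b: -a**2 - b**2
assert f(x1, x2) == F(-4, 5)
assert f(x1, x2) > f(2, 0) == -4
assert f(x1, x2) > f(0, 1) == -1
```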

Example 19.7. (Non-concave objective function)


max f (x1, x2) = x1² x2
subject to 2x1² + x2² = 3.

The constraint set is an ellipse and can be rewritten as 3 − 2x1² − x2² = 0. Here the sufficiency route will not work, as the objective function is not concave:
H f (x) =
[ 2x2  2x1 ]
[ 2x1  0 ]
D1 = 2x2, D2 = −4x1² < 0 ∀ x1 ≠ 0,
which means that H f (x) is indefinite. So f is not concave. Hence we have to use the necessity route.

Constraint set is non-empty as (1, 1) is contained in it.


Constraint set is closed. Take any convergent sequence {xⁿ} in G (x) with xⁿ → x̄. Since 2(x1ⁿ)² + (x2ⁿ)² = 3 ∀n ∈ N, and the equality is preserved in the limit,
2(x̄1)² + (x̄2)² = 3.
So x̄ ∈ G (x).

Constraint set is bounded. Note that for all x ∈ G (x),
|x1| ≤ √(3/2) < √3 and |x2| ≤ √3,
so ∥x∥ ≤ ∥(√3, √3)∥ = √(3 + 3) = √6. So the constraint set is compact and non-empty and the objective function f is continuous, hence the Weierstrass theorem is applicable and a solution exists.

The Lagrangian and the FOCs are


L (x, λ) = x1² x2 − λ(3 − 2x1² − x2²)
∂L (x, λ)
= 2x1 x2 + 4λx1 = 0
∂x1
∂L (x, λ)
= x1² + 2λx2 = 0
∂x2
∂L (x, λ)
= −(3 − 2x1² − x2²) = 0.
∂λ
Now
2x1 (x2 + 2λ) = 0 ⇔ x1 = 0 ∨ λ = −x2/2.
Case (i)
x1 = 0, x2 = ±√3, λ = 0.
We get two candidates for solution:
m1 = (0, √3, 0), m2 = (0, −√3, 0).

Case (ii)
λ = −x2/2 → x1² − x2² = 0 → x1 = x2 ∨ x1 = −x2.
Substituting into 3 − 2x1² − x2² = 0 gives x1 = 1 ∨ x1 = −1. If
x1 = 1 → x2 = 1 ∨ x2 = −1, λ = −1/2 ∨ λ = 1/2.
Similarly for x1 = −1. We get four more candidates for solution:
m3 = (1, 1, −1/2), m4 = (1, −1, 1/2),
m5 = (−1, −1, 1/2), m6 = (−1, 1, −1/2).
Thus
M = {m1, m2, · · · , m6}.
The constraint qualification
∇g (x∗) = [−4x1∗  −2x2∗] ≠ 0
holds for each mi ∈ M. Verify that
f (0, √3) = 0 = f (0, −√3),
f (1, 1) = f (−1, 1) = 1,
f (1, −1) = f (−1, −1) = −1.
The solutions then are x = (1, 1) and x = (−1, 1).
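Comparing the objective values at the six candidates is mechanical. A short sketch:

```python
import math

# Evaluate f(x1, x2) = x1^2 * x2 at the six candidates found from the FOCs.
f = lambda x1, x2: x1**2 * x2
candidates = [(0.0, math.sqrt(3)), (0.0, -math.sqrt(3)),
              (1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)]

values = [f(*c) for c in candidates]
best = max(values)
assert best == 1.0
# The maximum is attained at two of the candidates.
argmax = [c for c, v in zip(candidates, values) if v == best]
assert argmax == [(1.0, 1.0), (-1.0, 1.0)]
```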
Example 19.8.
max f (x1 , x2 ) = x1 x2
x∈R2+
subject to x1 + 4x2 = 16 or 16 − x1 − 4x2 = 0.

The Hessian is
H f (x) =
[ 0  1 ]
[ 1  0 ],
which is indefinite for all values of x ∈ R2+. Hence the objective function is not concave.

Observe that x is restricted to R2+ and the equality constraint holds. This constraint set is non-
empty as (0, 4) is contained in it, and compact. A solution to this problem exists as f is continuous
and the constraint set is non empty and compact, hence Weierstrass theorem is applicable.
The Lagrangian and the FOCs are

L (x, λ) = x1 x2 − λ (16 − x1 − 4x2 )
∂L (x, λ)/∂x1 = x2 + λ = 0
∂L (x, λ)/∂x2 = x1 + 4λ = 0
∂L (x, λ)/∂λ = −(16 − x1 − 4x2 ) = 0.
The FOCs will give us interior candidates. We will still need to compare with the corners. Now
x1 = 4x2 → 8x2 = 16 → x2 = 2 and x1 = 8, λ = −x2 = −2.
We get one candidate for solution

m1 = (8, 2, −2) .
The constraint qualification

∇g (x∗ ) = [−1  −4] ≠ 0
is satisfied trivially for m1 . Compare it with the corners (0, 4) , (16, 0) and verify that
f (0, 4) = 0 = f (16, 0) , f (8, 2) = 16.
The solution then is x = (8, 2).
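A quick grid search along the binding constraint confirms this. The sketch below (with an arbitrarily chosen grid step) substitutes x1 = 16 − 4x2 and scans x2 over [0, 4]:

```python
# Grid check of Example 19.8: on the line x1 + 4*x2 = 16 with x1, x2 >= 0,
# maximize f = x1*x2; the candidate from the FOCs is (8, 2) with value 16.
grid = [k / 1000 for k in range(4001)]               # x2 values in [0, 4]
best_val, best_pt = max(((16 - 4 * t) * t, (16 - 4 * t, t)) for t in grid)
print(best_pt, best_val)  # (8.0, 2.0) 16.0
```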
Example 19.9.
max f (x1 , x2 ) = ln x1 + ln x2
x∈R2+
subject to x1 + 4x2 = 16 or 16 − x1 − 4x2 = 0.
Here the necessity route does not work as the objective function is not defined at the corners
of the constraint set, x = (16, 0) or x = (0, 4) as ln y is not defined for y = 0. Weierstrass Theorem
cannot be applied. Let us use the sufficiency route. Since ln is not defined at the corners, the
problem can be modified as follows
max f (x1 , x2 ) = ln x1 + ln x2
x∈R2++
subject to 16 − x1 − 4x2 = 0.
The Lagrangian and the FOCs are

L (x, λ) = ln x1 + ln x2 − λ (16 − x1 − 4x2 )
∂L (x, λ)/∂x1 = 1/x1 + λ = 0 → λx1 = −1
∂L (x, λ)/∂x2 = 1/x2 + 4λ = 0 → 4λx2 = −1
∂L (x, λ)/∂λ = −(16 − x1 − 4x2 ) = 0.

So x1 = 4x2 from the first two FOCs. Substituting it in the third FOC, we get x1 = 8, x2 = 2, λ = −1/8.
The Hessian is

H f (x) = [ −1/x1²     0
              0     −1/x2² ]

D1 = −1/x1² < 0, D2 = 1/(x1²x2²) > 0, ∀ x ∈ R2++ .
Hence H f (x) is negative definite ∀x ∈ R2++ , so f is concave. Also g (x) = 16 − x1 − 4x2 is linear,
hence concave. Lastly −λ > 0. So L (x, λ) is concave and the FOCs are sufficient for maximum.
Hence x∗ = (8, 2) is the solution.
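The same grid-search sanity check works here (a hypothetical sketch with an arbitrary grid step), since the constraint is the same line as in Example 19.8:

```python
import math

# Grid check of Example 19.9: maximize ln(x1) + ln(x2) on the line
# x1 + 4*x2 = 16 with x1, x2 > 0; the claimed solution is (8, 2).
grid = (k / 1000 for k in range(1, 4000))            # x2 in (0, 4), endpoints excluded
best_val, best_pt = max(
    (math.log(16 - 4 * t) + math.log(t), (16 - 4 * t, t)) for t in grid
)
print(best_pt)  # (8.0, 2.0)
```

Excluding the endpoints mirrors the restriction to R2++ where ln is defined.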
Example 19.10. Application: Arithmetic mean-Geometric mean inequality. Consider

(19.9)    max f (a, b) = ab
          (a,b)∈R2+
          subject to a + b = 2.
Note the constraint set C = {(a, b) : a ≥ 0, b ≥ 0, a + b = 2} is non-empty, as (2, 0) is contained in it,
closed since weak inequalities are preserved in the limit, and bounded as

∥(a, b)∥ ≤ ∥(2, 2)∥ = 2√2.

The objective function is continuous. Hence by Weierstrass Theorem a solution exists.
Note that at the solution a > 0, b > 0. Hence we can rewrite the problem as under

max f (a, b) = ab
(a,b)∈R2++
subject to g (a, b) = 2 − a − b = 0.

The Lagrangian and the FOCs are

L (a, b, λ) = ab − λ (2 − a − b)
∂L (a, b, λ)/∂a = b + λ = 0
∂L (a, b, λ)/∂b = a + λ = 0
∂L (a, b, λ)/∂λ = −(2 − a − b) = 0.

Now
a = b → a = b = 1 = −λ.
We get one candidate for solution
m1 = (1, 1, −1) .
The constraint qualification

∇g (x∗ ) = [−1  −1] ≠ 0

is satisfied trivially for m1 . Compare it with the corners (0, 2) , (2, 0) and verify that
f (0, 2) = 0 = f (2, 0) , f (1, 1) = 1.
The solution then is (1, 1). In other words, we have shown that

(19.10)    ab ≤ 1 whenever a ≥ 0, b ≥ 0 and a + b = 2.

Now let x1 > 0, x2 > 0 be arbitrary with
x1 + x2 = x > 0.
Then
2x1 + 2x2 = 2x
2x1 /x + 2x2 /x = 2.
Note that a = 2x1 /x > 0, b = 2x2 /x > 0 and a + b = 2. So we can apply the result shown above:

ab = (2x1 /x) (2x2 /x) ≤ 1
x1 x2 ≤ x²/4 = ((x1 + x2 )/2)²
√(x1 x2) ≤ (x1 + x2 )/2,

which is the Arithmetic mean-Geometric mean inequality.
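The derived inequality can be spot-checked on random positive pairs. A minimal sketch (the sample size and range are arbitrary choices):

```python
import math
import random

# Spot-check of the AM-GM inequality derived above on random positive pairs.
random.seed(0)
pairs = [(random.uniform(0.01, 100.0), random.uniform(0.01, 100.0)) for _ in range(1000)]
# A tiny tolerance absorbs floating-point rounding in sqrt.
am_gm_holds = all(math.sqrt(x1 * x2) <= (x1 + x2) / 2 + 1e-12 for x1, x2 in pairs)
print(am_gm_holds)  # True
```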
Chapter 20
Optimization Theory:
Inequality Constraints
20.1. Inequality Constraint

The more general constrained optimization problem deals with inequality constraints. Note that the
equality constraint g (x) = 0 can be expressed as g (x) ≥ 0 and g (x) ≤ 0.
The constrained maximization problem with which we are concerned is the following:

max f (x)
subject to g j (x) ≥ 0 for j = 1, · · · , m
and x ∈ Rn+ .

We take set X to be a non-empty open subset of Rn containing Rn+ , and f , g j ( j = 1, · · · , m) are
continuously differentiable functions from X to R.

We define the following constraint functions on the domain X:

G j (x) = g j (x) for j = 1, · · · , m, and
Gm+ j (x) = x j for j = 1, · · · , n.
Using these constraint functions, the maximization problem can be rewritten as

max f (x)
subject to G j (x) ≥ 0 for j = 1, · · · , m + n
and x ∈ X.

We define the constraint set C as follows:

C = {x ∈ X : G(x) ≥ 0}

where G(x) = [G1 (x), · · · , Gm+n (x)].
Definition 20.1. Kuhn-Tucker Conditions: Let X be an open set in Rn , and f , G j ( j = 1, · · · , m +
n) be continuously differentiable on X. A pair (x∗ , λ∗ ) in X × R^{m+n}_+ satisfies the Kuhn-Tucker
conditions if

(i) Di f (x∗ ) + ∑_{j=1}^{m+n} λ∗j · Di G j (x∗ ) = 0; i = 1, · · · , n,
(ii) G(x∗ ) ≥ 0 and λ∗ · G(x∗ ) = 0.
Theorem 20.1. Let X be an open set in Rn , and f , G j ( j = 1, · · · , m + n) be continuously differ-
entiable on X. Suppose a pair (x∗ , λ∗ ) ∈ X × R^{m+n}_+ satisfies the Kuhn-Tucker conditions. If X
is convex and f , G j ( j = 1, · · · , m + n) are concave on X, then x∗ is a point of constrained global
maximum.
We illustrate the application of this Theorem through examples. First we take a linear objective
function.
Example 20.1. Solve

max f (x, y) = ax + by
(x,y)∈R2+
subject to p1 x + p2 y ≤ M,

where a, b, p1 , p2 and M are positive parameters. Find a solution to the problem for the following
parameter configurations

(i) a/b > p1 /p2 , (ii) a/b < p1 /p2 ,

using the Kuhn-Tucker sufficiency theorem.
We need to check that all conditions of the Theorem are satisfied.

(i) Let
X = {(x, y) ∈ R2 | x > −1, y > −1}.
Then X is open as its complement

X^C = {(x, y) ∈ R2 | x ≤ −1 or y ≤ −1}

is closed.
(ii) Function f (x, y) is continuous as ax and by are continuous and f (·, ·) is obtained by taking
the sum of two continuous functions.
The functions g1 (x, y) = M − p1 x − p2 y, g2 (x, y) = x, g3 (x, y) = y are linear and hence continuous.
Further fx (x, y) = a, fy (x, y) = b are continuous functions. Hence f , g j ( j = 1, · · · , 3) are
continuously differentiable on X.

(iii) The set X is convex: if (x1 , y1 ), (x2 , y2 ) ∈ X, then

x1 > −1, x2 > −1 → λx1 + (1 − λ) x2 > −1 ∀ λ ∈ (0, 1)
y1 > −1, y2 > −1 → λy1 + (1 − λ) y2 > −1 ∀ λ ∈ (0, 1)
⇒ (λx1 + (1 − λ) x2 , λy1 + (1 − λ) y2 ) ∈ X.
Function f (x, y) is concave as the sum of two concave functions, and g j ( j = 1, · · · , 3) are concave,
being linear functions. Hence for the following problem

max f (x, y) = ax + by
(x,y)∈X
subject to p1 x + p2 y ≤ M, x ≥ 0, y ≥ 0,

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x∗ , y∗ ) , λ∗ ) ∈
X × R3+ that satisfies the Kuhn-Tucker conditions:

(i) Di f (x∗ , y∗ ) + ∑_{j=1}^{3} λ∗j · Di g j (x∗ , y∗ ) = 0; i = 1, 2,
(ii) g(x∗ , y∗ ) ≥ 0 and λ∗ · g(x∗ , y∗ ) = 0.
They are

a − λ1 p1 + λ2 = 0
b − λ1 p2 + λ3 = 0
M − p1 x − p2 y ≥ 0, λ1 (M − p1 x − p2 y) = 0
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.

If λ1 = 0, then a − λ1 p1 + λ2 = 0 → λ2 = −a < 0, which contradicts λ2 ≥ 0. Hence

λ1 > 0 → M − p1 x − p2 y = 0.

So x = y = 0 is ruled out. We now consider the two cases.
Figure 20.1. Case (i): a/b > p1 /p2 : Optimal Consumption Bundle = (M/p1 , 0). (The figure plots the budget line with intercepts M/p1 on the x1 -axis and M/p2 on the x2 -axis.)
Case (i) a/b > p1 /p2 . Consider x > 0, y = 0. Note λ2 = 0, x = M/p1 ,

a/p1 = λ1 , b − (a/p1 ) p2 + λ3 = 0,
λ3 = (a/p1 ) p2 − b = b ((a p2 )/(b p1 ) − 1) > 0,

since a/b > p1 /p2 or (a p2 )/(b p1 ) > 1. Hence

x = M/p1 , y = 0, λ1 = a/p1 , λ2 = 0, λ3 = b ((a p2 )/(b p1 ) − 1) > 0

is a solution.
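For concreteness, the Kuhn-Tucker conditions can be verified at this corner for assumed parameter values. The numbers below are illustrative only (any a, b, p1, p2, M with a/b > p1/p2 would do):

```python
# Hypothetical parameter check for Case (i): with a/b > p1/p2, the corner
# x = M/p1, y = 0 and the multipliers derived above satisfy the KT conditions.
a, b, p1, p2, M = 3.0, 1.0, 1.0, 1.0, 10.0       # assumed: a/b = 3 > 1 = p1/p2
x, y = M / p1, 0.0
l1, l2, l3 = a / p1, 0.0, (a / p1) * p2 - b      # multipliers from Case (i)

stat_x = a - l1 * p1 + l2                        # stationarity in x
stat_y = b - l1 * p2 + l3                        # stationarity in y
slack = M - p1 * x - p2 * y                      # budget slack (binds here)
print(stat_x, stat_y, slack, min(l1, l2, l3))  # 0.0 0.0 0.0 0.0
```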
Case (ii) a/b < p1 /p2 . Consider x = 0, y > 0. Note λ3 = 0, y = M/p2 ,

b/p2 = λ1 , a − (b/p2 ) p1 + λ2 = 0,
λ2 = (b/p2 ) p1 − a = a ((b p1 )/(a p2 ) − 1) > 0,

since a/b < p1 /p2 or 1 < (b p1 )/(a p2 ). Hence

x = 0, y = M/p2 , λ1 = b/p2 , λ2 = a ((b p1 )/(a p2 ) − 1) > 0, λ3 = 0

is a solution.
Figure 20.2. Case (ii): a/b < p1 /p2 : Optimal Consumption Bundle = (0, M/p2 ). (The figure plots the budget line with intercepts M/p1 on the x1 -axis and M/p2 on the x2 -axis.)
Example 20.2. Solve

max f (x, y) = x/(1 + x) + y
(x,y)∈R2+
subject to x + 4y ≤ 16,

using the Kuhn-Tucker sufficiency theorem.
We need to check that all conditions of the Theorem are satisfied.

(i) Let
X = {(x, y) ∈ R2 | x > −1, y > −1}.
Then X is open as its complement

X^C = {(x, y) ∈ R2 | x ≤ −1 or y ≤ −1}

is closed.
(ii) Function f (x, y) is continuous as x, y, 1 + x are continuous, 1 + x > 0, and f (·, ·) is obtained
by taking the quotient of two continuous functions x and 1 + x, with non-vanishing denominator,
and then adding a continuous function. The functions

g1 (x, y) = 16 − x − 4y; g2 (x, y) = x; g3 (x, y) = y

are linear and hence continuous. Further fx (x, y) = 1/(1 + x)², fy (x, y) = 1 are continuous func-
tions. Hence f , g j ( j = 1, · · · , 3) are continuously differentiable on X.
(iii) The set X is convex: if (x1 , y1 ) , (x2 , y2 ) ∈ X, then

x1 > −1, x2 > −1 → λx1 + (1 − λ) x2 > −1 ∀ λ ∈ (0, 1)
y1 > −1, y2 > −1 → λy1 + (1 − λ) y2 > −1 ∀ λ ∈ (0, 1)
→ (λx1 + (1 − λ) x2 , λy1 + (1 − λ) y2 ) ∈ X.

Function f (x, y) is concave as the sum of two concave functions (exercise), and g j ( j = 1, · · · , 3) are
concave, being linear functions. Hence for the following problem

max f (x, y) = x/(1 + x) + y
(x,y)∈X
subject to x + 4y ≤ 16, x ≥ 0, y ≥ 0,
all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x∗ , y∗ ) , λ∗ ) ∈
X × R3+ that satisfies the Kuhn-Tucker conditions. They are

1/(1 + x)² − λ1 + λ2 = 0
1 − 4λ1 + λ3 = 0
16 − x − 4y ≥ 0, λ1 (16 − x − 4y) = 0
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.
If λ1 = 0, then 1 − 4λ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence

λ1 > 0 → 16 − x − 4y = 0

and x = y = 0 is ruled out. There are three remaining cases.
Case (i) x > 0, y = 0. Note λ2 = 0, x = 16,

1/(1 + 16)² = λ1 ; 1 − 4/289 + λ3 = 0
λ3 = −285/289 < 0.

This contradicts λ3 ≥ 0.
Case (ii) x = 0, y > 0. Note λ3 = 0, y = 4,

1/4 = λ1 ; 1 − λ1 + λ2 = 0
1 − 1/4 + λ2 = 0; λ2 = −3/4 < 0.

This contradicts λ2 ≥ 0.
Case (iii) x > 0, y > 0. Note λ2 = 0, λ3 = 0,

1/(1 + x)² = λ1 ;
1 − 4λ1 = 0, λ1 = 1/4 > 0;
(1 + x)² = 4 → x = 1 > 0
16 − x − 4y = 0 → y = 15/4 > 0.

Note that all conditions are satisfied. The Theorem asserts that (1, 15/4) is a global maximum
and therefore solves both problems.
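A grid check on the binding constraint backs this up. The sketch below (grid step chosen arbitrarily) substitutes y = (16 − x)/4:

```python
# Grid check of Example 20.2: on the binding constraint x + 4y = 16, maximize
# f = x/(1+x) + y over x in [0, 16]; the KT candidate is x = 1, y = 15/4.
grid = [k / 1000 for k in range(16001)]
best_val, best_x = max((x / (1 + x) + (16 - x) / 4, x) for x in grid)
print(best_x, best_val)  # 1.0 4.25
```

The maximized value 4.25 equals f (1, 15/4) = 1/2 + 15/4.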
Example 20.3.

In the above example, let the price of good y be p > 0 and income be I > 0. We can redo the
exercise by going over the Kuhn-Tucker conditions again. They are

1/(1 + x)² − λ1 + λ2 = 0
1 − pλ1 + λ3 = 0
I − x − py ≥ 0, λ1 (I − x − py) = 0
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.
If λ1 = 0, then 1 − pλ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence
λ1 > 0 → I − x − py = 0
and x = y = 0 is ruled out because I > 0. There are three remaining cases.
Case (i) x > 0, y = 0. Note λ2 = 0, x = I,

1/(1 + I)² = λ1
1 − p/(1 + I)² + λ3 = 0 → λ3 = p/(1 + I)² − 1.

If p/(1 + I)² − 1 ≥ 0, i.e. p ≥ (I + 1)², then λ3 ≥ 0. So the solution is
(I, 0, 1/(1 + I)², 0, p/(1 + I)² − 1) if p ≥ (I + 1)².
Case (ii) x = 0, y > 0. Note λ3 = 0, y = I/p,

1/p = λ1
1 − λ1 + λ2 = 0 → 1 − 1/p + λ2 = 0
λ2 = 1/p − 1.

If 1/p − 1 ≥ 0 → p ≤ 1, then λ2 ≥ 0. So the solution is (0, I/p, 1/p, 1/p − 1, 0) if p ≤ 1.
Case (iii) x > 0, y > 0. Note λ2 = 0, λ3 = 0,

1/(1 + x)² = λ1 , 1 − pλ1 = 0,
(1 + x)² = p → x = √p − 1 > 0
I − x − py = 0 → y = (I + 1 − √p)/p > 0.

Hence for p > 1 and I + 1 > √p, the solution is (√p − 1, (I + 1 − √p)/p, 1/p, 0, 0).
Combining them, the solution (x∗ , y∗ , λ∗1 , λ∗2 , λ∗3 ) is

(I, 0, 1/(1 + I)², 0, p/(1 + I)² − 1)       if p ≥ (I + 1)²,
(0, I/p, 1/p, 1/p − 1, 0)                   if p ≤ 1, and
(√p − 1, (I + 1 − √p)/p, 1/p, 0, 0)         if 1 < p < (I + 1)².

The Kuhn-Tucker sufficiency theorem asserts that this solution is a global maximum and therefore
solves both problems.
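The piecewise solution can be packaged as a small function. A hedged sketch (the function name `demand` is ours, not the text's):

```python
import math

def demand(p, I):
    """Sketch of the piecewise (x*, y*) solution from Example 20.3."""
    if p >= (I + 1) ** 2:
        return (I, 0.0)                                    # corner: all income on x
    if p <= 1:
        return (0.0, I / p)                                # corner: all income on y
    return (math.sqrt(p) - 1, (I + 1 - math.sqrt(p)) / p)  # interior case

print(demand(4.0, 10.0))   # interior: (1.0, 2.25)
print(demand(0.5, 10.0))   # corner on y: (0.0, 20.0)
```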
20.2. Global maximum and constrained local maximum
We know from the definitions, that if x̂ is a point of global maximum, then x̂ is also a point of local
maximum. The situations under which the converse is true are given by the following theorems.
Theorem 20.2. Suppose A is an open convex set in Rn , and f is a function from A to R.
(a) Suppose x̄ in A is a point of local maximum of f , and f is concave on A. Then x̄ is a point of
global maximum of f on A.

(b) Suppose x̄ in A is a point of local maximum of f , and f is strictly quasi-concave on A. Then x̄
is the unique point of global maximum of f on A.

(c) Suppose x̄ is in A and there is δ > 0 such that
(i) B(x̄, δ) ⊂ A, and
(ii) x̄ is the unique point of maximum of f on B(x̄, δ).
If f is quasi-concave on A, then x̄ is the unique point of global maximum of f on A.
[Note we do not assume the function f to be differentiable on A.]
Proof. We prove these claims by contradiction.
(a) Assume that x̄ is not a global maximum of f on A. Then there exists another point x̂ ∈ A such
that x̂ ̸= x̄ and f (x̂) > f (x̄).
Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all
x ∈ A ∩ B(x̄, δ).
Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,
x = λx̂ + (1 − λ)x̄,
for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. By concavity of f , we have for all
λ ∈ [0, 1]
f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄).
Since f (x̂) > f (x̄), we also have for all λ ∈ (0, 1] that
f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄) > λ f (x̄) + (1 − λ) f (x̄) = f (x̄).
We wish to take λ sufficiently close to zero (but not equal to zero) so that
x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).
For this, let us denote d(x̂, x̄) = d and note
d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.
If we set λ = δ/(2d), then we know
d(x′ , x̄) = λ · d = (δ/(2d)) · d = δ/2,
or x′ ∈ B(x̄, δ).
Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such
that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄
must be a global maximum of f on A.
(b) Assume that x̄ is not a point of global maximum of f on A. Then there exists another point
x̂ ∈ A such that x̂ ̸= x̄ and f (x̂) > f (x̄).
Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all
x ∈ A ∩ B(x̄, δ).
Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,
x = λx̂ + (1 − λ)x̄,
for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is strictly quasi-concave, we
have for all λ ∈ (0, 1)

f (λx̂ + (1 − λ)x̄) > min { f (x̂), f (x̄)} = f (x̄).
We wish to take λ > 0 sufficiently small so that
x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).
For this, let us denote d(x̂, x̄) = d and note
d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.
If we set λ = δ/(2d), then we know
d(x′ , x̄) = λ · d = (δ/(2d)) · d = δ/2,
or x′ ∈ B(x̄, δ).
Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such
that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄
must be a global maximum of f on A.
To show uniqueness, suppose not; then there exists x′′ ∈ A with x′′ ≠ x̄ such that
f (x̄) = f (x′′ ).
But then, since f is strictly quasi-concave and A is convex,

f (0.5x̄ + 0.5x′′ ) > min { f (x̄), f (x′′ )} = f (x′′ ) = f (x̄).
This contradicts the fact that x̄ is a point of global maximum.
(c) Assume that x̄ is not the unique point of global maximum of f on A. Then there exists another
point x̂ ∈ A such that x̂ ≠ x̄ and f (x̂) ≥ f (x̄).
Since x̄ is the unique point of maximum of f on the open ball B(x̄, δ), f (x̄) > f (x) for all
x ∈ B(x̄, δ) with x ≠ x̄.
Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,
x = λx̂ + (1 − λ)x̄,
20.2. Global maximum and constrained local maximum 197

for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is quasi-concave, we have for
all λ ∈ (0, 1)
f (λx̂ + (1 − λ)x̄) ≥ min { f (x̂), f (x̄)} = f (x̄).
We wish to take λ > 0 sufficiently small so that
x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).
For this, let us denote d(x̂, x̄) = d and note
d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.
If we set λ = δ/(2d), then we know
d(x′ , x̄) = λ · d = (δ/(2d)) · d = δ/2,
or x′ ∈ B(x̄, δ).
Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such
that f (x′ ) ≥ f (x̄), which contradicts that x̄ was the unique point of local maximum. It follows
that x̄ must be the unique point of global maximum of f on A.

This theorem shows that there is an important difference between concavity and quasi-concavity
in going from the local maximum property to the global maximum property. With quasi-concavity,
we need something more (some “strictness”) to make the arguments work. In (b), this additional
condition takes the form of strict quasi-concavity. In (c), it takes the form of assuming that the
point of local maximum is unique. This underlying theme (that one needs something in addition
to quasi-concavity to make the arguments and results work) recurs in Arrow-Enthoven's theory
of quasi-concave programming, where the attempt is made to replace the concavity conditions of
Kuhn-Tucker with quasi-concavity.
The following example shows that in Theorem 20.2(a), we cannot replace concavity of f by
quasi-concavity of f , and still preserve the conclusion.
Example 20.4. Let A be the interval (0, 6) in R. Clearly, A is an open, convex set. Let f : A → R
be defined as follows:

f (x) = x        for x ∈ (0, 2)
f (x) = 2        for x ∈ [2, 4]
f (x) = x − 2    for x ∈ (4, 6).

Then f is a non-decreasing function on A, and therefore quasi-concave. The point x̄ = 3 is clearly
a point of local maximum, since f (x̄) = 2 ≥ f (x) for all x ∈ A ∩ B(x̄, 1). However, x̄ is not a point
of global maximum of f on A, since (for example) f (5) = 3 > 2 = f (x̄).
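The example can be checked directly in code; a small sketch:

```python
def f(x):
    # the piecewise function from Example 20.4, defined on A = (0, 6)
    if 0 < x < 2:
        return float(x)
    if 2 <= x <= 4:
        return 2.0
    if 4 < x < 6:
        return x - 2.0
    raise ValueError("x outside (0, 6)")

# x̄ = 3 is a local maximum: f(3) >= f(x) on a neighbourhood of radius ~1 ...
local = all(f(3 + d / 100) <= f(3) for d in range(-99, 100))
# ... but not a global maximum, since f(5) = 3 > 2 = f(3).
print(local, f(5) > f(3))  # True True
```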
The following theorem describes conditions under which a point of constrained local maximum, x̂,
is also a point of constrained global maximum.
Theorem 20.3. Let X be a convex set in Rn . Let f , g j ( j = 1, · · · , m) be concave functions on
X. Suppose x̂ is a point of constrained local maximum. Then, x̂ is a point of constrained global
maximum.
Proof. We prove this by contradiction. We denote the constraint set by

C = {x ∈ X : g j (x) ≥ 0, j = 1, · · · , m}.

Since x̂ is a point of constrained local maximum, there is δ > 0 such that for all x ∈ B(x̂, δ) ∩C, we
have f (x) ≤ f (x̂).
Now, if x̂ is not a point of constrained global maximum, then there is some x̄ ∈ C, such that
f (x̄) > f (x̂). One can choose 0 < θ < 1 with θ sufficiently close to zero, such that
x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ B(x̂, δ).
For this, we need

∥θ x̄ + (1 − θ)x̂ − x̂∥ = θ ∥x̄ − x̂∥ < δ.

This holds if
θ < δ/∥x̄ − x̂∥.
Take
θ = δ/(2∥x̄ − x̂∥),
so that x̃ ∈ B(x̂, δ). Since X is convex and g j ( j = 1, · · · , m) are concave, we claim that C is a convex
set, and x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ C.
Let y ∈ C and y′ ∈ C be two arbitrary points. By definition of the constraint set C, y and y′ are
in X and therefore, ŷ ≡ [λ y + (1 − λ)y′ ] ∈ X for all λ ∈ [0, 1]. Also by concavity of the constraint
functions,
g j (ŷ) = g j (λ y + (1 − λ)y′ ) ≥ λ g j (y) + (1 − λ)g j (y′ ) ≥ λ · 0 + (1 − λ) · 0 = 0,
for all j = 1, · · · , m.
Thus ŷ ∈ C, and therefore C is a convex set.

Therefore, x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ C. Thus

x̃ = [θ x̄ + (1 − θ)x̂] ∈ B(x̂, δ) ∩C.
Also, since f is concave,

f (x̃) = f (θ x̄ + (1 − θ)x̂) ≥ θ f (x̄) + (1 − θ) f (x̂) > θ f (x̂) + (1 − θ) f (x̂) = f (x̂).
But this contradicts the fact that x̂ is a point of constrained local maximum. 
Observe that we did not need to assume that the objective function is differentiable on the
domain X in this proof.
Chapter 21
Problem Set 8
(1) [Cauchy-Schwarz inequality]

Let C = (c1 , c2 , c3 ) be a non-zero vector in R3 . Consider the following constrained maxi-
mization problem:

(21.1)    max ∑_{i=1}^{3} ci xi
          subject to ∑_{i=1}^{3} xi² = 1
          and (x1 , x2 , x3 ) ∈ R3 .

(a) Show, by using Weierstrass theorem, that there exists x̄ ∈ R3 which solves (21.1).
(b) Use Lagrange's theorem to show that

(21.2)    ∑_{i=1}^{3} ci x̄i = ∥C∥.

(c) Let p, q be arbitrary non-zero vectors in Rn . Using the result in (b), show that |p · q| ≤ ∥p∥ · ∥q∥.
Solve the following constrained optimization problems.
(2) Let f : R2 → R.

(21.3)    max f (x, y) = x² − 3xy
          (x,y)∈R2+
          subject to x + 2y = 10.

(3) Let f : R2+ → R.

(21.4)    max f (x, y) = x^{1/3} y^{2/3}
          (x,y)∈R2+
          subject to 2x + y = 4.
(4) Let f : R2+ → R.

(21.5)    max f (x, y) = xy
          subject to x + y ≤ 6, x ≥ 0, y ≥ 0.

(5) Let f : R2+ → R.

(21.6)    max f (x, y) = x + ln(1 + y)
          subject to x ≥ 0, y ≥ 0 and x + py ≤ m.
(6) Let X be a non-empty, convex set in R2 . Let g be a continuous function from X to R, and
let f be a strictly quasi-concave function from X to R. Consider the following constrained
optimization problem:

(21.7)    max f (x)
          subject to g(x) ≥ 0
          and x ∈ X,

and the corresponding optimization problem:

(21.8)    max f (x)
          subject to x ∈ X,

in which the constraint g(x) ≥ 0 has been omitted.

(a) Suppose that x̄ is a solution to (21.8), and g(x̄) > 0. Is x̄ also a solution to problem
(21.7)? Explain.
(b) Suppose that x̄ is a solution to (21.8), but x̄ is not a solution to (21.7). Show that if x̂ is
any solution to (21.7), then we must have g(x̂) = 0.
(7) Suppose that a consumer has the utility function U(x, y) = x^a y^b and faces the budget constraint
px x + py y ≤ I.
(A) Utility Maximization
(a) What are the first order conditions for utility maximization?
(b) Solve for the consumer’s demands for goods x and y.
(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an
increasing, decreasing or constant function of income?
(d) Show that the second order conditions hold.
(e) Show that the value of dx/dI obtained from the implicit function theorem is identical to the
value obtained by taking the partial derivative of x with respect to I.
(f) A consumer's indirect utility function is defined to be utility as a function of prices
and income. Use x∗ and y∗ to solve for the indirect utility function. Is it true that the
partial derivative of the indirect utility function with respect to income equals λ?
(B) Expenditure Minimization:
Now consider the "dual" of the utility maximization problem. The dual problem is to min-
imize expenditures, Px x + Py y, subject to reaching a given level of utility, U0 (the constraint
is therefore U0 − x^a y^b = 0).
(a) What are the first order conditions for expenditure minimization?
(b) Use the first order conditions to solve for x∗ and y∗ (these are called the Hicksian or
compensated demand functions).
(c) Check the second order conditions.
(d) Write the level of income, I, necessary to reach U0 as a function of U0 , prices, and
parameters. How does this expenditure function relate to the indirect utility function?
(e) To avoid confusion, let us call solution for utility maximization of good x as x∗ and
solution for good x in expenditure minimization as h∗ . Prove that
∂x∗ /∂Px = ∂h∗ /∂Px − x∗ · ∂x∗ /∂I.
Interpret this answer.
(8) Suppose a consumer has the utility function U = a ln(x − x0 ) + b ln(y − y0 ) where a, b, x0 and
y0 are positive parameters. Assume that the usual budget constraint applies.
(a) Solve for the consumer’s demand for good x.
(b) Find the elasticities of demand for good x with respect to income and prices.
(c) Show that the utility function U = 45(x − x0 )^{3.5a} (y − y0 )^{3.5b} would have yielded the same
demand for good x.

(9) Optimization with inequality constraints: Rationing.

Suppose a consumer has the utility function

U(x, y, z) = a ln(x) + b ln(y) + c ln(z),
where a > 0, b > 0 and c > 0 are such that a + b + c = 1. The budget constraint is

px + qy + rz ≤ I.

In other words, the prices of good x, y and z are p, q and r respectively and the consumer has
an income I. The prices and income are positive.
In addition, the consumer faces a rationing constraint. He is not allowed to buy more than
k > 0 units of good x.
(a) Solve the optimization problem.
(b) Under what condition on the various parameters, is the rationing constraint binding?
(c) Show that when the rationing constraint binds, the income that the consumer would have
liked to spend on good x but cannot do so now is split between good y and z in proportions
b : c.
(d) Would you expect rationing of bread purchases to affect demand for butter and rice in this
way? If not, how would you expect the bread-butter-rice case to differ from the result in
(c)?
Chapter 22
Envelope Theorem
22.1. Envelope Theorem for Unconstrained Problems
Let f (x, α) be a continuously differentiable function of x ∈ Rn and a parameter α. For each choice
of α, consider the unconstrained maximization problem:
max f (x, α)

where the choice variable is x. It is of interest to us how the maximum value f (x∗ (α), α) changes as
the parameter value α changes.
Theorem 22.1. Let x∗ (α) be a solution of this problem and also assume that x∗ (α) is a continuously
differentiable function of α. Then,

d f (x∗ (α), α)/dα = ∂ f (x∗ (α), α)/∂α.

Proof. We use the Chain Rule to get

d f (x∗ (α), α)/dα = ∑i ∂ f (x∗ (α), α)/∂xi · dxi∗ (α)/dα + ∂ f (x∗ (α), α)/∂α,

or

d f (x∗ (α), α)/dα = ∂ f (x∗ (α), α)/∂α,

since ∂ f (x∗ (α), α)/∂xi = 0 for i = 1, · · · , n by the First Order conditions for the solution. 
Example 22.1. Consider the problem of maximizing the function f (x, a) = −2x² + 2ax + 4a² with
respect to x for any given value of a. What is the effect of a unit increase in the value of a on the
maximum value of f (x, a)?

This can be done directly by computing the x∗ which maximizes f . The first order condition
yields
fx (x, a) = −4x + 2a = 0.
So x∗ = 0.5a. We can plug this into f (x, a), which leads to

f (x∗ (a), a) = f (0.5a, a) = −0.5a² + a² + 4a² = 4.5a².

Observe that f (x∗ (a), a) increases at the rate of 9a as a increases. Alternatively we could apply the
Envelope Theorem to get

d f ∗ /da = ∂ f (x∗ (a), a)/∂a = 2x∗ + 8a = 9a,

since x∗ (a) = 0.5a.
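The envelope result can be confirmed with a finite difference on the maximized value; a minimal sketch (step size h chosen arbitrarily):

```python
# Finite-difference check of Example 22.1: the maximized value is
# f*(a) = 4.5*a^2, which should grow at rate 9a, matching ∂f/∂a at x*(a).
def max_value(a):
    x_star = 0.5 * a                     # interior maximizer from the FOC
    return -2 * x_star ** 2 + 2 * a * x_star + 4 * a ** 2

a, h = 3.0, 1e-6
numeric = (max_value(a + h) - max_value(a - h)) / (2 * h)
print(round(numeric, 3))  # 27.0, i.e. 9*a at a = 3
```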
Example 22.2. Consider the firm's profit maximization problem.

(22.1)    max π = p f (x) − wx.
          x∈R+

Let us denote the input level at which the maximum profit is attained by x∗ . We observe that x∗ is a
function of the parameters p and w. The maximum profit is the value function of this exercise and
we call it the profit function:

π∗ (p, w) = p f (x∗ (p, w)) − wx∗ (p, w).
By the Envelope Theorem,

∂π∗ (p, w)/∂p = f (x∗ (p, w)) > 0.

Thus the profit function is increasing in the price of the output. Also

∂π∗ (p, w)/∂w = −x∗ (p, w) < 0.

So the profit function is decreasing in the price of the input. Further, it also shows that

x∗ (p, w) = −∂π∗ (p, w)/∂w.

The profit maximizing input level can be obtained by taking the negative of the partial derivative of
the profit function with respect to w (a result known as Hotelling's Lemma).
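Hotelling's Lemma can be illustrated with an assumed technology f (x) = √x (our choice, not the text's), for which x∗ and π∗ have closed forms: x∗ (p, w) = (p/2w)² and π∗ (p, w) = p²/(4w).

```python
# Check of Hotelling's Lemma for the assumed technology f(x) = sqrt(x):
# the negative derivative of the profit function in w recovers x*.
p, w, h = 2.0, 0.5, 1e-6
profit = lambda w_: p ** 2 / (4 * w_)            # value function pi*(p, w)
x_star = (p / (2 * w)) ** 2                      # optimal input level
numeric = -(profit(w + h) - profit(w - h)) / (2 * h)
print(x_star, round(numeric, 4))  # 4.0 4.0
```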
22.2. Meaning of the Lagrange multiplier
In this section we will see that the multipliers measure the sensitivity of the optimal value of the
objective function to changes in the right-hand sides (parameters) of the constraints. In this
sense, they provide a natural measure of the value of scarce resources in economic maximization
problems.

Consider a simple maximization problem with two variables and one equality constraint. Let
f : R2 → R be denoted as f (x, y).

(22.2)    max f (x, y)
          (x,y)∈R2+
          subject to h(x, y) = a.
Let (x∗ (a), y∗ (a)) be a solution to the above problem for any given parameter value a. Thus
f (x∗ (a), y∗ (a)) is the corresponding optimal value of the objective function. Let the Lagrange
multiplier be denoted by λ∗ (a). The following theorem shows that λ∗ (a) measures the rate of change
of the optimal value of the objective function f with respect to a.

Theorem 22.2. Let f and h be continuously differentiable functions of two variables. For any fixed
value of the parameter a, let (x∗ (a), y∗ (a)) be the solution of the optimization problem (22.2) with
the corresponding Lagrange multiplier λ∗ (a). Assume that x∗ (a), y∗ (a) and λ∗ (a) are continuously
differentiable functions of a and the constraint qualification holds at (x∗ (a), y∗ (a)). Then,

λ∗ (a) = d f (x∗ (a), y∗ (a))/da.
Proof. The Lagrangian for the problem (22.2) is

L ≡ f (x, y) − λ(h(x, y) − a)

where a is a parameter. The solution of this problem, (x∗ (a), y∗ (a), λ∗ (a)), satisfies the First Order
conditions

∂L /∂x = ∂ f (x∗ (a), y∗ (a))/∂x − λ∗ (a) · ∂h(x∗ (a), y∗ (a))/∂x = 0
∂L /∂y = ∂ f (x∗ (a), y∗ (a))/∂y − λ∗ (a) · ∂h(x∗ (a), y∗ (a))/∂y = 0
∂L /∂λ = −(h(x∗ (a), y∗ (a)) − a) = 0

for all values of a. Also, since h(x∗ (a), y∗ (a)) = a for all a, we get

∂h(x∗ (a), y∗ (a))/∂x · dx∗ (a)/da + ∂h(x∗ (a), y∗ (a))/∂y · dy∗ (a)/da = 1

for all a. Now we can use the Chain Rule and the two First Order conditions:

d f (x∗ (a), y∗ (a))/da
  = ∂ f (x∗ (a), y∗ (a))/∂x · dx∗ (a)/da + ∂ f (x∗ (a), y∗ (a))/∂y · dy∗ (a)/da
  = λ∗ (a) · ∂h(x∗ (a), y∗ (a))/∂x · dx∗ (a)/da + λ∗ (a) · ∂h(x∗ (a), y∗ (a))/∂y · dy∗ (a)/da
  = λ∗ (a) · [∂h(x∗ (a), y∗ (a))/∂x · dx∗ (a)/da + ∂h(x∗ (a), y∗ (a))/∂y · dy∗ (a)/da]
  = λ∗ (a) · 1 = λ∗ (a). 
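The theorem can be illustrated on a small assumed problem (our choice): max xy subject to x + y = a. The solution is x = y = a/2 with value v(a) = a²/4, and the FOC y − λ = 0 gives λ∗ (a) = a/2, which should equal v′(a).

```python
# Numerical illustration of Theorem 22.2 on the assumed problem
# max x*y s.t. x + y = a: the multiplier equals the derivative of the value.
a, h = 6.0, 1e-6
v = lambda a_: (a_ / 2) ** 2          # optimal value function a^2/4
lam = a / 2                           # multiplier from the FOC at the solution
numeric = (v(a + h) - v(a - h)) / (2 * h)
print(lam, round(numeric, 4))  # 3.0 3.0
```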
22.3. Envelope Theorem for Constrained Optimization
The general envelope theorem arises in the case of constrained optimizations where both the ob-
jective function as well as the constraint functions are functions of some parameters. Consider for
example the optimization exercise as follows:

max f (x, a)
subject to g j (x, a) = 0 for j = 1, · · · , m
(22.3) and x∈ Rn+ .

In this case, the objective function f as well as the constraints g1 , · · · , gm depend on the
parameter a. The following theorem shows that the rate of change of f (x∗ (a), a) with respect to a
equals the partial derivative with respect to a, not of f but of the corresponding Lagrangian function
L .
Theorem 22.3. Let f , g1 , · · · , gm be continuously differentiable functions and let
x∗ (a) = (x1∗ (a), x2∗ (a), · · · , xn∗ (a))
denote the solution of the optimization problem (22.3) for any fixed value of the parameter a.
Assume that x∗ (a), and the Lagrange multipliers λ∗1 (a), · · · , λ∗m (a) are continuously differentiable
functions of a and the constraint qualification condition holds. Then,
(22.4)    d f (x∗ (a), a)/da = ∂L (x∗ (a), λ∗ (a), a)/∂a
(22.5)                       = ∂ f (x∗ (a), a)/∂a − λ∗1 · ∂g1 (x∗ (a), a)/∂a − · · · − λ∗m · ∂gm (x∗ (a), a)/∂a.
Chapter 23
Elementary Concepts in
Probability
Probability theory deals with random events: events whose occurrence cannot be predicted with
certainty. There are at least three sources of randomness. First, many features of our world are by
nature stochastic; the evolution of such a diverse variety of life is witness to unpredictability in the
universe and the environment. Second, many events are the result of a very large number of actions
and decisions. Third, some variables may appear random because they are measured with error.

Even though we are not sure about the outcomes of a random event, we can attach to each
outcome a number called its probability.
23.1. Discrete Probability Model
We first describe the set of outcomes of a random event, i.e., a set whose elements are all possible
outcomes of a random event. It is known as the sample space and denoted by Ω.
Example 23.1. The set of possible outcomes of flipping a fair coin is
Ω = {H, T }.
The set of outcomes of rolling a die is

Ω = {1, 2, 3, 4, 5, 6}
where the outcome i means that i appeared on the die, i = 1, 2, 3, 4, 5, 6.

The set of outcomes for flipping two coins is

Ω = {HT, TH, TT, HH}.
It is easy to list the set of outcomes for flipping n coins, but very soon the list becomes too long.
The set of outcomes of rolling two dice is

Ω = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
      (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
      (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
      (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
      (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
      (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }

where the outcome (i, j) is said to occur if i appeared on the first die, and j appeared on the second
die.
The set of outcomes for measuring the lifetime of a car consists of the non-negative real numbers:
Ω = [0, ∞).
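The finite sample spaces above can be generated mechanically. A minimal Python sketch (the names `coin_flips` and `two_dice` are ours, introduced only for illustration):

```python
from itertools import product

# Outcomes of a single coin flip and a single die roll
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

def coin_flips(n):
    """Sample space for flipping n coins: all length-n strings over {H, T}."""
    return ["".join(outcome) for outcome in product(coin, repeat=n)]

# Sample space for rolling two dice: all ordered pairs (i, j)
two_dice = list(product(die, repeat=2))

print(coin_flips(2))   # ['HH', 'HT', 'TH', 'TT']
print(len(two_dice))   # 36
```

As the text observes, listing the outcomes of n flips quickly becomes unwieldy: the list has 2^n entries.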

Next we form a collection F of subsets of Ω that contains the events of interest together with their
unions and complements. Thus if A and B are in F, so are A ∪ B, Ac, and Bc. The collection F, which
is closed under the operations of union and complementation, is known as an algebra.
Example 23.2. The algebra for the outcomes of flipping a fair coin is
F = {∅, Ω, {H}, {T}}.
The algebra for the outcomes of flipping two coins is
F = {∅, Ω, {TT}, {HH}, {HT, TH}, {HH, TT}, {HH, HT, TH}, {TT, HT, TH}}.
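The closure step that produces F from a few generating events can be carried out by brute force: keep adding complements and pairwise unions until nothing new appears. A sketch (the helper `generate_algebra` is our own, not from the text); starting from {TT} and {HH} it reproduces exactly the eight-element algebra above.

```python
def generate_algebra(omega, generators):
    """Close a collection of events under complement and pairwise union."""
    omega = frozenset(omega)
    events = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(events)
        for a in current:
            if omega - a not in events:          # add the complement
                events.add(omega - a)
                changed = True
            for b in current:
                if a | b not in events:          # add the union
                    events.add(a | b)
                    changed = True
    return events

F = generate_algebra({"HH", "HT", "TH", "TT"}, [{"TT"}, {"HH"}])
print(len(F))   # 8
```

The loop must terminate because the events live inside the (finite) power set of Ω.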

We can now define a probability measure by assigning to each event in the algebra F a
probability P.

Definition 23.1. The set function P is called a probability measure if

(i) P(∅) = 0;

(ii) P(Ω) = 1;

(iii) P(A ∪ B) = P(A) + P(B) for all A, B ∈ F with A ∩ B = ∅.

The three conditions listed above are the axioms of probability theory.
Example 23.3. For the outcomes of flipping two fair coins,
P(HH) = P(HT ) = P(T T ) = P(T H) = 0.25.

The triple (Ω, F, P) of the set of outcomes, the algebra, and the probability measure is referred
to as a probability model.

In the next step, we assign probabilities to the random events. Three sources of attaching proba-
bilities to the outcomes of random events are (a) equally likely events, (b) long-run frequencies, and
(c) degrees of confidence (the subjective or Bayesian approach). Observe that however we assign
probabilities to different events, the mathematical theory for dealing with the random events and
their probabilities remains the same.

We define random variables next. A rule that assigns a real number to each outcome is called
a random variable. More formally,

Definition 23.2. A random variable is a function that maps the set of outcomes of a random
event to the set of real numbers.

Such a function is not unique, and depending on the purpose at hand, we may define one or
more random variables for the same random event.
Example 23.4. For the outcomes of flipping two fair coins, let us define a random variable X as
the number of heads. Then, we have
X(HH) = 2; X(HT ) = X(T H) = 1, X(T T ) = 0.
We could have defined the random variable X as the number of tails. Then, we have
X(HH) = 0; X(HT ) = X(T H) = 1, X(T T ) = 2.

In collecting labor statistics, we are interested in the characteristics of the respondents. For
example, we may ask if a person is in the labor force or not, or employed or unemployed. We could
also be interested in learning the demographic characteristics of the respondents, such as gender,
race, and age. For each of these answers we can define one or more binary variables. For example,
let X = 1 if a respondent who is in the labor force is unemployed and X = 0 if employed. We can
define Y = 1 if the respondent is a woman and employed, and Y = 0 otherwise.

A random variable together with its probabilities is called a probability distribution.

Let us consider flipping a fair coin three times.


Example 23.5. For the outcomes of flipping a fair coin three times, let us define a random variable
X as the number of heads. The set of outcomes for flipping a fair coin three times is
Ω = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }.
Then, the probability distribution is
P(X = 0) = 0.125; P(X = 1) = 0.375; P(X = 2) = 0.375; P(X = 3) = 0.125.
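The distribution in Example 23.5 can be obtained by enumerating the eight equally likely outcomes and counting heads; a quick sketch:

```python
from itertools import product
from collections import Counter

# All 2^3 = 8 equally likely outcomes of three flips of a fair coin
outcomes = ["".join(o) for o in product("HT", repeat=3)]
counts = Counter(o.count("H") for o in outcomes)   # number of heads -> count
dist = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(dist)   # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```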

Probability distributions become unwieldy as the number of outcomes becomes large or infi-
nite. One way to summarize the information about a probability distribution is through its moments,
such as the mean, which measures the central tendency, and the variance, which measures the dis-
persion or variability of the distribution. A further moment reflects the skewness of the distribution
to the left or to the right, and the kurtosis is an indicator of the bundling of the outcomes near the
mean: the more values are concentrated near the mean, the taller is the peak of the distribution.

The first moment of the distribution, which is the expected value or the mean of the distribution,
is defined as

E(X) = µ = ∑_{i=1}^n x_i P(x_i).
Example 23.6. For the distribution of the number of heads in three flips of a coin, we have,
µ = 0 · P(X = 0) + 1 · P(X = 1) + 2 · P(X = 2) + 3 · P(X = 3).
which yields the mean as
µ = 0 + 0.375 + 0.750 + 0.375 = 1.50

In a similar manner, we may define the rth moment of a distribution as

E(X^r) = m_r = ∑_{i=1}^n x_i^r P(x_i).
Example 23.7. For the distribution of the number of heads in three flips of a coin, the second
moment is
E(X²) = 0² · P(X = 0) + 1² · P(X = 1) + 2² · P(X = 2) + 3² · P(X = 3),
which yields the second moment as
E(X²) = 0 + 0.375 + 1.50 + 1.125 = 3.

Another measure, which is of great importance, is the variance, or the second moment around
the mean:

E(X − µ)² = σ² = ∑_{i=1}^n (x_i − µ)² P(x_i).
23.2. Marginal and Conditional Distribution 213

The formula for the variance can be rewritten, using the binomial expansion, as

E(X − µ)² = ∑_{i=1}^n (x_i − µ)² P(x_i)
          = ∑_{i=1}^n x_i² P(x_i) − 2µ ∑_{i=1}^n x_i P(x_i) + µ²
          = ∑_{i=1}^n x_i² P(x_i) − µ².

Example 23.8. For the distribution of the number of heads in three flips of a coin, the variance is

σ2 = E(X 2 ) − µ2 = 3 − 1.52 = 0.75.

The mean is a measure of central tendency of a distribution, showing its center of gravity, whereas
the variance and its square root, called the standard deviation, measure the dispersion or the volatil-
ity of the distribution. The advantage of using the standard deviation is that it measures the disper-
sion in the same units as the original variable. In finance, the variance of the returns of an
asset is used as a measure of risk.
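The moment formulas above are easy to evaluate directly; for the three-flip distribution the sketch below reproduces µ = 1.5, E(X²) = 3, and σ² = 0.75 from Examples 23.6 through 23.8:

```python
# P(X = x) for the number of heads in three fair flips
dist = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

mean = sum(x * p for x, p in dist.items())      # first moment, mu
m2 = sum(x ** 2 * p for x, p in dist.items())   # second moment, E(X^2)
var = m2 - mean ** 2                            # variance, sigma^2
std = var ** 0.5                                # standard deviation
print(mean, m2, var)   # 1.5 3.0 0.75
```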

23.2. Marginal and Conditional Distribution

As we have observed before, a random event may give rise to a number of random variables, each
defined by a different function on the same set of outcomes. In the table below we present
such a situation, where random variables X and Y and their probabilities are reported. Think of
Y as the annual income (in thousands of dollars) in a profession and X as gender, with X = 0
denoting men and X = 1 denoting women. The information contained in the table is the probability
of joint events, i.e., the probability of X and Y each taking a particular value. For instance, the
probability of X = 1 and Y = 120 is 0.11, which is denoted as

P(X = 1, Y = 120) = 0.11.

Such a probability is referred to as a joint probability; here it is the probability that a randomly
chosen member of the profession is a woman earning $120,000 a year.

X Y P
0 60 0.02
0 70 0.04
0 80 0.07
0 90 0.09
0 100 0.10
0 110 0.06
0 120 0.03
0 130 0.02
0 140 0.01
0 150 0.01
1 70 0.01
1 80 0.02
1 90 0.04
1 100 0.08
1 110 0.11
1 120 0.11
1 130 0.09
1 140 0.05
1 150 0.03
1 160 0.01

If we are interested only in X, then we can sum over all relevant values of Y and get the
marginal probability of X. For example,

P(X = 1) = P(X = 1, Y = 70) + ··· + P(X = 1, Y = 160) = 0.01 + 0.02 + ··· + 0.03 + 0.01 = 0.55.

In general we can write

P(X = x_k) = ∑_{j=1}^n P(X = x_k, Y = y_j).

In a similar manner, we can calculate the probability of X = 0, which equals 0.45. Thus the
marginal distribution of X is

X P(X)
0 0.45
1 0.55

A similar procedure yields the marginal probability of Y . For example,

P(Y = 90) = P(Y = 90, X = 0) + P(Y = 90, X = 1) = 0.09 + 0.04 = 0.13.
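Marginalization is just a sum over the rows of the joint table. A sketch (the dictionary encoding of the table and the helper names are ours):

```python
# Joint probabilities P(X = x, Y = y) from the table above
joint = {
    (0, 60): 0.02, (0, 70): 0.04, (0, 80): 0.07, (0, 90): 0.09, (0, 100): 0.10,
    (0, 110): 0.06, (0, 120): 0.03, (0, 130): 0.02, (0, 140): 0.01, (0, 150): 0.01,
    (1, 70): 0.01, (1, 80): 0.02, (1, 90): 0.04, (1, 100): 0.08, (1, 110): 0.11,
    (1, 120): 0.11, (1, 130): 0.09, (1, 140): 0.05, (1, 150): 0.03, (1, 160): 0.01,
}

def marginal_X(x):
    """P(X = x): sum the joint probabilities over all values of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_Y(y):
    """P(Y = y): sum the joint probabilities over all values of X."""
    return sum(p for (_, yi), p in joint.items() if yi == y)
```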



Observe that in this example, the marginal distribution of X shows the distribution of men and
women in that profession (45% men and 55% women), whereas the marginal distribution of Y
shows the distribution of income for both men and women, i.e., for the profession as a whole.

Sometimes we may be interested in knowing the probability of Y = 110 when we already know
that X = 1. Thus we want to know the conditional probability of Y = 110, given that X = 1:

P(Y = 110 | X = 1) = P(Y = 110, X = 1) / P(X = 1) = 0.11 / 0.55 = 0.20.
In general,

P(Y = y_j | X = x_k) = P(Y = y_j, X = x_k) / P(X = x_k).
We compute the conditional distributions of Y | X = 0 and Y | X = 1 below.

Y     P(Y | X = 0)        Y     P(Y | X = 1)
60    0.044               70    0.018
70    0.089               80    0.036
80    0.156               90    0.073
90    0.200               100   0.145
100   0.222               110   0.200
110   0.133               120   0.200
120   0.067               130   0.164
130   0.044               140   0.091
140   0.022               150   0.055
150   0.022               160   0.018

A conditional distribution has a mean, a variance, and other moments. The mean is

E(Y | X = x_k) = ∑_{j=1}^n y_j P(y_j | X = x_k).

Variance and other higher moments of the conditional distribution can be computed similarly.

The conditional mean of the conditional distribution given above is

E(Y | X = 0) = ∑_{j=1}^n y_j P(y_j | X = 0)
             = 60 × 0.044 + 70 × 0.089 + 80 × 0.156 + 90 × 0.200 + 100 × 0.222
               + 110 × 0.133 + 120 × 0.067 + 130 × 0.044 + 140 × 0.022 + 150 × 0.022
             ≈ 96.4.

We can compute the conditional mean for X = 1 to be

E(Y | X = 1) ≈ 116.4.

23.3. The Law of Iterated Expectation

The law of iterated expectation relates the conditional mean and the unconditional mean. In general,

E(Y) = E_X[E(Y | X)] = ∑_{j=1}^n E(Y | X = x_j) P(X = x_j).

For the example above,

E(Y) = E(Y | X = 0) P(X = 0) + E(Y | X = 1) P(X = 1)
     = 96.4 × 0.45 + 116.4 × 0.55 = 107.4.

It is easy to infer that if E(Y | X = x_j) = 0 for all values of x_j, i.e., the conditional expectation of Y
equals zero, then the unconditional expectation E(Y) = E_X[E(Y | X)] = 0. However, the reverse is not
true: E(Y) = 0 does not imply that E(Y | X = x_j) = 0 for all values of x_j.
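Both the conditional means and the law of iterated expectation can be verified numerically from the joint table of Section 23.2 (the dictionary encoding and helper names are ours):

```python
# Joint probabilities P(X = x, Y = y) from the table in Section 23.2
joint = {
    (0, 60): 0.02, (0, 70): 0.04, (0, 80): 0.07, (0, 90): 0.09, (0, 100): 0.10,
    (0, 110): 0.06, (0, 120): 0.03, (0, 130): 0.02, (0, 140): 0.01, (0, 150): 0.01,
    (1, 70): 0.01, (1, 80): 0.02, (1, 90): 0.04, (1, 100): 0.08, (1, 110): 0.11,
    (1, 120): 0.11, (1, 130): 0.09, (1, 140): 0.05, (1, 150): 0.03, (1, 160): 0.01,
}

def p_x(x):
    """Marginal probability P(X = x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_mean_y(x):
    """E(Y | X = x) = sum_y y P(X = x, Y = y) / P(X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x(x)

# Law of iterated expectation: E(Y) = sum_x E(Y | X = x) P(X = x)
e_y_iterated = cond_mean_y(0) * p_x(0) + cond_mean_y(1) * p_x(1)
e_y_direct = sum(y * p for (_, y), p in joint.items())
```

Running this gives E(Y | X = 0) ≈ 96.4 and E(Y | X = 1) ≈ 116.4, and both routes to the unconditional mean agree at E(Y) = 107.4.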

23.4. Continuous Random Variables

Many variables we come across in economics are continuous rather than discrete. In
assigning probabilities to continuous variables, we face the problem that no matter how small
the interval of values of the continuous variable, there are infinitely many points in it. If we assigned
a positive probability to each point, the sum of such probabilities would diverge, violating the
axiom of probability theory that the probabilities should add up to one.

This problem is circumvented by assigning probabilities to segments of the interval within
which the random variable is defined, e.g.,
P(X ≤ 5), or P(−4 < X ≤ 2)
Example 23.9. A simple example of a continuous random variable is the uniform distribution.
The variable X can take any value between a and b, and the probability of X falling within the
segment [a, c] is proportional to the length of that segment relative to the interval [a, b]:

P(a < X ≤ c) = (c − a)/(b − a).
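A small helper for the uniform probability (our own sketch; clipping to [a, b] handles segments that extend beyond the support):

```python
def uniform_prob(a, b, lo, hi):
    """P(lo < X <= hi) for X uniform on [a, b]: overlap length over b - a."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

print(uniform_prob(0.0, 10.0, 0.0, 4.0))   # 0.4
```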

The probability distribution function of X is defined by

F(x) = P(X ≤ x)

and has to conform to the following conditions:

(a) F(x) is continuous.

(b) F(x) is non-decreasing, i.e., F(x1) ≤ F(x2) if x1 < x2.

(c) F(−∞) = lim_{x→−∞} F(x) = 0, and F(∞) = lim_{x→∞} F(x) = 1.

These conditions are the counterpart of the discrete case and entail that probabilities are always
non-negative and that the total probability equals one.

Now we define the probability model for continuous random variables. Consider the extended
real line R̄ = R ∪ {−∞, ∞}, which plays the same role for the continuous variables as Ω plays
for the discrete variables (the set of all possible outcomes). Consider the half-closed intervals on
R̄,

(a, b] = {x ∈ R̄ : a < x ≤ b},

and form finite unions of such intervals, provided the intervals are disjoint:

A = ∪_{j=1}^n (a_j, b_j], n < ∞.

The collection of all such unions plus the empty set ∅ is an algebra, but it is not a σ-algebra. The
smallest σ-algebra that contains this collection is called the Borel σ-algebra and is denoted by B(R̄).
Finally we define the probability measure via

F(x) = P((−∞, x]).

The triple (R̄, B(R̄), P) is our probability model for continuous random variables.
Chapter 24

Solution to PS 1
(1) (a) ∼(A ∧ B) ⇔ ∼A ∨ ∼B and ∼(A ∨ B) ⇔ ∼A ∧ ∼B. We prove these claims using a truth
table.

A B A ∧ B A ∨ B ∼ (A ∧ B) ∼ A ∼ B ∼ A∨ ∼ B ∼ (A ∨ B) ∼ A∧ ∼ B
1 2 3 4 5 6 7 8 9 10
T T T T F F F F F F
T F F T T F T T F F
F T F T T T F T F F
F F F F T T T T T T

Claim (a) is proved by comparing columns 5 and 8.


(b) Claim (b) is proved by comparing columns 9 and 10.
(c) ∼ (A ⇒ B) ⇔ A ∧ ∼ B

A B A ⇒ B ∼ (A ⇒ B) ∼ B A∧ ∼ B
1 2 3 4 5 6
T T T F F F
T F F T T T
F T T F F F
F F T F T F


(d) ((A ∨ B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C))


A B C A ⇒ C B ⇒ C A ∨ B (A ⇒ C) ∧ (B ⇒ C) (A ∨ B) ⇒ C
1 2 3 4 5 6 7 8
T T T T T T T T
T T F F F T F F
T F T T T T T T
T F F F T T F F
F T T T T T T T
F T F T F T F F
F F T T T F T T
F F F T T F T T
(e) This claim is true.
If n is even, then n + 1 is odd. If n is odd, then n + 1 is even. Hence both cannot be even.
(f) Let n = 1; then n² = 1 = n.
(g) Let x > 1. Then

x ∈ N_O ⇔ ∃ n ∈ N such that x = 2n + 1,

so

x² = (2n + 1)²
   = 4n² + 4n + 1
   = 2(2n² + 2n) + 1
(24.1)  ⇒ x² ∈ N_O.

For x = 1, x² = 1, which is odd.
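The truth-table equivalences in parts (a) through (d) can also be checked exhaustively by machine, which is exactly what the tables do by hand. A sketch, where `IMP` (our name) encodes material implication:

```python
from itertools import product

def IMP(p, q):
    """Material implication: p => q is (not p) or q."""
    return (not p) or q

for A, B in product([True, False], repeat=2):
    assert (not (A and B)) == ((not A) or (not B))   # (a) De Morgan
    assert (not (A or B)) == ((not A) and (not B))   # (b) De Morgan
    assert (not IMP(A, B)) == (A and (not B))        # (c) negated implication

for A, B, C in product([True, False], repeat=3):
    assert IMP(A or B, C) == (IMP(A, C) and IMP(B, C))   # (d)
print("all four equivalences verified")
```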

(2) Recall that ∼ (A ⇒ B) is equivalent to A∧ ∼ B.


(a) Set S is closed and bounded, and S is not compact.
(b) Set S is compact, and S is either not closed or unbounded.
(c) Function f is continuous and not differentiable.

(3) (a) If xy is NOT a rational number, then x² = 3 ∨ y² < 5.


(b) If there does not exist a y such that xy = 1, then x = 0.

(4) (a) The mistake is in assuming the same value of k for m and n. The correct proof should be
Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2p + 1 for some
integers k and p. Therefore, 2m + 3n = 2(2k) + 3(2p + 1) = 4k + 6p + 3 = 2(2k + 3p +
1) + 1 = 2l + 1; where l = 2k + 3p + 1. Since k, p ∈ Z, l ∈ Z. Hence, 2m + 3n = 2l + 1 for
some integer l, whence 2m + 3n is an odd integer. □
(b) The mistake is in showing the claim for one particular value of n. The claim holds for all
positive integers. The correct proof should be

Proof. Observe that n² + 2n + 1 = (n + 1)² and (n + 1) · (n + 1) is a composite number for
all positive integers n. □

(5) (a) (i) Contrapositive: If a number is divisible by 4, then it is divisible by 2. Let y = 4m
where m ∈ N; then y = 2(2m). Hence y is divisible by 2.
(ii) Contradiction: Suppose there exists a number y which is not divisible by 2 but is divisible
by 4. Since y = 4m where m ∈ N, we know that y = 2(2m) and so y is divisible by 2.
This contradicts our initial assumption.
(b) There is no greatest negative real number.

Proof. Assume, to the contrary, that there is a greatest negative real number x. Then x ≥ y
for every negative real number y. Consider the number x/2. Since x is a negative real number,
so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is negative, gives
x/2 > x. Hence, x/2 is a negative real number that is greater than x, which is a contradiction.
Hence our assumption that there is a greatest negative real number is false. Thus there is
no greatest negative real number. □

(c) The product of an irrational number and a nonzero rational number is irrational.

Proof. Assume, to the contrary, that there exist a non-zero rational number p and an
irrational number q whose product is a rational number. Thus, by the definition of rational
numbers, p = a/b and p · q = r = c/d for some integers a, b, c, and d with a ≠ 0, b ≠ 0 and
d ≠ 0. Hence,

q = r/p = (c/d)/(a/b) = bc/ad.

Now, bc ∈ Z and ad ∈ Z since a, b, c, d ∈ Z. Since a ≠ 0 and d ≠ 0, ad ≠ 0. Hence
q ∈ Q, which is a contradiction. Hence our assumption that there exist a non-zero rational
number and an irrational number whose product is a rational number is false. Thus, the
product of an irrational number and a non-zero rational number is irrational. □

(6) (a) (i) Base of induction:


When n = 1, the statement P(1) : 1 = 12 holds trivially.
(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1 + 3 + ··· + (2n − 1) = n².
For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1
and assume that P(k) is true; that is, assume that 1 + 3 + ··· + (2k − 1) = k². For the
inductive step, we need to show that P(k + 1) is true. That is, we show that

1 + 3 + · · · + (2k − 1) + (2k + 1) = (k + 1)2 .



Evaluating the left-hand side of this equation, we have


1 + 3 + · · · + (2k − 1) + (2k + 1) = (1 + 3 + · · · + (2k − 1)) + (2k + 1)
= k2 + (2k + 1) (by the inductive hypothesis)
= (k + 1)2 ;
thus verifying that P(k + 1) is true.
(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1;
that is,
1 + 3 + · · · + (2n − 1) = n2
is true for every positive integer n.
(b) (i) Base of induction:
When n = 1, the statement P(1) : 1 = 1(1+1)/2 is certainly true since 1(1+1)/2 = 2/2 = 1.
This establishes the base case when n = 1.
(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1 + 2 + ··· + n = n(n+1)/2. For
the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and
assume that P(k) is true; that is, assume that 1 + ··· + k = k(k+1)/2. For the inductive
step, we need to show that P(k + 1) is true. That is, we show that

1 + 2 + ··· + k + (k + 1) = (k + 1)(k + 2)/2.

Evaluating the left-hand side of this equation, we have

1 + 2 + ··· + k + (k + 1) = (1 + 2 + ··· + k) + (k + 1)
                          = k(k + 1)/2 + (k + 1)   (by the inductive hypothesis)
                          = k(k + 1)/2 + 2(k + 1)/2
                          = (k + 1)(k + 2)/2;

thus verifying that P(k + 1) is true.
(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1;
that is,

1 + 2 + ··· + n = n(n + 1)/2

is true for every positive integer n.
(c) (i) Base of induction:
When n = 1, the statement P(1) : 1³ = [1(1+1)/2]² is certainly true since [1(1+1)/2]² =
1² = 1. This establishes the base case when n = 1.
(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1³ + 2³ + ··· + n³ = [n(n+1)/2]².
For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1
and assume that P(k) is true; that is, assume that 1³ + ··· + k³ = [k(k+1)/2]². For the
inductive step, we need to show that P(k + 1) is true. That is, we show that

1³ + 2³ + ··· + k³ + (k + 1)³ = [(k + 1)(k + 2)/2]².

Evaluating the left-hand side of this equation, we have

1³ + ··· + k³ + (k + 1)³ = (1³ + ··· + k³) + (k + 1)³
                         = [k(k + 1)/2]² + (k + 1)³   (by the inductive hypothesis)
                         = (k + 1)² [k²/4 + 4(k + 1)/4]
                         = [(k + 1)/2]² [k² + 4k + 4] = [(k + 1)/2]² (k + 2)²
                         = [(k + 1)(k + 2)/2]²;

thus verifying that P(k + 1) is true.
(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1;
that is,

1³ + ··· + n³ = [n(n + 1)/2]²

is true for every positive integer n.
(d) It is an example of an arithmetic-geometric series. Let us denote the sum by S, i.e.,

S = a + (a + r)q + (a + 2r)q² + ··· + (a + (n − 1)r)q^{n−1}.

Multiplying both sides by q, we get

qS = aq + (a + r)q² + (a + 2r)q³ + ··· + (a + (n − 1)r)q^n.

Subtracting it from S, we get

S − qS = a + rq + rq² + ··· + rq^{n−1} − (a + (n − 1)r)q^n.

All terms except the first and the last term on the right-hand side constitute a geometric
series with first term rq, common ratio q, and number of terms n − 1. Hence

S = [a − (a + (n − 1)r)q^n]/(1 − q) + [rq + rq² + ··· + rq^{n−1}]/(1 − q).

The sum of the geometric series described above is

rq(1 − q^{n−1})/(1 − q).

We substitute this for the sum and get S as

S = [a − (a + (n − 1)r)q^n]/(1 − q) + rq(1 − q^{n−1})/(1 − q)².

(7) To show that the formula holds for n = 0, we must show that

∑_{i=0}^{0} r^i = (r^{0+1} − 1)/(r − 1).

The left-hand side of this equation is ∑_{i=0}^{0} r^i = r⁰ = 1, while the right-hand side is
(r^{0+1} − 1)/(r − 1) = 1, since r ≠ 1. Hence the formula holds for n = 0. For the inductive
hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 0 and assume that
∑_{i=0}^{k} r^i = (r^{k+1} − 1)/(r − 1). For the inductive step, we need to show that
∑_{i=0}^{k+1} r^i = (r^{k+2} − 1)/(r − 1). Evaluating the left-hand side of this equation, we
have

∑_{i=0}^{k+1} r^i = ∑_{i=0}^{k} r^i + r^{k+1}   (writing the (k + 1)st term separately)
                  = (r^{k+1} − 1)/(r − 1) + r^{k+1}   (by the inductive hypothesis)
                  = (r^{k+1} − 1)/(r − 1) + (r − 1)r^{k+1}/(r − 1)
                  = (r^{k+1} − 1 + r^{k+2} − r^{k+1})/(r − 1)
                  = (r^{k+2} − 1)/(r − 1);

thus verifying the claim. Hence, by the principle of mathematical induction, the formula is true
for all integers n ≥ 0.

In the limiting case n → ∞, the sum is well defined for |r| < 1, and equals 1/(1 − r) in
this case. For |r| ≥ 1 the sum is not well defined as n → ∞, though it is defined for all
n ∈ N.
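The closed form and its limiting behaviour are easy to sanity-check numerically (a sketch; `geom_sum` is our name for the formula):

```python
def geom_sum(r, n):
    """Closed form (r^(n+1) - 1)/(r - 1) for sum_{i=0}^{n} r^i, r != 1."""
    return (r ** (n + 1) - 1) / (r - 1)

# The formula matches the direct sum for several r and n
for r in [0.5, 2.0, -3.0]:
    for n in range(8):
        assert abs(geom_sum(r, n) - sum(r ** i for i in range(n + 1))) < 1e-9

# Limiting case |r| < 1: the partial sums approach 1/(1 - r)
assert abs(geom_sum(0.5, 200) - 1 / (1 - 0.5)) < 1e-12
```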

(8) (a) We proceed by mathematical induction. When n = 2, the result is true since in this case
n³ − n = 2³ − 2 = 8 − 2 = 6 and 6 is divisible by 6. Hence, the base case when n = 2 is
true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2
and assume that the property holds for n = k, i.e., suppose that k³ − k is divisible by 6. For
the inductive step, we must show that the property holds for n = k + 1. That is, we must
show that (k + 1)³ − (k + 1) is divisible by 6. Since k³ − k is divisible by 6, there exists,
by definition of divisibility, an integer r such that k³ − k = 6r. Now, by the laws of algebra
and the inductive hypothesis, it follows that

(k + 1)³ − (k + 1) = (k³ + 3k² + 3k + 1) − (k + 1)
                   = (k³ − k) + 3(k² + k)
                   = 6r + 3k(k + 1).

Now, k(k + 1) is a product of two consecutive integers, and is therefore even. Hence,
k(k + 1) = 2s for some integer s. Thus, 6r + 3k(k + 1) = 6r + 3(2s) = 6(r + s), and so, by
substitution, (k + 1)³ − (k + 1) = 6(r + s), which is divisible by 6. Therefore, (k + 1)³ −
(k + 1) is divisible by 6, as desired. Hence, by the principle of mathematical induction, the
property holds for all integers n ≥ 2.
(b) We proceed, as before, by mathematical induction. When n = 3, the inequality holds since
in this case 2ⁿ = 2³ = 8 and 2n + 1 = 2 · 3 + 1 = 7, and 8 > 7. Hence, the base case when
n = 3 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that
k ≥ 3 and assume that the inequality holds for n = k, i.e., suppose that 2^k > 2k + 1. For
the inductive step, we must show that the inequality holds for n = k + 1. That is, we must
show that 2^{k+1} > 2(k + 1) + 1. Now,

2^{k+1} = 2 · 2^k
        > 2 · (2k + 1)   (by the inductive hypothesis)
        = 2(k + 1) + 2k
        > 2(k + 1) + 1   (since k ≥ 3),

as desired. Hence, by the principle of mathematical induction, the inequality holds for all
integers n ≥ 3.
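Induction proves the claims for all n; a brute-force check over a finite range is still a useful sanity test of both statements:

```python
# Part (a): n^3 - n is divisible by 6 for every integer n >= 2
for n in range(2, 500):
    assert (n ** 3 - n) % 6 == 0

# Part (b): 2^n > 2n + 1 for every integer n >= 3
for n in range(3, 64):
    assert 2 ** n > 2 * n + 1
print("both claims hold on the tested ranges")
```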

(9) By the Quotient Remainder theorem with d = 6, every natural number m can be written as
m = 6n + r where n is an integer and r ∈ {0, 1, 2, 3, 4, 5}. Since m is prime (and greater
than 3), it cannot be of the form 6n (divisible by 6), 6n + 2 or 6n + 4 (divisible by 2), or
6n + 3 (divisible by 3). Thus the only remaining possibilities are 6n + 1 and 6n + 5.
(10) |9 − 5x| ≤ 11 holds if and only if both 9 − 5x ≤ 11 and −(9 − 5x) ≤ 11.

9 − 5x ≤ 11                       −(9 − 5x) ≤ 11
9 − 5x − 9 ≤ 11 − 9               −9 + 5x ≤ 11
−5x ≤ 2                           −9 + 5x + 9 ≤ 11 + 9
(−1/5) · (−5x) ≥ (−1/5) · 2       5x ≤ 20
x ≥ −2/5                          (1/5) · 5x ≤ (1/5) · 20
                                  x ≤ 4

Hence the solution set is −2/5 ≤ x ≤ 4.
Chapter 25

Solution to PS 2
(1) We need to verify that it satisfies the three conditions of a distance function.
(a) (i) Non-negativity is obvious as the absolute value is non-negative. If x = y, then d(x, y) = 0.
Also if

d(x, y) = ∑_{i=1}^n |x_i − y_i| = 0,

then x_i − y_i = 0 for all i = 1, ···, n. This implies that x = y.
(ii) Symmetry is obvious too since the absolute value function is symmetric:
|a − b| = |b − a|.
(iii) Triangle Inequality: Note that

|x_i − z_i| ≤ |x_i − y_i| + |y_i − z_i|

holds for all i = 1, 2, ···, n. Hence

∑_{i=1}^n |x_i − z_i| ≤ ∑_{i=1}^n |x_i − y_i| + ∑_{i=1}^n |y_i − z_i|,

or d(x, z) ≤ d(x, y) + d(y, z). Hence it is a distance function.
(b) (i) Non-negativity is obvious as the maximum of two absolute values is non-negative. If
x = y, d(x, y) = 0. Also

d(x, y) = max{|x1 − y1|, |x2 − y2|} = 0
⇒ |x1 − y1| = 0 = |x2 − y2| ⇒ x = y.

(ii) Symmetry is obvious too since the absolute value function is symmetric:
|a − b| = |b − a|.
(iii) Triangle Inequality I: Note that max{a, b} ≥ a and max{a, b} ≥ b. Using this we have

d(x, y) ≥ |x1 − y1| and d(x, y) ≥ |x2 − y2|,
d(y, z) ≥ |y1 − z1| and d(y, z) ≥ |y2 − z2|,
d(x, y) + d(y, z) ≥ |x1 − y1| + |y1 − z1| ≥ |x1 − z1|,
d(x, y) + d(y, z) ≥ |x2 − y2| + |y2 − z2| ≥ |x2 − z2|.

It follows that

d(x, y) + d(y, z) ≥ max{|x1 − z1|, |x2 − z2|} = d(x, z).

Hence it is a distance function.
(iv) Triangle Inequality II: Consider the case when d(x, z) = |x1 − z1|, i.e., |x1 − z1| ≥
|x2 − z2|. Then using the triangle inequality for the absolute value function,

d(x, z) = |x1 − z1| ≤ |x1 − y1| + |y1 − z1| ≤ d(x, y) + d(y, z).

The second inequality follows from the fact that either d(x, y) = |x1 − y1| or
d(x, y) > |x1 − y1|, and similarly for d(y, z). The second case, d(x, z) = |x2 − z2|, is
similar. Hence it is a distance function.
(c) (i) Non-negativity: d(x, y) ≥ 0 for all x, y in Rn , and thus 1 + d(x, y) ≥ 1 for all x, y in
Rn . As a result, d1 (x, y) ≥ 0 for all x, y in Rn .
By the definition of d1 (x, y), d1 (x, y) = 0 if and only if d(x, y) = 0. But d(x, y) = 0 if
and only if x = y.
(ii) Since d(x, y) = d(y, x), it is straightforward to see that d1 (x, y) = d1 (y, x).
(iii) Triangle Inequality I:

d1(x, z) ≤ d1(x, y) + d1(y, z)
⇔ d(x, z)/(1 + d(x, z)) ≤ d(x, y)/(1 + d(x, y)) + d(y, z)/(1 + d(y, z))
⇔ d(x, z)[1 + d(x, y)][1 + d(y, z)] ≤ d(x, y)[1 + d(x, z)][1 + d(y, z)]
                                      + d(y, z)[1 + d(x, y)][1 + d(x, z)]
⇔ d(x, z) ≤ d(x, y) + d(y, z) + 2d(x, y)d(y, z) + d(x, y)d(y, z)d(x, z).

Since d(x, y) + d(y, z) ≥ d(x, z) and d(a, b) ≥ 0 for any (a, b) ∈ Rⁿ × Rⁿ, the last inequal-
ity is always true. Thus d1(x, z) ≤ d1(x, y) + d1(y, z) for all x, y, z in Rⁿ.
(iv) Triangle Inequality II:
We use the notation a ≡ d(x, z) and b ≡ d(x, y) + d(y, z). Then

a ≤ b ⇒ a + ab ≤ b + ab ⇒ a(1 + b) ≤ b(1 + a) ⇒ a/(1 + a) ≤ b/(1 + b),

so

d(x, z)/(1 + d(x, z)) ≤ [d(x, y) + d(y, z)]/[1 + d(x, y) + d(y, z)]
                      ≤ d(x, y)/(1 + d(x, y)) + d(y, z)/(1 + d(y, z)),

that is, d1(x, z) ≤ d1(x, y) + d1(y, z).
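The metric properties proved above can be spot-checked numerically. The sketch below tests the taxicab metric from part (a) and the bounded metric d1 = d/(1 + d) from part (c) on a few sample points (function names are ours):

```python
def d_taxi(x, y):
    """Part (a): d(x, y) = sum_i |x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y))

def d_one(x, y):
    """Part (c): d1(x, y) = d(x, y) / (1 + d(x, y))."""
    d = d_taxi(x, y)
    return d / (1 + d)

pts = [(0, 0), (1, 2), (-3, 4), (2, -2), (5, 5)]
for x in pts:
    for y in pts:
        assert d_one(x, y) == d_one(y, x)          # symmetry
        assert (d_one(x, y) == 0) == (x == y)      # zero exactly when x = y
        for z in pts:                              # triangle inequality
            assert d_one(x, z) <= d_one(x, y) + d_one(y, z) + 1e-12
```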

(2) It is bounded: take B = 2; then ∥x∥ ≤ 2 for all x ∈ ∪_{n=1}^∞ [1/n, 2/n]. But it is NOT
closed, as

∪_{n=1}^∞ [1/n, 2/n] = (0, 2].

So it is not compact.

(3)

(A ∪ B)^c ⊆ A^c ∪ B^c is TRUE. Let x ∈ (A ∪ B)^c.
⇒ x ∉ (A ∪ B)
⇒ x ∉ A ∧ x ∉ B
⇒ x ∈ A^c ∧ x ∈ B^c
⇒ x ∈ A^c ∪ B^c.

(A ∪ B)^c ⊇ A^c ∪ B^c is FALSE. Let x ∈ A^c ∪ B^c with x ∈ A^c ∧ x ∉ B^c.
⇒ x ∉ A ∧ x ∈ B ⇒ x ∈ A ∪ B
⇒ x ∉ (A ∪ B)^c.

(4) Suppose, for contradiction, that A ≠ B. Then, without loss of generality, there exists x ∈ A
such that x ∉ B. Since A ∪ C = B ∪ C, and we know x ∈ A ∪ C, we get x ∈ B ∪ C. This also
means that x ∈ C, since we have assumed x ∉ B. Then x ∈ A ∩ C, and since A ∩ C = B ∩ C,
x ∈ B ∩ C, which implies x ∈ B, a contradiction. Hence A = B.

(5) (a) Let X ∈ P (A ∩ B) be an arbitrary element. Then X ⊂ A ∩ B which means X ⊂ A and


X ∈ P (A). Also X ⊂ B and X ∈ P (B). Hence X ∈ P (A) ∩ P (B). Next let X ∈ P (A) ∩ P (B)
be an arbitrary element. Then X ∈ P (A) which implies X ⊂ A and X ∈ P (B) which implies
X ⊂ B. Both taken together imply that X ⊂ A ∩ B and X ∈ P (A ∩ B).
(b) Let X ∈ P (A) be an arbitrary element. Then X ∈ P (A) ∪ P (B). Since X ⊂ A, X ⊂ A ∪ B
which means X ∈ P (A ∪ B). Thus we have P (A) ⊂ P (A ∪ B). The same argument works
to show that P (B) ⊂ P (A ∪ B). Hence P (A) ∪ P (B) ⊂ P (A ∪ B).
(c) Let A = {1} and B = {2}. Then

P(A) = {∅, {1}} and P(B) = {∅, {2}},

so

P(A) ∪ P(B) = {∅, {1}, {2}}.

However A ∪ B = {1, 2}, so

P(A ∪ B) = {∅, {1}, {2}, {1, 2}}.

(6) It is enough to show that one of the properties of a vector space is not satisfied by this set.
Take scalar multiplication by 2. Let (x1, x2) ∈ C, so x1² + x2² = 1, and let α = 2 be a scalar. Then

(2x1)² + (2x2)² = 4 ≠ 1.

Hence (2x1, 2x2) ∉ C, and so C is not a vector space.

(7) In this case the commutative property of the sum of vectors does not hold. Consider a = (2, 3)
and b = (4, 5).
Then a + b = (2 + 4, 3 − 5) = (6, −2) and b + a = (4 + 2, 5 − 3) = (6, 2). Hence
(2, 3) + (4, 5) ̸= (4, 5) + (2, 3).
So V is not a vector space.

(8) In this case also, the commutative property of the sum of vectors does not hold. Consider as
before, a = (2, 3) and b = (4, 5).
Then a + b = (2 + 2 × 4, 3 + 3 × 5) = (10, 18) and b + a = (4 + 2 × 2, 5 + 3 × 3) = (8, 14).
Hence
(2, 3) + (4, 5) ̸= (4, 5) + (2, 3).
So V is not a vector space.

(9) (a) (J ∩ K)^c = J^c ∪ K^c.
We split the proof in two parts.
(i) First,
(25.1) (J ∩ K)^c ⊆ J^c ∪ K^c.
Let x ∈ (J ∩ K)^c.
⇒ x ∉ (J ∩ K)
⇒ x ∉ J ∨ x ∉ K
⇒ x ∈ J^c ∨ x ∈ K^c
⇒ x ∈ J^c ∪ K^c.
(ii) Next,
J^c ∪ K^c ⊆ (J ∩ K)^c.
Let x ∈ J^c ∪ K^c.
⇒ x ∈ J^c ∨ x ∈ K^c
⇒ x ∉ J ∨ x ∉ K
⇒ x ∉ J ∩ K
⇒ x ∈ (J ∩ K)^c.
(b) (J ∪ K)^c = J^c ∩ K^c.
First,
(25.2) (J ∪ K)^c ⊆ J^c ∩ K^c.
Let x ∈ (J ∪ K)^c.
⇒ x ∉ (J ∪ K)
⇒ x ∉ J ∧ x ∉ K
⇒ x ∈ J^c ∧ x ∈ K^c
⇒ x ∈ J^c ∩ K^c.
Next,
J^c ∩ K^c ⊆ (J ∪ K)^c.
Let x ∈ J^c ∩ K^c.
⇒ x ∈ J^c ∧ x ∈ K^c
⇒ x ∉ J ∧ x ∉ K
⇒ x ∉ J ∪ K
⇒ x ∈ (J ∪ K)^c.

(10) We need to show that ∀ ε > 0, ∃ N such that

∀ n > N, |(x_n + y_n) − (x + y)| < ε.

Note that

|x_n + y_n − x − y| = |x_n − x + y_n − y| ≤ |x_n − x| + |y_n − y|,

by the triangle inequality. Since

x_n → x, ∃ N1 s.t. ∀ n > N1, |x_n − x| < ε/2,
y_n → y, ∃ N2 s.t. ∀ n > N2, |y_n − y| < ε/2.

Let N = max{N1, N2}. Hence

∀ n > N, |x_n − x| + |y_n − y| < ε/2 + ε/2 = ε,
⇒ |x_n + y_n − x − y| < ε.

So {x_n + y_n} → x + y.

(11) We know that if a sequence is convergent then it is bounded. The contrapositive statement
is, "If a sequence is not bounded then it is not convergent." The sequence x_n = n, n ∈ N,
is NOT bounded: no matter which bound B we choose, there will be a natural number
greater than it. We now use the contrapositive to conclude that {x_n} is not convergent.

(12) Since {x_n} is a Cauchy sequence, for every ε > 0 there exists N ∈ N such that m, n > N
implies |x_n − x_m| < ε. Choose ε = 1 and m = N + 1; then

|x_n − x_{N+1}| < 1 ⇒ |x_n| < 1 + |x_{N+1}|, ∀ n > N.

Let

B = max{|x_1|, |x_2|, ···, |x_N|, 1 + |x_{N+1}|};

then |x_n| ≤ B, ∀ n ∈ N.

(13) It is easy to check that the sequence converges to 2, being the sum of the constant sequence
{x_n} = {2, 2, ···} and the sequence {y_n} = {−1/n}. We have already seen in the class notes
that the second sequence {y_n} converges to zero. Hence the sequence, being the sum of two
convergent sequences, converges to the sum of the limits, which equals 2 + 0 = 2. Since the
limit of a convergent sequence is unique, 1 cannot be a limit.

(14) We consider a monotone increasing sequence, x_n ≤ x_{n+1}; the proof is analogous for the
monotone decreasing case. First, let {x_n} be a convergent sequence and let lim_{n→∞} x_n = x.
From the definition of convergence, with ε = 1, we get N ∈ N such that n > N implies
|x_n − x| < 1. Then

|x_n| < 1 + |x|, ∀ n > N.

Let

B = max{|x_1|, |x_2|, ···, |x_N|, 1 + |x|};

then |x_n| ≤ B, ∀ n ∈ N, so the sequence is bounded. Now let the sequence be bounded, and
let x be the least upper bound of its terms. Then x_n ≤ x ∀ n ∈ N. For every ε > 0 there
exists an N ∈ N such that x − ε < x_N ≤ x; otherwise x − ε would be an upper bound for the
sequence. Since x_n is increasing, n > N implies

x − ε < x_n ≤ x,

which shows that x_n converges to x.

(15) (i) S = (0, 1). Open: for any x ∈ (0, 1), the open ball with radius min{x, 1 − x} is contained in S.
(ii) S = [0, 1]. Closed: use the theorem that a set S ⊆ R^n is closed if and only if every convergent sequence of points {x_n} ⊆ S has its limit x ∈ S. Let {x_n} be a convergent sequence with limit x and with every x_n contained in S; then for all n, x_n ≥ 0 and x_n ≤ 1. Since weak inequalities are preserved in the limit, x ≥ 0 and x ≤ 1. So x ∈ S and S is closed.
(iii) S = [0, 1). Neither open nor closed: it is not closed since the limit of the convergent sequence {1 − 1/n}, namely 1, is not contained in S, and it is not open since x = 0 is contained in S but no open ball with centre x = 0 is contained in S.
(iv) S = R. Both open and closed: use the result in the notes that the empty set is both open and closed, and R is the complement of the empty set.
(v) Let A_n, B_n and C_n be the intervals in R defined by
A_n = [0, 1/n], B_n = (0, 1/n], C_n = (−1/n, n),
where n is a positive integer. Since
∪_{n=1}^N A_n = [0, 1] and ∩_{n=1}^N A_n = [0, 1/N] for all N ∈ N,
we get
∪_{n=1}^∞ A_n = [0, 1] and ∩_{n=1}^∞ A_n = {0}.
Similarly
∪_{n=1}^∞ B_n = (0, 1] and ∩_{n=1}^∞ B_n = ∅,
and
∪_{n=1}^∞ C_n = (−1, ∞) and ∩_{n=1}^∞ C_n = [0, 1).
Chapter 26

Solution to PS 3

(1)
AB = [ 1  −1   7 ] · [ 9   6   5   4 ]
     [ 0   8  10 ]   [ 1  −2  −3   3 ]
                     [ 0   1  −1   2 ]

   = [ 1·9 + (−1)·1 + 7·0   1·6 + (−1)·(−2) + 7·1   1·5 + (−1)·(−3) + 7·(−1)   1·4 + (−1)·3 + 7·2 ]
     [ 0·9 + 8·1 + 10·0     0·6 + 8·(−2) + 10·1     0·5 + 8·(−3) + 10·(−1)     0·4 + 8·3 + 10·2   ]

(26.1)
   = [ 8   15    1   15 ]
     [ 8   −6  −34   44 ]

Note BA is not defined in this case.
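As a quick numerical sketch (not part of the original solution, and assuming NumPy is available), the product can be verified directly:

```python
import numpy as np

# The matrices from the problem: A is 2x3 and B is 3x4, so AB is 2x4.
A = np.array([[1, -1, 7],
              [0, 8, 10]])
B = np.array([[9, 6, 5, 4],
              [1, -2, -3, 3],
              [0, 1, -1, 2]])

print(A @ B)
# [[  8  15   1  15]
#  [  8  -6 -34  44]]
```

Attempting `B @ A` raises a shape error, matching the remark that BA is not defined here.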

(2) Set up the vector equation as
λ_1 (1, 2)′ + λ_2 (1, 3)′ = (0, 0)′
⇔ { λ_1 + λ_2 = 0
    2λ_1 + 3λ_2 = 0
(26.2) ⇔ { λ_1 = −λ_2
           λ_1 = −(3/2)λ_2
The only solution is
(26.3) λ_1 = 0, λ_2 = 0.


So the two vectors are linearly independent.

(3) Recall the property of determinants: if we multiply any column of a matrix by a scalar k, then the determinant of the new matrix is k times the determinant of the original matrix.
Since the matrix −2A is obtained by multiplying each of the five columns of the matrix A (having five rows and five columns) by −2, the determinant of −2A is (−2)^5 times the determinant of A. Thus det(−2A) = (−2)^5 det A = (−32)(−1) = 32.
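The scaling property can be checked numerically on an arbitrary 5 × 5 matrix (a sketch assuming NumPy is available; the random matrix is an illustrative choice, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))  # an arbitrary 5x5 matrix

# Multiplying every entry by -2 multiplies the determinant by (-2)^5 = -32.
lhs = np.linalg.det(-2 * A)
rhs = (-2) ** 5 * np.linalg.det(A)
print(np.isclose(lhs, rhs))  # True
```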

(4) Recall the rank of a matrix A is the number of linearly independent column vectors of A. It is
also equal to the number of linearly independent row vectors of A.
 
A = [ 3   2    1 ]
    [ 0   1    7 ]
    [ 5   4   −1 ]

Take columns 1 and 2:

λ_1 (3, 0, 5)′ + λ_2 (2, 1, 4)′ = (0, 0, 0)′

⇔ { 3λ_1 + 2λ_2 = 0
    λ_2 = 0
    5λ_1 + 4λ_2 = 0

(26.4) ⇔ λ_1 = 0, λ_2 = 0

is the only solution. So the first two columns are linearly independent. Now take all three columns:

λ_1 (3, 0, 5)′ + λ_2 (2, 1, 4)′ + λ_3 (1, 7, −1)′ = (0, 0, 0)′

⇔ { 3λ_1 + 2λ_2 + λ_3 = 0   (i)
    λ_2 + 7λ_3 = 0          (ii)
    5λ_1 + 4λ_2 − λ_3 = 0   (iii)

(26.5) ⇔ { (i) − 2(ii): 3λ_1 − 13λ_3 = 0
           (iii) − 4(ii): 5λ_1 − 29λ_3 = 0
           so λ_1 = 0, λ_3 = 0, and hence λ_2 = 0
is the only solution. So all three columns are linearly independent. This implies that the rank
of matrix A is 3.
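The rank computation can be confirmed numerically (a sketch assuming NumPy is available):

```python
import numpy as np

A = np.array([[3, 2, 1],
              [0, 1, 7],
              [5, 4, -1]])

print(np.linalg.matrix_rank(A))  # 3
print(np.linalg.det(A))          # nonzero, consistent with full rank
```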

(5) Recall that the system of equations
(26.6) A·x = b, where A is 3 × 3 and x, b are 3 × 1,
has a solution if and only if
(26.7) rank(A) = rank(A_b),
and the solution, if it exists, is unique if and only if
(26.8) rank(A) = rank(A_b) = 3 (the number of unknowns).
In this question
A = [ 1  1  1 ]       A_b = [ 1  1  1   6 ]
    [ 1  2  3 ]             [ 1  2  3  10 ]
    [ 1  2  λ ]             [ 1  2  λ   µ ]
We can verify that the rank of A is at least 2 since the first two rows of A are linearly inde-
pendent. Similarly the rank of Ab is at least 2 since the first two rows of Ab are also linearly
independent.
(a) For no solution to exist, the ranks of A and Ab need to be different which will be possible
only in case rank of A is 2 and rank of Ab is 3. This is because if rank of A is 3 then so is
the rank of Ab .
For rank of A to be 2, λ = 3. Also for rank of Ab to be equal to 3, µ ̸= 10.
(b) For unique solution, the rank of A and Ab must be equal to 3. Rank of A is 3 if and only if
λ ̸= 3. In this case, rank of Ab is 3 for every value of µ ∈ R. Thus for λ ̸= 3 and µ ∈ R we
get unique solution.
(c) For infinitely many solutions, the ranks of A and A_b need to be equal to 2. This is possible if and only if λ = 3 and µ = 10.
You might consider writing down the solutions in the last two cases in terms of λ and µ
values.
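The three cases can be illustrated numerically by computing both ranks for sample values of λ and µ (a sketch assuming NumPy; the particular values 11 and 4, 7 are arbitrary picks satisfying each case):

```python
import numpy as np

def ranks(lam, mu):
    A = np.array([[1.0, 1, 1], [1, 2, 3], [1, 2, lam]])
    Ab = np.column_stack([A, [6.0, 10, mu]])
    return int(np.linalg.matrix_rank(A)), int(np.linalg.matrix_rank(Ab))

print(ranks(3, 11))  # (2, 3): no solution
print(ranks(4, 7))   # (3, 3): unique solution
print(ranks(3, 10))  # (2, 2): infinitely many solutions
```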

(6)
A_11 = 2 > 0, A_11 A_22 − A_12 A_21 = 2·1 − 1 = 1 > 0: positive definite (PD).
B_11 > 0, B_22 > 0, B_11 B_22 − B_12 B_21 = 2·8 − 16 = 0: positive semidefinite (PSD).
C_11 < 0, C_11 C_22 − C_12 C_21 = −3·5 − 16 < 0: indefinite.
D_11 < 0, D_11 D_22 − D_12 D_21 = −3·(−6) − 16 > 0: negative definite (ND).

(7) Let A_t and B_t denote the number of employees in locations A and B in some period t. The transition probabilities are defined as follows:
p_AA ≡ probability that a current A remains an A,
p_AB ≡ probability that a current A moves to B,
p_BB ≡ probability that a current B remains a B,
p_BA ≡ probability that a current B moves to A.


The distribution of employees at time t is denoted by the vector x_t′ = [A_t B_t] and the transition probabilities in matrix form as
(26.9) M = [ p_AA  p_AB ] = [ 0.9  0.1 ]
           [ p_BA  p_BB ]   [ 0.7  0.3 ].
Then the distribution of employees across the two locations next period (t + 1) is x_t′ · M = x_{t+1}′, which is
[A_t B_t] [ 0.9  0.1 ] = [(0.9A_t + 0.7B_t)  (0.1A_t + 0.3B_t)] = [A_{t+1} B_{t+1}].
          [ 0.7  0.3 ]
In a similar manner we can determine the distribution of employees after two periods: x_{t+1}′ · M = x_{t+2}′, so
[A_{t+1} B_{t+1}] M = [A_{t+2} B_{t+2}],
[A_t B_t] M · M = [A_{t+2} B_{t+2}],
[A_t B_t] M² = [A_{t+2} B_{t+2}].
In general, for n periods,
(26.10) [A_t B_t] M^n = [A_{t+n} B_{t+n}].
The initial distribution of employees across the two locations at time t = 0 is
x_0′ = [A_0 B_0] = [0 2000].
Then the distribution of employees in the next period t = 1 is
[0 2000] M = [1400 600] = [A_1 B_1].
The distribution after two periods is
[0 2000] M² = [0 2000] [ 0.88  0.12 ] = [1680 320] = [A_2 B_2].
                       [ 0.84  0.16 ]
The distribution after four periods is
[0 2000] M⁴ = [0 2000] [ 0.8752  0.1248 ] ≈ [1747 253] = [A_4 B_4].
                       [ 0.8736  0.1264 ]

The distribution after six periods is
[0 2000] M⁶ = [0 2000] [ 0.875002  0.124998 ] ≈ [1750 250] = [A_6 B_6].
                       [ 0.874989  0.125011 ]
The distribution after eight periods is
[0 2000] M⁸ = [0 2000] [ 0.8750  0.1250 ] ≈ [1750 250] = [A_8 B_8].
                       [ 0.8750  0.1250 ]
The distribution after ten periods is
[0 2000] M¹⁰ = [0 2000] [ 0.8750  0.1250 ] ≈ [1750 250] = [A_10 B_10].
                        [ 0.8750  0.1250 ]

Observe that when the transition matrix is raised to higher powers, the new transition matrix converges to a matrix whose rows are identical. This is referred to as the steady state. In this example, the steady-state matrix is
M̄ = [ 7/8  1/8 ]
    [ 7/8  1/8 ].
To see this, observe that
[A B] [ 0.9  0.1 ] = [A B]
      [ 0.7  0.3 ]
gives
0.9A + 0.7B = A,
and
A + B = 2000.
Then we get
A = 7B, or A = (7/8)·2000 = 1750, B = (1/8)·2000 = 250.
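The convergence of the distribution and of the matrix powers can be checked numerically (a sketch assuming NumPy is available):

```python
import numpy as np

M = np.array([[0.9, 0.1],
              [0.7, 0.3]])
x0 = np.array([0.0, 2000.0])

# The distribution after n periods is x0 @ M^n.
for n in (1, 2, 4, 10):
    print(n, x0 @ np.linalg.matrix_power(M, n))

# Powers of M approach the steady-state matrix with identical rows (7/8, 1/8).
print(np.linalg.matrix_power(M, 50))
```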

(8) (a) We know
det(AB) = det A × det B.
Since the matrix A is nilpotent (A^k = O for some k), we get
det(A^k) = (det A)^k = det O = 0.
Hence, det A = 0.

(b) Note
det A′ = det A.
Also, the matrix −A is obtained by multiplying each row (or each column) of A by −1; hence
det(−A) = (−1)^n det A = −det A
when n is an odd number. Since A′ = −A, this gives
det A = det A′ = det(−A) = −det A.
This leads to
det A = 0,
and therefore A is not invertible.
(c) Note
det A′ = det A.
Since
det(AA′) = det A × det A′ = (det A)² = det I = 1,
we get
det A = ±1.
(d) As we have seen in part (b), for n an odd integer,
det AB = det A × det B = (−1)n det BA = (−1)n det B det A = − det A × det B,
implies
det A × det B = 0.
This means that either det A = 0 (i.e., A is not invertible) or det B = 0 (i.e., B is not invertible).
(e) Since
det AB = det A × det B = det I = 1,
det A ̸= 0 and therefore A is invertible. Pre-multiplying both sides by A−1 , we get
A−1 AB = IB = B = A−1 I = A−1 ,
showing that A−1 = B.

(9) (a) The characteristic polynomial is obtained by taking the determinant of the matrix
A − λI = [ 4−λ    4      4   ]
         [ −2    −3−λ   −6   ]
         [  1     3     6−λ  ]
This is equal to
(4 − λ)(−18 − 3λ + λ² + 18) − 4(−12 + 2λ + 6) + 4(−6 + 3 + λ) = 0.
On simplification we get
(4 − λ)(λ² − 3λ) − 4(2λ − 6) + 4(λ − 3) = 0,
12 − 16λ + 7λ² − λ³ = 0.
(b) The characteristic polynomial is of degree three and hence has three roots (possibly repeated). The solutions are λ_1 = 3, λ_2 = 2 and λ_3 = 2.
(c) Eigenvector for λ = 3:
[A − 3I]x = [  1   4   4 ] [ x_1 ]   [ 0 ]
            [ −2  −6  −6 ] [ x_2 ] = [ 0 ]
            [  1   3   3 ] [ x_3 ]   [ 0 ]
It is easy to check that x_1 = 0, x_2 = −1, x_3 = 1 is a solution. Hence the eigenvector family is given by
(x_1, x_2, x_3)′ = t(0, −1, 1)′, t ≠ 0.
Eigenvector for λ = 2:
[A − 2I]x = [  2   4   4 ] [ x_1 ]   [ 0 ]
            [ −2  −5  −6 ] [ x_2 ] = [ 0 ]
            [  1   3   4 ] [ x_3 ]   [ 0 ]
It is easy to check that x_1 = 2, x_2 = −2, x_3 = 1 is a solution. Hence the eigenvector family is given by
(x_1, x_2, x_3)′ = t(2, −2, 1)′, t ≠ 0.
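The eigenvalues and the hand-computed eigenvectors can be verified numerically (a sketch assuming NumPy is available):

```python
import numpy as np

A = np.array([[4, 4, 4],
              [-2, -3, -6],
              [1, 3, 6]])

print(np.sort(np.linalg.eigvals(A).real))  # approximately [2, 2, 3]

# Check the eigenvector families found above: t(0, -1, 1) for λ = 3 and t(2, -2, 1) for λ = 2.
v3 = np.array([0, -1, 1])
v2 = np.array([2, -2, 1])
print(np.allclose(A @ v3, 3 * v3), np.allclose(A @ v2, 2 * v2))  # True True
```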

(10) (a) We use Result 7.1 to prove this. The determinant of an upper triangular matrix is equal to the product of all the diagonal terms. If we take λ_i = a_ii, then the matrix [A − λ_i I] is still upper triangular and its diagonal entry in row i is zero, so det[A − λ_i I] = 0; hence each λ_i = a_ii is an eigenvalue.
Similar arguments can be used to prove the result for a lower triangular matrix.
(b) Since A is an invertible matrix, A^{−1} exists and we can pre-multiply the equation (A − λI)x = 0 by A^{−1}. This yields (I − λA^{−1})x = 0, or ((1/λ)I − A^{−1})x = 0, or (A^{−1} − (1/λ)I)x = 0, as desired (note λ ≠ 0 because A is invertible). Thus for an invertible matrix A, λ is an eigenvalue of A if and only if 1/λ is an eigenvalue of A^{−1}.
(c) Assume λ is the eigenvalue for the eigenvector x, so that
Ax = λx.
Pre-multiplying both sides by A, we get
A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ²x.
In other words, we get
A²x = λ²x.
Hence x is an eigenvector of A² and the corresponding eigenvalue is λ². Using the exercise in part (b) and a similar argument, we can show that x is an eigenvector of A^{−2} and the corresponding eigenvalue is λ^{−2}.
Chapter 27

Solution to PS 4

(1) (a)
f(x) = [(2x + 1)/(x − 1)]^{1/2}
f′(x) = (1/2)[(2x + 1)/(x − 1)]^{−1/2} · [(x − 1)·2 − (2x + 1)·1]/(x − 1)²
= −(3/2)[(x − 1)/(2x + 1)]^{1/2} · 1/(x − 1)²
(27.1) = −3/[2(2x + 1)^{1/2}(x − 1)^{3/2}]
(b)
f(x) = ln(3x² − 5x)
f′(x) = [1/(3x² − 5x)]·(6x − 5)
(27.2) = (6x − 5)/(3x² − 5x).

(2) Recall the equation of the tangent to f(x) at x_0 is
y = f(x_0) + f′(x_0)(x − x_0).
Using f(2) = 24, f′(x) = 10x + 3, f′(2) = 23:
y = 24 + 23(x − 2),
y = −22 + 23x.

(3) (a) Recall that f is continuous at x_0 if lim_{x→x_0−} f(x) = lim_{x→x_0+} f(x) = f(x_0). Here
lim_{x→0−} f(x) = −1, lim_{x→0+} f(x) = 0 = f(0),
so
lim_{x→0−} f(x) ≠ lim_{x→0+} f(x).
Hence f(x) is not continuous at x = 0.
(b)
lim_{x→2−} g(x) = 3·2 − 2 = 4; lim_{x→2+} g(x) = −2 + 6 = 4 = g(2).
Hence g(x) is continuous at x = 2.

(4) Since both
f(0) = g(0) = 0,
where f(x) = exp(x²) + exp(−x) − 2 and g(x) = 2x, we can use L'Hôpital's rule to find the limit.
(27.3) f′(x) = 2x·exp(x²) − exp(−x) ⇒ f′(0) = −1; g′(x) = 2 ⇒ g′(0) = 2.
Hence
(27.4) lim_{x→0} [exp(x²) + exp(−x) − 2]/(2x) = −1/2.
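The limit can also be checked numerically by evaluating the ratio at small x (a quick pure-Python sketch, not part of the original solution):

```python
from math import exp

f = lambda x: exp(x**2) + exp(-x) - 2
g = lambda x: 2 * x

for x in (1e-2, 1e-4, 1e-6):
    print(x, f(x) / g(x))  # the ratio approaches -0.5
```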

(5)
∇f(x, y) = [2xy + y² − 2y + 3    x² + 2xy − 2x]
H_f(x, y) = [ 2y           2x + 2y − 2 ]
            [ 2x + 2y − 2  2x          ]
(27.5) H_f(1, 2) = [ 4  4 ]
                   [ 4  2 ]

(6) Let f(x, y) be defined as
f(x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f(x, y) = 0 otherwise.
Show that the partial derivatives D_1 f(x, y) and D_2 f(x, y) exist at every point in R², although f is not continuous at (0, 0).
(a) Observe that for all (x, y) ≠ (0, 0), we get
D_1 f(x, y) = [(x² + y²)y − xy(2x)]/(x² + y²)² = y(y² − x²)/(x² + y²)²
and
D_2 f(x, y) = [(x² + y²)x − xy(2y)]/(x² + y²)² = x(x² − y²)/(x² + y²)².
Further,
D_1 f(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = lim_{h→0} 0/h = 0
and
D_2 f(0, 0) = lim_{h→0} [f(0, h) − f(0, 0)]/h = lim_{h→0} 0/h = 0.
Thus the partial derivatives D_1 f(x, y) and D_2 f(x, y) exist at every point (x, y) ∈ R².
(b) Consider y = x. The function satisfies f(x, x) = 1/2 for all points x ≠ 0, and therefore f(0, 0) = 0 ≠ lim_{h→0} f(h, h) = 1/2. Hence f is not continuous at (0, 0).
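A small numerical experiment illustrates both claims (pure-Python sketch, not part of the original solution):

```python
def f(x, y):
    return x * y / (x**2 + y**2) if (x, y) != (0, 0) else 0.0

# Along y = x the function equals 1/2 at every nonzero point, while f(0, 0) = 0,
# so f cannot be continuous at the origin.
print([f(h, h) for h in (0.1, 0.01, 0.001)])  # [0.5, 0.5, 0.5]

# Both partials at the origin are 0, because f vanishes on the axes.
h = 1e-6
print((f(h, 0) - f(0, 0)) / h, (f(0, h) - f(0, 0)) / h)  # 0.0 0.0
```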

(7) This exercise gives an example of a function with D_12 f(x, y) ≠ D_21 f(x, y). Let f(x, y) be defined as
f(x, y) = xy(x² − y²)/(x² + y²) if (x, y) ≠ (0, 0), and f(x, y) = 0 otherwise.
(a) Observe that for all (x, y) ≠ (0, 0), we get
D_1 f(x, y) = [(x² + y²)(3x²y − y³) − (x³y − xy³)2x]/(x² + y²)² = y(x⁴ + 4x²y² − y⁴)/(x² + y²)²
and
D_2 f(x, y) = [(x² + y²)(x³ − 3xy²) − (x³y − xy³)2y]/(x² + y²)² = x(x⁴ − 4x²y² − y⁴)/(x² + y²)².
Further,
D_1 f(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = lim_{h→0} 0/h = 0
and
D_2 f(0, 0) = lim_{h→0} [f(0, h) − f(0, 0)]/h = lim_{h→0} 0/h = 0.
Further,
D_1 f(x, y) = y(x⁴ + 4x²y² − y⁴)/(x² + y²)² = y[1 + (2x²y² − 2y⁴)/(x⁴ + 2x²y² + y⁴)],
so
|D_1 f(x, y)| ≤ |y|[1 + 2x²y²/(x⁴ + 2x²y² + y⁴)] ≤ |y|(1 + 1) = 2|y|.
It is easy to verify that |D_2 f(x, y)| ≤ 2|x| on similar lines. This shows that D_1 f(x, y) → 0 = D_1 f(0, 0) as (x, y) → (0, 0), since lim_{(x,y)→(0,0)} 2|y| = 0; similarly D_2 f(x, y) → 0 = D_2 f(0, 0), since lim_{(x,y)→(0,0)} 2|x| = 0. For all (x, y) ∈ R² \ {(0, 0)} the partial derivatives D_1 f(x, y) and D_2 f(x, y) are continuous functions, being ratios of two polynomials with non-vanishing denominator.
Thus the partial derivatives D_1 f(x, y) and D_2 f(x, y) exist at every point (x, y) ∈ R².
(b) Observe that
D_1 f(0, y) = lim_{h→0} [hy(h² − y²)/(h² + y²) − 0]/h = lim_{h→0} y(h² − y²)/(h² + y²) = −y
and
D_2 f(x, 0) = lim_{h→0} [xh(x² − h²)/(x² + h²) − 0]/h = lim_{h→0} x(x² − h²)/(x² + h²) = x.
Therefore, the partial derivatives D_1 f(x, y) and D_2 f(x, y) are continuous at every point in R². Since the real-valued function f has continuous partial derivatives at every point (x, y) ∈ R², it is continuous at every point (x, y) ∈ R².
(c) Since f (x, y) is a rational function with non-zero denominator for (x, y) ̸= (0, 0), the second
order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in R2 and are
continuous everywhere in R2 except at (0, 0).
(d) Given D2 f (x, 0) = x we get D21 f (0, 0) = +1 and from D1 f (0, y) = −y we get D12 f (0, 0) =
−1.
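The sign asymmetry of the cross partials at the origin can be reproduced with finite differences (pure-Python sketch; the step sizes k and H are ad hoc choices, not from the original solution):

```python
def f(x, y):
    return x * y * (x**2 - y**2) / (x**2 + y**2) if (x, y) != (0, 0) else 0.0

k, H = 1e-7, 1e-3  # inner and outer finite-difference steps

def D1(x, y):  # partial derivative in x (central difference)
    return (f(x + k, y) - f(x - k, y)) / (2 * k)

def D2(x, y):  # partial derivative in y (central difference)
    return (f(x, y + k) - f(x, y - k)) / (2 * k)

D21 = (D2(H, 0) - D2(-H, 0)) / (2 * H)  # d/dx of D2 f at (0,0) -> +1
D12 = (D1(0, H) - D1(0, -H)) / (2 * H)  # d/dy of D1 f at (0,0) -> -1
print(round(D21, 6), round(D12, 6))  # 1.0 -1.0
```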
Chapter 28

Solution to PS 5

(1) Let f(x) and g(x) be two concave functions and let h(x) = f(x) + g(x). Concavity of f and g imply, ∀x, y ∈ D, ∀λ ∈ [0, 1],
λf(x) + (1 − λ)f(y) ≤ f(λx + (1 − λ)y),
λg(x) + (1 − λ)g(y) ≤ g(λx + (1 − λ)y).
Adding these two inequalities, we get
[λf(x) + (1 − λ)f(y)] + [λg(x) + (1 − λ)g(y)] ≤ f(λx + (1 − λ)y) + g(λx + (1 − λ)y),
λ[f(x) + g(x)] + (1 − λ)[f(y) + g(y)] ≤ f(λx + (1 − λ)y) + g(λx + (1 − λ)y),
λh(x) + (1 − λ)h(y) ≤ h(λx + (1 − λ)y).
This proves that h(x) = f(x) + g(x) is concave.

(2) (a) False. Consider A, B ⊆ R with A = [0, 2] and B = [4, 6], so A ∪ B = [0, 2] ∪ [4, 6]. Then 1 ∈ A ∪ B and 5 ∈ A ∪ B, but (1/2)·1 + (1/2)·5 = 3 ∉ A ∪ B.
(b) True. If A and B are convex sets, then A ∩ B is convex. Let x ∈ A ∩ B, y ∈ A ∩ B, and λ ∈ [0, 1]. Then
(28.1) λx + (1 − λ)y ∈ A as x, y ∈ A,
(28.2) λx + (1 − λ)y ∈ B as x, y ∈ B,
(28.3) ⇒ λx + (1 − λ)y ∈ A ∩ B.
Hence A ∩ B is convex.
(c) True. Let z, z′ ∈ C, and let 0 ≤ λ ≤ 1. By definition of C, there exist x, x′ ∈ A and y, y′ ∈ B,
such that z = x + y and z′ = x′ + y′ . We will show that λz + (1 − λ)z′ belongs to C. This
will establish that C is a convex set in Rn .


Since x, x′ ∈ A, and A is a convex set, we have λx + (1 − λ)x′ ∈ A. Since y, y′ ∈ B, and B is


a convex set, we have λy + (1 − λ)y′ ∈ B. By definition of C, we have:
(28.4) [λx + (1 − λ)x′ ] + [λy + (1 − λ)y′ ] ∈ C
We can rewrite (28.4) as:
λ(x + y) + (1 − λ)(x′ + y′ ) ∈ C
Since z = x + y and z′ = x′ + y′ , this means [λz + (1 − λ)z′ ] ∈ C.

(3) Let x, y ∈ [0, 1] and let 0 ≤ λ ≤ 1. Clearly [λx + (1 − λ)y] ∈ [0, 1]. In order to prove the claim,
we will show that:
h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)
Using the definition of h,
(28.5) h(λx + (1 − λ)y) = f (λx + (1 − λ)y)g(λx + (1 − λ)y)
Since f and g are convex functions on [0, 1]
f (λx + (1 − λ)y) ≤ λ f (x) + (1 − λ) f (y), and
g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y).
Since f and g are non-negative valued functions,
f (λx + (1 − λ)y)g(λx + (1 − λ)y) ≤ [λ f (x) + (1 − λ) f (y)][λg(x) + (1 − λ)g(y)]
= λ2 f (x)g(x) + λ(1 − λ){ f (x)g(y) + g(x) f (y)}
(28.6) + (1 − λ)2 f (y)g(y)
Since f and g are increasing functions on [0, 1],
{ f (x) − f (y)}{g(x) − g(y)} ≥ 0
and so:
(28.7) f (x)g(x) + f (y)g(y) ≥ f (x)g(y) + f (y)g(x)
Using (28.7) in (28.6),
f(λx + (1 − λ)y)g(λx + (1 − λ)y) ≤ λ²f(x)g(x) + λ(1 − λ){f(x)g(x) + f(y)g(y)} + (1 − λ)²f(y)g(y)
= λf(x)g(x) + (1 − λ)f(y)g(y)
(28.8) = λh(x) + (1 − λ)h(y).
Combining this with (28.5), we obtain
h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y),
which is the desired result.

(4) (a) Recall a monotone function of one variable is quasi-concave. Since f (x) = 3x + 4 is mono-
tone increasing, it is quasi-concave.

(b) The bordered Hessian is
B(x, y) = [ 0         y exp(x)  exp(x) ]
          [ y exp(x)  y exp(x)  exp(x) ]
          [ exp(x)    exp(x)    0      ]
det B_1(x, y) = det [ 0         y exp(x) ] = −y² exp(2x) < 0;
                    [ y exp(x)  y exp(x) ]
det B_2(x, y) = det B(x, y) = y exp(3x) > 0.
Recall the sufficient condition for f to be quasiconcave is that
(−1)^r det B_r(x) > 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.
This holds true for the function. Hence it is quasiconcave.


(c)
B(x, y) = [ 0        −2xy³    −3x²y² ]
          [ −2xy³    −2y³     −6xy²  ]
          [ −3x²y²   −6xy²    −6x²y  ]
det B_1(x, y) = det [ 0       −2xy³ ]
                    [ −2xy³   −2y³  ]
(28.9) = −4x²y⁶ ≤ 0
det B_2(x, y) = det [ 0        −2xy³    −3x²y² ]
                    [ −2xy³    −2y³     −6xy²  ]
                    [ −3x²y²   −6xy²    −6x²y  ]
(28.10) = −30x⁴y⁷
Note the sign of det B_2(x, y) is not positive. Hence it is not quasi-concave.

Figure 28.1. Function f(x), Problem 5

Figure 28.2. Function g(x), Problem 5

(5) Let
(28.11) f(x) = { 0 for x ≤ 0; x for 0 ≤ x ≤ 1/2; 1 − x for 1/2 ≤ x ≤ 1; 0 for x ≥ 1 },
(28.12) g(x) = { 0 for x ≤ 1; x − 1 for 1 ≤ x ≤ 3/2; 2 − x for 3/2 ≤ x ≤ 2; 0 for x ≥ 2 }, and
(28.13) h(x) = f(x) + g(x).
The functions in Fig. 28.1 and Fig. 28.2 are quasiconcave (each of them is first non-decreasing, then non-increasing), whereas the function in Fig. 28.3, which is their sum, is not quasiconcave (it is not non-decreasing, is not non-increasing, and is not non-decreasing then non-increasing).
Figure 28.3. Function f(x) + g(x), Problem 5

(6) (i)
∇f(x, y, z) = [24x² + 2y²    4xy    −3z²]
(28.14) H_f(x, y, z) = [ 48x  4y   0  ]
                       [ 4y   4x   0  ]
                       [ 0    0   −6z ]
Then f(x, y, z) is not concave, as the principal minor D_1 = 48x > 0. The bordered Hessian is
B(x, y, z) = [ 0            24x² + 2y²   4xy   −3z² ]
             [ 24x² + 2y²   48x          4y    0    ]
             [ 4xy          4y           4x    0    ]
             [ −3z²         0            0     −6z  ]
det B_1(x, y, z) = det [ 0            24x² + 2y² ]
                       [ 24x² + 2y²   48x        ]
(28.15) = −576x⁴ − 96x²y² − 4y⁴ ≤ 0
det B_2(x, y, z) = det [ 0            24x² + 2y²   4xy ]
                       [ 24x² + 2y²   48x          4y  ]
                       [ 4xy          4y           4x  ]
(28.16) = −2304x⁵ − 384x³y² + 48xy⁴
which can take both positive and negative values. Hence f(x, y, z) is neither quasiconcave nor quasiconvex.
(ii)
∇g(x, y) = [1 − exp(x) − exp(x + y)    1 − exp(x + y)]
(28.17) H_g(x, y) = [ −exp(x) − exp(x + y)   −exp(x + y) ]
                    [ −exp(x + y)            −exp(x + y) ]
Then the leading principal minors
(28.18) D_1 = −exp(x) − exp(x + y) < 0, D_2 = e^x e^{x+y} > 0
imply that g(x, y) is concave. Hence it is also quasi-concave.
Chapter 29

Solution to PS 6

(1) We can write the set of linear equations in the form
(29.1) A · (x, y, z, w)′ = (1, 3)′, where A = [ 1  3   1  −2 ]
                                              [ 2  6  −2  −4 ]
(a) The rank of the matrix A can be at most 2. This means that there can be at most two endogenous variables. Also, the second column of A is a multiple (three times) of the first column, and the fourth column is a multiple (−2 times) of the first column. The remaining candidates are columns one and three. The sub-matrix consisting of columns one and three has full rank, as its determinant is −4. So we can choose x and z as endogenous variables and the remaining two, y and w, as exogenous variables.
(b) The system of linear equations can be rewritten as follows (with the choice of exogenous and endogenous variables made above):
(29.2) [ 1   1 ] · (x, z)′ = (1 − 3y + 2w, 3 − 6y + 4w)′
       [ 2  −2 ]
Multiply the first equation by two and add it to the second to get
(29.3) 4x = 5 − 12y + 8w,
(29.4) x = (5 − 12y + 8w)/4 = 5/4 − 3y + 2w.
Substitute the value of x into the first equation to get
z = 1 − 3y + 2w − (5/4 − 3y + 2w) = −1/4.


(2) The system of linear equations is
(29.5) A · (x, y, z, w)′ = (0, 3, 6)′, where A = [ −1   3  −1  1 ]
                                                 [  4  −1   1  1 ]
                                                 [  7   1   1  3 ]
The rank of the matrix A can be at most 3. However, we observe that the third row is equal to the sum of twice the second row and the first row. This means that the rank of A cannot be three. The sub-matrix obtained by eliminating the third row of A (call it matrix B) is
(29.6) [ −1   3  −1  1 ]
       [  4  −1   1  1 ]
The determinant of the sub-matrix of B obtained by eliminating the third and fourth columns is −11, which is non-zero. This sub-matrix has full rank. So we can choose x and y as endogenous variables and the remaining two, z and w, as exogenous variables.
We can solve the set of equations to obtain
(29.7) [ −1   3 ] · (x, y)′ = (z − w, 3 − z − w)′
       [  4  −1 ]
Solving the two equations we get
x = (9 − 2z − 4w)/11,
and
y = (3 + 3z − 5w)/11.

(3) Observe that we can write the equation as
F(x, y) = x² − xy³ + y⁵ − 17 = 0,
which is a continuous function, being a polynomial. Also,
D_2 F(x, y) = −3xy² + 5y⁴ = −3(5)(4) + 5(2)⁴ = 20 ≠ 0 at (x, y) = (5, 2).
Hence, by the Implicit Function Theorem, there exists a function y = f(x) in terms of x, which is continuously differentiable, in the neighborhood of (x, y) = (5, 2). Further,
f′(x)|_{x=5} = −[D_1 F(x, y)/D_2 F(x, y)]|_{(5,2)} = −[(2x − y³)/(−3xy² + 5y⁴)]|_{(5,2)} = −2/20 = −1/10.
Then
y = f(4.9) ≈ f(5) + (4.9 − 5)·f′(5) = 2 + (−0.1)·(−1/10) = 2 + 0.01 = 201/100.

(4) Consider the function f(x, y, z) = x² − y² + z³.
(a) Check that z = −3 satisfies the equation f(x, y, z) = 0 for x = 6 and y = 3: 36 − 9 − 27 = 0.
(b) Observe that
D_3 f(x, y, z) = 3z² = 3(−3)² = 27 ≠ 0.
In addition, the function f is a continuous function, being a polynomial. Hence, by the Implicit Function Theorem (IFT), there exists a function z = h(x, y) in terms of x and y, which is continuously differentiable, in the neighborhood of (x, y) = (6, 3).
(c) By IFT, we have
(dz/dx)|_{(6,3,−3)} = −[D_1 f(x, y, z)/D_3 f(x, y, z)]|_{(6,3,−3)} = −[2x/3z²]|_{(6,3,−3)} = −2(6)/3(9) = −4/9,
and
(dz/dy)|_{(6,3,−3)} = −[D_2 f(x, y, z)/D_3 f(x, y, z)]|_{(6,3,−3)} = −[−2y/3z²]|_{(6,3,−3)} = 2(3)/3(9) = 2/9.

(d) If x increases to 6.1 and y decreases to 2.8, then
z ≈ h(6, 3) + (dz/dx)|_{(6,3)}·(6.1 − 6) + (dz/dy)|_{(6,3)}·(2.8 − 3)
= −3 + (−4/9)·(0.1) + (2/9)·(−0.2) = −3 − 2/45 − 2/45 = −139/45.
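Since z can here be solved explicitly as z = −(x² − y²)^{1/3}, the linear IFT estimate can be compared with the exact value (a pure-Python sketch, not part of the original solution):

```python
# Linear (IFT) approximation of z around (x, y, z) = (6, 3, -3).
z_approx = -3 + (-4/9) * (6.1 - 6) + (2/9) * (2.8 - 3)
print(z_approx)  # -139/45 ≈ -3.0889

# Exact z solving x^2 - y^2 + z^3 = 0 at (x, y) = (6.1, 2.8).
z_exact = -((6.1**2 - 2.8**2) ** (1/3))
print(z_exact)
print(abs(z_approx - z_exact) < 0.01)  # True: the IFT estimate is close
```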

(5) Consider the profit maximizing firm described in Example 13.2. If p increases by ∆p and w increases by ∆w, what will be the change in the optimal input amount x?
Note the first order condition for profit maximization is
p f′(x) − w = 0.
It can be written as a function of p, w and x, as F(p, w, x) = p f′(x) − w = 0. Then D_3 F(p, w, x) = p f″(x) < 0 since f(x) is strictly concave. Also, F(p, w, x) is a continuously differentiable function. Hence we can apply the IFT to claim that there exists a continuously differentiable function x = f(p, w) in the neighborhood of (p, w, x*), where x* is the profit maximizing input quantity. Then
x ≈ x* + (dx/dp)·∆p + (dx/dw)·∆w
= x* − [D_1 F(p, w, x)/D_3 F(p, w, x)]·∆p − [D_2 F(p, w, x)/D_3 F(p, w, x)]·∆w
= x* − [f′(x)/(p f″(x))]·∆p − [−1/(p f″(x))]·∆w
= x* − [f′(x)/(p f″(x))]·∆p + [1/(p f″(x))]·∆w.

(6) Consider 3x²yz + xyz² = 96 as defining x as an implicit function of y and z around the point x = 2, y = 3, z = 2.
(a) Let F(x, y, z) = 3x²yz + xyz² − 96 = 0. Then
D_1 F(x, y, z) = 6xyz + yz² = 6(2)(3)(2) + 3(4) = 84 ≠ 0,
and F(x, y, z) is a continuously differentiable function (being a polynomial). Hence we can apply the IFT to claim that there exists a continuously differentiable function x = f(y) in terms of y in the neighborhood of (x, y, z) = (2, 3, 2). Also
(dx/dy)|_{(2,3,2)} = −D_2 F(x, y, z)/D_1 F(x, y, z) = −(3x²z + xz²)/(6xyz + yz²)
= −(3x² + xz)/(6xy + yz) = −[3(4) + 2(2)]/[6(2)(3) + 3(2)] = −16/42 = −8/21.
Then
x ≈ 2 + (dx/dy)|_{(2,3,2)}·(3.1 − 3) = 2 + (−8/21)(0.1) = 2 − 8/210 = 412/210.
(b) Treating z as fixed and solving the quadratic 3yz·x² + yz²·x − 96 = 0 for x,
x = [−yz² ± √(y²z⁴ + 1152yz)]/(6yz) = −z/6 ± √(z²/36 + 32/(yz)) = −1/3 ± √(1/9 + 16/y),
which implies that
x = −1/3 + √(1/9 + 16/y)
in the neighborhood of (2, 3, 2).
(c)
(dx/dy)|_{(2,3,2)} = [1/(2√(1/9 + 16/y))]·(−16/y²) = [1/(2·(7/3))]·(−16/9) = −8/21.
y

Then
x ≈ 2 + (−8/21)(3.1 − 3) = 2 − 8/210 = 412/210.
(d) The second method involves more computations.
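The explicit branch and its slope can be verified numerically (pure-Python sketch, not part of the original solution):

```python
from math import sqrt

# Explicit solution for x in terms of y (with z = 2 held fixed).
x = lambda y: -1/3 + sqrt(1/9 + 16/y)

print(x(3))  # ≈ 2, recovering the point (x, y, z) = (2, 3, 2)

# Central-difference slope at y = 3; the hand computation gives -8/21 ≈ -0.380952.
h = 1e-6
slope = (x(3 + h) - x(3 - h)) / (2 * h)
print(slope)
```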

(7) Let both x and y be positive. Then
f(x + y) = f(x · (x + y)/x) = [(x + y)/x] f(x),
so
f(x) = [x/(x + y)] f(x + y).
Similarly,
f(y) = [y/(x + y)] f(x + y).
So
f(x) + f(y) = [x/(x + y)] f(x + y) + [y/(x + y)] f(x + y) = [(x + y)/(x + y)] f(x + y) = f(x + y).
If x is zero, then
f(2x) = 2f(x) = 2f(0) and f(2x) = f(0) ⇒ 2f(0) = f(0) ⇒ f(0) = 0.
Then
f(x + y) = f(0 + y) = f(y) = 0 + f(y) = f(0) + f(y) = f(x) + f(y).
The same arguments hold if both x and y are zero.
Another method of proof is as follows: let x > 0 and y > 0. Then x = yt for some t > 0, and
f(x + y) = f(yt + y) = f[(1 + t)y] = (1 + t)f(y) = f(y) + tf(y) = f(y) + f(ty) = f(y) + f(x).
The remaining cases of x = 0 or y = 0 are handled as in the earlier proof.

(8) First observe that if x is the null vector, then
f(2x) = 2f(x) = 2f(0) and f(2x) = f(0) ⇒ 2f(0) = f(0) ⇒ f(0) = 0.
Take x and x′ such that f(x) = y > 0 and f(x′) = y′ > 0. Then f(x) = y gives (1/y)f(x) = 1, and by homogeneity of degree one,
f(x/y) = 1.
Similarly
f(x′/y′) = 1.
Take λ ∈ (0, 1) and define
θ = λy/[λy + (1 − λ)y′].
Then
1 − θ = (1 − λ)y′/[λy + (1 − λ)y′]
and θ ∈ (0, 1). Function f is quasi-concave, so
f(θ(x/y) + (1 − θ)(x′/y′)) ≥ min{f(x/y), f(x′/y′)} = min{1, 1} = 1.
Substituting for θ,
f([λx + (1 − λ)x′]/[λy + (1 − λ)y′]) ≥ 1,
and by homogeneity of degree one,
[1/(λy + (1 − λ)y′)] f(λx + (1 − λ)x′) ≥ 1,
f(λx + (1 − λ)x′) ≥ λy + (1 − λ)y′ = λf(x) + (1 − λ)f(x′),
so f is concave. If f(x′) is zero, then, since f is non-decreasing and homogeneous of degree one,
f(λx + (1 − λ)x′) ≥ f(λx) = λf(x) + 0 = λf(x) + (1 − λ)f(x′).
If both f(x) and f(x′) are zero, then
f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)} = 0 = λf(x) + (1 − λ)f(x′).

(9) Since the function f is homogeneous of degree m and is twice continuously differentiable, each of its partial derivatives is homogeneous of degree m − 1.
Further, the partial derivatives are also continuously differentiable, and their partial derivatives are homogeneous of degree m − 2.
Applying Euler's theorem to the partial derivative D_1 f(x), which is homogeneous of degree m − 1, we get
x_1 D_11 f(x) + x_2 D_12 f(x) + · · · + x_n D_1n f(x) = (m − 1)D_1 f(x).
In general, applying Euler's theorem to the partial derivative D_i f(x), we get
x_1 D_i1 f(x) + x_2 D_i2 f(x) + · · · + x_n D_in f(x) = (m − 1)D_i f(x)
for i = 1, · · · , n. We can write these n equalities in matrix notation as
[ D_11 f(x)  D_12 f(x)  · · ·  D_1n f(x) ] [ x_1 ]           [ D_1 f(x) ]
[   · · ·      · · ·    · · ·    · · ·   ] [ ··· ]           [   · · ·  ]
[ D_i1 f(x)  D_i2 f(x)  · · ·  D_in f(x) ] [ x_i ] = (m − 1) [ D_i f(x) ]
[   · · ·      · · ·    · · ·    · · ·   ] [ ··· ]           [   · · ·  ]
[ D_n1 f(x)  D_n2 f(x)  · · ·  D_nn f(x) ] [ x_n ]           [ D_n f(x) ]
The n × n square matrix on the left hand side is the Hessian matrix H_f(x) of the function f, so the left hand side is H_f(x) · x. Pre-multiplying both sides by the row vector x′, we get
x′ H_f(x) · x = (m − 1)[x_1 D_1 f(x) + · · · + x_n D_n f(x)].
Applying Euler's theorem to the sum on the right hand side, we get
x′ H_f(x) · x = (m − 1)[m f(x)] = m(m − 1) f(x).
Chapter 30

Solution to PS 7

(1)
∇g(x, y) = [3x² − 3    3y² − 2]
(30.1) H_g(x, y) = [ 6x  0  ]
                   [ 0   6y ]
Then, for x > 0 and y > 0,
(30.2) D_1 = 6x > 0, D_2 = 36xy > 0
implies that g(x, y) is convex. Using
(30.3) ∇g(x, y) = [3x² − 3    3y² − 2] = [0  0],
(30.4) x* = 1, y* = √(2/3)
is the unique solution for the function g defined for all x > 0, y > 0. Using the theorem on convexity and global minima, g(x, y) attains its global minimum at (1, √(2/3)):
g(1, √(2/3)) = −2 − (4/3)√(2/3) ≈ −3.09.


Figure 30.1. Graph of f(x) = x⁴ − 4x³ + 4x² + 4

(2) We know that f′(x) = 0 is a necessary condition for f to have a local maximum or minimum. We find all the critical points from
(30.5) f′(x) = 4x³ − 12x² + 8x = 0,
(30.6) 4x(x² − 3x + 2) = 0,
(30.7) x = 0, x = 1, x = 2.
If we plot the graph of this function, we can see that x = 0 and x = 2 are local minima and x = 1 is a local maximum. Also, x = 0 and x = 2 are global minima and there is no global maximum.
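The classification of the three critical points can also be confirmed with the second derivative (pure-Python sketch, not part of the original solution):

```python
f = lambda x: x**4 - 4*x**3 + 4*x**2 + 4
fp = lambda x: 4*x**3 - 12*x**2 + 8*x   # f'
fpp = lambda x: 12*x**2 - 24*x + 8      # f''

for x in (0, 1, 2):
    print(x, fp(x), fpp(x), f(x))
# x = 0: f'' = 8 > 0, local (and global) minimum, f = 4
# x = 1: f'' = -4 < 0, local maximum, f = 5
# x = 2: f'' = 8 > 0, local (and global) minimum, f = 4
```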

(3) The profit function for the monopolist is
π(Q_1, Q_2) = Q_1(100 − 5Q_1) + Q_2(50 − 10Q_2) − (50 + 10Q_1 + 10Q_2).
The first order conditions for profit maximization are
D_1 π(Q_1, Q_2) = 100 − 10Q_1 − 10 = 0, or Q_1 = 9,
D_2 π(Q_1, Q_2) = 50 − 20Q_2 − 10 = 0, or Q_2 = 2.
We need to check the second order conditions. Note
D_11 π = −10, D_22 π = −20, and D_12 π = D_21 π = 0,
which gives the first order leading principal minor −10 and the second order leading principal minor 200. So the Hessian is negative definite for all outputs in the positive orthant. Therefore, the function π is a concave function. Then Q_1 = 9 and Q_2 = 2 is a profit maximizing supply plan for the firm. The maximum profit is π* = 9 × 55 + 2 × 30 − 50 − 110 = 395.
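A brute-force check of the profit function confirms the supply plan (pure-Python sketch; the grid of perturbations is an arbitrary choice, not from the original solution):

```python
def profit(Q1, Q2):
    return Q1*(100 - 5*Q1) + Q2*(50 - 10*Q2) - (50 + 10*Q1 + 10*Q2)

print(profit(9, 2))  # 395

# Nearby output plans give lower profit, consistent with the concavity argument.
print(all(profit(9 + d1, 2 + d2) <= profit(9, 2)
          for d1 in (-0.5, 0, 0.5) for d2 in (-0.5, 0, 0.5)))  # True
```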

(4) (a) The profit for the firm, when it uses K and L units of capital and labor to produce output Q = L^a K^b, given the output and input prices (P, w, r), is
Π(K, L) = P·Q − wL − rK.
The firm maximizes its profit by choosing K and L such that both the FOC and SOC are satisfied.
The FOCs are as under:
dΠ/dL = P·aL^{a−1}K^b − w = 0, i.e. P·aL^{a−1}K^b = w;
dΠ/dK = P·L^a bK^{b−1} − r = 0, i.e. P·L^a bK^{b−1} = r.
The FOC with respect to L leads to the condition that the value of the marginal product of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the condition that the value of the marginal product of capital is equal to the rental rate r.
(b) To solve for the optimal levels of L and K, we divide the first FOC by the second and get
(P·MP_L)/(P·MP_K) = MP_L/MP_K = (aL^{a−1}K^b)/(L^a bK^{b−1}) = aK/(bL) = w/r,
so
K = (wb/ra)·L.
Observe that the ratio of MP_L and MP_K is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The value of K can be substituted into either of the two FOCs to get the expression for L:
P·aL^{a−1}K^b = w;
P·aL^{a−1}[(wb/ra)L]^b = w;
P·L^{a+b−1}(wb/ra)^b = w/a;
P·(a/w)^{1−b}(b/r)^b = L^{1−a−b};
L* = (a/w)^{(1−b)/(1−a−b)} (b/r)^{b/(1−a−b)} P^{1/(1−a−b)}.
We compute the optimal value of K* as under:
K* = (wb/ra)·L*
   = (wb/ra)·(a/w)^{(1−b)/(1−a−b)} (b/r)^{b/(1−a−b)} P^{1/(1−a−b)}
   = (a/w)^{(1−b)/(1−a−b) − 1} (b/r)^{b/(1−a−b) + 1} P^{1/(1−a−b)}
   = (a/w)^{a/(1−a−b)} (b/r)^{(1−a)/(1−a−b)} P^{1/(1−a−b)}.
(c) For the SOC, we first write down the Hessian (the matrix of second order partial derivatives) using the FOCs:
H = [ PF_LL  PF_LK ] = [ Pa(a−1)L^{a−2}K^b    PabL^{a−1}K^{b−1}  ]
    [ PF_KL  PF_KK ]   [ PabL^{a−1}K^{b−1}    Pb(b−1)L^a K^{b−2} ]
For the SOC to be satisfied, the leading principal minor of order one needs to be negative and the leading principal minor of order two needs to be positive. Thus Pa(a−1)L^{a−2}K^b < 0, which implies that a − 1 < 0, or a < 1. The LPM of order two is the determinant of the Hessian matrix:
det H = P²ab(a−1)(b−1)L^{2a−2}K^{2b−2} − (PabL^{a−1}K^{b−1})²
      = P²ab[(a−1)(b−1) − ab]L^{2a−2}K^{2b−2}
      = P²ab[1 − a − b]L^{2a−2}K^{2b−2} > 0,
which holds true if and only if 1 − a − b > 0. Note that this condition also implies that b < 1.
Thus the production function displays diminishing marginal product in each of the two inputs (a < 1 and b < 1), and it also displays diminishing returns to scale, as the production function is homogeneous of degree a + b < 1.
(d) We use the expression for $L^{*}$ derived earlier to find the partial derivatives:
\begin{align*}
\frac{\partial L^{*}}{\partial P} &= \frac{1}{1-a-b}\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{1}{1-a-b}-1}>0,\\
\frac{\partial L^{*}}{\partial w} &= -\frac{1-b}{1-a-b}\,a^{\frac{1-b}{1-a-b}}\,w^{-\frac{1-b}{1-a-b}-1}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{1}{1-a-b}}<0,\\
\frac{\partial L^{*}}{\partial r} &= -\frac{b}{1-a-b}\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}b^{\frac{b}{1-a-b}}\,r^{-\frac{b}{1-a-b}-1}P^{\frac{1}{1-a-b}}<0.
\end{align*}
(e) The output is obtained by noting that the profit-maximizing inputs are $K^{*}$ and $L^{*}$:
\begin{align*}
Q^{*} &= (L^{*})^{a}(K^{*})^{b}\\
&= \left[\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{1}{1-a-b}}\right]^{a}
\left[\left(\frac{a}{w}\right)^{\frac{a}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{1-a}{1-a-b}}P^{\frac{1}{1-a-b}}\right]^{b}\\
&= \left(\frac{a}{w}\right)^{\frac{a(1-b)+ab}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{ab+b(1-a)}{1-a-b}}P^{\frac{a+b}{1-a-b}}\\
&= \left(\frac{a}{w}\right)^{\frac{a}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{a+b}{1-a-b}}
= \left[\left(\frac{a}{w}\right)^{a}\left(\frac{b}{r}\right)^{b}P^{a+b}\right]^{\frac{1}{1-a-b}}.
\end{align*}

For computing the price elasticity of supply with respect to output price, note that
\[
Q^{*}=\left[\left(\frac{a}{w}\right)^{a}\left(\frac{b}{r}\right)^{b}\right]^{\frac{1}{1-a-b}}P^{\frac{a+b}{1-a-b}}=AP^{\frac{a+b}{1-a-b}},
\]
where $A=\left[\left(\frac{a}{w}\right)^{a}\left(\frac{b}{r}\right)^{b}\right]^{\frac{1}{1-a-b}}$ is a constant independent of $P$. It is easy to see that the elasticity will be $\varepsilon_{P}=\frac{a+b}{1-a-b}$. [Note that for $Q=AP^{b}$, $\varepsilon_{P}=\frac{dQ}{dP}\cdot\frac{P}{Q}=AbP^{b-1}\frac{P}{Q}=b$.]
Similarly, $\varepsilon_{w}=-\frac{a}{1-a-b}$ and $\varepsilon_{r}=-\frac{b}{1-a-b}$. Thus,
\[
\varepsilon_{P}+\varepsilon_{w}+\varepsilon_{r}=\frac{a+b}{1-a-b}+\frac{-a}{1-a-b}+\frac{-b}{1-a-b}=\frac{a+b-a-b}{1-a-b}=0.
\]
The economic interpretation is that if we change all the prices by the same factor, then the profit-maximizing quantity does not change. In other words, the profit-maximizing output is homogeneous of degree zero in the prices $(P,w,r)$.
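The homogeneity claim can be checked numerically; the parameters below are illustrative.

```python
# Illustrative check that Q*(P, w, r) is homogeneous of degree zero:
# scaling all prices by t leaves the profit-maximizing output unchanged.
a, b = 0.3, 0.4
d = 1.0 - a - b

def q_star(P, w, r):
    return ((a / w) ** a * (b / r) ** b * P ** (a + b)) ** (1 / d)

base = q_star(2.0, 1.5, 0.5)
for t in (0.5, 2.0, 10.0):
    assert abs(q_star(2.0 * t, 1.5 * t, 0.5 * t) - base) / base < 1e-9
```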
(f) You may like to write down the expression for the profit function explicitly in terms of P,
w and r, on your own.

(5) (a) The profit for the firm, when it uses $K$, $L$ and $R$ units of capital, labor and natural resources to produce output $Q=AL^{a}K^{b}+\ln R$, given the output and input prices $(P,w,r,v)$, is
\[
\Pi(K,L,R)=P\cdot Q-wL-rK-vR=P\cdot AL^{a}K^{b}+P\ln R-wL-rK-vR.
\]
The firm maximizes its profit by choosing $K$, $L$ and $R$ such that both the FOC and SOC are satisfied.
The FOCs are as under:
\begin{align*}
\frac{d\Pi}{dL} &= P\cdot AaL^{a-1}K^{b}-w=PF_{L}-w=0, & P\cdot AaL^{a-1}K^{b} &= w;\\
\frac{d\Pi}{dK} &= P\cdot AL^{a}bK^{b-1}-r=PF_{K}-r=0, & P\cdot AL^{a}bK^{b-1} &= r;\\
\frac{d\Pi}{dR} &= \frac{P}{R}-v=PF_{R}-v=0, & \frac{P}{R} &= v.
\end{align*}
The FOC with respect to $L$ leads to the condition that the value of the marginal product of labor is equal to the wage rate $w$. Similarly, the FOC with respect to $K$ leads to the condition that the value of the marginal product of capital is equal to the rental rate $r$. Lastly, the FOC with respect to $R$ leads to the condition that the value of the marginal product of the natural resource is equal to the price of the natural resource $v$.
Now take $A=3$, $a=b=\frac{1}{3}$ for the remainder of the problem.
(b) With the given parameter values, the FOCs are (note $Aa=1=Ab$):
\[
P\cdot L^{-\frac{2}{3}}K^{\frac{1}{3}}=w;\qquad
P\cdot L^{\frac{1}{3}}K^{-\frac{2}{3}}=r;\qquad
\frac{P}{R}=v.
\]
For the SOC, we first write down the Hessian (the matrix of second-order partial derivatives) using the FOCs:
\[
H=\begin{bmatrix} PF_{LL} & PF_{LK} & PF_{LR}\\ PF_{KL} & PF_{KK} & PF_{KR}\\ PF_{RL} & PF_{RK} & PF_{RR}\end{bmatrix}
=\begin{bmatrix}
-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0\\
\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}} & 0\\
0 & 0 & -\frac{P}{R^{2}}
\end{bmatrix}.
\]
For the SOC to be satisfied, the leading principal minor of order one needs to be negative,
the leading principal minor of order two needs to be positive and the leading principal
minor of order three needs to be negative.
The LPM of order one is negative, as $-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}}<0$ (given that $P>0$, $K>0$ and $L>0$). The LPM of order two is the determinant of the matrix obtained by removing the third row and the third column:
\begin{align*}
\det H_{2} &= \det\begin{bmatrix}
-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}}\\
\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}}
\end{bmatrix}\\
&= \frac{4}{9}P^{2}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}-\frac{1}{9}P^{2}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}
= \frac{1}{3}P^{2}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}>0.
\end{align*}
The LPM of order three is the determinant of the Hessian matrix. We compute the determinant using the third row to get
\begin{align*}
\det H &= \det\begin{bmatrix}
-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0\\
\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}} & 0\\
0 & 0 & -\frac{P}{R^{2}}
\end{bmatrix}\\
&= -\frac{P}{R^{2}}\left[\frac{4}{9}P^{2}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}-\frac{1}{9}P^{2}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}\right]
= -\frac{1}{3}\frac{P^{3}}{R^{2}}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}<0.
\end{align*}
Hence the SOC is satisfied.
(c) To solve for the optimal levels of $L$ and $K$, we divide the first FOC by the second (note $a=b=\frac{1}{3}$) to get
\[
\frac{P\cdot MP_L}{P\cdot MP_K}=\frac{MP_L}{MP_K}=\frac{P\cdot aL^{a-1}K^{b}}{P\cdot L^{a}bK^{b-1}}=\frac{w}{r};
\qquad
\frac{aK}{bL}=\frac{K}{L}=\frac{w}{r};
\qquad
K=\frac{w}{r}L.
\]
r
Observe that the ratio of $MP_L$ to $MP_K$ is the MRTS (marginal rate of technical substitution), i.e., the rate at which one can substitute labor for capital along an isoquant. The value of $K$ can be substituted into either FOC to get the expression for $L$.
\begin{align*}
P\cdot L^{-\frac{2}{3}}K^{\frac{1}{3}} &= w;\\
P\cdot L^{-\frac{2}{3}}\left(\frac{w}{r}L\right)^{\frac{1}{3}} &= w;\\
P\cdot\left(\frac{w}{r}\right)^{\frac{1}{3}} &= wL^{\frac{1}{3}};\\
P\cdot\left(\frac{1}{rw^{2}}\right)^{\frac{1}{3}} &= L^{\frac{1}{3}};\\
L^{*} &= \frac{P^{3}}{rw^{2}}.
\end{align*}
Taking the derivative of $L^{*}$ with respect to $r$, we obtain
\[
\frac{dL^{*}}{dr}=-\frac{P^{3}}{r^{2}w^{2}}.
\]
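This derivative can be sketched numerically with a finite difference; the values below are illustrative.

```python
# Finite-difference check (illustrative values) of dL*/dr = -P^3/(r^2 w^2)
# for the closed form L* = P^3/(r w^2).
P, w, r = 2.0, 1.5, 0.5
L_star = lambda r_: P ** 3 / (r_ * w ** 2)

h = 1e-6
numeric = (L_star(r + h) - L_star(r - h)) / (2 * h)   # central difference
analytic = -P ** 3 / (r ** 2 * w ** 2)
assert abs(numeric - analytic) / abs(analytic) < 1e-6
```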

(d) (i) We first totally differentiate the FOCs to get
\begin{align*}
dP\cdot L^{-\frac{2}{3}}K^{\frac{1}{3}}-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}}\,dL+\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}}\,dK &= dw,\\
dP\cdot L^{\frac{1}{3}}K^{-\frac{2}{3}}+\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}}\,dL-\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}}\,dK &= dr,\\
\frac{dP}{R}-\frac{P}{R^{2}}\,dR &= dv.
\end{align*}
We can write this in matrix form as under:
\[
A=\begin{bmatrix}
-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0\\
\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}} & 0\\
0 & 0 & -\frac{P}{R^{2}}
\end{bmatrix},\qquad
q=\begin{bmatrix} dL\\ dK\\ dR\end{bmatrix},\qquad
b=\begin{bmatrix} dw-dP\cdot L^{-\frac{2}{3}}K^{\frac{1}{3}}\\ dr-dP\cdot L^{\frac{1}{3}}K^{-\frac{2}{3}}\\ dv-\frac{dP}{R}\end{bmatrix}.
\]
Then $Aq=b$. Note that the matrix $A$ is the same as the Hessian. Solving for $dL$, when $dP=dw=dv=0$ and $dr\neq 0$, using Cramer's Rule, we get
\begin{align*}
dL &= \frac{\det\begin{bmatrix}
0 & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0\\
dr & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}} & 0\\
0 & 0 & -\frac{P}{R^{2}}
\end{bmatrix}}{\det\begin{bmatrix}
-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0\\
\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}} & 0\\
0 & 0 & -\frac{P}{R^{2}}
\end{bmatrix}}\\
&= \frac{\left(-\frac{P}{R^{2}}\right)(-dr)\,\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}}}{-\frac{1}{3}\frac{P^{3}}{R^{2}}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}}
= -\frac{dr\cdot L^{\frac{2}{3}}K^{\frac{2}{3}}}{P}<0.
\end{align*}
\[
\frac{dL^{*}}{dr}=-\frac{L^{\frac{2}{3}}K^{\frac{2}{3}}}{P}<0.
\]
Thus, $L^{*}$ decreases as $r$ increases. To see that we obtain an identical expression for $\frac{dL^{*}}{dr}$ as in the previous part, observe
\begin{align*}
K^{*} &= \frac{P^{3}}{r^{2}w};\qquad L^{*}\cdot K^{*}=\frac{P^{6}}{r^{3}w^{3}};\qquad (L^{*}\cdot K^{*})^{\frac{2}{3}}=\frac{P^{4}}{r^{2}w^{2}};\\
\frac{dL^{*}}{dr} &= -\frac{(L^{*}K^{*})^{\frac{2}{3}}}{P}=-\frac{P^{3}}{r^{2}w^{2}}.
\end{align*}
(ii) Solving for $dL$, when $dP=dw=dr=0$ and $dv\neq 0$, using Cramer's Rule, we get
\[
dL = \frac{\det\begin{bmatrix}
0 & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0\\
0 & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}} & 0\\
dv & 0 & -\frac{P}{R^{2}}
\end{bmatrix}}{\det\begin{bmatrix}
-\frac{2}{3}P\cdot L^{-\frac{5}{3}}K^{\frac{1}{3}} & \frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & 0\\
\frac{1}{3}P\cdot L^{-\frac{2}{3}}K^{-\frac{2}{3}} & -\frac{2}{3}P\cdot L^{\frac{1}{3}}K^{-\frac{5}{3}} & 0\\
0 & 0 & -\frac{P}{R^{2}}
\end{bmatrix}}
= \frac{0}{-\frac{1}{3}\frac{P^{3}}{R^{2}}\cdot L^{-\frac{4}{3}}K^{-\frac{4}{3}}}=0.
\]
Thus
\[
\frac{dL^{*}}{dv}=0.
\]
Since L∗ does not depend on v, this conclusion is obvious.
Chapter 31

Solution to PS 8

1. (a) The constraint set $C$ can be rewritten as
\[
C=\left\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}: d\big((0,0,0),(x_{1},x_{2},x_{3})\big)=1\right\},
\]
where $d$ is the Euclidean metric; therefore $C$ is
(i) bounded, since $C\subset B\big((0,0,0),2\big)$: indeed, $x\in C\Rightarrow d(x,0)=1<2\Rightarrow x\in B(0,2)$;
(ii) closed in $\mathbb{R}^{3}$, since it is defined as a level set in $\mathbb{R}^{3}$ of the polynomial, and therefore continuous, function $\sum_{i=1}^{3}x_{i}^{2}$ (use the characterization of closed sets in terms of convergent sequences);
(iii) non-empty, since $(1,0,0)\in C$.
Since the objective function $\sum_{i=1}^{3}c_{i}x_{i}$ is linear, and therefore continuous on $\mathbb{R}^{3}$, the Weierstrass theorem is applicable and yields $\bar{x}\in C$ such that $\sum_{i=1}^{3}c_{i}x_{i}\le\sum_{i=1}^{3}c_{i}\bar{x}_{i}$ for any $(x_{1},x_{2},x_{3})\in C$.
(b) The optimization problem can be rewritten as
\[
\text{(31.1)}\qquad \max f(x)\quad\text{subject to } g(x)=0\ \text{and}\ x\in\mathbb{R}^{3},
\]
where
\[
f(x)=\sum_{i=1}^{3}c_{i}x_{i}\quad\text{and}\quad g(x)=\sum_{i=1}^{3}x_{i}^{2}-1.
\]

Both functions $f$ and $g$ are polynomial and therefore continuously differentiable on the open set $\mathbb{R}^{3}$. Since $\bar{x}$ is a point of global maximum of $f$ subject to the constraint $g(x)=0$, it is also a local maximum of $f$ subject to the constraint $g(x)=0$. Since $g(0)=-1\neq 0$, we have

x̄ ̸= 0. Now
∇g(x) = 2 (x1 , x2 , x3 )′ ̸= 0 for x ̸= 0,
and x̄ ̸= 0, hence constraint qualification ∇g(x̄) ̸= 0 holds. Therefore by Lagrange’s theorem
there exists λ ∈ R such that ∇ f (x̄) = λ∇g (x̄), or
\[
\text{(31.2)}\qquad (c_{1},c_{2},c_{3})'=2\lambda(\bar{x}_{1},\bar{x}_{2},\bar{x}_{3})'.
\]
If we premultiply (31.2) by the row vector $(\bar{x}_{1},\bar{x}_{2},\bar{x}_{3})$, we get
\[
\text{(31.3)}\qquad \sum_{i=1}^{3}c_{i}\bar{x}_{i}=2\lambda\sum_{i=1}^{3}\bar{x}_{i}^{2}=2\lambda\big(g(\bar{x})+1\big)=2\lambda(0+1)=2\lambda.
\]
If we premultiply (31.2) by the row vector $(c_{1},c_{2},c_{3})$, equation (31.3) yields
\[
\text{(31.4)}\qquad \|c\|^{2}=\sum_{i=1}^{3}c_{i}^{2}=2\lambda\sum_{i=1}^{3}c_{i}\bar{x}_{i}=\left(\sum_{i=1}^{3}c_{i}\bar{x}_{i}\right)^{2}.
\]

To conclude that the result holds, we only need to show that $\sum_{i=1}^{3}c_{i}\bar{x}_{i}\ge 0$. Indeed, since $(c_{1},c_{2},c_{3})\neq(0,0,0)$, we have $c_{i}\neq 0$ for some $i$. Since $g\!\left(e_{i}\frac{|c_{i}|}{c_{i}}\right)=0$ and $\bar{x}$ solves (31.1), by definition of the solution to the constrained maximization problem
\[
\sum_{i=1}^{3}c_{i}\bar{x}_{i}=f(\bar{x})\ge f\!\left(e_{i}\frac{|c_{i}|}{c_{i}}\right)=|c_{i}|>0.
\]
Now taking square roots in (31.4) yields the result.
(c) Let us define $c=p$, and consider $\hat{x}=\frac{q}{\|q\|}$. Then $\|\hat{x}\|=1$, hence $g(\hat{x})=0$ and the definition of the solution of the constrained maximization problem yields
\[
\|p\|=\|c\|=\sum_{i=1}^{3}c_{i}\bar{x}_{i}=f(\bar{x})\ge f(\hat{x})=\sum_{i=1}^{3}c_{i}\hat{x}_{i}=\frac{1}{\|q\|}\sum_{i=1}^{3}c_{i}q_{i}=\frac{1}{\|q\|}\sum_{i=1}^{3}p_{i}q_{i}=\frac{pq}{\|q\|}.
\]
Analogously, for $\check{x}=-\frac{q}{\|q\|}$ we have $\|\check{x}\|=1$, hence $g(\check{x})=0$ and the definition of the solution of the constrained maximization problem yields
\[
\|p\|=\|c\|=\sum_{i=1}^{3}c_{i}\bar{x}_{i}=f(\bar{x})\ge f(\check{x})=-\frac{1}{\|q\|}\sum_{i=1}^{3}c_{i}q_{i}=-\frac{pq}{\|q\|}.
\]
Therefore, since $\|p\|\,\|q\|>0$, we have
\[
-\|p\|\,\|q\|\le pq\le\|p\|\,\|q\|\ \Leftrightarrow\ |pq|\le\|p\|\,\|q\|.
\]
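The inequality just derived is the Cauchy-Schwarz inequality; a quick numerical spot check on random vectors is easy to run.

```python
import random

# Spot check of |p.q| <= ||p|| ||q|| (Cauchy-Schwarz) on random 3-vectors.
random.seed(0)
norm = lambda v: sum(vi * vi for vi in v) ** 0.5

for _ in range(1000):
    p = [random.uniform(-5, 5) for _ in range(3)]
    q = [random.uniform(-5, 5) for _ in range(3)]
    dot = sum(pi * qi for pi, qi in zip(p, q))
    assert abs(dot) <= norm(p) * norm(q) + 1e-12
```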
{ }
2. Necessity Route: The function $f(x,y)=x^{2}-3xy$ is continuous, and the constraint set $\{(x,y)\in\mathbb{R}^{2}_{+}\mid x+2y=10\}$, which we denote by $G$, is non-empty ($(10,0)$ is contained in it), closed (as the set is defined by weak inequalities and an equality, which are preserved in the limit) and bounded (as $\|(x,y)\|\le\sqrt{10^{2}+5^{2}}=\sqrt{125}$). So the constraint set is compact and non-empty and the objective function $f$ is continuous, hence the Weierstrass theorem is applicable and a solution exists. The Lagrangian and the FOCs are
\begin{align*}
\mathcal{L}(x,y,\lambda) &= x^{2}-3xy+\lambda(2y+x-10) \tag{31.5}\\
\frac{\partial\mathcal{L}}{\partial x} &= 2x-3y+\lambda=0 \tag{31.6}\\
\frac{\partial\mathcal{L}}{\partial y} &= -3x+2\lambda=0\ \rightarrow\ \lambda=\frac{3}{2}x \tag{31.7}\\
\frac{\partial\mathcal{L}}{\partial\lambda} &= 2y+x-10=0. \tag{31.8}
\end{align*}
Now
\begin{align*}
2x-3y+\lambda=2x-3y+\frac{3}{2}x=0\ &\rightarrow\ \frac{7}{2}x=3y\ \rightarrow\ y=\frac{7}{6}x;\\
2y+x-10=0\ \rightarrow\ \frac{7}{3}x+x-10=0\ &\rightarrow\ \frac{10}{3}x=10\ \rightarrow\ x=3;\\
y=\frac{7}{6}\cdot 3=\frac{7}{2},\qquad \lambda &= \frac{9}{2}.
\end{align*}
We get an interior candidate for a solution,
\[
m_{1}=\left(3,\frac{7}{2},\frac{9}{2}\right).
\]
The constraint qualification
\[
\nabla g(x^{*},y^{*})=\begin{bmatrix}1 & 2\end{bmatrix}\neq 0
\]
holds for all $(x,y)\in\mathbb{R}^{2}_{+}$. Verify that
\[
f(10,0)=100,\qquad f(0,5)=0,\qquad f\left(3,\frac{7}{2}\right)=-\frac{45}{2}.
\]
The solution then is $(x^{*},y^{*})=(10,0)$. Note that we cannot use the sufficiency route since $f$ is not concave.
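A grid search along the constraint confirms the corner solution; this is a sketch, not part of the original solution.

```python
# Grid search along x + 2y = 10 (x, y >= 0), confirming the corner optimum (10, 0).
f = lambda x, y: x * x - 3 * x * y

candidates = [(f(10 - 2 * y, y), 10 - 2 * y, y) for y in [5 * i / 10000 for i in range(10001)]]
val, x, y = max(candidates)
assert abs(x - 10) < 1e-6 and abs(y) < 1e-6   # maximizer is the corner (10, 0)
assert abs(val - 100) < 1e-6                  # maximized value f(10, 0) = 100
```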

3. Necessity Route: A solution exists by arguments similar to the earlier problem. The Lagrangian and the FOCs are
\begin{align*}
\mathcal{L}(x,y,\lambda) &= x^{\frac{1}{3}}y^{\frac{2}{3}}+\lambda(4-2x-y) \tag{31.9}\\
\frac{\partial\mathcal{L}}{\partial x} &= \frac{1}{3}x^{-\frac{2}{3}}y^{\frac{2}{3}}-2\lambda=0 \tag{31.10}\\
\frac{\partial\mathcal{L}}{\partial y} &= \frac{2}{3}x^{\frac{1}{3}}y^{-\frac{1}{3}}-\lambda=0 \tag{31.11}\\
\frac{\partial\mathcal{L}}{\partial\lambda} &= 4-2x-y=0. \tag{31.12}
\end{align*}
Now
\begin{align*}
\frac{\frac{1}{3}x^{-\frac{2}{3}}y^{\frac{2}{3}}}{\frac{2}{3}x^{\frac{1}{3}}y^{-\frac{1}{3}}}=\frac{2\lambda}{\lambda}\ &\rightarrow\ \frac{y}{2x}=2\ \rightarrow\ y=4x;\\
4-2x-y=4-2x-4x=0\ &\rightarrow\ x=\frac{2}{3},\ y=\frac{8}{3},\ \lambda=\frac{2}{3}\left(\frac{1}{4}\right)^{\frac{1}{3}}.
\end{align*}
We get an interior candidate for a solution,
\[
m_{1}=\left(\frac{2}{3},\frac{8}{3},\frac{2}{3}\left(\frac{1}{4}\right)^{\frac{1}{3}}\right).
\]

The constraint qualification
\[
\nabla g(x^{*},y^{*})=\begin{bmatrix}-2 & -1\end{bmatrix}\neq 0
\]
holds for all $(x,y)\in\mathbb{R}^{2}_{+}$. Verify that
\[
f(2,0)=0=f(0,4),\qquad f\left(\frac{2}{3},\frac{8}{3}\right)=\left(\frac{2}{3}\right)^{\frac{1}{3}}\left(\frac{8}{3}\right)^{\frac{2}{3}}>0.
\]
The solution then is $(x^{*},y^{*})=\left(\frac{2}{3},\frac{8}{3}\right)$.
Sufficiency route:
\[
\nabla f(x,y)=\begin{bmatrix}\frac{1}{3}x^{-\frac{2}{3}}y^{\frac{2}{3}} & \frac{2}{3}x^{\frac{1}{3}}y^{-\frac{1}{3}}\end{bmatrix},\qquad
H_{f}(x,y)=\begin{bmatrix}-\frac{2}{9}x^{-\frac{5}{3}}y^{\frac{2}{3}} & \frac{2}{9}x^{-\frac{2}{3}}y^{-\frac{1}{3}}\\ \frac{2}{9}x^{-\frac{2}{3}}y^{-\frac{1}{3}} & -\frac{2}{9}x^{\frac{1}{3}}y^{-\frac{4}{3}}\end{bmatrix}.
\]
The principal minors of order one,
\[
-\frac{2}{9}x^{-\frac{5}{3}}y^{\frac{2}{3}}\le 0,\qquad -\frac{2}{9}x^{\frac{1}{3}}y^{-\frac{4}{3}}\le 0,
\]
and the principal minor of order two,
\[
\det H_{f}=\frac{4}{81}x^{-\frac{4}{3}}y^{-\frac{2}{3}}-\frac{4}{81}x^{-\frac{4}{3}}y^{-\frac{2}{3}}=0\ge 0,
\]
for all $(x,y)\in\mathbb{R}^{2}_{++}$. Hence $f$ is concave. The constraint is linear and so concave, and $\lambda>0$. Thus $\mathcal{L}(x,y,\lambda)$ is concave and the FOCs are sufficient for a maximum. Therefore $(x^{*},y^{*})=\left(\frac{2}{3},\frac{8}{3}\right)$, which satisfies the FOCs, is the solution.
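The candidate can also be spot-checked against a grid search along the constraint (a sketch, outside the formal argument above).

```python
# Grid check that (2/3, 8/3) maximizes x^(1/3) y^(2/3) on 2x + y = 4, x, y >= 0.
f = lambda x, y: x ** (1 / 3) * y ** (2 / 3)

best_val, best_x = max((f(x, 4 - 2 * x), x) for x in [2 * i / 10000 for i in range(10001)])
assert abs(best_x - 2 / 3) < 1e-3
assert abs(best_val - f(2 / 3, 8 / 3)) < 1e-6
```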

4. Let $f:\mathbb{R}^{2}\to\mathbb{R}$,
\[
\text{(31.13)}\qquad \max f(x,y)=xy\quad\text{subject to } x+y\le 6,\ x\ge 0,\ y\ge 0.
\]
This problem has inequality constraints, so we will use the Kuhn-Tucker sufficiency theorem. Since $\sqrt{\cdot}$ is strictly increasing, $xy$ and $\sqrt{xy}$ have the same maximizers on the constraint set, so we work with $f(x,y)=\sqrt{xy}$, which turns out to be concave. We need to check that all conditions of the theorem are satisfied.
(i) Let
\[
X=\left\{(x,y)\in\mathbb{R}^{2}_{++}\right\}.
\]
Then $X$ is open, as its complement
\[
X^{C}=\left\{(x,y)\in\mathbb{R}^{2}\mid x\le 0\ \text{or}\ y\le 0\right\}
\]
is closed.
(ii) The function $f(x,y)=\sqrt{xy}$ is continuous, as $x$ and $y$ are continuous and $f(\cdot)$ is the square root of the product of these two continuous functions. The constraint functions $g^{1}(x,y)=6-x-y$, $g^{2}(x,y)=x$, $g^{3}(x,y)=y$ are linear and hence continuous. Further, $f_{x}(x,y)=\frac{1}{2}\sqrt{\frac{y}{x}}$ and $f_{y}(x,y)=\frac{1}{2}\sqrt{\frac{x}{y}}$ are continuous on $X$. Hence $f,g^{j}$ ($j=1,\dots,3$) are continuously differentiable on $X$.
(iii) The set $X$ is convex: if $(x_{1},y_{1}),(x_{2},y_{2})\in X$, then
\begin{align*}
x_{1}>0,\ x_{2}>0\ &\rightarrow\ \lambda x_{1}+(1-\lambda)x_{2}>0\quad\forall\lambda\in(0,1),\\
y_{1}>0,\ y_{2}>0\ &\rightarrow\ \lambda y_{1}+(1-\lambda)y_{2}>0\quad\forall\lambda\in(0,1),\\
&\rightarrow\ \big(\lambda x_{1}+(1-\lambda)x_{2},\ \lambda y_{1}+(1-\lambda)y_{2}\big)\in X.
\end{align*}
(iv) The function $f(x,y)$ is concave, as
\[
\nabla f(x,y)=\begin{bmatrix}\frac{1}{2}\sqrt{\frac{y}{x}} & \frac{1}{2}\sqrt{\frac{x}{y}}\end{bmatrix},\qquad
H_{f}(x,y)=\begin{bmatrix}-\frac{1}{4}\sqrt{\frac{y}{x^{3}}} & \frac{1}{4}\frac{1}{\sqrt{xy}}\\ \frac{1}{4}\frac{1}{\sqrt{xy}} & -\frac{1}{4}\sqrt{\frac{x}{y^{3}}}\end{bmatrix}.
\]
The principal minors of order one,
\[
-\frac{1}{4}\sqrt{\frac{y}{x^{3}}}\le 0,\qquad -\frac{1}{4}\sqrt{\frac{x}{y^{3}}}\le 0,
\]
and the principal minor of order two,
\[
\det H_{f}=\frac{1}{16xy}-\frac{1}{16xy}=0\ge 0,
\]
for all $(x,y)\in X$. Hence $f$ is concave. Further, $g^{j}$ ($j=1,\dots,3$) are concave, being linear functions.
Hence, for the following problem:
\[
\max_{(x,y)\in X} f(x,y)=\sqrt{xy}\quad\text{subject to } x+y\le 6,\ x\ge 0,\ y\ge 0,
\]
all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^{*},y^{*}),\lambda^{*})\in X\times\mathbb{R}^{3}_{+}$ that satisfies the Kuhn-Tucker conditions:
(i) $D_{i}f(x^{*})+\sum_{j}\lambda_{j}^{*}D_{i}g^{j}(x^{*})=0$, $i=1,\dots,n$;
(ii) $g(x^{*})\ge 0$ and $\lambda^{*}\cdot g(x^{*})=0$.
They are
\begin{align*}
\frac{1}{2}\sqrt{\frac{y}{x}}-\lambda_{1}+\lambda_{2} &= 0 \tag{31.14}\\
\frac{1}{2}\sqrt{\frac{x}{y}}-\lambda_{1}+\lambda_{3} &= 0 \tag{31.15}\\
6-x-y\ge 0,\quad \lambda_{1}(6-x-y) &= 0 \tag{31.16}\\
x\ge 0,\ \lambda_{2}x=0;\qquad y\ge 0,\ \lambda_{3}y &= 0. \tag{31.17}
\end{align*}
If $\lambda_{1}=0$, then $\frac{1}{2}\sqrt{\frac{x}{y}}-\lambda_{1}+\lambda_{3}=0\rightarrow\lambda_{3}=-\frac{1}{2}\sqrt{\frac{x}{y}}<0$, which contradicts $\lambda_{3}\ge 0$. Hence
\[
\lambda_{1}>0\ \rightarrow\ 6-x-y=0.
\]
Since $x>0$ and $y>0$, $\lambda_{2}=0$, $\lambda_{3}=0$, and
\[
\frac{1}{2}\sqrt{\frac{y}{x}}-\lambda_{1}+\lambda_{2}=\frac{1}{2}\sqrt{\frac{x}{y}}-\lambda_{1}+\lambda_{3}=0\ \rightarrow\ \frac{1}{2}\sqrt{\frac{x}{y}}=\lambda_{1}=\frac{1}{2}\sqrt{\frac{y}{x}}
\ \rightarrow\ x=y\ \rightarrow\ x=y=3>0.
\]
Note that all conditions are satisfied. Hence $(3,3)$ is a global maximum on $X$. Observe that it is also a global maximum on $\mathbb{R}^{2}_{+}$, as
\[
f(x,y)=0\ \text{for } (x,y)\in\mathbb{R}^{2}_{+}\setminus X
\]
and $f(3,3)>0$. Hence, $(3,3)$ solves the optimization problem.
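The Kuhn-Tucker point can be checked directly, taking $f(x,y)=\sqrt{xy}$ as in the first-order conditions above; note $\lambda_{1}=\frac{1}{2}$ at $x=y=3$.

```python
# Direct check of the Kuhn-Tucker conditions at x = y = 3 with
# lambda1 = 1/2, lambda2 = lambda3 = 0, for f(x, y) = sqrt(xy).
x = y = 3.0
l1, l2, l3 = 0.5, 0.0, 0.0

s1 = 0.5 * (y / x) ** 0.5 - l1 + l2    # stationarity in x
s2 = 0.5 * (x / y) ** 0.5 - l1 + l3    # stationarity in y
assert abs(s1) < 1e-12 and abs(s2) < 1e-12
assert 6 - x - y == 0 and x > 0 and y > 0                        # feasibility; budget binds
assert l1 * (6 - x - y) == 0 and l2 * x == 0 and l3 * y == 0     # complementary slackness
```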

5. Let $f:\mathbb{R}^{2}\to\mathbb{R}$,
\[
\text{(31.18)}\qquad \max f(x,y)=x+\ln(1+y)\quad\text{subject to } x\ge 0,\ y\ge 0\ \text{and}\ x+py\le m.
\]
Again we will use the Kuhn-Tucker sufficiency theorem and check that all conditions of the theorem are satisfied.
(i) Let
\[
X=\left\{(x,y)\in\mathbb{R}^{2}\mid x>-1,\ y>-1\right\}.
\]
Then $X$ is open, as its complement
\[
X^{C}=\left\{(x,y)\in\mathbb{R}^{2}\mid x\le -1\ \text{or}\ y\le -1\right\}
\]
is closed.
(ii) The function $f(x,y)$ is continuous, as $x$ and $\ln(1+y)$, for $y>-1$, are continuous, and $f(\cdot)$ is the sum of two continuous functions. The constraint functions $g^{1}(x,y)=m-x-py$, $g^{2}(x,y)=x$, $g^{3}(x,y)=y$ are linear and hence continuous. Further, $f_{x}(x,y)=1$ and $f_{y}(x,y)=\frac{1}{1+y}$ are continuous. Hence $f,g^{j}$ ($j=1,\dots,3$) are continuously differentiable on $X$.
(iii) The set $X$ is convex: if $(x_{1},y_{1}),(x_{2},y_{2})\in X$, then
\begin{align*}
x_{1}>-1,\ x_{2}>-1\ &\rightarrow\ \lambda x_{1}+(1-\lambda)x_{2}>-1\quad\forall\lambda\in(0,1),\\
y_{1}>-1,\ y_{2}>-1\ &\rightarrow\ \lambda y_{1}+(1-\lambda)y_{2}>-1\quad\forall\lambda\in(0,1),\\
&\rightarrow\ \big(\lambda x_{1}+(1-\lambda)x_{2},\ \lambda y_{1}+(1-\lambda)y_{2}\big)\in X.
\end{align*}
(iv) The function $f(x,y)$ is concave, as
\[
\nabla f(x,y)=\begin{bmatrix}1 & \frac{1}{1+y}\end{bmatrix},\qquad
H_{f}(x,y)=\begin{bmatrix}0 & 0\\ 0 & -\frac{1}{(1+y)^{2}}\end{bmatrix}.
\]
The principal minors of order one,
\[
0\le 0,\qquad -\frac{1}{(1+y)^{2}}\le 0,
\]
and the principal minor of order two,
\[
\det H_{f}=0\ge 0,
\]
for all $(x,y)\in X$. Hence $f$ is concave, and $g^{j}$ ($j=1,\dots,3$) are concave, being linear functions.
Hence, for the following problem:
\[
\max_{(x,y)\in X} f(x,y)=x+\ln(1+y)\quad\text{subject to } x+py\le m,\ x\ge 0,\ y\ge 0,
\]
all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^{*},y^{*}),\lambda^{*})\in X\times\mathbb{R}^{3}_{+}$ that satisfies the Kuhn-Tucker conditions:
(i) $D_{i}f(x^{*})+\sum_{j}\lambda_{j}^{*}D_{i}g^{j}(x^{*})=0$, $i=1,\dots,n$; and
(ii) $g(x^{*})\ge 0$ and $\lambda^{*}\cdot g(x^{*})=0$.
They are
\begin{align*}
1-\lambda_{1}+\lambda_{2} &= 0 \tag{31.19}\\
\frac{1}{1+y}-p\lambda_{1}+\lambda_{3} &= 0 \tag{31.20}\\
m-x-py\ge 0,\quad \lambda_{1}(m-x-py) &= 0 \tag{31.21}\\
x\ge 0,\ \lambda_{2}x=0;\qquad y\ge 0,\ \lambda_{3}y &= 0. \tag{31.22}
\end{align*}
If $\lambda_{1}=0$, then $1-\lambda_{1}+\lambda_{2}=0\rightarrow\lambda_{2}=-1<0$, which contradicts $\lambda_{2}\ge 0$. Hence
\[
\lambda_{1}>0\ \rightarrow\ m-x-py=0,
\]
and $x=y=0$ is ruled out because $m>0$. There are three remaining cases.
(i) $x>0$, $y=0$. Note $\lambda_{2}=0$, $x=m$, and
\begin{align*}
1 &= \lambda_{1},\\
1-p+\lambda_{3} &= 0,\\
\lambda_{3} &= p-1.
\end{align*}
If $p-1\ge 0$, then $\lambda_{3}\ge 0$. So the solution is $(m,0,1,0,p-1)$ if $p\ge 1$.


(ii) $x=0$, $y>0$. Note $\lambda_{3}=0$, $y=\frac{m}{p}$, and
\begin{align*}
\frac{1}{p\left(1+\frac{m}{p}\right)} &= \frac{1}{p+m}=\lambda_{1},\\
1-\lambda_{1}+\lambda_{2} &= 0,\\
1-\frac{1}{p+m}+\lambda_{2} &= 0,\\
\lambda_{2} &= \frac{1}{p+m}-1.
\end{align*}
If $\frac{1}{p+m}-1\ge 0$, i.e. $1\ge p+m$, then $\lambda_{2}\ge 0$. So the solution is $\left(0,\frac{m}{p},\frac{1}{p+m},\frac{1}{p+m}-1,0\right)$ if $p+m\le 1$.
(iii) $x>0$, $y>0$. Note $\lambda_{2}=0$, $\lambda_{3}=0$, and
\begin{align*}
1=\lambda_{1},\qquad \frac{1}{1+y}=p\ \rightarrow\ y=\frac{1}{p}-1>0, \tag{31.23}\\
m-x-py=0\ \rightarrow\ x=m-1+p>0. \tag{31.24}
\end{align*}
Hence for $1>p>1-m$, the solution is $\left(m-1+p,\frac{1}{p}-1,1,0,0\right)$. Combining the cases, the solution $\left(x^{*},y^{*},\lambda_{1}^{*},\lambda_{2}^{*},\lambda_{3}^{*}\right)$ is
\[
\begin{cases}
(m,0,1,0,p-1) & \text{if } p\ge 1,\\
\left(0,\frac{m}{p},\frac{1}{p+m},\frac{1}{p+m}-1,0\right) & \text{if } p\le 1-m,\\
\left(m-1+p,\frac{1}{p}-1,1,0,0\right) & \text{if } 1-m<p<1.
\end{cases}
\]
The Kuhn-Tucker sufficiency theorem asserts that this solution is a global maximum and therefore solves the problem.
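The interior regime of the piecewise solution can be spot-checked against a grid search on the budget line; the parameters below are illustrative.

```python
import math

# Spot check of the interior regime: for 1 - m < p < 1, the optimum is
# x* = m - 1 + p, y* = 1/p - 1. Since f is increasing in x, the budget binds.
m, p = 2.0, 0.5                        # here 1 - m < p < 1
f = lambda x, y: x + math.log(1 + y)
x_s, y_s = m - 1 + p, 1 / p - 1

grid = [(m / p) * i / 10000 for i in range(10001)]       # y values on the budget line
best_val, best_y = max((f(m - p * y, y), y) for y in grid)
assert abs(best_y - y_s) < 1e-3
assert abs(best_val - f(x_s, y_s)) < 1e-6
```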
6. Recall the two optimization problems are
\[
\text{(31.25)}\qquad \max f(x)\quad\text{subject to } g(x)\ge 0\ \text{and}\ x\in X,
\]
and the corresponding optimization problem
\[
\text{(31.26)}\qquad \max f(x)\quad\text{subject to } x\in X,
\]
in which the constraint $g(x)\ge 0$ has been omitted.


(a) We claim that x̄ is also a solution to problem (31.25). For, if this is not the case, then since
x̄ is in the constraint set {x ∈ X : g(x) ≥ 0} of problem (31.25), there is some x′ ∈ X, with
g(x′ ) ≥ 0, such that f (x′ ) > f (x̄). But, since x′ ∈ X and is therefore in the constraint set
of problem (31.26), this means that x̄ is not a solution to problem (31.26), a contradiction.
This establishes our claim. [Note that we are not given the information that problem (31.25)
has a solution, and so we do not make use of this information in the answer].
(b) Let x̂ be any solution to problem (31.25). Note that since both x̂ and x̄ are in X, the constraint
set of problem (31.26), and x̄ solves problem (31.26), we have
(31.27) f (x̄) ≥ f (x̂)
We claim that g(x̂) = 0. For if g(x̂) ̸= 0, we must have g(x̂) > 0, since x̂ is a solution to
problem (31.25), and must therefore be in the constraint set {x ∈ X : g(x) ≥ 0} of problem
(31.25).
Since x̄ is not a solution to problem (31.25), and x̄ ∈ X, it must be the case that g(x̄) < 0.
For if g(x̄) ≥ 0, then, given (31.27), x̄ would also solve problem (31.25).
Since g(x̄) < 0, continuity of g on the convex set X [using the intermediate value theorem]
implies that we can find λ ∈ (0, 1), such that:
(31.28) g(λx̂ + (1 − λ)x̄) = 0
Denote (λx̂+(1−λ)x̄) by z. Then z ∈ X and g(z) = 0 by (31.28), so z satisfies the constraints
of problem (31.25).
Since $f$ is strictly quasi-concave on $X$, we can use $\hat{x}\neq\bar{x}$ [recall that $g(\bar{x})<0$ while $g(\hat{x})>0$] and $\lambda\in(0,1)$ to obtain
\[
f(z)=f(\lambda\hat{x}+(1-\lambda)\bar{x})>\min\{f(\hat{x}),f(\bar{x})\}=f(\hat{x}),
\]
using (31.27). But this contradicts the fact x̂ solves (31.25), and establishes our claim.

7. Suppose that a consumer has the utility function $U(x,y)=x^{a}y^{b}$ and faces the budget constraint $p_{x}x+p_{y}y\le I$.
(A) Utility Maximization
(a) What are the first order conditions for utility maximization?
Observe that the utility function makes sense only if $a>0$ and $b>0$. The Lagrangean for the optimization problem is
\[
\mathcal{L}(x,y,\lambda)=U(x,y)+\lambda(I-p_{x}x-p_{y}y)=x^{a}y^{b}+\lambda(I-p_{x}x-p_{y}y).
\]
The first order conditions are
\begin{align*}
\frac{\partial\mathcal{L}}{\partial x} &= ax^{a-1}y^{b}-\lambda p_{x}=0,\\
\frac{\partial\mathcal{L}}{\partial y} &= bx^{a}y^{b-1}-\lambda p_{y}=0,\\
\frac{\partial\mathcal{L}}{\partial\lambda} &= I-p_{x}x-p_{y}y=0.
\end{align*}
(b) Solve for the consumer's demands for goods x and y.
From the first two FOCs, we get
\[
ax^{a-1}y^{b}=\lambda p_{x},\qquad bx^{a}y^{b-1}=\lambda p_{y}.
\]
Dividing the first equation by the second, we get
\[
\frac{ax^{a-1}y^{b}}{bx^{a}y^{b-1}}=\frac{\lambda p_{x}}{\lambda p_{y}};\qquad
\frac{ay}{bx}=\frac{p_{x}}{p_{y}};\qquad
p_{y}y=\frac{b}{a}p_{x}x.
\]
We use this in the third FOC to get
\begin{align*}
p_{x}x+p_{y}y &= I;\\
p_{x}x+\frac{b}{a}p_{x}x &= I;\\
\frac{a+b}{a}p_{x}x &= I;\\
p_{x}x^{*}=\frac{a}{a+b}I\ &\rightarrow\ x^{*}=\frac{a}{a+b}\frac{I}{p_{x}}.
\end{align*}
This gives
\[
p_{y}y^{*}=\frac{b}{a+b}I\ \rightarrow\ y^{*}=\frac{b}{a+b}\frac{I}{p_{y}}.
\]
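The demand formulas can be checked against a direct search over the budget line; the parameter values are illustrative.

```python
# Check x* = aI/((a+b)p_x), y* = bI/((a+b)p_y) against a grid search
# over the budget line p_x x + p_y y = I (the constraint binds at an optimum).
a, b, I, px, py = 0.4, 0.6, 100.0, 2.0, 5.0
u = lambda x, y: x ** a * y ** b

xs, ys = a * I / ((a + b) * px), b * I / ((a + b) * py)
grid = [(I / px) * i / 20000 for i in range(1, 20000)]
best_val, best_x = max((u(x, (I - px * x) / py), x) for x in grid)
assert abs(best_x - xs) / xs < 1e-3
assert abs(best_val - u(xs, ys)) / best_val < 1e-6
```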
(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an increasing, decreasing or constant function of income?
We use the first FOC (with respect to $x$) to get
\begin{align*}
ax^{a-1}y^{b}=\lambda p_{x}\ \rightarrow\ \lambda^{*} &= \frac{a(x^{*})^{a-1}(y^{*})^{b}}{p_{x}}
= \frac{a\left(\frac{a}{a+b}\frac{I}{p_{x}}\right)^{a-1}\left(\frac{b}{a+b}\frac{I}{p_{y}}\right)^{b}}{p_{x}}\\
&= \left(\frac{a}{p_{x}}\right)^{a}\left(\frac{b}{p_{y}}\right)^{b}\left(\frac{I}{a+b}\right)^{a+b-1}>0.
\end{align*}
The Lagrange multiplier $\lambda^{*}$ is the marginal utility of income, as we can see below:
\[
\mathcal{L}^{*}(x^{*},y^{*},\lambda^{*})=U(x^{*},y^{*})+\lambda^{*}(I-p_{x}x^{*}-p_{y}y^{*})=(x^{*})^{a}(y^{*})^{b}+\lambda^{*}\cdot(0).
\]
Suppose the income increased by a dollar. Then the utility goes up by $\lambda^{*}$. Lastly, $\lambda^{*}$ is increasing with income if and only if $a+b>1$.
(d) Show that the second order conditions hold.
Observe that the second order partial derivatives are
\[
\frac{\partial^{2}\mathcal{L}}{\partial x^{2}}=a(a-1)x^{a-2}y^{b},\quad
\frac{\partial^{2}\mathcal{L}}{\partial x\partial y}=abx^{a-1}y^{b-1},\quad
\frac{\partial^{2}\mathcal{L}}{\partial y^{2}}=b(b-1)x^{a}y^{b-2},\quad
\frac{\partial^{2}\mathcal{L}}{\partial x\partial\lambda}=-p_{x},\quad
\frac{\partial^{2}\mathcal{L}}{\partial y\partial\lambda}=-p_{y}.
\]
Using these, we get the bordered Hessian matrix as under:
\[
H=\begin{bmatrix}
\frac{\partial^{2}\mathcal{L}}{\partial x^{2}} & \frac{\partial^{2}\mathcal{L}}{\partial x\partial y} & \frac{\partial^{2}\mathcal{L}}{\partial x\partial\lambda}\\
\frac{\partial^{2}\mathcal{L}}{\partial x\partial y} & \frac{\partial^{2}\mathcal{L}}{\partial y^{2}} & \frac{\partial^{2}\mathcal{L}}{\partial y\partial\lambda}\\
\frac{\partial^{2}\mathcal{L}}{\partial x\partial\lambda} & \frac{\partial^{2}\mathcal{L}}{\partial y\partial\lambda} & \frac{\partial^{2}\mathcal{L}}{\partial\lambda^{2}}
\end{bmatrix}
=\begin{bmatrix}
a(a-1)x^{a-2}y^{b} & abx^{a-1}y^{b-1} & -p_{x}\\
abx^{a-1}y^{b-1} & b(b-1)x^{a}y^{b-2} & -p_{y}\\
-p_{x} & -p_{y} & 0
\end{bmatrix}.
\]
The border-preserving leading principal minor of order 2 is the bordered Hessian matrix itself. For the second order condition to be satisfied, the determinant of this matrix needs to be positive.

\begin{align*}
\det H &= (-p_{x})\left[(-p_{y})abx^{a-1}y^{b-1}-(-p_{x})b(b-1)x^{a}y^{b-2}\right]
-(-p_{y})\left[(-p_{y})a(a-1)x^{a-2}y^{b}-(-p_{x})abx^{a-1}y^{b-1}\right]\\
&= p_{x}\left[p_{y}abx^{a-1}y^{b-1}-p_{x}b(b-1)x^{a}y^{b-2}\right]-p_{y}\left[p_{y}a(a-1)x^{a-2}y^{b}-p_{x}abx^{a-1}y^{b-1}\right]\\
&= 2p_{x}p_{y}abx^{a-1}y^{b-1}-p_{x}^{2}b(b-1)x^{a}y^{b-2}-p_{y}^{2}a(a-1)x^{a-2}y^{b}\\
&= (x^{*})^{a}(y^{*})^{b}\left[\frac{2abp_{x}p_{y}}{xy}-\frac{b(b-1)p_{x}^{2}}{y^{2}}-\frac{a(a-1)p_{y}^{2}}{x^{2}}\right]\\
&= (x^{*})^{a}(y^{*})^{b}\left[\frac{2abp_{x}p_{y}}{\frac{aI}{(a+b)p_{x}}\frac{bI}{(a+b)p_{y}}}-\frac{b(b-1)p_{x}^{2}}{\left(\frac{bI}{(a+b)p_{y}}\right)^{2}}-\frac{a(a-1)p_{y}^{2}}{\left(\frac{aI}{(a+b)p_{x}}\right)^{2}}\right]\\
&= (x^{*})^{a}(y^{*})^{b}\left[\frac{(a+b)p_{x}p_{y}}{I}\right]^{2}\left(2-\frac{b-1}{b}-\frac{a-1}{a}\right)\\
&= (x^{*})^{a}(y^{*})^{b}\left[\frac{(a+b)p_{x}p_{y}}{I}\right]^{2}\left(2-1+\frac{1}{b}-1+\frac{1}{a}\right)
= (x^{*})^{a}(y^{*})^{b}\left[\frac{(a+b)p_{x}p_{y}}{I}\right]^{2}\left(\frac{1}{b}+\frac{1}{a}\right)>0.
\end{align*}

(e) Show that the implicit function theorem value of $\frac{dx}{dI}$ is identical to the value of taking the partial derivative of $x^{*}$ with respect to $I$.
Using $x^{*}$, we get
\[
\frac{\partial x^{*}}{\partial I}=\frac{a}{a+b}\frac{1}{p_{x}}.
\]
Using the implicit function theorem,
\begin{align*}
\frac{dx^{*}}{dI} &= \frac{\det\begin{bmatrix}
0 & abx^{a-1}y^{b-1} & -p_{x}\\
0 & b(b-1)x^{a}y^{b-2} & -p_{y}\\
-1 & -p_{y} & 0
\end{bmatrix}}{\det H}
= \frac{-1\left[(-p_{y})abx^{a-1}y^{b-1}-(-p_{x})b(b-1)x^{a}y^{b-2}\right]}{\det H}\\
&= \frac{p_{y}abx^{a-1}y^{b-1}-p_{x}b(b-1)x^{a}y^{b-2}}{\det H}
= \frac{bx^{a-1}y^{b-2}\left[ap_{y}y-(b-1)p_{x}x\right]}{\det H}
= \frac{bx^{a-1}y^{b-2}\,p_{x}x}{\det H}
= \frac{bx^{a}y^{b-2}p_{x}}{\det H}\\
&= \frac{bx^{a}y^{b-2}p_{x}}{(x^{*})^{a}(y^{*})^{b}\left[\frac{(a+b)p_{x}p_{y}}{I}\right]^{2}\left(\frac{1}{b}+\frac{1}{a}\right)}
= \frac{bp_{x}}{(y^{*})^{2}\left[\frac{(a+b)p_{x}p_{y}}{I}\right]^{2}\left(\frac{a+b}{ab}\right)}\\
&= \frac{bp_{x}}{\left[\frac{bI}{(a+b)p_{y}}\right]^{2}\left[\frac{(a+b)p_{x}p_{y}}{I}\right]^{2}\left(\frac{a+b}{ab}\right)}
= \frac{bp_{x}}{b^{2}p_{x}^{2}\,\frac{a+b}{ab}}
= \frac{a}{a+b}\frac{1}{p_{x}}.
\end{align*}
Thus the two expressions are identical.


(f) A consumer's indirect utility function is defined to be utility as a function of prices and income. Use $x^{*}$ and $y^{*}$ to solve for the indirect utility function. Is it true that the partial of the indirect utility function with respect to income equals λ?
The indirect utility function is
\[
u^{*}=u(x^{*},y^{*})=(x^{*})^{a}(y^{*})^{b}=\left(\frac{aI}{(a+b)p_{x}}\right)^{a}\left(\frac{bI}{(a+b)p_{y}}\right)^{b}
=\left(\frac{a}{(a+b)p_{x}}\right)^{a}\left(\frac{b}{(a+b)p_{y}}\right)^{b}I^{a+b}.
\]
Then,
\[
\frac{\partial u^{*}}{\partial I}=(a+b)\left(\frac{a}{(a+b)p_{x}}\right)^{a}\left(\frac{b}{(a+b)p_{y}}\right)^{b}I^{a+b-1}
=\left(\frac{a}{p_{x}}\right)^{a}\left(\frac{b}{p_{y}}\right)^{b}\left(\frac{I}{a+b}\right)^{a+b-1}=\lambda^{*}.
\]
(B) Expenditure Minimization:
Now consider the "dual" of the utility maximization problem. The dual problem is to minimize expenditures, $p_{x}x+p_{y}y$, subject to reaching a given level of utility, $u_{0}$ (the constraint is therefore $u_{0}-x^{a}y^{b}=0$).
(a) What are the first order conditions for expenditure minimization?
First, we write down the minimization problem as
\[
\min\ p_{x}x+p_{y}y\quad\text{subject to } x^{a}y^{b}\ge u_{0},
\]
which can be converted into a maximization exercise as under:
\[
\max\ -p_{x}x-p_{y}y\quad\text{subject to } x^{a}y^{b}\ge u_{0}.
\]
The Lagrangean for the maximization problem is
\[
\mathcal{L}(x,y,\lambda)=-p_{x}x-p_{y}y+\lambda(x^{a}y^{b}-u_{0}).
\]
The first order conditions are
\begin{align*}
\frac{\partial\mathcal{L}}{\partial x} &= -p_{x}+\lambda ax^{a-1}y^{b}=0,\\
\frac{\partial\mathcal{L}}{\partial y} &= -p_{y}+\lambda bx^{a}y^{b-1}=0,\\
\frac{\partial\mathcal{L}}{\partial\lambda} &= x^{a}y^{b}-u_{0}=0.
\end{align*}
(b) Use the first order conditions to solve for x* and y* (these are called the Hicksian or compensated demand functions).
From the first two FOCs, we get
\[
\lambda ax^{a-1}y^{b}=p_{x},\qquad \lambda bx^{a}y^{b-1}=p_{y}.
\]
Dividing the first equation by the second, we get
\[
\frac{\lambda ax^{a-1}y^{b}}{\lambda bx^{a}y^{b-1}}=\frac{p_{x}}{p_{y}};\qquad
\frac{ay}{bx}=\frac{p_{x}}{p_{y}};\qquad
y=\frac{b}{a}\frac{p_{x}}{p_{y}}x.
\]
We use this in the third FOC to get
\[
x^{a}y^{b}=u_{0};\quad
x^{a}\left(\frac{b\,p_{x}}{a\,p_{y}}\right)^{b}x^{b}=u_{0};\quad
x^{a+b}=\frac{u_{0}}{\left(\frac{b\,p_{x}}{a\,p_{y}}\right)^{b}};\quad
x^{*}=\left(\frac{a\,p_{y}}{b\,p_{x}}\right)^{\frac{b}{a+b}}u_{0}^{\frac{1}{a+b}};
\]
\[
y^{*}=\frac{b\,p_{x}}{a\,p_{y}}x^{*}=\frac{b\,p_{x}}{a\,p_{y}}\left(\frac{a\,p_{y}}{b\,p_{x}}\right)^{\frac{b}{a+b}}u_{0}^{\frac{1}{a+b}}
=\left(\frac{b\,p_{x}}{a\,p_{y}}\right)^{\frac{a}{a+b}}u_{0}^{\frac{1}{a+b}}.
\]
(c) Check the second order conditions.
It is easy to see that the bordered Hessian is the same as in the utility maximization exercise. Hence we conclude that the SOC holds in this case as well.
(d) Write the level of income, I, necessary to reach $u_{0}$ as a function of $u_{0}$, prices, and parameters. How does this expenditure function relate to the indirect utility function?
\begin{align*}
e(p_{x},p_{y},u_{0}) &= p_{x}x^{*}+p_{y}y^{*}
= p_{x}\left(\frac{a\,p_{y}}{b\,p_{x}}\right)^{\frac{b}{a+b}}u_{0}^{\frac{1}{a+b}}+p_{y}\left(\frac{b\,p_{x}}{a\,p_{y}}\right)^{\frac{a}{a+b}}u_{0}^{\frac{1}{a+b}}\\
&= \left(p_{x}^{a}p_{y}^{b}u_{0}\right)^{\frac{1}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}}+\left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right].
\end{align*}
Substituting the indirect utility $u_{0}=\left(\frac{a}{(a+b)p_{x}}\right)^{a}\left(\frac{b}{(a+b)p_{y}}\right)^{b}I^{a+b}$,
\begin{align*}
e &= \left(p_{x}^{a}p_{y}^{b}\right)^{\frac{1}{a+b}}\left[\left(\frac{a}{(a+b)p_{x}}\right)^{a}\left(\frac{b}{(a+b)p_{y}}\right)^{b}\right]^{\frac{1}{a+b}}I\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}}+\left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right]\\
&= \left[\left(\frac{a}{a+b}\right)^{a}\left(\frac{b}{a+b}\right)^{b}\right]^{\frac{1}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}}+\left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right]I=I.
\end{align*}
This shows that the minimum expenditure required to attain utility equal to the indirect utility is the same as the income $I$. Thus the two approaches are equivalent.
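The duality result can be verified numerically; the parameter values below are illustrative, and $a+b$ need not equal one.

```python
# Numerical check of duality: e(p_x, p_y, u0) = I when u0 is the indirect
# utility at income I (illustrative parameters).
a, b, I, px, py = 0.3, 0.5, 100.0, 2.0, 5.0
s = a + b

u0 = (a * I / (s * px)) ** a * (b * I / (s * py)) ** b     # indirect utility
x_h = (a * py / (b * px)) ** (b / s) * u0 ** (1 / s)       # Hicksian demand for x
y_h = (b * px / (a * py)) ** (a / s) * u0 ** (1 / s)       # Hicksian demand for y
e = px * x_h + py * y_h                                    # expenditure function
assert abs(e - I) / I < 1e-10
```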
(e) To avoid confusion, let us call the solution for good x in the utility maximization problem $x^{*}$ and the solution for good x in the expenditure minimization problem $h^{*}$. Prove that
\[
\frac{\partial x^{*}}{\partial p_{x}}=\frac{\partial h^{*}}{\partial p_{x}}-x^{*}\frac{\partial x^{*}}{\partial I}.
\]
Interpret this answer.
Observe that we can rewrite $h^{*}$ as $h^{*}=\theta(p_{x})^{-\frac{b}{a+b}}$, where $\theta\equiv\left(\frac{a\,p_{y}}{b}\right)^{\frac{b}{a+b}}u_{0}^{\frac{1}{a+b}}$. This gives us
\[
\frac{\partial h^{*}}{\partial p_{x}}=\theta\left(-\frac{b}{a+b}\right)(p_{x})^{-\frac{b}{a+b}-1}=-\frac{b}{a+b}\frac{h^{*}}{p_{x}}.
\]
Also, from the utility maximization, we get
\[
\frac{\partial x^{*}}{\partial p_{x}}=-\frac{aI}{a+b}(p_{x})^{-2}=\frac{-x^{*}}{p_{x}}
\quad\text{and}\quad
x^{*}\frac{\partial x^{*}}{\partial I}=x^{*}\frac{a}{a+b}(p_{x})^{-1}.
\]
Therefore,
\[
\frac{\partial x^{*}}{\partial p_{x}}+x^{*}\frac{\partial x^{*}}{\partial I}
=\frac{-x^{*}}{p_{x}}+\frac{a}{a+b}\frac{x^{*}}{p_{x}}
=-\frac{b}{a+b}\frac{x^{*}}{p_{x}}
=\frac{\partial h^{*}}{\partial p_{x}}.
\]
The change in $x^{*}$ due to a change in its own price $p_{x}$ (the total effect) is the sum of the substitution effect $\left(\frac{\partial h^{*}}{\partial p_{x}}\right)$ and the income effect $\left(-x^{*}\frac{\partial x^{*}}{\partial I}\right)$.
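The Slutsky decomposition can be verified by finite differences; the parameters below are illustrative.

```python
# Finite-difference check of the Slutsky equation
# dx*/dpx = dh*/dpx - x* dx*/dI  (illustrative parameters).
a, b, I, px, py = 0.3, 0.5, 100.0, 2.0, 5.0
s = a + b

x_m = lambda px_, I_: a * I_ / (s * px_)                            # Marshallian demand
u0 = (a * I / (s * px)) ** a * (b * I / (s * py)) ** b              # utility at the optimum
x_h = lambda px_: (a * py / (b * px_)) ** (b / s) * u0 ** (1 / s)   # Hicksian demand

h = 1e-5
d_xm_dpx = (x_m(px + h, I) - x_m(px - h, I)) / (2 * h)
d_xh_dpx = (x_h(px + h) - x_h(px - h)) / (2 * h)
d_xm_dI = (x_m(px, I + h) - x_m(px, I - h)) / (2 * h)

total = d_xm_dpx
substitution_plus_income = d_xh_dpx - x_m(px, I) * d_xm_dI
assert abs(total - substitution_plus_income) / abs(total) < 1e-6
```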

8. Suppose a consumer has the utility function $U=a\ln(x-x_{0})+b\ln(y-y_{0})$, where $a,b,x_{0}$ and $y_{0}$ are positive parameters. Assume that the usual budget constraint applies.
(a) Solve for the consumer's demand for good x.
Observe that the utility maximization exercise makes sense if the consumption bundle $(x_{0},y_{0})$ is feasible. Let us denote $x-x_{0}$ by $x'$ and $y-y_{0}$ by $y'$. Then the utility function can be written as $U(x',y')=a\ln(x')+b\ln(y')$, and the budget constraint $p_{x}x+p_{y}y=I$ can be written as $p_{x}x'+p_{y}y'=I-p_{x}x_{0}-p_{y}y_{0}=I'$. The utility maximization exercise can therefore be formulated as
\[
\max\ a\ln(x')+b\ln(y')\quad\text{subject to } p_{x}x'+p_{y}y'=I'.
\]
The Lagrangean for the optimization problem is
\[
\mathcal{L}(x',y',\lambda)=a\ln(x')+b\ln(y')+\lambda(I'-p_{x}x'-p_{y}y').
\]
The first order conditions are
\begin{align*}
\frac{\partial\mathcal{L}}{\partial x'} &= \frac{a}{x'}-\lambda p_{x}=0,\\
\frac{\partial\mathcal{L}}{\partial y'} &= \frac{b}{y'}-\lambda p_{y}=0,\\
\frac{\partial\mathcal{L}}{\partial\lambda} &= I'-p_{x}x'-p_{y}y'=0.
\end{align*}

From the first two FOCs, we get
\[
\frac{a}{x'}=\lambda p_{x};\qquad \frac{b}{y'}=\lambda p_{y}.
\]
Dividing the first equation by the second, we get
\[
\frac{ay'}{bx'}=\frac{p_{x}}{p_{y}};\qquad p_{y}y'=\frac{b}{a}p_{x}x'.
\]
We use this in the third FOC to get
\begin{align*}
p_{x}x'+p_{y}y' &= I';\\
p_{x}x'+\frac{b}{a}p_{x}x' &= I';\\
\frac{a+b}{a}p_{x}x' &= I';\\
p_{x}x'=\frac{a}{a+b}I'\ &\rightarrow\ x'=\frac{a}{a+b}\frac{I'}{p_{x}}.
\end{align*}
This gives
\[
p_{y}y'=\frac{b}{a+b}I'\ \rightarrow\ y'=\frac{b}{a+b}\frac{I'}{p_{y}}.
\]
We need to show that the second order conditions hold for the solution to yield a maximum. Observe that the second order partial derivatives are
\[
\frac{\partial^{2}\mathcal{L}}{\partial x'^{2}}=-\frac{a}{(x')^{2}};\quad
\frac{\partial^{2}\mathcal{L}}{\partial x'\partial y'}=0;\quad
\frac{\partial^{2}\mathcal{L}}{\partial y'^{2}}=-\frac{b}{(y')^{2}};\quad
\frac{\partial^{2}\mathcal{L}}{\partial x'\partial\lambda}=-p_{x};\quad
\frac{\partial^{2}\mathcal{L}}{\partial y'\partial\lambda}=-p_{y}.
\]
Using these, we get the bordered Hessian matrix as under:
\[
H=\begin{bmatrix}
-\frac{a}{(x')^{2}} & 0 & -p_{x}\\
0 & -\frac{b}{(y')^{2}} & -p_{y}\\
-p_{x} & -p_{y} & 0
\end{bmatrix}.
\]
The border-preserving leading principal minor of order 2 is the bordered Hessian itself. For the second order condition to be satisfied, its determinant needs to be positive:
\[
\det H=(-p_{x})\left[-(-p_{x})\left(-\frac{b}{(y')^{2}}\right)\right]-(-p_{y})\left[(-p_{y})\left(-\frac{a}{(x')^{2}}\right)\right]
=\frac{bp_{x}^{2}}{(y')^{2}}+\frac{ap_{y}^{2}}{(x')^{2}}>0.
\]
Thus the SOC holds and we have a maximum. The optimal consumption bundle is
\begin{align*}
x^{*} &= x'+x_{0}=\frac{a}{a+b}\,\frac{I-p_{x}x_{0}-p_{y}y_{0}}{p_{x}}+x_{0}
=\frac{a}{a+b}\,\frac{I-p_{y}y_{0}}{p_{x}}+\frac{b}{a+b}x_{0},\\
y^{*} &= \frac{b}{a+b}\,\frac{I-p_{x}x_{0}}{p_{y}}+\frac{a}{a+b}y_{0}.
\end{align*}
(b) Find the elasticities of demand for good x with respect to income and prices.
It is easy to compute the price and income elasticity using the definitions. Please let me
know if you have any questions on this.
(c) Show that the utility function $V=45(x-x_{0})^{3.5a}(y-y_{0})^{3.5b}$ would have yielded the same demand for good x.
If we take a positive monotone transformation of $V$ by taking its natural log, we get an affine transformation of the utility function in (a):
\[
\ln V=\ln 45+3.5a\ln(x-x_{0})+3.5b\ln(y-y_{0})=\ln 45+3.5\,U.
\]
This implies that the consumption bundle $(x^{*},y^{*})$ maximizes the utility function $V$ as well.

9. The utility function is
\[
U(x,y,z)=a\ln(x)+b\ln(y)+c\ln(z),
\]
where $a>0$, $b>0$ and $c>0$ are such that $a+b+c=1$. The budget constraint can be written as
\[
g^{1}(x,y,z)=I-px-qy-rz\ge 0.
\]
The rationing constraint is
\[
g^{2}(x,y,z)=k-x\ge 0.
\]
(a) This problem has two inequality constraints and so we will use the Kuhn-Tucker sufficiency theorem.
(i) Let
\[
X=\left\{(x,y,z)\in\mathbb{R}^{3}_{++}\right\}.
\]
Then $X$ is open, as its complement
\[
X^{C}=\left\{(x,y,z)\in\mathbb{R}^{3}\mid x\le 0\ \text{or}\ y\le 0\ \text{or}\ z\le 0\right\}
\]
is closed.
(ii) The function $U(x,y,z)$ is continuous in $x$, $y$, and $z$ (being a sum of log functions). The constraint functions $g^{1}(x,y,z)=I-px-qy-rz$, $g^{2}(x,y,z)=k-x$, $g^{3}(x,y,z)=x$, $g^{4}(x,y,z)=y$, $g^{5}(x,y,z)=z$ are linear and hence continuous. It is possible to infer that $f,g^{j}$ ($j=1,\dots,5$) are twice continuously differentiable on $X$ and that the set $X$ is convex.
(iii) The function $U(x,y,z)$ is concave, as
\[
\nabla f(x,y,z)=\begin{bmatrix}\frac{a}{x} & \frac{b}{y} & \frac{c}{z}\end{bmatrix},\qquad
H_{f}(x,y,z)=\begin{bmatrix}-\frac{a}{x^{2}} & 0 & 0\\ 0 & -\frac{b}{y^{2}} & 0\\ 0 & 0 & -\frac{c}{z^{2}}\end{bmatrix}.
\]
The determinant of the leading principal minor of order one is $-\frac{a}{x^{2}}<0$; of the leading principal minor of order two is $\frac{ab}{x^{2}y^{2}}>0$; and of the leading principal minor of order three is $-\frac{abc}{x^{2}y^{2}z^{2}}<0$, for all $(x,y,z)\in X$. Hence $f$ is concave. Further, $g^{j}$ ($j=1,\dots,5$) are concave, being linear functions.
Hence all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^{*},y^{*},z^{*}),\lambda^{*})\in X\times\mathbb{R}^{5}_{+}$ that satisfies the Kuhn-Tucker conditions:
(i) $D_{i}f(x^{*},y^{*},z^{*})+\sum_{j=1}^{5}\lambda_{j}^{*}D_{i}g^{j}(x^{*},y^{*},z^{*})=0$, $i=1,\dots,3$;
(ii) $g^{j}(x^{*},y^{*},z^{*})\ge 0$ and $\lambda_{j}^{*}\cdot g^{j}(x^{*},y^{*},z^{*})=0$.
They are
\begin{align*}
\frac{a}{x}-\lambda_{1}p-\lambda_{2}+\lambda_{3} &= 0 \tag{31.29}\\
\frac{b}{y}-\lambda_{1}q+\lambda_{4} &= 0 \tag{31.30}\\
\frac{c}{z}-\lambda_{1}r+\lambda_{5} &= 0 \tag{31.31}\\
I-px-qy-rz\ge 0,\quad \lambda_{1}(I-px-qy-rz) &= 0 \tag{31.32}\\
k-x\ge 0,\quad \lambda_{2}(k-x) &= 0 \tag{31.33}\\
x\ge 0,\ \lambda_{3}x=0;\quad y\ge 0,\ \lambda_{4}y=0;\quad z\ge 0,\ \lambda_{5}z &= 0. \tag{31.34}
\end{align*}
If $\lambda_{1}=0$, then $\frac{b}{y}-\lambda_{1}q+\lambda_{4}=0\rightarrow\lambda_{4}=-\frac{b}{y}<0$, which contradicts $\lambda_{4}\ge 0$. Hence
\[
\lambda_{1}>0\ \rightarrow\ I-px-qy-rz=0.
\]
Also, $x>0$, $y>0$, and $z>0$ for the three FOCs to hold with equality. Thus $\lambda_{3}=0=\lambda_{4}=\lambda_{5}$.
(i) If $\lambda_{2}>0$, then $x=k$, and
\[
I-pk=qy+rz=\frac{b}{\lambda_{1}}+\frac{c}{\lambda_{1}}.
\]
Thus $\lambda_{1}=\frac{b+c}{I-pk}$, which leads to
\[
y=\frac{b(I-pk)}{q(b+c)}\quad\text{and}\quad z=\frac{c(I-pk)}{r(b+c)}.
\]
We need to verify $\lambda_{2}>0$, which will hold if $\lambda_{2}=\frac{a}{k}-\frac{(b+c)p}{I-pk}>0$, or
\[
\frac{a}{b+c}>\frac{pk}{I-pk}.
\]
(ii) If $\lambda_{2}=0$, then
\[
x=\frac{aI}{p(a+b+c)};\quad y=\frac{bI}{q(a+b+c)};\quad z=\frac{cI}{r(a+b+c)};\quad \lambda_{1}=\frac{a+b+c}{I}
\]
satisfies the KT conditions (please verify).
(b)
\[
\frac{a}{b+c}>\frac{pk}{I-pk}.
\]
(c)
\[
\frac{qy}{rz}=\frac{\frac{b(I-pk)}{b+c}}{\frac{c(I-pk)}{b+c}}=\frac{b}{c}.
\]
(d) No, it is more likely that one buys more rice and less butter.
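The rationed solution from part (a) can be spot-checked numerically; the parameter values below are illustrative.

```python
# Spot check of the rationed case: with x fixed at k, remaining income
# I - pk is split across y and z in the expenditure ratio b : c.
a, b, c = 0.5, 0.3, 0.2
I, p, q, r, k = 100.0, 4.0, 2.0, 1.0, 5.0
assert a / (b + c) > p * k / (I - p * k)          # rationing constraint binds

y = b * (I - p * k) / (q * (b + c))
z = c * (I - p * k) / (r * (b + c))
assert abs(p * k + q * y + r * z - I) < 1e-9      # budget exhausted
assert abs((q * y) / (r * z) - b / c) < 1e-12     # expenditure ratio is b : c
```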