387 views

Uploaded by Ram Sewak Dubey

This file contains a compilation of the instruction materials (lecture notes, problem sets and solutions) which I used to teach a mathematics review course (in Summer 2014) to the incoming graduate students in the Department of Economics at Cornell University.

- IvyGlobal-SAT Math Review
- Math 120 Final Exam Review (3)
- Reviewer Math
- LET Math Final Handout
- MAJOR Reviewer
- Math Review 2 Algebra
- Math review
- Percent Increase and Decrease
- Let Review 2015
- DynaMath Reviewer for Basic Algebra
- April Math Assessment
- Grade 8 | Math assessment
- Math Review
- Grade 10 Math Review
- CPTCollegeMathReview
- Math Review NAT_20151010
- SAT/ ACT/ Accuplacer Math Review
- Writing and Math Assessment
- LET General Math Reviewer
- Grade 9 Math Review

You are on page 1of 288

U NIVERSITY, M ONTCLAIR , N EW J ERSEY, 07043

E-mail address: dubeyr@mail.montclair.edu

Contents

Preface vii

Syllabus ix

0.1. Overview ix

0.2. Course Schedule x

0.3. Topics covered x

0.4. Textbook xi

0.5. Mathematics Proficiency Test xi

1.1. Introduction 1

1.2. Statements 2

1.3. Logical Connective 3

1.4. Quantifiers 8

1.5. Rules of Negation of statements with quantifiers 12

1.6. Logical Equivalences 14

1.7. Some Math symbols and Definitions 15

2.1. Methods of Proof 17

2.2. Trivial Proofs 18

2.3. Vacuous Proofs 18

2.4. Proof by Construction 19

iii

iv Contents

2.6. Proof by Contradiction 22

2.7. Proof by Induction 24

2.8. Additional Notes on Proofs 28

2.9. Decomposition or proof by cases 30

Chapter 3. Problem Set 1 35

Chapter 4. Set Theory, Sequence 37

4.1. Set Theory 37

4.2. Set Identities 42

4.3. Functions 44

4.4. Vector Space 45

4.5. Sequences 50

4.6. Sets in Rn 56

Chapter 5. Problem Set 2 63

Chapter 6. Linear Algebra 65

6.1. Vectors 65

6.2. Matrices 67

6.3. Determinant of a matrix 72

6.4. An application of matrix algebra 75

6.5. System of Linear Equations 78

6.6. Cramers Rule 82

6.7. Principal Minors 84

6.8. Quadratic Form 85

6.9. Eigenvalue and Eigenvectors 86

6.10. Eigenvalues of symmetric matrix 89

6.11. Eigenvalues, Trace and Determinant of a Matrix 90

Chapter 7. Problem Set 3 93

Chapter 8. Single and Multivariable Calculus 97

8.1. Functions 97

8.2. Surjective and Injective Functions 97

8.3. Composition of Functions 99

8.4. Continuous Functions 101

Contents v

8.6. An application of Extreme Values Theorem 105

8.7. Differentiability 107

8.8. Monotone Functions 112

8.9. Functions of Several Variables 115

8.10. Composite Functions and the Chain Rule 119

Chapter 9. Problem Set 4 121

Chapter 10. Convex Analysis 123

10.1. Concave, Convex Functions 123

10.2. Quasi-concave Functions 129

Chapter 11. Problem Set 5 133

Chapter 12. Inverse and Implicit Function Theorems 135

12.1. Inverse Function Theorem 135

12.2. The Linear Implicit Function Theorem 136

12.3. Implicit Function Theorem for R2 138

Chapter 13. Homogeneous and Homothetic Functions 141

13.1. Homogeneous Functions 141

13.2. Homothetic Functions 144

Chapter 14. Problem Set 6 145

Chapter 15. Unconstrained Optimization 147

15.1. Optimization Problem 147

15.2. Maxima / Minima for C 2 functions of n variables 148

15.3. Application: Ordinary Least Square Analysis 154

Chapter 16. Problem Set 7 159

Chapter 17. Optimization Theory: Equality Constraints 161

17.1. Constrained Optimization 161

17.2. Equality Constraint 163

Chapter 18. Optimization Theory: Inequality Constraints 173

18.1. Inequality Constraint 173

18.2. Global maximum and constrained local maximum 180

Chapter 19. Problem Set 8 187

vi Contents

20.1. Envelope Theorem for Unconstrained Problems 191

20.2. Meaning of the Lagrange multiplier 193

20.3. Envelope Theorem for Constrained Optimization 194

Chapter 21. Elementary Concepts in Probability 195

21.1. Discrete Probability Model 195

21.2. Marginal and Conditional Distribution 199

21.3. The Law of Iterated Expectation 202

21.4. Continuous Random Variables 202

Chapter 22. Solution to PS 1 205

Chapter 23. Solution to PS 2 213

Chapter 24. Solution to PS 3 219

Chapter 25. Solution to PS 4 227

Chapter 26. Solution to PS 5 233

Chapter 27. Solution to PS 6 239

Chapter 28. Solution to PS 7 247

Chapter 29. Solution to PS 8 257

Preface

These notes have been prepared for the Math Review Class for Graduate students joining Ph. D.

program in the field of economics at Cornell University. While making these notes we have referred

to the material used in previous years classes.

The objective of Math Review class is to present elementary concepts from set theory, multi-

variable calculus, linear algebra, elementary probability concepts, real analysis and optimization

theory. I have used examples and problem sets to explain the concepts, definitions and techniques

which are useful in Fall semester graduate economics classes.

These notes could serve to refresh the memory for those incoming students who are familiar

with the material. To others, these notes could be a ready reckoner of math techniques they will

need to know in the first few weeks of the graduate classes (Econ 6090, Econ 6130, Econ 6190) in

Economics before they are discussed in Econ 6170 in more rigorous way.

The topics have been arranged so that the entire material can be covered in thirteen classes

of three hours duration each. Additional problem sets with solutions are provided on each days

material. Three additional sections of three hours each are sufficient to go over the questions in

problem sets. It is hoped that they will help the reader to better understand the material in lecture

notes.

Earlier versions have been used for the Math Review Classes during 2009-16. My sincere

thanks go to the participants for their comments and also for pointing out typos, errors.

Ram Sewak Dubey

vii

Syllabus

Field of Economics

Cornell University

Instructor: Ram Sewak Dubey Office Room: 474G, Uris Hall

Office Hour: 12:15 -1:15pm E-mail: rsd28@cornell.edu

0.1. Overview

The Field of Economics offers the August Math Review Course for incoming first-year Ph.D.

students. The aim of this review is to refresh students mathematical skills and introduce concepts

that are critical to success in the first year economics core courses, i.e., Econ 6090, Econ 6130,

Econ 6170, and Econ 6190. The emphasis is on rigorous treatment of proof techniques, underlying

concepts and illustrative examples.

There is usually a great deal of variation in the mathematical background of incoming first-year

students. However, almost all students have something to gain from the review course. For those

who do not have an adequate mathematics background (by a US Ph.D. standard), the course offers

an opportunity to catch up on critical concepts and get a head start on the fall classes. For those

who took their core undergraduate courses in analysis and algebra some years ago, the course is a

good refresher. For those who do not have significant experience with technical courses taught in

English, the review offers an opportunity to pick up the math vocabulary that will be in use from

the first day of regular instruction.

The Math Review Course is funded by the Department of Economics. There is no charge for

students matriculating into the Economics Ph.D. Program. Students matriculating into other Ph.

D. programs should contact the Director of Graduate Studies in their Field. There will be a charge

ix

x Syllabus

for these students, and the DGS in the students Field must make arrangements to pay that charge

before the student may attend the Math Review Course.

The Math Review Course is not linked to Econ 6170, Intermediate Mathematical Economics I.

There is no course grade, and no record will be kept of your performance. However, the Economics

Ph.D. program strongly encourages you to attend. Most students who have taken this course in past

years have found it useful, regardless of their prior mathematics training. Perhaps most importantly,

the review period is an excellent time to get acquainted with other incoming students, meet the

faculty and settle into Ithaca.

The course duration will be July 31- August 18. There will be a lecture session each working day.

The room for all the sessions is URIS 202.

(A) Session Time:

July 31-August 4, August 7-11, 14-18 Time: 9am-Noon.

(B) There will be a handout of some basic definitions distributed at each session, and practice

problems will be assigned on each topic. You are strongly encouraged to at least attempt every

problem, as this is the best way to understand the material. The problem sets will be due the

following day in class (for example, the problem set given in class on Monday will be due on

Tuesday) and I intend to grade some of the questions in each problem set. We will go over the

solutions to the problem sets in class.

A. Elements of Logic: Statements, Truth tables, Implications, Tautologies, Contradictions, Logical

Equivalence, Quantifiers, Negation of Quantified Statements

B. Proof Techniques: Trivial Proofs, Vacuous Proofs, Direct Proofs, Proof by Contrapositive,

Proof by Cases, Proof by Contradiction, Existence Proofs, Proof by Mathematical Induction

C. Set Theory: Definitions, Set Equality, Set Operations, Venn Diagrams, Set Identities, Cartesian

Products, Properties of the Set of Real Numbers

D. Sequences: Convergent Sequences, Subsequences, Cauchy Sequences, Upper and Lower Lim-

its, Algebraic Properties of Limits, Monotone Sequences

E. Functions of One Variable: Limits of Functions, Continuous Functions, Monotone Functions,

Properties of Exponential and Logarithmic Functions

F. Linear Algebra: System of Linear Equations, Solution by Substitution or Elimination of Vari-

ables, Systems with Many or No Solutions

G. Vectors I: Addition, Subtraction, Scalar Multiplication, Length, Distance, Inner Product

H. Matrix Algebra I: Addition, Subtraction, Scalar and Matrix Multiplication, Transpose, Laws of

Matrix Algebra

0.5. Mathematics Proficiency Test xi

Rule

J. Vectors II: Linear Independence, Rn as an example of Vector Space, Basis and Dimension in

Rn

K. Matrix Algebra II: Algebra of Square Matrices, Eigenvalues, Eigenvectors, Properties of Eigen-

values

L. Differential Calculus: Derivative of a Real Function, Mean Value Theorem, Continuity of

Derivatives, LHospitals Rule, Higher Order Derivatives, Taylors Theorem

M. Functions of Several Variables: Graphs of Functions of Two Variables, Level Curves, Continu-

ous Functions, Total Derivative, Chain Rule, Partial Derivatives

N. Unconstrained Optimization: First Order Conditions, Global Maxima and Minima, Examples

O. Constrained Optimization with equality constraints: First Order Conditions, Constrained Min-

imization Problems, Examples

P. Constrained Optimization with inequality constraints: Kuhn-Tucker conditions, Interpreting

the Multipliers, Envelope Theorem

0.4. Textbook

There is no textbook for the math review course, however the following books may be helpful.

The textbook ? is used in the Microeconomics course sequence. ? and ? are useful textbooks for

Mathematical Economics. It will be useful to refer to ? for understanding the material. Copies of

this textbook are available in the libraries. ? will be our reference book for analysis. ? contains

many useful examples. ? is the set of Lecture Notes used in Econ 6170. It should be available at

the bookstore at the start of the course.

A Mathematics Proficiency Test will be given on Friday, August 18, 2017 from 12:30pm - 3:30

pm in URIS 202. The test will be based on the course material of Economics 6170. If you pass

this test, you have satisfied the mathematics proficiency requirement of the field of economics,

and need not take the Economics 6170 course. If you fail this test, or if you do not take this test,

you can complete the mathematics proficiency requirement of the field of economics by taking the

Economics 6170 course for credit, and getting a course grade of B- or better.

If you would like any more information, you can contact me at rsd28@cornell.edu. Enjoy

your summer and I look forward to meeting you in August.

Chapter 1

Introduction to Logic

1.1. Introduction

The theory that youll learn during the first year is built on a foundation borrowed from engineering

and pure mathematics. You will be required to both understand and reproduce certain key proofs,

particularly in microeconomics. On some problem sets and exams youll be asked to produce your

own proofs.

If you havent taken any pure math courses, you might be thinking I dont even know what a

proof is. That is completely fine. There are plenty of very accomplished Ph.D. students at Cornell

who had no idea how to write a proof when they arrived. Its important not to get discouraged

because it takes time to learn how to write good proofs. There is a standard bag of tricks that will

get you through almost any proof in the first year sequence, but it takes exposure and then practice

for you to learn and be comfortable with these tricks. Math majors are at an advantage here, more

than in most areas, but by the end of the year theyll have forgotten the fancier proof techniques

and youll have learned the necessary ones, so the field will be surprisingly level.

A proof is a series of statements that demonstrates the truth of a proposition. In writing a proof

you make use of (i) the rules of logic and (ii) Definitions, theorems, and other propositions that

have already been proved, or that you are told you can take as given.

The rules of logic are obviously fixed and unchanging. The components of the second point,

however, will vary depending on the task at hand. The most important question to ask yourself

when attempting to prove a proposition is What do I already know? It will often be the case that

if you write down all of the relevant mathematical definitions, the theorems or results that you were

given or that you know you can take as given, and any result that you just proved in a previous

problem, a straightforward rearrangement of everything on the page will give you the proof that

you want.

1

2 1. Introduction to Logic

In this chapter we will discuss the principles of logic that are essential for problem solving in

mathematics. The ability to reason using the principles of logic is key to seek the truth which is

our goal in mathematics. Before we explore and study logic, let us start by spending some time

motivating this topic. Mathematicians reduce problems to the manipulation of symbols using a set

of rules. As an illustration, let us consider the following problem.

Example 1.1. Joe is 7 years older than John. Six years from now Joe will be twice Johns age. How

old are Joe and John?

Solution 1.1. To answer the above question, we reduce the problem using symbolic formulation.

We let Johns age be x. Then Joes age is x + 7. We are given that six years from now Joe will be

twice Johns age. In symbols, (x + 7) + 6 = 2(x + 6). Solving for x yields x = 1. Therefore, John

is 1 year old and Joe is 8.

Our objective is to reduce the process of mathematical reasoning, i.e., logic, to the manipulation

of symbols using a set of rules. The central concept of deductive logic is the concept of argument

form. An argument is a sequence of statements aimed at demonstrating the truth of an assertion (a

claim). Consider the following two arguments.

Argument 1. If x is a real number such that x < 3 or x > 3, then x2 > 9. Therefore, if x2 9,

then x 3 and x 3.

Argument 2. If it is raining or I am sick, then I stay at home. Therefore, if I do not stay at home,

then it is not raining and I am not sick.

Although the content of the above two arguments is very different, their logical form is the

same. To illustrate the logical form of these arguments, we use letters of the alphabet (such as p, q

and r) to represent the component sentences and the expression not p to refer to the sentence It

is not the case that p.. Then the common logical form of both the arguments above is as follows:

If p or q, then r. Therefore, if not r, then not p and not q.

We start by identify and giving names to the building blocks which make up an argument. In

Arguments 1 and 2, we identified the building blocks as follows:

Argument 1. If x is a real number such that x < 3 (p) or x> 3 (q), then x2 > 9 (r). Therefore,

if 9 (not r), then x 3 (not p) and x 3 (not q).

x2

Argument 2. If it is raining (p) or I am sick (q), then I stay at home (r). Therefore, if I do not

stay at home (not r), then it is not raining (not p) and I am not sick (not q).

1.2. Statements

The study of logic is concerned with the truth or falsity of statements.

Definition 1.1 (Statement). A statement is a sentence which can be classified as true or false

without ambiguity. The truth or falsity of the statement is known as the truth value.

1.3. Logical Connective 3

For a sentence to be a statement, it is not necessary for us to know whether it is true or false.

However, it must be clear that it is one or the other.

Example 1.2. Consider following examples.

(a) One plus two equals three. It is a statement which is true.

(b) One plus one equals three. It is also a statement which is not true.

(c) He is a university student. This sentence is neither true nor false. The truth or falsity depends

on the reference for the pronoun he. For some values of he the sentence is true; for others it is

false, and so it is not a statement.

(d) Every continuous function is differentiable. is a statement with truth value being false.

(e) x < 1 is true for some values of x and false for some others. It is a statement if we have

some particular context in mind. Otherwise, it is not a statement.

(f) Goldbachs Conjecture Every even number greater than 2 is the sum of two prime numbers

is a statement whose truth value is not known yet.

(g) There are infinitely many prime numbers of the form 2n + 1, where n is a natural number. is

another statement whose truth value is not known till now.

Every statement has a truth value, namely true (denoted by T) or false (denoted by F). We often use

p, q and r to denote statements, or perhaps p1 , p2 , , pn if there are several statements involved.

Exercise 1.1. Which of the following sentences are statements?

(a) If x is a real number, then x2 0.

(b) 11 is a prime number.

(c) This sentence is false.

The possible truth values of a statement are often given in a table, called a truth table. The truth

values for two statements p and q are given below. Since there are two possible truth values for

each of p and q, there are four possible combinations of truth values for p and q. It is customary to

consider the four combinations of truth values in the order of TT, TF, FT, FF from top to bottom.

p q

T T

(1.1) T F

F T

F F

A logical connective (also called a logical operator) is a symbol or word used to connect two or

more statements such that the compound statement produced has a truth value dependent on the

respective truth values of the original statements.

We discuss some of the elementary logical operators (connectives) first.

4 1. Introduction to Logic

Logical negation is an operation on one logical value, typically, the value of a proposition,

that produces a value of true if its operand is false and a value of false if its operand is true.

The truth table for A (also written as NOT A or A) is as follows:

A A

(1.2) T F

F T

For example, consider the statement,

p : The integer 2 is even.

Then the negation of p is the statement

p : It is not the case that the integer 2 is even.

It would be better to write,

p : The integer 2 is not even.

Or better yet to write,

p : The integer 2 is odd.

(2) Logical Conjunction

Logical conjunction is an operation on the values of two propositions, that produces a value

of true if and only if both of its operands are true. The truth table for A B (also written as

A AND B) is as follows:

A B AB

T T T

(1.3) T F F

F T F

F F F

In words, if both A and B are true, then the conjunction A B is true. For all other assignments

of logical values to A and to B the conjunction A B is false.

For example, consider the statements

p : The integer 2 is even.

q : 4 is less than 3.

The conjunction of p and q, namely,

p q : The integer 2 is even and 4 is less than 3,

is a false statement since q is false (even though p is true).

1.3. Logical Connective 5

Logical disjunction is an operation on the values of two propositions, that produces a value

of false if and only if both of its operands are false. The truth table for A B (also written as

A OR B) is as follows:

A B AB

T T T

(1.4) T F T

F T T

F F F

Thus for the statements p and q described earlier, the disjunction of p and q, namely,

p q : The integer 2 is even or 4 is less than 3,

is a true statement since at least one of p and q is true (in this case, p is true).

(4) Logical Implication

Logical implication is associated with an operation on the values of two propositions, that

produces a value of false only in the case that the first operand is true and the second operand

is false. The truth table associated with A B is as follows:

A B AB

T T

T

(1.5) T F F

F T T

F F T

The last row of the table may appear to be counterintuitive. Note, however, that the use of if

then as a connective is quite different from that of day-to-day language.

Consider the following example.

Example 1.3. Suppose your supervisor makes you the following promise:

If you meet the month-end deadline, then you will get a bonus.

Under what circumstances are you justified in saying that your supervisor spoke falsely?

The answer is: You do meet the month-end deadline and you do not get a bonus. Your

supervisors promise only says that you will get a bonus if a certain condition (you meet the

month-end deadline) is met; it says nothing about what will happen if the condition is not met.

So if the condition is not met, your supervisor did not lie (your supervisor promised nothing if

you did not meet the month-end deadline); so your supervisor told the truth in this case. Are

you convinced? Good! If not, let us then check the truth and falsity of the implication based

on the various combinations of the truth values of the statements

p: You meet the month-end deadline;

q: You get a bonus.

The given statement can be written as p q.

6 1. Introduction to Logic

Suppose first that p is true and q is true. That is, you meet the month-end deadline and you

do get a bonus. Did your supervisor tell the truth? Yes, indeed. So if p and q are both true,

then so too is p q, which agrees with the first row of the truth table of (1.5).

Second, suppose that p is true and q is false. That is, you meet the month-end deadline

and you did not get a bonus. Then your supervisor did not do as he / she promised. What your

supervisor said was false, which agrees with the second row of the truth table of (1.5).

Third, suppose that p is false and q is true. That is, you did not meet the month-end

deadline and you did get a bonus. Your supervisor (who was most generous) did not lie (your

supervisor promised nothing if you did not meet the month-end deadline); so he/she told the

truth. This agrees with the third row of the truth table of (1.5).

Finally, suppose that p and q are both false. That is, you did not meet the month-end

deadline and you did not get a bonus. Your supervisor did not lie here either. Your supervisor

only promised you a bonus if you met the month-end deadline. So your supervisor told the

truth. This agrees with the fourth row of the truth table of (1.5).

In summary, the implication p q is false only when p is true and q is false.

A conditional (or implication) statement that is true by virtue of the fact that its hypothesis

is false is said to be vacuously true or true by default. Thus the statement: If you meet the

month-end deadline, then you will get a bonusis vacuously true if you do not meet the month-

end deadline!

why this statement is assigned a truth value of T. But it is indeed true can be seen as follows.

4 + 1 4 = 9 4 = 5 so 1 = 5 and therefore 8 1 = 8 5 = 3.

Logical equality is an operation on the values of two propositions, that produces a value of

true if and only if both operands are false or both operands are true. The truth table for A B

is as follows:

A B AB

T T T

(1.6) T F F

F T F

F F T

So A B is true if A and B have the same truth value (both true or both false), and false if they

have different truth values.

always true regardless of the truth value of the simple statements from which it is constructed. It is

a contradiction if it is always false. Thus a tautology and a contradiction are negation of each other.

1.3. Logical Connective 7

A A A (A) A (A)

(1.7) T F T F

F T T F

A B A B A (A B) [A (A B)] B

T T

T T T

(1.8) T F F F T

F T T F T

F F T F T

Definition 1.3.

(a) The converse of A B is B A.

(b) The inverse of A B is A B.

(c) The contrapositive of A B is B A.

Example 1.7. Write the converse, inverse and contrapositive of the statement in Example 1.3.

Recall that the given statement can be written as p q where p and q are the statements:

p: You meet the month-end deadline;

q: You get a bonus.

(a) The converse of this implication is q p: If you get a bonus, then you have met the month-end

deadline.

(b) The inverse of this implication is p q: If you do not meet the month-end deadline, then

you will not get a bonus.

(c) The contrapositive of this implication is q p: If you do not get a bonus, then you will

not have met the month-end deadline.

The following theorem is extremely useful.

Theorem 1.1. (A B) ( B A).

A B A B B A B A

T T T F F T

(1.9) T F F T F F

F T T F T T

F F T T T T

The entries in third and sixth columns are identical.

8 1. Introduction to Logic

Remark 1.1. It is an exercise to see that A B is not logically equivalent to its converse, B A.

One should avoid the very common mistake of claiming the opposite.

Example 1.8. Consider following two statements,

(A) Cornell is in Ithaca.

(B) Cornell is in NY state.

and the compound statements:

(a) Implication : A B : If Cornell is in Ithaca, then Cornell is in NY state.

(b) Contrapositive : B A : If Cornell is NOT in NY state, then Cornell is NOT in Ithaca.

(c) Converse : B A : If Cornell is in NY state, then Cornell is in Ithaca.

Note that the converse statement is FALSE. This leads us to another important interpretation

of the implication A B. It means that every time A is true, then B must be true. Hence A is a

sufficient condition for B. If we know that A is true then we can always conclude that B is also

true. The contrapositive B A showed us that when B is not true then A cannot be true either.

Hence B is a necessary condition for A. If A is true we must necessarily have that B is true, because

if B isnt true then A cannot be true either. Thus we have following ways of reading

A implies B,

If A then B,

(1.10) AB:

A is sufficient for B,

B is necessary for A.

Remark 1.2. Note that for equivalence relation (the if and only if) A B, the implication goes

in both the directions. In this case A and B are necessary and sufficient conditions for each other.

A B means that both the statement A B and its converse B A are true.

1.4. Quantifiers

In the previous sections, we learnt some definitions and basic properties of compound statements.

We were interested in whether a particular statement was true or false. This logic is called propo-

sitional logic or statement logic. However there are many arguments whose validity cannot be

verified using propositional logic. Consider, for example, the sentence

p : x is an even integer.

This sentence is neither true nor false. The truth or falsity depends on the value of the variable x.

For some values of x the sentence is true; for others it is false. Thus this sentence is not a statement.

However, let us denote this sentence by P(x), i.e.,

P(x) : x is an even integer.

Then, P(5) is false, while P(6) is true. To study the properties of such sentences, we need to extend

the framework of propositional logic to what is called first-order logic.

1.4. Quantifiers 9

Definition 1.4. A predicate or propositional function is a sentence that contains a finite number of

variables and becomes a statement when specific values are substituted for the variables. The do-

main of a predicate variable is the set of all values that may be substituted in place of the variables.

is a propositional function with domain D, the set of integers; since for each x D, P(x) is a

statement, i.e., for each x D, P(x) is true or false, but not both.

(a) The sentence P(x) : x + 3 is an even integer with domain D the set of positive integers.

(b) The sentence P(x) : x + 3 is an even integer with domain D the set of integers.

(c) The sentence P(x; y; z) : x2 + y2 = z2 with domain D the set of positive integers.

Before proceeding further, we introduce following notations. A more comprehensive list of

notation will be described later.

: is an element of,

: such that,

: AND in the sense that A B means both Aand B,

: OR in the sense that A B means either A or B or both

: Universal for all

: Existential there exists (one or more).

(a) The Universal Quantifier:

Let P(x) be a predicate with domain D. Then the sentence

is a statement. To see this, notice that either P(x) is true at each value x D (the notation x D

indicates that x is in the set D, while x

/ D means that x is not in D) or P(x) is false for at least

one value of x D. If P(x) is true at each value x D, then Q(x) is true. However, if P(x) is

false for at least one value of x D, then Q(x) is false. Hence, Q(x) is a statement because it is

either true or false (but not both).

Definition 1.5. Each of the phrases every, for every, for each, and for all is referred

to as the universal quantifier and is expressed by the symbol . Let P(x) be a statement with

domain D. A universal statement is a statement of the form x D, P(x). It is false if P(x) is

false for at least one x D; otherwise, it is true.

10 1. Introduction to Logic

The statement

x D, x > 0

means For all x that are elements of D, x is positive.

Example 1.11. Let P(x) be the predicate P(x) : x2 x..

Determine whether the following universal statements are true or false.

(i) x R; P(x);

(ii) x Z; P(x);

( )2 ( )

(i) Let x = 12 R. Then, 12 = 14 < 12 , and so P 12 is false. Therefore, x R; P(x) is

false.

(ii) For all integers x, x2 x is true, and so P(x) is true for all x Z. Hence,x Z; P(x)

is true.

(b) The Existential Quantifier:

Each of the phrases there exists, there is, for some, and for at least one is referred

to as the existential quantifier and is denoted in symbols . Let P(x) be a predicate with domain

D. An existential statement is a statement of the form x D such that P(x): It is true if P(x)

is true for at least one x D; otherwise, it is false.

Example 1.12. As before let D be a set.

The statement

x D, x > 0

tells us that There exists an element x of D such that x is positive.

Example 1.13. Let P(x) be the predicate P(x) : x2 < x..

Determine whether the following existential statements are true or false.

(i) x R; P(x);

(ii) x Z; P(x);

(i) Let x = 12 R. Then, ( 12 )2 = 14 < 12 , and so P( 21 ) is true. Therefore, x R; P(x) is

true.

(ii) For all integers x, x2 x is true, and so there is no x Z such that P(x) is true. Hence,x

Z; P(x) is false.

(c) Universal Conditional Statements

Recall that a conditional statement has a contrapositive, a converse, and an inverse. These

definitions can be extended to universal conditional statements. Consider a universal condi-

tional statement of the form x D; P(x) Q(x).

(i) Its contrapositive is the statement,

x D; Q(x) P(x).

(ii) Its converse is the statement,

x D; Q(x) P(x)

1.4. Quantifiers 11

x D; P(x) Q(x).

Example 1.14. Write the contrapositive, converse, and inverse of the statement: If a real num-

ber is greater than 3, then its square is greater than 9.

Solution 1.2. Symbolically, the statement can be written as:

x R; if x > 3 then x2 > 9

Here P(x) is the statement x > 3 and Q(x) the statement x2 > 9.

(i) The contrapositive is:

x R; if x2 > 9 then x > 3,

or, equivalently,

x R; if x2 9 then x 3.

(ii) The converse is:

x R; if x2 > 9 then x > 3.

Note that the converse is false; take, for example, x = 4. Then, (4)2 > 9 is true but

4 > 3 is false. Hence the statement if (4)2 > 9 then 4 > 3 is false. Hence the

universal statement x R; if x2 > 9 then x > 3 is false.

(iii) The inverse is:

x R; if x > 3 then x2 > 9,

or, equivalently,

x R; if x 3 then x2 9.

(d) Order of quantifiers:

If the quantifiers are of the same type, the order in which they appear does not matter.

x, y : x+y = y+x

x y : x + y = 2 x + 2y = 3.

But if the quantifiers are of different types we have to be careful. For the set of real numbers,

the statement

(1.11) x y y > x

is TRUE, that is given any real number x, there is always a real number y that is greater than x.

But the statement

(1.12) y x, y>x

is FALSE, since there is no fixed real number y that is greater than every real number.

Example 1.15. The statement [y U x V, statement A] means that one y will make A

true regardless of what x is. The statement [x V, y U statement A] means that A can be

made true by choosing y depending on x.

12 1. Introduction to Logic

Fact 1. The negation of a universal statement of the form x D; P(x) is logically equivalent to an

existential statement of the form x D; such that P(x). Symbolically,

[x D; P(x)] x D; such that P(x)

Consider the universal statement x D; P(x). It is false if P(x) is false for at least one x D;

otherwise, it is true. Hence it is false if and only if P(x) is false for at least one x D, or, if and

only if P(x) is true for at least one x D. Thus the negation of this statement is the statement

x D such that P(x).

Example 1.16. What is the negation of the statement All mathematicians wear glasses ?

Solution 1.3. Let us write this statement symbolically. Let D be the set of all mathematicians and

let P(x) be the predicate x wears glasses with domain D. The given statement can be written as

x D; P(x). The negation is x D such that P(x). In words, the negation is There exists a

mathematician who does not wear glasses or Some mathematicians do not wear glasses.

Fact 2. The negation of an existential statement of the form x D such that P(x) is logically

equivalent to a universal statement of the form x D; P(x). Symbolically,

(x D such that P(x)) x D; P(x).

Consider the existential statement, x D such that P(x). It is true if P(x) is true for at least

one x D; otherwise, it is false. Hence it is false if and only if P(x) is false for all x D, in other

words, if and only if P(x) is true for all x D. Thus the negation of this statement is the statement

x D; P(x).

Example 1.17. What is the negation of the statement Some politicians are honest?

Solution 1.4. Let us write this statement symbolically. Let D be the set of all politicians and let

P(x) be the predicate x is honest with domain D. The given statement can be written as x D

such that P(x). The negation is x D; P(x). In words, the negation is All politicians are not

honest or No politician is honest.

Consider next the negation of a universal conditional statement. By the second Fact, we have

that (x D; (P(x) Q(x))) x D such that (P(x) Q(x)). But the negation of an if

p then q statement is logically equivalent to an p and not q statement. Hence, (P(x)

Q(x)) P(x) Q(x). Therefore we have the following fact:

Fact 3. The negation of a universal conditional statement of the form x D; (P(x) Q(x)) is

logically equivalent to the existential statement of the form x D such that (P(x) Q(x)).

Symbolically,

(x D; (P(x) Q(x))) x D such that (P(x) Q(x)).

Written less symbolically, this becomes

(x D; if P(x) then Q(x)) x D such that P(x) and Q(x).

1.5. Rules of Negation of statements with quantifiers 13

1.5.1. More Examples. We can use the truth tables to prove following examples of negations.

(A B) A B

(A B) A B

(x > y) x 6 y

(A B) A B

( A) A.

Try proving them (Good Exercise).

1.5.2. Negation of statement with one quantifier. The universal statement in the Example 1.10

contains a universal quantifier term and the statement x > 0. To negate a universal statement we

need to find only one counterexample. In this example, if we can find just one x in D that is non

positive, we know that it is not true that all x are positive. Thus the negation of the universal

statement

x D, x > 0

is an existential statement,

x D, x 6 0.

To negate an existential statement we must show that every possible instance is false. The existen-

tial statement

x D, x > 0

is false if there are no positive elements of D. Thus the negation of the existential statement is a

universal statement

x D, x 6 0.

Insight from these examples can be generalized to rules of negation. Note that , such that always

follows (the existential quantifier).

Rule 1.1. For negating the statement, [quantifier term, statement], first change the quantifier:

becomes , becomes and then negate the statement.

Rule 1.2. To negate a statement with a string of quantifiers, change the type of each quantifier,

preserve their order and negate the statement that follows the quantifiers.

(1.13) > 0 N n, if n > N, then x D, fn (x) f (x) < .

14 1. Introduction to Logic

Negation: > 0 [N n, if n > N, then x D, fn (x) f (x) < ],

or > 0 N, [ n, if n > N, then x D, fn (x) f (x) < ],

(1.14) or > 0 N, n [if n > N, then x D, fn (x) f (x) < ],

or > 0 N, n n > N and [ x D, fn (x) f (x) < ],

or > 0 N, n > N and x D, fn (x) f (x) > .

There are many fundamental logical equivalences that we often encounter. Several of these are

listed in Theorem below. We may find them to be useful for future reference.

Theorem 1.2. Let p, q and r be statements. Then the following logical equivalences hold.

(1) Commutative Laws

(i) p q q p;

(ii) p q q p.

(2) Associative Laws

(i) (p q) r p (q r);

(ii) (p q) r p (q r).

(3) Distributive Laws

(i) p (q r) (p q) (p r);

(ii) p (q r) (p q) (p r).

(4) De Morgans Laws

(i) (p q) ( p) ( q);

(ii) (p q) ( p) ( q).

(5) Idempotent Laws

(i) p p p;

(ii) p p p.

(6) Negation Laws

(i) p ( p) T ;

(ii) p ( p) F;

where T: True; F: False.

(7) Universal Bound Laws

(i) p T T ;

(ii) p F F.

(8) Identity Laws

(i) p F p;

(ii) p T p.

(9) Double Negation Law ( (p)) p.

1.7. Some Math symbols and Definitions 15

The De Morgans Laws can be expressed in words as under: The negation of an and statement is

logically equivalent to the or statement in which each component is negated, while the negation of

an or statement is logically equivalent to the and statement in which each component is negated.

This is a very brief list of some of the mathematical shorthand that will be used in this course and

in the first year courses. Some of these symbols will be explained in more detail as we go.

Operator Meaning

For all, for every, for each

There exists, there is

In, a member of

Owns, contains

Or

And

Therefore

or Not

0/ Empty set

Subset, is a subset of

Contains the set

Union (of sets)

Intersection (of sets)

Implies

or iff If and only if, each implies the other

s.t., |, or : Such that

Q.E.D. Quod erat demonstrandum (Proof complete)

(a) Theorem A statement which can be demonstrated to be true by accepted mathematical

operations and arguments.

In general, a theorem is an embodiment of some general principle that makes it part of a

larger theory. The process of showing a theorem to be correct is called a proof.

(b) Proposition A statement which is required to be proved.

(c) Axiom A proposition regarded as self-evidently true without proof. The word axiom is

synonym for postulate.

(d) Corollary An immediate consequence of a result already proved. Corollaries usually state

more complicated theorems in a language simpler to use and apply.

(e) Lemma A short theorem used in proving a larger theorem.

16 1. Introduction to Logic

(f) Hypothesis A hypothesis is a proposition that is consistent with known data, but has been

neither verified nor shown to be false.

(g) Definition Tells us how or what things are.

Chapter 2

Proof Techniques

A proof is a method of establishing the truthfulness of an implication. An example would be to

prove a proposition of the form, If H1 , , Hn , then T. . The statements H1 , , Hn are referred to

as hypotheses of the proof and proposition T is referred to as the conclusion. A formal proof would

consist of a sequence of valid propositions ending with the conclusion T. By valid proposition,

we mean the proposition in the sequence must either be one of the hypotheses H1 , , Hn , or an

axiom, a definition, a tautology or a proposition proved earlier, or it must be derived from previous

propositions using either logical implication or substitution.

Before we present proof techniques, we describe some elementary definitions in number theory.

Definition 2.1. An integer n is even if and only if n = 2k for some integer k. An integer n is odd if

and only if n = 2k + 1 for some integer k.

Using the quotient-remainder theorem, we can show that every integer is either even or odd.

Definition 2.2. An integer n is prime if and only if n > 1 and for all positive integers r and s, if

n = r s then r = 1 or s = 1. An integer n is composite if and only if n = r s for some positive

integers r and s, with r = 1 and s = 1.

First three prime numbers are 2, 3, and 5. First six composite numbers are 4, 6, 8, 9, 10 and 12.

Every integer greater than 1 is either prime or composite since the two definitions are negations of

each other.

Definition 2.3. Two integers m and n are said to be of the same parity if m and n are both even or

are both odd, while m and n are said to be of the opposite parity if one of m and n is even and the

other is odd. Two integers are consecutive if one is one more than the other.

17

18 2. Proof Techniques

Integers 2 and 8 are of same parity while 5 and 10 are of opposite parity.

Definition 2.4. Let n and d be integers with d = 0. Then n is said to be divisible by d if n = d k for

some integer k. In such case we say that n is a multiple of d, or d is a factor of n, or d is a divisor

of n, or d divides n.

We discuss following techniques of writing proofs. Our emphasis here will be on showing how

each of them is used through several examples.

Let P(x) and Q(x) be statements with domain D. If Q(x) is true for every x D, then the universal

statement

x D, P(x) Q(x)

is true regardless of the truth value of P(x). Such a proof is called a trivial proof.

Claim 2.1. For x R, if x > 3, then x2 + 1 > 0.

Proof. Consider the two statements P(x) : x > 3 and Q(x) : x2 + 1 > 0. Since x2 0 for every

x R, it follows that x2 + 1 0 + 1 > 0 for every x R. Thus P(x) Q(x) is true for every x R

and hence for x > 3.

Claim 2.2. If n is an odd integer, then 6n3 + 4n + 3 is an odd integer.

where k = 3n3 + 2n + 1 Z), the integer 6n3 + 4n + 3 is odd for every integer n.

Observe the fact that 6n3 + 4n + 3 is odd does not depend on n being odd. It would have been

better to replace the statement of the claim by if n is an integer, then 6n3 + 4n + 3 is odd.

Let P(x) and Q(x) be the statements with domain D. If P(x) is false for all every x D, then the

universal statement

x D, P(x) Q(x)

is true regardless of the truth value of Q(x). Such a proof is called vacuous proof.

Claim 2.3. For x R, if x2 2x + 1 < 0, then x > 1.

Proof. Let P(x) : x2 2x + 1 < 0 and Q(x) : x > 1. Since x2 2x + 1 = (x 1)2 0 for every

x R, we have (x 1)2 < 0 is false for every x R. Hence, P(x) is false for every x R. Thus,

P(x) Q(x) is true for every x R.

2.4. Proof by Construction 19

In a proof by construction we work straight from the set of assumptions.

Example 2.1. Consider a function

(2.1) f (n) = n2 + n + 17,

where n N. If we evaluate this function, it seems that we always get a prime number. For

instance

f (1) = 19

f (2) = 23

f (3) = 29

f (15) = 257.

We can verify that all these numbers are prime. Then we might conjecture that

Conjecture 1. The function f (n) = n2 + n + 17 generates prime numbers for all n N.

we have NOT proved the conjecture made above in the example. In fact, this conjecture is false.

Take n = 17, f (17) = 172 + 17 + 17 = 17 19 which is not a prime number.

Example 2.2. Let NE be the set of even natural numbers and NO be the set of odd numbers.

We want to show that (i) the sum of two even numbers is even,

x, y NE , x + y NE

and (ii) the sum of an odd number and an even number is odd

x NE , y NO , x + y NO .

(i) Let

x, y NE m, n N x = 2m y = 2n,

x + y = 2m + 2n = 2 (m + n) NE since m + n N.

(ii) Let

x NE m N x = 2m, y NO n N y = 2n + 1,

x + y = 2m + 2n + 1 = 2 (m + n) + 1, where m + n N x + y NO .

Example 2.3. Consider function g (n, m)

20 2. Proof Techniques

g (n, m) = n2 + n + m where m, n N.

g (1, 2) = 12 + 1 + 2 = 22

g (2, 3) = 22 + 2 + 3 = 32

g (12, 13) = 122 + 12 + 13 = 132

On the basis of above, we can form a conjecture,

Conjecture 2.

(2.2) n N, g (n, n + 1) = (n + 1)2 .

Proof. By construction.

g (n, n + 1) = (n)2 + n + (n + 1)

= n2 + 2n + 1

= (n + 1)2 .

Having proved the general statement, we know that

g (15, 16) = 162 .

This is an example of deductive reasoning.

x NO n N x = 2n + 1,

x2 = (2n + 1)2

= 4n2 + 4n + 1

( )

= 2 2n2 + 2n + 1

x 2 NO .

For x = 1, x2 = 1 which is odd.

Example 2.5. If the sum of two integers is even, then so is their difference.

Proof. Assume that the integers m and n are such that m + n is even. Then m + n = 2k for some

integer k. So, m = 2k n and m n = 2k n n = 2(k n) = 2l, where l = k n is an integer.

Thus m n is even.

2.5. Proof by Contraposition 21

Note that A B is not logically equivalent to its converse statement B A. It is possible for an

implication to be false while its converse is true. Hence we cannot prove A B by showing B A.

m2 > 0 m > 0

is false but its converse

m > 0 m2 > 0

is true.

To show that A B, we can instead show that B A. We have already shown before that

implication and its contrapositive are logically equivalent.

Its contrapositive is If m is not an odd number, then 7m is not an odd number., or, equivalently,

If m is an even number, then 7m is an even number.

We are talking about integers here. Using contrapositive, we can construct a proof of theorem

as under:

Proof.

m NE k N m = 2k,

7m = 7 (2k) = 2 (7k) , 7k N 7m NE .

This is much easier than trying to show directly that 7m being odd implies that m is odd.

(2.3) x 2 NE x NE

Its contrapositive is

(2.4) x NO x 2 NO

22 2. Proof Techniques

To prove that statement C is true, try supposing C is true and then show that this leads to a

contradiction. To show that A B we can use

(2.5) (A B) A B.

So assume A to be true and show B is false. Hence A B is false. So A B is true.

x 2 NE x NE .

x2 NE m N x2 = 2m.

x NO n N x = 2n + 1

x2 = 4n2 + 4n + 1, which is odd.

This contradicts initial assumption that x2 is even.

Proof. Assume, to the contrary, that there is a greatest integer, say N. Then, N n for every integer

n. Let m = N + 1. Now m is an integer since it is the sum of two integers. Also, m > N. Thus, m is

an integer that is greater than the greatest integer, which is a contradiction. Hence our assumption

that there is a greatest integer is false. Thus there is no greatest integer.

Definition 2.5. A real number r is rational number if r = mn for some integers m and n with n = 0.

A real number that is not a rational number is called an irrational number.

Proof. Assume, to the contrary, that there is a least positive rational number x. Then, x y for

every positive rational number y. Consider the number 2x . Since x is a positive rational number,

so too is 2x . Multiplying both sides of the inequality 12 < 1 by x, which is positive, gives 2x < x.

Hence, 2x is a positive rational number that is less than x, which is a contradiction. Hence our

assumption that there is a least positive rational number is false. Thus there is no least positive

rational number.

Example 2.12. The sum of a rational number and an irrational number is irrational.

2.6. Proof by Contradiction 23

Proof. Assume, to the contrary, that there exists a rational number p and an irrational number q

whose sum is a rational number. Thus, by definition of rational numbers, p = ab and p + q = r = dc

for some integers a; b; c and d with b = 0 and d = 0. Hence,

c a bc ad

q = r p = =

d b bd

Now, bc ad Z and bd Z since a; b; c and d Z. Since b = 0 and d = 0, bd = 0. Hence,

r Q, which is a contradiction. Hence our assumption that there exists a rational number and an

irrational number whose sum is a rational number is false. Thus, the sum of a rational number and

an irrational number is irrational.

We end this section with a proof of the classical result that 2 is irrational.

Example 2.13. The real number 2 is irrational.

Proof. Assume, to the contrary, that 2 is rational. Then,

m

2=

n

where m; n Z and n = 0. By dividing m and n by any common factors, if necessary, we may

further assume that m and n have no common factors, i.e., mn has been expressed in (or reduced to)

2

lowest terms. Then, 2 = mn2 , and so m2 = 2n2 . Thus, m2 is even. Hence, m is even, and so m = 2k,

where k Z. Substituting this into our earlier equation m2 = 2n2 , we have (2k)2 = 2n2 , and so

4k2 = 2n2 . Therefore, n2 = 2k2 . Thus, n2 is even, and so n is even. Therefore each of m and n has

2 as a factor, which contradicts our assumption that m = n has been reduced to lowest terms and

therefore that m and n have no common

factors. We deduce, therefore, that our assumption that 2

is rational is incorrect. Hence, 2 is irrational.

Exercise 2.1. The square root of any prime number is irrational.

Remark 2.1. One should be very careful when writing proof by contradiction. Here is a very

strong word of caution which can be found in ?, page 3.

All students are enjoined in the strongest possible terms to eschew proofs by contradiction!

There are two reasons for the prohibition: First such proofs are very often fallacious, the contra-

diction on the final page arising from an erroneous deduction on an earlier page, rather than from

the incompatibility of p with q. Second, even when correct, such a proof gives little insight into

the connection between p and q whereas both the direct proof and the proof by contraposition con-

struct a chain of argument connecting p and q. One reason why mistakes are so much more likely

in proofs by contradiction than in direct proofs is that in a direct proof (assuming the hypotheses is

not always false) all deduction from the hypothesis are true in those cases where hypothesis holds.

One is dealing with true statements, and ones intuition and knowledge about what is true help to

keep one from making erroneous statements. In proofs by contradiction, however, you are (assum-

ing the theorem is true) in the unreal world where any statement can be derived, and so the falsity

of a statement is no indication of an erroneous deduction..

24 2. Proof Techniques

A proof by induction involves three steps.

(a) Base of induction. Check for n = 1, whether the statement is true.

(b) Inductive transition: Assume that the statement is true for some n and show that it is also true

for n + 1.

(c) Inductive conclusion: The statement is true for all n > 1.

Proof. By Induction.

(a) Base of induction:

Assume that for

then for

f (x) = xn+1 = xn x,

f (x) = nxn1 x + xn 1

= nxn + xn

(2.8) = (n + 1) xn

n N if f (x) = xn then f (x) = n xn1 .

(2.9) 7n 4n = 7 4 = 3

Statement is true.

(b) Inductive transition:

2.7. Proof by Induction 25

7n+1 4n+1 = 7 7n 4 4n

= 7 7n 7 4n + 7 4n 4 4n

= 7 (7n 4n ) + (7 4) 4n

= 7 (3m) + 3 4n

= 3 (7m + 4n )

Since m and n are natural numbers, so is 7m + 4n . So 7n+1 4n+1 is a multiple of 3.

(c) Inductive conclusion:

7n 4n is a multiple of 3, for all n N.

(n)

Example 2.16. Prove the Binomial Theorem : (a + b)n = nk=0 k ank bk by induction.

For n = 1, the claim is trivially true.

(b) Inductive transition:

Assume that the Binomial Theorem holds true for n. Then

n ( )

n nk k

(a + b)n+1

= (a + b)(a + b) = (a + b)

n

a b

k=0 k

n ( ) n ( )

n nk+1 k n nk k+1

= a b + a b

k=0 k k=0 k

n ( ) ( )

n nk+1 k n+1 n

= a b + anl+1 bl by change of variable l = k + 1

k=0 k l=1 l 1

( ) {( ) ( )} ( )

n

n n+1 n n n n+1

= a + + anl+1 bl + b

0 l=1 l l 1 n

( ) {( )} ( )

n + 1 n+1 n+1 n+1 n + 1 n+1

= a + anl+1 l

b + b

0 l=1 l n+1

n+1 ( )

n + 1 (n+1)k k

= a b

k=0 k

In the fifth line we have used the fact that ,

( ) ( ) ( )

n n n+1

+ = .

l l 1 l

It is a good exercise to verify this.

(c) Inductive conclusion:

The Binomial Theorem holds for all n N.

26 2. Proof Techniques

Observe that in the inductive hypothesis of our proof above, we assume that P(k) is true for an

arbitrary, but fixed, positive integer k. We certainly do not assume that P(k) is true for all positive

integers k, for this is precisely what we wish to prove! It is important to understand that our aim

is to establish the truth of the implication If P(k) is true, then P(k + 1) is true. which together with

the truth of the statement P(1) allows us to conclude that an infinite number of statements (namely,

P(1), P(2),P(3), ) are true.

Example 2.17. For every positive integer n,

n(n + 1)(2n + 1)

12 + 22 + + n2 = .

6

n(n+1)(2n+1)

Proof. For every integer n 1, let P(n) be the statement P(n) : 12 + 22 + + n2 = 6 .

(a) Base of induction:

When n = 1, the statement P(1) : 12 = 1(1+1)(21+1)

6 is certainly true since 1(1+1)(21+1)

6 =

6

6 = 1. This establishes the base case when n = 1.

(b) For every integer n > 1, let P(n) be the statement P(n) : 12 + 22 + + n2 = n(n+1)(2n+1)

6 . For

the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k > 1 and assume

that P(k) is true; that is, assume that 12 + + k2 = k(k+1)(2k+1)

6 . For the inductive step, we

need to show that P(k + 1) is true. That is, we show that

(k + 1)(k + 2)(2k + 3)

12 + 22 + + k2 + (k + 1)2 = .

6

Evaluating the left-hand side of this equation, we have

12 + 22 + + k2 + (k + 1)2 = (12 + 22 + + k2 ) + (k + 1)2

k(k + 1)(2k + 1)

= + (k + 1)2 (by the inductive hypothesis)

6

k(k + 1)(2k + 1) 6(k + 1)2

= +

6 6

(k + 1)(2k2 + k + 6k + 6)

=

6

(k + 1)(2k2 + 7k + 6) (k + 1)(2k2 + 4k + 3k + 6)

= =

6 6

(k + 1)(k + 2)(2k + 3)

= ;

6

thus verifying that P(k + 1) is true.

(c) Hence, by the principle of mathematical induction, P(n) is true for all integers n 1; that is,

n(n + 1)(2n + 1)

12 + 22 + + n2 =

6

2.7. Proof by Induction 27

Recall that in a geometric sequence, each term is obtained from the preceding one by multiply-

ing by a constant factor. If the first term is 1 and the constant factor is r, then the sequence is 1, r,

r2 , r3 , , rn , . The sum of the first n terms of this sequence is given by a simple formula which

we shall verify using mathematical induction. This is left as an exercise.

Induction can also be used to solve problems involving divisibility, as the next two example

illustrates.

Example 2.18. For all integers n 1, 22n 1 is divisible by 3.

Proof. We proceed by mathematical induction. When n = 1, the result is true since in this case

22n 1 = 22 1 = 3 and 3 is divisible by 3. Hence, the base case when n = 1 is true. For the

inductive hypothesis, let k be an arbitrary (but fixed) integer such that k > 1 and assume that the

property holds for n = k, i.e., suppose that 22k 1 is divisible by 3. For the inductive step, we must

show that the property holds for n = k + 1. That is, we must show that 22(k+1) 1 is divisible by

3. Since 22k 1 is divisible by 3, there exists, by definition of divisibility, an integer m such that

22k 1 = 3m, and so 22k = 3m + 1. Now,

22(k+1) 1 = 22k 22 1

= 4 22k 1

= 4(3m + 1) 1

= 12m + 3

= 3(4m + 1).

Since m Z, we know that 4m + 1 Z. Hence, 22(k+1) 1 is an integer multiple of 3; that is,

22(k+1) 1 is divisible by 3, as desired. Hence, by the principle of mathematical induction, the

property holds for all integers n 1.

Induction can also be used to verify certain inequalities, as the next example illustrates.

Example 2.19. For all integers n 2,

1 1 1

n < + ++

1 2 n

.

Proof. We proceed by mathematical induction. To show the inequality holds for n = 2, we must

show that

1 1

2< + .

1 2

But if and1 only1 if 2 < 2 + 1 which is true if and only if 1 < 2. Since

this inequality is true

1 < 2 is true, so too is 2 < 1 + 2 . Hence the inequality holds for n = 2. This establishes the

28 2. Proof Techniques

base case. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k > 2 and

assume that the inequality holds for n = k, i.e., suppose that

1 1 1

k < + ++ .

1 2 k

For the inductive step, we must show that the inequality holds for n = k + 1. That is, we must show

that

1 1 1 1

k +1 < + ++ + .

1 2 k k+1

Since k > 2, k < k + so (multiplying both sides by k, k < k k + 1. Hence

1, and (adding

1to both sides), k + 1 < k k + 1 + 1; and so (dividing both sides by k + 1 we have k + 1 <

1

k + k+1 . Hence, by the inductive hypothesis,

1 1 1 1

k +1 < + ++ + ;

1 2 k k+1

as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers

n > 2.

To prove a universal statement

(2.10) x D, p (x)

we let x represent an arbitrary element of the set D and then show that statement p (x) is true. The

only properties we can use about x are those that apply to all elements of D. For example, if the set

D consists of the natural numbers, then we cannot assume x to be odd as not all natural numbers

are odd. To prove an existential statement,

(2.11) x D, p (x)

all we need to do is to show that there exists at least one member of D for which p (x) is true. We

show these techniques through following examples.

Example 2.20. For every > 0, there exists a > 0 such that

In this example we are asked to prove that the statement is true for each positive number . We

begin with an arbitrary and use it to find a which is positive and has the property that the

implication holds true. We give a particular value of which could possibly depend on and show

that the statement is true.

2.8. Additional Notes on Proofs 29

1 < x < 1+

1 < x < 1+

2 2

2 < 2x < 2 +

5 < 2x + 3 < 5 + .

In some cases, it is possible to prove an existential statement in an indirect way without actually

producing any specific element of the set. One indirect method is to use contrapositive and another

is to use a proof by contradiction. Consider following example to show this aspect.

Example 2.21. Let f be a continuous function.

If

1

(2.13) f (x) dx = 0,

0

then there exists a point x [0, 1] such that

f (x) = 0.

1

(2.14) If x [0, 1] f (x) = 0, then f (x) dx = 0

0

This is lot easier to prove. Instead of having to conclude the existence of an x having a particular

property, we are given that all x have a different property. The proof follows directly from the

definition of the integral, since each of the terms in any Riemann sum will be zero.

1

Example 2.22. Let x be a real number. If x > 0 then x > 0.

and

1

(2.15) 6 0.

x

Since x > 0, we can multiply both sides by x.

( )

1

(2.16) (x) 6 (x) 0 or 1 6 0.

x

This is a contradiction.

30 2. Proof Techniques

Claim 2.4. There exist irrational numbers a and b such that ab is rational.

2

Proof. Consider the real number, 2 . This number is either rational or irrational. We consider

each case in turn.

2

(1) 2 is rational. Let a = 2 and b = 2. Thus a and b are irrational, and by assumption, ab

is rational.

2 2

(2) 2 is irrational. Let a = 2 and b = 2. Thus a and b are irrational. Moreover, ab =

2

( 2 ) 2 = ( 2) 2 2 = ( 2)2 = 2 is rational.

In both cases, we proved the existence of irrational numbers a and b such that ab is rational, and so

we have the desired result.

We remark that as it stands, this proof does not enable us to pinpoint which of the two choices

of the pair (a, b) has the required property. In order to determine the correct choice of (a, b),

2

we would need to decide whether 2 is rational or irrational. It is not a constructive proof.

Following would be a constructive proof of this claim. Let a = 2 and b = log2 9. Then b is

an irrational number, for if it were rational, then log2 9 = mn where m and n are integers with no

common factor. This implies 2m = 9n which is a contradiction as 2m is an even number and 9n is

an odd number. This gives ab = 3 which is rational 1.

Let P(x) be a statement. If x possesses certain properties, and if we can verify that P(x) is true

regardless of which of these properties x has, then P(x) is true. Such a proof is called a proof by

cases.

Some proofs naturally divide themselves into consideration of two or more cases. For example

positive integers are either even or odd. Real numbers are positive, negative or zero. It may be that

different arguments are required for each case.

More rigorously, suppose we want to prove that p q, and that p can be decomposed into two

disjoint propositions p1 , p2 such that p1 p2 is a contradiction. Then p (p1 p2 ) (p1 p2 )

(p1 p2 ).

With this choice of p1 and p2 , we have,

(p q) (p q) [(p1 p2 ) q]

[(p1 p2 ) q] [(p1 q) (p2 q)]

[(p1 q) (p2 q)].

This means that we only need to show that p1 q and p2 q. Note that this method works also

if we can decompose p into a number of propositions greater than 2 as far as these propositions are

1There is an extensive literature on constructive mathematics. You may like to do a google search for easy to read articles on the

subject. A classic reference is ?.

2.9. Decomposition or proof by cases 31

mutually exclusive ( i.e., every pair of them is a contradiction). Following example illustrates this

technique.

Before going over some examples, we state the following theorem.

Theorem 2.1. (Quotient-Remainder Theorem) For every given integer n and positive integer d,

there exist unique integers q and r such that

n = d q+r and 0 r < d.

Definition 2.6. Let n be a nonnegative integer and let d be a positive integer. By the Quotient-

Remainder Theorem, there exist unique integers q and r such that n = d q + r; where 0 r < d.

We define,

n div d = q (read as n divided by d ), and

n mod d = r (read as n modulo d ).

Thus n div d and n mod d are the integer quotient and integer remainder, respectively, obtained

when n is divided by d.

Observe that given a nonnegative integer n and a positive integer d, we have that n mod d

{0, , d 1} (since 0 r d 1) and that n mod d = 0 if and only if n is divisible by d.

Result 2.1. Every integer is either even or odd.

Proof. By the Quotient-Remainder Theorem with d = 2, there exist unique integers q and r such

that n = 2 q + r and 0 r < 2. Hence, r = 0 or r = 1. Therefore, n = 2q or n = 2q + 1 for some

integer q depending on whether r = 0 or r = 1, respectively. In the case that n = 2q, the integer n

is even. In the other case that n = 2q + 1, the integer n is odd. Hence, n is either even or odd.

Example 2.23. If n Z, then n2 + 5n + 3 is an odd integer.

(1) n is even.

Then, n = 2k for some integer k. Thus, n2 + 5n + 3 = (2k)2 + 5(2k) + 3 = 4k2 + 10k + 3 =

2(2k2 + 5k + 1) + 1 = 2m + 1, where m = 2k2 + 5k + 1. Since k Z, we must have m Z.

Hence, n2 + 5n + 3 = 2m + 1 for some integer m, and so the integer n2 + 5n + 3 is odd.

(2) n is odd.

Then, n = 2k + 1 for some integer k. Thus, n2 + 5n + 3 = (2k + 1)2 + 5(2k + 1) + 3 =

4k + 14k + 9 = 2(2k2 + 7k + 4) + 1 = 2m + 1, where m = 2k2 + 7k + 4. Since k Z, we must

2

odd.

32 2. Proof Techniques

Example 2.24. Let m, n Z. If m and n are of the same parity (either both even or both odd), then

m + n is even.

Proof. We use a proof by cases, depending on whether m and n are both even or both odd.

(1) m and n are both even.

Then, m = 2k and n = 2l for some integers k and l. Thus, m + n = 2k + 2l = 2(k + l). Since

k + l Z, the integer m + n is even.

(2) m and n are both odd.

Then, m = 2k + 1 and n = 2l + 1 for some integers k and l. Thus, m + n = (2k + 1) + (2l +

1) = 2(k + l + 1). Since k + l + 1 Z, the integer m + n is even.

Proof. We shall combine two proof techniques and use both a proof by contrapositive and a proof

by cases. Suppose that n is not a multiple of 3. We wish to show then that n2 is not a multiple of

3. By the Quotient-Remainder Theorem with d = 3, there exist unique integers q and r such that

n = 3 q + r and 0 r < 3. Hence, r {0; 1; 2}. Therefore, n = 3q or n = 3q + 1 or n = 3q + 2

for some integer q depending on whether r = 0; 1 or 2, respectively. Since n is not a multiple of 3,

either n = 3q + 1 or n = 3q + 2 for some integer q. We consider each case in turn.

(1) n = 3q + 1 for some integer q.

Then, n2 = (3q + 1)2 = 9q2 + 6q + 1 = 3(3q2 + 2q) + 1, and so n2 is not a multiple of 3.

(2) n = 3q + 2 for some integer q.

Then, n2 = (3q + 2)2 = 9q2 + 12q + 4 = 3(3q2 + 4q + 1) + 1, and so n2 is not a multiple of

3.

Proof. We shall use both a direct proof and a proof by cases. Assume that n is an odd integer.

By the Quotient-Remainder Theorem with d = 4, there exist unique integers q and r such that

n = 4 q + r and 0 r < 4. Hence, r {0; 1; 2; 3}. Therefore, n = 4q or n = 4q + 1 or n = 4q + 2

or n = 4q + 3 for some integer q depending on whether r = 0; 1; 2 or 3, respectively. Since n is

odd, and since 4q and 4q + 2 are both even, either n = 4q + 1 or n = 4q + 3 for some integer q. We

consider each case in turn.

(1) n = 4q + 1 for some integer q.

Then, n2 = (4q + 1)2 = 16q2 + 8q + 1 = 8(2q2 + q) + 1 = 8m + 1, where m = 2q2 + q.

Since q Z, we must have m Z. Hence, n2 = 8m + 1 for some integer m.

(2) n = 4q + 3 for some integer q.

2.9. Decomposition or proof by cases 33

8m + 1, where m = 2q2 + 3q + 1. Since q Z, we must have m Z. Hence, n2 = 8m + 1 for

some integer m.

We remark that the last conclusion can be restated as follows: For every odd integer n, we have

n2 mod 8 = 1. Here are some additional illustrative examples.

Example 2.27. If x is a real number, then

x 6 |x| .

{

x if x > 0

(2.17) |x| =

x if x < 0.

Since this definition is divided into two parts, it makes sense to divide the proof also in two parts.

Proof. Let x be an arbitrary real number. Then either x > 0 or x < 0. If x > 0, then by definition

|x| = x. If x < 0, then x > 0, so that

x < 0 < x = |x|

In either case,

x 6 |x| .

Chapter 3

Problem Set 1

(1) Prove or give a counterexample for the following claims. Capital letters refer to propositions

or sets, depending on the context.

(a)

(A B) A B

(b)

(A B) A B.

(c)

(A B) A B.

(d)

((A B) C) ((A C) (B C)).

(e) If n and n + 1 are consecutive integers, then both cannot be even.

(f) Give a counter example of the proposed statement: If n N then n2 > n.

(g) If x is odd then x2 is odd.

(2) Write the negation of the following statements

(a) If S is closed and bounded, then S is compact.

(b) If S is compact, then S is closed and bounded.

(c) If a function is continuous then it is differentiable.

(3) Find the contrapositive of

(a) If x2 = 3 y2 > 5 then xy is a rational number.

(b) If x = 0 then y xy = 1.

(4) Find the mistake in the proofof the following results, and provide correct proofs.

(a) If m is an even integer and n is an odd integer, then 2m + 3n is an odd integer.

Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2k + 1 for some

integer k. Therefore, 2m + 3n = 2(2k) + 3(2k + 1) = 10k + 3 = 2(5k + 1) + 1 = 2l + 1;

35

36 3. Problem Set 1

2m + 3n is an odd integer.

(b) For all integers n 1, n2 + 2n + 1 is composite.

Proof. Let n = 4. Then, n2 + 2n + 1 = 42 + 2(4) + 1 = 25 and 25 is composite.

(5) Prove the following claims:

(a) An integer that is not divisible by 2, cannot be divisible by 4. (Try proving this twice, once

with contraposition and once with contradiction).

(b) There is no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

(6) Prove that for n N,

(a)

1 + 3 + 5 + + (2n 1) = n2 .

(b)

n(n + 1)

1+2++n = .

2

(c)

[ ]2

n(n + 1)

13 + 23 + + n3 = .

2

(d) For q = 1 and n > 1,

n1

a [a + (n 1)r]qn rq(1 qn1 )

(a + kr)qk = 1q

+

(1 q)2

k=0

(7) (Sum of a Geometric Sequence): For all integers n 0 and all real numbers r with r = 1,

n

rn+1 1

ri = r1

i=0

What can we say when n for arbitrary values of r? For what values of r is the sum well

defined? What is the sum for such values of r?

(8) (a) For all integers n 2, n3 n is divisible by 6.

(b) For all integers n 3, 2n > 2n + 1.

(9) All prime numbers greater than 6 are either of the form 6n + 1 or 6n + 5, where n is some

natural number.

(10) If |9 5x| 11 then show that x 25 and x 4.

Chapter 4

4.1.1. Basic Definitions.

We define a set as a well-specified collectionin order to emphasize that there must be a clear

rule or group of rules that determine membership in the set. Essentially all mathematical objects

can be gathered into sets: numbers, variables, functions, other sets, etc. Examples of sets can be

found everywhere around us. For example, we can speak of the set of all living human beings,

the set of all cities in Europe, the set of all propositions, the set of all prime numbers, and so on.

Each living human being is an element of the set of all living human beings. Similarly each prime

number is an element of the set of all prime numbers. If A is a set and a is an element of A, then

we write a A. If it so happens that a is not an element of A, then we write a / A. If S is the set

whose elements are s, t, and u, then we write S = {s;t; u}. The left brace and right brace visually

indicate the bounds of the set, while what is written within the bounds indicates the elements

of the set. For example, if S = {1; 2; 3; 5}, then 2 S, but 4 / S. Sets are determined by their

elements. The order in which the elements of a given set are listed does not matter. For example,

{1; 2; 3} and {3; 1; 2} are the same set. It also does not matter whether some elements of a given

set are listed more than once. For instance, {1; 2; 2; 2; 3; 3} is still the set {1; 2; 3}. Many sets are

given a shorthand notation in mathematics as they are used so frequently. A set may be defined by

a property. For instance, the set of all true propositions, the set of all even integers, the set of all

odd integers, and so on. Formally, if P(x) is a property, we write A = {x S : P(x)} to indicate that

the set A consists of all elements x of S having the property P(x). The colon : is commonly read as

such that and is also written as | . So {x S|P(x)} is an alternative notation for {x S : P(x)}.

For a concrete example, consider A = {x R : x2 = 2}. Here the property P(x) is x2 = 2. Thus, A

is the set of all real numbers whose square is one.

37

38 4. Set Theory, Sequence

A B

We write B A or A B.

Definition 4.3. If A is a set, then B is a strict subset of A if every element of B is also an element

of A, and there exists at least one element of A which is not an element of B.

bBbA

and B is a strict subset of A if

b B b A a A s.t. a

/ B.

Technically we should differentiate between subsets and strict subsets, but economists are usually

sloppy about this. In most courses you will see the operator used for both, and you will not be

required to differentiate between the two concepts. Now let X be a universal set, such that we are

interested in subsets of this set.

Definition 4.4. The complement of the set A is the set Ac containing all elements not in A.

We write Ac = {x : x

/ A}.

For the complement of a set to be clearly understood, we need to know what the relevant

universe is. For example, we can define the set J as all real numbers between 2 and 4, inclusive:

J = {x R | 2 x 4}1.

In this context, the set J c is the set of all real numbers strictly less than 2 or strictly greater than 4:

J c = {x R | x < 2 x > 4}.

The universe in this case is the set of real numbers. The complement of J doesnt include all

mathematical objects not in J, nor does it include all numbers not in J (because complex numbers

are excluded). In most cases the universe is clear from the context.

1This can also be written as J = [2, 4], where the square brackets indicate the closed interval between the first entry and the second.

4.1. Set Theory 39

Set A

Set AC

D = {2, 4, 10},

B = {x R s.t. x 10}

S = The set of all real-valued functions on R.

R The real numbers

R+ Real numbers 0

R++ Real numbers > 0

Z The set of integers (10, 0, 2, 451, etc.)

Z+ The set of integers 0 (also called N)

Z++ The set of integers > 0 (sometimes also called N)

Q The rational numbers (numbers that can be expressed as fractions)

C The complex numbers

0/ Empty set or null set

The universal set

R2 The set of pairs of real numbers

The last set R2 is shorthand notation for the Cartesian product R R. This notation is accept-

able for any n Z++ number of sets. You will often encounter proofs and theorems defined on the

set Rn , which is the general way of describing the space of n-vectors, each element of which is a

real number (this is taking us ahead to linear algebra).

40 4. Set Theory, Sequence

Definition 4.5. Union : The union of n sets is the set containing all elements from all n sets. We

write

A B = {x : x A x B}.

n

Ai = A1 A2 An == {x : for some i = 1, , n, x Ai }

i=1

A. B

Definition 4.6. Intersection : The intersection of n sets is the set containing the elements common

to all n sets. We write

A B = {x : x A x B}.

n

Ai = A1 A2 An = {x : for all i = 1, , n, x Ai }

i=1

C C

j=n

j=n

j=n

j=n

A j = ACj ; A j = ACj .

j=1 j=1 j=1 j=1

A. B

Definition 4.7. Exclusion : The exclusion of the set B from the set A is the set of all elements in

A that are, in addition, not elements of B. We write

A \ B = {x A | x

/ B}.

4.1. Set Theory 41

A\B B\A

A B

AB

A. B

BA

A. B

Proposition 1. (A \ B) (B \ A) = 0/

Proof.

A \ B = A BC BC

B \ A = B AC B

/

B BC = 0.

42 4. Set Theory, Sequence

Exercise 4.2. Let B, and A1 , , An be subsets of X. Then,

j=n

j=n

j=n

j=n

B A j = (B A j ); B A j = (B A j ).

j=1 j=1 j=1 j=1

Next we consider the sets whose elements are sets themselves. For example, let A, B, and C be

subsets of X, then the collection A = {A, B,C} is a set, whose elements are A, B and C. We call a

set whose elements are subsets of X a family of subsets of X, or a collection of subsets of X. The

notation we follow would be, the lower case letters refer to the elements of X, upper case letters

refer to subsets of X and script letters refer to families of subsets of X.

Any subset of empty set is empty. Observe that the empty set 0/ is a subset of X. It is possible to

/ In this case {0}

form a non-empty set whose only element is the empty set, i.e., {0}. / is a singleton.

Also 0/ {0}

/ and 0/ {0}.

/

Definition 4.8. Let A be any subset in X. The power class of A or the power set of A is the family

of all subsets of A. We denote the power set of A by P (A).

Specifically,

P (A) = {B : B A}

/ = {0},

The power set of the empty set is P (0) / i.e., the singleton of 0.

/ The power set of a singleton

P ({a}) = {0, / {a}}. Note that the power set of A always contains A and 0. / In general, if A is a

n

finite set with n elements, then P (A) contains 2 elements.

Exercise 4.3. Prove that if A is a finite set with n elements, then P (A) contains 2n elements.

There are a number of set identities that the set operations of union, intersection, and set difference

satisfy. They are very useful in calculations with sets. Below we give a table of such set identities,

where U is a universal set and A, B, and C are subsets of U.

Commutative Laws: A B = B A ; A B = B A

4.2. Set Identities 43

Associative Laws: (A B) C = A (B C) ; (A B) C = A (B C)

Distributive Laws: A (B C) = (A B) (A C) ; A (B C) = (A B) (A C)

Idempotent Laws: A A = A ; A A = A

Absorption Laws: A (A B) = A ; A (A B) = A

Identity Laws: A 0/ = A ; A U = A

Complement Laws: A Ac = U ; A Ac = 0/

Complements of U and 0/ : U c = 0/ ; 0/ c = U

A. B

(b) (A B) \ (C \ A) = A (B \C).

/

(c) A (((B Cc ) (D E c )) ((B Bc ) Ac )) = 0.

44 4. Set Theory, Sequence

We will discuss additional concepts in set theory after we have gone over some elementary

exposition of functions and sequences.

4.3. Functions

Definition 4.9. A correspondence consists of:

(c) A mapping f (x) which assigns at least one element from R to each element x D.

Definition 4.10. A function consists of:

(c) A mapping f (x) which assigns exactly one element from R to each element x D.

f (x) = x3 , D = R, R=R

f (x) = 0, D = R, R = R.

The range need not be exhausted but the domain must be.

The set of all functions is a strict subset of the set of all correspondences. This is the same as

saying that all functions are correspondences, but not the other way around. From here onwards its

critical that you specify the domain and the range when defining or using a function. For example

these two functions:

4.4. Vector Space 45

are not the same function, even though in practice they produce identical results.2

Definition 4.11. The argument of a function is the element from the domain that is mapped into

the range and the value of a function is the element from the range that is the destination of the

mapping.

Definition 4.12. A real-valued function is a function whose range is the set R or any subset of R.

From the above definition 4.12, the definitions of integer-valued functions, complex-valued

functions, etc., should be clear.

{ }

Definition 4.13. Let f : D R and let A D. We let f (A) represent the subset f (x) : x A

{ R. The set f (A)

of } is called the image of A in R. If B R, we let f 1 (B) represent the subset

1

x D : f (x) B of D. The set f (B) is called the pre-image of B in D.

Note that the image of a function may be equivalent to the range, or it may be a strict subset of

the range. In the above example, the image of the function f is a strict subset of its range, but the

image of g is equal to its range.

The vector space is defined over a field which is a set on which two operations + and (called

addition and multiplication respectively) are defined. The formal definition of field is as follows:

Definition 4.14. A field F is a set on which two operations, called addition (+) and multiplication

(), are defined so that for each pair of elements x, y in F there are unique elements x + y and x y

in F, such that the following conditions hold for all a, b c in F.

a + b = b + a, and a b = b a.

(a + b) + c = a + (b + c), and (a b) c = a (b c)

(iii) Existence of identity elements for addition and multiplication: There exists elements 0 and 1

in F such that

0 + a = a, and 1 a = a

2The only difference between the two is that the range of f is all real numbers, and the range of g is the set of non-negative

real numbers. This is inconsequential, since the mapping in both cases takes all elements from the domain and assigns them to a

non-negative real number. But the two functions are still not the same.

46 4. Set Theory, Sequence

(iv) Existence of inverses for addition and multiplication: For each element a in F and for each

non-zero element b in F, there exist elements c and d in F such that

a + c = 0, and b d = 1

a (b + c) = a b + a c.

Example of fields include the set of real numbers R with the usual definitions of addition and mul-

tiplication, the set of rational numbers Q with the usual definitions of addition and multiplication.

Definition 4.15. A vector space V over a field F consists of a set on which two operations, called

addition (+) and scalar multiplication (), are defined so that for each pair of elements x, y in V

there is a unique element x + y in V , and for each element a in the field F and for each element x in

V , there is a unique element ax in V, such that the following conditions hold.

x, y V, x + y = y + x

x, y, z V, (x + y) + z = x + (y + z)

an element O V x + O = x x V

x V some element y V x + y = O

R, x, y V, (x + y) = ( x) + ( y)

, R, x V, ( + ) x = x + x

, R, x V, () x = ( x)

4.4. Vector Space 47

1 x = x x V.

In order to show that any space is a vector space, we simply need to show that the properties in

the above definition are satisfied.

Definition 4.16. The Cartesian Product of sets A and B is the set of pairs (a, b) satisfying a

A b B. We write

A B = {(a, b) | a A b B}.

The Cartesian product is the two set case of the general cross product of sets, which is the

same concept defined for any number of sets. For example using sets A, B, C and D we could define

E = A B C D, and a typical element of E would be (a, b, c, d) for some a A, b B, c C

and d D.

Example 4.2.

{ }

R3 = R R R = (x, y, z) | x R y R z R

R2+ = R+ R+ ; R2++ = R++ R++ .

The order of the sets in the cross-product does matter as the following example shows.

Example 4.3. Let

A = {1, 2, 3} , B = {2, 4}

{ }

A B = (1, 2) , (1, 4) , (2, 2) , (2, 4) , (3, 2) , (3, 4)

{ }

B A = (2, 1) , (2, 2) , (2, 3) , (4, 1) , (4, 2) , (4, 3) .

(a) The nonzero vectors u and v are parallel if there exists a R such that u = av.

(b) The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if

u v = 0.

( )

uv

(c) The angle between vectors u and v is arccos uv .

4.4.1. Metric.

Definition 4.17. A distance function is a real-valued function d : V V R which satisfies

(i) Non-negativity,

x, y V, d(x, y) > 0 with equality if and only if x = y

48 4. Set Theory, Sequence

(ii) Symmetry

x, y V, d(x, y) = d(y, x)

x, y, z V, d(x, z) 6 d(x, y) + d(y, z).

Any function satisfying these three properties is a distance function. A distance function is also

called a metric. The space V with elements x, y, which would be called points, is a metric space if

we can associate a distance function to it.

Example 4.4.

d (x, y) = (x1 y1 )2 + + (xn yn )2

where V = Rn .

0 if x = y

d (x, y) =

y

1 if x =

where V is any vector space.

(c) In V = R2

{ }

d (x, y) = max |x1 y1 | , |x2 y2 |

d(x, y)

d1 (x, y) =

1 + d(x, y)

is also a metric. This allows us to construct any number of metric d(x, y) from any given metric.

4.4.2. Norm.

Definition 4.18. A norm is a real-valued function written : V R, defined on vector space V ,

which satisfies

(i) Non-negativity:

x V, x > 0; with equality if only if x = 0,

4.4. Vector Space 49

(ii) Homogeneity:

x V, R, x = | | x ,

x, y V, x + y 6 x + y .

Example 4.5.

x Rn , x = x12 + + xn2

n

x Rn , x = |xi |

i=1

space V , which satisfies

(i) Symmetry:

x, y V, x, y = y, x ,

(iii) Bilinearity:

x, y, z V, , R, x + y, z = x, z + y, z .

x, y V, x y = x1 y1 + + xn yn .

A normed metric space (V, ) is a metric space V and a norm . An inner product space

(V, , ) is a space V and an inner product , .

50 4. Set Theory, Sequence

4.4.4. Cauchy-Schwartz Inequality. The Cauchy-Schwarz inequality states that for all vectors x

and y of an inner product space,

|x, y|2 6 x, x y, y,

where , is the inner product. Equivalently, by taking the square root of both sides, and referring

to the norms of the vectors, the inequality is written as

|x, y| 6 x y.

Moreover, the two sides are equal if and only if x and y are linearly dependent (or, in a geometrical

sense, they are parallel or one of the vectors is equal to zero).

more explicit way as follows:

|x1 y1 + + xn yn |2 6 (x12 + + xn2 ) (y21 + + y2n ).

4.5. Sequences

{xn } : N Rm

that gives us an ordered infinite list of points in Rm .

Another notation for sequence is xn where xn (x1 , x2 , ). As we saw above, sets are

unordered collections of elements. Even if there is an intuitive ordering to the elements of a set,

with respect to the definition of the set itself there is no first element or last element. Sequences,

however, are sets for which the elements are assigned a particular order.

Example 4.7.

{ }

1

S1 = , n N is a sequence in R

n

( )

1

S2 = n , n N is a sequence in R2

n

The interpretation of S1 is that the nth element of the sequence is givenby 1n . So we could also

( ) ( )

1 1

have written S1 = {1, 2 , 3 , 4 , }. Similarly S2 =

1 1 1 1 , 2 , . Note the implication

1 2

of this definition is that the elements of the sequence are numbered from 1 onwards, not from 0.

4.5. Sequences 51

Its usually assumed in the first year courses that the first element of a sequence is numbered 1

not 0, but this need not always be the case. Note that order of appearance of elements matters

{1, 2, 3, 4, } = {2, 1, 3, 4, }

and elements can be repeated,

S = {1, 1, 1, } is a sequence.

Definition 4.22. We say that x is a limit point of {xn } , n N, if

> 0 there exist infinite number of terms xn d (x, xn ) < .

Example 4.8. (a) Let xn = (1)n . This sequence has two limit points: a = 1 and a = 1.

2 ). This sequence has three limit points: a = 1, 0, 1.

{ }

(c) The sequence 1, 1, 12 , 1, 13 , 1, has two limit points 0 and 1.

n

(d) Let xn = n(1) . This sequence has a limit point a = 0.

Definition 4.23. The sequence S converges to x (has a limit) x if

> 0, N N such that d (xn , x) < n > N

x = lim xn .

n

Definition 4.23 is a source of a lot of difficulty. However its one of the most important def-

initions in macroeconomic theory and in parts of micro, and its worth forcing yourself to fully

absorb it before the end of the Review. The intuition behind limits is not nearly as difficult as the

formal definition. A sequence converges to x if after choosing any very, very tiny number (), you

can identify a point in the sequence (N) after which all of the remaining members of the sequence

are no farther than from some particular value x. This concept is only well-defined for infinite

sequences. In most economic theory, the elements of a convergent sequence never actually reach

their limiting value. They simply get closer and closer to it as the sequence progresses.

1

Example 4.9. The sequence xn = n is a convergent sequence.

(Use claim 1

n 0).

52 4. Set Theory, Sequence

n > N, d (xn , 0) = |xn | < |xn | <

1 1

xn < < n > .

n

So by choosing N to be any natural number greater than 1 , we have

1 1

n > N, d (xn , 0) = |xn | = < < .

n N

Definition 4.24. A sequence {xn } is bounded if

B R such that d (xn , 0) 6 B, n N.

Definition 4.25. A sequence {xn } is unbounded if

B R n N such that d (xn , 0) > B.

Example 4.10. The sequence {1, 0, 1, 0, } is bounded. The sequence {xn } , xn = n, n N is

unbounded.

Definition 4.26. The tail of a sequence {xn } is the continuation of {xn } after some m N, that is

{xm+1 , xm+2 , }.

Theorem 4.1. A sequence {xn } is bounded if and only if the tail of {xn } is bounded.

(TRIVIAL).

B such that |xn | < B, n > m.

Let

B = max {x1 , x2 , , xm1 , B} .

Then B is a bound for {xn },

n N, |xn | < B .

{ }

Definition 4.27. If {xn }

n=1 is a sequence, a subsequence xn(k) is obtained from {xn } by

k=1

crossing out some (possibly infinitely many) elements, while preserving the order.

4.5. Sequences 53

{ }

Example 4.11. Sequence: {xn } = 1, 1, 12 , 1, 13 , 1, .

{ } { }

Subsequence: xn(k) = {1, 1, 1, } or 1, 12 , 13 , .

n N, xn+1 > xn

and is monotone decreasing if

n N, xn+1 6 xn .

quences.

Claim 4.1. Let {xn } be monotonic. Then it is convergent if and only if it is bounded.

Theorem 4.2. Bolzano-Weierstrass Theorem Every bounded sequence {xn } has a convergent sub-

sequence.

Proposition 2. Nested Interval Property Suppose that I1 = [a1 , b1 ], I2 = [a2 , b2 ], , where I1

I2 , and limn (bn an ) = 0. Then there exists exactly one real number common to all

intervals In .

Proof. Note that we have a1 < a2 < a3 < an < < bn < < b2 < b1 . Then each bi is an

upper bound for the set A = {a1 ; a2 ; }. In other words sequence {an } is monotone increasing

and bounded sequence. Therefore, limn an = a exists and a = sup{an } 6 {bk } for each natural

number k. Hence ak 6 a 6 bk for every k N or a is contained in each Ik . Now let b be contained

in In for all n N. Then an 6 b 6 bn for every n N or 0 6 (b an ) 6 (bn an ) for each n. Then

limn (b an ) = 0. It follows that b = limn an = a, and so a is the only real number common

to all intervals.

n=1 be bounded. There is B R such that xn 6 B for all n N. We prove the

theorem in following steps.

(i) In is a closed interval [an , bn ] where bn an = 2B

2n ; and

(ii) {i : xi In } is infinite.

54 4. Set Theory, Sequence

We let I0 = [B, B]. This closed interval has length 2B and xi I0 for all i N. Suppose we

have In = [an , bn ] satisfying (i) and (ii). Let cn be the midpoint an +b n

2 . Each of the intervals

[an , cn ] and [cn , bn ] is half the length of In . Thus they both have length 12 2B

2n = 2n+1 . If xi In ,

2B

then xi [an , cn ] or xi [cn , bn ], possibly both. Thus at least one of the sets {i : xi [an , cn ]}

or {i : xi [cn , bn ]} is infinite. If the first set is infinite, we let an+1 = an and bn+1 = cn . If

the second is infinite, we let an+1 = cn and bn+1 = bn . Let In+1 = [an+1 , bn+1 ]. Then (i) and

(ii) are satisfied. By the Nested Interval Property, there exists a n=1 In .

Step 2 We next find a subsequence converging to a. Choose i1 N such that xi1 I1 . Suppose we

have in . We know that {i : xi In+1 } is infinite. Thus we can choose in+1 > in such that

xin+1 In+1 . This allows us to construct a sequence of natural numbers i1 < i2 < i3 <

where xin In for all n N.

n=1 converges to a. Let > 0.

Choose N such that > 2N . Suppose n > N. Then xin In and a In . Thus |xin a| 6 2B

2B

2n 6

2B

2N

for all n > N.

Remark 4.1. Every bounded sequence {xn } has at least one limit point x.

Definition 4.29. A sequence {xn } is Cauchy sequence if

> 0, N, such that n, m > N, d ( xn , xm ) < .

After N, each element is close to every other element or in other words, the elements lie within

a distance of from each other.

(i) Every convergent sequence {xn } (with limit x, say) is a Cauchy sequence, since, given any

real number > 0, beyond some fixed point, every term of the sequence is within distance 2

of x, so any two terms of the sequence are within distance of each other.

(ii) Every Cauchy sequence of real numbers is bounded (since for some N, all terms of the se-

quence from the N-th position onwards are within distance 1 of each other, and if M is the

largest absolute value of the terms up to and including the N-th, then no term of the sequence

has absolute value greater than M + 1).

(iii) In any metric space, a Cauchy sequence which has a convergent subsequence with limit x is

itself convergent (with the same limit), since, given any real number > 0, beyond some fixed

4.5. Sequences 55

point in the original sequence, every term of the subsequence is within distance 2 of x, and

any two terms of the original sequence are within distance 2 of each other, so every term of

the original sequence is within distance of x.

Theorem 4.3. Every sequence has at most one limit.

Proof. By contradiction. We use the intuition that all points end up being close to say r1 and r2

at the same time which is not possible. Let sequence {xn } converge to two limits r1 and r2 . It is

enough to show that there is one for which this does not hold. Let us choose = d(r14,r2 ) = |r1 r

4

2|

N1 , n > N1 , |xn r1 | <

and since r2 is a limit,

N2 , n > N2 , |xn r2 | < .

Let N = max {N1 , N2 }. Then

n > N, |xn r1 | + |xn r2 | < 2.

By triangle inequality,

4 = |r1 r2 | = (xn r2 ) (xn r1 ) 6 |xn r1 | + |xn r2 | < 2

which is a contradiction.

Remark 4.2. A sequence can have more than one limit point.

(a) Every convergent sequence is bounded BUT a bounded sequence may not be convergent. For

example {1, 1, 1, 1, }.

xn + yn x + y

xn yn x y,

and if yn = 0, n y = 0,

xn x

.

yn y

>

>

6 6

If {xn } x and xn b, n N, then x b.

>

>

<

6

56 4. Set Theory, Sequence

{ }

(d) x is a limit point of {xn } if and only if a subsequence xn(k) of the sequence {xn } such

{ } k=1

that xn(k) x.

x1

x1

n

x2n x2

(e) Sequence of vectors {xn } = R converges to a limit {x} =

N if and only if

xN

xN

{ } n

xin {xi } , i = 1, 2, , N.

Definition 4.30. A vector space in which every Cauchy sequence has a limit is called a complete

vector space.

4.6. Sets in Rn

Now we are ready for additional useful concepts in set theory. We begin with some definitions.

Definition 4.31. A set A on the real line is bounded if B R x A, x 6 B.

Theorem 4.4. For every non-empty bounded set A R, a real number sup A such that

x A, x 6 sup A

y > sup A

or sup A is the least upper bound for A.

Example 4.12. For the sets

A = [0, 1] , B = (0, 1) ,C = [0, 1), D = (0, 1]

sup = 1, inf = 0

This example shows that sup and inf of a set need not belong to the set. If sup belongs to the

set A, it is called max {A} and if inf {A} belongs to the set A, it is called min {A}.

4.6. Sets in Rn 57

r x

Br (x)

Definition 4.32. Point x is a limit point of a set A if every neighborhood of x contains a point of A

different from x : x is a limit point of A if

> 0, y A, y = x d (x, y) < .

Theorem 4.5. Bolzano-Weierstrass Theorem for sets Every bounded infinite set has at least one

limit point.

Example 4.13. For the set A = (0, 1), x = 0 is a limit point of the set A.

This shows that limit point of a set need not belong to the set.

Theorem 4.6. Point x is a limit point of set A Rn if a sequence

{xn } n N, xn = x xn A xn x.

Definition 4.33. An open ball in Rn centered at x with radius r > 0 is

{ }

Br (x) = y Rn | d (x, y) < r .

Note that the open ball does not include its boundary points.

{ }

y R | y1 + y2 < 1 .

2 2 2

x A, r > 0 Br (x) A.

Around any point in an open set, one can draw an open ball which is completely contained in

the set.

Example 4.15. Following sets are open

58 4. Set Theory, Sequence

/

B = (, 0) ; R; 0.

Definition 4.35. The set A is closed if A contains all its limit points. (contains its borders)

Theorem 4.7. Set A Rn is closed if and only if AC is open.

Example 4.16. Following sets are closed.

/

A = [2, 5] since AC = (, 2) (5, ) is open.; R; 0.

There are two sets which are both open and closed. The empty set and the universal set. Empty set

0/ is open since

int 0/ = 0/

and 0/ is closed since

bd 0/ = 0/ 0.

/

The universal set is complement of the empty set and so is both open and closed. There can be sets

which are neither open nor closed: A = (0, 1]. Following theorem characterizes the closed set using

convergent sequences.

Theorem 4.8. A set A Rn is closed if and only if every convergent sequence of points {xn } A

has its limit x A.

Example 4.17. The budget set

{ }

B (p, I) = y Rn+ | p y 6 I ,

where p Rn++ and I R++ , is closed.

{xn } xn B (p, I) n xn x.

xn > 0, n x > 0,

p xn 6 I, n p x 6 I

x B (p, I) B (p, I) is closed.

Theorem 4.9.

4.6. Sets in Rn 59

Good 2 p1

|Slope| = p2

M

P2

0 M

Good 1

P1

Figure 4.5. Budget set B(p, I)

60 4. Set Theory, Sequence

Remark 4.3. The finite number of sets in (b) and (d) are necessary as following example shows.

( )

1 1

For (b) , An = , , n N, An = [0] which is closed.

n n n=1

[ ]

1

For (d) , Bn = , 2 , n N, Bn = (0, 2] which is not closed.

n n=1

Example 4.18.

A = [1, 2] is compact.

R is closed but not bounded. NOT compact.

B = (1, 2] is bounded but not closed. NOT compact.

Definition 4.37. A set A Rn is compact if every sequence of points {xn } A has a limit point

x A.

Definition 4.38. A set A Rn is convex if x, y A, (0, 1),

x + (1 ) y A.

It will be useful to draw some sets to differentiate between convex and non-convex sets.

4.6. Sets in Rn 61

Chapter 5

Problem Set 2

(a) The Manhattan distance, for x, y Rn

n

(5.1) d(x, y) = | xi yi | x, y Rn

i=1

(b) For x, y R2 ,

(5.2) d(x, y) = max{| x1 y1 |, | x2 y2 |}

(c) Let d(, ) be a metric, then

d(x, y)

(5.3) d1 (x, y) = .

1 + d(x, y)

[ ]

1 2

(5.4) ,

n=1

n n

is compact.

(a) (A B)c Ac Bc

(b) (A B)c Ac Bc .

{ }

(5.5) C = (x1 , x2 ) R2 : x12 + x22 = 1

63

64 5. Problem Set 2

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is C a vector space? Justify your answer.

(5) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V

and c R define

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is V a vector space over R with these operations? Justify your answer.

(6) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V

and c R define

(a1 , a2 ) + (b1 , b2 ) = (a1 + 2b1 , a2 + 3b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is V a vector space over R with these operations? Justify your answer.

(7) Prove

(J K)c = J c K c

(J K)c = J c K c

n=1 and {yn }n=1 such that {xn }n=1 x and {yn }n=1 y. Show that

{xn + yn }

n=1 x + y.

n=1 is not convergent.

{ }

(11) Prove that the sequence {xn } = 2 1n : n N is not convergent to 1.

(13) Determine whether the following sets are open, closed, neither or both:

(i) S = (0, 1);

(ii) S = [0, 1];

(iii) S = R;

(iv) S = [0, 1).

Chapter 6

Linear Algebra

Linear algebra is the branch of mathematics dealing with (among many other things) matrices and

vectors. Its intuitively easy to see why linear algebra is important for econometrics and statistics.

Economic data is arranged in matrix format (rows corresponding to observations, columns corre-

sponding to variables), so the body of theory governing matrices should help us analyze data. It

is harder to see the connection between matrix theory and the optimization that we do in micro

theory, but there are some important links. Well cover the basics and some of the necessary detail

here, but more detailed coverage will be offered in the core courses.

6.1. Vectors

You may be familiar with vectors from physics courses, in which a vector is a pair giving the mag-

nitude and direction of a moving body. The vectors we use in economics are more general, in that

they can have any finite number of elements (rather than just 2), and the meaning of each element

can vary with the context (rather than always signifying magnitude and direction). Formally speak-

ing a vector can be defined as a member of a vector space, but we dont need to deal with such a

definition here. For our purposes:

Definition 6.1. A vector is an ordered array of elements with either one row or one column.

The elements are usually numbers. A vector is an n k matrix for which either n = 1, k = 1 or

both (see the definition of a matrix below). A general vector, for which the number of elements is

not specified but left as n, will sometimes be called an n-vector. We also refer to these as vectors

in Rn . A vector can be written in either row or column form:

65

66 6. Linear Algebra

x1

( )

x2

Row Vector: x Rn = x1 x2 . . . xn ; Column Vector: x Rn =

.. .

.

xn

Although you will sometimes be able to switch between thinking of a vector as a row or a

column without restriction, there are certain operations that require a vector to be oriented in a

certain way, so it is good to distinguish between row and column vectors whenever possible. Most

people use x to refer to the vector in column form and x to refer to it in row form, but this is not

universal. Also, we usually use lowercase letters for vectors and uppercase letters for matrices.

0

.

Null vector 0n1 = ..

0

1

.

Sum vector un1 = ..

1

The ith unit vector, called ei , has all elements 0 except for the i th, which is equal to 1. The

definition of a unit vector is specific to the vector space in which it sits. For example:

0

(6.1) e2 R3 = 1

0

and

0

1

(6.2) e2 R4 =

0

0

Definition 6.2.

6.2. Matrices 67

(a) Equality :

Vectors x Rn , y Rm are equal if n = m and xi = yi i.

(b) Inequalities : x, y Rn :

x y if xi yi i = 1, , n;

x > y if xi yi i = 1, , n and xi > yi for at least one i;

x y if xi > yi i = 1, , n.

(c) Addition :

x, y Rn , x + y = z Rn where zi = xi + yi , i.

x1

x2

(6.3) x =

..

.

xn

(e) Vector Multiplication : This is essentially an inner product rule applied to Rn . See the rules

for matrix multiplication below, as they also apply for vectors.

6.2. Matrices

Definition 6.3. A matrix is a rectangular array of elements (usually numbers, for our purposes).

matrix A, we can write:

a11 a12 . . . a1k

[ ]

a21 a22 . . . a2k

[A] = ai j = .. .. .. ..

nk nk . . . .

an1 an2 . . . ank

The matrix Ank is a null matrix if ai j = 0 for i = 1, , n, j = 1, , k.

68 6. Linear Algebra

Its worth it to check your understanding of each of the above definitions by writing out a

matrix that satisfies each. Then note this next definition carefully:

obvious statement, but you could try proving it formally. It should only take a few lines.

6.2.1.1. Addition. Matrix addition is only defined for matrices of the same size. If A is n k and

B is n k then

(6.4) A + B = Cnk

where

(6.5) ci j = ai j + bi j i = 1, , n, j = 1, , k.

We say that matrix addition occurs element wise because we move through each element of the

matrix A, adding the corresponding element from B.

6.2.1.2. Scalar Multiplication. Scalar multiplication is also an element wise operation. That is,

a11 a12 a1k

a21 a22 a2k

(6.6) R, [A] = . ..

. .. ..

nk . . . .

an1 an2 ank

6.2.1.3. Matrix Multiplication. Matrix multiplication is defined for matrices [A] and [B] if j = n

m j nk

or m = k. That is, the number of columns in one of the matrices must be equal to the number of

rows in the other. If matrices A and B satisfy this condition, so that A is m j and B is j k, their

6.2. Matrices 69

product [C] [A] [B] is given by ci j = Ai B j , where Ai is the ith row of A and B j is the jth column

mk m j jk

of B. For example, suppose

[ ] [ ]

1 2 6 5 4

[A] = and [B] =

22 3 4 23 3 2 1

Multiplication between A and B is only defined if A is on the left and B is on the right. It must

always be the case that the number of columns in the left hand matrix is the same as the number of

rows in the right hand matrix. In this case, if we say AB = C, then element

[ ]

[ ] 6

c11 = 1 2 = 1 6 + 2 3 = 12

3

Likewise

c12 = 1 5 + 2 2 = 9

c13 = 1 4 + 2 1 = 6

c21 = 3 6 + 4 3 = 30

c22 = 3 5 + 4 2 = 23

c23 = 3 4 + 4 1 = 16

which gives [ ]

12 9 6

[A] [B] = [C] =

22 23 23 30 23 16

Note that matrix multiplication is not a symmetric operation. In general, AB = BA, and in fact it

is often the case that the operation will only be defined in one direction. In our example BA is not

defined because the number of columns of B = (3) is not equal to the number of rows of A = (2).

For both AB and BA to be defined

[A] [B] = [C] ,

nk kn nn

and

[B] [A] = [D].

kn nk kk

(i) Even if n = k,

AB = BA.

[ ] [ ] [ ] [ ]

1 2 0 1 12 13 3 4

A= , B= , AB = , BA = .

3 4 6 7 24 25 27 40

70 6. Linear Algebra

[ ] [ ] [ ]

2 4 2 4 0 0

A= , B= , AB = .

1 2 1 2 0 0

[ ] [ ] [ ] [ ]

2 3 1 1 2 1 5 8

C= , D= , E= , CD = CE = .

6 9 1 2 3 2 15 24

A+B = B+A

A + (B +C) = (A + B) +C

(AB)C = A(BC)

(A + B)C = AC + BC

A(B +C) = AB + AC

Check that you have a clear understanding of the restrictions needed on the number of rows and

columns of A, B and C in order for the above to work. More matrix rules, involving the transpose:

(6.7) (A ) = A

(6.8) (A + B) = A + B

(6.9) (AB) = B A

Note the reversal of the order of the matrices in the last operation.

Definition 6.4. A set of vectors x1 , , xn in Rm is linearly dependent if there exist 1 , , n , not

all zero, such that

(6.10) 1 x1 + + n xn = 0.

Definition 6.5. A set of vectors x1 , , xn in Rm is linearly independent if it is not linearly depen-

dent.

Definition 6.6. The rank of a matrix A is the maximum number of linearly independent column

vectors of A. It is also equal to the number of linearly independent row vectors of A.

Example 6.1. Let

1 2 3

A= 0 1 0

2 4 6

6.2. Matrices 71

The first and the third columns are linearly dependent. The elements of column 3 are three

times the corresponding entry in the column 1. Now take Columns 1 and 2.

1 2 0

1 0 + 2 1 = 0

2 4 0

1 + 22 = 0

2 = 0

21 + 42 = 0

1 = 0, 2 = 0

is the only solution. So the first two columns are linearly independent. We found two linearly

independent columns so the rank of matrix A is 2. We could have done the exercise taking rows

instead of columns and still got the same answer. (Please verify).

Theorem 6.1. (i) Rank of [A] 6 {# rows, # columns} = min {n, k};

nk

{ } { }

(ii) Rank of AB 6 min Rank (A) , Rank (B) 6 Rank (A) , Rank (B) .

Definition 6.7. A square matrix [A] is called non-singular or of full rank if rank (A) = n.

nn

Definition 6.8. A square matrix [A] is invertible if there exist [B] such that [A] [B] = [B] [A] =

nn nn nn nn nn nn

[I] . Then B is called inverse of A.

nn

( )1

(6.11) A1 = A

(6.12) (AB)1 = B1 A1

( )1 ( )

(6.13) A = A1

nn

nn nn

72 6. Linear Algebra

Determinant is defined only for square matrices. The determinant is a function depending on n that

associates a scalar, det (A), to an n n square matrix A. The determinant of an 1-by-1 matrix A is

the only entry of that matrix: det (A) = A11 . The determinant of a 2 by 2 matrix

[ ]

a b

A=

c d

Definition 6.10. The cofactor Ai j of the element ai j is defined as (1)i+ j times the determinant of

the sub matrix obtained from A after deleting row i and column j.

Example 6.2. Let

[ ]

1 2

A =

3 4

A11 = (1)1+1 4 = 4, A12 = (1)1+2 3 = 3

A21 = (1)2+1 2 = 2, A22 = (1)2+2 1 = 1.

Definition 6.11. The determinant of an n n matrix A is given by

n n

(6.14) det (A) = a1 j A1 j = ai1 Ai1 .

j=1 i=1

a b c

A = d e f .

g h i

Then

[ ] [ ] [ ]

e f d f d e

det (A) = a (1)1+1 det + b (1)1+2 det + c (1)1+3 det

h i g i g h

= a (ei f h) b (di f g) + c (dh eg) .

(a)

( )

(6.15) det (A) = det A

6.3. Determinant of a matrix 73

(b) Interchanging any two rows will alter the sign but not the numerical value of the determinant.

(c) Multiplication of any one row by a scalar k will change the determinant k fold.

(e) The addition of a multiple of one row to another row will leave the determinant unchanged.

det (AB) = det (A) det (B) .

(g) Properties (b) (e) are valid if we replace row by columns everywhere.

[ ] [ ]

1 2 1 3 ( )

A = , det (A) = 2; A = det A = 2

3 4 2 4

[ ]

3 4

B = , det (B) = 2.

1 2

Result 6.1. Let A be an n n upper triangular matrix, i.e., ai j = 0 whenever i > j. The determinant

of the matrix A is given by:

det A = i=1 aii

n

a11 a12 a1,n1 a1n

0 a22 a2,n1 a2n

.

A= ..

.. . .

. .

..

.

..

.

0 0 a n1,n1 a n1,n

0 0 0 ann

(1) Base case: Let n = 1. If A is a 1 1 matrix, then det A = a11 = 1i=1 aii by the definition of a

determinant.

(2) Inductive case: Let n > 1. Assume that for any (n 1) (n 1) matrix A with ai j = 0 for all

i=1 aii . Now consider any n n matrix A with ai j = 0 for all i > j.

i > j, we have det A = n1

74 6. Linear Algebra

a11 a12 a1,n1

n+n

0 a22 a2,n1

= ann (1) .

.. . . ..

.. . . .

0 0 an1,n1

n1

= ann aii

i=1

n

= aii

i=1

Result 6.2. The upper triangular square matrix A is non-singular if and only if aii = 0 for each

i {1, , n}.

Claim 6.1. If the upper triangular matrix A is non-singular, then aii = 0 for all i = 1, . . . , n.

Proof. Let A be non-singular. Then A has an inverse, A1 . Since 1 = det I = det [A1 A] =

(det A1 )(det A), we know that det A = 0. If aii = 0 for any i 1, . . . , n, then by the Result

(6.1) we would have det A = 0, a contradiction. So it must be that aii = 0 for all i = 1, . . . , n.

Claim 6.2. If A is upper triangular and aii = 0 for all i = 1, . . . , n, then A is non-singular.

6.4. An application of matrix algebra 75

Proof. Let aii = 0 for all i = 1, . . . , n. Seeking contradiction, suppose A is singular. Without loss

of generality, we can write A1 = ni=2 i Ai . Let

B = A1 ni=2 i Ai A2 An

= 0 A2 An

We know, by the properties of determinants, that det B = det A. But, expanding B by the first

column, we have det B = 0. This gives det A = 0, a contradiction. So we have that A is non-

singular.

We provide an application of matrix algebra is Markov process or Markov chain. Markov processes

are used to measure movements over time. It involves use of a Markov transition matrix. Each value

in the transition matrix is probability of moving from one state to another state. It also specifies a

vector containing the initial distribution across each of these states. By repeatedly multiplying the

initial distribution vector by the transition matrix, we can estimate changes across states over time.

Consider the problem of movement of employees within a firm at different branches. In the

simple case, we take two locations, namely Ithaca and Cortland to demonstrate the basic elements

of a Markov process.

To determine the number of employees in Ithaca tomorrow, we take the probability that the

employees will stay in Ithaca branch multiplied by the total number of employees currently in

Ithaca. We add to this the number of Cortland employees transferring to Ithaca, which is equal

to total number of employees in Cortland multiplied by the probability of Cortland employees

transferring to Ithaca.

We follow the same process to determine the number of employees in Cortland tomorrow, made

up of the employees who choose to remain at Cortland and the Ithaca employees who transfer into

Cortland.

There are four probabilities involved which can be arranged in a Markov transition matrix.

76 6. Linear Algebra

Let At and Bt denote the populations of Ithaca and Cortland locations at some time t. The

transition probabilities are defined as follows.

pAA probability that a current A remains an A,

pBB probability that a current B remains a B,

pBA probability that a current B moves to A.

The distribution of employees at time t is denoted by the vector xt = [At Bt ] and the transition

probabilities in matrix form as

[ ]

pAA pAB

(6.16) M= .

pBA pBB

Then the distribution of employees across the two locations next period (t + 1) is xt M = xt+1

,

which is

[ ]

pAA pAB

[At Bt ] = [(At pAA + Bt pBA ) (At pAB + Bt pBB )] = [At+1 Bt+1 ].

pBA pBB

In the similar manner we can determine the distribution of employees after two periods.

xt+1 M = xt+2

[ ]

pAA pAB

[At+1 Bt+1 ] = [At+2 Bt+2 ]

pBA pBB

[ ][ ]

pAA pAB pAA pAB

[At Bt ] = [At+2 Bt+2 ]

pBA pBB pBA pBB

[ ]2

pAA pAB

[At Bt ] = [At+2 Bt+2 ]

pBA pBB

[ ]n

pAA pAB

(6.17) [At Bt ] = [At+n Bt+n ]

pBA pBB

Example 6.4.

x0 = [A0 B0 ] = [200 200]

6.4. An application of matrix algebra 77

Let [ ] [ ]

pAA pAB 0.8 0.2

M= = .

pBA pBB 0.4 0.6

Then the distribution of employees in the next period t = 1 is

[ ]

0.8 0.2

[200 200] = [240 160] = [A1 B1 ].

0.4 0.6

[ ]2 [ ]

0.8 0.2 0.72 0.28

[200 200] = [200 200] = [256 144] = [A2 B2 ]

0.4 0.6 0.56 0.44

[ ]6 [ ]

0.8 0.2 0.668 0.332

[200 200] = [200 200] = [266.4 133.6] = [A6 B6 ]

0.4 0.6 0.664 0.336

Observe that when the transition matrix is raised to higher powers, the new transition matrix con-

verges to a matrix whose rows are identical. This is referred to as the steady state. In this example,

the steady state would be

[ ]

2 1

M= 3 3 .

2 1

3 3

Try computing this value.

6.4.1. Absorbing Markov Chains. We can extend the previous model by adding a third choice:

employees can exit the firm with

pAE probability that a current A choose to exit, E,

Let us assume that

pEA = 0, pEB = 0, pEE = 1

where pEA , pEB , and pEE are the probabilities that an employee who is currently in state E will go

to A, B or E respectively. The values assigned to pEA , pEB , and pEE mean that nobody who leaves

the firm ever returns. It is also implied by these restrictions that the firm never replaces employees

that leave. Starting at time t = 0, the Markov chain becomes,

n

pAA pAB pAE

[A0 B0 E0 ] pBA pBB pBE = [An Bn En ]

pEA pEB pEE

78 6. Linear Algebra

or n

pAA pAB pAE

[A0 B0 E0 ] pBA pBB pBE = [An Bn En ]

0 0 1

This type of Markov process is referred to as absorbing Markov chain. The values of transition

probabilities assigned in the third row are such that once an employee goes to state E, he or she

remains in that state for ever. As n goes to infinity, An , and Bn will approach zero and En will

approach the total number of employees at time zero (i.e., A0 + B0 + E0 ).

(6.18) Ax = b

where matrix A is of dimension n k, x is a column vector k 1 and b is column vector n 1. This

is a system of n equations with k unknowns.

Example 6.5. The system of two linear equations,

5x + 3y = 1

6x + y = 2

can be written as [ ][ ] [ ]

5 3 x 1

=

6 1 y 2

homogeneous system.

Definition 6.12. Column vector x is called a solution to the system if Ax = b.

6.5. System of Linear Equations 79

Claim 6.3. A homogeneous system Ax = 0 always has a solution (Trivial x = 0). But there might

be other solutions (solution may not be unique).

Claim 6.4. For a non-homogeneous system Ax = b, a solution may not exist.

Example 6.6. Following system of two linear equations

2x + 4y = 5

x + 2y = 2

does not have a solution. Multiply second equation by 2. Then LHS of both equations become

same which leads to 5 = 4 which is a contradiction.

Example 6.7. Following system of two linear equations

2x + 4y = 2

x + 2y = 1

has many solution.

[ ]

Given [A] and {b}, the n (k + 1) matrix [Ab ] = A1 A2 Ak b is called the augmented

nk k1 n(k+1)

matrix. Note Ai is the ith column of A.

[ ] [ ] [ ]

5 3 1 5 3 1

Example 6.8. Let A = , b= Ab = .

6 1 2 6 1 2

[A] {x} = {b}

nk k1 n1

has a solution if and only if

(6.19) rank (A) = rank (Ab ) .

The solution is unique if and only if

(6.20) rank (A) = rank (Ab ) = k = # of columns of A = # of unknowns

and if det (A) = 0 then the solution is characterized by

(6.21) {x } = [A1 ] {b}

n1 nn n1

2x + y = 0

2x + 2y = 0

80 6. Linear Algebra

gives us [ ] [ ] [ ]

2 1 0 2 1 0

A= , b = , Ab = .

2 2 0 2 2 0

It is easy to verify that

rank (A) = 2 = rank (Ab ) .

Hence solution exists and is unique.

Example 6.10. The system of linear equations

2x + y = 0

4x + 2y = 0

leads to [ ] [ ] [ ]

2 1 0 2 1 0

A= , b = , Ab = .

4 2 0 4 2 0

It is again easy to verify that

rank (A) = 1 = rank (Ab ) .

However

rank (A) = rank (Ab ) < k = 2.

Hence solution exists but is not unique1.

Now, we revert to the problem of computing the inverse of a non-singular matrix. We first note

the following result.

( )

Theorem 6.4. Matrix [A] is invertible det (A) = 0. Also if [A] is invertible then det A1 =

nn nn

1

det(A) .

A A1 = I

so 1 = det I = det(AA1 ) = det(A) det(A1 ) using properties of determinants, noted above. Con-

sequently det(A) = 0, and det(A1 ) = [det(A)]1 .

Suppose, next, that A is not invertible. Then, A is singular and so one of its columns (say, A1 )

can be expressed as a linear combination of its other columns A2 , , An . That is,

n

A1 = i Ai

i=2

1A row or column vector of zeros is always linearly dependent on the other vectors.

6.5. System of Linear Equations 81

[ ]

n

Consider the matrix, B, whose first column is A i A and whose other columns are the same

1 i

i=2

as those of A. Then, the first column of B is zero, and so |B| = 0. By the property of determinants,

|B| = |A|, and so |A| = 0.

For a square matrix, [A] , we define the co-factor matrix of A to be the n n matrix given by

nn

A11 A12 ... A1n

. ..

C=

..

..

.

..

.

.

An1 An2 ... Ann

The transpose of C is called the adjoint of A, and denoted by adj A.

n n n

a1 j A1 j

j=1 a1 j A2 j a1 j An j

j=1 j=1 |A| 0 0

AC = .. ..

=

..

n . . .

n n 0 0 |A|

an j A1 j an j A2 j an j An j

j=1 j=1 j=1

This yields the equation

(6.22) AC = |A| I

If A is non-singular (that is invertible) then there is A1 such that

(6.23) AA1 = A1 A = I

Pre-multiplying (6.22) by A1 and using (6.23),

C = |A| A1

Since A is non-singular, we have |A| = 0, and

C ad j A

(6.24) A1 = =

|A| |A|

82 6. Linear Algebra

Thus (6.24) gives us a formula for computing the inverse of a non-singular matrix in terms of the

determinant and cofactors of A.

Recall that we wanted to calculate the (unique) solution of a system of n equations in n unknowns

given by

(6.25) Ax = c

where A is an n n matrix, and c is a vector in Rn .

To obtain a unique solution, we saw that we must have A non-singular, which now translates to

the condition |A| = 0. The unique solution to (6.25) is then

ad j A

(6.26) x = A1 c = c

|A|

Let us evaluate x1 , using (6.26). This can be done by finding the inner product of x with the first

unit vector, e1 = (1, 0, , 0). Thus,

e1 ad j A

x1 = e1 x = c

|A|

=

|A|

c1 a12 .. a1n

= ..

.

|A|1

cn an2 .. ann

This gives us an easy way to compute the solution of x1 . In general, in order to calculate xi , replace

the ith column of A by the vector c and find the determinant of this matrix. Dividing this number

by the determinant of A yields the solution xi . This rule is known as Cramers Rule.

Example 6.11. General Market Equilibrium with three goods

6.6. Cramers Rule 83

Consider a market for three goods. Demand and supply for each good are given by:

D1 =5 2P1 + P2 + P3

S1 = 4 + 3P1 + 2P2

D2 =6 + 2P1 3P2 + P3

S2 =3 + 2P2

D3 =20 + P1 + 2P2 4P3

S3 =3 + P2 + 3P3

where Pi is the price of good i; i = 1; 2; 3. The equilibrium conditions are: Di = Si ; i = 1; 2; 3, that

is

5P1 + P2 P3 = 9

2P1 + 5P2 P3 = 3

P1 P2 + 7P3 = 17

This system of linear equations can be solved at least in two ways.

9 1 1

A1 = det 3 5 1 = 356.

17 1 7

5 1 1

A = det 2 5 1 = 178.

1 1 7

A1 356

P1 = = = 2.

A 178

Similarly P2 = 2 and P3 = 3. The vector of (P1 , P1 , P3 ) describes the general market equilib-

rium.

5 1 1 P1 9

A = 2 5 1 , P = P2 , B = 3

1 1 7 P3 17

34 6 4

1

A1 = 15 34 7

det A

7 4 27

84 6. Linear Algebra

34 6 4 9 2

1

P = 15 34 7 3 = 2

178

7 4 27 17 3

Again, P1 = 2, P1 = 2, and P3 = 3.

nn

Definition 6.13. A principal minor of order k (1 6 k 6 n) of [A] is the determinant of the k k sub

nn

matrix that remains when (n k) rows and columns with the same indices are deleted from A.

Example 6.12. Let

1 2 3

A= 0 8 1

2 5 9

[ ] [ ] [ ]

1 2 8 1 1 3

det = 8; det = 67; det = 3.

0 8 5 9 2 9

1 2 3

det 0 8 1 = 23.

2 5 9

Definition 6.14. A leading principal minor of order k, (1 6 k 6 n) of [A] is the principal minor of

nn

order k which has the last (n k) rows and columns deleted.

6.8. Quadratic Form 85

[ ]

1 2

det =8

0 8

and leading principal minor of order 3 is

1 2 3

det 0 8 1 = 23.

2 5 9

A quadratic form consists of a square matrix [A] which is pre and post multiplied by a n vector. It

nn

is a scalar.

(6.27) Q (x, A) = x Ax

Example 6.13. Let [ ] [ ]

a b x1

A= , x= .

c d x2

Then

[ ] [ ]

[ ] a b x1

Q (x, A) = x1 x2

c d x2

= ax12 + (b + c) x1 x2 + dx22 .

nn

(6.28) Q (z, A) = z Az > 0, z Rn , z = 0.

(6.29) Q (z, A) = z Az < 0, z Rn , z = 0.

(6.30) Q (z, A) = z Az > 0, z Rn .

(6.31) Q (z, A) = z Az 6 0, z Rn .

86 6. Linear Algebra

[A] is PD if and only if all leading principal minors of A are strictly positive.

nn

[A] is ND if and only if all leading principal minors of A have sign (1)k .

nn

[A] is PSD if and only if all principal minors of A are non-negative.

nn

[A] is NSD if and only if all principal minors of A have sign (1)k or are 0.

nn

Example 6.14. Let [ ]

a11 a12

A= .

a21 a22

Then A is

positive definite: a11 > 0, a11 a22 a12 a21 > 0.

negative definite: a11 < 0, a11 a22 a12 a21 > 0.

positive semi-definite: a11 > 0, a22 > 0, a11 a22 a12 a21 > 0.

negative semi-definite: a11 6 0, a22 6 0, a11 a22 a12 a21 > 0.

Note that a negative definite matrix necessarily has full rank: indeed, if the zero vector can be

obtained by a linear combination of columns of A with weights 1 , , n (not all zero), then we

can define t = (1 , , n ) to obtain t At = 0.

Definition 6.15. Let A be a symmetric n n matrix. Matrix A is diagonally dominant if for each

row i, we have |ai,i | j=i |ai, j |, and it is strictly diagonally dominant if the latter inequality holds

strictly for each row.

Every symmetric, diagonally dominant matrix with non-positive entries along the diagonal is

negative semi-definite; and every symmetric, strictly diagonally dominant matrix with negative

entries along the diagonal is negative definite.

Given an n n real matrix A, an eigenvalue of A is a number which when subtracted from each

of the diagonal entries of A converts A into a singular matrix. Subtracting a scalar from each

diagonal entry of A is the same as subtracting times the identity matrix I from A. Hence, is a

eigenvalue of A if and only if A I is a singular matrix.

6.9. Eigenvalue and Eigenvectors 87

This is also equivalent to asking for what non-zero vectors x Rn , and for what complex

numbers , is it true that

(6.32) Ax = x

This is known as the the eigenvalue problem.

eigenvector of A.

(6.33) (A I)x = 0

But (6.33) is a homogeneous system of n equations in n unknowns. It has a non-zero solution for x

if and only if (A I) is singular; that is, if and only if

(6.34) |A I| = 0

(6.35) f () |A I|

we note that f is a polynomial in ; it is called the characteristic polynomial of A.

Example 6.15. Consider the 3 3 matrix A given by

4 1 1

A= 1 4 1

1 1 4

Then subtracting 3 from each diagonal entries transforms A into the singular matrix

1 1 1

1 1 1 .

1 1 1

Therefore, 3 is an eigenvalue of matrix A.

Example 6.16. Consider the 2 2 matrix A given by

[ ]

4 0

A=

0 2

Then subtracting 4 from each diagonal entries transforms A into the singular matrix

[ ]

0 0

.

0 2

88 6. Linear Algebra

Therefore, 4 is an eigenvalue of matrix A. Also, subtracting 2 from each diagonal entries transforms

A into the singular matrix [ ]

2 0

.

0 0

Therefore, 2 is also an eigenvalue of matrix A.

The above example illustrates a general principal about the eigenvalues of a diagonal matrix.

Theorem 6.5. The diagonal entries of a diagonal matrix A are the eigenvalues of A.

Theorem 6.6. A square matrix A is singular if and only if 0 is an eigenvalue of A.

Example 6.17. Consider the 2 2 matrix A given by

[ ]

4 4

A=

4 4

Since the first row is negative of the second row, matrix A is singular. Hence 0 is an eigenvalue of

A. Also subtracting 8 from each diagonal entries transforms A into the singular matrix

[ ]

4 4

.

4 4

Therefore, 8 is also an eigenvalue of matrix A.

Example 6.18. Consider the 2 2 matrix A given by

[ ]

2 1

A=

1 2

Then equation (6.34) becomes

2 1

(6.36)

1 2

(1 )(3 ) = 0

Thus, the eigenvalues are = 1 and = 3. In this case it was also possible to see that = 1 is a

eigenvalue as subtracting 1 from the diagonal entries converts matrix A into a singular matrix.

[ ] [ ] [ ]

1 1 x1 0

=

1 1 x2 0

6.10. Eigenvalues of symmetric matrix 89

which yields

x1 + x2 = 0

Thus the general solution of the eigenvector corresponding to the eigenvalue = 1 is given by

(x1 , x2 ) = (1, 1) for = 0

Similarly, corresponding to the eigenvalue = 3, we have the eigenvector given by

(x1 , x2 ) = (1, 1) for = 0.

Example 6.19. A square matrix A whose entries are non-negative and whose rows (or columns)

each add to 1 is called a Markov matrix. These matrices play a major role in economic dynamics.

Consider the 2 2 matrix A given by

[ ]

a 1a

A=

b 1b

where a 0 and b 0. Then subtracting 1 from the diagonal entries leads to the matrix

[ ]

a1 1a

A=

b b

Notice that each row of the matrix adds to 0. But if the rows of a square matrix add to zero {0, 0},

the columns are linearly dependent and the matrix is singular. This shows that 1 is an eigenvalue

of the Markov matrix. This same argument shows that 1 is an eigenvalue of every Markov matrix.

For the case of a symmetric matrix A, we can show that all the eigenvalues of A are real.

Theorem 6.7. Let A be a symmetric n n matrix. Then all the eigenvalues of A are real.

Proof. Suppose is a complex eigenvalue, with associated complex eigenvector, x. Then we have

(6.37) Ax = x

Define x to be the complex conjugate of x, and to be the complex conjugate of . Then

(6.38) Ax = x

Pre-multiply (6.37) by (x ) and (6.38) by x to get

(6.39) (x ) Ax = (x ) x

(6.40) x Ax = x x

Subtracting (6.40) from (6.39)

(6.41) (x ) Ax x Ax = ( )x x

90 6. Linear Algebra

since (x ) x = x x . Also,

x Ax = (x Ax ) = (x ) A x = (x ) Ax

since A = A (by symmetry). Thus (6.41) yields

(6.42) ( )x x = 0

Since x = 0, we know that x x is real and positive. Hence (6.42) implies that = , so is

real.

n

tr(A) = aii

i=1

The following properties of the trace can be verified easily [Here A, B and C are n n matrices,

and R].

generally be written as

(6.43) |A I| = ()n + bn1 ()n1 + .... + b1 () + b0

where b0 , ..., bn1 are the coefficients of the polynomial which are determined by the coefficients

of the A-matrix.

On the other hand, if 1 , ..., n are the eigenvalues of A, then the characteristic equation (6.34)

can be written as

(6.44) 0 = (1 )(2 )....(n )

Using (6.34), (6.43), and (6.44) and comparing coefficients we can conclude that

bn1 = 1 + 2 + ... + n

6.11. Eigenvalues, Trace and Determinant of a Matrix 91

and

b0 = 1 2 ...n

Also, by looking at the terms in the characteristic polynomial of A which would involve

()n1 , we can conclude that

bn1 = a11 + a22 + ... + ann

Finally, putting = 0 in (6.43), we get

b0 = |A|

Thus we might note two interesting relationships between the characteristic values, the trace

and the determinant of A:

n

trA = i

i=1

and

n

|A| = i

i=1

Theorem 6.8. Let A be a symmetric matrix. Then,

(1) A is positive definite if and only if all the eigenvalues of A are positive.

(2) A is negative definite if and only if all the eigenvalues of A are negative.

(3) A is positive semidefinite if and only if all the eigenvalues of A are non-negative.

(4) A is negative semidefinite if and only if all the eigenvalues of A are non-positive.

(5) A is indefinite if and only if A has a positive eigenvalue and a negative eigenvalue.

Chapter 7

Problem Set 3

(1) Let

[ ] 9 6 5 4

1 1 7

A= , B = 1 2 3 3

0 8 10

0 1 1 2

( ) ( )

1 1

(2) Are the vectors and linearly independent?

2 3

(3) Let

[ ] 8 4

1 6 2

A= , B = 0 2 .

1 5 3

7 3

1 2 3 4

1 2 1 2

A= ?

1 3 5 7

2 1 4 1

93

94 7. Problem Set 3

3 2 1

A = 0 1 7 ?

5 4 1

x+y+z = 6

x + 2y + 3z = 10

x + 2y + z = .

For what values of and , the system of equations have

(a) no solution,

(b) a unique solution,

(c) infinitely many solutions?

(7) What is the definiteness of the following matrices? (Hint: Use the principal minors)

[ ] [ ] [ ] [ ]

2 1 2 4 3 4 3 4

A= , B= C= , D= .

1 1 4 8 4 5 4 6

(8) Consider the situation of a mass layoff (i.e. a firm goes out of business) where 2000 people

become unemployed and now begin a job search. There are two states: employed (E) and

unemployed (U) with an initial vector

x0 = [E U] = [0 2000].

Suppose that in any given period an unemployed person will find a job with probability 0.7

and will therefore remain unemployed with a probability 0.3. Additionally, persons who find

themselves employed in any given period may lose their job with a probability of 0.1 (and will

continue to remain employed with probability 0.9).

(i) Set up the Markov transition matrix for this problem.

(ii) What will be the number of unemployed people after (a) two periods; (b) four periods;

(c) six periods; (d) ten periods.

(iii) What is the steady-state level of unemployment?

| A

Ak = A {z A} = O

k times

7. Problem Set 3 95

and n is odd, then A is not invertible.

(c) A n n matrix A is called orthogonal if AA = I. Prove that if A is orthogonal, then

det A = 1.

(d) Let n n matrices A and B be such that AB = BA. Prove that if n is an odd number, then

either A or B is not invertible.

(e) Let n n matrices A and B be such that AB = I. Use determinants to prove that A is

invertible (and hence B = A1 .

(10) (a) Prove that the eigenvalues of an upper or lower triangular matrix are precisely its diagonal

entries.

(b) Suppose that A is an invertible matrix. Show that (AI)x = 0 implies that (A1 I )x = 0.

Conclude that for an invertible matrix A, is an eigenvalue of A if and only if 1 is an

eigenvalue of A1 .

(c) Let A be an invertible matrix and let x be an eigenvector of A. Show it is also an eigenvector

of A2 and A2 . What are the corresponding eigenvalues?

Chapter 8

Calculus

8.1. Functions

Recall the definition of functions discussed earlier. Now we discuss some features of function

which are useful in optimization exercise.

Definition 8.1. A function f : D R is called surjective (or is said to map) D onto R if f (D) = R,

i.e., if the image f (D) of the function is equal to entire range.

Definition 8.2. A function f : D R is called injective or one to one if

(8.1) f (x) = f (y) x = y.

Example 8.1. Consider function

f : R R : f (x) = x2 .

It is not surjective as there exist no element in the domain which gets mapped into 1.

97

98 8. Single and Multivariable Calculus

g : R R+ : f (x) = x2 .

Now this function is surjective as each non-negative real number has a pre-image (square root) in

R. However, this function is not injective as the pre-image of 4 is both 2 and 2.

Next let us also restrict the domain of the function to R+ . The function is

h : R+ R+ : f (x) = x2 .

It is both surjective and injective. Hence it is bijective.

Example 8.2. Let A be a non-empty set and let S be a subset of A. We define a function S : A

{0, 1} by

{

1, if a S;

(8.2) S (a) =

0, if a

/ S.

probability and statistics. If S is a non-empty proper subset of A, then S is surjective. If S = 0/ or

S = A, then S is not surjective.

Definition 8.3. Inverse Function: Consider f : D R. If g : R D such that x D,

( )

(8.3) g f (x) = x,

then g is called the inverse function of f and is written as f 1 : R D. Alternatively we can also

define the inverse function as under. Let f : D R be bijective. The inverse function of f is the

function f 1 : R D such that x D,

( )

(8.4) f 1 f (x) = x.

Theorem 8.1. Let f : D R be bijective. Then f 1 : R D is bijective.

( )

Example 8.3. f (x) = 2x, f 1 (x) = 2x , f 1 f (x) = f (x) 2x

2 = 2 = x.

Then

[ ]

(a) If f is injective, then f 1 f (A) = A,

[ ]

(b) If f is surjective, then f f 1 (B) = B,

(c) If f is injective, then f (A1 A2 ) = f (A1 ) f (A2 ) .

8.3. Composition of Functions 99

Proof. You should try and prove (a) and (b) on your own. I will provide proof for (c) here. We

need to prove that f (A1 A2 ) f (A1 ) f (A2 ) and f (A1 ) f (A2 ) f (A1 A2 ).

Step 1. Show

f (A1 A2 ) f (A1 ) f (A2 )

Let

y f (A1 A2 ) .

Then x A1 A2 f (x) = y. Since x A1 A2 , x A1 and x A2 . But then f (x) f (A1 ) and

f (x) f (A2 ). So f (x) f (A1 ) f (A2 ). Observe that we have not used the fact that f is injective.

So this part of the result holds for any function.

Step 2. We need to show

f (A1 ) f (A2 ) f (A1 A2 ) .

Let y f (A1 ) f (A2 ). Then y f (A1 ) and y f (A2 ). Hence there exist a point x1 A1 and a

point x2 A2 such that f (x1 ) = y and f (x2 ) = y. Or

f (x1 ) = y = f (x2 ) .

Since f is injective, we must have x1 = x2 , or x1 A1 A2 . But then y = f (x1 ) f (A1 A2 ).

Definition 8.4.

(c) A function f is periodic if and only if there exists a k > 0 such that for every x, f (x + k) = f (x).

(d) A function f is increasing if and only if for every x and every y, if x y, then f (x) f (y).

(e) A function f is decreasing if and only if for every x and every y, if x y, then f (x) f (y).

Definition 8.5. Composition of Functions: If f : A B and g : B C are two functions, then for

( a )A, f (a) B. But B is the domain of g, so mapping g can be applied to f (a), which yields

any

g f (a) , an element in C. This establishes a correspondence between a in A and c in C. This

100 8. Single and Multivariable Calculus

Thus we have

( )

(8.5) (g f ) (a) = g f (a) .

Remark 8.1. Composition of two functions need not be commutative,

(g f ) (a) = ( f g) (a)

as the following example shows.

(g f ) (x) = x2 + 1 but

( f g) (x) = (x + 1)2 .

Theorem 8.3. Let f : A B, and g : B C.

(a) If f and g are surjective, then g f is surjective,

(b) If f and g are injective, then g f is injective,

(c) If f and g are bijective, then g f is bijective.

Proof. (a) Since g is surjective, range of g = C. That is for any element c C, there exists an

element b B such that g (b) = c. Since f is also surjective, there exists an element a A such

that f (a) = b. But then

( )

(g f ) (a) = g f (a) = g (b) = c.

So, (g f ) is surjective.

( )

(b) Since g is injective, for all b and b in B, (if g)(b) = g b = c C then b = b and since f is

injective, for all a and a in A, if f (a) = f a = b B then a = a . Then

( )

(g f ) (a) = (g f ) a

( ) ( ( ))

g f (a) = g f a

( )

f (a) = f a

a = a

So, (g f ) is injective.

8.4. Continuous Functions 101

Definition 8.6. The real number L is the limit ofthe function

if for each > 0, there exists a > 0 such that f (x) L < whenever x D and 0 < |x c| < .

( )

(8.6) > 0, > 0 d (x, x0 ) < d f (x) , f (x0 ) < .

A function f : D R is continuous if it is continuous at all x0 D.

It is easy to draw examples of functions which are not continuous. An intuitive way of under-

standing continuity of function is that we should be able to draw its graph without lifting pencil

from paper. If a function has a point of discontinuity say, x0 , then as we approach x0 from the left

hand side and from right hand side, the function attains different values.

For a function to be continuous at x0 , both the LHS and RHS limits must exist and converge to

the function value.

(8.7) lim f (x) = lim+ f (x) = f (x0 )

xx0 xx0

Theorem 8.4. A function f : D R is continuous if and only if for every convergent sequence of

points {xn } D with limit x D, the sequence f (xn ) f (x).

Example 8.4. If

lim f (x) = lim+ f (x) = f (x0 )

xx0 xx0

then the function is not continuous. Take

x for 0 6 x < 21

y= 0 for x = 12

1 x for 1 < x 6 1

2

Definition 8.8. Given f : D R, let A R be any subset of the range. The inverse image of A

under f , f 1 (A), is the set of points x in the domain D such that f (x) A

{ }

(8.8) f 1 (A) = x D | f (x) A .

Theorem 8.5. A function f : D R is continuous if and only if the inverse image of every open

set is open.

102 8. Single and Multivariable Calculus

open in D (i.e., every point of f 1 (V ) is an interior point of f 1 (V )). Let p D and f (p) V .

Since V is open, there exists > 0 such that y V if d( f (p), y) < . Also since f is continuous at p,

there exists a > 0, such that d( f (p), y) < if d(p, x) < . Thus x f 1 (V ) as soon as d(p, x) <

and hence f 1 (V ) is open.

Conversely, assume that f 1 (V ) is open in D for every open set V in R. Fix p D and > 0,

and let V be the set of all y R such that d( f (p), y) < . Then V is open and hence f 1 (V ) is

open, and so there exists > 0 such that x f 1 (V ) as soon as d(p, x) < . But if x f 1 (V ), then

f (x) V , and so d( f (p), y) < .

Next theorem (stated without proof) considers the inverse image of the closed subsets of the

range R to characterize continuous functions.

Theorem 8.6. A function f : D R is continuous if and only if the inverse image of every closed

set is closed.

This follows from Theorem 8.5, since a set is closed if and only if its complement is open, and

since f 1 (V c ) = [ f 1 (V )]c for every V R.

Claim 8.1. If f and g are continuous functions then

f g

f( g)

f

(if g

= 0) are continuous.

g

max { f , g}

min { f , g}

Claim 8.2. If f is a continuous function of two variables f (x1 , x2 ), then the functions of one

variable obtained by holding the other variable constant f (, x2 ) and f (x1 , ) are also continuous.

Theorem 8.7. Intermediate Value Theorem for continuous functions: Let f be a continuous func-

tion on a domain containing [a, b], with say f (a) < f (b). Then for any y in between, f (a) < y <

f (b), there exists c in (a, b) with f (c) = y.

We can apply the Intermediate Value Theorem to prove the existence of a fixed point for fol-

lowing function.

Theorem 8.8. Consider a continuous function f : [0, 1] [0, 1]. Then there exists c [0, 1] such

that f (c) = c.

8.5. Extreme Values 103

y = f (x)

f (b)

y=u

u

f (a)

a c b

Proof. Define a function g(x) = f (x) x. It is continuous since it is sum of two continuous func-

tions, f (x) and x. If f (0) = 0, then x = 0 is a fixed point. If not, then f (0) > 0, or g(0) > 0.

If f (1) = 1, then x = 1 is a fixed point. If not, then f (1) < 1, or g(1) < 0.

Now we apply the Intermediate Value Theorem to claim that there exists a point c [0, 1] such

that g(c) = 0. This implies g(c) = f (c) c = 0 or f (c) = c or c is a fixed point.

Definition 8.9. The function f : D R attains a local maximum at x0 if there exists a neighborhood

of x0 such that f (x) 6 f (x0 ) for all x in the neighborhood.

104 8. Single and Multivariable Calculus

Definition 8.10. The function f : D R attains a strict local maximum at x0 if there exists a

neighborhood of x0 such that f (x) < f (x0 ) for all x not equal to x0 in the neighborhood.

Definition 8.11. The function f : D R attains a global maximum at x0 if f (x) 6 f (x0 ) , x D.

Definition 8.12. The function f : D R attains a strict global maximum at x0 if f (x) < f (x0 ),

x D\x0 .

Remark 8.2. A global maximum (minimum) is also a local maximum (minimum).

Theorem 8.9. Weierstrass Theorem: Suppose D is a non-empty closed and bounded subset of

Rn . If f : D R is continuous on D, then there exists x and x in D such that

( )

(8.9) f x > f (x) > f (x ) , x D.

Proof. We first claim that the function f is bounded on the domain D. If not, then there exists a

sequence {xn } n=1 in D such that f (xn ) as n . Since D is compact, there exists a subse-

quence {yn }

n=1 of sequence {xn }n=1 which converges to y in D. Since {yn }n=1 is a subsequence

of sequence {xn }n=1 and f (xn ) , it must be true that f (yn ) . However, {yn }

n=1 converges

to y and f is a continuous function, f (yn ) must converge to the finite real number f (y). These two

observations lead to a contradiction. Thus we have proved the claim.

To prove the theorem, we again assume that f does not attain its maximum value in D. Since f

is bounded on D, let M be the least upper bound of the values f takes in D. Clearly M is finite. Also

there exists a sequence {zn }

n=1 in D such that f (zn ) M. Note, even though f (zn ) approaches

towards the least upper bound M as n , the sequence {zn } n=1 itself need not converge. Since

D is compact, there exists a subsequence {un }n=1 of sequence {zn }

n=1 which converges to u in

D. Since f is a continuous function, f (un ) must converge to the finite real number f (u). Since a

convergent sequence has only one limit, f (u) = M and u is the point of global maximum of f in

D.

This is the theorem we will be using to show the existence of optimal bundles for consumers

and producers. So we need to understand it and be comfortable with using it.

The following examples show why the function domain must be closed and bounded in order

for the theorem to apply. In each of the following examples, the function fails to attain a maximum

on the given interval.

8.6. An application of Extreme Values Theorem 105

(a) f (x) = x defined over [0, ) (domain being unbounded) is not bounded from above.

x

(b) f (x) = 1+x defined over [0, ) (domain being unbounded) is bounded but does not attain its

least upper bound, i. e., 1.

1

(c) f (x) = x defined over (0, 1] (domain is bounded but not closed) is not bounded from above.

(d) f (x) = 1 x defined over (0, 1] (domain is bounded but not closed) is bounded but never attains

its least upper bound, i. e., 1.

(e) Defining f (x) = 0 in the last two examples shows that both functions require continuity on

[a, b].

If we are given two norms a and b on some finite-dimensional vector space V over Rn ,

a very useful fact is that they are always within a constant factor of one another. In other words,

there exists a pair of real numbers 0 < C1 < C2 such that, for all x V , the following inequality

holds:

C1 xb xa C2 xb .

Note that any finite-dimensional vector space, by definition, is spanned by a basis e1 , e2 , , en

where n is the dimension of the vector space. The basis is often chosen to be orthonormal if we

have an inner product. That is, any vector x can be written

n

x = i ei ,

i=1

Now, we can prove equivalence of norms in four steps, the last of which requires application

of the Extreme Value Theorem.

First, let us define a taxi-cab norm by

n

x1 = |i |

i=1

106 8. Single and Multivariable Calculus

We have seen earlier in a problem set that it is indeed a norm. The linear independence of

any basis {ei } implies that x = 0 | j | > 0 for some j 1 > 0. The triangle

inequality and the scaling property are obvious and follow from the usual properties of N1

norms on x Rn .

We will show that it is sufficient for us to prove that a is equivalent to 1 , because

norm equivalence is transitive: if two norms are equivalent to 1 , then they are equivalent

to each other.

In particular, suppose both a and a are equivalent to1 for constants 0 < C1 C2

and 0 < C1 C2 , respectively:

C1 x1 xa C2 x1 ,

C1 x1 xa C2 x1 .

Then it immediately follows that

C1 C

xa xa 2 xa ,

C2 C1

and hence a and a are equivalent.

We want to show that

C1 x1 xa C2 x1 ,

is true for all x V for some C1 ,C2 . It is trivially true for x = 0, so we need only consider

x = 0, in which case we can divide by x1 to obtain the condition

x

C1 C2 ,

x1 a

where u x

x1 has norm u1 = 1.

We wish to show that any norm a is a continuous function on V under the topology

induced by the norm 1 . That is, we wish to show that for any > 0, there exists a > 0

such that

x x 1 < |xa x a | <

We prove this in two steps. First, by the triangle inequality on a , it follows that

xa x a = x + (x x )a x a x x a ,

and

x a xa = x (x x )a xa x x a ,

and therefore,

|xa x a | < x x a .

8.7. Differentiability 107

Second, applying the triangle inequality again, and writing x = ni=1 i ei and x = ni=1 i ei ,

we obtain

n

x x a |i i |ei a x x 1 (max ei a ).

i

i=1

Therefore, if we choose

=

maxi ei a

it immediately follows that

x x 1 < |xa x a | <

Now we have a continuous function (the norm a on a compact (closed and bounded)

non-empty domain, the unit sphere, and can apply Weierstrass Theorem. By the extreme

value theorem, the function must achieve a maximum and minimum value on the set (it

cannot merely approach them). Let

C1 = min ua , and C2 = max ua .

u1 =1 u1 =1

Step 2. This completes the proof.

8.7. Differentiability

f (x0 + h) f (x0 )

(8.10) lim exists.

h0 h

If this limit exists, we call it derivative of f at x0 and denote it by f (x0 ) or d f (x)

dx |x=x0 .

We follow the steps listed below to determine whether a derivative exists and if yes, its value.

(b) slope of the secant is h = h .

f

(c) If the secant h has a limit as h 0, then f is differentiable at x0 , and the derivative is

equal to this limit.

108 8. Single and Multivariable Calculus

We can see that the derivative is equal to the slope of the tangent to the graph at x0 . Note that

the tangent can be used to approximate the function in the neighborhood of x0 .

f (x0 + h) = f (x0 ) + h f (x0 ) .

It is the best linear approximation.

Definition 8.14. A function f : R R is differentiable on a set S R, if it is differentiable at each

point x S. It is called differentiable if it is differentiable at each point of the domain.

Example 8.5. Let f (x) : R R be f (x) = x2 . This function is differentiable at all x R.

f (x0 + h) f (x0 ) (x0 + h)2 x02

sec = =

( h ) h

x02 + h2 + 2x0 h x02 2x0 h + h2

= =

h h

= 2x0 + h

lim sec = 2x0 f (x0 ) = 2x0 .

h0

its first derivative. If f () is differentiable, its derivative is denoted by f () and is called the

second derivative of f .

Definition 8.16. A function whose derivative exists and is continuous is called continuously dif-

ferentiable or of class C 1 . A function whose second derivative exists and is continuous is called

twice continuously differentiable or of class C 2 .

Result 8.2. If function f : R R is differentiable at x0 then it is continuous at x0 .

f (x0 + h) f (x0 )

lim

h0 h

exists and is f (x0 ). Consider,

[ ]

f (x) f (x0 )

lim [ f (x) f (x0 )] = lim [x x0 ] ;

xx0 xx0 x x0

[ ]

f (x) f (x0 )

= lim [x x0 ] lim

xx0 xx0 x x0

= 0 f (x0 ) = 0

lim f (x) = f (x0 ) .

xx0

Hence f is continuous at x0 .

8.7. Differentiability 109

f (x) = |x|

Note this claim does not hold in the other direction. Not all continuous functions are differen-

tiable. Consider the example of absolute value function f : R R is defined by

f (x) = |x| .

The absolute value or |x| of x is defined by

{

x if x > 0

|x| =

x if x < 0.

verify).

Theorem 8.10. If f and g are differentiable functions then

(8.12) f g is differentiable with ( f g) (x) = f (x) g (x) + f (x) g (x)

( )

f f f (x) g (x) f (x) g (x)

(8.13) If g = 0, then is differentiable with (x) = ( )2 .

g g g (x)

110 8. Single and Multivariable Calculus

3 2 1 1 2 3

1

( )

(8.14) f g is differentiable with ( f g) (x) = f g (x) g (x)

( )

Example 8.6. Let f (y) = ln y and g (x) = x2 . Then, f g (x) = ln x2 and

1 2

( f g) (x) = 2x =

x2 x

Theorem 8.12. If f is differentiable and has a local maxima or minima at x0 , then f (x0 ) = 0.

Note the converse is not true. Take f (x) = x3 (See Figure 8.3). The first derivative is zero at

x0 = 0 which is a point of inflection.

8.7. Differentiability 111

y

1

0 x

0.2 0.4 0.6 0.8

Example 8.7. Let f be defined by

( )

x sin 1

x for x = 0

f (x) =

0 for x = 0.

( ) ( )( )

1 1 1

f (x) = sin + x cos 2

x x x

( ) ( )

1 1 1

= sin cos for x = 0.

x x x

At x = 0, this does not work as 1

x is not defined there. We use the definition, for h = 0, the secant is

( )

( )

h 0

1

f (h) f (0) h sin 1

= = sin .

h h h

( )

As h 0, sin 1

h does not tend to any limit, so f (0) does not exist.

( )

x2 sin 1

x for x = 0

f (x) =

0 for x = 0.

112 8. Single and Multivariable Calculus

( ) ( )( )

1 1 1

f (x) = 2x sin 2

+ x cos 2

x x x

( ) ( )

1 1

= 2x sin cos for x = 0.

x x

At x = 0, we use the definition as before, for h = 0, the secant is

( )

h 2 sin 1 0 ( )

f (h) f (0) h 1

= = h sin

h h h

( )

f (h) f (0)

= h sin 1 6 |h|

h h

As h ( 0,)we see that f (0) = 0. Thus f (x) is differentiable everywhere but f (x) is not continuous

as cos 1x does not tend to a limit as x 0.

8.7.2. LHospitals Rule. Sometimes we need to determine the value of a function where both the

numerator and the denominator go to zero. We use LHospital rule in such case. If f (a) = g (a) = 0

and g (a) = 0, then

f (x) f (a)

lim = .

xa g (x) g (a)

2

x8

.

x4

f (x) = x2 16, g (x) = 4 x 8

2

f (4) = g (4) = 0, f (x) = 2x, g (x) = .

x

Then

x2 16 f (4) 8

lim = = = 8.

x4 4 x 8 g (4) 1

that

f (x1 ) 6 f (x0 ) 6 f (x2 )

for all x1 , x2 in the neighborhood satisfying x1 < x0 < x2 .

8.8. Monotone Functions 113

that

f (x1 ) < f (x0 ) < f (x2 )

for all x1 , x2 in the neighborhood satisfying x1 < x0 < x2 .

Definition 8.19. Function f is monotone increasing on an interval if for all points x1 , x2 in the

interval satisfying x1 < x2

f (x1 ) 6 f (x2 ) .

Definition 8.20. Function f is strictly increasing on an interval if for all points x1 , x2 in the interval

satisfying x1 < x2

f (x1 ) < f (x2 ) .

We define monotone and strictly decreasing functions in the same way by reversing the in-

equalities. Some properties of derivative of monotone functions are

{ } { }

> strictly increasing

(8.15) f (x0 ) 0 f is at x0 .

< strictly decreasing

{ } { }

> monotone increasing

(8.16) f (x0 ) 0 f is at x0 .

6 monotone decreasing

Theorem 8.13. Mean Value Theorem: Let f be a continuous function on the compact interval [a, b]

and differentiable on (a, b). Then there exists a point c (a, b) where

f (b) f (a)

f (c) = .

ba

Following claim is helpful in proving the Mean Value Theorem. The proof the claim relies on

the Weierstrass Theorem and thus is another example of application of the Weierstrass Theorem.

Claim 8.3. Let f () and g() be continuous functions on [a, b] and differentiable on (a, b). Then

there exist x (a, b) such that

[ f (b) f (a)]g (x) = [g(b) g(a)] f (x).

Proof. Define,

h(s) = [ f (b) f (a)]g(s) [g(b) g(a)] f (s).

Then, it is easy to check h(a) = f (b)g(a) f (a)g(b) = h(b). We need to show that h (x) = 0 for

some x (a, b). If h(x) is a constant function, then h (x) = 0 for every point in (a, b). If not, then

consider without loss of generality, h(x) > h(a) for some x (a, b). Since h() is a continuous

function defined on a compact domain [a, b], Weierstrass Theorem can be applied to claim that it

attains a maximum at some point s (a, b). Also since h() is differentiable on (a, b) and attains its

114 8. Single and Multivariable Calculus

f (b)

f (b) f (a)

ba

f (a)

f (c)

a c b

ba

maximum at s (a, b), h (s) = 0. The case where h(x) < h(a) for some x (a, b) can be proved in

similar manner as in this case, the function h() will attain a minimum at some interior point.

To prove the Mean Value Theorem, we consider g(x) = x. Then, g (x) = 1 leads to

f (b) f (a)

[ f (b) f (a)](1) = [b a] f (x) or f (x) = ,

ba

for some x (a, b).

Theorem 8.14. [Darbouxs Theorem] Intermediate Value Theorem for derivative: If f is differ-

entiable on (a, b) then its derivative has the intermediate value property. If x1 < x2 are any two

points in the interval (a, b), and y lies between f (x1 ) and f (x2 ), then there exists a number x in

the interval [x1 , x2 ] such that f (x) = y.

Assume y lies strictly between f (x1 ), and f (x2 ). Define a function g : (a, b) R by

g(t) = f (t) yt.

Then g (x1 ) = f (x1 ) y and g (x2 ) = f (x2 ) y. Then either (i) g (x1 ) > 0 and g (x2 ) < 0 or (ii)

g (x1 ) < 0 and g (x2 ) > 0. Take the first case, i.e. g (x1 ) > 0 and g (x2 ) < 0. It is clear that neither

x1 nor x2 can be a point where g attains even a local maximum. Since g is a continuous function, it

8.9. Functions of Several Variables 115

must therefore attain its maximum at an interior point x of the closed and bounded interval [x1 , x2 ]

by Weierstrass Theorem. So we conclude that

0 = g (x) = f (x) y, or f (x) = y.

We can clearly assume that y lies strictly between f (x1 ) and f (x2 ). Define continuous func-

tions fx1 , fx2 : [a, b] R by

{

f (x1 ) for t = x1

fx1 (t) = f (x1 ) f (t)

x1 t for t = x1 .

and {

f (x2 ) for t = x2

fx2 (t) = f (t) f (x2 )

tx2 for t = x2 .

Observe that fx1 (x1 ) = f (x1 ), fx2 (x2 ) = f (x2 ) and fx1 (x2 ) = fx2 (x1 ). Hence, y lies between fx1 (x1 )

and fx1 (x2 ); or y lies between fx2 (x1 ) and fx2 (x2 ). If y lies between fx1 (x1 ) and fx1 (x2 ), then (by

continuity of fx1 ) there exists s in (x1 , x2 ] with

f (s) f (x1 )

y = fx1 (s) = .

s x1

Then by Mean Value Theorem there exists x [x1 , s] such that

f (s) f (x1 )

y= = f (x).

s x1

Similarly if y lies between fx2 (x1 ) and fx2 (x2 ), then (by continuity of fx2 ) there exists s in [x1 , x2 )

and x [s, x2 ] such that

f (x2 ) f (s)

y= = f (x).

x2 s

(8.17) f (x) = f (x1 , x2 , xn )

Examples of such functions are utility functions for several goods, the production functions for

many inputs etc.

116 8. Single and Multivariable Calculus

Definition 8.21. The function f (x) is differentiable at the point x if there exists an n dimensional

vector D f (x), called the differential or total derivative of f at x, such that

> 0, > 0 x y <

f (x) f (y) D f (x) (x y) < x y .

8.9.1. Partial Derivative. To us the more important concept is that of partial derivative which we

define now.

Definition 8.22. Let f : D R where D Rn be a function of n variables. If the limit

f (x1 , , xi + h, xn ) f (x1 , , xi , xn )

lim

h0 h

f (x)

exists, it is called the ith (first order) partial derivative of f at x and is denoted by xi or fi (x).

The function f (x) is then said to be partially differentiable with respect to xi . The function

f (x) is said to be partially differentiable if it is partially differentiable with respect to every xi .

Note x

f (x)

i

is the derivative of f (x1 , , xn ) with respect to xi holding all other variables con-

stant. When all the partial derivatives exist, the vector of partial derivatives

[ ]

f (x) f (x)

f (x) = , ,

x1 xn

is called the Jacobian vector or the gradient vector. For functions of one variable, f (x) = f (x).

Result 8.3. If a function is differentiable at x0 then it is partially differentiable at x0 .

However, existence of all the partial derivatives do not guarantee even the continuity of the

function as the following example shows.

Example 8.10. Let f (x, y) be defined as

{

xy

x2 +y2

if (x, y) = (0, 0)

f (x, y) =

0 otherwise.

We can prove that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although

f is not continuous at (0, 0).

If f is a real valued function defined on an open set D in Rn , and the partial derivatives are

bounded in D, then f is continuous on D.

Example 8.11. Let f : R2 R be

f (x1 , x2 ) = x13 + 2x1 x2 + 3x23 .

8.9. Functions of Several Variables 117

Then

f (x) f (x)

= 3x12 + 2x2 , = 2x1 + 9x22

x1 x2

[ ]

f (x) = 3x12 + 2x2 , 2x1 + 9x22 , x R2

For functions of one variable we have seen earlier that we could approximate the function

around a point by the tangent to the function at the point. We can do something similar in case of

functions of several variables. Instead of approximation by a line (the tangent), we now approxi-

mate by the tangent hyperplane.

Definition 8.23. Given f : D R with gradient f (x) at x0 , the tangent hyperplane to f at x0 is

given by

f (x) = f (x0 ) + f (x0 ) (x x0 ) .

8.9.2. Second Order Partial Derivatives. Let us look at the example above again. For

f (x1 , x2 ) = x13 + 2x1 x2 + 3x23 ,

f (x)

x1 = 3x12 + 2x2 and xf (x)

2

= 2x1 + 9x22 are differentiable functions of x1 and x2 themselves. When

we take partial derivatives of these functions we get the second partial derivatives.

2 f (x) 2 f (x) 2 f (x) 2 f (x)

= 6x1 , = 18x 2 , = = 2.

x12 x22 x1 x2 x2 x1

This example can be generalized.

Definition 8.24. Let f : Rn R be twice differentiable. For each of the n partial derivatives, we

get n partial derivative of second order,

( )

f (x) 2 f (x)

= = fi j (x) .

x j xi x j xi

We organize the second order derivatives in a matrix, called the Hessian Matrix.

2 f (x) 2 f (x)

x12 xn x1

2 f (x)

x x

2

H f (x) = D f (x) = .

(8.18) .. .

1 2

.. ..

.. . . .

2

f (x) 2 f (x)

x1 xn x 2

n

118 8. Single and Multivariable Calculus

If all the partial derivatives of the first order exist and are continuous then f is called C 1 or contin-

uously differentiable. If all the partial derivatives of second order exist and are continuous then f

is called C 2 or twice continuously differentiable and so forth.

Theorem 8.15. Youngs Theorem : If f is twice continuously differentiable then

2 f (x) 2 f (x)

= ,

x j xi xi x j

i.e., the Hessian of f is a symmetric matrix.

Example 8.12. For the example above

[ ]

6x1 2

H f (x) = .

2 18x2

The off diagonal element of the Hessian are also called cross-partials. For functions of one

variable, H f (x) = f (x).

Example 8.13. Let f : R3 R be

f (x) = 5x12 + x1 x23 x22 x32 + x33 .

Then [ ]

f (x) = 10x1 + x23 3x1 x22 2x2 x32 2x22 x3 + 3x32

and

10 3x22 0

H f (x) = 3x22 6x1 x2 2x32 4x2 x3

0 4x2 x3 2x22 + 6x3

We now provide three very useful theorems on continuous and differentiable functions on

convex sets in Rn for n 1. They are the Intermediate Value theorem, the Mean Value theorem

and Taylors theorem.

Theorem 8.16 (Intermediate Value Theorem:). Suppose A is a convex subset of Rn , and f : A R

is a continuous function on A. Suppose x1 and x2 are in A, and f (x1 ) > f (x2 ). Then given any

c R such that f (x1 ) > c > f (x2 ), there is 0 < < 1 such that f [x1 + (1 )x2 ] = c.

Example 8.14. Suppose X [a, b] is a closed interval in R (with a < b). Suppose f is a continuous

function on X. By Weierstrass theorem, there will exist x1 and x2 in X such that f (x1 ) f (x)

f (x2 ) for all x X. If f (x1 ) = f (x2 ) [this is the trivial case], then f (x) = f (x1 ) for all x X, and

so f (X) is the single point, f (x1 ). If f (x1 ) > f (x2 ), then using the fact that X is a convex set, we

can conclude from the Intermediate Value Theorem that every value between f (x1 ) and f (x2 ) is

attained by the function f at some point in X. This shows that, f (X) is itself a closed interval.

8.10. Composite Functions and the Chain Rule 119

Theorem 8.17 (Mean Value Theorem). Suppose A is an open convex subset of Rn , and f : A R

is continuously differentiable on A. Suppose x1 and x2 are in A. Then there is 0 1 such that

f (x2 ) f (x1 ) = (x2 x1 ) f (x1 + (1 )x2 )

Example 8.15. Let f : R R be a continuously differentiable function with the property that

f (x) > 0 for all x R. Then given any x1 , x2 in R, with x2 > x1 we have by the Mean-Value

Theorem (since R is open and convex), the existence of 0 1, such that

f (x2 ) f (x1 ) = (x2 x1 ) f (x1 + (1 )x2 )

Now f (x1 + (1 )x2 ) > 0 by assumption, and x2 > x1 by hypothesis. So f (x2 ) > f (x1 ). This

shows that f is an increasing function on R.

Observe that a function f : R R can be increasing without satisfying f (x) > 0 at all x R.

For example, f (x) = x3 is increasing on R, but f (0) = 0.

Theorem 8.18 (Taylors Expansion up to Second-Order). Suppose A is an open, convex subset of

Rn , and f : A R is twice continuously differentiable on A. Suppose x1 and x2 are in A. Then

there exists 0 1, such that

1

f (x2 ) f (x1 ) = (x2 x1 ) f (x1 ) + (x2 x1 ) H f (x1 + (1 )x2 )(x2 x1 )

2

on an open set A Rn . Let f : B R be a function defined on an open set B Rm which contains

the set h(A). Then, we can define G : A R by G(x) f [h(x)] f [h1 (x), , hm (x)] for each

x A. This function is known as a composite function [of f and h].

The Chain Rule of differentiation provides us with a formula for finding the partial deriva-

tives of a composite function, F, in terms of the partial derivatives of the individual functions, f

and h.

Theorem 8.19 (Chain Rule of differentiation). Let h : A Rm be a function with component

functions hi : A R(i = 1, , m) which are continuously differentiable on an open set A Rn .

Let f : B R be a continuously differentiable function on an open set B Rm which contains the

set h(A). If F : A R is defined by F(x) = f [h(x)] on A, and a A, then F is differentiable at a

and we have for i = 1, , n,

m

Di F(a) = D j f (h1 (a), , hm (a))Di h j (a)

j=1

120 8. Single and Multivariable Calculus

Example 8.16. Let m = 2, n = 1. Let h1 (x) = x3 on R, and h2 (x) = 10 + x on R; and let f (y1 , y2 ) =

y1 + y42 on R2 . Then

F(x) = f [h(x)] = f [h1 (x), h2 (x)] = h1 (x) + [h2 (x)]4 = x3 + (10 + x)4

is a composite function on R. If a R,

F (a) = D1 F(a) = D1 f (h1 (a), h2 (a)) D1 h1 (a) + D2 f (h1 (a), h2 (a)) D1 h2 (a)

= 1 (3a2 ) + 4(h2 (a))3 1 = 3a2 + 4(10 + a)3

Example 8.17. Take m = 1, n = 2. Let h1 (x) = h1 (x1 , x2 ) = x12 + x2 on R2 ; f (y) = 2y on R. Then

F(x) = F(x1 , x2 ) = f [h1 (x1 , x2 )] = 2[x12 + x2 ]. Then if a R2 ,

D1 F(a) = D1 f [h1 (a1 , a2 )]D1 h1 (a1 , a2 )

D2 F(a) = D1 f [h1 (a1 , a2 )]D2 h1 (a1 , a2 )

Thus, D1 F(a) = 2(2a1 ) = 4a1 ; and D2 F(a) = 2(1) = 2.

Chapter 9

Problem Set 4

[ ]1

2x + 1 2

(9.1) f (x) =

x1

(9.2) f (x) = ln(3x2 5x)

(3) Let f : R R be

{

x2 1, x 6 0

f (x) =

x2 , x > 0.

and g : R R

{

3x 2, x 6 2

g (x) =

x + 6, x > 2.

(a) Is f continuous at x = 0?

(b) Is g continuous at x = 2?

(4) Find

( )

f (x) exp x2 + exp (x) 2

(9.3) lim = lim

x0 g (x) x0 2x

121

122 9. Problem Set 4

f (x, y) = x2 y + y2 x 2xy + 3x

at the point (1, 2).

{

xy

x2 +y2

if (x, y) = (0, 0)

f (x, y) =

0 otherwise.

Show that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although f

is not continuous at (0, 0).

(7) This exercise gives an example of a function with D12 f (x, y) = D21 (x, y). Let f (x, y) be defined

as {

xy(x2 y2 )

x2 +y2

if (x, y) = (0, 0)

f (x, y) =

0 otherwise.

(a) We can prove that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in

(x, y) R2 and f is continuous on R2 .

(b) The partial derivatives D1 f (x, y) and D2 f (x, y) are continuous at every point in R2 .

(c) The second order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in

R2 and are continuous everywhere in R2 except at (0, 0).

(d) D21 f (0, 0) = +1 and D12 f (0, 0) = 1.

Chapter 10

Convex Analysis

[ ]

(10.1) f (x) + (1 ) f (y) 6 f x + (1 ) y

Function f is strictly concave if the inequality is strict for all (0, 1).

is a concave function if and only if the set

( ) ( )

Proof. Let the function f be concave. Let x1 , 1 C and x2 , 2 C. Then f (x1 ) 1 and

f (x2 ) 2 . Since f is concave, and x1 , x2 A, for every [0, 1],

[ ] ( ) ( )

f x1 + (1 ) x2 f x1 + (1 ) f x2 1 + (1 ) 2 ,

( )

which implies x1 + (1 ) x2 , 1 + (1 ) 2 C. Hence C is convex.

123

124 10. Convex Analysis

f(d)

f (d)

f (d) f (c)

dc

f(c)

f (c)

c d

Figure 10.1. A concave Function of one variable: f (d) < < f (c)

f (d) f (c)

dc

( ) ( )

Next we assume C to be convex. Note for x1 , x2 A, we have x1 , f (x1 ) C and x2 , f (x2 )

C. Since C is convex, for every [0, 1],

( ) ( )

x1 , f (x1 ) + (1 ) x2 , f (x2 ) C.

This implies

f (x1 + (1 )x2 ) f (x1 ) + (1 ) f (x2 ),

or f is concave.

In general, a concave function on a convex set in Rn need not be continuous as the following

example shows.

10.1. Concave, Convex Functions 125

f (x)

b

f (x) is not continuous at x = 0

bC

1 + x for x > 0

f (x) =

0 for x = 0.

This function is concave but it is not continuous at x = 0.

However, if the set A is open and convex, then the concave function f is continuous on A.

If the function is continuously differentiable on an open convex set, then following theorem

characterizes the concave functions.

Theorem 10.2. Suppose A Rn is an open convex set, and f : A R is continuously differentiable

on A. Then f is concave on A if and only if

whenever x1 and x2 are in A.

126 10. Convex Analysis

Then ( )

f (x1 + (x2 x1 )) f (x1 ) f (x2 ) f (x1 ) .

Dividing both sides by , we get

f (x1 + (x2 x1 )) f (x1 )

f (x2 ) f (x1 ).

Taking 0, we get

f (x1 ) (x2 x1 ) f (x2 ) f (x1 ),

which proves the inequality.

Next we assume (10.2) holds true for all x2 , x1 A. Then for any [0, 1], let x = x2 + (1

)x1 . Since A is convex, x A. Note

x2 x = x2 x2 (1 )x1 = (1 )(x2 x1 ).

Also

x1 x = x1 x2 (1 )x1 = (x2 x1 ).

Applying (10.2), we get

f (x2 ) f (x) 6 f (x) (x2 x) = f (x) (1 )(x2 x1 ),

and

f (x1 ) f (x) 6 f (x) (x1 x) = f (x) ()(x2 x1 ).

We multiply the first inequality by and the second inequality by 1 and add to obtain

f (x2 ) + (1 ) f (x1 ) f (x) 6 0,

which implies

f (x2 ) + (1 ) f (x1 ) 6 f (x) = f (x2 + (1 )x1 ).

So f is concave.

Also the function will be strictly concave if we change the weak inequality to strict inequality.

Theorem 10.3. Suppose A Rn is an open convex set, and f : A R is continuously differentiable

on A. Then f is strictly concave on A if and only if

whenever x1 and x2 are in A.

Now we consider twice continuously differentiable functions. Following two theorems char-

acterize concave and strictly concave functions.

10.1. Concave, Convex Functions 127

3 2 1 1 2 3

1

differentiable on A. Then f is concave on A if and only if H f (x) is negative semi-definite for all

x A.

If H f (x) is negative definite whenever x A, then the function is strictly concave, but the

converse is not true.

Theorem 10.5. Suppose A Rn is an open convex set, and f : A R is twice continuously

differentiable on A. If H f (x) is negative definite for all x A then f is strictly concave on A.

Following example shows that the converse implication does not hold.

Example 10.2. Let f : R R be defined by f (x) = x4 for all x R (See Figure 2). This is a

twice continuously differentiable function on the open, convex set R. We can verify that f is strictly

concave on R, but since f (x) = 12x2 , f (0) = 0. This shows that the converse implication is not

valid.

Claim 10.1. If f : D R is a function of one variable and is twice continuously differentiable then

x D, f (x) 6 0 f is concave.

Definition 10.2. Function f : D R is convex if x, y D, [0, 1],

[ ]

(10.3) f (x) + (1 ) f (y) > f x + (1 ) y

128 10. Convex Analysis

Function f is strictly convex if the inequality is strict for all (0, 1).

Claim 10.2. If f : D R is a function of one variable and is twice continuously differentiable then

x D, f (x) > 0 f is convex.

Note that a local maxima (minima) of a concave (convex) function is a global maxima (minima)

as well.

Theorem 10.6. Let f : D R (where D Rn is open and convex) be twice continuously differ-

entiable. Then,

(10.5) f is convex if and only if H f (x) is PSD x D.

(10.6) H f (x) is ND x D f is strictly concave.

(10.7) H f (x) is PD x D f is strictly convex.

Corollary 1. For a function of one variable, this means,

(10.9) f is convex if and only if f (x) > 0 x D.

(10.10) f (x) < 0 x D f is strictly concave.

(10.11) f (x) > 0 x D f is strictly convex.

Example 10.3. The implication

f is strictly convex f (x) > 0, x D

does not hold.

Take f (x) = x4 , f (x) = 12x2 . It is strictly convex everywhere but f (0) = 0. We would need

f (x)> 0, x D for the Hessian to be PD.

10.2. Quasi-concave Functions 129

3 2 1 1 2 3

Proposition 3.

( )

(b) If f (x) is concave (convex) and F (u) concave(convex) and increasing then U (x) = F f (x)

is concave(convex).

( ) { }

f x + (1 ) y > min f (x) , f (y) .

{ }

Theorem 10.7. Function f : D{ R is quasi-concave

} if and only if a R, the set f + = x D | f (x) > a

a

is a convex set. The set fa+ = x D | f (x) > a is called upper contour set.

Definition 10.4. Function f : D R is quasi-convex if function f is quasi-concave.

{

= x D | f (x) 6 a

}

Theorem 10.8. Function f : D{ R is quasi-convex} if and only if a R, the set f a

is a convex set. The set fa = x D | f (x) 6 a is called lower contour set.

130 10. Convex Analysis

Theorem 10.9.

f : D R concave f is quasi-concave,

f : D R convex f is quasi-convex.

Note that for functions of one variable, any monotone function is quasi-concave. This however

does NOT apply for functions of more than one variable. Also all quasi-concave functions need

not be concave. Take f (x) = x2 it is monotone increasing, hence quasi-concave. But it is not

concave, rather it is convex. For functions of one variable, following theorem characterizes the

quasi-concave functions.

Theorem 10.10. A function f of a single variable is quasiconcave if and only if either (a) it is

non-decreasing, (b) it is non-increasing, or (c) there exists x such that f is non-decreasing for

x < x and non-increasing for x > x .

matrix.

f (x) f (x)

0 x1 xn

f (x)

2 f (x)

2 f (x)

x1 x12 xn x1

B (x) =

2 f (x)

x1 x2 .

. .. .. .. ..

.. . . . .

f (x) 2 f (x) 2 f (x)

xn x1 xn xn2

Let Br (x) denote the sub matrix of the first (r + 1) rows and columns of Br (x), i.e., Br (x) is

(r + 1) (r + 1) matrix.

( )

Condition 1. A necessary condition for f to be quasiconcave is that (1)r det Br (x) > 0, r =

1, 2, , n; x D.

( )

Condition 2. A sufficient condition for f to be quasiconcave is that (1)r det Br (x) > 0, r =

1, 2, , n; x D.

10.2. Quasi-concave Functions 131

When we check for quasi-concavity, we have to check the sufficient conditions. We need

[ ]

0 f1

det < 0,

f1 f11

0 f1 f2

det f1 f11 f12 > 0, etc.

f2 f21 f22

Remark 10.1. When we have to check whether a function is quasi-concave, start out checking

whether it is concave because it is easier to check for concavity and concavity implies quasi-

concavity.

Remark 10.2. Quasi-concavity is preserved under monotone transformation whereas concavity

need not be preserved.

Example 10.4. Let f (x, y) = xy for (x, y) R2++ .

Then

y

1

4 x3

1

4

1

xy

H f (x) = 1 1 .

4 xy 14 x

y3

The principal minors of order one are negative and of order two is zero. Hence f (x) is concave and

so quasi-concave.

( )4

Let us take a monotone transformation g (x, y) = f (x, y) = x2 y2 , for (x, y) R2++ .

0 2xy2 2x2 y

B (x, y) = 2xy2 2y2 4xy

2x2 y 4xy 2x2

[ ]

( ) 0 2xy2

det B1 (x, y) = = 4x2 y4 < 0

2xy2 2y2

( )

(1)1 det B1 (x, y) > 0, (x, y) R2++ .

( ) ( ) ( )

det B2 (x, y) = 2xy2 4x3 y2 8x3 y2 + 2x2 y 8x2 y3 4x2 y3

= 8x4 y4 + 8x4 y4 = 16x4 y4 > 0, (x, y) R2++

g(x, y) is quasi-concave.

Note however, g (x, y) is not concave.

[ ]

2y2 4xy

Hg (x, y) =

4xy 2x2

132 10. Convex Analysis

Principal minors of order one are strictly positive and of order two is 12x2 y2 which is strictly

negative. Thus g (x, y) is not concave.

Chapter 11

Problem Set 5

(1) Prove or give a counterexample: The sum of two concave functions is concave.

(a) If A and B are convex sets, then A B is convex.

(b) If A and B are convex sets, then A B is convex.

(a) f (x) = 3x + 4;

(b) g(x, y) = yex , y > 0;

(c) h(x, y) = x2 y3 .

(4) Show using an example that the sum of two quasiconcave functions need not be quasiconcave

(in general).

(i) f (x, y, z) = 8x3 + 2xy2 z3

(ii) g(x, y) = x + y ex ex+y

Write out the gradient vector and the Hessian matrices f (x, y, z) and H f (x, y, z) and g(x, y)

and Hg (x, y). State if f concave, quasiconcave, quasiconvex? What about function g?

133

Chapter 12

Function Theorems

defined by f (x) = 4x. It is one - to - one on R; and also we can define a function g : R R by

g(y) = 4y . The function g(y) satisfies the property g[ f (x)] = x and is called the inverse function of

f on R. Furthermore g [ f (x)] = f 1(x) for all x R.

This idea can be extended to the domains of the function, A, being subsets of Rn , with the

function f defined from A to R. Then f is one-to-one on A if for all x1 , x2 A, x1 = x2 , we have

f (x1 ) = f (x2 ). In this case, if there is a function g, from f (A) to A, such that g[ f (x)] = x for each

x A, then g is called the inverse function of f on f (A).

Let a A, and suppose that f (a) = 0. If f (a) > 0, then there is an open interval B(a, r) such that

f (x) > 0 for all x in B(a, r), and f is increasing on B(a, r). Thus, for every z f [B(a, r)], there

is a unique x in B(a, r) such that f (x) = y. Or there is a unique function h : f [B(a, r)] B(a, r)

such that h[ f (x)] = x for all x B(a, r). Thus, h is an inverse function of f on f [B(a, r)]. In other

word, h is the inverse of f locally around the point f (a). We have not guaranteed that the inverse

function is defined on the entire set f (A). Similarly, if f (a) < 0, an inverse function could be

135

136 12. Inverse and Implicit Function Theorems

defined locally around f (a). The important restriction to carry out the kind of analysis noted

above is that f (a) = 0.

is continuously differentiable on R, but f (0) = 0. Now, we cannot define a unique inverse function

of f even locally around f (0). If we choose any open ball B(0, r), and consider any point y = 0

in the set f [B(0, r)], then there will be two values x, x in B(0, r), x = x , such that f (x) = y = f (x ).

We note here that f (a) = 0 is not a necessary condition to get a unique inverse function of f .

For example if f : R R is defined by f (x) = x3 , then we have f to be continuously differentiable

on R, with f (0) = 0. However f is an increasing function, and clearly has a unique inverse function

g(y) = y1/3 on R, and hence locally around f (0).

Following theorem deals with the existence and properties of inverse functions.

Theorem 12.1 (Inverse Function Theorem). Let A be an open set of Rn , and f : A Rn be con-

tinuously differentiable on A. Suppose a A and the Jacobian of f at a is non-zero. Then there is

an open set X A containing a, and an open set Z Rn containing f (a), and a unique function

h : Z X, such that:

(i) f (X) = Z;

(ii) f is one-to-one on X;

Following example shows that continuity of f is needed in the inverse function theorem, even

in the case n = 1.

Example 12.1. Let ( )

1

f (t) = t + 2t 2 sin for t = 0 and f (0) = 0,

t

then f (0) = 1, f is bounded in (1, 1), but f is not one-to-one in any neighborhood of 0.

For the system of simultaneous linear equations Ax = b, we have seen earlier, that there exists a

unique solution for every choice of right hand side column vector b, if and only if the rank of A

12.2. The Linear Implicit Function Theorem 137

is equal to the number of rows of A which is equal to the number of columns of the matrix A.

In economic models, the vector b represents some externally determined (exogenous) parameters

while the linear equations constitute some equilibrium conditions which determine the vector x

which is the set of internal (endogenous) variables.

In this sense it is possible to divide the set of variables in two disjoint subsets of endogenous and

exogenous variables. Thus a general linear economic model will have m equations in n unknowns:

a11 x1 + a12 x2 + + a1n xn = b1

am1 x1 + am2 x2 + + amn xn = bm

In general it will be possible to divide the set of variables into endogenous variables and exoge-

nous variables. Such a division will be useful only if after substituting the values of the exogenous

variables in the m equations, it is possible to obtain a solution of the system for the remaining en-

dogenous variables. For this two conditions must hold. The number of endogenous variables must

be equal to the number of equations m and the square matrix corresponding to the endogenous

variables must have maximal rank m.

A formal statement of the above observation is known as the linear version of Implicit Function

Theorem.

equations (20.2) into endogenous and exogenous variables respectively. Then there exists, for

every choice of the exogenous variables, x j+1 , , xn , a unique set of the values, x1 , , x j , if and

only if

a11 a12 . . . a1 j

a21 a22 . . . a2 j

(12.1)

[A] = . ..

.. ..

j j .. . . .

a j1 a j2 . . . a jk

Exercise 12.1.

138 12. Inverse and Implicit Function Theorems

x+ 2y+ z w = 1

3x y 4z+ 2w = 3

0x+ y+ z+ w = 0

Determine how many variables can be endogenous at any one time and show a partition of the vari-

ables into endogenous and exogenous variables such that the system of equations have a solution.

Find an explicit formula for the endogenous variables in terms of the exiguous variables.

Exercise 12.2.

x+ 3y z+ w = 0

4x y+ 2z+ w = 3

7x+ y+ z+ 3w = 6

Is it possible to partition the variables into endogenous and exogenous variables such that the

system of equations have a unique solution.

y2 6xy + 5x2 = 0.

Given any value of x, we can solve this equation for y. For example if x = 0, then y = 0; if x = 1

the equation takes the form y2 6y + 5 = 0 and yields y = 1 or y = 5 as solution. Observe that it

is possible to solve y explicitly in terms of x (it turns out to be a correspondence) by applying the

quadratic formula:

6x 36x2 20x2

y=

2

or y = 5x or y = x.

It is possible to apply quadratic formula to the implicit function xy2 3y 2 exp x = 0 to obtain

an explicit function for y as

3 9 + 8 exp x

y= .

2x

However it could turn out to be the case that the explicit functions more difficult to work with than

the original implicit function.

12.3. Implicit Function Theorem for R2 139

y5 5xy + 4x2 = 0

then it is not possible to solve it in explicit form as there is no general formula for solving a quintic

equation. Note however that the equation still defines y as an implicit function of x. For x = 0, we

get y = 0, for x = 1 we get y = 1 and so on.

Example 12.2. A profit maximizing firm uses single input x (with unit cost w per unit) to produce

an output y using production function y = f (x). Let the price of the output be p per unit. Then the

profit function for this firm given p and w is

(x) = p f (x) w x.

To obtain the optimal input x which maximizes the profit , we take the first order condition, which

is

p f (x) w = 0.

We can treat p and w as exogenous variables and then this equation defines x as a function of p and

w. The equation need not yield x as an explicit function of p and w. However, it does define x as an

implicit function of p and w and we can use it to estimate the change in x in response to changes in

p and w.

y = G(x1 , , xn ).

In this the endogenous variable y is an explicit function of the exogenous variables (x1 , , xn ).

Such an ideal situation need not occur in every case. More frequently we come across functions of

the form

(12.2) F(x1 , , xn ; y) = 0.

If the function G determines value of y for each set of values (x1 , , xn ), then we say that Eq. (12.2)

defines the endogenous variable y as an implicit function of the exogenous variables (x1 , , xn ).

We consider implicit functions in R2 of the form F(x, y) = c and analyze following question.

For a given implicit function F(x, y) = c and a specified solution (x0 , y0 ),

(a) Does F(x, y) = c determine y as a continuous function of x for points (x, y) such that x is near

x0 and y is near y0 ?

140 12. Inverse and Implicit Function Theorems

(a) Given the implicit function F(x, y) = c, determine a point (x0 , y0 ) such that F(x0 , y0 ) = c, and

also does there exist a continuous function y = f (x) defined on an interval I around x0 so that:

(1) F(x, f(x))=c for all x I and

(2) y0 = f (x0 )?

Theorem 12.3. Let F(x, y) = c be a continuously differentiable function on an open ball around

(x0 , y0 ) in R2 . Suppose F(x0 , y0 ) = c, and consider the expression

F(x, y) = c.

F(x,y)

If = 0, then there exists a continuously differentiable function y = f (x) defined on an

y (x0 ,y0 )

open interval I around x0 such that:

(c)

F(x,y)

x (x ,y )

f (x0 ) = F(x,y) 0 0 .

y (x0 ,y0 )

Example 12.3. Consider the function F : R2 R given by F(x, y) = x2 + y2 = 1 (the graph of this

function is a circle with radius r = 1. If we choose (a, b) with F(a, b) = 1, and a = 1, a = 1,

then there are open intervals I R containing a, and Y R containing b, such that if x I, there

is a unique y Y with F(x, y) = 0. Thus, we can define aunique function f : I Y such that

f (x, f (x)) = 0 for all x I. If a > 0 and b > 0, then g(x) = 1 x2 on I. We say such a function

is defined implicitly by the equation f (x, y) = 0. Note that if a = 1, and b = 0, so that f (a, b) = 0,

we cannot find such a unique function, f .

Chapter 13

Homogeneous and

Homothetic Functions

Most of us have come across homogeneous functions in the elementary algebra courses. For exam-

ple f (x) = ax is homogeneous of degree 1, f (x) = axm is homogeneous of degree m, f (x) = ax + 1

is not a homogeneous function, and so on. First we define the homogeneous function formally.

Definition 13.1. For any scalar k, a real valued function f (x1 , , xn ) is homogeneous of degree k

on Rn+ if for all x Rn+ , and all t > 0,

(a) Consider f : R2+ R given by f (x1 , x2 ) = x12 x23 . Then if t > 0, we have f (tx1 , tx2 ) = (tx1 )2 (tx2 )3 =

t 2+3 x12 x23 = t 5 f (x1 , x2 ). So, f is homogeneous of degree 5.

(c) The function f : R2+ R given by f (x1 , x2 ) = x12 x2 + 3x1 x22 + x23 is homogeneous of degree 3,

since each term is homogeneous of degree 3.

141

142 13. Homogeneous and Homothetic Functions

(f) The function f : R2+ R given by f (x1 , x2 ) = 3x12 x23 6x15 x22 is not homogeneous since the first

term is homogeneous of degree 5 but the second term is homogeneous of degree 7.

Let us look at the function f (x1 , x2 ) = x1a x2b again. We can calculate the partial derivatives of f

on R2++ . Thus,

f (x1 , x2 ) f (x1 , x2 )

= ax1a1 x2b ; = bx1a , x2b1 .

x1 x2

Now, if t > 0, then

f (tx1 ,tx2 ) f (x1 , x2 )

= a(tx1 )a1 (tx2 )b = t a+b1 ax1a1 x2b = t a+b1 .

x1 x1

So f (xx11,x2 ) is homogeneous of degree (a + b 1). Similarly, one can check that f (xx12,x2 ) is homo-

geneous of degree (a + b 1). More generally, whenever a function, f , is homogeneous of degree

k, its partial derivatives are homogeneous of degree (k 1).

Theorem 13.1. Suppose f is homogeneous of degree k on Rn+ , and continuously differentiable on

Rn++ . Then for each i = 1, , n, f (x1x,i ,xn ) is homogeneous of degree (k 1) on Rn++ .

(13.1) f (tx1 , ,txn ) = t k f (x1 , , xn )

We can consider f (tx) to be a function of n + 1 variables, t, x1 , , xn . We will show this result for

the partial derivative with respect to x1 . In this case the remaining variables t, x2 , , xn are held

as constant. Applying the Chain Rule, we have for i = 1, the partial derivative of the expression on

the left hand side of (13.1) is

f (tx1 , ,txn ) tx1

(13.2) = D1 f (tx1 , ,txn ) t

tx1 x1

The partial derivative of the function on the right hand side of (13.1) is t k f (xx

1 , ,xn )

1

. Equality of the

two expressions lead to

f (x1 , , xn )

(13.3) D1 f (tx1 , ,txn ) t = t k

x1

Dividing by t, we get,

f (x1 , , xn )

(13.4) D1 f (tx1 , ,txn ) = t k1

x1

Thus the partial derivatives are homogeneous functions of degree k 1.

13.1. Homogeneous Functions 143

x1 D1 f (x1 , x2 ) + x2 D2 (x1 , x2 ) = ax1a x2b + bx1a x2b = (a + b)x1a x2b = (a + b) f (x1 , x2 ).

More generally, when a function, f , is homogeneous of degree k, then x f (x) = k f (x), a result

known as Eulers theorem.

Theorem 13.2 (Eulers Theorem). Suppose f : Rn+ R is homogeneous of degree k on Rn+ and

continuously differentiable on Rn++ . Then,

f (x1 , , xn ) f (x1 , , xn )

x1 + + xn = k f (x)

x1 xn

x f (x) = k f (x) for all x Rn++

f (tx) = t k f (x1 , , xn )

Then, applying the Chain Rule, we have

d f (tx) f (tx) f (tx)

(13.5) = x1 + + xn

dt x1 xn

But since f is homogeneous of degree k, we have

f (tx) = t k f (x1 , , xn )

and,

d f (tx)

(13.6) = kt k1 f (x1 , , xn )

dt

Take t = 1 to complete the proof.

Theorem 13.3 (Eulers Theorem). Suppose f : Rn+ R is continuous function on Rn+ and contin-

uously differentiable on Rn++ . Also suppose,

f (x1 , , xn ) f (x1 , , xn )

x1 + + xn = k f (x)

x1 xn

for all x Rn++ . Then, f is homogeneous of degree k.

A useful geometric property of the homogeneous function is as follows. Let f (x) be a homo-

geneous function of degree one and consider the level set f (x) = 1. In the producers theory, the

function f could be a constant returns to scale production function and the level sets would then be

the iso - quants. Let x be a point on the iso-quant f (x) = 1. If we translate the point x by a factor r

along the ray joining point x and the origin, we obtain a point on the iso - quant f (z) = r.

144 13. Homogeneous and Homothetic Functions

Similarly if the function f is homogeneous of degree k, then translation of points on iso - quant

q = 1 by a factor r along the ray joining point x and the origin would generate the iso - quant q = rk ,

since f (rx) = rk f (x) = rk as f (x) = 1. Thus the level sets of a homogeneous function are radial

expansions and contractions of each other. This observation leads to following consequence.

Theorem 13.4. Suppose f : Rn+ R is a homogeneous function which is continuously differen-

tiable on Rn++ . Then, the tangent planes of the level sets of f have constant slope along each ray

from the origin.

of a homogeneous function.

such that f (x) = g(h(x)) holds for all x in the domain, then f is a homothetic function.

degree 2 and g(z) = z3 + z is a monotone transformation of z.

Theorem 13.5. Suppose f : Rn+ R be a strictly monotonic function. Then, f is homothetic if

and only if for all x and y in Rn+ ,

f (x) f (y) f (x) f (y) for all > 0.

the partial derivatives.

Theorem 13.6. Suppose f : Rn+ R is continuously differentiable on Rn++ . If f is homothetic

then, the tangent planes to the level sets of f are constant along rays from the origin; i. e., in other

words, for every i and j and for every x in Rn++

f (tx) f (x)

xi xi

(13.7) f (tx)

= f (x)

for all t > 0.

x j x j

The converse of this theorem is also true and is stated here for the sake of completeness.

Theorem 13.7. Suppose f : Rn+ R is continuously differentiable on Rn++ . If (13.7) holds for all

x in Rn++ , for every i and j and for all t > 0, then f is homothetic.

Chapter 14

Problem Set 6

x+ 3y+ z 2w = 1

2x+ 6y 2z 4w = 3

(a) Determine how many variables can be endogenous at any one time and show a partition of

the variables into endogenous and exogenous variables such that the system of equations

have a solution.

(b) Find an explicit formula for the endogenous variables in terms of the exiguous variables.

x+ 3y z+ w = 0

4x y+ z+ w = 3

7x+ y+ z+ 3w = 6

Is it possible to partition the variables into endogenous and exogenous variables such that the

system of equations have a unique solution.

(3) Show that the equation x2 xy3 + y5 = 19 is an implicit function of y in terms of x in the

neighborhood of (x, y) = (5, 2). Then estimate the value of y which corresponds to x = 4.9.

(a) If x = 6 and y = 3, find a value of z which satisfies the equation f (x, y, z) = 0.

(b) Verify if this equation

( defines

) z as(an )implicit function of x and y near x = 6 and y = 3.

z z

(c) If it does, compute x and y .

(6,3) (6,3)

(d) If x increases to 6.1 and y decreases to 2.8, estimate the corresponding change in z.

145

146 14. Problem Set 6

(5) Consider the profit maximizing firm described in the Example 12.2. If p increases by p and

w increases by w, what will be the change in the optimal input amount x?

(6) Consider 3x2 yz + xyz2 = 96 as defining x as an implicit function of y and z around the point

x = 2, y = 3, z = 2.

(a) If y increases to 3.1 and z remains same at 2, use the Implicit Function Theorem to estimate

the corresponding x.

(b) Use the quadratic formula to solve 3x2 yz + xyz2 = 96 for x as an explicit function of y and

z.

(c) Use the approximation by differentials on the explicit formula to estimate x when y = 3.1

and z = 2.

(d) Which of the two methods is easier?

f (x + y) = f (x) + f (y).

(8) Let f : Rn+ R be a non-decreasing, quasi-concave and homogeneous of degree one function.

Show that f must be concave on Rn+ .

(9) Let f be a continuous function from Rn+ to R, which is twice continuously differentiable on

Rn++ . Suppose f is homogeneous of degree m, where m is a positive integer 2. Show that

x H f (x)x = m(m 1) f (x)

for all x Rn++ where H f (x) is the Hessian of f evaluated at x.

Chapter 15

Unconstrained

Optimization

We call

(15.1) max f (x) , x D Rn ,

or

(15.2) min f (x) , x D Rn ,

where domain D is an open set, unconstrained optimization problems. There are no restrictions on

x within the domain. Furthermore, there are no boundary solutions, because the domain does not

include its boundary (recall the definition of open set). Note max f (x) , x Rn or min f (x) , x Rn

are unconstrained optimization problems since Rn is an open set. While solving unconstrained op-

timization problem, we want to use the tools we developed earlier, i.e., find points where f (x) = 0

and investigate the curvature / shape of the function.

Remark 15.1. An unconstrained optimization problem may not have a solution.

Example 15.1. Let f (x) = x2 . Then,

(15.3) max f (x) , x R

does not have a solution. See the graph of f (x) = x2 .

147

148 15. Unconstrained Optimization

3 2 1 1 2 3

Remark 15.2. A minimization problem can always be turned into a maximization problem and

vice versa:

(15.4) min f (x) max f (x) .

xD xD

We will see several examples of unconstrained optimization in these notes. Also there are

additional exercises in the problem set.

Theorem 15.1. First order necessary condition for local maxima / minima: Let A be an open

set in Rn , and let f : A R be a continuously differentiable function on A. If function f has local

maximum / minimum at x , then

( )

f x = 0

where 0 is a n 1 null vector.

Remark 15.3. The converse is not true.

Theorem 15.2. Second order necessary condition for local maxima / minima: Let A be an open

set in Rn , and let f : A R be a twice continuously differentiable function on A.

15.2. Maxima / Minima for C 2 functions of n variables 149

The first order and second order necessary conditions are useful tools to help us in ruling out

the points where a local maximum or local minimum cannot occur. This narrows down our search

for points where a local maximum or local minimum does occur. Examples below explain this

further.

set, and f a continuously differentiable function on A with f (x) = 2x.Considerthepointx = 1.

Then f (x ) = f (1) = 2(1) = 2. We apply Theorem 15.1 to conclude that x = 1 is not a point

of local maximum of f .

set, and f a twice continuously differentiable function on A. Consider the point x = 2. We can

calculate f (x ) = f (2) = 4 + 2(2) = 0, so the necessary condition of Theorem 15.1 is satisfied.

However this theorem in itself fails to provide any additional information at this stage. In other

words, we cannot conclude from Theorem 15.1 that x = 2 is a point of local maximum. Also, we

cannot conclude from Theorem 15.1 that x = 2 is not a point of local maximum. Theorem 15.2 is

useful at this point. We can calculate

f (x ) = f (2) = 2 > 0,

and so the necessary condition of Theorem 15.2 is violated. Consequently, by Theorem 15.2, we

can conclude that x = 2 is not a point of local maximum of f .

It is easy to see that the necessary first and second order conditions are not sufficient.

f (x) d 2 f (x)

Example 15.4. Let X = R be the domain and f (x) = x3 x4 . Then d dx = 3x2 4x3 and dx2

=

6x 12x2 are both 0 at x = 0. But x = 0 is not a local maximizer for f (x).

Theorem 15.3. Sufficient conditions for local maxima / minima: Let A be an open set in Rn , and

let f : A R be a twice continuously differentiable function on A.

(a) If x A is such that H f (x ) is negative definite and f (x ) = 0 then f has local maximum

at x .

(b) If x A is such that H f (x ) is positive definite and f (x ) = 0 then f has local minimum at

x .

It should be noted that the sufficient condition in Theorem 15.3 cannot be weakened to the

necessary condition in the statement of Theorem 15.2. The following example explains this point.

150 15. Unconstrained Optimization

Example 15.5. Let f : R R be given by f (x) = x3 for all x R. Then A = R is an open set, and

f is a twice continuously differentiable function on A. At x = 0,

f (x ) = f (0) = 0, and f (x ) = f (0) = 0,

so first order necessary condition and second order necessary condition are satisfied. But x is

clearly not a point of local maximum of f since f is an increasing function on A.

It may also be observed that the second order necessary condition in Theorem 15.2 cannot be

strengthened to the sufficient condition in the statement of Theorem 15.3. The following example

illustrates this point.

Example 15.6. Let f : R R be given by f (x) = x4 for all x R. Then A = R is an open set, and

f is a twice continuously differentiable function on R. Clearly, x = 0 is a point of local maximum

of f , since f (0) = 0, while f (x) < 0 for all x = 0. We can calculate that

f (x ) = f (0) = 0, and f (x ) = f (0) = 0.

Thus first order necessary condition (in Theorem 15.1) and second order necessary condition (in

Theorem 15.2) are satisfied are, but the second order sufficient condition (in Theorem 15.3) is

violated.

The above discussion shows that the second-order necessary conditions for a local maximum

are different from (weaker than) the second-order sufficient conditions for a local maximum. This

demonstrates the fact that, in general, the first and second derivatives of a function at a point do not

capture all aspects relevant to the occurrence of a local maximum of the function at that point.

Theorem 15.4. Concavity (convexity) and global maxima (minima): Let A be an open and con-

vex set in Rn , and let f : A R be a continuously differentiable function on A.

This is very easy to show. Note that concavity alongwith continuous differentiability of f

implies that for all x A,

f (x) f (x ) 6 f (x ) (x x ).

So f (x) f (x ) 6 0 or x is a point of global maximum of f on A.

Theorem 15.5. Let A be an open and convex set in Rn , and let f : A R be a twice continuously

differentiable function on A.

(a) If x A is such that f (x ) = 0 and H f (x) is negative semi-definite for all x A, then f

has global maximum at x .

15.2. Maxima / Minima for C 2 functions of n variables 151

(b) If x A is such that f (x ) = 0 and H f (x) is positive semi-definite for all x A, then f has

global minimum at x .

It is worth noting that Theorem 15.4 or Theorem 15.5 might be applicable in cases Theorem

15.3 is not applicable as the following example shows.

Example 15.7. Let f : R R be given by f (x) = x4 . Here, we note that f (0) = 0 and f (x) =

12x2 0 for all x R. Thus we can apply Theorem 15.4 or Theorem 15.5 and conclude that

x = 0 is a point of global maximum, and hence also a point of local maximum. But the conclusion

that x = 0 is a point of local maximum cannot be derived from Theorem 15.3, since f (0) = 0.

Now we explain the steps in applying these theorems via several examples.

Example 15.8. Consider X = R2+ and f (x) = x1 x2 2x14 x22 . The optimization exercise is to

maximize the objective function f (x) by choosing x X. The two first order conditions are

x2 8x13 = 0, and x1 2x2 = 0.

Solving the second equation for x1 , we have x1 = 2x2 . Substituting this into the first equation, we

have x2 64x23 = 0, which has three solutions:

1 1

x2 = 0, , and .

8 8

Then the first order conditions have three solutions,

( ) ( )

1 1 1 1

(x1 , x2 ) = (0, 0) , , , and , ,

4 8 4 8

but the last of these is not in the domain of f , and the first is on the boundary of the domain, giving

f (0, 0) = 0. Thus, we have a unique solution in the interior of the domain:

( )

( ) 1 1

x1 , x2 = , .

4 8

Example 15.9. Let us find maxima / minima (if any) for f : R3 R

f (x, y, z) = x2 + 2y2 + 3z2 + 2xy + 2xz.

Step 3. Find f (x, y, z) and set it equal to zero vector.

[ ] [ ]

f (x, y, z) = 2x + 2y + 2z 4y + 2x 6z + 2x = 0 0 0 .

The only solution is (x, y, z) = (0, 0, 0). So we have one candidate for local maximum or minimum.

Step 4. Compute H f .

2 2 2

H f (x, y, z) = 2 4 0 .

2 0 6

152 15. Unconstrained Optimization

Note that in this example, H f is independent of (x, y, z). So whichever property of H f , we get, will

be global.

Step 5. Determine the curvature. Begin with computing the leading principal minors.

D1 = 2 > 0, D2 = 2 4 2 2 = 4 > 0 and

D3 = 2 (24 0) 2 (12 0) + 2 (0 8) = 48 24 16 = 8 > 0

All leading principal minors are strictly positive H f is positive definite (x, y, z) including

(0, 0, 0) which implies that f is strictly convex.

Step 6. Conclude, using Theorem 15.4, that we have a global minimum at (0, 0, 0).

Example 15.10. Let us find maxima / minima (if any) for f : R2 R

f (x, y) = x3 + xy y3 .

[ ] [ ]

f (x, y) = 3x2 + y 3y2 + x = 0 0 .

( )

There are two solutions (x, y) = (0, 0) ; (x, y) = 13 , 13 .

Step 2 Compute H f .

[ ] ( ) [ ] [ ]

6x 1 1 1 2 1 0 1

H f (x, y) = Hf , = and H f (0, 0) = .

1 6y 3 3 1 2 1 0

)(

1 1

Step 3 Determine the curvature. For 3, 3

, the leading principal minors.

( )

1 1

D1 = 2 < 0, D2 = 3 > 0 H f , is negative definite.

3 3

For (0, 0),the principal minors are

D1 = 0, 0; D2 = 1 < 0 H f (0, 0) is neither negative semi-definite nor positive semi-definite.

( ) 15.3 on second order necessary conditions applies and we have strict

1 1

local maximum at 3 , 3 . The contrapositive of the second order necessary conditions (Theorem

15.2) shows that (0, 0) is neither a point of local maximum nor local minimum. It is an inflection

point.

15.2. Maxima / Minima for C 2 functions of n variables 153

[ ] [ ]

f (x, y) = 6x2 + y2 + 10x 2xy + 2y = 0 0 .

2xy + 2y = 0 y = 0 x = 1,

for x = 1, 6x2 + y2 + 10x = y2 4 = 0 y = 2 y = 2

5

for y = 0, 6x2 + y2 + 10x = 6x2 + 10x = 0 x = 0 x = .

3

( )

There are four solutions (x, y) = (0, 0) ; (1, 2) ; (1, 2) and 53 , 0 .

Step 2 Compute H f .

[ ]

12x + 10 2y

Hf = .

2y 2x + 2

Step 3

[ ]

10 0

H f (0, 0) = , D1 = 10 > 0, D2 = 20 > 0

0 2

H f (0, 0) is positive definite.

[ ]

2 4

H f (1, 2) = , D1 = 2 < 0, and 0, D2 = 16 < 0

4 0

H f (1, 2) is neither positive semi-definite nor negative semi-definite.

[ ]

2 4

H f (1, 2) = , D1 = 2 < 0, and 0, D2 = 16 < 0

4 0

H f (1, 2) is neither positive semi-definite nor negative semi-definite.

( ) [ ]

5 10 0 40

Hf ,0 = D1 = 10 < 0, D2 = >0

3 0 34

3

( )

5

H f , 0 is negative definite.

3

154 15. Unconstrained Optimization

( )

Step 4 Then Theorem 15.3 on sufficient conditions apply for (0, 0) and 53 , 0 . We have

( )

strict local minimum at (0, 0); and strict local maximum at 53 , 0 . The contrapositive of the

second order necessary conditions (Theorem 15.2) implies that neither local maximum not local

minimum exist at (1, 2) and (1, 2). They are inflection points.

regression coefficients in the method of ordinary least squares.

for all x R. Our objective is to find a function F (that is, we want to choose a R and b R)

such that the quantity

n

(15.5) [ f (xi ) yi ]2

i=1

is minimized. Thus the coefficients are such that the sum of the squares of the residuals (error

terms, i.e., the difference between the estimates and the actual observations) is minimized.

Define f : R2 R by

n

f (a, b) = [axi + b yi ]2

i=1

(a,b)

15.3. Application: Ordinary Least Square Analysis 155

can calculate

n n

f1 = 2 [axi + b yi ]xi = 2 [axi2 + bxi xi yi ]

i=1 i=1

n

f2 = 2 [axi + b yi ]

i=1

n

f11 = 2 xi2 ;

i=1

n

f12 = 2 xi ;

i=1

n

f21 = 2 xi ;

i=1

f22 = 2n

[ ]

2 ni=1 xi2 2 ni=1 xi

H f (a, b) = .

2 ni=1 xi 2n

The principal minors of order one for the Hessian are, f11 = 2 ni=1 xi2 < 0 and f22 = 2n < 0.

We need to check the determinant of the principal minor of order two to be non-negative. Thus, the

determinant of the Hessian of f is

[ ]2

n n

det(H f (a, b)) = 4n xi2 4 xi

i=1 i=1

|x y| x y.

We can take the two vectors x and the sum vector u and apply the inequality to get

|x u| x u

|x u|2 x2 u2

n [ ]

| xi |2 xi2 n.

i=1

Therefore, det(H f (a, b)) 0. Since f11 (a, b) 0, f22 (a, b) 0, and det(H f (a, b)) 0, H f (a, b) is

negative semi-definite. Consequently, if (a , b ) satisfies the first-order conditions, then (a , b ) is

156 15. Unconstrained Optimization

n n n

a xi2 + b xi = xi yi

i=1 i=1 i=1

n n

a xi + bn = yi

i=1 i=1

n n

xi yi

i=1 i=1

Denoting n by x and n by y (the mean of x and mean of y respectively), we get

(15.6) ax + b = y

Using this in the first equation leads to

n n

(15.7) a xi2 + (y ax)nx = xi yi

i=1 i=1

Thus,

n

xi yi

i=1

n xy

n = a

xi2

i=1

n x2

y ax = b

solves the problem. Note the solution is meaningful provided not all the xi are the same.

In the next exercise, we provide an alternative proof of the determinant of the Hessian is non-

negative.

Exercise 15.1.

2 2 + 2 .

(x1 + x2 + + xn )2 n(x12 + x22 + + xn2 ).

(c) Show that the point (a, b) is a global minimizer of the objective function (15.5).

Solution 15.1.

(a) Observe,

( )2 = 2 + 2 2 0

which shows that the desired inequality holds.

15.3. Application: Ordinary Least Square Analysis 157

(b) We use induction to show this. For n = 2, from part (a) we know

x12 + x22 + 2x1 x2 2(x12 + x22 )

(x1 + x2 )2 2(x12 + x22 ).

Next, we assume that the claim holds for some k N and show that it holds true for n = k + 1.

Let

(x1 + + xk )2 k(x12 + + xk2 ).

Then

2

2

2

) + + (xk2 + xk+1

2 2

) + xk+1

= (k + 1)(x12 + + xk+1

2

).

(c) Part (b) shows that the determinant of the Hessian matrix is non-negative. (This is an alterna-

tive proof, without using Cauchy-Schwarz inequality). Since the Hessian matrix is negative-

semidefinite for all points in R2 , the point (a, b) is a global minimizer of the objective function

(15.5).

There is yet another proof of this inequality which is quite short and one of you showed (thank

you) me in class today.

Observe that

[ ]2 ][ [ ]

n n n

ni=1 xi

n

4n xi2 4 xi = 4n xi

xi2 4 n

i=1 i=1 i=1 i=1 n

[ ]

n n

= 4n xi2 xi x .

i=1 i=1

[ ]

n n

xi2 xi x 0.

i=1 i=1

158 15. Unconstrained Optimization

Note

[ ]

n n n n

xi2 xi x = xi (xi x) = (xi x + x)(xi x)

i=1 i=1 i=1 i=1

n n

= (xi x)2 + x (xi x) 0,

i=1 i=1

since the first term being sum of squares is non-negative and the second term is zero because

ni=1 (xi x) = 0.

Chapter 16

Problem Set 7

g(x, y) = x3 + y3 3x 2y.

Write out g(x, y) and Hg (x, y). Show that g is convex in its domain and find its (global)

minimum.

(16.1) f (x) = x4 4x3 + 4x2 + 4?

Which if any of them are global maxima or minima?

(3) A monopolist producing a single output has two types of buyers. If it produces Q1 units for

buyers of type 1, then the buyers are willing to pay a price of 100 5Q1 dollars per unit. If it

produces Q2 units for buyers of type 2, then the buyers are willing to pay a price of 50 10Q2

dollars per unit. The monopolists cost of producing Q units of output is 50+10Q. How many

units the monopolist should produce to maximize profit?

(4) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of

w, and r for its labor (L), and capital inputs (K), and operates with the production function

Q = La K b .

(a) Write profits as a function of L, and K. Derive the first order conditions. Provide an eco-

nomic interpretation of the first order conditions.

(b) Solve for the optimal levels of L, and K.

(c) Check the second order conditions. What restrictions on the values of a, and b are necessary

for a profit maximum. Provide an economic interpretation of these restrictions.

159

160 16. Problem Set 7

(d) Find the signs of the partial derivatives of L with respect to P, w, and r.

(e) Derive the firms long run supply curve, i.e., Q as a function of the exogenous parameters.

Find the elasticities of supply with respect to w, r, and P. Do these elasticities sum to zero?

Provide an economic explanation for this fact.

(5) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of

w, v, and r for its labor (L), natural resource (R) and capital inputs (K), and operates with the

production function Q = A(L)a (K)b + ln R.

(a) Write profits as a function of L, R and K. Derive the first order conditions. Provide an

economic interpretation of the first order conditions.

Now take A = 3, a = b = 31 for remainder of the problem.

(b) Check the second order conditions.

(c) [Optional)] Solve for L . Find the change in L for a change in r when all other parameters

are constant by taking the partial derivatives of L with respect to r.

(d) [Optional)] Find the change in L for a change in v when all other parameters are constant

by taking the partial derivatives of L with respect to v.

(e) [Optional)] It is also possible to determine the changes in L when r or v values change

without explicitly solving for L by using the Implicit Function Theorem. You might like

to use a more general version of the Implicit Function Theorem (than what we stated in

class) to complete this exercise.

(i) Find the change in L for a change in r when all other parameters are constant.

(ii) Find the change in L for a change in v when all other parameters are constant.

Chapter 17

Optimization Theory:

Equality Constraints

The optimization problems we encounter in economics are, in general, constrained problems where

there are some restrictions on the set we can choose x from. Some examples of constrained opti-

mization problems we see are,

max u (x)

(17.1) x

subject to x B (p, I)

Producer Theory

max py w x

(17.2) y,x

subject to (y, x) Y

where

{ }

Y = (y, x) R Rn | y 6 f (x)

is the production possibility set with f (x) being the production function (one output, many inputs).

161

162 17. Optimization Theory: Equality Constraints

We will work with maximization problem as it is easy to turn a minimization problem into a

maximization problem. A constrained maximization problem has the following form.

max f (x)

x

subject to x G (x)

f (x) is called the objective function,

where x is called the choice variable,

G (x) is called the constraint set.

We assume the objective function to be C 2 so that we can use differential calculus techniques.

Example 17.2. Consider following optimization problem.

max f (x)

(17.3) x

subject to x [a, b]

( )

(17.4) x X [a, b] f x > f (x) x [a, b] .

First question to answer is,

Does a solution exist? Note f is continuous (because it is C 2 ) and [a, b] is a non-empty compact

set. We can use Weierstrass Theorem to show existence of a maximum and minimum. Having

shown the existence, there are two possibilities:

Case (i) If solution is interior, then x must also be a local maximum, i.e.,

( ) ( )

(17.5) f x = 0 f x 6 0.

Hence we are able to apply earlier theorems to interior solutions.

If x = a, then f (a) 6 0.

17.2. Equality Constraint 163

In general, constrained optimization problems are of two categories, (a) with equality con-

straint and (b) with inequality constraint. We discuss them next.

g1 (x) = 0

where x Rn , or,

k

g (x) = 0

{ }

(17.6) G (x) = x Rn | g (x) = 0 .

Note that g (x) = (g1 (x) , , gk (x)) is k-dimensional row vector. The interesting case will be

k < n as the following example shows.

max f (x)

xR2 {

x1 + x2 2 = 0 : g1 (x) = 0

subject to

3 x1 + x2 1 = 0 g (x) = 0.

1

: 2

( )

The only point in the constraint set is (x1 , x2 ) = 32 , 21 . Maximizing over this set is trivial. The

solitary point in the constraint set is also the solution.

Definition 17.1. A point x G (x) is point of local maximum of f subject to the constraint g (x) =

0, if there is > 0 such that x G (x) B (x , ) implies f (x) 6 f (x ).

Definition 17.2. A point x G (x) is point of global maximum of f subject to the constraint

g (x) = 0, if x solves the problem

max f (x)

subject to g (x) = 0.

Theorem 17.1. Necessary condition for a constrained local maximum (Lagrange Theorem) Let

A Rn be open and f : A R, g : A Rk be C 1 functions. Suppose x is a point of local

maximum of f subject to the constraint g (x) = 0. Suppose further that g (x ) = 0. Then there is

Rk such that

( ) ( )

(17.7) f x = g x .

164 17. Optimization Theory: Equality Constraints

It is important to check the constraint qualification condition g(x ) = 0, for applying the

conclusion of Lagranges theorem. Without this condition, the conclusion of Lagranges theorem

would not be valid, as the following example shows.

Example 17.4.

Let f : R2 R be given by

f (x1 , x2 ) = 4x1 + 3x2 for all (x1 , x2 ) R2 ;

and let g : R2 R be given by

g(x1 , x2 ) = x12 + x22 .

Consider the constraint set C = {(x1 , x2 ) R2 : g(x1 , x2 ) = 0}. The only element of this set is (0,0),

so (x1 , x2 ) = (0, 0) is a point of local maximum of f subject to the constraint g(x) = 0. Observe that

the conclusion of Lagranges theorem does not hold here. For, if it did, there would exist R

such that

f (0, 0) = g(0, 0)

But this means that

(4, 3) = (0, 0)

which is a contradiction. The problem here is that

g(x1 , x2 ) = g(0, 0) = (0, 0),

so the constraint qualification condition is violated.

In the next Theorem, we use notation C to denote the constraint set, i.e.,

{ }

C = x Rn : g(x) = 0 .

Theorem 17.2. Sufficient Conditions for a Global maximum: Let A Rn be an open convex set

and f : A R, g : A Rk be C 1 functions. Suppose (x , ) C Rk satisfies

( ) ( )

(17.8) f x = g x .

If L (x, ) = f (x) g (x) is concave in x on A, then x is a point of global maximum of f

subject to constraint g (x) = 0.

L (x, ) L (x , ) [ f (x ) g(x )] (x x )

by concavity of L in x on A. Using the first-order condition, the term on the right hand side of the

inequality [ f (x ) g(x )] (x x ) is zero and we get

f (x) g(x) = L (x, ) L (x , ) = f (x ) g(x ).

Since x C, and x C, we have g(x) = g(x ) = 0. Thus, f (x) f (x ), and so x is a point of

global maximum of f subject to the constraint g(x) = 0.

17.2. Equality Constraint 165

We use the following steps to solve the optimization problem with equality constraint. Let f

and gi , i = 1, , k, be C 1 functions.

Necessity Route:

Step 1 Existence of solution can be shown by using Weierstrass Theorem. For this we need to

show that the constraint set is closed and bounded.

L (x, ) = f (x) g(x) = f (x) 1 g1 (x) k gk (x)

where i , i = 1, , k are Lagrange multipliers.

Step 3 Take the partial derivative with respect to each variable x1 , xn , and Lagrange multi-

pliers 1 , , k .

L (x, )

= 0, i = 1, , n;

xi

L (x, )

= 0, i = 1, , k.

i

These are n + k first order conditions (FOCs) for n + k unknowns.

Step 5 Let

{ }

M = (x, ) Rn+k | x satisfies gi (x) = 0, i = 1, , k and FOCs hold.

Verify that g (x ) = 0 holds at each point in the set M. Then evaluate f at each (x, ) M and

find the maximum.

Sufficiency Route: We know that if f and 1 g1 (x) , , k gk (x) are such that L (x, ) is con-

cave, then the FOCs are sufficient for a maximum. Hence if we can show concavity, then any point

satisfying the FOC will be a solution. We illustrate the use of the two routes through following

examples.

Remark 17.2. Note if f is not concave, we have to compare points in M (x, ).

Example 17.5.

max f (x1 , x2 ) = x12 x22

xR2+

subject to 5x1 + 10x2 = 10

166 17. Optimization Theory: Equality Constraints

x2

2 x1

The constraint set consists of 1 0.5x1 and non-negative values of x1 and x2 subject to the equality

constraint. To get the constraint in g (x) = 0 form, we rearrange it

5x1 + 10x2 10 = 0.

Constraint set is closed. Take any convergent sequence {xn } G (x) x. Since 5x1n + 10x2n

10 = 0, x1n > 0, x2n > 0, n N, and weak inequalities are preserved in the limit,

So x G (x).

x1 6 2 and x2 6 1 x 6 (2, 1) = 22 + 12 = 5.

So 5 will serve as a bound. So the constraint set is compact and non-empty and the objective

function f is continuous, hence Weierstrass theorem is applicable and a solution exists.

17.2. Equality Constraint 167

L (x, ) = x12 x22 (5x1 + 10x2 10)

L (x, )

= 2x1 5 = 0

x1

L (x, )

= 2x2 10 = 0

x2

L (x, )

= (5x1 + 10x2 10) = 0.

Now from the first two FOCs

4x1 = 2x2 2x1 = x2

and from the third FOC,

5x1 + 20x1 10 = 0

10 2 4 4

x1 = = , x2 = , = .

25 5 5 25

We get a candidate for solution

( )

2 4 4

m1 = , , .

5 5 25

Since we know a solution exists, it must necessarily be either m1 or one of the corners (2, 0) or

(0, 1). The constraint qualification

( ) [ ]

g x = 5 10 = 0

is verified trivially.

2 4 4

f (2, 0) = 4, f (0, 1) = 1, f , = .

5 5 5

( )

The solution then is x = 2 4

5, 5 .

Sufficiency Route

[ ]

f (x) = 2x1 2x2

[ ]

2 0

H f (x) =

0 2

D1 = 2 < 0, D2 = 4 > 0

So H f (x) is negative definite x. Since H f (x) is negative definite x, f is concave. The constraint

g(x) is concave as it is linear. Also > 0. Then f (x) g (x) is concave as a sum of concave

168 17. Optimization Theory: Equality Constraints

functions.

( )Then we know that the FOCs are sufficient condition for a maximum. So the point

2 4

x = 5 , 5 is our solution.

Example 17.6. (Non-concave objective function)

max f (x1 , x2 ) = x12 x2

subject to 2x12 + x22 = 3.

The constraint set is an ellipsoid and can be rewritten as 3 2x12 x22 = 0. Here the sufficiency

route will not work as the objective function is not concave.

[ ]

2x2 2x1

H f (x) =

2x1 0

D1 = 2x2 , D2 = 4x12 , D2 < 0 x = 0

which means that H f (x) is indefinite x = 0. So f is not concave. Hence we have to use the

necessity route.

( )2

Constraint set is closed. Take any convergent sequence {xn } G (x) x. Since 2 x1n +

( n )2

x2 = 3 n N, and weak inequalities are preserved in the limit,

2 (x1 )2 + (x2 )2 = 3.

So x G (x).

3

x1 6 < 3 and x2 6 3.

2

So x 6 3, 3 = 3 + 3 = 6. So the constraint set is compact and non-empty and the

objective function f is continuous, hence Weierstrass theorem is applicable and a solution exists.

( )

L (x, ) = x12 x2 3 2x12 x22

L (x, )

= 2x1 x2 + 4x1 = 0

x1

L (x, )

= x12 + 2x2 = 0

x2

L (x, )

= (3 2x12 x22 ) = 0.

17.2. Equality Constraint 169

Now

x2

2x1 (x2 + 2) = 0 x1 = 0 = .

2

Case (i)

x1 = 0, x2 = 3, = 0.

We get two candidates for solution

( ) ( )

m1 = 0, 3, 0 , m2 = 0, 3, 0 .

Case (ii)

x2

= x12 x22 = 0

2

x1 = x2 x1 = x2

3 2x12 x22 = 0

gives x1 = 1 x1 = 1. If

1 1

x1 = 1 x2 = 1 x2 = 1, = = .

2 2

Similarly for x1 = 1. We get four more candidates for solution.

( ) ( )

1 1

m3 = 1, 1, , m4 = 1, 1, ,

2 2

( ) ( )

1 1

m5 = 1, 1, , m6 = 1, 1, .

2 2

Thus

M = {m1 , m2 , , m6 } .

The constraint qualification

( ) [ ]

g x = 4x1 2x2 = 0

for each mi M. Verify that

( )

f (0, 3) = 0 = f 0, 3 ,

f (1, 1) = f (1, 1) = 1,

f (1, 1) = f (1, 1) = 1.

The solution then is x = (1, 1) and x = (1, 1).

Example 17.7.

max f (x1 , x2 ) = x1 x2

xR2+

subject to x1 + 4x2 = 16 or 16 x1 4x2 = 0.

170 17. Optimization Theory: Equality Constraints

The Hessian is [ ]

0 1

H f (x) =

1 0

which is indefinite for all values of x R2+ . Hence the objective function is not concave.

Observe that x is restricted to R2+ and the equality constraint holds. This constraint set is non-

empty as (0, 4) is contained in it, and compact. A solution to this problem exists as f is continuous

and the constraint set is non empty and compact, hence Weierstrass theorem is applicable.

L (x, ) = x1 x2 (16 x1 4x2 )

L (x, )

= x2 + = 0

x1

L (x, )

= x1 + 4 = 0

x2

L (x, )

= (16 x1 4x2 ) = 0.

The FOCs will give us interior candidates. We will still need to compare with the corners. Now

x1 = 4x2

8x2 = 16 x2 = 2 and

x1 = 8, = 8.

We get one candidate for solution

m1 = (8, 2, 8) .

The constraint qualification

( ) [ ]

g x = 1 4 = 0

is satisfied trivially for m1 . Compare it with the corners (0, 4) , (16, 0) and verify that

f (0, 4) = 0 = f (16, 0) , f (8, 2) = 16.

The solution then is x = (8, 2).

Example 17.8.

max f (x1 , x2 ) = ln x1 + ln x2

xR2+

subject to x1 + 4x2 = 16 or 16 x1 4x2 = 0.

Here the necessity route does not work as the objective function is not defined at the corners

of the constraint set, x = (16, 0) or x = (0, 4) as ln y is not defined for y = 0. Weierstrass Theorem

17.2. Equality Constraint 171

cannot be applied. Let us use the sufficiency route. Since ln is not defined at the corners, the

problem can be modified as follows

max f (x1 , x2 ) = ln x1 + ln x2

xR2++

subject to 16 x1 4x2 = 0.

The Lagrangian and the FOCs are

L (x, ) = ln x1 + ln x2 (16 x1 4x2 )

L (x, ) 1

= + = 0 x1 = 1

x1 x1

L (x, ) 1

= + 4 = 0 4x2 = 1

x2 x2

L (x, )

= (16 x1 4x2 ) = 0.

So x1 = 4x2 from the first two FOCs. Substituting it in the third FOC, we get x1 = 8, x2 = 2, = 18 .

The Hessian is

x12 0

H f (x) = 1

0 x12

2

1 1

D1 = 2

< 0, D2 = 2 2 > 0, x R2++ .

x1 x1 x2

Hence H f (x) is negative definite x R2++ , so f is concave. Also g (x) = 16 x1 4x2 is linear,

hence concave. Lastly > 0. So L (x, ) is concave and the FOCs are sufficient for maximum.

Hence x = (8, 2) is the solution.

max f (a, b) = ab

(17.9) (a,b)R2+

subject to a + b = 2.

Note the constraint set C = {a > 0, b > 0, a + b = 2} is non empty, (2, 0) is contained in it,

closed since weak inequalities are preserved in the limit, and bounded as (a, b) 6 (2, 2) 2 2 =

2 2. The objective function is continuous. Hence by Weierstrass Theorem a solution exists.

Note that at the solution a > 0, b > 0. Hence we can rewrite the problem as under

max f (a, b) = ab

(a,b)R2++

subject to g (a, b) = 2 a b = 0.

172 17. Optimization Theory: Equality Constraints

L (a, b, ) = ab (2 a b)

L (x, )

= b+ = 0

a

L (x, )

= a+ = 0

b

L (x, )

= (2 a b) = 0.

Now

a = b a = b = 1 =

We get one candidate for solution

m1 = (1, 1, 1) .

The constraint qualification

( ) [ ]

g x = 1 1 = 0

is satisfied trivially for m1 . Compare it with the corners (0, 2) , (2, 0) and verify that

f (0, 2) = 0 = f (2, 0) , f (1, 1) = 1.

The solution then is (1, 1). In other words, we have shown that

(17.10) ab 6 1.

Now let x1 > 0, x2 > 0 be arbitrary with

x1 + x2 = x > 0.

Then

2x1 + 2x2 = 2x

2x1 2x2

+ = 2

x x

Note that a = 2xx1 > 0, b = 2xx2 > 0 and a + b = 2. So we can apply the result shown above.

( )( )

2x1 2x2

ab = 61

x x

( )

x2 x1 + x2 2

x1 x2 6 =

4 2

x1 + x2

x1 x2 6

2

which is the Arithmetic mean Geometric mean inequality.

Chapter 18

Optimization Theory:

Inequality Constraints

The more general constrained optimization problem deals with inequality constraint. Note that the

equality constraint g (x) = 0 can be expressed as g (x) > 0 and g (x) 6 0.

The constrained maximization problem with which we are concerned is the following:

max f (x)

subject to g j (x) 0 for j = 1, , m

and x Rn+ .

continuously differentiable functions from X to R.

Gm+ j (x) = x j for j = 1, , n.

173

174 18. Optimization Theory: Inequality Constraints

max f (x)

subject to G j (x) 0 for j = 1, , m + n

and x X.

C = {x X : G(x) 0}

where, G(x) = [G1 (x), , Gm+n (x)].

Definition 18.1. Kuhn-Tucker Conditions: Let X be an open set in Rn , and f , G j ( j = 1, , m +

n) be continuously differentiable on X. A pair (x , ) in X Rm+n

+ satisfies the Kuhn-Tucker

conditions if

m+n

(i) Di f (x ) + j Di G j (x ) = 0; i = 1, , n

j=1

(ii) G(x ) > 0 and G(x ) = 0.

entiable on X. Suppose a pair (x , ) X Rm+n + , satisfies the Kuhn-Tucker conditions. If X

is convex and f , G j ( j = 1, , m + n) are concave on X, then x is a point of constrained global

maximum.

We illustrate the application of this Theorem through examples. First we take a linear objective

function.

Example 18.1. Solve

max f (x, y) = ax + by

(x,y)R2+

subject to p1 x + p2 y 6 M.

where a, b, p1 , p2 and M are positive parameters. Find a solution to the problem for the following

parameter configurations

a p1 a p1

(i) > (ii) < ,

b p2 b p2

using Kuhn Tucker sufficiency theorem.

(i) Let { }

X = (x, y) R2 | x > 1, y > 1 .

18.1. Inequality Constraint 175

{ }

X C = (x, y) R2 | x 6 1, or y 6 1

is closed.

taking sum of two continuous functions.

Let g1 (x, y) = M p1 x p2 y, g2 (x, y) = x, g3 (x, y) = y are linear and hence continuous func-

tions. Further fx (x, y) = a, fy (x, y) = b are continuous functions. Hence f , g j ( j = 1, , 3) are

continuously differentiable on X.

y1 > 1, y2 > 1 y1 + (1 ) y2 > 1 (0, 1)

( )

x1 + (1 ) x2 , y1 + (1 ) y2 X.

Function f (x, y) is concave as sum of two concave functions and g j ( j = 1, , 3) are concave being

linear functions. Hence for the following problem

max f (x, y) = ax + by

(x,y)X

subject to p1 x + p2 y 6 M, x > 0, y > 0.

all conditions of Kuhn-Tucker sufficiency theorem are satisfied. We need to find pair ((x , y ) , )

X R3+ , that satisfies the Kuhn-Tucker conditions.

m

(i) Di f (x ) + j Di g j (x ) = 0; i = 1, , n,

j=1

(ii) g(x ) > 0 and g(x ) = 0.

They are

a 1 p1 + 2 = 0

b 1 p2 + 3 = 0

M p1 x p2 y > 0, 1 (M p1 x p2 y) = 0

x > 0, 2 x = 0; y > 0, 3 y = 0

1 > 0 M p1 x p2 y = 0.

176 18. Optimization Theory: Inequality Constraints

x2

M

p2

M x1

( ) p1

a p1 M

Figure 18.1. Case (i): b > p2 : Optimal Consumption Bundle = p1 , 0

So x = y = 0 is ruled out. Take Case (i) ab > pp12 . Consider x > 0, y = 0. Note 2 = 0, x = M

p1 ,

a a

= 1 , b p2 + 3 = 0,

p1 p1

( )

a a p2

3 = p2 b = b 1 > 0,

p1 b p1

a p1 a p2

since b > p2 or b p1 > 1. Hence

( )

M a a p2

x = , y = 0, 1 = , 2 = 0, 3 = b 1 > 0

p1 p1 b p1

p1

is a solution. Case (ii) ab < p2 . Consider x = 0, y > 0. Note 3 = 0, y = M

p2 ,

b b

= 1 , a p1 + 2 = 0

p2 p2

( )

b b p1

2 = p1 a = a 1 > 0

p2 a p2

a p1 b p1

since b < p2 or 1 < a p2 . Hence

( )

M b b p1

x = 0, y = , 1 = , 2 = a 1 > 0, 3 = 0

p2 p2 a p2

is a solution.

18.1. Inequality Constraint 177

x2

M

p2

M x1

( ) p1

a p1

Figure 18.2. Case (ii): b < p2 : Optimal Consumption Bundle = 0, M

p2

x

max f (x, y) = 1+x +y

(x,y)R2+

subject to x + 4y 6 16.

using Kuhn Tucker sufficiency theorem.

(i) Let { }

X = (x, y) R2 | x > 1, y > 1 .

Then X is open as its complement

{ }

X C = (x, y) R2 | x 6 1, y 6 1

is closed. (ii) Function f (x, y) is continuous as x, y, 1 + x are continuous, 1 + x > 0 and f (, ) is ob-

tained by taking quotient of two continuous functions x and 1 + x, with non-vanishing denominator

and then adding a continuous function. Functions

g1 (x, y) = 16 x 4y; g2 (x, y) = x; g3 (x, y) = y

1

are linear and hence continuous. Further fx (x, y) = , f (x, y) = 1 are continuous functions.

(1+x)2 y

Hence f , g j ( j = 1, , 3) are continuously differentiable on X.

178 18. Optimization Theory: Inequality Constraints

y1 > 1, y2 > 1 y1 + (1 ) y2 > 1 (0, 1)

( )

x1 + (1 ) x2 , y1 + (1 ) y2 X.

Function f (x, y) is concave as sum of two concave functions (exercise) and g j ( j = 1, , 3) are

concave being linear functions. Hence for the following problem

x

max f (x, y) = 1+x +y

(x,y)X

subject to x + 4y 6 16, x > 0, y > 0.

all conditions of Kuhn -Tucker sufficiency theorem are satisfied. We need to find pair ((x , y ) , )

X R3+ , that satisfies the Kuhn-Tucker conditions. They are

1

1 + 2 = 0

(1 + x)2

1 41 + 3 = 0

16 x 4y > 0, 1 (16 x 4y) = 0

x > 0, 2 x = 0; y > 0, 3 y = 0.

1 > 0 16 x 4y = 0

1 4

2

= 1 ; 1 + 3 = 0

(1 + 16) 289

285

3 = <0

289

This contradicts 3 > 0.

1

= 1 ; 1 1 + 2 = 0

4

1 3

1 + 2 = 0; 2 = < 0.

4 4

This contradicts 2 > 0.

18.1. Inequality Constraint 179

1

= 1 ; 1 41 = 0,

(1 + x)2

(1 + x)2 = 4 x = 1 > 0

15

16 x 4y = 0 y = > 0.

4

( )

Note that all conditions are satisfied. The Theorem asserts that 1, 15

4 is a global maximum and

therefore solves both the problem.

Example 18.3.

In the above example, let the price of good y be p > 0 and income be I > 0. We can redo the

exercise by going over the Kuhn Tucker conditions again. They are

1

1 + 2 = 0

(1 + x)2

1 p1 + 3 = 0

I x py > 0, 1 (I x py) = 0

x > 0, 2 x = 0; y > 0, 3 y = 0.

1 > 0 I x py = 0

and x = y = 0 is ruled out because I > 0. There are three remaining cases.

1

= 1

(1 + I)2

p p

1 2

+ 3 = 0 3 = 1.

(1 + I) (1 + I)2

( )

If p

1 > 0 p > (I + 1)2 , then 3 > 0. So solution is I, 0, 1

, 0, p 2 1 if p >

(1+I)2 (1+I)2 (1+I)

2

(I + 1) .

180 18. Optimization Theory: Inequality Constraints

1

= 1

p

1

1 1 + 2 = 0 1 + 2 = 0

p

1

2 = 1.

p

( )

If 1

p 1 > 0 1 > p, then 2 > 0. So solution is 0, pI , 1p , 1p 1, 0 if p 6 1.

1

= 1 , 1 p1 = 0,

(1 + x)2

(1 + x)2 = p x = p1 > 0

I +1 p

I x py = 0 y = > 0.

p

( )

I+1 p

Hence for p > 1 and I + 1 > p, the solution is p 1, p , 1p , 0, 0 . Combining them the

( )

solution x , y , 1 , 2 , 3 is

( )

I, 0, 1

2 , 0,

p

1 if p > (I + 1)2

( (1+I) (1+I)2

)

0, p , p , p 1, 0 if p 6 1, and

I 1 1

( )

I+1 p

p 1, p , 1p , 0, 0 if 1 < p < (I + 1)2 .

The Kuhn Tucker Sufficiency Theorem asserts that this solution is a global maximum and therefore

solves both the problem.

We know from the definitions, that if x is a point of global maximum, then x is also a point of local

maximum. The situations under which the converse is true are given by the following theorems.

global maximum of f on A.

18.2. Global maximum and constrained local maximum 181

is the unique point of global maximum of f on A.

(i) B(x, ) A, and

(ii) x is the unique point of maximum of f on B(x, ).

If f is quasi-concave on A, then x is the unique point of global maximum of f on A.

(a) Assume that x is not a global maximum of f on A. Then there exists another point x A such

that x = x and f (x) > f (x).

Since x is a point of local maximum, there exists > 0 such that f (x) f (x) for all

x A B(x, ).

Consider a point x A on the line joining the two points x and x, i.e.,

x = x + (1 )x,

for some [0, 1]. Since A is convex, we know x A. By concavity of f , we have for all

[0, 1]

f (x + (1 )x) f (x) + (1 ) f (x).

Since f (x) > f (x), we also have for all (0, 1] that

f (x + (1 )x) f (x) + (1 ) f (x) > f (x) + (1 ) f (x) = f (x).

We wish to take sufficiently close to zero (but not equal to zero) so that

x x + (1 )x B(x, ).

For this, let us denote d(x, x) = d and note

d(x , x) = d(x + (1 )x, x) = ||d(x, x) = d.

If we set = 2d , then we know

d(x , x) = d = d = ,

2d 2

or x B(x, ).

Also x A since A is a convex set. Therefore, we have found a point x A B(x, ) such

that f (x ) > f (x), which contradicts that x was a point of local maximum. It follows that x

must be a global maximum of f on A.

182 18. Optimization Theory: Inequality Constraints

(b) Assume that x is not a point of global maximum of f on A. Then there exists another point

x A such that x = x and f (x) > f (x).

Since x is a point of local maximum, there exists > 0 such that f (x) f (x) for all

x A B(x, ).

Consider a point x A on the line joining the two points x and x, i.e.,

x = x + (1 )x,

for some [0, 1]. Since A is convex, we know x A. Since f is strictly quasi-concave, we

have for all (0, 1)

{ }

f (x + (1 )x) min f (x), f (x) = f (x).

We wish to take > 0 sufficiently small so that

x x + (1 )x B(x, ).

For this, let us denote d(x, x) = d and note

d(x , x) = d(x + (1 )x, x) = ||d(x, x) = d.

If we set = 2d , then we know

d(x , x) = d = d = ,

2d 2

or x B(x, ).

Also x A since A is a convex set. Therefore, we have found a point x A B(x, ) such

that f (x ) > f (x), which contradicts that x was a point of local maximum. It follows that x

must be a global maximum of f on A.

To show uniqueness, if not, then there exists x A such that

f (x) = f (x ).

But then, since f is strictly quasi-concave and A is convex,

{ }

f (0.5x + 0.5x ) > min f (x), f (x ) = f (x ) = f (x).

This contradicts the fact that x is a point of global maximum.

(c) Assume that x is not the unique point of global maximum of f on A. Then there exists another

point x A such that x = x and f (x) > f (x).

Since x is the unique point of local maximum in the open ball B(x, ), f (x) > f (x) for all

x A B(x, ).

Consider a point x A on the line joining the two points x and x, i.e.,

x = x + (1 )x,

for some [0, 1]. Since A is convex, we know x A. Since f is quasi-concave, we have for

all (0, 1) { }

f (x + (1 )x) min f (x), f (x) = f (x).

18.2. Global maximum and constrained local maximum 183

x x + (1 )x B(x, ).

For this, let us denote d(x, x) = d and note

d(x , x) = d(x + (1 )x, x) = || d(x, x) = d.

If we set = 2d , then we know

d(x , x) = d = d = ,

2d 2

or x B(x, ).

Also x A since A is a convex set. Therefore, we have found a point x A B(x, ) such

that f (x ) f (x), which contradicts that x was the unique point of local maximum. It follows

that x must be the unique point of global maximum of f on A.

This theorem shows that there is an important difference between concavity and quasi-concavity

in going from the local maximum property to the global maximum property. With quasi-concavity,

we need something more (some strictness) to make the arguments work. In (b), this additional

condition takes the form of strict quasi-concavity. In (c), it takes the form of assuming that the

point of local maximum is unique. This underlying theme (that one needs something in addition

to quasi-concavity to make the arguments and results work) recurs in Arrow- Enthovens theory

of quasi-concave programming, where the attempt is made to replace the concavity conditions of

Kuhn-Tucker with quasi-concavity.

The following example shows that in Theorem 18.2(a), we cannot replace concavity of f by

quasi-concavity of f , and still preserve the conclusion.

Example 18.4. Let A be the interval (0, 6) in R. Clearly, A is an open, convex set. Let f : A R

be defined as follows:

x for x (0, 2)

f (x) = 2 for x [2, 4]

x2 for x (4, 6)

Then, f is a non-decreasing function on A, and therefore quasi-concave. The point x = 3 is clearly

a point of local maximum, since f (x) = 2 f (x) for all x A B(x, 1). However, x is not a point

of global maximum of f on A, since (for example), f (5) = 3 > 2 = f (x).

Following theorem describes the conditions in which x is a point of constrained local maxi-

mum, then x is also a point of constrained global maximum.

184 18. Optimization Theory: Inequality Constraints

X. Suppose x is a point of constrained local maximum. Then, x is a point of constrained global

maximum.

{ }

C = x X : g j (x) 0 .

Since x is a point of constrained local maximum, there is > 0, such that for all x B(x, ) C, we

have f (x) f (x).

Now, if x is not a point of constrained global maximum, then there is some x C, such that

f (x) > f (x). One can choose 0 < < 1 with sufficiently close to zero, such that

x [ x + (1 )x] B(x, ).

For this, we need

x + (1 )x x < .

This implies if

<

x x

then x B(x, ). Take

=

2x x

so that x B(x, ). Since X is convex and g ( j = 1, , m) are concave, we claim that C is a convex

j

Let y C and y C be two arbitrary points. By definition of the constraint set C, y and y are

in X and therefore, y [ y + (1 )y ] X for all [0, 1]. Also by concavity of the constraint

functions,

g j (y) = g j ( y + (1 )y ) g j (y) + (1 )g j (y ) 0 + (1 ) 0 = 0,

for all j = 1, , m.

Therefore, x [ x + (1 )x] C.

Thus

x [ x + (1 )x] B(x, ) C.

Also, since f is concave,

f (x) = f ( x + (1 )x) f (x) + (1 ) f (x) > f (x) + (1 ) f (x) = f (x).

18.2. Global maximum and constrained local maximum 185

But this contradicts the fact that x is a point of constrained local maximum.

Observe that we did not need to assume that the objective function is differentiable on the

domain X in this proof.

Chapter 19

Problem Set 8

Let C = (c1 , c2 , c3 ) be a non-zero vector in R3 . Consider the following constrained maxi-

mization problem:

max 3i=1 ci xi

(19.1) subject to 3i=1 xi2 = 1

and (x1 , x2 , x3 ) R3

(a) Show, by using Weierstrass theorem, that there exists x R3 which solves (29.1).

(b) Use Lagranges theorem to show that

3

(19.2) ci xi = C.

i=1

(c) Let p, q be arbitrary non-zero vectors in Rn . Using result in (b), show that |pq| pq.

Solve the following constrained optimization problems.

(2) Let f : R2 R.

max f (x, y) = x2 3xy

(19.3) (x,y)R2+

subject to x + 2y = 10.

1 2

max f (x, y) = x 3 y 3

(19.4) (x,y)R2+

subject to 2x + y = 4.

187

188 19. Problem Set 8

max f (x, y) = xy

(19.5)

subject to x + y 6 6, x > 0, y > 0.

max f (x, y) = x + ln(1 + y)

(19.6)

subject to x 0, y 0 and x + py m.

(6) Let X be a non-empty, convex set in R2 . Let g be a continuous function from X to R, and

let f be a strictly quasi-concave function from X to R. Consider the following constrained

optimization problem.

max f (x)

(19.7) subject to g(x) 0

and xX

}

max f (x)

(19.8)

subject to x X

(a) Suppose that x is a solution to (29.26), and g(x) > 0. Is x also a solution to problem

(29.25)? Explain.

(b) Suppose that x is a solution to (29.26), but x is not a solution to (29.25). Show that if x is

any solution to (29.25), then we must have g(x) = 0.

(7) Suppose that a consumer has the utility function U(x, y) = xa yb and faces the budget constraint

px x + py y I.

(A) Utility Maximization

(a) What are the first order conditions for utility maximization?

(b) Solve for the consumers demands for goods x and y.

(c) Solve for the value of . What is the economic interpretation of ? When is an

increasing, decreasing or constant function of income?

(d) Show that the second order conditions hold?

(e) Show that the implicit function theorem value of dx dI is identical to the value of taking

the partial derivative of x with respect to I.

19. Problem Set 8 189

and income. Use x and y to solve for the indirect utility function. Is it true that the

partial of the indirect utility function with respect to income equals ?

(B) Expenditure Minimization:

Now consider the dual of the utility maximization problem. The dual problem is to min-

imize expenditures, Px x + Py y, subject to reaching a given level of utility, u0 (the constraint

is therefore U0 xa yb = 0).

(a) What are the first order conditions for expenditure minimization?

(b) Use the first order conditions to solve for x and y (these are called the Hicksian or

compensated demand functions).

(c) Check the second order conditions.

(d) Write the level of income, I, necessary to reach U0 as a function of U0 , prices, and

parameters. How does this expenditure function relate to the indirect utility function?

(e) To avoid confusion, let us call solution for utility maximization of good x as x and

solution for good x in expenditure minimization as h . Prove that

x h x

= x .

Px Px I

Interpret this answer.

(8) Suppose a consumer has the utility function U = a ln(x x0 ) + b ln(y y0 ) where a, b, x0 and

y0 are positive parameters. Assume that the usual budget constraint applies.

(a) Solve for the consumers demand for good x.

(b) Find the elasticities of demand for good x with respect to income and prices.

(c) Show that the utility function U = 45(x x0 )3.5a (y y0 )3.5a would have yielded the same

demand for good x.

Suppose a consumer has the utility function,

where a > 0, b > 0 and c > 0 are such that a + b + c = 1. The budget constraint is

px + qy + rz I.

In other words, the prices of good x, y and z are p, q and r respectively and the consumer has

an income I. The prices and income are positive.

In addition, the consumer faces a rationing constraint. He is not allowed to buy more than

k > 0 units of good x.

(a) Solve the optimization problem.

(b) Under what condition on the various parameters, is the rationing constraint binding?

190 19. Problem Set 8

(c) Show that when the rationing constraint binds, the income that the consumer would have

liked to spend on good x but cannot do so now is split between good y and z in proportions

b : c.

(d) Would you expect rationing of bread purchases to affect demand for butter and rice in this

way? If not, how would you expect the bread-butter-rice case to differ from the result in

(c)?

Chapter 20

Envelope Theorem

Let f (x, ) be a continuously differentiable function of x Rn and a parameter . For each choice

of , consider the unconstrained maximization problem:

max f (x, )

where choice variable is x. It is of interest to us as to how the maximizer value x changes as the

parameter value changes.

Theorem 20.1. Let x () be a solution of this problem and also assume that x () is a continuously

differentiable function of . Then,

d

f (x (), ) = f (x (), )

d

d d

f (x (), ) = f (x (), ) xi () + f (x (), )

d i x i d

or

d

f (x (), ) = f (x (), )

d

since xi f (x (), ) = 0 for i = 1, , n by the First Order conditions for the solution.

191

192 20. Envelope Theorem

Example 20.1. Consider the problem of maximizing the function f (x, a) = 2x2 + 2ax + 4a2 with

respect to x for any given value of a. What is the effect of a unit increase in the value of a on the

maximum value of f (x, a).

This can be done directly by computing the x which maximizes f . The first order condition

yields

f (x) = 4x + 2a = 0.

So x = 0.5a. We can plug this into f (x, a) which leads to

Observe that f (x (a), a) increases at the rate of 9a as a increases. Alternatively we could apply the

Envelope Theorem to get

d f f (x (a), a)

= = 2x + 8a = 9a

da a

since x (a) = 0.5a.

(x)R+

Let us denote the input level at which the maximum profit is attained by x . We observe that x is a

function of the parameters p and w. The maximum profit is the value function of this exercise and

we call it the profit function.

By Envelope Theorem

(p, w)

= f (x (p, w)) > 0.

p

Thus the profit function is increasing in the price of the output. Also

(p, w)

= x (p, w) < 0.

w

So the profit function is decreasing in the price of the output. Further, it also shows that

(p, w)

x (p, w) = .

w

The profit maximizing input stock can be obtained by taking partial derivative of the profit function

with respect to w (a result known as Hotellings Lemma).

20.2. Meaning of the Lagrange multiplier 193

In this section we will see that the multipliers measure the sensitivity of the optimal value of the

objective function to the changes in the right-hand sides (parameters) of the constraints. In this

sense, they provide a natural measure of the value for scarce resources in economics maximization

problems.

Consider a simple maximization problem with two variables and one equality constraint. Let

f R2 R be denoted as f (x, y).

:

max f (x, y)

(20.2) (x,y)R2+

subject to h(x, y) = a.

Let (x (a), y (a)) be a solution to the above problem for any given parameter value a. Thus

f (x (a), y (a)) is the corresponding optimal value of the objective function. Let the Lagrange

multiplier be denoted by (a). Following theorem shows that (a) measures the rate of change

of the optimal value of the objective function f with respect to a.

Theorem 20.2. Let f and h be continuously differentiable functions of two variables. For any fixed

value of the parameter a, let (x (a), y (a)) be the solution of the optimization problem (20.2) with

the corresponding Lagrange multiplier (a). Assume that x (a), y (a) and (a) are continuously

differentiable functions of a and the constraint qualification holds at (x (a), y (a)). Then,

d f (x(a), y(a))

(a) = .

da

L f (x, y) (h(x, y) a)

where a is a parameter. The solution of this problem, (x (a), y (a)), (a) satisfies the First Order

conditions.

L (x (a),y (a), (a))

x =0

f (x (a),y (a), (a)) h(x (a),y (a), (a))

x (a) x =0

L (x (a),y (a), (a))

y =0

f (x (a),y (a), (a))

L (x (a),y (a), (a))

=0

h(x (a), y (a)) a =0

for all values of a. Also, since h(x (a), y (a)) = a for all a, we get,

h(x (a), y (a), (a)) dx (a) h(x (a), y (a), (a)) dy (a)

+ =1

x da y da

194 20. Envelope Theorem

for all a. Now we can use the Chain Rule and the two First Order conditions,

d f (x (a),y (a) f (x (a),y (a)) dx (a) (a)) dy (a)

da = x da + f (x (a),y y da

h(x (a),y (a), (a)) dx (a) h(x (a),y (a), (a)) dy (a)

= x da +

y da

h(x (a),y (a), (a)) dx (a) h(x (a),y (a), (a)) dy (a)

= [ x da + y da ]

=

1 =

The general envelope theorem arise in the case of constrained optimizations where both the ob-

jective function as well as the constraint functions are functions of some parameters. Consider for

example the optimization exercise as follows:

max f (x, a)

subject to g j (x, a) = 0 for j = 1, , m

(20.3) and x Rn+ .

In this case, the objective function f as well as the constraints g1 , , gm depend on the

parameter a. Following theorem shows that the rate of change of f (x (a), a) with respect to a

equals the partial derivative with respect to a, not of f but of the corresponding Lagrangian function

L.

Theorem 20.3. Let f , g1 , , gm be continuously differentiable functions and let

x (a) = (x1 (a), x2 (a), , xn (a))

denote the solution of the optimization problem (20.3) for any fixed value of the parameter a.

Assume that x (a), and the Lagrange multipliers 1 (a), , m (a) are continuously differen-

tiable functions of a and the constraint qualification condition holds. Then,

d f (x (a), a) L (x (a), (a), a)

(20.4) = .

da a

Chapter 21

Elementary Concepts in

Probability

Probability theory deals with random events, events whose occurrence cannot be predicted with

certainty. There are at least three sources of randomness. Firstly by nature many features of our

world are stochastic. Evolution of such a diverse variety of life is witness to unpredictability in

the universe and environment. Second source of randomness: Many events are the result of a very

large number of actions and decisions. Third source of randomness: Some variables may appear

random because they are measured with error.

Even though we are not sure about the outcomes of a random event, we can attach to each

outcome a number called probability.

We first describe the set of outcomes of a random event, i.e., a set whose elements are all possible

outcomes of a random event. It is known as the sample space and denoted by .

Example 21.1. The set of possible outcomes of flipping a fair coin is

= {H, T }.

= {1, 2, 3, 4, 5, 6}

195

196 21. Elementary Concepts in Probability

= {HT, T H, T T, HH}.

It is easy to list the set of outcomes for flipping n coins, but very soon the list becomes too long.

(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)

(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)

(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)

=

(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)

(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)

(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)

where the outcome (i, j) is said to occur if i appeared on the first die, and j appeared on the second

die.

The set of outcomes for measuring the lifetime of a car, consists of non-negative real numbers.

= [0, ).

Next we form the set F that contains all elements of the set as well as their unions and

complements. Thus if A and B are in F , so does A B, Ac , and Bc . The set F , which is closed

under the operations of union and complements, is known as algebra.

Example 21.2. The algebra for the outcomes of flipping a fair coin is

/ , {H}, {T }}.

F = {0,

The algebra for the outcomes for flipping two coins is

/ , {T T }, {HH}, {HT, T H}, {HH, T T }, {HH, HT, T H}, {T T, HT, T H}}.

F = {0,

We can now define a probability measure by assigning to each element of sample space , a

probability P.

Definition 21.1. The set function P is called a probability measure if

/ = 0;

(i) P(0)

(ii) P() = 1;

21.1. Discrete Probability Model 197

/

(iii) P(A B) = P(A) + P(B) for all A, B and A B = 0.

The three conditions listed above are the axioms of probability theory.

Example 21.3. For the outcomes of flipping two fair coins,

P(HH) = P(HT ) = P(T T ) = P(T H) = 0.25.

The triple of the set of outcomes, the algebra, and the probability measure (, F , P) is referred

to as a probability model.

In next step, we assign probabilities to the random events. Three sources of attaching proba-

bilities to the outcomes of random events are (a) equally likely events, (b) long run frequencies and

(c) degree of confidence (subjective or Bayesian approach). Observe that even though we assign

probabilities to different events, the mathematical theory for dealing with the random events and

their probabilities remain the same.

We define random variable next. The rule that specifies a real number to the outcomes is called

a random variable. More formally,

Definition 21.2. A random variable is a set function that maps the set of outcomes of a random

event to the set of real numbers.

Such a function is not unique and depending on the the purpose at hand, we may define one or

many random variables to the same random event.

Example 21.4. For the outcomes of flipping two fair coins, let us define a random variable X as

the number of heads. Then, we have

X(HH) = 2; X(HT ) = X(T H) = 1, X(T T ) = 0.

We could have defined the random variable X as the number of tails. Then, we have

X(HH) = 0; X(HT ) = X(T H) = 1, X(T T ) = 2.

In collecting labor statistics, we are interested in the characteristics of the respondents. For

example, we may ask if a person is in the labor force or not, employed or unemployed. We could

also be interested to learn the demographic characteristics of the respondents like gender, race, age

etc. For each of these answers we can define one or more binary variables. For example let X = 1 if

a respondent who is in the labor force is unemployed and X = 0 if employed. We can define Y = 1

if the respondent is a woman and employed, Y = 0 otherwise.

198 21. Elementary Concepts in Probability

Example 21.5. For the outcomes of flipping a fair coin three times, let us define a random variable

X as the number of heads. The set of outcomes for flipping a fair coin three times is

= {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }.

Then, the probability distribution is

P(X = 0) = 0.125; P(X = 1) = 0.375; P(X = 2) = 0.375; P(X = 3) = 0.125.

Probability distributions become unwieldy as the number of outcomes becomes large or infi-

nite. One way to summarize the information about a probability distribution is through its moments

as mean which measure the central tendency, and variance, which measures the dispersion or vari-

ability of the distribution. Another moment reflects the skewness of the distribution to the left or

to the right and kurtosis which is an indicator of the bundling of the outcomes near the mean : the

more values are concentrated near the mean, the taller is the peak of the distribution.

The first moment of the distribution which is the expected value or the mean of the distribution

is defined as

n

E(X) = = xi P(xi ).

i=1

Example 21.6. For the distribution of the number of heads in three flips of a coin, we have,

= 0 P(X = 0) + 1 P(X = 1) + 2 P(X = 2) + 3 P(X = 3).

which yields the mean as

= 0 + 0.375 + 0.750 + 0.375 = 1.50

n

E(X r ) = mr = xir P(xi ).

i=1

Example 21.7. For the distribution of the number of heads in three flips of a coin, the second

moment is

E(X 2 ) = 02 P(X = 0) + 12 P(X = 1) + 22 P(X = 2) + 32 P(X = 3).

which yields the second moment as

= 0 + 0.375 + 1.50 + 1.125 = 3

Another measure (which is of great importance) is the variance or the second moment around

the mean :

n

E(X )2 = 2 = (xi )2 P(xi ).

i=1

21.2. Marginal and Conditional Distribution 199

The formula for the variance can be rewritten using the binomial expansion as

n

E(X )2 = (xi )2 P(xi )

i=1

n n

= xi2 P(xi ) 2 xi P(xi ) + 2

i=1 i=1

n

= xi2 P(xi ) 2

i=1

Example 21.8. For the distribution of the number of heads in three flips of a coin, the variance is

Mean is a measure of central tendency of a distribution showing its center of gravity whereas

the variance and its square root, called the standard deviation measure the dispersion or the volatil-

ity of the distribution. The advantage of using the standard deviation is that it measures the disper-

sion in the same measurement units as the original variable. In finance, variance of returns of an

asset is used as a measure of risk.

As we have observed before, a random event may give rise to a number of random variables each

defined by a different set function whose domains are the same set. In the Table below we present

such a situation where random variables X and Y and their probabilities are reported. Think of

Y as the annual income in units of thousand dollars of a profession and X as gender, with X = 0

denoting men and X = 1 denoting women. The information contained in the table is probability

of joint events, i.e., the probability of X and Y each taking a particular value. For instance the

probability of X = 1 and Y = 120 is 0.11, which is denoted as

Such a probability is referred to as joint probability because it shows the probability of a woman

earning $120000 a year.

200 21. Elementary Concepts in Probability

X Y P

0 60 0.02

0 70 0.04

0 80 0.07

0 90 0.09

0 100 0.10

0 110 0.06

0 120 0.03

0 130 0.02

0 140 0.01

0 150 0.01

1 70 0.01

1 80 0.02

1 90 0.04

1 100 0.08

1 110 0.11

1 120 0.11

1 130 0.09

1 140 0.05

1 150 0.03

1 160 0.01

If we are interested only in X, then we can sum up the overall relevant values of Y and get the

marginal probability of X. For example,

P(X = 1) = P(X = 1,Y = 70) + + P(X = 1,Y = 160) = 0.01 + 0.02 + + 0.03 + 0.01 = 0.55.

n

P(X = xk ) = P(X = xk ,Y = y j )

j=1

In similar manner, we can calculate the probability of X = 0 which would be 0.45. Thus the

marginal distribution of X is

X P(X)

0 0.45

1 0.55

21.2. Marginal and Conditional Distribution 201

Observe that in this example, the marginal distribution of X shows the distribution of men and

women in that profession (45% men and 55% women), whereas the marginal distribution of Y

would show the distribution of income for both men and women, i.e., profession as a whole.

Sometimes we may be interested to know the probability of Y = 110 when we already know

that X = 1. Thus we want to know the conditional probability of Y = 110, given that X = 1.

P(Y = 110, X = 1) 0.11

P(Y = 110|X = 1) = = = 0.20

P(X = 1) 0.55

In general,

P(Y = y j , X = xk )

P(Y = y j |X = xk ) = .

P(X = xk )

We have computed the conditional distribution of Y |X = 0 and Y |X = 1.

60 0.044 70 0.018

70 0.089 80 0.036

80 0.156 90 0.073

90 0.2 100 0.145

100 0.222 110 0.200

110 0.133 120 0.200

120 0.067 130 0.164

130 0.044 140 0.091

140 0.022 150 0.055

150 0.022 160 0.018

A conditional distribution has a mean, variance and other moments. The mean is

n

E(Y |X = xk ) = y j P(y j |X = xk ).

j=1

Variance and other higher moments of the conditional distribution can be computed similarly.

n

E(Y |X = 0) = y j P(y j |X = 0)

j=1

= 60 0.044 + 70 0.089 + 80 0.156 + 90 0.2 + 100 0.222

+ 110 0.133 + 120 0.067 + 130 0.044 + 140 0.022 + 150 0.022

= 101.4.

. We can compute the conditional mean for X = 1 to be

202 21. Elementary Concepts in Probability

E(Y |X = 1) = 111.4.

n

E(Y ) = EX E(Y |X) = E(Y |X = x j )P(X = x j )

j=1

E(Y ) = E(Y |X = 0)P(X = 0) + E(Y |X = 1)P(X = 1)

= 101.4 0.45 + 111.4 0.55 = 107.9

It is easy to infer that if E(Y |X = x j ) = 0 for all values of x, i.e., the conditional expectation of Y

equals zero, then the unconditional expectation E(Y ) = EX E(Y |x) = 0. However, the reverse is not

true. E(Y ) = 0 does not imply that E(Y |X) = 0 for all values of x.

Many variables we come across in economics are continuous in nature as against discrete. In

assigning probabilities to continuous variables, we face the problem that no matter how small is

the interval of values of the continuous variable, there are infinitely many points in it. If we assign

positive probabilities to each point, the sum of such probabilities would diverge which violates the

axiom of probability theory, i. e., the sum of probabilities should add up to one.

This problem is circumvented by assigning probabilities to the segments of the interval within

which the random variable is defined.

P(X 5), or P(4 < X 2)

Example 21.9. A simple example of a continuous random variable is the uniform distribution.

Variable X can take any value between a and b and the probability of X falling within the segment

[a, c] is proportional to the length of the interval compared to the interval [a, b].

ca

P(a < X c] =

ba

F(x) = P(X x)

and has to conform to the following conditions:

21.4. Continuous Random Variables 203

F(x1 ) F(x2 ), if x1 < x2 .

(c)

F() = lim F(x) = 0, and F() = lim F(x) = 1.

x x

These conditions are the counterpart of the discrete case and entail that probability is always posi-

tive and the sum of probabilities adds to one.

Now we define the probability model for continuous random variables. Consider the extended

real line R = R {, } which shall play the same role for the continuous variables as plays

for the discrete variables, (the set of all possible outcomes). Consider the half closed intervals on

R,

(a, b] = [x R : a < x b}]

and form finite sums of such intervals provided the intervals are disjoint:

n

A = (a j , b j ], n < .

j=1

A set consisting of all such sums plus the empty set 0/ is an algebra, but it is not a -algebra. The

smallest -algebra that contains this set is called the Borel set and is denoted by B(R). Finally we

define the probability measure as

F(x) = P(, x].

The triple (R, B(R), P) is our probability model for continuous random variables.

Chapter 22

Solution to PS 1

table.

A B A B A B (A B) A B A B (A B) A B

1 2 3 4 5 6 7 8 9 10

T T T T F F F F F F

T F F T T F T T F F

F T F T T T F T F F

F F F F T T T T T T

(b) Claim (b) is proved by comparing columns 9 and 10.

(c) (A B) A B

A B A B (A B) B A B

1 2 3 4 5 6

T T T F F F

T F F T T T

F T T F F F

F F T F T F

205

206 22. Solution to PS 1

A B C A C B C A B (A C) (B C) (A B) C

1 2 3 4 5 6 7 8

T T T T T T T T

T T F F F T F F

T F T T T T T T

T F F F T T F F

F T T T T T T T

F T F T F T F F

F F T T T F T T

F F F T T F T T

(e) This claim is true.

If n is even, then n + 1 is odd. If n is odd, then n + 1 is even. Hence both cannot be even.

(f) Let n = 1, then n2 = 1 = n.

(g) Let x > 1.

x NO n N x = 2n + 1,

x 2

= (2n + 1)2

= 4n2 + 4n + 1

( )

= 2 2n2 + 2n + 1

(22.1) x 2 NO .

For x = 1, x2 = 1 which is odd.

(a) Set S is closed and bounded, and S is not compact.

(b) Set S is compact, and S is either not closed or unbounded.

(c) Function f is continuous and not differentiable.

(b) If there does not exist a y such that xy = 1, then x = 0.

(4) (a) The mistake is in assuming the same value of k for m and n. The correct proof should be

Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2p + 1 for some

integers k and p. Therefore, 2m + 3n = 2(2k) + 3(2p + 1) = 4k + 6p + 3 = 2(2k + 3p +

1) + 1 = 2l + 1; where l = 2k + 3p + 1. Since k, p Z, l Z. Hence, 2m + 3n = 2l + 1 for

some integer l, whence 2m + 3n is an odd integer.

(b) The mistake is in showing the claim for one particular value of n. The claim holds for all

positive integers. The correct proof should be

22. Solution to PS 1 207

all positive integers.

where m N, then y = 2 (2m). Hence y is divisible by 2.

(ii) Contradiction There exist a number y which is not divisible by 2 but is divisible

by 4. Since y = 4m where m N, we know that y = 2 (2m) and so y is divisible by 2.

This contradicts our initial assumption.

(b) There is no greatest negative real number.

Proof. Assume, to the contrary, that there is a greatest negative real number x. Then, x y

for every negative real number y. Consider the number 2x . Since x is a negative real number,

so too is 2x . Multiplying both sides of the inequality 12 < 1 by x, which is negative, gives

x x

2 > x. Hence, 2 is a negative real number that is greater than x, which is a contradiction.

Hence our assumption that there is a greatest negative real number is false. Thus there is

no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

Proof. Assume, to the contrary, that there exists a non-zero rational number p and an

irrational number q whose product is a rational number. Thus, by definition of rational

numbers, p = ab and p q = r = dc for some integers a; b; c and d witha = 0, b = 0 and

d = 0. Hence,

c

r d bc

q= = a =

p b ad

r Q, which is a contradiction. Hence our assumption that there exists a non-zero rational

number and an irrational number whose product is a rational number is false. Thus, the

product of a rational number and an irrational number is irrational.

When n = 1, the statement P(1) : 1 = 12 holds trivially.

(ii) For every integer n > 1, let P(n) be the statement P(n) : 1 + 3 + + (2n 1) = n2 .

For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k > 1

and assume that P(k) is true; that is, assume that 1 + 3 + + (2k 1) = k2 . For the

inductive step, we need to show that P(k + 1) is true. That is, we show that

208 22. Solution to PS 1

1 + 3 + + (2k 1) + (2k + 1) = (1 + 3 + + (2k 1)) + (2k + 1)

= k2 + (2k + 1) (by the inductive hypothesis)

= (k + 1)2 ;

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n 1;

that is,

1 + 3 + + (2n 1) = n2

is true for every positive integer n.

(b) (i) Base of induction:

When n = 1, the statement P(1) : 1 = 1(1+1) 2 is certainly true since 1(1+1)

2 = 22 = 1.

This establishes the base case when n = 1.

(ii) For every integer n > 1, let P(n) be the statement P(n) : 1 + 2 + + n = n(n+1)

2 . For

the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k > 1 and

assume that P(k) is true; that is, assume that 1 + + k = k(k+1)

2 . For the inductive

step, we need to show that P(k + 1) is true. That is, we show that

(k + 1)(k + 2)

1 + 2 + + k + (k + 1) = .

2

Evaluating the left-hand side of this equation, we have

1 + 2 + + k + (k + 1) = (1 + 2 + + k) + (k + 1)

k(k + 1)

= + (k + 1) (by the inductive hypothesis)

2

k(k + 1) 2(k + 1)

= +

2 2

(k + 1)(k + 2)

= ;

2

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n 1;

that is,

n(n + 1)

1+2++n =

2

is true for every positive integer n.

(c) (i) Base of induction:

[ ]2

When n = 1, the statement P(1) : 1 = 1(1+1) 2 = 1 is certainly true since 1(1+1)

2 =

2

= 1. This establishes the base case when n = 1.

2 [ ]2

(ii) For every integer n > 1, let P(n) be the statement P(n) : 13 +23 + +n3 = n(n+1)

2 .

For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k > 1

22. Solution to PS 1 209

[ ]2

and assume that P(k) is true; that is, assume that 13 + + k3 = k(k+1)

2 . For the

inductive step, we need to show that P(k + 1) is true. That is, we show that

[ ]

(k + 1)(k + 2) 2

1 + 2 + + k + (k + 1) =

3 3 3 3

.

2

Evaluating the left-hand side of this equation, we have

13 + + k3 + (k + 1)3 = (13 + + k3 ) + (k + 1)3

[ ]

k(k + 1) 2

= + (k + 1)3 (by the inductive hypothesis)

2

[ ]

2 4(k + 1)

2 k

= (k + 1) +

4 4

[ ] [ ]

(k + 1) 2 2 (k + 1) 2

= [k + 4k + 4] = (k + 2)2

2 2

[ ]

(k + 1)(k + 2) 2

= ;

2

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n 1;

that is,

[ ]

n(n + 1) 2

1 ++n =

3 3

2

is true for every positive integer n.

(d) It is an example of an arithmetic - geometric series. Let us denote the sum by S, i.e.

Multiplying both sides by q, we get

qS = aq + (a + r)q2 + (a + 2r)q3 + + (a + (n 1)r)qn

Subtracting it from S, we get

All terms except the first and the last term on the right hand side constitute a geometric

series with first term being rq, the common ratio being q and the number of terms being

n 1.

a (a + (n 1)r)qn rq + rq2 + + rqn1

S= +

1q 1q

210 22. Solution to PS 1

qr(1 qn1 )

.

1q

We substitute this for the sum and get S as

rq(1qn1 )

a (a + (n 1)r)qn (1q)

S= +

1q 1q

a (a + (n 1)r)qn rq(1 qn1 )

= + .

1q (1 q)2

(7) To show that the formula holds for n = 0, we must show that

0

r0+1 1

ri = r1

.

i=0

The left-hand side of this equation is 0i=0 ri = r0 = 1, while the right-hand side is r r11 = 1,

0+1

since r = 1. Hence the formula holds for n = 0. For the inductive hypothesis, let k be an

arbitrary (but fixed) integer such that k 0 and assume that ki=0 ri = r r11 . For the inductive

k+1

k+1 i rk+2 1

step, we need to show that i=0 r = r1 . Evaluating the left-hand side of this equation, we

have

k+1 k

ri = ri + rk+1 (writing the (k + 1)st term separately)

i=0 i=0

rk+1 1

= + rk+1 (by the inductive hypothesis)

r1

rk+1 1 (r 1)rk+1

= +

r1 r1

r 1 + r rk+1

k+1 k+2

=

r1

r 1

k+2

= ;

r1

thus verifying the claim. Hence, by the principle of mathematical induction, the formula is true

for all integers n 0.

In the limiting case of n , the sum is well-defined for |r| < 1. Also the sum is 1r 1

in

this case. In case of |r| 1 it is not well defined in case of n , though it is defined for all

n N.

(8) (a) We proceed by mathematical induction. When n = 2, the result is true since in this case

n3 n = 23 2 = 8 2 = 6 and 6 is divisible by 6. Hence, the base case when n = 2 is

true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k 2

22. Solution to PS 1 211

and assume that the property holds for n = k, i.e., suppose that k3 k is divisible by 6. For

the inductive step, we must show that the property holds for n = k + 1. That is, we must

show that (k + 1)3 (k + 1) is divisible by 6. Since k3 k is divisible by 6, there exists,

by definition of divisibility, an integer r such that k3 k = 6r. Now, by the laws of algebra

and the inductive hypothesis, it follows that

= (k3 k) + 3(k2 + k)

= 6r + 3k(k + 1)

Now, k(k + 1) is a product of two consecutive integers, and is therefore even. Hence,

k(k + 1) = 2s for some integer s. Thus, 6r + 3k(k + 1) = 6r + 3(2s) = 6(r + s), and so, by

substitution, (k + 1)3 (k + 1) = 6(r + s), which is divisible by 6. Therefore, (k + 1)3

(k + 1) is divisible by 6, as desired. Hence, by the principle of mathematical induction, the

property holds for all integers n 2.

(b) We proceed, as before, by mathematical induction. When n = 3, the inequality holds since

in this case 2n = 23 = 8 and 2n + 1 = 2 3 + 1 = 7, and 8 > 7. Hence, the base case when

n = 3 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that

k > 3 and assume that the inequality holds for n = k, i.e., suppose that 2k > 2k + 1. For

the inductive step, we must show that the inequality holds for n = k + 1. That is, we must

show that 2k+1 > 2(k + 1) + 1. Now,

2k+1 = 2 2k

> 2 (2k + 1) (by the inductive hypothesis)

= 2(k + 1) + 2k

> 2(k + 1) + 1 (since k 3),

as desired. Hence, by the principle of mathematical induction, the inequality holds for all

integers n 3.

(9) By the Quotient Remainder theorem, for d = 6, for all natural number m, m = 6n + r where

n is integer and r {0, 1, 2, 3, 4, 5}. Since m is prime, it cannot be of the form 6n (divisible

by 6), 6n + 2, or 6n + 4 (divisible by 2) or 6n + 3 (divisible by 3). Thus the only remaining

possibilities are 6n + 1 and 6n + 5.

212 22. Solution to PS 1

|9 5x| 11

9 5x 11 or (9 5x) 11

(10) 9 5x 9 11 9 9 + 5x 11

5x 2 9 + 5x + 9 11 + 9

1 1

5x 2 5x 20

5 5

2 1 1

x 5x 20

5 5 5

Chapter 23

Solution to PS 2

(1) We need to verify that it satisfies three conditions of the distance function.

(a) (i) Non-negativity is obvious as the absolute value is non-negative. If x = y, then d(x, y) =

0. Also if

n

d (x, y) = | xi yi |= 0

i=1

then xi yi = 0 for all i = 1, , n. This implies that x = y.

(ii) Symmetry is obvious too since absolute value function is symmetric,

| a b |=| b a | .

| xi zi |6| xi yi | + | yi zi |

holds for all i = 1, 2, , n. Hence

n n n

|xi zi | 6 |xi yi | + |yi zi |

i=1 i=1 i=1

or

d(x, z) 6 d(x, y) + d(y, z).

Hence it is a distance function.

(b) (i) Non-negativity is obvious as the maximum of two absolute values is non-negative. If

x = y, d(x, y) = 0. Also

d (x, y) = max{|x1 y1 | , |x2 y2 |} = 0

|x1 y1 | = 0 = |x2 y2 | x = y.

213

214 23. Solution to PS 2

| a b |=| b a | .

(iii) Triangle Inequality I: Note that max{a, b} > a and max{a, b} > b. Using this we

have

d(x, y) > |x1 y1 | and d(x, y) > |x2 y2 |

d(y, z) > |y1 z1 | and d(y, z) > |y2 z2 |

d(x, y) + d(y, z) > |x1 y1 | + |y1 z1 | > |x1 z1 |

d(x, y) + d(y, z) > |x2 y2 | + |y2 z2 | > |x2 z2 | .

It follows that

d(x, y) + d(y, z) > max{|x1 z1 | , |x2 z2 |} = d (x, z) .

Hence it is a distance function.

(iv) Triangle Inequality II: Consider the case when d(x, z) = |x1 z1 |, i.e., |x1 z1 |

|x2 z2 |.

Then using triangle inequality for the absolute value function,

d(x, z) = |x1 z1 | |x1 y1 | + |y1 z1 |

d(x, y) + d(y, z)

The inequality in second line follows from the fact that either d(x, y) = |x1 y1 | or

d(x, y) > |x1 y1 |. Similar observations hold for d(y, z). The second case of d(x, z) =

|x2 z2 | will be similar. Hence it is a distance function.

(c) (i) Non-negativity: d(x, y) 0 for all x, y in Rn , and thus 1 + d(x, y) 1 for all x, y in

Rn . As a result, d1 (x, y) 0 for all x, y in Rn .

By the definition of d1 (x, y), d1 (x, y) = 0 if and only if d(x, y) = 0. But d(x, y) = 0 if

and only if x = y.

(ii) Since d(x, y) = d(y, x), it is straightforward to see that d1 (x, y) = d1 (y, x).

(iii) Triangle Inequality I

d1 (x, z) d1 (x, y) + d1 (y, z)

d(x, z) d(x, y) d(y, z)

+

1 + d(x, z) 1 + d(x, y) 1 + d(y, z)

d(x, z)[1 + d(x, y)][1 + d(y, z)] d(x, y)[1 + d(x, z)][1 + d(y, z)]

+ d(y, z)[1 + d(x, y)][1 + d(x, z)]

d(x, z) d(x, y) + d(y, z) + 2d(x, y)d(y, z) + d(x, y)d(y, z)d(x, z)

Since d(x, y) + d(y, z) d(x, z), d(a, b) 0 for any (a, b) Rn Rn , the last inequal-

ity is always true. Thus d1 (x, z) d1 (x, y) + d1 (y, z) for all x, y, z in Rn .

23. Solution to PS 2 215

We use notation a d(x, z) and b d(x, y) + d(y, z). Then

a b a + ab b + ab

a b

a(1 + b) b(1 + a)

1+a 1+b

d(x, z) d(x, y) + d(y, z)

1 + d(x, z) 1 + d(x, y) + d(y, z)

d(x, z) d(x, y) d(y, z)

+

1 + d(x, z) 1 + d(x, y) 1 + d(y, z)

d1 (x, z) d1 (x, y) + d1 (y, z).

[ ]

(2) It is bounded. Take B = 2, x 6 2, x n=1

1 2

n, n . But it is NOT closed as

[ ]

1 2

, = (0, 2].

n=1

n n

So it is not compact.

(3)

(A B)c Ac Bc is TRUE.

Let, x (A B)c

x

/ (A B)

x

/ Ax

/B

x AC x BC

x AC BC

(A B)c Ac Bc is FALSE.

Let, x Ac Bc and let x Ac x

/ Bc

x

/ Ax B x AB

/ (A B)C .

x

(4) It is enough to show that one of the properties of the vector space is not satisfied by this space.

Take scalar multiplication by 2. Let (x1 , x2 ) C and let = 2 be a scalar. Then

(2x1 , 2x2 ) R2 : (2x1 )2 + (2x2 )2 = 4.

Hence (2x1 , 2x2 )

/ C and so C is not a vector space.

216 23. Solution to PS 2

(5) In this case the commutative property of the sum of vectors does not hold. Consider a = (2, 3)

and b = (4, 5).

Then a + b = (2 + 4, 3 5) = (6, 2) and b + a = (4 + 2, 5 3) = (6, 2). Hence

(2, 3) + (4, 5) = (4, 5) + (2, 3).

So V is not a vector space.

(6) In this case also, the commutative property of the sum of vectors does not hold. Consider as

before, a = (2, 3) and b = (4, 5).

Then a + b = (2 + 2 4, 3 + 3 5) = (10, 18) and b + a = (4 + 2 2, 5 + 3 3) = (8, 14).

Hence

(2, 3) + (4, 5) = (4, 5) + (2, 3).

So V is not a vector space.

We split the proof in two parts.

(i)

(23.1) (J K)c J c K c .

Let

x (J K)c

x

/ (J K)

x

/ J x

/K

x JC x K C

x JC K C .

(ii) Next

Jc Kc Jc Kc

x JC K C

x JC x K C

x

/ J x

/K

x

/ J K

x (J K)C .

(b) (J K)c = J c K c .

(23.2) (J K)c J c K c

23. Solution to PS 2 217

Let

x (J K)c

x

/ (J K)

x

/ J x

/K

x JC x K C

x JC K C .

Next

J c K c (J K)c

Let

x Jc Kc

x Jc x Kc

x

/ J x

/K

x

/ J K

x (J K)c .

n > N, (xn + yn ) (x + y) < .

Note

|xn + yn x y| = |xn x + yn y| 6 |xn x| + |yn y| ,

by triangle inequality. Since

xn x, N1 s.t. n > N1 , |xn x| <

2

yn y, N2 s.t. n > N2 , |yn y| < .

2

Let N = max {N1 , N2 }. Hence

n > N, |xn x| + |yn y| < + = ,

2 2

|xn + yn x y| < .

So {xn + yn } x + y.

(9) We know that if a sequence is convergent then it is bounded. The contrapositive statement

will be, If a sequence is not bounded then it is not convergent.. The sequence xn = n, n N

is NOT bounded. No matter which B we choose as a bound, there will be a natural number

greater than it. We now use the contrapositive to conclude that {xn }

n=1 is not convergent.

218 23. Solution to PS 2

(10) Since {xn } is a Cauchy sequence, for > 0, there exist N N such that m, n > N implies

that |xn xm | < . Choose = 1, m = N, then

Let { }

B = max |x1 | , |x2 | , , 1 + |xN | ,

then xn 6 B, n N.

} to 2 being sum of a constant sequence {xn } =

{2, 2, } and the sequence {yn } = n . We have already seen in the class notes that the

1

second sequence {yn } converges to zero. Hence, the sequence being some of two convergent

sequences converges to the sum of the limits which is equal to 2 + 0 = 2. Since limit of

convergent sequence is unique, 1 cannot be a limit.

(12) We consider monotone increasing sequence xn 6 xn+1 . Proof is analogous for the monotone

decreasing case. Let {xn } be a convergent sequence and let lim xn = x. From the definition of

n

convergence, with = 1, we get N N such that n > N implies that |xn x| < 1. Then,

xn < 1 + |x| , n > N.

Let { }

B = max |x1 | , |x2 | , , 1 + |x| ,

then xn 6 B, n N. Now let the sequence be bounded. Let x be the least upper bound. Then

xn 6 x n N. For every > 0, there exists a N N, such that x < xN 6 x. Otherwise x

would be an upper bound for the sequence. Since xn is increasing, n > N implies

x < xn 6 x

which shows that xn converges to x.

(13) (i) S = (0, 1) Open: For any x (0, 1), open ball with radius min {x, 1 x} is contained in S.

(ii) S = [0, 1] Closed: Use the theorem: A set S Rn is closed if and only if every convergent

sequence of points {xn } S has its limit x A. Let {xn } be a convergent sequence with

limit x contained in S, then for all n, xn > 0, and xn 6 1. Since weak inequalities are

preserved in the limit, x 6 1 and x > 0. So x S and S is closed.

(iii) S{ = [0, }

1): Neither open nor closed: It is not closed since the limit of convergent sequence

1 n is not contained in S and is not open since x = 0 is contained in S but it is not

1

(iv) S = R; Both open and closed: Use the result in the notes that empty set is both open and

closed and R is complement of the empty set.

Chapter 24

Solution to PS 3

(1)

[ ] 9 6 5 4

1 1 7

AB = . 1 2 3 3

0 8 10

0 1 1 2

[ ]

1911+70 1612+71 15+1371 1413+72

=

0 9 + 8 1 + 10 0 0 6 8 2 + 10 1 0 5 8 3 10 1 0 4 + 8 3 + 10 2

[ ]

8 15 1 15

(24.1) =

8 6 34 44

( ) ( ) ( )

1 1 0

1 + 2 =

2 3 0

{

1 + 2 = 0

21 + 32 = 0

{

1 = 2

(24.2)

1 = 23 2

(24.3) 1 = 0, 2 = 0.

219

220 24. Solution to PS 3

(3)

[ ] 8 4

1 6 2

AB = . 0 2

1 5 3

7 3

[ ]

18+60+27 146223

=

1 8 + 5 0 + 3 7 1 4 5 2 3 3

[ ]

22 14

(24.4) =

13 23

8 4 [ ]

1 6 2

BA = 0 2 .

1 5 3

7 3

8141 86+45 82+43

= 01+21 0625 0223

71+31 7635 7233

4 68 28

(24.5) = 2 10 6

10 27 5

(4)

1 2 3 4

1 2 1 2

A=

1 3 5 7

2 1 4 1

Let us expand the determinant by the first column.

2 1 2 2 3 4

|A| = 1 (1)1+1 3 5 7 + 1 (1)2+1 3 5 7

1 4 1 1 4 1

2 3 4 2 3 4

3+1 4+1

+1 (1) 2 1 2 + 2 (1) 2 1 2

1 4 1 3 5 7

24. Solution to PS 3 221

2 1 2

5 7 3 7 3 5

3 5 7 = 2 1 +2 = 28

4 1 1 1 1 4

1 4 1

2 3 4

5 7 3 7 3 5

3 5 7 = 2 3 +4 = 6

4 1 1 1 1 4

1 4 1

2 3 4

1 2 2 2 2 1

2 1 2 = 2 3 +4 = 14

4 1 1 1 1 4

1 4 1

2 3 4

1 2 2 2 2 1

2 1 2 = 2 3 +4 = 2

5 7 3 7 3 5

3 5 7

and therefore

= 4

(5) Recall the rank of a matrix A is the number of linearly independent column vectors of A. It is

also equal to the number of linearly independent row vectors of A.

3 2 1

A= 0 1 7

5 4 1

3 2 0

1 0 + 2 1 = 0

5 4 0

31 + 22 = 0

2 = 0

51 + 42 = 0

(24.6) 1 = 0, 2 = 0

222 24. Solution to PS 3

is the only solution. So the first two columns are linearly independent. Now lets take all three

columns,

3 2 1 0

1 0 + 2 1 + 3 7 = 0

5 4 1 0

31 + 22 + 3 = 0 (i)

2 + 73 = 0 (ii)

51 + 42 3 = 0 (iii)

(i) 2 (ii) : 31 133 = 0

(24.7) (iii) 4 (ii) : 51 293 = 0

So, 1 = 0, 3 = 0 2 = 0

is the only solution. So all three columns are linearly independent. This implies that the rank

of matrix A is 3.

(24.8) A x = b

33 31 31

(24.9) rank (A) = rank (Ab )

and the solution, if it exists, is unique if and only if

(24.10) rank (A) = rank (Ab ) = 3#of unknowns

In this question

1 1 1 1 1 1 6

A = 1 2 3 , Ab = 1 2 3 10

1 2 1 2

We can verify that the rank of A is at least 2 since the first two rows of A are linearly inde-

pendent. Similarly the rank of Ab is at least 2 since the first two rows of Ab are also linearly

independent.

(a) For no solution to exist, the ranks of A and Ab need to be different which will be possible

only in case rank of A is 2 and rank of Ab is 3. This is because if rank of A is 3 then so is

the rank of Ab .

For rank of A to be 2, = 3. Also for rank of Ab to be equal to 3, = 10.

(b) For unique solution, the rank of A and Ab must be equal to 3. Rank of A is 3 if and only if

= 3. In this case, rank of Ab is 3 for every value of R. Thus for = 3 and R we

get unique solution.

24. Solution to PS 3 223

(c) For infinitely many solution, rank of A and Ab need to be equal to 2. This is possible if and

only if = 3 and = 10.

You might consider writing down the solutions in the last two cases in terms of and

values.

(7)

A11 = 2 > 0, A11 A22 A12 A21 = 2 1 1 = 1 > 0: PD

B11 > 0, B22 > 0, B11 B22 B12 B21 = 2 8 16 = 0: PSD

C11 < 0, C11C22 C12C21 = 3 5 16 < 0 : Indefinite

D11 < 0, D11 D22 D12 D21 = 3 (6) 16 > 0: ND

(8) Let Et and Ut denote the number of people who have employment and unemployed people in

some period t. The transition probabilities are defined as follows.

pAA probability that a current A remains an A,

pAB probability that a current A moves to B,

pBB probability that a current B remains a B,

pbA probability that a current B moves to A.

The distribution of employees at time t is denoted by the vector xt = [At Bt ] and the transition

probabilities in matrix form as

[ ] [ ]

pAA pAB 0.9 0.1

(24.11) M= = .

pBA pBB 0.7 0.3

Then the distribution of employees across the two locations next period (t + 1) is xt M = xt+1

,

which is

[ ]

0.9 0.1

[At Bt ] = [(0.9At + 0.7Bt ) (0.1At + 0.3Bt )] = [At+1 Bt+1 ].

0.7 0.3

In the similar manner we can determine the distribution of employees after two periods.

xt+1 M = xt+2

[ ]

0.9 0.1

[At+1 Bt+1 ] = [At+2 Bt+2 ]

0.7 0.3

[ ][ ]

0.9 0.1 0.9 0.1

[At Bt ] = [At+2 Bt+2 ]

0.7 0.3 0.7 0.3

[ ]2

0.9 0.1

[At Bt ] = [At+2 Bt+2 ]

0.7 0.3

224 24. Solution to PS 3

[ ]n

0.9 0.1

(24.12) [At Bt ] = [At+n Bt+n ]

0.7 0.3

The initial distribution of employees across two states at time t = 0 as

x0 = [A0 B0 ] = [0 2000]

Then the distribution of employees in the next period t = 1 is

[ ]

0.9 0.1

[0 2000] = [1400 600] = [A1 B1 ].

0.7 0.3

The distribution after two periods is

[ ]2 [ ]

0.9 0.1 0.88 0.12

[0 2000] = [0 2000] = [1680 320] = [A2 B2 ]

0.7 0.3 0.84 0.16

The distribution after four periods is

[ ]4 [ ]

0.9 0.1 0.8752 0.1248

[0 2000] = [0 2000] = [1747 253] = [A4 B4 ]

0.7 0.3 0.8736 0.1264

The distribution after six periods is

[ ]6 [ ]

0.9 0.1 0.875005 0.124992

[0 2000] = [0 2000] = [1749 251]. = [A6 B6 ]

0.3 0.7 0.874944 0.125056

The distribution after eight periods is

[ ]8 [ ]

0.9 0.1 0.8750 0.1250

[0 2000] = [0 2000] = [1750 250] = [A8 B8 ]

0.3 0.7 0.8750 0.1250

The distribution after ten periods is

[ ]10 [ ]

0.9 0.1 0.8750 0.1250

[0 2000] = [0 2000] = [1750 250] = [A10 B10 ].

0.3 0.7 0.8750 0.1250

Observe that when the transition matrix is raised to higher powers, the new transition matrix

converges to a matrix whose rows are identical. This is referred to as the steady state. In this

example, the steady state would be

[ ]

7 1

M = 8 8 .

7 1

8 8

0.9 0.1

[A B] = [A B],

0.7 0.3

24. Solution to PS 3 225

gives

0.9A + 0.7B = A,

and

A + B = 2000.

Then we get

7 1

A = 7B, or A = (2000), B = (2000).

8 8

det AB = det A det B.

Since the matrix A is nilpotent, we know

det Ak = [det A]k = det O = 0.

Hence, det A = 0.

(b) Note

det A = det A.

Also, matrix A is obtained by multiplying each row (or each column) of matrix A by 1.

Hence,

det(A) = (1)n det A = det A,

if n is an odd number. Thus

det A = det A = det A.

This leads to

det A = 0,

and therefore A is not invertible.

(c) Note

det A = det A.

and

det[AA ] = det A det A = det A det A = [det A]2 = det I = 1,

we get

det A = 1.

(d) As we have seen in part (b), for n an odd integer,

det AB = det A det B = (1)n det BA = (1)n det B det A = det A det B,

implies

det A det B = 0.

This means either det A = 0 (i. e., A is not invertible) or det B = 0 (i.e., B is not invertible).

226 24. Solution to PS 3

(e) Since

det AB = det A det B = det I = 1,

det A = 0 and therefore A is invertible. Pre-multiplying both sides by A1 , we get

A1 AB = IB = B = A1 I = A1 ,

showing that A1 = B.

(10) (a) We use the Result 6.1 to prove this. The determinant of the upper triangular matrix is equal

to the product of all the diagonal terms. By definition of eigenvalue, it is clear that if we

take i = aii , then the determinant of the matrix [A i I] is zero since the diagonal entry

in row i or column i is zero.

Similar arguments can be used to prove the result for the lower triangular matrix.

(b) Since A is an invertible matrix, A1 exists and we can pre-multiply the equation (AI)x =

0 by (A1 . This yields (I A1 )x = 0 or ( 1 I A1 )x = 0 or (A1 1 I)x = 0 as desired.

Thus for an invertible matrix A, is an eigenvalue of A if and only if 1 is an eigenvalue of

A1 .

(c) Assume is the eigenvalue for the eigenvector x, we know

Ax = x.

Pre-multiplying both sides by A, we get

A Ax = A x = ()Ax = ()x = ()2 x.

In other words, we get

A2 x = ()2 x.

Hence x is an eigenvector of A2 and the corresponding eigenvalue is 2 . Using the exercise

in part (b) and similar argument, we can show that x is an eigenvector of A2 and the

corresponding eigenvalue is 2 .

Chapter 25

Solution to PS 4

(1) (a)

[ ]1

2x + 1 2

f (x) =

x1

[ ] 1

1 2x + 1 2 (x 1) 2 (2x + 1) 1

f (x) =

2 x1 (x 1)2

[ ]1

3 x1 2 1

=

2 2x + 1 (x 1)2

3 1

(25.1) =

2 (2x + 1) 12 (x 1) 32

(b)

1

f (x) = (6x 5)

3x2 5x

6x 5

(25.2) = .

3x2 5x

y = f (x0 ) + f (x0 ) (x x0 ) .

227

228 25. Solution to PS 4

y = 24 + 23 (x 2) ,

y = 22 + 23x.

(3) (a)

lim f (x) = lim+ f (x) = f (x0 )

xx0 xx0

x0 x0

lim f (x) = lim+ f (x)

xx0 xx0

(b)

lim g (x) = 3 2 2 = 4; lim+ g (x) = 2 + 6 = 4 = g (x0 ) .

x2 x2

Hence g (x) is continuos at x = 2.

f (0) = g (0) = 0,

we can use LHospital rule to find the limit.

( )

f (x) = 2x exp x2 exp (x) f (0) = 1

(25.3) g (x) = 2 g (0) = 2.

Hence

( )

exp x2 + exp (x) 2 1

(25.4) lim = .

x0 2x 2

(5)

[ ]

f (x, y) = 2xy + y2 2y + 3 x2 + 2xy 2x

[ ]

2y 2x + 2y 2

H f (x, y) =

2x + 2y 2 2x

[ ]

4 4

(25.5) H f (1, 2) =

4 2

25. Solution to PS 4 229

{

xy

x2 +y2

if (x, y) = (0, 0)

f (x, y) =

0 otherwise.

Show that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although f

is not continuous at (0, 0).

(a) Observe that for all (x, y) = (0, 0), we get

D1 f (x, y) =

(x2 + y2 )2

y(y2 x2 )

=

(x2 + y2 )2

and

(x2 + y2 )(x) xy(2y)

D2 f (x, y) =

(x2 + y2 )2

x(x2 y2 )

=

(x2 + y2 )2

Further,

f (h, 0) f (0, 0)

D1 f (0, 0) = lim

h0 h

0

= =0

h

and

f (0, h) f (0, 0)

D2 f (0, 0) = lim

h0 h

0

= =0

h

Thus the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point (x, y) R2 .

(b) Consider y = x. The function f (x, x) = 12 for all points x = 0 and therefore f (0, 0) = 0 =

limh0 f (h, h). Hence it is not continuous at (0, 0).

(7) This exercise gives an example of a function with D12 f (x, y) = D21 (x, y). Let f (x, y) be defined

as

{

xy(x2 y2 )

x2 +y2

if (x, y) = (0, 0)

f (x, y) =

0 otherwise.

230 25. Solution to PS 4

D1 f (x, y) =

(x2 + y2 )2

y(x4 + 4x2 y2 y4 )

=

(x2 + y2 )2

and

(x2 + y2 )(x3 3xy2 ) (x3 y xy3 )2y

D2 f (x, y) =

(x2 + y2 )2

x(y4 + 4x2 y2 x4 )

=

(x2 + y2 )2

Further,

f (h, 0) f (0, 0)

D1 f (0, 0) =

h

0

= =0

h

and

f (0, h) f (0, 0)

D2 f (0, 0) =

h

0

= =0

h

Further,

( )

y(x4 + 4x2 y2 y4 ) 2x2 y2 2y4

D1 f (x, y) = = y 1+ 4

(x2 + y2 )2 x + 2x2 y2 + y4

( ) ( )

2x2 y2 2x2 y2

y 1+ 4 = y 1+ 2 2 y(1 + 1).

x + 2x2 y2 + y4 2x y + (x4 + y4 )

So D1 f (x, y) 2|y|. It is easy to verify that D2 f (x, y) 2|x| on similar lines. This shows

that D1 f (x, y)(x,y)(0,0) 0 = D1 f (0, 0) as lim(x,y)(0,0) 2|y| = 0. Similarly, D2 f (x, y)(x,y)(0,0)

0 = D2 f (0, 0) as lim(x,y)(0,0) 2|x| = 0. For all (x, y) R2 \ (0, 0) the partial derivatives

D1 f (x, y) and D2 f (x, y) are continuos functions being a ratio of two polynomials with

non-vanishing denominator.

Thus the partial derivatives D1 f (x, y) and D2 f (x, y) exist and are continuous at every point

(x, y) R2 .

(b) Since the real-valued function f has continuous partial derivatives at every point (x, y)

R2 , it is continuous at every point (x, y) R2 .

25. Solution to PS 4 231

hy(h2 y2 )

(h2 +y2 )

0

D1 f (0, y) = lim

h0 h

y(h2 y2 )

= lim = y

h0 (h2 + y2 )

and

xh(x2 h2 )

(x2 +h2 )

0

D2 f (x, 0) = lim

h0 h

x(x2 h2 )

= lim 2 = x.

h0 (x + h2 )

(d) Since f (x, y) is a rational function with non-zero denominator for (x, y) = (0, 0), the second

order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in R2 and are

continuous everywhere in R2 except at (0, 0).

(e) Given D2 f (x, 0) = x we get D21 f (0, 0) = +1 and from D1 f (0, y) = y we get D12 f (0, 0) =

1.

Chapter 26

Solution to PS 5

(1) Let f (x) and g (x) be two concave functions and let h (x) = f (x) + g (x). Concavity of f and g

imply, x, y D, [0, 1]

[ ]

f (x) + (1 ) f (y) 6 f x + (1 ) y

[ ]

g (x) + (1 ) g (y) 6 g x + (1 ) y .

[ ] [ ]

f (x) + (1 ) f (y) + g (x) + (1 ) g (y) 6

f x + (1 ) y + g x + (1 ) y

( ) ( ) [ ] [ ]

f (x) + g (x) + (1 ) f (y) + g (y) 6 f x + (1 ) y + g x + (1 ) y

[ ]

h (x) + (1 ) h (y) 6 h x + (1 ) y .

(2) (a) False. Consider A, B R, A = [0, 2] , B = [4, 6] .A B = [0, 2] [4, 6]. Then 1 A B, 5

A B, but 12 1 + 12 5 = 3

/ A B.

(b) True. If A and B are convex sets, then A B is convex. Let x A B, y A B. Then,

(26.1) x + (1 ) y A as x, y A

(26.2) x + (1 ) y B as x, y B

(26.3) x + (1 ) y A B.

Hence A B is convex.

(3) (a) Recall a monotone function of one variable is quasi-concave. Since f (x) = 3x + 4 is mono-

tone increasing, it is quasi-concave.

233

234 26. Solution to PS 5

0 y exp (x) exp (x)

B (x, y) = y exp (x) y exp (x) exp (x)

exp (x) exp (x) 0

[ ]

( ) 0 y exp (x)

det B1 (x, y) = det

y exp (x) y exp (x)

= y2 exp (2x) < 0;

0 y exp (x) exp (x)

det B2 (x, y) = det y exp (x) y exp (x) exp (x)

exp (x) exp (x) 0

= y exp (3x) > 0.

( )

(1)r det Br (x) > 0, r = 1, 2, , n; x D.

(c)

0 2xy3 3x2 y2

B (x, y) = 2xy3 2y3 6xy2

3x2 y2 6xy2 6x2 y

[ ]

( ) 0 2xy3

det B1 (x, y) = det

2xy3 2y3

(26.4) = 4x2 y6 6 0

( ) 0 2xy 3 3x2 y2

det B2 (x, y) = det 2xy3 2y3 6xy2

3x2 y2 6xy2 6x2 y

(26.5) = 30x4 y7

( )

Note the sign of det B2 (x, y) is not positive. Hence it is not quasi-concave.

26. Solution to PS 5 235

f (x)

g(x)

(4) Let

0 for x 6 0

x for 0 6 x 6 12

(26.6) f (x) =

1 x for 21 6 x 6 1

0 for x > 1

0 for x 6 1

x 1 for 1 6 x 6 32

(26.7) g (x) = and

2 x for 23 6 x 6 2

0 for x > 2

(26.8) h (x) = f (x) + g (x)

In the figures, Fig. 1 and Fig. 2 functions are quasiconcave (each of them is first non-

decreasing, then non-increasing), whereas Fig. 3 function, which is the sum of the top and

middle functions, is not quasiconcave (it is not non-decreasing, is not non-increasing, and is

not non-decreasing then non-increasing.

236 26. Solution to PS 5

f (x) + g(x)

(5) (i)

[ ]

f (x, y, z) = 24x2 + 2y2 4xy 3z2

48x 4y 0

(26.9) H f (x, y) = 4y 4x 0

0 0 6z

Then f (x, y) is not concave as the principal minor D1 = 48x > 0. The bordered Hessian

is

0 24x2 + 2y2 4xy 3z2

24x2 + 2y2 48x 4y 0

B (x, y) =

4xy 4y 4x 0

3z2 0 0 6z

[ ]

( ) 0 24x2 + 2y2

det B1 (x, y) = det 2 2

24x + 2y 48x

(26.10) = 576x4 96x2 y2 4y4 6 0

0 24x 2 + 2y2 4xy

( )

det B2 (x, y) = det 24x2 + 2y2 48x 4y

4xy 4y 4x

(26.11) = 2304x5 384x3 y2 + 48xy4

which could take both positive or negative values. Hence f (x, y, z) is neither quasiconcave

nor quasiconvex.

(ii)

[ ]

g (x, y) = 1 exp (x) exp (x + y) 1 exp (x + y)

[ ]

exp (x) exp (x + y) exp (x + y)

(26.12) Hg (x, y) = .

exp (x + y) exp (x + y)

26. Solution to PS 5 237

(26.13) D1 = exp (x) exp (x + y) < 0, D2 = ex ex+y > 0

implies that g (x, y) is concave. Hence it is also quasi-concave.

Chapter 27

Solution to PS 6

[ ]

x

[ ]

1 3 1 2 y 1

(27.1) =

2 6 2 4

z

3

w

(a) The rank of matrix A can be at most 2. This means that there can be at most two endogenous

variables. Also the second column of matrix A is a multiple (three times) of the first column

and the fourth column is a multiple of column one (2 times). The remaining two columns

are one and three. The sub-matrix consisting of columns one and three has full rank as its

determinant is 4. So we can choose x and z as endogenous variables and the remaining

two y and w as exogenous variables.

(b) The system of linear equations can be rewritten as under (with the exogenous and endoge-

nous variables choice made above.

[ ] { } [ ]

1 1 x 1 3y + 2w

(27.2) =

2 2 z 3 6y + 4w

Multiply the first equation by two and add it to the second to get,

(27.3) 4x = 5 12y + 8w

5 12y + 8w 5

(27.4) x= = 3y + 2w

4 4

Substitute the value of x in the first equation to get

( )

5 1

z = 1 3y + 2w 3y + 2w = .

4 4

239

240 27. Solution to PS 6

x

1 3 1 1 0

y

(27.5) 4 1 1 1 = 3

z

7 1 1 3

w

6

The rank of matrix A can be at most 3. However, we observe that the third row is equal to the

sum of twice the second row and the first row. This means that the rank of matrix A cannot be

three. The sub matrix obtained by eliminating the third row of A (call it matrix B) is

[ ]

1 3 1 1

(27.6)

4 1 1 1

The determinant of the sub matrix of B obtained by eliminating the third and fourth column is

11 which is non-zero. This sub-matrix has full rank. So we can choose x and y as endogenous

variables and the remaining two z and w as exogenous variables.

We can solve the set of equations to obtain

[ ] { } [ ]

1 3 x zw

(27.7) =

4 1 y 3zw

Solving the two equations we get

9 2z 4w

x= ,

11

and

3 + 3z 5w

y= .

11

F(x, y) = x2 xy3 + y5 17 = 0,

which is a continuous function being polynomial. Also,

D2 F(x, y) = 3xy2 + 5y4 = 3(5)(4) + 5(2)4 = 20 = 0.

Hence, by Implicit Function Theorem, there exist a function y = f (x) in terms of x, which is

continuously differentiable, in the neighborhood of (x, y) = (5, 2). Further,

( ) ( )

D1 F(x, y) 2x y3 2 1

f (x)x=5 = = = = .

D2 F(x, y) (x,y)=(5,2) 3xy + 5y

2 4 20 10

(x,y)=(5,2)

Then

( )

1 199

y = f (4.9) = f (5) + (5 4.9) f (x)x=5 = 2 + (0.1) = .

10 200

27. Solution to PS 6 241

(a) Check that z = 3 satisfies the equation f (x, y, z) = 0 for x = 6 and y = 3.

(b) Observe that

Function Theorem (IFT), there exist a function z = h(x, y) in terms of x and y, which is

continuously differentiable, in the neighborhood of (x, y) = (6, 3).

(c) By IFT, we have

( ) ( ) ( )

dz D1 f (x, y, z) 2x 2(6) 4

= = = = ,

dx (6,3,3) D3 f (x, y, z) (6,3,3) 3z2 (6,3,3) 3(9) 9

and

( ) ( ) ( )

dz D2 f (x, y, z) 2x 2(3) 2

= = = = .

dy (6,3) D3 f (x, y, z) (6,3,3) 3z2 (6,3,3) 3(9) 9

( ) ( )

dz dz

z = g(6, 3) + (6.1 6) + (2.8 3)

dx (6,3) dy (6,3)

( ) ( )

4 2 135

= 3 + (0.1) + (0.2) = .

9 9 45

(5) Consider the profit maximizing firm described in the Example 12.2. If p increases by p and

w increases by w, what will be the change in the optimal input amount x?

Note the first order condition for the profit maximization is

p f (x) w = 0.

p f (x) < 0 since f (x) is strictly concave. Also, F(p, w, x) is a continuously differentiable func-

tion. Hence we can apply IFT to claim that there exists a function x = f (p, w) which is contin-

uously differentiable, in the neighborhood of (p, w, x ), where x is the profit maximizing input

242 27. Solution to PS 6

quantity. Then

) ( ( )

dx dx

x=x + p + w

dp dw

( ) ( )

D1 F(p, w, x) D2 F(p, w, x)

=x p w

D3 F(p, w, x) D3 F(p, w, x)

( ) ( )

f (x) 1

= x

p w

p f (x) p f (x)

( ) ( )

f (x) 1

=x p + w.

p f (x) p f (x)

(6) Consider 3x2 yz + xyz2 = 96 as defining x as an implicit function of y and z around the point

x = 2, y = 3, z = 2.

(a) Let F(x, y, z) = 3x2 yz + xyz2 96 = 0. Then

D1 F(x, y, z) = 6xyz + yz2 = 6(2)(3)(2) + 3(4) = 84 = 0,

and F(x, y, z) is a continuously differentiable function (being polynomial). Hence we can

apply IFT to claim that there exists a function x = f (y) in terms of y which is continuously

differentiable, in the neighborhood of (x, y, z) = (2, 3, 2). Also

( ) ( ) ( )

dx D2 F(x, y, z) 3x2 z + xz2

= =

dy (2,3,2) D1 F(x, y, z) 6xyz + yz2

( )

3x2 + xz 3(4) + 2(3) 3

= = = ,

6xy + yz 6(2)(3) + 3(2) 7

Then ( ) ( )

dx 3 137

x = 2+ (3.1 3) = 2 + (0.1) = .

dy (2,3,2) 7 70

(b)

z2

3z 9 + 128

yz z z2 32 1 1 16

x= = + = + ,

2 6 36 yz 3 9 y

which implies that

1 1 16

x= + +

3 9 y

in the neighborhood of (2, 3, 2).

(c)

( ) ( ) ( )

dx 1 16 1 16 8

= 2 = 7 = .

dy (2,3,2) 2 19 + 16 y 2 3 9 21

y

27. Solution to PS 6 243

Then ( ) ( )

8 8 412

x = 2+ (3.1 3) = 2 = .

21 210 210

(d) The second method involves more computations.

( )

x+y

f (x + y) = f x

x

x+y

= f (x)

x

x

f (x) = f (x + y) .

x+y

Similarly,

y

f (y) = f (x + y) .

x+y

So

x y

f (x) + f (y) = f (x + y) + f (x + y)

x+y x+y

x+y

= f (x + y)

x+y

= f (x + y) .

If x is zero, then

f (2x) = 2 f (x) = 2 f (0) and

f (2x) = f (0) 2 f (0) = f (0)

f (0) = 0.

Then

f (x + y) = f (0 + y) = f (y) = 0 + f (y) = f (0) + f (y) = f (x) + f (y) .

Same arguments hold if both x and y are zero.

Another method of proof is as follows: Let x > 0 and y > 0. Then x = yt for some t > 0.

f (x + y) = f (yt + y) = f [(1 + t)y] = (1 + t) f (y)

= f (y) + t f (y) = f (y) + f (ty) = f (y) + f (x)

The remaining cases of x = 0 or y = 0 are considered as in the earlier proof.

f (2x) = 2 f (x) = 2 f (0) and

f (2x) = f (0) 2 f (0) = f (0)

f (0) = 0.

244 27. Solution to PS 6

( )

Take x and x such that f (x) = y > 0 and f x = y > 0. Then

f (x) = y

1

f (x) = 1

y

( )

x

f = 1.

y

Similarly

( )

x

f = 1.

y

Take (0, 1) and define

y

= .

y + (1 ) y

Then

(1 ) y

1 =

y + (1 ) y

and (0, 1) . Function f is quasi-concave. So

( ( ) ( )) { ( ) ( )}

x x x x

f + (1 ) > min f ,f

y y y y

( ( ) ( ))

y x (1 ) y x

f

+ > min {1, 1}

y + (1 ) y y y + (1 ) y y

( )

x (1 ) x

f + >1

y + (1 ) y y + (1 ) y

( )

x + (1 ) x

f >1

y + (1 ) y

1 (

)

f x + (1 ) x >1

y + (1 ) y

( )

f x + (1 ) x > y + (1 ) y

( )

= f (x) + (1 ) f x

( )

it is concave. If f x is zero, since f is non-decreasing,

( )

f x + (1 ) x > f (x)

= f (x) + 0

( )

= f (x) + (1 ) f x .

27. Solution to PS 6 245

( )

If both f (x) and f x are zero, then

( ) { ( )}

f x + (1 ) x > min f (x) , f x

( )

= 0 = f (x) + (1 ) f x .

(9) Since function f is homogeneous of degree m and is twice continuously differentiable, each of

the partial derivatives are homogeneous of degree m 1.

Further, the partial derivatives are also continuously differentiable and are homogeneous

of degree m 2 > 0.

Applying Eulers theorem for second order partial derivatives of the partial derivative

D1 f (x), we get,

In general, applying Eulers theorem to the second order partial derivatives of the partial

derivative Di f (x), we get,

for i = 1, , n.

We can write these n equalities in matrix notation as

x1 D11 f (x) + x2 D12 f (x) + + xn D1n f (x) (m 1)D1 f (x)

x1 Di1 f (x) + x2 Di2 f (x) + + xn Din f (x) = (m 1)Di f (x)

x1 Dn1 f (x) + x2 Dn2 f (x) + + xn Dnn f (x) (m 1)Dn f (x)

This is equivalent to

D11 f (x) D12 f (x) D1n f (x)

x1

D1 f (x)

Di1 f (x) Di2 f (x) Din f (x) xi = (m 1) Di f (x)

Dn1 f (x) Dn2 f (x) Dnn f (x) xn Dn f (x)

The n n square matrix on the left hand side ios the Hessian matrix H f (x) for the function f .

Thus the LHS is H f (x) x. Pre-multiplying both sides by the row vector x , we get

[ ]

x H f (x) x = (m 1) x1 D1 f (x) + + xn Dn f (x) .

Applying Eulers theorem to the sum on the RHS, we get

x H f (x) x = (m 1)[m f (x)] = m(m 1) f (x).

246 27. Solution to PS 6

Chapter 28

Solution to PS 7

(1)

[ ]

g (x, y) = 3x2 3 3y2 2

[ ]

6x 0

(28.1) Hg (x, y) =

0 6y

Then

[ ] [ ]

(28.3) g (x, y) = 3x2 3 3y2 2 = 0 0

2

(28.4) x

= 1, y =

3

( ) the theorem on convexity and global minima, g (x, y) attains

Using

global minimum at 1, 23 . Consider the function g defined for all x > 0, y > 0 by

( )

2 4 2

g 1, = 2 = 3.09.

3 3 3

247

248 28. Solution to PS 7

4

3

1 1 2 3

(2) We know that f (x) = 0 is a necessary condition for f to have a local maxima or minima. Find

all the local maxima and minima of

(28.5) f (x) = 4x3 12x2 + 8x = 0

( )

(28.6) 4x x2 3x + 2 = 0,

(28.7) x = 0, x = 1, x = 2

If we plot the graph of this function, we can see that x = 0, and x = 2 are local minima and

x = 1 is local maxima. Also x = 0, and x = 2 are global minima and there is no global maxima.

(Q1 , Q2 ) = Q1 (100 5Q1 ) + Q2 (50 10Q2 ) (50 + 10Q1 + 10Q2 ).

The first order conditions for the profit maximization are

D1 (Q1 , Q2 ) = 100 10Q1 10 = 0, or Q1 = 9,

D2 (Q1 , Q2 ) = 50 20Q1 10 = 0, or Q2 = 2.

We need to check the second order conditions. Note

D11 = 10, D22 = 20, andD12 = D21 = 0,

which gives the first order leading principal minor to be 10 and the second order leading

principal minor to be 200. So the Hessian is negative definite for all outputs in the positive

orthant. Therefore, the function is concave function. Then Q1 = 9 and Q2 = 2 is a profit

28. Solution to PS 7 249

maximizing supply plan for the firm. The maximum profit is = 9 55 + 2 30 50 110 =

395.

(4) (a) The profit for the firm, when it uses K and L units of capital and labor to produce output

Q = La K b , given the out and input prices (P,w,r) is

(K, L) = (P Q wL rK).

The firm maximizes its profit by choosing K and L such that both the FOC and SOC are

satisfied.

The FOCs are as under.

d ( )

= P aLa1 K b w = 0

dL

P aLa1 K b = w;

d ( )

= P La bK b1 r = 0,

dK

P L bK b1 = r.

a

The FOC with respect to L leads to the condition that the value of the marginal product

of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the

condition that the value of the marginal product of capital is equal to the rental rate r.

(b) To solve for the optimal levels of L and K, we divide the first FOC by the second and get,

= = = = ;

P MPK MPK dL P La bK b1 r

aK w

= ;

bL r

wb

K= L.

ra

Observe that the ratio of the MPL and MPK is the MRTS (Marginal rate of technical substi-

tution, i.e., the rate at which one can substitute labor for capital along an iso-quant.) The

250 28. Solution to PS 7

value of K can be substituted in any of the two FOC to get the expression for L.

P aLa1 K b = w;

( )b

a1 wb

P aL L = w;

ra

( )b

a+b1 wb w

PL = ;

ra a

( )1b ( )b

a b

P = L1ab

w r

( ) 1ab

1b ( ) b

a b 1ab 1

L = P 1ab .

w r

We compute the optimal value of K from the last equation as under:

wb

K = L;

ra

( ) 1b ( ) b

wb a 1ab b 1ab 1

= P 1ab ;

ra w r

( ) 1ab

1b

1 ( ) b

a b 1ab +1 1

= P 1ab

w r

( ) 1ab

a ( ) 1ab

1a

a b 1

= P 1ab .

w r

(c) For SOC, we first write down the Hessian (the matrix of second order partial derivatives

using the FOCs.

[ ] [ ]

PFLL PFLK Pa(a 1)La2 K b PabLa1 K b1

H= = .

PFKL PFKK PabLa1 K b1 Pb(b 1)La K b2

For the SOC to be satisfied, the leading principal minor of order one needs to be negative

and the leading principal minor of order two needs to be positive. Thus, Pa(a1)La2 K b <

0, which implies that a 1 < 0 or a < 1. The LPM of order two is the determinant of the

Hessian matrix.

[ ]

Pa(a 1)La2 K b PabLa1 K b1

det H = det

PabLa1 K b1 Pb(b 1)La K b2

= P2 ab(a 1)(b 1)L2a2 K 2b2 (PabLa1 K b1 )2 ,

= P2 ab[(a 1)(b 1) ab]L2a2 K 2b2 ;

= P2 ab[1 a b]L2a2 K 2b2 > 0,

28. Solution to PS 7 251

which holds true if and only if 1 a b > 0. Note that this condition also implies that

b < 1.

Thus the production function is such that it displays diminishing marginal product in each

of the two inputs (a < 1 and b < 1) and also it displays diminishing returns to scale as the

production function is homogeneous of degree a + b < 1.

(d) We use the expression for L derived earlier to find the partial derivatives.

( ) ( ) 1ab

1b ( ) b

L 1 a b 1ab

P 1ab 1 > 0,

1

=

P 1ab w r

( ) ( ) 1abb

L 1b 1b

1ab

1b

1 b 1

= (a) 1ab (w) P 1ab < 0,

w 1ab r

( ) ( ) 1ab1b

L b a

(b) 1ab (r) 1ab 1 P 1ab < 0.

b b 1

=

r 1ab w

(e) The output is obtained by noting that the profit maximizing inputs are K and L .

Q = (L )a (K )b ,

a b

( ) 1ab 1b ( ) b ( ) 1ab

a ( ) 1ab

1a

a b 1ab a b

= P 1ab P 1ab ,

1 1

w r w r

( ) a(1b)+ab ( ) ab+b(1a)

a 1ab b 1ab a+b

= P 1ab ,

w r

( ) 1ab

a ( ) 1ab

b

a b a+b

= P 1ab ,

w r

[( ) ( ) ] 1ab

1

a a b b a+b

= P .

w r

For computing the price elasticity of supply with respect to out put price, note that

[( ) ( ) ] 1ab

1

a b

a b

Q = Pa+b ,

w r

a+b

= AP 1ab ,

[( ) ( ) ] 1ab

1

a b

a b

where A = w r is a constant independent of P. It is easy to see that the

elasticity will be P = a+b

1ab . [Note that for Q = APb , P = dQ

dP QP = AbPb1 QP = b.]

252 28. Solution to PS 7

Similarly, w = 1ab

a

and r = 1ab

b

. Thus,

a+b a b

P + w + r = + + ,

1ab 1ab 1ab

a+bab

= = 0.

1ab

The economic interpretation is that if we change all the prices by same factor, then the

profit maximizing quantity does not change. In other words, the profit maximizing output

is homogeneous of degree zero in the prices (P,w,r).

(f) You may like to write down the expression for the profit function explicitly in terms of P,

w and r, on your own.

(5) (a) The profit for the firm, when it uses K, L and R units of capital, labor and natural resources

to produce output Q = ALa K b + ln R, given the output and input prices (P,w,v,r), is

(K, L) = P Q wL rK vR = P ALa K b + P ln R wL rK vR.

The firm maximizes its profit by choosing K, L and R such that both the FOC and SOC

are satisfied.

The FOCs are as under.

d

= P aLa1 K b w = PFL w = 0,

dL

P AaLa1 K b = w;

d

= P ALa bK b1 r = PFK r = 0,

dK

P ALa bK b1 = r,

d P

= v = PFR v = 0,

dR R

P

= v.

R

The FOC with respect to L leads to the condition that the value of the marginal product of

labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the condition

that the value of the marginal product of capital is equal to the rental rate r. Lastly, the FOC

with respect to R leads to the condition that the value of the marginal product of natural

resource is equal to the price of the natural resource v.

Now take A = 3, a = b = 31 for remainder of the problem.

(b) With the given parameter values, the FOCs are, (note Aa=1=Ab)

P L 3 K 3 = w;

2 1

P L 3 K 3 = r,

1 2

P

= v.

R

28. Solution to PS 7 253

For SOC, we first write down the Hessian (the matrix of second order partial derivatives

using the FOCs.

2 P L 3 K 3 23 23

5 1

3PL

1

PFLL PFLK PFL R K 0

1 3 2 2

3 P L 3 K 3

1 5

H = PFKL PFKK PFK R = P L 3 K 3

3

2

0 .

PFRL PFRK PFR R 0 0 RP2

For the SOC to be satisfied, the leading principal minor of order one needs to be negative,

the leading principal minor of order two needs to be positive and the leading principal

minor of order three needs to be negative.

The LPM of order 1 is negative as, 32 P L 3 K 3 < 0 (given that P > 0 and K > 0, L > 0).

5 1

The LPM of order two is the determinant of the matrix obtained by removing the third row

and the third column.

[ ]

23 P L 3 K 3 31 P L 3 K 3

5 1 2 2

det H2 = det 1 23 23

23 P L 3 K 3

1 5

3PL K

4 1

= P2 L 3 K 3 P2 L 3 K 3 ,

4 4 4 4

9 9

1

= P2 L 3 K 3 > 0.

4 4

3

The LPM of order three is the determinant of the Hessian matrix. We compute the deter-

minant using the third row to get,

32 P L 3 K 3 13 P L 3 K 3

5 1 2 2

0

det H = det 13 P L 3 K 3 23 P L 3 K 3

2 2 1 5

0

0 0 RP2

[ ]

P 4 2 4 4 1 2 4 4

= 2 P L K P L K

3 3 3 3 ,

R 9 9

1 P3 4 4

= L 3 K 3 < 0.

3 R2

Hence the SOC is satisfied.

(c) To solve for the optimal levels of L and K, we divide the first FOC by the second and get,

(note a = b = 13 )

P MPL MPL dK P aLa1 K b w

= = = = ;

P MPK MPK dL P La bK b1 r

aK w

= ;

bL r

w

K = L.

r

Observe that the ratio of the MPL and MPK is the MRTS (Marginal rate of technical substi-

tution, i.e., the rate at which one can substitute labor for capital along an iso-quant.) The

254 28. Solution to PS 7

value of K can be substituted in any of the two FOC to get the expression for L.

P L 3 K 3

2 1

= w;

( ) 13

23 w

PL L = w;

r

( ) 13

w 1

P = wL 3 ;

r

( )1

1 3 1

P = L3 ;

rw2

P3

L =

.

rw2

Taking the derivative of L with respect to r we obtain

dL P3

= 2 2.

dr r w

2 1

dP L 3 K 3 P L 3 K 3 dL + P L 3 K 3 dK = dw

2 1 5 1 2 2

3 3

1 2

dP L 3 K 3 + P L 3 K 3 dL P L 3 K 3 dK = dr

1 2 2 2 1 5

3 3

dP P

dR = dv.

R2 R3

We can write this in matrix form as under:

32 P L 3 K 3 13 P L 3 K 3

5 1 2 2

0

dL

dw dP L 23 31

K

A = 13 P L 3 K 3 23 P L 3 K 3 0 , q = dK b = dr dP L 3 K 3 .

2 2 1 5 1 2

0 0 RP2 dR dv dP

R2

Then Aq = b. Note that the matrix A is same as the Hessian. Solving for dL, when

dP = dw = dv = 0 and dr = 0, using Cramers Rule, we get,

0 13 P L 3 K 3

2 2

0

det dr 23 P L 3 K 3

1 5

0

0 0 RP2

dL =

23 P L 3 K 3 31 P L 3 K 3

5 1 2 2

0

det 13 P L 3 K 3 23 P L 3 K 3

2 2 1 5

0

0 0 RP2

( RP2 )(dr) 13 P L 3 K 3

2 2 2 2

dr L 3 K 3

= = < 0.

13 PR2 L 3 K 3

3 4 4

P

28. Solution to PS 7 255

dL

2 2

L3 K 3

= < 0.

dr P

dL

Thus, L decreases as r increases. To see that we obtain identical expression for dr

as in the previous part, observe,

P3 P6 P4

K = ; L

K

= ; (L

K 32

) =

r2 w r3 w3 r2 w2

dL (L K ) 3

2

P3

= = 2 2

dr P r w

dL 2

L3 K 3

2

P3

= = 2 2.

dr P r w

(ii) Solving for dL, when dP = dw = dr = 0 and dv = 0, using Cramers Rule, we get,

0 13 P L 3 K 3

2 2

0

det 0 23 P L 3 K 3

1 5

0

dv 0 RP2

dL =

23 P L 3 K 3 13 P L 3 K 3

5 1 2 2

0

det 13 P L 3 K 3 23 P L 3 K 3

2 2 1 5

0

0 0 RP2

( RP2 )(0) 31 P L 3 K 3

2 2

=

13 RP2 L 3 K 3

3 4 4

0

= 4 = 0.

13 PR2 L 3 K 3

3 4

dL

= 0.

dv

Since L does not depend on v, this conclusion is obvious.

Chapter 29

Solution to PS 8

{ ( ) }

C = (x1 , . . . , x3 ) R3 : d 2 (0, 0, 0) , (x1 , x2 , x3 ) = 1 ,

therefore C is ( )

(i) bounded since C B (0, 0, 0) , 2 : indeed, x C d(x, 0) = 1 < 2 x

B(0, 2),

(ii) closed in R3 since it is defined as a level set in R3 of polynomial and therefore con-

tinuous function 3i=1 xi2 (use the characterization of closed set in terms of convergent

sequences),

(iii) non-empty since (1, 0, 0) C.

Since objective function 3i=1 ci xi is linear, and therefore continuous on R3 , Weierstrass

theorem is applicable and yields x C such that 3i=1 ci xi 3i=1 ci xi for any (x1 , x2 , x3 ) C.

(b) The optimization problem can be rewritten as

max f (x)

(29.1) subject to g(x) = 0

and x R3

where

3 3

f (x) = ci xi and g(x) = xi2 1.

i=1 i=1

Both functions f and g are polynomial and therefore continuously differentiable on an open

set R3 . Since x is a point of global maximum of f subject to the constraint g(x) = 0, it is

also a local maximum of f subject to the constraint g(x) = 0. Since g(0) = 1 = 0 we have

257

258 29. Solution to PS 8

x = 0. Now

g(x) = 2 (x1 , x2 , x3 ) = 0 for x = 0,

and x = 0, hence constraint qualification g(x) = 0 holds. Therefore by Lagranges theorem

there exists R such that f (x) = g (x), or

(29.2) (c1 , c2 , c3 ) = 2 (x1 , x2 , xn )

If we premultiply (29.2) by the row vector (x1 , x2 , xn ), we will get

3 3 ( )

(29.3) ci xi = 2 xi2 = 2 g(x) + 1 = 2 (0 + 1) = 2

i=1 i=1

If we premultiply (29.2) by row vector (c1 , c2 , c3 ), equation (29.3) yields

( )2

3 3 3

(29.4) c2 = c2i = 2 ci xi = ci xi

i=1 i=1 i=1

3

To conclude that the result holds we only need to show that

|c |

(c1 , c2 , c3 ) = (0, 0, 0), we have ci = 0 for some i. Since g ei cii = 0 and x solves (29.1 ),

by definition of the solution to the constrained maximization problem

( )

i |ci |

3

iic x = f (x) f e

ci

= |ci | > 0.

i=1

Now taking square roots in (29.4) yields the results.

q

(c) Let us define c = p, and consider x = q . Then x = 1, hence g (x) = 0 and the definition

of the solution of the constrained maximization problem yields

3 3

1 3 1 3 pq

p = c = ci xi = f (x) f (x) = ci xi = ci qi = pi qi = q

i=1 i=1 q i=1 q i=1

q

Analogously, for x = q

we have x = 1 hence g (x) = 0 and the definition of the solution

of the constrained maximization problem yields

3 3

1 3

p = c = ci xi = f (x) f (x) = ci xi = ci qi

i=1 i=1 q i=1

3

1 pq

=

q i=1

pi qi =

q

.

p q pq p p |pq| p q.

{ }

2. Necessity Route : Function f (x, y) = x2 3xy is continuous and the constraint set (x, y) R2+ | x + 2y = 10

which we denote by G(x, y) is non-empty, (10, 0) is contained in it, closed

as the set

is defined

by weak inequalities which are preserved in the limit and bounded as (x, y) 6 102 + 52 =

29. Solution to PS 8 259

125. So the constraint set is compact and non-empty and the objective function f is contin-

uous, hence Weierstrass theorem is applicable and a solution exists. The Lagrangian and the

FOCs are

(29.5) L (x, y, ) = x2 3xy + (2y + x 10)

L (x, y, )

(29.6) = 2x 3y + = 0

x

L (x, y, ) 3

(29.7) = 3x + 2 = 0 = x

y 2

L (x, )

(29.8) = 2y + x 10 = 0.

Now

3 7 7

2x 3y + = 2x 3y + x = 0 x = 3y y = x

2 2 6

7 10

2y + x 10 = 0 x + x 10 = 0 x = 10 x = 3

3 3

7 7 9

y = 3 = , = .

6 2 2

We get an interior candidate for solution

( )

7 9

m1 = 3, , .

2 2

The constraint qualification

( ) [ ]

g x , y = 1 2 = 0

( )

7 45

f (10, 0) = 100, f (0, 5) = 0, f 3, = .

2 2

The solution then is (x , y ) = (10, 0). Note that we cannot use sufficiency route since f is not

concave.

3. Necessity Route: A solution exists by arguments similar to the earlier problem. The La-

grangian and the FOCs are

1 2

(29.9) L (x, y, ) = x 3 y 3 + (4 2x y)

L (x, y, ) 1 2 2

(29.10) = x y 2 = 0

3 3

x 3

L (x, y, ) 2 1 1

(29.11) = x3 y 3 = 0

y 3

L (x, )

(29.12) = 4 2x y = 0.

260 29. Solution to PS 8

Now

1 23 23

3x y 2 y

= = 2 y = 4x

2 3 13

1

2x

3x y

( ) 13

2 8 2 1

4 2x y = 4 2x 4x = 0 x = , y = , =

3 3 3 4

We get an interior candidate for solution

( ) 13

2 8 2 1

m1 = , , .

3 3 3 4

( ) [ ]

g x , y = 2 1 = 0

( ) ( ) 13 ( ) 23

2 8 2 8

f (2, 0) = 0 = f (0, 4) = 0, f , = > 0.

3 3 3 3

( )

The solution then is (x , y ) = 2 8

3, 3 .

Sufficiency route:

[ ]

f (x, y) = 1 23 32 2 13 13

3x y 3x y

[ ]

29 x 3 y 3 2 23 13

5 2

H f (x, y) = 9x y

.

2 23 31

92 x 3 y 3

1 4

9x y

The determinant of Principal minors of order one,

2 5 2

x 3 y 3 6 0,

9

2 1 4

x3 y 3 6 0

9

and principal minor of order two

0>0

for (x, y) R2+ . Hence f is concave. The constraint is linear and so concave.

( ) > 0. L (x, y, )

is concave and FOC are sufficient for maximum. Therefore (x , y ) = 2 8

3, 3 which satisfies the

FOC is the solution.

4. Let f : R2 R

max f (x, y) = xy

(29.13)

subject to x + y 6 6, x > 0, y > 0.

29. Solution to PS 8 261

This problem has inequality constraint and so we will use Kuhn Tucker Sufficiency theorem.

We need to check all conditions of the Theorem are satisfied.

(i) Let

{ }

X = (x, y) R2++ .

Then X is open as its complement

{ }

X C = (x, y) R2 | x 6 0, y 6 0

is closed.

(ii) Function f (x, y) is continuous as x, and y are continuous, and f () is obtained by taking the

product of these two continuous functions. Let g1 (x, y) = 6 x y, 2 3

g (x, y) = x, g (x, y) =

1 y 1 x

y are linear and hence continuous functions. Further fx (x, y) = 2 x , fy (x, y) = 2 y are

continuous functions. Hence f , g j ( j = 1, , 3) are continuously differentiable on X.

(iii) The set X is convex as (x1 , y1 ), (x2 , y2 ) X,then

x1 > 0, x2 > 0 x1 + (1 ) x2 > 0 (0, 1)

y1 >0, y2 > 0 y1 + (1 ) y2 > 0 (0, 1)

( )

x1 + (1 ) x2 , y1 + (1 ) y2 X.

(iv) Function f (x, y) is concave as

[ ]

y

f (x, y) = 1

2 x

1

2

x

y

y

1

4 x3

1

4

1

xy

H f (x, y) = 1 1 .

4 xy 41 x

y3

1 y

6 0,

4 x3

1 x

6 0

4 y3

and principal minor of order two

0>0

for (x, y) X. Hence f is concave. Further, g j ( j = 1, , 3) are concave being linear

functions.

Hence for the following problem

max f (x, y) = xy

(x,y)X

subject to x + y 6 6, x > 0, y > 0.

262 29. Solution to PS 8

all conditions of Kuhn -Tucker sufficiency theorem are satisfied. We need to find pair ((x , y ) , )

X R3+ , that satisfies the Kuhn-Tucker conditions.

m

(i) Di f (x ) + j Di g j (x ) = 0; i = 1, , n,

j=1

(ii) g(x ) > 0 and g(x ) = 0.

They are

1 y

(29.14) 1 + 2 = 0

2 x

1 x

(29.15) 1 + 3 = 0

2 y

(29.16) 6 x y > 0, 1 (6 x y) = 0

(29.17) x > 0, 2 x = 0; y > 0, 3 y = 0

If 1 = 0, then 1

2

x

y 1 + 3 = 0 3 = 1

2 y < 0 which contradicts 3 > 0. Hence

x

1 > 0 6 x y = 0

Since x > 0, y > 0, 2 = 0, 3 = 0,

1 y 1 x 1 x 1 y

1 + 2 = 1 + 3 = 0 = 1 =

2 x 2 y 2 y 2 x

x = y 6 x y = 0 x = y = 3 > 0.

Note that all conditions are satisfied. Hence it is a global maximum on X. Observe that it is also

a global maximum on R2+ as

f (x, y) = 0 for (x, y) = R2+ \ X

and f (3, 3) > 0. Hence, (3, 3) solves the optimization problem.

5. Let f : R2 R

max f (x, y) = x + ln (1 + y)

(29.18)

subject to x 0, y 0 and x + py m.

Again we will use Kuhn Tucker Sufficiency theorem. We need to check all conditions of the

Theorem are satisfied.

(i) Let { }

X = (x, y) R2 | x > 1, y > 1 .

Then X is open as its complement

{ }

X C = (x, y) R2 | x 6 1, y 6 1

is closed.

29. Solution to PS 8 263

(ii) Function f (x, y) is continuous as x and ln (1 + y), for y > 1 are continuous, and f () is

sum of two continuous functions. Let g1 (x, y) = m x py, g2 (x, y) = x, g3 (x, y) = y are

1

linear and hence continuous functions. Further fx (x, y) = 1, fy (x, y) = 1+y are continuous

functions. Hence f , g j ( j = 1, , 3) are continuously differentiable on X.

(iii) The set X is convex as (x1 , y1 ), (x2 , y2 ) X, then

x1 > 1, x2 > 1 x1 + (1 ) x2 > 1 (0, 1)

y1 > 1, y2 > 1 y1 + (1 ) y2 > 1 (0, 1)

( )

x1 + (1 ) x2 , y1 + (1 ) y2 X.

(iv) Function f (x, y) is concave as

[ ]

f (x, y) = 1 1

1+y

[ ]

0 0

H f (x, y) = 0 1 .

(1+y)2

0 6 0,

1

6 0

(1 + y)2

and principal minor of order two

0>0

for (x, y) X. Hence f is concave. g j ( j = 1, , 3) are concave being linear functions.

Hence for the following problem

max f (x, y) = x + ln (1 + y)

(x,y)X

subject to x + py 6 m, x > 0, y > 0.

all conditions of Kuhn -Tucker sufficiency theorem are satisfied. We need to find pair ((x , y ) , )

X R3+ , that satisfies the Kuhn-Tucker conditions.

m

(i) Di f (x ) + j Di g j (x ) = 0; i = 1, , n, and

j=1

(ii) g(x ) > 0 and g(x ) = 0.

They are

(29.19) 1 1 + 2 = 0

1

(29.20) p1 + 3 = 0

1+y

(29.21) m x py > 0, 1 (m x py) = 0

(29.22) x > 0, 2 x = 0; y > 0, 3 y = 0.

264 29. Solution to PS 8

1 > 0 m x py = 0

and x = y = 0 is ruled out because m > 0. There are three remaining cases.

(i) x > 0, y = 0. Note 2 = 0, x = m,

1 = 1

1 p + 3 = 0

3 = p 1.

(ii) x = 0, y > 0. Note 3 = 0, y = mp ,

1 1

( )= == 1

p 1 + mp p+m

1 1 + 2 = 0

1

1 + 2 = 0

p+m

1

2 = 1.

p+m

( )

1

If p+m 1 > 0 1 > p+m, then 2 > 0. So solution is 0, mp , p+m

1 1

, p+m 1, 0 if p+m 6

1.

(iii) x > 0, y > 0. Note 2 = 0, 3 = 0,

1 1

(29.23) 1 = 1 ,

= p y = 1 > 0

1+y p

(29.24) m x py = 0 x = m 1 + p > 0

( )

Hence for 1 > p > 1 m, the solution is m 1 + p, 1p 1, 1, 0, 0 . Combining them the

( )

solution x , y , 1 , 2 , 3 is

(m, 0, 1, 0, p )1) if p > 1

( m 1 1

0, p , p+m , p+m 1, 0 if p 6 1 m and

( )

m 1 + p, 1p 1, 1, 0, 0 if 1 m < p < 1.

The Kuhn Tucker Sufficiency Theorem asserts that this solution is a global maximum and

therefore solves both the problem.

29. Solution to PS 8 265

max f (x)

(29.25) subject to g(x) 0

and xX

}

max f (x)

(29.26)

subject to x X

(a) We claim that x is also a solution to problem (29.25). For, if this is not the case, then since

x is in the constraint set {x X : g(x) 0} of problem (29.25), there is some x X, with

g(x ) 0, such that f (x ) > f (x). But, since x X and is therefore in the constrain set

of problem (29.26), this means that x is not a solution to problem (29.26), a contradiction.

This establishes our claim. [Note that we are not given the information that problem (29.25)

has a solution, and so we do not make use of this information in the answer].

(b) Let x be any solution to problem (29.25). Note that since both x and x are in X, the constraint

set of problem (29.26), and x solves problem (29.26), we have

(29.27) f (x) f (x)

We claim that g(x) = 0. For if g(x) = 0, we must have g(x) > 0, since x is a solution to

problem (29.25), and must therefore be in the constraint set {x X : g(x) 0} of problem

(29.25).

Since x is not a solution to problem (29.25), and x X, it must be the case that g(x) < 0.

For if g(x) 0, then, given (29.27), x would also solve problem (29.25).

Since g(x) < 0, continuity of g on the convex set X [using the intermediate value theorem]

implies that we can find (0, 1), such that:

(29.28) g(x + (1 )x) = 0

Denote (x+(1)x) by z. Then z X and g(z) = 0 by (29.28), so z satisfies the constraints

of problem (29.25).

Since f is strictly quasi-concave on X, then we can use x = x [recall that g(x) < 0 while

g(x) > 0], and (0, 1), to obtain:

f (z) = f (x + (1 )x) > min{ f (x), f (x)} = f (x)

using (29.27). But this contradicts the fact x solves (29.25), and establishes our claim.

7. Suppose that a consumer has the utility function U(x, y) = xa yb and faces the budget constraint

px x + py y I.

(A) Utility Maximization

266 29. Solution to PS 8

(a) What are the first order conditions for utility maximization?

Observe that the utility function makes sense only if a > 0 and b > 0. The Lagrangean

for the optimization problem is

L (x, y, ) = U(x, y) + (I px x py y)

= xa yb + (I px x py y)

The first order conditions are,

L

= axa1 yb px = 0

x

L

= bxa yb1 py = 0

y

L

= I px x py y = 0

(b) Solve for the consumers demands for goods x and y.

From the first two FOCs, we get

axa1 yb = px

bxa yb1 = py

Dividing the first equation by the second, we get

axa1 yb px

=

bxa yb1 py

ay px

=

bx py

b

py y = px x.

a

We use this in the third FOC to get,

px x + py y = I

b

px x + px x = I

a

a+b

px x = I

a

a a I

px x = I x =

a+b a + b px

This gives

b b I

py y = I y =

a+b a + b py

29. Solution to PS 8 267

(c) Solve for the value of . What is the economic interpretation of ? When is an

increasing, decreasing or constant function of income?

We use the first FOC (with respect to x) to get,

axa1 yb

axa1 yb = px =

px

( )a1 ( )b

a I b I

a a+b px a+b py

=

px

( )a ( )b ( )a+b1

a b I

= >0

px py a+b

L (x , y , ) = U(x , y ) + (I px py y )

= (x )a (y )b + (0)

Suppose the income increased by a dollar. Then the utility goes up by .

Lastly the is increasing with income if and only if a + b > 1.

(d) Show that the second order conditions hold?

Observe that the second order partial derivatives are,

2 L

= a(a 1)xa2 yb

x2

2 L

= abxa1 yb1

xy

2 L

= b(b 1)xa yb2

y2

2 L

= px

x

2 L

= py

y

2

L 2 L 2 L

x

2 xy x a(a 1)x a2 yb abx a1 yb1 p x

a yb2 p .

H = 2 L

2L 2 L

xy y 2 y = abx a1 yb1 b(b 1)x y

2 L 2 L 2 L px py 0

x y 2

The border preserving leading principal minor of order 2 is the Hessian matrix itself.

For the second order condition to be satisfy, the determinant of the Hessian needs to be

268 29. Solution to PS 8

positive.

(py )[(py )a(a 1)xa2 yb (px )abxa1 yb1 ]

= px [py abxa1 yb1 px b(b 1)xa yb2 ] py [py a(a 1)xa2 yb px abxa1 yb1 ]

= 2px py abxa1 yb1 p2x b(b 1)xa yb2 p2y a(a 1)xa2 yb

[ ]

2abp p b(b 1)p 2 a(a 1)p 2

= (x )a (y )b

x y y

x

xy y2 x2

2abpx py b(b 1)px a(a 1)py

2 2

= (x )a (y )b aI bI

bI

aI

(a+b)px (a+b)py ( (a+b)p y

)2 ( (a+b)p x

)2

( [ ]2 [ ]2 [ ]2 )

(a + b)p p b 1 p p a 1 p p

= (x )a (y )b 2

x y x y x y

(a + b)2 (a + b)2

I b I a I

[ ]2 ( )

(a + b)px py b1 a1

= (x )a (y )b 2

I b a

[ ]2 ( )

a b (a + b)px py 1 1

= (x ) (y ) 21+ 1+

I b a

[ ]2 ( )

(a + b)px py 1 1

= (x )a (y )b + >0

I b a

dx

(e) Show that the implicit function theorem value of dI is identical to the value of taking

the partial derivative of x with respect to I.

Using x , we get

x a 1

=

I a + b px

29. Solution to PS 8 269

0 abxa1 yb1 px

det 0 b(b 1)xa yb2 py

dx 1 py 0 1[(py )abxa1 yb1 (px )b(b 1)xa yb2 ]

= =

dI det H det H

[py abx y px b(b 1)x y ]

a1 b1 a b2

=

det H

[apy y px (b 1)x] xpx bxa yb2 px

= bxa1 yb2 = bxa1 yb2 =

det H det H det H

bxa yb2 px

= [ ] ( )

(a+b)px py 2 1

(x )a (y )b I b + 1

a

bpx

= [ ] ( )

(a+b)px py 2 a+b

(y )2 I ab

px 1 a 1

=[ ]2 ( ) =( ) = .

y (a+b)py a+b a+b

px a + b px

bI a p2x a

(f) A consumers indirect utility function is defined to be utility as a function of prices and

income. Use x and y to solve for the indirect utility function. Is it true that the partial

of the indirect utility function with respect to income equals ?

The indirect utility function is

( )a ( )b

a b aI bI

u = u(x , y ) = (x ) (y ) =

(a + b)px (a + b)py

( )a ( ) b

a b

= I a+b

(a + b)px (a + b)py

Then,

( )a ( )b

u a b

= (a + b) I a+b1

I (a + b)px (a + b)py

( )a ( )b ( )a+b1

a b I

= =

px py a+b

(B) Expenditure Minimization:

Now consider the dual of the utility maximization problem. The dual problem is to min-

imize expenditures, px x + py y, subject to reaching a given level of utility, u0 (the constraint

is therefore U0 xa yb = 0).

270 29. Solution to PS 8

(a) What are the first order conditions for expenditure minimization?

First, we write down the minimization problem as

min px x + py y

subject to xa yb u0 ,

which can be converted into a maximization exercise as under:

max px x py y

subject to xa yb u0 ,

The Lagrangean for the maximization problem is

L (x, y, ) = px x py y + (xa yb u0 )

L

= px + axa1 yb = 0

x

L

= py + bxa yb1 = 0

y

L

= xa yb u0 = 0

(b) Use the first order conditions to solve for x and y (these are called the Hicksian or

compensated demand functions).

From the first two FOCs, we get

axa1 yb = px

bxa yb1 = py

Dividing the first equation by the second, we get

axa1 yb px

=

bx y

a b1 py

ay px

=

bx py

b px

y= x.

a py

We use this in the third FOC to get,

( )b ( ) b

a b a b p x a+b u0 a py a+b a+b1

x y = u0 ; x x = u0 ; x =( )b ; x = u0

a py b px b px

a py

( ) b ( ) a+b

a

b px b px a py a+b 1

a+b

b px 1

y = x = u0 = u0a+b

a py a py b px a py

29. Solution to PS 8 271

It is easy to see that the bordered Hessian is same as in the case of utility maximization

exercise. Hence we conclude that the SOC holds in this case.

(d) Write the level of income, I, necessary to reach U0 as a function of U0 , prices, and

parameters. How does this expenditure function relate to the indirect utility function?

( ) a+b

b ( ) a+b

a

a py 1 b px 1

e(px , py , u0 ) = px x + py y = px a+b

u0 + py u0a+b

b px a py

( ) a+b

b ( ) a+b

a

a b

= (pax pby u0 ) a+b

1

+

b a

1

a+b

( )a ( ( ) a+b

b )b

( ) a+b

a

a b a b

I a+b

1

a b a+b

= (px py ) +

(a + b)px (a + b)py b a

[( 1

)a ( )b ] a+b ( ) a+b

b ( ) a+ba

a b a b I = I.

= +

a+b a+b b a

This shows that the minimum expenditure required to attain utility equal to the indirect

utility function is same as the income I. Thus the two approaches are equivalent.

(e) To avoid confusion, let us call solution for utility maximization of good x as x and

solution for good x in expenditure minimization as h . Prove that

x h x

= x .

Px Px I

Interpret this answer.

1

Observe that we can rewrite h as h = (px ) a+b where ( ab py ) a+b u0a+b . This gives

b b

us

( ) ( )

h b a+b

b

1 b h

= (px ) = .

px a+b a + b px

Also from the utility maximization, we get,

( )

x aI x

= (px )2 = .

px a+b px

and

( )

x a

x =x (px )1 .

I a+b

272 29. Solution to PS 8

Therefore,

( )( ) ( )

x x

x a x b x h

+x = + = = .

px I px a+b px a + b px px

The change in x due to change in own price px (Total effect) is the sum of the substi-

h

tution effect ( p x

) and the income effect (x x

I ).

8. Suppose a consumer has the utility function U = a ln(x x0 ) + b ln(y y0 ) where a, b, x0 and y0

are positive parameters. Assume that the usual budget constraint applies.

(a) Solve for the consumers demand for good x.

Observe that the utility maximization exercise makes sense if consumption bundle (x0 , y0 )

is feasible. Let us denote x x0 by x and y y0 by y . Then the utility function can be

written as U(x , y ) = a ln(x ) + b ln(y ). The budget constraint px + qy = I can be written as

px +qy = I px0 qy0 = I . The utility maximization exercise can therefore be formulated

as

max a ln(x ) + b ln(y )

subject to px + qy = I .

L (x , y , ) = a ln(x ) + b ln(y ) + (I px x py y )

L a

= px = 0

x x

L b

= py = 0

y y

L

= I px x py y = 0

a b

= px ; = py

x y

ay px b

= ; py y = px x .

bx py a

29. Solution to PS 8 273

px x + py y = I

b

px x + px x = I

a

a+b

px x = I

a

a a I

px x = I x =

a+b a + b px

This gives

b b I

py y = I y =

a+b a + b py

We need to show that the second order conditions hold for the solution to yield a maximum.

Observe that the second order partial derivatives are,

2 L a 2 L 2 L b 2 L 2 L

= ; = 0; = ; = p x ; = py

x2 (x )2 xy y2 (y )2 x y

Using these, we get the bordered Hessian matrix as under:

2 a

L 2 L 2 L

x xy x (x )2 0 px

2 L 2 L 2 L

2

H = xy y2 y = 0 (yb )2 py .

2 L 2 L 2 L px py 0

x y 2

The border preserving leading principal minor of order 2 is the Hessian matrix itself. For

the second order condition to be satisfy, the determinant of the Hessian needs to be positive.

[ ( )] [ ( )]

b a

det H = (px ) (px ) 2 (py ) (py ) 2

(y ) (x )

( ) ( )

bp2x ap2y

= + > 0.

(y )2 (x )2

Thus SOC holds and we have a maximum. The optimum consumption bundle is

a I px x0 py y0

x = x + x0 = + x0

a+b px

a I py y0 b

= + x0

a + b px a+b

b I px x0 a

y = + y0

a + b py a+b

(b) Find the elasticities of demand for good x with respect to income and prices.

It is easy to compute the price and income elasticity using the definitions. Please let me

know if you have any questions on this.

274 29. Solution to PS 8

(c) Show that the utility function V = 45(x x0 )3.5a (y y0 )3.5b would have yielded the same

demand for good x.

If we take positive monotone transformation of the given utility by taking its natural log,

then we get a function which is similar to the utility function in (a).

lnV = ln 45 + 3.5a ln(x x0 ) + 3.5b ln(y y0 )

= ln 45 + 3.5(U)

This implies that the consumption bundle (x , y ) will maximize the utility function V also.

U(x, y, z) = a ln(x) + b ln(y) + c ln(z)

where a > 0, b > 0 and c > 0 are such that a + b + c = 1. The budget constraint can be written

as

g1 (x, y, z) = I px qy rz 0.

The rationing constraint is

g2 (x, y, z) = k x 0.

(a) This problem has two inequality constraints and so we will use Kuhn Tucker Sufficiency

theorem.

(i) Let { }

X = (x, y, z) R2+++ .

Then X is open as its complement

{ }

X C = (x, y, z) R3 | x 6 0, y 6 0, z 6 0

is closed.

(ii) Function U (x, y, z) is continuous in x, y, and z (being sum of log functions). Let

g1 (x, y, z) = I pxqyrz, g2 (x, y, z) = kx, g3 (x, y, z) = x, g4 (x, y, z) = y, g5 (x, y, z) =

z are linear and hence continuous functions.

It is possible to infer that f , g j ( j = 1, , 5) are twice continuously differentiable on

X and the set X is convex.

(iii) Function U (x, y, z) is concave as

[ ]

f (x, y, z) = a

x

b

y

c

z

x2

a

0 0

0

H f (x, y, z) = 0 yb2 .

0 0 z2 c

The determinant of leading principal minors of order one is xa2 < 0; of leading prin-

cipal minor of order two is xab

2 y2 > 0; and of leading principal minor of order three

29. Solution to PS 8 275

is x2abc y2 z2

< 0 for (x, y, z) X. Hence f is concave. Further, g j ( j = 1, , 5) are

concave being linear functions.

Hence all conditions of Kuhn -Tucker sufficiency theorem are satisfied. We need to find pair

((x , y , z ) , ) X R5+ , that satisfies the Kuhn-Tucker conditions.

5

(i) Di f (x , y , z ) + j Di g j (x , y , z ) = 0; i = 1, , 3,

j=1

j

They are

(29.29) a

x 1 p 2 + 3 = 0

(29.30) b

y 1 q + 4 = 0

(29.31) c

z 1 r + 5 = 0

(29.32) I px qy rz > 0, 1 (I px qy rz) = 0

(29.33) k x > 0, 2 (k x) = 0

(29.34) x > 0, 3 x = 0; y > 0, 4 y = 0, z > 0, 5 z = 0.

If 1 = 0, then by 1 q + 4 = 0 4 = by < 0 which contradicts 4 > 0. Hence

1 > 0 I px qy rz = 0

Also, x > 0, y > 0, and z > 0 for the three FOC to hold with equality. Thus 3 = 0 = 4 = 5 .

(i) If 2 > 0 then x = k, and

b c

I pk = qy + rz = + .

1 1

Thus 1 = b+c

Ipk which leads to

b(I pk) c(I pk)

y= and z = .

q(b + c) q(b + c)

We need to verify 2 > 0 which will hold if 2 = ak (b+c)p

Ipk > 0 or

a pk

> .

b + c I pk

(ii) If 2 = 0, then

aI bI cI I

x= ;y = ;z = ; 1 =

p(a + b + c) q(a + b + c) r(a + b + c) (a + b + c)

satisfies the KT conditions (Please verify).

(b)

a pk

> .

b + c I pk

276 29. Solution to PS 8

(c)

b(Ipk)

py (b+c) b

= c(Ipk)

= .

rz c

(b+c)

(d) No, it is more likely that one buys more rice and less butter.

- IvyGlobal-SAT Math ReviewUploaded byJakBlack
- Math 120 Final Exam Review (3)Uploaded byBrent Matheny
- Reviewer MathUploaded byMel-jr Valencia
- LET Math Final HandoutUploaded byCarla Naural-citeb
- MAJOR ReviewerUploaded byIyen Dalisay
- Math Review 2 AlgebraUploaded bynevers23
- Math reviewUploaded bytwsttwst
- Percent Increase and DecreaseUploaded byMr. Peterson
- Let Review 2015Uploaded byJona Addatu
- DynaMath Reviewer for Basic AlgebraUploaded byAngeli98
- April Math AssessmentUploaded bymeemsickle
- Grade 8 | Math assessmentUploaded byCourier Journal
- Math ReviewUploaded byMauricio Gracia
- Grade 10 Math ReviewUploaded bynob
- CPTCollegeMathReviewUploaded byMat0021
- Math Review NAT_20151010Uploaded bydramachines
- SAT/ ACT/ Accuplacer Math ReviewUploaded byaehsgo2college
- Writing and Math AssessmentUploaded by1224adh
- LET General Math ReviewerUploaded byMarco Rhonel Eusebio
- Grade 9 Math ReviewUploaded byCourseCentral

- Stanford CS103 Course ReaderUploaded byTucker Leavitt
- Kmpt Kedah 2016 AnsUploaded bySky Chin
- Mfcs NotesUploaded bySeravana Kumar
- What is Proof?Uploaded byMichael de Villiers
- Transformation GroupsUploaded byteesri_aankh
- 1andreescu_titu_mortici_cristinel_tetiva_marian_mathematical.pdfUploaded byMarcos Vinícius
- Section1.3Uploaded byKaly Rie
- How Euler Did It 09 v E and F Part 2Uploaded byΗαμιδ Εφθηχαρι
- Descartes StanfordUploaded byAnonyo X Plt
- Principle of Mathematical InductionUploaded byAnirvan Shukla
- congruence.08Uploaded bybelizeannairb
- Jacquette v20n2Uploaded byE. Goldstajn
- RESPONSES TO "THEORETICAL MATHEMATICS: TOWARD A CULTURAL SYNTHESIS OF MATHEMATICS AND THEORETICAL PHYSICS", BY A. JAFFE AND F. QUINNUploaded bycalamart
- 17-4Uploaded byksr131
- Martin-Löf - A Path From Logic to MetaphysicsUploaded byAnderson Luis Nakano
- Argument Essay Rubric 1Uploaded bygrace
- Euclid 5th Postulate's Verification by Al-HaythamUploaded byempezardc
- AI Propsitional LogicUploaded byambynith
- G_Reading_ListsUploaded byapi-3752698
- 2.Goursat’s proofUploaded byMichael Kwok
- The Proof-Structure of Kant's Transcendental DeductionUploaded byZhennya Slootskin
- Discrete Math ReportUploaded byseidametova
- Numerical Methods in EconomicsUploaded byJuanfer Subirana Osuna
- Discrete Math and Probability TheoryUploaded byninjatron
- Richard Carrier- Proving History or IdiocyUploaded by50_BMG
- 370 Proofs of Pythagores TheoremUploaded byPawan Pathak
- Occult Ability & ClarvoyanceUploaded byLaron Clark
- PLC Formal ArticuloUploaded byCesar Fabian Bolívar Guerrero
- CORRY - history of Fermat's Last Theorem.pdfUploaded byYannis Vlioras
- MacKenzie TheAutomationOfProofUploaded byMisósofo Butron