
Lectures on Financial Economics

by Antonio Mele

Swiss Finance Institute, University of Lugano


Centre for Economic Policy Research

April 2016

© by A. Mele

Front cover explanations


Top: Illustration of the increased efficiency in maritime routing allowed by the Suez
Canal (right panel) opened in 1869, and the Panama Canal (left panel) opened in
1914, two amongst the most enduring technological marvels with global economic
and political implications.
Bottom: A 75-year, 3% coupon-bearing bond issued by the Panama Canal Company
(Compagnie Universelle du Canal Interoceanique de Panama) in October 1884.
The company defaulted in 1889 under the leadership of the Count Ferdinand de
Lesseps, who during 1858 had also founded the Suez Canal Company (Compagnie
Universelle du Canal Maritime de Suez).


Preface

These Lectures on Financial Economics are based on notes I wrote in support of advanced
undergraduate and graduate lectures in financial economics, macroeconomic dynamics, financial
econometrics and financial engineering.
Part I, Foundations, develops the fundamental tools of analysis used in Part II and Part III.
These tools span such disparate topics as classical portfolio selection, dynamic consumption- and
production- based asset pricing, in both discrete and continuous-time, the intricacies underlying
incomplete markets and other market imperfections and, finally, econometric tools comprising
maximum likelihood, methods of moments, and the relatively more modern simulation-based
inference methods.
Part II, Applied asset pricing theory, is about identifying the main empirical facts in finance
and the challenges they pose to financial economists: from excess price volatility and
countercyclical stock market volatility, to cross-sectional puzzles such as the value premium. This
second part reviews the main models aiming to take these puzzles on board.
Part III, Asset pricing and reality, aims to do just this: to use the main tools in Part I and the
lessons drawn from Part II, so as to cope with the main challenges occurring in actual capital
markets, arising from option pricing and trading, interest rate modeling and credit risk and
their associated derivatives. In a sense, Part II is about the big puzzles we face in fundamental
research, while Part III is about how to live within our current and certainly unsatisfactory
paradigms, so as to cope with demand for intellectual expertise.
These notes are still underground. Economic motivation and intuition are not always developed
as they would deserve, some derivations are inelegant, and sometimes, the English is a bit
informal. Moreover, I still have to include material on monetary models of asset prices, theories
of the nominal and the real term structure of interest rates, bubbles, asset prices implications of
overlapping generations models, or financial frictions and their interconnections with business
cycle developments. Finally, I need to include more extensive surveys for each topic I cover,
especially in Chapters 1, 3, 5, 6, and 10. Of the 13 Chapters I have already drafted, I believe
Chapters 1 and 6 are those in need of the most serious revamp. I plan to revise these notes to
fill all these gaps. Meanwhile, any comments on this version are more than welcome.

Antonio Mele
April 2016


Antonio Mele does not accept any liability for any losses related to the use of the
models, data, and methods described or developed in these lectures.

Contents

I

Foundations

1 The classic capital asset pricing model

1.1 Introduction
1.2 Portfolio selection
1.2.1 Wealth constraints
1.2.2 Portfolio choice
1.2.3 Without the safe asset
1.2.4 The global minimum variance portfolio
1.2.5 The market portfolio
1.2.6 Tobin's re-interpretation of Keynesian speculative demand for money
1.3 The CAPM
1.3.1 Foundations
1.3.2 Zero-beta CAPM
1.3.3 Equilibrium with expected utility
1.3.4 Applications
1.4 The APT
1.4.1 Exact APT
1.4.2 Risk-neutral tilts, or the fundamental theorem of asset pricing
1.4.3 The APT with idiosyncratic risk and a large number of assets
1.4.4 Systematic risk
1.5 Empirical evidence
1.5.1 Fama & MacBeth
1.5.2 Macroeconomic forces
1.5.3 Fama & French
1.5.4 The high-beta stocks anomaly

1.6 Stochastic dominance
1.7 Appendix 1: Analytical details relating to portfolio choice
1.7.1 The primal program
1.7.2 The dual program
1.8 Appendix 2: The market portfolio
1.8.1 The tangent portfolio is the market portfolio
1.8.2 Tangency condition
1.9 Appendix 3: An alternative derivation of the SML
1.10 Appendix 4: Demand for money and liquidity traps
References


2 Arbitrage, equilibrium and pricing

2.1 Introduction
2.2 The static general equilibrium in a nutshell
2.2.1 Walras' Law
2.2.2 Competitive equilibrium
2.2.3 Optimality
2.3 The role of financial securities in markets with uncertainty
2.3.1 Commodity markets
2.3.2 Financial securities
2.3.3 Gambles and securities
2.3.4 Arrow-Debreu securities
2.3.5 Pricing by arbitrage and replication in complete markets: an introductory example
2.3.6 Replication and pricing: the role of complete markets
2.4 No-arbitrage theory
2.4.1 Lands of Cockaigne
2.4.2 Enforced asset prices
2.5 Equivalent martingales, and equilibrium
2.5.1 Rational expectations
2.5.2 Stochastic discount factors
2.5.3 Equilibrium
2.6 Consumption-CAPM
2.6.1 Risk-neutral pricing and macroeconomic risks
2.6.2 The beta relation
2.6.3 CCAPM & CAPM
2.7 Infinite horizon
2.8 Further topics on incomplete markets
2.8.1 Nominal assets and real indeterminacy of the equilibrium
2.8.2 Nonneutrality of money
2.9 Appendix 1
2.10 Appendix 2: Proofs of selected results
2.11 Appendix 3: The multicommodity case
References

3 Infinite horizon economies

3.1 Introduction
3.2 Consumption-based asset evaluation
3.2.1 Recursive plans: introduction
3.2.2 Asset pricing: the marginalist argument
3.2.3 Intertemporal elasticity of substitution
3.2.4 Lucas' model
3.3 Production: foundational issues
3.3.1 Decentralized economy
3.3.2 The social planner solution
3.3.3 Dynamics
3.3.4 Stochastic economies
3.4 Production-based asset pricing
3.4.1 Firms
3.4.2 Consumers
3.4.3 Equilibrium
3.5 Money, production and asset prices in overlapping generations models
3.5.1 Introduction: endowment economies
3.5.2 Diamond's model
3.5.3 Money
3.5.4 Money in a model with real shocks
3.6 Optimality and bubbles
3.6.1 Economies with production
3.6.2 Over-accumulation of capital
3.6.3 Money
3.7 Appendix 1: Finite difference equations, with economic applications
3.8 Appendix 2: Neoclassic growth in continuous-time
3.8.1 Convergence from discrete-time
3.8.2 The model
3.9 Appendix 3: Notes on optimization of continuous time systems
References

4 Continuous time models

4.1 Introduction
4.2 An introduction to no-arbitrage and equilibrium
4.2.1 Time
4.2.2 The origins: Black & Scholes
4.2.3 Asset prices as Feynman-Kac representations
4.2.4 The Girsanov theorem
4.2.5 The APT in continuous time
4.2.6 Example: no-arbitrage in Lucas tree
4.3 Distortions and numeraires
4.3.1 Leading example: consumption-based probabilities
4.3.2 Numeraire pricing
4.4 Martingales and arbitrage
4.4.1 The information framework
4.4.2 Viability
4.4.3 Market completeness
4.5 Equilibrium with a representative agent
4.5.1 Merton's approach: dynamic programming
4.5.2 Martingale methods
4.5.3 Continuous time Consumption-CAPM
4.6 Partial hedging in incomplete markets: introduction
4.7 Inaction: the economics of American options
4.7.1 Early exercise premiums: an introductory example
4.7.2 Gambles and securities again
4.7.3 Real options theory
4.7.4 Perpetual puts
4.7.5 Perpetual calls
4.8 Further topics on real options and controlled Brownian motions
4.8.1 Irreversible investments and the decision to invest
4.8.2 A model of determination of exchange rates in target zones
4.8.3 Liquidity constraints and optimal dividend policy
4.9 Portfolio constraints
4.9.1 Technical background
4.9.2 Artificial markets
4.10 Jumps
4.10.1 Poisson jumps
4.10.2 Interpretation
4.10.3 Properties and related distributions
4.10.4 Asset pricing implications
4.10.5 An option pricing formula
4.11 Continuous time Markov chains
4.12 Appendix 1: An introduction to stochastic calculus for finance
4.12.1 Stochastic integrals
4.12.2 Stochastic differential equations
4.13 Appendix 2: Self-financed strategies, from discrete to continuous time
4.13.1 The basic dynamics
4.13.2 Models with final consumption only
4.14 Appendix 3: Proof of selected results
4.14.1 Proof of Theorem 4.3
4.14.2 Proof of Eq. (4.82)
4.14.3 Walras's consistency tests
4.15 Appendix 4: The Green's function
4.15.1 Setup
4.15.2 The PDE connection
4.16 Appendix 5: Portfolio constraints
4.17 Appendix 6: Topics on jumps
4.17.1 The Radon-Nikodym derivative
4.17.2 Arbitrage restrictions
4.17.3 State price density: introduction
4.17.4 State price density: general case
References


5 Taking models to data

5.1 Introduction
5.2 Data generating processes
5.2.1 Basics
5.2.2 Restrictions on the DGP
5.2.3 Parameter estimators
5.2.4 Basic properties of density functions
5.2.5 The Cramer-Rao lower bound
5.3 Maximum likelihood estimation
5.3.1 Basics
5.3.2 Factorizations
5.3.3 Asymptotic properties
5.4 M-estimators
5.5 Pseudo, or quasi, maximum likelihood
5.6 GMM
5.7 Simulation-based estimators
5.7.1 Three simulation-based estimators
5.7.2 Asymptotic normality
5.7.3 A fourth simulation-based estimator: Simulated maximum likelihood
5.7.4 Advances
5.7.5 In practice? Latent factors and identification
5.8 Asset pricing, prediction functions, and statistical inference
5.9 Appendix 1: Proof of selected results
5.10 Appendix 2: Collected notions and results
5.11 Appendix 3: Theory for maximum likelihood estimation
5.12 Appendix 4: Dependent processes
5.12.1 Weak dependence
5.12.2 The central limit theorem for martingale differences
5.12.3 Applications to maximum likelihood
5.13 Appendix 5: Proof of Theorem 5.4
References

II

Applied asset pricing theory

6 Neo-classical kernels and puzzles

6.1 Introduction
6.2 The equity premium puzzle
6.2.1 A single-factor model
6.2.2 Extensions
6.2.3 Equity premium and interest rate puzzles
6.3 Hansen-Jagannathan cup
6.4 Multifactor extensions
6.4.1 Exponential affine pricing kernels
6.4.2 Lognormal returns
6.5 Conditional CAPM
6.6 Pricing kernels and Sharpe ratios
6.6.1 Market portfolios and pricing kernels
6.6.2 Pricing kernel bounds
6.7 Conditioning bounds
6.8 Appendix
References


7 Aggregate fluctuations in equity markets

7.1 Introduction
7.2 Empirical evidence: bird's eye view
7.3 Volatility: a business cycle perspective
7.3.1 Volatility cycles
7.3.2 Understanding the empirical evidence
7.3.3 What to do with stock market volatility?
7.3.4 What did we learn?
7.4 Rational market fluctuations
7.4.1 The dynamics of asset returns
7.4.2 Asset prices as options
7.5 Time-varying discount rates or uncertain growth?
7.5.1 Tackling the puzzles
7.5.2 Markov pricing kernels, asset returns and volatility
7.5.3 External habit formation
7.5.4 Large price swings as a learning induced phenomenon
7.5.5 Linearity-generating processes
7.6 Modeling market-to-book ratios
7.7 Appendix 1: Estimation of the market expected return
7.8 Appendix 2: Calibration of the tree in Section 7.3
7.9 Appendix 3: Asset prices in a multifactor model
7.10 Appendix 4: Arrow-Debreu PDEs
7.11 Appendix 5: The maximum principle
7.12 Appendix 6: Stochastic dominance beyond Rothschild and Stiglitz
7.12.1 Dynamic stochastic dominance
7.12.2 Proof of Theorem 7.1
7.13 Appendix 7: Dynamics of habit in Campbell and Cochrane (1999)
7.14 Appendix 8: An algorithm to simulate discrete-time pricing models
7.15 Appendix 9: Heuristic details of learning in continuous time
7.16 Appendix 10: Linear regime-switching economies
7.17 Appendix 11: Bond price convexity revisited
References

8 Macrofinance

8.1 Introduction
8.2 Non-expected utility
8.2.1 Recursive formulations
8.2.2 Testable restrictions
8.2.3 Risk premiums and interest rates
8.2.4 Campbell-Shiller approximation
8.2.5 Risks for the long-run
8.3 Heterogeneous agents and catching up with the Joneses
8.4 Idiosyncratic risk
8.4.1 A static model
8.4.2 Self-insurance and persistence of idiosyncratic shocks
8.4.3 A model with countercyclical income inequality
8.5 Incomplete markets with homogeneous and heterogeneous agents
8.5.1 Idiosyncratic shocks unrelated to aggregate risk
8.5.2 A two-agents economy
8.6 Disagreement and learning
8.6.1 Learning with multiple signals
8.6.2 Overconfidence and bubbles
8.6.3 General equilibrium without frictions
8.7 Coping with Knightian uncertainty
8.7.1 Prelude
8.7.2 Uncertainty aversion and Ellsberg paradox
8.7.3 Portfolio selection and market participation
8.7.4 A model of multiple likelihoods
8.8 Production
8.9 Government spending and asset prices
8.10 Leverage and volatility
8.10.1 Primitives
8.10.2 Equity volatility: a decomposition formula
8.10.3 Bankruptcy
8.11 Multiple trees and the cross-section of asset returns
8.12 The term-structure of interest rates
8.13 Prices, quantities and the separation hypothesis
8.13.1 A closed-form expression for non-expected utility
8.13.2 Preferences for robustness
8.13.3 Irrelevance
8.14 Endogenous risk and the financial accelerator doctrine
8.14.1 Credit cycles
8.14.2 Amplification
8.14.3 Additional literature
8.15 Appendix 1: Non-expected utility
8.15.1 Detailed derivation of optimality conditions and selected relations
8.15.2 Details regarding models of long-run risks
8.15.3 Continuous time
8.16 Appendix 2: Economies with heterogeneous agents
8.17 Appendix 3: Knightian uncertainty
References

9 Information and other market frictions

9.1 Introduction
9.2 Prelude: imperfect information in macroeconomics
9.3 Informational efficiency: roadmap
9.4 Walrasian equilibria as informationally inefficient outcomes
9.5 Rational Expectations Equilibrium
9.6 Noisy Rational Expectations Equilibrium
9.6.1 Asymmetric information: information transmission
9.6.2 Differential information: information aggregation
9.7 Dealers markets: Introduction
9.7.1 Markets with symmetric information
9.7.2 With asymmetric information
9.8 Markets with strategic players
9.8.1 The Kyle baseline model
9.8.2 Markets with multiple traders and dealers
9.8.3 Dynamic markets
9.9 Further topics on market microstructure, frictions and limits to arbitrage
9.9.1 Further determinants of bid-ask spreads
9.9.2 Liquidity trading
9.9.3 Arbitrage imperfections
9.9.4 Price impacts and derivatives
9.10 Over-the-counter markets
9.11 Questions regarding higher order beliefs and beauty contests
9.12 Appendix 1: The projection theorem
9.13 Appendix 2: Details regarding solutions of selected models
9.14 Appendix 3: Some foundations to pricing behavior in macroeconomics
References

Part III
Asset pricing and reality

10 Options and volatility
10.1 Introduction
10.2 Forwards and futures
10.2.1 Forwards: definition and pricing in frictionless markets
10.2.2 Forwards as a means to borrow money
10.2.3 Marking to market
10.2.4 Futures
10.2.5 Backwardation and Contango
10.3 Optionality and no-arb bounds
10.3.1 Model-free properties
10.3.2 Hedging
10.3.3 A case study: accumulators, decumulators
10.4 Evaluation
10.4.1 A pricing formula
10.4.2 Black & Scholes
10.4.3 Surprising cancellations and preference-free formulae
10.4.4 Future options and Black's formula
10.4.5 Hedging
10.4.6 Endogenous volatility
10.4.7 Properties of options in diffusive models
10.5 Stochastic volatility
10.5.1 Statistical models of changing volatility
10.5.2 Implied volatility, smiles and skews
10.5.3 Option pricing with stochastic volatility
10.6 Trading volatility with options
10.6.1 Payoffs
10.6.2 P&Ls of Δ-hedged strategies
10.7 Local volatility
10.7.1 Issues
10.7.2 Implied binomial trees
10.7.3 The perfect fit, in continuous time
10.7.4 Relations with implied volatility
10.8 The price of (equity) volatility
10.8.1 One introductory example: range-based volatility
10.8.2 Fear gauge contracts
10.8.3 Forward volatility trading
10.8.4 Marking to market
10.8.5 Stochastic interest rates
10.8.6 Hedging
10.9 A digression on skewness
10.10 Dealing with market imperfections
10.11 Appendix 1: The original arguments of Black & Scholes
10.12 Appendix 2: Black (1976)
10.13 Appendix 3: Stochastic volatility
10.13.1 Hull & White equation
10.13.2 Extensions
10.13.3 Smile analytics
10.14 Appendix 4: Local volatility
10.15 Appendix 5: Variance contracts
10.16 Appendix 6: Skewness contracts
References

11 Engineering of fixed income securities
11.1 Introduction
11.1.1 Relative pricing in fixed income markets
11.1.2 Many evaluation paradigms
11.1.3 Plan of the chapter
11.2 Markets and interest rate conventions
11.2.1 Markets for interest rates
11.2.2 Mathematical definitions of interest rates
11.2.3 Yields to maturity on coupon bearing bonds
11.2.4 Accruals, invoice, and clean prices on coupon bearing bonds
11.3 Duration and convexity hedging and trading
11.3.1 Duration
11.3.2 Convexity
11.3.3 Asset-liability management
11.4 Foundational issues in interest rate modeling
11.4.1 Tree representation of the short-term rate
11.4.2 Tree pricing
11.4.3 Introduction to calibration
11.4.4 Calibrating probabilities through derivative data
11.4.5 Extensions to trinomial trees
11.5 The Ho and Lee model
11.5.1 The tree
11.5.2 Price movements and the martingale restriction
11.5.3 The recombining condition and interest rate volatility
11.5.4 Model's solution
11.5.5 Calibration of the model
11.5.6 An example
11.5.7 Continuous-time approximations, with an application to barbell trading
11.6 Beyond Ho and Lee: Calibration through Arrow-Debreu securities
11.6.1 Extracting Arrow-Debreu securities from the yield curve
11.6.2 Two model examples
11.7 Callables, puttable and convertibles with trees
11.7.1 Definitions and rationale
11.7.2 Callable bonds
11.7.3 Convertible bonds
11.8 Appendix 1: Bootstrapping and no-arbitrage restrictions
11.9 Appendix 2: Proof of Eq. (11.17)
11.10 Appendix 2: The Ho and Lee price representation
References

12 Interest rates
12.1 Introduction
12.2 Bond prices and interest rates
12.2.1 A first representation of bond prices
12.2.2 Forward rates
12.2.3 A second representation of bond prices
12.3 Stylized facts
12.3.1 The expectation hypothesis
12.3.2 Bond returns predictability
12.3.3 The yield curve and the business cycle
12.3.4 Additional stylized facts about the US yield curve
12.3.5 Common factors affecting the yield curve
12.4 Models of the short-term rate: Introduction
12.4.1 Models versus representations
12.4.2 The bond pricing equation
12.4.3 Stochastic duration
12.4.4 Some famous models
12.4.5 The Monetary Experiment and interest rate volatility
12.4.6 Short-term rates as jump-diffusion processes
12.5 Multifactor models of the short-term rate
12.5.1 Stochastic volatility
12.5.2 Three-factor models
12.5.3 Affine and quadratic term-structure models
12.5.4 Unspanned stochastic volatility
12.5.5 Topics regarding estimation and trading strategies
12.6 No-arbitrage models: early formulations
12.6.1 Fitting the yield-curve, perfectly
12.6.2 Ho & Lee
12.6.3 Hull & White
12.7 The Heath-Jarrow-Morton framework
12.7.1 Framework
12.7.2 The model
12.7.3 The dynamics of the short-term rate
12.7.4 Embedding
12.7.5 Stochastic string shocks models
12.8 Interest rate derivatives
12.8.1 Fixed income market volatility and the persistence of the short-term rate
12.8.2 Hypothetical continuous payoffs
12.8.3 Forward martingale probabilities
12.8.4 European options on bonds
12.8.5 Callable and puttable bonds
12.8.6 Options on fixed coupon bonds
12.8.7 Interest rate swaps
12.8.8 Caps & floors
12.8.9 Swaptions
12.9 Market models
12.9.1 Models and market practice
12.9.2 Simply-compounded forward rate dynamics, and no-arb restrictions
12.9.3 Applications to derivative evaluation
12.10 Volatility surfaces
12.10.1 Implied volatilities
12.10.2 Local volatilities and SABR models
12.11 Appendix 1: The FTAP for bond prices
12.12 Appendix 2: Certainty equivalent interpretation of forward prices
12.13 Appendix 3: Additional results on forward probabilities
12.14 Appendix 4: Principal components analysis
12.15 Appendix 5: A few analytical details regarding the Hull and White model
12.16 Appendix 6: Expectation theory and embedding in selected models
12.17 Appendix 7: Additional results on string models
12.18 Appendix 8: Changes of numeraire and Jamshidian's (1989) formula
References

13 Risky debt and credit derivatives
13.1 Introduction
13.1.1 A brief history of credit risk and financial innovation
13.1.2 Plan of the chapter
13.2 The classics: Modigliani-Miller irrelevance results
13.3 Conceptual approaches to valuation of defaultable securities
13.3.1 Firm value, or structural, approaches
13.3.2 The structural approach in practice: the pricing of convertible bonds
13.3.3 Reduced form approaches: rare events, or intensity, models
13.3.4 Ratings
13.4 Credit derivatives and structured products based thereon
13.4.1 Options and spreads
13.4.2 Credit Default Swaps
13.4.3 Collateralized Debt Obligations (CDOs)
13.5 Foundations of risk-management
13.5.1 Value at Risk (VaR)
13.5.2 Backtesting
13.5.3 Stress testing
13.5.4 Credit risk and VaR
13.5.5 Expected shortfall and measures of systemic risk
13.6 Procyclicality, credit crunches and quantitative easing
13.6.1 Regulatory framework
13.6.2 The 2007 subprime crisis
13.6.3 Top tier capital ratio targets and endogenous volatility
13.6.4 Credit crunches and quantitative easing
13.7 Appendix 1: Present values contingent on future bankruptcy
13.8 Appendix 2: Proof of selected results
13.9 Appendix 3: Transition probability matrices and pricing
13.10 Appendix 4: Bond spreads in markets with stochastic default intensity
13.11 Appendix 6: Conditional probabilities of survival
13.12 Appendix 7: Details regarding CDS index swaps and swaptions
13.13 Appendix 8: Modeling correlation with copulae functions
13.14 Appendix 9: Details on CDO pricing with imperfect correlation
References

Part I
Foundations

1 The classic capital asset pricing model

1.1 Introduction
An investor is considering investing in a portfolio of securities. How would he choose the proportion of each asset based on his attitude towards risk? What are the asset pricing implications
of these portfolio choices? Are these choices consistent with absence of arbitrage opportunities?
This chapter deals with these issues in the context of the static market idealized in the first
contributions that would give birth to financial economics. The next section deals with portfolio
selection when our investor maximizes a mean-variance criterion, as in the seminal approach of
Markowitz (1952). We shall see that optimal portfolio choices like these lead to a notion of a
market portfolio as well as a first theory of asset prices, known as the Capital Asset Pricing
Model, the celebrated CAPM (see Section 1.3). The CAPM predicts that each asset's expected
excess return is proportional to the expected excess return on the market portfolio. It is, of course, a quite
coarse description of asset markets. Section 1.4 develops the Arbitrage Pricing Theory, or
APT, model. The APT provides refinements of the CAPM, in that it predicts that each asset's
expected return relates to a number of factors. [In progress]

1.2 Portfolio selection


First, we derive the wealth constraint. Second, we illustrate the main results of the
model, with and without a safe asset. Third, we introduce the notion of market portfolio.
1.2.1 Wealth constraints
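Our investor can invest in $m$ risky assets and one safe asset. Let $p = [p_1, \ldots, p_m]$ be the
risky assets price vector, and let $p_0$ be the price of the riskless asset. We wish to evaluate the
value of a portfolio of all these assets. Let $\theta = [\theta_1, \ldots, \theta_m]$, where $\theta_i$ is the number of units of the $i$-th
risky asset, and let $\theta_0$ be the number of units of the riskless asset, in this portfolio. The initial wealth
is $w = p_0\theta_0 + p\cdot\theta$. Terminal wealth is $w' = x_0\theta_0 + x\cdot\theta$, where $x_0$ is the payoff promised by the
riskless asset, and $x = [x_1, \ldots, x_m]$ is the vector of the payoffs pertaining to the risky assets,
i.e. $x_i$ is the payoff of the $i$-th asset.
Let $R \equiv x_0/p_0$, and $\tilde R_i \equiv x_i/p_i$. In words, $R$ is the gross interest rate obtained by investing in the
safe asset, and $\tilde R_i$ is the gross return obtained by investing in the $i$-th risky asset. Accordingly,
we define $r \equiv R - 1$ as the safe interest rate; $\tilde b = [\tilde b_1, \ldots, \tilde b_m]$, where $\tilde b_i \equiv \tilde R_i - 1$ is the rate of
return on the $i$-th asset; and $b \equiv E(\tilde b)$, the vector of the expected returns on the risky assets.
Finally, we let $\pi = [\pi_1, \ldots, \pi_m]$, where $\pi_i \equiv p_i\theta_i$ is the wealth invested in the $i$-th asset. We
have
$$ w = p_0\theta_0 + \sum_{i=1}^m \pi_i \quad \text{and} \quad w' = x_0\theta_0 + \sum_{i=1}^m x_i\theta_i = R\,p_0\theta_0 + \sum_{i=1}^m \tilde R_i\pi_i. \qquad (1.1) $$
Combining the two expressions for $w$ and $w'$ (that is, substituting $p_0\theta_0 = w - \pi^\top 1_m$ into the expression for $w'$) leaves:
$$ w' = \pi^\top(\tilde R - 1_m R) + Rw = \pi^\top(\tilde b - 1_m r) + Rw, $$
where $\tilde R = [\tilde R_1, \ldots, \tilde R_m]$ and $1_m$ is an $m$-dimensional vector of ones.
We use the decomposition, $\tilde b = b + a\epsilon$, where $a$ is a volatility matrix with $m$ rows,
and $\epsilon$ is a conformable random vector with expectation zero and variance-covariance matrix equal to the
identity matrix. With this decomposition, we can rewrite the budget constraint in Eq. (1.1) as
follows:
$$ w' = \pi^\top(b - 1_m r) + Rw + \pi^\top a\epsilon. \qquad (1.2) $$
We can now use Eq. (1.2) and determine the expected return and the variance of the portfolio
value. We have
$$ E[w'(\pi)] = \pi^\top(b - 1_m r) + Rw \quad \text{and} \quad var[w'(\pi)] = \pi^\top\Sigma\pi, \qquad (1.3) $$
where $\Sigma \equiv aa^\top$. Let $\sigma_i^2 \equiv \Sigma_{ii}$. We assume that $\Sigma$ has full-rank, and that $\sigma_i^2 > \sigma_j^2 \Rightarrow b_i > b_j$ for all
$i, j$, which implies that $r < \min_j(b_j)$.
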

1.2.2 Portfolio choice


We assume that the investor maximizes the expected return on his portfolio, conditional on a
given level of uncertainty. That is, let $w^2 v_p^2$ define the maximum level of variance the investor
is willing to accept. We consider the following program based on Eq. (1.3) and the uncertainty
constraint:
$$ \hat\pi(v_p) = \arg\max_{\pi\in\mathbb{R}^m} E[w'(\pi)] \quad \text{s.t.} \quad var[w'(\pi)] = w^2 v_p^2. \qquad [1.P1] $$
The first order conditions for [1.P1] are,
$$ \hat\pi(v_p) = (2\nu)^{-1}\Sigma^{-1}(b - 1_m r) \quad \text{and} \quad \hat\pi^\top\Sigma\hat\pi = w^2 v_p^2, $$
where $\nu$ is a Lagrange multiplier for the variance constraint. By plugging the first condition
into the second, we obtain, $(2\nu)^{-1} = \pm\, w\,v_p/\sqrt{Sh}$, where
$$ Sh \equiv (b - 1_m r)^\top\Sigma^{-1}(b - 1_m r) \qquad (1.4) $$
is the Sharpe market performance. To ensure efficiency, we take the positive solution. Substituting the positive solution for $(2\nu)^{-1}$ into the first order condition, we obtain that the portfolio
that solves [1.P1] is
$$ \frac{\hat\pi(v_p)}{w} = \frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\, v_p. \qquad (1.5) $$
We are ready to determine the value of [1.P1], $E[w'(\hat\pi(v_p))]$ and, hence, the expected portfolio
return, defined as,
$$ \mu_p(v_p) \equiv \frac{E[w'(\hat\pi(v_p))] - w}{w} = r + \sqrt{Sh}\, v_p, \qquad (1.6) $$
where the last equality follows by a simple calculation. Eq. (1.6) describes what is known as
the Capital Market Line (CML).
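The Capital Market Line can be checked numerically. The sketch below is only an illustration, not part of the original derivation: it takes the two-asset parameters reported in the caption of Figure 1.1, while the correlation `rho`, the safe rate `r`, the target volatility `v_p` and all function names are assumptions made here. It computes the squared Sharpe performance of Eq. (1.4), $Sh = (b - 1_m r)^\top\Sigma^{-1}(b - 1_m r)$, the risky holdings per unit of wealth of Eq. (1.5), $\Sigma^{-1}(b - 1_m r)\,v_p/\sqrt{Sh}$, and then verifies that the resulting expected return equals $r + \sqrt{Sh}\,v_p$, as in Eq. (1.6).

```python
import math

# Two-asset inputs from the caption of Figure 1.1; rho, r and v_p are
# assumptions made only for this illustration.
b = [0.10, 0.15]          # expected returns b_1, b_2
sigma = [0.20, 0.25]      # volatilities sigma_1, sigma_2
rho, r = 0.5, 0.03

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def mat_vec(mat, vec):
    return [dot(row, vec) for row in mat]

# Covariance matrix Sigma and its inverse (closed form for the 2x2 case).
cov = rho * sigma[0] * sigma[1]
Sigma = [[sigma[0] ** 2, cov], [cov, sigma[1] ** 2]]
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Sigma_inv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
             [-Sigma[1][0] / det, Sigma[0][0] / det]]

excess = [bi - r for bi in b]                    # b - 1_m r
Sh = dot(excess, mat_vec(Sigma_inv, excess))     # Eq. (1.4)

def cml_portfolio(v_p):
    """Risky holdings per unit of wealth, pi_hat / w, as in Eq. (1.5)."""
    return [x * v_p / math.sqrt(Sh) for x in mat_vec(Sigma_inv, excess)]

v_p = 0.10
weights = cml_portfolio(v_p)
mu_p = r + dot(weights, excess)                  # expected portfolio return
variance = dot(weights, mat_vec(Sigma, weights)) # should equal v_p ** 2

print("risky weights:", weights, "safe asset:", 1.0 - sum(weights))
print("expected return:", mu_p, "CML value:", r + math.sqrt(Sh) * v_p)
```

The remainder of wealth, `1 - sum(weights)`, is held in the safe asset; a negative value would correspond to borrowing at the rate `r`.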
1.2.3 Without the safe asset
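Next, assume that the investor's choice space does not include the riskless asset. In this case,
his current wealth is $w = \sum_{i=1}^m \pi_i$, and his terminal wealth is $w' = \sum_{i=1}^m \tilde R_i\pi_i$. By the definition
of $\tilde b_i \equiv \tilde R_i - 1$, and a few basic calculations,
$$ w' = \sum_{i=1}^m \pi_i + \sum_{i=1}^m \tilde b_i\pi_i = w + \pi^\top b + \pi^\top a\epsilon, \qquad (1.7) $$
where $a$ and $\epsilon$ are as defined in Eq. (1.2). We can use Eq. (1.7) to determine the expected
return and the variance of the portfolio value, which are
$$ E[w'(\pi)] = \pi^\top b + w, \ \text{where } w = \pi^\top 1_m, \quad \text{and} \quad var[w'(\pi)] = \pi^\top\Sigma\pi. \qquad (1.8) $$
The program our investor solves, now, is
$$ \hat\pi(v_p) = \arg\max_{\pi\in\mathbb{R}^m} E[w'(\pi)] \quad \text{s.t.} \quad var[w'(\pi)] = w^2 v_p^2 \ \text{ and } \ w = \pi^\top 1_m. \qquad [1.P2] $$
In the Appendix, we show that provided $\alpha\gamma - \beta^2 > 0$ (a second order condition, as explained
in the Appendix), the solution to [1.P2] is
$$ \frac{\hat\pi(v_p)}{w} = \frac{\gamma\mu_p(v_p) - \beta}{\alpha\gamma - \beta^2}\,\Sigma^{-1}b + \frac{\alpha - \beta\mu_p(v_p)}{\alpha\gamma - \beta^2}\,\Sigma^{-1}1_m, \qquad (1.9) $$
where $\alpha \equiv b^\top\Sigma^{-1}b$, $\beta \equiv 1_m^\top\Sigma^{-1}b$ and $\gamma \equiv 1_m^\top\Sigma^{-1}1_m$, and $\mu_p(v_p)$ is the expected portfolio
return, defined as in Eq. (1.6). In the Appendix, we also show that
$$ v_p^2 = \frac{1}{\gamma}\left[1 + \frac{1}{\alpha\gamma - \beta^2}\left(\gamma\mu_p(v_p) - \beta\right)^2\right]. \qquad (1.10) $$
Based on Eq. (1.10), we define the global minimum variance portfolio as that portfolio that
achieves a variance equal to $v_p^2 = \gamma^{-1}$ and an expected return equal to $\mu_p = \beta/\gamma$. We shall
return to this portfolio below.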
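A numerical check may help fix ideas about Eqs. (1.9) and (1.10). The sketch below rests on stated assumptions: the three expected returns and the covariance matrix are hypothetical numbers chosen only for illustration, and the helper names (`solve`, `frontier_weights`, `frontier_variance`) are not from the text. With $\alpha = b^\top\Sigma^{-1}b$, $\beta = 1_m^\top\Sigma^{-1}b$ and $\gamma = 1_m^\top\Sigma^{-1}1_m$, it builds the frontier portfolio $[(\gamma\mu_p - \beta)\Sigma^{-1}b + (\alpha - \beta\mu_p)\Sigma^{-1}1_m]/(\alpha\gamma - \beta^2)$ for a target expected return $\mu_p$, and confirms that its variance equals $\gamma^{-1}[1 + (\gamma\mu_p - \beta)^2/(\alpha\gamma - \beta^2)]$.

```python
# Hypothetical three-asset inputs, chosen only for illustration.
b = [0.08, 0.10, 0.15]                     # expected returns
Sigma = [[0.0400, 0.0100, 0.0000],
         [0.0100, 0.0625, 0.0200],
         [0.0000, 0.0200, 0.0900]]         # covariance matrix (positive definite)

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def mat_vec(mat, vec):
    return [dot(row, vec) for row in mat]

def solve(mat, rhs):
    """Solve mat @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [x - factor * y for x, y in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

ones = [1.0] * len(b)
Si_b = solve(Sigma, b)          # Sigma^{-1} b
Si_1 = solve(Sigma, ones)       # Sigma^{-1} 1_m
alpha = dot(b, Si_b)
beta = dot(ones, Si_b)
gamma = dot(ones, Si_1)
D = alpha * gamma - beta ** 2   # positive under the second order condition

def frontier_weights(mu_p):
    """Portfolio weights pi_hat / w for a target expected return, Eq. (1.9)."""
    c_b = (gamma * mu_p - beta) / D
    c_1 = (alpha - beta * mu_p) / D
    return [c_b * x + c_1 * y for x, y in zip(Si_b, Si_1)]

def frontier_variance(mu_p):
    """Variance v_p^2 attached to a target expected return, Eq. (1.10)."""
    return (1.0 + (gamma * mu_p - beta) ** 2 / D) / gamma

mu_target = 0.12
weights = frontier_weights(mu_target)
var_direct = dot(weights, mat_vec(Sigma, weights))

gmv_weights = [x / gamma for x in Si_1]    # global minimum variance portfolio
gmv_variance = dot(gmv_weights, mat_vec(Sigma, gmv_weights))

print("weights:", weights)
print("variance:", var_direct, "Eq. (1.10):", frontier_variance(mu_target))
print("global minimum variance:", gmv_variance, "1/gamma:", 1.0 / gamma)
```

The last line illustrates the definition of the global minimum variance portfolio: its variance is $\gamma^{-1}$ and its expected return is $\beta/\gamma$.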

Note that for each $v_p$, there are two values of $\mu_p(v_p)$ that solve Eq. (1.10). The optimal choice
for our investor is that with the highest $\mu_p$. We define the efficient portfolio frontier as the set
of values $(v_p, \mu_p)$ that solve Eq. (1.10) with the highest $\mu_p$. It has the following expression:
$$ \mu_p(v_p) = \frac{\beta}{\gamma} + \frac{1}{\gamma}\sqrt{\left(\gamma v_p^2 - 1\right)\left(\alpha\gamma - \beta^2\right)}. \qquad (1.11) $$
The efficient portfolio frontier is increasing and concave in risk, $v_p$. It can be interpreted
as a production function, i.e., one that produces expected returns obtained using varying
levels of risk as inputs (see, e.g., Figure 1.1). Which portfolio on this frontier is selected by
an investor depends on the investor's attitudes towards risk.
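Example 1.1. Let the number of risky assets $m = 2$. In this case, the efficient portfolio
frontier is obtained without optimizing as above: the budget constraint, $\pi_1 + \pi_2 = 1$ (with wealth normalized to $w = 1$), pins down
a unique relation between the expected portfolio return and the variance of the portfolio's
value. Precisely, we have: $\mu_p = \pi_1 b_1 + \pi_2 b_2$, or,
$$ \mu_p = b_1 + (b_2 - b_1)\pi_2, \qquad v_p^2 = (1 - \pi_2)^2\sigma_1^2 + 2(1 - \pi_2)\pi_2\sigma_{12} + \pi_2^2\sigma_2^2, $$
where $\sigma_{12} \equiv \rho\sigma_1\sigma_2$ and $\rho$ is the correlation between the two asset returns, whence:
$$ v_p = \frac{1}{b_2 - b_1}\sqrt{(b_2 - \mu_p)^2\sigma_1^2 + 2(b_2 - \mu_p)(\mu_p - b_1)\rho\sigma_1\sigma_2 + (\mu_p - b_1)^2\sigma_2^2}. $$
When $\rho = 1$ and both assets are held long,
$$ \mu_p = b_1 + \frac{(b_2 - b_1)(v_p - \sigma_1)}{\sigma_2 - \sigma_1}. $$
But diversification pays, provided asset returns are not perfectly positively correlated. Figure
1.1 actually reveals that there are portfolios that are even less risky than the less risky asset.
Moreover, risk can be zeroed when $\rho = -1$, in which case $\pi_1 = \sigma_2/(\sigma_1 + \sigma_2)$ and $\pi_2 = \sigma_1/(\sigma_1 + \sigma_2)$
or, alternatively, when $\rho = 1$, by taking the long-short position $\pi_1 = \sigma_2/(\sigma_2 - \sigma_1)$ and $\pi_2 = -\sigma_1/(\sigma_2 - \sigma_1)$. Section 1.2.6 relies on a simplified version of this
model, which was at the basis of Tobin's (1958) famous reformulation of the Keynesian theory
of money demand.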
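The two-asset case lends itself to a short numerical illustration. The sketch below uses the parameter values reported in the caption of Figure 1.1 (expected returns 0.10 and 0.15, volatilities 0.20 and 0.25); the function names and the grid search over long-only weights are assumptions made here for illustration. It evaluates the expected return $(1 - \pi_2) b_1 + \pi_2 b_2$ and the volatility implied by the variance $(1 - \pi_2)^2\sigma_1^2 + 2(1 - \pi_2)\pi_2\rho\sigma_1\sigma_2 + \pi_2^2\sigma_2^2$, and shows how the lowest attainable volatility falls as the correlation $\rho$ decreases.

```python
import math

b1, b2 = 0.10, 0.15      # expected returns (caption of Figure 1.1)
s1, s2 = 0.20, 0.25      # volatilities (caption of Figure 1.1)

def portfolio_stats(pi2, rho):
    """Expected return and volatility with weight pi2 in asset 2, 1 - pi2 in asset 1."""
    pi1 = 1.0 - pi2
    mu = pi1 * b1 + pi2 * b2
    var = (pi1 * s1) ** 2 + 2.0 * pi1 * pi2 * rho * s1 * s2 + (pi2 * s2) ** 2
    return mu, math.sqrt(max(var, 0.0))   # guard against tiny negative rounding

def min_vol_long_only(rho, steps=1000):
    """Lowest volatility over a grid of long-only weights pi2 in [0, 1]."""
    return min(portfolio_stats(k / steps, rho)[1] for k in range(steps + 1))

for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"rho = {rho:+.1f}: lowest long-only volatility = {min_vol_long_only(rho):.4f}")

# With rho = -1, the weights pi1 = s2/(s1+s2), pi2 = s1/(s1+s2) remove all risk.
pi2_hedge = s1 / (s1 + s2)
mu_hedge, vol_hedge = portfolio_stats(pi2_hedge, -1.0)
print("zero-risk portfolio: pi2 =", pi2_hedge, "mu =", mu_hedge, "vol =", vol_hedge)
```

With $\rho = 1$ the lowest long-only volatility is that of the less risky asset, whereas any $\rho < 1$ used here allows a portfolio with lower volatility than either asset.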
1.2.4 The global minimum variance portfolio
Note that the portfolio in Eq. (1.9) can be decomposed into two components, as follows:
$$ \frac{\hat\pi(v_p)}{w} = \ell(v_p)\frac{\pi_d}{w} + [1 - \ell(v_p)]\frac{\pi_g}{w}, \qquad \ell(v_p) \equiv \frac{\beta\left(\gamma\mu_p(v_p) - \beta\right)}{\alpha\gamma - \beta^2}, \qquad (1.12) $$
where $\pi_d/w \equiv \Sigma^{-1}b/\beta$ and $\pi_g/w \equiv \Sigma^{-1}1_m/\gamma$.
Therefore, $\pi_g/w$ is the global minimum variance portfolio, for we know from Eq. (1.10) that the
minimum variance occurs at $(v_p, \mu_p) = (\sqrt{1/\gamma}, \beta/\gamma)$, in which case $\ell(v_p) = 0$.[1] More generally, we
can span any portfolio on the frontier by just choosing a convex combination of $\pi_d/w$ and $\pi_g/w$, with
weight equal to $\ell(v_p)$. It's a mutual fund separation theorem. We shall use this representation of
the portfolios on the efficient portfolio frontier while deriving the zero-beta CAPM in Section
1.3.3.

[1] It is easy to show that the covariance of the global minimum variance portfolio with any other portfolio equals $\gamma^{-1}$.

[Figure 1.1: expected return, $\mu_p$ (vertical axis), against volatility, $v_p$ (horizontal axis), for $\rho = -1, -0.5, 0, 0.5, 1$.]
FIGURE 1.1. From top to bottom: portfolio frontiers corresponding to $\rho = -1, -0.5, 0, 0.5, 1$. Parameters are set to $b_1 = 0.10$, $b_2 = 0.15$, $\sigma_1 = 0.20$, $\sigma_2 = 0.25$. For each portfolio frontier, the efficient
portfolio frontier includes those portfolios that yield the highest expected return for a given volatility.
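The separation result of Eq. (1.12) can be verified numerically. The following sketch is an illustration under assumptions: the three-asset inputs are hypothetical, and the names `fund_d`, `fund_g` and `ell` are labels chosen here for the two funds $\Sigma^{-1}b/\beta$ and $\Sigma^{-1}1_m/\gamma$ and for the weight $\beta(\gamma\mu_p - \beta)/(\alpha\gamma - \beta^2)$. It checks that mixing the two funds reproduces the frontier portfolio of Eq. (1.9), and that the covariance between the global minimum variance portfolio and an arbitrary fully invested portfolio equals $\gamma^{-1}$.

```python
# Hypothetical three-asset inputs, chosen only for illustration.
b = [0.08, 0.10, 0.15]
Sigma = [[0.0400, 0.0100, 0.0000],
         [0.0100, 0.0625, 0.0200],
         [0.0000, 0.0200, 0.0900]]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def mat_vec(mat, vec):
    return [dot(row, vec) for row in mat]

def solve(mat, rhs):
    """Solve mat @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [x - factor * y for x, y in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

ones = [1.0] * len(b)
Si_b, Si_1 = solve(Sigma, b), solve(Sigma, ones)
alpha, beta, gamma = dot(b, Si_b), dot(ones, Si_b), dot(ones, Si_1)
D = alpha * gamma - beta ** 2

fund_d = [x / beta for x in Si_b]     # Sigma^{-1} b / beta
fund_g = [x / gamma for x in Si_1]    # global minimum variance portfolio

mu_target = 0.12
ell = beta * (gamma * mu_target - beta) / D          # weight in Eq. (1.12)
mixed = [ell * d + (1.0 - ell) * g for d, g in zip(fund_d, fund_g)]

# Frontier portfolio computed directly from Eq. (1.9), for comparison.
direct = [((gamma * mu_target - beta) * x + (alpha - beta * mu_target) * y) / D
          for x, y in zip(Si_b, Si_1)]
max_gap = max(abs(m - d) for m, d in zip(mixed, direct))

# Covariance of the global minimum variance portfolio with another portfolio.
other = [0.2, 0.5, 0.3]                              # arbitrary, sums to one
cov_with_gmv = dot(fund_g, mat_vec(Sigma, other))

print("largest gap between the two constructions:", max_gap)
print("cov with GMV:", cov_with_gmv, "1/gamma:", 1.0 / gamma)
```

The covariance check works for any weights summing to one, since $1_m^\top\Sigma^{-1}\Sigma\pi/\gamma = 1_m^\top\pi/\gamma$.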
1.2.5 The market portfolio
The market portfolio is the portfolio at which the CML in Eq. (1.6) and the efficient portfolio
frontier in Eq. (1.11) intersect. In fact, the market portfolio is the point at which the CML is
tangent to the efficient portfolio frontier. For this reason, the market portfolio is also referred to
as the tangent portfolio. In Figure 1.2, the market portfolio is at point $M$ and has volatility
equal to $v_M$ and expected return equal to $\mu_M$. At this point, the CML is tangent to the efficient
portfolio frontier.[2]
As Figure 1.2 illustrates, the CML dominates the efficient portfolio frontier. This is
because the CML is the value of the investor's problem, [1.P1], obtained using all the risky
assets and the riskless asset, and the efficient portfolio frontier is the value of the investor's
problem, [1.P2], obtained using only the risky assets.[3] For the same reason, the CML and
the efficient portfolio frontier can only be tangent to each other. For suppose not. Then, there
would exist a point on the efficient portfolio frontier that dominates some portfolio on the CML,
a contradiction. Likewise, the CML must have a portfolio in common with the efficient portfolio
frontier: the portfolio that does not include the safe asset. Below, we shall use this insight to
characterize the market portfolio analytically.

[2] The existence of the market portfolio requires a restriction on $r$, derived in Eq. (1.13) below.
[3] Figure 1.2 also depicts a dotted line, which is the value of the investor's problem when he invests a proportion higher
than 100% in the market portfolio, leveraged at an interest rate for borrowing higher than the interest rate for lending. In this case,
the CML coincides with the line from $r$ up to the point $M$. From $M$ onwards, the CML coincides with the highest between
the dotted line and the efficient portfolio frontier.
Why is the market portfolio so called? Figure 1.2 reveals that any portfolio on the
CML can be obtained as a combination of the safe asset and the market portfolio $M$ (a portfolio
containing only the risky assets). An investor with high risk-aversion would like to choose a
point on the CML close to the safe asset. An investor with low risk-aversion would like to choose a point further up the CML.
But no matter how risk averse an individual is, the optimal solution for him is to choose a
combination of the safe asset and the market portfolio $M$. Thus, the market portfolio plays a
mere instrumental role. It obviously does not depend on the risk attitudes of any investor: it
is a mere convex combination of all the existing assets in the economy. Instead, the optimal
course of action for any investor is to use those proportions of this portfolio that make his
overall exposure to risk consistent with his risk appetite. It's a two fund separation theorem.
Are these predictions observed in practice? Financial advisors are known to recommend
young investors to hold more risky positions, and less conservative investors to increase their
stock holdings, compared to bonds. Instead, according to Figure 1.2, the stocks/bonds mix in
$M$ should be the same, independently of risk-attitudes. These are asset allocation puzzles,
described for example by Campbell and Viceira (2002), which can be addressed through extensions of the CAPM, e.g., assuming that agents have access to stochastic opportunity sets
including stochastic volatility, as discussed in later chapters of Part II. [In progress]
The equilibrium implications of the previous separation theorem help clarify the
reasons we refer to the tangent portfolio as the market portfolio. As explained, any portfolio
can be attained by lending or borrowing funds in zero net supply, and by investing in the portfolio $M$.
In equilibrium, then, every investor must hold some proportion of $M$. But since in aggregate,
there is no net borrowing or lending, one has that in aggregate, all investors must have portfolio
holdings that sum up to the market portfolio, which is therefore the value-weighted portfolio
of all the existing assets in the economy. This argument is formally developed in the appendix.
We turn to characterize the market portfolio. We need to assume that the interest rate is
sufficiently low to allow the CML to be tangent to the efficient portfolio frontier. The technical
condition that ensures this is that the return on the safe asset be less than the expected return
on the global minimum variance portfolio, viz
$$ r < \frac{\beta}{\gamma}. \qquad (1.13) $$
Let $\pi_M$ be the market portfolio. To identify $\pi_M$, we note that it belongs to the efficient portfolio frontier if $\pi_M^\top 1_m = w$,
where $\pi_M$ also belongs to the CML and, therefore, by Eq. (1.5), is such that:
$$ \frac{\pi_M}{w} = \frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\, v_M. \qquad (1.14) $$
Therefore, we must be looking for the value $v_M$ that solves
$$ w = 1_m^\top\pi_M = w\, 1_m^\top\frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\, v_M, $$
i.e.,
$$ v_M = \frac{\sqrt{Sh}}{\beta - \gamma r}. \qquad (1.15) $$
Then, we plug this value of $v_M$ into the expression for $\pi_M$ in Eq. (1.14) and obtain,[4]
$$ \frac{\pi_M}{w} = \frac{1}{\beta - \gamma r}\,\Sigma^{-1}(b - 1_m r). \qquad (1.16) $$

FIGURE 1.2. [Figure: the CML and the efficient portfolio frontier; labels recoverable from the original: CML, A, Z, Q, C, $r$, $v_M$.]

Once again, the market portfolio belongs to the efficient portfolio frontier. Indeed, consider the
following reasoning. On the one hand, the market portfolio cannot be above the efficient portfolio
frontier, as this would contradict the efficiency of the frontier (obtained by investing in
the risky assets only). On the other hand, by construction, the market portfolio belongs to
the CML and so it cannot be below the efficient portfolio frontier, as the CML dominates the
efficient portfolio frontier. The Appendix makes this reasoning rigorous, and shows that the
market portfolio does indeed satisfy the tangency condition.
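The characterization of the market portfolio can also be checked numerically. The sketch below is an illustration under stated assumptions: the three-asset inputs and the safe rate `r` are hypothetical, chosen so that $r < \beta/\gamma$ as required by Eq. (1.13), and the helper names are not from the text. It computes the market portfolio as $\Sigma^{-1}(b - 1_m r)/(\beta - \gamma r)$ and its volatility as $\sqrt{Sh}/(\beta - \gamma r)$, and then verifies that the portfolio is fully invested in risky assets, lies on the CML $\mu = r + \sqrt{Sh}\,v$, and lies on the frontier $v^2 = \gamma^{-1}[1 + (\gamma\mu - \beta)^2/(\alpha\gamma - \beta^2)]$.

```python
import math

# Hypothetical inputs, chosen only for illustration.
b = [0.08, 0.10, 0.15]
Sigma = [[0.0400, 0.0100, 0.0000],
         [0.0100, 0.0625, 0.0200],
         [0.0000, 0.0200, 0.0900]]
r = 0.03                                   # safe rate, assumed below beta/gamma

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def mat_vec(mat, vec):
    return [dot(row, vec) for row in mat]

def solve(mat, rhs):
    """Solve mat @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [x - factor * y for x, y in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

ones = [1.0] * len(b)
Si_b, Si_1 = solve(Sigma, b), solve(Sigma, ones)
alpha, beta, gamma = dot(b, Si_b), dot(ones, Si_b), dot(ones, Si_1)
D = alpha * gamma - beta ** 2

excess = [bi - r for bi in b]
Si_excess = solve(Sigma, excess)           # Sigma^{-1} (b - 1_m r)
Sh = dot(excess, Si_excess)                # Eq. (1.4)

v_M = math.sqrt(Sh) / (beta - gamma * r)                  # Eq. (1.15)
pi_M = [x / (beta - gamma * r) for x in Si_excess]        # Eq. (1.16)

mu_M = dot(pi_M, b)
vol_M = math.sqrt(dot(pi_M, mat_vec(Sigma, pi_M)))
frontier_var = (1.0 + (gamma * mu_M - beta) ** 2 / D) / gamma   # Eq. (1.10)

print("market portfolio weights:", pi_M, "sum:", sum(pi_M))
print("v_M:", v_M, "direct volatility:", vol_M)
print("mu_M:", mu_M, "CML at v_M:", r + math.sqrt(Sh) * v_M)
```

Raising `r` towards $\beta/\gamma$ makes $\beta - \gamma r$ shrink, so the tangency point moves out along the frontier, which is one way to read the restriction in Eq. (1.13).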
[4] While the market portfolio depends on $r$, this portfolio obviously does not include any share in the safe asset.

1.2.6 Tobin's re-interpretation of Keynesian speculative demand for money
Tobin (1958) relies on portfolio selection and shows that demand for money can be explained
while making reference to the agents' attitude vis-à-vis the risk of alternative ways to invest
savings. His contribution aims to revise some foundational issues regarding the monetary theory
in Keynes (1936). Tobin explains that the Keynesian explanation for money demand may imply
that agents end up making dichotomic choices: they hold either money or bonds. That is, at
the individual level, a given agent either holds money or bonds based on his own expectations
of the future interest rate levels. However, at the aggregate level, money demand is inversely
related to the nominal interest rate, albeit flat in correspondence of small values of this rate: a
liquidity trap. The Appendix provides a parametric example that clarifies the details of how
these mechanisms operate.
Tobin formulates a theory of money demand in which agents do not make the previous dichotomic
choices. Consider the following specification of Example 1.1. We interpret the safe asset as
money, which is therefore such that its return and volatility are $b_1 \equiv 0$ and $\sigma_1 \equiv 0$; instead,
bonds are risky, in that they provide a superior return but at the cost of some volatility.
Therefore, the expected return and volatility of a portfolio comprising money and bonds are
$\mu_p = \pi_2 b_2$ and $v_p = \pi_2\sigma_2$, with straightforward notation. We have
$$ \mu_p = \frac{b_2}{\sigma_2}\, v_p, \qquad \pi_2 = \frac{1}{\sigma_2}\, v_p. \qquad (1.17) $$
The top panels of the next two pictures plot the first of Eqs. (1.17), along with the indifference
curves of a hypothetical representative agent. The agent's optimum is achieved at $P$,
which is the point of tangency between the first of Eqs. (1.17) and the indifference curve $UU$.
Money demand, $1 - \pi_2$, is determined by the second of Eqs. (1.17), and is shown in the bottom
panels. As the expected return on the bond, $b_2$, increases, the new optimum shifts to $P'$, the
tangency point between the new relation in Eqs. (1.17) and the indifference curve $U'U'$. Money
demand decreases as a result.
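[Figure: panels I and II. Top panels plot the expected portfolio return against portfolio volatility $v_p$, a line with slope $b_2/\sigma_2$, together with indifference curves $U$ and optima $P$. Bottom panels plot the bond share $\pi_2$ against $v_p$, a line with slope $1/\sigma_2$.]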

This framework of analysis can be used to study the effects of a decreased interest rate volatility. Suppose the central bank has the power to lower both interest rates and interest rate volatility in such a way as to keep the ratio $b_2/\sigma_2$ unchanged. As the second picture above illustrates, the optimum is still $P$, although the line representing the second of Eqs. (1.17) becomes steeper than before the policy action. Money demand decreases as a result. We can say more. As is clear, money demand decreases as interest rate volatility, $\sigma_2$, decreases. Therefore, the central bank might keep money supply constant and achieve lower interest rates by simply targeting low interest rate volatility.
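The two comparative statics of this section can be illustrated with a minimal numerical sketch. It is not part of Tobin's argument: the mean-variance utility $\mu_p - \frac{\gamma}{2} v_p^2$, the risk-aversion value `gamma`, and the function name `money_demand` are assumptions introduced only to pin down a tangency point on the budget line of Eqs. (1.17).

```python
# Sketch of Tobin's money demand under ASSUMED mean-variance preferences
# U(mu, v) = mu - 0.5 * gamma * v**2 (gamma is a hypothetical risk aversion).

def money_demand(b2, sigma2, gamma=4.0):
    """Return (v_star, bond_share, money_share) at the optimum.

    Budget line (first of Eqs. 1.17):   mu_p = (b2 / sigma2) * v_p
    Bond share (second of Eqs. 1.17):   pi_2 = v_p / sigma2
    """
    # Maximizing (b2/sigma2)*v - 0.5*gamma*v**2 over v gives the optimal volatility.
    v_star = (b2 / sigma2) / gamma
    bond_share = v_star / sigma2
    return v_star, bond_share, 1.0 - bond_share

base = money_demand(b2=0.04, sigma2=0.20)
# Experiment 1: a higher expected return on bonds.
higher_return = money_demand(b2=0.06, sigma2=0.20)
# Experiment 2: halve both b2 and sigma2, leaving the ratio b2/sigma2 unchanged.
lower_vol = money_demand(b2=0.02, sigma2=0.10)

for label, res in [("baseline", base), ("higher b2", higher_return), ("lower vol", lower_vol)]:
    print(f"{label:>10}: v* = {res[0]:.4f}, bonds = {res[1]:.3f}, money = {res[2]:.3f}")
```

Under these assumed preferences, the optimal volatility is unchanged in the second experiment, while the money share falls in both, in line with the discussion above.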
[In progress]

1.3 The CAPM


This section goes beyond the previous issues relating to efficient portfolio choice, and derives the first asset evaluation formula, based on what is known as the Capital Asset Pricing Model (CAPM). First, we derive the CAPM, relying on arguments with the same flavor as those in the original derivation of Sharpe (1964). Second, we derive a CAPM while relaxing the assumption that we can trade a risk-free asset. Third, we cast the evaluation problem in a slightly more general context, one in which agents are utility maximizers. Finally, we illustrate a few simple ideas regarding some basic applications of the CAPM, such as project evaluation.
1.3.1 Foundations
Consider a portfolio comprising a proportion $x$ of wealth invested in any asset $i$ and the remaining proportion $1 - x$ invested in the market portfolio. That is, we consider a portfolio that is parametrized by $x$. The expected return and volatility of this portfolio are
\[
\tilde{b} \equiv x b_i + (1 - x) b_M, \qquad
\tilde{v} \equiv \sqrt{(1 - x)^2 v_M^2 + 2 (1 - x) x\, \sigma_{iM} + x^2 \sigma_i^2}, \tag{1.18}
\]
where $\sigma_{iM}$ denotes the covariance between the return on asset $i$ and the return on the market portfolio, and $\sigma_i$ and $v_M$ are the volatilities of asset $i$ and of the market portfolio. Clearly, the market portfolio, $M$, belongs to this family of portfolios.
Relying on Example 1.1, we know that the curve in (1.18) has the same shape as the curve $A'Mi$ in Figure 1.3. The curve $A'Mi$ lies below the efficient portfolio frontier $AMC$, because the latter results from optimizing a mean-variance criterion over all the existing assets, which then dominates any portfolio that only comprises the two assets $i$ and $M$. For example, assume the $A'Mi$ curve intersects the $AMC$ curve. Then, a feasible combination of assets in the $A'Mi$ curve would dominate points on $AMC$, a contradiction, as $AMC$ is the most efficient feasible combination of all existing assets. On the other hand, the $A'Mi$ curve has a point in common with the $AMC$ curve, which is $M$ (the portfolio obtained with $x = 0$). Therefore, the curve $A'Mi$ is tangent to the efficient portfolio frontier $AMC$ at $M$, which is tangent to the CML at $M$, as we know.
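That is, at $M$, the slope of the $A'Mi$ curve is the same as that of the efficient portfolio frontier $AMC$. This condition provides a restriction on the expected return $b_i$ on any asset $i$, as we now show. Consider the $x$-parametrized curve (1.18). We determine its slope at $M$ through the two slopes, $d\tilde{b}/dx$ and $d\tilde{v}/dx$, both evaluated at $x = 0$. We have
\[
\frac{d\tilde{b}}{dx} = b_i - b_M, \qquad
\left.\frac{d\tilde{v}}{dx}\right|_{x=0}
= \left.\frac{-(1 - x) v_M^2 + (1 - 2x)\, \sigma_{iM} + x \sigma_i^2}{\tilde{v}}\right|_{x=0}
= \frac{\sigma_{iM} - v_M^2}{v_M}.
\]
Therefore,
\[
\left.\frac{d\tilde{b}(x)}{d\tilde{v}(x)}\right|_{x=0} = \frac{(b_i - b_M)\, v_M}{\sigma_{iM} - v_M^2}. \tag{1.19}
\]

FIGURE 1.3. [The Capital Market Line (CML) starts at the safe rate $r$ and is tangent to the efficient portfolio frontier at the market portfolio $M$, with volatility $v_M$; the labels $A$, $C$ and $i$ mark points on the frontier and the individual asset.]

On the other hand, the slope of the CML is $(b_M - r)/v_M$ which, equated to the slope in Eq. (1.19), yields
\[
b_i - r = \beta_i (b_M - r), \qquad \beta_i \equiv \frac{\sigma_{iM}}{v_M^2}, \qquad i = 1, \dots, m. \tag{1.20}
\]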

Eq. (1.20) is the celebrated Security Market Line (SML). Appendix 3 contains an alternative
derivation of the SML.
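The tangency argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the expected returns `b`, the covariance matrix `Sigma` and the safe rate `r` are made-up inputs, and the market portfolio is taken to be the tangency portfolio with weights proportional to $\Sigma^{-1}(b - r 1_m)$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 5, 0.02                                   # hypothetical number of assets and safe rate
b = r + rng.uniform(0.02, 0.10, size=m)          # hypothetical expected returns
A = rng.normal(size=(m, m))
Sigma = A @ A.T / m + 0.01 * np.eye(m)           # a positive definite covariance matrix

# Tangency (market) portfolio: weights proportional to Sigma^{-1} (b - r 1)
raw = np.linalg.solve(Sigma, b - r)
w_M = raw / raw.sum()                            # assumes raw.sum() is not zero

b_M = w_M @ b                                    # expected market return
v_M2 = w_M @ Sigma @ w_M                         # market variance
beta = (Sigma @ w_M) / v_M2                      # beta_i = cov(R_i, R_M) / var(R_M)

sml_gap = (b - r) - beta * (b_M - r)             # zero for every asset if the SML holds
print("largest SML deviation:", np.abs(sml_gap).max())
```

The deviation is zero up to rounding error for every asset, whatever the inputs, because $\Sigma w_M$ is proportional to $b - r 1_m$ by construction.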
The SML can be interpreted as a projection of the excess return on asset $i$ (i.e. $\tilde{R}_i - r$) on the excess return on the market portfolio (i.e. $\tilde{R}_M - r$). In other words,
\[
\tilde{R}_i - r = \beta_i (\tilde{R}_M - r) + \varepsilon_i, \qquad i = 1, \dots, m. \tag{1.21}
\]
Assets with $\beta_i > 1$ are usually referred to as aggressive assets and those with $\beta_i < 1$ are usually called conservative. Eq. (1.21) can be used to provide the following decomposition of the volatility regarding the return on the $i$-th asset:
\[
\sigma_i^2 = \beta_i^2 v_M^2 + \operatorname{var}(\varepsilon_i), \qquad i = 1, \dots, m.
\]
The quantity $\beta_i^2 v_M^2$ is referred to as systematic risk. The quantity $\operatorname{var}(\varepsilon_i) \geq 0$ does, instead, capture the notion of idiosyncratic risk. In the next section, we shall show that idiosyncratic risk can be eliminated through a well-diversified portfolio: roughly, a portfolio that contains a large number of assets.
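The split between systematic and idiosyncratic risk can be reproduced on simulated data. This is a minimal sketch: the return-generating parameters below are invented for illustration, and the projection is computed on demeaned returns, so the residual is uncorrelated with the market in sample and the decomposition holds exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5_000
mkt_excess = rng.normal(0.005, 0.04, size=T)                     # hypothetical market excess returns
asset_excess = 1.3 * mkt_excess + rng.normal(0.0, 0.03, size=T)  # true beta set to 1.3

cov = np.cov(asset_excess, mkt_excess, ddof=1)                   # 2 x 2 sample covariance matrix
beta = cov[0, 1] / cov[1, 1]
resid = (asset_excess - asset_excess.mean()) - beta * (mkt_excess - mkt_excess.mean())

total = cov[0, 0]                                                # sample variance of the asset
systematic = beta**2 * cov[1, 1]                                 # beta^2 * v_M^2
idiosyncratic = resid.var(ddof=1)                                # var(epsilon)

print(f"beta = {beta:.3f}")
print(f"total = {total:.6f}, systematic + idiosyncratic = {systematic + idiosyncratic:.6f}")
```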
1.3.2 Zero-beta CAPM
Suppose the risk-free asset is not available for trading, and consider a portfolio that only contains
risky assets in the portfolio frontier, which can be generated by Eq. (1.12), with a fixed weight
equal to . The vector of the covariances of all the asset returns with
generated by the portfolio ( ) , is,
( ) =

( )

>( )

, the return

(1.22)

and in particular, for any arbitrary asset ,


( ) =

(1.23)

and then,
( ) =

( ) =

(1.24)

Eqs. (1.23) and (1.24) can be solved for and 1 , which can then be plugged into Eq. (1.22),
leaving, after simple calculations,

( )
( ) 1
( ) (
)
=
( )
( )
We can simplify this expression, once we assume that
=

1 +

( )
(
( )

( ) = 0, in which case,
)

(1.25)

Eq. (1.25) is the zero-beta CAPM of Black (1972). Let us summarize its meaning. Even if there are no riskless assets, we can express expected returns in terms of benchmark returns, as in the security market line of Eq. (1.20). First, let us be given a benchmark portfolio return and, second, an asset return that has zero correlation with it (whence the zero-beta qualification). Then, Eq. (1.25) is the counterpart to the security market line in the case where we don't have riskless assets to invest in. Note, however, that the derivation of Eq. (1.25) does not itself require that riskless assets be absent. The next section provides more equilibrium foundations to Eqs. (1.20) and (1.25).
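The zero-beta relation can be verified numerically with the standard closed form for frontier portfolios of risky assets only. The sketch below rests on assumptions: the inputs `b` and `Sigma` are invented, the benchmark is an arbitrary frontier portfolio other than the minimum-variance one, and the names `frontier_weights`, `lam` and `kappa` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6
b = rng.uniform(0.03, 0.12, size=m)              # hypothetical expected returns
A = rng.normal(size=(m, m))
Sigma = A @ A.T / m + 0.01 * np.eye(m)           # positive definite covariance matrix

M = np.column_stack([b, np.ones(m)])             # m x 2 matrix [b, 1]
Sinv_M = np.linalg.solve(Sigma, M)               # Sigma^{-1} [b, 1]
H = M.T @ Sinv_M                                 # 2 x 2 matrix of frontier constants

def frontier_weights(mu):
    """Minimum-variance weights (summing to one) with expected return mu."""
    return Sinv_M @ np.linalg.solve(H, np.array([mu, 1.0]))

mu_mv = H[0, 1] / H[1, 1]                        # expected return of the global min-variance portfolio
mu_p = mu_mv + 0.05                              # benchmark: a frontier portfolio away from it
w_p = frontier_weights(mu_p)

# For a frontier portfolio, cov(R_i, R_p) = lam * b_i + kappa for every asset i.
lam, kappa = np.linalg.solve(H, np.array([mu_p, 1.0]))
b_z = -kappa / lam                               # expected return of any zero-covariance asset
w_z = frontier_weights(b_z)                      # the frontier portfolio with zero beta

beta = (Sigma @ w_p) / (w_p @ Sigma @ w_p)
gap = b - (b_z + beta * (mu_p - b_z))            # zero if the zero-beta CAPM holds
print("cov(z, p) =", w_z @ Sigma @ w_p, " largest deviation =", np.abs(gap).max())
```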
1.3.3 Equilibrium with expected utility
Equilibrium with mean-variance & exponential utility. [In progress] [...]
1.3.4 Applications
1.3.4.1 Hedging

Interestingly, the CAPM can be interpreted by having regard to a classical hedging framework. Suppose we hold an asset that delivers a return equal to $z$, perhaps a nontradable asset. We wish to hedge against this stochastic return with a portfolio comprising a proportion $\alpha$ in the market portfolio, and a proportion $1 - \alpha$ in a safe asset. We use as a hedging criterion the variance of the overall exposure of the position, $\min_{\alpha} \operatorname{var}[z - ((1 - \alpha) r + \alpha \tilde{R}_M)]$. The solution to this basic problem is $\hat{\alpha} = \operatorname{cov}(z, \tilde{R}_M)/v_M^2$. That is, the proportion to hold is simply the beta of the asset to hedge with the market portfolio.
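A quick way to see that the variance-minimizing proportion is a beta is to minimize the exposure variance by brute force and compare the result with the ratio of covariance to variance. The data-generating numbers below are assumptions made for the example, and the hedged position is written as the asset return minus the return on the hedging portfolio.

```python
import numpy as np

rng = np.random.default_rng(3)
T, r = 5_000, 0.01
R_M = rng.normal(0.06, 0.18, size=T)                    # hypothetical market returns
z = 0.02 + 0.7 * R_M + rng.normal(0.0, 0.10, size=T)    # return on the asset to hedge

def exposure_variance(alpha):
    """Variance of z minus the hedging portfolio (alpha in the market, 1 - alpha safe)."""
    hedge = (1.0 - alpha) * r + alpha * R_M
    return np.var(z - hedge)

# Closed form: the beta of z with respect to the market (population-style moments).
beta = np.cov(z, R_M, ddof=0)[0, 1] / np.var(R_M)

# Brute-force minimization on a grid with step 0.001.
grid = np.linspace(-1.0, 2.0, 3001)
alpha_grid = grid[np.argmin([exposure_variance(a) for a in grid])]

print(f"beta = {beta:.4f}, grid minimizer = {alpha_grid:.4f}")
```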




1.3.4.2 Project evaluation and certainty equivalents

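The CAPM is a model for the required return for any asset. As such, it might be used as a very first tool to assess risky projects. Let $\tilde{x}$ denote the future, random, cash flow of a certain project. Its return is given by $\tilde{R} = \tilde{x}/V - 1$, where $V$ is the current value of the project, such that $E(\tilde{R}) = r + \beta_x (b_M - r)$, which is, then, interpreted as the risk-adjusted discount rate for this project. Hence,
\[
V = \frac{E(\tilde{x})}{1 + r + \beta_x (b_M - r)}. \tag{1.26}
\]
Furthermore, we can express the value of the project, $V$, in terms of the project's certainty equivalent of the random cash flow, $\tilde{x}$. We have:
\[
\frac{E(\tilde{x})}{V} = 1 + E(\tilde{R})
= 1 + r + \beta_x (b_M - r)
= 1 + r + \frac{\operatorname{cov}(\tilde{R}, \tilde{R}_M)}{v_M^2} (b_M - r)
= 1 + r + \frac{1}{V} \operatorname{cov}(\tilde{x}, \tilde{R}_M)\, \lambda,
\]
where $\lambda \equiv (b_M - r)/v_M^2$, the unit market risk-premium. Rearranging terms in the previous equation leaves:
\[
V = \frac{E(\tilde{x}) - \lambda \operatorname{cov}(\tilde{x}, \tilde{R}_M)}{1 + r}. \tag{1.27}
\]
Next, consider a safe project with the same value as the original, $V$, but a cash flow constant and equal to $\bar{x}$, satisfying $\bar{x}/V - 1 = r$. In other words, $\bar{x}$ is the certainty equivalent to $\tilde{x}$ for the original risky project,
\[
\bar{x}: \quad V = \frac{E(\tilde{x})}{1 + E(\tilde{R})} = \frac{\bar{x}}{1 + r}.
\]
By Eq. (1.27), then,
\[
\bar{x} = E(\tilde{x}) - \lambda \operatorname{cov}(\tilde{x}, \tilde{R}_M).
\]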
1.4 The APT


1.4.1 Exact APT
1.4.1.1 No-arb restrictions on expected returns

Suppose that the

asset returns we observe are generated by the following linear factor model,

+
1

26

( )[

( )]

(1.28)


where and are a vector and a matrix of constants, and is a -dimensional vector of factors
supposed to a ect the asset returns, with
. The vector is the vector of risks that a ects
developments in asset returns.
Let us normalize [ ( )] 1 = , so that =
( ). With this normalization, we have,
(1
..
.
(

= +

= +

=1

=1

(1
..
.
(

)
(1.29)
)

Next, consider a portfolio of risky assets and a riskless asset. The wealth generated by this
portfolio is given by Eq. (1.2), which applied to this model leaves,
0

>

)+

>

(1.30)

An arbitrage opportunity arises, in this context, if there exists some portfolio such that the
wealth generated by this portfolio, 0 in Eq. (1.30) is certain, and di erent from the safe gross
interest rate , i.e. if
: > = 0 and > (
1 ) 6= 0. Mathematically, this is ruled out
whenever
R : =1 +
. Substituting this relation into Eq. (1.28) leaves,
=1

=1

( ) +

( )

Taking expectations, we have, that for each asset return,


= +(

) = +

=1

(
{z

)
}

= 1

(1.31)

Let $\epsilon_j$ be the $j$-th component of the vector of risks. The economic interpretation of Eq. (1.31) is that to be induced to invest in a risky market, I need a risk-premium, which compensates me beyond the risk-free rate. This risk premium is $\sum_{j=1}^{k} \beta_{ij} \lambda_j$. For each asset $i$, it is a linear combination of the betas $\beta_{ij}$, which are the exposures of the asset return to the sources of risks $\epsilon_j$, times the unit risk-premiums $\lambda_j$ relating to each $\epsilon_j$. So naturally, the unit risk-premiums are common to all asset returns, although of course the return exposures, $\beta_{ij}$, can vary across $i$.
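The restriction that expected excess returns are linear in the exposures is exactly what rules out riskless portfolios earning more than the safe rate. The sketch below illustrates this under assumed inputs: the exposure matrix `B` and the premiums `lam` are invented, and the null space of the transposed exposure matrix is used to enumerate portfolios without factor risk.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k, r = 6, 2, 0.02                      # hypothetical assets, factors, safe rate
B = rng.normal(size=(m, k))               # exposures beta_{ij} (assumed values)
lam = np.array([0.04, 0.01])              # unit risk premiums lambda_j (assumed values)

b = r + B @ lam                           # expected returns implied by the exact APT

# Portfolios with zero exposure to every factor span the null space of B^T.
_, _, Vt = np.linalg.svd(B.T)             # B^T is k x m, so Vt is m x m
null_basis = Vt[k:].T                     # m x (m - k), assuming B has full column rank

factor_exposure = null_basis.T @ B            # should vanish
excess_return = null_basis.T @ (b - r)        # should vanish as well: no arbitrage

print("max factor exposure:", np.abs(factor_exposure).max())
print("max excess return  :", np.abs(excess_return).max())
```

If expected returns did not take this linear form, some column of `null_basis` would deliver a certain excess return different from zero.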
1.4.1.2 Project evaluation

Project evaluation under the exact APT obtains under a straightforward generalization of Eq. (1.26). For any project with random cash flow equal to $\tilde{x}$, we have that its random return is $\tilde{R} = \tilde{x}/V - 1$, such that the value, $V$, is given by
\[
V = \frac{E(\tilde{x})}{1 + E(\tilde{R})}, \tag{1.32}
\]
where $E(\tilde{R})$ is as in Eq. (1.31). In Section 1.4.2, we shall explain how to represent the value of any project in terms of an alternative probability.


1.4.1.3 APT and CAPM

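The APT collapses to the CAPM, once we assume that the only factor affecting the returns is the market portfolio. To show this, we must normalize the market portfolio return so that its variance equals one, consistently with Eq. (1.31). So let $\tilde{R}_M^n$ be the normalized market return, defined as $\tilde{R}_M^n \equiv v_M^{-1} \tilde{R}_M$, so that $\operatorname{var}(\tilde{R}_M^n) = 1$. We have,
\[
\tilde{R}_i = a_i + \beta_i \tilde{R}_M^n, \qquad i = 1, \dots, m,
\]
where $\beta_i = \operatorname{cov}(\tilde{R}_i, \tilde{R}_M^n) = v_M^{-1} \operatorname{cov}(\tilde{R}_i, \tilde{R}_M)$. Then, we have,
\[
b_i = r + \beta_i \lambda, \qquad i = 1, \dots, m. \tag{1.33}
\]
In particular, $\beta_M = \operatorname{cov}(\tilde{R}_M, \tilde{R}_M^n) = v_M^{-1} \operatorname{var}(\tilde{R}_M) = v_M$, and so, by Eq. (1.33),
\[
\lambda = \frac{b_M - r}{v_M},
\]
which is known as the Sharpe ratio for the market portfolio, or the market price of risk. By replacing $\beta_i = v_M^{-1} \operatorname{cov}(\tilde{R}_i, \tilde{R}_M)$ and the expression for $\lambda$ above into Eq. (1.33), we obtain,
\[
b_i = r + \frac{\operatorname{cov}(\tilde{R}_i, \tilde{R}_M)}{v_M^2} (b_M - r), \qquad i = 1, \dots, m.
\]
This is simply the SML in Eq. (1.20).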


1.4.2 Risk-neutral tilts, or the fundamental theorem of asset pricing
Only if the market were risk-neutral would Eq. (1.31) predict that the expected return on any risky asset collapses to the safe interest rate, $b_i = r$. But is there a way to construct a risk-neutral market, i.e. one with a zero risk-premium, from the original market where the risk-premium $\sum_{j=1}^{k} \beta_{ij} \lambda_j$ is different from zero? The answer is in the affirmative. It is a quite fundamental theme in these Lectures, and in financial economics indeed. It links to the celebrated fundamental theorem of asset pricing, often referred to as the FTAP.
1.4.2.1 Probability twists

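Let us develop intuition on such a beautiful result, by elaborating a simple example. Assume that each of the risks, $\epsilon_j$, is standard normal under the original probability $P$, with density $\phi(\epsilon_j) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2} \epsilon_j^2)$, and next, that we tilt their densities by a factor equal to $\xi(\epsilon_j) = \exp(-\frac{1}{2} \lambda_j^2 - \lambda_j \epsilon_j)$, where $\lambda_j$ is the unit-risk premium, as defined in Section 1.4.1. This tilt defines a new probability, $Q$ say, under which each factor $\epsilon_j$ is distributed. Let us determine this probability, by tilting $\phi$ through $\xi$,
\[
\phi^Q(\epsilon_j) = \xi(\epsilon_j)\, \phi(\epsilon_j)
= \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} \lambda_j^2 - \lambda_j \epsilon_j\right) \exp\left(-\tfrac{1}{2} \epsilon_j^2\right)
= \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} \tilde{\epsilon}_j^2\right), \tag{1.34}
\]
where
\[
\tilde{\epsilon}_j = \epsilon_j + \lambda_j. \tag{1.35}
\]
Note that the new density, $\phi^Q$, is still that of a standard normal variate. Yet under $Q$, it is $\tilde{\epsilon}_j$ that has zero expectation, not $\epsilon_j$. In other words, we have that under $Q$, (i) $\tilde{\epsilon}_j$ is standard normal and (ii) $\epsilon_j$ is normal with unit variance, but expectation equal to $-\lambda_j$. That is, assuming that $\lambda_j > 0$, $\epsilon_j$ has a lower expectation under $Q$ than under $P$: under $P$, this expectation is zero, and under $Q$, it is $-\lambda_j$.
We label the new probability, $Q$, risk-neutral probability, for the following reasons. Consider Eqs. (1.29) and (1.31), which say that under $P$,
\[
\tilde{R}_i = r + \sum_{j=1}^{k} \beta_{ij} \lambda_j + \sum_{j=1}^{k} \beta_{ij} \epsilon_j. \tag{1.36}
\]
We also know that under the new probability $Q$, each factor $\epsilon_j$ is normally distributed with mean $-\lambda_j$. That is, replacing Eq. (1.35) into Eq. (1.36), we have that under $Q$,
\[
\tilde{R}_i = r + \sum_{j=1}^{k} \beta_{ij} \tilde{\epsilon}_j.
\]
To summarize, the return on each asset $i$ has the following distributions under $P$ and under $Q$,
\[
\tilde{R}_i \sim
\begin{cases}
N^P\left(r + \sum_{j=1}^{k} \beta_{ij} \lambda_j;\ \sigma_i^2\right) & \text{under } P \\
N^Q\left(r;\ \sigma_i^2\right) & \text{under } Q
\end{cases} \tag{1.37}
\]
where $\sigma_i^2 = \sigma_i \sigma_i^{\top}$, with $\sigma_i$ the row vector of the exposures of asset $i$, as in Eq. (1.3), and $N^P$ and $N^Q$ denote the Normal densities under $P$ and under $Q$. That is, under $Q$, the expected return on each asset equals $r$, whence the risk-neutral probability label. In later chapters, we shall use the celebrated Girsanov's theorem to elaborate on these topics, and label the tilt in Eq. (1.34) as the Radon-Nikodym derivative of the probability $Q$ against $P$ (see, e.g., Chapter 4, Section 4.3.3).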
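The effect of the exponential tilt on a standard normal density can be confirmed by numerical integration. The sketch below assumes a single risk with a made-up unit risk premium `lam`; integrals are approximated by Riemann sums on a fine grid, so the results are accurate only up to a small discretization error.

```python
import numpy as np

lam = 0.4                                         # hypothetical unit risk premium
eps = np.linspace(-12.0, 12.0, 240_001)           # grid for the risk epsilon
d = eps[1] - eps[0]

phi = np.exp(-0.5 * eps**2) / np.sqrt(2 * np.pi)  # standard normal density under P
xi = np.exp(-0.5 * lam**2 - lam * eps)            # the tilt
phi_Q = xi * phi                                  # density under the new probability Q

mass = np.sum(phi_Q) * d                          # total probability under Q
mean_Q = np.sum(eps * phi_Q) * d                  # expectation of epsilon under Q
var_Q = np.sum((eps - mean_Q) ** 2 * phi_Q) * d   # variance of epsilon under Q

print(f"mass = {mass:.6f}, mean under Q = {mean_Q:.6f}, variance under Q = {var_Q:.6f}")
```

The tilted density integrates to one, keeps unit variance, and shifts the mean of the risk from zero to minus the unit risk premium.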

1.4.2.2 Derivative asset evaluation

Why complicate matters with the previous changes of probability? In fact, the entire building block underlying asset evaluation relies on similar risk-neutral tilts. Consider the following example of derivative evaluation. We wish to price a quadratic derivative, i.e. one that pays off the square of the cash flow promised by the first asset, $\tilde{x}_1^2$. It is challenging to evaluate this derivative through the APT in this Gaussian market, because $\tilde{x}_1^2$ is obviously not normally distributed, which complicates the reasoning underlying its exposure to the factors $\epsilon_j$. In fact, in this Gaussian market, we cannot restrict the behavior of the expected return on this derivative, without assuming something more. Let us explain.
Suppose we want to construct a portfolio of the existing assets to replicate the payoff of the quadratic derivative, for each possible value this derivative could take. Can we do this? The answer is in the negative. We cannot use a finite number of assets to span an asset payoff that could take a continuum of values, such as $\tilde{x}_1^2$. We say markets are incomplete in this context. In the next chapter, we shall see that the price of such, and related, derivatives can be found in a preference-free format, as soon as the number of assets is at least as large as the number of states: markets are, then, complete. In this case, a portfolio that replicates the derivative's payoff can be found, and its value is the same as the derivative's, for two assets are worth the same whenever they promise the same payoff.
Chapter 2 explains that in a world with complete markets, the price of the existing traded assets can be inverted for the shadow value of some elementary assets, those that pay off one unit of numeraire in a given state of the world, and zero otherwise. The price of these elementary securities can then be used to price any derivative, which is redundant indeed. Chapter 4 of these Lectures shall explain how these results can be generalized to markets with a continuum of states, as soon as we assume that there exist a number of sufficiently diverse elementary securities, which guarantee a payoff could be delivered for each state of nature. To illustrate through the example of this section, consider the following elementary security, which promises the following payoff,
\[
e_x(\tilde{x}_1) =
\begin{cases}
1 & \text{if } \tilde{x}_1 \in (x, x + dx) \\
0 & \text{otherwise}
\end{cases} \tag{1.38}
\]
and let $\varphi(x)\,dx$ be its current price. We shall refer to these securities as Arrow-Debreu securities in these Lectures, for reasons explained in the next chapter.
We could utilize all of these Arrow-Debreu securities, i.e. for all $x \in \mathbb{R}$, and replicate any generic function of the state, $g(x)$, including our original payoff, $g(x) = x^2$, $x \in \mathbb{R}$. Indeed, note that by purchasing $g(x)$ units of the security that pays off $e_x$ in Eq. (1.38), we pay $g(x) \varphi(x)\,dx$ today, and are guaranteed to receive $1 \cdot g(x)$ tomorrow in state $\tilde{x}_1 \in (x, x + dx)$, and zero otherwise. Therefore, by purchasing all the securities that span $\mathbb{R}$, we shall receive $g(\tilde{x}_1)$ for any possible value of $\tilde{x}_1$, for sure, tomorrow, and pay, today,
\[
C \equiv \int g(x)\, \varphi(x)\, dx. \tag{1.39}
\]
We call $\Pi$ such a portfolio. We claim that the value of the derivative, $D$ say, is just $C$. For suppose not, and assume, for instance, that $D > C$. Then, we could sell short the derivative for $D$, invest $C$ into the portfolio $\Pi$, and retain an arbitrage profit equal to $D - C$. It is an arbitrage profit, because the portfolio $\Pi$ delivers the exact payoff we need to honour the short-sale of the derivative.
The crucial point is to determine the value of $\varphi$. We claim that,
\[
\varphi(x) = \frac{1}{1 + r}\, f^Q\left(x;\ V_1 (1 + r),\ V_1^2 \sigma_1^2\right), \tag{1.40}
\]
where $f^Q(x; \mu, s^2)$ denotes the density of the risk-neutral normal distribution in (1.37), but with mean and variance $\mu$ and $s^2$ as given in Eq. (1.40). Indeed, let us take expectations of $\tilde{x}_1 = V_1 (1 + \tilde{R}_1)$ under $Q$, such that,
\[
V_1 = \frac{E^Q(\tilde{x}_1)}{1 + r}. \tag{1.41}
\]
On the other hand, let us apply Eq. (1.39) to determine the value of the derivative that pays off $g(\tilde{x}_1) = \tilde{x}_1$, which is
\[
V_1 = \int x\, \varphi(x)\, dx. \tag{1.42}
\]
Comparing Eq. (1.41) and Eq. (1.42) yields Eq. (1.40).5

We are now ready to evaluate the quadratic derivative, by relying on Eq. (1.39), and the expression of $\varphi$ in Eq. (1.40). We have for $g(x) = x^2$, that the value of the derivative, $C$ say, is:
\[
C = \int x^2 \varphi(x)\, dx = \frac{E^Q(\tilde{x}_1^2)}{1 + r}. \tag{1.43}
\]
The expression in Eq. (1.43) tells us that all we have to do is to re-cast the initial evaluation problem in terms of this new, risk-neutral, setup. To fix ideas, suppose that the unit prices of risk $\lambda_j$ are all positive. In this setup, the discounting factor is the safe interest rate, $r$. To compensate for such a generous discounting, the expectation in the numerator of Eq. (1.43) is lower than that under $P$, because the distribution of $\tilde{x}_1$ and, hence, $\tilde{x}_1^2$, is more skewed to the left under $Q$ than under $P$, due to the factors driving the value of $\tilde{x}_1 = V_1 (1 + \tilde{R}_1)$ being skewed to take more pessimistic values under $Q$, on average, by an amount $\lambda_j$.
Let us proceed with the determination of $C$, by using Eq. (1.43). We have that $E^Q(\tilde{x}_1^2) = V_1^2 E^Q((1 + \tilde{R}_1)^2)$, where $1 + \tilde{R}_1 \sim N^Q(1 + r, \sigma_1^2)$, such that, by a direct calculation,
\[
C = V_1^2 \left( \frac{\sigma_1^2}{1 + r} + 1 + r \right),
\]
where $V_1$ is the value of the first asset, determined as usual through Eq. (1.32). Note how simple this formula is. It links the value of the derivative to the square of the value of the underlying risk, $V_1^2$, and the discounted expectation of $\sigma_1^2$, reflecting that after all, a quadratic derivative is about a play in volatility.6
Remarkably, the price of this derivative does not require any knowledge of the risk-premium components, $\lambda_j$. It is, thus, a preference-free formula. It is the task of the next chapter to develop the deep reasons why derivatives can sometimes be expressed in such a simple way. To anticipate, the derivative relies on a risk which is already traded in the market, in that the cash flow $\tilde{x}_1$ is traded at $V_1$. All risks have, then, already been embedded into the market price, $V_1$.
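The pricing steps of this example can be reproduced numerically: build the state-price density as a discounted Gaussian risk-neutral density, check that it prices the safe bond and the underlying asset correctly, and then integrate the squared payoff against it. The numbers `r`, `sigma1` and `V1` are invented, and the integrals are Riemann sums, so agreement holds only up to discretization error.

```python
import numpy as np

r, sigma1, V1 = 0.03, 0.25, 50.0                 # hypothetical safe rate, return volatility, asset value

mean_Q = V1 * (1 + r)                            # risk-neutral mean of the cash flow
sd_Q = V1 * sigma1                               # risk-neutral standard deviation of the cash flow
x = np.linspace(mean_Q - 12 * sd_Q, mean_Q + 12 * sd_Q, 400_001)
dx = x[1] - x[0]

density_Q = np.exp(-0.5 * ((x - mean_Q) / sd_Q) ** 2) / (sd_Q * np.sqrt(2 * np.pi))
state_price = density_Q / (1 + r)                # discounted risk-neutral density

bond_price = np.sum(state_price) * dx            # payoff 1 in every state
asset_price = np.sum(x * state_price) * dx       # payoff x: should return V1
C_numeric = np.sum(x**2 * state_price) * dx      # payoff x^2: the quadratic derivative

C_closed = V1**2 * (sigma1**2 / (1 + r) + 1 + r) # closed form from a direct calculation

print(f"bond = {bond_price:.6f} vs {1 / (1 + r):.6f}")
print(f"asset = {asset_price:.4f} vs {V1:.4f}")
print(f"quadratic derivative = {C_numeric:.4f} vs {C_closed:.4f}")
```

No risk premium enters the calculation, which is the preference-free property discussed above.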
1.4.3 The APT with idiosyncratic risk and a large number of assets
[Ross (1976), and Connor (1984), Huberman (1983).]
How can idiosyncratic risk be eliminated? Consider, for example, Eq. (1.21). Intuitively, we may form portfolios with a large number of assets, so as to make idiosyncratic risk negligible, by the law of large numbers. But would the beta-relation still hold in this case? More generally, would the APT relation in Eq. (1.31) still be valid? The answer is in the affirmative, although it deserves some qualifications.
5 We can check that Eq. (1.40) is consistent with the pricing of a pure discount bond. Such a bond has a payoff equal to $g(x) = 1$ for all $x$, such that by Eq. (1.39), $\frac{1}{1 + r} = \int \varphi(x)\, dx$, which it does, by Eq. (1.40).
6 In continuous time, the price at time $t$ of a quadratic derivative, which promises to pay off the square of an asset price, $S^2(T)$ at $T$, is given by $S^2(t)\, e^{(r + \sigma^2)(T - t)}$, provided $S$ is a Geometric Brownian motion with volatility parameter equal to $\sigma$ (see Chapter 4, Section 4.3.3.1).


Consider the APT equation (1.28), and add a vector of idiosyncratic returns, , which are
independent of , and have mean zero and variance 2 :
= +

We wish to show that in the absence of some appropriate notion of arbitrage, to be dened
below, it must be that the number of assets such that Eq. (1.31) does not hold, ( ) say, is
bounded as gets large, i.e.:
|

((

) + )|

= 1

( )

(1.44)

where
lim

( )

(1.45)

In other words, we wish to show that in a large market, Eq. (1.31) does indeed hold for most
of the assets, an approach close to that in Huang and Litzenberger (1988, p. 106-108).
By the same arguments leading to Eq. (1.1), the wealth generated by a portfolio of the assets
satisfying (1.44), 0 ( ) say, is,

0
>
>
=
1
+
+
+
(
)
(
)
(
)
(
)
(
)
( )
( )
( )

and
are (i) the vector of the expected returns, (ii) the return volatility (or
where ,
factor exposures) matrix and (iii) the vector of idiosyncratic return components a ecting these
assets, and, nally,
and
are the portfolio and the initial wealth invested in these assets.
In this context, we may dene an arbitrage as the portfolio
( ) that in the limit, as the
number of the existing assets gets large, is riskless and yet delivers an expected return strictly
larger than the safe interest rate, viz
lim

( )]

and

lim

( )

( )]

(1.46)

We want to show that this situation does not arise, under the condition in (1.45), thereby
establishing that the linear APT relation in Eq. (1.31) is valid for most of the assets, in a large
market.
So suppose the linear relation,
1 =
, doesnt hold. Then, there exists a portfolio
such that,
>
>
= 0 and
(
1 ) 6= 0.
(1.47)
Consider the portfolio:
=

>

is as in (1.47). With this portfolio we have, clearly, that [ 0 ] = > (


1 )+
0
, for each , and even for
large. That is, lim
[ ( )]
, which
( )
is the rst condition in (1.46). As regards the second condition in (1.46), we have that

>
[ 0 ] = >
+ 2 = 2 >

where

[ 0 ( )]
where the second equality follows by the rst relation in (1.47). Clearly, lim
as ( )
. Hence, in the absence of arbitrage, the condition in (1.45) must hold.


1.4.4 Systematic risk
Portfolio diversification & systematic versus non-systematic risk
[In progress]

Well-diversified portfolios.

1.5 Empirical evidence


[In progress]
1.5.1 Fama & Mac Beth
How to estimate Eq. (1.21)? Consider a slightly more general version of Eq. (1.21), where the safe interest rate is time-varying:
\[
\tilde{R}_{i,t} - r_t = \beta_i (\tilde{R}_{M,t} - r_t) + \varepsilon_{i,t}, \qquad i = 1, \dots, m,
\]
where $\varepsilon_{i,t}$ denote time-series residuals. Fama and MacBeth (1973) consider the following procedure. In a first step, one obtains estimates of the exposures to the market, $\hat{\beta}_i$ say, for all stocks, using, for example, monthly returns, and approximating the market portfolio with some broad stock market index.7 In a second step, one runs cross-sectional regressions, one for each month,
\[
\tilde{R}_{i,t} - r_t = \alpha_t + \lambda_t \hat{\beta}_i + \eta_{i,t}, \qquad t = 1, \dots, T,
\]
where $T$ is the sample size and $\eta_{i,t}$ denote cross-sectional residuals. The time series of cross-sectional estimates of the intercept $\alpha_t$ and the price of risk $\lambda_t$, $\hat{\alpha}_t$ and $\hat{\lambda}_t$ say, are, then, used to make statistical inference. For example, time-series averages and standard errors of $\hat{\alpha}_t$ and $\hat{\lambda}_t$ lead to point estimates and standard errors for $\alpha$ and $\lambda$. If the CAPM holds, estimates of $\alpha$ should not be significantly different from zero.
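The two-step procedure can be sketched on simulated data in which the CAPM holds by construction. Everything about the data below is an assumption (number of assets, sample length, volatilities); the sketch only illustrates the mechanics of the time-series step, the month-by-month cross-sectional step, and the time-series averaging of the estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
m, T = 25, 600                                    # hypothetical assets and months
true_beta = np.linspace(0.5, 1.5, m)
mkt = rng.normal(0.006, 0.045, size=T)            # simulated market excess returns
# Simulated asset excess returns with no intercept, so the CAPM holds
excess = mkt[:, None] * true_beta + rng.normal(0.0, 0.02, size=(T, m))

# Step 1: time-series estimates of the market exposures, one per asset
mkt_dm = mkt - mkt.mean()
beta_hat = (mkt_dm @ (excess - excess.mean(axis=0))) / (mkt_dm @ mkt_dm)

# Step 2: one cross-sectional regression per month of returns on [1, beta_hat]
X = np.column_stack([np.ones(m), beta_hat])
coef, *_ = np.linalg.lstsq(X, excess.T, rcond=None)   # coef has shape (2, T)
alpha_t, lambda_t = coef

# Inference from the time series of cross-sectional estimates
alpha_hat, lambda_hat = alpha_t.mean(), lambda_t.mean()
se_alpha = alpha_t.std(ddof=1) / np.sqrt(T)
se_lambda = lambda_t.std(ddof=1) / np.sqrt(T)

print(f"alpha  = {alpha_hat:.5f} (s.e. {se_alpha:.5f})")
print(f"lambda = {lambda_hat:.5f} (s.e. {se_lambda:.5f}), average market excess return = {mkt.mean():.5f}")
```

In this simulated market the average intercept is small relative to its standard error, and the average price of risk is close to the average market excess return.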
1.5.2 Macroeconomic forces
Chen, Roll and Ross (1986) use the Fama-MacBeth two-step procedure to estimate a multifactor
APT model, such as that in Section 1.4. They identify macroeconomic forces driving asset
returns with the innovations in variables such as the term spread, expected and unexpected
inflation, industrial production growth, or the corporate spread. They find that these sources
of variation in the cross-section of asset returns are significantly priced.
1.5.3 Fama & French
Consider the Security Market Line in Eq. (1.20), which predicts that each asset displays an average excess return lying precisely on the SML. Assets delivering average excess returns and betas above the SML, as the points $A$, $B$, $C$, and $D$ in Figure 1.4 below, would be simply evidence that this single factor version of the APT does not work. Consider, for example, the asset corresponding to one of these points. A regression of the excess return of this asset onto the excess return on the market would produce a positive intercept, some $\alpha > 0$, such that its average excess return would equal $\alpha + \beta (b_M - r)$, thereby invalidating Eq. (1.20). There exist at least two pieces of evidence against the one-factor CAPM, which were systematically pointed out by Fama and French (1992, 1993):
(i) Size effect (Banz, 1981): Average returns for small firms, or low capitalized firms (in terms of market equity, defined as stock price times outstanding shares) are too high given their beta.
(ii) Value effect (Stattman, 1980; Rosenberg, Reid and Lanstein, 1985): Average returns on stocks of firms with high book-to-market (BM, henceforth) ratios, or value stocks, are too high given their beta. In general, average returns on value stocks are higher than those on growth stocks, i.e. those stocks with low BM ratios. As an example, the four points in Figure 1.4 might typically refer to stocks with low-to-high BM ratios.
A third piece of evidence against the standard CAPM is the momentum effect:
(iii) Momentum effect (Jegadeesh and Titman, 1993): Stocks with the highest returns in the previous twelve months tend to keep outperforming in the near future.

7 In tests of the CAPM, one uses proxies of the market portfolio, such as, say, the S&P 500. However, the market portfolio is unobservable. Roll (1977) points out that as a result, the CAPM is inherently untestable, as any test of the CAPM is a joint test of the model itself and of the closeness of the proxy to the market portfolio.
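FIGURE 1.4. [Average excess returns against beta: the assets $A$, $B$, $C$ and $D$ are plotted relative to the Security Market Line, with the market excess return $b_M - r$ marked on the vertical axis.]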
The one-factor CAPM has no power in explaining the cross-section of asset returns, sorted by size, BM or momentum. Assets sorted in this way command a size premium, a value premium, and a momentum premium. For example, one can create portfolios sorted by size and BM, say 25 portfolios, out of a $5 \times 5$ matrix with dimensions given by size and BM. The puzzle, then, at least from the standard CAPM perspective, is that this model cannot explain the returns on these portfolios. Fama and French (1993) show that the returns on these portfolios can be much better understood by means of a multifactor model, where both size and value premiums are explicitly taken into account. They consider three factors: (i) the excess return on the market; (ii) an HML factor, defined as the monthly difference between the returns on assets with high and low BM ratios (high minus low); (iii) an SMB factor, defined as the difference between the asset returns of firms with small and big size (small minus big). The HML and SMB factors are defined as the differences between the returns on the appropriate cells of a $2 \times 3$ matrix, obtained through percentiles of the distribution of asset returns over the previous year.
[Table: the $2 \times 3$ sort, with Book-to-Market (L, M, H) across columns and Size (S, L) down rows.]
The resulting model is the celebrated Fama-French three factor model. Carhart (1997) extends this model to a four-factor model with a momentum factor: the monthly difference between the returns on the high and low prior return portfolios.
1.5.4 The high-beta stocks anomaly
High-beta stocks should command higher returns, to compensate for their higher volatility. Yet historically, it is low-beta stocks that have performed better, on a risk-adjusted basis (i.e., in terms of alphas). Theoretically, then, one could buy low-beta stocks and leverage them through debt. It's indeed feasible when you've got the opportunity to do so. Two papers to read are Frazzini, Kabiller and Pedersen (2012), and Frazzini and Pedersen (2012).

1.6 Stochastic dominance


The notion of risk underlying the classical CAPM is variance. However, there are situations studied in these lectures, where choices of every expected utility maximizer might be best understood with a generalized notion of risk due to Rothschild and Stiglitz (1970, 1971). Consider, first, the following definition of stochastic dominance:
Definition 1.1 (First-order stochastic dominance). $\tilde{x}_2$ dominates $\tilde{x}_1$ if, for each utility function $u$, i.e. increasing, we have also that $E[u(\tilde{x}_2)] \geq E[u(\tilde{x}_1)]$.
We have:
Theorem 1.1. The following statements are equivalent: (a) $\tilde{x}_2$ dominates $\tilde{x}_1$, or $E[u(\tilde{x}_2)] \geq E[u(\tilde{x}_1)]$ for every $u$ increasing; (b) for each $t$, we have that $\tilde{x}_2$ is more likely than $\tilde{x}_1$ to pay more than $t$, i.e. $F_1(t) \geq F_2(t)$, where $F_i(\cdot)$ denotes the distribution function of $\tilde{x}_i$.
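Proof. We prove this result in the case the support is compact, say $[a, \bar{a}]$. First, we show that (b) $\Rightarrow$ (a). By integrating by parts,
\[
E[u(\tilde{x})] = \int_a^{\bar{a}} u(x)\, dF(x) = u(\bar{a}) - \int_a^{\bar{a}} u'(x) F(x)\, dx,
\]
where we have used the fact that $F(a) = 0$ and $F(\bar{a}) = 1$. Therefore,
\[
E[u(\tilde{x}_2)] - E[u(\tilde{x}_1)] = \int_a^{\bar{a}} u'(x) \left[ F_1(x) - F_2(x) \right] dx \geq 0.
\]
Next, we show that (a) $\Rightarrow$ (b). Indeed, (a) implies that $E[u(\tilde{x}_2)] \geq E[u(\tilde{x}_1)]$ for $u(x) = \mathbb{I}_{x > t}$, the indicator of the event $\{x > t\}$, or
\[
0 \leq \int_a^{\bar{a}} \mathbb{I}_{x > t} \left( dF_2(x) - dF_1(x) \right) = (1 - F_2(t)) - (1 - F_1(t)) = F_1(t) - F_2(t). \quad \|
\]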

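There is an alternative characterization of first-order stochastic dominance. Suppose there exists a strictly positive random variable $\tilde{\eta}$ by which $\tilde{x}_2$ exceeds $\tilde{x}_1$, viz,
\[
\exists\, \tilde{\eta} > 0 : \ \tilde{x}_2 = \tilde{x}_1 + \tilde{\eta}. \tag{1.48}
\]
Then, we have that
\[
\forall t, \quad F_1(t) \equiv \Pr(\tilde{x}_1 \leq t) = \Pr(\tilde{x}_2 \leq t + \tilde{\eta}) \geq \Pr(\tilde{x}_2 \leq t) \equiv F_2(t).
\]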
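Both characterizations of first-order dominance can be illustrated on simulated payoffs built as in Eq. (1.48). The distributions and the three increasing functions used below are arbitrary choices made for the example; because the second payoff exceeds the first draw by draw, the inequalities hold exactly in the sample.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
x1 = rng.normal(1.0, 0.5, size=n)                # hypothetical payoff
eta = rng.exponential(0.2, size=n)               # positive shift
x2 = x1 + eta                                    # construction of Eq. (1.48)

# Empirical distribution functions on a common grid
grid = np.linspace(-2.0, 5.0, 701)
F1 = (x1[:, None] <= grid).mean(axis=0)
F2 = (x2[:, None] <= grid).mean(axis=0)
print("F1 >= F2 everywhere:", bool(np.all(F1 >= F2)))

# Expected utility comparison for a few increasing functions
increasing_utilities = {"tanh": np.tanh, "cube": lambda x: x**3, "exp": np.exp}
for name, u in increasing_utilities.items():
    print(f"{name:>5}: E[u(x2)] - E[u(x1)] = {u(x2).mean() - u(x1).mean():.4f}")
```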
That is, $\tilde{x}_2$ dominates $\tilde{x}_1$ if it can be expressed as in Eq. (1.48). It is quite an intuitive property, but at the same time, it does not isolate the pure component of risk. Instead, we would like to perform the thought experiment of asking every expected utility maximizer to choose between two distributions with the same mean. Consider the following definition of second-order stochastic dominance.
Definition 1.2 (Second-order stochastic dominance). $\tilde{x}_1$ is more risky than $\tilde{x}_2$ if, for every concave function $u$, we have also that $E[u(\tilde{x}_1)] \leq E[u(\tilde{x}_2)]$ for $\tilde{x}_1$ and $\tilde{x}_2$ having the same mean.
Note that the previous definition of increasing risk does not rely on the sign of $u'$. Furthermore, it does not necessarily follow that $\tilde{x}_1$ is more risky than $\tilde{x}_2$ whenever $\operatorname{var}(\tilde{x}_1) > \operatorname{var}(\tilde{x}_2)$. Consider the following standard counterexample. Let $\tilde{x}_2 = 1$ with probability 0.8, and 100 otherwise. Let $\tilde{x}_1 = 10$ with probability 0.99, and 1090 otherwise. We have $E(\tilde{x}_1) = E(\tilde{x}_2) = 20.8$, but $\operatorname{var}(\tilde{x}_1) = 11547.36$ and $\operatorname{var}(\tilde{x}_2) = 1568.16$. However, consider $u(x) = \ln x$. Then, $E(\ln \tilde{x}_1) = 2.35 > E(\ln \tilde{x}_2) = 0.92$. It is easily seen that in this particular example, the distribution function $F_1$ of $\tilde{x}_1$ does not move around the probability masses of $\tilde{x}_2$ in an appropriate sense to be developed below.
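The moments in this counterexample follow directly from the two-point distributions, and can be recomputed in a few lines. The helper names below are introduced only for this sketch.

```python
import numpy as np

# Two-point distributions of the counterexample: (values, probabilities)
x2_vals, x2_prob = np.array([1.0, 100.0]), np.array([0.8, 0.2])
x1_vals, x1_prob = np.array([10.0, 1090.0]), np.array([0.99, 0.01])

def mean(vals, prob):
    return float(prob @ vals)

def variance(vals, prob):
    return float(prob @ vals**2) - mean(vals, prob) ** 2

def expected_log(vals, prob):
    return float(prob @ np.log(vals))

for name, vals, prob in [("x1", x1_vals, x1_prob), ("x2", x2_vals, x2_prob)]:
    print(f"{name}: mean = {mean(vals, prob):.2f}, "
          f"variance = {variance(vals, prob):.2f}, "
          f"E[ln] = {expected_log(vals, prob):.2f}")
```

The payoff with the larger variance is the one preferred by a logarithmic utility maximizer, so variance alone does not rank the two payoffs by risk in the sense of Definition 1.2.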
Consider, instead, the case in which the distribution of one variable $\tilde{x}_1$ is obtained from that of another variable $\tilde{x}_2$, as follows: we take weights from the middle part of the density and move them towards the tails, by making sure that the new density has the same mean as the initial one. Mathematically, this is equivalent to requiring that condition (b) in the following theorem holds true. We have:
Theorem 1.2. The following statements are equivalent: (a) $\tilde{x}_1$ is more risky than $\tilde{x}_2$; (b) $\tilde{x}_1$ has more weight in the tails than $\tilde{x}_2$, i.e. $\int_{-\infty}^{t} \left[ F_1(\tau) - F_2(\tau) \right] d\tau \geq 0$ for all $t$.

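Proof. Assume, as in the proof of Theorem 1.1, that the support is compact, say $[a, \bar{a}]$, so that integrals from $-\infty$ to $t$ reduce to integrals from $a$ to $t$. As for (a) $\Rightarrow$ (b), consider the function $u(x) = -\max\{t - x, 0\}$. It is increasing and concave and, hence, a candidate utility function. Therefore, it satisfies,
\[
\int_a^{\bar{a}} u(x) \left[ dF_1(x) - dF_2(x) \right] \leq 0.
\]
That is, using the definition of $u$,
\[
0 \geq -\int_a^t (t - x) \left[ dF_1(x) - dF_2(x) \right]
= -(t - x) \left[ F_1(x) - F_2(x) \right] \Big|_a^t - \int_a^t \left[ F_1(x) - F_2(x) \right] dx
= -\int_a^t \left[ F_1(x) - F_2(x) \right] dx,
\]
where the equalities follow by an integration by parts. Next we prove that (b) $\Rightarrow$ (a). We have:
\[
\begin{aligned}
E[u(\tilde{x}_1)] - E[u(\tilde{x}_2)]
&= \int_a^{\bar{a}} u(x) \left[ dF_1(x) - dF_2(x) \right] \\
&= u(x) \left[ F_1(x) - F_2(x) \right] \Big|_a^{\bar{a}} - \int_a^{\bar{a}} u'(x) \left[ F_1(x) - F_2(x) \right] dx \\
&= -\int_a^{\bar{a}} u'(x) \left[ F_1(x) - F_2(x) \right] dx \\
&= -u'(\bar{a})\, G(\bar{a}) + \int_a^{\bar{a}} u''(x)\, G(x)\, dx \\
&= \int_a^{\bar{a}} u''(x)\, G(x)\, dx,
\end{aligned}
\]
where $G(x) \equiv \int_a^x \left[ F_1(\tau) - F_2(\tau) \right] d\tau$. The last equality follows because, by integrating by parts,
\[
G(\bar{a}) = \int_a^{\bar{a}} \left[ F_1(x) - F_2(x) \right] dx
= x \left[ F_1(x) - F_2(x) \right] \Big|_a^{\bar{a}} - \int_a^{\bar{a}} x \left[ dF_1(x) - dF_2(x) \right] = 0,
\]
where the last equality follows by the assumption that $\tilde{x}_1$ and $\tilde{x}_2$ have the same mean. Now, by $u'' \leq 0$ and $G(x) \geq 0$, the previous relation implies that $E[u(\tilde{x}_1)] \leq E[u(\tilde{x}_2)]$, i.e., $\tilde{x}_1$ is more risky than $\tilde{x}_2$. $\|$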

We can now consider random variables that add risk without affecting the mean: suppose
that there exists a random variable $\tilde{\eta}$ such that $\tilde{x}_1$ has the same distribution as $\tilde{x}_2 + \tilde{\eta}$, and
$E(\tilde{\eta} \mid \tilde{x}_2 = x_2) = 0$. We can think of an experiment in which after receiving a payoff $\tilde{x}_2$, another payoff could be
added which has conditional expectation zero, and which therefore adds noise. Clearly $\tilde{x}_1$ is a
mean preserving spread of $\tilde{x}_2$. It is easy to show that this mean-preserving spread implies that
the two conditions in Theorem 1.2 hold true. Indeed, we have,
\[ E[u(\tilde{x}_1)] = E[u(\tilde{x}_2 + \tilde{\eta})]
   = E[E(u(\tilde{x}_2 + \tilde{\eta}) \mid \tilde{x}_2 = x_2)]
   \le E[u(E(\tilde{x}_2 + \tilde{\eta} \mid \tilde{x}_2 = x_2))]
   = E[u(E(\tilde{x}_2 \mid \tilde{x}_2 = x_2))]
   = E[u(\tilde{x}_2)], \]
where the inequality follows by $u'' \le 0$, and by Jensen's inequality.
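A small discrete example may help to see the argument at work. The sketch below is only an illustration under assumed numbers (a two-point payoff and a symmetric noise term of size `d`, both hypothetical): it builds a mean-preserving spread, and checks the equal means, the ranking of expected log utilities and the integral condition of Theorem 1.2, using the identity $\int_{-\infty}^{t} F(x)\,dx = E[\max(t - \tilde{x}, 0)]$.

```python
import math

# x2 is a two-point lottery; given x2, the noise eta is +d or -d with equal
# probability, so E(eta | x2) = 0 and x1 = x2 + eta is a mean-preserving spread.
x2_outcomes, x2_probs = [50.0, 100.0], [0.5, 0.5]
d = 20.0
x1_outcomes = [x + s * d for x in x2_outcomes for s in (-1, 1)]
x1_probs = [p * 0.5 for p in x2_probs for _ in (-1, 1)]

def expect(f, outcomes, probs):
    """Expectation of f(x) under a discrete distribution."""
    return sum(p * f(x) for x, p in zip(outcomes, probs))

def integrated_cdf(t, outcomes, probs):
    """Integral of the CDF up to t, computed as E[max(t - x, 0)]."""
    return expect(lambda x: max(t - x, 0.0), outcomes, probs)

mean1 = expect(lambda x: x, x1_outcomes, x1_probs)
mean2 = expect(lambda x: x, x2_outcomes, x2_probs)
eu1 = expect(math.log, x1_outcomes, x1_probs)
eu2 = expect(math.log, x2_outcomes, x2_probs)
tail_ok = all(
    integrated_cdf(t, x1_outcomes, x1_probs)
    >= integrated_cdf(t, x2_outcomes, x2_probs) - 1e-12
    for t in range(0, 151)
)
print(mean1, mean2, eu1, eu2, tail_ok)
```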


1.7 Appendix 1: Analytical details relating to portfolio choice


We derive Eq. (1.9), which is the solution to the portfolio choice when the choice space does not
include a safe asset. We derive the solution by solving two programs: (i) the primal program [1.P2] in
the main text, amounting to maximizing the expected portfolio return, with a given variance of the
portfolio value; and (ii) a dual program, to be introduced below, where we minimize the variance of
the portfolio's value, with given portfolio expected return.

1.7.1 The primal program


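Given Eq. (1.8), the Lagrangian function associated to [1.P2] is,
\[ L = \pi^\top b - \nu_1\,(\pi^\top \Sigma \pi - w^2 v_p^2) - \nu_2\,(\pi^\top 1_m - w), \]
where $\nu_1$ and $\nu_2$ are two Lagrange multipliers. The first order conditions are,
\[ \hat{\pi} = \frac{1}{2\nu_1}\,\Sigma^{-1}(b - \nu_2 1_m); \qquad
   \hat{\pi}^\top \Sigma \hat{\pi} = w^2 v_p^2; \qquad
   \hat{\pi}^\top 1_m = w. \qquad \text{(1A.1)} \]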

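Using the first and the third conditions, we obtain,
\[ w = 1_m^\top \hat{\pi} = \frac{1}{2\nu_1}\Big(\underbrace{1_m^\top \Sigma^{-1} b}_{\equiv \beta} - \nu_2 \underbrace{1_m^\top \Sigma^{-1} 1_m}_{\equiv \gamma}\Big). \]
We can solve for $\nu_2$, obtaining,
\[ \nu_2 = \frac{\beta - 2 w \nu_1}{\gamma}. \]
Replacing the solution for $\nu_2$ into the first condition in (1A.1) leaves,
\[ \hat{\pi} = \frac{w}{\gamma}\,\Sigma^{-1} 1_m + \frac{1}{2\nu_1}\,\Sigma^{-1}\Big(b - \frac{\beta}{\gamma}\,1_m\Big). \qquad \text{(1A.2)} \]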

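Next, we derive the value of the program [1.P2]. We have,
\[ E[w^+(\hat{\pi})] = \hat{\pi}^\top b
   = \frac{w}{\gamma}\underbrace{1_m^\top \Sigma^{-1} b}_{= \beta}
   + \frac{1}{2\nu_1}\Big(\underbrace{b^\top \Sigma^{-1} b}_{\equiv \alpha} - \frac{\beta}{\gamma}\underbrace{1_m^\top \Sigma^{-1} b}_{= \beta}\Big)
   = \frac{w}{\gamma}\,\beta + \frac{1}{2\nu_1}\Big(\alpha - \frac{\beta^2}{\gamma}\Big). \qquad \text{(1A.3)} \]
It is easy to check that
\[ var[w^+(\hat{\pi})] = \hat{\pi}^\top \Sigma \hat{\pi}
   = \frac{w^2}{\gamma} + \Big(\frac{1}{2\nu_1}\Big)^2 \Big(\alpha - \frac{\beta^2}{\gamma}\Big). \qquad \text{(1A.4)} \]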

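Let us gather Eqs. (1A.3) and (1A.4),
\[ \mu_p(v_p) \equiv \frac{E[w^+(\hat{\pi})]}{w} = \frac{\beta}{\gamma} + \frac{1}{2\nu_1 w}\Big(\alpha - \frac{\beta^2}{\gamma}\Big); \qquad
   v_p^2 = \frac{var[w^+(\hat{\pi})]}{w^2} = \frac{1}{\gamma} + \Big(\frac{1}{2\nu_1 w}\Big)^2\Big(\alpha - \frac{\beta^2}{\gamma}\Big), \qquad \text{(1A.5)} \]
where we have emphasized the dependence of $\mu_p$ on $v_p$, which arises through the presence of the
Lagrange multiplier $\nu_1$.
Let us rewrite the first equation in (1A.5) as follows,
\[ \frac{1}{2\nu_1 w} = \frac{\gamma \mu_p(v_p) - \beta}{\alpha\gamma - \beta^2}. \qquad \text{(1A.6)} \]
We can use this expression for $\nu_1$ to express $\hat{\pi}$ in Eq. (1A.2) in terms of the portfolio expected return,
$\mu_p(v_p)$. We have,
\[ \frac{\hat{\pi}}{w} = \frac{\Sigma^{-1} 1_m}{\gamma} + \frac{\gamma \mu_p(v_p) - \beta}{\alpha\gamma - \beta^2}\,\Sigma^{-1}\Big(b - \frac{\beta}{\gamma}\,1_m\Big). \]
By rearranging terms in the previous equation, we obtain Eq. (1.9) in the main text.
Finally, we substitute Eq. (1A.6) into the second equation in (1A.5), and obtain:
\[ v_p^2 = \frac{1}{\gamma}\Big[1 + \frac{(\gamma \mu_p(v_p) - \beta)^2}{\alpha\gamma - \beta^2}\Big], \]
which is Eq. (1.10) in the main text. Note, also, that the second condition in (1A.5) reveals that,
\[ \Big(\frac{1}{2\nu_1 w}\Big)^2 = \frac{\gamma v_p^2 - 1}{\alpha\gamma - \beta^2}. \]
Given that $\alpha\gamma - \beta^2 > 0$, the previous equation confirms the properties of the global minimum variance
portfolio stated in the main text.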
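The closed-form expressions above lend themselves to a quick numerical sanity check. The sketch below is an illustration only: the expected returns `b`, the covariance matrix `Sigma` and the target `mu_p` are made-up numbers, `solve` is a small helper written for the occasion, and the weights are computed as in Eq. (1.9) as we understand it, i.e. $\hat{\pi}/w = \frac{\gamma\mu_p - \beta}{\alpha\gamma - \beta^2}\Sigma^{-1}b + \frac{\alpha - \beta\mu_p}{\alpha\gamma - \beta^2}\Sigma^{-1}1_m$.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(n):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [x - f * y for x, y in zip(M[k], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

# Hypothetical inputs: expected gross returns and covariance matrix.
b = [1.05, 1.08, 1.12]
Sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
ones = [1.0, 1.0, 1.0]

Sb, S1 = solve(Sigma, b), solve(Sigma, ones)   # Sigma^{-1} b and Sigma^{-1} 1
alpha, beta, gamma = dot(b, Sb), dot(ones, Sb), dot(ones, S1)
D = alpha * gamma - beta ** 2

def frontier_weights(mu_p):
    """Frontier weights pi/w for a target expected return mu_p."""
    k1 = (gamma * mu_p - beta) / D
    k2 = (alpha - beta * mu_p) / D
    return [k1 * x + k2 * y for x, y in zip(Sb, S1)]

mu_p = 1.10
pi = frontier_weights(mu_p)
var_direct = dot(pi, [dot(row, pi) for row in Sigma])          # pi' Sigma pi
var_formula = (1 + (gamma * mu_p - beta) ** 2 / D) / gamma     # Eq. (1.10)
print(pi, var_direct, var_formula)
```

The weights add up to one, deliver the target expected return, and their variance agrees with Eq. (1.10).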

1.7.2 The dual program


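We now solve the dual program, defined as follows,
\[ \hat{\pi} = \arg\min_{\pi}\ var[w^+(\pi)] \quad \text{s.t.} \quad E[w^+(\pi)] = E_p \ \ \text{and} \ \ w = \pi^\top 1_m, \qquad \text{[1A.P2-dual]} \]
for some constant $E_p$. The first order conditions are
\[ \hat{\pi} = \frac{\nu_1}{2}\,\Sigma^{-1} b + \frac{\nu_2}{2}\,\Sigma^{-1} 1_m; \qquad
   \hat{\pi}^\top b = E_p; \qquad
   w = \hat{\pi}^\top 1_m; \qquad \text{(1A.7)} \]
where $\nu_1$ and $\nu_2$ are two Lagrange multipliers. By replacing the first condition in (1A.7) into the
second one,
\[ E_p = \hat{\pi}^\top b = \frac{\nu_1}{2}\underbrace{b^\top \Sigma^{-1} b}_{\equiv \alpha} + \frac{\nu_2}{2}\underbrace{1_m^\top \Sigma^{-1} b}_{\equiv \beta}. \qquad \text{(1A.8)} \]
By replacing the first condition in (1A.7) into the third one,
\[ w = \hat{\pi}^\top 1_m = \frac{\nu_1}{2}\underbrace{b^\top \Sigma^{-1} 1_m}_{= \beta} + \frac{\nu_2}{2}\underbrace{1_m^\top \Sigma^{-1} 1_m}_{\equiv \gamma}. \qquad \text{(1A.9)} \]
Next, let $\mu_p \equiv E_p / w$. By Eqs. (1A.8) and (1A.9), the solutions for $\nu_1$ and $\nu_2$ are,
\[ \frac{\nu_1}{2w} = \frac{\gamma\mu_p - \beta}{\alpha\gamma - \beta^2}; \qquad
   \frac{\nu_2}{2w} = \frac{\alpha - \beta\mu_p}{\alpha\gamma - \beta^2}. \]
Therefore, the solution for the portfolio in Eq. (1A.7) is,
\[ \frac{\hat{\pi}}{w} = \frac{\gamma\mu_p - \beta}{\alpha\gamma - \beta^2}\,\Sigma^{-1} b + \frac{\alpha - \beta\mu_p}{\alpha\gamma - \beta^2}\,\Sigma^{-1} 1_m. \]
Finally, the value of the program is,
\[ \frac{var[w^+(\hat{\pi})]}{w^2} = \frac{1}{w^2}\,\hat{\pi}^\top \Sigma \hat{\pi}
   = \frac{\nu_1}{2w}\,\mu_p + \frac{\nu_2}{2w}
   = \frac{(\gamma\mu_p - \beta)^2}{\gamma\,(\alpha\gamma - \beta^2)} + \frac{1}{\gamma}, \]
which is exactly Eq. (1.10) in the main text.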
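The dual program is a quadratic problem with linear constraints, so its first order conditions form a linear system that can be solved directly. The following sketch, with made-up inputs (`b`, `Sigma`, wealth `w` and target `E_p` are all hypothetical), solves that system and compares the multipliers and the minimized variance with the closed forms $\nu_1/(2w) = (\gamma\mu_p - \beta)/(\alpha\gamma - \beta^2)$, $\nu_2/(2w) = (\alpha - \beta\mu_p)/(\alpha\gamma - \beta^2)$ and Eq. (1.10).

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(n):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [x - f * y for x, y in zip(M[k], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

# Hypothetical inputs.
b = [1.05, 1.08, 1.12]
Sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
w, E_p = 2.0, 2.2            # initial wealth and target expected terminal wealth
m = len(b)

# First order conditions: 2 Sigma pi - nu1 b - nu2 1 = 0, pi'b = E_p, pi'1 = w.
K = [[2 * Sigma[i][j] for j in range(m)] + [-b[i], -1.0] for i in range(m)]
K.append(list(b) + [0.0, 0.0])
K.append([1.0] * m + [0.0, 0.0])
sol = solve(K, [0.0] * m + [E_p, w])
pi, nu1, nu2 = sol[:m], sol[m], sol[m + 1]

# Closed-form counterparts.
ones = [1.0] * m
Sb, S1 = solve(Sigma, b), solve(Sigma, ones)
alpha, beta, gamma = dot(b, Sb), dot(ones, Sb), dot(ones, S1)
D = alpha * gamma - beta ** 2
mu_p = E_p / w
variance = dot(pi, [dot(row, pi) for row in Sigma])
print(nu1 / (2 * w), (gamma * mu_p - beta) / D)
print(variance / w ** 2, (1 + (gamma * mu_p - beta) ** 2 / D) / gamma)
```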


To check that the second-order conditions apply, consider the bordered Hessian, written here in block form,
\[ \begin{pmatrix}
   -2\nu_1 \Sigma & 2\Sigma\hat{\pi} & 1_m \\
   2\hat{\pi}^\top \Sigma & 0 & 0 \\
   1_m^\top & 0 & 0
   \end{pmatrix}. \]
It is negative (semi) definite whenever the leading principal minors (formed through the last $k$ columns
and corresponding rows, for $k = 4, \dots, m + 2$) have determinants with signs that alternate, with the
first one (formed with the last 4 rows and corresponding columns) having the sign of $(-1)^2 = +1$.
This is possible whenever $\nu_1 > 0$, which is true, by Eq. (1A.6), whenever $\mu_p(v_p) > \beta/\gamma$.


1.8 Appendix 2: The market portfolio


1.8.1 The tangent portfolio is the market portfolio
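Let us define the market capitalization for any asset $i$ as the value of all the assets that are outstanding
in the market, viz
\[ \mathrm{Cap}_i \equiv \bar{\theta}_i S_i, \qquad i = 1, \dots, m, \]
where $\bar{\theta}_i$ is the number of assets outstanding in the market and $S_i$ is the asset price. The market capitalization of all the
assets is simply
\[ \mathrm{Cap}_M \equiv \sum_{i=1}^{m} \mathrm{Cap}_i. \]
The market portfolio, then, is the portfolio with relative weights given by,
\[ \bar{\pi}_{M,i} \equiv \frac{\mathrm{Cap}_i}{\mathrm{Cap}_M}, \qquad i = 1, \dots, m. \]
Next, suppose there are $N$ investors and that each investor $j$ has wealth $w_j$, which he invests in two
funds, a safe asset and the tangent portfolio. Let $w_j^f$ be the wealth investor $j$ invests in the safe asset
and $w_j - w_j^f$ the remaining wealth the investor invests in the tangent portfolio. The tangent portfolio
is defined as $\bar{\pi}_T \equiv \pi_T / w$, for some $\pi_T$ solution to [1.P2], and is obviously independent of $w$ (see Eq.
(1.16) in the main text). The equilibrium in the stock market requires that
\[ \mathrm{Cap}_i = \sum_{j=1}^{N} (w_j - w_j^f)\,\bar{\pi}_{T,i} = \sum_{j=1}^{N} w_j\,\bar{\pi}_{T,i} = \mathrm{Cap}_M\,\bar{\pi}_{T,i}, \]
where the second equality follows because the safe asset is in zero net supply and, hence, $\sum_{j=1}^{N} w_j^f = 0$;
and the third equality holds because all the wealth in the economy is invested in stocks, in equilibrium.
Hence $\bar{\pi}_{T,i} = \mathrm{Cap}_i / \mathrm{Cap}_M = \bar{\pi}_{M,i}$: the tangent portfolio is the market portfolio.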

1.8.2 Tangency condition


We check that the CML and the efficient portfolio frontier have the same slope in correspondence
of the market portfolio. Let us impose the following tangency condition of the CML to the efficient
portfolio frontier in Figure 1.2, at the point $M$:
\[ \mathrm{Sh} = \frac{(\alpha\gamma - \beta^2)\, v_M}{\gamma\mu_M - \beta}. \qquad \text{(1A.10)} \]
The left hand side of this equation is the slope of the CML, obtained through Eq. (1.6). The right hand
side is the slope of the efficient portfolio frontier, obtained by differentiating $\mu_p(v_p)$ in the expression
for the portfolio frontier in Eq. (1.11), and setting $v_p = v_M$ in
\[ \frac{d\mu_p(v_p)}{dv_p} = (\gamma v_p^2 - 1)^{-1/2}\,\sqrt{\alpha\gamma - \beta^2}\; v_p = \frac{(\alpha\gamma - \beta^2)\, v_p}{\gamma\mu_p(v_p) - \beta}, \]
where the second equality follows, again, by Eq. (1.11). By Eqs. (1A.10) and (1.15), we need to
show that,
\[ \gamma\mu_M - \beta = \frac{(\alpha\gamma - \beta^2)\, v_M}{\mathrm{Sh}}. \]
By plugging $\mu_M = r + \mathrm{Sh}\cdot v_M$ into the previous equality and rearranging terms,
\[ v_M = \frac{\mathrm{Sh}}{\beta - \gamma r}, \]
where we have made use of the equality $\mathrm{Sh}^2 = \alpha - 2\beta r + \gamma r^2$, obtained by elaborating on the definition
of the Sharpe market performance Sh given in Eq. (1.4). This is indeed the expression for the volatility of the market
portfolio given in Eq. (1.15).
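The tangency condition can also be verified numerically. The sketch below assumes two risky assets and a safe gross return `r` (all numbers hypothetical), takes $\mathrm{Sh}^2 = \alpha - 2\beta r + \gamma r^2$ and $v_M = \mathrm{Sh}/(\beta - \gamma r)$ as given, and checks that the slope of the frontier at $v_M$ coincides with the Sharpe ratio.

```python
import math

# Hypothetical inputs: two risky assets and a safe gross return r.
b = [1.08, 1.12]
Sigma = [[0.04, 0.01],
         [0.01, 0.09]]
r = 1.02

# Explicit inverse of the 2x2 covariance matrix.
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
inv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
       [-Sigma[1][0] / det, Sigma[0][0] / det]]

Sb = [inv[i][0] * b[0] + inv[i][1] * b[1] for i in range(2)]   # Sigma^{-1} b
S1 = [inv[i][0] + inv[i][1] for i in range(2)]                 # Sigma^{-1} 1
alpha = b[0] * Sb[0] + b[1] * Sb[1]
beta = Sb[0] + Sb[1]
gamma = S1[0] + S1[1]
D = alpha * gamma - beta ** 2

Sh = math.sqrt(alpha - 2 * beta * r + gamma * r ** 2)   # Sharpe ratio
v_M = Sh / (beta - gamma * r)                           # market volatility
mu_M = r + Sh * v_M                                     # point on the CML

slope_frontier = D * v_M / (gamma * mu_M - beta)        # right side of (1A.10)
var_frontier = (1 + (gamma * mu_M - beta) ** 2 / D) / gamma   # Eq. (1.10) at mu_M
print(Sh, slope_frontier, v_M ** 2, var_frontier)
```

The last two printed numbers coincide as well, which confirms that the point $(v_M, \mu_M)$ lies on the frontier.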


1.9 Appendix 3: An alternative derivation of the SML


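The vector of covariances of the $m$ asset returns with the market portfolio is:
\[ cov(\tilde{R}, \tilde{R}_M) = \Sigma\,\bar{\pi}_M = \frac{1}{\beta - \gamma r}\,(b - 1_m r), \qquad \text{(1A.11)} \]
where we have used the expression for the market portfolio given in Eq. (1.16). Next, premultiply the
previous equation by $\bar{\pi}_M^\top$ to obtain:
\[ v_M^2 = \bar{\pi}_M^\top \Sigma\,\bar{\pi}_M = \frac{1}{\beta - \gamma r}\,\bar{\pi}_M^\top (b - 1_m r) = \frac{\mathrm{Sh}^2}{(\beta - \gamma r)^2}, \qquad \text{(1A.12)} \]
or $v_M = \mathrm{Sh}/(\beta - \gamma r)$, which confirms Eq. (1.15).
Let us rewrite Eq. (1A.11) component by component. That is, for $i = 1, \dots, m$,
\[ \sigma_{iM} \equiv cov(\tilde{R}_i, \tilde{R}_M) = \frac{1}{\beta - \gamma r}\,(b_i - r) = \frac{v_M}{\mathrm{Sh}}\,(b_i - r) = \frac{v_M^2}{\mu_M - r}\,(b_i - r), \]
where the last two equalities follow by Eq. (1A.12) and by the relation $\mathrm{Sh} = (\mu_M - r)/v_M$. By rearranging
terms, we obtain Eq. (1.20).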
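As a numerical illustration, the sketch below builds the market portfolio as $\bar{\pi}_M = \Sigma^{-1}(b - 1_m r)/(\beta - \gamma r)$ for two hypothetical risky assets, and checks the covariance formula and the resulting beta pricing relation, $b_i - r = \frac{\sigma_{iM}}{v_M^2}(\mu_M - r)$. We assume this is the content of Eq. (1.20), which is not reproduced in this appendix.

```python
# Hypothetical inputs: two risky assets and a safe gross return r.
b = [1.08, 1.12]
Sigma = [[0.04, 0.01],
         [0.01, 0.09]]
r = 1.02

det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
inv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
       [-Sigma[1][0] / det, Sigma[0][0] / det]]

excess = [b[0] - r, b[1] - r]
Se = [inv[i][0] * excess[0] + inv[i][1] * excess[1] for i in range(2)]
scale = Se[0] + Se[1]                      # equals beta - gamma * r
pi_M = [x / scale for x in Se]             # market portfolio weights

cov_M = [Sigma[i][0] * pi_M[0] + Sigma[i][1] * pi_M[1] for i in range(2)]
v2_M = pi_M[0] * cov_M[0] + pi_M[1] * cov_M[1]      # variance of the market
mu_M = pi_M[0] * b[0] + pi_M[1] * b[1]              # expected market return
betas = [c / v2_M for c in cov_M]
sml = [r + bt * (mu_M - r) for bt in betas]         # returns implied by the SML
print(pi_M, betas, sml)
```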


1.10 Appendix 4: Demand for money and liquidity traps


We provide a parametric example mentioned in the main text, in which agents operate dichotomic
choices: they either hold money or bonds according to their expectations of future interest rates.
Consider a perpetuity with coupons fixed and equal to 1, priced as $P_0 = \sum_{t=1}^{\infty} (1 + r_0)^{-t} = r_0^{-1}$,
where $r_0$ is the current nominal rate, assumed to be flat over all maturities. A crucial assumption is
that each agent $i$ expects that within a certain reference period, $r_0$ will converge to a "normal" rate,
say $r_e(i) \equiv P_e^{-1}(i)$, with probability one, for all maturities. From the perspective of this agent, the
capital gain from holding the perpetuity over this period is,
\[ g_i(r_0) \equiv \frac{P_e(i) - P_0 + 1}{P_0} = r_0\,\frac{1 + r_e(i)}{r_e(i)} - 1. \]
It is easy to see that there exists a value of $r_0$ for each $i$, such that $g_i(r_0) = 0$, given by,
\[ r_0^c(i) \equiv \frac{r_e(i)}{1 + r_e(i)}. \]
It is a critical rate in that agent $i$ would only invest in bonds if $r_0 > r_0^c(i)$, would only demand
cash if $r_0 < r_0^c(i)$ and, finally, would be indifferent about the two choices if $r_0 = r_0^c(i)$. Next, define
$\underline{r} \equiv \min_i r_0^c(i) = r_0^c(i^*)$. For small $r_e(i^*)$,
\[ \underline{r} = r_0^c(i^*) = \frac{r_e(i^*)}{1 + r_e(i^*)} \approx r_e(i^*). \]
Given this approximation, when $r_0 = \underline{r}$, every agent believes rates can only rise within the reference
period, such that no one is willing to purchase any bond, as this purchase would lead to a sure loss.
This situation is known as a liquidity trap: when $r_0 = \underline{r}$, changes in money supply, be they positive or
negative, do not affect interest rates. Indeed, at $r_0 = \underline{r}$, the only investor holding bonds is simply the
marginal investor $i^*$, who is indifferent about whether to hold money or bonds. If the central bank
increases money supply by purchasing the bonds, this marginal investor would be perfectly ready to
accept this new money and tender the bonds, as he is obviously indifferent between investing in bonds
or hoarding money. Likewise, if the central bank decreases money supply through a bonds sale, the
marginal investor would buy these bonds.
Yet an important point of Keynesian theory is that money demand is negatively sloped, at a
macroeconomic level. We now develop an analytical example where this property holds true. Assume
there is a continuum of agents on $[0, 1]$, ordered such that the distribution of $r_e(i)$ is uniform:
\[ r_e(i) = a + (b - a)\, i, \qquad i \in [0, 1], \]
for some two constants $a$ and $b$, with $b > a$. Then,
\[ r_0^c(i) = \frac{a + (b - a)\, i}{1 + a + (b - a)\, i}. \]
The $i$-th agent choice relating to money demand, $m(i)$ say, is dichotomic, in that:
\[ m(i) = \begin{cases} 1 & \text{if } r_0 \le r_0^c(i) \\ 0 & \text{otherwise} \end{cases} \]
Yet aggregate money demand is, after denoting $i^*(r_0) \equiv \frac{(1 + a)\, r_0 - a}{(b - a)(1 - r_0)}$,
\[ M(r_0) = \int_0^1 m(i)\, di = \int_0^1 \mathbb{I}_{\{i:\, r_0^c(i) \ge r_0\}}\, di = 1 - i^*(r_0) = \frac{b - (1 + b)\, r_0}{(b - a)(1 - r_0)}, \]
where $\mathbb{I}$ is the indicator function. Note that $M$ is always positive, and decreasing in $r_0$, provided
$\frac{a}{1+a} \le r_0 < \frac{b}{1+b}$. The
interest rate relating to the liquidity trap is $\frac{a}{1+a}$. The purpose of Section 1.2.6 is to explain how Tobin
(1958) coped with this degeneracy of interest rate expectations.
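The example can be explored numerically. The sketch below is an illustration under assumptions of ours: expected normal rates are uniform between two hypothetical constants `a` and `b`, the return from holding the unit-coupon perpetuity is taken to be $(P_e - P_0 + 1)/P_0$ with prices equal to the reciprocal of the rate, and the function names are invented for the occasion.

```python
def critical_rate(r_e):
    """Current rate at which holding the perpetuity yields a zero return."""
    return r_e / (1.0 + r_e)

def holding_return(r0, r_e):
    """(P_e - P_0 + 1) / P_0 for a unit-coupon perpetuity priced at 1 / rate."""
    p0, pe = 1.0 / r0, 1.0 / r_e
    return (pe - p0 + 1.0) / p0

def money_demand(r0, a, b):
    """Mass of agents holding money when r_e(i) = a + (b - a) i, i in [0, 1]."""
    i_star = ((1.0 + a) * r0 - a) / ((b - a) * (1.0 - r0))
    return 1.0 - min(max(i_star, 0.0), 1.0)

a, b = 0.02, 0.10                      # hypothetical bounds on expected rates
trap_rate = critical_rate(a)           # everybody holds money at this rate
for r0 in (trap_rate, 0.03, 0.05, 0.07, critical_rate(b)):
    print(round(r0, 4), round(money_demand(r0, a, b), 4))
```

Money demand equals one at the liquidity-trap rate, falls as the current rate rises, and vanishes once the rate reaches the critical rate of the most bearish agent.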


References
Banz, R.W. (1981): "The Relationship Between Return and Market Value of Common Stocks." Journal of Financial Economics 9, 3-18.
Black, F. (1972): "Capital Market Equilibrium with Restricted Borrowing." Journal of Business 45, 444-454.
Campbell, J.Y. and L.M. Viceira (2002): Strategic Asset Allocation. Oxford: Oxford University Press.
Carhart, M. (1997): "On Persistence of Mutual Fund Performance." Journal of Finance 52, 57-82.
Chen, N-F., R. Roll and S.A. Ross (1986): "Economic Forces and the Stock Market." Journal of Business 59, 383-403.
Connor, G. (1984): "A Unified Beta Pricing Theory." Journal of Economic Theory 34, 13-31.
Fama, E.F. and J.D. MacBeth (1973): "Risk, Return, and Equilibrium: Empirical Tests." Journal of Political Economy 81, 607-636.
Fama, E.F. and K.R. French (1992): "The Cross-Section of Expected Stock Returns." Journal of Finance 47, 427-465.
Fama, E.F. and K.R. French (1993): "Common Risk Factors in the Returns on Stocks and Bonds." Journal of Financial Economics 33, 3-56.
Frazzini, A., D. Kabiller and L. Pedersen (2012): "Buffett's Alpha." Working paper.
Frazzini, A. and L. Pedersen (2012): "Betting Against Beta." Working paper.
Huang, C-f. and R.H. Litzenberger (1988): Foundations for Financial Economics. New York: North-Holland.
Huberman, G. (1983): "A Simplified Approach to Arbitrage Pricing Theory." Journal of Economic Theory 28, 183-191.
Jegadeesh, N. and S. Titman (1993): "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency." Journal of Finance 48, 65-91.
Keynes, J.M. (1936): The General Theory of Employment, Interest and Money. London: Palgrave Macmillan.
Markowitz, H. (1952): "Portfolio Selection." Journal of Finance 7, 77-91.
Roll, R. (1977): "A Critique of the Asset Pricing Theory's Tests Part I: On Past and Potential Testability of the Theory." Journal of Financial Economics 4, 129-176.
Rosenberg, B., K. Reid and R. Lanstein (1985): "Persuasive Evidence of Market Inefficiency." Journal of Portfolio Management 11, 9-17.

Ross, S. (1976): "Arbitrage Theory of Capital Asset Pricing." Journal of Economic Theory 13, 341-360.
Rothschild, M. and J. Stiglitz (1970): "Increasing Risk: I. A Definition." Journal of Economic Theory 2, 225-243.
Rothschild, M. and J. Stiglitz (1971): "Increasing Risk: II. Its Economic Consequences." Journal of Economic Theory 3, 66-84.
Sharpe, W.F. (1964): "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." Journal of Finance 19, 425-442.
Stattman, D. (1980): "Book Values and Stock Returns." The Chicago MBA: A Journal of Selected Papers 4, 25-45.
Tobin, J. (1958): "Liquidity Preference as Behavior Towards Risk." Review of Economic Studies 25, 65-86.


2 Arbitrage, equilibrium and pricing

2.1 Introduction
This chapter develops asset pricing implications arising whilst requiring that markets are free
from arbitrage opportunities. An important distinction between financial securities and gambles
rests on the way we price them. Typically, but not exclusively, gambles regard risks that are
not traded. Their value is determined by supply and demand, resting then on factors such as
risk-aversion or the bargaining power of the parties involved in them. A few gambles, such as
some we see in a casino, might be repeated, so to speak. For these gambles, the Law of Large
Numbers might be a rough guidance to what we should expect their value to be, although
additional factors determining supply and demand still play a critical role, as we shall explain.
Financial securities work in a radically different fashion. The principle to price them relies
on absence of arbitrage. Suppose there is a security which could be replicated by a portfolio
of existing securities. In the absence of any frictions, the price of this security equals the value of
the replicating portfolio, regardless of supply and demand, or any Law of Large Numbers:
it is "preference-free," as we say. The reason is that two portfolios delivering the same payoff
must be worth the same, for otherwise an arbitrage opportunity would arise, that is, the possibility to
implement financial transactions and make money without any risk. Granted, there certainly
are securities which are not so easy to replicate, as they might link to new risks, compared to
the existing securities. However, it might be that in practice, and for all purposes, the existing
assets can be bunched into a portfolio that mimics this security sufficiently well, such that we
can design worst-case and best-case scenarios for the value of the security to evaluate. Naturally,
an alternative to all this might be financial innovation, the process of creation of new securities
that have the potential to fill in the initial incomplete market structure.
This chapter aims to formalize these ideas, and relies on the very definition and role of financial
securities in a world with uncertainty. We start from afar, and review the classical
general equilibrium model in a context without uncertainty. This model has profound implications,
leading to idealized outcomes for the society as a whole, where allocations are optimal,
according to Pareto's criterion. Financial securities play a critical role, once we plug uncertainty
into this model. A sufficiently high number of these securities might actually lead to the same
idealized outcomes predicted by the classic general equilibrium model. Intuitively, a high number
of sufficiently diverse financial securities has the potential to deliver the payoffs we need
in each future contingency of the world, thereby making markets function as if we were in a
static world.
The purest kind of security we could imagine is one that only pays off a fixed amount of
numeraire, say $1, in a pre-specified state of the world, and zero otherwise. These securities
are known as Arrow-Debreu, in recognition of the founders of general equilibrium theory with
uncertainty, as explained in detail in Section 2.3. Arrow-Debreu securities are conceptually very
useful, as the knowledge of their prices can be utilized to price any other asset. Not surprisingly,
then, these idealized securities are also a useful tool in the market practice of asset pricing, as
explained in Part III of these Lectures. They link to what we usually term the "risk-neutral
probability," similarly as in the previous chapter, as we shall explain.
Needless to mention, many of the previous optimistic conclusions rely on a number of assumptions,
such as agents' symmetric information about the assets' payoffs, perfect competition in
the goods markets, the presence of frictionless capital markets or, finally, market completeness,
that is, the circumstance that there is an Arrow-Debreu security for each state of the world. We shall
deal with capital market imperfections in Chapter 4, and with information problems in Chapter
9, of these Lectures. These imperfections will allow us to think about quite interesting aspects
of modern markets and economies in Part II of the Lectures. However, already in this chapter, we
shall develop an introduction to the theory and methods arising in the context of incomplete
markets.
In fact, even in the presence of incomplete markets, there exist shadow prices for any Arrow-Debreu
security. In principle, we could use these prices to evaluate any asset. The challenging
issue is that in an incomplete markets setting, requiring absence of arbitrage does not lead
to a unique set of Arrow-Debreu prices, as it does in the complete markets case. Further
assumptions are needed to help further characterize these shadow prices. For example, we may
naturally imagine a market economy, with agents optimizing over consumption choices, and pin
down these prices in the general equilibrium of this economy.
It is indeed an important task of this chapter to link Arrow-Debreu security prices to optimal
consumption in general equilibrium. As financial economists, we would naturally
like to understand asset prices from the perspective of an economy where households
allocate their endowments across consumption and savings. Our households' objective is to maximize
utility of consumption, in an intertemporal context subject to uncertainty. This chapter
only considers two-period economies, but many of its insights lead to asset pricing equations,
which are the logical antecedents to the Euler equations arising in the multiperiod, and possibly
infinite horizon, economies dealt with in Chapter 3 (the Consumption-CAPM).
The chapter is organized as follows. The next section contains a succinct description of the
static general equilibrium model and its properties, and abstracts from
decisions taken within the production sphere of the economy. (Production-based economies are
studied in more detail in Chapters 3 and 8 of these Lectures.) Section 2.3 illustrates the role
financial securities can play in economies with uncertainty, and gives the very first examples of the
meaning and use of Arrow-Debreu securities, and their relations to the risk-neutral probability.
Sections 2.4 and 2.5 provide theory, based on the extension of the general equilibrium model
of Section 2.2, and include uncertainty. The focus in Section 2.4 is absence of arbitrage, and
the implications of this assumption on asset prices, in both complete and incomplete markets.
Naturally, absence of arbitrage does not imply equilibrium, although the converse is true, as
explained in Section 2.5. Section 2.5 then relates Arrow-Debreu security prices to risk-neutral
probabilities, in both complete and incomplete markets, and provides discussion on topics such
as the role of financial markets as vehicles of risk-sharing for a society, or financial innovation.
Section 2.6 provides a very first introduction to the theory of the Consumption-CAPM as well
as its predictions on the equity premium. Section 2.7 provides a framework to think about
budget constraints in infinite horizon markets. Finally, Section 2.8 develops a few more topics
about the theme of incomplete markets, and the appendixes contain material omitted from the
main text.

2.2 The static general equilibrium in a nutshell


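We consider an economy with $n$ agents and $m$ commodities. Let $w_{ij}$ denote the amount of the
$i$-th commodity the $j$-th agent is endowed with, and let $w^j = [w_{1j}, \dots, w_{mj}]$. Let the price
vector be $p = [p_1, \dots, p_m]$, where $p_i$ is the price of the $i$-th commodity. Let $w_i = \sum_{j=1}^{n} w_{ij}$
be the total endowment of the $i$-th commodity in the economy, and $w = [w_1, \dots, w_m]$ the
corresponding endowments bundle in the economy.
The $j$-th agent has utility function $u_j(c_{1j}, \dots, c_{mj})$, where $(c_{ij})_{i=1}^{m}$ denotes his consumption
bundle. We assume the following standard conditions for the utility functions $u_j$:
Assumption 2.1 (Preferences). The utility functions $u_j$ satisfy the following properties:
(i) Monotonicity; (ii) Continuity; and (iii) Quasi-concavity: $u_j(x) \ge u_j(y)$, and $\forall \alpha \in (0, 1)$,
$u_j(\alpha x + (1 - \alpha) y) > u_j(y)$ or, $\frac{\partial u_j}{\partial c_{ij}}(c_{1j}, \dots, c_{mj}) \ge 0$ and $\frac{\partial^2 u_j}{\partial c_{ij}^2}(c_{1j}, \dots, c_{mj}) \le 0$.

Let $B_j(p_1, \dots, p_m) = \{(c_{1j}, \dots, c_{mj}) : \sum_{i=1}^{m} p_i c_{ij} \le \sum_{i=1}^{m} p_i w_{ij}\}$, a bounded, closed and
convex set, hence a compact set. Each agent maximizes his utility function subject to the budget
constraint:
\[ \max_{\{c_{ij}\}}\ u_j(c_{1j}, \dots, c_{mj}) \quad \text{subject to} \quad (c_{1j}, \dots, c_{mj}) \in B_j(p_1, \dots, p_m). \qquad \text{[2.P1]} \]
This problem certainly has a solution, for $B_j$ is a compact set and, by Assumption 2.1, $u_j$ is
continuous, and a continuous function attains its maximum on a compact set. Moreover, the
Appendix shows that this maximum is unique.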
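The first order conditions to [2.P1] are, for each agent $j$,
\[ \frac{\partial u_j / \partial c_{1j}}{p_1} = \frac{\partial u_j / \partial c_{2j}}{p_2} = \dots = \frac{\partial u_j / \partial c_{mj}}{p_m}, \qquad
   \sum_{i=1}^{m} p_i c_{ij} = \sum_{i=1}^{m} p_i w_{ij}. \qquad \text{(2.1)} \]
These conditions form a system of $m$ equations with $m$ unknowns. Let us denote the solution
to this system with $[\hat{c}_{1j}(p, w^j), \dots, \hat{c}_{mj}(p, w^j)]$. The total demand for the $i$-th commodity is,
\[ \hat{c}_i(p, w) = \sum_{j=1}^{n} \hat{c}_{ij}(p, w^j), \qquad i = 1, \dots, m. \]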

We emphasize that the economy we consider in this chapter is one that completely abstracts from
production. Here, prices are the key determinants of how resources are allocated in the end. The
perspective is, of course, radically different from that taken by the Classical school (Ricardo,
Marx and Sraffa), for which prices and resources allocation cannot be disentangled from the
production side of the economy. In the next chapter and more advanced parts of the lectures,
we consider the asset pricing implications of production, following the Neoclassical perspective.
2.2.1 Walras' law
Let us plug the demand functions of the $j$-th agent into the constraint of [2.P1], to obtain,
\[ \forall p, \quad 0 = \sum_{i=1}^{m} p_i\,(\hat{c}_{ij}(p, w^j) - w_{ij}). \qquad \text{(2.2)} \]
Next, define the total excess demand for the $i$-th commodity as $e_i(p, w) \equiv \hat{c}_i(p, w) - w_i$. By
aggregating the budget constraint across all the agents,
\[ \forall p, \quad 0 = \sum_{j=1}^{n} \sum_{i=1}^{m} p_i\,(\hat{c}_{ij}(p, w^j) - w_{ij}) = \sum_{i=1}^{m} p_i\, e_i(p, w). \]
The previous equality is the celebrated Walras' law.
Next, multiply $p$ by $\lambda \in \mathbb{R}_{++}$. Since the constraint to [2.P1] does not change, the excess
demand functions are the same, for each value of $\lambda$. In other words, the excess demand functions
are homogeneous of degree zero in the prices, or $e_i(\lambda p, w) = e_i(p, w)$, $i = 1, \dots, m$. This property
of the excess demand functions is also referred to as absence of monetary illusion.
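Both properties are easy to see in a small exchange economy. The sketch below assumes two agents with Cobb-Douglas utilities over two goods; the expenditure shares, the endowments and the price vector are made-up numbers. It evaluates Walras' law at an arbitrary, non-equilibrium price vector and checks that scaling all prices leaves excess demands unchanged.

```python
# Two agents, two goods, u_j = a_j ln c_1j + (1 - a_j) ln c_2j (hypothetical data).
shares = [0.3, 0.6]                    # expenditure share on good 1, by agent
endow = [(1.0, 2.0), (3.0, 1.0)]       # endowments (w_1j, w_2j), by agent

def demand(p, a, w):
    """Cobb-Douglas demand of an agent with share a and endowment w."""
    income = p[0] * w[0] + p[1] * w[1]
    return (a * income / p[0], (1 - a) * income / p[1])

def excess_demand(p):
    """Aggregate excess demand for each good at prices p."""
    e = [0.0, 0.0]
    for a, w in zip(shares, endow):
        c = demand(p, a, w)
        e[0] += c[0] - w[0]
        e[1] += c[1] - w[1]
    return e

p = (2.0, 5.0)                                   # an arbitrary price vector
e = excess_demand(p)
walras = p[0] * e[0] + p[1] * e[1]               # value of aggregate excess demand
e_scaled = excess_demand((3 * p[0], 3 * p[1]))   # all prices multiplied by 3
print(e, walras, e_scaled)
```

Excess demands are not zero at these prices, yet their value is, which is exactly what Walras' law states.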
2.2.2 Competitive equilibrium
A competitive equilibrium is a vector $\bar{p}$ in $\mathbb{R}^m_+$ such that $e_i(\bar{p}, w) \le 0$ for all $i = 1, \dots, m$, with at
least one component of $\bar{p}$ being strictly positive. Furthermore, if there exists a $j$: $e_j(\bar{p}, w) < 0$,
then $\bar{p}_j = 0$.
2.2.2.1 Back to Walras' law

Walras' law holds by the mere aggregation of the agents' constraints. But the agents' constraints
are accounting identities. In particular, Walras' law holds for any price vector and, a fortiori,
it holds for the equilibrium price vector,
\[ 0 = \sum_{i=1}^{m} \bar{p}_i\, e_i(\bar{p}, w) = \sum_{i=1}^{m-1} \bar{p}_i\, e_i(\bar{p}, w) + \bar{p}_m\, e_m(\bar{p}, w). \qquad \text{(2.3)} \]
Now suppose that the first $m - 1$ markets are in equilibrium, or $e_i(\bar{p}, w) \le 0$, for $i = 1, \dots, m - 1$.
By the definition of an equilibrium, we have that $\mathrm{sign}(e_i(\bar{p}, w))\, \bar{p}_i = 0$. Therefore, by Eq. (2.3),
we conclude that if $m - 1$ markets are in equilibrium, then, the remaining market is also in
equilibrium.
2.2.2.2 The notion of numeraire

The excess demand functions are homogeneous of degree zero. Walras' law implies that if $m - 1$
markets are in equilibrium, then, the $m$-th remaining market is also in equilibrium. We wish
to link these two results. A first remark is that by Walras' law, the equations that define a
competitive equilibrium are not independent. Once $m - 1$ of these equations are satisfied, the
$m$-th remaining equation is also satisfied. In other words, there are $m - 1$ independent relations
and $m$ unknowns in the equations that define a competitive equilibrium. So, there exists an
infinity of solutions.
Suppose, then, that we choose the $m$-th price to be a sort of exogenous datum. The result
is that we obtain a system of $m - 1$ equations with $m - 1$ unknowns. Provided it exists, such
a solution is a function $f$ of the $m$-th price, $\bar{p}_i = f_i(p_m)$, $i = 1, \dots, m - 1$. Then, we may
refer to the $m$-th commodity as the numeraire. In other words, general equilibrium can only
determine a structure of relative prices. The scale of these relative prices depends on the price
level of the numeraire. It is easily checked that if the functions $f_i$ are homogeneous of degree
one, multiplying $p_m$ by a strictly positive number $\lambda$ does not change the relative price structure.
Indeed, by the equilibrium condition, for all $i = 1, \dots, m$,
\[ 0 \ge e_i(\bar{p}_1, \dots, \bar{p}_{m-1}, p_m, w) = e_i(f_1(p_m), \dots, f_{m-1}(p_m), p_m, w) \]
\[ = e_i(\lambda f_1(p_m), \dots, \lambda f_{m-1}(p_m), \lambda p_m, w) = e_i(f_1(\lambda p_m), \dots, f_{m-1}(\lambda p_m), \lambda p_m, w), \]
where the second equality holds because the excess demand functions are homogeneous of degree zero, and the
last equality is due to the homogeneity property of the functions $f_i$. In
particular, by defining relative prices as $\hat{p}_i = p_i / p_m$, one has that $p_i = \hat{p}_i \cdot p_m$ is a function
that is homogeneous of degree one. In other words, if $\lambda \equiv p_m^{-1}$, then,
\[ 0 \ge e_i(\bar{p}_1, \dots, \bar{p}_{m-1}, p_m, w) = e_i\Big(\frac{\bar{p}_1}{p_m}, \dots, \frac{\bar{p}_{m-1}}{p_m}, 1, w\Big). \]
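The role of the numeraire can be illustrated in a two-good Cobb-Douglas exchange economy, where the equilibrium price of good 1 is available in closed form. The sketch below uses made-up shares and endowments, and the function `f1` is our name for the map from the numeraire price to the equilibrium price of good 1; it checks that markets clear for any level of the numeraire price, and that only the relative price is pinned down.

```python
# Two agents, two goods, u_j = a_j ln c_1j + (1 - a_j) ln c_2j (hypothetical data).
shares = [0.3, 0.6]
endow = [(1.0, 2.0), (3.0, 1.0)]

def excess_demand(p):
    """Aggregate excess demand for each good at prices p = (p1, p2)."""
    e = [0.0, 0.0]
    for a, w in zip(shares, endow):
        income = p[0] * w[0] + p[1] * w[1]
        e[0] += a * income / p[0] - w[0]
        e[1] += (1 - a) * income / p[1] - w[1]
    return e

def f1(p2):
    """Equilibrium price of good 1, given the price p2 of the numeraire."""
    num = sum(a * w[1] for a, w in zip(shares, endow))
    den = sum((1 - a) * w[0] for a, w in zip(shares, endow))
    return p2 * num / den

for p2 in (1.0, 2.5):
    p = (f1(p2), p2)
    print(p, excess_demand(p), p[0] / p[1])
```

The function `f1` is homogeneous of degree one in the numeraire price, so the printed relative price is the same in both cases.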

2.2.3 Optimality
Let $c^j = (c_{1j}, \dots, c_{mj})$ be the allocation to agent $j$, $j = 1, \dots, n$. The following definition is
the well-known concept of a desirable resource allocation within a society, according to Pareto.
Definition 2.2 (Pareto optimum). An allocation $\bar{c} = (\bar{c}^1, \dots, \bar{c}^n)$ is a Pareto optimum if it
is feasible, $\sum_{j=1}^{n} (\bar{c}^j - w^j) \le 0$, and if there are no other feasible allocations $c = (c^1, \dots, c^n)$
such that $u_j(c^j) \ge u_j(\bar{c}^j)$, $j = 1, \dots, n$, with one strict inequality for at least one agent.

We have the following fundamental result:

Theorem 2.3 (First welfare theorem). Every competitive equilibrium is a Pareto optimum.
Proof. Let us suppose on the contrary that $\bar{c}$ is an equilibrium but not a Pareto optimum.
Then, there exists a feasible $c$ such that $u_j(c^j) \ge u_j(\bar{c}^j)$ for all $j$, with $u_{j^*}(c^{j^*}) > u_{j^*}(\bar{c}^{j^*})$ for some $j^*$. Because $\bar{c}^{j^*}$ is optimal for agent $j^*$,
$c^{j^*} \notin B_{j^*}(\bar{p})$, or $\bar{p}\, c^{j^*} > \bar{p}\, w^{j^*}$, while, by monotonicity, $\bar{p}\, c^{j} \ge \bar{p}\, w^{j}$ for every other agent. By aggregating:
$\bar{p} \sum_{j=1}^{n} c^j > \bar{p} \sum_{j=1}^{n} w^j$, which is unfeasible. It
follows that no such allocation $c$ can exist, a contradiction. ∎
Next, we show that any Pareto optimal allocation can be decentralized. That is, corresponding
to a given Pareto optimum $\bar{c}$, there exist ways of redistributing endowments around,
and a price vector $\bar{p}$: $\bar{p}\bar{c} = \bar{p} w$, which is an equilibrium for the initial set of resources.
Theorem 2.4 (Second welfare theorem). Every Pareto optimum can be decentralized.
Proof. In the appendix.
FIGURE 2.1. Decentralizing a Pareto optimum

The previous theorem can be interpreted as one that supports an equilibrium with transfer
payments. For any given Pareto optimum $\bar{c}^j$, a social planner can always give $\bar{p} w^j$ to each
agent (with $\bar{p}\bar{c}^j = \bar{p} w^j$, where $w^j$ is chosen by the planner), and agents choose $\bar{c}^j$. Figure 2.1
illustrates such a decentralization procedure within the Edgeworth box. Suppose that the
objective is to achieve $\bar{c}$. Given an initial allocation $w$ chosen by the planner, each agent is
given $\bar{p} w^j$. Under laissez faire, $\bar{c}$ will obtain. In other words, agents are given a constraint of
the form $p c^j = \bar{p} w^j$. If $w^j$ and $\bar{p}$ are chosen so as to induce each agent to choose $\bar{c}^j$, then $\bar{p}$ is a
supporting equilibrium price. In this case, the marginal rates of substitution are identical, as
established by the following celebrated result:
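Theorem 2.5 (Characterization of Pareto optima: I). A feasible allocation $\bar{c} = (\bar{c}^1, \dots, \bar{c}^n)$
is a Pareto optimum if and only if there exists a $\phi \in \mathbb{R}^{m-1}_{++}$ such that
\[ \tilde{\nabla} u_j = \phi, \quad j = 1, \dots, n, \quad \text{where} \quad
   \tilde{\nabla} u_j \equiv \Big(\frac{\partial u_j / \partial c_{2j}}{\partial u_j / \partial c_{1j}}, \dots, \frac{\partial u_j / \partial c_{mj}}{\partial u_j / \partial c_{1j}}\Big). \qquad \text{(2.4)} \]

Proof. A Pareto optimum satisfies:
\[ \bar{c} \in \arg\max_{c \in \mathbb{R}^{mn}_+} u_1(c^1) \quad \text{subject to} \quad
   \begin{cases}
   u_j(c^j) \ge \bar{u}_j, \ j = 2, \dots, n & (\eta_j,\ j = 2, \dots, n) \\
   \sum_{j=1}^{n} (c^j - w^j) \le 0 & (\phi_i,\ i = 1, \dots, m)
   \end{cases} \]
The Lagrangian function associated with this program is
\[ L = u_1(c^1) + \sum_{j=2}^{n} \eta_j\,(u_j(c^j) - \bar{u}_j) - \sum_{i=1}^{m} \phi_i \sum_{j=1}^{n} (c_{ij} - w_{ij}), \]
and the first order conditions are
\[ \frac{\partial u_1}{\partial c_{i1}} = \phi_i, \quad i = 1, \dots, m, \]
and for $j = 2, \dots, n$,
\[ \eta_j\, \frac{\partial u_j}{\partial c_{ij}} = \phi_i, \quad i = 1, \dots, m. \]
In each of the previous two systems, we divide each equation by the first, obtaining
exactly Eq. (2.4), with $\phi = \big(\frac{\phi_2}{\phi_1}, \dots, \frac{\phi_m}{\phi_1}\big)$. The converse is straightforward. ∎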



There is a simple and appealing interpretation of the Kuhn-Tucker multipliers $\phi$ of the
constraints of Theorem 2.5. Note that by Eq. (2.1), in the competitive equilibrium,
\[ \tilde{\nabla} u_j = \Big(\frac{p_2}{p_1}, \dots, \frac{p_m}{p_1}\Big). \]
But because a competitive equilibrium is also a Pareto optimum, then, by Theorem 2.5,
\[ \tilde{\nabla} u_j = \phi. \]
Hence, $\phi$ represents the vector of relative, shadow prices arising within the centralized allocation
process.
We provide a further characterization of Pareto optimal allocations.
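Theorem 2.6 (Characterization of Pareto optima: II). A feasible allocation $\bar{c} = (\bar{c}^1, \dots, \bar{c}^n)$
is a Pareto optimum if and only if there exists $\theta > 0$ such that $\bar{c}$ is solution to the following
program:
\[ u(w, \theta) = \max_{c^1, \dots, c^n} \sum_{j=1}^{n} \theta_j\, u_j(c^j) \quad \text{subject to} \quad \sum_{j=1}^{n} c_{ij} \le w_i, \quad i = 1, \dots, m. \qquad \text{[2.P2]} \]

Proof. The if part is simple and at the same time instructive. Let us solve the program in
[2.P2]. The Lagrangian is,
\[ L = \sum_{j=1}^{n} \theta_j\, u_j(c^j) - \sum_{i=1}^{m} \phi_i \Big(\sum_{j=1}^{n} c_{ij} - w_i\Big), \]
and the first order conditions are, for $j = 1, \dots, n$,
\[ \theta_j\, \nabla u_j = \theta_j \Big(\frac{\partial u_j}{\partial c_{1j}}, \dots, \frac{\partial u_j}{\partial c_{mj}}\Big)^\top = (\phi_1, \dots, \phi_m)^\top. \qquad \text{(2.5)} \]
That is, $\tilde{\nabla} u_j$ equals the same vector of constants for all the agents, just as in Theorem 2.5. The
converse to this theorem follows by an application of the usual separating theorem, as in Duffie
(2001, Chapter 1). ∎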
Note, if $\theta_1 = 1$ and, for $j = 2, \dots, n$, $\theta_j$ equals the multiplier $\eta_j$ on agent $j$'s utility constraint in the proof of Theorem 2.5, then
the multipliers on the resource constraints of the two programs coincide ($i = 1, \dots, m$), and so the first
order conditions in Theorems 2.5 and 2.6 would lead to the same allocation. More generally, we
have:
Theorem 2.7 (Centralization of competitive equilibrium through Pareto weightings). The
outcome of any competitive equilibrium can be obtained through a central planner who maximizes
the program in [2.P2], with system of social weights equal to $\theta_j = 1/\lambda_j$, where $\lambda_j$ is the
marginal utility of income for agent $j$.
So agents with high marginal utility of income for a given price vector will receive little
social weight in the centralized planner allocation procedure. This result is particularly useful
when it comes to studying financial markets in economies with heterogeneous agents. Theorem 2.7
is also a point of reference from which to move when it comes to studying asset prices in a world
of incomplete markets. Chapter 8 contains several examples of these applications.
Proof of Theorem 2.7. In the competitive equilibrium,
\[ \nabla u_j = \lambda_j\, p, \qquad \text{(2.6)} \]
where $\lambda_j$ are the Lagrange multipliers for the agents' budget constraints, so that $\lambda_j$ is the agent's
marginal utility of income:
\[ \lambda_j = \frac{\partial u_j(\hat{c}_{1j}(p, w^j), \dots, \hat{c}_{mj}(p, w^j))}{\partial R_j}, \qquad R_j \equiv \sum_{i=1}^{m} p_i w_{ij}. \]
By comparing the competitive equilibrium solution in Eq. (2.6) with the Pareto optimality
property of the equilibrium in Eq. (2.5), we deduce that a competitive equilibrium $(c, p)$ can
be implemented by a social planner acting as in Theorem 2.6, when $\theta_j = 1/\lambda_j$. Then, it also
follows that, necessarily, $\phi = p$, by the aggregate resource constraint, $\sum_{j=1}^{n} c^j = w$, which,
intuitively, has to hold both in the competitive and the centralized economy. Indeed, we have:
\[ w = \sum_{j=1}^{n} I_j(\lambda_j\, p); \qquad w = \sum_{j=1}^{n} I_j(\theta_j^{-1} \phi), \qquad \text{(2.7)} \]
where $I_j(\lambda_j p)$ and $I_j(\theta_j^{-1}\phi)$, with $I_j \equiv (\nabla u_j)^{-1}$, are the inverse functions for consumption, as implied by the private allocation
in Eq. (2.6) and the social, in Eq. (2.5). The first of Eqs. (2.7) determines the general equilibrium
price vector, $\hat{p}$ say. The second of Eqs. (2.7) is the aggregate constraint faced by the central
planner with $\theta_j = 1/\lambda_j$, and clearly, this constraint is satisfied by a Lagrange multiplier $\hat{\phi}$, say,
which exactly matches $\hat{p}$, viz $\hat{\phi} = \hat{p}$, in which case the social and private allocations coincide by construction. Moreover, this
is the unique solution for $\phi$ as $I_j$ is monotonically decreasing. ∎

2.3 The role of financial securities in markets with uncertainty


2.3.1 Commodity markets
We can use general equilibrium theory to span a variety of fields, by just making an appropriate use of the following definition.

A commodity is characterized by its physical properties, the date and the place at which it will be available.

Gerard Debreu (1959, Chapter 2)

For example, by freezing time and physical properties, we have a theory of international commerce, and by freezing places and physical properties, we have a theory of finance. The previous definition does not include the notion of uncertainty. To cope with uncertainty, Debreu (1959, Chapter 7) extends the previous definition, highlighting that a commodity should be described
through a list of physical properties, with the structure of dates and places being replaced by an event structure. The following example illustrates the difference between two contracts underlying the delivery of corn arising under conditions of certainty (case A) and uncertainty (case B):
A The first agent will deliver 5000 tons of corn of a specified type to the second agent, who will accept the delivery at date $t$ and in place $\ell$.
B The first agent will deliver 5000 tons of corn of a specified type to the second agent, who will accept the delivery in place $\ell$ and in the event $s_t$ at time $t$. If $s_t$ does not occur at time $t$, no delivery will take place.
In both cases, the contract is paid at the time it is agreed. There are instances where we can actually use the model of the previous section to deal with contracts such as those in case B. Consider, for example, a two-period economy, and suppose that in the second period, $s_n$ mutually exhaustive and exclusive states of nature may occur. We can, then, recover the model of the previous section, once we replace $m$ (the number of commodities described by physical properties, dates and places) with $m^*$, where $m^* = s_n \cdot m$. With $m^*$ replacing $m$, the competitive equilibrium in this economy is defined as the competitive equilibrium in the economy of the previous section.
The assumption underlying this trick is that markets exist, where commodities for all states of nature are traded. Such contingent markets are complete, in that a market is open for each commodity in each state of nature. Therefore, agents can implement any feasible action plan. In particular, resource allocation is Pareto-optimal. However, the existence of contingent markets is a strong assumption. Next, we show how financial securities help mitigate this assumption.
2.3.2 Financial securities
What role could financial securities play in an uncertain world? Arrow (1953) develops the following interpretation. Rather than signing contracts for delivery of commodities that are contingent on the realization of events, agents might agree on contracts generating payoffs that are contingent on the realization of events, i.e. financial securities, or assets. The payoffs delivered by the assets in the various states of the world might then be collected to finance state-contingent consumption plans.
Let us illustrate. Consider a two-period economy. An asset A$_i$ is a contract that promises to pay a payoff $v_i(\omega_s)$ in some state $\omega_s \in S$ in the second period, where $S$ denotes the set of all possible future events. We assume there exist $a$ assets, and that markets re-open for each commodity in each state of nature in the second period. Assets and commodities are linked as follows. At time zero I purchase $\theta_i$ units of the asset A$_i$. If state of nature $\omega_s$ occurs in the second period, I will utilize the payoff $v_i(\omega_s)$ and finance net transactions on the commodity markets re-opening in the second period, viz
\[
p(\omega_s)\, e(\omega_s) = \sum_{i=1}^{a} \theta_i v_i(\omega_s),
\tag{2.8}
\]
where $p(\omega_s)$ and $e(\omega_s)$ denote the commodity price and excess demand vectors, which are both contingent on the realization of state $\omega_s$.
Thus, financial assets transfer value across states of nature. Security markets shrink this uncertain economy to one similar to that of the previous section, with one added feature: there are no such complicated markets to open at time zero for each state of nature in the future. There are, simply, commodity and asset markets in the first period, and commodity markets that re-open in the second period. This economy is more realistic than that of the previous section, although perfectly isomorphic to it once Eq. (2.8) holds true. However, Eq. (2.8) does not hold, in general. It would, should the number of assets be equal to the number of events in $S$, i.e. $a = s_n$. We say security markets are complete in this case.
A natural question arises: what is the fair value of these financial securities? Section 2.4 develops a comprehensive theory of security evaluation, which relies on both a precise notion of arbitrage opportunities and how absence of arbitrage links to contexts with maximizing agents, and the existence of prices for a specific class of securities, known as Arrow-Debreu securities, and their relation to the notion of complete markets. The remainder of this section aims to develop in detail the notion of Arrow-Debreu securities, and a number of introductory examples illustrating their importance. We begin with two examples, which aim to draw attention to what we really mean by absence of arbitrage and how this notion can help differentiate between gambles and securities.
2.3.3 Gambles and securities
Part of the reasoning underlying the very fundamental asset pricing formulae in these Lectures relies on the notion of absence of arbitrage: two portfolios yielding the same payoffs should be worth the same, in the presence of frictionless markets. The aim of this section is to draw a distinction between gambles and securities, by emphasizing the idea that some securities can be priced by requiring absence of arbitrage.
As usual, securities are contracts that are traded at a certain price, which might reflect a number of factors such as supply and demand, bargaining power, the market microstructure and, of course, absence of arbitrage, as we shall explain soon. Instead, gambles are risks that are typically not traded, like in a casino, with some nuances to be made below. Naturally, the fair value of a gamble also depends on many factors, such as the bargaining power of the parties entering into play, and their risk-attitudes.
If we were able to slice a gamble into elementary risks that are actually traded, we would be able to price the gamble through no-arbitrage: the fair value of the gamble would equal that of a portfolio that has one unit of each of the risks which the gamble is split into, as in the examples of the previous section. In fact, such a gamble would not be like those in a casino anymore, but a derivative, which we could price through the price of the underlying risks. Note, this gamble might not be traded per se (only its constituent risks are), although then, it might be replicated.
All in all, there are securities that can, and securities that cannot, be replicated through the set of existing assets. Those that can be replicated are priced without regard to anything else but the prices of the assets that add up to the replicating portfolio. Arbitrage is the possibility to profit from price inconsistencies, by trading the risk constituents, and absence of arbitrage imposes discipline on the set of economically viable security prices. Likewise, there are gambles that can actually be replicated, as soon as their constituent risks are traded. What these gambles have in common with replicable securities is that their price is free from anything relating to factors such as risk-aversion, or supply and demand, say, being then determined by no-arbitrage. Gambles that cannot be sliced in this way have a value that likely depends on
the gambler's risk-aversion or the bookmaker's bargaining power, say. Securities that cannot be replicated have this trait in common with unreplicable gambles: their value is not tied down to anything that is already traded, and is subject to a number of factors that we may only speculate about. Still, the price of traded securities cannot be just anything. To avoid arbitrage, security prices satisfy a quite fundamental economic restriction, known as the martingale restriction, which is the focus of much of this chapter.
2.3.3.1 Short-run bets and asymptotic arbitrage

What is the price of a coin game? If we could only toss the coin many times, then, by the Law of Large Numbers, the value of this gamble would approach 50% of a stake as the number of draws grows, to avoid an asymptotic arbitrage, i.e. the possibility to gain from betting for tails, if the price for the tails is less than 50% of the stake over such a large number of trials, or from selling tails, whenever this price is higher than 50%. Note that by allowing for the possibility to sell tails, we mean that this gamble is, in fact, a tradeable asset.
Yet assume, realistically, that the number of trials is small, in which case the value of this gamble might well deviate from 50% of a stake, reflecting for instance preferences, bargaining power, or in general, supply and demand. The previous asymptotic arbitrage could not take place because tossing a coin only once, say, is far from guaranteeing an almost sure 50% frequency of tails outcomes.
This example shows that the value of an unreplicable gamble might well depend on preferences and other factors, in short, on supply and demand. In fact, in the next section, we show that factors such as bettors' preferences are not only necessary to value a gamble. These preferences would sometimes even need to be restricted for us to conclude about the logical consistency of the very same gamble.
2.3.3.2 St Petersburg paradox

Consider the following celebrated gamble, leading to the so-called St Petersburg paradox. We toss a coin a number of times, and the gamble ends as soon as a tail (say) arises as an outcome, with a payoff doubling all the time, being equal to $2^n$ dollars if the gamble stops at the $n$-th trial. The probability to receive $2^n$ dollars at the $n$-th trial is, naturally, the probability that $n-1$ heads occur over the first $n-1$ trials and one tail in one trial, which in total is $2^{-n}$, given the independence of the trials. Therefore, the fair value of this gamble is,
\[
\sum_{n=1}^{\infty} \left(2^{n}\right) 2^{-n} = \infty.
\]
This gamble is obviously not trivial, as there is a risk anyway, in that the payoff, although positive, is not certain. Because the payoff is positive, we have to pay something to enter this gamble, although it is unlikely that anyone would be willing to pay a large amount of money for it. Why? A moment's reflection suggests the payoff profile is quite unattractive. For example, there is a chance of almost 94% (15/16) of obtaining no more than \$16, which illustrates how implausible it seems that anyone would be willing to spend a large amount of money to enter this gamble.
Bernoulli (1738) proposes to solve this puzzle by replacing outcomes with concave utilities of outcomes: concave utilities would dampen the impact of very large payoffs. For example, a decision maker with log-utility would perceive an expected utility from this gamble equal to:
\[
\sum_{n=1}^{\infty} \left(\ln 2^{n}\right) 2^{-n} = \ln 4.
\]

Therefore, he would not be willing to pay more than \$4 to enter this gamble. (This amount of money, \$4, is the certainty equivalent of the gamble, CE, say, i.e. $\ln \mathrm{CE} = E(\ln \tilde{x})$, where $\tilde{x}$ is the random outcome from the gamble.) As anticipated in the previous section, gambles might be quite subtle. Not only does their value depend on preferences. Factors such as risk-aversion are critical to the very existence of such a game. In the St Petersburg paradox example, the game could not exist were the bettors risk-seeking or risk-neutral.
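The numbers in this example can be checked with a short sketch. It assumes, consistently with the certainty equivalent of \$4 reported in the text, that the payoff is 2^n dollars with probability 2^(-n); the function names and the truncation points are illustrative choices, not part of the original notes.

```python
import math

def truncated_fair_value(n_max):
    """Expected payoff when the game is cut off after n_max tosses."""
    # A payoff of 2**n arrives with probability 2**(-n): each term equals 1,
    # so the truncated value grows linearly with n_max and has no finite limit.
    return sum((2 ** n) * 2.0 ** (-n) for n in range(1, n_max + 1))

def expected_log_utility(n_max):
    """Truncated expected utility of a bettor with log-utility."""
    return sum(math.log(2 ** n) * 2.0 ** (-n) for n in range(1, n_max + 1))

def prob_payoff_at_most(cap):
    """Probability that the payoff does not exceed `cap` dollars."""
    total, n = 0.0, 1
    while 2 ** n <= cap:
        total += 2.0 ** (-n)
        n += 1
    return total

# The certainty equivalent solves ln(CE) = E[ln(payoff)].
certainty_equivalent = math.exp(expected_log_utility(200))

print(truncated_fair_value(10), truncated_fair_value(1000))
print(certainty_equivalent)
print(prob_payoff_at_most(16))
```

The truncated fair value keeps growing with the number of allowed tosses, while the log-utility certainty equivalent settles quickly, which is the sense in which risk-aversion keeps the value of the gamble finite.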
2.3.3.3 World Cup pricing, and arbitrage pricing

The risks underlying the gambles of the previous examples cannot be evaluated in a preference-free fashion. As for the coin game, we would need a large number of draws to conclude its fair value is 50% of a stake. As for the coin game leading to the St Petersburg paradox, we would actually need risk-aversion to keep the value of the gamble finite. We now develop a third example where, again, the value of the gamble cannot be determined without making reference to supply and demand, although then we could price some of the elementary risks that make up this gamble in a preference-free fashion, for the very simple reason that these risks are actually traded.
Consider, then, the odds posted by a bookmaker, displayed in Table 2.1, against four hypothetical teams competing for the World Cup. In bookmaking language, odds set at $a$-$b$ against a certain event mean that on a stake of $b$ dollars, the bettor receives $a$ dollars if the given event occurs, and loses his stake of $b$ dollars otherwise. Therefore, odds of 5-1 against team B imply that the bookmaker stands ready to pay \$500 for a \$100 stake, should B win the World Cup. Let $\pi$ be the probability of the event that a given team wins, under which the bettor's expected gain is zero, $\pi: a\,\pi + (-b)(1-\pi) = 0$, such that $a$ equals the odds ratio, $(1-\pi)/\pi$, with $b$ being fixed to one in this example. Also shown in Table 2.1 are these implied break-even probabilities, calculated as $\pi_i = 1/(1+a_i)$, where $a_i$ are the odds against team $i$, and $\pi_i$ is the corresponding break-even probability.
Team    Odds against    Implied prob.
A       2-1             33.33%
B       5-1             16.67%
C       6-1             14.29%
D       9-1             10.00%

TABLE 2.1. Odds against teams in a World Cup.

As in the coin game of Section 2.3.3.1, the break-even probabilities do not necessarily lead to the fair value of the gamble. Indeed, bookmakers typically make profits through what is known as the overround: quoted probabilities that sum to more than 100%. Overrounds are obviously not arbitrage opportunities, because a given World Cup is quite a unique event, with its own weather conditions, team components and many other factors, such that the chance of any team winning the competition has only a weak linkage to the objective probability of winning over a hypothetically large number of matches. Once again, the price and, then, the quoted odds, depend on both the bettors' risk-appetite and the bookmakers' bargaining power.
While the single odds depend on supply and demand, we can price other gambles which cover events relating to those the bookmakers are making a market for, and independently of supply and demand. For example, what are the odds against either team A or team B winning the
World Cup? It is a derivative gamble, because its value depends on that of the constituent events. Let us convert the odds into tickets. With $a$-$b$ odds, a stake of $b$ dollars, which we interpret as the cost of the ticket, returns $a + b$ dollars if the event occurs, and nothing otherwise. Therefore, the cost of a ticket for a payoff equal to \$1 is $b/(a+b)$ dollars. So, the unit ticket for team A winning is worth \$0.33333, and that for team B is \$0.16667. It is easy to see, then, that the unit ticket value for the composite event needs to be \$0.33333 + \$0.16667 = \$0.5. If it was lower, say, \$0.4, we could sell the two bets for A and B for \$0.5, and use \$0.4 to buy the composite ticket, which ensures that we could honour the payoff should A or B win. If the unit ticket value for the composite event was, instead, \$0.6, we could sell the composite bet and, then, bet on A and B separately, to honour the payoff relating to the composite event.
Pricing derivatives in a preference-free format as in this example relies on the fact that markets are complete, in that we could replicate the composite risk by betting on each single team. The next sections extend this notion to various situations of interest that include financial security transactions. We begin by introducing the simplest securities, those that only pay off in specific states of nature, similarly to the single bets of the World Cup of this section.
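The conversion from odds to unit tickets, and the no-arbitrage value of the composite bet on "A or B", can be sketched as follows. The helper names are hypothetical, and the sketch assumes that tickets can be both bought and sold at the quoted prices, which is what the replication argument requires.

```python
def unit_ticket_price(a, b=1.0):
    """Cost of a ticket paying $1 if the event occurs, given a-b odds against."""
    # A stake of b returns a + b if the event occurs, so $1 of payoff costs b / (a + b).
    return b / (a + b)

# Odds against each team, as in Table 2.1 (with b fixed to one).
odds_against = {"A": 2, "B": 5, "C": 6, "D": 9}
ticket = {team: unit_ticket_price(a) for team, a in odds_against.items()}

# The composite ticket "A or B wins" is replicated by holding both single tickets.
composite_value = ticket["A"] + ticket["B"]

def riskless_profit(quoted_composite):
    """Profit per unit ticket from trading a mispriced composite bet."""
    # If the quote is too low: buy the composite, sell the two single tickets.
    # If the quote is too high: sell the composite, buy the two single tickets.
    return abs(quoted_composite - composite_value)

for team, price in ticket.items():
    print(team, round(100 * price, 2))
print(composite_value, riskless_profit(0.4), riskless_profit(0.6))
```

In both mispricing cases the payoff of the position sold is exactly covered by the position bought, so the price difference is locked in regardless of which team wins.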
2.3.4 Arrow-Debreu securities
We now develop a very first evaluation model relying on the notion of a special type of quite simple securities, those that pay off one unit of numeraire in state $s$, if state $s$ will prevail in the future, and zero otherwise. We usually refer to this set of assets as Arrow-Debreu securities, in honour of Kenneth Arrow and Gerard Debreu. Arrow-Debreu securities are perhaps the most elementary we might imagine, and constitute the bricks of the entire asset evaluation framework in Financial Economics, a bit like atoms in Physics, so to speak. Naturally, Arrow-Debreu securities do not exist in reality, for the very simple reason that it is impossible to determine states at such a pure level, in practice. States are simply part of a model's assumptions.
While Arrow-Debreu securities are model-dependent, they can be used to price assets: after all, asset pricing is model-dependent by nature. This section develops a very intuitive link between asset prices and Arrow-Debreu security prices, and explores how asset prices can, then, be re-cast in terms of a certain probability under which they equate their expected payoffs, in an actuarial sense. We conclude with a derivation of the price of these securities based on a simple equilibrium model.
2.3.4.1 Pricing

Consider a two-period economy where at time $t = 1$, three mutually exclusive states of the world may occur. We consider three Arrow-Debreu securities: the $i$-th, AD$_i$, pays off \$1 in state $i$, and zero otherwise, as in the next figure.

[Figure: event tree from $t = 0$ to $t = 1$ with three states. AD$_1$ pays \$1 in state 1, AD$_2$ pays \$1 in state 2, and AD$_3$ pays \$1 in state 3; each security pays zero in the other states.]

Let $\phi_i$ be the price of the asset AD$_i$, and consider a portfolio that has one unit of each of the three assets. This portfolio yields one, for sure, at time $t = 1$. Therefore, its price must be that of a pure discount bond, or,
\[
\sum_{i=1}^{3} \phi_i = \frac{1}{1+r},
\tag{2.9}
\]

where $r$ denotes the riskless interest rate. More generally, consider an asset A$_0$, which pays off $x_i$ in state $i$, as in the next figure. Naturally, the values of $x_i$ are known at time zero, although of course we do not know which $x_i$ will be drawn at time $t = 1$. Also shown in this figure are the payoffs of three assets, A$_i$, $i = 1, 2, 3$, which are rescaled versions of Arrow-Debreu securities. Let us explain.

[Figure: event tree from $t = 0$ to $t = 1$ with three states, showing the payoffs of A$_0$ ($x_i$ in state $i$) and of A$_1$, A$_2$ and A$_3$ (A$_i$ pays $x_i$ in state $i$, and zero in the other states).]

By definition, paying $\phi_1$ yields one dollar if state 1 occurs, and zero otherwise. Therefore, paying $\phi_1 x_1$ yields $x_1$ if state 1 occurs, and zero otherwise. We can, then, slice the risks related to asset A$_0$ into three, by considering the three elementary assets A$_1$, A$_2$ and A$_3$. Each of these assets costs $\phi_i x_i$ at $t = 0$. Moreover, a portfolio that has one unit of each of these three assets needs to have the same value as A$_0$, say $q$,
\[
q = \sum_{i=1}^{3} \phi_i x_i.
\tag{2.10}
\]
We can say more about $q$. Consider Eq. (2.9), which we rewrite as $\sum_{i=1}^{3} Q_i = 1$, where $Q_i \equiv (1+r)\phi_i$. Thus, $Q$ is a probability distribution. Replacing $Q$ into Eq. (2.10) reveals an important property of $q$,
\[
q = \frac{1}{1+r} \sum_{i=1}^{3} Q_i x_i = \frac{1}{1+r} E^{Q}(x).
\]
The price of A$_0$ can be expressed as the expectation of the future payoff, $x$, discounted. It is as if we evaluated assets through actuarial methods. We shall refer to $Q$ as the risk-neutral probability for this reason, and we shall discuss the properties of this benchmark probability in detail in the next sections.
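The two equivalent pricing routes, through Arrow-Debreu prices as in Eq. (2.10) and through discounted risk-neutral expectations, can be illustrated with a small sketch. The state prices and payoffs below are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def price_from_state_prices(phi, x):
    """Eq. (2.10): price of an asset as the state-price-weighted sum of its payoffs."""
    return sum(phi_i * x_i for phi_i, x_i in zip(phi, x))

def risk_neutral_measure(phi):
    """Riskless rate implied by Eq. (2.9), and probabilities Q_i = (1 + r) * phi_i."""
    bond_price = sum(phi)          # price of a sure unit payoff, 1 / (1 + r)
    r = 1.0 / bond_price - 1.0
    Q = [(1.0 + r) * phi_i for phi_i in phi]
    return r, Q

# Illustrative (assumed) Arrow-Debreu prices and payoffs of asset A0 in three states.
phi = [0.30, 0.40, 0.25]
x = [10.0, 20.0, 40.0]

q = price_from_state_prices(phi, x)
r, Q = risk_neutral_measure(phi)
q_actuarial = sum(Q_i * x_i for Q_i, x_i in zip(Q, x)) / (1.0 + r)

print(q, r, Q, q_actuarial)
```

Rescaling the state prices by (1 + r) changes nothing of substance: it merely separates time discounting from the weights attached to the states.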
To sum up, Arrow-Debreu securities would allow us to price any asset, through Eq. (2.10). In Section 2.4, we shall actually show that there are no arbitrage opportunities if and only if there exists a vector $\phi \in \mathbb{R}^d_{++}$: $q = V^\top \phi$, where $q$ is an $m$-dimensional vector of security prices, $V$ is a $d \times m$ matrix of security payoffs, and $d$ is the number of states. In this general context, the vector $\phi$ carries the natural interpretation of one containing Arrow-Debreu security prices.
Note, also, that the security we are evaluating in this specific example relies on the assumption that an appropriate set of Arrow-Debreu securities is available for trading. Alternatively, we may assume that only a selected number of Arrow-Debreu securities are available for trading,
in which case markets become incomplete, in a sense to be made precise in Section 2.3.6. In this case, it is unlikely that the given security could be given a unique price as in Eq. (2.10), as we shall explain.
Note, finally, an interesting property of Arrow-Debreu securities. Consider a general context with a finite number of states, i.e. not necessarily equal to three, and define the random return of the $i$-th asset as $\tilde{R}_i = \mathbb{1}_i / \phi_i$, where $\mathbb{1}_i = 1$ in state $i$, and zero otherwise. Given the objective probability distribution of the states, $p_i$, we have that each pair of asset returns is negatively correlated, with,
\[
var(\tilde{R}_i) = \frac{p_i (1 - p_i)}{\phi_i^2}, \qquad
cov(\tilde{R}_i, \tilde{R}_j) = -\frac{p_i p_j}{\phi_i \phi_j}, \qquad
corr(\tilde{R}_i, \tilde{R}_j) = -\sqrt{\frac{p_i p_j}{(1 - p_i)(1 - p_j)}}.
\]
It is natural to expect these returns to be negatively correlated, as the securities pay off in mutually exclusive states of the world! As we decrease the number of states, the correlation obviously increases in absolute value, with the extreme case $corr(\tilde{R}_i, \tilde{R}_j) = -1$, arising when there are only two states, such that $p_i = 1 - p_j$. It is also natural that, as we increase the number of states, there is a progressively higher number of states where any two Arrow-Debreu securities do not pay off, which brings the correlation of their returns up. This correlation is still negative, however.
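As a sanity check, the correlation between two Arrow-Debreu returns can be computed by brute force over the states and compared with the closed form $-\sqrt{p_i p_j / ((1-p_i)(1-p_j))}$, which follows directly from the definition of the returns. The probabilities and state prices below are illustrative assumptions; note that the state prices drop out of the correlation.

```python
import math

def ad_return_correlation(p, phi, i, j):
    """Brute-force correlation between the returns of AD securities i and j."""
    n = len(p)
    # Return of security k: 1 / phi[k] in state k, zero in every other state.
    R_i = [1.0 / phi[i] if s == i else 0.0 for s in range(n)]
    R_j = [1.0 / phi[j] if s == j else 0.0 for s in range(n)]
    mean_i = sum(p[s] * R_i[s] for s in range(n))
    mean_j = sum(p[s] * R_j[s] for s in range(n))
    cov = sum(p[s] * (R_i[s] - mean_i) * (R_j[s] - mean_j) for s in range(n))
    var_i = sum(p[s] * (R_i[s] - mean_i) ** 2 for s in range(n))
    var_j = sum(p[s] * (R_j[s] - mean_j) ** 2 for s in range(n))
    return cov / math.sqrt(var_i * var_j)

def closed_form_correlation(p, i, j):
    """Closed-form correlation, which depends on the objective probabilities only."""
    return -math.sqrt(p[i] * p[j] / ((1.0 - p[i]) * (1.0 - p[j])))

three_states = ad_return_correlation([0.2, 0.5, 0.3], [0.25, 0.40, 0.30], 0, 1)
two_states = ad_return_correlation([0.4, 0.6], [0.45, 0.50], 0, 1)
print(three_states, closed_form_correlation([0.2, 0.5, 0.3], 0, 1), two_states)
```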
2.3.4.2 Tying down Arrow-Debreu securities to consumption: an example

Consider the previous two-period economy, where now there is a single agent, who maximizes,
\[
\max_{\theta}\ \left[ u(c_0) + \beta E(u(c_1)) \right]
\quad \text{s.t. (i) } c_0 = w_0 - q\theta \text{ and (ii) } c_1 = w_1 + \theta x,
\]
where $c_t$ is consumption at time $t$, $w_0$ is endowment at time $t = 0$, $w_1$ is the random endowment at time 1, $u$ is a concave utility function satisfying all the regularity we need, $\beta$ is the discount factor, $E$ is the expectation taken under the physical probability, and remaining notation is as before. We then have two constraints. The first, (i), says that endowment is split into time $t = 0$ consumption and investment, and the second, (ii), says that time $t = 1$ consumption is financed by $w_1$ and the payoff provided by the asset.
Replacing the two constraints into the utility functions, first order conditions are obtained by differentiating the resulting objective function with respect to $\theta$, and lead to,
\[
q = \beta \frac{E\left( u'(w_1 + \theta x)\, x \right)}{u'(c_0)}
  = \beta \frac{E\left( u'(w_1)\, x \right)}{u'(w_0)}
  = \sum_{i=1}^{3} \beta\, p_i \frac{u'(w_{1i})}{u'(w_0)}\, x_i,
\tag{2.11}
\]
where $w_{1i}$ is the endowment in state $i$, and $p_i$ is the probability of state $i$. The second equality follows by an equilibrium condition, namely that the asset is in zero net supply, such that our Robinson Crusoe has to consume his endowment in each state.
Comparing Eqs. (2.10) and (2.11) allows us to identify Arrow-Debreu security prices, as follows,
\[
\phi_i = \beta\, p_i \frac{u'(w_{1i})}{u'(w_0)}.
\]
By concavity of the utility function, $\phi_i$ is high when $w_{1i}$ is low. In other words, the Arrow-Debreu prices for the bad states of nature are high or, risk-neutral probabilities assign high values to bad states. We shall come back to the interpretation of this result many times in this, and subsequent, chapters.
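A numerical sketch may help to see how marginal utility tilts risk-neutral probabilities towards bad states. It assumes log-utility, so that $u'(c) = 1/c$, a discount factor of 0.95, and made-up endowments and probabilities; none of these values comes from the text.

```python
def arrow_debreu_prices(beta, p, w0, w1):
    """State prices beta * p_i * u'(w1_i) / u'(w0) under (assumed) log-utility."""
    marginal_utility = lambda c: 1.0 / c   # u(c) = ln(c)
    return [beta * p_i * marginal_utility(w_i) / marginal_utility(w0)
            for p_i, w_i in zip(p, w1)]

# Illustrative assumptions: three states, from bad (low endowment) to good.
beta = 0.95
p = [0.25, 0.50, 0.25]      # objective probabilities
w0 = 1.0                    # endowment at time 0
w1 = [0.8, 1.0, 1.2]        # endowment at time 1 in each state

phi = arrow_debreu_prices(beta, p, w0, w1)
Q = [phi_i / sum(phi) for phi_i in phi]   # Q_i = (1 + r) * phi_i

for state, (p_i, Q_i) in enumerate(zip(p, Q), start=1):
    print(state, p_i, round(Q_i, 4))
```

In this parametrization, the risk-neutral probability of the low-endowment state exceeds its objective probability, and the opposite holds for the high-endowment state.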
2.3.5 Pricing by arbitrage and replication in complete markets: an introductory example


We now develop an introductory example where absence of arbitrage leads to a unique price of redundant securities, in a context with complete markets. Consider an economy where uncertainty relates to the event "tomorrow it will rain". A hypothetical decision maker, Mr Law, say, would like to implement the following state-contingent plan. If tomorrow is sunny, he will need $c_s > 0$ units of numeraire to purchase sunglasses; if tomorrow it rains, he will need $c_r > 0$ to purchase an umbrella. Mr Law has access to a financial market where $m$ assets are traded. He builds up a portfolio $\theta$ aimed to deliver the payments he will need tomorrow:
\[
\sum_{i=1}^{m} \theta_i q_i (1 + x_i(r)) = c_r, \qquad
\sum_{i=1}^{m} \theta_i q_i (1 + x_i(s)) = c_s,
\tag{2.12}
\]
where $q_i$ is the price of the $i$-th asset, $\theta_i$ is the number of assets to put in the portfolio, and $x_i(r)$ and $x_i(s)$ are the net returns of asset $i$ in the two states of nature, which are part of the information set available to Mr Law. Finally, and remarkably, we are not making any assumption regarding Mr Law's preferences. We only know that he needs $c_i$ in state $i$.
Eqs. (2.12) form a system of two equations with $m$ unknowns $(\theta_1, \dots, \theta_m)$. If $m < 2$, no perfect hedging strategies are available to Mr Law, that is, Eqs. (2.12) cannot be solved to obtain the desired pair $(c_i)_{i=r,s}$. We say markets are incomplete in this case. More generally, consider an economy with $d$ states of nature, and define markets to be complete if and only if Mr Law has access to $d$ assets, which allow him to achieve any consumption plan, independently of his preferences. Preferences would only play a role, if any, when it comes to picking a particular consumption plan amongst all possible ones.
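Let us, then, define the following payoff matrix,
\[
V \equiv
\begin{bmatrix}
q_1 (1 + x_1(\omega_1)) & \cdots & q_m (1 + x_m(\omega_1)) \\
\vdots & & \vdots \\
q_1 (1 + x_1(\omega_d)) & \cdots & q_m (1 + x_m(\omega_d))
\end{bmatrix},
\]
where $x_i(\omega_j)$ is the return performed by the $i$-th asset in the state $\omega_j$. To implement any state contingent consumption plan $c \in \mathbb{R}^d$, Mr Law needs to solve the following system,
\[
c = V \theta,
\]
where $\theta \in \mathbb{R}^m$ is the portfolio. If rank$(V) = d = m$, the previous system has a unique solution, given by $\hat{\theta} = V^{-1} c$. Consider, for example, the previous case where $d = 2$, and take $m = 2$, for any additional assets would be redundant here. Then, we have,
\[
\hat{\theta}_1 = \frac{(1 + x_2(s))\, c_r - (1 + x_2(r))\, c_s}{q_1 \left[ (1 + x_1(r))(1 + x_2(s)) - (1 + x_1(s))(1 + x_2(r)) \right]}, \qquad
\hat{\theta}_2 = \frac{(1 + x_1(r))\, c_s - (1 + x_1(s))\, c_r}{q_2 \left[ (1 + x_1(r))(1 + x_2(s)) - (1 + x_1(s))(1 + x_2(r)) \right]}.
\]
Finally, assume that the second asset is safe, or that it yields the same return in the two states of nature: $x_2(r) = x_2(s) \equiv r$ (where $r$ on the right hand side denotes the riskless rate, not the rainy state). Let $x_s = x_1(s)$ and $x_r = x_1(r)$. Then, the pair $(\hat{\theta}_1, \hat{\theta}_2)$ can be rewritten as,
\[
\hat{\theta}_1 = \frac{c_s - c_r}{q_1 (x_s - x_r)}, \qquad
\hat{\theta}_2 = \frac{(1 + x_s)\, c_r - (1 + x_r)\, c_s}{q_2 (1 + r)(x_s - x_r)}.
\]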
As is clear, we are dealing with an issue relating to the replication of random variables. Our random variable is a state contingent consumption plan $(c_i)_{i=r,s}$, where $c_r$ and $c_s$ are known, which we want to replicate for hedging purposes: Mr Law will need to buy either a pair of sunglasses or an umbrella, tomorrow.
In this example, any two-state variable can be generated by investing into two assets with independent payoffs. Our next step is to understand whether there are implications for the price of an asset, A say, which would exactly deliver the same random variable $(c_i)_{i=r,s}$. Let $H$ be the current price of asset A. We claim that,
\[
H = \hat{\theta}_1 q_1 + \hat{\theta}_2 q_2,
\tag{2.13}
\]
for the financial market to be free of arbitrage opportunities, to be defined in a moment. Indeed, if $H > \hat{\theta}_1 q_1 + \hat{\theta}_2 q_2$, we can buy $\hat{\theta}$ and sell at the same time the third asset A. The result is a sure profit, or an arbitrage opportunity, equal to $H - (\hat{\theta}_1 q_1 + \hat{\theta}_2 q_2)$, for $\hat{\theta}$ generates $c_r$ if tomorrow it will rain, and $c_s$ if tomorrow it will not rain. In both cases, the portfolio $\hat{\theta}$ generates the payments we need to honour the sale of A. Likewise, we can show that the inequality $H < \hat{\theta}_1 q_1 + \hat{\theta}_2 q_2$ would lead to an arbitrage. Hence, Eq. (2.13) must hold true.
We are left with the calculation of the right hand side of Eq. (2.13). We have:
\[
H = \frac{1}{1+r} \left[ Q\, c_s + (1 - Q)\, c_r \right], \qquad Q \equiv \frac{r - x_r}{x_s - x_r}.
\tag{2.14}
\]
Eq. (2.14) is an evaluation formula for the asset A, and says that the price, $H$, can be expressed as the present value of the expected payoffs promised by A under a new probability, $Q$, which for obvious reasons we term risk-neutral probability. Note that, remarkably, we are able to price the asset A without any reference to agents' preferences. The reasons underlying this preference-free result link to the fact that we can replicate the asset A through $\hat{\theta}$. Eq. (2.14) obviously does not require that any agent is using this portfolio. For example, Mr Law might be so poor that he could not even implement $\hat{\theta}$. The point is, rather, that the portfolio $\hat{\theta}$ could be used to implement an arbitrage opportunity, arising as soon as Eq. (2.13) does not hold. In this case, any penniless agent could implement the arbitrage described above.
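The replication argument can be verified numerically. The sketch below assumes one risky asset with returns x_s (sunny) and x_r (rainy), one safe asset with return r, and illustrative numbers for prices and payments; it checks that the replicating portfolio delivers the required payments in both states, and that its cost coincides with the discounted expectation under Q = (r - x_r) / (x_s - x_r), a definition obtained by rearranging the cost of the portfolio.

```python
def replicating_portfolio(c_s, c_r, q1, q2, x_s, x_r, r):
    """Units of the risky asset (theta1) and of the safe asset (theta2)."""
    theta1 = (c_s - c_r) / (q1 * (x_s - x_r))
    theta2 = ((1 + x_s) * c_r - (1 + x_r) * c_s) / (q2 * (1 + r) * (x_s - x_r))
    return theta1, theta2

def risk_neutral_price(c_s, c_r, x_s, x_r, r):
    """Discounted expected payoff under the risk-neutral probability of sun."""
    Q = (r - x_r) / (x_s - x_r)
    return (Q * c_s + (1 - Q) * c_r) / (1 + r)

# Illustrative assumptions: payments needed, asset prices and returns.
c_s, c_r = 10.0, 30.0
q1, q2 = 100.0, 1.0
x_s, x_r, r = 0.20, -0.10, 0.05

theta1, theta2 = replicating_portfolio(c_s, c_r, q1, q2, x_s, x_r, r)
cost = theta1 * q1 + theta2 * q2
payoff_sunny = theta1 * q1 * (1 + x_s) + theta2 * q2 * (1 + r)
payoff_rainy = theta1 * q1 * (1 + x_r) + theta2 * q2 * (1 + r)
H = risk_neutral_price(c_s, c_r, x_s, x_r, r)

print(theta1, theta2, cost, H, payoff_sunny, payoff_rainy)
```

For Q to be a genuine probability, the safe return must lie strictly between the two risky returns, x_r < r < x_s, as in the numbers used here.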
The next step is to extend Eq. (2.14) to a dynamic setting. Suppose an additional day is available for trading, with the same description as before: the day after tomorrow, the asset A payoffs are $c_{ss}$ if it will be sunny, provided the previous day was sunny; $c_{rs}$ if it will be sunny, provided the previous day was raining, etc. By the same arguments leading to Eq. (2.14), we have that:
\[
H = \frac{1}{(1+r)^2} \left[ Q^2 c_{ss} + Q (1 - Q)\, c_{sr} + (1 - Q)\, Q\, c_{rs} + (1 - Q)^2 c_{rr} \right].
\]
Finally, by extending the same reasoning to $T$ trading days,
\[
H = \frac{1}{(1+r)^{T}} E^{Q}(c_T),
\tag{2.15}
\]
where $E^{Q}$ denotes the expectation taken under the probability $Q$.
The key assumption underlying Eq. (2.15) is that markets are complete at each trading day. True, at the beginning of the trading period, Mr Law faces $2^T$ mutually exclusive states of nature for $c_T$. If he did not have access to markets in each period, he would need $2^T$ securities to replicate the asset A. Yet in this example, 2 assets and $T$ trading periods for these assets are needed, to replicate and, then, price A. To emphasize this dynamic feature, we say that the
structure of assets and transaction dates makes the markets dynamically complete. Dynamically complete markets allow us to implement dynamic trading strategies that replicate the value of the asset A, period by period. As a result, the asset A is priced without any need to assume anything about the agents' preferences. The next section further clarifies these issues, and the existence of the risk-neutral probability, $Q$.
2.3.6 Replication and pricing: the role of complete markets
What are the origins of the preference-free formula in Eq. (2.14)? Consider a general two-period model where the assets deliver a payoff matrix, $V$, and denote as usual with $\phi$ the vector of Arrow-Debreu prices. We know that the initial price vector is $q = V^\top \phi$. In a setting with complete markets, we can extract a unique vector of state prices from the previous relation,
\[
\phi^\top = q^\top V^{-1}.
\tag{2.16}
\]
Next, we want to replicate any asset (e.g., a derivative) with final payoff $c$ at the second period, through a portfolio $\theta$ comprising the initial assets. Denote the initial value of this portfolio with $\Pi$ and the value in the second period with $\Pi'$,
\[
\Pi = q^\top \theta, \qquad \Pi' = V \theta.
\tag{2.17}
\]
We want to use this portfolio to replicate the payoff of the derivative in the second period, i.e.,
\[
\hat{\theta}:\ c = \Pi' = V \hat{\theta} \iff \hat{\theta} = V^{-1} c.
\]
We plug $\hat{\theta} = V^{-1} c$ into the first of Eqs. (2.17) and obtain the initial value of the replicating portfolio,
\[
\Pi = q^\top V^{-1} c = \phi^\top c,
\]
where the last equality follows by Eq. (2.16). As usual, two portfolios that yield the same thing must be worth the same, in the presence of frictionless markets. Therefore, the price of the derivative, $H$, is,
\[
H = \Pi = \phi^\top c = \frac{1}{1+r} E^{Q}(c).
\]
Done. We could price the given asset in this preference-free fashion because markets are complete, in that the asset payoff can be replicated through securities that are actually traded (the complete market assumption). These traded securities "convey" information about Arrow-Debreu securities, $\phi$, to the derivative, so to speak. Chapters 10 through 12 explain how this "extract-and-plug-in" procedure regarding Arrow-Debreu security prices can be used, in practice, to evaluate complex derivative instruments.
In the next sections, we study Arrow-Debreu securities from a more theoretical perspective, connecting them more deeply with the notion of absence of arbitrage, and relying on a general equilibrium model without production, and with possibly incomplete markets.
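The "extract-and-plug-in" logic can be sketched for two states and two traded assets. The payoff matrix below has states as rows and assets as columns; all numbers, and the small linear solver, are illustrative assumptions rather than anything taken from the text.

```python
def solve_2x2(A, b):
    """Solve the 2x2 linear system A y = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    y0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    y1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [y0, y1]

def transpose(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

# Assumed market: a risky asset paying 120 or 90, and a bond paying 1.05 in both states.
V = [[120.0, 1.05],
     [90.0, 1.05]]
q = [100.0, 1.0]            # current prices of the two traded assets
c = [10.0, 30.0]            # payoff of the derivative in the two states

# Extract: state prices solve V' phi = q.
phi = solve_2x2(transpose(V), q)
# Plug in: the derivative is worth phi' c.
price_state_prices = phi[0] * c[0] + phi[1] * c[1]

# Cross-check by replication: theta solves V theta = c, and costs q' theta.
theta = solve_2x2(V, c)
price_replication = q[0] * theta[0] + q[1] * theta[1]

print(phi, price_state_prices, theta, price_replication)
```

Both routes give the same number because complete markets make V invertible, so that state prices are uniquely pinned down by the prices of the traded assets.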

2.4 No-arbitrage theory


This section formalizes the notion of absence of arbitrage opportunities and links it to the competitive equilibrium in Section 2.2. We consider markets with one single commodity, with Appendix 2 dealing briefly with the intricacies arising in the multi-commodity case.

2.4.1 Lands of Cockaigne
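Let $v_i(\omega_s)$ be the payoff of asset $i$ in state $\omega_s$, $i = 1, \dots, m$ and $s = 1, \dots, d$. Consider the payoff matrix:
\[
V \equiv
\begin{bmatrix}
v_1(\omega_1) & \cdots & v_m(\omega_1) \\
\vdots & & \vdots \\
v_1(\omega_d) & \cdots & v_m(\omega_d)
\end{bmatrix}.
\]
Let $v_{si} \equiv v_i(\omega_s)$, $v_{s,\cdot} \equiv [v_{s1}, \dots, v_{sm}]$, and $v_{\cdot,i} \equiv [v_{1i}, \dots, v_{di}]^\top$. We assume that rank$(V) = m \le d$. The budget constraint of each agent is:
\[
c_0 - w_0 = -q^\top \theta = -\sum_{i=1}^{m} q_i \theta_i, \qquad
c_s - w_s = v_{s,\cdot}\, \theta = \sum_{i=1}^{m} v_{si} \theta_i, \quad s = 1, \dots, d.
\]
Let $c^1 - w^1 \equiv [c_1 - w_1, \dots, c_d - w_d]^\top$. The second constraint can be written as:
\[
c^1 - w^1 = V \theta.
\]
We define an arbitrage opportunity as a portfolio that has a negative value at the first period, and a positive value in at least one state of the world in the second period, or a positive value in all states of the world in the second period and a nonpositive value in the first period. Let us introduce the following pieces of notation.
Notation. [In progress ...] Given a vector $x \in \mathbb{R}^d$, $x > 0$ means that at least one component of $x$ is strictly positive while the other components of $x$ are nonnegative, i.e. $[x]_j \ge 0$, $j = 1, \dots, d$, with at least one $j$ for which $[x]_j > 0$. $x \gg 0$ means that all components of $x$ are strictly positive. $x \ge 0$ means that $[x]_j \ge 0$, $j = 1, \dots, d$, i.e. it allows for $[x]_j = 0$, $j = 1, \dots, d$.
[Insert here further notes]
Definition 2.8. An arbitrage opportunity is a strategy $\theta$ that yields either $V\theta \ge 0$ with an initial investment $q^\top\theta < 0$, or a strategy that produces $V\theta > 0$ with an initial investment $q^\top\theta \le 0$.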

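An arbitrage opportunity cannot exist in a competitive equilibrium, for the agents' programs
would not be well-defined in this case. Precisely, consider the $(d + 1) \times m$ matrix,

$$W \equiv \begin{bmatrix} -q \\ V \end{bmatrix},$$

the vector subspace of $\mathbb{R}^{d+1}$,

$$\langle W\rangle = \{z \in \mathbb{R}^{d+1} : z = W\theta,\ \theta \in \mathbb{R}^m\},$$

and the null space of $\langle W\rangle$,

$$\langle W\rangle^\perp = \{x \in \mathbb{R}^{d+1} : x^\top W = 0\}.$$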

The interpretation of the vector subspace $\langle W\rangle$ is that of the excess demand space in all the
states of nature, generated by the income transfers across states induced by the portfolio
choice $\theta$. Naturally, $\langle W\rangle^\perp$ and $\langle W\rangle$ are orthogonal, as
$\langle W\rangle^\perp = \{x \in \mathbb{R}^{d+1} : x^\top z = 0,\ z \in \langle W\rangle\}$. (We
shall return to the interpretation of $\langle W\rangle^\perp$ below; see Eq. (2.20).) The assumption that there are no
arbitrage opportunities is equivalent to the following condition,

$$\langle W\rangle \cap \mathbb{R}^{d+1}_+ = \{0\}. \qquad (2.18)$$

The interpretation of Eq. (2.18) is that in the absence of arbitrage opportunities, there should
be no portfolios generating income transfers that are (i) non-negative and (ii) strictly positive in
at least one state, i.e. $\nexists\, \theta : W\theta > 0$. Hence, $\langle W\rangle$ and the positive orthant $\mathbb{R}^{d+1}_+$ cannot intersect, other than at the origin.
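As a small numerical illustration of the condition in Eq. (2.18), the sketch below stacks the first-period transfer $-q\theta$ on top of the second-period transfers $V\theta$ and checks whether the resulting vector is nonnegative with at least one strictly positive entry. The two-state, two-asset payoff matrix, the prices and the function names are hypothetical choices made for this example only.

```python
from typing import List, Sequence


def income_transfers(q: Sequence[float], V: Sequence[Sequence[float]],
                     theta: Sequence[float]) -> List[float]:
    """Return W @ theta = [-q.theta, V.theta], with rows of V indexing states."""
    first_period = -sum(qi * ti for qi, ti in zip(q, theta))
    second_period = [sum(v * t for v, t in zip(row, theta)) for row in V]
    return [first_period] + second_period


def is_arbitrage(q: Sequence[float], V: Sequence[Sequence[float]],
                 theta: Sequence[float], tol: float = 1e-12) -> bool:
    """True if W @ theta > 0: no negative entry and at least one positive entry."""
    z = income_transfers(q, V, theta)
    return all(x >= -tol for x in z) and any(x > tol for x in z)


# Hypothetical market: rows are states, columns are assets.
V_EXAMPLE = [[1.0, 3.0], [2.0, 4.0]]
THETA = [-2.0, 1.0]            # short two units of asset 1, buy one unit of asset 2

Q_CONSISTENT = [1.4, 3.2]      # prices implied by the state prices (0.4, 0.5)
Q_MISPRICED = [1.4, 2.0]       # asset 2 is too cheap relative to asset 1
```

With the mispriced vector, the portfolio collects cash today and never pays anything back tomorrow, which is exactly the kind of income transfer ruled out by Eq. (2.18); with the consistent prices the same portfolio requires a positive outlay.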

2.4.2 Enforced asset prices


The no-arbitrage condition in Eq. (2.18) restricts the prices of all the assets in the economy to
satisfy a joint restriction, summarized as follows:
Theorem 2.9. There are no arbitrage opportunities if and only if there exists a
$\phi \in \mathbb{R}^d_{++} : q = \phi^\top V$. If $m = d$, $\phi$ is unique, and if $m < d$,
$\dim\{\phi \in \mathbb{R}^d_{++} : q = \phi^\top V\} = d - m$.

Proof. In the appendix.
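To see Theorem 2.9 at work in the simplest complete-markets case, the following sketch solves $q = \phi^\top V$ for a two-state, two-asset market and checks whether the resulting state prices are strictly positive. The payoff matrix, the prices and the helper names are assumptions made for illustration; the 2x2 system is solved with Cramer's rule to keep the example free of external libraries.

```python
from typing import List, Sequence


def state_prices_2x2(q: Sequence[float],
                     V: Sequence[Sequence[float]]) -> List[float]:
    """Solve q = phi^T V for phi when V is 2x2 (rows: states, columns: assets)."""
    a, b = V[0][0], V[1][0]   # payoffs of asset 1 in states 1 and 2
    c, d = V[0][1], V[1][1]   # payoffs of asset 2 in states 1 and 2
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular payoff matrix: the two assets do not span R^2")
    phi1 = (q[0] * d - b * q[1]) / det
    phi2 = (a * q[1] - c * q[0]) / det
    return [phi1, phi2]


def admits_no_arbitrage(q: Sequence[float],
                        V: Sequence[Sequence[float]]) -> bool:
    """With m = d, prices are arbitrage-free iff the unique phi is strictly positive."""
    return all(p > 0.0 for p in state_prices_2x2(q, V))


# Hypothetical payoff matrix, rows indexing states and columns indexing assets.
V_EXAMPLE = [[1.0, 3.0], [2.0, 4.0]]
```

For instance, the price vector (1.4, 3.2) is supported by the strictly positive state prices (0.4, 0.5), whereas (1.4, 2.0) would require a negative state price for the first state, signalling an arbitrage opportunity.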

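Theorem 2.9 provides foundations to the main architecture underlying asset evaluation in
frictionless markets. Pre-multiply the second constraint by $\phi^\top$, obtaining,

$$\phi^\top(c^1 - w^1) = \phi^\top V\theta = q\theta = -(c_0 - w_0),$$

where the second equality follows by Theorem 2.9, and the third equality is the first period
budget constraint. Critically, then, Theorem 2.9 shows that in the absence of arbitrage, each
agent faces the following budget constraint,

$$0 = c_0 - w_0 + \phi^\top(c^1 - w^1) = c_0 - w_0 + \sum_{s=1}^d \phi_s(c_s - w_s), \quad \text{with } (c^1 - w^1) \in \langle V\rangle, \qquad (2.19)$$

where $\langle V\rangle$ is the subspace generated by the payoffs stemming from the portfolio choices,

$$\langle V\rangle = \{x \in \mathbb{R}^d : x = V\theta,\ \theta \in \mathbb{R}^m\}.$$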

Eq. (2.19) suggests interpreting $\phi$ as the price vector of commodities available for consumption
over the various states of nature, with first-period consumption being the numeraire, in the spirit
of the Debreu (1959, Chapter 7) quote given in Section 2.3.1. Moreover, Theorem 2.9 tells us
that because $q = \phi^\top V$, the vector $\phi$ is, then, the Arrow-Debreu state price vector, generalizing
the heuristic notions introduced in Section 2.3.4. Finally, we can interpret the subspace $\langle W\rangle^\perp$
as follows. Let 1 [1 ]> . Then, by Eq. (2.19),
h i
T +1
i
= {0}.
R

In particular, no-arbitrage implies that h

(2.20)


FIGURE 2.2. Incomplete markets, $d = 2$, $m = 1$.

Notwithstanding these interesting conceptual links, it is misleading to interpret the budget
constraint in (2.19) as that arising within the static Arrow-Debreu type model of Section 2.2,
as this case would only obtain when $m = d$, in which case $\langle V\rangle = \mathbb{R}^d$ in (2.19). This case,
which according to Theorem 2.9 arises when markets are complete, also implies the remarkable
property that there exists a unique $\phi$ that is compatible with the asset prices we observe. If
markets are incomplete, $m < d$, the situation differs considerably, as the agents then have access to a
smaller subspace of excess demands in the second period, $\langle V\rangle \subset \mathbb{R}^d$.
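Consider, for example, the case $d = 2$ and $m = 1$. In this case, $\langle V\rangle = \{x \in \mathbb{R}^2 : x = V\theta,\ \theta \in \mathbb{R}\}$,
with $V = v_1$, where $v_1 = (1, 2)^\top$ say, and $\dim\langle V\rangle = 1$, as illustrated by Figure 2.2. Next, suppose
we open a new market for a second financial asset with payoffs given by: $v_2 = (3, 4)^\top$. Then, $m = 2$,

$$V = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \quad \text{and} \quad \langle V\rangle = \left\{x \in \mathbb{R}^2 : x = \theta_1\begin{pmatrix}1\\2\end{pmatrix} + \theta_2\begin{pmatrix}3\\4\end{pmatrix},\ \theta \in \mathbb{R}^2\right\},$$

i.e. $\langle V\rangle = \mathbb{R}^2$. As a result, we can
now generate any excess demand in $\mathbb{R}^2$, just as in the Arrow-Debreu economy of Section 2.2.
To generate any excess demand, we multiply the payoff vector $v_1$ by $\theta_1$ and the payoff vector
$v_2$ by $\theta_2$. For example, suppose we wish to generate the payoff vector $v_4$ in Figure
2.3. Then, we choose some appropriate $\theta_1$ and $\theta_2$. (The exact values of $\theta_1$ and $\theta_2$ are obtained by
solving a linear system.) In Figure 2.3, the payoff vector $v_3$ is obtained with $\theta_1 = \theta_2 = 1$.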
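The linear system mentioned above can be sketched as follows for the two-asset example, with payoff vectors (1, 2) and (3, 4). The function name and the target excess demands are hypothetical; the 2x2 system $V\theta = x$ is solved by Cramer's rule.

```python
from typing import List, Sequence


def replicating_portfolio_2x2(V: Sequence[Sequence[float]],
                              x: Sequence[float]) -> List[float]:
    """Return theta such that V @ theta = x, for a 2x2 payoff matrix V."""
    a, b = V[0]               # state-1 payoffs of assets 1 and 2
    c, d = V[1]               # state-2 payoffs of assets 1 and 2
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("payoffs are collinear: not every x can be generated")
    theta1 = (x[0] * d - b * x[1]) / det
    theta2 = (a * x[1] - c * x[0]) / det
    return [theta1, theta2]


# Payoff matrix of the example in the text: columns are v1 = (1, 2) and v2 = (3, 4).
V_TEXT = [[1.0, 3.0], [2.0, 4.0]]
```

Holding one unit of each asset generates the excess demand (4, 6), while the target (5, 8) requires two units of the first asset and one of the second.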
We are in a position to state a fundamental result regarding the viability of the model.
Define the second period consumption $c^1 \equiv [c_1, \dots, c_d]^\top$, where $c_s$
is the second-period consumption in state $s$, and let,

$$(\hat c_0, \hat c^1, \hat\theta) \in \arg\max_{c_0, c^1, \theta}\ u(c_0) + \beta E(\nu(c^1)) \quad \text{subject to} \quad \begin{cases} c_0 - w_0 = -q\theta \\ c^1 - w^1 = V\theta \end{cases} \qquad [2.P3]$$

where $u$ and $\nu$ are utility functions, both satisfying Assumption 2.1. Naturally, we could use
more general formulations of utilities than that in [2.P3], and in fact we shall in more advanced
parts of these Lectures. For the sake of this introductory chapter, we only consider additive
utility.
We have:

Theorem 2.10. The program [2.P3] has a solution if and only if there are no arbitrage
opportunities.

FIGURE 2.3. Complete markets, $\langle V\rangle = \mathbb{R}^2$.

Proof. Let us suppose on the contrary that the program [2.P3] has a solution $(\hat c_0, \hat c^1, \hat\theta)$,
but that there exists a $\theta : W\theta > 0$. The program constraint is, with straightforward notation,
$\hat c = w + W\hat\theta$. Then, we may define a portfolio $\bar\theta = \hat\theta + \theta$, such that
$\bar c = w + W(\hat\theta + \theta) = \hat c + W\theta > \hat c$, which contradicts the optimality of $\hat c$. For the converse, note that the absence of
arbitrage opportunities implies that $\exists\, \phi \in \mathbb{R}^d_{++} : q = \phi^\top V$, which leads to the budget constraint
in (2.19), for a given $\phi$. This budget constraint is clearly a closed subset of the compact budget
constraint in [2.P1] (in fact, it is that constraint restricted to $\langle V\rangle$). Therefore, it is a compact set and,
hence, the program [2.P3] has a solution, as a continuous function attains its maximum on a
compact set. ∥

2.5 Equivalent martingales, and equilibrium


We provide the definition of an equilibrium with financial markets, when the financial assets
are in zero net supply.
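Definition 2.11. An equilibrium is given by allocations and prices
$\{(\hat c_{0j})_{j=1}^n, ((\hat c_{sj})_{j=1}^n)_{s=1}^d, (\hat q_i)_{i=1}^m \in \mathbb{R}^n_+ \times \mathbb{R}^{nd}_+ \times \mathbb{R}^m_+\}$, where the allocations are solutions to program [2.P3], and satisfy:

$$0 = \sum_{j=1}^n (\hat c_{0j} - w_{0j}); \quad \text{and for } s = 1, \dots, d, \quad 0 = \sum_{j=1}^n (\hat c_{sj} - w_{sj}) \quad \text{and} \quad 0 = \sum_{j=1}^n \hat\theta_{ij} \quad (i = 1, \dots, m).$$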

We now express demand functions in terms of the stochastic discount factor, and then look for
an equilibrium by looking for the stochastic discount factor that clears the commodity markets.
By Walras' Law, clearing in the commodity market implies clearing in the financial market.
Indeed, by aggregating the agents' constraints in the second period,

$$\sum_{j=1}^n (c_j^1 - w_j^1) = V\sum_{j=1}^n \theta_j,$$

so that, $V$ having full column rank, a zero aggregate excess demand in the commodity market forces $\sum_{j=1}^n \theta_j = 0$.
For simplicity, we also assume that $u'(x) > 0$, $u''(x) < 0$ $\forall x > 0$,
$\lim_{x\to 0} u'(x) = \infty$ and $\lim_{x\to\infty} u'(x) = 0$, and that $\nu$
satisfies the same properties.


2.5.1 Rational expectations

Lucas, Radner, Green. Every agent correctly anticipates the equilibrium price in each state of
nature.
[Consider for example the models with asymmetric information, to be dealt with in Chapter 9,
where we shall be concerned with inferences such as ( | () = ), where is the value of the
asset and () is the asset pricing function depending on the state of nature . In these
markets,
( () ) + (1
) ( () ) + = 0, and we look for a solution () satisfying
this equation.]
[In progress]
2.5.2 Stochastic discount factors
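Theorem 2.9 states that in the absence of arbitrage opportunities,

$$q_i = \phi^\top v_{\cdot i} = \sum_{s=1}^d \phi_s v_{si}, \quad i = 1, \dots, m. \qquad (2.21)$$

We assume the first asset is safe, in that $v_{s1} = 1$ for all $s$, such that

$$q_1 \equiv \frac{1}{1+r} = \sum_{s=1}^d \phi_s. \qquad (2.22)$$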

Eq. (2.22) confirms the economic interpretation of the state prices in Eq. (2.19). Because
the states of nature are exhaustive and mutually exclusive, $\phi_s$ is the price to be paid today to
obtain one unit of numeraire, tomorrow, in state $s$. It is actually the economic interpretation
of the budget constraint in (2.19), confirmed by Eq. (2.22), which says that the prices of all
these rights sum up to the price of a pure discount bond, i.e. the asset yielding one unit of
numeraire, tomorrow, for sure.
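Eq. (2.22) can be elaborated to provide us with a second interpretation of the state prices in
Theorem 2.9. Define,

$$Q_s \equiv (1 + r)\phi_s,$$

which satisfies, by construction,

$$\sum_{s=1}^d Q_s = 1.$$

Therefore, we can interpret $Q \equiv (Q_s)_{s=1}^d$ as a probability distribution. Moreover, replacing
$Q_s$ in Eq. (2.21) leaves,

$$q_i = \frac{1}{1+r}\sum_{s=1}^d Q_s v_{si} = \frac{1}{1+r}E^Q(v_{\cdot i}), \quad i = 1, \dots, m. \qquad (2.23)$$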
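The passage from state prices to risk-neutral probabilities can be sketched numerically as follows. The state prices and the payoff used in the usage note are hypothetical numbers, and the function names are ours; the code merely rescales the state prices by the price of the pure discount bond and then discounts the risk-neutral expectation of a payoff.

```python
from typing import List, Sequence, Tuple


def risk_neutral_measure(phi: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (r, Q): the safe rate and the probabilities Q_s = (1 + r) * phi_s."""
    bond_price = sum(phi)                 # price of one unit of numeraire for sure
    r = 1.0 / bond_price - 1.0
    Q = [p / bond_price for p in phi]
    return r, Q


def risk_neutral_price(payoff: Sequence[float], r: float,
                       Q: Sequence[float]) -> float:
    """Discounted expectation of the payoff under Q."""
    return sum(qs * v for qs, v in zip(Q, payoff)) / (1.0 + r)
```

With state prices (0.4, 0.5), the bond costs 0.9, so the safe rate is about 11.1%, the risk-neutral probabilities are (4/9, 5/9), and an asset paying (1, 2) is worth 1.4, the same value obtained by weighting its payoffs directly with the state prices.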

Eq. (2.23) confirms Eq. (2.14), obtained in the introductory example of Section 2.5. It says
that the price of any asset is the expectation of its future payoffs, taken under the probability $Q$, discounted at the risk-free interest rate $r$. For this reason, we usually refer to the
probability $Q$ as the risk-neutral probability. Eq. (2.23) can be extended to a dynamic context, as we shall see in later chapters. Intuitively, consider an asset that distributes dividends
in every period, let $q(t)$ be its price at time $t$, and $D(t)$ the dividend paid off at time $t$.
Then, the payoff it promises for the next period is $q(t+1) + D(t+1)$. By Eq. (2.23),
$q(t) = (1 + r)^{-1}E^Q(q(t+1) + D(t+1))$ or, by rearranging terms,

$$E^Q\left[\frac{q(t+1) + D(t+1) - q(t)}{q(t)}\right] = r. \qquad (2.24)$$

That is, the expected return on the asset under $Q$ equals the safe interest rate, $r$. In a dynamic
context, the risk-neutral probability is also referred to as the risk-neutral martingale measure,
or equivalent martingale measure, for the following reason. Define a money market account as
an asset with value evolving over time as $M(t) \equiv (1 + r)^t$. Then, Eq. (2.24) can be rewritten
as $q(t)/M(t) = E^Q[(q(t+1) + D(t+1))/M(t+1)]$. This shows that if $D(t+1) = 0$ for
all $t$, then, the discounted process $q(t)/M(t)$ is a martingale under $Q$.
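Next, let us replace $Q$ into the budget constraint in (2.19), to obtain, for $(c^1 - w^1) \in \langle V\rangle$,

$$0 = c_0 - w_0 + \sum_{s=1}^d \phi_s(c_s - w_s) = c_0 - w_0 + \frac{1}{1+r}\sum_{s=1}^d Q_s(c_s - w_s) = c_0 - w_0 + \frac{1}{1+r}E^Q(c^1 - w^1). \qquad (2.25)$$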
For reasons developed below, it is also useful to derive an alternative representation of the
budget constraint, in terms of the objective probability, $P$ say. Accordingly, we introduce a
state-dependent ratio $\eta$, which indicates how far $Q$ and $P$ are,

$$\eta_s \equiv \frac{Q_s}{P_s}, \quad s = 1, \dots, d.$$

We assume $\eta_s$ is strictly positive and bounded; mathematically, it is tantamount to say that
$Q$ and $P$ are equivalent measures, in that they assign the same weight to the null sets. Define
the stochastic discount factor, $m = (m_s)_{s=1}^d$, as,

$$m_s \equiv \frac{1}{1+r}\eta_s. \qquad (2.26)$$

We have,

$$\frac{1}{1+r}E^Q(c^1 - w^1) = \sum_{s=1}^d \frac{1}{1+r}\eta_s(c_s - w_s)P_s = E[m \cdot (c^1 - w^1)].$$

Hence, we can rewrite Eq. (2.25) as,

$$0 = c_0 - w_0 + E[m \cdot (c^1 - w^1)], \quad (c^1 - w^1) \in \langle V\rangle.$$

Similarly, by replacing the stochastic discount factor $m$ into Eq. (2.23) we obtain,

$$q_i = E(m \cdot v_{\cdot i}), \quad i = 1, \dots, m. \qquad (2.27)$$
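The change of measure behind the stochastic discount factor can be checked numerically. In the sketch below, which relies on hypothetical state prices and objective probabilities, the discount factor is computed state by state as the state price per unit of objective probability, which is the same as the ratio $\eta_s$ discounted at the safe rate; the function names are ours.

```python
from typing import List, Sequence


def stochastic_discount_factor(phi: Sequence[float],
                               P: Sequence[float]) -> List[float]:
    """m_s = phi_s / P_s, i.e. (1 + r)^(-1) * Q_s / P_s with Q_s = (1 + r) * phi_s."""
    return [f / p for f, p in zip(phi, P)]


def expectation(P: Sequence[float], x: Sequence[float]) -> float:
    """Expectation of x under the probabilities P."""
    return sum(p * xi for p, xi in zip(P, x))


def sdf_price(P: Sequence[float], m: Sequence[float],
              payoff: Sequence[float]) -> float:
    """Asset price as the objective expectation of m times the payoff."""
    return expectation(P, [ms * v for ms, v in zip(m, payoff)])
```

With state prices (0.4, 0.5) and equally likely states, the discount factor is (0.8, 1.0), its expectation 0.9 is the price of the pure discount bond, and the asset paying (1, 2) is again worth 1.4.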

We now derive optimality conditions and, then, solve for the equilibrium in this economy.


2.5.3 Equilibrium
In the absence of arbitrage, any agent faces the following program:

$$\max_{(c_0, c^1)}\ u(c_0) + \beta E(\nu(c^1)) \quad \text{subject to} \quad 0 = c_0 - w_0 + E[m \cdot (c^1 - w^1)], \quad (c^1 - w^1) \in \langle V\rangle. \qquad [2.P4]$$

This way of formalizing the agent's problem and constraints is quite convenient, and it helps
understand the nature of incomplete markets, as in the cases illustrated through Figures 2.2
and 2.3. The present section will further illustrate how useful the representation of the program in [2.P4] is, while studying incomplete markets in general, through the so-called min-max
stochastic discounting approach. Magill and Quinzii (1996) contains an extensive analysis of how
this representation helps studying general equilibrium with incomplete markets in quite general
models. Truth be said, the representation in [2.P4] is not the exclusive way to study decision
problems arising in incomplete markets. In some cases, such as those covered in Section 8.5
of Chapter 8, it seems easier to make reference to the initial program in [2.P3]. The choice of
which program to consider, [2.P3] or [2.P4], depends on the problem at hand.

2.5.3.1 Complete markets and risk sharing

In the complete markets case, $\langle V\rangle = \mathbb{R}^d$, so that the first order conditions to the program [2.P4]
are,

$$u'(c_0) = \lambda, \qquad \beta\nu'(c_s) = \lambda m_s, \quad s = 1, \dots, d, \qquad (2.28)$$

where $\lambda$ is a Lagrange multiplier. So, really, the properties of this model are the same as those
of the static model in Section 2.2. Formally, the complete markets economy in this section is the
same as the static economy in Section 2.2, once we set $m = d$, where $m$ is the dimension of the
commodity space, in Section 2.2, and $p_s = \phi_s$, where $p_s$ is the price of the $s$-th commodity in
Section 2.2, with $p_1 = 1$ (the numeraire), and $\phi_s$ is the Arrow-Debreu state price in the unified
budget constraint of Eq. (2.25).
These facts formalize the reasoning made at the beginning of Section 2.3 (see Sections 2.3.1
and 2.3.2): when markets are complete, an economy with uncertainty can be understood through
a static one. Complicated markets with heterogeneous agents, but with potentially interesting
asset pricing implications, and still, apparently, so hopelessly difficult to analyze, can be centralized through a dedicated design of Pareto's weights, as formalized in Theorem 2.7.
These properties are robust to dynamic extensions, as explained in more advanced parts of
these Lectures (see Chapters 4 and 8), provided markets are dynamically complete, a property
explained in the next two chapters. However, the assumption seems unrealistic that agents can
trade Arrow-Debreu securities for all states of the world: one reason financial innovation is in
practice so pervasive is that markets are incomplete. Yet as discussed in Chapter 8, centralizing
markets is a concept that has been extended to economies with incomplete markets,
relying on stochastic Pareto weights.
We now derive the equilibrium implications of the first-order conditions in the simple case of
an economy with a single agent, where the stochastic discount factor is:

$$m_s = \beta\frac{\nu'(w_s)}{u'(w_0)}.$$

The economic interpretation of $m_s$ is the following. In the autarchic case,

$$\phi_s = m_s P_s = \beta\frac{\nu'(w_s)P_s}{u'(w_0)}$$

is the present consumption the agent is willing to give up at $t = 0$, in order to obtain
additional consumption at time $t = 1$, in state $s$. In other words, $\phi_s$ is the price, in terms of the
present consumption numeraire, of one additional unit of consumption at time $t = 1$ and state
$s$. So it is a state price that makes the agent happy to consume his own endowment, without
any incentives left to trade in the financial markets. The risk-neutral probability is,

$$Q_s = \eta_s P_s = (1 + r)\,m_s P_s = (1 + r)\,\beta\frac{\nu'(w_s)}{u'(w_0)}P_s.$$

By the first order conditions, and the expression for the evaluation of a pure discount bond,
$\frac{1}{1+r} = E(m)$, we have that $1 = \sum_{s=1}^d Q_s$. Moreover,

$$\eta_s = \frac{\nu'(w_s)}{E[\nu'(w^1)]}, \qquad (1 + r) = \frac{u'(w_0)}{\beta E[\nu'(w^1)]}.$$
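The single-agent case lends itself to a short numerical sketch. We assume, for illustration only, that both period utilities are of the CRRA type with the same curvature, so that marginal utility is $c^{-\gamma}$; the endowments, the discount factor and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def marginal_utility(c: float, gamma: float) -> float:
    """CRRA marginal utility, c ** (-gamma); an assumption of this sketch."""
    return c ** (-gamma)


def single_agent_pricing(beta: float, gamma: float, w0: float,
                         w1: Sequence[float], P: Sequence[float]
                         ) -> Tuple[List[float], List[float], float, List[float]]:
    """Return (m, phi, r, Q) when the agent consumes the endowment in every state."""
    m = [beta * marginal_utility(ws, gamma) / marginal_utility(w0, gamma)
         for ws in w1]
    phi = [ms * ps for ms, ps in zip(m, P)]      # Arrow-Debreu state prices
    bond_price = sum(phi)                        # equals E(m) = 1 / (1 + r)
    r = 1.0 / bond_price - 1.0
    Q = [f / bond_price for f in phi]            # risk-neutral probabilities
    return m, phi, r, Q
```

For example, with a discount factor of 0.95, curvature 2, a first-period endowment of 1 and equally likely second-period endowments of 0.9 and 1.1, the discount factor is higher in the low-endowment state, and the risk-neutral probability of that state exceeds its objective probability of one half.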

The case with heterogeneous agents is similar provided markets are complete. By the optimality conditions applying to each agent (see Eq. (2.28)), and Eq. (2.26), the marginal rates of
substitution for each agent are:

$$\beta_j\frac{\nu_j'(c_{sj})}{u_j'(c_{0j})} = m_s, \quad s = 1, \dots, d, \quad j = 1, \dots, n. \qquad (2.29)$$

That is, in equilibrium agents do have the same marginal rate of substitution when
markets are complete. It is so because the vector of state prices is unique if and only if markets
are complete (see Theorem 2.9), implying $m_s$ is unique. The marginal rates of substitution are,
then, independent of $j$, and the equilibrium allocation is a Pareto optimum as a result, by the
discussion at the beginning of this section, and Theorem 2.5.
The fact that agents have the same marginal rates of substitution in each state of the world
is known as risk-sharing. It means that, given an initial endowment distribution, the market
mechanism is capable of shifting risks around through a system of complete security markets.
Risks are borne more by the agents most willing to take them.
Suppose for example that two agents have the same utility and discount rate but that the
first agent is less risk-averse than the second, i.e. the CRRAs satisfy: $\gamma_1 < \gamma_2$. Then, by Eq. (2.29),
$\mathrm{Gr}_{s,1}^{-\gamma_1} = \mathrm{Gr}_{s,2}^{-\gamma_2}$, i.e. $\mathrm{Gr}_{s,1} = (\mathrm{Gr}_{s,2})^{\gamma_2/\gamma_1}$,
where $\mathrm{Gr}_{s,j}$ is consumption growth for the $j$-th agent in state $s$. In good times, when
$\mathrm{Gr}_{s,2} > 1$, the more risk-averse agent experiences a lower consumption growth rate ex-post,
$\mathrm{Gr}_{s,2} < \mathrm{Gr}_{s,1}$. However, in bad times, when $\mathrm{Gr}_{s,2} < 1$, the more risk-averse agent experiences a
higher consumption growth rate ex-post, $\mathrm{Gr}_{s,2} > \mathrm{Gr}_{s,1}$, as illustrated by Figure 2.4 in the case
where $\gamma_2/\gamma_1 = 3$. In other words, capital markets, when complete, operate in such a way to
have the more risk-averse agent face a less volatile consumption growth.
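The risk-sharing rule of this example is easy to tabulate. The sketch below assumes, as in the text, that the growth rate of the less risk-averse agent is that of the more risk-averse agent raised to the ratio of the two CRRA coefficients; the function name and the default ratio of 3 are illustrative choices.

```python
def growth_less_risk_averse(gr_more_averse: float, crra_ratio: float = 3.0) -> float:
    """Consumption growth of agent 1 given that of agent 2: Gr_1 = Gr_2 ** ratio."""
    return gr_more_averse ** crra_ratio


def spread(gr_more_averse: float, crra_ratio: float = 3.0) -> float:
    """Gr_1 - Gr_2: positive in good times, negative in bad times."""
    return growth_less_risk_averse(gr_more_averse, crra_ratio) - gr_more_averse
```

When the more risk-averse agent's consumption grows by 10%, the other agent's grows by about 33%; when it falls by 10%, the other agent's falls by about 27%. The less risk-averse agent therefore absorbs most of the fluctuations.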

FIGURE 2.4. Equilibrium consumption growth rates of two agents with different risk-aversion,
$\mathrm{Gr}_1$ and $\mathrm{Gr}_2$, plotted against $\mathrm{Gr}_2$. One line depicts the consumption growth rate of the more risk-averse
agent, and the other that of the less risk-averse agent. The ratio of
the two CRRAs is 3.

Risk-sharing carries another meaning, that of mutuality (Wilson, 1968). Suppose that the
distribution of endowments across the population is heterogeneous, in that there are some
states of the world in which some agents are better off than others. A system of complete
markets would allow the agents to insure each other in a way that they would only bear the
macroeconomic risk, not the idiosyncratic risk. Let us illustrate. By Eq. (2.29),

$$\beta_j\frac{\nu_j'(c_{sj})}{u_j'(c_{0j})} = \beta_k\frac{\nu_k'(c_{sk})}{u_k'(c_{0k})}, \quad \text{for all agents } j, k, \text{ and state } s. \qquad (2.30)$$

Suppose, then, that the aggregate endowment is higher in some state $s$ than in some other
state $s'$. Then, we claim that each agent has a strictly higher consumption in state $s$ than in $s'$. In
particular, this means that in absence of aggregate risk (aggregate endowment being the same in
each state), the agents bear no risk. Indeed, consider two distinct states $s$ and $s'$ and assume that
the aggregate endowment satisfies $\bar w_s > \bar w_{s'}$. Then, there must exist an individual $j$
such that $c_{sj} > c_{s'j}$ and, hence, $\nu_j'(c_{sj}) < \nu_j'(c_{s'j})$. By Eq. (2.30), then,
$\nu_k'(c_{sk}) < \nu_k'(c_{s'k})$ and, hence, $c_{sk} > c_{s'k}$ for any agent $k$. Finally, note that because the utilities $\nu_j(\cdot)$ are state-independent,
the equilibrium distribution of the aggregate endowment does not depend on $s$.
The previous conclusions imply that the equilibrium allocations are state-independent functions of the aggregate endowment,

$$c_{sj} = f_j(\bar w_s), \qquad f_j' > 0,$$

for some strictly increasing functions $f_j(\cdot)$. Whence, the mutuality result mentioned above: the
equilibrium allocations do not depend on the state of the economy if the aggregate endowment
does not vary across states $s$.$^1$ The functions $f_j(\cdot)$ are known as Pareto sharing rules. Huang
1 To illustrate, consider an economy with two agents ($a$ and $b$) and no aggregate risk. Pareto allocations can then be characterized
within an Edgeworth box where the axes indicate consumption in the two states $s'$ and $s$. Note that the 45° line of the box is


and Litzenberger (1988, Chapter 5) explain in detail further properties of these functions. In
particular, () are linear if and only if the utility functions take the following form, ( ) =
1
( + ) , for some constants and .
2.5.3.2 Incomplete markets

Agents' marginal rates of substitution cannot be equal if markets are incomplete, except perhaps on a negligible set of endowment distributions. The best outcome in this case is a set
of equilibria known as constrained Pareto optima, i.e. constrained by ... the states of nature.
It might actually turn out that no constrained Pareto optima could even exist in multiperiod
economies with incomplete markets. [Elaborate, in progress]
When markets are incomplete, the state price vector is not unique. That is, suppose that
$\phi^\top$ is an equilibrium state price. Then, all the elements of

$$\Phi = \{\phi' \in \mathbb{R}^d_{++} : (\phi' - \phi)^\top V = 0\} \qquad (2.31)$$

are also equilibrium state prices. That is, there exist many equilibrium state prices consistent
with absence of arbitrage. Or, in other words, there are many equilibrium state prices consistent
with the same observable asset price vector $q$, for $\phi'^\top V = \phi^\top V = q$.
How do we proceed in this case? Introduce the following budget constraint:

$$C = \left\{(c_0, c^1) \in \mathbb{R}^{d+1}_{++} : 0 = c_0 - w_0 + \phi^\top(c^1 - w^1),\ (c^1 - w^1) \in \langle V\rangle,\ \forall \phi \in \mathbb{R}^d_{++} : q = \phi^\top V\right\}. \qquad (2.32)$$

Because $\Phi$ in Eq. (2.31) has many elements as discussed, the budget constraint $C$ does then
include many constraints to account for in a context with incomplete markets: the martingale
methods in the previous sections do not apply anymore.
Yet let Val$(C)$ be the value of the following program in the incomplete markets at hand:

$$\max_{(c_0, c^1) \in C}\ u(c_0) + \beta E(\nu(c^1)). \qquad [2.PI]$$

Consider, next, the following constraint:

$$C_\phi = \left\{(c_0, c^1) \in \mathbb{R}^{d+1}_{++} : 0 = c_0 - w_0 + \phi^\top(c^1 - w^1)\right\}, \quad \text{for some given } \phi \in \mathbb{R}^d_{++} : q = \phi^\top V,$$

and let Val$(C_\phi)$ be the value of the program in some abstract complete markets case:

$$\max_{(c_0, c^1) \in C_\phi}\ u(c_0) + \beta E(\nu(c^1)). \qquad [2.P_\phi]$$

Clearly, we have, Val$(C_\phi) \ge$ Val$(C)$ for all $\phi$, for the constraint in the incomplete markets
case, $C$, is more stringent than that in any complete market setting, $C_\phi$: the solution to the
program in the incomplete markets case [2.PI], must satisfy the budget constraints in $C$, formed
using all of the possible Arrow-Debreu state prices (including the Arrow-Debreu state price
given in $C_\phi$), as the constraint in Eq. (2.32) shows. Moreover, $(c^1 - w^1) \in \langle V\rangle$. These remarks
suggest to define the following min-max Arrow-Debreu state price:

$$\phi^* = \arg\min_{\phi}\ \mathrm{Val}(C_\phi).$$

the set in which the two agents are perfectly insured across states. Moreover, the agents' indifference curves are tangent along the
0
( )
0
0
0
=
. Because
= 0 ( )
= 0 , this tangency
contract curve (i.e. the set of Pareto allocations), meaning that
condition is met only on the 45o line: Pareto allocations lead to mutuality.



The natural issue is to ascertain whether

$$\mathrm{Val}(C_{\phi^*}) = \mathrm{Val}(C). \qquad (2.33)$$

This is indeed the case, under regularity conditions. For the characterization of $\phi^*$, suppose
there exists $\hat\phi$: Val$(C_{\hat\phi})$ = Val$(C)$. Then, $\hat\phi = \phi^*$. Indeed, suppose the contrary, i.e. there exists
$\phi'$: Val$(C_{\phi'}) <$ Val$(C_{\hat\phi})$. Then, we would have,

$$\mathrm{Val}(C) \le \mathrm{Val}(C_{\phi'}) < \mathrm{Val}(C_{\hat\phi}) = \mathrm{Val}(C),$$

a contradiction. Note, again, this is a characterization result of $\phi^*$, not an existence proof.
But as mentioned earlier, Eq. (2.33) holds true, as shown in a dynamic setting by He and
Pearson (1991). Chapter 4 provides general guidance on an even more general approach to
solving problems of this kind, arising in a broader context with market imperfections, where
incomplete markets arise as a special case.
A final note. The min-max state price $\phi^*$ does not give rise to a pricing probability. Rather,
it is a device to solve programs arising in the context of incomplete markets. Consider, for
example, the standard case in which agents have separable utilities. The counterparts to the
first order conditions in Eqs. (2.28) would be $u_j'(c_{0j}) = \lambda_j$ and $\beta_j\nu_j'(c_{sj}) = \lambda_j m^*_{sj}$, where
$m^*_j$ is the min-max stochastic discount factor for agent $j$: we have as many min-max stochastic
discount factors as agents. An equilibrium would then have to be found by imposing market
clearing. Naturally, a single-agent case is much easier, with the pricing probability being then
the min-max probability, i.e. that resulting from the min-max state price $\phi^*$.
2.5.3.3 Financial innovation

Introduction of derivatives to complete the markets.


2.5.3.4 Computation of the equilibrium

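The first order conditions satisfied by any agent's program are:

$$\hat c_{0j} = I_j(\lambda_j), \qquad \hat c_{sj} = H_j(\beta_j^{-1}\lambda_j m_s), \qquad (2.34)$$

where $I_j$ and $H_j$ denote the inverse functions of $u_j'$ and $\nu_j'$. By the assumptions we made on
$u_j$ and $\nu_j$, $I_j$ and $H_j$ inherit the same properties of $u_j'$ and $\nu_j'$. By replacing these functions into
the constraint,

$$0 = I_j(\lambda_j) - w_{0j} + E\left[m \cdot \left(H_j(\beta_j^{-1}\lambda_j m) - w_j^1\right)\right].$$

Define the function,

$$z_j(\lambda) \equiv I_j(\lambda) + E\left(m H_j(\beta_j^{-1}\lambda m)\right),$$

so that the constraint reads $z_j(\lambda_j) = w_{0j} + E(m \cdot w_j^1)$.
We see that $\lim_{\lambda\to 0} z_j(\lambda) = \infty$, $\lim_{\lambda\to\infty} z_j(\lambda) = 0$ and $z_j'(\lambda) < 0$. Therefore, there exists a unique
solution for $\lambda_j$:

$$\lambda_j = \Lambda_j\left(w_{0j} + E(m \cdot w_j^1)\right),$$

where $\Lambda_j(\cdot)$ denotes the inverse function of $z_j$. By replacing back into Eqs. (2.34), we obtain:

$$\hat c_{0j} = I_j\left(\Lambda_j(w_{0j} + E(m \cdot w_j^1))\right), \qquad \hat c_{sj} = H_j\left(\beta_j^{-1} m_s \Lambda_j(w_{0j} + E(m \cdot w_j^1))\right).$$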

To determine the general equilibrium, we need to pin down the stochastic discounting factor,
$m$. We have $d$ unknowns ($m_s$, $s = 1, \dots, d$), and $d + 1$ equilibrium conditions (holding in the
$d + 1$ markets). By Walras' law, only $d$ of these are independent. Consider the equilibrium
conditions in the $d$ markets at the second period:

$$g_s\left(m_s; (m_{s'})_{s' \ne s}\right) \equiv \sum_{j=1}^n H_j\left(\beta_j^{-1} m_s \Lambda_j(w_{0j} + E(m \cdot w_j^1))\right) = \sum_{j=1}^n w_{sj}, \quad s = 1, \dots, d.$$

These conditions determine the kernel $(m_s)_{s=1}^d$, which leads to compute prices and equilibrium
allocations. Finally, once the optimal $\hat c_s$ are computed, for $s = 0, 1, \dots, d$, the portfolio $\hat\theta$
which generated them can be inferred through $\hat\theta = V^{-1}(\hat c^1 - w^1)$.
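A case in which all the steps above can be carried out in closed form is that of logarithmic utility. The sketch below assumes, purely for illustration, that every agent has $u = \nu = \ln$ and the same discount factor; under these assumptions each agent consumes a fixed share of his wealth in each state, and market clearing pins down the stochastic discount factor as the discount factor times the ratio of aggregate first-period endowment to aggregate endowment in each state. The endowments, probabilities and function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def log_utility_equilibrium(beta: float, P: Sequence[float],
                            w0: Sequence[float],
                            w1: Sequence[Sequence[float]]
                            ) -> Tuple[List[float], List[float], List[List[float]]]:
    """Equilibrium (m, c0, c1) with log utility; w1[j][s] is agent j's endowment in state s."""
    n, d = len(w0), len(P)
    agg0 = sum(w0)
    agg1 = [sum(w1[j][s] for j in range(n)) for s in range(d)]
    # Stochastic discount factor clearing every second-period market.
    m = [beta * agg0 / agg1[s] for s in range(d)]
    # Wealth of each agent: first-period endowment plus the value E(m * w_j^1).
    wealth = [w0[j] + sum(P[s] * m[s] * w1[j][s] for s in range(d))
              for j in range(n)]
    # Log-utility demands: c0 = W / (1 + beta), c_s = beta * W / ((1 + beta) * m_s).
    c0 = [wealth[j] / (1.0 + beta) for j in range(n)]
    c1 = [[beta * wealth[j] / ((1.0 + beta) * m[s]) for s in range(d)]
          for j in range(n)]
    return m, c0, c1
```

With two agents whose second-period endowments are (2, 0) and (0, 1) across two equally likely states, each endowed with one unit in the first period and with a discount factor of 0.9, the equilibrium discount factor is (0.9, 1.8), and each agent consumes one half of the aggregate endowment in every state, an instance of the mutuality principle.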

2.6 Consumption-CAPM
We refer to a theory as consumption-based, if the equilibrium expected returns, or the risk-premiums, are determined through optimal consumption choices. Under certain conditions,
studied in this section, one of its predictions is that these risk-premiums are high for securities
that pay high returns when consumption is high (i.e. when we don't need high returns) and low
returns when consumption is low (i.e. when we need high returns). In fact, while this statement
is quite often used to explain, in a nutshell, the quintessence of the Consumption-CAPM, it
might be misleading in economies where assets are in zero net supply, such as those we studied
in previous sections.
Consider, for example, Eq. (2.27). It is an asset pricing equation, which states that for
every asset delivering a gross return $\tilde R$,

$$1 = E(m \cdot \tilde R), \qquad (2.35)$$

where $m$ is some pricing kernel. Let us elaborate on Eq. (2.35), so as to obtain a representation
of the expected return on any asset. Naturally, for a riskless asset, $1 = E(m \cdot R)$, which combined
with Eq. (2.35) leaves $E[m \cdot (\tilde R - R)] = 0$, and by rearranging terms, with $m$ proportional to the marginal utility of second-period consumption, $\nu'(w^+)$, as recalled below,

$$E(\tilde R) = R - \frac{cov(\nu'(w^+), \tilde R)}{E[\nu'(w^+)]}. \qquad (2.36)$$

Eq. (2.36) can be rewritten as,

$$E(\tilde R) - R = -\frac{cov(m, \tilde R)}{E(m)} = -R \cdot cov(m, \tilde R). \qquad (2.37)$$

We also know from previous sections, that in economies with a single agent, or in economies
with complete markets, this pricing kernel is given by:

$$m = \beta\frac{\nu'(w^+)}{u'(w_0)}.$$

That is, the pricing kernel directly links to optimal consumption choice: we are dealing with a
consumption-based CAPM. However, the risk-premium, $E(\tilde R) - R$, does merely depend on
how the pricing kernel co-varies with the asset returns, and not directly on the cyclical properties
of dividends. For example, were endowments constant in the second period, the risk-premium
would be zero! The next section aims to clarify how we should set the Consumption-CAPM as
an appropriate context to think about the cyclical properties of asset returns.
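The link between risk-premiums and the covariance of returns with the pricing kernel can be verified on a small discrete example. The sketch below uses hypothetical probabilities, kernel values and payoffs; it prices the asset as the expectation of kernel times payoff, and compares the risk-premium computed directly with minus the covariance between the kernel and the gross return, divided by the expected kernel.

```python
from typing import List, Sequence, Tuple


def expectation(P: Sequence[float], x: Sequence[float]) -> float:
    """Expectation of x under the probabilities P."""
    return sum(p * xi for p, xi in zip(P, x))


def covariance(P: Sequence[float], x: Sequence[float],
               y: Sequence[float]) -> float:
    """Covariance of x and y under the probabilities P."""
    ex, ey = expectation(P, x), expectation(P, y)
    return sum(p * (xi - ex) * (yi - ey) for p, xi, yi in zip(P, x, y))


def risk_premium_two_ways(P: Sequence[float], m: Sequence[float],
                          payoff: Sequence[float]) -> Tuple[float, float]:
    """Return (E(R~) - R, -cov(m, R~) / E(m)); the two should coincide."""
    price = expectation(P, [ms * v for ms, v in zip(m, payoff)])
    gross_return: List[float] = [v / price for v in payoff]
    safe_return = 1.0 / expectation(P, m)
    direct = expectation(P, gross_return) - safe_return
    via_covariance = -covariance(P, m, gross_return) / expectation(P, m)
    return direct, via_covariance
```

With equally likely states, a kernel equal to (1.2, 0.7) and a payoff of (1, 2), the asset pays more in the state where the kernel is low, and the two computations deliver the same positive risk-premium of roughly 10%.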
2.6.1 Risk-neutral pricing and macroeconomic risks

Up to now, we have considered assets in zero net supply. We now generalize previous findings to
the relevant case where some assets could be in positive supply, which is a case applying many
times in these Lectures. For simplicity, consider an economy which only has an asset in positive
supply equal to $\theta_0$, which is the endowment of a representative agent. The budget constraints
of the agent over his two periods of life are:
1

and 2 = +

(2.38)

where $q$ and $\theta$ are the risky asset price and demand, $w$ is the initial endowment, $c_1$ is first period
consumption, $w^+$ is the random endowment for the good in the second period, $D$ is the random
dividend promised by the asset in the second period, and $c_2$ is second-period consumption. We
assume that $D \ge 0$.
Note that this model is one with incomplete markets, because the agent can invest using only
one asset, and yet there are two sources of risk: the asset dividend and endowment. We shall
analyze this model in Chapter 8, to explain the extent to which incomplete markets might help
rationalize the equity premium we observe, in practice, within a consumption-based perspective.
In equilibrium, $\theta = \theta_0$, such that the asset price equation is,

$$q = \beta\frac{E[\nu'(w^+ + \theta_0 D) \cdot D]}{u'(w)}. \qquad (2.39)$$

The previous sections have studied the special case where the asset is in zero-net supply,
$\theta_0 = 0$, as mentioned. Eq. (2.39) shows that in general, and assuming $D \ge 0$, the asset price
is decreasing with asset supply, $\theta_0$. It is not a mere supply-demand effect but, rather, a
risk-premium effect, as we now explain.
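The dependence of the asset price on the asset supply can be illustrated numerically. The sketch below assumes CRRA marginal utilities in both periods and evaluates the price as the discounted expectation of second-period marginal utility times the dividend, relative to first-period marginal utility, with second-period consumption equal to the endowment plus the supply times the dividend. All parameter values and the function name are hypothetical.

```python
from typing import Sequence


def asset_price(theta0: float, beta: float, gamma: float, w: float,
                w_plus: Sequence[float], D: Sequence[float],
                P: Sequence[float]) -> float:
    """Price of an asset in supply theta0 under CRRA marginal utility c ** (-gamma)."""
    expected_mu_times_dividend = sum(
        p * (wp + theta0 * dv) ** (-gamma) * dv
        for p, wp, dv in zip(P, w_plus, D)
    )
    return beta * expected_mu_times_dividend / w ** (-gamma)


# Hypothetical economy: two equally likely states, procyclical dividend.
PARAMS = dict(beta=0.95, gamma=2.0, w=1.0,
              w_plus=[0.9, 1.1], D=[0.5, 1.5], P=[0.5, 0.5])
```

In this example the price is about 0.88 when the asset is in zero net supply, and it falls as the supply grows, in line with the claim that the price decreases with the asset supply when dividends are nonnegative.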
We know that the state price for state $s$ is, consistently with results in Section 2.5.3,

$$\phi_s = \beta\frac{\nu'(w_s^+ + \theta_0 D_s)}{u'(w)}P_s. \qquad (2.40)$$

It is, as usual, the present consumption we are willing to give up, today, to obtain additional
consumption tomorrow, in state $s$. Due to decreasing marginal utility, we have that consumption
demand in state $s$, $c_s$, is decreasing in $\phi_s$: $\phi_s = \beta\nu'(c_s)P_s/u'(c_0)$, for $s = 1, \dots, d$, such that $\phi_s$ is
decreasing in $c_s$. Therefore, in equilibrium, $\phi_s$ is decreasing in the good supply in state $s$,
$w_s^+ + \theta_0 D_s$.
The previous scarcity effect, by which a shrinkage in supply for state $s$ determines an
increase in the state price $\phi_s$, is quite simple yet powerful. As we know, we price assets through
Arrow-Debreu assets, by assigning high weights, i.e. by utilizing high Arrow-Debreu prices, to
the bad states of nature, a scarcity effect. Therefore, an increase in the asset supply $\theta_0$, being
uniform across all states of nature, mitigates the previous scarcity effect and, then, lowers the
entire set of Arrow-Debreu prices, thereby reducing the asset price, $q$. In other words, the asset
price decreases because consumption becomes cheaper in each state of the world, due to this
scarcity channel, which requires less demand for savings.
We can approach this problem from a different angle. Note that we can rewrite Eq. (2.39) as
follows,

$$q = \frac{E(D)}{R(\theta_0)} + \beta\frac{cov[\nu'(w^+ + \theta_0 D), D]}{u'(w)}, \qquad R(\theta_0) \equiv \frac{u'(w)}{\beta E[\nu'(w^+ + \theta_0 D)]}. \qquad (2.41)$$

The first term is an actuarial evaluation of the asset. The second term is a risk-premium, which
comes as a discount to the initial actuarial evaluation, given the assumptions of decreasing
marginal utility, and $cov(w^+, D) > 0$. The previous equation reveals that an increase in $\theta_0$ entails
a heavier discounting effect, as the interest rate, $R(\theta_0)$, increases with $\theta_0$, as a result of
a decreased demand for savings. The second term is negative, as explained: the asset pays
off exactly when it is not needed. However, it becomes thinner and thinner as $\theta_0$ increases,
reflecting the fact that as $\theta_0$ increases, and given the assumption that $D \ge 0$, the agent bears
less and less risk: even poor realizations of the states would guarantee handsome overall
returns, when the asset supply is large. As we know from the discussion of Eq. (2.39), however,
the discounting effect dominates, with asset prices falling as $\theta_0$ increases.
Note that the previous reasoning does not apply, once we assume the asset is in zero net
supply, $\theta_0 = \theta = 0$, because as Eq. (2.38) makes clear, capital markets do not affect equilibrium
consumption anyway. Instead, in this case, only a pure endowment scarcity channel leads to
high Arrow-Debreu prices for the bad states of the world: those states where endowments are
lower command higher Arrow-Debreu prices, at least provided that $cov(w^+, D) > 0$, such that
the second term in Eq. (2.41) is negative.$^2$
All in all, Arrow-Debreu prices are independent of the asset returns, when the asset is in
zero net supply. When, instead, $\theta_0 > 0$ and, still, $cov(w^+, D) > 0$, one additional scarcity channel
is activated, which consists in a drop in consumption generated by a drop in the dividend.
Suppose, for example, that we enter bad times, when $w^+$ falls as a result for example of a job
loss. Not only, then, does $w^+$ fall, the dividend also likely falls, and exactly when you would need
it to compensate for the fall in $w^+$. It is in this sense that we may say that asset investing
might make consumption even more volatile.
The second part of these Lectures deals with the literature on the equity premium puzzle,
i.e. the challenges that consumption-based explanations of the equity premium face in being consistent
with the observed equity premium.
2.6.2 The beta relation
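Suppose there is a return $R^m$ such that

$$R^m = -\gamma^{-1}\nu'(w_s^+), \quad \text{all } s.$$

In this case,

$$E(R^m) - R = \gamma\frac{var(R^m)}{E[\nu'(w^+)]} \quad \text{and} \quad E(\tilde R) - R = \gamma\frac{cov(R^m, \tilde R)}{E[\nu'(w^+)]}.$$

These relations can be combined to yield,

$$E(\tilde R) - R = \beta\,[E(R^m) - R], \qquad \beta \equiv \frac{cov(R^m, \tilde R)}{var(R^m)}.$$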

2 Note, finally, a special instance of this model, arising when $\theta_0 = 0$ and the second-period endowment is constant, in which case consumption is not random in the
second period, which means that the agent does not bear any risk. The price is, then, simply, the discounted expected dividend,
$q = E(D)/R$.


2.6.3 CCAPM & CAPM


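Let $R^p$ be the portfolio return which is the most highly correlated with the pricing kernel $m$.
We have,

$$E(R^p) - R = -R \cdot cov(m, R^p). \qquad (2.42)$$

Using Eqs. (2.37) and (2.42),

$$\frac{E(\tilde R) - R}{E(R^p) - R} = \frac{cov(m, \tilde R)}{cov(m, R^p)},$$

and by rearranging terms,

$$E(\tilde R) - R = \frac{\beta_{\tilde R, m}}{\beta_{R^p, m}}\,[E(R^p) - R], \qquad \beta_{x, m} \equiv \frac{cov(m, x)}{var(m)}. \qquad [\text{CCAPM}]$$

If $R^p$ is perfectly correlated with $m$, i.e. if there exists $\gamma$ : $R^p = -\gamma m$, then

$$\beta_{\tilde R, m} = -\gamma\frac{cov(R^p, \tilde R)}{var(R^p)} \quad \text{and} \quad \beta_{R^p, m} = -\gamma,$$

and then

$$E(\tilde R) - R = \beta_{\tilde R, R^p}\,[E(R^p) - R], \qquad \beta_{\tilde R, R^p} \equiv \frac{cov(R^p, \tilde R)}{var(R^p)}. \qquad [\text{CAPM}]$$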
This is not the only way the CAPM obtains. As we shall explain in Chapter 6, the CAPM also
obtains through the so-called maximum correlation portfolio, which is the portfolio that is
the most highly correlated with the pricing kernel $m$.

2.7 Infinite horizon


We consider $d$ states of nature and $m = d$ Arrow securities. We write a unified budget
constraint, as in the valuation equilibria approach of Debreu (1954). We have,

0)

(0) (0)

)=

(0)

(0) (0)

=1

= 1

or,
0( 0

0)

X
=1

(0)

=0

The previous relation holds in a two-period economy. In a multiperiod economy, in the second
period (as in the following periods) agents save indefinitely for the future. In the appendix,
we show that,
"
#
X

0=
(2.43)
0
=0

where 0 are the state prices. From the perspective of time 0, at time there exist states
of nature and, thus, possible prices.


2.8 Further topics on incomplete markets


2.8.1 Nominal assets and real indeterminacy of the equilibrium
( +1)
The equilibrium is a set of prices ( ) R++
R++ such that:
0=

( )

0=

=1

( )

0=

=1

( )

=1

where the previous functions are the results of optimal plans of the agents. This system has
( + 1) + equations and ( + 1) + unknowns, where
. Let us aggregate the
constraints of the agents,

=1

=1

=1

=1

Suppose the nancial markets clearing condition is satised, i.e.


0=

0 0

=1

0 =

( ) ( )
0 0

=1

= 0. Then,

=1

=1

1 1

( )
( )
1 ( 1) 1 ( 1)

=1

=1

( )
1 (

( )
1 (

>
)

Therefore, there is one redundant equation for each state of nature, or + 1 redundant
equations, in total. As a result, the equilibrium has fewer independent equations ( ( + 1) 1)
than unknowns ( ( +1)+ ), i.e., an indeterminacy degree equal to +1. This result does not
rely on whether markets are complete or not. In a sense, it is not even an indeterminacy result
when markets are complete, as we may always assume agents would organize the exchanges
at the beginning. In this case, only the suitably normalized Arrow-Debreu state prices would
matter for agents.
The previous indeterminacy can be reduced to
1, as we may use two additional homogeneity relations. To pin down these relations, let us consider the budget constraint of each agent
,
0 0 =
1 1 =
The first-period constraint is still the same if we multiply the spot price vector 0 and the
financial price vector by a positive constant, (say). In other words, if (0 1 ) is an equilibrium, then, ( 0 1 ) is also an equilibrium, which delivers a first homogeneity relation.
To derive the second homogeneity relation, we multiply the spot prices of the second period by
a positive constant, and increase at the same time the agents' first-period purchasing power,
by dividing each asset price by the same constant, as follows:
0 0

1 1

Therefore, if (0 1 ) is an equilibrium, then, 0 is also an equilibrium.



2.8.2 Nonneutrality of money

The previous indeterminacy arises because financial contracts are nominal, i.e. the asset payoffs
are expressed in terms of some unité de compte that, among other things, we did not make
precise. Such an indeterminacy would vanish if we were to consider real contracts, i.e. contracts
with payoffs expressed in terms of the goods. To show this, note that in the presence of real
contracts, the agents' constraints are

0 0 =
) 1 ( ) = 1( )
= 1
1(
where
= [ 1
] is the
matrix of the real payoffs. The previous constraint
now reveals how to recover + 1 homogeneity relations. For each strictly positive vector
= [ 0 1
], we have that if [0
) 1 ( )] is an equilibrium, then,
1( 1)
1(
[ 0 0 0
(
)

(
)

(
)]
is
also
an
equilibrium,
and
so
is
1
1
1
1
[0
) 1 ( )], for , = 1 .
1( 1)
1(
As is clear, the distinction between nominal and real assets has a precise meaning, when
one considers a multi-commodity economy. Even in this case, however, such a distinction is
not very interesting without a suitable introduction of a unité de compte. These considerations
led Magill and Quinzii (1992) to solve the indeterminacy while still remaining in a framework
with nominal assets. They simply propose to introduce money as a means of exchange. The
indeterminacy can then be resolved by fixing the prices via the + 1 equations defining the
money market equilibrium in all states of nature:
=

= 0 1

=1

Magill and Quinzii showed that the monetary policy ( =0 is generically nonneutral.


2.9 Appendix 1
In this appendix we prove that the program [2.P1] has a unique maximum. Indeed, suppose on the
contrary that we have two maxima:

= (1 ) and = 1
P
P
with
. To check that this
These two maxima would satisfy () = (),P
=1 =
=1 =
. Then, the consumption bundle,
claim is correct, suppose on the contrary that
=1
=

would be preferred to , by Assumption 2.1, and, at the same time, it would hold that, for sufficiently
small ,
X
X
= 1+
.
=1

=1

[Indeed, we have,
.
0: + 1
. E.g., 1 =
,
0. The
=1
condition is then:
0:
.] Hence, would be a solution to [2.P1], thereby contradicting
the optimality of . Therefore, the existence of two optima would imply a full use of resources. Next,
) ,
(0 1). By Assumption 2.1,
consider a point lying between and , viz = + (1

( )=
() = ( )
+ (1
)
Moreover,
X
=1

X
=1

+ (1

=1

=1

=1

Hence,
( ) and is also strictly preferred to and , which means that and
=
are not optima, contrary to what was initially conjectured. This establishes uniqueness of the solution to [2.P1].


2.10 Appendix 2: Proofs of selected results


We first provide a useful result, a well-known theorem on the separation of two convex sets. We use
this theorem to deal with the proof of the second welfare theorem (Theorem 2.4) and the existence of
state prices tying up all asset prices (Theorem 2.10). A final proof we provide in this appendix is that
of Eq. (2.43).
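Minkowski's separation theorem. Let $A$ and $B$ be two non-empty convex subsets of $\mathbb{R}^d$. If $A$ is closed, $B$ is compact and $A \cap B = \emptyset$, then there exists a $\phi \in \mathbb{R}^d$ and two real numbers $d_1$, $d_2$ such that:
$$a^\top \phi \le d_1 < d_2 \le b^\top \phi, \quad \forall a \in A,\ \forall b \in B.$$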
We are now ready to prove Theorems 2.4 and 2.10.

Proof of Theorem 2.4. Let be a Pareto


optimum and =
: (o)
( ) . Let us
n
S
P
and
consider the two sets =
= ( ) =1 :
0
=
.
is the set of all
=1
=1
possible combinations of feasible allocations. By T
the denition of a Pareto optimum, there are no
= . In particular, this is true for all compact
elements in that areTsimultaneously in , or

subsets
of , or
= . Because
is closed, then, by Minkowski's separation theorem,
there exists a
R and two distinct numbers 1 , 2 such that
>

This means that for all allocations

>

preferred to , we have:

=1
>

>

=1

or, by replacing

=1

with

=1

X
=1

,
>

>

=1

(2A.1)

=1

P
, and partition = (1 ). Let us apply
Next we show that
0. Let =
=1 , = 1
. We have 1
0, or
the inequality in (2A.1) to
and, for
0, to = (1 + )
0. By reiterating the argument,
0 for all . Finally, we choose = + 1
= 2 ,
1
> 1 + >1
or,
0 in (2A.1), > 1
> 1

> 1

1
for sufficiently small. This means that 1 ( 1 )
1 ( )
1
>
1
>
1
= . By symmetry, = arg max
arg max 1 1 ( ) s.t.

> 1

> 1 .

( ) s.t.

>

This means that 1 =


= > for all . k

Proof of Theorem 2.9. The condition in (2.18) holds for any compact subset of R++1 , and
therefore it holds when it is restricted to the unit simplex in R++1 ,
h

S = {0} .

By Minkowski's separation theorem, R +1 : >


walking through the simplex boundaries, one nds that 1


> ,

= 1

h i,
S . By
. On the other hand,


+1
0 h i, which reveals that 1 0, and R++
. Next we show that > = 0. Assume the contrary,
h i that satises at the same time > 6= 0. In this case, there would be a real number
i.e.
>
h i and
with sign( ) = sign( > ) such that
2 , a contradiction. Therefore, we have
>
>
>
>
= ( (
) ) = ( 0 + ( ) ) ,
R , where ( ) contains the last components
0=

of . Whence = > , where > = 1 .


0

The proof of the converse is immediate (hint: multiply by ): shown in further notes.
The proof of the second part is the following one. We have that each point of R +1 is equal to
each point of h i plus each point of h i , or dim h i + dim h i = + 1. Since dim h i =
rank( ), dim h i = + 1 dim h i, and since = > in the absence of arbitrage opportunities,
dim h i = dim h i = , whence:
dim h i =
+1

>
>
In other terms, before we showed that :
= 0, or
h i . Whence dim h i
1 in
the absence of arbitrage opportunities. The previous relation provides more information. Specifically,
>
dim h i = 1 if and only if = . In this case, dim{ R++1 :
= 0} = 1, which means that
>

the relation
= 0 also holds true for
=
for every positive scalar , but there are no
0 +

>
1
other possible candidates. Therefore,
= is such that = ( ), and then it is unique.
0
0

>
By a similar reasoning, dim{ R++1 :
= 0} =
+1
dim
R++ : = >
=
.
k
(2)( )

Proof of Eq. (2.43). Let

be the price at

= 2 in state

, for the Arrow-Debreu security promising 1 unit of numeraire in state


(2)(1)

(2)( )

if the state in
at

= 3. Let

(1)( )

= 1 is
(2)
0

]. Let
be the quantity purchased at = 1 in state of Arrow-Debreu securi[ 0
0
ties promising 1 unit of numeraire if at = 2. Let 2 be the price of the good at = 2 in state if
the previous state at = 1 was . The budget constraint is
0( 0
1

0)
1

(0) (0)

(0)( )

(0)( ) (0)( )

=1
(1) (1)

(0)( )

(1)( ) (1)( )

= 1

=1

(1)( )

where is the price to be paid at time 1 and in state , for an Arrow-Debreu security yielding 1
unit of numeraire in state at time 2. The previous two equations can be combined to leave,
i

P (0)( ) h 1 1
(1) (1)
1
+
0( 0
0) =
or,

0=

0( 0

0)

0( 0

0)

0( 0

0)

At time 2,
2

P
P
P

(1)( )

(0)( ) 1
(0)( ) 1
(0)( ) 1

(1)( )

(2) (2)

+
+
+

P
P

(0)( ) P

P P

X
=1


(0)( ) (1) (1)


(1)( ) (1)( )

(0)( ) (1)( ) (1)( )

(2)( ) (2)( )

= 1

(2A.2)

(2A.3)



(2)

where
denotes the price vector to be paid at = 2 in state if the state at = 1 is , for the
Arrow-Debreu securities expiring at = 3, with remaining notation being straightforward.
Plugging Eq. (2A.3) into Eq. (2A.2) leaves:
i
P P (0)( ) (1)( ) h 2 2

P (0)( ) 1 1
(2) (2)
1
2
+
+
0= 0( 0
0) +
P P (0)( ) (1)( ) 2 2
P (0)( ) 1 1
1
2
= 0( 0
(
)
+
0) +
P P P (0)( ) (1)( ) (2)( ) (2)( )
(2A.4)
+
In the absence of arbitrage,
is 0 , such that:

+1

R++ , which is the state prices vector for + 1 if the state in

( )( )
0

+1

= 1

R+ and has all zeros except in the -th component which is 1. Next, we restate the
where
( )
previous relation in terms of the kernel +1 0 = ( +1 0 ) =1 and the probability distribution +1 0 =
(

( )
+1

=1

of the events in + 1 when the state in


( )( )
0

( )
+1

( )
+1

0:

is

= 1

(2A.5)

Eq. (2.43) in the main text follows by replacing Eq. (2A.5) into Eq. (2A.4), and by imposing the
transversality condition:
X XX X
1 =1

2 =1

3 =1

4 =1

=1

(0)( 1 ) (1)( 2 ) (2)( 3 ) (3)( 4 )


1

we get eq. (2.43). k


1)( )
1


2.11 Appendix 3: The multicommodity case


The multicommodity case is interesting, but at the same time is extremely delicate to deal with when
markets are incomplete. While standard regularity conditions ensure the existence of an equilibrium
in the static and complete markets case, only generic existence results are available for the incomplete
markets case. Hart (1974) built well-chosen examples in which there exist sets of endowment
distributions for which no equilibrium can exist. However, Duffie and Shafer (1985) showed that such
sets have zero measure, which justifies the terminology of generic existence.
Here we only provide a derivation of the constraints.
commodities are traded in period ( = 0 1).
The states of nature in the second period are , and the number of traded assets is . The first period
budget constraint is:
0 0 =
0
0
0
(1)

(1)

where 0 = ( 0 0 1 ) is the first period price vector, 0 = ( 0 0 1 )0 is the first period
excess demands vector, = ( 1 ) is the financial asset price vector, and = ( 1 )0 is
the vector of asset quantities that agent buys in the first period.
The second period budget constraint is,
0
1

where
1( 1)
1 2

2)

is the matrix of excess demands,

1(
2

= ( 1(

1)

2 1

1(

..

=
1(

)
2

)) is the matrix of spot prices, and

2 1

1( 1)

1(
1

1)

is the payoffs matrix. We can rewrite the second period constraint as 1 1 = , where 1 is
defined similarly as 0 , and 1 1
( 1 ( 1 ) 1 ( 1 ) 1 ( ) 1 ( ))0 . The budget constraints are
then,
0 0 =
1 1 =
Now suppose that markets are complete, i.e., = and can be inverted. The second constraint
1
=
= . We have
is then:
1
1 . Consider without loss of generality Arrow securities, or
= 1 1 , and by replacing into the rst constraint,
0 =

0 0

0 0

0 0

0 0

1 1

( 1( 1) 1 ( 1)
P
+
1( ) 1 ( )

1(

=1

P1

=1

P1

=1

( ) ( )
0
0

( ) ( )
0
0

=1

P P2

=1 =1

P2

=1

( )
1 (

( )

1 ( )


( )

( )

))0



( )

( )

where 1 ( )
1 ( ). The price to be paid today to obtain a good in state is equal
( )
to the price of an Arrow asset written for state multiplied by the spot price 1 ( ) of this good in this
( )
state; here the Arrow-Debreu state price is 1 ( ). The general equilibrium can be analyzed by making
reference to such state prices. From now on, we simplify and set 1 = 2
. Then we are left with
(1)
( )
(1)
( )
determining ( + 1) equilibrium prices, i.e. 0 = ( 0 0 ), 1 ( 1 ) = (1 ( 1 ) 1 ( 1 )),
(1)
( )
, 1 ( ) = (1 ( ) 1 ( )). By exactly the same arguments of the previous chapter, there
exists one degree of indeterminacy. Therefore, there are only ( + 1) 1 relations that can determine
the ( + 1) prices. (Price normalization can be done by letting one of the first period commodities
be the numeraire.) On the other hand, in the initial economy we have to determine ( + 1) + prices
( +1)
R++ which are the solution to the system:
( ) R++
P

=1

( ) = 0

( ) = 0

=1

( ) = 0

=1

where the previous functions are obtained as solutions to the agents' programs. When we solve for
Arrow-Debreu prices, in a second step we have to determine ( + 1) + prices starting from the
knowledge of ( + 1) 1 relations defining the Arrow-Debreu prices, which implies a price indeterminacy of the initial economy equal to + 1. In fact, it is possible to show that the degree of
indeterminacy is only
1.



References
Arrow, K. J. (1953): Le rôle des valeurs boursières pour la répartition la meilleure des
risques. Économétrie 41-48. CNRS, Paris. Translated and reprinted in 1964: The Role
of Securities in the Optimal Allocation of Risk-Bearing. Review of Economic Studies 31,
91-96.
Bernoulli, D. (1738): Specimen Theoriae Novae de Mensura Sortis. Commentarii Academiae
Scientiarum Imperialis Petropolitanae V, 175-192. Reprinted in English in 1954: Exposition of a New Theory on the Measurement of Risk. Econometrica 22, 23-36.
Debreu, G. (1954): Valuation Equilibrium and Pareto Optimum. Proceedings of the National
Academy of Sciences 40, 588-592.
Debreu, G. (1959): Theory of Value: An Axiomatic Analysis of Economic Equilibrium. New
Haven: Yale University Press.
Duffie, D. (2001): Dynamic Asset Pricing Theory. Princeton: Princeton University Press.
Duffie, D. and W. Shafer (1985): Equilibrium in Incomplete Markets: I. A Basic Model of
Generic Existence. Journal of Mathematical Economics 13, 285-300.
Hart, O. (1974): On the Existence of Equilibrium in a Securities Model. Journal of Economic
Theory 9, 293-311.
He, H. and N. Pearson (1991): Consumption and Portfolio Policies with Incomplete Markets
and Short-Sales Constraints: The Infinite Dimensional Case. Journal of Economic Theory
54, 259-304.
Huang, C-f. and R.H. Litzenberger (1988): Foundations for Financial Economics. New York:
North-Holland.
Magill, M. and M. Quinzii (1996): Theory of Incomplete Markets. Cambridge: MIT Press.
Wilson, R. (1968): The Theory of Syndicates. Econometrica 36, 119-132.


3
Infinite horizon economies

3.1 Introduction
This chapter extends the analysis of the previous two. Consumption is still an important determinant of asset prices. At the same time, this chapter analyzes asset prices in multiperiod
economies. We consider simple economies, in which agents either live forever and have access to
a set of complete markets, or belong to overlapping generations. We consider models without
and with production, without and with money. We aim to develop fundamental tools applied
or extended in subsequent chapters while dealing with financial frictions, bubbles or sunspots.
[In progress]

3.2 Consumption-based asset evaluation


3.2.1 Recursive plans: introduction
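We consider a simple, benchmark case, arising in the absence of any risks for a decision maker.
Consider an agent endowed with initial wealth equal to $w_0$, who solves the following problem:
$$V(w_0) = \max_{(c_t)_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c_t) \quad \text{s.t. } w_{t+1} = (w_t - c_t) R_{t+1}, \quad (R_t)_{t=0}^{\infty} \text{ given} \qquad [3.P1]$$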

This problem can be reformulated in a recursive format:
$$V(w_t) = \max_{c_t} \left[ u(c_t) + \beta V(w_{t+1}) \right] \quad \text{s.t. } w_{t+1} = (w_t - c_t) R_{t+1} \qquad (3.1)$$

By replacing the wealth constraint into the maximand, it is easily checked that the first-order
condition for $c_t$ leads to, $u'(c_t) = \beta V'(w_{t+1}) R_{t+1}$. Therefore, the consumption policy is a function
of both wealth and the interest rate, which for sake of simplicity we denote as $c(w_t)$. The value
function and the first-order condition, then, can be written as:
$$V(w_t) = u(c(w_t)) + \beta V((w_t - c(w_t)) R_{t+1})$$
$$u'(c(w_t)) = \beta V'((w_t - c(w_t)) R_{t+1}) R_{t+1}$$


Differentiating the value function, and using the first-order condition, leaves the envelope
condition:
$$V'(w_t) = u'(c(w_t)) c'(w_t) + \beta V'((w_t - c(w_t)) R_{t+1}) (1 - c'(w_t)) R_{t+1} = u'(c(w_t)) \qquad (3.2)$$
Therefore, $V'(w_{t+1}) = u'(c(w_{t+1}))$ too, and by substituting back into the first-order condition,
$$\beta \frac{u'(c(w_{t+1}))}{u'(c(w_t))} = \frac{1}{R_{t+1}} \qquad (3.3)$$

The economic intuition underlying Eq. (3.3) is the same as that we saw in the two-period
economy analyzed in Chapter 2. Eq. (3.3) says that along an optimal consumption path, the
present consumption I give up at $t$ to obtain one additional unit of consumption at $t+1$ has to equal the
price at $t$ of a pure discount bond. That is, the bond price is the price of consumption
tomorrow relative to consumption today.
We can achieve the same conclusion relying on an alternative approach, based on Lagrange
multipliers. This approach is useful when dealing with more intricate issues relating to production economies or economies with financial frictions, as we shall see in this and further chapters.
So consider the constraint in program [3.P1]. Savings at time $t$ are $\mathrm{sav}_t \equiv w_t - c_t$. Using this
definition, the constraint in [3.P1] is: $c_{t+1} + \mathrm{sav}_{t+1} = R_{t+1}\, \mathrm{sav}_t$, with $\mathrm{sav}_{-1} = w_0$, given. Let
$\lambda_t$ be a sequence of Lagrange multipliers associated to these constraints. Consider the program,
$$\mathcal{L}(\mathrm{sav}_{-1}) = \max_{(c_t,\, \mathrm{sav}_t)_{t=0}^{\infty}} \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) - \sum_{t=0}^{\infty} \beta^t \lambda_t \left( c_t + \mathrm{sav}_t - R_t\, \mathrm{sav}_{t-1} \right) \right]$$
The first-order condition for consumption is, $u'(c_t) = \lambda_t$, and the first-order condition for savings $\mathrm{sav}_t$ leads to: $\lambda_t = \beta \lambda_{t+1} R_{t+1}$. Putting all
together yields precisely Eq. (3.3). Finally, note that the same program can be cast, and solved,
in a recursive format,
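$$\mathcal{L}(\mathrm{sav}_{t-1}) = \max_{c_t,\, \mathrm{sav}_t} \left[ u(c_t) - \lambda_t \left( c_t + \mathrm{sav}_t - R_t\, \mathrm{sav}_{t-1} \right) + \beta \mathcal{L}(\mathrm{sav}_t) \right]$$
The first-order conditions for consumption and savings are $u'(c_t) = \lambda_t$ and $\lambda_t = \beta \mathcal{L}'(\mathrm{sav}_t)$,
respectively. By replacing the first-order condition for $\lambda_t$, i.e. the budget constraint, and differentiating $\mathcal{L}(\mathrm{sav}_{t-1})$, leaves $\mathcal{L}'(\mathrm{sav}_{t-1}) = \beta \mathcal{L}'(\mathrm{sav}_t) R_t$. These conditions lead to Eq. (3.3).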
As a simple example, consider the case of a logarithmic utility function, $u(c) = \ln c$. Let us
guess that the value function is $V(w_t) \equiv V(w_t; R_{t+1}) = a_t + b \ln w_t$. Substituting this guess into the
envelope condition in (3.2) leaves $c(w_t) = b^{-1} w_t$. By Eq. (3.3), then, $w_{t+1} = \beta R_{t+1} w_t$. Comparing
the R.H.S. of this equation with the R.H.S. of the constraint in [3.P1], leaves $c(w_t) = (1 - \beta) w_t$;
in other terms, $b = (1 - \beta)^{-1}$.$^1$
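As a quick numerical check of the logarithmic example, the sketch below simulates the candidate policy of consuming the fraction $1 - \beta$ of current wealth along an arbitrary path of gross interest rates, and evaluates the gap in the Euler equation (3.3), which for $u(c) = \ln c$ reads $\beta c_t / c_{t+1} = 1 / R_{t+1}$. The parameter values and the function name are illustrative assumptions, not part of the text.

```python
def simulate_log_utility(w0, beta, gross_rates):
    """Follow the candidate policy c_t = (1 - beta) * w_t along a path of gross rates.

    gross_rates[t] plays the role of R_{t+1}, the rate linking time t to t + 1.
    """
    wealth, consumption = [w0], []
    for r_next in gross_rates:
        c = (1.0 - beta) * wealth[-1]
        consumption.append(c)
        wealth.append((wealth[-1] - c) * r_next)   # wealth constraint in [3.P1]
    return wealth, consumption


beta = 0.95
rates = [1.02, 1.10, 0.97, 1.05]          # illustrative, time-varying gross rates
wealth, cons = simulate_log_utility(100.0, beta, rates)

# Euler equation (3.3) with log utility: beta * c_t / c_{t+1} - 1 / R_{t+1} should be zero
euler_gaps = [beta * cons[t] / cons[t + 1] - 1.0 / rates[t]
              for t in range(len(cons) - 1)]
```

The gaps vanish up to rounding error whatever the path of interest rates, which is the well-known property that, with logarithmic utility, the consumption-wealth ratio does not depend on the interest rate.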
Next, we introduce uncertainty and develop heuristic details regarding asset evaluation in
this context.
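1 To pin down the coefficient series $a_t$, use the definition of the value function, $V(w_t; R_{t+1}) \equiv u(c(w_t)) + \beta V(w_{t+1}; R_{t+2})$. Plugging $V(w; R) = a + b \log w$, with $b = (1 - \beta)^{-1}$ and, hence, $c(w) = (1 - \beta) w$, into this definition leaves, $a_t = \ln(1 - \beta) + \beta a_{t+1} + \frac{\beta}{1 - \beta} \ln(\beta R_{t+1})$. If $R$ is constant, $a$ is also constant, and equal to $\left( \ln(1 - \beta) + \frac{\beta}{1 - \beta} \ln(\beta R) \right) (1 - \beta)^{-1}$.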



3.2.2 Asset pricing: the marginalist argument

Consider the following thought experiment. At time $t$, I give up a small quantity of consumption equal to $\Delta c_t$. The reduction of utility at $t$ then equals $u'(c_t) \Delta c_t$. But by investing $\Delta c_t$ in a
safe asset, I can consume $\Delta c_{t+1} = R_{t+1} \Delta c_t$ more at $t + 1$. These additional consumption units
lead to an expected utility gain equal to $\beta E_t(u'(c_{t+1}) \Delta c_{t+1})$, where $E_t$ denotes the expectation conditional on time-$t$ information. If $c_t$ and $c_{t+1}$ are part of an optimal consumption plan,
I should be left with no incentives to implement these intertemporal consumption transfers.
Therefore, along an optimal consumption plan, any reductions and gains in the welfare of the
type considered above need to be identical:
$$u'(c_t) = \beta E_t\left( u'(c_{t+1}) R_{t+1} \right)$$
This relation generalizes Eq. (3.3). Next, suppose that at time $t$, $\Delta c_t$ can be invested in
a risky asset whose price is $q_t$. I can buy $\Delta c_t / q_t$ units of this asset. Come time $t + 1$, I
could sell the asset for $q_{t+1}$, pocket its dividend $D_{t+1}$ if any, and finance additional units of
consumption equal to $\Delta c_{t+1} = (\Delta c_t / q_t)(q_{t+1} + D_{t+1})$. The reduction in the current utility
is $u'(c_t) \Delta c_t$. The boost in the expected utility at time $t + 1$ is $\beta E_t(u'(c_{t+1}) \Delta c_{t+1})$. Again, if
I am on an optimal consumption plan, there should not be incentives left to implement these
intertemporal transfers. Therefore, the celebrated Lucas asset pricing equation holds:
$$u'(c_t) = \beta E_t\left[ u'(c_{t+1}) \frac{q_{t+1} + D_{t+1}}{q_t} \right] \qquad (3.4)$$
Section 3.2.4 derives Eq. (3.4) through dynamic programming methods, which are essential
once we wish to work through more complex models such as those including financial frictions.
The next section, instead, elaborates on the optimality condition in Eq. (3.3) and develops
a key concept in both financial economics and macroeconomics: the intertemporal elasticity of
substitution.
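The marginalist argument can be illustrated with a small numerical sketch. Assuming, for illustration only, CRRA marginal utility $u'(c) = c^{-\gamma}$ and a two-state distribution for next period's consumption and payoff, the price that leaves the agent indifferent at the margin solves $u'(c_t) q_t = \beta E_t[u'(c_{t+1}) x_{t+1}]$, where $x_{t+1}$ stands for the payoff, i.e. resale price plus dividend. All names and numbers below are hypothetical.

```python
def marginal_utility(c, gamma):
    """CRRA marginal utility u'(c) = c ** (-gamma); an assumption made for this sketch."""
    return c ** (-gamma)


def euler_price(beta, c_today, states, gamma):
    """Price q solving u'(c_t) * q = beta * E_t[u'(c_{t+1}) * payoff_{t+1}].

    states: list of (probability, consumption_next, payoff_next) triples.
    """
    expected = sum(p * marginal_utility(c1, gamma) * x for p, c1, x in states)
    return beta * expected / marginal_utility(c_today, gamma)


beta, gamma = 0.96, 3.0
# The risky asset pays more when consumption is high (illustrative numbers)
states = [(0.5, 1.10, 1.20), (0.5, 0.95, 0.90)]

risky = euler_price(beta, 1.0, states, gamma)
safe = euler_price(beta, 1.0, [(p, c1, 1.0) for p, c1, _ in states], gamma)
expected_payoff = sum(p * x for p, _, x in states)
discounted_expected_payoff = safe * expected_payoff
```

Because the hypothetical asset pays off mostly when marginal utility is low, its price is lower than its expected payoff discounted at the riskless rate; with $\gamma = 0$ the two coincide.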
3.2.3 Intertemporal elasticity of substitution
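The elasticity of substitution between two consumption goods, $c_i$ and $c_j$, is defined as the
ratio,
$$\mathrm{ES}(c_i, c_j) = -\frac{d(c_i/c_j)}{c_i/c_j} \Big/ \frac{d(p_i/p_j)}{p_i/p_j} = -\frac{d \ln(c_i/c_j)}{d \ln(p_i/p_j)}$$
where $p_i$ and $p_j$ are the prices of the two goods. It measures the percentage change in the
relative consumption choice of two goods after a percentage change in their relative prices.
Similarly, define $\mathrm{EIS}(c_t, c_\tau) \equiv \mathrm{ES}(c_\tau, c_t)$ as the elasticity of intertemporal substitution of
consumption at two points in time $t$ and $\tau > t$. By the first order conditions, $\beta^{\tau - t} u'(c_\tau) / u'(c_t) = b_{t,\tau}$,
$$\mathrm{EIS}(c_t, c_\tau) = -\frac{d(c_\tau/c_t)}{c_\tau/c_t} \Big/ \frac{d(u'(c_\tau)/u'(c_t))}{u'(c_\tau)/u'(c_t)} = -\frac{d \ln(c_\tau/c_t)}{d \ln b_{t,\tau}}$$
where
$$b_{t,\tau} = \beta^{\tau - t} \frac{u'(c_\tau)}{u'(c_t)}$$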


denotes the price of a zero-coupon bond; accordingly, $b_{t,\tau}^{-1}$ denotes the gross interest rate from
$t$ to $\tau$. Note that we are assuming no uncertainty in these basic derivations.
The elasticity $\mathrm{EIS}(c_t, c_\tau)$ is a measure of the percentage increase in the desired consumption
tomorrow relative to today, after a percentage decrease of the price of consumption tomorrow
relative to today. Intuitively, high values of $\mathrm{EIS}(c_t, c_\tau)$ describe a situation where the agent is
quite sensitive about consuming at $t$ and $\tau$: even a small increase in the interest rate from $t$
to $\tau$ and, hence, a small percentage drop in $b_{t,\tau}$, can induce him to a substantial relative increase
of consumption in the future.
As $\tau \to t$, $\mathrm{EIS}(c_t, c_\tau)$ collapses to the inverse of the elasticity of marginal utility with respect
to consumption or, simply, the relative risk-aversion:
$$\frac{1}{\mathrm{EIS}(c_t)} \equiv \lim_{\tau \to t} \frac{1}{\mathrm{EIS}(c_t, c_\tau)} = -\lim_{\tau \to t} \frac{d \ln(u'(c_\tau)/u'(c_t))}{d \ln(c_\tau/c_t)} = -\frac{c_t u''(c_t)}{u'(c_t)}$$
where the second equality follows by a first-order Taylor expansion of the marginal utility
of consumption at time $\tau$, $u'(c_\tau) = u'(c_t) + u''(c_t)(c_\tau - c_t) + O((c_\tau - c_t)^2)$. The expression,
$\mathrm{EIS}(c_t)$, is called instantaneous elasticity of intertemporal substitution.
For example, in the CRRA case, and again in the deterministic case, we have that along an
optimal consumption path, $c_{t+1} / c_t = (\beta R_{t+1})^{1/\eta}$, where $\eta$ is the CRRA: as $R$ increases, it becomes
more attractive to save and postpone consumption. In a stylized equilibrium with a representative agent, $\ln R = -\ln \beta + \eta g$, where $g$ denotes the growth rate of the economy. When $g$ is
high, more consumption will be available in the future, which mitigates the incentives to save,
driving the interest rate up.
An agent with a low EIS has a quite inelastic demand for bonds. Intuitively, when the price
of consumption in the future relative to today, $b$, drops, desired consumption tomorrow relative
to today increases. But for an agent with a low EIS, the desired relative increase in future
consumption is quite limited, and so is his demand for bonds, the instruments that allow him
to allocate intertemporal consumption.
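The link between the elasticity of intertemporal substitution and relative risk-aversion can be checked numerically. The sketch below assumes CRRA utility with coefficient $\eta$ and no uncertainty, so that the first-order condition ties the bond price to the consumption ratio through $b = \beta (c_\tau / c_t)^{-\eta}$; it then computes minus the log-derivative of the optimal consumption ratio with respect to the bond price by central finite differences. The function names and parameter values are illustrative assumptions.

```python
import math


def consumption_ratio(bond_price, beta, eta):
    """Optimal c_tau / c_t under CRRA utility, from bond_price = beta * ratio ** (-eta)."""
    return (bond_price / beta) ** (-1.0 / eta)


def eis(bond_price, beta, eta, h=1e-6):
    """Minus d ln(c_tau / c_t) / d ln(bond_price), by a central finite difference."""
    up = math.log(consumption_ratio(bond_price * math.exp(h), beta, eta))
    down = math.log(consumption_ratio(bond_price * math.exp(-h), beta, eta))
    return -(up - down) / (2.0 * h)


eis_low_risk_aversion = eis(0.95, 0.96, 2.0)    # expected to be close to 1 / 2
eis_high_risk_aversion = eis(0.95, 0.96, 5.0)   # expected to be close to 1 / 5
```

Under these assumptions the computed elasticity equals $1/\eta$ irrespective of the level of the bond price, consistent with the statement that a highly risk-averse CRRA agent has a quite inelastic demand for bonds.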
3.2.4 Lucas model
3.2.4.1 The optimality condition

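We consider markets for $m$ trees, and assume that the only source of risk stems from the
dividends related to these trees: $D = (D_1, \ldots, D_m)$. We assume $D$ is a Markov process and
denote its conditional distribution function with $P(D_{t+1} | D_t)$. A representative agent solves
the following program:
$$V(\theta_t, D_t) = \max_{(c_{t+\tau},\, \theta_{t+\tau+1})_{\tau=0}^{\infty}} E\left[ \sum_{\tau=0}^{\infty} \beta^{\tau} u(c_{t+\tau}) \,\Big|\, \mathcal{F}_t \right] \quad \text{s.t. } c_{t+\tau} + q_{t+\tau} \theta_{t+\tau+1} = (q_{t+\tau} + D_{t+\tau}) \theta_{t+\tau} \qquad [3.P2]$$
where $\mathcal{F}_t$ denotes the information set as of time $t$, $\theta_{t+1} \in \mathbb{R}^m$ is $\mathcal{F}_t$-measurable, that is, $\theta_{t+1}$ needs
to be chosen at time $t$. We can solve the program [3.P2], using the same recursive approach in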


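Section 3.2.1, once due account is made of uncertainty. The Bellman equation is:
$$V(\theta_t, D_t) = \max_{c_t,\, \theta_{t+1}} E\left[ u(c_t) + \beta V(\theta_{t+1}, D_{t+1}) \,|\, \mathcal{F}_t \right] \quad \text{s.t. } c_t + q_t \theta_{t+1} = (q_t + D_t) \theta_t \qquad (3.5)$$
Similarly as we did for Eq. (3.1), let us replace the budget constraint into the maximand. The
following first-order condition holds for $\theta_{i,t+1}$:
$$0 = E\left[ -u'(c_t) q_{i,t} + \beta V_{1i}(\theta_{t+1}, D_{t+1}) \,|\, \mathcal{F}_t \right] \qquad (3.6)$$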

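where the subscript in the value function on the right hand side denotes a partial derivative:
$V_{1i}(\theta, D) \equiv \partial V(\theta, D) / \partial \theta_i$. The optimal policy, $\theta_{t+1}$, is a function of the current state, $(\theta_t, D_t)$,
say $\theta_{t+1} = \mathcal{T}(\theta_t, D_t)$. Differentiating the value function with respect to $\theta_{i,t}$, and using the
previous first-order condition, leaves:
$$V_{1i}(\theta_t, D_t) = E_t\left[ u'(c_t) \left( q_{i,t} + D_{i,t} - \sum_{j=1}^{m} q_{j,t} \mathcal{T}^j_{1i}(\theta_t, D_t) \right) + \beta \sum_{j=1}^{m} V_{1j}(\theta_{t+1}, D_{t+1}) \mathcal{T}^j_{1i}(\theta_t, D_t) \right] = u'(c_t)(q_{i,t} + D_{i,t})$$
where for brevity, we use $E_t$ to denote the expectation operator conditional upon $\mathcal{F}_t$, and
we have defined $\mathcal{T}^j_{1i}(\theta, D) \equiv \partial \mathcal{T}^j(\theta, D) / \partial \theta_i$ and $\mathcal{T}^j$ is the $j$-th component of the vector $\mathcal{T}$.
Substituting this result into Eq. (3.6) yields precisely the Lucas equation (3.4), holding for each
asset $i$:
$$u'(c_t) = \beta E_t\left[ u'(c_{t+1}) \frac{q_{i,t+1} + D_{i,t+1}}{q_{i,t}} \right] \qquad (3.7)$$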
It is easy to extend these conditions to the case where a representative agent can
also invest into a locally riskless asset, that is, an asset that expires over the next period. The
budget constraint in Eq. (3.5) is, in this case: +
+ ) + 0 , where
+1 + 0 0 +1 = (
denotes
the
amount
of
the
locally
riskless
asset,
,
and
is
the riskless interest
0
0
0
rate, and the Lucas equation for the would be:
=
[ ( +1 )].
3.2.4.2 Rational expectations equilibrium

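The asset market clears when for each $t$, $\theta_t = 1_m$ and $\theta^0_t = 0$. By the budget constraint,
then, the market for goods also clears, $c_t = \sum_{i=1}^{m} D_{i,t} \equiv \bar{D}_t$. A rational expectation equilibrium
is a sequence of asset prices $(q_t)_{t=0}^{\infty}$ such that the optimality condition in Eq. (3.7) holds, the
markets clear, $c_t = \bar{D}_t$, and each asset price is a function of the state, $q_{i,t} = q_i(D_t)$ say. All in
all,
$$u'(\bar{D}_t) q_i(D_t) = \beta \int u'(\bar{D}_{t+1}) \left[ q_i(D_{t+1}) + D_{i,t+1} \right] dP(D_{t+1} | D_t) \qquad (3.8)$$
This is a functional equation in $q_i(\cdot)$. Let us focus, first, on the IID case: $P(D_{t+1} | D_t) = P(D_{t+1})$.

IID shocks

Eq. (3.8) simplifies to:
$$u'(\bar{D}_t) q_i(D_t) = \beta \int u'(\bar{D}_{t+1}) \left[ q_i(D_{t+1}) + D_{i,t+1} \right] dP(D_{t+1}) \qquad (3.9)$$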

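Note that the right hand side of this equation is independent of $D_t$. Therefore, $u'(\bar{D}_t) q_i(D_t)$
equals some constant $\kappa_i$ (say), which we can easily find by substituting it back into the previous
equation, leaving:
$$\kappa_i = \frac{\beta}{1 - \beta} \int u'(\bar{D}_{t+1}) D_{i,t+1} \, dP(D_{t+1})$$
Therefore, the solution for $q_i(D)$ is:
$$q_i(D) = \frac{\kappa_i}{u'(\bar{D})}$$
where $\bar{D}$ denotes the aggregate dividend. Note, the elasticity of the price with respect to the dividend $D_j$ equals $-D_j u''(\bar{D}) / u'(\bar{D})$, which collapses to relative risk-aversion, once we assume only one tree exists. For example, if relative risk-aversion is constant
and equal to $\eta$, and there is a single tree,
$$q(D) = \kappa_\eta D^{\eta}, \qquad \kappa_\eta \equiv \frac{\beta}{1 - \beta} \int x^{1 - \eta} \, dP(x)$$
Figure 3.1 depicts the behavior of the asset price function $q(D)$, under the assumption that
$\kappa_\eta$ is not increasing in $\eta$. The asset price collapses to the constant, $\beta (1 - \beta)^{-1} E(D)$, in the special
case where the representative agent is risk-neutral, $\eta = 0$.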

Figure 3.1. The asset pricing function $q(D)$ in the IID case and constant relative risk-aversion, equal to $\eta$.
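The closed-form solution for the IID case can be verified numerically. The sketch below assumes a single tree, CRRA marginal utility $u'(D) = D^{-\eta}$ and a hypothetical three-point dividend distribution; it computes the constant $\kappa = \frac{\beta}{1 - \beta} E[D^{1 - \eta}]$, builds the price function $q(D) = \kappa D^{\eta}$, and checks that $u'(D) q(D)$ equals $\beta E[u'(D')(q(D') + D')]$ in every state. Names and numbers are illustrative assumptions.

```python
def iid_price_function(beta, eta, dividends, probs):
    """Return (q, kappa) with q(D) = kappa * D ** eta, for one tree and IID dividends."""
    kappa = beta / (1.0 - beta) * sum(p * d ** (1.0 - eta)
                                      for d, p in zip(dividends, probs))

    def q(d):
        return kappa * d ** eta

    return q, kappa


beta, eta = 0.95, 2.0
dividends, probs = [0.8, 1.0, 1.3], [0.3, 0.4, 0.3]   # hypothetical IID distribution
q, kappa = iid_price_function(beta, eta, dividends, probs)

# Left-hand side u'(D) q(D): the same constant in every state
lhs = [d ** (-eta) * q(d) for d in dividends]
# Right-hand side beta * E[u'(D') (q(D') + D')], which does not depend on today's state
rhs = beta * sum(p * d ** (-eta) * (q(d) + d) for d, p in zip(dividends, probs))

# Risk-neutral benchmark (eta = 0): the price is the constant beta / (1 - beta) * E(D)
q_rn, kappa_rn = iid_price_function(beta, 0.0, dividends, probs)
```

With $\eta > 0$ the price is increasing in the current dividend, since a high dividend today lowers current marginal utility while leaving expected discounted future payoffs unchanged.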

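Dependent shocks

Define $g_i(D) \equiv u'(\bar{D}) q_i(D)$ and $h_i(D) \equiv \beta \int u'(\bar{D}_{t+1}) D_{i,t+1} \, dP(D_{t+1} | D)$, where $\bar{D}$ denotes the aggregate dividend. In terms of these new
functions, Eq. (3.8) is:
$$g_i(D) = h_i(D) + \beta \int g_i(D_{t+1}) \, dP(D_{t+1} | D)$$
It is a functional equation in $g_i$, which can be shown to admit a unique solution, under the
conditions contained in the celebrated Blackwell's theorem below:
Theorem 3.1. Let $\mathcal{B}(X)$ be the Banach space of continuous bounded real functions on $X \subseteq \mathbb{R}^n$
endowed with the norm $\|f\| = \sup_X |f|$, $f \in \mathcal{B}(X)$. Introduce an operator $T: \mathcal{B}(X) \mapsto \mathcal{B}(X)$
with the following properties:
(i) $T$ is monotone: $\forall x \in X$ and $f_1, f_2 \in \mathcal{B}(X)$, $f_1(x) \le f_2(x) \Rightarrow T[f_1](x) \le T[f_2](x)$;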



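(ii) $\forall x \in X$ and $\forall c \ge 0$, $\exists \beta \in (0, 1)$: $T[f + c](x) \le T[f](x) + \beta c$.
Then, $T$ is a $\beta$-contraction and, $\forall f_0 \in \mathcal{B}(X)$, it has a unique fixed point $\lim_{\tau \to \infty} T^{\tau}[f_0] = f = T[f]$.

So let us introduce the following operator, where $g_i$ and $h_i$ are the two functions defined above:
$$T[g_i](D) = h_i(D) + \beta \int g_i(D_{t+1}) \, dP(D_{t+1} | D)$$
The existence of $g_i$ and, hence, $q_i$, relies on the existence of a fixed point of $T$: $g_i = T[g_i]$.
It is easily checked that conditions (i) and (ii) in Theorem 3.1 hold here. To establish that
$T: \mathcal{B}(X) \mapsto \mathcal{B}(X)$ as well, it is sufficient to show that $h_i \in \mathcal{B}(X)$. A sufficient condition given
by Lucas (1978) is that $u$ is bounded, and bounded above by a constant $\bar{u}$.$^2$ Note, a log-utility
agent would not satisfy this condition, yet, this case can be easily solved in the case of a single
tree, as shown next.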
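The contraction argument suggests a simple numerical procedure: start from an arbitrary guess and apply the operator repeatedly until the iterates stop changing. The sketch below does so under assumptions that are not in the text: a single tree, CRRA marginal utility $u'(D) = D^{-\eta}$, and a finite-state Markov chain for the dividend with a hypothetical transition matrix. On a finite state space the integral becomes a matrix-vector product, and the operator reads $g \mapsto h + \beta P g$, with $g(D) = u'(D) q(D)$ and $h(D) = \beta E[u'(D') D' | D]$.

```python
def solve_price(beta, eta, dividends, transition, tol=1e-12, max_iter=100_000):
    """Iterate g <- h + beta * P g to its fixed point and return prices q = g / u'(D)."""
    n = len(dividends)
    # h_i = beta * sum_j P_ij * u'(D_j) * D_j, with u'(D) = D ** (-eta)
    h = [beta * sum(transition[i][j] * dividends[j] ** (1.0 - eta) for j in range(n))
         for i in range(n)]
    g = [0.0] * n                      # arbitrary bounded starting guess
    for _ in range(max_iter):
        g_new = [h[i] + beta * sum(transition[i][j] * g[j] for j in range(n))
                 for i in range(n)]
        gap = max(abs(a - b) for a, b in zip(g_new, g))
        g = g_new
        if gap < tol:                  # successive iterates are (numerically) identical
            break
    return [g[i] * dividends[i] ** eta for i in range(n)]


beta = 0.95
dividends = [0.9, 1.1]                       # hypothetical two-state dividend process
transition = [[0.8, 0.2], [0.3, 0.7]]        # rows sum to one

prices_log = solve_price(beta, 1.0, dividends, transition)    # log utility
prices_crra = solve_price(beta, 2.0, dividends, transition)   # CRRA equal to 2
```

Since the operator contracts at rate $\beta$, the number of iterations needed grows as $\beta$ approaches one. In the log-utility case the resulting price-dividend ratio should be the same in both states.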
Suppose, then, that $u(x) = \ln x$, and that there is one single asset, such that Eq. (3.8) collapses
to
$$\frac{q(D_t)}{D_t} = \beta \int \left[ \frac{q(D_{t+1})}{D_{t+1}} + 1 \right] dP(D_{t+1} | D_t)$$
The solution to this equation is a constant price-dividend ratio,
$$\frac{q(D)}{D} = \frac{\beta}{1 - \beta}$$
Note that this result does not depend on any distribution assumption on the dividend process.
However, in the general CRRA case, little more can be said, not even in the single asset case.
Indeed, by Eq. (3.8),
$$\frac{q(D_t)}{D_t} = \beta \int \left( \frac{D_{t+1}}{D_t} \right)^{1 - \eta} \left[ \frac{q(D_{t+1})}{D_{t+1}} + 1 \right] dP(D_{t+1} | D_t)$$
It is easily seen that, whenever the distribution of the consumption endowment growth rate is
independent of $D_t$, the solution to this functional equation is the constant price-dividend ratio:
$$\frac{q(D_t)}{D_t} = \frac{\beta \int (D_{t+1}/D_t)^{1 - \eta} \, dP(D_{t+1} | D_t)}{1 - \beta \int (D_{t+1}/D_t)^{1 - \eta} \, dP(D_{t+1} | D_t)}$$
In Chapter 6 of Part II, we develop this case in
more detail, assuming a log-normal distribution for $D_{t+1}/D_t$.
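As an illustration of the constant price-dividend ratio, suppose (this is an assumption of the sketch, anticipating the log-normal case mentioned above) that dividend growth is IID with $\ln(D_{t+1}/D_t) \sim N(\mu, \sigma^2)$. Then $E[(D_{t+1}/D_t)^{1 - \eta}] = \exp((1 - \eta)\mu + \frac{1}{2}(1 - \eta)^2 \sigma^2)$, and the ratio follows from the fixed-point condition $k = \beta G (k + 1)$, with $G$ denoting that moment. The function name and parameter values are hypothetical.

```python
import math


def price_dividend_ratio(beta, eta, mu, sigma):
    """Constant q / D when log dividend growth is IID normal with mean mu and std sigma."""
    growth_moment = math.exp((1.0 - eta) * mu + 0.5 * (1.0 - eta) ** 2 * sigma ** 2)
    discounted = beta * growth_moment
    if discounted >= 1.0:
        raise ValueError("beta * E[(D'/D)^(1 - eta)] must be below one for a finite price")
    return discounted / (1.0 - discounted)


pd_log = price_dividend_ratio(0.95, 1.0, 0.02, 0.03)    # log utility: beta / (1 - beta)
pd_crra = price_dividend_ratio(0.95, 3.0, 0.02, 0.03)   # more risk-averse agent
```

With $\eta = 1$ the growth moment equals one and the ratio reduces to $\beta / (1 - \beta)$, whatever the values of $\mu$ and $\sigma$.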
3.2.4.3 Arrow-Debreu state prices

The model makes sharp predictions regarding Arrow-Debreu securities prices. In terms of the
terminology of Chapter 2, we want to identify the stochastic discounting factor. By the asset
pricing equation (3.9), it is
$$m_{t+1} = \beta \frac{u'(\bar{D}_{t+1})}{u'(\bar{D}_t)}$$
where $\bar{D}$ denotes the aggregate dividend.
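2 In this case, concavity of $u$ implies that for each $x$, $0 = u(0) \le u(x) + u'(x)(0 - x) = u(x) - x u'(x)$, which implies that
for each $x$, $x u'(x) \le u(x) \le \bar{u}$ and, hence, $h_i(x) \le \beta \bar{u}$. Then, it is possible to show that the solution is in $\mathcal{B}(X)$, which implies that
$T: \mathcal{B}(X) \mapsto \mathcal{B}(X)$.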


Under regularity conditions, the Radon-Nikodym derivative of the risk-neutral probability,
$Q$, with respect to $P$, is:
$$\frac{dQ(D_{t+1} | D_t)}{dP(D_{t+1} | D_t)} = \frac{u'(\bar{D}_{t+1})}{E[u'(\bar{D}_{t+1}) | D_t]}$$
such that the Arrow-Debreu state-price
density is: $m_{t+1} \, dP(D_{t+1} | D_t) = R_t^{-1} \, dQ(D_{t+1} | D_t)$, where $R_t^{-1} = E[m_{t+1} | D_t]$ and $m_{t+1} = \beta u'(\bar{D}_{t+1}) / u'(\bar{D}_t)$ is the stochastic discounting factor. It is the price to pay in state $D_t$ to obtain
one unit of the good the next period when the state is $D_{t+1}$. These conclusions generalize
those of Chapter 2 to the infinite horizon case. Parts II and III contain many more examples
of stochastic discounting factors that are useful to address empirical puzzles or applied issues
arising in both equity and fixed income markets.
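A discrete-state sketch may help fix ideas about state prices and risk-neutral probabilities. Assuming, for illustration only, a single tree, CRRA marginal utility $u'(D) = D^{-\eta}$ and a hypothetical three-point distribution for next period's dividend, the Arrow-Debreu price of a state is $\beta \times$ probability $\times\, u'(D_{t+1}) / u'(D_t)$; the prices sum to the price of a riskless bond, and dividing by that sum gives the risk-neutral probabilities.

```python
def state_prices(beta, eta, d_today, dividends, probs):
    """Arrow-Debreu prices beta * p * u'(D') / u'(D), with u'(D) = D ** (-eta)."""
    return [beta * p * (d / d_today) ** (-eta) for d, p in zip(dividends, probs)]


beta, eta = 0.95, 2.0
dividends, probs = [0.8, 1.0, 1.3], [0.3, 0.4, 0.3]   # hypothetical next-period states

prices = state_prices(beta, eta, 1.0, dividends, probs)
bond_price = sum(prices)                            # price of one unit of the good in every state
risk_neutral = [x / bond_price for x in prices]     # probabilities under Q
gross_riskless_rate = 1.0 / bond_price
```

Relative to the physical probabilities, the risk-neutral ones put more weight on the low-dividend state, where marginal utility is high, and less on the high-dividend state.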
3.2.4.4 Multiple trees

The Lucas model is extraordinarily complex as, in general, the price of any asset depends on the
dividends paid by all the remaining assets, as Eq. (3.8) makes clear. The model can generate
contagion, in that a shock in the fundamentals affecting some assets affects the evaluation of all the other
assets, even when the dividends are not correlated. It is an interesting property, due to the
simple circumstance that there is a representative agent who is pricing all the assets (markets
are not segmented), so that a shock to the stochastic discounting factor, $m_{t+1}$, affects all the asset
prices. We mention efforts made by the literature, discussed in deeper detail in Chapter
8 (Section 8.10): Menzly, Santos and Veronesi (2004), Cochrane, Longstaff and Santa-Clara
(2008), Pavlova and Rigobon (2008), Martin (2011).

3.3 Production: foundational issues


In the economy of the previous section, asset payoffs are given exogenously. We now lay down
analytical foundations regarding production-based economies, in which firms maximize their
value and set dividends endogenously. In these economies, production and capital accumulation
are endogenous. In this section, we review the foundational issues arising in economies with
productive capital. In the next section, we develop asset pricing implications of these economies
in absence of frictions. In Part II of these Lectures (Chapter 8), we extend the framework in
this chapter and examine asset price implications of markets plagued with financial frictions.
3.3.1 Decentralized economy
A continuum of identical firms in $(0,1)$ have access to capital and labor markets, and to the following technology: $(K,N)\mapsto Y(K,N)$, where $Y_i(K,N)>0$, $Y_{ii}(K,N)<0$, $\lim_{K\to0^+}Y_1(K,N)=\lim_{N\to0^+}Y_2(K,N)=\infty$, $\lim_{K\to\infty}Y_1(K,N)=\lim_{N\to\infty}Y_2(K,N)=0$, and subscripts denote partial derivatives. We assume $Y$ is homogeneous of degree one, i.e. $Y(\lambda K,\lambda N)=\lambda Y(K,N)$ for all $\lambda>0$. Per capita production is $y(k)\equiv Y(k,1)$, where $k\equiv K/N$ is per-capita capital. Population growth can be non-zero, i.e. $N_t$ satisfies $N_t/N_{t-1}=(1+n)$. Firms purchase capital and labor at prices $R=Y_1(K,N)$ and $w=Y_2(K,N)$. We have,
$$R=y'(k),\qquad w=y(k)-k\,y'(k).$$

The $N$ consumers live forever. We assume each consumer offers inelastically one unit of labor and that, for now, $N_0=1$ and $n=0$. The resource constraint for the consumer is:
$$c_t+s_t=R_ts_{t-1}+w_t,\qquad t=1,2,\cdots \tag{3.10}$$
At each time $t-1$, the consumer saves $s_{t-1}$ units of capital, which he lends to the firm. At time $t$, the consumer receives the gross return on savings from the firm, $R_ts_{t-1}$, where $R_t=y'(k_t)$, plus the wage receipts $w_t$. Then, he uses these resources to consume $c_t$ and lend $s_t$ to the firm. At time zero, $c_0+s_0=V_0\equiv Y_1(K_0,N_0)K_0+w_0$.

Following the approach developed in Chapter 2, we can write down a single budget constraint, obtained iterating Eq. (3.10):
$$0=c_0+\sum_{t=1}^{T}\frac{c_t-w_t}{\prod_{i=1}^{t}R_i}+\frac{s_T}{\prod_{i=1}^{T}R_i}-V_0,$$
and imposing the transversality condition:
$$\lim_{T\to\infty}\frac{s_T}{\prod_{i=1}^{T}R_i}=0 \tag{3.11}$$

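so as to have:
$$\max_{(c_t)_{t=0}^{\infty}}\sum_{t=0}^{\infty}\beta^tu(c_t)\quad\text{s.t.}\quad V_0+\sum_{t=1}^{\infty}\frac{w_t}{\prod_{i=1}^{t}R_i}=c_0+\sum_{t=1}^{\infty}\frac{c_t}{\prod_{i=1}^{t}R_i} \tag*{[3.P3]}$$
The economic interpretation of the transversality condition (3.11) is the following. The first-order conditions of the program [3.P3] are:
$$\beta^tu'(c_t)=\frac{\ell}{\prod_{i=1}^{t}R_i} \tag{3.12}$$
where $\ell$ is a Lagrange multiplier. In equilibrium, current savings equal next period capital, or $k_{t+1}=s_t$. Therefore, Eq. (3.11) is:
$$\lim_{t\to\infty}\beta^tu'(c_t)\,k_{t+1}=0 \tag{3.13}$$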

That is, capital, valued at discounted marginal utility, is eventually worthless. We shall derive Eq. (3.13) below.
The first-order condition (3.12) leads to the usual optimality condition in Eq. (3.3), where this time, $R_{t+1}=y'(k_{t+1})$. In this economy, an equilibrium is a sequence $((c_t,k_t))_{t=0}^{\infty}$ satisfying
$$k_{t+1}=y(k_t)-c_t,\qquad \beta\,\frac{u'(c_{t+1})}{u'(c_t)}=\frac{1}{y'(k_{t+1})} \tag{3.14}$$
and the transversality condition in Eq. (3.13). The first equation in this system is simply this: capital available for producing the next period, $k_{t+1}$, is equal to savings, $y(k_t)-c_t$.
3.3.2 The social planner solution
3.3.2.1 Recursive plan

The market solution in (3.14) can be implemented by a social planner, who solves the following program:
$$V(k_0)=\max_{(c_t,k_{t+1})_{t=0}^{\infty}}\sum_{t=0}^{\infty}\beta^tu(c_t)\quad\text{s.t.}\quad k_{t+1}=y(k_t)-c_t,\quad k_0\ \text{given} \tag*{[3.P4]}$$
under the further transversality condition in Eq. (3.13).

The program in [3.P4] is easily solved. Replacing the constraint into the utility function, and taking derivatives with respect to $k_{t+1}$, leads directly to the second equation in (3.14).
Alternatively, let us introduce the Lagrangian,
$$\mathcal L(k_0)=\max_{(c_t,k_{t+1})_{t=0}^{\infty}}\sum_{t=0}^{\infty}\left[\beta^tu(c_t)-\lambda_t\left(k_{t+1}-y(k_t)+c_t\right)\right].$$
The first-order condition with respect to consumption is $\lambda_t=\beta^tu'(c_t)$, and the condition for capital is $\lambda_{t-1}=\lambda_t\,y'(k_t)$. Putting these conditions together leads to the second equation in (3.14). The same argument can be made following a recursive approach. We have:
$$\mathcal L(k_t)=\max_{c_t,k_{t+1}}\left[u(c_t)-\lambda_t\left(k_{t+1}-y(k_t)+c_t\right)+\beta\mathcal L(k_{t+1})\right].$$
The first-order condition for consumption is $\lambda_t=u'(c_t)$, and that for capital is $\lambda_t=\beta\mathcal L'(k_{t+1})$. Replacing the first-order condition for $\lambda_t$ (i.e., the constraint in program [3.P4]), and differentiating with respect to $k_t$, yields $\mathcal L'(k_t)=\beta\mathcal L'(k_{t+1})\,y'(k_t)$. These three conditions lead, again, to the second equation in (3.14).
Finally, consider the Bellman equation:
$$V(k_t)=\max_{c_t}\left[u(c_t)+\beta V(k_{t+1})\right]\quad\text{s.t.}\quad k_{t+1}=y(k_t)-c_t.$$
The first-order condition leads to $u'(c_t)=\beta V'(y(k_t)-c_t)$. Let us denote the policy with $c_t=c(k_t)$. In terms of the policy function, the value function and the first-order conditions are:
$$V(k)=u(c(k))+\beta V(y(k)-c(k)),\qquad u'(c(k))=\beta V'(y(k)-c(k)).$$
By differentiating the value function:
$$V'(k)=u'(c(k))\,c'(k)+\beta V'(y(k)-c(k))\left(y'(k)-c'(k)\right)=u'(c(k))\,y'(k).$$
By replacing back into the first-order condition, we obtain the second equation in (3.14).
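As a minimal sketch of how the Bellman recursion above can be iterated, consider the special case $y(k)=k^\gamma$, $u(c)=\ln c$ used as an example in Section 3.3.3. Under the guess $V(k)=a+b\ln k$ (an assumption of this sketch), the first-order condition gives $c=y(k)/(1+\beta b)$, and one application of the Bellman operator updates the slope to $\gamma(1+\beta b)$. The function name and the parameter values below are hypothetical, chosen only for illustration.

```python
def iterate_log_bellman(beta=0.99, gamma=0.3, tol=1e-12, max_iter=10_000):
    """Iterate the Bellman operator on the guess V(k) = a + b*ln(k).

    Assumes u(c) = ln(c) and y(k) = k**gamma (illustrative special case).
    With this guess, u'(c) = beta*V'(y(k) - c) gives c = y(k) / (1 + beta*b),
    and the updated slope is b_new = gamma * (1 + beta*b).
    Returns the fixed-point slope b and the consumption share c / y(k).
    """
    b = 0.0  # start from a constant value function
    for _ in range(max_iter):
        b_new = gamma * (1.0 + beta * b)
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    share = 1.0 / (1.0 + beta * b)  # policy: c(k) = share * k**gamma
    return b, share


b, share = iterate_log_bellman()
print(b, share)
```

The slope converges to $\gamma/(1-\beta\gamma)$, so that the consumption share is $1-\beta\gamma$, which is the policy $c(k)=(1-\beta\gamma)k^\gamma$.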
3.3.2.2 Transversality condition again

Another derivation of the transversality condition relies on the following arguments. Consider the following truncated program,
$$\max_{(c_t,k_{t+1})_{t=0}^{T}}\sum_{t=0}^{T}\left[\beta^tu(c_t)-\lambda_t\left(k_{t+1}-y(k_t)+c_t\right)\right]\quad\text{s.t.}\quad k_{T+1}\ge0.$$
The first-order conditions are the usual ones. Moreover, multiplying $\beta^Tu'(c_T)=\lambda_T$ by $k_{T+1}$, and utilizing the previous constraint, leaves $\beta^Tu'(c_T)\,k_{T+1}=0$. Taking the limit for large $T$ yields Eq. (3.13).

3.3.3 Dynamics
We study the dynamics of the system in (3.14) in a small neighborhood of the stationary state, defined as the pair $(c,k)$, solution to:
$$c=y(k)-k,\qquad \beta=\frac{1}{y'(k)}.$$

FIGURE 3.1. [Phase diagram in the $(k_t,c_t)$ plane: the locus $c=y(k)-k$, the points $k$ and $k^*$ on the horizontal axis, and the saddlepath $c_0=c+(v_{21}/v_{11})(k_0-k)$.]

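A first-order expansion of each equation in (3.14) around its stationary state yields the following linear system:
$$\begin{pmatrix}\hat k_{t+1}\\ \hat c_{t+1}\end{pmatrix}=A\begin{pmatrix}\hat k_t\\ \hat c_t\end{pmatrix},\qquad A\equiv\begin{pmatrix}y'(k) & -1\\[4pt] -\dfrac{u'(c)}{u''(c)}\,y''(k) & 1+\beta\,\dfrac{u'(c)}{u''(c)}\,y''(k)\end{pmatrix} \tag{3.15}$$
where $\hat x_t\equiv x_t-x$ denotes the deviation of $x_t$ from its stationary value. The solution to this system is obtained with the tools reviewed in Appendix 1 of this chapter. It is:
$$\hat k_t=v_{11}\kappa_1\lambda_1^t+v_{12}\kappa_2\lambda_2^t,\qquad \hat c_t=v_{21}\kappa_1\lambda_1^t+v_{22}\kappa_2\lambda_2^t \tag{3.16}$$
where: $\kappa_i$ are constants that depend on the initial state, $\lambda_i$ are the eigenvalues of $A$, and $(v_{11},v_{21})^\top$, $(v_{12},v_{22})^\top$ are the eigenvectors associated with $\lambda_i$. In Appendix 1, we show that $\lambda_1\in(0,1)$ and $\lambda_2>1$. The proof we provide in the appendix is important, as it illustrates precisely how the neoclassical model reviewed in this section needs to be modified to induce indeterminacy in the dynamics of capital and consumption. A critical step in that proof relies on the assumption of diminishing returns, i.e. $y''(k)<0$.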
Let us return to the equations in (3.16). First, we need to rule out an explosive behavior of $k_t$ and $c_t$, for otherwise we would contradict (i) that $(c,k)$ is a stationary point, and (ii) the optimality of the trajectories. Since $\lambda_2>1$, the only possibility is to lock the initial state $(k_0,c_0)$ in such a way that $\kappa_2=0$, which yields the following set of initial conditions: $\hat k_0=v_{11}\kappa_1$ and $\hat c_0=v_{21}\kappa_1$, or $\hat c_0/\hat k_0=v_{21}/v_{11}$.$^3$ Therefore, the set of initial points that ensure a non-explosive path must lie on the line $c_0=c+\frac{v_{21}}{v_{11}}(k_0-k)$. Since $k$ is a predetermined variable, there exists one, and only one, value of $c_0$ which ensures a non-explosive path of the system around its steady state, as Figure 3.1 illustrates. In this figure, $k^*$ is defined as the solution of $1=y'(k^*)$, i.e. $k^*=(y')^{-1}[1]$, and $k=(y')^{-1}[\beta^{-1}]$.
3 In fact, Appendix 1 shows that the converse is also true, i.e. $\hat c_0/\hat k_0=v_{21}/v_{11}\Rightarrow\kappa_2=0$.

The usual word of caution is in order. A linear approximation might turn out to be misleading. We develop one example where the dynamics of the system could be quite different from those analyzed here, when we start away from the stationary state. Let $y(k)=k^\gamma$, $u(c)=\ln c$. It is easy to show that the exact solution is:
$$c_t=(1-\beta\gamma)\,k_t^\gamma,\qquad k_{t+1}=\beta\gamma\,k_t^\gamma.$$

FIGURE 3.2. [The nonlinear stable manifold and its linear approximation around the steady state, in the $(k_t,c_t)$ plane.]

Figure 3.2 depicts the nonlinear manifold associated with this system, and its linear approximation. For example, let $\beta=0.99$ and $\gamma=0.3$. Then, the (linear) saddlepath is, approximately,
$$c_t=c+0.7101\,(k_t-k),$$
where $k=(\beta\gamma)^{1/(1-\gamma)}$ and $c=(1-\beta\gamma)k^\gamma$, and where the stable eigenvalue is $\lambda_1=\gamma=0.3$.
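The numbers in this example can be checked numerically. The sketch below assumes the linearized system is written in deviations from the steady state, with capital ordered first and consumption second, and uses NumPy's `numpy.linalg.eig` to extract the stable root and the associated eigenvector. The function name is hypothetical.

```python
import numpy as np


def saddlepath_slope(beta=0.99, gamma=0.3):
    """Linearized saddlepath for y(k) = k**gamma and u(c) = ln(c).

    Builds the 2x2 matrix of the system linearized around the steady state
    (deviations of capital first, then consumption) and returns the stable
    eigenvalue together with the slope v21/v11 of the saddlepath.
    """
    k = (beta * gamma) ** (1.0 / (1.0 - gamma))       # steady-state capital
    c = k**gamma - k                                   # steady-state consumption
    y1 = gamma * k ** (gamma - 1.0)                    # y'(k), equal to 1/beta
    y2 = gamma * (gamma - 1.0) * k ** (gamma - 2.0)    # y''(k) < 0
    ratio = -c                                         # u'(c)/u''(c) with log utility
    A = np.array([[y1, -1.0],
                  [-ratio * y2, 1.0 + beta * ratio * y2]])
    eigvals, eigvecs = np.linalg.eig(A)
    i = int(np.argmin(np.abs(eigvals)))                # root inside the unit circle
    v = eigvecs[:, i]
    return float(np.real(eigvals[i])), float(np.real(v[1] / v[0]))


lam1, slope = saddlepath_slope()
print(round(lam1, 4), round(slope, 4))
```

With $\beta=0.99$ and $\gamma=0.3$, the slope equals $1/\beta-\gamma\approx0.7101$, which also coincides with the derivative of the exact policy $c(k)=(1-\beta\gamma)k^\gamma$ at the steady state.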

3.3.4 Stochastic economies


"Real business cycle theory is the application of general equilibrium theory to the quantitative analysis of business cycle fluctuations." Edward Prescott (1991, p. 3)
"The Kydland and Prescott model is a complete markets set-up, in which equilibrium and optimal allocations are equivalent. When it was introduced, it seemed to many, myself included, to be much too narrow a framework to be useful in thinking about cyclical issues." Robert Lucas (1994, p. 184)

In its simplest version, real business cycle theory is an extension of the neoclassical model of Section 3.3.3, in which random productivity shocks are added. The engine of fluctuations, then, comes from the real sphere of the economy. This approach is in contrast with the Lucas approach of the 1970s, based on information and money, where fluctuations arise due to the information delays with which agents discover the nature of a shock (real or monetary). As further reviewed in Chapter 9, the Lucas information-theoretic approach has been, instead, more successful in inspiring work on the formation of asset prices, leading to the development of market microstructure theory and, more generally, to information-driven explanations of asset prices.
Despite the remarkable switch in the economic motivation, the paradigm underlying real business cycle theory is the same as the information-based approach of Lucas, as it relies on rational expectations: macroeconomic fluctuations and, then, as we shall explain, asset price fluctuations, stem from the optimal response of the agents vis-à-vis exogenous shocks: agents implement action plans that are state-contingent, i.e. they decide to consume, to work and to invest according to the history of shocks as well as the present shocks they observe.


3.3.4.1 Basic model

We consider an economy with complete markets and no frictions, such that its equilibrium allocations are Pareto-optimal. To characterize these allocations, we implement them through the following program of a social planner:
$$V(k_0,s_0)=\max_{(c_t)_{t=0}^{\infty}}E\left[\sum_{t=0}^{\infty}\beta^tu(c_t)\right] \tag{3.17}$$
subject to a capital accumulation constraint, with capital depreciation. Let $I_t$ denote new investment. It is:
$$I_t=K_{t+1}-(1-\delta)K_t \tag{3.18}$$
At time $t-1$, the available productive capital is $K_t$. At time $t$, a portion $\delta$ of this capital is lost, due to depreciation. Therefore, at time $t$, the productive system is left with $(1-\delta)K_t$ units of capital. The capital available at time $t$, $K_{t+1}$, equals the capital already in place, $(1-\delta)K_t$, plus new investments, which is exactly what Eq. (3.18) says.
Next, normalize population to one, such that $K_t=k_t$. The goods market clearing condition is:
$$\tilde y(k_t,s_t)=c_t+I_t,$$
where $\tilde y(k_t,s_t)$ is the production function, which is $\mathcal F_t$-measurable, and $s_t$ is the source of randomness, the engine for random fluctuations of the endogenous variables. By replacing Eq. (3.18) into the equilibrium condition,
$$k_{t+1}=\tilde y(k_t,s_t)-c_t+(1-\delta)k_t \tag{3.19}$$
So the planner maximizes the utility in Eq. (3.17), under the capital accumulation constraint in Eq. (3.19).
We assume that $\tilde y(k_t,s_t)\equiv s_ty(k_t)$, where $y$ is as in Section 3.2, and $(s_t)_{t=0}^{\infty}$ is solution to:
$$s_{t+1}=s_t^{\rho}\,\epsilon_{t+1} \tag{3.20}$$
where $\rho\in(0,1)$, and $(\epsilon_t)_{t=0}^{\infty}$ is an IID sequence with support s.t. $s_t\ge0$. In this economy, every asset is priced as in the Lucas model of the previous section. Therefore, the gross return on savings, $s_{t+1}y'(k_{t+1})+1-\delta$, satisfies:
$$u'(c_t)=\beta E_t\left(u'(c_{t+1})\left(s_{t+1}y'(k_{t+1})+1-\delta\right)\right) \tag{3.21}$$
A rational expectation equilibrium is a stochastic process $(c_t,k_t)_{t=0}^{\infty}$ satisfying Eq. (3.19) and the Euler equation in (3.21), for given $k_0$ and $s_0$.
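To make the definition of equilibrium concrete, the following sketch simulates one special case that admits a closed-form solution: logarithmic utility, $y(k)=k^\gamma$ and full depreciation, $\delta=1$. These three restrictions, the lognormal distribution of the shock, and all parameter values are assumptions of the sketch, not part of the model above. In this case the policy is $c_t=(1-\beta\gamma)s_tk_t^\gamma$, and the Euler equation holds realization by realization, not merely in expectation, which the code checks.

```python
import numpy as np


def simulate_full_depreciation(T=200, beta=0.99, gamma=0.3, rho=0.9,
                               sigma=0.02, k0=0.1, s0=1.0, seed=0):
    """Simulate the closed-form equilibrium with delta = 1, u = ln, y = k**gamma.

    Policy: c_t = (1 - beta*gamma) * s_t * k_t**gamma,
            k_{t+1} = beta*gamma * s_t * k_t**gamma.
    Productivity: s_{t+1} = s_t**rho * eps_{t+1}, with eps lognormal (assumed).
    Returns the arrays (k, c, s), each of length T.
    """
    rng = np.random.default_rng(seed)
    k, c, s = np.empty(T), np.empty(T), np.empty(T)
    k[0], s[0] = k0, s0
    for t in range(T):
        output = s[t] * k[t] ** gamma
        c[t] = (1.0 - beta * gamma) * output
        if t + 1 < T:
            k[t + 1] = beta * gamma * output
            s[t + 1] = s[t] ** rho * np.exp(sigma * rng.standard_normal())
    return k, c, s


def euler_residuals(k, c, s, beta=0.99, gamma=0.3):
    """Realized beta * u'(c_{t+1})/u'(c_t) * gross return, minus one."""
    gross_return = gamma * s[1:] * k[1:] ** (gamma - 1.0)  # delta = 1
    return beta * (c[:-1] / c[1:]) * gross_return - 1.0


k, c, s = simulate_full_depreciation()
res = euler_residuals(k, c, s)
print(np.max(np.abs(res)))
```

With partial depreciation no closed form is available in general, and the residuals would have to be evaluated in conditional expectation, for instance around the linearized solution.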
We show the existence of a saddlepoint path for the linearized version of Eqs. (3.19)-(3.20)(3.21), which implies determinacy of the stochastic (linearized) equilibrium.4 We study the
behavior of (
) in a neighborhood of
( ). Let (
) be consumption, capital and
productivity shock, corresponding to , obtained replacing into Eqs. (3.19)-(3.20)-(3.21), and
assuming no uncertainty takes place:
=

( )

1
1

0(

1
)+1

4 A stochastic equilibrium is the situation where there is a stationary measure (denition: (+) =
) =1 .
the transition measure) generating (


(+

( ), where

is



A rst-order approximation to Eqs. (3.19)-(3.20)-(3.21) around (
+1 =

(3.22)

+1

, and = ( )>
where we have dened
=
1 ( ) = , and, nally,
=

)
00 ( )

)> ,

=(

( )

1
0(

), leaves:

00

( ) 1+

0(

)
00 ( )

00

0(

)
00 ( )

( )

( ( ) 00 ( ) +

( ))

00

( )
+

Let us consider the characteristic equation:

2
0 = det (
)=(
)

+1+

( )
00 ( )

1 (

),

0 0
1 0
0 1

A solution is 1 = . By the same arguments applying to the deterministic case (see Section
3.3.3 and Appendix 1), one nds that 2 (0 1) and 3 1.5 Similarly as in the deterministic
1
case, we diagonalize the system by rewriting =
, where is a diagonal matrix that
has the eigenvalues of on the diagonal, and is a matrix of the eigenvectors associated to
the roots of . Eq. (3.22) is then:
+1 =
where

and

(3.23)

+1

. The third equation of this system is:


3

+1

3 3

(3.24)

3 +1

and 3 explodes unless 3 = 0 for all , which is only possible when 3 = 0 for all .6
The condition that 3
0 carries an interesting economic interpretation: it tells us that the
only sources of uncertainty in this system can stem from shocks to the fundamentals, or that
there cannot be extraneous sources of noise, or sunspots. The reasons for this are easy to
1
explain. Let =

. We have:
0 = 3 =

31

32

33

(3.25)

Eq. (3.25) tells us that the three state variables, , and , are mutually linked in a twodimensional plane, a saddlepoint, where they exhibit a stable behavior. This saddlepoint is
formally dened as:

S=
R3 3 = 0
3 = ( 31
32
33 )

Furthermore, Eq. (3.25) implies that a linear relation exists between the two expectational
errors:
33
=
(no-sunspots).
(3.26)
For all ,
32
5 The

linearized model in this section has state variables expressed in growth rates. However, the model can be reformulated in
terms of rst di erences, by pre- and post- multiplying by appropriate normalizing matrices. For example, if i the 3 3 matrix
1
that has 1 1 and 1 on its diagonal, (3.22) can be written as: ( +1
)=
(
), where
=(
), and we would
arrive at the same conclusions. It can be easily shown that the model in this section collapses to that in Section 3.3.3, once we set
= 1, for each , and 0 = 1.
(
)
6 In other words, Eq. (3.24) implies that =
(3 + ), and for all . Because 3 1, this relation holds only when
3
3
3 = 0 for all .


Eq. (3.26) is a no-sunspots condition, as it says that the expectational error to consumption
cannot be independent of the expectational shock on the fundamentals of the economy, which in
this simple economy relates to technological shock. In other words, the source of uncertainty we
have assumed in this economy, relates to the technological shock. The remaining expectational
errors can only be perfectly correlated to the expectational shock in technology or, there are
no sunspots.
The manifold S has the same meaning as the stable relation depicted in Figure 3.2 applying
to the deterministic case. Mathematically, in this section, S is the convergent subspace, with
dim(S) = 2, i.e., the number of roots with modulus less than one. In other words, in this
economy with two predetermined variables, 0 and 0 , there exists one, and only one, value of

31 0 + 33 0
. This reasoning generalizes that made
0 S that ensures stability, given by 0 =
32
in the deterministic case (Section 3.3.3), and is generalized further in Appendix 1.
The solution to the linearized model can be determined by generalizing the reasoning in the
deterministic case. First, by Eq. (3.23) is:
=

1
X

0 +

=0

which implies the solution for is:


=

=(

3 ) =

3
X

=1

3
X

=1

3
X
=1

1
0 =
0
0 . The stability
To pin down the components of 0 , note that 0 = 0
(3)
condition then requires that the state variables be in S, or 0 = 0, which we now use to
implement the solution. We have:

1 1 10

2 2 20

3 3 30

1 1

2 2

3 3

Moreover,
P 1the term 3 3 30 + 3 3 needs to be zero, because 30 = 0. Finally, we have that
=
, and since 3 = 0, then, then 3 = 0 as well. Therefore, the solution for
3
=0 3 3
is:
= 1 1 10 + 2 2 20 + 1 1 + 2 2
3.3.4.2 Frictions, indeterminacy and sunspots

In the neoclassical model that we are analyzing, the equilibrium is determinate. As explained, this property arises because the number of predetermined variables equals the dimension of the convergent subspace of the economy. If we managed to increase the dimension of the converging subspace, the equilibrium would be indeterminate, as further formalized in Appendix 1. As it turns out, indeterminacy goes hand in hand with sunspots, the expectational shocks extraneous to those in the economic fundamentals, as we discussed earlier, just after Eq. (3.26).
Introducing sunspots in macroeconomics has been an approach pursued in detail by Farmer in a series of articles (see Farmer, 1998, for an introductory account of this approach). The idea is quite interesting, as we know that the basic real business cycle model of this section needs many extensions in order not to be rejected, empirically, as originally shown by Watson (1993). In other words, the basic model in this section offers little room for a rich propagation mechanism, as it entirely relies on impulses, the productivity shocks, which we hardly read about in the Wall Street Journal, as provocatively put by King and Rebelo (1999). Sunspots offer an interesting route to enrich the propagation mechanism, although their asset pricing implications in terms of the model analyzed in this section have not been explored yet.
In a series of articles, David Cass showed that a Pareto-optimal economy cannot harbour sunspot equilibria. On the other hand, any market imperfection has the potential to be a source of sunspots. The typical example is the presence of incomplete markets. The neoclassical model analyzed in this section cannot generate sunspots, as it relies on a system of perfectly competitive markets and the absence of any sort of frictions. To introduce sunspots in the economy of this section, we need to think about some deviation from optimality. Two possibilities analyzed in the literature are the presence of imperfect competition and/or externality effects. We provide an example of these effects, by working out the deterministic economy in Section 3.3.3. (Generalizations to the stochastic economy in this section are easy, although more cumbersome.)
How is it that a deterministic economy might generate stochastic outcomes, that is, outcomes driven by shocks entirely unrelated to the fundamentals of the economy? Let us imagine
this can be possible. Then, both optimal consumption and capital accumulation in Section
3.3.3 are necessarily random processes. The system in (3.15), then, must be rewritten in an
expectation format,

+1

=
+1

Next, let us introduce the expectational error process


into the previous system, to obtain:

Naturally, we still have


1
as
, and have:

+1
+1

(0 1) and
+1 =

0
+1

1 (

), which we plug back

1, as in Section 3.3.3. Therefore, we decompose

(0

>
+1 )

Moreover, for 2 = 2
(2 + ) to hold for all , we need to have 2 = 0, for all . Therefore,
1
>
the second element of the vector
(0
+1 ) must be zero, or, for all ,
0=

22

0=

There is no room for expectational errors and, hence, sunspots, in this model. The fact that $\lambda_2>1$ implies the dimension of the saddlepoint is less than the number of predetermined variables. So a viable route to pursue here is to look for economies such that the saddlepoint has a dimension larger than one, i.e. such that $\lambda_2<1$. In these economies, indeterminacy and sunspots will be two sides of the same coin. As shown in the appendix, the reasons for which $\lambda_2>1$ relate to the classical assumptions about the shape of the utility function $u$ and the production function $y$. We now modify the production function, to see the effect on the eigenvalues of $A$.
[Economy with increasing returns]
[Asset pricing implications in further chapters]

3.4 Production-based asset pricing


3.4.1 Firms
For each firm, capital accumulation satisfies the identity in Eq. (3.18), reproduced here for convenience:
$$K_{t+1}=(1-\delta)K_t+I_t \tag{3.27}$$
The additional assumption we make is that capital adjustment is costly: investing $I_t$ per unit of capital already in place, $K_t$, entails a cost $\phi(I_t/K_t)$, expressed in terms of the price of the final good, which we take to be the numeraire, thereby allowing the investment goods to differ from the final good the firm produces. An investment of $I_t$, then, leads to a cost $\phi(I_t/K_t)K_t$, such that the profit the firm makes at time $t$ is,
$$D(K_t,I_t)\equiv\tilde y(K_t,N(K_t))-w_tN(K_t)-p_tI_t-\phi\left(\frac{I_t}{K_t}\right)K_t \tag{3.28}$$
where $\tilde y(K_t,N_t)$ is the firm's production at time $t$, obtained with capital $K_t$ and labor $N_t$, and subject to the productivity shocks described in Section 3.3.4, $w_t$ is the real wage, $N(K_t)$ is the labor demand schedule, solution to the optimality condition, $\tilde y_N(K_t,N(K_t))=w_t$ for all $t$, and $p_t$ is the real price of the investment goods, or uninstalled capital. Finally, the adjustment-cost function satisfies $\phi\ge0$, $\phi'\ge0$, $\phi''\ge0$. In words, capital adjustment is costly when the adjustment is made fast. Naturally, $\phi$ is zero in the absence of adjustment costs.
What is the value of the profit, from the perspective of time zero? This question can be answered by utilizing the Arrow-Debreu state prices introduced in Chapter 2. At time $t$, and in state $s$, the profit $D_t(s)$ (say) is worth,
0

( )

( )

( )) =

( )

( )

( ))

( )

with the same notation as in Chapter 2.


3.4.1.1 The value of the firm

We assume that in each period, the firm distributes all the profits it makes, and that for a given capital $K_0$, it maximizes its cum-dividend value,
$$V(K_0)=\max_{(K_t,I_{t-1})_{t=1}^{\infty}}\left[D(K_0,I_0)+E\left(\sum_{t=1}^{\infty}m_{0,t}\,D(K_t,I_t)\right)\right],$$
subject to the capital accumulation law of Eq. (3.27), with $m_{0,t}$ denoting the stochastic discounting factor between time $0$ and time $t$.

The value of the firm at time $t$, $V(K_t)$, can be found recursively, through the Bellman equation,
$$V(K_t)=\max_{I_t}\left[D(K_t,I_t)+E_t\left(m_{t+1}V(K_{t+1})\right)\right],$$
where the expectation is taken with respect to the information set as of time $t$. The first-order conditions for $I_t$ lead to,
$$-D_I(K_t,I_t)=E_t\left[m_{t+1}V'(K_{t+1})\right] \tag{3.29}$$
That is, along the optimal capital accumulation path, the marginal cost of new installed capital at time $t$, $-D_I(K_t,I_t)$, must equal the expected marginal return on the investment, i.e. the expected value of the marginal contribution of capital to the value of the firm at time $t+1$, $V'(K_{t+1})$.


By Eq. (3.29), optimal investment is a function $I(K_t)$, and the value of the firm satisfies,
$$V(K_t)=D(K_t,I(K_t))+E_t\left[m_{t+1}V(K_{t+1})\right].$$
Differentiating the value function in the previous equation with respect to $K_t$, and using Eq. (3.29), yields the following envelope condition:
$$\begin{aligned}V'(K_t)&=D_K(K_t,I(K_t))+D_I(K_t,I(K_t))\,I'(K_t)+E_t\left[m_{t+1}V'(K_{t+1})\left((1-\delta)+I'(K_t)\right)\right]\\&=D_K(K_t,I(K_t))-(1-\delta)\,D_I(K_t,I(K_t)).\end{aligned}$$
Replacing this expression for the value function back into Eq. (3.29) leaves:
$$-D_I(K_t,I(K_t))=E_t\left[m_{t+1}\left(D_K(K_{t+1},I(K_{t+1}))-(1-\delta)\,D_I(K_{t+1},I(K_{t+1}))\right)\right] \tag{3.30}$$
Along the optimal capital accumulation path, the marginal cost of new installed capital at time $t$, which by Eq. (3.29) is the expected marginal return on the investment, equals the expected value of (i) the very same marginal cost at time $t+1$, corrected for capital depreciation, $(1-\delta)$, and (ii) capital productivity, net of adjustment costs. Analytically,
$$D_K(K_t,I_t)=\tilde y_K(K_t,N(K_t))-\phi\left(\frac{I_t}{K_t}\right)+\phi'\left(\frac{I_t}{K_t}\right)\frac{I_t}{K_t},\qquad -D_I(K_t,I_t)=p_t+\phi'\left(\frac{I_t}{K_t}\right).$$
We now introduce a fundamental concept in investment theory.
3.4.1.2 q theory

Tobin's marginal q is defined as the ratio of the expected marginal value of an additional unit of capital over its replacement cost:
$$\text{Tobin's marginal q}\equiv TQ_t=\frac{E_t\left[m_{t+1}V'(K_{t+1})\right]}{p_t}.$$
We show that the numerator, $E_t[m_{t+1}V'(K_{t+1})]$, is, simply, the shadow price of installed capital. Consider the Lagrangian at time $t$,
$$V(K_t)=\max_{I_t,K_{t+1}}\left[D(K_t,I_t)+E_t\left(m_{t+1}V(K_{t+1})\right)-q_t\left(K_{t+1}-(1-\delta)K_t-I_t\right)\right] \tag{3.31}$$
which, integrated, gives rise to the value of the firm:
$$V(K_0)=\max_{(I_t,K_{t+1})_{t=0}^{\infty}}E\left[\sum_{t=0}^{\infty}m_{0,t}\left(D(K_t,I_t)-q_t\left(K_{t+1}-(1-\delta)K_t-I_t\right)\right)\right].$$
The first-order condition for investment, $I_t$, is $q_t=-D_I(K_t,I_t)=E_t(m_{t+1}V'(K_{t+1}))$, where the second equality follows by Eq. (3.29). Therefore, $q_t$ is the expected marginal return on the investment, that is, the shadow price of installed capital. Tobin's marginal q is, then, the ratio of the shadow price of installed capital to its replacement cost:
$$TQ_t=\frac{q_t}{p_t}.$$
Replacing the condition $q_t=-D_I(K_t,I_t)$ into the valuation equation (3.30) leaves:
$$q_t=E_t\left[m_{t+1}\left(D_K(K_{t+1},I_{t+1})+(1-\delta)\,q_{t+1}\right)\right] \tag{3.32}$$
where
$$q_t=p_t+\phi'\left(\frac{I_t}{K_t}\right) \tag{3.33}$$
The shadow price of installed capital, $q_t$, has to equal the marginal cost of new installed capital, and is larger than the price of uninstalled capital, $p_t$. It is natural: to install new capital requires some (marginal) adjustment costs, which add to the raw price of uninstalled capital, $p_t$. Therefore, in the presence of adjustment costs, Tobin's marginal q is larger than one.
Eq. (3.32) can be solved forward, leaving:
$$q_t=E_t\left[\sum_{s=1}^{\infty}(1-\delta)^{s-1}\,m_{t,t+s}\,D_K(K_{t+s},I_{t+s})\right],\qquad m_{t,t+s}\equiv\prod_{j=1}^{s}m_{t+j}.$$
The shadow price of installed capital is worth the sum of all its future marginal net productivity, discounted at the depreciation rate. Moreover, Eq. (3.33) can be inverted for $I_t/K_t$, to deliver:
$$\frac{I_t}{K_t}=\phi'^{-1}(q_t-p_t) \tag{3.34}$$
where $\phi'^{-1}$ denotes the inverse of $\phi'$, and is increasing, since $\phi'$ is increasing. Given $K_t$, and the fact that $K_{t+1}$ is predetermined, the firm evaluates $q_t$ through Eq. (3.32), and then determines the level of new investments through Eq. (3.34). These investments are increasing in the difference between the shadow price of installed capital, $q_t$, and that of uninstalled capital, $p_t$, as originally assumed by Tobin (1969).
In the absence of adjustment costs, when $q_t=p_t$, Eq. (3.32) delivers the condition,
$$1=E_t\left[m_{t+1}\left(\tilde y_K(K_{t+1},N(K_{t+1}))+(1-\delta)\right)\right],$$
where we have set $p_t\equiv1$ for all $t$, meaning that the firm's production is just the uninstalled capital. Empirically, however, the marginal productivity of capital, $\tilde y_K(K_t,N(K_t))$, is not volatile enough to rationalize asset returns, as explained in more detail in Chapter 8. Moreover, as we argue in a moment, Tobin's marginal q can be approximated by market-to-book ratios, which are typically time-varying. Therefore, adjustment costs are important for asset pricing.
A difficulty with Tobin's marginal q is that it is quite difficult to estimate. Yet in the special case we are analyzing in this section, where firms act competitively and have access to a homogeneous production function and adjustment costs, Tobin's marginal q can be proxied by the market-to-book ratio of a given firm. Let $V^{ex}(K_t)$ denote the ex-dividend value of the firm, which is its stock market value, since it nets out the dividend it pays to its holder in the current period. It is:
$$V^{ex}(K_t)\equiv V(K_t)-D(K_t,I(K_t))=E_t\left[m_{t+1}V(K_{t+1})\right].$$
Tobin's average q is defined as the ratio of the stock market value of the firm over the replacement cost of the capital:
$$\text{Tobin's average q}\equiv\frac{\text{Stock Mkt Value of the Firm}}{\text{Replacement Cost of Capital}}=\frac{V^{ex}(K_t)}{p_tK_{t+1}}.$$
The next result was originally obtained by Hayashi (1982) in a continuous-time setting.
Theorem 3.2. Tobin's marginal q and average q coincide. That is, we have,
$$V^{ex}(K_t)=q_tK_{t+1}.$$

Proof. By the homogeneity properties of the production function and the adjustment costs,
)=

Therefore, the ex-dividend value of the rm is:


"
#
X
( 0 0) =
(
)
0
" =1
X

(1

=1

))

"
X

=1

+1

where the second line follows by Eq. (3.27). By Eq. (3.30), and the law of iterated expectations,
"
#
"
#
X
X
=
(
) (1
)
(
))
( 0 0) 1
(
)
0 (
0
+1
=1

Hence,

=1

0)

0)

1.

This result, in conjunction with that in Eq. (3.33), provides a simple rule of thumb for investment decisions. Consider, for example, the case of quadratic adjustment costs, where $\phi(x)=\frac{1}{2}\kappa^{-1}x^2$, for some $\kappa>0$. Then, Eq. (3.34) is:
$$\frac{I_t}{K_t}=\kappa\,(q_t-p_t)=\kappa\left(\frac{\text{Stock Mkt Value of the Firm}}{\text{Replacement Cost of Capital}}-1\right)p_t,$$
where the second equality follows by Theorem 3.2. Thus, according to q theory, we expect firms with a market value larger than the cost of reproducing their capital to grow, and firms which are not worth the cost of reproducing their capital to shrink. This basic observation constitutes a first criterion that we can use to assess developments of firms.
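The rule of thumb can be put in a few lines of code. The sketch below assumes quadratic adjustment costs of the form $x^2/(2\kappa)$ and replaces marginal q with average q, as Theorem 3.2 allows. The function name, the value of `kappa` and the price `p` are hypothetical, chosen only for illustration.

```python
def investment_rate(market_value, replacement_cost, p=1.0, kappa=0.5):
    """Investment per unit of installed capital under quadratic adjustment costs.

    Assumes an adjustment cost x**2 / (2*kappa), so that I/K = kappa * (q - p),
    and uses average q (market value over replacement cost) for marginal q.
    `kappa` and `p` are illustrative values, not calibrated ones.
    """
    if replacement_cost <= 0 or kappa <= 0:
        raise ValueError("replacement_cost and kappa must be positive")
    tobin_q = market_value / replacement_cost
    return kappa * (tobin_q - 1.0) * p


# A firm worth more than its capital invests; one worth less disinvests.
print(investment_rate(120.0, 100.0), investment_rate(80.0, 100.0))
```

The sign of the investment rate depends only on whether the market-to-book ratio is above or below one, while $\kappa$ governs how strongly investment responds.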
3.4.2 Consumers
We now generalize the budget constraint obtained in the program [3.P3], to the uncertainty
case. We claim that in this case, the relevant budget constraint is,
"
#
X
)
(3.35)
0 = 0+
0 (
=1



We have:
"
X

=(

+1

"
X

) =

=1

and, then:

=1

"
X

=1

"
X

0 1

0
1
1

=1

"
X
=1

"
X

+1

#
1

=0

"
X

=2

=2

where the third line follows by the properties of the discount factor,
1 .
Therefore, the program consumers solve is:
"
#
X
( )
s.t. Eq. (3.35).
max
( )

0
1

and

=1

We now have two optimality conditions, one intertemporal and another, intratemporal:
+1

(
1

+1

+1 )

(inter temporal);

(
1(

)
(intratemporal).
)

3.4.3 Equilibrium
For all ,
(

)=

(3.36)

+
,
It is easily seen that the condition = 1 in the nancial market, implies that =
which, upon substitution of the prots in Eq. (3.28), delivers the equilibrium condition in Eq.
(3.36). Implicit in this reasoning, is the idea the adjustment costs are not paid to anyone. They
represent, so to speak, capital losses incurred along the way of growth.

3.5 Money, production and asset prices in overlapping generations models


3.5.1 Introduction: endowment economies
3.5.1.1 A deterministic model

We initially assume the population is constant, and made up of one young and one old agent. The young agent maximizes his intertemporal utility subject to his budget constraint:
$$\max_{(c_{1t},c_{2,t+1})}\left[u(c_{1t})+\beta u(c_{2,t+1})\right]\quad\text{subject to}\quad\begin{cases}\mathrm{sav}_t+c_{1t}=w_{1t}\\ c_{2,t+1}=\mathrm{sav}_tR_{t+1}+w_{2,t+1}\end{cases} \tag*{[3.P5]}$$
where $w_{1t}$ and $w_{2,t+1}$ are the endowments the agent receives at his young and old age.

FIGURE 3.3. [The lifetime budget constraint, $c_{2,t+1}=-R_{t+1}c_{1t}+R_{t+1}w_{1t}+w_{2,t+1}$, in the $(c_{1t},c_{2,t+1})$ plane, passing through the endowment point $(w_{1t},w_{2,t+1})$.]

The agent born at time $t-1$, then, faces the constraints: $\mathrm{sav}_{t-1}+c_{1,t-1}=w_{1,t-1}$ and $c_{2t}=\mathrm{sav}_{t-1}R_t+w_{2t}$. By combining his second period constraint with the first period constraint of the agent born at time $t$,
$$\mathrm{sav}_{t-1}R_t+\sum_{i=1}^{2}w_{it}=\mathrm{sav}_t+\sum_{i=1}^{2}c_{it} \tag{3.37}$$

The equilibrium in the intergenerational lending market is, naturally:
$$\mathrm{sav}_t=0 \tag{3.38}$$
and implies that the goods market is also in equilibrium, in that $\sum_{i=1}^{2}w_{it}=\sum_{i=1}^{2}c_{it}$, for all $t$. Therefore, we can analyze the model by just analyzing the autarkic equilibrium.
As Figure 3.3 illustrates, the first-order condition for the program [3.P5] requires that the slope of the indifference curve be equal to the slope of the lifetime budget constraint, $c_{2,t+1}=-R_{t+1}c_{1t}+R_{t+1}w_{1t}+w_{2,t+1}$, and leads to:
$$\beta\,\frac{u'(c_{2,t+1})}{u'(c_{1t})}=\frac{1}{R_{t+1}} \tag{3.39}$$
The equilibrium, then, is a sequence of gross returns $R_t$ satisfying Eqs. (3.37), (3.38) and (3.39), or:
$$b_t\equiv\frac{1}{R_{t+1}}=\beta\,\frac{u'(w_{2,t+1})}{u'(w_{1t})} \tag{3.40}$$
In this relation, $b_t$ is the shadow price of a bond issued at $t$, and promising one unit of numeraire at $t+1$: the sequence of prices, $b_t$, satisfying Eq. (3.40), is such that agents are happy with not being able to lend and borrow, intergenerationally.
The previous model is easy to extend to the case where agents are heterogeneous. The program each agent solves is, now:
$$\max_{(c_{1jt},c_{2j,t+1})}\left[u_j(c_{1jt})+\beta_ju_j(c_{2j,t+1})\right]\quad\text{subject to}\quad\begin{cases}\mathrm{sav}_{jt}+c_{1jt}=w_{1jt}\\ c_{2j,t+1}=\mathrm{sav}_{jt}R_{t+1}+w_{2j,t+1}\end{cases}$$
with obvious notation. The first-order condition is, for all time $t$ and agent $j$,
$$\beta_j\,\frac{u_j'(c_{2j,t+1})}{u_j'(c_{1jt})}=b_t\equiv\frac{1}{R_{t+1}},$$
and the equilibrium is a sequence of bond prices $b_t$ satisfying the previous relation and the equilibrium in the intragenerational lending market:
$$\sum_{j=1}^{J}\mathrm{sav}_{jt}=0 \tag{3.41}$$
where $J$ denotes the constant number of agents in each generation.


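To illustrate, suppose agents all have the same utility, of the CRRA class, with CRRA coefficient equal to $\eta$, and the same discount rate, $\beta_j=\beta$. In this case,
$$\mathrm{sav}_{jt}=\frac{(\beta R_{t+1})^{1/\eta}\,w_{1jt}-w_{2j,t+1}}{R_{t+1}+(\beta R_{t+1})^{1/\eta}}.$$
The first term in the numerator reflects an income effect, while the second is a substitution effect. The coefficient $1/\eta$ is the elasticity of intertemporal substitution, as explained in Section 3.2.3. Consider, for example, the logarithmic case, where $\eta=1$, and:
$$\mathrm{sav}_{jt}=\frac{\beta}{1+\beta}\,w_{1jt}-\frac{1}{1+\beta}\,\frac{w_{2j,t+1}}{R_{t+1}},\qquad c_{1jt}=\frac{1}{1+\beta}\left(w_{1jt}+\frac{w_{2j,t+1}}{R_{t+1}}\right) \tag{3.42}$$
and using the equilibrium condition in Eq. (3.41),
$$b_t=\frac{1}{R_{t+1}}=\beta\,\frac{\sum_{j=1}^{J}w_{1jt}}{\sum_{j=1}^{J}w_{2j,t+1}} \tag{3.43}$$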
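As an illustrative sketch, the equilibrium gross return can also be found numerically, by searching for the value of $R_{t+1}$ at which aggregate savings vanish. The code below assumes common CRRA preferences, writes individual savings as implied by the first-order condition $u'(c_1)=\beta Ru'(c_2)$, and uses bisection. The function names, the endowments and the value of $\beta$ are hypothetical.

```python
def crra_savings(R, w1, w2, beta=0.96, eta=2.0):
    """Savings of a young agent with CRRA utility (risk aversion eta).

    Solves u'(c1) = beta*R*u'(c2) with c1 = w1 - sav and c2 = R*sav + w2.
    """
    g = (beta * R) ** (1.0 / eta)   # optimal consumption growth, c2/c1
    return (g * w1 - w2) / (R + g)


def equilibrium_gross_return(w1s, w2s, beta=0.96, eta=2.0,
                             lo=1e-6, hi=1e6, tol=1e-12):
    """Bisection on R so that aggregate savings are zero."""
    def excess(R):
        return sum(crra_savings(R, a, b, beta, eta) for a, b in zip(w1s, w2s))

    # Aggregate savings have the sign of g*sum(w1s) - sum(w2s), which crosses
    # zero exactly once because g increases with R.
    for _ in range(200):
        mid = (lo * hi) ** 0.5      # geometric midpoint, since R > 0
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi / lo - 1.0 < tol:
            break
    return (lo * hi) ** 0.5


w1s, w2s = [1.0, 2.0, 0.5], [0.5, 0.2, 1.5]   # hypothetical endowments
R_log = equilibrium_gross_return(w1s, w2s, beta=0.96, eta=1.0)
print(1.0 / R_log, 0.96 * sum(w1s) / sum(w2s))
```

In the logarithmic case the numerical bond price agrees with the ratio of aggregate endowments scaled by $\beta$. With common preferences and $\eta\neq1$, the same argument gives $b_t=\beta\left(\sum_jw_{1jt}/\sum_jw_{2j,t+1}\right)^{\eta}$.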

3.5.1.2 A tree in a stochastic economy

Suppose, next, that we introduce a tree, which yields a stochastic dividend $D_t$ in each period. Each agent solves the following program:
$$\max_{(c_{1t},c_{2,t+1})}\left[u(c_{1t})+\beta E\left(u(c_{2,t+1})\,|\,\mathcal F_t\right)\right]\quad\text{subject to}\quad\begin{cases}q_t\theta_t+c_{1t}=w_{1t}\\ c_{2,t+1}=(q_{t+1}+D_{t+1})\theta_t+w_{2,t+1}\end{cases} \tag*{[3.P6]}$$
where $q_t$ denotes the asset price and $\theta_t$ the units of the asset the agent chooses in his young age. The agent born at time $t-1$ faces the constraints $q_{t-1}\theta_{t-1}+c_{1,t-1}=w_{1,t-1}$ and $c_{2t}=(q_t+D_t)\theta_{t-1}+w_{2t}$. By combining the second period constraint of the agent born at time $t-1$ with the first period constraint of the agent born at time $t$,
$$(q_t+D_t)\theta_{t-1}-q_t\theta_t+w_{1t}+w_{2t}=c_{1t}+c_{2t}.$$
The clearing condition in the asset market, $\theta_t=1$, implies that the market for goods also clears, for all $t$: $D_t+w_{1t}+w_{2t}=c_{1t}+c_{2t}$. A characterization of the solution to the program [3.P6] can be obtained by eliminating consumption from the constraint,
$$\max_{\theta_t}\left[u(w_{1t}-q_t\theta_t)+\beta E\left(u((q_{t+1}+D_{t+1})\theta_t+w_{2,t+1})\,|\,\mathcal F_t\right)\right].$$
The equilibrium is one where $\theta_t=1$, implying that (i) $c_{1t}=w_{1t}-q_t$ and (ii) $c_{2,t+1}=q_{t+1}+D_{t+1}+w_{2,t+1}$. Using (i) and (ii), the first-order condition for the program [3.P6] leads to:
$$u'(w_{1t}-q_t)\,q_t=\beta E\left[u'(q_{t+1}+D_{t+1}+w_{2,t+1})\,(q_{t+1}+D_{t+1})\,|\,\mathcal F_t\right].$$

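Consider, for example, the case where $u(c) = \ln c$, and set $R_{t+1} \equiv (q_{t+1} + D_{t+1})/q_t$. We have:

$$\frac{1}{w_{1t} - \mathrm{sav}_t} = \beta E\left(\left.\frac{R_{t+1}}{\mathrm{sav}_t R_{t+1} + w_{2,t+1}}\,\right|\,\mathcal F_t\right),\quad\text{where } \mathrm{sav}_t \equiv q_t,\ \theta_t = 1.\tag{3.44}$$

In a deterministic setting,

$$\frac{1}{w_{1t} - \mathrm{sav}_t} = \beta\,\frac{R_{t+1}}{\mathrm{sav}_t R_{t+1} + w_{2,t+1}},\quad\text{where } \mathrm{sav}_t = 0,\tag{3.45}$$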

which leads to the equilibrium bond price in Eq. (3.43). Eqs. (3.44) and (3.45) are formally equivalent. Their fundamental difference is that in the tree economy, savings have to stay positive, as the tree must be held by the young agent, in equilibrium: $\mathrm{sav}_t > 0$. In an economy without a tree, instead, the interest rate, $R_t$, has to be such that savings are zero for all $t$, $\mathrm{sav}_t = 0$.
Eq. (3.44) can be solved explicitly for the price of the tree, $q_t$, once we assume $w_{2t} = 0$ for all $t$. In the absence of a tree, we cannot assume endowments are zero in the old age, since the autarkic economy in this case would be such that the old generation would not consume anything. In the presence of a tree, instead, this assumption is innocuous, conceptually, as the autarkic equilibrium in this case is such that the old generation could consume the fruits of the tree, as well as the proceeds arising from selling the tree to the young generation. Solving Eq. (3.44) for $q_t$ when $w_{2t} = 0$, then, leads to a price for the tree, equal to:

$$q_t = \frac{\beta}{1+\beta}\,w_{1t}.$$
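A small numerical check of this closed form may help. The sketch below assumes logarithmic utility, zero old-age endowments and an arbitrary discrete distribution for next period's payoff; the function names and numbers are illustrative assumptions. Because old-age consumption equals the payoff of the tree, marginal utility times payoff is always one, so the price does not depend on the dividend distribution.

```python
def tree_price_log(w1, beta):
    """Tree price with u = ln and zero old-age endowment."""
    return beta * w1 / (1.0 + beta)


def foc_residual(q, w1, beta, payoffs, probs):
    """Residual of u'(w1 - q) q = beta E[u'(c2) * payoff], with c2 = payoff.

    payoffs are assumed values of next period's price plus dividend.
    """
    lhs = q / (w1 - q)  # u'(c1) * q with u = ln
    rhs = beta * sum(p * (1.0 / x) * x for x, p in zip(payoffs, probs))
    return lhs - rhs


q = tree_price_log(2.0, 0.9)
residual = foc_residual(q, 2.0, 0.9, [0.5, 1.0, 3.0], [0.2, 0.5, 0.3])
print(q, residual)
```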

3.5.2 Diamond's model


,
+1 =
[Bubbles]

( (

)).

3.5.3 Money
We consider a version of the previous model with endowment (not with capital), and assume that agents can now transfer value through a piece of paper, interpreted as money. The young agent, then, maximizes his intertemporal utility, subject to a new budget constraint:

$$\max_{(c_{1t},\,c_{2,t+1})}\left[u(c_{1t}) + \beta u(c_{2,t+1})\right]\quad\text{subject to}\quad\begin{cases}\dfrac{m_t}{p_t} + c_{1t} = w_{1t}\\[4pt] c_{2,t+1} = \dfrac{m_t}{p_{t+1}} + w_{2,t+1}\end{cases}\qquad\text{[3.P7]}$$

where $m_t$ is the amount of money he holds at time $t$, and $p_t$ is the price of the consumption good as of time $t$.
Let

$$\mathrm{sav}_t \equiv \frac{m_t}{p_t},\qquad R_{t+1} \equiv \frac{p_t}{p_{t+1}}.\tag{3.46}$$

Then, the budget constraint for program [3.P7] is formally identical to that for program [3.P5].
The difference is that in the monetary economy of this section, the young agent may wish to transfer value over time, by saving money, earning a gross interest rate equal to the rate of deflation: the lower the price level the next period, the higher the purchasing power of the money he transfers from the young to the old age. Naturally, then, by aggregating the budget constraints of the young and the old generation, we obtain, formally, Eq. (3.37), where now, $\mathrm{sav}_t$ and $R_{t+1}$ are as in (3.46). However, in the setting of this section, $\mathrm{sav}_t$ is not necessarily zero, as money can be transferred from one generation to another. In equilibrium, $\mathrm{sav}_t = \bar m_t/p_t$, where $\bar m_t$ denotes money supply. Therefore, the real value of money is strictly positive, if the equilibrium price $p_t$ stays bounded over time, which might actually occur, as we shall study below. As we see, the role of money as a medium for transferring value is, in this context, similar to that of a tree in the stochastic overlapping generations economy of Section 3.5.1.2.
Substituting the equilibrium savings sav = and +1 = +1 into Eq. (3.37), we obtain,
), which used again in Eq. (3.37), delivers,
1= + ( 1 + 2
sav

= sav

(3.47)

We need a law of movement for money creation. We assume that:$^7$

$$\frac{\Delta\bar m_t}{\bar m_{t-1}} = \mu_t,\quad\text{for some bounded sequence } \mu_t.$$

Replacing this into Eq. (3.47), leaves:

$$(1+\mu_t)\,R_t\,\mathrm{sav}_{t-1} = \mathrm{sav}_t.\tag{3.48}$$

The last relation can be obtained even more simply, noting that by definition, $(1+\mu_t)\,\frac{p_{t-1}}{p_t}\,\frac{\bar m_{t-1}}{p_{t-1}} = \frac{\bar m_t}{p_t}$. The previous relation can be generalized when population grows. Suppose that at time $t$, $N_t$ individuals are born, and that $N_t = (1+n)\,N_{t-1}$, for some constant $n$. Let money supply be given by $\bar m_t$, and assume that for all $t$, $\Delta\bar m_t/\bar m_{t-1} = \mu_t$. Then, by a reasoning similar to that leading to Eq. (3.48),

$$\frac{1+\mu_t}{1+n}\,R_t\,\mathrm{sav}(R_t) = \mathrm{sav}(R_{t+1}),\tag{3.49}$$

where now, we have set the real savings equal to a function of the interest rate, $\mathrm{sav}_{t-1} \equiv \mathrm{sav}(R_t)$, as it should be, by the solution to the program [3.P7].
Next, suppose that
is independent of , and that lim
= , say, a constant. Eq.
(3.49) leads to two stationary equilibria:
(a)

= 1+
. This stationary equilibrium relates to the Golden Rule, once we set = 0,
1+
as we shall
6= 0, the price is, in this stationary equilibrium,
say in Section 3.6.2. For
1+

= 1+
=
= 0 00 , and (ii) +1 = 0 00 1+
. All in all, the
0 . Then, we have: (i)
1+
agents budget constraints are bounded and the real value of money is strictly positive.
In this stationary equilibrium, agents trust money.

(b)

: sav ( ) = 0. This stationary equilibrium relates to an autarkic state. Generally, we


have that
: prices increase more rapidly than per-capita money stocks. Analyti
cally, sav ( ) = 0 implies that lim
0, which, in turn, implies that for large ,
/
1+
+1
= +1
= 1+
+1 /

=
= 1+
, and since lim
1+
+1
stationary equilibrium, agents do not trust money.

+1

+1

+1

. As for
0, then lim

, we have that
0. In this
+1

+1
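The monetary stationary equilibrium in case (a) can be illustrated with a short simulation. The sketch below is a hedged example: the function name, the starting values and the growth rates are assumptions chosen for illustration. It lets money grow at rate mu, population at rate n, and prices at the inflation rate implied by a gross return equal to (1 + n)/(1 + mu), and records real money balances per young agent along the path.

```python
def simulate_real_balances(mu, n, periods, m0=1.0, N0=1.0, p0=1.0):
    """Per-capita real balances when R = (1 + n) / (1 + mu) in every period."""
    R = (1.0 + n) / (1.0 + mu)  # gross return on money = p_t / p_{t+1}
    m, N, p = m0, N0, p0
    path = []
    for _ in range(periods):
        path.append(m / (N * p))
        m *= 1.0 + mu  # money creation
        N *= 1.0 + n   # population growth
        p /= R         # price level implied by the return on money
    return R, path


R, balances = simulate_real_balances(mu=0.05, n=0.02, periods=50)
print(R, balances[0], balances[-1])
```

Real balances per capita stay constant and strictly positive along this path, which is the sense in which agents trust money in this equilibrium.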

7 In this section, we assume that money transfers are made to the young generation: the money the young generation has to absorb is that from the old generation, $\bar m_{t-1}$, and that created by the central bank, $\mu_t\bar m_{t-1}$. One might consider an alternative model in which transfers are made to the old.


If $\mathrm{sav}(\cdot)$ is differentiable and $\mathrm{sav}'(\cdot) \neq 0$, the dynamics of $(R_t)_{t=0}^{\infty}$ can be analyzed through the slope,

$$\frac{dR_{t+1}}{dR_t} = \frac{\mathrm{sav}'(R_t)\,R_t + \mathrm{sav}(R_t)}{\mathrm{sav}'(R_{t+1})}\cdot\frac{1+\mu_t}{1+n}.\tag{3.50}$$

There are three cases:
(i) $\mathrm{sav}'(R) > 0$. Gross substitutability: the substitution effect dominates the income effect.

(ii) $\mathrm{sav}'(R) = 0$. Income and substitution effects compensate each other.
(iii) $\mathrm{sav}'(R) < 0$. Complementarity: the income effect dominates the substitution effect.
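The slope in Eq. (3.50) can be verified numerically for a specific saving function. The sketch below uses the logarithmic saving rule of Eq. (3.42) with constant endowments, for which the map in Eq. (3.49) can be inverted in closed form; the parameter values and helper names are assumptions made for the illustration. It compares the analytical slope with a central finite difference.

```python
def sav(R, w1, w2, beta):
    """Logarithmic-utility real savings as a function of the gross return."""
    return (beta * w1 - w2 / R) / (1.0 + beta)


def sav_prime(R, w1, w2, beta):
    """Derivative of sav; positive, i.e. gross substitutability."""
    return w2 / ((1.0 + beta) * R ** 2)


def next_R(R, w1, w2, beta, mu, n):
    """Solve (1 + mu)/(1 + n) * R * sav(R) = sav(R_next) for R_next."""
    target = (1.0 + mu) / (1.0 + n) * R * sav(R, w1, w2, beta)
    return w2 / (beta * w1 - (1.0 + beta) * target)


def slope_formula(R, w1, w2, beta, mu, n):
    """Right-hand side of Eq. (3.50)."""
    R1 = next_R(R, w1, w2, beta, mu, n)
    num = sav_prime(R, w1, w2, beta) * R + sav(R, w1, w2, beta)
    return num / sav_prime(R1, w1, w2, beta) * (1.0 + mu) / (1.0 + n)


params = dict(w1=1.0, w2=0.5, beta=0.9, mu=0.02, n=0.01)
R0, h = 0.8, 1e-6
finite_diff = (next_R(R0 + h, **params) - next_R(R0 - h, **params)) / (2 * h)
analytic = slope_formula(R0, **params)
print(finite_diff, analytic)
```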

The introductory example of this section leads to an instance of gross substitutability (see Eq. (3.42)). Note that an equilibrium cannot exist in that economy, once we assume agents do not have endowments in the second period, $w_{2,t+1} = 0$, as in this case, savings would be strictly positive, such that the equilibrium condition in Eq. (3.41) would not hold. These issues do not arise in the monetary setting of this section, where savings have to be positive
and equal to , in order to sustain a monetary equilibrium. Assume, for example, the CobbDouglas utility function, ( 1 2 +1 ) = 11 22 +1 , which leads to a real saving function equal
to sav(

+1 )

2
1

1+

reorganizing,

2 +1
+1
2
1

. If

2 +1

= 0, then, sav(

+1 )

1+ 2
2

and, by

an equation supporting the Quantity Theory of money. In this economy, the sequence of
gross returns satises, +1 = +1 = +1 1 1 +1 , or
+1

(1 + ) (1 +
1 + +1

+1 )

1 +1

+1
1

1
, equals the monetary creation factor, corrected for the growth rate of the
Gross ination,
economy as measured by +1 , the youngs endowments growth rate.

(
( 1)
( 1)
As a nal example, consider the utility function ( 1 2 +1 ) =
+ (1
) 2 +1
1
which collapses to Cobb-Douglas once
1. We have:

+1
+1

2 +1
2 +1

+1

+1

1+

2 +1
1
+1

sav (

+1 )

+1
+1

2 +1
+1

1
where
. To simplify, set (i)
= 1, (ii) 2 = = = 0, and (iii) 1 = 1 +1 . It can
0
be shown that in this case, sign (sav ( )) = sign (
1). Moreover, the dynamics of the gross
interest rate, , are given by:
+1

1)1

(1

(3.51)

The stationary equilibria are solutions to $R = f(R)$, and it is easily seen that one of them is $R = 1$, and corresponds to the monetary steady state.
When $\eta > 1$, the slope in Eq. (3.50) has always the same sign, and the mapping in Eq. (3.51) has two fixed points, $R_a = 0$ and $R = 1$, with $R_a$ being stable and $R$ being unstable, as illustrated by Figure 3.5 when $\eta = 2$.
FIGURE 3.4. The map $f(R)$, with $\eta = 2$ (horizontal axis: $R$, from 0.0 to 1.2; vertical axis: $f(R)$, from 0.0 to 1.5).

When
1, the situation is quite delicate. In this case,
is not well-dened, and = 1 is
not necessarily unstable. We may have sequences of gross interest rates, , converging towards
, or even the emergence of cycles. Mathematically, these properties
can be understood by

+1
examining the slope of the map in Eq. (3.50), for = 1,
= 1.

+1 =

=1

In the general case, Figure 3.6 depicts an hypothetical shape of the map
7
+1 , which
is that we might expect to arise in the
presence
of
gross
substituability
or,
in
fact,
even
in the
sav0 ( )
+1
case of complementary, provided sav( )
1 for all . In both cases, the slope,
0.

sav( )
+1
, is
= 1 + sav
0. The
Moreover, the slope at the monetary state, = 1+

0( )
1+
+1 =

slope at the monetary state, the point


in Figure 3.6, is greater than one, provided sav0 ( ) 0.
In this case, the monetary state is unstable, while the autarkic state, the point $A$ in Figure 3.6, is stable. Note that any path beginning from the right of the monetary state leads to explosive dynamics for $R_t$. These dynamics cannot be part of any equilibrium because they would imply a decreasing sequence of prices, $p_t$, thereby tilting the agents' budget constraints in such a way as to rule out the existence of a solution to the agents' programs. Therefore, the economy needs to start anywhere between the autarkic state and the monetary state, although then, we do not have any other piece of information: there exists, in fact, a continuum of initial interest rates in this range that are equally likely candidates for the beginning of the equilibrium sequence. Contrary to the representative agent models in the previous sections, the model of this section leads to an indeterminacy of the equilibrium, parametrized by the initial price $p_0$.
Would an autarkic equilibrium be the only possible stable steady state? The answer is in
the negative.
Consider the case where the map
7
+1 bends backwards and is such that

+1
1, such that the monetary steady state
is stable. A condition for the map

=
+1 =

sav0 ( )
+1
7
to
bend
backward
is
that
1,
and
the
condition
for

+1
sav( )
1 to hold is that

neighborhood of
8 For
1+
1+

sav0 ( )
sav( )

+1 =

1
.
2

In this case, the point

is reached from any su ciently

. Figure 3.7 shows a cycle of order two, where

)= (

). Multiplying the two equations side by side leaves the result that

117

the proof, note that by Eq. (3.49), we have, we have that for a cycle of order 2, (i)
(

1+
1+
2

(
.

.8 Note that to
) = (

), and (ii)

FIGURE 3.5. Gross substitutability. (Map from $R_t$, horizontal axis, to $R_{t+1}$, vertical axis; marked points: $A$ and $R_a$.)

FIGURE 3.6. (Map from $R_t$, horizontal axis, to $R_{t+1}$, vertical axis; marked points: $A$, $R^*$ and $R^{**}$.)

analyze the behavior of the gross interest rate, we need to make reference to backward-looking dynamics, as there exists an indeterminacy of forward-looking dynamics. Finally, there might exist more complex situations where cycles of order 3 exist, giving rise to what is known as a chaotic system. Note that these complex dynamics, including those in Figure 3.7, rely on the assumption that $\mathrm{sav}'(R) < 0$, which might be somewhat unappealing.
3.5.4 Money in a model with real shocks
Lucas (1972) is the first attempt to address issues relating to the neutrality of money in contexts with overlapping generations and uncertainty. This section is a simplified version of Lucas' model as explained by Stokey and Lucas (1989) (p. 504). Every agent works when young, so as to produce a consumption good, and consumes when he is old, and experiences a disutility of work equal to $-v(n_t)$, where $n_t$ is his labor supply, and $v$ is assumed to satisfy $v', v'' > 0$. Utility drawn from second period consumption is denoted with $u(c_{t+1})$, and has the standard properties. The agent faces the following program:

$$\max_{\{n\}}\left[-v(n_t) + \beta E\left(u(c_{t+1})\,|\,\mathcal F_t\right)\right]\quad\text{subject to}\quad\begin{cases} m_t = p_t y_t,\quad y_t = \epsilon_t n_t\\ p_{t+1}c_{t+1} = m_t\end{cases}$$

where $\mathcal F_t$ denotes the information set as of time $t$, $m$ is money holdings; $y$ is the agent's production, obtained through his labor supply $n$, and $(\epsilon_t)_{t=0,1,\cdots}$ is a sequence of positive shocks affecting his productivity. Finally, $p_t$ is the price of the consumption good as of time $t$. Replacing the first constraint into the second leaves $c_{t+1} = R_{t+1}\,\epsilon_t n_t$, where $R_{t+1} \equiv p_t/p_{t+1}$. Therefore, the program the agent solves is to $\max_n\left[-v(n) + \beta E\left(u(R_{t+1}\,\epsilon_t n)\,|\,\mathcal F_t\right)\right]$. The first-order condition leads to,

$$v'(n_t) = \beta E\left[u'(R_{t+1}\,\epsilon_t n_t)\,R_{t+1}\,\epsilon_t\,|\,\mathcal F_t\right].$$

We have, $c_{t+1} = \epsilon_{t+1}n_{t+1} = y_{t+1}$, where the first equality follows by the equilibrium in the good market. Replacing this relation into the previous equation leaves,

$$v'(n_t)\,n_t = \beta E\left(u'(\epsilon_{t+1}n_{t+1})\,\epsilon_{t+1}n_{t+1}\,|\,\mathcal F_t\right).\tag{3.52}$$

A rational expectation equilibrium is a labor supply $n_t = n(\epsilon_t)$, where $n(\cdot)$ satisfies Eq. (3.52), i.e.,

$$v'(n(\epsilon))\,n(\epsilon) = \beta\int_{\mathcal E}u'(\epsilon^{+}n(\epsilon^{+}))\,\epsilon^{+}n(\epsilon^{+})\,dP(\epsilon^{+}\,|\,\epsilon),$$

where $\mathcal E$ denotes the support of $\epsilon$. This equation simplifies as soon as productivity shocks are IID, $P(\epsilon^{+}\,|\,\epsilon) = P(\epsilon^{+})$, in which case, $v'(n(\epsilon))\,n(\epsilon)$ is independent of $\epsilon$, and $n$ is a constant $\bar n$.$^9$ This is a result about the neutrality of money, at least provided such a constant exists. Precisely, we have that $v'(\bar n) = \beta\int_{\mathcal E}u'(\epsilon\bar n)\,\epsilon\,dP(\epsilon)$. For example, consider $v(x) = \frac12 x^2$ and $u(x) = \ln x$, in which case $\bar n = \sqrt{\beta}$, $y(\epsilon) = \epsilon\sqrt{\beta}$ and $p(\epsilon) = \frac{m}{\epsilon\sqrt{\beta}}$.
3.6 Optimality and bubbles


3.6.1 Economies with production
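Consider the usual law of capital accumulation: $K_{t+1} = Y(K_t, N_t) - C_t$, for $K_0$ given. Dividing both sides of this equation by $N_t$ leaves:

$$k_{t+1} = \frac{1}{1+n}\left(y(k_t) - c_t\right),\quad\text{for } k_0 \text{ given.}\tag{3.53}$$

The stationary state of the economy is achieved when $k_t = k_{t+1} \equiv k$ and $c_t = c_{t+1} \equiv c$, such that:

$$c = y(k) - (1+n)\,k.$$

In steady-state, per-capita consumption attains its maximum at:

$$\bar k:\ y'(\bar k) = 1+n.\tag{3.54}$$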
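To make the Golden Rule condition concrete, the sketch below assumes a Cobb-Douglas production function in per-capita terms, with output equal to capital raised to the power alpha. This functional form, the parameter values and the function names are assumptions for illustration only. It computes the capital stock at which the marginal product equals 1 + n and compares steady-state consumption there with consumption at nearby capital stocks.

```python
def golden_rule_capital(alpha, n):
    """Capital per capita solving alpha * k**(alpha - 1) = 1 + n."""
    return (alpha / (1.0 + n)) ** (1.0 / (1.0 - alpha))


def steady_state_consumption(k, alpha, n):
    """c = y(k) - (1 + n) k, with y(k) = k**alpha (assumed technology)."""
    return k ** alpha - (1.0 + n) * k


alpha, n = 0.3, 0.02
k_bar = golden_rule_capital(alpha, n)
c_bar = steady_state_consumption(k_bar, alpha, n)
c_low = steady_state_consumption(0.9 * k_bar, alpha, n)
c_high = steady_state_consumption(1.1 * k_bar, alpha, n)
print(k_bar, c_bar, c_low, c_high)
```

With capital above the Golden Rule level, the marginal product falls short of 1 + n, and lowering capital raises steady-state consumption.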

The steady state per-capita capital satisfying Eq. (3.54) is said to satisfy the Golden Rule. A social planner would be able to increase per-capita consumption at the stationary state, provided $y'(k) < 1+n$. Indeed, because $y(k)$ is given, we can lower $k$ and have $dc = -(1+n)\,dk > 0$, immediately, and $dc = (y'(k) - (1+n))\,dk > 0$, in the next periods. In fact, this outcome would
9 The proof that ( ) = relies on the following argument. Suppose the contrary, i.e. there exists a point
0 and a neigh( 0 ) or (ii) ( 0 + )
( 0 ), for some strictly positive constant . We
borhood of 0 such that either (i) ( 0 + )
deal with the proof of (i) as the proof of (ii) is nearly identical. Since 0 ( ( ) ( )) is constant, and 00
0, we have that
0( (
0 ( ( )) ( )
0( (
0( (
( 0 ))
0. Next, note
0 + )) ( 0 + ) =
0
0
0 + )) ( 0 ). Therefore,
0 + )) ( ( 0 + )
( 0 ), contradicting that ( 0 + )
( 0 ).
that 0 0, such that ( 0 + )


apply along the entire capital accumulation path of the economy, not only in steady state, as we now illustrate. First, a definition. We say that a path $(k_t, c_t)_{t=0}^{\infty}$ is consumption-inefficient if there exists another path $(\tilde k_t, \tilde c_t)_{t=0}^{\infty}$ satisfying Eq. (3.53), and such that $\tilde c_t \ge c_t$ for all $t$, with at least a strict inequality for one $t$. The following is a slightly less general version of Theorem 1 in Tirole (p. 161):
Theorem 3.3 (Cass-Malinvaud theory). A path ( ) =0 is: (i) consumption e cient if
0( )
)
1 for all , and (ii) consumption ine cient if 1+
1 for all .
1+
0(

Proof. As for Part (i), suppose


consumption e cient path. Since

is consumption e cient, ad let = + be an alternative


= 0. Moreover, by Eq. (3.53),
0 is given,

(1 + ) ( +1

+1 )

= ( )

( )

and because is consumption-e cient,


, with at least one strictly equality for some .
Therefore, by concavity of , and the denition of ,
( )

( )

(1 + )( +1

+1 )

( ) + 0( )

( )

(1 + )

+1

( )
( 0)
or +1
. Evaluating this inequality at = 0 yields 1
0 , and since 0 = 0, one
1+
1+
0( )
has that 1 0. Since 1+
1 for all , then
as
, which contradicts
has
bounded trajectories. The proof of Part (ii) is nearly identical, except that, obviously, in this
case, lim inf
. Note, in general, there are innitely many sequences that allow for
e ciency improvements. k

The reasoning in this section holds independently of whether the economy has a finite number of agents living forever, or overlapping generations. For example, in the case of overlapping generations, Eq. (3.53) is the capital accumulation path for Diamond's model, once we set
2 +1
= 1 + 1+
. An important issue is to establish whether actual economies are dynamically efficient. In a seminal contribution, Abel, Mankiw, Summers and Zeckhauser (1989) provide a framework to address this issue and conclude in favor of dynamic efficiency.
[In progress]
3.6.2 Over-accumulation of capital
Bubbles in Diamond's model
[In progress]
3.6.3 Money
We wish to find first-best optima, that is, equilibria that a social planner may choose, by acting directly on agents' consumption, without needing to force the agents to make use of money.10
Let us analyze, first, the stationary state, $R = \frac{1+n}{1+\mu}$. We show that this state corresponds to the stationary state where consumptions and endowments are constants, and that the agents' utility is maximized when $\mu = 0$. Indeed, since the social planner allocates resources without
10 In a second-best equilibrium, a social planner would let the market play first, by allowing the agents to use money and, then, would parametrize such virtual equilibria by $\mu_t$. The indirect utility functions that arise as a result would then be expressed in terms of these growth rates $\mu_t$. The social planner would then maximize an aggregator of these utilities with respect to $\mu_t$.



having regard to money, the only constraint is: $c_1 + \frac{c_2}{1+n} = w_1 + \frac{w_2}{1+n}$, such that the utility of the stationary agent is:

$$u(c_1, c_2) = u\left(w_1 + \frac{w_2 - c_2}{1+n},\ c_2\right).$$

The first-order condition is $\frac{u_{c_2}}{u_{c_1}} = \frac{1}{1+n}$. Instead, the first-order condition in the market equilibrium is $\frac{u_{c_2}}{u_{c_1}} = \frac{1}{R}$. Therefore, the Golden Rule is attained in the market equilibrium, if and only if $\mu = 0$. The social planner policy converges towards the Golden Rule. Indeed, the social planner solves:
max

()

2 +1 )

subject to

=0

P
()
2
or max =0
1+
as of time , and the notation

1+

1+

, where is the weight the planner gives to the generation


is meant to emphasize that that endowments may change

2 +1
()

from one generation to another. The rst-order conditions,

Golden Rule in stead-state state (modied by the weight ).

121

1)
2
( )
1

1+

, lead to the modied


3.7 Appendix 1: Finite di erence equations, with economic applications


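Let $z_t \in \mathbb{R}^d$, and consider the following linear system of finite difference equations:

$$z_{t+1} = A\,z_t,\qquad t = 0, 1, \cdots\tag{3A.1}$$

for some matrix $A$. The solution to Eq. (3A.1) is:

$$z_t = v_1\kappa_1\lambda_1^t + \cdots + v_d\kappa_d\lambda_d^t,\tag{3A.2}$$

where $\lambda_i$ and $v_i$ are eigenvalues and eigenvectors of $A$, and $\kappa_i$ are constants, which will be determined below. The standard proof of this result relies on the so-called diagonalization of Eq. (3A.1). Let us consider the system of characteristic equations for $A$, $(A - \lambda_i I)\,v_i = 0_{d\times 1}$, where $\lambda_i$ is scalar and $v_i$ a $d\times 1$ column vector, for $i = 1,\cdots,d$, or, in matrix form, $AP = P\Lambda$, where $P = (v_1,\cdots,v_d)$ and $\Lambda$ is a diagonal matrix with $\lambda_i$ on its diagonal. We assume that $P^\top = P^{-1}$. By post-multiplying by $P^{-1}$ leaves the spectral decomposition of $A$:

$$A = P\Lambda P^{-1}.\tag{3A.3}$$

By replacing Eq. (3A.3) into Eq. (3A.1), and rearranging terms,

$$\nu_{t+1} = \Lambda\,\nu_t,\quad\text{where } \nu_t \equiv P^{-1}z_t.$$

The solution for $\nu$ is $\nu_{it} = \kappa_i\lambda_i^t$, and the solution for $z$ is: $z_t = P\nu_t = (v_1,\cdots,v_d)\,\nu_t = \sum_{i=1}^{d}v_i\nu_{it} = \sum_{i=1}^{d}v_i\kappa_i\lambda_i^t$, which is Eq. (3A.2).
To determine the vector of constants $\kappa = (\kappa_1,\cdots,\kappa_d)^\top$, we first evaluate the solution at $t = 0$,

$$z_0 = (v_1,\cdots,v_d)\,\kappa = P\kappa,\quad\text{whence}\quad\kappa(P) = P^{-1}z_0,\tag{3A.4}$$

where the columns of $P$ are vectors belonging to the space of the eigenvectors. Naturally, there is an infinity of these vectors. However, the previous formula shows how the constants $\kappa(P)$ need to adjust so as to guarantee the stability of the solution with respect to changes in $P$.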
3A.1 Example. Let $d = 2$, and suppose that $\lambda_1 \in (0,1)$, $\lambda_2 > 1$. The resulting system is unstable for any initial condition, except perhaps for a set with measure zero. This set of measure zero gives rise to the so-called saddlepoint path. We can calculate the coordinates of such a set. We wish to find the set of initial conditions such that $\kappa_2 = 0$, so as to rule out an explosive behavior related to the unstable root $\lambda_2 > 1$. The solution at $t = 0$ is:

$$\begin{pmatrix}x_0\\ y_0\end{pmatrix} = z_0 = P\kappa = (v_1\ \ v_2)\,\kappa = \begin{pmatrix}v_{11}\kappa_1 + v_{12}\kappa_2\\ v_{21}\kappa_1 + v_{22}\kappa_2\end{pmatrix},$$

where we have set $z = (x, y)^\top$. By replacing the second equation into the first, and solving for $\kappa_2$ yields:

$$\kappa_2 = \frac{v_{11}y_0 - v_{21}x_0}{v_{11}v_{22} - v_{12}v_{21}},$$

which is zero when

$$y_0 = \frac{v_{21}}{v_{11}}\,x_0.$$

For this system, the saddlepoint is a line with a slope equal to the ratio of the two components of the eigenvector for $\lambda_1$, the stable root. Figure 3A.1 depicts the phase diagram for this system, with the divergent line satisfying the equation $y_0 = \frac{v_{22}}{v_{12}}\,x_0$.
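The saddlepoint logic of this example can be reproduced numerically. The sketch below is illustrative only: the matrix entries, the helper names and the number of iterations are assumptions. It computes the eigenvalues and eigenvectors of a two-dimensional system with one stable and one unstable root, starts the system on the line spanned by the stable eigenvector, and contrasts the resulting path with one starting slightly off that line.

```python
import math


def eig2(a, b, c, d):
    """Eigenpairs of [[a, b], [c, d]], ordered by modulus.

    Assumes real, distinct eigenvalues and b != 0.
    """
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4.0 * det)
    lams = sorted([(tr - disc) / 2.0, (tr + disc) / 2.0], key=abs)
    vecs = [(b, lam - a) for lam in lams]  # solves (A - lam I) v = 0
    return lams, vecs


def iterate(A, x, y, periods):
    """Apply z_{t+1} = A z_t for the given number of periods."""
    (a, b), (c, d) = A
    for _ in range(periods):
        x, y = a * x + b * y, c * x + d * y
    return x, y


A = ((1.5, 1.0), (0.5, 0.8))  # assumed coefficients
(lam1, lam2), (v1, v2) = eig2(A[0][0], A[0][1], A[1][0], A[1][1])
slope = v1[1] / v1[0]  # v21 / v11, slope of the saddlepoint path

x0 = 1.0
on_path = iterate(A, x0, slope * x0, 20)
off_path = iterate(A, x0, slope * x0 + 0.01, 20)
print(lam1, lam2, slope, on_path, off_path)
```

The path starting on the line converges to the origin, while the small deviation is amplified by the unstable root, which is why the free variable has to jump exactly onto the saddlepoint path.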

FIGURE 3A.1. Phase diagram in the $(x, y)$ plane, with initial point $(x_0, y_0)$ and the saddlepoint path $y = (v_{21}/v_{11})\,x$.

A saddlepoint path brings the following economic content. If $x$ is a predetermined variable, $y$ must jump to the saddlepoint $y_0 = \frac{v_{21}}{v_{11}}\,x_0$, so as to ensure the system does not explode. Note, then, that a conceptual difficulty arises should the system include two predetermined variables, as in this case, there are no stable solutions, generically. However, this possibility is unusual in economics. Consider the next example.
3A.2 Example. The system of Example 3A.1 is exactly the one for the neoclassic growth model, as
we now demonstrate. Section 3.3.3 shows that in a small neighborhood of the stationary values ( ),
the deviations ( ) of capital and consumption from ( ), satisfy Eq. (3.15), which is reported
here for convenience:

0( )

+1
1
0( )
0( )
=
00 ( ) 1 +
00 ( )
+1

00 ( )
00 ( )
0(

By using the relation,

) = 1, and the standard conditions on utility and production,

we have that the two eigenvalues of


1

1, and (ii) tr ( ) =
tr( )2

4 det( ) =

+1+
+1+

are:

1 2

0(

) 00
(
00 ( )
0(

)
00 ( )

00

tr( )

tr( )2 4 det( )
,
2

and ,

where (i) det( ) =

0(

)=

1 + det ( ). Next, note that:

2
( )

2
+1

= 1

1 2

1
It follows that 2 = 12 (tr( ) +
)
)
1 + 12
1. Finally, to show that
2 (1 + det( ) +
p
2
(0 1), note thatpthat since det( ) 0, one has 2 1 = tr ( )
tr( )
4 det( ) 0; moreover,
1
1
tr ( )
tr( )2 4 det( )
2, or (tr ( ) 2)2
tr ( )2 4 det( ), which is true, by
1
simple computations.

We generalize the previous examples to the case where


2. The counterpart of the saddlepoint
in Eq. (3A.3)
for = 2, is called convergent, or stable subspace. It is the locus of points such that
does not explode. (In the case of nonlinear systems, this convergent subspace is termed convergent, or
1 , and rewrite Eq. (3A.4),
stable manifold. In this appendix we only study linear systems.) Let
i.e. the system determining the solution for , as follows:
=



We assume the elements of and are ordered in such a way that
| | 1 for = + 1 . Then, we partition as follows:

=
(

:| |

1 for = 1

and

Proceeding similarly as in Example 3A.1, we aim to make sure the system stays trapped in the
convergent space and, accordingly, require that: +1 = = = 0, or,

+1

..
.

= 0(

0
(

)1

Let
+ , where is the number of free variables and is the number of predetermined variables.
and 0 in such a way to disentangle free from predetermined variables, as follows:
Partition

0(

)1

0
(

(1)
(

free
0
1
pre
0
1

(2)
)

(1)

=
(

free
0
1

(2)

+
(

pre
0
1

or,
(1)
(

free
0
1

pre
0
1

(2)

=
(

This system has


equations with unknowns, the components of 0free : indeed, 0pre is known, as
(1)
(2)
it the -dimensional vector of predetermined variables, and
depend on the primitive data
(1)
of the economy, through their relation with . We assume that
has full rank.
(

We shall refer to as the dimension of the convergent subspace, S say. The reason for this terminology is the following. Consider the solution for ,
=
For

1 1 1

+ +

+1 +1

+1

+ +

to remain stuck in S, it must be the case that

+1

= = = 0

= ( 1 1

in which case,
=
1

1 1 1
1

+ +

i.e.,

where

( 1 1

) and
hi

1
1

R :

>

>
1

. Finally, for each , introduce the vector subspace:

Clearly, for each , dimh i = rank( ) = .


There are three cases to consider:

124

R}


(i)

= , or = . The dimension of the divergent subspace is equal to the number of the


free variables or, equivalently, the dimension of the convergent subspace is equal to the number
of predetermined variables. In this case, the system is determined. The previous conditions
are interpreted as follows. The predetermined variables identify one and only one point in the
convergent space, which gives rise to only one possible jump that the free variables can make to
(1) 1 (2) pre
ensure the system remain in the convergent space: 0free =
0 . This case is exactly
as in Example 3A.1, where = 2, = 1, and the predetermined variable is . In this example,
0 identies one and only one point in the saddlepoint path, such that starting from that point,
there is one and only one value of 0 guaranteeing that the system does not explode.

(ii)

. There are generically no solutions lying in the convergent spacea case


, or
mentioned just before Example 3A.2.

(iii)

. There are innitely many solutions lying in the convergent space, a


, or
phenomenon typically referred to as indeterminacy. Note that in this case, sunspots equilibria
may arise. In Example 3A.1, = 1, and so in order for this case to emerge when = 2, we might
need to rule out the existence of predetermined variables.


3.8 Appendix 2: Neoclassic growth in continuous-time


3.8.1 Convergence from discrete-time
3.8.1.1 Laws

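Consider chopping time in the law of population growth as follows:

$$N_{hk} - N_{h(k-1)} = n\,h\,N_{h(k-1)},\qquad k = 1,\cdots,\ell,$$

where $n$ is an instantaneous rate, and $\ell = t/h$ is the number of subperiods in the given time period $t$. The solution is $N_{h\ell} = (1+nh)^{\ell}N_0$, or $N_t = (1+nh)^{t/h}N_0$. By taking limits leaves:

$$N_t = \lim_{h\downarrow 0}\,(1+nh)^{t/h}N_0 = e^{nt}N_0.$$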
We proceed similarly with the law of capital accumulation. We have:

+ ( +1)
= 0
( +1) = 1

where is an instantaneous rate, and

Taking the limits for

= 1

= . By iterating,

0 yields:

0+

=1

/
X

0+

(3.55)

or in di erential form:
=

(3A.5)

By replacing the IS equation,


=

)=

(3A.6)

into Eq. (3A.5), we obtain the law of capital accumulation:


=

(3.56)

3.8.1.2 Discretization issues

Consider the following exact discretization of Eq. (3.55):

$$K_{t+1} = e^{-\delta}K_t + \int_t^{t+1}e^{-\delta(t+1-u)}\,I(u)\,du.\tag{3A.7}$$

By identifying with the capital accumulation law in discrete time, $K_{t+1} = (1-\bar\delta)\,K_t + I_t$, leaves

$$\delta = \ln\frac{1}{1-\bar\delta}.$$

That is, while $\bar\delta \in [0,1)$, $\delta$ can take on values on the entire positive real line. In the continuous time model, then, $\lim_{\bar\delta\to 1}\ln\frac{1}{1-\bar\delta} = \infty$. In continuous time, we cannot imagine a maximal rate of capital depreciation, as this would be infinite!
Let us replace $\delta$ into the exact discretization in Eq. (3A.7):

$$K_{t+1} = (1-\bar\delta)\,K_t + \underbrace{\int_t^{t+1}e^{-\delta(t+1-u)}\,I(u)\,du}_{=\,I_t}.$$

That is, investments over the period from $t$ to $t+1$ are given by the continuous flow of investments during this period, discounted at the appropriate depreciation rates.
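The mapping between the two depreciation rates is easy to tabulate. The sketch below is a minimal illustration, with hypothetical function names: it converts a per-period depreciation rate in [0, 1) into the instantaneous rate that leaves the same fraction of capital after one period, and shows how the instantaneous rate grows without bound as the per-period rate approaches one.

```python
import math


def continuous_rate(discrete_rate):
    """Instantaneous rate matching a per-period rate in [0, 1)."""
    if not 0.0 <= discrete_rate < 1.0:
        raise ValueError("discrete_rate must lie in [0, 1)")
    return -math.log(1.0 - discrete_rate)


def discrete_rate(continuous):
    """Per-period rate implied by a non-negative instantaneous rate."""
    return 1.0 - math.exp(-continuous)


rates = [0.0, 0.1, 0.5, 0.9, 0.999]
table = [(d, continuous_rate(d)) for d in rates]
for d, c in table:
    print(d, c)
```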



3.8.1.3 Per-capita dynamics

Dene

and divide both sides of Eq. (3.56) by

We have: =

= ( )

, which replaced into the previous equation leaves

= ( )

It is the capital accumulation contraint used to solve the program in the next section.

3.8.2 The model


Consider the following social planner problem:
Z
max

( )

s.t. = ( )

[3A.P1]

where all variables are per-capita. We assume there is no capital depreciation. (Note that for the
discrete time model, we assumed, instead, a total capital depreciation.) The Hamiltonian is,

+
= ( )+
( )

where is a co-state variable. As explained below (see Appendix 3), the rst-order conditions for this
problem are:
= 0
=

The rst and the third conditions lead to

= 0 ( )
=
++

0(

(3A.8)

By di erentiating the rst equation in (3A.8) with respect to time yields

00
( )
=

0( )

and using the third equation in (3A.8),


=

)
++
00 ( )
0(

( )

(3A.9)

The equilibrium is the solution of the system consisting of the constraint of the program [3A.P1],
and Eq. (3A.9). Similarly as in Section 3.3.3, we analyze the dynamics of the system in a small
neighborhood of the stationary state, dened as the solution ( ) of the constraint of the program
[3A.P1], and Eq. (3A.9), when ( ) = ( ) = 0,

+
= ( )
+ + = 0( )


A rst-order approximation of both sides of the constraint of the program [3A.P1], and Eq. (3A.9),
near ( ), yields:
0( )
00 ( ) (
)
=
00 ( )
=
(
) (
)
where we used the equality
system can be rewritten as:

++ =

0(

). By setting
0(

and

00 (

)
)

00 (

, the previous

where
(
)> . Warning! There must be some mistake somewhere.
1 , where
and are as in Appendix 1. We have:
We diagonalize this system by setting =
=
1

where

We see that

. The eigenvalues are solutions of the following quadratic equation:

2,

and

0=

1
2

0(

)
)

00 (
2

+4

0(

00

) 00
) (

00 (

( )

). The solution for

is:

=1 2

whence
=
where the

s are 2 1 vectors. We have,

=
=

Let us evaluate this solution in = 0,

21 1

+
+

11 1

1 1

12 2
22 2

By repeating the reasoning of the previous appendix,


2

=0

2 2

2
2

1
2

21
11

As in the discrete time model, the saddlepoint path is located along a line that has as a slope
the ratio of the components of the eigenvector associated with the negative root. We can explicitely
compute such ratio. By denition, 1 = 1 1
0(

)
)
11 +
00 (

i.e.,

21
11

1
0( )
00 (
00 ( )

and simultaneously,

21
11

00 (

) =

1 11

1 21

21
1
1


3.9 Appendix 3: Notes on optimization of continuous time systems


Consider the following optimization problem. For 0 given,
Z
(
) (
(
)
max
)
(

s.t.

[0

+ (

(3A.10)

is
where and are an instantaneous and a terminal payo , is a subjective instantaneous rate,
a standard Brownian Motion and, nally, (
) and (
) are given drift and di usions functions.
We interpret as a state variable and as a control. In general, the control depends on the realization
of although we assume that it cannot depend on future observations of it can only depend on
past values of . We conne attention to controls known as feedbacks, i.e., such that only depends
( )), for each sample path of the state variable
( ).
on the current value of , ( ) (
The function makes the control and, then, the state variable , Markovian.
We apply what is known as the stochastic programming principle. Heuristically, we maximize up to
an intermediate point in time, + , say, assuming the maximization for the remaining time period
[ +
] holds. We have:
Z

(
)
(
)
) = max
(
) +
( )
(
(

[0

max
)

[0

+
=

max

[0

(
Z

+
)

(
+

)
( (

))

where the last two lines follow by the law of iterated expectations. By rearranging terms,

1
(
)
Z +

1
1
(
)
( ( +
(
)
+
+
)
(
= max
(

[0

For small

, and under regularity conditions, we have that by It


os lemma,

+
(
)
(
)
(
)+
0 = max

subject to the boundary condition (


)=
The rst order condition for Eq. (3A.11) is
0=

which implicitly denes


(

) +
(
)

( ), where

) +

1
2

1
2

))

(3A.11)

2(

2.

), such that Eq. (3A.11) can be written as:


(
(

)) + (
(

)
(

129

(
))

)) +

1
(
2

))
(3A.12)



where
(

is interpreted as the marginal indirect utility in (3A.10) with respect to , and


(

)
2

The function is usually referred to as the optimized Hamiltonian, and Eq. (3A.12) is the Bellman
Equation for di usion processes.
By Itô's lemma:

1 2 2
+
+
(3A.13)
+
=
2 2
Moreover, by di erentiating both sides of the Bellman Equation (3A.12) with respect to ,
2

1
2

(3A.14)

where the rst equality follows by the denition of , and the third equality holds by the denition of
the optimized Hamiltonian function, (
). Plugging Eq. (3A.14) into Eq. (3A.13) leaves:

=
+
which shows that:

1
2

(3A.15)

To summarize, we solve the problem in three steps:

1. Maximize the Hamiltonian function, $u(k, v) + \lambda\, g(k, v) + \tfrac{1}{2}\lambda_k\, \sigma(k, v)^2$,
with respect to the control $v$. The result is the optimized Hamiltonian function, $H(k, \lambda, \lambda_k)$.

2. Impose the condition in Eq. (3A.15).

3. Solve the partial differential equation (3A.11) for $J$.
Succinctly,

=0
(3A.16)

1 2 2
2 2
Note that in the infinite horizon case (i.e., the problem in (3A.10) when
collapses to
)+
(
)
(
))
0 = max ( (
=


), Eq. (3A.11)


subject to the transversality condition,


lim

[ (

)] = 0

The deterministic model in Appendix 2 of this chapter is a special case of the current setup. Indeed,
the rst of Eqs. (3A.8) reveals that is not a function of such that Eqs. (3A.8) are a special case of
Eqs. (3A.16), namely for
0.


References
Abel, A.B., N.G. Mankiw, L.H. Summers and R.J. Zeckhauser (1989): "Assessing Dynamic
Efficiency: Theory and Evidence." Review of Economic Studies 56, 1-20.
Cochrane, J. H., F. A. Longstaff, and P. Santa-Clara (2008): "Two Trees." Review of Financial
Studies 21, 347-385.
Farmer, R. (1998): The Macroeconomics of Self-Fulfilling Prophecies. Boston: MIT Press.
Hayashi, F. (1982): "Tobin's Marginal q and Average q: A Neoclassical Interpretation." Econometrica 50, 213-224.
Kamihigashi, T. (1996): "Real Business Cycles and Sunspot Fluctuations are Observationally
Equivalent." Journal of Monetary Economics 37, 105-117.
King, R. G. and S. T. Rebelo (1999): "Resuscitating Real Business Cycles." In: J. B. Taylor
and M. Woodford (Editors): Handbook of Macroeconomics, Elsevier.
Lucas, R. E. (1972): "Expectations and the Neutrality of Money." Journal of Economic Theory
4, 103-124.
Lucas, R. E. (1978): "Asset Prices in an Exchange Economy." Econometrica 46, 1429-1445.
Lucas, R. E. (1994): "Money and Macroeconomics." In: General Equilibrium 40th Anniversary
Conference, CORE DP no. 9482, 184-187.
Martin, I. (2011): "The Lucas Orchard." Working Paper Stanford University.
Menzly, L., T. Santos, and P. Veronesi (2004): "Understanding Predictability." Journal of
Political Economy 112, 1-47.
Pavlova, A. and R. Rigobon (2008): "The Role of Portfolio Constraints in the International
Propagation of Shocks." Review of Economic Studies 75, 1215-1256.
Prescott, E. (1991): "Real Business Cycle Theory: What Have We Learned?" Revista de Análisis Económico 6, 3-19.
Stokey, N. L. and R. E. Lucas (with E.C. Prescott) (1989): Recursive Methods in Economic
Dynamics. Harvard University Press.
Tirole, J. (1988): "Efficacité intertemporelle, transferts intergénérationnels et formation du
prix des actifs: une introduction." Mélanges économiques. Essais en l'honneur de Edmond
Malinvaud. Paris: Editions Economica & Editions EHESS, 157-185.
Tobin, J. (1969): "A General Equilibrium Approach to Monetary Policy." Journal of Money,
Credit and Banking 1, 15-29.
Watson, M. (1993): "Measures of Fit for Calibrated Models." Journal of Political Economy
101, 1011-1041.


4
Continuous time models

4.1 Introduction
This chapter is an introduction to asset pricing models cast in continuous time. As such, it
does not introduce any new economic concept beyond what we have already learned in
previous chapters. Nevertheless, continuous time methods are powerful, as they allow us to deal
with issues arising in economies and markets more complex than those in the previous chapters.
Moreover, from an applied perspective, continuous time methods are extremely useful to evaluate
derivative instruments that draw value from complex events, such as those relating to baskets
of credit events, capital market volatility, or history-dependent developments in fixed income
security markets, to name just a few, as we shall see in Part III of these lectures. Continuous
time models pose challenges to econometricians: we only observe a discrete realization of an
idealized continuous time data generating process. The next chapter surveys tools, based on
simulations, which allow us to mitigate these challenges.
This chapter has two aims. The first is to explain in detail how the principle of absence
of arbitrage works in continuous time: how do asset prices need to drift to ensure that there
is no arbitrage? How many possible drifts would we expect to see in arbitrage-free markets?
The second objective is to develop technical details about the properties of asset prices in
continuous time. For example, we shall see that asset prices, once restricted by absence of
arbitrage, satisfy partial differential equations under regularity conditions. Yet asset prices are
discounted expectations of their future payoffs, taken under the risk-neutral probability. How
are these properties tied together? We shall explain the link by introducing the celebrated
Feynman-Kac theorem, which provides a probabilistic representation of the solution to a partial
differential equation. Moreover, what is the relation between the risk-neutral probability and
the physical probability? How do we need to tilt the physical probability to determine the
risk-neutral one? How many risk-neutral probabilities exist, in incomplete markets or in markets
with frictions? Are there natural pricing probabilities arising in complex contexts such as those
in which interest rates are random? And how do these pricing probabilities relate to the notion
of numéraires? The Girsanov theorem is the starting point that we need to deal with these
fundamental questions.


The models we consider in this chapter are diffusion models (with some extensions that accommodate
jumps), which are the workhorse in finance. Diffusion models are, so to speak,
those where the variations of a variable of interest are driven by a deterministic component
(the "drift") and a stochastic one (the "diffusion"). Heuristically, the diffusion component is
normally distributed over an infinitesimal amount of time, being proportional to the variations
of what is known as a Brownian motion. We typically assume that the fundamentals of the
economy follow diffusion processes, and that asset prices are rational, in that they are a function
of these fundamentals. Absence of arbitrage restricts the set of all possible pricing functions.
The fundamental tool with which we link asset prices to fundamentals is Itô's lemma, a device
we need to build new processes (in our case, the asset prices) from old ones (the fundamentals
of the economy). The complication in finance is that these new processes, albeit a function of
the fundamentals, are not given in advance; instead, they are the focus of research.
The chapter is organized as follows. The next section is an introduction to methods. It deals
with details leading to the birth of continuous time finance, the Black & Scholes formula
for the evaluation of European options; it also illustrates how continuous time models obtain as
limiting cases of the discrete time models in the previous chapter, and describes basic
properties of long-lived asset prices, such as (i) the fundamental relations that link expected
returns, volatilities (the "betas") and risk-premiums (the "lambdas"); and (ii) a representation
of the price-dividend ratio in terms of certain possibly varying discount rates, the risk-adjusted
discount rates. These derivations turn out to be useful while discussing the properties of equity
markets in Chapter 7.
Section 4.3 ... Finally, the Appendix provides technical details omitted from the main text,
including a self-contained appendix containing notions of stochastic calculus.
[In progress]

4.2 An introduction to no-arbitrage and equilibrium


4.2.1 Time
By definition, long-lived assets do not have an expiration date and their prices do not explicitly
depend on calendar time,¹ at least insofar as the state variables driving them do not explicitly
depend on time. Naturally, this does not mean that these prices cannot be time-varying, or
stochastic. On the contrary, they can well be driven by the random state variables, as we shall
explain in a few sections.
Derivatives do, instead, have an expiration date, and their value reflects the time left to
maturity, when they will be worth the terminal payoff. The presence of an expiration date
implies that derivative prices solve partial differential equations, which also involve the price
sensitivity to time. In the absence of any expiration date, asset prices are still solutions to
partial differential equations, but their price sensitivity to time is zero, at least provided that
the state variables driving them are independent of time. In the simple case of a single state
variable, long-lived asset prices are, then, solutions to ordinary differential equations, as in the
Lucas model with a single state variable of Section 4.3.
The next section develops the simplest possible setting where we can illustrate the main
tools of continuous time finance, which is the Black and Scholes (1973) market leading to the
evaluation formula for a European option. In the absence of arbitrage, the derivative price is
the solution to a certain partial differential equation, which we can represent as a
conditional expectation taken under the risk-neutral probability. In Section 4.3.3, we provide
the link between this risk-neutral probability and the original probability.

1 See, e.g., Eq. (4.28) below.
4.2.2 The origins: Black & Scholes
4.2.2.1 Self-financed strategies

A self-financed portfolio leads to a situation where the change in value of the portfolio between
two instants $t$ and $t + dt$ is determined as a mark-to-market P&L: the change in the asset prices
times the quantities of the same assets held at time $t$; there is no injection or withdrawal of
funds between any two instants. For example, let $\pi_1$ and $S$ be the number of shares and the
price of some risky asset, and $\pi_2$ and $M$ be the number of units of some riskless asset and its
price, with $dM_t = r M_t\,dt$. Then, the value of a self-financed portfolio, $V = \pi_1 S + \pi_2 M$, satisfies:

$$
dV_t = \pi_{1t}\,dS_t + \pi_{2t}\,dM_t = \pi_t\left(\frac{dS_t}{S_t} - r\,dt\right) + r V_t\,dt,
$$

where $\pi \equiv \pi_1 S$ and the second equality follows by simple calculations. If the portfolio strategy
involves risky assets distributing a dividend process, and consumption, the value of the
self-financed portfolio satisfies (see Appendix 2 for details):

$$
dV_t = \pi_t\left(\frac{dS_t}{S_t} + \frac{D_t}{S_t}\,dt - r\,dt\right) + r V_t\,dt - c_t\,dt, \tag{4.1}
$$

where $D$ is the dividend process and $c$ is the consumption rate.

4.2.2.2 Black & Scholes partial differential equation

Why are partial differential equations so important in finance? Suppose that the price of a stock
follows a geometric Brownian motion:

$$
\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t,
$$

and that there exists a riskless accounting technology, or money market account (MMA, henceforth)
making spare money evolve as:

$$
\frac{dM_t}{M_t} = r\,dt,
$$

where $r > 0$. Finally, suppose that there exists another asset, a call option, which gives rise
to a payoff equal to $(S_T - K)^+$ at some future date $T$, where $K$ is the strike, or exercise
price of the option. Let $(c_t)_{t \in [0,T]}$ be the option price process. We wish to figure out what this
price looks like while formulating as few assumptions as possible. We ignore dividend issues,
assume there are no transaction costs, and rule out any other frictions. We assume rational
expectations, that is, there exists a function $C$ such that $c_t = C(t, S_t)$, and assume that this function is
as differentiable as needed for an application of Itô's lemma, such that:

$$
dc = \mathcal{L}C\,dt + \sigma S C_S\,dW,
$$

where $\mathcal{L}C$ is the infinitesimal generator, defined as
$\mathcal{L}C = C_t + \mu S C_S + \tfrac{1}{2}\sigma^2 S^2 C_{SS}$, with subscripts
denoting partial derivatives. Next, we create the following portfolio: $\alpha_t$ units of the risky asset
and, as a residual of the wealth, $\beta_t$ units in the MMA. As explained, for any portfolio
strategy to be self-financed, the value of the resulting portfolio, $V_t = \alpha_t S_t + \beta_t M_t$, must
be such that:

$$
dV = \alpha\,dS + \beta\,dM = (\mu \alpha S + r \beta M)\,dt + \sigma \alpha S\,dW.
$$

Now, set $V_0 = c_0$. Moreover, let us actually conjecture that we could choose $\alpha$ and $\beta$ such
that $V_t = c_t$ for all $t$, i.e. that the self-financed strategy can replicate the option price. We
obviously need to check this conjecture below. For now, note that if $V_t = c_t$ for all $t$, the drifts
and diffusion coefficients of both $V$ and $c$ have to be the same, by a result stated in Appendix
1, known as the unique decomposition property. Diffusion terms are the same when $\alpha = C_S$.
Replacing this into the dynamics of $V$ produces:

$$
dV = (\mu S C_S + r \beta M)\,dt + \sigma S C_S\,dW. \tag{4.2}
$$

Next, match the drifts: $\text{drift}(V) = \text{drift}(c)$, i.e.,
$\mu S C_S + r \beta M = C_t + \mu S C_S + \tfrac{1}{2}\sigma^2 S^2 C_{SS}$.
Because we are conjecturing that $V_t = c_t$ for all $t$:

$$
\beta M = V - \alpha S = C - S C_S. \tag{4.3}
$$

By the definition of the infinitesimal generator, and rearranging terms,

$$
C_t + r S C_S + \tfrac{1}{2}\sigma^2 S^2 C_{SS} - r C = 0, \quad \text{for all } (t, S) \in [0, T) \times \mathbb{R}_{++}, \tag{4.4}
$$

subject to the boundary condition $C(T, S) = (S - K)^+$ for all $S$. Eq. (4.4) is a partial differential
equation; that is, we are searching for an unknown function $C$, which has to be such that
once it and its partial derivatives are plugged into the left hand side of Eq. (4.4), we obtain
zero. Moreover, the same function must pick up the boundary condition. The solution to this
is the celebrated Black and Scholes (1973) formula.²
It remains to verify the conjecture that $V_t = c_t$ holds for all $t$. We deal with this issue in the
next subsection while relying on a slightly more general perspective than Black & Scholes.
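
The claim that the Black and Scholes (1973) formula solves Eq. (4.4) can be checked numerically. The sketch below is illustrative only: it assumes the standard closed form for the call price, which is not derived in the text above, and the function names and parameter values are hypothetical. It evaluates the left hand side of Eq. (4.4) through central finite differences, and the result should be approximately zero.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price; tau = T - t > 0 is the time left to maturity."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def pde_residual(S, K, r, sigma, tau, h=1e-3):
    """Left hand side of Eq. (4.4), with derivatives replaced by central differences."""
    c = bs_call(S, K, r, sigma, tau)
    # C_t = -dC/dtau, because tau = T - t.
    c_t = -(bs_call(S, K, r, sigma, tau + h) - bs_call(S, K, r, sigma, tau - h)) / (2 * h)
    dS = S * h  # bump in the stock price
    up = bs_call(S + dS, K, r, sigma, tau)
    down = bs_call(S - dS, K, r, sigma, tau)
    c_S = (up - down) / (2 * dS)
    c_SS = (up - 2 * c + down) / dS ** 2
    return c_t + r * S * c_S + 0.5 * sigma ** 2 * S ** 2 * c_SS - r * c


# Hypothetical inputs: at-the-money call, one year to maturity.
price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
residual = pde_residual(100.0, 100.0, 0.05, 0.2, 1.0)
print(price, residual)
```

The drift $\mu$ does not appear among the arguments of `bs_call`, consistently with Eq. (4.4), where only $r$ and $\sigma$ enter.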
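4.2.2.3 Non-hedgeable claims

Suppose that there exists a traded asset and that its price satisfies,

$$
\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t,
$$

with obvious notation. We want to replicate a generic risk, a process $F(t, S_t)$. By Itô's lemma,

$$
dF = \left(F_t + \mu S F_S + \tfrac{1}{2}\sigma^2 S^2 F_{SS}\right)dt + \sigma S F_S\,dW.
$$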

2 The case where


= 0 is dealt with similarly. If = 0, we need to have
, by Eq. (4.2) or, analogously, by Eq. (4.3),
which leads to Eq. (4.4), with = 0. In this case, the portfolio process is set equal to 0 = 0 0 0 at inception of the strategy,
=
+ 0 , with
=
.
and then, kept constant, such that at each point in time,



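Consider as usual a self-financed strategy $(\alpha, \beta)$, the value of which satisfies:

$$
dV = \alpha\,dS + \beta\,dM = (\mu \alpha S + r \beta M)\,dt + \sigma \alpha S\,dW.
$$

We have,

$$
d(V - F) = \left(\mu \alpha S + r \beta M - F_t - \mu S F_S - \tfrac{1}{2}\sigma^2 S^2 F_{SS}\right)dt + \sigma S (\alpha - F_S)\,dW. \tag{4.5}
$$

Therefore, and consistently with the Black-Scholes derivation in the previous subsection, a
necessary condition for $V$ to replicate $F$ is that

$$
\alpha_t = F_S(t, S_t), \quad \text{for all } t \in [0, T]. \tag{4.6}
$$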

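Next, suppose $F$ is the price of a traded asset, one that delivers a payoff equal to
$F(T, S_T) = \psi(S_T)$ at time $T$, for some function $\psi$, and that at the same time, $F$ satisfies the
partial differential equation (4.7) below. We need to have that $F(0, S_0) = F_0^{\$}$, where $F_0^{\$}$
denotes the market price of the asset. For suppose not and, e.g., $F_0^{\$} > F(0, S_0)$. We could
sell the asset short for $F_0^{\$}$, and implement a self-financing strategy with $\alpha = F_S$, such that
$V_0 = F(0, S_0)$, where $F$ satisfies,

$$
F_t + r S F_S + \tfrac{1}{2}\sigma^2 S^2 F_{SS} = r F, \quad \text{with boundary condition } F(T, S) = \psi(S). \tag{4.7}
$$

We claim that $V_t = F(t, S_t)$ for all $t \in [0, T]$. Indeed, by Eq. (4.5),

$$
\begin{aligned}
d(V - F) &= \left(r \beta M - F_t - \tfrac{1}{2}\sigma^2 S^2 F_{SS}\right)dt \\
&= \left(r (V - S F_S) - F_t - \tfrac{1}{2}\sigma^2 S^2 F_{SS}\right)dt \\
&= r (V - F)\,dt,
\end{aligned} \tag{4.8}
$$

where the first equality follows by Eq. (4.6); the second by the self-financing condition,
$\beta M = V - \alpha S$, and again by (4.6); and the third by (4.7). Because $V_0 = F(0, S_0)$, then (4.8)
implies $V_t = F(t, S_t)$ for all $t$, as claimed, and then $V_T = F(T, S_T) = \psi(S_T)$ too, which allows
us to honour the short-sale of the asset.³ Note that these arguments do not require that a market for
this asset exists over the life of the asset.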
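
The replication argument can be illustrated through a simulation, under assumptions that go beyond the text: the claim is a European call with the standard Black-Scholes price and delta, the strategy is rebalanced at discrete dates rather than continuously, and all names and parameter values below are hypothetical. The self-financed portfolio starts at the option price, holds $\alpha = C_S$ units of the stock, and keeps the residual in the MMA. With a fine rebalancing grid, its terminal value should be close to the payoff, irrespective of the drift `mu` used to simulate the stock.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_and_delta(S, K, r, sigma, tau):
    """Black-Scholes call price and delta (C_S), for tau = T - t > 0."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)


def hedging_error(mu, n_steps, rng, S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0):
    """Terminal value of the self-financed hedge minus the call payoff, on one path."""
    dt = T / n_steps
    S = S0
    V, alpha = bs_call_and_delta(S, K, r, sigma, T)  # V_0 = C(0, S_0), alpha_0 = C_S
    for i in range(1, n_steps + 1):
        z = rng.gauss(0.0, 1.0)
        S_new = S * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        # Self-financing: the stock position earns dS, the MMA balance V - alpha*S earns r.
        V = alpha * S_new + (V - alpha * S) * math.exp(r * dt)
        S = S_new
        if i < n_steps:
            _, alpha = bs_call_and_delta(S, K, r, sigma, T - i * dt)
    return V - max(S - K, 0.0)


rng = random.Random(0)
errors = [hedging_error(mu=0.12, n_steps=500, rng=rng) for _ in range(200)]
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(mean_abs_error)
```

The residual error is due to discrete rebalancing only; it should shrink as `n_steps` grows.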
The crucial assumption underlying the property that the strategy value replicates the risk
$F(t, S_t)$, $V_t = F(t, S_t)$ for all $t$, is that $F(t, S_t)$ is the price of a traded asset, i.e., Eq. (4.7) holds
true. We can show the converse. That is, suppose the strategy value replicates the risk through
Eq. (4.6). In this case, the L.H.S. of (4.5) is zero, and by Eq. (4.6),

$$
F_t + \tfrac{1}{2}\sigma^2 S^2 F_{SS} = r (V - S F_S) = r (F - S F_S). \tag{4.9}
$$

Then, the risk, $F$, satisfies Eq. (4.7).

Next, suppose that $F$ is not the price of a traded asset, for example, $F(S) = S^2$. Then, the
counterpart to (4.6) would be $\alpha = F'(S)$, the counterpart to (4.8) would be,

$$
d(V - F) = \left(r (V - S F') - \tfrac{1}{2}\sigma^2 S^2 F''\right)dt, \tag{4.10}
$$

3 Suppose the opposite, i.e. that $F_0^{\$} < F(0, S_0)$. We could then buy the asset for $F_0^{\$}$ and, hence, claim $\psi(S_T)$ at $T$. We
could, at the same time, short-sell the portfolio for $V_0 = F(0, S_0) > F_0^{\$}$ (guaranteeing a positive profit at time 0), which we could update
through the self-financed strategy $\alpha$, up to time $T$, when it will deliver $V_T = \psi(S_T)$, which we could honour through
the long static position in the asset initiated at time 0.


and the latter is identically zero when the following counterpart to (4.9) holds,

$$
\tfrac{1}{2}\sigma^2 S^2 F'' = r (V - S F') = r (F - S F').
$$

However, this ordinary differential equation cannot hold, unless $r + \sigma^2 = 0$. That is, assume
that $V_t = F(S_t)$ for all $t$; then, the drift in Eq. (4.10) cannot equal zero, contradicting
that $V_t = F(S_t)$ holds for all $t$.
All in all, we cannot replicate $F(S) = S^2$ through a self-financed strategy. Naturally, we
could replicate the payoff at maturity, $S_T^2$, provided a market exists for a claim to $S_T^2$ (see Eq.
(4.22) in the following section), not necessarily a market over the entire life of this derivative.
The price of this asset is obviously not $F(S_t) = S_t^2$ for each $t$ though. To determine the tracking
error of the portfolio for $F(S) = S^2$, note that

$$
\begin{aligned}
d(V - F) &= \alpha\,dS + \beta\,dM - dF \\
&= \left(r (V - S F') - \tfrac{1}{2}\sigma^2 S^2 F''\right)dt \\
&= \left(r (V - F) - (r + \sigma^2) S^2\right)dt.
\end{aligned} \tag{4.11}
$$

Eq. (4.11) confirms that $V_t = S_t^2$ for all $t$ only when $(r + \sigma^2) S_t^2 = 0$, which would imply that $r + \sigma^2 = 0$:
an asset price equal to $S_t^2$
would be sustained only under this implausible parameter restriction.
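
A quick numerical check of this restriction, as a sketch with hypothetical function names and parameter values: the code below evaluates $F_t + rSF_S + \tfrac{1}{2}\sigma^2S^2F_{SS} - rF$, i.e. the difference between the two sides of Eq. (4.7), for a time-independent candidate $F$. For $F(S) = S^2$ the residual equals $(r + \sigma^2)S^2$, which is not zero for plausible parameters, whereas for $F(S) = S$ (the traded asset itself) it is zero.

```python
def pde_4_7_residual(F, S, r, sigma, h=1.0):
    """r*S*F_S + 0.5*sigma^2*S^2*F_SS - r*F for a time-independent F (so F_t = 0).

    Derivatives are central finite differences with step h; they are exact,
    up to rounding, for polynomials of degree two or less.
    """
    F_S = (F(S + h) - F(S - h)) / (2 * h)
    F_SS = (F(S + h) - 2 * F(S) + F(S - h)) / h ** 2
    return r * S * F_S + 0.5 * sigma ** 2 * S ** 2 * F_SS - r * F(S)


# Hypothetical parameter values.
r, sigma, S = 0.05, 0.2, 100.0
residual_square = pde_4_7_residual(lambda s: s ** 2, S, r, sigma)
residual_stock = pde_4_7_residual(lambda s: s, S, r, sigma)
print(residual_square, (r + sigma ** 2) * S ** 2, residual_stock)
```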
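An alternative to a self-financed strategy generating a tracking error is a strategy which,
albeit not being self-financed, guarantees that the risk process $F$ is replicable. For example, one
could set $\alpha = F'(S)$, and

$$
\beta M = F(S) - \alpha S. \tag{4.12}
$$

While Eq. (4.12) allows by construction to replicate $F$, it also obviously generates a hedging
cost, $\mathcal{C}$ say, that satisfies: $d\mathcal{C} = dF - (\alpha\,dS + \beta\,dM)$, where $\alpha\,dS + \beta\,dM$ is the change
in value generated by a self-financed strategy implemented with $\alpha$. In the context of our example,

$$
\begin{aligned}
d\mathcal{C} &= dF - \alpha\,dS - \beta\,dM \\
&= \left(\tfrac{1}{2}\sigma^2 S^2 F'' - r (F - S F')\right)dt \\
&= (r + \sigma^2) S^2\,dt.
\end{aligned} \tag{4.13}
$$

In other words, the second line of (4.11) uses the self-financing condition, $\beta M = V - \alpha S$,
while the second line of (4.13) relies on the replicability condition (4.12). Assuming $\mathcal{C}_0 = 0$, the
tracking error arising from (4.13) is a strictly positive process, and at time $T$ it equals,

$$
\mathcal{C}_T = (r + \sigma^2) \int_0^T S_t^2\,dt.
$$

In Section 4.6, we shall return to the issues regarding replicability of claims in the context of
incomplete markets.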


4.2.2.4 Surprising cancellations

The option price predicted by the Black and Scholes model is independent of the drift of the
underlying asset. After reading this chapter, the reader will interpret this result as follows. Asset
prices (rescaled by the money market account) are martingales under the so-called risk-neutral
probability, $Q$ say. Therefore, their value is equal to the discounted expectation of their payoff
under $Q$, that is, the probability under which the stock price drifts proportionally to $r$. In other
words, $\mu$ doesn't matter.
Let us analyze the details of this property in the context of the previous replicating arguments,
by relying on a very simple example. Assume that,
=

(4.14)

where is a constant,
and
are some function of calendar time and and , and
,
= 1 2, are two standard Brownian motions.
Consider the function
= (
), and assume it is solution to the following partial
di erential equation, generalizing (4.7),
=

)
(4.15)
where
(
) is the innitesimal generator of the di usion process (4.14), and is some
function, interpreted as the drift of under the risk-neutral probability. Note that once
=0
for each (
), Eq. (4.15) collapses to the Black-Scholes equation (4.7). We shall return to
this point soon.
Next, consider a self-nancing strategy invested into the asset and the money market account,
just as in the previous section. We have, generalizing Eq. (4.5), that
=(

with boundary condition

1
+
2

1
+
2

Let = , use the denition of a self-nancing strategy (i.e.,


and (4.15), such that the previous equation collapses to

= (
)

)=

),

= 0 implies that the strategy is self-nancing and replicates the option


That is, taking
value. By no-arbitrage, the price of the option is then the Black-Scholes price even in this
market with random drift. Section 10.6 of Chapter 6 explains that it is in general impossible
to replicate the option in case the asset return volatility is random: the Black-Scholes price is
generally no longer the no arbitrage price in a market with random volatility.
4.2.3 Asset prices as Feynman-Kac representations
4.2.3.1 A digression on more general partial differential equations

The Black-Scholes equation (4.4) is a typical (in fact the first) example of partial differential
equations in finance. It leads to an equation of the so-called parabolic type, as we shall explain
soon. More generally, let us be given,

$$
a_0 F + a_1 F_t + a_2 F_S + a_3 F_{SS} + a_4 F_{tt} + a_5 F_{St} = 0, \tag{4.16}
$$

subject to some boundary condition. This partial differential equation is called: (i) elliptic,
if $a_5^2 - 4 a_3 a_4 < 0$; (ii) parabolic, if $a_5^2 - 4 a_3 a_4 = 0$; (iii) hyperbolic, if $a_5^2 - 4 a_3 a_4 > 0$. The
typical partial differential equations arising in finance are of the parabolic type. For example,
the Black-Scholes equation (4.4) has no $F_{tt}$ or $F_{St}$ terms, so that $a_4 = a_5 = 0$, and is therefore
parabolic. The following section explains how to provide
a probabilistic representation to these parabolic partial differential equations.
4.2.3.2 Feynman-Kac solutions to partial differential equations

The typical situation that we encounter in finance is that the asset price is a function $F$ that
solves a parabolic partial differential equation, i.e. a special case of Eq. (4.16):

$$
F_t(t, x) + \mu(t, x) F_x(t, x) + \tfrac{1}{2}\sigma(t, x)^2 F_{xx}(t, x) - r(t, x) F(t, x) = 0, \quad \text{for all } (t, x) \in [0, T) \times \mathbb{R}, \tag{4.17}
$$

with the boundary condition, $F(T, x) = \psi(x)$ for all $x$, and the function $\psi$ is the final payoff.
Somewhat surprisingly, define, now, a stochastic differential equation, with drift and diffusion
$\mu$ and $\sigma$ as in Eq. (4.17),

$$
dX_s = \mu(s, X_s)\,ds + \sigma(s, X_s)\,dW_s, \quad X_t = x, \tag{4.18}
$$

where $W$ is a Brownian motion. Under regularity conditions on $(\mu, \sigma, r, \psi)$, the solution to Eq.
(4.17) is

$$
F(t, x) = E\left[\left. e^{-\int_t^T r(s, X_s)\,ds}\, \psi(X_T) \,\right|\, X_t = x\right], \tag{4.19}
$$

where $X$ is solution to Eq. (4.18), and the expectation is taken with respect to the distribution
of $X$ in Eq. (4.18). Note that the existence of the Feynman-Kac representation does not ensure
per se the existence of a solution to a given partial differential equation.
Eq. (4.19) can be used to represent the solution to the Black & Scholes partial differential
equation (4.4), with auxiliary stochastic differential equation (4.18) collapsing to,

$$
\frac{dS_t}{S_t} = r\,dt + \sigma\,d\tilde{W}_t,
$$

where $\tilde{W}$ is a Brownian motion, which is defined under the risk-neutral probability, due to the
drift of $S$ being equal to the risk-free rate, $r$. That is, by Eq. (4.19), the price of an option in
a Black & Scholes market is the risk-neutral expectation of the final payoff, discounted at the
risk-free rate.
The Feynman-Kac representation of the solution to partial differential equations is quite
useful. First, computing expectations is generally both easier and more intuitive than finding
a solution to partial differential equations through trial and error. Second, except for specific
cases, the solution to asset prices is unknown, and a natural way to cope with this problem
is to go for Monte-Carlo methods: approximation of the expectation in Eq. (4.19) through
simulations and use of the law of large numbers. Finally, the Feynman-Kac representation
theorem is useful for some theoretical reasons we shall see later in this chapter.
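
The Monte-Carlo route can be sketched as follows for the Black & Scholes case, where the exact answer is available as a benchmark. This is only an illustration under stated assumptions: the stock is simulated under the risk-neutral probability with constant $r$ and $\sigma$, the terminal price is drawn from its exact lognormal distribution, and the function name, the number of paths and the parameter values are hypothetical.

```python
import math
import random


def mc_call_price(S0, K, r, sigma, T, n_paths=200_000, seed=0):
    """Monte-Carlo estimate of e^{-rT} E^Q[(S_T - K)^+], as in Eq. (4.19).

    Under Q, log S_T is normal with mean log S0 + (r - sigma^2/2) T
    and variance sigma^2 T.
    """
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        S_T = S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(S_T - K, 0.0)
    return math.exp(-r * T) * total / n_paths


# Hypothetical inputs; the Black-Scholes formula gives about 10.45 for these values.
estimate = mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0)
print(estimate)
```

By the law of large numbers, the estimate converges to the Black-Scholes price as `n_paths` grows, with a sampling error that shrinks at rate one over the square root of the number of paths.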
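4.2.3.3 A few heuristic proofs

It is well beyond the purpose of this section to develop detailed proofs of the Feynman-Kac
representation theorem. In addition to Karatzas and Shreve (1991, p.366), an excellent source
of reference is still Friedman (1975), which relaxes many sufficient conditions given in Karatzas
and Shreve through opportune localizations of linear and growth conditions. The heuristic proof
provided below covers the slightly more general case in which

$$
\mathcal{L}F - r F + \delta = 0, \tag{4.20}
$$

with some boundary condition. Here $\delta$ is some function $(t, x) \mapsto \delta(t, x)$. The interpretation of $\delta$ is
that of an instantaneous dividend rate promised by the asset. As usual,
$\mathcal{L}F = F_t + \mu F_x + \tfrac{1}{2}\sigma^2 F_{xx}$.
So suppose there exists a solution to Eq. (4.20). To see what a Feynman-Kac representation
of such a solution looks like in this case, define

$$
y(t) \equiv e^{-\int_0^t r(u, X_u)\,du}\, F(t, X_t) + \int_0^t e^{-\int_0^u r(s, X_s)\,ds}\, \delta(u, X_u)\,du,
$$

where: $dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$. By Itô's lemma,

$$
dy(t) = e^{-\int_0^t r(u, X_u)\,du}\Big[\underbrace{(\mathcal{L}F - r F + \delta)}_{=0}\,dt + \sigma F_x\,dW_t\Big].
$$

Therefore, and under regularity conditions on $\sigma F_x$, $y$ is a martingale, with $y(0) = F(0, x_0)$ and

$$
y(T) = e^{-\int_0^T r(u, X_u)\,du}\, F(T, X_T) + \int_0^T e^{-\int_0^u r(s, X_s)\,ds}\, \delta(u, X_u)\,du.
$$

We have, $y(0) = E[y(T)]$. Hence,

$$
F(0, x_0) = E\left[e^{-\int_0^T r(u, X_u)\,du}\, F(T, X_T) + \int_0^T e^{-\int_0^u r(s, X_s)\,ds}\, \delta(u, X_u)\,du\right].
$$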

4.2.4 The Girsanov theorem


4.2.4.1 Motivation again

Consider the Black & Scholes partial differential equation (4.4). We now know that by the
Feynman-Kac theorem, we can represent the price $C$ as a discounted expectation of the terminal
payoff, $C(T, S_T) = (S_T - K)^+$,

$$
C(t, S_t) = e^{-r(T - t)} E_t^Q\left[(S_T - K)^+\right], \tag{4.21}
$$

where $E_t^Q$ denotes the expectation taken conditionally upon the information set at $t$, and with
respect to a new probability ($Q$, say), under which $S$ is solution to,

$$
\frac{dS_t}{S_t} = r\,dt + \sigma\,d\tilde{W}_t,
$$

where $\tilde{W}$ is a Brownian motion under $Q$. For obvious reasons, we refer to $Q$ as the risk-neutral
probability.
Naturally, the methodology underlying Eq. (4.21) can be applied to evaluate derivatives other
than the Black-Scholes call. Consider, for example, a "quadratic" derivative, i.e. one that pays off the
square of the asset price, at time $T$, $S_T^2$. The arguments made in the previous sections, relating
to the replication of this derivative, are still the same, with a portfolio process that still includes
$\alpha = F_S$ units of the risky asset, to be calculated below. The price of this derivative, then, still
satisfies Eq. (4.21), although the boundary condition is $F(T, S_T) = S_T^2$, such that the price can
be expressed as,

$$
F(t, S_t) = e^{-r(T - t)} E_t^Q\left(S_T^2\right) = e^{-r(T - t)}\, S_t^2\, e^{(2r + \sigma^2)(T - t)} = S_t^2\, e^{(r + \sigma^2)(T - t)}. \tag{4.22}
$$

This implies that the hedging portfolio is $\alpha_t = 2 S_t\, e^{(r + \sigma^2)(T - t)}$. Indeed, it is easy to check that
the value of the replicating strategy, $V = \alpha S + \beta M$, coincides with Eq. (4.22), as $\alpha_t S_t = 2 F(t, S_t)$
and, using Eq. (4.3) and Eq. (4.22), $\beta_t M_t = -F(t, S_t)$. The hedging portfolio for Black-Scholes
will be introduced, and discussed at length, in Chapter 10.
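
The decomposition of the replicating portfolio for the quadratic derivative can be verified with a few lines of code. The sketch below simply evaluates Eq. (4.22) and the hedge ratios implied by Eq. (4.3); the function names and the parameter values are hypothetical.

```python
import math


def quad_price(S, r, sigma, tau):
    """Eq. (4.22): price of the claim paying S_T^2, with tau = T - t."""
    return S ** 2 * math.exp((r + sigma ** 2) * tau)


def quad_hedge(S, r, sigma, tau):
    """Units of stock (alpha = F_S) and value held in the MMA (beta * M = F - S * F_S)."""
    F = quad_price(S, r, sigma, tau)
    alpha = 2.0 * S * math.exp((r + sigma ** 2) * tau)
    mma_value = F - alpha * S
    return alpha, mma_value


# Hypothetical inputs.
S, r, sigma, tau = 100.0, 0.05, 0.2, 1.0
F = quad_price(S, r, sigma, tau)
alpha, mma_value = quad_hedge(S, r, sigma, tau)
print(F, alpha * S, mma_value)
```

The stock position is worth twice the price of the derivative, financed by borrowing an amount equal to that price in the MMA: the hedge is a leveraged long position in the stock.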

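4.2.4.2 Relation between $Q$ and $P$: heuristic details

How does $Q$ relate to $P$? Note that under $P$, $\tilde{W}$ is solution to:

$$
d\tilde{W}_t = dW_t + \lambda\,dt,
$$

where $W$ is a Brownian motion under $P$, for some constant $\lambda$. To check, heuristically, that this is true, note that:

$$
E^Q\left[(S_T - K)^+\right] = \int (S_T(\omega) - K)^+\,dQ(\omega) = \int \xi_T(\omega)\,(S_T(\omega) - K)^+\,dP(\omega) = E\left[\xi_T\,(S_T - K)^+\right],
$$

where $\xi_T \equiv \frac{dQ}{dP}$, and $\xi$ is solution to:

$$
\frac{d\xi_t}{\xi_t} = -\lambda\,dW_t, \quad \xi_0 = 1.
$$

Then, $\lambda$ is necessarily equal to $\lambda = \frac{\mu - r}{\sigma}$. Indeed, let $y_t \equiv e^{-rt}\,\xi_t\,C(t, S_t)$. Because
$C(T, S_T) = (S_T - K)^+$, we have that, $y_0 = C(0, S_0) = E(y_T)$. That is, $y$ is a $P$-martingale.
Moreover, by Itô's lemma:

$$
dy_t = e^{-rt}\,\xi_t\left[\left(C_t + (\mu - \lambda\sigma) S C_S + \tfrac{1}{2}\sigma^2 S^2 C_{SS} - r C\right)dt + (\sigma S C_S - \lambda C)\,dW_t\right].
$$

Under the usual pathwise integrability conditions, this is a martingale when,

$$
0 = C_t + (\mu - \lambda\sigma) S C_S + \tfrac{1}{2}\sigma^2 S^2 C_{SS} - r C. \tag{4.23}
$$

On the other hand, we know that $C$ is solution of the Black-Scholes partial differential equation:

$$
0 = C_t + r S C_S + \tfrac{1}{2}\sigma^2 S^2 C_{SS} - r C. \tag{4.24}
$$

Comparing Eq. (4.23) with Eq. (4.24) reveals that the representation
$C(t, S_t) = e^{-r(T - t)} E_t^Q (S_T - K)^+$ is
possible with $\lambda = \frac{\mu - r}{\sigma}$, as originally claimed.
The interpretation of $\lambda$ is that of a unit risk-premium for investing in the stock. We shall
return to this important interpretation below.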

The point of the previous computations is that it looks as if we could start from the
original probability space under which

$$
\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t, \tag{4.25}
$$

and, then, define a new Brownian motion $d\tilde{W}_t = dW_t + \lambda\,dt$, with $\lambda = \frac{\mu - r}{\sigma}$, such that Eq. (4.25) can be written
as, $\frac{dS_t}{S_t} = (\mu - \lambda\sigma)\,dt + \sigma\,(dW_t + \lambda\,dt) = r\,dt + \sigma\,d\tilde{W}_t$, under some new probability space. Note that
the definition of Brownian motion we initially began with obviously depends on the underlying
probability measure $P$, although the pricing operates under a different space. This fact is crucial
in financial economics. These concepts are formalized in the next section.
4.2.4.3 The theorem

Consider two probabilities $P$ and $Q$, linked through $Q(A) = \int_A \xi_T(\omega)\,dP(\omega)$ for $A \in \mathcal{F}_T$, and the
Radon-Nikodym derivative,

$$
\xi_T \equiv \frac{dQ}{dP} = \exp\left(-\int_0^T \lambda_t^\top dW_t - \frac{1}{2}\int_0^T \|\lambda_t\|^2\,dt\right),
$$

where $\lambda$ is some process satisfying the so-called Novikov's condition:
$E\left(\exp\left(\tfrac{1}{2}\int_0^T \|\lambda_t\|^2\,dt\right)\right) < \infty$, and
$W$ is a Brownian motion under $P$.⁴ Then,

$$
\tilde{W}_t = W_t + \int_0^t \lambda_s\,ds, \quad t \in [0, T],
$$

is a Brownian motion under the probability $Q$.

The Girsanov theorem is the counterpart to the static and heuristic change of probability
dealt with in Section 1.4 of Chapter 1. Intuitively, in continuous time, a change in probability
leads to a mean-shift of the random variable we are changing the distribution of. The fact that
this change does not entail a change in volatility is well-known in continuous-time finance. The
economic interpretation is that the shift towards the left reflects a more pessimistic assessment
of asset return developments, other things being equal, including volatility. This property does
not necessarily hold with other models, or even in discrete time settings, such as those generated
by a binomial distribution, as illustrated by the tree models in Chapters 7, 10, or 11.
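
The change of probability can be illustrated numerically in the Black & Scholes market. In the sketch below, the stock is simulated under the physical probability $P$, with drift $\mu$, and the call payoff is weighted by the Radon-Nikodym derivative $\xi_T$ with constant $\lambda = (\mu - r)/\sigma$. The assumptions are a constant $\lambda$, exact simulation of $W_T$, and hypothetical function names and parameter values. The discounted, $\xi_T$-weighted average should be close to the Black-Scholes price whatever the value of $\mu$.

```python
import math
import random


def price_under_P(mu, S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                  n_paths=200_000, seed=1):
    """Estimate e^{-rT} E^P[xi_T (S_T - K)^+], with xi_T = dQ/dP."""
    lam = (mu - r) / sigma  # unit risk-premium
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        W_T = math.sqrt(T) * rng.gauss(0.0, 1.0)           # Brownian motion under P
        xi_T = math.exp(-lam * W_T - 0.5 * lam ** 2 * T)   # Radon-Nikodym derivative
        S_T = S0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * W_T)
        total += xi_T * max(S_T - K, 0.0)
    return math.exp(-r * T) * total / n_paths


# Two hypothetical values of the drift; both estimates should be near 10.45.
estimates = {mu: price_under_P(mu) for mu in (0.02, 0.12)}
print(estimates)
```

The weights $\xi_T$ are larger on paths where $W_T$ is low when $\lambda > 0$, which is the mean-shift towards the left described above.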
A few more technical details regarding asset evaluation. Suppose we wish to price a claim $X$ at
$T$. Heuristically, we have

$$
E^Q(X) = \int X\,dQ = \int X\,\frac{dQ}{dP}\,dP = \int X\,\xi_T\,dP = E(\xi_T X).
$$

Similarly, we can update the previous formula as time unfolds, as follows:

$$
E_t^Q(X) = \frac{E_t(\xi_T X)}{\xi_t}, \quad \text{where } \xi_t \equiv E_t(\xi_T), \quad \xi_0 = 1.
$$

Section 4.4 develops additional details regarding this change of probabilities. We now proceed
to provide details regarding the risk-premium $\lambda$ in a slightly more general context, which also
includes evaluation of long-lived assets. We shall see that $\lambda$ relates to the stochastic discount
factor through what is known as the pricing kernel.

4 This condition is needed to ensure that $\xi$ integrates to one, $E(\xi_T) = 1$, and rules out ill-behaved $\lambda$ that would
prevent this equality from holding.


4.2.5 The APT in continuous time

We explain how asset prices link to a number of state variables by deriving a continuous time
version of the APT (see Chapter 2), in which the asset expected excess returns form a linear
combination of the asset exposures to factors, with weights equal to the unit risk-premiums the
market requires to bear the risk arising from each of these factors. We begin with a heuristic
derivation of the pricing kernel in continuous time as the limit of a discrete time model; then,
we characterize the market expected returns in terms of the APT while relying on a diffusive
model.
4.2.5.1 Prices and pricing kernels, from discrete to continuous time

Let $S_t$ be the price of a long-lived asset as of time $t$, and $D_{t+\Delta t}\Delta t$ the dividend paid by this
asset over a small trading period $\Delta t$. We know that in the absence of arbitrage, there exists
a positive process $m$, known as the stochastic discount factor, such that the price of any asset
is the expectation of its future payoff, weighted with $m_{t+\Delta t}$,

$$
S_t = E_t\left[m_{t+\Delta t}\left(S_{t+\Delta t} + D_{t+\Delta t}\Delta t\right)\right], \tag{4.26}
$$

where $E_t$ is the conditional expectation given the information set at time $t$. For example, in an
economy with a representative risk-neutral agent, we have that $m_{t+\Delta t} = e^{-r\Delta t}$, where $r$ is the
risk-free rate per unit of time.
Given the stochastic discount factor, we define, as usual, the pricing kernel, or state-price,
process, $\xi$, as the process that grows by the stochastic discount factor:

$$
\frac{\xi_{t+\Delta t}}{\xi_t} = m_{t+\Delta t}.
$$

In terms of the pricing kernel, Eq. (4.26) is
$0 = E_t\left[\xi_{t+\Delta t} S_{t+\Delta t} - \xi_t S_t\right] + E_t\left(\xi_{t+\Delta t} D_{t+\Delta t}\right)\Delta t$, which
in the limiting case of $\Delta t$ that tends to zero, is

$$
0 = E_t\left[d(\xi_t S_t)\right] + \xi_t D_t\,dt. \tag{4.27}
$$

Eq. (4.27) can be integrated to yield, $\xi_t S_t = E_t\left(\int_t^T \xi_u D_u\,du\right) + E_t(\xi_T S_T)$. Assuming that
$\lim_{T \to \infty} E_t(\xi_T S_T) = 0$, the asset price satisfies,

$$
S_t = E_t\left(\int_t^\infty \frac{\xi_u}{\xi_t}\, D_u\,du\right). \tag{4.28}
$$

Note that, $E_t(m_{t+\Delta t}) = e^{-r_t \Delta t}$,⁵ whence,

$$
\frac{\xi_{t+\Delta t} - \xi_t}{\xi_t} = -\left(1 - e^{-r_t \Delta t}\right) + \left(m_{t+\Delta t} - E_t(m_{t+\Delta t})\right). \tag{4.29}
$$

In the presence of risk, and risk-averse agents, the innovations to the stochastic discount factor
will drive fluctuations of the pricing kernel.
Many of the models in this chapter are cast within a diffusion setting, such that the pricing
kernel satisfies the continuous time limit to Eq. (4.29),

$$
\frac{d\xi_t}{\xi_t} = -r_t\,dt - \lambda_t \cdot dW_t, \tag{4.30}
$$

where $W$ is a vector Brownian motion, supposed to drive fluctuations of the asset prices, and
$\lambda$ is the vector of unit risk-premiums.
The interpretation of $\xi$ in Eq. (4.30) is simple. Without risk, $\xi_T / \xi_t = e^{-\int_t^T r_u\,du}$: the stochastic
discount factor is simply the usual discount factor. In the presence of risk, the discount factor
varies stochastically, driven by the same sources of variation affecting asset prices, $W$.
Naturally, some components of $\lambda$ are zero if some of these sources of variation do not receive
compensation.

5 Just set the payoff $S_{t+\Delta t} + D_{t+\Delta t}\Delta t \equiv 1$ in Eq. (4.26), and interpret $S_t$ as the price of a default-free zero coupon bond.
4.2.5.2 Expected returns, lambdas and betas

We determine the di erential (


(

)=

) in Eq. (4.27) by relying on Itos lemma,

+
+
=
+
+

such that Eq. (4.27) can be written as,



+
=

(4.31)

Eq. (4.31) holds for any asset


price,
including those not distributing dividends and locally
0
= 0, where 00 =
and
is the short term rate
riskless, 0 say, such that
0
process. By (4.31), then,

=
consistently with the discrete time counterpart of the pricing kernel in Eq. (4.29).
By Eq. (4.31), the expected returns satisfy:

+
=

(4.32)

We know from previous sections that the expectations in Eq. (4.32) can be expressed in terms of partial derivatives, implying that the asset price solves a certain partial differential equation. We will develop this theme in detail in Chapters 7 and 8. Now, we wish to further our interpretation of $\lambda$ in Eq. (4.30) as a vector of unit risk-premiums.
Note that the asset price, $S$, is obviously driven by the same Brownian motions, $W$, driving the pricing kernel, $\xi$, such that
$$E_t\left(\frac{dS_t}{S_t}\frac{d\xi_t}{\xi_t}\right) = -\mathrm{Vol}\left(\frac{dS_t}{S_t}\right)\cdot\lambda_t\,dt,$$
where $\mathrm{Vol}\left(\frac{dS_t}{S_t}\right)$ denotes the instantaneous volatility of asset returns; note that this volatility could be a vector when there is more than one state variable driving the asset price, $S$. Substituting this result into the R.H.S. of Eq. (4.32) leaves:
$$\frac{1}{dt}E_t\left(\frac{dS_t}{S_t}\right) + \frac{D_t}{S_t} = r_t + \underbrace{\mathrm{Vol}\left(\frac{dS_t}{S_t}\right)}_{\text{betas}}\cdot\underbrace{\lambda_t}_{\text{lambdas}}. \tag{4.33}$$
Expected returns equal the short-term rate plus a risk-premium arising due to the randomness of the very same returns. This premium is the product of the instantaneous risks related to the asset price fluctuations (the betas, $\mathrm{Vol}(dS_t/S_t)$) times the unit risk-premiums that compensate for each individual source of these instantaneous risks (the lambdas, $\lambda_t$). Eq. (4.33) is an APT relation, the continuous time counterpart to those developed in Chapter 1 of the lectures: the only assumption underlying it is absence of arbitrage, i.e., a positive stochastic discounting factor exists. We now proceed to a decomposition of these expected returns.
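As a small numerical illustration of the APT relation in Eq. (4.33), the sketch below adds the inner product of betas and lambdas to the short-term rate. The function name and all the numbers are hypothetical and chosen only for illustration; the second source of risk is assumed not to receive compensation, so its lambda is zero.

```python
def expected_return(short_rate, betas, lambdas):
    """Short rate plus the inner product of betas and lambdas.

    betas   : exposure of asset returns to each Brownian motion
    lambdas : unit risk premium attached to each Brownian motion
    """
    if len(betas) != len(lambdas):
        raise ValueError("betas and lambdas must have the same length")
    premium = sum(b * l for b, l in zip(betas, lambdas))
    return short_rate + premium


# Hypothetical inputs: two sources of risk, the second one not compensated.
r = 0.02
betas = [0.15, 0.10]
lambdas = [0.30, 0.00]
mu = expected_return(r, betas, lambdas)
print(f"expected return: {mu:.4f}")
```

Only the first source of risk contributes to the premium here: the exposure to the second Brownian motion makes returns more volatile but does not raise expected returns.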


4.2.5.3 Risk-adjusted discount rates

How should we discount future cash flows in a model with multiple sources of risk? It sounds like there might be an obvious answer to this question: we should use the APT, i.e., Eq. (4.33). It is actually a subtle point. Eq. (4.33) provides predictions of expected returns, but expected returns are not necessarily risk-adjusted discount rates. Naturally, were dividends and asset prices driven by one (and the same) factor, expected returns and risk-adjusted discount rates would be the same. However, the two notions deviate in a multifactor framework.
We illustrate these points while relying on a simplifying assumption, namely that the price-dividend ratio, $p$ say, is independent of the dividends $D$, and driven by a vector of state variables, $y$, such that:
$$S(y,D) = p(y)\,D. \tag{4.34}$$
Such a scale-invariant property of asset prices arises in many economies (see Part II of these lectures for a more detailed discussion). For example, it does if (i) the dividends are geometric Brownian motions and (ii) the state variables do not depend on $D$. In this case, the price in Eq. (4.34) satisfies:
$$\frac{dS}{S} = \frac{dp}{p} + \frac{dD}{D} + \frac{dp}{p}\frac{dD}{D},$$
where the last term is of order $dt$ and does not affect the volatility of returns, such that by Eq. (4.32),
$$\frac{1}{dt}E_t\left(\frac{dS}{S}\right) + \frac{D}{S} = \mathcal{R} + \mathrm{Vol}\left(\frac{dp}{p}\right)\cdot\lambda,$$
where $\mathcal{R}$ is defined as follows:
$$\mathcal{R} \equiv r + \underbrace{\sigma_0}_{\text{cash-flow beta}}\cdot\underbrace{\lambda_{CF}}_{\text{cash-flow lambda}}, \tag{4.35}$$
and $\sigma_0 \equiv \mathrm{Vol}\left(\frac{dD}{D}\right)$ (possibly a vector) and $\lambda_{CF}$ denotes the unit risk-premium required to compensate for the randomness in the dividend process. Note that this is a decomposition of Eq. (4.32): we can always find the appropriate vector $\lambda_{CF}$ such that Eq. (4.32) holds true.
We refer to $\mathcal{R}$ as the risk-adjusted discount rates. They equal the safe interest rate $r$, plus the premium, $\sigma_0\cdot\lambda_{CF}$, arising to compensate for the stochastic fluctuations of dividends. Eq. (4.35) tells us that if the price-dividend ratio is constant, $\mathrm{Vol}\left(\frac{dp}{p}\right) = 0$, the risk-adjusted discount rates are the same as the expected returns, just as in the one-factor Lucas economy discussed in Chapter 3.
If we assume that the price-dividend ratio is constant, and that aggregate dividends equal consumption, then $\sigma_0$ would be too small to make the expected returns predicted by this model consistent with the data. It is the celebrated equity premium puzzle dealt with in much detail in Part II of these lectures. But Eq. (4.35) reveals that expected returns could be inflated if the price-dividend ratio fluctuates, driven by additional state variables requiring compensation:
$$\frac{1}{dt}E_t\left(\frac{dS}{S}\right) + \frac{D}{S} - \mathcal{R} = \underbrace{\mathrm{Vol}\left(\frac{dp}{p}\right)}_{\text{price betas}}\cdot\underbrace{\lambda}_{\text{price lambdas}}. \tag{4.36}$$
This term does indeed represent a wedge between the expected returns and the risk-adjusted discount rates (see Eq. (4.35)); therefore, it carries the potential to mitigate the equity premium puzzle. Note that Eq. (4.36) describes the most natural channel through which expected returns

are inflated: the state variables fluctuate and lead the price-dividend ratio to fluctuate, thereby affecting the realized returns. If these fluctuations require compensation (meaning that their innovations affect the pricing kernel, $\lambda \neq 0$), they affect the expected returns. The price betas are the exposures of asset returns to the factors arising through this channel, and the price lambdas are the corresponding unit premiums to bear these risks.
Chapter 7 explains that in addition to the equity premium puzzle, time variation in returns volatility is a pervasive empirical property of asset prices. This property can be rationalized by time variation in risk-adjusted discount rates. Heuristically, note that returns volatility relates to the price betas in Eq. (4.36). In a diffusion setting, the price beta relates to the semi-elasticity of the price-dividend ratio with respect to $y$, $\mathrm{Vol}\left(\frac{dp}{p}\right) = \frac{p'(y)}{p(y)}\mathrm{Vol}(dy)$; that is, the critical ingredient of the price beta is the very same price-dividend ratio in Eq. (4.28). Note, also, that the price-dividend ratio can be re-expressed in terms of the risk-adjusted discount rates, as follows:
$$p(y_t) = \mathbb{E}_t\left[\int_t^\infty \frac{D^*_s}{D_t}\,e^{-\int_t^s \mathcal{R}(y_u)du}\,ds\right], \qquad \frac{D^*_s}{D_t} = e^{\left(g_0 - \frac{1}{2}\sigma_0^2\right)(s-t) + \sigma_0\left(\hat{W}_s - \hat{W}_t\right)}, \tag{4.37}$$
where $\mathbb{E}$ and $\hat{W}$ denote the expectation and a Brownian motion under the risk-neutral probability, and $g_0$ is the expected dividend growth rate.
Eq. (4.37) is a present value formula in which a fictitious risk-unadjusted dividend growth, $D^*_s/D_t$, is discounted using the risk-adjusted discount rates, $\mathcal{R}(y)$. It is a representation of asset prices of interest in its own right. It also helps explain results detailed in Chapter 7, by which the sensitivity of the price-dividend ratio to changes in $y$ is affected by how $\mathcal{R}(y)$ reacts in response to changes in $y$. Intuitively, if the risk-adjusted discount rates $\mathcal{R}(y)$ undergo large swings in bad times, the volatility predicted by Eq. (4.37) would likely exhibit some of the interesting countercyclical statistics that we see in the data. Chapter 7 explains these properties in detail, as mentioned.
4.2.6 Example: no-arbitrage in Lucas tree
4.2.6.1 Family of prices

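Consider the Lucas (1978) model with one tree and one perishable good taken as the numeraire, i.e., the continuous time version of the model in Chapter 3. We assume that the dividend is a geometric Brownian motion,
$$\frac{dD_t}{D_t} = g_0\,dt + \sigma_0\,dW_t, \tag{4.38}$$
for two positive constants $g_0$ and $\sigma_0$. We assume no-sunspots, and denote the rational pricing function with $S(D)$. By Itô's lemma,
$$\frac{dS_t}{S_t} = \mu_S(D_t)\,dt + \sigma_S(D_t)\,dW_t.$$
By Eq. (4.1), wealth, $V$, satisfies,
$$dV_t = \left[\pi_t\left(\mu_S(D_t) + \frac{D_t}{S_t} - r_t\right) + r_t V_t - c_t\right]dt + \pi_t\,\sigma_S(D_t)\,dW_t,$$
where $\pi_t$ is the wealth invested in the tree, $c_t$ is consumption, and
$$\mu_S(D) = \frac{g_0 D\,S'(D) + \frac{1}{2}\sigma_0^2 D^2 S''(D)}{S(D)}, \qquad \sigma_S(D) = \frac{\sigma_0 D\,S'(D)}{S(D)}.$$
Below, we shall show that in the absence of arbitrage, there must be some process $\lambda$, the unit risk-premium, such that,
$$\mu_S(D) + \frac{D}{S(D)} = r + \lambda\,\sigma_S(D). \tag{4.39}$$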


Let us assume that the short-term rate, $r$, and the risk-premium, $\lambda$, are both constant. Below, we shall show that such an assumption is compatible with a general equilibrium economy. By the definition of $\mu_S$ and $\sigma_S$, Eq. (4.39) can be written as,
$$0 = \frac{1}{2}\sigma_0^2 D^2 S''(D) + \left(g_0 - \lambda\sigma_0\right)D\,S'(D) - r\,S(D) + D. \tag{4.40}$$
Eq. (4.40) is a second order differential equation. Its solution, provided it exists, is the rational price of the asset. To solve Eq. (4.40), we initially assume that the solution, $F$ say, takes the following simple form,
$$F(D) = K\cdot D, \tag{4.41}$$
where $K$ is a constant to be determined. Next, we verify that this is indeed one solution to Eq. (4.40). Indeed, if Eq. (4.41) holds, then plugging this guess and its derivatives into Eq. (4.40) leaves $K = (r - g_0 + \lambda\sigma_0)^{-1}$ and, hence,
$$F(D) = \frac{1}{r + \lambda\sigma_0 - g_0}\,D. \tag{4.42}$$
This is a Gordon-type formula. It merely states that prices are risk-adjusted expectations of future expected dividends, where the risk-adjusted discount rate is given by $r + \lambda\sigma_0$. Hence, in a comparative statics sense, stock prices are inversely related to the risk-premium, a quite intuitive conclusion.
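The verification step above can be mimicked numerically. The sketch below is only an illustration under assumed parameter values: `g` and `sigma` stand for the drift and volatility of dividend growth, `r` for the constant short-term rate and `lam` for the constant unit risk-premium; the function names are hypothetical. It evaluates the Gordon-type price and checks, by finite differences, that it makes the right-hand side of the second order differential equation in Eq. (4.40) vanish.

```python
def gordon_price(D, r, lam, g, sigma):
    """Price D / (r + lam*sigma - g); requires r + lam*sigma > g."""
    discount = r + lam * sigma  # risk-adjusted discount rate
    if discount <= g:
        raise ValueError("need r + lam*sigma > g for a finite price")
    return D / (discount - g)


def ode_residual(price_fn, D, r, lam, g, sigma, h=1e-3):
    """0.5*sigma^2*D^2*S'' + (g - lam*sigma)*D*S' - r*S + D, by central differences."""
    S = price_fn(D)
    S1 = (price_fn(D + h) - price_fn(D - h)) / (2.0 * h)
    S2 = (price_fn(D + h) - 2.0 * S + price_fn(D - h)) / h**2
    return 0.5 * sigma**2 * D**2 * S2 + (g - lam * sigma) * D * S1 - r * S + D


# Hypothetical parameter values.
r, lam, g, sigma = 0.03, 0.25, 0.02, 0.10


def price(D):
    return gordon_price(D, r, lam, g, sigma)


residual = ode_residual(price, 1.0, r, lam, g, sigma)
print(price(1.0), residual)
```

Raising `lam` while keeping the other inputs fixed lowers the price, in line with the comparative statics discussed in the text.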
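Eq. (4.42) can be thought of as the Feynman-Kac representation of the solution to Eq. (4.40), viz
$$F(D_t) = \mathbb{E}_t\left[\int_t^\infty e^{-r(s-t)}D_s\,ds\right], \tag{4.43}$$
where $\mathbb{E}_t[\cdot]$ is the conditional expectation taken under the risk-neutral probability $\mathbb{Q}$ (say), the dividend process follows,
$$\frac{dD_s}{D_s} = \left(g_0 - \lambda\sigma_0\right)ds + \sigma_0\,d\hat{W}_s,$$
and $\hat{W}_s = W_s + \lambda\,(s-t)$ is another standard Brownian motion defined under $\mathbb{Q}$. Formally, the true probability, $\mathbb{P}$, and the risk-neutral probability, $\mathbb{Q}$, are tied up by the Radon-Nikodym derivative,
$$\left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{F}_T} = e^{-\lambda\left(W_T - W_t\right) - \frac{1}{2}\lambda^2\left(T-t\right)}. \tag{4.44}$$
These pricing results rely on the assumption that $r$ and $\lambda$ are both constant. We did not specify the exact economic conditions under which this is true. It is the reason we refer to the prices predicted by this model as a family of prices. The next section provides more structure, through a restriction on preferences that leads to the pricing results summarized so far.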
4.2.6.2 Equilibrium with CRRA

How precisely do preferences affect asset prices? In Eq. (4.42), the asset price relates to the interest rate, $r$, and the risk-premium, $\lambda$. But in equilibrium, agents' preferences affect $r$ and $\lambda$. However, such an impact can have a non-linear pattern. For example, when the risk-aversion is low, a small change of risk-aversion can make the interest rate and the risk-premium change in the same direction. If the risk-aversion is high, the effects may be different, as the interest rate reflects a variety of factors, including precautionary motives.

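To illustrate these features within the simple case of CRRA preferences, consider, first, the dynamics of wealth under the risk-neutral probability, $\mathbb{Q}$, such that by Eq. (4.1),
$$dV_t = \left(r V_t - c_t\right)dt + \pi_t\,\sigma_S\,d\hat{W}_t. \tag{4.45}$$
We assume that the following transversality condition holds,
$$\lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}V_T\right] = 0. \tag{4.46}$$
By integrating Eq. (4.45), and using the previous transversality condition,
$$V_t = \mathbb{E}_t\left[\int_t^\infty e^{-r(s-t)}c_s\,ds\right]. \tag{4.47}$$
Note that Eqs. (4.43) and (4.47) imply that in equilibrium, i.e. $c = D$, we also have that $V = S$.
Next, consider a representative agent with instantaneous utility of consumption $u(c)$ and subjective discount rate $\rho$, who solves the following intertemporal optimization problem,
$$\max_c\; E_t\left[\int_t^\infty e^{-\rho(s-t)}u(c_s)\,ds\right] \quad\text{s.t.}\quad V_t = E_t\left[\int_t^\infty m_{t,s}\,c_s\,ds\right], \tag{P1}$$
where the constraint follows by a change of probability in Eq. (4.47) using Eq. (4.44) and, accordingly,
$$m_{t,s} \equiv \frac{\xi_s}{\xi_t} = e^{-\left(r + \frac{1}{2}\lambda^2\right)(s-t) - \lambda\left(W_s - W_t\right)}.$$
Consider the Lagrangean
$$\mathcal{L} = E_t\left[\int_t^\infty e^{-\rho(s-t)}u(c_s)\,ds\right] + \ell\left(V_t - E_t\left[\int_t^\infty m_{t,s}\,c_s\,ds\right]\right),$$
where $\ell$ is a Lagrange multiplier. The first order conditions are: $e^{-\rho(s-t)}u'(c_s) = \ell\,m_{t,s}$. In equilibrium, $c = D$, and by the definition of $m_{t,s}$,
$$u'(D_s) = \ell\,e^{-\left(r + \frac{1}{2}\lambda^2 - \rho\right)(s-t) - \lambda\left(W_s - W_t\right)}. \tag{4.48}$$
That is, by Itô's lemma,
$$\frac{du'(D)}{u'(D)} = \left[g_0 D\frac{u''(D)}{u'(D)} + \frac{1}{2}\sigma_0^2 D^2\frac{u'''(D)}{u'(D)}\right]dt + \sigma_0 D\frac{u''(D)}{u'(D)}\,dW. \tag{4.49}$$
On the other hand, expanding the R.H.S. of Eq. (4.48) leaves, by Itô's lemma again,
$$\frac{du'(D)}{u'(D)} = \left(\rho - r\right)dt - \lambda\,dW. \tag{4.50}$$
But drifts and volatilities of Eq. (4.49) and Eq. (4.50) have to be the same, whence
$$r = \rho - g_0 D\frac{u''(D)}{u'(D)} - \frac{1}{2}\sigma_0^2 D^2\frac{u'''(D)}{u'(D)} \qquad\text{and}\qquad \lambda = -\sigma_0 D\frac{u''(D)}{u'(D)}.$$
Assume, for example, that $\lambda$ is constant. After integrating the second of the previous relations, we see that apart from an irrelevant integration constant, $u(D) = \frac{D^{1-\eta}-1}{1-\eta}$, where $\eta \equiv \frac{\lambda}{\sigma_0}$ is the CRRA. Hence, under CRRA preferences,
$$r = \rho + \eta g_0 - \frac{1}{2}\eta\left(\eta+1\right)\sigma_0^2, \qquad \lambda = \eta\sigma_0.$$
Finally, replacing these expressions for the short-term rate and the risk-premium into Eq. (4.42) leaves,
$$S(D) = \frac{1}{\rho - \left(1-\eta\right)\left(g_0 - \frac{1}{2}\eta\sigma_0^2\right)}\,D, \tag{4.51}$$
provided of course the denominator is strictly positive.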
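The CRRA expressions can be cross-checked numerically. The following sketch is an illustration under assumed inputs: `rho` is the subjective discount rate, `eta` the CRRA coefficient, `g` and `sigma` the drift and volatility of dividend growth, and the function name is hypothetical. It computes the equilibrium short-term rate, unit risk-premium and price-dividend ratio, and the usage lines confirm that the price-dividend ratio coincides with the Gordon-type ratio 1/(r + lam*sigma - g).

```python
def crra_equilibrium(rho, eta, g, sigma):
    """Return (short rate, unit risk premium, price-dividend ratio) under CRRA."""
    r = rho + eta * g - 0.5 * eta * (eta + 1.0) * sigma**2
    lam = eta * sigma
    denominator = rho - (1.0 - eta) * (g - 0.5 * eta * sigma**2)
    if denominator <= 0:
        raise ValueError("price is not finite: denominator must be positive")
    return r, lam, 1.0 / denominator


# Hypothetical parameter values.
rho, eta, g, sigma = 0.03, 2.0, 0.02, 0.10
r, lam, pd_ratio = crra_equilibrium(rho, eta, g, sigma)
gordon_ratio = 1.0 / (r + lam * sigma - g)
print(r, lam, pd_ratio, gordon_ratio)

# The short rate is hump-shaped in risk aversion for these inputs.
r_low = crra_equilibrium(rho, 1.0, g, sigma)[0]
r_mid = crra_equilibrium(rho, 1.5, g, sigma)[0]
r_high = crra_equilibrium(rho, 2.0, g, sigma)[0]
```

With these inputs the short-term rate first rises and then falls as `eta` increases, while the risk-premium rises throughout. This is one way to see the non-linear pattern mentioned in the text: the intertemporal substitution term `eta*g` dominates at low risk aversion, the precautionary term at high risk aversion.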


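We are only left to check that the transversality condition (4.46) holds at the equilibrium $V = S$. Under the same conditions for which $S(D)$ is finite, we have that:
$$\begin{aligned}
\lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}V_T\right] &= \lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}S(D_T)\right] \\
&= K\lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}D_T\right] \\
&= K D_t\lim_{T\to\infty}e^{-\left(r - g_0 + \lambda\sigma_0\right)(T-t)} \\
&= K D_t\lim_{T\to\infty}e^{-\left[\rho - (1-\eta)\left(g_0 - \frac{1}{2}\eta\sigma_0^2\right)\right](T-t)} = 0,
\end{aligned} \tag{4.52}$$
where $K$ denotes the price-dividend ratio in Eq. (4.51).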

4.2.6.3 Bubbles

The transversality condition in Eq. (4.46) is often referred to as a no-bubble condition. To illustrate the reasons underlying this definition, note that Eq. (4.40) admits an infinite number of solutions. Each of these solutions takes the following form,
$$S(D) = K D + A D^{\delta}, \qquad K,\ A,\ \delta \text{ constants}. \tag{4.53}$$
Indeed, plugging Eq. (4.53) into Eq. (4.40) reveals that Eq. (4.53) holds if and only if the following conditions hold true:
$$0 = K\left(r + \lambda\sigma_0 - g_0\right) - 1 \qquad\text{and}\qquad 0 = \delta\left(g_0 - \lambda\sigma_0\right) + \frac{1}{2}\delta\left(\delta - 1\right)\sigma_0^2 - r. \tag{4.54}$$
The first condition implies that $K$ equals the price-dividend ratio in Eq. (4.42), i.e. $K = F(D)/D$. The second condition leads to a quadratic equation in $\delta$, with the two solutions, $\delta_1 < 0$ and $\delta_2 > 0$. Therefore, the asset price function takes the following form:
$$S(D) = F(D) + A_1 D^{\delta_1} + A_2 D^{\delta_2}.$$
It satisfies:
$$\lim_{D\to 0}S(D) = \pm\infty \ \text{ if } A_1 \gtrless 0, \qquad \lim_{D\to 0}S(D) = 0 \ \text{ if } A_1 = 0.$$
To rule out an explosive behavior of the price as the dividend level, $D$, gets small, we must set $A_1 = 0$, which leaves,
$$S(D) = F(D) + B(D), \qquad B(D) \equiv A_2 D^{\delta_2}. \tag{4.55}$$
The component, $F(D)$, is the fundamental value of the asset, as by Eq. (4.43), it is the risk-adjusted present value of the expected dividends. The second component, $B(D)$, is simply the difference between the market value of the asset, $S(D)$, and the fundamental value, $F(D)$. Hence, it is a bubble.
We seek conditions under which Eq. (4.55) satisfies the transversality condition in Eq. (4.46). We have,
$$\lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}S(D_T)\right] = \lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}F(D_T)\right] + \lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}B(D_T)\right].$$
By Eq. (4.52), the fundamental value of the asset satisfies the transversality condition, under the condition that the denominator in Eq. (4.51) is strictly positive. Regarding the bubble we have,
$$\begin{aligned}
\lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}B(D_T)\right] &= A_2\lim_{T\to\infty}\mathbb{E}_t\left[e^{-r(T-t)}D_T^{\delta_2}\right] \\
&= A_2 D_t^{\delta_2}\lim_{T\to\infty}e^{\left[\delta_2\left(g_0 - \lambda\sigma_0\right) + \frac{1}{2}\delta_2\left(\delta_2 - 1\right)\sigma_0^2 - r\right](T-t)} \\
&= A_2 D_t^{\delta_2},
\end{aligned} \tag{4.56}$$
where the last line holds as $\delta_2$ satisfies the second condition in Eq. (4.54). Therefore, the bubble cannot satisfy the transversality condition, except in the trivial case in which $A_2 = 0$. In other words, in this economy, the transversality condition in Eq. (4.46) holds if and only if there are no bubbles.
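A quick numerical check of the bubble argument, as a sketch under assumed parameter values (the function names and numbers are hypothetical): the exponents of the non-fundamental solutions are the two roots of the quadratic condition, one negative and one positive, and the discounted risk-neutral expectation of the dividend raised to the positive root neither grows nor decays, which is why the bubble term fails the transversality condition.

```python
import math


def bubble_exponents(r, lam, g, sigma):
    """Roots of 0.5*sigma^2*d*(d-1) + (g - lam*sigma)*d - r = 0, smallest first."""
    a = 0.5 * sigma**2
    b = g - lam * sigma - 0.5 * sigma**2
    disc = math.sqrt(b * b + 4.0 * a * r)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)


def discounted_growth_rate(delta, r, lam, g, sigma):
    """Rate k such that the risk-neutral mean of exp(-r*s) * D_s**delta is D_0**delta * exp(k*s)."""
    return delta * (g - lam * sigma) + 0.5 * delta * (delta - 1.0) * sigma**2 - r


# Hypothetical parameter values.
r, lam, g, sigma = 0.03, 0.25, 0.02, 0.10
d1, d2 = bubble_exponents(r, lam, g, sigma)
k_bubble = discounted_growth_rate(d2, r, lam, g, sigma)
k_fundamental = discounted_growth_rate(1.0, r, lam, g, sigma)
print(d1, d2, k_bubble, k_fundamental)
```

The rate attached to the fundamental value (exponent one) is strictly negative, so that term vanishes in the limit, whereas the rate attached to the bubble exponent is zero up to rounding.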
4.2.6.4 Reflecting barriers and absence of arbitrage

Next, suppose that insofar as the dividend $D$ fluctuates above a certain level $\underline{D} > 0$, everything goes as in the previous section but that, as soon as the dividends level hits a barrier $\underline{D}$, it is reflected back with probability one. In this case, we say that the dividend follows a process with reflecting barriers. Alternatively, one may think that the firm has infinitely deep pockets, in that it can always guarantee that the dividends it distributes are solution to,
$$dD_t = g_0 D_t\,dt + \sigma_0 D_t\,dW_t + dU_t, \tag{4.57}$$
where $U_t$ is a continuous, non-negative process that increases only when the dividends $D_t$ hit $\underline{D}$. We may think of the firm as operating on an infinitesimal intervention scale.
Eq. (4.57) thus generalizes Eq. (4.38) in that it allows for dividends growth to always ensure dividends cannot decrease below a threshold. How does the price behave in this context? By Itô's lemma, and Eq. (4.57),
$$dS_t = \left[g_0 D_t S'(D_t) + \frac{1}{2}\sigma_0^2 D_t^2 S''(D_t)\right]dt + \sigma_0 D_t S'(D_t)\,dW_t + S'(D_t)\,dU_t. \tag{4.58}$$
We claim that to ensure absence of arbitrage, the following smooth pasting condition must hold,
$$S'(\underline{D}) = 0. \tag{4.59}$$
Indeed, after hitting the barrier, $\underline{D}$, the dividend is reflected back for the part exceeding $\underline{D}$. Since the reflection takes place with probability one, the asset is locally riskless at the barrier $\underline{D}$. Therefore, absence of arbitrage requires that the price moves at $\underline{D}$ only by its predictable drift component in Eq. (4.58), which it does when $S'(D) = 0$ for $D = \underline{D}$. Note then that the last component in Eq. (4.58), $S'(D_t)\,dU_t = 0$ for all $t$. By standard arguments (i.e. Eq. (4.39)), we then have that:
$$\mu_S(D) + \frac{D}{S(D)} = r + \lambda\,\frac{\sigma_0 D\,S'(D)}{S(D)}.$$
If $D = \underline{D}$ then, by Eq. (4.59),
$$\mu_S(\underline{D}) + \frac{\underline{D}}{S(\underline{D})} = r.$$
This relation tells us that holding the asset during the reflection guarantees a total return equal to the short-term rate. Once again, during the reflection, the asset is locally riskless and, hence, arbitrage is ruled out when holding the asset will make us earn no more than the safe interest rate, $r$. Indeed, by the previous relation, and using $S'(\underline{D}) = 0$, we have that the wealth in Eq. (4.1) satisfies,
$$dV_t = \left[\pi_t\left(\mu_S + \frac{D_t}{S_t} - r\right) + r V_t - c_t\right]dt + \pi_t\,\frac{\sigma_0 D_t S'(D_t)}{S_t}\,dW_t = \left(r V_t - c_t\right)dt.$$
This example illustrates how the relation in Eq. (4.39) works to preclude arbitrage opportunities.
To solve the model, note that while the dividends are above the barrier, $D > \underline{D}$, the price is still as in Eq. (4.53),
$$S(D) = \frac{1}{r - g_0 + \lambda\sigma_0}\,D + A_1 D^{\delta_1} + A_2 D^{\delta_2}.$$
As in the previous section, we need to set $A_2 = 0$ to satisfy the transversality condition in Eq. (4.46) (see Eq. (4.56)). However, we now determine $A_1$ to pin down the price at the barrier $\underline{D}$, rather than set it equal to zero as in the previous section.
We have, with $K \equiv F(D)/D$,
$$0 = S'(\underline{D}) = K + \delta_1 A_1\underline{D}^{\delta_1 - 1} \qquad\text{and}\qquad \underline{S} \equiv S(\underline{D}) = K\underline{D} + A_1\underline{D}^{\delta_1},$$
where the second condition is the value matching condition, which needs to be imposed to ensure continuity of the pricing function with respect to $D$ and, hence, absence of arbitrage. The previous system can be solved to yield
$$\underline{S} = \frac{\delta_1 - 1}{\delta_1}\,K\underline{D} \qquad\text{and}\qquad A_1 = -\frac{K}{\delta_1}\,\underline{D}^{1-\delta_1}. \tag{4.60}$$
The price is now an increasing and convex function of the fundamentals, $D$ (see footnote 6).
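The smooth pasting and value matching conditions can be checked numerically. The sketch below relies on assumed parameter values and hypothetical function names; `D_low` stands for the reflecting barrier, and the price is taken to be the Gordon-type term plus a power term whose exponent is the negative root of the quadratic condition in the bubbles section, with its coefficient chosen to make the slope vanish at the barrier.

```python
import math


def negative_root(r, lam, g, sigma):
    """Negative root of 0.5*sigma^2*d*(d-1) + (g - lam*sigma)*d - r = 0."""
    a = 0.5 * sigma**2
    b = g - lam * sigma - 0.5 * sigma**2
    return (-b - math.sqrt(b * b + 4.0 * a * r)) / (2.0 * a)


def barrier_price(D, D_low, r, lam, g, sigma):
    """Price with a reflecting dividend barrier at D_low (valid for D >= D_low)."""
    K = 1.0 / (r - g + lam * sigma)          # price-dividend ratio without barrier
    d1 = negative_root(r, lam, g, sigma)
    A1 = -K * D_low ** (1.0 - d1) / d1       # smooth pasting: zero slope at D_low
    return K * D + A1 * D**d1


# Hypothetical parameter values.
r, lam, g, sigma, D_low = 0.03, 0.25, 0.02, 0.10, 1.0
h = 1e-5
slope_at_barrier = (barrier_price(D_low + h, D_low, r, lam, g, sigma)
                    - barrier_price(D_low - h, D_low, r, lam, g, sigma)) / (2.0 * h)
K = 1.0 / (r - g + lam * sigma)
d1 = negative_root(r, lam, g, sigma)
price_at_barrier = barrier_price(D_low, D_low, r, lam, g, sigma)
print(slope_at_barrier, price_at_barrier, (d1 - 1.0) / d1 * K * D_low)
```

The price at the barrier exceeds the Gordon-type value because the reflection acts as a floor on future dividends; the premium fades as the dividend moves away from the barrier.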


Note that the model takes the barrier $\underline{D}$ as given. Section 4.6 presents a model that goes beyond this context and analyzes a situation in which the dividend is controlled as part of an optimization problem.

6 This model takes the barrier $\underline{D}$ as given. However, one may be interested in controlling the dividend in a way that as soon as the price hits an exogenously given level $\underline{S}$, the dividend is activated to induce the price to increase back. It is straightforward to see that the solution for the dividend level that triggers this infinitesimal intervention is obtained by inverting the first equation in (4.60) for the dividend threshold, viz $\underline{D} = \frac{\delta_1}{\delta_1 - 1}\frac{\underline{S}}{K}$, where $K$ is the price-dividend ratio in Eq. (4.42) and $\delta_1 < 0$ is the negative root of the second condition in Eq. (4.54).


4.3 Distortions and numeraires

Consider a framework where given a strictly positive random variable $X_T$, we define a Radon-Nikodym derivative as,
$$\left.\frac{d\mathbb{P}^2}{d\mathbb{P}^1}\right|_{\mathcal{F}_T} = \zeta(T) = \frac{X_T}{E^1_t\left(X_T\right)},$$
where $\mathcal{F}_T$ denotes the information set at time $T$, and $E^1_t(\cdot)$ is the expectation under $\mathbb{P}^1$, conditional on the information set at time $t$. Define a corresponding density process for $\zeta(T)$ as,
$$\zeta(\tau) = \frac{E^1_\tau\left[X_T\right]}{E^1_t\left(X_T\right)}, \qquad \tau\in[t,T].$$
We obviously have that $E^1_s[\zeta(\tau)] = \zeta(s)$ for all $s\le\tau$, and in particular, $\zeta(t) = 1$. Therefore, $\zeta(\cdot)$ is a martingale starting from 1, with $\zeta(t) = E^1_t[\zeta(T)] = 1$, and $\mathbb{P}^2$ is thus a probability.
We now develop a series of examples to illustrate how naturally changes of probability arise in finance. A leading example is one where $\mathbb{P}^1$ is the physical probability, and the random variable $X_T$ is the marginal utility of consumption at $T$, $X_T = u'(c_T)$, as in (the continuous time version of) the consumption-based asset pricing models of Chapters 2 and 3. Another fundamental example spans many fixed income security pricing problems described in Chapter 12. In these problems, $X_T$ is the inverse of the money market account at $T$, $X_T = e^{-\int_t^T r_s ds}$, and $\mathbb{P}^1$ is the risk-neutral probability, such that
$$\zeta(\tau) = \frac{\mathbb{E}_\tau\left(e^{-\int_t^T r_s ds}\right)}{P(t,T)} = \frac{e^{-\int_t^\tau r_s ds}\,P(\tau,T)}{P(t,T)},$$
where $P(\tau,T)$ is the time $\tau$ price of a zero coupon bond expiring at $T$.
The next section develops the leading example of change of probabilities, the consumption-based probabilities, as a benchmark. Section 4.4.2 reviews general notions, and examples, of pricing, based on the numeraires involved in the definition of a given asset pricing problem.
4.3.1 Leading example: consumption-based probabilities
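Consider a basic consumption-based model, where an investor can invest at two dates only, at $t$ and at $T$, without any intermediate consumption, such that the price of an asset satisfies,
$$S_t = E_t\left[e^{-\rho(T-t)}\frac{u'(c_T)}{u'(c_t)}\,S_T\right] = e^{-r(T-t)}\,\mathbb{E}_t\left(S_T\right),$$
with the usual notation. In this model, we define the risk-neutral probability $\mathbb{Q}$ through the Radon-Nikodym derivative of $\mathbb{Q}$ against $\mathbb{P}$ (see footnote 7),
$$\left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{F}_T} = \underbrace{e^{-(\rho - r)(T-t)}\frac{E_t\left[u'(c_T)\right]}{u'(c_t)}}_{=1}\cdot\frac{u'(c_T)}{E_t\left[u'(c_T)\right]} = \frac{u'(c_T)}{E_t\left[u'(c_T)\right]}.$$

7 This derivation relies on the assumption that the short-term rate is constant. Section 7.5.1 in Chapter 7 contains a derivation of the more general case of stochastic interest rates in a Markov setting.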


Assuming decreasing marginal utility as usual, we have that the risk-neutral probability distorts
the physical, by assigning more weight to the bad states of nature, i.e. those where consumption
is low, as explained in Chapter 2.
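A two-state illustration of this distortion, as a sketch with made-up numbers: physical probabilities are tilted by marginal utility, here assumed to be the CRRA form c**(-eta), and renormalized so that they sum to one. The function name and inputs are hypothetical.

```python
def risk_neutral_probs(physical_probs, consumption, eta):
    """Tilt physical probabilities by CRRA marginal utility c**(-eta) and renormalize."""
    weights = [p * c ** (-eta) for p, c in zip(physical_probs, consumption)]
    total = sum(weights)  # expected marginal utility under the physical probability
    return [w / total for w in weights]


# Hypothetical economy: a bad state (low consumption) and a good state, equally likely.
p = [0.5, 0.5]
c = [0.9, 1.1]
q = risk_neutral_probs(p, c, eta=3.0)
print(q)
```

The bad state receives a risk-neutral weight above its physical probability of one half, and the gap widens as `eta` grows; with `eta = 0` (risk neutrality) the two probabilities coincide.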
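We derive implications for a continuous-time model where a representative agent has CRRA equal to $\eta$. In equilibrium, $c = D$, such that if consumption is a geometric Brownian motion with parameters $(g_0, \sigma_0)$, we have that the density process is,
$$\zeta(\tau) = e^{(r-\rho)(\tau - t)}\left(\frac{D_\tau}{D_t}\right)^{-\eta}, \qquad r = \rho + \eta g_0 - \frac{1}{2}\eta\left(\eta+1\right)\sigma_0^2,$$
such that,
$$\frac{d\zeta(\tau)}{\zeta(\tau)} = -\eta\sigma_0\,dW_\tau,$$
where $W$ is a Brownian motion under $\mathbb{P}$. By Girsanov's theorem, then, we have that $\hat{W}_\tau = W_\tau + \eta\sigma_0\,(\tau - t)$ is a Brownian motion under $\mathbb{Q}$, such that the stock price is solution to,
$$\frac{dS_\tau}{S_\tau} + \frac{D_\tau}{S_\tau}d\tau = \mu\,d\tau + \sigma_0\,dW_\tau = \underbrace{\left(\mu - \eta\sigma_0^2\right)}_{=\,r}d\tau + \sigma_0\,d\hat{W}_\tau,$$
where $\mu$ denotes the expected total return on the stock.
To reconcile prices in the risk-neutral world with prices in the true one, we need the average path of the stock price to be lower under the risk-neutral probability than under the physical. Since the stock is traded, we also have that $\mu - \eta\sigma_0^2 = r$, as indicated: the risk-premium is somehow hidden in this example with complete markets. When markets are incomplete, risk-premium terms would, instead, show up, as in the Gamma process example of Heston (1993,b).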
[Develop Gamma processes here.]
4.3.2 Numeraire pricing
4.3.2.1 Definition

The formulation of the fundamental theorems of asset pricing (FTAP) in the previous chapters relies on the risk-neutral probability, i.e., the probability under which the asset prices discounted by the value of the money market account are martingales, to prevent arbitrage. That is, the money market account is the numeraire in this no-arbitrage context. We can formulate equivalent versions of this theorem, utilizing different numeraires, and different equivalent probabilities. Consider the following definition, which we shall state assuming no dividends for simplicity.
Definition 4.1 (Numeraires and equivalent probabilities). A numeraire is any asset with a price process $N_\tau > 0$, say. Given a numeraire, a probability $\mathbb{Q}^N$ is an equivalent probability (or measure) if any asset price process $S_\tau$ normalized by $N$ satisfies,
$$\frac{S_t}{N_t} = \mathbb{E}^{\mathbb{Q}^N}_t\left(\frac{S_T}{N_T}\right).$$
A critical task in this and previous chapters is the formulation of the FTAP: markets are arbitrage-free if and only if there exists an equivalent martingale probability. We now provide examples of existence of numeraires and associated probabilities $\mathbb{Q}^N$.


4.3.2.2 Examples of numeraires and arbitrage pricing laws
Risk-neutral probability

The risk-neutral probability corresponds to the money market account numeraire, as mentioned, with value process given by,
$$N_\tau = e^{\int_t^\tau r_s ds}.$$
Suppose, for example, that the short-term rate is constant. In terms of the consumption probabilities of Section 4.1, we have that,
$$S_t = \frac{S_t}{N_t} = \mathbb{E}_t\left(\frac{S_T}{N_T}\right). \tag{4.61}$$
Alternatively, and still assuming interest rates are constant, we may express Eq. (4.61) as,
$$\frac{S_t}{N_t} = \mathbb{E}_t\left(\frac{S_T}{N_T}\right), \qquad N_\tau = P(\tau,T) = e^{-r(T-\tau)},$$
with the obvious condition that $N_T = P(T,T) = 1$. If interest rates are constant, the money market account yields the same as an investment of $1/P(t,T)$ units of a zero coupon bond, such that we can re-interpret the zero-coupon bond as the numeraire. Next, we generalize this context to one with stochastic interest rates.
Forward probability

Suppose, instead, that interest rates are random, such that
$$S_t = \mathbb{E}_t\left(e^{-\int_t^T r_s ds}S_T\right) = P(t,T)\,\mathbb{E}_t\left(\frac{e^{-\int_t^T r_s ds}}{P(t,T)}S_T\right) = P(t,T)\,\mathbb{E}^{\mathbb{Q}^F}_t\left(S_T\right), \tag{4.62}$$
where $\mathbb{Q}^F$ is the so-called forward probability, defined through the Radon-Nikodym derivative against $\mathbb{Q}$, as follows,
$$\left.\frac{d\mathbb{Q}^F}{d\mathbb{Q}}\right|_{\mathcal{F}_T} = \frac{e^{-\int_t^T r_s ds}}{P(t,T)}. \tag{4.63}$$
Accordingly, in this context of random interest rates, we can define the zero coupon bond as a numeraire, $N_\tau = P(\tau,T)$, with $\mathbb{Q}^N = \mathbb{Q}^F$, such that, by Eq. (4.62),
$$\frac{S_t}{N_t} = \mathbb{E}^{\mathbb{Q}^N}_t\left(\frac{S_T}{N_T}\right), \tag{4.64}$$
where the equality follows by the boundary condition for the price of a zero coupon bond, $N_T = 1$.
In Chapter 12, we shall make reference to this probability to price interest rate derivatives without risk of default. The price of these derivatives is,
$$C_t = \mathbb{E}_t\left[e^{-\int_t^T r_s ds}\,\Pi(T)\right], \tag{4.65}$$
where $\Pi(\cdot)$ is the payoff, possibly a function(al) of the entire path of the short-term rate over the life of the derivative. Note the main difficulty in Eq. (4.65). The discounting factor $e^{-\int_t^T r_s ds}$ obviously correlates with the payoff, and complicates the evaluation of this derivative. However, we can use the forward probability to express $C_t$, as follows,
$$C_t = P(t,T)\,\mathbb{E}^{\mathbb{Q}^F}_t\left\{\Pi(T)\right\}. \tag{4.66}$$
As the Radon-Nikodym derivative defined in Eq. (4.63) makes clear, the new probability, $\mathbb{Q}^F$, distorts $\mathbb{Q}$, in that it assigns a higher weight to events where interest rate paths are lower. The drift of $r$ under $\mathbb{Q}^F$ is indeed lower, as formally shown in Appendix 3 of Chapter 12 (see footnote 8). In other words, upon using Eq. (4.66) instead of Eq. (4.65), we get rid of the randomness of the discounting factor $e^{-\int_t^T r_s ds}$, by multiplying the expected payoff by $P(t,T)$. At the same time, this expected payoff is calculated in a world ruled by a probability $\mathbb{Q}^F$, where interest rates are on average lower than under $\mathbb{Q}$, which compensates for the fact that the very same expected payoff receives a haircut of $P(t,T) < 1$.
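The change to the forward probability can be illustrated on a finite set of interest rate scenarios. The sketch below is a toy example with made-up numbers and a hypothetical function name: each scenario has a risk-neutral probability and a realized discount factor; the forward probabilities are the risk-neutral ones reweighted by the discount factor and normalized by the bond price.

```python
import math


def forward_measure(q_probs, discount_factors):
    """Return (bond price, forward probabilities) from risk-neutral scenario data."""
    bond_price = sum(q * d for q, d in zip(q_probs, discount_factors))
    fwd_probs = [q * d / bond_price for q, d in zip(q_probs, discount_factors)]
    return bond_price, fwd_probs


# Hypothetical one-year scenarios: constant rate within each scenario.
rates = [0.01, 0.03, 0.06]
q = [0.3, 0.4, 0.3]
discounts = [math.exp(-x) for x in rates]
payoff = [100.0 * max(x - 0.03, 0.0) for x in rates]  # pays when rates end up high

bond, f = forward_measure(q, discounts)
price_direct = sum(qi * di * xi for qi, di, xi in zip(q, discounts, payoff))
price_forward = bond * sum(fi * xi for fi, xi in zip(f, payoff))
mean_rate_q = sum(qi * x for qi, x in zip(q, rates))
mean_rate_f = sum(fi * x for fi, x in zip(f, rates))
print(price_direct, price_forward, mean_rate_q, mean_rate_f)
```

Both routes give the same price, and the average rate is lower under the forward probabilities than under the risk-neutral ones, since low-rate scenarios carry larger discount factors and are therefore overweighted.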
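Note that the calculations underlying Eq. (4.64) can be used to check the internal consistency of the definition of the new numeraire and equivalent probabilities with absence of arbitrage. There are no arbitrage opportunities if there exists a probability $\mathbb{Q}^N$ with Radon-Nikodym derivative against $\mathbb{Q}$ equal to:
$$\left.\frac{d\mathbb{Q}^N}{d\mathbb{Q}}\right|_{\mathcal{F}_T} = e^{-\int_t^T r_s ds}\,\frac{N_T}{N_t}. \tag{4.67}$$
Indeed, we have, simply, that
$$\frac{S_t}{N_t} = \mathbb{E}^{\mathbb{Q}^N}_t\left(\frac{S_T}{N_T}\right) = \mathbb{E}_t\left(e^{-\int_t^T r_s ds}\,\frac{N_T}{N_t}\frac{S_T}{N_T}\right) = \frac{1}{N_t}\,\mathbb{E}_t\left(e^{-\int_t^T r_s ds}S_T\right),$$
which is the first equality in Eq. (4.62).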


Annuity probability

Consider, now, a class of contracts, where the payoff $\Pi_T$ is given by the difference between some risk of interest, $F_T$, which is not necessarily traded, and a constant $K$, re-scaled by a variable measurable at time $T$,
$$\Pi_T = N_T\left(F_T - K\right). \tag{4.68}$$
The interpretation of $\Pi_T$ in this section is that of the value of an interest rate swap at time $T$, whereby a counterparty $A$ shall pay a fixed interest rate $K$ to counterparty $B$, in exchange for the variable LIBOR, over a certain horizon, called the tenor. In Eq. (4.68), $F_T$ is the so-called swap rate, and $N_T$ is the value at time $T$ of an annuity of one dollar over the tenor of the swap. Swap contracts are amongst the most important interest rate derivatives dealt with in practice, and are motivated and explained in detail in Chapter 12.
The annuity factor, $N_T$, is a valid numeraire in this market. Accordingly, let us define the annuity probability arising in this context, $\mathbb{Q}^N$, through the counterpart to the Radon-Nikodym derivative in Eq. (4.67),
$$\left.\frac{d\mathbb{Q}^N}{d\mathbb{Q}}\right|_{\mathcal{F}_T} = e^{-\int_t^T r_s ds}\,\frac{N_T}{N_t},$$

8 Note that this statement certainly holds when the short-term rate is a Markov process. In more complex models such as those with stochastic volatility, the bond price sensitivity to movements in the state variables may switch sign and render this statement model-dependent.

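such that we can price all contracts in this market (see footnote 9). First, the value at $t$ of a forward starting interest rate swap, i.e. a swap agreed at time $t$ and starting at $T$, say $\mathrm{Sw}_t$, is given by:
$$\mathrm{Sw}_t = \mathbb{E}_t\left[e^{-\int_t^T r_s ds}N_T\left(F_T - K\right)\right] = N_t\,\mathbb{E}^{\mathbb{Q}^N}_t\left(F_T - K\right),$$
and that of a swaption, i.e. an option to enter into a future swap, is
$$\mathrm{Swpn}_t = \mathbb{E}_t\left[e^{-\int_t^T r_s ds}\max\left\{N_T\left(F_T - K\right), 0\right\}\right] = N_t\,\mathbb{E}^{\mathbb{Q}^N}_t\left[\left(F_T - K\right)^+\right].$$
Note that the forward swap rate, defined as the value of $K$ determined at time $t$, such that $\mathrm{Sw}_t$ is zero, satisfies,
$$F_t = \mathbb{E}^{\mathbb{Q}^N}_t\left(F_T\right).$$
Accordingly, we can define the forward swap rate as the process $F_\tau = \mathbb{E}^{\mathbb{Q}^N}_\tau(F_T)$, for $\tau\in[t,T]$, which is obviously a martingale under $\mathbb{Q}^N$. These properties are extremely useful when it comes to pricing these important derivatives.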
Defaultable probability

Finally, we consider the so-called survival-contingent probability, studied in Chapter 13, where the numeraire, $N_T$, is a defaultable annuity of one dollar, paid off over a certain tenor, which is interpreted as the period over which two counterparties exchange credit risk. The option to enter into this deal is called a credit default swaption, and the payoff is as in Eq. (4.68), with $N_T\left(\mathrm{cds}_T - K\right)^+$, where $\mathrm{cds}_T$ is the credit default risk premium as of time $T$, and $K$ is a constant.
4.3.2.3 Martingales and numeraires

The examples in the previous section can be gathered to provide a general insight. Consider a forward starting agreement, originated at time $t$, with payoff equal to,
$$N_T\left(F_T - K\right),$$
where $N$ is a numeraire, $\mathbb{Q}^N$ is an equivalent probability, and $F_T$ is measurable at time $T$. Then, the forward risk, defined as
$$F_\tau = \mathbb{E}^{\mathbb{Q}^N}_\tau\left(F_T\right),$$
is a martingale under $\mathbb{Q}^N$, and $F_t$ is the value of $K$ that sets the value of the forward starting agreement equal to zero.

4.3.2.4 Marking-to-market updates of forward starting agreements

Suppose we are long an amount $\theta_i$ of a forward starting agreement at time $t_i$ and strike $K_i$, such that the value of this agreement at $t_i$ right after the trade is $\theta_i N_i\left(F_i - K_i\right)$ and the value at $t_{i+1}$ before the trade is $\theta_i N_{i+1}\left(F_{i+1} - K_i\right)$. Choosing the strike that clears the forward risk delivers $K_i = F_i$, such that, by aggregating,
$$\sum_{i=1}^{n}\theta_i N_{i+1}\left(F_{i+1} - F_i\right) = \sum_{i=1}^{n}\theta_i\left[N_i\left(F_{i+1} - F_i\right) + \left(N_{i+1} - N_i\right)\left(F_{i+1} - F_i\right)\right].$$
By chopping the trading interval in small pieces we have that under standard regularity conditions, the cumulative marking-to-market update converges to,
$$\int_0^T\theta_s N_s\,dF_s + \int_0^T\theta_s\,d\langle N, F\rangle_s.$$

9 Note that under this annuity probability, the events that matter most are those where interest rate paths are low, similarly as with the forward probability in the previous section.

4.4 Martingales and arbitrage


4.4.1 The information framework
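We still consider a Lucas type economy, but with a finite horizon $T < \infty$. Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, let $W$ be a standard $d$-dimensional Brownian motion, and let $\mathbb{F} = \{\mathcal{F}(t)\}_{t\in[0,T]}$ be the $\mathbb{P}$-augmentation of the natural filtration $\mathcal{F}^W(t) = \sigma\left(W(s), s\le t\right)$ generated by $W$, with $\mathcal{F} = \mathcal{F}(T)$.
We consider $m$ trees, a money market account and at times further assets in zero net supply, "inside money." All these assets are exchanged without frictions. The trees entitle to receive dividends $D_i(t)$, $i = 1,\ldots,m$, which are positive $\mathcal{F}(t)$-adapted bounded processes. Dividends are the numeraire. Let $S_+ = [S_0, \ldots, S_m]^\top$ be the $\mathcal{F}(t)$-adapted asset price process. The price $S_0$ is that of a unit money market account, and satisfies: $S_0(t) = e^{\int_0^t r(u)du}$, where $r$ is an $\mathcal{F}(t)$-adapted process satisfying $E\left(\int_0^T r(u)\,du\right) < \infty$. Moreover, we assume that
$$\frac{dS_i(t)}{S_i(t)} = \hat{a}_i(t)\,dt + \sigma_i(t)\,dW(t), \qquad i = 1,\ldots,m, \tag{4.69}$$
where $\hat{a}_i$ and $\sigma_i$ are processes satisfying the same properties as $r$, with $\sigma_i\in\mathbb{R}^d$. We assume that $\mathrm{rank}(\sigma) = m$ a.s., where $\sigma\equiv[\sigma_1, \ldots, \sigma_m]^\top$.
We assume that $D_i$ is solution to
$$\frac{dD_i(t)}{D_i(t)} = a_{D_i}(t)\,dt + \sigma_{D_i}(t)\,dW(t),$$
where $a_{D_i}$ and $\sigma_{D_i}$ are $\mathcal{F}(t)$-adapted, with $\sigma_{D_i}\in\mathbb{R}^d$.
A strategy is a predictable process in $\mathbb{R}^{m+1}$, denoted as $\theta\equiv[\theta_0, \ldots, \theta_m]^\top$, and satisfying $E\left(\int_0^T\|\theta(u)\|^2du\right) < \infty$. The value of a strategy, net of dividends, is: $V\equiv\theta_+\cdot S_+$, where $\theta_+$ is a row vector. By generalizing results in Section 4.4.1, we say a strategy is self-financing if its value, $V$, is the solution to:
$$dV = \left[\pi^\top\left(a - \mathbf{1}_m r\right) + rV - c\right]dt + \pi^\top\sigma\,dW, \tag{4.70}$$
where $\mathbf{1}_m$ is an $m$-dimensional vector of ones, $\pi\equiv(\pi_1, \ldots, \pi_m)^\top$, $\pi_i\equiv\theta_i S_i$, $c$ is consumption, and $a\equiv\left(\hat{a}_1 + \frac{D_1}{S_1}, \ldots, \hat{a}_m + \frac{D_m}{S_m}\right)^\top$. The solution to the previous equation is, for $t\in[0,T]$,
$$\left(S_0^{-1}V^{x,\pi,c}\right)(t) = x - \int_0^t\left(S_0^{-1}c\right)(u)\,du + \int_0^t\left(S_0^{-1}\pi^\top\sigma\right)(u)\,dW(u) + \int_0^t\left(S_0^{-1}\pi^\top\left(a - \mathbf{1}_m r\right)\right)(u)\,du, \tag{4.71}$$
where $x$ denotes the initial wealth. We require $V^{x,\pi,c}$ to be strictly positive.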
4.4.2 Viability

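Let $\bar{g}_i = \bar{S}_i + \bar{z}_i$, $i = 1,\ldots,m$, where $\bar{S}_i\equiv S_0^{-1}S_i$ and $\bar{z}_i(t) = \int_0^t S_0^{-1}(u)D_i(u)\,du$. Let us generalize the definition of the risk-neutral probability in Eq. (4.44), and introduce the set $\mathcal{Q}$ of risk-neutral, or equivalent martingale, probabilities, $\mathcal{Q}\equiv\{\mathbb{Q}\approx\mathbb{P}: \bar{g}_i \text{ is a } \mathbb{Q}\text{-martingale}\}$. We show the equivalent of Theorem 2.8 in Chapter 2: $\mathcal{Q}$ is not empty if and only if the market is arbitrage-free. We rely on Girsanov's theorem of Section 4.3.3. Given an $\mathcal{F}(t)$-adapted process $\lambda$, define,
$$\hat{W}(t) = W(t) + \int_0^t\lambda(u)\,du, \qquad t\in[0,T]. \tag{4.72}$$
$\hat{W}$ is a standard Brownian motion under a probability $\mathbb{Q}$, equivalent to $\mathbb{P}$, with Radon-Nikodym derivative equal to,
$$\zeta(T)\equiv\left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{F}(T)} = \exp\left(-\int_0^T\lambda^\top(u)\,dW(u) - \frac{1}{2}\int_0^T\|\lambda(u)\|^2du\right),$$
such that $\zeta$ is a martingale under $\mathbb{P}$. Under $\mathbb{Q}$, Eq. (4.69) is,
$$\frac{dS_i}{S_i} = \left(\hat{a}_i - (\sigma\lambda)_i\right)dt + \sigma_i\,d\hat{W},$$
such that,
$$\frac{d\bar{g}_i}{\bar{S}_i} = \left(a_i - r - (\sigma\lambda)_i\right)dt + \sigma_i\,d\hat{W}. \tag{4.73}$$
For $\bar{g}_i$ to be a $\mathbb{Q}$-martingale, it is necessary and sufficient that $a_i - (\sigma\lambda)_i = r$, or in vector notation,
$$a - \mathbf{1}_m r = \sigma\lambda. \tag{4.74}$$
Therefore, by Eqs. (4.70), (4.72) and (4.74), we have that, for $t\in[0,T]$,
$$\left(S_0^{-1}V^{x,\pi,c}\right)(t) = x - \int_0^t\left(S_0^{-1}c\right)(u)\,du + \int_0^t\left(S_0^{-1}\pi^\top\sigma\right)(u)\,d\hat{W}(u). \tag{4.75}$$
Consider the following definition:
Definition 4.2 (Arbitrage opportunity). A portfolio $\pi$ is an arbitrage opportunity if $V^{x,\pi,0}(0)\le S_0^{-1}(T)\,V^{x,\pi,0}(T)$ a.s. and $\Pr\left(S_0^{-1}(T)\,V^{x,\pi,0}(T) - V^{x,\pi,0}(0) > 0\right) > 0$.
We have: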

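Theorem 4.3. There are no arbitrage opportunities if and only if $\mathcal{Q}$ is not empty.
A proof of this theorem is in the Appendix. The if part follows easily, by Eq. (4.75). The only if part is more elaborate, but its basic structure can be understood as follows. By Girsanov's theorem, the statement "absence of arbitrage opportunities $\Rightarrow\exists\,\mathbb{Q}\in\mathcal{Q}$" is equivalent to "absence of arbitrage opportunities $\Rightarrow\exists\,\lambda$ satisfying Eq. (4.74)." If Eq. (4.74) did not hold, we could implement an arbitrage, as follows. We could find a nonzero $\pi$: $\pi^\top\sigma = 0$ and $\pi^\top\left(a - \mathbf{1}_m r\right)\neq 0$. Then, we could use $\pi$ when $\pi^\top\left(a - \mathbf{1}_m r\right) > 0$ and $-\pi$ when $\pi^\top\left(a - \mathbf{1}_m r\right) < 0$, thereby obtaining an appreciation rate of $V$ larger than $r$ in spite of having zeroed uncertainty through $\pi^\top\sigma = 0$. If Eq. (4.74) holds, this arbitrage opportunity would never occur, as in this case for each $\pi$, $\pi^\top\left(a - \mathbf{1}_m r\right) = \pi^\top\sigma\lambda$. More precisely, define
$$\langle\sigma^\top\rangle^\perp\equiv\left\{x\in L^2: x^\top\sigma = 0\right\} \qquad\text{and}\qquad \langle\sigma\rangle\equiv\left\{z\in L^2: z = \sigma u, \text{ for } u\in L^2\right\}.$$
Then, we may formalize the previous reasoning as follows. The excess return vector, $a - \mathbf{1}_m r$, must be orthogonal to all vectors in $\langle\sigma^\top\rangle^\perp$, and since $\langle\sigma\rangle$ and $\langle\sigma^\top\rangle^\perp$ are orthogonal, $a - \mathbf{1}_m r\in\langle\sigma\rangle$, or $\exists\,\lambda\in L^2$: $a - \mathbf{1}_m r = \sigma\lambda$ (see footnote 10).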
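The arbitrage construction in this argument can be made concrete in the simplest case of two risky assets driven by a single Brownian motion, a special case chosen here only for illustration. The sketch below, with hypothetical names and numbers, builds a portfolio with zero diffusion exposure and reports its excess drift: the drift is zero when both assets share the same unit risk-premium, and strictly positive (after choosing the sign of the position) when they do not.

```python
def riskless_excess_drift(a, sigma, r):
    """Two assets, one Brownian motion.

    a     : expected total returns of the two assets
    sigma : their loadings on the single Brownian motion
    Returns (pi, drift): pi has zero exposure to the Brownian motion,
    and drift is its expected return in excess of the short rate.
    """
    pi = (sigma[1], -sigma[0])  # exposure pi[0]*sigma[0] + pi[1]*sigma[1] is zero
    drift = pi[0] * (a[0] - r) + pi[1] * (a[1] - r)
    if drift < 0:  # reverse the position so that the excess drift is earned
        pi, drift = (-pi[0], -pi[1]), -drift
    return pi, drift


r = 0.02
sigma = (0.10, 0.20)

# Consistent case: both assets earn a unit risk premium of 0.3.
_, drift_ok = riskless_excess_drift((0.05, 0.08), sigma, r)

# Inconsistent case: the second asset earns too much for its risk.
pi_arb, drift_arb = riskless_excess_drift((0.05, 0.10), sigma, r)
exposure = pi_arb[0] * sigma[0] + pi_arb[1] * sigma[1]
print(drift_ok, pi_arb, drift_arb, exposure)
```

In the inconsistent case the position is long the asset with the higher premium per unit of risk and short the other one, and it earns more than the short-term rate without bearing any risk.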

4.4.3 Market completeness

Let $Y \in L^2(\Omega, \mathcal{F}_T, P)$. Consider the following definition:

Definition 4.4 (Market completeness). Markets are dynamically complete if for each random
variable $Y \in L^2(\Omega, \mathcal{F}_T, P)$, we can find a portfolio process $\pi$: $V_T^{x,\pi,0} = Y$ a.s.

The previous definition is the natural continuous-time counterpart to that we gave in the
discrete-time case (see Chapter 2). Consistently with the conclusions in Chapter 2, we shall
prove that in continuous-time, markets are dynamically complete if and only if (i) $m = d$ and
(ii) the price volatility matrix of the available assets (primitives and derivatives) is nonsingular.
We shall provide a sketch of the proof for the sufficiency part of this statement (see, e.g.,
Karatzas (1997 pp. 8-9) for the converse), which relates to the existence of fully spanning
dynamic strategies. So given a $Y \in L^2(\Omega, \mathcal{F}_T, P)$, let $m = d$ and suppose the volatility matrix
$\sigma$ is nonsingular. Let us consider the $Q$-martingale:

$$M_t \equiv E^Q\left[\left. S_{0,T}^{-1}\, Y \,\right|\, \mathcal{F}_t \right]. \tag{4.76}$$

By the representation theorem of continuous local martingales as stochastic integrals with
respect to Brownian motions (e.g., Karatzas and Shreve (1991) (thm. 4.2 p. 170)), there exists
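10 To see that $\langle \sigma \rangle$ and $\langle \sigma^\top \rangle^\perp$ are orthogonal spaces, note that: $x \in \langle \sigma^\top \rangle^\perp \Leftrightarrow \sigma^\top x = 0 \Leftrightarrow x^\top \sigma u = 0$ for all $u$ $\Leftrightarrow x^\top z = 0$ for all $z \in \langle \sigma \rangle$.
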
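$\varphi \in L^2_{0,T,d}(\Omega, \mathcal{F}, P)$ such that $M$ can be written as:

$$M_t = M_0 + \int_0^t \varphi_u^\top\, d\tilde{W}_u.$$

We wish to find a portfolio process $\pi$ such that the discounted wealth process, net of
consumption, $S_0^{-1} V^{x,\pi,0}$, equals $M$ under $P$ (or, equivalently, under $Q$) a.s. By Eq. (4.75),

$$\frac{V_t^{x,\pi,0}}{S_{0,t}} = x + \int_0^t \frac{1}{S_{0,u}}\,\pi_u^\top \sigma_u\, d\tilde{W}_u,$$

and so, by identifying, the portfolio we are looking for is $\pi^\top = S_0\, \varphi^\top \sigma^{-1}$. Set, then, $x = M_0$.
Then, $S_{0,t}^{-1} V_t^{x,\pi,0} = M_t$, and in particular, $S_{0,T}^{-1} V_T^{x,\pi,0} = M_T$ a.s. By comparing with Eq. (4.76),
$V_T^{x,\pi,0} = Y$.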
Armed with this result, we can now easily state:
Theorem 4.5. $\mathcal{Q}$ is a singleton if and only if markets are complete.
Proof. There exists a unique $\lambda$: $a - \mathbf{1}_m r = \sigma\lambda \Leftrightarrow m = d$. The result follows by
Girsanov's theorem. ‖

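When markets are incomplete, there is an infinity of risk-neutral probabilities belonging to
$\mathcal{Q}$. Absence of arbitrage does not allow us to recover a unique risk-neutral probability, just as
in the discrete time model of Chapter 2. The next results provide a further representation of
the set of risk-neutral probabilities $\mathcal{Q}$, in the incomplete markets case. Let $L^2_{0,T,d}(\Omega, \mathcal{F}, P)$ be
the space of all $\mathcal{F}_t$-adapted processes $x$ in $\mathbb{R}^d$ satisfying: $\int_0^T \|x_t\|^2 dt < \infty$, and define,

$$\langle \sigma \rangle^\perp \equiv \left\{ x \in L^2_{0,T,d}(\Omega, \mathcal{F}, P) : \sigma_t x_t = \mathbf{0} \text{ a.s.} \right\},$$

where $\mathbf{0}$ is a vector of zeros in $\mathbb{R}^m$. Let

$$\hat{\lambda} = \sigma^\top (\sigma\sigma^\top)^{-1} (a - \mathbf{1}_m r) \quad \text{a.s.}$$

Under the usual regularity conditions, $\hat{\lambda}$ can be interpreted as the process of unit risk-premia.
In fact, all processes belonging to the set:

$$\mathcal{Z} = \left\{ \lambda : \lambda_t = \hat{\lambda}_t + \nu_t,\ \nu \in \langle \sigma \rangle^\perp \right\}$$

are bounded and, hence, can be interpreted as unit risk-premia processes. More precisely, define
the Radon-Nikodym derivative of $\hat{Q}$ with respect to $P$ on $\mathcal{F}_T$:

$$\hat{\eta}_T \equiv \left.\frac{d\hat{Q}}{dP}\right|_{\mathcal{F}_T} = \exp\left(-\int_0^T \hat{\lambda}_t^\top dW_t - \frac{1}{2}\int_0^T \|\hat{\lambda}_t\|^2 dt\right),$$

and the density process of all $Q \approx P$ on $(\Omega, \mathcal{F})$,

$$\eta_t = \hat{\eta}_t \cdot \exp\left(-\int_0^t \nu_\tau^\top dW_\tau - \frac{1}{2}\int_0^t \|\nu_\tau\|^2 d\tau\right), \quad t \in [0, T],$$

a strictly positive $P$-martingale. We have the following results, which follow for example by
He and Pearson (1991, Proposition 1 p. 271) or Shreve (1991, Lemma 3.4 p. 429):

Proposition 4.6. $Q \in \mathcal{Q}$ if and only if it is of the form: $Q(A) = E(\mathbb{1}_A\, \eta_T)$, $\forall A \in \mathcal{F}_T$.

To summarize, we have that $\dim(\langle \sigma \rangle^\perp) = d - m$. The previous result shows quite clearly that
market incompleteness implies the existence of an infinity of risk-neutral probabilities. Such a
result was shown in great generality by Harrison and Pliska (1983).$^{11}$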
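The family of unit risk-premia consistent with no arbitrage can be sketched numerically. The code below is an illustration under stated assumptions, not part of the original exposition: coefficients are constants, the function names are hypothetical, and the null space of the volatility matrix is obtained from a singular value decomposition. It computes the minimal-norm premium $\sigma^\top(\sigma\sigma^\top)^{-1}(a - \mathbf{1}_m r)$ and a basis of $\{x : \sigma x = 0\}$, whose dimension is $d - m$ when $\sigma$ has full row rank.

```python
import numpy as np

def minimal_risk_premium(a, r, sigma):
    """Minimal-norm solution of sigma @ lam = a - r (sigma of full row rank)."""
    excess = np.asarray(a, dtype=float) - r
    sigma = np.asarray(sigma, dtype=float)
    return sigma.T @ np.linalg.solve(sigma @ sigma.T, excess)

def null_space_basis(sigma, tol=1e-12):
    """Orthonormal basis (as columns) of {x : sigma @ x = 0}, via the SVD."""
    sigma = np.asarray(sigma, dtype=float)
    _, s, vt = np.linalg.svd(sigma)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

# One traded asset, two Brownian motions (m = 1, d = 2): incomplete market.
sigma = np.array([[0.2, 0.1]])
lam_hat = minimal_risk_premium([0.08], 0.02, sigma)
basis = null_space_basis(sigma)
# Every lam_hat + k * basis[:, 0] prices the traded asset equally well.
other = lam_hat + 1.5 * basis[:, 0]
```

Each choice of the coefficient multiplying the null-space vector corresponds to a different risk-neutral probability, which is the numerical counterpart of the multiplicity discussed above.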

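4.5 Equilibrium with a representative agent


4.5.1 Merton's approach: dynamic programming
4.5.1.1 Setup

This approach relies on dynamic programming. An agent faces the following problem:

$$J(V_0) = \max_{c,\,\pi} E\left[\int_0^\infty e^{-\rho t} u(c_t)\, dt\right] \quad \text{s.t. Eq. (4.70) holds.}$$

Under regularity conditions, the value function $J(\cdot)$ satisfies the following Hamilton-Jacobi-Bellman equation,

$$0 = \max_{c,\,\pi} \left\{ u(c) + J'(V)\left(\pi^\top (a - \mathbf{1}_m r) + rV - c\right) + \frac{1}{2} J''(V)\, \pi^\top \sigma\sigma^\top \pi - \rho J(V) \right\}. \tag{4.77}$$

The first order conditions lead to:

$$u'(c) = J'(V) \quad \text{and} \quad \pi = -\frac{J'(V)}{J''(V)}\, (\sigma\sigma^\top)^{-1} (a - \mathbf{1}_m r). \tag{4.78}$$

To solve for the value function, replace Eqs. (4.78) into Eq. (4.77), which leaves an ordinary
differential equation that needs to be solved by $J$. While an analytical solution for $J$ is in
general unavailable, it is easy to check that in the CRRA case, $u(c) = \frac{c^{1-\eta} - 1}{1 - \eta}$, we have that
$J(V) = A\, \frac{V^{1-\eta} - B}{1 - \eta}$, for two constants $A$, $B$,$^{12}$ such that

$$\pi = \frac{1}{\eta}\, V\, (\sigma\sigma^\top)^{-1} (a - \mathbf{1}_m r).$$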
4.5.1.2 Stochastic opportunity sets

Let us generalize to the multi-state case.


11 The so-called Föllmer and Schweizer (1991) measure, or minimal equivalent martingale measure, is defined as: $\hat{Q}(A) \equiv E(\mathbb{1}_A\, \hat{\eta}_T)$, for each $A \in \mathcal{F}_T$.
12 To determine these two constants, we replace optimal consumption, $c = A^{-1/\eta} V$, into Eq. (4.77), use the conjectured functional form for $J$, and solve for the undetermined coefficients, obtaining $A = \left( \frac{\rho - (1-\eta) r}{\eta} - \frac{(1-\eta)\, \mathrm{Sh}}{2\eta^2} \right)^{-\eta}$ and $B = (\rho A)^{-1}$, where $\mathrm{Sh} \equiv (a - \mathbf{1}_m r)^\top (\sigma\sigma^\top)^{-1} (a - \mathbf{1}_m r)$.



4.5.1.3 Separation theorems

4.5.2 Martingale methods


4.5.2.1 Arrow-Debreu state prices and intertemporal budget constraint

We assume markets are complete, $m = d$, and consider market imperfections in the next
section. As usual, our agent maximizes expected utility over consumption flows and terminal
wealth under the constraint in Eq. (4.71):

$$J(0, V_0) = \max_{(c,\,v)} E\left[\int_0^T u(t, c_t)\, dt + U(v)\right] \quad \text{s.t. Eq. (4.71) holds.} \tag{4.P1}$$

This optimization problem can be solved relying on the notion of Arrow-Debreu state prices,
similarly as in Chapter 2. The first task is to derive a budget constraint that parallels that in
Chapter 2, which applied to a two period economy. For reference, in Chapter 2, we explained that
in the presence of complete markets, the budget constraint can be written compactly as:

$$0 = c_0 - w_0 + E\left[m\,(c^1 - w^1)\right], \tag{4.79}$$

where $c$ and $w$ denote consumption and endowments, and $m$ is the stochastic discount factor.
It is instructive to re-iterate the steps to achieve Eq. (4.79). First, we have the two budget
constraints at time zero and one, $c_0 - w_0 = -q\theta$, and $c^1 - w^1 = X\theta$, where notation is as in
Chapter 2. By absence of arbitrage, $q = \phi^\top X$, such that $c_0 - w_0 = -\phi^\top (c^1 - w^1)$, and Eq.
(4.79) follows by the definition of the Arrow-Debreu state prices, $\phi_s = m_s P_s = (1 + r)^{-1} Q_s$,
where $P_s$ and $Q_s$ are the physical and risk-neutral probabilities of state $s$, and by taking the
sum over all the states of nature.

The same logic would also obviously apply in the continuous time case. First, we define
Arrow-Debreu state price densities,

$$\phi_{0,t}(\omega) \equiv m_{0,t}(\omega)\, dP(\omega), \qquad m_{0,t} \equiv S_{0,t}^{-1}\, \eta_t. \tag{4.80}$$

As in the finite state case, we want to evaluate the present value of future and current
consumption. For a given $\omega \in \Omega$, the value of future and present consumption is,

$$\int_0^T \phi_{0,t}(\omega)\, c_t(\omega)\, dt + \phi_{0,T}(\omega)\, v(\omega).$$

In terms of the two-period model, this expression is the counterpart to $c_0 + \phi_s c_s$, i.e. the value
of present consumption (the numeraire) and consumption in state $s$. Next, we take the integral
over all states of nature, obtaining,

$$\int_\Omega \left[\int_0^T \phi_{0,t}(\omega)\, c_t(\omega)\, dt + \phi_{0,T}(\omega)\, v(\omega)\right] = E\left[\int_0^T m_{0,t}\, c_t\, dt + m_{0,T}\, v\right] = V_0, \tag{4.81}$$

where the first equality follows by Eq. (4.80), and the second by the budget constraint.
Note that the original problem, one with an infinite number of trajectory constraints, is
now reduced to one with a single constraint, just as in the two period market model. In the
Appendix, we also show that Eq. (4.81) is equivalent to one in which the expectation is taken
under the risk-neutral probability, as follows:

$$V_0 = E\left[\int_0^T m_{0,t}\, c_t\, dt + m_{0,T}\, v\right] = E^Q\left[\int_0^T S_{0,t}^{-1}\, c_t\, dt + S_{0,T}^{-1}\, v\right]. \tag{4.82}$$

Eq. (4.82) is the finite horizon counterpart of Eq. (4.47) of Section 4.3.4.2, derived through a
different route: by rewriting the constraint under the risk-neutral probability and integrating
out. The approach in this section provides the economic intuition underlying that change of
probability.
4.5.2.2 The optimization problem

So the program is,

$$J(0, V_0) = \max_{(c,\,v)} E\left[\int_0^T u(t, c_t)\, dt + U(v)\right] \quad \text{s.t. Eq. (4.82),}$$

where $v$ denotes state-contingent consumption arising from terminal wealth, $v \equiv V_T$.
Because of the emphasis on the equivalent martingale measure, this approach to dynamic
portfolio optimization is known as the martingale method. The critical assumption underlying
it is that markets are complete, such that there is a unique Arrow-Debreu density process
underlying the optimization problem. However, this method could conveniently be extended
to deal with the presence of portfolio constraints and incomplete markets in particular (see
Section 4.7).
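Consider the Lagrangian:

$$\max_{(c,\,v)} E\left[\int_0^T \left(u(t, c_t) - \ell\, m_{0,t}\, c_t\right) dt + U(v) - \ell\, m_{0,T}\, v\right] + \ell\, V_0,$$

where $\ell$ is a Lagrange multiplier, and by Eqs. (4.73) and (4.80),

$$m_{0,t} = \exp\left(-\int_0^t \left(r_\tau + \frac{1}{2}\|\lambda_\tau\|^2\right) d\tau - \int_0^t \lambda_\tau^\top dW_\tau\right). \tag{4.83}$$

The first order conditions are

$$u_c(t, c_t) = \ell\, m_{0,t} \ \text{ for } t \in [0, T), \quad \text{and} \quad U'(v) = \ell\, m_{0,T}. \tag{4.84}$$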
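To determine the portfolio-consumption policy, consider, for a generic $\mathcal{F}_T$-measurable random
variable $v$, the martingale,

$$M_t \equiv E\left[\left.\int_0^T m_{0,\tau}\, c_\tau\, d\tau + m_{0,T}\, v \,\right|\, \mathcal{F}_t\right] = m_{0,t}\, V_t + \int_0^t m_{0,\tau}\, c_\tau\, d\tau.$$

By the predictable representation theorem, there exists a vector $\varphi$ such that,

$$M_t = M_0 + \int_0^t \varphi_\tau^\top\, dW_\tau,$$

and by Itô's lemma,

$$dM_t = m_{0,t} \left(\pi_t^\top \sigma_t - V_t \lambda_t^\top\right) dW_t.$$

By identifying,

$$\pi_t^\top = \left(V_t \lambda_t^\top + \frac{\varphi_t^\top}{m_{0,t}}\right) \sigma_t^{-1}, \tag{4.85}$$

where $V_t$ can be determined from the constraint in Eq. (4.82), written at time $t$ for the optimal $c$ and $v$.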
4.5.2.3 Marginal utility of income

The Lagrange multiplier in Eq. (4.84) carries the interpretation of marginal utility of income,
similarly as for some derivations in Chapter 2. We provide a proof of this quite intuitive envelope
property by relying on an infinite horizon framework and instantaneous utility at $t$ equal
to $e^{-\rho t} u(c_t)$, as this is a framework often utilized in Part II of the Lectures. We, then, consider
the following program, a special case of program [4.P1],

$$J(V_0) = \max_{c} E_0\left[\int_0^\infty e^{-\rho t} u(c_t)\, dt\right] \quad \text{s.t. } V_0 = E_0\left[\int_0^\infty m_{0,t}\, c_t\, dt\right], \tag{4.P2}$$

where $V_0$ denotes initial wealth, and the notation $m_{0,t}$ for the state price density generalizes that
in Section 4.2. Optimality conditions are,

$$e^{-\rho t} u'(c_t) = \ell\, m_{0,t}, \qquad u'(c_0) = \ell, \tag{4.86}$$

as usual. Furthermore, by assuming enough regularity conditions, we can differentiate the
intertemporal utility in [4.P2],

$$J'(V_0) = E_0\left[\int_0^\infty e^{-\rho t} u'(c_t)\, \frac{\partial c_t}{\partial V_0}\, dt\right] = \ell\, E_0\left[\int_0^\infty m_{0,t}\, \frac{\partial c_t}{\partial V_0}\, dt\right] = \ell, \tag{4.87}$$

where the second equality holds by the first of the optimality conditions (4.86), and the third
by differentiating L.H.S. and R.H.S. of the budget constraint in [4.P2] with respect to $V_0$, as
originally claimed.
Next, we determine $\ell$, thereby pinning down $c_0$ in the second of the optimality conditions
(4.86). Rather than performing this task within a general framework, we assume the agent has
CRRA equal to $\eta$, such that the optimality conditions (4.86) become $c_t = \ell^{-1/\eta} \left(e^{\rho t} m_{0,t}\right)^{-1/\eta}$. By
replacing optimal consumption into the budget constraint of [4.P2], and rearranging terms,

$$\ell = V_0^{-\eta} \left(E_0\left[\int_0^\infty e^{-\rho t/\eta}\, m_{0,t}^{1 - 1/\eta}\, dt\right]\right)^{\eta}, \tag{4.88}$$

and by integrating out Eq. (4.87), up to a constant of integration,

$$J(V_0) = \frac{\ell(V_0)\, V_0}{1 - \eta}.$$

Finally, by the optimality conditions (4.86), initial consumption has to satisfy,

$$c_0 = \ell^{-1/\eta} = \mathrm{const}^{-1}\, V_0, \qquad \mathrm{const} \equiv E_0\left[\int_0^\infty e^{-\rho t/\eta}\, m_{0,t}^{1 - 1/\eta}\, dt\right], \tag{4.89}$$

where the second equality follows by the expression of $\ell$ in Eq. (4.88). Assume, for instance,
log-utility, $\eta = 1$, in which case $c_0 = \rho V_0$ as in some of the models in Chapter 3.
This example shows what we would have to expect in general equilibrium. In equilibrium,
optimal consumption equals the dividends of the trees. Assuming one tree to illustrate, we have
that the initial endowment has to satisfy the following constraint,

$$V_0 = \mathrm{const} \cdot D_0,$$

where const is the same constant defined in Eq. (4.89).


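4.5.2.4 Example: log-utility

Let $u(t, c) = \ln c$ and $U(v) = \ln v$, with a time-discount rate $\rho = 0$. By plugging the optimality
conditions (4.84), $c_t = (\ell\, m_{0,t})^{-1}$ and $v = (\ell\, m_{0,T})^{-1}$, into the budget constraint, we obtain that the
solution for the Lagrange multiplier is: $\ell = \frac{T+1}{V_0}$, such that optimal consumption is $c_t = \frac{V_0}{T+1}\, m_{0,t}^{-1}$,
and $v = \frac{V_0}{T+1}\, m_{0,T}^{-1}$. As regards the portfolio process, one has that:

$$M_t = E\left[\left.\int_0^T m_{0,\tau}\, c_\tau\, d\tau + m_{0,T}\, v \,\right|\, \mathcal{F}_t\right] = V_0,$$

which shows that $\varphi = 0$ in the representation of Eq. (4.85). Then, by replacing $\varphi = 0$ into
(4.85),

$$\pi_t^\top = V_t\, \lambda_t^\top \sigma_t^{-1}.$$

We determine $V_t$ using $c$, $v$ and Eq. (4.82),

$$V_t = m_{0,t}^{-1}\, E\left[\left.\int_t^T m_{0,\tau}\, c_\tau\, d\tau + m_{0,T}\, v \,\right|\, \mathcal{F}_t\right] = \frac{T - t + 1}{T + 1}\, V_0\, m_{0,t}^{-1},$$

where we used the semigroup property that for $\tau \ge t$, $m_{0,\tau} = m_{0,t}\, m_{t,\tau}$. The solution is:

$$\pi_t = V_t\, (\sigma_t \sigma_t^\top)^{-1} (a_t - \mathbf{1}_m r_t),$$

where we used the expression for the risk-premium vector, $\lambda = \sigma^{-1}(a - \mathbf{1}_m r)$.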

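4.5.2.5 Equilibrium

In a complete markets setting, an equilibrium is: (i) a consumption-portfolio policy satisfying
the Eqs. (4.84) and (4.85), and (ii) the market clearing conditions,

$$c_t = D_t \equiv \sum_{i=1}^m D_{i,t}, \qquad \theta_{0,t} = 0, \qquad \pi_{i,t} = S_{i,t} \ \text{ for } i = 1, \ldots, m. \tag{4.90}$$

We now derive equilibrium allocations and Arrow-Debreu state price densities. First, note
that the dividend process, $D$, satisfies:

$$dD_t = g_{0,t}\, D_t\, dt + D_t\, \sigma_{0,t}\, dW_t,$$

where $g_0 D \equiv \sum_{i=1}^m g_i D_i$ and $\sigma_0 D \equiv \sum_{i=1}^m \sigma_{D_i} D_i$, with $g_i$ and $\sigma_{D_i}$ the drift and volatility of dividend $D_i$. We have:

$$\ln u_c(t, D_t) = \ln u_c(t, c_t) = \ln \ell + \ln m_{0,t} = \ln \ell - \int_0^t \left(r_\tau + \frac{1}{2}\|\lambda_\tau\|^2\right) d\tau - \int_0^t \lambda_\tau^\top dW_\tau, \tag{4.91}$$

where the first equality holds in an equilibrium, the second equality follows by the first order
conditions in (4.84), and the third equality is true by the definition of $m$ in Eq. (4.83).
Finally, by Itô's lemma, $\ln u_c(t, D_t)$ is solution to:

$$d \ln u_c = \left(\frac{u_{tc}}{u_c} + g_0 D\, \frac{u_{cc}}{u_c} + \frac{1}{2}\|\sigma_0\|^2 D^2 \left(\frac{u_{ccc}}{u_c} - \left(\frac{u_{cc}}{u_c}\right)^2\right)\right) dt + \frac{u_{cc}}{u_c}\, D\, \sigma_0\, dW. \tag{4.92}$$

By identifying drifts and diffusion terms in Eqs. (4.91)-(4.92), we obtain, after a few
simplifications, the expression for the equilibrium short term rate and the prices of risk:

$$r_t = -\left(\frac{u_{tc}(t, D_t)}{u_c(t, D_t)} + g_{0,t}\, D_t\, \frac{u_{cc}(t, D_t)}{u_c(t, D_t)} + \frac{1}{2}\|\sigma_{0,t}\|^2 D_t^2\, \frac{u_{ccc}(t, D_t)}{u_c(t, D_t)}\right),$$

$$\lambda_t^\top = -\frac{u_{cc}(t, D_t)}{u_c(t, D_t)}\, D_t\, \sigma_{0,t}.$$

For example, consider the CRRA utility function, $u(t, c) = e^{-\rho t}\, \frac{c^{1-\eta} - 1}{1 - \eta}$, and $m = 1$. Then,

$$r = \rho + \eta\, g_0 - \frac{1}{2}\eta(\eta + 1)\sigma_0^2, \qquad \lambda = \eta\, \sigma_0.$$

Appendix 2 performs Walras's consistency tests regarding Eq. (4.90).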
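The CRRA expressions for the short-term rate and the price of risk can be cross-checked against the general formulas in terms of derivatives of marginal utility. The sketch below is an illustration under assumptions: a single tree with constant dividend growth `g0` and volatility `sigma0`, hypothetical function names, and central finite differences in place of analytical derivatives.

```python
import math

def equilibrium_rate_and_risk_price(marg_util, t, D, g0, sigma0, h=1e-4):
    """Short rate and unit risk premium implied by marginal utility u_c(t, c).

    Derivatives of u_c are approximated by central finite differences.
    """
    uc = marg_util(t, D)
    u_tc = (marg_util(t + h, D) - marg_util(t - h, D)) / (2 * h)
    u_cc = (marg_util(t, D + h) - marg_util(t, D - h)) / (2 * h)
    u_ccc = (marg_util(t, D + h) - 2 * uc + marg_util(t, D - h)) / h**2
    r = -(u_tc / uc + g0 * D * u_cc / uc + 0.5 * sigma0**2 * D**2 * u_ccc / uc)
    lam = -(u_cc / uc) * D * sigma0
    return r, lam

def crra_closed_form(rho, eta, g0, sigma0):
    """Closed-form short rate and price of risk in the CRRA, one-tree case."""
    return rho + eta * g0 - 0.5 * eta * (eta + 1) * sigma0**2, eta * sigma0

rho, eta, g0, sigma0 = 0.02, 2.0, 0.02, 0.03
crra_marginal_utility = lambda t, c: math.exp(-rho * t) * c ** (-eta)
numerical = equilibrium_rate_and_risk_price(crra_marginal_utility, 0.5, 1.0, g0, sigma0)
closed_form = crra_closed_form(rho, eta, g0, sigma0)
```

The two pairs should agree up to the finite-difference error, and neither depends on the level of dividends, as expected with CRRA preferences.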

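4.5.3 Continuous time Consumption-CAPM


[In progress]
We summarize,

$$S_t = E\left[\left.\int_t^T m_{t,\tau}\, D_\tau\, d\tau + m_{t,T}\, S_T \,\right|\, \mathcal{F}_t\right] = S_{0,t}\, E^Q\left[\left.\int_t^T \frac{D_\tau}{S_{0,\tau}}\, d\tau + \frac{S_T}{S_{0,T}} \,\right|\, \mathcal{F}_t\right],$$

where the second equality holds by the same arguments leading to Eq. (4.82). Replacing the first
order condition in (4.84), and the equilibrium conditions in Eq. (??), we obtain the continuous
time version of the Consumption-CAPM,

$$S_t = E\left[\left.\int_t^T \frac{u_c(\tau, D_\tau)}{u_c(t, D_t)}\, D_\tau\, d\tau + \frac{u_c(T, D_T)}{u_c(t, D_t)}\, S_T \,\right|\, \mathcal{F}_t\right].$$

For example, the price of a pure discount bond is,

$$P(t, T) = E\left[\left.\frac{u_c(T, D_T)}{u_c(t, D_t)} \,\right|\, \mathcal{F}_t\right] = E\left[\left. m_{t,T} \,\right|\, \mathcal{F}_t\right],$$

where $m$ is as in Eq. (4.83).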
4.6 Partial hedging in incomplete markets: introduction


[In progress]

4.7 Inaction: the economics of American options


4.7.1 Early exercise premiums: an introductory example
Consider the following two gambles. In the first, we roll a dice, and obtain a payoff
$x \in \{1, \ldots, 6\}$, such that the fair value of the gamble from the perspective of a risk-neutral bettor
is $\bar{V}_0 \equiv \sum_{x=1}^{6} \frac{1}{6}\, x = 3.5$. In the second gamble, we are allowed to roll the dice three times,
and decide to earn the payoff $x \in \{1, \ldots, 6\}$ and, then, stop the game, at any of these draws.
The design of the second gamble clearly aims to increase the chances of better outcomes, and is
expected to be valued more than the first: the right to exercise the option to restart the game
two more times has a positive value. For example, in case the outcome in any of the draws is
$x = 1$, we would for sure like to roll the dice again and, in case the outcome is $x = 6$, we would
for sure like to stop rolling the dice. More generally, now, we wish to find a state-contingent
rule that maps states to a decision to stop the game.
We proceed recursively. Each time we roll the dice, we stop, provided the outcome is
favourable enough, in that it is higher than the outcome we would expect from the next draw.
Mathematically, the value of the game in each state is $\max\{x, \bar{V}'\}$, where $x \in \{1, \ldots, 6\}$ and $\bar{V}'$
denotes the expected payoff from rolling the dice another time.
The next table provides the values of the second gamble on the first and second draws, for
each state. Let us consider Panel B. If the outcome after the second draw is 1 or 2 or 3, we roll
the dice again, because the expected payoff from a third (a final) draw, $\bar{V}_3$ say, is obviously
$\bar{V}_3 = \bar{V}_0$, and is higher than that we would currently obtain, $\bar{V}_3 > 3$. However, we stop this
gamble after the second draw, provided we observe an outcome equal to 4 or 5 or 6, as any
of these outcomes is higher than $\bar{V}_3$. By a similar reasoning, the value of the gamble at the
first draw in state $x$ is, $V_1(x) = \max\{x, \bar{V}_2\}$, where $\bar{V}_2 = \frac{1}{6}(3.5 \times 3 + 4 + 5 + 6) = 4.25$, leading
to Panel A of the table. The value of the gamble is, then, the expected payoff from Panel A,
$\bar{V}_1 = \frac{1}{6}(4.25 \times 4 + 5 + 6) = 4.6667$.
Panel A: Value after the 1st draw

state $x$    value $V_1(x) = \max\{x, \bar{V}_2\}$    decision
1            4.25                                     draw again
2            4.25                                     draw again
3            4.25                                     draw again
4            4.25                                     draw again
5            5                                        stop
6            6                                        stop

Panel B: Value after the 2nd draw

state $x$    value $V_2(x) = \max\{x, \bar{V}_3\}$    decision
1            3.5                                      draw again
2            3.5                                      draw again
3            3.5                                      draw again
4            4                                        stop
5            5                                        stop
6            6                                        stop
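The recursion behind the two panels can be reproduced with a few lines of backward induction. The sketch below is only an illustration: the function name and its arguments are hypothetical, and it assumes a fair dice and a risk-neutral bettor, as in the text.

```python
def gamble_values(n_draws=3, faces=range(1, 7)):
    """Expected value of the gamble before each draw, for a risk-neutral bettor.

    Returns a list whose k-th entry (k = 0, 1, ...) is the expected payoff
    when n_draws - k draws are still available.
    """
    payoffs = list(faces)
    # With a single draw left there is no choice: the value is the plain mean.
    continuation = sum(payoffs) / len(payoffs)
    values = [continuation]
    for _ in range(n_draws - 1):
        # Stop whenever the outcome beats the value of drawing again.
        continuation = sum(max(x, continuation) for x in payoffs) / len(payoffs)
        values.append(continuation)
    return values[::-1]

v1, v2, v3 = gamble_values()
stop_after_first = [x for x in range(1, 7) if x >= v2]
stop_after_second = [x for x in range(1, 7) if x >= v3]
```

Here `v1`, `v2` and `v3` correspond to the expected payoffs before the first, second and third draw, and the two lists collect the outcomes at which the bettor stops after the first and second draw.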

The value of the first gamble is less than that of the second, $\bar{V}_0 < \bar{V}_1$. We may interpret the
second gamble as an American option, which we can exercise at any time before its expiration,
say after having seen any of the first two draws. The first gamble is, instead, interpreted as a
European option, with a payoff that only links to the third draw. The difference between the
values of the two gambles is a premium we need to pay for the optionality to exercise early,
such that in general, the value of the gamble after $k$ draws is:

$$V_k(x) = \max\left\{x,\ E(V_{k+1})\right\}, \quad x \in \{1, \ldots, 6\},$$

where we consider the unconditional expectation $E(\cdot)$ due to the fact that the draws of the dice
are independent and identically distributed.
Note that the exercise boundaries are $x \ge 5$ after the first draw and $x \ge 4$ after the second,
meaning that the region where it is optimal to stop widens as the game proceeds: the
value to wait obviously decreases as the expiration approaches. Moreover, these boundaries
are obviously endogenous, in that we calculate them as a part of the decision process. Finally,
the boundaries are time-varying in this free-boundary problem, although in the next section,
we shall consider infinite horizon decision problems where the free-boundary to be selected is
constant.
4.7.2 Gambles and securities again
Chapter 2 explains that a fundamental distinction exists between gambles and securities. Securities
are traded while gambles are not, such that the value of gambles depends on supply and demand.
For example, the evaluation in the previous section relies on the assumption that the bettor
is risk-neutral. Alternatively, we could consider a risk-averse bettor, for example, one with
logarithmic utility and, hence, a value function at time $k$ and state $x$ equal to,

$$V_k(x) = \max\left\{\ln x,\ E(V_{k+1})\right\}.$$

The value function to be expected from a third draw is $\bar{V}_3 \equiv \frac{1}{6}\sum_{x=1}^{6} \ln x = 1.0965$, and because
$\ln 2 < \bar{V}_3$ and $\ln 3 > \bar{V}_3$, the decision after the second draw is to draw again only when the draw
delivers particularly poor outcomes, $x = 1$ or $2$. Likewise, the value expected from rolling
the dice a second time is $\bar{V}_2 \equiv \frac{1}{6}\left(2 \times 1.0965 + \sum_{x=3}^{6} \ln x\right) = 1.3465$, such that, given that
$\ln 3 < \bar{V}_2$ and $\ln 4 > \bar{V}_2$, the decision after the first draw is to stop, provided the state is
particularly favourable, i.e. when $x$ is at least as large as 4. Therefore, the value of this gamble
from the perspective of the log-utility bettor is, $\bar{V}_1 = \frac{1}{6}\left(3 \times 1.3465 + \sum_{x=4}^{6} \ln x\right) = 1.4712$.
Note how much more cautious the log-bettor is, compared to the risk-neutral one, as he is inclined
to continue this game only when the outcome of the first draw is no larger than three, whereas the
risk-neutral bettor also continues after a four. In terms of risk-premiums,
the certainty equivalent of this gamble, $\hat{x}$: $\ln \hat{x} = 1.4712$, is $\hat{x} = 4.3545$, which is
lower than the value $\bar{V}_1 = 4.6667$ for a risk-neutral gambler. The difference is a risk-premium.
Still, the log-bettor acknowledges this gamble carries a strictly positive early exercise premium,
$\bar{V}_1 - \bar{V}_3 = 0.3747$.

Panel A: Value after the 1st draw

state $x$    value $V_1(x) = \max\{\ln x, \bar{V}_2\}$    decision
1            1.3465                                       draw again
2            1.3465                                       draw again
3            1.3465                                       draw again
4            ln 4                                         stop
5            ln 5                                         stop
6            ln 6                                         stop

Panel B: Value after the 2nd draw

state $x$    value $V_2(x) = \max\{\ln x, \bar{V}_3\}$    decision
1            1.0965                                       draw again
2            1.0965                                       draw again
3            ln 3                                         stop
4            ln 4                                         stop
5            ln 5                                         stop
6            ln 6                                         stop

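The log-utility version of the gamble follows from the same backward induction, applied to utilities rather than payoffs. The sketch below is a minimal illustration with hypothetical function names; it assumes a fair dice and measures values in utility units, converting the initial value into a certainty equivalent by exponentiation.

```python
import math

def log_bettor_values(n_draws=3, faces=range(1, 7)):
    """Expected log-utility of the gamble before each draw (first entry: draw 1)."""
    utils = [math.log(x) for x in faces]
    continuation = sum(utils) / len(utils)   # only the final draw is left
    values = [continuation]
    for _ in range(n_draws - 1):
        # Stop whenever the utility of the outcome beats that of drawing again.
        continuation = sum(max(u, continuation) for u in utils) / len(utils)
        values.append(continuation)
    return values[::-1]

def stopping_set(threshold, faces=range(1, 7)):
    """Outcomes at which stopping is at least as good as continuing."""
    return [x for x in faces if math.log(x) >= threshold]

v1, v2, v3 = log_bettor_values()
certainty_equivalent = math.exp(v1)
early_exercise_premium = v1 - v3
stop_after_first = stopping_set(v2)
stop_after_second = stopping_set(v3)
```

The stopping sets are wider than those of the risk-neutral bettor, which is the sense in which the log-bettor gives up the option to continue sooner.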

As explained, the reason the value of a gamble like this relies so much on factors such as risk-aversion is
that the asset we consider (throwing a dice) is not traded and, as such, does not satisfy the
martingale restriction. The next section develops an evaluation framework for a traded asset
with a payoff relating to another traded asset, such that standard risk-neutral pricing applies.
4.7.3 Real options theory
We build upon the intuition developed through the example of the previous section, and derive
a continuous-time approximation to the solution of the resulting problem, by hinging upon a
series of heuristic arguments. We consider an American option, i.e. an option that we can
exercise at any time before the expiry date, $T$. Once the option is exercised, it yields a payoff
equal to a function of the underlying asset price, say $\psi(S)$. Let $C_t$ be the price of an American
option as of time $t$. In discrete time, we have:

$$C_t = \max\left\{\psi(S_t),\ e^{-r\Delta t}\, E_t\left[C_{t+\Delta t}\right]\right\},$$

where $E_t$ denotes the expectation under the risk-neutral probability. We assume that the nature
of the option, summarized by the payoff $\psi(S)$, is such that there
are two regions, a stopping region and a continuation region, defined as follows:

(i) Stopping region, where time-to-maturity and the price of the asset underlying the option
are such that it is optimal to exercise, $C_t = \max\left\{\psi(S_t),\ e^{-r\Delta t} E_t[C_{t+\Delta t}]\right\} = \psi(S_t)$, in
which case, of course, $C_t > e^{-r\Delta t} E_t[C_{t+\Delta t}]$. By rearranging terms,

$$0 > \frac{E_t\left[C_{t+\Delta t}\right] - C_t}{\Delta t} - \frac{e^{r\Delta t} - 1}{\Delta t}\, C_t. \tag{4.93}$$

The expected return on the option under the risk-neutral probability is less than that on
a bank deposit, which further clarifies why it is optimal to exercise early. Naturally, the
fact the option is yielding less than the safe interest rate is not an arbitrage. We could
simply not short the derivative, as no one else is willing to buy it, as it is not optimal to
do so.
(ii) Continuation region, where time-to-maturity and the price of the asset underlying the
option are such that it is optimal to wait, $C_t = \max\left\{\psi(S_t),\ e^{-r\Delta t} E_t[C_{t+\Delta t}]\right\} = e^{-r\Delta t} E_t[C_{t+\Delta t}]$,
or

$$0 = \frac{E_t\left[C_{t+\Delta t}\right] - C_t}{\Delta t} - \frac{e^{r\Delta t} - 1}{\Delta t}\, C_t. \tag{4.94}$$

The expected return on the option under the risk-neutral probability is the same as that
on a bank deposit.
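The discrete-time recursion above, where the option value is the larger of the exercise payoff and the discounted risk-neutral expectation of next period's value, can be sketched on a recombining binomial lattice. The code below is an illustration under assumptions that are not in the text: a Cox-Ross-Rubinstein parametrization of the up and down moves, a put payoff, no dividends, and a hypothetical function name.

```python
import math

def binomial_put(S0, K, r, sigma, T, n, american=True):
    """Put price on a Cox-Ross-Rubinstein lattice with n time steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Payoffs at expiry; j counts the number of up moves.
    values = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    for step in range(n - 1, -1, -1):
        for j in range(step + 1):
            continuation = disc * (q * values[j + 1] + (1 - q) * values[j])
            if american:
                exercise = max(K - S0 * u**j * d**(step - j), 0.0)
                values[j] = max(exercise, continuation)   # stop or continue
            else:
                values[j] = continuation
    return values[0]

american = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=True)
european = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
early_exercise_premium = american - european
```

Nodes where the exercise payoff wins belong to the stopping region; the remaining nodes form the continuation region, and the difference between the two prices is the early exercise premium.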
Note that the existence of these two regions is not guaranteed. For example, we shall see
that it is never optimal to exercise early American calls written on assets that do not distribute
dividends. When the two regions are, instead, well-defined, they define an exercise envelope, a
function of the asset price underlying the option and time-to-maturity. It is a free boundary
problem: we need to find a boundary that triggers some action, in this case, exercising the
option, and the boundary is free in that it is not given in advance as in the case of, say, the
barrier options of the following section.
This problem can be quite complex, but sometimes simplifies for those derivatives with an
infinite expiry date, $T = \infty$. This simplification arises as in this case, the option price and, hence,
the envelope, only depends on the underlying asset price. Under this assumption, and the
assumption that the price of the asset underlying the option is a geometric Brownian motion
with volatility parameter $\sigma$, we have that the option price satisfies, in the limit $\Delta t \to 0$:

Stopping region:        $L[C] - rC \le 0$ and $C = \psi(S)$    (4.95)
Continuation region:    $L[C] - rC = 0$    (4.96)

where $L[C] = \frac{1}{2}\sigma^2 S^2 C'' + r S C'$. Eqs. (4.95)-(4.96) are the continuous time counterparts to Eqs.
(4.93)-(4.94). To these two equations, we need to add a number of conditions, discussed in the
two examples in the subsections below.
We can understand the evaluation of this asset from an equivalent angle, one cast in terms
of an optimal stopping time problem,

$$C(S) = \sup_{\tau} E\left[e^{-r\tau}\, \psi(S_\tau)\right]. \tag{4.97}$$

That is, we would like to find the time at which it is best to undertake an action, in this case,
exercising the option. This time links to the free-exercise boundary $S^*$, say, as follows:

$$\tau = \inf\left\{t \ge 0 : S_t = S^*\right\},$$

where $S^*$ is determined endogenously. It is a "real option" problem, as it links nicely to the
theory of evaluation of firms undertaking real economic activities as we shall see in Section 4.8.
We now describe this problem in detail in standard cases.
4.7.4 Perpetual puts
Consider a perpetual put, where $\psi(S) = (K - S)^+$, and the price is, accordingly, a function $p(S)$
of the underlying asset price only. This price satisfies Eqs. (4.95)-(4.96), with some additional
conditions and qualifications. First, we assume, and later verify, that there exists a value for the
asset price, the free boundary, $S^*$ say, such that it is optimal to exercise the option whenever
$S \le S^*$. In other terms, Eqs. (4.95)-(4.96) can be written as:

Stopping region ($S \le S^*$):        $p(S) = K - S$    (4.98)
Continuation region ($S > S^*$):     $L[p] - rp = 0$    (4.99)

where $K$ is the strike price of the option. Eq. (4.98) is a value-matching condition. It ensures
that the pricing function is continuous as we move from the continuation region towards the
stopping region.
Second, we require the following boundary condition:

$$\lim_{S \to \infty} p(S) = 0. \tag{4.100}$$

That is, as the asset price gets large, the value of the put option needs to approach zero, as the
probability the derivative is ever exercised becomes negligible.
Finally, the pricing function, $p(S)$, satisfies the following smooth-pasting condition, obtained
after taking the derivative in Eq. (4.98),

$$p'(S^*) = -1. \tag{4.101}$$

We conjecture that in the continuation region, the pricing function that solves Eq. (4.99) has
the form $p(S) = A S^\gamma$, for two constants $A$ and $\gamma$. Plugging this guess into Eq. (4.99) reveals
that actually, the pricing function satisfying it has the following form:

$$p(S) = A_+ S^{\gamma_+} + A_- S^{\gamma_-}, \tag{4.102}$$

where $A_+$ and $A_-$ are two constants, to be pinned down, $\gamma_+ = 1$ and $\gamma_- = -\frac{2r}{\sigma^2}$. To satisfy
the boundary condition in Eq. (4.100), we need that $A_+ = 0$, which leaves $p(S) = A_- S^{\gamma_-}$.
Evaluating this function at $S^*$, as in Eq. (4.98), and using the smooth pasting condition in Eq.
(4.101), yields:

$$p(S^*) = A_- (S^*)^{\gamma_-} = K - S^*, \qquad p'(S^*) = \gamma_- A_- (S^*)^{\gamma_- - 1} = -1. \tag{4.103}$$

The endogenous variables of this system are the two constants $A_-$ and $S^*$. We have:

$$S^* = \frac{K}{1 + \frac{1}{2}\frac{\sigma^2}{r}} \quad \text{and} \quad A_- = (K - S^*)\,(S^*)^{-\gamma_-}, \tag{4.104}$$

such that

$$p(S) = (K - S^*) \left(\frac{S}{S^*}\right)^{\gamma_-}.$$

A few comments are in order. First, Eq. (4.104) shows that the value to wait increases with
$\sigma$. Second, when the short-term rate is zero, $S^* = 0$, meaning it is never optimal to exercise,
and the option is worthless. Intuitively, in the stopping region, the expected return on the
option under the risk-neutral probability is less than that on a bank deposit. When $r = 0$, this
expected return is negative, which destroys the time-value of money argument underpinning
early exercise.
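The closed-form solution for the perpetual put can be verified numerically. The sketch below is a minimal illustration with hypothetical function names, assuming constant $r > 0$ and $\sigma > 0$; it computes the exercise boundary $S^* = K\gamma_-/(\gamma_- - 1)$ with $\gamma_- = -2r/\sigma^2$, the price in both regions, and the residual of the ordinary differential equation that holds in the continuation region.

```python
def put_boundary(K, r, sigma):
    """Free boundary S* of the perpetual put; equals K / (1 + sigma^2 / (2 r))."""
    gamma = -2.0 * r / sigma**2
    return K * gamma / (gamma - 1.0)

def perpetual_put(S, K, r, sigma):
    """Perpetual American put price as a function of the asset price S."""
    gamma = -2.0 * r / sigma**2
    s_star = put_boundary(K, r, sigma)
    if S <= s_star:
        return K - S                              # stopping region
    return (K - s_star) * (S / s_star) ** gamma   # continuation region

def ode_residual(S, K, r, sigma, h=1e-2):
    """0.5 sigma^2 S^2 p'' + r S p' - r p, with finite-difference derivatives."""
    up, mid, dn = (perpetual_put(S + k, K, r, sigma) for k in (h, 0.0, -h))
    p1 = (up - dn) / (2 * h)
    p2 = (up - 2 * mid + dn) / h**2
    return 0.5 * sigma**2 * S**2 * p2 + r * S * p1 - r * mid

K, r, sigma = 100.0, 0.05, 0.2
s_star = put_boundary(K, r, sigma)
price_at_the_money = perpetual_put(100.0, K, r, sigma)
```

A higher volatility lowers the boundary returned by `put_boundary`, in line with the observation that the value to wait increases with volatility.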

4.7.5 Perpetual calls


As anticipated, not every payoff gives rise to well-defined stopping and continuation regions, such
as those in Eqs. (4.95)-(4.96). For call options, where $\psi(S) = (S - K)^+$, it is never optimal
to exercise early, when the underlying assets do not pay dividends. To illustrate, we literally
follow the arguments in the previous subsection, and find that the call price, $c(S)$, satisfies,

Stopping region ($S \ge S^*$):        $c(S) = S - K$    (4.105)
Continuation region ($S < S^*$):     $L[c] - rc = 0$    (4.106)

The solution to Eq. (4.106) has the same functional form as that in Eq. (4.102), with the
same values of $\gamma_-$ and $\gamma_+$. However, due to the obvious conjectures about the location of the
stopping and continuation regions in Eqs. (4.105)-(4.106), it satisfies the boundary condition
$\lim_{S \to 0} c(S) = 0$, rather than $\lim_{S \to \infty} c(S) = 0$, as the put price does in Eq. (4.100). Therefore,
and because $\gamma_+ = 1$, we must have that $c(S) = A_+ S$, or

$$c(S^*) = A_+ S^* = S^* - K,$$

where the second equality follows by the value matching condition in Eq. (4.105). Solving for
$A_+$ yields,

$$A_+ = 1 - \frac{K}{S^*}.$$

As for the smooth-pasting condition we have that $c'(S^*) = A_+ = 1$, such that $S^* = \infty$. It
is never optimal to exercise. In other words, there are no solutions to the counterparts to the
two Eqs. (4.103). The call option price fails to simultaneously satisfy the value matching and
smooth pasting condition.
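4.7.5.1 With dividends

If the underlying asset distributes any payouts over the life of the option, say due to storage
costs, or even dividends, the problem has, instead, a well-defined solution. Assume that under
the risk-neutral probability, the underlying asset price satisfies,

$$\frac{dS_t}{S_t} = (r - \delta)\, dt + \sigma\, d\tilde{W}_t,$$

where $\delta$ is the constant payout ratio, $\delta \equiv D_t / S_t$, with $D_t$ denoting the instantaneous payout (e.g.,
a dividend), and as usual, $\tilde{W}$ denotes a Brownian motion under the risk-neutral probability.
For this problem, Eqs. (4.95)-(4.96) reduce to,

Stopping region ($S \ge S^*$):        $c(S) = S - K$    (4.107)
Continuation region ($S < S^*$):     $L[c] - rc = 0$    (4.108)

where now, the infinitesimal generator is $L[c] = \frac{1}{2}\sigma^2 S^2 c'' + (r - \delta) S c'$. We can show the
solution to Eq. (4.108) is,

$$c(S) = A_+ S^{\gamma_+} + A_- S^{\gamma_-},$$

where, $\gamma_+ = \frac{1}{2} - \frac{r - \delta}{\sigma^2} + \sqrt{\left(\frac{r - \delta}{\sigma^2} - \frac{1}{2}\right)^2 + \frac{2r}{\sigma^2}}$, and $\gamma_- = \frac{1}{2} - \frac{r - \delta}{\sigma^2} - \sqrt{\left(\frac{r - \delta}{\sigma^2} - \frac{1}{2}\right)^2 + \frac{2r}{\sigma^2}}$. Clearly, we
have that $\gamma_+ > 0$, and $\gamma_- < 0$, such that the conjectures about the location of the stopping and
continuation regions in Eqs. (4.107)-(4.108) deliver the boundary condition $\lim_{S \to 0} c(S) = 0$,
and then,

$$c(S) = A_+ S^{\gamma_+}. \tag{4.109}$$

The value matching condition and the smooth pasting conditions equivalent to the two Eqs.
(4.103) are, now

$$c(S^*) = A_+ (S^*)^{\gamma_+} = S^* - K, \qquad c'(S^*) = \gamma_+ A_+ (S^*)^{\gamma_+ - 1} = 1.$$

The solution to this system is,

$$S^* = \frac{\gamma_+}{\gamma_+ - 1}\, K, \qquad A_+ = (S^* - K)\,(S^*)^{-\gamma_+},$$

such that the call option price in Eq. (4.109) can be written as:

$$c(S) = (S^* - K) \left(\frac{S}{S^*}\right)^{\gamma_+}.$$
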


The following picture depicts the triggering ratio, $S^*/K$, after which exercise is triggered as a
function of $\delta$, when $r = 1\%$ and $\sigma = 15\%$.

[Figure: triggering ratio $S^*/K$ (vertical axis) as a function of the payout ratio $\delta$ (horizontal axis, from 0.20 to 0.50).]

The triggering ratio blows up as the payout ratio shrinks to zero. In general, the expected
optimal stopping time is inversely related to the payout ratio $\delta$.
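The behaviour of the triggering ratio can be traced with a short computation. The sketch below is an illustration under the assumptions of this subsection (constant short rate, payout ratio and volatility); the function names are hypothetical, and the parameter values $r = 1\%$ and $\sigma = 15\%$ are those quoted for the picture.

```python
import math

def gamma_plus(r, delta, sigma):
    """Positive root of 0.5 sigma^2 g (g - 1) + (r - delta) g - r = 0."""
    b = 0.5 - (r - delta) / sigma**2
    return b + math.sqrt(b * b + 2.0 * r / sigma**2)

def trigger_ratio(r, delta, sigma):
    """Ratio S*/K at which the perpetual call is exercised (requires delta > 0)."""
    g = gamma_plus(r, delta, sigma)
    return g / (g - 1.0)

def perpetual_call(S, K, r, delta, sigma):
    """Perpetual American call price on an asset with payout ratio delta."""
    g = gamma_plus(r, delta, sigma)
    s_star = K * g / (g - 1.0)
    if S >= s_star:
        return S - K                              # stopping region
    return (s_star - K) * (S / s_star) ** g       # continuation region

r, sigma = 0.01, 0.15
ratios = {delta: trigger_ratio(r, delta, sigma) for delta in (0.2, 0.3, 0.4, 0.5)}
```

As the payout ratio approaches zero, the positive root approaches one and the ratio diverges, which is the numerical counterpart of the statement that early exercise is never optimal without payouts.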
4.7.5.2 American calls and incomplete markets

American call options might be worth exercising early even in the absence of dividends paid out by the
underlying over their life, in the presence of incomplete markets. Consider the following example of a
perpetual American call option written on the instantaneous variance (not an average expected
volatility) of an asset return. Suppose this variance, $v_t$ say, is solution to,

$$dv_t = \kappa(\mu - v_t)\, dt + \xi\, v_t\, d\tilde{W}_t,$$

where $\kappa$, $\mu$ and $\xi$ are parameters, and $\tilde{W}$ is a Brownian motion under the risk-neutral
probability. It is easy to show that the American option price is solution to,

Stopping region ($v \ge v^*$):        $c(v) = v - K$
Continuation region ($v < v^*$):     $L[c] - rc = 0$

where $K$ is the strike price of the option, and

$$L[c] = \kappa(\mu - v)\, c' + \frac{1}{2}\xi^2 v^2 c''. \tag{4.110}$$

The smooth pasting condition is,

$$c'(v^*) = 1,$$

and can be satisfied, because the fact $v$ is not traded makes the drift of $v$ show up in the
infinitesimal generator of Eq. (4.110): the fact $v$ is not traded makes its drift under the
risk-neutral probability play a role similar to dividends and leads to a positive economic value for
early exercise in this context.

4.8 Further topics on real options and controlled Brownian motions


4.8 Further topics on real options and controlled Brownian motions


4.8.1 Irreversible investments and the decision to invest
We discuss a classical model of the optimal time at which a firm should start to sell its output
due to McDonald and Siegel (1986). Assume this firm has access to a technology such that at
each instant of time $t$, an output $y_t$ could be produced (and sold), which is solution to,

$$\frac{dy_t}{y_t} = \mu\, dt + \sigma\, dW_t,$$

where $\mu$ and $\sigma$ are constants and $W$ is a standard Brownian motion. The idea underlying the
model is that the firm begins to produce only once $y_t$ has reached a certain threshold, and that
the production cannot be interrupted once it starts: the investment is irreversible.
Note that if the firm decides to begin selling output at $\tau$, its NPV is $\frac{y_\tau}{r - \mu} - I$, where $r$
is the constant discounting rate, which needs to be larger than output growth to ensure finite
NPV, $r > \mu$, and $I$ is an initial constant cost incurred while starting the production. Therefore,
the value of the firm that holds the potential to start producing at some future time, when
current potential output is $y_0$, is,

$$V(y_0) = \sup_{\tau} E\left[e^{-r\tau} \left(\frac{y_\tau}{r - \mu} - I\right)\right].$$

That is, the value of the firm is formally the same as that of a perpetual American option in
(4.97). Accordingly, we conjecture that the optimal stopping time is determined as the first
time $y_t$ crosses a free-boundary from below, $y^*$ say. In the continuation region,

$$0 = L V(y) - r V(y), \quad \text{for } y \in (0, y^*), \tag{4.111}$$

where $L V(y) = \mu\, y\, V'(y) + \frac{1}{2}\sigma^2 y^2 V''(y)$. At the free-boundary,

$$V(y^*) = \frac{y^*}{r - \mu} - I. \tag{4.112}$$

The solution to Eq. (4.111) is, $V(y) = K_1 y^{\gamma_1} + K_2 y^{\gamma_2}$, where it is easy to show that $\gamma_1 < 0$ and
$\gamma_2 > 1$ assuming as we are that $r > \mu$, and $K_1$ and $K_2$ are constants to be determined. Note
that it must be that $K_1 = 0$, for otherwise the value function would increase disproportionately
as the output potential becomes arbitrarily small. Therefore, the economically viable solution
to Eq. (4.111) is the bounded value function,

$$V(y) = K_2\, y^{\gamma_2}, \quad \text{for } y \in (0, y^*). \tag{4.113}$$

The free-boundary $y^*$ and the constant $K_2$ can now be determined by requiring that the two
following conditions hold at the threshold $y^*$: (i) the value matching condition, i.e. the value
function in Eq. (4.113) is the same as that in (4.112), and (ii) the smooth pasting condition,
the derivatives agree. The result is:

$$y^* = \frac{\gamma_2}{\gamma_2 - 1}\,(r - \mu)\, I, \qquad K_2 = \frac{(y^*)^{1 - \gamma_2}}{\gamma_2\,(r - \mu)}.$$

That is, the firm should start producing immediately, unless it had to pay a cost $I$, in which
case the threshold to start producing increases proportionally with $I$, as the previous expression
for $y^*$ suggests.
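The investment threshold and the value of the option to invest can be computed directly. The following sketch is illustrative: function names are hypothetical, the parameters are constants with $r > \mu$ and a strictly positive cost, and $\gamma_2$ is taken to be the root larger than one of $\frac{1}{2}\sigma^2\gamma(\gamma - 1) + \mu\gamma - r = 0$.

```python
import math

def investment_threshold(r, mu, sigma, cost):
    """Return (gamma2, y_star, k2) for the irreversible investment problem."""
    if r <= mu:
        raise ValueError("a finite NPV requires r > mu")
    if cost <= 0:
        raise ValueError("this sketch assumes a strictly positive cost")
    b = 0.5 - mu / sigma**2
    gamma2 = b + math.sqrt(b * b + 2.0 * r / sigma**2)   # root larger than one
    y_star = gamma2 / (gamma2 - 1.0) * (r - mu) * cost
    k2 = y_star ** (1.0 - gamma2) / (gamma2 * (r - mu))
    return gamma2, y_star, k2

def firm_value(y, r, mu, sigma, cost):
    """Value of the firm holding the option to start producing."""
    gamma2, y_star, k2 = investment_threshold(r, mu, sigma, cost)
    if y >= y_star:
        return y / (r - mu) - cost     # invest immediately
    return k2 * y ** gamma2            # wait

gamma2, y_star, k2 = investment_threshold(r=0.05, mu=0.02, sigma=0.2, cost=100.0)
zero_npv_output = (0.05 - 0.02) * 100.0
```

The threshold exceeds the output level at which the NPV is zero by the factor $\gamma_2/(\gamma_2 - 1)$, which measures the value of waiting, and it scales proportionally with the cost.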

4.8.2 A model of determination of exchange rates in target zones


We provide one instance of policy intervention that can force asset prices to experience less long-term volatility than implied by economic fundamentals: an agreement between central banks to do whatever it takes to maintain the exchange rate within a given band. The main assumption underlying the model is that monetary interventions are fully credible and that they occur marginally, in a way that the fundamentals behave as processes with reflecting barriers, just as in the dividend example given in Section 4.2.6.4. Below, we shall discuss issues arising when the policy is not fully credible, and the subsequent possibility of speculative attacks.
Consider, first, the classical model of exchange rate determination, in which the exchange rate is tied down to the consumer price differentials between two countries through the purchasing power parity equation, $s_t = p_t - p_t^*$, where $s_t$ denotes the log-spot exchange rate, $p_t$ is the domestic log-price level at time $t$, and $p_t^*$ is its foreign counterpart. Money demand increases with log-output and decreases with the nominal interest rate in a way that in equilibrium, log-money supply is $m_t = p_t + \phi y_t - \alpha i_t + \epsilon_t$, where $\phi$ and $\alpha$ are constants, and $\epsilon_t$ is some stochastic disturbance; $\alpha$ is thus the semi-elasticity of money demand to interest rates. An analogous, starred equation holds in the foreign country. Finally, the covered interest rate parity holds,
$$i_t = i_t^* + \frac{E_t(ds_t)}{dt}.$$
We can put all together and conclude that the exchange rate is solution to,
$$s_t = f_t + \alpha\,\frac{E_t(ds_t)}{dt}, \qquad (4.114)$$
where $f_t \equiv m_t - m_t^* - \phi(y_t - y_t^*) - (\epsilon_t - \epsilon_t^*)$. Assume that, were the exchange rate free to fluctuate, the fundamentals, $f_t$, would follow an arithmetic Brownian motion,
$$df_t = \mu\,dt + \sigma\,dW_t,$$
where $\mu$ and $\sigma$ are constants, and $W_t$ is a Brownian motion. It is immediate to see that this assumption regarding the fundamentals, Eq. (4.114) and Itô's lemma, all imply that the equilibrium exchange rate, $s = s(f)$, is solution to the following differential equation:
$$s(f) = f + \alpha\mu\,s'(f) + \tfrac{1}{2}\alpha\sigma^2 s''(f), \qquad (4.115)$$
which is solved by,
$$s_F(f) = f + \alpha\mu.$$

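Note that exchange rate volatility does not play any role in this context. In fact, there are many additional exchange rate functions on top of $s_F(f)$, which satisfy Eq. (4.115),
$$s(f) = s_F(f) + A_1 e^{\lambda_1 f} + A_2 e^{\lambda_2 f}, \qquad (4.116)$$
where $\lambda_{1,2} = \frac{1}{\alpha\sigma^2}\left(-\alpha\mu \pm \sqrt{\alpha^2\mu^2 + 2\alpha\sigma^2}\right)$, and $A_1$ and $A_2$ are two arbitrary constants. However, $s_F(f)$ is the only solution that we can interpret as one of expected fundamentals consistent with Eq. (4.114), in that,
$$s_F(f_t) = \frac{1}{\alpha}\int_t^\infty e^{-\frac{1}{\alpha}(u - t)}\,E_t(f_u)\,du = f_t + \alpha\mu.$$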

How would exchange rates behave, once central banks are credible enough to modify the fundamentals and make the exchange rates lie within a given band? Krugman (1991) considers a model in which the exchange rate is maintained within a target zone, say $s \in [\underline{s}, \bar{s}]$ for two given constants $\underline{s}$ and $\bar{s}$. The model's main assumption is that central banks control the fundamentals through infinitesimal interventions, by injecting or withdrawing monetary base $m$, as soon as the fundamentals become too weak or too strong for the current exchange rate to be maintained within the band $[\underline{s}, \bar{s}]$. The fundamentals are then solution to,
$$df_t = \mu\,dt + \sigma\,dW_t + dL_t - dU_t,$$
where $L_t$ and $U_t$ are continuous processes that only increase when the fundamentals hit some critical values $\underline{f}$ and $\bar{f}$ that ask for intervention, which will be determined below.
By arguments entirely analogous to those utilized while pricing assets with dividends that have reflecting barriers (see Eq. (4.58) in Section 4.2.6.4), we now have that the exchange rate is solution to,
$$s(f)\,dt = f\,dt + \alpha\left[\mu s'(f) + \tfrac{1}{2}\sigma^2 s''(f)\right]dt + \alpha s'(\underline{f})\,dL - \alpha s'(\bar{f})\,dU.$$
Therefore, the exchange rate is still solution to the differential equation (4.115), and its general solution is given by Eq. (4.116). We can now determine the two constants, $A_1$ and $A_2$. First, note that analogous to the reflecting barrier case in Section 4.2.6.4 (see Eq. (4.59)), to rule out arbitrage, we need to ensure that the following smooth pasting conditions hold at the critical values $\underline{f}$ and $\bar{f}$:
$$s'(\underline{f}) = 0 \quad \text{and} \quad s'(\bar{f}) = 0. \qquad (4.117)$$
Moreover, at the boundaries, we need to ensure that
$$s(\underline{f}) = \underline{s} \quad \text{and} \quad s(\bar{f}) = \bar{s}. \qquad (4.118)$$
Eqs. (4.117)-(4.118) are four independent equations with four unknowns: the two constants of the exchange rate function, $A_1$ and $A_2$, and the two critical values of the fundamentals that trigger intervention, $\underline{f}$ and $\bar{f}$, which are all implicitly determined.
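A small numerical sketch may help to see how the smooth pasting conditions pin down the solution. For simplicity, the Python code below inverts the problem: it takes the two intervention points of the fundamentals as given, solves the two smooth pasting conditions (which are linear in the two constants), and then reads off the implied band edges. The names `alpha`, `mu`, `sigma`, `f_low`, `f_high` and the function names are assumptions made for this illustration, with `alpha` denoting the interest rate semi-elasticity.

```python
import math


def target_zone_constants(alpha, mu, sigma, f_low, f_high):
    """Solve s'(f_low) = s'(f_high) = 0 for the constants of
    s(f) = f + alpha*mu + a1*exp(lam1*f) + a2*exp(lam2*f)."""
    # Roots of 0.5*alpha*sigma^2*lam^2 + alpha*mu*lam - 1 = 0.
    disc = math.sqrt((alpha * mu) ** 2 + 2.0 * alpha * sigma ** 2)
    lam1 = (-alpha * mu + disc) / (alpha * sigma ** 2)
    lam2 = (-alpha * mu - disc) / (alpha * sigma ** 2)
    # Linear system: a1*lam1*e^{lam1 f} + a2*lam2*e^{lam2 f} = -1 at both points.
    m11 = lam1 * math.exp(lam1 * f_low)
    m12 = lam2 * math.exp(lam2 * f_low)
    m21 = lam1 * math.exp(lam1 * f_high)
    m22 = lam2 * math.exp(lam2 * f_high)
    det = m11 * m22 - m12 * m21
    a1 = (m12 - m22) / det  # Cramer's rule
    a2 = (m21 - m11) / det
    return lam1, lam2, a1, a2


def exchange_rate(f, alpha, mu, lam1, lam2, a1, a2):
    """Exchange rate function: linear part plus the two exponential terms."""
    return f + alpha * mu + a1 * math.exp(lam1 * f) + a2 * math.exp(lam2 * f)


def exchange_rate_slope(f, lam1, lam2, a1, a2):
    """First derivative of the exchange rate function."""
    return 1.0 + a1 * lam1 * math.exp(lam1 * f) + a2 * lam2 * math.exp(lam2 * f)
```

In the driftless, symmetric case the exchange rate at the upper intervention point lies below the fundamentals themselves, so the band for the exchange rate is narrower than the band for the fundamentals.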
[In progress: discuss the literature on speculative attacks & policy credibility]
4.8.3 Liquidity constraints and optimal dividend policy
Jeanblanc-Picqué and Shiryaev (1995) consider a model in which a firm maximizes its value through a dividend policy that accounts for periods of inaction. Inaction occurs because of liquidity constraints. While it will be assumed that the firm's capital has positive growth, there might be bad times in which it would not be optimal for the firm to distribute any dividends as this might trigger bankruptcy. This issue would not arise if shareholders were able to inject new funds or the firm could borrow from a bank or issue new securities.
Consider the simple frictionless case, in which the cash generated by the firm is,
$$dX_t = \mu\,dt + \sigma\,dW_t, \qquad (4.119)$$
where $\mu$ and $\sigma$ are parameters, and $W_t$ is a standard Brownian motion. The net present value of the firm that has initial cash equal to $x_0$ is,
$$\mathrm{NPV}(x_0) \equiv \max\left\{0,\; x_0 + \frac{\mu}{r}\right\}, \qquad (4.120)$$
where $r$ is the constant discounting rate.


Next, consider the case in which the firm is liquidity constrained. It will now be optimal to distribute dividends only after cash reaches a threshold, with the firm's capital being solution to,
$$dX_t = \mu\,dt + \sigma\,dW_t - dZ_t, \qquad (4.121)$$
where $Z_t$ is a cumulative dividend process, which only increases once capital $X_t$ has reached a sufficiently high level $\hat{x}$, to be determined below. It is assumed that $x_0 > 0$, that bankruptcy occurs whenever $X_t$ hits zero, at time $T$ say, and that the value of the firm is maximized by risk-neutral managers as:
$$V(x_0) = \sup_{Z} E\left[\int_0^{T} e^{-rt}\,dZ_t\right] \quad \text{s.t. Eq. (4.121)}. \qquad (4.122)$$
The valuation in (4.122) differs markedly from that in Eq. (4.120), as we now show, depending as it does on the dividend policy.

4.8.3.1 Bounded dividend policy

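Assume, initially, that the dividend satisfies,
$$dZ_t = z(X_t)\,dt, \quad \text{with } 0 \le z(x) \le \bar{z}. \qquad (4.123)$$
The limiting case, $\bar{z} = \infty$, is considered below. By Eqs. (4.121)-(4.122), and (4.123), the problem of the firm is then,
$$V(x_0) = \sup_{z} E\left[\int_0^{T} e^{-rt} z(X_t)\,dt\right] \quad \text{s.t.} \quad dX_t = \left(\mu - z(X_t)\right)dt + \sigma\,dW_t, \qquad (4.124)$$
such that the Bellman equation for the problem is,
$$0 = \sup_{0 \le z \le \bar{z}}\left[z(x) + \mathcal{L}V(x) - rV(x)\right] = \mu V'(x) + \tfrac{1}{2}\sigma^2 V''(x) - rV(x) + \sup_{0 \le z \le \bar{z}}\left[z(x)\left(1 - V'(x)\right)\right], \qquad (4.125)$$
where $\mathcal{L}$ denotes the usual infinitesimal generator of the constraint in (4.124).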


The solution is,
$$z(x) = \begin{cases} 0 & : V'(x) > 1 \text{ (inaction)} \\ \bar{z} & : V'(x) \le 1 \text{ (dividend distribution)} \end{cases} \qquad (4.126)$$
The structure of the problem suggests the following interpretation of the solution: the firm should pay out as soon as the level of capital reaches a threshold level, at which the payout takes place at the maximal speed, $\bar{z}$. This interpretation needs to be corroborated by verifying that the marginal value of capital, $V'(x)$, is indeed decreasing in $x$. In this case, the interpretation of the optimal policy in (4.126) is quite neat: the firm pays out whenever the marginal value of capital is low (less than unity).
Substituting Eqs. (4.126) into Eq. (4.125), and conjecturing that $V$ is concave (decreasing returns to capital) over the inaction region, yields that the firm does not distribute any dividends until $x$ reaches a threshold $\hat{x}$ (say),
$$\begin{cases} 0 = \mu V'(x) + \tfrac{1}{2}\sigma^2 V''(x) - rV(x), & x < \hat{x} \text{ (inaction region)} \\ 0 = \mu V'(x) + \tfrac{1}{2}\sigma^2 V''(x) - rV(x) + \bar{z}\left(1 - V'(x)\right), & x \ge \hat{x} \text{ (payout region)} \end{cases} \qquad (4.127)$$

It will be verified that $V$ is indeed concave in the inaction region. It is straightforward to verify that the solutions to the two equations in (4.127), $V_-(x)$ and $V_+(x)$ say, take the following form:
$$V_-(x) = A_1 e^{\gamma_1 x} + A_2 e^{\gamma_2 x} \quad \text{and} \quad V_+(x) = \frac{\bar{z}}{r} + B_1 e^{\gamma_1^+ x} + B_2 e^{\gamma_2^+ x},$$
for some coefficients $A_i$ and $B_i$ to be determined, and $\gamma_1 < 0$, $\gamma_2 > 0$, $\gamma_1^+ < 0$ and $\gamma_2^+ > 0$. The bankruptcy condition stipulates that $V_-(0) = 0$, such that $A_1 = -A_2$, and then,
$$V_-(x) = A_1\left(e^{\gamma_1 x} - e^{\gamma_2 x}\right) \quad \text{for } x < \hat{x}. \qquad (4.128)$$
As for $V_+(x)$, note that because $z(x)$ is bounded by $\bar{z}$, then by Eq. (4.124), $V_+$ must be positive and bounded by $\bar{z}/r$, which requires $B_2 = 0$. Therefore,
$$V_+(x) = \frac{\bar{z}}{r} + B_1 e^{\gamma_1^+ x} \quad \text{for } x \ge \hat{x}. \qquad (4.129)$$
To summarize, the value of the firm is given by Eq. (4.128) and Eq. (4.129), with three constants to be determined: the two coefficients, $A_1$, $B_1$, and the free-boundary, $\hat{x}$. The constants are determined by imposing (i) $V_-(\hat{x}) = V_+(\hat{x})$ (value matching), (ii) $V_-'(\hat{x}) = V_+'(\hat{x})$ (smooth pasting), (iii) $V_-''(\hat{x}) = V_+''(\hat{x})$ (super-contact). The super-contact condition makes the second order derivative continuous as required for an application of Itô's lemma to a function of a variable that takes values over the real line, $V(x)$. Finally, it can be verified that the thusly determined coefficients imply that $V$ is positive, increasing and concave for all $x$, thereby validating the conjectures underlying the two equations in (4.127). The difference between $V(x)$ in Eqs. (4.128)-(4.129) and $\mathrm{NPV}(x)$ in Eq. (4.120) is a cost accounting for the financial frictions faced by the firm.
4.8.3.2 Reflected Brownian motions

Next, we consider the limiting case in which $\bar{z} = \infty$. In a subsection below, we argue that the firm's capital would behave as a reflected Brownian motion in this case: when capital is below the threshold $\hat{x}$, it is an arithmetic Brownian motion as in Eq. (4.119), but as soon as capital reaches $\hat{x}$, it is then reflected back to values lower than it, with the capital in excess distributed as a dividend. It is also assumed that if the firm had to begin with cash larger than the threshold, the part exceeding $\hat{x}$ would be paid out immediately. Heuristically, Eqs. (4.127) would hold in this case when,
$$0 = \mu V'(x) + \tfrac{1}{2}\sigma^2 V''(x) - rV(x) \ \text{ for } x < \hat{x}, \quad \text{and} \quad V'(\hat{x}) = 1, \qquad (4.130)$$
with the bankruptcy condition, $V(0) = 0$, and $V(x) = V(\hat{x}) + (x - \hat{x})$ for $x \ge \hat{x}$, with the interpretation that were the firm to start with capital $x$ higher than the threshold, $\hat{x}$, the difference $x - \hat{x}$ would be immediately paid out as dividend. The super-contact condition is therefore, $V''(\hat{x}) = 0$.
The solution to Eq. (4.130) that satisfies the bankruptcy condition, is,
$$V(x) = \max\left\{0,\; \frac{e^{\gamma_2 x} - e^{\gamma_1 x}}{\gamma_2 e^{\gamma_2\hat{x}} - \gamma_1 e^{\gamma_1\hat{x}}}\right\}, \qquad (4.131)$$
where $\gamma_1 < 0$ and $\gamma_2 > 0$, just as in Eq. (4.128). The threshold, $\hat{x}$, is determined through the super-contact condition. It is, $\hat{x} = \frac{\ln(\gamma_1^2) - \ln(\gamma_2^2)}{\gamma_2 - \gamma_1}$.
Similarly as in the previous subsection, the difference between $V(x)$ in Eq. (4.131) and $\mathrm{NPV}(x)$ in Eq. (4.120) is a cost accounting for the financial frictions faced by the firm. We now provide a few technical but heuristic details regarding the connection of this problem with reflected Brownian motions.
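The closed-form threshold lends itself to a quick numerical check. The Python sketch below assumes that the two exponents are the roots of the characteristic equation associated with the inaction-region equation, 0.5 sigma^2 g^2 + mu g - r = 0, and that the value function is normalized so that its slope equals one at the threshold. Function and argument names are illustrative and not taken from the original text.

```python
import math


def dividend_barrier(mu, sigma, r):
    """Return (g1, g2, x_hat): the negative and positive roots of
    0.5*sigma^2*g^2 + mu*g - r = 0 and the super-contact threshold."""
    disc = math.sqrt(mu ** 2 + 2.0 * r * sigma ** 2)
    g1 = (-mu - disc) / sigma ** 2  # negative root
    g2 = (-mu + disc) / sigma ** 2  # positive root
    x_hat = (math.log(g1 ** 2) - math.log(g2 ** 2)) / (g2 - g1)
    return g1, g2, x_hat


def firm_value(x, mu, sigma, r):
    """Value of the liquidity constrained firm with capital x."""
    g1, g2, x_hat = dividend_barrier(mu, sigma, r)
    scale = g2 * math.exp(g2 * x_hat) - g1 * math.exp(g1 * x_hat)

    def inside(c):
        # Value in the inaction region, zero at c = 0 (bankruptcy).
        return (math.exp(g2 * c) - math.exp(g1 * c)) / scale

    if x <= x_hat:
        return max(0.0, inside(x))
    # Above the threshold, the excess capital is paid out at once.
    return inside(x_hat) + (x - x_hat)
```

Under these assumptions the value at the threshold equals mu/r, since the slope is one and the curvature is zero there, so the inaction-region equation collapses to mu - r V = 0.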
4.8.3.3 Heuristic technical details on reflected Brownian motions

While details are provided below to give the reader additional insights regarding the nature of reflected Brownian motions, a standard and rigorous source of details is in Karatzas and Shreve (1991, p. 210-212). Define the occupancy time of any given random process $X$ in a set $A$ up to time $t$ as,
$$\Gamma_t(A) = \int_0^t \mathbb{I}_{X_s \in A}\,ds,$$
where $\mathbb{I}$ denotes the indicator function. It represents how much time $X$ spends in set $A$ up to time $t$ for a given realization of $X$ on $[0, t]$. Below, it will be argued that there exists a function $L_t(x)$, called local time of $X$ at a given point $x$, such that,
$$\Gamma_t(A) = \int_A L_t(x)\,dx, \qquad L_t(x) = \lim_{\epsilon \to 0}\frac{1}{2\epsilon}\,\Gamma_t\left((x - \epsilon, x + \epsilon)\right). \qquad (4.132)$$

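We provide an informal derivation of (4.132) below. Note the following property. We have, for any fixed $x$, and using the definitions of $\Gamma$ and $L$, and that of Dirac's function $\delta$,
$$\begin{aligned} L_t(x) &= \int L_t(y)\,\delta(y - x)\,dy \\ &= \int \delta(y - x)\lim_{\epsilon \to 0}\frac{1}{2\epsilon}\int_0^t \mathbb{I}_{X_s \in (y - \epsilon,\, y + \epsilon)}\,ds\,dy \\ &= \int_0^t\int \delta(y - x)\,\delta(X_s - y)\,dy\,ds \\ &= \int_0^t \delta(X_s - x)\,ds. \qquad (4.133) \end{aligned}$$
Note, also, that,
$$\Gamma_t(A) = \int_0^t \mathbb{I}_{X_s \in A}\,ds = \int_0^t\int \delta(X_s - x)\,\mathbb{I}_{x \in A}\,dx\,ds = \int_A L_t(x)\,dx,$$
where the last equality follows by Eq. (4.133). This is Eq. (4.132).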

Next, consider the solution for $z$ in Eq. (4.126), which in the limiting case $\bar{z} = \infty$, can be conceived as $z(x) = \delta(x - \hat{x})$, where $\delta(\cdot)$ is Dirac's delta. To simplify the presentation, set $\mu = 0$ and $\sigma = 1$, such that the firm's capital in Eq. (4.121) now becomes,
$$X_t = x_0 - \int_0^t \delta(X_s - \hat{x})\,ds + W_t = x_0 - L_t(\hat{x}) + W_t, \qquad (4.134)$$
where the second equality follows by the property of local time in (4.133).
We argue that Eq. (4.134) describes a Brownian motion reflected at $\hat{x}$: each time $X_t$ hits $\hat{x}$ from below, it is reflected back. Consider the function $|\tilde{W}_t - \hat{x}| \equiv (\tilde{W}_t - \hat{x})^+ + (\tilde{W}_t - \hat{x})^-$, where $\tilde{W}_t$ is a Brownian motion, and apply heuristically Itô's lemma,
$$|\tilde{W}_t - \hat{x}| = |\tilde{W}_0 - \hat{x}| + \int_0^t \mathrm{sign}(\tilde{W}_s - \hat{x})\,d\tilde{W}_s + \int_0^t \delta(\tilde{W}_s - \hat{x})\,ds. \qquad (4.135)$$
Eq. (4.135) is known as Tanaka's formula for Brownian local time (see, also, the Appendix). It is possible to show that the second term on the R.H.S. of this equation is a Brownian motion starting from zero (Karatzas and Shreve, 1991, p. 209), $W_t$ say. Moreover, the third term is local time, by Eq. (4.133). Therefore, Eq. (4.135) is,
$$|\tilde{W}_t - \hat{x}| = |\tilde{W}_0 - \hat{x}| + W_t + L_t(\hat{x}),$$
such that the two processes in (4.134) and (4.135) have the same distribution.

4.9 Portfolio constraints


This section analyzes portfolio choices in contexts of market imperfections such as market incompleteness or short sale constraints, or others. We keep on hinging upon the setup in Section 4.4, and we fix $m = d$, although we then allow for frictions, by restricting the vector of normalized portfolio shares in the risky assets, $\pi$, to lie in a closed convex set $K \subseteq \mathbb{R}^d$. We follow the approach put forward by Cvitanic and Karatzas (1992), which consists in embedding the constrained portfolio choice of the investor into a set of unconstrained portfolio optimization problems. Under regularity conditions, in this set of unconstrained problems, there exists one, the solution to which is that of the original constrained portfolio problem. Therefore, the constrained portfolio problem is solved, once we solve for the unconstrained, which we can do through the martingale methods in Section 4.4. This approach is closely related to the minimax probability methods of He and Pearson (1991), briefly illustrated within the discrete-time framework with incomplete markets of Chapter 2. It is a systematic approach to implement solutions to consumption and portfolio policies in a context of constrained portfolio selection, and generalizes results from He and Pearson (1991), which only relate to contexts with incomplete markets.
4.9.1 Technical background
The starting point is the definition of the support function,
$$\delta(\nu) = \sup_{\pi \in K}\left(-\pi^\top\nu\right), \qquad (4.136)$$
and its effective domain,
$$\tilde{K} = \{\nu \in \mathbb{R}^d : \delta(\nu) < \infty\}. \qquad (4.137)$$
The set $\tilde{K}$ in Eq. (4.137) is also known as the barrier cone of $-K$, where $K$ is the set of admissible portfolio choices.
Examples of the support function in Eq. (4.136) are the unconstrained case: $K = \mathbb{R}^d$, in which case $\tilde{K} = \{0\}$ and $\delta = 0$ on $\tilde{K}$; prohibition of short-selling: $K = [0, \infty)^d$, in which case $\tilde{K} = K$ and $\delta = 0$ on $\tilde{K}$; or incomplete markets: $K = \{\pi \in \mathbb{R}^d : \pi_{M+1} = \cdots = \pi_d = 0\}$ (i.e. only the first $M$ assets can be traded), in which case $\tilde{K} = \{\nu \in \mathbb{R}^d : \nu_1 = \cdots = \nu_M = 0\}$ and $\delta = 0$ on $\tilde{K}$, which is the case covered by He and Pearson (1991).


We illustrate these cases in the $\mathbb{R}^2$ case, as in Figures 4.1 and 4.2 below. Given a cone $K$, interpreted as before as the set of admissible portfolio choices, consider the polar cone of $-K$, defined as $\tilde{K} = \{\nu \in \mathbb{R}^d : \pi^\top\nu \ge 0\ \ \forall \pi \in K\}$, illustrated in Figure 4.1 in the case $d = 2$. We emphasize $\tilde{K}$ is the polar cone of $-K$, not the barrier cone of $-K$ defined in Eq. (4.137).

FIGURE 4.1. [Plot in the $(p_1, p_2)$ plane showing the cone $\tilde{K}$.]

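Consider, then, again, prohibition of short-selling, in which case $K = [0, \infty)^2$. It is easy to see that in this case, $\tilde{K} = K$ and that
$$\delta(\nu) = \sup_{\pi \in [0, \infty)^2}\left(-\pi^\top\nu\right) = \begin{cases} 0 & \text{if } \nu \in [0, \infty)^2 \\ \infty & \text{otherwise} \end{cases}$$
All in all, $\delta(\nu)$ is finite only on $K$, such that the effective domain is $\tilde{K} = K$.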


As for the incomplete markets case, assume the second asset is missing for sake of illustration. We have that $K = \{\pi \in \mathbb{R}^2 : \pi_2 = 0\}$, such that,
$$\begin{aligned} \tilde{K} &= \{\nu \in \mathbb{R}^2 : \pi_1\nu_1 + \pi_2\nu_2 \ge 0,\ \pi_1 \in \mathbb{R} \text{ and } \pi_2 = 0\} \\ &= \{\nu \in \mathbb{R}^2 : \pi_1\nu_1 \ge 0,\ \pi_1 \in \mathbb{R}\} \\ &= \{\nu \in \mathbb{R}^2 : \nu_1 = 0\}. \end{aligned}$$
Panel C in Figure 4.2 illustrates this incomplete markets case. Panels A and B are examples of less severe forms of constraints, where markets are still complete. Panels A, B and C impose progressively tougher constraints, to the extent of the incomplete markets case in Panel C.

FIGURE 4.2: Portfolio constraints and incomplete markets. [Panels A, B and C, each drawn in the $(p_1, p_2)$ plane, show the cone $\tilde{K}$; Panel A also shows the set $K$.]


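As for the function $\delta(\nu)$ in this context, we have that:
$$\delta(\nu) = \sup_{\pi_1 \in \mathbb{R}}\left(-\pi_1\nu_1\right) = \begin{cases} 0 & \text{if } \nu \in \tilde{K} \text{ (i.e. } \nu_1 = 0) \\ \infty & \text{otherwise} \end{cases}$$
Again, $\delta(\nu)$ is finite only on $\tilde{K}$, such that the effective domain is $\tilde{K} = \{\nu \in \mathbb{R}^2 : \nu_1 = 0\}$.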


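4.9.2 Artificial markets
The role of the support function $\delta$ in Eq. (4.136) is to tilt the dynamics of the price system in Section 4.4, as follows:
$$\frac{dS_0}{S_0} = r_\nu\,dt, \qquad \frac{dS_i}{S_i} = a_{\nu,i}\,dt + \sigma_i\,dW, \quad i = 1, \cdots, m, \qquad (4.138)$$
where:
$$r_\nu \equiv r + \delta(\nu), \qquad a_\nu \equiv a + \nu + \mathbf{1}_m\,\delta(\nu), \qquad (4.139)$$
and $\sigma$ is as in Section 4.4.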


The main result is as follows. Denote with $\mathrm{Val}(x; K)$ the value of the problem faced by an investor facing a portfolio constraint $K \subseteq \mathbb{R}^d$, when his initial wealth is $x$. Let $\mathrm{Val}_\nu(x)$ be the corresponding value of the problem faced by an unconstrained investor in the market (4.138). Clearly, this value is just $\mathrm{Val}_0(x)$ for the markets in Sections 4.4 and 4.5. Moreover, for each $\nu \in \mathbb{R}^d$, the unconstrained program the investor faces in the market (4.138), can be solved through martingale methods, using the unique risk-neutral probability $Q_\nu$, equivalent to $P$, with Radon-Nikodym derivative equal to,

1 2
1 >
1
= 0 exp

(4.140)

2 0
0
F
where 0 is as in Eq. (4.73). As mentioned, He and Pearson (1991) is a special instance of this
setting, which obtains once we assume incomplete markets, in which case the support function is $\delta = 0$, as explained.
Under regularity conditions, we have that:
$$\mathrm{Val}(x; K) = \inf_{\nu \in \tilde{K}}\left(\mathrm{Val}_\nu(x)\right), \qquad (4.141)$$
and optimal consumption and portfolio choices for this unconstrained problem are exactly those chosen by the investor constrained to have $\pi \in K$. Appendix 4 provides a heuristic derivation of Eq. (4.141).
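In the context of log-utility functions, we have that,
$$\hat{\nu} = \arg\min_{\nu \in \tilde{K}}\left[2\delta(\nu) + \left\|\lambda + \sigma^{-1}\nu\right\|^2\right], \qquad (4.142)$$
where $\lambda = \sigma^{-1}(a - \mathbf{1}_m r)$. Part II of these lectures will work out applications of this result.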
[Eq. (4.142) follows by applying martingale methods in a market where the Brownian motion is $W^\nu$. It is actually very simple; see Cvitanic and Karatzas (1992, p. 790 and 797), together with the portfolio $\hat{\pi} = (\sigma^\top)^{-1}(\lambda + \sigma^{-1}\hat{\nu})$.
For example, in the incomplete markets case, we have that = 0, and the solution is then
= 0 as originally explained by He and Pearson (1991). This is the case with only one asset; the multi-asset case is more complex and dealt with by Cvitanic and Karatzas (1992, p. 797).]
[In progress]

4.10 Jumps
Brownian motions are well suited to model the price behavior of liquid assets or assets issued by
names or Governments not subject to default risk. There is, however, a fair amount of interest in
modeling discontinuous changes in asset prices. Fixed income instruments may undergo liquidity
dry-ups, or even default, causing price discontinuities that we wish to model. This section is
an introduction to Poisson models, a class of processes that is particularly useful in addressing
these issues.
4.10.1 Poisson jumps
Let $(t, T)$ be a given interval, and consider events in that interval which display the following properties:
(i) The random numbers of event arrivals on any disjoint time intervals of $(t, T)$ are independent.
(ii) Given two arbitrary disjoint but equal time intervals in $(t, T)$, the probability of a given random number of event arrivals is the same in each interval.
(iii) The probability that at least two events occur simultaneously in any time interval is zero.
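Next, let $P_k(\tau - t)$ be the probability that $k$ events arrive during the time interval $\tau - t$. We make use of the previous three properties to determine the functional form of $P_k(\tau - t)$. First, $P_0(\tau - t)$ must satisfy:
$$P_0(\tau + \Delta\tau - t) = P_0(\tau - t)\,P_0(\Delta\tau), \qquad (4.143)$$
and we impose
$$P_0(0) = 1, \qquad P_k(0) = 0 \ \text{ for } k \ge 1. \qquad (4.144)$$
Eq. (4.143) and the first condition in (4.144) are satisfied by $P_0(x) = e^{-vx}$, for some constant $v$, which we take to be positive, so as to ensure that $P_0 \in [0, 1]$. Furthermore, we have that:
$$\begin{aligned} P_1(\tau + \Delta\tau - t) &= P_0(\tau - t)P_1(\Delta\tau) + P_1(\tau - t)P_0(\Delta\tau) \\ &\ \ \vdots \\ P_k(\tau + \Delta\tau - t) &= P_{k-1}(\tau - t)P_1(\Delta\tau) + P_k(\tau - t)P_0(\Delta\tau) \\ &\ \ \vdots \end{aligned} \qquad (4.145)$$
The first equation in (4.145) can be rearranged as follows:
$$\frac{P_1(\tau + \Delta\tau - t) - P_1(\tau - t)}{\Delta\tau} = -\frac{1 - P_0(\Delta\tau)}{\Delta\tau}P_1(\tau - t) + \frac{P_1(\Delta\tau)}{\Delta\tau}P_0(\tau - t).$$
For small $\Delta\tau$, $P_1(\Delta\tau) \simeq 1 - P_0(\Delta\tau)$ and $P_0(\Delta\tau) \simeq 1 - v\Delta\tau$. Therefore,
$$P_1'(\tau - t) = -vP_1(\tau - t) + vP_0(\tau - t).$$
By a similar reasoning,
$$P_k'(\tau - t) = -vP_k(\tau - t) + vP_{k-1}(\tau - t).$$
The solution to this equation is:
$$P_k(\tau - t) = \frac{v^k(\tau - t)^k}{k!}\,e^{-v(\tau - t)}.$$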
4.10.2 Interpretation
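A Poisson model is one of rare events. Moreover, for small $\Delta\tau$:
$$\Pr(\text{event arrival in } \Delta\tau) = 1 - P_0(\Delta\tau) \simeq v\Delta\tau.$$
For this reason, we usually refer to the parameter $v$ as the intensity of event arrivals.
To provide additional intuition about the mathematics of rare events, consider the expression for the probability of $k$ arrivals in $n$ trials, predicted by a binomial distribution:
$$P_{n,k} = \frac{n!}{k!(n - k)!}\,p^k q^{n - k}, \qquad p + q = 1,$$
where $p$ is the probability of arrival for each trial. We want to model the probability $p$ as a function of $n$, with the feature that $\lim_{n \to \infty} p(n) = 0$, so as to make each arrival rare. One possible choice is $p(n) = \frac{a}{n}$, for some constant $a > 0$. Under this assumption, we have:
$$\begin{aligned} P_{n,k} &= \frac{n!}{k!(n - k)!}\,p(n)^k\left(1 - p(n)\right)^{n - k} \\ &= \frac{n!}{k!(n - k)!}\left(\frac{a}{n}\right)^k\left(1 - \frac{a}{n}\right)^n\left(1 - \frac{a}{n}\right)^{-k} \\ &= \underbrace{\frac{n}{n}\cdot\frac{n - 1}{n}\cdots\frac{n - k + 1}{n}}_{k \text{ times}}\ \frac{a^k}{k!}\left(1 - \frac{a}{n}\right)^n\left(1 - \frac{a}{n}\right)^{-k}, \end{aligned}$$
leaving,
$$\lim_{n \to \infty} P_{n,k} = \frac{a^k}{k!}\,e^{-a}.$$
Next, we split the interval $(\tau - t)$ into $n$ subintervals of length $\frac{\tau - t}{n}$, and then make the probability of one arrival in each sub-interval proportional to each sub-interval length, as illustrated in Figure 4.3,
$$p(n) = v\,\frac{\tau - t}{n} \equiv \frac{a}{n}, \qquad a \equiv v(\tau - t).$$
The Poisson model in the previous section is thus the same as the one we consider here, with $n \to \infty$, which is continuous-time, as each sub-interval in Figure 4.3 shrinks to $d\tau$. The probability there is one arrival in $d\tau$ is $v\,d\tau$, which is also the expected number of events in $d\tau$ as shown below:
$$\begin{aligned} E(\#\text{ arrivals in } d\tau) &= \Pr(\text{one arrival in } d\tau) \times \text{one arrival} + \Pr(\text{zero arrivals in } d\tau) \times \text{zero arrivals} \\ &= \Pr(\text{one arrival in } d\tau) \times 1 + \Pr(\text{zero arrivals in } d\tau) \times 0 \\ &= v\,d\tau. \end{aligned}$$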

The heuristic construction in this section opens the way to how we can simulate Poisson processes. We can just simulate a Uniform random variable $U(0, 1)$, with the continuous-time process being approximated by $Y$, where:
$$Y = \begin{cases} 0 & \text{if } 0 \le U < 1 - vh \\ 1 & \text{if } 1 - vh \le U < 1 \end{cases}$$
where $h$ is a discretization interval.
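The simulation recipe can be sketched in a few lines of Python. The code below draws one uniform variate per discretization interval and records a jump whenever the draw falls in the upper tail of width intensity times step. The names `intensity`, `horizon` and `step` are chosen for this illustration only, and the scheme is the Bernoulli approximation described above, not an exact Poisson sampler.

```python
import random


def simulate_poisson_path(intensity, horizon, step, rng):
    """Approximate a Poisson counting path on [0, horizon].

    One Uniform(0, 1) draw per interval of length `step`; a jump is
    recorded when the draw is at least 1 - intensity * step.
    """
    if intensity * step >= 1.0:
        raise ValueError("step too large: intensity * step must be below 1")
    n_steps = int(round(horizon / step))
    count = 0
    path = [0]
    for _ in range(n_steps):
        u = rng.random()
        jump = 1 if u >= 1.0 - intensity * step else 0
        count += jump
        path.append(count)
    return path
```

Averaging the terminal count over many simulated paths should come close to intensity times horizon, in line with the mean of the Poisson distribution.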
FIGURE 4.3. Heuristic construction of a Poisson process from a binomial distribution. [The interval $(\tau - t)$ is split into $n$ subintervals of length $(\tau - t)/n$.]

4.10.3 Properties and related distributions


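We check that $P_k$ is a probability. We have,
$$\sum_{k=0}^\infty P_k = e^{-v(\tau - t)}\sum_{k=0}^\infty \frac{v^k(\tau - t)^k}{k!} = 1,$$
where the last equality follows by a Maclaurin expansion of the exponential function. The mean is,
$$\text{Mean} = \sum_{k=0}^\infty kP_k = e^{-v(\tau - t)}\sum_{k=0}^\infty k\,\frac{v^k(\tau - t)^k}{k!} = v(\tau - t). \qquad (4.146)$$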

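A related distribution is the exponential (or Erlang) distribution. Remember, the probability of zero arrivals in $\tau - t$ predicted by the Poisson model is $P_0(\tau - t) = e^{-v(\tau - t)}$, from which it follows that:
$$G(\tau - t) \equiv 1 - P_0(\tau - t) = 1 - e^{-v(\tau - t)}$$
is the probability of at least one arrival in $\tau - t$. The function $G$ can be also interpreted as the probability the first arrival occurred before $\tau$, starting from $t$. The density function $g$ of $G$ is:
$$g(\tau - t) = \frac{\partial}{\partial\tau}G(\tau - t) = v e^{-v(\tau - t)}.$$
The first two moments of the exponential distribution are:
$$\text{Mean} = \int_0^\infty x\,v e^{-vx}\,dx = \frac{1}{v}, \qquad \text{Variance} = \int_0^\infty \left(x - \frac{1}{v}\right)^2 v e^{-vx}\,dx = \frac{1}{v^2}.$$
The expected time of the first arrival occurred before $\tau$ starting from $t$ equals $v^{-1}$. More generally, $v^{-1}$ can be interpreted as the average time from an arrival to another.13
A more general distribution than the exponential is the Gamma distribution with density:
$$g_\gamma(\tau - t) = v e^{-v(\tau - t)}\,\frac{\left(v(\tau - t)\right)^{\gamma - 1}}{(\gamma - 1)!}.$$
The exponential distribution obtains when $\gamma = 1$.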

13 Suppose arrivals are generated by Poisson processes, and consider the random variable "time interval elapsing from one arrival to the next one". Let $\tau_0$ be the instant at which the last arrival occurred. Then, the probability that the time $\tau - \tau_0$ which will elapse from the last arrival to the next is less than $\Delta$ is the same as the probability that during the time interval from $\tau_0$ to $\tau_0 + \Delta$, there is at least one arrival.
4.10.4 Asset pricing implications

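This section is a short introduction to modeling asset prices as being driven by Brownian motions and jump processes. We model jumps by interpreting the arrivals in the previous sections as those events upon which a certain random variable experiences a jump of size $S$, where $S$ is another random variable with a fixed probability $p$. A simple model is:
$$dq(\tau) = b(q(\tau))\,d\tau + \sigma(q(\tau))\,dW(\tau) + \ell(q(\tau))\,S\,dZ(\tau), \qquad (4.147)$$
where $b$, $\sigma$, $\ell$ are given functions (with $\sigma > 0$), $W$ is a standard Brownian motion, and $Z$ is a Poisson process with intensity equal to $v$, i.e.
(i) $Z(t) = 0$ with probability one.
(ii) $\forall\, t \le \tau_0 < \tau_1 < \cdots < \tau_N < \infty$, $Z(\tau_0)$ and $Z(\tau_k) - Z(\tau_{k-1})$ are independent for each $k = 1, \cdots, N$.
(iii) $\forall\, \tau > t$, $Z(\tau) - Z(t)$ is a random variable with Poisson distribution and expected value $v(\tau - t)$, i.e.:
$$\Pr\left(Z(\tau) - Z(t) = k\right) = \frac{v^k(\tau - t)^k}{k!}\,e^{-v(\tau - t)}.$$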
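Using the property in Eq. (4.146), we have that $E(Z(\tau) - Z(t)) = v(\tau - t)$. In this framework, $Z(\tau) - Z(t)$ is the number of jumps over the time interval $\tau - t$.14 From this, we have that $\Pr(Z(\tau) - Z(t) = 1) = v(\tau - t)\,e^{-v(\tau - t)}$ and for $\tau - t$ small,
$$\Pr(dZ(\tau) = 1) \equiv \Pr\left(Z(\tau) - Z(t)\big|_{\tau \to t} = 1\right) = v\,d\tau\,e^{-v\,d\tau} \simeq v\,d\tau.$$
More generally, the process $\left(Z(\tau) - v(\tau - t)\right)_{\tau \ge t}$ is a martingale.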
Armed with these preliminary facts, we can provide a heuristic derivation of Itô's lemma for jump-diffusion processes. Consider any function $f$ with enough regularity conditions, a rational function of time and $q$ in Eq. (4.147), i.e. $f(\tau) \equiv f(q(\tau), \tau)$. Consider the following expansion of $f$:
$$df(\tau) = \left(\frac{\partial}{\partial\tau} + \mathcal{L}\right)f(q(\tau), \tau)\,d\tau + f_q(q(\tau), \tau)\,\sigma(q(\tau))\,dW(\tau) + \left[f\left(q(\tau) + \ell(q(\tau))S, \tau\right) - f(q(\tau), \tau)\right]dZ(\tau).$$
The first two terms in $df$ are the usual Itô's lemma terms, with $\mathcal{L}$ denoting the usual infinitesimal generator for diffusions. The third term accounts for jumps. If there are no jumps from time $\tau^-$ to time $\tau$ (where $d\tau = \tau - \tau^-$), then $dZ(\tau) = 0$. If there is a jump then $dZ(\tau) = 1$, and in this case $f$, as a rational function, needs also to jump instantaneously to $f(q(\tau) + \ell(q(\tau))S, \tau)$. The jump will be exactly $f(q(\tau) + \ell(q(\tau))S, \tau) - f(q(\tau), \tau)$, where $S$ is another random variable with a fixed probability measure. Clearly, if $f(q, \tau) = q$, we are back to the initial jump-diffusion model in Eq. (4.147).

14 For simplicity, we take $v$ to be constant. If $v$ is a deterministic function of time, we have that
$$\Pr\left(Z(\tau) - Z(t) = k\right) = \frac{\left(\int_t^\tau v(u)\,du\right)^k}{k!}\exp\left(-\int_t^\tau v(u)\,du\right), \quad k = 0, 1, \cdots$$
and there is also the possibility to model $v$ as a function of the state: $v = v(q)$, for example. Cox processes.
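To derive the infinitesimal generator for jump-diffusions, $\mathcal{L}^J$ say, note that:
$$\begin{aligned} E(df) &= \left(\frac{\partial}{\partial\tau} + \mathcal{L}\right)f\,d\tau + E\left\{\left[f(q + \ell S, \tau) - f(q, \tau)\right]dZ\right\} \\ &= \left(\frac{\partial}{\partial\tau} + \mathcal{L}\right)f\,d\tau + E\left\{f(q + \ell S, \tau) - f(q, \tau)\right\}v\,d\tau, \end{aligned}$$
or
$$\mathcal{L}^J f = \mathcal{L}f + v\int_{\mathrm{supp}(S)}\left[f(q + \ell S, \tau) - f(q, \tau)\right]p(dS),$$
where $\mathrm{supp}(S)$ denotes the support of $S$. Therefore, the infinitesimal generator for jump-diffusions is, simply, $\left(\frac{\partial}{\partial\tau} + \mathcal{L}^J\right)f$.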
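To make the jump-diffusion dynamics concrete, the following Python sketch simulates one path with a simple Euler scheme: over each small step the Brownian increment is a normal draw with variance equal to the step, and a jump occurs with probability intensity times step. The drift, diffusion and jump-loading functions are passed in as callables, and all names (`drift`, `diffusion`, `jump_loading`, `draw_jump_size`) are hypothetical choices for this illustration rather than notation from the text.

```python
import math
import random


def simulate_jump_diffusion(q0, drift, diffusion, jump_loading,
                            draw_jump_size, intensity, horizon, n_steps, rng):
    """Euler scheme for dq = drift(q) dt + diffusion(q) dW
    + jump_loading(q) * S * dZ, with dZ a Bernoulli approximation of
    the Poisson increment over each step."""
    h = horizon / n_steps
    if intensity * h >= 1.0:
        raise ValueError("n_steps too small: intensity * h must be below 1")
    q = q0
    path = [q]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))           # Brownian increment
        dz = 1 if rng.random() < intensity * h else 0  # jump indicator
        jump = jump_loading(q) * draw_jump_size(rng) if dz else 0.0
        q = q + drift(q) * h + diffusion(q) * dw + jump
        path.append(q)
    return path
```

Setting the intensity to zero recovers a plain diffusion, while switching off drift and diffusion and using unit jumps turns the path into a counting process.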
4.10.5 An option pricing formula
Merton (1976, JFE), Bates (1988, working paper), Naik and Lee (1990, RFS) are the seminal
papers.
Obviously we cannot hedge. Make the argument. Use Itô's lemma and show that the strategy cannot hedge the jump component.

4.11 Continuous time Markov chains


Needed to model credit risk.


4.12 Appendix 1: An introduction to stochastic calculus for finance


4.12.1 Stochastic integrals
4.12.1.1 Motivation

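Given is a Brownian motion $W(\tau)$, $\tau \ge 0$, and the associated natural filtration $\mathcal{F}(\tau)$. This appendix aims to explore the sense to be given to the integral,
$$I_t(f)(\omega) = \int_0^t f(\tau)\,dW(\tau)(\omega), \qquad (4A.1)$$
where $f$ is a given function or, more generally,
$$I_t(f)(\omega) = \int_0^t f(\tau; \omega)\,dW(\tau)(\omega),$$
where $f$ is now a progressively $\mathcal{F}(\tau)$-measurable function.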


The motivation is apparent from how many of the models in these Lectures build upon Brownian motions. Let us illustrate. Given a probability space $(\Omega, \mathcal{F}, P)$ on which $W$ is Brownian motion, and some $\omega \in \Omega$, let us write $dW(\tau)$ for the increment of $W$ over an infinitesimal amount of time. In a heuristic sense, $dW(\tau)$ equals $W(\tau + d\tau) - W(\tau)$ as $d\tau \to 0$. We may think of the increment $dW(\tau)$ as being normally distributed: $dW(\tau) \sim N(0, d\tau)$, and use this as a means to build up a richer process $X$, say, solution to:
$$dX(\tau) = \mu(\tau)\,d\tau + \sigma(\tau)\,dW(\tau), \qquad (4A.2)$$
for some $\mu(\cdot)$ and $\sigma(\cdot)$ to be defined later.
These processes are known as Itô's processes, as further explained below. The intuition on $\mu(\cdot)$ and $\sigma(\cdot)$ is as follows. Heuristically, we have that $E[dX(\tau)] = E[\mu(\tau)]\,d\tau + E(\sigma(\tau)\,dW(\tau)) = E[\mu(\tau)]\,d\tau$, such that $\mu(\cdot)$ is related to the instantaneous expected changes of $X$. So this model is richer than Brownian motions because $\mu$ can be different from identically zero.
Think of $X$ as an asset price process. No risk-averse individuals are likely to invest in the asset should its expected capital gain be zero, $\mu(\tau) = 0$. Eq. (4A.2) also allows us to think about the asset return variance to be expected over the next infinitesimal amount of time, being equal to $\mathrm{var}(dX(\tau)) = E[\sigma(\tau)\,dW(\tau)]^2 = \sigma(\tau)^2 E(dW(\tau))^2$, which turns out to equal $\sigma(\tau)^2 d\tau$.
The process $\mu$ is called the drift and the process $\sigma$ is called the diffusion coefficient, or the volatility of $X$. Clearly, the drift determines the trend, and the volatility determines the noisiness of $X$ around that trend. Both drift and diffusion coefficients need to be adapted processes, as we shall explain. One example of drift and diffusion coefficients: assume that $\mu(\tau) \ne 0$ and $\sigma \equiv 0$. In this case, we have that: $dX(\tau) = \mu(\tau)\,d\tau$, which shows that $X$ still fluctuates randomly. Here, $\mu$ is a stochastic process and so is $X$. Its infinitesimal variations can be predicted. But its further development cannot. We would say that $X$ is locally riskless in this example.
Let us proceed with a more delicate example, relating to strategies and trading gains. Suppose that a stock price is just a Brownian motion. Assume it does not distribute dividends over some time-horizon of interest, and that we hold $\theta(\tau)$ units of it at time $\tau$. What are our trading gains from 0 to $t$? Later, we shall argue that the expression, $\int_0^t \theta(\tau)\,dW(\tau)$, is the answer to this question: intuitively, this expression is the sum of the instantaneous capital gains on the asset, multiplied by the units of the asset that are held. This expression is what is known as a stochastic integral.
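The trading gains interpretation has a natural discrete-time counterpart, which the following Python sketch illustrates: holdings are fixed at the start of each small interval and multiplied by the subsequent price change. The function names and the use of a simulated Brownian path as the stock price are assumptions made for this illustration.

```python
import math
import random


def brownian_path(horizon, n_steps, rng):
    """Simulate a Brownian path on an even grid, starting from zero."""
    h = horizon / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(h)))
    return path


def trading_gains(holdings, price_path):
    """Sum of holdings[i] * (price[i+1] - price[i]).

    holdings[i] is the position chosen at the start of interval i, so it
    cannot depend on the price change realized over that interval.
    """
    return sum(theta * (p1 - p0)
               for theta, p0, p1 in zip(holdings, price_path[:-1], price_path[1:]))
```

With one unit held throughout, the gains are simply the terminal price minus the initial price. With holdings equal to the current price itself, the discrete sum equals one half of the squared terminal price minus one half of the sum of squared increments, an algebraic identity that hints at the correction term distinguishing stochastic from ordinary calculus.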
Why are we insisting on modeling asset prices through Brownian motions? As we shall see, Brownian motions are wild in some sense, i.e. they are of unbounded variation on any interval. So why don't we go for smoother processes? The answer is that smoother processes would give rise to arbitrage opportunities. Harrison, Pitbladdo and Schaefer (1984) show that in continuous time, asset prices must be wild. Intuitively, if stock prices are continuous in time and have finite variation, we could predict them over the immediate future, thus cashing in the capital gains.

A Brownian motion is actually nowhere differentiable. Therefore, Eq. (4A.2) should be only understood as a shorthand for,
$$X(t) = X_0 + \int_0^t \mu(\tau)\,d\tau + \int_0^t \sigma(\tau)\,dW(\tau).$$
The question then is what does the stochastic integral $\int_0^t \sigma(\tau)\,dW(\tau)$ mean? In standard calculus, the integral can be defined from its differential. To anticipate, in stochastic calculus this is no longer the case, in that the stochastic integral is the real thing.
We now provide short reviews of the ordinary Riemann integral and the Riemann-Stieltjes integral, explaining that these two approaches to pathwise integration generically fail to provide a solid foundation to the expression $I_t(f)$ in Eq. (4A.1). To anticipate, the main issue relates to unboundedness of the variation of Brownian motions:
$$V_1(W) = \sup\sum_{i=1}^n\left|W(\tau_i) - W(\tau_{i-1})\right| = \infty,$$
where the supremum is taken over all partitions of $[0, t]$. We shall state conditions on how bounded the integrator and integrands in $I_t(f)$ should be to make Riemann-Stieltjes theory go through. Unfortunately, these conditions are restrictive within the context of interest in finance. For example, Riemann-Stieltjes theory works when one takes $f(\tau) = 1$, or $f(\tau) = \tau$ in Eq. (4A.1). However, this theory doesn't hold in more general contexts.
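The unbounded variation property can be explored numerically. The Python sketch below simulates one Brownian path on a fine grid and computes sums of absolute increments raised to a power on nested partitions. The function names and the grid sizes are assumptions of this illustration; the code only shows the tendency on a single simulated path and proves nothing.

```python
import math
import random


def simulate_brownian(horizon, n_steps, rng):
    """Brownian path on an even grid of n_steps intervals, started at zero."""
    h = horizon / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(h)))
    return path


def variation_sum(path, power):
    """Sum of |increment| ** power along the given sampled path."""
    return sum(abs(b - a) ** power for a, b in zip(path[:-1], path[1:]))


def coarsen(path, stride):
    """Keep every stride-th point, giving a coarser nested partition."""
    return path[::stride]
```

By the triangle inequality, the first-order sum can only grow when the partition is refined, and on simulated Brownian paths it keeps growing as the grid gets finer, while the sum of squared increments settles near the length of the time interval.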
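4.12.1.2 Riemann

Given is $\tau \mapsto f(\tau)$, $\tau \in (0, 1)$. Consider two standard definitions. First, a partition, $P_n$: $0 = \tau_0 < \tau_1 < \cdots < \tau_{n-1} < \tau_n = 1$, and $\Delta_i = \tau_i - \tau_{i-1}$, $i = 1, \cdots, n$. Second, an intermediate partition, $Y_n$: any collection of values $y_i$ satisfying $\tau_{i-1} \le y_i \le \tau_i$, $i = 1, \cdots, n$. Then, for a given partition $P_n$ and intermediate partition $Y_n$, the Riemann sum is defined as:
$$S_n(P_n, Y_n) = \sum_{i=1}^n f(y_i)\,\Delta_i.$$
It is a weighted average of the values taken by $f(\cdot)$.
Next, let $\mathrm{Mesh}(P_n) \equiv \max_{i=1,\cdots,n}\Delta_i$, and consider $\mathrm{Mesh}(P_n) \to 0$ by sending $n \to \infty$. If the limit, $\lim_{n \to \infty} S_n(P_n, Y_n)$, exists, and is independent of $P_n$ and $Y_n$, then it is called the Riemann integral of $f$ on $(0, 1)$ and it is written:
$$\int_0^1 f(\tau)\,d\tau.$$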

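Two properties are worth mentioning:

1. Linearity: Given two constants $c_1$ and $c_2$, $\int_0^1\left(c_1 f_1(\tau) + c_2 f_2(\tau)\right)d\tau = c_1\int_0^1 f_1(\tau)\,d\tau + c_2\int_0^1 f_2(\tau)\,d\tau$.
2. Linearity on adjacent intervals: $\int_0^1 f(\tau)\,d\tau = \int_0^a f(\tau)\,d\tau + \int_a^1 f(\tau)\,d\tau$ for every $a \in (0, 1)$.

4.12.1.3 Riemann-Stieltjes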

The main idea is to integrate one function $f$ with respect to another function $g$. One standard example relates to the computation of the expectation of a random variable with distribution function $F$. Heuristically, we have that:
$$\int x\,dF(x) \approx \sum_i x_i\left[F(x_i) - F(x_{i-1})\right].$$
More generally, let us be given two functions $f$ and $g$, and consider, again, the definitions of $P_n$ and $Y_n$ given earlier. Let $\Delta g_i = g(\tau_i) - g(\tau_{i-1})$, $i = 1, \cdots, n$. The Riemann-Stieltjes sum is defined as:
$$S_n(P_n, Y_n) = \sum_{i=1}^n f(y_i)\,\Delta g_i.$$
Clearly the Riemann sum is a special case obtained with the identity function $g(\tau) = \tau$. Similarly as when proceeding with the definition of the Riemann sum, the Riemann-Stieltjes integral of $f$ with respect to $g$ on $(0, 1)$ is the limit, $\lim_{n \to \infty} S_n(P_n, Y_n)$, provided it exists and is independent of $P_n$ and $Y_n$. It is written:
$$\int_0^1 f(\tau)\,dg(\tau).$$

The crucial issue now is how we can use Riemann-Stieltjes theory to define integrals of functions against Brownian motions: can we interpret $\int_0^1 f(t)\,dW_t(\omega)$ as a Riemann-Stieltjes integral, path by path, i.e. $\omega$-pointwise? The answer is in the negative, except in very special cases. Indeed, a natural example of an integral of functions with respect to Brownian motion is $I(W)(\omega) = \int_0^1 W_t(\omega)\,dW_t(\omega)$. But what does this representation mean? We know that a Brownian path is not differentiable. However, the main point, here, does not even relate to differentiability, but to a quite peculiar property, known as unboundedness of Brownian motions. To introduce this property, consider a certain function $f$, such as that depicted in the next picture.

[Figure: a function $f(t)$ plotted against $t$, with the points $t_0$, $t_1$ and $t_2$ marked on the time axis.]

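Consider, then, its first variation, defined as:
$$[f(t_1) - f(t_0)] - [f(t_2) - f(t_1)] + [f(T) - f(t_2)] = \int_{t_0}^{t_1} f'(t)\,dt + \int_{t_1}^{t_2} (-f'(t))\,dt + \int_{t_2}^{T} f'(t)\,dt = \int_{t_0}^{T} |f'(t)|\,dt.$$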

We can see that this first variation is a measure of the total amount of up and down motion of the path of the function $f$. We can formalize this reasoning as follows. Let $f$ be a function of a real variable. Its variation in an interval $[a, b]$ is defined as
$$V_f([a,b]) = \sup \sum_{i=1}^{n} |f(t_i) - f(t_{i-1})|,$$
where the supremum is taken over the partitions $a = t_0 < t_1 < \cdots < t_n = b$. By the triangle inequality, $|f(t_i) - f(t_{i-1})| = |f(t_i) - f(s) + f(s) - f(t_{i-1})| \le |f(t_i) - f(s)| + |f(s) - f(t_{i-1})|$, the sums in $V_f([a,b])$ can only increase as we add more and more points into the partition, such that,
$$V_f([a,b]) = \lim_{\mathrm{mesh} \to 0} \sum_{i=1}^{n} |f(t_i) - f(t_{i-1})|.$$



Next, consider the following definition.
Definition 4A.1. A real function $h$ on $(0,1)$ has bounded $p$-variation, $p > 0$, if
$$\sup_{\tau} \sum_{i=1}^{n} |h(t_i) - h(t_{i-1})|^p < \infty,$$
where the supremum is taken over all partitions of $(0,1)$.


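We have:
Theorem 4A.2. The Riemann-Stieltjes integral, $\int_0^1 f(t)\,dg(t)$, exists under the following conditions:
(i) The functions $f$ and $g$ don't have discontinuities at the same points.
(ii) $f$ has bounded $p$-variation and $g$ has bounded $q$-variation, with $\frac{1}{p} + \frac{1}{q} > 1$, that is, $f$ and $g$ satisfy
$$\sup_{\tau} \sum_{i=1}^{n} |f(t_i) - f(t_{i-1})|^p < \infty \quad \text{and} \quad \sup_{\tau} \sum_{i=1}^{n} |g(t_i) - g(t_{i-1})|^q < \infty, \quad \text{with } \frac{1}{p} + \frac{1}{q} > 1.$$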

Now, it is well-known that almost every Brownian path has bounded $p$-variation for $p > 2$. And, as expected, unbounded $p$-variation for $p \le 2$, as further argued below. Consider, then, the integral, $\int_0^1 f(t)\,dW_t(\omega)$, and suppose $f$ is differentiable with bounded derivatives. By the mean value theorem, there exists a $K > 0$ such that: $|f(t) - f(s)| \le K (t - s)$ for $s < t$. Therefore, $\sup_\tau \sum_{i=1}^n |f(t_i) - f(t_{i-1})| \le K \sum_{i=1}^n (t_i - t_{i-1}) = K$. That is, $f$ has bounded $p$-variation, with $p = 1$. By Theorem 4A.2, we now have that for almost every Brownian path, the Riemann-Stieltjes integral of $f$ with respect to Brownian motions,
$$\int_0^1 f(t)\,dW_t(\omega),$$
exists for every deterministic function $f$ which is differentiable with bounded first-order derivative. For example, $f(t) = 1$, or $f(t) = t$. We aren't done. Consider $f(t) = W_t(\omega)$ and, then:
$$I(W)(\omega) = \int_0^1 W_t(\omega)\,dW_t(\omega).$$
Let $p = 2 + \epsilon$, for some $\epsilon > 0$. Hence $q = p = 2 + \epsilon$, and so $\frac{1}{p} + \frac{1}{q} = \frac{2}{2+\epsilon} < 1$. The Riemann-Stieltjes theory doesn't work even with this simple example. This is where the theory of Itô's stochastic integrals comes in.

4.12.1.4 A digression on unboundedness of Brownian motions

Why do Brownian motions display unbounded variation? Consider the Brownian tree below.

[Figure: Brownian tree, with an up-move and a down-move at each node.]

Time is $\Delta t$ and space is $\Delta x$. In the Brownian tree, we must have,
$$\Delta x = \sqrt{\Delta t}. \quad (4A.3)$$
Indeed, and heuristically, we have that $var(\Delta W) = (\Delta x)^2$, which matched to $var(\Delta W) = \Delta t$, leaves precisely Eq. (4A.3). Therefore, $E(|\Delta W|) = \Delta x = \sqrt{\Delta t}$. Next let us chop a time interval of length $T$ in $n = T/\Delta t$ parts. The total expected length traveled by a Brownian motion is,
$$n \cdot \Delta x = \frac{T}{\Delta t}\sqrt{\Delta t} = \frac{T}{\sqrt{\Delta t}} \to \infty \quad \text{as } \Delta t \to 0.$$
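The heuristic argument on the Brownian tree can be checked with a short simulation. This is a minimal sketch, assuming up and down moves of size sqrt(dt) drawn with equal probability; the function name and the seed are illustrative. On the tree, the first variation equals T/sqrt(dt) on every path, while the sum of squared moves equals T.

```python
import math
import random


def tree_path_variations(T, n, seed=0):
    """First and quadratic variation of one path on the Brownian tree.

    Each of the n steps moves up or down by dx = sqrt(dt), as in Eq. (4A.3).
    """
    rng = random.Random(seed)
    dt = T / n
    dx = math.sqrt(dt)
    first_var = 0.0
    quad_var = 0.0
    for _ in range(n):
        step = dx if rng.random() < 0.5 else -dx
        first_var += abs(step)    # total length traveled
        quad_var += step * step   # sum of squared increments
    return first_var, quad_var


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, tree_path_variations(1.0, n))
```

Refining the grid makes the first variation grow like the square root of the number of steps, while the sum of squared moves stays at T.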

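A more substantive proof is, for example, that of Corollary 2.5, p. 25, in Revuz and Yor (1999). A sketch of this proof proceeds as follows. We have:
$$\sum_{i=1}^{n} (W_{t_i} - W_{t_{i-1}})^2 \le \max_{1 \le i \le n} |W_{t_i} - W_{t_{i-1}}| \cdot \sum_{i=1}^{n} |W_{t_i} - W_{t_{i-1}}|.$$
Moreover, $\max_i |W_{t_i} - W_{t_{i-1}}|$ converges to zero because $W$ is continuous, and by the Heine-Cantor theorem, continuous functions are uniformly continuous on finite intervals. Then, suppose that $W$ has bounded variation, which would imply that
$$\sum_{i=1}^{n} (W_{t_i} - W_{t_{i-1}})^2 \to 0 \quad \text{as } \mathrm{Mesh} \to 0,$$
which is impossible, as we shall now argue. Indeed, in the next section, we shall establish that $\sum_i (W_{t_i} - W_{t_{i-1}})^2 \to t$ in quadratic mean, implying that $\operatorname{plim} \sum_i (W_{t_i} - W_{t_{i-1}})^2 = t$. Therefore, there exists a sequence of partitions along which $\sum_i (W_{t_i} - W_{t_{i-1}})^2 \to t$ almost surely, for all $t$. (Convergence in probability does not imply almost sure convergence, yet it implies that there exists a suitable subsequence s.t. a.s. convergence holds, which is what we just need here.)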
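4.12.1.5 Itô

Let us begin with a first example, which can help grasp the nature of the issues under study. Consider
$$I(W)(\omega) = \int_0^t W_s(\omega)\,dW_s(\omega).$$
Consider, then, the following Riemann-Stieltjes sum:
$$S_n = \sum_{i=1}^{n} W_{t_{i-1}}\,\Delta_i W, \quad \Delta_i W \equiv W_{t_i} - W_{t_{i-1}},$$
where the intermediate partition relies on the left-end points $y_i = t_{i-1}$, $i = 1, \ldots, n$. Simple computations leave:
$$S_n = \frac{1}{2} W_t^2 - \frac{1}{2} Q_n(t), \quad Q_n(t) \equiv \sum_{i=1}^{n} (\Delta_i W)^2.$$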

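The quantity $Q_n(t) \equiv \sum_{i=1}^n (\Delta_i W)^2$, where $\Delta_i W \equiv W_{t_i} - W_{t_{i-1}}$ and $\Delta_i = t_i - t_{i-1}$, is known as Quadratic Variation, a quite useful concept in financial econometrics. We have,
$$E[Q_n(t)] = \sum_{i=1}^{n} E(\Delta_i W)^2 = \sum_{i=1}^{n} \Delta_i = t.$$
Moreover, $\Delta_i W \sim N(0, \Delta_i)$, which implies that
$$var(Q_n(t)) = \sum_{i=1}^{n} var[(\Delta_i W)^2] = 2 \sum_{i=1}^{n} \Delta_i^2,$$
where the last equality follows because $(\Delta_i W)^2 / \Delta_i \sim \chi^2(1)$. Hence,
$$var(Q_n(t)) = 2 \sum_{i=1}^{n} \Delta_i^2 \le 2\,\mathrm{Mesh}(\tau_n) \sum_{i=1}^{n} \Delta_i = 2t\,\mathrm{Mesh}(\tau_n) \to 0.$$
But $var(Q_n(t)) = E[Q_n(t) - E(Q_n(t))]^2 = E[Q_n(t) - t]^2$. Therefore,
$$E[Q_n(t) - t]^2 \to 0, \quad t\text{-pointwise}.$$
This type of convergence is called convergence in quadratic mean of $Q_n(t)$ to $t$ and is written $Q_n(t) \overset{q.m.}{\to} t$, as explained in more detail in the appendixes of Chapter 5. By the celebrated Chebyshev's inequality, convergence in quadratic mean implies convergence in probability:
$$\forall \delta > 0, \quad \Pr\{|Q_n(t) - t| > \delta\} \le \frac{E[Q_n(t) - t]^2}{\delta^2} \to 0.$$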
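The two bounds used above, the variance of the quadratic variation and the Chebyshev bound it implies, are simple enough to evaluate directly. The following sketch assumes Gaussian increments, so that the variance is twice the sum of squared cell lengths; the function names are hypothetical.

```python
def uniform_partition(t, n):
    """Partition 0 = t_0 < ... < t_n = t with equal cells."""
    return [t * i / n for i in range(n + 1)]


def qv_variance(partition):
    """var(Q_n(t)) = 2 * sum_i Delta_i^2 for Gaussian increments."""
    return 2.0 * sum((b - a) ** 2 for a, b in zip(partition, partition[1:]))


def chebyshev_bound(partition, delta):
    """Upper bound on Pr{|Q_n(t) - t| > delta} from Chebyshev's inequality."""
    return qv_variance(partition) / delta ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        p = uniform_partition(1.0, n)
        print(n, qv_variance(p), chebyshev_bound(p, 0.1))
```

With a uniform partition of [0, t] into n cells the variance is 2t^2/n, so both quantities vanish at rate 1/n as the mesh shrinks.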

Issues related to uniform convergence will be dealt with later.

To sum up, $\int_0^t W_s\,dW_s$ doesn't exist as a Riemann-Stieltjes integral. Nevertheless, the previous facts suggest that a good definition of it could hinge upon the notion of a mean square limit, viz
$$\sum_{i=1}^{n} W_{t_{i-1}} (W_{t_i} - W_{t_{i-1}}) \overset{q.m.}{\to} \frac{1}{2} W_t^2 - \frac{1}{2} t,$$
or, as we shall explain,
$$\int_0^t W_s\,dW_s = \frac{1}{2} W_t^2 - \frac{1}{2} t,$$
where $\int_0^t W_s\,dW_s$ has the Itô's sense.
Clearly, $\int_0^t W_s\,dW_s$ does not satisfy the usual Riemann-Stieltjes rule of integration. (For any smooth function $f$ such that $f(0) = 0$, the Riemann-Stieltjes integral $\int_0^t f(s)\,df(s) = \frac{1}{2} f^2(t)$.) This doesn't work here because we have yet to see what the chain-rule for functions of $W$ is. This will lead us to the celebrated Itô's lemma, which shall confirm that $\int_0^t W_s\,dW_s = \frac{1}{2} W_t^2 - \frac{1}{2} t$. This example vividly illustrates that standard integration methods fail. In fact, the timing of the integrands is quite critical. For example, in Riemann integration, the integrand can be evaluated at any point in the interval. If we apply this to the kind of integrals we are studying here we obtain, $\lim \sum_i W_{t_{i-1}} (W_{t_i} - W_{t_{i-1}})$ (for the left boundary) and $\lim \sum_i W_{t_i} (W_{t_i} - W_{t_{i-1}})$ (for the right boundary). But the two limits do not agree. The expectation of the first is zero (by the law of iterated expectations), while the expectation of the second is not: it equals $\sum_i E(W_{t_i} - W_{t_{i-1}})^2 = t$. Finally, Riemann integration theory differs from the integration theory underlying the previous example because of the mode of convergence utilized in the two theories.
A short digression is in order. The so-called Stratonovich stochastic integral selects as points of the intermediate partition the central ones:
$$\sum_{i=1}^{n} W_{y_i} (W_{t_i} - W_{t_{i-1}}), \quad y_i = \frac{1}{2}(t_{i-1} + t_i).$$
For the Stratonovich integral, the usual Riemann-Stieltjes rule applies, yet the Stratonovich stochastic integral isn't Riemann-Stieltjes.
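The role played by the timing of the integrand can be seen on a single simulated path. The sketch below makes two assumptions that are not in the text: the path is sampled on a uniform grid with Gaussian increments, and the central evaluation is approximated by the average of the two end-point values rather than by the path value at the mid-point time. The function name and seed are illustrative.

```python
import math
import random


def endpoint_sums(T, n, seed=1):
    """Left, right and averaged sums approximating int_0^T W dW on one path."""
    rng = random.Random(seed)
    dt = T / n
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    incs = [w[i] - w[i - 1] for i in range(1, n + 1)]
    left = sum(w[i - 1] * incs[i - 1] for i in range(1, n + 1))      # Ito timing
    right = sum(w[i] * incs[i - 1] for i in range(1, n + 1))         # right boundary
    average = sum(0.5 * (w[i - 1] + w[i]) * incs[i - 1] for i in range(1, n + 1))
    quad_var = sum(d * d for d in incs)
    return w[-1], left, right, average, quad_var


if __name__ == "__main__":
    w_T, left, right, average, qv = endpoint_sums(1.0, 10_000)
    print("W_T^2 / 2      :", 0.5 * w_T ** 2)
    print("left  (Ito)    :", left)
    print("right          :", right)
    print("averaged       :", average)
    print("quadratic var. :", qv)
```

On any path the left and right sums differ exactly by the quadratic variation, and the averaged sum equals one half of the squared terminal value, which is the usual Riemann-Stieltjes rule.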
4.12.1.6 The Itô's stochastic integral for simple processes

Let $\mathcal{F}_t$ be the $P$-augmentation of the filtration of $W$. Consider $[0, T]$ and partitions $\tau_n: 0 = t_0 < t_1 < \cdots < t_n = T$, and the following definition:

Definition 4A.3 (Simple processes). The process $C = (C_t, t \in [0, T])$ is simple if:
(i) There exists a partition $\tau_n$ and a sequence of r.v. $Z_i$, $i = 1, \ldots, n$, s.t.
$$C_t = \begin{cases} Z_n, & \text{if } t = T \\ Z_i, & \text{if } t_{i-1} \le t < t_i, \ i = 1, \ldots, n \end{cases}$$
(ii) The sequence $(Z_i)$ is $\mathcal{F}_{t_{i-1}}$-adapted, $i = 1, \ldots, n$.
(iii) $E(Z_i^2) < \infty$, all $i$.

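As an example, consider $C_t = W_{t_{n-1}}$, if $t = T$, and $C_t = W_{t_{i-1}}$, if $t_{i-1} \le t < t_i$. Next, we have:

Definition 4A.4. The Itô's stochastic integral of a simple process $C$ on $[0, T]$ is,
$$I_t(C) = \int_0^t C_s\,dW_s = \sum_{i=1}^{k-1} Z_i (W_{t_i} - W_{t_{i-1}}) + Z_k (W_t - W_{t_{k-1}}), \quad t \in [t_{k-1}, t_k], \quad k = 1, \ldots, n,$$
with the notation $\sum_{i=1}^{0} \equiv 0$.

It is a Riemann-Stieltjes sum of $C$ with respect to Brownian motions evaluated at left-end points.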


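Finally, we proceed with listing a set of useful properties.
Property 4A.P1. $I_t(C) = \int_0^t C_s\,dW_s$, $t \in [0, T]$, is a $\mathcal{F}_t$-martingale and has expectation equal to zero.

Proof. Let us check that $I_t(C)$ is a $\mathcal{F}_t$-martingale. We have to check three conditions: (i) $E|I_t(C)| < \infty$, all $t \in [0, T]$; (ii) $I_t(C)$ is $\mathcal{F}_t$-adapted; (iii) $E[I_t(C) | \mathcal{F}_s] = I_s(C)$, $s < t$. Condition (i) follows by the isometry property to be introduced below. Condition (ii) is trivial. To show (iii), suppose, initially, that $s, t \in [t_{k-1}, t_k]$, $s < t$. We have:
$$I_t(C) = \sum_{i=1}^{k-1} Z_i (W_{t_i} - W_{t_{i-1}}) + Z_k (W_t - W_{t_{k-1}}) = I_s(C) + Z_k (W_t - W_s),$$
$$E[I_t(C) | \mathcal{F}_s] = I_s(C) + Z_k\,E[(W_t - W_s) | \mathcal{F}_s] = I_s(C).$$
The case $s \in [t_{l-1}, t_l]$ and $t \in [t_{k-1}, t_k]$, $l < k$, is proven similarly. Finally, $I_t(C)$ has zero expectation because it starts from the origin by the definition: $I_0(C) = 0 \Rightarrow E(I_t(C)) = 0$ all $t$. That is, $E[I_t(C)] = E[E(I_t(C) | \mathcal{F}_0)] = I_0(C) = 0$.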
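Property 4A.P2 (Isometry). $E\left(\int_0^t C_s\,dW_s\right)^2 = \int_0^t E(C_s^2)\,ds$, for all $t \in [0, T]$.

Proof. Without loss of generality, set $t = t_k$. With $\Delta_i W \equiv W_{t_i} - W_{t_{i-1}}$, we have:
$$E\left(\int_0^t C_s\,dW_s\right)^2 = E\left(\sum_{i=1}^{k} Z_i\,\Delta_i W\right)^2 = \sum_{i=1}^{k} \sum_{j=1}^{k} E\left(Z_i Z_j\,\Delta_i W\,\Delta_j W\right) = \sum_{i=1}^{k} E\left[Z_i^2\,(\Delta_i W)^2\right],$$
where the last equality follows because $Z_i Z_j\,\Delta_i W$ and $\Delta_j W$ are independent, for all $i \ne j$ (say, $i < j$). Then,
$$E\left(\int_0^t C_s\,dW_s\right)^2 = \sum_{i=1}^{k} E\left[Z_i^2\,E\left((\Delta_i W)^2 \,\middle|\, \mathcal{F}_{t_{i-1}}\right)\right] = \sum_{i=1}^{k} E(Z_i^2)\,(t_i - t_{i-1}) = \int_0^t E(C_s^2)\,ds.$$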
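The zero-expectation and isometry properties can be verified exactly on a small Brownian tree, where expectations are finite sums over all paths. The sketch below is an illustration under stated assumptions: the integrand is the simple process equal to the left-end value of the path, moves are plus or minus sqrt(dt) with equal probability, and the function names are hypothetical.

```python
import itertools
import math


def tree_moments(T, n):
    """Exact mean and second moment of sum_i W_{t_{i-1}} dW_i on an n-step tree."""
    dt = T / n
    dx = math.sqrt(dt)
    prob = 0.5 ** n               # every path has the same probability
    mean = 0.0
    second = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        w = 0.0
        integral = 0.0
        for s in signs:
            integral += w * s * dx   # integrand uses the left-end value of W
            w += s * dx
        mean += prob * integral
        second += prob * integral ** 2
    return mean, second


def isometry_rhs(T, n):
    """sum_i E(W_{t_{i-1}}^2) * dt, using E(W_{t_{i-1}}^2) = (i - 1) * dt."""
    dt = T / n
    return sum((i - 1) * dt * dt for i in range(1, n + 1))


if __name__ == "__main__":
    print(tree_moments(1.0, 10), isometry_rhs(1.0, 10))
```

With T = 1 and n = 10 both sides of the isometry equal 0.45, and the mean of the integral is zero up to rounding.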

Property 4A.P3 (Linearity and linearity on adjacent intervals).

Property 4A.P4. $I_t(C)$ has continuous sample paths.

4.12.1.7 The general Itô's stochastic integral

We now consider a more general class, $H^2$, of integrand $\mathcal{F}_t$-adapted processes $C = (C_t, t \in [0, T])$ satisfying $E\left(\int_0^T C_t^2\,dt\right) < \infty$, which is obviously satisfied by simple processes, although now we are now moving to continuous time. Clearly, $H^2$ is a closed linear subspace of $L^2(\Omega \times [0, T])$. So let $\|\cdot\|_{L^2(\Omega \times [0,T])}$ be the norm of $L^2(\Omega \times [0, T])$. Let $H_0^2$ be the subset of $H^2$ consisting of all simple processes.
We now outline how to construct the stochastic integral, in four steps.

Step 1: ($H_0^2$ is dense in $H^2$). It is possible to show that for any $C \in H^2$, there exists a sequence of simple processes $C^{(n)}$ s.t. $E\int_0^T (C_t - C_t^{(n)})^2\,dt \to 0$, i.e. $\|C - C^{(n)}\|_{L^2(\Omega \times [0,T])} \to 0$.

Step 2: By step 1, $C^{(n)}$ is a Cauchy sequence in $L^2(\Omega \times [0, T])$. By the isometry property of the Itô's integral for simple processes,
$$E\left(I_t(C^{(n)}) - I_t(C^{(n')})\right)^2 = E\int_0^t \left(C_s^{(n)} - C_s^{(n')}\right)^2 ds \to 0.$$
Therefore, $I_t(C^{(n)})$ is a Cauchy sequence in $L^2(\Omega)$. Now it is well-known that $L^2(\Omega)$ is complete, and so $I_t(C^{(n)})$ must converge to some element of $L^2(\Omega)$, denoted as $I_t(C)$.

Step 3: This limit is called the Itô's stochastic integral of $C$, and is written as
$$I_t(C) = \int_0^t C_s\,dW_s.$$
Finally, the limit is well-defined: if there is another sequence $\hat{C}^{(n)}$ with $\|C - \hat{C}^{(n)}\|_{L^2(\Omega \times [0,T])} \to 0$, then $\lim I_t(C^{(n)}) = \lim I_t(\hat{C}^{(n)})$ in the $L^2(\Omega)$ norm.

Step 4: (Itô's integral as a process) We wish to create a whole continuum of Itô's integrals at a single glance. Step 3 is not enough because we need uniform convergence on $[0, T]$. To show that it's feasible lies beyond the aim of these introductory lectures. The final result is: for any $C \in H^2$, there exists a process $(I_t(C), t \in [0, T])$ which is a continuous $\mathcal{F}_t$-martingale s.t.
$$I_t(C) = \int_0^t C_s\,dW_s, \quad t \in [0, T], \quad P\text{-a.s.}$$

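To summarize, then, let us introduce the two spaces,
$$L^p = \left\{ C \in L : \int_0^T |C_t|^p\,dt < \infty \ \text{a.s.} \right\}, \quad p = 1, 2,$$
$$H^2 = \left\{ C \in L^2 : E\left(\int_0^T |C_t|^2\,dt\right) < \infty \right\},$$
where $L$ denotes the set of all adapted processes. Let $C \in H^2$. The stochastic integral $I_t(C) = \int_0^t C_s\,dW_s$ satisfies the following properties: (i) Continuous sample paths, and $I_t(C)$ is a $\mathcal{F}_t$-martingale; (ii) Expectation equal to zero; (iii) Itô's isometry on $H^2$, i.e. $E\left(\int_0^t C_s\,dW_s\right)^2 = \int_0^t E(C_s^2)\,ds$, $t \in [0, T]$, hence $var\left[\int_0^t C_s\,dW_s\right] = E\left[\int_0^t C_s\,dW_s\right]^2 = \int_0^t E(C_s^2)\,ds$; (iv) Linearity and linearity on adjacent intervals.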

A few remarks are in order. If $C \in H^2$, then the solution to $dX_t = C_t\,dW_t$ is a martingale. If $C \notin H^2$, but $C \in L^2$, $X$ is, instead, called a local martingale. The converse is the Martingale Representation Theorem. This theorem states that if $X$ is a $\mathcal{F}_t$-martingale, then there exists a $C \in H^2$: $dX_t = C_t\,dW_t$. This result is utilized in the main text of this chapter, when it helps us tell whether we live in a world with complete or incomplete markets. Moreover, in continuous-time finance, $C$ is often a portfolio strategy. It must be in $H^2$ to avoid doubling strategies, which are a kind of arbitrage opportunities (at least in absence of frictions such as short-selling constraints). Assume, for example, that an asset price is $W$, and that this asset does not distribute dividends from $0$ to $T$. Then $dW_t$ is the instantaneous gain from holding one unit of this asset. The condition $C \in H^2$ implies that these strategies cannot become arbitrarily large according to the $H^2$ criterion. Moreover, the previous properties of $I_t(C) = \int_0^t C_s\,dW_s$ suggest that the cumulative gain process $G_t = G_0 + I_t(C)$ is a martingale (not only a local martingale). Therefore, no investor expects to make profits from investing in this asset.
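To see why unbounded strategies need to be ruled out, it may help to look at the textbook coin-tossing version of a doubling strategy. The following is a discrete-time analogue offered as an illustration only, not the continuous-time construction; the function name and the stake convention are hypothetical.

```python
def doubling_strategy(outcomes, first_bet=1.0):
    """Double the stake after each loss and stop at the first win.

    outcomes is a sequence of booleans (True = win). Returns the cumulative
    gain and the largest stake that had to be posted along the way.
    """
    gain = 0.0
    bet = first_bet
    largest_bet = 0.0
    for win in outcomes:
        largest_bet = max(largest_bet, bet)
        if win:
            return gain + bet, largest_bet
        gain -= bet
        bet *= 2.0
    return gain, largest_bet


if __name__ == "__main__":
    print(doubling_strategy([False, False, True]))   # wins on the third toss
    print(doubling_strategy([False] * 10))           # ten losses in a row
```

The strategy ends with a gain equal to the first stake whenever a win eventually occurs, but the position required before that win is unbounded, which is the kind of behaviour a square-integrability requirement on the strategy is meant to exclude.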
4.12.1.8 Itô's lemma: Introduction

We develop, heuristically, a basic version of Itô's lemma, with its most general version stated further in this appendix. Let $f: \mathbb{R} \mapsto \mathbb{R}$ be twice continuously differentiable. We have:
$$f(W_t) = f(W_0) + \int_0^t f'(W_s)\,dW_s + \frac{1}{2}\int_0^t f''(W_s)\,ds, \quad (4A.4)$$
where the first integral is an Itô's stochastic integral, and second one is a Riemann's one. For example, let $f(W) = W^2$. Then,
$$W_t^2 = 2\int_0^t W_s\,dW_s + t. \quad (4A.5)$$
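As a further illustration of Eq. (4A.4), consider the function $f(w) = e^{w}$, which is not an example taken from the text. The sketch assumes $W_0 = 0$ and that the stochastic integral has zero expectation, which holds because the integrand $e^{W_s}$ is square integrable on finite horizons. Taking expectations turns the stochastic equation into an ordinary integral equation for $m(t) \equiv E(e^{W_t})$:

```latex
e^{W_t} = 1 + \int_0^t e^{W_s}\,dW_s + \frac{1}{2}\int_0^t e^{W_s}\,ds
\;\Longrightarrow\;
m(t) = 1 + \frac{1}{2}\int_0^t m(s)\,ds
\;\Longrightarrow\;
m(t) = e^{t/2}
```

The result agrees with the mean of a log-normal variable, since $W_t$ is normal with mean zero and variance $t$.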



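To provide a sketchy proof of Eq. (4A.4), note that:
$$f(W_t) - f(W_0) = \sum_{i=0}^{n-1} \left[f(W_{t_{i+1}}) - f(W_{t_i})\right].$$
By Taylor,
$$f(W_{t_{i+1}}) - f(W_{t_i}) = f'(W_{t_i})(W_{t_{i+1}} - W_{t_i}) + \frac{1}{2} f''(\xi_i)(W_{t_{i+1}} - W_{t_i})^2,$$
where $\min(W_{t_i}, W_{t_{i+1}}) \le \xi_i \le \max(W_{t_i}, W_{t_{i+1}})$, as in the figure below. Because $W$ is continuous, $\xi_i = W_{t_i^*}(\omega)$ for some $t_i^* \in [t_i, t_{i+1}]$.

[Figure: a Brownian path between $t_i$ and $t_{i+1}$, with an intermediate value attained at some time between $t_i$ and $t_{i+1}$.]

Therefore,
$$f(W_t) - f(W_0) = \sum_{i=0}^{n-1} f'(W_{t_i})(W_{t_{i+1}} - W_{t_i}) + \frac{1}{2}\sum_{i=0}^{n-1} f''(W_{t_i^*})(W_{t_{i+1}} - W_{t_i})^2.$$
We have
$$\sum_{i=0}^{n-1} f''(W_{t_i^*})(W_{t_{i+1}} - W_{t_i})^2 \to \int_0^t f''(W_s)\,ds.$$
Finally,
$$\sum_{i=0}^{n-1} f'(W_{t_i})(W_{t_{i+1}} - W_{t_i}) \to \int_0^t f'(W_s)\,dW_s.$$
More technical details in order of descending difficulty can be found in Karatzas and Shreve (1991), Arnold (1974), Steele (2001) and Mikosch (1998).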
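Let us reconsider the example in Eq. (4A.4). By the stochastic integral theorem, $\int_0^t W_s\,dW_s$ is a martingale. This is confirmed by Eq. (4A.4). According to Eq. (4A.4),
$$\int_0^t W_s\,dW_s = \frac{1}{2}\left(W_t^2 - t\right),$$
and $W_t^2 - t$ is indeed a martingale, for $E(W_t^2 - t \,|\, \mathcal{F}_s) = W_s^2 - s$, all $s \le t$.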


4.12.2 Stochastic differential equations

4.12.2.1 Background

Consider the differential equation:
$$\frac{dX_t}{dt} = b(t, X_t),$$
for some function $b$. Randomness can be introduced via an additional noise term:
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t.$$
We already know that a Brownian path is not differentiable, so this is only a short-hand notation for:
$$X_t = X_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \quad (4A.6)$$
where the first integral is Riemann and the second integral is an Itô's stochastic integral.
We have the following definitions. First, we say that an Itô's process is,
$$dX_t(\omega) = b_t(\omega)\,dt + \sigma_t(\omega)\,dW_t(\omega).$$
Moreover, we say that an Itô's diffusion process is,
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t.$$
It is known that an Itô's diffusion process is a Markov process. The previous equation is also called a stochastic differential equation (SDE). In a SDE, $b$ and $\sigma$ depend on $\omega$ only through $X$. Finally, we say that a time-homogeneous diffusion process is,
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t.$$
There is a beautiful property that is used to price financial derivatives, using replication arguments, as explained in the main text, called the unique decomposition property. Suppose we were given two processes $X$ and $Y$ with $X_0 = Y_0$, and that:
$$dX_t = b_t^X\,dt + \sigma_t^X\,dW_t, \qquad dY_t = b_t^Y\,dt + \sigma_t^Y\,dW_t.$$
Then $X_t = Y_t$ almost surely if and only if $b^X = b^Y$ and $\sigma^X = \sigma^Y$ almost everywhere, in the sense that $E\left(\int_0^T |b_t^X - b_t^Y|\,dt\right) = E\left(\int_0^T |\sigma_t^X - \sigma_t^Y|\,dt\right) = 0$.

4.12.2.2 Basic definitions, properties and regularity conditions

How do we know whether the various integrals given before are well-defined? As an example, the Itô's integral representation $\int_0^t \sigma(s, X_s)\,dW_s$ works if $\sigma(\cdot, X)$ is $\mathcal{F}_t$-adapted and $E\int_0^t \sigma(s, X_s)^2\,ds < \infty$. But how can we be sure that these two basic conditions are satisfied if we don't know yet the solution $X$? And, above all, what is a solution to a SDE? We have two concepts of such a solution, strong and weak.

Definition 4A.5. (Strong solution to a SDE) A strong solution to Eq. (4A.6) is a stochastic process $X = (X_t, t \in [0, T])$ such that:
(i) $X$ is $\mathcal{F}_t$-adapted.
(ii) The integrals in Eq. (4A.6) are well-defined in the Riemann's and Itô's sense and Eq. (4A.6) holds $P$-almost surely.
(iii) $E\left(\int_0^T |X_t|^2\,dt\right) < \infty$.

In other words, the definition of a strong solution requires that a Brownian motion be given in advance, and that the solution $X$ constructed from it be then $\mathcal{F}_t$-adapted.
Next, suppose, instead, that we were only given $X_0$ and two functions $b(t, x)$ and $\sigma(t, x)$, and that we were asked to find a pair of processes $(\tilde{X}, \tilde{W})$ on some probability space $(\Omega, \tilde{\mathcal{F}}, P)$ such that Eq. (4A.6) holds with $\tilde{X}$ being $\tilde{\mathcal{F}}_t$-adapted on some space, not necessarily the one in Eq. (4A.6). (Clearly such a $\tilde{X}$ needs not to be $\mathcal{F}_t$-adapted.) In this case $(\tilde{X}, \tilde{W})$ is called a weak solution on $(\Omega, \tilde{\mathcal{F}}, P)$. In the case of a weak solution, we are given $X_0$, $b$ and $\sigma$, and then we have to find two things: a Brownian motion $\tilde{W}$ and a $\tilde{\mathcal{F}}_t$-adapted process $\tilde{X}$ such that $\tilde{X}_t = X_0 + \int_0^t b(s, \tilde{X}_s)\,ds + \int_0^t \sigma(s, \tilde{X}_s)\,d\tilde{W}_s$ holds $P$-almost surely. Clearly, a strong solution is also weak, but the converse is not true. Consider the following example.
Example 4A.6. (Tanaka equation) Let $X$ satisfy:
$$dX_t = \mathrm{sign}(X_t)\,dW_t, \quad X_0 = 0. \quad (4A.7)$$
This equation has no strong solutions, for define
$$Y_t = \int_0^t \mathrm{sign}(W_s)\,dW_s, \quad (4A.8)$$
where $W$ is a Brownian motion. It can be shown that $Y$ is $\mathcal{G}_t$-measurable, where $\mathcal{G}_t$ is the $\sigma$-algebra generated by $|W|$. Clearly $\mathcal{G}_t \subset \mathcal{F}_t^W$, where $\mathcal{F}_t^W$ is the $\sigma$-algebra generated by $W$. Therefore, the $\sigma$-algebra generated by $Y$ is also strictly contained in $\mathcal{F}_t^W$. Armed with this result, we can easily show that there are no strong solutions to Eq. (4A.7). To show this, suppose the contrary. There is a theorem saying that $X$ would then be a Brownian motion. On the other hand, Eq. (4A.7) can also be written
$$dW_t = \mathrm{sign}(X_t)\,dX_t, \quad \text{or} \quad W_t = \int_0^t \mathrm{sign}(X_s)\,dX_s.$$
By the same reasoning produced to show that the $\sigma$-algebra generated by $Y$ is strictly contained in $\mathcal{F}_t^W$ in Eq. (4A.8), we conclude that the $\sigma$-algebra generated by $W$ is strictly contained in the $\sigma$-algebra generated by $X$. But this contradicts that $X$ is a strong solution to Eq. (4A.7).
Clearly, one needs to be able to impose conditions that allow to distinguish between weak and strong solutions. However, the only focus of the following discussion is about the regularity conditions ensuring existence and uniqueness of strong solutions, the case of interest in continuous-time finance.
We need two types of restrictions on $b$ and $\sigma$. Consider the following definition. For a given function $f$, we say that it satisfies a Lipschitz condition in $x$ if there exists a constant $L$, such that for all $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d$,
$$\|f(t, x) - f(t, y)\| \le L\,\|x - y\| \quad \text{uniformly in } t,$$
where $\|A\| \equiv \sqrt{\mathrm{Tr}(A^\top A)}$. In other words, $f$ cannot change too widely. We also say $f$ satisfies a growth condition in $x$, if there exists a constant $G$ such that for all $(t, x) \in \mathbb{R}_+ \times \mathbb{R}^d$,
$$\|f(t, x)\|^2 \le G\,(1 + \|x\|^2) \quad \text{uniformly in } t.$$
That is, $f$ cannot grow too much.


Next, we turn to the concepts of existence and uniqueness of a solution to a stochastic differential equation. We say that the solution is unique if, whenever $X^{(1)}(t)$ and $X^{(2)}(t)$ are both strong solutions to Eq. (4A.6), then $X^{(1)}(t) = X^{(2)}(t)$ $P$-a.s. We have:
Theorem 4A.7. Suppose that $b$, $\sigma$ satisfy Lipschitz and growth conditions in $x$; then there exists a unique Itô's process $X$ satisfying Eq. (4A.6) which is continuous, adapted and Markov.
Consider the following stochastic differential equation:
$$dX_t = \kappa(\theta - X_t)\,dt + \xi\sqrt{X_t}\,dW_t,$$
for some constants $\kappa$, $\theta$, $\xi$. This is the so-called square-root process utilized to model equity volatility (see Chapter 10), the short-term rate (see Chapter 12) or instantaneous probabilities of default of debt issuers (see Chapter 13). The point here, for now, is that the diffusion component does not satisfy the conditions in Theorem 4A.7. Yet it is possible to show that under suitable parameter restrictions there exists a strong solution. Incidentally, the solution to this simple equation is not known in closed form.
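Because the square-root process has no explicit pathwise solution, it is usually simulated. The code below is a minimal sketch of an Euler discretization, assuming the generic parametrization dX = kappa (theta - X) dt + xi sqrt(X) dW; the parameter names, the parameter values and the truncation at zero inside the coefficients are all assumptions of this sketch, not the author's method.

```python
import math
import random


def simulate_square_root(x0, kappa, theta, xi, T, n_steps, n_paths, seed=0):
    """Euler scheme for dX = kappa*(theta - X) dt + xi*sqrt(X) dW.

    Negative values are truncated at zero inside the drift and diffusion,
    so that the square root is always well defined. Returns terminal values.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sq_dt = math.sqrt(dt)
    terminal = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x_pos = max(x, 0.0)
            shock = rng.gauss(0.0, 1.0) * sq_dt
            x = x + kappa * (theta - x_pos) * dt + xi * math.sqrt(x_pos) * shock
        terminal.append(max(x, 0.0))
    return terminal


if __name__ == "__main__":
    values = simulate_square_root(0.04, 2.0, 0.04, 0.3, 1.0, 100, 2000)
    print(sum(values) / len(values))
```

Starting the process at its long-run level theta, the sample mean of the terminal values should stay close to theta, since the drift is linear in X.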
What about uniqueness of the solution? It is well-known that if $b$, $\sigma$ are locally Lipschitz continuous in $x$, then strong uniqueness holds. But even for ordinary differential equations, a local Lipschitz condition is not necessarily enough to guarantee global existence (i.e. for all $t$) of a solution. For example, the equation
$$\frac{dx_t}{dt} = x_t^2, \quad x_0 = 1,$$
has as unique solution:
$$x_t = \frac{1}{1 - t}, \quad 0 \le t < 1.$$
Yet it is impossible to find a global solution, i.e. one defined for all $t$. This is exactly the kind of pathology ruled out by linear-growth conditions. More generally, linear-growth conditions ensure that $|X_t(\omega)|$ doesn't explode in finite time. Naturally, Lipschitz and growth conditions are only sufficient conditions to guarantee the previous conclusions.
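The explosion in the ordinary differential equation dx/dt = x^2 with x(0) = 1 is easy to reproduce numerically. The sketch below compares the exact solution 1/(1 - t) with an explicit Euler scheme; the step size, the cap used to detect blow-up and the function names are illustrative assumptions.

```python
def exact_solution(t):
    """Solution of dx/dt = x^2, x(0) = 1, which only exists for 0 <= t < 1."""
    if not 0.0 <= t < 1.0:
        raise ValueError("the solution is only defined on [0, 1)")
    return 1.0 / (1.0 - t)


def euler_blow_up_time(dt, cap=1e12):
    """Run explicit Euler until x exceeds cap and return the time reached."""
    t, x = 0.0, 1.0
    while x <= cap:
        x += x * x * dt
        t += dt
    return t


if __name__ == "__main__":
    print(exact_solution(0.5), exact_solution(0.99))
    print(euler_blow_up_time(1e-4))
```

The Euler path lies below the exact solution, so it crosses any large cap slightly after t = 1, which is where the exact solution ceases to exist.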
A final remark. The uniqueness concept used here refers to strong or pathwise uniqueness. There are also definitions of weak uniqueness to mean that any two solutions (weak or strong) have the same finite-dimensional distributions. For example, the Tanaka's equation introduced earlier has no strong solution, yet it can be shown that it has a (weakly) unique weak solution.

4.12.2.3 Itô's lemma

Itô's lemma is a fundamental tool of analysis in continuous-time finance. It helps build up new processes from old processes. Two examples might clarify.
(i) A share price is certainly a function of its dividend process. If the dividend process is solution to some SDE, then the asset price is a solution to another SDE. Which SDE? Itô's lemma will give us the answer.
(ii) Derivative products, reviewed in the third part of these lectures, are financial instruments, with a value depending on some underlying factors, whence, the terminology, derivative. In other words, derivative prices are functions of these factors. If factors are solutions to SDE, derivative prices are also solutions to SDE. Once again, Itô's lemma will provide us with the right SDE.
Naturally, the functional form linking the dividend process (or the factors) to the asset prices is unknown. But in situations of interest, no-arbitrage restrictions will help to pin down such a functional form.


Let us proceed with a few preliminary heuristic considerations. A useful heuristic definition is that the increments of a Brownian motion, $dW_t$, can be thought of as being equal to $W_{t+\Delta t} - W_t$ as $\Delta t \to 0$. We may think of the increments $\Delta W \equiv W_{t+\Delta t} - W_t$ as being normally distributed, $N(0, \Delta t)$. Heuristically, indeed, $dW_t \sim N(0, dt)$. But then, by the previous normality property of $\Delta W$, $E(\Delta W) = 0$ and $E(\Delta W^2) = \Delta t$, and $\Delta W^2 / \Delta t \sim \chi^2(1)$, hence $var(\Delta W^2) = 2\,\Delta t^2$, where the last equality follows by the property of $\chi^2$ distributions.
The point of the previous computations is that for small $\Delta t$, the variance of $\Delta W^2$, which is proportional to $\Delta t^2$, is negligible if compared to its expectation, which is $\Delta t$. Heuristically, $var\left((dW_t)^2\right) \approx 0$ and $(dW_t)^2 \approx dt$. These heuristic considerations lead to the following, celebrated table below.

Itô's multiplication table
$(dt)^\nu = 0$, for $\nu > 1$
$dt \cdot dW = 0$
$(dW)^2 = dt$
$(dW)^\nu = 0$, for $\nu > 2$
$dW_1 \cdot dW_2 = 0$, for two independent Brownian motions

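We now use this table, and heuristically derive Itô's lemma. Let $X$ be the solution to,
$$dX_t = b_t\,dt + \sigma_t\,dW_t,$$
and suppose we are given a function $f(t, X)$, which we assume to be as differentiable in $(t, x)$ as many times as needed below. We expand $f$ as follows:
$$df(t, X) = f_t\,dt + f_x\,dX + \frac{1}{2} f_{tt}\,(dt)^2 + f_{tx}\,dt\,dX + \frac{1}{2} f_{xx}\,(dX)^2 + \text{Remainder},$$
where the remainder contains only terms of order higher than $(dX)^2$ and $(dt)^2$. So for reasons which will be clear in one moment we will discard it. We have,
$$df = f_t\,dt + f_x (b\,dt + \sigma\,dW) + \frac{1}{2} f_{tt}\,(dt)^2 + f_{tx}\,dt\,(b\,dt + \sigma\,dW) + \frac{1}{2} f_{xx}\,(b\,dt + \sigma\,dW)^2.$$
By the Itô's multiplication table,
$$(b\,dt + \sigma\,dW)^2 = b^2 \underbrace{(dt)^2}_{=0} + \sigma^2 \underbrace{(dW)^2}_{=dt} + 2 b \sigma \underbrace{dt\,dW}_{=0} = \sigma^2\,dt.$$
By rearranging terms,
$$df = \left(f_t + b f_x + \frac{1}{2}\sigma^2 f_{xx}\right) dt + \sigma f_x\,dW.$$
This is Itô's lemma.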
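A standard worked example may help fix ideas. It is not part of the original text and assumes a process with proportional drift and diffusion, $dX_t = \mu X_t\,dt + \nu X_t\,dW_t$, where $\mu$ and $\nu$ are hypothetical constants. Applying Itô's lemma to $f(t, x) = \ln x$, for which $f_t = 0$, $f_x = 1/x$ and $f_{xx} = -1/x^2$, gives:

```latex
d\ln X_t
= \left( \mu X_t \cdot \frac{1}{X_t} - \frac{1}{2}\,\nu^2 X_t^2 \cdot \frac{1}{X_t^2} \right) dt
  + \nu X_t \cdot \frac{1}{X_t}\, dW_t
= \left( \mu - \frac{1}{2}\nu^2 \right) dt + \nu\, dW_t
```

The correction term $-\frac{1}{2}\nu^2$ is exactly what the ordinary chain rule would miss, and it arises from the rule $(dW)^2 = dt$ in the multiplication table.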
Naturally, Itô's lemma also holds when $X$ is a multidimensional process. A heuristic derivation of it can be obtained through the Itô's multiplication table applied to the following expansion:
$$df(t, X) = f_t\,dt + \sum_i f_{x_i}\,dX_i + \frac{1}{2}\sum_i \sum_j f_{x_i x_j}\,dX_i\,dX_j.$$


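Then, we have:

Theorem 4A.8. (Itô's lemma, multidimensional) Let us be given a multidimensional process $X$ solution to,
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \quad (4A.9)$$
where $b$ is in $\mathbb{R}^n$, $\sigma$ is in $\mathbb{R}^{n \times d}$ and $W$ is a $d$-dimensional vector of independent Brownian motions. Moreover, let us be given a function $f(t, x)$, which is twice continuously differentiable in $x$ and differentiable in $t$. Then $f$ is an Itô's process, solution to:
$$df(t, X_t) = \mathcal{L} f(t, X_t)\,dt + f_x(t, X_t)\,\sigma(t, X_t)\,dW_t,$$
or more formally,
$$f(t, X_t) = f(0, X_0) + \int_0^t \mathcal{L} f(s, X_s)\,ds + \int_0^t f_x(s, X_s)\,\sigma(s, X_s)\,dW_s, \quad (4A.10)$$
where
$$\mathcal{L} f(t, x) = f_t(t, x) + f_x(t, x)\,b(t, x) + \frac{1}{2}\mathrm{Tr}\left(\sigma\sigma^\top(t, x)\,f_{xx}(t, x)\right),$$
and $f_x(t, x)$ and $f_{xx}(t, x)$ are the gradient and Hessian of $f$ with respect to $x$.

Note that by Eq. (4A.10), and provided $f_x \sigma \in H^2$, $f$ is a martingale whenever $\mathcal{L} f(t, x) = 0$, for all $(t, x)$. As a matter of terminology, the operator $\mathcal{A} f(x) = f_x(x)\,b + \frac{1}{2}\mathrm{Tr}\left(\sigma\sigma^\top f_{xx}(x)\right)$ is usually referred to as the infinitesimal generator of the diffusion process in Eq. (4A.9).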
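The martingale condition stated after the theorem can be probed numerically in the scalar case. The following sketch approximates the drift operator f_t + b f_x + (1/2) sigma^2 f_xx by central finite differences; it assumes a one-dimensional state, and the function names, step size and test functions are illustrative choices.

```python
import math


def drift_operator(f, b, sigma, t, x, h=1e-4):
    """Scalar f_t + b f_x + 0.5 sigma^2 f_xx, by central finite differences."""
    f_t = (f(t + h, x) - f(t - h, x)) / (2.0 * h)
    f_x = (f(t, x + h) - f(t, x - h)) / (2.0 * h)
    f_xx = (f(t, x + h) - 2.0 * f(t, x) + f(t, x - h)) / (h * h)
    return f_t + b(t, x) * f_x + 0.5 * sigma(t, x) ** 2 * f_xx


def zero_drift(t, x):
    return 0.0


def unit_diffusion(t, x):
    return 1.0


def squared_minus_time(t, x):
    return x * x - t


def exponential_martingale(t, x):
    return math.exp(x - 0.5 * t)


if __name__ == "__main__":
    # With b = 0 and sigma = 1 the state is a Brownian motion.
    print(drift_operator(squared_minus_time, zero_drift, unit_diffusion, 0.5, 0.3))
    print(drift_operator(exponential_martingale, zero_drift, unit_diffusion, 0.5, 0.3))
    print(drift_operator(lambda t, x: x * x, zero_drift, unit_diffusion, 0.5, 0.3))
```

For a Brownian state, the first two functions have a vanishing drift term and are therefore candidates for martingales, while the plain square has drift equal to one.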


4.13 Appendix 2: Self-financed strategies, from discrete to continuous time


4.13.1 The basic dynamics
We have,
+
where
account.
We have,

1 +1

=(

2 +1

is wealth net of dividends, and

=
=

=(

is the overall value of the money market

1) 1

+(

1 1

1 2

1) 2

1 1
1 1

1)

and more generally,


=(
Now let

0 and assume that

+(

and

=(

are constant between


+

)+(

and

. We have:

Assume that
=
The budget constraint can then be written as:
=(

=(

+ (

=(

1
1

+
+

4.13.2 Models with final consumption only


We might
be interested in models where consumption takes place only at the end of the period. Let

|
= (0)
and = ( (0) ), where and are both -dimensional. Define as usual wealth as of time as . There are no dividends. A self-financing strategy satisfies,
+1 =

= 1

Therefore,
= +
= +
=

(because is self-financing)
1
= 1

or,
=

=1


Next, suppose that


(0)

(0)
1

= 1

with { } =1 given and to be defined more precisely below. The term


=

(0) (0)

(0) (0)
1
(0) (0)
1

=
=

can then be rewritten as:

+
(because is self-financing)

1
1

and we obtain
= (1 + )

or,
=

=1

Next, consider small time intervals. In the limit, we obtain:


=

Such an equation can also be arrived at by noticing that current wealth is nothing but initial wealth
plus gains from trade accumulated up to now:
Z

= 0+
0

+
=

+
+

+
+
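The statement that current wealth equals initial wealth plus the gains from trade accumulated so far can be checked in a discrete-time sketch. The code below assumes no dividends and no consumption before the terminal date, and a strategy described by portfolio weights that sum to one at each date; the function name, the price data and the weights are hypothetical.

```python
def wealth_path(prices, weights, v0):
    """Wealth generated by a self-financing strategy in discrete time.

    prices[t] lists the asset prices at date t; weights[t] lists the fractions
    of wealth invested in each asset at date t (summing to one).
    Returns the wealth path and the cumulative gains from trade.
    """
    wealth = [v0]
    gains = 0.0
    for t in range(len(prices) - 1):
        # All current wealth is reinvested: nothing is added or withdrawn.
        units = [wealth[-1] * w / q for w, q in zip(weights[t], prices[t])]
        gains += sum(u * (q1 - q0)
                     for u, q0, q1 in zip(units, prices[t], prices[t + 1]))
        wealth.append(sum(u * q1 for u, q1 in zip(units, prices[t + 1])))
    return wealth, gains


if __name__ == "__main__":
    prices = [[1.0, 10.0], [1.01, 11.0], [1.0201, 9.9]]
    weights = [[0.5, 0.5], [0.2, 0.8]]
    wealth, gains = wealth_path(prices, weights, 100.0)
    print(wealth, gains)
```

Because every rebalancing is financed out of current wealth only, terminal wealth coincides with initial wealth plus the sum of holdings times price changes.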

Now consider the sequence of problems of terminal wealth maximization:

max
[ ( )| F 1 ]
For = 1
P :
= (1 + )
s.t.
1
1 +
Even if markets are incomplete, agents can solve the sequence of problems {P } =1 as time unfolds.
Each problem can be written as:
"
!
#

(
) F 1
max
1+
1
1 +

=1

The FOC for any generic

leads to:

0(

) +1 F

( 0(

)| F )


= 0


where =
, i.e. the prices expressed in terms of the money market numeraire.
The previous relations suggest that we can define a martingale measure for the price process (expressed in terms of the money market numeraire), by defining

0(

)
=

0
( ( )| F )
F


4.14 Appendix 3: Proof of selected results


4.14.1 Proof of Theorem 4.3
As mentioned in the main text, we have that by Girsanov's theorem, Q is non-empty if and
only if Eq. (4.74) holds true. Therefore, the proof will rely on Eq. (4.74). If part. With
0, Eq.
(4.75) is:
0

>

0
0
0
1
[ 01
]. An arbitrage opportunity is
a.s., which comwhich implies, =
0
0
0
1
= 0
-a.s. (if a r.v. 0 and
() = 0, this
bined with the previous equality leaves:

0
1
means that = 0 a.s.) and, hence, -a.s. The last equality is in contradiction with Pr 0
0
0, as required by Definition 4.3.
Only if part. We combine portions of proofs in Karatzas (1997, thm. 0.2.4 pp. 6-7) and Øksendal
(1998, thm. 12.1.8b, pp. 256-257). We let:

( )={

: Eq. (4.74) has no solutions}

={
n
=

( )

( )

( ):

( )>

h i}

( )> ( ( )

( ) = 0 and

o
( )) 6= 0

and consider the following portfolio,

( ; ) =

( )> ( ( )
0

( ))

( )

for
for

( )
( )

Clearly is ( ; )-measurable, and generates, by Eq. (4.71),

Z
Z

> (

So the market has no arbitrage only if I

1
0

> (

1
0

I
)

>
0

= 0, i.e. only if Eq. (4.74) has at least one solution. k



4.14.2 Proof of Eq. (4.82).
We have:
=E

+
0

=
=

+
0

+
0

+
0

Z
Z

)
0

where we used the fact that is adapted, the law of iterated expectations, the martingale property of
, and the definition of 0 .

4.14.3 Walras's consistency tests


[This section is in progress]
First, we show that Eq. (??)
E . (??). To develop intuition, consider the two-period economy of
Chapter 2, where in the absence of arbitrage, there is a vector (interpreted as the vector of Arrow1) =
= ( 0
=
= 0
Debreu securities)
R such that > ( 1
0 ), whence
= 0 . In the continuous time model, absence of arbitrage and market completeness imply that there
exists a unique
Q such that,
Z
Z
>1
>
0 0 +
>
=1
+
0
0

That is,
0

>

=1
Plugging the solution
0

=
0

0 +

>
0

>

>

>

R
>

>1

>

0
0

1
0

When Eq. (??) holds, we have that


becomes:

>

>

>

>1
>

=
0

>

>
0


a martingale starting at zero, satisfying:

0=

=0

>

in the previous relation,

>1

(4A.11)
0

, and

= , and Eq. (4A.11)


Since ker( ) = { } then, we have that


=
a.s. for
[
] and, hence,
=
a.s. for
It is easily checked that this implies 0 = 0 -a.s. and that in fact, 0 = 0 a.s.
Next, we show that Eq. (??)
Eq. (??). When Eq. (??) holds, Eq. (4A.11) becomes:
0=

].

,
0

a martingale starting at zero. We conclude by the same arguments used in the proof of the previous
part. k


4.15 Appendix 4: The Green's function


4.15.1 Setup
In Section 4.5, it is shown that in frictionless markets, the value of a security as of time

Z
)=
(
)+
(
)
(

is:
(4A.12)

is the stochastic discount factor,

where

The Arrow-Debreu state price density is:


=

Our aim is to characterize this density in terms of partial differential equations. By the same reasoning
produced in Section 4.5, Eq. (4A.12) can be rewritten as:

Z
0
0 00
0( )
)=E
(
) (
)+
( ) (
)
(4A.13)
(
00
0( )
( (

Next, consider the state vector,


neutral density of . We have,

)=E
(
) (
(
=
If

) and

(
Z

)+

) (

Assuming the same for ,

The function

),

( ( 0 )|

, and let

) be the risk-

) (

) (

) ( ( )| )

) (

) ( |

) are independent,

where:

) (

)=

) (

) (

) (

is known as the Green's function:


(

)=

, provided future states are


It is the value in state
R as of time of a unit of numeraire at
in a neighborhood (in R ) of . It is thus the Arrow-Debreu state-price density.
For example, a pure discount bond has (
) = 1 , and ( ) = 1
, and
Z
)=
(
;
)
with lim (
;
)= (
)
(
where

is the Dirac's delta.



4.15.2 The PDE connection

We show the Green's function satisfies the same partial differential equation (PDE) satisfied by the
security price, but with a different boundary condition, and with the instantaneous dividend taken
out. We have:
Z
Z Z
)=
(
;
) (
)
+
(
;
) (
)
(4A.14)
(
Consider the scalar case. By Eq. (4A.13), and the Feynman-Kac connection between PDEs and conditional expectations reviewed in Section 4.2, we have that under regularity conditions, is solution
to:
1 2
+
+
(4A.15)
0= +
2
where is the risk-neutral drift of . Next, take the following partial derivatives of ( ) in Eq.
(4A.14):
=
=
=

(
Z

and replace them into Eq. (4A.15) to obtain:


Z
1
+
+
0 =
2
Z Z
+
+
This shows that

1
+
2

is solution to
0=

1
2

with lim


)= (


4.16 Appendix 5: Portfolio constraints


We are looking for a portfolio-consumption policy (
Z
Val ( ; ) =
( ) +

such that

Val ( )

(4A.16)

for all
[0 ]. Two small remarks on notation. We remind that
and such that leads to
we are defining as in Section 4.7, (i) Val ( ) as the value of the problem an investor faces in the
.
unconstrained market in Eqs. (4.138), and (ii) as the normalized portfolio process,
The idea is really this. As Cvitanić and Karatzas (1992) put it, we simply search for a member of a family
of unconstrained problems (those arising from the market in Eqs. (4.138)), one for which the optimal
portfolio actually satisfies the constraint (i.e. without imposing it a priori), and thereby solves the
original problem.
Note that because
contains the origin, the support function in Eq. (4.136) satisfies ( ) 0
. Moreover, by construction,
for each

(4A.17)

Next, define the standard Brownian motion under the probability , defined through the Radon-Nikodym derivative in Eq. (4.140):
Z
Z

1
=
+
+
0 +

( )+

>

where = 1 ( 1 ), and 0 is the usual Brownian motion under the risk-neutral probability in a market
without frictions. If the price system is as in the artificial market of Eqs. (4.138), then, for any
unconstrained portfolio-consumption ( ), the dynamics of wealth, say, are easily seen to be:

= > (
1 )+
+ >

>
>
=
+ ( )
+
+
(4A.18)
0

where the second expression follows by a change in probability and the expressions for and given
in Eq. (4.139).
Therefore, for a given normalized portfolio-consumption ( ), we have that the wealth difference,
, satisfies:
()
0

()=
|

Next, consider, the simpler equation,

+ ( )
{z
}

>

>

=0

()

>

0 = 0

(4A.19)

Because
0 by Eq. (4A.17), then, by a comparison theorem (e.g., Karatzas and Shreve (1991, p. 291-295)),
= 0, where the last equality follows because the solution to Eq. (4A.19) is
= 0 , for some positive process . Therefore, we have,
with an equality if

( )+

>

= 0 for all

Assume now that there exists a and a portfolio-consumption pair (

and

( ) +


>

=0

),

(4A.20)

such that
(4A.21)


(Meaning that it is a portfolio chosen without imposing the constraint, which then happens to satisfy
the constraint.)
By the inequality in (4A.20), we have that Val ( ; ) Val ( ) for all and, hence,
Val ( ;

Val ( )

and Val ( ;

inf (Val ( ))

(4A.22)

Moreover, we have,
Val ( ;

)=

= Val ( )

(4A.23)

where the second line follows, because the value of the original problem is, of course, larger than that
of any constrained and not-optimally chosen portfolio-consumption ( ). In other words, note that
as soon as the second equality in Eq. (4A.21) holds true, then, the dynamics of wealth in Eq. (4A.18)
collapse to those in the original market with = 0, where the agent is constrained in
(
), such
that the second line follows, because ( ) are at this stage not optimally chosen.
The third line of (4A.23) follows by Eq. (4A.21) and (4A.20). The fourth line is the definition of
Val ( ). Combining the first inequality in (4A.22) with Eq. (4A.23) leaves,
Val ( ) = Val ( ;

) = inf (Val ( ))

where the last equality follows by the second inequality of (4A.22).


4.17 Appendix 6: Topics on jumps


4.17.1 The Radon-Nikodym derivative
This appendix derives, heuristically, results about Radon-Nikodym derivatives for jump-diffusion processes. Precise mathematical details can be found in Brémaud (1981). Consider the jump times

= . The probability of a jump in a neighborhood of


is ( ) . To
0
1
2
( ) under , and set
=
,
define the same probability under the risk-neutral world, write
for some . The probability that no-jump would occur between any two adjacent random points
1
and
and a jump would at time
2, proportional to:
1 is, for
1)

under

and to
(

1)

= (

1)

1)

under

As explained in Section 4.7, these are in fact densities of time intervals elapsing from one arrival to
the next one.
. The Radon-Nikodym derivative is the
Next, let
be the event of marks at time 1 2
likelihood ratio of the two probabilities and of :
1

( )
=
( )

( 1)
1

2
1

( 1)

( 2)

2
1

( 1)

3
2

( 2)
3
2

( 2)

where we have used the fact that given that at 0 = , there are no-jumps, the probability that
1
1
no-jumps would occur from to 1 is
under , and
under . Simple algebra
yields,
( )
=
( )
=

( 1)
Y
=1

"
Y
= exp ln
= exp

=1

ln

=1

= exp

( )

X
Z

( 2)

ln

2
1

1)

3
2

1)

1)

( )
( )

1)

1)

!#

where the last equality follows from the definition of the Stieltjes integral.
Consider, finally, the following definition. Let
be a martingale. The unique solution to the equation:
Z
=1+

is named the Doléans-Dade exponential semimartingale and is denoted as E( ). We now turn to the
arbitrage restrictions arising whilst dealing with asset prices driven by jump-diffusion processes.



4.17.2 Arbitrage restrictions
As in the main text, let now

be the price of a primitive asset, solution to:


=

+ S

+ S(

=( + S )
Next, dene

=
Both and are

-martingales. We have:
=

+ S(

+ S

)+ S

+ S

The characterization of the equivalent martingale measure for the discounted price is given by the
following Radon-Nikodym density of with respect to :
=E

1 (

where E () is the Doléans-Dade exponential semimartingale, and so:


= +

S (S)

= +

S (S)

Clearly, markets are incomplete here. It is possible to show that if S is deterministic, a representative
1
agent with utility function ( ) = 1 1 makes (S) = (1 + S) .

4.17.3 State price density: introduction


We have:
= exp

ln

The objective here is to use Itô's lemma for jump processes to express
the jump process as:
Z
Z

1
+
ln
In terms of ,

is

= ( ) with ( ) =

or,

. We have:

in differential form. Define

+
+

+ jump

ln

1 (

The general case (with stochastic distribution) is covered in the following subsection.



4.17.4 State price density: general case
Assume that the primitive is:
=
and let

denote the price of a derivative. Introduce the


=

+
-martingale,

By Itô's lemma for jump-diffusion processes,


(

)
=
)

=
where
finally:

)
(

)+ (

is the generator for pure diffusion processes and,


(

)
(

)
)

To generalize the steps made to deal with the standard diffusion case, let
=

We wish to find restrictions on both and , such that both and are


the jump component for the state price density :
=

We shall show that:


=
Note that in this case,
=

Finally, by the

+
+ (1 +

= 1.

=1

be

(1 +

-martingale property of the discounted ,


=

where

1+

a clear generalization of the pure diffusion case.


As for the derivative price:
=(

-martingales. Let

)=

is taken with respect to the jump-size distribution, which is the same under


and


. As usual, the state-price density has to be a -martingale in order


Proof that
= 1+
to be able to price bonds (in addition to all other assets). In addition, clearly depends on
and
. Therefore, it satisfies:
=
in

We wish to find

such that is a

=1

-martingale, viz

= E ( )
i.e.,
E( ) =

i.e.,
is a

-martingale.

By Itô's lemma,
( ) =
=
=
=
Because ,

are

and

+ +

+
+ [|
+

-martingales,
Z
, 0=

But
=
and since (

)2 =

+
,

and the previous condition collapses to:


Z
, 0=
which implies
=

1+


]+

) =

{z

}+

i
)

a.s.


References
Arnold, L. (1974): Stochastic Differential Equations: Theory and Applications. New York: Wiley.
Black, F. and M. Scholes (1973): The Pricing of Options and Corporate Liabilities. Journal of Political Economy 81, 637-659.
Brémaud, P. (1981): Point Processes and Queues: Martingale Dynamics. Berlin: Springer Verlag.
Cvitanić, J. and I. Karatzas (1992): Convex Duality in Constrained Portfolio Optimization. Annals of Applied Probability 2, 767-818.
Föllmer, H. and M. Schweizer (1991): Hedging of Contingent Claims under Incomplete Information. In: Davis, M. and R. Elliott (Editors): Applied Stochastic Analysis. New York: Gordon & Breach, 389-414.
Friedman, A. (1975): Stochastic Differential Equations and Applications (Vol. I). New York: Academic Press.
Harrison, J.M. and S. Pliska (1983): A Stochastic Calculus Model of Continuous Trading: Complete Markets. Stochastic Processes and Their Applications 15, 313-316.
Harrison, J.M., R. Pitbladdo and S.M. Schaefer (1984): Continuous Price Processes in Frictionless Markets Have Infinite Variation. Journal of Business 57, 353-365.
He, H. and N. Pearson (1991): Consumption and Portfolio Policies with Incomplete Markets and Short-Sales Constraints: The Infinite Dimensional Case. Journal of Economic Theory 54, 259-304.
Jeanblanc-Picqué, M. and A.N. Shiryaev (1995): Optimization of the Flow of Dividends. Russian Mathematical Surveys 50, 257-277.
Karatzas, I. and S.E. Shreve (1991): Brownian Motion and Stochastic Calculus. New York: Springer Verlag.
Krugman, P. (1991): Target Zones and Exchange Rate Dynamics. Quarterly Journal of Economics 106, 669-682.
McDonald, R.L. and D.R. Siegel (1986): The Value of Waiting to Invest. Quarterly Journal of Economics 101, 707-727.
Mikosch, T. (1998): Elementary Stochastic Calculus with Finance in View. Singapore: World Scientific.
Revuz, D. and M. Yor (1999): Continuous Martingales and Brownian Motion. New York: Springer Verlag.
Shreve, S. (1991): A Control Theorist's View of Asset Pricing. In: Davis, M. and R. Elliott (Editors): Applied Stochastic Analysis. New York: Gordon & Breach, 415-445.

Steele, J.M. (2001): Stochastic Calculus and Financial Applications. New York: Springer Verlag.


5
Taking models to data

5.1 Introduction
This chapter surveys methods to estimate and test dynamic models of asset prices. It begins
with foundational issues on identification, specification and testing. Then, it surveys classical
estimation and testing methodologies such as the Method of Moments, where the number of
moment conditions equals the dimension of the parameter vector (Pearson, 1894); Maximum
Likelihood (ML) (Gauss, 1816; Fisher, 1912); the Generalized Method of Moments (GMM),
where the number of moment conditions exceeds the dimension of the parameter vector, leading
to the minimum chi-squared (Neyman and Pearson, 1928; Hansen, 1982); and, finally, the
relatively more recent developments relying on simulations, which aim to implement ML and
GMM estimation for models that are analytically quite complex, but that can be simulated.
The chapter concludes with an illustration of how joint estimation of fundamentals and asset
prices in arbitrage-free models can asymptotically lead to statistical efficiency.

5.2 Data generating processes


5.2.1 Basics
Given is a multidimensional stochastic process $y_t$, a data generating process (DGP). While
we do not know the probability distribution underlying $y_t$, we use the available data to get
insights into its nature. A few definitions. A DGP is a conditional law, say the law of $y_t$ given
the set of past values $y^{t-1} = \{y_{t-1}, y_{t-2}, \ldots\}$, and some exogenous variable $z$, with
$z^{t} = \{z_t, z_{t-1}, z_{t-2}, \ldots\}$,

$$\text{DGP}: \ \ell_0(y_t \mid x_t),$$

where $x_t = (y^{t-1}, z^{t})$, and $\ell_0$ denotes the conditional density of the data, the true law. Then,
we have three basic definitions. First, we define a parametric model as a set of conditional laws
for $y_t$, indexed by a parameter vector $\theta \in \Theta \subseteq \mathbb{R}^{p}$,

$$(M) = \{\ell(y_t \mid x_t; \theta), \ \theta \in \Theta \subseteq \mathbb{R}^{p}\}.$$



Second, we say that the model $(M)$ is well-specified if,

$$\exists\, \theta_0 \in \Theta : \ \ell(y_t \mid x_t; \theta_0) = \ell_0(y_t \mid x_t).$$

Third, we say that the model $(M)$ is identifiable if $\theta_0$ is unique. The main goal of this chapter is
to review tools aimed at drawing inference about the true parameter $\theta_0$, given the observations.
5.2.2 Restrictions on the DGP
The previous definition of DGP is too rich to be of practical relevance. This chapter deals
with estimation methods applying to DGPs satisfying a few restrictions. Two fundamental
restrictions are usually imposed on the DGP:
Restrictions on the heterogeneity of the stochastic process, which lead to stationary random processes.
Restrictions on the memory of the stochastic process, which pave the way to ergodic processes.
5.2.2.1 Stationarity

Stationary processes describe phenomena leading to long run equilibria, in some statistical
sense: as time unfolds, the probability generating the observations settles down to some long-run
probability density, a time invariant probability. As Chapter 3 explains, in the early
1980s, theorists began to define a long-run equilibrium as a well-defined stationary probability
distribution generating economic outcomes. We have two notions of stationarity: (i) Strong,
or strict, stationarity. Definition: Homogeneity in law; (ii) Weak stationarity, or stationarity of
order $p$. Definition: Homogeneity in moments.
Even with stationary DGP, there might be situations where the number of parameters to
be estimated increases with the sample size. As an example, consider two stochastic processes:
one, for which $\mathrm{cov}(y_t, y_{t+\tau}) = \tau^{2}$; and another, for which $\mathrm{cov}(y_t, y_{t+\tau}) = \exp(-|\tau|)$. In both
cases, the DGP is stationary. Yet for the first process, the dependence increases with $\tau$, and
for the second, the dependence decreases with $\tau$. As this simple example reveals, a stationary
stochastic process may have long memory. Ergodicity further restricts DGP, so as to make
this memory play a more limited role.
5.2.2.2 Ergodicity

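We shall deal with DGPs where the dependence between $y_{t_1}$ and $y_{t_2}$ decreases with $|t_2 - t_1|$.
To introduce some concepts and notation, say two events $A$ and $B$ are independent, when
$P(A \cap B) = P(A)P(B)$. A stochastic process is asymptotically independent if, for some function $\beta_\tau$ satisfying

$$\beta_\tau \geq \left| F(y_{t_1}, \ldots, y_{t_n}, y_{t_1+\tau}, \ldots, y_{t_n+\tau}) - F(y_{t_1}, \ldots, y_{t_n})\, F(y_{t_1+\tau}, \ldots, y_{t_n+\tau}) \right|,$$

we also have that $\lim_{\tau\to\infty}\beta_\tau = 0$. A stochastic process is $p$-dependent if $\beta_\tau \neq 0$ only for lags $\tau$ up to $p$.
A stochastic process is asymptotically uncorrelated if there exists $\rho_\tau$ such that for all $t$,

$$\mathrm{cov}(y_t, y_{t+\tau}) \leq \rho_\tau \sqrt{\mathrm{var}(y_t)\,\mathrm{var}(y_{t+\tau})},$$

and that $0 \leq \rho_\tau \leq 1$ with $\sum_{\tau=0}^{\infty}\rho_\tau < \infty$. For example, $\rho_\tau = \tau^{-(1+\delta)}$, $\delta > 0$, in which case $\rho_\tau \downarrow 0$ as $\tau \uparrow \infty$.
Let $\mathcal{B}_1^{t}$ denote the $\sigma$-algebra generated by $\{y_1, \ldots, y_t\}$ and $A \in \mathcal{B}_{-\infty}^{t}$, $B \in \mathcal{B}_{t+\tau}^{\infty}$, and define:

$$\alpha_\tau = \sup_{A,B} \left| P(A \cap B) - P(A)P(B) \right|, \qquad \varphi_\tau = \sup_{A,B} \left| P(B \mid A) - P(B) \right|, \quad P(A) > 0.$$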


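We say that (i) $y$ is strongly mixing, or $\alpha$-mixing, if $\lim_{\tau\to\infty}\alpha_\tau = 0$; (ii) $y$ is uniformly mixing
if $\lim_{\tau\to\infty}\varphi_\tau = 0$. Clearly, a uniformly mixing process is also strongly mixing. A second order
stationary process is ergodic if $\lim_{T\to\infty}\frac{1}{T}\sum_{\tau=1}^{T}\mathrm{cov}(y_t, y_{t+\tau}) = 0$. If a second order stationary
process is strongly mixing, it is also ergodic.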
5.2.3 Parameter estimators
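Consider an estimator of the parameter vector $\theta$ of the model,

$$(M) = \{\ell(y_t \mid x_t; \theta), \ \theta \in \Theta \subseteq \mathbb{R}^{p}\}.$$

Naturally, any estimator does necessarily depend on the sample size, which we write as
$\hat\theta_T \equiv \hat\theta_T(y)$. Of a given estimator $\hat\theta_T$, we say that it is:
Correct, or unbiased, if $E(\hat\theta_T) = \theta_0$. The difference $E(\hat\theta_T) - \theta_0$ is called distortion, or bias.
Weakly consistent if $\operatorname{plim} \hat\theta_T = \theta_0$. And strongly consistent if $\hat\theta_T \overset{a.s.}{\to} \theta_0$.
Finally, an estimator $\hat\theta_T^{(1)}$ is more efficient than another estimator $\hat\theta_T^{(2)}$ if, for any vector of
constants $c$, we have that $c^{\top}\mathrm{var}(\hat\theta_T^{(1)})\,c < c^{\top}\mathrm{var}(\hat\theta_T^{(2)})\,c$.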

5.2.4 Basic properties of density functions


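We have $T$ observations $y_1^{T} = \{y_1, \ldots, y_T\}$. Suppose these observations are the realization
of a $T$-dimensional random variable with joint density, $f(\tilde y_1, \ldots, \tilde y_T; \theta) = f(\tilde y_1^{T}; \theta)$. We have
momentarily put tildes on $y_t$, to emphasize that we view each $y_t$ as a random variable.$^1$ However,
to ease notation, from now on, we write $y_t$ instead of $\tilde y_t$. By construction,
$\int f(y \mid \theta)\,dy \equiv \int\cdots\int f(y_1^{T} \mid \theta)\,dy_1^{T} = 1$ or, for all $\theta \in \Theta$,

$$\int f(y; \theta)\,dy = 1.$$

Now suppose that the support of $y$ doesn't depend on $\theta$. Under regularity conditions,

$$\nabla_\theta \int f(y; \theta)\,dy = \int \nabla_\theta f(y; \theta)\,dy = 0_p,$$

where $0_p$ is a column vector of zeros in $\mathbb{R}^{p}$. Moreover, for all $\theta \in \Theta$,

$$0_p = \int \nabla_\theta f(y; \theta)\,dy = \int [\nabla_\theta \ln f(y; \theta)]\, f(y; \theta)\,dy = E_\theta[\nabla_\theta \ln f(y; \theta)]. \qquad (5.1)$$

Finally, we have,

$$0_{p\times p} = \nabla_\theta \int [\nabla_\theta \ln f(y; \theta)]\, f(y; \theta)\,dy = \int [\nabla_{\theta\theta} \ln f(y; \theta)]\, f(y; \theta)\,dy + \int |\nabla_\theta \ln f(y; \theta)|_2\, f(y; \theta)\,dy,$$

where $|x|_2$ denotes the outer product, i.e. $|x|_2 = x \cdot x^{\top}$. Hence, by Eq. (5.1),

$$E_\theta[\nabla_{\theta\theta} \ln f(y; \theta)] = -E_\theta |\nabla_\theta \ln f(y; \theta)|_2 = -\mathrm{var}_\theta[\nabla_\theta \ln f(y; \theta)] \equiv -J(\theta).$$

The matrix $J$ is known as the Fisher's information matrix.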
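The two score properties above (zero mean, and variance equal to minus the expected Hessian) can be checked numerically. The following is a minimal sketch, assuming a Poisson density with mean $\theta$ as an illustrative model; the Poisson choice, the truncation point `y_max` and all function names are assumptions made for this example and do not come from the text.

```python
import math

def poisson_pmf(y, theta):
    # f(y; theta) = theta^y exp(-theta) / y!, computed in logs for stability
    return math.exp(y * math.log(theta) - theta - math.lgamma(y + 1))

def score(y, theta):
    # d/d theta of ln f(y; theta) = y ln(theta) - theta - ln(y!)
    return y / theta - 1.0

def hessian(y, theta):
    # second derivative of ln f(y; theta) with respect to theta
    return -y / theta ** 2

def expectation(g, theta, y_max=200):
    # E_theta[g(y, theta)], truncating the infinite sum at y_max (an assumption)
    return sum(g(y, theta) * poisson_pmf(y, theta) for y in range(y_max + 1))

theta = 3.0
mean_score = expectation(score, theta)                                 # should be 0
var_score = expectation(lambda y, th: score(y, th) ** 2, theta)        # E|score|^2
fisher_info = -expectation(hessian, theta)                             # -E[Hessian]
```

For this model the information is $J(\theta) = 1/\theta$ in closed form, so `var_score` and `fisher_info` should both be close to `1 / theta`.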


1 Therefore, we follow a classical perspective. A Bayesian statistician would view the sample as given. We do not review Bayesian
methods in this chapter.



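5.2.5 The Cramér-Rao lower bound

Let $\hat\theta(y)$ be some unbiased estimator of $\theta$, and set the dimension of the parameter space to $p = 1$.
We have,

$$E[\hat\theta(y)] = \int \hat\theta(y)\, f(y; \theta)\,dy.$$

Under regularity conditions,

$$\nabla_\theta E[\hat\theta(y)] = \int \hat\theta(y)\,[\nabla_\theta \ln f(y; \theta)]\, f(y; \theta)\,dy = \mathrm{cov}(\hat\theta(y), \nabla_\theta \ln f(y; \theta)),$$

where the last equality holds because the score has zero expectation. By the Cauchy-Schwarz inequality,
$[\mathrm{cov}(\hat\theta(y), \nabla_\theta \ln f(y; \theta))]^{2} \leq \mathrm{var}[\hat\theta(y)] \cdot \mathrm{var}[\nabla_\theta \ln f(y; \theta)]$. Therefore,

$$[\nabla_\theta E(\hat\theta(y))]^{2} \leq \mathrm{var}[\hat\theta(y)] \cdot \mathrm{var}[\nabla_\theta \ln f(y; \theta)].$$

But if $\hat\theta(y)$ is unbiased, or $E[\hat\theta(y)] = \theta$, then $\nabla_\theta E[\hat\theta(y)] = 1$, and

$$\mathrm{var}[\hat\theta(y)] \geq \frac{1}{\mathrm{var}[\nabla_\theta \ln f(y; \theta)]} = J(\theta)^{-1}.$$

This is the celebrated Cramér-Rao bound. The same result holds in the multidimensional
case, through a mere change in notation (see, e.g., Amemiya, 1985, p. 14-17).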

5.3 Maximum likelihood estimation


5.3.1 Basics

The density of the data, $f(y_1^{T}; \theta)$, maps every possible sample and parameter values of $\theta$ on
to positive numbers, the likelihood of occurrence of any given sample, given the parameter
$\theta$: $\mathbb{R}^{nT} \times \Theta \mapsto \mathbb{R}_{+}$. We trace the joint density of the entire sample through a thought experiment,
in which we change the sample $y_1^{T}$. So the sample is viewed as the realization of a random
variable, a view opposite to the Bayesian perspective. We ask: Which value of $\theta$ makes the
sample we observed the most likely to have occurred? We introduce the likelihood function,
$L(\theta \mid y_1^{T}) \equiv f(y_1^{T}; \theta)$. It is the function $\theta \mapsto f(y; \theta)$ for $y_1^{T}$ given and equal to $\bar y$, say:

$$L(\theta \mid \bar y) \equiv f(\bar y; \theta).$$

Then, we maximize $L(\theta \mid y_1^{T})$ with respect to $\theta$. That is, we look for the value of $\theta$ which
maximizes the probability of observing the sample we have actually observed. The resulting
estimator is called maximum likelihood estimator (MLE). As we shall see, the MLE attains the
Cramér-Rao lower bound, provided the model is not misspecified.
5.3.2 Factorizations
Consider a series of events $\{A_t\}$. In the Appendix, we show that,

$$\Pr\left(\bigcap_{t=1}^{T} A_t\right) = \prod_{t=1}^{T} \Pr\left(A_t \,\Big|\, \bigcap_{i=1}^{t-1} A_i\right). \qquad (5.2)$$



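By Eq. (5.2), then, the MLE satisfies:

$$\hat\theta_T = \arg\max_{\theta\in\Theta} L_T(\theta) = \arg\max_{\theta\in\Theta} \ln L_T(\theta),$$

where

$$\ln L_T(\theta) \equiv \ln \prod_{t=1}^{T} f(y_t \mid y_1^{t-1}; \theta) = \sum_{t=1}^{T} \ln f(y_t \mid y_1^{t-1}; \theta) \equiv \sum_{t=1}^{T} \ell_t(\theta)$$

and, assuming IID data,

$$\ln L_T(\theta) = \sum_{t=1}^{T} \ln f(y_t; \theta) \equiv \sum_{t=1}^{T} \ell_t(\theta), \qquad (5.3)$$

and $\ell_t(\theta)$ is the log-likelihood of a single observation.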

5.3.3 Asymptotic properties


We consider the i.i.d. case only, as in Eq. (5.3). Moreover, we provide heuristic arguments,
leaving more rigorous proofs and general results in the Appendix.
5.3.3.1 The limiting problem

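The MLE satisfies the following first order conditions,

$$0_p = \nabla_\theta \ln L_T(\theta)\big|_{\theta = \hat\theta_T} \equiv \nabla_\theta \ln L_T(\hat\theta_T).$$

Consider a Taylor expansion of the first order conditions around $\theta_0$,

$$0_p = \nabla_\theta \ln L_T(\hat\theta_T) \overset{d}{=} \nabla_\theta \ln L_T(\theta_0) + \nabla_{\theta\theta} \ln L_T(\theta_0)(\hat\theta_T - \theta_0), \qquad (5.4)$$

where the notation $x_T \overset{d}{=} y_T$ means that the difference $x_T - y_T = o_p(1)$, and $\theta_0$ is defined as the
solution to the limiting problem,

$$\theta_0 = \arg\max_{\theta} \lim_{T\to\infty} \frac{1}{T} \ln L_T(\theta) = \arg\max_{\theta} \left[E(\ell(\theta))\right],$$

and, finally, $L$ satisfies regularity conditions needed to ensure that,

$$E_{\theta_0}[\nabla_\theta \ell(\theta_0)] = 0_p.$$

To show that this is indeed the solution, suppose $\theta_0$ is identified; that is, $\theta \neq \theta_0$ and
$\theta, \theta_0 \in \Theta$ imply $f(y \mid \theta) \neq f(y \mid \theta_0)$. Suppose, further, that for each $\theta \in \Theta$,
$E_{\theta_0}[\ln f(y \mid \theta)] < \infty$. Then, we have that $\theta_0 = \arg\max_{\theta\in\Theta} E_{\theta_0}[\ln f(y \mid \theta)]$, and this value of $\theta$ is unique. The proof is, indeed,
very simple. By Jensen's inequality, we have,

$$E_{\theta_0}\left[\ln \frac{f(y \mid \theta)}{f(y \mid \theta_0)}\right] < \ln E_{\theta_0}\left[\frac{f(y \mid \theta)}{f(y \mid \theta_0)}\right] = \ln \int \frac{f(y \mid \theta)}{f(y \mid \theta_0)}\, f(y \mid \theta_0)\,dy = \ln \int f(y \mid \theta)\,dy = 0.$$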


5.3.3.2 Consistency and asymptotic normality

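Provided the model is well-specified, we have that $\hat\theta_T \overset{p}{\to} \theta_0$ and even $\hat\theta_T \overset{a.s.}{\to} \theta_0$, under regularity
conditions. One example of conditions required to obtain weak consistency is that the following
uniform weak law of large numbers holds, for every $\epsilon > 0$,

$$\lim_{T\to\infty} \Pr\left\{\sup_{\theta\in\Theta} \left|\ell_T(\theta) - E(\ell(\theta))\right| > \epsilon\right\} = 0, \qquad \ell_T(\theta) \equiv \frac{1}{T}\sum_{t=1}^{T}\ell_t(\theta).$$

Next, consider again the asymptotic expansion in Eq. (5.4), which can be elaborated, so as to
have,

$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} -\left[\frac{1}{T}\nabla_{\theta\theta}\ln L_T(\theta_0)\right]^{-1} \frac{1}{\sqrt{T}}\nabla_\theta \ln L_T(\theta_0) = -\left[\frac{1}{T}\sum_{t=1}^{T}\nabla_{\theta\theta}\ell_t(\theta_0)\right]^{-1} \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta \ell_t(\theta_0).$$

By the law of large numbers reviewed in the Appendix (weak law no. 1),

$$\frac{1}{T}\sum_{t=1}^{T}\nabla_{\theta\theta}\ell_t(\theta_0) \overset{p}{\to} E_{\theta_0}[\nabla_{\theta\theta}\ell_t(\theta_0)] = -J(\theta_0).$$

Therefore, asymptotically,

$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} J(\theta_0)^{-1} \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta \ell_t(\theta_0).$$

We also have,

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta \ell_t(\theta_0) \overset{d}{\to} N(0, J(\theta_0)).$$

Indeed, let $\nabla_\theta \bar\ell_T(\theta_0) = \frac{1}{T}\sum_{t=1}^{T}\nabla_\theta \ell_t(\theta_0)$, and note that $E(\nabla_\theta \ell_t(\theta_0)) = 0$. Then, by the central
limit theorem reviewed in the Appendix:

$$\frac{\nabla_\theta \bar\ell_T(\theta_0) - E(\nabla_\theta \bar\ell_T(\theta_0))}{\sqrt{\mathrm{var}[\nabla_\theta \bar\ell_T(\theta_0)]}} = \frac{\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta \ell_t(\theta_0)}{\sqrt{\mathrm{var}[\nabla_\theta \ell_t(\theta_0)]}} \overset{d}{\to} N(0, 1),$$

where, for each $t$, $\mathrm{var}[\nabla_\theta \ell_t(\theta_0)] = J(\theta_0)$.
Finally, by Slutsky's theorem reviewed in the Appendix,

$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{\to} N\left(0, J(\theta_0)^{-1}\right).$$

Therefore, the ML estimator attains the Cramér-Rao lower bound.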
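The limiting result can be illustrated by simulation. Below is a minimal Monte Carlo sketch, assuming an exponential density $f(y; \lambda) = \lambda e^{-\lambda y}$, for which the MLE is the reciprocal of the sample mean and $J(\lambda) = 1/\lambda^{2}$. The model, sample sizes, seed and function names are assumptions chosen for the illustration, not part of the text.

```python
import math
import random
import statistics

def mle_exponential_rate(sample):
    # First order condition of sum(ln(lam) - lam * y) gives lam_hat = T / sum(y)
    return len(sample) / sum(sample)

def simulate_scaled_errors(lam0, T, n_rep, seed=0):
    # Draws n_rep samples of size T and returns sqrt(T) * (lam_hat - lam0) for each
    rng = random.Random(seed)
    errors = []
    for _ in range(n_rep):
        sample = [rng.expovariate(lam0) for _ in range(T)]
        errors.append(math.sqrt(T) * (mle_exponential_rate(sample) - lam0))
    return errors

lam0 = 2.0
errors = simulate_scaled_errors(lam0, T=400, n_rep=2000)
mc_variance = statistics.pvariance(errors)   # Monte Carlo variance of sqrt(T)(lam_hat - lam0)
cramer_rao = lam0 ** 2                        # J(lam0)^{-1}, since J(lam) = 1 / lam^2
```

With these settings `mc_variance` should be close to `cramer_rao`, up to simulation noise and a small finite-sample bias.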


5.4 M-estimators
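Consider a function $g$ of the unknown parameters $\theta$. Given a function $\Psi$, a M-estimator of the
function $g(\theta)$ is the solution to,

$$\max_{g \in G} \sum_{t=1}^{T} \Psi(x_t, y_t; g),$$

where $y$ and $x$ are as in Section 5.2.1. We assume that a solution to this problem exists, that it
is interior and that it is unique. Let us denote the M-estimator with $\hat g_T(x_1^{T}, y_1^{T})$. Naturally, the
M-estimator satisfies the following first order conditions,

$$0 = \frac{1}{T}\sum_{t=1}^{T} \nabla_g \Psi(x_t, y_t; \hat g_T(x_1^{T}, y_1^{T})).$$

To simplify the presentation, we assume that $(x_t, y_t)$ are independent in time, and that they have
the same law. By the law of large numbers,

$$\frac{1}{T}\sum_{t=1}^{T} \Psi(x_t, y_t; g) \overset{p}{\to} \iint \Psi(x, y; g)\,dF(x, y) = \iint \Psi(x, y; g)\,dF(y \mid x)\,dZ(x) \equiv E_x E_0[\Psi(x, y; g)],$$

where $E_0$ is the expectation operator taken with respect to the true conditional law of $y$ given $x$
and $E_x$ is the expectation operator taken with respect to the true marginal law of $x$. The
limit problem is,

$$g_\infty = g_\infty(\theta_0) = \arg\max_{g \in G} E_x E_0[\Psi(x, y; g)].$$

Under standard regularity conditions,$^2$ there exists a sequence of M-estimators $\hat g_T(x, y)$ converging a.s. to
$g_\infty = g_\infty(\theta_0)$. Under additional regularity conditions, the M-estimator is also
asymptotically normal:

Theorem 5.1: Let $I \equiv E_x E_0\left\{\nabla_g \Psi(x, y; g_\infty(\theta_0))\,[\nabla_g \Psi(x, y; g_\infty(\theta_0))]^{\top}\right\}$ and assume that the
matrix $J \equiv E_x E_0[-\nabla_{gg} \Psi(x, y; g)]$ exists and has an inverse. We have,

$$\sqrt{T}(\hat g_T - g_\infty(\theta_0)) \overset{d}{\to} N\left(0, J^{-1} I J^{-1}\right).$$

Sketch of the proof. The M-estimator satisfies the following first order conditions,

$$0 = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_g \Psi(x_t, y_t; \hat g_T) \overset{d}{=} \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_g \Psi(x_t, y_t; g_\infty) + \left[\frac{1}{T}\sum_{t=1}^{T}\nabla_{gg} \Psi(x_t, y_t; g_\infty)\right]\sqrt{T}(\hat g_T - g_\infty).$$

By rearranging terms,

$$\sqrt{T}(\hat g_T - g_\infty) \overset{d}{=} \left[-\frac{1}{T}\sum_{t=1}^{T}\nabla_{gg} \Psi(x_t, y_t; g_\infty)\right]^{-1} \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_g \Psi(x_t, y_t; g_\infty) \overset{d}{=} \left[E_x E_0(-\nabla_{gg} \Psi(x, y; g_\infty))\right]^{-1} \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_g \Psi(x_t, y_t; g_\infty) = J^{-1} \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_g \Psi(x_t, y_t; g_\infty).$$

By the limiting problem, $E_x E_0[\nabla_g \Psi(x, y; g_\infty)] = 0$. Then,
$\mathrm{var}[\nabla_g \Psi(x, y; g_\infty)] = E_x E_0\left\{\nabla_g \Psi\,[\nabla_g \Psi]^{\top}\right\} = I$, and, then,

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_g \Psi(x_t, y_t; g_\infty) \overset{d}{\to} N(0, I).$$

The result follows by Slutsky's theorem and the symmetry of $J$. $\|$

One simple example of M-estimator is the Nonlinear Least Squares estimator,

$$\hat\theta_T = \arg\min_{\theta\in\Theta} \sum_{t=1}^{T} [y_t - m(x_t; \theta)]^{2},$$

for some function $m$. In this case, $\Psi(x, y; \theta) = -[y - m(x; \theta)]^{2}$.

2 $G$ is compact; $\Psi$ is continuous with respect to $g$ and integrable with respect to the true law, for each $g$;
$\frac{1}{T}\sum_{t=1}^{T}\Psi(x_t, y_t; g) \overset{a.s.}{\to} E_x E_0[\Psi(x, y; g)]$ uniformly on $G$; the limit problem has a unique solution $g_\infty = g_\infty(\theta_0)$.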

5.5 Pseudo, or quasi, maximum likelihood


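The maximum likelihood estimator is an M-estimator: set $\Psi = \ln L$, the log-likelihood function.
Indeed, assume the model is well-specified, in which case $J = I$, which confirms we are back
to the MLE.
Next, suppose that we implement the MLE to estimate a model, when in fact the model is
misspecified in that the true DGP $\ell_0(y_t \mid x_t)$ does not belong to the family of laws spanned by
our model,

$$\ell_0(y_t \mid x_t) \notin (M) = \{\ell(y_t \mid x_t; \theta), \ \theta \in \Theta\}.$$

Suppose we insist in maximizing $\Psi = \ln L$, where $\ln L = \sum_{t=1}^{T} \ln \ell(y_t \mid x_t; \theta)$. In this case,

$$\sqrt{T}(\hat\theta_T - \theta_0^{*}) \overset{d}{\to} N\left(0, J^{-1} I J^{-1}\right),$$

where $\theta_0^{*}$ is the pseudo-true value,$^3$ and

$$J = -E_x E_0\left[\nabla_{\theta\theta} \ln \ell(y_t \mid x_t; \theta_0^{*})\right], \qquad I = E_x E_0\left\{\nabla_\theta \ln \ell(y_t \mid x_t; \theta_0^{*})\,\left[\nabla_\theta \ln \ell(y_t \mid x_t; \theta_0^{*})\right]^{\top}\right\}.$$

In the presence of specification errors, $J \neq I$. Comparing the two estimated matrices
therefore helps detect specification errors. Finally, note that in this general case, the variance-covariance
matrix $J^{-1} I J^{-1}$ depends on the unknown law of $(y_t, x_t)$. To assess the precision of the estimates
of $\theta$, one needs to estimate such a variance-covariance matrix. A common practice is to use
the following a.s. consistent estimators,

$$\hat J = -\frac{1}{T}\sum_{t=1}^{T}\nabla_{\theta\theta} \Psi(x_t, y_t; \hat\theta_T) \quad \text{and} \quad \hat I = \frac{1}{T}\sum_{t=1}^{T}\nabla_\theta \Psi(x_t, y_t; \hat\theta_T)\,\left[\nabla_\theta \Psi(x_t, y_t; \hat\theta_T)\right]^{\top}.$$

3 That is, $\theta_0^{*}$ is, clearly, the solution to some misspecified limiting problem. This $\theta_0^{*}$ has an appealing interpretation in terms of
some entropy distance minimizer.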
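The role of the two matrices can be seen in a deliberately simple misspecified example. The sketch below assumes a scalar pseudo-likelihood that treats the data as normal with unknown mean and unit variance, while the data are in fact generated with standard deviation 3. The model, the numbers and the function names are assumptions made for this illustration.

```python
import random

def pseudo_mle_mean(sample):
    # Maximizing sum of psi_t = -(y_t - theta)^2 / 2 gives the sample mean
    return sum(sample) / len(sample)

def sandwich_components(sample, theta_hat):
    T = len(sample)
    scores = [y - theta_hat for y in sample]      # gradient of psi_t at theta_hat
    J_hat = 1.0                                   # minus the average Hessian: d2 psi / d theta2 = -1
    I_hat = sum(s * s for s in scores) / T        # average outer product of the score
    return J_hat, I_hat

rng = random.Random(1)
sample = [rng.gauss(0.5, 3.0) for _ in range(20000)]   # true standard deviation is 3, not 1
theta_hat = pseudo_mle_mean(sample)
J_hat, I_hat = sandwich_components(sample, theta_hat)
robust_variance = I_hat / J_hat ** 2    # scalar version of J^{-1} I J^{-1}
naive_variance = 1.0 / J_hat            # what one would report if J = I were assumed
```

Here `robust_variance` should be close to 9, the true variance of the data, whereas `naive_variance` stays at 1: the gap between the estimates of $J$ and $I$ is what signals the specification error.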

5.6 GMM
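Economic theory often places restrictions on models that have the following format,

$$E[h(y_t; \theta_0)] = 0_q, \qquad (5.5)$$

where $h: \mathbb{R}^{n} \times \Theta \mapsto \mathbb{R}^{q}$, $\theta_0$ is the true parameter vector, $y_t$ is the $n$-dimensional vector of
the observable variables and $\Theta \subseteq \mathbb{R}^{p}$. Typically, then, the MLE cannot be used to estimate
$\theta_0$. Moreover, MLE requires specifying a density function. Hansen (1982) proposed the following
Generalized Method of Moments (GMM) estimation procedure. Consider the sample
counterpart to the population in Eq. (5.5),

$$\bar h(y_1^{T}; \theta) = \frac{1}{T}\sum_{t=1}^{T} h(y_t; \theta), \qquad (5.6)$$

where we have rewritten $h$ as a function of the parameter vector $\theta \in \Theta$. The basic idea of GMM
is to find a $\theta$ which makes $\bar h(y_1^{T}; \theta)$ as close as possible to zero. Precisely, we have,

Definition (GMM estimator): The GMM estimator is the sequence $\hat\theta_T$ satisfying,

$$\hat\theta_T = \arg\min_{\theta \in \Theta \subseteq \mathbb{R}^{p}} \bar h(y_1^{T}; \theta)^{\top}\, W_T\, \bar h(y_1^{T}; \theta),$$

where $\{W_T\}$ is a sequence of weighting matrices, with elements that may depend on the observations.
When $p = q$, we say the GMM is just-identified, and is, simply, the MM, satisfying:

$$\hat\theta_T : \ \bar h(y_1^{T}; \hat\theta_T) = 0_q.$$

When $p < q$, we say the GMM estimator imposes overidentifying restrictions.
We analyze the i.i.d. case only. Under regularity conditions, there exists a matrix $W_T$ that
minimizes the asymptotic variance of the GMM estimator, which satisfies asymptotically,

$$W = \left[\lim_{T\to\infty} T \cdot E\left(\bar h(y_1^{T}; \theta_0)\,\bar h(y_1^{T}; \theta_0)^{\top}\right)\right]^{-1} \equiv \Sigma_0^{-1}. \qquad (5.7)$$

An estimator of $\Sigma_0$ can be:

$$\Sigma_T = \frac{1}{T}\sum_{t=1}^{T}\left[h(y_t; \hat\theta_T)\,h(y_t; \hat\theta_T)^{\top}\right].$$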

Note that $\hat\theta_T$ depends on the weighting matrix $W_T$, and the weighting matrix $W_T$ depends on $\hat\theta_T$.
Therefore, we need to implement an iterative procedure. The more one iterates, the less likely
the final outcome depends on the initial weighting matrix $W_T^{(0)}$. For example, one can start with
$W_T^{(0)} = I_q$.
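The iteration just described can be sketched in a small overidentified example. The code assumes an exponential variable with mean $\theta$, for which $E[y] = \theta$ and $E[y^{2}] = 2\theta^{2}$, so that $q = 2$ moment conditions identify $p = 1$ parameter. It starts from the identity weighting matrix, re-estimates the weights once, and also reports $T$ times the minimized objective, the statistic used in tests of overidentifying restrictions. The model, the grid-search minimizer and every function name are assumptions made for this illustration.

```python
import random

def g_bar(m1, m2, theta):
    # Sample counterpart of E[h] for h(y; theta) = (y - theta, y^2 - 2 theta^2)
    return (m1 - theta, m2 - 2.0 * theta * theta)

def quad_form(g, W):
    # g' W g for a 2-vector g and a 2x2 matrix W
    return (g[0] * (W[0][0] * g[0] + W[0][1] * g[1])
            + g[1] * (W[1][0] * g[0] + W[1][1] * g[1]))

def inverse_2x2(S):
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]

def sigma_hat(sample, theta):
    # (1/T) sum of h(y_t; theta) h(y_t; theta)'
    T = len(sample)
    s11 = s12 = s22 = 0.0
    for y in sample:
        h1, h2 = y - theta, y * y - 2.0 * theta * theta
        s11 += h1 * h1
        s12 += h1 * h2
        s22 += h2 * h2
    return [[s11 / T, s12 / T], [s12 / T, s22 / T]]

def argmin_on_grid(objective, lo, hi, n=2001):
    # Crude but robust one-dimensional minimizer (an illustrative choice)
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(grid, key=objective)

def two_step_gmm(sample):
    T = len(sample)
    m1 = sum(sample) / T
    m2 = sum(y * y for y in sample) / T
    lo, hi = 0.5 * m1, 2.0 * m1
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Step 1: identity weighting matrix
    theta_1 = argmin_on_grid(lambda th: quad_form(g_bar(m1, m2, th), identity), lo, hi)
    # Step 2: weights set to the inverse of the estimated second-moment matrix
    W = inverse_2x2(sigma_hat(sample, theta_1))
    theta_2 = argmin_on_grid(lambda th: quad_form(g_bar(m1, m2, th), W), lo, hi)
    j_stat = T * quad_form(g_bar(m1, m2, theta_2), W)
    return theta_2, j_stat

rng = random.Random(0)
sample = [rng.expovariate(0.5) for _ in range(5000)]   # exponential with mean 2
theta_hat, j_stat = two_step_gmm(sample)
```

More iterations would repeat step 2 with the weights recomputed at the latest estimate; two steps are shown only to keep the sketch short.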
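We have:

Theorem 5.2: Suppose to be given a sequence of GMM estimators $\hat\theta_T$ with weighting matrix
as in Eq. (5.7), and such that: $\hat\theta_T \overset{p}{\to} \theta_0$. We have,

$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{\to} N\left(0_p, \left[E(h_\theta)\,\Sigma_0^{-1}\,E(h_\theta)^{\top}\right]^{-1}\right), \quad \text{where } h_\theta \equiv \nabla_\theta h(y; \theta_0).$$

Sketch of the proof: The assumption that $\hat\theta_T \overset{p}{\to} \theta_0$ is easy to check under mild regularity
conditions. Moreover, the GMM satisfies,

$$0_p = \nabla_\theta \bar h(y_1^{T}; \hat\theta_T)\, W_T\, \bar h(y_1^{T}; \hat\theta_T), \qquad (5.8)$$

where $\nabla_\theta \bar h$ is a $p \times q$ matrix. Eq. (5.8) confirms that if $p = q$, the GMM satisfies $\hat\theta_T: \bar h(y_1^{T}; \hat\theta_T) = 0$. Indeed, $\nabla_\theta \bar h\, W_T$ is
full-rank with $p = q$, and Eq. (5.8) can only be satisfied with $\bar h = 0$. In the general case, $p < q$,
we have,

$$\sqrt{T}\,\bar h(y_1^{T}; \hat\theta_T) = \sqrt{T}\,\bar h(y_1^{T}; \theta_0) + \left[\nabla_\theta \bar h(y_1^{T}; \theta_0)\right]^{\top}\sqrt{T}(\hat\theta_T - \theta_0) + o_p(1).$$

By premultiplying both sides of the previous equality by $\nabla_\theta \bar h(y_1^{T}; \hat\theta_T)\, W_T$,

$$\sqrt{T}\,\nabla_\theta \bar h(y_1^{T}; \hat\theta_T)\, W_T\, \bar h(y_1^{T}; \hat\theta_T) = \sqrt{T}\,\nabla_\theta \bar h(y_1^{T}; \hat\theta_T)\, W_T\, \bar h(y_1^{T}; \theta_0) + \nabla_\theta \bar h(y_1^{T}; \hat\theta_T)\, W_T \left[\nabla_\theta \bar h(y_1^{T}; \theta_0)\right]^{\top}\sqrt{T}(\hat\theta_T - \theta_0) + o_p(1).$$

The l.h.s. of this equality is zero by the first order conditions in Eq. (5.8). By rearranging
terms,

$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} -\left[\nabla_\theta \bar h(y_1^{T}; \hat\theta_T)\, W_T \left[\nabla_\theta \bar h(y_1^{T}; \theta_0)\right]^{\top}\right]^{-1} \nabla_\theta \bar h(y_1^{T}; \hat\theta_T)\, W_T\, \sqrt{T}\,\bar h(y_1^{T}; \theta_0) \overset{d}{=} -\left(E(h_\theta)\,\Sigma_0^{-1}\,E(h_\theta)^{\top}\right)^{-1} E(h_\theta)\,\Sigma_0^{-1}\, \frac{1}{\sqrt{T}}\sum_{t=1}^{T} h(y_t; \theta_0).$$

We have: $\frac{1}{\sqrt{T}}\sum_{t=1}^{T} h(y_t; \theta_0) \overset{d}{\to} N(0, \mathrm{var}(h))$, where, by Eq. (5.5), $E(h) = 0$, and
$\mathrm{var}(h) = E(h \cdot h^{\top}) = \Sigma_0$. Therefore, $\sqrt{T}(\hat\theta_T - \theta_0)$ is asymptotically normal with expectation $0_p$, and variance,

$$\left(E(h_\theta)\Sigma_0^{-1}E(h_\theta)^{\top}\right)^{-1} E(h_\theta)\Sigma_0^{-1}\,\Sigma_0\,\Sigma_0^{-1}E(h_\theta)^{\top} \left(E(h_\theta)\Sigma_0^{-1}E(h_\theta)^{\top}\right)^{-1} = \left(E(h_\theta)\Sigma_0^{-1}E(h_\theta)^{\top}\right)^{-1}.$$

$\|$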

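A widely used global specification test is that of the celebrated overidentifying restrictions.
Consider the following intuitive result:

$$\sqrt{T}\,\bar h(y_1^{T}; \theta_0)^{\top}\, \Sigma_0^{-1}\, \sqrt{T}\,\bar h(y_1^{T}; \theta_0) \overset{d}{\to} \chi^{2}(q).$$

Would we be expecting the same, if we were to replace the true parameter $\theta_0$ with the GMM
estimator $\hat\theta_T$, which is, anyway, a consistent estimator for $\theta_0$? The answer is no. Define:

$$C_T = \sqrt{T}\,\bar h(y_1^{T}; \hat\theta_T)^{\top}\, \Sigma_T^{-1}\, \sqrt{T}\,\bar h(y_1^{T}; \hat\theta_T).$$

We have,

$$\sqrt{T}\,\bar h(y_1^{T}; \hat\theta_T) \overset{d}{=} \sqrt{T}\,\bar h(y_1^{T}; \theta_0) + E(h_\theta)^{\top}\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} \sqrt{T}\,\bar h(y_1^{T}; \theta_0) - E(h_\theta)^{\top}\left(E(h_\theta)\Sigma_0^{-1}E(h_\theta)^{\top}\right)^{-1}E(h_\theta)\Sigma_0^{-1}\sqrt{T}\,\bar h(y_1^{T}; \theta_0) = (I_q - P_q)\sqrt{T}\,\bar h(y_1^{T}; \theta_0),$$

and

$$P_q \equiv E(h_\theta)^{\top}\left(E(h_\theta)\Sigma_0^{-1}E(h_\theta)^{\top}\right)^{-1}E(h_\theta)\Sigma_0^{-1}$$

is the orthogonal projector in the space generated by the columns of $E(h_\theta)^{\top}$ by the inner product
$\Sigma_0^{-1}$. Thus, we have shown that,

$$C_T \overset{d}{=} \sqrt{T}\,\bar h(y_1^{T}; \theta_0)^{\top}(I_q - P_q)^{\top}\,\Sigma_0^{-1}\,(I_q - P_q)\sqrt{T}\,\bar h(y_1^{T}; \theta_0).$$

But,

$$\sqrt{T}\,\bar h(y_1^{T}; \theta_0) \overset{d}{\to} N(0, \Sigma_0),$$

and, by a classical result,

$$C_T \overset{d}{\to} \chi^{2}(q - p).$$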

Hansen and Singleton (1982, 1983) started the literature on the estimation and testing of dynamic
asset pricing models within a fully articulated rational expectations framework. Consider
the classical system of Euler equations arising in the Lucas tree,

$$E\left[\beta\,\frac{u'(c_{t+1})}{u'(c_t)}\,(1 + R_{i,t+1}) - 1 \,\Big|\, \mathcal{F}_t\right] = 0, \quad i = 1, \ldots, m,$$

where $u$ is the utility function of the representative agent, $R_i$ is the return on asset $i$, $\beta$ is the
time-discount factor, $\mathcal{F}_t$ is the information set as of time $t$, and $m$ is the number of assets.
Consider the CRRA utility function, $u(c) = c^{1-\eta}/(1-\eta)$. If the model is well-specified, then,
there exist some $\beta_0$ and $\eta_0$ such that:

$$E\left[\beta_0\left(\frac{c_{t+1}}{c_t}\right)^{-\eta_0}(1 + R_{i,t+1}) - 1 \,\Big|\, \mathcal{F}_t\right] = 0, \quad i = 1, \ldots, m.$$

To sum up, the dimension of the parameter vector is $p = 2$. To estimate the true parameter
vector $\theta_0 \equiv (\beta_0, \eta_0)$, we may build up a system of orthogonality conditions. This system can
be based on projecting observable variables predicted by the model onto other variables, some
instruments included in the information set $\mathcal{F}_t$:

$$E[h(y_t; \theta_0)] = 0_q,$$

where, for some vector of $z$ instruments, say, $\mathrm{In}_t = [i_{1,t}, \ldots, i_{z,t}]^{\top}$,

$$h(y_t; \theta) = \begin{pmatrix} \left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}(1 + R_{1,t+1}) - 1\right]\mathrm{In}_t \\ \vdots \\ \left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}(1 + R_{m,t+1}) - 1\right]\mathrm{In}_t \end{pmatrix}. \qquad (5.9)$$

The instruments used to produce the orthogonality restrictions may include constants, past
values of consumption growth, $c_{t+1}/c_t$, or even past returns.
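As an illustration of how the stacked moment vector in Eq. (5.9) might be assembled, the sketch below builds the pricing errors for each asset and multiplies them by each instrument. It assumes CRRA utility with discount factor `beta` and risk aversion `eta`; the function names and the synthetic data (returns constructed so that the Euler equation holds exactly at $\beta_0 = 0.95$, $\eta_0 = 2$) are assumptions introduced only for this example.

```python
def euler_moments(beta, eta, growth, returns, instruments):
    # Pricing error for each asset: beta * (c_{t+1}/c_t)^(-eta) * (1 + R_i) - 1
    pricing_errors = [beta * growth ** (-eta) * (1.0 + r) - 1.0 for r in returns]
    # Stack error_i * instrument_j over assets i and instruments j
    return [e * z for e in pricing_errors for z in instruments]

def sample_moments(beta, eta, data):
    # Average the stacked moment vector over the observations
    T = len(data)
    total = None
    for growth, returns, instruments in data:
        h = euler_moments(beta, eta, growth, returns, instruments)
        total = h if total is None else [a + b for a, b in zip(total, h)]
    return [x / T for x in total]

# Synthetic data: gross consumption growth, two asset returns, two instruments
beta0, eta0 = 0.95, 2.0
growth_path = [1.01, 1.03, 0.99, 1.02, 1.00]
data = []
lagged_growth = 1.0
for g in growth_path:
    r = g ** eta0 / beta0 - 1.0          # return that sets the pricing error to zero
    data.append((g, [r, r], [1.0, lagged_growth]))
    lagged_growth = g

moments_at_truth = sample_moments(beta0, eta0, data)
moments_elsewhere = sample_moments(beta0, 5.0, data)
```

With $m = 2$ assets and two instruments the moment vector has $q = 4$ elements; it is numerically zero at the parameter values used to build the data and not at other values, which is the property a GMM objective exploits.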

5.7 Simulation-based estimators


Ideally, MLE should be the preferred estimation method of parametric Markov models, as it
leads to first-order efficiency. Yet economic theory places restrictions that make these models
problematic to estimate through ML. In these cases, GMM is a natural estimation
method. But GMM can be infeasible as well, in situations of interest. Assume, for example, that
the data generating process is not i.i.d. Instead, data are generated by the transition function,

$$y_{t+1} = H(y_t, \epsilon_{t+1}; \theta_0), \qquad (5.10)$$

where $H: \mathbb{R}^{n} \times \mathbb{R}^{d} \times \Theta \mapsto \mathbb{R}^{n}$, and $\epsilon_t$ is a vector of i.i.d. disturbances in $\mathbb{R}^{d}$. Assume the
econometrician knows the function $H$. Let $z_t = (y_t, y_{t-1}, \ldots, y_{t-l+1})$, $l < \infty$. In many cases of
interest, the function $\bar h$ in Eq. (5.6) can be written as,

$$\bar h(y_1^{T}; \theta) = \frac{1}{T}\sum_{t=1}^{T}\left[f_t^{*} - E(f(z_t, \theta))\right], \qquad (5.11)$$

where,

$$f_t^{*} = f(z_t, \theta_0)$$

is a vector-valued moment function, or observation function, a function that summarizes


satisfactorily the data, so to speak. Consider, for example, Eq. (5.9) without the instruments
$\mathrm{In}_t$, where $f_t^{*} = (1 + R_{t+1})^{-1}$ and $E(f(z_t, \theta)) = \beta E((c_{t+1}/c_t)^{-\eta})$. Once we identify consumption
growth with $y_{t+1}$, $y_{t+1} = \ln(c_{t+1}/c_t)$, and take the transition law in Eq. (5.10) to be log-normally
distributed, as in some basic models we shall see in Part II of these lectures, we can compute
$E(f(z_t, \theta))$ in closed form. Needless to say, the GMM estimator is infeasible if we are not able
to compute the expectation $E(f(z_t, \theta))$ in closed form, for each $\theta$. Simulation-based methods
can make the method of moments feasible in this case.


5.7.1 Three simulation-based estimators

The basic idea underlying simulation-based methods is quite simple. While the moment conditions
are too complex to be evaluated analytically, the model in Eq. (5.10) can be simulated.
Accordingly, draw $\epsilon_t$ from its distribution, and save the simulated values $\tilde\epsilon_t$. Compute recursively,

$$\tilde y_{t+1}(\theta) = H(\tilde y_t(\theta), \tilde\epsilon_{t+1}; \theta),$$

and create simulated moment functions as follows,

$$\tilde f_t(\theta) \equiv f(\tilde z_t(\theta), \theta).$$

Consider the following parameter estimator,

$$\hat\theta_T = \arg\min_{\theta\in\Theta} \tilde h_T(\theta)^{\top}\, W_T\, \tilde h_T(\theta), \qquad (5.12)$$

where $\tilde h_T(\theta)$ is the simulated counterpart to $\bar h$ in Eq. (5.11),

$$\tilde h_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\left(f_t^{*} - \frac{1}{S(T)}\sum_{s=1}^{S(T)}\tilde f_s(\theta)\right),$$

$W_T$ is some weighting matrix, and $S(T)$ is the simulated sample size, which we write as a function of the sample size $T$, for
the purpose of the asymptotic theory.
The estimator $\hat\theta_T$, also known as the Simulated Method of Moments (SMM) estimator, aims to
match the sample properties of the actual and simulated processes $f_t^{*}$ and $\tilde f_t$. It was introduced
in a series of works, by McFadden (1989), Pakes and Pollard (1989), Lee and Ingram (1991)
and Duffie and Singleton (1993). The simulated pseudo-maximum likelihood method of Laroque
and Salanié (1989, 1993, 1994) can also be interpreted as a SMM estimator.
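The recursion and the matching step can be sketched in a toy setting. The code assumes the transition function is an AR(1), $y_{t+1} = \phi\, y_t + \epsilon_{t+1}$ with standard normal shocks, uses $f(z_t) = (y_t^{2},\ y_t y_{t-1})$ as the observation function, an identity weighting matrix and a grid search. All of these choices and all names are assumptions made for this illustration. Note that the simulated shocks are drawn once and reused for every trial value of the parameter, as the text prescribes.

```python
import random

def simulate_ar1(phi, shocks):
    # y_{t+1} = phi * y_t + eps_{t+1}, started at zero
    y, path = 0.0, []
    for eps in shocks:
        y = phi * y + eps
        path.append(y)
    return path

def moment_vector(path):
    # Sample averages of the observation function f(z_t) = (y_t^2, y_t * y_{t-1})
    n = len(path) - 1
    m_var = sum(path[t] ** 2 for t in range(1, n + 1)) / n
    m_cov = sum(path[t] * path[t - 1] for t in range(1, n + 1)) / n
    return (m_var, m_cov)

def smm_estimate(data, sim_shocks, grid):
    f_star = moment_vector(data)
    def objective(phi):
        f_sim = moment_vector(simulate_ar1(phi, sim_shocks))
        # Quadratic form with identity weighting matrix
        return sum((a - b) ** 2 for a, b in zip(f_star, f_sim))
    return min(grid, key=objective)

rng = random.Random(42)
phi0, T, S = 0.6, 2000, 4000
data = simulate_ar1(phi0, [rng.gauss(0.0, 1.0) for _ in range(T)])   # plays the role of observed data
sim_shocks = [rng.gauss(0.0, 1.0) for _ in range(S)]                 # saved draws, reused for every phi
grid = [i / 100.0 for i in range(-90, 91, 2)]
phi_hat = smm_estimate(data, sim_shocks, grid)
```

Redrawing the shocks at each trial value would make the objective function jump around as the parameter changes; holding them fixed keeps it smooth in the parameter.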
A second simulation-based estimator relies on the indirect inference principle (IIP), and was
proposed by Gouriéroux, Monfort and Renault (1993) and Smith (1993). Instead of minimizing
the distance of some moment conditions, the IIP relies on minimizing the distance between the parameters of an
auxiliary, possibly misspecified model estimated on the observed data and on model-simulated data. For example, consider the following auxiliary parameter
estimator,

$$\hat\beta_T = \arg\max_{\beta} \ln L(y_1^{T}; \beta), \qquad (5.13)$$

where $L$ is the likelihood of some possibly misspecified model. Consider simulating $S$ times the
process in Eq. (5.10), and computing,

$$\tilde\beta_T^{s}(\theta) = \arg\max_{\beta} \ln L(\tilde y^{s}(\theta)_1^{T}; \beta), \quad s = 1, \ldots, S,$$

where $\tilde y^{s}(\theta)_1^{T} = (\tilde y_t^{s}(\theta))_{t=1}^{T}$ are the simulated variables (for $s = 1, \ldots, S$) when the parameter
vector is $\theta$. The IIP-based estimator is defined similarly as $\hat\theta_T$ in Eq. (5.12), but with the
function $\tilde h_T$ given by,

$$\tilde h_T(\theta) = \hat\beta_T - \frac{1}{S}\sum_{s=1}^{S}\tilde\beta_T^{s}(\theta). \qquad (5.14)$$

The diagram in Figure 5.1 illustrates the main ideas underlying the IIP.

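[Figure 5.1: flow diagram. The model $y_t = H(y_{t-1}, \epsilon_t; \theta)$ generates model-simulated data $\tilde y(\theta) = (\tilde y_1(\theta), \ldots, \tilde y_T(\theta))$. The same auxiliary model is estimated on the model-simulated data, giving auxiliary parameter estimates $\tilde\beta_T(\theta)$, and on the observed data $y = (y_1, \ldots, y_T)$, giving auxiliary parameter estimates $\hat\beta_T$. The indirect inference estimator is $\hat\theta = \arg\min_\theta \|\tilde\beta_T(\theta) - \hat\beta_T\|$.]

FIGURE 5.1. The Indirect Inference principle. Given the true model $y_t = H(y_{t-1}, \epsilon_t; \theta)$, an estimator
of $\theta$ based on the indirect inference principle ($\hat\theta$ say) makes the parameters of some auxiliary model
$\tilde\beta_T(\hat\theta)$ as close as possible to the parameters $\hat\beta_T$ of the same auxiliary model estimated on the
observations. That is, $\hat\theta = \arg\min_\theta \|\tilde\beta_T(\theta) - \hat\beta_T\|_{\Omega}$, for some norm $\Omega$.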
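The principle can be tried out on a textbook-style toy problem. The sketch below assumes the true model is an MA(1), $y_t = \epsilon_t + \theta\,\epsilon_{t-1}$, and uses a deliberately misspecified AR(1), estimated by least squares without intercept, as the auxiliary model. The models, the grid search, the number of simulated paths and all names are assumptions made for this illustration.

```python
import random

def simulate_ma1(theta, shocks):
    # y_t = eps_t + theta * eps_{t-1}
    return [shocks[t] + theta * shocks[t - 1] for t in range(1, len(shocks))]

def ar1_ols(path):
    # Auxiliary parameter: least-squares slope of y_t on y_{t-1}, no intercept
    num = sum(path[t] * path[t - 1] for t in range(1, len(path)))
    den = sum(path[t - 1] ** 2 for t in range(1, len(path)))
    return num / den

def indirect_inference(data, sim_shock_sets, grid):
    beta_hat = ar1_ols(data)                 # auxiliary estimate on observed data
    def distance(theta):
        # Average auxiliary estimate over the S simulated paths
        beta_sim = sum(ar1_ols(simulate_ma1(theta, s)) for s in sim_shock_sets)
        beta_sim /= len(sim_shock_sets)
        return (beta_hat - beta_sim) ** 2
    return min(grid, key=distance), beta_hat

rng = random.Random(7)
theta0, T, S = 0.5, 4000, 4
data = simulate_ma1(theta0, [rng.gauss(0.0, 1.0) for _ in range(T + 1)])
sim_shock_sets = [[rng.gauss(0.0, 1.0) for _ in range(T + 1)] for _ in range(S)]
grid = [i / 50.0 for i in range(0, 46)]      # 0.00, 0.02, ..., 0.90
theta_hat, beta_hat = indirect_inference(data, sim_shock_sets, grid)
```

In this example the auxiliary parameter converges to $\theta/(1+\theta^{2})$, which equals 0.4 at $\theta_0 = 0.5$ and is increasing on the grid, so matching it pins down $\theta$ even though the AR(1) is the wrong model.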

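Finally, Gallant and Tauchen (1996) propose a simulation-based estimation method they
label efficient method of moments (EMM). Their estimator sets,

$$\tilde h_T(\theta) = \frac{1}{N}\sum_{n=1}^{N}\frac{\partial}{\partial\beta}\ln f(\tilde y_n(\theta) \mid \tilde y_{n-1}(\theta); \hat\beta_T),$$

where $\frac{\partial}{\partial\beta}\ln f(y \mid x; \beta)$ is the score of some auxiliary model $f$, also known as the score generator,
$\hat\beta_T$ is the Pseudo ML estimator of the auxiliary model, and $(\tilde y_n(\theta))_{n=1}^{N}$ is a long simulation (i.e. $N$
is very large) of Eq. (5.10), with parameter vector set equal to $\theta$. Finally, the weighting matrix
$W_T$ in Eq. (5.12) is taken to be any matrix $I_T^{-1}$ converging in probability to the inverse of:

$$I = \mathrm{var}\left[\frac{\partial}{\partial\beta}\ln f(y_2 \mid y_1; \beta^{*})\right]. \qquad (5.15)$$

To motivate this choice of $\tilde h_T(\theta)$, note that the auxiliary score, $\frac{\partial}{\partial\beta}\ln f(y_t \mid y_{t-1}; \beta)$, satisfies
the following first order conditions:

$$\frac{1}{T}\sum_{t=1}^{T}\frac{\partial}{\partial\beta}\ln f(y_t \mid y_{t-1}; \hat\beta_T) = 0,$$

which is the sample equivalent of

$$E\left[\frac{\partial}{\partial\beta}\ln f(y_2 \mid y_1; \beta^{*})\right] = 0,$$

for some $\beta^{*}$. Likewise, we must have that with $\theta = \theta_0$, $\tilde h_T(\theta_0) = 0$, for large $N$. All in all,
we want to find a stochastic process $H(y_t, \epsilon_{t+1}; \theta)$ in Eq. (5.10), or a parameter vector $\theta$, such
that the expectation of the score of the auxiliary model is zero, a defining property of the score,
arising even when the model is misspecified.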
5.7.2 Asymptotic normality
We show, heuristically, how asymptotic normality obtains for the three estimators of Section 5.7.1, and then define conditions under which asymptotic efficiency might obtain for the EMM.
5.7.2.1 SMM

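Let,
$$\Sigma_0 = \sum_{j=-\infty}^{\infty} E\left[\left(f_t^* - E(f_t^*)\right)\left(f_{t-j}^* - E(f_{t-j}^*)\right)^\top\right],$$
and suppose that
$$W_T \overset{p}{\to} W_0 = \Sigma_0^{-1}.$$
We now demonstrate that under this condition, as $T \to \infty$,
$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{\to} N\left(0,\ (1+\tau)\left(D_0^\top \Sigma_0^{-1} D_0\right)^{-1}\right), \tag{5.16}$$
where $\tau = \lim_{T\to\infty} T/S(T)$, $D_0 = E[\nabla_\theta \tilde f_\infty(\theta_0)] = \nabla_\theta E[\tilde f_\infty(\theta_0)]$, and the notation $\tilde f_\infty$ means that $\tilde f$ is drawn from its stationary distribution.
Indeed, the first order conditions satisfied by the SMM in Eq. (5.12) are,
$$0 = [\nabla_\theta G_T(\hat\theta_T)]^\top W_T \sqrt{T}\,G_T(\hat\theta_T) = [\nabla_\theta G_T(\hat\theta_T)]^\top W_T \left[\sqrt{T}\,G_T(\theta_0) + \nabla_\theta G_T(\theta_0)\sqrt{T}(\hat\theta_T - \theta_0)\right] + o_p(1).$$
That is, because $\nabla_\theta G_T(\theta_0) \overset{p}{\to} -D_0$,
$$\begin{aligned}
\sqrt{T}(\hat\theta_T - \theta_0) &\overset{d}{=} -\left([\nabla_\theta G_T(\hat\theta_T)]^\top W_T \nabla_\theta G_T(\theta_0)\right)^{-1}[\nabla_\theta G_T(\hat\theta_T)]^\top W_T \sqrt{T}\,G_T(\theta_0) \\
&\overset{d}{=} \left(D_0^\top W_0 D_0\right)^{-1} D_0^\top W_0 \sqrt{T}\,G_T(\theta_0) \\
&= \left(D_0^\top \Sigma_0^{-1} D_0\right)^{-1} D_0^\top \Sigma_0^{-1} \sqrt{T}\,G_T(\theta_0).
\end{aligned} \tag{5.17}$$
We have,
$$\begin{aligned}
\sqrt{T}\,G_T(\theta_0) &= \sqrt{T}\left[\frac{1}{T}\sum_{t=1}^{T} f_t^* - \frac{1}{S(T)}\sum_{s=1}^{S(T)} \tilde f_s(\theta_0)\right] \\
&= \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(f_t^* - E(f_\infty^*)\right) - \sqrt{\frac{T}{S(T)}}\cdot\frac{1}{\sqrt{S(T)}}\sum_{s=1}^{S(T)}\left(\tilde f_s(\theta_0) - E(\tilde f_\infty(\theta_0))\right) \\
&\overset{d}{\to} N\left(0,\ (1+\tau)\Sigma_0\right),
\end{aligned}$$
where we have used the fact that $E(f_\infty^*) = E(\tilde f_\infty(\theta_0))$, and that the simulations are independent of the sample. Using this result in Eq. (5.17) produces the convergence in Eq. (5.16). If $\tau = \lim_{T\to\infty} T/S(T) = 0$ (i.e. if the number of simulations grows faster than the sample size), the SMM estimator is as efficient as the GMM estimator. Finally, and obviously, we need that $\tau = \lim_{T\to\infty} T/S(T) < \infty$: the number of simulations $S(T)$ cannot grow more slowly than the sample size.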
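The mechanics of the SMM can be illustrated with a toy example. The following is a minimal sketch, not the estimator used in the literature cited above: it assumes a hypothetical AR(1) data generating process, two hypothetical moment conditions (the second moment and the first-order cross moment), an identity weighting matrix rather than the efficient $\Sigma_0^{-1}$, and a grid search in place of a numerical optimizer. All function names are illustrative.

```python
import random

def simulate_ar1(theta, n_obs, seed, burn_in=100):
    """Simulate y_{t+1} = theta * y_t + eps_{t+1}, with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n_obs + burn_in):
        y = theta * y + rng.gauss(0.0, 1.0)
        path.append(y)
    return path[burn_in:]  # drop the burn-in draws

def moments(path):
    """Sample moments (E[y_t^2], E[y_t * y_{t-1}])."""
    n = len(path) - 1
    second = sum(v * v for v in path[1:]) / n
    cross = sum(a * b for a, b in zip(path[1:], path[:-1])) / n
    return (second, cross)

def smm_objective(theta, data_moments, n_sim, seed):
    """Quadratic distance between data and simulated moments (identity weights)."""
    sim_moments = moments(simulate_ar1(theta, n_sim, seed))
    return sum((d - s) ** 2 for d, s in zip(data_moments, sim_moments))

def smm_estimate(data, n_sim, grid, seed=1):
    """Grid-search SMM; the same seed is reused for every candidate theta."""
    data_moments = moments(data)
    return min(grid, key=lambda th: smm_objective(th, data_moments, n_sim, seed))

def variance_inflation(n_obs, n_sim):
    """Finite-sample analogue of the factor 1 + tau, with tau = T / S(T)."""
    return 1.0 + n_obs / n_sim

if __name__ == "__main__":
    data = simulate_ar1(0.5, 2000, seed=42)      # "observed" sample, T = 2000
    grid = [i / 100 for i in range(0, 91)]       # candidate values of theta
    print(smm_estimate(data, 10000, grid), variance_inflation(2000, 10000))
```

Reusing the same seed for every candidate parameter keeps the simulated shocks fixed, so that the objective varies with $\theta$ only. With $T = 2000$ and $S(T) = 10000$, the variance inflation factor is $1.2$, in line with the role of $\tau$ in Eq. (5.16).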


5.7.2.2 Indirect inference

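The IIP-based estimator works slightly differently. For this estimator, even if the number of simulations $S$ is fixed, asymptotic normality obtains without requiring $S$ to go to infinity faster than the sample size. Basically, what really matters here is that $T$ goes to infinity.
By Eq. (5.17), and the discussion in Section 5.7.1, we know that asymptotically, the first order conditions satisfied by the IIP-based estimator are,
$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} \left(D_0^\top W_0 D_0\right)^{-1} D_0^\top W_0 \sqrt{T}\,G_T(\theta_0),$$
where $G_T$ is as in Eq. (5.14), $D_0 = \nabla_\theta b(\theta_0)$, and $b(\theta)$ is solution to the limiting problem corresponding to the estimator in Eq. (5.13), viz
$$b(\theta) = \arg\max_\beta \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\ln f\left(\tilde y_t(\theta)\,|\,\tilde y_{t-1}(\theta);\beta\right).$$
We need to find the distribution of $G_T$ in Eq. (5.14). We have,
$$\begin{aligned}
\sqrt{T}\,G_T(\theta_0) &= \frac{1}{S}\sum_{s=1}^{S}\sqrt{T}\left(\tilde\beta_T - \tilde\beta_{T,s}(\theta_0)\right) \\
&= \frac{1}{S}\sum_{s=1}^{S}\sqrt{T}\left[(\tilde\beta_T - \beta_0) - (\tilde\beta_{T,s}(\theta_0) - \beta_0)\right] \\
&= \sqrt{T}(\tilde\beta_T - \beta_0) - \frac{1}{S}\sum_{s=1}^{S}\sqrt{T}(\tilde\beta_{T,s}(\theta_0) - \beta_0),
\end{aligned}$$
where $\beta_0 = b(\theta_0)$. Hence, given the independence of the sample and the simulations,
$$\sqrt{T}\,G_T(\theta_0) \overset{d}{\to} N\left(0,\ \left(1+\frac{1}{S}\right)\cdot\text{Asy.Var}\left(\sqrt{T}\tilde\beta_T\right)\right).$$
That is, asymptotically $S$ can be fixed with respect to $T$.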
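A toy version of the indirect inference principle may help fix ideas. The sketch below rests on assumptions that are not in the text: the structural model is a hypothetical MA(1) process, the auxiliary model is an AR(1) estimated by least squares, the number of simulated paths is fixed, and the minimization is a simple grid search. Function names are illustrative.

```python
import random

def simulate_ma1(theta, n_obs, rng):
    """Structural model (assumed): y_t = eps_t + theta * eps_{t-1}."""
    eps_prev = rng.gauss(0.0, 1.0)
    path = []
    for _ in range(n_obs):
        eps = rng.gauss(0.0, 1.0)
        path.append(eps + theta * eps_prev)
        eps_prev = eps
    return path

def ar1_ols(path):
    """Auxiliary model: slope of a no-intercept regression of y_t on y_{t-1}."""
    num = sum(a * b for a, b in zip(path[1:], path[:-1]))
    den = sum(b * b for b in path[:-1])
    return num / den

def indirect_inference(data, grid, n_paths=5, seed=7):
    """Pick theta whose average simulated auxiliary estimate matches the data."""
    n_obs = len(data)
    beta_data = ar1_ols(data)
    best_theta, best_dist = None, float("inf")
    for theta in grid:
        rng = random.Random(seed)  # common random numbers across theta
        beta_sim = sum(
            ar1_ols(simulate_ma1(theta, n_obs, rng)) for _ in range(n_paths)
        ) / n_paths
        dist = (beta_data - beta_sim) ** 2
        if dist < best_dist:
            best_theta, best_dist = theta, dist
    return best_theta

if __name__ == "__main__":
    data = simulate_ma1(0.5, 4000, random.Random(123))
    grid = [i / 50 for i in range(0, 46)]   # theta in [0, 0.9]
    print(indirect_inference(data, grid))
```

The number of simulated paths (`n_paths`, playing the role of $S$) stays fixed while the sample size is large, which is the situation described above. The grid is restricted to $[0, 0.9]$ because the mapping from the MA(1) coefficient to the AR(1) slope is monotone on that range.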

5.7.2.3 Efficient method of moments

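We have,
$$\hat\theta_T = \arg\min_\theta G_T(\theta)^\top W_T\, G_T(\theta), \qquad G_T(\theta) = \frac{1}{N}\sum_{n=1}^{N}\frac{\partial}{\partial\beta}\ln f\left(\tilde y_n(\theta)\,|\,\tilde y_{n-1}(\theta);\tilde\beta_T\right).$$
The first order conditions are:
$$0 = \nabla_\theta G_T(\hat\theta_T)^\top W_T \sqrt{T}\,G_T(\hat\theta_T) = \nabla_\theta G_T(\hat\theta_T)^\top W_T\left(\sqrt{T}\,G_T(\theta_0) + \nabla_\theta G_T(\theta_0)\sqrt{T}(\hat\theta_T - \theta_0)\right) + o_p(1),$$
or, with $D \equiv \text{plim}\,\nabla_\theta G_T(\theta_0)$ and $W \equiv \text{plim}\,W_T$,
$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} -\left(D^\top W D\right)^{-1} D^\top W \sqrt{T}\,G_T(\theta_0).$$
We have, for some $\beta^*$,
$$\sqrt{T}\,G_T(\theta_0) \overset{d}{=} J\sqrt{T}(\tilde\beta_T - \beta^*) \overset{d}{\to} N(0, \mathcal{I}),$$
where $J = E\left[\frac{\partial^2}{\partial\beta\,\partial\beta^\top}\ln f(y_2\,|\,y_1;\beta^*)\right]$ and $\mathcal{I}$ is as in Eq. (5.15). Hence,
$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{\to} N(0, V),$$
where,
$$V = \left(D^\top W D\right)^{-1} D^\top W\,\mathcal{I}\,W D\left(D^\top W D\right)^{-1}.$$
With $W = \mathcal{I}^{-1}$, this variance collapses to,
$$V = \left(D^\top \mathcal{I}^{-1} D\right)^{-1}. \tag{5.18}$$

5.7.2.4 Spanning scores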

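This section provides a heuristic discussion of the conditions under which the EMM achieves the Cramer-Rao lower bound. Consider the following definition, which is similar to that in Tauchen (1997). Of a given span of moment conditions $s_f$, say that of the EMM, we say that it also spans the true score if,
$$\text{var}(s\,|\,s_f) = 0, \tag{5.19}$$
where $s$ denotes the true score and $\text{var}(s\,|\,s_f)$ is the variance of the residual from the linear projection of $s$ on $s_f$. From Eq. (5.18), we know that the asymptotic variance of the EMM, say $V_{\text{EMM}}$, satisfies:
$$V_{\text{EMM}}^{-1} = D^\top\left[\text{var}(s_f)\right]^{-1} D.$$
By the linear projection,
$$s = B s_f + \epsilon,$$
we have,
$$V_{\text{MLE}}^{-1} = \text{var}(s) = B\,\text{var}(s_f)\,B^\top + \text{var}(s\,|\,s_f), \tag{5.20}$$
where $V_{\text{MLE}}$ denotes the asymptotic variance of the MLE. We claim that:
$$B = D^\top\left[\text{var}(s_f)\right]^{-1}. \tag{5.21}$$
Indeed, under regularity conditions,
$$\begin{aligned}
D &= \nabla_\theta \int s_f(y)\,p(y;\theta)\,dy\,\Big|_{\theta=\theta_0} \\
&= \int s_f(y)\left[\nabla_\theta p(y;\theta_0)\right]^\top dy \\
&= \int s_f(y)\left[\nabla_\theta \ln p(y;\theta_0)\right]^\top p(y;\theta_0)\,dy \\
&= E\left(s_f s^\top\right),
\end{aligned}$$
where $p$ is the true density, so that $B = E(s s_f^\top)\left[\text{var}(s_f)\right]^{-1} = D^\top\left[\text{var}(s_f)\right]^{-1}$. Next, replace Eq. (5.21) into Eq. (5.20),
$$V_{\text{MLE}}^{-1} = D^\top\left[\text{var}(s_f)\right]^{-1} D + \text{var}(s\,|\,s_f) = V_{\text{EMM}}^{-1} + \text{var}(s\,|\,s_f).$$
Therefore, the EMM estimator achieves the Cramer-Rao lower bound under the spanning condition in Eq. (5.19).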


5.7.3 A fourth simulation-based estimator: Simulated maximum likelihood

Estimating the parameters of stochastic differential equations is a recurrent theme in empirical finance. Consider a continuous time model,
$$dy(t) = b(y(t);\theta)\,dt + a(y(t);\theta)\,dW(t), \tag{5.22}$$
where $W(t)$ is a Brownian motion and $b$ and $a$ are two functions guaranteeing a strong solution to Eq. (5.22). Except in special cases (e.g., the affine models reviewed in Chapter 12), the likelihood function of the data generated by this process is unknown. We can then use one of the three estimators we have presented in Section 5.7.1. Alternatively, we might use simulated maximum likelihood, a method introduced in finance by Santa-Clara (1995) (see, also, Brandt and Santa-Clara, 2002). We only provide the idea of the method, not the asymptotic theory.
Suppose, then, that we observe discretely sampled data generated by Eq. (5.22): $y_0, y_1, \ldots, y_t, \ldots, y_T$, where $T$ is the sample size. To implement maximum likelihood, we need to know the transition density, say $p(y_{t+1}\,|\,y_t;\theta)$, which we assume we do not know. Consider, then, the Euler approximation to Eq. (5.22),
$$y^h_{h(k+1)} = y^h_{hk} + b(y^h_{hk};\theta)\,h + a(y^h_{hk};\theta)\sqrt{h}\,\epsilon_{k+1}, \tag{5.23}$$
where $\epsilon_k$ is a sequence of i.i.d. random variables with expectation zero and unit variance. This stochastic process is defined at the dates $hk$, for $k$ integer. Let $[t/h]$ denote the integer part of $t/h$, and for $k = 1, \ldots, [t/h]$, set
$$y^h(t) = y^h_{hk} \quad \text{if } hk \le t < h(k+1).$$
In other words, we are chopping the time interval between two observations, $[t, t+1]$, in $n = 1/h$ pieces, and then take $n$ to be large. We know that $y^h(t) \Rightarrow y(t)$ as $n \to \infty$, where $\Rightarrow$ denotes weak convergence, or convergence in distribution, meaning that all finite dimensional distributions of $y^h$ converge to those of $y(t)$ as $n \to \infty$. The idea underlying simulated maximum likelihood, then, is to estimate the transition density, $p(y_{t+1}\,|\,y_t;\theta)$, through simulations of Eq. (5.23), performed using a large value of $n$. Note, we cannot guarantee that the transition density is recovered exactly by simulating Eq. (5.23), not even for a large value of $n$: we can only perform an imperfect simulation.
The likelihood function is,
$$L_T(\theta) = p(y_0;\theta)\prod_{t=0}^{T-1}p(y_{t+1}\,|\,y_t;\theta),$$
where $p(y_0;\theta)$ denotes the marginal density of the first observation, $y_0$.
Let $p^h(y'\,|\,y;\theta)$ be the transition density of the data generated by Eq. (5.23). Then, if $\epsilon$ is normally distributed,
$$p^h\left(y^h_{h(k+1)}\,|\,y^h_{hk};\theta\right) = \phi\left(y^h_{h(k+1)};\ y^h_{hk} + b(y^h_{hk};\theta)\,h;\ a^2(y^h_{hk};\theta)\,h\right),$$
where $\phi(y;\mu;\sigma^2)$ denotes the Gaussian density with mean $\mu$ and variance $\sigma^2$. Moreover, we have, approximately,
$$p(y_{t+1}\,|\,y_t;\theta) = \int p^h(y_{t+1}\,|\,x;\theta)\,p^h(x\,|\,y_t;\theta)\,dx = \int \phi\left(y_{t+1};\ x + b(x;\theta)\,h;\ a^2(x;\theta)\,h\right)p^h(x\,|\,y_t;\theta)\,dx,$$
where we have set $x = y^h_{t+1-h}$. We may, now, draw values of $x$ from $p^h(x\,|\,y_t;\theta)$, as explained in a moment, and estimate $p(y_{t+1}\,|\,y_t;\theta)$ through:
$$\hat p(y_{t+1}\,|\,y_t;\theta) = \frac{1}{S}\sum_{i=1}^{S}\phi\left(y_{t+1};\ x^i + b(x^i;\theta)\,h;\ a^2(x^i;\theta)\,h\right),$$
where $x^i$ is obtained by iterating Eq. (5.23) from time $t$ to time $t+1-h$. Under regularity conditions, we have that for all $(y,\theta)$, $\sup_{y'}\left|\hat p(y'\,|\,y;\theta) - p(y'\,|\,y;\theta)\right| \to 0$ as $n$ and $S$ get large, with $S$ growing at a suitably restricted rate relative to $n$.
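The simulated transition density can be checked on a model where the exact answer is known. The sketch below is illustrative only: it assumes a hypothetical Ornstein-Uhlenbeck specification, $b(y) = \kappa(\mu - y)$ with constant diffusion $a$, for which the transition density over a unit interval is Gaussian in closed form. The parameter values and function names are assumptions made for the example.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Gaussian density with the given mean and variance, evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def sml_transition_density(y_next, y_now, drift, diffusion,
                           n_steps, n_paths, seed=0):
    """Estimate p(y_next | y_now) over a unit time interval.

    Each path iterates the Euler scheme for n_steps - 1 sub-steps; the last
    sub-step is integrated out analytically with a Gaussian density.
    """
    rng = random.Random(seed)
    h = 1.0 / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = y_now
        for _ in range(n_steps - 1):
            x += drift(x) * h + diffusion(x) * math.sqrt(h) * rng.gauss(0.0, 1.0)
        total += normal_pdf(y_next, x + drift(x) * h, diffusion(x) ** 2 * h)
    return total / n_paths

def ou_exact_density(y_next, y_now, kappa, mu, a):
    """Closed-form unit-interval transition density of dy = kappa(mu - y)dt + a dW."""
    mean = mu + (y_now - mu) * math.exp(-kappa)
    var = a ** 2 * (1.0 - math.exp(-2.0 * kappa)) / (2.0 * kappa)
    return normal_pdf(y_next, mean, var)

if __name__ == "__main__":
    kappa, mu, a = 0.5, 1.0, 0.3   # illustrative parameter values
    est = sml_transition_density(0.9, 0.8, lambda y: kappa * (mu - y),
                                 lambda y: a, n_steps=20, n_paths=5000)
    print(est, ou_exact_density(0.9, 0.8, kappa, mu, a))
```

In an actual estimation, this density estimate would be computed for each pair of consecutive observations and each candidate parameter, and the product of the estimates would be maximized over $\theta$. The two printed numbers should be close, but not identical, because of both the discretization error and the Monte Carlo error.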

5.7.4 Advances
The three estimators examined in Sections 5.7.1-5.7.2 are general-purpose but, in general, they do not lead to asymptotic efficiency, unless the true score belongs to the span of the moment conditions, as explained in Section 5.7.2.4. There exist other simulation-based methods, which aim to approximate the likelihood function through simulations (e.g., Lee, 1995; Hajivassiliou and McFadden, 1998): for example, the simulated maximum likelihood estimator in Section 5.7.3 can be used to estimate the parameters of stochastic differential equations. While methods based on simulated likelihood lead to asymptotically efficient estimators, they address specific estimation problems, just as the example of Section 5.7.3 illustrates.
There exist estimators that are both general purpose and that can lead to asymptotic efficiency. Fermanian and Salanié (2004) consider an estimator that relies on approximating the likelihood function through kernel estimates obtained by simulating the model of interest. Carrasco, Chernov, Florens and Ghysels (2007) rely on a continuum of moment conditions matching model-based (simulated) characteristic functions to data-based characteristic functions. Altissimo and Mele (2009) propose an estimator based on a continuum of moment conditions, which minimizes a certain distance between conditional densities estimated with the true data and conditional densities estimated with data simulated from the model, where both conditional densities are estimated through kernel methods.
5.7.5 In practice? Latent factors and identification
The estimation theory of this section does not rule out the situation where some of the variables in Eq. (5.10) are unobservable. The principle to follow is very simple: one applies any of the methods we have discussed to those variables simulated out of Eq. (5.10) that correspond to the observed ones. For example, we may want to estimate the following model of the short-term rate $r(t)$, discussed at length in Chapter 12:
$$\begin{cases} dr(t) = \kappa_r\left(\mu_r - r(t)\right)dt + \sqrt{v(t)}\,dW_1(t) \\ dv(t) = \kappa_v\left(\mu_v - v(t)\right)dt + \xi_v\sqrt{v(t)}\,dW_2(t) \end{cases} \tag{5.24}$$
where $v(t)$ is the short-term rate's instantaneous, stochastic variance, $W_1$ and $W_2$ are two standard Brownian motions, and the parameter vector of interest is $\theta = [\kappa_r\ \mu_r\ \kappa_v\ \mu_v\ \xi_v]$. Let us consider one of the methods discussed so far, say indirect inference. The logical steps to follow, then, are (i) to simulate Eqs. (5.24), and (ii) to calibrate an auxiliary model to the short-term rate data simulated out of Eqs. (5.24), so that it is as close as possible to the very same auxiliary model fitted on true data. Note that, in doing so, we simply disregard the volatility data simulated out of Eqs. (5.24), as their empirical counterparts are unobservable.
The question arises, therefore, as to whether the auxiliary model one chooses is rich enough to allow identifying the model's parameter vector $\theta$. There might be many combinations of unobserved random processes $v(t)$ that are consistent with the likelihood of any given auxiliary model. So which auxiliary model should one fit in practice? Gallant and Tauchen (1996) raised this question long ago. Needless to say, there is no general answer to it. Very simply, one requires the model to be identifiable, which is likely to happen once the auxiliary model is rich enough. In an impressive series of applied papers, Gallant and Tauchen and their co-authors have proposed semi-nonparametric score generators, as a way to get as close as possible to a rich model. Intuitively, by increasing the order of Hermite expansions, semi-nonparametric scores might converge to the true ones. Alternatively, one might use a continuum of moment conditions, as explained in Section 5.7.4. For example, the nonparametric density estimators of Altissimo and Mele (2009) converge to their true counterparts as the bandwidth parameters used to smooth these kernel estimates get smaller and smaller. In the next section, we provide a discussion of how asset prices might help convey information about unobserved processes and lead to statistical efficiency.
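Step (i) above, simulating a model with a latent factor and keeping only the observable, can be sketched as follows. The sketch assumes a two-factor short-rate model with mean-reverting rate and square-root variance, discretized by an Euler scheme with the variance floored at zero inside the square root, and with independent Brownian motions. The parameter names (`kappa_r`, `mu_r`, `kappa_v`, `mu_v`, `xi_v`) and values are hypothetical.

```python
import math
import random

def simulate_short_rate(kappa_r, mu_r, kappa_v, mu_v, xi_v,
                        r0, v0, n_obs, steps_per_obs=20, seed=0):
    """Euler simulation of a short rate with latent stochastic variance.

    Only the short rate is returned: the simulated variance path is
    discarded, mirroring the fact that variance is not observed in the data.
    """
    rng = random.Random(seed)
    h = 1.0 / steps_per_obs
    r, v = r0, v0
    rates = []
    for _ in range(n_obs):
        for _ in range(steps_per_obs):
            vol = math.sqrt(max(v, 0.0))          # floor keeps the root real
            dw1 = math.sqrt(h) * rng.gauss(0.0, 1.0)
            dw2 = math.sqrt(h) * rng.gauss(0.0, 1.0)
            r += kappa_r * (mu_r - r) * h + vol * dw1
            v += kappa_v * (mu_v - v) * h + xi_v * vol * dw2
        rates.append(r)                           # keep the observable only
    return rates

if __name__ == "__main__":
    path = simulate_short_rate(0.5, 0.05, 1.0, 0.0004, 0.01,
                               r0=0.05, v0=0.0004, n_obs=250)
    print(len(path), min(path), max(path))
```

The returned series is what an auxiliary model would be fitted to in step (ii); the same auxiliary model is then fitted to the observed short-term rate data.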

5.8 Asset pricing, prediction functions, and statistical inference


We develop conditions that ensure the feasibility of estimation methods in a context where an unobservable multidimensional process is estimated in conjunction with prediction functions suggested by asset pricing models.4 We assume that the data generating process is a multidimensional partially observed diffusion process, solution to,
$$dy(t) = b(y(t);\theta)\,dt + \Sigma(y(t);\theta)\,dW(t), \tag{5.25}$$
where $W$ is a multidimensional process and $(b, \Sigma)$ satisfy some regularity conditions we single out below. We analyze situations where the original partially observed system in Eq. (5.25) can be estimated by augmenting it with a number of observable deterministic functions of the state. In many situations, such deterministic functions are suggested by asset pricing theories in a natural way. Typical examples include the price of derivatives or, in general, any functional of asset prices (such as asset returns, bond yields, implied volatilities).
The idea to use asset pricing predictions to improve the fit of models with unobservable factors has been explored by, e.g., Christensen (1992), Pastorello, Renault and Touzi (2000), Chernov and Ghysels (2000), Singleton (2001), and Pastorello, Patilea and Renault (2003).
We consider a standard Markov pricing setting. For fixed $t \ge 0$, we let $M$ be the expiration date of a contingent claim with rational price process $c = \{c(y(\tau), M-\tau)\}_{\tau\in[t,M)}$, and let $\{z(y(\tau))\}_{\tau\in[t,M]}$ and $\Pi(y)$ be the associated intermediate payoff process and final payoff function, respectively. Let $\partial/\partial\tau + L$ be the usual infinitesimal generator of the system in Eq. (5.25), taken under the risk-neutral probability. Then, as we saw in Chapter 4, we have that in a frictionless, arbitrage-free market, $c$ is the solution to the following partial differential equation:
$$\begin{cases} 0 = \left(\dfrac{\partial}{\partial\tau} + L - R(y)\right)c(y, M-\tau) + z(y), & \forall (y,\tau)\in Y\times[t,M) \\ c(y, 0) = \Pi(y), & \forall y\in Y \end{cases} \tag{5.26}$$
where $R(y)$ is the short-term rate. We call prediction function any continuous and twice differentiable function $c(y; M-\tau)$ solution to the partial differential equation and boundary condition in (5.26). Examples of contingent claims with prices satisfying (5.26) are, typically, derivatives.

4 This section is based on an unpublished appendix of Altissimo and Mele (2009).
Next, we augment the system in Eq. (5.25) with $d - q^*$ prediction functions, where $q^*$ denotes the number of the observable variables in Eq. (5.25), and $d$ is the dimension of $y$. Precisely, we let:
$$C(\tau) \equiv \left(c(y(\tau), M_1-\tau), \ldots, c(y(\tau), M_{d-q^*}-\tau)\right), \quad \tau\in[t, M_1],$$
where $\{M_i\}_{i=1}^{d-q^*}$ is an increasing sequence of fixed maturity dates. Furthermore, we define the measurable vector valued function:
$$\phi(y(\tau);\theta,\gamma) \equiv \left(y^o(\tau), C(y(\tau))\right), \quad \tau\in[t, M_1], \quad (\theta,\gamma)\in\Theta\times\Gamma, \tag{5.27}$$
where $y^o(\tau)$ denotes the vector of observable variables in Eq. (5.25), and $\Gamma\subset\mathbb{R}^{p_\gamma}$ is a compact parameter set containing additional parameters. These new parameters arise from the change of measure leading to the pricing model in Eq. (5.27), and are now part of our estimation problem.
We assume that the pricing model in Eq. (5.27) is correctly specified. That is, all contingent claim prices in the economy are taken to be generated by the prediction function $c(y, M-\tau)$ for some $(\theta_0,\gamma_0)\in\Theta\times\Gamma$. For simplicity, we also consider a stylized situation in which all contingent claims have the same contractual characteristics specified by $\mathcal{C}\equiv(z,\Pi)$. More generally, one may define a series of classes of contingent claims $\{\mathcal{C}_j\}_{j=1}^{J}$, where the class of contingent claims $j$ has characteristics specified by $\mathcal{C}_j\equiv(z_j,\Pi_j)$. As an example, assets belonging to the class $\mathcal{C}_1$ can be European options, and assets belonging to the class $\mathcal{C}_2$ can be bonds. The number of prediction functions that we would introduce in this case would be equal to $d - q^* = \sum_{j=1}^{J} m_j$, where $m_j$ is the number of prediction functions within class of assets $j$. To keep the presentation simple, we do not consider such a more general situation.
The objective is to define estimators of the parameter vector $(\theta_0,\gamma_0)$, under which observations were generated. We want to use any of the simulation methods reviewed in Section 5.7 to produce an estimator of $(\theta_0,\gamma_0)$. The idea, as usual, is to make the finite dimensional distributions of $\phi$ implied by the pricing model in Eq. (5.27) and the fundamentals in Eq. (5.26) as close as possible to the sample counterparts of $\phi$. Let $\Phi\subseteq\mathbb{R}^d$ be the domain on which $\phi$ takes values. As illustrated by Figure 5.2, we want to move from the unfeasible domain $Y$ of the original state variables in Eq. (5.25) (observables and not) to the domain $\Phi$ on which only observable variables take value. Ideally, we would like to implement such a change in domain in order to recover as much information as possible about the original unobserved process in (5.25). Clearly, $\phi$ is fully revealing whenever it is globally invertible. However, we will show that estimation is feasible even when $\phi$ is only locally one-to-one.
[Figure 5.2: diagram of the map $\phi(y;\theta_0,\gamma_0)$ from the domain $Y$ to the domain $\Phi$, and of its inverse.]

FIGURE 5.2. Asset pricing, the Markov property, and statistical efficiency. $Y$ is the domain on which the partially observed primitive state process $y\equiv(y^o, y^u)^\top$ takes values, $\Phi$ is the domain on which the observed system $\phi\equiv(y^o, C(y))^\top$ takes values in Markovian economies, and $C(y)$ is a contingent claim price process in $\mathbb{R}^{d-q^*}$. Let $\phi^c = (y^o, c(y,\ell_1), \ldots, c(y,\ell_{d-q^*}))$, where $\{c(y,\ell_j)\}_{j=1}^{d-q^*}$ forms an intertemporal cohort of contingent claim prices, as in Definition 5.3. If the local restrictions of $\phi$ are one-to-one and onto, statistical inference about $\theta$ and $\gamma$ can be made, using information about the price of derivative contracts, $C$. If $\phi$ is also globally invertible, statistical inference can lead to first-order asymptotic efficiency, once conditioned upon $\phi^c$.

An important feature of the theory in this section is that it does not hinge upon the availability of contingent prices data covering the same sample period covered by the observables in Eq. (5.25). First, the price of a given contingent claim is typically not available for a long sample period. As an example, available option data often include option prices with a life span smaller than the usual sample span of the underlying asset prices. By contrast, it is common to observe long time series of option prices having the same maturity. Second, the price of a single contingent claim depends on the time-to-maturity of the claim; therefore, it does not satisfy the stationarity assumptions maintained in this section. To address these issues, we deal with data on assets having the same characteristics at each point in time. Precisely, consider the data generated by the following random processes:
Definition 5.3. (Intertemporal $(\ell, N)$-cohort of contingent claim prices) Given a prediction function $c(y; M-\tau)$ and an $N$-dimensional vector $\ell\equiv(\ell_1, \ldots, \ell_N)$ of fixed times-to-maturity, an intertemporal $(\ell, N)$-cohort of contingent claim prices is any collection of contingent claim price processes $c(\tau,\ell)\equiv(c(y(\tau),\ell_1), \ldots, c(y(\tau),\ell_N))$ ($\tau\ge 0$) generated by the pricing model (5.27).
Consider for example a sample realization of three-month at-the-money option prices, or a sample realization of six-month zero-coupon bond prices. Long sequences such as the ones in these examples are commonly observed. If these sequences were generated by the pricing model in Eq. (5.27), as in Definition 5.3, they would be deterministic functions of $y$, and hence stationary. We now develop conditions ensuring both feasibility and first-order efficiency of the class of simulation-based estimators, as applied to this kind of data. Let $\bar\Sigma$ denote the matrix having the first $q^*$ rows of $\Sigma$, the diffusion matrix in Eq. (5.25). Let $\nabla C$ denote the Jacobian of $C$ with respect to $y$. We have:
Theorem 5.4. (Asset pricing and Cramer-Rao lower bound) Suppose that we observe an intertemporal $(\ell, d-q^*)$-cohort of contingent claim prices $c(\tau,\ell)$, and that there exist prediction functions $C$ in $\mathbb{R}^{d-q^*}$ with the property that for $\theta=\theta_0$ and $\gamma=\gamma_0$,
$$\det\begin{pmatrix}\bar\Sigma(\tau) \\ \nabla C(\tau)\,\Sigma(\tau)\end{pmatrix} \neq 0, \quad P\text{-a.s. all } \tau\in[t, t+1], \tag{5.28}$$
where $C$ satisfies the initial condition $C(t) = c(t,\ell)\equiv(c(y(t),\ell_1), \ldots, c(y(t),\ell_{d-q^*}))$. Let $\phi^c_t = (y^o(t), c(y(t),\ell_1), \ldots, c(y(t),\ell_{d-q^*}))$. Then, any simulation-based estimator applied to $\phi^c$ is feasible. Moreover, assume $\phi^c$ is also Markov. Then, any estimator with a span of moment conditions for $\phi^c$ that also spans the true score, attains the Cramer-Rao lower bound, with respect to the fields generated by $\phi^c$.
According to Theorem 5.4, any estimator is feasible, whenever $\phi$ is locally invertible for a time span equal to the sampling interval. As Figure 5.2 illustrates, condition (5.28) is satisfied whenever $\phi$ is locally one-to-one and onto.5 If $\phi$ is also globally invertible for the same time span, $\phi^c$ is Markov. The last part of this theorem says that in this case, any estimator is asymptotically efficient. We emphasize that this conclusion is about first-order efficiency in the joint estimation of $\theta$ and $\gamma$ given the observations on $\phi^c$.
Naturally, condition (5.28) does not ensure that $\phi$ is globally one-to-one and onto: $\phi$ might have many locally invertible restrictions.6 In practice, $\phi$ might fail to be globally invertible because monotonicity properties of $\phi$ may break down in multidimensional diffusion models. For example, in models with stochastic volatility, option prices can be decreasing in the underlying asset price (see Bergman, Grundy and Wiener, 1996). In models of the yield curve with stochastic volatility, to cite a second example, medium-long term bond prices can be increasing in the short-term rate (see Mele, 2003). These cases might arise as there is no guarantee that the solution to a stochastic differential system is nondecreasing in the initial condition of one of its components, which is, instead, always true in the scalar case.
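The gap between local and global invertibility can be checked numerically on a standard textbook map, assumed here for illustration: $\phi(y_1, y_2) = (e^{y_1}\cos y_2, e^{y_1}\sin y_2)$. Its Jacobian determinant equals $e^{2y_1}$, which never vanishes, yet the map is $2\pi$-periodic in $y_2$ and therefore not one-to-one on $\mathbb{R}^2$. The function names below are illustrative.

```python
import math

def phi(y1, y2):
    """Map R^2 -> R^2 that is locally, but not globally, invertible."""
    return (math.exp(y1) * math.cos(y2), math.exp(y1) * math.sin(y2))

def jacobian_det(y1, y2):
    """Determinant of the Jacobian of phi, computed entry by entry."""
    e = math.exp(y1)
    d11, d12 = e * math.cos(y2), -e * math.sin(y2)   # gradient of 1st component
    d21, d22 = e * math.sin(y2), e * math.cos(y2)    # gradient of 2nd component
    return d11 * d22 - d12 * d21

if __name__ == "__main__":
    # The determinant is strictly positive everywhere ...
    print(jacobian_det(0.0, 0.0), jacobian_det(-1.0, 2.5))
    # ... and yet two distinct points share (numerically) the same image.
    print(phi(0.0, 0.0), phi(0.0, 2.0 * math.pi))
```

A nonzero Jacobian determinant, as in condition (5.28), thus rules out local degeneracy only. Ruling out the kind of global multiplicity displayed by this map requires additional, model-specific monotonicity arguments.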
When all components of vector $y^o$ represent the prices of assets actively traded in frictionless markets, (5.28) corresponds to a condition ensuring market completeness in the sense of Harrison and Pliska (1983). As an example, condition (5.28) for Heston's (1993) model is $\partial c/\partial\sigma \neq 0$ $P$-a.s., where $\sigma$ denotes instantaneous volatility of the price process. This condition is satisfied by Heston's model. In fact, Romano and Touzi (1997) showed that within a fairly general class of stochastic volatility models, option prices are always strictly increasing in $\sigma$ whenever they are convex in the underlying asset price. Theorem 5.4 can be used to implement efficient estimators in other complex multidimensional models. Consider for example a three-factor model of the yield curve. Consider a state-vector $(r, \sigma, \mu)$, where $r$ is the short-term rate and $\sigma, \mu$ are additional factors (such as, say, instantaneous short-term rate volatility and a central tendency factor). Let $u^{(i)} = u(r(\tau), \sigma(\tau), \mu(\tau); M_i - \tau)$ be the time $\tau$ rational price of a pure discount bond expiring at $M_i \ge \tau$, $i = 1, 2$, and take $M_1 < M_2$. Let $\phi\equiv(r, u^{(1)}, u^{(2)})$. Condition (5.28) for this model is then,
$$u^{(1)}_\sigma u^{(2)}_\mu - u^{(1)}_\mu u^{(2)}_\sigma \neq 0, \quad P\text{-a.s. } \tau\in[t, t+1], \tag{5.29}$$
where subscripts denote partial derivatives. It is easily checked that this same condition must be satisfied by models with correlated Brownian motions and by yet more general models. Classes of models of the short-term rate for which condition (5.29) holds are more intricate to identify than in the European option pricing case seen above (see Mele, 2003).

5 Local invertibility of $\phi$ means that for every $y\in Y$, there exists an open set $Y^*$ containing $y$ such that the restriction of $\phi$ to $Y^*$ is invertible. Let $J\phi$ denote the Jacobian of $\phi$. Then, we have that $\phi$ is locally invertible on $Y^*$ if $\det J\phi \neq 0$ on $Y^*$, which is condition (5.28).
6 As an example, consider the mapping $\phi:\mathbb{R}^2\mapsto\mathbb{R}^2$ defined as $\phi(y_1, y_2) = (e^{y_1}\cos y_2, e^{y_1}\sin y_2)$. The Jacobian satisfies $\det J\phi(y_1, y_2) = e^{2y_1}$, yet $\phi$ is $2\pi$-periodic with respect to $y_2$. For example, $\phi(0, 2\pi) = \phi(0, 0)$.


5.9 Appendix 1: Proof of selected results


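Proof of Eq. (5.2). We have:
$$\Pr(E_1\cap E_2) = \Pr(E_1)\Pr(E_2\,|\,E_1).$$
Consider the event $E\equiv E_1\cap E_2$. We still have,
$$\Pr(E_3\,|\,E) = \Pr(E_3\,|\,E_1\cap E_2) = \frac{\Pr(E_3\cap E)}{\Pr(E)} = \frac{\Pr(E_3\cap E_1\cap E_2)}{\Pr(E_1\cap E_2)}.$$
That is,
$$\Pr\left(\bigcap_{i=1}^{3}E_i\right) = \Pr(E_1\cap E_2)\Pr(E_3\,|\,E_1\cap E_2) = \Pr(E_1)\Pr(E_2\,|\,E_1)\Pr(E_3\,|\,E_1\cap E_2).$$
Continuing, we obtain Eq. (5.2). $\|$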


5.10 Appendix 2: Collected notions and results


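Convergence in probability. A sequence of random vectors $\{x_T\}$ converges in probability to the random vector $\tilde x$ if for each $\epsilon > 0$, $\delta > 0$ and each $i = 1, 2, \ldots, N$, there exists a $T_{\epsilon,\delta}$ such that for every $T \ge T_{\epsilon,\delta}$,
$$\Pr\left(|x_{Ti} - \tilde x_i| > \delta\right) < \epsilon.$$
This is succinctly written as $x_T \overset{p}{\to} \tilde x$, or $\text{plim}\,x_T = \tilde x$, if $\tilde x \equiv \bar x$, a constant.

Convergence in probability generalizes the standard notion of a limit of a deterministic sequence. Of a deterministic sequence $x_T$, we say it converges to some limit $\bar x$ if, for $\epsilon > 0$, there exists a $T_\epsilon$: for each $T \ge T_\epsilon$ we have that $|x_T - \bar x| < \epsilon$. Convergence in probability can also be restated as saying that:
$$\lim_{T\to\infty}\Pr\left(|x_{Ti} - \tilde x_i| > \delta\right) = 0.$$
The following is a stronger notion of convergence:

Almost sure convergence. A sequence of random vectors $\{x_T\}$ converges almost surely to the random vector $\tilde x$ if, for each $i = 1, 2, \ldots, N$, we have:
$$\Pr\left(\omega : x_{Ti}(\omega)\to\tilde x_i\right) = 1,$$
where $\omega$ denotes the entire random sequence $x_{Ti}$. This is succinctly written as $x_T \overset{a.s.}{\to} \tilde x$.

Almost sure convergence implies convergence in probability. Convergence in probability means that for each $\epsilon > 0$, $\lim_{T\to\infty}\Pr\left(\omega : |x_{Ti}(\omega) - \tilde x_i| < \epsilon\right) = 1$. Almost sure convergence requires that $\Pr\left(\lim_{T\to\infty}x_{Ti} = \tilde x_i\right) = 1$ or that
$$\lim_{T'\to\infty}\Pr\left(\sup_{T\ge T'}|x_{Ti} - \tilde x_i| > \epsilon\right) = \lim_{T'\to\infty}\Pr\left(\bigcup_{T\ge T'}\left\{|x_{Ti} - \tilde x_i| > \epsilon\right\}\right) = 0.$$
Next, assume that the second order moments of all $x_i$ are finite. We have:

Convergence in quadratic mean. A sequence of random vectors $\{x_T\}$ converges in quadratic mean to the random vector $\tilde x$ if for each $i = 1, 2, \ldots, N$, we have:
$$\lim_{T\to\infty}E\left[(x_{Ti} - \tilde x_i)^2\right] = 0.$$
This is succinctly written as $x_T \overset{q.m.}{\to} \tilde x$.

Remark. By Chebyshev's inequality,
$$\Pr\left(|x_{Ti} - \tilde x_i| > \epsilon\right) \le \frac{E\left[(x_{Ti} - \tilde x_i)^2\right]}{\epsilon^2},$$
which shows that convergence in quadratic mean implies convergence in probability.

We now turn to a weaker notion of convergence:

Convergence in distribution. Let $\{F_T(\cdot)\}$ be the sequence of probability distributions (that is, $F_T(x) = \Pr(x_T \le x)$) of the sequence of the random vectors $\{x_T\}$. Let $\tilde x$ be a random vector with probability distribution $F(x)$. A sequence $\{x_T\}$ converges in distribution to $\tilde x$ if, for each $i = 1, 2, \ldots, N$, we have:
$$\lim_{T\to\infty}F_{Ti}(x_i) = F_i(x_i).$$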


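This is succinctly written as $x_T \overset{d}{\to} \tilde x$.

The following two results are useful to the purpose of this chapter:

Slutzky's theorem. If $y_T \overset{p}{\to} \bar y$ and $x_T \overset{d}{\to} \tilde x$, then:
$$y_T\cdot x_T \overset{d}{\to} \bar y\cdot\tilde x.$$
Cramer-Wold device. Let $\lambda$ be an $N$-dimensional vector of constants. We have:
$$x_T \overset{d}{\to} \tilde x \iff \lambda^\top x_T \overset{d}{\to} \lambda^\top\tilde x, \quad \forall\lambda.$$
The following example illustrates the Cramer-Wold device. If $\lambda^\top x_T \overset{d}{\to} N(0; \lambda^\top\Sigma\lambda)$ for all $\lambda$, then $x_T \overset{d}{\to} N(0; \Sigma)$.

We now state two laws about convergence in probability.

Weak law (No. 1) (Khinchine). Let $\{x_T\}$ be an i.i.d. sequence satisfying $E(x_T) = \mu < \infty$ for all $T$. We have:
$$\bar x_T \equiv \frac{1}{T}\sum_{t=1}^{T}x_t \overset{p}{\to} \mu.$$
Weak law (No. 2) (Chebyshev). Let $\{x_T\}$ be a sequence independent but not identically distributed, satisfying $E(x_T) = \mu_T < \infty$ and $E\left[(x_T - \mu_T)^2\right] = \sigma_T^2 < \infty$. If $\lim_{T\to\infty}\frac{1}{T^2}\sum_{t=1}^{T}\sigma_t^2 = 0$, then:
$$\frac{1}{T}\sum_{t=1}^{T}x_t - \frac{1}{T}\sum_{t=1}^{T}\mu_t \overset{p}{\to} 0.$$
We now state and provide a proof of the central limit theorem in a simple setting.

Central Limit Theorem. Let $\{x_T\}$ be an i.i.d. sequence, satisfying $E(x_T) = \mu < \infty$ and $E\left[(x_T - \mu)^2\right] = \sigma^2 < \infty$ for all $T$. Let $\bar x_T \equiv \frac{1}{T}\sum_{t=1}^{T}x_t$. We have,
$$\frac{\sqrt{T}(\bar x_T - \mu)}{\sigma} \overset{d}{\to} N(0, 1).$$
The multidimensional version of this theorem requires a mere change in notation. For the proof, the classic method relies on characteristic functions. Let:
$$\varphi(t) \equiv E\left(e^{\mathrm{i}tx}\right) = \int e^{\mathrm{i}tx}f(x)\,dx, \quad \mathrm{i}\equiv\sqrt{-1}.$$
We have $\frac{\partial^r}{\partial t^r}\varphi(t)\big|_{t=0} = \mathrm{i}^r m^{(r)}$, where $m^{(r)}$ is the $r$-th order moment. By a Taylor's expansion,
$$\varphi(t) = \varphi(0) + \frac{\partial}{\partial t}\varphi(t)\Big|_{t=0}t + \frac{1}{2}\frac{\partial^2}{\partial t^2}\varphi(t)\Big|_{t=0}t^2 + \cdots = 1 + \mathrm{i}\,m^{(1)}t - m^{(2)}\frac{t^2}{2} + \cdots.$$
Next, let $\bar x_T = \frac{1}{T}\sum_{t=1}^{T}x_t$, and consider the random variable,
$$Y_T \equiv \frac{\sqrt{T}(\bar x_T - \mu)}{\sigma} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{x_t - \mu}{\sigma}.$$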


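The characteristic function of $Y_T$ is the product of the characteristic functions of $\frac{x_t - \mu}{\sigma\sqrt{T}}$, which are all the same: $\varphi_{Y_T}(t) = \left(\varphi_z\left(t/\sqrt{T}\right)\right)^T$, where $\varphi_z$ is the characteristic function of $z_t \equiv (x_t - \mu)/\sigma$, which has mean zero and unit variance, so that
$$\varphi_z\left(\frac{t}{\sqrt{T}}\right) = 1 - \frac{t^2}{2T} + o\left(T^{-1}\right).$$
Therefore,
$$\varphi_{Y_T}(t) = \left(1 - \frac{t^2}{2T} + o\left(T^{-1}\right)\right)^T.$$
Clearly, $\lim_{T\to\infty}\varphi_{Y_T}(t) = e^{-\frac{1}{2}t^2}$, which is the characteristic function of a standard Gaussian variable.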
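The statement of the central limit theorem can be illustrated by a small Monte Carlo experiment. The following sketch assumes draws from a unit-rate exponential distribution (mean one, standard deviation one, markedly skewed), a choice made only for illustration, and checks that the standardized sample mean behaves approximately like a standard Gaussian variable. Function names are illustrative.

```python
import math
import random

def standardized_mean(draw, mu, sigma, n_obs, rng):
    """Return sqrt(T) * (sample mean - mu) / sigma for one sample of size T."""
    xbar = sum(draw(rng) for _ in range(n_obs)) / n_obs
    return math.sqrt(n_obs) * (xbar - mu) / sigma

def clt_experiment(n_obs=200, n_rep=2000, seed=0):
    """Monte Carlo mean, variance and 95% coverage of the standardized mean."""
    rng = random.Random(seed)
    draw = lambda r: r.expovariate(1.0)   # exponential: mean 1, std. dev. 1
    zs = [standardized_mean(draw, 1.0, 1.0, n_obs, rng) for _ in range(n_rep)]
    mean = sum(zs) / n_rep
    var = sum((z - mean) ** 2 for z in zs) / (n_rep - 1)
    coverage = sum(abs(z) <= 1.96 for z in zs) / n_rep
    return mean, var, coverage

if __name__ == "__main__":
    print(clt_experiment())
```

If the Gaussian approximation is accurate, the three reported numbers should be close to 0, 1 and 0.95, respectively, even though the underlying draws are far from Gaussian.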


5.11 Appendix 3: Theory for maximum likelihood estimation


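Assume that $\hat\theta_T \overset{a.s.}{\to} \theta_0$, and that $H(\theta\,|\,y)\equiv\nabla_{\theta\theta}\ln L(\theta\,|\,y)$ exists, that it is continuous in $\theta$ uniformly in $y$, and that we can differentiate twice inside the integral $\int L(\theta\,|\,y)\,dy = 1$. We have:
$$s_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\nabla_\theta\ln L(\theta\,|\,y_t).$$
Consider the $c$-parametrized curves $\theta(c) = c\circ(\theta_0 - \hat\theta_T) + \hat\theta_T$ where, for all $c\in(0,1)^p$ and $\theta\in\Theta$, $c\circ\theta$ denotes a vector in $\Theta$ where the $i$th element is $c^{(i)}\theta^{(i)}$. By the mean value theorem, there exists then a $c^*$ in $(0,1)^p$ such that we have almost surely:
$$s_T(\hat\theta_T) = s_T(\theta_0) + H_T(\theta^*)(\hat\theta_T - \theta_0),$$
where $\theta^*\equiv\theta(c^*)$ and:
$$H_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}H(\theta\,|\,y_t).$$
The first order conditions tell us that $s_T(\hat\theta_T) = 0$. Hence,
$$0 = s_T(\theta_0) + H_T(\theta^*)(\hat\theta_T - \theta_0).$$
We also have that:
$$\left|H_T(\theta^*) - H_T(\theta_0)\right| \le \frac{1}{T}\sum_{t=1}^{T}\left|H(\theta^*\,|\,y_t) - H(\theta_0\,|\,y_t)\right| \le \sup_y\left|H(\theta^*\,|\,y) - H(\theta_0\,|\,y)\right|, \tag{5A.1}$$
where the supremum is taken over the set of all the observations. Since $\hat\theta_T \overset{a.s.}{\to} \theta_0$, we also have that $\theta^* \overset{a.s.}{\to} \theta_0$. Moreover, by the law of large numbers,
$$H_T(\theta_0) = \frac{1}{T}\sum_{t=1}^{T}H(\theta_0\,|\,y_t) \overset{p}{\to} E\left[H(\theta_0\,|\,y)\right] = -J(\theta_0). \tag{5A.2}$$
Since $H$ is continuous in $\theta$ uniformly in $y$, the inequality in (5A.1), and (5A.2) both imply that:
$$H_T(\theta^*) \overset{p}{\to} -J(\theta_0).$$
Therefore, as $T\to\infty$,
$$\sqrt{T}(\hat\theta_T - \theta_0) = -H_T^{-1}(\theta^*)\sqrt{T}\,s_T(\theta_0) \overset{d}{=} J(\theta_0)^{-1}\sqrt{T}\,s_T(\theta_0).$$
By the central limit theorem, and $E(s_T(\theta_0)) = 0$, the score, $\sqrt{T}\,s_T(\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta\ln L(\theta_0\,|\,y_t)$, is such that
$$\sqrt{T}\,s_T(\theta_0) \overset{d}{\to} N\left(0, \text{var}(s(\theta_0, y))\right),$$
where
$$\text{var}(s(\theta_0, y)) = J(\theta_0).$$
The result follows by Slutzky's theorem and the symmetry of $J$.

Finally, one should show the existence of a sequence $\hat\theta_T$ converging a.s. to $\theta_0$. Proofs of this type of convergence can be found in Amemiya (1985), or in Newey and McFadden (1994).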


5.12 Appendix 4: Dependent processes


5.12.1 Weak dependence
Let

P
( =1

), and assume that that


1

( ), and that

( ))

. If

(0 1)

=1

we say that the sequence is weakly dependent. A process is said to be nonergodic when it exhibits such strong dependence that it does not even satisfy the law of large numbers.
Stationarity
Weak dependence
Ergodicity

5.12.2 The central limit theorem for martingale differences


2

Let

be a martingale di erence sequence with


1P
2 . Let,
and 2
=1
1 X
2 =1

0, lim

I|

=0

for all , and dene

1X

and

1P

=1

=1

Under the previous condition,

(0 1)

5.12.3 Applications to maximum likelihood


We use the central limit theorem for martingale differences to prove asymptotic normality of the MLE, in the case of weakly dependent processes. We have,
X

( )=

ln

( )

( )

( ;

=1

The MLE satisfies the following first order conditions,


0 =

ln

( )|

( )|

=1

whence
(

0)

"

We have:
0

which shows that

0)

( )|

0)

=1

1X

( 0)

=1

+1 ( 0 )|

1 X

( 0)

=1

]=0

is a martingale difference. Naturally, here we also have that:


(|

+1 ( 0 )|2 |

)=


+1 ( 0 )|

J ( 0)

(5A.3)



Next, for a given constant

R , let:
>

Clearly,

( 0)

is also a martingale difference. Furthermore,


2

= >J ( 0)
+1
0

and because is a martingale di erence, (


)= [ (
|
)] =
and
are mutually uncorrelated. It follows that,
0, for all . That is,

!
X
X
2
=
=1

[ ( |

]=

=1

>

(|

>

( 0 )|2 )

=1

=1

>

(|

[J

( 0 )|2 |
1 ( 0 )]

=1

>

"
X

1 )]

(J

1 ( 0 ))

"

=1

Next, dene:

1X

and

=1

1X

=1

>

1X

(J

1 ( 0 ))

=1

Under the conditions underlying the central limit theorem for weakly dependent processes provided
earlier, to be spelled out below,

(0 1)

By the Cramer-Wold device,


"

1X

(J

=1

1 2

1 ( 0 ))

1 X

( 0)

(0 I )

=1

The conditions that need to be satised are,


1X
=1

( 0)

1X

[J

1 ( 0 )]

=1

and plim

1X
=1

Under the previous conditions, it follows from Eq. (5A.3) that,

(
0 J ( 0)
0)


[J

1 ( 0 )]

J ( 0) .


5.13 Appendix 5: Proof of Theorem 5.4


Let

( ( ( + 1) M
( () M

( + 1)1
)

)| ( ( ) M

( ( ))

( () ( ()

where we have emphasized the dependence of


M
By

( ) full rank

)) denote the transition density of

( ()

))

on the time-to-maturity vector:


(

-a.s., and It
os lemma, satises, for

( ) =
( ) + ( ) ( )
( ) + ( ) ( )
( ) =

+ 1],

( )
( )

and
are, respectively, -dimensional and (
)-dimensional measurable functions, and
where
-a.s. Under condition (5.28), is not degenerate. Furthermore, ( ( ); )
( ) ( ) ( ) 1
). That is, for all ( + ) R R , there exists a function
( ) is deterministic in
( 1
such that for any neighbourhood (+ ) of + , there exists another neighborhood ( (+ )) of (+ )
such that,

(+ )
( () M
1 )=
: ( ( + 1) M ( + 1)1 )

=
: ( ( + 1) ( ( + 1) 1
)) ( ( + 1)
))
( (+ ))
=

: ( ( + 1) ( ( + 1)

| ( () M
1

))

|( ( ) ( ( )

) = }

( ( + 1)
1

))
( ()

( (+ ))

)) = }

where the last equality follows by the definition of $C$. In particular, the transition laws of $\phi^c_{t+1}$ given $\phi^c_t$ are not degenerate; and $\phi^c$ is stationary. The feasibility of simulation-based method of moments estimation is proved. The efficiency claim follows by the Markov property of $\phi^c$, and the usual score martingale difference argument.

References
Altissimo, F. and A. Mele (2009): Simulated Nonparametric Estimation of Dynamic Models.
Review of Economic Studies 76, 413-450.
Amemiya, T. (1985): Advanced Econometrics. Cambridge, Mass.: Harvard University Press.
Bergman, Y. Z., B. D. Grundy, and Z. Wiener (1996): General Properties of Option Prices.
Journal of Finance 51, 1573-1610.
Brandt, M. and P. Santa-Clara (2002): Simulated Likelihood Estimation of Diffusions with an
Application to Exchange Rate Dynamics in Incomplete Markets. Journal of Financial
Economics 63, 161-210.
Carrasco, M., M. Chernov, J.-P. Florens and E. Ghysels (2007): Efficient Estimation of General
Dynamic Models with a Continuum of Moment Conditions. Journal of Econometrics 140, 529-573.
Chernov, M. and E. Ghysels (2000): A Study towards a Unified Approach to the Joint Estimation
of Objective and Risk-Neutral Measures for the Purpose of Options Valuation.
Journal of Financial Economics 56, 407-458.
Christensen, B. J. (1992): Asset Prices and the Empirical Martingale Model. Working paper,
New York University.
Duffie, D. and K. J. Singleton (1993): Simulated Moments Estimation of Markov Models of
Asset Prices. Econometrica 61, 929-952.
Fermanian, J.-D. and B. Salanie (2004): A Nonparametric Simulated Maximum Likelihood
Estimation Method. Econometric Theory 20, 701-734.
Fisher, R. A. (1912): On an Absolute Criterion for Fitting Frequency Curves. Messenger of
Mathematics 41, 155-157.
Gallant, A. R. and G. Tauchen (1996): Which Moments to Match? Econometric Theory 12,
657-681.
Gauss, C. F. (1816): Bestimmung der Genauigkeit der Beobachtungen. Zeitschrift für
Astronomie und Verwandte Wissenschaften 1, 185-196.
Gourieroux, C., A. Monfort and E. Renault (1993): Indirect Inference. Journal of Applied
Econometrics 8, S85-S118.
Hajivassiliou, V. and D. McFadden (1998): The Method of Simulated Scores for the Estimation of Limited-Dependent Variable Models. Econometrica 66, 863-896.
Hansen, L. P. (1982): Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 50, 1029-1054.
Hansen, L. P. and K. J. Singleton (1982): Generalized Instrumental Variables Estimation of
Nonlinear Rational Expectations Models. Econometrica 50, 1269-1286.

Hansen, L. P. and K. J. Singleton (1983): Stochastic Consumption, Risk Aversion, and the
Temporal Behavior of Asset Returns. Journal of Political Economy 91, 249-265.
Harrison, J. M. and S. R. Pliska (1983): A Stochastic Calculus Model of Continuous Trading:
Complete Markets. Stochastic Processes and their Applications 15, 313-316.
Heston, S. (1993): A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options. Review of Financial Studies 6, 327-343.
Laroque, G. and B. Salanie (1989): Estimation of Multimarket Fix-Price Models: An Application of Pseudo-Maximum Likelihood Methods. Econometrica 57, 831-860.
Laroque, G. and B. Salanie (1993): Simulation-Based Estimation of Models with Lagged
Latent Variables. Journal of Applied Econometrics 8, S119-S133.
Laroque, G. and B. Salanie (1994): Estimating the Canonical Disequilibrium Model: Asymptotic Theory and Finite Sample Properties. Journal of Econometrics 62, 165-210.
Lee, B-S. and B. F. Ingram (1991): Simulation Estimation of Time-Series Models. Journal
of Econometrics 47, 197-207.
Lee, L. F. (1995): Asymptotic Bias in Simulated Maximum Likelihood Estimation of Discrete
Choice Models. Econometric Theory 11, 437-483.
McFadden, D. (1989): A Method of Simulated Moments for Estimation of Discrete Response
Models without Numerical Integration. Econometrica 57, 995-1026.
Mele, A. (2003): Fundamental Properties of Bond Prices in Models of the Short-Term Rate.
Review of Financial Studies 16, 679-716.
Newey, W. K. and D. L. McFadden (1994): Large Sample Estimation and Hypothesis Testing. In: Engle, R. F. and D. L. McFadden (Editors): Handbook of Econometrics, Vol. 4,
Chapter 36, 2111-2245. Amsterdam: Elsevier.
Neyman, J. and E. S. Pearson (1928): On the Use and Interpretation of Certain Test Criteria
for Purposes of Statistical Inference. Biometrika 20A, 175-240, 263-294.
Pakes, A. and D. Pollard (1989): Simulation and the Asymptotics of Optimization Estimators. Econometrica 57, 1027-1057.
Pastorello, S., E. Renault and N. Touzi (2000): Statistical Inference for Random-Variance
Option Pricing. Journal of Business and Economic Statistics 18, 358-367.
Pastorello, S., V. Patilea, and E. Renault (2003): Iterative and Recursive Estimation in
Structural Non Adaptive Models. Journal of Business and Economic Statistics 21, 449-509.
Pearson, K. (1894): Contributions to the Mathematical Theory of Evolution. Philosophical
Transactions of the Royal Society of London, Series A 185, 71-78.
Romano, M. and N. Touzi (1997): Contingent Claims and Market Completeness in a Stochastic Volatility Model. Mathematical Finance 7, 399-412.

Santa-Clara, P. (1995): Simulated Likelihood Estimation of Diffusions With an Application
to the Short Term Interest Rate. Ph.D. dissertation, INSEAD.
Singleton, K. J. (2001): Estimation of Affine Asset Pricing Models Using the Empirical
Characteristic Function. Journal of Econometrics 102, 111-141.
Smith, A. (1993): Estimating Nonlinear Time Series Models Using Simulated Vector Autoregressions. Journal of Applied Econometrics 8, S63-S84.
Tauchen, G. (1997): New Minimum Chi-Square Methods in Empirical Finance. In D. Kreps
and K. Wallis (Editors): Advances in Econometrics, 7th World Congress, Econometric
Society Monographs, Vol. III. Cambridge UK: Cambridge University Press, 279-317.


Part II
Applied asset pricing theory


6
Neo-classical kernels and puzzles

6.1 Introduction
Asset pricing models impose a number of restrictions on security returns, which can conveniently
be summarized by a few key properties of the pricing kernel, consistent with a data-reduction
principle. This chapter discusses methods of statistical inference based on this data reduction,
relying on restrictions on the moments of the pricing kernel based on the celebrated Hansen
and Jagannathan (1991) bounds: evidence against any model mounts when the volatility of
the pricing kernel is below a certain threshold. We illustrate how these bounds are useful
by revisiting a simplified version of the Lucas tree model introduced in Chapter 3. We shall
examine issues within this methodology arising in finite samples, and review the efforts needed
to tackle them.
The next section contains a simplified version of the Lucas model, which we take as a useful
benchmark in this chapter. Section 6.3 develops the central tool of analysis, a non-parametric
bound on the volatility of the pricing kernel, i.e. the risk-premium, consistent with a given level
of the short-term rate. Section 6.4 considers multifactor extensions, with closed-form solutions,
arising under a number of analytically convenient assumptions on the stochastic discount
factor. One of the striking points of Section 6.4 is that in spite of these added dimensions,
the resulting models might spectacularly fail to explain the dynamics of asset prices: pumping up
volatility is not enough, if this added volatility is not accompanied by time-varying countercyclical
statistics. Section 6.5 develops a link between stochastic discount factors and Sharpe ratios,
and Section 6.6 develops dynamic versions of the core bounds at the heart of this chapter.


6.2 The equity premium puzzle


6.2.1 A single-factor model
6.2.1.1 Assumptions

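We consider an economy with a single agent with CRRA equal to $\eta$, and a constant discount
factor $\beta$. We assume cum-dividend gross returns, $(S_t + D_t)/S_{t-1}$, are generated by:

$$
\begin{aligned}
\ln(S_t + D_t) &= \ln S_{t-1} + \mu_S - \tfrac{1}{2}\sigma_S^2 + \epsilon_{S,t} \\
\ln D_t &= \ln D_{t-1} + \mu_D - \tfrac{1}{2}\sigma_D^2 + \epsilon_{D,t}
\end{aligned}
\tag{6.1}
$$

where

$$
\begin{bmatrix} \epsilon_{S,t} \\ \epsilon_{D,t} \end{bmatrix}
\sim \mathrm{NID}\left( 0_2 ; \begin{bmatrix} \sigma_S^2 & \sigma_{SD} \\ \sigma_{SD} & \sigma_D^2 \end{bmatrix} \right).
$$

The second equation in (6.1) is exogenously given. The first is endogenous, so to speak: we want
to find parameter values such that Eqs. (6.1) hold in equilibrium.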
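6.2.1.2 Euler's restrictions

The coefficients $\mu_S$, $\mu_D$, $\sigma_S^2$, $\sigma_D^2$ and $\sigma_{SD}$ need to satisfy joint restrictions consistent with the Euler
equations, which characterize the equilibrium outcome, as explained in Part I of the Lectures. By
results in Part I of the Lectures, these Euler equations are:

$$
1 = E\left[ m_{t+1} \frac{S_{t+1} + D_{t+1}}{S_t} \,\Big|\, \mathcal{F}_t \right]
  = E\left[ e^{Z_{t+1} + Q_{t+1}} \,\big|\, \mathcal{F}_t \right],
\quad
Z_{t+1} \equiv \ln\left( \beta \left( \frac{D_{t+1}}{D_t} \right)^{-\eta} \right),
\quad
Q_{t+1} \equiv \ln\left( \frac{S_{t+1} + D_{t+1}}{S_t} \right)
\tag{6.2}
$$

where $\mathcal{F}_t$ is the information set as of time $t$, and $m_{t+1} \equiv e^{Z_{t+1}}$ is the stochastic discount factor.
Naturally, Eq. (6.2) holds for any asset. In particular, it holds for a one-period bond with price
$S^b_t \equiv b_t$, $S^b_{t+1} \equiv 1$ and $D^b_{t+1} \equiv 0$. Define, then, $Q^b_{t+1} \equiv \ln(b_t^{-1}) \equiv \ln R_t$. By replacing $Q^b_{t+1}$ into
Eq. (6.2), one gets $R_t^{-1} = E(e^{Z_{t+1}} | \mathcal{F}_t)$, such that,

$$
R_t^{-1} = E\left( e^{Z_{t+1}} \,\big|\, \mathcal{F}_t \right),
\qquad
1 = E\left( e^{Z_{t+1} + Q_{t+1}} \,\big|\, \mathcal{F}_t \right).
\tag{6.3}
$$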

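The following result helps determine the expectations in Eqs. (6.3) in closed form:

Lemma 6.1: Let $Z_{t+1}$ be normally distributed, conditionally upon the information set at time $t$. Then, for any $\gamma \in \mathbb{R}$,

$$
E\left( e^{-\gamma Z_{t+1}} \,\big|\, \mathcal{F}_t \right)
= e^{-\gamma E(Z_{t+1}|\mathcal{F}_t) + \frac{1}{2}\gamma^2 \mathrm{var}(Z_{t+1}|\mathcal{F}_t)}
$$

$$
\sqrt{\mathrm{var}\left( e^{-\gamma Z_{t+1}} \,\big|\, \mathcal{F}_t \right)}
= e^{-\gamma E(Z_{t+1}|\mathcal{F}_t) + \gamma^2 \mathrm{var}(Z_{t+1}|\mathcal{F}_t)}
\sqrt{1 - e^{-\gamma^2 \mathrm{var}(Z_{t+1}|\mathcal{F}_t)}}
$$

By the definition of $Z$, Eq. (6.1), and Lemma 6.1,

$$
R_t^{-1} = E\left( e^{Z_{t+1}} \,\big|\, \mathcal{F}_t \right)
= e^{E(Z_{t+1}|\mathcal{F}_t) + \frac{1}{2}\mathrm{var}(Z_{t+1}|\mathcal{F}_t)}
= e^{\ln\beta - \eta\left(\mu_D - \frac{1}{2}\sigma_D^2\right) + \frac{1}{2}\eta^2\sigma_D^2}.
\tag{6.4}
$$

Therefore, the equilibrium interest rate is constant; its expression is given in the second of
Eqs. (6.5) below.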
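The two closed-form moments in Lemma 6.1 are easy to check numerically. The following is a minimal sketch, not part of the original notes: the function names and the parameter values are illustrative assumptions, and the numerical benchmark is a plain trapezoidal integration against the normal density.

```python
import math

def lemma_mean(gamma, mean_z, var_z):
    """Closed-form E[exp(-gamma * Z)] for Z ~ N(mean_z, var_z)."""
    return math.exp(-gamma * mean_z + 0.5 * gamma**2 * var_z)

def lemma_std(gamma, mean_z, var_z):
    """Closed-form sqrt(var[exp(-gamma * Z)]) for Z ~ N(mean_z, var_z)."""
    v = gamma**2 * var_z
    return math.exp(-gamma * mean_z + v) * math.sqrt(1.0 - math.exp(-v))

def quadrature_moments(gamma, mean_z, var_z, n=20001, width=12.0):
    """Mean and standard deviation of exp(-gamma * Z) by trapezoidal integration."""
    sd = math.sqrt(var_z)
    lo = mean_z - width * sd
    h = 2.0 * width * sd / (n - 1)
    m1 = m2 = 0.0
    for i in range(n):
        z = lo + i * h
        w = 0.5 * h if i in (0, n - 1) else h          # trapezoid weights
        pdf = math.exp(-0.5 * ((z - mean_z) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        x = math.exp(-gamma * z)
        m1 += w * x * pdf
        m2 += w * x * x * pdf
    return m1, math.sqrt(m2 - m1 * m1)

# Hypothetical values, chosen only for illustration.
gamma, mean_z, var_z = 2.0, 0.02, 0.04
num_mean, num_std = quadrature_moments(gamma, mean_z, var_z)
print(lemma_mean(gamma, mean_z, var_z), num_mean)
print(lemma_std(gamma, mean_z, var_z), num_std)
```

Setting `gamma = -1` gives the case used for Eq. (6.4), where the expectation of $e^{Z_{t+1}}$ is needed.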


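The second of equations (6.3) can be written as,

$$
1 = E\left( \exp(Z_{t+1} + Q_{t+1}) \,\big|\, \mathcal{F}_t \right)
= e^{\ln\beta - \eta\left(\mu_D - \frac{1}{2}\sigma_D^2\right) + \mu_S - \frac{1}{2}\sigma_S^2}
\, E\left( e^{\tilde\epsilon_{t+1}} \,\big|\, \mathcal{F}_t \right)
$$

where $\tilde\epsilon_{t+1} \equiv \epsilon_{S,t+1} - \eta\epsilon_{D,t+1} \sim N(0, \sigma_S^2 + \eta^2\sigma_D^2 - 2\eta\sigma_{SD})$. The expectation in the above equation
can be determined through Lemma 6.1, resulting in,

$$
\mu_S = r + \underbrace{\eta\sigma_{SD}}_{\text{risk premium}}
\quad\text{and}\quad
r = -\ln\beta + \eta\mu_D - \frac{1}{2}\eta(\eta + 1)\sigma_D^2
\tag{6.5}
$$

where we have used Eq. (6.4) to calculate the interest rate, $r \equiv \ln R_t$. Note that the expected gross
return on the risky asset is,

$$
E\left[ \frac{S_{t+1} + D_{t+1}}{S_t} \,\Big|\, \mathcal{F}_t \right]
= e^{\mu_S - \frac{1}{2}\sigma_S^2} E\left[ e^{\epsilon_{S,t+1}} \,\big|\, \mathcal{F}_t \right]
= e^{\mu_S} = e^{r + \eta\sigma_{SD}}.
$$

Therefore, if $\sigma_{SD} > 0$, then $E((S_{t+1} + D_{t+1})/S_t \,|\, \mathcal{F}_t) > R_t$, as expected.
The expressions for the equity premium and the short-term rate are the discrete-time counterparts
to those derived in Chapter 4. Consider, for example, the interest rate. The second
term, $\eta\mu_D$, reflects intertemporal substitution effects: consumption endowment increases, on
average, as $\mu_D$ increases, which reduces the demand for bonds, thereby increasing the interest
rate. The last term, instead, relates to precautionary motives: an increase in the uncertainty
related to consumption endowment, $\sigma_D$, raises concerns with our representative agent, who
then increases his demand for bonds, thereby leading to a drop in the interest rate.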
6.2.1.3 Solution for the P/D ratio and absence of sunspots

We check the internal consistency of the model. The coefficients of the model satisfy some
restrictions. In particular, the asset price volatility must be determined endogenously. Let us
conjecture, and later verify, that the following no-sunspots condition holds, for each period $t$:

$$
\epsilon_{S,t} = \epsilon_{D,t}
\tag{6.6}
$$

By Eqs. (6.5) and (6.6),

$$
\mu_S = r + \lambda\sigma_D, \qquad \lambda \equiv \eta\sigma_D
\tag{6.7}
$$

and by the definition of $Z_{t+1}$ in Eq. (6.2),

$$
Z_{t+1} = -\left( r + \tfrac{1}{2}\lambda^2 \right) - \lambda u_{D,t+1},
\qquad u_{D,t+1} \equiv \frac{\epsilon_{D,t+1}}{\sigma_D}
$$

such that we can define the pricing kernel $\xi_t$, from the stochastic discount factor, recursively,
as follows:

$$
\frac{\xi_{t+1}}{\xi_t} = m_{t+1} \equiv \exp(Z_{t+1}), \qquad \xi_0 = 1.
$$

It is the discrete-time counterpart to the continuous-time representation of the Arrow-Debreu
state price density given in Chapter 4.


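Let us iterate the asset price equation (6.2),

$$
S_t = E\left[ \left( \prod_{i=1}^{n} e^{Z_{t+i}} \right) S_{t+n} \,\Big|\, \mathcal{F}_t \right]
    + \sum_{i=1}^{n} E\left[ \left( \prod_{j=1}^{i} e^{Z_{t+j}} \right) D_{t+i} \,\Big|\, \mathcal{F}_t \right]
  = E\left( \frac{\xi_{t+n}}{\xi_t} S_{t+n} \,\Big|\, \mathcal{F}_t \right)
    + \sum_{i=1}^{n} E\left( \frac{\xi_{t+i}}{\xi_t} D_{t+i} \,\Big|\, \mathcal{F}_t \right).
$$

By letting $n \to \infty$ and assuming the first term in the previous equation goes to zero,

$$
S_t = \sum_{i=1}^{\infty} E\left( \frac{\xi_{t+i}}{\xi_t} D_{t+i} \,\Big|\, \mathcal{F}_t \right).
\tag{6.8}
$$

Eq. (6.8) holds, as just mentioned, under a transversality condition, similar to that analyzed in
Chapter 4, Section 4.3.3, which always holds under the inequalities given in (6.9) below.
The expectations in Eq. (6.8) are, by Lemma 6.1,

$$
E\left( \frac{\xi_{t+i}}{\xi_t} D_{t+i} \,\Big|\, \mathcal{F}_t \right)
= D_t \, E\left( e^{\sum_{j=1}^{i} \left( Z_{t+j} + \ln(D_{t+j}/D_{t+j-1}) \right)} \,\Big|\, \mathcal{F}_t \right)
= D_t \, e^{(\mu_D - r - \sigma_D\lambda) i}.
$$

Suppose the risk-adjusted discount rate $r + \sigma_D\lambda$ is higher than the growth rate of the economy, viz

$$
r + \sigma_D\lambda > \mu_D
\iff
k \equiv e^{\mu_D - r - \sigma_D\lambda} < 1.
\tag{6.9}
$$

Under this condition, the summation in Eq. (6.8) converges, leaving:

$$
\frac{S_t}{D_t} = \frac{k}{1 - k}.
\tag{6.10}
$$

Eq. (6.10) links to the celebrated Gordon's formula (Gordon, 1962). The price-dividend ratio
increases with the expected dividend growth, $\mu_D$, and decreases with the (risk-adjusted) discount
rate, $r + \sigma_D\lambda$. The model predicts that price-dividend ratios are constant, a counterfactual
prediction addressed in the next two chapters. Finally, the solution for the price-dividend ratio
in Eq. (6.10) is, of course, consistent with that of the Lucas model in Chapter 3, Section 3.2.4,
as shown below.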
We now check that the no-sunspots condition in Eq. (6.6) holds, and derive the asset price
variance. By Eq. (6.10), $\ln(S_t + D_t) = -\ln(1 - k) + \ln D_t$ and $\ln S_{t-1} = \ln k - \ln(1 - k) + \ln D_{t-1}$,
or

$$
\ln(S_t + D_t) - \ln S_{t-1} = -\ln k + \ln D_t - \ln D_{t-1}
= -\ln k + \mu_D - \tfrac{1}{2}\sigma_D^2 + \epsilon_{D,t}
$$

where the last equality follows by the second of Eqs. (6.1). Identifying terms in the previous
equation and the first of Eqs. (6.1) leaves:

$$
\mu_S - \tfrac{1}{2}\sigma_S^2 = \mu_D - \ln k - \tfrac{1}{2}\sigma_D^2,
\qquad
\epsilon_{S,t} = \epsilon_{D,t}, \text{ for each } t.
\tag{6.11}
$$

The second condition confirms that the no-sunspots condition in Eq. (6.6) holds. It also informs
us that $\sigma_S^2 = \sigma_{SD} = \sigma_D^2$. Replacing this into the first condition confirms indeed that
$\mu_S = \mu_D - \ln k = r + \sigma_D\lambda$.
Note also that by replacing the expression for the interest rate in the second of Eqs. (6.5) and
the equity premium in Eq. (6.7) into Eq. (6.9), the constant $k$ simplifies to

$$
k = \beta \, e^{(\eta - 1)\left( -\mu_D + \frac{1}{2}\eta\sigma_D^2 \right)},
$$

such that the price-dividend ratio in the log-utility case, $\eta = 1$, collapses to $\beta/(1 - \beta)$, as established
in Chapter 3. This section provides a solution to the general CRRA case, under the additional
assumption that dividend growth is lognormally distributed.
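The price-dividend ratio of this section can be evaluated directly from the primitives. The sketch below is illustrative only: function names are hypothetical, and the calibration ($\beta = 0.95$, $\mu_D = 0.0183$, $\sigma_D = 0.0328$) is the one quoted later in this chapter's figure captions. It computes $k$ from the inequality in (6.9), cross-checks it against the simplified expression for $k$, and compares the closed form $k/(1-k)$ with a truncated version of the sum in Eq. (6.8).

```python
import math

def discount_constant(beta, eta, mu_d, sigma_d):
    """k = exp(mu_D - r - sigma_D * lambda), with r from Eq. (6.5) and lambda = eta * sigma_D."""
    r = -math.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1.0) * sigma_d**2
    lam = eta * sigma_d
    return math.exp(mu_d - r - sigma_d * lam)

def discount_constant_simplified(beta, eta, mu_d, sigma_d):
    """Simplified expression: k = beta * exp((eta - 1) * (-mu_D + 0.5 * eta * sigma_D^2))."""
    return beta * math.exp((eta - 1.0) * (-mu_d + 0.5 * eta * sigma_d**2))

def pd_ratio(k):
    """Price-dividend ratio k / (1 - k), valid only when k < 1 (condition 6.9)."""
    if k >= 1.0:
        raise ValueError("condition (6.9) fails: the sum in Eq. (6.8) diverges")
    return k / (1.0 - k)

def pd_ratio_truncated(k, n_terms):
    """Partial sum k + k^2 + ... + k^n_terms of the series in Eq. (6.8), per unit of dividend."""
    return sum(k**i for i in range(1, n_terms + 1))

beta, mu_d, sigma_d = 0.95, 0.0183, 0.0328
for eta in (1.0, 2.0, 10.0, 30.0):
    k = discount_constant(beta, eta, mu_d, sigma_d)
    print(eta, round(k, 6), round(pd_ratio(k), 4))
```

With log utility (`eta = 1.0`) the printed ratio is $\beta/(1-\beta) = 19$, independent of $\mu_D$ and $\sigma_D$.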


6.2.2 Extensions

Chapter 3 shows that within an IID environment, prices are convex in the dividend $D$ if $\eta > 1$,
and concave if $\eta < 1$. Eq. (6.10) reveals this property may be lost in a dynamic environment.
The next chapter shows that in such an environment, the convexity of the price depends on
that of the dividend process, in the following sense: if the expected dividend growth under the
risk-neutral probability is convex (resp. concave) in $D$, the price is convex (resp. concave) in $D$.
In the model of this section, the expected dividend growth under the risk-neutral probability
is linear in $D$, which explains the linear property in Eq. (6.10). [In progress, give economic
intuition]
6.2.3 Equity premium and interest rate puzzles
Average excess returns on the US stock market [the equity premium] is too high to be
easily explained by standard asset pricing models. Mehra and Prescott (1985)

Mehra and Prescott (1985) noted the following difficulty with the Lucas model, which gave
rise to what is widely known as the equity premium puzzle. To be consistent with US data,
the equity premium in Eq. (6.7), $\mu_S - r = \eta\sigma_D^2$, must be approximately 6% annualized, as
explained in the next chapter. If the asset we are trying to price is literally a consumption
claim, then $\sigma_D$ would be consumption volatility, which is very low, approximately 3.3%. For
the equity premium to be high, we would need, by reverse-engineering the two equations in
(6.7), a quite high value of the relative risk-aversion, say $\eta = 0.06/0.033^2 \approx 55$. Section 6.4 explains
this number, 55, can be improved to 35, once we also condition on the volatility of short-term
bonds.
One assumption underlying the previous calculations is that aggregate dividends equal aggregate
consumption, which is obviously not the case in the real world. Note, then, that dividend
growth volatility is around 6%, which implies that the implied $\eta$ is $17 \approx 0.06/0.06^2$, thereby mitigating the
premium puzzle. Still, the model would fail to deliver realistic predictions about return volatility,
as in this case, return volatility would be just 6%, by Eq. (6.11), which is less than a half of what
we see in the data, as explained in the next chapter. Moreover, the model would fail to predict
countercyclical statistics, such as countercyclical expected returns or dividend yields. We shall
return to these topics in the next chapter.
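The reverse-engineering step is a one-line calculation. As a minimal sketch (the function name is an illustrative assumption), solving $\mu_S - r = \eta\sigma_D^2$ for $\eta$ with the two volatility figures quoted above gives:

```python
def implied_risk_aversion(equity_premium, sigma_d):
    """Solve mu_S - r = eta * sigma_D^2 (Eq. 6.7) for eta."""
    return equity_premium / sigma_d**2

# Consumption claim: sigma_D is consumption volatility, about 3.3%.
eta_consumption = implied_risk_aversion(0.06, 0.033)
# Dividend claim: sigma_D is dividend growth volatility, about 6%.
eta_dividends = implied_risk_aversion(0.06, 0.06)

print(round(eta_consumption, 1), round(eta_dividends, 1))
```

In both cases the model ties return volatility to $\sigma_D$ through Eq. (6.11), so lowering the implied $\eta$ in this way does nothing for the volatility of returns.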

[Figure 6.1: plot of the short-term rate $r$ (vertical axis, from -0.1 to 0.1) against risk aversion (horizontal axis, labelled "gamma" in the plot, from 0 to 40).]

FIGURE 6.1. The risk-free rate puzzle: the two curves depict the graph
$\eta \mapsto r(\eta) = -\ln\beta + 0.0183\,\eta - \frac{1}{2}(0.0328)^2\,\eta(\eta + 1)$, with $\beta = 0.95$ (solid line) and $\beta = 1.05$ (dashed
line). Even if risk aversion were to be as high as $\eta = 30$, the equilibrium short-term rate
would behave counterfactually, reaching a level as high as 10%. In order for $r$ to be lower
when $\eta$ is high, it might be required that $\beta > 1$.
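The curve in the caption can be tabulated, and split into its three economic components, with a few lines of code. This is a sketch under the caption's calibration; the decomposition labels (`impatience`, `substitution`, `precaution`) are names introduced here for illustration, following the discussion of the short-term rate in Section 6.2.1.2.

```python
import math

MU_D, SIGMA_D = 0.0183, 0.0328   # calibration quoted in the caption of Figure 6.1

def rate_components(beta, eta, mu_d=MU_D, sigma_d=SIGMA_D):
    """Split r in Eq. (6.5) into impatience, substitution and precautionary terms."""
    impatience = -math.log(beta)
    substitution = eta * mu_d
    precaution = -0.5 * eta * (eta + 1.0) * sigma_d**2
    return impatience, substitution, precaution

def short_rate(beta, eta, mu_d=MU_D, sigma_d=SIGMA_D):
    """Equilibrium short-term rate r(eta), second of Eqs. (6.5)."""
    return sum(rate_components(beta, eta, mu_d, sigma_d))

for eta in (1, 10, 20, 30, 40):
    print(eta, round(short_rate(0.95, eta), 4), round(short_rate(1.05, eta), 4))
```

The substitution term grows linearly in $\eta$ while the precautionary term grows quadratically, which is why the curve is hump-shaped; with this calibration the peak is close to $\eta = \mu_D/\sigma_D^2 - 1/2 \approx 16.5$.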

The equity premium puzzle is not the only one. Even if we are willing to consider that a
CRRA as large as $\eta = 30$ is plausible, another puzzle arises: an interest rate puzzle. As the
expression for $r$ in equations (6.5) shows, a large value of $\eta$ can lead the interest rate to take very
high values, as illustrated by Figure 6.1. Finally, related to the interest rate puzzle is an interest
rate volatility puzzle. In the model of this chapter, the safe rate is constant. However, in models
where both the equity premium and interest rates change over time, driven by state variables
related to, say, preference shocks or market imperfections, the short-term rate is too volatile.
For example, in the presence of time-varying expected dividend growth, $\mu_{D,t}$ say, the expression
for the short-term rate is the same as in Eq. (6.5), but with $\mu_{D,t}$ replacing the constant $\mu_D$, as
explained in the next chapters. It is easily seen, then, that the interest rate is quite volatile for
high values of $\eta$.
This interest rate volatility puzzle relates to the assumption of a representative agent. Chapter
3 (Section 3.2.3) explains that agents with low elasticity of intertemporal substitution (EIS)
have an inelastic demand for bonds. In the context of CRRA utility functions, a low EIS
corresponds to a high CRRA, as EIS $= 1/\eta$, as explained in Chapter 3. So now suppose there is an
economy-wide shock that shifts the demand for bonds, as in the following picture; for example,
a shock that makes $\mu_{D,t}$ change.

[Figure: bond price (vertical axis), with bond supply and bond demand schedules.]

An economy with a representative agent is one where the supply of bonds is fixed. The combination
of a representative agent with a low EIS, then, implies a high volatility of the short-term
rate, which is counterfactual. To mitigate this issue, one may consider preferences that disentangle
the EIS from risk-aversion, such as those relying on non-expected utility (Epstein
and Zin, 1989, 1991; Weil, 1989), or a framework with multiple agents, where bond supply is
positively sloped, as in the limited participation model of Guvenen (2009). These models are
examined in Chapter 8.

6.3 Hansen-Jagannathan cup


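Suppose there are $n$ risky assets. The $n$ asset pricing equations for these assets are,

$$
1 = E\left[ m_{t+1} (1 + R_{j,t+1}) \,\big|\, \mathcal{F}_t \right], \qquad j = 1, \dots, n.
$$

Assuming $R_{t+1}$ is stationary, and taking the unconditional expectation of both sides of the
previous equation, leaves,

$$
1_n = E\left[ m_t (1_n + R_t) \right], \qquad R_t = (R_{1,t}, \dots, R_{n,t})^\top.
\tag{6.12}
$$

Next, let $\bar m \equiv E(m_t)$, and create a family of stochastic discount factors $m_t^*(\bar m)$, parametrized by
$\bar m$, by projecting $m$ on to the asset returns, as follows:

$$
\mathrm{Proj}(m \,|\, 1_n + R_t) \equiv m_t^*(\bar m) = \bar m + \left[ R_t - E(R_t) \right]^\top \beta_{\bar m}
$$

where$^1$

$$
\beta_{\bar m} = \Sigma^{-1} \mathrm{cov}(m, 1_n + R_t) = \Sigma^{-1} \left[ 1_n - \bar m E(1_n + R_t) \right]
$$

and $\Sigma \equiv E\left[ (R_t - E(R_t))(R_t - E(R_t))^\top \right]$. The Appendix shows that $m_t^*(\bar m)$ is also a stochastic
discount factor, i.e.,

$$
1_n = E\left[ m_t^*(\bar m) (1_n + R_t) \right].
\tag{6.13}
$$

We have,

$$
\sqrt{\mathrm{var}(m_t^*(\bar m))} = \sqrt{\beta_{\bar m}^\top \Sigma \beta_{\bar m}}
= \sqrt{ \left( 1_n - \bar m E(1_n + R_t) \right)^\top \Sigma^{-1} \left( 1_n - \bar m E(1_n + R_t) \right) }.
\tag{6.14}
$$

Eq. (6.14) provides the expression for the celebrated Hansen-Jagannathan "cup", after the
work of Hansen and Jagannathan (1991). It leads to an important tool of analysis, as the
following theorem shows.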
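Theorem 6.2: Among all stochastic discount factors with fixed expectation $\bar m$, $m_t^*(\bar m)$ is the
one with the smallest variance.
Proof: Consider another discount factor indexed by $\bar m$, i.e. $m_t(\bar m)$. Naturally, $m_t(\bar m)$ satisfies
$1_n = E[m_t(\bar m)(1_n + R_t)]$. Moreover, by Eq. (6.13),

$$
\begin{aligned}
0_n &= E\left[ (m_t(\bar m) - m_t^*(\bar m)) (1_n + R_t) \right] \\
    &= E\left[ (m_t(\bar m) - m_t^*(\bar m)) \left( (1_n + E(R_t)) + (R_t - E(R_t)) \right) \right] \\
    &= E\left[ (m_t(\bar m) - m_t^*(\bar m)) (R_t - E(R_t)) \right] \\
    &= \mathrm{cov}\left[ m_t(\bar m) - m_t^*(\bar m), R_t \right]
\end{aligned}
\tag{6.15}
$$

where the third line follows because $E[m_t(\bar m)] = E[m_t^*(\bar m)] = \bar m$, and the fourth line holds by
$E[m_t(\bar m) - m_t^*(\bar m)] = 0$. But $m_t^*(\bar m)$ is a linear combination of $R_t$. By the previous equation,
then, it must be that $0 = \mathrm{cov}[m_t(\bar m) - m_t^*(\bar m), m_t^*(\bar m)]$. Therefore:

$$
\begin{aligned}
\mathrm{var}[m_t(\bar m)] &= \mathrm{var}\left[ m_t^*(\bar m) + m_t(\bar m) - m_t^*(\bar m) \right] \\
 &= \mathrm{var}[m_t^*(\bar m)] + \mathrm{var}[m_t(\bar m) - m_t^*(\bar m)] + 2\,\mathrm{cov}[m_t(\bar m) - m_t^*(\bar m), m_t^*(\bar m)] \\
 &= \mathrm{var}[m_t^*(\bar m)] + \mathrm{var}[m_t(\bar m) - m_t^*(\bar m)] \\
 &\geq \mathrm{var}[m_t^*(\bar m)]. \qquad \blacksquare
\end{aligned}
$$

$^1$ We have, $\mathrm{cov}(m, 1_n + R_t) = E[m(1_n + R_t)] - E(m)E(1_n + R_t) = 1_n - \bar m E(1_n + R_t)$.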
Hansen and Jagannathan (1991) consider an extension of this result, in which the stochastic
discount factor satisfies the non-negativity constraint, $m_t \geq 0$.
Consider, then, the curve, $\bar m \mapsto \sqrt{\mathrm{var}[m_t^*(\bar m)]}$: the "cup". By Theorem 6.2, each pair
$(\bar m, \sqrt{\mathrm{var}[m_t(\bar m)]})$ predicted by any candidate model satisfying Eq. (6.12) has to lie above the
cup for each possible $\bar m$, $(\bar m, \sqrt{\mathrm{var}[m_t^*(\bar m)]})$. The idea is the following. Testing the validity of a
model is tantamount to assessing whether the Euler equations it predicts are satisfied and how
volatile its pricing kernel is. Consider, then, Eq. (6.12), and assume that a candidate pricing
kernel is parametrized by some vector $\theta$, such that $m_t(\bar m; \theta) \equiv m_t(\bar m)$ say. We want to find
values of $\theta$, say $\hat\theta$, such that $m_t(\bar m; \hat\theta)$ is a valid stochastic discount factor, i.e.
such that it prices all the assets, just as in Eq. (6.12), and is volatile enough. Requiring that it prices
all the assets is actually a condition needed for Theorem 6.2 to hold; in particular, the proof
shows that in this case Eq. (6.15) would hold. Naturally, the pricing kernel needs to be volatile
enough as well. The issue then is to ascertain whether the values of $\theta$ that let the pricing kernel
lie in the cup are economically reasonable, so to speak.
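The bound in Eq. (6.14) is straightforward to compute once mean returns and their covariance matrix are given. The sketch below uses only the standard library, with a small Gaussian-elimination helper in place of a matrix inverse. The inputs are the two sample moments quoted in the caption of Figure 6.2 (stock market: mean 0.07, standard deviation 0.14; three-month bill: mean 0.01, standard deviation 0.02); the zero correlation between the two assets is an assumption made here purely for illustration, since the text does not report it, so the numbers need not match the figure.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def hj_bound(m_bar, mean_returns, cov_returns):
    """Lower bound on the std of any SDF with mean m_bar, as in Eq. (6.14)."""
    d = [1.0 - m_bar * (1.0 + mu) for mu in mean_returns]   # 1_n - m_bar * E(1_n + R)
    beta = solve(cov_returns, d)                             # Sigma^{-1} d
    return math.sqrt(sum(di * bi for di, bi in zip(d, beta)))

means = [0.07, 0.01]
cov = [[0.14**2, 0.0],
       [0.0, 0.02**2]]          # zero off-diagonal terms: an illustrative assumption

for m_bar in (0.90, 0.95, 0.98, 0.99, 1.00, 1.05):
    print(m_bar, round(hj_bound(m_bar, means, cov), 4))
```

Tracing `hj_bound` over a grid of `m_bar` values produces the cup: the bound is lowest near the inverse of the gross riskless return and rises steeply on either side, because the nearly riskless bill pins down $E(m)$ tightly.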
To illustrate, consider the Lucas model of Section 6.2. We calculate the Hansen-Jagannathan
bounds in Eq. (6.14) and ascertain whether there are parameter values of the model that would
allow the stochastic discount factor to be inside the bounds in the mean and standard deviation
space. The stochastic discount factor of the model is:

$$
m_{t+1} = \frac{\xi_{t+1}}{\xi_t} = \exp(Z_{t+1}),
\qquad
Z_{t+1} = -\left( r + \tfrac{1}{2}\lambda^2 \right) - \lambda u_{D,t+1}.
$$

By Lemma 6.1, the first two moments of $m_t$ are:

$$
\bar m = E(m_t) = e^{-r}
\quad\text{and}\quad
\bar\sigma_m \equiv \sqrt{\mathrm{var}(m_t)} = e^{-r + \frac{1}{2}\lambda^2} \sqrt{1 - e^{-\lambda^2}}
\tag{6.16}
$$

where $r$ is as in Eq. (6.5), and $\lambda = \eta\sigma_D$, as usual. For given $\mu_D$ and $\sigma_D^2$, the two equations in
(6.16) form an $\eta$-parametrized curve in the space $(\bar m, \bar\sigma_m)$. The issue is to check whether this curve
enters the Hansen-Jagannathan cup for plausible values of $\eta$. It is not the case. Rather, we have
the situation depicted in Figure 6.2. The reason the circles bend back is easily explained. When
$\eta$ is low, an increase in $\eta$ leads to an increase in $\bar\sigma_m$ and a decrease in $\bar m$, because $r$ increases
with $\eta$ when $\eta$ is small. Yet as soon as $\eta$ gets sufficiently large, the interest rate decreases,
due to precautionary motives, which makes both $\bar m$ and $\bar\sigma_m$ increase. In general, we expect that
for any model where the intertemporal elasticity of substitution is somehow confounded with
relative risk-aversion, a bending pattern such as that in Figure 6.2 would arise, being indicative
of conflicts between intertemporal elasticity of substitution and precautionary effects relating
to the demand for riskless assets.
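The bending pattern can be reproduced by tracing the two moments in Eq. (6.16) over a grid of risk-aversion values. The following sketch assumes the calibration reported in the caption of Figure 6.2 ($\beta = 0.95$, $\mu_D = 0.0183$, $\sigma_D = 0.0328$); the function and variable names are hypothetical.

```python
import math

def lucas_sdf_moments(eta, beta=0.95, mu_d=0.0183, sigma_d=0.0328):
    """Mean and standard deviation of the Lucas-model SDF, Eq. (6.16)."""
    r = -math.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1.0) * sigma_d**2   # Eq. (6.5)
    lam2 = (eta * sigma_d) ** 2                                               # lambda^2
    m_bar = math.exp(-r)
    sigma_m = math.exp(-r + 0.5 * lam2) * math.sqrt(1.0 - math.exp(-lam2))
    return m_bar, sigma_m

etas = list(range(1, 36))
curve = [lucas_sdf_moments(eta) for eta in etas]

# Risk aversion at which E(m) stops falling and starts rising (the "bend").
turning_eta = min(etas, key=lambda eta: lucas_sdf_moments(eta)[0])

for eta in (1, 5, 10, turning_eta, 25, 35):
    m_bar, sigma_m = lucas_sdf_moments(eta)
    print(eta, round(m_bar, 4), round(sigma_m, 4))
```

Along the grid the standard deviation rises monotonically with $\eta$, while the mean first falls and then rises once the precautionary term in the short-term rate dominates.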

[Figure 6.2: the Hansen-Jagannathan bounds and the predictions of the Lucas model. Horizontal axis: expected value of the pricing kernel (0.8 to 1.15). Vertical axis: standard deviation of the pricing kernel (0 to 4.5).]

FIGURE 6.2. The solid line depicts the Hansen-Jagannathan bounds, obtained through
Eq. (6.14), using aggregate stock market data and the short-term rate. The average
return and standard deviation of the stock market are taken to be 0.07 and 0.14. The
average short-term rate (three-month bill) and its volatility are, instead, 0.01 and 0.02.
These estimates relate to the sample period from January 1948 to December 2002. The
circles are predictions of the Lucas model in Eq. (6.16), with $\beta = 0.95$, $\mu_D = 0.0183$,
$\sigma_D = 0.0328$ and $\eta$ ranging from 1 to 35. The two circles inside the cup are the pairs
$(\bar m, \bar\sigma_m)$ in Eq. (6.16) obtained with $\eta = 35$ and 33. Progressively lower values of $\eta$ lead
the pairs $(\bar m, \bar\sigma_m)$ to lie outside the cup, nonlinearly.

The Lucas model predicts that the pricing kernel is only moderately volatile. The following
chapters discuss models with either heterogeneous agents or more general preferences, which can
help boost the volatility of the pricing kernel.

6.4 Multifactor extensions


A natural way to increase the variance of the pricing kernel is to increase the number of
factors. We consider two possibilities: one in which returns are normally distributed, and one
in which returns are lognormally distributed. A multifactor model is naturally promising, for
its potential to deliver boosted values for the risk-premium. However, it may fail to deliver
time-varying statistics, as we shall explain.


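6.4.1 Exponential affine pricing kernels

This section considers a variant of the Lucas model of Section 6.2, and determines expected
returns under a different assumption regarding the distribution of returns. We still maintain the
hypothesis that the stochastic discount factor is exponential-Gaussian, viz

$$
m_{t+1} = \exp(Z_{t+1}),
\qquad
Z_{t+1} = -\left( r + \tfrac{1}{2}\lambda^2 \right) - \lambda u_{t+1},
\qquad
u_{t+1} \sim \mathrm{NID}(0, 1)
$$

where $r$ and $\lambda$ are some constants. In the absence of arbitrage,

$$
1 = E_t\left( m_{t+1} \tilde R_{t+1} \right)
  = E_t(m_{t+1}) E_t(\tilde R_{t+1}) + \mathrm{cov}_t(m_{t+1}, \tilde R_{t+1}).
$$

By rearranging terms,$^2$ and using the fact that $E_t(m_{t+1}) = R_{t+1}^{-1}$,

$$
E_t(\tilde R_{t+1}) - R_{t+1} = -R_{t+1} \, \mathrm{cov}_t(m_{t+1}, \tilde R_{t+1}).
\tag{6.17}
$$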

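Consider the following result, which we shall use later:

Lemma 6.3 (Stein's lemma): Suppose that two random variables $x$ and $y$ are jointly normal.
Then,

$$
\mathrm{cov}[g(x), y] = E[g'(x)] \cdot \mathrm{cov}(x, y)
$$

for any function $g$ such that $E(|g'(x)|) < \infty$.

Next, suppose $\tilde R$ is normally distributed. This assumption is inconsistent with those underlying
the model in Section 6.2, in which $\tilde R$ is lognormally distributed in equilibrium, with
$\ln \tilde R = \mu_S - \frac{1}{2}\sigma_S^2 + \epsilon_S$, where $\epsilon_S$ is normal. However, let us explore the asset
pricing implications of this new assumption. Because $\tilde R_{t+1}$ and $Z_{t+1}$ are both normal, and
$m_{t+1} = m(Z_{t+1}) = \exp(Z_{t+1})$, we apply Lemma 6.3 and obtain,

$$
\mathrm{cov}_t(m_{t+1}, \tilde R_{t+1})
= E_t[\exp(Z_{t+1})] \cdot \mathrm{cov}_t(Z_{t+1}, \tilde R_{t+1})
= -\lambda R_{t+1}^{-1} \, \mathrm{cov}_t(u_{t+1}, \tilde R_{t+1}).
$$

Replacing this expression for the covariance, $\mathrm{cov}_t(m_{t+1}, \tilde R_{t+1})$, into Eq. (6.17), leaves:

$$
E_t(\tilde R_{t+1}) - R_{t+1} = \lambda \, \mathrm{cov}_t(u_{t+1}, \tilde R_{t+1}).
$$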
We now extend the previous observations to a more general setup. Consider a stochastic discount
factor as a function of $k$ factors, $m(y_{1,t}, \dots, y_{k,t})$ say. A particularly convenient analytical
assumption is that $m$ is exponential-affine and the factors $(y_{i,t})_{i=1}^{k}$ are normal, as in the following
definition:

Definition 6.4 (EAPK: Exponential Affine Pricing Kernel): Let,

$$
Z_t \equiv \phi_0 + \sum_{i=1}^{k} \phi_i y_{i,t}.
$$

An EAPK is a function,

$$
m_t = m(Z_t) = \exp(Z_t).
\tag{6.18}
$$

If $(y_{i,t})_{i=1}^{k}$ are jointly normal, and each $y_{i,t}$ has mean zero and variance $\sigma_i^2$, $i = 1, \dots, k$, the
EAPK is called a Normal EAPK (NEAPK).

$^2$ With a portfolio return $R^M$ that is perfectly (negatively) correlated with $m$, we have:
$E_t(R^M_{t+1}) - \dfrac{1}{E_t(m_{t+1})} = \dfrac{\sigma_t(m_{t+1})}{E_t(m_{t+1})} \, \sigma_t(R^M_{t+1})$.
In more general setups than the ones considered in this introductory example, both
$\sigma_t(m_{t+1})/E_t(m_{t+1})$ and $\sigma_t(R^M_{t+1})$ should be time-varying.

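It is without loss of generality to assume that each $y_{i,t}$ has a mean equal to zero in the
previous definition, provided that $\phi_0 \neq 0$. Next, suppose $\tilde R$ is normally distributed. By Lemma
6.3 and the NEAPK assumption,

$$
\mathrm{cov}_t(m_{t+1}, \tilde R_{t+1})
= \mathrm{cov}_t[\exp(Z_{t+1}), \tilde R_{t+1}]
= R_{t+1}^{-1} \, \mathrm{cov}_t(Z_{t+1}, \tilde R_{t+1})
= R_{t+1}^{-1} \sum_{i=1}^{k} \phi_i \, \mathrm{cov}_t(y_{i,t+1}, \tilde R_{t+1}).
$$

By replacing this into Eq. (6.17) leaves the linear factor representation,

$$
E_t(\tilde R_{t+1}) - R_{t+1} = -\sum_{i=1}^{k} \phi_i \underbrace{\mathrm{cov}_t(y_{i,t+1}, \tilde R_{t+1})}_{\text{"betas"}}.
\tag{6.19}
$$

We have shown the following result:

Proposition 6.5: Suppose that $\tilde R$ is normally distributed. Then, NEAPK $\Rightarrow$ linear factor
representation for asset returns.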

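The APT representation in Eq. (6.19) is similar to a result in Cochrane (1996).$^3$ Cochrane
(1996) assumes that $m$ is affine, i.e. $m(Z_t) = Z_t$, where $Z_t$ is as in Definition 6.4. This assumption
implies that $\mathrm{cov}_t(m_{t+1}, \tilde R_{t+1}) = \sum_{i=1}^{k} \phi_i \, \mathrm{cov}_t(y_{i,t+1}, \tilde R_{t+1})$. Replacing this expression for the
covariance, $\mathrm{cov}_t(m_{t+1}, \tilde R_{t+1})$, into Eq. (6.17), leaves

$$
E_t(\tilde R_{t+1}) - R_{t+1} = -R_{t+1} \sum_{i=1}^{k} \phi_i \, \mathrm{cov}_t(y_{i,t+1}, \tilde R_{t+1})
$$

where

$$
R_{t+1} = \frac{1}{E_t(m_{t+1})} = \frac{1}{\phi_0}.
$$

The NEAPK assumption, compared to Cochrane's, carries the obvious advantage of guaranteeing
that the stochastic discount factor is strictly positive, a theoretical condition we need to rule out
arbitrage opportunities.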
6.4.2 Lognormal returns
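Next, assume that $\tilde R$ is lognormally distributed, and that the NEAPK holds. We have,

$$
1 = E_t\left( m_{t+1} \tilde R_{t+1} \right)
  = E_t\left( e^{Z_{t+1}} \tilde R_{t+1} \right)
  = E_t\left( e^{\phi_0 + \sum_{i=1}^{k} \phi_i y_{i,t+1}} \tilde R_{t+1} \right).
\tag{6.20}
$$

Consider, first, the case $k = 1$, and let $\tilde y_t = \ln \tilde R_t$ be normally distributed, with conditional
variance $\sigma_{\tilde y}^2$ and conditional covariance $\sigma_{1\tilde y}$ with the factor $y_1$. The previous
equation can be written as,

$$
0 = \ln E_t\left( e^{\phi_0 + \phi_1 y_{1,t+1} + \tilde y_{t+1}} \right)
  = \phi_0 + E_t(\tilde y_{t+1}) + \frac{1}{2}\left( \phi_1^2\sigma_1^2 + \sigma_{\tilde y}^2 + 2\phi_1\sigma_{1\tilde y} \right).
$$

That is,

$$
E_t(\tilde y_{t+1}) = -\left[ \phi_0 + \frac{1}{2}\left( \phi_1^2\sigma_1^2 + \sigma_{\tilde y}^2 + 2\phi_1\sigma_{1\tilde y} \right) \right].
$$

By applying the pricing equation (6.20) to a zero coupon bond,

$$
0 = \ln E_t\left( e^{\phi_0 + \phi_1 y_{1,t+1} + \ln R_{t+1}} \right)
  = \ln R_{t+1} + \phi_0 + \frac{1}{2}\phi_1^2\sigma_1^2
$$

which we can solve for $R_{t+1}$:

$$
\ln R_{t+1} = -\left( \phi_0 + \frac{1}{2}\phi_1^2\sigma_1^2 \right).
$$

The expected excess return is,

$$
E_t(\tilde y_{t+1}) - \ln R_{t+1} + \frac{1}{2}\sigma_{\tilde y}^2 = -\phi_1\sigma_{1\tilde y}.
\tag{6.21}
$$

Eq. (6.21) shows the theory in Section 6.2 from a different angle. Apart from Jensen's
inequality effects ($\frac{1}{2}\sigma_{\tilde y}^2$), this is indeed the Lucas model of Section 6.2 once $\phi_1 = -\eta$. This model
has poor quantitative implications, as discussed in Section 6.3, bound as it is to explain returns
with only one stochastic discount factor parameter, i.e. $\phi_1$.

$^3$ To recall why Eq. (6.19) is indeed an APT equation, suppose that $R$ is an $n$-(column) vector of returns and that $R = a + bf$, where
$f$ is a $k$-(column) vector with zero mean and unit variance and $a$, $b$ are some given vector and matrix with appropriate dimension.
Then clearly, $b = \mathrm{cov}(R, f)$. A portfolio $\pi$ delivers $\pi^\top R = \pi^\top a + \pi^\top \mathrm{cov}(R, f) f$. An arbitrage opportunity is: $\exists\, \pi : \pi^\top \mathrm{cov}(R, f) = 0$
and $\pi^\top a \neq r$. To rule that out, we may show as in Part I of these Lectures that there must exist a $k$-(column) vector $\lambda$
s.t. $a = \mathrm{cov}(R, f)\lambda + r$. This implies $R = a + bf = r + \mathrm{cov}(R, f)\lambda + bf$. That is, $E(R) = r + \mathrm{cov}(R, f)\lambda$.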
Next consider the general case. Assume as usual that dividends are as in (6.1). To find the
price function in terms of the state variable $y$, we may proceed as in Section 6.2. In the absence
of bubbles,

$$
S_t = \sum_{i=1}^{\infty} E\left( \frac{\xi_{t+i}}{\xi_t} D_{t+i} \,\Big|\, \mathcal{F}_t \right)
    = D_t \sum_{i=1}^{\infty} e^{\left( \mu_D + \phi_0 + \frac{1}{2}\sum_{j=1}^{k} \left( \phi_j^2\sigma_j^2 + 2\phi_j\sigma_{j,D} \right) \right) i},
\qquad
\sigma_{j,D} \equiv \mathrm{cov}_t\left( y_{j,t+1}, \ln\frac{D_{t+1}}{D_t} \right).
$$

Thus, if

$$
\hat k \equiv \mu_D + \phi_0 + \frac{1}{2}\sum_{j=1}^{k} \left( \phi_j^2\sigma_j^2 + 2\phi_j\sigma_{j,D} \right) < 0,
$$

then,

$$
\frac{S_t}{D_t} = \frac{e^{\hat k}}{1 - e^{\hat k}}.
$$

There are interesting features of the model to mention. The model is cast within a multi-factor
setting, and yet it predicts that the price-dividend ratio is constant, which is counterfactual, as
explained in the next chapter. However, note the following facts. The first two moments of the
stochastic discount factor of this model are easily determined, by relying on Lemma 6.1:

$$
\bar m = E(m_t) = e^{E(Z_t) + \frac{1}{2}\mathrm{var}(Z_t)},
\qquad
\bar\sigma_m = \sqrt{\mathrm{var}(m_t)} = e^{E(Z_t) + \mathrm{var}(Z_t)} \sqrt{1 - e^{-\mathrm{var}(Z_t)}}.
$$

Therefore, we can always calibrate the parameters of this model and make sure the first
two moments of the pricing kernel enter the Hansen-Jagannathan cup. At the same time,
remember, the model still predicts price-dividend ratios to be constant; that is, the model
makes counterfactual predictions (constant price-dividend ratios) even when the variance of its
pricing kernel is arbitrarily large.
In other words, a model satisfying the Hansen-Jagannathan bounds is not necessarily a good one. It
would rather necessitate further scrutiny. The next section illustrates a variant of the previous
model in which time-variation in the risk premiums gives rise to time-varying statistics. The
next chapter makes additional steps forward, attempting to develop theoretical test conditions,
which ensure that these time-varying statistics have the same cyclical properties as in the data,
such as procyclicality of price-dividend ratios or countercyclical volatility.
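The calibration argument can be made concrete. The sketch below is illustrative only: it assumes mutually independent zero-mean normal factors, so that the variance of $Z_t$ is the sum of the $\phi_i^2\sigma_i^2$ terms, and the function names, the target mean of 0.99 and the factor loadings are hypothetical. It holds $E(m_t)$ fixed by adjusting $\phi_0$, and shows that the standard deviation of the pricing kernel can be pushed up simply by adding factors.

```python
import math

def neapk_moments(phi0, phis, sigmas):
    """Mean and std of m = exp(phi0 + sum_i phi_i * y_i), y_i independent N(0, sigma_i^2)."""
    var_z = sum((p * s) ** 2 for p, s in zip(phis, sigmas))
    m_bar = math.exp(phi0 + 0.5 * var_z)
    sigma_m = math.exp(phi0 + var_z) * math.sqrt(1.0 - math.exp(-var_z))
    return m_bar, sigma_m

def calibrate_phi0(target_m_bar, phis, sigmas):
    """Choose phi0 so that E(m) equals target_m_bar for the given loadings."""
    var_z = sum((p * s) ** 2 for p, s in zip(phis, sigmas))
    return math.log(target_m_bar) - 0.5 * var_z

target = 0.99
results = []
for n_factors in (1, 2, 3, 4):
    phis = [-3.0] * n_factors        # hypothetical loadings
    sigmas = [0.10] * n_factors      # hypothetical factor volatilities
    phi0 = calibrate_phi0(target, phis, sigmas)
    results.append(neapk_moments(phi0, phis, sigmas))
    print(n_factors, round(results[-1][0], 4), round(results[-1][1], 4))
```

Nothing in this exercise touches the price-dividend ratio, which remains constant in the model regardless of how volatile the kernel is made.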

6.5 Conditional CAPM


[In progress]

6.6 Pricing kernels and Sharpe ratios


[This section needs a considerable revision]
6.6.1 Market portfolios and pricing kernels
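The market portfolio cannot be perfectly correlated with the stochastic discount factor, in
general [cite some literature such as Cecchetti, Lam, and Mark (1994)]. Let $R^e_{t+1} \equiv \tilde R_{t+1} - R_{t+1}$
be the excess return on a risky asset. We have:

$$
0 = E_t\left( m_{t+1} R^e_{t+1} \right)
  = E_t(m_{t+1}) E_t(R^e_{t+1}) + \rho_t \cdot \mathrm{Std}_t(m_{t+1}) \cdot \mathrm{Std}_t(R^e_{t+1})
$$

where $\mathrm{Std}_t(x_{t+1})$ denotes the standard deviation of a variable $x_{t+1}$, conditionally upon the
information available at time $t$, and $\rho_t \equiv \mathrm{corr}_t(m_{t+1}, R^e_{t+1})$, a conditional correlation. Hence,
the Sharpe ratio, $\mathcal{S} \equiv E_t(R^e_{t+1}) / \mathrm{Std}_t(R^e_{t+1})$, satisfies:

$$
|\mathcal{S}| \leq \frac{\mathrm{Std}_t(m_{t+1})}{E_t(m_{t+1})} = \mathrm{Std}_t(m_{t+1}) \cdot R_{t+1}.
\tag{6.22}
$$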
The highest possible Sharpe ratio is bounded. The equality holds for a hypothetical portfolio
$M$, say, yielding excess returns perfectly conditionally negatively correlated with the stochastic
discount factor, $\rho_{M,t} = -1$. We shall say of $M$ that it is a $\beta$-CAPM generating portfolio. Is
it also a market portfolio? After all, a feasible and attainable portfolio lying on the volatility
bounds for the stochastic discount factor is clearly mean-variance efficient. The answer is subtle.
As explained in the context of the static model of Chapter 1, the Sharpe ratio, $\mathcal{S}$, equals the
slope of the Capital Market Line, and bears the interpretation of unit market risk-premium. If
$\rho_{M,t} = -1$, then, by Eq. (6.22), the slope of the Capital Market Line reduces to $\mathrm{Std}_t(m_{t+1})/E_t(m_{t+1})$. For
example, with the Lucas model in Section 6.2,

$$
\frac{\mathrm{Std}_t(m_{t+1})}{E_t(m_{t+1})} = \sqrt{e^{\eta^2\sigma_D^2} - 1} \approx \eta\sigma_D.
$$

In Section 6.2, we also explained that $(\mu_S - r)/\sigma_D = \eta\sigma_D$, which is only approximately true,
according to the previous relation. Not even a simple model with a single tree, such as that
in Section 6.2, would be capable of leading to a $\beta$-CAPM generating portfolio, or a market
portfolio! Indeed, for this model, we have that $E_t(\tilde R) = e^{\mu_S}$, $r = -\ln\beta + \eta(\mu_D - \frac{1}{2}\sigma_D^2) - \frac{1}{2}\eta^2\sigma_D^2$, and
$\mathrm{var}_t(\tilde R) = e^{2\mu_S}(e^{\sigma_D^2} - 1)$, with a Sharpe ratio equal to:

$$
\mathcal{S} = \frac{1 - e^{-\eta\sigma_D^2}}{\sqrt{e^{\sigma_D^2} - 1}}.
$$

By simple computations, $\rho_t = -\dfrac{1 - e^{-\eta\sigma_D^2}}{\sqrt{(e^{\sigma_D^2} - 1)(e^{\eta^2\sigma_D^2} - 1)}}$, which is not precisely $-1$: it is only
approximately equal to $-1$, for low values of $\sigma_D$.
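A quick numerical check shows how close the Lucas-model Sharpe ratio comes to the bound in Eq. (6.22). The sketch below is a hedged illustration: it takes the expressions for the Sharpe ratio and for the ratio of the standard deviation to the mean of the stochastic discount factor as derived for the model of Section 6.2, and backs out the implied correlation as minus their ratio; the function name and the parameter values are chosen here for illustration.

```python
import math

def lucas_sharpe_and_corr(eta, sigma_d):
    """Sharpe ratio, its upper bound Std(m)/E(m), and corr(m, excess return)."""
    s2 = sigma_d**2
    bound = math.sqrt(math.exp(eta**2 * s2) - 1.0)                     # Std(m) / E(m)
    sharpe = (1.0 - math.exp(-eta * s2)) / math.sqrt(math.exp(s2) - 1.0)
    corr = -sharpe / bound                                             # from 0 = E(m)E(Re) + rho*Std(m)*Std(Re)
    return sharpe, bound, corr

for eta in (2.0, 10.0, 35.0):
    sharpe, bound, corr = lucas_sharpe_and_corr(eta, 0.0328)
    print(eta, round(sharpe, 4), round(bound, 4), round(corr, 4))
```

For moderate risk aversion the correlation is very close to $-1$, but it drifts away from $-1$ as $\eta\sigma_D$ grows, because the gross return and the stochastic discount factor are different nonlinear functions of the same shock.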

A further complication arises: a $\beta$-CAPM generating portfolio is not necessarily the tangency
portfolio. We can show that there is another portfolio leading to the very same $\beta$-pricing relation
predicted by the tangency portfolio. Such a portfolio is referred to as the maximum correlation
portfolio, for reasons developed below. Let = (1 ) . By the CCAPM in Chapter 2,

where
is a portfolio return. Next, let
=
, which is clearly perfectly correlated
( 2)
with the stochastic discount factor. By results in Chapter 2,


=
( )

This is not yet the $\beta$-representation of the CAPM, because we have yet to show that there
is a way to construct
as a portfolio return. In fact, there is a natural choice: pick =
,
where
is the minimum-variance kernel leading to the Hansen-Jagannathan bounds. Since
is linear in all asset retuns,
can be thought of as a return that can be obtained by
investing in all assets. Furthermore, in the appendix we show that
satises,

1=

Where is this portfolio located? The Appendix shows that there is no portfolio yielding the
same expected return with lower variance, that is,
is mean-variance e cient), and that:

1=

1+
1+

1+

Mean-variance e ciency of
and the previous inequality imply that this portfolio lies in
the lower branch of the mean-variance e cient portfolios. And this is so because this portfolio
is positively correlated with the true pricing kernel. Naturally, the fact that this portfolio is
-CAPM generating doesnt necessarily imply that it is also perfectly correlated with the true
stochastic discount factor. As shown in the appendix,
has only the maximum possible
correlation with all possible . Perfect correlation occurs exactly in correspondence of the
stochastic discount factor =
(i.e. when the economy exhibits a stochastic discount factor
exactly equal to
).
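The beta-pricing relation with respect to the return m / E(m^2) can be checked numerically in a small discrete-state economy. The sketch below is only an illustration under made-up inputs: the four equally likely states, the kernel values and the payoff are hypothetical, and the asset is priced by the kernel by construction.

```python
probs = [0.25, 0.25, 0.25, 0.25]   # hypothetical state probabilities
m = [1.10, 1.00, 0.95, 0.85]       # hypothetical stochastic discount factor, by state
payoff = [0.8, 1.0, 1.1, 1.4]      # hypothetical asset payoff, by state

def mean(x):
    return sum(p * xi for p, xi in zip(probs, x))

def cov(x, y):
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

# Price the payoff with the kernel, so the gross return satisfies E(m R_i) = 1.
price = mean([mi * xi for mi, xi in zip(m, payoff)])
R_i = [xi / price for xi in payoff]

# Return perfectly correlated with the kernel: R_m = m / E(m^2).
second_moment = mean([mi * mi for mi in m])
R_m = [mi / second_moment for mi in m]

R_bar = 1.0 / mean(m)              # gross return with zero kernel covariance
beta = cov(R_i, R_m) / cov(R_m, R_m)

premium = mean(R_i) - R_bar
premium_from_beta = beta * (mean(R_m) - R_bar)
print(premium, premium_from_beta)
```

The two premiums coincide up to rounding, while the expected return on R_m itself lies below R_bar, consistent with its location on the lower branch of the frontier.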
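Proof that $R^{m^*}$ is $\beta$-CAPM generating. The relations, $1 = E(m^* R^i)$ and $1 = E(m^* R^{m^*})$, imply:

$$E(R^i) - \bar R = -\bar R\,\mathrm{cov}(m^*, R^i), \qquad E(R^{m^*}) - \bar R = -\bar R\,\mathrm{cov}(m^*, R^{m^*}),$$

or,

$$\frac{E(R^i) - \bar R}{E(R^{m^*}) - \bar R} = \frac{\mathrm{cov}(m^*, R^i)}{\mathrm{cov}(m^*, R^{m^*})}.$$

By construction, $R^{m^*}$ is perfectly correlated with $m^*$. Precisely, $R^{m^*} = m^*/E[(m^*)^2]$. Therefore,

$$\frac{\mathrm{cov}(m^*, R^i)}{\mathrm{cov}(m^*, R^{m^*})} = \frac{\mathrm{cov}\big(E[(m^*)^2] R^{m^*}, R^i\big)}{\mathrm{cov}\big(E[(m^*)^2] R^{m^*}, R^{m^*}\big)} = \frac{\mathrm{cov}(R^{m^*}, R^i)}{\mathrm{var}(R^{m^*})} = \beta_{R^i, R^{m^*}}. \qquad \blacksquare$$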


6.6.2 Pricing kernel bounds

Figure 6.2 depicts the typical situation the neoclassical asset pricing model has to face. The points are those generated by the Lucas model for various values of $\eta$. The model has to be such that the points lie above the observed Sharpe ratio ($\sigma(m)/E(m) \ge$ greatest Sharpe ratio ever observed in the data, i.e. the Sharpe ratio on the market portfolio) and inside the Hansen-Jagannathan bounds. Typically, we need high values of $\eta$ to enter the Hansen-Jagannathan bounds.
There is an interesting connection between these facts and the classical mean-variance portfolio frontier described in Chapter 1. As shown in Figure 6.3, every asset or portfolio must lie inside the region bounded by two straight lines with slopes $\mp\sigma(m)/E(m)$. It must be so, as for any asset (or portfolio) priced by a stochastic discount factor $m$, we have that

$$\big|E(R^i) - \bar R\big| \le \frac{\sigma(m)}{E(m)}\,\sigma(R^i), \qquad \bar R \equiv \frac{1}{E(m)}.$$

As seen in the previous section, the equality is only achieved by asset (or portfolio) returns that are perfectly correlated with $m$. A tangency portfolio such as T doesn't necessarily attain the volatility bounds for the stochastic discount factor. Moreover, the market portfolio has no reason to lie on the volatility bound for the stochastic discount factor. As an example, for the simple Lucas model, the (only existing) asset has a Sharpe ratio that doesn't lie on the volatility bounds for the stochastic discount factor. In a sense, the CCAPM does not need to imply the CAPM: there are not necessarily assets that act, at the same time, as market portfolios and as $\beta$-CAPM generating portfolios, and that are also priced consistently by the true stochastic discount factor. These conditions would only simultaneously hold if the candidate market portfolio were perfectly negatively correlated with the stochastic discount factor, which is a quite specific circumstance, the only circumstance where we can really say the CAPM is a particular case of the CCAPM. We still do not know conditions on general families of stochastic discount factors that are consistent with the previous properties.

[Figure 6.2: $\sigma(m)$ plotted against $E(m)$, showing the Hansen-Jagannathan bounds and the Sharpe ratio.]

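However, we know that there exists another portfolio, the maximum correlation portfolio, which is also $\beta$-CAPM generating. In other terms, if there is a return $R^\star$ such that $R^\star = \gamma m$, for some positive constant $\gamma$, then the $\beta$-CAPM representation holds, but this doesn't necessarily mean that $R^\star$ is also a market portfolio. More generally, if there is a return $R^\star$ that is $\beta$-CAPM generating, then,

$$\rho_{i,m} = \rho_{i,R^\star}\,\rho_{R^\star,m}, \quad \text{all } i. \qquad (6.23)$$

Therefore, we don't need an asset or portfolio return that is perfectly correlated with $m$ to make the CCAPM collapse to the CAPM. All in all, the existence of an asset return that is perfectly negatively correlated with the stochastic discount factor is a sufficient condition for the CCAPM to collapse to the CAPM, not a necessary condition. The proof of Eq. (6.23) is simple. By the CCAPM,

$$E(R^i) - \bar R = -\rho_{i,m}\,\frac{\sigma(m)}{E(m)}\,\sigma(R^i) \quad\text{and}\quad E(R^\star) - \bar R = -\rho_{R^\star,m}\,\frac{\sigma(m)}{E(m)}\,\sigma(R^\star).$$

That is,

$$\frac{E(R^i) - \bar R}{E(R^\star) - \bar R} = \frac{\rho_{i,m}\,\sigma(R^i)}{\rho_{R^\star,m}\,\sigma(R^\star)}. \qquad (6.24)$$

But if $R^\star$ is $\beta$-CAPM generating,

$$\frac{E(R^i) - \bar R}{E(R^\star) - \bar R} = \frac{\mathrm{cov}(R^i, R^\star)}{\sigma(R^\star)^2} = \rho_{i,R^\star}\,\frac{\sigma(R^i)}{\sigma(R^\star)}. \qquad (6.25)$$

Comparing Eq. (6.24) with Eq. (6.25) produces Eq. (6.23).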

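[Figure 6.3: mean-variance plane, $E(R)$ against $\sigma(R)$, showing the kernel volatility bounds, the mean-variance efficient portfolios (efficient portfolios frontier), the tangency portfolio T, the intercept $1/E(m)$, and the maximum correlation portfolio.]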

A final thought. In many pieces of applied research, we often read that because we observe time-varying Sharpe ratios on (proxies of) the market portfolio, we should also model the market risk-premium $\sqrt{\mathrm{Var}_t(m_{t+1})}/E_t(m_{t+1})$ as time-varying. While Chapter 7 explains that the evidence for time-varying risk-premiums is overwhelming, a criticism to this motivation is that this ratio is only an upper bound to the Sharpe ratio of the market portfolio. From a strictly theoretical point of view, then, time variation in this ratio is neither a necessary nor a sufficient condition to have time-varying Sharpe ratios, as Figure 6.3 illustrates.

6.7 Conditioning bounds


The Hansen-Jagannathan bounds in Eq. (6.14) can be improved by using conditioning information, as originally shown by Gallant, Hansen and Tauchen (1990) and in Ferson and Siegel (2003). A difficulty with these bounds is that they may display a finite sample bias, in that they tend to overstate the true bounds and thus reject a given model too often. Finite sample corrections are considered by Ferson and Siegel (2003). [Discuss, analytically]
Alvarez and Jermann (2005). [Discuss, analytically]


6.8 Appendix
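Proof of Eq. (6.13). We have,

$$\begin{aligned}
E\big[m^*(\bar m)\,(1_n + R)\big] &= E\big\{\big[\bar m + (R - E(R))^\top\beta_{\bar m}\big](1_n + R)\big\} \\
&= \bar m\,(1_n + E(R)) + E\big[(1_n + R)(R - E(R))^\top\big]\beta_{\bar m} \\
&= \bar m\,(1_n + E(R)) + E\big[\big((1_n + E(R)) + (R - E(R))\big)(R - E(R))^\top\big]\beta_{\bar m} \\
&= \bar m\,(1_n + E(R)) + E\big[(R - E(R))(R - E(R))^\top\big]\beta_{\bar m} \\
&= \bar m\,(1_n + E(R)) + \Sigma\beta_{\bar m} \\
&= 1_n,
\end{aligned}$$

where the last line follows by the definition of $\beta_{\bar m}$, $\beta_{\bar m} = \Sigma^{-1}\big[1_n - \bar m\,(1_n + E(R))\big]$.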
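The pricing property of the minimum-variance kernel can also be verified with population moments in a two-asset example. The sketch below assumes the kernel takes the form m* = m_bar + (R - E(R))' beta with beta = Sigma^{-1}[1 - m_bar(1 + E(R))]; the expected returns, covariance matrix and safe rate are hypothetical numbers chosen only for illustration.

```python
# Hypothetical inputs (not from the text): two risky assets and a safe rate.
b = [0.08, 0.05]                            # expected net returns, E(R)
Sigma = [[0.04, 0.006], [0.006, 0.0225]]    # covariance matrix of R
r = 0.02                                    # safe net return
m_bar = 1.0 / (1.0 + r)                     # mean of the kernel

def solve2(A, y):
    """Solve the 2x2 linear system A x = y by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(y[0] * A[1][1] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - y[0] * A[1][0]) / det]

def quad(x, A):
    """Quadratic form x' A x."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

# beta = Sigma^{-1} [1 - m_bar (1 + b)]
beta = solve2(Sigma, [1.0 - m_bar * (1.0 + bi) for bi in b])

# E[m*(1 + R)] = m_bar (1 + b) + Sigma beta, which should be one for each asset.
pricing = [m_bar * (1.0 + b[i]) + sum(Sigma[i][j] * beta[j] for j in range(2))
           for i in range(2)]

# E[(m*)^2] = m_bar^2 + beta' Sigma beta, and gross expected return on m*/E[(m*)^2].
second_moment = m_bar ** 2 + quad(beta, Sigma)
exp_ret_mstar = m_bar / second_moment

# Squared Sharpe measure Sh = (b - r)' Sigma^{-1} (b - r).
excess = [bi - r for bi in b]
w = solve2(Sigma, excess)
Sh = sum(e * wi for e, wi in zip(excess, w))
print(pricing, exp_ret_mstar, (1.0 + r) / (1.0 + Sh))
```

In this example the expected gross return on the kernel-mimicking portfolio equals (1 + r) / (1 + Sh), which is below the safe gross return 1 + r whenever Sh is positive.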


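Proof that $R^{m^*}$ can be generated by a feasible portfolio

Proof of the Equation, $1 = E(m^* R^{m^*})$. We have,

$$E(m^* R^{m^*}) = \frac{E[(m^*)^2]}{E[(m^*)^2]} = 1,$$

where

$$\begin{aligned}
E[(m^*)^2] &= \bar m^2 + E\big[\beta_{\bar m}^\top (R - E(R))(R - E(R))^\top \beta_{\bar m}\big] \\
&= \bar m^2 + \beta_{\bar m}^\top \Sigma\, \beta_{\bar m} \\
&= \bar m^2 + \big[1_n - \bar m\,(1_n + E(R))\big]^\top \Sigma^{-1} \big[1_n - \bar m\,(1_n + E(R))\big],
\end{aligned}$$

where the last line is due to the definition of $\beta_{\bar m}$.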

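Proof that $R^{m^*}$ is mean-variance efficient. Let $p = (p_0, p_1, \cdots, p_n)^\top$ be the vector of portfolio weights (here $p_i$ is the portfolio weight of asset $i$, $i = 0, 1, \cdots, n$). We have, $p^\top 1_{n+1} = 1$. The returns we consider are the gross returns on these $n+1$ assets, collected in the vector $\tilde R$, so that $E(m^*\tilde R) = 1_{n+1}$. We denote our benchmark portfolio return as $R^{m^*}$. Next, we build up an arbitrary portfolio yielding the same expected return $E(R^{m^*})$ and then we show that this has a variance greater than the variance of $R^{m^*}$. Since this portfolio is arbitrary, the proof will be complete. Let $R^p \equiv p^\top\tilde R$ be such that $E(R^p) = E(R^{m^*})$. We have:

$$\begin{aligned}
\mathrm{cov}\big(R^{m^*}, R^p - R^{m^*}\big) &= E\big[R^{m^*}(R^p - R^{m^*})\big] \\
&= \frac{1}{E[(m^*)^2]}\,E\big[m^*(R^p - R^{m^*})\big] \\
&= \frac{1}{E[(m^*)^2]}\,\big[E(m^* R^p) - 1\big] \\
&= 0.
\end{aligned}$$

The first line follows by construction since $E(R^p) = E(R^{m^*})$. The last line follows because

$$E(m^* R^p) = p^\top E(m^*\tilde R) = p^\top 1_{n+1} = 1.$$

Given this, the claim follows directly from the fact that

$$\mathrm{var}(R^p) = \mathrm{var}\big[R^{m^*} + (R^p - R^{m^*})\big] = \mathrm{var}(R^{m^*}) + \mathrm{var}(R^p - R^{m^*}) \ge \mathrm{var}(R^{m^*}). \qquad \blacksquare$$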

Proof of the Equation,

(
1+
1+

1=

)+

1=

. We have,
1

)2 ]

[(

In terms of the notation introduced in Section 6.8,


= + ( )>

is:
1

[1

(1 + )]

We have,
[(

)2 ] = [ + ( )>
= 2 +

[(

= 2 +

[(

= +

[(

= 2 +

>

>

2
]
)> ]2
)> ( )> ]2
>
)( > > )]

= + [1

(1> +

= 2 + 1>

[1>

1 +

>

>
)]

1>
1

Again in terms of the notation of Section 6.8 (


[(

)2 ] =

Thus, the expected return is

1=

(
[(

)
)2 ]

[1

(1 + )]

1 + 1>

(1>
1>

1 +
11

2 ( + )+ 2 1+

1=

>

1 + 1>

and
+2 +

1>

>

)]

), this is:

+2 ( + ) 2 1+ +2 +
2 ( + ) + 2 (1 + + 2 + >


>

>

1
1

Next, recall the following two denitions:
=
In terms of

and
(

1
1+

=(

)>

>

)=

, we have,
)

( )
[( )2 ]

1=

(1 + )2 (1 + ) (1 + 2 + 2 ) + 1 + + 2 + >
(1 + )2 (1 + ) (2 + 2 ) + 1 + + 2 + >

=
=

1
1
1

1+
1+
1+

This is positive if
0, i.e. if
low (or su ciently high) values of .

>

(2 + 1) +

0, which is possible for su ciently

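Proof that $R^{m^*}$ is the maximum correlation portfolio. We have to show that for any stochastic discount factor $m$, $|\mathrm{corr}(m, R^i)| \le |\mathrm{corr}(m, R^{m^*})|$. Define a $\ell$-parametrized portfolio such that:

$$E\big[(1-\ell)\bar R + \ell R^{m^*}\big] = E(R^i), \qquad \bar R \equiv E(m)^{-1}.$$

We have

$$\begin{aligned}
\big|\mathrm{corr}(m, R^{m^*})\big| &= \big|\mathrm{corr}\big(m, (1-\ell)\bar R + \ell R^{m^*}\big)\big| \\
&= \frac{\big|\mathrm{cov}\big(m, (1-\ell)\bar R + \ell R^{m^*}\big)\big|}{\sigma(m)\,\sigma\big((1-\ell)\bar R + \ell R^{m^*}\big)} \\
&= \frac{\big|1 - E(m)E(R^i)\big|}{\sigma(m)\,\sigma\big((1-\ell)\bar R + \ell R^{m^*}\big)}.
\end{aligned}$$

The first line follows because $(1-\ell)\bar R + \ell R^{m^*}$ is a nonstochastic affine translation of $R^{m^*}$. The last equality follows because

$$E\big[m\big((1-\ell)\bar R + \ell R^{m^*}\big)\big] = (1-\ell)\underbrace{E(m)\bar R}_{=1} + \ell\underbrace{E(m R^{m^*})}_{=1} = 1,$$

and because $E[(1-\ell)\bar R + \ell R^{m^*}] = E(R^i)$ by construction. Likewise, $1 = E(m R^i)$ implies that $\mathrm{cov}(m, R^i) = 1 - E(m)E(R^i)$. Therefore,

$$\big|\mathrm{corr}(m, R^i)\big| = \frac{\big|1 - E(m)E(R^i)\big|}{\sigma(m)\,\sigma(R^i)} \le \frac{\big|1 - E(m)E(R^i)\big|}{\sigma(m)\,\sigma\big((1-\ell)\bar R + \ell R^{m^*}\big)} = \big|\mathrm{corr}(m, R^{m^*})\big|,$$

where the inequality follows because $R^{m^*}$ is mean-variance efficient (i.e. there are no feasible portfolios with the same expected return as $R^{m^*}$ and variance less than $\mathrm{var}(R^{m^*})$), and then $\mathrm{var}\big((1-\ell)\bar R + \ell R^{m^*}\big) \le \mathrm{var}(R^i)$, all $R^i$. $\blacksquare$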


References
Alvarez, F. and U.J. Jermann (2005): Using Asset Prices to Measure the Persistence of the
Marginal Utility of Wealth. Econometrica 73, 1977-2016.
Cecchetti, S., Lam, P-S. and N. C. Mark (1994): Testing Volatility Restrictions on Intertemporal Rates of Substitution Implied by Euler Equations and Asset Returns. Journal of
Finance 49, 123-152.
Cochrane, J. (1996): A Cross-Sectional Test of an Investment-Based Asset Pricing Model.
Journal of Political Economy 104, 572-621.
Epstein, L.G. and S.E. Zin (1989): Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: A Theoretical Framework. Econometrica 57, 937-969.
Epstein, L.G. and S.E. Zin (1991): Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: An Empirical Analysis. Journal of Political Economy
99, 263-286.
Ferson, W. E. and A. F. Siegel (2003): Stochastic Discount Factor Bounds with Conditioning
Information. Review of Financial Studies 16, 567-595.
Gallant, R. A., L. P. Hansen and G. Tauchen (1990): Using the Conditional Moments of
Asset Payoffs to Infer the Volatility of Intertemporal Marginal Rates of Substitution.
Journal of Econometrics 45, 141-179.
Gordon, M. (1962): The Investment, Financing, and Valuation of the Corporation. Homewood,
IL: Irwin.
Guvenen, F. (2009): A Parsimonious Macroeconomic Model for Asset Pricing. Econometrica
77, 1711-1740.
Hansen, L. P. and R. Jagannathan (1991): Implications of Security Market Data for Models
of Dynamic Economies. Journal of Political Economy 99, 225-262.
Mehra, R. and E. C. Prescott (1985): The Equity Premium: A Puzzle. Journal of Monetary
Economics 15, 145-161.
Weil, Ph. (1989): The Equity Premium Puzzle and the Risk-Free Rate Puzzle. Journal of
Monetary Economics 24, 401-421.


7
Aggregate fluctuations in equity markets

7.1 Introduction
This chapter discusses empirical regularities of the aggregate equity market, how these properties relate to the business cycle, and the extent to which the neo-classical model can account
for them. This chapter is, thus, a natural development of the previous. Its motivation is still
to explain how existing models can help rationalize the extant empirical evidence. However, its
focus now regards how aggregate stock market fluctuations relate to macroeconomic developments. When are stock returns pro-cyclical? When is stock volatility countercyclical? Such are
the questions this chapter addresses.
Our analysis regards how to reverse-engineer the pricing kernel that is consistent with the
empirical properties of the aggregate stock market. We would like to identify properties of
the pricing kernels consistent with stock volatility, not only stock returns. We consider two
broad classes of models. In the first, agents evaluate assets relying on time-varying discount
rates. Thus, in these models, the cyclical properties of aggregate equity markets are explained
by the agents' optimal response to shocks on fundamentals. In the second class of models,
expected growth is time-varying. This variability translates to fluctuations in aggregate stock
market volatility. For example, time-varying expected growth may arise because agents have
incomplete knowledge of the state of the economy, and try to infer it from public signals. If the
agents estimate that the probability of living in the good state is time-varying, they will view
expected growth as being random. This randomness becomes a source of asset volatility.
The previous properties of the pricing kernels as well as the agents' inference processes form
the basis of more advanced discussions in Chapter 8. Note, finally, that the models we analyze
in this chapter do not necessarily lead to a resolution of the puzzles surveyed in the previous
chapter. To illustrate, even if a given model likely leads to interesting dynamics, further scrutiny
is required regarding the size of the equity premium. We may find a model predicts countercyclical volatility, as in the data, and yet this volatility can be orders of magnitude lower than
in the data.
A final remark is in order concerning the very nature of aggregate stock market fluctuations.
The empirical evidence reviewed in this chapter (see Section 7.2) suggests that a few key
market statistics are quite stationary in relation to their historical behavior vis-à-vis the business
cycle. It is interesting, especially in light of the fact that capital markets have undergone
significant changes over time, which mainly affected various aspects of their microstructure:
say transactions technology, the price discovery process, liquidity, or volumes, to mention a
few examples. How is it that the properties of the aggregate stock market reviewed in this
chapter do not appear to be affected by these changes? One simple possibility is that market
microstructure regards the very high frequency behavior of markets, whereas the properties
studied in this chapter relate to slow, low frequency movements, which would not be too affected
by market microstructure details. The models in this chapter aim to rationalize some of
these movements, as explained. Models addressing the previous market microstructure issues
are reviewed in Chapter 9.
More in detail, this chapter is organized as follows. The next section provides a succinct
overview of empirical regularities of aggregate equity markets at the business cycle frequency; we
explain that price-dividend ratios and stock returns are procyclical, whereas stock volatility and
risk-premiums are countercyclical. Section 7.3 analyzes in deeper detail the empirical behavior of
aggregate volatility, and provides intuitive explanations for it. Section 7.4 develops a framework
to think about our countercyclical statistics. Section 7.5 analyzes the two classes of economies
with which it illustrates the predictions of Section 7.4. The modeling approach in this chapter
is based on the price-dividend ratios. Section 7.6 develops an alternative approach based on
B/M ratios. [In progress]

7.2 Empirical evidence: bird's eye view


While the aggregate equity market is certainly unpredictable in the short-run as Fama showed
in his seminal work (see Fama, 2014), the work of Shiller shows that it moves much more than
justified by the fundamentals (see Shiller, 2014). This excess volatility puzzle is also associated
with predictability in the medium-term: in bad times (say in recessions), the market requires
high returns while in good times, the market appetite for risk is high. These oscillations in the
investors' mood lead stock prices to wide and predictable fluctuations in the medium-term,
that is, excess volatility.
Naturally, the conclusions of Fama and those of Shiller are not in contradiction. Fama's
work centers on short-term fluctuations, whereas Shiller's focus regards the medium-term, as
explained. Regarding the short-run, Chapter 9 revisits Fama's work on market efficiency in
light of information problems. Instead, this and the following chapter analyze aggregate stock
market fluctuations at a lower frequency, i.e., that of the business cycle. Note, in most of the
models of this and the next chapter, the investors' mood is rational, that is, linked to a
rational assessment of risk or uncertainty in light of preferences and beliefs.
The evidence of a linkage between the aggregate stock market and the business cycle is
indeed both striking and long well known (see, e.g., the early survey in Campbell,
2003). This section aims to streamline how aggregate stock market fluctuations relate to general
macroeconomic conditions. Note that we do not aim to establish any causality link amongst
the various variables we analyze. Rather, the analysis in this section could be best described as
one delivering descriptive statistics.
We use monthly data that cover the period from January 1948 through December 2002.
We calculate yearly returns at month $t$ as $\sum_{i=1}^{12}\tilde R_{t+1-i}$, where $\tilde R_t = \ln\big((S_t + D_t)/S_{t-1}\big)$, $S_t$ is the S&P
Composite index at month $t$, and $D_t$ is the aggregate dividend calculated by Robert Shiller.
Table 7.1 provides basic statistics for both raw data, such as the P/D ratio and realized returns,
and for stock volatility and expected returns. Stock volatility is calculated as follows:

$$\mathrm{Vol}_t \equiv \sqrt{6\pi}\cdot\frac{1}{12}\sum_{i=1}^{12}\big|\tilde R_{t+1-i} - R_{t+1-i}\big|, \qquad (7.1)$$

where $R_t$ is the risk-free rate, taken to be the one month bill return.1 Expected returns are
calculated as explained below (see Eq. (7.2)). With the exception of the P/D ratio, all figures
are annualized percent.
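The volatility measure in Eq. (7.1) is easy to compute from a series of monthly excess returns. The following sketch assumes the scaling factor is sqrt(6 pi), i.e. sqrt(12) to annualize times sqrt(pi / 2) to convert a mean absolute value into a standard deviation under normality; the function name and the sample numbers are hypothetical.

```python
import math

def stock_volatility(excess_returns):
    """Annualized volatility from the last 12 monthly excess returns.

    Implements sqrt(6*pi) times the average absolute excess return,
    under the assumed reading of Eq. (7.1).
    """
    last12 = excess_returns[-12:]
    if len(last12) < 12:
        raise ValueError("need at least 12 monthly observations")
    return math.sqrt(6.0 * math.pi) * sum(abs(x) for x in last12) / 12.0

# Hypothetical check: if every monthly |excess return| equals sigma*sqrt(2/pi),
# the expected absolute value of a normal variate with standard deviation sigma,
# then the measure returns the annualized volatility sigma*sqrt(12).
sigma = 0.04
typical = sigma * math.sqrt(2.0 / math.pi)
sample = [typical if i % 2 == 0 else -typical for i in range(12)]
print(stock_volatility(sample), sigma * math.sqrt(12.0))
```

The check illustrates why the correction term is needed: without it, an average of absolute returns would understate the standard deviation by the factor sqrt(2 / pi).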
We note the first main set of stylized facts:
Fact I. The P/D ratio and realized returns are procyclical, although variations in
the business cycle conditions do not seem to be their only driving forces.
For example, Figure 7.1 reveals that the price-dividend ratio on the S&P500 declines during
all of the economic slowdowns, as signaled by the recession indicator calculated by the National
Bureau of Economic Research (NBER), the NBER recessions. At the same time, during NBER
expansions, price-dividend ratios seem to be driven by additional factors not necessarily related
to the business cycle. For example, during the roaring 1960s, price-dividend ratios experienced
two major drops that display the same order of magnitude as the decline at the very beginning
of the chaotic 1970s. Realized returns follow approximately the same pattern, although they
are more volatile than price-dividend ratios (see Figure 7.2).

                                  total           NBER expansions      NBER recessions
                            average  std dev     average  std dev     average  std dev
P/D ratio                     31.99    15.88       33.21    15.79       26.20    14.89
ln(P/D_{t+1} / P/D_t)          2.01    12.13        3.95    10.81       -7.28    16.79
one year returns               8.59    15.86       12.41    13.04       -9.45    15.49
real risk-free rate            1.02     2.48        1.03     2.43        0.97     2.69
excess return volatility      14.55     4.68       14.05     4.47       16.91     4.91
expected returns               8.36     3.49        8.09     3.29        9.62     4.10

TABLE 7.1. Data are sampled monthly and cover the period from January 1948 through
December 2002. With the exception of the P/D ratio levels, all figures are annualized
percent.

Table 7.1 also reports estimates of a key component of asset evaluation, the expected market
return (i.e. the investors' expected return to invest in the stock market) at each point in time
of our sample. Appendix 1 describes a procedure to estimate such an unobserved variable. This
chapter relies on our reconstruction of yearly expected returns, defined as

$$\mathbb{E}_t \equiv \sum_{i=1}^{12}\hat E_t(E_{t+i}), \qquad (7.2)$$

where $E_{t+i}$ is the expected return at time $t+i$, and $\hat E_t(E_{t+i})$ denotes the projection of $E_{t+i}$
given the information at $t$ (based on the model described in the appendix).

1 The rationale behind this calculation is as follows. First, $\frac{1}{12}\sum_{i=1}^{12}|\tilde R_{t+1-i} - R_{t+1-i}|$ is an estimate of the average volatility occurring over the last 12
months. We annualize by multiplying it by $\sqrt{12}$. The term $\sqrt{6\pi}$ arises for the following reason. If we assume that a given return
$R = \sigma u$, where $\sigma$ is a positive constant and $u$ is a standard unit normal, then $E(|R|) = \sigma\sqrt{2/\pi}$. The definition of Vol in Eq. (7.1), then,
follows by multiplying $\sqrt{12}\,E(|R|)$ by $\sqrt{\pi/2}$. This correction term, $\sqrt{\pi/2}$, has been suggested by Schwert (1989a) in a related context.
The following fact summarizes the cyclical properties of realized volatility and the expected
returns.
Fact II. Stock volatility and expected returns are countercyclical. However, business cycle conditions do not seem to be the only forces explaining the swings in
these variables.
That expected returns are countercyclical has been a well-known fact since at least Fama and French
(1989) and Ferson and Harvey (1991). Figures 7.2 through 7.4 are also suggestive of the business
cycle pattern of the aggregate stock market. For example, Figure 7.3 depicts the statistical
relation between stock volatility and the industrial production growth rate, which shows that
stock volatility is largely countercyclical, being larger in bad times than in good.
There are, of course, exceptions. For example, stock volatility rocketed to almost 23% around
the October 1987 crash, a crash occurring during one of the most enduring post-war expansion
periods. Countercyclical volatility is a stylized fact extensively discussed in Sections 7.3 and 7.4.
An important lesson is that this empirical fact can be explained once the volatility of the changes
in P/D ratios is countercyclical. Interestingly, Table 7.1 reveals that the P/D ratio variations
are more volatile in bad times than in good.
Finally, Figure 7.3 suggests that stock volatility behaves asymmetrically over the business
cycle, in that it increases more in bad times than it decreases in good. This asymmetric behavior
of stock volatility echoes its high frequency behavior documented at least since Glosten, Jagannathan and Runkle (1993): stock volatility increases more when returns are negative than it
decreases when returns are positive.
A third set of stylized facts links to the asymmetric behavior of the aggregate stock market
fluctuations over the business cycle. We have already noted that volatility is countercyclical
and behaves asymmetrically. We now turn to price multiples and expected returns.
Fact III. Changes in the P/D ratios and expected returns occur asymmetrically
over the business cycle. The most severe variations in these variables occur during
the contractionary phases of the business cycle.
During recessions, price multiples and expected returns move more than they do in good
times. For example, not only are expected returns countercyclical: on average, expected returns
increase more during NBER recessions than they decrease during NBER expansions, consistent
with the summary statistics of Table 7.1, and the informal pieces of evidence from Figure
7.2. We shall return to this issue of countercyclical expected returns many times in this chapter.
Similarly, not only are P/D ratios procyclical: on average, P/D ratios increase less during
NBER expansions than they decrease during NBER recessions. Furthermore, this asymmetric
behavior is quantitatively quite pronounced. Consider, for example, the changes in the P/D
ratios: on average, their percentage (negative) changes during recessions are nearly twice the
percentage (positive) changes during expansions. Sections 7.3 and 7.4 aim to provide basic
explanations of these facts.


FIGURE 7.1

FIGURE 7.2. Monthly excess returns and volatility, in percentage, year-to-year.



[Figure 7.3 has two panels. Left panel, "Data": return volatility (annualized, %) against average industrial production growth (%). Right panel, "Predictive regression": predicted volatility (annualized, %) against average industrial production growth (%).]

FIGURE 7.3. Stock volatility and business cycle conditions. The left panel plots stock
volatility, $\mathrm{Vol}_t$, against yearly (deseasoned) industrial production average growth rates,
computed as $\mathrm{IP}_t \equiv \frac{1}{12}\sum_{i=1}^{12}\mathrm{Ind}_{t+1-i}$, where $\mathrm{Ind}_t$ is the real, seasonally adjusted
industrial production growth as of month $t$. The right panel depicts the prediction of the
ordinary least squares regression: $\mathrm{Vol}_t = 15.59 - 5.21\,\mathrm{IP}_t + 1.56\,\mathrm{IP}_t^2 + w_t$, where $w_t$ is a
residual term, and the robust standard errors on the three coefficients are 0.19, 0.39 and 0.41,
respectively. The data span the period
from January 1948 to December 2002.
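The quadratic fit reported in the caption of Figure 7.3 already encodes the asymmetry of volatility over the cycle. The sketch below evaluates the fitted curve at a few growth rates; it assumes the linear coefficient is negative (the sign is not legible in the source, but is implied by countercyclical volatility), and the evaluation points are illustrative.

```python
def predicted_volatility(ip_growth):
    """Fitted volatility (annualized, %) as a quadratic in average IP growth (%).

    Coefficients are read from the Figure 7.3 caption; the minus sign on the
    linear term is an assumption consistent with countercyclical volatility.
    """
    return 15.59 - 5.21 * ip_growth + 1.56 * ip_growth ** 2

baseline = predicted_volatility(0.0)
rise_in_bad_times = predicted_volatility(-1.0) - baseline
fall_in_good_times = baseline - predicted_volatility(1.0)
print(baseline, rise_in_bad_times, fall_in_good_times)
```

Because the quadratic term is positive, a one-point drop in growth raises predicted volatility by more than a one-point increase lowers it.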

[Figure 7.4 has two panels. Left panel, "Data": expected excess returns (annualized, %) against average industrial production growth (%). Right panel, "Predictive regression": predicted expected excess returns (annualized, %) against average industrial production growth (%).]

FIGURE 7.4. The left-hand side of this picture plots estimates of the expected returns
(annualized, percent) ($\mathbb{E}_t$ say) against yearly (deseasoned) industrial production average
growth rates, computed as $\mathrm{IP}_t \equiv \frac{1}{12}\sum_{i=1}^{12}\mathrm{Ind}_{t+1-i}$, where $\mathrm{Ind}_t$ is the real, seasonally
adjusted industrial production growth as of month $t$. Expected returns are estimated
through Eq. (7.2). The right-hand side of this picture depicts the prediction of the ordinary
least squares regression: $\mathbb{E}_t = 7.74 - 2.19\,\mathrm{IP}_t + 0.43\,\mathrm{IP}_t^2 + w_t$, where $w_t$ is a residual
term, and the robust standard errors on the three coefficients are 0.09, 0.19 and 0.19,
respectively. Data are sampled monthly, and span
the period from January 1948 to December 2002.

Fact I entails a quite intuitive consequence: price-dividend ratios might convey information
relating to future returns. After all, expansions are followed by recessions. Therefore, in good
times, the stock market predicts that future returns will be negative. Define the excess return
for the time period $[t, t+n]$ as $\tilde R^e_{t+n} \equiv \tilde R_{t+n} - R_{t+n}$, where $\tilde R_{t+n}$ is the asset return over
$[t, t+n]$, and $R_{t+n}$ is the sum of the one-month Treasury bill rate, taken over $[t, t+n]$. Consider
the following regressions,

$$\tilde R^e_{t+n} = a_n + b_n\times\mathrm{P/D}_t + u_{n,t}, \quad n \ge 1, \qquad (7.3)$$

where $u_{n,t}$ is a residual term. Typically, the estimates of $b_n$ are significantly negative, with
regression $R^2$ that increase with $n$. [In progress, provide these estimates]
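As a minimal illustration of how the slope in a predictive regression of this kind is obtained, the sketch below runs a univariate least squares fit of excess returns on the lagged price-dividend ratio. The five data points are made up for the example and are not estimates from the sample used in the text.

```python
def ols(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: lagged P/D ratios and subsequent excess returns (%).
pd_ratio = [20.0, 25.0, 30.0, 35.0, 40.0]
excess_return = [12.0, 10.0, 7.0, 6.0, 3.0]

intercept, slope = ols(pd_ratio, excess_return)
print(intercept, slope)
```

A negative slope, as in this toy example, is what one would read as high valuations being followed by low excess returns.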

In turn, the previous regressions imply that $E[\tilde R^e_{t+n} \mid \mathrm{P/D}_t] = a_n + b_n\times\mathrm{P/D}_t$. Thus, they
suggest that price-dividend ratios are driven by expected excess returns. In this restrictive sense,
countercyclical expected returns (Fact II) and procyclical price-dividend ratios (Fact I) might
be two sides of the same coin. To investigate how this predictability links to business cycle
developments, consider the following regression, performed with monthly data from 1948:01 to
2002:12,

$$\tilde R^e_t = 14.64 - 9.09\,\mathrm{IP}_{t-12} - 14.27\,\mathrm{Infl}_{t-12} + w_t, \quad \text{with } R^2 = 11\%, \qquad (7.4)$$

where the robust standard errors on the three coefficients are 1.04, 1.37 and 2.67, respectively, $w_t$ is a residual term, $\tilde R^e_t$ is the excess return
from $t-12$ to $t$, $\mathrm{IP}_t$ is the average industrial production growth over the previous twelve months,
as defined in Figure 7.3, and $\mathrm{Infl}_t$ is defined similarly as $\mathrm{IP}_t$.
The negative signs of the coefficients in Eq. (7.4) are quite to be expected. Economic activity
does display mean-reverting behavior: bad times are followed by good. But good times are those
where the stock market goes up. Therefore, a slowdown in economic activity is a predictor of
high returns in the future. To illustrate with a simple example, consider a case where the
aggregate stock market positively links to a single state variable tracking the business cycle
conditions, $y$ say, such that the log of the aggregate equity index is $\ln S_t = a_0 + a\,y_t$, for two
constants $a_0$ and $a$, and where $a > 0$. Assume, then, and critically, that $y_t$ is mean-reverting,
with unconditional expectation $\bar y$, speed of adjustment $\kappa > 0$, and some volatility coefficient
$\sigma(y)$,

$$dy_t = \kappa(\bar y - y_t)\,dt + \sigma(y_t)\,dW_t,$$

where $W_t$ is a Brownian motion. Then, it is straightforward to show that

$$E_{t-12}\Big(\ln\frac{S_t}{S_{t-12}}\Big) = b_0 + b_1\,y_{t-12},$$

where $E_{t-12}$ denotes the expectation conditional on the information at $t-12$, $b_0 \equiv a\bar y\,(1 - e^{-12\kappa})$ and $b_1 \equiv -a\,(1 - e^{-12\kappa})$. That is, if $y$ is mean-reverting,
$\kappa > 0$, and the aggregate stock market is procyclical, $a > 0$, expected returns negatively link to past values
of $y$, i.e. $b_1 < 0$. This reasoning generalizes to a multivariate case, although the presence of
feedbacks between macroeconomic variables might then dilute the contribution of each variable
as a predictor of future expected returns.
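The mean-reversion argument can be checked numerically. The sketch below assumes the log index is affine in a state variable that mean-reverts at speed kappa toward y_bar, so that the twelve-month expected log return is b0 + b1 * y with b0 = a * y_bar * (1 - exp(-12 kappa)) and b1 = -a * (1 - exp(-12 kappa)); symbol names and parameter values are illustrative. The closed form is compared with an Euler integration of the conditional mean, which does not depend on the volatility coefficient.

```python
import math

def predictive_coefficients(a, kappa, y_bar, horizon=12.0):
    """Intercept and slope of the expected log return on the lagged state."""
    decay = 1.0 - math.exp(-kappa * horizon)
    return a * y_bar * decay, -a * decay

def expected_state(y0, kappa, y_bar, horizon=12.0, steps=12000):
    """Euler integration of dE/dt = kappa * (y_bar - E), the conditional mean."""
    dt = horizon / steps
    mean_y = y0
    for _ in range(steps):
        mean_y += kappa * (y_bar - mean_y) * dt
    return mean_y

# Hypothetical parameters: procyclical market (a > 0), monthly speed kappa.
a, kappa, y_bar, y0 = 2.0, 0.1, 0.5, 1.5
b0, b1 = predictive_coefficients(a, kappa, y_bar)
closed_form = b0 + b1 * y0
numerical = a * (expected_state(y0, kappa, y_bar) - y0)
print(b0, b1, closed_form, numerical)
```

With the state above its long-run mean, the expected log return over the following year is negative, which is the sense in which good times predict low future returns.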
The regression results in Eq. (7.4) have the same nature as that underlying Eq. (7.3): price-dividend ratios and market returns are procyclical. Note a final relation that reveals this procyclicality of market returns. At a contemporaneous level, the excess market returns are positively related to industrial production and negatively related to inflation,

$$\tilde R^e_t = 10.47 + 7.27\,\mathrm{IP}_t - 16.33\,\mathrm{Infl}_t + w_t, \quad \text{with } R^2 = 14\%, \qquad (7.5)$$

where the robust standard errors on the three coefficients are 1.07, 1.19 and 2.91, respectively, and $w_t$ is a residual term. Corradi, Distaso and
Mele (2013) estimate a continuous time model in which the aggregate stock market is driven
by industrial production, inflation and one unobserved factor. They show that in the model,
the relations between the market return and the two macroeconomic factors are similar to those
summarized by the linear regression in Eq. (7.5).
Finally, an apparently puzzling feature is that price-dividend ratios do not predict future
dividend growth. Let $g_t \equiv \ln(D_t/D_{t-1})$. In regressions taking the following format,

$$g_{t+n} = a_n + b_n\times\mathrm{P/D}_t + u_{n,t}, \quad n \ge 1,$$

the predictive content of price-dividend ratios is poor, and estimates of $b_n$ might often come
with a wrong sign.
To summarize, the previous regression estimates suggest that: (i) the price-dividend ratios are
driven by time-varying expected returns (a proxy of the risk-premium as we shall explain), and
(ii) the role played by expected dividend growth is somewhat limited. In the next chapter (see
Section 8.11), we shall explain that this view can be challenged along several dimensions. First,
it seems that expected earning growth does help predict price-dividend ratios. Second, the
fact that expected dividend growth does not seem to affect price-dividend ratios can be a property
to be expected in equilibrium.
Naturally, because expected returns and stock volatility are both strongly countercyclical,
they then positively relate at the business cycle frequency considered in this chapter, as illustrated by Figure 7.5 below.

[Figure 7.5: expected excess returns (annualized, %) plotted against stock volatility (annualized, %).]

FIGURE 7.5. Expected returns and volatility.

7.3 Volatility: a business cycle perspective


A prominent feature of the U.S. equity market is the close connection between aggregate volatility and business cycle developments discussed in the previous section (see, e.g., Figure 7.3). Understanding the origins and implications of these facts is extremely relevant to policy makers.
Indeed, if stock market volatility is countercyclical, it must necessarily be encoding information
about the development of the business cycle. Policy makers could then extract the signals stock
volatility brings regarding business cycle developments.
This section accomplishes three tasks. First, it mentions additional stylized facts relating
stock volatility, expected returns and P/D ratios over the business cycle (in Section 7.3.1).
Second, it attempts at preliminary explanations of these facts (in Section 7.3.2). Third, it
investigates whether stock volatility contains any useful information about the business cycle
(in Section 7.3.3).
There are topics left over from this section. For example, we do not tackle statistical issues
regarding volatility measurement.2 Nor do we consider the role of volatility in applied asset evaluation. Instead, Chapter 10 covers details about how time-varying volatility affects derivative
pricing. The focus of this section is more fundamental. It explores the extent to which volatility
movements could be given a wider business cycle perspective, and also highlights some of the
rational mechanisms underlying them.
2 See, e.g., Andersen, Bollerslev and Diebold (2002) for an early survey regarding the many available statistical techniques to
estimate volatility.



7.3.1 Volatility cycles

Why does equity volatility relate to the business cycle? One of the first contributions to this literature is Schwert (1989a,b). Schwert points out that low frequency fluctuations in equity volatility are difficult to explain through those in the volatility of other macroeconomic variables. For example, industrial production volatility does not correlate with stock volatility. Let us calculate industrial production volatility as $\mathrm{Vol}_{G,t} \equiv \frac{1}{12}\sum_{i=1}^{12} |G_{t+1-i}|$, where $G_t$ is the real, seasonally adjusted industrial production growth rate at month $t$ (similarly as in Eq. (7.1)). Figure 7.6 plots stock volatility against $\mathrm{Vol}_{G,t}$; it does not reveal any obvious pattern between these two variables. These results are in striking contrast with those available from Figure 7.3, where, instead, stock volatility exhibits a quite clear countercyclical behavior. More in detail, Table 7.1 reveals that stock market volatility is almost 30% higher during NBER recessions than during NBER expansions.
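As a rough sketch of the volatility measure just described, the snippet below computes a 12-month rolling average of absolute monthly growth rates, i.e. $\frac{1}{12}\sum_{i=1}^{12}|G_{t+1-i}|$. The function name `rolling_abs_mean` and the toy growth series are illustrative assumptions; they are not taken from the text or from any data set.

```python
def rolling_abs_mean(growth, window=12):
    """Average of the last `window` absolute values.

    Returns one value per month, starting from the first month for which
    a full window of observations is available.
    """
    out = []
    for t in range(window - 1, len(growth)):
        chunk = growth[t - window + 1 : t + 1]
        out.append(sum(abs(g) for g in chunk) / window)
    return out


# Hypothetical monthly growth rates (in %), for illustration only.
toy_growth = [0.5, -0.3, 0.2, -0.8, 0.1, 0.4, -0.6, 0.3, 0.0, -0.2, 0.7, -0.1, 1.2]
vol_g = rolling_abs_mean(toy_growth)
print(vol_g)
```

With 13 observations and a 12-month window, the sketch returns two values, one for each month at which a full window is available.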
In fact, Schwert also shows that stock volatility is countercyclical. The main focus of this section is to provide a few explanations of this evidence, which supports the view that equity volatility relates to the business cycle, although not to the volatility of other macroeconomic variables.
A seemingly unrelated but well-known stylized fact is that risk-premiums are countercyclical, as summarized by Fact II in the previous section. Particularly important is also Fact III: expected returns fall much less during expansions than they rise during recessions. With post-war data, we find that compared to an average of 8.36%, the expected returns increase by nearly 19% during recessions while they drop by a mere 3% during NBER expansions (see Table 7.1). A final stylized fact is that price-dividend ratios behave asymmetrically over the business cycle. Table 7.1 reveals that price-dividend ratios are not only pro-cyclical: their downward changes during recessions are also more severe than their upward movements during expansions. Table 7.1 suggests that price-dividend ratios fluctuate nearly two times more in recessions than in expansions.
How can we rationalize these facts? A simple possibility is that the economy is frequently hit
by shocks that display the same qualitative behavior of return volatility, expected returns and
price-dividend ratios. However, Figure 7.6 suggests this channel is unlikely. Another possibility
is that the economy reacts to shocks through some mechanism endogenously related to the
investors' maximizing behavior, which then activates the previous phenomena.
Section 7.3.2 puts forward explanations for countercyclical stock volatility relying on endogenous mechanisms. Section 7.3.3 provides, instead, additional empirical properties of equity
volatility. The motivation is simple: because stock volatility is countercyclical, it might contain useful information about ongoing business cycle developments. The section, then, aims to
provide answers to the following questions: (i) Do macroeconomic factors help explain the dynamics of stock market volatility? (ii) Conversely, what is the predictive content stock market
volatility brings about the business cycle?

[Figure 7.6, two panels. Left panel ("Data"): return volatility (annualized, %) against industrial production volatility (annualized, %). Right panel ("Predictive regression"): predicted volatility (annualized, %) against industrial production volatility (annualized, %).]

FIGURE 7.6. Return volatility and industrial production volatility. The left panel plots stock volatility, $\mathrm{Vol}_t$, against industrial production volatility, $\mathrm{Vol}_{G,t}$. The right panel of the picture depicts the prediction of the ordinary least squares regression: $\mathrm{Vol}_t = 16.51 - 0.78\,\mathrm{Vol}_{G,t} + 0.05\,\mathrm{Vol}_{G,t}^2 + w_t$, where $w_t$ is a residual term, and the standard errors of the three coefficients are 0.93, 0.47 and 0.05, respectively. The data span the period from January 1948 to December 2002.
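To get a feel for how flat the fitted relation in Figure 7.6 is, the following sketch evaluates the quadratic fit at a few values of industrial production volatility. It assumes the coefficients read as 16.51, -0.78 and 0.05 (the sign of the linear term is an inference from the caption, not a certainty), and the function name `predicted_vol` is hypothetical.

```python
def predicted_vol(vol_g, const=16.51, lin=-0.78, quad=0.05):
    """Fitted stock volatility (annualized, %) under the assumed coefficients."""
    return const + lin * vol_g + quad * vol_g ** 2


# The fitted parabola reaches its minimum where the derivative is zero.
turning_point = 0.78 / (2 * 0.05)

for v in (0.0, 4.0, turning_point, 12.0):
    print(f"VolG = {v:5.2f}  ->  predicted Vol = {predicted_vol(v):6.3f}")
```

Under these assumed coefficients, predicted stock volatility moves by only a few percentage points over a wide range of industrial production volatility, in line with the absence of an obvious pattern in the data panel.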

7.3.2 Understanding the empirical evidence


This section is a broad introduction to the main determinants of aggregate stock market fluctuations. We develop a simple example of an economy, where countercyclical volatility arises in conjunction with the property that investors' required returns are (i) countercyclical, and (ii) asymmetrically related to business cycle developments. That is, in this economy, risk-premiums increase more in bad times than they decrease in good, consistently with the evidence discussed earlier in this chapter.
7.3.2.1 Fluctuating compensation for risk

In frictionless markets, the price of a long-lived security is simply the risk-adjusted discounted
expectation of the future dividends stream. Heuristically, and other things being equal, this
price increases as the expected return from holding the asset and, hence, the risk-premium,
decreases. According to this mechanism, asset prices and price-dividend ratios are pro-cyclical
because risk-adjusted discount rates are countercyclical.
Would countercyclical risk-premiums also lead to countercyclical volatility? We now explain that an additional property is required, asymmetry. Figure 7.7 depicts a situation in which risk-premiums are countercyclical and, also, asymmetric, in that they decrease less in good times than they increase in bad, consistently with the empirical evidence.
Suppose that the economy enters a boom, in which case risk-premiums decrease, and asset prices increase, on average. During the boom, when the economy is hit by positive shocks on the fundamentals, risk-premiums decrease and asset prices increase. However, risk-premiums (and, hence, asset prices) do not change as much as they would during a recession: we are assuming that risk-premiums behave asymmetrically over the business cycle. Eventually, the boom ends and a recession begins. As the economy enters the recession, risk-premiums increase and asset prices decrease. Yet now, the negative shocks on the fundamentals lead risk-premiums to increase (and, hence, asset prices to decrease) more than they moved during the boom. All in all, volatility increases on the downside. Once again, this asymmetric behavior occurs due to the assumption that risk-premiums change asymmetrically over the business cycle.3
[Figure 7.7. Axes: price-dividend ratio and risk-adjusted discount rates, each with regions labeled "good times" and "bad times".]

FIGURE 7.7. Countercyclical risk-premiums and stock volatility.


The empirical evidence in Table 7.1 is supportive of the channel described above: expected
returns seem to move more during recessions than during expansions. Figure 7.8 connects such
an asymmetric behavior of the expected returns with short-run macroeconomic fluctuations. It
depicts how expected returns relate to the monthly industrial production growth, according to
whether the U.S. economy is in a booming or a recessionary phase.

3 Mele (2007) develops a no-arbitrage framework to deal with these countercyclical issues, on which the next section is based.

[Figure 7.8, two panels. Left panel ("Data"): expected excess returns (annualized, %) against monthly growth (%). Right panel ("Predictive regression"): predicted expected excess returns (annualized, %) against monthly growth (%).]

FIGURE 7.8. Expected returns and business cycle conditions. The left panel plots expected excess returns, $E_t$ in Eq. (7.2), against real (deseasoned) monthly industrial production growth, $\mathrm{Ind}_t$. The right panel depicts the prediction of the ordinary least squares regression: $E_t = 7.225 - (0.782\, I_{\mathrm{Rec}} + 0.121\, I_{\mathrm{Exp}})\,\mathrm{Ind}_t + w_t$, where $I_{\mathrm{Rec}}$ (resp., $I_{\mathrm{Exp}}$) is the indicator function that takes the value one if the economy is in an NBER-recession (resp., expansion) episode and zero otherwise, $w_t$ is a residual term, and the standard errors of the three coefficients are 0.099, 0.174 and 0.103, respectively.
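The asymmetry in the fitted regression of Figure 7.8 can be made concrete with a short sketch. It assumes the regression reads $E_t = 7.225 - (0.782\, I_{\mathrm{Rec}} + 0.121\, I_{\mathrm{Exp}})\,\mathrm{Ind}_t$ (the minus sign is an inference from the caption), and the function name `fitted_expected_return` is hypothetical.

```python
def fitted_expected_return(growth, in_recession):
    """Fitted expected excess return (annualized, %) for a given monthly
    industrial production growth rate (%), under the assumed coefficients."""
    slope = 0.782 if in_recession else 0.121
    return 7.225 - slope * growth


# A one-point fall in growth during a recession versus a one-point rise
# in growth during an expansion.
rise_in_recession = fitted_expected_return(-1.0, True) - 7.225
fall_in_expansion = 7.225 - fitted_expected_return(1.0, False)
print(rise_in_recession, fall_in_expansion)
```

Under these assumptions, the same one-point move in growth shifts fitted expected returns several times more in a recession than in an expansion, which is the asymmetry the paragraph above refers to.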

To summarize, if risk-premiums are more volatile during recessions than booms, asset prices and price-dividend ratios are more responsive to changes in economic conditions in bad times than in good, thereby leading to countercyclical volatility. These effects are precisely those we observe, as explained. The next section develops theoretical foundations to formalize these links. A key result is that countercyclical volatility is likely to arise in many models, provided the previous asymmetry in discounting is sufficiently strong. More precisely, if the asymmetry in discounting is sufficiently strong in relation to some benchmark variable tracking the business cycle conditions, then the price-dividend ratio is increasing and concave in this variable. It is this concavity that makes stock volatility increase on the downside.
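A purely illustrative sketch of the concavity argument: take a hypothetical price-dividend ratio that is increasing and concave in a business cycle variable $y$, and look at its proportional sensitivity $p'(y)/p(y)$, which scales the contribution of shocks in $y$ to return volatility. The functional form $p(y) = 20 + 10\ln(1+y)$ is an assumption chosen for convenience, not a result from the text.

```python
import math


def pd_ratio(y):
    """Hypothetical increasing and concave price-dividend ratio."""
    return 20.0 + 10.0 * math.log(1.0 + y)


def pd_sensitivity(y):
    """Proportional sensitivity p'(y) / p(y) of the hypothetical ratio."""
    return (10.0 / (1.0 + y)) / pd_ratio(y)


for y in (0.5, 1.0, 2.0):  # low y = bad times, high y = good times
    print(f"y = {y:3.1f}  P/D = {pd_ratio(y):6.2f}  p'/p = {pd_sensitivity(y):.4f}")
```

For a given size of shocks in $y$, the price-dividend ratio in this example moves proportionally more when $y$ is low, so volatility rises on the downside.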
Section 7.4 provides a more comprehensive explanation of these facts, by relying on a fairly
general continuous-time framework and tools relatively unusual in economics, although the
intuition is still the same as that of Figure 7.7. The scope of this section is to provide a quantitative illustration of these results, based on a simple binomial tree model, which is solved in
closed-form, and shown to predict a few of the stylized features of the aggregate market, surveyed in Section 7.2. Section 7.6 provides additional models that help understand the empirical
evidence.

7.3.2.2 A simple model of countercyclical volatility

Consider an infinite horizon economy, a single asset, and a representative agent. In equilibrium, the agent consumes the dividends promised by the asset. We also assume a safe asset is infinitely elastically supplied, such that the interest rate is some constant $r > 0$. In the initial state, the dividend is equal to one (see Figure 7.9). In the second period, $D = e^{-\delta}$ ($\delta > 0$) with prob $p$ (the bad state) or $D = e^{\delta}$ with prob $1 - p$ (the good state). In the initial state, the agent's CRRA is $\eta > 0$. In the good (resp., the bad) state, the agent's RRA is $\eta_G$ (resp., $\eta_B$) $> 0$. In the third period, the agent receives the final payoffs of Figure 7.9, where $M_S$ is the price of a claim to all future dividends, discounted at a RRA $\eta_S$, with $S \in \{G, B, GB\}$ and $\eta_{GB} = \eta$.
This model is thus one with constant expected dividend growth, but random risk-aversion. Note that risk-aversion is a source of long-run risk: once this risk is resolved, risk-aversion remains fixed at its level forever.4
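[Figure 7.9: three-period tree. The initial dividend is 1. With probability $p$ (risk-neutral probability $q$) the economy moves to the bad state, with dividend $e^{-\delta}$; otherwise it moves to the good state, with dividend $e^{\delta}$. The final payoffs are $e^{2\delta} + M_G$ (upper node), $1 + M_{GB}$ (central node) and $e^{-2\delta} + M_B$ (lower node); the second-stage risk-neutral probabilities are $q_G$ in the good state and $q_B$ in the bad state.]

FIGURE 7.9. A model of random risk-aversion and countercyclical volatility. The dividend is normalized to one at the initial node of the tree. With prob $p$, it then decreases to $e^{-\delta}$ in the bad state. The risk-neutral probability of this state is denoted as $q$. The risk-neutral probability of further dividend movements depends upon whether the economy is in the good or bad state (i.e., $q_G$ or $q_B$). At time 3, the agent receives the dividends plus the right to the stream of all future dividends. In the upper node, this right is worth $M_G$ (obtained through the risk-neutral probability $q_G$). In the central node, it is worth $M_{GB}$ (through the risk-neutral probability $q$). In the lower node, it is worth $M_B$ (through the risk-neutral probability $q_B$).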

4 The literature on long-run risks is surveyed in Chapter 8. In this literature, the expected dividend growth is affected by some unobservable and persistent factor, which generates countercyclical stock volatility, due to the assumption that dividend growth and consumption volatility are countercyclical. The model in this section leads to countercyclical volatility without the assumption that the volatility of the fundamentals is countercyclical.

Table 7.2 provides calibration results for this model, based on the statistics of Table 7.1 (see Appendix 2 for details). Note that the model is calibrated using data for the aggregate dividend growth (not consumption), the volatility of which is about 6% annualized (almost twice that on consumption growth). In the Lucas model of the previous chapter (Section 6.2), this figure would imply a RRA equal to $17 \approx 0.06/0.06^2$ to match an equity premium of 6%. However, the Lucas model would predict that return volatility is simply dividend growth volatility, thus being equal to 6%, which is less than a half of the average volatility in the data, 14.55%. Finally, the Lucas model predicts the price-dividend ratio is constant, and therefore cannot lead to the countercyclical statistics in Table 7.1.
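The back-of-the-envelope numbers in this paragraph can be checked in a few lines. The sketch assumes the standard Lucas-model relation in which the equity premium equals relative risk aversion times the variance of dividend growth; variable names are illustrative.

```python
equity_premium = 0.06   # target equity premium (annualized)
dividend_vol = 0.06     # dividend growth volatility (annualized)
data_vol = 0.1455       # average return volatility in the data (annualized)

# Assumed relation: premium = RRA * variance of dividend growth.
implied_rra = equity_premium / dividend_vol ** 2

# In the Lucas model, return volatility equals dividend growth volatility.
vol_ratio = dividend_vol / data_vol

print(round(implied_rra, 2), round(vol_ratio, 3))
```

The implied risk aversion rounds to about 17, and the model-implied return volatility is well below half of the 14.55% observed in the data.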

Data
                              expansions    average    recessions
P/D ratio                        33.21       31.99        26.20
excess return volatility         14.05       14.55        16.91

Model calibration
                              good state    average    bad state
P/D ratio                        32.50       31.81        28.15
excess return volatility          7.29        8.20        13.03
risk-adjusted rate                8.95        9.07         9.71
expected returns                 10.16       11.46        18.42
implied risk-aversion            13.69       13.89        14.96

TABLE 7.2. This table reports calibration results for the infinite horizon tree model in Figure 7.9, by perfectly matching the model-implied time-3 P/D ratios to the data in the first row. The model-implied statistics in the good, average and bad state are those of time-2. The risk-adjusted rate is $r + \sigma_0 \lambda_S$, where $r$ is the riskless rate; $\sigma_0$ is dividend growth volatility, and $\lambda_S$ is the Sharpe ratio on gross returns in state $S$, determined as $(q_S - p)/\sqrt{p(1-p)}$ for $S = G$ (the good state) and $S = B$ (the bad state). Finally, $p$ is the probability of the bad state and $q_S$ is the state-dependent risk-adjusted probability of the bad state (for $S \in \{G, B\}$). Implied risk-aversion is the coefficient RRA $\eta_S$ in the good state ($S = G$) and in the bad ($S = B$) implied by the calibrated model. The figures in the "average" column are the averages of the corresponding values in the good and bad states, averages taken under $p = 0.158$.
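As a consistency check on the "average" column of the model calibration, the sketch below recomputes each average as a probability-weighted mean of the good-state and bad-state values, using $p = 0.158$ for the bad state. It also includes the Sharpe ratio formula from the caption as a helper; the function names are illustrative assumptions, and no value of $q_S$ is taken from the text.

```python
import math

P_BAD = 0.158

# (good state, reported average, bad state), as listed in Table 7.2.
model_rows = {
    "P/D ratio": (32.50, 31.81, 28.15),
    "excess return volatility": (7.29, 8.20, 13.03),
    "risk-adjusted rate": (8.95, 9.07, 9.71),
    "expected returns": (10.16, 11.46, 18.42),
    "implied risk-aversion": (13.69, 13.89, 14.96),
}


def state_average(good, bad, p_bad=P_BAD):
    """Probability-weighted average across the good and bad states."""
    return (1.0 - p_bad) * good + p_bad * bad


def sharpe_ratio(q_s, p=P_BAD):
    """Sharpe ratio (q_S - p) / sqrt(p (1 - p)) for a risk-adjusted
    probability q_S of the bad state (q_S must be supplied by the user)."""
    return (q_s - p) / math.sqrt(p * (1.0 - p))


implied = {name: state_average(g, b) for name, (g, _, b) in model_rows.items()}
for name, (_, reported, _) in model_rows.items():
    print(f"{name:26s} reported {reported:6.2f}   implied {implied[name]:6.2f}")
```

Each implied average agrees with the reported one up to rounding.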

The model predicts that the average excess return volatility equals about 8%. Moreover,
the average implied RRA is now around 13, and the average expected excess returns are high.
Finally, the model of this section predicts stock volatility has swings that mimic those in the
data, with levels reaching 13% in the bad state. In the bad state, however, the model overstates
the expected returns by a few percentage points. Importantly, this calibration exercise illustrates
the asymmetric feature of expected returns and risk aversion. In this experiment, both expected
returns and risk-aversion increase more in bad times than they decrease in good.
7.3.2.3 Alternative channels

There are at least two broad mechanisms to explain aggregate stock market fluctuations, as explained in the Introduction: (i) time-varying risk-premiums (as in this section), and (ii) time-varying expected dividend growth. Section 7.5 surveys more elaborate models than the tree of this section, aiming to rationalize countercyclical statistics based on the first channel. We shall also examine models of learning, in which stock volatility is time varying due to the agents' attempt to learn about the state of the economy; these models predict agents face time-varying expected dividend growth. In the next chapter, we survey many models relying on both channels, each of them focusing on particular economic mechanisms and specific predictions (e.g., idiosyncratic risk, restricted stock market participation, heterogeneous beliefs, bubbles).
7.3.3 What to do with stock market volatility?
The historical behavior of equity volatility displays a pronounced business cycle pattern. Could
we exploit this pattern for the purpose of forecasting? This section considers two exercises.
In the first exercise, we forecast stock market volatility using past macroeconomic data. Note
that stock volatility is an input to many decision making processes, ranging from portfolio
selection to risk-management; understanding how it links to ongoing business cycle conditions
is therefore a natural exercise. The second exercise explores whether volatility helps predict
economic activity. This exercise might help decision makers (e.g., policy makers) take informed
decisions.
7.3.3.1 Macroeconomic constituents of stock market volatility

Table 7.3 reports results regarding the first forecasting exercise. How does volatility link to past macroeconomic data? We use year-to-year industrial production growth and inflation as the two macroeconomic factors that summarize the state of the economy at any given point in time. Volatility is positively related to past growth in the medium term (say between one and two years), a finding we can easily interpret. Business cycle phases alternate: bad times are followed by good, and good times by bad. Because stock market volatility is countercyclical, high growth is then followed by high volatility. These explanations are similar to those put forward in Section 7.2 while elaborating on Eq. (7.4). Equity volatility is also related to past inflation, but in a more complex manner.
Figure 7.10 (top panel) depicts equity volatility and its in-sample forecasts when the regression model is fed with past macroeconomic data only. Naturally, the fit could be improved by providing the model with information about both past volatility and past macroeconomic data. Nevertheless, it is remarkable that the fit relying on past macro information is more than 60% better than that relying on past volatility only, as witnessed by the $R^2$s in Table 7.3.
Note that these results are not inconsistent with those reported by Schwert (1989). Indeed, this section relies on estimates regarding lower frequency scales than those investigated by Schwert. More importantly, these estimates regard the linkages between stock market volatility and the level of macroeconomic variables, not their volatility.

FIGURE 7.10. Stock market volatility predictions. The top panel depicts stock market
volatility (solid line) and its forecasts based on the sole use of past macroeconomic indicators (dashed), i.e. the model estimates in the second column of Table 7.3 (Past). The
bottom panel depicts stock market volatility and its prediction based on the realization
of future values of macroeconomic indicators, i.e. the model estimates in the third column
of Table 7.3 (Future). Shaded areas are NBER recession and expansion episodes.

The previous findings should not be interpreted as suggesting any causality link; they could be best regarded as descriptive statistics. They do suggest, however, that stock market volatility links to past macroeconomic developments. A natural question is how precisely it does. Once again, the previous regressions capture mere statistical relations. Yet macroeconomic factors are likely part of the evaluation leading to the very same volatility. In terms of our daily jargon, macroeconomic factors could well be determinants of the pricing kernel. That is, stock volatility links to how the price responds to shocks in the fundamentals and, hence, macroeconomic conditions, but this linkage should be determined in the absence of arbitrage.
Corradi, Distaso and Mele (2013) pursue this topic in detail and build up a no-arbitrage model that reproduces the previous predictability results. In their model, there is a no-arbitrage nexus between equity volatility and macroeconomic factors. Christiansen, Schmeling and Schrimpf (2012) and Paye (2012) provide evidence of Granger causality from past values of several macroeconomic variables to stock volatility, in out-of-sample experiments. However, Paye notes that we are still not able to exploit these linkages for forecasting purposes. It is an important result, as it points to the possibility that in the future, alternative data sets could do a better job than the data sets these authors are using.
The distinction between Granger causality and forecasting accuracy is indeed subtle. A set of variables could well affect the probability distribution of stock volatility: this is the definition of Granger causality. At the same time, estimating, say, a linear regression linking past macroeconomic variables to stock volatility might not necessarily perform well. Intuitively, this relation can be subject to parameter estimation error, which increases the uncertainty surrounding the forecasts. This uncertainty might overwhelm bias reduction gains brought by a correctly specified model, i.e. without omitted variables (macroeconomic variables). We illustrate this point in more detail below (see Section 7.3.3.3).5
                          Past                        Future
Const.            10.98    10.81     4.79     Const.            17.88
Growth_{t-12}               0.15    0.002     Growth_{t+12}      0.02
Growth_{t-24}               0.16     0.24     Growth_{t+24}      0.15
Growth_{t-36}               0.27     0.27     Growth_{t+36}      0.43
Growth_{t-48}               0.23     0.20     Growth_{t+48}      0.31
Infl_{t-12}                 0.76     0.63     Infl_{t+12}        1.12
Infl_{t-24}                 0.97     0.92     Infl_{t+24}        1.09
Infl_{t-36}                 0.45     0.62     Infl_{t+36}        1.03
Infl_{t-48}                 0.31     0.15     Infl_{t+48}        0.94
Vol_{t-12}         0.36              0.28
Vol_{t-24}         0.09              0.03
Vol_{t-36}         0.10              0.02
Vol_{t-48}         0.08              0.09
R2                12.50    21.91    27.24     R2                24.30

TABLE 7.3. Forecasting stock market volatility with economic activity. The left part of this table ("Past") reports OLS estimates in linear regressions of one-year volatility (in %) on past one-year industrial production growth (in %), past one-year inflation (in %), and past stock volatility. Growth_{t-12} is one-year industrial production growth at time t-12, etc. Time units are months. The second part of the table ("Future") is similar, but contains coefficient estimates in linear regressions of volatility on future industrial production growth and future inflation. Starred figures are not statistically distinguishable from zero at the 95% level. R2 is the percentage, adjusted R2.
7.3.3.2 Macroeconomic implications of stock market volatility

Does equity volatility also anticipate the business cycle? Table 7.3 suggests that stock volatility does indeed link to future business cycle developments. The bottom panel of Figure 7.10 depicts the predicting part of the regression in the third column of Table 7.3, a back-casting exercise. Fornari and Mele (2013) have actually tackled this issue in great detail, concluding that stock volatility does help predict the business cycle, on top of traditional indicators such as the term spread and other financial variables, both in sample and out of sample.
To illustrate, note that stock volatility is not only countercyclical, i.e. a coincident business cycle indicator. Figures 7.2 and 7.10 also seem to indicate that stock volatility tends to increase before recessions, a typical attribute of a leading indicator. Consider the following regression:
$$\mathrm{Vol}_t = a + \sum_{i \in \{3, 12, 24, 36\}} b_i\, \mathrm{Vol}_{t-i} + \beta_1 I_{O(\mathrm{NBER}_t = 1)} + \beta_2 I_{\mathrm{NBER}_t = 1} + u_t \tag{7.6}$$
where $\mathrm{Vol}_t$ is stock volatility at month $t$; $I_{O(\mathrm{NBER}_t = 1)}$ is the indicator function that equals one in the twelve months preceding any NBER-dated recession, and zero otherwise; $I_{\mathrm{NBER}_t = 1}$ is the indicator function that equals one during any NBER-dated recession, and zero otherwise; finally, $u_t$ is a residual term.

5 The literature on statistical tests for Granger causality and forecasting accuracy is large. See, e.g., Clark and West (2007) for the former and Giacomini and White (2006) for the latter.

Table 7.4 reports estimates of this model's parameters on a sample covering monthly data from January 1957 to September 2008. The table reports estimates for the whole sample, and two subsamples, one before the Great Moderation, i.e., up to 1982, and another, covering the Great Moderation and ending in 2008. A value $\beta_1 > 0$ is indicative that stock volatility increases ahead of recessions and a value $\beta_2 > 0$ indicates that stock volatility also increases during recessions.
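The two indicator variables of this regression can be built from a monthly 0/1 recession series along the following lines. This is only a sketch: the function name `recession_indicators`, the treatment of months that fall both inside one recession and within twelve months of the next (they are counted as recession months only), and the toy series are assumptions, not details given in the text.

```python
def recession_indicators(nber, lead=12):
    """Build the pre-recession and recession dummies from a 0/1 series.

    pre[t] = 1 if month t is an expansion month falling within the `lead`
             months preceding the start of a recession, 0 otherwise.
    rec[t] = 1 if month t is a recession month, 0 otherwise.
    """
    n = len(nber)
    pre = [0] * n
    for t in range(1, n):
        starts_here = nber[t] == 1 and nber[t - 1] == 0
        if starts_here:
            for s in range(max(0, t - lead), t):
                if nber[s] == 0:
                    pre[s] = 1
    return pre, list(nber)


# Hypothetical series: 15 expansion months, a 3-month recession, 5 more months.
toy_nber = [0] * 15 + [1] * 3 + [0] * 5
pre_dummy, rec_dummy = recession_indicators(toy_nber)
print(pre_dummy)
print(rec_dummy)
```

The two lists can then be used as regressors alongside lagged volatility in an OLS routine of one's choice.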
              Const.   lag 3   lag 12   lag 24   lag 36   pre-recession   recession
1957-2008      3.11     0.94     0.15     0.01     0.01        0.48          1.51
1957-1982      3.60     0.98     0.24     0.02     0.04        0.34          1.87
1983-2008      2.88     0.94     0.09     0.05     0.01        1.01          1.22

TABLE 7.4. This table reports ordinary least squares estimates of the parameters in Eq. (7.6) for the post-War data and the two subsamples (i) prior to and (ii) over the Great Moderation. Starred figures are not statistically distinguishable from zero at the 95% level.

It appears that especially during the Great Moderation, stock volatility does anticipate economic downturns. This issue is indeed quite a delicate one. The fact that stock volatility is countercyclical does not necessarily imply it anticipates real economic activity. And even if it did, it would remain to be seen whether sustained stock market volatility could really create the premises for future economic slowdowns.
Post hoc ergo propter hoc? Does aggregate stock market volatility affect investment decisions in the real sphere of the economy? Or, rather, does volatility merely help predict the business cycle? The policy implications of these issues are quite obvious. If volatility merely anticipates, without affecting, the business cycle, there is little policy makers can do about it, even if its forecasting power is obviously interesting per se. These themes are still unexplored at the time of writing.
[However, survey the recent volatility paradox ideas]
7.3.3.3 Forecasting with the wrong model

The results in this section are in-sample. It may turn out that real-time forecasts could be disappointing. One reason could be data-snooping: if we regress a variable of interest on thousands of candidate predictors, there is a considerable chance that at least one out of these thousands nicely links to the endogenous variable, and displays a spectacular fit (in-sample). However, precisely because this fit was obtained only by chance, and not due to an economic linkage between this variable and the endogenous one, the out-of-sample performance of the model will likely disappoint.
An opposite situation can actually occur, in which a link between two variables really exists, but cannot be properly exploited for practical forecasting purposes. The intuitive reason for this difficulty is limited data. That is, we can only estimate a linkage between two variables by relying on a finite sample. Yet the finite-sample bias in the linkage estimates could turn out
to be substantial, and lead to large forecasting errors. Consider the following example, a data generating process in which a variable $x_t$ Granger causes a second one, $y_t$, as follows:
$$y_t = \alpha + \beta x_{t-1} + \epsilon_t, \quad \epsilon_t \sim \mathrm{NID}(0, \sigma_\epsilon^2), \quad x_t \sim \mathrm{NID}(\mu, \sigma_x^2), \tag{7.7}$$
for five constants $\alpha$, $\beta$, $\mu$, $\sigma_\epsilon$ and $\sigma_x$, the parameters of the model. We assume that $\sigma_\epsilon$ and $\sigma_x$ are known, and consider making predictions of the variable $y_t$ through two models.
The first model is misspecified, in that we simply neglect that $x_t$ Granger causes $y_t$, i.e. $y_t = a + u_t$, for some constant $a$ and some residual term $u_t$. We estimate the constant of this misspecified model through ordinary least squares (OLS), obtaining:
$$\hat{a} = \bar{y} = \alpha + \beta \bar{x} + \bar{\epsilon},$$
where $\bar{x}$ and $\bar{\epsilon}$ denote the sample averages of $x_{t-1}$ and $\epsilon_t$, and $T$ is the sample size. The prediction error generated by this model for time $T+1$ is:
$$e_{1,T+1} \equiv y_{T+1} - \hat{a} = \beta (x_T - \bar{x}) + \epsilon_{T+1} - \bar{\epsilon}.$$
Note that although $\hat{a}$ is biased for $\alpha$, even asymptotically, the predictor of this misspecified model is unbiased because $E(e_{1,T+1}) = 0$.
Next, consider using as a predictor the predictive part of Eq. (7.7), obtained through the OLS estimators of $\alpha$ and $\beta$, say $\hat{\alpha}$ and $\hat{\beta}$. The resulting prediction error is,
$$\begin{aligned} e_{2,T+1} &\equiv y_{T+1} - \hat{\alpha} - \hat{\beta} x_T \\ &= (\beta - \hat{\beta})(x_T - \bar{x}) + \epsilon_{T+1} - \bar{\epsilon}, \end{aligned} \tag{7.8}$$
where the second equality follows by (i) $\hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}$, with $\bar{y}$ denoting the sample average of $y_t$, and (ii) Eq. (7.7), and:
$$\hat{\beta} - \beta = \frac{\widehat{\mathrm{cov}}(x, \epsilon)}{\widehat{\mathrm{var}}(x)},$$
where $\widehat{\mathrm{cov}}$ and $\widehat{\mathrm{var}}$ stand for the sample covariance and variance of their arguments.
The correctly specified model does, naturally, lead to an unbiased predictor, in that $E(e_{2,T+1}) = 0$, by the second line in Eq. (7.8).
Therefore, the two models we consider (the misspecified and the correctly specified) both lead to unbiased predictors. However, the second predictor is plagued by parameter estimation error, and might actually lead to mean-squared prediction errors higher than those generated by the first predictor, especially when the sample variance of $\hat{\beta}$ is large. In other words, for large samples, $\hat{\beta} - \beta$ is, of course, quite small, as $\hat{\beta}$ is consistent for $\beta$. In finite samples, however, this term can adversely affect the performance of the correctly specified model.
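The trade-off described above can be explored with a small Monte Carlo experiment. The sketch below assumes the data generating process $y_t = \alpha + \beta x_{t-1} + \epsilon_t$ with independent normal $x_t$ and $\epsilon_t$, and compares the mean-squared prediction errors of the sample-mean predictor (the misspecified model) and of the OLS predictor (the correctly specified model). All parameter values, the sample size and the function name `simulate_mse` are illustrative assumptions.

```python
import random


def simulate_mse(alpha, beta, mu, sigma_x, sigma_eps, T, n_rep, seed=0):
    """Monte Carlo mean-squared prediction errors for time T+1.

    Returns (mse_mean_predictor, mse_ols_predictor).
    """
    rng = random.Random(seed)
    sse1 = 0.0
    sse2 = 0.0
    for _ in range(n_rep):
        x = [rng.gauss(mu, sigma_x) for _ in range(T + 1)]        # x_0, ..., x_T
        eps = [rng.gauss(0.0, sigma_eps) for _ in range(T + 1)]   # eps_1, ..., eps_{T+1}
        xs = x[:T]                                                # regressors x_0, ..., x_{T-1}
        ys = [alpha + beta * xs[t] + eps[t] for t in range(T)]    # y_1, ..., y_T
        y_next = alpha + beta * x[T] + eps[T]                     # y_{T+1}

        x_bar = sum(xs) / T
        y_bar = sum(ys) / T
        sxx = sum((xi - x_bar) ** 2 for xi in xs)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
        b_hat = sxy / sxx
        a_hat = y_bar - b_hat * x_bar

        e1 = y_next - y_bar                    # misspecified model: sample mean
        e2 = y_next - a_hat - b_hat * x[T]     # correctly specified model: OLS
        sse1 += e1 ** 2
        sse2 += e2 ** 2
    return sse1 / n_rep, sse2 / n_rep


# Weak link and short sample: estimation error is likely to dominate.
mse1_weak, mse2_weak = simulate_mse(0.0, 0.05, 0.0, 1.0, 1.0, T=10, n_rep=20000)
# Strong link, same sample size: the correctly specified model should win.
mse1_strong, mse2_strong = simulate_mse(0.0, 1.0, 0.0, 1.0, 1.0, T=10, n_rep=20000)

print("weak link:  ", mse1_weak, mse2_weak)
print("strong link:", mse1_strong, mse2_strong)
```

In this setup, the misspecified predictor tends to have the lower mean-squared error when the link is weak relative to the noise in $\hat{\beta}$, while the ranking reverses when the link is strong.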
7.3.4 What did we learn?
Stock market volatility is higher in bad times than in good. Explaining this basic fact is challenging. We know very well how to model risk-premiums and how these premiums should relate to the business cycle. We are more embarrassed when it comes to explaining volatility. This section explains that countercyclical volatility could arise because risk-premiums undergo large swings as the economy moves away from good states, just as the data seem to suggest.
The focus in this section relates to the fluctuations of aggregate stock volatility and risk premiums, not their average levels. Not surprisingly, the question whether these fluctuations (and their average levels) can be consistent with the neo-classical model of rational evaluation is controversial, as for many topics at the intersection of financial economics and macroeconomics, as vividly illustrated in the early debates (see, e.g., Campbell, 2003; Mehra and Prescott, 2003). However, this section suggests that there is a potential for explaining the swings that aggregate stock volatility experiences across states of nature.
Do these theoretical insights have some additional empirical content? This section has discussed three empirical issues: (i) the market expected returns are strongly countercyclical and asymmetrically related to macroeconomic conditions; (ii) equity volatility links to the business cycle (in-sample), although it cannot necessarily be forecast through macroeconomic variables, out-of-sample; (iii) equity volatility contains information regarding business cycle developments.
We now turn to more theoretically-based explanations of the aggregate stock market fluctuations.

7.4 Rational market fluctuations


What would be needed to make rational valuation consistent with the countercyclical statistics
described so far? We rely on a parsimonious framework and explain that the price-dividend ratio
should play a critical role, by reacting asymmetrically to shocks in the fundamentals. Section
7.4.1 provides a decomposition of the asset returns. Section 7.4.2 develops tools of analysis with
examples. In Section 7.5, we apply these tools to address the empirical issues.
7.4.1 The dynamics of asset returns
7.4.1.1 A return decomposition

Asset returns depend on both payoffs and prices. Consider the following identity that holds for the gross returns, $R_{t+1} \equiv (S_{t+1} + D_{t+1})/S_t$, with $S_t$ the asset price and $D_t$ the dividend,
$$\ln R_{t+1} = g_{t+1} + \ln \frac{1 + p_{t+1}}{p_t}, \tag{7.9}$$
where $g_t \equiv \ln (D_t / D_{t-1})$, the dividend growth, and $p_t \equiv S_t / D_t$, the price-dividend ratio. Thus, return volatility is countercyclical because the dividend growth and/or the price-dividend ratio changes have countercyclical volatility.
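The return identity can be verified numerically. The sketch below assumes the decomposition $\ln R_{t+1} = g_{t+1} + \ln((1 + p_{t+1})/p_t)$, with $g$ the log dividend growth and $p$ the price-dividend ratio; prices and dividends are made-up numbers and the function names are illustrative.

```python
import math


def log_return_direct(s0, s1, d1):
    """Log gross return computed from prices and the next dividend."""
    return math.log((s1 + d1) / s0)


def log_return_decomposed(s0, d0, s1, d1):
    """Log gross return as dividend growth plus the price-dividend term."""
    g1 = math.log(d1 / d0)
    p0 = s0 / d0
    p1 = s1 / d1
    return g1 + math.log((1.0 + p1) / p0)


# Hypothetical prices and dividends at two consecutive dates.
s0, d0, s1, d1 = 100.0, 3.0, 104.0, 3.1
direct = log_return_direct(s0, s1, d1)
decomposed = log_return_decomposed(s0, d0, s1, d1)
print(direct, decomposed)
```

The two computations agree up to floating-point error, since the decomposition is an identity rather than an approximation.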
The empirical evidence in Section 7.2 suggests that return volatility does not necessarily inherit the properties of the volatility of the fundamentals. Instead, the empirical evidence suggests at least two minimal predictions any model should make regarding the price-dividend ratio. First, it needs to be volatile, and second, it needs to be more volatile in bad times than in good. For example, in an economy driven by a state variable linked to the business cycle, such as habit formation (see Section 7.5), we would require that the price-dividend ratio be increasing and concave in the business cycle variable, as previously explained (see Section 7.3.2). Intuitively, this property ensures stock volatility increases on the downside, which is the very definition of countercyclical volatility. This section aims to provide conditions under which price-dividend ratios behave in this way.
7.4.1.2 Asymmetric behavior of the price-dividend ratio

Do price multiples behave asymmetrically over the business cycle? Empirically, they do, as explained. We now rely on a simple continuous time model that leads to these asymmetric properties. We assume that dividend growth is i.i.d.,
$$\frac{dD(\tau)}{D(\tau)} = g_0\, d\tau + \sigma_0\, dW_1(\tau),$$
where $g_0$ and $\sigma_0$ are two constants, and $W_1$ is a standard Brownian motion.
We assume that the risk-adjusted discount rates are a function of some state variable summarizing the business cycle conditions, $y$ say. That is, and relying on notation introduced in Chapter 4 (Section 4.2.5), we assume that
$$R(y) = r(y) + \sigma_0 \lambda_{CF}(y),$$

where () is the short-term rate and CF () is the cash-ow lambda. In the next sections and
in the next chapter, we explain how agents preferences and beliefs can lead to these discount
rates. We assume that
is solution to
=

( )

( )

( )

for some functions and .


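Given these assumptions, and under regularity conditions, the price-dividend ratio is driven by $y$, as shown by the following expression
$$p(y) = E^Q\left[\left.\int_t^\infty e^{(g_0 - \frac12\sigma_0^2)(\tau - t) + \sigma_0(\hat W_{1\tau} - \hat W_{1t}) - \int_t^\tau R(y_u)\,du}\,d\tau\,\right|\,y_t = y\right] \tag{7.10}$$
where $\hat W_1$ is a standard Brownian motion under the risk-neutral probability $Q$, and $E^Q$ denotes the expectation under $Q$.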
Eq. (7.10) is derived in Chapter 4 (Section 4.2.5). It suggests that the sensitivity of $p$ with respect to $y$ is related to the sensitivity of the risk-adjusted discount rate $R$ with respect to $y$. Hence, whether volatility is countercyclical now depends on how the risk-adjusted discount rates change after shocks in $y$.
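A useful benchmark is the special case in which the risk-adjusted discount rate does not vary with the state. The sketch below assumes a constant rate $\bar R > g_0$ (a hypothetical restriction, introduced here only for illustration) and a price-dividend ratio given by the expected discounted stream of dividend growth factors, with i.i.d. dividend growth of mean $g_0$ and volatility $\sigma_0$:

```latex
% Assumption (not in the text): R(y) = \bar{R} for all y, with \bar{R} > g_0.
\begin{align*}
p &= \int_t^\infty e^{-(\bar R - g_0)(\tau - t)}\,
     E^Q\!\left[e^{-\frac12\sigma_0^2(\tau - t) + \sigma_0(\hat W_{1\tau} - \hat W_{1t})}\right] d\tau \\
  &= \int_t^\infty e^{-(\bar R - g_0)(\tau - t)}\, d\tau
   = \frac{1}{\bar R - g_0}.
\end{align*}
```

The exponential of the Brownian term has unit expectation, so the price-dividend ratio collapses to a constant of the Gordon type. Stock volatility then equals dividend volatility, $\sigma_0$, which is why state dependence in $R$ is needed for the price-dividend ratio to fluctuate at all.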
This section formalizes the previous intuition. It shows that if $R$ increases in bad times sufficiently more than it does in good times, the price-dividend ratio is concave in $y$, thereby being more volatile in bad times than in good. This property is desirable because it would be consistent with the empirical behavior of price-dividend ratios and volatility. Chapter 6 explains that additional state variables to dividends are needed to drive fluctuations in the price-dividend ratio. But Chapter 6 also explains that multifactor models are not necessarily satisfactory. Indeed, multifactor models exist, such that (i) the variance of the pricing kernel increases arbitrarily with the number of factors, and yet (ii) price-dividend ratios are constant. What we really need is a discipline on how to increase the dimension of a model.
In the remainder of the chapter, we focus on two broad but key properties of models consistent
with this search process: (i) monotonicity and (ii) convexity properties:
(i) Monotonicity. Consider the price-dividend ratio $p$ in Eq. (7.10). By Itô's lemma, stock volatility is $\sigma_0 + \frac{p'(y)}{p(y)}\mathrm{Vol}(y)$, where $\mathrm{Vol}(y) = \sqrt{v_1^2(y) + v_2^2(y)}$ is the volatility of $y$. Therefore, $y$ can help inflate volatility if $p$ increases with $y$. This monotonicity is important theoretically: it ensures that stock volatility is strictly positive, thereby guaranteeing the agents' budget constraints are well-defined.

(ii.1) Negative convexity. Suppose as before that $y$ correlates with the business cycle. If $\mathrm{Vol}(y)$ is constant, stock volatility is countercyclical whenever $p$ in Eq. (7.10) is concave in $y$, as in the simple reasoning underlying Figure 7.8. We shall study this point in detail in Section 7.5.3.
(ii.2) Convexity. Alternatively, suppose that expected dividend growth, $g$ say, is stochastic (an assumption we explore in detail in Section 7.5). We shall explain that under conditions, the price-dividend ratio is a function of $g$, similarly as with Eq. (7.10). Now, suppose $p$ is increasing and convex in $g$. In this case, the price-dividend ratio would display overreaction to small changes in $g$ in good times, i.e. when $g$ is high. The empirical relevance of this point was first acknowledged by Barsky and De Long (1990, 1993), and formalized by Veronesi (1999) in a model with learning (see Section 7.5.4).
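Property (ii.1) can be illustrated with a small numerical sketch. It evaluates the volatility expression of item (i), dividend volatility plus the semi-elasticity of the price-dividend ratio times the volatility of the state, for a price-dividend ratio that is increasing and concave in the state. The functional form and all parameter values below are hypothetical, chosen only to make the mechanism visible.

```python
import math

SIGMA_0 = 0.02   # dividend volatility (hypothetical value)
VOL_Y = 0.10     # constant volatility of the state variable (hypothetical value)

def pd_ratio(y):
    """Hypothetical price-dividend ratio, increasing and concave in the state y."""
    return 20.0 + 5.0 * math.log(y)

def pd_ratio_slope(y):
    """First derivative of pd_ratio with respect to y."""
    return 5.0 / y

def stock_volatility(y):
    """Dividend volatility plus (p'(y) / p(y)) times the volatility of y."""
    return SIGMA_0 + pd_ratio_slope(y) / pd_ratio(y) * VOL_Y

# Low values of y stand for bad times, high values for good times.
states = [0.5, 1.0, 2.0, 4.0]
vols = [stock_volatility(y) for y in states]
```

With a constant volatility of the state, the computed stock volatility falls as the state improves, which is the countercyclical pattern described in the text.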
We now introduce a framework to study these issues. We need to revisit the option pricing
literature on convexity of option prices and extend it to contexts with untraded risks. Chapter
10 contains additional explanations regarding these general properties of option prices that go
beyond those needed for the purpose of this chapter.
7.4.2 Asset prices as options
Consider a two-period market for a cash payoff $\tilde\pi$ to be paid in the second period. We assume interest rates are flat at zero, and $\tilde\pi = \psi(\tilde x)$ for some random variable $\tilde x$. Let $c \equiv E[\psi(\tilde x)]$ be the premium.
The focus of standard textbooks is how the premium $c$ relates to the volatility of $\tilde x$ (see Appendix 5 for further details). Appendix 5 considers a dynamic extension of this problem, and develops conditions matching those in the static case. In this extension, $\tilde x = x_T$, for some future date $T$, where $x_\tau$ is a random process, with $x_0 = x$, such that the price of the claim is now,
$$c(x) = E(\psi(x_T)\,|\,x) \tag{7.11}$$
Clearly, the two pricing problems, $E(\psi(x_T)\,|\,x)$ and $E(\psi(\tilde x))$, are not the same. They actually bear similarities if (i) $x$ is the price of a traded asset; and (ii) $x$ is a proportional process, that is, one for which the risk-neutral distribution of $x_T/x$ is independent of $x$. If these assumptions hold, the usual tools of the static case still apply to this dynamic case. In particular, $c$ increases after a mean-preserving spread in $x_T$ whenever $\psi$ is convex.6 We now examine cases in which these assumptions do not necessarily hold.
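The static result, that a convex payoff becomes more valuable after a mean-preserving spread, can be seen in a two-outcome example. This is only a sketch: the distributions, the strike, and the function names are hypothetical, and interest rates are zero as in the static problem above.

```python
def premium(payoff, outcomes, probabilities):
    """Expected payoff under a discrete distribution (zero interest rates)."""
    return sum(p * payoff(x) for x, p in zip(outcomes, probabilities))

def call_payoff(x, strike=1.0):
    """A convex payoff: max(x - strike, 0)."""
    return max(x - strike, 0.0)

probabilities = [0.5, 0.5]
narrow = [0.9, 1.1]   # mean 1.0
wide = [0.7, 1.3]     # same mean, a mean-preserving spread of `narrow`

mean_narrow = premium(lambda x: x, narrow, probabilities)
mean_wide = premium(lambda x: x, wide, probabilities)
premium_narrow = premium(call_payoff, narrow, probabilities)
premium_wide = premium(call_payoff, wide, probabilities)
```

The two distributions share the same mean, yet the premium is higher under the more dispersed one. The rest of the section asks when this logic survives in dynamic settings where the underlying risk is not traded.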
7.4.2.1 Volatility, options and convexity

If $x$ is the price of a traded asset, which does not pay dividends, the drift of $x$ under the risk-neutral probability is proportional to $x$. We shall clarify soon that in this case, the price $c$ inherits the convexity properties from the final payoff only, $\psi$.
But there are risks that are not necessarily traded. In these markets, interesting nonlinearities arise. For example, Theorem 7.1 reveals that in this context, convexity of $\psi$ is neither a necessary nor a sufficient condition for convexity of $c$. The drift of $x$ plays a crucial role.7
6 This prediction is consistent with the celebrated Black and Scholes (1973) formula, as we further explain in Chapter 10, a point made by Jagannathan (1984, p. 429-430). As further explained in Chapter 10, Bajeux-Besnainou and Rochet (1996, Section 5), Bergman, Grundy and Wiener (1996), El Karoui, Jeanblanc-Picqué and Shreve (1998) and Romano and Touzi (1997) generalize these results to more general diffusion models, including those with stochastic volatility.
7 Kijima (2002) produces a counterexample where convexity of option prices might break down even if payoffs are convex in the underlying, and traded, assets. This counterexample relies on an extension of the Black-Scholes model where due to the presence of


Let us consider the following problem. It is the benchmark for a number of pricing problems
dealt with in the remainder of this chapter.
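Canonical pricing problem. Let $x_\tau$ be the solution to:
$$dx_\tau = b(x_\tau)\,d\tau + a(x_\tau)\,dW_\tau \tag{7.12}$$
where $W$ is a multidimensional $\bar P$-Brownian motion (for some $\bar P$), and $b$, $a$ are some given functions. Let $\psi$ and $\rho$ be two twice continuously differentiable positive functions, and define
$$c(x, T) \equiv \bar E\left[\left. e^{-\int_0^T \rho(x_t)\,dt}\,\psi(x_T)\,\right|\,x_0 = x\right] \tag{7.13}$$
to be the price of an asset which promises to pay $\psi(x_T)$ at time $T$.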
A simple market encompassed by this problem is one in which $x$ is the price of a traded asset, and $\bar P = Q$, the risk-neutral probability, such that the drift in Eq. (7.12) is $b(x) = x\rho(x)$, by no-arbitrage. If $x$ is not a traded risk, $b(x) = b_0(x) - a(x)\lambda(x)$, where $b_0$ is the physical drift of $x$, and $\lambda$ is a risk-premium. Our canonical pricing problem now covers a number of interesting cases. For example, we may assume $\psi(x) = 1$, $\rho(x) = x$, and $x$ is a short-term rate, in which case $c$ is the price of a zero-coupon bond.
Note that $\bar P$ is not necessarily the risk-neutral probability. Consider the following important example, a scale-invariant endowment economy with one asset and dividend solution to
$$\begin{aligned} \frac{dD_\tau}{D_\tau} &= g(y_\tau)\,d\tau + \sigma_0\,dW_{1\tau} \\ dy_\tau &= m(y_\tau)\,d\tau + v_1(y_\tau)\,dW_{1\tau} + v_2(y_\tau)\,dW_{2\tau} \end{aligned} \tag{7.14}$$
where $W_1$ and $W_2$ are standard Brownian motions. This example generalizes that in Section 7.4.1.2, in that expected dividend growth, $g$, can be stochastic, driven as it is by the state variable $y$.
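Note also that in this model, the distribution of $D_\tau/D_t$ does not depend on $D_t$. That is, it is a proportional Samuelson-Merton process. Moreover, we assume that the short-term rate $r$ only depends on $y$. These assumptions imply that the price-dividend ratio $p$ is only a function of the current state $y$, similarly as in Section 7.4.1. However, due to stochastic expected dividend growth, the expression for $p$ is slightly more general than that in Eq. (7.10). It is:
$$p(y) = \int_0^\infty C(y, \tau)\,d\tau \tag{7.15}$$
where, denoting as usual the risk-adjusted discount rate with $R(y)$,
$$\begin{aligned} C(y, \tau) &\equiv E^Q\left[\left. e^{-\int_0^\tau r(y_u)\,du}\,\frac{D_\tau}{D_0}\,\right|\,y_0 = y\right] \\ &= E^Q\left[\left. e^{\int_0^\tau (g(y_u) - R(y_u))\,du - \frac12\sigma_0^2\tau + \sigma_0\hat W_{1\tau}}\,\right|\,y_0 = y\right] \\ &= \bar E\left[\left. e^{\int_0^\tau (g(y_u) - R(y_u))\,du}\,\right|\,y_0 = y\right] \end{aligned} \tag{7.16}$$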

dividends, the drift of the underlying asset is concave in the asset price. Among other things, Theorem 7.1 below unveils the origins
of this counterexample.



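and $E^Q$ is the expectation under the risk-neutral probability ($Q$, say), under which $D_\tau$ and $y_\tau$ satisfy:8
$$\begin{aligned} \frac{dD_\tau}{D_\tau} &= (g(y_\tau) - \sigma_0\lambda_1(y_\tau))\,d\tau + \sigma_0\,d\hat W_{1\tau} \\ dy_\tau &= \Big(m(y_\tau) - \sum_{i=1}^{2} v_i(y_\tau)\lambda_i(y_\tau)\Big)d\tau + v_1(y_\tau)\,d\hat W_{1\tau} + v_2(y_\tau)\,d\hat W_{2\tau} \end{aligned}$$
where $\hat W_i$ are two Brownian motions under $Q$. The third line of Eq. (7.16) is obtained with a change of probability, and $\bar E$ is the expectation taken under a conveniently changed probability $\bar P$, defined by the Radon-Nikodym derivative,
$$\left.\frac{d\bar P}{dQ}\right|_{\mathcal F_\tau} = e^{-\frac12\sigma_0^2\tau + \sigma_0\hat W_{1\tau}} \tag{7.17}$$
where $\mathcal F_\tau$ denotes the information set as of time $\tau$ generated by $\hat W_1$. By Girsanov's theorem, we have that under $\bar P$,
$$dy_\tau = \bar m(y_\tau)\,d\tau + v_1(y_\tau)\,d\bar W_{1\tau} + v_2(y_\tau)\,d\bar W_{2\tau} \tag{7.18}$$
where $\bar m(y) \equiv m(y) - \sum_{i=1}^{2} v_i(y)\lambda_i(y) + \sigma_0 v_1(y)$, and $\bar W_i$ are Brownian motions under $\bar P$.9 Note the trick we have used to arrive at a relatively neat formula, by getting rid of the term $-\frac12\sigma_0^2\tau + \sigma_0\hat W_{1\tau}$, arising because consumption and the state variable are correlated. The density of $y$ under $\bar P$ is right-shifted with respect to the same density under $Q$, due to the positive covariance between consumption growth and $y$.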
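The change of probability can be sanity-checked numerically. The sketch below assumes that the density process used in the change of measure is the exponential martingale $e^{-\frac12\sigma_0^2\tau + \sigma_0\hat W_{1\tau}}$, with $\hat W_{1\tau}$ Gaussian with variance $\tau$ under the risk-neutral probability. It verifies two facts with a plain quadrature rule: the density has unit expectation, and under the new probability the Brownian motion acquires a mean of $\sigma_0\tau$, which is the source of the extra drift term proportional to $\sigma_0$ in the state dynamics. Parameter values and function names are hypothetical.

```python
import math

def gaussian_expectation(f, n=4001, width=10.0):
    """E[f(Z)] for Z ~ N(0, 1), via the trapezoid rule on [-width, width]."""
    step = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -width + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)

SIGMA_0 = 0.15   # dividend volatility (hypothetical value)
TAU = 2.0        # horizon in years (hypothetical value)

def density(z):
    """Radon-Nikodym density evaluated at W = sqrt(TAU) * z."""
    brownian = math.sqrt(TAU) * z
    return math.exp(-0.5 * SIGMA_0**2 * TAU + SIGMA_0 * brownian)

# The density integrates to one, so the new probability is well defined.
total_mass = gaussian_expectation(density)

# Mean of the original Brownian motion under the new probability.
shifted_mean = gaussian_expectation(lambda z: density(z) * math.sqrt(TAU) * z)
```

The second quantity equals $\sigma_0\tau$ up to quadrature error, consistent with Girsanov's theorem: subtracting $\sigma_0\tau$ from the original Brownian motion yields a Brownian motion under the new probability.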
Our canonical pricing problem allows us to analyze properties of prices relating to long-lived assets, through those relating to $C$ in Eq. (7.16), with $y$ as in Eq. (7.18), once we set
$$\psi(y) \equiv 1; \qquad \rho(y) \equiv R(y) - g(y); \qquad b(y) = \bar m(y) \tag{7.19}$$

The next theorem characterizes slope and convexity properties of the price in the canonical
pricing problem.
Theorem 7.1. We have:
(i) If $\psi' > 0$, then $c$ is increasing whenever $\rho' \le 0$. Furthermore, if $\psi' = 0$, then $c$ is decreasing (resp. increasing) whenever $\rho' > 0$ (resp. $< 0$).
(ii) If $\psi'' \le 0$ (resp. $\psi'' \ge 0$) and $c$ is increasing, then $c$ is concave (resp. convex) whenever $b'' < 2\rho'$ (resp. $b'' > 2\rho'$) and $\rho'' \ge 0$ (resp. $\rho'' \le 0$). Finally, if $b'' = 2\rho'$, $c$ is concave (resp. convex) whenever $\psi'' < 0$ (resp. $> 0$) and $\rho'' \ge 0$ (resp. $\le 0$).
Theorem 7.1-(i) generalizes previous results regarding monotonicity of option prices, obtained by Bergman, Grundy and Wiener (1996). By the so-called no-crossing property of a diffusion, $x_\tau$ is not decreasing in its initial condition $x$. Therefore, $c$ inherits the same monotonicity features of $\psi$ if discounting does not operate adversely. This simple observation allows us to address monotonicity properties of long-lived asset prices, as we shall see in Section 7.5.
Theorem 7.1-(ii) generalizes a number of existing results on option price convexity. First, assume that $\rho$ is constant and that $x$ is the price of a traded asset, such that $\rho' = b'' = 0$.
8 See, for example, Huang and Pagès (1992, Theorem 3 p. 53) and Wang (1993, Lemma 1, p. 202), for regularity conditions underlying the Feynman-Kac theorem in infinite horizon settings; and Huang and Pagès (1992, Proposition 1, p. 41) for regularity conditions ensuring that Girsanov's theorem holds in infinite horizon settings.
9 Mele (2005, 2007) contains the first derivation of this representation of the price-dividend ratio.


The last part of Theorem 7.1-(ii) then says that the convexity of $\psi$ propagates to the convexity of $c$. This result reproduces the findings in the literature surveyed earlier. Theorem 7.1-(ii) characterizes convexity in a more general context. Suppose, for example, that $\psi'' = \rho' = 0$, and that $x$ is not a traded risk. Then, Theorem 7.1-(ii) suggests that $c$ inherits the convexity of the drift of $x$. As a final example, Theorem 7.1-(ii) extends a result in Mele (2003) relating to bond pricing: let $\psi(x) = 1$ and $\rho(x) = x$. Accordingly, $c$ is the price of a zero-coupon bond in a short-term rate model (see Chapter 12 for details). By Theorem 7.1-(ii), $c$ is convex whenever $b''(x) < 2$ (see Appendix 6 for further details and intuition on this bounding number).
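The bond pricing case can be illustrated with a model in which the short-term rate has a linear drift, so that the second derivative of the drift is zero and the bound of 2 is trivially met. The sketch below uses the Vasicek model, which is an assumption made here for illustration and is not discussed in this section; the parameter values are hypothetical. The closed-form zero-coupon bond price is decreasing and convex in the current short-term rate, consistent with the prediction above.

```python
import math

def vasicek_bond_price(rate, maturity, kappa=0.3, theta=0.04, sigma=0.01):
    """Zero-coupon bond price when dr = kappa * (theta - r) dt + sigma dW
    under the pricing probability (standard Vasicek closed form)."""
    b = (1.0 - math.exp(-kappa * maturity)) / kappa
    log_a = ((theta - sigma**2 / (2.0 * kappa**2)) * (b - maturity)
             - sigma**2 * b**2 / (4.0 * kappa))
    return math.exp(log_a - b * rate)

def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

def five_year_price(rate):
    return vasicek_bond_price(rate, 5.0)

rates = [0.00, 0.02, 0.05, 0.10]
prices = [five_year_price(r) for r in rates]
convexities = [second_difference(five_year_price, r) for r in rates]
```

Prices fall as the short-term rate rises, and every estimated second derivative is positive. A drift that is sufficiently convex in the short-term rate could overturn this convexity, which is the point of the bounding number.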
Option prices rely on both discounting and nonlinearities affecting the drift of the state
variables, when the underlying fundamentals are not traded (unlike stock prices). In Section
7.5, we rely on the predictions of Theorem 7.1 and analyze the price of long-lived assets. In
the next section, we illustrate the gist of the proofs underlying Theorem 7.1, by developing one
example.
7.4.2.2 A digression on a macro-asset option

We discuss an example regarding a highly conceptual and abstract asset, a "macro-asset" option, i.e., one asset that delivers payoffs linked to aggregate outcomes. We illustrate a few facts Theorem 7.1 predicts. Let $c_\tau$ be the aggregate consumption process. The owner of the option has the right to receive a twice-differentiable payoff $\psi(c_T)$ at some date $T$, where $\psi$ is increasing and convex. We assume that $c_\tau$ is solution to:
$$\frac{dc_\tau}{c_\tau} = g_\tau\,d\tau$$
and that consumption growth satisfies
$$dg_\tau = m(g_\tau)\,d\tau + \sigma(g_\tau)\,dW_\tau$$
where $m$ and $\sigma$ are some well-behaved functions, and $W$ is a standard Brownian motion. Let $O(c, g, t)$ be the option price when the state of the economy at $t$ is $(c_t, g_t) = (c, g)$. We assume that $O$ is as differentiable as needed below, that interest rates are constant, equal to $r$, and that all agents are risk-neutral.
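By the usual connection between partial differential equations and conditional expectations (see Chapter 4), the price $O(c, g, t)$ is solution to:
$$0 = O_t + gc\,O_c + m(g)\,O_g + \tfrac12\sigma^2(g)\,O_{gg} - r\,O, \quad \text{for all } (c, g) \text{ and } t \in [0, T) \tag{7.20}$$
with boundary condition $O(c, g, T) = \psi(c)$ for all $(c, g)$ (subscripts denote partial differentiation).
First, we study monotonicity of $O(c, g, t)$ with respect to both $c$ and $g$, following two approaches. The first approach relies on the so-called no-crossing property of a diffusion process. Note that:
$$O(c, g, t) = e^{-r(T-t)}E\left[\psi(c_T)\,|\,c_t = c, g_t = g\right] = e^{-r(T-t)}E\left[\left.\psi\big(c\,e^{\int_t^T g_u\,du}\big)\,\right|\,g_t = g\right] \tag{7.21}$$
Since $\psi$ is increasing, $O$ is increasing in $c$. Furthermore, the no-crossing property of $g$ implies that $g_\tau$ is increasing in the initial condition $g$. Therefore, $O(c, g, t)$ is also increasing in $g$.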
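To analyze convexity of $O$ with respect to $g$, differentiate Eq. (7.20) and its boundary condition with respect to $c$, and find that $O_c$ is solution to:
$$0 = \Big(\frac{\partial}{\partial t} + gc\frac{\partial}{\partial c} + m(g)\frac{\partial}{\partial g} + \tfrac12\sigma^2(g)\frac{\partial^2}{\partial g^2}\Big)O_c - (r - g)\,O_c, \quad \text{for all } (c, g) \text{ and } t \in [0, T)$$
with boundary condition $O_c(c, g, T) = \psi'(c)$, all $(c, g)$. The Feynman-Kac representation of the solution to the previous equation is:
$$O_c(c, g, t) = e^{-r(T-t)}E\left[\left. e^{\int_t^T g_u\,du}\,\psi'(c_T)\,\right|\,c_t = c, g_t = g\right] \tag{7.22}$$
which is positive, by the assumption that $\psi' > 0$. This confirms the monotonicity properties established previously through no-crossing arguments. So when is $O(c, g, t)$ convex in $g$? By differentiating Eq. (7.20) with respect to $g$, one obtains that $O_g$ is solution to:
$$0 = \Big(\frac{\partial}{\partial t} + gc\frac{\partial}{\partial c} + \big(m + \tfrac12(\sigma^2)'\big)\frac{\partial}{\partial g} + \tfrac12\sigma^2\frac{\partial^2}{\partial g^2}\Big)O_g - (r - m')\,O_g + c\,O_c, \quad \text{for all } (c, g) \text{ and } t \in [0, T) \tag{7.23}$$
with boundary condition $O_g(c, g, T) = 0$, all $(c, g)$. By the Feynman-Kac representation theorem, $O_g$ is the expectation of the integral of the source term $c\,O_c$ over the remaining life of the option, discounted at the rate $r - m'(g)$, and taken under dynamics of $g$ with drift $m + \frac12(\sigma^2)'$. By Eq. (7.22), $O_c > 0$. Hence, $O$ is increasing in $g$. We can now apply Theorem 7.1 and conclude that $O$ is strictly convex in $g$ whenever the drift function of $g$ is weakly convex. Indeed, by differentiating Eq. (7.23) with respect to $g$, we obtain that $O_{gg}$ is solution to: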
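$$0 = \Big(\frac{\partial}{\partial t} + gc\frac{\partial}{\partial c} + \big(m + (\sigma^2)'\big)\frac{\partial}{\partial g} + \tfrac12\sigma^2\frac{\partial^2}{\partial g^2}\Big)O_{gg} - \big(r - 2m' - \tfrac12(\sigma^2)''\big)O_{gg} + h, \quad \text{for all } (c, g) \text{ and } t \in [0, T) \tag{7.24}$$
where
$$h(c, g, t) = 2c\,O_{cg}(c, g, t) + m''(g)\,O_g(c, g, t) \tag{7.25}$$
and boundary condition $O_{gg}(c, g, T) = 0$ for all $(c, g)$.
By Eq. (7.22), we have that $c\,O_c(c, g, t) = e^{-r(T-t)}E\left[c_T\,\psi'(c_T)\,|\,c_t = c, g_t = g\right]$. Thus $O_c$ is increasing in $g$ by the assumption that $\psi$ is increasing and convex, and the no-crossing property of a diffusion, by which $g_\tau$ is increasing in the initial condition $g$. Therefore, $O_{cg} > 0$. Furthermore, $O_g > 0$. Therefore, $h(c, g, t) > 0$ whenever $m''(g) \ge 0$. By the Feynman-Kac theorem, then, $O$ is convex in $g$ whenever $m''(g) \ge 0$.
The previous conclusions can hold even with a concave payoff function, say $\psi(c) = \ln c$. In this case, Eq. (7.22) implies that $O_c(c, g, t) = e^{-r(T-t)}c^{-1}$, such that the function $h$ in Eq. (7.25) collapses to $h(c, g, t) = m''(g)\,O_g(c, g, t)$. That is, the price $O(c, g, t)$ is convex (resp. concave) in $g$ whenever $m$ is convex (concave) in $g$. Note, then, that the price is linear in $g$ whenever $m'' = 0$, as it can easily be verified by replacing $\psi(c) = \ln c$ into Eq. (7.21), leaving:
$$O(c, g, t) = e^{-r(T-t)}\Big[\ln c + \int_t^T E(g_u\,|\,g_t = g)\,du\Big]$$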
These examples convey a gist of the arguments underlying the proof of Theorem 7.1. They
also illustrate how we shall proceed to develop properties regarding long-lived asset prices in
the context of the canonical pricing problem of Section 7.4.2.1.
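The log-payoff case can be worked out in closed form under one extra assumption. The sketch below supposes that consumption grows at the locally riskless rate $g_\tau$, that interest rates are constant at $r$, that agents are risk-neutral, and that the drift of consumption growth is linear, $m(g) = \kappa(\theta - g)$. The mean-reverting specification and the symbols $\kappa$, $\theta$, $O$ are assumptions introduced for this illustration.

```latex
% Assumptions: dc/c = g dt, dg = \kappa(\theta - g) dt + \sigma(g) dW,
% payoff \ln c_T at T, constant rate r, risk-neutral agents.
\begin{align*}
E(g_u \mid g_t = g) &= \theta + (g - \theta)\,e^{-\kappa(u - t)}, \\
O(c, g, t) &= e^{-r(T-t)}\Big[\ln c + \int_t^T E(g_u \mid g_t = g)\,du\Big] \\
           &= e^{-r(T-t)}\Big[\ln c + \theta\,(T - t)
              + (g - \theta)\,\frac{1 - e^{-\kappa(T-t)}}{\kappa}\Big], \\
\frac{\partial^2 O}{\partial g^2} &= 0 .
\end{align*}
```

With a linear drift the price is exactly linear in current consumption growth, whatever the diffusion coefficient of growth. Curvature of the price in the state can only come from curvature of the drift in this example, which is the mechanism the theorem isolates.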

7.5 Time-varying discount rates or uncertain growth?


7.5.1 Tackling the puzzles
Asset prices are what they are due to the characteristics of the pricing kernel and the statistical properties of dividend growth. We focus on a simplifying assumption, namely that explanations of asset prices can elaborate on either one of the two previous channels. Accordingly, the solid arrows of Figure 7.11 illustrate that once we make assumptions regarding dividend growth (e.g. i.i.d. distributions), one can then look for pricing kernels that are consistent with the asset prices. Conversely, once we make assumptions regarding the pricing kernel, one can ask which statistical properties of dividend growth are needed to reconcile explanations of asset prices with the security market data (dashed arrows).
FIGURE 7.11. Schematic linking the dynamic properties of asset prices (1. expected returns; 2. returns volatility) to the pricing kernel (1. interest rates; 2. risk-premium) and to the dividends distribution.
This section deals with this search process while relying on methodology introduced in the previous section. We consider two economies. In the first economy, changes in the economic fundamentals determine cyclical variations in the discount rates (in Section 7.6.1). In the second, the economic fundamentals lead to time-varying expected dividend growth (in Section 7.6.2). We need to provide preliminary results about pricing kernels, which we need to use while illustrating these two broad classes of economies. Finally, Section 7.5.4 is an introduction to a class of hopefully analytically convenient processes we can use to model long-lived asset prices.
7.5.2 Markov pricing kernels, asset returns and volatility
7.5.2.1 Pricing kernels

Motivated by the previous discussion, this section considers economies in which asset prices have high volatility due to volatile pricing kernels and stochastic expected dividend growth. We provide foundations relying on a representative agent economy. In this setting, interest rates and risk-premiums change randomly because the agent's utility depends on both consumption and other variables. While complete markets naturally fit in the analysis of this section (see Chapter 2), incomplete markets can sometimes be analyzed relying on this framework, as explained in Chapter 8.

We extend some foundational issues in Chapter 4. Consider the stochastic discount factor in
Chapter 4,
( )
, where the pricing kernel process
satises
(

)=

=1

(7.26)

for some functions (


) and (
) and some di usion process . We assume that is
bounded and positive, and that
is twice continuously di erentiable. We also assume that
there is another process, , such that
= Y ( ) for some monotonic, continuous and twice
continuously di erentiable function Y, and that (
) are solutions to a slight generalization
of Eqs. (7.14):

= (
) + 0( )
1
(7.27)
= (
) + 1(
)
)
1 + 2(
2
The pricing kernel is solution to
=

(7.28)

where is the short-term rate and [ 1 2 ] is the vector of unit risk-premiums.


By applying Itô's lemma to $\xi$ in Eq. (7.26), and identifying the terms in Eq. (7.28), one finds that the short-term rate and risk-premiums are both functions of the current state, in that:
(

1(

) =

) =

(
)
(
)
ln (
)

)
0( )
2

ln (

1(

ln (

where
is the innitesimal generator operator (see Chapter 4) for (
) and
(
) denote
the di usion coe cients of .
For example, consider an innite horizon economy in which aggregate dividends are solution
to Eq. (7.27), with 2 0 and
1 , and a representative agent solves the following program:
Z

(
)
s.t
0
max
0 =
0
(

where
0, the instantaneous utility
tiable, and is solution to
=

is continuous and three times continuously di eren-

In equilibrium, optimal consumption equals aggregate dividends. In terms of


we have that (
) = and (
) = 11(( 0 0 )) , such that 2 = 0, and
(

(
1(

(
1(

12

)=

)
)

11

)=

(
1(

11

)
)

(
)
)

1
2

)
)

( )

1
2

(
1(

12

2
0

(
)
)
1(
)
122 (
)
)
1(
111

( )

(
)
)

in Eq. (7.26),
1 , and

(

( )

(
1(

112

)
)

(7.29)
(7.30)



7.5.2.2 Expected returns and volatility under a scale-invariant assumption

We study the implications of these pricing kernels in terms of the asset expected returns and volatility. We base the derivations on the continuous time formulation of the APT model in Chapter 4 (see Section 4.2.5). These derivations rely on the assumption of a scale-invariant economy, and may prove useful as guidance for empirical work. It can be shown that a scale-invariant economy obtains once we assume that in Eq. (7.27), (i) $g(D, y) = g(y)$, (ii) $\sigma_0(D) = \sigma_0$ (for some constant $\sigma_0$), and (iii) the drift and diffusion coefficients of $y$ are independent of $D$. We still assume that the state variable in the pricing kernel of Eq. (7.26) and the diffusion state are linked through some monotonic function $Y(\cdot)$; accordingly, we express the price-dividend ratio in terms of the diffusion state, denoted $y$ below.
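Under these assumptions, we have that the asset expected returns are
$$\frac{E(dS_\tau/S_\tau)}{d\tau} + \frac{D_\tau}{S_\tau} = \underbrace{r + \text{cash-flow beta} \times \text{cash-flow lambda}}_{R\text{: risk-adjusted discount rates}} + \underbrace{\text{price beta} \times \text{price lambda}}_{W\text{: price premium wedge}} \tag{7.31}$$
where
$$R \equiv r(y) + \sigma_0\lambda_1(y), \qquad W \equiv \frac{p'(y)}{p(y)}\big(v_1(y)\lambda_1(y) + v_2(y)\lambda_2(y)\big)$$
Finally, returns volatility is
$$\mathrm{Vol}\Big(\frac{dS_\tau}{S_\tau}\Big) \equiv \big[\mathrm{Vol}_1(y)\ \ \mathrm{Vol}_2(y)\big] = \Big[\sigma_0 + \frac{p'(y)}{p(y)}v_1(y)\ \ \ \frac{p'(y)}{p(y)}v_2(y)\Big]$$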

We rely on these predictions while analyzing a number of models in the following sections as
well as in Chapter 8.
7.5.3 External habit formation
Time-varying risk-premiums are a plausible engine for asset price fluctuations. Intuitively, the very properties of asset prices must necessarily inherit those of the risk-premiums, as illustrated by Figure 7.11. The Campbell and Cochrane (1999) model of external habit formation is a well-known attempt at explaining some of the empirical features outlined in Section 7.2 by incorporating time-varying risk-premiums into an economy with i.i.d. dividend growth.
Consider an infinite horizon economy in which a representative agent has undiscounted instantaneous utility:
$$u(c, x) = \frac{(c - x)^{1-\eta} - 1}{1 - \eta} \tag{7.32}$$
where $c$ denotes consumption and $x$ is a time-varying habit, or exogenous "subsistence level". In this model, the habit process is defined in a residual way, through the surplus consumption ratio, as we now explain.
The total endowment process, $D_\tau$, satisfies,10
$$\frac{dD_\tau}{D_\tau} = g_0\,d\tau + \sigma_0\,dW_\tau \tag{7.33}$$

10 Campbell and Cochrane (1999) consider a discrete-time model in which the log-consumption growth is Gaussian. Eq. (7.33) is the diffusion limit of their model.


A measure of distance between consumption and the level of habit is the surplus consumption ratio,
$$s \equiv \frac{c - x}{c}$$
Note that the curvature of the instantaneous utility is inversely related to $s$,
$$-\frac{u_{cc}(c, x)\,c}{u_c(c, x)} = \eta\,\frac{c}{c - x} = \eta\,\frac{D}{D - x} = \frac{\eta}{s} \tag{7.34}$$
where subscripts denote partial derivatives, the second equality is the equilibrium condition, $c = D$, and the third is the definition of the equilibrium surplus consumption ratio. By assumption, $\ln s$ is solution to:
$$d\ln s_\tau = (1 - \phi)(\bar s - \ln s_\tau)\,d\tau + \sigma_0\,l(s_\tau)\,dW_\tau \tag{7.35}$$
where $l$ is a positive function, defined below. That is, the surplus ratio is driven by output innovations (i.e., by $dW$): the higher the output growth innovations (which lead to higher consumption in equilibrium), the higher the surplus ratio.11
This model of habit formation differs from previous formulations such as that of Ryder and Heal (1973), or Sundaresan and Constantinides (1990), due to three properties: (i) it is an external theory, in that the habit is aggregate, not consumption chosen by the individual, similarly as with Abel's (1990) "catching up with the Joneses" formulation, or Duesenberry's (1949) relative income model; (ii) habit responds to consumption smoothly, not to each period's past consumption, as in previous models of habit formation such as that of Ferson and Constantinides (1990); (iii) it guarantees marginal utility is always positive. The second of the previous properties produces slow mean reversion in the price-dividend ratio and long-horizon predictability, and large predictable movements in stock volatility, three empirical features reviewed in Section 7.2.
Note that markets are complete as there is only one source of risk (the dividend in Eq. (7.33)). Therefore, we can determine the Sharpe ratio in this economy relying on results in Section 7.5.2 (see Eq. (7.30)):

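$$\lambda(D, x) = \frac{\eta}{s}\Big[\sigma_0 - \frac{1}{D}\sigma_x(D, x)\Big] \tag{7.36}$$
where $\sigma_x$ is the diffusion coefficient of equilibrium habit, $x = D(1 - s)$, and $s$ is solution to Eq. (7.35). By Itô's lemma, $\sigma_x(D, x) = (1 - s - s\,l(s))\,\sigma_0 D$, which replaced into Eq. (7.36), leaves:
$$\lambda(s) = \eta\,\sigma_0\,(1 + l(s)) \tag{7.37}$$
The real interest rate is, by Eq. (7.29),
$$r(s) = \delta + \eta\Big(g_0 - \frac12\sigma_0^2\Big) + \eta(1 - \phi)(\bar s - \ln s) - \frac12\eta^2\sigma_0^2(1 + l(s))^2 \tag{7.38}$$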

The third term reflects usual intertemporal substitution effects. Due to mean reversion, bad times (when $s$ is low) are those when agents expect the very same $s$ will improve. Therefore, in bad times, agents expect their marginal utility to decrease in the future and to compensate for this fall, they will try to decrease future consumption, compared to today, by trying to save less (or trying to borrow more), thereby pushing interest rates up. The last term is a precautionary savings term.
11 One could add an additional Brownian motion in Eq. (7.35) to lower the conditional correlation between output growth and surplus ratio.
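The Sharpe ratio of this economy can also be recovered directly from marginal utility, which gives a quick cross-check. The sketch below assumes power utility over consumption in excess of habit with curvature $\eta$, a surplus consumption ratio $s$ whose logarithm has diffusion coefficient $\sigma_0 l(s)$, and an endowment $D$ with volatility $\sigma_0$, all loading on the same Brownian motion; the symbol $\lambda$ for the Sharpe ratio is a naming choice made here.

```latex
% In equilibrium c = D, so c - x = sD and marginal utility is (sD)^{-\eta}.
\begin{align*}
u_c &= (c - x)^{-\eta} = (sD)^{-\eta}
  \quad\Longrightarrow\quad
  \ln u_c = -\eta\,(\ln s + \ln D), \\
\text{diffusion of } \ln u_c &= -\eta\,\big(\sigma_0\,l(s) + \sigma_0\big), \\
\lambda(s) &= \eta\,\sigma_0\,\big(1 + l(s)\big).
\end{align*}
```

The unit risk-premium is minus the diffusion coefficient of the log pricing kernel. The first component, $\eta\sigma_0$, is what obtains without habit, while the second, $\eta\sigma_0 l(s)$, is the compensation for fluctuations in the surplus consumption ratio.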
Campbell and Cochrane (1999) choose the function $l$ so as to satisfy three conditions: (i) the short-term rate is constant; and habit is predetermined both (ii) at the steady state, and (iii) near the steady state. A constant $r$ is consistent with the empirical evidence surveyed in Section 7.2, that real interest rates are really not volatile, compared to stock returns. Making habit predetermined at and near the steady state formalizes the idea that it takes time for consumption shocks to affect habit, at least at the steady state. The Appendix shows that under these conditions, the function $l$ is:
$$l(s) = \bar S^{-1}\sqrt{1 + 2(\bar s - \ln s)} - 1 \tag{7.39}$$
where $\bar S = \sigma_0\sqrt{\eta/(1 - \phi)} = e^{\bar s}$. In turn, this function implies that the short-term rate in Eq. (7.38) is: $r = \delta + \eta\big(g_0 - \frac12\sigma_0^2\big) - \frac12\eta(1 - \phi)$.12
The next picture depicts the function $l$ in Eq. (7.39), obtained using the parameter values in Campbell and Cochrane, $\eta = 2$, $\sigma_0 = 0.0150$, $\phi = 0.870$. It is decreasing in $s$, and convex in $s$, over the empirically relevant range of variation of $s$.

[Figure: the function $l(s)$ in Eq. (7.39), plotted against the surplus consumption ratio, $s$ (horizontal axis from 0.00 to 0.09; vertical axis from 0 to 50).]

These properties are inherited by the Sharpe ratio in Eq. (7.37). Precisely, the Sharpe ratio is countercyclical and moves asymmetrically over the business cycle because habit is predetermined near the steady state, and the short-term rate constant (or, at least, affine in $\ln s$, as in the Appendix).
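These properties can be reproduced with a few lines of code. The sketch below takes the sensitivity function to be the square-root specification of Campbell and Cochrane, the Sharpe ratio to be curvature times consumption volatility times one plus the sensitivity function, and the short-term rate to be the sum of a discount rate, a growth term, an intertemporal substitution term and a precautionary savings term. The values of curvature, consumption volatility and persistence are those quoted in the text; the subjective discount rate and expected growth are hypothetical, and all names are illustrative.

```python
import math

ETA = 2.0          # utility curvature (value quoted in the text)
SIGMA_0 = 0.0150   # consumption volatility (value quoted in the text)
PHI = 0.870        # persistence of the log surplus ratio (value quoted in the text)
DELTA = 0.12       # subjective discount rate (hypothetical)
G_0 = 0.02         # expected consumption growth (hypothetical)

S_BAR = SIGMA_0 * math.sqrt(ETA / (1.0 - PHI))   # steady-state surplus ratio
LOG_S_BAR = math.log(S_BAR)

def sensitivity(s):
    """Sensitivity function l(s) of the log surplus ratio to output shocks."""
    return math.sqrt(1.0 + 2.0 * (LOG_S_BAR - math.log(s))) / S_BAR - 1.0

def sharpe_ratio(s):
    """Sharpe ratio: curvature times volatility times (1 + l(s))."""
    return ETA * SIGMA_0 * (1.0 + sensitivity(s))

def short_rate(s):
    """Short-term rate with substitution and precautionary savings terms."""
    substitution = ETA * (1.0 - PHI) * (LOG_S_BAR - math.log(s))
    precaution = 0.5 * ETA**2 * SIGMA_0**2 * (1.0 + sensitivity(s))**2
    return DELTA + ETA * (G_0 - 0.5 * SIGMA_0**2) + substitution - precaution

# Evenly spaced grid of surplus ratios below the steady state (about 0.059).
grid = [0.01 + 0.005 * i for i in range(9)]
l_values = [sensitivity(s) for s in grid]
sharpe_values = [sharpe_ratio(s) for s in grid]
rates = [short_rate(s) for s in grid]
```

On this grid the sensitivity function and the Sharpe ratio are decreasing and convex in the surplus ratio, while the substitution and precautionary terms offset each other exactly, leaving a constant short-term rate. A direct calculation of the second derivative shows that the convexity of this specification holds for surplus ratios below the steady-state value, which is why the grid stops there.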
The model makes a number of important predictions. Consider, first, the instantaneous utility in Eq. (7.32). By Eq. (7.34), relative risk aversion is equal to $\eta/s$. That is, risk aversion is countercyclical. Formally, the stochastic discount factor is $e^{-\delta\tau}\big(\frac{s_\tau D_\tau}{s_0 D_0}\big)^{-\eta}$, $\tau \ge 0$. It is countercyclical because both $s$ and $D$ are procyclical; moreover, it is more volatile than the standard stochastic discount factor, $e^{-\delta\tau}\big(\frac{D_\tau}{D_0}\big)^{-\eta}$.
12 The Appendix considers a slightly more general model, in which the short-term rate is affine in $\ln s$.



In terms of the expected returns, Eq. (7.31) reduces to
$$\mathcal E(s) = R(s) + W(s)$$
where the risk-adjusted discount rates, $R(s)$, and the wedge over them, $W(s)$, are given by:
$$R(s) = r(s) + \sigma_0\lambda(s), \qquad W(s) = \frac{p'(s)}{p(s)}\,v(s)\,\lambda(s) \tag{7.40}$$
and $v(\cdot)$ denotes the diffusion coefficient of $s$ in Eq. (7.35), $v(s) = \sigma_0\,s\,l(s)$. The mechanism is sensible. Intuitively, during economic downturns, the surplus consumption ratio decreases and agents become more risk-averse. As a result, prices decrease and expected returns increase; moreover, the model leads to realistic risk premiums. Note that a sufficient condition for these effects to occur is that the price-dividend ratio is concave, which also ensures a countercyclical wedge, $W'(s) < 0$. However, the economy is one with high risk-aversion, as the calibrated model produces an average value of $\eta/s$ of approximately 40.
By Eq. (7.35), the log of $s$ is a mean-reverting process. By taking logs, we are sure that $s$ remains positive. Moreover, $\ln s$ is also conditionally heteroskedastic since its instantaneous volatility is $\sigma_0 l$. Because $l$ is decreasing in $s$, and $s$ is clearly procyclical, the volatility of $\ln s$ is countercyclical. This feature is responsible for many interesting properties of the model, such as countercyclical returns volatility.
Finally, the Sharpe ratio in Eq. (7.37) is made up of two components. The first is $\eta\sigma_0$, which coincides with the Sharpe ratio predicted by the standard Gordon's (1962) model. The second is $\eta\sigma_0 l(s)$, and arises as a compensation related to the stochastic fluctuations of the habit, $x = D(1 - s)$. Therefore, $\lambda$ is countercyclical due to the functional form of $l$. Combined with a high $\phi$, this assumption leads to slowly varying, countercyclical expected returns. Finally, the model suggests that the price-dividend ratio is concave in $s$.13
We now explain the link between convexity of $l$ (and hence of the risk-adjusted discount rates $R$ when the short-term rate is constant) and concavity of the price-dividend ratio in this model. We rely on the canonical pricing problem of Section 7.4, and appeal to Theorem 7.1. What is the price-dividend ratio in this economy? Note that the short-term rate is constant in this model, as discussed. Yet for sake of generality, assume it is state-dependent, although only a function of $s$ (as, e.g., in the Appendix). The price-dividend ratio is then as in Eqs. (7.15)-(7.16)-(7.18), with constant growth, i.e., $g(s) = g_0$:
$$p(s) = \int_0^\infty \bar E\left[\left. e^{\int_0^\tau (g_0 - R(s_u))\,du}\,\right|\,s_0 = s\right]d\tau \tag{7.41}$$
where $R(s)$ is the risk-adjusted discount rate in (7.40), and
$$ds_\tau = \bar m(s_\tau)\,d\tau + v(s_\tau)\,d\bar W_\tau$$
where
$$\bar m(s) = m(s) + (\sigma_0 - \lambda(s))\,v(s), \qquad v(s) = \sigma_0\,s\,l(s), \qquad m(s) = s(1 - \phi)(\bar s - \ln s) + \tfrac12\sigma_0^2\,s\,l(s)^2$$
and $\bar W_\tau = \hat W_\tau - \sigma_0\tau$ is a Brownian motion under the probability $\bar P$ (see Eq. (7.17)) and, finally, $\hat W$ is a Brownian motion under the risk-neutral probability $Q$.
The properties of the inner expectation in Eq. (7.41) can be analyzed relying on the canonical pricing problem of Section 7.4. Precisely, Theorem 7.1 leads to the following conclusions:
(i) Suppose that the risk-adjusted discount rates are countercyclical, viz $R'(s) < 0$. Then, the price-dividend ratio is procyclical, viz $p'(s) > 0$.
(ii) Suppose that the price-dividend ratio is procyclical. Then, the price-dividend ratio is also a concave function of $s$ as soon as the risk-adjusted discount rates are convex in $s$, viz $R''(s) > 0$, and $\bar m''(s) \le 2R'(s)$.
13 The solution of the model is not known in closed-form, and this property is known through a numerical solution of the model. The Appendix describes a simple method to solve the discrete-time version of this model (and related models) numerically.
The previous statements impose joint restrictions on the primitives such that the price-dividend ratio is consistent with properties given in advance. The economic interpretation of the convexity of $R$ is similar to that anticipated in Section 7.3 (see Figure 7.7). In terms of the Campbell-Cochrane economy, countercyclical volatility arises because $R$ is decreasing and convex in the surplus consumption ratio $s$, such that $p$ is concave in $s$.
The mechanism is the following. In bad times, consumption gets close to the subsistence level $x$, such that $s$ is very small. Risk aversion is high as a result, and the agent becomes more reluctant to invest in the stock market. That is, in bad times, risk-adjusted discount rates, $R$, increase sharply, thus making the price-dividend ratio quite responsive to changes in the economic conditions. Instead, in good times, and again due to convexity, $R$ changes relatively less, such that the price-dividend ratio changes relatively less in response to changes in the economic conditions.
In other words, the model is such that risk aversion becomes extremely large in bad times. Technically, $R$ is sufficiently convex in $s$, such that the price-dividend ratio is concave in $s$. Then, stock volatility increases on the downside, i.e., it is countercyclical, as illustrated by Figure 7.6. Note that these properties arise because the risk-adjusted rate, $R$, is sufficiently convex in the surplus consumption ratio. The Appendix does indeed provide an upper bound to convexity that triggers these properties.14
One difficulty with this model is that its predictions are driven by a single state variable,
the surplus consumption ratio. One implication is that the conditional correlation between
consumption growth and stock returns is one. In the data, this correlation is much lower.
Naturally, the model predicts that this correlation is unconditionally less than one, although still
too large compared with that in the data.
Brunnermeier and Nagel (2007) find that US investors do not change the composition of
their risky asset holdings in response to changes in wealth. The authors interpret this as evidence
against external habit formation. Naturally, time-varying risk-premiums do not exclusively arise
through external habit formation. Barberis, Huang and Santos (2001) develop a theory distinct
from habit formation, which leads to time-varying risk-premiums. The next chapter explains
that there are many instances of economies in which risk-premiums are time-varying as a result of
alternative mechanisms.
7.5.4 Large price swings as a learning induced phenomenon
We now develop models in which expected dividend growth is unobserved. This leads to a natural question: How do agents process available data while they formulate their guesses regarding
the growth of their economy? Inevitably, these guesses lead the agents to face situations with
stochastic expected growth, such that in addition to consumption, expected growth becomes
a new state variable with the potential to introduce interesting price dynamics. Note that although the focus of this section regards models with unobserved expected growth, models with
observed expected dividend growth have always had an interest on their own (see, e.g., the
early survey of Campbell, 2003), as explained in more detail in Chapter 8.

14 Alternatively, Mele (2007) shows that for any model in which the price-dividend ratio is driven by a diffusion state variable, there
is a threshold such that the price-dividend ratio is concave for all values of the state below it, whenever R tends to infinity as the state tends to zero. Note indeed that the
Campbell-Cochrane model fails to satisfy restrictions (i) and (ii) over the entire range of variation of the surplus consumption ratio, although it satisfies
this limiting condition. There exist additional models with external habit formation that lead to countercyclical volatility (see, for
example, Menzly, Santos and Veronesi, 2004; Mele, 2007).
7.5.4.1 The information channel

Time variation in stock volatility may also arise due to the agents' learning about the economic
fundamentals. In models along these lines, public signals about the fundamentals hit the market, and agents make inference about them, thereby creating new state variables driving price
fluctuations, which relate to the agents' own guesses about the (unknown) state of the economic
fundamentals. Timmermann (1993, 1996) provides models with exogenous discount rates and
learning about the fundamentals. The effects of learning increase stock volatility beyond the extent explained by a model with known fundamentals. Brennan and Xia (2001) generalize these
models to a stochastic general equilibrium. Veronesi (1999) provides a rational expectations
model with learning about the fundamentals, with nonlinear effects regarding the asset price.
This section provides details about the mechanisms through which learning affects asset prices
in general, and stock volatility in particular.
We shall assume that information about the fundamentals is incomplete, but symmetrically
distributed among agents. The assumption of symmetric information might appear strong. It
should not. The models in this section aim to capture the idea that markets function in a
context of incompressible uncertainty, where agents are all unaware of the crucial aggregate,
macroeconomic developments affecting asset prices. Chapter 9 reviews models with both differential and asymmetric information, which are more useful when thinking about the functioning
of markets for individual stocks. In these markets, it is plausible to assume that agents have
different information sets, and that they acquire information in dedicated information markets. By
contrast, it seems unrealistic to assume that one could acquire crucial information about ongoing business cycle developments and that agents are, then, asymmetrically informed about
it, such that uninformed agents can learn from the asset prices: the cost of acquiring such
information appears to be incommensurable.
Note that the assumption of symmetric information simplifies the analysis, as the agents do
not need to base their decisions upon the observation of the equilibrium price. In a context with
asymmetric information, agents can, instead, learn pieces of information other agents have, by
reading the equilibrium price, because agents with superior information impinge part of their
information on the asset price, through trading, as explained in Chapter 9. This complication
does not arise in the model of this section. Agents, now, need only to condition upon the
realization of signals, which convey information about the fundamentals. There is no need for
any agent to condition on prices, because prices merely convey the same information any such
agent already has.
7.5.4.2 An introductory model of learning

Assume that dividend $D$ is made up of two random components:

$$D = \theta + w, \tag{7.42}$$

where $\theta$ and $w$ are independently distributed, with $\Pr(\theta = A) = \pi = 1 - \Pr(\theta = -A)$, and
$\Pr(w = A) = \Pr(w = -A) = \tfrac{1}{2}$. Suppose that the state $\theta$ is unobserved.

How should we update our prior probability $\pi$ of the good state after observing $D$? A
simple application of Bayes' Theorem yields the posterior probabilities $\Pr(\theta = A \mid D = d_i)$ in Table
7.4. Considered as a random variable defined over the observable states $d_i$, the posterior probability $\Pr(\theta = A \mid D = d_i)$ has expectation $E[\Pr(\theta = A \mid D = d_i)] = \pi$ and variance
$var[\Pr(\theta = A \mid D = d_i)] = \tfrac{1}{2}\pi(1 - \pi)$. The variance is an inverse U-shaped function of $\pi$, and takes a value of zero exactly when the
prior on the state is degenerate (zero or one).

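Observable state $d_i$:                $d_1 = 2A$      $d_2 = 0$      $d_3 = -2A$
$\Pr(\theta = A \mid D = d_i)$:          $1$            $\pi$           $0$
$\Pr(D = d_i)$:                      $\tfrac{1}{2}\pi$         $\tfrac{1}{2}$       $\tfrac{1}{2}(1 - \pi)$

TABLE 7.4. Distribution of posterior probabilities $\Pr(\theta = A \mid D = d_i)$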


The probabilities in Table 7.4 follow by a simple application of Bayes' Theorem. That is, for
any partition $(E_i)$ of a given state space, and any event $F$,

$$\Pr(E_i \mid F) = \Pr(E_i)\,\frac{\Pr(F \mid E_i)}{\Pr(F)} = \Pr(E_i)\,\frac{\Pr(F \mid E_i)}{\sum_j \Pr(F \mid E_j)\Pr(E_j)}. \tag{7.43}$$

Applying Eq. (7.43) to our example, we have:

$$\Pr(\theta = A \mid D = d_1) = \Pr(\theta = A)\,\frac{\Pr(D = d_1 \mid \theta = A)}{\Pr(D = d_1)} = \pi\,\frac{\Pr(D = d_1 \mid \theta = A)}{\Pr(D = d_1)}.$$

But $\Pr(D = d_1 \mid \theta = A) = \Pr(w = d_1 - A) = \Pr(w = A) = \tfrac{1}{2}$. Moreover, we have that
$\Pr(D = d_1) = \tfrac{1}{2}\pi$. This leaves $\Pr(\theta = A \mid D = d_1) = 1$. It's trivial, but one proceeds similarly while determining the other probabilities.
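The posterior probabilities of this two-point example can be checked by brute-force enumeration. The following is a minimal sketch, assuming the dividend is the sum of an unobserved state and a noise term that each take the values A and -A, the state with prior probability `prior` on A and the noise with probability one half; the function names are illustrative and not part of the text.

```python
from fractions import Fraction

def posterior_table(prior, A=1):
    """Map each observable dividend d to (Pr(D = d), Pr(state = A | D = d))."""
    prior = Fraction(prior)
    half = Fraction(1, 2)
    p_state = {A: prior, -A: 1 - prior}   # prior on the unobserved state
    p_noise = {A: half, -A: half}         # noise is symmetric on {A, -A}
    joint = {}                            # joint law of (state, dividend)
    for s, ps in p_state.items():
        for w, pw in p_noise.items():
            key = (s, s + w)
            joint[key] = joint.get(key, Fraction(0)) + ps * pw
    table = {}
    for d in sorted({d for (_, d) in joint}, reverse=True):
        p_d = sum(p for (_, dd), p in joint.items() if dd == d)
        if p_d == 0:
            continue                      # unreachable state under a degenerate prior
        table[d] = (p_d, joint.get((A, d), Fraction(0)) / p_d)  # Bayes' rule
    return table

def posterior_moments(table):
    """Mean and variance of the posterior, taken over the observable states."""
    mean = sum(p_d * post for p_d, post in table.values())
    var = sum(p_d * (post - mean) ** 2 for p_d, post in table.values())
    return mean, var

if __name__ == "__main__":
    tab = posterior_table(Fraction(3, 10))
    for d, (p_d, post) in tab.items():
        print(d, p_d, post)
    print(posterior_moments(tab))
```

Exact rational arithmetic is used so that the mean and variance can be compared with the closed forms without rounding error.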
This simple example illustrates the main ideas underlying Bayesian learning. However, it leads
to a nonlinear filter, $\pi$, which differs from those we usually encounter in the literature (see, e.g.,
Chapters 8 and 9 in Liptser and Shiryaev, 2001) (LS, in the sequel), where the instantaneous
variance of the posterior probability changes, $d\pi$ say, is proportional to $\pi^2(1 - \pi)^2$, not to
$\pi(1 - \pi)$.
This distinction arises due to technical reasons, notably because $w$ is a discrete random
variable. Indeed, assume that $w$ has some arbitrary, but continuous density $\phi$, and zero mean
and unit variance. Let $\pi(d) \equiv \Pr(\theta = A \mid D \in dd)$. By the Bayes rule in Eq. (7.43),

$$\pi(d) = \Pr(\theta = A)\,\frac{\Pr(D \in dd \mid \theta = A)}{\Pr(D \in dd \mid \theta = A)\Pr(\theta = A) + \Pr(D \in dd \mid \theta = -A)\Pr(\theta = -A)}.$$

But $\Pr(D \in dd \mid \theta = A) = \Pr(w = d - A) = \phi(d - A)$ and, similarly, $\Pr(D \in dd \mid \theta = -A) = \Pr(w = d + A) = \phi(d + A)$. Therefore, simple calculations leave

$$\pi(d) - \pi = \pi(1 - \pi)\,\frac{\phi(d - A) - \phi(d + A)}{\pi\,\phi(d - A) + (1 - \pi)\,\phi(d + A)}. \tag{7.44}$$

That is, the variance of the probability changes, $\pi(d) - \pi$, is proportional to $\pi^2(1 - \pi)^2$.
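The closed form for the change in the posterior probability can be verified numerically. The sketch below assumes, for illustration only, that the noise is standard normal (the text allows any continuous density with zero mean and unit variance), and compares Bayes' rule with the closed form; all names are hypothetical.

```python
import math

def normal_pdf(x):
    """Standard normal density (zero mean, unit variance); an illustrative choice."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def posterior(d, prior, A, pdf=normal_pdf):
    """Pr(state = A | D = d) by Bayes' rule, when D = state + noise."""
    good = prior * pdf(d - A)          # weight of the good state
    bad = (1.0 - prior) * pdf(d + A)   # weight of the bad state
    return good / (good + bad)

def posterior_change(d, prior, A, pdf=normal_pdf):
    """Closed form for posterior minus prior: it scales with prior * (1 - prior)."""
    num = pdf(d - A) - pdf(d + A)
    den = prior * pdf(d - A) + (1.0 - prior) * pdf(d + A)
    return prior * (1.0 - prior) * num / den

if __name__ == "__main__":
    for d in (-1.0, 0.0, 1.5):
        lhs = posterior(d, 0.3, 0.5) - 0.3
        print(d, lhs, posterior_change(d, 0.3, 0.5))
```

Because the change is proportional to the prior times one minus the prior, its variance inherits the squared factor, which is the point made in the text.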
We now extend these facts to a continuous time setting, by assuming that the noise is a Brownian motion.
Precisely, we assume that the dynamic counterpart to Eq. (7.42) is,

$$dD(t) = \theta\,dt + \sigma_0\,dW(t), \tag{7.45}$$

for some constant $\sigma_0 > 0$, and $W$ is a standard Brownian motion. We assume that the agent projects $\theta$
on to the information set generated by $D$, thereby being left with the following decomposition,

$$dD(t) = E(\theta \mid \mathcal{F}(t))\,dt + \sigma_0\,d\tilde W(t), \tag{7.46}$$

where $\tilde W$ is a Brownian motion with respect to the agent's information set $\mathcal{F}(t) \equiv \sigma(D(u), u \le t)$. In
Appendix 9, we show that $\pi(t) \equiv \Pr(\theta = A \mid \mathcal{F}(t))$ satisfies

$$d\pi(t) = \frac{2A}{\sigma_0}\,\pi(t)\,(1 - \pi(t))\,d\tilde W(t), \tag{7.47}$$

where $g(t) \equiv E(\theta \mid \mathcal{F}(t))$ (the filter) and $d\tilde W(t) \equiv \sigma_0^{-1}(dD(t) - g(t)\,dt)$.15
It is possible to show that if $\theta = A$ (resp. $\theta = -A$), then, $\lim_{t\to\infty}\pi(t) = 1$ (resp. 0) a.s. It
is the Strong Law of Large Numbers for Brownian motions (e.g., Karatzas and Shreve, 1991):
$\lim_{t\to\infty} D(t)/t = A$ or $-A$ according to whether $\theta = A$ or $\theta = -A$. Intuitively, if $\theta = A$,
the Brownian noise in Eq. (7.45) will be dominated in the long-run, such that $D(t)$ becomes
arbitrarily large, which leaves any agent confident that $\theta = A$. In other words, in this model,
the agents are able to figure out the truth in the long-run. Below, we shall specify an alternative
model in which the agents can never completely learn.
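The long-run learning result can be illustrated by simulation. The sketch below assumes a constant drift equal to either A or -A, observed through a process with Brownian noise of volatility `sigma0`. It uses the fact that, for two candidate constant drifts A and -A, the log-likelihood ratio given the observed level D(t) equals 2*A*D(t)/sigma0**2, so the posterior can be computed in closed form along the path. Parameter values and names are illustrative assumptions.

```python
import math
import random

def simulate_posterior(theta, A=1.0, sigma0=1.0, prior=0.5,
                       T=50.0, n_steps=5000, seed=0):
    """Simulate D(t) with drift theta and return the path of Pr(drift = A | D up to t)."""
    rng = random.Random(seed)
    dt = T / n_steps
    prior_log_odds = math.log(prior / (1.0 - prior))
    D = 0.0
    path = [prior]
    for _ in range(n_steps):
        # Euler step for dD = theta dt + sigma0 dW
        D += theta * dt + sigma0 * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # posterior log-odds of drift A versus drift -A given the level D
        log_odds = prior_log_odds + 2.0 * A * D / sigma0 ** 2
        path.append(1.0 / (1.0 + math.exp(-log_odds)))
    return path

if __name__ == "__main__":
    good = simulate_posterior(theta=1.0)
    bad = simulate_posterior(theta=-1.0)
    print(good[-1], bad[-1])
```

With a long enough horizon the posterior settles near one when the true drift is A and near zero when it is -A, in line with the Strong Law argument in the text.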
7.5.4.3 Pricing implications

Note an interesting implication of this model. The filter $g$ is linear in $\pi$, i.e., $g = (2\pi - 1)A$.
Therefore, the equilibrium in this economy with incomplete information is isomorphic in its
pricing implications to that in a full information economy in which:

$$\begin{cases} dD(t) = (g(t) - \lambda\sigma_0)\,dt + \sigma_0\,d\hat W(t) \\ dg(t) = -\lambda\,v(g(t))\,dt + v(g(t))\,d\hat W(t) \end{cases} \tag{7.48}$$

where $v(g) \equiv (A - g)(g + A)/\sigma_0$, $\hat W$ is a Brownian motion under the risk-neutral probability,
and $\lambda$, the risk-premium, is assumed to be constant for simplicity.
Similar conclusions hold in proportional markets, i.e. when the dividend process is solution
to

$$\begin{cases} \dfrac{dD(t)}{D(t)} = (g(t) - \lambda\sigma_0)\,dt + \sigma_0\,d\hat W(t) \\ dg(t) = -\lambda\,v(g(t))\,dt + v(g(t))\,d\hat W(t) \end{cases} \tag{7.49}$$

and, again, $\lambda$ is a risk-premium, assumed to be constant. The instantaneous volatility of the
expected dividend growth, $g$, is inverse U-shaped in this example, too. In the presence of positive
compensation for risk, $\lambda > 0$, the risk-neutral drift of $g$ (i.e., $-\lambda v(g)$) is, then, a convex function
of $g$. Our discussion of the canonical economy in Section 7.4 suggests that this property can
imply that the asset price is convex with respect to expected growth, $g$. Note that this convexity
arises only once $\lambda > 0$: only in this case would the risk-neutral drift of $g$ be convex in $g$.

15 This construction is heuristic, but it can be made rigorous (see LS, Theorem 8.1 p. 318 and Example 1 p. 371). In particular,
it can be shown (LS, Theorem 7.12 p. 273) that $\tilde W$ is a Brownian motion with respect to the agent's information set $\sigma(D(u), u \le t)$.
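The two properties used here, an inverse U-shaped volatility of expected growth and a convex risk-neutral drift, can be checked with a few lines of arithmetic. The sketch assumes the filter is the linear map g = (2*pi - 1)*A of the posterior probability pi, that pi has diffusion (2*A/sigma0)*pi*(1 - pi), and that the risk-neutral drift of g is minus a constant risk-premium times the volatility of g; symbol names and parameter values are assumptions made for illustration.

```python
def filter_from_prob(pi, A):
    """Expected dividend growth implied by the posterior probability of the good state."""
    return (2.0 * pi - 1.0) * A

def vol_of_growth(g, A, sigma0):
    """Instantaneous volatility of expected growth: zero at g = A and g = -A."""
    return (A - g) * (g + A) / sigma0

def risk_neutral_drift(g, A, sigma0, lam):
    """Risk-neutral drift of expected growth under a constant risk-premium lam."""
    return -lam * vol_of_growth(g, A, sigma0)

def second_difference(f, x, h=0.01):
    """Central second difference; positive values indicate local convexity."""
    return f(x + h) - 2.0 * f(x) + f(x - h)

if __name__ == "__main__":
    A, sigma0, lam = 0.02, 0.05, 0.3
    for pi in (0.1, 0.5, 0.9):
        g = filter_from_prob(pi, A)
        from_pi = 2.0 * A * (2.0 * A / sigma0) * pi * (1.0 - pi)
        print(pi, g, from_pi, vol_of_growth(g, A, sigma0))
```

The volatility implied by the posterior probability coincides with the quadratic expression in g, which is why the drift adjustment is convex whenever the risk-premium is positive and vanishes when it is zero.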
The economic interpretation of the convex drift of expected dividend growth is simple. A utility maximizing representative agent requires compensation for the dividend risk, but also for the risk regarding
his estimated probability of living in the good state. In very good and in very bad times, this
probability is either one or zero, such that there is no risk to be compensated for regarding
this probability (i.e., the volatility of expected growth is close to zero), whence the convexity of the risk-adjustment. In the
next subsection, we develop more intuition on these risk-adjustments, relying on the portfolio
choices of a representative agent.
The implication of this convexity is that prices undergo large swings in good times. The
economic mechanism is the following. In good times, that is, after a string of repeated positive
news on growth, the agents find it very likely that the true dividend drift is the high one. In particular,
the higher the expected dividend growth, the more likely it is that the asset is good. In these states of the world, the
agents do not feel exposed to errors on the dividend drift, and therefore require little risk-aversion corrections
regarding expected growth (only regarding realized dividend growth).16 Note that in bad times, the
agents are not exposed to these errors either, such that asset prices fall, albeit moderately so.
In other words, a convex price might lead to fluctuations that an econometrician (say) could
interpret as being generated by a bubble even if no bubbles are present whatsoever.
Finally, it is instructive to provide the expression for expected returns and volatility predicted by this model, based on the analysis of Section 7.5.2. For example, assuming that the
representative agent has CRRA equal to , we have that = 0 , such that expected returns
are
0
E =
+ 20 + (( )) 0 ( )
(7.50)
| {z }
|
{z
}
R

and volatility is

Vol =

0+

( )
( )
( )

(7.51)

We now rely on Theorem 7.1 and analyze these convexity properties in two famous (and more
general) models of learning, one of them generalizing Eqs. (7.48).
7.5.4.4 With Bellman

The previous explanations regarding risk-aversion corrections can be illustrated while solving
the agent's dynamic programming problem. In particular, we shall show that it is the hedging
component of the agent's portfolio choice that determines how much the agent is willing to invest
in the asset in periods of uncertainty.
[In progress]
7.5.4.5 Convexity again, and two models of learning

The model in Eqs. (7.48) can be considered as a special case of that considered by Veronesi
(1999), in which an infinitely lived agent has positive, constant absolute risk aversion,
and observes realizations of the dividend $D$, generated by:

$$dD(t) = \theta(t)\,dt + \sigma_0\,dW_1(t), \tag{7.52}$$

where $W_1$ is a Brownian motion, and $\theta$ is the expected dividend change, supposed to follow a
two-state Markov chain. (See, also, David, 1997, for a related model.) In Eqs. (7.48), the
dividend is a "type", in that once nature draws $\theta$, this is there forever. In Eqs. (7.52), instead,
$\theta$ is allowed to change, according to a Markov chain, as explained.

16 In terms of Eqs. (7.49), the drift of dividend growth is tilted to the left due to the risk-adjustment term, $-\lambda\sigma_0$. Moreover, we also
have that expected dividend growth is tilted to the left, due to its negative risk-neutral drift. The effects of this drift are small in very good and very poor
times, because the volatility of expected growth is small in these cases. The effects of $-\lambda\sigma_0$ are independent of the state of the economy.
The key aspect in this economy is that the expected dividend change, , is unobserved. As
a result of this lack of knowledge, the agent attempts to learn about the state in which he is
living, through Bayesian learning. The resulting economy is one generalizing that in the previous
section. In this economy, the price is the same as that in a full information economy in which
the dividend is solution to

=
+ 0
(7.53)
= (
) + ( )

) 0 , and are some positive constants. Note that while the


where ( ) = (
)(
diffusion terms in Eq. (7.48) and in Eq. (7.53) have the same functional form, the expected
dividend change is a martingale in Eq. (7.48), and mean-reverting in Eq. (7.53), under the
physical probability, i.e. when the risk-premium is zero in Eq. (7.48).
These differences exist because in Eq. (7.48), the expected dividend change is drawn at time 0 and stays the same forever,
whereas in Eq. (7.53), it is a Markov chain. In other words, in Eq. (7.52), the agent will never
figure out the truth, such that the filtered expected dividend change is mean-reverting.
Eqs. (7.53) amount to a special case of Eqs. (7.27). Therefore, by Eq. (7.30), the risk-premium
is constant, and equal to =
0 such that the dynamics of dividends under the risk-neutral
probability are:

2
=(
+ 0
0)
(7.54)
= ( (
)
+ ( )
0 ( ))
Veronesi (1999) also assumes the riskless asset is infinitely elastically supplied, and therefore
that the interest rate is a constant.
Let us analyze the properties of the equilibrium price predicted by this model. It is easy to
see that given Eq. (7.54), the asset price is:

Z
Z

=
(
)=E
(
)

where

)=

2
0

and

E(

| )

(7.55)

The conditional expectation, E ( | ) can be read as a special case of the canonical price
in Eq. (7.13), namely, for = 0 and ( ) = . By Theorem 7.1-(ii), E ( | ) is convex in
whenever the drift of in Eq. (7.54) is convex. This condition always holds true because
0.
That is, the conditional expectation of in Eq. (7.55), inherits the same second order properties
(convexity) of this drift function.
The economics behind these convexities is the same as that behind Eqs. (7.48) or (7.49). Prices
are convex in the expected dividend growth because of a kind of speculative enthusiasm: after
a sequence of high realized growth, investors believe that the likelihood is high they are living in
good times, and that the likelihood is low they are making mistakes in their assessment. Thus,
asset prices increase fueled by low risk-premium discounts. Note that these low premiums regard
the uncertainty the agents face while assessing the world in which they live. Their speculative
enthusiasm is rational, so to speak, i.e., not determined by animal spirits.
These properties hold as we are assuming that the riskless asset is infinitely elastically supplied. When investors' demand for the safe asset affects the interest rate, the interest rate
becomes a function of the expected growth. Thus, when expected growth increases, the increase in the asset
price is mitigated by the fact that relatively less savings are now being made, reflecting the fact
that more resources are likely to be available in the future. The interest rate increases. Formally,
1 2 2
by Eq. (7.29), the short-term rate is ( ) = +
0 , such that the asset price is, now,
2

Z
Z

(
)

0
=
(
) E
(
)

where

)=(

)

2
0

)
+Z

( )

By Theorem 7.1, the two functions, ( ) and


( ), are a ne in if the risk-neutral
drift of in Eq. (7.54), ( ) ( (
)
(
)),
and
the short-term rate ( ), are such that
0
00
0
( ) = 2 ( ). A simple calculation reveals this is indeed the case, with 00 ( ) = 2 . That is, if
interest rates are determined endogenously, the asset price is no longer convex in the dividend
expected growth. This is unpleasant arithmetic. We know the short-term rate should not fluctuate
too much, compared to stocks. Assuming it is constant, as Veronesi (1999) did, is reasonable and
leads to interesting properties, which would be destroyed under the counterfactual circumstance
that the short-term rate is allowed to fluctuate considerably.
Interestingly, this property that prices are linear in expected dividend growth, once interest rates are endogenous, is
not specific to cases where the dividend is as in Eqs. (7.53) and agents have CARA. Veronesi (2000)
shows that convexity properties are lost even in proportional markets (i.e. markets where the
distribution of dividend growth is independent of the initial level of dividends) and representative agents with CRRA. In fact, Appendix 10 shows that there are no linear signal structures
and representative agent CRRA economies with complete securities markets supporting the
convexity property.
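The loss of convexity under an endogenous interest rate can be illustrated numerically. The sketch below is built on assumptions that are not spelled out in the surviving text: a CARA coefficient `gamma`, a mean-reverting physical drift `k * (gbar - g)` for expected growth, a quadratic volatility of expected growth vanishing at two hypothetical states `hi` and `lo`, a risk-premium equal to `gamma * sigma0`, and a short-term rate of the form `rho + gamma * g - 0.5 * gamma**2 * sigma0**2`. Under these assumptions, the second derivative of the risk-neutral drift equals twice the slope of the short-term rate, which is the knife-edge case in which the price is affine.

```python
def vol_of_growth(g, hi, lo, sigma0):
    """Assumed quadratic volatility of expected growth, zero at the two states."""
    return (hi - g) * (g - lo) / sigma0

def risk_neutral_drift(g, k, gbar, gamma, sigma0, hi, lo):
    """Assumed risk-neutral drift: mean reversion minus the risk adjustment."""
    return k * (gbar - g) - gamma * sigma0 * vol_of_growth(g, hi, lo, sigma0)

def short_rate(g, rho, gamma, sigma0):
    """Assumed endogenous short-term rate, linear in expected growth."""
    return rho + gamma * g - 0.5 * gamma ** 2 * sigma0 ** 2

def first_derivative(f, x, h=1e-3):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-3):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

if __name__ == "__main__":
    k, gbar, gamma, sigma0, hi, lo, rho = 0.2, 0.01, 2.0, 0.05, 0.03, -0.01, 0.02
    m = lambda g: risk_neutral_drift(g, k, gbar, gamma, sigma0, hi, lo)
    r = lambda g: short_rate(g, rho, gamma, sigma0)
    print(second_derivative(m, 0.01), 2.0 * first_derivative(r, 0.01))
```

With a constant short-term rate the right-hand side would be zero while the left-hand side stays at twice the risk-aversion coefficient, so the drift remains strictly convex and the affine case no longer applies.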
In the previous model, the asset price reacts asymmetrically to changes in the expected
dividend growth, due to the assumption that agents learn about a discrete state space in
a continuous-time economy. We now describe another model, analyzed by Brennan and Xia
(2001), in which the agents learn about a continuous state in a continuous-time economy. In
this model, an infinitely lived agent has CRRA preferences, and observes the dividend, solution to:
=

and the expected dividend growth, , is unobserved. Rather than assuming to be on a


countable number of states, Brennan and Xia postulate that it is an Ornstein-Uhlenbeck process:
= (

where , 1 and 2 are positive constants.


The agent learns about through Bayes rule, similarly as in the previous section. It can be
shown that if the agent has a Gaussian prior on 0 with variance 2 dened below, the price is


(
), where
2 = 0, and

and satisfy Eqs. (7.27), with


1(

(
)=

)=
1

( )=

( ) = (

),

+
0

and
is the positive solution to 1 ( ) = 21 + 22 2 .17 By Eq. (7.30), the risk-premium
is constant, and equal to =
is the CRRA coe cient, and by Eq. (7.29), the
0 , where
short-term rate is linear in . Therefore, and by the same reasoning leading to Eq. (7.41), the
price-dividend ratio is independent of , and is given by:
Z

(
( ))
0

0
( )=
(7.56)
E

where

=( ( )+(

) ( ))

+ ( )

(7.57)

and is a -Brownian motion.18 The two functions, ( ) and ( ), are momentarily left
unspecied, as we wish to provide general results within this model.
Under regularity conditions, monotonicity and convexity properties are inherited by the inner
expectation in Eq. (7.56). Precisely, in the notation of the canonical pricing problem, we have
that ( )
+ ( ) + 0 and ( )
( )+( 0
) ( ). Therefore, by Theorem 7.1, we
have:
(i) If

( )

1, the price-dividend ratio is increasing in the expected dividend growth, .

(ii) Suppose that the price-dividend ratio is increasing in . Then, it is also convex in
2
) ( ))
2 + 2 0 ( ).
whenever 00 ( ) 0 and 2 ( ( ) + ( 0
For example, assume that the short-term rate is constant (it could be in infinitely elastic
supply). Then, the price-dividend ratio is increasing and convex in the expected dividend growth
if:
2

( )+(

) ( ))

These conditions are satisfied by Brennan and Xia (2001). [...] Provide economic interpretation.
[in progress] Moreover, explain that this is due to the fact the inner expectation in Eq. (7.56)
is indeed one for an affine model to be introduced and explained in full detail in Chapter 12.
Finally, expected returns and returns volatility have the same expression as in Eqs. (7.50) and
(7.51).
7.5.5 Linearity-generating processes
The focus of the previous sections is a search for pricing kernels that make asset pricing models qualitatively consistent with countercyclical statistics, a search relying on theoretical test
conditions, those emanating from Theorem 7.1 in Section 7.4.
17 Brennan and Xia (2001) actually consider a slightly more general model, where consumption and dividends differ. They derive
a model with a reduced-form identical to that in this example. In the calibrated model, Brennan and Xia found that the variance
of the filtered expected dividend growth is higher than the variance of the expected dividend growth in an economy with complete information. The results
in this example can be obtained through an application of Theorem 12.1 in Liptser and Shiryaev (2001) (Vol. II, p. 22). They
generalize results in Gennotte (1986) and are a special case of results in Detemple (1986). Neither Gennotte nor Detemple
emphasized the impact of learning on the pricing function.
18 By Girsanovs theorem,
=
=
+
are Brownian motions under and under . That is, =
0 and
+(
)
,
whence
Eq.
(7.57).
0


We can actually use Theorem 7.1 for another (somewhat surprising) purpose: a search for
asset pricing models that have a closed-form solution. The idea is simple. Theorem 7.1-(ii) provides conditions under which price-dividend ratios can be either concave or convex in the state
variables driving them. Specifically, consider the representation of the price-dividend ratio in
Eqs. (7.15)-(7.16)-(7.18), which fits the canonical pricing problem in Section 7.4, as observed,
once we identify the primitives of the models with Eqs. (7.19), reported here for convenience,
( )
where as usual, R ( )

( )+
( )

R( )

( )
0 CF

( )

( )= ( )

( ) and, by Eq. (7.18),


2
P

( )

( )

( )+

0 1

( )

=1

and ( ) denotes the drift of under the physical probability. Then, given that
we have that by Theorem 7.1-(i),

is:

If 00 = 2 (R0
concave in
if (R00
convex in
if (R00

00

= 0,

00

)
00
)

0 and
0 and

00
00

0
0

(7.58)

regardless of whether is increasing in .


It is quite unlikely that a closed-form solution exists for the price-dividend ratio when one of
these conditions is satisfied. Yet how about assuming that neither of the two conditions for concavity
or convexity holds true? In this case, obviously, price-dividend ratios are neither concave nor
convex, so they must necessarily be affine in the state variables! Technically, then, we have that
by the conditions in (7.58), the price-dividend ratio is affine in the state variable if:
00 = 2 (R0

and R00

00

=0

(7.59)

Gabaix (2009) is the first to note that price-dividend ratios are affine in the state variables
driving them, should the drift of these state variables be quadratic. His remarks are consistent
with Theorem 7.1, and the two conditions in (7.59). In fact, Gabaix develops a unified theory of "linearity-generating processes," which generalizes the single state variable framework
underlying Theorem 7.1 and the related conditions in (7.59).
To illustrate these facts, consider a model that fits the class of linearity-generating processes,
one of external habit formation by Menzly, Santos and Veronesi (2004) (MSV, henceforth). In
this model, a representative agent maximizes,
Z

=
ln (
)
(7.60)
0

where
is external habit. Relative risk-aversion equals the inverse of the surplus consumption ratio, 1 , with
=
, which in equilibrium equals
, where
is consumption
endowment.
MSV assume that the surplus consumption ratio is a continuous-time autoregressive process,
solution to,

1
1
1
1
1
=
(7.61)
0


for some constants , ,


and . It can be shown that if
is small enough,
(0 ).
Ljungqvist and Uhlig (2000) utilize similar assumptions to model productivity shocks in their
model of "catching up with the Joneses."
It is easy to verify that this model does satisfy the two conditions in (7.59). First, note that
Eq. (7.61) implies that the surplus consumption ratio satisfies,
=
such that,
( )=


1

1

2 2
0

2 2
0

1
2

+(

CF ( ))

Note that the drift function of the surplus ratio above is quadratic, a property that
crucially leads the first condition in (7.59) to be satisfied.
)
By results in Section 7.5.1, the market Sharpe ratio is (
) = 1 ( 0 ( ) ), where (
is the instantaneous volatility of the habit level, = (1
) and, by Itos lemma, equals

(
) =
)
Vol ( ), such that, by Itos lemma again, Vol ( ) =
1
.
0 (1
0
Therefore, the market Sharpe ratio equals,
CF

( )

)=

1+

which is countercyclical. Finally, again by results in Section 7.5.1, we can also infer that:
( )= +

2
0

+
1

2
0

Therefore, we have that,


00 ( ) =

R0 ( ) =

00

( )=

( )=0

As a result, the two conditions in (7.59) are satisfied, and the price-dividend ratio is affine
in the surplus consumption ratio. We can check this affine property through a direct computation.
Denote the instantaneous utility with ( ) ln (
), with = . By the usual asset price
representation, we have that the price-dividend ratio, ( 0 ), is,
Z

( 0) =
=
=
=

1
+

Z0
Z0

(
)
(
)
0 0
1
1
+

( + )

1
0


Figure 7.12 depicts the price-dividend ratio as a function of the current surplus consumption
ratio, using the following parameter values, = 0 04, = 0 15, and = 0 03.

[Figure 7.12: plot of the price-dividend ratio (vertical axis, P/D, from 5 to 35) against the surplus consumption ratio (horizontal axis, from 0.00 to 0.05).]

FIGURE 7.12. Price-dividend ratio for the aggregate consumption claim predicted by the
Menzly, Santos and Veronesi (2004) model of external habit formation.

As an example of a simple case of a linearity-generating process, consider a model with a


constant market Sharpe ratio and a constant short-term rate , but with stochastic dividend
growth,
=
where the expected dividend growth,

, is solution to,
2

such that,
( )=

R( ) = +

( )=

+(

It is straightforward to see that the two conditions in (7.59) hold true: the price-dividend ratio
is affine in the expected dividend growth. Indeed, assuming that the price-dividend ratio is independent of the level of dividends, we have that
it satisfies the following differential equation,
2
0 1 2 00
+ 0
+
+1=(
)
(7.62)
2
Let us conjecture, now, that the price-dividend ratio is affine in the expected dividend growth, i.e. there are two constants
such that
( )= 0+ 1
Replacing the previous expression into Eq. (7.62) allows us to pin down the two expressions for
0 and 1 such that the solution for the price-dividend ratio is,
+
( )= 2
0

7.6 Modeling market-to-book ratios


[In progress]

7.7 Appendix 1: Estimation of the market expected return


We estimate yearly expected excess returns by relying on the following simple model,

+ (

1) +

>

(7A.1)

is a vector containing default and term


where , , and are coe cients ( is two-dimensional),
spreads, and
is a residual term, assumed to be independent and identically normally distributed.
Note that Eq. (7A.1) could be cast, equivalently, as:

= + (1
) 1
(7A.2)
= + 1+ > 1
where is the lag operator,
1 , such that yearly expected returns could be dened just as in

Eq. (7.2) in the main text, with ( + ) denoting the projection of +1 based on model (7A.2).
To calculate the projection terms in Eq. (7.2) obviously necessitates specifying the dynamics of
in (7A.2). We proceed, alternatively, as follows. First, we estimate Eq. (7A.1). Second, we reconstruct
a time series of monthly expected returns, using the second of Eqs. (7A.2) and the estimated
coefficients. Third, we fit an AR(1) model to the reconstructed series, and calculate projections on this
series to calculate the expected returns in Eq. (7.2).
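The third step, fitting an AR(1) model and iterating its projections forward, can be sketched as follows. This is a minimal illustration under the assumption that the reconstructed series is stored as a plain list and that the AR(1) is estimated by ordinary least squares with an intercept; how the multi-step projections are then aggregated into Eq. (7.2) is not reproduced here, and the function names are hypothetical.

```python
def fit_ar1(x):
    """OLS estimates (c, phi) of x[t+1] = c + phi * x[t] + error."""
    lagged, lead = x[:-1], x[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_lead = sum(lead) / n
    cov = sum((a - mean_lag) * (b - mean_lead) for a, b in zip(lagged, lead))
    var = sum((a - mean_lag) ** 2 for a in lagged)
    phi = cov / var
    c = mean_lead - phi * mean_lag
    return c, phi

def project(x_last, c, phi, horizon):
    """Iterated projections E[x(t+1)], ..., E[x(t+horizon)] given x(t) = x_last."""
    out = []
    x = x_last
    for _ in range(horizon):
        x = c + phi * x   # one-step-ahead projection applied recursively
        out.append(x)
    return out

if __name__ == "__main__":
    # Illustrative noiseless series, not data from the text.
    series = [1.0]
    for _ in range(29):
        series.append(0.5 + 0.8 * series[-1])
    c, phi = fit_ar1(series)
    print(c, phi, project(series[-1], c, phi, 12)[-1])
```

On a noiseless series generated by the recursion itself, the fit recovers the intercept and the autoregressive coefficient up to rounding, which is a convenient sanity check before applying it to estimated returns.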


7.8 Appendix 2: Calibration of the tree in Section 7.3


Solution and calibration of the model. We provide details regarding the calibration results in
Table 7.2 of the main text. In the first step, we estimate the two parameters of the dividend
process. Let be the yearly dividend gross growth rate. We calibrate and by a perfect matching
( ) =
+ (1
)
and dividend variance,
of the model expected dividend growth,

2
2
( ) =
(1
), to their sample counterparts = 1 0594 and 0 = 0 0602
0
obtained with US aggregate dividend data. The result is ( ) = (0 158 0 082). Given these calibrated
and .
values of ( ), we x = 1 0%, and proceed to calibrate the probabilities
), we need an explicit expression for all the payo s at each node. We rely on
To calibrate (
, which we obtain in closed-form, as follows. For each state
{
},
the price of the claim,
is solution to,

(7A.3)

where E () is the expectation taken under the risk-neutral probability


in state ,
{
},
2
2
0
0
= ,
=
,
=
,
= 1, and
and
are the dividend and the price of
and
the claim as of the next period. Since risk-aversion is constant from the third period on, the price0
dividend ratio is constant as well, from the third period on, which implies that
= 0 . Replacing
, yields,
this equality into Eq. (7A.3), and solving for
=

+ (1
+ (1

)
)

(7A.4)

We calibrate (
= ) to make the average price-dividend (P/D henceforth) ratio
,
the good P/D ratio 2 and the bad P/D ratio 2 in Eq. (7A.4) perfectly match the average
P/D ratio, the average P/D ratio during NBER expansion periods, and the average P/D ratio during
), we
NBER recession periods (i.e. 31.99, 33.21 and 26.20, from Table 7.1). Given (
=
compute
P/D ratios
in states
and . For example, the price of the asset in state
is,
the

2 +
[
) (1 +
)]. Given
, we compute the log-return in the bad state as
+ (1
ln(

), where

with probability
with probability 1

1+

Then, we compute the return volatility in state . The P/D ratios, the expected log-return and
return volatility in state are computed similarly. (Please notice that volatilities under and under
{ } {
} are not the same.)
Next, we recover the risk-aversion parameter
in the three states
{
} implied by the
and =
. As we shall show below, the relevant formula
previously calibrated probabilities ,
to use is,
=

+ (1

(7A.5)

The values for the implied risk-aversion parameter in Table 7.2 are obtained by inverting Eq. (7A.5)
).
for , given the calibrated values of (
is the Sharpe ratio,
Finally, we determine the risk-adjusted discount rate as + 0 , where
which we shall show below to equal,
=p

(1


(7A.6)


Proof of Eq. (7A.5). We only provide the derivation of the risk-neutral probability
, since
the proofs for the expressions of the risk-neutral probabilities
and =
are nearly identical. In
equilibrium, the Euler equation for the stock price at the bad node is,
#
"

i
0 ( )
+

+
=
{
}
(7A.7)
=
0 (
)
where: (i) is the discount
is state dependent and
rate; (ii) the utility function for consumption
(1
( )= 1
); (iii) () is the expectation taken under the physical probability
equal to,
2 and
; and (iv) the dividend and the gross dividend growth rate are either =
2
=
=
with probability , or
= 1 and
= 1 = with probability 1
.
In the model, the asset is elastically supplied or, equivalently, there exists a storage technology with
a fixed rate of return equal to 1%. Let us derive the agents' private evaluation of this asset. The
Euler equation for the safe asset is,
P

=
(
)=
(7A.8)
, is state dependent,

where the safe interest rate,

and

(1

=1

. Therefore,

(7A.9)

is a probability distribution. In fact, by plugging


and 1
into Eq. (7A.7), one sees that it
=
is the risk-neutral probability distribution. To obtain Eq. (7A.5), note that by Eq. (7A.8),
), which replaced into Eq. (7A.9) yields,
1/ (

Eq. (7A.5) follows by the definition of given above.


Proof of Eq. (7A.6). Let
the gross expected return of the risky asset. The asset return can
with probability , and
with probability 1
, and
. Therefore, for
take two values:
each state, we have that:
=

+ (1

+ (1

(7A.10)

where we have omitted the dependence on the state


to alleviate the presentation. The standard
p
(1
). The Sharpe ratio is defined as
deviation of the asset return is Std =
=

Std

By subtracting the two equations in (7A.10),


= +

(1

(1

)= +

(1

from which Eq. (7A.6) follows immediately. Note, also, that in terms of this definition of the Sharpe
ratio, the risk-neutral expectation of the dividend growth is, E ( ) = ( )
0.
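The steps in the proof of Eq. (7A.6) can be checked numerically. The following is a minimal sketch, not the calibration used in the text: it assumes an asset whose gross return takes a high value with physical probability `p` and a low value otherwise, and a risk-neutral probability `q` for the high value, so that the safe gross return equals the risk-neutral expected return. The names `p`, `q`, `r_up` and `r_down`, and all numerical inputs, are illustrative assumptions. Under these assumptions, and as long as the high return exceeds the low one, the Sharpe ratio equals (p - q) / sqrt(p (1 - p)), whatever the two return levels.

```python
import math

def two_state_sharpe(p, q, r_up, r_down):
    """Sharpe ratio of an asset whose gross return is r_up with physical
    probability p and r_down otherwise; q is the risk-neutral probability
    of r_up (illustrative notation, assumed for this sketch)."""
    expected = p * r_up + (1 - p) * r_down      # expectation under the physical probability
    risk_free = q * r_up + (1 - q) * r_down     # expectation under the risk-neutral probability
    std = math.sqrt(p * (1 - p)) * abs(r_up - r_down)
    return (expected - risk_free) / std

# Hypothetical inputs, chosen only for illustration.
p, q = 0.80, 0.70
direct = two_state_sharpe(p, q, r_up=1.10, r_down=0.90)
closed_form = (p - q) / math.sqrt(p * (1 - p))  # valid when r_up > r_down
```

Changing `r_up` and `r_down` while keeping `p` and `q` fixed leaves `direct` unchanged, which is the sense in which the Sharpe ratio on this tree is pinned down by the two probabilities alone.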


7.9 Appendix 3: Asset prices in a multifactor model


Consider a reduced-form model, where an asset price,
number of factors,
= ( ), = 1
, and = [
that the -th asset pays off an instantaneous dividend
=

( )

say, is a twice-differentiable function of a

]> is a vector of factors. We assume


= ( ), and that is a diffusion process:

+ ( )

where is -valued, is valued, and


is a -dimensional Brownian motion. We assume the
number of assets does not exceed the number of factors,
, consistently with the framework in
Chapter 4. By Itô's lemma:
1
z}|{
z}|{
=
+
where is the infinitesimal operator. Let be the instantaneous short-term rate. By no-arbitrage,
and under regularity conditions, there exists a measurable -vector process , the vector of unit prices
of risk associated with the fluctuations of the factors, such that,
1
1

+
..
.
+

1
1

1
1

= |{z} |{z}

where

..
.

(7A.11)

To simplify, we take this economy to be Markov, assuming that,


( ) and
( ).
Eqs. (7A.11) add up to a system of
uncoupled partial differential equations, and the solution is
one no-arbitrage price system, assuming no bubbles.


7.10 Appendix 4: Arrow-Debreu PDEs


We develop an interesting connection. Note that by Eq. (7A.11),
+

+(

1)

is solution to,
(

2 2

RR

(7A.12)

Under regularity conditions, the Feynman-Kac representation of the solution to Eq. (7A.12) is,
Z
(
)
(
)=
0

where,
(

)=E

( )

and
is the stochastic discount factor:
= , 0 = 1.
0
Alternatively, we represent the price under the physical probability. Given the previous assumptions, we have that necessarily satisfies,
=

( )

1(

2(

(7A.13)

Next, define the undiscounted Arrow-Debreu adjusted asset price process as:
(
where

) (

) is as in Eq. (7.26),
(

0 (

)=

By results in Section 7.4.2, we know that the following price representation holds true:
Z

=
0
Under regularity conditions, the previous equation can then be understood as the unique Feynman-Kac
stochastic representation of the solution to the following partial differential equation
(
where

)+ (

)= (

) (

. Eq. (7A.12) then follows by the definition of


RR

[ ]


7.11 Appendix 5: The maximum principle


Consider the differential equation:
=

=0

sign (

where satisfies regularity conditions that ensure


we have that:

) = constant on

remains bounded on (

sign ( ) =

(7A.14)

). Under these conditions,

sign ( )

(7A.15)

Figure 7.A.1 illustrates the intuitive reasons leading to Eq. (7A.15). Consider the following heuristic
arguments. Note that
Z
Z
0=
= +
=
=
such that Eq. (7A.15) holds.
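The heuristic argument can be illustrated numerically. The sketch below rests on assumptions made here for illustration: it takes the differential equation to be x'(s) = phi(s) on (t, T) with terminal condition x(T) = 0, writes `phi` for the forcing term (a name chosen for this example), and recovers x(t) as minus the integral of `phi` between t and T using a midpoint rule. A forcing term of constant sign then gives x(t) the opposite sign.

```python
import math

def terminal_value_solution(phi, t, T, n=10_000):
    """Value at time t of the solution to x'(s) = phi(s) with x(T) = 0.

    Since 0 = x(T) = x(t) + integral of phi over (t, T), the solution is
    x(t) = -integral, approximated here with a midpoint rule on n intervals.
    """
    h = (T - t) / n
    integral = sum(phi(t + (k + 0.5) * h) for k in range(n)) * h
    return -integral

# Two hypothetical forcing terms, each with a constant sign on (0, 1).
x_when_phi_positive = terminal_value_solution(lambda s: 1.0 + s ** 2, t=0.0, T=1.0)
x_when_phi_negative = terminal_value_solution(lambda s: -math.exp(-s), t=0.0, T=1.0)
```

With a positive forcing term the path has to rise towards zero at T, so it starts below zero; with a negative one it starts above zero.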
still satisfies Eq. (7A.14), but that at the same time,
Next, suppose that
and time,
a state variable
=
where

is some function of

satisfies:
=

Assuming enough regularity conditions, we have that


=
where

(7A.16)

. Therefore, comparing Eq. (7A.14) with Eq. (7A.16) leaves:

Because

= (

(7A.17)

) = 0, the solution to Eq. (7A.17) is:


(

)=

(7A.18)

Therefore, and extending Eq. (7A.15), we have the following result. Suppose that
(
). Then, by Eq. (7A.18):
and that sign ( ) = constant on
sign ( (

)) =

= (

) = 0,

sign ( )

These results can be extended to stochastic differential equations. Consider the more elaborate
operator-theoretic format version of Eq. (7A.17), the one that arises in typical asset pricing models
with Brownian motions:

0=


(7A.19)

FIGURE 7A.1. Illustration of the maximum principle for ordinary differential equations. [Two panels, labeled "< 0" and "> 0", each plotting x(t) against t up to x(T).]
Let
+

We claim that under regularity conditions, if Eq. (7A.19) holds,


=

is then a martingale. Indeed,

+
+

=
where

is a local martingale, and the last equality holds due to Eq. (7A.19). But if
Z

+
= = ( )=

is a martingale,

where
is the expectation taken with respect to the information at time . Therefore, conclude as
= 0 and that sign ( ) = for each . Then sign ( ) = for each .
follows. Assume that


7.12 Appendix 6: Stochastic dominance beyond Rothschild and Stiglitz


7.12.1 Dynamic stochastic dominance
An old issue in financial economics is the relation between asset prices and the volatility of fundamentals (see, e.g., Malkiel, 1979; Pindyck, 1984; Poterba and Summers, 1985; Abel, 1988; Barsky, 1989).
Second-order stochastic dominance (Rothschild and Stiglitz, 1970, 1971) suggests that within the static
pricing problem discussed in the main text where = E( ()) (see Section 7.4.2), is inversely related
to mean preserving spreads of , provided is concave (see also the review in the appendix of Chapter
1). Intuitively, a concave function exaggerates poor realizations of and dampens the favorable
ones.
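The static intuition can be illustrated with a small numerical example. The sketch below is illustrative only: the three-point distribution, the size of the spread, and the logarithmic payoff are assumptions made here and are not taken from the text. It builds a mean-preserving spread of a discrete random variable and compares the expectation of a concave function before and after the spread.

```python
import math

def expectation(f, outcomes):
    """Expected value of f under a discrete distribution of (value, probability) pairs."""
    return sum(prob * f(value) for value, prob in outcomes)

def mean_preserving_spread(outcomes, eps):
    """Split every outcome x into x - eps and x + eps, each with half of its probability."""
    return [(x + sign * eps, prob / 2) for x, prob in outcomes for sign in (-1.0, 1.0)]

def identity(x):
    return x

# Hypothetical three-point distribution and its spread.
base = [(1.0, 1 / 3), (2.0, 1 / 3), (3.0, 1 / 3)]
spread = mean_preserving_spread(base, eps=0.5)

mean_base = expectation(identity, base)
mean_spread = expectation(identity, spread)   # same mean by construction
price_base = expectation(math.log, base)      # concave payoff
price_spread = expectation(math.log, spread)  # lower than price_base
```

Replacing `math.log` with a convex function such as `math.exp` reverses the ranking of the two expectations.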
The following result contains conditions that apply to the dynamic case, which generalizes the
pricing problem in Eq. (7.11),
(
where

is a diffusion process with drift

(7A.20)

and diffusion . We have:

Theorem 7.A.1. (Dynamic Stochastic Dominance) Consider two economies A and B with two
fundamental volatilities
and
and let ( )
( ) ( ) and ( ) ( =
) the corresponding
, the price
in economy A is lower than the price price
risk-premium and discount rate. If
in economy B whenever for all (
) R [0 ],
(

( )

( ))

( )

( ))

)+

1
2

( )

( )

(7A.21)

If is the price of a traded asset,


=
. If in addition is constant, is decreasing (increasing)
in volatility whenever it is concave (convex) in . This phenomenon is related to the convexity effect
discussed in the main text. If is not a traded risk, two additional effects are activated. The first one
reflects a discounting adjustment, and is apparent through the first term in the definition of . The
second effect reflects risk-premium adjustments and corresponds to the second term in the definition
of . The signs with which these two terms show up in Eq. (7A.21) are both intuitive.
Proof of Theorem 7.A.1. The function (

0 = 2(
( 0) = ( )

)+

) in Eq. (7A.20) is solution to


( ) (

) R [0
R

(7A.22)

(
) = 12 2 ( )
(
)+ ( ) (
) and subscripts denote partial derivatives. Clearly,
where
and
are both solutions to Eq. (7A.22) but with different coefficients. Let ( )
( ).
0( )
(
)
(
) is solution to
For each ( ) R [0 ), the price difference
(
)
0=

2(

)+

1
2

( )

)+

( )

( )

)+ (

with
( 0) = 0 for all
R, and
is as in Eq. (7A.21) of the theorem. The result follows by
results given in Appendix 5 to this chapter.



7.12.2 Proof of Theorem 7.1
By differentiating Eq. (7A.22) twice with respect to , we find that for all (
(1) (
)
(
) and (2) (
)
(
) are solutions to,

1 2
1 2
(1)
(1)
0
(1)
)+
( ) (
) + ( ) + ( ( ))
(
0= 2 (
2
2

0
0
( ) (1) (
)
( ) (
)
( )

with

(1) (

0) =

0=

( )

R, and (

(2)
2 (

( )

0
2 ( )

)+
2 0( )
00

( )

R [0

1 2
( ) (2) (
2

1 2
00
( ( ))
2

(1)

R++ [0

),

),

)+
(2)

( ) + ( 2 ( ))0

00

( ) (

(2)

with (2) ( 0) = 00 ( )
R. By results reviewed in Appendix 5 to this chapter, (1) (
) 0
R. This
(resp. 0) ( ) R [0 ) whenever 0 ( ) 0 (resp. 0) and 0 ( ) 0 (resp. 0)
completes the proof of Part (i) of the theorem. The proof of Part (ii) is obtained similarly.
Remark 7.A.2. (Alternative proof) An alternative proof can rely directly on the convexity of the
payoff function, and a result due to Hajek (1985). This result says that if is increasing and convex,
and 1 and 2 are two diffusion processes, both starting off from the same origin, with integrable
drifts 1 and 2 and volatilities 1 and 2 , then, [ ( 1 )]
[ ( 2 )], whenever 1 ( )
2 ( ) and
(0 ). This result generalizes classic comparison theorems (e.g., Karatzas and
1( )
2 ( ) for all
Shreve, 1991, p. 291-295), where is an increasing function and 1
2.
Remark 7.A.3. (Bounds on convexity) An inspection of the proof of Theorem 7.1 reveals that
concavity of prices holds under less restrictive conditions than those given in Part (ii) of the theorem.
We would only need that,
0

00
00
2 ( )
( ) (1) (
)
( ) (
) 0
or that,

00

( )
(1)

0
2 ( )

00

( )

(1) (

)
)

(7A.23)

Assuming that 0
0, such that
0, the inequality (7A.23) imposes a theoretical upper bound
to convexity of discount rates such that prices are concave.


7.13 Appendix 7: Dynamics of habit in Campbell and Cochrane (1999)


We derive Eq. (7.39) under the slightly more general assumption that the short-term rate we wish
to obtain is affine in ln , meaning that the last two terms in Eq. (7.38) sum up to,
(1

) (

1
2

ln )

2 2
0 (1 +

( ))2 =

const + (

ln )

(7A.24)

for some , and where const is to be determined. The working paper version of Campbell and Cochrane
(1999) considers exactly this case.
Define the log of the surplus ratio as

ln 1
(7A.25)
where
ln ,
steady state

ln 1

where

ln and
(

+1

ln , and consider its first-order Taylor expansion around the

),

+ 1

and

1
(

. Consider the discrete-time version of Eq. (7.35),

)+ ( )

+1

where
( ). Replacing Eq. (7A.26), evaluated at
previous approximation, and rearranging terms, leaves:
+1

+1

(7A.26)

( )

)+

and

1
2

2
0

(7A.27)

= + 1, into both sides of the

+1

1
2

2
0

(7A.28)

The function in Eq. (7.39) is found by imposing the following three conditions, where the first is
a slight generalization of that mentioned in the main text, and the remaining two are the last two
conditions in the main text:
First, the short-term rate in Eq. (7.38) is affine in ln , i.e. Eq. (7A.24) holds, such that:
s
1
2
)
) ( ln ) + 2 2 const 1
(7A.29)
( ) = 2 2 2 ( (1
0

Second, habit is predetermined at the steady state, meaning that


+1 , which by Eq. (7A.28), it does not, when:
() =

+1

does not change with

(7A.30)

Evaluating Eq. (7A.29) at the steady state , and using the previous condition delivers,
=

1
,
2

2
2 2
0

const

such that Eq. (7A.29) is,


( )=

1
2 2
0

( (1


) (

ln ) +

1
2

(7A.31)



Third, habit is predetermined near the steady state, meaning that,

=0

(7A.32)

of
+1 . By the definition

(
+1 )
the log-surplus consumption ratio in Eq. (7A.25), we have that
+ +1 ,
+1 = ln 1
where ( +1 )
+1 and
+1 is as in Eq. (7A.27), such that, using Eq. (7A.27):

We then need to find the dynamics of

+1
+1

where we have set

( )

=1

+1 ,

expressed as a function of

1
(

+1 )

+1 )

( ). Therefore, Eq. (7A.32) is:

after simple computation, and using Eq. (7A.30),


0

( ) =

=1

)
1

+1

( )

= 0, which leaves,

By taking the derivative in Eq. (7A.31), and replacing into the left hand side of the previous
equation, and solving for , yields,
s
2

which is the expression of the main text, for


(7A.31), leaves Eq. (7.39) in the main text.

(1

= 0. By replacing this expression of into Eq.

Finally, note that, now, the expression of the short-term rate can be found after simple computations:

1 2
1
( (1
)
) + ( ln )
( )= +
0
0
2
2


7.14 Appendix 8: An algorithm to simulate discrete-time pricing models


Consider the pricing equation,

The price-dividend ratio,

(
(

0)

say, satises:
0

0
1+

0+

The previous equation is a functional equation in ( ), say:


( )=

1+

A numerical solution can be implemented as follows. Create a grid and define


1
, for some . We have,
1

11

..
.

..
.

..
.

=1

= (

..
.

..
.

( ),

..
.

= Pr ( | )

= max , min and max are the boundaries in the


where
is the integration step, 1 = min ,
approximation, and Pr ( | ) is the transition density from state to state - in this case, a Gaussian
]> , = [ 1
]> , and let be a matrix with elements
.
transition density. Let = [ 1
The solution is,
(7A.33)
=(
) 1
The model can be simulated in the following manner. Let and be the boundaries of the underlying

. Draw states. State


is drawn. Then,
state process. Fix
=
1. If min {
and max =
2. If min {
and min =

min +

max

}=

, let

be the smallest integer close to

. Let

}=

, let

be the biggest integer close to

. Let

min

max

,
+

The previous algorithm avoids interpolations, and ensures that during the simulations, is computed
that is drawn. Precisely, once
is drawn, we proceed to
in correspondence of exactly the state

= max
the following two steps: (i) create the corresponding grid 1 = min , 2 = min +
according to the previous rules; and (ii) compute the solution from Eq. (7A.33). In this way, one has
is drawn.
( ) at hand, the simulated P/D ratio when state
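The linear-algebra step can be sketched in code. The example below is a minimal illustration, not the model simulated in the text: it assumes that the state is log dividend growth following a Gaussian AR(1) process, that the pricing kernel has the CRRA form beta * exp(-gamma * y'), and that Eq. (7A.33) reads p = (I - A)^(-1) b, with b collecting the row sums of A. All function names and parameter values are hypothetical. The grid is fixed here, so the grid-shifting rule described above for simulated states is not implemented.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            if factor != 0.0:
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def solve_pd_ratio(beta, gamma, mu, rho, sigma, n=121, width=6.0):
    """Price-dividend ratio on a grid for y' = mu (1 - rho) + rho y + sigma eps."""
    std_uncond = sigma / math.sqrt(1.0 - rho ** 2)
    y_min = mu - width * std_uncond
    step = 2.0 * width * std_uncond / (n - 1)       # integration step
    grid = [y_min + j * step for j in range(n)]
    a = []
    for y_i in grid:
        cond_mean = mu * (1.0 - rho) + rho * y_i
        row = []
        for y_j in grid:
            z = (y_j - cond_mean) / sigma
            density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
            # discount factor times dividend growth times transition probability
            row.append(beta * math.exp((1.0 - gamma) * y_j) * density * step)
        a.append(row)
    b = [sum(row) for row in a]
    i_minus_a = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)]
                 for i in range(n)]
    return grid, solve_linear(i_minus_a, b)

# Hypothetical parameters with i.i.d. growth (rho = 0).
grid, pd_ratio = solve_pd_ratio(beta=0.96, gamma=2.0, mu=0.02, rho=0.0, sigma=0.03)
```

With i.i.d. growth the price-dividend ratio is the same at every grid point and has the closed form k / (1 - k), where k = beta * exp((1 - gamma) * mu + 0.5 * (1 - gamma)^2 * sigma^2), which gives a simple way to validate the grid solution before turning to persistent states.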


7.15 Appendix 9: Heuristic details of learning in continuous time


We derive Eq. (7.47). The probability the dividend process in Eq. (7.45) has the good drift
conditional on the observation of ( )
is,
(

)=

(
) + (1

) (

+ (1

2
0

where the first equality follows by a straightforward generalization of the arguments leading to Eq.
(7.44), and the second holds by the assumption that 0
is Gaussian with mean zero and variance
2 . Hence, we have that (
)
is
a
function
of
the
dividend
only. To simplify notation, let ( )
0
). We have,
(

2
2
1
1
2
2
2
0
2
00
0
0
0
2 ( )
( )=2 ( )
( )= ( ) 2
1
2
0

Note that,
2

( )
(1
=
( )

2
0

and

( )=

|(

(2 (

1)

such that,
0

( )=

( ) (1

2
0
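The derivative of the posterior probability with respect to the observation can be verified numerically. The sketch below assumes a single observation equal to one of two drifts plus Gaussian noise with a common variance; the names `mu_good`, `mu_bad`, `var` and the numbers used are illustrative assumptions. It compares a finite-difference slope with the closed form pi (1 - pi) (mu_good - mu_bad) / var, which follows from Bayes' rule under these assumptions.

```python
import math

def posterior_good(x, prior, mu_good, mu_bad, var):
    """Posterior probability of the good drift after observing x = drift + Gaussian noise."""
    # The Gaussian normalizing constants cancel because both states share the same variance.
    like_good = math.exp(-0.5 * (x - mu_good) ** 2 / var)
    like_bad = math.exp(-0.5 * (x - mu_bad) ** 2 / var)
    return prior * like_good / (prior * like_good + (1.0 - prior) * like_bad)

def posterior_slope(x, prior, mu_good, mu_bad, var):
    """Closed-form derivative of the posterior with respect to the observation x."""
    pi = posterior_good(x, prior, mu_good, mu_bad, var)
    return pi * (1.0 - pi) * (mu_good - mu_bad) / var

# Hypothetical inputs, for illustration only.
args = dict(prior=0.5, mu_good=0.05, mu_bad=-0.02, var=0.04)
x, h = 0.03, 1e-6
finite_difference = (posterior_good(x + h, **args) - posterior_good(x - h, **args)) / (2 * h)
closed_form = posterior_slope(x, **args)
```

The slope is largest when the posterior is close to one half, that is, when the observer is most uncertain about the drift.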

Next, apply Itô's lemma to the function


in Eq. (7A.34),
(

)=

)
(
(

), where
1
2
1
+
2
+

00

( ))

) (1

00
00


2
2
0

(7A.34)

is solution to Eq. (7.46). By the expressions

(
(

( )

)
)
))

2
0


7.16 Appendix 10: Linear regime-switching economies


We prove a claim made in Section 7.5.3. Consider a complete markets economy where dividends,
consumption, and signals (
) satisfy:
/
/

(
(
(

0 0)

3)

6)

and
= ( 1 2 3 )> denotes a vector standard Brownian motion, with being a two-state ( )
are constants. Let 3 5 6= 2 6 . Then, there are no CRRA representative agent
Markov chain, and
equilibria in which price-dividend ratios are convex in expected dividend growth. To demonstrate this
claim, we apply the filtering results of Liptser and Shiryaev (2001) (Vol. I), and find that the previous
economy is isomorphic to one in which,
/
/

=
(

+
)

0

(

0, and is some vector


where
= ( 1 2 3 )> is a vector standard Brownian motion,
satisfying = 0. Standard arguments imply that, in equilibrium, the short-term rate, , is constant
( = 1 2 3). The claim follows by and Theorem 7.1.
and


7.17 Appendix 11: Bond price convexity revisited


Consider a short-term rate process
the current short-term rate is 0 :

, and let (

)=E

) be the price of a bond expiring at time

when

As pointed out in Section 7.6, Theorem 7.1-(ii) implies that in scalar diffusion models of the short-term
rate, such as those dealt with in Chapter 12, one has 11 ( 0 ) 0 whenever 00 2, where
is the risk-neutralized drift of . This result, obtained by Mele (2003), can be proved through the
Feynman-Kac representation of 11 , and a similar proof can be used to show Theorem 7.1-(ii). This
appendix provides a more intuitive derivation under a set of simplifying assumptions. By Eq. (6) p.
685 in Mele (2003),
"Z
! R
#
2 Z
2

11 ( 0

Hence

11 ( 0

)=E

0 whenever

2
2
0

= ( )
is a constant. We have,
Z
1
0
= exp
( )
2
0
0

(7A.35)

is solution to:

To keep the presentation simple, assume

where

2
0

2
0

and

2
0

=
0

00

( )
0

(7A.36)

2
Therefore, if 00 0, then 2
0, and by the inequality in (12.56), 11 0.
0
This result can be improved. Suppose that 00 2, instead of 00 0. By the second of Eqs. (7A.36),
Z
2
2
2
0

and consequently,
Z

2
2
0

which is the inequality in (12.56).


References
Abel, A.B. (1988): Stock Prices under Time-Varying Dividend Risk: An Exact Solution in
an Infinite-Horizon General Equilibrium Model. Journal of Monetary Economics 22,
375-393.
Abel, A.B. (1990): Asset Prices under Habit Formation and Catching Up with the Joneses.
American Economic Review Papers and Proceedings 80, 38-42.
Andersen, T. G., T. Bollerslev and F.X. Diebold (2002): Parametric and Nonparametric
Volatility Measurement. Forthcoming in Aït-Sahalia, Y. and L. P. Hansen (Eds.): Handbook of Financial Econometrics.
Bajeux-Besnainou, I. and J.-C. Rochet (1996): Dynamic Spanning: Are Options an Appropriate Instrument? Mathematical Finance 6, 1-16.
Barberis, N., M. Huang and T. Santos (2001): Prospect Theory and Asset Prices. Quarterly
Journal of Economics 116, 1-53.
Barsky, R.B. (1989): Why Don't the Prices of Stocks and Bonds Move Together? American
Economic Review 79, 1132-1145.
Barsky, R.B. and J.B. De Long (1990): Bull and Bear Markets in the Twentieth Century.
Journal of Economic History 50, 265-281.
Barsky, R.B. and J.B. De Long (1993): Why Does the Stock Market Fluctuate? Quarterly
Journal of Economics 108, 291-311.
Bergman, Y.Z., B.D. Grundy, and Z. Wiener (1996): General Properties of Option Prices.
Journal of Finance 51, 1573-1610.
Black, F. and M. Scholes (1973): The Pricing of Options and Corporate Liabilities. Journal
of Political Economy 81, 637-659.
Brennan, M.J. and Y. Xia (2001): Stock Price Volatility and Equity Premium. Journal of
Monetary Economics 47, 249-283.
Brunnermeier, M.K. and S. Nagel (2007): Do Wealth Fluctuations Generate Time-Varying
Risk Aversion? Micro-Evidence on Individuals' Asset Allocation. Forthcoming in American Economic Review.
Campbell, J.Y. (2003): Consumption-Based Asset Pricing. In: Constantinides, G.M., M.
Harris and R. M. Stulz (Editors): Handbook of the Economics of Finance (Volume 1B:
Chapter 13), 803-887.
Campbell, J.Y., and J.H. Cochrane (1999): By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior. Journal of Political Economy 107, 205-251.
Christiansen, C., M. Schmeling and A. Schrimpf (2012): A Comprehensive Look at Financial
Volatility Prediction by Economic Variables. Journal of Applied Econometrics 27, 956-977.

Clark, T.E. and K.D. West (2007): Approximately Normal Tests for Equal Predictive Accuracy in Nested Models. Journal of Econometrics 138, 291-311.
Constantinides, G.M. (1990): Habit Formation: A Resolution of the Equity Premium Puzzle.
Journal of Political Economy 98, 519-543.
Corradi, V., W. Distaso and A. Mele (2013): Macroeconomic Determinants of Stock Volatility
and Volatility Premiums. Journal of Monetary Economics 60, 203-220.
David, A. (1997): Fluctuating Confidence in Stock Markets: Implications for Returns and
Volatility. Journal of Financial and Quantitative Analysis 32, 427-462.
Detemple, J.B. (1986): Asset Pricing in a Production Economy with Incomplete Information.
Journal of Finance 41, 383-391.
Duesenberry, J.S. (1949): Income, Saving, and the Theory of Consumer Behavior. Cambridge,
Mass.: Harvard University Press.
El Karoui, N., M. Jeanblanc-Picqué and S.E. Shreve (1998): Robustness of the Black and
Scholes Formula. Mathematical Finance 8, 93-126.
Fama, E.F. and K.R. French (1989): Business Conditions and Expected Returns on Stocks
and Bonds. Journal of Financial Economics 25, 23-49.
Fama, E.F. (2014): Nobel Lecture: Two Pillars of Asset Pricing. American Economic Review
104, 1467-1485.
Ferson, W.E. and C.R. Harvey (1991): The Variation of Economic Risk Premiums. Journal
of Political Economy 99, 385-415.
Fornari, F. and A. Mele (2013): Financial Volatility and Real Economic Activity. Journal
of Financial Management, Markets and Institutions 1, 155-198.
Gabaix, X. (2009): Linearity-Generating Processes: A Modelling Tool Yielding Closed Forms
for Asset Prices. Working paper New York University.
Gennotte, G. (1986): Optimal Portfolio Choice Under Incomplete Information. Journal of
Finance 41, 733-746.
Giacomini, R. and H. White (2006): Tests of Conditional Predictive Ability. Econometrica
74, 1545-1578.
Glosten, L., R. Jagannathan and D. Runkle (1993): On the Relation between the Expected
Value and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance
48, 1779-1801.
Gordon, M. (1962): The Investment, Financing, and Valuation of the Corporation. Homewood,
IL: Irwin.
Hajek, B. (1985): Mean Stochastic Comparison of Diffusions. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 68, 315-329.

Huang, C.-F. and H. Pagès (1992): Optimal Consumption and Portfolio Policies with an
Infinite Horizon: Existence and Convergence. Annals of Applied Probability 2, 36-64.
Jagannathan, R. (1984): Call Options and the Risk of Underlying Securities. Journal of
Financial Economics 13, 425-434.
Karatzas, I. and S.E. Shreve (1991): Brownian Motion and Stochastic Calculus. Berlin:
Springer-Verlag.
Kijima, M. (2002): Monotonicity and Convexity of Option Prices Revisited. Mathematical
Finance 12, 411-426.
Liptser, R. S. and A. N. Shiryaev (2001): Statistics of Random Processes. Berlin: Springer-Verlag. [2001a: Vol. I (General Theory). 2001b: Vol. II (Applications).]
Ljungqvist, L. and H. Uhlig (2000): Tax Policy and Aggregate Demand Management under
Catching Up with the Joneses. American Economic Review 90, 356-366.
Malkiel, B. (1979): The Capital Formation Problem in the United States. Journal of Finance
34, 291-306.
Mehra, R. and E.C. Prescott (2003): The Equity Premium in Retrospect. In Constantinides,
G.M., M. Harris and R. M. Stulz (Editors): Handbook of the Economics of Finance (Volume 1B, chapter 14), 889-938.
Mele, A. (2003): Fundamental Properties of Bond Prices in Models of the Short-Term Rate.
Review of Financial Studies 16, 679-716.
Mele, A. (2005): Rational Stock Market Fluctuations. WP FMG-LSE.
Mele, A. (2007): Asymmetric Stock Market Volatility and the Cyclical Behavior of Expected
Returns. Journal of Financial Economics 86, 446-478.
Menzly, L., T. Santos and P. Veronesi (2004): Understanding Predictability. Journal of
Political Economy 111, 1, 1-47.
Pindyck, R. (1984): Risk, Inflation and the Stock Market. American Economic Review 74,
335-351.
Paye, B.S. (2012): Déjà Vol: Predictive Regressions for Aggregate Stock Market Volatility
Using Macroeconomic Variables. Journal of Financial Economics 106, 527-546.
Poterba, J. and L. Summers (1985): The Persistence of Volatility and Stock Market Fluctuations. American Economic Review 75, 1142-1151.
Romano, M. and N. Touzi (1997): Contingent Claims and Market Completeness in a Stochastic Volatility Model. Mathematical Finance 7, 399-412.
Rothschild, M. and J. Stiglitz (1970): Increasing Risk: I. A Definition. Journal of Economic
Theory 2, 225-243.
Rothschild, M. and J. Stiglitz (1971): Increasing Risk: II. Its Economic Consequences. Journal of Economic Theory 5, 66-84.

Ryder, H.E. and G.M. Heal (1973): Optimal Growth with Intertemporally Dependent Preferences. Review of Economic Studies 40, 1-33.
Schwert, G.W. (1989a): Why Does Stock Market Volatility Change Over Time? Journal of
Finance 44, 1115-1153.
Schwert, G.W. (1989b): Business Cycles, Financial Crises and Stock Volatility. Carnegie-Rochester Conference Series on Public Policy 31, 83-125.
Shiller, R.J. (2014): Nobel Lecture: Speculative Asset Prices. American Economic Review
104, 1486-1517.
Sundaresan, S.M. (1989): Intertemporally Dependent Preferences and the Volatility of Consumption and Wealth. Review of Financial Studies 2, 73-89.
Timmermann, A. (1993): How Learning in Financial Markets Generates Excess Volatility
and Predictability in Stock Prices. Quarterly Journal of Economics 108, 1135-1145.
Timmermann, A. (1996): Excess Volatility and Return Predictability of Stock Returns in
Autoregressive Dividend Models with Learning. Review of Economic Studies 63, 523-577.
Veronesi, P. (1999): Stock Market Overreaction to Bad News in Good Times: A Rational
Expectations Equilibrium Model. Review of Financial Studies 12, 975-1007.
Veronesi, P. (2000): How Does Information Quality Affect Stock Returns? Journal of Finance
55, 807-837.
Wang, S. (1993): The Integrability Problem of Asset Prices. Journal of Economic Theory
59, 199-213.


8
Macrofinance

8.1 Introduction
This chapter discusses models that aim to address the empirical puzzles surveyed in the previous two chapters. The most prominent are the equity premium puzzle, its pattern over the
business cycle as well as that of aggregate stock volatility. Because these puzzles mainly relate
to the behavior of aggregate variables, it is natural to attempt to provide explanations hinging
upon macroeconomic variables such as consumption or output, whence the title of this chapter.
The equity premium puzzle is the difficulty the neoclassical model has in predicting expected
returns that are quantitatively consistent with those in the data. In particular, Chapter 6
explains that an implausibly high level of risk-aversion is needed to reconcile models with data.
Moreover, a high risk-aversion implies a low elasticity of intertemporal substitution and, hence,
an implausibly high volatility of the interest rates, which gives rise to the interest rate puzzle.
In the early attempts to address the equity premium puzzle, the assumption of a representative agent with CRRA preferences was replaced with that of a representative agent with
non-expected utility. In the non-expected utility framework, risk-aversion can be understood
independently of the elasticity of intertemporal substitution. This approach, described in Section 8.2, does not by itself resolve the equity premium puzzle. We shall explain that
within the non-expected utility model, a possible resolution of the puzzle requires that the
price-dividend ratio be affected by a number of state variables. One instance of state variables
identified in the literature is long-run risk, defined as a mechanism capable of turning even
a small shock into economic damage lasting for years. For example, it has been argued that
expected consumption growth is highly persistent, such that shocks to expected growth have
long-lasting implications. While the model has the potential to address the equity premium
puzzle, we shall explain that there are questions regarding the realism of the precise mechanism
underlying it.
Section 8.3 explores channels that address the equity premium puzzle based on a variant
of habit formation. The external habit formation model reviewed in Chapter 7 relies on the
existence of a representative agent with high risk-aversion. Alternatively, one can consider
economies with heterogeneous agents, in which each agent has a consumption reference that he
wants to benchmark to ("catching up with the Joneses"). The economy is heterogeneous in that
each agent displays a different curvature of his utility function. In bad times, when markets
are down, the less risk-averse agents are those who suffer the most, reflecting their previous
relatively more aggressive investment decisions. Thus, their relative wealth and, hence, social
weight decreases. That is, the aggregate risk-aversion increases in bad times, leading to the
countercyclical mechanisms explained in the previous chapter. This model has the potential to
explain the equity premium, but is also subject to a number of caveats.
Section 8.4 describes economies in which agents can be hit by idiosyncratic shocks, a risk
that they cannot hedge against through security trading. How is it that idiosyncratic shocks
could affect asset prices? The key assumption of the models we survey is that although agents
are ex-ante the same, they will be affected by shocks that have different amplitudes. A job loss
is one important instance that illustrates how these models work. Recessions do not necessarily
affect agents in the same way. Some agents can be hurt more than others. The possibility of
a job loss can, then, induce agents to act prudently while investing in the stock market. That
is, the risk premium is countercyclical. This explanation can lead to a potential resolution of the
equity premium puzzle, theoretically at least.
While idiosyncratic risk is a clear explanation, its empirical implications do not seem to
be entirely exhaustive. Note that a natural hedge against idiosyncratic risk is self-insurance,
that is, the ability to save in good times to cope with adversities possibly occurring in bad times.
Agents might actually eliminate a large portion of idiosyncratic risk by insuring themselves
while having access to capital markets, thereby making idiosyncratic risk practically irrelevant
to the explanation of the equity premium. For idiosyncratic risk to really matter, we would
need to observe a large and persistent idiosyncratic risk, or capital market transactions to
be so expensive as to prevent agents from implementing self-insurance plans. But empirically,
idiosyncratic risks do not appear to be as large and persistent, and market transaction costs
are not as large, as required by our standard models with idiosyncratic risk. Naturally, there are
historical instances in which idiosyncratic risk may have a better appeal: the Great Recession
that followed the 2007 crisis is one that teaches us that idiosyncratic risks can be quite persistent.
Section 8.5 considers an alternative channel, market incompleteness. Economies with incomplete markets are not necessarily consistent with a sizeable equity premium. If agents have
comparable access to equity markets, they share the same level of risk, such that the premium
they require is similar to that in economies with complete markets. But market incompleteness
has the potential to resolve the puzzle, when a large fraction of the agents do not participate.
The mechanism is simple. If the markets are being shut down to a large proportion of agents,
the agents who participate are concerned for the aggregate macroeconomic risk they bear. This
concern leads them to require a sizeable premium.
Sections 8.6 and 8.7 deal with economies where agents have differences in beliefs and also
uncertainty (as opposed to risk) regarding the fundamentals. [In progress]
Section 8.8 deals with issues arising within production-based economies. In these economies,
consumption is endogenous, and an increase in the agents' risk aversion might actually lead
to a decreased consumption volatility. One additional important difficulty in these economies
is that capital supply is infinitely elastic, such that the price of capital is quite smooth. To
increase capital price volatility, we need hindrances in the capital formation process, such as
the presence of adjustment costs, or an added volatility of the demand for capital, obtained
for example when agents have habit formation over their consumption plans. Both rigidities in
the capital formation process and volatility in the demand for capital are needed in order to
explain the equity premium.
Section 8.9: In progress.
Section 8.10 presents a simple model to assess a very old hypothesis in financial economics:
the extent to which equity volatility can be explained by firms' leverage.
Section 8.11 reviews recent models capable of explaining the cross-section of asset returns,
relying on multiple trees.
Section 8.12 surveys predictions that the previous models make about the yield curve. Chapter
12 contains many more models and explanations of the yield curve and its relation to
macroeconomic developments.
Section 8.13 deals with an intriguing topic: what do financial economists and macroeconomists
really have in common? Granted, in many of the models surveyed in this chapter, we aim to
understand asset prices in a context of the business cycle. However, these models are built upon a
revamp of many of the assumptions underlying the neo-classical paradigm. Yet macroeconomists
do not necessarily seem to acknowledge our asset pricing lessons. Are macroeconomists mistaken?
Or is there a case for a modern version of a dichotomy between the real and the financial
spheres of the economy? A simple model shows there is a potential for the hypothesis of a
separation between finance and macroeconomics. However, this potential is seriously undermined
once we allow financial markets to feed back into the economy.
Section 8.14 concludes the chapter and reviews explanations of macroeconomic developments
relying on such asset price feedbacks. [In progress]

8.2 Non-expected utility


The standard intertemporal additive separable utility function confounds intertemporal
substitution effects and attitudes towards risk. This fact is problematic. Epstein and Zin (1989, 1991)
and Weil (1989) consider a class of recursive, but not necessarily expected utility, preferences.
In this section, we present some details of this approach, without insisting on the theoretic
underpinnings, which the reader will find in Epstein and Zin (1989). We provide a basic definition
and derivation of this class of preferences, and analyze its asset pricing implications.
8.2.1 Recursive formulations
Consider the following standard case, in which a decision maker displays additive preferences,
in that his continuation utility $V_t$ satisfies:

$$V_t = u(c_t) + \beta E_t(V_{t+1}) \equiv u(c_t) + \beta u(\hat{c}_{t+1}) \tag{8.1}$$

where $u$ is a von Neumann-Morgenstern utility function, and $\hat{c}_{t+1}$ is the certainty equivalent in
terms of consumption, i.e., $\hat{c}_{t+1}: u(\hat{c}_{t+1}) = E_t(V_{t+1})$. Next, let us define the certainty equivalent
continuation utility expressed in consumption units, as $v_t: V_t = u(v_t)$, such that we can write
Eq. (8.1) as:

$$v_t = W(c_t, \hat{c}_{t+1}) \equiv u^{-1}\left(u(c_t) + \beta u(\hat{c}_{t+1})\right)$$

where $W$ is the aggregator, and where by construction, $\hat{c}_{t+1}$ is also the certainty-equivalent
utility at $t+1$, defined as $u(\hat{c}_{t+1}) = E_t[u(v_{t+1})]$.
Consider, for example, the standard CRRA utility function, in which case the aggregator is:

$$W(c, \hat{c}) = \left(c^{1-\eta} + \beta \hat{c}^{1-\eta}\right)^{\frac{1}{1-\eta}}$$

In this formulation, the decision maker's attitude vis-à-vis risk is encoded into the certainty
equivalent $\hat{c}_{t+1}$ through the utility $u$ and, as is well-known (see Chapter 3), in this specific
example, the CRRA $\eta$ is exactly equal to the reciprocal of the elasticity of intertemporal
substitution (EIS). But there are no economic reasons for this coincidence. An alternative
assumption regarding the aggregator $W$ is that CRRA and EIS are not tied up as in the
previous example. Consider the following aggregator:

$$W(c, \hat{c}) = \left(c^{\rho} + \beta \hat{c}^{\rho}\right)^{\frac{1}{\rho}} \quad \text{and} \quad u(x) = \frac{x^{1-\eta}}{1-\eta} \tag{8.2}$$

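for three positive constants $\rho$, $\eta$ and $\beta$. In this formulation, risk-attitudes for static wealth
gambles still have the classical CRRA flavor. Precisely, we say that $\eta$ is the CRRA for static
wealth gambles, and $\psi \equiv (1-\rho)^{-1}$ is the EIS. We still have

$$\hat{c}_{t+1} = u^{-1}\left[E_t\left(u(v_{t+1})\right)\right] = \left[E_t\left(v_{t+1}^{1-\eta}\right)\right]^{\frac{1}{1-\eta}}$$

and the parametrization for $W$ in Eq. (8.2) then implies that

$$v_t = \left(c_t^{\rho} + \beta\left[E_t\left(v_{t+1}^{1-\eta}\right)\right]^{\frac{\rho}{1-\eta}}\right)^{\frac{1}{\rho}} \tag{8.3}$$

Naturally, in the absence of uncertainty, $v_t^{\rho} = c_t^{\rho} + \beta v_{t+1}^{\rho}$, another clear illustration that $\psi$ is
the EIS. Also, it is straightforward to verify that as soon as the CRRA equals the reciprocal of
the EIS, i.e., $\eta = \psi^{-1} = 1-\rho$, $v_t$ in Eq. (8.3) collapses to the standard intertemporal additive utility,
$v_t^{1-\eta} = E_t\left(\sum_{n=0}^{\infty}\beta^{n} c_{t+n}^{1-\eta}\right)$.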
Let us return to the general case. Let us recall that $v_t$ is the certainty equivalent continuation
utility expressed in consumption units; in some cases (e.g., in Section 8.13), it is both convenient
and intuitive to formulate problems in terms of the continuation utility $V_t = u(v_t)$, such that
we can write Eq. (8.3) as:

$$V_t = \frac{1}{1-\eta}\left(c_t^{\rho} + \beta\left((1-\eta)E_t(V_{t+1})\right)^{\frac{\rho}{1-\eta}}\right)^{\frac{1-\eta}{\rho}} \tag{8.4}$$

In the appendix, we shall rely on Eq. (8.4) while deriving the asset pricing implications of
portfolio choices within a representative agent framework.
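The recursion of this section can be checked numerically on a one-step example. The sketch below is only illustrative: it assumes a CES aggregator $(c^{\rho} + \beta\,\hat{c}^{\rho})^{1/\rho}$ and a power certainty equivalent, and the names `rho`, `eta`, `beta` and the two-state lottery are labels and numbers chosen for this sketch, not values taken from the text. It verifies that when the CRRA equals the reciprocal of the EIS (here `rho = 1 - eta`), the recursive value collapses to the additive case.

```python
import math

def certainty_equivalent(values, probs, eta):
    """Power certainty equivalent [E(v^(1-eta))]^(1/(1-eta)); log case when eta == 1."""
    if abs(eta - 1.0) < 1e-12:
        return math.exp(sum(p * math.log(v) for v, p in zip(values, probs)))
    moment = sum(p * v ** (1.0 - eta) for v, p in zip(values, probs))
    return moment ** (1.0 / (1.0 - eta))

def ez_value(c, next_values, probs, beta, rho, eta):
    """One step of the recursion: v = (c^rho + beta * CE^rho)^(1/rho)."""
    ce = certainty_equivalent(next_values, probs, eta)
    return (c ** rho + beta * ce ** rho) ** (1.0 / rho)

# Hypothetical numbers: two equally likely continuation values.
beta, eta = 0.95, 3.0
next_values, probs = [0.8, 1.3], [0.5, 0.5]

# Special case rho = 1 - eta (CRRA equal to the reciprocal of the EIS).
v_special = ez_value(1.0, next_values, probs, beta, rho=1.0 - eta, eta=eta)

# Additive counterpart: c^(1-eta) + beta * E[v'^(1-eta)], with c = 1.
additive = 1.0 + beta * sum(p * v ** (1.0 - eta) for v, p in zip(next_values, probs))
```

In this special case `v_special ** (1 - eta)` coincides with `additive`; for any other `rho` the two differ, which is the sense in which the aggregator separates risk attitudes from intertemporal substitution.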
8.2.2 Testable restrictions

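Let us define cum-dividend wealth as $x_t \equiv \sum_{i=1}^{N}(P_{it} + D_{it})\,n_{it}$. In the Appendix, we show that
$x_t$ evolves as follows:

$$x_{t+1} = (x_t - c_t)\,\omega_t^{\top}(\mathbf{1}_N + \mathbf{r}_{t+1}) \equiv (x_t - c_t)(1 + r_{M,t+1}) \tag{8.5}$$

where $\omega_t$ is the vector of proportions of wealth invested in the $N$ assets, $\mathbf{r}_{t+1}$ is the vector of asset
returns, with any component being equal to $r_{i,t+1} \equiv \frac{P_{i,t+1} + D_{i,t+1} - P_{it}}{P_{it}}$, and $r_{M,t+1}$ is the return on
the market portfolio, defined as,

$$r_{M,t+1} \equiv \sum_{i=1}^{N}\omega_{it}\, r_{i,t+1}$$

where $P_{it}$ and $D_{it}$ are the price and the dividend of asset $i$ at time $t$: the usual trading convention
is that the portfolio $n_{t+1}$ is chosen at time $t$.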
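Let us consider a Markov economy in which the underlying state is some process $y_t$. We
consider stationary consumption and investment plans. Accordingly, let the stationary utility be
a function $V(x, y)$ when current wealth is $x$ and the state is $y$. By Eq. (8.4),

$$V(x_t, y_t) = \max_{c,\,\omega}\ \frac{1}{1-\eta}\left(c_t^{\rho} + \beta\left((1-\eta)E_t\left(V(x_{t+1}, y_{t+1})\right)\right)^{\frac{\rho}{1-\eta}}\right)^{\frac{1-\eta}{\rho}} \tag{8.6}$$

In the Appendix, we show that the first order conditions for the representative agent lead to
the following Euler equation,

$$E_t\left[m(x_t, y_t; x_{t+1}, y_{t+1})\,(1 + r_{i,t+1})\right] = 1, \quad i = 1, \dots, N \tag{8.7}$$

where the stochastic discount factor is

$$m(x_t, y_t; x_{t+1}, y_{t+1}) = \beta^{\theta}\left(\frac{c(x_{t+1}, y_{t+1})}{c(x_t, y_t)}\right)^{-\frac{\theta}{\psi}}(1 + r_{M,t+1})^{\theta-1}, \quad \theta \equiv \frac{1-\eta}{1-\frac{1}{\psi}} \tag{8.8}$$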

This stochastic discount factor displays the interesting property of being affected by the market
portfolio return, $r_M$, at least as soon as $\eta \neq \frac{1}{\psi}$. In particular, when $\theta < 1$, the stochastic
discount factor is countercyclical not only through consumption growth, but also through the
market return, by its property to give more weight to states of nature when $r_M$ is low than
to states when $r_M$ is high. Moreover, the stochastic discount factor may potentially inherit the
excess volatility of market returns in a quite natural fashion.
Note, then, the interesting fixed point problem in this model: market returns affect the
stochastic discount factor, which affects market returns in turn! Due to this fixed-point, asset
prices predicted by these models are not known in closed-form, except for isolated exceptions.
Furthermore, the potential of this model to explain the empirical puzzles needs to be further
qualified, as discussed in the next section.
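To see how the market return enters the pricing kernel, the following sketch evaluates a stochastic discount factor of the standard Epstein-Zin form, $\beta^{\theta} g^{-\theta/\psi} (1+r_M)^{\theta-1}$ with $\theta = (1-\eta)/(1-1/\psi)$. The functional form is the usual one from this literature and is taken here as an assumption; the function names and all numbers are hypothetical.

```python
def ez_theta(eta, psi):
    """theta = (1 - eta) / (1 - 1/psi); equals 1 when eta = 1/psi."""
    return (1.0 - eta) / (1.0 - 1.0 / psi)

def ez_sdf(cons_growth, market_gross_return, beta, eta, psi):
    """SDF beta^theta * g^(-theta/psi) * (1 + r_M)^(theta - 1).

    cons_growth is gross consumption growth c'/c; market_gross_return is 1 + r_M.
    """
    theta = ez_theta(eta, psi)
    return (beta ** theta
            * cons_growth ** (-theta / psi)
            * market_gross_return ** (theta - 1.0))

# With eta = 1/psi the market return drops out and the SDF is the CRRA one.
crra_like_low = ez_sdf(1.02, 0.90, beta=0.98, eta=2.0, psi=0.5)
crra_like_high = ez_sdf(1.02, 1.10, beta=0.98, eta=2.0, psi=0.5)

# With theta < 1, states with a low market return receive more weight.
sdf_bad_state = ez_sdf(1.02, 0.90, beta=0.98, eta=5.0, psi=1.5)
sdf_good_state = ez_sdf(1.02, 1.10, beta=0.98, eta=5.0, psi=1.5)
```

The first pair of values is identical, while in the second pair the bad-state value exceeds the good-state one, which is the countercyclical property discussed above.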
8.2.3 Risk premiums and interest rates
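So the Euler equation is,

$$E_t\left[\beta^{\theta}\left(\frac{c_{t+1}}{c_t}\right)^{-\frac{\theta}{\psi}}(1 + r_{M,t+1})^{\theta-1}(1 + r_{i,t+1})\right] = 1 \tag{8.9}$$

Eq. (8.9) obviously holds for the market portfolio and the risk-free asset. Therefore, taking
logs in Eq. (8.9) for $i = M$, and for the risk-free asset, $i = 0$ say, yields the following conditions:

$$0 = \ln E_t\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\ln\left(\frac{c_{t+1}}{c_t}\right) + \theta R_{M,t+1}\right)\right] \tag{8.10}$$

where $\delta \equiv -\ln\beta$ and $R_{M,t+1} \equiv \ln(1 + r_{M,t+1})$, and

$$-\ln(1 + r_{f,t}) = \ln E_t\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\ln\left(\frac{c_{t+1}}{c_t}\right) + (\theta-1)R_{M,t+1}\right)\right] \tag{8.11}$$

Next, suppose that consumption growth, $\ln\frac{c_{t+1}}{c_t}$, and the market portfolio return, $R_{M,t+1}$,
are jointly normally distributed. In the appendix, we show that the expected excess return on
the market portfolio is given by

$$E_t(R_{M,t+1}) - R_{f,t} + \frac{1}{2}\sigma_{M,t}^2 = \frac{\theta}{\psi}\sigma_{Mc,t} + (1-\theta)\sigma_{M,t}^2 \tag{8.12}$$

where $R_{f,t} \equiv \ln(1 + r_{f,t})$, $\sigma_{M,t}^2 = var_t(R_{M,t+1})$ and $\sigma_{Mc,t} = cov_t\left(R_{M,t+1}, \ln\frac{c_{t+1}}{c_t}\right)$, and the term $\frac{1}{2}\sigma_{M,t}^2$ on the left
hand side is a Jensen's inequality term. Note, Eq. (8.12) is a mixture of the Consumption
CAPM (for the part $\frac{\theta}{\psi}\sigma_{Mc,t}$) and the CAPM (for the part $(1-\theta)\sigma_{M,t}^2$).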
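The risk-free rate is

$$R_{f,t} = \delta + \frac{1}{\psi}E_t\left(\ln\frac{c_{t+1}}{c_t}\right) - \frac{1}{2}(1-\theta)\sigma_{M,t}^2 - \frac{1}{2}\frac{\theta}{\psi^2}\sigma_{c,t}^2 \tag{8.13}$$

where $\sigma_{c,t}^2 = var_t\left(\ln\frac{c_{t+1}}{c_t}\right)$.
Eqs. (8.12) and (8.13) can be elaborated further. In equilibrium, the asset price and, hence,
the asset return, is certainly related to consumption volatility. Precisely, assume that

$$\sigma_{M,t}^2 = \sigma_c^2 + \sigma_*^2, \qquad \sigma_{Mc,t} = \sigma_c^2 \tag{8.14}$$

where $\sigma_*^2$ is a positive constant that may arise when the asset return is driven by some additional
state variable.$^1$ Under the assumption that the asset return volatility is as in Eq. (8.14), the
equity premium in Eq. (8.12) is:

$$E_t(R_{M,t+1}) - R_{f,t} + \frac{1}{2}\sigma_{M,t}^2 = \eta\sigma_c^2 + (1-\theta)\sigma_*^2 \tag{8.15}$$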

Disentangling risk-aversion from intertemporal substitution does not necessarily lead to a
resolution of the equity premium puzzle. To raise the equity premium, we need that $\sigma_*^2 > 0$,
meaning that additional state variables are needed, to drive variation of asset returns. At the
same time, the volatility of these state variables has the power to affect asset returns only when
risk-aversion is distinct from the reciprocal of the EIS. As an example, suppose that $\sigma_*^2$ does
not depend on $\eta$ and $\psi$, and that $\psi > 1$. Then, the equity premium increases with $\sigma_*^2$ whenever
$\eta > \frac{1}{\psi}$. In other words, these state variables have the potential to inflate the equity premium,
but only once they enter the stochastic discount factor through the market return. Section 8.2.5
shall illustrate this feature of the model through a mechanism relying on long-run risks.
Next, we derive the risk-free rate. Assume that $E_t\left(\ln\frac{c_{t+1}}{c_t}\right) = g_0 - \frac{1}{2}\sigma_c^2$, where $g_0$ is the
expected consumption growth, a constant. Furthermore, use the assumptions in Eq. (8.14) to
obtain that the risk-free rate in Eq. (8.13) is,

$$R_f = \delta + \frac{1}{\psi}g_0 - \frac{1}{2}\eta\left(1 + \frac{1}{\psi}\right)\sigma_c^2 - \frac{1}{2}\,\frac{\eta - \frac{1}{\psi}}{1 - \frac{1}{\psi}}\,\sigma_*^2 \tag{8.16}$$

As we can see, we may increase the level of relative risk-aversion, $\eta$, without substantially
affecting the level of the risk-free rate, $R_f$. This is because the effects of $\eta$ on $R_f$ are of a
second-order importance (they multiply variances, which are orders of magnitude less than the
expected consumption growth, $g_0$).
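The premium and the risk-free rate of this section can be evaluated side by side. The sketch below is a hedged illustration under explicit assumptions: log-normal returns and consumption growth, the decomposition "return variance = consumption variance + an extra component" with the return-consumption covariance equal to the consumption variance, and expected log growth equal to `g0 - 0.5 * sigma_c2`. The function name, argument names and all numbers are hypothetical.

```python
def ez_premium_and_rate(eta, psi, delta, g0, sigma_c2, sigma_star2):
    """Return (premium incl. Jensen term, log risk-free rate).

    Assumes var(R_M) = sigma_c2 + sigma_star2 and cov(R_M, growth) = sigma_c2.
    """
    theta = (1.0 - eta) / (1.0 - 1.0 / psi)
    sigma_m2 = sigma_c2 + sigma_star2      # assumed return variance
    sigma_mc = sigma_c2                    # assumed covariance with growth
    # Consumption-CAPM part plus CAPM part.
    premium = (theta / psi) * sigma_mc + (1.0 - theta) * sigma_m2
    expected_growth = g0 - 0.5 * sigma_c2
    risk_free = (delta + expected_growth / psi
                 - 0.5 * (1.0 - theta) * sigma_m2
                 - 0.5 * (theta / psi ** 2) * sigma_c2)
    return premium, risk_free

# Hypothetical calibration: 2% consumption volatility, 4% extra return volatility.
premium, risk_free = ez_premium_and_rate(
    eta=10.0, psi=1.5, delta=0.02, g0=0.02, sigma_c2=0.0004, sigma_star2=0.0016)

# Benchmark with eta = 1/psi: the extra variance is not priced.
premium_crra_a, _ = ez_premium_and_rate(2.0, 0.5, 0.02, 0.02, 0.0004, 0.0)
premium_crra_b, _ = ez_premium_and_rate(2.0, 0.5, 0.02, 0.02, 0.0004, 0.0016)
```

Under these assumptions the premium simplifies to `eta * sigma_c2 + (1 - theta) * sigma_star2`, so the extra variance matters only when `theta` differs from one.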
8.2.4 Campbell-Shiller approximation
Consider the definition of the return on the market portfolio,

$$R_{M,t+1} = \ln\left(\frac{P_{t+1} + D_{t+1}}{P_t}\right) = \ln\left(e^{z_{t+1}} + 1\right) - z_t + g_{t+1}$$

where $P_t$ is the value of the market portfolio, $g_{t+1} = \ln\frac{D_{t+1}}{D_t}$ is the aggregate dividend growth,
and $z_t = \ln\frac{P_t}{D_t}$ is the log of the aggregate price-dividend ratio. A first-order linear approximation
of $\ln\left(e^{z_{t+1}} + 1\right)$ around the average level of $z_t$ leaves,

$$R_{M,t+1} \approx \kappa_0 + \kappa_1 z_{t+1} - z_t + g_{t+1} \tag{8.17}$$

where $\kappa_0 = \ln\left(e^{\bar{z}} + 1\right) - \kappa_1\bar{z}$, $\kappa_1 = \frac{e^{\bar{z}}}{e^{\bar{z}} + 1}$ and $\bar{z}$ is the average level of the log price-dividend
ratio, such that $\kappa_1 \approx 0.997$, using US data. The approximation in Eq. (8.17) appears for the
first time in Campbell and Shiller (1988).

1 This is, for example, the case of the Bansal and Yaron (2004) model described below.
This approximation allows us to characterize the forces that are potentially capable of addressing
the equity premium puzzle. We know that agents must require compensation for risk that goes
well beyond that regarding consumption growth. Let us characterize, then, the innovations to
the stochastic discount factor predicted by this model. By Eq. (8.8) and the approximation in
Eq. (8.17),

$$\ln m_{t+1} - E_t(\ln m_{t+1}) = -\eta\left(g_{t+1} - E_t(g_{t+1})\right) - (1-\theta)\kappa_1\left(z_{t+1} - E_t(z_{t+1})\right) \tag{8.18}$$

where $g_{t+1} \equiv \ln\frac{c_{t+1}}{c_t}$ and $m_{t+1} \equiv m(x_t, y_t; x_{t+1}, y_{t+1})$. That is, there can be additional sources
of risk the agents require to be compensated for, provided that (i) the price-dividend ratio is
random and (ii) agents have a preference for an early resolution of uncertainty ($\theta < 1$). We now
illustrate how this mechanism operates in the context of risks that may be very persistent.
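The accuracy of the log-linear return approximation of this section is easy to inspect numerically. The sketch below assumes the standard Campbell-Shiller constants, $\kappa_1 = e^{\bar z}/(1+e^{\bar z})$ and $\kappa_0 = \ln(1+e^{\bar z}) - \kappa_1 \bar z$, and backs out the average log price-dividend ratio from the value $\kappa_1 = 0.997$ quoted above; function names and the test points are chosen for this illustration only.

```python
import math

def cs_constants(z_bar):
    """Linearization constants around the mean log price-dividend ratio z_bar."""
    kappa1 = math.exp(z_bar) / (1.0 + math.exp(z_bar))
    kappa0 = math.log(1.0 + math.exp(z_bar)) - kappa1 * z_bar
    return kappa0, kappa1

def exact_log_return(z_next, z, g_next):
    """Exact log return: ln(e^{z'} + 1) - z + g'."""
    return math.log(math.exp(z_next) + 1.0) - z + g_next

def approx_log_return(z_next, z, g_next, z_bar):
    """First-order approximation: kappa0 + kappa1 * z' - z + g'."""
    kappa0, kappa1 = cs_constants(z_bar)
    return kappa0 + kappa1 * z_next - z + g_next

# z_bar implied by kappa1 = 0.997.
z_bar = math.log(0.997 / 0.003)
kappa0, kappa1 = cs_constants(z_bar)

# Approximation error when the ratio moves 0.1 log points away from its mean.
gap = (exact_log_return(z_bar + 0.1, z_bar, 0.01)
       - approx_log_return(z_bar + 0.1, z_bar, 0.01, z_bar))
```

The approximation is exact at the expansion point, and since the exact return is convex in the future log ratio, `gap` is non-negative and of second order in the deviation from `z_bar`.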
8.2.5 Risks for the long-run
Bansal and Yaron (2004) consider a model in which persistence in the expected consumption
growth has the potential to explain the equity premium puzzle. To illustrate the main points
of this explanation, assume that consumption growth is solution to,

$$g_{t+1} = \ln\frac{c_{t+1}}{c_t} = g_0 - \frac{1}{2}\sigma_c^2 + x_t + \epsilon_{t+1}, \quad \epsilon_{t+1} \sim N(0, \sigma_c^2) \tag{8.19}$$

where $x_t$ is a small persistent component in consumption growth, solution to$^2$

$$x_{t+1} = \phi x_t + \nu_{t+1}, \quad \nu_{t+1} \sim N(0, \sigma_x^2) \tag{8.20}$$

To find an approximate solution to the log of the price-dividend ratio, replace the
Campbell-Shiller approximation in Eq. (8.17) into the Euler equation (8.10) for the market portfolio,

$$0 = \ln E_t\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}g_{t+1} + \theta\left(\kappa_0 + \kappa_1 z_{t+1} - z_t + g_{t+1}\right)\right)\right] \tag{8.21}$$

Conjecture that the log of the price-dividend ratio takes the simple form, $z_t = a_0 + a_1 x_t$, where
$a_0$ and $a_1$ are two coefficients to be determined. Substituting this guess into Eq. (8.21), and
identifying terms, leaves:

$$z_t = a_0 + \frac{1 - \frac{1}{\psi}}{1 - \kappa_1\phi}\,x_t \tag{8.22}$$

where the constant $a_0$ is characterized in the Appendix.

2 In this version of the model, the equity premium and volatility are constant. Bansal and Yaron assume that
consumption growth is heteroskedastic to make the model consistent with time-varying statistics.

Note that in the presence of a very persistent growth process, $\kappa_1\phi \approx 1$, such that even small
changes in the expected dividend growth, $x_t$, would lead to large swings in the price-dividend
ratio. A model solved along these lines, where persistent processes would lead to highly volatile
prices, was already available in the literature, at least since the discussion of Campbell, Lo and
MacKinlay (1997, Chapter 7, p. 265) of a model with persistent expected returns:
If this standard deviation is small [i.e. the variability of expected stock returns], it is
tempting to conclude that changing expected returns have little influence on stock prices.
[...] This conclusion is too hasty: [...] if expected returns vary in a persistent fashion,
[prices] can be very variable even when the [expected returns are] not.

The model discussed by Campbell, Lo and MacKinlay is one where expected returns are
directly modeled as possibly persistent processes. Instead, the model of this section is one
where expected growth is possibly persistent. We shall soon see what the implications are of
such a broader perspective. Note, however, a crucial point. High volatility of the price-dividend
ratio does not necessarily lead to a resolution of the equity premium puzzle. For example, in the
context of non-expected utility, Eq. (8.15) suggests that in equilibrium, relative risk-aversion
and intertemporal elasticity of substitution should play together in the right direction, and the
variance $\sigma_*^2$ is equally important. For example, if $\eta = \frac{1}{\psi}$, the price-dividend ratio would not
even enter the Euler equation as we know. Therefore, we need to check how this high volatility
of the price-dividend ratio translates into a high equity premium.
We use the expression of $R_{M,t+1}$ in Eq. (8.17) and determine $\sigma_*^2$ in (8.14), leading to the
model prediction regarding the equity premium and the risk-free rate through Eqs. (8.15) and
(8.16). We have
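$$\sigma_M^2 = var_t(R_{M,t+1}) = \sigma_c^2 + \underbrace{\left(\kappa_1\,\frac{1 - \frac{1}{\psi}}{1 - \kappa_1\phi}\right)^2\sigma_x^2}_{=\,\sigma_*^2}$$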

That is, and as anticipated, non-expected utility can lead to a resolution of the equity premium
puzzle when asset returns are driven by sources of variation in addition to consumption
growth. The previous expression for $\sigma_*^2$ shows precisely how long run risks accomplish this: this
added volatility stems from large fluctuations in the price-dividend ratio, arising due to the high
persistence of the small component $x_t$ in Eq. (8.20). An alternative, albeit entirely consistent,
way to explain these facts relies on the expression for the innovations to the stochastic discount
factor in this model: by Eq. (8.18),

$$\ln m_{t+1} - E_t(\ln m_{t+1}) = -\eta\left(g_{t+1} - E_t(g_{t+1})\right) - (1-\theta)\kappa_1 a_1\left(x_{t+1} - E_t(x_{t+1})\right)$$

That is, long-run risks are priced through $a_1$, the sensitivity of the price-dividend ratio to
changes in $x_t$ (see Eq. (8.22)): even if the innovations to $x_t$ are small, $a_1$ is large, as explained,
and leads to large compensation for this risk, provided $\theta < 1$.
[Survey briefly developments in the LRR literature]
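The amplification produced by persistence can be quantified with a few lines of code. The sketch below is illustrative and rests on assumptions that should be read as such: dividends equal consumption, the persistent component follows an AR(1) with coefficient `phi`, its shocks are independent of consumption shocks, and the log price-dividend ratio is linear in the persistent component. The loading is then `(1 - 1/psi) / (1 - kappa1 * phi)`. Parameter values are hypothetical, chosen only to represent a highly persistent component.

```python
def lrr_loading(psi, kappa1, phi):
    """Sensitivity a1 of the log price-dividend ratio to the persistent component."""
    return (1.0 - 1.0 / psi) / (1.0 - kappa1 * phi)

def euler_x_coefficient(a1, theta, psi, kappa1, phi):
    """Coefficient on x_t inside the log-linearized Euler equation.

    It must vanish at the correct a1 for the linear guess to hold for every x_t.
    """
    return -theta / psi + theta * (kappa1 * a1 * phi - a1 + 1.0)

def extra_return_variance(psi, kappa1, phi, sigma_x2):
    """Return variance added by the persistent component: (kappa1 * a1)^2 * sigma_x2."""
    a1 = lrr_loading(psi, kappa1, phi)
    return (kappa1 * a1) ** 2 * sigma_x2

# Hypothetical parameters: EIS of 1.5 and a very persistent component.
a1_persistent = lrr_loading(psi=1.5, kappa1=0.997, phi=0.979)
a1_iid = lrr_loading(psi=1.5, kappa1=0.997, phi=0.0)
residual = euler_x_coefficient(a1_persistent, theta=-27.0, psi=1.5,
                               kappa1=0.997, phi=0.979)
```

With these numbers the loading rises from one third in the i.i.d. case to roughly 14 under high persistence, so that even small shocks to the persistent component translate into sizeable return volatility.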

8.3 Heterogeneous agents and catching up with the Joneses


The attractive feature of the Campbell and Cochrane (1999) model of external habit formation,
reviewed in Chapter 7, is to have the potential to generate the right properties of asset prices
and volatilities, through the channel of a countercyclical price of risk. It does rely on a highly
risk-averse economy, though. Chan and Kogan (2002) show that a countercyclical price of risk
might arise, without assuming the existence of a representative agent with a high risk-aversion.
They consider an economy where heterogeneous agents have preferences displaying "catching
up with the Joneses" features introduced by Abel (1990, 1999).
In this economy, there is a continuum of agents, indexed by a parameter $\eta \in [1, \infty)$ appearing
in their instantaneous utility,

$$u_\eta(c, x) = \frac{(c/x)^{1-\eta}}{1-\eta}$$

where $c$ is consumption, and $x$ is the standard of living of others, to be defined below.
The total endowment in the economy, $D$, follows a geometric Brownian motion,

$$\frac{dD_t}{D_t} = g_0\,dt + \sigma_0\,dW_t \tag{8.23}$$

By assumption, the standard of living of others, $x_t$, is a weighted geometric average of the past
realizations of the aggregate endowment $D$, viz

$$\ln x_t = \ln x_0\,e^{-\theta t} + \theta\int_0^t e^{-\theta(t-s)}\ln D_s\,ds, \quad \text{with } \theta > 0$$

Therefore, $x_t$ satisfies

$$dx_t = \theta s_t x_t\,dt, \quad \text{where } s_t \equiv \ln\frac{D_t}{x_t} \tag{8.24}$$

By Eqs. (8.23) and (8.24), $s_t$ is solution to,

$$ds_t = \left[g_0 - \frac{1}{2}\sigma_0^2 - \theta s_t\right]dt + \sigma_0\,dW_t$$

This model can be interpreted as one displaying "catching up with the Joneses" features
because the utility of each agent is affected by a benchmark, $x_t$, which is a weighted average of
the past aggregate endowment $D$, the latter obviously being equal to aggregate consumption
(see the constraint in [P1] below). This model is important. We already know (see Chapter 7)
that a realistically calibrated economy with habit formation and a representative agent relies
on a high risk aversion. Moreover, an economy with "catching up with the Joneses" and a
representative agent would also rely on a high risk-aversion, as argued below. Chan and Kogan
(2002) show that their model, while capturing the spirit of habit formation through "catching
up with the Joneses", does not need to rely on a high risk-aversion, once we populate the
economy with heterogeneous agents, thereby allowing habit formation and "catching up with
the Joneses" to perform a role in the explanation of the equity premium puzzle.
In this economy with complete markets, we can determine the asset price, and solve the
model by relying on the centralization of competitive equilibrium through Pareto weightings,
along lines similar to those in Theorem 2.7 of Chapter 2. As explained in the Appendix, the
equilibrium price process is the same as that in an economy with a representative agent with
instantaneous utility,

$$U(D, x) \equiv \max_{c_\eta}\int_1^\infty f(\eta)\,u_\eta(c_\eta, x)\,d\eta \quad \text{s.t.} \quad \int_1^\infty c_\eta\,d\eta = D \qquad \text{[P1]}$$

where $f(\eta)^{-1}$ is the marginal utility of income of the agent $\eta$. The Appendix provides further
details on the derivation of the value function of the program [P1], which is:

$$U(D, x) = \int_1^\infty \frac{1}{1-\eta}\,f(\eta)^{\frac{1}{\eta}}\,V(s)^{\frac{\eta-1}{\eta}}\,d\eta \tag{8.25}$$

where $V$ is a Lagrange multiplier, a function of the state $s$, satisfying:

$$e^{s} = \int_1^\infty f(\eta)^{\frac{1}{\eta}}\,V(s)^{-\frac{1}{\eta}}\,d\eta \tag{8.26}$$

Finally, the Appendix shows that the unit risk-premium predicted by this model is,

$$\lambda(s) = \frac{\sigma_0\,e^{s}}{\int_1^\infty \frac{1}{\eta}\,f(\eta)^{\frac{1}{\eta}}\,V(s)^{-\frac{1}{\eta}}\,d\eta} \tag{8.27}$$

To summarize, Eq. (8.26) determines the Lagrange multiplier, $V(s)$, which then feeds $\lambda(s)$
through Eq. (8.27). Empirically, the Pareto weighting function, $f(\eta)$, can be parametrized by a
function, which can be calibrated to match selected characteristics of the asset returns and
volatility. Note, finally, that this economy collapses to an otherwise identical homogeneous
economy, once the social weighting function $f(\eta) = \delta(\eta - \eta_0)$, the Dirac's mass at $\eta_0$. In this case,
$\lambda(s) = \eta_0\sigma_0$, a constant. As anticipated, an economy with a single agent with "catching up with
the Joneses" is unlikely to resolve the equity premium puzzle or address other issues such as
predictability, because $\eta_0\sigma_0$ is both small and constant.$^3$
A technical, albeit crucial assumption of this model is that the standard of living of others,
$x_t$, is a process with bounded variation (see Eq. (8.24)). This assumption implies that $x_t$ is
not a risk for which the agents require a compensation. In this model, then, it is the agents'
heterogeneity that drives variation in the risk-premium $\lambda(s)$ (see Eq. (8.27)). By calibrating
the model to US data, Chan and Kogan find that $\lambda(s)$ is decreasing and convex in $s$.$^4$ The
mechanism of the model is an endogenous redistribution of wealth. Note that the less risk-averse
agents obviously invest a higher proportion of their wealth into risky assets, compared
to the more risk-averse. In the poor states of the world, then, when stock prices decrease, the
wealth of the less risk-averse agents lowers more than that of the more risk-averse. The result
is a reduction in the fraction of wealth held by the less risk-averse individuals in the whole
economy. Thus, in bad times, the contribution of these less risk-averse individuals to aggregate
risk-aversion decreases and, hence, aggregate risk-aversion increases and so does $\lambda(s)$ in Eq.
(8.27).
[Discuss the criticism of Xiouros and Zapatero (2010)]
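The redistribution mechanism can be illustrated with a stripped-down numerical version of the planner's problem. The sketch below is not the calibration of Chan and Kogan: it replaces the continuum of agents with a finite list of types (an assumption made for tractability), solves for the multiplier `V` from the resource constraint $e^{s} = \sum_j f_j^{1/\eta_j} V^{-1/\eta_j}$ by bisection, and then evaluates the price of risk as $\sigma_0 e^{s}$ divided by $\sum_j (1/\eta_j) f_j^{1/\eta_j} V^{-1/\eta_j}$. Type values, weights and `sigma0` are hypothetical.

```python
import math

def solve_multiplier(s, etas, weights, log_lo=-60.0, log_hi=60.0, iters=200):
    """Solve exp(s) = sum_j f_j^(1/eta_j) * V^(-1/eta_j) for V by bisection on ln V."""
    target = math.exp(s)

    def total_consumption(log_v):
        return sum(f ** (1.0 / e) * math.exp(-log_v / e) for e, f in zip(etas, weights))

    lo, hi = log_lo, log_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # total_consumption is decreasing in V: if it is too large, V must rise.
        if total_consumption(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def price_of_risk(s, etas, weights, sigma0):
    """Unit risk premium sigma0 * e^s / sum_j (1/eta_j) f_j^(1/eta_j) V^(-1/eta_j)."""
    v = solve_multiplier(s, etas, weights)
    denom = sum((1.0 / e) * f ** (1.0 / e) * v ** (-1.0 / e)
                for e, f in zip(etas, weights))
    return sigma0 * math.exp(s) / denom

sigma0 = 0.04
homogeneous = price_of_risk(0.2, [3.0], [1.0], sigma0)          # single type
bad_times = price_of_risk(-1.0, [1.5, 10.0], [1.0, 1.0], sigma0)  # low endowment
good_times = price_of_risk(1.0, [1.5, 10.0], [1.0, 1.0], sigma0)  # high endowment
```

With a single type the price of risk is constant and equal to the type's risk aversion times `sigma0`; with two types it is higher in bad times than in good times, because the less risk-averse type then accounts for a smaller share of aggregate consumption.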

8.4 Idiosyncratic risk


Aggregate risk is too small to justify the size of the equity premium through a low level of
risk-aversion. Do individuals bear some idiosyncratic risk, one that cannot be diversified away by
trading in capital markets? And then, how can this risk affect asset evaluation? Shouldn't this
risk be neutral to asset pricing? The answer to these questions is indeed subtle, and relies upon
whether idiosyncratic risk affects agents' portfolio choices and, then, the stochastic discounting
factor. This section presents and discusses models that activate these channels.
3 See Campbell (2003) for a similar critique.
4 Their numerical results also revealed that in their model, the log of the price-dividend ratio is increasing and concave in $s$.
Finally, their lemma 5 (p. 1281) establishes that in a homogeneous economy, the price-dividend ratio is increasing and convex in $s$.


8.4.1 A static model

Mankiw (1986) is the first to point out the asset pricing implications of idiosyncratic risks. In
his model, aggregate shocks to consumption do not affect individuals in the same way, ex-post.
Ex-ante, individuals know that the business cycle may adversely change (an aggregate shock)
although they also anticipate that the very same shock might be particularly severe to only
a portion of them (an idiosyncratic shock). To illustrate, everyone faces a positive probability
of experiencing a job loss during a recession, although then, only a part of the population will
actually suffer from a job loss. Alternatively, it is one thing to say that a recession will lead to a
salary reduction for everyone, and another thing to say that a number of individuals will not
even be affected by the recession but then the others will bear its entire burden.
Idiosyncratic risk might significantly affect agents' portfolio choice and, therefore, rational
asset evaluation. Naturally, in the presence of contingent claims able to insure against these
shocks, idiosyncratic risk would not matter. But the point is that in reality, these contingent
claims do not exist yet, due perhaps to moral hazard or adverse selection reasons. This source
of market incompleteness might then potentially explain the aggregate stock market behavior
in a way that the model with a standard representative agent cannot.
Mankiw considers the pricing of a risky asset in a two-period model, with the first period
budget constraint given by $m + p\theta = w$, and the second period consumption equal to:

$$c = y + q\theta + (w - p\theta)(1 + r)$$

where $m$ is the amount to invest in a money market account, $r$ is the safe interest rate on
the money market account, normalized to zero, $w$ is the initial endowment, also normalized to
zero, $p$ is the price of the risky asset, $q$ is the payoff promised by the risky asset and, finally,
$\tilde{q} \equiv q - p$ is the asset net payoff. That is, we may either endogenize the price $p$, given the
payoff $q$, or, then, just the net payoff, $\tilde{q}$, as described next. The asset is in zero net supply,
and because agents are ex-ante identical, we have that in equilibrium, $\theta = 0$, such that $c$ equals
per capita consumption, $y$.
There are two states of nature for the aggregate economy, which are equally likely. In the
good state, the net asset payoff is $\tilde{q} = 1 + \pi$, and per capita consumption is $y = \mu$. In the bad
state, the net asset payoff is, instead, $\tilde{q} = -1$, and per capita consumption is also adversely
affected in that, $y = (1-\phi)\mu$. The payoff in the good state, $\pi$, equals $2E(\tilde{q})$. Therefore, $\pi$
measures a risk premium; of course it has to be determined in equilibrium.
How does this macroeconomic shock affect asset pricing in the presence of agents who could
be heterogeneous ex-post? The crucial feature of the model is the assumption that only a
fraction $\lambda$ of agents will absorb the macroeconomic risk, in that, literally, a fraction $1-\lambda$
of individuals will not be hit by the aggregate shock, and each of them will still consume
$\mu$, for a total of $(1-\lambda)\mu$. The residual portion of the population will consume the residual,
$(1-\phi)\mu - (1-\lambda)\mu = (\lambda-\phi)\mu$, such that each agent hit by the shock will consume $\left(1 - \frac{\phi}{\lambda}\right)\mu$.
The ratio, $\frac{\phi}{\lambda}$, is the per capita fall in consumption for any individual hit by the shock in the
bad state of nature. If $\lambda = 1$, the aggregate shock hits everyone. The highest concentration of
the shock arises when $\lambda = \phi$, i.e. when the fall in consumption is borne by the lowest possible
fraction of the population. Table 8.1 summarizes payoffs, per capita consumption and individual
consumption in this economy.

              Net asset payoff    Per capita consumption    Individual consumption
Bad state     $-1$                $(1-\phi)\mu$             $(1-\phi/\lambda)\mu$ (consumed by $\lambda$)
                                                            $\mu$ (consumed by $1-\lambda$)
Good state    $1+\pi$             $\mu$                     $\mu$

TABLE 8.1. Aggregate fluctuations and idiosyncratic risk


To summarize, only a selected portion of individuals will bear and share a macroeconomic
shock, such that the burden of this shock will be particularly heavy should it materialize,
especially when $\lambda$ is small. Since the risky asset performs poorly in bad times, its evaluation
should reflect not only business cycle risk, but also the fact that the business cycle might be
particularly severe to those who will be hurt by it.
To formalize this reasoning, consider the first order conditions for any expected utility maximizer:

$$0 = E\left[u'(y + \tilde{q}\theta)\,\tilde{q}\right] = E\left[u'(y)\,\tilde{q}\right] = \frac{1}{2}(-1)\left[u'\left(\left(1 - \frac{\phi}{\lambda}\right)\mu\right)\lambda + u'(\mu)(1-\lambda)\right] + \frac{1}{2}(1+\pi)\,u'(\mu)$$

where the second equality follows by the equilibrium condition, $\theta = 0$, and $u$ is a utility
function satisfying standard regularity conditions. The premium, $\pi$, equals:

$$\pi = \lambda\,\frac{u'\left(\left(1 - \frac{\phi}{\lambda}\right)\mu\right) - u'(\mu)}{u'(\mu)}$$

Mankiw shows that for utility functions leading to prudent behavior, $u''' > 0$, the premium is
decreasing in $\lambda$: an increase in the concentration of aggregate shocks leads to higher premiums.
Moreover, it is easy to see that $\pi$ can be made arbitrarily large, for $\lambda$ arbitrarily close to $\phi$,
once the utility function satisfies the Inada condition, $\lim_{c\to 0}u'(c) = \infty$, as we have that
$\lim_{\lambda\to\phi}\pi = \infty$. For example, in the log-utility case, we have that $\pi = \frac{\phi\lambda}{\lambda-\phi}$.
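A short numerical illustration of how the concentration of the aggregate shock drives the premium may help. The sketch below assumes the two-state setup described above (equally likely states, a net payoff of minus one in the bad state, a fraction `lam` of agents absorbing a total consumption fall `phi`), so that the premium is `lam * (u'(hit consumption) - u'(mu)) / u'(mu)`. The symbol names and the numbers are labels chosen for this sketch, and log utility is used only as an example.

```python
def mankiw_premium(lam, phi, mu, marginal_utility):
    """Premium lam * [u'((1 - phi/lam) * mu) - u'(mu)] / u'(mu).

    lam: fraction of agents hit in the bad state (phi < lam <= 1).
    phi: proportional fall in per capita consumption in the bad state.
    """
    if not phi < lam <= 1.0:
        raise ValueError("need phi < lam <= 1")
    hit_consumption = (1.0 - phi / lam) * mu
    return lam * (marginal_utility(hit_consumption) - marginal_utility(mu)) / marginal_utility(mu)

def log_marginal_utility(c):
    """u'(c) = 1/c for log utility."""
    return 1.0 / c

# Premium when everyone, half, or a fifth of the population absorbs a 10% aggregate fall.
premium_all = mankiw_premium(1.0, 0.1, 1.0, log_marginal_utility)
premium_half = mankiw_premium(0.5, 0.1, 1.0, log_marginal_utility)
premium_fifth = mankiw_premium(0.2, 0.1, 1.0, log_marginal_utility)
```

In the log case the values agree with the closed form `phi * lam / (lam - phi)`, and the premium rises as the shock is concentrated on fewer agents.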
8.4.2 Self-insurance and persistence of idiosyncratic shocks
The critical assumption underlying Mankiw's model is that once agents are hit by an
idiosyncratic shock, the game is over. What happens once we allow the agents to act in a multiperiod
horizon? Intuitively, in a dynamic context, agents might implement self-insurance plans, by
accumulating financial assets after good shocks and selling or short-selling after bad shocks have
occurred. Telmer (1993) and Lucas (1994) show that if idiosyncratic shocks are not persistent,
self-insurance is quite effective and asset prices behave substantially the same as they would do
in a world without idiosyncratic risk. Therefore, to have asset prices significantly deviate from
those arising within a complete market setting, one has to either (i) reduce the extent of
risk-sharing, by assuming frictions such as transaction costs, short-selling constraints or in general
severe forms of market incompleteness, or (ii) make idiosyncratic shocks persistent. With (i), we
just merely eliminate the possibility that agents may implement self-insurance plans through
capital market transactions. With (ii), we make idiosyncratic shocks so severe that no capital
market transaction might allow agents to insure themselves and achieve portfolio solutions
close to the complete market solution; intuitively, once any individual is hit by an idiosyncratic
shock, he may short-sell financial assets in the short-run, although then, he cannot persistently
do so, given his wealth constraints.
Heaton and Lucas (1996) calibrate a model with idiosyncratic shocks using PSID (Panel
Study of Income Dynamics) and NIPA (National Income and Product Accounts) data. They
find that idiosyncratic shocks are not quite persistent, and that a large amount of transaction
costs is needed to generate sizeable levels of the equity premium. Naturally, idiosyncratic shocks
are not always as in the PSID dataset analyzed by Heaton and Lucas a long time ago. Models
with idiosyncratic risk would actually help think about market behavior in periods such as the
Great Recession (the recession that occurred around 2009), when the persistence of idiosyncratic
shocks would arguably be larger than over the Great Moderation, the period of low volatility
of macroeconomic aggregates, starting after the Monetary experiment in 1982 (e.g., Bernanke,
2004) and presumably ending in 2007.
8.4.3 A model with countercyclical income inequality
Constantinides and Duffie (1996) do actually take the issue of persistence in idiosyncratic risk
to the extreme, and consider a model without any transaction costs, but with permanent
idiosyncratic risk. They show that in fact, given an asset price process, it is always possible to find
a cross-section of idiosyncratic risk processes compatible with the asset price given in advance.
We now present this elegant model, which has a quite substantial theoretical importance per
se, because of its feature to make so transparent how some state variables affecting consumer
choices can be reverse-engineered from the observation of an asset price process.
Central to Constantinides and Duffie's analysis is the assumption that each individual $i$ has a
consumption equal to $c_{it}$ at time $t$, given by:

$$c_{it} = \delta_{it}c_t, \quad \delta_{it} = \exp\left[\sum_{s=1}^{t}\left(\epsilon_{is}y_s - \frac{1}{2}y_s^2\right)\right]$$

where $\epsilon_{is}$ are independent and standard normally distributed, and $y_t$ is a sequence of random
variables, interpreted as standard deviations of the cross-sectional distribution of the individual
consumption growth shares, $\ln\frac{\delta_{it}}{\delta_{i,t-1}}$. Indeed,

$$\ln\frac{\delta_{i,t+1}}{\delta_{it}}\,\Big|\,F_t\cup\{y_{t+1}\} \sim N\left(-\frac{1}{2}y_{t+1}^2,\ y_{t+1}^2\right) \tag{8.28}$$

where $F_t$ is the information set as of time $t$. Denoting with $\mu(i)$ the measure of agent $i$, we
have that by Eq. (8.28) and the Law of Large Numbers, the cross-sectional variance of $\ln\frac{\delta_{it}}{\delta_{i,t-1}}$
is, $\int\left(\ln\frac{\delta_{it}}{\delta_{i,t-1}} + \frac{1}{2}y_t^2\right)^2 d\mu(i) = \int\epsilon_{it}^2\,y_t^2\,d\mu(i) = y_t^2$.
The meaning of the consumption share $\delta_{it}$ is that of an idiosyncratic shock every agent
receives on his consumption share at $t$. From the perspective of each agent, this shock is
uninsurable, in that it is unrelated to the asset returns. Moreover, by construction, the consumption
share has a unit root, as $\ln\delta_{it} - \ln\delta_{i,t-1} = \epsilon_{it}y_t - \frac{1}{2}y_t^2$: a change in $y_t$ and/or a shock in $\epsilon_{it}$
have a permanent effect on the future path of $\delta_{it}$.
All agents have a CRRA utility function. We want to make sure this setup is consistent
with any given equilibrium asset price process, by requiring two conditions: (i) $\int c_{it}\,d\mu(i) = c_t$,
i.e. $\int\delta_{it}\,d\mu(i) = 1$, a condition satisfied by the law of large numbers; (ii) the cross-sectional
variances $y_t^2$ are reverse-engineered so as to be consistent with any stochastic discount factor
and, hence, any asset price process given in advance. To achieve (ii), note that by the law of
iterated expectations, for any agent $i$, the value of an asset delivering a payoff equal to $X_{t+1}$ at
time $t+1$ is:

$$E\left[e^{-\rho}\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}E\left(e^{-\eta\left(\epsilon_{i,t+1}y_{t+1} - \frac{1}{2}y_{t+1}^2\right)}\,\Big|\,F_t\cup\{y_{t+1}\}\right)X_{t+1}\,\Big|\,F_t\right] = E\left[e^{-\rho}\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}e^{\frac{1}{2}\eta(\eta+1)y_{t+1}^2}\,X_{t+1}\,\Big|\,F_t\right]$$

where $\rho$ is the discount rate and $\eta$ is the CRRA coefficient. It is independent of any agent $i$,
such that the stochastic discount factor is:

$$m_{t+1} = e^{-\rho}\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}e^{\frac{1}{2}\eta(\eta+1)y_{t+1}^2}$$

That is, given an aggregate consumption process, and an arbitrage free asset price process, there
exists a cross-section of idiosyncratic risk processes that supports the given price process. As
a trivial example, consider the standard Lucas stochastic discount factor, which obtains when
$y_t \equiv 0$.
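The key step in this argument is a moment of a log-normal variable, which can be checked by direct numerical integration. The sketch below assumes that the log growth of the individual consumption share is `eps * y - y**2 / 2` with `eps` standard normal, and that agents have CRRA `eta`; it integrates `exp(-eta * (eps * y - y**2 / 2))` against the normal density with a plain trapezoid rule and compares it with `exp(0.5 * eta * (eta + 1) * y**2)`. Function names, the grid, and the parameter values are choices made for this illustration.

```python
import math

def cross_sectional_adjustment(eta, y, n=20001, width=12.0):
    """Trapezoid approximation of E[exp(-eta * (eps*y - y^2/2))], eps ~ N(0, 1)."""
    step = 2.0 * width / (n - 1)
    total = 0.0
    for k in range(n):
        eps = -width + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        density = math.exp(-0.5 * eps * eps) / math.sqrt(2.0 * math.pi)
        total += weight * math.exp(-eta * (eps * y - 0.5 * y * y)) * density
    return total * step

def cd_sdf(discount_factor, eta, cons_growth, y_next):
    """SDF with cross-sectional dispersion y_next: standard CRRA term times the adjustment."""
    return (discount_factor * cons_growth ** (-eta)
            * math.exp(0.5 * eta * (eta + 1.0) * y_next ** 2))

numerical = cross_sectional_adjustment(eta=3.0, y=0.2)
closed_form = math.exp(0.5 * 3.0 * 4.0 * 0.2 ** 2)

# Setting eta = -1 recovers the mean of the share growth itself, which is one.
mean_share_growth = cross_sectional_adjustment(eta=-1.0, y=0.2)
```

The numerical and closed-form values coincide, and with zero dispersion `cd_sdf` reduces to the standard representative-agent expression.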
Which properties of the stochastic discounting factor are we looking for? Naturally, we wish
to make sure $m_{t+1}$ is as countercyclical as ever, which might be the case should the dispersion of
the cross-sectional distribution of the log-consumption growth, $y_t^2$, be countercyclical. However,
Lettau (2002) shows that empirically, such a dispersion seems to be not enough, even when
multiplied by $\frac{1}{2}\eta(\eta+1)$, unless of course, we are willing to assume, again, a high level of
risk-aversion. Note that Lettau analyzes a situation that favourably biases his final outcome
towards not rejecting the null that idiosyncratic risk matters, as he assumes agents cannot
insure themselves at all: once they are hit by an idiosyncratic shock, they just have to consume
their income. Constantinides, Donaldson and Mehra (2002) consider an OLG model to mitigate the
issue of persistence in the idiosyncratic risk process.
Discuss the recent literature.
[In progress]

8.5 Incomplete markets with homogeneous and heterogenous agents


Idiosyncratic risk plays a specific role in the previous section, in that in one way or another, it relates to aggregate consumption: for example, the seminal paper of Mankiw (1986) relies on the assumption that idiosyncratic risk is correlated with aggregate shocks, and the paper by Constantinides and Duffie (1996) relies on time-variation of the cross-sectional distribution of the consumption growth shares. This section explores the implications of models where market incompleteness is induced by a form of idiosyncratic risk and a form of inability to participate in the market, which do not link to aggregate risk. Section 8.5.1 reviews a model with agents who are homogeneous ex ante, but heterogeneous ex post, due to the realization of idiosyncratic shocks unrelated to aggregate risk. Section 8.5.2 reviews a model with agents heterogeneous with respect to their access to the risky security market.

8.5.1 Idiosyncratic shocks unrelated to aggregate risk


Weil (1992) develops a simple model where idiosyncratic shocks are unrelated to aggregate risk. In this model, there is a continuum of agents who are ex ante identical, in that they have the same utility and the same endowments: a non-random endowment $w$ in the first period, and a stochastic endowment $\tilde w$ in the second, independent of dividends, and with possibly different outcomes across agents. The agents, then, face the following budget constraints:
$$c_1+p\theta=(p+D)\theta_0+w,\qquad c_2=\theta\tilde D+\tilde w,$$
where $\theta_0$ denotes the endowment of the risky asset, $c_1$ and $c_2$ are first and second period consumption, and $D$ and $\tilde D$ are the dividends in the first and second period. We assume that $\theta_0=1$. Because agents are ex ante identical, although ex post heterogeneous, they have no incentives to trade. An autarkic equilibrium is therefore one where $\theta=1$, $c_1=w+D$ and $c_2=\tilde w+\tilde D$. Markets are incomplete because the endowments cannot be spanned through the risky asset. The asset price satisfies
$$p=\beta\frac{E(u'(\tilde w+\tilde D)\tilde D)}{u'(w+D)}.$$
Compare this price with that arising in a complete markets economy, where by Pareto optimality, the agents have the same marginal rate of substitution, state by state. In this complete markets economy with agents displaying identical preferences, risk-pooling is the equilibrium outcome, with each agent consuming the same, $c_2=\tilde D+E(\tilde w)$, such that the price in the complete market setting is,
$$p_c=\beta\frac{E(u'(\tilde D+E(\tilde w))\tilde D)}{u'(w+D)}.$$
If $u'$ is convex, and as soon as dividend and endowment risks are independent, $E(u'(\tilde D+E(\tilde w))\tilde D)\le E(u'(\tilde D+\tilde w)\tilde D)$, thereby leading to (i) the risk-free rate in the incomplete market case lower than in the complete case, and (ii) $p_c\le p$.
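A small numerical illustration may help fix ideas on the comparison between incomplete and complete markets. The sketch below assumes CRRA marginal utility (which is convex), two equally likely dividend outcomes and two equally likely, independent endowment outcomes; all numbers and names are hypothetical. It compares the expected marginal-utility-weighted dividend, and the expected marginal utility itself, with and without risk-pooling of the second-period endowment.

```python
from itertools import product

def marginal_utility(c, gamma=2.0):
    """CRRA marginal utility u'(c) = c**(-gamma); convex in c (assumption)."""
    return c ** (-gamma)

# Hypothetical second-period outcomes, equally likely and independent.
dividends = [0.8, 1.2]
endowments = [0.5, 1.5]
states = list(product(dividends, endowments))
mean_endowment = sum(endowments) / len(endowments)

# Incomplete markets: each agent consumes dividend plus his own endowment draw.
price_term_incomplete = sum(marginal_utility(d + w) * d for d, w in states) / len(states)
bond_term_incomplete = sum(marginal_utility(d + w) for d, w in states) / len(states)

# Complete markets: endowment risk is pooled, so each agent consumes d + E(w).
price_term_complete = sum(marginal_utility(d + mean_endowment) * d for d in dividends) / len(dividends)
bond_term_complete = sum(marginal_utility(d + mean_endowment) for d in dividends) / len(dividends)
```

With a common first-period marginal utility in the denominator, a larger `price_term` means a higher risky asset price, and a larger `bond_term` means a higher bond price, that is, a lower risk-free rate.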
The interpretation of the condition that $u'$ be convex relates to prudence, similarly as for the condition of the previous section that ensures the equity premium in Mankiw's model is increasing in the concentration of aggregate shocks. In fact, Weil argues that the risk-free rate and equity premium puzzles might be rationalized by agents facing incomplete markets while engaging in precautionary savings, and restricted by preferences exhibiting decreasing absolute risk aversion (DARA) and decreasing absolute prudence (DAP): a utility function exhibits decreasing absolute prudence if the coefficient of absolute prudence, $-u'''(c)/u''(c)$, is decreasing. The mechanism is quite illuminating. Incomplete markets lead agents to act more prudently than they would if they had to face complete markets and, as a result, to increase their demand of both the riskless and the risky assets. Clearly, then, the riskless interest rate decreases, although to ensure an increase in the equity premium, we need to ensure that the increase in the price of the risky asset is less than that of the riskless asset, which is ensured by DARA and DAP: our agents need then to be less prone to bear dividend risk once they have nontraded labor income.
While the model seems somehow restrictive due to its reliance on DARA and DAP, many standard utility functions satisfy DARA and DAP. Duffie (1992) develops a model with a finite number of agents without exploring its implications. Note that these models are static. It is not clear whether self-insurance would play a role in these models where idiosyncratic and aggregate
risk are unrelated. The model in the next section has a completely different mechanism, relying on the assumption that some agents would never be able to implement any self-insurance plans, living in a world with an extreme form of market incompleteness, with all the macroeconomic risk being borne by a small fraction of the remaining agents.
8.5.2 A two-agents economy
The economy of this section relies on the presence of heterogeneous agents. One agent has access to the market for the risky asset, while a second agent has not. The equity premium is the expected excess return the first agent requires to enter the risky asset market, and can be quite large, even in the presence of small risk-aversion, because this agent has to take on the entire aggregate macroeconomic risk.
Basak and Cuoco (1998) consider the following model. An agent does not invest in the stock market, and has logarithmic instantaneous utility, $u_n(c)=\ln c$. From his perspective, markets are incomplete. A second agent, instead, can invest in the stock market, and has instantaneous utility equal to $u_p(c)=(c^{1-\eta}-1)/(1-\eta)$. Both agents are infinitely lived. The competitive equilibrium of this economy cannot be Pareto efficient, and so aggregation results such as those underlying the economy in Section 8.2 cannot obtain. However, Basak and Cuoco show that aggregation still obtains in this economy, once we define social weights in a judicious way.
Let $\hat c_i$ be the general equilibrium allocation of agent $i$, $i=p,n$. In equilibrium, $\hat c_p+\hat c_n=D$, where $D$ is the instantaneous aggregate consumption, taken to be a geometric Brownian motion with parameters $g_0$ and $\sigma_0$,
$$\frac{dD(\tau)}{D(\tau)}=g_0\,d\tau+\sigma_0\,dW(\tau). \tag{8.29}$$
Define
$$s(\tau)\equiv\frac{\hat c_p(\tau)}{D(\tau)}, \tag{8.30}$$
which is the consumption share of the market participant.
The first order conditions pertaining to the two agents' intertemporal consumption plans are:
$$u_p'(\hat c_p(\tau))=w_p\,e^{\delta\tau}m(\tau),\qquad u_n'(\hat c_n(\tau))=w_n\,e^{\delta\tau-\int_0^\tau R(u)du}, \tag{8.31}$$
where $w_p$ and $w_n$ are two constants, $\delta$ is the subjective discount rate, $R$ is the short-term rate and $m$ is the pricing kernel process, solution to,
$$\frac{dm(\tau)}{m(\tau)}=-R(\tau)\,d\tau-\lambda(\tau)\,dW(\tau), \tag{8.32}$$
and, finally, $\lambda$ is the unit risk-premium. The short-term rate and the unit risk-premium equal
$$R(s)=\delta+\frac{\eta g_0}{\eta+(1-\eta)s}-\frac{1}{2}\frac{\eta(1+\eta)\sigma_0^2}{s(\eta+(1-\eta)s)} \tag{8.33}$$
and
$$\lambda(s)=\eta\sigma_0 s^{-1}. \tag{8.34}$$
The expressions for $R$ and $\lambda$ in Eqs. (8.33)-(8.34) are derived below. Appendix 2 provides a further derivation relying on the existence of a representative agent, as originally put forward by Basak and Cuoco (1998), and explained below.
In this economy, the marginal investor bears the entire macroeconomic risk. The risk premium he requires to invest in the aggregate stock market is large when his consumption share, $s$,
is small. With just a risk aversion of $\eta=2$, and a consumption volatility of 1%, this model can explain the equity premium, as the plot of Eq. (8.34) in Figure 8.1 illustrates. For example, Mankiw and Zeldes (1991) estimate that the share of aggregate consumption held by stock-holders is approximately 30%, which in terms of this model, would translate to an equity premium of more than 6.5%.
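The orders of magnitude quoted above can be reproduced with a few lines of Python. The sketch below assumes the closed forms for the unit risk-premium, CRRA times consumption volatility divided by the participant's consumption share, and for the short-term rate as implied by the first order conditions of the two agents; function and argument names, and the values chosen for growth and discounting, are illustrative assumptions.

```python
def unit_risk_premium(share, crra, sigma0):
    """Unit risk-premium required by the market participant: crra * sigma0 / share."""
    return crra * sigma0 / share

def short_rate(share, crra, sigma0, g0, delta):
    """Short-term rate as a function of the participant's consumption share."""
    denom = crra + (1.0 - crra) * share
    return delta + crra * g0 / denom - 0.5 * crra * (1.0 + crra) * sigma0 ** 2 / (share * denom)

# Values quoted in the text: risk aversion 2, consumption volatility 1%, share 30%.
premium_at_30_percent = unit_risk_premium(share=0.3, crra=2.0, sigma0=0.01)

# With full participation (share = 1) the formulas collapse to the standard
# single-agent CRRA benchmark (hypothetical g0 and delta).
lucas_rate = short_rate(share=1.0, crra=2.0, sigma0=0.01, g0=0.02, delta=0.03)
```

The premium falls as the participant's share rises, since the same aggregate risk is then spread over a larger consumption base.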
Guvenen (2009) makes an interesting extension of the Basak and Cuoco model. He considers two agents, of whom only the rich invests in the stock market, and such that $EIS_{rich} > EIS_{poor}$. He shows that for the rich, a low EIS is needed to match the equity premium. However, US data show that the rich have a high EIS, which cannot deliver the equity premium. (Guvenen considers an extension of the model where we can disentangle EIS and CRRA for the rich.)

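FIGURE 8.1. The equity premium in the Basak and Cuoco (1998) model, for $\eta=2$ and $\sigma_0=1\%$. [Plot of $\lambda$ (vertical axis, from 0.04 to 0.10) against the consumption share $s$ of the market participant (horizontal axis, from 0.2 to 0.6).]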

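To derive Eqs. (8.33)-(8.34), note that the consumption of the agent not participating in the stock market satisfies, by Eq. (8.31):
$$\frac{d\hat c_n(\tau)}{\hat c_n(\tau)}=(R(\tau)-\delta)\,d\tau. \tag{8.35}$$
Therefore, the consumption of the marginal investor, $\hat c_p=D-\hat c_n$, satisfies
$$\frac{d\hat c_p}{\hat c_p}=\frac{dD-d\hat c_n}{\hat c_p}=\big(g_0-(R-\delta)(1-s)\big)s^{-1}d\tau+\sigma_0 s^{-1}dW, \tag{8.36}$$
where the second equality follows by the definition of $s$ in Eq. (8.30), and by Eqs. (8.29) and (8.35). Moreover, by the first order conditions of the market participant in Eq. (8.31), and the CRRA assumption for $u_p$,
$$d\ln\hat c_p=-\frac{1}{\eta}\big(\delta\,d\tau+d\ln m\big)=\frac{1}{\eta}\Big(R-\delta+\frac{1}{2}\lambda^2\Big)d\tau+\frac{\lambda}{\eta}dW. \tag{8.37}$$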
Using the relation, $d\ln\hat c_p=\frac{d\hat c_p}{\hat c_p}-\frac{1}{2}\big(\frac{d\hat c_p}{\hat c_p}\big)^2$, then identifying terms in Eq. (8.36) and Eq. (8.37), delivers the two expressions for $R$ and $\lambda$ in Eqs. (8.33)-(8.34).
How do these results technically relate to aggregation? Basak and Cuoco define the instantaneous utility of a representative agent, a social planner, as:
$$u(D,x)\equiv\max_{c_p+c_n=D}\,[u_p(c_p)+x\cdot u_n(c_n)], \tag{8.38}$$
where
$$x\equiv\frac{u_p'(\hat c_p)}{u_n'(\hat c_n)}$$
is a stochastic social weight and, once again, $\hat c_p$ and $\hat c_n$ are the private allocations, satisfying the first order conditions in Eqs. (8.31). By the definition of $x$, and Eqs. (8.31), $x$ is solution to,
$$\frac{dx(\tau)}{x(\tau)}=-\lambda(\tau)\,dW(\tau). \tag{8.39}$$
Then, the equilibrium in this economy is supported by a fictitious representative agent with utility $u(D,x)$. Intuitively, the social planner allocations satisfy, by construction,
$$\frac{u_p'(c_p^*)}{u_n'(c_n^*)}=x=\frac{u_p'(\hat c_p)}{u_n'(\hat c_n)},$$
where the starred variables denote the social planner's allocations. In other words, Basak and Cuoco's approach is to find a stochastic social weight process $x$ such that the first order conditions of the representative agent lead to the market allocations. The utility in Eq. (8.38) can then be used to compute the short-term rate and risk premium, and leads precisely to Eqs. (8.33)-(8.34), as shown in Appendix 2.

8.6 Disagreement and learning


This is the first of two sections dealing with economies in which agents have a limited knowledge of the fundamentals, which they attempt to improve by processing publicly available signals. In this section, we assume agents have different opinions: while they all learn from signals, their learning pattern might differ for a variety of reasons; for example, agents might suffer from different cognitive biases regarding the real content of the signals they have access to.
We analyze the impact of disagreement on asset prices in two models. In both, the mechanism leading to disagreement is the agents' overconfidence about their ability to process information they consider as their own. In the first model, developed by Scheinkman and Xiong (2002), bubbles arise as a result of differences in opinion amongst risk-neutral agents in the presence of short-sale constraints. In the second, agents' disagreement is cast within a more standard general equilibrium with complete markets, risk-averse agents and no short-sale constraints; a special case of this model is the two-person model of Dumas, Kurshev and Uppal (2009), which we discuss in some detail.
To deal with the models in this section, we need a preliminary survey of tools, which allow us to think about how agents process information conveyed by multiple signals. These tools generalize the scope of those introduced in Chapter 7 (Section 7.5.4), and are in Section 8.6.1. In Section 8.6.2, we present details regarding the model of Scheinkman and Xiong, and in Section 8.6.3, we analyze how differences in opinions are dealt with in general equilibrium in
the absence of frictions. Section 8.6.3 also deals with issues of aggregation in markets with
heterogeneous beliefs, providing a general framework to address a number of issues, from the
analysis of survival of irrational traders, to themes such as the equity premium puzzle or excess
volatility in irrational markets.
Note that the vast majority of the explanations in this section have a behavioral slant. We
are about to assume that agents have psychological biases, in that they make systematic mistakes whilst assessing the probability distribution of the fundamentals, by emphasizing aspects
of the markets such as correlations or informativeness of signals, which are less pronounced than
in the real markets they operate in. In Section 8.7, we move to an alternative approach, relying
on Knightian uncertainty: while agents still have a poor knowledge of the complex environment
in which they take decisions, they behave rationally, by explicitly acknowledging their ignorance
and acting while fearing the errors possibly arising from it.
8.6.1 Learning with multiple signals
We state filtering results on Bayesian learning that generalize those in Chapter 7 while taking into account the possibility that agents might update beliefs relying on multiple signals.
Consider the following result, a special case of Theorem 12.7 in Liptser and Shiryaev (2001; page 36). Suppose that some unobservable process $\theta$ is solution to
$$d\theta(\tau)=(a_0+a_1\theta(\tau))\,d\tau+b\,dW(\tau), \tag{8.40}$$
where $a_0$ and $a_1$ are scalar constants, $b$ is a $d$-dimensional vector of constants, and $W$ is a $d$-dimensional standard Brownian motion. Assume that a $k$-dimensional vector of signals $S$ on $\theta$ is observed, solution to
$$dS(\tau)=A\,\theta(\tau)\,d\tau+B\,dW(\tau), \tag{8.41}$$
where $A$ and $B$ are a $k$-dimensional vector and a $k\times d$ matrix of constants. Define
$$\hat\theta(\tau)\equiv E(\theta(\tau)|F_\tau),$$
where $F_\tau$ is the information set available at time $\tau$. Then, $\hat\theta$ is a diffusion process with time varying but deterministic volatility.
Assuming enough time has elapsed to have made the variance of $\theta$ stationary, we have
$$d\hat\theta(\tau)=(a_0+a_1\hat\theta(\tau))\,d\tau+(bB^\top+\gamma A^\top)(BB^\top)^{-1}\big(dS(\tau)-A\hat\theta(\tau)\,d\tau\big), \tag{8.42}$$
where
$$\gamma:\ 0=2a_1\gamma+bb^\top-(bB^\top+\gamma A^\top)(BB^\top)^{-1}(bB^\top+\gamma A^\top)^\top. \tag{8.43}$$
Note that the Brownian motions driving the unobservable $\theta$ and the signals $S$ are (potentially) the same, such that the interpretation of $bB^\top$ in the variance term of Eq. (8.42) is that of the instantaneous covariance between $\theta$ and $S$,
$$cov(d\theta,dS)=bB^\top d\tau. \tag{8.44}$$
The next sections rely on these filtering results and evaluate how agents update their beliefs in light of new information in a number of models.
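To see how the stationary posterior variance responds to the number and the precision of signals, consider the following sketch. It covers only a special case, stated here as an assumption: the unobservable is a scalar mean-reverting process with mean-reversion speed kappa and volatility state_vol, each signal loads one-for-one on it, and all signal noises are mutually independent and independent of the state noise. Under these assumptions the stationary variance is the positive root of a scalar quadratic equation. Names and numbers are hypothetical.

```python
import math

def stationary_posterior_variance(kappa, state_vol, signal_vols):
    """Positive root g of: precision * g**2 + 2*kappa*g - state_vol**2 = 0,
    where precision is the sum of 1/vol**2 over the (independent) signals."""
    precision = sum(1.0 / v ** 2 for v in signal_vols)
    return (-kappa + math.sqrt(kappa ** 2 + precision * state_vol ** 2)) / precision

# One signal only (hypothetical values).
g_one = stationary_posterior_variance(0.5, 0.3, [0.2])
# Two additional signals reduce the posterior variance.
g_three = stationary_posterior_variance(0.5, 0.3, [0.2, 0.4, 0.4])
# An agent who believes one signal is less noisy than it is (volatility scaled
# by 0.5) perceives an even smaller posterior variance.
g_overconfident = stationary_posterior_variance(0.5, 0.3, [0.2, 0.5 * 0.4, 0.4])
```

The third case anticipates the way overconfidence is modelled in the next section: a misperceived signal volatility changes the weights agents attach to each signal.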

8.6.2 Overconfidence and bubbles

Scheinkman and Xiong (2002) consider a market in which the cumulative dividend process $D$ of an asset satisfies
$$dD(\tau)=f(\tau)\,d\tau+\sigma_D\,dW^D(\tau), \tag{8.45}$$
where $W^D$ is a Brownian motion, $\sigma_D$ is a constant, and $f$ is the expected instantaneous dividend. Note that in all the models dealt with in these lectures so far, cumulative dividend is locally deterministic, i.e. $\sigma_D=0$ in Eq. (8.45). So this model differs, and relies on the additional assumption that $f$ is not observed (only $D$ is), although known to be a mean-reverting process,
$$df(\tau)=\kappa(\bar f-f(\tau))\,d\tau+\sigma_f\,dW^f(\tau), \tag{8.46}$$
where $W^f$ is a standard Brownian motion, $\bar f$ is the long-run mean of $f$, and $\kappa$ and $\sigma_f$ are two positive constants.
There are two sets of risk-neutral agents, $A$ and $B$, who observe signals $s^A$ and $s^B$ on $f$, which satisfy:
$$ds^i(\tau)=f(\tau)\,d\tau+\sigma_s\,dW^i(\tau),\quad i=A,B. \tag{8.47}$$
However, agent $A$ thinks of $s^A$ as being his own signal and believes its volatility is $\phi\sigma_s$, for some $\phi<1$, not $\sigma_s$, and agent $B$ thinks the same regarding his own signal $s^B$. This overconfidence is the source of a behavioral bias similar to that considered in previous work by Kyle and Wang (1997) and Odean (1998): agents make systematic mistakes about the asset fundamentals while processing information, by assigning higher weight to a signal they arbitrarily perceive as their own.
These assumptions are those appearing in the working paper version of Scheinkman and Xiong (2002). In the published version, the assumption is that agents perceive signals to be correlated with the fundamentals even if they are not. These two alternative assumptions give rise to the same conclusions. We maintain the assumptions in this section because we deal with the latter assumption in Section 8.6.3, while discussing another model, thereby providing the reader with an additional exercise about learning in an overconfidence context.5 We now apply the filtering results in the previous section and analyze how overconfidence leads to disagreement formally. We then turn to the determination of an equilibrium, and explain the mechanism leading to bubbles.
8.6.2.1 The inference process

We provide a step-by-step derivation of the inference process. In terms of the framework of Section 8.6.1, the expected instantaneous dividend $f$ in Eq. (8.46) is the unobservable $\theta$, and the signals available to each agent, $S$, are the cumulative dividend $D$ in Eq. (8.45) and the two signals in Eq. (8.47), where the latter are subject to behavioral biases as explained.
First, consider an agent who observes the cumulative dividend $D$ in Eq. (8.45), such that the Brownian motion in Section 8.6.1 is $W=[W^f\ W^D]^\top$ here, with the covariance term in Eq. (8.44) being then equal to zero, $cov(df,dD)=0$. Then, by Eq. (8.42), the expected instantaneous dividend at $\tau$ given all available information, $\hat f(\tau)\equiv E(f(\tau)|F_\tau)$, satisfies:
$$d\hat f(\tau)=\kappa(\bar f-\hat f(\tau))\,d\tau+\frac{\gamma}{\sigma_D^2}\big(dD(\tau)-\hat f(\tau)\,d\tau\big), \tag{8.48}$$
where $\gamma$ is the positive root of $0=\gamma^2/\sigma_D^2+2\kappa\gamma-\sigma_f^2$. This is entirely consistent with the early models of learning reviewed in Chapter 7.

5 Alternatively, we may think of agent $A$ (resp. agent $B$) as having access to more precise information about signal $s^A$ (signal $s^B$) than agent $B$ (agent $A$). However, in this case, agents would not disagree once information is aggregated.

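Next, assume that agents also observe the signals in Eq. (8.47) but have no overconfidence. Then, below, we shall argue that all agents update priors through new information as follows,
$$d\hat f=\kappa(\bar f-\hat f)\,d\tau+\gamma\Big(\frac{1}{\sigma_D}d\hat W^D+\frac{1}{\sigma_s}d\hat W^A+\frac{1}{\sigma_s}d\hat W^B\Big), \tag{8.49}$$
where $\hat W^D$ is a Brownian motion under the information set $F$ available to the agents, and is defined as:
$$d\hat W^D=\frac{1}{\sigma_D}\big(dD-\hat f\,d\tau\big).$$
The two Brownian motions $\hat W^A$ and $\hat W^B$ are defined similarly, and will be dealt with later (see Eqs. (8.53) below). Note that Eq. (8.49) collapses to Eq. (8.48), once we assume that agents have no access to the two signals in Eq. (8.47) or, equivalently, that these signals are uninformative, i.e. $\sigma_s\to\infty$.
Finally, consider the case in which agents have overconfidence. Agent $A$, to start with, thinks of the signal $s^B$ as being generated by Eq. (8.47), but regarding $s^A$, he assumes that,
$$ds^A(\tau)=f(\tau)\,d\tau+\phi\sigma_s\,dW^A(\tau). \tag{8.50}$$
In terms of the filtering problem in Section 8.6.1, the information set of this agent includes realizations of $S=[D\ s^A\ s^B]^\top$ up to time $\tau$, where $s^A$ is solution to Eq. (8.50) and $s^B$ is solution to Eq. (8.47), such that $b$ and $B$ in Eq. (8.40) and (8.41) are,
$$b=[\sigma_f\ \ 0\ \ 0\ \ 0],\qquad B=\begin{bmatrix}0&\sigma_D&0&0\\0&0&\phi\sigma_s&0\\0&0&0&\sigma_s\end{bmatrix},$$
and then, by straightforward calculations, and Eq. (8.42), the dividend expected by agent $A$, $\hat f^A\equiv E^A(f|F)$, satisfies
$$d\hat f^A=\kappa(\bar f-\hat f^A)\,d\tau+\gamma\Big(\frac{1}{\sigma_D}d\hat W^A_D+\frac{1}{\phi\sigma_s}d\hat W^A_A+\frac{1}{\sigma_s}d\hat W^A_B\Big), \tag{8.51}$$
where the Brownian motions will be defined in a moment, and $\gamma$ is the positive root of,
$$0=\Big(\frac{1}{\sigma_D^2}+\frac{1}{\phi^2\sigma_s^2}+\frac{1}{\sigma_s^2}\Big)\gamma^2+2\kappa\gamma-\sigma_f^2.$$
Likewise, the dividend expected by agent $B$, $\hat f^B\equiv E^B(f|F)$, satisfies
$$d\hat f^B=\kappa(\bar f-\hat f^B)\,d\tau+\gamma\Big(\frac{1}{\sigma_D}d\hat W^B_D+\frac{1}{\sigma_s}d\hat W^B_A+\frac{1}{\phi\sigma_s}d\hat W^B_B\Big). \tag{8.52}$$
Finally, the Brownian motions in Eqs. (8.51)-(8.52) satisfy,
$$d\hat W^i_D=\frac{1}{\sigma_D}\big(dD-\hat f^i\,d\tau\big),\qquad d\hat W^i_j=\frac{1}{\phi_{ij}\sigma_s}\big(ds^j-\hat f^i\,d\tau\big),\quad i,j\in\{A,B\}, \tag{8.53}$$
where,
$$\phi_{ij}=\begin{cases}\phi&\text{if }i=j\\1&\text{if }i\ne j\end{cases}$$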


Naturally, the two standard Brownian motions $\hat W^A$ and $\hat W^B$ in Eq. (8.49) are a special case of those in Eq. (8.53), namely for $\phi=1$.
Let us define the differences in beliefs as,
$$g^A\equiv\hat f^B-\hat f^A,\qquad g^B\equiv\hat f^A-\hat f^B. \tag{8.54}$$
That is, a negative value of $g^A$ is interpreted as a state in which agent $A$ is more optimistic than $B$.
We wish to express the dynamics of $g^A$ under the probability space of agent $A$, and the dynamics of $g^B$ under the probability space of agent $B$. Let us consider the terms of disagreement between the two types of agents. As regards the dynamics of cumulative dividends, we have, by Eq. (8.53), that:
$$d\hat W^B_D=d\hat W^A_D-\frac{1}{\sigma_D}g^A\,d\tau,$$
and concerning the signals,
$$d\hat W^B_A=\phi\,d\hat W^A_A-\frac{1}{\sigma_s}g^A\,d\tau,\qquad d\hat W^B_B=\frac{1}{\phi}d\hat W^A_B-\frac{1}{\phi\sigma_s}g^A\,d\tau,$$
such that, using Eqs. (8.51)-(8.52),
$$dg^A=-\rho\,g^A\,d\tau+\sigma_g\,dW^A_g, \tag{8.55}$$
for some positive constants $\rho$ and $\sigma_g$, and a Brownian motion $W^A_g$.6 Differences in beliefs are mean-reverting to zero. Intuitively, both agents believe that the long term value of the instantaneous dividend is $\bar f$. Moreover, agents in group $A$ think that future dividends will reflect their own beliefs, not the beliefs mistakenly held by agents in group $B$. Hence, they expect the difference in opinions to shrink over time. Naturally, agents in group $B$ make a symmetrically opposite reasoning.
8.6.2.2 Bubbles as re-sale American options

The distinctive mark of the model is that trading occurs between agents due to difference in beliefs, and leads to bubbles arising as a result of short-selling constraints, consistent with previous insights of Harrison and Kreps (1979). Intuitively, at any point in time, any asset is held by the relatively more optimistic agent, whose asset evaluation is higher than his own assessment of the fundamentals, expecting as he is to sell the asset to some future relatively more optimistic agent. Prices then deviate from any agent's fundamental evaluation because short-selling constraints bias the price towards the more optimistic agents.
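What is the optimal time at which to sell? It is a real option problem, of the kind introduced in Chapter 4. Let $p^o(\tau)$ denote the asset price at $\tau$ that agents in group $o\in\{A,B\}$ are willing to pay, such that,
$$p^o(\tau)=\sup_{t\ge 0}E^o_\tau\Big[\int_\tau^{\tau+t}e^{-r(u-\tau)}dD(u)+e^{-rt}\big(p^{\bar o}(\tau+t)-c\big)\Big], \tag{8.56}$$
where $E^o_\tau$ denotes the time $\tau$ conditional expectation of agents $o$, $\bar o$ denotes the other group, $D$ is the cumulative dividend process in Eq. (8.45), $r$ is the constant interest rate, and $c$ is a transaction cost, to be discussed below.

6 The constants are $\rho=\kappa+\gamma\big(\frac{1}{\sigma_D^2}+\frac{1+\phi^2}{\phi^2\sigma_s^2}\big)$ and $\sigma_g=\frac{\gamma}{\sigma_s}(\phi^{-2}-1)\sqrt{1+\phi^2}$, and the Brownian motion satisfies: $dW^A_g=\frac{1}{\sqrt{1+\phi^2}}\big(d\hat W^A_B-\phi\,d\hat W^A_A\big)$.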


In words, the price agents are willing to pay at $\tau$ reflects the dividends paid out over the holding period and the price agents are willing to pay at time $\tau+t$, net of the transaction cost. The conjecture is that the solution $p^o$ contains a bubble component,
$$p^o(\tau)=\frac{\bar f}{r}+\frac{\hat f^o(\tau)-\bar f}{r+\kappa}+B(g^o(\tau)). \tag{8.57}$$

The first two terms on the R.H.S. of Eq. (8.57) amount to the payoff of the asset over its life expected by agents in group $o$, $E^o_\tau\big(\int_\tau^\infty e^{-r(u-\tau)}dD(u)\big)$: the fundamentals. According to the conjecture, agents are willing to pay more than the fundamentals, with $B(g^o)$ representing a bubble, a function of $g^o$, the difference in beliefs in Eq. (8.54). The bubble arises exactly because agents in group $\bar o$ bid up the price to their own evaluations and agents in group $o$ cannot sell short.
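The fundamentals term can be verified numerically. The sketch below assumes that the expected instantaneous dividend reverts to its long-run mean f_bar at speed kappa, so that its forecast u periods ahead is f_bar + (f_hat - f_bar)*exp(-kappa*u); discounting these forecasts at rate r and integrating should return f_bar/r + (f_hat - f_bar)/(r + kappa). Function names, the truncation horizon and the parameter values are illustrative assumptions.

```python
import math

def fundamentals_closed_form(f_hat, f_bar, r, kappa):
    """Present value of expected dividends under mean-reverting beliefs."""
    return f_bar / r + (f_hat - f_bar) / (r + kappa)

def fundamentals_numerical(f_hat, f_bar, r, kappa, horizon=400.0, steps=400000):
    """Trapezoid-rule integral of discounted expected dividends on [0, horizon]."""
    h = horizon / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        weight = 0.5 if k in (0, steps) else 1.0  # end points get half weight
        expected_dividend = f_bar + (f_hat - f_bar) * math.exp(-kappa * u)
        total += weight * math.exp(-r * u) * expected_dividend
    return total * h

# Hypothetical values: current belief above the long-run mean.
closed = fundamentals_closed_form(f_hat=1.5, f_bar=1.0, r=0.05, kappa=0.3)
numeric = fundamentals_numerical(f_hat=1.5, f_bar=1.0, r=0.05, kappa=0.3)
```

Any price paid in excess of this quantity is what the text labels the bubble component.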
Replacing Eq. (8.57) into Eq. (8.56), and using the definition of $g^o$ in Eq. (8.54) leaves:
$$B(g^o(\tau))=\sup_{t\ge 0}E^o_\tau\Big[e^{-rt}\Big(\frac{g^o(\tau+t)}{r+\kappa}+B(g^{\bar o}(\tau+t))-c\Big)\Big]. \tag{8.58}$$
The bubble is a re-sale option: it arises because the current asset holders have the option to re-sell it in the future at a price higher than their own evaluation. Eq. (8.58) shows that its value is determined through an optimal stopping time to sell, similarly as with a perpetual American option (see Chapter 4), with the complication that the strike is endogenous, reflecting the value of the bubble as perceived by the future buyer at the optimal stopping time.
From results in Chapter 4 (see Section 4.6), one can conjecture that there exists a region of inaction, where the asset owner holds the asset, and a threshold $k^*$, such that the optimal stopping time in Eq. (8.58) is $t^*=\inf(t:\ g^o(\tau+t)\ge k^*)$. In particular, in the continuation region, where the agent does not sell, the (discounted) bubble is a martingale, such that by Itô's lemma and Eq. (8.55),
$$L[B]-rB=0, \tag{8.59}$$
where $L[B]=\frac{1}{2}\sigma_g^2B''-\rho gB'$ is the infinitesimal generator for $g$ in (8.55). In the stopping region, where $L[B]-rB\le 0$,
$$B(g)=\frac{g}{r+\kappa}+B(-g)-c. \tag{8.60}$$
Finally, a solution to this optimal stopping problem satisfies the usual smooth-pasting conditions: (i) the function solution to Eq. (8.59) for $g<k^*$ is the same as the function satisfying Eq. (8.60) at $g=k^*$; and (ii) the derivatives of these two functions are the same at $g=k^*$. Scheinkman and Xiong (2002) show that there is a solution to this problem, and that $k^*\ge 0$, and that $k^*=0$ only when the transaction cost $c=0$. To illustrate, agent $o$ needs to become sufficiently pessimistic ($g^o>0$) to be able to justify the transaction cost while selling the asset. Intuitively, Brownian motions hit zero infinitely many times, such that agents will trade very often, driven by their ever switching differences in opinions. The presence of a transaction cost mitigates the occurrence of such a trading frenzy.

8.6.3 General equilibrium without frictions


We illustrate how to incorporate differences in beliefs in general equilibrium in a framework with complete markets. We begin with an introductory model, formulated by Kogan, Ross, Wang and Westerfield (2006), which actually addresses an old and important question brought
about at least since Friedman (1953): how long would irrational investors survive in financial markets?
The technical issue arising while addressing this question links to how the agents' priors are aggregated in equilibrium: irrational agents are those who systematically believe in a model deviating from the truth, and never wish to learn, thereby holding different beliefs than those of the rational agents. Section 8.6.3.1 studies such a model and Section 8.6.3.2 generalizes it to a context of learning, in which there are no agents who are more or less rational than others, along the lines of Scheinkman and Xiong (2002), but within a framework of frictionless markets without short-sales constraints. The sentiment model of Dumas, Kurshev and Uppal (2009) is a special case of this framework, and vividly illustrates the main predictions arising therefrom. It will be discussed in Section 8.6.3.3.
8.6.3.1 Difference in beliefs and extinction of irrational traders

Kogan, Ross, Wang and Westerfield (2006) (KRWW, in the sequel) analyze the market impact of a drastic source of differences in beliefs, arising due to the irrationality of some agents. We briefly review this model both because of its outstanding economic importance and the aggregation techniques needed to solve it. The ensuing aggregation results are generalized in the following subsection devoted to the analysis of disagreement and learning in general equilibrium with multiple agents and frictionless markets.
Consider an economy in which there is a riskless asset in zero net supply, and one risky asset that entitles to an instantaneous dividend flow. Dividends follow a geometric Brownian motion under the physical probability $P$, with parameters $\mu$ and $\sigma$,
$$\frac{dD(\tau)}{D(\tau)}=\mu\,d\tau+\sigma\,dW(\tau), \tag{8.61}$$
with obvious notation. Rational agents correctly believe dividends are as in Eq. (8.61). However, irrational agents think that the dividend process has a higher drift than in Eq. (8.61),
$$\frac{dD(\tau)}{D(\tau)}=(\mu+\sigma^2\eta)\,d\tau+\sigma\,dW_2(\tau), \tag{8.62}$$
where $\eta$ is a constant and $W_2$ is a Brownian motion, to be defined in a moment. Eq. (8.62) formalizes the idea that irrational agents understand the market under their own probability $P_2$ (say), on which $W_2(\tau)=W(\tau)-\sigma\eta\tau$ is a Brownian motion by Girsanov's theorem. If $\eta>0$, the irrational agent is optimistic, in that he views dividends grow more than justified by the true probability laws.
The change of probability from the benchmark rational agent to the irrational is given by the Radon-Nikodym derivative,
$$\frac{dP_2}{dP}\Big|_{F_\tau}=\xi(\tau),$$
where $F_\tau$ is the information set available to all agents, and the density process $\xi$ is solution to
$$\frac{d\xi(\tau)}{\xi(\tau)}=\sigma\eta\,dW(\tau). \tag{8.63}$$
In this model, irrational agents do not attempt to learn from the history of dividends and their cognitive bias is not eliminated as a result.
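The change of probability can be illustrated numerically. The sketch below assumes a density of the exponential-martingale form xi(T) = exp(-0.5*(sigma*eta)^2*T + sigma*eta*W(T)), with the bias parameter eta scaled so that the irrational agent's perceived drift is mu + sigma^2*eta (an assumption about notation). It checks that the density integrates to one, and that weighting terminal dividends by the density reproduces the mean the irrational agent has in mind. All names and values are hypothetical.

```python
import math

def expectation_over_brownian(func, horizon, n=20001, width=12.0):
    """E[func(W_T)] for W_T ~ N(0, horizon), by the trapezoid rule."""
    h = 2.0 * width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0  # end points get half weight
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * func(math.sqrt(horizon) * z) * density
    return total * h

# Hypothetical parameter values.
mu, sigma, eta, T, D0 = 0.02, 0.2, 1.5, 5.0, 1.0

def belief_density(w):
    """Radon-Nikodym density of the irrational agent's beliefs at T."""
    return math.exp(-0.5 * (sigma * eta) ** 2 * T + sigma * eta * w)

def terminal_dividend(w):
    """Geometric Brownian motion dividend at T under the physical measure."""
    return D0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)

total_mass = expectation_over_brownian(belief_density, T)
irrational_mean = expectation_over_brownian(
    lambda w: belief_density(w) * terminal_dividend(w), T)
perceived_mean = D0 * math.exp((mu + sigma ** 2 * eta) * T)
```

The same device, an expectation under the physical probability weighted by the density, is what allows the irrational agent's objective to be written under the rational agent's probability.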

KRWW assume that all agents consume at some time $T$, that they have CRRA equal to $\gamma$, and that they have the same endowments.7 Therefore, the rational and irrational investors maximize,
$$E\Big[\frac{c_{1,T}^{1-\gamma}}{1-\gamma}\Big]\quad\text{and}\quad E_2\Big[\frac{c_{2,T}^{1-\gamma}}{1-\gamma}\Big]=E\Big[\xi(T)\frac{c_{2,T}^{1-\gamma}}{1-\gamma}\Big], \tag{8.64}$$
respectively, where $E_2$ denotes the expectation under $P_2$ taken by the irrational agent, and the equality follows by a change of probability. Note that because $\xi(T)$ is strictly positive, the irrational agent becomes more and more aggressive as he disagrees more with the rational, where disagreement is captured by $\xi(T)$.
Because markets are complete, consumption allocations can be determined by solving a central planner program as explained in Chapter 2, whose value is:
$$u(D_T,\xi(T))\equiv\max_{c_{i,T},\,i=1,2:\ c_{1,T}+c_{2,T}=D_T}\Big[\frac{c_{1,T}^{1-\gamma}}{1-\gamma}+b\,\xi(T)\frac{c_{2,T}^{1-\gamma}}{1-\gamma}\Big], \tag*{[8.P1]}$$
where $b$ denotes the ratio of the marginal utilities of income for the two agents. The solution is
$$c_{1,T}=\frac{1}{1+(b\xi(T))^{1/\gamma}}D_T,\qquad c_{2,T}=\frac{(b\xi(T))^{1/\gamma}}{1+(b\xi(T))^{1/\gamma}}D_T. \tag{8.65}$$
As anticipated, the irrational agent receives an allocation that increases with his disagreement: it is as if his utility function in (8.64) had a random weight, $\xi(T)$, arising through how much he will disagree regarding the dividend path leading to $D_T$ at the consumption date $T$.
We need to determine the ratio of marginal utilities of income, $b$. Replace the consumption allocations in (8.65) into the central planner program [8.P1], leaving:
$$u(D_T,\xi(T))=\big(1+(b\xi(T))^{1/\gamma}\big)^\gamma\frac{D_T^{1-\gamma}}{1-\gamma},$$
such that, denoting partial derivatives with subscripts, the pricing kernel is given by
$$m_T=\frac{u_D(D_T,\xi(T))}{E[u_D(D_T,\xi(T))]}=\frac{\big(1+(b\xi(T))^{1/\gamma}\big)^\gamma D_T^{-\gamma}}{E\big[\big(1+(b\xi(T))^{1/\gamma}\big)^\gamma D_T^{-\gamma}\big]}. \tag{8.66}$$
Note that because consumption only takes place at terminal date, $T$, the denominator of $m_T$ contains the expectation of the representative agent's marginal utility, not its value at zero as it is instead the standard case (Chapter 7, Section 7.5.2). As shown in Chapter 4 (Appendix 2), $m_T$ is the pricing kernel that determines the equilibrium prices using the money market account as the numeraire. KRWW show that (see the Appendix for a few steps underlying this derivation),
$$b=e^{-(1-\gamma)\eta\sigma^2T}. \tag{8.67}$$
Eqs. (8.66) and (8.67) can now be used to determine the asset price and the relative consumption share. Regarding the asset price process (expressed in terms of the money market account
7 Work on extinction by Cvitanić and Malamud (2011) relaxes the assumption that the mass of rational and irrational agents is the same.



numeraire), we have that, $S(\tau)=E_\tau(m_TD_T)/E_\tau(m_T)$. This expression considerably simplifies when agents have logarithmic preferences, $\gamma=1$, in which case,
$$S(\tau)=S^r(\tau)\,\frac{1+\xi(\tau)}{1+\xi(\tau)e^{-\eta\sigma^2(T-\tau)}}, \tag{8.68}$$
where $S^r$ denotes the price in an economy that is only populated by rational agents, i.e. $\eta=0$. That is, the presence of optimistic agents ($\eta>0$) inflates asset prices beyond rational evaluations.
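The size of the price inflation in the logarithmic case is easy to tabulate. The sketch below assumes that the ratio of the equilibrium price to the purely rational price is (1 + xi)/(1 + xi*exp(-eta*sigma^2*(T - t))), with xi the current value of the density process and eta the bias parameter; the function name and the parameter values are hypothetical.

```python
import math

def sentiment_price_ratio(xi, eta, sigma, time_to_maturity):
    """Ratio of the equilibrium price to the rational price under log utility."""
    decay = math.exp(-eta * sigma ** 2 * time_to_maturity)
    return (1.0 + xi) / (1.0 + xi * decay)

# No bias (eta = 0): the price coincides with the rational price.
no_bias = sentiment_price_ratio(xi=1.0, eta=0.0, sigma=0.2, time_to_maturity=5.0)
# Optimistic irrational agents (eta > 0): the price is inflated.
low_sentiment = sentiment_price_ratio(xi=0.5, eta=1.5, sigma=0.2, time_to_maturity=5.0)
high_sentiment = sentiment_price_ratio(xi=2.0, eta=1.5, sigma=0.2, time_to_maturity=5.0)
```

For a positive bias the ratio rises with the density process, which is the sense in which a positive dividend shock raises prices through two channels at once.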
The density process $\xi$ can, thus, be interpreted as sentiment, leading as it does to the following euphoria. Suppose a positive shock hits dividends; then the asset price goes up because both $S^r$ and $\xi$ go up in Eq. (8.68). That is, prices are driven both by rational evaluations and sentiment. The term sentiment has been used in a context with irrational agents and learning by Dumas, Kurshev and Uppal (2009) (see Section 8.6.3.3).
This property can be generalized to the case of any CRRA coefficient. Indeed, given the expression for $m_T$ in Eq. (8.66), we can rely on results in Chapter 7 (Section 7.5.2), and determine the unit risk-premium $\lambda$ in this economy. It is:
$$\lambda(\tau)=\gamma\sigma-\frac{(b\xi(\tau))^{1/\gamma}}{1+(b\xi(\tau))^{1/\gamma}}\,\sigma\eta, \tag{8.69}$$
and is less than in the rational economy, due to the permanent difference in beliefs, $\sigma\eta>0$. All in all, the aggregate risk-premium in this economy is lower, reflecting the optimistic view of the irrational agents: asset prices (expressed in terms of the money market numeraire) are higher than in the purely rational economy. Note, also, that the aggregate risk-appetite is time-varying and driven by market sentiment, $\xi$. In good times (i.e., after a positive dividend shock), sentiment increases, such that $\lambda$ lowers in Eq. (8.69), leading to speculative enthusiasm and higher asset evaluation.
Would irrational traders ever disappear in the long-term? Define the relative consumption share of the irrational agent against the rational, which by Eq. (8.65) and then (8.63) and (8.67) is,
$$\frac{c_{2,T}}{c_{1,T}}=(b\xi(T))^{1/\gamma}=\exp\Big\{-\frac{1}{\gamma}\Big(1-\gamma+\frac{1}{2}\eta\Big)\eta\sigma^2T+\frac{1}{\gamma}\eta\sigma W(T)\Big\}.$$
KRWW define relative extinction of irrational traders as occurring whenever $\lim_{T\to\infty}c_{2,T}/c_{1,T}=0$ almost surely. By the Strong Law of Large Numbers for Brownian motions (Karatzas and Shreve, 1991), we have that given two constants $k_0$ and $k_1$,
$$\lim_{T\to\infty}e^{k_0T+k_1W(T)}=\begin{cases}0&\text{if }k_0<0\\ \infty&\text{if }k_0>0\end{cases}$$
irrespective of the sign of $k_1$.
In the log-utility case, $\gamma=1$, the irrational trader does not survive. For $\gamma>1$, relative extinction occurs whenever the irrational agent is either pessimistic, $\eta<0$, or too optimistic, $\eta>2(\gamma-1)$. However, the irrational trader would survive as soon as he is moderately optimistic, $0<\eta<2(\gamma-1)$, in which case the rational agent would not survive. The interpretation for the rational trader extinction is that the irrational trader has a price impact that makes low dividend states cheaper than the good, such that the rational trader accumulates more wealth in bad states than in the (relatively more likely) better states.
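The survival conditions above amount to checking the sign of a single drift coefficient. The sketch below assumes that the log of the irrational-to-rational consumption ratio grows at the rate -(1 - gamma + eta/2)*eta*sigma^2/gamma per unit of time, with gamma the CRRA coefficient and eta the bias parameter, and classifies the long-run outcome accordingly. Function names, labels and the default volatility are hypothetical.

```python
def log_share_drift(gamma, eta, sigma):
    """Deterministic growth rate of ln(c_irrational / c_rational)."""
    return -(1.0 - gamma + 0.5 * eta) * eta * sigma ** 2 / gamma

def long_run_outcome(gamma, eta, sigma=0.2):
    """Classify relative extinction from the sign of the drift."""
    drift = log_share_drift(gamma, eta, sigma)
    if drift < 0.0:
        return "irrational trader vanishes"
    if drift > 0.0:
        return "rational trader vanishes"
    return "no relative extinction"

# Log utility: any bias leads to extinction of the irrational trader.
log_case = long_run_outcome(gamma=1.0, eta=0.5)
# CRRA of 3: moderate optimism (0 < eta < 4) drives out the rational trader.
moderate_optimist = long_run_outcome(gamma=3.0, eta=2.0)
# Excessive optimism or pessimism drives out the irrational trader.
extreme_optimist = long_run_outcome(gamma=3.0, eta=5.0)
pessimist = long_run_outcome(gamma=3.0, eta=-1.0)
```

The Brownian term in the exponent does not matter for the classification, since by the law of large numbers it is dominated by the drift whenever the latter is not zero.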

8.6.3.2 Differences in beliefs in markets with multiple agents

We consider an economy in which agents hold different beliefs regarding the fundamentals of
the economy, generalizing the two-person setting originally formulated by Detemple and Murthy
(1994), Zapatero (1998), and others such as Basak (2000, 2005), Berrada (2006), or Buraschi
and Jiltsov (2006). For example, some agent, the benchmark, may think that output has
expected growth and constant volatility 0 ,
=

(8.70)

We define the agents with these beliefs as the benchmark, in that any other agent's beliefs are
gauged against theirs, as explained below. Accordingly, and for a given , consider
the Radon-Nikodym derivative of the probability
of agent against the benchmark = 1,
and its density process,
say, which by Girsanov's theorem satisfies
=

(8.71)

For each agent , the dynamics of output is,


=

(8.72)

=
+ 10 is a Brownian motion under . This example generalizes the model
where
in the previous section, in which and are constant, as summarized by Eq. (8.63).
We refrain, until the next subsection, from specifying additional examples and explaining mechanisms leading to disagreement, and focus on the determination of the equilibrium in this economy. Without loss of generality, define the first agent as the benchmark. We
assume that markets are complete and that there is symmetric albeit incomplete information,
and denote with F the information set available to all agents. The Radon-Nikodym derivatives,

{2 }
(8.73)
1
F

formalize the notion of disagreement between agent and the benchmark at a notion made
more precise below.
Each agent maximizes his intertemporal utility, subject to the budget constraint,
Z
Z

max
( )
s.t.
= 0
[8.P2]
(

where
denotes expectation under , 0 is the initial wealth available to agent , and
is
the private state-price process, or pricing kernel, of agent , a concept we elaborate on in a
moment. Remaining notation is straightforward.
We know from Part I of the lectures that a state-price process links to the valuation
of a consumption unit at some future pre-specified state of nature. Heuristically, it is,

0
(8.74)

where denotes the risk-neutral probability as usual, which links to the value of Arrow-Debreu
securities as we know. Note that
in Eq. (8.74) can vary across agents due to differences in
opinions, although the agents still need to agree on the price of the assets they observe: they
simply have different perspectives regarding the future developments of the asset's price.
To formalize the idea that agents disagree against a same benchmark, note that,
=

and cast the program [8.P2] under the probability space of the benchmark agent, such that
each agent (note, ) maximizes his intertemporal utility subject to his budget constraint,
Z
Z

1
1
( )
s.t.
= 0
[8.P3]
max
1
(

The program [8.P3] is a standard complete markets problem in which each agent has an
instantaneous utility function equal to
( ) and a commonly agreed state-price process 1 ,
that of the benchmark agent. All agents do agree on the current price of all assets, although
they now act more aggressively on their consumption as their divergence of opinion against the
first agent widens, as summarized by a utility distortion factor . The program [8.P3] is the
intermediate-consumption, infinite-horizon extension of the program of maximizing terminal
utility in (8.64).
Because markets are complete, we can study the asset pricing implications of this economy
by centralizing it as usual. Consider a representative agent with instantaneous utility equal to,
= max
( )

=1

( )

s.t.

=1

[8.P4]

=1

where
denotes output at , and 1 is the marginal utility of income of agent . To simplify
the presentation and focus on the salient aspects of disagreement, we set
for all .
The first order conditions of the program [8.P4] are,
0

( )

is a Lagrange multiplier. Assuming agents have CRRA equal to , we can solve

1
1
=
, plug it
the previous equation for the consumption allocation to agent ,
into the constraint of
=

the program [8.P4], thereby determining the Lagrange multiplier,


1
1P
)
and, ultimately, the sharing rule regarding consumption allocation,
=1 (
where

=P

)1

=1

)1

(8.75)

Intuitively, and as anticipated, the more aggressive agents are, compared to the benchmark
(i.e., the higher ), the higher their consumption allocation. Eq. (8.75) is, naturally, reminiscent of the consumption allocation in Eq. (8.65) applying to the KRWW model. However,
disagreement, , is now allowed to take a more general form than in (8.63): it is time-varying
and reflects the agents' learning as in (8.71).
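The logic of the sharing rule can be sketched numerically. The snippet below is a minimal illustration under stated assumptions, not the text's exact formula: all agents share the same CRRA coefficient `gamma`, the planner attaches a positive weight to each agent (`weights`), and each agent's utility is scaled by the current level of his belief density (`densities`). All three names are hypothetical. Equating weighted, belief-distorted marginal utilities across agents and imposing that allocations sum to output yields shares proportional to (weight times density) raised to the power 1/gamma.

```python
def consumption_shares(weights, densities, gamma):
    """Shares of aggregate output in a planner problem with CRRA agents.

    Assumed first order condition for each agent i:
        weights[i] * densities[i] * c_i ** (-gamma) = multiplier,
    so that c_i is proportional to (weights[i] * densities[i]) ** (1 / gamma).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(weights) != len(densities):
        raise ValueError("weights and densities must have the same length")
    scaled = [(w * d) ** (1.0 / gamma) for w, d in zip(weights, densities)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Two agents with equal weights; the second agent's density is higher,
# so he consumes a larger share of output.
shares = consumption_shares([1.0, 1.0], [1.0, 1.5], gamma=2.0)
print(shares)
```

A higher risk aversion flattens the shares towards equality, since the exponent 1/gamma shrinks the effect of differences in weights and densities.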
The value function for the central planner is obtained by replacing Eq. (8.75) into the maximand of the program [8.P4],
(

( ) =1 )

=1

(8.76)

By the usual arguments (see Section 7.5.2 in Chapter 7), the pricing kernel in this economy is
given by 1
, where
(
( ) =1 )
(8.77)
=
( 0 ( 0 ) =1 )
0
where the subscript denotes a partial derivative with respect to the first argument.
Note that to price any asset, we still need to determine the marginal utility of income for
each agent , 1 . One approach could be to search for a cross-sectional distribution of Pareto
weights that best fits selected moments of the price distribution, as Chan and Kogan (2002) did
in their model surveyed in Section 8.3. Note that the cross-section of Pareto weights depends
not only on wealth distribution, but also on beliefs. Indeed, by results in Chapter 4 (see Section
4.5.2), we have that the reciprocal of the marginal utility of income is,
Z
1

1 1
1
1
= 0
1
0

The asset market equilibrium can now be studied by determining the risk-premiums,
and interest rates as implied by the pricing kernel for = 1 in Eq. (8.74),

0
0
1
1

say,

where,

(8.78)

Finally, Eq. (8.77) shows that the agents' disagreement, as summarized by the cross-section
( ) =1 , is a potential source of increased volatility of the pricing kernel and, then, resolution of
the equity premium and other puzzles. The model of the next section discusses a special case
of this framework that seems to illustrate this role.
8.6.3.3 Two-person equilibrium

Dumas, Kurshev and Uppal (2009) (DKU, in the sequel) consider a model in which agents
disagree on the fundamentals due to overconfidence, similarly as in the model of Scheinkman
and Xiong (2002) discussed in Section 8.6.2. This model is a special case of that in the previous
section: there are two agents ( = 2) with the same CRRA coefficient , with one of them
holding different beliefs due to overconfidence, as explained below.
The key assumptions of the model are that the expected growth of output
,
say, is
unobserved, and that the only available information is the observation of
and one additional
signal that is totally uninformative about the state of the economy. However, an overconfident
investor believes this signal is correlated with the economic fundamentals, whence a difference
in beliefs between him and a second, rational investor, the benchmark.

In detail, output is the solution to
=

= (
+
(8.79)

where 0 , , and are constant, and 0 and are standard Brownian motions. Investors
also observe an uninformative signal, solution to

=

where is another standard Brownian motion. However, the overconfident investor believes
that this signal is correlated with the fundamentals, in that
p

2
= :
=
1
+

These assumptions substantially match those in the published version of Scheinkman and Xiong
(2002), as mentioned in Section 8.6.2.
Let us solve for the inference problem of the irrational investor, the problem of the rational
being a special case, notably for = 0. In terms of the learning problem of Section 8.6.1, we
have that the vector Brownian motion is
=[
], such that and in Eq. (8.40) and
8
(8.41) are

0
0
0
p
(8.80)
= [0 0
]
=
2
0
1
( | F ) is

and the inferred output growth

= (

(8.81)

where F denotes the information set available to the irrational agent. Instead, the rationally
expected output growth is
( | F ) and satises
= (

(8.82)
0

where F is the information set available to the rational agent. Remaining notation is as follows:
and
are solutions to Eq. (8.43), with 0 = , 1 =
, = [1 0], and the matrix
is as in Eqs. (8.80), where
is determined by setting = 0. Below, we shall define the two
Brownian motions in Eqs. (8.81) and (8.82).
We take the rational agent to be the benchmark. In terms of the model in the previous subsection, he thinks output is as in Eq. (8.70), where is his expected dividend growth, solution
to Eq. (8.82); by construction, the two Brownian motions in Eqs. (8.70) and (8.82) coincide. The
second, overconfident investor disagrees, and thinks that the expected dividend growth is ,
solution to Eq. (8.81).
As in the previous subsections, the density process summarizing the difference in opinions
between the irrational and the benchmark is the process
solution to Eq. (8.71) with
8 Note that in Section 8.6.1, the dynamics of the signals, S , are expressed in basis point terms, as opposed to the dynamics
of output in Eq. (8.79). However, the inference yields the same result after a suitable definition of the Brownian motions.

. Thus, and consistent with similar conventions in previous sections,


0 is
indicative of states in which the irrational agent is pessimistic compared to the rational agent.
Note that from the perspective of the irrational agent, output is solution to Eq. (8.72), with

, and for the Brownian motion


in Eq. (8.81), such that
=
+ 10 .
Finally, the Brownian motion
is simply defined as
, and is therefore uncorrelated with
. The dynamics of disagreement, , are
=

2
where
+
(
) 0.
0 and
DKU term the density process "sentiment" to emphasize that it arises through overconfidence
in the agent's processing of publicly available signals. Because markets are complete, the asset pricing implications of this economy are obtained by relying on the instantaneous
utility of a representative agent, which by Eq. (8.76) is,
1

1
1
=
+ ( 2)
1
1

where denotes the reciprocal of the marginal utility of agent , as usual.


By results in Chapter 7 (Section 7.5.2), we can determine interest rates and risk-premium
and, then, the pricing measure, as indicated by Eq. (8.78). We can, equivalently, rely on the
solution of the central planner. By Eq. (8.77), we have that the pricing kernel satises

1
1
+
(
)
2
1

=
(8.83)
1
1
0
0
+
(
)
0 2
1

such that the risk premium is

)=

(
1
1

+(

)1
2

(8.84)

)1

It is increasing in both sentiment


and disagreement
Similarly, one can show that the

) for some function


short-term rate at time , , is a function of , and , i.e. = (
().
The risk-premium in Eq. (8.84), (
), generalizes the risk-premium in the Kogan, Ross,
Wang and Westerfield (2006) market (see Eq. (8.69) in Section 8.6.3.1), in that the risk-premium
is now time-varying due to both sentiment and differences in opinion (differences in opinion
never change in the model of Section 8.6.3.1). When the irrational agent is optimistic,
0, an
increase in sentiment lowers the risk-premium to a level below that in the Lucas model. However,
note that in times in which the irrational agent is pessimistic,
0, a higher sentiment
would exacerbate the equity premium.
Given the previous characterization of the risk-premium and the interest rate, it follows that
the price-dividend ratio is driven by the three state variables , and , and is solution to a
partial differential equation that has one of the usual forms seen in Chapter 7. DKU actually
note that a closed-form solution exists under the parameter restriction that is an integer;
they note that by the binomial formula,

X 2
1
1
+ ( 2)
= 1
1
=0

Relying on this formula and the expression for the pricing kernel in Eq. (8.83), one finds that
the price-dividend ratio satisfies

2
Z
Z
X
1 0

1
0
0
0
0
0
0
2
=0
1+ 1 0
(

0 )

where the dependence of on 0 and 0 arises through the expectation inside the integral, which
DKU calculate in closed form.
Asset prices are, then, driven by three state variables: (i) sentiment, , (ii) difference in
opinions, , and (iii) the correctly specified expected output growth, . The model generates
plausible values of both volatility and the equity premium. The main determinant of the equity
premium is the rational agent's aversion to the sources of risk introduced by the irrational agent,
summarized by sentiment and difference in opinions, and , as illustrated by the risk-premium
in Eq. (8.84). Expected returns and volatility are, then, high, compared to an economy with
rational agents (i.e. with = 0), with rational investors increasing the proportion of their
wealth in equity only when their evaluation of the expected fundamentals, , increases by a
considerable amount. DKU also find that the time needed for the extinction of the irrational
investors is quite long.

8.7 Coping with Knightian uncertainty


8.7.1 Prelude
8.7.1.1 Knightian uncertainty

This section still studies asset prices in economies where agents have limited knowledge of the
statistical laws for the fundamentals. The new element is that we assume that agents give up
thinking of having a single model to decipher the signals they receive. Rather, they formulate
multiple priors underlying the laws of the fundamentals, and act while being averse to the
uncertainty inherent in their own priors.
Note the difference between this approach and that in the previous section. In the previous
section, limited knowledge of the fundamentals leads the agents to disagree on the "right"
model. Naturally, disagreement is not a logical necessity given limited knowledge; however, it is
a natural assumption in this context, as exemplified by the overconfidence bias models dealt
with in Section 8.6.2 (see footnote 9). Still, the previous models with disagreement rely on the assumption
that agents have a unique prior with which they interpret the complex world where they live.
Instead, in this section, limited knowledge leads agents to be skeptical about their own ability
to process complicated pieces of information. The agents acknowledge that many explanations
are possible regarding how the economy works. We survey models that allow for this line of
reasoning but, to simplify, consider only one representative agent.
The context in which agents operate in this section has come to be dubbed as Knightian
uncertainty (Keynes, 1921; Knight, 1921), or ambiguity, that is, uncertainty that cannot be
9 Differences in beliefs do not necessarily arise through overconfidence. Cujean (2013) develops a two-agent model in which
expected dividend growth is unknown, with one agent thinking it is a continuous process, and another thinking it is a discrete
Markov chain instead.

quantified probabilistically. In fact, one may argue that, since the dawn of financial economics,
the building blocks of our models have been relying on the assumption that everything
could be quantified probabilistically; the leading examples in these early developments are
easy to detect in the precursory mean-variance model of Markowitz (1952) and in the
subsequent work leading to the CAPM (see Chapter 1).
But introducing Knightian uncertainty in economics is not a trivial task. We would need to
know how to model rational decision makers who face uncertainty. Standard decision theory
would not be helpful while thinking of uncertainty. We now proceed to a short survey of the
literature on ambiguity, while insisting on the key references in decision theory on which models
can be built. In subsequent sections, we show how these decision-theoretic foundations can be
relied upon to build up models that could be used to address the typical issues arising in
financial economics.
8.7.1.2 Survey notes regarding theoretic aspects of Knightian uncertainty

From the economic-theoretic standpoint, the important issue regards how to model aversion
to uncertainty (see footnote 10). One of the first approaches to emerge relies on the so-called capacities, as
explained in more detail below. This approach goes back to at least Schmeidler (1982, published
in 1989). The idea underlying capacities is to make rigorous use of non-additive measures to
formalize the concept of "loss in probability" as an attitude towards uncertainty in the context
of decision theory. Dow and Werlang (1992) are the first to analyze the implications of this
theory in the context of portfolio selection. First, they show, agents do not trade when prices
are not favourable enough. Second, and within their simple introductory example, they vividly
illustrate a fundamental albeit somewhat technical result known since at least Schmeidler (1986)
and further elaborated by Gilboa and Schmeidler (1989) in a context of decision theory: once
capacities are convex, the agent's behaviour is the same as that of an agent who has a max-min
criterion. Max-min criteria are best described as decision rules the agents implement while
believing nature will draw worst-case scenario events. Agents then take "robust" decisions, in
that their choices will lead to outcomes that are the ideal ones in bad times.
Max-min criteria lead to an analytically convenient framework utilized in both finance
and macroeconomics, as explained in this section. The approach to max-min in situations
of uncertainty was originally advocated by Wald (1950), Ellsberg (1961) and Rawls (1971),
and axiomatized by Gilboa and Schmeidler (1989). The max-min criterion of choice has been
extended to smoother formulations that allow one to disentangle a cognitive notion of uncertainty
from the attitude towards it.
Provide references in decision theory and on work in macroeconomics and finance.
[In progress]
8.7.1.3 Plan of the section

[In progress]
8.7.2 Uncertainty aversion and Ellsberg paradox
The Ellsberg paradox (Ellsberg, 1961) describes situations in which agents prefer to take a considerable amount of risk rather than engage in situations plagued by ambiguity. To illustrate
10 Gilboa and Marinacci (2011) provide a survey of Knightian uncertainty in economics and decision theory.

this paradox, consider the following example relying on a two-period market, zero interest rates,
and a risk-neutral Robinson Crusoe agent.
We assume that there are three Arrow-Debreu securities that pay off in three mutually
exclusive states of the world, states A, B and C (see Table 8.2 below). We initially assume that
the probability of state A is 1/3 and that of state B is 1/4, such that the securities' values, $P$ say,
are

$$P_A = \frac{1}{3}, \qquad P_B = \frac{1}{4}, \qquad P_C = \frac{5}{12} \tag{8.85}$$

with straightforward notation. Obviously, then, the value of a security paying off in states A or
C is higher than that paying off in states B or C,

$$P_{A\cup C} = \frac{9}{12} > P_{B\cup C} = \frac{8}{12} \tag{8.86}$$

That is, the ranking of two portfolios is preserved once we include the same additional assets
in each of these portfolios, provided the additional assets pay in states of the world in which
the initial portfolios do not.
Arrow-Debreu            states
securities          A    B    C
A                   1    0    0
B                   0    1    0
C                   0    0    1
A∪C                 1    0    1
B∪C                 0    1    1

TABLE 8.2. Arrow-Debreu securities in an uncertain market.

Next, suppose that the probability of state B, $\pi$ say, is unknown. It is a situation of uncertainty, not risk. We have:

$$P_A = \frac{1}{3}, \qquad P_{A\cup C} = 1 - \pi, \qquad P_{B\cup C} = \frac{2}{3}$$


How would we expect Mr Crusoe to rank these assets? We are actually stuck without making
additional assumptions regarding his attitude vis-à-vis uncertainty. We may proceed as follows.
We assume that Mr Crusoe is risk-neutral but at the same time so averse to uncertainty that if
hypothetically asked to go long on these assets, he would evaluate them at worst-case scenario.
To formalize this uncertainty, we may assume that Mr Crusoe conceives a band within
which the unknown probability $\pi$ of state B lies, $\pi \in (\underline{\pi}, \overline{\pi})$. The wider this band, the more averse he is
to uncertainty regarding $\pi$. Note that $\underline{\pi}$ and $\overline{\pi}$ can be interpreted both in terms of uncertainty
aversion and in cognitive terms, that is, in terms of the extent of ignorance about $\pi$. We address
this issue later; for now, we interpret this band as indicating Mr Crusoe's aversion to uncertainty.
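How does asset evaluation reflect uncertainty aversion? Assuming that Mr Crusoe evaluates
the assets at worst-case scenarios,

$$P_B = \min_{\pi \in (\underline{\pi}, \overline{\pi})} \pi = \underline{\pi} \quad \text{and} \quad P_{A\cup C} = \min_{\pi \in (\underline{\pi}, \overline{\pi})} (1 - \pi) = 1 - \overline{\pi}$$
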
An interesting phenomenon results. The ranking summarized by Eqs. (8.85)-(8.86) may break
down. In particular, if Mr Crusoe sets his band such that $\underline{\pi} = \frac{1}{3} - \epsilon_1$ and $\overline{\pi} = \frac{1}{3} + \epsilon_2$, for two
positive numbers $\epsilon_1$ and $\epsilon_2$ small enough, then,

$$P_A > P_B \quad \text{and} \quad P_{A\cup C} < P_{B\cup C} \tag{8.87}$$

That is, uncertainty aversion might actually undermine a "sure thing principle," a concept
we shall return to in a moment.
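The ranking reversal can be checked with exact arithmetic. The sketch below is illustrative only: the function `worst_case_values` and its argument names are assumptions, and the band half-widths are arbitrary. It values each security at the worst case over a band for the unknown probability of state B, taking the probability of state A to be 1/3 as in the example.

```python
from fractions import Fraction


def worst_case_values(prob_a, band_low, band_high):
    """Worst-case values for a risk-neutral, uncertainty-averse buyer.

    prob_a is the known probability of state A; the unknown probability
    of state B is only known to lie between band_low and band_high.
    """
    return {
        "A": prob_a,               # known probability, no ambiguity
        "B": band_low,             # worst case: lowest probability of B
        "A or C": 1 - band_high,   # pays unless B occurs; worst case: B most likely
        "B or C": 1 - prob_a,      # pays unless A occurs; no ambiguity
    }


eps = Fraction(1, 100)  # hypothetical half-width of the band around 1/3
values = worst_case_values(Fraction(1, 3), Fraction(1, 3) - eps, Fraction(1, 3) + eps)
for name, value in values.items():
    print(name, value)
```

Security A is worth more than B, yet adding a payoff in state C to both reverses the ranking, because the addition removes the ambiguity from the second portfolio and introduces it into the first.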
The previous example is consistent with experimental evidence initially provided by Ellsberg
(1961), who showed that individuals tend to avoid situations where it is difficult to describe
events probabilistically. Klibanoff, Marinacci and Mukerji (2005) explain this context by relying
on the following lottery counterpart to the Arrow-Debreu security example in Table 8.2.

                 states
Lotteries     A    B    C
f             1    0    0
g             0    1    0
f'            1    0    1
g'            0    1    1

TABLE 8.3. Lotteries in an uncertain environment.

Savage's axiom P2, known as the sure thing principle, would tell us that in this context and
for any decision maker,

$$f \succeq g \iff f' \succeq g'$$

Yet assume that the probability of state B is unknown, similarly as in the example of Table
8.2. Experimental evidence is consistent with the hypothesis that in this case, decision makers
would prefer the risky lottery $g'$ (paying off \$1 with probability 2/3) rather than the lottery
$f'$ (paying off \$1 with unknown probability), even if their preferences would lead them to
choose $f$ over $g$. In other words, aversion to uncertainty entails the following counterpart to the
inequalities in (8.87),

$$f \succ g \quad \text{and} \quad g' \succ f'$$

The previous examples reveal that there are new elements in asset evaluation in the presence
of aversion to Knightian uncertainty. We now proceed with a few key models that provide new
predictions in this context.
[In progress]
8.7.3 Portfolio selection and market participation
8.7.3.1 A static model

Dow and Werlang (1992) actually deal with capacities. Heuristically, capacities are non-additive
measures, in that they do not sum up to one from the perspective of an uncertainty-averse
decision maker. For example, an investor may be unaware of the distribution of the asset
discounted value, yet he may consider that the asset discounted value is high ($V_H$) with
probability no lower than $\pi$, or low ($V_L$) with probability no lower than $\pi'$, with these probabilities
being such that $\pi + \pi' \le 1$. In other words, the (unknown) probability that the asset value is
$V_H$, $p$ say, satisfies $\pi \le p \le 1 - \pi'$. We now illustrate how the fact that $\pi + \pi' < 1$ reflects
the investor's aversion to uncertainty. We shall explain that while the investor does not know
the true distribution of the asset value, he evaluates the asset by assigning low chances of good
outcomes.
Suppose a risk-neutral agent is contemplating buying the asset. The worst-case scenario for
him is that the asset value is $V_L$; however, there are chances this worst-case scenario could
improve by an amount equal to $V_H - V_L$. That is, this improvement occurs with probability
at least equal to $\pi$. The minimum expected improvement is, thus, $\pi(V_H - V_L)$, such that the
minimum expected return from being long the asset is $R_b \equiv (V_L + \pi(V_H - V_L)) - P$, where
$P$ denotes the asset price, as usual. The expected improvement and expected returns are minimum
because by assumption, the true probability is taken to satisfy $p \ge \pi$. Now, consider
a risk-neutral agent who calculates expected returns at these minimum levels; he will buy the
asset if $R_b > 0$, i.e., whenever the price satisfies

$$P < V_L + \pi(V_H - V_L) \equiv P_b \tag{8.88}$$

Similarly, consider a risk-neutral agent who contemplates selling the asset. His worst-case
scenario is that the asset is actually good, in which case his payoff is $-V_H$ and the minimum
expected improvement is $\pi'(V_H - V_L)$, such that the expected return from being short the asset
is $R_s \equiv (-V_H + \pi'(V_H - V_L)) + P$. Similarly as in the buy-case, the expected improvement is the
minimum one because by assumption, the true probability that the asset is $V_L$ is $1 - p \ge \pi'$.
Our agent would now sell the asset when $R_s > 0$, i.e., when the price satisfies

$$P > V_H - \pi'(V_H - V_L) \equiv P_s \tag{8.89}$$

Note how aversion to uncertainty operates. The conditions (8.88) and (8.89) tell us that
the lower $\pi$ and $\pi'$, the more averse to uncertainty the agent is. Aversion to uncertainty leads the
agent to presume that $\pi + \pi' < 1$. Let us elaborate. By (8.88), the agent buys when $P < P_b$
and by (8.89), the agent sells when $P > P_s$, where $P_b \le P_s$, with an equality holding only when
$\pi + \pi' = 1$. That is, and unless the agent is uncertainty neutral (i.e., $\pi + \pi' = 1$), the agent will
not participate in the market when the price $P \in (P_b, P_s)$. When $P \in (P_b, P_s)$, indeed, the price is
too high to break even while deciding to buy, and too low to break even while deciding to sell.
Dow and Werlang (1992, p. 200) define the quantity $1 - \pi - \pi'$ as the amount of probability
lost by the presence of uncertainty aversion.
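The buy, sell and non-participation regions can be sketched as a small decision rule. This is a minimal illustration under the assumptions of the static model above (a risk-neutral agent trading one unit); the function name and the argument names `prob_high_min` and `prob_low_min`, standing for the lower bounds on the probabilities of the high and low asset values, are hypothetical.

```python
def trade_decision(price, v_low, v_high, prob_high_min, prob_low_min):
    """Classify a price as 'buy', 'sell' or 'stay out'.

    The two lower bounds may sum to less than one; the shortfall is the
    'probability lost' to uncertainty aversion and creates a no-trade band.
    """
    if prob_high_min < 0 or prob_low_min < 0 or prob_high_min + prob_low_min > 1:
        raise ValueError("lower bounds must be non-negative and sum to at most one")
    spread = v_high - v_low
    buy_below = v_low + prob_high_min * spread    # worst-case value when long
    sell_above = v_high - prob_low_min * spread   # worst-case value when short
    if price < buy_below:
        return "buy"
    if price > sell_above:
        return "sell"
    return "stay out"


# With bounds 0.3 and 0.4, the no-trade band for a 0/1 asset is [0.3, 0.6].
for price in (0.2, 0.5, 0.7):
    print(price, trade_decision(price, 0.0, 1.0, 0.3, 0.4))
```

When the two bounds sum to one, the band collapses to a single price and the agent behaves as a standard expected-value maximizer.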
8.7.3.2 Worst-case scenario interpretation

We can re-interpret the previous behavior in terms of decisions made under worst-case scenarios.
That is, we may assume, now, that the agent relies on a set of additive priors, i.e., on fully
specified probability distributions. The previous model can then be interpreted as one in which
the agent picks up the worst-case distribution according to the acts he makes (buy or sell).
Let $p$ be the unknown probability that the asset value is good, such that the expected profits
can be written as

$$x\,(p V_H + (1 - p) V_L - P)$$

where $V_H$ and $V_L$ are the high and low asset values, $P$ is the asset price, and $x$ denotes the
position in the asset, with $|x| = 1$ because we are assuming that the agent
can only buy or sell one unit of the asset. Note that the probabilities $p$ and $1 - p$ sum up to one now.
This model can be made consistent with the previous capacity framework. In the previous
model, the asset is worth $V_H$ with a probability at least $\pi$, meaning that the asset is worth $V_L$ with
probability at most $1 - \pi$. Moreover, in the previous model, the asset is worth $V_L$ with probability
at least $\pi'$, meaning that the asset is worth $V_H$ with probability at most $1 - \pi'$. Therefore, when
the uncertainty averse agent buys, he does so while thinking that the probability $p \in [\pi, 1 - \pi']$
that the asset value is $V_H$ is as small as possible. Similarly, when he sells, he does so while thinking
that the probability $1 - p \in [\pi', 1 - \pi]$ that the asset value is $V_L$ is also as small as possible.
That is, when the agent buys, the profits he expects are:

$$\min_{p \in [\pi,\, 1 - \pi']} \left(p V_H + (1 - p) V_L - P\right) = V_L + \pi (V_H - V_L) - P$$

and when he sells,

$$\min_{1 - p \in [\pi',\, 1 - \pi]} -\left(p V_H + (1 - p) V_L - P\right) = P - V_H + \pi' (V_H - V_L)$$

Thus, this model collapses to the previous capacity model, as claimed. It also provides additional rationale regarding the probability bands underlying the Arrow-Debreu asset evaluation
in Section 8.7.2.
A technical detour: Provide definitions/relations amongst capacities, convex measures, cores,
etc.
[In progress]
8.7.3.3 Maxmin preferences

The previous model can be extended to one in which agents are risk averse. Suppose that an
asset pays off a dividend $D \sim N(\mu, \sigma^2)$, where the mean $\mu$ is unknown albeit presumed to
belong to a certain band, $\mu \in [\underline{\mu}, \overline{\mu}]$, for two constants $\underline{\mu}$ and $\overline{\mu}$. We shall come back to the
interpretation of this band below. Suppose that a safe asset is elastically supplied at an interest
rate equal to zero, and denote as usual with $P$ the price of the risky asset. A risk-averse agent
(with CARA equal to $\gamma$) chooses a position $\theta$ in the risky asset while solving the
following optimization problem:

$$\hat{\theta} = \arg\max_{\theta} \min_{\mu \in [\underline{\mu}, \overline{\mu}]} \mathbb{E}_{\mu}\left[-e^{-\gamma W}\right] \tag{8.90}$$

where $W = (D - P)\theta$, and $\mathbb{E}_{\mu}$ denotes the expectation taken under the probability law that
results when the mean of the dividend equals $\mu$.
Knightian uncertainty arises in this model due to the lack of knowledge regarding the dividend
distribution: while the agent knows $D$ is normally distributed, he is unaware of its exact location,
$\mu$. The presumption that $\mu \in [\underline{\mu}, \overline{\mu}]$ might indicate his aversion to this uncertainty: the wider
he presumes this band is, the more uncertainty averse he is. One alternative interpretation
of the band is that this very same band plays a merely cognitive role, in that the agent only
knows that there are two bounding constants to the truth, $\underline{\mu}$ and $\overline{\mu}$. In this model, it is actually
impossible to disentangle the cognitive notion of uncertainty from the agent's attitude towards
it. There are more general models of ambiguity in which these two notions can be separated,
surveyed in the next subsection.
The optimization problem in (8.90) is solved as follows. First, the agent solves for the inner
problem; that is, he takes a portfolio choice as given and, then, figures out what a malevolent
Nature would pick up for him given his portfolio choice. For example, if the agent buys the
asset, $\theta > 0$, it is easy to see that the solution for the inner problem is $\mu = \underline{\mu}$. Analogously, if
$\theta < 0$, then, the inner solution is $\mu = \overline{\mu}$. In other words, the agent's aversion to uncertainty
leads him to presume that Nature will pick up a bad asset for him when he buys ($\mu = \underline{\mu}$), and
a good one when he sells ($\mu = \overline{\mu}$).
Given the solution to the inner problem, $\mu(\theta)$, the agent proceeds with solving for his portfolio
choice. The first order conditions lead to

$$\theta = \frac{\mu(\theta) - P}{\gamma \sigma^2}$$

We have that $\theta > 0$ requires $\underline{\mu} > P$ and $\theta < 0$ requires $\overline{\mu} < P$, such that the solution to
the overall optimization problem in (8.90) is

$$\hat{\theta} = \begin{cases} \dfrac{\underline{\mu} - P}{\gamma \sigma^2} & \text{for } P < \underline{\mu} \quad \text{(buy region)} \\ 0 & \text{for } P \in [\underline{\mu}, \overline{\mu}] \quad \text{(non-participation region)} \\ \dfrac{\overline{\mu} - P}{\gamma \sigma^2} & \text{for } P > \overline{\mu} \quad \text{(sell region)} \end{cases}$$

The interpretation of the agent's behavior is similar to that regarding the previous model.
The agent participates in the market if the price is sufficiently favourable to him, compared
to the worst-case scenario: he buys (sells) when the price is lower (higher) than his own most
pessimistic (optimistic) expectation of the asset payoff. In case the price is not favourable
enough, the agent does not participate.
It is interesting to examine the equilibrium implications of the model. Suppose that the asset
is in positive supply, $\bar{\theta}$, say. In equilibrium, $\hat{\theta} = \bar{\theta}$. Then, the equilibrium price is

$$P = \underline{\mu} - \gamma \sigma^2 \bar{\theta}$$

That is, the equilibrium price reflects the most pessimistic evaluation of the dividend. This
property will extend to the dynamic models below.
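The piecewise demand and the resulting equilibrium price can be sketched as follows. This is a minimal illustration assuming a CARA agent, a normally distributed dividend with known variance, a zero interest rate, and a mean known only to lie in a band; the function and parameter names (`mu_low`, `mu_high`, `gamma`, `sigma2`, `supply`) are hypothetical labels chosen for the example.

```python
def maxmin_demand(price, mu_low, mu_high, gamma, sigma2):
    """Max-min demand for the risky asset.

    A buyer evaluates the dividend at the lowest mean in the band, a
    seller at the highest; in between, the agent stays out of the market.
    """
    if price < mu_low:
        return (mu_low - price) / (gamma * sigma2)    # buy region
    if price > mu_high:
        return (mu_high - price) / (gamma * sigma2)   # sell region
    return 0.0                                        # non-participation region


def equilibrium_price(supply, mu_low, gamma, sigma2):
    """Market-clearing price when the asset is in positive supply."""
    return mu_low - gamma * sigma2 * supply


price = equilibrium_price(supply=0.5, mu_low=0.9, gamma=2.0, sigma2=0.04)
print(price, maxmin_demand(price, 0.9, 1.1, 2.0, 0.04))
```

Because supply is positive, the market clears in the buy region, which is why only the lower end of the band enters the equilibrium price.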
8.7.3.4 Smooth ambiguity aversion

Klibanoff, Marinacci and Mukerji (2005) (KMM, in the sequel) introduce a model of ambiguity
in which the cognitive notion of uncertainty can be disentangled from the attitude towards it.
We illustrate the main features of this model while extending the previous market, as follows.
Conditionally on $\mu$, the asset payoff $D$ is still normally distributed, but uncertainty aversion is
modeled assuming that there exists an increasing and concave function $\phi: \mathbb{R} \to \mathbb{R}$, such that a
decision maker prefers the portfolio holdings $\theta'$ to $\theta$ if and only if

$$\mathbb{E}_M\left[\phi\left(\mathbb{E}_{\mu}(u(W(\theta')))\right)\right] \ge \mathbb{E}_M\left[\phi\left(\mathbb{E}_{\mu}(u(W(\theta)))\right)\right] \tag{8.91}$$

where $u$ is the agent's CARA utility, $W(\theta) = (D - P)\theta$ is his wealth, and $M$ denotes the set of
priors on $\mu$. For example, Mele and Sangiorgi (2015) assume that
$\mu \sim N(\mu_0, \sigma_\mu^2)$ in their model with asymmetric information, an assumption that will be used
in this section too.
Concavity of $\phi$ is crucial to the definition of ambiguity aversion. Let us explain. Lack of
knowledge of $\mu$ is the source of uncertainty in the model, and concavity of $\phi$ implies that a
decision maker dislikes mean-preserving spreads in expected utility values that arise due to $\mu$:
he is thus ambiguity averse, unless $\phi$ is linear, in which case he is ambiguity neutral. Therefore,
in this model, Knightian uncertainty arises because $\mu$ is unknown, with $\sigma_\mu^2$ measuring how
acute uncertainty is. What makes the model new compared to one with only second-order
uncertainty (i.e., the presence of a stochastic mean in the asset payoff) is the aversion to this
second-order uncertainty, arising through the concavity of $\phi$.
Regarding the functional form for $\phi$, one may consider $\phi(x) = -\frac{1}{a}\left(e^{-a x} - 1\right)$, where the
parameter $a$ measures absolute ambiguity aversion, with the model collapsing to one with
an ambiguity neutral agent only when $a = 0$. Based on KMM (Proposition 3), one can show
that maxmin expected utility obtains for $a$ large. This section follows Mele and Sangiorgi, in
that $\phi$ is taken to lead to constant relative ambiguity aversion, i.e. $\phi(u) = -(-u)^{\alpha}$, for some
constant $\alpha \ge 1$, with the model collapsing to a description of an ambiguity neutral agent only
if $\alpha = 1$. Otherwise, the higher $\alpha$, the more averse to ambiguity the agent is, independent
of the extent of parameter uncertainty about $\mu$, which is summarized by $\sigma_\mu^2$, as explained.
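Thus, the agent solves the following optimization problem:

$$\hat\theta = \arg\max_\theta E_{\mathcal{M}}[\phi(E(u(W(\theta)) \mid \mu))] \tag{8.92}$$

where the Appendix shows that

$$E_{\mathcal{M}}[\phi(E(u(W(\theta)) \mid \mu))] = -\exp\left(-\gamma\tau\theta(\mu_0 - p) + \frac{1}{2}\gamma\tau^2\theta^2\left(\gamma\,\mathrm{var}(f) + (1-\gamma)\,\mathrm{var}(f \mid \mu)\right)\right) \tag{8.93}$$

with $\tau$ denoting the agent's absolute risk aversion and $p$ the asset price.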
Thus, the program of this ambiguity-averse agent resembles that of a mean-variance agent,
although the variance term is replaced by a convex combination of the unconditional variance
$\mathrm{var}(f) = \sigma^2 + \sigma_\mu^2$ and the conditional variance $\mathrm{var}(f \mid \mu) = \sigma^2$, such that
$V \equiv \gamma\,\mathrm{var}(f) + (1-\gamma)\,\mathrm{var}(f \mid \mu) = \sigma^2 + \gamma\sigma_\mu^2$. If the agent were ambiguity neutral, $\gamma = 1$,
$V = \sigma^2 + \sigma_\mu^2$, such that the problem would be indistinguishable from a standard mean-variance
program with an increased variance: in other words, second-order uncertainty would not matter.
Instead, second-order uncertainty matters when the agent is ambiguity averse, in which case the
solution in (8.92) is

$$\hat\theta = \frac{\mu_0 - p}{\tau(\sigma^2 + \gamma\sigma_\mu^2)}$$

where $\tau$ is the coefficient of absolute risk aversion and $p$ is the asset price. The more averse to
ambiguity the agent is, the higher $\gamma$, and the less aggressive his portfolio holding will be for a
given uncertainty level $\sigma_\mu^2$. The equilibrium implications of the model are straightforward.
With the asset in positive supply, $\bar\theta$, the equilibrium price is
$p = \mu_0 - \tau(\sigma^2 + \gamma\sigma_\mu^2)\bar\theta$, with $\tau\gamma\sigma_\mu^2\bar\theta$ an uncertainty premium component, being clearly
related to both uncertainty, $\sigma_\mu^2$, and uncertainty aversion, $\gamma$.
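The comparative statics of the smooth ambiguity model can be checked numerically. The sketch below is only illustrative: it assumes a CARA agent with absolute risk aversion `tau`, a payoff with conditional variance `sigma2`, a normal prior on the unknown mean with mean `mu0` and variance `sigma2_mu`, and an ambiguity aversion parameter `gamma` of at least one. All function names and parameter values are hypothetical and are not taken from the text.

```python
def effective_variance(sigma2, sigma2_mu, gamma):
    """Variance term entering the program: sigma2 + gamma * sigma2_mu (gamma >= 1)."""
    if gamma < 1:
        raise ValueError("gamma must be at least 1 (gamma = 1 is ambiguity neutrality)")
    return sigma2 + gamma * sigma2_mu


def optimal_holding(mu0, price, tau, sigma2, sigma2_mu, gamma):
    """Mean-variance-like demand with the ambiguity-adjusted variance."""
    return (mu0 - price) / (tau * effective_variance(sigma2, sigma2_mu, gamma))


def equilibrium_price(mu0, supply, tau, sigma2, sigma2_mu, gamma):
    """Price that makes the representative agent hold the whole supply."""
    return mu0 - tau * effective_variance(sigma2, sigma2_mu, gamma) * supply


def uncertainty_premium(supply, tau, sigma2_mu, gamma):
    """Part of the price discount due to parameter uncertainty and aversion to it."""
    return tau * gamma * sigma2_mu * supply


# Hypothetical parameter values, chosen only for illustration.
mu0, tau, sigma2, sigma2_mu, supply = 1.0, 2.0, 0.04, 0.01, 1.0
for gamma in (1.0, 2.0, 5.0):
    p = equilibrium_price(mu0, supply, tau, sigma2, sigma2_mu, gamma)
    prem = uncertainty_premium(supply, tau, sigma2_mu, gamma)
    print(f"gamma={gamma:.1f}  price={p:.4f}  uncertainty premium={prem:.4f}")
```

Under these assumptions, the price falls and the uncertainty premium rises linearly in `gamma`, while at `gamma = 1` the problem is that of a mean-variance agent facing the unconditional variance.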

8.7.4 A model of multiple likelihoods


Leippold, Trojani and Vanini (2008) (LTV, in the sequel) develop a model in which the drift of
dividend growth is unobservable, similarly as in the learning models of Chapter 7 (see Section
7.5.4) and the previous Section 8.6. Their point of departure from those models is the assumption
that agents do not know the statistical laws regarding the unobserved dividend growth: in
previous models, the laws of movement for the expected dividend growth are known, although
the expected dividend growth is unobserved. This lack of knowledge adds complexity and leads
to interesting predictions regarding the equity premium.
In the model, agents are unsure about the unobservable drift of dividend growth, yet they
formulate conjectures regarding the existence of a cloud of models containing the truth. In
detail, this cloud is placed around a reference model for the dynamics of aggregate dividends,

$$\frac{dD_t}{D_t} = \theta_t\,dt + \sigma_0\,dW_t \tag{8.94}$$

where $\sigma_0 > 0$ is a constant, $\theta_t$ is the unobservable expected dividend growth, and $W_t$
is a standard Brownian motion. Eq. (8.94), the reference model, is referred to by LTV as a rough
approximation to reality, for reasons explained below.11
Eq. (8.94) is an approximation to reality, as mentioned. That is, the agents acknowledge
that they are dealing with a possibly misspecified model, which leads them to consider multiple
likelihoods governing the statistical laws of the fundamentals. That is, even while they take Eq.
(8.94) as a benchmark, they do not trust this benchmark. They actually assume that the true,
unknown, model is contained in a neighborhood of this benchmark, i.e., in one belonging to
the following family of models:
$$\frac{dD_t}{D_t} = (\theta_t + h(\theta_t)\sigma_0)\,dt + \sigma_0\,dW_t \tag{8.95}$$

In this family of models, the distortion function $h$ generates deviations from the benchmark.
It captures the idea that agents face Knightian uncertainty, in that they have limited knowledge
regarding whether their reference model is correctly specified. Moreover, the agents assume that
$h$ satisfies
2
( )
( )
for all
(8.96)
and for some known function . In words, Eq. (8.95) contains model specifications that are
statistically close to the reference model (8.94): they are so close that it is actually difficult to
distinguish them statistically. Naturally, in the absence of ambiguity, one has that $h \equiv 0$, such
that this model would collapse to those seen in Chapter 7.
Ambiguity, as described until now, is a source of second-order uncertainty, i.e., one that
adds uncertainty (the agents' acknowledgement of dealing with model misspecification) to an
already unobserved process (the drift of dividend growth), which they may wish to learn about.
The crucial point is how the agents behave vis-à-vis this added uncertainty. The behavioral
assumption is that they fear model misspecification, and choose consumption and portfolio
policies $(c, \vartheta)$ that maximize their worst-case scenario welfare, i.e., their lifetime utility arising
when a malevolent Nature chooses the worst possible model in (8.95) for them:

$$\max_{(c,\vartheta)} \inf_{h} E^h\left[\int_0^\infty e^{-\rho t} u(c_t)\,dt\right] \tag{8.97}$$

subject to the usual wealth constraints, and where $E^h$ is the expectation taken under the
statistical laws in Eq. (8.95) arising with a distortion function equal to $h$, $\rho$ is the instantaneous
discounting rate, and $u(c)$ is the instantaneous utility with CRRA equal to $\eta$. This formulation
of the agents' problem has the same flavor as the max-min problem in the static market of
Section 8.7.3.
Regarding the unobserved drift in the reference model, $\theta$, we may assume that $\theta$ is a continuous
process or that it is a Markov chain on a set of finite values, as explained in Chapter 7
(Section 7.5.4). One simplifying assumption is that $\theta$ takes a finite number of values, similarly
as in the introductory example of learning of Chapter 7 (see Section 7.5.4.2).12 Moreover, we
assume that $\theta$ could only take two values, $\bar\theta$ and $\underline\theta$, and denote the family of drifts of the
dividend growth with $\theta + h(\theta)\sigma_0$.

11 LTV assume that the agents also observe additional signals on $\theta$, which are not considered in this section to simplify the
presentation of the model.
12 Note, then, that with this assumption, the model does not generalize Veronesi's (2000) models of learning, in which the unobserved
drift is a Markov chain (see Section 7.5 in Chapter 7). However, it captures the salient features brought up by Knightian uncertainty.

As for the agents' inference process, let the conditional probability
$\pi_t \equiv \Pr(\theta = \bar\theta \mid \mathcal{F}_t)$, where $\mathcal{F}_t$ denotes the information set available to agents at
time $t$. Then, by results given in Chapter 7, we have that

(1



( )
where
= 01
.
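The inference step can be illustrated in discrete time. The sketch below is not the continuous-time filter of Chapter 7: it applies Bayes' rule once per period to the probability that the drift is in the high state, under the assumption that dividend growth over a period of length `dt` is normal with mean `drift * dt` and standard deviation `sigma0 * sqrt(dt)`. The function names and the numbers used are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal random variable with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_posterior(prior_high, growth_obs, drift_high, drift_low, sigma0, dt=1.0):
    """One Bayes step for the probability that the drift equals drift_high."""
    std = sigma0 * math.sqrt(dt)
    like_high = normal_pdf(growth_obs, drift_high * dt, std)
    like_low = normal_pdf(growth_obs, drift_low * dt, std)
    numerator = prior_high * like_high
    return numerator / (numerator + (1.0 - prior_high) * like_low)


def filter_path(prior_high, observations, drift_high, drift_low, sigma0, dt=1.0):
    """Apply the Bayes step sequentially and return the path of posteriors."""
    path = [prior_high]
    for obs in observations:
        path.append(update_posterior(path[-1], obs, drift_high, drift_low, sigma0, dt))
    return path


# Hypothetical growth observations; drifts and volatility are illustrative values.
posteriors = filter_path(0.5, [0.020, 0.025, 0.010], 0.03, 0.005, 0.01)
print([round(p, 4) for p in posteriors])
```

An observation above the midpoint of the two drifts raises the posterior probability of the high state, one below the midpoint lowers it, and the update is sharper the smaller `sigma0` is relative to the gap between the two drifts.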
An equilibrium is one in which (i) a representative agent solves the optimization problem
(8.97) and (ii) his optimal consumption equals dividends, $c_t = D_t$ for all $t$. To solve the
optimization problem, and in analogy with the static maxmin model of Section 8.7.3, the agent first
determines the infimum in (8.97), while taking as given the consumption and portfolio choices
in the outer optimization problem. Given the functions thus determined, the agent solves for
the outer portfolio policies while imposing the equilibrium condition that $c = D$. Given this
equilibrium condition, we have that the infimum is attained with

Z
() arg inf
( )
0


= . Note that because the
Let ( )
: (8 96) holds and assume that ( )
drift of
in (8.95) is increasing in (), then, by a comparison theorem (e.g., Karatzas and
Shreve (1991, p. 291-295)), the previous expectation is increasing in (), such that13
p
( )=
( )
(8.98)

In other words, asset prices are now evaluated as if the aggregate dividends dynamics were
fully observed, but with left-tilted bounds to growth,
=
=

(
0

(8.99)

)(

) , such that, by results in Chapter 7 (Section


( ) ) = + (1
7.5.4), the equity premium, E , the short-term rate, , and volatility, Vol , are given by
where

E =

Vol

1
( + 1)
2

= +

2
0

Vol =

( )
( )
( )

(8.100)

where () denotes the diffusion coefficient of in (8.99) and is the price-dividend ratio.14
By results given in Chapter 7 (see Section 7.4), the price-dividend ratio is an affine function of
the dividend growth expected under the agent's worst-case scenario probability, . Appendix
3 shows that

( ) =

| {z }

1
(1

) (

+
1
2

2
0)

13 Precisely,

| {z }

1
(1

)(

1
2

2
0)

(8.101)

=1

note

that the value function in (8.97) that has to be extremized is


( )
=
0
, 1
( ( )| ) , where 1
, 2
, 2
1
2 . The expectation in the integral is increasing
in the drift of
by a comparison theorem.
14 Note that the probability we are using while defining these statistics is that under the distorted beliefs: the agent prices the
asset through ( ), as explained, and the statistics in (8.100) all originate from the pricing function () given below (see Eq.
(8.101)). The expression for the equity premium under the reference model is given in Eq. (8.102) below.
2
=1 0

The price-dividend ratio is a weighted average (weighted with the posterior $\pi$) of the discounted
lifetime expected dividends conditional upon the up ($\bar\theta$) and down ($\underline\theta$) states of the
world, and under the worst-case drifts. Provided $\eta < 1$, the price-dividend ratio is decreasing
in the degree of ambiguity aversion.15
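The weighted-average structure of the price-dividend ratio can be sketched as follows. The code assumes the standard Gordon-type expression for a CRRA agent facing geometric Brownian dividends with a known constant drift, and then averages the two state-contingent ratios with the posterior probability. The worst-case drifts are passed in through two hypothetical inputs, `tilt_up` and `tilt_down`, which lower the growth rate in each state; their mapping to the distortion function is not modeled here. Names and parameter values are assumptions made for illustration.

```python
def pd_ratio_known_growth(growth, rho, crra, sigma0):
    """Price-dividend ratio when dividend growth is a known constant.

    Assumes CRRA utility and geometric Brownian dividends, for which
    v = 1 / (rho - (1 - crra) * (growth - 0.5 * crra * sigma0**2)).
    """
    denom = rho - (1.0 - crra) * (growth - 0.5 * crra * sigma0 ** 2)
    if denom <= 0:
        raise ValueError("discount rate too low: the price would be infinite")
    return 1.0 / denom


def pd_ratio(posterior_up, growth_up, growth_down, rho, crra, sigma0,
             tilt_up=0.0, tilt_down=0.0):
    """Posterior-weighted average of the ratios under the (tilted) drifts."""
    v_up = pd_ratio_known_growth(growth_up - tilt_up, rho, crra, sigma0)
    v_down = pd_ratio_known_growth(growth_down - tilt_down, rho, crra, sigma0)
    return posterior_up * v_up + (1.0 - posterior_up) * v_down


# Illustrative values: two growth states, CRRA below one, a small downward tilt.
params = dict(growth_up=0.03, growth_down=0.005, rho=0.04, crra=0.5, sigma0=0.01)
for pi in (0.0, 0.5, 1.0):
    no_ambiguity = pd_ratio(pi, **params)
    with_ambiguity = pd_ratio(pi, tilt_up=0.002, tilt_down=0.002, **params)
    print(f"pi={pi:.1f}  no ambiguity={no_ambiguity:.2f}  with tilt={with_ambiguity:.2f}")
```

With relative risk aversion below one, a larger downward tilt lowers the ratio at every posterior, which is the sense in which the price-dividend ratio decreases with ambiguity aversion in this sketch.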
Note that the equity premium in (8.100) originates from the perspective of the worst-case
probability in the family of models included in (8.95). Appendix 3 shows that the equity premium under the reference model (8.94) is
E =E

Vol A

() + (1

) ( )

(8.102)

where E is the worst-case scenario equity premium in (8.100) and A is the average size of
ambiguity aversion. By Eq. (8.98), A is positive, such that ambiguity aversion contributes positively
to the equity premium thus defined.
The rationale behind the definition of E in (8.102) relies on the following interpretation of
the reference model. The ambiguity averse agent prices the asset at the worst-case scenario, yet
Nature draws aggregate dividends according to Eq. (8.94). Naturally, the agent is unaware of
this statistical law, and places a band around $\theta$, a band that could be symmetric or asymmetric,
reflecting the modeling assumption that the agent only knows that for each regime (up or down),
the expected dividend growth belongs to a certain band.16

It is easy to see that if the band is symmetric, with, say, a degree of ambiguity aversion that is
the same in both states, the model predicts that the price-dividend ratio is the same as that in
an economy without ambiguity, but with a higher time preference rate (see Eq. (8A.44) in Appendix 3).
However, this isomorphism breaks down once ambiguity aversion is state-dependent. Furthermore,
even with homogeneous ambiguity aversion, the economic assumptions at the heart of the model lead to
an equity premium under the reference probability, E in (8.102), which cannot arise in an
economy without ambiguity.
The left panel of Figure 8.2 plots the equity premium predicted by the model, with parameters
fixed at the values given in the legend of Figure 8.2. Note that the E component is small,
whereas the ambiguity premium Vol A is able to boost the premium to the levels we need
to rationalize the empirical evidence reviewed in Chapter 7. Note, also, from the right panel
of Figure 8.2, that return volatility, Vol, has the right order of magnitude in the calibrated
model. Finally, the price-dividend ratio, not shown, ranges from about 25 to 38.

15 The assumption that $\eta < 1$ is standard. The sensitivity of the short-term rate to growth is the agent's CRRA, $\eta$ (see (8.100)).
Therefore, if $\eta > 1$, growth changes would make discount rates change even more, such that the price-dividend ratio would be
decreasing in growth and, hence, increasing in ambiguity aversion. Non-expected utility would fix this seemingly counterintuitive
issue while allowing the elasticity of intertemporal substitution to be disentangled from relative risk aversion (see Section 8.2).
16 Precisely, the band is [
0 +
0 ] in the up
( ) 0 +
( ) 0 ] in the down regime, = , and [

regime,

= .

[Figure 8.2. Left panel: "Expected excess returns", with the upper curve labeled "with ambiguity
premium". Right panel: "Excess return volatility".]

FIGURE 8.2. Left panel: The solid line depicts the equity premium under the reference
probability, E in Eq. (8.102), as a function of the expected growth; the dashed
line is the equity premium under the worst-case probability, E in Eq. (8.100). Right
panel: return volatility, Vol in Eq. (8.100). In both panels, parameter values are $\underline\theta = 0.005$,
$\bar\theta = 0.03$, $\sigma_0 = 0.01$, $\eta = \frac{1}{2}$, $\rho = 0.04$, and, finally, a degree of ambiguity aversion
equal to 0.05 in both states.

Note that the equity premium is inverse-U shaped in expected growth, a property arising mainly
because return volatility is inverse-U shaped. The origins of this property are clarified in Chapter
7 (Section 7.5.4). Models with multiple states (such as those considered by LTV) and mean-reverting
behavior (ensured while assuming $\theta$ is a Markov chain) are natural candidates to make
volatility and equity premiums visit their descending parts more often than their ascending
parts, thereby leading to a countercyclical behavior.

8.8 Production
Consider an economy with one representative firm producing one single good, as in Section
3.4.1.2 of Chapter 3, and paying off a dividend $D(K_t, I_t)$ in each period $t$, expressed as a
function of capital $K_t$ and investment $I_t$, with partial derivative with respect to capital
equal to $D_K(K_t, I_t)$:

(
)
(
( ))
( )

(
)
(
( ))
Remember, Tobin's marginal q and average q are the same, by Theorem 3.2, meaning that the
stock market value of the firm, $V(K_t)$, coincides with the value of installed capital, $V(K_t) = q_t K_{t+1}$,
where $q_t$ collapses to Tobin's q, once we fix the price of uninstalled capital to one,
which is the case as soon as the firm produces uninstalled capital, simply. A few calculations
allow us to define equity returns in this economy. First, we note that:
(

)=
=
=
=
=

+1

[
[
[
[

(
(
(
+1 (
(
+1 (
+1 ( (
+1

+1

+1 )

+1

+1 )

+ (1
)
+1 + +1 (

+1

+1 )

+1

+1 )

+1

+1

+1

+1

+1 )]

+1

+1 ))]

+2

+1 ))]

+1 ))]

where the second line follows by the q theory, as developed in Chapter 3, the third and fourth
lines by the law of capital accumulation, and the expression for ( +1 ), the fifth line by the
condition +1 =
( +1 +1 ), and the homogeneity of the function . Therefore, equity
returns are:
+1

+1

+1 )

(
(

+1

In the absence of adjustment costs,


capital gains, ( +1 )
( ) 1=
+1

+1 )

+1 )

)
+1

+1

+1

+1 )

= 1, such that the


0, Tobins q collapses to one,
+ +1
,
bringing
equity
returns
to:
+1
1= (

))

To match the volatility of equity returns, a model without adjustment costs would require
a counterfactually large volatility of the marginal product of capital. Therefore, adjustment
costs are not only needed to rationalize the existence of time-varying market-to-book ratios:
they also have the potential to boost return volatility. Moreover, the equity
premium puzzle can only be exacerbated in a setting without adjustment costs. Note, indeed,
that by the usual representation of the equity premium in Section 6.5 of Chapter 6,

$$E(R^e_{t+1}) = -\mathrm{corr}(m_{t+1}, R^e_{t+1})\,\frac{\mathrm{Std}(m_{t+1})}{E(m_{t+1})}\,\mathrm{Std}(R^e_{t+1})$$

where $R^e_{t+1}$ denotes the equity return in excess of the risk-free rate and $m_{t+1}$ the stochastic
discount factor. Unless the excess returns predicted by the model co-vary substantially, and
negatively, with the stochastic discount factor, the equity premium can only be small, when
$\mathrm{Std}(R^e_{t+1})$ is very small. One route to inflate the
equity premium might seem to be one where risk-aversion is increased. However, in equilibrium,
equity returns obviously relate to consumption, and in models with production, consumption
smoothing may make the equity premium puzzle worsen: as originally pointed out by Rouwenhorst
(1995), if consumption is endogenous, it becomes smoother as risk-aversion increases,
thereby making $\mathrm{Std}(R^e_{t+1})$ smaller, in equilibrium.
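The decomposition of the equity premium into a correlation, the volatility of the discount factor and the volatility of returns is an identity that follows from the pricing condition that the discount factor times the excess return has zero expectation. The sketch below verifies it on a small discrete distribution. The four states, their probabilities, and the values of the discount factor and payoff are hypothetical numbers chosen only for illustration.

```python
import math


def expectation(values, probs):
    """Probability-weighted mean of a discrete random variable."""
    return sum(p * v for p, v in zip(probs, values))


def covariance(x, y, probs):
    """Covariance between two discrete random variables on the same states."""
    ex, ey = expectation(x, probs), expectation(y, probs)
    return sum(p * (a - ex) * (b - ey) for p, a, b in zip(probs, x, y))


def std(x, probs):
    """Standard deviation of a discrete random variable."""
    return math.sqrt(covariance(x, x, probs))


# Hypothetical four-state economy.
probs = [0.25, 0.25, 0.25, 0.25]
sdf = [1.10, 1.00, 0.95, 0.85]           # high in bad states
raw_payoff = [-0.10, 0.00, 0.05, 0.20]   # low in bad states

# Shift the payoff so that E[sdf * excess_return] = 0, as pricing requires.
shift = expectation([m * x for m, x in zip(sdf, raw_payoff)], probs) / expectation(sdf, probs)
excess_return = [x - shift for x in raw_payoff]

premium = expectation(excess_return, probs)
corr = covariance(sdf, excess_return, probs) / (std(sdf, probs) * std(excess_return, probs))
decomposition = -corr * std(excess_return, probs) * std(sdf, probs) / expectation(sdf, probs)

print(f"equity premium      = {premium:.6f}")
print(f"-corr*Std(R)*Std(m)/E(m) = {decomposition:.6f}")
```

Because the payoff is low exactly when the discount factor is high, the correlation is negative and the premium is positive; shrinking either standard deviation shrinks the premium proportionally.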
The main issue with the neoclassical model is that capital supply is perfectly elastic, such
that the price of capital and, hence, capital gains, are roughly constant, consistently with
the previous arguments. As Jermann (1998) and Boldrin, Christiano and Fisher (2001) note,
we need to introduce some sort of hindrance to the adjustment of capital supply to shocks.
For example, Jermann (1998) assumes the presence of adjustment costs. Instead, Boldrin,
Christiano and Fisher assume, among other things, that investment decisions are
determined prior to the realization of the shocks. Both Jermann (1998) and Boldrin,
Christiano and Fisher (2001) consider economies with habit persistence anyway, which allows
them to generate variability in the demand for capital and, hence, boost price volatility.
[In progress]

8.9 Government spending and asset prices


[In progress]

8.10 Leverage and volatility


Is firms' leverage responsible for a sustained stock volatility? Can leverage explain countercyclical
stock volatility? We already know, from the previous chapter, that ex-post stock returns are
high in good times, whence stock volatility is negatively related to ex-post returns. According
to the leverage effect hypothesis, this negative relation arises because after a negative shock hits
a stock price, the debt/equity ratio increases and, as a result, the firm becomes riskier, leading to
an increase in the stock volatility. Empirically, it is often argued, the leverage effect is too weak.
Most of the contributions to these issues are empirical (e.g., Black, 1976; Christie, 1982; Schwert,
1989a,b; Nelson, 1991). Naturally, another possibility is that stock volatility and returns
are negatively related for reasons unrelated to the leverage effect. For example, stock volatility
can be countercyclical because agents' preferences and beliefs, combined with macroeconomic
conditions, lead precisely to this property, as in the models discussed in Chapter 7, and in the
previous sections.
In this section, we explore an additional explanation, namely that countercyclical volatility
arises as a result of a combined effect of the properties of the previous models, and leverage. A
difficulty is that in many empirical studies, tests of the leverage effect hypothesis are performed
without regard to a well specified economic model. Gallmeyer, Aydemir and Hollifield (2007) show
that the reasoning underlying this hypothesis can be made rigorous. They formulate a general
equilibrium model with levered firms, which they realistically calibrate, to disentangle leverage
effects from real effects such as habit formation. They make use of a stochastic discount factor
known to price assets fairly well, and conclude that leverage effects do indeed have little effect
in general equilibrium. This section develops a variant of their model, which has the mere merit
to admit a closed-form solution.
8.10.1 Primitives
We consider an endowment economy, and denote endowment at time with , assuming it is
a Geometric Brownian motion with parameters 0 and 0 . The reason we denote output with
, rather than the usual , is that we assume a representative firm issues debt, denoted with
, such that the value of the firm is, by Modigliani-Miller [to be discussed in Chapter 13],
+ , where
is the value of equity at time . Let denote debt maturity. The payoffs
of the firm are such that =
+ , with obvious notation. We assume a representative agent
has habit formation preferences, and to obtain closed-form solutions for the asset prices, we
make reference to the Menzly, Santos and Veronesi (2004) economy in Section 7.5.4 of Chapter
7. We denote the equilibrium surplus consumption ratio with =
, where, as explained
extensively in Chapter 7, is solution to,

1
1
1
1
1
=
0

and , , and are parameters.

In this economy, Sharpe ratios are countercyclical, being


equal to ( ) = 0 1 + 1
, as mentioned in Section 7.5.4. We assume debt services are
= , for some
(0 1), and set the benchmark for debt maturity to = 10 years.
We now show that a calibration of the model leads to the following results: (i) the price of debt
is procyclical; (ii) return volatility is countercyclical; (iii) the leverage ratio is countercyclical;
and finally, (iv) the contribution of leverage to equity returns volatility is quantitatively limited.
8.10.2 Equity volatility: a decomposition formula
From Chapter 7, we know that the price-dividend ratio for the aggregate consumption claim is
( )
+ , for two constants and , which we shall give below again. It is easy to show
that the debt value is,
(+ )

( + )
)+
(1
1
= (
+
)
=
=
+
( + )
where

Vol

= lim

where Vol ( ) =

and
+(

= lim

. Equity volatility is,

Vol
| {z
0

0 15

0+

Vol ( )

0+

, such that

=
Vol ( )
0 +
|{z}
| {z }
+
|
{z
}
}
=0 01
510 3
= endog. P/D uct. 26 31

+
Vol ( )
| {z }
+
+
|{z}
{z
}
|
510 3

Vol ( )

= leverage multiplier

(8.103)

0 24

11 08

where we have indicated the approximate average values taken by the variables of interest,
obtained by calibrating the model with the values of Table 8.2 in Section 8.12 below. Note, also,
that the leverage ratio, , is endogenous and equal to,
=

(
+

+
(

)
+

and Vol
, as the surplus changes.
In other words, we only see what happens to
As the numerical values in Eq. (8.103) show, much of the action in this model derives from the
large swings in the price-dividend ratio. What is the statistical relation between the leverage
ratio and return volatility that the model predicts? Figure 8.3 depicts values
of the leverage ratio and volatility consistent with the model. Note that Figure 8.3 does not
depict a causal relation, as leverage and equity volatility are both driven by the same state
variable, the surplus consumption ratio.17

[Figure 8.3: equity volatility, Vol (vertical axis, from 0.05 to 0.20), plotted against the
leverage ratio (horizontal axis, from 0.2 to 1.0).]

FIGURE 8.3. Leverage and equity volatility: a naked eye view.

8.10.3 Bankruptcy
The previous model has no role for bankruptcy, which plays an obviously fundamental role,
as developed in great detail in Chapter 13. Let us consider bankruptcy in a simple setting.
Consider a two-date economy, and suppose that the value of the firm in one year is $V_1$, which
equals $V_{bad} <$ Nominal debt, with probability $p$, and $V_{good} >$ Nominal debt, with probability
$1 - p$. We assume risk-neutrality and that there are no bankruptcy costs. Let $R = \frac{E_1 - E_0}{E_0}$ be the equity
return, where $E_1$ is the equity value at the second period. Then, we have that
$\mathrm{vol}(R) = \sqrt{\frac{p}{1-p}}$. For example, if $p = 2\%$, then $\mathrm{vol}(R) = 14\%$!
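The volatility figure in this two-date example can be reproduced directly. The sketch below assumes a zero risk-free rate, an equity payoff of zero in the bad state and of firm value minus nominal debt in the good state, and a risk-neutral initial equity value equal to the expected payoff. The function names and the values used for the good-state firm value and for debt are hypothetical.

```python
import math


def equity_vol_closed_form(p_default):
    """Return volatility implied by the two-state example: sqrt(p / (1 - p))."""
    return math.sqrt(p_default / (1.0 - p_default))


def equity_vol_from_payoffs(p_default, v_good, debt):
    """Same volatility computed from the two-point distribution of equity returns."""
    payoff_good = v_good - debt              # equity gets the residual in the good state
    payoff_bad = 0.0                         # limited liability in the bad state
    e0 = (1.0 - p_default) * payoff_good + p_default * payoff_bad  # risk-neutral value
    r_good = payoff_good / e0 - 1.0
    r_bad = payoff_bad / e0 - 1.0
    mean = (1.0 - p_default) * r_good + p_default * r_bad
    var = (1.0 - p_default) * (r_good - mean) ** 2 + p_default * (r_bad - mean) ** 2
    return math.sqrt(var)


print(f"closed form : {equity_vol_closed_form(0.02):.4f}")
print(f"from payoffs: {equity_vol_from_payoffs(0.02, v_good=150.0, debt=100.0):.4f}")
```

The result does not depend on the good-state firm value or on the level of debt, only on the default probability, which is why a 2% probability alone already delivers a volatility of about 14%.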

8.11 Multiple trees and the cross-section of asset returns


Menzly, Santos and Veronesi (2004), Cochrane, Longstaff and Santa-Clara (2008), Pavlova and
Rigobon (2008), Martin (2011).

8.12 The term-structure of interest rates


What are the term-structure implications of the main paradigms considered so far? Consider
the habit formation model introduced by Campbell and Cochrane (1999). While Campbell and
Cochrane consider an economy where interest rates are constant, in their working paper they
allow the short-term rate to be time-varying, and as explained in Appendix 5 of Chapter 7, set
equal to:

1
1 2
( )= +
( (1
)
) + ( ln )
(8.104)
0
0
2
2
17 Does debt maturity lead to a greater contribution of leverage to volatility? It is not the case. Given the model's parameters,
the effects of debt maturity on leverage can be shown to be quite limited.


where $s$ is the surplus consumption ratio, $b$ is a constant, and all the remaining parameters are
as in Section 7.5.2 of Chapter 7. Wachter (2006) analyzes the term-structure implications of
this model in detail, both real and nominal, within an environment with time-varying expected
inflation.
Note, the constant $b$ does not depend on anything relating to the agent's preferences. Its mere
role is to make interest rates time-varying. How to ensure that Eq. (8.104) is consistent with
optimizing behavior? As explained in Chapter 7, the short-term rate depends on the sensitivity
of habit to consumption shocks, a function of $s$, $\lambda(s)$, through an effect due to precautionary
savings: the higher this sensitivity, the higher the volatility of habit and, hence, the propensity to
save, which drives interest rates down. This sensitivity $\lambda(s)$ is free, in that it is not restricted by
the theory: Campbell and Cochrane simply guide us with heuristic considerations leading to it.
One of these considerations is that the short-term rate also relates to habit, due to intertemporal
substitution effects, and negatively, due to mean-reversion. Campbell and Cochrane choose $\lambda(s)$
such that intertemporal substitution effects exactly offset precautionary savings, thereby making
the short-term rate constant or, at most, affine in the log surplus consumption ratio, as in Eq.
(8.104). Naturally, the sensitivity, $\lambda(s)$, is a function of $b$, once this reverse engineering has
unfolded, as shown in Appendix 5 of Chapter 7.
The question arises as to which sign we should expect from the parameter $b$, empirically. Are
real interest rates countercyclical? They are. This is somewhat puzzling from the perspective of
the basic production economies analyzed in Chapter 3, where real interest rates are procyclical,
being positively related to the marginal product of capital and, hence, to productivity shocks.
However, economies with habit formation might be capable of generating countercyclical real
rates, due to intertemporal substitution effects. It is the case, for example, for the models with
frictions in the adjustment of capital supply to shocks of Boldrin, Christiano and Fisher (2001).
In endowment economies with habit formation, countercyclical real rates are, then, quite likely
to arise. Consider, for example, the Menzly, Santos and Veronesi (2004) model of external habit
formation presented in Section 7.5.4. We recall that this model predicts that the short-term
rate is:

2
2
( )= + 0
(8.105)
+
1
1
0
0

Figure 8.4 depicts the short-term rate as a function of , obtained using the parameter values
in Table 8.2, which are similar to those used by Menzly, Santos and Veronesi.

[Figure 8.4: the short-term rate, R(s) (vertical axis, from 0.02 to 0.16), plotted against
its argument (horizontal axis, from 0.010 to 0.040).]

FIGURE 8.4. The short-term rate predicted by the Menzly, Santos and Veronesi (2004) model
of external habit formation, with parameter values as in Table 8.2.

0.03   0.01   0.04   0.15   0.03   40   0.05   0.60

TABLE 8.2. Parameter values utilized for the Menzly, Santos and Veronesi (2004) model
of external habit formation.

The fourth term of Eq. (8.105) reflects intertemporal substitution effects, and is the dominating
term, leading to countercyclical interest rates, due to the mean reversion in the surplus
consumption ratio, and similarly as in the Campbell and Cochrane model, as explained in Section
7.5.2 of Chapter 7. Finally, the catching-up model of Chan and Kogan (2002) reviewed in
Section 8.3 leads to the same prediction: real interest rates are countercyclical.
[In progress]

8.13 Prices, quantities and the separation hypothesis


One compelling lesson that we should have learnt is that to address the asset pricing puzzles
brought about by the neoclassical model, we would need a substantial revamp of the standard
paradigms underlying dynamic macroeconomic theory, namely, a revamp of the basic version
of the real business cycle theory reviewed in Chapter 3. For example, we would need to introduce
adjustment costs, habit formation, or restricted stock market participation. How is it, then,
that while attempting to explain quantity dynamics, macroeconomists would simply ignore the
advances made by financial economists? Tallarini (2000) considers a different possibility, a real
business cycles model in which a representative agent has non-expected utility, as in Section
8.2.
8.13.1 A closed-form expression for non-expected utility
To illustrate these old and yet important ideas, let us determine the stochastic discounting
factor in a model with non-expected utility and without production; later, we shall describe the
main implications of a production-based economy. In Appendix 1 (see Eq. (8A.5)), we explain
that with non-expected utility, an alternative representation of +1 to Eq. (8.8) is:
1

1
1
( +1 ) 1
+1
+1
(8.106)
=
=
+1
+1
1
+1
where denotes the certainty equivalent continuation utility expressed in consumption units,
1
) , which is obviously ordinally equivalent
and is solution to Eq. (8.3). Next, dene = (1
to and satises

1
= (1
) +
( 1+1 ) 1

As the intertemporal elasticity of substitution gets close to one,


0, = 1 ( ( 1+1 )) 1 .
(1 )
Dene, then,
:
, obtained after having taken logs in the previous expression for
; by rearranging terms,
= ln

ln

+1

(8.107)


where
(1
) (1
).
Next, assume that log consumption is a random walk with drift,
+1

+1

= ln

1
2

+1

+1

(8.108)

for some constant . Note that Eq. (8.108) is a special case of the dynamics of consumption
in the Bansal and Yaron (2004) model in Section 8.2: it does not include the small persistent
component
of Eq. (8.19). Guessing that
= + ln , and solving for the undetermined
2
(1 ) (1
) 2
coefficients and using Eq. (8.107), delivers =
and = 1 1 , such that
3
2(1 )
the stochastic discounting factor in Eq. (8.106) for = 0 is
+1

+1

+1

=
+1

+1 )

=
+1

We have,
(

+1 ) =

(
(

+1

+ 12 (2

+1 ) =

+1 )

2 2

+1

(8.109)

By increasing the risk-aversion parameter, the volatility of the stochastic discounting factor
increases although, then, $E_t(m_{t+1})$ remains substantially flat. Once again, this property is what
makes non-expected utility address the interest rate puzzle (see Section 8.2). As also discussed in
Section 8.2, non-expected utility does not necessarily imply a resolution of the equity premium
puzzle: Eqs. (8.109) clearly indicate that large values of the risk-aversion parameter are needed to
inflate the equity premium to empirically plausible levels. Alternatively, we need additional sources
of variation in the primitives of the economy: for example, after Tallarini, long-run risks are
proposed by Bansal and Yaron (2004).
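The two properties discussed above (a discount factor whose volatility grows with risk aversion while its mean barely moves) can be illustrated with a lognormal sketch. The code assumes that log consumption growth equals `mu + sigma * eps` with standard normal `eps`, and uses the textbook stochastic discount factor for recursive utility with unit elasticity of intertemporal substitution, for which the log discount factor is `ln(beta) - mu - 0.5 * (1 - gamma)**2 * sigma**2 - gamma * sigma * eps`. This parametrization, the function name, and the numbers are assumptions made for illustration and may differ from the exact specification in Eq. (8.108).

```python
import math


def sdf_moments(beta, mu, sigma, gamma):
    """Mean and volatility-to-mean ratio of a lognormal discount factor.

    Assumes ln m = ln(beta) - mu - 0.5*(1-gamma)**2*sigma**2 - gamma*sigma*eps,
    with eps standard normal (unit elasticity of intertemporal substitution).
    """
    mean = beta * math.exp(-mu + 0.5 * (2.0 * gamma - 1.0) * sigma ** 2)
    vol_to_mean = math.sqrt(math.exp((gamma * sigma) ** 2) - 1.0)
    return mean, vol_to_mean


# Hypothetical annual values: 2% mean growth, 2% growth volatility.
beta, mu, sigma = 0.99, 0.02, 0.02
for gamma in (1.0, 10.0, 50.0):
    mean, ratio = sdf_moments(beta, mu, sigma, gamma)
    print(f"gamma={gamma:5.1f}  E(m)={mean:.4f}  Std(m)/E(m)={ratio:.4f}")
```

In this sketch the volatility-to-mean ratio is close to `gamma * sigma` for moderate values, so it scales almost linearly with risk aversion, whereas the mean changes only through a term of order `sigma**2`.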
8.13.2 Preferences for robustness
Hansen and Sargent (2008, page 317) demonstrate that this model can be understood as leading
to a resolution of the equity premium puzzle even without long-run risks, by simply
re-interpreting the parameter in Eq. (8.107) as a multiplier for an individual with
ambiguity-averse multiplier preferences.
8.13.3 Irrelevance
The main point of Tallarini is an irrelevance result: we can understand the dynamics of asset
markets independently of those of real economic aggregates, due to the linearity of preferences
resulting from the assumption that intertemporal elasticity of substitution is one, = 0. To extend the previous model to one with real aggregates, Tallarini considers the following extension
to Eq. (8.107),
0

+1
= ln + ln + 0 ln
(8.110)

where the parameter


0 now loads leisure, , and is such that 0 =
(1 + ). Tallarini does
not consider adjustment costs, and yet his model can explain the equity premium, through a
simple increase in the risk aversion parameter, while maintaining intertemporal substitution
constant, i.e. by keeping on assuming log consumption in the right hand side of Eq. (8.110).
Interestingly, raising risk-aversion does not affect the quantity dynamics macroeconomists are
interested in; only intertemporal substitution might affect it. Naturally, there are many other
dimensions we should consider, to conclude on any model's predictions about asset prices. For
example, Tallarini's assumption of no adjustment costs implies Tobin's q is one. Moreover,
welfare calculations such as those in Lucas (19??) are likely to change, as Alvarez and Jermann
(20??) demonstrate.
[In progress]

8.14 Endogenous risk and the financial accelerator doctrine


The models reviewed so far predict that asset prices move in response to changes in the fundamentals. In these models, it is as if asset prices were derivatives written on a few key variables
summarizing the state of the economy. Can capital market movements lead to feedbacks on real
economic developments? The Great Recession likely has its origins in the credit crisis that erupted
in 2007 (see Chapter 13). Adverse developments in credit markets led financial intermediaries
to suffer losses and to reduce their propensity to lend, ultimately producing a credit crunch. In turn, the
credit crunch exacerbated business cycle conditions, in a spiral.
These historical facts are an instance of endogenous risk, by which a shock leads to adverse
developments, which in turn reinforce the initial shock, creating a trend. Specifically, capital
market turmoil can lead to adverse economic developments, which reinforce the initial turmoil,
in a spiral. Even a relatively small shock, such as the initial subprime mortgage losses in 2007,
can quite adversely affect the business cycle. This thesis is known as the financial accelerator
hypothesis. [Mention the 1930s literature] Understanding asset prices under such a more general
perspective allows us to explain economic trends in critical contexts such as the Great Recession.
This section surveys models that have this financial accelerator flavor. We begin with variants
of the Kiyotaki and Moore (1997) model, in which small and temporary shocks generate large
and persistent swings in asset prices and the business cycle. The mechanism is similar to the
credit multiplier channel in Bernanke and Gertler (1989), although it does not rely on the
presence of financial intermediaries. [In progress]
Bernanke, Gertler and Gilchrist (1999) provide an early survey of models with more emphasis on macroeconomics; Brunnermeier, Eisenbach and Sannikov (2012) survey more recent
developments.
[In progress]

8.14.1 Credit cycles


Kiyotaki and Moore (1997) provide a framework to think about business cycle fluctuations originating from and, in turn, influencing movements in asset prices. Persistence and amplification
of shocks are the key predictions of this model. In fact, persistence of shocks can be understood
without reference to asset prices: it could simply arise through the firms' financial constraints.
Kiyotaki (1998) develops a model along these lines, which is analyzed in this section. The next
section explains how introducing an asset allows a shock to be amplified.
There are two types of agents: (i) productive agents, or farmers, and (ii) unproductive agents, or
gatherers. Farmers invest $x_t$ into a linear technology, obtaining output $y_{t+1}$,
$$ y_{t+1} = \alpha x_t. \tag{8.111} $$
Gatherers have access to a less productive technology, in that when investing $x_t$, their
output, $y_{t+1}$, satisfies
$$ y_{t+1} = \gamma x_t, \tag{8.112} $$
where
$$ 1 < \gamma < \alpha. \tag{8.113} $$

Productivity is random: in each period, farmers may become gatherers, and gatherers may
become farmers. We shall develop more details regarding this assumption below. Intuitively,
switching types is required to make sure both types survive.
All agents maximize the same intertemporal utility of consumption subject to their budget
constraint. For example, the farmers' plan is
$$ \max_{(c_t, x_t, b_{t+1})_{t=0}^{\infty}} E_0\left(\sum_{t=0}^{\infty} \beta^t \ln c_t\right) \quad \text{s.t.} \quad c_t + x_t = y_t + \frac{b_{t+1}}{R_t} - b_t, \tag{8.114} $$
where $b_t$ denotes debt outstanding at time $t$ and $R_t$ is the gross interest rate on debt for the
period $t$ to $t+1$. The budget constraint says that at time $t$, consumption, $c_t$, and investments,
$x_t$, are financed by output, $y_t$, and new issuance of debt, worth $b_{t+1}/R_t$, after repaying current debt
$b_t$.
The general ideas underlying this model are simple: in equilibrium, we shall see, gatherers will
lend to farmers, because farmers know how to produce efficiently (i.e., they can earn more
than by just investing at the short-term rate). The main reason for this mechanism
to work is a friction in the market for lending: gatherers only provide collateralized loans to
farmers and, obviously, they are not themselves credit constrained, such that in this economy, the gross
interest rate is $R_t = \gamma$.18
Without such frictions, the gross interest rate is $R_t = \alpha$. Indeed, if $R_t > \alpha$, the farmers would
never produce because the return on investment would be less than the real interest rate; and
if $R_t < \alpha$, the farmers would be willing to borrow an infinite amount, which cannot be an
equilibrium either. Note, then, that by Eq. (8.113), the interest rate $R_t = \alpha$ would discourage
the gatherers from investing. In fact, gatherers are out of this economy once we assume their
initial wealth is zero: because they do not produce, they have no resources to survive.
Define the farmers' net worth as $w_t \equiv y_t - b_t$. The flow-of-funds accounting in Eq. (8.114),
the equilibrium condition $R_t = \alpha$ and Eq. (8.111) yield
$$ w_{t+1} = \alpha (w_t - c_t). \tag{8.115} $$
The farmers' plan is then as in (8.114), subject to the flow-of-funds constraint in Eq. (8.115).
The solution to this program is well-known (see, e.g., Chapter 3). Optimal consumption is $c_t = (1 - \beta) w_t$.
Moreover, gatherers do not participate in this economy. Therefore, in equilibrium,
$b_t = 0$ for all $t$, such that by the constraint in (8.114), $x_t = \beta w_t$. Aggregate investments are
$X_t = \beta W_t$, where $W_t$ denotes the aggregate net-worth, which equals aggregate output, $Y_t$,
because aggregate borrowing is zero. To sum up,
$$ Y_{t+1} = W_{t+1} = \alpha\beta W_t = \alpha\beta Y_t. \tag{8.116} $$
The growth of the economy is constant and equal to $\alpha\beta$. It is a very simple outcome, arising
within a representative agent economy. Note that this economy has farmers only because the
interest rate is too high to allow penniless gatherers to survive. Gatherers could survive in
an economy in which their productivity is at least as high as the interest rate, i.e., $\gamma \geq R_t$.
However, we know that this outcome cannot arise in the absence of any frictions: farmers would
find it optimal to borrow an infinite amount of funds and try to exploit this opportunity, which
cannot be an equilibrium.
18 A formal proof of this statement relies on the first-order conditions of the gatherers' problem (the equivalent to (8.114)) with
respect to $x_t$ and $b_{t+1}$, and is based on the production function in (8.112).
An economy with both farmers and gatherers needs to display some sort of hindrance in the
borrowing opportunities open to farmers, for example, a limit to the amount of borrowing.
Kiyotaki and Moore (1997) propose the following framework, also used by Kiyotaki (1998).
First, only those who invest have the skills to produce the full output. Without those skills,
one may only achieve a fraction $\theta$ of the full output. Moreover, each producer can declare
bankruptcy at any time after he starts to produce. Anticipating this possibility of default, the
creditors will collateralize the loan. Assuming that the debtors have enough bargaining power,
the amount of debt issued by the farmers has a ceiling, i.e.,
$$ b_{t+1} \leq \theta y_{t+1}. \tag{8.117} $$
We search for an equilibrium in which $R_t = \gamma$ around the stationary state, and (8.117) holds
as an equality. Below, we confirm that such an equilibrium exists, by just verifying that the
amount of the gatherers' investments is bounded and bounded away from zero.
Note that (8.117) is the friction that prevents farmers from borrowing an infinite amount
even while the interest rate is less than their productivity.19 Thus, the farmers have access to
good financing conditions but they cannot borrow an infinite amount, as they are borrowing-constrained by (8.117). They will then borrow as much as they can, i.e.,
$$ b_{t+1} = \theta y_{t+1}. \tag{8.118} $$
The previous constraint imposes an upper limit to the farmers' investment, $x_t$. Using the
budget constraint in (8.114), the equilibrium condition $R_t = \gamma$, and Eq. (8.118), it is:
$$ x_t = \frac{w_t - c_t}{1 - \theta\alpha\gamma^{-1}}. \tag{8.119} $$

Investments $x_t$ are just mechanically derived from the budget constraints. The numerator
in (8.119) equals the farmers' savings. It is the down-payment that is used to invest over and
above the amount borrowed, $b_{t+1}/R_t$. Furthermore, to ensure this investment plan is part of the
equilibrium, it must be that $\theta\alpha < \gamma$: the collateralized return (i.e., the amount of borrowable
funds) needs to be less than the interest rate to ensure that the current borrowing is less than
$x_t$. The flow-of-funds accounting is now
$$ w_{t+1} = y_{t+1} - b_{t+1} = (1 - \theta)\alpha x_t = \frac{(1 - \theta)\alpha}{1 - \theta\alpha\gamma^{-1}} (w_t - c_t). \tag{8.120} $$
19 Thus, Eq. (8.117) suggests the alternative interpretation of the parameter $\theta$ as the fraction of skilled producers in contexts
where lenders do not observe the borrowers' type.

The program of the farmers in this economy is to maximize the intertemporal utility in
(8.114) subject to (8.120). We solve for consumption , which then determines
in (8.119).
The solution for consumption follows by the usual argument. For a log-utility maximizer, =
(1
) . Replacing into (8.119), and aggregating over all farmers leaves:
=

(8.121)

where capitalized letters denote aggregate variables.


Under the conjecture that
= , gatherers are indifferent as to whether to invest or to
lend. However, we can determine their investments from equilibrium conditions. Indeed, note
that their aggregate savings are equal to
with obvious notation. Moreover, in equilibrium,
aggregate savings satisfy
( +
)
, where
= +
because aggregate borrowing is obviously zero. Therefore, gatherers investment is determined by equating aggregate
savings to aggregate investments, viz
!

1
(8.122)
=
=
1
such that aggregate output satises: +1 =

+1
=
+(

+1

+1

, or,

(8.123)

is the farmers aggregate share of net-worth.


where
Note that the growth of this economy can be lower or higher than the growth in the unconstrained economy,
(see Eq. (8.116)). It is lower if and only if
. There are indeed
conditions under which
in the neighborhood of the steady state. Note, also, that as soon
as
, the gatherers' investment
is strictly positive and bounded by (8.122), confirming
that = is an equilibrium.
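The growth comparison above can be illustrated numerically. The sketch below is a minimal illustration under assumptions that are not fixed by the text at this point: `alpha` and `gamma` stand for the farmers' and gatherers' productivities, `beta` for the discount factor, `theta` for the collateralizable fraction of output in (8.117), `s` for the farmers' share of aggregate net worth, and the interest rate is taken to equal `gamma`. All function names and parameter values are hypothetical.

```python
def leverage(alpha, gamma, theta):
    """Investment financed per unit of farmers' savings, assuming the interest
    rate equals gamma and debt is capped at theta times next-period output."""
    if theta * alpha >= gamma:
        raise ValueError("need theta * alpha < gamma for a finite borrowing limit")
    return 1.0 / (1.0 - theta * alpha / gamma)


def constrained_growth(s, alpha, gamma, beta, theta):
    """Gross output growth when farmers hold a share s of aggregate net worth."""
    farmers_inv = beta * s * leverage(alpha, gamma, theta)  # per unit of net worth
    gatherers_inv = beta - farmers_inv                      # residual savings
    if gatherers_inv < -1e-12:
        raise ValueError("gatherers' investment would be negative at this share")
    gatherers_inv = max(gatherers_inv, 0.0)  # guard against rounding at the cutoff
    return alpha * farmers_inv + gamma * gatherers_inv


def unconstrained_growth(alpha, beta):
    """Gross output growth when farmers borrow freely at a rate equal to alpha."""
    return alpha * beta


if __name__ == "__main__":
    alpha, gamma, beta, theta = 1.10, 1.02, 0.97, 0.50  # illustrative values only
    cutoff = 1.0 - theta * alpha / gamma  # share at which gatherers stop investing
    benchmark = unconstrained_growth(alpha, beta)
    for s in (0.10, 0.30, cutoff):
        g = constrained_growth(s, alpha, gamma, beta, theta)
        print(f"s = {s:.3f}: constrained {g:.4f}, unconstrained {benchmark:.4f}")
```

Under these assumptions, growth falls short of the unconstrained benchmark whenever the farmers' share is below the cutoff `1 - theta * alpha / gamma`, which is also the region in which gatherers invest a positive amount.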
Finally, we determine aggregate output. Output dynamics depends on how farmers and gatherers switch their types. Assume that there is a mass : 1 of gatherers to farmers, and that
productivity is a Markov chain, as in the following table:
+1

+1

1
1
Moreover, assume that productivity is persistent, in that Pr (
Pr ( +1 | ) Pr ( +1 | ), i.e.,
1

+1 |

Pr (

+1 |

) and
(8.124)

That is, Pr ( +1 | ) Pr ( +1 | ). In words, a gatherer at time + 1 is more likely to have


been gatherer than farmer at time . [Determine the stationary distribution] Let +1 be the
aggregate net worth of the farmers. Then, the farmers' aggregate net worth at + 1 after the
productivity shock, +1 , includes that of the time- farmers who are still farmers (and have
a mass equal to 1
) and that of the time- gatherers who become farmers (and have mass
equal to ), viz
+1

= (1

)(

+1

+1 )

+1


+1 )

+1

+1

(8.125)


where capitalized letters denote aggregate variables, and the second equality holds in equilibrium.
Requiring that agents can switch their types (
0) ensures that a stationary steady state
exists, which is compatible with an equilibrium in which both types survive. Intuitively, if = 0,
the farmers net-worth would dominate, driven by the farmers higher productivity. Moreover,
persistence in productivity shocks (1
) is needed to ensure that a stationary steady
state exists.
To corroborate these claims, we determine the net-worth of both the farmers and gatherers,
and replace them into the farmers' law of net-worth accumulation, Eq. (8.125). By using the
optimal consumption = (1
) into (8.120) and aggregating, yields the farmers aggregate
net-worth before the productivity shock,
+1

+1

(1
1

(8.126)

The gatherers aggregate net-worth is obtained by plugging Eq. (8.121) into the farmers
aggregate debt obtained through Eq. (8.118), and plugging
from Eq. (8.122) into
+1 =
, leaving:
(
)
(8.127)
+1 +
+1 =
Finally, we plug Eqs. (8.126)-(8.127) into Eq. (8.125) and use the expression for
(8.123), obtaining,
(1
)
(1
) +
( )
+1 =
+(
)

+1

in Eq.
(8.128)

An equilibrium is a sequence ( ) =0 for any given initial condition 0 . Note that (0) =
and (1) = 1
. Therefore, (8.124) guarantees that a fixed point
exists, and is such that
= ( ), with = 1 when = 0. Moreover, under conditions, there is a unique
. So
productivity switches are needed to prevent this economy from reaching the trivial steady state
with the gatherers' extinction. Moreover, the persistence of these switches must not be too large.
Suppose we are at the steady state,
, and that there is an aggregate productivity shock
at , in that both and lower by . After this shock, the farmers aggregate net worth is, by
Eq. (8.125):
) ((1
)
+
) (1
) +1
+1 = (1
That is, +1 obviously goes down, although proportionately more than aggregate output,
. It will now take
+1 =
+1 +
+1 , due to leverage. Therefore, after the shock,
+1
some time for the farmers aggregate share of net-worth
to converge towards its steady
state , through Eq. (8.128). This is propagation due to credit constraints. Through leverage,
a temporary productivity shock makes farmers aggregate net worth fall more than output,
leading to deviate from its steady state. It will now take time for to catch up to . During
this recovery process, output growth, +1 in Eq. (8.123), will obviously be lower than that in
the steady state, and will only gradually converge to it. In contrast, a temporary shock in an
unconstrained economy will only have a temporary impact.
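The propagation mechanism can be sketched numerically. The code below is a minimal illustration and not a transcription of Eq. (8.128): it assumes that farmers earn a leveraged gross return `alpha * (1 - theta) / (1 - theta * alpha / gamma)` on their savings, that gatherers earn `gamma`, that all agents save the same fraction of net worth, and that types switch with the hypothetical probabilities `p_fg` (farmer to gatherer) and `p_gf` (gatherer to farmer). All names and parameter values are assumptions introduced for the example.

```python
def farmers_return(alpha, gamma, theta):
    """Gross return on farmers' savings under a binding collateral constraint."""
    return alpha * (1.0 - theta) / (1.0 - theta * alpha / gamma)


def next_share(s, alpha, gamma, theta, p_fg, p_gf):
    """Farmers' share of aggregate net worth next period, after types switch."""
    rho = farmers_return(alpha, gamma, theta)
    stay = (1.0 - p_fg) * rho * s        # farmers who remain farmers
    enter = p_gf * gamma * (1.0 - s)     # gatherers who become farmers
    total = rho * s + gamma * (1.0 - s)  # aggregate net worth (common scale)
    return (stay + enter) / total


def steady_state(alpha, gamma, theta, p_fg, p_gf, tol=1e-13, max_iter=100_000):
    """Fixed point of the share dynamics, found by simple iteration."""
    s = 0.5
    for _ in range(max_iter):
        s_new = next_share(s, alpha, gamma, theta, p_fg, p_gf)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    raise RuntimeError("share dynamics did not converge")


def recovery_path(s0, periods, alpha, gamma, theta, p_fg, p_gf):
    """Share of net worth over time, starting from a post-shock value s0."""
    path = [s0]
    for _ in range(periods):
        path.append(next_share(path[-1], alpha, gamma, theta, p_fg, p_gf))
    return path


if __name__ == "__main__":
    params = dict(alpha=1.10, gamma=1.02, theta=0.50, p_fg=0.15, p_gf=0.05)
    s_star = steady_state(**params)
    # A temporary shock that cuts the farmers' share by 10 percent.
    for t, s in enumerate(recovery_path(0.9 * s_star, 10, **params)):
        print(f"t = {t:2d}: share = {s:.4f} (steady state {s_star:.4f})")
```

In this sketch the share returns to its steady state only gradually after a one-off fall, which is the persistence the text attributes to credit constraints.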
8.14.2 Amplification
The model in the previous section illustrates how propagation mechanisms operate in an economy with borrowing constraints. However, the main idea underlying the financial accelerator
doctrine is that of contagion, that is, the spillover of capital market shocks into the real sphere
of the economy. In the examples of the previous section, the farmers' borrowing capacity can
depend on the value of some collateral, say land. In bad times, when the land value drops, the
borrowing capacity decreases, which makes land value decrease even more. Land is a metaphor
for assets.
We explain how this channel works by relying on the Kiyotaki and Moore (1997) model,
which is actually simpler than the model in the previous section.
[In progress]
8.14.3 Additional literature
Adrian and Shin (2011); Brunnermeier and Sannikov (2013); Danielsson, Zigrand and Shin
(2011); Geanakoplos (2010); Gertler and Kiyotaki (2011); He and Krishnamurthy (2012); Hugonnier
and Prieto (2012); Shin (2010).
[In progress]


8.15 Appendix 1: Non-expected utility


8.15.1 Detailed derivation of optimality conditions and selected relations
Derivation of Eq. (8.5). We have,
X
( +1 +
+1 =
+1 )
X
=
( +1 +
+1

X
+1 +
= 1+

X
(
= 1+
+1

+1

+1

+1

+1

+1
+1

where the last line follows by the standard budget constraint


given in the main text.
+1 and the definition of
Optimality. Define
W(

)=

1
1

and consider Eq. (8.6) in the main text,

) = max W (

( (

+1

+1 )))

) )

( (

+1

+1

+1

, the denition of

+1 ) (1

1
+1 )))

yields

The first-order condition for


W1 (

+ ((1

= W2 (

( (

+1 )))

+1

1(

+1

+1 ))]

(8A.1)

where subscripts denote partial derivatives. Thus, optimal consumption is some function of the state
(
) such that,
(
)) (1 +
( +1 ))
+1 = (
By differentiating the value function with respect to
1(

) = W1 ( (
+ W2 ( (

( (

+1

+1 ))) 1 (

( (

+1

+1 )))

,
)

1(

+1 ) (1 +

+1

+1 ))] (1

1(

))

where subscripts denote partial derivatives. Replacing Eq. (8A.1) into the previous equation leaves
the envelope condition for the dynamic programming problem,
1(

) = W1 ( (

( (

+1

+1 )))

(8A.2)

By replacing Eq. (8A.2) back into Eq. (8A.1), and rearranging terms,
W2 ( (
W1 ( (

)
)

(
(

))
W1 ( (
))

+1

+1 )

+1

+1 )) (1 +

( +1 )) = 1

where we have set (


)
( ( +1 +1 )).
We now show through a similar argument that the same Euler equation applies to any asset =
1
,

) (
))
W2 ( (
W1 ( ( +1 +1 ) ( +1 +1 )) (1 + ( +1 )) = 1
(8A.3)
W1 ( (
) (
))



Indeed, we have:
(

) = max W (

( (

= max W (
+1

P
where +1 =
( +1 +
first-order conditions are
+1

+1 )))

+1

+1 )

:0=

+1

W1 ()

and where the portfolio choice

+ W2 ()

1(

+1 ) (

+1

( (

+1

is made at time . The

+1

+1

+1 )))

+1

+1 )]

Replacing Eq. (8A.2) into the previous equation leaves


W2 ( (
W1 ((

) (
) (

))
W1 ( (
))

+1 )

+1

+1

+1 ))

+1

+1

=1

Derivation of Eq. (8.7). We need to explicitly determine the stochastic discount factor in Eq.
(8A.3)
;

+1

+1 )

W2 ( (
W1 ( (

)
)

(
(

))
W1 ( (
))

+1 )

+1

+1

+1 ))

Note that
W1 (

)=

W2 (

)=

such that
(

+ ((1

) )1

+ ((1

) )

W2 (
+1 ) =
W1 (

+1

1
1

+1 )

+1

1(

) = W1 ( (
(

+1 ))

+1
+1

+1 )

( (

W(
)

+1 )

))

(1

+1 )

+1

(8A.4)

+1

)). Therefore,
1

+1

(8A.5)

. By the envelope condition in (8A.2),


+1 )))
1
1

(1

1
1

+1 ))

+1

(1

=W( (
=

+1 ) =

( ( +1
( +1

( (
(

We are left with evaluating the term

) )

)=W( (

=W(

((1

+1

)
W1 (
)

(
).
where
Along any optimal consumption path,
(

where the first equality follows by Eq. (8A.2), the second by the first of Eqs. (8A.4), and the last is
1
the optimality condition. We conjecture that (
) = ( ) 1 , such that the previous expression
delivers the agent's optimal consumption:
)= ( )

( )

( ) (1

)(

1)

(8A.6)

Therefore,
+1

= (1

( ))


(1 +

+1 ))

(8A.7)



and
( ( +1
( +1

+1 ))
+1 )

Along any optimal path, (


)=W( (
again the conjecture made on ,

+1 ) (1 +

1
+1 ))

(8A.8)
( +1 ))1
)). Using the expression of W in Eq. (8.6) and

+1 ) (1 +

= ( )
=
=

1
1
1
1

( )

( )

and rearranging terms leaves:

+1 ) (1 +

(
(

1
+1

+1 )

+1 ) (1 +

+1 ) (1

( ))

+1 ))

(1 +

( ) in Eq. (8A.6) leaves

( +1 ))1 =
( +1 ) (1 +
(

Plugging Eqs. (8A.9)-(8A.10) into Eq. (8A.8),

1
( ( +1 +1 ))
=
( +1 +1 )
(1
( )) (

+1 ) (1 +

+1

+1 ))

(1

+1 ))

)(

(8A.9)

(1

+1 ))

( ))

(1 +

1
(1 +

+1 ))

1
1

(
! (1

)(

1)

(8A.10)
! (1

+1 ))
)(

1)

)(

+1

(1

( )
( )

( )

+1

Moreover, using the definition of


(

1)

! (1

)(

1)

1)

where the first equality follows by Eq. (8A.6) and the second by Eq. (8A.7). The result follows by
replacing the previous expression into Eq. (8A.5).
Proof of Eqs. (8.12) and (8.13). By the standard property that if $x$ is normally distributed,
$\ln E(e^{x}) = E(x) + \frac{1}{2}\mathrm{var}(x)$, we can elaborate on Eq. (8.10), obtaining,

ln( +1 )+
+1
0 = ln
!

2
2
1
+1
2
2 2
ln
+
(
+
2
=
(8A.11)
+1 ) +
2
We do the same in Eq. (8.11), and obtain:

+1
= +
ln
(
1) (
+1 )

1
2

+(

1)

By replacing Eq. (8A.12) into Eq. (8A.11), we obtain Eq. (8.12) in the main text.
in Eq. (8.13), we replace the expression for (
To obtain the risk-free rate
into Eq. (8A.12).


1)

!
(8A.12)

+1 )

in Eq. (8.12)
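The property of normally distributed variables invoked in the proof of Eqs. (8.12) and (8.13) can be checked numerically. The following sketch integrates the exponential of a Gaussian variable against its density on a grid and compares the logarithm of the result with the mean plus half the variance. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def log_mgf_numeric(mu, sigma, width=12.0, steps=20001):
    """ln E[exp(x)] for x ~ N(mu, sigma^2), by trapezoidal integration
    over the interval mu +/- width * sigma."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = lo + i * h
        density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, steps - 1) else 1.0  # trapezoid end points
        total += weight * math.exp(x) * density
    return math.log(total * h)


def log_mgf_closed_form(mu, sigma):
    """Mean plus half the variance."""
    return mu + 0.5 * sigma ** 2


if __name__ == "__main__":
    for mu, sigma in ((0.02, 0.15), (-0.50, 1.00)):
        print(mu, sigma, log_mgf_numeric(mu, sigma), log_mgf_closed_form(mu, sigma))
```

The grid width is an assumption suited to moderate values of `sigma`; for very large variances the integration interval would need to be widened.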



8.15.2 Details regarding models of long-run risks
Proof of Eq. (8.22). By substituting the guess = 0 + 1 into Eq. (8.21),

ln( +1 )+ 1 1 +1
1 +
+1
) + ln
0 = ( 0 (1
1) 0

1
1 2
1 1
=
( 0 (1
+ ln
)+ 1
1) 0
0
2

1
+ ( 1
1) 1 + 1

1 1

const1 + const2

where the second equality follows by Eqs. (8.19) and (8.20). Note, then, that this equality can only
hold if the two constants, const1 and const2 are both zero. Imposing const2 = 0 yields,
1

1
1

as in Eq. (8.22) in the main text. Imposing const1 = 0, and using the solution for
for the constant 0 .

1,

yields the solution

8.15.3 Continuous time


Duffie and Epstein (1992a,b) extend the framework of non-expected utility to continuous time. Heuristically, the continuation utility is the continuous time limit of,
Continuation utility

1
)

1
+

solves the following stochastic differential equation,

1
2
=
(
)
( )k k
with
+
2

=0

Now, (
) is the aggregator, with being a variance multiplier, placing a penalty proportional to
) is the continuous time counterpart to the aggregator
utility volatility k k2 . The aggregator (
( ) of the discrete time case. The solution to the previous stochastic differential utility is:

Z
1
2
( )k k
(
=
)+
2
For example we can take,
(

)=

with =
6 0 and
1. The standard additive utility case is obtained once
case, ( ) = ( )
(see, e.g., Duffie and Epstein (1992, p. 367)).
[In progress]


= 0 and

= , in which


8.16 Appendix 2: Economies with heterogeneous agents


Economies with a continuum of agents. We study asset prices in economies with complete
markets by relying on a centralized decision mechanism that delivers the same outcome as the
decentralized, market-based one. This approach was developed by Huang (1987), and extends the approach
used to deal with the classical static model, explained in Chapter 2 of Part I of these lectures.
We consider a set continuum of agents, , indexed by an instantaneous utility function ( ), where
for each
, is consumption. The case with a discrete number of agents is dealt with through a change
in notation, and is actually analyzed in some of the examples below. Because markets are complete, the
market allocation is Pareto efficient, and by the second welfare theorem, it can be implemented by
means of the following program,
Z

Z
Z
( )
s.t.
=
max
0

or, because there is no intertemporal transfer of resources,


Z
Z
( )
max
( )
s.t.
(

[8A.P1]

where
is the aggregate endowment in the economy.
We explain how the equilibrium price system is determined as the Arrow-Debreu state price density
in an economy with a single agent endowed with the aggregate endowment , instantaneous utility
is the reciprocal of the marginal utility of
function ( ), and where the social weighting function
income for agent
. Indeed, the first-order conditions for the program [8A.P1] are
0

for all

where 0 ( ) denotes the partial derivative of


Moreover, by differentiating with respect to
Z
0
( )=
( )

(8A.13)

with respect to

, and

is a Lagrange multiplier.

,
=

(8A.14)

( ) denotes the partial derivative of with respect to . The second equality in Eq. (8A.14)
where
follows by the optimality conditions (8A.13), and the third, by differentiating the constraint of the
social plan [8A.P1]. Combining Eqs. (8A.13) and (8A.14) leaves
(
(

)
=
0)

(
(

)
0)

for all

(8A.15)

On the other hand, the market general equilibrium allocation satisfies


=
0

0
0

(
(

)
0)

for all

(8A.16)

To show that the allocations in Eq. (8A.15) and in Eq. (8A.16) are the same, we need to show that
(

)=

(8A.17)

1 , where
We show that Eq. (8A.17) holds true once the social weight
=
is the marginal
utility of income of agent . Indeed, in this case, Eq. (8A.13) and the optimality conditions for the
decentralized economy lead to,

for all

)=

and


)=

(8A.18)


Aggregating the first and the second of the previous conditions leaves:
Z
Z

=
(
)
=

where
and denote the inverse functions for consumption, as implied by the social and private
allocations in (8A.18). Eq. (8A.17) follows by an argument similar to that used to show Theorem 2.7 in
Chapter 2, and by the envelope theorem in Eq. (8A.14). Therefore, the pricing kernel in this economy
with heterogeneous agents and complete markets has the same pricing implication as the economy
with a single agent as explained, viz
( )
=
(8A.19)
( 0)
0
The practical merit of this approach is that while the marginal utility of income is unobservable, the
thusly constructed Arrow-Debreu state price density depends on the infinite-dimensional parameter,
, which could be calibrated to match selected quantitative features of consumption and asset price
data.
We now apply this approach and derive the equilibrium conditions in two models and then, move
to study the allocation process in an incomplete market setting.
Catching up with the Joneses (Chan and Kogan, 2002). In this model, markets are
(
) =
complete, and we have that
= [1 ] and the instantaneous utility of agent
is,
(1
), where is the standard of living of others, as explained in the main text.
( / )1
The static optimization problem for the social planner in [8A.P1] can be written as,
Z
Z
( / )1
=
(8A.20)
(
) = max
s.t.
1
1
1
The first-order conditions for this problem lead to,

(8A.21)

where is a Lagrange multiplier, a function of the aggregate endowment , normalized by . It is


determined by Eq. (8.26), which is obtained by replacing Eq. (8A.21) into the budget constraint of the
social planner, the second of Eqs. (8A.20). The value function in Eq. (8.25) of the main text follows
by replacing Eq. (8A.21) into the maximized value of the intertemporal utility, the first of Eqs. (8A.20).
General equilibrium allocations, and prices, are obtained by setting ( ) equal to the reciprocal of
the marginal utility of income for agent .
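For a finite number of agents, the planner's allocation can be computed directly. The sketch below is an illustration under stated assumptions rather than a transcription of Eq. (8A.21): agents have constant relative risk aversion `gammas[i]`, the planner's weights are `weights[i]`, and the first-order conditions are taken to be `weights[i] * c_i ** (-gammas[i]) = V`, where `c_i` denotes consumption normalized by the standard of living and `V` the normalized Lagrange multiplier. The function names and numbers are hypothetical, and bisection is only one convenient way to find `V`.

```python
import math


def allocation(V, weights, gammas):
    """Normalized consumption of each agent implied by the assumed
    first-order conditions, for a given multiplier V."""
    return [(w / V) ** (1.0 / g) for w, g in zip(weights, gammas)]


def solve_multiplier(endowment, weights, gammas, tol=1e-13):
    """Find V such that the allocations add up to the normalized endowment."""
    def excess(log_v):
        return sum(allocation(math.exp(log_v), weights, gammas)) - endowment

    lo, hi = -1.0, 1.0
    while excess(lo) < 0.0:   # a lower multiplier raises every allocation
        lo -= 1.0
    while excess(hi) > 0.0:   # a higher multiplier lowers every allocation
        hi += 1.0
    while hi - lo > tol:      # bisection on the logarithm of V
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


if __name__ == "__main__":
    weights = [0.2, 0.3, 0.5]   # hypothetical social weights
    gammas = [1.0, 2.0, 5.0]    # hypothetical risk aversion coefficients
    V = solve_multiplier(1.5, weights, gammas)
    print("multiplier:", V)
    print("allocations:", allocation(V, weights, gammas))
```

Working with the logarithm of `V` keeps the search well behaved when risk aversion differs widely across agents, since the allocations are monotone in `V`.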
The expression for the unit risk-premium in Eq. (8.27) follows by results given in Section 7.5.1 of
Chapter 7,

2
( (
))
( (
))
(
) ln
(8A.22)
( (
)) =
0
2
where
that:

( ) is the value function in Eq. (8.25). To evaluate the previous expression for , note, first,
Z
0
1
( (
))
1 1
( (
))
=
( (
))
(8A.23)
1

Moreover, by differentiating Eq. (8.26) with respect to , using Eq. (8A.23), and rearranging terms,
leads to
( (
))
= ( (
)) . Differentiating this expression for
with respect
to
again, produces:
0
2 ( (
))
( (
)) 1
(8A.24)
=
2


Replacing Eqs. (8A.23)-(8A.24) into Eq. (8A.22) yields Eq. (8.27) in the main text. In this fictitious
representative agent economy, the short-term rate is the expectation of the stochastic discount factor.
It equals, again by results given in Section 7.5.1 of Chapter 7,
2

)=

( (
( (

))
))

( (

1+

))

( (
))
( (
))

))
(
)
))

( (
( (

1
2

2
0

1
2

2
0

00

( (
( (
))
( (

( (

))
))
0
( (
))

))

It is instructive to compare the first-order conditions of the social planner in Eq. (8A.21) with
those in the decentralized economy. Because markets are complete, the optimality conditions in the
decentralized economy satisfy:

m

is the marginal utility of income for agent


where
in Eq. (8A.25), and solving for yields,
=

as usual. Aggregating the market allocations

1
s

By aggregating the social weighted allocations,


=

(8A.25)

say, in Eq. (8A.21), with

(8A.26)
=

1,

(8A.27)

Comparing the two implications in (8A.26) and (8A.27) leads to the conclusion that the social
allocations with = 1 , are the same as the private, and that the counterpart to Eq. (8A.19) is,

1
(
)
=
=
1
( 0 )
0
0
0

Restricted stock market participation (Basak and Cuoco, 1998). Given Eq. (8.39), and
results given in Section 7.5.1 of Chapter 7, the unit risk-premium, (
), solves a fixed point problem:
(

)=

11 (

)
)

1(

11 (

12 (
1(

)
)

That is,
(

)=

1(
1(

12 (

We claim that:
1(

)=

( ), and

1(
1(

)
)

11 (
12 (

0
1(

(8A.28)

)
=
)

00

( )

(8A.29)

where
and
are the social planner consumption allocations. By replacing Eqs. (8A.29) into Eq.
and , leads to the expression for in Eq. (8.34). The expression
(8A.28), and using the definition of
for the short-term rate in Eq. (8.33) can be found similarly, and again, through the results given in
Section 7.5.1 of Chapter 7.
We now show that Eqs. (8A.29) hold true. Consider the Lagrangean for the maximization problem
in Eq. (8.38),
( )+ (
)
= ( )+



0

( )

where is a Lagrange multiplier, = 0 ( ) , and and are the market consumption allocations.
= (
) and
The first-order conditions for the social planner lead to social allocation functions
= (
), and Lagrange multiplier (
), satisfying:
0

)) = (

)=

))

))

(8A.30)

Accordingly, the value of the problem in Eq. (8.38) is:


(

)=

)) +

such that
1(

)=

))

))

))

+
)

(
(

))

(8A.31)

where the second equality follows by the first-order conditions in Eq. (8A.30), and the third equality
holds by differentiating the equilibrium condition
(

)+

(8A.32)

with respect to .
Eq. (8A.31) establishes the first claim in Eqs. (8A.29). To prove the second claim, invert, first,
(
) = 0 1[ (
)] and
(
) =
the first-order condition with respect to , obtaining,
0 1[ (
) ]. Replace, then, these inverse functions into Eq. (8A.32),
=

0 1

[ (

)] +

0 1

[ (

(8A.33)

where, by Eq. (8A.32) and Eq. (8A.31),


(

)=

Differentiating Eq. (8A.33) with respect to


0=

00 (

1
(

))

and
1=

00 (

1
(

12 (

))

and
)+

11 (

Replacing Eq. (8A.36) into Eq. (8A.35) leaves:

1
)
12 (
+
0 = 00
( (
))
)
11 (

00

)) =

1(

(8A.34)

, and using Eq. (8A.34), leaves:


12 (

1
( (

)+

))

00

1
00 ( (

11 (

1
( (

))

1(

(8A.35)

(8A.36)

))

12 (

1(

The second relation in Eqs. (8A.29) follows by rearranging terms in the previous relation.
Extinction I: derivation of Eq. (8.67). Denote with 0 the initial wealth of agent . Note
), = 1 2, such that by using
that the budget constraints faced by the two agents are: 0 = (
and
in Eqs. (8.65) and (8.66),
the expressions for

1
1
1
1
(
) (1 + (
) )
(
02
2 )

(8A.37)
1=
=
=
(
01
1 )
(1 + (
)1 ) 1 1


where the first equality follows by the assumption that rational and irrational agents have the same
initial endowments.
Let us introduce a new probability, , defined through the Radon-Nikodym derivative,

1

=

( 1 )
F
such that Eq. (8A.37) can be re-written as

(
)1 (1 + (
)1 )

= (1 + (

)1 )

(8A.38)

where denotes the expectation under . By Girsanov's theorem, we have that under ,
to
2 1 2 2
) +
2
= ((1 )

where is a Brownian motion under . That is, utilizing the expression for
(8A.38) can be written as

1 2 2
+
1 (1 + 1 ) 1 = (1 + 1 ) 1
2

is solution

in Eq. (8.67), Eq.

(8A.39)

KRWW show indeed that Eq. (8A.39) holds true.

Extinction II: derivation of Eq. (8.68). We determine the equilibrium asset price in the
logarithmic preferences case. We have:
=

)=

(1 + )
(1 + )

1+

(1 + )

where the second line follows by the martingale property of . To determine the denominator of the
previous ratio, note that,
"
"
1#
1#

(
)
+
(1 + )
=
=

"
(

2(

where the second equality follows by a change in probability and the third, by Girsanov's theorem and
Eq. (8.62). Therefore,
=

1+

(1 + )

The price prevailing in a rational market,


such that by Eq. (8A.40),

1+
1+

, is determined once we set


1

=
Replacing this value for

back to Eq. (8A.40) produces Eq. (8.68).


2(

= 0 and

(8A.40)

= 1 for all ,


8.17 Appendix 3: Knightian uncertainty


Smooth ambiguity aversion. Given the assumptions on in the main text, we have:
h
i

=
M

1 2
2
=
exp
( ( | )
) +
var ( | )
2
Note that ( | ) is normally distributed with, say, mean
var [ ( | )], such that,

1 2 2
) +
=
exp
(
M
2

[ ( | )] and variance
2

1
+
2

var ( | )

(8A.41)

By the Law of Iterated Expectations,

[ ( | )] =
Regarding

( )=

, note that by the Law of Total Variance,


var ( ) = var [ ( | )] +

Replacing the values of


(8.93).

and

[var ( | )]

+ var ( | )

implied by the previous two equalities into Eq. (8A.41) leaves Eq.
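The Law of Total Variance invoked above can be checked on a small discrete example. In the sketch below, a hypothetical parameter takes three values with given probabilities and, conditionally on each value, the variable of interest has a known mean and variance. The names and numbers are illustrative assumptions only.

```python
def total_variance_direct(probs, cond_means, cond_vars):
    """Unconditional variance computed from the first and second moments."""
    mean = sum(p * m for p, m in zip(probs, cond_means))
    second = sum(p * (v + m * m) for p, m, v in zip(probs, cond_means, cond_vars))
    return second - mean ** 2


def total_variance_decomposed(probs, cond_means, cond_vars):
    """Variance of the conditional mean plus mean of the conditional variance."""
    mean = sum(p * m for p, m in zip(probs, cond_means))
    var_of_cond_mean = sum(p * (m - mean) ** 2 for p, m in zip(probs, cond_means))
    mean_of_cond_var = sum(p * v for p, v in zip(probs, cond_vars))
    return var_of_cond_mean + mean_of_cond_var


if __name__ == "__main__":
    probs = [0.25, 0.50, 0.25]        # distribution of the unknown parameter
    cond_means = [0.00, 0.02, 0.05]   # conditional means
    cond_vars = [0.04, 0.04, 0.04]    # conditional variances (constant here)
    print(total_variance_direct(probs, cond_means, cond_vars))
    print(total_variance_decomposed(probs, cond_means, cond_vars))
```

With a constant conditional variance, as in this example, the second term of the decomposition reduces to that constant.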

Solution of the multiple likelihood model. Consider the APT equation delivering E , i.e.,
the asset expected returns under the distorted probability in (8.95),
E =

Vol
1
( + 1)
2

2
0

0(

)
( )

( )

(8A.42)

where
denotes the infinitesimal generator of ( ) in (8.99), and the last two lines follow by Eqs.
(8.100). That is, the price-dividend ratio is solution to the following partial differential equation:

1
0
2
2
( + 1) 0 + 0
) ( )
+
( )+1
0=
( ) + 0 ( ) (1
2
Conjecture that ( ) = 0 + 1 , for two positive constants. Plugging this guess back into the
previous equation confirms that the price-dividend ratio is indeed affine in the expected growth. To
determine 0 and 1 , one may proceed through the following intuitive arguments. Note that
1
Z
1
1

(0 ) =
+ (1
= 0
0)

1
2)
1
2)
0
0
(1
)(
(1
) (
0
2
0
2
where the first equality is a standard evaluation formula, and the second relies on Eq. (8.99). Replacing

) , delivers
the definition of expected dividend growth into the previous equation, = + (1
Eq. (8.101).
Next, we determine the expression for the equity premium under the reference model, i.e., E in
( | ( ) ) = + (1
) and A is the average
(8.102). Note that = + A 0 , where

size of ambiguity aversion, also defined in (8.102). Therefore, by Girsanov's theorem (see Chapter 4),
we can define a probability
with Radon-Nikodym derivative against the worst-case probability
(i.e., the probability under which ( ) are solution to (8.99)),

= 12 0 A2
0 A

such that

0A

is a Brownian motion under


=

= =

. Then, under

, we have that

(8A.43)

A ( )

+ ( )

where () denotes as usual the diffusion coefficient of in (8.99). The APT equation delivering the
asset expected returns under
is
+

E
=
=
=

0 ()

1
() 0 +
()

0 ()
A 0 + +
()
()

0 ()
A ( ) A 0 +
+ +

where
denotes the infinitesimal generator of ( ) in (8A.43), and the second line follows by
applying Itô's lemma to = ( ), with ( ) solution to (8A.43), the third by the definition of
and, finally, the fourth by applying Itô's lemma to ( ), where solution to the second equation in
(8A.43), and the expression for E in (8A.43). Rearranging terms and using the definition of the
asset return volatility Vol in (8.100) delivers the expression for E in (8.102).
The expression for the price-dividend ratio in Eq. (8.101) simplifies as ambiguity aversion is homogeneous
across states of nature. Precisely, assume that ( ) = , in which case A collapses
to , and = + 0 , such that the aggregate dividends under the distorted dynamics are
=
=
and the price-dividend ratio is

1
( )=
where

+ (1

)(

(1

) (

1
2

0.


2)
0

(1

)(

1
2

2)
0

(8A.44)

8.17. Appendix 3: Knightian uncertainty

c
by
A. Mele

References
Abel, A.B. (1990): Asset Prices under Habit Formation and Catching Up with the Joneses.
American Economic Review Papers and Proceedings 80, 38-42.
Abel, A.B. (1999): Risk Premia and Term Premia in General Equilibrium. Journal of Monetary Economics 43, 3-33.
Adrian, T. and H. S. Shin (2011): Financial Intermediaries and Monetary Economics. In
B. M. Friedman and M. Woodford (Editors): Handbook of Monetary Economics (NorthHolland Elsevier), Vol 3A, Chapter 12, 601-650.
Alvarez, F. and U.J. Jermann (20??):
Bansal, R. and A. Yaron (2004): Risks for the Long Run: A Potential Resolution of Asset
Pricing Puzzles. Journal of Finance 59, 1481-1509.
Basak, S. (2000): A Model of Dynamic Equilibrium Asset Pricing with Heterogeneous Beliefs
and Extraneous Risk. Journal of Economic Dynamics and Control 24, 63-95.
Basak, S. (2005): Asset Pricing with Heterogeneous Beliefs. Journal of Banking and Finance
29, 2849-2881.
Basak, S. and D. Cuoco (1998): An Equilibrium Model with Restricted Stock Market Participation. Review of Financial Studies 11, 309-341.
Bernanke, B.S. and M. Gertler (1989): Agency Costs, Net Worth, and Business Fluctuations.
American Economic Review 79, 14-31.
Bernanke, B.S. (2004): The Great Moderation: Remarks by the Federal Reserve Board Governor, The Meetings of the Eastern Economic Association, Washington, DC, February
20.
Bernanke, B. S., M. Gertler and S. Gilchrist (1999): The Financial Accelerator in a Quantitative Business Cycle Framework. In J.B. Taylor and M. Woodford (Editors): Handbook
of Macroeconomics (North-Holland Elsevier), Vol. 1C, Chapter 21, 1341-1393.
Berrada, T. (2006): Incomplete Information, Heterogeneity, and Asset Pricing. Journal of
Financial Econometrics 4, 136-160.
Black, F. (1976): Studies of Stock Price Volatility Changes. Proceedings of the 1976 Meeting
of the American Statistical Association, 177-81.
Boldrin, M., L. Christiano and J. Fisher (2001): Habit Persistence, Asset Returns and the
Business Cycle. American Economic Review 91, 149-166.
Brunnermeier, M. K., T. M. Eisenbach and Y. Sannikov (2012): Macroeconomics with Financial Frictions: A Survery. Working Paper Princeton University.
Brunnermeier, M. K. and Y. Sannikov (2013): A Macroeconomic Model with a Financial
Sector. Working Paper Princeton University.
407

8.17. Appendix 3: Knightian uncertainty

c
by
A. Mele

Buraschi, A. and A. Jiltsov (2006): Model Uncertainty and Option Markets with Heterogeneous Beliefs. Journal of Finance 61, 2841-2897.
Campbell, J.Y., A. W. Lo and C. MacKinlay (1997): The Econometrics of Financial Markets.
Princeton: Princeton University Press.
Campbell, J.Y. (2003): Consumption-Based Asset Pricing. In Constantinides, G. M., M.
Harris and R. M. Stulz (Editors): Handbook of the Economics of Finance (North-Holland
Elsevier), Vol 1B, Chapter 13, 803-887.
Campbell, J.Y., and J.H. Cochrane (1999): By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior. Journal of Political Economy 107, 205-251.
Campbell, J.Y. and R. Shiller (1988): The Dividend-Price Ratio and Expectations of Future
Dividends and Discount Factors. Review of Financial Studies 1, 195228.
Chan, Y.L. and L. Kogan (2002): Catching Up with the Joneses: Heterogeneous Preferences
and the Dynamics of Asset Prices. Journal of Political Economy 110, 1255-1285.
Christie, A.A. (1982): The Stochastic Behavior of Common Stock Variances: Value, Leverage,
and Interest Rate E ects. Journal of Financial Economics 10, 407-432.
Cochrane, J. H., F. A. Longsta , and P. Santa-Clara (2008): Two Trees. Review of Financial
Studies 21, 347-385.
Constantinides, G.M. and D. Du e (1996): Asset Pricing with Heterogeneous Consumers.
Journal of Political Economy 104, 219-240.
Constantinides, G.M., J.B. Donaldson and R. Mehra (2002): Juniors Cant Borrow: a New
Perspective on the Equity Premium Puzzle. Quarterly Journal of Economics 117, 269296.
Cujean, J. and M. Hasler (2001): Fear of Recessions, Heterogenous Beliefs, and Stock Price
Under/Over-Reaction. Working Paper Swiss Finance Institute EPFL.
Cvitanic, J., and S. Malamud (2011): Price Impact and Portfolio Impact. Journal of Financial Economics 100: 201-225.
Danielsson, J., J.-P. Zigrand and H. S. Shin (2011): Balance Sheet Capacity and Endogenous
Risk. Working Paper London School of Economics and Princeton University.
Detemple, J. and Murthy, S. (1994): Intertemporal Asset Pricing with Heterogeneous Beliefs
Journal of Economic Theory 62, 294-320.
Dow, J. and S. Werlang (1992): Uncertainty Aversion, Risk Aversion, and the Optimal Choice
of Portfolio. Econometrica 60, 197-204.
Du e, D. (1992): The Nature of Incomplete Security Markets. In J-J La ont (Editor):
Advances In Economic Theory, 6th World Congress, Vol. II, Chapter 4, 214-262.
Du e, D. and L.G. Epstein (1992a): Asset Pricing with Stochastic Di erential Utility. Review of Financial Studies 5, 411-436.
408

8.17. Appendix 3: Knightian uncertainty

c
by
A. Mele

Du e, D. and L.G. Epstein (with C. Skiadas) (1992b): Stochastic Di erential Utility. Econometrica 60, 353-394.
Dumas, B., A. Kurshev and R. Uppal (2009): Equilibrium Portfolio Strategies in the Presence
of Sentiment Risk and Excess Volatility. The Journal of Finance 64, 579-629.
Ellsberg, D. (1961): Risk, Ambiguity and the Savage Axioms. Quarterly Journal of Economics 75, 643-69.
Epstein, L.G. and S.E. Zin (1989): Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: A Theoretical Framework. Econometrica 57, 937-969.
Epstein, L.G. and S.E. Zin (1991): Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: An Empirical Analysis. Journal of Political Economy
99, 263-286.
Friedman, M. (1953): The Case for Flexible Exchange RatesEssays in Positive Economics.
Chicago: University of Chicago Press.
Gallmeyer, M., Aydemir, A.C. and B. Hollield (2007): Financial Leverage and the Leverage
E ect: A Market and a Firm Analysis. working paper Carnegie Mellon.
Geanakoplos, J. (2010): The Leverage Cycle. In D. Acemoglu, K. Rogo and M. Woodford
(Editors): NBER Macroeconomic Annual 2009 (University of Chicago Press), Vol 24,
1-65.
Gertler, M. and N. Kiyotaki (2011): Financial Intermediation and Credit Policy in Business
Cycle Analysis. In B. M. Friedman and M. Woodford (Editors): Handbook of Monetary
Economics (North-Holland Elsevier), Vol 3A, Chapter 11, 547-599.
Gilboa, I. and M. Marinacci (2011): Ambiguity and the Bayesian Paradigm. In Advances
in Economics and Econometrics: Theory and Applications, Tenth World Congress of the
Econometric Society. Cambridge: Cambridge University Press.
Gilboa, I. and D. Schmeidler (1989): Maxmin Expected Utility with a Non-Unique Prior.
Journal of Mathematical Economics 18, 141-153.
Guvenen, F. (2009): A Parsimonious Macroeconomic Model for Asset Pricing. Econometrica
77, 1711-1740.
Harrison, J.M. and D.M. Kreps (1978): Speculative Investor Behavior in a Stock Market with
Heterogeneous Expectations. Quarterly Journal of Economics 92, 323-36.
Hansen, L. P. and T. J. Sargent (2008): Robustness. Princeton: Princeton University Press.
He, Z. and A. Krishnamurthy (2012): Intermediary Asset Pricing. Forthcoming in the American Economic Review.
Heaton, J. and D.J. Lucas (1996): Evaluating the E ects of Incomplete Markets on Risk
Sharing and Asset Pricing. Journal of Political Economy 104, 443-487.
409

8.17. Appendix 3: Knightian uncertainty

c
by
A. Mele

Huang, C.-f. (1987): An Intertemporal General Equilibrium Asset Pricing Model: the Case
of Di usion Information. Econometrica 55, 117-142.
Hugonnier, J. and R. Prieto (2012): Arbitrageurs, Bubbles, and Credit Conditions. Working
paper SFI- EPFL (Lausanne) and Boston University.
Jermann, U.J. (1998): Asset Pricing in Production Economies. Journal of Monetary Economics 41, 257-276.
Karatzas, I. and S.E. Shreve (1991): Brownian Motion and Stochastic Calculus. New York:
Springer Verlag.
Keynes, J.M. (1921): A Treatise on Probability. London: MacMillan and Co.
Kiyotaki, N. (1998): Credit and Business Cycles. Japanese Economic Review 49, 18-35.
Kiyotaki, N. and J. Moore (1997): Credit Cycles. Journal of Political Economy 105, 211-248.
Klibano , P., M. Marinacci, and S. Mukerji (2005): A Smooth Model of Decision Making
under Ambiguity. Econometrica 73, 1849-1492.
Knight, F.H. (1921): Risk, Uncertainty, and Prot. New York: Houghton Mi in.
Kogan, L., S. Ross, J. Wang, and M. Westereld (2006): The Survival and Price Impact of
Irrational Traders. Journal of Finance 61, 195-229.
Kyle, A.S. and F.A. Wang (1997): Speculation Duopoly with Agreement to Disagree: Can
Overcondence Survive the Market Test? Journal of Finance 52, 2073-90.
Leippold, M., F. Trojani and P. Vanini (2008): Learning and Asset Prices under Ambiguous
Information. Review of Financial Studies 21, 2565-2597.
Lettau, M. (2002): Idiosyncratic Risk and Volatility Bounds, or, Can Models with Idiosyncratic Risk Solve the Equity Premium Puzzle? Review of Economics and Statistics 84,
376-380.
Liptser, R.S. and A.N. Shiryaev (2001): Statistics of Random ProcessesVol. II (Applications).
Berlin, Springer-Verlag.
Lucas, R.E. (19??):
Lucas, D.J. (1994): Asset Pricing with Undiversiable Income Risk and Short Sales Constraints: Deepening the Equity Premium Puzzle. Journal of Monetary Economics 34,
325-341.
Mankiw, N.G. (1986): The Equity Premium and the Concentration of Aggregate Shocks.
Journal of Financial Economics 17, 211-219.
Mankiw, N.G. and S.P. Zeldes (1991): The Consumption of Stockholders and Non-Stockholders.
Journal of Financial Economics 29, 97-112.
Markovitz, H. (1952): Portfolio Selection. Journal of Finance 7, 77-91.
410

8.17. Appendix 3: Knightian uncertainty

c
by
A. Mele

Martin, I. (2011): The Lucas Orchard. Working Paper Stanford University.


Mele, A. and F. Sangiorgi (2015): Uncertainty, Information Acquisition and Price Swings in
Asset Markets. Review of Economic Studies 82, 1533-1567.
Menzly, L., T. Santos and P. Veronesi (2004): Understanding Predictability. Journal of
Political Economy 112, 1, 1-47.
Nelson, D.B. (1991): Conditional Heteroskedasticity in Asset Returns: A New Approach.
Econometrica 59, 347-370.
Odean, T. (1998): Volume, Volatility, Price, and Prot When All Traders Are Above Average. Journal of Finance 53, 1887-1934.
Pavlova, A. and R. Rigobon (2008): The Role of Portfolio Constraints in the International
Propagation of Shocks. Review of Economic Studies 75, 1215-1256.
Rawls, J. (1971): A Theory of Justice. Cambridge: Harvard University Press.
Rouwenhorst, G. K. (1995): Asset Returns and Business Cycles. In Cooley, T.F. (Ed.):
Frontiers of Business Cycle Research, Princeton University Press, 294-330.
Scheinkman, J.A. and W. Xiong (2002): Overcondence and Speculative Bubbles. Journal
of Political Economy 111, 1183-1219.
Schmeidler, D. (working paper, 1982, published in 1989): Subjective Probability and Expected
Utility without Additivity. Econometrica 57, 571-587.
Schmeidler, D. (1986): Integral Representation without Additivity. Proceedings of the American Mathematical Society 97, 255-261.
Schwert, G.W. (1989a): Why Does Stock Market Volatility Change Over Time? Journal of
Finance 44, 1115-1153.
Schwert, G.W. (1989b): Business Cycles, Financial Crises and Stock Volatility. CarnegieRochester Conference Series on Public Policy 31, 83-125.
Shin, H. S. (2010): Risk and Liquidity. Clarendon Lectures in Finance, Oxford University
Press.
Tallarini, T. (2000): Risk-Sensitive Real Business Cycles. Journal of Monetary Economics
45, 507-32.
Telmer, C.I. (1993): Asset-Pricing Puzzles and Incomplete Markets. Journal of Finance 48,
1803-1832.
Veronesi, P. (2000): How Does Information Quality A ect Stock Returns? Journal of Finance
55, 807-837.
Wachter, J.A. (2006): A Consumption-Based Model of the Term Structure of Interest Rates.
Journal of Financial Economics 79, 365-399.
Wald, A. (1950): Statistical Decision Functions. New York: John Wiley.
411

c
by
A. Mele

8.17. Appendix 3: Knightian uncertainty

Weil, Ph. (1989): The Equity Premium Puzzle and the Risk-Free Rate Puzzle. Journal of
Monetary Economics 24, 401-421.
Weil, Ph. (1992): Equilibrium Asset Prices with Undiversiable Labor Income Risk. Journal
of Economic Dynamics and Control 16, 769-790.
Xiouros, C. and F. Zapatero (2010): The Representative Agent of an Economy with External
Habit Formation and Heterogeneous Risk Aversion. Review of Financial Studies 23,
3017-3047.
Zapatero, F. (1998): E ects of Financial Innovations on Market Volatility when Beliefs are
Heterogeneous. Journal of Economic Dynamics and Control 22, 597-626.

412

9
Information and other market frictions

9.1 Introduction
In the economies of the previous chapters, the equilibrium outcomes do not convey more information than that available to each agent because information, whilst sometimes incomplete, is
disseminated symmetrically across decision makers. For example, asset prices aggregate information that the agents already know, such that the agents' inference on the asset fundamentals
would not improve were it also based on the equilibrium prices.
This chapter considers markets in which equilibrium outcomes aggregate information dispersed across the market, which agents find useful while updating beliefs. This information is
useful because the pieces of information agents have access to are not the same, such that, now,
asset prices contain information about the fundamentals that some agents might not directly
observe, and which is made publicly available (so to speak) through trading activity. We study
markets with asymmetric information, in which some agents have more precise information than
others, and markets with differential information, in which agents know different pieces of information that have the same quality. In the economies of the previous chapters, the price only
determines the budget constraints. While the price still determines budget constraints in the
markets that we analyze in this chapter, this price now plays a new, additional, fundamental
role: it conveys information to investors.
Note how subtle the equilibrium concept needs to be in the markets of this chapter. If agents
find it useful to condition their choices upon the information conveyed by an equilibrium outcome, these very same choices may well affect the information contained in the equilibrium
outcome, over a fixed point. Provided it exists, this fixed point leads to implications regarding
the informational role of asset prices, that is, the equilibrium amount of information. In the
asymmetric information case, we say that the price transmits information from the more
informed investors to the less informed. In the differential information case, we say that the price aggregates information dispersed amongst investors. Both cases play outstanding roles in economics.
Consider the asymmetric information case: if uninformed investors can learn from the equilibrium price, what are the incentives left to purchase information? In other words, what is
the value of information? The differential information case is equally important. The welfare
theorems reviewed in Chapter 2 suggest that a Pareto optimal allocation can be centralized.

However, this solution cannot be implemented while portfolio decisions are made on the basis
of local information. The market solution proves useful as the price would now aggregate
dispersed information, making it available to investors initially not having direct access to it.
These ideas go back to Hayek (1945) at least.
There are indeed a number of conceptual issues that arise in these markets. Namely, how much
information does a price need to convey for an equilibrium to exist? Intuitively, if the asset
price conveys too precise information, and becomes a sufficient statistic for the asset payoff,
the agents would not even need to condition upon their own information while formulating
portfolio decisions. But then, if agents trade while only relying on the equilibrium price (and
not on their own signals), how can the price aggregate information? This is the Grossman
(1976) paradox. Note, also, that the price cannot reveal all private information when information
is costly, for otherwise there would not be incentives left to purchase information in the first
place: this is the Grossman and Stiglitz (1980) paradox.
A standard approach to deal with these paradoxes is to assume that prices convey noisy
signals about the information that investors have. As Black (1986) discussed, noise makes
markets function when information problems would otherwise lead them not to arise in the first
place. The mechanism is simple. In equilibrium, the price incorporates the information on which
informed investors trade, but also other factors that are possibly unrelated to the fundamentals,
such as liquidity shocks, that is, noise. An uninformed investor, now, cannot tell whether a large asset
price swing is due to information or to a liquidity shock, as the price is only partially revealing of
the information informed investors have. So uninformed investors learn from the price, but the
information conveyed by the price is imperfect, such that informed investors still have superior
information, even after the uninformed learning. Thus, it can pay to purchase information. All
in all, partial revelation is the key to make asset markets work in this context.
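The partial-revelation mechanism can be illustrated with a minimal signal-extraction sketch. It is not the NREE model developed later in the chapter: it assumes, purely for illustration, that the price is observed as `price = v + z`, where the fundamental `v` and the liquidity noise `z` are independent, zero-mean normal variables with variances `sigma_v2` and `sigma_z2`. All names are hypothetical.

```python
def uninformed_posterior(price, sigma_v2, sigma_z2):
    """Posterior mean and variance of v given price = v + z (independent zero-mean normals)."""
    # Share of a price move that the uninformed investor attributes to information.
    weight = sigma_v2 / (sigma_v2 + sigma_z2)
    post_mean = weight * price
    # Uncertainty about v that is left after observing the price.
    post_var = sigma_v2 * (1.0 - weight)
    return post_mean, post_var

# No noise: the price is fully revealing.
mean_fr, var_fr = uninformed_posterior(price=1.0, sigma_v2=4.0, sigma_z2=0.0)
# With liquidity noise: the price is only partially revealing.
mean_pr, var_pr = uninformed_posterior(price=1.0, sigma_v2=4.0, sigma_z2=1.0)
print(mean_fr, var_fr)
print(mean_pr, var_pr)
```

Under these assumptions, the posterior variance stays strictly positive whenever `sigma_z2 > 0`, which is the sense in which an investor who observes `v` directly retains an informational advantage over one who only observes the price.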
The equilibrium concept to deal with these markets is an extension of the Rational Expectations Equilibrium (REE) the previous chapters have relied upon, and is called Noisy REE
(NREE). It appears that macroeconomics contains the first example of an equilibrium in
which agents confuse fundamentals with noise. In an attempt to explain the relation between
the conduct of monetary policy and the business cycle, Phelps (1970) and Lucas (1972) consider an economy in which agents have imperfect information regarding the fundamentals of
the economy. This information-based approach to the business cycle is summarized in Lucas
(1981), and was somewhat abandoned in favor of the real business cycle theory (see Chapter 3),
perhaps because information could hardly be regarded as the main engine of macroeconomic
fluctuations. However, the interplay between macroeconomics and information helped Lucas
(1972) develop the notion of rational expectations within a noisy economy. In Section 9.2, we
present a simplified version of the Lucas framework, which helps pave the way to the study of
asset markets in subsequent sections.
A central theme of this chapter is the role of information in asset markets. In many of the
models of this chapter, noise is what makes these markets work, as explained. Liquidity shocks
are the most natural example that illustrates this concept. In fact, the models in this chapter
make sharp predictions on how liquidity shocks affect asset prices. For example, as we shall explain,
in non-competitive markets (markets beyond the standard NREE), uninformed trades may have
a price impact because market makers confuse information with noise: being unsure about the
nature of the orders they see (information-driven or liquidity shocks), market makers price the
assets in a way that even a liquidity shock (noise) gets impounded in equilibrium, a clear case
of adverse selection. Liquidity in asset markets therefore constitutes the other side of the same
coin (against information) in this chapter.
The core of this chapter is then to analyze how information affects asset markets. Section
9.3 discusses the classical notions of informational efficiency in the early empirical literature,
and how the subsequent information literature has helped shape these notions. This literature
is reviewed next. First, Sections 9.4 and 9.5 explain that the standard notions of a Walrasian
equilibrium or REE are not sufficient to deal with markets in which agents have asymmetric
or diverse information. Section 9.6 then deals with NREE. Sections 9.7 and 9.8 cover markets
with non-competitive players.
While information does play a fundamental role in explaining prices and liquidity in the microstructure of asset markets, information cannot be the only driver of market liquidity. Liquidity in markets driven by macroeconomic news (government bonds, for example) cannot be
driven only by investors with superior or diverse information. Search and bargaining are
alternative explanations in these markets. Section 9.9 studies liquidity in markets where information plays a more limited role, with a focus shifted to the search nature of OTC markets.
Section 9.10 concludes the chapter and examines a number of additional mechanisms that
could potentially affect the asset price formation process, such as the presence of irrational (or
noise) traders and additional capital market imperfections that lead to limits to arbitrage.

9.2 Prelude: imperfect information in macroeconomics


We consider a simplified version of the model developed by Lucas (1973), in which goods are
produced in $n$ distinct islands. Let $y_i^s$ denote log-production supplied in the $i$-th island. (All
variables are in logs, in this section.) Below, we explain which sources of randomness affect this
economy (see Eqs. (9.3) and (9.4)). It is assumed that production supplied is set so as to equal
the expected wedge of the (log-) price in the island, $p_i$, over the average price in the economy,
$p$,

$$y_i^s = E(p_i - p \mid p_i), \quad \text{where } p = \frac{1}{n}\sum_{i=1}^{n} p_i. \qquad (9.1)$$

Eq. (9.1) follows, approximately, once we assume that the average price, $p$, is common knowledge, as for example in the monopolistic competition model of Blanchard and Kiyotaki (1987).
When $p$ is not common knowledge as in the analysis of this section, Eq. (9.1) can still be
thought of as arising through a plausible decision mechanism. The specific functional form for
the average price is the most important approximation made while deriving Eq. (9.1) based
on a rigorous micro-founded setup. Appendix 3 to this chapter contains a discussion of these
issues.
Information is disseminated differentially (not asymmetrically), in that producers in the $i$-th
island are not aware of the price in the remaining islands, and make statistical inference on
economic developments occurring in the other islands with the same precision. We conjecture
and, later, verify, that all variables, exogenous and endogenous, are normally distributed.
We shall show that this normality property implies the price index gathers all information
efficiently, i.e. is a sufficient statistic for all that information.
By the Projection Theorem reviewed in Appendix 1, we have that:

$$y_i^s = E(p_i - p \mid p_i) = \beta\,(p_i - E(p)),$$

where we have used the fact that information is symmetrically disseminated and, then, (i)
the expectation $E(p_i) = E(p_j) = E(p)$ for every $i$ and $j$, and (ii) both the numerator and
denominator of the ratio, $\beta \equiv \frac{\mathrm{cov}(p_i - p,\, p_i)}{\mathrm{var}(p_i)}$, are the same across all islands. This coefficient will
be determined below, as a part of the equilibrium.
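As a numerical check of the projection coefficient, the following sketch simulates a generic normal signal-extraction problem and compares the sample ratio of covariance to variance with its theoretical value. It is illustrative only: the "common" and "local" components and their standard deviations are hypothetical stand-ins for the aggregate and island-specific parts of an observed price, not the equilibrium objects of this section.

```python
import random

def projection_slope(xs, ys):
    """Sample analogue of cov(x, y) / var(y), the slope in E(x | y) for joint normals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov_xy / var_y

rng = random.Random(42)
sd_common, sd_local = 1.0, 0.5          # hypothetical standard deviations
n_draws = 50_000
common = [rng.gauss(0.0, sd_common) for _ in range(n_draws)]
local = [rng.gauss(0.0, sd_local) for _ in range(n_draws)]

island_price = [c + l for c, l in zip(common, local)]   # what a producer observes
wedge = local                                           # what the producer would like to know

beta_hat = projection_slope(wedge, island_price)
beta_theory = sd_local**2 / (sd_common**2 + sd_local**2)
print(beta_hat, beta_theory)
```

The simulated slope should be close to the theoretical one, and it lies strictly between zero and one: a producer attributes only a fraction of any observed price move to the local component.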
Aggregating across all islands yields the celebrated Lucas supply equation:

$$y^s \equiv \frac{1}{n}\sum_{i=1}^{n} y_i^s = \beta\,(p - E(p)). \qquad (9.2)$$

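Next, assume the demand for the good produced in the $i$-th island is given by:

$$y_i^d = m - p + \epsilon_i - \eta\,(p_i - p), \quad \text{where } \eta > 0 \text{ and } \epsilon_i \sim N(0, \sigma_\epsilon^2), \qquad (9.3)$$

and $m$ is money supply, which we assume to satisfy:

$$m = E(m) + u, \quad \text{where } u \sim N(0, \sigma_u^2). \qquad (9.4)$$

Finally, we assume that $E(u\,\epsilon_i) = 0$, and that $\epsilon_i$ are sectoral shocks in that: $\sum_{i=1}^{n} \epsilon_i = 0$.
The functional form for the demand function, $y_i^d$, follows after assuming the goods in the islands
are imperfect substitutes (see, e.g., Blanchard and Kiyotaki, 1987).
The equilibrium price in the islands plays two roles in this economy. A first, and standard
role, is to clear the markets, being such that $y_i^s = y_i^d$, or:

$$\beta\,(p_i - E(p)) = m - p + \epsilon_i - \eta\,(p_i - p), \quad \text{for all } i. \qquad (9.5)$$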

The second role the price performs is to convey information to agents regarding the two
shocks: (i) the macroeconomic, monetary shock, $u$; and (ii) the real shocks in all the islands,
$\epsilon_i$, $i = 1, \dots, n$. We conjecture that the only real shock that determines the price in the $i$-th
island is $\epsilon_i$, i.e. that the price is a function $p_i(u, \epsilon_i)$. We also conjecture this price is affine
in $u$ and $\epsilon_i$, viz.

$$p_i(u, \epsilon_i) = a + b\,u + c\,\epsilon_i, \qquad (9.6)$$

where the coefficients $a$, $b$ and $c$ have to be determined in equilibrium. Under these conditions,
the average price is a function $p(u)$ satisfying

$$p(u) = a + b\,u. \qquad (9.7)$$
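Let us replace Eqs. (9.6), (9.7) and (9.4) into the equilibrium condition, Eq. (9.5). By rearranging terms,

$$0 = (\beta b + b - 1)\,u + (\beta c + \eta c - 1)\,\epsilon_i + a - E(m).$$

The previous equation has to hold for all $u$ and $\epsilon_i$. Therefore, $a = E(m)$ and the coefficients for
$u$ and $\epsilon_i$ must both equal zero, leading to the following expressions for $b$ and $c$:

$$b = \frac{1}{1+\beta}, \qquad c = \frac{1}{\eta+\beta}. \qquad (9.8)$$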
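We are left with determining $\beta$, which given Eqs. (9.6)-(9.7), and Eq. (9.8), is easily shown to
equal:

$$\beta = \frac{\mathrm{cov}(p_i - p,\, p_i)}{\mathrm{var}(p_i)} = \frac{c^2\sigma_\epsilon^2}{b^2\sigma_u^2 + c^2\sigma_\epsilon^2} = \frac{\sigma_\epsilon^2}{\sigma_\epsilon^2 + \left(\frac{\eta+\beta}{1+\beta}\right)^2 \sigma_u^2}. \qquad (9.9)$$

The previous equation has a unique and positive fixed point for $\beta$, which can then be replaced
back into Eqs. (9.8), yielding the solutions for $b$ and $c$, which are both positive.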
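The fixed point can be located numerically. The sketch below is an illustration under stated assumptions rather than the text's own procedure: it takes the price loadings in Eqs. (9.8) to be b = 1/(1+beta) and c = 1/(eta+beta), with `eta` a hypothetical name for the elasticity of demand to the relative price, writes the right-hand side of Eq. (9.9) as c^2 sigma_eps^2 / (b^2 sigma_u^2 + c^2 sigma_eps^2), and solves for beta by bisection. Function and parameter names are invented for the example.

```python
def solve_island_equilibrium(sigma_u2, sigma_eps2, eta, tol=1e-12):
    """Solve beta = f(beta) by bisection on [0, 1], then recover the loadings b and c."""
    def f(beta):
        b = 1.0 / (1.0 + beta)      # assumed loading of the price on the monetary shock
        c = 1.0 / (eta + beta)      # assumed loading of the price on the island's real shock
        return c * c * sigma_eps2 / (b * b * sigma_u2 + c * c * sigma_eps2)

    # f takes values in (0, 1) when both variances are positive,
    # so f(beta) - beta is positive at 0 and negative at 1.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, 1.0 / (1.0 + beta), 1.0 / (eta + beta)

beta, b, c = solve_island_equilibrium(sigma_u2=1.0, sigma_eps2=1.0, eta=2.0)
output_per_unit_shock = beta * b    # aggregate supply response to a unit monetary surprise
print(beta, b, c, output_per_unit_shock)
```

Bisection is used here only because the bracket [0, 1] is guaranteed to contain a root under the stated assumptions; any one-dimensional root finder would serve equally well.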
We can now figure out the implications of this equilibrium. Replacing Eqs. (9.6)-(9.7) into
the Lucas supply equation (9.2) leaves:

$$y^s = \frac{\beta}{1+\beta}\,u.$$

This is Lucas' celebrated neutrality result. Anticipated monetary policy, $E(m)$, does not affect
the equilibrium outcome, $y^s$. It is the monetary shock $u$ that affects $y^s$. Agents in any island do
not observe the price in the remaining islands and, hence, the aggregate price level, $p$. Therefore,
they are unable to tell whether an increase in the price of the good they produce, $p_i$, is due to
a real shock, $\epsilon_i$, or to a monetary shock, $u$. In other words, they cannot disentangle a monetary
shock from a real shock. If the agents were informed about real shocks, they would of course
infer $u$, and a monetary shock would not exert any effect on the equilibrium production.
In other words, in equilibrium, the price difference is $p_i - p = c\,\epsilon_i$, which does not depend on
$u$. It is a dichotomy prediction reminiscent of the classical theory. Note, however, that $p_i - p$
is not observed, as $p$ is not observed. Instead, the producers in the $i$-th island can only make
inference on,

$$E(p_i - p \mid p_i) = \frac{c^2\sigma_\epsilon^2}{b^2\sigma_u^2 + c^2\sigma_\epsilon^2}\,(p_i - E(p)).$$

The previous term co-varies positively with the observed price, $p_i$: $\mathrm{cov}(p_i - p,\, p_i) = c^2\sigma_\epsilon^2$.
The covariance is zero precisely when the assumption is removed of imperfect knowledge regarding the real shocks, $\sigma_\epsilon^2 = 0$, in which case $\beta = 0$. In contrast, and assuming imperfect
knowledge, producers act so as to compensate for their partial lack of knowledge, and produce
to the maximum extent they can justify, on the basis of the positive statistical co-movements,
$\mathrm{cov}(p_i - p,\, p_i) > 0$. Note, if $E(m) = m_{-1}$, i.e. money supply in the previous period, then from
Eq. (9.7), the inflation rate, $p - p_{-1} = b\,u + (1 - b)\,u_{-1}$. Therefore, output and inflation are positively correlated, and generate a Phillips curve, which policy makers cannot exploit anyway,
as anticipated monetary policy, $E(m)$, is rationally factored out, and does not affect output.
This is the essence of the Lucas critique (Lucas, 1977).
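The positive co-movement of output and inflation can be checked by simulation. This is a sketch under assumptions that are stated here for the example only: expected money supply equals last period's money supply, the average price loads on the monetary surprise u_t with coefficient b = 1/(1+beta), so that inflation is b u_t + (1 - b) u_{t-1}, and aggregate output is beta b u_t. The value of `beta` is hypothetical.

```python
import random

def simulate_phillips(beta, n_periods=20_000, seed=1):
    """Simulate inflation and output when only monetary surprises move output."""
    rng = random.Random(seed)
    b = 1.0 / (1.0 + beta)                 # assumed price loading on the surprise
    u_prev = rng.gauss(0.0, 1.0)
    inflation, output = [], []
    for _ in range(n_periods):
        u = rng.gauss(0.0, 1.0)
        inflation.append(b * u + (1.0 - b) * u_prev)   # today's surprise plus catch-up on yesterday's
        output.append(beta * b * u)                    # output responds to today's surprise only
        u_prev = u
    return inflation, output, b

def correlation(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

infl, out, b = simulate_phillips(beta=0.4)
rho_hat = correlation(infl, out)
rho_theory = b / (b**2 + (1.0 - b) ** 2) ** 0.5
print(rho_hat, rho_theory)
```

The anticipated component of money growth never enters `output` in this sketch, which is why the simulated Phillips curve offers no menu for systematic policy.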
In the next sections, we analyze asset markets that work due to similar mechanisms. Why
should we ever buy some assets from those people insisting on selling them? Trading seems to
be a difficult phenomenon to explain, in a world with imperfect information. Yet trading does
occur, if imperfect information has the same nature as that of the Phelps-Lucas model. Agents
might well be imperfectly informed about the nature of, say, unusually high market orders. For
example, sell orders might arrive to the market, either because the asset is a lemon or because
the agents selling it are hit by a liquidity shock. In the models of this chapter, an equilibrium
with rational expectations exists, precisely because of this noise (liquidity, in this example).
There is a chance the sell order arrives to the market, simply because the agents selling it are
hit by a liquidity shock. Imperfectly informed agents, therefore, might be willing to buy, if it is
in their interest to do so.

9.3 Informational efficiency: roadmap


Informational efficiency is an attribute of asset markets that we gauge through equilibrium
outcomes, typically the price. It is an old and somewhat controversial topic in financial economics.
We would say asset markets are informationally efficient if prices reflect available information,
accurately and rapidly. This definition is obviously loose, but purposely so, as our
objective, now, is to illustrate how the process of narrowing it down leads to topics that have
been the object of controversy. In particular, we need to qualify the (i) type and (ii) quality of
information embedded into and revealed by the price: What type of information does the price
reveal? How accurately can the price convey information?
The first question, relating to the type of information, has been addressed in a famous contribution by Fama (1970), who considers three forms of informational efficiency. First, strong
efficiency, arising when the price reflects all private and public information. Second, semi-strong
efficiency, the situation in which the price conveys all public information. Third, weak efficiency,
arising when the price only conveys information regarding past (price) data.
At least initially, the motivation to define efficient markets (in the informational sense) was
to illustrate that markets cannot be in a state of disequilibrium. For example, if asset prices
tend to be high on Friday and to decline on Monday, a profitable trading opportunity might
seem to arise. An equilibrium, the motivation goes, is informationally efficient (weakly so, in
this example), should the average return from this opportunity be small enough to discourage
trading on it. We know this reasoning has fallacies. Even if the average gain is statistically large,
we might have no agent attempting it, due to risk-aversion or trading frictions. Therefore,
we would never know whether even a potentially large gain (in statistical terms) is a market
inefficiency or rather, say, compensation for risk, a classical joint hypothesis problem.
The theoretical literature has refined the notion of informational efficiency, by shifting its
focus to the second question formulated at the beginning of this section: How accurately can
the price convey information? The approach followed to address this question relies on models
in which rational agents obviously trade on their information, but also on the information
the price reveals in equilibrium, which depends on the agents' portfolio decisions, over a fixed
point, as anticipated in the Introduction. This fixed point can lead to asset prices that are fully
revealing, in that they reveal all private and public information. These models rely on rational
agents and, not surprisingly, predict that no money could ever be left on the table. We also
have a refinement of this fully revealing concept: we say markets with fully revealing prices
are strongly informationally efficient if the prices reveal a sufficient statistic for all the private
information. Note that strong efficiency in this theoretical sense differs from its meaning in the
empirical literature, signifying as it does, now, that a simple statistic (say, the average of the
signals dispersed across the agents' population) is enough to forecast the asset fundamentals.
As discussed in the Introduction, markets with fully revealing prices are problematic. They
lead to paradoxes. First, the Grossman (1976) paradox. If markets are strongly efficient, the
agents should abandon their own signals while formulating portfolio decisions, although in this
case the price should not contain any piece of private information, contradicting the initial
presumption that markets are strongly efficient.
Second, the Grossman and Stiglitz (1980) paradox. If markets are strongly efficient, informed
agents make losses once information is costly, and would rather become uninformed, free-riding on an informative price, although then, in this case, the price will not contain information
anymore. To resolve this paradox, Grossman and Stiglitz (1980) propose markets which they
famously describe (p. 393) to be ones in which there is "an equilibrium degree of disequilibrium."
That is, in their model, prices cannot be fully revealing, but partially revealing, meaning that the
informed agents do not give all of their information away to the uninformed. "Disequilibrium"
means prices are not fully revealing, and "equilibrium degree of disequilibrium" means that
the price informativeness depends on how many agents are informed, which is an endogenous
variable in the model. Note, now, that disequilibrium allows markets to function, a perspective
somewhat distinct from the early attempts to define efficiency in the empirical literature.
The following three sections aim to formalize these ideas. Section 9.4 shows that money is
indeed left on the table once agents take portfolio decisions while ignoring the information
content of asset prices. Then, we introduce progressively more appropriate notions of equilibrium
in which prices are fully (Section 9.5) and partially (Section 9.6) revealing.

9.4 Walrasian equilibria as informationally inefficient outcomes


We consider a two-period market in which, in the second period, a riskless asset (the numeraire)
yields a gross return equal to $R$, and a risky asset pays off $f \sim N(\mu, \sigma_f^2)$. The budget constraint
applying to every agent $i$ is $w_i = (m_0 - (\theta_i - \theta_0)p)R + \theta_i f$, where $m_0$ and $\theta_0$ denote the initial
endowment of the numeraire and the risky asset, $p$ is the price of the risky asset, $\theta_i$ is the
asset holding and, finally, $w_i$ is the agent's terminal wealth. We assume that each investor $i$
maximizes a CARA expected utility function against terminal wealth, $E(-e^{-w_i/\tau} \mid F_i)$, $\tau > 0$,
where $F_i$ denotes the information set available to agent $i$.
We derive the equilibrium in this market while assuming asymmetric information, and treat
differential information as a special case of this setting. We assume that there is a total of $N$
agents. Of these agents, $N_I$ are informed, in that they observe a signal on the asset value
equal to,
$$ s_i = f + \epsilon_i, \quad i = 1, \dots, N_I, \tag{9.10} $$
where the noise component of the signal is $\epsilon_i \sim N(0, \sigma_\epsilon^2)$, for each investor $i$.
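Let $\rho_f$ and $\rho_\epsilon$ denote the precisions of the dividend and noise distribution, i.e. $\rho_f \equiv \sigma_f^{-2}$
and $\rho_\epsilon \equiv \sigma_\epsilon^{-2}$. The demand for the risky asset formulated by any informed investor is determined
as,
$$
\begin{aligned}
\theta_i^I &= \arg\max_{\theta} E\left(-e^{-w_i/\tau} \mid s_i\right) \\
&= \arg\max_{\theta} \left[-\exp\left(-\frac{1}{\tau}E(w_i \mid s_i) + \frac{1}{2}\frac{1}{\tau^2}\sigma^2(w_i \mid s_i)\right)\right] \\
&= \tau\rho_{f|s_i}\left(E(f \mid s_i) - Rp\right),
\end{aligned}
\tag{9.11}
$$
where the second line follows because both $f$ and $s_i$ are normally distributed, and $\tau$ is
interpreted as risk-tolerance, and by the Projection Theorem,
$$ E(f \mid s_i) = \mu + \frac{\rho_\epsilon}{\rho_f + \rho_\epsilon}(s_i - \mu), \tag{9.12} $$
and $\rho_{f|s_i}$ denotes the precision of the asset payoff conditional upon having observed $s_i$,
$$ \rho_{f|s_i} = \rho_f + \rho_\epsilon, \quad \text{for all } i. \tag{9.13} $$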

There are also $N_U \equiv N - N_I$ uninformed agents, who do not observe the signal in Eq. (9.10). Their
risky asset demand, $\theta^U$ say, is therefore a special case of the informed asset demand in Eq.
(9.11), namely for $\rho_\epsilon = 0$,
$$ \theta^U = \tau\rho_f(\mu - Rp). \tag{9.14} $$
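
The demand schedules in Eqs. (9.11)-(9.14) can be illustrated numerically. The following is a minimal sketch, assuming the notation $f \sim N(\mu, 1/\rho_f)$, signal $s = f + \epsilon$ with noise precision $\rho_\epsilon$, risk tolerance $\tau$, gross riskless return $R$ and price $p$; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def posterior(mu, rho_f, rho_eps, s):
    """Projection theorem for f ~ N(mu, 1/rho_f) and s = f + eps, eps ~ N(0, 1/rho_eps).

    Returns the conditional mean and the conditional precision of f given s.
    """
    precision = rho_f + rho_eps
    mean = mu + rho_eps / precision * (s - mu)
    return mean, precision


def cara_demand(tau, mean, precision, R, p):
    """CARA-normal demand: risk tolerance x precision x expected excess payoff."""
    return tau * precision * (mean - R * p)


# Illustrative (assumed) parameter values.
mu, rho_f, rho_eps, tau, R, p = 1.0, 1.0, 1.0, 2.0, 1.05, 0.9

# Informed investor who observed the signal s = 1.4.
mean_i, prec_i = posterior(mu, rho_f, rho_eps, s=1.4)
theta_informed = cara_demand(tau, mean_i, prec_i, R, p)

# Uninformed investor: same formula with no signal (prior mean and prior precision).
theta_uninformed = cara_demand(tau, mu, rho_f, R, p)

print(theta_informed, theta_uninformed)
```

In this example the informed investor holds more of the asset than the uninformed one because the signal is good news relative to the prior and because observing it raises the precision of his estimate.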

The equilibrium price of the risky asset is found by setting supply equal to aggregate demand,
$$ N\theta_0 = \sum_{i=1}^{N_I}\theta_i^I + N_U\theta^U, \tag{9.15} $$

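and solving for $p$ after replacing Eq. (9.11) and Eq. (9.14) into Eq. (9.15) and using Eqs.
(9.12)-(9.13), yielding,
$$ p = \frac{\mu}{R} + \frac{N_I\rho_\epsilon}{R\left((N_I + N_U)\rho_f + N_I\rho_\epsilon\right)}(\bar{s} - \mu) - \frac{(N_I + N_U)\theta_0}{\tau R\left((N_I + N_U)\rho_f + N_I\rho_\epsilon\right)}, \tag{9.16} $$
where $\bar{s}$ denotes the average signal, viz
$$ \bar{s} \equiv \frac{1}{N_I}\sum_{i=1}^{N_I} s_i. \tag{9.17} $$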

The equilibrium price in Eq. (9.16) has three components. The first is the discounted (expected) payoff.
The second term reveals that the price aggregates information dispersed across the informed
investors through the average signal, $\bar{s}$. That is, $\bar{s}$ is a sufficient statistic for all of the signals
observed by each informed agent, $(s_i)_{i=1}^{N_I}$, with respect to the equilibrium price. This second
term thus adds or subtracts value according to whether the average signal, $\bar{s}$, is higher than
the unconditional guess, $\mu$. The third term is a risk-premium: the higher the average supply,
the higher the risk the agents have to bear in equilibrium, which is evaluated proportionally to
their risk aversion, $1/\tau$.
The main issue with a Walrasian equilibrium is that while the price conveys information
informed investors have, the uninformed investors do not condition upon this informative price
while formulating their asset demand in Eq. (9.14). They only use the price to determine their
budget constraint.
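Consider the following additional issue, arising when the number of agents gets large. In this
case, we have that by the Law of Large Numbers,
$$ \operatorname{plim}\, p = \frac{\mu}{R} + \frac{\lambda\rho_\epsilon}{R(\rho_f + \lambda\rho_\epsilon)}(f - \mu) - \frac{\theta_0}{\tau R(\rho_f + \lambda\rho_\epsilon)}, $$
where $\lambda \equiv \lim N_I / N$. That is, in this limit, the price perfectly reveals the asset payoff, $f$,
provided of course the proportion of informed agents is asymptotically significant, $\lambda > 0$.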
This is an arbitrage opportunity. Any rational investor who understands this market can make
large profits whenever $f \neq Rp$. He will observe the price $p$, and infer $f$. For example, if $f > Rp$,
he will borrow $p$ and use this to invest in the risky asset. In the second period, he will pay back
$Rp$ and receive the asset payoff $f$, with a sure profit equal to $f - Rp > 0$.
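
The arbitrage argument can be checked with a small simulation. The sketch below assumes the Walrasian price takes the form of Eq. (9.16) with per-capita endowment $\theta_0$, draws independent signals around a fixed payoff, verifies market clearing, and then inverts the price to recover the average signal, which is close to the payoff when many agents are informed. All names and parameter values are assumptions made for illustration.

```python
import math
import random


def walrasian_price(s_bar, n_inf, n_unf, mu, rho_f, rho_eps, tau, R, theta0):
    """Walrasian price: discounted mean + information term - risk premium."""
    n = n_inf + n_unf
    denom = n * rho_f + n_inf * rho_eps
    return (mu / R
            + n_inf * rho_eps / (R * denom) * (s_bar - mu)
            - n * theta0 / (tau * R * denom))


# Illustrative (assumed) parameter values.
mu, rho_f, rho_eps, tau, R, theta0 = 1.0, 1.0, 1.0, 1.0, 1.02, 0.5
n_inf, n_unf = 10_000, 10_000
f = 1.3  # realized payoff, unknown to the agents

rng = random.Random(0)
signals = [f + rng.gauss(0.0, rho_eps ** -0.5) for _ in range(n_inf)]
s_bar = math.fsum(signals) / n_inf
p = walrasian_price(s_bar, n_inf, n_unf, mu, rho_f, rho_eps, tau, R, theta0)

# Market clearing: informed demands use their own signal, uninformed demands use the prior only.
prec_inf = rho_f + rho_eps
demand_inf = math.fsum(
    tau * prec_inf * (mu + rho_eps / prec_inf * (s - mu) - R * p) for s in signals
)
demand_unf = n_unf * tau * rho_f * (mu - R * p)
excess_demand = demand_inf + demand_unf - (n_inf + n_unf) * theta0

# An outside observer inverts the price to back out the average signal.
lam = n_inf / (n_inf + n_unf)
k = rho_f + lam * rho_eps
s_bar_inferred = mu + (R * p - mu + theta0 / (tau * k)) * k / (lam * rho_eps)
approx_profit = s_bar_inferred - R * p  # close to f - R*p when n_inf is large

print(excess_demand, s_bar_inferred, approx_profit)
```

The inversion recovers the average signal exactly; how close it is to the payoff depends only on the number of informed agents, which is the sense in which the price becomes fully revealing in the limit.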

9.5 Rational Expectations Equilibrium


If the equilibrium price reveals information, any agent will find it convenient to update his
knowledge about the asset payoff, $f$, while conditioning on his information set, comprising both
his own signal and the equilibrium price. We term the equilibrium resulting in this market
a rational expectations equilibrium (REE, henceforth).
The term "rational expectations" is used to indicate that the agents correctly understand
that the equilibrium asset price incorporates information disseminated across the market, and
make statistical inference and take investment decisions accordingly.
We solve for the equilibrium as follows. First, we conjecture the equilibrium price does indeed
incorporate information on the average signal, $\bar{s}$, as the equilibrium price does in the Walrasian
case in (9.16),
$$ p = a + b\bar{s}, \tag{9.18} $$
for two constants $a$ and $b$. However, we conjecture that the two constants $a$ and $b$ are not the
same as those in Eq. (9.16). This is natural: if we assume the agents make inference about the
asset payoff while also conditioning on the equilibrium price, this equilibrium price should then
differ from that arising in the Walrasian case.
To determine the equilibrium, first note that the informed agents formulate a demand function
equal to
$$ \theta_i^I = \tau\rho_{f|s_i,p}\left(E(f \mid s_i, p) - Rp\right). \tag{9.19} $$
This demand schedule generalizes that in Eq. (9.11), in that the conditioning information now
contains both the signal available to the informed investor, $s_i$, and the equilibrium price $p$ in
(9.18) or, equivalently, and conjecturing that $b \neq 0$, the average signal, $\bar{s}$. However, it is easy
to check that by the projection theorem,
$$ E(f \mid s_i, \bar{s}) = E(f \mid \bar{s}) = \mu + \frac{N_I\rho_\epsilon}{\rho_f + N_I\rho_\epsilon}(\bar{s} - \mu), \tag{9.20} $$
and,
$$ \rho_{f|s_i,\bar{s}} = \rho_{f|\bar{s}} = \rho_f + N_I\rho_\epsilon. \tag{9.21} $$
That is, the average signal is a sufficient statistic for the distribution of the asset payoff. We
shall discuss the economic implications of this conclusion below, once we have solved for
the equilibrium.
Regarding the uninformed investors, their demand schedule differs from that in (9.14), in that
they will update their beliefs about the asset payoff after having learnt the price realization,
$p$ in (9.18) or, equivalently, and conjecturing that $b \neq 0$, the average signal, $\bar{s}$. But then note that
each uninformed agent will have exactly the same demand schedule as the informed in (9.19),
due to Eqs. (9.20)-(9.21), viz,
$$ \theta_j^U = \theta_i^I = \tau\rho_{f|\bar{s}}\left(E(f \mid \bar{s}) - Rp\right), \quad i = 1, \dots, N_I, \quad j = 1, \dots, N_U. $$

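We can determine the coefficients $a$ and $b$ of the equilibrium price in (9.18) by plugging the
previous demand schedules into the market clearing condition, Eq. (9.15), leaving,
$$ p = \frac{\mu}{R} + \frac{N_I\rho_\epsilon}{R(\rho_f + N_I\rho_\epsilon)}(\bar{s} - \mu) - \frac{\theta_0}{\tau R(\rho_f + N_I\rho_\epsilon)}. \tag{9.22} $$
In turn, the equilibrium asset allocation is, simply,
$$ \theta_i^I = \theta_j^U = \theta_0, \quad i = 1, \dots, N_I, \quad j = 1, \dots, N_U. $$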

The equilibrium price in (9.22) fully reveals the average information disseminated in the
market, $\bar{s}$, just as in the Walrasian case (see Eq. (9.16)), although of course this REE differs
from the Walrasian equilibrium. The REE collapses to the Walrasian equilibrium only (i) in the absence of uninformed
agents attempting to extract information from the price, $N_U = 0$, and (ii) when the informed
agents have access to the same signal, $s_i = s$ for all $i$ (in which case we would set $N_I = 1$ in
(9.22)). One implication is that absent these two conditions, arbitrage is now ruled out as the
number of agents increases as,
$$ \operatorname{plim}\, p = \frac{f}{R}. $$
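
A quick numerical check of the fully revealing equilibrium may help. The sketch below assumes the REE price is the discounted conditional mean of the payoff given the average signal, minus a risk premium proportional to the per-capita endowment $\theta_0$, and that every agent's demand is the CARA-normal demand conditional on the average signal. Function names and parameter values are hypothetical.

```python
def ree_price(s_bar, n_inf, mu, rho_f, rho_eps, tau, R, theta0):
    """Fully revealing REE price under the assumed linear-normal setup."""
    prec = rho_f + n_inf * rho_eps  # precision of f given the average signal
    return (mu / R
            + n_inf * rho_eps / (R * prec) * (s_bar - mu)
            - theta0 / (tau * R * prec))


def ree_demand(p, s_bar, n_inf, mu, rho_f, rho_eps, tau, R):
    """Demand of any agent (informed or not) who conditions on the average signal."""
    prec = rho_f + n_inf * rho_eps
    mean = mu + n_inf * rho_eps / prec * (s_bar - mu)
    return tau * prec * (mean - R * p)


# Illustrative (assumed) parameter values.
params = dict(n_inf=25, mu=1.0, rho_f=1.0, rho_eps=1.0, tau=2.0, R=1.05)
theta0 = 0.5

allocations = []
for s_bar in (0.6, 1.0, 1.7):  # bad, neutral and good average news
    p = ree_price(s_bar, theta0=theta0, **params)
    allocations.append(ree_demand(p, s_bar, **params))

print(allocations)
```

Whatever the average signal, each agent ends up holding the per-capita endowment: news moves the price, not the allocation.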

The mechanism in this model is the following. While trading, the informed agents transmit
pieces of information into the equilibrium price: they "set" the information content of the
price system, so to speak. The uninformed agents free-ride on these prices. The price now
performs two roles: one, classical, is to determine the agents' budget constraint; and a second,
less mechanical, to inform the uninformed agents about the information other investors possess.
Thus when an uninformed agent observes a low price, he will increase his demand as his budget
constraint softens. However, a low price reveals to the uninformed agents that the informed
might have received bad information about the asset quality, which decreases their demand.
The two effects offset each other exactly, leading to a price-inelastic demand, $\theta_j^U = \theta_0$,
$j = 1, \dots, N_U$. Given this, the informed agents can only be allocated their initial asset
endowment as well, $\theta_i^I = \theta_0$, $i = 1, \dots, N_I$.
This model leads to a number of puzzling predictions. First, a feature of the model pointed
out earlier, known as the Grossman paradox (Grossman, 1976). By Eqs. (9.20) and (9.21),
any informed agent gives up his knowledge about his own signal, $s_i$, relying as he does on the
average signal, $\bar{s}$. But if there is no agent using his own information while trading, we might
wonder, now, how the price ends up aggregating information in the first place.
A second implication of the model links to the equilibrium allocation. Because each investor is
allocated his own endowment in equilibrium, informed and uninformed investors make the
same profits and, hence, have the same welfare. This raises the following issue, known as the
Grossman-Stiglitz paradox (Grossman and Stiglitz, 1980): why would we be willing to purchase
private information (i.e. the signals $s_i$), if this information could be freely read from the
equilibrium price? And, precisely because there are no incentives to purchase private information
in this case, the price should then not reflect any.
We now explain how these paradoxes can be resolved by assuming the presence of noise in the
equilibrium, leading to a partially revealing price: when the uninformed agents make inference
on the private information investors have, based on the price, they can only extract part of this
information. While part of the private information is given away for free to the uninformed, uninformed investors
can still have incentives to acquire information.

9.6 Noisy Rational Expectations Equilibrium


A natural way to cope with the issues arising in a REE is to assume that the risky asset supply
is random. This assumption has an economic interpretation: small realizations of the asset
supply might be viewed as liquidity shocks. The mechanism through which this assumption
allows us to deal with the conceptual difficulties inherent in the REE is the following. In equilibrium,
the asset price embeds all of the private information and the realization of a liquidity shock.
However, investors cannot extract both upon observing the price. The price system, therefore,
conveys information, albeit in an imperfect way, which is "lucky" in a way, as Grossman and
Stiglitz (1980, p. 393) put it, for "were it to do it perfectly, an equilibrium would not exist." It
is the role of noisy information famously pointed out by Black (1986), and discussed in the
Introduction.
We consider two models. One, with asymmetric information, and another, with differential
information.
In the model with asymmetric information, the uninformed investors trade at a disadvantage
compared to the informed. An equilibrium in the market for information obtains once the welfare
of the two types of agents (informed and uninformed) is equalized controlling for the cost of
information. An equilibrium in the asymmetric information market thus involves a fixed point.
For a given number of informed agents, the price reveals some information to the uninformed
who determine the asset demand on the basis of the probabilistic distribution of the price. But
in equilibrium, prices depend on the uninformed demand. In the Grossman and Stiglitz (1980)
market reviewed in Section 9.6.1, there is a simple solution to this information transmission
problem. In equilibrium, prices are informative, to the extent that the incentives to purchase
information decrease with the number of the already informed agents: information choices are
strategic substitutes.
In the differential information market of Section 9.6.2, there are no agents with superior information. The models that analyze this market are introduced by Hellwig (1980) and Diamond
and Verrecchia (1981). They aim to study issues regarding information aggregation: how does
the market help aggregate information dispersed across investors? This question has very distant origins in economics. A competitive equilibrium leads to Pareto optimal allocations (first
welfare theorem); conversely, a given (desired) Pareto optimal allocation can be decentralized
through a dedicated re-distribution of wealth (second welfare theorem). Pushed to the extreme,
the second welfare theorem may seem to suggest that any desired market outcome could be
implemented through a centralized, socialist type planning system, as in the planning literature
following the work of Lange (1936, 1942); that is, once the planner fixes an outcome as an objective, the very same outcome could be achieved through a dedicated re-distribution of resources
that leads agents to the desired objective under laissez faire. Hayek (1945) rejects these ideas:
we cannot implement this mechanism while missing data regarding information that is local.
After all, the second welfare theorem regards Walrasian equilibria, not the equilibria we are
examining in this chapter.
The Hellwig and the Diamond and Verrecchia models formalize the process of information aggregation
in the context of financial markets. It is a complex task because, now, compared to the previous
REE markets, agents cannot extract all the information that others know from the price, and
need to condition on their own signals and forecast both signals and forecasts of others, over
a fixed point. An equilibrium in this market exists once all these forecasts are mutually and
internally consistent, and the price is only partially revealing of all the information disseminated
in the market.

9.6.1 Asymmetric information: information transmission


Grossman and Stiglitz (1980) consider a model with asymmetric information, in which investors
can acquire information on the asset payoff after agreeing to pay a fee, $c$ say. Once the investors
agree (say at time $t_0$) to pay the fee (say once payoffs are revealed), they observe a signal on the
asset payoff, and then they trade at $t_1 > t_0$ based on their private information. The uninformed
investors trade while making inference on this private information based on the equilibrium
price that they are seeing. In equilibrium, we shall see, uninformed investors can only infer part
of the information possessed by the informed.
In a first step, we study the equilibrium in the asset market for a given number of informed
investors. Due to asymmetric information, informed investors have higher welfare than the
uninformed (before paying $c$). In a second step, we determine the number of informed investors
by requiring that the welfare of the uninformed and the informed are the same once accounting
for the cost $c$. It turns out that in this model, the higher the number of informed agents, the
lower the incentives to acquire information.

9.6.1.1 Asset market equilibrium

Each informed investor agrees to pay the constant cost $c$, and observes the same signal on the
asset payoff, $s_i = s$ for all $i$ in Eq. (9.10).1 That is, acquiring information only leads
to a partial resolution of risk. We assume, and later justify, that the equilibrium price in this
market does not reveal more information than is already possessed by the informed agents.
Therefore, the asset demand of the informed agents is the same as in the Walrasian market,
Eq. (9.11), viz
$$ \theta_i^I = \tau\rho_{f|s}\left(E(f \mid s) - Rp\right), \quad i = 1, \dots, N_I, \tag{9.23} $$
with agents' conditional estimate and precision of the asset payoff being the same as in (9.12)-(9.13),
$$ E(f \mid s) = \mu + \frac{\rho_\epsilon}{\rho_f + \rho_\epsilon}(s - \mu), \qquad \rho_{f|s} = \rho_f + \rho_\epsilon. \tag{9.24} $$
We know from the previous section that the assumptions formulated so far lead to an equilibrium in which the uninformed investors free-ride on the equilibrium price, as the latter is
fully revealing. Obviously prices do reveal the information that informed investors impound
into them. The issue is to ascertain whether some of this information could not be revealed
for free to uninformed investors. Grossman and Stiglitz do indeed study an equilibrium with
partial information revelation. One of their key assumptions is that the asset supply is random,
such that, now, markets reveal both fundamental and non-fundamental sources of information,
which an uninformed investor cannot tell apart. Specifically, assume that the asset supply (in per-capita terms)
is random, and equal to2
$$ z = \theta_0 + \eta, \quad \eta \sim N(0, \sigma_z^2). \tag{9.25} $$
We interpret $\eta$ as a liquidity shock. For example, a positive and large realization of $\eta$
could be interpreted as an asset sell-off due to non-fundamental reasons (say, asset owners who
monetize their investments due to previously unexpected consumption contingencies).
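The partial revelation mechanism operates as follows. First, note that the uninformed investors now formulate their asset demand conditional upon having learnt about the equilibrium
price,
$$ \theta_j^U = \tau\rho_{f|p}\left(E(f \mid p) - Rp\right), \quad j = 1, \dots, N_U. \tag{9.26} $$
Replacing Eqs. (9.23)-(9.26) into the market clearing condition,
$$ z = \lambda\theta^I + (1 - \lambda)\theta^U, \tag{9.27} $$
where $\lambda \equiv N_I / N$ is the proportion of informed agents,
reveals that the equilibrium price, $p$, is informationally equivalent to the compound signal $y$,
defined as
$$ y \equiv (s - \mu) - \gamma(z - \theta_0), \qquad \gamma \equiv (\lambda\tau\rho_\epsilon)^{-1}. \tag{9.28} $$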

The compound signal $y$ (or, equivalently, the price) does reveal fundamental information, $s - \mu$; however, it does so imperfectly, conveying as it does additional
information regarding possible liquidity shocks, $z - \theta_0$. An uninformed investor who observes
the price, now, cannot tell how much of a price increase is due to good news (say high $s$) or
a negative liquidity shock (say low $z$). Therefore, we expect the equilibrium in this market to
be one in which informed agents have higher welfare (before paying $c$) due to their superior
information: they know $s$ and of course, they know the price $p$, and then $z$; that is, they know
everything (but $\epsilon$).

1 In their original formulation of the model, Grossman and Stiglitz assume that the asset payoff is the sum of two
normally distributed components, and that the informed investors observe one of them for a fee. The formulation of the model in this section is
equivalent. Moreover, we are assuming that the information fee is only paid once payoffs are revealed (rather than at $t_0$) to simplify
the presentation.
2 To study the model predictions as $N$ increases, we would need to make assumptions on how $\sigma_z^2$ would change with $N$; see
Footnote 3 for one alternative assumption. In this section, we take $N$ as given.

We search for a linear equilibrium, that is, an equilibrium in which the asset price is a linear
function of the compound signal, $y$ in Eq. (9.28),
$$ p = a + by, \tag{9.29} $$
for two constants $a$ and $b$ to be determined. We determine $a$ and $b$ by replacing the asset
demands of the informed and uninformed investors into the market clearing condition, Eq.
(9.27). We proceed as follows.
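First, we determine the conditional expectation, $E(f \mid p)$, and the conditional precision, $\rho_{f|p}$,
in Eq. (9.26). Note that the equilibrium price in Eq. (9.29) is affine in $y$, and $y$ is normally
distributed, such that $y = (f - \mu) + \epsilon - \gamma(z - \theta_0)$, with $\epsilon - \gamma(z - \theta_0) \sim N(0, \sigma_\epsilon^2 + \gamma^2\sigma_z^2)$. Therefore,
$$ E(f \mid p) = \mu + \frac{\phi\rho_\epsilon}{\rho_f + \phi\rho_\epsilon}\,y, \qquad \rho_{f|p} = \rho_f + \phi\rho_\epsilon, \tag{9.30} $$
where,
$$ \phi \equiv \frac{1}{1 + \gamma^2\sigma_z^2\rho_\epsilon}, \tag{9.31} $$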
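such that, with $\phi$ denoting the constant in (9.31), the uninformed asset demand in (9.26) is
$$ \theta^U = \tau\rho_{f|p}\Big(\underbrace{\mu + \frac{\phi\rho_\epsilon}{\rho_{f|p}}\,y}_{\text{info revelation}} - \underbrace{Rp}_{\text{budget constr.}}\Big). \tag{9.32} $$
Regarding the informed asset demand, we have, by Eqs. (9.23)-(9.24), that:
$$ \theta^I = \tau\rho_{f|s}(\mu - Rp) + \tau\rho_\epsilon(s - \mu). \tag{9.33} $$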

We shall develop the economic interpretation of both the uninformed and the informed asset
demand schedules below.
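Second, we plug (9.32) and (9.33) into Eq. (9.27), obtaining the equilibrium price conjectured
in (9.29), with $\phi$ denoting the constant in (9.31),
$$ p = \underbrace{\frac{1}{R}\left(\mu - \frac{\theta_0}{\tau\left(\lambda\rho_{f|s} + (1 - \lambda)\rho_{f|p}\right)}\right)}_{\equiv\, a} + \underbrace{\frac{\left(\lambda + (1 - \lambda)\phi\right)\rho_\epsilon}{R\left(\lambda\rho_{f|s} + (1 - \lambda)\rho_{f|p}\right)}}_{\equiv\, b}\; y. $$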

The model leads to a number of important predictions. First, in equilibrium, the price does
not reveal all private information to the uninformed agents, but only a noisy version of it, $y$:
we say the price is partially revealing. The model thus provides a resolution to the Grossman-Stiglitz paradox: as the next subsection will show, there exist equilibria in which it is valuable
to purchase private information.
Second, consider the uninformed demand, in Eq. (9.32). On the one hand, it decreases
with the price as a mechanical implication of the agent's budget constraint. On the other hand,
the uninformed demand increases with $p$, due to information. In the REE, the two effects exactly
offset each other, as explained in the previous section. Straightforward calculations
do, instead, reveal that in the equilibrium of this model,
$$ \frac{\partial\theta^U}{\partial p} \leq 0, $$
with an equality only holding once $\sigma_z^2 = 0$. That is, the uninformed asset demand in (9.32)
slopes down. The interpretation is simple: in the NREE, the effects of information revelation
are weaker than in the REE case, as the price is partially revealing, as explained.
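
The linear equilibrium and the sign of the uninformed demand slope can be verified numerically. The sketch below is written under stated assumptions: the compound signal is $y = (s - \mu) - \gamma(z - \theta_0)$ with $\gamma = 1/(\lambda\tau\rho_\epsilon)$, the price is $p = a + by$, and the uninformed agents' posterior precision is $\rho_f + \phi\rho_\epsilon$ with $\phi = 1/(1 + \gamma^2\sigma_z^2\rho_\epsilon)$. These functional forms, the function name and the parameter values are assumptions made for illustration.

```python
def nree_coefficients(lam, mu, rho_f, rho_eps, sigma_z2, tau, R, theta0):
    """Coefficients of a linear noisy REE under the assumed CARA-normal setup."""
    gamma = 1.0 / (lam * tau * rho_eps)            # weight of the liquidity shock in y
    phi = 1.0 / (1.0 + gamma**2 * sigma_z2 * rho_eps)
    rho_fs = rho_f + rho_eps                       # precision of f given the signal
    rho_fp = rho_f + phi * rho_eps                 # precision of f given the price
    avg_prec = lam * rho_fs + (1.0 - lam) * rho_fp
    a = (mu - theta0 / (tau * avg_prec)) / R
    b = (lam + (1.0 - lam) * phi) * rho_eps / (R * avg_prec)
    return gamma, phi, rho_fs, rho_fp, a, b


# Illustrative (assumed) parameter values.
lam, mu, rho_f, rho_eps, sigma_z2, tau, R, theta0 = 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5
gamma, phi, rho_fs, rho_fp, a, b = nree_coefficients(
    lam, mu, rho_f, rho_eps, sigma_z2, tau, R, theta0
)

# One realization of the signal and of the per-capita supply.
s, z = 1.4, 0.9
y = (s - mu) - gamma * (z - theta0)
p = a + b * y

theta_inf = tau * (rho_fs * (mu - R * p) + rho_eps * (s - mu))
theta_unf = tau * (rho_fp * (mu - R * p) + phi * rho_eps * y)
excess_demand = lam * theta_inf + (1.0 - lam) * theta_unf - z

# Slope of the uninformed demand in p, using y = (p - a) / b.
slope_unf = tau * (phi * rho_eps / b - rho_fp * R)
slope_closed_form = tau * R * rho_f * lam * (phi - 1.0) / (lam + (1.0 - lam) * phi)

print(excess_demand, slope_unf, slope_closed_form)
```

Under these assumptions the slope is proportional to $\phi - 1$, so it is negative whenever supply is noisy and vanishes only when $\phi = 1$, that is, when the price is fully revealing.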
Third, the constant $\gamma$ determines the price impact of a liquidity shock. The higher $\gamma$, the higher
the impact of a liquidity shock, $z - \theta_0$, compared to information, $s - \mu$. Its inverse, $\gamma^{-1} = \lambda\tau\rho_\epsilon$, is
a measure of the risk-bearing capacity of the informed agents vis-à-vis information. Note, indeed,
that the informed agents' demand in (9.33) has two components. The first relates to how high
the price is, compared to the unconditional guess of the asset value; it increases (in absolute
value) with the conditional precision of the value estimate given private information, $\rho_{f|s}$. The
second component relates to how good news are, compared to unconditional guesses; it increases
(in absolute value) with the precision of the noise (i.e. with the informational
advantage), reflecting a better quality of private information. Thus, and as anticipated, the
term $\gamma^{-1}$ measures the total effect of the informed agent demand with respect to information.
The higher this term, the easier it becomes for the price system to absorb a liquidity shock,
$z - \theta_0$.
Finally, consider the conditional precision of the asset payoff upon observation of the price,
i.e., $\rho_{f|p}$ in Eq. (9.30): it is a measure of price informativeness. In particular, the constant $\phi$ in
(9.31) reflects asymmetric information: it increases with $\rho_\epsilon$, i.e., it decreases with the size of the
noisy component of private information, $\sigma_\epsilon^2$. Moreover, it decreases with $\gamma$, i.e. it increases with
the risk-bearing capacity of the informed agents. In other words, price informativeness decreases
as private information becomes more noisy and increases with the risk-bearing capacity of the
informed agents. Note, finally, that because $\phi < 1$,
$$ \rho_{f|p} < \rho_{f|s}. \tag{9.34} $$
That is, prices provide less information than private signals. This property, while very intuitive,
has important welfare implications, discussed below.
9.6.1.2 The value of information

The uninformed agents do not entirely learn about the private information other investors
have acquired. Whilst observing the equilibrium price, they confuse fundamentals, $s - \mu$, with
liquidity shocks, $z - \theta_0$ and sometimes, then, trade against the informed. Uninformed investors
should therefore expect lower profits and welfare than the informed (before information costs).
In Appendix 2, we show that, with $u(w) \equiv -e^{-w/\tau}$,
$$ E\left(u(w^I)\right) = e^{c/\tau}\sqrt{\frac{\rho_{f|p}}{\rho_{f|s}}}\; E\left(u(w^U)\right), \tag{9.35} $$
where $w^I$ and $w^U$ denote the terminal wealth of the informed and the uninformed agent.
Define the ex-ante profit certainty equivalent of the informed and the uninformed agents as the
two values $C_I$ and $C_U$ that solve:
$$ -e^{-C_I/\tau} = E\left(u(w^I)\right) \quad \text{and} \quad -e^{-C_U/\tau} = E\left(u(w^U)\right). $$
We define the value of information as the net gain of becoming informed in terms of the
previous certainty equivalents, which by Eq. (9.35) is,
$$ C_I - C_U = \frac{1}{2}\tau\ln\frac{\rho_{f|s}}{\rho_{f|p}} - c. \tag{9.36} $$
Because prices provide less information than private signals, $\rho_{f|p} < \rho_{f|s}$ (see Eq. (9.34)),
the informed agent is always better off compared to the uninformed, before accounting for the
information cost $c$. Moreover, note that the conditional precision of the asset value estimate
made by the informed agents, $\rho_{f|s}$, is independent of the proportion of informed agents, $\lambda$. Instead,
that of the uninformed, $\rho_{f|p}$, is increasing in $\lambda$, as discussed in the previous section. Therefore,
the value of information is strictly decreasing in $\lambda$: the higher the number of informed agents,
the less valuable it is to acquire private information. An interior equilibrium in the market for
information is given by the proportion of informed agents $\lambda^*$ such that $C_I - C_U = 0$. Figure 9.1
depicts the value of information as a function of the proportion of informed agents for given
constellations of parameter values.

[Figure 9.1: two panels (left and right). Horizontal axis: lambda, from 0.1 to 1.0. Vertical axis: value of information, ranging from about -0.06 to 0.14.]

FIGURE 9.1. This figure depicts the value of information, $C_I - C_U$ in (9.36), when the
cost of information is $c = 0.20$, and parameters are set equal to $\tau = \rho_f = \rho_\epsilon = \rho_z = 1$, with $\rho_z \equiv \sigma_z^{-2}$,
which is the benchmark solid line in both panels. The left and right panels compare the
benchmark with the value of information arising when the size of liquidity shocks increases
so as to make $\rho_z = 3/4$ (left panel, dashed line), and when the precision of information
acquired by the informed agent decreases such that $\rho_\epsilon = 3/4$ (right panel, dashed line).

In all cases depicted in Figure 9.1, there exist interior equilibria. For example, when the cost
of information is $c = 0.20$ and all remaining parameters are set equal to one (the benchmark),
the equilibrium obtains with $\lambda^* = 71\%$. This equilibrium is stable in the following sense. When
the proportion of informed agents is less than this equilibrium, $\lambda < \lambda^*$, there are incentives to
become informed ($C_I - C_U > 0$), such that more agents will become informed until $\lambda^*$ is reached.
The opposite occurs when $\lambda > \lambda^*$. There might be non-interior equilibria. For example, the cost
of information could be so prohibitive, to make $C_I - C_U < 0$ in (9.36) for all $\lambda$ and in this case
the equilibrium would be $\lambda^* = 0$. In general, non-interior equilibria are defined as $\lambda^* = 0$ if
$C_I - C_U < 0$ for all $\lambda$, and $\lambda^* = 1$ if $C_I - C_U > 0$ for all $\lambda$.
The comparative statics in Figure 9.1 are easy to explain. In the left panel, we compare
the benchmark with a market in which liquidity shocks have higher variability than in the benchmark. The value
of information is higher than in the benchmark as prices reveal less information than in the
benchmark, and the equilibrium proportion of informed agents increases. In the right panel, the
benchmark is compared with a market in which the quality of private information deteriorates,
as captured by a lower value of $\rho_\epsilon$. The value of information in this market now deteriorates
and the equilibrium proportion of informed agents decreases as a result.
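
The equilibrium in the market for information can be computed by solving for the root of the value of information. The following is a minimal sketch, assuming the value of information is $\frac{1}{2}\tau\ln(\rho_{f|s}/\rho_{f|p}) - c$, with $\rho_{f|s} = \rho_f + \rho_\epsilon$, $\rho_{f|p} = \rho_f + \phi\rho_\epsilon$, $\phi = 1/(1 + \gamma^2\rho_\epsilon/\rho_z)$ and $\gamma = 1/(\lambda\tau\rho_\epsilon)$. These functional forms, the function names and the bisection tolerance are assumptions made for illustration.

```python
import math


def value_of_information(lam, c=0.20, tau=1.0, rho_f=1.0, rho_eps=1.0, rho_z=1.0):
    """Net gain from becoming informed when a share lam of agents is informed."""
    gamma = 1.0 / (lam * tau * rho_eps)
    phi = 1.0 / (1.0 + gamma**2 * rho_eps / rho_z)
    rho_f_given_s = rho_f + rho_eps
    rho_f_given_p = rho_f + phi * rho_eps
    return 0.5 * tau * math.log(rho_f_given_s / rho_f_given_p) - c


def equilibrium_share(tol=1e-12, **params):
    """Share of informed agents at which the value of information is zero.

    Returns 0.0 or 1.0 when no interior root exists (corner equilibria).
    """
    lo, hi = 1e-9, 1.0
    if value_of_information(lo, **params) <= 0.0:
        return 0.0
    if value_of_information(hi, **params) >= 0.0:
        return 1.0
    # The value of information decreases in lam, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value_of_information(mid, **params) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_benchmark = equilibrium_share()                  # all parameters equal to one
lam_noisier_supply = equilibrium_share(rho_z=0.75)   # larger liquidity shocks
lam_noisier_signal = equilibrium_share(rho_eps=0.75) # less precise private signal

print(lam_benchmark, lam_noisier_supply, lam_noisier_signal)
```

Under these assumed functional forms the benchmark root is roughly 0.72, close to the interior equilibrium reported for Figure 9.1, and the two comparative statics move the equilibrium share in the directions described above.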
9.6.1.3 Information sales

Review the literature on information sales.


[In progress]
9.6.2 Differential information: information aggregation
Hellwig (1980) and Diamond and Verrecchia (1981) consider models in which, unlike in the
previous section, there are no investors with superior information. The key feature of these
models is that in equilibrium, investors learn about the asset fundamentals from both the price
and the signals they observe. The reasons this is possible are similar to those explained in the
asymmetric information market of the previous section: due to the asset being in random supply,
the equilibrium price is partially revealing, such that the signals available to each investor are
useful while formulating the inference process leading to asset demands.3
In detail, we assume that each investor observes a signal on the asset fundamental (thus,
$N_U = 0$ and $N_I = N$), which is the same as in Eq. (9.10), and that the total asset supply, $z$,
is as in Eq. (9.25). The asset demand of each agent is
$$ \theta_i = \tau\rho_{f|s_i,p}\left(E(f \mid s_i, p) - Rp\right), \quad i = 1, \dots, N, \tag{9.37} $$
where, as usual, $p$ denotes the equilibrium price, and where the conditional precision of the
payoff estimate is assumed to be the same for each agent, $\rho_{f|s_i,p} \equiv \rho_{f|s,p}$ for all $i$. This conjecture on
the conditional precision will be confirmed below.
We shall formulate conjectures regarding $p$ below. For now, we assume that the price is a
random variable that helps each investor forecast the asset payoff, $f$. The rationale behind this
conjecture is that in equilibrium,
X
=

(9.38)
=1

such that the clearing price aggregates the signals all investors observe, which are potentially
useful in each investor's inference problem.
Furthermore, we conjecture that the conditional expectation,

+ (
|
= +
)
(9.39)

3 The two models are similar except that Diamond and Verrecchia consider endowment shocks as the channel leading to partially
revealing prices. The inference process of the agents in Diamond and Verrecchia is also more elaborate than in Hellwig, as each
agent's information set includes the price, the signal on the dividend, and the endowment shock. In Hellwig, there are no endowment
shocks as explained.

for two constants. If the price were exogenous, these two constants could be readily determined relying on the projection theorem. However, the price is endogenous, and these coefficients
could well depend on how the equilibrium price aggregates all available information. In other
words, due to the endogeneity of the price, we need to check whether the projection theorem
representation in (9.39) holds and whether it is unique.
There is a natural conjecture to make regarding the equilibrium price, which we shall justify
in a moment, namely that $p$ is an affine function of the average signal and the per-capita supply,

=

(
(9.40)
0)

where $\bar{s}$ denotes the average signal in Eq. (9.17). The rationale underlying this conjecture is
the following. Replacing Eq. (9.39) into the asset demand in (9.37), and then
plugging the result into the equilibrium condition (9.38), yields, after rearranging terms, that,
+

{z

| {z }

{z

0)

(9.41)

where the identities under the brackets follow by the conjecture made in (9.40).
The rst restriction in (9.41) delivers the value of in (9.40),

(9.42)

0
|

Instead, the coe cients and


tation of Eq. (9.39),
and ,

depend on the coe cient updates in the conditional expec-

=
|

(9.43)

,
{ } depend on ,
{ }, over a a xed point.
where the very same
Intuitively, the price coefficients depend on the agents' coefficient updates, as the
agents' conditional expectation of the dividend contributes to the equilibrium
price, through the equilibrium condition (see Eq. (9.38)). In turn, the agents' coefficient updates
depend on the statistical distribution of the equilibrium price (and, hence, on the price coefficients), as the latter
is a conditioning variable for the agents' expectations of the dividends. Appendix 2 shows that
there is a unique solution to this fixed point, and that the price in Eq. (9.40) can be expressed
as

+ 2
1 1

(9.44)
=
1 1
+1
+ 2( + )
where the compound signal is defined as,

0)

(9.45)

and the constant entering it is strictly positive and bounded, and is defined in the Appendix (see Eq. (9A.18)). The fact that this constant is strictly positive implies that the price only partially reveals the average signal. Similarly to the corresponding constant in the Grossman and Stiglitz (1980) model (see Eq. (9.28)), the constant in this model determines the price impact of a liquidity shock. Below, we shall see a striking similarity between these two constants in the limiting case of a large number of agents.

The compound signal in Eq. (9.45) is the differential information counterpart to that in (9.28). That is, in the differential information market of this section, the price is partially revealing, as it conveys noisy information about the asset fundamentals. The equilibrium in this market then addresses a number of issues that the REE cannot. First, Eqs. (9.44) and (9.45) reveal that the price coefficients are both positive. Moreover, the coefficient updates in the conditional expectation of Eq. (9.39) are both positive, provided that the conditional precision is positive,$^{4}$ a condition that can be shown to hold. That is, every agent now conditions on both his own signal and the price while formulating his demand function, a resolution to the Grossman (1976) paradox. The coefficient on the signal is positive because the agents' signals are obviously positively correlated with the dividend; and the coefficient on the price is also positive, because the equilibrium price is positively correlated with the asset dividend.
The solution in Eq. (9.44) is not known in closed form because the constant it depends on is not. Note that this constant determines the price sensitivity to the average signal and liquidity shocks, and also the conditional precision in (9.42). However, a solution is available in the limiting case in which the number of agents is large. This case provides intuition regarding the finite case.
So suppose that the number of agents is large, such that, by the Law of Large Numbers, the price in (9.44) satisfies$^{5}$

1
(1 + 2
)
plim =
+
(9.46)

0
2
|
( + (1 +
) )
|
{z
}
|
{z
}

where

and

is the compound signal,

0)

denotes the limiting conditional precision,

= lim

=(

)+

(9.47)

The price now reveals better information about the asset fundamentals than in the finite agents case, as the signal embeds more precise information about the asset payoff than in (9.45). Naturally, each agent still finds it useful to condition his demand on both his own signal and the equilibrium price. Moreover, the price is partially revealing in this limiting case too. Note, now, that the limiting coefficient is the same as in a special case of the Grossman and Stiglitz (1980) model (see Eq. (9.28)). In other words, in this market, liquidity shocks are absorbed through a mechanism quite comparable to that in the asymmetric information market.
Finally, the conditional precision in (9.47) is the sum of the precision of the payoff given each agent's signal, plus a second component. This second component is increasing in risk tolerance: the agents' demand becomes more aggressive as risk tolerance increases, leading to a more informative equilibrium price. The second component is also increasing in both the precision of the liquidity shocks and the precision of the signals. The effects regarding an increase in the former are explained by the price being more responsive to fundamentals than noise as the variance of the liquidity shocks decreases. The effects regarding an increase in the latter link to each agent providing more information over equilibrium aggregation while trading more aggressively on less noisy signals.
4 Assume momentarily that
0. Because
0, the second of Eqs. (9.43) implies that
0,
0. The fact that
0 follows by Eq. (9A.12) in Appendix 2.
(9.43), and
5 By

Eq. (9.25), we have that

1
2

from zero. For example, this case arises when the shock
, where
0 2 .
=1

. We assume that the limit lim

. Hence, by the rst of Eqs.


1
2

is bounded and bounded away

in (9.25) is the sum of the endowment shocks a ecting each agent, say


9.7 Dealers markets: Introduction


Asset prices fluctuate even in the absence of important public news about the fundamentals, exhibiting a quite pronounced volatility even at frequencies as high as intraday. Information could then just be a market driver. Indeed, the previous sections emphasize that asset prices can incorporate private information conveyed by investors, albeit in a partially revealing fashion. Intuitively, changes in private information can be incorporated into asset prices more or less quickly, but once prices are partially revealing, they can fluctuate even while asset demand is driven by reasons unrelated to fundamentals.
This and the following section formalize the previous reasoning and build on the insights of the previous section to develop information-based explanations of asset price fluctuations. While the models in the previous sections provide the necessary foundations to the role information plays in asset markets, the organization of financial markets and the very trading systems have institutional details that go beyond those underlying the previous models. These details include the timing of the market clearing mechanism ("when"), the geography ("where") and modalities ("how").
The timing of the clearing mechanism can be best illustrated by two main organization protocols: (i) call auctions (batch markets), whereby trading occurs over prespecified windows of time, and (ii) continuous auctions, whereby trading occurs continuously. The geography of markets is varied. There are centralized markets such as those for products listed on an exchange. There are decentralized markets (e.g., OTC markets). There are dealership markets, in which end-users trade through a number of dealers; dealership markets are therefore decentralized markets: a hypothetical end-user who is not satisfied with the quotes a dealer provides him with can search for better opportunities offered by another dealer. Inter-dealer markets are markets in which only dealers operate, adjusting their desired balances from those carried over while trading with end-users. While dealers typically operate in OTC markets, they can also operate on exchanges like the LSE. Finally, modalities of the market clearing mechanism, or order submissions, can broadly take on two forms: (i) market orders or (ii) limit orders. These two forms may well affect market conditions. For example, market orders (i.e. those most likely to be initiated by informed/active investors) consume limit orders, and decrease market liquidity, while limit orders (most likely to be initiated by passive investors) improve market liquidity.
We analyze the role of information in the price formation process while taking into account
some of the previous market microstructure details. In this and in the following section, we
study markets in which there are traders submitting market orders and dealers who operate as
market makers and set prices. Market makers can be competitive or not. This section displays
an important example of a market outcome in a setting where market makers are competitive,
a famous model developed by Glosten and Milgrom (1985). The model is a cornerstone in the
theory of market microstructure, and helped develop empirical measures of liquidity. We hinge
upon this model to illustrate how information matters in dealers markets, and how the model
could be enriched (in the next section).
9.7.1 Markets with symmetric information
Suppose that market makers are perfectly competitive and risk-neutral, and that they receive market orders from potentially informed traders. The latter are potentially informed, in that the market makers do not know whether the orders they receive emanate from an informed or a liquidity trader, an assumption that parallels those leading to partial revelation in the NREE models of the previous section. As usual, informed traders are those who observe a signal on the asset payoff, $f$. Denote with $\mathcal{F}_t$ the information set that is available to market makers at time $t$, and assume buy or sell orders arrive sequentially to them. Clearly, $\mathcal{F}_t = \mathcal{F}_{t-1} \cup o_t$, where $o_t$ can be either a buy or a sell order, i.e. $o_t \in \{\mathrm{buy}_t, \mathrm{sell}_t\}$. For simplicity, we assume interest rates are zero.
Suppose, now, that $\mathcal{F}_t$ is the same as that available to all traders: that is, there are no informed traders. Then, if a dealer receives a buy order, and is risk-neutral, he will set the ask price at $t$, $a_t$ say, such that,
$$
0 = E(a_t - f \mid \mathcal{F}_{t-1},\, o_t = \mathrm{buy}_t) = E(a_t - f \mid \mathcal{F}_{t-1}),
\tag{9.48}
$$
where the second equality follows because the buy order is totally uninformative, that is, it has no predictive power on the payoff $f$.
Likewise, if a dealer receives a sell order, he will set the bid price at $t$, $b_t$ say, such that,
$$
0 = E(f - b_t \mid \mathcal{F}_{t-1},\, o_t = \mathrm{sell}_t) = E(f - b_t \mid \mathcal{F}_{t-1}).
\tag{9.49}
$$
The expected profits in (9.48) and (9.49) are set equal to zero, due to the assumption that dealers are perfectly competitive, and the second equalities in both (9.48) and (9.49) follow by the assumption that trades arrive to dealers while not carrying any information about $f$. Therefore, the bid-ask spread is zero for each dealer, and the asset price collapses to
$$
p_t = E(f \mid \mathcal{F}_{t-1}).
$$
In the absence of public news about $f$, the price $p_t$ will not change. This is counterfactual, as explained. Glosten and Milgrom (1985) show how the bid-ask spread can be positive in a model in which dealers face informed investors. We present a simplified version of their model next.
9.7.2 With asymmetric information
Suppose the asset payoff $f$ has a binary distribution: it is high, $\bar f$, with probability $\pi$, or low, $\underline f$, with complementary probability. Denote with $\pi_t$ the probability that $f = \bar f$ conditionally upon the entire history of orders seen by the market makers up to $t$,
$$
\pi_t = \Pr(f = \bar f \mid \mathcal{F}_t).
$$
Trades arrive sequentially to the market makers. The latter believe there is (i) a probability $\alpha$ that the order at $t$ is information driven; (ii) a probability $1-\alpha$ that the order at $t$ is liquidity driven, and in this case, a 50% chance that the liquidity motivated order is a buy or a sell. Note that the order, now, might contain private information regarding the asset payoff $f$. While market makers are not sure about whether the order is informative, they post order-contingent rules that reflect updated information: a bid price if the order they receive is a sell, and an ask price if the order is a buy.
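For example, and regarding a sell order, competition leads a representative market maker to post the following bid price
$$
b_t = E(f \mid \mathcal{F}_{t-1},\, o_t = \mathrm{sell}_t) = \pi_t(\mathrm{sell}_t)\,\bar f + \left(1 - \pi_t(\mathrm{sell}_t)\right)\underline f,
\tag{9.50}
$$
where by Bayes' law,
$$
\pi_t(\mathrm{sell}_t) \equiv \Pr(f = \bar f \mid \mathcal{F}_{t-1},\, o_t = \mathrm{sell}_t) = \frac{\Pr(o_t = \mathrm{sell}_t \mid f = \bar f,\, \mathcal{F}_{t-1})\,\pi_{t-1}}{\Pr(o_t = \mathrm{sell}_t \mid \mathcal{F}_{t-1})}.
\tag{9.51}
$$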

Once again, note that the market maker posts bid and ask prices before knowing whether he will receive a sell or buy order. As explained, his is a posting rule and actually a "regret-free" rule: once the order is filled, market makers will not regret having executed at the prices they had previously posted given the information of the time. This rule is, of course, entirely consistent with market practice.
To determine the probabilities of the ratio in (9.51), note that conditionally on $f = \bar f$, informed traders would never sell, such that the probability of receiving a sell order in this case is simply the probability that the order is liquidity motivated, $1-\alpha$, times the probability that a liquidity trader sells, $\frac{1}{2}$,
$$
\Pr(o_t = \mathrm{sell}_t \mid f = \bar f,\, \mathcal{F}_{t-1}) = \frac{1}{2}(1-\alpha).
\tag{9.52}
$$
Furthermore,
$$
\Pr(o_t = \mathrm{sell}_t \mid \mathcal{F}_{t-1}) = \frac{1}{2}(1-\alpha)\,\pi_{t-1} + (1-\pi_{t-1})\left(\alpha + \frac{1}{2}(1-\alpha)\right).
\tag{9.53}
$$
The first term on the RHS of the previous equality arises because conditionally upon $f = \bar f$ (which has $\pi_{t-1}$ likelihood given $\mathcal{F}_{t-1}$), the only sell trades can be liquidity motivated, which occurs with probability $\frac{1}{2}(1-\alpha)$, as explained while deriving Eq. (9.52). The second term on the RHS of (9.53) is due to the fact that conditionally on $f = \underline f$ (which has $1-\pi_{t-1}$ likelihood given $\mathcal{F}_{t-1}$), the probability of receiving a sell order is the sum of (i) the probability of an informed trade, $\alpha$, plus (ii) the probability of a liquidity motivated sell order, $\frac{1}{2}(1-\alpha)$.
Replacing Eqs. (9.52) and (9.53) into Eq. (9.51) leaves
$$
\pi_t(\mathrm{sell}_t) = \frac{\frac{1}{2}(1-\alpha)\,\pi_{t-1}}{\frac{1}{2}(1-\alpha) + \alpha(1-\pi_{t-1})}.
\tag{9.54}
$$

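An analogous reasoning leads to the following expression for the ask price posted by a representative market maker
$$
a_t = E(f \mid \mathcal{F}_{t-1},\, o_t = \mathrm{buy}_t) = \pi_t(\mathrm{buy}_t)\,\bar f + \left(1 - \pi_t(\mathrm{buy}_t)\right)\underline f,
\tag{9.55}
$$
where
$$
\pi_t(\mathrm{buy}_t) \equiv \Pr(f = \bar f \mid \mathcal{F}_{t-1},\, o_t = \mathrm{buy}_t) = \frac{\Pr(o_t = \mathrm{buy}_t \mid f = \bar f,\, \mathcal{F}_{t-1})\,\pi_{t-1}}{\Pr(o_t = \mathrm{buy}_t \mid \mathcal{F}_{t-1})} = \frac{\frac{1}{2}(1+\alpha)\,\pi_{t-1}}{\frac{1}{2}(1-\alpha) + \alpha\,\pi_{t-1}}.
\tag{9.56}
$$
It is immediate to check that Eqs. (9.54) and (9.56) imply that
$$
\pi_t(\mathrm{sell}_t) < \pi_{t-1} < \pi_t(\mathrm{buy}_t).
$$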

That is, the order flow is informative. Moreover, the model predicts the price is asymptotically strongly efficient, in that $\pi_t \to 0$ if $f = \underline f$ and $\pi_t \to 1$ if $f = \bar f$ (see Eqs. (9.60) below). Intuitively, chances are high the asset payoff is low, after a large number of sell orders arriving to the market makers' trading desk.
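The quote-setting rule in Eqs. (9.50)-(9.56) can be sketched numerically. The following is a minimal Python illustration, not part of the original exposition: the function names, the symbol `alpha` for the probability of an information-driven order, `pi_prev` for the market makers' prior probability that the payoff is high, and the payoff values are all assumptions made for the example.

```python
def posterior_after_sell(pi_prev, alpha):
    """Posterior probability of a high payoff after a sell order (Eq. (9.54)).

    Assumes 0 < pi_prev < 1 and 0 <= alpha < 1 so the denominator is positive.
    """
    num = 0.5 * (1.0 - alpha) * pi_prev
    den = 0.5 * (1.0 - alpha) + alpha * (1.0 - pi_prev)
    return num / den


def posterior_after_buy(pi_prev, alpha):
    """Posterior probability of a high payoff after a buy order (Eq. (9.56))."""
    num = 0.5 * (1.0 + alpha) * pi_prev
    den = 0.5 * (1.0 - alpha) + alpha * pi_prev
    return num / den


def quotes(pi_prev, alpha, f_high, f_low):
    """Return the (bid, ask) pair posted before the order type is known."""
    pi_sell = posterior_after_sell(pi_prev, alpha)
    pi_buy = posterior_after_buy(pi_prev, alpha)
    bid = pi_sell * f_high + (1.0 - pi_sell) * f_low   # Eq. (9.50)
    ask = pi_buy * f_high + (1.0 - pi_buy) * f_low     # Eq. (9.55)
    return bid, ask


if __name__ == "__main__":
    bid, ask = quotes(pi_prev=0.5, alpha=0.2, f_high=110.0, f_low=90.0)
    print(f"bid = {bid:.2f}, ask = {ask:.2f}")
```

With a prior of 0.5, `alpha = 0.2` and hypothetical payoffs of 110 and 90, the sketch gives a bid of 98 and an ask of 102 around a fair value of 100; setting `alpha = 0` makes both quotes collapse to the fair value, as in the symmetric information case.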
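The model implies adverse selection costs are borne by liquidity motivated orders. Consider, indeed, the fair value of the asset, i.e. that arising in the absence of any information frictions (i.e. prior to possibly informed trades arriving at $t$), defined as,
$$
\mu_{t-1} \equiv E(f \mid \mathcal{F}_{t-1}) = \pi_{t-1}\,\bar f + (1-\pi_{t-1})\,\underline f.
\tag{9.57}
$$
It is the price we would observe absent any adverse selection. We have, accordingly, and using Eq. (9.50), Eq. (9.54) and Eq. (9.57),
$$
b_t = \mu_{t-1} - s_{b,t},
\tag{9.58}
$$
where,
$$
s_{b,t} \equiv \left(\pi_{t-1} - \pi_t(\mathrm{sell}_t)\right)(\bar f - \underline f) = \frac{\alpha\,\pi_{t-1}(1-\pi_{t-1})}{\frac{1}{2}(1-\alpha) + \alpha(1-\pi_{t-1})}\,(\bar f - \underline f).
$$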
The bid price is less than the fair value because once a market maker receives a sell order, he will be unsure about whether the sell order is liquidity motivated: it could be information motivated. This correlation between the order flow and the asset value leads the market maker to value the asset less than he would based on his information set prior to the received order. The spread, $s_{b,t}$ (posted at $t-1$ in anticipation of a sell order possibly occurring at $t$) represents the updated beliefs of the market maker after receiving a sell order. Adverse selection costs are borne by liquidity traders because those affected by a liquidity shock will sell the asset at a price lower than the fundamental value according to the previous information set not including their own order, $\mu_{t-1}$.
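Likewise, and using Eq. (9.55), Eq. (9.56) and Eq. (9.57),
$$
a_t = \mu_{t-1} + s_{a,t},
\tag{9.59}
$$
where,
$$
s_{a,t} \equiv \left(\pi_t(\mathrm{buy}_t) - \pi_{t-1}\right)(\bar f - \underline f) = \frac{\alpha\,\pi_{t-1}(1-\pi_{t-1})}{\frac{1}{2}(1-\alpha) + \alpha\,\pi_{t-1}}\,(\bar f - \underline f).
$$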
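Now, those who want to buy an asset for liquidity reasons and, accordingly, place a buy order, will buy at a price higher than the fair value $\mu_{t-1}$, bearing an adverse selection cost.
To summarize, Eq. (9.57), Eq. (9.58) and Eq. (9.59) form a stochastic difference system, where the conditional probability the asset is good, $\pi_t$, is,
$$
\pi_t = \begin{cases} \pi_t(\mathrm{sell}_t) \text{ as in Eq. (9.54)}, & \text{if } o_t = \mathrm{sell}_t \\ \pi_t(\mathrm{buy}_t) \text{ as in Eq. (9.56)}, & \text{if } o_t = \mathrm{buy}_t \end{cases}
\tag{9.60}
$$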
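Note, also, that the two spreads, $s_{a,t}$ and $s_{b,t}$, are generally not the same. In particular, we have,
$$
s_{a,t} < s_{b,t} \iff \pi_{t-1} > \frac{1}{2}.
$$
That is, if the conditional probability $\pi_{t-1}$ that the asset payoff is high is increasing, because for example there are more and more buy orders, adverse selection on the ask side will be less and less severe as informed traders are giving themselves away. As anticipated, $\pi_t \to 1$ if the asset payoff is really high.
Finally, the bid-ask spread predicted by the model is,
$$
a_t - b_t = s_{a,t} + s_{b,t} = \alpha\,\pi_{t-1}(1-\pi_{t-1})(\bar f - \underline f)\left[\frac{1}{\frac{1}{2}(1-\alpha) + \alpha\,\pi_{t-1}} + \frac{1}{\frac{1}{2}(1-\alpha) + \alpha(1-\pi_{t-1})}\right].
$$
It is increasing in the volatility of the asset payoff (the conditional variance of the payoff being $\pi_{t-1}(1-\pi_{t-1})(\bar f - \underline f)^2$), consistent with empirical evidence, being zero only when $\alpha = 0$ (no asymmetric information) or asymptotically, when either $\pi_{t-1} \to 0$ or $\pi_{t-1} \to 1$.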
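The belief dynamics in Eq. (9.60) and the behavior of the two half-spreads can be illustrated with a small simulation. This is only a sketch under stated assumptions: `alpha` denotes the probability of an information-driven order, informed traders always buy when the payoff is high and sell when it is low, and the function names, seed and parameter values are hypothetical choices for the example.

```python
import random


def update(pi_prev, alpha, order):
    """Bayes update of Pr(payoff is high) after a 'buy' or 'sell' order."""
    if order == "buy":
        return 0.5 * (1 + alpha) * pi_prev / (0.5 * (1 - alpha) + alpha * pi_prev)
    return 0.5 * (1 - alpha) * pi_prev / (0.5 * (1 - alpha) + alpha * (1 - pi_prev))


def half_spreads(pi_prev, alpha, f_high, f_low):
    """Return (bid-side, ask-side) half-spreads around the fair value."""
    common = alpha * pi_prev * (1 - pi_prev) * (f_high - f_low)
    s_bid = common / (0.5 * (1 - alpha) + alpha * (1 - pi_prev))
    s_ask = common / (0.5 * (1 - alpha) + alpha * pi_prev)
    return s_bid, s_ask


def simulate_beliefs(pi0, alpha, payoff_is_high, n_orders, seed=0):
    """Simulate the market makers' beliefs along a random sequence of orders."""
    rng = random.Random(seed)
    pi = pi0
    path = [pi]
    for _ in range(n_orders):
        if rng.random() < alpha:
            # Informed order: trades in the direction of the true payoff.
            order = "buy" if payoff_is_high else "sell"
        else:
            # Liquidity order: buy or sell with equal probability.
            order = "buy" if rng.random() < 0.5 else "sell"
        pi = update(pi, alpha, order)
        path.append(pi)
    return path


if __name__ == "__main__":
    path = simulate_beliefs(pi0=0.5, alpha=0.3, payoff_is_high=True, n_orders=500)
    print("final belief that the payoff is high:", path[-1])
    print("half-spreads at pi = 0.7:", half_spreads(0.7, 0.3, 110.0, 90.0))
```

In such runs the belief drifts towards one when the payoff is high and towards zero when it is low, and the ask-side half-spread is the smaller of the two whenever the prior exceeds one half.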
The insights of the model are very important although the model has limitations. First, the only source of the bid-ask spread relies on information asymmetries and the inherent adverse selection: the market maker may receive buy orders by investors who know the asset payoff is good, or sell orders by investors who know the asset is a lemon. That is, the order flow correlates with the asset payoff, which leads to price impacts of trades not related to fundamentals, namely the liquidity shocks; that is, whenever a liquidity trader trades, he will move the price above or below the fundamentals because the market maker anticipates that the order he observes could be information driven.$^{6}$ Furthermore, two assumptions of the model are that informed traders do not act strategically, and that the order size is fixed. The next section analyzes markets in which traders understand their trades can have a price impact and can, accordingly, optimize on their order size and also distribute their order sizes over time optimally.

9.8 Markets with strategic players


Investors with superior information do affect asset prices in all the markets analyzed so far. The question arises as to how such investors trade while making the best use of their information. In the markets of the previous sections, agents do not understand they have a price impact. This section removes this assumption and considers dealer markets in which investors with superior information make a strategic use of their information while realizing they have a price impact. The models we study make sharp predictions regarding how liquidity conditions are affected by the extent of information asymmetry between traders and dealers.
How does an informed trader trade while realizing his trade has a price impact? On the one
hand, an investor with superior information (an insider trader, say) should trade aggressively
on his information advantage against the market makers. On the other hand, prices likely move
against the insider trader while he trades aggressively: they increase when the insider buys and
decrease when he sells. So a large trade is not optimal because prices could move adversely, but
then, a small trade could just be under-utilized information.
In the celebrated Kyle (1985) model, there is a trading size that optimizes the previous trade-off given the market makers' pricing rule. Regarding this pricing rule, we assume that the market makers fully understand that the order flow they observe comprises both informed trading and noise. But similarly as in the Glosten and Milgrom (1985) model, they cannot disentangle the two components. Therefore, liquidity traders bear an adverse selection cost: the order flow has a price impact, such that the liquidity traders will buy (sell) at a higher (lower) price than the fundamental value, controlling for the informed trading. A price impact actually exists because the market makers understand that there might be informed traders.
6 Section 9.9 provides additional explanations of bid-ask spreads, based on search frictions.

Kyle (1985) solves for the equilibrium of this game and shows that the price impact, and hence, liquidity conditions, tightly link to the extent of the information asymmetry between traders and dealers. In Sections 9.8.1 and 9.8.2, we analyze the static, baseline version of his model as well as a few variants of it. In Section 9.8.3, we analyze and discuss dynamic extensions of the baseline model, explaining the model's implications in terms of trading patterns and general market behavior.
9.8.1 The Kyle baseline model
Kyle (1985) considers a market in which prices are determined in a sequential equilibrium, similarly as in the Glosten and Milgrom (1985) market. An insider trader knows the value of the asset payoff, $f$, and submits his order, $x$. The market maker (many of them, actually) observes the aggregate order flow, defined as the sum of the insider trade and a liquidity shock, $z$, i.e. $y \equiv x + z$, where $f \sim N(\mu, \sigma_f^2)$ and $z \sim N(0, \sigma_z^2)$, and fundamentals ($f$) and noise ($z$) are independent.
Given the order flow they observe, and perfect competition, the representative market maker sets the asset price according to semi-strong informational efficiency,
$$
p = E(f \mid y).
\tag{9.61}
$$

We conjecture the price is linear in the aggregate order flow, $y$,
$$
p = \mu + \lambda y,
\tag{9.62}
$$
where the coefficient, $\lambda$, known as "Kyle's lambda", measures the price impact of the order flow. In other words, its inverse, $1/\lambda$, is a measure of market depth, i.e. the order flow that is needed to induce prices to change by one dollar. So because the order flow obviously contains liquidity shocks, the higher $\lambda$, the less liquid the market is. We shall determine the value of $\lambda$, below, as part of the equilibrium of the game between the informed trader and the market maker.
The insider trader chooses his order size to maximize his profits while anticipating he has price impact,
$$
\max_x E\left((f - p)\,x \mid f\right) = \max_x\,(f - \mu - \lambda x)\,x,
$$
with the price being as in Eq. (9.62). The first order conditions of the previous program lead to the following demand schedule,
$$
x = \beta\,(f - \mu), \qquad \beta \equiv \frac{1}{2\lambda}.
\tag{9.63}
$$
That is, the insider trade is proportional to his informational advantage, $f - \mu$, which is the difference between his information about the asset payoff, $f$, and the unconditional guess of the asset payoff, $\mu$, the fair value of the asset. Naturally, the coefficient of proportionality, $\beta$, has still to be determined.
Regarding the market makers' beliefs, we have, by the projection theorem, that,
$$
p = E(f \mid y) = \mu + \frac{cov(f, y)}{var(y)}\,y = \mu + \underbrace{\frac{\beta\sigma_f^2}{\beta^2\sigma_f^2 + \sigma_z^2}}_{\equiv\,\lambda}\,y,
\tag{9.64}
$$
where the identity under the brackets follows by the conjecture made in (9.62).
To summarize, the equilibrium in this market is one in which the market maker sets the price as in Eq. (9.62), and the optimal trading size of the insider is given by $x$ in Eq. (9.63). The price impact, $\lambda$, in (9.62) and the trading aggressiveness coefficient, $\beta$, in (9.63), satisfy:
$$
\beta = \frac{1}{2\lambda}, \qquad \lambda = \frac{\beta\sigma_f^2}{\beta^2\sigma_f^2 + \sigma_z^2}.
\tag{9.65}
$$
The first condition says that when markets are deep (i.e. when the price impact $\lambda$ is low), the price is obviously less likely to move against the insider, who will therefore trade aggressively on his informational advantage, $f - \mu$. The second condition says that the market maker makes the markets deep both when the insider trades little on his information advantage (i.e. when $\beta$ is low) and when he trades so aggressively as to reveal much of his information (i.e. when $\beta$ is high). An equilibrium of this game is the solution to the system (9.65),
$$
\beta = \frac{\sigma_z}{\sigma_f}, \qquad \lambda = \frac{1}{2}\,\frac{\sigma_f}{\sigma_z}.
\tag{9.66}
$$
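The step from the system (9.65) to its solution (9.66) can be spelled out as follows. This is a sketch of the algebra only, writing $\beta$ for the insider's trading aggressiveness, $\lambda$ for the price impact, and $\sigma_f^2$, $\sigma_z^2$ for the variances of the payoff and of the liquidity shock (the notation assumed in this section).

```latex
\begin{align*}
\lambda\left(\beta^{2}\sigma_f^{2}+\sigma_z^{2}\right) &= \beta\sigma_f^{2}
  && \text{(second condition in (9.65))}\\
\tfrac{1}{2}\beta\sigma_f^{2}+\lambda\sigma_z^{2} &= \beta\sigma_f^{2}
  && \text{(using } \lambda\beta=\tfrac{1}{2}\text{ from the first condition)}\\
\frac{\sigma_z^{2}}{2\beta} &= \tfrac{1}{2}\beta\sigma_f^{2}
  && \text{(using } \lambda=\tfrac{1}{2\beta}\text{)}\\
\beta &= \frac{\sigma_z}{\sigma_f}, \qquad
\lambda=\frac{1}{2\beta}=\frac{1}{2}\,\frac{\sigma_f}{\sigma_z}
  && \text{(taking the positive root)}
\end{align*}
```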

The insider trades more and more aggressively as liquidity trades become large (i.e. as $\sigma_z$ becomes large), because it is easier to hide his information in this case. Alternatively, the probability the order flow contains information decreases with $\sigma_z$, making adverse selection less acute, leading the market maker to lower $\lambda$ and, then, the informed trader to trade more aggressively. Likewise, the insider's informational advantage increases with $\sigma_f$, leading the market maker to raise the adverse selection costs $\lambda$, and, then, the insider to trade less aggressively.
Price discovery leads to halve the initial uncertainty about the asset payoff,
$$
var(f \mid y) = var(f) - \frac{cov^2(f, y)}{var(y)} = \frac{1}{2}\,var(f),
\tag{9.67}
$$
where the first equality follows by the projection theorem, and the second by a direct calculation.
Finally, the expected profits to the trader are, unconditionally,
$$
E\left((f - p)\,x\right) = \frac{1}{2}\,\sigma_f\sigma_z.
$$
They increase with the insider's informational advantage, measured by $\sigma_f$, and with the insider's ability to hide his orders, measured by $\sigma_z$.
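The equilibrium of the baseline model can be checked numerically. The sketch below is a minimal Python illustration under the assumptions of this section (normal payoff with standard deviation `sigma_f`, normal liquidity shock with standard deviation `sigma_z`, linear strategies); the function names and the parameter values are hypothetical.

```python
def kyle_equilibrium(sigma_f, sigma_z):
    """Closed-form trading aggressiveness (beta) and price impact (lambda)."""
    beta = sigma_z / sigma_f
    lam = 0.5 * sigma_f / sigma_z
    return beta, lam


def projection_slope(beta, sigma_f, sigma_z):
    """cov(f, y) / var(y) for the order flow y = beta (f - mu) + z."""
    return beta * sigma_f**2 / (beta**2 * sigma_f**2 + sigma_z**2)


def posterior_variance(beta, sigma_f, sigma_z):
    """var(f | y) obtained from the projection theorem."""
    cov_fy = beta * sigma_f**2
    var_y = beta**2 * sigma_f**2 + sigma_z**2
    return sigma_f**2 - cov_fy**2 / var_y


def expected_profit(beta, lam, sigma_f):
    """E[(f - p) x] with x = beta (f - mu) and p = mu + lam (x + z)."""
    return beta * (1.0 - lam * beta) * sigma_f**2


if __name__ == "__main__":
    sigma_f, sigma_z = 2.0, 3.0
    beta, lam = kyle_equilibrium(sigma_f, sigma_z)
    print("beta =", beta, "lambda =", lam)
    print("market maker's slope:", projection_slope(beta, sigma_f, sigma_z))
    print("var(f | y):", posterior_variance(beta, sigma_f, sigma_z))
    print("expected profit:", expected_profit(beta, lam, sigma_f))
```

For `sigma_f = 2` and `sigma_z = 3`, the sketch returns a trading aggressiveness of 1.5 and a price impact of 1/3, the market maker's regression slope coincides with that price impact, the posterior variance is half of the prior variance, and the insider's expected profit is 3.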

9.8.2 Markets with multiple traders and dealers


We consider three extensions of the Kyle baseline model: one in which there are multiple insiders with the same information; a second, in which insiders have differential information; and a third, in which there are multiple dealers.
9.8.2.1 Identical private information

When $N$ traders observe the asset payoff $f$, the market maker will see an order flow equal to $y = \sum_{i=1}^{N} x_i + z$, and will set the price as in Eq. (9.61). We still conjecture that this price is linear in $y$, as in Eq. (9.62), and that each insider trades on his information advantage according to,
$$
x_i = \beta\,(f - \mu).
\tag{9.68}
$$

Note, now, that while formulating his trading rule, each insider knows he will have a price impact but that the other agents have price impact too. A Cournot-Nash equilibrium is one in which (i) each trader formulates his optimal trading decision whilst taking the trading rules adopted by his peers as being as in (9.68) and still, (ii) finds (9.68) being optimal to him. The reason this market is referred to as being "Cournotian" is that it parallels the Cournot (1838) model in Industrial Organization, in which a finite number of firms compete for the same product (and, hence, the same price),$^{7}$ and act by maximizing their profits based on residual demands, just as in the model of this section. Naturally, traders formulate their strategies while also conjecturing that the market maker bases his inference on the aggregate order flow, as explained.
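Each trader maximizes his expected profits, as follows:
$$
\max_{x_i} E\left((f - p)\,x_i \mid f\right), \qquad p = \mu + \lambda\left(x_i + (N-1)\,\beta\,(f - \mu) + z\right),
$$
leading to the following first order conditions,
$$
x_i = \underbrace{\frac{1 - \lambda\beta\,(N-1)}{2\lambda}}_{\equiv\,\beta}\,(f - \mu),
$$
where the identity under the brackets follows by the conjecture made in Eq. (9.68). That is,
$$
\beta = \frac{1}{\lambda\,(N+1)},
\tag{9.69}
$$
and one determines $\lambda$ similarly as in (9.64),
$$
\lambda = \frac{cov(f, y)}{var(y)} = \frac{N\beta\sigma_f^2}{N^2\beta^2\sigma_f^2 + \sigma_z^2}.
\tag{9.70}
$$
The solution to Eqs. (9.69)-(9.70) is,
$$
\beta = \frac{1}{\sqrt{N}}\,\frac{\sigma_z}{\sigma_f}, \qquad \lambda = \frac{\sqrt{N}}{N+1}\,\frac{\sigma_f}{\sigma_z}.
\tag{9.71}
$$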

The interpretation regarding $\sigma_f$ and $\sigma_z$ is the same as in the previous section. Regarding the number of traders, $N$, note that in the market of this section, traders trade less aggressively as their number increases. The reason is that the overall price impact of information is $\lambda N \beta = \frac{N}{N+1}$, which is increasing in $N$, making the probability of an informed trade increasing in $N$ as well. In other words, as $N$ increases, $\beta$ decreases. Intuitively, as more and more traders trade on the same information, prices move more and more against them, leading each trader to be less aggressive. Note, however, that markets become more liquid with $N$, as the expression for $\lambda$ in (9.71) suggests.
Furthermore, note that while each trader trades less aggressively than in the baseline case where $N = 1$ (see Eqs. (9.66)), price discovery improves as $N$ increases, as a simple calculation reveals,
$$
var(f \mid y) = var(f) - \frac{cov^2(f, y)}{var(y)} = \frac{1}{N+1}\,var(f).
$$
7 Chamberlin's (1933) model of monopolistic competition deals with markets in which firms compete for products that have imperfect substitutes.

Finally, competition amongst traders leads them to experience lower profits as their number increases,
$$
E\left((f - p)\,x_i\right) = \frac{1}{\sqrt{N}\,(N+1)}\,\sigma_f\sigma_z.
$$
We now explain that some of these properties do not hold in markets where traders have heterogeneous information.
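The comparative statics with respect to the number of insiders can be tabulated with a short script. This is a hedged sketch, assuming the linear equilibrium with `n` identically informed traders described above; the function names and the parameter values are hypothetical.

```python
import math


def equilibrium_identical_info(n, sigma_f, sigma_z):
    """Trading aggressiveness and price impact with n identically informed traders."""
    beta = sigma_z / (sigma_f * math.sqrt(n))
    lam = math.sqrt(n) * sigma_f / ((n + 1) * sigma_z)
    return beta, lam


def best_response_slope(beta_others, lam, n):
    """Slope of one trader's optimal order when the n - 1 peers use beta_others."""
    return (1.0 - lam * (n - 1) * beta_others) / (2.0 * lam)


def posterior_variance(n, beta, sigma_f, sigma_z):
    """var(f | y) for the order flow y = n beta (f - mu) + z."""
    cov_fy = n * beta * sigma_f**2
    var_y = (n * beta * sigma_f) ** 2 + sigma_z**2
    return sigma_f**2 - cov_fy**2 / var_y


def profit_per_trader(n, beta, lam, sigma_f):
    """E[(f - p) x_i] for each trader in the symmetric equilibrium."""
    return beta * (1.0 - lam * n * beta) * sigma_f**2


if __name__ == "__main__":
    sigma_f, sigma_z = 2.0, 3.0
    for n in (1, 2, 4, 8):
        beta, lam = equilibrium_identical_info(n, sigma_f, sigma_z)
        print(n, beta, lam,
              posterior_variance(n, beta, sigma_f, sigma_z),
              profit_per_trader(n, beta, lam, sigma_f))
```

The loop shows each trader's aggressiveness, the price impact, the posterior variance and the profit per trader all falling as the number of traders grows, and `best_response_slope` returns the same slope the peers are assumed to use, which is the Cournot-Nash property.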
9.8.2.2 Heterogeneity in private information

We now assume that each trader observes a different signal $s_i$, and that each $s_i$ is as in Eq. (9.10). The aggregate order flow is now $y = \sum_{i=1}^{N} x_i + z$, where each trader's demand $x_i$ reflects his own informational advantage against the market maker. Below, we shall formulate conjectures about how this informational advantage is reflected into the trader's strategy (see Eq. (9.73)). We initially focus on explaining the gaming of this market with heterogeneous information.
First, and as usual, market making makes the price semi-strong informationally efficient, just as in Eq. (9.61). Second, we conjecture that this price is linear in $y$, as in Eq. (9.62). Third, we analyze this market as being a Cournot-Nash one, as in the previous section: each trader trades so as to maximize his expected profits given the trading strategies of the other players and the market maker's inference. Accordingly, each trader maximizes his expected profits, as follows:
$$
\max_{x_i} E\left((f - p)\,x_i \mid s_i\right), \qquad p = \mu + \lambda\Big(x_i + \sum_{j \neq i} x_j + z\Big).
$$
It is instructive to analyze the first order conditions of this problem,
$$
0 = E(f - \mu \mid s_i) - 2\lambda x_i - \lambda \sum_{j \neq i} E(x_j \mid s_i).
\tag{9.72}
$$
Note that while determining his optimal asset demand $x_i$, each agent needs to forecast the demand of his peers given the information he has access to, $E(x_j \mid s_i)$, while at the same time fully understanding that his peers are doing exactly the same, in that $x_j$ reflects agent $j$'s expectation of agent $i$'s (plus all other agents') expectations given $s_j$. Forecasting the forecasts of others leads to an infinite regress problem, whereby agents end up making conjectures about the conjectures others are making about them, ad infinitum.
There is no guarantee a fixed point exists to this reasoning. However, in a linear equilibrium, a solution exists and is easy to describe. Conjecture that each agent trades on his informational advantage according to,
$$
x_i = \beta\,(s_i - \mu).
\tag{9.73}
$$
Then, each agent's forecast of the forecasts of others simplifies, collapsing as it does to forecasting the informational advantages of the peers, $E(x_j \mid s_i) = \beta\,E(s_j - \mu \mid s_i)$. Therefore, in this linear equilibrium, each agent can determine his optimal demand from the first order conditions (9.72), as follows:
$$
x_i = \frac{1}{2\lambda}\left[E(f - \mu \mid s_i) - \lambda\beta\,(N-1)\,E(s_j - \mu \mid s_i)\right] = \underbrace{\frac{\phi\left(1 - \lambda\beta\,(N-1)\right)}{2\lambda}}_{\equiv\,\beta}\,(s_i - \mu),
\tag{9.74}
$$
where we have used the conjecture that each agent trades according to (9.73), the symmetry $E(s_j \mid s_i) = E(s_k \mid s_i)$ for each $j, k \neq i$, and the second equality follows because by the projection theorem,
$$
E(f - \mu \mid s_i) = E(s_j - \mu \mid s_i) = \underbrace{\frac{\sigma_f^2}{\sigma_f^2 + \sigma_\epsilon^2}}_{\equiv\,\phi}\,(s_i - \mu),
\tag{9.75}
$$
with $\sigma_\epsilon^2$ denoting the variance of the noise in each signal, and, finally, the identity under the brackets in (9.74) follows by the conjecture made in Eq. (9.73).
Note that Eq. (9.75) says that the slope regression estimates of dividends and signals are the same, and equal to $\phi$. Moreover, $\phi$ is also the correlation coefficient between the signals traders have access to. The model collapses to the identical information market in the previous section once $\sigma_\epsilon^2 = 0$, i.e. $\phi = 1$.
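By Eq. (9.74),
$$
\beta = \frac{\phi}{\lambda\left(2 + \phi\,(N-1)\right)},
\tag{9.76}
$$
and regarding the price impact, the Appendix shows that,
$$
\lambda = \frac{cov(f, y)}{var(y)} = \frac{\phi N \beta\sigma_f^2}{N\beta^2\sigma_f^2\left(1 + (N-1)\,\phi\right) + \phi\,\sigma_z^2},
\tag{9.77}
$$
such that,
$$
\beta = \sqrt{\frac{\phi}{N}}\,\frac{\sigma_z}{\sigma_f}, \qquad \lambda = \frac{\sqrt{N\phi}}{2 + (N-1)\,\phi}\,\frac{\sigma_f}{\sigma_z}.
\tag{9.78}
$$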

In equilibrium, trading aggressiveness, $\beta$, increases with the correlation of the traders' signals, $\phi$. Intuitively, in this model, the signals the traders observe become less and less correlated as their quality deteriorates,
$$
\phi = \frac{\tau}{1 + \tau},
$$
where $\tau \equiv \sigma_f^2 / \sigma_\epsilon^2$ is a measure of the signals' quality. That is, as the noise component increases (i.e. $\sigma_\epsilon^2$ increases), each trader has access to more and more idiosyncratic, albeit noisy, pieces of information. Each trader then trades less aggressively as his signal becomes less precise.$^{8}$
The price impact in (9.78), $\lambda$, is increasing in $\phi$, provided $N$ is small enough: from (9.78), $\lambda$ is increasing in $\phi$ if and only if $\phi < \frac{2}{N-1}$, which holds for every $\phi < 1$ when $N \leq 3$. The mechanism is the following. The market maker anticipates that an increase in the quality of the traders' information (reflected by a higher $\phi$) results in more informative trading, which makes adverse selection more severe: $\lambda$ increases with $\phi$ as a result. In other words, market makers face few traders ($N$ is small), who in addition are equipped with high quality information, and $\lambda$ increases with $\phi$ as a result. Note that when $N$ is large, $\lambda$ can be decreasing in $\phi$ provided the latter is large enough. The reason is that as $\phi$ increases, the order flow becomes more informative (as $\beta$ is increasing in $\phi$): informed traders are then revealing their presence, and an increase in $\phi$ would make their presence revealed even more, so to speak: adverse selection costs lower as a result.
Price discovery deteriorates as the correlation decreases, as it can be easily verified that:
$$
var(f \mid y) = var(f) - \frac{cov^2(f, y)}{var(y)} = \frac{2 - \phi}{2 + (N-1)\,\phi}\,var(f).
$$
8 In Section 9.8.3, we discuss alternative signals structures such that a decrease in correlation does not necessarily imply less precise signals but an increase in monopolistic information power.

Finally, the Appendix shows that the expected profits for each trader are,
$$
E\left((f - p)\,x_i\right) = \frac{1}{2 + (N-1)\,\phi}\,\sqrt{\frac{\phi}{N}}\,\sigma_f\sigma_z.
\tag{9.79}
$$
When $N$ is small, expected profits increase as $\phi$ increases, reflecting the fact that agents observe signals with better quality. As $N$ increases, the expected profits might actually fall with $\phi$ when $\phi$ is sufficiently high, reflecting a lower informational advantage, as explained earlier while commenting on $\lambda$ in (9.78).
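The non-monotonic dependence of the price impact on the signal correlation can be verified with a few lines of code. The sketch below assumes signals of the form payoff plus independent noise, so that the correlation `phi` equals the payoff variance divided by the total signal variance; this signal structure, the function names and the parameter values are assumptions made for the illustration.

```python
import math


def equilibrium_heterogeneous(n, phi, sigma_f, sigma_z):
    """Trading aggressiveness and price impact with n differentially informed traders."""
    beta = math.sqrt(phi / n) * sigma_z / sigma_f
    lam = math.sqrt(n * phi) * sigma_f / ((2.0 + (n - 1) * phi) * sigma_z)
    return beta, lam


def projection_slope(n, phi, beta, sigma_f, sigma_z):
    """cov(f, y) / var(y) when each trader submits beta (s_i - mu), with 0 < phi <= 1."""
    sigma_eps2 = sigma_f**2 * (1.0 - phi) / phi          # noise variance implied by phi
    cov_fy = n * beta * sigma_f**2
    var_y = beta**2 * (n**2 * sigma_f**2 + n * sigma_eps2) + sigma_z**2
    return cov_fy / var_y


def profit_per_trader(n, phi, sigma_f, sigma_z):
    """Expected profit of each trader in the linear equilibrium."""
    return math.sqrt(phi / n) * sigma_f * sigma_z / (2.0 + (n - 1) * phi)


if __name__ == "__main__":
    for n in (2, 10):
        for phi in (0.2, 0.5, 1.0):
            beta, lam = equilibrium_heterogeneous(n, phi, 1.0, 1.0)
            print(n, phi, round(beta, 4), round(lam, 4),
                  round(profit_per_trader(n, phi, 1.0, 1.0), 4))
```

With two traders the price impact rises with `phi` over the whole range, whereas with ten traders it is lower at `phi = 1` than at `phi = 0.5`, in line with the discussion above; setting `phi = 1` reproduces the identical-information formulas.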
9.8.2.3 Multiple dealers

Kyle (1989) considers a model in which uninformed and informed traders co-exist and act strategically. Foucault, Pagano and Roell (2013) consider a simplified version of this model, in which all traders are risk-neutral, and the uninformed traders are interpreted as dealers. These dealers compete through a call auction mechanism which leads to an equilibrium price such that the dealers make profits on average. As in Kyle (1985), the order flow is $y = \sum_{i=1}^{N} x_i + z$, but in the presence of $K$ dealers acting in an imperfectly competitive market,
$$
y = \sum_{k=1}^{K} y_k(p),
\tag{9.80}
$$
where $y_k(p)$ is the supply schedule submitted by dealer $k$, a function of the price. The auctioneer sets a price such that the supply schedules of the dealers sum up to the order flow.
Each dealer maximizes his expected profits taking the other dealers' (and the informed traders') actions as given, as in a Cournot-Nash market,
$$
\max_{y_k} E\left((p - f)\,y_k \mid y\right),
\tag{9.81}
$$
subject to one constraint that takes into account the nature of imperfect competition amongst dealers, as we now explain.
Assume that the other dealers supply schedules are, for each ,

( )=
6=
(9.82)

for some to be determined in the equilibrium of the game. The meaning of is that dealer
sells if
0, and buys if
0. Replacing Eq. (9.82) into Eq. (9.80) leaves the residual
demand function for any dealer ,
= +

The maximization problem faced by each dealer


constraint (9.83). The solution is,

1)

(9.83)

is therefore that in (9.81), under the

1
1
(
1)
( | ) = (1
(
1) )
(9.84)
2
2
where the last equality follows by the projection theorem and, as usual, and assuming for
simplicity a single insider trader,

2
+

= 2 2
=
(9.85)
+
+ 2
441
=

c
by
A. Mele

9.8. Markets with strategic players

The previous expression is the same as the usual Kyles lambda (see Eq. (9.64)), although it
is not the price impact in this model unless
is large.
We determine in Eq. (9.82) and then, we search for an equilibrium price by solving Eq.
(9.80). First, eliminate from Eq. (9.83) and Eq. (9.84), such that,
(1

( )

1)

(
1) ) (
1+ (
1)
{z

where the identity under the brackets follows by the conjecture made in Eq. (9.82). Therefore,
2

(9.86)

1)

Finally, Eq. (9.80), Eq. (9.82), and Eq. (9.86) imply that the equilibrium price is,
= +

1
2

(9.87)
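The fixed point behind Eq. (9.86) and the price multiplier in Eq. (9.87) can be checked numerically. The sketch below assumes a symmetric linear equilibrium in which each dealer's schedule has slope beta = (K - 2) / (K (K - 1) lambda) and the price responds to the order flow with slope (K - 1) / (K - 2) lambda; the names `K`, `lam` and the three functions are illustrative choices, not notation taken from these lectures.

```python
def dealer_slope(K, lam):
    """Conjectured equilibrium slope of each dealer's schedule (needs K > 2)."""
    if K <= 2:
        raise ValueError("the linear equilibrium requires more than two dealers")
    return (K - 2) / (K * (K - 1) * lam)


def best_response_slope(beta, K, lam):
    """Slope of a dealer's optimal schedule when the K - 1 rivals use slope beta."""
    c = (K - 1) * beta * lam
    return (1 - c) / (1 + c) * (K - 1) * beta


def price_impact(K, lam):
    """Sensitivity of the equilibrium price to the order flow."""
    return (K - 1) / (K - 2) * lam
```

Under these assumptions, `best_response_slope(dealer_slope(K, lam), K, lam)` returns the conjectured slope itself, and `price_impact(K, lam)` falls towards `lam` as `K` grows, which is the sense in which dealers' market power makes the market less deep.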

To determine the overall equilibrium, we need to determine in (9.85), and then, in Eq.
(9.87). The usual maximization problem of the insider trader leads to,
1
=
2

(9.88)

Eq. (9.85) and Eq. (9.88) imply that:


=

=1
2

(9.89)

Dealers' market power makes asset markets less deep than in the perfect competition case of
Kyle (1985), because the price impact, , is decreasing in
as (9.89) reveals. However, price
discovery remains the same as in the Kyle (1985) model (see Eq. (9.67)),
( | )=

1
2

Intuitively, the insider trades through half of the market depth both in the perfectly competitive
(see (9.63)) and in the imperfectly competitive case (see (9.88)).
Finally, the expected profits to the insider trader are,
((

1
) )=
2

(
(

2)
1)2

and are increasing in the number of dealers , converging to the perfectly competitive case as
gets large.


9.8.3 Dynamic markets
9.8.3.1 Positions in discrete time

Consider an asset that pays off a dividend $v$ at some point in time, $T$. The asset can be
traded in $N$ distinct periods. A trader's cumulative position at the end of the trading period $n$
is $x_n = \sum_{i=1}^{n} \Delta x_i$, where $\Delta x_i$ denotes the position in the asset at the trading period $i$ (these
positions are not part of a self-financed strategy), taken at the price $p_i$. His final profit is,

$$\pi = x_N\, v - \sum_{i=1}^{N} p_i\, \Delta x_i,$$

where $\sum_{i=1}^{N} p_i\, \Delta x_i$ denotes the sum of the values of all the positions over the trading period.
Therefore,

$$\pi = \sum_{i=1}^{N} (v - p_i)\, \Delta x_i. \tag{9.90}$$

Below, we shall utilize the continuous time counterpart to this expression while reviewing a
dynamic extension of Kyle's model.
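The accounting behind Eq. (9.90) can be illustrated with a small sketch. It assumes that the trader buys `trades[i]` units at `prices[i]` in round i (negative entries are sales) and that the whole position pays `payoff` per unit at the end; the function names and the numbers are hypothetical.

```python
def final_profit(payoff, prices, trades):
    """Sum over rounds of (payoff - price) times the units traded in that round."""
    if len(prices) != len(trades):
        raise ValueError("prices and trades must have the same length")
    return sum((payoff - p) * dx for p, dx in zip(prices, trades))


def final_profit_from_position(payoff, prices, trades):
    """Same profit, written as terminal position value minus total cost of trades."""
    if len(prices) != len(trades):
        raise ValueError("prices and trades must have the same length")
    cumulative_position = sum(trades)
    total_cost = sum(p * dx for p, dx in zip(prices, trades))
    return cumulative_position * payoff - total_cost


example = final_profit(10.0, [8.0, 9.0, 11.0], [1.0, 2.0, -1.0])
```

Both functions return the same number for any inputs of equal length, which is the identity used to pass from the cumulative position to the sum in Eq. (9.90).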
9.8.3.2 Monopolistic trader

How does the insider trader dilute his information when allowed to trade over different batch
auctions? We analyze the continuous time version of Kyle's model, in which trading occurs over
a finite horizon fixed to $[0,1]$. As in the baseline market, the insider trades based on his available
information, and the market maker (many of them, actually) updates his beliefs regarding the
fundamental asset value based on having observed the aggregate order flow.
The aggregate cumulative orders at any $t \in [0,1]$, $y_t$ say, comprise two components: (i) the
insider's cumulative demand, $x_t = \int_0^t \theta_s\, ds$, and (ii) the cumulative orders by the liquidity
traders, $\sigma_u W_t$, where $\sigma_u$ is a constant, and $W_t$ is a standard Brownian motion. Therefore, $y_t = x_t + \sigma_u W_t$,
such that the aggregate order flow satisfies:

$$dy_t = \theta_t\, dt + \sigma_u\, dW_t. \tag{9.91}$$

The market maker does not make any profits while he sets the price according to the semi-strong efficiency rule paralleling that in Eq. (9.61),

$$p_t = E\big(v \mid y_s,\ s \in [0,t]\big). \tag{9.92}$$

The insider trader maximizes his expected gains from trade, the continuous time version of
$\pi$ in Eq. (9.90):

$$J(p_t, t)\big|_{t=0} = \max_{(\theta_t)_{t \in [0,1]}} E\left[\int_0^1 (v - p_t)\,\theta_t\, dt\right]. \tag{9.93}$$

Motivated by the market behavior in the static setting of the previous sections, we conjecture
that the optimal trading strategy in (9.93) is

$$\theta_t = \beta_t\,(v - p_t), \tag{9.94}$$

for some deterministic function $\beta_t$, such that the order flow in Eq. (9.91) can be written as,

$$dy_t = \beta_t\,(v - p_t)\, dt + \sigma_u\, dW_t. \tag{9.95}$$

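To determine the dynamics of the price in Eq. (9.92), note, heuristically, that,

$$\begin{aligned}
p_{t+dt} &= E\big(v \mid y_s,\ s \in [0,t],\ dy_t\big) \\
&= E\big(v \mid y_s,\ s \in [0,t]\big) + \frac{cov\big(v, dy_t \mid y_s,\ s \in [0,t]\big)}{var\big(dy_t \mid y_s,\ s \in [0,t]\big)}\,\Big(dy_t - E\big(dy_t \mid y_s,\ s \in [0,t]\big)\Big) \\
&= p_t + \frac{\beta_t\,\Sigma_t\, dt}{\beta_t^2\,\Sigma_t\, dt^2 + \sigma_u^2\, dt}\; dy_t,
\end{aligned} \tag{9.96}$$

where the second equality follows by the projection theorem, and the third by Eq. (9.92), Eq.
(9.95) and the following definition of residual variance,

$$\Sigma_t = E\big[(v - p_t)^2 \mid y_s,\ s \in [0,t]\big]. \tag{9.97}$$

The residual variance, $\Sigma_t$, can be interpreted as a gauge of informational efficiency, or price
discovery process, i.e. the market maker's inference about the asset payoff. We shall see that in
equilibrium, price discovery will be complete, in that $\lim_{t \to 1} \Sigma_t = 0$.
In the limit, and disregarding $dt^2$ terms, the price in (9.96) satisfies,

$$dp_t = \lambda_t\, dy_t, \qquad \lambda_t = \frac{\beta_t\,\Sigma_t}{\sigma_u^2}. \tag{9.98}$$

We now determine $\lambda_t$ and $\Sigma_t$, and then use (9.98) to determine $\beta_t$, thereby having determined
the trading strategy in Eq. (9.94). Later, we shall verify that the thusly determined strategy
is indeed optimal. Regarding $\Sigma_t$, suppose that $dy_t$ becomes available over the time interval $dt$.
Then,

$$\begin{aligned}
\Sigma_{t+dt} &= E\big[(v - p_{t+dt})^2 \mid y_s,\ s \in [0,t+dt]\big] \\
&= E\big[(v - p_t)^2 \mid y_s,\ s \in [0,t]\big] - \frac{cov\big(v, dy_t \mid y_s,\ s \in [0,t]\big)^2}{var\big(dy_t \mid y_s,\ s \in [0,t]\big)} \\
&= \Sigma_t - \frac{\beta_t^2\,\Sigma_t^2\, dt^2}{\beta_t^2\,\Sigma_t\, dt^2 + \sigma_u^2\, dt},
\end{aligned}$$

such that in the limit, and by the expression for $\lambda_t$ in (9.98),

$$d\Sigma_t = -\lambda_t^2\,\sigma_u^2\, dt. \tag{9.99}$$

We conjecture that $\lambda_t = \lambda$, a constant, and that the equilibrium trading strategy in (9.94)
leads to a complete price discovery process, $\lim_{t \to 1} \Sigma_t = 0$. By integrating Eq. (9.99), and using
the condition $\Sigma_0 = \sigma_v^2$, leaves $\Sigma_t = \sigma_v^2 - \lambda^2 \sigma_u^2\, t$, and by the conjecture that $\Sigma_1 = 0$, we find that
the price impact in (9.98) is,

$$\lambda = \frac{\sigma_v}{\sigma_u}, \tag{9.100}$$

and, accordingly,

$$\Sigma_t = \sigma_v^2\,(1 - t). \tag{9.101}$$
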

That is, the pace of information revelation is constant in this market. Eqs. (9.100), (9.101)
and (9.98) now imply that the trading aggressiveness in (9.94) is,

$$\beta_t = \frac{\lambda\,\sigma_u^2}{\Sigma_t} = \frac{\sigma_u}{\sigma_v}\,\frac{1}{1 - t}. \tag{9.102}$$
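The three equilibrium objects of this continuous-time market can be tabulated with a few lines of code. The sketch below assumes the standard solution of the continuous-time Kyle model, with constant price impact sigma_v / sigma_u, residual variance sigma_v^2 (1 - t) and trading intensity sigma_u / (sigma_v (1 - t)); the symbol names `sigma_v`, `sigma_u` and the function names are assumptions made for illustration.

```python
def kyle_lambda(sigma_v, sigma_u):
    """Constant price impact of the order flow."""
    return sigma_v / sigma_u


def residual_variance(t, sigma_v):
    """Variance of the payoff left unexplained by the order flow up to time t."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return sigma_v ** 2 * (1.0 - t)


def trading_intensity(t, sigma_v, sigma_u):
    """Insider's aggressiveness per unit of mispricing; it explodes as t approaches 1."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    return sigma_u / (sigma_v * (1.0 - t))
```

A consistency check under these assumptions is that `trading_intensity(t, ...) * residual_variance(t, ...) / sigma_u**2` equals `kyle_lambda(...)` at every date: the insider trades more aggressively exactly as less information is left to be revealed.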

Eqs. (9.100), (9.101) and (9.94) complete the description of the market equilibrium, once we
prove the trading strategy in Eq. (9.94) is optimal with $\beta_t$ as in (9.102). The proof proceeds
in two steps. First, we show that the proposed $\beta_t$ leads the price process to converge to the
fundamental (no money left on the table); and second, we show that any strategy is optimal
when it satisfies this property.
Regarding convergence to fundamental value, note that by replacing (9.91) into (9.98), using
the expressions for $\theta_t$ in (9.94), and the expressions for $\lambda$ and $\beta_t$ in (9.100) and (9.102),

$$dp_t = \frac{v - p_t}{1 - t}\, dt + \sigma_v\, dW_t. \tag{9.103}$$


Now, by Karatzas and Shreve (1991, Corollary 6.10, p. 359), the process in Eq. (9.103) is a
Brownian bridge from $p_0$ to $v$ (scaled by $\sigma_v$), meaning that it starts at $p_0$ at $t = 0$ and ends at $v$ at $t = 1$.
The latter property implies that $p_1 = v$ (see footnote 9). That is, the price is a Brownian bridge under the insider's information,
although the end-point of it is unknown to the market maker.
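The convergence of the price to the fundamental value can be visualized with a simulation. The sketch below is a plain Euler discretization, assuming that the price follows dp = (v - p) / (1 - t) dt + sigma_v dW on [0, 1]; the function name, the step count and the parameter values are illustrative assumptions.

```python
import math
import random


def simulate_price_path(v, p0, sigma_v, n_steps, rng):
    """Euler scheme for the bridge dp = (v - p) / (1 - t) dt + sigma_v dW on [0, 1]."""
    dt = 1.0 / n_steps
    p = p0
    path = [p0]
    for i in range(n_steps):
        t = i * dt  # left end of the step, always strictly below 1
        dw = rng.gauss(0.0, math.sqrt(dt))
        p += (v - p) / (1.0 - t) * dt + sigma_v * dw
        path.append(p)
    return path


rng = random.Random(0)
path = simulate_price_path(v=1.5, p0=0.0, sigma_v=2.0, n_steps=10_000, rng=rng)
```

In this scheme the drift over the last step pulls the price all the way to `v`, so the terminal value differs from `v` only by the last Brownian increment, which is of order `sigma_v * sqrt(dt)`. The insider's pull towards the fundamental becomes arbitrarily strong as the horizon approaches.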
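Regarding the previous convergence property as being one of the optimal trading strategies,
consider the following Hamilton-Jacobi-Bellman equation satisfied by $J(p, t)$ in Eq. (9.93),

$$0 = \max_{\theta}\; E\left[(v - p)\,\theta + J_t(p,t) + J_p(p,t)\,\lambda\,\theta + \frac{1}{2}\, J_{pp}(p,t)\,\lambda^2 \sigma_u^2\right], \tag{9.104}$$

where the expectation is taken conditional upon the price, and where we have used $dp_t = \lambda\, dy_t$
and the expression of (9.91) for $dy_t$. As originally noted by Back (1992), the maximand in (9.104)
is linear in $\theta$, such that the terms multiplying $\theta$ must sum up to zero, with the remaining terms
summing up to zero as well, viz

$$v - p + \lambda\, J_p(p,t) = 0, \qquad J_t(p,t) + \frac{1}{2}\, J_{pp}(p,t)\,\lambda^2 \sigma_u^2 = 0. \tag{9.105}$$

Eqs. (9.105), and the boundary condition $J(v, 1) = 0$, lead to the following expression for
the value function in (9.93),

$$J(p, t) = \frac{1}{2\lambda}\,(v - p)^2 + \frac{1}{2}\,\lambda\,\sigma_u^2\,(1 - t). \tag{9.106}$$

Next, consider the value function in (9.93) for any trading strategy, $\theta_t$ say, with $p_t$ the resulting price. By Ito's lemma,

$$dJ(p_t, t) = \mathcal{L}J(p_t, t)\, dt + J_p(p_t, t)\,\lambda\,\sigma_u\, dW_t,$$

where $\mathcal{L}$ denotes the usual infinitesimal generator, which by Eqs. (9.104)-(9.105) satisfies $0 = (v - p_t)\,\theta_t + \mathcal{L}J(p_t, t)$. Therefore,

$$E\left[\int_0^1 (v - p_t)\,\theta_t\, dt\right] = -E\left[\int_0^1 \mathcal{L}J(p_t, t)\, dt\right] = J(p_0, 0) - E\big[J(p_1, 1)\big].$$

Note that by evaluating Eq. (9.106) at $t = 1$, we conclude that $J(p_1, 1) = \frac{1}{2\lambda}\,(v - p_1)^2 \geq 0$.
This means that for an optimal strategy we have that $E\left[\int_0^1 (v - p_t)\,\theta_t\, dt\right] = J(p_0, 0) - E\big[J(p_1, 1)\big] = J(p_0, 0)$,
and therefore that $E\big[J(p_1, 1)\big] = 0$; because $J(p_1, 1) \geq 0$, it follows that $J(p_1, 1) = 0$,
i.e. $p_1 = v$.

9 Note that the insider's cumulative orders $x_t$ are bounded in probability as a result of this convergence. Indeed, by the definition
of $dp_t = \lambda\,(dx_t + \sigma_u\, dW_t)$, we have that $x_t = \lambda^{-1}(p_t - p_0) - \sigma_u W_t$: boundedness in probability of $p_t$ and $W_t$ implies that of $x_t$.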

9.8.3.3 Imperfectly competitive traders

Foster and Viswanathan (1996) generalize the dynamic market of Kyle (1985), by relying on
a discrete time setting with multiple traders ($N$ say), and a general correlation structure of
the signals (still assumed to be Gaussian). They assume that each trader observes a signal
correlated with the asset payoff, which has variance 0 , and that any two signals have the same
0 10
correlation equal to
.
0
One leading example of signals in this setting is the exhaustive information structure,
arising when the sum of all signals is the truth, i.e. all traders would form a mega Kyle
trader upon information collusion, $v = \sum_{i=1}^{N} s_i$, where $s_i$ is the signal available to trader $i$. It
is immediate to verify that in this case, for each $i$,
(

)=

+(

1)

and that the variance of the asset payo is:


2

Eq. (9.107) can be inverted for , leaving, for all


(

1)
,

1
(

(9.107)

1)

2
0

The correlation coefficient is a measure of monopolistic information power amongst traders.
The lower , the more unique a signal is to each trader, and the higher is his monopolistic
power. Foster and Viswanathan (1996) show, numerically, that in a dynamic context, traders
engage in a rat race once is high: because traders have comparable information, they trade
very aggressively from the beginning of the trading period to preempt being anticipated by
others, thereby revealing virtually all their information over the first trading rounds. In fact,
Back, Cao and Willard (2000) consider a continuous time model along the same lines, and
conclude that an equilibrium fails to exist when = 1 due to this behavior.
However, when is small, and the degree of monopolistic information power is high as a
result, traders engage in a waiting game, trading little at the beginning of the trading period, and
aggressively towards the conclusion. These trading patterns lead to a price discovery process
that occurs at a slow pace at the beginning of the trading period, and at a high pace at the
end. These patterns of price discovery cannot be generated by the constant information
flow predicted by the single insider trader's model of Kyle (1985), as summarized by Eq. (9.101).

10 The variance-covariance matrix of the signals available to all traders is, then, invertible, provided 1) 0.

9.9 Further topics on market microstructure, frictions and limits to arbitrage


[In progress]
9.9.1 Further determinants of bid-ask spreads
Inventory risk
[In progress]
9.9.2 Liquidity trading
The assumption of noise traders. Discussion
9.9.3 Arbitrage imperfections
Irrational traders: DeLong, Shleifer, Summers and Waldmann (1990). Risky arbitrage. Constrained arbitrage: equity (Gromb and Vayanos, 2002). Preferred habitat and the yield curve.
Greenwood and Vayanos (2014). Capital immobility.
9.9.4 Price impacts and derivatives
Price pressures and demands for derivatives: Garleanu, Pedersen and Poteshman (2009).


9.10 Over-the-counter markets


Duffie, Garleanu and Pedersen (2005, 2007). Succinct account in Duffie (2012).
[In progress]


9.11 Questions regarding higher order beliefs and beauty contests


[In progress]


9.12 Appendix 1: The projection theorem


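Consider the following Gaussian environment, in which a random vector $(x, y)$ is normally distributed:

$$\begin{pmatrix} x \\ y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix},\ \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{xy}^{\top} & \Sigma_{yy} \end{pmatrix} \right).$$

We have,

$$E(x \mid y) = \mu_x + \Sigma_{xy}\,\Sigma_{yy}^{-1}\,(y - \mu_y), \tag{9A.1}$$

and

$$var(x \mid y) = \Sigma_{xx} - \Sigma_{xy}\,\Sigma_{yy}^{-1}\,\Sigma_{xy}^{\top}. \tag{9A.2}$$

A proof of this result can be obtained as follows. Consider the following regression

$$x - \mu_x = \beta\,(y - \mu_y) + \epsilon, \tag{9A.3}$$

where $\epsilon$ is zero-mean, and orthogonal to $y$. By taking covariances with $y$ in Eq. (9A.3), $\Sigma_{xy} = \beta\,\Sigma_{yy}$, which
yields Eq. (9A.1), and by taking variances in Eq. (9A.3),

$$\Sigma_{xx} = \beta\,\Sigma_{yy}\,\beta^{\top} + var(x \mid y).$$

Plugging $\beta = \Sigma_{xy}\,\Sigma_{yy}^{-1}$ into the previous equation leaves Eq. (9A.2).
We apply the projection theorem to study inference problems in which the conditioning variable
is a vector of signals, $s = (s_1, \ldots, s_N)$, where,

$$s_i = x + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_{\epsilon_i}^2), \qquad i = 1, \ldots, N, \tag{9A.4}$$

with the noise terms $\epsilon_i$ independent of $x$ and of each other. When $N = 1$,

$$E(x \mid s) = \mu_x + \frac{\sigma_x^2}{\sigma_x^2 + \sigma_\epsilon^2}\,(s - \mu_x), \qquad var(x \mid s) = \frac{\sigma_x^2\,\sigma_\epsilon^2}{\sigma_x^2 + \sigma_\epsilon^2}.$$

We can express the previous relations in terms of precision of the signals,

$$E(x \mid s) = \frac{\tau_x\,\mu_x + \tau_\epsilon\, s}{\tau_x + \tau_\epsilon}, \qquad var(x \mid s) = \frac{1}{\tau_x + \tau_\epsilon}, \tag{9A.5}$$

where $\tau$ denotes the precision of a random variable, for example, $\tau_x = 1/\sigma_x^2$. Eq. (9A.5) is easily
generalized to the case of multiple signals in (9A.4),

$$E(x \mid s) = \frac{\tau_x\,\mu_x + \sum_{i=1}^{N} \tau_{\epsilon_i}\, s_i}{\tau_x + \sum_{i=1}^{N} \tau_{\epsilon_i}}, \tag{9A.6a}$$

$$var(x \mid s) = \frac{1}{\tau_x + \sum_{i=1}^{N} \tau_{\epsilon_i}}. \tag{9A.6b}$$

Eqs. (9A.6a)-(9A.6b) simplify once the precisions of the signals are all the same, $\tau_{\epsilon_i} = \tau_\epsilon$ for all $i$,
leaving,

$$E(x \mid s) = \frac{\tau_x\,\mu_x + N\,\tau_\epsilon\,\bar{s}}{\tau_x + N\,\tau_\epsilon}, \qquad var(x \mid s) = \frac{1}{\tau_x + N\,\tau_\epsilon},$$

where $\bar{s}$ denotes the average signal, $\bar{s} = \frac{1}{N} \sum_{i=1}^{N} s_i$.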
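The precision-weighted form of the projection theorem lends itself to a direct implementation. The sketch below assumes a scalar Gaussian variable observed through signals equal to the variable plus independent Gaussian noise; the function name and argument names are illustrative and not part of the original text.

```python
def posterior_from_signals(prior_mean, prior_var, signals, noise_vars):
    """Posterior mean and variance of a Gaussian variable given noisy signals.

    Each signal equals the unknown variable plus independent Gaussian noise
    with the corresponding variance in `noise_vars`.
    """
    if len(signals) != len(noise_vars):
        raise ValueError("one noise variance is needed per signal")
    prior_precision = 1.0 / prior_var
    signal_precisions = [1.0 / s2 for s2 in noise_vars]
    total_precision = prior_precision + sum(signal_precisions)
    weighted_sum = prior_precision * prior_mean + sum(
        tau * s for tau, s in zip(signal_precisions, signals)
    )
    return weighted_sum / total_precision, 1.0 / total_precision


mean_one, var_one = posterior_from_signals(0.0, 1.0, [2.0], [1.0])
```

With a single signal as precise as the prior, the posterior mean lies halfway between the prior mean and the signal, and the variance is halved. Adding signals can only raise the total precision, so the posterior variance never increases.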


9.13 Appendix 2: Details regarding solutions of selected models


Proof of Eq. (9.35) (Grossman and Stiglitz (1980) model). We first determine the expected
utility of the would-be uninformed investors. By the Law of Iterated Expectations, it is:



(
| )+ 12 2
(
| )

U =
=
where, by the expression of in (9.26),
(

| )=(

such that,

U =

)2

( ( | )
(

0+ 0

1
2

| )=

( ( | )

)2

( ( | )

)2

(9A.7)

Instead, by the Law of Iterated Expectations, the expected utility of the would-be informed investors
before accounting for information costs is

U =

1
( 0+ 0 )
)2
2 |( ( |)
=

1
( 0+ 0 )
)2
| ( ( | )
2
(9A.8)
=

To determine the inner expectation in the last line of (9A.8), we rely on the following distributional
result, shown below,

)
(
)
| ( ( | )
(9A.9)
|
)
1
| ( ( | )
|

Now, it is well-known that if

), then

1 2
2

2
1
2 1+

1
1+

Applying this to determine the inner expectation in (9A.8) yields


s
s

2
1
|
( 0+ 0 )
((
(
|
)
))
=
U =
2 |
|

|
|

where the second equality follows by the expression of U in Eq. (9A.7). Eq. (9.35) follows by the
previous expression of U .
We are left to show that (9A.9) holds true. Regarding the conditional expectation , note that
( | )= ( |
), such that, by the Law of Iterated Expectations and the fact that the information
content of is coarser than that of (
),
(
Regarding the expression for

( | )| ) =

( |

)| ) =

( | )

in (9A.9), note that due to the normality of


2
|

( | )
=
=

(
(

( | )| ) +
( | )| ) +


(
2
|

( | )| )

and , we have that:



2

where the last line follows because


=

is independent of . That is,

( | )| ) =

2
|

whence, the expression in (9A.9).


The equilibrium price in Eq. (9.41) regarding the differential information model. By
the projection theorem, the two coefficients appearing in the conditional expectation of (9.39) are,
>

=
=

1
( )

( )

( )

2(

( )

(9A.10)

in (9.10) and the conjectured equilibrium price in (9.40),

where, given the assumption on

=
) =
=
=
)
=

and the fourth equality follows because,


!

1
1X
=
=

2
2

2
+2
2+
2
+

=1

(9A.11)

+(

2 2

1)

+(

1)

Plugging (9A.11) into Eq. (9A.10), leaves, after tedious but straightforward calculations,
=
=

2 2 2

(9A.12)

where

( )

2 2

(9A.13)

Next, replace (9A.12) into the expressions of the price coefficients, Eqs. (9.43), which leaves the
following expressions for and :
2 2 2

We determine the conditional precision,


projection theorem,

1
2
=
(
|

|
|

, prior to determining


21

and

2 2

(9A.14)

in (9A.14). By the

2 2

(9A.15)


where the second equality follows by (9A.11), the expressions for and in (9A.12), and that for
in (9A.13). Replacing Eq. (9A.15) into the expression for in (9A.14), and using (9A.13), leaves
the following expressions for the asset price coefficients and the conditional precision,
2 2 2

21

=
and

21

1
1

2
2

(9A.16)

2 2

2 2

2 2

2 2

2 2

(9A.17)

To show that there exists a unique solution for and , define the constant through the following
relation,

1
1
21 1
+ 2 2 2
1 1 2+ 2 2 2
=
=
(9A.18)
2 2
2 2
where the first equality follows by Eqs. (9A.16) and the second by the very same definition of . It is
easy to see that there exists a unique solution for to Eq. (9A.18), say. Given , we determine
in (9A.16) by expressing it as a function of , and relying on the definition of in (9A.13), as
follows:
2 2 2

2 2 2

=
where

The solution for


definition of ,

1
, and then

1
2

2 2

1
2

(9A.19)

, are obtained after rearranging terms in (9A.19), and using the


2

2 2 2

Eq. (9.44) follows after using the definitions of precisions in the previous expression, and after rearranging terms.
The limiting case in Eq. (9.46) is obtained after taking the limits in Eq. (9.44) for large, and
noting that,
lim

Finally, the limiting conditional precision in (9.47) is obtained after taking the limit in Eq. (9A.17)
for large, and using the previous definition of .
Proof of Eq. (9.77). By Eq. (9.73), the order flow is,
=

1 X

=1

(9A.20)

where denotes the average signal,

=1

1 X
=1


(9A.21)



We have,
(

)=

(9A.22)

and,
( )=

2 2

() +

2 2

1)
2

(9A.23)

Eq. (9.77) follows by Eqs. (9A.22)-(9A.23), and by,


2

1 2

(9A.24)

Proof of Eq. (9.79). Note that by Eq. (9A.20), the equilibrium price is,

= +
+
1

. Therefore, the expected profits for each trader are,


Moreover, the trade of agent 1 is 1 =

2
1
((
) )=
1

2
=
(1
) 2
=

(2
)
2+(

1)

where the second line follows by the expression for the average signal in Eq. (9A.21) and by rearranging
terms, and the third by using the expression for
in Eq. (9.76). Eq. (9.79) follows by the definition
of in (9A.24) and the expression for in (9.78).


9.14 Appendix 3: Some foundations to pricing behavior in macroeconomics


We provide a few arguments to justify the functional form for the supply and demand equations of
Section 9.2 (see Eq. (9.1) and Eq. (9.3)). We consider a variant of a model presented by Blanchard
and Fischer (1989), which itself is a simplified version of Blanchard and Kiyotaki (1987). The model in
this appendix differs from those above as it considers (i) a continuum of goods and (ii) taste shocks
affecting each good. While the material in this appendix goes somewhat beyond the scope of these
lectures, it still helps keep the reasoning in this chapter self-contained.
Consumer-producers' tastes and technology, and markets. Consider a continuum of
consumer-producers, each with a utility function equal to
1

[0 1]

(9A.25)

where is the nominal money holdings, is a general price index (to be determined below; see Eq.
(9A.32)), is consumption of a basket of goods (to be defined in a moment), and is labor supplied;
the parameters satisfy (0 1), 0, 1. In words, each producer enjoys consuming a basket of
goods but suffers a disutility while working.
We assume that the basket of goods contains CES (Constant Elasticity of Substitution) substitutes
as in Dixit and Stiglitz (1977), that is,
=

1)

(9A.26)

where is consumer-producer consumption of the good , is a taste shock for good , assumed
to be the same for all consumer-producers, and is the CES between any two consumption goods
and . We assume that 1. We shall specify the properties of later.
Finally, we assume that (i) each producer specializes in the production of his own good according
to the simple technology: = , and that (ii) the market is one with monopolistic competition; this
market collapses to the perfectly competitive one when is large.
Each consumer-producer maximizes the utility in (9A.25) subject to the budget constraint,
Z

(9A.27)

where is the price of his produced good and is his money endowment. It is easy to see that
the solution to this problem is symmetric, in that each consumer-producer chooses the same basket
composition (albeit achieving different consumption quantities of this same basket, depending on the
income ). We can, then, define the price index in (9A.25) as the minimal expenditure needed to
purchase a unit of the composite good in Eq. (9A.26),
:

(9A.28)

We now determine demand for both the single and composite goods.
Goods demand and price index. We determine first. Replacing Eq. (9A.28) into the budget
constraint (9A.27) and maximizing (9A.25) subject to the resulting constraint, + = , leads
to a standard result:
=
= (1
)
(9A.29)

Next, we identify the optimal composition of each single good, . It is
Z 1
= arg max
s.t.
=
=
(9A.30)

where the second equality of the constraint follows by Eqs. (9A.29). Note that this program is simply
maximization of (9A.25) under the constraint (9A.27). The first order conditions lead to
1
1 1
=
whence, the definition of as the CES between any two consumption goods. Note that to simplify
notation, we are now reverting to writing . Replacing the previous optimality condition into
the constraint of (9A.30), and using the definition of the composite good in Eq. (9A.26), leads to


=
=
(9A.31)
where the second equality follows by the second equality of the constraint in (9A.30).
The price index is obtained by solving for in Eq. (9A.28) while relying on the first equality in Eq.
(9A.31), leaving
11
Z 1
1
(9A.32)
=
0

Finally, we determine the indirect utility of consumer-producer , by replacing Eqs. (9A.29) into
Eq. (9A.25),


=
(9A.33)
+

where is a constant and where we have used the agent budget constraint and the production technology = . Note that this expression resembles that of a maximizing firm. One issue is whether the
consumer-producer has pricing power, that is, whether in is affected by his decisions regarding
. We assume it is the case below, although this is not crucial for the interpretation of the pricing
equations in Section 9.2.

Production and equilibrium. The aggregate demand for the good produced by the consumer-producer is obtained by aggregating the individual demands for this good,
Z 1

Z 1
=
=
(9A.34)
0

where we have used the second equality in (9A.31), and where the last equality holds by the following
definition of aggregate demand, ,

Z 1
Z 1
Z 1 Z 1
=
=
(9A.35)
0

Let denote the aggregate money endowment. We can solve for , by noticing that
Z 1

Z 1

+
=
=
+
=
0

and the fact that, in equilibrium, =


Z 1
0

R1
0

, such that,
Z 1
=
=
0

(9A.36)

where the first equality follows by (i) aggregating the agents' budget constraints and (ii) the equilibrium
condition, = ∫_0^1 ; and the second by (9A.35). By Eq. (9A.36), the solution for
is then

=
1
which replaced into Eq. (9A.34) leaves the aggregate demand facing consumer-producer
=

(9A.37)

In the context of monopolistic competition of this appendix, one could solve for the optimal pricing
rule. The latter is obtained by replacing Eq. (9A.37) into Eq. (9A.33) and maximizing with respect to
, leaving
"
# 1

1 1+ ( 1)
=
(9A.38)
(
1)
1
We now proceed with the interpretation of some basic assumptions in Section 9.2 in light of the
framework developed so far.
Relations with model in Section 9.2. Eq. (9A.37) provides foundations to the demand for each
product in the model of Section 9.2 (see Eq. (9.3)). Regarding the supply equation (see Eq. (9.1)), note
that assuming market power and that is common knowledge leads to Eq. (9A.38), which could be
replaced back into Eq. (9A.37) to determine the equilibrium production in each market. Alternatively,
one could simplify and remove the assumption of market power. Assume, further, that is not common
knowledge, in which case producers maximize their expected utility in Eq. (9A.33),
!

arg max
=

Assume that ln is normally distributed conditionally on . It is easy to verify that in this case,

ln =
| )
(9A.39)

for two constants 0 and 1 . Naturally, the assumption that is conditionally normally distributed
needs to be verified to hold in equilibrium. However, the price index in Eq. (9A.32) reveals this to
be problematic, unless a number of conditions hold true. For example, assume that the
agents' distribution is such that there is a mass ( ) of agents , and suppose that taste shocks are
centered around one, or ∫ ( ) = 1, and that = 1, such that
Z
()
=
Even in this case, the price index is not exactly the average log-price due to the presence of the taste
shocks, such that Eq. (9A.39) is only an approximation to Eq. (9.1).


References
Back, K. (1992): Insider Trading in Continuous Time. Review of Financial Studies 5, 387-409.
Back, K., C.H. Cao and G.A. Willard (2000): Imperfect Competition Among Informed
Traders. Journal of Finance 55, 2117-2155.
Black, F. (1986): Noise. Journal of Finance 41, 529-543.
Blanchard, O.J. and N. Kiyotaki (1987): Monopolistic Competition and the Effects of Aggregate Demand. American Economic Review 77, 647-666.
Blanchard, O. and S. Fischer (1989): Lectures on Macroeconomics. Cambridge, MIT Press.
Chamberlin, E.H. (1933): The Theory of Monopolistic Competition: A Re-orientation of the
Theory of Value. Harvard: Harvard University Press.
Cournot, A.A. (1838): Recherches sur les Principes Mathematiques de la Theorie des Richesses.
Paris: Hachette.
DeLong, J.B., A. Shleifer, L.H. Summers and R.J. Waldmann (1990): Noise Trader Risk in
Financial Markets. Journal of Political Economy 98, 703-738.
Diamond, D.W. and R.E. Verrecchia (1981): Information Aggregation in a Noisy Rational
Expectations Economy. Journal of Financial Economics 9, 221-235.
Dixit, A.K. and J.E. Stiglitz (1977): Monopolistic Competition and Optimum Product Diversity. American Economic Review 67, 297-308.
Duffie, D. (2012): Dark Markets: Asset Pricing and Information Transmission in Over-the-Counter Markets (Princeton Lectures in Finance). Princeton: Princeton University Press.
Duffie, D., N. Garleanu and L.H. Pedersen (2005): Over-the-Counter Markets. Econometrica
73, 1815-1847.
Duffie, D., N. Garleanu and L.H. Pedersen (2007): Valuation in Over-the-Counter Markets.
Review of Financial Studies 20, 1865-1900.
Fama, E. (1970): Efficient Capital Markets: A Review of Theory and Empirical Work. Journal of Finance 25, 383-417.
Foster, F.D., and S. Viswanathan (1996): Strategic Trading When Agents Forecast the Forecasts of Others. Journal of Finance 51, 1437-78.
Foucault T., M. Pagano and A. Roell (2013): Market Liquidity: Theory, Evidence and Policy.
Oxford: Oxford University Press.
Garleanu, N., L.H. Pedersen and A.M. Poteshman (2009): Demand-Based Option Pricing.
Review of Financial Studies 22, 4259-4299.
Glosten, L.R. and P.R. Milgrom (1985): Bid, Ask and Transaction Prices in a Specialist
Market with Heterogeneously Informed Traders. Journal of Financial Economics 14,
71-100.
Greenwood, R. and D. Vayanos (2014): Bond Supply and Excess Bond Returns. Review of
Financial Studies 27, 663-713.
Gromb, D. and D. Vayanos (2002): Equilibrium and Welfare in Markets with Financially
Constrained Arbitrageurs. Journal of Financial Economics 66, 361-407.
Grossman, S.J. (1976): On the Efficiency of Competitive Stock Markets where Traders Have
Diverse Information. Journal of Finance 31, 573-585.
Grossman, S.J. and J.E. Stiglitz (1980): On the Impossibility of Informationally Efficient
Markets. American Economic Review 70, 393-408.
Hayek, F.A. (1945): The Use of Knowledge in Society. American Economic Review 35,
519-530.
Karatzas, I. and S.E. Shreve (1991): Brownian Motion and Stochastic Calculus. New York:
Springer Verlag.
Kyle, A.S. (1985): Continuous Auctions and Insider Trading. Econometrica 53, 1335-55.
Kyle, A.S. (1989): Informed Speculation with Imperfect Competition. Review of Economic
Studies 56, 317-356.
Hellwig, M.F. (1980): On the Aggregation of Information in Competitive Markets. Journal
of Economic Theory 22, 477-498.
Lange, O. (1936): On the Economic Theory of Socialism: Part I. Review of Economic Studies
4, 53-71.
Lange, O. (1942): The Foundations of Welfare Economics. Econometrica 10, 215-228.
Lucas, R.E. (1972): Expectations and the Neutrality of Money. Journal of Economic Theory
4, 103-124.
Lucas, R.E. (1973): Some International Evidence on Output-Inflation Tradeoffs. American
Economic Review 63, 326-334.
Lucas, R.E. (1977): Econometric Policy Evaluation: A Critique. Carnegie-Rochester Conference Series on Public Policy 1, 19-46.
Lucas, R.E. (1981): Studies in Business-Cycle Theory. Boston, MIT Press.
Phelps, E.S. (1970): Introduction. In: Phelps, E. S. (Editor): Microeconomic Foundations of
Employment and Inflation Theory, New York: W. W. Norton.

Part III
Asset pricing and reality


10
Options and volatility

10.1 Introduction
This is the first of four chapters devoted to illustrating how financial theory can be applied to
cope with the pricing of derivatives and related instruments. We actually know that much of the
theory in Part I of these lectures was motivated as an attempt to rationalize the breakthrough
made by Black and Scholes (1973) and Merton (1973) to price European options. How come
we could even price an asset without making reference to any risk-aversion correction? The
theory in Part I of these lectures explains the rationale behind this and related results. We now
apply the theory to explain how to price assets in markets more realistic than those originally
idealized by Black, Scholes and Merton.
We face a paradox known since at least Hakansson (1979). If the Black & Scholes formula
is true, we should acknowledge that markets are complete, as market completeness is the assumption needed to argue about the redundancy of the option and, then, the whole Black &
Scholes theoretical construct. But if the option is redundant, why would we be willing to trade it
in the first place? Alternatively, the option is not redundant (and options are massively traded
indeed), but then the Black & Scholes formula is wrong, in that it relies on the counterfactual
assumption that markets are complete. Indeed, in practice, many derivatives are traded over-the-counter, with financial intermediaries specializing in providing counterparties with payoffs.
Financial intermediation is also about matching the clients' needs regarding the obtention of
dedicated payoffs against a fee, on top of the fair value of the derivative. The fee might be
justified by the specialization required to cope with the sources of market incompleteness, as
well as the risk that the intermediary will incur losses due to its obligation to honour the payoffs
promised to clients.
This chapter analyzes a form of market incompleteness, arising when the volatility of the
assets underlying these derivatives is random, and cannot be hedged through the underlying
assets.
[Introduction in progress]
[Plan of the chapter]


10.2 Forwards and futures


A forward contract is an agreement to pay for the realization of a given risk at some maturity
date, with the price being agreed at the inception of the contract. The realized risk can be
anything: an asset price, a commodity price, or an abstract index. The agreed price is called
forward price, and is such that it makes the value of the contract equal to zero at inception. A
standard situation is when the risk is an asset price, and when the two parties agree to trade
the asset at a price determined at inception, as in the examples below. In this case, we say that
the party who has the obligation to buy the asset at the expiration has a long position, and
the party who has the obligation to sell this asset has a short position.
Forward contracts are a means to lock in a price today and, hence, to remove the uncertainty
relating to the realization of the future risk. Consider the following examples. A chocolate
producer starts his production process at time $t$, where he will need cocoa as an input. The
output, chocolate, will be sold at time $T$, as depicted below.

0 ---> $t$ (cocoa) ---> $T$ (chocolate) --->

Let $S^{co}_t$ and $S^{ch}_T$ be the price of cocoa and chocolate at time $t$ and $T$, and $F^{co}$ and
$F^{ch}$ be the corresponding forward prices. To insure against the vagaries of cocoa prices, the
producer can go long a forward contract. This contract guarantees a payoff equal to
$S^{co}_t - F^{co}$ at time $t$. This payoff, minus the unit input cost $S^{co}_t$ to incur at time $t$, leaves
exactly $-F^{co}$. That is, a forward contract allows the producer to freeze the sure amount $F^{co}$
to be paid at $t$. Likewise, the producer may wish to short a forward on the price of his own
chocolate; indeed, this position would allow him to receive a payoff equal to $F^{ch} - S^{ch}_T$ at $T$,
which added to the price of chocolate sold at $T$, $S^{ch}_T$, leaves the sure amount $F^{ch}$.
10.2.1 Forwards: definition and pricing in frictionless markets
If markets are frictionless, and the underlying asset is traded (a stock, say), forward contracts
can be synthesized as follows. Let $b(t,T)$ be the price of a bond expiring at time $T$ and $S_t$
the price of a stock. Assuming the short-term rate is constant, we have $b(t,T) = e^{-r(T-t)}$,
where $r$ denotes the short-term rate, which is the same for borrowing and lending. So at time
$t$, borrow $b(t,T)\,F$ and buy the stock, choosing $F$: $b(t,T)\,F - S_t = 0$. The value of this
portfolio at time $T$ is $S_T - F$. But the portfolio is worthless at time $t$, so this trade is the same
as a forward. Therefore, we have $F = F_t(T)$, where:

$$F_t(T) = e^{r(T-t)} S_t. \qquad (10.1)$$

Therefore, forwards are insensitive to volatility in this introductory example. They actually
might be volatility-sensitive under circumstances clarified below (see the discussion after Eq.
(10.2)) and in the following sections.
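The synthetic construction behind Eq. (10.1) can be checked numerically. The following is a minimal sketch, assuming a constant rate and a non-dividend-paying stock; the function names and the numerical inputs are illustrative and not part of the lectures.

```python
import math

def forward_price(spot, r, tau):
    """Forward price of a traded asset under Eq. (10.1): F = exp(r * tau) * S."""
    return math.exp(r * tau) * spot

def synthetic_forward(spot_t, spot_T, r, tau):
    """Borrow b(t,T) * F and buy one share at t; unwind both legs at T."""
    F = forward_price(spot_t, r, tau)
    bond_price = math.exp(-r * tau)        # b(t,T) when the short-term rate is constant
    cost_at_t = spot_t - bond_price * F    # zero by construction of F
    payoff_at_T = spot_T - F               # sell the share, repay the face value F
    return cost_at_t, payoff_at_T

# Hypothetical inputs: spot 100 today, 110 at expiry, 2% rate, six months to expiry
cost, payoff = synthetic_forward(spot_t=100.0, spot_T=110.0, r=0.02, tau=0.5)
print(cost, payoff)
```

The trade costs nothing at inception and pays the terminal stock price minus $F$ at expiry, which is the payoff of a long forward; note that no volatility input enters.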
The pricing of a forward on a stock can be extended to cases where the underlying is not
traded. Suppose that a risk materializes at time $T$, say the level of average temperature over
a certain period preceding $T$, and over a pre-specified geographical area, or say the realized
volatility experienced by the S&P 500 Index over the month preceding $T$. Denote this risk with
$S_T$. The payoff of a forward contract from the perspective of its buyer is given by $S_T - F_t(T)$,
such that by risk-neutral evaluation,

$$F_t(T) = \mathbb{E}_t(S_T), \qquad (10.2)$$

where $\mathbb{E}_t$ denotes the risk-neutral expectation. Naturally, Eq. (10.2) collapses to Eq. (10.1),
should $S_T$ be the price of a traded stock.
In general, though, Eq. (10.2) reveals that it is unlikely we could come up with a
preference-free evaluation of a non-traded risk, as expected from the theory developed in Part I of these
lectures. In these cases, the volatility of $S_T$ may well affect the forward price $F_t(T)$ due to
risk premiums.
There are exceptions in which the pricing in Eq. (10.2) is model-free even if $S_T$ is not traded.
An important and relatively recent advance is to have shown how to proceed with the model-free
evaluation of non-traded risks, provided a sufficiently high number of additional derivatives
are written on this risk. The CBOE-VIX index of expected volatility does rely on this idea as
explained in Section 10.8.
10.2.2 Forwards as a means to borrow money
Forward contracts can be used to borrow money. We can do the following: (i) go long a forward,
which at time $T$, delivers the payoff $-F_t(T) + S_T$; (ii) short-sell the underlying asset, which at
time $T$, will give rise to a payoff of $-S_T$. So, (i) and (ii) are such that now, we have access to
$S_t$ dollars, due to (ii), and at time $T$, we pay $F_t(T)$, i.e. the sum of the two payoffs resulting
from (i) and (ii) is $-F_t(T)$. By Eq. (10.1), this is tantamount to borrowing money at the interest
rate $r$.
10.2.3 Marking to market
Consider a derivative we go long at time $t = 0$, when it is worthless. As time unfolds, its value
will change, which calls for marking it to market. Suppose the derivative pays off $S_T - F_0(T)$
at time $T$, where $S_t$ is the price of some asset as of time $t$, and $F_0(T)$ is set so as to make
the derivative worthless at time zero. Assuming that interest rates are constant, we have that
$F_0(T)$: $\mathbb{E}_0[S_T - F_0(T)] = 0$, where $\mathbb{E}_0$ is the expectation at time $t = 0$, taken under the
risk-neutral probability. That is, $F_0(T) = \mathbb{E}_0(S_T)$. The market value of the derivative at time
$t$, say $\mathrm{MtM}_t$, is simply the present value of the expected payoff at $T$, under the risk-neutral
probability, $e^{-r(T-t)}\,\mathbb{E}_t[S_T - F_0(T)]$, or

$$\mathrm{MtM}_t = e^{-r(T-t)} \left( F_t(T) - F_0(T) \right). \qquad (10.3)$$

For more elaborate payoffs, such as those depending on the realizations of the underlying risks
over the life of the contract, marking to market updates may be more intricate than that in
Eq. (10.3), as in the case of the variance contracts (see Section 10.7.3).
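Eq. (10.3) translates into a one-line computation. The sketch below assumes a traded underlying, so that forward prices follow Eq. (10.1); the function name and the numbers are hypothetical.

```python
import math

def mark_to_market(forward_now, forward_at_inception, r, tau):
    """Eq. (10.3): discounted gap between the current and the initial forward price.

    tau is the remaining time to maturity, T - t.
    """
    return math.exp(-r * tau) * (forward_now - forward_at_inception)

r = 0.03
F0 = 100.0 * math.exp(r * 1.0)   # forward agreed at inception, one year to maturity
Ft = 104.0 * math.exp(r * 0.5)   # six months later, with the spot at 104
mtm = mark_to_market(Ft, F0, r, tau=0.5)
print(mtm)
```

With a traded underlying the result equals the current spot minus the present value of the agreed forward price, so the long side gains when the spot grows faster than the money market account.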
10.2.4 Futures
Forward contracts are typically OTC, and not standardized, and might not be traded after
their inception. Futures are, instead, standardized. The cost of entering into a futures contract
is zero, as for a forward. However, the central feature of futures contracts is marking-to-market,
which forces their value to be zero at any time we wish to enter into them after their inception.
Note, in contrast, that in general, we should have to pay (or be paid) to enter into a forward
contract after its inception. Indeed, the payoff at $T$ pertaining to a forward that we enter at
time $t$ is $S_T - F_t(T)$, such that similarly as for Eq. (10.3), the value of the forward at $\tau$ (and
originated at $t$) is as in the following mark-to-market update,

$$e^{-r(T-\tau)}\, \mathbb{E}_\tau\left( S_T - F_t(T) \right) = e^{-r(T-\tau)} \left( F_\tau(T) - F_t(T) \right).$$

By construction, forwards are not standardized, in that their value (the cost of entering
into them) depends on when we enter them! Moreover, in OTC markets, it is often the case
that the appropriate marking notion is that of a mark-to-model, rather than mark-to-market.
Futures work differently, as the cost of entering into them is always zero, as mentioned.
Precisely, let $\mathcal{F}_t(T)$ be the price process of the future at time $t$. Intuitively, while we hold a
futures position, we only pay or receive differences up to the maturity $T$,
$\mathcal{F}_{t_i}(T) - \mathcal{F}_{t_{i-1}}(T)$, over pre-specified periods, $t_i - t_{i-1}$.1 Finally, we could close any
position at any time before $T$ by just initiating an opposite position.
For reasons developed below, let us assume that the short-term rate $r_t$ is random. The two
defining properties of $\mathcal{F}_t(T)$ are that (i) at maturity, the futures settle at $S_T$, i.e.
$\mathcal{F}_T(T) = S_T$ (a boundary condition that is part of the futures security design), and (ii) the
gains generated by $\mathcal{F}_t(T)$ are continuously credited to or debited from the futures holder's
account, such that in absence of arbitrage,

$$0 = \mathbb{E}_t\left[ \int_t^T e^{-\int_t^u r_s ds}\, d\mathcal{F}_u(T) \right].$$

The previous equality actually holds for any investment horizon up to $T$. Therefore, we have,
heuristically, that the discounted cumulative gains process is a martingale under the risk-neutral
probability, or $0 = \mathbb{E}_t( d\mathcal{F}_t(T) )$, implying that $\mathcal{F}_t(T)$ is a martingale too. Therefore,

$$\mathcal{F}_t(T) = \mathbb{E}_t(S_T).$$

In contrast, it is easy to see that forward prices satisfy,

$$F_t(T) = \frac{1}{b(t,T)}\, \mathbb{E}_t\left( e^{-\int_t^T r_u du}\, S_T \right),$$

where $b(t,T)$ is the price of a zero coupon bond. In other words, the futures price is a martingale
under the risk-neutral probability, while the forward is not. In Chapter 12, we show that forward
prices are martingales under another probability, referred to as the forward probability (see also
Chapter 4). Naturally, both futures and forward prices are martingales under the risk-neutral
probability, once interest rates are constant or deterministic, although this case is obviously not
relevant whilst dealing with the evaluation of fixed income securities.
10.2.5 Backwardation and Contango
Let us assume interest rates are constant, such that forwards and futures are valued the same.
A natural question arises: do markets price forwards or futures at a discount or premium?
We have two notions of discounts: one, regarding expectations of future prices; and a second,
more operational, regarding current prices.
As for the first notion, we say markets are in Backwardation if the forward price is lower than
the expected spot price, $F_t(T) = e^{r(T-t)} S_t < E_t(S_T)$, and in Contango otherwise, i.e.
$F_t(T) > E_t(S_T)$, where $E_t$ denotes the expectation taken under the physical probability.
Clearly, markets are in Backwardation if prices follow Geometric Brownian motions, as
$E_t(S_T) = e^{\mu(T-t)} S_t$, where $\mu$ is the drift coefficient under the physical probability, provided
$\mu > r$. Keynes (1930) and Hicks (1939) would refer to markets in Backwardation as being the
standard situation, one of Normal Backwardation. According to the Keynes-Hicks Normal
Backwardation hypothesis, commodity producers are more incentivized to hedge their risk than
consumers, which leads forward prices to be cheaper than their current expectations. In terms of
the introductory example in this section, risk-averse chocolate producers would like to short
forwards, and their risk-aversion would make them likely to accept low forward prices for the
chocolate they produce, that is, lower than the chocolate price at which they expect to sell.

1 In addition to these margins, one is also typically due to provide an initial margin, which aims
to mitigate concerns regarding the solvency of the trading parties.
A second notion of discounts relies on spot prices, rather than the difficult-to-estimate
expected prices. The criterion Keynes and Hicks would actually rely on links to the forward
price being lower than the current spot price, not the expected spot. Accordingly, markets are
in Backwardation if $F_t(T) < S_t$, and in Contango otherwise. It is easy to see that under this
criterion, markets are now typically in Contango, as $F_t(T) = e^{r(T-t)} S_t > S_t$, as predicted by Eq.
(10.1)! To restore Backwardation, we need a mechanism by which spot prices are rich. One
possibility is to assume that the no-arbitrage arguments underlying Eq. (10.1) break down. For
example, suppose that $F_t(T) < e^{r(T-t)} S_t$. A standard argument suggests to go long a forward
contract, short the asset, and invest the short-sale proceeds into a money market account. Come
time $T$, the money market account would deliver $e^{r(T-t)} S_t$, and by honouring the forward at $T$,
which costs $F_t(T)$, we would buy the asset and, finally, close the short-sale position. The net
payoff at $T$ would be equal to $e^{r(T-t)} S_t - F_t(T) > 0$. An informal argument at this juncture
would be that in the presence of a selling pressure aiming to exploit the arbitrage, the spot
price would go down, thereby restoring Eq. (10.1) and, then, Contango. But in the presence of
frictions such as short-sale constraints, the downward price pressure would not arise in the first
place, leading to Backwardation.
[Figure: two panels, labelled Backwardation and Contango, each plotting the Spot and Forward
prices against time, from 0 to 1.]

The presence of a convenience yield is an alternative assumption to that of short-sale
constraints, which we can make to restore Backwardation. A convenience yield represents the flow
of services accruing to the owner of the asset (not the owner of the forward). In the presence
of a constant convenience yield, denoted with $\delta$, the forward price satisfies:2

$$F_t(T) = e^{(r-\delta)(T-t)} S_t, \qquad (10.4)$$

such that we have Backwardation again, for $\delta$ large enough.3 Note, however, that Contango
would be a normal situation in storable commodities markets characterized by costs of carry
such as warehousing fees or foregone interest: a cost of carry would imply a negative $\delta$ in Eq.
(10.4). Thus, and according to the sign and magnitude of $\delta$, we can either have Backwardation
or Contango.
Some commodity markets can actually switch from being in Backwardation to Contango, and
in Backwardation again, through cycles. To account for these cycles, we may wish to consider
a model with a stochastic convenience yield; an instance of such a model is one where under
the risk-neutral probability, the spot price is solution to

$$\frac{dS_t}{S_t} = (r - \delta_t)\,dt + \sigma\, dW_t, \qquad d\delta_t = \left( \kappa(\theta - \delta_t) - \lambda v \right) dt + v\, dW^{\delta}_t, \qquad (10.5)$$

where $W_t$ and $W^{\delta}_t$ are standard Brownian motions under the risk-neutral probability (taken to
be independent in what follows), $\sigma$, $\kappa$, $\theta$ and $v$ are constant parameters, and $\lambda$ is a
risk-premium parameter arising because the stochastic convenience yield $\delta_t$ is not tradable and,
hence, its drift is not pinned down by no-arbitrage under the risk-neutral probability.
The expression for the forward price would collapse to that in Eq. (10.4), once we assume
the convenience yield is constant. In the general case,

$$F_t(T) = \mathbb{E}_t(S_T) = e^{r(T-t)} S_t\, \mathbb{E}_t\left( e^{-\int_t^T \delta_u du} \right). \qquad (10.6)$$

That is, a stochastic convenience yield might drive the correlation between the forward and the
spot to values lower than one.
The expectation in Eq. (10.6), $\mathbb{E}_t( e^{-\int_t^T \delta_u du} )$, is actually known in closed-form. It is the
same as the price of a zero coupon bond predicted by a model where the short-term rate is
$\delta_t$, and is solution to the second equation in (10.5). This model, developed by Vasicek (1977),
is discussed in Chapter 12. Naturally, the important implication of this model is that according
to the specific values taken by $\delta_t$, markets can either be in Backwardation or in Contango, both in
terms of the criterion comparing the forward price to the current spot and the expected spot.
Finally, the nature of the relation between the fictitious bond price $\mathbb{E}_t( e^{-\int_t^T \delta_u du} )$ and the
volatility parameter $v$ is indeterminate. As discussed in Chapter 12, there are two effects that
explain how the price of any zero coupon bond links to the volatility of the short-term rate,
a convexity effect and a risk-premium effect. The former acts so as to lead the fictitious bond
price to be increasing in $v$, and the latter acts in the same direction if $\lambda > 0$, and in the opposite
direction if $\lambda < 0$. For short maturities, the risk-premium effect dominates over convexity, such
that the fictitious bond price and, hence, the forward price, is decreasing in the volatility
parameter $v$, provided $\lambda < 0$. However, if $\lambda > 0$, the forward price is increasing in $v$, for any maturity.

2 To prove Eq. (10.4), we generalize the reasoning leading to Eq. (10.1), as follows. Suppose to
the contrary that $F_t(T) < e^{(r-\delta)(T-t)} S_t$. Then we go long a forward contract, short
$n(t) = e^{-\delta(T-t)}$ shares at $t$, and invest the proceeds in a money market account. The short-sale
at $t$ implies that we need to pay out the convenience yield $\delta S_u$ at any time $u \in (t,T)$, which
we finance by shorting additional $\delta\, du$ shares per share already held, such that over any
infinitesimal amount of time, we short $dn(u) = \delta\, n(u)\, du$ additional shares. In total, then, at time
$T$, we would have shorted just one share, as $n(T) = n(t)\, e^{\delta(T-t)} = 1$, which we buy back by
honouring the forward, which costs $F_t(T)$, as usual. Finally, at time $T$, the money market account
would yield $n(t)\, S_t\, e^{r(T-t)} = e^{(r-\delta)(T-t)} S_t$. Therefore, the net payoff at $T$ would be equal
to $e^{(r-\delta)(T-t)} S_t - F_t(T) > 0$, an arbitrage.
3 To derive Eq. (10.4), note that by no-arbitrage, $\mathbb{E}_t\left( \frac{dS_t}{S_t} \right) + \delta\, dt = r\, dt$, where $\delta$ is the
constant dividend yield. A convenience yield is simply $\delta$. A storage cost is a negative convenience
yield.
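The spot-based criterion of Section 10.2.5 can be illustrated with a constant convenience yield. This is a minimal sketch under Eq. (10.4), writing the convenience yield as `delta`; the function names and parameter values are hypothetical, and ties are assigned to Contango, in line with the "Contango otherwise" convention above.

```python
import math

def forward_with_convenience_yield(spot, r, delta, tau):
    """Forward price with a constant convenience yield: F = exp((r - delta) * tau) * S."""
    return math.exp((r - delta) * tau) * spot

def regime(spot, forward):
    """Spot-based criterion: Backwardation if the forward is below the spot."""
    return "Backwardation" if forward < spot else "Contango"

spot, r, tau = 100.0, 0.03, 1.0
for delta in (-0.02, 0.0, 0.05):   # storage cost, no yield, large convenience yield
    F = forward_with_convenience_yield(spot, r, delta, tau)
    print(delta, round(F, 2), regime(spot, F))
```

A negative `delta`, i.e. a cost of carry, pushes the forward further above the spot, whereas a convenience yield exceeding the short-term rate is what delivers Backwardation.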

10.3 Optionality and no-arb bounds


We analyze the well-known fundamental optionalities displayed by options: the right, but not
the obligation, to buy or sell a given asset at a given price (the strike, or exercise price) at
some future date. While Chapter 4 illustrates foundational issues regarding evaluation of these
contracts, this chapter describes additional properties of these contracts that help explain their
use in the market practice, including those at the heart of the recent waves of financial innovation
and underlying volatility trading. This is the first section that illustrates a few fundamental
properties of European options, with a few examples.
10.3.1 Model-free properties
Let $c$ and $p$ be the prices of the call and the put option, with $S$ the price of the asset underlying
these contracts, and with $K$ and $T$ the exercise price and the expiration date. Let $t$ be the
current time. As we know, at the expiration, $c_T = (S_T - K)^+$ and $p_T = (K - S_T)^+$. Figure 10.1
depicts the net profits generated by holding or short-selling one asset and one option written
on the asset. We take the short-term rate $r = 0$ to simplify the presentation. The exposure to
losses generated by going long an asset drops by trading the appropriate option, i.e. by a long
position in a call, as illustrated by the top panel, at least provided $c < S$, which is indeed
a no-arbitrage condition, as shown below. It is this insurance feature that makes the option
economically valuable. Likewise, the exposure to losses relating to a short position in the asset
is mitigated by trading the appropriate option, i.e. by a long position in the put option, as
illustrated by the bottom panel, provided again that $p < K$, another no-arbitrage condition.
Once again, this feature makes the option economically valuable.

[Figure 10.1: two panels plotting P&Ls against S_T, from 85 to 120. Top panel, "Bullish view":
long stock, long call, short put. Bottom panel, "Bearish view": short share, long put, short call.]

FIGURE 10.1. Top panel: The solid line depicts the P&L of an at-the-money European
call option when the interest rate is zero, $(S_T - K)^+ - c$, where $S_T$ is the stock price at
expiration, $K = 100$ is the strike price, and $c = 5$ is the price of a call. The dashed line is
the P&L from holding the stock, $S_T - S$, with $S = 100$. The dotted line is the payoff from
the sale of a put option, $p - (K - S_T)^+$, where by the put-call parity, $p = c - S + K = 5$.
The bottom panel depicts P&Ls from going long a put, $(K - S_T)^+ - p$ (the solid line),
shorting the stock, $S - S_T$ (the dashed line), and shorting a call, $c - (S_T - K)^+$ (the
dotted line).

The prices of the call and put options are related by the put-call parity. Let $b(t,T)$ be the
time $t$ price of a zero maturing at time $T > t$. Then, the prices of a put and a call option with
the same exercise price $K$ and the same expiration date $T$ satisfy,

$$p_t = c_t - S_t + K\, b(t,T). \qquad (10.7)$$

To show Eq. (10.7), consider two portfolios: (a) long one call, short one underlying asset, and
invest $K\, b(t,T)$; (b) long one put. The table below gives the value of the two portfolios at time
$t$ and at time $T$.

                                       Value at T
       Value at t                  S_T <= K     S_T > K
(a)    c_t - S_t + K b(t,T)        K - S_T      0
(b)    p_t                         K - S_T      0

The two portfolios have the same value in each state of nature at time $T$. Therefore, their
values at time $t$ must be identical to rule out arbitrage. Alternatively, Eq. (10.7) follows by
taking conditional expectations, under the risk-neutral probability, of the identity:
$e^{-r(T-t)} (K - S_T)^+ = e^{-r(T-t)} (S_T - K)^+ + e^{-r(T-t)} K - e^{-r(T-t)} S_T$.
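The state-by-state comparison in the table can be verified directly. The snippet below is a small sketch with hypothetical function names; it checks that portfolios (a) and (b) pay the same at expiry, and recovers the put price used in Figure 10.1 from Eq. (10.7).

```python
def expiry_values(S_T, K):
    """Values at T of portfolios (a) and (b) used to show Eq. (10.7)."""
    portfolio_a = max(S_T - K, 0.0) - S_T + K   # long call, short asset, zeros paying K
    portfolio_b = max(K - S_T, 0.0)             # long put
    return portfolio_a, portfolio_b

def put_from_parity(call, spot, K, zero_price):
    """Eq. (10.7): p = c - S + K * b(t,T)."""
    return call - spot + K * zero_price

for S_T in (60.0, 90.0, 100.0, 110.0, 140.0):
    a, b = expiry_values(S_T, K=100.0)
    assert a == b                               # same payoff in every state

# Zero interest rate, at-the-money call worth 5, as in Figure 10.1
print(put_from_parity(call=5.0, spot=100.0, K=100.0, zero_price=1.0))
```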
By the put-call parity, the properties of European puts are easily found to follow from those
of calls. Therefore, we only focus on calls, whenever possible. The price of a European call
option satisfies the following bounds:

$$\max\{0,\ S_t - K\, b(t,T)\} \le c(S_t; K; T-t) \le S_t. \qquad (10.8)$$

Indeed, consider two portfolios: (a) long one call; (b) long the asset underlying the call and issue
debt for an amount equal to $K\, b(t,T)$. The table below gives the value of the two portfolios at
time $t$ and at time $T$.

                                       Value at T
       Value at t                  S_T <= K     S_T > K
(a)    c_t                         0            S_T - K
(b)    S_t - K b(t,T)              S_T - K      S_T - K

The value of portfolio (a) dominates that of portfolio (b) at $T$, and the same must be true at
time $t$. Moreover, the price is positive because the payoff of the option is never negative.
Therefore, the first inequality in (10.8) is true. As for the second inequality, suppose the
contrary, i.e. $c_t > S_t$. Then, at time $t$, we could sell one call and buy the underlying asset, thus
making a profit equal to $c_t - S_t$. Come time $T$, the option will be exercised if $S_T > K$, in which
case we shall sell the underlying asset and obtain $K$. If $S_T \le K$, the option will not be
exercised, and we will still hold the asset or sell it and make a profit equal to $S_T$. Eq. (10.8)
implies the following asymptotic behavior of the call price: (i) $\lim_{S \to 0} c(S; K; T-t) \to 0$, (ii)
$\lim_{K \to 0} c(S; K; T-t) \to S$, and (iii) $\lim_{T \to \infty} c(S; K; T-t) \to S$.
The top panel of Figure 10.2 illustrates the basic arbitrage bounds in (10.8), as well as the
limiting behavior of the price for $S$ small and for $S$ large. First, the price $c$ must be in the
region within the $c = S$ (45-degree) line and the $c = S - K\, b(t,T)$ line. Moreover, $c$ is small
when $S$ is small, and large when $S$ is large. However, $c$ cannot lie outside the region within
these two lines, which implies that $c$ gets large by sliding up on the $c = S - K\, b(t,T)$ line.
[Figure 10.2: top panel, the call price c(t) against S(t), bounded by the 45-degree line and by
the line starting at K b(t,T); bottom panel, two possible shapes of c(t) as a function of S(t).]

FIGURE 10.2.

How does the option price behave in the region within the $c = S$ and $c = S - K\, b(t,T)$ lines?
We cannot tell. We may simply say that given the boundary behavior of $c$, if $c$ is convex in
$S$, it is also increasing in $S$. Convexity of $c$ is a reasonable property, which holds for basic
diffusive models, as originally noted by Bergman, Grundy and Wiener (1996). In this case, $c$ would
behave as in the left-hand side of the bottom panel of Figure 10.2. This case seems to be
relevant, empirically, and consistent with the predictions of the celebrated Black and Scholes
(1973) formula, and some of its extensions. However, it is not a general property of option
prices. Bergman, Grundy and Wiener (1996) provide several counter-examples where $c$ can be
decreasing over some range of $S$, arising in models with jumps, or with stochastic volatility.
Theoretically, we cannot rule out that the option price behaves as in the right-hand side of the
bottom panel of Figure 10.2, as further developed in Section 10.5 [in progress].
The economic meaning of convexity is that the option is unlikely to be exercised when $S$
is small. Therefore, changes in $S$ have little effect on $c$. However, the option is likely to be
exercised when $S$ is large. A percentage increase in $S$ is then to be followed by an even higher
percentage increase in $c$. In other terms, the elasticity of the option price with respect to the
asset price is larger than one, $\frac{\partial c}{\partial S} \frac{S}{c} > 1$.4 Therefore, option returns are likely more
volatile than those on the underlying asset.
How does time-to-maturity affect the call price? Calls are known to be wasting assets,
meaning that their value decreases over time, as illustrated by a hypothetical example in Figure
10.3, which plots the option price function relating to three maturity dates, $T_1 > T_2 > T_3$, as
predicted by the Black & Scholes model.

4 For any increasing and convex function, which is zero at the origin, the tangent is higher than
the secant.

[Figure 10.3: call price against the underlying asset price, from 80 to 120, for T = 1Y, T = 6m
and T = 3m.]

FIGURE 10.3. The value of a call option struck at $K = 100$, as time to maturity
shrinks, as predicted by the Black & Scholes model, with volatility parameter equal to
20% and short-term rate $r = 1\%$. The leftmost solid line is the price corresponding to time
to maturity $T - t$ = one year, the dashed line is the price corresponding to time to maturity
$T - t$ = six months, and the dotted line is the price corresponding to time to maturity $T - t$ =
three months. The rightmost solid line is the no-arbitrage bound $(S - K)^+$.

10.3.2 Hedging
Financial intermediaries such as investment banks sell options that they want to hedge against,
to avoid the exposure to losses illustrated in Figure 10.1. Hedging is important when the only
objective is to receive fees from the sale of derivatives. The portfolio that mimics the option
price must display the properties discussed in Section 10.3.1. For example, we need to ensure
that it behaves as the call price behaves in the left-hand side of the bottom panel of Figure 10.2,
which is the most relevant, empirically.
We require this portfolio to exhibit a number of properties: (i) its value, $V$, should be
increasing in $S$, which is ensured by including the asset underlying the option into the portfolio; (ii)
the sensitivity of $V$ with respect to $S$ must be positive and bounded by one,
$0 < \frac{\partial V}{\partial S} < 1$, which we can achieve once the number of underlying assets is less than one;
(iii) the elasticity of $V$ with respect to $S$ must be greater than one, $\frac{\partial V}{\partial S} \frac{S}{V} > 1$, a condition
that could be met by issuing debt. Mathematically, the value of the replicating portfolio should be
$V = \Delta S - D$, where $\Delta$ denotes the number of the underlying assets, with $\Delta \in (0,1)$, and $D$
is debt. In principle, this portfolio might lead these three properties to be satisfied.
In fact, hedging is dynamic in nature, because option prices obviously change over time.
Therefore, we expect $\Delta$ to be a function of the underlying asset price, $S$, and time to expiration
of the option, in accordance with theory set forth in Part I of these lectures, especially in Chapter
4. The portfolio needs to satisfy additional properties: (iv) the number of the underlying assets
must increase with $S$, and the value of the portfolio should be virtually insensitive to changes
in $S$ when $S$ is low, and slide up through the no-arbitrage bound line in Figure 10.3 when $S$ is
large. These conditions are met if $\Delta$ increases with $S$, with $\lim_{S \to 0} \Delta(S) \to 0$ and
$\lim_{S \to \infty} \Delta(S) \to 1$.
Finally, the portfolio needs to be self-financed, as the long position in the option does not
entail additional inflows or outflows until expiration: any additional purchase of
the underlying asset has to be financed by issuing new debt, and the proceeds of any additional
sale of the underlying asset must be used to shrink existing debt. Section 10.4 aims to derive these
hedging strategies within the Black & Scholes market.
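Properties (i)-(iv) can be checked on a concrete candidate. The sketch below assumes the Black & Scholes market of Section 10.4, where the number of shares in the replicating portfolio is the standard Black & Scholes delta and debt is the difference between the stock position and the call price; the function names and the parameter values (strike 100, volatility 20%, rate 1%, one year to expiry, as in Figure 10.3) are illustrative.

```python
import math

def norm_cdf(x):
    """Cumulative standard Normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_and_delta(S, K, r, sigma, tau):
    """Black & Scholes call price and delta (number of shares in the replicating portfolio)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)

K, r, sigma, tau = 100.0, 0.01, 0.20, 1.0
rows = []
for S in (70.0, 85.0, 100.0, 115.0, 130.0):
    price, delta = bs_call_and_delta(S, K, r, sigma, tau)
    debt = delta * S - price          # amount borrowed to fund the stock position
    elasticity = delta * S / price    # percentage response of the value to the asset price
    rows.append((S, price, delta, debt, elasticity))
    print(S, round(price, 3), round(delta, 3), round(debt, 3), round(elasticity, 2))
```

On this grid the delta lies between zero and one and rises with the asset price, debt is positive, and the elasticity exceeds one, in line with the requirements listed above.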
10.3.3 A case study: accumulators, decumulators
Options can be used to build up dedicated structured products, such as those relying on baskets
of options. Consider, for example, an accumulator. Anecdotal evidence suggests that at times,
accumulators might be quite popular amongst private bankers in Hong Kong. An accumulator
is a portfolio which is long one call and short two or more puts. (A decumulator is short one
accumulator.) The strike of the call is higher than the strike of the put. The rationale behind
going long the call is to ensure profits are made once the market is up. Instead, puts are sold to
finance the long position in the call, so as to make the value of the accumulator equal to zero
at its inception. Consider, for example, Figure 10.4, where the strike of the call is $K_c = 100$.
Note that since the value of the accumulator is zero at inception, the picture actually depicts
net profits.5

[Figure 10.4: payoff of an accumulator against S_T, from 85 to 120.]

FIGURE 10.4. The lines depict the payoff guaranteed by an accumulator, a structured
product that is long one call option with strike price $K_c = 100$, and short $n$ puts
with strike price $K_p = 90$, and $n = 2$ (solid line), and $n = 4$ (dashed line).

If the current market level is $S = 102$, profits are likely to be made, at least provided the
market does not fall below the strike at which the put is struck, which in this example is
$K_p = 90$. However, accumulators are quite risky: the losses they might lead to during market
downturns can be quite severe compared to possible gains in good times.
The size of the losses obviously depends on the number of puts to sell, $n$, and their strike,
$K_p$. As the previous picture reveals, losses widen as we increase the number of puts we go
short. Therefore, we can decrease the size of any losses we might experience by just fixing
$K_p = 90$, and decreasing $n$, although this might entail less resources left over to go long a call.
A possibility might then be to go long a less expensive call, by adding a knock-out feature into
the call contract (one that says that the option becomes worthless once the market reaches a
certain level such as, say, 105 within the investment horizon), or by purchasing a call with a
higher strike price. Alternatively, we might be willing to design a product with more risk, but
also more upside, by choosing an appropriate strike for the puts. Obviously, puts become more
valuable as we increase the strike price. Therefore, we could increase $K_p$, from 90 to 95 say,
whilst keeping $n$ constant. While selling put options with higher strikes increases the probability
to have losses, it also allows us to purchase more expensive calls, those with lower strikes, which
leads to a higher probability to achieve positive returns.
Naturally, the previous reasoning hinges upon the assumption that the accumulator is
self-financed at inception. This condition defines the type of options we can afford. Some call options
can be too expensive and might require a large exposure relating to the short position in puts.
For example, some calculations show that in a market with stochastic volatility such as that of
Heston (1993b), we may need to sell short approximately six puts when the current index level
is $S = 102$ and $K_p = 95$. Section 10.5.3 develops a case study where these risks are quantified
under a variety of alternative assumptions about strikes and market volatility.

5 Therefore, underlying Figure 10.4 is the assumption that the price of the put is half that of the
call. Below, we discuss this assumption, which we now make for illustrative purposes only, and
which we shall remove in Section 10.5.3.
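The asymmetry between gains and losses is easy to see from the payoff at expiry. The following sketch uses the strikes of Figure 10.4 as defaults; the function name is hypothetical, and the product is taken to cost nothing at inception, as assumed in the text.

```python
def accumulator_payoff(S_T, call_strike=100.0, put_strike=90.0, n_puts=2):
    """Payoff at expiry of an accumulator: long one call, short n_puts puts."""
    long_call = max(S_T - call_strike, 0.0)
    short_puts = -n_puts * max(put_strike - S_T, 0.0)
    return long_call + short_puts

for S_T in (80.0, 90.0, 100.0, 110.0, 120.0):
    print(S_T, accumulator_payoff(S_T, n_puts=2), accumulator_payoff(S_T, n_puts=4))
```

Below the put strike, each point lost by the market costs `n_puts` points, whereas gains above the call strike accrue one for one.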

10.4 Evaluation
This section provides pricing formulae and discusses hedging in the special case of the Black
and Scholes (1973) and Merton (1973) markets. It also examines issues relating to how hedging
might possibly spill over to the volatility of the underlying asset, thereby dealing with instances
of feedback effects, whereby the presence of derivatives (and trading activities on them) might
affect the dynamics of the very same underlying, a theme sometimes referred to as endogenous
risk (see Chapter 8).
The next section provides a general evaluation formula to price futures and options, which
goes beyond the assumptions underlying the Black & Scholes and Merton market. Section 10.4.2
provides a derivation of the Black-Scholes formula relying on this general framework, and also
relying on a replication argument. Section 10.4.3 [...]
[In progress]
10.4.1 A pricing formula
Forwards and options obviously differ, as they rely on the two distinct notions of obligation and
optionality. However, we can encompass their payoffs into a single one, the value of which will
be used a few times in this chapter. Consider, then, again, a contract similar to that in Section
10.2.1, where at time $T$, the payoff is given by $S_T - K$, for some constant value of $K$. We know
that the current value of this payoff is:

$$e^{-r(T-t)}\, \mathbb{E}_t(S_T - K) = S_t - e^{-r(T-t)} K, \qquad (10.9)$$

which is zero, for $K = F_t(T)$. We want to further analyze a situation where this current value is
not necessarily zero, and show that the previous expression is a special case of a quite important
pricing formula. Consider a payoff at time $T$, equal to $S_T - K$, provided the stock price at $T$ is
at least as large as some constant $\nu \ge 0$,

$$(S_T - K)\, \mathbb{I}_{S_T \ge \nu}.$$

For $\nu = 0$, this payoff is just that of a forward, and for $\nu = K$, the payoff is that of a European
call. To price this payoff, we proceed as follows:

$$
\begin{aligned}
e^{-r(T-t)}\, \mathbb{E}_t\left[ (S_T - K)\, \mathbb{I}_{S_T \ge \nu} \right]
&= S_t\, \mathbb{E}_t\left( \eta_T\, \mathbb{I}_{S_T \ge \nu} \right) - e^{-r(T-t)} K\, \mathbb{E}_t\left( \mathbb{I}_{S_T \ge \nu} \right) \\
&= S_t\, \mathbb{E}^{Q^S}_t\left( \mathbb{I}_{S_T \ge \nu} \right) - e^{-r(T-t)} K\, \mathbb{E}_t\left( \mathbb{I}_{S_T \ge \nu} \right) \\
&= S_t\, Q^S(S_T \ge \nu) - e^{-r(T-t)} K\, Q(S_T \ge \nu),
\end{aligned}
\qquad (10.10)
$$

where $\eta_T \equiv e^{-r(T-t)} \frac{S_T}{S_t}$, $Q$ is the risk-neutral probability given the information at time $t$,
$Q^S$ is a new probability, with Radon-Nikodym derivative given by

$$\frac{dQ^S}{dQ} = \eta_T, \qquad (10.11)$$

and, finally, $\mathbb{E}_t$ denotes the expectation under $Q$, and $\mathbb{E}^{Q^S}_t$ the expectation under $Q^S$. Naturally,
Eq. (10.10) collapses to Eq. (10.9), once $\nu = 0$, and to the celebrated Black and Scholes (1973)
formula, once we take $S_t$ to be a geometric Brownian motion, and $\nu = K$, as explained in the
following section. It is a general formula, and a quite useful one whilst dealing with difficult
models such as those where the volatility of the underlying asset return is not constant, as
illustrated in Section 10.5.
10.4.2 Black & Scholes
We assume that the stock price is a Geometric Brownian motion with parameters $(\mu, \sigma)$,

$$\frac{dS_t}{S_t} = \mu\, dt + \sigma\, dW_t,$$

where $W_t$ is a Brownian motion under the physical probability.

10.4.2.1 By replication: I

Replicating the value of an asset while trading in other assets is a theme framed in many
junctures of Part I of these lectures. For sake of completeness, we develop the arguments, which
ultimately lead to the Black & Scholes formula.
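We invest $\theta_t$ units in the asset underlying a call option, and $M_t$ in the money market account, and the value of this portfolio is $V_t = \theta_t S_t + M_t$ with obvious notation. Once self-financed, the value of this portfolio satisfies,
$$dV_t = \theta_t\,dS_t + r M_t\,dt = (\mu \theta_t S_t + r M_t)\,dt + \sigma \theta_t S_t\,dW_t$$ (10.12)
whereas by Itô's lemma, the call option price $C_t = C(t, S_t)$ say, satisfies,
$$dC_t = \left(\frac{\partial C}{\partial t} + \mu S_t \frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 C}{\partial S^2}\right)dt + \sigma S_t \frac{\partial C}{\partial S}\,dW_t$$ (10.13)
We conjecture that the value of this portfolio does exactly replicate the option value, a conjecture verified in the subsection below. Comparing Eq. (10.12) and Eq. (10.13) leaves (i) the matching condition for the diffusion terms, $\theta_t = \partial C / \partial S$, and (ii) the drift matching condition,
$$\frac{\partial C}{\partial t} + \mu S_t \frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 C}{\partial S^2} = \mu \theta_t S_t + r M_t = \mu \theta_t S_t + r (V_t - \theta_t S_t) = \mu S_t \frac{\partial C}{\partial S} + r\left(C_t - S_t \frac{\partial C}{\partial S}\right),$$
where the first equality matches the drifts in Eq. (10.13) and Eq. (10.12), the second follows by the expression for the value of the replicating portfolio, $M_t = V_t - \theta_t S_t$, and the third by the equality $V_t = C_t$ (arising due to the conjecture that the portfolio value is replicating the option) and the expression for $\theta_t$ in (i). Rearranging terms leaves,
$$\frac{\partial C}{\partial t} + r S \frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} = r C$$ (10.14)
subject to the boundary condition, $C(T, S) = (S - K)^+$.
The solution to Eq. (10.14) is the celebrated Black and Scholes (1973) formula,
$$C_{BS}(S_t, t; K, T, \sigma) = S_t \Phi(d_t) - K e^{-r(T-t)} \Phi\left(d_t - \sigma\sqrt{T-t}\right), \qquad d_t = \frac{\ln(S_t / K) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}$$ (10.15)
where $\Phi$ denotes the cumulative Normal distribution.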


Finally, Appendix 1 derives Eq. (10.14) hinging upon the original arguments developed by
Black and Scholes (1973), and Merton (1973), where one considers a portfolio comprising the
option, the underlying asset, and a portfolio strategy aiming to make the portfolio locally
riskless.
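As a quick numerical sanity check, the sketch below evaluates the Black & Scholes formula in Eq. (10.15) and verifies by central finite differences that it satisfies the pricing equation in Eq. (10.14). The function names (`bs_call`, `pde_residual`), the step sizes and the parameter values are illustrative assumptions, not part of the lectures.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # cumulative standard Normal distribution


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price, Eq. (10.15), with tau = T - t > 0."""
    d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return S * Phi(d) - K * exp(-r * tau) * Phi(d - sigma * sqrt(tau))


def pde_residual(S, K, r, sigma, tau, h=1e-2, k=1e-5):
    """Left minus right side of Eq. (10.14), by central finite differences."""
    C = bs_call(S, K, r, sigma, tau)
    up, down = bs_call(S + h, K, r, sigma, tau), bs_call(S - h, K, r, sigma, tau)
    C_S = (up - down) / (2 * h)
    C_SS = (up - 2 * C + down) / h**2
    # Calendar time t runs opposite to time to maturity tau: dC/dt = -dC/dtau.
    C_t = -(bs_call(S, K, r, sigma, tau + k) - bs_call(S, K, r, sigma, tau - k)) / (2 * k)
    return C_t + r * S * C_S + 0.5 * sigma**2 * S**2 * C_SS - r * C


# Illustrative (assumed) parameter values.
price = bs_call(100.0, 100.0, 0.01, 0.2, 0.5)
residual = pde_residual(100.0, 100.0, 0.01, 0.2, 0.5)
print(f"call price = {price:.4f}, PDE residual = {residual:.2e}")
```

The residual should be zero up to discretization and rounding error, which is what one expects if Eq. (10.15) solves Eq. (10.14).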
10.4.2.2 By replication: II

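Eq. (10.15) holds even without requiring that a market for the option exists over the option's life, or that the pricing function $C(t, S)$ is differentiable. That the option price is differentiable is a result, not an assumption. Let us define the function $C(t, S)$ that solves Eq. (10.14), with boundary condition $C(T, S) = (S - K)^+$. Note, we are not assuming this function is the option price. Rather, we shall show this is the option price. Consider a self-financed portfolio of bonds and stocks, with $\theta_t = \partial C(t, S_t) / \partial S$. Its value satisfies,
$$dV_t = \left(\mu S_t \frac{\partial C}{\partial S} + r\left(V_t - S_t \frac{\partial C}{\partial S}\right)\right)dt + \sigma S_t \frac{\partial C}{\partial S}\,dW_t.$$
Moreover, by Itô's lemma, $C(t, S_t)$ is solution to
$$dC_t = \left(\frac{\partial C}{\partial t} + \mu S_t \frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 C}{\partial S^2}\right)dt + \sigma S_t \frac{\partial C}{\partial S}\,dW_t.$$
Subtracting the previous two equations leaves:
$$dV_t - dC_t = \Big(\underbrace{-\frac{\partial C}{\partial t} - r S_t \frac{\partial C}{\partial S} - \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 C}{\partial S^2}}_{= -rC_t} + r V_t\Big)dt = r\,(V_t - C_t)\,dt.$$
Hence, we have that $V_\tau - C(\tau, S_\tau) = (V_0 - C(0, S_0))\,e^{r\tau}$, for all $\tau \in [0, T]$. Next, assume that $V_0 = C(0, S_0)$. Then, $V_\tau = C(\tau, S_\tau)$ and $V_T = C(T, S_T) = (S_T - K)^+$. That is, the portfolio $\theta_\tau = \partial C(\tau, S_\tau) / \partial S$ replicates the payoff underlying the option contract. Therefore, $V_0$ is the value of the option at time zero, even when a market for the option does not exist over its life.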
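The replication argument can be illustrated with a small simulation: a self-financed portfolio that starts with the Black & Scholes value and holds the Black & Scholes delta, rebalanced at discrete dates, ends up close to the option payoff. This is only a sketch under assumed parameter values; `hedge_error` is a hypothetical helper name, rebalancing is discrete rather than continuous (so the replication is approximate), and the stock is simulated under an arbitrary physical drift `mu`.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf


def d1(S, K, r, sigma, tau):
    return (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))


def bs_call(S, K, r, sigma, tau):
    d = d1(S, K, r, sigma, tau)
    return S * Phi(d) - K * exp(-r * tau) * Phi(d - sigma * sqrt(tau))


def hedge_error(rng, S0=100.0, K=100.0, r=0.01, mu=0.08, sigma=0.2, T=0.5, n=500):
    """Terminal portfolio value minus the call payoff, along one simulated path."""
    dt = T / n
    S = S0
    V = bs_call(S0, K, r, sigma, T)                 # initial capital V_0 = C(0, S_0)
    for i in range(n):
        theta = Phi(d1(S, K, r, sigma, T - i * dt))  # shares held over [t_i, t_{i+1}]
        M = V - theta * S                            # money market balance
        z = rng.gauss(0.0, 1.0)
        S_next = S * exp((mu - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * z)
        V = theta * S_next + M * exp(r * dt)         # self-financed update
        S = S_next
    return V - max(S - K, 0.0)


rng = random.Random(0)
errors = [hedge_error(rng) for _ in range(200)]
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(f"mean absolute replication error: {mean_abs_error:.3f}")
```

The error is small relative to the option value (about 5.9 with these inputs) and shrinks as the number of rebalancing dates `n` grows; note that the drift `mu` plays no role in the initial capital required.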
10.4.2.3 By probabilistic arguments

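Alternatively, we can use the general framework in Section 10.2.3 and arrive at Eq. (10.15). We simply set $\nu = K$ in Eq. (10.10), and calculate the dynamics of the stock price under $Q$ and under $Q^S$, and determine $Q(S_T \geq K \mid \mathcal{F}_t)$ and $Q^S(S_T \geq K \mid \mathcal{F}_t)$ in Eq. (10.10). Under $Q$, the stock price is the usual geometric Brownian motion with drift equal to $r$, and then, $Q(S_T \geq K \mid \mathcal{F}_t) = \Phi(d_t - \sigma\sqrt{T-t})$, with $d_t$ as in Eq. (10.15), which explains the second term in Eq. (10.15). As for the first term, we can show that the Radon-Nikodym derivative, $\xi_\tau \equiv \frac{dQ^S}{dQ}\big|_{\mathcal{F}_\tau} = e^{-r(\tau - t)} S_\tau / S_t$, is such that $\xi_\tau$ is solution to:
$$\frac{d\xi_\tau}{\xi_\tau} = \sigma\,d\tilde{W}_\tau,$$
where $\tilde{W}$ is a Brownian motion under $Q$, such that by the Girsanov theorem reviewed in Chapter 4, the stock price is solution to:
$$d\ln S_\tau = \left(r + \frac{1}{2}\sigma^2\right)d\tau + \sigma\,dW^S_\tau,$$
where $W^S$ is a Brownian motion under $Q^S$. It easily follows that $Q^S(S_T \geq K \mid \mathcal{F}_t) = \Phi(d_t)$, thereby completing the proof of the Black-Scholes formula.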
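The two exercise probabilities entering Eq. (10.10) can be checked by simulation. The sketch below draws the terminal stock price under the risk-neutral probability, estimates the first probability as a simple frequency, and estimates the second by weighting each exercise event with the Radon-Nikodym derivative in Eq. (10.11). Parameter values, the seed and variable names are assumptions made for illustration only.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

# Illustrative (assumed) inputs: spot, strike, short rate, volatility, time to maturity.
S, K, r, sigma, tau = 100.0, 105.0, 0.01, 0.2, 0.5
d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))

rng = random.Random(1)
n = 200_000
hits = 0          # counts the event S_T >= K under the risk-neutral probability
weighted = 0.0    # accumulates exp(-r tau) S_T / S on the same event
for _ in range(n):
    z = rng.gauss(0.0, 1.0)
    S_T = S * exp((r - 0.5 * sigma**2) * tau + sigma * sqrt(tau) * z)
    if S_T >= K:
        hits += 1
        weighted += exp(-r * tau) * S_T / S

prob_Q = hits / n        # estimate of the risk-neutral exercise probability
prob_QS = weighted / n   # estimate of the exercise probability under the new measure
call_mc = S * prob_QS - K * exp(-r * tau) * prob_Q
call_bs = S * Phi(d) - K * exp(-r * tau) * Phi(d - sigma * sqrt(tau))
print(prob_Q, Phi(d - sigma * sqrt(tau)))
print(prob_QS, Phi(d))
print(call_mc, call_bs)
```

Up to Monte Carlo error, the two estimates line up with the two Normal probabilities appearing in Eq. (10.15).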

10.4.3 Surprising cancellations and preference-free formulae


Chapter 4 explains that due to what Heston (1993a) (p. 933) quite aptly terms "a surprising cancellation," the constant $\mu$ doesn't show up in the final formula. Heston (1993a) shows that this property is not robust to modifications in the assumptions for the underlying asset price process. [In progress, Gamma processes, incomplete markets.]
However, even within a diffusion setting, the expected return on the option does of course depend on $\mu$. By Eq. (10.14),
$$\frac{1}{dt} E_t\left(\frac{dC_t}{C_t}\right) = r + \underbrace{\frac{\partial C}{\partial S}\frac{S_t}{C_t}\,\sigma}_{\equiv\,\sigma_C}\,\lambda$$ (10.16)
where $\sigma_C$ is, simply, by Itô's lemma, the instantaneous volatility of the option returns, and $\lambda$ is the unit risk-premium related to the fluctuations of the asset price, $\lambda = (\mu - r)/\sigma$.
10.4.4 Future options and Blacks formula
Consider a future option, one that gives the buyer the right, not the obligation, to enter into a future contract for a specified price $K$, at time $T$, such that the payoff at time $T$ is, $(F_T(T_F) - K)^+$, where $F_t(T_F)$ denotes the future price in Eq. (10.1) for delivery at $T_F \geq T$, $F_t(T_F) = e^{r(T_F - t)} S_t$. It is easy to see that $F_t(T_F)$ is a martingale under the risk-neutral probability $Q$. For example, assuming that $S_t$ is a Geometric Brownian motion with volatility $\sigma$, we have that,
$$\frac{dF_t(T_F)}{F_t(T_F)} = \sigma\,d\tilde{W}_t$$ (10.17)
where $\tilde{W}$ is a Brownian motion under $Q$. Therefore, the price of a future option as of time $t$ is,
$$C^F_t = e^{-r(T-t)} E_t\left(F_T(T_F) - K\right)^+ = e^{-r(T-t)}\left[F_t(T_F)\,\Phi(d^F_t) - K\,\Phi\left(d^F_t - \sigma\sqrt{T-t}\right)\right]$$ (10.18)
where
$$d^F_t = \frac{\ln\left(F_t(T_F)/K\right) + \frac{1}{2}\sigma^2 (T-t)}{\sigma\sqrt{T-t}}$$
and the second equality of Eq. (10.18) follows by the Black & Scholes formula, Eq. (10.15).
Eq. (10.18) is the celebrated Black's (1976a) formula, which turns out to be very useful in the context of fixed income security pricing, as explained in Chapter 12. Appendix 2 provides an alternative derivation of Eq. (10.18), based on the pricing approach of Section 10.2, and the slightly more general assumption that the volatility of the future price in Eq. (10.17) is time-varying but deterministic.
Chapter 12 explains that the property that future prices are martingales under the risk-neutral probability generalizes to one holding when interest rates are time-varying, under a certain probability called forward probability (see Chapter 12, Section 12.2).
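A minimal sketch of Black's formula in Eq. (10.18) follows, under the constant interest rate assumption of this section. As a consistency check, when the future contract expires at the option maturity the future price equals the spot price compounded at the short-term rate, and Black's formula should then return the Black & Scholes price of Eq. (10.15). The function names and input values are illustrative assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price on the spot, Eq. (10.15)."""
    d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return S * Phi(d) - K * exp(-r * tau) * Phi(d - sigma * sqrt(tau))


def black_call(F, K, r, sigma, tau):
    """Black's price of a call on a future with current future price F, Eq. (10.18)."""
    d = (log(F / K) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
    return exp(-r * tau) * (F * Phi(d) - K * Phi(d - sigma * sqrt(tau)))


# Illustrative (assumed) inputs.
S, K, r, sigma, tau = 100.0, 95.0, 0.02, 0.25, 0.75
F = S * exp(r * tau)   # future expiring at the option maturity
print(black_call(F, K, r, sigma, tau), bs_call(S, K, r, sigma, tau))
```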
10.4.5 Hedging
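The cloning arguments in Section 10.4.2 suggest how to replicate a call option in a Black and Scholes market. We set up a portfolio with a position $\Delta_t$ in the underlying asset and the remaining in the money market account, where
$$\Delta_t = \frac{\partial C_{BS}(S_t, t; K, T, \sigma)}{\partial S}$$ (10.19)
and $C_{BS}(\cdot)$ denotes the Black & Scholes formula.
Eq. (10.19) can actually be simplified. Note that the Black & Scholes formula is homogeneous of degree one in $S$ and $K$, that is, $C_{BS}(\lambda S, t; \lambda K, T, \sigma) = \lambda\,C_{BS}(S, t; K, T, \sigma)$ for any constant $\lambda$. Therefore, by Euler's theorem,
$$C_{BS}(S, t; K, T, \sigma) = S\,\frac{\partial C_{BS}}{\partial S} + K\,\frac{\partial C_{BS}}{\partial K}$$ (10.20)
Comparing Eq. (10.20) with the Black-Scholes formula in Eq. (10.15), produces,
$$\frac{\partial C_{BS}}{\partial S} = \Phi(d_t)$$ (10.21)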
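The homogeneity property, Euler's relation in Eq. (10.20) and the delta in Eq. (10.21) can all be verified numerically. The sketch below uses central finite differences on the Black & Scholes formula; the helper names, the step size and the parameter values are assumptions chosen for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf


def d1(S, K, r, sigma, tau):
    return (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))


def bs_call(S, K, r, sigma, tau):
    d = d1(S, K, r, sigma, tau)
    return S * Phi(d) - K * exp(-r * tau) * Phi(d - sigma * sqrt(tau))


# Illustrative (assumed) inputs.
S, K, r, sigma, tau, h = 100.0, 110.0, 0.01, 0.2, 0.5, 1e-3

C = bs_call(S, K, r, sigma, tau)
C_S = (bs_call(S + h, K, r, sigma, tau) - bs_call(S - h, K, r, sigma, tau)) / (2 * h)
C_K = (bs_call(S, K + h, r, sigma, tau) - bs_call(S, K - h, r, sigma, tau)) / (2 * h)

homogeneity_gap = bs_call(2.0 * S, 2.0 * K, r, sigma, tau) - 2.0 * C  # degree-one homogeneity
euler_gap = S * C_S + K * C_K - C                                     # Eq. (10.20)
delta_gap = C_S - Phi(d1(S, K, r, sigma, tau))                        # Eq. (10.21)
print(homogeneity_gap, euler_gap, delta_gap)
```

All three gaps should be zero up to numerical error, and the derivative with respect to the strike is negative, as Eq. (10.15) suggests.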

But why do we need to replicate derivatives, in practice? Because most of them are dealt with by investment banks, which simply act as financial intermediaries, trading derivatives on behalf of third parties, and being compensated through fees. Suppose an investment bank receives an order to sell a put. The bank would like to hedge against this put by creating a replicating portfolio such that the value of this portfolio is the same as the final payoff to be paid to honour the sale. So hedging is needed to replicate the final payoffs required to honour the contracts giving rise to these payoffs. Standard market practice is to use the Black-Scholes delta in Eq. (10.21).
Note that at the same time, investment banks, not to mention funds, can undertake speculative trading activities aimed at implementing specific views, such as those described in Section 10.5.5 below, in which case hedging doesn't necessarily need to be implemented. However, even in this case, hedging might be required to isolate the particular views a trading desk of the bank is taking. For example, Section 10.5.5 will explain that to express the view that equity volatility will rise, say, we cannot simply go long call options, because call prices are increasing both in volatility and in the price of the asset underlying the option. A better solution is to go long an option, delta-hedged through Black-Scholes, as we shall explain.

10.4.6 Endogenous volatility


[In progress]
Hedges and crashes. Delta-hedging can lead to financial turmoil. The 1987 crash, and the conclusion of the Brady commission. The theory in the Brady report is that the market was initially hit by bad news, and fell, triggering further sell-offs originating from program trading. Uninformed investors, who did not fully understand the nature of these developments, exited the market, and the fall they created would trigger additional program trades until the market could not reach the usual equilibrium where it used to stay, switching to another equilibrium through a discrete change: an equity market crash of more than 22% in just one day, the largest in financial history.
The flash crash of May 6th, 2010 might be considered a modern prototype version of the 1987 crash. According to the explanation put forward in a joint report by the SEC and the CFTC (SEC-CFTC, 2010), a mutual fund sold a quite large number of E-mini futures on the S&P 500 for mere hedging motives, but high frequency trading firms would then initiate arbitrage adjustments, by going long them and simultaneously shorting the equities representing the SPY, the ETF that tracks the S&P 500 index. To unwind these positions would normally require traders with fundamental views, but the majority of the players over those few minutes of trading were high frequency trading firms, who would then transact with each other over a self-exciting process, where trading aggressiveness intensifies as volume increases, due to the nature of algorithmic trading. In the end, large volume and sell-offs reinforced each other, whereby increasing volumes triggered algorithms to sell into a falling market, leading to a crash of nearly 10% in the equity market over a few minutes.
[Give a short outline of this section]
10.4.6.1 Hedges

A well-known definition is that of the Gamma of a derivative, which is the second order partial of the derivative price with respect to the underlying. The Gamma is always positive for long calls and puts, as these derivatives have positive convexity, as illustrated by Figure 10.1. Naturally, short calls and puts have negative gamma. In order for the statement "when gamma is negative, delta hedging involves buying on the way up and selling on the way down" to be true, we also have to consider whether the delta is positive or not (that is, whether the derivative price is increasing or decreasing in the underlying asset price). So we have four instances of hedging portfolios, classified by the gamma and the delta of the hedging portfolio (the position being hedged has the opposite gamma):
(i) Positive gamma: Buying on the way up and selling on the way down.
  (i.1) Hedging portfolios with positive delta, as required, for example, to hedge against the sale of a call. Positive delta means that the hedging portfolio relies on buying the assets underlying the call. When the price of these assets is up, the delta is also up, which implies we need to keep on buying even more of the assets underlying the hedging portfolio. On the other hand, when prices are down, the delta is also down, which implies holding less of the assets underlying the hedging portfolio, thereby leading us to sell some of these assets precisely when the market is down.
  (i.2) Hedging portfolios with negative delta, as required, for example, to hedge against the sale of a put. Negative delta means that the hedging portfolio relies on selling the assets underlying the put. In this case, delta is up when prices are up. However, this now simply means that we need to sell less! For example, delta might have been $-\frac{1}{2}$ before the market was up and now delta is $-\frac{1}{4}$: that is, we need to buy back some of the assets underlying the hedging portfolio. When, instead, prices are down, delta is also down, which means we need to sell even more into a depressed market.
(ii) Negative gamma: Buying on the way down and selling on the way up.
  (ii.1) Hedging portfolios with positive delta, as required, for example, to hedge against having gone long a put. Positive delta means that the hedging portfolio relies on buying assets underlying the put. Negative gamma now means that as soon as the price of these assets goes up (resp. down), we need to buy less (resp. buy more), so we sell when prices go up and buy when prices go down.
  (ii.2) Hedging portfolios with negative delta, as required, for example, to hedge against having gone long a call. We are now selling the assets underlying the call. Negative gamma, here, means that as the price of these assets goes up (resp. down), we need to sell more (resp. sell less), so once again, we sell when prices go up and buy when prices go down.
How to implement these hedging portfolios, in practice, is still an open question, as this issue is necessarily model-based. Section 10.5.4, for example, shows that delta hedging under the Black-Scholes assumptions would lead the bank to eliminate the risk of fluctuations in the underlying stock price. At the same time, however, hedging through Black-Scholes leaves the derivatives book quite messy once the fundamental assumption underlying the Black-Scholes world, constant volatility, does not hold, that is, once volatility changes randomly. In this case, hedging would rather look like a volatility view. To appropriately hedge, one has to rely on more complicated hedging strategies. For example, to hedge against an option in a world of stochastic volatility, we would need to use a stock, a bond, and, another ... option!
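The trading directions in cases (i.1) and (i.2) can be read off the Black & Scholes deltas. The sketch below, under assumed parameter values and with hypothetical helper names, computes call and put deltas and the common gamma at three levels of the underlying price: both deltas rise with the price, so a hedger replicating a call buys more on the way up, while a hedger replicating a put sells less (buys back) on the way up.

```python
from math import log, sqrt
from statistics import NormalDist

std_normal = NormalDist()


def d1(S, K, r, sigma, tau):
    return (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))


def call_delta(S, K, r, sigma, tau):
    return std_normal.cdf(d1(S, K, r, sigma, tau))


def put_delta(S, K, r, sigma, tau):
    return std_normal.cdf(d1(S, K, r, sigma, tau)) - 1.0


def gamma(S, K, r, sigma, tau):
    """Second derivative of the call (and put) price with respect to S."""
    return std_normal.pdf(d1(S, K, r, sigma, tau)) / (S * sigma * sqrt(tau))


# Illustrative (assumed) inputs.
K, r, sigma, tau = 100.0, 0.01, 0.2, 0.5
prices = [90.0, 100.0, 110.0]
call_deltas = [call_delta(S, K, r, sigma, tau) for S in prices]
put_deltas = [put_delta(S, K, r, sigma, tau) for S in prices]
gammas = [gamma(S, K, r, sigma, tau) for S in prices]

for S, dc, dp, g in zip(prices, call_deltas, put_deltas, gammas):
    print(f"S={S:6.1f}  call delta={dc:.3f}  put delta={dp:.3f}  gamma={g:.4f}")
```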
10.4.6.2 Crashes

[In progress]
Use a simple model, by Grossman, to illustrate how volatility is pumped-up by automatic
mechanisms. Then, discuss a streamlined version of Gennotte and Leland, with asymmetric
information, to illustrate the 1987 crash.
10.4.7 Properties of options in di usive models
We consider a simple model in which the stock price is solution to,
$$dS_\tau = \mu(S_\tau)\,d\tau + \sqrt{2\sigma(S_\tau)}\,dW_\tau$$ (10.22)
Let $C(t, S)$ be the price of a European-style option at time $t$, which pays off $\psi(S_T) \geq 0$ at time $T$. We assume that $\psi(S)$ is differentiable, with $\psi'(S) > 0$. In the absence of arbitrage, $C$ satisfies the following partial differential equation,
$$0 = \frac{\partial C}{\partial t} + r S \frac{\partial C}{\partial S} + \sigma(S)\frac{\partial^2 C}{\partial S^2} - r C$$ (10.23)
subject to the boundary condition, $C(T, S) = \psi(S)$.
[Plan of the section, in progress]
10.4.7.1 Passage of time

Call options are sometimes referred to as "wasting assets" because their value tends to decrease over time, due to a decrease in the value of the optionality, in a sense to be explained next. Define $\epsilon \equiv \frac{\partial C}{\partial S}\frac{S}{C}$ as the elasticity of the option price with respect to the asset price. For a call option, the elasticity $\epsilon \geq 1$ as noted in Section 10.3, such that Eq. (10.23) and convexity of $C$ leaves:
$$\frac{\partial C}{\partial t} = -r\,(\epsilon - 1)\,C - \sigma(S)\frac{\partial^2 C}{\partial S^2} \leq 0$$ (10.24)
Furthermore, note that convexity increases as time to maturity decreases, with the limit case arising at maturity when $\partial^2 C / \partial S^2$ shrinks to Dirac's delta. Therefore, the drop in value is the most severe in correspondence of shorter maturities. What happens, then, if we sell call options to buy them back later? Do options provide us with arbitrage opportunities? Obviously not. Selling a call and buying it later leads to profits only if market volatility is stable and the underlying asset price does not move too much as a result. This remark is indeed the motivation of some basic trading strategies known as calendar spreads, and further explained in Section 10.6.1. A trading strategy is not an arbitrage opportunity though, only a way to implement a particular view.
Note that for a European put option, the elasticity of the put price with respect to the asset is negative, and can actually lead the RHS of Eq. (10.24) to change sign, especially for deep in-the-money options, as Figure 10.5 reveals.

[Plot: two curves labeled T=3m and T=1Y; vertical axis ticks from 16 to 34, horizontal axis ticks from 66 to 84.]

FIGURE 10.5. The value of a put option with strike $K = 100$, as time to maturity shrinks, as predicted by the Black & Scholes model, with volatility parameter $\sigma$ equal to 20% and short-term rate $r = 1\%$. The solid line is the price corresponding to time to maturity $T - t$ = one year, and the dashed line is the price corresponding to time to maturity $T - t$ = three months.
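The pattern in Figure 10.5 can be reproduced with a few lines, using the parameter values stated in the caption (strike 100, volatility 20%, short-term rate 1%). The function name `bs_put` and the two evaluation points for the underlying price are assumptions made for this sketch.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf


def bs_put(S, K, r, sigma, tau):
    """Black-Scholes price of a European put with time to maturity tau."""
    d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return K * exp(-r * tau) * Phi(-(d - sigma * sqrt(tau))) - S * Phi(-d)


K, r, sigma = 100.0, 0.01, 0.20   # values from the caption of Figure 10.5
for S in (66.0, 84.0):            # assumed evaluation points within the plotted range
    one_year = bs_put(S, K, r, sigma, 1.0)
    three_months = bs_put(S, K, r, sigma, 0.25)
    print(f"S={S:5.1f}  put(1Y)={one_year:.2f}  put(3m)={three_months:.2f}")
```

With the underlying at 66 the three-month put is worth more than the one-year put, so the put gains value as time passes, while at 84 the ranking is reversed.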
10.4.7.2 Comparative statics of dynamic models

We derive properties of option prices in the context of diffusion processes, relying on methods suggested by Bergman, Grundy and Wiener (1996), and in Chapter 7 of the lectures (see Proposition 7.1). We establish that if the stock price is solution to Eq. (10.22), the price of a European-style option inherits the properties of the final payoff: it is increasing and convex in the underlying price if the payoff $\psi$ is. These properties imply that the option price is increasing in the volatility of the underlying price.
Differentiate Eq. (10.23) with respect to $S$. The result is that $H \equiv \partial C / \partial S$ satisfies,
$$0 = \frac{\partial H}{\partial t} + \left(r S + \sigma'(S)\right)\frac{\partial H}{\partial S} + \sigma(S)\frac{\partial^2 H}{\partial S^2}$$ (10.25)
subject to the boundary condition $H(T, S) = \psi'(S) > 0$. Therefore, we have that $H(t, S) = \frac{\partial C(t, S)}{\partial S} > 0$ for all $t$, due to results reviewed in Chapter 7 (see Proposition 7.1 and Appendix 1 in Chapter 7). That is, in a scalar diffusion setting, a European-style option price is increasing in the underlying whenever the payoff $\psi(S)$ is increasing.
Next, we find conditions under which $\partial^2 C / \partial S^2 > 0$. We differentiate Eq. (10.25) with respect to $S$, and $Z \equiv \partial H / \partial S = \partial^2 C / \partial S^2$ satisfies,
$$0 = \frac{\partial Z}{\partial t} + \left(r S + 2\sigma'(S)\right)\frac{\partial Z}{\partial S} + \sigma(S)\frac{\partial^2 Z}{\partial S^2} + \left(r + \sigma''(S)\right) Z$$ (10.26)
subject to the boundary condition $Z(T, S) = \psi''(S)$. We now have that $Z(t, S) = \frac{\partial^2 C(t, S)}{\partial S^2} > 0$ whenever the payoff is convex, $\psi''(S) > 0$ for all $S$. That is, in a scalar diffusion setting, a European-style option price is convex in the underlying, provided the final payoff is convex.
Finally, we implement the thought experiment of tilting the volatility of the underlying. Consider two markets A and B with prices $C^i(t, S)$, $i = A, B$, in which the volatility of the asset price in market A is higher than that in market B, viz
$$dS_\tau = r S_\tau\,d\tau + \sqrt{2\sigma^i(S_\tau)}\,d\tilde{W}_\tau, \qquad i = A, B,$$
where $\tilde{W}$ is a Brownian motion under the risk-neutral probability, and $\sigma^A(S) > \sigma^B(S)$ for all $S$. The price difference in the two markets, $\Delta C \equiv C^A - C^B$, satisfies,
$$0 = \frac{\partial \Delta C}{\partial t} + r S \frac{\partial \Delta C}{\partial S} + \sigma^A(S)\frac{\partial^2 \Delta C}{\partial S^2} - r\,\Delta C + \left(\sigma^A(S) - \sigma^B(S)\right)\frac{\partial^2 C^B}{\partial S^2}$$ (10.27)
subject to the boundary condition $\Delta C(T, S) = 0$. By the same results used to analyze Eq. (10.25) and (10.26), we now have that $\Delta C > 0$ whenever $\partial^2 C^B / \partial S^2 > 0$. That is, if option prices are convex in the underlying price, they are increasing in the volatility of the asset price.
This result is reminiscent of the theory of mean-preserving spreads as explained in Chapter 7. By increasing the volatility of the underlying, the holder of a European call (say) would benefit from the upside while not suffering losses on the way down: risk-neutral evaluation of traded assets benefits from an increasing volatility. We will see that this conclusion might be reversed when it comes to assessing how the price of fixed income instruments reacts to changes in the volatility of the underlying fundamentals.
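The Black & Scholes model is a special case of the scalar diffusion setting above, so it offers a simple way to see the two properties at work: a call payoff is convex, the call price is convex in the underlying, and the price is increasing in volatility. The grids and parameter values below are assumptions chosen for illustration, and the check is numerical rather than a proof.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf


def bs_call(S, K, r, sigma, tau):
    d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return S * Phi(d) - K * exp(-r * tau) * Phi(d - sigma * sqrt(tau))


# Illustrative (assumed) inputs.
K, r, tau = 100.0, 0.01, 0.5

# Convexity in the underlying: second differences of the price on a grid of S.
second_diffs = [
    bs_call(S + 1.0, K, r, 0.2, tau) - 2.0 * bs_call(S, K, r, 0.2, tau) + bs_call(S - 1.0, K, r, 0.2, tau)
    for S in range(80, 121, 5)
]

# Monotonicity in volatility: prices on an increasing grid of volatilities.
vols = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
prices_by_vol = [bs_call(100.0, K, r, v, tau) for v in vols]

print(min(second_diffs) > 0.0)
print(all(a < b for a, b in zip(prices_by_vol, prices_by_vol[1:])))
```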
10.4.7.3 Counterexamples

[In progress]
10.4.7.4 Recovering risk-neutral probabilities

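Consider the price of a European call,
$$C(S_t, t, T; K) = P(t, T)\int_K^\infty (x - K)\,q(x \mid S_t)\,dx,$$
where $P(t, T) = e^{-r(T-t)}$ and $q(x \mid S_t)$ is the density of $S_T$ under the risk-neutral probability $Q$, conditional on $S_t$. Assuming that $\lim_{x \to \infty} x\,q(x \mid S_t) = 0$, and differentiating with respect to $K$ leaves:
$$e^{r(T-t)}\frac{\partial C(S_t, t, T; K)}{\partial K} = -\int_K^\infty q(x \mid S_t)\,dx.$$
We can check this relation holds true in the Black-Scholes model, in Eq. (10.20). Let us differentiate again,
$$e^{r(T-t)}\frac{\partial^2 C(S_t, t, T; K)}{\partial K^2} = q(K \mid S_t)$$ (10.28)
Eq. (10.28) allows us to recover the risk-neutral density using option prices. The Arrow-Debreu state density, $D_{AD}(S_T = u \mid S_t)$, is given by,
$$D_{AD}(S_T = u \mid S_t) = e^{-r(T-t)}\,q(S_T \mid S_t)\big|_{S_T = u} = \frac{\partial^2 C(S_t, t, T; K)}{\partial K^2}\bigg|_{K = u}.$$
These results are quite useful in applied work. They also help deal with the pricing of volatility contracts reviewed in Section 10.6, as explained in Appendix 4.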
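Eq. (10.28) can be checked in the Black & Scholes case, where the risk-neutral density of the terminal price is log-normal and known in closed form. The sketch below takes a second finite difference of call prices across strikes and compares it with that density. Function names, the strike step and the parameter values are illustrative assumptions; in applied work the call prices would come from market quotes rather than a formula.

```python
from math import exp, log, sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Illustrative (assumed) inputs.
S, r, sigma, tau = 100.0, 0.01, 0.2, 0.5


def bs_call(K):
    d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return S * std_normal.cdf(d) - K * exp(-r * tau) * std_normal.cdf(d - sigma * sqrt(tau))


def implied_density(K, h=1e-2):
    """Risk-neutral density at K from Eq. (10.28), by a second finite difference in the strike."""
    return exp(r * tau) * (bs_call(K + h) - 2.0 * bs_call(K) + bs_call(K - h)) / h**2


def lognormal_density(x):
    """Closed-form risk-neutral density of the terminal price under Black-Scholes."""
    z = (log(x / S) - (r - 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return std_normal.pdf(z) / (x * sigma * sqrt(tau))


for K in (90.0, 100.0, 110.0):
    print(f"K={K:6.1f}  from option prices={implied_density(K):.6f}  log-normal={lognormal_density(K):.6f}")
```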

10.5 Stochastic volatility


10.5.1 Statistical models of changing volatility
10.5.1.1 ARCH and random variance models

Asset returns have time-varying volatility and their distributions are both heavy-peaked and tailed, as reviewed in Chapter 7. These empirical regularities have been very well-known at least since the seminal work of Mandelbrot (1963) and Fama (1965). Engle (1982) and Bollerslev (1986) introduce the first parametric models aiming to capture these stylized facts through the celebrated Auto Regressive Conditionally Heteroskedastic (ARCH) models. ARCH models have played a prominent role in the analysis of many aspects of financial econometrics, such as the term structure of interest rates, the pricing of options, or the presence of time varying risk premiums in the foreign exchange market, as summarized by the classic survey of Bollerslev, Engle and Nelson (1994).
An ARCH model works as follows. Let $\{y_t\}_{t=1}^N$ be a record of observations on some asset returns, $y_t = \ln(S_t / S_{t-1})$, where $S_t$ is the asset price. The variance of $y_t$ is, then, modeled as an autoregressive process, as follows:
$$y_t = a + \epsilon_t, \quad \epsilon_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma_t^2), \quad \sigma_t^2 = w + \alpha\,\epsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$$ (10.29)
where $a$, $w$, $\alpha$ and $\beta$ are parameters and $\mathcal{F}_t$ denotes the information set as of time $t$. This model is known as the GARCH(1,1) model (Generalized ARCH). It was introduced by Bollerslev (1986), and collapses to the ARCH(1) model introduced by Engle (1982) once we set $\beta = 0$. In other words, the variance of the distribution of asset returns tomorrow is linear in the expectation error, $(\epsilon_t^2 - E_{t-1}(\epsilon_t^2))$, and rises linearly with the current variance, $\sigma_t^2$, viz
$$\sigma_{t+1}^2 = w + (\alpha + \beta)\,\sigma_t^2 + \alpha\left(\epsilon_t^2 - E_{t-1}(\epsilon_t^2)\right).$$
The quintessence of ARCH models is to make volatility dependent on the variability of past observations. An alternative formulation, initiated by Taylor (1986), makes volatility driven by some unobserved components. This formulation gives rise to the stochastic volatility model. Consider, for example, the following stochastic volatility model,
$$y_t = a + \epsilon_t, \quad \epsilon_t \mid \sigma_t \sim N(0, \sigma_t^2); \qquad \ln\sigma_t^2 = w + \alpha\ln\epsilon_{t-1}^2 + \beta\ln\sigma_{t-1}^2 + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2),$$
where $a$, $w$, $\alpha$, $\beta$ and $\sigma_\eta^2$ are parameters. The main difference between this model and the GARCH(1,1) model in Eq. (10.29) is that the volatility as of time $t$, $\sigma_t^2$, is not predetermined by the past forecast error, $\epsilon_{t-1}$. Rather, this volatility depends on the realization of the stochastic volatility shock $\eta_t$ at time $t$. This makes the stochastic volatility model considerably richer than a simple ARCH model. As for the ARCH models, SV models have also been intensively used, especially following the progress accomplished in the corresponding estimation techniques. The seminal contributions related to the estimation of this kind of models are mentioned in Mele and Fornari (2000). Early contributions that relate changes in volatility of asset returns to economic intuition include Clark (1973) and Tauchen and Pitts (1983), who assume that a stochastic process of information arrival generates a random number of intraday changes of the asset price.
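To make the GARCH(1,1) recursion in Eq. (10.29) concrete, the sketch below simulates the unexpected returns and their conditional variances, and checks the decomposition of tomorrow's variance into a persistence term and an expectation-error term. The parameter values, the seed and the function name `simulate_garch` are assumptions for illustration, and the mean return is set to zero for simplicity.

```python
import random
from math import sqrt


def simulate_garch(n, w, alpha, beta, seed=0):
    """Simulate n steps of the GARCH(1,1) recursion in Eq. (10.29), with zero mean return."""
    rng = random.Random(seed)
    sigma2 = w / (1.0 - alpha - beta)   # start at the unconditional variance
    eps, variances = [], []
    for _ in range(n):
        e = sqrt(sigma2) * rng.gauss(0.0, 1.0)       # conditionally Normal shock
        eps.append(e)
        variances.append(sigma2)
        sigma2 = w + alpha * e**2 + beta * sigma2    # next period's conditional variance
    return eps, variances


# Illustrative (assumed) parameter values.
w, alpha, beta = 1e-6, 0.10, 0.85
eps, variances = simulate_garch(5000, w, alpha, beta)

# Decomposition: persistence (alpha + beta) plus alpha times the expectation error.
max_gap = max(
    abs(variances[t + 1] - (w + (alpha + beta) * variances[t] + alpha * (eps[t] ** 2 - variances[t])))
    for t in range(len(eps) - 1)
)
print(f"largest gap between the two representations: {max_gap:.2e}")
```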
10.5.1.2 ARCH and di usive models

Under regularity conditions, ARCH models and stochastic volatility models behave essentially the same as the sampling frequency gets sufficiently high. Precisely, Nelson (1990) shows that ARCH models converge in distribution to the solution of stochastic differential equations, in the sense that the finite-dimensional distributions of the volatility process generated by ARCH models converge towards the finite-dimensional distributions of some diffusion process, as the sampling frequency goes to infinity. Mele and Fornari (2000) (Chapter 2) contains a review of results relating to this type of convergence, and Corradi (2000) develops a critique related to the conditions underlying these convergence results. To illustrate, heuristically, consider the following model,
$$d\ln S_t = \mu(\cdot)\,dt + \sigma_t\,dW_t, \qquad d\sigma_t^2 = (\omega - \varphi\,\sigma_t^2)\,dt + \psi\,\sigma_t^2\,dW^\sigma_t$$ (10.30)
where $W$ and $W^\sigma$ are correlated, with correlation $\rho$, and $\omega$, $\varphi$ and $\psi$ are some constants. Consider, further, the ARCH model:
$$y_{n+1} = \sqrt{\Delta}\,\sigma_{n+1} u_{n+1}, \quad u_{n+1} \sim NID(0, 1); \qquad \sigma_{n+1}^2 = w_\Delta + \alpha_\Delta\,\sigma_n^2\left(|u_n| - \gamma u_n\right)^2 + \beta_\Delta\,\sigma_n^2$$ (10.31)
where $y_{n+1} = \ln(S_{(n+1)\Delta}) - \ln(S_{n\Delta})$, $n$ and $\Delta$ refer to the indexing of observed data and the sampling frequency (weekly, say), and $w_\Delta$, $\alpha_\Delta$, $\beta_\Delta$ are positive parameters, possibly depending on the sampling frequency, and $\gamma \in (-1, 1)$. The parameter $\gamma$ allows us to capture the Black-Christie-Nelson leverage effect (Black, 1976b; Christie, 1982; Nelson, 1991) discussed in Chapter 8. Note that the second of Eqs. (10.31) can be written as:
$$\sigma_{n+1}^2 - \sigma_n^2 = \left[\Delta^{-1} w_\Delta - \Delta^{-1}\left(1 - \alpha_\Delta E(|u| - \gamma u)^2 - \beta_\Delta\right)\sigma_n^2\right]\Delta + \Delta^{-1/2}\alpha_\Delta\,\sigma_n^2\,\tilde{u}_n\,\sqrt{\Delta}$$ (10.32)
and $\tilde{u}_n \equiv (|u_n| - \gamma u_n)^2 - E(|u| - \gamma u)^2$. The first two terms define the drift term for the variance process, and the last term is the diffusive component. Suppose that $\lim_{\Delta \downarrow 0} \Delta^{-1} w_\Delta = \omega$, $\lim_{\Delta \downarrow 0} \Delta^{-1}\left(1 - \alpha_\Delta E(|u| - \gamma u)^2 - \beta_\Delta\right) = \varphi$, and, finally, $\lim_{\Delta \downarrow 0} \Delta^{-1/2}\alpha_\Delta \sqrt{v} = \psi$, where $v \equiv \mathrm{var}(\tilde{u}_n)$. Then, under regularity conditions, the sample paths of the asset price and the variance in Eqs. (10.31) converge to those of $S_t$ and $\sigma_t^2$ in Eqs. (10.30), with a well-defined correlation coefficient (see Fornari and Mele, 2006).6
10.5.2 Implied volatility, smiles and skews
Parallel to time-series research into asset volatilities reviewed in the previous section, research on option prices over the 1980s challenged the assumption of a constant volatility in the Black & Scholes and Merton model. As we know, the Black & Scholes model relies on the assumption that the price of the underlying asset is a geometric Brownian motion with constant volatility,
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t,$$
where $W_t$ is a Brownian motion, and $\mu$, $\sigma$ are constants. As we also know, $\sigma$ is the only one of these parameters to enter the option pricing formula, which leads to a crucial point. Not only is the assumption of a constant $\sigma$ inconsistent with the time-series evidence reviewed in the previous section. It is also inconsistent with empirical evidence on the cross-section of option prices. Let $C^{\$}_t(K, T)$ denote the time $t$ market price of a call expiring at $T$ with strike $K$, and consider the price predicted by Black-Scholes, $C_{BS}(S_t, t; K, T, \sigma)$ in Eq. (10.15). Define the Black-Scholes implied volatility as the value of $\sigma$ that equates the Black-Scholes formula to the option market price, IV say,
$$\text{IV}: \quad C^{\$}_t(K, T) = C_{BS}(S_t, t; K, T, \text{IV})$$ (10.33)
We know from Section 10.4.7 that the Black-Scholes option price is strictly increasing in $\sigma$. Therefore, this definition of implied volatility makes sense, in that there exists a unique value for IV such that Eq. (10.33) holds true. In fact, the market practice is to quote options in terms of implied volatilities, not prices. Moreover, implied volatility is the same for both the call and the put. Indeed, by the put-call parity in Eq. (10.7), viz
$$P^{\$}_t(K, T) = C^{\$}_t(K, T) - S_t + K e^{-r(T-t)}$$ (10.34)
This equation also holds for the Black-Scholes model for each $\sigma$, i.e. $P_{BS}(S_t, t; K, T, \sigma) = C_{BS}(S_t, t; K, T, \sigma) - S_t + K e^{-r(T-t)}$. Subtracting this equation from Eq. (10.34) shows that the implied volatilities for a call and for a put option are the same.
If the Black & Scholes model holds, implied volatilities would be the same for each $K$. Yet empirically, and at least since 1987, the cross section of implied volatilities exhibits striking characteristics, when gauged against the moneyness of the option defined as,
$$mo \equiv \frac{S_t\,e^{r(T-t)}}{K}$$ (10.35)
Prior to 1987, the pattern of implied volatilities was unclear or U-shaped in $mo^{-1}$ at best, a "smile." After the 1987 crash, the smile pattern turned into a "smirk," also referred to as "volatility skew." One possible explanation for these facts might refer to the fact that call and put options that are deep-in-the-money and call or put options that are deep-out-of-the-money are relatively less liquid than at-the-money options, thereby commanding a liquidity risk-premium. Since the Black-Scholes option price is increasing in volatility, the implied volatility is U-shaped in $mo^{-1}$.

6 For example, if $\gamma = 0$, the random component of the diffusive term in Eq. (10.32) collapses to $(u_n^2 - 1)$, and the moment condition for the diffusive component is $\psi = \lim_{\Delta \downarrow 0} \Delta^{-1/2}\alpha_\Delta\sqrt{2}$. Intuitively, in this case $(u_n^2 - 1)$ is an IID sequence of centered chi-square variates with one degree of freedom (and variance $= 2$), and stands for the discrete version of the Brownian motion increments $dW^\sigma$ in the second of Eqs. (10.30).
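Because the Black & Scholes price is strictly increasing in volatility, the implied volatility in Eq. (10.33) can be found by a simple bisection. The sketch below inverts a call price and a put price generated with the same volatility and confirms that the two implied volatilities coincide, as put-call parity in Eq. (10.34) requires. The function names, the bracketing interval and the input values are assumptions, not a description of market practice.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf


def bs_call(S, K, r, sigma, tau):
    d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return S * Phi(d) - K * exp(-r * tau) * Phi(d - sigma * sqrt(tau))


def bs_put(S, K, r, sigma, tau):
    d = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return K * exp(-r * tau) * Phi(-(d - sigma * sqrt(tau))) - S * Phi(-d)


def implied_vol(price, S, K, r, tau, is_call=True, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on the Black-Scholes call price; puts are mapped to calls by put-call parity."""
    target = price if is_call else price + S - K * exp(-r * tau)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, tau) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Illustrative (assumed) inputs: prices generated with a volatility of 25%.
S, K, r, tau, true_vol = 100.0, 110.0, 0.01, 0.5, 0.25
iv_call = implied_vol(bs_call(S, K, r, true_vol, tau), S, K, r, tau, is_call=True)
iv_put = implied_vol(bs_put(S, K, r, true_vol, tau), S, K, r, tau, is_call=False)
print(iv_call, iv_put)
```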
Figures 10.6 and 10.7 illustrate how smiles and smirks arise in a different context, one where asset returns exhibit random volatility. We rely on the celebrated Heston's (1993) model in which volatility is random, and refer the reader to Section 10.5.4 (see Eq. (10.54)) for technical details regarding this model.7

[Plot: two panels (left: $\rho = 0$; right: $\rho = -0.5$), each showing implied volatility (vertical axis) against $Ke^{-r(T-t)}/S$ (horizontal axis, from 0.7 to 1.3).]

FIGURE 10.6. Smile and smirk predicted by the Heston model in Eq. (10.54), with parameters fixed at $\kappa = 2$, $\omega = 0.01$, $\xi = 0.1$ and, for the left-hand panel (the smile) $\rho = 0$, and the right-hand panel (the skew) $\rho = -0.5$. The initial values of the asset price and volatility are $S = 100$, and $\sigma = \sqrt{\omega}$, and the short-term rate $r = 0$, and the maturity of the option is six months.

The rationale underlying the patterns in Figure 10.6 is the following. The Black & Scholes model relies on the assumption that asset prices are log-normally distributed. However, this assumption may not be correct, as the market might be pricing through alternative distributions where a higher weight is given to tail events, due to market fears about extreme outcomes. For example, the market might fear the stock price will fall below a given level, say $\underline{K}$, more than the Black & Scholes model would predict. As a result, the market density should have a left tail thicker than the log-normal, for values of $S_T < \underline{K}$.
This possibility is illustrated by the left panel of Figure 10.7, which depicts the risk-neutral distributions of both the Black & Scholes model, and one model with random volatility, taken to be the "truth," a model that does generate thick tails, as discussed below. A market density with a left tail thicker than that of Black & Scholes implies that the probability that deep-out-of-the-money puts (i.e., those with low strike prices) will be exercised is higher under the market density than under the log-normal. For this reason, the implied volatility we need to price deep-out-of-the-money puts is higher than that required to price at-the-money calls and puts.8
At the other extreme, the market may attach a higher likelihood that the stock price will be above some $\overline{K}$ than predicted by Black & Scholes, which would translate into a market density with a right tail thicker than the log-normal, for values of $S_T > \overline{K}$. This characteristic implies a higher probability that deep-out-of-the-money calls (i.e., those with high strike prices) will be exercised, compared to the log-normal. As a result, the implied volatility needed to price deep-out-of-the-money calls exceeds the implied volatility needed to price at-the-money calls and puts, as illustrated by the left panel of Figure 10.7. As mentioned, the second effect has disappeared since the 1987 crash, for reasons similar to those underlying the right panel of Figure 10.7, leaving the smirk of the right panel of Figure 10.6.
This section suggests that out-of-the-money options contain important information regarding market expectations about future volatility. Indeed, the CBOE-VIX index does aggregate the prices of out-of-the-money options to convey an estimate of the volatility expected to arise, corrected by risk. Section 10.8 develops details on this index and explains that it links to the fair value of a variance swap, i.e. a contract in which a counterparty is insured against fluctuations in future volatility.

7 The densities in Figure 10.7 are obtained from the risk-neutral distribution functions implied by Eq. (10.15) for Black & Scholes, and by Eq. (10.59) for Heston, and are plotted against the asset price.

[Plot: two panels (left: $\rho = 0$; right: $\rho = -0.5$), each showing the probability density (vertical axis, from 0 to 0.07) against the asset price (horizontal axis, from 80 to 120), for the stochastic volatility model and the Black & Scholes model.]

8 Note that if a call (put) option is out-of-the-money for a given strike, a put (call) option is in-the-money for the same strike.

FIGURE 10.7. Risk-neutral densities predicted by the Black & Scholes model (dashed line) and the Heston model in Eq. (10.54). The Black & Scholes volatility parameter is $\sigma = 9\%$, and Heston's parameters are fixed at $\kappa = 2$, $\omega = 0.01$, $\xi = 0.1$ and, for the left-hand panel $\rho = 0$, and the right-hand panel $\rho = -0.5$. The initial values of the asset price and volatility are $S = 100$, and $\sigma = \sqrt{\omega}$, the short-term rate $r = 0$, and maturity is six months.

While the previous conclusions rely on numerical results, an explanation of smiles has been available since the early 1990s (see, e.g., Ball and Roma, 1994; Renault and Touzi, 1996). To illustrate, consider the continuous time model,
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma_t\,dW_t, \qquad d\sigma_t^2 = b(\sigma_t^2)\,dt + a(\sigma_t^2)\,dW^\sigma_t$$ (10.36)
where $W^\sigma$ is a Brownian motion correlated with $W$, with instantaneous correlation equal to $\rho$, and $b$ and $a$ are some functions satisfying regularity conditions. The drift function, $b$, is needed to generate mean-reverting behavior in stochastic volatility, $\sigma_t^2$, a characteristic we require in exactly the same spirit of what we need to assume from interest rates, as further elaborated in Chapter 12.
The option price is, $C(S_t, \sigma_t^2, t, T) = e^{-r(T-t)} E_t(S_T - K)^+$, where $E_t[\cdot]$ is the expectation under some risk-neutral probability $Q$. Assume that the correlation $\rho = 0$. Then, the implied volatility predicted by this model satisfies,
$$\text{IV}: \quad C(S_t, \sigma_t^2, t, T) = C_{BS}(S_t, t; K, T, \text{IV})$$
and is U-shaped with respect to $mo^{-1}$. Indeed, let $m \equiv \ln(mo)$, where $mo$ denotes the option moneyness, as defined in Eq. (10.35). In Appendix 3, we show that in a model with stochastic volatility and zero correlation, IV is, approximately, a quadratic function of $m$, $\text{IV}(m)$ say, with a minimum occurring at $m = 0$, just as in the left panel of Figure 10.6, viz

1 2
2 !
q
2
(
)
(
)
1
2
IV (
)
( )+
( )
(10.37)

3
2
( )(
)

q
( ) = E ( ).
where
These properties link to a compelling lesson we learnt from the early statistical literature on ARCH and random variance models: random changes in volatility lead to a return distribution with tails thicker than the normal, that is, one with kurtosis larger than three (Mandelbrot, 1963; Fama, 1965; Nelson, 1990; Mele and Fornari, 2000), a feature that Heston's model illustrates vividly in Figure 10.7. For example, we know from Nelson (1990) that even if unexpected returns are conditionally normally distributed, they are approximately, and unconditionally, Student's t, once we assume their variance follows a GARCH(1,1) process. Mathematically, denote the unexpected returns as of time $t$ with $\epsilon_t$, and suppose that $\epsilon_t = u_t\sigma_t$, where $u_t \sim \text{NID}(0,1)$ and $\sigma_t$, the conditional volatility of $\epsilon_t$, is some random process. Then we have, by Jensen's inequality, that

$$E(\epsilon_t^4) = E(u_t^4)\,E(\sigma_t^4) \ge E(u_t^4)\,[E(\sigma_t^2)]^2 = E(u_t^4)\,[E(\epsilon_t^2)]^2,$$

which is an equality when $\sigma_t$ is not random. It follows that the kurtosis, $\text{Kurt} \equiv E(\epsilon_t^4)/[E(\epsilon_t^2)]^2 \ge E(u_t^4) = 3$. That is, random volatility makes the unconditional return density leptokurtic even when the conditional is normal. Although these calculations relate to unconditional densities, similar conclusions would apply to conditional ones: random volatility makes an $n$-day conditional density leptokurtic even when the one-day conditional is normal.
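The kurtosis inequality above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes a hypothetical lognormal volatility drawn independently of the Gaussian shock, and compares the sample kurtosis with the constant-volatility benchmark. Under this assumed volatility law the population kurtosis is $3e^{0.25}$, so the sample value should sit clearly above three.

```python
import numpy as np

def sample_kurtosis(x):
    """Raw (non-excess) kurtosis E[x^4] / (E[x^2])^2 of a zero-mean sample."""
    x = np.asarray(x, dtype=float)
    return np.mean(x**4) / np.mean(x**2) ** 2

rng = np.random.default_rng(0)
n = 400_000

u = rng.standard_normal(n)                     # conditionally normal shocks u_t
# Assumption for illustration only: lognormal volatility, independent of u_t.
sigma = np.exp(0.25 * rng.standard_normal(n))

eps_random_vol = u * sigma      # eps_t = u_t * sigma_t with random sigma_t
eps_constant_vol = u * 1.0      # benchmark: sigma_t constant

kurt_random = sample_kurtosis(eps_random_vol)
kurt_constant = sample_kurtosis(eps_constant_vol)
print("kurtosis, random volatility:  ", kurt_random)
print("kurtosis, constant volatility:", kurt_constant)
```

Any other positive, non-degenerate volatility distribution would deliver the same qualitative result, since the argument only relies on Jensen's inequality.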
As a result of this leptokurtosis in asset returns, the probability that out-of-the-money options are exercised is larger than that implied by the log-normal distribution: the smile effect. As for the smirk effect, we need $\rho < 0$, as shown by Figure 10.7. Intuitively, when $\rho < 0$, the left tail of the return distribution is thicker than the right, thereby making out-of-the-money puts more valuable.
The model in Eqs. (10.36) has been extended to one with jumps, where the variance process follows a mean-reverting process such as:

$$d\sigma_t^2 = \kappa(\alpha - \sigma_t^2)\,dt + \mathcal{S}\,dN_t + \xi\sigma_t\,dW_t^{\sigma}$$

where $N$ is a Poisson process with intensity $v$ (see Section 4.7 in Chapter 4), $\mathcal{S} > 0$ is the size of the jump, which we suppose to be constant for illustration purposes only, and, finally, $\kappa$, $\alpha$ and $\xi$ are constants. In this model, the presence of positive jumps, $\mathcal{S} > 0$, makes the left tail of the return distribution thicker, when $\rho < 0$. Therefore, we need a high $\rho$ to avoid too thick a tail. With $\mathcal{S} = 0$, instead, a thicker tail can only be obtained through lower values of $\rho$.
Naturally, the effects illustrated in this section mostly refer to explanations related to stochastic volatility, although in general they might arise for other reasons leading to leptokurtosis, such as feedback effects. [In progress, mention the literature on feedback effects of the 1990s.]
An even simpler channel is one in which stock prices are simply driven by jumps, which make the left tail of the distribution thicker than the right one, as in the following model,
= ( )

where is not constant. Naturally, the point of assuming $\mathcal{S}$ constant is to isolate crashophobia as an explanation for the skew. [In progress]

10.5.3 Option pricing with stochastic volatility


Stochastic volatility might lead to market incompleteness,9 the situation defined in Chapter 4 as one in which we cannot hedge against future contingencies based on trading available securities. In the context we are dealing with in this section, we cannot replicate the payoff of a European stock option, because the number of assets available for trading (one stock) is less than the sources of risk: the two Brownian motions in Eq. (10.36), $W$ and $W^{\sigma}$. Intuitively, the option price is $C(S_t,\sigma_t^2,t,T)$, thus being driven by both the asset price, $S_t$, and stochastic variance, $\sigma_t^2$. The point is that $\sigma_t^2$ and, then, $C_t$, are driven by $W^{\sigma}$, while the value of a would-be replicating portfolio (only comprising the underlying asset) would only be driven by $W$. Therefore, this portfolio cannot replicate the option, and the option price cannot be preference-free. In a nutshell, and as explained in Part I of these lectures, preference-free pricing relies on the possibility to perfectly replicate a derivative, and the unique price of this derivative is then that of the replicating portfolio, independently of risk-appetite.10 Naturally, markets can be completed by the option, although in this case, option pricing is not preference-free as we shall show in the next sections.

9 Stochastic volatility might not necessarily lead to market incompleteness. Mele (1998) (p. 88) considers a "circular" market with $m$ asset prices, where (i) asset price no. $i$ exhibits stochastic volatility, and (ii) this stochastic volatility is driven by the Brownian motion driving the $(i-1)$-th asset price. Therefore, in this market, each asset price is solution to Eqs. (10.36) and yet markets are complete.
To summarize, stochastic volatility entails two inextricable consequences: (i) there is an infinity of option prices consistent with absence of arbitrage, which correspond to the many risk-neutral probabilities consistent with the model: there are many risk-adjustments that we can make to the drift term of the variance process in Eqs. (10.36); (ii) there cannot be perfect hedging strategies only relying on the underlying asset. As regards point (ii), we might, alternatively, either (a) use a strategy which, albeit not self-financed, would still allow for a perfect replication of the claim, or (b) use a self-financed strategy that would apply to some misspecified model. In case (a), the strategy leads to a hedging cost process. In case (b), the strategy leads to a tracking error process, although there might be situations where the claim can be super-replicated, as explained below.
We start with a short detour about how to understand replicability in a general context, and proceed to option evaluation in subsequent subsections.
10.5.3.1 Spanning and cloning

A set of securities "spans" a set of payoffs, if any point in that set can be generated by a linear combination of the security prices. As explained in Chapter 4, the set of payoffs may include those promised by a contingent claim, for example, that promised by a European call, or final consumption, as in Harrison and Kreps (1979) and Duffie and Huang (1985). Chapter 4 relies on this spanning property and solves for consumption-portfolio choices through martingale methods. In this section, we show how spanning helps define replicating strategies with the purpose of pricing "redundant" assets. Consider the following model (introduced in Chapter 4), where asset prices are assumed to be driven by a $d$-dimensional Brownian motion $W$,

$$dy_t = \varphi(y_t)\,dt + J(y_t)\,dW_t \qquad (10.38)$$

where $\varphi$ and $J$ are vector and matrix valued functions. The value of a portfolio is $V = \theta\cdot S_+$, where $S_+$ denotes the vector of the security prices and the money market account, and $\theta$ is a portfolio process with the same notation as in Chapter 4. As explained in Chapter 4 (Section 4.3.1), the value of a self-financed portfolio satisfies $dV = \theta\cdot dS_+$, or,

$$dV_t = \left[\pi_t^\top(\mu_t - 1_m r) + rV_t\right]dt + \pi_t^\top\sigma_t\,dW_t \qquad (10.39)$$

where $\pi \equiv (\pi_1,\dots,\pi_m)^\top$, $\pi_i \equiv \theta_i S_i$, $\mu \equiv (\mu_1,\dots,\mu_m)^\top$, $S_i$ is the price of the $i$-th asset, $\mu_i$ is its drift and $\sigma$ is the volatility matrix of the price process. Chapter 4 utilizes the risk-neutral probability, $Q$, to help characterize the securities span. Let us, now, do the same under the physical probability, $P$. In our context, asset prices are semimartingales under $P$,

$$dA_t = d\bar{A}_t + \gamma_t\,dW_t \qquad (10.40)$$

where $\bar{A}$ is a process with finite variation, and $\gamma$ satisfies regularity conditions. Let us conjecture that $V_t = A_t$ for all $t$. By the unique decomposition property mentioned in Chapter 4 (Section 4.2), drifts and diffusion components of $V$ and $A$ must be the same. Regarding the diffusion terms, the portfolio $\pi$ in Eq. (10.39) needs to satisfy:

$$\gamma_t = \pi_t^\top\sigma_t \qquad (10.41)$$

Regarding the drifts, we equate those of $V$ and $A$, obtaining,

$$\frac{d\bar{A}_t}{dt} = \pi_t^\top(\mu_t - 1_m r) + rV_t = \pi_t^\top(\mu_t - 1_m r) + rA_t \qquad (10.42)$$

where the second equality holds by the conjecture that $V_t = A_t$ for all $t$.

10 Note that the payoff of the option at the expiration $T$ is $(S_T-K)^+$ and does not depend on $\sigma_T^2$. However, the value of the option at any $t < T$ does depend on $\sigma_t^2$ because, intuitively, the risk-neutral expectation of $(S_T-K)^+$ conditional on the information set at $t$ is calculated through the risk-neutral transition density of $(S_T,\sigma_T^2)$ given $(S_t,\sigma_t^2)$, which obviously depends on $\sigma_t^2$.

The conjecture that $V_t = A_t$ is indeed correct as $A$ is the price of a traded asset (by a slight generalization of the argument in Section 4.2 of Chapter 4), and the assumption of complete markets. However, Eq. (10.41) has no solutions for $\pi$ when $m < d$, such that there are no self-financing strategies that replicate $A$: the number of available assets is too small to span all possible events, and markets are incomplete according to the definitions in Chapter 4. We shall return to this topic a few times in this chapter.
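The solvability of the diffusion-matching condition can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it reads the condition as the linear system "portfolio loadings times the volatility matrix equals the target diffusion vector", with a volatility matrix of shape (number of assets, number of Brownian motions), and all numerical values are hypothetical. It uses a least-squares solve and then checks whether the fit is exact.

```python
import numpy as np

def replicating_portfolio(sigma, gamma, tol=1e-10):
    """Try to solve pi^T sigma = gamma for the portfolio pi.

    sigma : (m, d) volatility matrix, m assets and d Brownian motions.
    gamma : (d,) diffusion vector of the claim to be replicated.
    Returns the least-squares pi and a flag telling whether it is exact.
    """
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    gamma = np.asarray(gamma, dtype=float)
    # pi^T sigma = gamma is the same as sigma^T pi = gamma (d equations, m unknowns).
    pi, *_ = np.linalg.lstsq(sigma.T, gamma, rcond=None)
    exact = bool(np.allclose(sigma.T @ pi, gamma, atol=tol))
    return pi, exact

gamma = [0.10, 0.05]  # hypothetical claim loading on two Brownian motions

# One asset, two sources of risk (m < d): no exact replication.
pi_one, exact_one = replicating_portfolio([[0.20, 0.00]], gamma)

# Two assets, two sources of risk (m = d, full rank): exact replication.
pi_two, exact_two = replicating_portfolio([[0.20, 0.00], [0.10, 0.30]], gamma)

print("m = 1:", pi_one, "exact:", exact_one)
print("m = 2:", pi_two, "exact:", exact_two)
```

With a single asset the second Brownian motion is left unhedged, which is the incompleteness discussed in the text; adding a second asset with a non-singular volatility matrix restores an exact solution.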
Option evaluation fits perfectly within this framework. For example, a call price could be understood as being a differentiable function $C(t,y)$, and by Itô's lemma, a special case of Eq. (10.40),

$$dC = (\mathcal{L}C)\,dt + (C_y J)\,dW$$

where $\mathcal{L}C$ is the usual infinitesimal generator, and $C_y$ is the vector containing the partials of $C$ with respect to each component of $y$. It can be replicated if, at least, $m = d$, in which case $\pi^\top = C_y J\sigma^{-1}$. For example, in the context of Black & Scholes and Merton of Section 10.4.1, we have that $m = d = 1$, and the asset price is the only state variable in Eq. (10.38), a geometric Brownian motion with parameters $(\mu,\sigma)$, i.e. $\varphi(S) = \mu S$ and $J(S) = \sigma S$. Then, $C_y J = C_S\sigma S$, $\pi = C_S S$, leading to the celebrated Black-Scholes hedge ratio in Eq. (10.19).
How does replicability work in the context of stochastic volatility? It may actually work, although then we might not have preference-free formulae, as we now explain.
10.5.3.2 Replication

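Let us suppose that the price of the underlying asset is solution to Eqs. (10.36). The rational pricing function of a European-style option is $C_t = C(S_t,\sigma_t^2,t,T)$. Suppose two such options are traded, with prices $C_t^1 \equiv C(S_t,\sigma_t^2,t,T_1)$ and $C_t^2 \equiv C(S_t,\sigma_t^2,t,T_2)$, where we take $T_1 < T_2$. We cannot replicate $C^1$ by trading the underlying asset and the money market account. Indeed, let $V$ be the value of a self-financed strategy including the asset price and the money market account, which obviously satisfies:

$$dV_t = \left(\pi_t(\mu - r) + rV_t\right)dt + \pi_t\sigma_t\,dW_t \qquad (10.43)$$

where $\pi$ is the value of the invested underlying asset. Instead, the price of the first option satisfies:

$$dC^1 = \mathcal{L}C^1\,dt + C^1_S\,\sigma S\,dW + C^1_{\sigma^2}\,a\,dW^{\sigma} \qquad (10.44)$$

where $\mathcal{L}$ is the infinitesimal generator, $\mathcal{L}C^1 \equiv C^1_t + \mu S C^1_S + \frac{1}{2}\sigma^2S^2C^1_{SS} + bC^1_{\sigma^2} + \frac{1}{2}a^2C^1_{\sigma^2\sigma^2} + \rho a\sigma S C^1_{S\sigma^2}$. We see that we cannot match the diffusion coefficients of the option price in Eq. (10.44) through that in Eq. (10.43).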
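Instead, we might replicate the price of the first option, $C^1$, through a self-financed portfolio strategy including (i) the underlying asset, (ii) the option expiring at $T_2$, and (iii) the money market account. The value $V$ of this strategy satisfies:

$$dV = \left[\pi_1(\mu - r) + \pi_2\left(\frac{\mathcal{L}C^2}{C^2} - r\right) + rV\right]dt + \left(\pi_1\sigma + \pi_2\frac{C^2_S\,\sigma S}{C^2}\right)dW + \pi_2\frac{C^2_{\sigma^2}\,a}{C^2}\,dW^{\sigma} \qquad (10.45)$$

where $\pi_1$ is the invested asset value, and $\pi_2$ is the value of the investment in the second option. We match the diffusion coefficients of Eq. (10.44) and Eq. (10.45), and obtain:

$$\pi_1 = \left(C^1_S - \frac{C^1_{\sigma^2}}{C^2_{\sigma^2}}C^2_S\right)S, \qquad \pi_2 = \frac{C^1_{\sigma^2}}{C^2_{\sigma^2}}C^2 \qquad (10.46)$$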
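Replacing these expressions into Eq. (10.45), and equating the drift of Eq. (10.45) to that of Eq. (10.44), leaves:

$$\frac{\mathcal{L}C^1 - rC^1 - (\mu - r)SC^1_S}{a\,C^1_{\sigma^2}} = \frac{\mathcal{L}C^2 - rC^2 - (\mu - r)SC^2_S}{a\,C^2_{\sigma^2}} \qquad (10.47)$$

These two ratios agree. They must then be equal to some process $\Lambda^{\sigma}$, say, which is independent of the maturity of the option. Therefore, we obtain that,

$$C_t + rSC_S + (b - a\Lambda^{\sigma})C_{\sigma^2} + \frac{1}{2}\sigma^2S^2C_{SS} + \frac{1}{2}a^2C_{\sigma^2\sigma^2} + \rho a\sigma SC_{S\sigma^2} = rC \qquad (10.48)$$

The interpretation of $\Lambda^{\sigma}$ is that of the unit risk-premium required to face the risk of stochastic fluctuations in volatility. The problem is that absence of arbitrage does not suffice to recover a unique $\Lambda^{\sigma}$. By the Feynman-Kac stochastic representation of the solution to a partial differential equation, there are many solutions to Eq. (10.48),

$$C(S_t,\sigma_t^2,t,T) = e^{-r(T-t)}\,\mathbb{E}_t^{Q_{\Lambda}}(S_T - K)^+ \qquad (10.49)$$

where $Q_{\Lambda}$ is a risk-neutral probability, induced by the many $\Lambda^{\sigma}$ that are consistent with absence of arbitrage.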
The previous derivation suggests two possible uses of the portfolio strategies of Eqs. (10.46). The first, obvious, is hedging. We can always hedge the first option with $\pi_1$ and $\pi_2$ in Eqs. (10.46). The second is more subtle. If we really think our evaluation model for the first option is better than the market, we can always synthesize the first option with the portfolio in Eqs. (10.46), and replicate the payoff of the first option at expiration, $(S_{T_1} - K)^+$.
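As a small illustration of the hedging use, the sketch below computes the two positions obtained by matching the diffusion coefficients of the target option with those of a portfolio of the underlying and a second option, and then verifies that both loadings coincide. It assumes the hedge takes the form implied by that matching; the function name and every numerical input (the partial derivatives with respect to the asset price, suffix `_S`, and with respect to the variance, suffix `_v`) are hypothetical.

```python
def two_option_hedge(S, C2, C1_S, C1_v, C2_S, C2_v):
    """Values invested in the underlying (pi1) and in the second option (pi2)
    so that the portfolio has the same diffusion loadings as the first option."""
    if C2_v == 0.0:
        raise ValueError("second option must be sensitive to the variance")
    ratio = C1_v / C2_v
    pi1 = (C1_S - ratio * C2_S) * S  # neutralizes the asset-price Brownian motion
    pi2 = ratio * C2                 # neutralizes the variance Brownian motion
    return pi1, pi2

# Hypothetical inputs: asset price, volatility, vol-of-variance, option value and partials.
S, sigma, a = 100.0, 0.10, 0.02
C2 = 4.0
C1_S, C1_v = 0.52, 60.0
C2_S, C2_v = 0.55, 110.0

pi1, pi2 = two_option_hedge(S, C2, C1_S, C1_v, C2_S, C2_v)

# Diffusion loadings of the hedging portfolio on the two Brownian motions.
loading_W = pi1 * sigma + pi2 * C2_S * sigma * S / C2
loading_W_sigma = pi2 * C2_v * a / C2

print(pi1, pi2)
print(loading_W, C1_S * sigma * S)        # loading on the asset Brownian motion
print(loading_W_sigma, C1_v * a)          # loading on the variance Brownian motion
```

In practice the partial derivatives would come from a pricing model, which is where the dependence on the volatility risk-premium enters.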
Eqs. (10.47) and (10.48) can be interpreted as APT relations. Indeed, let us define the unit risk-premium related to the fluctuations of the asset price, $\lambda = (\mu - r)/\sigma$. Then, Eq. (10.47) or Eq. (10.48) imply that,

$$\frac{\mathcal{L}C}{C} = r + \underbrace{\frac{\sigma SC_S}{C}}_{\beta_S}\lambda + \underbrace{\frac{aC_{\sigma^2}}{C}}_{\beta_{\sigma^2}}\Lambda^{\sigma}$$

where $\beta_S$ is the beta related to the volatility of the option price induced by fluctuations in the stock price, $S$, and $\beta_{\sigma^2}$ is the beta related to the volatility of the option price induced by fluctuations in the return volatility. It is a generalization of the APT relation in Eq. (10.16) that holds for the Black & Scholes model.


10.5.3.3 Market completeness

Derivatives can complete markets. Show this is the case when


0. Show conditions under
which this is true. It is a generalization of the one-dimensional case analyzed in Section 10.4.7.
Review the literature. [In progress]
10.5.3.4 Pricing formulae

Hull and White (1987), Scott (1987) and Wiggins (1987) develop the first option pricing models with stochastic volatility. Heston (1993b) provides an analytical solution assuming an affine model for the variance process; otherwise, we need to resort to numerical methods such as Monte Carlo simulation or the numerical solution of partial differential equations.
Hull & White

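Hull and White (1987) derive a first pricing formula based on a continuous-time model where asset returns and volatility are uncorrelated,

$$\frac{dS_t}{S_t} = r\,dt + \sigma_t\,dW_t, \qquad \frac{d\sigma_t^2}{\sigma_t^2} = \mu_{\sigma}\,dt + \xi\,dW_t^{\sigma} \qquad (10.50)$$

where $W$ and $W^{\sigma}$ are uncorrelated Brownian motions, defined under the risk-neutral probability. They show that the option price takes the following form:

$$C(S_t,\sigma_t^2,t,T) = \mathbb{E}_t\left[C_{BS}\left(S_t,t,T;\sqrt{\bar{V}}\right)\right] \qquad (10.51)$$

where $C_{BS}(S_t,t,T;\sqrt{\bar{V}})$ denotes the usual Black-Scholes formula in Eq. (10.15), evaluated at the average variance, defined as

$$\bar{V} = \frac{1}{T-t}\int_t^T\sigma_u^2\,du$$

and $\mathbb{E}_t$ denotes the conditional risk-neutral expectation taken with respect to the laws generating $\bar{V}$. According to Eq. (10.51), the option price is simply the Black & Scholes formula averaged over all possible values taken by future average variance, $\bar{V}$. Accordingly, the authors provide a Taylor's expansion around the conditional expectation of $\bar{V}$, $\nu_t \equiv \mathbb{E}_t(\bar{V})$,

$$C(S_t,\sigma_t^2,t,T) = C_{BS}\left(S_t,t,T;\sqrt{\nu_t}\right) + \frac{1}{2}\left.\frac{\partial^2C_{BS}}{\partial\bar{V}^2}\right|_{\bar{V}=\nu_t}\mathbb{E}_t\left[(\bar{V}-\nu_t)^2\right] + \frac{1}{6}\left.\frac{\partial^3C_{BS}}{\partial\bar{V}^3}\right|_{\bar{V}=\nu_t}\mathbb{E}_t\left[(\bar{V}-\nu_t)^3\right] + \cdots \qquad (10.52)$$

In fact, Eq. (10.52) is a general formula applying to models more general than that in Eq. (10.50), say one of the models encompassed by Eqs. (10.36), provided of course $W$ and $W^{\sigma}$ are uncorrelated, as explained more formally in Appendix 3. In Appendix 3, we also explain that Hull & White Equation (10.51) can be generalized to the case in which $dW_t = \rho_t\,dW_t^{\sigma} + \sqrt{1-\rho_t^2}\,dB_t$, where $B$ is a standard Brownian motion and $\rho$ is a stochastic correlation. Romano and Touzi (1997) show that in this case, and provided $\rho(\cdot)$, and $a(\cdot)$, $b(\cdot)$ in Eqs. (10.36) are independent of $S$,

$$C(S_t,\sigma_t^2,t,T) = \mathbb{E}_t\left[C_{BS}\left(S_te^{Z_t},t,T;\sqrt{\bar{V}}\right)\right] \qquad (10.53)$$

$$Z_t = -\frac{1}{2}\int_t^T\rho_u^2\sigma_u^2\,du + \int_t^T\rho_u\sigma_u\,dW_u^{\sigma}, \qquad \bar{V} = \frac{1}{T-t}\int_t^T(1-\rho_u^2)\,\sigma_u^2\,du$$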
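The mixing representation of Hull and White lends itself to a simple Monte Carlo sketch: simulate paths of the variance alone, compute the average variance on each path, and average the Black-Scholes price over paths. The code below is illustrative only. It assumes a driftless geometric process for the variance that is independent of the asset price shock, a left-endpoint Riemann sum for the average variance, and hypothetical parameter values; `hull_white_call` and its arguments are names introduced here.

```python
import math
import numpy as np

def bs_call(S, K, r, tau, var):
    """Black-Scholes call price, with `var` the variance per unit of time."""
    vol = math.sqrt(var * tau)
    d1 = (math.log(S / K) + r * tau) / vol + 0.5 * vol
    d2 = d1 - vol
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return S * cdf(d1) - K * math.exp(-r * tau) * cdf(d2)

def hull_white_call(S, K, r, tau, var0, mu_v, xi,
                    n_paths=20_000, n_steps=50, seed=0):
    """Average of Black-Scholes prices over simulated average variances."""
    rng = np.random.default_rng(seed)
    dt = tau / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    # Exact lognormal steps for d(var)/var = mu_v dt + xi dW (assumed dynamics).
    log_incr = (mu_v - 0.5 * xi**2) * dt + xi * math.sqrt(dt) * z
    log_var = math.log(var0) + np.cumsum(log_incr, axis=1)
    # Left-endpoint values of the variance on each sub-interval.
    var_left = np.hstack([np.full((n_paths, 1), var0), np.exp(log_var[:, :-1])])
    avg_var = var_left.mean(axis=1)
    prices = np.array([bs_call(S, K, r, tau, v) for v in avg_var])
    return float(prices.mean()), avg_var

# Constant variance (xi = 0) collapses to Black-Scholes.
price_flat, _ = hull_white_call(100.0, 100.0, 0.0, 0.5, 0.01, 0.0, 0.0)
# Random variance, hypothetical vol-of-variance xi = 1.
price_sv, avg_var = hull_white_call(100.0, 100.0, 0.0, 0.5, 0.01, 0.0, 1.0)
print(price_flat, price_sv, bs_call(100.0, 100.0, 0.0, 0.5, float(avg_var.mean())))
```

For an option struck at the forward, the Black-Scholes price is concave in the variance, so the averaged price cannot exceed the Black-Scholes price evaluated at the mean of the simulated average variances; this is the sign of the second-order term in the Taylor expansion for at-the-money options.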
Heston

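Heston (1993b) develops an analytical solution to a model with stochastic volatility, relying on the following dynamics of the stock price:

$$\begin{cases} d\ln S_t = \left(r - \frac{1}{2}\sigma_t^2\right)dt + \sigma_t\,dW_{1t} \\ d\sigma_t^2 = \kappa(\alpha - \sigma_t^2)\,dt + \xi\sigma_t\left(\rho\,dW_{1t} + \sqrt{1-\rho^2}\,dW_{2t}\right) \end{cases} \qquad (10.54)$$

The instantaneous variance is thus a "square-root" process. We provide a few hints on the derivation of Heston's formula, relying on a general formula based on the same line of reasoning leading to Eq. (10.10) in Section 10.2, as follows:

$$\begin{aligned} C(S_t,\sigma_t^2,t,T) &= e^{-r(T-t)}\,\mathbb{E}_t(S_T-K)^+ \\ &= e^{-r(T-t)}\int_0^\infty\!\!\int_0^\infty(S_T-K)^+\,q(S_T,\sigma_T^2)\,dS_T\,d\sigma_T^2 \\ &= S_t\int_0^\infty\mathbb{I}_{S_T\ge K}\,m^S(S_T)\,dS_T - e^{-r(T-t)}K\int_0^\infty\mathbb{I}_{S_T\ge K}\,m(S_T)\,dS_T \\ &= S_t\,Q^S(S_T\ge K) - e^{-r(T-t)}K\,Q(S_T\ge K) \end{aligned} \qquad (10.55)$$

where $q(S_T,\sigma_T^2)$ is the risk-neutral joint density of the stock price and variance at $T$, $m(S_T)$ is the risk-neutral marginal density of $S_T$, $m^S(S_T)$ is another marginal density of $S_T$ with Radon-Nikodym derivative with respect to $m(S_T)$ given in Eq. (10.11),

$$\eta_T(S_T) = \frac{m^S(S_T)}{m(S_T)} = \frac{e^{-r(T-t)}S_T}{S_t} \qquad (10.56)$$

and finally, $Q^S(S_T\ge K)$ and $Q(S_T\ge K)$ are two probabilities with densities $m^S$ and $m$, respectively. All these densities and probabilities are conditional upon the information at time $t$.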
By Girsanov's theorem, the density process associated with the Radon-Nikodym derivative in Eq. (10.56) satisfies,

$$\frac{d\eta_t}{\eta_t} = \sigma_t\,dW_{1t}$$

such that the stock price is solution to:

$$\begin{cases} d\ln S_t = \left(r + \frac{1}{2}\sigma_t^2\right)dt + \sigma_t\,dW^S_{1t} \\ d\sigma_t^2 = \left(\kappa\alpha - (\kappa - \rho\xi)\sigma_t^2\right)dt + \xi\sigma_t\left(\rho\,dW^S_{1t} + \sqrt{1-\rho^2}\,dW_{2t}\right) \end{cases} \qquad (10.57)$$

and $W^S_1$ is a Brownian motion under the new probability with density $m^S$, with $dW^S_{1t} = dW_{1t} - \sigma_t\,dt$.
Let $x_t \equiv \ln S_t$. In the Black-Scholes case, $\sigma^2$ is a constant, and the two probabilities, $Q^S(x_T\ge\ln K)$ and $Q(x_T\ge\ln K)$, can be expressed in closed-form, using Eq. (10.57) and Eq. (10.54), respectively, leading to the celebrated formula in Eq. (10.15), as explained in Section 10.4.2.
In Heston's model, the two probabilities, $P_1(x_t,\sigma_t^2,t) \equiv Q^S(x_T\ge\ln K)$ and $P_2(x_t,\sigma_t^2,t) \equiv Q(x_T\ge\ln K)$, are solutions to:

$$\mathcal{L}_1P_1 = 0, \qquad \mathcal{L}_2P_2 = 0 \qquad (10.58)$$

with the same boundary condition $P_j(x,\sigma^2,T) = \mathbb{I}_{x\ge\ln K}$, $j = 1,2$, and where $\mathcal{L}_1$ and $\mathcal{L}_2$ are the infinitesimal generators associated with Eq. (10.57) and Eq. (10.54). Indeed, we have that: (i) probabilities are martingales, due to $P_2(x_t,\sigma_t^2,t) = \mathbb{E}_t(\mathbb{I}_{x_T\ge\ln K}) = \Pr(x_T\ge\ln K)$, and similarly for $P_1(\cdot)$; and (ii) probabilities are diffusion processes because the stock price is. Eq. (10.58) then follows by the Feynman-Kac representation theorem.
The solution to the two partial differential equations is unknown in closed-form, though. However, their characteristic functions can be shown to be exponential affine in $x$ and $\sigma^2$. Precisely, define the two characteristic functions:

$$f_1(x_t,\sigma_t^2,t;\phi) = \mathbb{E}_t^S\left(e^{i\phi x_T}\right), \qquad f_2(x_t,\sigma_t^2,t;\phi) = \mathbb{E}_t\left(e^{i\phi x_T}\right), \qquad i = \sqrt{-1}$$

where $\mathbb{E}^S$ denotes the expectation taken with respect to $m^S$, and $\mathbb{E}$ denotes the conditional expectation taken against $m$. The two functions satisfy the same partial differential equations (10.58), but they can be solved in closed-form, because their boundary conditions are simply $f_j(x,\sigma^2,T;\phi) = e^{i\phi x}$. Indeed, a fundamental definition is that a model is "affine" if its characteristic function is exponential-affine in its state variables. Affine models were already used to analyze the term structure of interest rates, since at least Vasicek (1977) and Cox, Ingersoll and Ross (1985), as we shall discuss in Chapter 12. Heston's model is the option pricing counterpart to these models of the yield curve.
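The solution to the two characteristic functions is given by:

$$f_j(x,\sigma^2,t;\phi) = e^{C_j(T-t;\phi) + D_j(T-t;\phi)\sigma^2 + i\phi x}$$

where

$$C_j(\tau;\phi) = r\phi i\tau + \frac{\kappa\alpha}{\xi^2}\left[(b_j - \rho\xi\phi i + d_j)\tau - 2\ln\frac{1-g_je^{d_j\tau}}{1-g_j}\right]$$

$$D_j(\tau;\phi) = \frac{b_j - \rho\xi\phi i + d_j}{\xi^2}\cdot\frac{1-e^{d_j\tau}}{1-g_je^{d_j\tau}}$$

$$g_j = \frac{b_j - \rho\xi\phi i + d_j}{b_j - \rho\xi\phi i - d_j}, \qquad d_j = \sqrt{(\rho\xi\phi i - b_j)^2 - \xi^2(2u_j\phi i - \phi^2)}$$

with $u_1 = \frac{1}{2}$, $u_2 = -\frac{1}{2}$, $b_1 = \kappa - \rho\xi$ and $b_2 = \kappa$, such that,

$$P_j(x,\sigma^2,t) = \frac{1}{2} + \frac{1}{\pi}\int_0^\infty\text{Re}\left[\frac{e^{-i\phi\ln K}f_j(x,\sigma^2,t;\phi)}{i\phi}\right]d\phi \qquad (10.59)$$

[Write a small technical Appendix on inversions of characteristic functions] Replacing these two probabilities into Eq. (10.55) yields the celebrated Heston's formula.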
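The steps above (characteristic functions, inversion, and the pricing decomposition) can be sketched numerically. The code below is an illustrative implementation under assumptions that are not taken from the text: the variance follows a square-root process with speed `kappa`, long-run mean `alpha`, vol-of-variance `xi` and correlation `rho` under the risk-neutral probability; the characteristic functions are coded in the standard Heston (1993) form, rearranged in terms of a decaying exponential (an algebraically equivalent form that is commonly preferred for numerical stability); and the integral is truncated at a hypothetical upper bound `phi_max` and computed with a trapezoidal rule.

```python
import math
import numpy as np

def heston_call(S, K, r, tau, var0, kappa, alpha, xi, rho,
                phi_max=200.0, n=20001):
    """European call price by inversion of the two characteristic functions."""
    x = math.log(S)
    phi = np.linspace(1e-6, phi_max, n)  # avoid the removable singularity at 0
    probs = []
    # j = 1: stock-numeraire probability; j = 2: risk-neutral probability.
    for u, b in ((0.5, kappa - rho * xi), (-0.5, kappa)):
        beta = b - rho * xi * 1j * phi
        d = np.sqrt(beta**2 - xi**2 * (2.0 * u * 1j * phi - phi**2))
        g = (beta - d) / (beta + d)
        e = np.exp(-d * tau)
        C = r * 1j * phi * tau + kappa * alpha / xi**2 * (
            (beta - d) * tau - 2.0 * np.log((1.0 - g * e) / (1.0 - g)))
        D = (beta - d) / xi**2 * (1.0 - e) / (1.0 - g * e)
        f = np.exp(C + D * var0 + 1j * phi * x)
        integrand = np.real(np.exp(-1j * phi * math.log(K)) * f / (1j * phi))
        # Trapezoidal rule written out explicitly.
        integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(phi))
        probs.append(0.5 + integral / math.pi)
    P1, P2 = probs
    return S * P1 - K * math.exp(-r * tau) * P2

# Parameter values read from the caption of Figure 10.7 (zero-correlation case).
price_atm = heston_call(100.0, 100.0, 0.0, 0.5, 0.01, 2.0, 0.01, 0.1, 0.0)
price_itm = heston_call(100.0, 90.0, 0.0, 0.5, 0.01, 2.0, 0.01, 0.1, 0.0)
print(price_atm, price_itm)
```

With a small vol-of-variance and the variance starting at its long-run mean, the at-the-money price should stay close to the Black-Scholes price computed with a 10% volatility, which offers a quick sanity check of the inversion.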


10.5.3.5 Case study: assessing accumulators in markets with stochastic volatility

Is stochastic volatility important, in practice? Consider the accumulators described in Section 10.3.2, which are portfolios short $n$ puts having the same strike $K_p$, and long one call with strike $K_c$, such that the initial cost of the portfolio is equal to zero. Let $K_c = 100$, and assume all options expire in six months. We consider two markets. The first market is one where asset prices are generated by the Black & Scholes model. In the second market, asset prices follow Heston's model in Eq. (10.54). We assume that the parameter values are the same as those in Figure 10.7. We also assume that random fluctuations in volatility are not priced, such that the parameters $\kappa$ and $\alpha$ under the physical probability are the same as those under the risk-neutral. We address a number of issues aiming at exploring how stochastic volatility affects risk and return of these accumulators.
First, assume the current level of the index is $S_t = 98$. How many puts with strike $K_p = 90$ should we sell, to finance the long position in the call option, in both the Black & Scholes and Heston's markets? Assuming that the current volatility in Heston's model equals that related to the steady state variance, $\sigma_t^2 = \alpha = 0.01$, we find that in Heston's market, $n = 3.9399$, whereas in Black & Scholes, $n = 6.5362$.
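The Black & Scholes leg of this calculation is easy to sketch: the number of puts to sell is the call price divided by the put price, so that the portfolio costs zero. The code below is a minimal illustration assuming the parameter reading used in Figure 10.7 (volatility of 9%, zero short-term rate, six months to maturity, call strike of 100); `puts_to_sell` is a name introduced here. If that reading is right, the two results should land close to the Black & Scholes figures reported in this case study.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call(S, K, r, tau, vol):
    """Black-Scholes price of a European call."""
    s = vol * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * vol**2) * tau) / s
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - s)

def bs_put(S, K, r, tau, vol):
    """Black-Scholes price of a European put, through put-call parity."""
    return bs_call(S, K, r, tau, vol) - S + K * math.exp(-r * tau)

def puts_to_sell(S, K_put, K_call, r, tau, vol):
    """Number of puts whose sale finances one long call (zero initial cost)."""
    return bs_call(S, K_call, r, tau, vol) / bs_put(S, K_put, r, tau, vol)

# Assumed parameter values: vol = 9%, r = 0, six months, call strike 100.
n_98 = puts_to_sell(98.0, 90.0, 100.0, 0.0, 0.5, 0.09)
n_102 = puts_to_sell(102.0, 95.0, 100.0, 0.0, 0.5, 0.09)
print(n_98, n_102)
```

Repeating the exercise in Heston's market only requires swapping the two pricing functions for a stochastic volatility pricer.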
Next, we assume the current level of the index is $S_t = 102$ and re-do the previous calculations assuming, now, that the strike of the put is $K_p = 95$. We find that in Heston's market, $n = 5.9338$, whereas in Black & Scholes, $n = 8.8767$. The interpretation is that by increasing the strike from $K_p = 90$ to $K_p = 95$, we move towards more expensive puts. At the same time, by assuming that $S_t = 102$ (instead of $S_t = 98$), we are also considering a better market, which has an opposite effect on the put price than that arising following an increase in $K_p$. Moreover, a better market obviously makes call options more expensive. These effects lead to a higher number of puts to sell, in order to finance a long position in the call.
We can assess how the portfolio changes, once we consider a value of the index still equal to $S_t = 98$, but two levels of volatility: one higher than that corresponding to the steady state, $\sigma_t^2 = 0.02$, and another, lower, $\sigma_t^2 = 0.008$. In the high volatility case, $\sigma_t^2 = 0.02$, we find that $n = 2.9252$, and in the low volatility case, $\sigma_t^2 = 0.008$, we find that $n = 4.3249$. An increase in volatility makes both puts and calls increase in value, although the price of the puts increases more than that of the call.
Finally, assume that the rate of appreciation of the asset price is constant under the physical probability, and equal to $\mu = 3\%$. We can use Monte Carlo simulations to calculate the average and standard deviation of the Profits & Losses generated by the accumulator. We summarize our findings in the table below.

P&Ls for a six month accumulator

                    average   std dev
B&S                 1.3716    6.5114
Heston              1.2925    7.0352
Heston (high vol)   1.3571    8.9136
Heston (low vol)    1.2700    6.5906

Figure 10.8 depicts the relative frequency of the P&Ls corresponding to all cases we consider.

[Figure 10.8: four histograms of the relative frequency of P&Ls (vertical axes from 0 to 0.8), with panel titles Black & Scholes, Heston, Heston (high vol) and Heston (low vol).]

FIGURE 10.8. Frequency of Profit and Losses regarding a six month accumulator under different market assumptions: Black & Scholes (NW quadrant), Heston's with current volatility fixed at its long-term value (NE), Heston's with current volatility lower than its long-term value (SE), and Heston's with current volatility higher than its long-term value (SW).

Note that although the average P&Ls are positive in all cases, the frequency distributions of the P&Ls exhibit quite high standard deviations, with long left tails: downside risk is quite substantial, consistently with the payoff structure that we depicted in Figure 10.4. In Heston's market, average profits are higher than in Black & Scholes because accumulators necessitate fewer puts. Within Heston's market, profits are lower when we move to both a low volatility and a high volatility scenario. In the low volatility scenario, average profits are lower because the number of puts in the accumulator needs to increase. In the high volatility scenario, profits are lower because whilst the number of puts in the accumulator decreases, a more volatile market also makes the accumulator more likely to generate adverse outcomes, thereby leveling down the expected profits.

10.6 Trading volatility with options


10.6.1 Payoffs
Positioning in volatility is a trading strategy relying on expectations about future volatility
movements. It is a non-directional strategy, as it does not rely on the direction of the markets,
only on their changes. It is market practice to distinguish between two types of volatility trading:

(i) Vega trading, or volatility surface trading. It refers to a trade aiming to profit from a view that the term-structure of implied volatilities will change, for example, from the expectation of a flattening or a steepening term-structure of implied volatilities. It requires positioning into multiple types of options according to the nature of the expectation. For example, a bull flattener relies on the expectation that long-term implied volatilities will decrease faster than short-term implied volatilities, leading the term-structure of implied volatilities to flatten, which could be implemented through a portfolio which is: (i.1) long short-term options, and (i.2) short the long-term ones. (This portfolio would need to be delta-hedged, for reasons explained below.)
(ii) Gamma trading. It is a trade that aims to generate profits from a realized volatility exceeding the current implied volatility. It relies on directional views regarding ongoing volatility developments. For this reason, gamma trading has a horizon that is typically shorter than that of vega trading.
Option-based strategies might allow us to express views about these volatility developments, and include trading straddles, strangles, butterflies, calendars, or delta-hedged option positions, as we shall explain below. They consist of portfolios comprising options and assets underlying these options, and aim to make P&Ls consistent with views about volatility developments.
A natural question arises. We know option prices are, generally, increasing in volatility. So why do we need to create portfolios of options and underlyings, in order to trade volatility? The reason is that option prices are increasing in both volatility and the asset price. For example, in a stochastic volatility setting, the option price is $C(S_t,\sigma_t^2,t,T)$, and if the volatility $\sigma_t$ increases, the option price $C(S_t,\sigma_t^2,t,T)$ increases as well, in general. However, it might be possible that the increase in volatility occurs exactly when the asset price decreases. Incidentally, this circumstance is quite likely to occur, given the empirical evidence about the negative correlation between asset returns and volatility changes reviewed in Section 10.5. The implication would be that the increase in $C$ determined by an increase in $\sigma$ might be offset by the fall in $C$ following the drop in $S$. To isolate movements in the asset price volatility, we need to consider portfolios "reverse-engineered" so as to be insensitive to changes in the underlying asset price. [Mention here and in the next section Goldman Sachs approach to VIX]
To mitigate the effects of the movements in the underlying price, we may consider Black-Scholes hedges, such that the long position in the call option is offset by the short position in the Black-Scholes replicating portfolio, which, by construction, only neutralizes movements in $S$, not $\sigma$. An alternative is a portfolio comprising options with final payoffs driven by the stock price, and negatively correlated, such as European put and call options. For example, a straddle is a portfolio of one call option and one put option that have the same strike price and the same maturity. (A strangle is the same as a straddle, with the difference that the strike of the call differs from that of the put.)
Figure 10.9 depicts an example of payoffs arising from going long a straddle. The left panel shows the final payoff, equal to $(S_T-K)^+ + (K-S_T)^+$, for $K = 100$, as well as the value of this payoff at $t$, assuming a Black and Scholes market, with a risk-free rate $r = 1\%$, instantaneous volatility $\sigma = 20\%$, and under two assumptions about the maturity of the straddle, three months and one month. The right panel shows, instead, the P&L of this straddle, defined as $(S_T-K)^+ + (K-S_T)^+ - e^{r(T-t)}\text{Cost}_t$, where $\text{Cost}_t = C_{BS}(S_t,t,T;\sigma) + P_{BS}(S_t,t,T;\sigma)$, $C_{BS}(\cdot)$ is the Black-Scholes formula in Eq. (10.15), $P_{BS}(\cdot)$ is the corresponding put price, and $T$ is the maturity of the straddle, with $T_1 - t = \frac{1}{12}$ and $T_2 - t = \frac{3}{12}$. We assume that the index level at $t$ is $S_t = 100$, such that the straddles are approximately at the money: the strike leading to at-the-money straddles is $K = S_te^{r(T-t)}$, but for comparison reasons, we keep on setting $K = 100$ for both straddles.
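The P&L of a long straddle held to maturity can be sketched as follows. This is an illustration under assumptions: the cost is the sum of Black-Scholes call and put prices at inception, compounded at the risk-free rate to the expiration date, and the parameter values (strike and index at 100, rate of 1%, volatility of 10%, one month to maturity) are taken as an example; `straddle_pnl` is a name introduced here.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call_put(S, K, r, tau, vol):
    """Black-Scholes call and put prices (put obtained by put-call parity)."""
    s = vol * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * vol**2) * tau) / s
    call = S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - s)
    put = call - S + K * math.exp(-r * tau)
    return call, put

def straddle_pnl(S_T, S0, K, r, tau, vol):
    """Payoff of a long straddle at expiration minus its compounded initial cost."""
    call, put = bs_call_put(S0, K, r, tau, vol)
    cost = call + put
    payoff = max(S_T - K, 0.0) + max(K - S_T, 0.0)
    return payoff - math.exp(r * tau) * cost

# Example values: index and strike at 100, r = 1%, vol = 10%, one month.
args = (100.0, 100.0, 0.01, 1.0 / 12.0, 0.10)
pnl_flat = straddle_pnl(100.0, *args)
pnl_up = straddle_pnl(105.0, *args)
pnl_down = straddle_pnl(95.0, *args)
print(pnl_flat, pnl_up, pnl_down)
```

The straddle loses its whole compounded cost if the index ends at the strike, and gains when the index ends far enough from it in either direction.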

[Figure 10.9: two panels. Left panel: straddle payoff and straddle values against the index level (90 to 110). Right panel: straddle P&Ls against $S_T$ (96 to 105).]

FIGURE 10.9. The left panel depicts the payoff of a straddle with strike price $K = 100$ (thin dashed line), as well as the value of a straddle with maturity $T_2 - t = \frac{3}{12}$ (solid line) and $T_1 - t = \frac{1}{12}$ (dashed line). The right panel depicts the P&Ls of roughly at-the-money straddles bought at time $t$, defined as $(S_T-K)^+ + (K-S_T)^+ - e^{r(T-t)}\text{Cost}_t$, where $\text{Cost}_t = C_{BS}(S_t,t,T;\sigma) + P_{BS}(S_t,t,T;\sigma)$, with $T_2 - t = \frac{3}{12}$ (solid line), and $T_1 - t = \frac{1}{12}$ (dashed line), $r = 1\%$, $\sigma = 10\%$, and the index level $S_t = 100$.

The logic behind a straddle is that a call and a put have deltas that roughly compensate each other, thereby allowing this portfolio to change primarily because of volatility movements. Figure 10.9 illustrates that a straddle helps express views about volatility, in that it pays off whenever the stock price moves sufficiently away from the initial level of the index, $S_t = 100$.
Note a technical complication, arising because the delta of the straddle is not always precisely zero, especially when the index level drifts away from the strike. By Eq. (10.21), and the put-call parity in Eq. (10.7), it is $2\Phi(d_1) - 1$. For example, we have that in the Black & Scholes market,

$$\Delta_{\text{Straddle}} = \Delta_C + \Delta_P = 2\Phi(d_1) - 1, \qquad d_1 = \frac{\ln(S_t/K) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}$$

such that once we choose $K = S_te^{r(T-t)}$ at inception, we have, at inception,

$$\begin{aligned} \Delta_{\text{Straddle}} &= 2\,\Phi\!\left(\tfrac{1}{2}\sigma\sqrt{T-t}\right) - 1 \\ &= 2\left[\Phi\!\left(\tfrac{1}{2}\sigma\sqrt{T-t}\right) - \tfrac{1}{2}\right] \\ &\approx 2\left[\Phi(0) + \phi(0)\,\tfrac{1}{2}\sigma\sqrt{T-t} - \tfrac{1}{2}\right] \\ &= \frac{\sigma\sqrt{T-t}}{\sqrt{2\pi}} \end{aligned}$$

where the third line follows by a Taylor's linear approximation for $\sigma\sqrt{T-t}$ small, and the fourth by a simple calculation. Figure 10.10 depicts the delta of the straddle as a function of the index, after inception.11
The reason it deviates from zero while the index is significantly away from its initial level is that the straddle becomes a "short" or a "long" position as soon as the index moves. When the index is up, the delta of the call is higher than the delta of the put, in absolute value, and when the index is down, the opposite happens.
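The behavior of the straddle delta can be checked with a few lines of code. The sketch below assumes the Black-Scholes expression for the delta of a call and put-call parity, so that the straddle delta is twice the call delta minus one; it compares the exact value at inception, for a strike set at the forward price, with the first-order approximation $\sigma\sqrt{T-t}/\sqrt{2\pi}$. Parameter values are examples (rate of 1%, volatility of 10%, three months), and `straddle_delta` is a name introduced here.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def straddle_delta(S, K, r, tau, vol):
    """Delta of a long straddle: call delta plus put delta, i.e. 2*Phi(d1) - 1."""
    d1 = (math.log(S / K) + (r + 0.5 * vol**2) * tau) / (vol * math.sqrt(tau))
    return 2.0 * norm_cdf(d1) - 1.0

S, r, tau, vol = 100.0, 0.01, 3.0 / 12.0, 0.10

# Strike set at the forward price, so that d1 = vol * sqrt(tau) / 2 at inception.
K_forward = S * math.exp(r * tau)
delta_exact = straddle_delta(S, K_forward, r, tau, vol)
delta_approx = vol * math.sqrt(tau) / math.sqrt(2.0 * math.pi)

# After inception, with the strike fixed at 100, the delta drifts away from zero.
delta_up = straddle_delta(105.0, 100.0, r, tau, vol)
delta_down = straddle_delta(95.0, 100.0, r, tau, vol)

print(delta_exact, delta_approx, delta_up, delta_down)
```

The delta at inception is small but not zero, and it turns clearly positive or negative once the index moves away from the strike, which is why straddles typically need to be re-hedged.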

[Figure 10.10: delta of the straddle (vertical axis, from -0.8 to 0.8) against the index level (96 to 105).]

FIGURE 10.10. The delta of a straddle, $2\Phi(d_1) - 1$, where $\Phi(d_1)$ is the Black and Scholes delta in Eq. (10.21), the strike $K = 100$, for maturity $T_2 - t = \frac{3}{12}$ (solid line) and $T_1 - t = \frac{1}{12}$ (dashed line), and $r = 1\%$ and $\sigma = 10\%$.

Naturally, shorting a straddle leads to payoffs with opposite sign than those in Figure 10.9: shorting a straddle relies on the expectation that markets are going to be stable. Straddles bear some inglorious history. In 1995, the 233-year old Barings Bank collapsed, because of the famous short-straddle one of its traders, Nick Leeson, was implementing on the Nikkei Index. A short-straddle is, of course, a view that volatility will not rise. However, in January 1995, a violent earthquake made the Nikkei index crash by almost 7% in a week. The straddle was naked, i.e. delta-hedged, at most, and led to losses that Leeson was not only unable to absorb, but also amplified, given he was insisting on having views the Index would stabilize. The Index did not.
Potential losses arising from a short position in straddles can be reduced, by going long one additional portfolio comprising: (i) an out-of-the-money put, which pays exactly when the underlying goes down, and (ii) an out-of-the-money call, which pays when the underlying goes up. Combining this portfolio with a short-straddle leads to what is known as a butterfly spread. Figure 10.11 depicts payoffs and P&L relating to a butterfly, where the straddle has strike $K = 100$ and maturity one month (as one of the straddles in Figure 10.9), and the strikes of the out-of-the-money call and put are $K_c = 102$ and $K_p = 98$. The right panel shows the P&L of the butterfly, defined as $-\left[(S_T-K)^+ + (K-S_T)^+\right] + (S_T-K_c)^+ + (K_p-S_T)^+ - e^{r(T-t)}\text{Cost}_t$, where $T - t = \frac{1}{12}$, and $\text{Cost}_t$ is the value of the butterfly at time $t$ in the same Black & Scholes market considered in Figures 10.9-10.10.

11 Note that by standard homogeneity properties, $\Delta_{\text{Straddle}} = 2\Phi(d_1) - 1$, and that at inception, and for an (approximately) at-the-money straddle, $\Delta_{\text{Straddle}}|_{K=S_t} = 2\Phi\!\left(\left(r + \frac{1}{2}\sigma^2\right)\sqrt{T-t}/\sigma\right) - 1$.

[Figure 10.11: two panels plotted against $S_T$ (96 to 104). Left panel: payoffs of the butterfly and of its components. Right panel: P&L of the butterfly.]

FIGURE 10.11. The left panel depicts the payoff of a butterfly with maturity equal to one month (solid line), which is (i) short one straddle (the thin dashed line) with strike $K = 100$, and (ii) long one out-of-the-money put, with strike $K_p = 98$, and one out-of-the-money call, with strike $K_c = 102$ (the dashed line). The right panel depicts the P&L of the butterfly, $-\left[(S_T-K)^+ + (K-S_T)^+\right] + (S_T-K_c)^+ + (K_p-S_T)^+ - e^{r(T-t)}\text{Cost}_t$, where $T - t = \frac{1}{12}$, and $\text{Cost}_t$ is the value of the butterfly at time $t$, obtained in a Black & Scholes market with $r = 1\%$ and $\sigma = 10\%$ and the index level $S_t = 100$.

Calendar spreads are alternative strategies to straddles with which to express views about
volatility. They are portfolios long one call with maturity $T_1$ and short one call with maturity
$T_2$, where $T_1 < T_2$, and where the two calls have the same strike price. If the underlying
asset price does not move too much, the calendar spread value drops, because the price decay
due to the passage of time (see Section 10.4.6.2) is more severe for the call with lower time to
maturity.
However, at any given point in time, the calendar spread increases in value as soon as the
underlying price moves, regardless of whether this movement is positive or negative. This property
is due to convexity. Let us explain. Note that as the option maturity decreases, the option
value becomes more convex with respect to the underlying price. Therefore, at any given point
in time, when the underlying increases, the short-dated option value increases more than the
long-dated; and when the underlying decreases, the short-dated option value decreases less than
the long-dated.12
Therefore, going long a calendar is consistent with the view that the asset price will
fluctuate substantially away from its current level (regardless of whether on its way up or down), that
is, that market (realized) volatility is about to increase. Figure 10.12 depicts the payoff and the
P&Ls of a calendar, with strike $K = 100$, an index level $S_t = 100$, maturities $T_1 = \frac{1}{12}$ and
$T_2 = \frac{3}{12}$, and assuming the same Black & Scholes market as in Figures 10.9-10.11. The payoff
one month after the initial positioning is $\pi(S_T) \equiv (S_T - K)^+ - C_{BS}(S_T; K, \frac{2}{12})$, whereas the P&L is
given by $\pi(S_T) - e^{\frac{r}{12}}\left(C_{BS}(S_t; K, T_1) - C_{BS}(S_t; K, T_2)\right)$, where $C_{BS}(S; K, \tau)$ denotes the
Black & Scholes price of a call with strike $K$ and time to maturity $\tau$ when the underlying is $S$.
12 This argument is best understood while comparing the intrinsic value of an option to the option value before maturity. The
former is flat for relatively small values of the underlying price. Otherwise, it increases one-to-one with the underlying.

FIGURE 10.12. Payoff and P&L of a calendar, a portfolio which is (i) long a call option
with strike price $K = 100$ and time to maturity $T_1 = \frac{1}{12}$, and (ii) short a call option
with strike price $K = 100$ and time to maturity $T_2 = \frac{3}{12}$. The solid line plots the payoff
after one month from inception, defined as $\pi(S_T) \equiv (S_T - K)^+ - C_{BS}(S_T; K, \frac{2}{12})$,
whereas the dashed line is the payoff inclusive of the initial value of the position,
$\pi(S_T) - e^{\frac{r}{12}}\left(C_{BS}(S_t; K, T_1) - C_{BS}(S_t; K, T_2)\right)$, with the index level $S_t = 100$.
The horizontal axis is $S_T$.
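The convexity argument behind the calendar can be checked with a few lines of code. The sketch below assumes the same Black & Scholes market as in the butterfly example ($r = 1\%$, $\sigma = 10\%$, $S_t = 100$); the names `bs_call`, `calendar_payoff` and `calendar_pnl` are hypothetical, and the numbers are only meant to illustrate the shape of the curves, not to reproduce Figure 10.12 exactly.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, tau, r, sigma):
    """Black & Scholes price of a European call (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def calendar_payoff(S_T, K=100.0, r=0.01, sigma=0.10):
    """Value after one month: the one-month call expires, while the
    three-month call (held short) still has two months to run."""
    return max(S_T - K, 0.0) - bs_call(S_T, K, 2.0 / 12.0, r, sigma)


def calendar_pnl(S_T, S_t=100.0, K=100.0, r=0.01, sigma=0.10):
    """Payoff net of the compounded initial value of the position."""
    initial_value = (bs_call(S_t, K, 1.0 / 12.0, r, sigma)
                     - bs_call(S_t, K, 3.0 / 12.0, r, sigma))
    return calendar_payoff(S_T, K, r, sigma) - math.exp(r / 12.0) * initial_value


if __name__ == "__main__":
    for S_T in (96.0, 98.0, 100.0, 102.0, 104.0):
        print(S_T, round(calendar_payoff(S_T), 4), round(calendar_pnl(S_T), 4))
```

Under these assumptions the P&L is lowest when the index ends where it started and improves as the index moves away from the strike in either direction, consistently with the view that realized volatility is about to increase.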

10.6.2 P&Ls of $\Delta$-hedged strategies

Straddles, calendars, or Black-Scholes hedged strategies are not necessarily the best way to
formulate views regarding ongoing volatility developments. To understand volatility trading
through option-based strategies, consider the simplest strategy, where one buys an option and
hedges it through the Black and Scholes formula.13 Suppose we live in a world with stochastic
volatility, where the asset price moves as in Eqs. (10.36). Assume that at time $t$, we go long
a call option with market price equal to $C(S_t, \sigma_t^2, t)$. Let us build up a self-financed portfolio
with value $V_t$,
$$V_t = a_t S_t + b_t M_t, \qquad (10.60)$$
where $M_t$ is the money market account, and
$$V_0 = C(S_0, \sigma_0^2, 0) = C_{BS}(S_0, 0; \mathrm{IV}_0), \qquad (10.61)$$
and $\mathrm{IV}_0$ is the Black-Scholes implied volatility as of time $t = 0$, i.e. the time at which we are
to take a view on future volatility.
Consider, first, the following heuristic arguments. Assume a Black & Scholes market, where
the short-term rate, $r$, is zero and $\mu$ is also zero. While volatility is constant in this market, there
might be periods where the realized instantaneous variance,
$\sigma_t^2 \equiv \frac{1}{\Delta t}\left(\frac{\Delta S_t}{S_t}\right)^2$, is higher than $\mathrm{IV}_0^2$. What is
the P&L of a call option delta-hedged through Black & Scholes? Note that a call option
delta-hedged with Black & Scholes is simply a portfolio with value equal to $\Pi_t = C_t - \Delta_{BS} S_t$,
such that, approximately,
$$\Delta\Pi_t = \underbrace{\Theta\,\Delta t + \Delta_{BS}\,\Delta S_t + \tfrac{1}{2}\Gamma\,(\Delta S_t)^2}_{\approx\,\Delta C_t} - \Delta_{BS}\,\Delta S_t = \tfrac{1}{2}\,\Gamma S_t^2\left(\sigma_t^2 - \mathrm{IV}_0^2\right)\Delta t, \qquad (10.62)$$
where $\Theta = \frac{\partial C_{BS}}{\partial t}$, $\Delta_{BS} = \frac{\partial C_{BS}}{\partial S}$, the Delta, $\Gamma = \frac{\partial^2 C_{BS}}{\partial S^2}$, the Gamma, and the second equality follows
by the Black-Scholes pricing equation (10.14), which for $r = 0$ reads $\Theta = -\frac{1}{2}\mathrm{IV}_0^2 S_t^2\,\Gamma$. Aggregating up to the maturity of the option
delivers the P&L at $T$:
$$\text{P\&L}_T \equiv \sum_{t=1}^{T}\Delta\Pi_t = \frac{1}{2}\sum_{t=1}^{T}\Gamma_t S_t^2\left(\sigma_t^2 - \mathrm{IV}_0^2\right)\Delta t. \qquad (10.63)$$
Note that the Black & Scholes delta is needed to compensate for those portions of the call
price movements arising due to the asset price movements: the term $\Delta_{BS}\,\Delta S_t$ in the brackets of
Eq. (10.62) contributes positively to the P&L when $\Delta S_t > 0$, and negatively otherwise. So the
hedging is natural: the call price can go up because of realized volatility (i.e., $\sigma_t^2$) or because
the underlying price goes up. To isolate the views about volatility, we need to hedge against
movements in the underlying price. As is clear, hedging through the Black-Scholes delta helps
neutralize this effect.14
Hedging can only be effective in the very short-term, as the period-by-period profits, $\Delta\Pi_t$ in
Eq. (10.62), only depend on how far realized volatility is from the initial Black-Scholes implied
volatility.15 In general, hedging might lead to a P&L inconsistent with views, because the
difference between realized volatility and Black-Scholes implied volatility at time $t$,
$\sigma_t^2 - \mathrm{IV}_0^2$, is weighted with the Dollar Gamma, $\Gamma_t S_t^2$, which is positive as the Black-Scholes price
is convex in $S$. In other words, we may well end up with a negative $\text{P\&L}_T$, even if the terms
$\sigma_t^2 - \mathrm{IV}_0^2$ are positive for most of the time, a feature known as price-dependency. We
now illustrate these facts through a continuous-time model with random volatility.
13 The following arguments also apply to the hypothetical situation where an investment bank, say, purchases an option for
mere market-making purposes, and then tries to hedge against it through Black-Scholes. It is, however, an unrealistic situation, as
investment banks hedge through books, not through the single units adding up to the books.
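14 Naturally, in the continuous time limit, and assuming Black & Scholes is true, $\sigma_t^2 \to \mathrm{IV}_0^2$, such that Eq. (10.63) shrinks
to zero. Therefore, in the Black-Scholes market, views on realized volatility can be channelled only through discrete rebalancing.
Below, we shall explain that a non-zero P&L obtains even in the continuous time limit, once the stock price displays stochastic
volatility.
15 A similar P&L obtains regarding straddles and calendars.
Consider a general situation where volatility is not constant, such that the model is misspecified.
El Karoui, Jeanblanc-Picqué and Shreve (1998) make the following observations. Consider
the value of the self-financed portfolio in Eq. (10.60), with $a_t = \frac{\partial C_{BS}}{\partial S}(S_t, t; \mathrm{IV}_0)$. Because this portfolio is self-financed,
$$dV_t = a_t\,dS_t + b_t\,dM_t = \left[rV_t + (\mu_t - r)\,a_t S_t\right]dt + \sigma_t\,a_t S_t\,dW_t.$$
Moreover, by Itô's lemma,
$$dC_{BS}(S_t, t; \mathrm{IV}_0) = \left[\frac{\partial C_{BS}}{\partial t} + \mu_t S_t\frac{\partial C_{BS}}{\partial S} + \frac{1}{2}\sigma_t^2 S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right]dt + \sigma_t S_t\frac{\partial C_{BS}}{\partial S}\,dW_t$$
$$= \left[rC_{BS} + (\mu_t - r)\,S_t\frac{\partial C_{BS}}{\partial S} + \frac{1}{2}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right]dt + \sigma_t S_t\frac{\partial C_{BS}}{\partial S}\,dW_t,$$
where the second equality follows because $C_{BS}$ satisfies the Black-Scholes pricing equation with volatility $\mathrm{IV}_0$.
Therefore, the tracking error, or P&L, defined as the difference between the Black-Scholes price
and the portfolio value,
$$\text{P\&L}_t \equiv C_{BS}(S_t, t; \mathrm{IV}_0) - V_t,$$
satisfies
$$d\,\text{P\&L}_t = \left[r\,\text{P\&L}_t + \frac{1}{2}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right]dt.$$
At maturity $T$,
$$\text{P\&L}_T = \max\{S_T - K, 0\} - V_T = \frac{1}{2}\int_0^T e^{r(T-t)}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\,\frac{\partial^2 C_{BS}(S_t, t; \mathrm{IV}_0)}{\partial S^2}\,dt. \qquad (10.64)$$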

This expression is the continuous-time counterpart to Eq. (10.63) in a market with stochastic
volatility. Moreover, it can be shown that a delta-hedged straddle strategy leads to twice the
expression in Eq. (10.64), with the second partial of the straddle replacing the Black-Scholes
Gamma. Because the Black-Scholes price is convex, Eq. (10.64) tells us that even if we do not
exactly know the law of movement of volatility, but still hold the view it will be persistently
higher than the initial Black-Scholes implied volatility, we can obtain positive profits through
(i) a long position in a call, and (ii) a short position in the Black-Scholes replicating portfolio.
Naturally, it is not an arbitrage opportunity. The critical assumption is that volatility will
increase.
Eq. (10.64) is problematic. Even if the volatility $\sigma_t$ is higher than $\mathrm{IV}_0$ for most of the time,
the final $\text{P\&L}_T$ may not necessarily lead to a profit. The reason is that each volatility view,
$\sigma_t^2 - \mathrm{IV}_0^2$, is weighted by the Dollar Gamma, $S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}$. It may be that bad realizations of the
volatility views, i.e. $\sigma_t^2 < \mathrm{IV}_0^2$, occur precisely when the Dollar Gamma is large: this is the
price-dependency issue raised whilst discussing Eq. (10.63). Moreover, the strategy is costly, as it
relies on $\Delta$-hedging. The volatility contracts of Section 10.8 overcome these difficulties.
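Price-dependency can be made concrete with the discrete P&L of Eq. (10.63). The sketch below is illustrative only: it assumes $r = 0$, a one-month call struck at 100 hedged at $\mathrm{IV}_0 = 10\%$, 21 rebalancing dates, and three hand-made (hypothetical) price paths; `bs_gamma` and `gamma_weighted_pnl` are names introduced here for the example.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_gamma(S, K, tau, sigma, r=0.0):
    """Black & Scholes Gamma of a European option."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return norm_pdf(d1) / (S * sigma * math.sqrt(tau))


def gamma_weighted_pnl(path, K, T, iv):
    """Discrete P&L of a delta-hedged call, as in Eq. (10.63):
    0.5 * sum of DollarGamma_t * (squared return - iv^2 * dt)."""
    n = len(path) - 1
    dt = T / n
    pnl = 0.0
    for i in range(n):
        S = path[i]
        tau = T - i * dt                      # time left at the rebalancing date
        squared_return = ((path[i + 1] - S) / S) ** 2
        dollar_gamma = bs_gamma(S, K, tau, iv) * S**2
        pnl += 0.5 * dollar_gamma * (squared_return - iv**2 * dt)
    return pnl


if __name__ == "__main__":
    K, T, iv = 100.0, 1.0 / 12.0, 0.10
    flat = [100.0] * 22                       # no realized volatility at all
    choppy = [100.0, 102.0] * 11              # large moves close to the strike
    # volatile far from the strike, then quiet exactly where Gamma is large
    trap = [130.0, 133.0] * 8 + [100.0] * 6
    for name, path in (("flat", flat), ("choppy", choppy), ("trap", trap)):
        print(name, round(gamma_weighted_pnl(path, K, T, iv), 4))
```

In the third path, realized variance exceeds $\mathrm{IV}_0^2$ in 16 of the 21 periods, and yet the P&L is negative: the volatile periods occur where the Dollar Gamma is negligible, and the quiet ones where it is large.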

10.7 Local volatility


This section generalizes the previous models with stochastic volatility, so as to price the
entire cross-section of options (the "skew") without errors. Section 10.8 explains that there
exists a beautiful connection between the model in this section and the price of volatility, i.e.,
the price to be paid in dedicated variance swaps aiming to protect investors from changes in
volatility over a fixed horizon.
10.7.1 Issues
Stochastic volatility models might provide interesting explanations of empirical facts such as the smile effect,
discussed in Section 10.5.2. However, these models cannot allow for a perfect fit of the smile.
Towards the end of the 1980s and the beginning of the 1990s, a modeling approach emerged to cope
with issues relating to a perfect fit of the yield curve. As reviewed in the next two chapters,
this modeling approach was a response to the need of pricing interest rate derivatives while
relying on models in which the underlying assets in the banks' books (bonds, say) were priced
without errors. In the early and mid 1990s, methods were developed to deal with equity options
(Derman and Kani, 1994; Dupire, 1994; Derman, 1998), which we succinctly review in this
section.
Why is it important to exactly fit all of the already existing options? Trading deals with
both plain vanilla and less liquid, or "exotic" derivatives. Suppose we wish to price exotic
derivatives. We want to make sure the model we use to price the illiquid option predicts that
the plain vanilla option prices are identical to those we are trading. How can we trust a model
that is not even able to pin down all outstanding contracts? A model like this could give rise
to arbitrage opportunities to unscrupulous users. To achieve these tasks, we need to feed the
model not only with information regarding the current price, but also with information linking
to the entire collection of available option prices.
[Plan of this Section]
10.7.2 Implied binomial trees
We begin with the usual binomial tree of Cox, Ross and Rubinstein (1979), then deal with
implied binomial trees. The idea underlying implied binomial trees is to enrich the basic Black &
Scholes model with information regarding the very same asset we are pricing (i.e., the options).
While Black-Scholes and Cox-Ross-Rubinstein feed the evaluation model with information only
relating to the initial stock price, implied binomial trees also use information available through
derivatives data, aiming to model developments in the stock price. The resulting model is one
with time-varying volatility, in which each derivative price used to feed the model is fit without
errors. The model can be used to price exotic derivatives while, by design, pricing existing
derivatives without errors.
10.7.2.1 Binomial trees

[Use material from D:\antonio\lectures\xed income LSE FM413\classes\simple trees, note


its almost copy-paste]
10.7.2.2 Implied binomial trees

Implied binomial trees are the discrete-time version of the local volatility model in continuous time.
The discrete-time approach is due to Derman and Kani (1994) and Rubinstein (1994), and the
continuous-time approach to Dupire (1994).
We assume that the short-term rate, $r$, is constant, or exogenously given. We shall rely on
Arrow-Debreu securities, i.e., securities that pay off one unit of numéraire in a given state of
the world, and zero otherwise (see Part I of these lectures). Let $\lambda_i(n+1)$ be the price of an
Arrow-Debreu security that pays off \$1 at time $n + 1$ and in state $i$, such that at time $n$, we
have the following tree
$$
\begin{array}{lcl}
 & & (i+1,\,n+1):\ \$0 \\
(i,\,n):\ e^{-r}\,(1 - p_i(n)) & \nearrow\searrow & \\
 & & (i,\,n+1):\ \$1 \\
(i-1,\,n):\ e^{-r}\,p_{i-1}(n) & \nearrow\searrow & \\
 & & (i-1,\,n+1):\ \$0 \\
\text{time } n & & \text{time } n+1
\end{array}
$$
In this tree, $p_i(n)$ is the risk-neutral probability of an upward movement in the stock price at
time $n$ in state $i$. Displayed next to the nodes at time $n$ are the values of the Arrow-Debreu
security that pays at $(i, n+1)$, both at $(i, n)$ and $(i-1, n)$. By explanations given in Chapter
2, we have that
$$\lambda_i(n+1) = e^{-r}\left[\lambda_i(n)\,(1 - p_i(n)) + \lambda_{i-1}(n)\,p_{i-1}(n)\right]. \qquad (10.65)$$
Eq. (10.65) is known as the forward equation for the set of all the Arrow-Debreu security
prices. It can be solved recursively once $p_i(n)$ is given, as mentioned. To illustrate,
suppose that risk-neutral probabilities are constant and equal to $p$, such that the solution to Eq.
(10.65) is, simply, the discounted value of the risk-neutral probability that $i$ upward movements
occur over $n + 1$ trials:
$$\lambda_i(n+1) = e^{-r(n+1)}\binom{n+1}{i}\,p^i\,(1 - p)^{n+1-i}.$$
This example can be used to price Arrow-Debreu securities in the context of the
Cox-Ross-Rubinstein model of the previous section.
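The forward equation lends itself to a direct implementation. The following is a minimal sketch, assuming a unit time step, a constant short-term rate `r` and a user-supplied function `p(i, n)` for the risk-neutral probability of an upward movement at node $i$ and time $n$ (all names are hypothetical). With a constant probability, the recursion should reproduce the discounted binomial probabilities.

```python
import math


def arrow_debreu_prices(p, r, n_steps):
    """Solve the forward equation for Arrow-Debreu prices on a recombining tree.

    p(i, n): risk-neutral probability of an up move at node i, time n.
    Returns a list `lam` such that lam[n][i] is the time-0 price of $1 paid
    at time n in node i (node i = number of up moves so far).
    """
    discount = math.exp(-r)
    lam = [[1.0]]                             # $1 today in the only initial node
    for n in range(n_steps):
        prev = lam[n]
        nxt = []
        for i in range(n + 2):
            # node (i, n+1) is reached by a down move from (i, n) ...
            from_same = prev[i] * (1.0 - p(i, n)) if i <= n else 0.0
            # ... or by an up move from (i-1, n)
            from_below = prev[i - 1] * p(i - 1, n) if i >= 1 else 0.0
            nxt.append(discount * (from_same + from_below))
        lam.append(nxt)
    return lam


if __name__ == "__main__":
    p_const, r, N = 0.55, 0.01, 4
    lam = arrow_debreu_prices(lambda i, n: p_const, r, N)
    for i in range(N + 1):
        closed_form = (math.exp(-r * N) * math.comb(N, i)
                       * p_const**i * (1.0 - p_const)**(N - i))
        print(i, lam[N][i], closed_form)
```

The Arrow-Debreu prices at any date sum to the price of a zero-coupon bond maturing at that date, which provides a simple check on an implementation.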
We now turn to a more general problem. Suppose we are given a set of European option
prices. What are the values of $p_i(n)$ to be assigned in each node of the tree such that these
options are priced without error? More generally, what are the values of $p_i(n)$ and those of
the stock price in each node of the tree such that all the European options are priced without
error?
Our general approach in Chapter 11 is to freeze the risk-neutral probabilities while allowing
the short-term rate to be stochastic. That is, things are coupled in that the need arises to
determine an implied binomial tree for the short-term rate that allows for a perfect fit of the
entire yield curve. In this chapter, we take $r$ as given, as explained, and determine an implied
binomial tree for the risk-neutral probabilities. Suppose that we have solved everything up to
time $n$, for example, $n = 2$ as in the example depicted below.
[Figure: recombining binomial tree over times $n = 0, 1, 2, 3$, with nodes $i \in \{0, 1, 2\}$ at $n = 2$
and nodes $i \in \{0, 1, 2, 3\}$ at $n = 3$.]
We denote with $S_i(2)$ the stock prices at time $n = 2$ and in node $i \in \{0, 1, 2\}$, and assume,
as mentioned, that we have already determined them. We want to determine the values of the
stock price at $n = 3$, i.e. $S_i(3)$ in node $i \in \{0, 1, 2, 3\}$, and the risk-neutral probabilities at
$n = 2$, i.e. $p_i(2)$ in node $i \in \{0, 1, 2\}$. In total, we want to determine 7 parameters, $S_i(3)$ and
$p_i(2)$.
In general, and for any $n$, given a certain level of the tree at $n - 1$, we want to determine
$n + 1$ values of the stock price and $n$ risk-neutral probabilities for a total of $2n + 1$ parameters.
Denote with $C_\$(K, n+1)$ the market price of a European call option expiring at time $n + 1$,
and struck at $K$. We consider options struck at the values of the stock at time $n$. The price
of each of these options is the average of its values one period before expiration, weighted by
the Arrow-Debreu security prices for $n$, viz, for each node $i$ at time $n$,
$$C_\$(S_i(n), n+1) = \sum_{j=0}^{n}\lambda_j(n)\,c_j(n), \qquad (10.66)$$
where $c_j(n)$ denotes the price at $n$ in node $j$, of the same option struck at $S_i(n)$, and expiring at
$n + 1$, which equals
$$c_j(n) = \begin{cases} e^{-r}\left[p_j(n)\left(S_{j+1}(n+1) - S_i(n)\right) + (1 - p_j(n))\left(S_j(n+1) - S_i(n)\right)\right] & \text{if } j > i \\ e^{-r}\,p_i(n)\left(S_{i+1}(n+1) - S_i(n)\right) & \text{if } j = i \\ 0 & \text{if } j < i \end{cases} \qquad (10.67)$$
The strikes in Eq. (10.66) are chosen such that the options are (roughly) at-the-money one
period before their expiration and, accordingly, the expressions in Eqs. (10.67) for the prices of
the call options one period before the expiration follow, because by construction,
$S_i(n+1) < S_i(n) < S_{i+1}(n+1)$.
While market prices for these options do not necessarily exist, we can interpolate the skew,
and predict the missing points needed to implement Eq. (10.66).
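To solve for the $2n + 1$ parameters, we use (i) the pricing equations in (10.66), (ii) the
martingale conditions satisfied by the stock price at time $n$, for each node $i$,
$$e^{r}S_i(n) = p_i(n)\,S_{i+1}(n+1) + (1 - p_i(n))\,S_i(n+1), \qquad (10.68)$$
and, finally, (iii) a $(2n + 1)$-th condition, a renormalization condition, which we shall discuss
below.
Replacing Eq. (10.68) into the first of Eqs. (10.67) leaves,
$$c_j(n) = S_j(n) - e^{-r}S_i(n), \quad \text{if } j > i. \qquad (10.69)$$
Therefore, by Eqs. (10.66), (10.67) and (10.69),
$$C_\$(S_i(n), n+1) = \lambda_i(n)\,c_i(n) + \sum_{j=i+1}^{n}\lambda_j(n)\,c_j(n) = e^{-r}\lambda_i(n)\,p_i(n)\left(S_{i+1}(n+1) - S_i(n)\right) + \sum_{j=i+1}^{n}\lambda_j(n)\left(S_j(n) - e^{-r}S_i(n)\right). \qquad (10.70)$$
Let us solve Eq. (10.68) for $p_i(n)$,
$$p_i(n) = \frac{e^{r}S_i(n) - S_i(n+1)}{S_{i+1}(n+1) - S_i(n+1)}. \qquad (10.71)$$
Replacing Eq. (10.71) into Eq. (10.70) and rearranging terms leaves a recursion for the stock
price over the nodes at time $n + 1$,
$$S_{i+1}(n+1) = \frac{S_i(n+1)\left[e^{r}C_\$(S_i(n), n+1) - \sum_{j=i+1}^{n}\lambda_j(n)\left(e^{r}S_j(n) - S_i(n)\right)\right] - \lambda_i(n)\,S_i(n)\left(e^{r}S_i(n) - S_i(n+1)\right)}{\left[e^{r}C_\$(S_i(n), n+1) - \sum_{j=i+1}^{n}\lambda_j(n)\left(e^{r}S_j(n) - S_i(n)\right)\right] - \lambda_i(n)\left(e^{r}S_i(n) - S_i(n+1)\right)}. \qquad (10.72)$$
Eq. (10.72) needs a re-normalization, as mentioned. [In progress] Once we solve for $S_{i+1}(n+1)$, we
can solve for $p_i(n)$, using Eq. (10.71), then update the price of the Arrow-Debreu securities in Eq.
(10.65), and solve everything recursively.
This algorithm might lead to negative risk-neutral probabilities, which imply arbitrage, in
which case we change the options to use. [In progress]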
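One step of the recursion can be sketched and sanity-checked as follows. The renormalization condition is still to be discussed in the text, so the snippet simply assumes that the lowest node at time $n + 1$, $S_0(n+1)$, is supplied from outside (argument `S_low`); this choice, the unit time step, and all function names are assumptions made for the illustration. The check feeds the recursion with option prices generated by a Cox-Ross-Rubinstein tree, in which case the recursion should return that very same tree.

```python
import math


def crr_level(S0, u, n):
    """Stock prices S_i(n), i = 0..n, in a Cox-Ross-Rubinstein tree with d = 1/u."""
    d = 1.0 / u
    return [S0 * u**i * d**(n - i) for i in range(n + 1)]


def next_level(s, lam, calls, r, S_low):
    """One step of the implied-tree recursion, Eqs. (10.71)-(10.72).

    s[i]     : known stock prices S_i(n)
    lam[i]   : Arrow-Debreu prices lambda_i(n)
    calls[i] : prices of calls struck at S_i(n), expiring at n + 1
    S_low    : assumed value of the lowest node S_0(n+1) (normalization)
    Returns the stock prices at n + 1 and the probabilities p_i(n).
    """
    n = len(s) - 1
    g = math.exp(r)                           # one-period gross rate
    S_next = [S_low]
    for i in range(n + 1):
        tail = sum(lam[j] * (g * s[j] - s[i]) for j in range(i + 1, n + 1))
        a = g * calls[i] - tail
        b = lam[i] * (g * s[i] - S_next[i])
        S_next.append((S_next[i] * a - s[i] * b) / (a - b))
    probs = [(g * s[i] - S_next[i]) / (S_next[i + 1] - S_next[i])
             for i in range(n + 1)]
    return S_next, probs


if __name__ == "__main__":
    S0, u, r, n = 100.0, math.exp(0.1), 0.01, 2
    d = 1.0 / u
    p = (math.exp(r) - d) / (u - d)
    s, S_true = crr_level(S0, u, n), crr_level(S0, u, n + 1)
    lam = [math.exp(-r * n) * math.comb(n, i) * p**i * (1 - p)**(n - i)
           for i in range(n + 1)]
    lam_next = [math.exp(-r * (n + 1)) * math.comb(n + 1, i) * p**i * (1 - p)**(n + 1 - i)
                for i in range(n + 2)]
    calls = [sum(l * max(x - k, 0.0) for l, x in zip(lam_next, S_true)) for k in s]
    print(next_level(s, lam, calls, r, S_true[0]))
```

With market prices exhibiting a skew, the implied probabilities `probs` would differ across nodes, and one would check that they lie between zero and one before moving to the next level.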
We now turn to the continuous time approach.
10.7.3 The perfect fit, in continuous time
We know that the only input to the option pricing problem is the instantaneous volatility of
the underlying asset price. Which volatility should we use? At least, we know that option prices
are a function of this volatility. The idea is to find a volatility function such that this very same
volatility delivers back the prices of all the already traded options. It is an inverse problem.
Let us outline the steps we need to price new derivatives, while avoiding any pricing errors for
the existing ones:
(i) We take as given the prices of a set of actively traded European options. Let $K$ and $T$ be
strikes and time-to-maturity of these liquid options. We aim to match the model's predictions
to data:
$$C_\$(K, T) = C(K, T), \quad K, T \text{ varying}, \qquad (10.73)$$
where $C_\$(K, T)$ denote market option prices, and $C(K, T)$ are the corresponding model's
predictions. Note, we are assuming a continuum of options, and that their prices are
differentiable as much as we need for the solution to our problem to be well-defined:
technically, we need Eq. (10.75) below to be well-defined. The question we now ask is
whether it is mathematically possible to consider a diffusive model for the stock price,
such that the initial collection of European option prices, $C_\$(K, T)$, is predicted without
errors by the resulting model, as in Eq. (10.73).
(ii) The answer is in the affirmative. Consider a diffusion process for the stock price:
$$\frac{dS_t}{S_t} = r\,dt + \sigma(S_t, t)\,dW_t, \qquad (10.74)$$
where $W_t$ is a Brownian motion under the risk-neutral probability. The only function to
calibrate to make Eq. (10.73) hold is the volatility function, $\sigma(S, t)$.
(iii) The Appendix shows that Eq. (10.73) holds if and only if $\sigma(S, t) = \sigma_{loc}(S, t)$, where:
$$\sigma_{loc}(K, T) = \sqrt{2\,\frac{\frac{\partial C_\$(K, T)}{\partial T} + rK\frac{\partial C_\$(K, T)}{\partial K}}{K^2\frac{\partial^2 C_\$(K, T)}{\partial K^2}}}. \qquad (10.75)$$
The function $\sigma_{loc}(K, T)$ is referred to as local volatility. Its square is the local variance,
defined as the conditional expectation under $Q$ of the instantaneous variance given the
market level at some future date $T$,
$$\sigma^2_{loc}(K, T) = E\left(\sigma_T^2 \mid S_T = K\right), \qquad (10.76)$$
where $E[\,\cdot \mid \cdot\,]$ is the conditional expectation taken under the risk-neutral probability. All in
all, local volatility is the function $\sigma_{loc}(K, T)$ in Eq. (10.75) such that the theoretical price
generated by the model in Eq. (10.74) equals the market price of all available options.
(iv) Finally, we can price illiquid options through numerical methods, for example through
simulations. In the simulations, we use
$$\frac{dS_t}{S_t} = r\,dt + \sigma_{loc}(S_t, t)\,dW_t.$$
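Step (iii) can be illustrated by evaluating Eq. (10.75) with finite differences. The sketch below is only a consistency check under a strong assumption: the "market" surface is generated by Black & Scholes with a flat volatility of 20%, so that the implied local volatility should be, approximately, 20% everywhere. The step sizes `dK` and `dT` and the function names are hypothetical choices; with real data, the surface would first have to be interpolated and smoothed across strikes and maturities.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, T, r, sigma):
    """Black & Scholes price of a European call (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def dupire_local_vol(call, K, T, r, dK=0.5, dT=1e-3):
    """Local volatility of Eq. (10.75), using central finite differences.

    call(K, T) returns the price of a call struck at K with maturity T.
    """
    dC_dT = (call(K, T + dT) - call(K, T - dT)) / (2.0 * dT)
    dC_dK = (call(K + dK, T) - call(K - dK, T)) / (2.0 * dK)
    d2C_dK2 = (call(K + dK, T) - 2.0 * call(K, T) + call(K - dK, T)) / dK**2
    return math.sqrt(2.0 * (dC_dT + r * K * dC_dK) / (K**2 * d2C_dK2))


if __name__ == "__main__":
    S0, r, sigma = 100.0, 0.01, 0.20

    def flat_surface(K, T):
        # assumed "market" prices: a flat Black & Scholes surface
        return bs_call(S0, K, T, r, sigma)

    for K in (90.0, 100.0, 110.0):
        print(K, dupire_local_vol(flat_surface, K, 1.0, r))
```

The denominator of Eq. (10.75) is a second difference of option prices, which is why noisy or sparsely quoted prices make a direct application of the formula numerically delicate.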

Empirically, the local volatility surface, $\sigma_{loc}(x, t)$, is typically decreasing in $x$ for fixed $t$, a
phenomenon known as the Black-Christie-Nelson leverage effect discussed in Chapter 8 and
Section 10.5.2. This fact might lead us to assume from the outset that $\sigma(x, t) = x^{-\alpha}f(t)$, for some
function $f$ and some constant $\alpha > 0$, a simplification leading to the so-called CEV (Constant
Elasticity of Variance) model. A convenient model is one that combines local volatility with
stochastic volatility, as follows:
$$\frac{dS_t}{S_t} = r\,dt + \sigma(S_t, t)\,v_t\,dW_t, \qquad dv_t = \phi(v_t)\,dt + \psi(v_t)\,dW^v_t, \qquad (10.77)$$
where $W^v$ is another Brownian motion, and $\phi$, $\psi$ are some functions. The appendix shows that
in this specific case, the initial set of all European options prices is pinned down by:
$$\tilde\sigma_{loc}(K, T) = \frac{\sigma_{loc}(K, T)}{\sqrt{E\left(v_T^2 \mid S_T = K\right)}}, \qquad (10.78)$$
where $\sigma_{loc}(K, T)$ is the same as in Eq. (10.75). For this model, we simulate
$$\frac{dS_t}{S_t} = r\,dt + \tilde\sigma_{loc}(S_t, t)\,v_t\,dW_t, \qquad dv_t = \phi(v_t)\,dt + \psi(v_t)\,dW^v_t.$$
Practitioners also rely heavily on the so-called SABR model, which is parametric in
nature. [Provide references, and explain why SABR is important, compared to local-vol.] A
note on recalibration: local volatility surfaces are functions of the initial state where
the calibration starts off. The calibration has to be re-performed all the time to reflect new
information.


10.7.4 Relations with implied volatility
10.7.4.1 Implied volatility as expected local volatility

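Section 10.6.2 provides the expression of the P&L relating to a long position in a call option,
delta-hedged with Black and Scholes using an implied volatility fixed at an initial level $\mathrm{IV}_0$,
$$\text{P\&L}_T = \frac{1}{2}\int_0^T e^{r(T-t)}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\,\frac{\partial^2 C_{BS}(S_t, t; \mathrm{IV}_0)}{\partial S^2}\,dt. \qquad (10.79)$$
Naturally, we have that $E(\text{P\&L}_T) = 0$, as $C_{BS}(S_0, 0; \mathrm{IV}_0) = C_0 = C(S_0, \sigma_0^2, 0)$, the
true market price, consistently with Eq. (10.61), such that, setting to zero the expectation of the last
expression in Eq. (10.79), and solving for $\mathrm{IV}_0$, delivers:
$$\mathrm{IV}_0^2 = \frac{E\left[\int_0^T e^{r(T-t)}\,\sigma_t^2 S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\,dt\right]}{E\left[\int_0^T e^{r(T-t)}\,S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\,dt\right]}.$$
Alternatively, we may consider another hedging positioning, suggested by Gatheral (2006,
Chapter 3), where the delta-hedging is made through some fictitious time-varying instantaneous,
but deterministic, volatility, equal to $\bar\sigma_t$, say, where
$$\frac{1}{T}\int_0^T \bar\sigma_t^2\,dt = \mathrm{IV}_0^2,$$
for some deterministic $\bar\sigma_t$. In this case, the P&L would be similar to that in Eq. (10.79), with
$$\text{P\&L}_T = \frac{1}{2}\int_0^T e^{r(T-t)}\left(\sigma_t^2 - \bar\sigma_t^2\right)S_t^2\,\frac{\partial^2 C_{BS}}{\partial S^2}\,dt.$$
Imposing the zero profit condition under $Q$ leaves:
$$\bar\sigma_t^2 = \frac{E\left(\sigma_t^2 S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right)}{E\left(S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right)} \equiv E^{G}\left(\sigma_t^2\right), \qquad (10.80)$$
where $E^{G}$ is the expectation taken under the probability $Q^{G}$, defined as,
$$\frac{dQ^{G}}{dQ} = \frac{S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}}{E\left(S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right)}. \qquad (10.81)$$
We term $Q^{G}$ the Dollar Gamma probability. By Eqs. (10.80) and (10.81),
$$\mathrm{IV}_0^2 = \frac{1}{T}\int_0^T E^{G}\left(\sigma_t^2\right)dt. \qquad (10.82)$$
So implied vols are expectations of future realized vols, but only under the Dollar Gamma
probability. Clearly, then, they cannot be used as the fair value of a variance swap, unless we
tilt the variance contract by a random multiplier coinciding with the Dollar Gamma.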


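We can elaborate on Eq. (10.82). We have:
$$E^{G}\left(\sigma_t^2\right) = E^{G}\left[E^{G}\left(\sigma_t^2 \mid S_t\right)\right] = E^{G}\left[\sigma^2_{loc}(S_t, t)\right] = \frac{E\left[\sigma^2_{loc}(S_t, t)\,S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right]}{E\left[S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right]} = \frac{\int \sigma^2_{loc}(x, t)\,x^2\frac{\partial^2 C_{BS}(x, t)}{\partial S^2}\,q_t(x \mid S_0)\,dx}{E\left[S_t^2\frac{\partial^2 C_{BS}}{\partial S^2}\right]} \approx \sigma^2_{loc}(\hat S_t, t), \qquad (10.83)$$
where the first equality follows by the law of iterated expectations, $q_t(x \mid S_0)$ denotes the
conditional density of $S_t$ given $S_0$, and $\sigma^2_{loc}(x, t)$ is the local variance, as defined in Section 10.7.3.
Finally, $\hat S_t$ is a deterministic, "most likely" path of $S_t$, after Gatheral (2006, Chapter 6), a
sort of certainty equivalent for the local variance, for a fixed $t$. We also know that at $t = T$,
$S_T^2\frac{\partial^2 C_{BS}}{\partial S^2}$ is Dirac's delta centered at $K$, such that we may safely condition on $S_T = K$ and, then,
view $\hat S_t$ as a bridge starting from $S_0$ and ending at $K$. As a simple example, $\hat S_t \approx S_0\left(K/S_0\right)^{t/T}$.
As a second example, $\hat S_t = E(S_t \mid S_T = K)$, which we may approximate assuming $S_t$ is a Geometric
Brownian motion with parameters $r$ and $\sigma$, in which case
$\hat S_t = S_0\left(K/S_0\right)^{t/T}e^{\frac{1}{2}\sigma^2\frac{t(T-t)}{T}}$. Gatheral
argues, with a numerical example, that these approximations are quite reasonable, at least for
options with time to maturity less than a year.
Using the approximation in Eq. (10.83) delivers:
$$\mathrm{IV}_0^2 \approx \frac{1}{T}\int_0^T \sigma^2_{loc}(\hat S_t, t)\,dt. \qquad (10.84)$$
Surfaces depend on the initial state, as mentioned in Section 10.7.3. "Sticky" smiles can
roughly be defined as those where the skew does not depend on the initial state.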
Suppose a

very simple example, where IV (


; )=
( + )=
+1
. As
falls,
the skews goes up, consistent with the leverage e ect. This skew does not depend on the
initial price 0 . We can generate this skew, by assuming the local variance does not depend
on time, 2loc (
)=
( + 0 ). Indeed, in light of Eq. (10.84), we would then have that
2
2
IV (0
; 0 ) = IV0 =
( + 0 ) and then, IV (
; )=
( + ) for each .
10.7.4.2 Local volatility as a function of implied volatility

[In progress]

10.8 The price of (equity) volatility


How much volatility do we expect to prevail in the future, after controlling for risk? The answer
to this question has long been the volatility implied by at-the-money options. Eq. (10.84) seems
to suggest this answer is correct. In fact, it is not. Expected volatility, adjusted for risk, is a
certain weighted average of implied volatilities of a continuum of options, as explained below.
It is not mere academic purism. Knowing expected volatility under the risk-neutral probability
allows one to trade assets with payoffs linked to future realized volatility, known as variance swaps.
In fact, in September 2003, the Chicago Board Options Exchange (CBOE) changed its volatility
index VIX to approximate the variance swap rate of the S&P 500 index return (for 30 days), as
explained below. In March 2004, the CBOE launched the CBOE Futures Exchange for trading
futures on the new VIX; and options referenced to VIX futures are also available for trading.
There are compelling reasons explaining the interest investors may have in these contracts.
Undeniably, one is the possibility to formulate views about developments in stock market volatility,
without incurring the price-dependency issues pointed out in Section 10.6.2. Passive
fund managers might also find these contracts useful, as in times of high volatility, tracking
errors widen and, then, index tracking performance deteriorates. Hedge funds might find this
type of contract attractive as well, as they invest in "relative value" strategies, attempting
to profit from temporary price discrepancies. In times of high volatility, price discrepancies
typically widen, and variance swaps help these institutions hedge against these events.
10.8.1 One introductory example: range-based volatility
Straddles give rise to price-dependency, as explained in Section 10.6.2: even in the presence of
a highly volatile market within a reference period, the straddle might not deliver a positive
payoff, unless the underlying price moves beyond the straddle's thresholds (see Figure 10.9).
This introductory section develops an example of a simple payoff that aims to track changes in
volatility.
Consider a derivative with payoff equal to the following estimate of stock volatility over a
given time period $[t, T]$,
$$\pi(T) \equiv \max_{\tau\in[t,T]}\left(\ln S_\tau\right) - \min_{\tau\in[t,T]}\left(\ln S_\tau\right),$$
where $S_\tau$ denotes the stock price at $\tau$.
This estimator of volatility relates to Parkinson's (1980) range estimator of volatility,
that is, one based on price ranges, defined as the difference between the highest and the lowest
log price over a fixed sampling interval. This measure of realized volatility has drawbacks as a
basis for volatility trading. Consider the following example. Suppose that for one month, the
stock price never moves, except for three days (only then), when it goes down by 7% (10th day)
(discretely compounded), then overshoots by 1% (11th day) and, finally, gets back to its initial
value (12th day) (a flash crash?). The payoff $\pi(T)$ is equal to $\ln\frac{101}{93} \approx 8.2\%$.

[Figure: the stock price over the month (horizontal axis: days; vertical axis: price level, from
92 to 100), with points A and B marked on the path.]

While this contract is clearly tracking volatility, it does so imperfectly. The stock price has not
been volatile except for a few isolated days. We would like to make reference to contracts that
pay off when volatility has been sustained over the whole month. The next section describes
such contracts.
10.8.2 Fear gauge contracts
10.8.2.1 Variance swaps and the VIX index

Let us consider the following price process under the risk-neutral probability:
$$\frac{dS_t}{S_t} = r\,dt + \sigma_t\,dW_t, \qquad (10.85)$$
where $\sigma_t$ is $\mathcal{F}_t$-adapted, i.e. $\mathcal{F}_t$ can be larger than that generated by the stock price,
$\mathcal{F}^S_t \equiv \sigma(S_\tau : \tau \le t)$. Next, define the realized integrated variance within the time interval
$[T_1, T_2]$, with $T_1 \ge t$:
$$\mathrm{var}(T_1, T_2) \equiv \int_{T_1}^{T_2}\sigma_\tau^2\,d\tau.$$
We define a variance swap as a contract with zero value at the inception date, $t$, and payoff
at maturity $T$ given by:
$$\pi_{var}(t, T) \equiv \left(\mathrm{var}(t, T) - P_{var}(t, T)\right)\times N, \qquad (10.86)$$
where $N$ is the notional value of the contract, and $P_{var}(t, T)$ is the variance swap rate agreed
at $t$, and paid off at time $T$.16 Therefore, this contract is a forward, not a swap really. If $r$ is
deterministic, then the swap rate must satisfy:
$$P_{var}(t, T) = E_t\left(\mathrm{var}(t, T)\right).$$
16 Note that this contract relies on some notion of realized variance, as a continuous record of returns is obviously unavailable.
Moreover, it has long been market practice to define the variance notional in such a way that $N = \frac{VN}{2\sqrt{P_{var}}}$, where VN is the
vega notional, that is, the notional expressed in volatility percentage points. Suppose, for example, that realized volatility
is 1 "vega" (i.e., one volatility point) above the square root of the variance swap rate, $\mathrm{var}(t, T) = (\sqrt{P_{var}} + 1)^2$, such that
$\pi_{var} = \left(1 + \frac{1}{2\sqrt{P_{var}}}\right)\times VN \approx VN$. That is, vega notional is approximately the notional for each vega of realized volatility that exceeds
the square root of the variance swap rate.

It remains to determine $E_t(\mathrm{var}(t, T))$; below, we show that this is the same as the value of a portfolio of
out-of-the-money options. First, let us consider a simple case, in which $r = 0$. In this case, we
can solve for $E_t(\mathrm{var}(t, T))$ while relying on the results in the previous section regarding
local volatility. Indeed, note that by Eqs. (10.75) and (10.76), and the connection between
risk-neutral densities and convexity of the option price, Eq. (10.28), we have that
$$E_t\left(\sigma_T^2\right) = 2\int_0^\infty \frac{\frac{\partial C_t(K, T)}{\partial T}}{K^2}\,dK, \qquad (10.87)$$
where $C_t(K, T)$ is the price as of time $t$ of a call option expiring at $T$ and struck at $K$. We
can compute the risk-neutral expectation of the realized variance, under the assumption that
$r = 0$. By Eq. (10.87),
$$E_t\left(\mathrm{var}(T_1, T_2)\right) = 2\int_0^\infty \frac{C_t(K, T_2) - C_t(K, T_1)}{K^2}\,dK. \qquad (10.88)$$
Eq. (10.88) can be generalized to the case where $r > 0$. In the Appendix, we show that for
$T_1 = t$, $T_2 = T$,
$$E_t\left(\mathrm{var}(t, T)\right) = 2e^{r(T-t)}\left[\int_0^{F_t}\frac{P_t(K, T)}{K^2}\,dK + \int_{F_t}^\infty\frac{C_t(K, T)}{K^2}\,dK\right], \qquad (10.89)$$
where $F_t$ is the forward price, $F_t = e^{r(T-t)}S_t$, and $P_t(K, T)$ is the price as of time $t$ of a put
option expiring at $T$ and struck at $K$. The right-hand side of Eq. (10.89) is a weighted
average of out-of-the-money options, and it is a "fear gauge" as a result: the market assessment
of extreme movements is aptly captured by an average of out-of-the-money options.
The new VIX index maintained by the CBOE is an estimate of the square root of $E_t(\mathrm{var}(t, T))$
in Eq. (10.89), annualized:
$$\mathrm{VIX}_t \equiv \sqrt{\frac{1}{T - t}\,E_t\left(\mathrm{var}(t, T)\right)}, \qquad (10.90)$$
where $T - t$ is expressed as a fraction of a year. The approximation relies on a finite number of
out-of-the-money options.
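Eqs. (10.89)-(10.90) can be checked numerically in a market where the answer is known. The sketch below assumes option prices generated by Black & Scholes with a constant volatility of 20%, in which case the risk-neutral expectation of the integrated variance is $\sigma^2(T - t)$ and the index, expressed in percentage points, should be 20. The strike range, the number of grid points and the function names are hypothetical choices for the illustration; it is not the CBOE's actual calculation, which relies on a finite set of quoted options.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(S, K, tau, r, sigma, kind):
    """Black & Scholes price of a European call or put (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    if kind == "call":
        return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)


def variance_swap_rate(S, r, tau, sigma, K_min=50.0, K_max=200.0, n=15000):
    """Right-hand side of Eq. (10.89), integrated with the trapezoidal rule.

    Out-of-the-money puts are used below the forward, calls above it.
    Option prices are assumed to come from a flat Black & Scholes surface.
    """
    F = S * math.exp(r * tau)
    h = (K_max - K_min) / n
    total = 0.0
    for j in range(n + 1):
        K = K_min + j * h
        kind = "put" if K < F else "call"
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * bs_price(S, K, tau, r, sigma, kind) / K**2
    return 2.0 * math.exp(r * tau) * total * h


if __name__ == "__main__":
    S, r, tau, sigma = 100.0, 0.01, 1.0 / 12.0, 0.20
    rate = variance_swap_rate(S, r, tau, sigma)
    vix = 100.0 * math.sqrt(rate / tau)       # annualized, in percentage points
    print(rate, sigma**2 * tau, vix)
```

The weights $1/K^2$ give more importance to low strikes, which is one way to see why the index is particularly sensitive to the prices of out-of-the-money puts.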
Note an interesting point in this section. Up to the previous sections, we were used to thinking
that volatility determines option prices (see, e.g., Section 10.5). We now have a theoretical
construct that makes option prices sum up to a sufficient statistic for (market-adjusted, i.e.,
risk-neutral) expected volatility.
The VIX index is typically referred to as a fear gauge. Eqs. (10.89)-(10.90) illustrate that
this definition is quite appropriate. The VIX index depends on the prices of out-of-the-money
options, that is, the options that are exercised in case of tail events. Moreover, tail events are
those arising when worst-case scenarios occur. This connection suggests an interpretation of
VIX behavior in terms of aversion to Knightian uncertainty as surveyed in Chapter 8, an
issue not explored yet in theoretical research. It is well known that the VIX index spikes exactly
when equity markets drop. However, Eqs. (10.89)-(10.90) indicate the index potentially reflects
market fears regarding tail events on both sides.


10.8.2.2 A crash derivation

The previous results are extraordinary. We know since Section 10.2 that for any generic risk ,
its forward price is E( ), provided interest rates are constant, such that E( ) = ( )
as soon as
is traded, for otherwise we would need to rely on a model to determine the
expectation E( ). Note that here, var ( ) is not traded, and yet we can still express its
price, E [var ( )], without relying on any model. In other words, we can price volatility in a
model-free fashionthe annualized price of a variance swap is simply the square of the VIX
index.
It is useful to review the steps that lead to Eq. (10.89). The starting point is the so-called log-contract, a concept first introduced by Neuberger (1994). This idealized contract is designed so as to ensure a payoff equal to $\ln(F_T/F_t)$. Its fair value is obviously $E_t\ln(F_T/F_t)$, and is negative, as we shall show in a moment.
The intuitive reason the price of the log-contract is negative is that the payoff $\ln(F_T/F_t)$ is skewed to the left, due to the concavity of the log function, such that the expected losses on the downside are larger than the expected gains on the upside. These facts are confirmed by an application of Itô's lemma to Eq. (10.85), which yields:
$$E_t\left(\ln\frac{F_T}{F_t}\right) = -\frac{1}{2}E_t[\mathrm{var}(t,T)]. \tag{10.91}$$
This is remarkable: up to a factor of two, the price of variance is the same as the value of going short a log-contract. Naturally, we are not done, because we still don't know how to price the log-contract in the first place! The pricing of this, and related contracts, relies on so-called "spanning" arguments, as explained for example by Bakshi and Madan (2000) and Carr and Madan (2001). In the Appendix, it is shown that the payoff of the log-contract can be written as
$$\ln\frac{F_T}{F_t} = \frac{1}{F_t}(F_T-F_t) - \int_0^{F_t}(K-F_T)^+\frac{1}{K^2}\,dK - \int_{F_t}^{\infty}(F_T-K)^+\frac{1}{K^2}\,dK. \tag{10.92}$$
The expectation of the R.H.S. of Eq. (10.92) is (minus) half the R.H.S. of Eq. (10.89), assuming interest rates are constant. Eq. (10.89) then follows by Eq. (10.91).
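
To see the spanning argument at work, the following is a minimal numerical sketch, not the CBOE methodology. It assumes option prices are generated by the Black (1976) formula with a constant volatility, and that the integral in Eq. (10.89) is approximated by a trapezoid rule on a log-strike grid. The function names, the grid width and the parameter values are illustrative assumptions. Under these assumptions, the option strip should return approximately $\sigma^2(T-t)$.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_otm_price(F, K, vol, T, discount):
    """Black (1976) price of the out-of-the-money option at strike K:
    a put for K < F, a call for K >= F (assumed constant volatility)."""
    s = vol * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    d2 = d1 - s
    if K >= F:
        return discount * (F * norm_cdf(d1) - K * norm_cdf(d2))
    return discount * (K * norm_cdf(-d2) - F * norm_cdf(-d1))

def model_free_variance(F, vol, T, r, n=2000, width=10.0):
    """2 e^{rT} times the integral of OTM option prices weighted by 1/K^2,
    computed with the trapezoid rule on a log-strike grid."""
    discount = math.exp(-r * T)
    L = width * vol * math.sqrt(T)      # grid spans +/- `width` standard deviations
    h = 2.0 * L / n
    total = 0.0
    for i in range(n + 1):
        x = -L + i * h                  # x = ln(K / F)
        K = F * math.exp(x)
        weight = 0.5 if i in (0, n) else 1.0
        # On the log-strike grid, dK / K^2 = dx / K.
        total += weight * black_otm_price(F, K, vol, T, discount) / K * h
    return 2.0 / discount * total

expected_var = model_free_variance(F=100.0, vol=0.2, T=0.5, r=0.03)
vix = math.sqrt(expected_var / 0.5)     # annualized, as in Eq. (10.90)
```

With a flat 20% volatility, the strip recovers an annualized figure close to 20%, as it should. With market prices, the same sum is taken over the finite set of quoted strikes.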
10.8.2.3 The market for volatility and further developments

The behavior of the VIX index has been a topic intensively studied in empirical research. [Cite
references.] The following pictures depict the time series behavior of the new VIX index since
its inception as well as dynamics of volume on VIX options.


The trading of derivatives referenced to VIX has increased at a very fast pace. These derivatives aim to replace expensive straddles: they make books less messy, deliver outcomes consistent with views, and eliminate price dependency. This is not a mere theoretical curiosity. The following table depicts transaction data for options on the VIX index as compared to other options cleared by CBOE. (Note that the notionals relating to VIX options and futures are not the same, being $1000 for VIX futures and $100 for VIX options, as of August 2011.)
CBOE trading volume (contracts), average per day, August 2011

  Total trading volume      12,000,000
  CBOE Index Options         2,000,000
  S&P 500 Options            1,300,000
  CBOE VIX Options             582,000
  CBOE VIX Futures              79,000

(i)
(ii)
1
6

1
2

of (i)+(ii)

of (ii)

Section 10.6.3 explains how the skew relates to local volatility, but how is the expected variance in Eq. (10.89) related to the skew? Demeterfi, Derman, Kamal and Zou (1999) show that if the implied volatility varies linearly with the strike,
$$\mathrm{IV}(K) = \mathrm{IV}_{\mathrm{atm}} - b\,\frac{K-F_t}{F_t},$$
for some constant $b$, then,
$$\frac{1}{T-t}\,E_t[\mathrm{var}(t,T)] \approx \mathrm{IV}^2_{\mathrm{atm}}\left(1 + 3(T-t)\,b^2\right).$$
That is, the existence of a skew, $b \neq 0$, increases the value of the fair variance above the squared at-the-money implied volatility.
Sometimes it is said that variance swaps are profitable to protection sellers, because "the derivative house has the statistical edge," meaning that the realized variance from $t$ to $T$, say, is in general lower than future expected variance under the risk-neutral probability, reflecting variance risk-premiums, as shown in the following picture.


10.8.3 Forward volatility trading


Let us consider the following example of structured volatility trading. Suppose we hold the view that expected market volatility will rise in one year's time, to an extent that is inefficiently priced in by the term structure of the currently traded variance swaps. Precisely, our view is that the spot price of the variance swap in one year will exceed the implied forward variance swap price, i.e.
$$P_{\mathrm{var}}(1,2) > P_{\mathrm{var}}(0,2) - P_{\mathrm{var}}(0,1). \tag{10.93}$$
To implement a trade consistent with this view, we may proceed as follows:
(i) long a two year variance swap, struck at $P_{\mathrm{var}}(0,2)$, with notional one
(ii) short a one year variance swap, struck at $P_{\mathrm{var}}(0,1)$, with notional $e^{-r}$
(10.94)

Obviously, this strategy costs nothing at time zero.

The strategy in (10.94) generates profits whenever the inequality in (10.93) holds true. Indeed, suppose Eq. (10.93) holds true at time 1. Then, come time 1, we can short another one year variance swap, struck at $P_{\mathrm{var}}(1,2)$. Intuitively, we do so because we bought it cheap, according to (10.93). Shorting this variance swap at time 1 generates the following payoff at time 2:
$$\pi_1(2) \equiv P_{\mathrm{var}}(1,2) - \mathrm{var}(1,2). \tag{10.95}$$
Moreover, the two year variance swap we went long at time zero (component (i) of (10.94)) gives rise to the following payoff at time 2:
$$\pi_2(2) \equiv \mathrm{var}(0,2) - P_{\mathrm{var}}(0,2). \tag{10.96}$$
Adding Eq. (10.95) and Eq. (10.96), and using the relation $\mathrm{var}(0,2) = \mathrm{var}(0,1) + \mathrm{var}(1,2)$, leads to:
$$\pi_1(2) + \pi_2(2) = P_{\mathrm{var}}(1,2) + \mathrm{var}(0,1) - P_{\mathrm{var}}(0,2).$$
Finally, the one year variance swap with notional $e^{-r}$ we shorted at time zero (component (ii) of (10.94)) leads to the following payoff at time 1:
$$\pi(1) \equiv e^{-r}\left(P_{\mathrm{var}}(0,1) - \mathrm{var}(0,1)\right). \tag{10.97}$$
Investing $\pi(1)$ for a further year at the safe interest rate delivers $e^{r}\pi(1)$ at time 2, such that the total profits at time 2 are:
$$\pi_{\mathrm{tot}} \equiv \pi_1(2) + \pi_2(2) + e^{r}\pi(1) = P_{\mathrm{var}}(1,2) - P_{\mathrm{var}}(0,2) + P_{\mathrm{var}}(0,1) > 0, \tag{10.98}$$
where the inequality follows by Eq. (10.93).
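
The bookkeeping of this trade can be checked with a few lines of code. The sketch below is only illustrative: the function name, the argument names and the numerical inputs are assumptions, and the interest rate is taken to be constant. It adds up the three legs of the trade and shows that the realized variances cancel out of the total profit.

```python
import math

def forward_variance_trade(p02, p01, p12, var01, var12, r):
    """Time-2 profit of the strategy in (10.94), completed at time 1 by
    shorting a further one year variance swap struck at p12."""
    long_two_year = (var01 + var12) - p02               # payoff in Eq. (10.96)
    short_at_time_1 = p12 - var12                       # payoff in Eq. (10.95)
    short_at_time_0 = math.exp(-r) * (p01 - var01)      # Eq. (10.97), received at time 1
    # The time-1 cash flow is carried to time 2 at the safe interest rate.
    return short_at_time_1 + long_two_year + math.exp(r) * short_at_time_0

# Hypothetical swap prices satisfying the view in Eq. (10.93): 0.06 > 0.09 - 0.04.
profit_calm = forward_variance_trade(0.09, 0.04, 0.06, var01=0.03, var12=0.02, r=0.02)
profit_wild = forward_variance_trade(0.09, 0.04, 0.06, var01=0.10, var12=0.25, r=0.02)
```

Both calls return the same number, $P_{\mathrm{var}}(1,2) - P_{\mathrm{var}}(0,2) + P_{\mathrm{var}}(0,1)$, whatever the realized variances.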


10.8.4 Marking to market
Suppose a variance contract expiring at time $T$ is issued at time $t$, when it is costless. How much is this contract worth at time $\tau \in (t,T)$? Let us take the time $\tau$ risk-neutral discounted expectation of $\pi_{\mathrm{var}}$ in Eq. (10.86),
$$e^{-r(T-\tau)}E_\tau(\pi_{\mathrm{var}}) = e^{-r(T-\tau)}\,\mathrm{Notional}\times E_\tau\left(\mathrm{var}(t,\tau) + \mathrm{var}(\tau,T) - P_{\mathrm{var}}(t,T)\right) = e^{-r(T-\tau)}\,\mathrm{Notional}\times\left(\mathrm{var}(t,\tau) + P_{\mathrm{var}}(\tau,T) - P_{\mathrm{var}}(t,T)\right), \tag{10.99}$$
where $E_\tau$ denotes the risk-neutral expectation conditional upon the information available at time $\tau$.
Marking to market suggests an alternative way to implement the forward volatility trading of the previous section. Suppose again we hold the view that markets for volatility will be such that (10.93) holds true at time 1, and, accordingly, consider the strategy in (10.94). If (10.93) holds true at time 1, we may close the position (i) in (10.94) at time 1. By Eq. (10.99), the market value of the two year variance swap we went long at time 0 is,
$$M(1) \equiv e^{-r}\left(\mathrm{var}(0,1) + P_{\mathrm{var}}(1,2) - P_{\mathrm{var}}(0,2)\right). \tag{10.100}$$
At time 1, we obtain $M(1) + \pi(1)$, where $\pi(1)$ is the payoff in Eq. (10.97), which we can invest at the safe interest rate for one additional period, delivering the profit $\pi_{\mathrm{tot}}$ in Eq. (10.98), for time 2.
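
As a small illustration of Eq. (10.99), the function below marks a variance swap to market. It is a sketch under the assumption of a constant interest rate. The function and argument names are hypothetical, with `var_accrued` standing for the variance realized so far and `p_remaining` for the current price of a variance swap covering the remaining life of the contract.

```python
import math

def variance_swap_value(notional, var_accrued, p_remaining, strike, r, time_left):
    """Mark-to-market value of a long variance swap, following Eq. (10.99)."""
    discount = math.exp(-r * time_left)
    return discount * notional * (var_accrued + p_remaining - strike)

# At inception nothing has accrued and the remaining price equals the strike.
value_at_inception = variance_swap_value(1.0, 0.0, 0.05, 0.05, r=0.02, time_left=2.0)

# One year into a two year swap struck at 0.09 (hypothetical numbers).
value_after_one_year = variance_swap_value(1.0, 0.03, 0.06, 0.09, r=0.02, time_left=1.0)
```

The first value is zero, as the contract is costless when issued. The second is also zero here, because accrued variance and the new one year price happen to add up to the strike.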
10.8.5 Stochastic interest rates
When interest rates are stochastic, but still independent of volatility, the expressions given for the contract and indexes do not hold anymore, and there are a number of qualifications, which we make in Remark A.1 of Appendix 5. Moreover, the forward volatility trading strategy in (10.94) should be modified. For example, we might use the following strategy, where $P(t,T)$ denotes the time-$t$ price of a zero coupon bond expiring at $T$:
(i) long a two year variance swap, struck at $P_{\mathrm{var}}(0,2)$, with notional one
(ii) short a one year variance swap, struck at $P_{\mathrm{var}}(0,1)$, with notional $\frac{P(0,2)}{P(0,1)}$

If come time 1, Eq. (10.93) holds true, we may liquidate (i), thereby accessing the payoff relating to (ii), for a total payoff equal to:
$$\left(\mathrm{var}(0,1) + P_{\mathrm{var}}(1,2) - P_{\mathrm{var}}(0,2)\right)P(1,2) + \left(P_{\mathrm{var}}(0,1) - \mathrm{var}(0,1)\right)\frac{P(0,2)}{P(0,1)}$$
$$= \left(P_{\mathrm{var}}(1,2) - P_{\mathrm{var}}(0,2) + P_{\mathrm{var}}(0,1)\right)P(1,2) + \left(\mathrm{var}(0,1) - P_{\mathrm{var}}(0,1)\right)\left(P(1,2) - \frac{P(0,2)}{P(0,1)}\right),$$
where the first term on the left hand side arises by the liquidation of (i) and by Eq. (10.100), and the second term on the left hand side arises by (ii). By Eq. (10.93), the first term on the right hand side is positive. If the short-term interest rate was deterministic, $P(1,2) = \frac{P(0,2)}{P(0,1)}$, and the second term on the right hand side would be zero. When interest rates are stochastic, the second term can take on any sign, although its absolute value should then be quite low, compared to the first term on the right hand side.
10.8.6 Hedging
A financial institution might be merely interested in intermediating the contract, which then needs to be hedged. Suppose, for example, that the financial institution sells protection at time $t$, thereby promising to pay the realized integrated variance $\mathrm{var}(t,T)$ at time $T$. We want to replicate this integrated variance. By Itô's lemma:
$$\mathrm{var}(t,T) = 2\int_t^T\frac{1}{F_u}\,dF_u - 2\ln\frac{F_T}{F_t} = 2\int_t^T\frac{1}{S_u}\,dS_u - 2r(T-t) - 2\ln\frac{F_T}{F_t}. \tag{10.101}$$
The first term can be replicated by continuously rebalancing a stock position, which is always long $2/S_\tau$ shares of the stock, adjusted for the time value of money. Precisely, consider a self-financed portfolio $(\theta_\tau,\psi_\tau)$, such that its value satisfies:
$$V_\tau = \theta_\tau S_\tau + \psi_\tau M_\tau,$$
where $M_\tau$ denotes the money market account. We choose:
$$\theta_\tau = \frac{1}{S_\tau}\,e^{-r(T-\tau)}, \qquad \psi_\tau = \frac{e^{-r(T-\tau)}}{M_\tau}\left(\int_t^\tau\frac{1}{F_u}\,dF_u - 1\right). \tag{10.102}$$
It is easy to see that
$$V_\tau = e^{-r(T-\tau)}\int_t^\tau\frac{1}{F_u}\,dF_u, \tag{10.103}$$
such that: (i) $V_t = 0$, and (ii) $V_T = \int_t^T\frac{1}{F_u}\,dF_u$. In Appendix 5, we show that $(\theta,\psi)$ is self-financed. The bottom line is that we can hedge the first term in Eq. (10.101) through a self-financed portfolio that costs nothing at time $t$. This portfolio is simply $(2\theta, 2\psi)$.
To replicate the second term in Eq. (10.101), the payoff of the so-called log-contract, note that we simply have to make reference to twice Eq. (10.92). Therefore, the log-contract can be replicated by shorting $2/F_t$ units of forwards, which are of course costless at time $t$, and going long a continuum of out-of-the-money options with weights $\frac{2}{K^2}\,dK$, which cost
$$2\int_0^{F_t}\frac{\mathrm{Put}_t(K,T)}{K^2}\,dK + 2\int_{F_t}^\infty\frac{\mathrm{Call}_t(K,T)}{K^2}\,dK = e^{-r(T-t)}E_t[\mathrm{var}(t,T)],$$
where the equality follows by Eq. (10.89). We borrow $e^{-r(T-t)}E_t[\mathrm{var}(t,T)]$ to purchase these options, and once this is done, we are guaranteed $\mathrm{var}(t,T)$ is replicated at time $T$, as we now have replicated both the first term and the second term in Eq. (10.101). Finally, come time $T$, we pay back the loan, worth $E_t[\mathrm{var}(t,T)]$, and settle the payoff $\mathrm{var}(t,T) - E_t[\mathrm{var}(t,T)]$ owed on the insurance sold. Since $\mathrm{var}(t,T)$ is replicated, no additional funds are needed at time $T$.
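
The decomposition in Eq. (10.101) can be illustrated on a simulated path. The sketch below makes assumptions that are not in the text: the forward price follows a driftless lognormal process with constant volatility, rebalancing takes place on a fine but finite time grid, and all names and parameter values are hypothetical. It compares the replicated quantity, twice the rebalanced forward leg minus twice the log-contract payoff, with the sum of squared log-returns.

```python
import math
import random

def replication_check(vol=0.2, T=1.0, n=10_000, seed=0):
    """Return (replicated variance, realized variance) along one simulated path."""
    rng = random.Random(seed)
    dt = T / n
    F0 = 100.0
    F = F0
    dynamic_leg = 0.0   # running sum of dF/F, the continuously rebalanced position
    realized = 0.0      # running sum of squared log-returns
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        F_next = F * math.exp(-0.5 * vol * vol * dt + vol * math.sqrt(dt) * z)
        dynamic_leg += (F_next - F) / F
        realized += math.log(F_next / F) ** 2
        F = F_next
    # Discrete counterpart of the first expression in Eq. (10.101).
    replicated = 2.0 * dynamic_leg - 2.0 * math.log(F / F0)
    return replicated, realized

replicated, realized = replication_check()
```

The two numbers differ only by terms of third order in the returns, which vanish as the rebalancing grid is refined.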

10.9 A digression on skewness


We might be interested in left tail risks. Model-free skewness contracts and indexes can be designed and built up to cope with this risk. (These indexes are maintained by the CBOE since January 2011.) Let's take a general perspective. Appendix 6 shows that for any twice differentiable function $\phi$,
$$e^{-r(T-t)}E_t[\phi(S_T) - \phi(F_t)] = \int_0^{F_t}\phi''(K)\,\mathrm{Put}_t(K,T)\,dK + \int_{F_t}^\infty\phi''(K)\,\mathrm{Call}_t(K,T)\,dK, \tag{10.104}$$
where $F_t = e^{r(T-t)}S_t$, the forward price, and the remaining usual notation. Eq. (10.104) shows that any Markov payoff can be spanned through a set of European options. For example, if $\phi(x) = \ln x$, we can price a log-contract, which leads to the new VIX index, as explained in the previous section. We are interested in skewness contracts.

Consider the following payoff, $\phi_v(S_T) \equiv \left(\ln\frac{S_T}{F_t}\right)^2$. Note that by construction, $\phi_v(F_t) = 0$, such that Eq. (10.104) allows us to price this payoff as follows,
$$e^{-r(T-t)}E_t[\phi_v(S_T)] = \int_0^{F_t}\phi_v''(K)\,\mathrm{Put}_t(K,T)\,dK + \int_{F_t}^\infty\phi_v''(K)\,\mathrm{Call}_t(K,T)\,dK, \qquad \phi_v''(K) = \frac{2}{K^2}\left(1 - \ln\frac{K}{F_t}\right).$$
Which volatility contract does the payoff $\phi_v(S_T)$ relate to? It's a contract relating to the second moment of the cumulative return $\ln\frac{S_T}{F_t}$, rather than the realized volatility of the previous section, defined as the sum of the instantaneous return variances, $\int_t^T\sigma_u^2\,du$. Precisely, note that by Itô's lemma,
$$E_t[\phi_v(S_T)] = E_t\left[2\int_t^T\ln\frac{F_u}{F_t}\,\frac{dF_u}{F_u} + \int_t^T\left(1 - \ln\frac{F_u}{F_t}\right)\sigma_u^2\,du\right]. \tag{10.105}$$
This volatility contract is a bit unusual at the time of writing, as the standard notion of variance we typically price is $\int_t^T\sigma_u^2\,du$, rather than $\left(\ln\frac{S_T}{F_t}\right)^2$.
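
A hedged numerical sketch of Eq. (10.104) follows. It assumes option prices come from the Black (1976) formula with constant volatility, in which case $\ln(S_T/F_t)$ is normal with mean $-\frac{1}{2}\sigma^2(T-t)$ and variance $\sigma^2(T-t)$, so that the exact moments are known. The function names, the trapezoid rule on a log-strike grid and the parameter values are all assumptions made for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_otm_price(F, K, vol, T, discount):
    """Black (1976) price of the out-of-the-money option at strike K."""
    s = vol * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    d2 = d1 - s
    if K >= F:
        return discount * (F * norm_cdf(d1) - K * norm_cdf(d2))
    return discount * (K * norm_cdf(-d2) - F * norm_cdf(-d1))

def spanned_expectation(phi_second, F, vol, T, r, n=2000, width=10.0):
    """E_t[phi(S_T) - phi(F_t)] from Eq. (10.104): a strip of out-of-the-money
    options weighted by the second derivative of the payoff function."""
    discount = math.exp(-r * T)
    L = width * vol * math.sqrt(T)
    h = 2.0 * L / n
    total = 0.0
    for i in range(n + 1):
        K = F * math.exp(-L + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        # dK = K dx on the log-strike grid.
        total += weight * phi_second(K) * black_otm_price(F, K, vol, T, discount) * K * h
    return total / discount

F = 100.0
# Squared cumulative return: phi_v''(K) = (2 / K^2) (1 - ln(K / F)).
second_moment = spanned_expectation(
    lambda K: 2.0 / K**2 * (1.0 - math.log(K / F)), F, vol=0.2, T=1.0, r=0.03)
# Log-contract: phi''(K) = -1 / K^2.
log_contract = spanned_expectation(lambda K: -1.0 / K**2, F, vol=0.2, T=1.0, r=0.03)
```

With $\sigma = 0.2$ and $T-t = 1$, the exact values are $0.04 + 0.02^2 = 0.0404$ for the second moment and $-0.02$ for the log-contract, which the option strips reproduce up to discretization error.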
The current literature and practice on skewness contracts have a similar cumulative return flavor. Consider the following payoff, introduced by Bakshi, Kapadia and Madan (2003),
$$\phi_{sk}(S_T) \equiv \left(\ln\frac{S_T}{F_t} - E_t\ln\frac{S_T}{F_t}\right)^3.$$
The payoff $\phi_{sk}(S_T)$ refers to the third moment of the cumulative return over a certain investment horizon. Instead, the notion of a realized skewness would rely on the third moments of the instantaneous returns, averaged over the given investment horizon. Pricing results relating to realized skewness are not available at the time of writing. Let us keep on relying on the payoff $\phi_{sk}(S_T)$, and consider the definition of skewness, adjusted for risk,
$$\mathrm{Skew} \equiv \frac{E_t[\phi_{sk}(S_T)]}{E_t[\phi_{vv}(S_T)]^{3/2}}, \qquad \text{where } \phi_{vv}(S_T) \equiv \left(\ln\frac{S_T}{F_t} - E_t\ln\frac{S_T}{F_t}\right)^2,$$
and "adjustment for risk" relates to the fact that the expectations in the definition of Skew are taken under the risk-neutral probability. Note that $\phi_{vv}(S_T)$ is the de-meaned version of $\phi_v(S_T)$, and its expectation can be easily found as,
$$E_t[\phi_{vv}(S_T)] = E_t[\phi_v(S_T)] - \left(E_t[\phi_{\log}(S_T)]\right)^2,$$
where $\phi_{\log}(S_T) = \ln\frac{S_T}{F_t}$, the log-contract, which is priced in the usual VIX fashion,
$$e^{-r(T-t)}E_t[\phi_{\log}(S_T)] = -\int_0^{F_t}\frac{1}{K^2}\,\mathrm{Put}_t(K,T)\,dK - \int_{F_t}^\infty\frac{1}{K^2}\,\mathrm{Call}_t(K,T)\,dK.$$
Likewise,
$$e^{-r(T-t)}E_t[\phi_{sk}(S_T) - \phi_{sk}(F_t)] = \int_0^{F_t}\phi_{sk}''(K)\,\mathrm{Put}_t(K,T)\,dK + \int_{F_t}^\infty\phi_{sk}''(K)\,\mathrm{Call}_t(K,T)\,dK,$$
$$\phi_{sk}''(K) = \frac{3}{K^2}\left(\ln\frac{K}{F_t} - E_t[\phi_{\log}(S_T)]\right)\left(2 - \ln\frac{K}{F_t} + E_t[\phi_{\log}(S_T)]\right).$$

10.10 Dealing with market imperfections


10.11 Appendix 1: The original arguments of Black & Scholes


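The arguments in Black and Scholes (1973) and Merton (1973) rely on the assumption the option is traded. Accordingly, create a self-financed portfolio of $n_S$ units of the underlying asset and $n_C$ units of the European call option, where $n_C$ is an arbitrary number. Such a portfolio is worth $V = n_S S + n_C C$ and since it is self-financed it satisfies:
$$dV = n_S\,dS + n_C\,dC$$
$$= n_S\,dS + n_C\left[\frac{\partial C}{\partial S}\,dS + \left(\frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2S^2\frac{\partial^2 C}{\partial S^2}\right)dt\right]$$
$$= \left(n_S + n_C\frac{\partial C}{\partial S}\right)dS + n_C\left(\frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2S^2\frac{\partial^2 C}{\partial S^2}\right)dt,$$
where the second line follows by Itô's lemma. Therefore, the portfolio is locally riskless whenever
$$n_S = -n_C\frac{\partial C}{\partial S},$$
in which case $V$ would appreciate at the safe rate,
$$\frac{dV}{V} = \frac{n_C\left(\frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2S^2\frac{\partial^2 C}{\partial S^2}\right)dt}{n_S S + n_C C} = \frac{\frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2S^2\frac{\partial^2 C}{\partial S^2}}{C - S\frac{\partial C}{\partial S}}\,dt = r\,dt.$$
The last equality, and the boundary condition, lead to the Black-Scholes partial differential equation (10.14).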


10.12 Appendix 2: Black (1976)


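We want to evaluate the following expectation:
$$E(F_T - K)^+, \qquad \text{where } F_T = F_t\exp\left(-\frac{1}{2}\int_t^T\sigma_\tau^2\,d\tau + \int_t^T\sigma_\tau\,dW_\tau\right).$$
Let $I_{exe}$ be the indicator of all events s.t. $F_T \ge K$. We have
$$E(F_T - K)^+ = E(F_T I_{exe}) - K\,E(I_{exe}) = F_t\,E\left(\frac{F_T}{F_t}I_{exe}\right) - K\,E(I_{exe}) = F_t\,E^{Q_F}(I_{exe}) - K\,E(I_{exe}),$$
where the probability $Q_F$ is defined as:
$$\frac{dQ_F}{dQ} = \frac{F_T}{F_t},$$
a $Q$-martingale starting at one. Under $Q_F$, $dW^F_\tau = dW_\tau - \sigma_\tau\,d\tau$ is a Brownian motion. Note that $E(I_{exe}) = Q(F_T \ge K) = \Phi(d_2)$ and $E^{Q_F}(I_{exe}) = Q_F(F_T \ge K) = \Phi(d_1)$, where
$$d_1 = \frac{\ln\frac{F_t}{K} + \frac{1}{2}\int_t^T\sigma_\tau^2\,d\tau}{\sqrt{\int_t^T\sigma_\tau^2\,d\tau}}, \qquad d_2 = d_1 - \sqrt{\int_t^T\sigma_\tau^2\,d\tau}.$$
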

10.13 Appendix 3: Stochastic volatility


10.13.1 Hull & White equation
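We assume that the asset price and its volatility solve Eqs. (10.36), but that the two Brownian motions $W$ and $W^\sigma$ are uncorrelated. Note, first, that conditionally on the variance path $\{\sigma_\tau^2\}_{\tau\in[t,T]}$, $\ln\frac{S_T}{S_t}$ is normally distributed under the risk-neutral probability,
$$\ln\frac{S_T}{S_t} = r(T-t) - \frac{1}{2}\int_t^T\sigma_\tau^2\,d\tau + \int_t^T\sigma_\tau\,dW_\tau,$$
with:
$$E\left[\ln\frac{S_T}{S_t}\,\Big|\,\{\sigma_\tau^2\}\right] = \left(r - \frac{1}{2}\tilde V\right)(T-t), \qquad \mathrm{var}\left[\ln\frac{S_T}{S_t}\,\Big|\,\{\sigma_\tau^2\}\right] = \int_t^T\sigma_\tau^2\,d\tau = (T-t)\,\tilde V,$$
where $\tilde V \equiv \frac{1}{T-t}\int_t^T\sigma_\tau^2\,d\tau$ denotes the average realized variance. Then, we use the law of iterated expectations and elaborate on Eq. (10.49), and arrive at Eq. (10.51) as follows:
$$C(S_t,\sigma_t^2,T-t) = e^{-r(T-t)}E\left[(S_T-K)^+\,\big|\,S_t,\sigma_t^2\right]$$
$$= E\left[E\left(e^{-r(T-t)}(S_T-K)^+\,\big|\,\{\sigma_\tau^2\}_{\tau\in[t,T]}\right)\big|\,S_t,\sigma_t^2\right]$$
$$= E\left[\mathrm{BS}\big(S_t,T-t;\sqrt{\tilde V}\big)\,\big|\,S_t,\sigma_t^2\right]$$
$$= E\left[\mathrm{BS}\big(S_t,T-t;\sqrt{\tilde V}\big)\,\big|\,\sigma_t^2\right]$$
$$= \int \mathrm{BS}\big(S_t,T-t;\sqrt{\tilde V}\big)\Pr\big(\tilde V\,|\,\sigma_t^2\big)\,d\tilde V$$
$$\equiv E_{\tilde V}\left[\mathrm{BS}\big(S_t,T-t;\sqrt{\tilde V}\big)\right], \tag{10A.1}$$
where $\Pr(\tilde V\,|\,\sigma_t^2)$ denotes the density of $\tilde V$ conditional upon the current level of the variance, $\sigma_t^2$. The third and fourth equalities follow by the assumption $W$ and $W^\sigma$ are uncorrelated, such that
$$\Pr\big(\tilde V\,|\,S_t,\sigma_t^2\big) = \Pr\big(\tilde V\,|\,\sigma_t^2\big),$$
for otherwise the current level of the index, $S_t$, would help predict $\tilde V$. In other words, Eq. (10A.1) reveals that the price of an option in this market with stochastic volatility is a Black & Scholes price weighted with the probability density of the average realized variance.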
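
The mixing representation in Eq. (10A.1) is easy to illustrate once a distribution for the average variance is given. The sketch below assumes, purely for illustration, that the average variance takes a finite number of values with known probabilities. This discrete distribution, the function names and the numerical inputs are assumptions, not part of the Hull & White model.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, vol):
    """Black & Scholes price of a European call."""
    s = vol * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * vol * vol) * T) / s
    d2 = d1 - s
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def hull_white_call(S, K, r, T, avg_variances, probs):
    """Probability-weighted average of Black & Scholes prices, each evaluated
    at the square root of one possible value of the average variance."""
    return sum(p * bs_call(S, K, r, T, math.sqrt(v))
               for v, p in zip(avg_variances, probs))

S, r, T = 100.0, 0.03, 1.0
K = S * math.exp(r * T)                      # at-the-money forward strike
mixed = hull_white_call(S, K, r, T, [0.02, 0.06], [0.5, 0.5])
flat = bs_call(S, K, r, T, math.sqrt(0.04))  # same expected variance, no randomness
```

For this at-the-money forward strike, the mixed price lies below the Black & Scholes price computed at the expected variance, because the Black & Scholes function is concave in the variance there.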

10.13.2 Extensions
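Romano and Touzi (1997) extend the Hull & White equation to the case where asset returns and volatility are correlated. Consider the following model:
$$\frac{dS_\tau}{S_\tau} = r\,d\tau + \rho_\tau\sigma_\tau\,dW^\sigma_\tau + \sqrt{1-\rho_\tau^2}\,\sigma_\tau\,dW_\tau, \qquad d\sigma_\tau^2 = \alpha(\sigma_\tau^2)\,d\tau + \beta(\sigma_\tau^2)\,dW^\sigma_\tau,$$
where the correlation process, $\rho$, does not depend on $S$, and $W$ and $W^\sigma$ are two independent Brownian motions, such that,
$$S_T = S_t\,\zeta_T\exp\left(r(T-t) - \frac{1}{2}\int_t^T(1-\rho_\tau^2)\sigma_\tau^2\,d\tau + \int_t^T\sqrt{1-\rho_\tau^2}\,\sigma_\tau\,dW_\tau\right),$$
where $\zeta_T$ is as in Eq. (10.53) of the main text. We have, using the Law of Iterated Expectations,
$$C(S_t,\sigma_t^2,T-t) = e^{-r(T-t)}E\left[(S_T-K)^+\right]$$
$$= E\left[E\left(e^{-r(T-t)}(S_T-K)^+\,\big|\,\{\sigma_\tau^2\}_{\tau\in[t,T]},\{\rho_\tau\}_{\tau\in[t,T]}\right)\right]$$
$$= E\left[\mathrm{BS}\big(S_t\zeta_T,T-t;\sqrt{\tilde V}\big)\,\big|\,S_t,\sigma_t^2\right]$$
$$= E\left[\mathrm{BS}\big(S_t\zeta_T,T-t;\sqrt{\tilde V}\big)\,\big|\,\sigma_t^2\right],$$
where $\tilde V$ is as in Eq. (10.53) of the main text. The third equality follows because by assumption, both the variance and correlation processes are independent of $\{W_\tau\}_{\tau\in[t,T]}$, such that conditionally upon the variance and the correlation paths, $\{\sigma_\tau^2\}_{\tau\in[t,T]}$ and $\{\rho_\tau\}_{\tau\in[t,T]}$, $\ln\frac{S_T}{S_t\zeta_T}$ is normally distributed under the risk-neutral probability,
$$\ln\frac{S_T}{S_t\zeta_T} = r(T-t) - \frac{1}{2}\int_t^T(1-\rho_\tau^2)\sigma_\tau^2\,d\tau + \int_t^T\sqrt{1-\rho_\tau^2}\,\sigma_\tau\,dW_\tau,$$
with
$$E\left[\ln\frac{S_T}{S_t\zeta_T}\,\Big|\,\cdot\right] = \left(r - \frac{1}{2}\tilde V\right)(T-t), \qquad \mathrm{var}\left[\ln\frac{S_T}{S_t\zeta_T}\,\Big|\,\cdot\right] = \int_t^T(1-\rho_\tau^2)\sigma_\tau^2\,d\tau = (T-t)\,\tilde V.$$
The fourth line follows by the same arguments leading to Eq. (10A.1).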

10.13.3 Smile analytics


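We develop the approximation stated in Eq. (10.37). The assumption is that the asset price and its volatility solve Eqs. (10.36), with the Brownians $W$ and $W^\sigma$ being uncorrelated, such that the market price of the option is that generated by the Hull & White equation (10.51). By definition, the Black & Scholes implied volatility, IV, satisfies,
$$C_\$ = E_{\tilde V}\left[\mathrm{BS}\big(S_t,T-t;\sqrt{\tilde V}\big)\right] = \mathrm{BS}\big(S_t,T-t;\mathrm{IV}\big). \tag{10A.2}$$
Let $\bar\sigma \equiv E\big(\sqrt{\tilde V}\big)$ denote the expected average volatility, and consider a second-order Taylor expansion of the Black & Scholes function about $\bar\sigma$,
$$E_{\tilde V}\left[\mathrm{BS}\big(S_t,T-t;\sqrt{\tilde V}\big)\right] \approx \mathrm{BS}(S_t,T-t;\bar\sigma) + \frac{1}{2}\frac{\partial^2\mathrm{BS}(S_t,T-t;\bar\sigma)}{\partial\sigma^2}\,\mathrm{var}\big(\sqrt{\tilde V}\big),$$
and,
$$\mathrm{BS}(S_t,T-t;\mathrm{IV}) \approx \mathrm{BS}(S_t,T-t;\bar\sigma) + \frac{\partial\mathrm{BS}(S_t,T-t;\bar\sigma)}{\partial\sigma}\,(\mathrm{IV}-\bar\sigma).$$
Plugging these two approximations into Eq. (10A.2) leaves:
$$\mathrm{IV} \approx \bar\sigma + \frac{1}{2}\,\frac{\partial^2\mathrm{BS}(S_t,T-t;\bar\sigma)/\partial\sigma^2}{\partial\mathrm{BS}(S_t,T-t;\bar\sigma)/\partial\sigma}\,\mathrm{var}\big(\sqrt{\tilde V}\big). \tag{10A.3}$$
The vega, $\partial\mathrm{BS}/\partial\sigma$, and the volga, $\partial^2\mathrm{BS}/\partial\sigma^2$, for the Black-Scholes model are well-known. They are:
$$\frac{\partial\mathrm{BS}}{\partial\sigma} = S_t\sqrt{T-t}\,\Phi'(d_1), \qquad \frac{\partial^2\mathrm{BS}}{\partial\sigma^2} = S_t\sqrt{T-t}\,\Phi'(d_1)\,\frac{d_1d_2}{\sigma},$$
where $\Phi'$ denotes the standard normal density, $d_1 = \frac{\ln(S_t/K) + (r+\frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}$ and $d_2 = d_1 - \sigma\sqrt{T-t}$.
Replacing these expressions into Eq. (10A.3) yields the approximation in Eq. (10.37) of the main text.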


10.14 Appendix 4: Local volatility


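The arguments underlying the derivation of the local volatility function in this appendix rely on Derman and Kani (1994), Dumas (1995) and Britten-Jones and Neuberger (2000). In all the proofs to follow, all expectations are taken to be conditional on $\mathcal{F}_t$, but to simplify notation, we write $E(\cdot\,|\,\cdot) \equiv E(\cdot\,|\,\cdot,\mathcal{F}_t)$.
Proof of Eqs. (10.78) and (10.87). We first derive Eq. (10.78), a result encompassing Eq. (10.75). We assume that under the risk-neutral probability, the stock price is solution to,
$$\frac{dS_\tau}{S_\tau} = r\,d\tau + \sigma_\tau\,dW_\tau, \tag{10A.4}$$
where $\sigma_\tau$ is some $\mathcal{F}_\tau$-adapted process. For example, $\sigma_\tau \equiv \sigma(S_\tau,\tau)\cdot v_\tau$, all $\tau$, where $v_\tau$ is solution to the second of Eqs. (10.77). Next, we assume that we are given a continuum of option prices $C_\$(K,T)$ along the two dimensions of strikes $K$ and times-to-maturity $T$. We want to match the prediction of the model with the market prices,
$$C_\$(K,T) = e^{-r(T-t)}E(S_T-K)^+, \tag{10A.5}$$
where $S_T$ is solution to Eq. (10A.4).
Let us expand the right-hand side of Eq. (10A.5) with respect to time-to-maturity, for fixed $K$:
$$d(S_T-K)^+ = \left[\mathbb{I}_{S_T\ge K}\,rS_T + \frac{1}{2}\delta(S_T-K)\,\sigma_T^2S_T^2\right]dT + \mathbb{I}_{S_T\ge K}\,\sigma_TS_T\,dW_T,$$
where $\delta$ is the Dirac's delta. Taking expectations,
$$\frac{\partial}{\partial T}E(S_T-K)^+ = r\,E\left(\mathbb{I}_{S_T\ge K}S_T\right) + \frac{1}{2}E\left[\delta(S_T-K)\,\sigma_T^2S_T^2\right],$$
where
$$E\left[\delta(S_T-K)\,\sigma_T^2S_T^2\right] = \iint\delta(S-K)\,\sigma^2S^2\underbrace{\phi_T(\sigma\,|\,S)\,\phi_T(S)}_{\text{joint density of }(\sigma_T,S_T)}\,dS\,d\sigma = K^2\phi_T(K)\,E\left(\sigma_T^2\,|\,S_T=K\right).$$
By using the identity, $(S_T-K)^+ + K\,\mathbb{I}_{S_T\ge K} = S_T\,\mathbb{I}_{S_T\ge K}$, we obtain,
$$\frac{\partial}{\partial T}E(S_T-K)^+ = r\left[E(S_T-K)^+ + K\,E\left(\mathbb{I}_{S_T\ge K}\right)\right] + \frac{1}{2}K^2\phi_T(K)\,E\left(\sigma_T^2\,|\,S_T=K\right). \tag{10A.6}$$
Next, differentiate both sides of Eq. (10A.5) with respect to $T$,
$$\frac{\partial}{\partial T}C_\$(K,T) = -r\,C_\$(K,T) + e^{-r(T-t)}\frac{\partial}{\partial T}E(S_T-K)^+, \tag{10A.7}$$
and twice with respect to $K$,
$$\frac{\partial}{\partial K}C_\$(K,T) = -e^{-r(T-t)}E\left(\mathbb{I}_{S_T\ge K}\right), \qquad \frac{\partial^2}{\partial K^2}C_\$(K,T) = e^{-r(T-t)}\phi_T(K), \tag{10A.8}$$
where the second relation is simply the famous relation in Eq. (10.28) of the main text. Replacing Eqs. (10A.5), (10A.7) and (10A.8) into Eq. (10A.6) leaves,
$$\frac{\partial}{\partial T}C_\$(K,T) + rK\frac{\partial}{\partial K}C_\$(K,T) = \frac{1}{2}K^2\frac{\partial^2}{\partial K^2}C_\$(K,T)\,E\left(\sigma_T^2\,|\,S_T=K\right). \tag{10A.9}$$
That is,
$$E\left(\sigma_T^2\,|\,S_T=K\right) = 2\,\frac{\frac{\partial}{\partial T}C_\$(K,T) + rK\frac{\partial}{\partial K}C_\$(K,T)}{K^2\frac{\partial^2}{\partial K^2}C_\$(K,T)} \equiv \sigma^2_{\mathrm{loc}}(K,T). \tag{10A.10}$$
That is, if Eq. (10A.5) holds true, volatility must be restricted to satisfy Eq. (10A.10). As an example, let $\sigma_T \equiv \sigma(S_T,T)\cdot v_T$, where $v_T$ is solution to the second of Eqs. (10.77). Then,
$$\sigma^2_{\mathrm{loc}}(K,T) = E\left(\sigma_T^2\,|\,S_T=K\right) = E\left(\sigma^2(S_T,T)\,v_T^2\,|\,S_T=K\right) = \sigma^2(K,T)\,E\left(v_T^2\,|\,S_T=K\right),$$
which proves Eq. (10.78).

The converse does also hold true. Consider, for example, the case where $v \equiv 1$. That is, suppose that $C(S,t;K,T)$ is solution to the terminal value problem,
$$0 = \frac{\partial C}{\partial t} + rS\frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2(S,t)\,S^2\frac{\partial^2C}{\partial S^2} - rC, \quad (S,t)\in[0,\infty)\times[0,T); \qquad C(S,T;K,T) = (S-K)^+,$$
where $S$ in the boundary condition denotes the stock price at expiration. Then, define $\sigma_{\mathrm{loc}}(K,T) \equiv \sigma(K,T)$, such that, by Eq. (10A.10), $C_\$(K,T)$ solves the initial value problem,
$$0 = -\frac{\partial C_\$}{\partial T} - rK\frac{\partial C_\$}{\partial K} + \frac{1}{2}\sigma^2_{\mathrm{loc}}(K,T)\,K^2\frac{\partial^2C_\$}{\partial K^2}, \qquad C_\$(K,0) = (S-K)^+,$$
where $S$ denotes the initial price. The previous partial differential equation is known as Dupire's equation.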
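
Eq. (10A.10) can be checked numerically in the one case where the answer is known in advance. The sketch below assumes that the option prices are generated by the Black & Scholes formula with a constant volatility and approximates the partial derivatives by central finite differences. The step sizes, the function names and the parameter values are illustrative assumptions. Under these assumptions the recovered local volatility should be flat and equal to the input volatility.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, vol):
    """Black & Scholes price of a European call."""
    s = vol * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * vol * vol) * T) / s
    d2 = d1 - s
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def dupire_local_vol(call, K, T, r, dK=0.5, dT=1e-3):
    """Local volatility from Eq. (10A.10), where `call(K, T)` returns the price
    of a call with strike K and time-to-maturity T."""
    c_T = (call(K, T + dT) - call(K, T - dT)) / (2.0 * dT)
    c_K = (call(K + dK, T) - call(K - dK, T)) / (2.0 * dK)
    c_KK = (call(K + dK, T) - 2.0 * call(K, T) + call(K - dK, T)) / dK**2
    return math.sqrt(2.0 * (c_T + r * K * c_K) / (K * K * c_KK))

S0, r, vol = 100.0, 0.02, 0.25

def surface(K, T):
    # Hypothetical flat-volatility price surface.
    return bs_call(S0, K, r, T, vol)

local_vols = [dupire_local_vol(surface, K, 1.0, r) for K in (90.0, 100.0, 110.0)]
```

With market prices in place of the flat surface, the same ratio of derivatives yields a strike- and maturity-dependent local volatility, although finite differences on noisy quotes would require some smoothing.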


10.15 Appendix 5: Variance contracts


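We provide proofs of results relating to volatility contracts.
Proof of Eq. (10.87). We have,
$$E\left(\int_t^T\sigma_u^2\,du\right) = \int_t^T\int_0^\infty E\left(\sigma_u^2\,|\,S_u=K\right)\phi_u(K)\,dK\,du$$
$$= 2\int_t^T\int_0^\infty\frac{\frac{\partial C_\$(K,u)}{\partial u} + rK\frac{\partial C_\$(K,u)}{\partial K}}{K^2\frac{\partial^2C_\$(K,u)}{\partial K^2}}\,\phi_u(K)\,dK\,du$$
$$= 2\int_t^T\int_0^\infty e^{r(u-t)}\,\frac{\frac{\partial C_\$(K,u)}{\partial u} + rK\frac{\partial C_\$(K,u)}{\partial K}}{K^2}\,dK\,du,$$
where the second line follows by Eq. (10A.10), and the third line follows by Eq. (10.28).
Proof of Eq. (10.89). By a Taylor expansion with remainder, we have that for any function $f$ smooth enough,
$$f(x) = f(x_0) + f'(x_0)(x-x_0) + \int_{x_0}^x(x-u)\,f''(u)\,du. \tag{10A.11}$$
Let $F_t$ be the forward price, $F_t = e^{r(T-t)}S_t$. By applying this formula to $\ln S_T$, with $x_0 = F_t$,
$$\ln S_T = \ln F_t + \frac{1}{F_t}(S_T-F_t) - \int_{F_t}^{S_T}(S_T-K)\frac{1}{K^2}\,dK$$
$$= \ln F_t + \frac{1}{F_t}(S_T-F_t) - \int_0^{F_t}(K-S_T)^+\frac{1}{K^2}\,dK - \int_{F_t}^\infty(S_T-K)^+\frac{1}{K^2}\,dK,$$
so that
$$\ln\frac{F_T}{F_t} = \frac{1}{F_t}(F_T-F_t) - \int_0^{F_t}(K-F_T)^+\frac{1}{K^2}\,dK - \int_{F_t}^\infty(F_T-K)^+\frac{1}{K^2}\,dK, \tag{10A.12}$$
where the second equality follows because $\int_{F_t}^{S_T}(S_T-K)\frac{1}{K^2}\,dK = \int_0^{F_t}(K-S_T)^+\frac{1}{K^2}\,dK + \int_{F_t}^\infty(S_T-K)^+\frac{1}{K^2}\,dK$, and the third equality follows because the forward price at $T$ satisfies $F_T = S_T$. Hence, by $E(F_T) = F_t$,
$$-e^{-r(T-t)}E\left(\ln\frac{F_T}{F_t}\right) = \int_0^{F_t}\frac{\mathrm{Put}_t(K,T)}{K^2}\,dK + \int_{F_t}^\infty\frac{\mathrm{Call}_t(K,T)}{K^2}\,dK. \tag{10A.13}$$
On the other hand, by Itô's lemma,
$$E\left(\int_t^T\sigma_u^2\,du\right) = -2\,E\left(\ln\frac{F_T}{F_t}\right). \tag{10A.14}$$
Replacing Eq. (10A.14) into Eq. (10A.13) yields Eq. (10.89).
Remark A1. The previous results hold when the short-term rate is constant. The case of stochastic interest rates can actually be dealt with, although with some tools which will be introduced more systematically in Chapter 12 (Section 12.2). We anticipate how these tools work in the present appendix, as they allow us to solve for the fair price of variance contracts even when interest rates are stochastic. Note that if interest rates are stochastic, Eq. (10A.13) generalizes to:
$$-E\left(e^{-\int_t^Tr_u\,du}\ln\frac{F_T}{F_t}\right) = -\frac{1}{F_t}E\left(e^{-\int_t^Tr_u\,du}(F_T-F_t)\right) + \int_0^{F_t}\frac{\mathrm{Put}_t(K,T)}{K^2}\,dK + \int_{F_t}^\infty\frac{\mathrm{Call}_t(K,T)}{K^2}\,dK. \tag{10A.15}$$
The left hand side of Eq. (10A.15) can be written as
$$-E\left(e^{-\int_t^Tr_u\,du}\ln\frac{F_T}{F_t}\right) = -P(t,T)\,E^{Q_F}\left(\ln\frac{F_T}{F_t}\right), \tag{10A.16}$$
where $P(t,T)$ is the time-$t$ price of a zero coupon bond expiring at $T$, and $E^{Q_F}$ denotes the expectation taken under a new probability, known as the forward probability. Naturally, the first term on the right side of Eq. (10A.15) is zero, as a forward has no value at inception. But then, this zero value condition implies that:
$$F_t = \frac{1}{P(t,T)}E\left(e^{-\int_t^Tr_u\,du}F_T\right) = E^{Q_F}(F_T).$$
That is, the forward price is a martingale under the forward probability. Therefore, Eq. (10A.14) is replaced with,
$$E^{Q_F}\left(\int_t^T\sigma_u^2\,du\right) = -2\,E^{Q_F}\left(\ln\frac{F_T}{F_t}\right), \tag{10A.17}$$
where $\sigma_u$ now denotes the instantaneous volatility of the forward price. By combining Eqs. (10A.15), (10A.16) and (10A.17), we get,
$$E^{Q_F}\left(\int_t^T\sigma_u^2\,du\right) = \frac{2}{P(t,T)}\left[\int_0^{F_t}\frac{\mathrm{Put}_t(K,T)}{K^2}\,dK + \int_{F_t}^\infty\frac{\mathrm{Call}_t(K,T)}{K^2}\,dK\right].$$
That is, the fair price of a variance contract for a swap of forward volatility can be expressed in a model-free format. Note that it is the price of a variance contract that we can express in a model-free fashion, not the (undiscounted) expected realized variance. Indeed, the payoff of a variance contract for forward realized variance is:
$$\int_t^T\sigma_u^2\,du - P_{\mathrm{var}}(t,T),$$
such that the zero value condition at inception,
$$0 = E\left[e^{-\int_t^Tr_u\,du}\left(\int_t^T\sigma_u^2\,du - P_{\mathrm{var}}(t,T)\right)\right],$$
leads to,
$$P_{\mathrm{var}}(t,T) = \frac{1}{P(t,T)}E\left(e^{-\int_t^Tr_u\,du}\int_t^T\sigma_u^2\,du\right) = E^{Q_F}\left(\int_t^T\sigma_u^2\,du\right).$$
Proof that $(\theta,\psi)$ in Eq. (10.102) is self-financed. For a portfolio strategy to be self-financed, we need to have $V_\tau = \theta_\tau S_\tau + \psi_\tau M_\tau$ and $dV_\tau = \theta_\tau\,dS_\tau + \psi_\tau\,dM_\tau$, or:
$$dV_\tau = \theta_\tau\,dS_\tau + \psi_\tau\,dM_\tau = \theta_\tau\,dS_\tau + r\,\psi_\tau M_\tau\,d\tau, \tag{10A.18}$$
where the second equality follows by $dM_\tau = rM_\tau\,d\tau$. With $V_\tau$ as in Eq. (10.103), we have that:
$$dV_\tau = rV_\tau\,d\tau + e^{-r(T-\tau)}\frac{dF_\tau}{F_\tau} = rV_\tau\,d\tau + e^{-r(T-\tau)}\left(\frac{dS_\tau}{S_\tau} - r\,d\tau\right) = \theta_\tau\,dS_\tau + r\,\psi_\tau M_\tau\,d\tau, \tag{10A.19}$$
where we have used the portfolio weights in Eq. (10.102) and the expression for the portfolio value in Eq. (10.103). Eq. (10A.19) is the same as Eq. (10A.18), once we use the portfolio weight in Eq. (10.102). Therefore, $(\theta,\psi)$ is self-financed.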


10.16 Appendix 6: Skewness contracts


By Eq. (10A.11), we have that, for any function $\phi$ as many times differentiable as we might need,
$$\phi(S_T) = \phi(F_t) + \phi'(F_t)(S_T-F_t) + \int_{F_t}^{S_T}(S_T-K)\,\phi''(K)\,dK$$
$$= \phi(F_t) + \phi'(F_t)(S_T-F_t) + \int_0^{F_t}\phi''(K)(K-S_T)^+\,dK + \int_{F_t}^\infty\phi''(K)(S_T-K)^+\,dK,$$
where $F_t = e^{r(T-t)}S_t$, the forward price. Multiplying both sides of this equation by $e^{-r(T-t)}$, and taking expectations, yields Eq. (10.104) in the main text.


References
Bakshi, G. and D. Madan (2000): Spanning and Derivative Security Evaluation. Journal of
Financial Economics 55, 205-238.
Bakshi, G., N. Kapadia and D. Madan (2003): Stock Return Characteristics, Skew Laws, and Differential Pricing of Individual Equity Options. Review of Financial Studies 16, 101-143.
Ball, C.A. and A. Roma (1994): Stochastic Volatility Option Pricing. Journal of Financial
and Quantitative Analysis 29, 589-607.
Bergman, Y. Z., B. D. Grundy, and Z. Wiener (1996): General Properties of Option Prices.
Journal of Finance 51, 1573-1610.
Black, F. (1976a): The Pricing of Commodity Contracts. Journal of Financial Economics
3, 167-179.
Black, F. (1976b): Studies of Stock Price Volatility Changes. Proceedings of the 1976 Meeting
of the American Statistical Association, 177-81.
Black, F. and M. Scholes (1973): The Pricing of Options and Corporate Liabilities. Journal
of Political Economy 81, 637-659.
Bollerslev, T. (1986): Generalized Autoregressive Conditional Heteroskedasticity. Journal of
Econometrics 31, 307-327.
Bollerslev, T., Engle, R. and D. Nelson (1994): ARCH Models. In: McFadden, D. and R. Engle (Editors): Handbook of Econometrics (Volume 4), 2959-3038. Amsterdam: North-Holland.
Britten-Jones, M. and A. Neuberger (2000): Option Prices, Implied Price Processes and
Stochastic Volatility. Journal of Finance 55, 839-866.
Carr, P. and D. Madan (2001): Optimal Positioning in Derivative Securities. Quantitative
Finance 1, 19-37.
Christie, A.A. (1982): The Stochastic Behavior of Common Stock Variances: Value, Leverage, and Interest Rate Effects. Journal of Financial Economics 10, 407-432.
Clark, P.K. (1973): A Subordinated Stochastic Process Model with Fixed Variance for Speculative Prices. Econometrica 41, 135-156.
Corradi, V. (2000): Reconsidering the Continuous Time Limit of the GARCH(1,1) Process.
Journal of Econometrics 96, 145-153.
Cox, J.C., S.A. Ross and M. Rubinstein (1979): Option Pricing: A Simplified Approach. Journal of Financial Economics 7, 229-263.
Cox, J.C., J.E. Ingersoll and S.A. Ross (1985): A Theory of the Term Structure of Interest
Rates. Econometrica 53, 385-407.
Demeterfi, K., E. Derman, M. Kamal and J. Zou (1999): More Than You Ever Wanted To Know About Volatility Swaps. Goldman Sachs Quantitative Strategies Research Notes.
Derman, E. (1998): Stochastic Implied Trees: Arbitrage Pricing with Stochastic Term and
Strike Structure of Volatility. International Journal of Theoretical and Applied Finance
1, 61-110.
Derman, E. and J. Kani (1994): Riding on a Smile. Risk 7, 32-39.
Duffie, D. and C-f. Huang (1985): Implementing Arrow-Debreu Equilibria by Continuous Trading of Few Long-Lived Securities. Econometrica 53, 1337-1356.
Dumas, B. (1995): The Meaning of the Implicit Volatility Function in Case of Stochastic
Volatility. Available from:
http://www.insead.edu/facultyresearch/faculty/personal/bdumas/research/index.cfm.

Dupire, B. (1994): Pricing with a Smile. Risk 7, 18-20.


El Karoui, N., M. Jeanblanc-Picqué and S. Shreve (1998): Robustness of the Black and Scholes Formula. Mathematical Finance 8, 93-126.
Engle, R.F. (1982): Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica 50, 987-1008.
Fama, E. (1965): The Behaviour of Stock Market Prices. Journal of Business 38, 34-105.
Fornari, F. and A. Mele (2006): Approximating Volatility Diffusions with CEV-ARCH Models. Journal of Economic Dynamics and Control 30, 931-966.
Gatheral, J. (2006): The Volatility Surface: A Practitioner's Guide. New York: John Wiley and Sons.
Hakansson, N. (1979): The Fantastic World of Finance: Progress and the Free Lunch. Journal
of Financial and Quantitative Analysis (Proceeding Issue) 14, 717-734.
Harrison, J.M. and D.M. Kreps (1979): Martingales and Arbitrage in Multiperiod Securities
Markets. Journal of Economic Theory 20, 381-408.
Heston, S.L. (1993a): Invisible Parameters in Option Prices. Journal of Finance 48, 933-947.
Heston, S.L. (1993b): A Closed Form Solution for Options with Stochastic Volatility with
Applications to Bond and Currency Options. Review of Financial Studies 6, 327-344.
Hicks, J.R. (1939): Value and Capital. Oxford: Oxford University Press.
Hull, J. and A. White (1987): The Pricing of Options with Stochastic Volatilities. Journal
of Finance 42, 281-300.
Keynes, J.M. (1930): A Treatise on Money: The Applied Theory of Money. London: MacMillan.
Mandelbrot, B. (1963): The Variation of Certain Speculative Prices. Journal of Business
36, 394-419.

Mele, A. (1998): Dynamiques non linéaires, volatilité et équilibre. Paris: Editions Economica.
Mele, A. and F. Fornari (2000): Stochastic Volatility in Financial Markets. Crossing the Bridge
to Continuous Time. Boston: Kluwer Academic Publishers.
Merton, R. (1973): Theory of Rational Option Pricing. Bell Journal of Economics and
Management Science 4, 637-654.
Nelson, D.B. (1990): ARCH Models as Diffusion Approximations. Journal of Econometrics 45, 7-38.
Nelson, D.B. (1991): Conditional Heteroskedasticity in Asset Returns: A New Approach.
Econometrica 59, 347-370.
Neuberger, A. (1994): Hedging Volatility: the Case for a New Contract. Journal of Portfolio
Management 20, 74-80.
Parkinson, M. (1980): The Extreme Value Method for Estimating the Variance of the Rate
of Returns. Journal of Business 53, 61-68.
Renault, E. (1997): Econometric Models of Option Pricing Errors. In: Kreps, D., Wallis, K.
(Editors): Advances in Economics and Econometrics (Volume 3), 223-278. Cambridge:
Cambridge University Press.
Renault and Touzi.
Romano, M. and N. Touzi (1997): Contingent Claims and Market Completeness in a Stochastic Volatility Model. Mathematical Finance 7, 399-412.
Rubinstein, M. (1994): Implied Binomial Trees. Journal of Finance 49, 771-818.
Scott, L. (1987): Option Pricing when the Variance Changes Randomly: Theory, Estimation,
and an Application. Journal of Financial and Quantitative Analysis 22, 419-438.
SEC-CFTC (2010): Findings Regarding the Market Events of May 6, 2010. A joint report
by the Securities and Exchanges Commission & the Commodity Futures Trading Commission, September.
Tauchen, G. and M. Pitts (1983): The Price Variability-Volume Relationship on Speculative
Markets. Econometrica 51, 485-505.
Taylor, S. (1986): Modeling Financial Time Series. Chichester, UK: Wiley.
Vasicek, O. (1977): An Equilibrium Characterization of the Term Structure. Journal of
Financial Economics 5, 177-188.
Wiggins, J. (1987): Option Values and Stochastic Volatility: Theory and Empirical Estimates. Journal of Financial Economics 19, 351-372.


11
Engineering of fixed income securities

11.1 Introduction
This chapter is an introduction to the practice of xed income security pricing. Fixed income
securities quite di er from equities and equity derivatives. Consider the simple example of a
simple pure discount bond, which is quite di cult to price, being tied down to the time value
of money. Its value reects intertemporal preferences and beliefs of market participants, which
are unobservable and, importantly, not traded. For this reason, the price of this bond cannot
be related to the current state of the world in a preference-free format. This is of course not
the case while we price equity derivatives in a complete market setting such as that in Black
and Scholes (1973).
This chapter reviews models where we can still price xed income securities in a preferencefree setting. We rely on no-arbitrage models, which are the xed income counterparts to the
local volatility models reviewed in Chapter 10. Within this framework, we give up modeling
the current security prices in the rst place. Rather, we take these prices as given, and exploit
all the information embedded into them so as to extract risk-neutral probabilities of future
price movements. Once risk-neutral probs are reverse-engineered, we can price any interest rate
product in a preference-free format. No-arbitrage models means that the only assumption we
are really making is absence of arbitrage.
The main model that illustrates this way to proceed through closed-form formulae is that of
Ho and Lee (1986). The Ho and Lee approach is an elegant way through which models can be
calibrated to data while ensuring absence of arbitrage. However, the model relies on unrealistic
assumptions, and might lead to negative interest rates. We develop a calibration approach based
on the extraction of Arrow-Debreu security prices, which can accommodate more realistic
interest rate developments. Arrow-Debreu securities are abstract securities that only pay off in
mutually exclusive states of the world, and their value then naturally relates to the risk-neutral
probability of the events where they specifically pay off (see Chapter 2). While these assets
obviously do not trade, we can extract their shadow value from the price of fixed income securities
based on a model; we can then use these extracted values to price any interest rate derivative.
We center around these themes, which we illustrate through simple numerical applications
including the pricing of interest rate derivatives such as options on bonds, swaps, caps, callable
or convertible bonds, emphasizing the joint behavior of derivative prices and the underlying:
for example, the derivative price predicted by a given model can tell us whether the
underlying is mispriced. The framework of analysis is in discrete time, and relies on the same
implied binomial trees introduced in Chapter 10 to model equity derivatives. Implied binomial
trees simplify many conceptual intricacies; Chapter 12 deals with more advanced topics in
interest rate modeling and derivative evaluation, including the empirical motivation underlying
them as well as a systematic analysis through continuous time methods. Finally, this chapter
does not cover credit risk, which is the focus of Chapter 13. We now proceed with a number
of basic pieces of motivation, explaining in more detail a few of the issues arising in fixed
income markets.
11.1.1 Relative pricing in fixed income markets
While current bond prices cannot be given a preference-free representation, we can still aim to
price interest rate derivatives in a preference-free fashion. The keyword is relative pricing: the
situation in which we price a number of assets given the price of others, while ensuring absence
of arbitrage, as explained at many previous junctures of these lectures. Pricing options on traded
assets is in general quite a different matter than pricing the underlying assets in the first place.
Even in the equity case, we try to evaluate an option, say, in a preference-free fashion, even if
the underlying asset itself cannot be. By preference-free, we refer to the possibility to extract
risk-neutral probabilities, or Arrow-Debreu prices, from the price of already traded assets, and
price derivatives written on the states spanned by the extracted Arrow-Debreu prices.
We wish to achieve similar objectives in the fixed income space, although the task is
challenging. Consider, for example, the Black & Scholes formula. The logic leading to it cannot
exactly be applied to evaluate fixed income securities. Indeed, the Black & Scholes model relies
on the assumption of a constant volatility of the underlying price. In the context of interest
rate derivatives, the volatility of the underlying asset price depends, instead, on the maturity
of the underlying, as it tends to zero as the time-to-maturity goes to zero.
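A minimal worked illustration of this last point, under assumptions that are not part of the text: suppose the underlying is a zero with time-to-maturity \(\tau = T - t\), whose continuously compounded yield \(R\) has volatility \(\sigma_R\) (both symbols are introduced here only for illustration). To first order in the yield change, and ignoring the deterministic drift, the volatility of the bond return is proportional to \(\tau\), and therefore vanishes as the bond approaches maturity:

```latex
P = e^{-R\,\tau}
\quad\Longrightarrow\quad
\frac{dP}{P} \approx -\tau\, dR
\quad\Longrightarrow\quad
\mathrm{vol}\!\left(\frac{dP}{P}\right) \approx \tau\,\sigma_R \;\longrightarrow\; 0
\quad \text{as } \tau \to 0 .
```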
More generally, pricing and hedging interest rate derivatives requires a model describing
developments of the whole yield curve, as we shall explain. It goes without saying that the
general principles underlying the APT are naturally still the same. The methods described
in this and the following chapter aim at an objective, where the dynamics of the pricers under
the risk-neutral probability (e.g., the dynamics of the short-term rate) do not depend on
risk-aversion corrections. The reader who is impatient to see a concrete illustration
of these somewhat surprising statements is referred to Eq. (11.42) of this chapter and the
discussion around it, or in general, to Section 12.6 of the next, where we shall deal with the
celebrated Heath, Jarrow and Morton (1992) model.
11.1.2 Many evaluation paradigms
After reading this introductory chapter, you might find that many methods and models
readily become available, which could be used to price interest rate derivatives. Derivatives
houses really do have dozens of models, with different houses possibly asking for different prices
relating to the same product. This circumstance need not be an arbitrage opportunity. Rate
markets are typically over-the-counter, of the type studied in Chapter 9 of these lectures. Indeed,
this chapter relies on stylized examples of markets with different derivative prices predicted by
models that match the same set of initially given market prices. Market incompleteness and
segmentation are responsible for these interesting features, as we shall explain.
With dozens of methods available to price fixed income products, we do not see the emergence
of a single model to price all of the extant fixed income products. Typically, any bank has a
battery of different models, with pieces of this battery possibly fighting for different goals. For
example, a bank might display a preference for a certain type of models as a result of (i) its
culture and history, or (ii) the particular business it is pursuing. For example, in the next chapter,
we shall see that to price interest rate options such as caps, we may use the market model,
which relies on the Black '76 formula. However, using this model implies that we do not have
a closed-form solution for the price of swaptions, which can only be obtained through numerical
methods. If the swaptions business is not important for the bank, then we may safely adopt
the market model.
[In progress]
11.1.3 Plan of the chapter
The chapter is organized as follows. Sections 11.2 through 11.3 develop the basics underlying
fixed income securities, such as interest rate and market conventions, duration, convexity, and
an introduction to basic hedging and trading strategies. Section 11.4 is the first section to deal
with models that aim to fit the initial yield curve without errors, using binomial trees. We
have two fundamental ways to achieve this goal. As for the first one, developed in Section
11.4, we freeze scenarios, i.e. the values taken by the short-term rate on the branches of a
tree, and search for the risk-neutral probabilities such that the prices predicted by the model agree
with the market. As for the second, we fix risk-neutral probabilities, and search for scenarios such
that the model and the market are the same. Section 11.5 deals with the second approach,
which is the essence of the Ho and Lee (1986) model. Naturally, we may consider situations
where we might simultaneously search for probabilities and short-term rate scenarios that make
models consistent with the markets. These situations are quite complex and necessitate a
general framework of analysis, developed in Section 11.6, and hinging upon calibration through
Arrow-Debreu securities. Section 11.7 concludes this chapter and provides numerical examples
of how to evaluate bonds with callability and convertibility features, which will receive a more
systematic theoretical treatment in the next chapters.

11.2 Markets and interest rate conventions


11.2.1 Markets for interest rates
There are three main types of markets for interest rates: (i) LIBOR; (ii) Treasury rate; (iii)
Repo rate (or repurchase agreement rate). We briefly survey these markets and then review
how to extract potentially useful information from them while constructing spreads.
11.2.1.1 LIBOR (London Interbank Offered Rate) and other interbank rates

Financial institutions trade deposits with each other, which span maturities that range from just
overnight to one year in a given currency. The LIBOR rate reflects the rate at which financial
institutions are willing to borrow in these markets, on average. It is an average indicative quote
of the interbank lending market. It is determined as follows. A panel of major banks exists,
where each bank belonging to the panel reports the rate it expects to be charged by its peers in
case of borrowing needs for a given maturity and a given currency. The four highest and the four
lowest rates of these banks' reports are ignored, and the remaining reports are aggregated by
Thomson Reuters for ten currencies, and published daily by the British Bankers' Association.¹
The left panel of Figure 11.1 plots the LIBOR referenced to US dollars.
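The trimming rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `trimmed_mean_fixing`, the 16-bank panel and its quotes are hypothetical, and the remaining reports are assumed to be aggregated through a simple average.

```python
def trimmed_mean_fixing(quotes, trim=4):
    """Average the panel quotes after dropping the `trim` highest and lowest.

    Hypothetical helper: the simple average of the surviving quotes is an
    assumption about how the remaining reports are aggregated.
    """
    if len(quotes) <= 2 * trim:
        raise ValueError("need more than 2 * trim quotes")
    ranked = sorted(quotes)
    kept = ranked[trim:len(ranked) - trim]   # discard both tails
    return sum(kept) / len(kept)


# Hypothetical 3-month submissions (in percent) from a 16-bank panel.
panel = [0.52, 0.55, 0.50, 0.58, 0.61, 0.54, 0.53, 0.57,
         0.56, 0.49, 0.60, 0.55, 0.54, 0.59, 0.51, 0.56]
fixing = trimmed_mean_fixing(panel)

# A single extreme submission does not move the fixing: it is trimmed away.
distorted = list(panel)
distorted[4] = 5.00
fixing_distorted = trimmed_mean_fixing(distorted)
```

The second computation suggests why trimming is used: one outlying report falls in the discarded tail and leaves the average unchanged, although coordinated misreporting by several banks would not be neutralized in this way.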
The LIBOR is a fundamental point of reference to financial institutions, which look at it as
an opportunity cost of capital. Moreover, many fixed income instruments are indexed to the
LIBOR: forward rate agreements, interest rate swaps, or variable mortgage rates (see Chapters
12 and 13). Because the banks' submissions over the process of the LIBOR formation do not
rely on trades (only on subjective estimates of the cost of capital), there might be incentives for
report manipulation biased towards the rates that would make a given bank reap profits over
LIBOR-sensitive derivatives, as hypothesized over the LIBOR scandals emerging in June 2012.
While the LIBOR reflects market conditions inherent to the banking system for a given currency,
the US Federal Funds rate more directly links to the liquidity to be deposited within the Federal
Reserve. Banks have to maintain reserves with the Federal Reserve to partially back deposits and
to clear financial transactions, as further explained in Section 13.6 of Chapter 13. Transactions
involve banks with excess reserves with the Fed, which earn no interest, lending to banks with reserve
deficiencies. The Federal Funds rate is the overnight rate at which banks lend these reserves
to each other. It is affected by the Federal Reserve Bank of New York, which aims to make it lie
within a range of the target rate decided by the governors at Federal Open Market Committee
meetings. This range is maintained through open market operations.
11.2.1.2 Treasury rate

It is the rate at which a given Government can borrow in a given currency. The left panel
of Figure 11.1 depicts the time behavior of the interest rate on short-term and long-term US
government debt: the 3 month T-bill rate and the 10 year Treasury yield.
11.2.1.3 Repo rate (or repurchase agreement rate)

A Repo agreement is a contract by which one counterparty sells some assets to another, with
the obligation to buy these assets back at some future date. The assets act as collateral. The
rate at which such a transaction is made is the repo rate. One day repo agreements give rise to
overnight repos. Longer-term agreements give rise to term repos.
11.2.1.4 Interest rate spreads

Interest rate spreads are the difference between interest rates applying to two different markets.
Thus, they have the potential to remove components that are incorporated by both markets,
thereby isolating interesting pieces of information.
One important example is the LIBOR-OIS spread, which is the difference between the 3-month
LIBOR and the 3-month OIS. The OIS (overnight indexed swap) rate is the swap
rate in a swap agreement of fixed against variable interest rate payments, where the variable
interest rate is an overnight reference, typically an average, unsecured interbank overnight rate,
such as the Federal Funds rate in the US, SONIA in the UK or EONIA in the Euro area.² The
LIBOR reflects the monetary policy stance but should also incorporate a premium related to
counterparty risk. Instead, the OIS is a mere interest rate swap; as such, it should primarily
reflect the monetary policy stance. Therefore, the LIBOR-OIS spread has the potential to isolate
credit views on financial institutions. Historically, the LIBOR-OIS spread has behaved quite
flat, although it then reached record-high levels during the 2007 subprime crisis (see the right
panel of Figure 11.1).

1 Instead, the LIBID (London Interbank Bid Rate) is the rate that these financial institutions are prepared to pay to borrow
money, on average, but it does not rely on a formal setting procedure such as that leading to the daily LIBOR. Naturally, the
LIBID is less than the LIBOR.
2 See the next chapter (Section 12.8.5) for extensive discussions regarding interest rate swaps.

Another example of an interest rate spread commonly used in empirical research is the
so-called TED spread, which is the difference between the LIBOR and the Treasury bill rate.
The TED spread captures flight to quality effects that typically occur during times of crisis,
when Treasuries are considered particularly valuable by investors (see the right panel of Figure
11.1). Due to this flight to quality reason, the TED spread might fail to isolate views about
developments in the interbank market.

FIGURE 11.1. Left panel: The 3m LIBOR referenced to USD, the 3m T-Bill rate and
the 10 year Treasury yield. Right panel: The TED spread (the difference between the 3m
LIBOR and the 3m T-Bill rate) and the LIBOR-OIS spread (the difference between the
3m LIBOR and the 3m OIS). The shaded areas mark recession periods identified by the
National Bureau of Economic Research.

On a historical note, the Federal Funds rate has been the object of much empirical research.
In an attempt to explain how the credit view contributes to growth more than Friedman's
monetary view, Bernanke and Blinder (1992) show that the Federal Funds rate makes the
predictive power of M1 growth insignificant, as we further review in Section 13.6 of Chapter
13. This finding initially spread enthusiasm about the ability of this rate to explain short-run
aggregate fluctuations. However, as surveyed for example by Stock and Watson (2003), the
explanatory power of the Federal Funds rate evaporates once we condition on the term spread,
a fact we comment on in Section 12.2.2 of the next chapter.

11.2.2 Mathematical definitions of interest rates
11.2.2.1 Simply compounded interest rates

A simply compounded interest rate at time \(t\), for the time interval \([t, T]\), is defined as the
solution \(L(t,T)\) to the following equation:
\[
P(t,T) = \frac{1}{1 + (T-t)\,L(t,T)} \tag{11.1}
\]
This definition is intuitive, and is the most widely used in market practice. For example,
LIBOR rates can be defined consistently with it, with \(P(t,T)\) being the initial investment
at \(t\) that delivers $1 at \(T\).
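As a small numerical companion to Eq. (11.1), the sketch below inverts the definition to recover the simply compounded rate from a zero price, and back. The function names and the 3-month price of 0.99 are hypothetical, chosen only for illustration.

```python
def simple_rate(price, tau):
    """Simply compounded rate L solving price = 1 / (1 + tau * L), as in Eq. (11.1).

    `tau` is the length T - t of the investment period, in years.
    """
    return (1.0 / price - 1.0) / tau


def zero_price_from_simple_rate(rate, tau):
    """Initial investment that delivers $1 after `tau` years at the simple rate `rate`."""
    return 1.0 / (1.0 + tau * rate)


# Hypothetical quote: $0.99 invested today delivers $1 in three months.
libor_3m = simple_rate(0.99, 0.25)
round_trip = zero_price_from_simple_rate(libor_3m, 0.25)
```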
11.2.2.2 Yield curves

The yield-to-maturity, or spot rate, for some maturity date \(T\) is the yield on the zero maturing
at \(T\), denoted as \(R(t,T)\). It is the solution to the following equation,
\[
P(t,T) = \frac{1}{(1 + R(t,T))^{T-t}} \tag{11.2}
\]
With semi-annual compounding, we have that \(P(t,T) = (1 + R(t,T)/2)^{-2(T-t)}\). In general,
if the interest is compounded \(m\) times in a year, at the annual rate \(R\), then investing for
\(T-t\) years gives \((1 + R/m)^{m(T-t)}\). Continuous compounding is obtained by letting
\(m \to \infty\) in the previous expression, leaving \(e^{R(T-t)}\). Therefore, the continuously
compounded spot rate is obtained as:
\[
R(t,T) = -\frac{\ln P(t,T)}{T-t}
\]
It is a sort of average rate for investing from time \(t\) to time \(T\). The function, \(T \mapsto R(t,T)\),
is called the yield curve, or the term structure of interest rates.
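The compounding conventions above can be compared numerically. The following is a minimal sketch, not a market-standard routine: the function name `spot_rate` and the example price are hypothetical, and the compounding frequency is passed through an illustrative parameter `m`, with `None` standing for continuous compounding.

```python
import math


def spot_rate(price, tau, m=None):
    """Spot rate implied by a zero price over a horizon of `tau` years.

    m = number of compounding periods per year; None means continuous compounding.
    Solves price = (1 + R/m) ** (-m * tau), or price = exp(-R * tau).
    """
    if m is None:
        return -math.log(price) / tau
    return m * (price ** (-1.0 / (m * tau)) - 1.0)


# Hypothetical 5-year zero quoted at 0.75 per $1 of principal.
price, tau = 0.75, 5.0
r_annual = spot_rate(price, tau, m=1)
r_semi = spot_rate(price, tau, m=2)
r_cont = spot_rate(price, tau)
r_many = spot_rate(price, tau, m=10**6)   # approaches the continuous rate
```

For a positive rate, the quoted number falls as the compounding frequency rises, since more frequent compounding needs a lower rate to reach the same terminal value.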
A related and widely used concept is the par yield curve. Let \(B(t,T)\) be the time \(t\) price of a
bond that pays off the principal of $1 at expiry \(T\), and a known sequence of constant coupons
\(c(t,T)\) at \(t+1, t+2, \ldots, T\), such that, in the absence of arbitrage and any other frictions,
\[
B(t,T) = P(t,T) + c(t,T) \sum_{i=1}^{T-t} P(t,t+i)
\]
Note that \(c(t,T)\) is fixed at time \(t\). A par bond is one that quotes at parity, \(B(t,T) = 100\%\).
The par yield curve is the sequence of coupon rates \(c(t,T)\), for \(T\) varying, that correspond to
the par:
\[
c(t,T) : B(t,T) = 1 \quad\Longleftrightarrow\quad c(t,T) = \frac{1 - P(t,T)}{\sum_{i=1}^{T-t} P(t,t+i)} \tag{11.3}
\]
In other words, the coupon rates \(c(t,T)\) have to adjust to make the market happy to have
the coupon bearing bond quote at par, \(B(t,T) = 1\). An interesting interpretation of the par yield
is the following. Rewrite Eq. (11.3) as follows: \(1 - P(t,T) = c(t,T) \sum_{i=1}^{T-t} P(t,t+i)\). The
right-hand side of this equation is the present value of the flow of known coupons, \(c(t,T)\),
receivable at the dates \(t+1, t+2, \ldots, T\). The left-hand side is the present value of the flow
of future, and unknown, LIBOR rates, \(L(t+i-1, t+i)\) for \(i = 1, 2, \ldots, T-t\), receivable at
the dates \(t+1, t+2, \ldots, T\),
\[
\sum_{i=1}^{T-t} \mathrm{Val}_t\big(L(t+i-1,t+i)\big)
= \sum_{i=1}^{T-t} \mathrm{Val}_t\Big(\frac{1}{P(t+i-1,t+i)} - 1\Big)
= \sum_{i=1}^{T-t} \big(P(t,t+i-1) - P(t,t+i)\big)
= 1 - P(t,T)
\]
where \(\mathrm{Val}_t(x)\) denotes the current price of receiving a random amount of \(x\) dollars at the
maturity date where these are due, and where we have used (i) the definition of the LIBOR
rate, as defined in Eq. (11.1) and, (ii) a no-arbitrage relation, which we shall show in Section
12.1 of Chapter 12, namely that the present value of \(1/P(t+i-1, t+i)\) paid at \(t+i\) is simply
\(P(t, t+i-1)\).
Therefore, we can interpret the par yield as the fixed rate in a swap contract that costs nothing
at origination, where one counterparty pays another counterparty the fixed rate against the
variable LIBOR: a spot swap rate. These swap contracts, of which Section 11.5.4.2 gives a
numerical example, are analyzed in detail in Section 12.7.5 of Chapter 12.
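The par yield and its swap interpretation can be checked numerically. The sketch below is only illustrative: the function names and the three annual discount factors are hypothetical, and coupons are assumed to be paid once a year, as in Eq. (11.3).

```python
def par_yield(discount_factors):
    """Par coupon c solving 1 = P_n + c * (P_1 + ... + P_n), as in Eq. (11.3).

    discount_factors[i - 1] is the price today of $1 paid in i years.
    """
    return (1.0 - discount_factors[-1]) / sum(discount_factors)


def coupon_bond_price(coupon, discount_factors):
    """Price of a bond paying `coupon` every year and $1 of principal at expiry."""
    return discount_factors[-1] + coupon * sum(discount_factors)


def floating_leg_value(discount_factors):
    """Present value of the stream of future one-year LIBOR payments.

    The LIBOR payment due in i years is worth P_{i-1} - P_i today, with P_0 = 1.
    """
    prices = [1.0] + list(discount_factors)
    return sum(prices[i - 1] - prices[i] for i in range(1, len(prices)))


# Hypothetical discount factors for 1, 2 and 3 years.
dfs = [0.97, 0.93, 0.89]
c_par = par_yield(dfs)
fixed_leg = c_par * sum(dfs)
floating_leg = floating_leg_value(dfs)
```

With these inputs, the fixed leg paying the par yield and the floating leg have the same present value, which is the sense in which the par yield is the fixed rate of a swap costing nothing at origination.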
11.2.2.3 Forward rates

In a forward rate agreement (FRA, henceforth), two counterparties agree that the interest
rate on a given principal (say $1) in a future time-interval \([T, S]\) will be fixed at some level
\(K\). The FRA works as follows: at time \(T\), the first counterparty receives $1 from the second
counterparty; at time \(S > T\), the first counterparty pays back $\([1 + (S-T)K]\) to the
second counterparty. The amount \(K\) is agreed upon at time \(t\). Therefore, the FRA makes it
possible to lock-in future interest rates. We consider simply compounded interest rates because
this is the standard market practice.
The amount \(K\) for which the current value of the FRA is zero is called the simply-compounded
forward rate as of time \(t\) for the time-interval \([T, S]\), and is usually denoted as \(F(t,T,S)\). We
can use absence of arbitrage to express \(F(t,T,S)\) in terms of bond prices, as follows:
\[
\frac{P(t,T)}{P(t,S)} = 1 + (S-T)\,F(t,T,S) \tag{11.4}
\]
Indeed, an investor in a zero from time \(t\) to time \(S\) is one who simultaneously makes (i) a
spot loan from \(t\) to \(T\), and (ii) a forward loan from \(T\) to \(S\). In the absence of arbitrage, it must
be the case that,
\[
\frac{1}{P(t,S)} = \underbrace{[1 + R(t,S)]^{S-t}}_{\text{zero loan}}
= \underbrace{[1 + R(t,T)]^{T-t}}_{\text{spot loan}} \cdot \underbrace{[1 + (S-T)\,F(t,T,S)]}_{\text{forward loan}}
\]
where \(R(t,T)\) is the spot rate at time \(t\) for maturity \(T\). Eq. (11.4) follows by the definition of
\(R(t,T)\) in Eq. (11.2).
Alternatively, consider the following portfolio implemented at time \(t\). Go long one bond
maturing at \(T\) and short \(P(t,T)/P(t,S)\) bonds maturing at \(S\), for the time period \([t, S]\). The
initial cost of this portfolio is zero because,
\[
-P(t,T) + \frac{P(t,T)}{P(t,S)}\,P(t,S) = 0
\]
At time \(T\), the portfolio yields $1, which originates from the bond purchased at time \(t\). At time
\(S\), we buy the \(P(t,T)/P(t,S)\) bonds shorted at \(t\), and maturing at the very same \(S\), which
obviously costs $\(P(t,T)/P(t,S)\). The portfolio, therefore, is acting as a FRA: it pays $1 at
time \(T\), and \(-\)$\(P(t,T)/P(t,S)\) at time \(S\). In addition, the portfolio costs nothing at time
\(t\). Therefore, the interest rate implicitly paid in the time-interval \([T, S]\) must be equal to the
forward rate \(F(t,T,S)\), as stated in Eq. (11.4).
11.2.3 Yields to maturity on coupon bearing bonds
Finally, the yield to maturity (YTM, henceforth) on a bond is simply its rate of return. It is
the discount rate equating the present value of the payoff stream to its market price,
\[
\hat{y} : \quad B(T) = \sum_{i=1}^{n} \frac{C_i}{(1+\hat{y})^{T_i}} + \frac{1}{(1+\hat{y})^{T}} \tag{11.5}
\]
where \(C_i\) is the coupon paid at date \(T_i\), with \(T_n = T\). Eq. (11.5) differs from the price formula
\(B(T) = \sum_{i=1}^{n} \frac{C_i}{(1+R(T_i))^{T_i}} + \frac{1}{(1+R(T))^{T}}\), by
utilizing the same discount rate to discount all the future payments. Clearly, spot rates coincide
with the YTM on a zero, i.e. \(\hat{y} = R(T)\).
Next, suppose that coupon payments are the same for each \(i\), \(C_i = C\) say, and the payment
dates are set regularly, \(T_i = i\). Eq. (11.5) then collapses to,
\[
B(T) = \frac{C}{\hat{y}}\left(1 - \frac{1}{(1+\hat{y})^{T}}\right) + \frac{1}{(1+\hat{y})^{T}} \tag{11.6}
\]
That is, the price of the coupon bearing bond, \(B(T)\), is a convex combination of the price of a
perpetuity, \(C/\hat{y}\), and par, where the weight on par is the price of a zero expiring at \(T\),
\((1+\hat{y})^{-T}\). For large maturities, the bond price gets closer to that of a perpetuity, whereas
for low maturities, the bond price is closer to that of a zero. If \(C > \hat{y}\), \(B(T) > 1\), and if
\(C < \hat{y}\), \(B(T) < 1\). In the special case where \(C = \hat{y}\),
the bond would quote at par.

This property is a special case of a more general characteristic of floating rate bonds. Floating
rate bonds pay coupons equal to the LIBOR, and would quote at par at their first reset date
(see Section 12.7.5 in the next chapter). Mathematically, \(B(T)\) can be understood as the price
of a floating rate bond in a market without uncertainty, where the same coupon \(C\) would
always be paid. If this coupon is the same as the interest rate we use to discount future cash
flows, our coupon bearing bond is, in fact, a floating rate bond, and would therefore need to
quote at par.
Finally, Appendix 1 provides standard material regarding how the yield curve can be
determined from a number of coupon-bearing bonds through bootstrapping, as well as
examples of how to implement trades to profit from temporary arbitrage opportunities arising
in connection with misaligned coupon-bearing bond prices.
11.2.4 Accruals, invoice, and clean prices on coupon bearing bonds
How do we cope with coupon bearing bonds that trade at any date after issuance? Naturally,
the price to pay at any date is still the present value of future coupons, although a few
distinctions need to be made, which arise due to the discreteness of coupon payments. By
absence of arbitrage, the price of a coupon bearing bond is,
\[
B(t,T) = P(t,T) + \sum_{i=1}^{N} \frac{C_i}{n}\,P(t,T_i)
\]

where \(C_i\) is a sequence of coupons paid off over some dates \(T_i\), \(i = 1, \ldots, N\), and \(n\) is the
frequency of coupon payments. For example \(n = 2\) for semiannual coupon payments, in which
case \(T_i - T_{i-1} = \frac{1}{2}\) and in general, \(T_i - T_{i-1} = \frac{1}{n}\). Finally, we define \(T_i\) as the first available
coupon payment date a bondholder would have access to after time \(t\), i.e. \(i : T_{i-1} < t \le T_i\).
Due to the discreteness of coupon payments, discontinuities arise, because at time \(T_i\), the
coupon is paid off, determining a discrete drop in the bond price. In other words, we have:
\[
B(t,T) =
\begin{cases}
P(t,T) + \dfrac{C_i}{n}\,P(t,T_i) + \displaystyle\sum_{j=i+1}^{N} \dfrac{C_j}{n}\,P(t,T_j), & \text{for } t \in (T_{i-1}, T_i) \\[2ex]
P(t,T) + \displaystyle\sum_{j=i+1}^{N} \dfrac{C_j}{n}\,P(t,T_j), & \text{for } t = T_i
\end{cases}
\tag{11.7}
\]
such that the discrete price drop at time \(T_i\) is equal to,
\[
\lim_{t \uparrow T_i} B(t,T) - B(T_i,T) = \frac{C_i}{n}
\]
It is market practice to avoid considering the effects relating to this drop in value while quoting
bond prices. Define the accruals as that idealized portion of the coupon payments that occurs
between \(T_{i-1}\) and \(T_i\),
\[
\mathrm{Accr}_t =
\begin{cases}
\dfrac{C_i}{n}\,\dfrac{t - T_{i-1}}{T_i - T_{i-1}}, & \text{for } t \in (T_{i-1}, T_i) \\[2ex]
0, & \text{for } t = T_i
\end{cases}
\]
Define a clean price as the difference between the dirty price \(B(t,T)\) and the accruals,
\[
B^{\mathrm{clean}}(t,T) \equiv B(t,T) - \mathrm{Accr}_t
= \left(\frac{C_i}{n}\,P(t,T_i) - \mathrm{Accr}_t\right) + P(t,T) + \sum_{j=i+1}^{N} \frac{C_j}{n}\,P(t,T_j) \tag{11.8}
\]
It is easy to see that the clean price, \(B^{\mathrm{clean}}(t,T)\), does not experience any jumps over the reset
dates: the accruals drop by \(C_i/n\), which is precisely the amount the dirty price drops by, such
that the first term in the equality in Eq. (11.8) is zero. Therefore, dirty and clean prices are
the same at each coupon payment.
The dirty price, \(B(t,T)\), is sometimes referred to as full price or invoice, reflecting the fact
that it is the price at which buyers and sellers have to settle a transaction. US traders typically
quote clean prices, because dirty prices mainly reflect mechanical adjustments relating to coupon
payments, whereas clean prices are more sensitive to changes in the general macroeconomic or
the fixed income markets outlook, as illustrated by a basic example below. However, from a
transaction perspective, the real object of interest is the dirty price: \(B(t,T)\) is the price to be
paid originating from the bond transaction.
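The relation between dirty price, accruals and clean price can be sketched numerically. The code below is an illustration under stated assumptions: a flat yield curve with a continuously compounded rate, linear accrual of the coupon between payment dates, and a principal of 100. The function name and its default parameters (an annualized coupon of 4 paid semiannually, a 7% rate, a 5 year maturity) are chosen for illustration.

```python
import math


def dirty_and_clean(t, coupon=4.0, freq=2, r=0.07, maturity=5.0, principal=100.0):
    """Return (dirty price, accruals, clean price) at time t in [0, maturity).

    Assumes a flat curve at the continuously compounded rate r, and that the
    coupon due exactly at t has already been paid (ex-coupon price).
    """
    n_payments = int(round(maturity * freq))
    dates = [(i + 1) / freq for i in range(n_payments)]
    future = [d for d in dates if d > t + 1e-12]       # coupons still to be received
    dirty = principal * math.exp(-r * (maturity - t))
    dirty += sum((coupon / freq) * math.exp(-r * (d - t)) for d in future)
    last_date = future[0] - 1.0 / freq                 # previous coupon date (or issuance)
    accrued = (coupon / freq) * (t - last_date) / (1.0 / freq)
    return dirty, accrued, dirty - accrued


# Just before and exactly at the first coupon date.
before = dirty_and_clean(0.5 - 1e-6)
at_coupon = dirty_and_clean(0.5)
drop_in_dirty_price = before[0] - at_coupon[0]
jump_in_clean_price = before[2] - at_coupon[2]
```

Evaluating the function on a fine grid of dates between 0 and 5 years traces out the saw-tooth path of the dirty price and the smooth path of the clean price.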
The following picture depicts the typical shark-tooth pattern of the dirty price in an overly
simplified world, where \(C = 4\), annualized, \(n = 2\), the yield curve is flat at the continuously
compounded rate \(r = 7\%\), and the maturity of the bond equals \(T = 5\) years.

[Figure: dirty price and clean price (vertical axis, prices between 86 and 102) against calendar time, in years.]

The sharp increase in the dirty price is mainly due to the term \(\frac{C}{2} P(t,T_i)\) in Eq. (11.7). This
term increases roughly linearly over the payment dates, and overwhelms the increase in value
of the second term of Eq. (11.7). The drop is precisely \(\frac{C}{2}\) at each payment date, where the dirty
price equals the clean price.
The pattern in the picture is of course not unique. For example, the clean price decreases
over time when the yield curve is flat at a continuously compounded \(r = 2\%\). When the level
of the yield curve is high, future coupons are heavily discounted, such that their value
is low when time to maturity is high. When, instead, the level of the yield curve is low, the
current value of future coupons is high, and the clean price converges to 100 from above. We can
determine a threshold value for \(r\), say \(r^*\), such that the clean price is approximately constant at
100, using Eq. (11.6). Because we assume the yield curve is flat at \(r\), continuously compounded,
and \(n = 2\), we need to find \(r^* : e^{r^*} = (1 + \frac{C}{2})^2\), with \(\frac{C}{2} = 2\%\), whence \(r^* = 3.96\%\).
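The threshold can be recomputed directly. In the sketch below, the function names are hypothetical; the coupon rate of 4% paid semiannually, the principal of 100 and the 5 year maturity are those of the example, and the price is evaluated at issuance, where dirty and clean prices coincide.

```python
import math


def par_threshold(coupon_rate, freq):
    """Continuously compounded rate r* solving exp(r*) = (1 + coupon_rate / freq) ** freq."""
    return freq * math.log(1.0 + coupon_rate / freq)


def flat_curve_price(r, coupon=4.0, freq=2, maturity=5, principal=100.0):
    """Price at issuance under a flat curve at the continuously compounded rate r."""
    n_payments = maturity * freq
    coupons = sum((coupon / freq) * math.exp(-r * (i / freq))
                  for i in range(1, n_payments + 1))
    return coupons + principal * math.exp(-r * maturity)


r_star = par_threshold(0.04, 2)
price_at_threshold = flat_curve_price(r_star)
```

At the threshold, each semiannual coupon of 2 is discounted at exactly 2% per period, so the bond quotes at par; rates below the threshold give prices above 100, and rates above it give prices below 100.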

11.3 Duration and convexity hedging and trading


The risk of a default-free bond is that future spot rates could change. In other words, the bond
required return (i.e., the YTM) can obviously change in the future. Consider the definition of
the YTM in Eq. (11.5), and introduce the following function,
\[
B(y;T) \equiv \sum_{i=1}^{n} \frac{C_i}{(1+y)^{T_i}} + \frac{1}{(1+y)^{T}} \tag{11.9}
\]
This function mimics how the market price \(B(T)\) would behave after the initial YTM \(\hat{y}\)
changed to some value \(y\) and, naturally, is such that \(B(\hat{y};T) = B(T)\). The function \(B(y;T)\)
is thus the simplest bond pricing model we could ever formulate. It is perhaps too simple, as it
does not rely on absence of arbitrage. Nevertheless, we can use this very preliminary model to
say something about interest rate risk.
We define a measure of risk of the bond based on the sensitivity of the hypothetical bond
price \(B(y;T)\) with respect to changes in \(y\). We answer the following question: What happens
to the hypothetical bond price \(B(y;T)\), once we perturb the one rate that discounts all the
payoffs? The sensitivity we are dealing with is simply the first partial of the bond-pricing
formula in Eq. (11.9) with respect to \(y\),
\[
B_y(y;T) = -\frac{1}{1+y}\left[\sum_{i=1}^{n} T_i\,\frac{C_i}{(1+y)^{T_i}} + T\,\frac{1}{(1+y)^{T}}\right]
\]
where the subscript denotes a partial derivative, i.e. \(B_y(y;T) = \frac{\partial}{\partial y} B(y;T)\). The sensitivity,
\(B_y(y;T)\), is the slope of the tangent to the price-yield relation, which Figure 11.2 illustrates in the
case of a zero-coupon bond with time to maturity equal to 10 years.

[Figure: bond price (vertical axis, from 0.4 to 1.0) against the YTM (horizontal axis, from 0.00 to 0.10).]

FIGURE 11.2. The relation between the YTM and the bond price, and its first-order
(duration) and second-order (convexity) approximations. The solid line depicts the price
of a zero coupon bond expiring in 10 years, as a function of the YTM, \((1 + \mathrm{YTM})^{-10}\),
and the two dashed lines are first-order and second-order Taylor expansions around
YTM = 5%.

11.3.1 Duration
We define the Macaulay duration as,
\[
D_{\mathrm{Mac}} \equiv -\frac{B_y(y;T)}{B(y;T)}\,(1+y) = \sum_{i=1}^{n} \omega_i T_i + \omega_T T
\]
where
\[
\omega_i \equiv \frac{C_i / (1+y)^{T_i}}{B(y;T)}, \qquad \omega_T \equiv \frac{1 / (1+y)^{T}}{B(y;T)}
\]
In words, the Macaulay duration is a weighted average of the payment dates. The weights
are the discounted coupons at the various payment dates, \(C_i / (1+y)^{T_i}\), related to the current
market value of these coupons, i.e. the bond price \(B(y;T)\) when the YTM is \(y\). That is, the
weights are the proportions of the bond's present value that are attributable to the payoff at
date \(T_i\). The weights satisfy \(\sum_{i=1}^{n} \omega_i + \omega_T = 1\). Therefore, \(D_{\mathrm{Mac}} \le T\). The Macaulay duration is
a measure of how far in the future the bond pays off. For zeros, \(D_{\mathrm{Mac}} = T\).
For small \(y\), \(D_{\mathrm{Mac}}(y)\) is simply the semi-elasticity of the bond price with respect to the YTM.
This semi-elasticity is also referred to as modified duration:
\[
D \equiv \frac{D_{\mathrm{Mac}}}{1+y} = -\frac{B_y(y;T)}{B(y;T)}
\]
A simple computation reveals that the modified duration, \(D\), satisfies:
\[
\frac{\partial D}{\partial y} = -\frac{B_{yy}(y;T)}{B(y;T)} + D^2
\]
Therefore, the modified duration is decreasing in the YTM when the bond price is sufficiently
convex in the YTM, which is surely the case for long-term maturity dates. Interestingly, the
modified duration is increasing in the YTM when the bond price is concave in the YTM,
a property featured by callable bonds and mortgage-backed securities (MBS, henceforth), as
explained in the next chapters (see also, Section 11.8.1 of this chapter, for a basic numerical
example). Intuitively, the incentives to proceed to early repayments kick in as the YTM
decreases, which makes the duration of the MBS decrease.
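The two duration measures can be computed directly from the weights, and the modified duration can be checked against a finite-difference estimate of the price sensitivity. The sketch below is illustrative only: the function names and the 10-year, 5% coupon bond are hypothetical.

```python
def price(y, coupons, dates, principal=1.0):
    """Hypothetical bond price when all cash flows are discounted at y, as in Eq. (11.9)."""
    return (sum(c / (1.0 + y) ** t for c, t in zip(coupons, dates))
            + principal / (1.0 + y) ** dates[-1])


def macaulay_duration(y, coupons, dates, principal=1.0):
    """Weighted average of the payment dates, with present-value weights."""
    flows = list(zip(coupons, dates)) + [(principal, dates[-1])]
    pvs = [(cf / (1.0 + y) ** t, t) for cf, t in flows]
    total = sum(pv for pv, _ in pvs)
    return sum(pv * t for pv, t in pvs) / total


def modified_duration(y, coupons, dates, principal=1.0):
    """Semi-elasticity of the price with respect to y: Macaulay duration over (1 + y)."""
    return macaulay_duration(y, coupons, dates, principal) / (1.0 + y)


# Hypothetical bond: 5% annual coupon, 10 years, YTM of 5%.
coupons, dates, y0 = [0.05] * 10, list(range(1, 11)), 0.05
d_mac = macaulay_duration(y0, coupons, dates)
d_mod = modified_duration(y0, coupons, dates)

# Central finite difference for -B_y / B, as an independent check.
h = 1e-6
d_numeric = -(price(y0 + h, coupons, dates) - price(y0 - h, coupons, dates)) \
    / (2 * h) / price(y0, coupons, dates)
```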
The Macaulay duration for continuously compounded rates is even simpler to calculate. First,
define the continuously compounded YTM as the single number \(\hat{r}\) such that
\[
B(\hat{r};T) = \sum_{i=1}^{n} C_i e^{-\hat{r} T_i} + e^{-\hat{r} T}
\]
where \(B(\hat{r};T)\) is the market price of a bond paying off the principal of one at maturity and
the stream of payoffs \(C_i\). Next, consider the function \(r \mapsto B(r;T)\). Define the semi-elasticity
of the bond price \(B(r;T)\) with respect to the continuously compounded YTM \(r\),
\[
-\frac{B_r(r;T)}{B(r;T)} = \sum_{i=1}^{n} \omega_i T_i + \omega_T T
\]
where \(B_r(r;T) = \frac{\partial}{\partial r} B(r;T)\), \(\omega_i = \frac{C_i e^{-r T_i}}{B(r;T)}\) and \(\omega_T = \frac{e^{-r T}}{B(r;T)}\). Note, the weights are such that
\(\sum_{i=1}^{n} \omega_i + \omega_T = 1\). Therefore, the Macaulay duration for continuously compounded rates
is equal to the semi-elasticity of the bond price with respect to the continuously compounded
YTM \(r\).³ This result may simplify some calculations.

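3 Mathematically, we could have obtained this result in a straightforward manner, as follows. Define the bond price function as
\(B(y(r))\), where by definition, \(y(r) = e^{r} - 1\). Hence, \(\frac{\partial}{\partial r} B(y(r)) = B_y(y(r))\,y'(r) = B_y(y(r))\,e^{r} = B_y(y(r))\,(1+y)\). It follows
that \(D_{\mathrm{Mac}} = -\frac{B_y(y)}{B(y)}\,(1+y) = -\frac{\partial B(y(r)) / \partial r}{B(y(r))}\).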

11.3.2 Convexity
Convexity measures how the sensitivity, \(B_y\), changes with \(y\). It is the second partial of the bond
price with respect to \(y\), \(B_{yy}\). Positive convexity means that the interest rate sensitivity declines
as \(y\) increases, as in Figure 11.2. This property arises because \(\frac{\partial}{\partial y}(-B_y) = -B_{yy} < 0\). Formally,
convexity is defined as,
\[
\mathcal{C} \equiv \frac{B_{yy}(y;T)}{B(y;T)}
\]
such that we can expand the bond price as follows:
\[
\frac{\Delta B}{B} \approx -D\,\Delta y + \frac{1}{2}\,\mathcal{C}\,(\Delta y)^2 \tag{11.10}
\]
That is, for very convex securities, duration may not be a safe measure of return, as Figure
11.2 illustrates.
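The quality of the two approximations can be checked on the example of Figure 11.2, a zero expiring in 10 years with expansions taken around a YTM of 5%. The sketch below uses the closed forms of duration and convexity that hold for a zero; the function names are hypothetical.

```python
def zero_price(y, T=10):
    """Price of a zero paying 1 in T years when the YTM is y."""
    return (1.0 + y) ** (-T)


def taylor_price(y, y0=0.05, T=10, order=2):
    """First- or second-order expansion of the zero price around y0, as in Eq. (11.10)."""
    p0 = zero_price(y0, T)
    duration = T / (1.0 + y0)                     # modified duration of a zero
    convexity = T * (T + 1) / (1.0 + y0) ** 2     # B_yy / B for a zero
    dy = y - y0
    relative_change = -duration * dy
    if order == 2:
        relative_change += 0.5 * convexity * dy ** 2
    return p0 * (1.0 + relative_change)


# Compare the exact price with both approximations after a rise of the YTM to 8%.
exact = zero_price(0.08)
first_order = taylor_price(0.08, order=1)
second_order = taylor_price(0.08, order=2)
```

For this positively convex security, the duration-only approximation understates the price after a yield change in either direction, and adding the convexity term removes most of the error.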

11.3.3 Asset-liability management


11.3.3.1 Introductory issues

We can use duration to assess how exposed a bond portfolio is to movements in
interest rates and, then, immunize this portfolio against interest rate changes. Duration is relevant to
asset-liability management. For example, pension funds have known streams of liabilities that
must be matched by the assets they hold. In words, the duration of the assets must equal
the duration of the liabilities. For example, in the UK, pension funds must mark-to-market
the liabilities. Therefore, one objective of these funds is to immunize their liabilities against
movements in interest rates.
Alternatively, consider the following basic example. A bank borrows $100 at 2% for a year
and lends this money at 4% for 5 years, where the higher rate compensates for a variety of
factors such as business risk or the bank's market power. Assuming that the bank's borrower
does not default, the bank generates profits equal to \(\$(4\% - 2\%) \times 100 = \$2\) in the first year,
according to its books. However, the appropriate assessment of the current situation should not
make reference to past market conditions but, obviously, to current ones. Suppose, for example, that
after one year, the interest rate for borrowing increases from 2% to 5%, and remains such for 4
additional years. This assumption is unrealistic, but it gives the idea of where the action is. The
market value of the assets is, then, \(100 \times \frac{1.04^5}{1.05^4} = 100.09\). Note, we discount at the 5% rate, as this is
the cost of capital for the bank.⁴ The market value of the liabilities is, instead, \(100 \times 1.02 = 102\).
The bank's problem is a duration mismatch.
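The numbers of this example can be reproduced in a few lines. The variable names below are hypothetical; the rates and horizons are those of the example.

```python
# Loan of 100 at 4% for 5 years, valued after one year with 4 years left at 5%.
loan_payoff = 100 * 1.04 ** 5                 # received at the end of year 5
assets = loan_payoff / 1.05 ** 4              # market value at the end of year 1

# Funding of 100 at 2% for one year, due at the end of year 1.
liabilities = 100 * 1.02

# Mark-to-market position, as opposed to the book profit of 2.
net_position = assets - liabilities
book_profit = (0.04 - 0.02) * 100
```

The mark-to-market position is negative although the books show a profit, which is the symptom of the duration mismatch: the long-dated asset lost value when rates rose, while the short-dated liability did not.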
Let us return to the pension fund example, and consider the following extreme case. In 30
years from now, a pension fund is due to deliver $100,000 to some future retiree. Suppose the
current market situation is such that the yield curve is flat at 4%, such that the market value
of this liability is \(\$100{,}000 \times (1.04)^{-30} = \$30{,}832\). Accordingly, the would-be retiree invests
$30,832 in the pension fund. So we have the following situation:

    Cash      $30,832  |  Pensions  $30,832

4 Suppose, for example, that the bank wants to borrow $102 to pay off its liabilities, and for 4 additional years, then the profit
at time 1 is \(\frac{100 (1.04)^5 - 102 (1.05)^4}{(1.05)^4} = -1.9057\). Alternatively, the 5% interest rate is just an opportunity cost of capital, defined as
max {borrowing cost, lending rate}, where the borrowing cost is that the bank might obtain from other banks, for example.

Suppose, now, that the pension fund does not invest this cash. This strategy is of course
inefficient, but it is precisely the point of this exercise to see why it is so.
Consider two extreme cases, occurring under two scenarios underlying developments in the
xed income market. In one week,
(i) Scenario : the yield curve shifts up parallely to 5%. Accordingly, the value of the liability
for the pension fund becomes: $100 000 (1 05) 30 = 23 138.
Cash
$30 832

Prot
$7 694
Pensions
$23 138

(ii) Scenario : the yield curve shifts down parallely to 3%. Accordingly, the value of the
liability for the pension fund is: $100 000 (1 03) 30 = 41 199.
Cash
$30 832

Loss
$10 367
Pensions
$41 199

A drop in the yield curve results in a loss for the pension fund: when interest rates go down,
the pension fund faces a challenging situation as it has to honour its obligations in 30 years, but
the financial market yields less than it promised one week earlier. Naturally, the pension fund
would face the opposite situation were interest rates to go up, as in the first scenario above.
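The two scenarios can be reproduced with a few lines. This is a minimal sketch; the function and variable names are illustrative, and the fund is assumed to hold the cash uninvested, as in the text.

```python
def liability_value(face, yield_rate, years):
    """Present value of a single payment due in `years`, discretely compounded."""
    return face / (1.0 + yield_rate) ** years

FACE, YEARS = 100_000.0, 30

# Cash received from the would-be retiree when the curve is flat at 4%.
cash = liability_value(FACE, 0.04, YEARS)

# Profit (+) or loss (-) when the curve moves in parallel and cash is not invested.
pnl = {}
for new_yield in (0.05, 0.03):
    pnl[new_yield] = cash - liability_value(FACE, new_yield, YEARS)
    print(f"yield {new_yield:.0%}: liability {cash - pnl[new_yield]:,.0f}, P&L {pnl[new_yield]:,.0f}")
```

The loss after a 1 percentage point fall in yields is larger in size than the gain after a 1 percentage point rise, which already hints at the convexity of the liability.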
There are many good ethical reasons why we might dislike pension funds having to experience
interest rate volatility. The volatility in this very basic example stems from the simple fact that
the pension fund receives $30,832, which it then puts under the pillow. The most efficient
way to erase volatility would have been to invest $30,832 in a 30 year bond at the market
conditions of 4%. This is perfect hedging, which relies on the assumption we have access to
such a long-term bond. How do we deal with situations in which we do not have access to such
a bond? The next sections illustrate these cases.
11.3.3.2 Hedging

Let us consider a portfolio of two bonds with different durations. Its value is given by,
\[ V = \theta_1 P_1(y_1) + \theta_2 P_2(y_2), \]
where \(P_1(y_1)\) and \(P_2(y_2)\) are the market values of the bonds, \(y_1\) and \(y_2\) are the YTM on the
bonds and, finally, \(\theta_1\) and \(\theta_2\) are the quantities of bonds in the portfolio. Let us consider a small
change in the two YTM, \(\Delta y_1\) and \(\Delta y_2\). We have,
\[ \Delta V = -\left[\theta_1 D(y_1) P_1(y_1) \Delta y_1 + \theta_2 D(y_2) P_2(y_2) \Delta y_2\right]. \]


The question is: How should we choose \(\theta_1\) and \(\theta_2\) such that the value of the portfolio remains
the same, even after a change in \(y_1\) and \(y_2\)?
Let us assume a parallel shift in the term structure of interest rates. In this case, \(\Delta y_1 = \Delta y_2\).
The portfolio is said to be immunized if its value does not change when \(y_1\) and \(y_2\) change,
i.e. \(\Delta V = 0\), which is true when,
\[ \theta_1 = -\frac{D(y_2)\, P_2(y_2)}{D(y_1)\, P_1(y_1)}\, \theta_2. \tag{11.11} \]

A useful interpretation of this portfolio is that we may be holding a bond with some duration,
say we hold \(\theta_2\) units of the second bond. Given these holdings, we may wish to sell another
bond, possibly with a lower duration, to hedge against movements in the price of the bond we
hold.
Alternatively, we can think of the second asset as a liability, with a value that fluctuates after
interest rates change. Then, we may wish to purchase some asset to hedge against the liability.
Mathematically, \(\theta_2 < 0\) and \(\theta_1 > 0\). Moreover, Eq. (11.11) reveals that the number of assets
to hold to hedge against the liability is high if the ratio of the two durations,
\(D(y_2)/D(y_1)\), is large. In this case, the hedging position is obviously inefficient. Asset-liability
management, and immunization, is costly when we hedge high-duration liabilities with low-duration
assets. We now illustrate these claims through a few basic examples.
11.3.3.3 A first example: hedging zeros with zeros

Suppose that we hold one bond, a zero with maturity equal to 5 years. We want to hedge
this risk through another bond, a zero with maturity equal to 1 year. Let us assume that the
term-structure is flat at 5%, discretely compounded. Then,
\[ P_1(y_1) = \frac{1}{1+y_1} = \frac{1}{1+0.05} = 0.95238, \qquad D(y_1) = \frac{D_{Mac}(y_1)}{1+y_1} = \frac{1}{1+0.05} = 0.95238, \]
\[ P_2(y_2) = \frac{1}{(1+y_2)^5} = \frac{1}{(1+0.05)^5} = 0.78353, \qquad D(y_2) = \frac{D_{Mac}(y_2)}{1+y_2} = \frac{5}{1+0.05} = 4.7619, \]
and:
\[ \theta_1 = -\frac{D(y_2)\, P_2(y_2)}{D(y_1)\, P_1(y_1)}\, \theta_2 = -\frac{4.7619 \times 0.78353}{0.95238 \times 0.95238} \times 1 = -4.1135. \]
That is, to hedge the 5Y zero, we need to short-sell approximately four 1Y zeros. The balance
of this hedging position is,
\[ \theta_1 P_1(y_1) + P_2(y_2) = (-4.1135) \times 0.95238 + 0.78353 = -3.1341. \tag{11.12} \]

This is quite an inefficient hedging position. One reason it is inefficient is that hedging long-term
bonds with short-term bonds implies we should rebalance too often. Moreover, as time goes on, the
sensitivity of the short-term bonds to changes in the YTM is very small (eventually, the price
equals face value plus coupon, at maturity), compared to that of long-term bonds. Therefore,
rebalancing becomes increasingly severe as time unfolds.
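The hedge ratio and the balance of the position can be reproduced as follows. This is a minimal sketch of Eq. (11.11) applied to zeros; the function names are illustrative, and one unit of the 5Y zero is assumed to be held.

```python
def zero_price(y, years):
    """Price of a zero paying 1 at maturity, discretely compounded."""
    return 1.0 / (1.0 + y) ** years

def zero_modified_duration(y, years):
    """Modified duration of a zero: Macaulay duration (= maturity) over (1 + y)."""
    return years / (1.0 + y)

def hedge_quantity(d_hedge, p_hedge, d_held, p_held, units_held=1.0):
    """Eq. (11.11): units of the hedging bond that offset a small parallel shift."""
    return -units_held * d_held * p_held / (d_hedge * p_hedge)

y = 0.05  # flat term structure
p1, d1 = zero_price(y, 1), zero_modified_duration(y, 1)
p2, d2 = zero_price(y, 5), zero_modified_duration(y, 5)

theta1 = hedge_quantity(d1, p1, d2, p2)   # negative: a short position in 1Y zeros
balance = theta1 * p1 + 1.0 * p2          # value of the hedged position

print(f"theta1 = {theta1:.4f}, balance = {balance:.4f}")
```

A short position of about four 1Y zeros per 5Y zero is what makes the hedge cumbersome: the lower the duration of the hedging instrument, the larger the position required.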
Next, we study how the value of this portfolio changes after large changes in the YTM.
By the assumption that the initial term-structure is flat at 5%, \(y_1 = y_2 = 5\%\). Moreover, by
rearranging Eq. (11.12),
\[ P_2(y = 5\%) = 4.1135 \times P_1(y = 5\%) - 3.1341. \tag{11.13} \]


The left hand side of Eq. (11.13) is the price of the 5Y bond. The right hand side is the value
of a replicating portfolio, which consists of (i) approximately 4 units of the 1Y bond, and (ii)
the balance of the hedging position. Precisely, the right hand side is a net obligation: the value
of the assets we need to purchase back (approximately 4 units of the 1Y bond), net of some cash
we already have ($3.1341), which we can use to partially purchase these assets. In other words,
we can interpret this position as a trade, where we buy the 5Y bond and sell approximately
four 1Y bonds at some initial time \(t_0\). Come some time \(t_1 > t_0\), we liquidate, by reversing the
position.
If interest rates do not change, then, approximately, and abstracting from passage of time,
there will be no profits or losses, once we liquidate, or mark-to-market, this position. If interest
rates change, \(y \neq 5\%\), Eq. (11.13) can only approximately hold,
\[ P_2(y) \approx 4.1135 \times P_1(y) - 3.1341. \]

Figure 11.3 plots the left hand side and the right hand side of this relation.

[Figure 11.3: bond values (vertical axis, from 0.6 to 1.0) against the YTM (horizontal axis, from 0.00 to 0.10).]

FIGURE 11.3. Dashed line (top): The price of the 5Y zero, \(P_2(y) = 1/(1+y)^5\), where \(y\) is
the YTM. Solid line (bottom): The value of the replicating portfolio consisting of (i)
4.1135 units of the 1Y zero, and (ii) the balance of the hedging position, which is equal
to \(-\$3.1341\), i.e. \(4.1135 \times P_1(y) - 3.1341\), where \(P_1(y) = 1/(1+y)\) is the 1Y zero price.

What is going on? We are hedging the 5Y zero by selling approximately four 1Y zeros. In a
neighborhood of \(y = 5\%\), the value of the synthetic 5Y zero we sold, \(4.1135 \times P_1(y) - 3.1341\),
behaves as \(P_2(y)\). However, the 5Y zero displays more convexity than the synthetic bond.
This larger convexity implies that:
(i) If interest rates go down, the price of the 5Y zero bond we hold increases more than the
value of the synthetic bond we sold.
(ii) If interest rates go up, the price of the 5Y zero bond we hold decreases less than the value
of the synthetic bond we sold.

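In both cases, we make profits. Profits are, indeed, equal to
\[ \left(P_2^+ - |\theta_1| P_1^+\right) - \left(P_2 - |\theta_1| P_1\right) = \Delta P_2 - |\theta_1| \Delta P_1, \]
where \(P_1\) and \(P_2\) are the prices of the 1Y and 5Y bonds, \(P_i^+\) is the price after the change in
interest rates, \(\Delta P_i = P_i^+ - P_i\), and \(|\theta_1| = 4.1135\) is the number of 1Y zeros sold. But by convexity,
\(P_2\) increases more than \(|\theta_1| P_1\) when interest rates go down, and \(P_2\) decreases less than \(|\theta_1| P_1\) when
interest rates go up, or \(\Delta P_2 - |\theta_1| \Delta P_1 > 0\).
Note that this is not an arbitrage opportunity! The previous reasoning hinges on the assumption
of a parallel shift in the term-structure of interest rates, that is \(\Delta y_1 = \Delta y_2\), where \(y_1\) = spot
rate for 1 year, and \(y_2\) = spot rate for 5 years. While parallel shifts in the term-structure seem
empirically relevant, they are not the only shifts that are likely to occur, as explained in the
next chapter.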
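The convexity gain can be made concrete by revaluing both sides of the trade for a range of parallel shifts. The sketch below uses the rounded figures of the example (4.1135 units and a balance of 3.1341); the function names are illustrative, and passage of time is ignored, as in the text.

```python
UNITS_1Y = 4.1135   # 1Y zeros sold, from the example
BALANCE = 3.1341    # cash balance of the hedging position, from Eq. (11.12)

def bullet(y):
    """Price of the 5Y zero at YTM y."""
    return 1.0 / (1.0 + y) ** 5

def synthetic(y):
    """Value of the replicating portfolio: 1Y zeros net of the cash balance."""
    return UNITS_1Y / (1.0 + y) - BALANCE

# Profit of being long the 5Y zero and short the synthetic bond after a parallel shift.
gains = {y: bullet(y) - synthetic(y) for y in (0.01, 0.03, 0.05, 0.07, 0.09)}
for y, gain in gains.items():
    print(f"YTM {y:.0%}: 5Y zero {bullet(y):.4f}, synthetic {synthetic(y):.4f}, gain {gain:+.4f}")
```

The gain is approximately zero at 5% and positive on both sides, which is the pattern described for Figure 11.3. It relies entirely on the shift being parallel.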
To sum up, duration hedging is a useful tool, but with quite important limitations. As
Eq. (11.10) makes clear, duration is only a first-order approximation to the price of a bond.
Moreover, duration hedging obviously requires rebalancing, which might be substantial. As
we know, a conventional bond is strictly convex in the YTM. Therefore, for large changes in
the YTM, the duration-based hedging ratios should be updated. Re-adjustments are in order
anyway, independently of whether the YTM changes or not, as the duration of conventional fixed
income securities obviously decreases over time.

11.3.3.4 Duration trading: Barbell and bullet hedges

As a second example of duration hedging, consider barbell trading, which is a way to
hedge some liability (a bullet) with duration \(D_2\) through two assets with durations \(D_1\) and
\(D_3\), where \(D_1 < D_2 < D_3\), that is, a trade where we sell bond 2 and buy bonds 1 and 3. This trade is
expected to work as soon as the yield curve flattens, with its short end not rising too
much. Moreover, investing in the short-term segment of the yield curve allows one to invest
elsewhere relatively rapidly once the first asset expires, were the bond market to go down.
[Mention differences with flatteners and steepeners]
To illustrate a barbell trade, consider the example in the previous section, and suppose that
another bond is available for trading, a zero with maturity equal to 10 years. We aim to hedge
against movements in the price of the 5Y zero with a portfolio consisting of (i) the 1Y zero and
(ii) the 10Y zero. We keep on assuming that the yield curve is flat at 5%, and only consider
parallel shifts in the term-structure of interest rates. We consider extensions below.
Such a butterfly trade can be implemented as follows. We look for a portfolio of the 1Y and
10Y zero with the following properties: (i) the market value of the portfolio equals the market
price of the 5Y zero,
\[ P_2(y_2) = \theta_1 P_1(y_1) + \theta_3 P_3(y_3); \tag{11.14} \]
and (ii) the local risk of the portfolio equals the local risk of the 5Y zero,
\(D(y_2) P_2(y_2)\), i.e.:
\[ D(y_2) P_2(y_2) = D(y_1) P_1(y_1)\, \theta_1 + D(y_3) P_3(y_3)\, \theta_3. \tag{11.15} \]
The solution to Eqs. (11.14) and (11.15) is given by,
\[ \theta_1 = \frac{D(y_3) - D(y_2)}{D(y_3) - D(y_1)}\, \frac{P_2(y_2)}{P_1(y_1)}, \qquad \theta_3 = \frac{D(y_2) - D(y_1)}{D(y_3) - D(y_1)}\, \frac{P_2(y_2)}{P_3(y_3)}. \tag{11.16} \]
By the same calculation in the example of the previous section, we have that \(P_3(y_3) = 0.61391\)
and \(D(y_3) = 9.5238\). Using the figures in the previous example, we calculate \(\theta_1\) and \(\theta_3\) in Eqs.
(11.16),
\[ \theta_1 = \frac{9.5238 - 4.7619}{9.5238 - 0.95238} \times \frac{0.78353}{0.95238} = 0.45706, \qquad \theta_3 = \frac{4.7619 - 0.95238}{9.5238 - 0.95238} \times \frac{0.78353}{0.61391} = 0.56724. \]
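The barbell weights solve a two-equation linear system (value matching and duration matching). The following sketch computes them for the flat 5% curve and checks both conditions; the function names are illustrative and not from the text.

```python
def zero_price(y, years):
    """Price of a zero paying 1 at maturity, discretely compounded."""
    return 1.0 / (1.0 + y) ** years

def zero_modified_duration(y, years):
    """Modified duration of a zero: maturity over (1 + y)."""
    return years / (1.0 + y)

def barbell_weights(p1, d1, p2, d2, p3, d3):
    """Solve value matching and duration matching for the two barbell legs."""
    theta1 = (d3 - d2) / (d3 - d1) * p2 / p1
    theta3 = (d2 - d1) / (d3 - d1) * p2 / p3
    return theta1, theta3

y = 0.05
(p1, d1), (p2, d2), (p3, d3) = [
    (zero_price(y, n), zero_modified_duration(y, n)) for n in (1, 5, 10)
]
theta1, theta3 = barbell_weights(p1, d1, p2, d2, p3, d3)

value_gap = theta1 * p1 + theta3 * p3 - p2                    # should be zero
risk_gap = theta1 * d1 * p1 + theta3 * d3 * p3 - d2 * p2      # should be zero

print(f"theta1 = {theta1:.5f}, theta3 = {theta3:.5f}")
print(f"value gap = {value_gap:.2e}, risk gap = {risk_gap:.2e}")
```

Both gaps vanish up to floating-point error, so the barbell has the same market value and the same local interest rate risk as the bullet.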


Figure 11.4 depicts the behavior of the bullet price and the market value of the barbell as
we change the YTM. Note that the barbell portfolio is more convex than the bullet. Moreover,
the barbell trade is self-financed. By construction, the value of the bullet we sell equals the
value of the barbell portfolio. Therefore, large movements in the YTM lead to profits under the
assumption of parallel shifts in the yield curve.
Note that in this example, as in that of the previous section, the direction of interest rate
movements does not matter for value creation. A convexity trade such as this might be the
basis of a standard non-directional strategy, resembling one where, say, we go long a number
of undervalued stocks and short a number of overvalued stocks such that the initial value
of the portfolio is zero. Then, we likely make profits: in good times, the undervalued stocks
should increase in value more than the overvalued, and in bad times, the drop in value of the
undervalued stocks should be less severe than that of the overvalued. For the barbell, the driver of
value is convexity: as Eq. (11.10) illustrates, the convexity term, C, is, trivially, always positive,
independently of the sign of \(\Delta y\). Therefore, as soon as we hedge a bond with a portfolio that
has the same duration as the given bond, but higher convexity, the position leads to profits,
given the assumptions made so far.
Naturally, a barbell trade does not lead to an arbitrage. The P&L summarized by Figure
11.4 relies on the assumption of parallel shifts in the yield curve. However, and as explained in
the next chapter (Section 12.3), it is not realistic to assume large and parallel movements
in the yield curve. Historically, large shifts, occurring over long horizons, are accompanied by
changes in the yield curve shape. In other words, factors affecting parallel movements in the
yield curve are frequent, albeit not the only ones. At least three factors are needed to explain
the entire variation of the yield curve.

[Figure 11.4: values in $ (vertical axis, from 0.7 to 1.0) against the YTM (horizontal axis, from 0.00 to 0.10).]

FIGURE 11.4. Barbell trading. Dashed line (bottom): The price of the 5Y zero, \(P_2(y) = 1/(1+y)^5\),
where \(y\) is the YTM. Solid line (top): The value of the barbell portfolio consisting
of (i) 0.45706 units of the 1Y zero and (ii) 0.56724 units of the 10Y zero, i.e.
\(0.45706 \times P_1(y) + 0.56724 \times P_3(y)\), where \(P_1(y) = 1/(1+y)\) is the 1Y zero price and
\(P_3(y) = 1/(1+y)^{10}\) is the 10Y zero price.


Table 11.4 considers the case of non-parallel shifts in the term-structure. We assume that the
initial term-structure is not flat. Then, we consider two scenarios: (i) A twist in the term-structure,
i.e. long-term rates lower than short-term; (ii) a steepening of the term-structure.

TABLE 11.4.

                                 YTM         Price               Mod. dur.        Barbell value =
                                                                                  theta_1 P_1(y_1) + theta_3 P_3(y_3)
Initial term-structure
  1Y                             y_1 = 4%    P_1(y_1) = 0.961    D(y_1) = 0.961
  5Y (bullet)                    y_2 = 5%    P_2(y_2) = 0.783    D(y_2) = 4.762   0.783
  10Y                            y_3 = 6%    P_3(y_3) = 0.558    D(y_3) = 9.434

Twist
  1Y                             y_1 = 6%    P_1(y_1) = 0.943    D(y_1) = 0.943
  5Y (bullet)                    y_2 = 5%    P_2(y_2) = 0.783    D(y_2) = 4.762   0.847
  10Y                            y_3 = 4%    P_3(y_3) = 0.675    D(y_3) = 9.615

Steepening
  1Y                             y_1 = 4%    P_1(y_1) = 0.961    D(y_1) = 0.961
  5Y (bullet)                    y_2 = 5%    P_2(y_2) = 0.783    D(y_2) = 4.762   0.751
  10Y                            y_3 = 7%    P_3(y_3) = 0.508    D(y_3) = 9.346

We use the portfolio in Eq. (11.16). Indeed, note that Eq. (11.16) does not rely on the
assumption that the yield curve is flat. It only relies on the assumption that subsequent movements
in the yield curve are parallel. The exercise we are about to perform aims to explore the
implications of a barbell trade once the assumption of parallel movements is incorrect.
Using Eq. (11.16), we find that in correspondence of the initial term-structure (\(y_1 = 4\%\), \(y_2 = 5\%\),
\(y_3 = 6\%\)), \(\theta_1 = 0.449\) and \(\theta_3 = 0.629\). We keep this portfolio fixed, and determine the
barbell value, \(\theta_1 P_1(y_1) + \theta_3 P_3(y_3)\), occurring at the two scenarios, twist and steepening,
with \(P_2(y_2) = 0.783\) in all cases. The trade is as follows: at time zero, we sell short the five
year bond, which we hedge through the barbell portfolio \((\theta_1, \theta_3)\), using the proceeds of the
short-sale. Then, at some future date, we purchase back the five year bond and sell back the
portfolio \((\theta_1, \theta_3)\). The convexity of the barbell trade is, in fact, a view about movements of
long-term bond prices, and leads to profits in the twist scenario. That is, by convexity, the
price \(P_3\) varies more than the price of shorter maturity zeros, thus leading to profits. Note,
however, that this strategy leads to losses in the steepening scenario.
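The scenario analysis of Table 11.4 can be sketched as follows. The names are illustrative. The code works with unrounded prices and durations, so its barbell values can differ from the table's three-decimal figures in the third decimal; the signs of the P&L are what matter.

```python
def zero(y, years):
    """Return (price, modified duration) of a zero, discretely compounded."""
    return (1.0 + y) ** -years, years / (1.0 + y)

def barbell_weights(curve, maturities=(1, 5, 10)):
    """Weights on the short and long zeros that match the bullet's value and duration."""
    (p1, d1), (p2, d2), (p3, d3) = [zero(y, n) for y, n in zip(curve, maturities)]
    theta1 = (d3 - d2) / (d3 - d1) * p2 / p1
    theta3 = (d2 - d1) / (d3 - d1) * p2 / p3
    return theta1, theta3

initial = (0.04, 0.05, 0.06)            # 1Y, 5Y, 10Y yields at time zero
scenarios = {
    "twist": (0.06, 0.05, 0.04),
    "steepening": (0.04, 0.05, 0.07),
}

theta1, theta3 = barbell_weights(initial)   # kept fixed across scenarios
bullet_price = zero(0.05, 5)[0]             # the 5Y yield is 5% in every scenario

barbell_value = {}
for name, (y1, _, y3) in scenarios.items():
    barbell_value[name] = theta1 * zero(y1, 1)[0] + theta3 * zero(y3, 10)[0]
    pnl = barbell_value[name] - bullet_price   # long barbell, short bullet
    print(f"{name:>10}: barbell {barbell_value[name]:.3f}, P&L {pnl:+.3f}")
```

The position gains under the twist and loses under the steepening, which shows that the trade embeds a view on long-term rates once shifts are not parallel.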
We need to state an important caveat. The previous conclusions rely on a comparative
static analysis, and abstract from the principle that term-structure movements occur under
no-arbitrage. For example, the value of the zeros changes over the horizons we are designing
scenarios for, even without any changes in the yield curve. Whether this effect is minor depends
on the horizon and the model we use to generate scenarios. In Section 11.6.6, we shall revisit
the example of this section and illustrate how passage of time and absence of arbitrage can be
factored into the analysis, and change some results emanating from Table 11.4.

11.3.3.5 Fixed income arbitrage strategies

The previous convexity trades are examples of yield curve arbitrage strategies. They may
purely rely on convexity or, as discussed in the previous section, on directional views about
interest rate movements. For example, as we have explained, we may short five year bonds, and go
long two- and ten-year bonds, as we view that short-term interest rates will rise and medium-term
interest rates will fall. This butterfly strategy is somewhat cheap, intellectually, and
not necessarily rewarding, and will be further analyzed in Section 11.6.6.
Swap spread arbitrage is a popular strategy. It was responsible for leading LTCM to a loss of
about $1.6 billion in 1998. The strategy works as follows: (i) enter a swap paying the floating
LIBOR, \(L_t\), and receiving a fixed rate \(CMS\); (ii) short a par Treasury with the same maturity as
the swap, thus paying the fixed coupon rate \(CMT\), and invest the proceeds at the repo rate \(r_t\).
Thus, the payoff of the strategy is the fixed spread to be received, \(SS = CMS - CMT\), and the floating
spread to be paid, \(S_t = L_t - r_t\). So we go long or short this strategy according to whether we
view \(SS\) to be larger or smaller than the average floating spread \(S_t\) over the strategy horizon.
Historically, the spread \(S_t\) has certainly been volatile, but quite stable on average, so it is a
reasonable strategy. The problem is that, occasionally, \(S_t\) can attain quite large values.
Trading strategies more sophisticated than the previous ones rely on models, aiming to identify
points of the yield curve that are misaligned from those predicted by the models. All in all,
the strategy is to buy the cheap and short the model-based rich, where the model-based rich
is replicated through a portfolio with cash and the bonds that are well-priced by the model,
weighted with model-based delta, as in the derivation of the bond pricing formula in Section
12.4.2.2 of the next chapter.

11.3.3.6 Negative convexity, and market volatility

What happens when bond prices have negative convexity? In the next chapter, we shall see
that the value of a callable bond can be concave in the short-term rate. A similar feature is
displayed by mortgage-backed-securities (MBS, henceforth), which can now be concave in the
YTM! The reason for this negative convexity is that early repayments are likely to occur as the
YTM decreases, which entails two inextricable consequences: (i) the price of the MBS increases
less than a conventional bond price after a decline in the YTM, especially when the YTM is
low; (ii) the duration of the MBS decreases as the YTM decreases.
Hedging against MBS might lead to an increased volatility in rate markets. The mechanism
is the following. Institutions that are long MBS would typically short conventional bonds for
hedging purposes, consistently with the prediction of Eq. (11.11). However, the duration of
MBS increases as interest rates increase, due to negative convexity: \(\partial D/\partial y = D^2 - C\), which
is positive when the convexity \(C\) is negative.
Therefore, an interest rate increase can lead these institutions to short additional conventional
bonds, which worsens liquidity and leads to a further increase in the interest rates, thereby
feeding a vicious circle. Perli and Sack (2003) estimate that in 2002 and 2003, this mechanism
may have amplified the volatility of long-term US rates by a factor between 15% and 30%. It is
an instance of what is sometimes defined as endogenous risk, the circumstance that the trend
of a certain economic variable triggers actions from market participants that, in turn, reinforce
the initial trend, as in the case of the 1987 crash discussed in Section 10.4.5 of Chapter 10, or
in the case of asset sell-offs in times of crisis, discussed in Section 13.5.3 of Chapter 13.

11.4 Foundational issues in interest rate modeling


This section is an introduction to the design of binomial trees to price and fit fixed income
securities. Such a design requires revising a few details pertaining to the equity case examined in
Chapter 10. It is instructive to review how generally binomial trees are to be used to model
price movements:
(i) At the time of evaluation, we observe the state. In the next period, there can be two
mutually exclusive states of the world: (a) the state up, occurring with probability \(p\);
and (b) the state down, occurring with probability \(1 - p\).
(ii) After two periods, there can be three mutually exclusive states of the world, as in the
following diagram. We label the tree in this diagram a recombining tree, to emphasize
that the up & down and the down & up nodes are the same.

    Today          First period                Second period

                                        p      "up","up"
                   p    "up" state
                                        1-p    "up","down"
    state
                                        p      "down","up"
                   1-p  "down" state
                                        1-p    "down","down"

The previous diagram can be used to price options written on stocks. The stock price unfolds
through the branches of the tree. Then, we figure out the no-arbitrage movements of the option
price along the tree. Suppose, however, we wish to price an option written on a zero, a 3 Year
zero say. Can we apply the same methodology to price the option? The answer is no, and the
reason is that we cannot exogenously track the movements of the prices of the zero, as in the
case of the stock price. Instead, after one year, the 3 Year zero becomes a 2 Year zero, i.e. quite
a different asset.
These issues can be mitigated by modeling the movements of the entire yield curve. There are
two approaches, as in the diagram below. In the first, we model the dynamics of the short-term
rate, defined as the interest rate on a loan with maturity equal to the time intervals in the
tree. The resulting model, referred to as model of the short-term rate, has implications in terms
of the movements of the entire term-structure. This approach, developed in the next section,
leads to evaluation formulae in which the current prices of the zeros predicted by the model are
not necessarily equal to the market prices. A second approach, based on calibration, leads to
the so-called no-arbitrage models, where we model the dynamics of the entire term-structure.
This approach gives rise to option evaluation formulae in which the current prices of the zeros
predicted by the model are equal to the market prices. We describe this approach in the last
sections of this chapter, using binomial trees, with the next chapter developing their continuous
time counterparts.

                                     Input                                Output
    Models of the short-term rate    Interest rates    (No-arbitrage)     Prices, not market prices
    No-arbitrage models              Market prices     (No-arbitrage)     Interest rates

11.4.1 Tree representation of the short-term rate


11.4.1.1 Recursive evaluation

Consider a two-period, two-state tree, where the current short-term rate is \(r\). The development
of the short-term rate is uncertain. That is, the future short-term rate, \(\tilde r\), is random, and can
take two values: either \(r^+\) with (physical) probability \(p\), or \(r^-\) with probability \(1 - p\). We assume
that \(r^+ > r^-\).

                   p      r^+   (bond price P(r^+, T))
    r
                   1-p    r^-   (bond price P(r^-, T))
Let us explain. We are looking for a pricing function of \(r\), say \(P(r, T)\), where \(T\) is the maturity
date of the bond, such that it is \(P(\tilde r = r^+, T)\) in the bad state of the world, and \(P(\tilde r = r^-, T)\)
in the good. The very same function is \(P(r, T)\), today. When it comes to pricing options in a
complete market setting such as Black & Scholes, we take the stock price as exogenous and, then,
search for an endogenous call pricing function, as explained in the previous chapter, achieving a
preference-free evaluation. The usual argument relies on market completeness. In the Black & Scholes
market, the stock price is driven by one source of randomness, such that markets are completed
by the very same asset, and the option price can be expressed in a preference-free format,
through a replication argument.
In the bond pricing case of this section, we shall also price a bond through a replication
argument. However, the resulting evaluation formula will not be preference-free, because the
risks affecting the asset are not traded: the short-term rate is not traded. While we can still
determine the price of the bonds through no-arbitrage, the resulting evaluation formula is not
preference-free.
Let us elaborate. Suppose that two zeros with distinct maturities are available for trading.
A money market account is also available (MMA, in the sequel). Investing $1 in
the MMA generates \(\$(1 + r)\) in the second period. We derive an evaluation formula for the
zero based on the previous model of the short-term rate. The idea is to build up a portfolio
that contains one zero and the MMA. We, then, make sure the value of this portfolio in the
second period replicates the value of the zero we wish to price. By no-arbitrage, the value of
the portfolio in the first period equals the value of the zero we wish to price. The appendix
develops the arguments, and shows that in the absence of arbitrage, there is a function of \(r\) at
most, say \(\lambda\), such that the expected excess return on the bond satisfies:
\[ E_p[P(\tilde r, T)] - (1 + r) P(r, T) = \underbrace{\frac{P(r^+, T) - P(r^-, T)}{r^+ - r^-}\, \mathrm{Vol}(\tilde r)}_{=\ \text{price volatility}} \times \underbrace{\lambda}_{=\ \text{unit risk premium}} \tag{11.17} \]
where \(E_p[P(\tilde r, T)]\) denotes the expectation of the bond price under \(p\), and \(\mathrm{Vol}(\tilde r) = r^+ - r^-\),
which is interpreted as the volatility of the short-term rate.
The meaning of Eq. (11.17) is quite simple. Suppose you invest \(\$P(r, T)\) into a MMA. Its
gross return is then \((1 + r) P(r, T)\). Alternatively, suppose you invest \(\$P(r, T)\) into a bond,
the expected gross return of which is obviously \(E_p[P(\tilde r, T)]\). The left-hand side of Eq. (11.17)
is, then, the expected excess return on the bond, and the right-hand side the prediction of the
model. Eq. (11.17) is, in fact, an APT relation, where \(\lambda\) is interpreted as a unit premium related
to the risk of holding the bond over one period of time. It says that the expected excess return
on the zero equals the volatility of its price multiplied by the unit price of risk. We call the
term,
\[ \frac{\Delta P}{\Delta r}\, \mathrm{Vol}(\tilde r), \qquad \text{where } \frac{\Delta P}{\Delta r} \equiv \frac{P(r^+, T) - P(r^-, T)}{r^+ - r^-}, \]
price volatility, because it measures the amplitude of the price variation due to changes in the
short-term rate in the future, \(\Delta P / \Delta r\), i.e. the price-sensitivity, where this price sensitivity is
normalized by the volatility of the short-term rate, \(\mathrm{Vol}(\tilde r)\). We can elaborate on Eq. (11.17),
and bridge to the continuous time APT relations of Chapter 4. The key observation is that Eq.
(11.17) relies on a tree with a trading period normalized to unity. When the trading period is
equal to some \(\Delta t\), the interest earned over that period is \(r \Delta t\), and accordingly, \(\lambda \Delta t\) is the
premium to compensate for the risk of holding the bond over an amount of time equal to \(\Delta t\).
With these changes, Eq. (11.17) can be cast as,
\[ E_p[P(\tilde r, T)] - (1 + r \Delta t) P(r, T) = \frac{\Delta P}{\Delta r}\, \mathrm{Vol}(\tilde r)\, \lambda \Delta t, \]
such that, by considering small \(\Delta t\), and rearranging terms,
\[ \frac{E_p(dP)}{dt} - r P = \frac{\partial P}{\partial r}\, \mathrm{Vol}(r)\, \lambda. \]
Once we assume \(r\) is solution to a stochastic differential equation, we can use Itô's lemma, to
turn the previous equation into a partial differential equation subject to a boundary condition
that states that the bond price is one at expiration. Chapter 12 contains a rigorous discussion
of these topics.
Eq. (11.17) can be cast in an alternative format that is equally easy to interpret. Rearranging
terms leaves:
\[ P(r, T) = \frac{(p - \lambda) P(r^+, T) + [1 - (p - \lambda)] P(r^-, T)}{1 + r} = \frac{E_q[P(\tilde r, T)]}{1 + r}, \tag{11.18} \]
where \(q = p - \lambda\) is the risk-neutral probability.
Let us add a few considerations. We expect that \(\lambda < 0\) because bond prices are decreasing
in the short-term rate here. Then, \(q > p\).[5] Hence, the risk-neutral probability of an
upward movement of the short-term rate, \(q\), is higher than the true probability, \(p\). An investor
who goes long a bond is concerned by an increase of the short-term rate in the future and,
hence, corrects the true probability by assigning a higher risk-adjusted probability to the
upward state.
[5] To ensure \(q\) is a probability, we need to have that (i) \(q \geq 0\) and (ii) \(q \leq 1\). That is,
\(-(1 - p) \leq \lambda \leq p\).
It is the FTAP again. There is no arbitrage if and only if there is a risk-neutral probability
such that prices earn an expected return equal to the short-term rate. As explained in the
first part of these lectures, it does not mean that the market is risk-neutral. The market,
instead, prices assets by discounting the assets' future payoffs in a generous way, through \(r\). To
compensate for this, the expectation in the numerator is taken under a probability which over-weights
the bad states of nature, so as to lower back asset prices. It is as if we wished to price
assets in Planet Earth, a planet full of frictions, risk-aversion and the like, which are reflected
in relatively poor asset evaluation. Then, we search for another planet, which has no frictions
or risk-aversion, where the asset is evaluated through actuarial methods. To reconcile prices in
these two planets, we need the idealized and peaceful planet to display a distorted probability,
which assigns relatively higher chances to bad events than Planet Earth does.
11.4.1.2 One example

Assume the current short-term rate equals 10%. We know that in one year, and with (physical)
probability \(p\), \(r\) will increase by 2 percentage points, and with probability \(1 - p\), it will decrease
by 2 percentage points. Finally, with the same probability \(p\), the short-term rate prevailing from
the next year to two years' time will increase by 2 further percentage points from its previous
value in one year's time. We take the probability of an upward movement to be 20% and the
absolute value of the Sharpe ratio to be 30%. Given these data, we use the formula, and obtain
an estimate of the risk-neutral probability of an upward movement of the short-term rate, equal
to \(q = p - \lambda = 20\% - (-30\%) = 50\%\).
Pricing zeros

Let us price a zero maturing in two years, hinging upon the following tree, where \(q = \frac{1}{2}\):

    Today          1 Year         2 Years

                                  r = 14%
                   r = 12%
    r = 10%                       r = 10%
                   r = 8%
                                  r = 6%

We can use Eq. (11.18) to fill in each node of the tree. We start from the end of the tree,
where the price of the two year zero is $1, and then use Eq. (11.18) to fill every node, as
illustrated in Figure 11.5. In one year's time, the price of the zero is simply $1 discounted at the
short-term rate prevailing at the beginning of the next year. The price we are looking for is
obtained by applying Eq. (11.18), yielding
\[ P(r, 2) = \frac{E_q[P(\tilde r, 2)]}{1 + r} = \frac{q P(r^+, 2) + (1 - q) P(r^-, 2)}{1 + r} = \frac{\frac{1}{2}(0.8928) + \frac{1}{2}(0.9259)}{1.10} = 0.8267. \]

    Today          1 Year                 2 Years

                                          P = 1
                   1/1.12 = 0.8928
    0.8267                                P = 1
                   1/1.08 = 0.9259
                                          P = 1

    (q = 1/2 at each node)

FIGURE 11.5.
Convexity effects

Is the two-year spot rate equal to 10%? It is a natural and, as we shall see, interesting
question, as the short-term rate is a martingale under \(q\), in this particular example. However,
the answer to the previous question is in the negative. Let us elaborate. The two-year spot rate,
\(r(0, 2)\), satisfies \(0.8267 = (1 + r(0, 2))^{-2}\), or \(r(0, 2) = 9.98\%\). That is,
\[ 0.8267 = E_q\!\left[\frac{1}{1 + r}\, \frac{1}{1 + \tilde r}\right] > \frac{1}{1 + r}\, \frac{1}{1 + E_q(\tilde r)} = 0.8264. \]
In other words, suppose the interest rate is known with certainty to be 10% in the second period.
Then, the price should be 0.8264, the value obtained by discounting at \(1 + E_q(\tilde r)\) in the second
period. Prices, then, increase upon activation of uncertainty. It is a convexity effect, which we
shall explain in deeper detail in the next chapter (Section 12.4.5.1).
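The two-year zero price, the implied spot rate and the certainty benchmark can be checked with a few lines. This is a minimal sketch using the figures of the example (a 10% short-term rate, moves of 2 percentage points, and a risk-neutral probability of 1/2); the variable names are illustrative.

```python
q = 0.5                 # risk-neutral probability of an upward move, from the example
r0, step = 0.10, 0.02   # current short-term rate and size of the move

# One year ahead the zero has one year left, so it is worth 1 / (1 + rate).
price_up = 1.0 / (1.0 + r0 + step)
price_down = 1.0 / (1.0 + r0 - step)

# Risk-neutral expectation of next year's price, discounted at today's rate.
price_today = (q * price_up + (1.0 - q) * price_down) / (1.0 + r0)

# Two-year spot rate implied by the model price.
spot_2y = price_today ** -0.5 - 1.0

# Benchmark: next year's rate known with certainty to equal its expectation.
expected_rate = q * (r0 + step) + (1.0 - q) * (r0 - step)
price_certain = 1.0 / ((1.0 + r0) * (1.0 + expected_rate))

print(f"price today   : {price_today:.4f}")
print(f"2Y spot rate  : {spot_2y:.4%}")
print(f"certainty case: {price_certain:.4f}")
```

The model price exceeds the certainty benchmark because 1/(1 + r) is convex in r, so the two-year spot rate falls slightly below 10% even though the expected short-term rate under q is 10%.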
11.4.2 Tree pricing
We can generalize the tree to a multiperiod case. We use Eq. (11.18) to evaluate zeros at all nodes
of the tree and maturities. Given , which can be estimated once we estimate and , we use
recursively Eq. (11.18), and calculate prices of zeros. We can, then, price any derivative written
on these zeros. The drawback of this approach is that the initial term structure is predicted
560

11.4. Foundational issues in interest rate modeling

c
by
A. Mele

with error! Let us illustrate with a concrete numerical example. Consider the tree in Figure 11.6, where the current short-term rate for one year is $r = 4\%$. Also shown in this tree is the price of a hypothetical 3 Year zero at the expiration date, $t = 3$, and at $t = 2$. At $t = 3$, $P = 1$ in all states of nature. At $t = 2$, the price is $P_t = E^Q_t(P_{t+1})/(1 + r_t) = 1/(1 + r_t)$, for $r_t$ = 6%, 4% and 2%. The issue, now, is how to determine the price of the zero in correspondence of the remaining nodes. We should use the formula $P_t = E^Q_t(P_{t+1})/(1 + r_t)$ to populate the tree, but we do not know the risk-neutral probability $q$, nor the inputs that determine it in Eq. (11.18). Suppose we estimate these inputs. In this case, we determine $q$ as in Eq. (11.18), and we come up with $q = \frac{1}{2}$. The following diagram gives the price of the zero in all the nodes at time $t = 1$, and at the evaluation time $t = 0$, yielding a price of the 3 Year zero equal to 0.8893.
Next, consider a European call option written on the 3 Year zero, with expiration date equal to 2 and strike price $K = 0.95$. The following diagram gives the value of the option predicted by the model at each node of the tree. The model predicts that the current price of the call option is 0.0124.
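The recursion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list layout `rates[t][j]` (node reached after `j` down-moves) and the helper name `roll_back` are choices made here, not notation from the text, and the inputs are the rates of Figure 11.6 with $q = \frac{1}{2}$.

```python
# Short-rate tree of Figure 11.6: rates[t][j] is the rate at time t after
# j down-moves, so j = 0 is the highest-rate node (layout is an assumption).
rates = [[0.04], [0.05, 0.03], [0.06, 0.04, 0.02]]
q = 0.5  # risk-neutral probability of an up-move in the rate

def roll_back(values, rates, q):
    """Discount node values back to t = 0, one layer at a time.

    `values` holds one entry per node of some layer; each pass discounts
    through the rate layer that sits one period earlier."""
    values = list(values)
    while len(values) > 1:
        layer = rates[len(values) - 2]
        values = [(q * values[j] + (1 - q) * values[j + 1]) / (1 + layer[j])
                  for j in range(len(layer))]
    return values[0]

zero_t2 = [1 / (1 + r) for r in rates[2]]         # 3 Year zero at t = 2
zero_price = roll_back(zero_t2, rates, q)
call_t2 = [max(p - 0.95, 0.0) for p in zero_t2]   # call payoff at t = 2
call_price = roll_back(call_t2, rates, q)
print(round(zero_price, 4), round(call_price, 4))
```

The same helper prices any payoff defined on the nodes of the tree, which is the sense in which the zeros and the derivatives written on them share one recursion.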
11.4.3 Introduction to calibration
Calibration is a procedure by which we search for a given model's parameter values, such that the model's predictions coincide with selected empirical counterparts. For example, in the real business cycle literature reviewed in Chapter 3, we might calibrate a model's parameters to ensure that the correlation of asset returns with output is the same as that in the data. In this chapter, calibration is a procedure by which we search for a given model's parameters that make the model price equal to the market price.
11.4.3.1 Motivation

The model we are dealing with in the previous section predicts that the price of the 3 Year zero is equal to 0.8893. However, there is no guarantee that this model-implied price equals the market price of the 3 Year zero. Suppose, instead, that the market price of the 3 Year zero, $P^{\$}$ say, equals 0.8700. What should we do to make the model-implied price of the 3 Year zero equal to the market price? The question is important: how can we trust an option pricing model that is not even able to pin down the initial market value of the asset underlying the option contract? Alternatively, suppose the option is a bespoke product of a bank. The bank's client might question why the bank's evaluation model predicts a price of the bond to be so high (0.8893), compared to the observable and cheap market price (0.8700).

[Tree diagram: short-term rate and price of the 3 Year zero, with q = 1/2 at every node.]
t = 0: r = 4%, P = (q Pu + (1 - q) Pd)/1.04 = 0.8893
t = 1: r = 5%, Pu = (q Puu + (1 - q) Pud)/1.05 = 0.9070; r = 3%, Pd = (q Pud + (1 - q) Pdd)/1.03 = 0.9427
t = 2: r = 6%, Puu = 1/1.06 = 0.9433; r = 4%, Pud = 1/1.04 = 0.9615; r = 2%, Pdd = 1/1.02 = 0.9804
t = 3: P = 1 in all states

FIGURE 11.6. The dynamics of the short-term rate

11.4.3.2 Perfectly fitting trees

We look for perfectly fitting trees, that is, those trees where risk-neutral probabilities and/or the values of the short-term rate in each node are not given in advance but, rather, are such that the initial yield curve is fitted without error. These trees are called implied binomial trees, as they are implied by the market prices. Let us consider the example in the previous section. To make the model-implied price of the 3 Year zero equal to the market price, $P^{\$} = 0.8700$, we cannot take the risk-neutral probability as given, i.e. independent of the observed price $P^{\$} = 0.8700$, as we did before. Rather, we should calibrate the probability $q$, as follows,

$$P^{\$} = 0.8700 = \frac{1}{1.04}\left[q\,P_1(5\%) + (1-q)\,P_1(3\%)\right] \qquad (11.19)$$

[Tree diagram: value of the European call on the 3 Year zero, strike K = 0.9500, with q = 1/2 at every node.]
t = 0: r = 4%, C = (q Cu + (1 - q) Cd)/1.04 = 0.0124
t = 1: r = 5%, Cu = (q Cuu + (1 - q) Cud)/1.05 = 0.0055; r = 3%, Cd = (q Cud + (1 - q) Cdd)/1.03 = 0.0203
t = 2: Puu = 0.9433, Cuu = (Puu - K)+ = 0.0000; Pud = 0.9615, Cud = (Pud - K)+ = 0.0115; Pdd = 0.9804, Cdd = (Pdd - K)+ = 0.0304

where $P_1(5\%)$ and $P_1(3\%)$ are the prices of the zero at time $t = 1$, in the events that the short-term rate is up to 5% or down to 3%.
The previous equation follows, again, by Eq. (11.18). Note, now, that the unknown is not the price, which is instead given by the market price. Rather, we are looking for, or calibrating, the probability $q$ that makes the RHS of Eq. (11.19) equal to its LHS. Naturally, we need to calculate the prices of the zeros $P_1(5\%)$ and $P_1(3\%)$. These prices can be found by another application of Eq. (11.18), as follows,

$$P_1(5\%) = \frac{q\,0.9433 + (1-q)\,0.9615}{1.05}, \qquad P_1(3\%) = \frac{q\,0.9615 + (1-q)\,0.9804}{1.03}.$$

By replacing the previous expressions for $P_1(5\%)$ and $P_1(3\%)$ into Eq. (11.19), we obtain,

$$0.8700 = \frac{1}{1.04}\left[q\,\frac{q\,0.9433 + (1-q)\,0.9615}{1.05} + (1-q)\,\frac{q\,0.9615 + (1-q)\,0.9804}{1.03}\right].$$

This is a nonlinear equation in $q$, which we can easily solve, to obtain $q = 0.8779$. Hence, we find: $P_1(5\%) = 0.9005$ and $P_1(3\%) = 0.9357$.

The next diagram depicts the implied binomial tree, i.e. the tree that we obtain after we match the model-implied price of the 3 Year zero to the market price, $P^{\$} = 0.8700$.

[Tree diagram: implied binomial tree for the 3 Year zero, with q = 0.8779 at every node.]
t = 0: r = 4%, P = (q Pu + (1 - q) Pd)/1.04 = 0.8700
t = 1: r = 5%, Pu = (q Puu + (1 - q) Pud)/1.05 = 0.9005; r = 3%, Pd = (q Pud + (1 - q) Pdd)/1.03 = 0.9357
t = 2: r = 6%, Puu = 0.9433; r = 4%, Pud = 0.9615; r = 2%, Pdd = 0.9804
t = 3: P = 1 in all states
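The calibration of $q$ to the market price 0.8700 can be sketched numerically. The snippet below is an illustration, not the author's procedure: it uses bisection (any root finder would do), the function names are hypothetical, and it takes the four-decimal $t = 2$ prices exactly as quoted in the text, which is what reproduces $q = 0.8779$.

```python
def zero_price(q, p_uu=0.9433, p_ud=0.9615, p_dd=0.9804):
    """Model price of the 3 Year zero as a function of q, as in Eq. (11.19),
    using the rounded t = 2 prices quoted in the text."""
    p_u = (q * p_uu + (1 - q) * p_ud) / 1.05
    p_d = (q * p_ud + (1 - q) * p_dd) / 1.03
    return (q * p_u + (1 - q) * p_d) / 1.04

def calibrate(target, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection on [0, 1]; the model price falls as q rises, because a
    higher q puts more weight on high-rate (low-price) states."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if zero_price(mid) > target:
            lo = mid    # model price still too high: increase q
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = calibrate(0.8700)
print(round(q, 4))
```

With unrounded inputs (e.g. 1/1.06 in place of 0.9433) the root moves slightly, so small differences from the figures in the text should be expected.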

Note that $P_1(5\%)$ and $P_1(3\%)$ are quite far from the values we found earlier whilst imposing that $q = \frac{1}{2}$. In the implied tree, they are smaller than those obtained with $q = \frac{1}{2}$, state by state. This is because in the implied tree, $q = 0.8779$, such that the model can match a lower initial price, 0.8700. The implied tree puts more weight on those states of nature where the short-term rate is high or, equivalently, bond prices are low. We expect the price of the option in the implied binomial tree to be lower than that we found earlier, because call option prices increase with the price of the underlying, which is now lower. Let us perform the calculations, by relying on the implied binomial tree depicted in the next diagram.

[Tree diagram: value of the call option in the implied binomial tree, strike K = 0.9500, with q = 0.8779 at every node.]
t = 0: r = 4%, C = (q Cu + (1 - q) Cd)/1.04 = 0.0026
t = 1: r = 5%, Cu = (q Cuu + (1 - q) Cud)/1.05 = 0.0013; r = 3%, Cd = (q Cud + (1 - q) Cdd)/1.03 = 0.0134
t = 2: Puu = 0.9433, Cuu = (Puu - K)+ = 0.0000; Pud = 0.9615, Cud = (Pud - K)+ = 0.0115; Pdd = 0.9804, Cdd = (Pdd - K)+ = 0.0304

The calculations in the previous diagram reveal indeed that the option price predicted by the implied binomial tree is 0.0026, which is roughly one fifth of the option price we found earlier, 0.0124! The interpretation of this result relates, again, to the implied risk-neutral probability, which is much larger than $q = \frac{1}{2}$. The implied tree puts a relatively large weight on events where the short-term rate is high or bond prices are low, which reduces the likelihood that the option will be exercised, leading then to a small option price.
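A short sketch makes the sensitivity of the option price to $q$ explicit. The function name and argument layout are hypothetical; the inputs are the four-decimal bond prices quoted in the text. Evaluated at $q = 0.8779$ it returns about 0.0027, close to the 0.0026 reported above, the small gap presumably reflecting rounding of intermediate values in the diagram.

```python
def call_price(q, strike=0.95, zeros_t2=(0.9433, 0.9615, 0.9804)):
    """Call on the 3 Year zero, expiring at t = 2, priced on the tree with
    short rates 4% (t = 0) and 5% / 3% (t = 1)."""
    payoff = [max(p - strike, 0.0) for p in zeros_t2]
    c_u = (q * payoff[0] + (1 - q) * payoff[1]) / 1.05
    c_d = (q * payoff[1] + (1 - q) * payoff[2]) / 1.03
    return (q * c_u + (1 - q) * c_d) / 1.04

price_flat = call_price(0.5)        # q fixed in advance
price_implied = call_price(0.8779)  # q calibrated to the market price
print(round(price_flat, 4), round(price_implied, 4))
```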
11.4.3.3 Another zero

We might not be done yet. Let us go back to the zero pricing problem, and assume we observe the price of a 2 Year zero, and that this price equals 0.9200, a reasonable figure. Is there any chance that the inputs to the pricing problem for the 3 Year zero could also lead to fit the 2 Year zero without errors? Of course there isn't. Indeed, in the next diagram, we use the inputs to the 3 Year zero, and Eq. (11.18), and find that the price of the 2 Year zero implied by the price of the 3 Year zero is equal to 0.9178. Unless the market price happens, by chance, to equal 0.9178, we cannot simultaneously fit the price of the 3 Year and the 2 Year zeros.

To simultaneously fit the price of the 3 Year and the 2 Year zeros, we should implement at least one of the two strategies: (i) to make the probabilities time-varying; (ii) to calibrate the entire structure of the short-term movements in Figure 11.6. We implement the first of these two strategies in the next subsection. We develop the second strategy in Section 11.5.
[Tree diagram: price of the 2 Year zero implied by the calibrated probability q = 0.877.]
t = 0: r = 4%, P = (q Pu + (1 - q) Pd)/1.04 = 0.9178
t = 1: r = 5%, Pu = 1/1.05 = 0.9523; r = 3%, Pd = 1/1.03 = 0.9709
t = 2: P = 1 in all states

11.4.3.4 Implementing implied binomial trees

We build up implied binomial trees in more general cases, arising in the presence of several bond prices to be matched. Suppose the time interval is six months, such that the short-term rate is for six months. The current short-term rate is 3.99%, annualized. It can change to either 4.50% or to 4.00%, with equal (physical) probability. Suppose that two zeros are available for trading: a 6M zero and a 1Y zero, where the current price of the 1Y zero is 0.95974. What is the risk-neutral probability implied by this tree? This probability must be such that the prices of all the zeros are matched exactly. Figure 11.7 depicts the tree corresponding to this situation.

[Tree diagram: short-term rate over one six-month period, with physical probability p = 1/2.]
t = 0: r = 3.99%/2
t = 0.5: r = 4.50%/2 or r = 4.00%/2

FIGURE 11.7. The dynamics of the short-term rate: high interest rate scenario

In this tree, $p = \frac{1}{2}$ denotes the physical probability. Naturally, the price of a 6M zero at $t = 0$ equals $P^{\$}(0, 0.5) = 1/\left(1 + \frac{0.0399}{2}\right) = 0.9804$. This price is actually observed. That is, the current short-term rate, 3.99%, is a mere definition. Next, we proceed to find the no-arbitrage movements of the 1Y zero, which are displayed below.
[Tree diagram: no-arbitrage movements of the 1Y zero, with physical probability p = 1/2.]
t = 0: r = 3.99%/2, P$(0, 1) = 0.95974
t = 0.5: r = 4.50%/2, Pu(0.5, 1) = 1/(1 + 0.045/2) = 0.9779; r = 4.00%/2, Pd(0.5, 1) = 1/(1 + 0.040/2) = 0.9804
t = 1: the zero pays 1 in all states

Note, the current market price, $P^{\$}(0, 1) = 0.95974$, is less than the expected price to prevail in six months, discounted at the current interest rate,

$$\frac{1}{1+\frac{r}{2}}\,E\left[P(0.5, 1)\right] = \frac{1}{1+\frac{0.0399}{2}}\left(\frac{1}{2}\,0.9779 + \frac{1}{2}\,0.9804\right) = 0.9599.$$

Hence, $p = \frac{1}{2}$ cannot be the risk-neutral probability. To find out the risk-neutral probability, we proceed as follows. In the absence of arbitrage,

$$P^{\$}(0, 1) = 0.95974 = \frac{1}{1+\frac{0.0399}{2}}\left[q\,0.9779 + (1-q)\,0.9804\right],$$

with obvious notation. This is one equation with one unknown, $q$, which is solved by $q = 0.605$.
We may now proceed with pricing derivatives. Consider a European call option on the 1Y zero, with expiration date in six months and exercise price equal to 0.9785. Its payoff is as depicted below:
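[Tree diagram: payoff of the call on the 1Y zero, strike K = 0.9785, with q = 0.605.]
t = 0: r = 3.99%/2, C = ?
t = 0.5: P(0.5, 1) = 0.9779, Cu = (P(0.5, 1) - K)+ = 0; P(0.5, 1) = 0.9804, Cd = (P(0.5, 1) - K)+ = 0.0019
t = 1: the zero pays 1 in all states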

So the option price is, by risk-neutral evaluation,

$$C = \frac{1}{1+\frac{0.0399}{2}}\left[q \cdot 0 + (1-q) \cdot 0.0019\right] = 0.9804\,[0.395 \cdot 0.0019] = 7.3579 \times 10^{-4} \qquad (11.20)$$

What happens when the short-term rate does not evolve as in the diagram of Figure 11.7
but, instead, as in Figure 11.8?

[Tree diagram: short-term rate over one six-month period.]
t = 0: r = 3.99%/2
t = 0.5: r = 4.4154%/2 or r = 4.00%/2

FIGURE 11.8. The dynamics of the short-term rate: low interest rate scenario

The previous tree is one where the short-term rate in the upper state of the world equals $r_{up} = 4.4154\%$, not 4.50%, as in Figure 11.7. It implies that the price of the 1Y bond in 6 months in this state is:

$$P_{up}(0.5, 1) = \frac{1}{1+\frac{r_{up}}{2}} = \frac{1}{1+\frac{4.4154\%}{2}} = 0.9784.$$

The risk-neutral probability, $q$, solves:

$$P^{\$}(0, 1) = 0.95974 = \frac{1}{1+\frac{r}{2}}\left[q\,P_{up}(0.5, 1) + (1-q)\,P_{down}(0.5, 1)\right] = \frac{1}{1+\frac{0.0399}{2}}\left[q\,0.9784 + (1-q)\,0.9804\right].$$

The solution is $q = 0.756$, which is higher than the solution we found earlier using the tree in Figure 11.7 (i.e., $q = 0.605$). The option price is, now,

$$C = \frac{1}{1+\frac{0.0399}{2}}\left[q \cdot 0 + (1-q) \cdot 0.0019\right] = 0.9804\,[0.244 \cdot 0.0019] = 4.5451 \times 10^{-4} \qquad (11.21)$$

The up-state of the world in Figure 11.8 is less severe than that in Figure 11.7. Why then is the price in Eq. (11.21) smaller than that in Eq. (11.20)? To match the initial price $P^{\$}(0, 1) = 0.95974$, the model in Figure 11.8 must put more weight on the up-state of the world, i.e. a larger implied risk-neutral probability. This implies a larger risk-neutral probability that low bond prices will arise in the future and, hence, a lower option price.$^6$
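The comparison between the two scenarios can be sketched as follows. The function name and defaults are hypothetical; the up-state and down-state bond prices are the four-decimal values quoted in the text (0.9779, 0.9784 and 0.9804), which is what reproduces $q = 0.605$ and $q = 0.756$. Because the text also rounds $1 - q$ before multiplying, the option prices below agree with Eqs. (11.20)-(11.21) only to about three significant digits.

```python
def calibrate_and_price(p_up, p_down=0.9804, market=0.95974,
                        r0=0.0399, strike=0.9785):
    """One-period tree: back out q from the 1Y zero, then price the call."""
    growth = 1 + r0 / 2                          # six-month gross rate
    # market * growth = q * p_up + (1 - q) * p_down, solved for q
    q = (p_down - market * growth) / (p_down - p_up)
    payoff_up = max(p_up - strike, 0.0)
    payoff_down = max(p_down - strike, 0.0)
    call = (q * payoff_up + (1 - q) * payoff_down) / growth
    return q, call

q_high, call_high = calibrate_and_price(p_up=0.9779)   # Figure 11.7
q_low, call_low = calibrate_and_price(p_up=0.9784)     # Figure 11.8
print(round(q_high, 3), round(q_low, 3))
```

A smaller gap between the up-state and down-state bond prices forces a larger $q$ to hit the same market price, which is the mechanism described in the paragraph above.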
In a segmented market, two investment banks might have different views about developments in the short-term rate: the view in Figure 11.7 and that in Figure 11.8. The first bank favours a high interest rate scenario, but it is not too risk-averse to that scenario ($r_{up} = 4.5\%$, $q = 0.605$). The second bank favours a mild interest rate scenario, although it assigns a greater chance of this scenario to arise ($r_{up} = 4.4154\%$, $q = 0.756$). But then, naturally, both institutions need to agree on the initial bond price, $P^{\$}(0, 1) = 0.95974$. (The first bank might have, then, a quite conservative risk-management system although then its option prices are higher than those of the second bank.) The segmentation could arise, for example, because the clientèle of the first bank and that of the second bank are unlikely to meet and the prices for the option charged by the banks are not publicly known. In the absence of market imperfections (and arbitrage), however, the investment banks should agree on the option price too. Note, finally, that the price in Eq. (11.21) is almost half of that in Eq. (11.20). Derivatives can be quite nonlinear objects, due to their optionality. A small deviation in the assumptions on the short-term rate developments can lead to dramatic option pricing implications.

6 Mathematically, we have that $P^{\$}(0, 1) = \frac{1}{1+\frac{r}{2}}\left(P_{down} - q\,\Delta P\right)$, where $\Delta P \equiv P_{down} - P_{up} > 0$. While $P_{down}$ predicted by Figure 11.7 is the same as that in Figure 11.8, the bond price volatility, i.e. the difference $\Delta P$, is lower in Figure 11.8 than in Figure 11.7. Therefore, the tree in Figure 11.8 is consistent with the given market price, $P^{\$}(0, 1)$, only when $q$ increases from 0.605 to 0.756.
Let us add a period in the tree of Figure 11.7, assuming that the short-term rate is as in the
following diagram:

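[Tree diagram: short-term rate over two six-month periods, with risk-neutral probabilities q0 = 0.605 (first period) and q1 = ? (second period).]
t = 0: r = 3.99%/2
t = 0.5: r = 4.50%/2 or r = 4.00%/2
t = 1: r = 4.90%/2, r = 4.30%/2 or r = 3.90%/2

FIGURE 11.9.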

In this tree, $q_0$ is the risk-neutral probability for the first period, and $q_1$ is the risk-neutral probability for the second period. We already know that $q_0 = 0.605$. The probability $q_1$ is the risk-neutral probability for the time-period $(0.5, 1)$, and can differ from $q_0$. Suppose, also, that an additional zero is available for trading, a 1.5Y zero. The current price of this zero is $P^{\$}(0, 1.5) = 0.9382$. To derive the risk-neutral probability $q_1$, we calibrate the implied tree for the 1.5Y zero, as follows.

[Tree diagram: calibration of the implied tree for the 1.5Y zero, with q0 = 0.605 and q1 = ?.]
t = 0: r = 3.99%/2, P$(0, 1.5) = 0.9382
t = 0.5: r = 4.50%/2, Pu(0.5, 1.5) = ?; r = 4.00%/2, Pd(0.5, 1.5) = ?
t = 1: r = 4.90%/2, Puu(1, 1.5) = 1/(1 + 0.049/2) = 0.9761; r = 4.30%/2, Pud(1, 1.5) = 1/(1 + 0.043/2) = 0.9789; r = 3.90%/2, Pdd(1, 1.5) = 1/(1 + 0.039/2) = 0.9808
t = 1.5: the zero pays 1 in all states

We need to calculate the prices $P_u(0.5, 1.5)$ and $P_d(0.5, 1.5)$, which shall be used to recover $q_1$, through the no-arbitrage property of the zero, and the previously calculated $q_0 = 0.605$. By no-arbitrage, we have, as usual, that:

$$P_u(0.5, 1.5) = \frac{1}{1+\frac{0.045}{2}}\left[q_1\,0.9761 + (1-q_1)\,0.9789\right] \qquad (11.22)$$

$$P_d(0.5, 1.5) = \frac{1}{1+\frac{0.040}{2}}\left[q_1\,0.9789 + (1-q_1)\,0.9808\right] \qquad (11.23)$$

The problem is that $q_1$ is not known. Therefore, Eqs. (11.22)-(11.23) do not allow us to pin down the prices $P_u(0.5, 1.5)$ and $P_d(0.5, 1.5)$. But here is where calibration comes in. We know the current price of the 1.5Y zero, which is $P^{\$}(0, 1.5) = 0.9382$. In the absence of arbitrage,

$$P^{\$}(0, 1.5) = 0.9382 = \frac{1}{1+\frac{0.0399}{2}}\left[q_0\,P_u(0.5, 1.5) + (1-q_0)\,P_d(0.5, 1.5)\right],$$

where $P_u(0.5, 1.5)$ and $P_d(0.5, 1.5)$ are as in Eqs. (11.22)-(11.23), and where $q_0 = 0.605$. So we have,

$$0.9382 = \frac{1}{1+\frac{0.0399}{2}}\left[0.605\,P_u(0.5, 1.5) + 0.395\,P_d(0.5, 1.5)\right] \qquad (11.24)$$

Replacing Eqs. (11.22)-(11.23) into Eq. (11.24) leaves one equation with one unknown, $q_1$. Solving yields $q_1 = 0.8412$, which implies that $P_u(0.5, 1.5) = 0.9549$ and $P_d(0.5, 1.5) = 0.9600$.
To sum up, we have the tree below.

[Tree diagram: calibrated implied tree for the 1.5Y zero, with q0 = 0.605 and q1 = 0.8418 as printed in the diagram.]
t = 0: r = 3.99%/2, P$(0, 1.5) = 0.9382
t = 0.5: r = 4.50%/2, Pu(0.5, 1.5) = 0.9549; r = 4.00%/2, Pd(0.5, 1.5) = 0.9600
t = 1: r = 4.90%/2, Puu(1, 1.5) = 0.9761; r = 4.30%/2, Pud(1, 1.5) = 0.9789; r = 3.90%/2, Pdd(1, 1.5) = 0.9808
t = 1.5: the zero pays 1 in all states
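The solution of Eq. (11.24) for $q_1$ can be checked with a few lines. Two assumptions are made here and should be read as such: the figures in the text seem to be reproduced when the six-month discount factors are rounded to four decimals (0.9804, 0.9779 and 0.9804), and the constant names `D0`, `DU`, `DD` are hypothetical. With unrounded discount factors the implied $q_1$ comes out somewhat different, so the calibration is sensitive to rounding.

```python
D0, DU, DD = 0.9804, 0.9779, 0.9804   # rounded discount factors (assumption)
Q0 = 0.605                             # calibrated in the first period

def model_price(q1):
    """Return (price of the 1.5Y zero at t = 0, P_u, P_d) for a given q1."""
    p_u = DU * (q1 * 0.9761 + (1 - q1) * 0.9789)    # Eq. (11.22)
    p_d = DD * (q1 * 0.9789 + (1 - q1) * 0.9808)    # Eq. (11.23)
    return D0 * (Q0 * p_u + (1 - Q0) * p_d), p_u, p_d  # Eq. (11.24)

# The model price is linear in q1, so two evaluations pin down the root.
f0, f1 = model_price(0.0)[0], model_price(1.0)[0]
q1 = (f0 - 0.9382) / (f0 - f1)
_, p_u, p_d = model_price(q1)
print(round(q1, 4), round(p_u, 4), round(p_d, 4))
```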

We are ready to evaluate derivatives written on these zeros. Consider, for example, a call option on the 1.5Y zero, with expiration date in 1Y and exercise price equal to 0.9800. The price of the option at time $t = 0.5$ is either zero or $C = 0.00012$, as illustrated below.

[Tree diagram: value of the call on the 1.5Y zero, strike K = 0.9800, with q1 = 0.841.]
t = 0.5: up node, C = 0.00000; down node, C = (q1 × 0 + (1 - q1) × 0.0008)/(1 + 0.04/2) = 0.00012
t = 1: P(1, 1.5) = 0.9761, C = (P(1, 1.5) - K)+ = 0; P(1, 1.5) = 0.9789, C = (P(1, 1.5) - K)+ = 0; P(1, 1.5) = 0.9808, C = (P(1, 1.5) - K)+ = 0.0008

The no-arbitrage price of the 1Y call option on the 1.5Y zero, struck at $K = 0.9800$, is:

$$C = \frac{1}{1+\frac{0.0399}{2}}\left[q_0 \cdot 0 + 0.00012\,(1-q_0)\right] = 0.9804\,[0.00012\,(1 - 0.605)] = 4.647 \times 10^{-5}.$$

We can use the tree in Figure 11.9 to price additional derivatives, such as, say, a call option on the 1.5Y zero, with expiration date in six months, and exercise price equal to 0.9580. We have the following tree.

[Tree diagram: payoff of the six-month call on the 1.5Y zero, strike K = 0.9580, with q0 = 0.605.]
t = 0: r = 3.99%/2, C = ?
t = 0.5: Pu(0.5, 1.5) = 0.9549, C = (Pu(0.5, 1.5) - K)+ = 0.0000; Pd(0.5, 1.5) = 0.9600, C = (Pd(0.5, 1.5) - K)+ = 0.0020

Therefore, the no-arbitrage price of the option is,

$$C = \frac{1}{1+\frac{0.0399}{2}}\left[q_0 \cdot 0 + (1-q_0) \cdot 0.0020\right] = 0.9804\,[0.395 \cdot 0.0020] = 7.745 \times 10^{-4}.$$

11.4.3.5 Summing up

What have we done? Our starting point is the tree in Figure 11.9, which we use to recover the two risk-neutral probabilities $q_0$ (for the time span $(0, 0.5)$) and $q_1$ (for the time span $(0.5, 1)$), using the information about the market price of two zeros, the 1Y and the 1.5Y. Precisely, given $P^{\$}(0, 1)$, the price of the 1Y zero, we recover $q_0$, as illustrated below:

[Tree diagram: P$(0, 1) at t = 0 branches, with probability q0, to Pu(0.5, 1) and Pd(0.5, 1) at t = 0.5; the zero pays 1 in all states at t = 1.]

This is possible as $P_u(0.5, 1)$ and $P_d(0.5, 1)$ do not depend on $q_0$ and so they are obtained in a straightforward manner.

Next, and given $q_0$, we determine $q_1$, using $P^{\$}(0, 1.5)$, the price of the 1.5Y zero, as illustrated below:

[Tree diagram: P$(0, 1.5) at t = 0 branches, with probability q0, to Pu(0.5, 1.5) and Pd(0.5, 1.5) at t = 0.5, which branch, with probability q1, to Puu(1, 1.5), Pud(1, 1.5) and Pdd(1, 1.5) at t = 1; the zero pays 1 in all states at t = 1.5.]

Again, the risk-neutral probability, $q_1$, can be recovered because $P_{uu}(1, 1.5)$, $P_{ud}(1, 1.5)$ and $P_{dd}(1, 1.5)$ do not depend on $q_1$, as they are next to expiration, and are thus easily obtained. Given $P_{uu}(1, 1.5)$, $P_{ud}(1, 1.5)$ and $P_{dd}(1, 1.5)$, we can express $P_u(0.5, 1.5)$ and $P_d(0.5, 1.5)$ as two linear functions of $q_1$. Finally, no-arbitrage forces the market price, $P^{\$}(0, 1.5)$, to be linear in $P_u(0.5, 1.5)$ and $P_d(0.5, 1.5)$ and, hence, $q_1$, thereby allowing us to recover $q_1$.
We can continue, by adding one time period, as in the tree in Figure 11.10 below. We can recover $q_2$, once we are given the market price of a 2Y zero, $P^{\$}(0, 2)$, as follows:
• The price of the 2Y zero at time $t = 1.5$ (the filled nodes in Figure 11.10) (say $P(1.5, 2)$) can now be determined, given an assumption about the numerical values of the short-term rate in those nodes.
• Given the prices $P(1.5, 2)$ at time $t = 1.5$, and the previously calibrated probabilities $q_0$ and $q_1$, we impose no-arbitrage, thereby expressing the current market price $P^{\$}(0, 2)$ as a linear function of $q_2$. Then, we solve for $q_2$.
The calibration can continue. We extend the tree, by adding more and more periods. Then, we use the price of one additional zero to recover time-varying risk-neutral probabilities. An alternative procedure consists in: (i) fixing the risk-neutral probabilities to some value at all times (e.g., $q = \frac{1}{2}$), and (ii) figuring out the implied values for the short-term rate in each node of the tree. Section 11.6 develops a systematic approach that allows us to implement this procedure.
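The sequential procedure summarized above (one additional zero pins down one additional probability) can be sketched generically. This is an illustration under stated assumptions, not the text's own algorithm: the function names, the tree layout `rates[t][j]` (annualized rate after `j` down-moves) and the use of bisection are choices made here. It is run on the rates of Figure 11.9 with exact arithmetic, so the calibrated values differ somewhat from the 0.605 and 0.8412 of the text, which rely on four-decimal prices.

```python
def price_zero(rates, qs, dt):
    """Price at t = 0 of the zero maturing one period after the last rate
    layer; qs[t] is the up-probability for the move from step t to t + 1."""
    values = [1.0 / (1.0 + r * dt) for r in rates[-1]]
    for t in range(len(rates) - 2, -1, -1):
        values = [(qs[t] * values[j] + (1 - qs[t]) * values[j + 1])
                  / (1.0 + rates[t][j] * dt) for j in range(t + 1)]
    return values[0]

def calibrate(rates, market_prices, dt, tol=1e-12):
    """market_prices[k] is the price of the zero maturing after k + 2 periods."""
    qs = []
    for k, target in enumerate(market_prices):
        sub = rates[:k + 2]
        f = lambda q: price_zero(sub, qs + [q], dt) - target
        lo, hi = 0.0, 1.0
        if f(lo) * f(hi) > 0:
            raise ValueError("no probability in [0, 1] matches this zero")
        while hi - lo > tol:                 # bisection on q
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        qs.append(0.5 * (lo + hi))
    return qs

rates = [[0.0399], [0.045, 0.04], [0.049, 0.043, 0.039]]
qs = calibrate(rates, [0.95974, 0.9382], dt=0.5)
print([round(q, 3) for q in qs])
```

Each new maturity only requires the previously calibrated probabilities, so the loop never revisits earlier periods.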

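[Tree diagram: four six-month periods (t = 0, 0.5, 1, 1.5, 2), with risk-neutral probabilities q0, q1 and q2 over the first three periods. The market price P$(0, 2) sits at the root, the filled nodes are those at t = 1.5, and the 2Y zero pays 1 in all states at t = 2.]

FIGURE 11.10.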

11.4.4 Calibrating probabilities through derivative data


This section deals with two numerical examples where we exploit information from derivative
data to say something about the assets underlying the very same derivative contracts. Namely,
we shall use derivative data to calibrate risk-neutral probabilities.
11.4.4.1 Options

Suppose that a two year zero coupon bond is traded for a price equal to $P^{\$}(0, 2) = 0.95500$. We assume that the short-term rate evolves over time according to the tree described in the following diagram.

[Tree diagram: short-term rate.]
t = 0: r = 2%
t = 1: r = 3% or r = 2%
t = 2: r = 3.5%, r = 3% or r = 2%
t = 3: r = 3.5%, r = 3%, r = 2% or r = 1%

Suppose that a European call option written on a three year zero coupon bond is traded. This option has a strike price equal to 0.97000, expires in two years, and quotes for $C^{\$}(0, 2) = 1.0141 \times 10^{-3}$. We can use the price of this derivative to find the no-arbitrage price of a three year bond which, every year, pays off 3% of the principal of 1.00. Precisely, we use the price of the two year zero coupon bond to recover the risk-neutral probability applying to the first year, and the price of the option to recover the risk-neutral probability applying to the second year. With these probabilities, we determine the no-arbitrage price of the three year bond. (We assume the two probabilities are state-independent, for otherwise we would need the price of additional assets to reverse-engineer state-dependent risk-neutral probabilities.)
So we know that $P^{\$}(0, 2) = 0.95500$. Moreover, as illustrated below, we can extract the price of the 2Y bond in the up- and down-states of the world at time $t = 1$.
[Tree diagram: the 2Y zero, with q0 = ?.]
t = 0: r = 2%
t = 1: r = 3%, Pu(1, 2) = 1/1.03; r = 2%, Pd(1, 2) = 1/1.02
t = 2: the zero pays 1 in all states

We have $P_u(1, 2) = \frac{1}{1.03} = 0.97087$, and $P_d(1, 2) = \frac{1}{1.02} = 0.98039$. We can now solve for the risk-neutral probability. We have,

$$P^{\$}(0, 2) = 0.95500 = \frac{1}{1.02}\left(q_0\,P_u(1, 2) + (1-q_0)\,P_d(1, 2)\right) = \frac{1}{1.02}\left(q_0\,0.97087 + (1-q_0)\,0.98039\right).$$

Solving for $q_0$ yields $q_0 = 0.6607$. We use this probability, and the price of the option, $C^{\$}(0, 2)$, to solve for the risk-neutral probability relevant to the second period, as illustrated below.
[Tree diagram: calibration of q1 from the option price, with q0 = 0.6607 and strike K = 0.97.]
t = 0: r = 2%
t = 1: r = 3%, Cu = ?; r = 2%, Cd = ?
t = 2: r = 3.5%, Puu(2, 3) = 1/1.035 = 0.96618; r = 3%, Pud(2, 3) = 1/1.03 = 0.97087; r = 2%, Pdd(2, 3) = 1/1.02 = 0.98039
t = 3: the zero pays 1 in all states

In this tree, $K = 0.97000$ is the strike price of the option. The option price at time $t = 1$, in the two states, can be either $C_u$ or $C_d$, where:

$$C_u = \frac{1}{1.03}\left[q_1 \max\{P_{uu}(2, 3) - K, 0\} + (1-q_1)\max\{P_{ud}(2, 3) - K, 0\}\right] = \frac{1}{1.03}\,(1-q_1)\,0.00087$$

$$C_d = \frac{1}{1.02}\left[q_1 \max\{P_{ud}(2, 3) - K, 0\} + (1-q_1)\max\{P_{dd}(2, 3) - K, 0\}\right] = \frac{1}{1.02}\left[q_1\,0.00087 + (1-q_1)\,0.01039\right]$$

Hence, the option price satisfies,

$$C^{\$}(0, 2) = 1.0141 \times 10^{-3} = \frac{1}{1.02}\left(q_0\,C_u + (1-q_0)\,C_d\right) = \frac{1}{1.02}\left(0.6607\,C_u + 0.3393\,C_d\right)$$

$$= \frac{1}{1.02}\left[0.6607\,\frac{1}{1.03}\,(1-q_1)\,0.00087 + 0.3393\,\frac{1}{1.02}\left(q_1\,0.00087 + (1-q_1)\,0.01039\right)\right].$$

Solving for $q_1$ yields $q_1 = 0.8000$.

Next, we determine the price of the zero maturing at time 3, $P(0, 3)$. We use the diagram in Figure 11.11.

[Tree diagram: the 3Y zero, with q0 = 0.6607 and q1 = 0.8000.]
t = 0: r = 2%
t = 1: r = 3%, Pu(1, 3) = ?; r = 2%, Pd(1, 3) = ?
t = 2: Puu(2, 3) = 0.96618; Pud(2, 3) = 0.97087; Pdd(2, 3) = 0.98039
t = 3: the zero pays 1 in all states

FIGURE 11.11.

We have,

$$P_u(1, 3) = \frac{1}{1.03}\left[q_1\,P_{uu}(2, 3) + (1-q_1)\,P_{ud}(2, 3)\right] = \frac{1}{1.03}\left(0.80 \cdot 0.96618 + 0.20 \cdot 0.97087\right) = 0.93895$$

$$P_d(1, 3) = \frac{1}{1.02}\left[q_1\,P_{ud}(2, 3) + (1-q_1)\,P_{dd}(2, 3)\right] = \frac{1}{1.02}\left(0.80 \cdot 0.97087 + 0.20 \cdot 0.98039\right) = 0.95370$$

The price of a 3Y zero coupon bond, embedded in the market prices, $P^{\$}(0, 2)$ and $C^{\$}(0, 2)$, is therefore:

$$P(0, 3) = \frac{1}{1.02}\left[q_0\,P_u(1, 3) + (1-q_0)\,P_d(1, 3)\right] = \frac{1}{1.02}\left(0.6607 \cdot 0.93895 + 0.3393 \cdot 0.95370\right) = 0.92545$$

We are now ready to evaluate the 3Y bond with 3% coupon rate. It is,

$$B_{\text{coupon}=3\%}(0, 3) = 0.03\left[P^{\$}(0, 1) + P^{\$}(0, 2) + P(0, 3)\right] + P(0, 3) = 0.03\,(0.98039 + 0.95500 + 0.92545) + 0.92545 = 1.0113$$
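The chain of calculations in this example (2Y zero, then option, then 3Y zero, then coupon bond) can be sketched as follows. Variable and function names are hypothetical, and the sketch works with unrounded node prices, so the outputs match the figures in the text only up to the rounding used there.

```python
# Step 1: q0 from the 2Y zero (one linear equation).
p_u12, p_d12 = 1 / 1.03, 1 / 1.02                  # 2Y zero at t = 1
q0 = (p_d12 - 0.95500 * 1.02) / (p_d12 - p_u12)

# Step 2: q1 from the option struck at 0.97 on the 3Y zero.
p_uu, p_ud, p_dd = 1 / 1.035, 1 / 1.03, 1 / 1.02   # 3Y zero at t = 2
payoff = [max(p - 0.97, 0.0) for p in (p_uu, p_ud, p_dd)]

def option_price(q1):
    c_u = (q1 * payoff[0] + (1 - q1) * payoff[1]) / 1.03
    c_d = (q1 * payoff[1] + (1 - q1) * payoff[2]) / 1.02
    return (q0 * c_u + (1 - q0) * c_d) / 1.02

f0, f1 = option_price(0.0), option_price(1.0)      # price is linear in q1
q1 = (f0 - 1.0141e-3) / (f0 - f1)

# Step 3: the 3Y zero and the 3% coupon bond implied by q0 and q1.
p_u13 = (q1 * p_uu + (1 - q1) * p_ud) / 1.03
p_d13 = (q1 * p_ud + (1 - q1) * p_dd) / 1.02
p03 = (q0 * p_u13 + (1 - q0) * p_d13) / 1.02
bond = 0.03 * (1 / 1.02 + 0.95500 + p03) + p03
print(round(q0, 4), round(q1, 4), round(p03, 5), round(bond, 4))
```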

The discretely compounded yield curve implied by the previous calculations is given by $y_{0,1} = 2.00\%$ (1Y); $y_{0,2}$: $0.95500 = (1 + y_{0,2})^{-2}$, or $y_{0,2} = 2.328\%$ (2Y); and $y_{0,3}$: $0.92545 = (1 + y_{0,3})^{-3}$, or $y_{0,3} = 2.616\%$ (3Y). Note, we are capable of computing the yield curve without knowing all bond data, but inverting some of them from the price of an option! We can go further. Suppose the price of the missing bond becomes available, so to speak. We want to make sure this price is consistent with absence of arbitrage. Suppose, for example, that the market price is $P^{\$}(0, 3) > P(0, 3) = 0.92545$, say. Then, we can sell short the 3Y zero, and set up a dynamic, self-financing strategy aiming to replicate the 3Y zero, i.e. capable of delivering $1 at maturity.
We would proceed as follows. Consider the tree in Figure 11.11. We build up a portfolio, which is long the option and a money market account (MMA). We assume the 3Y bond converges to the values $P_{uu}$, $P_{ud}$ and $P_{dd}$ in Figure 11.11, for otherwise we might implement a trivial arbitrage from time $t = 2$ to $t = 3$. At time $t = 0$, we go long $\Delta_0$ options and $M_0$ units of the MMA, so as to make sure that the portfolio delivers $P_u(1, 3)$ in the upstate or $P_d(1, 3)$ in the downstate, thereby ensuring that the price of the bond is replicated at time $t = 1$. The value of this replicating strategy is, of course, $P(0, 3)$, so by short-selling the 3Y bond at $t = 0$, we obtain an initial profit, equal to $P^{\$}(0, 3) - P(0, 3)$. Suppose, then, that at time $t = 1$, we are in the up-node, such that the bond price is $P_u(1, 3)$. In this node, we can build up another portfolio long $\Delta_1$ options and $M_1$ units of the MMA, aiming to replicate the price of the bond at time $t = 2$, either $P_{uu}(2, 3)$ or $P_{ud}(2, 3)$. The value of this replicating portfolio would be just $P_u(1, 3)$, which is what is obtained by the replicating strategy implemented at time $t = 0$. The strategy is clearly self-financed, as the following calculations reveal. By construction, $\Delta_0 C_u(1) + M_0 (1 + r_0) = P_u(1, 3) = \Delta_1 C_u(1) + M_1$ (with $r_0 = 2\%$), and $\Delta_1 C(2) + M_1 (1 + r_1) = P(2, 3)$, where $r_1 = 3\%$, and: $C(2)$, $P(2, 3)$ are either $C_{uu}(2)$, $P_{uu}(2, 3)$, or $C_{ud}(2)$, $P_{ud}(2, 3)$, at time $t = 2$, with straightforward notation. Therefore, we have that,

$$P(2, 3) - P_u(1, 3) = \Delta_1\left(C(2) - C_u(1)\right) + r_1 M_1 = \Delta_1\left(C(2) - C_u(1)\right) + r_1\left(P_u(1, 3) - \Delta_1 C_u(1)\right).$$

Likewise, if, instead, at time $t = 1$, we end up in the down-node, where the bond price (and the value of the strategy implemented at time $t = 0$) is $P_d(1, 3)$, we can invest $P_d(1, 3)$ in options and MMA so as to replicate the price of the bond at time $t = 2$, either $P_{ud}(2, 3)$ or $P_{dd}(2, 3)$. The presence of dynamically complete markets allows us to implement an arbitrage.
11.4.4.2 Arrow-Debreu securities, and the pricing of interest rate derivatives

Arrow-Debreu securities are assets that only pay over a specific state of the world, as explained in Chapter 2. We shall deal with these securities in more detail in Section 11.7, because they allow us to implement perfectly fitting models quite elegantly. This section is a first introduction to them. We make use of Arrow-Debreu securities to, firstly, extract risk-neutral probabilities, and, secondly, to price quite basic interest rate derivatives, such as a caplet or a forward rate agreement. Likewise, the pricing of interest rate derivatives covered in this section is a preliminary introduction, as the next chapter will systematically deal with it, within a continuous time setting.
Extracting risk-neutral probabilities from Arrow-Debreu securities

Assume that the discretely compounded one-year rate, or the short-term rate, evolves over
time as described by the following tree:
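[Tree diagram: short-term rate, with risk-neutral up-probabilities q0, q1 and q2 over the three periods.]
t = 0: r = 5%
t = 1: u: r = 6%; d: r = 4%
t = 2: uu: r = 7%; ud: r = 5%; dd: r = 4%
t = 3: uuu: r = 7.5%; uud: r = 6.0%; udd: r = 4.5%; ddd: r = 3.0%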

Assume three securities are available for trading: (i) a zero coupon bond expiring in two years, quoting for 0.91000; (ii) a zero coupon bond expiring in three years, quoting for 0.86500; (iii) an Arrow-Debreu security, paying off $1 only at time 3 in the state uuu of the previous diagram, where the short-term rate equals 7.5%, quoting for 0.10000.
Assume that the risk-neutral probabilities of upward movements in the short-term rate change over time and take three values, $q_0$, $q_1$ and $q_2$, but are independent of the state of nature, as illustrated in the previous diagram. We can calibrate these probabilities through the previously given available market data. First, we derive $q_0$ using the price of the two year bond, $P^{\$}(0, 2)$ say, as follows:

$$P^{\$}(0, 2) = 0.91000 = \frac{1}{1.05}\left(q_0\,P_u(1, 2) + (1-q_0)\,P_d(1, 2)\right),$$

where

$$P_u(1, 2) = \frac{1}{1.06} = 0.94340, \qquad P_d(1, 2) = \frac{1}{1.04} = 0.96154.$$

Solving for $q_0$ yields $q_0 = 0.33284$. Next, we calibrate $q_1$ to match the price of the three year bond, $P^{\$}(0, 3)$ say. We have,

$$P^{\$}(0, 3) = 0.86500 = \frac{1}{1.05}\left(q_0\,P_u(1, 3) + (1-q_0)\,P_d(1, 3)\right) = \frac{1}{1.05}\left(0.33284\,P_u(1, 3) + 0.66716\,P_d(1, 3)\right),$$

where

$$P_u(1, 3) = \frac{1}{1.06}\left(q_1\,P_{uu}(2, 3) + (1-q_1)\,P_{ud}(2, 3)\right), \qquad P_d(1, 3) = \frac{1}{1.04}\left(q_1\,P_{ud}(2, 3) + (1-q_1)\,P_{dd}(2, 3)\right),$$

and

$$P_{uu}(2, 3) = \frac{1}{1.07} = 0.93458, \qquad P_{ud}(2, 3) = \frac{1}{1.05} = 0.95238, \qquad P_{dd}(2, 3) = \frac{1}{1.04} = 0.96154.$$

Solving for $q_1$ yields $q_1 = 0.66507$. Finally, we calibrate $q_2$ through the price of the Arrow-Debreu security paying in state uuu. This price, denoted as $AD^{\$}(3)$, is given by:

$$AD^{\$}(3) = 0.10000 = \frac{1}{1.05}\,\frac{1}{1.06}\,\frac{1}{1.07}\,q_0\,q_1\,q_2 = \frac{1}{1.05}\,\frac{1}{1.06}\,\frac{1}{1.07}\,0.33284 \cdot 0.66507\,q_2,$$

i.e. $q_2 = 0.53798$.
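The three calibration steps can be sketched in sequence. Names are hypothetical and the arithmetic is unrounded, so the last digits may differ slightly from the values reported in the text.

```python
# Step 1: q0 from the 2Y zero (linear in q0).
p_u12, p_d12 = 1 / 1.06, 1 / 1.04
q0 = (p_d12 - 0.91000 * 1.05) / (p_d12 - p_u12)

# Step 2: q1 from the 3Y zero; the model price is linear in q1.
p_uu, p_ud, p_dd = 1 / 1.07, 1 / 1.05, 1 / 1.04

def zero_3y(q1):
    p_u = (q1 * p_uu + (1 - q1) * p_ud) / 1.06
    p_d = (q1 * p_ud + (1 - q1) * p_dd) / 1.04
    return (q0 * p_u + (1 - q0) * p_d) / 1.05

f0, f1 = zero_3y(0.0), zero_3y(1.0)
q1 = (f0 - 0.86500) / (f0 - f1)

# Step 3: q2 from the Arrow-Debreu security paying $1 in state uuu, whose
# price is the discounted risk-neutral probability of three up-moves.
q2 = 0.10000 * 1.05 * 1.06 * 1.07 / (q0 * q1)
print(round(q0, 5), round(q1, 5), round(q2, 5))
```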

Pricing interest rate derivatives

Next, we use the previously calibrated probabilities to price some interest rate derivatives. First, consider a caplet contingent on the rates prevailing at time $t = 3$, paying off at time $t = 4$, with strike rate equal to 5%, and notional value equal to $100. The payoff of this derivative instrument at $t = 4$ is $100 \cdot \max\{r - 5\%, 0\}$, where $r$ is the rate at $t = 3$. Therefore, the discounted payoffs at time $t = 3$ are:

uuu: $\frac{1}{1.075}\max\{7.5 - 5, 0\} = 2.32560$
uud: $\frac{1}{1.06}\max\{6 - 5, 0\} = 0.94340$
udd: $\frac{1}{1.045}\max\{4.5 - 5, 0\} = 0$
ddd: $\frac{1}{1.03}\max\{3 - 5, 0\} = 0$

As for time $t = 2$ and $t = 1$, we have:

t = 2:
uu: $C_{uu} = \frac{1}{1.07}\left(q_2\,2.32560 + (1-q_2)\,0.94340\right) = 1.5766$
ud: $C_{ud} = \frac{1}{1.05}\left(q_2\,0.94340 + (1-q_2)\,0\right) = 0.48336$
dd: $C_{dd} = 0$

t = 1:
u: $C_u = \frac{1}{1.06}\left(q_1\,C_{uu} + (1-q_1)\,C_{ud}\right) = 1.1419$
d: $C_d = \frac{1}{1.04}\left(q_1\,C_{ud} + (1-q_1)\,C_{dd}\right) = 0.30910$

Therefore, the price of the caplet is:

$$\text{Caplet Price} = \frac{1}{1.05}\left(q_0\,C_u + (1-q_0)\,C_d\right) = 0.55837$$
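The caplet valuation is a backward induction with time-varying probabilities, and can be sketched as below. The function name `present_value` and the tree layout are assumptions made for this illustration; the rates and the calibrated probabilities are those given in the text.

```python
rates = [[0.05], [0.06, 0.04], [0.07, 0.05, 0.04],
         [0.075, 0.06, 0.045, 0.03]]    # rates[t][j]: j down-moves at time t
qs = [0.33284, 0.66507, 0.53798]        # calibrated q0, q1, q2 from the text

def present_value(payoff_t4, rates, qs):
    """Value at t = 0 of a payoff paid at t = 4 but fixed at t = 3;
    payoff_t4[j] is the amount fixed in node j of the last rate layer."""
    values = [x / (1 + r) for x, r in zip(payoff_t4, rates[-1])]
    for t in range(len(rates) - 2, -1, -1):
        values = [(qs[t] * values[j] + (1 - qs[t]) * values[j + 1])
                  / (1 + rates[t][j]) for j in range(t + 1)]
    return values[0]

# Caplet: notional 100, strike rate 5%.
caplet = present_value([100 * max(r - 0.05, 0.0) for r in rates[-1]],
                       rates, qs)
print(round(caplet, 4))
```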

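Next, consider a forward rate agreement, whereby at time $t = 0$, two counterparties agree that at time $t = 4$, they will exchange with each other the variable short-term rate prevailing at time $t = 3$, against a fixed interest rate equal to $K$. We can use the previously calibrated probabilities to determine the forward rate, i.e. the level of $K$ that makes the value of this agreement equal to zero at time $t = 0$. Take the case of a payer forward agreement, one for which the discounted payoffs at time $t = 3$ are:

uuu: $\frac{1}{1.075}\,(7.5 - K)$
uud: $\frac{1}{1.06}\,(6 - K)$
udd: $\frac{1}{1.045}\,(4.5 - K)$
ddd: $\frac{1}{1.03}\,(3 - K)$

At time $t = 2$ and $t = 1$, the values are, with $F_{uuu}, \ldots, F_{ddd}$ denoting the discounted payoffs at time $t = 3$ listed above:

t = 2:
uu: $F_{uu} = \frac{1}{1.07}\left(q_2\,F_{uuu} + (1-q_2)\,F_{uud}\right) = 5.9519 - 0.87506\,K$
ud: $F_{ud} = \frac{1}{1.05}\left(q_2\,F_{uud} + (1-q_2)\,F_{udd}\right) = 4.7950 - 0.90443\,K$
dd: $F_{dd} = \frac{1}{1.04}\left(q_2\,F_{udd} + (1-q_2)\,F_{ddd}\right) = 3.5215 - 0.92632\,K$

t = 1:
u: $F_u = \frac{1}{1.06}\left(q_1\,F_{uu} + (1-q_1)\,F_{ud}\right) = 5.2495 - 0.83481\,K$
d: $F_d = \frac{1}{1.04}\left(q_1\,F_{ud} + (1-q_1)\,F_{dd}\right) = 4.2004 - 0.87669\,K$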
We can now express the value of the contract as a function of the fixed rate $K$, as follows:

$$\text{Fwd}(K) \equiv \frac{q_0\,F_u + (1-q_0)\,F_d}{1.05} = 4.3329 - 0.82167\,K \qquad (11.25)$$

where $F_u$ and $F_d$ are the values of the agreement at time $t = 1$ in the up and down states. The forward rate is, simply, the value of $K$ such that $\text{Fwd}(K) = 0$, i.e. $K = 5.27330\%$. More generally, we can determine the value of the forward rate agreement in Eq. (11.25), for any value of $K$. For example, we have that $\text{Fwd}(K = 6) = 4.3329 - 0.82167 \cdot 6 = -0.59712$, in percentage terms.
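Because the value of the agreement is linear in the fixed rate, the forward rate is a ratio of two present values, which can be sketched as follows. The helper `present_value` and the variable names are hypothetical; rates and probabilities are those of the text, and payoffs are expressed in percentage points as in Eq. (11.25).

```python
rates = [[0.05], [0.06, 0.04], [0.07, 0.05, 0.04],
         [0.075, 0.06, 0.045, 0.03]]    # rates[t][j]: j down-moves at time t
qs = [0.33284, 0.66507, 0.53798]        # calibrated q0, q1, q2 from the text

def present_value(payoff_t4):
    """Value at t = 0 of an amount paid at t = 4 and fixed at t = 3."""
    values = [x / (1 + r) for x, r in zip(payoff_t4, rates[-1])]
    for t in range(len(rates) - 2, -1, -1):
        values = [(qs[t] * values[j] + (1 - qs[t]) * values[j + 1])
                  / (1 + rates[t][j]) for j in range(t + 1)]
    return values[0]

floating_leg = present_value([100 * r for r in rates[-1]])  # intercept
zero_4y = present_value([1.0] * 4)                          # slope, P(0, 4)
forward_rate = floating_leg / zero_4y                       # solves Fwd(K) = 0

def fra_value(k):
    """Value of the payer agreement for a fixed rate k, in percent."""
    return floating_leg - zero_4y * k

print(round(forward_rate, 4), round(fra_value(6.0), 4))
```

The slope of the value function is the price of a zero maturing at $t = 4$, since the fixed leg is a sure payment at that date.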
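Finally, we can derive the price of a bond expiring at time $t = 4$, $P(0, 4)$, through the relation, $\frac{P^{\$}(0, 3) - P(0, 4)}{P(0, 4)} = \frac{P^{\$}(0, 3)}{P(0, 4)} - 1 = 5.27330\%$, leading to $P(0, 4) = 0.82170$, which is indeed the same figure we can obtain by solving the tree for $P(0, 4)$, as we now show. We have, for time $t = 3$,

uuu: $P_{uuu}(3, 4) = \frac{1}{1.075} = 0.93023$
uud: $P_{uud}(3, 4) = \frac{1}{1.06} = 0.94340$
udd: $P_{udd}(3, 4) = \frac{1}{1.045} = 0.95694$
ddd: $P_{ddd}(3, 4) = \frac{1}{1.03} = 0.97087$

Then, we can solve, recursively, as usual:

t = 2:
uu: $P_{uu}(2, 4) = \frac{1}{1.07}\left(q_2\,P_{uuu}(3, 4) + (1-q_2)\,P_{uud}(3, 4)\right) = 0.87506$
ud: $P_{ud}(2, 4) = \frac{1}{1.05}\left(q_2\,P_{uud}(3, 4) + (1-q_2)\,P_{udd}(3, 4)\right) = 0.90443$
dd: $P_{dd}(2, 4) = \frac{1}{1.04}\left(q_2\,P_{udd}(3, 4) + (1-q_2)\,P_{ddd}(3, 4)\right) = 0.92632$

t = 1:
u: $P_u(1, 4) = \frac{1}{1.06}\left(q_1\,P_{uu}(2, 4) + (1-q_1)\,P_{ud}(2, 4)\right) = 0.83481$
d: $P_d(1, 4) = \frac{1}{1.04}\left(q_1\,P_{ud}(2, 4) + (1-q_1)\,P_{dd}(2, 4)\right) = 0.87669$

Finally, we calculate the price $P(0, 4) = \frac{1}{1.05}\left(q_0\,P_u(1, 4) + (1-q_0)\,P_d(1, 4)\right) = 0.82167 \approx 0.82170$, by rounding.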

11.4.5 Extensions to trinomial trees


Trinomial trees might be useful to model mean-reversion. They might, also, be considered as approximations to partial differential equations.

11.5 The Ho and Lee model


Ho and Lee (1986) develop a revolutionary approach to modeling yield curve movements. This approach does not aim at an economic theory that explains the yield curve we observe. Rather, it takes the yield curve as given, and shifts the focus towards the modeling of no-arbitrage movements of the entire yield curve. As explained, we need to match model prices to market prices, to avoid having derivatives with underlyings deviating from market prices.
The next chapter derives the Ho & Lee model in continuous time, as this derivation allows
us to illustrate a general approach to interest rate modeling, developed later by Heath, Jarrow
and Morton (1992). The original derivation of the model is, however, in discrete time, and it is
quite instructive to follow this approach here, and to compare with the alternative calibration
methods of this chapter.
The main idea underlying Ho and Lee is that the movements of the yield curve can be modeled through a binomial tree, much in the spirit of the Cox, Ross and Rubinstein (1979) tree representation of Black and Scholes (1973). However, in Black and Scholes (1973) and Cox, Ross and Rubinstein (1979), the asset underlying the option contract is a traded risk, such that the underlying price satisfies the martingale condition. Instead, interest rate derivatives generally depend on non-traded risks, which are not martingales. Moreover, the mere presence of boundary conditions induces bond return volatility to be time-varying.
Ho and Lee address these issues by modeling the movements of the entire collection of bond prices. We have three ways to achieve this task: (i) by making risk-neutral probabilities time-varying, for a given tree with predetermined values of the short-term rate (as in the previous sections); (ii) by assuming a constant risk-neutral probability, and searching for the values of the short-term rate on the tree; (iii) by a combination of (i) and (ii) as in the implied binomial tree of Chapter 10. The Ho and Lee model relies on the second way. The key element of this model is the determination of the no-arbitrage ups and downs of the entire yield curve, through modeling bond prices.
11.5.1 The tree
The key element of the model is the determination of the no-arbitrage ups and downs of the entire yield curve, obtained by directly modeling bond prices of arbitrary expiration. Note that once bond prices are obtained, forward rates are obtained as a result. Therefore, the Ho and Lee model is a model of forward rate movements. It is a simple but powerful remark, because the key point of the model is, then, to re-express bond prices again, as a function of future forward rates. In this sense, the Ho and Lee model is a representation of current bond prices (in terms of forward rates), rather than a model of current bond prices.
Assume that the price of any zero evolves according to a binomial tree. Let $P_j(t,T)$ be the price of a pure discount bond as of time $t$, maturing at time $T$, after $j$ upstate price movements of the bond price. Let $j \sim \mathrm{Bin}(t, 1-q)$, a binomial random variable, meaning that
$$\Pr\left(P_j(t,T)\right) = \binom{t}{j}(1-q)^{j}q^{t-j},$$
where $1-q$ is the risk-neutral probability of a single upstate movement of the price. Therefore we have,
$$P_j(t,T) \longrightarrow \begin{cases} P_{j+1}(t+1,T) & \text{with probability } 1-q \\ P_j(t+1,T) & \text{with probability } q \end{cases}$$
That is, if at time $t$, the number of upstate movements is equal to $j$ then, at time $t+1$, the number of upstate movements can either jump to $j+1$, with probability $1-q$, or stay at $j$, with probability $q$. (Therefore, we are now following the convention to have high values of bond prices in the upper parts of the tree.) Note, further, that after one period, any zero is one period closer to maturity. At maturity, $t = T$, the price of any zero is worth one unit of numéraire, viz
$$P_j(T,T) = 1, \quad \text{for all } j \text{ and } T.$$
Note, in the previous tree, it shall not necessarily hold that $P_j(t+1,T) < P_j(t,T)$. On the contrary, we would expect that especially when time-to-maturity approaches zero, $P_j(t+1,T) > P_j(t,T)$, as the price of the zero needs to converge to par.

11.5.2 Price movements and the martingale restriction
In the absence of arbitrage opportunities, the expected return on the zero at $t$ must equal the short-term rate, viz $P_j(t,T) = e^{-r_j(t)}\,\mathbb{E}_t\left(P(t+1,T)\right)$, or
$$P_j(t,T) = P_j(t,t+1)\left[(1-q)\,P_{j+1}(t+1,T) + q\,P_j(t+1,T)\right], \tag{11.26}$$
where $P_j(t,t+1) = e^{-r_j(t)}$, and $r_j(t)$ is the continuously compounded short-term rate at time $t$ after $j$ upward movements. We call this condition the martingale restriction.
Let us introduce notation for the movements of the price of any zero along the tree,
$$P_{j+1}(t+1,T) = \frac{P_j(t,T)}{P_j(t,t+1)}\,u(T-t) \ \ (\text{up at } t+1) \quad \text{and} \quad P_j(t+1,T) = \frac{P_j(t,T)}{P_j(t,t+1)}\,d(T-t) \ \ (\text{down at } t+1). \tag{11.27}$$
The two functions, $u(\cdot)$ and $d(\cdot)$, also called perturbation functions, are taken to be state-independent. They capture the fact that in the case of uncertainty, the price of the zero can either go up or down with respect to the risk-free rate of return. In other words, Eqs. (11.27) tell us that the discounted gross return from going long a bond is:
$$\underbrace{P_j(t,t+1)}_{\text{Discount}} \cdot \underbrace{\frac{P(t+1,T)}{P_j(t,T)}}_{\text{Gross return}} = \begin{cases} u(T-t) & \text{with probability } 1-q \\ d(T-t) & \text{with probability } q \end{cases}$$
where the two functions $u(T-t)$ and $d(T-t)$ have to be endogenously determined. If there was no uncertainty, we would have $u(T-t) = d(T-t) = 1$, for all $T-t$. In general, we have that $u(T-t) \geq 1 \geq d(T-t)$, as we shall now demonstrate.
One period before the expiration date, i.e. at $t = T-1$, our price is certain to jump to one, with jump size equal to the short-term rate $r_j(t)$. Hence, the following boundary condition for the two functions $u(\cdot)$ and $d(\cdot)$ holds:
$$u(1) = d(1) = 1. \tag{11.28}$$
In terms of the two functions $u(\cdot)$ and $d(\cdot)$, the martingale restriction in Eq. (11.26) is,
$$1 = (1-q)\,u(T-t) + q\,d(T-t). \tag{11.29}$$
This relation matches the martingale restriction introduced in Chapter 10 for stocks, the current values of which are tied down to their future up and down movements through the risk-neutral probability. The difference in this context is that the up and down movements of the prices depend on the asset maturity, through the two functions $u(T-t)$ and $d(T-t)$, which are endogenous, which makes the evaluation problem more intricate.
11.5.3 The recombining condition and interest rate volatility
Ho and Lee consider a recombining tree: the price $P_j(t,T)$ we are looking for depends only on $j$, not on the exact sequence of up and down movements leading to $j$ upstate movements. To summarize, we are looking for two functions $u(T-t)$ and $d(T-t)$ such that (i) the no-arbitrage condition in Eq. (11.29) holds true and (ii) the tree is recombining. We now elaborate on the arguments that lead to the recombining property of the tree.
$$
\begin{array}{ccccc}
 & & & & P_{j+2}(t+2,T) \\
 & & P_{j+1}(t+1,T) & \nearrow & \\
P_j(t,T) & \nearrow & & \searrow & P_{j+1}(t+2,T) \\
 & \searrow & P_j(t+1,T) & \nearrow & \\
 & & & \searrow & P_j(t+2,T)
\end{array}
$$
The recombining property of the tree implies that the bond price at time $t+2$ in the event of $j+1$ jumps, i.e. $P_{j+1}(t+2,T)$, can be generated by one of the two paths, which we track by using the two Eqs. (11.27):
(i) The up & down path, $P_j(t,T) \to P_{j+1}(t+1,T) \to P_{j+1}(t+2,T)$, i.e. up at $t+1$ and down at $t+2$, where,
$$P_{j+1}(t+2,T) = \frac{P_{j+1}(t+1,T)}{P_{j+1}(t+1,t+2)}\,d(T-t-1) = \frac{P_j(t,T)}{P_j(t,t+1)}\,\frac{u(T-t)\,d(T-t-1)}{P_{j+1}(t+1,t+2)}. \tag{11.30}$$
(ii) The down & up path, $P_j(t,T) \to P_j(t+1,T) \to P_{j+1}(t+2,T)$, i.e. down at $t+1$ and up at $t+2$, where
$$P_{j+1}(t+2,T) = \frac{P_j(t+1,T)}{P_j(t+1,t+2)}\,u(T-t-1) = \frac{P_j(t,T)}{P_j(t,t+1)}\,\frac{d(T-t)\,u(T-t-1)}{P_j(t+1,t+2)}. \tag{11.31}$$
By equating Eq. (11.30) and (11.31), we obtain,
$$\frac{d(T-t)}{u(T-t)} = \frac{d(T-t-1)}{u(T-t-1)}\cdot\frac{P_j(t+1,t+2)}{P_{j+1}(t+1,t+2)} \equiv \delta\,\frac{d(T-t-1)}{u(T-t-1)}, \tag{11.32}$$
where we take the ratio $\delta$ to be constant, and return to its interpretation in a moment. Eq. (11.32) is a finite-difference equation for the ratio $d(T-t)/u(T-t)$. Given the boundary conditions in Eqs. (11.28), the solution for this ratio is:
$$\frac{d(T-t)}{u(T-t)} = \delta^{T-t-1}. \tag{11.33}$$

We claim that $\delta$ in Eq. (11.32) is a constant relating to the volatility of the short-term rate. Indeed, by taking logs, we obtain:
$$\ln\delta = \ln\frac{P_j(t+1,t+2)}{P_{j+1}(t+1,t+2)} = r_{j+1}(t+1) - r_j(t+1). \tag{11.34}$$
Moreover, conditionally upon time $t$ and price jumps equal to $j$, the short-term rate is binomially distributed, and can take on two values: (i) $r_{j+1}(t+1)$ with probability $1-q$ and (ii) $r_j(t+1)$ with probability $q$. Then, the conditional variance of the short-term rate, $\mathrm{var}_t[r(t+1)]$ say, satisfies,
$$\sqrt{\mathrm{var}_t[r(t+1)]} = \sqrt{q(1-q)}\,\left|r_{j+1}(t+1) - r_j(t+1)\right| = \sqrt{q(1-q)}\,\ln\frac{1}{\delta},$$
where the second equality follows by Eq. (11.34).


There are no economic reasons to assume that $\delta$ is constant. Assuming $\delta$ to be time-varying and possibly state-dependent might actually help match more than the initial yield curve (say European options), similarly as for the implied binomial trees for equity reviewed in Chapter 10. At the same time, assuming $\delta$ to be time-varying would likely lead to models without closed-form solutions. We shall return to the topic of models without closed-form solution in Section 11.7. The next section shows that in the context where $\delta$ is constant, the bond price has a solution expressed in closed-form.
11.5.4 Model's solution
Eq. (11.33) yields the condition under which the tree is recombining. On the other hand, the martingale restriction in Eq. (11.29) is needed to prevent arbitrage. Therefore, we are left with the following system of two equations, Eq. (11.33) and Eq. (11.29), and two unknowns, $u(\cdot)$ and $d(\cdot)$,
$$\begin{cases} d(T-t) = \delta^{T-t-1}\,u(T-t) \\ (1-q)\,u(T-t) + q\,d(T-t) = 1 \end{cases}$$
The solution to this system is,
$$u(T-t) = \frac{1}{(1-q) + q\,\delta^{T-t-1}}, \qquad d(T-t) = \frac{\delta^{T-t-1}}{(1-q) + q\,\delta^{T-t-1}}. \tag{11.35}$$
So the problem is now solved. Once we assign values to $q$ and $\delta$, and an initial bond price $P_0(0,T)$, we plug Eqs. (11.35) into Eqs. (11.27), and populate the tree, i.e. we plug bond prices on each node of the tree. Once this is done, we can price interest rate derivatives, i.e. assets with payoffs indexed to bond prices or interest rates over a given set of nodes. Note that Eqs. (11.27) dictate the law of motion of the entire structure of bond prices along the tree, and they are silent regarding the initial condition, i.e. at time $t = 0$. The natural initial condition is the market price of the bond for each maturity, say $P_0(0,T) = P_\$(0,T)$ for all $T$.
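As a quick sanity check on Eqs. (11.35), the following Python sketch evaluates the two perturbation functions and verifies numerically the boundary condition (11.28), the martingale restriction (11.29) and the recombining condition (11.33). The function name `perturbation` and the values of $q$ and $\delta$ (those of the example in Section 11.5.6) are chosen for illustration only.

```python
from typing import Tuple


def perturbation(n: int, q: float, delta: float) -> Tuple[float, float]:
    """Return (u(n), d(n)) of Eq. (11.35), where n = T - t >= 1 is the
    number of periods to maturity before the price movement."""
    denom = (1.0 - q) + q * delta ** (n - 1)
    return 1.0 / denom, delta ** (n - 1) / denom


if __name__ == "__main__":
    q, delta = 0.5, 0.9802  # illustrative values
    for n in range(1, 6):
        u, d = perturbation(n, q, delta)
        # The last column should equal one: martingale restriction (11.29).
        print(n, round(u, 6), round(d, 6), (1 - q) * u + q * d)
```

With $\delta < 1$, the output shows $u(n) \geq 1 \geq d(n)$, with equality only at $n = 1$.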
We can actually do more, and develop a closed-form solution for the bond price. Let $F_j(t,s)$ be the forward rate as of time $t$ after the occurrence of $j$ upward price movements, applying to the period from $s$ to $s+1$, and let the continuously compounded forward rate $f_j(t,s)$ be defined as,
$$f_j(t,s) \equiv \ln\left(1 + F_j(t,s)\right).$$
In Appendix 2, we show that,
$$P_j(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-\sum_{s=t}^{T-1}\left(f_j(t,s) - f(0,s)\right)\right). \tag{11.36}$$
That is, we can express the price in closed-form, once we are able to do the same with the forward rate changes, $f_j(t,s) - f(0,s)$. In Appendix 2, we show that,
$$f_j(t,s) = f(0,s) + \ln\frac{u(s+1-t)}{u(s+1)} - (t-j)\ln\delta. \tag{11.37}$$
We replace Eq. (11.37) into Eq. (11.36), use the solution for the perturbation function $u(\cdot)$ in Eqs. (11.35), and, using the condition that the initial node of the tree is the same as the market price for each $T$, we find that:
$$P_j(t,T) = \frac{P_\$(0,T)}{P_\$(0,t)}\,\delta^{(T-t)(t-j)}\prod_{s=t}^{T-1}\frac{(1-q) + q\,\delta^{s-t}}{(1-q) + q\,\delta^{s}}. \tag{11.38}$$
From the perspective of time 0, the price of the zero at $t$, in each state $j$, is only a function of the initial yield curve, the volatility parameter $\delta$, and the risk-neutral probability $q$. Note in particular how important this volatility is. It is, in practice, the only parameter to be determined once we fix $q$, which leads to an interesting parallel between this model and that of Black and Scholes (1973). In both models, the inputs are the volatilities of the fundamentals (the short-term interest rate, in Ho & Lee, and stock returns, in Black & Scholes). However, in Ho & Lee, the input does not link to the short-term rate, but the entire yield curve, and the output relates to future bond price movements. In Black & Scholes, the input is that of the fundamental, the stock price, and the output is the initial option price.
Finally, the model displays the properties we would require from a perfectly fitting one, as illustrated by the next picture. First, it matches each price at time zero, as can be verified by setting $t = 0$. Second, it bridges $P_j(t,T)$ to one when $t = T$. Third, it is random at any time $0 < t < T$, taking the value $P_j(t,T)$ with probability $\binom{t}{j}(1-q)^{j}q^{t-j}$. Naturally, the values taken by $P_j(t,T)$ are determined by the equilibrium of the model, i.e. Eq. (11.38). All in all, the model predicts random outcomes at time $t$, ensuring at the same time that the initial yield curve is fitted without errors. The reason we might be interested in random outcomes at time $t$ is that we might have to evaluate options expiring at that date, through a model that predicts the underlying to be pinned down without errors as explained.
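The closed-form price in Eq. (11.38) lends itself to a direct numerical check. The Python sketch below is only an illustration: it assumes time is measured in periods, takes the three market prices, $q = 1/2$ and $\delta = 0.9802$ from the example of Section 11.5.6, and uses hypothetical function names. It evaluates Eq. (11.38) and verifies that the resulting prices satisfy the martingale restriction (11.26) at every node.

```python
from math import prod


def ho_lee_price(j, t, T, p0, q, delta):
    """Eq. (11.38): price at time t of the zero maturing at T, after j upward
    price movements. p0[k] is the time-0 market price of the zero maturing
    at k, with p0[0] = 1."""
    ratio = prod(((1 - q) + q * delta ** (s - t)) / ((1 - q) + q * delta ** s)
                 for s in range(t, T))
    return p0[T] / p0[t] * delta ** ((T - t) * (t - j)) * ratio


def martingale_gap(j, t, T, p0, q, delta):
    """Left-hand side minus right-hand side of Eq. (11.26) at node (j, t)."""
    lhs = ho_lee_price(j, t, T, p0, q, delta)
    rhs = ho_lee_price(j, t, t + 1, p0, q, delta) * (
        (1 - q) * ho_lee_price(j + 1, t + 1, T, p0, q, delta)
        + q * ho_lee_price(j, t + 1, T, p0, q, delta))
    return lhs - rhs


if __name__ == "__main__":
    p0 = [1.0, 0.9851, 0.9685, 0.9445]  # market prices from Section 11.5.6
    q, delta = 0.5, 0.9802
    for t in range(3):
        print(t, [round(ho_lee_price(j, t, 3, p0, q, delta), 4)
                  for j in range(t + 1)])
```

At $t = 0$ the function returns the market price itself, and at $t = T$ it returns one, in line with the first two properties listed above.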

[Figure: tree of bond prices starting from $P_0(0,T)$ at time 0, taking the values $P_j(t,T)$ for $j = 0, \dots, t$ at time $t$, and ending at $P_j(T,T) = 1$ at maturity.]

11.5.5 Calibration of the model

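We need to estimate the value of $\delta$. We can proceed as follows. Consider Eq. (11.38), and let $T = t + 1$. We have,
$$P_j(t,t+1) = \frac{P(0,t+1)}{P(0,t)}\,\frac{\delta^{t-j}}{(1-q) + q\,\delta^{t}}.$$
The continuously compounded short-term rate predicted by the model is,
$$r_j(t) \equiv -\ln P_j(t,t+1) = f(0,t) + \ln\left((1-q) + q\,\delta^{t}\right) - (t-j)\ln\delta, \tag{11.39}$$
where $f(0,t) \equiv \ln P(0,t) - \ln P(0,t+1)$. We also have,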

(0) = 1 (0)

0 (0) + ln ((1

)+

) + ln

(1

Hence, the parameter $\delta$ can be chosen such that the volatility of the short-term rate predicted by the model matches exactly the volatility of the short-term rate that we see in the data. Concretely, we can take $\delta = \exp\left(-\mathrm{Std}(r)/\sqrt{q(1-q)}\right)$, where $\mathrm{Std}(r)$ is the standard deviation of the short-term rate in the data.
Note, then, the interesting feature of the model. The Ho and Lee model does not take any a priori stance on the dynamics of the short-term rate. Rather, it imposes: (i) the martingale restriction on bond prices, an economic restriction, Eq. (11.29); and (ii) the simplifying assumption that the tree is recombining, a technical condition, Eq. (11.27). These two conditions suffice to tell what to expect from the dynamics of the short-term rate. While deliberately simple, the Ho and Lee model is quite powerful. The modern approach to interest rate modeling simply aims to make the Ho and Lee methodology more accurate for practical purposes.
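The calibration rule for $\delta$ can be illustrated with a short Python sketch. The function names and the input standard deviation of 0.01 per period are assumptions made for illustration; with $q = 1/2$, that hypothetical value happens to return a $\delta$ close to the 0.9802 used in the example of the next section.

```python
from math import exp, log, sqrt


def calibrate_delta(std_short_rate: float, q: float) -> float:
    """delta = exp(-Std(r) / sqrt(q(1 - q))), where Std(r) is the standard
    deviation of the one-period change in the short-term rate."""
    return exp(-std_short_rate / sqrt(q * (1.0 - q)))


def implied_std(delta: float, q: float) -> float:
    """Inverse map: sqrt(q(1 - q)) * ln(1 / delta)."""
    return sqrt(q * (1.0 - q)) * log(1.0 / delta)


if __name__ == "__main__":
    delta = calibrate_delta(0.01, 0.5)  # hypothetical Std(r) = 1% per period
    print(delta, implied_std(delta, 0.5))
```

The two functions are inverses of each other, so the round trip recovers the input standard deviation.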
11.5.6 An example
Assume that three zero coupon bonds are available for trading, with current market prices: (i) $P_\$(0,1) = 0.9851$ (the price of a 6M zero), (ii) $P_\$(0,2) = 0.9685$ (the price of a 1Y zero), and (iii) $P_\$(0,3) = 0.9445$ (the price of the 1.5Y zero). We know that the price of a one-period zero at time $t$, in the event of $j$ upward price-jumps from the current date to $t$, is:
$$P_j(t,t+1) = \frac{P_\$(0,t+1)}{P_\$(0,t)}\,\frac{\delta^{t-j}}{(1-q) + q\,\delta^{t}}, \tag{11.40}$$
where $P_\$(0,t)$ is the current market price of a zero expiring at time $t$, with $t$ equal to six months, one year and eighteen months, in this example. We assume that $q = \frac{1}{2}$ and $\delta = 0.9802$.
11.5.6.1 The dynamics of the short-term rate

We want to determine the developments of the short-term rate on a recombining tree for as many periods as we can, given the market price of the zeros we observe. We use Eq. (11.40) to find the one-period zeros in each node.

$t = 0$. We have, trivially, $P_0(0,1) = 0.9851$.

$t = 1$. We have two cases:

$j = 0$: $P_0(1,2) = 2\,\frac{P_\$(0,2)}{P_\$(0,1)}\,\frac{\delta}{1+\delta} = 0.9733$

$j = 1$: $P_1(1,2) = 2\,\frac{P_\$(0,2)}{P_\$(0,1)}\,\frac{1}{1+\delta} = 0.9930$

$t = 2$. We have three cases:

$j = 0$: $P_0(2,3) = 2\,\frac{P_\$(0,3)}{P_\$(0,2)}\,\frac{\delta^2}{1+\delta^2} = 0.9557$

$j = 1$: $P_1(2,3) = 2\,\frac{P_\$(0,3)}{P_\$(0,2)}\,\frac{\delta}{1+\delta^2} = 0.9750$

$j = 2$: $P_2(2,3) = 2\,\frac{P_\$(0,3)}{P_\$(0,2)}\,\frac{1}{1+\delta^2} = 0.9947$

So we face the tree below, where each branch has probability $q = \frac{1}{2}$ and, within each date, prices are listed from the top node to the bottom node:

$t = 0$: $P(0,1) = 0.9851$

$t = 1$: $P(1,2) = 0.9733$; $P(1,2) = 0.9930$

$t = 2$: $P(2,3) = 0.9557$; $P(2,3) = 0.9750$; $P(2,3) = 0.9947$
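The one-period zeros of this example can be reproduced with a few lines of Python. This is a minimal sketch of Eq. (11.40), with function names chosen for illustration; the inputs are the three market prices, $q = 1/2$ and $\delta = 0.9802$ given above.

```python
from math import log


def one_period_zero(j, t, p0, q, delta):
    """Eq. (11.40): price at time t of the zero maturing at t + 1, after j
    upward price jumps. p0[k] is the market price of the zero maturing in k
    periods, with p0[0] = 1."""
    return p0[t + 1] / p0[t] * delta ** (t - j) / ((1 - q) + q * delta ** t)


def one_period_zero_tree(p0, q, delta):
    """Rows t = 0, ..., len(p0) - 2; entry j = number of upward price jumps."""
    return [[one_period_zero(j, t, p0, q, delta) for j in range(t + 1)]
            for t in range(len(p0) - 1)]


if __name__ == "__main__":
    tree = one_period_zero_tree([1.0, 0.9851, 0.9685, 0.9445], 0.5, 0.9802)
    for t, row in enumerate(tree):
        prices = [round(p, 4) for p in row]
        rates = [round(-log(p), 4) for p in row]  # per-period short rates
        print(t, prices, rates)
```

The short-term rate in each node follows as $-\ln P_j(t,t+1)$, expressed per six-month period in this example.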

11.5.6.2 Pricing a coupon bearing bond

Suppose, now, that we want to find the price of some additional bond, e.g., a 1.5Y bond which pays (semiannually) coupons at 3% of the principal of \$1. First, we need to find the value of this bond in each node of the tree. Note, at each node, the price equals the sum of (i) the discounted expectation of its future value (including coupons), and (ii) the current coupon, as illustrated in the tree below. That is, the convention, here, is that the bond purchased at time $t$ does not give the owner the right to receive any coupon at time $t$, only from time $t+1$ onwards.

$t = 3$: the bond pays 1.03 in every node.

$t = 2$ (branches from $t = 1$ have probability $q_1 = \frac{1}{2}$):

uu: $P_{uu}(2,3) = 0.9557$, bond value $3\% + P_{uu}(2,3) \times 1.03 = 1.014$

ud: $P_{ud}(2,3) = 0.9750$, bond value $3\% + P_{ud}(2,3) \times 1.03 = 1.034$

dd: $P_{dd}(2,3) = 0.9947$, bond value $3\% + P_{dd}(2,3) \times 1.03 = 1.054$

$t = 1$ (branches from $t = 0$ have probability $q_0 = \frac{1}{2}$):

u: $P_u(1,2) = 0.9733$, bond value $3\% + P_u(1,2)\left(\frac{1}{2}\,1.014 + \frac{1}{2}\,1.034\right) = 1.0267$

d: $P_d(1,2) = 0.9930$, bond value $3\% + P_d(1,2)\left(\frac{1}{2}\,1.034 + \frac{1}{2}\,1.054\right) = 1.0667$

$t = 0$: $P(0,1) = 0.9851$

Since the bond does not pay coupons at time zero, its current price is,
$$P(0,1)\left(\frac{1}{2}\,1.0267 + \frac{1}{2}\,1.0667\right) = 0.9851\left(\frac{1}{2}\,1.0267 + \frac{1}{2}\,1.0667\right) = 1.0311.$$
Naturally, this price could have been obtained by simply adding $[P_\$(0,1) + P_\$(0,2) + P_\$(0,3)] \times 0.03 + P_\$(0,3) = 1.0314$, where the small discrepancy with the tree value is due to the rounding of intermediate prices on the tree. The results in the tree above are, however, going to matter while pricing derivatives written on the coupon bearing bond.
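The backward induction for the coupon bearing bond can be sketched in Python as follows. The function names are hypothetical, and nodes are indexed by the number $j$ of upward price jumps, so that the node labeled u (rate up) in the tree above corresponds to $j = 0$ at $t = 1$. Because the sketch does not round intermediate values, its node values differ from the rounded figures in the text in the fourth decimal, and its time-0 price coincides with the static sum of discounted cash flows.

```python
def one_period_zero(j, t, p0, q, delta):
    """Eq. (11.40), with p0[k] the market price of the k-period zero."""
    return p0[t + 1] / p0[t] * delta ** (t - j) / ((1 - q) + q * delta ** t)


def coupon_bond_tree(p0, q, delta, coupon):
    """Cum-coupon values values[t][j] for t >= 1; values[0][0] is the time-0
    price, which excludes any coupon paid at time 0."""
    n = len(p0) - 1                          # periods to maturity
    values = {n: [1.0 + coupon] * (n + 1)}   # principal plus last coupon
    for t in range(n - 1, -1, -1):
        row = []
        for j in range(t + 1):
            disc = one_period_zero(j, t, p0, q, delta)
            # probability 1 - q of a price up-move (j -> j + 1), q otherwise
            cont = disc * ((1 - q) * values[t + 1][j + 1]
                           + q * values[t + 1][j])
            row.append(cont + (coupon if t > 0 else 0.0))
        values[t] = row
    return values


if __name__ == "__main__":
    p0 = [1.0, 0.9851, 0.9685, 0.9445]
    tree = coupon_bond_tree(p0, 0.5, 0.9802, 0.03)
    print("tree price:  ", tree[0][0])
    print("static price:", 0.03 * sum(p0[1:]) + p0[-1])
```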
11.5.6.3 Pricing European options

Next, we wish to find the price of options, say the price of two call options on the 1.5Y bond considered in the previous subsection, when the strike price is \$1 and the maturities of the options are 6 months and 1 year. Again, we need to figure out the no-arbitrage movements of the ex-coupon bond price. (This is because if we purchase the bond today, we are not entitled to receive any coupon today. The flow of coupons we are entitled to receive starts from the next period.) We must just subtract the coupon, 0.03, from each cum-coupon price in each node of the tree. Then, we obtain the following ex-coupon prices, where each branch has probability $q = \frac{1}{2}$ and prices are listed from the top node to the bottom node:

$t = 0$: $P(0,1) = 0.9851$

$t = 1$: $P = 1.0267 - 0.03 = 0.997$; $P = 1.0667 - 0.03 = 1.0367$

$t = 2$: $P = 1.014 - 0.03 = 0.984$; $P = 1.034 - 0.03 = 1.004$; $P = 1.054 - 0.03 = 1.024$

We are ready to price the two options. As for the call option on the 1.5Y bond, with 6 months maturity, and strike price $K = \$1$, we have the following tree, with $q = 0.5$:

$t = 0$: $P(0,1) = 0.9851$, $C = ?$

$t = 1$, upper node: $P = 0.997$, $C = (P - K)^+ = 0$

$t = 1$, lower node: $P = 1.0367$, $C = (P - K)^+ = 0.0367$

Therefore,
$$C = P(0,1)\left(\frac{1}{2}\,0 + \frac{1}{2}\,0.0367\right) = 0.9851\left(\frac{1}{2}\,0 + \frac{1}{2}\,0.0367\right) = 1.808 \times 10^{-2}.$$

The call option on the 1.5Y bond with 1 year maturity, and strike price $K = \$1$, is dealt with similarly. We have the following tree, with $q = \frac{1}{2}$ on each branch:

$t = 0$: $P(0,1) = 0.9851$, $C = ?$

$t = 1$, upper node: $P(1,2) = 0.9733$, $C = P(1,2)\left(\frac{1}{2}\,0 + \frac{1}{2}\,0.004\right) = 0.0019$

$t = 1$, lower node: $P(1,2) = 0.9930$, $C = P(1,2)\left(\frac{1}{2}\,0.004 + \frac{1}{2}\,0.024\right) = 0.014$

$t = 2$, top node: $P = 0.984$, $C = (P - K)^+ = 0.000$

$t = 2$, middle node: $P = 1.004$, $C = (P - K)^+ = 0.004$

$t = 2$, bottom node: $P = 1.024$, $C = (P - K)^+ = 0.024$

Therefore, the price of the option is,
$$C = P(0,1)\left(\frac{1}{2}\,0.0019 + \frac{1}{2}\,0.014\right) = 0.9851\left(\frac{1}{2}\,0.0019 + \frac{1}{2}\,0.014\right) = 7.831 \times 10^{-3}.$$
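The two option prices can be reproduced, up to rounding, with the Python sketch below. It is only an illustration under the assumptions of this example ($q = 1/2$, $\delta = 0.9802$, coupon of 3% per period, strike of 1), and the function names are hypothetical. Since the code keeps full precision at each node while the text rounds intermediate values, the resulting prices differ from the figures above by a few units in the fourth decimal.

```python
def one_period_zero(j, t, p0, q, delta):
    """Eq. (11.40), with p0[k] the market price of the k-period zero."""
    return p0[t + 1] / p0[t] * delta ** (t - j) / ((1 - q) + q * delta ** t)


def cum_coupon_values(p0, q, delta, coupon):
    """Cum-coupon bond values values[t][j] for t = 1, ..., maturity."""
    n = len(p0) - 1
    values = {n: [1.0 + coupon] * (n + 1)}
    for t in range(n - 1, 0, -1):
        values[t] = [
            coupon + one_period_zero(j, t, p0, q, delta)
            * ((1 - q) * values[t + 1][j + 1] + q * values[t + 1][j])
            for j in range(t + 1)
        ]
    return values


def european_call(p0, q, delta, coupon, strike, expiry):
    """Call on the ex-coupon bond price, expiring after `expiry` periods."""
    values = cum_coupon_values(p0, q, delta, coupon)
    # Payoff on the ex-coupon price: cum-coupon value minus current coupon.
    option = [max(v - coupon - strike, 0.0) for v in values[expiry]]
    for t in range(expiry - 1, -1, -1):
        option = [
            one_period_zero(j, t, p0, q, delta)
            * ((1 - q) * option[j + 1] + q * option[j])
            for j in range(t + 1)
        ]
    return option[0]


if __name__ == "__main__":
    p0 = [1.0, 0.9851, 0.9685, 0.9445]
    for expiry in (1, 2):
        print(expiry, european_call(p0, 0.5, 0.9802, 0.03, 1.0, expiry))
```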
11.5.7 Continuous-time approximations, with an application to barbell trading
11.5.7.1 The approximation

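Consider Eq. (11.39), and define $\epsilon_j(t) \equiv -\left((t-j) - qt\right)\ln\delta$, and $\mu(t) \equiv f(0,t) + \ln\left((1-q) + q\,\delta^{t}\right) - qt\ln\delta$, such that:
$$r_j(t) = \mu(t) + \epsilon_j(t).$$
We have,
$$E_0[r(t)] = \mu(t) \quad \text{and} \quad V_0[r(t)] = \ln^2\delta \cdot q(1-q)\,t,$$
such that we may define $\sigma^2 \equiv \ln^2\delta \cdot q(1-q)$, and, then, $\delta = e^{-\sigma/\sqrt{q(1-q)}}$. Replacing this into the definition of $\mu(t)$, yields, after expanding terms up to the second order,
$$
\begin{aligned}
\mu(t) &= f(0,t) + \ln\left((1-q) + q\,e^{-\sigma t/\sqrt{q(1-q)}}\right) + \frac{q\,\sigma t}{\sqrt{q(1-q)}} \\
&\approx f(0,t) + \ln\left(1 - \frac{q\,\sigma t}{\sqrt{q(1-q)}} + \frac{\sigma^2 t^2}{2(1-q)}\right) + \frac{q\,\sigma t}{\sqrt{q(1-q)}} \\
&\approx f(0,t) + \frac{\sigma^2 t^2}{2(1-q)} - \frac{q\,\sigma^2 t^2}{2(1-q)} \\
&= f(0,t) + \frac{1}{2}\sigma^2 t^2.
\end{aligned}
$$
Note, this expansion is accurate when $\sigma$ is small, which is indeed the case empirically, as we have that, typically, $\sigma \approx 10^{-2}$, which is reasonably small for values of $t$ up to at least 50 years! However, these calculations might also be considered as the starting point for the initial drift of the short-term rate from zero to time $t$. So, we have, approximately, that,
$$E_0[r(t)] = f(0,t) + \frac{1}{2}\sigma^2 t^2 \quad \text{and} \quad V_0[r(t)] = \sigma^2 t. \tag{11.41}$$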
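The quality of the second-order expansion leading to Eq. (11.41) can be gauged numerically. The Python sketch below compares the exact drift term implied by Eq. (11.39), i.e. $\ln((1-q) + q\delta^t) - qt\ln\delta$ with $\delta = e^{-\sigma/\sqrt{q(1-q)}}$, against $\frac{1}{2}\sigma^2 t^2$. The function names and the parameter values are illustrative assumptions, with $\sigma$ expressed per period and $t$ in periods.

```python
from math import exp, log, sqrt


def exact_mean_shift(t: float, q: float, sigma: float) -> float:
    """E_0[r(t)] - f(0,t) implied by Eq. (11.39):
    ln((1 - q) + q * delta**t) - q * t * ln(delta),
    with delta = exp(-sigma / sqrt(q(1 - q)))."""
    a = sigma / sqrt(q * (1.0 - q))       # a = -ln(delta)
    return log((1.0 - q) + q * exp(-a * t)) + q * t * a


def approx_mean_shift(t: float, sigma: float) -> float:
    """Second-order approximation: sigma^2 t^2 / 2."""
    return 0.5 * sigma ** 2 * t ** 2


if __name__ == "__main__":
    for t in (1, 5, 10, 50):
        print(t, exact_mean_shift(t, 0.5, 0.01), approx_mean_shift(t, 0.01))
```

The gap between the two columns widens with $\sigma t$, which is the sense in which the expansion requires $\sigma$ to be small relative to the horizon.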

In the next chapter (Section 12.4.2), we shall show, consistently with the previous calculations, that in continuous time, the Ho and Lee model predicts the short-term rate to be the solution to:
$$dr(t) = \left(\frac{\partial f_\$(0,t)}{\partial t} + \sigma^2 t\right)dt + \sigma\,dW(t), \tag{11.42}$$
where $f_\$(0,t)$ is the instantaneous continuously compounded forward rate, and $W(t)$ is a Brownian motion defined under the risk-neutral probability. In fact, in the next chapter, it will be shown that the instantaneous forward rate predicted by the Ho & Lee model is:
$$f(t,T) = f(0,T) + \sigma^2\int_0^t (T-s)\,ds + \sigma W(t), \tag{11.43}$$
such that, for $r(t) = f(t,t)$,
$$E_0[r(t)] = f(0,t) + \frac{1}{2}\sigma^2 t^2 \quad \text{and} \quad V_0[r(t)] = \sigma^2 t, \tag{11.44}$$
the continuous time counterpart to the two conditions in Eqs. (11.41). By combining Eqs. (11.43)-(11.44), i.e. by using $r(t) = f(t,t)$ to eliminate $\sigma W(t)$ from Eq. (11.43), we obtain, after simple computations, that:
$$f(t,T) = f(0,T) + \sigma^2 t\,(T-t) + r(t) - f(0,t). \tag{11.45}$$

As shown in the next chapter (see Section 12.6.1), we have that for any model, including Ho & Lee,$^7$ the following representation holds true:
$$P(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-\int_t^T\left[f(t,s) - f(0,s)\right]ds\right). \tag{11.46}$$
$^7$ For example, Eq. (11.36) in the Appendix provides the discrete time counterpart to Eq. (11.46).

Using the expression for $f(t,s)$ in Eq. (11.45), and integrating,
$$\int_t^T\left[f(t,s) - f(0,s)\right]ds = \frac{1}{2}\sigma^2 t\,(T-t)^2 + (T-t)\left(r(t) - f(0,t)\right).$$
Replacing this expression into Eq. (11.46) leaves:
$$P(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-\frac{1}{2}\sigma^2 t\,(T-t)^2 - (T-t)\left(r(t) - f(0,t)\right)\right). \tag{11.47}$$
It is a neat expression, which we may use for a variety of purposes, such as option pricing. The next section develops an example relating to barbell trading.
11.5.7.2 Application to barbell trading

We revisit the barbell trading strategy of Section 11.4.3.4, where we argued that this strategy leads to positive profits due to convexity, as summarized by Figure 11.4. The key point in this argument is that it abstracts from the passage of time, and may, in fact, lead us to misinterpret what is a merely static analysis. We may use the Ho and Lee model to analyze the profit and losses of a barbell trade, in a dynamic context free from arbitrage. We consider two situations: one, where the initial yield curve is flat, and a second, where the initial yield curve is upward sloping.
As for the flat yield curve, we use the continuously compounded rate corresponding to the flat 5% of Section 11.4.3.5, delivering $r = \ln 1.05 = 0.04879$. The number of assets to include into the portfolio, $\pi_1$ and $\pi_3$, are as in Eq. (11.16), i.e. $\pi_1 = 0.45706$ and $\pi_3 = 0.56724$. Instantaneous forward rates are $f(0,T) = -\frac{\partial \ln P(0,T)}{\partial T} = r$. Using Eq. (11.47), with volatility parameter $\sigma = 0.03$, we calculate the value of the strategy a few months later, as follows:
$$\mathrm{Barb}(t) = 100\left(\pi_1 P(t,1) + \pi_3 P(t,10) - P(t,5)\right). \tag{11.48}$$
Figure 11.12 depicts the value of the barbell, $\mathrm{Barb}(t)$, for investment horizons equal to 1 month, 3 months, 6 months and one year.
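The barbell value in Eq. (11.48) can be computed with a short Python sketch. It relies on the Ho and Lee bond price formula in the form $P(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-\frac{1}{2}\sigma^2 t (T-t)^2 - (T-t)(r(t) - f(0,t))\right)$, specialized to a flat initial curve so that $f(0,t) = r = \ln 1.05$. The holdings 0.45706 and 0.56724 and $\sigma = 0.03$ are taken from the text; the function and variable names are illustrative assumptions.

```python
from math import exp, log

R0 = log(1.05)                   # flat continuously compounded initial curve
SIGMA = 0.03                     # volatility parameter
W1, W10 = 0.45706, 0.56724       # holdings of the 1Y and 10Y zeros


def p0(T: float) -> float:
    """Time-0 price of the zero maturing at T under the flat curve."""
    return exp(-R0 * T)


def ho_lee_zero(t: float, T: float, r_t: float) -> float:
    """Ho and Lee price at t of the zero maturing at T, given the short-term
    rate r_t at t; under a flat initial curve, f(0, t) = R0."""
    return p0(T) / p0(t) * exp(-0.5 * SIGMA ** 2 * t * (T - t) ** 2
                               - (T - t) * (r_t - R0))


def barbell(t: float, r_t: float) -> float:
    """Barb(t): long the 1Y and 10Y zeros, short the 5Y zero, per 100."""
    return 100.0 * (W1 * ho_lee_zero(t, 1.0, r_t)
                    + W10 * ho_lee_zero(t, 10.0, r_t)
                    - ho_lee_zero(t, 5.0, r_t))


if __name__ == "__main__":
    for r in (0.0, 0.02, 0.04, R0, 0.06, 0.08, 0.10):
        print(round(r, 4), round(barbell(1.0 / 12.0, r), 4))
```

Evaluating `barbell` over a grid of short-term rates for each horizon gives the profit and loss profiles of the kind summarized in Figure 11.12.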

[Figure: four panels plotting Barb(t) against the short-term rate in one month, three months, six months and one year.]

FIGURE 11.12. Profit and losses arising from barbell trading, $\mathrm{Barb}(t)$ in Eq. (11.48), under the assumption the yield curve is driven by the Ho and Lee model, Eq. (11.47). The initial yield curve is assumed to be flat at $r = 4.8790\%$. Investment horizons are $t = 1/12$ (NW quadrant), $t = 3/12$ (SW quadrant), $t = 6/12$ (NE quadrant) and $t = 1$ (SE quadrant). The vertical dashed lines pass through $r = 4.8790\%$, and the horizontal dashed lines pass through zero.

This trade is quite risky. For long investment horizons, it pays off when the short-term rate fluctuates significantly away from the initial value, $r = 4.8790\%$. The amount of fluctuations in the short-term rate required for the trade to pay off diminishes as we shrink the investment horizon. Nevertheless, this amount appears to be considerable: for example, at one-month horizon, we should require the short-term rate to move from $r = 4.8790\%$ to either values larger than $r = 6\%$ or lower than $r = 4\%$, in order to obtain positive profits. Actually, these results suggest that a short position in the barbell trade (i.e., sell the barbell portfolio and go long the 5Y bond) should be an interesting strategy to implement in periods where we do not expect high volatility of interest rates. For example, for investment horizons of 6 months, the profits from a short position in the barbell trade are positive within a quite significant range of variation of the short-term rate, [2.5%, 6.8%].
Finally, we consider a scenario where the initial yield curve is upward sloping, and generate prices as $P(0,T) = e^{-y(T)\,T}$, where $y(T) \equiv 0.01(1 + \ln T)$. We still determine the value of the portfolio according to Eq. (11.16), i.e., we rely on the self-financing condition in Eq. (11.14) and both (i) the locally riskless condition in Eq. (11.15),
2 (2 ) = 1
1 (1 ) + 3
3 (3 ),

and (ii) the (generically incorrect) assumption of parallel shifts in the yield curve,
1 (1 )
1
1

3 (3 )
3.
3

2 (2 )

Figure 11.13 depicts the profit and losses arising from the trade.

[Figure: four panels plotting Barb(t) against the short-term rate in one month, three months, six months and one year.]

FIGURE 11.13. Profit and losses arising from barbell trading, $\mathrm{Barb}(t)$ in Eq. (11.48), under the assumption the yield curve is driven by the Ho and Lee model, Eq. (11.47). The initial yield curve is assumed to be upward sloping, generated by the equation, $y(T) = 0.01(1 + \ln T)$, with prices given by $P(0,T) = e^{-y(T)\,T}$. Investment horizons are $t = 1/12$ (NW quadrant), $t = 3/12$ (SW quadrant), $t = 6/12$ (NE quadrant) and $t = 1$ (SE quadrant). The vertical dashed lines pass through the current short-term rate, $r = 1.0\%$, and the horizontal dashed lines pass through zero.

Similarly as for the profit and losses summarized in Figure 11.12, the trade leads to profits only when the short-term rate increases, and significantly, from the initial value $r = 1\%$. In particular, when $r$ moves around 1%, profits increase as $r$ lowers, and decrease as $r$ goes up. This effect relates to that arising within the static exercise described in Table 11.4: long-term bonds benefit from a decrease in $r$ more than short-term bonds, and lose their value more than short-term bonds as $r$ increases. However, as the interest rate increases significantly, the barbell generates profits because the convexity of 10 year bonds dominates overall.

11.6 Beyond Ho and Lee: Calibration through Arrow-Debreu securities


The approach in the previous sections imposes no-arbitrage restrictions on bond prices, which have implications on forward rates, thereby ultimately determining an implied stochastic process of the short-term rate. In this section, we determine the no-arbitrage dynamics of the short-term rate in the first place. The advantage of the approach in this section is that it allows us to implement solutions for models, without requiring to solve them in closed-form. For example, the Ho and Lee (1986) model relies on a number of assumptions, which might be unrealistic in practice: the short-term rate can take on negative values in this model. It is quite unusual that a model displaying realistic features also has a closed-form solution.
The approach in this section relies on the Arrow-Debreu securities introduced in Chapter 2, and parallels the work of Derman and Kani (1994), Rubinstein (1994) and Dupire (1994) on equity options reviewed in Chapter 10. Arrow-Debreu securities do not exist, in practice, for the reasons put forward in Chapter 2, relating to the simple circumstance that we may disagree on models and, hence, on the relevant events and states of nature. However, we can extract the shadow price of these securities from traded securities, and use them to price interest rate derivatives. Naturally, we do so by relying on a given model. The rationale of all this is that we do need models to evaluate interest rate derivatives, and Arrow-Debreu security prices are, in fact, the tip of the iceberg of a given model, so to speak. Note that the emphasis in the first and second parts of these Lectures is on the determination of Arrow-Debreu security prices in given economies, by relying on assumptions such as production possibilities, preferences or markets. The approach in this section is to extract the price of Arrow-Debreu securities from that of already traded assets. We illustrate this approach by elaborating on three points.
First, we show how Arrow-Debreu securities can be used in the specific context of fixed income security evaluation; in particular, we illustrate how to exploit the prices of these assets to turn the martingale restrictions of the previous sections into a set of equivalent conditions that are directly usable for practical purposes.
Second, we use the previously extracted Arrow-Debreu prices, and develop algorithms to populate the short-term rate tree, while ensuring that the initial yield curve is fitted without errors.
Third, we illustrate this procedure and solve two models: (i) the Ho and Lee model, and (ii) a model developed by Black, Derman and Toy (1990). While we know the solution to the first model, it is useful to review how it could alternatively be solved, as this would naturally give us insights into the mechanisms underlying the calibration algorithms of this section. Note that, in any event, retrieving Arrow-Debreu security prices is crucial even in the context of the Ho & Lee model, as they can help determine the price of complex derivatives without a closed-form expression. The second model is one where even the bond price does not have a closed-form solution.
11.6.1 Extracting Arrow-Debreu securities from the yield curve
We know, from Chapter 2, that an Arrow-Debreu security is an asset that pays off \$1 in some prespecified state of nature, and zero otherwise. Consider, for example, the diagram in Figure 11.14.

[Figure: binomial tree starting at node (0,0), with time axis t = 0, 0.5, 1, 1.5, 2, showing the nodes $(s,t)$ and $(s-1,t)$ leading to the node $(s,t+1)$, where the Arrow-Debreu security pays 1; the payoff is 0 in the other nodes.]

FIGURE 11.14. In the binomial tree of this section, an Arrow-Debreu security for state $s$ at time $t+1$ is a security that pays \$1 at time $t+1$ in state $s$, and zero otherwise. This section aims to show how to recover Arrow-Debreu prices from the price of fixed income securities.

In this diagram, $q$ is the risk-neutral probability of an upward movement of the short-term rate. A generic pair $(s,t)$ at each node tracks the number of upward movements of the short-term rate, $s$, and calendar time, $t$, where $s \leq t$, as there can only be one possible short-term rate movement in each period. Consider the Arrow-Debreu security for state $s$ at time $t+1$. Let $\mathrm{AD}(s,t)$ denote the current price of an Arrow-Debreu security, one which pays off \$1 at time $t$ and in state $s$, and zero otherwise. As we have learnt at many junctures of these lectures, Arrow-Debreu securities can be used to price everything, including bonds. For example, the current market price of a zero coupon bond, maturing at time $t$ is,
$$P_\$(0,t) = \sum_{s=0}^{t}\mathrm{AD}(s,t). \tag{11.49}$$
More generally, consider a derivative that pays off $D_s(t)$ in node $(s,t)$, meaning a dividend equal to $D_1(t)$ in state $s = 1$, equal to $D_2(t)$ in state $s = 2$, and equal to $D_t(t)$ in state $s = t$. Consistently with explanations given in Chapter 2, the price of this asset, denoted as $V_\$(0,T)$, is given by,
$$V_\$(0,T) = \sum_{t=1}^{T}\sum_{s=0}^{t}\mathrm{AD}(s,t)\,D_s(t). \tag{11.50}$$

11.6.1.1 The forward equation for Arrow-Debreu security prices

Our objective is to make use of the initial yield curve, and retrieve the price of all Arrow-Debreu securities, i.e. $\pi_s(i)$ for all $s$ and $i$, where $i \in \{1, \cdots, T\}$, from the observation of the initial term-structure of interest rates. Consider the Arrow-Debreu security that pays \$1 in node $(s, i+1)$ (see Figure 11.14). Denote with $\pi_k[s, i+1]$ its value at time $i$, and in state $k$, $k \le i$. What is this value at time $i$ in all states? A key observation is that in this tree, the node $(s, i+1)$ (the filled circle) can only be reached through the nodes $(s, i)$ and $(s-1, i)$ occurring at time $i$ (the two empty circles in Figure 11.14). At time $i$ then, the value $\pi_k[s, i+1]$ is zero in all the nodes $(k, i)$ except the empty circles $(s, i)$ and $(s-1, i)$. Indeed, if we do not happen to be at one of those nodes denoted with empty circles, we could not reach the node $(s, i+1)$ (the filled circle), where the Arrow-Debreu security pays off.
So, we are left with finding the values $\pi_k[s, i+1]$ in the nodes corresponding to the empty circles $(s, i)$ and $(s-1, i)$, i.e. $\pi_s[s, i+1]$ and $\pi_{s-1}[s, i+1]$. Let $r_s(i)$ be the continuously compounded short-term rate in node $(s, i)$. Consider the upper node $(s, i)$. We have,
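$$\pi_s[s, i+1] = e^{-r_s(i)}\,[0 \cdot q + 1 \cdot (1-q)] = e^{-r_s(i)}(1-q).$$
Similarly, in the lower node, $(s-1, i)$,
$$\pi_{s-1}[s, i+1] = e^{-r_{s-1}(i)}\,[1 \cdot q + 0 \cdot (1-q)] = e^{-r_{s-1}(i)}\, q.$$
We can think of our Arrow-Debreu security for $(s, i+1)$ as a derivative that at time $i$, delivers the following payoffs
$$\pi_s[s, i+1] = e^{-r_s(i)}(1-q), \qquad \pi_{s-1}[s, i+1] = e^{-r_{s-1}(i)}\, q, \qquad \pi_k[s, i+1] = 0 \ \text{ for all } k \ne s,\ s-1. \tag{11.51}$$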

These payoffs are simply the market value of the Arrow-Debreu security for $(s, i+1)$, in the various states occurring at time $i$, i.e. the money the holder can make by selling the asset at time $i$, in the various states. Therefore, we can apply Eq. (11.50), and obtain,
$$\pi_s(i+1) = \sum_{k=0}^{i} \pi_k(i)\, \pi_k[s, i+1] = \pi_s(i)\, \pi_s[s, i+1] + \pi_{s-1}(i)\, \pi_{s-1}[s, i+1].$$
By replacing the Arrow-Debreu prices in (11.51) into the previous equation, we obtain the so-called forward equation for the Arrow-Debreu prices,
$$\pi_s(i+1) = \pi_s(i)\, e^{-r_s(i)}(1-q) + \pi_{s-1}(i)\, e^{-r_{s-1}(i)}\, q. \tag{11.52}$$

Eq. (11.52) is the counterpart to the forward equation used in Chapter 10 to fit European option prices. The approach in this section differs from that in Chapter 10, because we take the risk-neutral probability to be constant, and interest rates to be time-varying and state-dependent, whereas in Chapter 10, interest rates are exogenously given, and the risk-neutral probability is time-varying and state-dependent. Naturally, the approach in this section can be generalized to the case of stochastic risk-neutral probabilities, once we also wish to fit European options, on top of bond prices. In the next section, we explain how to fit bond prices.
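To fix ideas, one step of the forward equation (11.52) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `forward_step` and the list layout (index `s` is the number of upward movements of the short-term rate) are hypothetical choices, not part of the text.

```python
import math

def forward_step(pi, r, q):
    """One step of the forward equation: map pi_s(i), s = 0..i, into pi_s(i+1), s = 0..i+1.

    pi[s] and r[s] are the Arrow-Debreu price and the continuously compounded
    short-term rate in node (s, i); q is the risk-neutral probability of an up-move.
    """
    n = len(pi)
    disc = [p * math.exp(-x) for p, x in zip(pi, r)]  # pi_s(i) * exp(-r_s(i))
    nxt = []
    for s in range(n + 1):
        stay = disc[s] * (1.0 - q) if s < n else 0.0  # reached from (s, i), no up-move
        up = disc[s - 1] * q if s >= 1 else 0.0       # reached from (s-1, i), one up-move
        nxt.append(stay + up)
    return nxt

# Illustrative inputs only: a single node at time 0 with a 1.5% rate
pi1 = forward_step([1.0], [0.015], 0.5)
```

Summing the entries of `pi1` returns the one-period discount factor, in line with Eq. (11.49).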

11.6.1.2 The bond price equation

To implement an algorithm, we make repeated use of the forward equation (11.52) and the following zero pricing equation,
$$P_\$(0, i+1) = \sum_{s=0}^{i} \pi_s(i)\, e^{-r_s(i)}. \tag{11.53}$$

The inputs to the algorithm are a number of zeros equal to the largest maturity date the tree extends to. Note an important feature of the calibration procedure. While we extract Arrow-Debreu security prices, we need to make reference to a given model for the underlying short-term rate movements, $r_s(i)$ and, indeed, we shall illustrate the algorithm by hinging upon two examples in the following sections. Instead, in Chapter 10, we have illustrated that in the equity case, a cross section of option prices is enough to uniquely pin down the underlying stock price movements.
11.6.2 Two model examples
We begin with Ho and Lee, assuming continuous compounding. By Eq. (11.39), the short-term rate predicted by the Ho and Lee model is:
$$r_j(i) = f_i(0) + \ln\left((1-q)\,\delta^{-i} + q\right) + j \ln \delta, \tag{11.54}$$
where $f_i(0)$ is the continuously compounded forward rate at time zero for maturity $[i, i+1]$, and $j$ is the number of upward movements of the entire set of bond prices. Define $s \equiv (i - j)$, which is the number of downward movements of the bond prices or, equivalently, the number of upward movements of the short-term rate. Hence, we can equivalently index the short-term rate by $s$, instead of $j$, and rewrite Eq. (11.54) with a slight abuse of notation, as follows:
$$r_s(i) = \underbrace{f_i(0) + \ln\left((1-q) + q\,\delta^{i}\right)}_{\equiv\, r_0(i)} + s \ln\frac{1}{\delta}, \tag{11.55}$$
such that $r_0(i)$ is the short-term rate at time $i$, in the event of zero upward movements in this rate, and $\delta$ is the usual volatility parameter, which can be calibrated through $\ln\frac{1}{\delta} = \frac{\mathrm{Std}(r)\sqrt{\Delta t}}{\sqrt{q(1-q)}}$, with straightforward notation. Note, incidentally, that the short-term rate movements do depend on the specific value we assign to the risk-neutral probability $q$.

Naturally, Eq. (11.55) would be sufficient to populate the interest rate tree quite easily. However, even in the context of Ho and Lee, it is still quite important to retrieve Arrow-Debreu security prices, as these would allow us to price all kinds of interest rate derivatives, including those for which a closed-form expression is unavailable.
Let us proceed. At time zero, the price of a zero maturing at time $i+1$ is:
$$P_\$(0, i+1) = \sum_{s=0}^{i} \pi_s(i)\, e^{-r_s(i)} = e^{-r_0(i)} \sum_{s=0}^{i} \pi_s(i)\, \delta^{s},$$
where the second equality follows by the assumption that the short-term rate is solution to Eq. (11.55).

By rearranging terms in the previous equation, we obtain a closed-form expression for the future short-term rate at time $i$, in the event of zero upward movements,
$$r_0(i) = \ln\frac{\sum_{s=0}^{i} \pi_s(i)\, \delta^{s}}{P_\$(0, i+1)}. \tag{11.56}$$
This is the counterpart to the zero pricing equation (11.53).
We use Eq. (11.56) and the forward equation (11.52) to populate the interest rate tree, under the assumption that $q = \frac{1}{2}$. Precisely, the algorithm proceeds as follows:
(i) Given the boundary condition for the Arrow-Debreu price, $\pi_0(0) = 1$, determine the initial value of the short-term rate, $r_0(0)$, using Eq. (11.56), as $r_0(0) = \ln(1 / P_\$(0,1))$.
(ii) Suppose we know the future value of the short-term rate at time $i-1$, in the event of no upward movements, i.e. $r_0(i-1)$. Then, given the value of $r_0(i-1)$, and the price of the Arrow-Debreu securities $\pi_s(i-1)$ for $s \le i-1$, determine $\pi_s(i)$ for $s \le i$, through the forward equation (11.52),
$$\pi_s(i) = \pi_s(i-1)\, e^{-r_0(i-1)}\, \delta^{s}\, (1-q) + \pi_{s-1}(i-1)\, e^{-r_0(i-1)}\, \delta^{s-1}\, q, \qquad q = \frac{1}{2},$$
where the last equation follows by plugging Eq. (11.55) into Eq. (11.52).
(iii) Given the Arrow-Debreu prices $\pi_s(i)$ for $s \le i$, use Eq. (11.56) to determine the future value of the short-term rate at time $i$, in the event of no upward movements, i.e. $r_0(i)$.
(iv) If $i = T$, stop. Otherwise, go to (ii).
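Steps (i)-(iv) can be sketched in Python as follows. The function name `calibrate_ho_lee`, the list layout and the default `q = 0.5` are illustrative assumptions; the inputs are the three zero prices and the value $\delta = 0.9802$ used in the numerical example of Section 11.6.2.1.

```python
import math

def calibrate_ho_lee(zeros, delta, q=0.5):
    """zeros[i] is the price of the zero maturing at i+1.

    Returns (r0, ad), where r0[i] is the rate at time i with no up-moves
    and ad[i][s] is the Arrow-Debreu price for state s at time i.
    """
    ad = [[1.0]]                              # step (i): boundary condition
    r0 = [math.log(1.0 / zeros[0])]
    for i in range(1, len(zeros)):
        prev, d = ad[-1], math.exp(-r0[-1])
        cur = []
        for s in range(i + 1):                # step (ii): forward equation
            stay = prev[s] * delta**s * (1 - q) if s <= i - 1 else 0.0
            up = prev[s - 1] * delta**(s - 1) * q if s >= 1 else 0.0
            cur.append(d * (stay + up))
        ad.append(cur)
        # step (iii): closed form for the rate with no up-moves
        r0.append(math.log(sum(p * delta**s for s, p in enumerate(cur)) / zeros[i]))
    return r0, ad

r0, ad = calibrate_ho_lee([0.9851, 0.9685, 0.9445], delta=0.9802)
```

By construction, the calibrated tree reprices each input zero through Eq. (11.53); small differences with hand calculations can arise from intermediate rounding.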

As a second example, consider the Black, Derman and Toy (1990) model. In this model, the
short-term rate is solution to,
( )=
(11.57)
0( ),
where $\delta$ is, once again, a volatility parameter.8 For computational convenience, this model assumes that the short-term rate in Eq. (11.57) is discretely compounded. Accordingly, we rewrite the forward equation (11.52) in terms of discretely compounded rates,
$$\pi_s(i+1) = \pi_s(i)\, \frac{1}{1 + r_s(i)}\, (1-q) + \pi_{s-1}(i)\, \frac{1}{1 + r_{s-1}(i)}\, q. \tag{11.58}$$
The algorithm proceeds as follows:

(i) Determine the initial value of the short-term rate, $r_0(0)$, as the solution to,
$$P_\$(0,1) = \frac{1}{1 + r_0(0)}.$$
(ii) Suppose we know the future value of the short-term rate at time $i-1$, in the event of no upward movements, i.e. $r_0(i-1)$. Then, given the value of $r_0(i-1)$, and the price of the Arrow-Debreu securities $\pi_s(i-1)$ for $s \le i-1$, determine $\pi_s(i)$ for $s \le i$, through the forward equation (11.58),
$$\pi_s(i) = \pi_s(i-1)\, \frac{1}{1 + r_s(i-1)}\, (1-q) + \pi_{s-1}(i-1)\, \frac{1}{1 + r_{s-1}(i-1)}\, q, \qquad q = \frac{1}{2},$$
where the last equation follows by plugging Eq. (11.57) into Eq. (11.58).
8 In its most general form, this model assumes that
( )=
is a volatility parameter that varies determinis0 ( ), where
tically over time. This more general formulation leads to more exibility, which is useful to t the term structure of volatility.

(iii) Given the boundary condition $\pi_0(0) = 1$, and the Arrow-Debreu prices, $\pi_s(i)$ for $s \le i$, use the pricing equation for the zero,
$$P_\$(0, i+1) = \sum_{s=0}^{i} \pi_s(i)\, \frac{1}{1 + r_s(i)},$$
to solve, numerically, for the future value of the short-term rate at time $i$, in the event of no upward movements, i.e. $r_0(i)$. Note, we did not need this additional step for the solution of the Ho and Lee model, as the short-term rate $r_0(i)$ is known in closed form in the Ho and Lee model (see Eq. (11.56)). Note, since $r_0(i) > 0$, then, we also have that $r_s(i) > 0$ by Eq. (11.57).
(iv) If $i = T$, stop. Otherwise, go to (ii).
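The numerical step (iii) can be sketched with a simple bisection. This is a minimal sketch, not a definitive implementation: it assumes, purely for illustration, that the rates at a given date are proportional across states, `r_s(i) = r_0(i) * u**s`, with a hypothetical multiplier `u > 1`; the name `calibrate_bdt`, the value `u = 1.02` and the input zero prices are likewise illustrative.

```python
def calibrate_bdt(zeros, u, q=0.5, tol=1e-12):
    """zeros[i] is the price of the zero maturing at i+1 (discrete compounding).

    Assumption (illustrative): r_s(i) = r_0(i) * u**s.
    Returns (r0, ad) as lists indexed by time, ad[i][s] being Arrow-Debreu prices.
    """
    ad = [[1.0]]
    r0 = [1.0 / zeros[0] - 1.0]                       # step (i)
    for i in range(1, len(zeros)):
        prev = ad[-1]
        disc = [p / (1.0 + r0[-1] * u**s) for s, p in enumerate(prev)]
        cur = [(disc[s] * (1 - q) if s < i else 0.0) +
               (disc[s - 1] * q if s >= 1 else 0.0)
               for s in range(i + 1)]                 # step (ii)
        ad.append(cur)

        def model_price(x):
            # model price of the zero maturing at i+1, decreasing in x = r_0(i)
            return sum(p / (1.0 + x * u**s) for s, p in enumerate(cur))

        lo, hi = 0.0, 1.0                             # step (iii): bisection
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if model_price(mid) > zeros[i]:
                lo = mid                              # rate too low, price too high
            else:
                hi = mid
        r0.append(0.5 * (lo + hi))
    return r0, ad

bdt_zeros = [0.9851, 0.9685, 0.9445]
bdt_r0, bdt_ad = calibrate_bdt(bdt_zeros, u=1.02)
```

The bracket `[0, 1]` presumes positive rates below 100% per period; a different bracket would be needed otherwise.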

11.6.2.1 A numerical example

Consider, again, the Ho and Lee model example in Section 11.5.5, where three zeros were traded: (i) one zero maturing in 6 months, (ii) one zero maturing in 1 year, and (iii) one zero maturing in 1.5 years, with market prices $P_\$(0,1) = 0.9851$, $P_\$(0,2) = 0.9685$, $P_\$(0,3) = 0.9445$. By Eq. (11.55), the Ho and Lee model assumes that,
$$r_s(i) = r_0(i) + s \ln\frac{1}{\delta}. \tag{11.59}$$

We use Eq. (11.59) and find the values of the short-term rate $r_s(i)$ in each node, under the assumption that $q = \frac{1}{2}$, and that the standard deviation of the short-term rate is 0.014, annualized. To find $\delta$, we may use the relation $\ln\frac{1}{\delta} = \frac{\mathrm{Std}(r)\sqrt{\Delta t}}{\sqrt{q(1-q)}}$, where $\Delta t = \frac{1}{2}$ and $\mathrm{Std}(r)$ is the standard deviation of the short-term rate, which equals $\mathrm{Std}(r) = 0.014$, annualized. Therefore $\ln\frac{1}{\delta} = 0.014 \cdot \frac{\sqrt{1/2}}{1/2} \approx 0.02$, or $\delta = 0.9802$.
For the Ho & Lee model, we know the closed-form expression for $r_0(i)$,
$$r_0(i) = \ln\frac{\sum_{s=0}^{i} \pi_s(i)\, \delta^{s}}{P_\$(0, i+1)}, \tag{11.60}$$
where $\pi_s(i)$ denotes the price of an Arrow-Debreu security which pays off \$1 in state $s$ at time $i$, and zero otherwise. Given the term-structure of prices $P_\$(0, i+1)$, $i = 0, 1, 2$, we populate the tree using Eq. (11.60) and the forward Arrow-Debreu prices equation (11.52),
$$\pi_s(i) = \frac{1}{2}\, e^{-r_0(i-1)} \left[\pi_s(i-1)\, \delta^{s} + \pi_{s-1}(i-1)\, \delta^{s-1}\right], \tag{11.61}$$
with the appropriate boundary conditions.

So we have to calculate interest rates and Arrow-Debreu prices for $i = 0, 1, 2$.

$i = 0$. Eq. (11.60) is trivial. It leads to $r_0(0) = \ln\frac{1}{P_\$(0,1)} = 0.015$. The forward equation for the Arrow-Debreu prices, Eq. (11.61), is also trivial, $\pi_0(0) = 1$.

$i = 1$. Let us use Eq. (11.61), the forward equation for the Arrow-Debreu prices, to find $\pi_0(1)$ and $\pi_1(1)$. We have two cases:

$s = 0$. We have:
$$\pi_0(1) = \frac{1}{2}\, e^{-r_0(0)} \left[\pi_0(0) + 0\right] = \frac{1}{2}\, e^{-r_0(0)} = 0.4925.$$
The previous relation holds because $\pi_0(1)$ is the current price of the Arrow-Debreu security which pays off \$1 in state 0 at time 1, as illustrated by the tree in the figure below.

[Figure: one-period tree with $q = \frac{1}{2}$, leading from state $s = 0$ at time $i = 0$ to states $s = 1$ and $s = 0$ at time $i = 1$.]

$s = 1$. By a similar reasoning,
$$\pi_1(1) = \frac{1}{2}\, e^{-r_0(0)} \left[0 + \pi_0(0)\right] = \frac{1}{2}\, e^{-r_0(0)} = 0.4925.$$
This Arrow-Debreu security is valued the same as that for state $s = 0$ because the risk-neutral probability is 50%.

Eq. (11.60) is, now,
$$r_0(1) = \ln\frac{\pi_0(1) + \delta\, \pi_1(1)}{P_\$(0,2)} = \ln\frac{0.4925\,(1 + 0.9802)}{0.9685} = 0.0069.$$
Hence, by Eq. (11.59),
$$r_1(1) = r_0(1) + \ln\frac{1}{\delta} = 0.0069 + 0.02 \approx 0.027.$$

To sum up, we have the tree below,

[Figure: short-term rate tree with $q = \frac{1}{2}$: $r_0(0) = 0.015$ at $i = 0$; $r_1(1) = 0.027$ and $r_0(1) = 0.0069$ at $i = 1$.]

We can now calculate the values of the short-term rate for one further period.

$i = 2$. By Eq. (11.61), the forward equation for the Arrow-Debreu prices, we have the following three cases:
$$(s = 0) \qquad \pi_0(2) = \frac{1}{2}\, e^{-r_0(1)} \left[\pi_0(1) + 0\right] = 0.2446,$$
$$(s = 1) \qquad \pi_1(2) = \frac{1}{2}\, e^{-r_0(1)} \left[\delta\, \pi_1(1) + \pi_0(1)\right] = 0.4843,$$
$$(s = 2) \qquad \pi_2(2) = \frac{1}{2}\, e^{-r_0(1)} \left[0 + \delta\, \pi_1(1)\right] = 0.2397.$$

The tree below further illustrates how to obtain these prices.

[Figure: two-period tree with $q = \frac{1}{2}$, with state $s = 0$ at $i = 0$, states $s = 0, 1$ at $i = 1$, and states $s = 0, 1, 2$ at $i = 2$.]

Consider, for example, $\pi_0(2)$. It is the price of the Arrow-Debreu security for time 2, under two consecutive downward movements of the short-term rate. This state can only be reached through the state $s = 0$ at time $i = 1$. But at state $s = 0$ at time $i = 1$, the value of the Arrow-Debreu asset is $\frac{1}{2} e^{-r_0(1)}$. Hence, $\pi_0(2) = \pi_0(1)\, \frac{1}{2} e^{-r_0(1)}$. By a similar reasoning, we have that $\pi_2(2) = \pi_1(1)\, \frac{1}{2} e^{-r_1(1)} = \pi_1(1)\, \frac{1}{2} e^{-r_0(1)}\, \delta$. Note, there is some symmetry in the distribution of the Arrow-Debreu prices, with $\pi_1(2)$ being the largest, being the price of the security that pays off with the highest likelihood. However, $\pi_0(2) > \pi_2(2)$, even if $q$ is constant and equal to 50%, because discounting is more severe whilst crossing the nodes leading to $s = 2$, compared to the nodes leading to $s = 0$.

We can now calculate the values of the short-term rate for each node. Eq. (11.60) is, now,
$$r_0(2) = \ln\frac{\pi_0(2) + \delta\, \pi_1(2) + \delta^2\, \pi_2(2)}{P_\$(0,3)} = \ln\frac{0.2446 + 0.9802 \cdot 0.4843 + (0.9802)^2 \cdot 0.2397}{0.9445} = 0.0054.$$
Hence, by Eq. (11.59),
$$r_s(2) = r_0(2) + s \ln\frac{1}{\delta} = 0.0054 + 0.02\, s, \qquad s = 0, 1, 2.$$
This yields the following values for the short-term rate: $r_0(2) = 0.0054$, $r_1(2) = 0.0253$, and $r_2(2) = 0.0452$.

The diagram below summarizes the implied tree for the short-term rate in this model.

[Figure: short-term rate tree with $q = \frac{1}{2}$: $r_0(0) = 0.015$ at $i = 0$; $r_1(1) = 0.027$, $r_0(1) = 0.0069$ at $i = 1$; $r_2(2) = 0.0452$, $r_1(2) = 0.0253$, $r_0(2) = 0.0054$ at $i = 2$.]

Naturally, the prices


=
in the nodes of the previous tree match those calculated in
Section 11.6.5, apart from discrepancies arising due to rounding errors.
11.6.2.2 A second example: time-varying probabilities and interest rate volatility swaps

Assume that the spot yield curve is 2.5% for $T = 1$ year, 4.5% for $T = 2$ years, and 6% for $T = 3$ years, continuously compounded and annualized. Consider the following model:
$$r_s(i) = r_0(i) + s\, \sigma, \tag{11.62}$$
where $\sigma$ is a constant and equal to 0.01, $r_s(i)$ is the continuously compounded short-term rate as of time $i$, after $s$ upward movements and, finally, the unit period of time is taken to be one year. As we know, the Ho & Lee model predicts that the price as of time zero of an Arrow-Debreu security paying off in state $s$ at time $i$, denoted as $\pi_s(i)$, satisfies the following forward equation, for $i \ge 1$, $s \ge 0$ and $s \le i$:
$$\pi_s(i) = e^{-r_0(i-1)} \left[(1-q)\, e^{-s\sigma}\, \pi_s(i-1) + q\, e^{-(s-1)\sigma}\, \pi_{s-1}(i-1)\right], \tag{11.63}$$
where $q$ is the risk-neutral probability of an upward movement in the short-term rate. Furthermore, according to this model, the price of a zero coupon bond, paying \$1 at time $i$, $P_\$(0,i)$, equals,
$$P_\$(0,i) = e^{-r_0(i-1)} \sum_{s=0}^{i-1} e^{-s\sigma}\, \pi_s(i-1). \tag{11.64}$$
Suppose, next, that the risk-neutral probability of an upward movement at any time is not a constant $q$, but a function of calendar time, say $q_i$: $q_i$ is, then, the probability of an upward movement in the short-term rate from time $i$ to time $i+1$. Naturally, the assumption that $q_i$ is time-varying makes this model markedly distinct from the Ho & Lee model. To calibrate this model, we consider the recursive equation for the Arrow-Debreu security prices:
$$\pi_s(i) = e^{-r_0(i-1)} \left[(1-q_{i-1})\, e^{-s\sigma}\, \pi_s(i-1) + q_{i-1}\, e^{-(s-1)\sigma}\, \pi_{s-1}(i-1)\right], \tag{11.65}$$
where $q_{i-1}$ denotes the risk-neutral probability of an upward movement in the short-term rate from time $i-1$ to time $i$. The boundary conditions are the usual ones: $\pi_0(0) = 1$, $\pi_s(i) = 0$, for $s > i$ and $s < 0$. Eq. (11.65) can be derived through the same arguments in Section 11.7.1.
Next, suppose the risk-neutral probability of an upward movement in the short-term rate in the first period equals $\frac{1}{2}$. Suppose, further, that available for trading is a derivative, which pays off an amount of \$1 in state $s = 2$ and an amount of \$1 in state $s = 0$, both at time $i = 2$. The current price of this derivative equals 0.45514. The interpretation of the derivative is that of a contract that pays off when the interest rate experiences extreme movements (up-up or down-down), a very basic interest rate volatility contract. Its price can be expressed as the sum of the two Arrow-Debreu securities for these extreme interest rate movements. Let us set the nominal values of the zero coupon bonds to \$1. To populate the interest rate tree, we need to determine the three zero prices, which are:
$$P_\$(0,1) = e^{-0.025} = 0.97531, \qquad P_\$(0,2) = e^{-0.045 \cdot 2} = 0.91393, \qquad P_\$(0,3) = e^{-0.06 \cdot 3} = 0.83527.$$
We can start populating the tree. Eq. (11.64) can be rewritten as:
$$r_0(i) = \ln\frac{\sum_{s=0}^{i} \pi_s(i)\, e^{-s\sigma}}{P_\$(0, i+1)}. \tag{11.66}$$
We have,

$i = 0$. In this case, Eq. (11.66) is:
$$r_0(0) = \ln\frac{1}{0.97531} = 0.025.$$

$i = 1$. We have two nodes to fill: $s = 0$ and $s = 1$. We use Eq. (11.65), as follows:

$s = 0$: We have, $\pi_0(1) = e^{-r_0(0)} (1 - q_0)\, \pi_0(0) = 0.97531 \cdot 0.5 \cdot 1 = 0.48766$.

$s = 1$: We have, $\pi_1(1) = e^{-r_0(0)}\, q_0\, \pi_0(0) = 0.97531 \cdot 0.5 \cdot 1 = 0.48766$.

Then, Eq. (11.66) is,
$$r_0(1) = \ln\frac{\pi_0(1) + e^{-\sigma}\, \pi_1(1)}{P_\$(0,2)} = \ln\frac{0.48766\,(1 + e^{-0.01})}{0.91393} = 0.06, \qquad r_1(1) = r_0(1) + 0.01 = 0.07.$$

$i = 2$. There are now three nodes to fill, corresponding to $s = 0$, $s = 1$ and $s = 2$. We use Eq. (11.65), as follows:

$s = 0$: We have, $\pi_0(2) = e^{-r_0(1)} (1 - q_1)\, \pi_0(1) = e^{-0.06} (1 - q_1) \cdot 0.48766$.

$s = 1$: We have, $\pi_1(2) = e^{-r_0(1)} \left[(1 - q_1)\, e^{-\sigma}\, \pi_1(1) + q_1\, \pi_0(1)\right] = e^{-0.06} \left[(1 - q_1)\, e^{-0.01} + q_1\right] 0.48766$.

$s = 2$: We have, $\pi_2(2) = e^{-r_0(1)}\, q_1\, e^{-\sigma}\, \pi_1(1) = e^{-0.06}\, q_1\, e^{-0.01} \cdot 0.48766$.

We do not know $q_1$ yet. Yet the rate volatility asset, which quotes for 0.45514, can be used to extract $q_1$. At time $i = 1$, its price is either $V_1 = e^{-0.07}\, q_1$ (in the up state of the world), or $V_0 = e^{-0.06} (1 - q_1)$ (in the down state of the world). So by no-arbitrage, its current price satisfies
$$0.45514 = \frac{1}{2}\, e^{-0.025} (V_1 + V_0) = \frac{1}{2}\, e^{-0.025} \left[e^{-0.07}\, q_1 + e^{-0.06} (1 - q_1)\right].$$
Solving for $q_1$ yields $q_1 = 0.90$. Naturally, the same result is obtained by calibrating $q_1$ so as to make the price of the derivative, 0.45514, match the sum of the prices of the Arrow-Debreu securities paying off in states 0 and 2 at $i = 2$, viz $q_1$: $0.45514 = \pi_0(2) + \pi_2(2) = e^{-0.06} (1 - q_1) \cdot 0.48766 + e^{-0.06}\, q_1\, e^{-0.01} \cdot 0.48766$. So now, we can use $q_1 = 90\%$ and calculate the Arrow-Debreu prices, obtaining:
$$\pi_0(2) = e^{-0.06} (1 - 0.9) \cdot 0.48766 = 0.04592,$$
$$\pi_1(2) = e^{-0.06} \left[(1 - 0.9)\, e^{-0.01} + 0.9\right] 0.48766 = 0.4588,$$
$$\pi_2(2) = e^{-0.06} \cdot 0.9 \cdot e^{-0.01} \cdot 0.48766 = 0.40922.$$

Note, there is no symmetry at all in the distribution of these Arrow-Debreu security prices. The price $\pi_0(2)$ is very low, due to the fact that $q_1$ is very high, such that the probability of reaching the lowest node of the tree at time $i = 2$ is quite low.
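The extraction of $q_1$ is a linear problem, and can be sketched as follows. The variable names (`sigma` for the constant 0.01 in Eq. (11.62), `vol_price` for the quote of the volatility contract) are illustrative assumptions; unrounded intermediate values are used, so the result may differ from 0.90 in the third decimal.

```python
import math

sigma = 0.01                                   # constant increment in Eq. (11.62)
p1, p2 = math.exp(-0.025), math.exp(-0.045 * 2)  # one- and two-year zero prices
vol_price = 0.45514                            # quote of the rate volatility contract
q0 = 0.5                                       # first-period probability

r00 = math.log(1.0 / p1)
pi1 = [(1 - q0) * math.exp(-r00), q0 * math.exp(-r00)]       # pi_0(1), pi_1(1)
r01 = math.log((pi1[0] + math.exp(-sigma) * pi1[1]) / p2)    # rate at i = 1, no up-moves
d = math.exp(-r01)

# pi_0(2) + pi_2(2) = d * [(1 - q1) * pi_0(1) + q1 * exp(-sigma) * pi_1(1)], linear in q1
q1 = (vol_price / d - pi1[0]) / (math.exp(-sigma) * pi1[1] - pi1[0])

pi2 = [d * (1 - q1) * pi1[0],
       d * ((1 - q1) * math.exp(-sigma) * pi1[1] + q1 * pi1[0]),
       d * q1 * math.exp(-sigma) * pi1[1]]
```

The three entries of `pi2` sum to the two-year zero price, which is a convenient check on the recursion.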
Next, by Eq. (11.66),
$$r_0(2) = \ln\frac{\pi_0(2) + e^{-\sigma}\, \pi_1(2) + e^{-2\sigma}\, \pi_2(2)}{P_\$(0,3)} = \ln\frac{0.04592 + e^{-0.01} \cdot 0.4588 + e^{-2 \cdot 0.01} \cdot 0.40922}{0.83527} = 0.07605,$$
and,
$$r_1(2) = r_0(2) + 0.01 = 0.07605 + 0.01 = 0.08605, \qquad r_2(2) = r_0(2) + 2 \cdot 0.01 = 0.07605 + 2 \cdot 0.01 = 0.09605.$$

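Finally, we wish to evaluate a European call option at time zero, written on the three year zero coupon bond with nominal value equal to \$1. This option expires at $i = 2$ and has a strike price equal to \$0.91000. At expiry, the option pays off:
$$C_2(2) = \left(e^{-r_2(2)} - 0.91\right)^+ = \left(e^{-0.09605} - 0.91\right)^+ = 0,$$
$$C_1(2) = \left(e^{-r_1(2)} - 0.91\right)^+ = \left(e^{-0.08605} - 0.91\right)^+ = 0.00755,$$
$$C_0(2) = \left(e^{-r_0(2)} - 0.91\right)^+ = \left(e^{-0.07605} - 0.91\right)^+ = 0.01677.$$

Then,
$$C_1(1) = e^{-0.07} \left(q_1 C_2(2) + (1 - q_1)\, C_1(2)\right) = e^{-0.07} (0.9 \cdot 0 + 0.1 \cdot 0.00755) = 7.0396 \cdot 10^{-4},$$
$$C_0(1) = e^{-0.06} \left(q_1 C_1(2) + (1 - q_1)\, C_0(2)\right) = e^{-0.06} (0.9 \cdot 0.00755 + 0.1 \cdot 0.01677) = 7.9786 \cdot 10^{-3},$$
which leads to $C = e^{-0.025}\, \frac{1}{2} \left(C_1(1) + C_0(1)\right) = 4.2341 \cdot 10^{-3}$.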
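The backward induction just described can be sketched in Python. The helper `rollback` and the list layout (index `s` counts upward movements) are illustrative choices; rates and probabilities are the calibrated values reported above, so small rounding differences with the figures in the text are to be expected.

```python
import math

def rollback(values, rates, q):
    """Discounted risk-neutral expectation, one period back.

    values has one more entry than rates; values[s + 1] is the up-move successor of node s.
    """
    return [math.exp(-r) * (q * values[s + 1] + (1 - q) * values[s])
            for s, r in enumerate(rates)]

rates = [[0.025], [0.06, 0.07], [0.07605, 0.08605, 0.09605]]   # r_s(i), s = 0..i
probs = [0.5, 0.9]                                             # q_0, q_1

bond_at_2 = [math.exp(-r) for r in rates[2]]      # three-year zero, one year before maturity
payoff = [max(b - 0.91, 0.0) for b in bond_at_2]  # call struck at 0.91, expiring at i = 2
c1 = rollback(payoff, rates[1], probs[1])         # option values at i = 1
call_price = rollback(c1, rates[0], probs[0])[0]  # option value at i = 0
```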


Finally, using all the market data so far, we wish to evaluate a second European call option written on the three year bond, expiring in one year, and struck at \$0.85. Its no-arbitrage price, denoted with $\hat{C}$, is:
$$\hat{C} = e^{-0.025}\, \frac{1}{2} \left[\left(P_1(1,3) - 0.85\right)^+ + \left(P_0(1,3) - 0.85\right)^+\right],$$
where
$$P_1(1,3) = e^{-r_1(1)} \left[q_1 e^{-r_2(2)} + (1 - q_1)\, e^{-r_1(2)}\right] = e^{-0.07} \left[0.90\, e^{-0.09605} + 0.10\, e^{-0.08605}\right] = 0.84786,$$
$$P_0(1,3) = e^{-r_0(1)} \left[q_1 e^{-r_1(2)} + (1 - q_1)\, e^{-r_0(2)}\right] = e^{-0.06} \left[0.90\, e^{-0.08605} + 0.10\, e^{-0.07605}\right] = 0.86498.$$
That is, $\hat{C} = e^{-0.025}\, \frac{1}{2}\, (0.01498) = 0.00730$. Suppose, now, that the market value of this option diverges from $\hat{C}$, i.e. $\hat{C} \ne C_\$$, where $C_\$$ is the market value of the option. For example, $C_\$ > \hat{C}$. To implement this arbitrage opportunity, we can sell the option, and use the proceeds to build up a portfolio comprising the bond expiring in three years and a money market account, with initial value:
$$V_0 = \Delta\, P_\$(0,3) + M,$$
where $\Delta$ and $M$ are chosen to match the payoffs promised by the option at time 1:
$$\Delta\, P_1(1,3) + e^{0.025} M = C_1, \qquad \Delta\, P_0(1,3) + e^{0.025} M = C_0,$$
where $C_1$ and $C_0$ are the payoffs of the one year option. The solution is,
$$\Delta = \frac{C_1 - C_0}{P_1(1,3) - P_0(1,3)}, \qquad M = e^{-0.025}\, \frac{C_0\, P_1(1,3) - C_1\, P_0(1,3)}{P_1(1,3) - P_0(1,3)}.$$
Using the numerical values obtained so far, $C_1 = 0$, $C_0 = 0.01498$, $P_1(1,3) = 0.84786$, $P_0(1,3) = 0.86498$, we have:
$$\Delta = \frac{-0.01498}{0.84786 - 0.86498} = 0.875, \qquad M = e^{-0.025}\, \frac{0.01498 \cdot 0.84786}{0.84786 - 0.86498} = -0.72356.$$

The current value of this portfolio is,
$$V_0 = \Delta\, P_\$(0,3) + M = 0.875 \cdot 0.83527 - 0.72356 = 0.00730 = \hat{C}.$$

Finally, if $C_\$ < \hat{C}$, we can buy the option, claim its payoffs from the option seller, and short sell the previous $(\Delta, M)$ portfolio. Net profits from the trade are $\hat{C} - C_\$ > 0$ at time $i = 0$; at time $i = 1$, we use the claims from the option we purchased to honour the sale of the $(\Delta, M)$-portfolio.
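The replication argument can be checked numerically with a short sketch. Names such as `rollback`, `delta_hedge` and `money_market` are illustrative; inputs are the calibrated rates and probabilities of this example.

```python
import math

def rollback(values, rates, q):
    """Discounted risk-neutral expectation, one period back (values[s + 1] is the up-move)."""
    return [math.exp(-r) * (q * values[s + 1] + (1 - q) * values[s])
            for s, r in enumerate(rates)]

rates = [[0.025], [0.06, 0.07], [0.07605, 0.08605, 0.09605]]
q0, q1 = 0.5, 0.9

bond_at_2 = [math.exp(-r) for r in rates[2]]
bond_at_1 = rollback(bond_at_2, rates[1], q1)         # three-year zero at i = 1, s = 0, 1
bond_at_0 = rollback(bond_at_1, rates[0], q0)[0]      # three-year zero today

payoff = [max(b - 0.85, 0.0) for b in bond_at_1]      # call struck at 0.85, expiring at i = 1
option_price = rollback(payoff, rates[0], q0)[0]

# Bond units and money market position matching the option payoffs at i = 1
delta_hedge = (payoff[1] - payoff[0]) / (bond_at_1[1] - bond_at_1[0])
money_market = math.exp(-rates[0][0]) * (payoff[1] - delta_hedge * bond_at_1[1])
portfolio_value = delta_hedge * bond_at_0 + money_market
```

The portfolio value coincides with the option price, which is the no-arbitrage statement made in the text.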

11.7 Callables, puttable and convertibles with trees


This section provides an introductory discussion of the pricing of callable, puttable and convertible bonds, with and without credit risk, and develops basic pricing examples for callable and convertible bonds, relying on binomial trees. Chapter 12 develops a continuous time evaluation framework for callable and puttable bonds, while Chapter 13 contains a continuous time model to evaluate convertible bonds.
11.7.1 Definitions and rationale
Callable bonds are assets that can be called back by the issuer at a pre-specified strike price, either at a fixed maturity date or at any fixed date before the expiration. The rationale behind this optionality is that at the date of issuance, the market might not share the same optimism as the issuer as regards the issuer's future creditworthiness. By adding a provision to call the bonds, the issuer gives itself the option to refinance at some future date, at hopefully better market conditions. Although this specific example might link to agency problems or differences in beliefs between the bond issuer and market participants, the indenture to call the bond is an option that might generally arise as a result of pure hedging motives, arising from a concern that future interest rates might fall.
Naturally, the right to call the bonds raises the cost of capital, to the extent of the value of this (call) option to redeem the bonds. Let us examine the details of these issues. Let $B_t$ denote the value of a callable zero coupon bond, which we assume to embed an American style option to call, i.e. one that could be exercised at any point in time before the expiration. For each $t$, the firm decides to redeem the bonds whenever the exercise price $K$ is less than the (discounted value of the) callable bond expected to prevail in the next period. The firm, then, would exercise at $t$, should it expect its cost of capital will decrease at $t+1$, which would boost the market evaluation of its debt. In this case, the price of a callable bond is clearly just $K$. Otherwise, the value of the callable bond is its discounted expected value over the next period. To summarize,
$$B_t = \min\{K,\ e^{-r_t}\, \mathbb{E}_t(B_{t+1})\} = e^{-r_t}\, \mathbb{E}_t(B_{t+1}) - \max\{e^{-r_t}\, \mathbb{E}_t(B_{t+1}) - K,\ 0\}. \tag{11.67}$$
We can view this problem from a slightly different angle, one where the firm may decide to issue non-callable zero coupon bonds just upon conversion, such that the price of any callable bond could be neatly decomposed as the price of a straight, non-callable bond, minus the option to call the bond.
Let $P(t,T)$ denote the price of a non-callable zero coupon bond as usual and suppose that at some point in time $\tau$, interest rates have decreased to an extent to have made $P(\tau,T)$ sufficiently large, in a sense to be explained in a moment. The problem we want to study is actually one where the issuer is seeking an optimal stopping time at which it can redeem the bonds for $K$, and issue new non-callable debt at $t = \tau$, priced at $P(\tau,T)$. This would allow the issuer to cash in a difference equal to $P(\tau,T) - K$. Note that by doing so, the bond-issuer is left with the same optionalities it would have by not exercising the option to call, but with the additional money-shower, $P(\tau,T) - K$. It is, therefore, in the interest of the bond-issuer to exercise at $\tau$, whenever the difference $P(\tau,T) - K$ is positive and sufficiently large, and it is obviously not otherwise. Naturally, we consider re-issuance of non-callable debt because we wish to achieve a neat decomposition of the value of callable debt in terms of non-callable debt and an option to call.
How large does the difference $P(\tau,T) - K$ have to be? It is a real option problem of the kind studied in Chapters 4, 8 and 10 of these Lectures. The objective of the firm is to maximize the present value of the money shower at some optimal stopping time $\tau$, viz
P

=
sup E
( ( )
)+
(11.68)
C
= inf [ ] { : ( ) = }.
which gives rise to a free-exercise boundary,
We conjecture, accordingly, that the value of callable debt can be decomposed at any time $t$ as the value of a non-callable debt minus the American option price $\mathcal{C}_t$ in Eq. (11.68),
$$B_t = P(t,T) - \mathcal{C}_t. \tag{11.69}$$

Replacing Eq. (11.69) into Eq. (11.67) leaves
$$P(t,T) - \mathcal{C}_t = e^{-r_t}\, \mathbb{E}_t\left(P(t+1,T) - \mathcal{C}_{t+1}\right) - \max\{e^{-r_t}\, \mathbb{E}_t\left(P(t+1,T) - \mathcal{C}_{t+1}\right) - K,\ 0\}$$
$$= P(t,T) - \max\{(P(t,T) - K)^+,\ e^{-r_t}\, \mathbb{E}_t(\mathcal{C}_{t+1})\},$$
where the second equality follows by the martingale property of the bond price rescaled by the money market account, and by rearranging terms. That is,
$$\mathcal{C}_t = \max\{(P(t,T) - K)^+,\ e^{-r_t}\, \mathbb{E}_t(\mathcal{C}_{t+1})\}, \tag{11.70}$$
confirming Eq. (11.69). In other words, the optimal stopping time for the problem in Eq. (11.67) collapses to that in Eq. (11.68).
Puttable bonds, instead, are assets that give the holder the right to sell the bonds back to the issuer at some exercise price, either at a fixed maturity date or any fixed date before the expiration. The bondholders would exercise their option to tender the bonds to the issuer when market conditions improve from their perspective, i.e. when interest rates are high enough, so as to make bond prices lower than the exercise price. Therefore, issuing puttable bonds leads to a lower cost of capital, to the extent of the value of the American put option given to the bondholders to tender the bonds at the strike $K$, in analogy with the pricing of callable bonds.
Suppose for example that the price of a non-puttable bond, $P(t,T)$ as usual, lowers to a level sufficiently lower than the strike price $K$, in a sense to be determined in a moment. The bondholders, then, will find it convenient to tender the bonds at $K$, buying conventional bonds at $P(\tau,T)$, thereby cashing in $K - P(\tau,T)$, and then wait until maturity. This trade would provide bondholders with a money-shower equal to $K - P(\tau,T)$, at the exercise date. Alternatively, the bondholders would not exercise, and wait until maturity, in which case they would not receive the profit $K - P(\tau,T)$, at the exercise date. Therefore, it is optimal to exercise when $K > P(\tau,T)$, with the price being sufficiently low for some $\tau$, and it is obviously not otherwise. Therefore, the value of a puttable zero coupon bond, $B^p_t$ say, satisfies:
$$B^p_t = \max\{K,\ e^{-r_t}\, \mathbb{E}_t(B^p_{t+1})\} = e^{-r_t}\, \mathbb{E}_t(B^p_{t+1}) + \max\{K - e^{-r_t}\, \mathbb{E}_t(B^p_{t+1}),\ 0\}. \tag{11.71}$$
Conjecture that,
$$B^p_t = P(t,T) + \mathcal{P}_t, \tag{11.72}$$
where $\mathcal{P}_t$ is the value of an American put on the zero-coupon bond, solution to,
$$\mathcal{P}_t = \max\{(K - P(t,T))^+,\ e^{-r_t}\, \mathbb{E}_t(\mathcal{P}_{t+1})\}. \tag{11.73}$$

Substituting Eq. (11.72) into Eq. (11.71) leaves Eq. (11.73) by arguments similar to those leading to Eq. (11.70).
Convertible bonds are assets that give the holder the right to convert them into a prespecified number of shares of the firm. Their value at each date when the conversion can take place is $\max\{CV, B\} = B + \max\{CV - B, 0\}$, where $B$ is the value of the bond and $CV$ denotes the conversion value of the bonds, expressed in terms of the value of the firm's shares: issuing convertible bonds now lowers the cost of capital to the extent of the option given to the bondholders to convert the bonds into shares. Convertible bonds can be made callable by the bond-issuers, at a strike $K$. Usually, if the bonds are called, the convertible bondholders have the option to either tender the bonds to the firm, or to convert them. On the other hand, the only reason the bond-issuers might call is that the price of the convertibles is up, compared to the strike price. Therefore, the option to make convertible bonds also callable puts a ceiling on the price of the convertible bonds, given by the exercise price, $K$. Mathematically, in the presence of callability, the value of a convertible bond at each potential conversion date is $\max\{CV, \min\{B, K\}\}$: the option to call back takes away some of the optionality from the bondholders, who are, in effect, forced to convert, as soon as the price $B$ increases to a level beyond $K$.
For the previous mechanism to work, the conversion value and the bond price cannot be
both continuous processes. Alternatively, we need to think in terms of a discrete time context.
Consider the following events triggering the convertible bond-holder to convert after the issuer
decides to call. First, the issuer calls as soon as
, such that max {CV min {
}} =
max {CV
}. Then, the convertible bond-holder decides to convert when CV
, regardless
of whether CV is larger than . For example, it may be that CV
, in which case conversion
would not have taken place without the issuer option to call at . Note that with this option
to call, we could well have that:
CV

If CV0
are continuous, these inequalities could not hold. If some
0 , and both CV and
point
, it so occurs that
, the issuer would call the bond, and give the bondholder
the option to convert. But the bondholder would not convert as due to continuity, CV would
still be below .
11.7.2 Callable bonds
11.7.2.1 Coping with credit risk

We can price callable bonds through trees in a way that the initial yield curve is fitted without errors, relying on the methodology in this chapter. One issue to take into account is the presence of credit risk. We may proceed as follows.
(i) First, we populate a short-term rate tree through one of the models described in this chapter, say, for example, through the Black, Derman and Toy (1990) model.
(ii) Second, we rely on the implied short-term rate process of the previous step and price a callable coupon bearing bond without default risk. In each node, we compare the strike price $K$ with the rolled-back (ex-coupon) bond value, take the minimum of the two, add the coupon to this minimum and find, then, the market value of the callable bond at the relevant node,
$$B_s(t) = \min\{K,\ e^{-r_s(t)}\, \mathbb{E}(B_{t+1} \mid s)\} + \text{coupon},$$
where $B_s(t)$ denotes the value of the callable coupon bearing bond at time $t$ and state $s$, $r_s(t)$ is the short-term rate process at $t$ and state $s$, and $e^{-r_s(t)}\, \mathbb{E}(B_{t+1} \mid s)$ is the rolled-back ex-coupon value of the bond at $t$ and state $s$. As usual, this rolled-back value is found recursively, i.e., by discounting the risk-neutral expectation of the future values of the coupon-bearing callable bond.
(iii) Third, we correct for credit risk. The price of the callable bond in the previous step is likely to exceed the market price, due to credit risk. One then proceeds with adding a constant spread to the short-term rate process in step one. The resulting credit risk-adjusted interest rate tree is used to implement step (ii). If the model-based callable bond is valued less than the market, one re-calibrates the interest rate tree with a lower credit spread, until convergence is achieved by which market and model prices of the callable bonds are the same. The resulting spread is usually referred to as the option-adjusted spread to emphasize it is determined while taking into account the optionalities regarding the callable bonds.
At this point, we may price derivatives written on callable defaultable coupon bearing bonds, for example, options.9 We now illustrate a simple case, relating to the pricing of a callable bond without credit risk.
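Step (ii) can be sketched as a short backward recursion. This is a minimal sketch under stated assumptions: rates are discretely compounded per period, the function name `callable_values` is illustrative, and the inputs are those of the numerical example of Section 11.7.2.2 (six-month rate tree, time-dependent probabilities 0.40, 0.70 and 0.60, a 3% coupon and a call at par), with the udd rate set to the 2% used in that example's calculations.

```python
def callable_values(rates, q, coupon, strike, principal=1.0):
    """Ex-coupon values of a callable bond on a recombining tree.

    rates[i][s] is the per-period rate at step i after s up-moves;
    q[i] is the probability of an up-move from step i to step i+1.
    Returns levels, with levels[i][s] the callable value at step i, state s.
    """
    # One period before maturity: principal plus last coupon, discounted, capped at the strike
    levels = [[min(strike, (principal + coupon) / (1 + r)) for r in rates[-1]]]
    for i in range(len(rates) - 2, -1, -1):
        nxt, cur = levels[0], []
        for s, r in enumerate(rates[i]):
            cont = (q[i] * (nxt[s + 1] + coupon) + (1 - q[i]) * (nxt[s] + coupon)) / (1 + r)
            cur.append(min(strike, cont))     # the issuer calls when waiting is worth more than the strike
        levels.insert(0, cur)
    return levels

tree = [[0.03], [0.02, 0.035], [0.015, 0.03, 0.04], [0.0125, 0.02, 0.035, 0.0475]]
levels = callable_values(tree, q=[0.40, 0.70, 0.60], coupon=0.03, strike=1.0)
```

Adding a constant spread to every entry of `tree` and repeating the calculation until the model price matches a market quote would be one way to approach step (iii).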
11.7.2.2 A numerical example: without credit

Assume that the discretely compounded six-month rate, or the short-term rate, evolves over
time according to the tree described in Figure 11.15.

9 Ho and Lee (2004) (Chapter 8, Section 8.3 p. 274-278) contain exercises on the pricing of options on callable bonds relying on
trees such as those described in this section.

[Figure: binomial tree for the discretely compounded six-month rate. At $t = 0$: $r = 3\%$. At $t = 0.5$: u: $r = 3.5\%$; d: $r = 2\%$. At $t = 1$: uu: $r = 4\%$; ud: $r = 3\%$; dd: $r = 1.5\%$. At $t = 1.5$: uuu: $r = 4.75\%$; uud: $r = 3.5\%$; udd: $r = 2\%$; ddd: $r = 1.25\%$.]

FIGURE 11.15.

Next, consider a bond expiring in two years, paying o coupon rates of 3% of the principal
of $1 every six months, and callable at any time by the issuer, at par value. Let this bond
be labeled BCX. Suppose that the prices of three zero coupon bonds expiring in one year,
eighteen months and two years are, respectively, 0 94632, 0 91876 and 0 89166. We can use
these market data to calibrate the risk-neutral probabilities of upward movements in the shortterm rate implied by the binomial tree in Figure 11.15, provided these risk-neutral probabilities
depend only on calendar time , not on the specic state of nature at time .
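We assume that also available for trading is a conventional (i.e. non-callable) bond maturing in two years and paying coupons semiannually, at 3% of the principal of $1. We wish to calculate the price movements of the non-callable coupon-bearing two year bond. Let $q_0$, $q_1$ and $q_2$ denote the risk-neutral probabilities of an upward movement in the short-term rate over the first, second and third six-month periods. We have, $P_{\$}(0,0.5) = \frac{1}{1.03} = 0.97087$. Furthermore, as regards the zero expiring in one year:
$$P_{\$}(0,1) = 0.94632 = P_{\$}(0,0.5)\left( q_0 \frac{1}{1.035} + (1-q_0)\frac{1}{1.02}\right),$$
which solved for $q_0$ delivers $q_0 = 0.40$. As for the zero expiring in 1.5 years,
$$P_{\$}(0,1.5) = 0.91876 = P_{\$}(0,0.5)\left( q_0 P_u(0.5,1.5) + (1-q_0) P_d(0.5,1.5)\right) = 0.97087\left(0.40\, P_u(0.5,1.5) + 0.60\, P_d(0.5,1.5)\right),$$
where
$$P_u(0.5,1.5) = \frac{1}{1.035}\left( q_1 \frac{1}{1.04} + (1-q_1)\frac{1}{1.03}\right), \qquad P_d(0.5,1.5) = \frac{1}{1.02}\left( q_1 \frac{1}{1.03} + (1-q_1)\frac{1}{1.015}\right).$$
Solving for $q_1$ leaves $q_1 = 0.70$. As for the zero expiring in 2 years:
$$P_{\$}(0,2) = 0.89166 = P_{\$}(0,0.5)\left( q_0 P_u(0.5,2) + (1-q_0) P_d(0.5,2)\right) = 0.97087\left(0.40\, P_u(0.5,2) + 0.60\, P_d(0.5,2)\right),$$
where
$$P_u(0.5,2) = \frac{1}{1.035}\left( q_1 P_{uu}(1,2) + (1-q_1) P_{ud}(1,2)\right) = \frac{1}{1.035}\left(0.70\, P_{uu}(1,2) + 0.30\, P_{ud}(1,2)\right),$$
$$P_d(0.5,2) = \frac{1}{1.02}\left( q_1 P_{ud}(1,2) + (1-q_1) P_{dd}(1,2)\right) = \frac{1}{1.02}\left(0.70\, P_{ud}(1,2) + 0.30\, P_{dd}(1,2)\right),$$
and:
$$P_{uu}(1,2) = \frac{1}{1.04}\left( q_2 \frac{1}{1.0475} + (1-q_2)\frac{1}{1.035}\right),$$
$$P_{ud}(1,2) = \frac{1}{1.03}\left( q_2 \frac{1}{1.035} + (1-q_2)\frac{1}{1.02}\right),$$
$$P_{dd}(1,2) = \frac{1}{1.015}\left( q_2 \frac{1}{1.02} + (1-q_2)\frac{1}{1.0125}\right).$$
Solving for $q_2$ leaves $q_2 = 0.60$.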
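The calibration above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the names `RATES`, `zero_price` and `calibrate` are hypothetical, and the sketch assumes, as in the text, that the up-probability depends only on calendar time. It exploits the fact that each zero price is linear in the one probability still to be calibrated.

```python
# Six-month rates at each node of the tree in Figure 11.15 (key = path of moves).
RATES = {
    "": 0.03,
    "u": 0.035, "d": 0.02,
    "uu": 0.04, "ud": 0.03, "dd": 0.015,
    "uuu": 0.0475, "uud": 0.035, "udd": 0.02, "ddd": 0.0125,
}

def zero_price(n_periods, probs):
    """Time-0 price of a zero paying 1 after n_periods six-month periods.

    probs[k] is the risk-neutral probability of an up move between
    dates k and k+1 (assumed state-independent, as in the text).
    """
    values = [1.0] * (n_periods + 1)  # payoff at maturity, indexed by number of up moves
    for step in range(n_periods - 1, -1, -1):
        # At the last step all payoffs equal 1, so the probability is irrelevant.
        q = probs[step] if step < len(probs) else 0.5
        new_values = []
        for ups in range(step + 1):
            rate = RATES["u" * ups + "d" * (step - ups)]
            expected = q * values[ups + 1] + (1 - q) * values[ups]
            new_values.append(expected / (1 + rate))
        values = new_values
    return values[0]

def calibrate(market_zeros):
    """market_zeros[k] is the price of the zero maturing after k+2 periods."""
    probs = []
    for k, target in enumerate(market_zeros):
        n = k + 2
        low = zero_price(n, probs + [0.0])   # price if q_k = 0
        high = zero_price(n, probs + [1.0])  # price if q_k = 1
        probs.append((target - low) / (high - low))  # price is linear in q_k
    return probs

probs = calibrate([0.94632, 0.91876, 0.89166])
print([round(q, 2) for q in probs])
```

Because the market prices are quoted with five decimals, the recovered probabilities match 0.40, 0.70 and 0.60 only up to rounding.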


The price of a coupon bearing bond yielding 3% of the principal every six months is easy to calculate. At time zero, the two year bond is worth
$$0.03 \times (0.97087 + 0.94632 + 0.91876 + 0.89166) + 0.89166 = 1.0035. \tag{11.74}$$

Given the market data and the previously calibrated risk-neutral probabilities, we now proceed with the calculation of the price of the callable coupon bearing bond. We discount the expected cash flows through the evaluation formula, $\min\{V, 1\} + 0.03$, where $V$ is the present value of the future expected discounted cash flows promised at each node by a callable bond with the same strike price (here, the par value of 1). We have:
(i) At $t = 1.5$ years,
uuu: $\frac{1.03}{1.0475} = 0.98329$ vs 1: wait, and the value of the callable bond is 0.98329.
uud: $\frac{1.03}{1.035} = 0.99517$ vs 1: wait, and the value of the callable bond is 0.99517.
udd: $\frac{1.03}{1.02}$ vs 1: exercise, and the value of the callable bond is 1.
ddd: $\frac{1.03}{1.0125}$ vs 1: exercise, and the value of the callable bond is 1.

(ii) At $t = 1$ year, we have that $q_2 = 60\%$, and, then:
uu: $\frac{1}{1.04}\left[0.6\left(\frac{1.03}{1.0475} + 0.03\right) + 0.4\left(\frac{1.03}{1.035} + 0.03\right)\right] = 0.97889$ vs 1: wait, and the value of the callable bond is 0.97889.
ud: $\frac{1}{1.03}\left[0.6\left(\frac{1.03}{1.035} + 0.03\right) + 0.4\,(1 + 0.03)\right] = 0.99719$ vs 1: wait, and the value of the callable bond is 0.99719.
dd: $\frac{1}{1.015}\left[0.6\,(1 + 0.03) + 0.4\,(1 + 0.03)\right] = 1.0148$ vs 1: exercise, and the value of the callable bond is 1.

(iii) At $t = 0.5$ years, we have that $q_1 = 70\%$, and, then:
u: $\frac{1}{1.035}\left[(0.7 \times 0.97889 + 0.3 \times 0.99719) + 0.03\right] = 0.98008$ vs 1: wait, and the value of the callable bond is 0.98008.
d: $\frac{1}{1.02}\left[(0.7 \times 0.99719 + 0.3 \times 1) + 0.03\right] = 1.0079$ vs 1: exercise, and the value of the callable bond is 1.

Finally, at the time of evaluation, we have that $q_0 = 40\%$, and, then, the price of the callable bond is:
$$\frac{1}{1.03}\left(0.40 \times 0.98008 + 0.60 \times 1 + 0.03\right) = 0.99226.$$
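The backward recursion (i)-(iii) can be reproduced with a short Python sketch. The function name `bond_price` and its arguments are hypothetical; the sketch assumes the rates of Figure 11.15 (with a 2% rate at node udd, as used in the calculations) and the calibrated probabilities 40%, 70% and 60%. Passing `call_price=None` prices the conventional bond instead.

```python
# Six-month rates at each node of the tree in Figure 11.15.
RATES = {
    "": 0.03,
    "u": 0.035, "d": 0.02,
    "uu": 0.04, "ud": 0.03, "dd": 0.015,
    "uuu": 0.0475, "uud": 0.035, "udd": 0.02, "ddd": 0.0125,
}

def bond_price(probs, coupon=0.03, call_price=None, n_periods=4):
    """Ex-coupon value at time 0 of a bond with principal 1.

    If call_price is given, the issuer calls whenever the rolled-back
    value exceeds it, so each node is worth min{V, call_price}.
    """
    values = [1.0] * (n_periods + 1)  # principal at maturity; coupons are added below
    for step in range(n_periods - 1, -1, -1):
        # At the last step all payoffs coincide, so the probability is irrelevant.
        q = probs[step] if step < len(probs) else 0.5
        new_values = []
        for ups in range(step + 1):
            rate = RATES["u" * ups + "d" * (step - ups)]
            expected = q * values[ups + 1] + (1 - q) * values[ups] + coupon
            value = expected / (1 + rate)
            if call_price is not None:
                value = min(value, call_price)
            new_values.append(value)
        values = new_values
    return values[0]

probs = [0.40, 0.70, 0.60]
callable_bond = bond_price(probs, call_price=1.0)
straight_bond = bond_price(probs)
print(round(callable_bond, 5), round(straight_bond, 4))
```

The difference `straight_bond - callable_bond` is the value of the issuer's call option under these assumptions.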
Naturally, the callable bond is valued less than the conventional bond priced in Eq. (11.74): the difference is the value of the option given to the issuer to redeem these bonds, and arises when the interest rates go sufficiently down (negative convexity).
How would we proceed to price the BCX bond if the previous market data were unavailable? In particular, suppose that (i) the risk-neutral probabilities of upward movements in the short-term rate are: (i.a) unknown from time zero to 0.5 years; (i.b) 70%, from 0.5 to one year; and (i.c) 60%, from one to 1.5 years; (ii) available for trading is a European call option written on the BCX bond; (iii) this option, which quotes for $\$1.7226 \times 10^{-3}$, expires in 1.5 years, is struck at $0.99000, and becomes worthless as soon as the underlying callable bond is called back by the issuer. First, note that at the expiration, $t = 1.5$ years, the payoffs of the option are:
$$C_{uuu} = 0, \qquad C_{uud} = 0.00517,$$
and, because of the sudden death assumption,
$$C_{udd} = C_{ddd} = 0.00000.$$
At $t = 1$ year, we have that $q_2 = 60\%$, and, then:
$$C_{uu} = \frac{1}{1.04}\left(0.6 \times 0 + 0.4 \times 0.00517\right) = 1.9885 \times 10^{-3},$$
$$C_{ud} = \frac{1}{1.03}\left(0.6 \times 0.00517 + 0.4 \times 0\right) = 3.0117 \times 10^{-3},$$
$$C_{dd} = 0.$$
At $t = 0.5$ years, we have that $q_1 = 70\%$, and, then:
$$C_u = \frac{1}{1.035}\left(0.70\, C_{uu} + 0.30\, C_{ud}\right) = \frac{1}{1.035}\left(0.70 \times 1.9885 \times 10^{-3} + 0.30 \times 3.0117 \times 10^{-3}\right) = 2.2178 \times 10^{-3},$$
$$C_d = 0, \text{ by the sudden death assumption.}$$
At the time of evaluation, the price of the call is,
$$C = 1.7226 \times 10^{-3} = \frac{1}{1.03}\left(q\, C_u + (1-q)\, C_d\right) = \frac{1}{1.03}\, q \times 2.2178 \times 10^{-3},$$
where $q$ is the risk-neutral probability of an upward movement in the short-term rate during the first six months. We can solve for this $q$, obtaining $q = 80\%$. Finally, given this probability, we can calculate the price of the callable bond. We have:
$$\frac{1}{1.03}\left(0.80 \times 0.98008 + 0.20 \times 1 + 0.03\right) = 0.98453.$$
It is lower than the price calculated earlier, because the price of the option is giving more weight (80%) than before (40%) to the occurrence of the state of the world where the interest rate goes up.
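As a check on the reasoning above, the following Python sketch backs out the unknown first-period probability from the option quote and then reprices the callable bond. Variable names are illustrative only; the callable bond values at the nodes (0.98329, 0.99517, 0.98008 and 1) are taken from the previous example rather than recomputed.

```python
# Rates on the nodes that matter for the option (six-month rates, Figure 11.15).
r0, r_u, r_uu, r_ud = 0.03, 0.035, 0.04, 0.03
q1, q2 = 0.70, 0.60              # known up-probabilities (0.5y to 1y, 1y to 1.5y)
strike, quote = 0.99, 1.7226e-3  # option strike and market quote

# Option payoffs at t = 1.5 years; zero wherever the bond has been called.
c_uuu = max(0.98329 - strike, 0.0)
c_uud = max(0.99517 - strike, 0.0)
c_udd = 0.0                      # sudden death: bond called at udd

# Roll back; the option is dead at nodes dd and d, where the bond is called.
c_uu = (q2 * c_uuu + (1 - q2) * c_uud) / (1 + r_uu)
c_ud = (q2 * c_uud + (1 - q2) * c_udd) / (1 + r_ud)
c_u = (q1 * c_uu + (1 - q1) * c_ud) / (1 + r_u)
c_d = 0.0

# The quote is linear in the unknown probability q0, so invert directly.
q0 = (quote * (1 + r0) - c_d) / (c_u - c_d)

# Reprice the callable bond with the implied q0 (node values from the text).
bond_u, bond_d, coupon = 0.98008, 1.0, 0.03
callable_price = (q0 * bond_u + (1 - q0) * bond_d + coupon) / (1 + r0)
print(round(q0, 2), round(callable_price, 5))
```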
11.7.3 Convertible bonds
11.7.3.1 Evaluation issues

Consider the following convertible and callable bond. Let $K$ be the strike at which the bond can be called by the bond-issuer, and let the parity, or conversion value, be CV = CR $\times S$, where $S$ is the price of the common share. To evaluate this bond through a binomial tree, we may proceed through the following three steps:
(i) First, we set the life of the tree equal to the life of the callable convertible bond.
(ii) Second, we assess the evolution of the stock price along the tree, under the risk-neutral probability. This is done following the standard Cox, Ross and Rubinstein (1979) approach.
(iii) Third, in each node, we determine the value of the bond as $\max\{\text{CV}, \min\{V, K\}\}$, where $V$ is the value of the bond, rolled-back from the values of the bond in the next nodes through the usual recursive, backward method, relying on calculating the present value of the risk-neutral expectation of the future payoffs. That is, assuming the bondholder does not convert, the value is $\hat{V} = \min\{V, K\}$, where $V$ is the rolled-back value of the bond. Then, the value is $\max\{\text{CV}, \hat{V}\}$.
Note that this procedure fills in the nodes once we know the appropriate interest rate. If the firm were not subject to default risk, we would simply use the riskless interest rate. However, the firm is obviously subject to default risk. In practice, we proceed as follows. In each node, the value of the bond is decomposed into two parts: one part, related to the pure debt component, is discounted at the defaultable interest rate; the other part, related to the pure equity component, is discounted at the default-free interest rate. Exercise 25.7 in Hull (2003) (p. 653-654) illustrates a specific example.
11.7.3.2 A numerical example: without credit

Consider a three year convertible bond, which can be converted at any time into one share of the underlying firm's stock. The bond has a face value equal to 1, it is default-free, and pays off a coupon of 3% of the face value every year, except the time at which it is issued. Moreover, in each period, it pays off the coupon, regardless of whether it will be converted or not. The price of the share is assumed to be unaffected by any decision relating to the conversion of the bond, and evolves over time as described by the following tree:

[Figure: binomial tree for the share price $S$, with $q = \frac{1}{2}$ at each date. Nodes by date:]
t = 0:  S = 1.00
t = 1:  u: S = 1.10;  d: S = 0.90
t = 2:  uu: S = 1.20;  ud: S = 1.00;  dd: S = 0.80
t = 3:  uuu: S = 1.30;  uud: S = 1.10;  udd: S = 0.90;  ddd: S = 0.70

In the previous diagram, each period corresponds to one year, $S$ denotes the price of the share, and $q = \frac{1}{2}$ is the constant risk-neutral probability of price movements. Assume, finally, that the yield curve is flat at 3%, discretely compounded, and that it will remain such over the next three periods, and in each state of the world.
We proceed to calculate the conversion value of the convertible bond at each node of the tree. We shall identify, then, the nodes where it is optimal for the bond-holder to convert. Finally, we shall determine the value of the convertible bond at time $t = 0$, as well as the value of the option to convert. As for the conversion value, we know this is simply the product of the conversion ratio times the current value of the outstanding stock, and equals CV = CR $\times S = S$, as the conversion ratio is one. To find the current value of the convertible bond, we proceed recursively, as explained earlier, and calculate, for each date and each node, $\max\{\text{CV}, V\}$, where $V$ denotes the present value of the future cash flows of the convertible, in case of no conversion at time $t$. The payoffs at time $t = 3$ are:
uuu: CV = $S$ = 1.30, $\max\{\text{CV}, 1\} + 0.03 = 1.33$, convert
uud: CV = $S$ = 1.10, $\max\{\text{CV}, 1\} + 0.03 = 1.13$, convert
udd: CV = $S$ = 0.90, $\max\{\text{CV}, 1\} + 0.03 = 1.03$
ddd: CV = $S$ = 0.70, $\max\{\text{CV}, 1\} + 0.03 = 1.03$

At $t = 2$, we have:
uu: CV = $S$ = 1.20, value $\max\{\text{CV}, V_1\} + 0.03$, with $V_1 = \frac{1}{1.03}\frac{1}{2}(1.33 + 1.13) = 1.19420$. Hence the value is 1.23000, convert.
ud: CV = $S$ = 1.00, value $\max\{\text{CV}, V_2\} + 0.03$, with $V_2 = \frac{1}{1.03}\frac{1}{2}(1.13 + 1.03) = 1.04850$. Hence the value is 1.07850.
dd: CV = $S$ = 0.80, value $\max\{\text{CV}, V_3\} + 0.03$, with $V_3 = \frac{1}{1.03}\frac{1}{2}(1.03 + 1.03) = 1$. Hence the value is 1.03000.

At $t = 1$, we have:
u: CV = $S$ = 1.10, value $\max\{\text{CV}, V_1\} + 0.03$, with $V_1 = \frac{1}{1.03}\frac{1}{2}(1.23 + 1.07850) = 1.1206$. Hence the value is 1.15060.
d: CV = $S$ = 0.90, value $\max\{\text{CV}, V_2\} + 0.03$, with $V_2 = \frac{1}{1.03}\frac{1}{2}(1.07850 + 1.03000) = 1.0235$. Hence the value is 1.0535.

Finally, we have that the value of the convertible at $t = 0$ is:
$$\frac{1}{1.03}\frac{1}{2}(1.15060 + 1.0535) = 1.0700.$$
Instead, the value of a non-convertible three year coupon bearing bond is
$$\left(\frac{1}{1.03}\right)^3 + 0.03\left[\frac{1}{1.03} + \left(\frac{1}{1.03}\right)^2 + \left(\frac{1}{1.03}\right)^3\right] = 1.00000.$$
Therefore, the option to convert is worth 0.07000.
Next, assume that the convertible bond is also callable by the issuer, at any time, and at a strike value of 1.02000, and if it is called, the bond-holder has the option to tender the bond or to convert it into one share. This convertible, and callable, bond can be evaluated as in the previous calculations, although the formula to use in each node is, now, $\max\{\text{CV}, \min\{V, K\}\}$, with $K = 1.02000$. The payoffs at time $t = 3$ are, now:
uuu: CV = $S$ = 1.30, $\max\{\text{CV}, \min\{V = 1, 1.02\}\} + 0.03 = 1.33$, convert
uud: CV = $S$ = 1.10, $\max\{\text{CV}, \min\{V = 1, 1.02\}\} + 0.03 = 1.13$, convert
udd: CV = $S$ = 0.90, $\max\{\text{CV}, \min\{V = 1, 1.02\}\} + 0.03 = 1.03$
ddd: CV = $S$ = 0.70, $\max\{\text{CV}, \min\{V = 1, 1.02\}\} + 0.03 = 1.03$

At $t = 2$, we have:
uu: CV = $S$ = 1.20, value $\max\{\text{CV}, \min\{V_1, 1.02\}\} + 0.03$, with $V_1 = \frac{1}{1.03}\frac{1}{2}(1.33 + 1.13) = 1.19420$. Hence the value is $\max\{\text{CV}, 1.02\} + 0.03 = 1.23$. The bond is called, and then converted.
ud: CV = $S$ = 1.00, value $\max\{\text{CV}, \min\{V_2, 1.02\}\} + 0.03$, with $V_2 = \frac{1}{1.03}\frac{1}{2}(1.13 + 1.03) = 1.04850$. Hence the value is $\max\{\text{CV}, 1.02\} + 0.03 = 1.05$. The bond is called, but not converted.
dd: CV = $S$ = 0.80, value $\max\{\text{CV}, \min\{V_3, 1.02\}\} + 0.03$, with $V_3 = \frac{1}{1.03}\frac{1}{2}(1.03 + 1.03) = 1$. Hence the value is $\max\{\text{CV}, 1\} + 0.03 = 1.03000$.

At $t = 1$, we have:
u: CV = $S$ = 1.10, value $\max\{\text{CV}, \min\{V_1, 1.02\}\} + 0.03$, with $V_1 = \frac{1}{1.03}\frac{1}{2}(1.23 + 1.05) = 1.1068$. Hence the value is $\max\{\text{CV}, 1.02\} + 0.03 = 1.13000$. The bond is called, and then converted.
d: CV = $S$ = 0.90, value $\max\{\text{CV}, \min\{V_2, 1.02\}\} + 0.03$, with $V_2 = \frac{1}{1.03}\frac{1}{2}(1.050 + 1.03000) = 1.0097$. Hence the value is $\max\{\text{CV}, 1.0097\} + 0.03 = 1.0397$. The bond is not called and is not converted.

Therefore, we have that the value of the convertible callable bond at $t = 0$ is:
$$\frac{1}{1.03}\frac{1}{2}(1.13000 + 1.0397) = 1.0533.$$
As expected, the value of a convertible callable is less than that of the convertible, due to the option given to the bond-issuers to call the bond.
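The two recursions of this example (convertible, and convertible callable) differ only in the cap applied to the rolled-back value, which the following Python sketch makes explicit. The function `convertible_price` and the dictionary `STOCK` are hypothetical names; the sketch assumes a conversion ratio of one, a flat 3% rate, $q = 1/2$, and, as in the text, no coupon and no conversion decision at the issuance date.

```python
# Share price at each node of the tree (key = path of moves).
STOCK = {
    "u": 1.10, "d": 0.90,
    "uu": 1.20, "ud": 1.00, "dd": 0.80,
    "uuu": 1.30, "uud": 1.10, "udd": 0.90, "ddd": 0.70,
}

def convertible_price(call_strike=None, rate=0.03, coupon=0.03, q=0.5, n_periods=3):
    """Time-0 value of a convertible bond with face value 1 (conversion ratio 1).

    If call_strike is given, the issuer can call at that strike, and the
    holder may then convert: each node is worth max{CV, min{V, K}} + coupon.
    """
    def node_value(key, continuation):
        if call_strike is not None:
            continuation = min(continuation, call_strike)
        return max(STOCK[key], continuation) + coupon

    # Terminal date: the continuation value is the face value.
    values = [node_value("u" * ups + "d" * (n_periods - ups), 1.0)
              for ups in range(n_periods + 1)]
    for step in range(n_periods - 1, -1, -1):
        new_values = []
        for ups in range(step + 1):
            continuation = (q * values[ups + 1] + (1 - q) * values[ups]) / (1 + rate)
            if step == 0:
                new_values.append(continuation)  # issuance date: discounting only
            else:
                key = "u" * ups + "d" * (step - ups)
                new_values.append(node_value(key, continuation))
        values = new_values
    return values[0]

convertible = convertible_price()
convertible_callable = convertible_price(call_strike=1.02)
straight = 1 / 1.03**3 + 0.03 * sum(1 / 1.03**k for k in (1, 2, 3))
print(round(convertible, 4), round(convertible_callable, 4), round(straight, 4))
```

Under these assumptions, `convertible - straight` is the value of the option to convert, and `convertible - convertible_callable` is the value of the issuer's call.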


11.8 Appendix 1: Bootstrapping and no-arbitrage restrictions


A. Extracting zeros from bond prices
Standard techniques are available to extract the price of zeros from that of coupon-bearing bonds, provided at least a sufficient spread of bonds exist across maturities. Consider the following example, in which available for trading are three coupon-bearing bonds: the first pays off at $T_1$, the second bond pays off at $T_1, T_2$, the third bond pays off at $T_1, T_2, T_3$. By no-arbitrage,
$$\begin{bmatrix} B(T_1) \\ B(T_2) \\ B(T_3) \end{bmatrix} = \begin{bmatrix} c_{11}+1 & 0 & 0 \\ c_{21} & c_{22}+1 & 0 \\ c_{31} & c_{32} & c_{33}+1 \end{bmatrix} \begin{bmatrix} P(T_1) \\ P(T_2) \\ P(T_3) \end{bmatrix} \tag{11A.1}$$
for some coupons $c_{ij}$. We can use Eq. (11A.1) to invert for the prices of the zeros, $P = C^{-1}B$. There is a mathematically equivalent inversion algorithm, a procedure known as bootstrapping, based on the observation that the prices of the zeros can be solved for recursively. Let $B_i$ be the price of a bond that pays off coupons $c_{ij}$ on dates $T_1, T_2, \cdots, T_i$, and the principal of $1 at $T_i$. Let $P_i$ be the price of the zero maturing at $T_i$. Then, $P_i$ can be estimated as follows:
$$P_i = \frac{B_i - \sum_{j=1}^{i-1} c_{ij} P_j}{c_{ii} + 1}, \qquad i = 1, \cdots, N, \tag{11A.2}$$
where $N$ is the largest available maturity. It is straightforward to verify this formula using the example $N = 3$ in (11A.1).
To illustrate, suppose the bonds maturing at $T_i$ have fixed coupons, $c_{ij} = c_i$ for all $j$. We define the par yield as in Eq. (11.3), as the fixed sequence $c_i$ such that the price $B_i$ is forced to equal 100%. The following example shows how to use Eq. (11A.2) and extract zeros and, then, reconstruct a discretely compounded yield curve.

Coupon     Maturity, i   Zero price, P_i   Sum_{j=1}^{i} P_j   Yield curve
 6.00%          1            0.9434             0.9434            6.00%
 7.00%          2            0.8728             1.8162            7.04%
 8.00%          3            0.7914             2.6076            8.11%
 9.50%          4            0.6870             3.2946            9.84%
 9.00%          5            0.6454             3.9400            9.15%
10.50%          6            0.5306             4.4706           11.14%
11.00%          7            0.4579             4.9285           11.81%
11.25%          8            0.4005             5.3290           12.12%
11.50%          9            0.3472             5.6762           12.47%
11.75%         10            0.2980                              12.87%
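The recursion in Eq. (11A.2) is short enough to be sketched directly. The Python fragment below is illustrative only (the function name `bootstrap` is hypothetical); it assumes annual coupons, par bonds with price equal to 1, and no missing maturities, and it rebuilds the zero prices and the discretely compounded yield curve of the table.

```python
def bootstrap(prices, coupons):
    """Recover zero prices recursively, as in Eq. (11A.2).

    prices[i] and coupons[i] refer to the bond maturing at date i+1,
    which pays the fixed coupon coupons[i] on every date up to maturity.
    """
    zeros = []
    for price, c in zip(prices, coupons):
        # Strip the value of the coupons paid before maturity, then
        # discount the final payment (coupon plus principal of 1).
        zeros.append((price - c * sum(zeros)) / (1 + c))
    return zeros

par_coupons = [0.06, 0.07, 0.08, 0.095, 0.09, 0.105, 0.11, 0.1125, 0.115, 0.1175]
zeros = bootstrap([1.0] * len(par_coupons), par_coupons)

# Discretely compounded spot yields implied by the zeros.
yields = [z ** (-1.0 / (i + 1)) - 1 for i, z in enumerate(zeros)]

for i, (z, y) in enumerate(zip(zeros, yields), start=1):
    print(i, round(z, 4), f"{100 * y:.2f}%")
```

Small discrepancies in the last decimal relative to the table can arise because the table accumulates rounded zero prices.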
Note that Eq. (11A.2) relies on the assumption that no maturities are missing. When some of the maturity dates are not available, the required coupon rate $c_i$ can be replaced with a linear interpolation between $c_{i-1}$ and $c_{i+1}$, as follows,
$$c_i = c_{i-1}\,\frac{T_{i+1} - T_i}{T_{i+1} - T_{i-1}} + c_{i+1}\,\frac{T_i - T_{i-1}}{T_{i+1} - T_{i-1}}.$$
The effects of the interpolation should be visible near the missing maturities.

B. Splines
Alternatives to bootstrapping are techniques that aim to cope with situations where the number of bonds is less than the maturity dates we want to fit. Suppose we observe $N$ bonds, where the $i$-th bond entitles to receive the coupons $c_{ij}$ at dates $T_j$, for $j = 1, \cdots, N_i$, and that these prices are observed with errors, or
$$B_i = \sum_{j=1}^{N_i} c_{ij} P(T_j) + P(T_{N_i}) + \varepsilon_i, \qquad i = 1, \cdots, N,$$
where $\varepsilon_i$ is the measurement error for the $i$-th bond.
We aim to find the curve $\tau \mapsto P(\tau)$ that minimizes the errors, in some statistical sense. The natural device is to parametrize the function $P(\tau)$, with a number $k$ of parameters, where $k < N$. To parametrize the function $P(\tau)$ for a generic $\tau$, we can use polynomials, as originally suggested by McCulloch (1971, 1975),
$$P(\tau) = 1 + a_1 \tau + a_2 \tau^2 + \cdots + a_k \tau^k,$$
where $a_j$ are the parameters. Cubic splines are polynomials up to the third order, and are very popular. The parameters $a_j$ can be estimated by minimizing the sum of the squared errors, $\sum_{i=1}^{N} \varepsilon_i^2$. A well-known pitfall of polynomials is that a high $k$ might imply that while the polynomial approximation works reasonably well near the observed maturities, it may exhibit an erratic behavior in between. To avoid this problem, we can use local polynomials, which are low-order polynomials (typically splines) fitted to non-overlapping subintervals.
Naturally, we may also want to parametrize the spot rates, $R(\tau)$, as polynomials. The next chapter (Section 12.3) explains that an alternative to polynomials are parametrizations introduced by Nelson and Siegel (1987), where,
$$R(\tau) = \beta_1 + \beta_2\,\frac{1 - e^{-\lambda\tau}}{\lambda\tau} + \beta_3\left(\frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right), \tag{11A.3}$$
and $\beta_i$ and $\lambda$ are parameters. We provide interpretations of this parametrization in the next chapter, where we also explain how this has been used in practice to forecast the yield curve.
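To get a feel for the Nelson and Siegel curve, the sketch below evaluates it on a few maturities. It assumes the commonly used three-factor form with level, slope and curvature loadings, $R(\tau) = \beta_1 + \beta_2 \frac{1-e^{-\lambda\tau}}{\lambda\tau} + \beta_3 \left(\frac{1-e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right)$; the function name and the parameter values are hypothetical and chosen for illustration only.

```python
import math

def nelson_siegel(tau, beta1, beta2, beta3, lam):
    """Spot rate for maturity tau > 0 under the assumed three-factor form."""
    x = lam * tau
    slope_loading = -math.expm1(-x) / x          # (1 - exp(-x)) / x, computed stably
    curvature_loading = slope_loading - math.exp(-x)
    return beta1 + beta2 * slope_loading + beta3 * curvature_loading

# Illustrative parameters: long rate 6%, short rate 6% - 2% = 4%, mild hump.
params = dict(beta1=0.06, beta2=-0.02, beta3=0.01, lam=0.5)
for tau in (0.25, 1, 2, 5, 10, 30):
    print(tau, round(nelson_siegel(tau, **params), 4))
```

In this form the short end of the curve tends to $\beta_1 + \beta_2$ and the long end to $\beta_1$, which is why the first two parameters are often read as level and (minus) slope.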

C. No arb restrictions
Bond prices need to satisfy restrictions that prevent arbitrage. We use gures taken from Tuckman
(2002) (p. 8-12), and illustrate how an arbitrage opportunity can arise and be exploited in this context.
Data

Suppose that on some hypothetical date, say February 15, 2009, we observe Set I of bond prices in the left panel of the following table, and that bootstrap leads to the implicit zeros in the corresponding right panel of the table. Also assumed is that we observe additional bond prices, those in Set II in the lower part of the table. Are these additional prices, those in Set II, compatible with those in Set I, in terms of arbitrage opportunities?

Set I: Treasury Bond prices                   Bootstrapped zeros
Coupon      Maturity    Market price          Time to maturity    Implicit zero
 7.875%     8/15/09     101.400               0.5                 P(0, 0.5) = 0.97557
14.250%     2/15/10     108.980               1.0                 P(0, 1.0) = 0.95247
 6.375%     8/15/10     102.160               1.5                 P(0, 1.5) = 0.93045
 6.250%     2/15/11     102.570               2.0                 P(0, 2.0) = 0.90796
 5.250%     8/15/11     100.840               2.5                 P(0, 2.5) = 0.88630

Set II: Treasury Bond prices
Coupon      Maturity    Market price
13.375%     8/15/09     104.080
10.750%     2/15/11     110.938
 5.750%     8/15/11     102.020
11.125%     8/15/11     114.375
A basic no-arb condition

We cast the problem in a more general format. Suppose we observe a vector $B$ of $N$ bond prices, with a $N \times N$ matrix $C$ of coupons, where each row of $C$ gives the stream of the coupons promised by a given asset. We know that the $N \times 1$ vector of zeros, $P$, satisfies $B = CP$. That is, assuming that the matrix $C$ is invertible,
$$P = C^{-1} B. \tag{11A.4}$$
Next, suppose there exists some asset that: (i) promises to pay:
$$c = \begin{bmatrix} c_1 & \cdots & c_{N-1} & c_N + 100 \end{bmatrix}^{\top},$$
and (ii) has a price, $B^c$, such that:
$$B^c < c^{\top} P. \tag{11A.5}$$
The right hand side of this inequality, $c^{\top}P$, is the no-arbitrage price of the asset, which in this example is greater than the market price. The inequality gives rise to an arbitrage opportunity, which can be exploited by going long the asset, and shorting a portfolio synthesizing it. To synthesize the asset to go long for, we solve the following system of $N$ equations with $N$ unknowns,
$$C^{\top} \alpha = c, \tag{11A.6}$$
where the vector of unknowns, $\alpha$, contains the number of assets in the synthesizing portfolio: by purchasing the portfolio $\alpha$, one is entitled to receive $C^{\top}\alpha$ in the future, which we want to equal $c$. The solution to Eq. (11A.6) is:
$$\alpha = \left(C^{\top}\right)^{-1} c. \tag{11A.7}$$
Accordingly, the value of this portfolio, $V$ say, is given by, $V = \alpha^{\top} B = c^{\top} C^{-1} B = c^{\top} P > B^c$, where the last equality follows by the zero pricing equation (11A.4), and the inequality holds by the inequality (11A.5).
To summarize, we now have the following situation: (i) the asset we hold produces the cash flows that are needed to pay out the coupons of the synthesizing portfolio we sold, and (ii) the price of the asset we go long is less than the value of the portfolio we short. This situation is an arbitrage opportunity, as initially claimed. We now use these insights to check whether arbitrage opportunities exist and can be exploited, using the data in Tables 11.1 through 11.3.



First step: detecting arbitrage opportunities

First, we determine no-arbitrage prices of the bonds in Set II, using the implicit zeros extracted from Set I. Denote these prices with $B_1$ (for the six month 13.375%), $B_2$ (for the two year 10.750%), $B_3$ (for the 2.5 year 5.750%), and $B_4$ (for the 2.5 year 11.125%). They are given by,
$$B_1 = \left(\frac{13.375}{2} + 100\right) P(0,0.5)$$
$$B_2 = \frac{10.750}{2}\left[P(0,0.5) + P(0,1) + P(0,1.5)\right] + \left(\frac{10.750}{2} + 100\right) P(0,2.0)$$
$$B_3 = \frac{5.75}{2}\left[P(0,0.5) + P(0,1.0) + P(0,1.5) + P(0,2.0)\right] + \left(\frac{5.75}{2} + 100\right) P(0,2.5)$$
$$B_4 = \frac{11.125}{2}\left[P(0,0.5) + P(0,1.0) + P(0,1.5) + P(0,2.0)\right] + \left(\frac{11.125}{2} + 100\right) P(0,2.5)$$
The next table provides the numerical values of these theoretical prices, comparing to their market counterparts:

Set II: Treasury Bond prices
Coupon      Maturity    Market price    No-arb price
13.375%     8/15/09     104.080         104.080
10.750%     2/15/11     110.938         111.041
 5.750%     8/15/11     102.020         102.007
11.125%     8/15/11     114.375         114.511
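The comparison in the table can be reproduced with a few lines of Python. This is a sketch under the assumptions of the example (semiannual coupons, prices per 100 of face value, zeros taken from Set I); the names `ZEROS` and `no_arb_price` are hypothetical.

```python
# Implicit zeros bootstrapped from Set I, for maturities 0.5, 1.0, ..., 2.5 years.
ZEROS = [0.97557, 0.95247, 0.93045, 0.90796, 0.88630]

def no_arb_price(annual_coupon, n_periods):
    """Price (per 100 of face) of a bond paying annual_coupon/2 every six
    months for n_periods periods, plus 100 at maturity."""
    half_coupon = annual_coupon / 2
    return half_coupon * sum(ZEROS[:n_periods]) + 100 * ZEROS[n_periods - 1]

# (annual coupon in %, number of six-month periods, market price) for Set II.
SET_II = [(13.375, 1, 104.080), (10.750, 4, 110.938),
          (5.750, 5, 102.020), (11.125, 5, 114.375)]

results = []
for coupon, periods, market in SET_II:
    theoretical = no_arb_price(coupon, periods)
    results.append(theoretical)
    label = "cheap" if market < theoretical else "rich"
    print(f"{coupon:7.3f}%  market {market:8.3f}  no-arb {theoretical:8.3f}  ({label})")
```

With zeros rounded to five decimals, the 13.375% bond is priced within about a tenth of a cent of its market quote, so the "cheap" or "rich" label printed for it is not meaningful.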

While there are no arbitrage opportunities for the 13.375% bond expiring in six months, the price
of the 10.750% bond expiring in 2 years is less than its no-arbitrage price: this bond trades cheap.
In contrast, the 2.5 year 5.750% bond trades rich, although the resulting arbitrage does not seem
to be quite sensible.
Second step: implementing the arbitrage

We proceed to exploit the mispricing related to the 10.750% bond expiring in 2 years, using the previous insights. We have, $N = 4$, and $c = \begin{bmatrix} \frac{1}{2}10.750 & \frac{1}{2}10.750 & \frac{1}{2}10.750 & \frac{1}{2}10.750 + 100 \end{bmatrix}^{\top}$. We use the first four bonds in Set I to construct an arbitrage portfolio. In terms of the coupon matrix $C$, we have,
$$C = \begin{bmatrix} \frac{1}{2}7.875 + 100 & 0 & 0 & 0 \\ \frac{1}{2}14.250 & \frac{1}{2}14.250 + 100 & 0 & 0 \\ \frac{1}{2}6.375 & \frac{1}{2}6.375 & \frac{1}{2}6.375 + 100 & 0 \\ \frac{1}{2}6.250 & \frac{1}{2}6.250 & \frac{1}{2}6.250 & \frac{1}{2}6.250 + 100 \end{bmatrix}.$$
We implement the following trade: (i) buy $M$ 10.750% bonds expiring in 2 years, which cost $110.938 \times M$; (ii) create $M$ portfolios satisfying Eq. (11A.7),
$$\alpha^{\top} = c^{\top} C^{-1} = \begin{bmatrix} 0.0189 & 0.0197 & 0.0212 & 1.0218 \end{bmatrix}.$$
If we short $M$ of these portfolios, then, by construction, the coupons we need to pay are exactly matched by the coupons we receive from the $M$ 10.750% bonds expiring in 2 years. However, the market value of the portfolios we short equals,
$$M \times \begin{bmatrix} 0.0189 & 0.0197 & 0.0212 & 1.0218 \end{bmatrix} \begin{bmatrix} 101.40 \\ 108.98 \\ 102.16 \\ 102.57 \end{bmatrix} = M \times 111.041,$$
where the vector of the market prices, $B$, is taken from Set I. Therefore, the gains from this trade are, $M \times (111.041 - 110.938) = M \times 0.103$. For example, by trading $1,000,000 at face value, i.e. $M = 10000$, then, arbitrage profits equal $1030.
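The synthesizing portfolio can be computed without a matrix library, because the coupon matrix is lower triangular and, therefore, the system $C^{\top}\alpha = c$ can be solved by back substitution, starting from the longest maturity. The sketch below uses hypothetical variable names. Since it works with unrounded intermediate values, its portfolio value and profit can differ from the figures in the text by a few thousandths.

```python
# First four bonds of Set I: annual coupons (%) and market prices per 100 of face.
coupons = [7.875, 14.250, 6.375, 6.250]
prices = [101.40, 108.98, 102.16, 102.57]
n = len(coupons)

# C[i][j]: cash flow paid by bond i at the j-th six-month date.
C = [[(coupons[i] / 2 + (100 if j == i else 0)) if j <= i else 0.0
      for j in range(n)] for i in range(n)]

# Cash flows of the bond to replicate: the 10.750% bond expiring in 2 years.
target = [10.750 / 2] * (n - 1) + [10.750 / 2 + 100]

# Back substitution on C' alpha = target: only the longest bond pays at the
# last date, so solve for it first and work backwards.
alpha = [0.0] * n
for j in range(n - 1, -1, -1):
    paid_by_longer_bonds = sum(alpha[i] * C[i][j] for i in range(j + 1, n))
    alpha[j] = (target[j] - paid_by_longer_bonds) / C[j][j]

portfolio_value = sum(a * p for a, p in zip(alpha, prices))
profit = (portfolio_value - 110.938) * 10_000  # $1,000,000 of face value
print([round(a, 4) for a in alpha], round(portfolio_value, 3), round(profit))
```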


11.9 Appendix 2: Proof of Eq. (11.17)


Let $P(r, T_i)$ denote the price of a zero with maturity $T_i$, $i = 1, 2$, when the interest rate is equal to $r$. We wish to replicate a zero with maturity $T_1$ by means of a portfolio that comprises (i) $\Delta$ zeros with maturity $T_2$ and (ii) $M$ in the MMA. Let $V_0$ be the current value of this portfolio; it is clearly a function of the current short-term rate $r$, and equals
$$V_0(r) = \Delta P(r, T_2) + M.$$
In the second period, the value of the portfolio is random, as it depends on the development of the short-term rate $\tilde{r}$, which equals $r^+$ with probability $p$ and $r^-$ with probability $1-p$. Precisely, the value of the portfolio in the second period, is
$$V(\tilde{r}) = \begin{cases} V(r^+) = \Delta P(r^+, T_2) + M(1+r) & \text{with probability } p \\ V(r^-) = \Delta P(r^-, T_2) + M(1+r) & \text{with probability } 1-p \end{cases}$$
We also know that in the second period, the value of the zero maturing at $T_1$ is,
$$P(\tilde{r}, T_1) = \begin{cases} P(r^+, T_1) & \text{with probability } p \\ P(r^-, T_1) & \text{with probability } 1-p \end{cases}$$
Next, we select $\Delta$ and $M$ to make the portfolio value match that of the zero maturing at $T_1$, in each state of nature, viz
$$V(\tilde{r}) = P(\tilde{r}, T_1) \quad \text{in each state.}$$
Mathematically, this is tantamount to solving the following system of two equations with two unknowns ($\Delta$ and $M$),
$$\begin{cases} V(r^+) = \Delta P(r^+, T_2) + M(1+r) = P(r^+, T_1) \\ V(r^-) = \Delta P(r^-, T_2) + M(1+r) = P(r^-, T_1) \end{cases} \tag{11A.8}$$
The solution is,
$$\Delta = \frac{P(r^+, T_1) - P(r^-, T_1)}{P(r^+, T_2) - P(r^-, T_2)}, \qquad M = \frac{P(r^-, T_1) P(r^+, T_2) - P(r^+, T_1) P(r^-, T_2)}{\left[P(r^+, T_2) - P(r^-, T_2)\right](1+r)}.$$
By construction, the previous portfolio, $(\Delta, M)$, replicates the value of the zero maturing at $T_1$ in the second period. But if two assets (the portfolio, and the zero maturing at $T_1$) yield the same payoffs in each state of the nature, they must be worth the same, in the absence of arbitrage. Therefore, we must have,
$$\Delta P(r, T_2) + M = V_0(r) = P(r, T_1),$$
or,
$$M(1+r) = (1+r) P(r, T_1) - \Delta (1+r) P(r, T_2). \tag{11A.9}$$
Next, let us figure out the prediction of the model in terms of the expected return it generates for the price of the bond maturing at $T_2$. To do this, multiply the first equation in (11A.8) by $p$, and multiply the second equation in (11A.8) by $1-p$. Add the results, evaluated at the solution $(\Delta, M)$, to obtain,
$$\Delta\left[p P(r^+, T_2) + (1-p) P(r^-, T_2)\right] + M(1+r) = p P(r^+, T_1) + (1-p) P(r^-, T_1).$$
Replacing Eq. (11A.9) into the previous equation yields,
$$\Delta\left\{\left[p P(r^+, T_2) + (1-p) P(r^-, T_2)\right] - (1+r) P(r, T_2)\right\} = \left[p P(r^+, T_1) + (1-p) P(r^-, T_1)\right] - (1+r) P(r, T_1).$$
Finally, replacing the solution for $\Delta$ into the previous equation leaves,
$$\frac{\left[p P(r^+, T_2) + (1-p) P(r^-, T_2)\right] - (1+r) P(r, T_2)}{P(r^+, T_2) - P(r^-, T_2)} = \frac{\left[p P(r^+, T_1) + (1-p) P(r^-, T_1)\right] - (1+r) P(r, T_1)}{P(r^+, T_1) - P(r^-, T_1)}.$$
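The replication argument of this appendix lends itself to a quick numerical check. The sketch below is illustrative only: `replicate` is a hypothetical helper, the one-period-ahead prices are borrowed from the tree of Section 11.7.2.2 (rates of 3.5% or 2% after six months, starting from 3%), and the probability $p$ is arbitrary. It solves for the portfolio (the position in the $T_2$ zero and the MMA holding), prices the $T_1$ zero by no-arbitrage, and verifies that the two ratios of expected excess return to price spread coincide.

```python
def replicate(p1_up, p1_down, p2_up, p2_down, r):
    """Units of the T2 zero (delta) and MMA holding (mma) that replicate
    the T1 zero in both states next period."""
    delta = (p1_up - p1_down) / (p2_up - p2_down)
    mma = (p1_down * p2_up - p1_up * p2_down) / ((p2_up - p2_down) * (1 + r))
    return delta, mma

r = 0.03
# Next-period prices when the short-term rate goes up ("up") or down ("down").
p1_up, p1_down = 1 / 1.035, 1 / 1.02   # zero maturing at T1 (one period later)
p2_up, p2_down = 0.93173, 0.95606      # zero maturing at T2 (two periods later)
p2_now = 0.91876                       # current price of the T2 zero

delta, mma = replicate(p1_up, p1_down, p2_up, p2_down, r)
p1_now = delta * p2_now + mma          # no-arbitrage price of the T1 zero

def unit_risk_premium(p_up, p_down, p_now, prob_up):
    """Expected excess payoff divided by the price spread across states."""
    expected = prob_up * p_up + (1 - prob_up) * p_down
    return (expected - (1 + r) * p_now) / (p_up - p_down)

p = 0.5  # arbitrary probability of an upward movement in the rate
premium_1 = unit_risk_premium(p1_up, p1_down, p1_now, p)
premium_2 = unit_risk_premium(p2_up, p2_down, p2_now, p)
print(round(p1_now, 5), round(premium_1, 6), round(premium_2, 6))
```

The equality of the two ratios holds for any choice of `p`, which is the content of the last displayed equation.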

The previous equation is easy to interpret. The numerators are the expected excess returns from holding the assets. They equal $E_p[P(\tilde{r}, T)] - (1+r) P(r, T)$, where $E_p[P(\tilde{r}, T)]$ is what the investors expect to receive, the next period, by investing $P(r, T)$ today, in the bond; and $(1+r) P(r, T)$ is what the investors expect to receive, the next period, by investing $P(r, T)$ today, in the MMA. The denominators constitute a measure of volatility related to holding the assets. The previous equation then tells us that the Sharpe ratios (or the unit risk premiums) on the two zeros agree.
Let the Sharpe ratio on any zero be equal to some function $\lambda$ of the short-term rate only (and possibly of calendar time). This function, $\lambda$, clearly does not depend on the maturity of the zeros.
Then, we have,

) (
(1 + ) ( 2 ) = ( + 2 )
(
( + 2 ) + (1
2)
2)
=

2)
+

2)

[(

) as a measure of interest rate volatility, and dene Vol(


We can interpret ( +
Eq. (11.17) follows by rewriting Eq. (11A.10) for a generic maturity date
2.


) ] (11A.10)

).


11.10 Appendix 3: The Ho and Lee price representation


Define the discretely compounded forward rate, as the number $F_T(t) \equiv F(t, T, T+1)$, satisfying: $\frac{P(t, T+1)}{P(t, T)} = \frac{1}{1 + F_T(t)}$, as in Eq. (11.4) of the main text. Iterating this equation leaves:
$$P(t, T) = \prod_{j=t}^{T-1} \frac{1}{1 + F_j(t)}.$$
Therefore, for any $s \in [t, T]$:
$$\frac{P(t, T)}{P(t, s)} = \prod_{j=s}^{T-1} \frac{1}{1 + F_j(t)}.$$
Because $P(s, T) = \prod_{j=s}^{T-1} \frac{1}{1 + F_j(s)}$, we have that,
$$P(s, T) = \frac{P(t, T)}{P(t, s)} \prod_{j=s}^{T-1} \frac{1 + F_j(t)}{1 + F_j(s)}. \tag{11A.11}$$
Eq. (11A.11) is a convenient representation of the bond price at a future date $s$: it is the ratio of the two current prices $P(t, T)$ and $P(t, s)$, and a factor relating to the development of forward rates from the current time $t$ to time $s$, i.e. $\frac{1 + F_j(t)}{1 + F_j(s)}$, for $j = s, \cdots, T-1$. Hence, once we model forward rates, we have implications for bond price movements.
We normalize the time-line and set $t = 0$. Redefining $s = t$, Eq. (11A.11) reduces to,
$$P(t, T) = \frac{P(0, T)}{P(0, t)} \prod_{j=t}^{T-1} \frac{1 + F_j(0)}{1 + F_j(t)}. \tag{11A.12}$$
Eq. (11.36) in the main text follows by Eq. (11A.12).


Next, we search for the model's predictions on forward rates, i.e. we prove Eq. (11.37). The proof is by induction. Eq. (11.37) holds true for $t = 0$. Next, suppose that it holds at time $t$. We wish to show that in this case, Eq. (11.37) would also hold at time $t + 1$. At time $t + 1$, we have two cases.

Case 1 : A positive price jump occurs between time

+1

( + 1) = ln
= ln
= ln
= ln

and time + 1. In this case,

+1 (

+1 )
+ 1)
+1 ( + 1

( )
ln
(
)
( + 1)
(
)
+ ()
( +1
)
( + 1 ( + 1))
+ (0)
( + 1)

(
(

+ 1)
+ 1)

( +1

[( + 1)

( + 1)] ln

where the rst equality and the third follow by the denition of +1 ( ), the second equality
holds by the denition of the jump in Eq. (11.27), the fourth equality follows by Eq. (11.37).
Hence, Eq. (11.37) holds at time + 1 in the occurrence of a positive price jump between time
and time + 1.



Case 2 : A negative price jump occurs between time
( +1 )
( + 1 + 1)

( )
= ln (
)
( + 1)
(
)
= ln
+ ()
( +1
)

and time + 1. In this case,

( + 1) = ln

= ln

)+1
( +1

( +1
)
(
)
+ (0)
= ln
( + 1)

ln

1
)+1

[( + 1)

( +1

+ (0) + ln

(
(

+ 1)
+ 1)

( +1
)
( + 1)

) ln

] ln

where the rst four equalities follow by the same arguments produced in Case 1, the fth
equality holds by the relation ( ) = ( ) ( 1) in Eq. (11.33) and the last equality follows
by rearranging terms. Hence, Eq. (11.37) holds at time + 1 in the occurrence of a negative price
jump between time and time + 1.


References
Bernanke, B. S. and A. Blinder (1992): The Federal Funds Rate and the Channels of Monetary
Transmission. American Economic Review 82, 901-921.
Black, F. and M. Scholes (1973): The Pricing of Options and Corporate Liabilities. Journal
of Political Economy 81, 637-659.
Black, F., E. Derman and W. Toy (1990): A One Factor Model of Interest Rates and its
Application to Treasury Bond Options. Financial Analysts Journal (January-February),
33-39.
Cox, J. C., S. A. Ross and M. Rubinstein (1979): Option Pricing: A Simplified Approach.
Journal of Financial Economics 7, 229-263.
Derman, E. and J. Kani (1994): Riding on a Smile. Risk 7, 32-39.
Dupire, B. (1994): Pricing with a Smile. Risk 7, 18-20.
Heath, D., R. Jarrow and A. Morton (1992): Bond Pricing and the Term-Structure of Interest
Rates: a New Methodology for Contingent Claim Valuation. Econometrica 60, 77-105.
Ho, T. S. Y. and S.-B. Lee (1986): Term Structure Movements and the Pricing of Interest
Rate Contingent Claims. Journal of Finance 41, 1011-1029.
Ho, T. S. Y. and S.-B. Lee (2004): The Oxford Guide to Financial Modeling. Oxford University
Press.
Hull, J. C. (2003): Options, Futures, and Other Derivatives. Prentice Hall. 5th edition (International Edition).
Hull, J. C. and A. White (1990): Pricing Interest Rate Derivative Securities. Review of
Financial Studies 3, 573-592.
McCulloch, J. (1971): Measuring the Term Structure of Interest Rates. Journal of Business
44, 19-31.
McCulloch, J. (1975): The Tax-Adjusted Yield Curve. Journal of Finance 30, 811-830.
Nelson, C.R. and A.F. Siegel (1987): Parsimonious Modeling of Yield Curves. Journal of
Business 60, 473-489.
Rubinstein, M. (1994): Implied Binomial Trees. Journal of Finance 49, 771-818.
Tuckman, B. (2002): Fixed Income Securities. Wiley Finance.
Vasicek, O. (1977): An Equilibrium Characterization of the Term Structure. Journal of
Financial Economics 5, 177-188.


12
Interest rates

12.1 Introduction
This chapter surveys empirical facts and models regarding the term-structure of interest rates and derivatives based thereon. It differs from the previous introductory chapter, as we now largely rely on continuous-time methods while providing a systematic approach to a variety of important topics. These topics range from the stylized facts such as the factors driving the yield curve, their business cycle components, or bond returns predictability and volatility, to more conceptual aspects regarding how we would need to think about duration in a random environment, or the pricing details of interest rate derivatives such as bond options, puttable and callable bonds, swaps, caps, floors, or swaptions, to mention a few.
We know from previous chapters that an important objective arising while pricing derivatives is that we make sure that the price of the underlying assets is pinned down without errors. When it comes to interest rate derivatives, this task is more challenging, because the yield curve relies on risks that are typically not traded. Consider, for example, a model where the price of a zero coupon bond is only driven by random movements of the short-term rate: a one-factor model. Let $P(r, t, T)$ be the price at time $t$ of a zero coupon bond expiring at time $T$, when the short-term rate is $r$. The exact functional form of the pricing function $P(r, t, T)$ depends on the assumed dynamics of the short-term rate and the market risk-appetite. Models of this kind, and generalizations to multi-state variables, are known as models of the short-term rate, and are discussed in Section 12.4.
Models of the short-term rate are very important because once they are made complex enough to address the main facts we see in the data, they might perform a series of tasks. For example, they can be used to forecast developments in fixed income markets. They could also be used for trading purposes should they reasonably point to market inefficiencies. Note that these models lead to pricing errors, and it is actually the presence of these errors that justifies their potential use for trading purposes.
A second class of models that does not lead to pricing errors is that developed by Heath, Jarrow and Morton (1992), which generalizes the Ho and Lee (1986) model described in the previous chapter. In the previous chapter, we have seen tree-based instances of additional no-arbitrage models. This chapter provides a systematic continuous time treatment (in Sections 12.6 and 12.7). A principle underlying these models is that current bond prices need not be modeled in the first place. Rather, current bond prices are taken as primitives, with the modeling focus being shifted to the ongoing development of forward rates, i.e. interest rates prevailing today for borrowing in the future. There is a relation linking bond prices to forward rates. No arbitrage then restricts the joint behavior of future bond prices and forward rates. We shall emphasize the use of these models to price derivatives.
In more detail, the plan of this chapter is as follows. The next section provides definitions of interest rates and markets, and foundational issues regarding two basic representations of bond prices, one in terms of the short-term rate and another in terms of forward rates. Section 12.3 contains an introduction to a number of very important empirical topics, such as bond return predictability, or the relation between the yield curve and the business cycle. Sections 12.4 and 12.5 deal with models of the short-term rate, and Section 12.6 with their perfectly fitting extensions, i.e. the extensions that make these models fit the initial yield curve without errors. Section 12.7 contains a treatment of models that fit the yield curve, based on the Heath, Jarrow and Morton (1992) framework. Section 12.8 is an introduction to the main interest rate derivatives, and describes how these assets can be priced relying on the models of the short-term rate (and their perfectly fitting extensions). Section 12.9 provides an alternative pricing framework, known as market model, whereby derivatives are evaluated through Black's (1976) pricers. A number of appendixes provide technical details omitted from the main text. This chapter does not assume default risk except in Section 12.4.6. Default risk is, instead, systematically dealt with in the next chapter.

12.2 Bond prices and interest rates


12.2.1 A first representation of bond prices
12.2.1.1 Zeros

Let $P(t,T)$ be the price at $t$ of a zero coupon bond expiring at $T$, and consider the discretely compounded interest rate for the time interval $[t,T]$ introduced in Section 11.2.2.1 of the previous chapter, and defined as:

$$P(t,T) = \frac{1}{1 + (T-t)\,L(t,T)}. \tag{12.1}$$

Given $L(t,T)$, the short-term rate process $r(t)$ is obtained as:

$$r(t) \equiv \lim_{T \downarrow t} L(t,T).$$

Next, let $Q$ be a risk-neutral probability, and $\mathbb{E}_t(\cdot)$ denote the time $t$ conditional expectation under $Q$. By the FTAP, there is no arbitrage if and only if $P(t,T)$ satisfies, for all $t \in [0,T]$,

$$P(t,T) = \mathbb{E}_t\left[e^{-\int_t^T r(\tau)\,d\tau}\right]. \tag{12.2}$$

The proof of Eq. (12.2) relies on arguments that are now quite standard in these lectures, but its "if" part is provided again in Appendix 1, because it highlights a few key hedging arguments that underlie it.

12.2.1.2 Fixed coupon and floating rate bonds

Given a set of dates $\{T_i\}_{i=0}^{n}$, a fixed coupon bond pays off a known coupon stream, i.e. $C_i$ at $T_i$, $i = 1, \dots, n$, and \$1 at $T_n$; typically, the coupon paid at $T_i$ compensates for the time-interval $T_i - T_{i-1}$. By the FTAP, the value of a fixed coupon bond is

$$P_{\mathrm{fcb}}(t,T_n) = P(t,T_n) + \sum_{i=1}^{n} C_i\,P(t,T_i).$$

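A floating rate bond pays off coupons indexed to short-term rates,

$$C_i = \delta_{i-1}\,L(T_{i-1},T_i) = \frac{1}{P(T_{i-1},T_i)} - 1, \tag{12.3}$$

where $\delta_{i-1} \equiv T_i - T_{i-1}$, and where the second equality follows by Eq. (12.1). By the FTAP, the price $P_{\mathrm{frb}}$ as of time $t$ of a floating rate bond is:

$$\begin{aligned}
P_{\mathrm{frb}}(t) &= P(t,T_n) + \sum_{i=1}^{n} \mathbb{E}_t\left[e^{-\int_t^{T_i} r(\tau)\,d\tau}\,\delta_{i-1}\,L(T_{i-1},T_i)\right] \\
&= P(t,T_n) + \sum_{i=1}^{n} \mathbb{E}_t\left[\frac{e^{-\int_t^{T_i} r(\tau)\,d\tau}}{P(T_{i-1},T_i)}\right] - \sum_{i=1}^{n} P(t,T_i) \\
&= P(t,T_n) + \sum_{i=1}^{n} P(t,T_{i-1}) - \sum_{i=1}^{n} P(t,T_i) = P(t,T_0),
\end{aligned} \tag{12.4}$$

where the second line follows by Eq. (12.3) and the third line will be justified in a moment (see Eq. (12.5) below). That is, a floating rate bond would quote at par at its first reset date, $P_{\mathrm{frb}}(T_0) = P(T_0,T_0) = 100\%$.¹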
Regarding the third line of Eq. (12.4),

$$P(t,T_{i-1}) = \mathbb{E}_t\left[\frac{e^{-\int_t^{T_i} r(\tau)\,d\tau}}{P(T_{i-1},T_i)}\right], \tag{12.5}$$

consider the following economic interpretation. Suppose that at time $t$, \$$P(t,T_{i-1})$ are invested in a bond maturing at time $T_{i-1}$. At time $T_{i-1}$, this investment will obviously pay off \$1. And at time $T_{i-1}$, \$1 can be further rolled over another bond maturing at time $T_i$, thus yielding \$$1/P(T_{i-1},T_i)$ at time $T_i$. In other words, an investment at $t$ equal to \$$P(t,T_{i-1})$ leads to a payoff at $T_i$ equal to \$$1/P(T_{i-1},T_i)$, whence Eq. (12.5).²
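¹ This property also holds in a market where the floating rates continuously pay off the instantaneous short-term rate $r$. Indeed, let $P_{\mathrm{frb}}$ be the solution to the partial differential equation (12.87) in Section 12.6, with the instantaneous coupon rate set equal to $r$, and boundary condition $P_{\mathrm{frb}}(T) = 1$. Then, it can be verified that $P_{\mathrm{frb}} = 1$ is a solution to Eq. (12.87).
² Mathematically, we have, by the Law of Iterated Expectations, that

$$\mathbb{E}_t\left[\frac{e^{-\int_t^{T_i} r(\tau)\,d\tau}}{P(T_{i-1},T_i)}\right] = \mathbb{E}_t\left[\frac{e^{-\int_t^{T_{i-1}} r(\tau)\,d\tau}}{P(T_{i-1},T_i)}\,\mathbb{E}\left(e^{-\int_{T_{i-1}}^{T_i} r(\tau)\,d\tau}\,\Big|\,F(T_{i-1})\right)\right] = \mathbb{E}_t\left[e^{-\int_t^{T_{i-1}} r(\tau)\,d\tau}\right] = P(t,T_{i-1}).$$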


12.2.2 Forward rates

Forward rates are interest rates that make the value of a forward rate agreement (FRA, henceforth) equal to zero at origination. Section 11.2.2.3 of the previous chapter provides the definition of a forward rate agreement, which we re-state below for reasons clarified in a moment. Forward rates as of time $t$, for a forward rate agreement relating to a future time-interval $[T,S]$, are denoted with $F(t,T,S)$, and link to bond prices through a precise relation, derived in Section 11.2.2.3 of the previous chapter:

$$\frac{P(t,T)}{P(t,S)} = 1 + (S-T)\,F(t,T,S). \tag{12.6}$$

Clearly, the forward rate agreed at $T$ for the time interval $[T,S]$ is the short-term rate applying to the same period:

$$F(T,T,S) = L(T,S). \tag{12.7}$$

Consider, next, a more general FRA, where a first counterparty agrees: (i) to pay an interest rate on a given principal at time $S$, fixed at some $K \neq F(t,T,S)$, and (ii) to receive, in exchange, the future interest rate prevailing at time $T$ for the time interval $[T,S]$, $L(T,S)$, from a second counterparty. The time $S$ payoff originated by this "forward starting interest rate swap" is:

$$(S-T)\left[L(T,S) - K\right]. \tag{12.8}$$

It is the same as the P&L to a party who enters a FRA at time $t$ (with no costs) for the time-interval $[T,S]$, as a future borrower. Come time $T$, the party shall honour the FRA by borrowing \$1 for the time-interval $[T,S]$ at a cost of $K$. At the same time, the party can lend this very same \$1 at the random interest rate $L(T,S)$. The time $S$ payoff deriving from this trade is, of course, the same as that in Eq. (12.8).
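
The following Python sketch illustrates the link between bond prices, forward rates and the FRA payoff. The two bond prices and the dates are illustrative assumptions, as are the function names. The helper `fra_value` relies on the same replication argument as Eq. (12.5): receiving $(S-T)L(T,S)$ at $S$ is worth $P(t,T) - P(t,S)$ at $t$, so the time $t$ value of the payoff in Eq. (12.8) is $P(t,T) - P(t,S) - (S-T)\,K\,P(t,S)$.

```python
def forward_rate(p_T, p_S, T, S):
    """Eq. (12.6) solved for F(t,T,S), given the bond prices P(t,T) and P(t,S)."""
    return (p_T / p_S - 1.0) / (S - T)


def fra_payoff(realized_rate, K, T, S):
    """Time-S payoff in Eq. (12.8), per unit of principal."""
    return (S - T) * (realized_rate - K)


def fra_value(p_T, p_S, K, T, S):
    """Time-t value of the payoff in Eq. (12.8): P(t,T) - P(t,S) - (S - T) * K * P(t,S)."""
    return p_T - p_S - (S - T) * K * p_S


p_T, p_S, T, S = 0.97, 0.95, 1.0, 1.5      # illustrative bond prices and dates (assumptions)
F = forward_rate(p_T, p_S, T, S)
value_at_origination = fra_value(p_T, p_S, F, T, S)   # zero up to rounding when K = F
```

Setting $K$ equal to the forward rate makes the agreement worthless at origination, which is the defining property of forward rates stated at the beginning of this section.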
12.2.3 A second representation of bond prices
12.2.3.1 Prices as forward looking indicators

Bond prices can be expressed in terms of these forward interest rates, namely in terms of the "instantaneous" forward rates. First, rearrange terms in Eq. (12.6) so as to obtain:

$$F(t,T,S) = -\frac{P(t,S) - P(t,T)}{(S-T)\,P(t,S)}.$$

The instantaneous forward rate $f(t,T)$ is defined as

$$f(t,T) \equiv \lim_{S \downarrow T} F(t,T,S) = -\frac{\partial \ln P(t,T)}{\partial T}. \tag{12.9}$$

It can be interpreted as the marginal rate of return from committing a bond investment for an additional instant. To express bond prices in terms of $f$, integrate Eq. (12.9), $f(t,\tau) = -\frac{\partial \ln P(t,\tau)}{\partial \tau}$, with respect to the maturity date $\tau$, use the condition that $P(t,t) = 1$, and obtain:

$$P(t,T) = e^{-\int_t^T f(t,\tau)\,d\tau}. \tag{12.10}$$

Eq. (12.10) suggests a natural modeling approach of the yield curve that emphasizes the dynamics of the forward rates, and that is dealt with in detail in Sections 12.5 and 12.6.

12.2.3.2 The marginal nature of forward rates

Consider the yield-to-maturity introduced in Section 11.2.2.2 of the previous chapter, defined to be the function $R(t,T)$ such that:

$$P(t,T) \equiv e^{-(T-t)\,R(t,T)}. \tag{12.11}$$

Comparing Eq. (12.11) with Eq. (12.10) yields:

$$R(t,T) = \frac{1}{T-t}\int_t^T f(t,\tau)\,d\tau. \tag{12.12}$$

Differentiating Eq. (12.12) with respect to $T$ yields:

$$\frac{\partial R(t,T)}{\partial T} = \frac{1}{T-t}\left[f(t,T) - R(t,T)\right]. \tag{12.13}$$

Eq. (12.13) underscores the "marginal nature" of forward rates: the yield curve, $R(t,T)$, is increasing in, decreasing in, or stationary at $T$, according to whether $f(t,T)$ exceeds, is lower than, or equals the spot rate for maturity $T$. It is a simple but powerful result: it is intuitive that forward rates should summarize information about the market expectations regarding future interest rates, as explained in Section 12.3.1. For example, an increasing yield curve would signal high rates to be expected in the future, under conditions explained in Section 12.3.1.
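
The marginal relation between forward rates and yields can be checked numerically. The sketch below assumes a hypothetical increasing yield curve (the functional form and parameter values are not from the text), recovers instantaneous forward rates from bond prices by finite differences, and compares the slope of the yield curve with $[f(t,T) - R(t,T)]/(T-t)$, taking $t = 0$.

```python
import math


def spot_yield(T, a=0.02, b=0.02, c=0.5):
    """Hypothetical increasing yield curve R(0,T); the functional form is an assumption."""
    return a + b * (1.0 - math.exp(-c * T))


def bond_price(T):
    """Bond price implied by the yield curve, with t = 0: P(0,T) = exp(-T * R(0,T))."""
    return math.exp(-T * spot_yield(T))


def inst_forward(T, h=1e-5):
    """Instantaneous forward rate by central differences: f(0,T) = -d ln P(0,T) / dT."""
    return -(math.log(bond_price(T + h)) - math.log(bond_price(T - h))) / (2.0 * h)


def yield_slope(T, h=1e-5):
    """Numerical derivative of the yield curve with respect to maturity."""
    return (spot_yield(T + h) - spot_yield(T - h)) / (2.0 * h)


maturities = [0.5, 2.0, 10.0]
# Gap between the slope of the yield curve and [f - R] / T; should be numerically negligible.
gaps = [yield_slope(T) - (inst_forward(T) - spot_yield(T)) / T for T in maturities]
```

Because the assumed curve is increasing, the forward rate lies above the spot rate at every maturity in the example.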

12.3 Stylized facts


12.3.1 The expectation hypothesis
The expectation hypothesis holds that the forward rates equal expected future short-term rates, viz

$$f(t,T) = E_t(r(T)), \tag{12.14}$$

where $E_t(\cdot)$ denotes expectation under the physical probability. Eq. (12.14) has two implications. First, through Eq. (12.12),

$$R(t,T) = \frac{1}{T-t}\int_t^T E_t(r(\tau))\,d\tau, \tag{12.15}$$

and second, by Eq. (12.13),

$$\frac{\partial R(t,T)}{\partial T} = \frac{1}{T-t}\left[E_t(r(T)) - R(t,T)\right]. \tag{12.16}$$

Note a very direct conclusion we can draw from Eq. (12.16): short-term rates are expected to be high at some future date (compared to the corresponding yield) if, and only if, the yield curve is increasing in $T$. One implication is that an increasing yield curve (i.e. one where $R(t,T)$ is always less than $E_t(r(T))$) signals the expectation that future short-term rates will increase and, similarly, an inverted yield curve signals the expectation that future short-term rates will fall.
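
To see the expectation hypothesis at work, the sketch below assumes a purely hypothetical path of expected short-term rates that mean-reverts from a current level `r0` to a long-run level `theta` at speed `kappa` (names and values are illustrative, not from the text). The yield is the average of expected short-term rates over the life of the bond, computed here in closed form.

```python
import math


def expected_short_rate(s, r0=0.01, theta=0.04, kappa=0.5):
    """Hypothetical E_t[r(t+s)]: mean reversion from r0 towards theta (assumption)."""
    return theta + (r0 - theta) * math.exp(-kappa * s)


def eh_yield(tau, r0=0.01, theta=0.04, kappa=0.5):
    """Average of expected_short_rate over [0, tau], as implied by the expectation hypothesis."""
    return theta + (r0 - theta) * (1.0 - math.exp(-kappa * tau)) / (kappa * tau)


horizons = [1.0, 5.0, 10.0]
upward = [eh_yield(tau) for tau in horizons]               # r0 below theta: rates expected to rise
inverted = [eh_yield(tau, r0=0.06) for tau in horizons]    # r0 above theta: rates expected to fall
```

With `r0` below `theta`, yields increase with maturity and expected future short-term rates exceed the yield for the same horizon; with `r0` above `theta`, the curve is inverted.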
For example, in December 2012, the yield curve in the United Kingdom was substantially flat at less than 0.5% for maturities of less than three years, and then steadily increased to 2% for maturities up to ten years. Taken at face value, the expectation hypothesis would simply imply that in December 2012, the market did not expect UK short-term rates to move over the following three years. Naturally, the market reasoning could have changed as new information arrived.
A natural question arises as to whether in the data the forward rate for maturity $T$ is higher than the short-term rate expected to prevail at time $T$, viz

$$f(t,T) > E_t(r(T)). \tag{12.17}$$

It is an old issue. One possibility is that this inequality arises in the presence of risk-averse investors. Consider the Hicks-Keynesian normal backwardation hypothesis reviewed in Chapter 10.³ In the context of this chapter, forward rates can be higher than expected spot rates as firms demand long-term funds with fund suppliers preferring to lend at shorter maturity dates. In this case, the market might be cleared by intermediaries, who require a liquidity premium to be compensated for their risky activity of borrowing at short and lending at long maturities: intermediaries worry about their profits at time $T$, thereby lending at higher rates than the expected spots.⁴
Finally, consider the following definition of the term-premium, the difference between the spot rate and the future expected average short-term rate, for the same horizon, as follows:

$$\mathrm{TP}(t,T) \equiv R(t,T) - \frac{1}{T-t}\int_t^T E_t(r(\tau))\,d\tau = \frac{1}{T-t}\int_t^T \left[f(t,\tau) - E_t(r(\tau))\right]d\tau,$$

where the second equality follows by Eq. (12.12). We see that the expectation hypothesis, and possible violations of it, bring content to the term-premium. The next section aims to provide evidence on the expectation hypothesis, as linked to the predictability of bond returns.
12.3.2 Bond returns predictability
What does the empirical evidence suggest about the expectation hypothesis? How does the expectation hypothesis link to bond returns? First, we explain under which probability the expectation hypothesis should hold. Second, we explain the main issues regarding the empirical evidence.
12.3.2.1 Forward-adjusted expectation hypothesis

In more advanced sections of this chapter (see Section 12.9), we shall explain that forward rates cannot be unbiased expectations of the future spot rates, not even under the risk-neutral probability. They only are such under a certain probability, known as the forward probability, already introduced in Chapter 4. Section 12.8.3, for example, shows that,

$$f(t,T) = \mathbb{E}_{Q_F^T}\left[r(T)\right], \tag{12.18}$$

where $\mathbb{E}_{Q_F^T}$ denotes the expectation under the forward probability, $Q_F^T$ (see Eq. (12.90)). This result points to a basic difficulty with the expectation hypothesis: Eq. (12.18) just tells us that the expectation hypothesis can only hold true under the forward probability. There are two sources of risk, which might actually prevent the expectation theory from holding, in practice. To illustrate, let us elaborate on Eq. (12.18),

$$\begin{aligned}
f(t,T) &= \mathbb{E}_{Q_F^T}\left[r(T)\right] \\
&= \mathbb{E}_t\left(\eta_T(T)\,r(T)\right) \\
&= \mathbb{E}_t\left(r(T)\right) + \mathrm{cov}^{Q}_t\left(\eta_T(T),\,r(T)\right) \\
&= E_t\left(r(T)\right) + \mathrm{cov}_t\left(\xi(T),\,r(T)\right) + \mathrm{cov}^{Q}_t\left(\eta_T(T),\,r(T)\right),
\end{aligned} \tag{12.19}$$

where $\eta_T(T)$ denotes the Radon-Nikodym derivative of the forward probability with respect to the risk-neutral probability (see Section 12.8.3), and $\xi(T)$ denotes the Radon-Nikodym derivative of the risk-neutral probability with respect to the physical probability, both normalized to have unit conditional expectation at time $t$; $\mathbb{E}_t$ and $\mathrm{cov}^{Q}_t$ denote time $t$ conditional expectation and covariance under the risk-neutral probability, with $E_t$ and $\mathrm{cov}_t$ denoting the counterparts under the physical probability. That is, forward rates might deviate from future expected short-term rates, because of risk-aversion corrections (the second term in the last equality) and randomness in interest rates (the third term in the last equality).
[Note that we can determine these covariance terms through the Radon-Nikodym theorem. Show that the second covariance, $\mathrm{cov}^{Q}_t(\eta_T(T), r(T))$, is zero when $T$ goes to $t$. Then, as $T$ goes to $t$, we are only left with risk-aversion corrections, the first covariance $\mathrm{cov}_t(\xi(T), r(T))$. The local expectation hypothesis postulates that this second correction is zero.]
[In progress]

³ Note that the normal backwardation (contango) hypothesis states that forward prices are lower (higher) than future expected spot prices, which implies the inequality in (12.17), stated in terms of rates.
⁴ Note, however, that the inequality in (12.17) cannot hold in a risk-neutral market, as a mechanical result originating from Jensen's inequality. We have, indeed, that

$$e^{-\int_t^T f(t,\tau)\,d\tau} = P(t,T) = \mathbb{E}_t\left[e^{-\int_t^T r(\tau)\,d\tau}\right] \ge e^{-\int_t^T \mathbb{E}_t[r(\tau)]\,d\tau},$$

which implies that $\int_t^T \left(f(t,\tau) - \mathbb{E}_t[r(\tau)]\right)d\tau \le 0$ for each $T$ and so, $f(t,T) \le \mathbb{E}_t[r(T)]$ for each $T$, contradicting (12.17).
12.3.2.2 Organizing the empirical evidence

Denote the continuously compounded returns on a zero expiring at some date $T$ as $\tilde{r}_{t+1} \equiv \ln\frac{P(t+1,T)}{P(t,T)}$. Using the definition of spot rates, $R(t,T)$, the excess returns, $\tilde{r}^e_{t+1}$ say, can be expressed as:

$$\begin{aligned}
\tilde{r}^e_{t+1} &\equiv \ln\frac{P(t+1,T)}{P(t,T)} - \ln\frac{1}{P(t,t+1)} \\
&= \ln\frac{P(t+1,T)}{P(t,T)} - R(t,t+1) \\
&= -(T-t-1)\,R(t+1,T) + (T-t)\,R(t,T) - R(t,t+1),
\end{aligned}$$

such that the expected change in the yield curve is negatively related to the expected excess returns and positively related to the slope of the yield curve:

$$E_t\left[R(t+1,T) - R(t,T)\right] = -\frac{1}{T-t-1}\,E_t\left(\tilde{r}^e_{t+1}\right) + \frac{1}{T-t-1}\left[R(t,T) - R(t,t+1)\right].$$

The expectation hypothesis implies that the risk-premium, $E_t(\tilde{r}^e_{t+1})$, is zero. Indeed, assuming that the expectation hypothesis holds,

$$E_t\left(R(t+1,T)\right) = \frac{(T-t)\,R(t,T) - R(t,t+1)}{T-t-1} = E_t\left(R(t+1,T)\right) + \frac{1}{T-t-1}\,E_t\left(\tilde{r}^e_{t+1}\right),$$

where the first equality holds by the expectation hypothesis, Eq. (12.15), and the second follows by taking expectations in the expression for excess returns above. Therefore, the last term in the last equality is zero, implying that $E_t(\tilde{r}^e_{t+1})$ is zero. Veronesi (2010, Chapter 7) builds up a parametric example to illustrate these relations, within an affine model. In the Appendix, we illustrate how these relations work, analytically, by hinging upon a simple but famous model, that of Vasicek (1977). [In progress]
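
The accounting identity that links excess bond returns to spot rates can be verified with a few lines of Python. The three bond prices below are hypothetical numbers chosen for illustration, and the variable names are assumptions; the horizon is one year and `n` stands for $T - t$.

```python
import math


def spot_rate(price, maturity):
    """Continuously compounded spot rate: R = -ln(P) / maturity."""
    return -math.log(price) / maturity


def log_excess_return(p_t_T, p_t1_T, p_t_t1):
    """ln[P(t+1,T) / P(t,T)] - ln[1 / P(t,t+1)]: log return in excess of the one-period yield."""
    return math.log(p_t1_T / p_t_T) - math.log(1.0 / p_t_t1)


n = 5                                       # T - t, in years (illustrative)
p_t_T, p_t1_T, p_t_t1 = 0.80, 0.85, 0.97    # hypothetical P(t,T), P(t+1,T), P(t,t+1)

R_t_T = spot_rate(p_t_T, n)
R_t1_T = spot_rate(p_t1_T, n - 1)
R_t_t1 = spot_rate(p_t_t1, 1)

excess = log_excess_return(p_t_T, p_t1_T, p_t_t1)
# Same excess return, written in terms of spot rates only.
from_yields = -(n - 1) * R_t1_T + n * R_t_T - R_t_t1
# Realized yield change implied by the excess return and the slope of the yield curve.
yield_change = (-excess + (R_t_T - R_t_t1)) / (n - 1)
```

Both representations agree, and the implied yield change coincides with the realized change $R(t+1,T) - R(t,T)$; the relation in the text is the same identity taken in expectation.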
Empirically, we can test for the expectation theory, by running the following regression:

$$R(t+1,T) - R(t,T) = \alpha_T + \beta_T\,\frac{1}{T-t-1}\left[R(t,T) - R(t,t+1)\right] + \text{Residual},$$

and test for the null: $\alpha_T = 0$ and $\beta_T = 1$. A widely known empirical feature of US data is that the estimates of $\beta_T$ are typically negative for all maturities $T$, and somewhat increasing with $T$ in absolute value. In fact, Fama and Bliss (1987) show that bond returns are predictable, in that the risk-premium $E_t(\tilde{r}^e_{t+1})$ relates to the forward spreads, defined as $f_{T-t}(t) - R(t,t+1)$ (with $f_i(t)$ denoting the forward rate defined below), and that regressing,

$$\tilde{r}^e_{t+1} = \alpha_T + \beta_T\left[f_{T-t}(t) - R(t,t+1)\right] + \text{Residual},$$

delivers statistically significant and positive values of $\beta_T$ for many maturities $T$.
Cochrane and Piazzesi (2005) go one step further and consider the following regressions:

$$\tilde{r}^e_{t+1} = \gamma_{0,T} + \gamma_{1,T}\,R(t,t+1) + \sum_{i=2}^{5} \gamma_{i,T}\,f_i(t) + \text{Residual},$$

where $f_i(t)$ is the forward rate for maturity $i-1$, $i = 2, \dots, 5$, i.e. $f_i(t) \equiv \ln\frac{P(t,t+i-1)}{P(t,t+i)}$. They document a "tent shape" for the estimates of the coefficients $(\gamma_{i,T})_{i=1}^{5}$, for bond maturities $T - t \in \{1, \dots, 5\}$, where time is measured in years so that returns are calculated on a yearly basis. They document that this tent shape is robust to estimating a factor model, in that the shape persists in the estimates of the coefficients $(\gamma_i)_{i=1}^{5}$ in:

$$\tilde{r}^e_{t+1} = b_T\left(\gamma_0 + \gamma_1\,R(t,t+1) + \sum_{i=2}^{5} \gamma_i\,f_i(t)\right) + \text{Residual},$$

where the term in parentheses is the common factor among the bond maturities $T - t \in \{1, \dots, 5\}$. Moreover, they argue that the predicting power of their factors is not destroyed, in sample, while conditioning on the standard factors known to explain movements in the yield curve (see Section 12.3.5).
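
The mechanics of the first regression above can be sketched with ordinary least squares on synthetic data. The data below are simulated under the null of the expectation hypothesis (intercept zero, slope one) with made-up volatilities; they are not actual yields, and the function and variable names are assumptions made for this example.

```python
import random


def ols(x, y):
    """Univariate OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


random.seed(0)
n_obs = 4000
# Regressor: slope of the yield curve scaled by 1 / (T - t - 1); synthetic draws.
scaled_slope = [random.gauss(0.0, 0.01) for _ in range(n_obs)]
# Dependent variable: yield changes generated under the null, plus an expectational error.
yield_change = [s + random.gauss(0.0, 0.01) for s in scaled_slope]

alpha_hat, beta_hat = ols(scaled_slope, yield_change)
```

On this synthetic sample the estimates sit close to the null values by construction; the empirical finding reported in the text is precisely that actual US data deliver negative slope estimates instead.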
12.3.3 The yield curve and the business cycle
There is a simple prediction about the shape of the yield curve that we can make. By Jensen's inequality, $e^{-(T-t)R(t,T)} \equiv P(t,T) = \mathbb{E}_t\left[e^{-\int_t^T r(\tau)\,d\tau}\right] \ge e^{-\int_t^T \mathbb{E}_t(r(\tau))\,d\tau}$. Therefore, the yield curve satisfies: $R(t,T) \le \frac{1}{T-t}\int_t^T \mathbb{E}_t(r(\tau))\,d\tau$. For example, suppose that the short-term rate is a martingale under the risk-neutral probability, viz $\mathbb{E}_t(r(\tau)) = r(t)$. Then, the yield curve is bound to be: $R(t,T) \le r(t)$. That is, the yield curve is not increasing in time-to-maturity, $T-t$, at least for small maturities. Positively sloped yield curves, then, likely arise because the short-term rate is not a martingale under the risk-neutral probability, which happens because of two fundamental, and not necessarily mutually exclusive, reasons: (i) interest rates are expected to increase, (ii) investors are risk-averse. On average, the US yield curve is upward sloping at maturities from one up to ten years.
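
A closed-form example may help fix ideas on the Jensen's inequality bound. The sketch below assumes, purely for illustration, that the short-term rate is a driftless Gaussian martingale under the risk-neutral probability, $r(s) = r_0 + \sigma W(s)$, a specification that is not taken from the text. Since $\int_0^T W(s)\,ds$ is normally distributed with variance $T^3/3$, the bond price is $P(0,T) = e^{-r_0 T + \sigma^2 T^3/6}$.

```python
import math


def zero_price_driftless(r0, sigma, T):
    """P(0,T) = E[exp(-int_0^T r)] when r(s) = r0 + sigma * W(s) under Q (an assumed model)."""
    return math.exp(-r0 * T + sigma ** 2 * T ** 3 / 6.0)


def spot_yield_driftless(r0, sigma, T):
    """R(0,T) = -ln P(0,T) / T, which equals r0 - sigma^2 * T^2 / 6 in this model."""
    return -math.log(zero_price_driftless(r0, sigma, T)) / T


r0, sigma = 0.03, 0.01                      # illustrative parameter values
curve = [spot_yield_driftless(r0, sigma, T) for T in (1.0, 5.0, 10.0)]
```

In this example every yield lies below the current short-term rate and the curve slopes downward, in line with the bound above.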
There is strong empirical evidence since at least Kessel (1965) or, later, Laurent (1988, 1989), Stock and Watson (1989), Estrella and Hardouvelis (1991) and Harvey (1991, 1993), that inverted yield curves predict recessions with a lead time of about one to two years. Figure 12.1 illustrates these empirical facts through a plot of the difference between long-term and short-term yields on Treasuries, in short, the term spread.
FIGURE 12.1. This picture depicts the time series of the term spread, defined as the difference between the 10 year yield and the 3 month yield on US Treasuries. Sample data cover the period from January 1957 to December 2008. The shaded areas mark recession periods, as defined by the National Bureau of Economic Research. The end of the last recession was announced to have occurred in June 2009.

Naturally, there are recession episodes preceded by mild yield curve inversions. But the really
striking empirical regularity is the sharp movements of the term spread towards a negative
territory, occurring prior to any recession episode. Note, it is not really important whether it is
the short-term rate that goes up or the long-term rate that goes down. The empirical regularity
is that the term spread goes down and becomes negative prior to a recession. The explanations of
these statistical facts are challenging, and might hinge upon both (i) the conduct of monetary
policy and the expectations about it, and (ii) the risk-premiums agents require to invest in
long-term bonds. We discuss these two points below.
(i) The monetary channel:
(i.1) During expansions, monetary policy tends to be restrictive, to prevent the economy from overheating. At the height of an expansion, then, short-term yields go up.
(i.2) Moreover, during recessions, monetary policy tends to keep interest rates low. At the height of an expansion, agents might be anticipating an incoming recession and expecting central banks to lower future interest rates. Therefore, at the height of an expansion, future interest rates might be expected to fall. The expectation hypothesis in Eq. (12.15) would then predict that the slope of the yield curve would decrease. Granted, in the previous subsection, we have just learnt that the expectation hypothesis does not hold empirically. Bond markets command risk-premiums. However, a risk-premium channel would reinforce the conclusion that the slope of the yield curve decreases during expansions, as argued in the next point.

(ii) The risk-premium channel: From Chapter 7, we know that risk-premiums are countercyclical, being high during recessions and low during expansions. The conditional equity premium is countercyclical, and so is the long-bond premium.⁵ In fact, long-term yields and equity expected returns are likely to be driven by the same state variables affecting the pricing kernel of the economy.⁶

Let's summarize. On the one hand, countercyclical monetary policy might be responsible for the negative price changes affecting short-term bonds. On the other, expectations about countercyclical monetary policy as well as procyclical risk-appetite might be responsible for positive price changes affecting long-term bonds. These price movements, we have argued, should occur at the height of an expansion. But the sample data we have are those where expansions are followed by recessions. Whence, the statistical facts about the predictive content of the yield curve, as we further formalize with a simple model in Section 12.4.3.⁷
Are these explanations plausible? It is interesting to note that these inversions also used to occur prior to the creation of the Federal Reserve system. The creation of the US Central Bank might constitute a Natural Experiment to perform statistical inference about the importance of the gaming between central banks and the market expectations about the future conduct of monetary policy.
Note also that the inversion of the yield curve occurring in 2006 might have arisen due to a strong demand for long-term bonds, as warned by some policy-makers at the time (see, e.g., the European Central Bank Monthly Bulletin, February 2006, p. 27). It is clearly challenging to quantify the extent of this demand pressure, which might perhaps be coming from institutional investors such as Pension Funds in the performance of their asset-liability management duties. It is undeniable, though, that in 2006, the Federal Reserve was targeting higher and higher interest rates to deal with concerns about inflation generated by a previous loose policy following the 2001 recession, the Twin Towers attacks and, perhaps, the Corporate scandals in 2003 too. It is an open question as to whether the markets thought that this increased tightening was marking the end of an expansion, thereby feeding an expectation that future interest rates would drop again in the near future. Ironically, the sharp tightening of the FED policy at the time would carry implications on financial and economic developments such as the 2007 subprime crisis and the crisis arising therefrom, as explained in the next chapter.
⁵ An objection to this line of reasoning is that countercyclical risk-premiums might lead to expect future bond prices to decrease over a future recession, thereby destroying the effects of a procyclical short-term rate. In Section 12.3.3.2, we develop a model where these effects do not arise as soon as the effects of countercyclical risk-premiums are assumed to be bounded.
⁶ That long term bonds and stock market are informally acknowledged to be tightly related is witnessed by a quite raw rule of thumb, whereby a stock market correction, such as a crash say, is deemed to be imminent when the spread "30 year bond yield minus the earning-price ratio" is larger than 3%. This spread, which is usually around 1% or 2% (and on average, zero, once corrected for inflation), was indeed larger than 3% in 1987 and in 1997.
⁷ There might be other channels. Inverted yield curves lead to negative margins for banks, which might then contribute to a credit crunch, determined by a less aggressive attitude to lend due to the negative margins. This might depress demand, leading to the expectation of an imminent recession. This expectation leads to a negative term spread due to the mechanism analyzed in the main text, over a vicious cycle.


12.3.4 Additional stylized facts about the US yield curve
There are three additional features of data, which need to be noted.

(i) Yields are highly correlated (say three year yields with four year yields, with five year yields, etc.), and suggest the existence of common factors driving all of them, discussed in Section 12.3.5 below.
(ii) Yields are also highly persistent, and this persistence bears important consequences on
derivative pricing, as explained in Section 12.8.1.
(iii) The term-structure of unconditional volatility is downward sloping, a feature also rationalized in Section 12.8.1.
12.3.5 Common factors affecting the yield curve
Which systematic risks affect the entire term-structure of interest rates? How many factors are needed to explain the variation of the yield curve? The standard "duration hedging" practice, reviewed in detail in Chapter 11, relies on the idea that most of the variation of the yield curve is successfully captured by a single factor that produces parallel shifts in the yield curve. How reliable is this idea, in practice? This section reviews famous evidence that most of the variation of the US yield curve is explained by just a few factors, interpreted as (i) a "level" factor, (ii) a "steepness" factor, and (iii) a "curvature" factor. It is natural to expect that only a few factors explain the yield curve: intuitively, there is not so much difference between a 10Y zero and a 10Y+1day zero, and then, 10Y+2days, etc. Repeating this reasoning ad infinitum suggests that most of the yield curve should be likely driven by common sources of variation.
Litterman and Scheinkman (1991) demonstrate that most of the variation (more than 95%) of the term-structure of interest rates can be attributed to the variation of three unobservable factors, which they label (i) a "level" factor, (ii) a "steepness" (or "slope") factor, and (iii) a "curvature" factor. To disentangle these three factors, the authors make an unconditional analysis based on a fixed-factor model. Succinctly, this methodology can be described as follows. Suppose that $p$ returns computed from bond prices at $p$ different maturities are generated by a linear factor model, with a fixed number $k$ of factors,

$$\underset{p \times 1}{R} = \underset{p \times 1}{\bar{R}} + \underset{p \times k}{B}\;\underset{k \times 1}{F} + \underset{p \times 1}{\epsilon}, \tag{12.20}$$

where $R$ is the vector of returns, $F$ is the zero-mean vector of common factors affecting the returns, $\bar{R}$ is the vector of unconditional expected returns, $\epsilon$ is a vector of idiosyncratic components of the return generating process, and $B$ is a matrix containing the factor loadings. Each row of $B$ contains the factor loadings for all the common factors affecting a given return, i.e. the sensitivities of a given return with respect to a change of the factors. Each column of $B$ contains the term-structure of factor loadings, i.e. how a change of a given factor affects the term-structure of excess returns.
12.3.5.1 Methodological details

Estimating the model in Eq. (12.20) leads to econometric challenges, mainly because the vector of factors $F$ is unobservable.⁸ However, there exists a simple method, known as principal components analysis (PCA, henceforth), which leads to empirical results qualitatively similar to those holding for the general model in Eq. (12.20). We discuss these empirical results in the next subsection. We now describe the main methodological issues arising within PCA.

⁸ Suppose that in Eq. (12.20), $F \sim N(0, I)$, and that $\epsilon \sim N(0, \Psi)$, where $\Psi$ is diagonal. Then, $R \sim N(\bar{R}, \Sigma)$, where $\Sigma = B B^\top + \Psi$. The assumptions that $F \sim N(0, I)$ and that $\Psi$ is diagonal are necessary to identify the model, but not sufficient. Indeed, any orthogonal rotation of the factors yields a new set of factors which also satisfies Eq. (12.20). Precisely, let $T$ be an orthonormal matrix. Then, $(BT)(BT)^\top = B T T^\top B^\top = B B^\top$. Hence, the factor loadings $B$ and $BT$ have the same ability to generate the matrix $\Sigma$. To obtain a unique solution, one needs to impose extra constraints on $B$. For example, Jöreskog (1967) develops a maximum likelihood approach in which the log-likelihood function is, up to a constant and a positive scale factor, $-\frac{1}{2}\left[\ln|\Sigma| + \mathrm{Tr}\left(S \Sigma^{-1}\right)\right]$, where $S$ is the sample covariance matrix of $R$, and the constraint is that $B^\top \Psi^{-1} B$ be diagonal with elements arranged in descending order. The algorithm is: (i) for a given $\Psi$, maximize the log-likelihood with respect to $B$, under the constraint that $B^\top \Psi^{-1} B$ be diagonal with elements arranged in descending order, thereby obtaining $\hat{B}$; (ii) given $\hat{B}$, maximize the log-likelihood with respect to $\Psi$, thereby obtaining $\hat{\Psi}$, which is fed back into step (i), etc. Knez, Litterman and Scheinkman (1994) describe this approach in their paper. Note that the identification device they describe at p. 1869 (Step 3) roughly corresponds to the requirement that $B^\top \Psi^{-1} B$ be diagonal with elements arranged in descending order. Such a constraint is clearly related to principal component analysis.

The main idea underlying PCA is to transform the original $p$ correlated variables into a set of new uncorrelated variables, the principal components. These principal components are linear combinations of the original variables, and are arranged in order of decreasing importance: the first principal component accounts for as much as possible of the variation in the original data, etc. Mathematically, we are looking for $p$ linear combinations of the demeaned excess returns,

$$Y_i = C_i^\top\left(R - \bar{R}\right), \quad i = 1, \dots, p, \tag{12.21}$$

such that, for $p$ vectors $C_i^\top$ of dimension $1 \times p$, (i) the new variables $Y_i$ are uncorrelated, and (ii) their variances are arranged in decreasing order. The logic behind PCA is to ascertain whether a few components of $Y = [Y_1 \cdots Y_p]^\top$ account for the bulk of variability of the original data. Let $C^\top$ be the $p \times p$ matrix whose rows are $C_1^\top, \dots, C_p^\top$, such that we can write Eq. (12.21) in matrix format, $Y = C^\top\left(R - \bar{R}\right)$ or, by inverting,

$$R - \bar{R} = \left(C^\top\right)^{-1} Y. \tag{12.22}$$

Next, suppose that the vector $Y^{(k)} = [Y_1 \cdots Y_k]^\top$ accounts for most of the variability in the original data,⁹ and let $C^{\top(k)}$ denote a $p \times k$ matrix extracted from the matrix $\left(C^\top\right)^{-1}$ through the first $k$ columns of $\left(C^\top\right)^{-1}$. Since the components of $Y^{(k)}$ are uncorrelated and they are deemed largely responsible for the variability of the original data, it is natural to "disregard" the last $p - k$ components of $Y$ in Eq. (12.22),

$$\underset{p \times 1}{R - \bar{R}} \approx \underset{p \times k}{C^{\top(k)}}\;\underset{k \times 1}{Y^{(k)}}.$$

If the vector $Y^{(k)}$ really accounts for most of the movements of $R$, the previous approximation to Eq. (12.22) should be fairly good.

⁹ There are no rigorous criteria to say what "most of the variability" means in this context. Instead, a likelihood-ratio test is most informative in the context of the estimation of Eq. (12.20) by means of the methods explained in the previous footnote.

Let us make more precise what the concept of variability is in the context of PCA. Suppose that the variance-covariance matrix of the returns, $\Sigma$, has $p$ distinct eigenvalues, ordered from the highest to the lowest, as follows: $\lambda_1 > \cdots > \lambda_p$. Then, the vector $C_i$ in Eq. (12.21) is the eigenvector corresponding to the $i$-th eigenvalue. Moreover,

$$\mathrm{var}(Y_i) = \lambda_i, \quad i = 1, \dots, p.$$

Finally, we have that

$$R_{\mathrm{PCA}} = \frac{\sum_{i=1}^{k} \mathrm{var}(Y_i)}{\sum_{i=1}^{p} \mathrm{var}(R_i)} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}. \tag{12.23}$$

(Appendix 4 provides technical details and proofs of the previous formulae.) It is in the sense of Eq. (12.23) that in the context of PCA, we say that the first $k$ principal components account for $R_{\mathrm{PCA}}$% of the total variation of the data.
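
The eigenvalue computations behind PCA can be sketched in a few lines of Python using power iteration with deflation, a simple textbook technique that is swapped in here for illustration and is not the method used in the studies cited above. The covariance matrix below is constructed by hand from three orthonormal vectors that mimic level, slope and curvature patterns, with eigenvalues 80, 15 and 5 chosen arbitrarily; it is not estimated from data, and all names are assumptions.

```python
import math


def mat_vec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def outer(u, v, scale=1.0):
    """Scaled outer product: scale * u v^T."""
    return [[scale * ui * vj for vj in v] for ui in u]


def mat_add(A, B):
    """Entry-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def top_eigenpair(A, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix by power iteration."""
    x = [1.0 / (i + 1) for i in range(len(A))]     # arbitrary starting vector
    for _ in range(iters):
        y = mat_vec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    lam = sum(xi * yi for xi, yi in zip(x, mat_vec(A, x)))   # Rayleigh quotient
    return lam, x


def principal_components(Sigma, k):
    """First k (eigenvalue, eigenvector) pairs, obtained by deflating Sigma k times."""
    A = [row[:] for row in Sigma]
    pairs = []
    for _ in range(k):
        lam, c = top_eigenpair(A)
        pairs.append((lam, c))
        A = mat_add(A, outer(c, c, -lam))          # remove the component just found
    return pairs


def explained_share(Sigma, k):
    """Share of total variance explained by the first k components: eigenvalues over the trace."""
    total = sum(Sigma[i][i] for i in range(len(Sigma)))
    return sum(lam for lam, _ in principal_components(Sigma, k)) / total


s3, s2, s6 = math.sqrt(3.0), math.sqrt(2.0), math.sqrt(6.0)
level = [1 / s3, 1 / s3, 1 / s3]        # parallel shift
slope = [-1 / s2, 0.0, 1 / s2]          # short and long end move in opposite directions
curv = [1 / s6, -2 / s6, 1 / s6]        # middle moves against both ends
Sigma = mat_add(mat_add(outer(level, level, 80.0), outer(slope, slope, 15.0)),
                outer(curv, curv, 5.0))

shares = [explained_share(Sigma, k) for k in (1, 2, 3)]
```

By construction, the first component explains 80% of the total variance and the first two explain 95%. The denominator uses the trace of `Sigma`, which equals both the sum of the return variances and the sum of the eigenvalues.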
12.3.5.2 The empirical facts

The striking feature of the empirical results uncovered by Litterman and Scheinkman (1991) is that they have been confirmed to hold across a number of countries and sample periods. Moreover, the economic nature of these results is the same, independently of whether the statistical analysis relies on a rigorous factor analysis of the model in Eq. (12.20), or a more back-of-envelope computation based on PCA. Finally, the empirical results that hold for bond returns are qualitatively similar to those that hold for bond yields.

FIGURE 12.2. Changes in the term-structure of interest rates generated by changes in the "level," "slope" and "curvature" factors. (Three panels: Level, Slope, Curvature.)

Figure 12.2 visualizes the effects that the three factors have on the movements of the term-structure of interest rates.
• The first factor is called a "level" factor as its changes lead to parallel shifts in the term-structure of interest rates. Thus, this "level" factor produces essentially the same effects on the term-structure as those underlying the "duration hedging" portfolio practice. This factor explains approximately 80% of the total variation of the yield curve.
• The second factor is called a "steepness" factor as its variations induce changes in the slope of the term-structure of interest rates. After a shock in this steepness factor, the short-end and the long-end of the yield curve move in opposite directions. The movements of this factor explain approximately 15% of the total variation of the yield curve.
• The third factor is called a "curvature" factor as its changes lead to changes in the curvature of the yield curve. That is, following a shock in the curvature factor, the middle of the yield curve and both the short-end and the long-end of the yield curve move in opposite directions. This curvature factor accounts for approximately 5% of the total variation of the yield curve.
Understanding the origins of these three factors is still challenging to financial economists and macroeconomists. For example, macroeconomists explain that central banks affect the short-end of the yield curve, e.g. by inducing variations in the Federal Funds rate in the US. However, decisions taken by the Federal Reserve rely on current macroeconomic conditions. Therefore, the short-end of the yield curve likely links to macroeconomic developments. Instead, movements in the long-end of the yield curve should primarily depend on market expectations and risk-aversion surrounding future interest rates and economic conditions. Financial economists, then, should expect to see the long-end of the yield curve as being driven by expectations of future economic activity, and by risk-aversion.
Empirically indeed, Ang and Piazzesi (2003) demonstrate that macroeconomic factors such as inflation and real economic activity are able to explain movements at the short-end and the middle of the yield curve. Interestingly, they show that the long-end of the yield curve is driven by unobservable factors. However, it is not clear whether such unobservable factors are driven by time-varying risk-aversion or changing expectations. A compelling lesson is that models of the yield curve driven by only one factor are likely to be misspecified, due to the complexity of roles played by many institutions participating in the fixed income markets, and the links with the macroeconomy that decisions taken by these institutions have.
12.3.5.3 Forecasting the yield curve

Chapter 11 explains that simple techniques are available to fit the yield curve for the purpose
of mere statistical descriptions of the data (see Appendix 1 in Chapter 11). One, proposed by
Nelson and Siegel (1987), and reproduced here for convenience, postulates that the yield curve
can be modeled as:

$$R(m) = \beta_1 + \beta_2\,\frac{1-e^{-\lambda m}}{\lambda m} + \beta_3\left(\frac{1-e^{-\lambda m}}{\lambda m} - e^{-\lambda m}\right),$$

where $m$ denotes time-to-maturity. The three coefficients, $\beta_1$, $\beta_2$ and $\beta_3$, can be
interpreted in terms of the three factors reviewed in this section. The coefficient $\beta_1$ governs
the level of the yield curve. The coefficient $\beta_2$ relates to the slope, as an increase in this
coefficient increases short yields more than long yields. The coefficient $\beta_3$ shapes the curvature,
as an increase in this coefficient has little effect on very short and very long yields, but increases
the middle of the yield curve. Moreover, the coefficient $\lambda$ controls the exponential decay of the
yield curve: small values of $\lambda$ translate to slow decay and can better fit the curve at long
maturities; large values of $\lambda$, instead, lead to a fast decay, which helps fit the short-end of the
yield curve. Finally, $\lambda$ determines where the loading on $\beta_3$ achieves its maximum. Diebold
and Li (2006) rely on this model, and estimate $\beta_t \equiv (\beta_{1t}, \beta_{2t}, \beta_{3t})$ for each date, and then
use these estimated time series of $\beta_t$ to forecast future values of $\beta_t$ through vector
autoregressions and, then, the future yield curve.
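
The roles of the three coefficients can be checked numerically. The following is a minimal sketch, not the estimation procedure of Diebold and Li: the function names, the maturity grid and the parameter values are illustrative assumptions, and the yield is written in the usual Nelson-Siegel form with level, slope and curvature loadings.

```python
import numpy as np

def nelson_siegel_loadings(m, lam):
    """Loadings on (beta1, beta2, beta3) at time-to-maturity m > 0."""
    m = np.asarray(m, dtype=float)
    x = lam * m
    level = np.ones_like(m)
    slope = -np.expm1(-x) / x          # (1 - exp(-x)) / x, decays from 1 to 0
    curvature = slope - np.exp(-x)     # zero at both ends, hump in the middle
    return level, slope, curvature

def nelson_siegel_yield(m, beta, lam):
    """Yield at maturity m for coefficients beta = (beta1, beta2, beta3)."""
    level, slope, curvature = nelson_siegel_loadings(m, lam)
    return beta[0] * level + beta[1] * slope + beta[2] * curvature

# Hypothetical inputs, chosen only for illustration.
maturities = np.array([0.25, 1.0, 2.0, 5.0, 10.0, 30.0])
beta = (0.04, -0.02, 0.01)
lam = 0.6
yields = nelson_siegel_yield(maturities, beta, lam)
```

With these inputs the short end of the curve is close to beta1 + beta2 and the long end approaches beta1, which is the sense in which beta1 is a level factor and beta2 a slope factor.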

12.4 Models of the short-term rate: Introduction


The short-term rate is simply the growth rate, or velocity, at which locally riskless investments
appreciate over the next infinitesimal amount of time. Naturally, this velocity is not a traded
asset. Models where bond prices are tied up to interest rates are likely to be incomplete, in that
to hedge against a bond, we cannot rely on anything underlying the bond price movements:
what is traded is a money market account as well as the bond itself, not the interest rate. The
evaluation framework in this context is one where bond prices can be replicated through other
bond prices, as explained in a discrete time setting in Chapter 11. This issue is similar to that
encountered in Chapter 10, where to price options in environments with random volatility, we
needed to replicate options through other options.
12.4.1 Models versus representations
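The fundamental relation in Eq. (12.2),

$$P(r_t,t,T) = \mathbb{E}^Q_t\left[e^{-\int_t^T r_\tau\,d\tau}\right], \qquad (12.24)$$

where $\mathbb{E}^Q_t$ denotes the time-$t$ conditional expectation under the risk-neutral probability $Q$,
suggests to model the arbitrage-free price $P$, by assuming the short-term rate, $r$, is an
exogenously given process. For example, we can rely on a Brownian information structure, and
assume $r$ to be the solution to a stochastic differential equation such as:

$$dr_t = b(r_t,t)\,dt + a(r_t,t)\,dW_t, \qquad (12.25)$$

for two functions $b$ and $a$ satisfying standard regularity conditions.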


How to pick up these functions? One concern is that the model solution should be fast
to compute. Another concern is that the model should be empirically accurate. Well-known
models solved in closed form are the celebrated affine models (see Section 12.5.3). The point
that any model should be accurate is more subtle. Perfect accuracy does not obtain with models
described by Eq. (12.25), even when they are extended in multiple dimensions. After all, the
model in Eqs. (12.24)-(12.25) is a model of determination of the observed yield curve. As such,
it cannot exactly fit the observed term structure of interest rates. To have the model fit the
initial yield curve perfectly is instead crucial whilst pricing interest rate derivatives as explained
in the previous chapter. We shall see that the model in Eq. (12.25) has the potential to fit the
yield curve, once it is augmented with an infinite-dimensional parameter, calibrated to the
observed yield curve.
Models with such a perfect accuracy are known as no-arbitrage models, and are the
continuous-time counterparts to the implied trees dealt with in Chapter 11. In a sense,
no-arbitrage models seem to be representations, rather than models. They force the short-term
rate process to exactly pin down the yield curve at any instant and for this reason, they likely
lead to intertemporal inconsistencies: the parameters that pin down the yield curve today
likely differ from those as of tomorrow. The philosophy underlying these models goes to the
opposite extreme of the approach we describe in this section, where the short-term rate is the
input of all subsequent movements of the term-structure of interest rates. In these models,
instead, we model future movements of the yield curve, by feeding information relating to the
entire current yield curve, not only the current short-term rate.
Models of the short-term rate are, then, and obviously, very useful, when it comes to explain
market behavior through a few inputs, a data reduction scientific principle. In these models,
prices are economically admissible, i.e. no-arbitrage, and move being driven by random changes
in the state variables. The pricing errors they lead to may be seen as a virtue not a defect. A
permanent deviation of market data from the model's predictions might be a market anomaly
rather than a model's imperfection, which the model could help exploit. By construction,
no-arbitrage models are, instead, needed to deal with pricing problems where bond prices have to
match market data, while being able to change over time as a result of a probability distribution
determined by the current market data, even when these market data are mispriced.

12.4.2 The bond pricing equation
12.4.2.1 A first derivation

Suppose bond prices are solutions to the following stochastic differential equation:

$$\frac{dP_i}{P_i} = \mu_{bi}\,dt + \sigma_{bi}\,dW, \qquad (12.26)$$

where $W$ is a standard Brownian motion in $\mathbb{R}^d$, $\mu_{bi}$ and $\sigma_{bi}$ are some progressively
measurable functions ($\sigma_{bi}$ is vector-valued), and $P_i \equiv P(t,T_i)$. The exact functional form of
$\mu_{bi}$ and $\sigma_{bi}$ is not given, in contrast to the BS case. Rather, it is endogenous and must be found
as a part of the equilibrium.
Based on general results given in Chapter 4, we can show that the price system in (12.26) is
arbitrage-free if and only if

$$\mu_{bi} = r + \sigma_{bi}\lambda, \qquad (12.27)$$

for some $\mathbb{R}^d$-dimensional process $\lambda$ satisfying some basic regularity conditions. Even if Eq.
(12.27) follows by standard no-arbitrage arguments, Appendix 1 derives it again to emphasize
what no-arbitrage specifically leads to when it comes to bond markets.
The meaning of Eq. (12.27) is better understood by replacing this equation into Eq. (12.26),
obtaining

$$\frac{dP_i}{P_i} = (r + \sigma_{bi}\lambda)\,dt + \sigma_{bi}\,dW.$$

The previous equation tells us that the growth rate of $P_i$ is the short-term rate plus a
term-premium equal to $\sigma_{bi}\lambda$. In the bond market, there are no obvious economic arguments enabling
us to sign term-premia. Empirical evidence suggests that term-premia may take both signs. But
term-premia would be zero in a risk-neutral world, where bond prices would, then, satisfy:

$$\frac{dP_i}{P_i} = r\,dt + \sigma_{bi}\,d\tilde W,$$

where $\tilde W = W + \int\lambda\,dt$ is a Brownian motion under the risk-neutral probability, $Q$.
To illustrate the derivation of Eq. (12.27) when $d = 1$, consider the dynamics of the value $V$
of a self-financed portfolio in two bonds and a money market account:

$$dV = \left[\pi_1(\mu_{b1} - r) + \pi_2(\mu_{b2} - r) + rV\right]dt + (\pi_1\sigma_{b1} + \pi_2\sigma_{b2})\,dW,$$

where $\pi_i$ is wealth invested in bond maturing at $T_i$. We can zero uncertainty by setting

$$\pi_1 = -\frac{\sigma_{b2}}{\sigma_{b1}}\,\pi_2.$$

By replacing this into the dynamics of $V$,

$$dV = \Big[\underbrace{\Big(-\frac{\mu_{b1}-r}{\sigma_{b1}}\,\sigma_{b2} + (\mu_{b2}-r)\Big)}_{\equiv\,\phi}\,\pi_2 + rV\Big]dt.$$

Note that $\pi_2$ can always be chosen such that the value of this portfolio appreciates at a rate
strictly greater than $r$: just set $\mathrm{sign}(\pi_2) = \mathrm{sign}(\phi)$. Therefore, to rule out arbitrage, $\phi = 0$, or

$$\frac{\mu_{b1}-r}{\sigma_{b1}} = \frac{\mu_{b2}-r}{\sigma_{b2}}.$$

That is, the Sharpe ratio for any two bonds has to equal a process $\lambda$, say, and Eq. (12.27)
follows. Clearly $\lambda$ is independent of the two maturity dates, $T_1$ or $T_2$, which were indeed
arbitrary. It is natural that $\lambda$ is independent of any bond maturity, as this is the unit price of risk
required to compensate for randomness in the short-term rate.
The two functions $\mu_{bi}$ and $\sigma_{bi}$ in Eq. (12.26) can be determined through Itô's lemma. Let
$P(r,t,T)$ be the price at $t$ of a bond maturing at $T$ when the state at $t$ is $r$, the rational
bond pricing function. Because $r$ is solution to Eq. (12.25), Itô's lemma implies that:

$$dP = \left(P_t + bP_r + \tfrac12 a^2P_{rr}\right)dt + aP_r\,dW,$$

where subscripts denote partial derivatives.
Comparing this equation with Eq. (12.26), and identifying drifts and diffusion terms,

$$\mu_b P = P_t + bP_r + \tfrac12 a^2P_{rr}, \qquad \sigma_b P = aP_r.$$

Now replace these functions into Eq. (12.27) to obtain that the bond price satisfies the following
partial differential equation (PDE, henceforth), for all $r$ and $t \in [0,T)$:

$$P_t + bP_r + \tfrac12 a^2P_{rr} = rP + \lambda aP_r, \qquad (12.28)$$

with the boundary condition $P(r,T,T) = 1$, holding for all $r$. We now turn to further our
interpretation of this bond pricing equation.
12.4.2.2 Derivation based on duration

The idea, here, is to replicate the price of a bond expiring at some time $T_1$, say $P^1 \equiv P(r,t,T_1)$,
with a self-financed portfolio comprising a money market account and a second bond expiring
at time $T_2 > T_1$. This approach is the continuous-time equivalent to the pricing approach
in Chapter 11. It is, also, the interest rate counterpart to option evaluation with stochastic
volatility in Chapter 10. Let, then, $V$ be the value of our self-financed portfolio, $V = \Delta P^2 + M$,
where $\Delta$ is the number of bonds maturing at $T_2$ to include in the portfolio, $P^2 \equiv P(r,t,T_2)$,
and $M$ is the dollar amount in the money market account. Since the portfolio is self-financed,
by the usual arguments, we have that

$$dV = \Delta\,dP^2 + rM\,dt = \left(\Delta\,\mathcal{L}P^2 + rM\right)dt + \Delta\,aP^2_r\,dW, \qquad (12.29)$$

where $\mathcal{L}P^i \equiv P^i_t + bP^i_r + \tfrac12 a^2P^i_{rr}$. And, obviously,

$$dP^1 = \mathcal{L}P^1\,dt + aP^1_r\,dW. \qquad (12.30)$$

Let the initial value of the portfolio match the bond price. Then, comparing the diffusive terms
in Eq. (12.29) and Eq. (12.30), we find the delta to be:

$$\Delta = \frac{\partial P(r,t,T_1)/\partial r}{\partial P(r,t,T_2)/\partial r}.$$

Comparing the drift terms in Eq. (12.29) and Eq. (12.30),

$$\mathcal{L}P^1 = \Delta\,\mathcal{L}P^2 + rM = \Delta\,\mathcal{L}P^2 + r(V - \Delta P^2) = \Delta\,\mathcal{L}P^2 + r(P^1 - \Delta P^2),$$

where the last equality follows as we are using the values $(\Delta, M)$ such that the portfolio value
matches that of the first bond. Rearranging terms yields $\mathcal{L}P^1 - rP^1 = \Delta(\mathcal{L}P^2 - rP^2)$, and
evaluating this at the delta found above,

$$\frac{\mathcal{L}P^1 - rP^1}{aP^1_r} = \frac{\mathcal{L}P^2 - rP^2}{aP^2_r} \equiv \lambda,$$

for some $\lambda$ independent of the maturity dates $T_1$ and $T_2$, which is Eq. (12.28).


The delta, $\Delta$, can be interpreted as the ratio of the durations of the two bonds, as explained
in Chapter 11. Such a duration-based derivation suggests a trading strategy. Suppose that the
first bond is mispriced against a given model, in that we have some confidence that our
model does better than the market is currently doing while pricing this bond. For example, we
might observe that the model quite often predicts that the price of this first bond is lower than
the market, $P^1 < P^1_\$$, but that at the same time, the very same model prices well bonds on other
points of the curve. We can implement the following relative value trade. We sell short the first
bond for $P^1_\$$, realize an initial gain equal to $P^1_\$ - P^1$, and invest $P^1$ into the previously described
portfolio, comprising a money market account and the second bond, aiming to replicate the
first bond at maturity $T_1 < T_2$. Naturally, the assumption is that the model is better than the
market, which means that such a replication is indeed possible. Veronesi (2010, Chapter 16)
develops examples that show the extent of success of this strategy.
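
The delta of this replicating portfolio can be computed for any pricing function of the short-term rate. The sketch below is illustrative only: it assumes, for concreteness, the exponential-affine Vasicek price of Section 12.4.4.1 with hypothetical parameter values, and it approximates the rate sensitivities with central finite differences. The function names are not from the text.

```python
import math

def vasicek_price(r, tau, kappa, r_tilde, sigma):
    """Zero-coupon bond price exp(A - B*r) with time-to-maturity tau (Vasicek)."""
    B = (1.0 - math.exp(-kappa * tau)) / kappa
    A = (sigma**2 / (4.0 * kappa**3) * (1.0 - math.exp(-2.0 * kappa * tau))
         + (r_tilde - sigma**2 / kappa**2) * B
         - (r_tilde - 0.5 * sigma**2 / kappa**2) * tau)
    return math.exp(A - B * r)

def rate_sensitivity(price, r, tau, h=1e-6):
    """Central finite-difference estimate of dP/dr."""
    return (price(r + h, tau) - price(r - h, tau)) / (2.0 * h)

def hedge_ratio(price, r, tau1, tau2):
    """Number of tau2-bonds whose rate exposure matches one tau1-bond."""
    return rate_sensitivity(price, r, tau1) / rate_sensitivity(price, r, tau2)

# Hypothetical parameters: risk-neutral mean reversion, long-run level, volatility.
params = dict(kappa=0.3, r_tilde=0.05, sigma=0.01)
price = lambda r, tau: vasicek_price(r, tau, **params)

r0 = 0.03
delta = hedge_ratio(price, r0, tau1=2.0, tau2=10.0)
# Dollar amount in the money market account so the portfolio starts at P^1.
money_market = price(r0, 2.0) - delta * price(r0, 10.0)
```

Because the portfolio matches both the value and the rate sensitivity of the first bond, a small shock to the short-term rate moves the two by the same amount to first order; the hedge has to be rebalanced as the rate and calendar time change.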
12.4.2.3 Interpreting the bond pricing equation: market incompleteness and preference-free evaluation

Eq. (12.28) shows that the bond price, $P$, depends on both the drift of the short-term rate,
$b$, and the risk-aversion correction, $\lambda$. Let us elucidate the circumstances leading to this fact.
They link to market incompleteness.
Consider a benchmark complete market model, Black-Scholes, in which the option is
redundant, given the initial market structure. As pointed out in Chapter 11, in markets such as Black
& Scholes, the stock price is the source of randomness, and markets are trivially completed by
the very same asset.
By contrast, the bond price is not the risk. Rather, the bond draws value from the
interest rate risk, as if it was a derivative on $r$, where $r$ forms, alone, an incomplete market
structure. For this reason, the bond price cannot be determined in a preference-free format.
Similarly, in equity markets with stochastic volatility (see Chapter 10), two sources of
randomness arise, the stock price and its stochastic volatility, such that a portfolio comprising a stock
and one additional option is needed to hedge against one given option, and no preference-free
formulae are possible. Let us mention another example. We are facing a situation similar to
those encountered in Part II of these lectures, where given a certain dividend risk, the stock
price could not, then, be expressed in a preference-free fashion: the dividend in that context was
the risk, and the stock price was like a derivative written on that risk.
In Chapter 2, we learnt indeed that given an incomplete market, there might be securities
that could be introduced to make markets complete. For example, a market with two securities
and three states is incomplete. Yet it may be completed once a third security is introduced,
which together with the first two, can generate any consumption bundle in all the possible
states of the world: we need that the payoff matrix of the three securities be non-singular. Does
this mean that the third security can be priced in a preference-free format? Of course not.
By construction, the payoffs promised by the third security cannot be replicated by trading
the first two. In other words, the third security is not redundant, for otherwise it would not
complete the initial three-state/two security market. And if it is not redundant, its price reflects
the services it provides the society with and thus, it cannot be preference-free. A hypothetical
fourth security could, however, be replicated by trading the three securities, and would thus be
preference-free.
The market in this section shares similarities with this simple example, but with
qualifications. As mentioned, the bond completes the market, and its price cannot be replicated through
the short-term rate, because the short-term rate is not traded. However, we can replicate a
second bond price by trading the first bond (and a money market account). But the second bond
price cannot be expressed in a preference-free format: there are actually no bonds that could
be expressed in a preference-free format, for the very reason that the first bond cannot! The
situation in the context of this section is actually involved. Issues arise not only because we
try to replicate a bond with another bond. Suppose, indeed, that we try to replicate an option
written on this bond with the very same bond. We could not achieve a preference-free solution
either. By the reasoning in the previous sections, we need to make sure that the volatility of
the bond price is matched to that of the option price. Granted, we can achieve this. However,
the volatility of the bond price is not exogenous, as it depends on the price sensitivity with
respect to $r$, which is obviously not preference-free.
Finally, the reason the bond price depends on $b$ and $\lambda$ is intimately related to the previous
explanations. In this market, uncertainty is generated by an untraded risk, $r$, such that the
initial market structure has one untraded risk, $r$, and zero assets, as explained. But because the
short-term rate is not a discounted martingale under $Q$, its drift does not equal $r \cdot r = r^2$
under $Q$ but, rather, $b - a\lambda$, thereby leading to Eq. (12.28). All in all, the bond price depends
on the specific functional forms of $b$, $a$ and $\lambda$.
This dependence on risk-appetite might appear as a kind of hindrance to practitioners, but
it may actually be a source of value, for its potential to inform on agents' risk-appetite: once
the two functions $(b, a)$ are estimated, the yield curve could be inverted to infer the implied $\lambda$,
which could help policy makers to take more informed decisions about how to affect the yield
curve. Admittedly though, the task of estimating $(b, a)$ is far from trivial.
By specifying $(b, a)$ and identifying the risk-premium $\lambda$, the PDE in Eq. (12.28) can then
be solved, either analytically or numerically. Choices concerning the exact functional form of
$b$, $a$ and $\lambda$ are often made on the basis of analytical or empirical reasons. In Section 12.4.4,
we introduce the first, famous models in which $b$, $a$ and $\lambda$ have a particularly simple form.
Section 12.4.5 discusses analytical merits but major empirical drawbacks of these models. In
Section 12.4.6 we provide a very succinct description of models exhibiting jump (and default)
phenomena. We, first, highlight how models of the short-term rate could be used to set up a
metric of duration when this concept is more difficult to quantify than in the contexts outlined
in Chapter 11.
12.4.3 Stochastic duration
Duration is a measure of risk tailored to capture the notion of fixed income volatility. Cox,
Ingersoll and Ross (1979) hinge upon models of the short-term rate and introduce the notion
of stochastic duration, generalizing the notion of modified duration discussed in Chapter 11.
Suppose the price of a zero-coupon bond is a function of the short-term rate only, $P(r,t,T)$.
Define the basis risk as the semi-elasticity of the bond price with respect to the short-term rate,

$$\Psi(r,T-t) \equiv -\frac{\partial P(r,t,T)/\partial r}{P(r,t,T)}.$$

Naturally, we want to make sure that the measure of duration for a zero-coupon bond equals
time-to-maturity. Therefore, $\Psi$ cannot be a measure of duration: except in the trivial case in
which $r$ is constant, $\Psi$ does not equal $T-t$ for a zero. The idea underlying stochastic duration
of a given bond is to search for the time-to-maturity $T^*-t$ of a hypothetical zero-coupon bond
such that its basis risk is the same as that of the given bond (e.g., a coupon bearing bond or a
callable bond), viz

$$\Psi(r,T^*-t) = -\frac{\partial\mathcal{P}(r,t,N)/\partial r}{\mathcal{P}(r,t,N)},$$

where $\mathcal{P}(r,t,N)$ is the price of the given bond that delivers its face value at time $N$, if no
events preventing this occur prior to time $N$ (say, default or the exercise of the optionalities
embedded into the institutional details of the bond). Stochastic duration of this bond is defined
as the time-to-maturity $T^*-t$ of the hypothetical zero-coupon bond:

$$D(r,t,N) \equiv \Psi^{-1}\left(r,\,-\frac{\partial\mathcal{P}(r,t,N)/\partial r}{\mathcal{P}(r,t,N)}\right), \qquad (12.31)$$

where $\Psi^{-1}$ is the inverse function of $\Psi(r,\cdot)$ with respect to time to maturity $T-t$. It is immediate
to verify that for a zero, $D(r,t,T) = T-t$. Moreover, the stochastic duration, $D(r,t,N)$,
collapses to the modified duration introduced in the previous chapter (see Section 11.4) once
the short-term rate is a constant.10
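
A minimal sketch of this calculation follows, under assumptions that are not part of the definition above: zero-coupon prices are taken to be of the Vasicek exponential-affine form of Section 12.4.4.1, for which the semi-elasticity of a zero with time-to-maturity x is (1 - exp(-kappa*x))/kappa and can be inverted in closed form; the given bond is a default-free coupon bond; and all parameter values and function names are hypothetical.

```python
import math

def vasicek_B(tau, kappa):
    """Semi-elasticity -P_r/P of a zero with time-to-maturity tau (Vasicek)."""
    return (1.0 - math.exp(-kappa * tau)) / kappa

def vasicek_price(r, tau, kappa, r_tilde, sigma):
    """Zero-coupon bond price exp(A - B*r)."""
    B = vasicek_B(tau, kappa)
    A = (sigma**2 / (4.0 * kappa**3) * (1.0 - math.exp(-2.0 * kappa * tau))
         + (r_tilde - sigma**2 / kappa**2) * B
         - (r_tilde - 0.5 * sigma**2 / kappa**2) * tau)
    return math.exp(A - B * r)

def stochastic_duration(r, cash_flows, kappa, r_tilde, sigma):
    """cash_flows: iterable of (time_to_payment, amount) pairs."""
    values = [(tau, c * vasicek_price(r, tau, kappa, r_tilde, sigma))
              for tau, c in cash_flows]
    total = sum(v for _, v in values)
    # Basis risk of the bond: value-weighted average of the zeros' basis risks.
    basis_risk = sum(vasicek_B(tau, kappa) * v for tau, v in values) / total
    # Invert B(x) = basis_risk for the time-to-maturity x of the matching zero.
    return -math.log(1.0 - kappa * basis_risk) / kappa

# Hypothetical 10-year bond paying an annual coupon of 4 on a face value of 100.
bond = [(float(t), 4.0) for t in range(1, 10)] + [(10.0, 104.0)]
D = stochastic_duration(0.03, bond, kappa=0.3, r_tilde=0.05, sigma=0.01)
```

For a single cash flow the function returns the time to that payment, in line with the requirement that the duration of a zero equals its time-to-maturity; for the coupon bond it returns a value shorter than the final maturity.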

12.4.4 Some famous models


12.4.4.1 Vasicek

Vasicek (1977) derives a model of the yield curve assuming the short-term rate is a
continuous-time and mean-reverting process with constant basis point volatility, solution to:11

$$dr_t = \kappa(\bar r - r_t)\,dt + \sigma\,dW_t, \qquad (12.32)$$

where $\bar r$, $\kappa$ and $\sigma$ are positive constants. This model is more sensible than that of Merton
(1973), where the short-term rate is an arithmetic Brownian motion. The intuition underlying
the importance of mean-reversion is as follows. Suppose, first, that $\sigma = 0$, in which case, for
$\tau \ge t$,

$$r_\tau = \bar r + e^{-\kappa(\tau-t)}(r_t - \bar r). \qquad (12.33)$$

If the current level of the short-term rate $r_t = \bar r$, it will be locked-in at $\bar r$ forever. If $r_t < \bar r$, the
short-term rate shall steadily increase, and converge to $\bar r$ as $\tau \to \infty$. Likewise, the short-term
rate shall converge to $\bar r$ when $r_t > \bar r$. The speed of convergence of $r$ to the long-term value $\bar r$
depends on the magnitude of $\kappa$: the higher $\kappa$, the higher the speed of convergence to $\bar r$.
In the general case, $\sigma \neq 0$, the solution to Eq. (12.32) is,

$$r_\tau = \bar r + e^{-\kappa(\tau-t)}(r_t - \bar r) + \sigma\int_t^\tau e^{-\kappa(\tau-s)}\,dW_s,$$

where the integral has to be understood in the Itô sense. The interpretation of this solution
is similar to that in the deterministic case, in that the short-term rate now fluctuates around
its central tendency $\bar r$. In other words, shocks are absorbed at a speed depending on the
magnitude of $\kappa$, leading the short-term rate to display a mean-reverting behavior. Indeed, the
conditional expectation of $r_\tau$ is the same as that in Eq. (12.33),

$$\mathbb{E}(r_\tau\,|\,r_t) = \bar r + e^{-\kappa(\tau-t)}(r_t - \bar r). \qquad (12.34)$$

Moreover, the conditional variance of $r_\tau$ is:

$$\mathrm{var}(r_\tau\,|\,r_t) = \frac{\sigma^2}{2\kappa}\left(1 - e^{-2\kappa(\tau-t)}\right).$$

Finally, it can be shown that $r_\tau$ is normally distributed, with expectation and variance given by
the two functions given above.
10 Indeed, note that if $r$ is constant, then $P(r,t,T) = e^{-r(T-t)}$, so that the semi-elasticity $-P_r/P$
equals $T-t$ and its inverse with respect to time to maturity is the identity.
11 Note that Merton's (1973) seminal paper is also framed in a context with random interest rates
solved in closed form. In Merton's model, the short-term rate is an arithmetic Brownian motion.
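To solve for the entire term-structure of interest rates, we need to make assumptions about
the risk-premium, $\lambda$. A closed-form expression for the bond price obtains, once we assume $\lambda$ is
a constant. Indeed, by replacing a constant risk-premium and the functions $b(r) = \kappa(\bar r - r)$
and $a(r) = \sigma$ into Eq. (12.28), and denoting $\tilde r \equiv \bar r - \frac{\lambda\sigma}{\kappa}$, we obtain that the bond price is
solution to:

$$0 = P_t + \kappa(\tilde r - r)P_r + \tfrac12\sigma^2P_{rr} - rP, \quad \text{for all } (r,t) \in \mathbb{R}\times[0,T), \qquad (12.35)$$

with the usual boundary condition. Intuitively, $\kappa(\tilde r - r)$ is the drift of the short-term rate
under $Q$, which is higher than under the physical probability for $\lambda < 0$, reflecting higher
Arrow-Debreu state prices for the bad states of the world arising when interest rates are high. It is
instructive to see how this partial differential equation can be solved. We guess a solution of the form:

$$P(r,t,T) = e^{A(t,T) - B(t,T)\,r}, \qquad (12.36)$$

for two functions $A$ and $B$ to be determined. Now suppose the guess in Eq. (12.36) is true.
Replacing the partial derivatives of $P$ into Eq. (12.35) leaves the equation $f_1(t) + f_2(t)\,r = 0$,
where for all $t$,

$$f_1(t) \equiv A_t(t,T) - \kappa\tilde r\,B(t,T) + \tfrac12\sigma^2B(t,T)^2 \quad\text{and}\quad f_2(t) \equiv -B_t(t,T) + \kappa B(t,T) - 1,$$

and subscripts denote partial derivatives with respect to $t$. That is, $0 = f_1(t) = f_2(t)$, two
ordinary differential equations subject to the boundary conditions $A(T,T) = 0$ and $B(T,T) = 0$.
The solutions are:

$$B(t,T) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-t)}\right),$$

$$A(t,T) = \frac{\sigma^2}{4\kappa^3}\left(1 - e^{-2\kappa(T-t)}\right) + \frac{1}{\kappa}\left(\tilde r - \frac{\sigma^2}{\kappa^2}\right)\left(1 - e^{-\kappa(T-t)}\right) - \left(\tilde r - \frac12\frac{\sigma^2}{\kappa^2}\right)(T-t).$$

The term-structure of spot rates predicted by this model is, by the definition in Eq. (12.11),

$$R(t,T) \equiv -\frac{\ln P(r_t,t,T)}{T-t} = -\frac{A(t,T)}{T-t} + \frac{B(t,T)}{T-t}\,r_t. \qquad (12.37)$$

The quantity, $\tilde r - \frac12(\sigma/\kappa)^2$, is interpreted as the asymptotic spot rate, as it is the limit of
$R(t,T)$ for large $T$.
The model displays a number of features that match some empirical facts, such as selected
shapes of the yield curve.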
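
The shapes of the yield curve generated by the model can be traced out numerically. The sketch below evaluates the Vasicek closed-form solution for the bond price and the implied spot rates on a maturity grid, for three starting levels of the short-term rate. The parameter values, the grid and the function names are hypothetical, chosen only to display an upward sloping and a downward sloping curve; `r_tilde` stands for the risk-adjusted long-run level of the short-term rate.

```python
import numpy as np

def vasicek_AB(tau, kappa, r_tilde, sigma):
    """Functions A and B of the bond price exp(A - B*r), by time-to-maturity tau."""
    tau = np.asarray(tau, dtype=float)
    B = (1.0 - np.exp(-kappa * tau)) / kappa
    A = (sigma**2 / (4.0 * kappa**3) * (1.0 - np.exp(-2.0 * kappa * tau))
         + (r_tilde - sigma**2 / kappa**2) * B
         - (r_tilde - 0.5 * sigma**2 / kappa**2) * tau)
    return A, B

def vasicek_yield(r, tau, kappa, r_tilde, sigma):
    """Spot rate -ln(P)/tau for time-to-maturity tau > 0."""
    A, B = vasicek_AB(tau, kappa, r_tilde, sigma)
    return (-A + B * r) / np.asarray(tau, dtype=float)

# Hypothetical parameter values.
kappa, r_tilde, sigma = 0.3, 0.05, 0.01
taus = np.linspace(0.25, 30.0, 120)
curves = {r0: vasicek_yield(r0, taus, kappa, r_tilde, sigma)
          for r0 in (0.01, 0.048, 0.08)}
asymptotic_rate = r_tilde - 0.5 * (sigma / kappa) ** 2
```

Each array in `curves` can be passed to any plotting routine against `taus`. All curves converge to `asymptotic_rate` at long maturities, whatever the starting short-term rate.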

12.4.4.2 A case study: expectations and business cycles in Vasicek

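Vasicek's model can be used to illustrate aspects of the expectation hypothesis and the
predicting power of the term spread mentioned in Section 12.3. First, we express the yield
curve in Eq. (12.37) in a way that is more convenient to interpret. We mentioned earlier that
the short-term rate is conditionally normally distributed. Now, by Lamberton and Lapeyre
(1997, Chapter 6), the term $\int_t^T r_\tau\,d\tau$ is also conditionally normally distributed, and then, by
Eq. (12.2) and Eq. (12.11),

$$R(t,T) = \frac{1}{T-t}\,\mathbb{E}^Q_t\left(\int_t^T r_\tau\,d\tau\right) - \frac12\,\frac{1}{T-t}\,\mathrm{var}_t\left(\int_t^T r_\tau\,d\tau\right). \qquad (12.38)$$

The second term in Eq. (12.38) reflects Jensen's inequality effects. It equals

$$\frac12\,\frac{1}{T-t}\,\mathrm{var}_t\left(\int_t^T r_\tau\,d\tau\right) = \frac{\sigma^2}{2\kappa^2} - \frac{\sigma^2}{\kappa^3(T-t)}\left(1 - e^{-\kappa(T-t)}\right) + \frac{\sigma^2}{4\kappa^3(T-t)}\left(1 - e^{-2\kappa(T-t)}\right),$$

and is quantitatively much smaller than the first term.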


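We approximate the yield curve with the first term of Eq. (12.38), which is an expectation of
the average short-term rate under the risk-neutral probability. By Eq. (12.34) and Eq. (12.28),
this expectation can be decomposed as the sum of the expectation under the physical probability
plus risk-premium terms, as follows:

$$R(t,T) \approx \frac{1}{T-t}\,\mathbb{E}^Q_t\left(\int_t^T r_\tau\,d\tau\right) = \frac{1}{T-t}\int_t^T \mathbb{E}_t(r_\tau)\,d\tau + \frac{1}{T-t}\int_t^T \left[-\lambda\sigma B(\tau,T)\right]d\tau, \qquad (12.39)$$

where $-\lambda\sigma B(\tau,T)$ is the instantaneous expected excess return, $\sigma_b\lambda$ in Eq. (12.27), on the bond
maturing at $T$, and:

$$\frac{1}{T-t}\int_t^T \mathbb{E}_t(r_\tau)\,d\tau = \bar r + (r_t - \bar r)\,\frac{1 - e^{-\kappa(T-t)}}{\kappa(T-t)}, \qquad \frac{1}{T-t}\int_t^T \left[-\lambda\sigma B(\tau,T)\right]d\tau = -\frac{\lambda\sigma}{\kappa}\left(1 - \frac{1 - e^{-\kappa(T-t)}}{\kappa(T-t)}\right).$$

Eq. (12.39) says that long-term rates reflect expectations of the future short-term rate and
risk-premium terms, defined as the average expected excess return on the bond.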
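
The relative size of the three components of the Vasicek yield (expected average short-term rate, risk-premium and Jensen's inequality term) can be checked numerically. The sketch below assumes a constant risk-premium parameter `lam` and hypothetical parameter values; the function names are illustrative. It also verifies that the expectation and risk-premium components, net of the Jensen's term, add up to the exact closed-form yield.

```python
import math

def yield_decomposition(r, tau, kappa, r_bar, sigma, lam):
    """Return (expectation, risk_premium, jensen) for time-to-maturity tau > 0."""
    g = (1.0 - math.exp(-kappa * tau)) / (kappa * tau)               # B(tau)/tau
    g2 = (1.0 - math.exp(-2.0 * kappa * tau)) / (2.0 * kappa * tau)
    expectation = r_bar + (r - r_bar) * g              # physical expectation
    risk_premium = -(lam * sigma / kappa) * (1.0 - g)  # average excess return
    jensen = sigma**2 / (2.0 * kappa**2) * (1.0 - 2.0 * g + g2)
    return expectation, risk_premium, jensen

def vasicek_yield(r, tau, kappa, r_bar, sigma, lam):
    """Exact spot rate from the closed-form bond price exp(A - B*r)."""
    r_tilde = r_bar - lam * sigma / kappa              # risk-adjusted long-run level
    B = (1.0 - math.exp(-kappa * tau)) / kappa
    A = (sigma**2 / (4.0 * kappa**3) * (1.0 - math.exp(-2.0 * kappa * tau))
         + (r_tilde - sigma**2 / kappa**2) * B
         - (r_tilde - 0.5 * sigma**2 / kappa**2) * tau)
    return (-A + B * r) / tau

# Hypothetical inputs; lam < 0 delivers a positive term-premium.
inputs = dict(r=0.03, tau=10.0, kappa=0.3, r_bar=0.05, sigma=0.01, lam=-0.2)
expectation, risk_premium, jensen = yield_decomposition(**inputs)
exact = vasicek_yield(**inputs)
```

With these inputs the Jensen's term is small compared with the expectation component, which is the sense in which the first term of the yield decomposition dominates.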
We can rely on this simple framework to describe some of the business cycle properties of the
yield curve. We assume that a single state variable, $y_t$, is capable to track some business cycle
conditions, and is solution to the following stochastic differential equation,

$$dy_t = \kappa(\bar y - y_t)\,dt + v\,dW_t,$$

for three positive constants, $\kappa$, $\bar y$ and $v$. Next, suppose that: (i) the nominal short-term rate
is procyclical, in that $r_t \equiv \alpha + \beta y_t$, for two positive constants $\alpha$ and $\beta$, and that (ii)
risk-premiums are countercyclical in that, $\lambda(y) \equiv \lambda_0 - \lambda_1 y$, for two additional constants, where
$\lambda_1 > 0$, to ensure countercyclicality, and $\lambda_0$ might in principle take any sign, although it is
reasonable to assume that $\lambda_0 < 0$, which would imply that the constant portion of the
risk-premium is positive anyway. We shall return to the sign of $\lambda_0$ in a moment.
Given the assumptions made so far, the short-term rate is solution to Eq. (12.32), with
parameters $\bar r \equiv \alpha + \beta\bar y$ and $\sigma \equiv \beta v$. While the risk-premium is time-varying, being equal to
$\lambda(y) = \lambda_0 - \lambda_1\beta^{-1}(r - \alpha)$, the short-term rate is still conditionally normally distributed under the
risk-neutral probability, and the yield curve can be solved in closed form, with a solution like
that in Eq. (12.36), and an approximation like that in Eq. (12.39). It is instructive to calculate
the decomposition in Eq. (12.39) predicted by this model. We have,

$$R(t,T) \approx \frac{1}{T-t}\,\mathbb{E}^Q_t\left(\int_t^T r_\tau\,d\tau\right) = \tilde r + (r_t - \tilde r)\,\frac{1 - e^{-\tilde\kappa(T-t)}}{\tilde\kappa(T-t)},$$

where $\tilde\kappa \equiv \kappa - v\lambda_1$ and $\tilde r \equiv \frac{\kappa\bar r - \sigma\lambda_0 - \alpha v\lambda_1}{\tilde\kappa}$. The model's parameters clearly collapse to
Vasicek's, once we assume $\lambda_1 = 0$. A countercyclical risk-premium, $\lambda_1 > 0$, leads to a tilt in the
unconditional short-term rate, $\tilde r$, and, importantly, an increased persistence of the short-term
rate, due to the fact that $\tilde\kappa < \kappa$. Clearly, there are no solutions to the model for large $\lambda_1$, when
risk-premium effects are so large to make $\tilde\kappa < 0$. Finally, note that although it is reasonable to
assume $\lambda_0 < 0$, as explained, we might also wish to make sure that $\lambda(y)$ is negative for most
of the time, to ensure positive expected excess returns.
The term-spread over the short-term rate predicted by the model is,

$$R(t,T) - r_t \approx -\beta\,(y_t - \tilde y)\left(1 - \frac{1 - e^{-\tilde\kappa(T-t)}}{\tilde\kappa(T-t)}\right),$$

where $\tilde y \equiv \frac{\kappa\bar y - v\lambda_0}{\kappa - v\lambda_1}$ is the unconditional expectation of the procyclical variable $y$, taken under
the risk-neutral probability. According to this model, the term spread is the product of two
terms. The first is negative when $\beta(y_t - \tilde y) > 0$, and in this case, the model formalizes explanations
given in Section 12.3.3. Suppose, once again, that interest rates are procyclical, in that $\beta > 0$.
Before a peak, i.e. when $y_t < \tilde y$, the yield curve is upward sloping. After this peak is achieved
and, then, $y_t > \tilde y$, a recession becomes more likely, given the mean-reverting nature of $y$,
and the yield curve becomes inverted, as nominal rates are procyclical, $\beta > 0$, and
countercyclical risk-aversion is mild enough to guarantee that $\tilde\kappa > 0$. In other words, at the peak of
an expansion, i.e. when a recession is most likely, we expect that interest rates will lower due to
their procyclicality, $\beta > 0$, whence the yield curve inversion. Long term interest rates are driven
by expectations of future short-term rates.
Note that if $\lambda_1$ had to be so large to make $\tilde\kappa < 0$, the model would generate the wrong
predictions, with an inverted yield curve during the rising part of a boom, not the descending
part. The mechanism would be that during a peak, we would expect that the future short-term
rate would be low, but risk-aversion to be so high, to dwarf expectations effects and push future
prices down, to an extent that would compensate for the procyclical effects generated by the
short-term rate.
Naturally, this model is very simple, driven as it is by only one factor, i.e., the business cycle
variable $y$. Its merit is that it makes a sharp prediction regarding the slope of the yield curve:
a positive slope occurs before a peak, $y_t < \tilde y$, and a negative slope after the peak, $y_t > \tilde y$,
similarly as in the data. This model thus isolates the business cycle component of the yield
curve that relates to its inversions. The crucial point is that the model is silent as regards the
business cycle variable $y$. If we knew $y$, we could use it to forecast the business cycle in the
first place.
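
The sign prediction for the slope of the yield curve can be illustrated with a short calculation. The sketch below rests on assumptions that should be read as such: the business cycle variable follows a linear mean-reverting diffusion with speed `kappa`, long-run level `y_bar` and volatility `v`; the short-term rate loads on it with slope `beta`; the unit risk-premium is linear in the state, with intercept `lam0` and slope coefficient `lam1` entering with a negative sign; and the Jensen's inequality term is ignored. All names and numerical values are hypothetical.

```python
import math

def risk_neutral_parameters(kappa, y_bar, v, lam0, lam1):
    """Mean-reversion speed and long-run level of y under the risk-neutral probability."""
    kappa_q = kappa - v * lam1
    if kappa_q <= 0.0:
        raise ValueError("risk-premium effects too large: no mean reversion under Q")
    y_q = (kappa * y_bar - v * lam0) / kappa_q
    return kappa_q, y_q

def term_spread(y, tau, beta, kappa, y_bar, v, lam0, lam1):
    """Approximate spread between the tau-maturity yield and the short-term rate."""
    kappa_q, y_q = risk_neutral_parameters(kappa, y_bar, v, lam0, lam1)
    x = kappa_q * tau
    return -beta * (y - y_q) * (1.0 - (1.0 - math.exp(-x)) / x)

# Hypothetical parameter values.
model = dict(beta=1.0, kappa=0.5, y_bar=0.02, v=0.02, lam0=-0.1, lam1=5.0)
spread_before_peak = term_spread(0.01, 10.0, **model)   # y below its Q-mean
spread_after_peak = term_spread(0.05, 10.0, **model)    # y above its Q-mean
```

Under these assumptions the spread is positive when the state variable is below its risk-neutral long-run level and negative when it is above, and the function refuses parameter values for which the risk-adjusted mean-reversion speed would not be positive.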

12.4.4.3 Cox, Ingersoll and Ross

Vasicek's model suffers from two main drawbacks. First, the short-term rate is normally
distributed. This circumstance might be mitigated when $\sigma$ is low compared to the remaining
parameters, in which case the probability the short-term rate takes negative values can be small.
At the same time, even a small probability of a negative interest rate might lead to severe
mispricings when it comes to pricing interest rate derivatives, due to nonlinearities induced by
optionality, as pointed out by Dybvig [cite reference]. Section 11.5.3 of the previous chapter
displays numerical examples where small changes in assumptions can lead to quite substantial
changes in the price of derivatives.
The second drawback, related to the first, is that the short-term rate volatility is independent
of the level of the short-term rate. It might be argued that short-term rate changes become
more and more volatile as the level of the short-term rate increases, a phenomenon usually
referred to as the level-effect.
Cox, Ingersoll and Ross (1985) (CIR, henceforth) propose a model that addresses these two
drawbacks at once, by assuming that the short-term rate is solution to,

$$dr_t = \kappa(\bar r - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t.$$

The CIR model is also referred to as square-root process to emphasize that the diffusion
function is proportional to the square-root of $r$. This feature makes the model address the
level-effect phenomenon. The evidence about the level-effect is further discussed below (see Section
12.4.5). Moreover, this property prevents $r$ from taking negative values. Intuitively, when $r$
wanders just above zero, it is pulled back to the strictly positive region at a strength of the
order $dr = \kappa\bar r\,dt$.12 The transition density of $r$ is noncentral chi-square. The stationary density
of $r$ is a gamma distribution. The expected value is as in Vasicek.13 However, the variance is
different, although its exact expression is really not important here.
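CIR formulate a set of assumptions (e.g., preferences), leading to a risk-premium function
$\lambda(r) = \frac{\ell}{\sigma}\sqrt r$, where $\ell$ is a constant. By replacing this, $b(r) = \kappa(\bar r - r)$ and $a(r) = \sigma\sqrt r$ into the
partial differential equation (12.28), one gets (similarly as in the Vasicek model), that the bond
price function takes the form in Eq. (12.36), but with functions $A$ and $B$ satisfying the following
differential equations:

$$0 = A_t(t,T) - \kappa\bar r\,B(t,T) \quad\text{and}\quad 0 = -B_t(t,T) + (\kappa+\ell)B(t,T) + \tfrac12\sigma^2B(t,T)^2 - 1,$$

subject to the boundary conditions, $A(T,T) = 0$ and $B(T,T) = 0$, such that the solution to
the bond price takes the following form,

$$A(t,T) = \frac{2\kappa\bar r}{\sigma^2}\ln\left(\frac{2\gamma\,e^{\frac12(\kappa+\ell+\gamma)(T-t)}}{(\kappa+\ell+\gamma)\left(e^{\gamma(T-t)}-1\right)+2\gamma}\right), \qquad B(t,T) = \frac{2\left(e^{\gamma(T-t)}-1\right)}{(\kappa+\ell+\gamma)\left(e^{\gamma(T-t)}-1\right)+2\gamma},$$

where $\gamma = \sqrt{(\kappa+\ell)^2 + 2\sigma^2}$.
This model has been the reference in the industry for many years, and still now, many models
are multidimensional extensions of the basic CIR model, as reviewed in Section 12.5.3.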
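
The CIR closed-form solution is straightforward to evaluate. The sketch below codes the two functions entering the exponential-affine bond price, together with the condition under which the zero boundary cannot be reached by the short-term rate. The argument `ell` stands for the constant in the risk-premium function and, like the function names and the numerical values, is an assumption made for illustration.

```python
import math

def cir_AB(tau, kappa, r_bar, sigma, ell=0.0):
    """Functions A and B of the CIR bond price exp(A - B*r), by time-to-maturity tau."""
    k = kappa + ell                          # risk-adjusted mean-reversion speed
    gamma = math.sqrt(k**2 + 2.0 * sigma**2)
    e = math.expm1(gamma * tau)              # exp(gamma * tau) - 1
    denom = (k + gamma) * e + 2.0 * gamma
    B = 2.0 * e / denom
    A = (2.0 * kappa * r_bar / sigma**2) * (
        math.log(2.0 * gamma) + 0.5 * (k + gamma) * tau - math.log(denom))
    return A, B

def cir_price(r, tau, kappa, r_bar, sigma, ell=0.0):
    """Zero-coupon bond price for a short-term rate r >= 0."""
    A, B = cir_AB(tau, kappa, r_bar, sigma, ell)
    return math.exp(A - B * r)

def zero_is_unattainable(kappa, r_bar, sigma):
    """True when kappa * r_bar exceeds half the squared volatility parameter."""
    return kappa * r_bar > 0.5 * sigma**2

# Hypothetical parameter values.
p5 = cir_price(0.03, 5.0, kappa=0.3, r_bar=0.05, sigma=0.1, ell=-0.05)
feller_ok = zero_is_unattainable(kappa=0.3, r_bar=0.05, sigma=0.1)
```

The logarithm in `cir_AB` is split into a difference of terms to avoid overflow of the exponential at long maturities; this is a numerical convenience, not a change to the formula.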
12 This is only intuition. The exact condition under which the zero boundary is unattainable by $r$
is $\kappa\bar r > \frac12\sigma^2$. See Karlin and Taylor (1981, vol II chapter 15) for a general analysis of
attainability of boundaries for scalar diffusion processes.
13 The expected value of linear mean-reverting processes is always as in Vasicek, independently of the
functional form of the diffusion coefficient. This property follows by a direct application of a general
result for diffusion processes given in Chapter 6 (Appendix A).


12.4.4.4 Nonlinear drifts

Models that are analytically tractable are certainly quite valuable. Vasicek and CIR models
do lead to closed-form solutions, because they have a linear drift, among other things. Is the
empirical evidence consistent with linear mean-reversion of the short-term rate? This issue
is controversial. In the mid 1990s, three papers by Aït-Sahalia (1996), Conley et al. (1997)
and Stanton (1997) produce evidence of nonlinear mean-reverting behavior. For example,
Aït-Sahalia (1996) estimates a drift function that has the following form:

$$b(r) = \alpha_0 + \alpha_1 r + \alpha_2 r^2 + \alpha_3 r^{-1}, \qquad (12.40)$$

corresponding to a nonlinear diffusion function. Figure 12.3 reproduces this function using the
parameter values in his Table 4, and relating to the sample period from 1983 to 1995. Similar
results are reported in the other papers. To illustrate the action the short-term rate dynamics
are under, Figure 12.3 also depicts a linear drift, obtained with the parameter estimates of
Aït-Sahalia (1996) (Table 4), and relating to a model with a CEV diffusion.

FIGURE 12.3. Nonlinear mean reversion? (Vertical axis: drift; horizontal axis: short-term rate $r$.)
The solid line is the drift function in Eq. (12.40), estimated by Aït-Sahalia (1996), and relating
to a parametric model with a nonlinear diffusion function. The dashed line is the estimated
linear drift relating to a model with CEV diffusion.

The nonlinear drifts in Figure 12.3 might lead bond prices to exhibit unusual properties,
though. As explained in Chapter 7 (Appendix 5), bond prices are concave in the short-term
rate if the risk-neutralized drift function is sufficiently convex (Mele, 2003). While the results
in Figure 12.3 relate to the physical drift functions, the point is nevertheless important as
risk-premiums should look quite unusual to destroy the nonlinearities of the short-term rate
under the physical probability.

The compelling lesson from Figure 12.3 is that, under the nonlinear drift dynamics, the short-term rate behaves in a way that is roughly comparable to its behavior under the linear drift dynamics. However, the behavior at the extremes is dramatically different. As the short-term rate moves to the extremes, it is pulled back to the center very abruptly. At the moment, it is not clear whether these preliminary empirical results are reliable. New econometric techniques are currently being developed to address this and related issues.
One possibility is that such single factor models of the short-term rate are simply misspecified. For example, there is strong empirical evidence that the volatility of the short-term rate is time-varying, as we shall discuss in the next section. Moreover, the term-structure implications of a single factor model are counterfactual, since we know that a single factor cannot explain the entire variation of the yield curve, as explained in Section 12.3.5. We now describe more realistic models driven by more than one factor.
12.4.5 The Monetary Experiment and interest rate volatility
One of the motivations underlying the early adoption of the CIR model in the industry was the model's prediction that interest rate volatility increases with the level of interest rates. Is this a robust empirical feature? It is a difficult topic, with the answer relying on particular historical episodes. Certainly the episodes of FED QE after 2010, when interest rates were extremely low, were accompanied by a suppressed volatility. And, during the Monetary Experiment of the Federal Reserve, which occurred between October 1979 and October 1982, both interest rates and interest rate volatility were high.
Figure 12.4 depicts the time series behavior of the nominal short-term rate, as measured by the three month TB rate, as well as the volatility of its changes, calculated as

$$\mathrm{Vol}_t \equiv \sqrt{6\pi}\cdot\frac{1}{12}\sum_{i=1}^{12}\left|r_{t+1-i}-r_{t-i}\right|,$$

where $r_t$ is the short-term rate as of month $t$.14 Figure 12.5 plots a scatterplot of the short-term rate basis point volatility, $\mathrm{Vol}_t$, against $r_t$, for two sampling periods.
For later use, define percentage volatility as

$$\mathrm{Vol}^{\%}_t \equiv \sqrt{6\pi}\cdot\frac{1}{12}\sum_{i=1}^{12}\left|\ln\frac{r_{t+1-i}}{r_{t-i}}\right|.$$

Interest rates are expressed in percentage.
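The rolling volatility measures just defined can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scaling constant is taken to be $\sqrt{6\pi}$, which annualizes the average absolute monthly change under normality (since $E|x| = \sigma\sqrt{2/\pi}$ for a zero-mean Gaussian $x$, and $\sqrt{12}\sqrt{\pi/2} = \sqrt{6\pi}$), and the function names are hypothetical.

```python
import math


def rolling_bp_vol(rates, window=12):
    """sqrt(6*pi) times the average absolute change over the last `window` months."""
    changes = [abs(rates[k] - rates[k - 1]) for k in range(1, len(rates))]
    scale = math.sqrt(6.0 * math.pi)
    return [scale * sum(changes[k - window:k]) / window
            for k in range(window, len(changes) + 1)]


def rolling_pct_vol(rates, window=12):
    """Same construction applied to absolute log-changes (rates must be > 0)."""
    changes = [abs(math.log(rates[k] / rates[k - 1])) for k in range(1, len(rates))]
    scale = math.sqrt(6.0 * math.pi)
    return [scale * sum(changes[k - window:k]) / window
            for k in range(window, len(changes) + 1)]
```

A series of `n` monthly rates yields `n - window` volatility observations, the first one being available once `window` monthly changes have been observed.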
One clear property is that interest rate volatility appears to be countercyclical, spiking as it
does in most of the NBER recession episodes. A natural explanation is that the FED is more
aggressive decreasing interest rates in bad times than increasing them in good.
We previously discussed the level effect, defined as the statistical relation arising when the volatility of the short-term rate increases in the level of the short-term rate. One possible explanation of these episodes is that when liquidity is erratic, interest rates are high, reflecting a liquidity risk-premium. But precisely because of erratic liquidity, interest rates are also very volatile in such periods. A simple statistical model that could help deal with these facts is one in which the short-term rate has stochastic volatility, as follows:

$$\begin{aligned} dr_t &= \beta_r(\mu_r - r_t)\,dt + v_t r_t^{\eta}\,dW_{1t} \\ dv_t &= \beta_v(\mu_v - v_t)\,dt + \xi v_t^{\vartheta}\left(\rho\,dW_{1t} + \sqrt{1-\rho^2}\,dW_{2t}\right) \end{aligned} \qquad (12.41)$$

where $W_{it}$ are standard Brownian motions under the physical probability, $v_t$ is a factor affecting interest rate volatility, and the remaining notation refers to model parameters. In particular, $\rho$, with $|\rho| < 1$, is the instantaneous correlation between $r_t$ and $v_t$. This model generalizes the one-factor models in the previous section, in which the yield curve is only driven by the short-term rate. If $\eta > 0$, the instantaneous rate volatility increases with the level of the interest rate. If the correlation coefficient $\rho \neq 0$, interest rate volatility is also partly related to sources of volatility not directly affected by the level of the interest rate.
14 This calculation follows that implemented to measure aggregate equity volatility in Section 7.2 of Chapter 7.
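A two-equation stochastic volatility model of this kind can be explored by simulation. The sketch below applies a plain Euler discretization to a system with a mean-reverting short-term rate whose diffusion is $v\,r^{\eta}$ and a mean-reverting volatility factor with diffusion $\xi v^{\vartheta}$, correlated with the rate through $\rho$. The exact functional form, the parameter names, and the truncation at zero used to keep powers well defined are assumptions made for illustration, not the author's implementation.

```python
import math
import random


def simulate_sv_short_rate(r0, v0, beta_r, mu_r, eta, beta_v, mu_v, xi, theta,
                           rho, dt, n_steps, seed=0):
    """Euler scheme for a short rate r with stochastic volatility factor v.

    dr = beta_r*(mu_r - r)*dt + v * r**eta * dW1
    dv = beta_v*(mu_v - v)*dt + xi * v**theta * (rho*dW1 + sqrt(1-rho^2)*dW2)
    """
    rng = random.Random(seed)
    r, v = r0, v0
    path = [(r, v)]
    for _ in range(n_steps):
        dw1 = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        dw2 = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Truncate at zero inside the diffusion terms only (a common ad hoc fix).
        r_pos, v_pos = max(r, 0.0), max(v, 0.0)
        r_next = r + beta_r * (mu_r - r) * dt + v_pos * r_pos ** eta * dw1
        v_next = (v + beta_v * (mu_v - v) * dt
                  + xi * v_pos ** theta * (rho * dw1 + math.sqrt(1.0 - rho ** 2) * dw2))
        r, v = r_next, v_next
        path.append((r, v))
    return path
```

Setting `xi = 0` and `beta_v = 0` freezes the volatility factor at `v0`, which recovers a one-factor model with constant volatility parameter and elasticity `eta`.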
Yet the empirical evidence underlying Figure 12.5 does not lend much support to the level effect. Rather, interest rate volatility seems to be quite flat against the level of rates. The exception, of course, occurred over the Monetary Experiment, when the FED target was money supply, rather than interest rates. Over this period, high volatility of money demand mechanically translated to high interest rate volatility through market clearing. Additionally, the monetary base over this period was kept deliberately low in an attempt to fight inflation. Hence, both interest rate volatility and interest rates were very high. One additional reason for the high nominal rates at the time might link to a compensation for high inflation volatility, not only high inflation.

FIGURE 12.4. This picture depicts the time-series behavior of the 3 month TB rate
(top panel, in percent) and the rolling, basis point volatility of the 3 month TB rate
changes, Vol (bottom panel), over the sampling period from 1957:01 through 2008:12.
BP volatility is expressed in percentage terms, such that a value of one in the graph is
the same as 100 basis points.


[Figure 12.5: two scatterplots of BP volatility against rate levels. Top panel: sample 1957:01 to 2008:12. Bottom panel: sample 1990:01 to 2008:12.]

FIGURE 12.5. This picture plots the basis point volatility of the short-term rate changes, Vol (on the vertical axis), against the level of the short-term rate (on the horizontal axis), for the sampling period spanning 1957:01 through 2008:12 (top panel), and for a more recent sample spanning 1990:01 through 2008:12 (bottom panel).

[Figure 12.6: two scatterplots of log volatility against rate levels. Top panel: sample 1957:01 to 2008:12. Bottom panel: sample 1990:01 to 2008:12.]

FIGURE 12.6. This picture plots the percentage volatility of the short-term rate Vol% (on the vertical axis), against the level of the short-term rate (on the horizontal axis), for the sampling period spanning 1957:01 through 2008:12 (top panel), and for a more recent sample spanning 1990:01 through 2008:12 (bottom panel).


Figure 12.5 is also suggestive of a change in regime that possibly occurred in the more recent past. From 1990 on, interest rate volatility does not appear to link positively to rate levels, and there is some evidence of the opposite. Figure 12.6 suggests that percentage volatility could actually be inversely related to the short-term rate, over the more recent sample period too.
All in all, evidence is mixed regarding the relation between the level of interest rates and interest rate volatility. The interesting property of the model in (12.41) is that it allows interest rate volatility to fluctuate, driven by its own source of risk, $W_2$. In the next section, we review models of the yield curve with stochastic volatility in a more systematic fashion. We end the ongoing section with a succinct account of how to model the yield curve in the presence of jumps.
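12.4.6 Short-term rates as jump-diffusion processes
Ahn and Thompson (1988) extend the CIR model to one where the short-term rate is a jump-diffusion process. In general, suppose that the short-term rate is a jump-diffusion process:

$$dr(\tau) = b^J(r(\tau))\,d\tau + a(r(\tau))\,d\tilde W(\tau) + \ell(r(\tau))\cdot\mathcal{S}\cdot dZ(\tau)$$

where $\tilde W$ (a Brownian motion) and $Z$ (a Poisson process with intensity $v^{\mathbb{Q}}$) are under the risk-neutral probability, $\mathcal{S}$ is the random jump size, with distribution $p$, and $b^J$ is, then, a jump-adjusted risk-neutral drift. The bond price $P(r,\tau,T)$ is solution to,

$$0 = \left(\frac{\partial}{\partial\tau} + L - r\right)P(r,\tau,T) + v^{\mathbb{Q}}\int_{\mathrm{supp}(\mathcal{S})}\left[P(r+\ell\mathcal{S},\tau,T) - P(r,\tau,T)\right]p(d\mathcal{S}) \qquad (12.42)$$

with the usual boundary condition, $P(r,T,T) = 1$, and where $LP = b^J P_r + \tfrac{1}{2}a^2 P_{rr}$. Eq. (12.42) follows because, as usual, the discounted value of a zero coupon bond is a martingale under the risk-neutral probability. This model can also be extended to one where there are different qualities, or types, of jumps, in which case Eq. (12.42) is:

$$0 = \left(\frac{\partial}{\partial\tau} + L - r\right)P(r,\tau,T) + \sum_{j=1}^{N} v_j^{\mathbb{Q}}\int_{\mathrm{supp}(\mathcal{S})}\left[P(r+\ell\mathcal{S},\tau,T) - P(r,\tau,T)\right]p_j(d\mathcal{S})$$

where $N$ is the number of jump types. However, to simplify the exposition, we just set $N = 1$.
To identify risk-premiums related to jumps, we simply note that $v^{\mathbb{Q}} = v\cdot\lambda^J$, where $v$ is the intensity of the short-term rate jump under the physical distribution, and $\lambda^J$ is the risk-premium demanded by agents to be compensated for the presence of jumps.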
Next, consider a defaultable bond. Assume the event of default is a Poisson process with intensity $v$, and that in the event of default at $\hat\tau$, the bondholder receives a recovery payment $\bar P(r)$, which could be deterministic or dependent on the short-term rate.15 Let $\hat\tau$ be the random time of default, and define a state variable $g$ with the following features:

$$g = \begin{cases} 0 & \text{if } \hat\tau > \tau \\ 1 & \text{otherwise} \end{cases}$$

such that, under the risk-neutral probability,

$$\begin{aligned} dr &= b(r)\,d\tau + a(r)\,d\tilde W \\ dg &= \mathcal{S}\cdot dN, \quad \text{where } \mathcal{S}\equiv 1, \text{ with probability one} \end{aligned} \qquad (12.43)$$

and $N$ denotes the Poisson process driving default.
15 Chapter 13 contains an account of this approach to modeling defaultable bonds, known as the reduced-form approach. This approach is distinct from the structural approach, in which the default event is modeled with reference to the books of the issuer. The derivations in this section are based on the partial differential equations in Mele (2003).

Denote the price of a zero coupon bond with (


),
[ ], and assume that for all
0
( ; )
.
and
0, ( 1
) = ( )
( 0
) and that ( ; 0 )
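These assumptions guarantee that default-free bond prices are higher than defaultable bond prices, as shown below. In the absence of arbitrage, the pre-default bond price $P(r,0,\tau,T) = P^{\mathrm{pre}}(r,\tau,T)$ satisfies:

$$\begin{aligned} 0 &= \left(\frac{\partial}{\partial\tau} + L - r\right)P(r,0,\tau,T) + v(r)\left[P(r,1,\tau,T) - P(r,0,\tau,T)\right] \\ &= \left(\frac{\partial}{\partial\tau} + L - (r + v(r))\right)P(r,0,\tau,T) + v(r)\bar P(r) \end{aligned} \qquad (12.44)$$

for all $\tau\in[t,T)$, and the boundary condition $P(r,0,T,T) = 1$, where $L$ is the infinitesimal generator of the short-term rate diffusion in the first equation of system (12.43). The solution is, formally:

$$P^{\mathrm{pre}}(r,t,T) = E\left[e^{-\int_t^T (r(\tau)+v(r(\tau)))\,d\tau}\right] + E\left[\int_t^T e^{-\int_t^{\tau} (r(u)+v(r(u)))\,du}\,v(r(\tau))\,\bar P(r(\tau))\,d\tau\right] \qquad (12.45)$$

where $E[\cdot]$ is the expectation taken with respect to only the first equation of system (12.43).
Duffie and Singleton (1999, Eq. (10) p. 696) provide a slightly different evaluation formula than Eq. (12.45), defining a percentage loss process $l\in[0,1]$: $\bar P = (1-l)P^{\mathrm{pre}}$, which inserted into Eq. (12.44) leaves a partial differential equation, the solution of which is, by Feynman-Kac,

$$P^{\mathrm{pre}}(r,t,T) = E\left[e^{-\int_t^T (r(\tau) + l(\tau)v(r(\tau)))\,d\tau}\right]$$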
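A special case helps to see what a loss-adjusted discounting formula of this kind delivers. The sketch below assumes, purely for illustration, that the default intensity and the percentage loss are constants and that the short-term rate follows risk-neutral Vasicek dynamics with long-run mean `theta_q` (a hypothetical parameter name). Under these assumptions, discounting at "short rate plus loss times intensity" multiplies the default-free Vasicek price by $e^{-l\,v\,(T-t)}$, so that the yield spread equals $l\,v$ at every maturity.

```python
import math


def vasicek_price(r, tau, kappa, theta_q, sigma):
    """Default-free zero price under risk-neutral dr = kappa*(theta_q - r)dt + sigma dW."""
    b = (1.0 - math.exp(-kappa * tau)) / kappa
    ln_a = ((b - tau) * (theta_q - sigma ** 2 / (2.0 * kappa ** 2))
            - sigma ** 2 * b ** 2 / (4.0 * kappa))
    return math.exp(ln_a - b * r)


def predefault_price(r, tau, kappa, theta_q, sigma, intensity, loss):
    """Pre-default price with constant intensity and constant fractional loss."""
    return vasicek_price(r, tau, kappa, theta_q, sigma) * math.exp(-loss * intensity * tau)
```

For instance, with an intensity of 2% per year and a 60% loss given default, the defaultable yield exceeds the default-free yield by 1.2% in this simplified setting.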
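Finally, $P^{\mathrm{pre}}$ is decreasing in the default intensity $v$, in the following sense. Consider two markets $A$ and $B$ where the default intensities are $v^A$ and $v^B$, and assume that the coefficients of $r$ are independent of $v$. The pre-default bond price function in market $i$ is $P^i(r,\tau,T)$, $i = A, B$, and satisfies:

$$0 = \left(\frac{\partial}{\partial\tau} + L - (r + v^i)\right)P^i + v^i\bar P^i, \quad i = A, B$$

with the usual boundary condition. Assuming that $\bar P^A = \bar P^B \equiv \bar P$, subtracting these two equations, and rearranging terms, shows that the price difference $\Delta P(r,\tau,T) \equiv P^A(r,\tau,T) - P^B(r,\tau,T)$ satisfies,

$$0 = \left(\frac{\partial}{\partial\tau} + L - (r + v^A)\right)\Delta P(r,\tau,T) - (v^A - v^B)\left[P^B(r,\tau,T) - \bar P(r)\right]$$

with boundary condition $\Delta P(r,T,T) = 0$ for all $r$. Because, clearly, $P^B > \bar P$, we have that $\Delta P(r,\tau,T) < 0$ whenever $v^A > v^B$, by an application of the maximum principle reviewed in Appendix 3 of Chapter 7.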

12.5 Multifactor models of the short-term rate


The empirical evidence reviewed in Section 12.3.4 suggests that one-factor models cannot explain the entire variation of the term-structure of interest rates. We need at least three factors.
This section contains a succinct review regarding the standard approaches that deal with this
multi-dimensionality.

12.5.1 Stochastic volatility
12.5.1.1 Convexity

How does volatility affect the yield curve? Consider the following two-period example. In the first period, the short-term rate is $r$ and in the second, it is either $\tilde r = r + d$ or $\tilde r = r - d$ with equal probability, where $d > 0$. The price of a two-period bond is $P(r,d) = m(r,d)/(1+r)$, where $m(r,d) = E\left(1/(1+\tilde r)\right)$ is the discount factor expected to prevail at next period. By Jensen's inequality, $m(r,d) > 1/(1+E(\tilde r)) = 1/(1+r) = m(r,0)$. That is, two-period bond prices increase upon activation of randomness. More generally, two-period bond prices are always increasing in the volatility parameter $d$ in this example, as illustrated by Figure 12.7.
The intuition underlying Figure 12.7 is standard: the bond price is decreasing and convex in the short-term rate, such that the price is increasing in the interest rate volatility, since the price drop in bad times (i.e. when the interest rate increases) is less than the price increase in good times.16
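The two-period example can be checked numerically. The following minimal sketch, with hypothetical function names, computes the expected discount factor when next period's rate is $r + d$ or $r - d$ with equal probability, and the resulting two-period bond price.

```python
def expected_discount_factor(r, d):
    """m(r, d) = E[1 / (1 + r_next)], r_next = r + d or r - d with probability 1/2."""
    return 0.5 * (1.0 / (1.0 + r + d) + 1.0 / (1.0 + r - d))


def two_period_price(r, d):
    """Two-period zero price: expected next-period discount factor, discounted at r."""
    return expected_discount_factor(r, d) / (1.0 + r)


if __name__ == "__main__":
    for d in (0.0, 0.01, 0.02, 0.04):
        print(d, two_period_price(0.05, d))
```

Algebraically, $m(r,d) = (1+r)/((1+r)^2 - d^2)$, which is increasing in $d$ for $0 \le d < 1 + r$; the printed prices rise accordingly as `d` grows.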
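[Figure 12.7: the discount factor plotted against the next-period short-term rate, evaluated at the rates r-d', r-d, r+d and r+d', with the points a, B and A marked on the vertical axis, together with the midpoints m(r,d') = (a+A)/2 and m(r,d) = (b+B)/2.]

FIGURE 12.7. If the risk-neutralized interest rate of the next period is either $\tilde r = r + d$ or $\tilde r = r - d$ with equal probability, the discount factor $1/(1+\tilde r)$ is either $B$ or $b$ with equal probability. Hence $m(r,d) = E[1/(1+\tilde r)]$ is the midpoint of $bB$. Similarly, if volatility is $d' > d$, $m(r,d')$ is the midpoint of $aA$. Since $\overline{ab} > \overline{BA}$, it follows that $m(r,d') > m(r,d)$, such that the two-period bond price $P(r,d) = m(r,d)/(1+r)$ satisfies: $P(r,d') > P(r,d)$ for $d' > d$.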

These properties are due to the assumption that the expected short-term rate is independent of $d$. They may well break down in alternative settings. For example, consider a market in which an upward rate movement is more likely than a downward one. As a second example, consider a multiplicative setting, in which either $\tilde r = r(1+d)$ or $\tilde r = r/(1+d)$ with equal probability.
16 This property relates to the theory of mean-preserving spreads and convex payoffs explained in Chapters 7 and 10. Let $\tilde m_d(\tilde r) = 1/(1+\tilde r)$ denote the random discount factor, such that $\tilde r \mapsto -\tilde m_d(\tilde r)$ is increasing and concave and, hence, $d' < d'' \Rightarrow E(\tilde m_{d''}(\tilde r)) > E(\tilde m_{d'}(\tilde r))$, just as Figure 12.7 illustrates.

It can be shown that in these two examples, bond prices are decreasing in volatility for short maturities, and increasing for longer ones, a property originally illustrated by Litterman, Scheinkman and Weiss (1991). Below, it is argued that, due to risk-aversion, changes in the expected short-term rate may well depend on the volatility parameter, $d$. Then, at short maturities, risk-aversion dominates the convexity effects in Figure 12.7, whereas convexities dominate over longer maturities. We now build on this intuition and explain the relation between interest rate volatility and the yield curve.
12.5.1.2 Two-factor models

In the CIR model, the instantaneous volatility of the short-term rate is stochastic, depending as it does on the level of $r$, which is obviously stochastic. However, empirical evidence suggests that the short-term rate volatility depends on some additional factors, as discussed in Section 12.4. A natural extension of the CIR model is one where the instantaneous volatility of the short-term rate depends on (i) the level of the short-term rate, similarly as in the CIR model, and (ii) some additional random component. This additional component is what we refer to as the stochastic volatility of the short-term rate. It is the term-structure counterpart to the stochastic volatility extension of the Black and Scholes (1973) model (see Chapter 10).
Fong and Vasicek (1991) develop the first model in which the volatility of the short-term rate is stochastic. They assume that the short-term rate $r$ is solution to

$$\begin{aligned} dr(\tau) &= \kappa_r(\bar r - r(\tau))\,d\tau + \sqrt{v(\tau)}\,r(\tau)^{\gamma}\,dW_1(\tau) \\ dv(\tau) &= \kappa_v(\theta - v(\tau))\,d\tau + \xi_v\sqrt{v(\tau)}\,dW_2(\tau) \end{aligned} \qquad (12.46)$$

where $\kappa_r$, $\bar r$, $\kappa_v$, $\theta$ and $\xi_v$ are constants, and $[W_1\ W_2]$ is a vector Brownian motion. To obtain a closed-form solution, Fong and Vasicek set $\gamma = 0$. The authors also make assumptions about risk aversion corrections. Namely, they assume that the unit-risk-premia for the stochastic fluctuations of the short-term rate, $\lambda_r$, and the short-term rate volatility, $\lambda_v$, are both proportional to $\sqrt{v(\tau)}$, and then they find a closed-form solution for the bond price as of time $t$ and maturing at time $T$, $P(r(t), v(t), T-t)$.
Longstaff and Schwartz (1992) propose another model of the short-term rate (interpreted in general equilibrium), in which the volatility of the short-term rate is stochastic. Naturally, the Longstaff and Schwartz model predicts, just as the Fong-Vasicek model does, that the bond price is a function of both the short-term rate and its instantaneous volatility.
Note, then, the important feature of these models. The yield curve is now a function $R(r(t), v(t), T-t)$, depending on the level of the short-term rate, $r$, and one additional factor, the instantaneous variance of the short-term rate, $v$. Therefore, these models predict that we now have two factors that help explain the whole term-structure of interest rates.
12.5.1.3 Volatility and the yield curve

How does the factor $v$ affect the yield curve? Consider the basic Vasicek (1977) model. Naturally, this model assumes that volatility is constant, yet it could be used to develop intuition on Eqs. (12.46) and possibly other stochastic volatility models. It is possible to show that Eq. (12.36) implies that

$$\frac{\partial R(r,T-t)}{\partial\sigma} = -\frac{1}{T-t}\left[\sigma\int_t^T B(T-s)^2\,ds + \lambda\int_t^T B(T-s)\,ds\right] \qquad (12.47)$$

where $B(T-s)$ is as in Section 12.4.4.1. Eq. (12.47) shows that if $\lambda \geq 0$, the whole term-structure is decreasing in $\sigma$, the short-term rate volatility. That is, bond prices increase in $\sigma$, a conclusion that parallels that for options, where option prices are increasing in the volatility of the asset price. As explained in Chapter 10, this property arises through the optionality of the contract, say the convexity of a European call price with respect to the asset price.
But the interesting properties arise in the empirically relevant case, $\lambda < 0$.17 In this case, the sign of $\partial R(r,T-t)/\partial\sigma$ depends on both convexity and slope effects. Convexity effects, those relating to the second partial $\partial^2 P(r,T-t)/\partial r^2 = P(r,T-t)B^2(T-t)$, arise through the term $\sigma\int_t^T B(T-s)^2\,ds$. Slope effects, those relating to $\partial P(r,T-t)/\partial r = -P(r,T-t)B(T-t)$, arise, instead, through the term $\lambda\int_t^T B(T-s)\,ds$. If $\lambda$ is negative, and sufficiently large in absolute value, slope effects dominate convexity effects, and the term-structure can actually increase in $\sigma$. For intermediate values of $\lambda$, the term-structure can be both increasing and decreasing in $\sigma$. At short maturities, the convexity effects in Eq. (12.47) are typically dominated by slope effects, and the short-end of the term-structure can be increasing in $\sigma$. At longer maturity dates, however, convexity effects are more important and, sometimes, dominate slope effects.
More generally, changes in interest rate volatility are not mean-preserving spreads for the risk-neutral distribution, as Eq. (12.47) illustrates for the Vasicek model. In a world with complete markets, say Black-Scholes, the asset underlying the derivative contracts is traded. In the case under study, the short-term rate is not a traded risk. Therefore, its risk-neutral drift depends on volatility through risk-adjustments: to illustrate, in the Vasicek example, this dependence arises through the risk-premium parameter, $\lambda$.
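The interplay between convexity and slope effects can be illustrated numerically in the Vasicek model. The sketch below assumes a risk-neutral drift of the form $\kappa(\bar r - r) - \lambda\sigma$, with $B(\tau) = (1 - e^{-\kappa\tau})/\kappa$, so that the yield is $R = \tau^{-1}\left[B r + (\kappa\bar r - \lambda\sigma)\int_0^{\tau}B\,du - \tfrac{1}{2}\sigma^2\int_0^{\tau}B^2\,du\right]$ and its derivative with respect to $\sigma$ is $-\tau^{-1}\left[\sigma\int B^2 + \lambda\int B\right]$. The parameterization and function names are assumptions made for illustration.

```python
import math


def b_integrals(tau, kappa):
    """Closed forms for the integrals of B(u) and B(u)^2 over [0, tau]."""
    b = (1.0 - math.exp(-kappa * tau)) / kappa
    i1 = (tau - b) / kappa
    i2 = (tau - b) / kappa ** 2 - b ** 2 / (2.0 * kappa)
    return b, i1, i2


def vasicek_yield(r, tau, kappa, rbar, sigma, lam):
    """Yield when the risk-neutral drift is kappa*(rbar - r) - lam*sigma."""
    b, i1, i2 = b_integrals(tau, kappa)
    return (b * r + (kappa * rbar - lam * sigma) * i1 - 0.5 * sigma ** 2 * i2) / tau


def dyield_dsigma(tau, kappa, sigma, lam):
    """Sensitivity of the yield to sigma: convexity term plus slope term."""
    _, i1, i2 = b_integrals(tau, kappa)
    return -(sigma * i2 + lam * i1) / tau


if __name__ == "__main__":
    for tau in (1.0, 5.0, 10.0, 30.0):
        print(tau, dyield_dsigma(tau, kappa=0.2, sigma=0.02, lam=-0.05))
```

With these hypothetical values, the sensitivity is positive at short maturities, where the slope term dominates, and negative at long maturities, where convexity takes over. With a non-negative `lam` the sensitivity is negative at all maturities.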
While Eq. (12.47) relies on a model with constant volatility, the reasoning underlying its interpretation holds even when volatility is random.18 In particular, suppose that the risk-premium required to bear the interest rate risk is negative and sufficiently large in absolute value. In this case, slope effects may dominate convexity effects at any maturity date, such that the whole yield curve could be increasing in volatility. Let us provide some intuition. If risk-premiums are sufficiently sensitive to volatility, we would expect that in good times, when volatility is small, convexity effects dominate and the yield curve lowers as volatility increases. In bad times, when volatility is high, we would expect that risk-premium effects dominate, such that the yield curve shifts up following an increase in volatility.
To illustrate, consider the Vasicek model again, and assume that the risk-premium is = 3 ,
for some constant . This functional form of the risk-premium ensures that the risk-premium
is quite small when is small, although then it substantially increases in bad times, i.e. when
gets larger and larger. With this risk-premium, Eq. (12.47) is:
Z

Z
1
(
)
2
2

=
(
) +
(
)
(12.48)
That is, risk-premium effects become more and more relevant as $\sigma$ increases. The previous equation reveals that we may also define a threshold value for $\sigma$ such that convexity effects are exactly offset by risk-premium effects. Eq. (12.48) shows that for each time to maturity $T-t$, there exists a value of $\sigma$ depending on $T-t$, say $\hat\sigma(T-t)$, such that the partial $\partial R(r,T-t)/\partial\sigma = 0$.

17 In this simple model, the assumption that $\lambda < 0$ is reasonable, as we observe positive risk-premia more often than negative risk-premia. In this same model, $\partial P/\partial r < 0$, which together with $\lambda < 0$, ensures that term-premia are positive.
18 See Mele (2003).

We might go on and define an average value of $\hat\sigma(T-t)$, say $\hat\sigma \equiv \mathcal{T}^{-1}\int_0^{\mathcal{T}}\hat\sigma(\tau)\,d\tau$, where $\mathcal{T}$ denotes the highest time-to-maturity we want to consider. This threshold value, $\hat\sigma$, is the one that might lead to a definition of what good or bad times can be, in terms of the term-structure implications of a volatility shock.
How do we interpret these properties in light of the factor dynamics reviewed in Section 12.3? Clearly, the very short-end of the yield curve is not affected by movements in volatility, as $\lim_{T\to t}R(r,v,T-t) = r$, for all $v$. Moreover, these models predict that $\lim_{T\to\infty}R(r,v,T-t) = \bar R$, where $\bar R$ is a constant and, hence, independent of $v$. Therefore, movements in the short-term volatility can only affect the middle portion of the yield curve. For example, if the risk-premium required to bear the interest rate risk is negative and sufficiently large in absolute value, an upward movement in $v$ can produce an effect on the yield curve qualitatively similar to that depicted in Figure 12.2 (Curvature panel), and would thus roughly mimic the curvature factor that we reviewed in Section 12.3.
12.5.2 Three-factor models
We need at least three factors to explain the entire variation of the yield curve. A natural extension of the model in Eqs. (12.46) is one in which the drift of the short-term rate contains some predictable component, $\bar r$, such that the yield curve is driven by the following three factor model:

$$\begin{aligned} dr(\tau) &= \kappa_r(\bar r(\tau) - r(\tau))\,d\tau + \sqrt{v(\tau)}\,r(\tau)^{\gamma}\,dW_1(\tau) \\ dv(\tau) &= \kappa_v(\theta - v(\tau))\,d\tau + \xi_v\sqrt{v(\tau)}\,dW_2(\tau) \\ d\bar r(\tau) &= \kappa_{\bar r}(\bar\imath - \bar r(\tau))\,d\tau + \xi_{\bar r}\sqrt{\bar r(\tau)}\,dW_3(\tau) \end{aligned} \qquad (12.49)$$

where $\kappa_{\bar r}$, $\bar\imath$ and $\xi_{\bar r}$ are constants, and $[W_1\ W_2\ W_3]$ is a vector Brownian motion.
Balduzzi et al. (1996) develop the first model in which the drift of the short-term rate changes stochastically, as in Eqs. (12.49). Dai and Singleton (2000) estimate a number of models that generalize that in Eqs. (12.49) (see Section 12.5.6 for details on the estimation strategy). The term-structure implications of these models can be understood very simply. First, under regularity conditions, the yield curve is now a function $R(r(t), v(t), \bar r(t), T-t)$. Second, and intuitively, changes in the new factor $\bar r$ should primarily affect the long-end of the yield curve. This is because empirically, the usual finding is that the short-term rate reverts relatively quickly to the long-term factor $\bar r$ (i.e., $\kappa_r$ is relatively large), whereas $\bar r$ mean-reverts slowly (i.e., $\kappa_{\bar r}$ is relatively low). This mechanism makes the short-term rate quite persistent anyway. Ultimately, then, the slow mean-reversion of $\bar r$ means that changes in $\bar r$ last for the relevant part of the term-structure we are usually interested in (i.e. up to 30 years), despite the fact that $\lim_{T\to\infty}R(r(t), v(t), \bar r(t), T-t)$ is independent of the three factors, $r$, $v$ and $\bar r$.
However, it is difficult to see how to reconcile such a behavior of the long-end of the yield curve with the existence of any of the factors discussed in Section 12.3. First, the short-term rate cannot be taken as a level factor, since we know its effects die off relatively quickly. Instead, a joint change in both the short-term rate, $r$, and the long-term rate, $\bar r$, would be needed to mimic the Level panel of Figure 12.2. However, this interpretation is at odds with the assumption that the factors underlying the exercises in Figure 12.2 are uncorrelated!19
Finally, to emphasize how exacerbated these puzzles are, consider the effects of changes in the short-term rate $r$. We know that the long-end of the term-structure is not affected by movements of the short-term rate. Hence, the short-term rate acts as a steepness factor, as in Figure 12.2 (Slope panel). However, this interpretation is restrictive, as factor analysis reveals that the short-end and the long-end of the yield curve move in opposite directions after a change in the steepness factor. Here, instead, a change in the short-term rate only modifies the short-end (and, perhaps, the middle) of the yield curve and, hence, does not produce any variation in the long-end of the curve.
19 Empirical results in Dai and Singleton reveal that, if anything, $r$ and $\bar r$ are negatively correlated.
12.5.3 Affine and quadratic term-structure models
12.5.3.1 Affine

The Vasicek and CIR models predict that the bond price is exponential-affine in the short-term rate $r$. This property is the expression of a general phenomenon. Indeed, it is possible to show that bond prices are exponential-affine in $r$ if, and only if, the functions $b$ and $a^2$ (the drift and the instantaneous variance of $r$) are affine in $r$. Models that satisfy these conditions are known as affine models. More generally, these basic results extend to multifactor models, where bond prices are exponential-affine in the state variables.20 In these models, the short-term rate is a function $r(y)$ such that

$$r(y) = r_0 + r_1\cdot y$$

where $r_0$ is a constant, $r_1$ is a vector, and $y$ is a multidimensional diffusion, in $\mathbb{R}^n$, and is solution to

$$dy(\tau) = \kappa(\mu - y(\tau))\,d\tau + \Sigma V(y(\tau))\,dW(\tau) \qquad (12.50)$$

where $W$ is an $n$-dimensional Brownian motion, $\Sigma$ is a full rank $n\times n$ matrix, and $V$ is a full rank diagonal matrix with elements,

$$V(y)_{(ii)} = \sqrt{\alpha_i + \beta_i^{\top}y}, \quad i = 1,\dots,n \qquad (12.51)$$

for some scalars $\alpha_i$ and vectors $\beta_i$. Langetieg (1980) develops the first multifactor model of this kind, in which $\beta_i = 0$.
Next, let $V^-(y)$ be a diagonal matrix with elements

$$V^-(y)_{(ii)} = \begin{cases} \dfrac{1}{V(y)_{(ii)}} & \text{if } \Pr\{V(y(\tau))_{(ii)} > 0 \text{ all } \tau\} = 1 \\ 0 & \text{otherwise} \end{cases}$$

and set,

$$\Lambda(y) = V(y)\lambda_1 + V^-(y)\lambda_2 y \qquad (12.52)$$

for some $n$-dimensional vector $\lambda_1$ and some $n\times n$ matrix $\lambda_2$. Duffie and Kan (1996) explained in a comprehensive way the benefit of this model. In their formulation $\lambda_2 = 0_{n\times n}$, and the bond price is exponential-affine in the state variables $y$. That is, the price of the zero has the following functional form,

$$P(y,\tau) = \exp\left(A(\tau) + B(\tau)\cdot y\right) \qquad (12.53)$$

for some functions $A$ and $B$ of time to maturity, $\tau$ ($B$ is vector-valued), such that $A(0) = 0$ and $B_{(i)}(0) = 0$.
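In the scalar case, the exponential-affine property can be verified numerically. The sketch below assumes a one-factor model in which the state is the short-term rate itself, with risk-neutral drift $\kappa(\mu - r)$ and instantaneous variance $\alpha + \beta r$. Substituting $P = \exp(A(\tau) + B(\tau)r)$ into the pricing equation gives the Riccati system $A' = \kappa\mu B + \tfrac{1}{2}\alpha B^2$, $B' = -\kappa B + \tfrac{1}{2}\beta B^2 - 1$, with $A(0) = B(0) = 0$, which is integrated here with a classical fourth-order Runge-Kutta scheme. The function names, the scalar restriction and the choice of integrator are assumptions made for illustration.

```python
import math


def riccati_rhs(b, kappa, mu, alpha, beta):
    """Right-hand sides (dA/dtau, dB/dtau) of the scalar affine Riccati system."""
    d_a = kappa * mu * b + 0.5 * alpha * b * b
    d_b = -kappa * b + 0.5 * beta * b * b - 1.0
    return d_a, d_b


def solve_affine(tau, kappa, mu, alpha, beta, n_steps=1000):
    """Integrate A and B from 0 to tau with RK4, starting at A(0) = B(0) = 0."""
    h = tau / n_steps
    a, b = 0.0, 0.0
    for _ in range(n_steps):
        k1a, k1b = riccati_rhs(b, kappa, mu, alpha, beta)
        k2a, k2b = riccati_rhs(b + 0.5 * h * k1b, kappa, mu, alpha, beta)
        k3a, k3b = riccati_rhs(b + 0.5 * h * k2b, kappa, mu, alpha, beta)
        k4a, k4b = riccati_rhs(b + h * k3b, kappa, mu, alpha, beta)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return a, b


def affine_bond_price(r, tau, kappa, mu, alpha, beta):
    """Zero price exp(A(tau) + B(tau) * r) in the scalar affine model."""
    a, b = solve_affine(tau, kappa, mu, alpha, beta)
    return math.exp(a + b * r)
```

Setting `beta = 0` and `alpha = sigma**2` gives the Vasicek case, while `alpha = 0` and `beta = sigma**2` gives the CIR case, so the numerical solution can be compared against the known closed forms.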

20 More generally, we say that affine models are those that make the characteristic function exponential-affine in the state variables. In the case of the multifactor interest rate models of the previous section, this condition is equivalent to the condition that bond prices are exponential-affine in the state variables.

The more general functional form for $\Lambda$ in Eq. (12.52) has been suggested by Duffee (2002). The rationale is the following. Duffee explains that in bond markets, risk-premiums seem to relate to both volatility and level of the fundamentals. In this model, risk-premiums reduce to $V(y)\Lambda(y) = V^2(y)\lambda_1 + \lambda_2 y$. Thus, the inclusion of the term $\lambda_2$ allows one to model the statistical relations linking risk-premiums to fundamentals. Interestingly, bond prices still have an exponential-affine form, just as in Eq. (12.53). When $\lambda_2 = 0_{n\times n}$, we say that the model is completely affine, and essentially affine otherwise. The clear advantage of these affine models, then, is that they considerably simplify statistical inference, as explained in Section 12.5.5 below.
Ang and Piazzesi (2003) and Hördahl, Tristani and Vestin (2006) (HTV, henceforth) introduce no-arbitrage regressions, to model the relations linking macroeconomic variables to the yield curve. In their models, the factors are taken to be a discrete-time version of Eq. (12.50), where some components of $y$ are observable, and others are unobservable. The observables relate to macroeconomic factors such as inflation or industrial production. The authors, then, study how all these factors affect the yield curve, predicted by a pricing equation such as that in Eq. (12.53). While HTV have a structural model of the macroeconomy, Ang and Piazzesi (2003) have a reduced-form model.
Reduced-form models can be exposed to the critique that some of the parameters are not variation-free. (Parameters are variation-free when the admissible values of one do not depend on the values taken by the others, so that the joint parameter space is the Cartesian product of the individual parameter spaces.) For example, in the simple Lucas economy of Part I, we know that the short-term rate is $r = \rho + \eta\mu - \tfrac{1}{2}\sigma^2\eta(1+\eta)$, so changing the risk-aversion parameter, $\eta$, should change the interest rate as a result. This simple example shows that the parameters related to risk-aversion correction in Eq. (12.52) are not free, in that tilting them has an effect on the parameters of the factor dynamics in Eq. (12.50). At the same time, reduced-form models offer a great deal of flexibility, as they do not restrict, so to speak, the model to track any particular market or economy, such as the Lucas economy. Moreover, we can always find a theoretical market supporting the no-arbitrage market underlying the reduced-form model. No-arbitrage regressions such as those in Ang and Piazzesi (2003) give the data the power to say which parameter constellations make the model likely to perform, without imposing theoretical restrictions which the data might, then, be likely to reject. For example, the Lucas model, while it clearly illustrates that some of the parameters are not variation-free, can be simply wrong, and might impose unreasonable restrictions on the data. For no-arbitrage models, instead, cross-equation restrictions arise through the weaker requirement of absence of arbitrage.
12.5.3.2 Quadratic

Affine models are known to impose tight conditions on the structure of the volatility of the state variables. These restrictions arise to keep the square root in Eq. (12.51) real valued. But these constraints may hinder the actual performance of the models. There exists another class of models, known as quadratic models, that partially overcome these difficulties.
12.5.4 Unspanned stochastic volatility
Are fixed income markets incomplete? Mele and Obayashi (2015) argue that fixed income volatility is quite distinct from equity volatility. Consequently, many of the investable products on the popular gauge of equity volatility, VIX (see Chapter 10), could only be poor surrogates for exposure to fixed income volatility. The uniqueness of fixed income volatility has actually been widely acknowledged in the literature. It has been well-known since at least Collin-Dufresne and Goldstein (2002) that fixed income market volatility does not appear to be priced based only on existing fixed income assets.
Simply, the authors showed that straddle returns on caps and floors (see Section 12.8) cannot be explained by changes in the term structure of swap rates, but by other factors. That is, existing fixed income assets (such as bonds) do not help hedge rate volatility. The models proposed by Collin-Dufresne and Goldstein to address these issues are known as leading to unspanned stochastic volatility (USV, in the sequel). The reason for this terminology is the following. Consider any of the stochastic volatility models reviewed in Chapter 10, in which a stock price $S_t$ is solution to

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma_t\,dW_t \qquad (12.54)$$

under the physical probability, for some constant $\mu$, a Brownian motion $W_t$, and a volatility process $(\sigma_t)_{t\geq 0}$ possibly driven by $W_t$ and other Brownian motions. One example of these models is the celebrated Heston (1993) model, in which $\sigma_t^2$ is a square root process.
As we know from Chapter 10, these models are typically understood to describe a situation of incomplete markets. Collin-Dufresne and Goldstein (2002) propose to extend this notion to fixed income markets, by modeling bond prices in an incomplete markets setup. The idea is simple. If bond markets have stochastic volatility and are still incomplete, bond prices should satisfy dynamics in which their instantaneous returns have stochastic volatility, similarly as in Eq. (12.54) for the equity case. At the same time, the exposure of bond prices to volatility should be zeroed, just as a stock price exposure to its own volatility is zero in the context of Chapter 10. In the context of the short-term rate models of this section, the latter condition is met when

$$\frac{\partial P(r,v,T-t)}{\partial v} = 0 \qquad (12.55)$$

Collin-Dufresne and Goldstein (2002) provide a characterization of the previous condition
in multidimensional models. It is still an open question how existing models with USV
compare to models without it. In a nutshell, restrictions such as that in Eq. (12.55) can prevent
standard multifactor models from performing as accurately as they could without restrictions.
However, USV and incompleteness of fixed income markets both seem to be quite robust and
well-accepted empirical features.
12.5.5 Topics regarding estimation and trading strategies
12.5.5.1 Univariate models

How to estimate the parameter vector = [


]> in Eqs. (12.41)? Maximum
Likelihood (ML) would be a feasible estimation method under two sets of conditions. First, the
model in Eqs. (12.41) should not have stochastic volatility at all, viz, = = 0; in this case,
the short-term rate would be solution to,
= (

where $\sigma$ is now a constant. Second, the value of the elasticity parameter $\gamma$ is important. If $\gamma = 0$,
the short-term rate process is the Gaussian one proposed by Vasicek (1977). If $\gamma = \frac{1}{2}$, we obtain
the square-root process of Cox, Ingersoll and Ross (1985). As we know, the transition density
of $r$ is Gaussian in the Vasicek market, and a noncentral chi-square in the CIR case. Therefore,
in both Vasicek and CIR markets, we may write down the likelihood function of the diffusion
process. Therefore, ML estimation is possible in these two cases. In more general cases, such
as those in the next section, one needs to go for simulation methods, such as those described in
Chapter 5. However, we could still estimate multifactor affine models through ML.
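As a minimal sketch of ML estimation in the Vasicek case, the code below simulates a Gaussian short-term rate from its exact transition density and recovers the parameters in closed form, since the likelihood is that of a Gaussian AR(1). The names `kappa`, `rbar` and `sigma`, the monthly sampling step and the sample size are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def simulate_vasicek(kappa, rbar, sigma, r0, dt, n, seed=0):
    """Draw n transitions from the exact Gaussian transition density."""
    rng = np.random.default_rng(seed)
    b = np.exp(-kappa * dt)                              # autoregressive coefficient
    sd = sigma * np.sqrt((1.0 - b**2) / (2.0 * kappa))   # conditional std deviation
    shocks = rng.standard_normal(n)
    r = np.empty(n + 1)
    r[0] = r0
    for i in range(n):
        r[i + 1] = rbar + (r[i] - rbar) * b + sd * shocks[i]
    return r

def vasicek_mle(r, dt):
    """Conditional ML: regress r[t+dt] on r[t], then map back to (kappa, rbar, sigma)."""
    x, y = r[:-1], r[1:]
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    a = y.mean() - b * x.mean()
    resid_var = np.mean((y - a - b * x) ** 2)
    kappa = -np.log(b) / dt
    rbar = a / (1.0 - b)
    sigma = np.sqrt(resid_var * 2.0 * kappa / (1.0 - b**2))
    return kappa, rbar, sigma

# Hypothetical "true" parameters used only to generate artificial data.
path = simulate_vasicek(kappa=0.2, rbar=0.05, sigma=0.03, r0=0.05, dt=1 / 12, n=20000)
kappa_hat, rbar_hat, sigma_hat = vasicek_mle(path, dt=1 / 12)
print(kappa_hat, rbar_hat, sigma_hat)
```

The volatility parameter is typically pinned down much more precisely than the speed of mean reversion, which requires a long time span of data.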
12.5.5.2 More general models

Estimating the model in Eqs. (12.41) is certainly instructive. Yet a more important question
is to examine the term-structure implications of this model. More generally, how would the
estimation procedure outlined in the previous subsection change if the task is to estimate a
Markov model of the term-structure of interest rates? There are three steps.
Step 1

Collect data on the term structure of interest rates. We will need to use data on three maturities,
say a time series of riskless 6 month, 5 year and 10 year yields.
Step 2

Let us consider the three-factor model in Eqs. (12.49) of Section 12.5.2, where the three Brownian motions
are now allowed to be correlated. The bond price predicted by this model
is:

(12.56)
(
)
(

)=E

is a sequence of expiration dates. Naturally, this price depends on the risk-aversion


where
corrections needed to turn the dynamics in Eqs. (12.49) into risk-neutral ones. As discussed,
one may impose analytically convenient conditions on the risk-adjustments, but we do not need
to be more precise at this juncture.
No matter the risk-adjustments, we have that they entail that Eq. (12.56) depends not only on
the physical parameter vector = [

]> , where is a vector containing

all the correlation coe cients of


, but also on these very same risk-adjustment parameter
vector, say . Precisely, the Radon-Nikodym
probability with
R derivative of

R the risk-neutral
respect to the physical probability is exp 12 k k2
Z , for some vector Brownian
m
motion Z, and
(
; ), for some vector-valued function m and some parameter
m
vector . The function
makes risk-adjustment corrections depend on the current value of the
state vector [
], which makes the model Markov, thereby simplifying statistical inference.
To summarize, the issue is now one where we need to estimate both the physical parameter
vector and the risk-adjustment parameter vector . Next, we consider the yield curve in
correspondence of three maturities,
(

ln

=1 2 3

(12.57)

where the notation emphasizes that the theoretical yield curve depends on both the physical and
the risk-adjustment parameter vectors. We can now use actual data and the model predictions about
the data to create moment conditions, and proceed to estimate the parameter vector
through some method of moments, provided of course the moments are enough to make the parameters
identifiable.
We encounter two difficulties. The first is that the volatility and the long-term drift of the
short-term rate are not observable. We can use inference methods based on simulations to
cope with this issue. Very simply, we simulate Eqs. (12.49), and apply moment conditions or
auxiliary models to observable variables, according to the procedure set forth in Chapter 5. For
example, we simulate Eqs. (12.49) for a given value of the parameter vector. For each simulation, we reconstruct
a time series of interest rates relying on Eq. (12.57). Then, we use these simulated data to
create moment conditions, or fit some auxiliary model to these artificial data that is as close as
possible to the very same auxiliary model fit to real data. The parameter estimator is the value
of the parameter vector minimizing some norm of these moment conditions, obtained through the simulations,
with any of the methods explained in Chapter 5. According to Theorem 5.4 in Chapter 5, fitting
a sufficiently rich auxiliary model should result in an efficient estimator.
A second difficulty is that the bond pricing formula in Eq. (12.56) does not generally admit
a closed-form solution, an issue we can address using affine models, as explained next.
Step 3

Affine models are relatively easy to estimate. They imply that,


(

)=

( ;

)+B( ;

)[

]>

=1 2 3

(12.58)

where $A(\cdot\,;\cdot)$ and $\mathbf{B}(\cdot\,;\cdot)$ are some functions of the maturity ($\mathbf{B}$ is vector valued),
and generally depend on the parameter vector. Once Eqs. (12.49) are simulated, a time
series of yields is then straightforward to determine based on Eq. (12.58).21
12.5.5.3 Filtering (and trading)

Once we estimate the model's parameters, we could attempt to infer the state vector. For
example, we could invert Eq. (12.58) given an observation of three yields having three different
maturities. The main conceptual difficulty with this approach is that the estimates of the state
vector rely on the maturities we choose. Changing maturities likely leads to different filtered
states. The usual, and admittedly pragmatic, approach is to assume that Eq. (12.58) only holds with
some additional observation/model error and, then, to find the state that minimizes
the error variance using all the available maturities. This procedure delivers the state for each
observation, and it is attractive, as it exploits market-based information summarized by the
cross-section of yields.
Such a pricing-model-based procedure to filter the state has the potential to be used, in
practice, to implement forecasting exercises. We can fit a VAR to the time series of the filtered
values of the state vector and, then, use this VAR to produce forecasts of the state and, then, forecasts
of yields using Eq. (12.58), similarly as we could do in the case of options on equities in markets
with stochastic volatility (see Chapter 10).
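The filtering step can be sketched as a cross-sectional least-squares problem: given affine yields of the form "intercept plus loadings times state", the state minimizing the squared pricing errors across all maturities solves a linear regression. The intercepts `a`, the loading matrix `B`, the maturities and the state values below are hypothetical placeholders, not the functions implied by any model in the text.

```python
import numpy as np

def filter_state(yields, a, B):
    """Least-squares state: minimizes ||yields - a - B @ state||^2."""
    state, *_ = np.linalg.lstsq(B, yields - a, rcond=None)
    return state

maturities = np.array([0.5, 2.0, 5.0, 10.0, 30.0])
# Hypothetical loadings: three factors with different mean-reversion speeds.
speeds = np.array([0.05, 0.5, 2.0])
grid = np.outer(maturities, speeds)
B = (1.0 - np.exp(-grid)) / grid
a = 0.01 * maturities / (1.0 + maturities)      # hypothetical intercepts

true_state = np.array([0.03, 0.01, -0.005])     # hypothetical state
clean_yields = a + B @ true_state
rng = np.random.default_rng(1)
noisy_yields = clean_yields + 1e-4 * rng.standard_normal(maturities.size)

state_clean = filter_state(clean_yields, a, B)  # recovers the state exactly
state_noisy = filter_state(noisy_yields, a, B)  # best fit under observation error
print(state_clean, state_noisy)
```

With more maturities than state variables, the filtered state no longer depends on an arbitrary choice of three yields, which is the point made above.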

12.6 No-arbitrage models: early formulations


Pricing interest rate derivatives requires taking the initial yield curve as given rather than
explaining it. To illustrate, consider a European option written on a bond: only explaining the
bond price is problematic, as a model's error on the bond price can generate a large option price
error due to the nonlinearities induced by the optionality. How can we trust an option pricing
model that is not even able to pin down the value of the underlying?
21 Dai and Singleton (2000) implement this estimation strategy, although they make use of data on swap rates. The models they
consider predict theoretical values for the swap rates, obtained through the formula in Eq. (12.106) of Section 12.7.5.4 below, where
the bond prices in that formula are replaced by the pricing functions predicted by the models. Dai & Singleton consider three rates
predicted by their models: two swap rates (with tenures of two and ten years), plus the six month Libor rate, 12 ln
+ 12 ,
where
is the pricing function predicted by the models they consider.

This section surveys the first attempts to deal with these issues within a continuous-time
framework, with one of them actually being the continuous-time version of the Ho and Lee
(1986) model in Chapter 11. In Section 12.6.1, we explain the main issue while relying on a
benchmark option pricing formula (derived in Section 2.7), and in Section 12.6.2, we provide
details regarding two benchmark models.
12.6.1 Fitting the yield-curve, perfectly
Let again (
) be the price of a zero coupon bond maturing at some . By no arbitrage,
the price of a European call option on this bond, struck at
and expiring at
, is:
h
i
(
) E
( (
)
)+
In Section 12.8, we show that
(

)=

[ (

[ (

] (12.59)

where
is a the forward probability for maturity
(see Eq. (12.93)).
The bond option price in Eq. (12.59) depends on theoretical bond prices,
not market prices. This issue is problematic for sell-side institutions engaged in intermediating derivatives: as explained many times, the need arises in this case to simultaneously
match the yield curve at the time of evaluation. This section describes models that fit the yield
curve without errors, which we call perfectly fitting models. These models are simply a more
elaborate, continuous-time version of the no-arbitrage models introduced in Chapter 11. They
predict that the price of any bond is, of course, random at any future time,
but also exactly equal to the market price at the current time. Finally, and
naturally, this price must be arbitrage-free. The aim of this and subsequent sections is to show
how to achieve this task by augmenting the models seen in the previous sections with a set of
infinite-dimensional parameters.
A final remark. Section 12.8 explains that at least for the Vasicek model, the option price
in Eq. (12.59) does not explicitly depend on the short-term rate, because it only depends on the prices of the
two bonds entering the formula. So why do we look for perfectly fitting models in the first place? Wouldn't it be
enough, then, to just replace the theoretical bond prices with their market
values? This way, the model would be perfectly fitting. Apart from being
logically inconsistent (you would have a model predicting something generically different from
prices), this way of proceeding also has practical drawbacks.
Section 12.8 reveals indeed that option pricing formulae for European options might well
agree in notation with those relating to perfectly fitting models. However, in Section 12.8.6,
we explain that as we move towards more complex interest rate derivatives, say options on
coupon bearing bonds and swaption contracts, the situation becomes dramatically different.
Finally, some maturity dates might not be actually traded at some point in time. For example,
the market price of a bond with a given maturity might not be observed and still, we might well be interested in the pricing of exotic
products requiring knowledge of that price. An intuitive procedure to deal with this difficulty is
to interpolate across the traded maturities. In fact, the objective of perfectly fitting models is
to allow for such an interpolation while preserving absence of arbitrage.
The next two sections discuss two specific, old, and yet very famous examples of perfectly
fitting models: (i) the Ho and Lee (1986) model, and (ii) one generalization of it, introduced by
Hull and White (1990). In Section 12.7, we move on towards a general model-building principle
that includes these two models as special cases.

12.6.2 Ho & Lee

Ho and Lee (1986) originally set their model in discrete time, which is analyzed in
Chapter 11 along with alternative models. The model below represents the diffusion limit
of the original Ho & Lee model, as put forward in Section 11.6.7 of Chapter 11, in which the
short-term rate $r_\tau$ is solution to,
\[ dr_\tau = \theta_\tau\,d\tau + \sigma\,d\tilde{W}_\tau, \quad \tau \geq t, \tag{12.60} \]
where $\tilde{W}$ is a Brownian motion under $Q$, $\sigma$ is a constant, and $\theta_\tau$ is an infinite dimensional
parameter, which we need to pin down the initial, observed yield curve, as we now explain. The
reason we refer to $\theta_\tau$ as infinite dimensional is that $\theta_\tau$ is taken to be a continuous function of
calendar time $\tau$. We assume this function is known at $t$; whence, "parameter."
Clearly, Eq. (12.60) is an affine model. Therefore, the bond price takes the following form,
\[ P(r_\tau,\tau,T) = e^{A(\tau,T) - B(\tau,T)\,r_\tau} \tag{12.61} \]
for two functions $A$ and $B$ to be determined below. It is easy to show that,
\[ A(\tau,T) = \int_\tau^T \theta_s\,(s-T)\,ds + \frac{1}{6}\sigma^2 (T-\tau)^3, \qquad B(\tau,T) = T-\tau. \]
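Let $f_\$(t,\tau)$ denote the instantaneous, observed forward rate. Matching the instantaneous
forward rate $f(t,\tau)$ predicted by the model to $f_\$(t,\tau)$ yields:
\[ f_\$(t,\tau) = f(t,\tau) = -\frac{\partial \ln P(r_t,t,\tau)}{\partial \tau} = -\frac{1}{2}\sigma^2(\tau-t)^2 + \int_t^\tau \theta_s\,ds + r_t. \tag{12.62} \]
Because $P(r_t,t,\tau) = \exp(-\int_t^\tau f(t,x)\,dx)$, the drift term $\theta_\tau$ satisfying Eq. (12.62) guarantees
an exact fit of the yield curve. Differentiating Eq. (12.62) with respect to $\tau$ leaves
$\frac{\partial}{\partial\tau} f_\$(t,\tau) = -\sigma^2(\tau-t) + \theta_\tau$, or:22
\[ \theta_\tau = \frac{\partial}{\partial\tau} f_\$(t,\tau) + \sigma^2(\tau-t). \tag{12.63} \]
By Eqs. (12.60) and (12.63), the short-term rate is, then, taking $t = 0$:
\[ r_\tau = f_\$(0,\tau) + \frac{1}{2}\sigma^2\tau^2 + \sigma\tilde{W}_\tau. \tag{12.64} \]
Moreover, by Eq. (12.62), and Eq. (12.60), the instantaneous forward rate satisfies,
\[ d_\tau f(\tau,T) = \sigma^2 (T-\tau)\,d\tau + \sigma\,d\tilde{W}_\tau. \tag{12.65} \]
The predictions of this model are the continuous-time counterparts to the original, discrete-time
version of Ho & Lee, introduced in Section 11.6.6 of the previous chapter. In Section 12.7.3,
they will be shown to be a particular case of a more general framework known as HJM.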
22 To verify that $\theta_\tau$ is indeed the fitting parameter we are searching for, we substitute Eq. (12.63) into Eq. (12.62) and check
that Eq. (12.62) holds as an identity.
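The perfect fit can be checked numerically. The sketch below assumes the standard Ho & Lee results, namely a fitting parameter equal to the slope of the observed forward curve plus $\sigma^2$ times elapsed time, and a bond price $e^{A - B r}$ with $B = T$ and $A = \int_0^T \theta_s (s-T)\,ds + \sigma^2 T^3/6$ at time zero. The forward curve `fwd_market` and the value of `SIGMA` are hypothetical.

```python
import numpy as np

SIGMA = 0.01  # hypothetical short-rate volatility

def trapezoid(y, x):
    """Trapezoidal rule on a grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def fwd_market(tau):
    """Hypothetical observed forward curve at time 0."""
    return 0.03 + 0.02 * (1.0 - np.exp(-0.5 * tau))

def fwd_market_slope(tau):
    """Derivative of the hypothetical forward curve with respect to tau."""
    return 0.01 * np.exp(-0.5 * tau)

def theta(tau):
    """Fitting drift: forward-curve slope plus sigma^2 * tau (t = 0)."""
    return fwd_market_slope(tau) + SIGMA**2 * tau

def model_price(T, n=20001):
    """Ho & Lee zero price exp(A - B r0), with B = T and r0 = f(0, 0)."""
    s = np.linspace(0.0, T, n)
    A = trapezoid(theta(s) * (s - T), s) + SIGMA**2 * T**3 / 6.0
    return np.exp(A - T * fwd_market(0.0))

def market_price(T, n=20001):
    """Market zero price implied by the forward curve."""
    s = np.linspace(0.0, T, n)
    return np.exp(-trapezoid(fwd_market(s), s))

gaps = [abs(model_price(T) - market_price(T)) for T in (1.0, 5.0, 10.0)]
print(gaps)  # differences are at the level of the quadrature error
```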


12.6.3 Hull & White
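Hull and White (1990) consider the following model:
\[ dr_\tau = (\theta_\tau - \kappa r_\tau)\,d\tau + \sigma\,d\tilde{W}_\tau, \tag{12.66} \]
where $\tilde{W}$ is a $Q$-Brownian motion, and $\kappa$, $\sigma$ are constants. The model generalizes the Ho and
Lee (1986) model in Eq. (12.60) and the Vasicek (1977) model in Eq. (12.32). In the original
formulation of Hull and White, $\kappa$ and $\sigma$ are both time-varying, but the main points of this
model can be learnt by working out this particularly simple case.
Eq. (12.66) also gives rise to an affine model. Therefore, the solution for the bond price is
given by Eq. (12.61). It is easy to show that the functions $A$ and $B$ are given by
\[ A(\tau,T) = \frac{1}{2}\sigma^2 \int_\tau^T B(s,T)^2\,ds - \int_\tau^T \theta_s B(s,T)\,ds, \tag{12.67} \]
and
\[ B(\tau,T) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-\tau)}\right). \tag{12.68} \]
By reiterating the same reasoning produced to show (12.63), one shows that the solution for
$\theta_\tau$ is:
\[ \theta_\tau = \frac{\partial}{\partial\tau} f_\$(t,\tau) + \kappa f_\$(t,\tau) + \frac{\sigma^2}{2\kappa}\left[1 - e^{-2\kappa(\tau-t)}\right]. \tag{12.69} \]
A proof of this result is in Appendix 5.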


Why did we need to go for this more complex model? After all, the Ho & Lee model is
already able to pin down the entire yield curve. The answer is that in practice, investment
banks typically price a large variety of derivatives. The yield curve is not the only thing to be
exactly fit. Rather it is only the starting point. In general, the more flexible a given perfectly
fitting model is, the more successful it is at pricing more complex derivatives.
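The same numerical check carries over to the mean-reverting case. The sketch below assumes the standard Hull & White expressions with constant coefficients: $B(s,T) = (1 - e^{-\kappa(T-s)})/\kappa$, $A = \frac{1}{2}\sigma^2\int B^2 - \int \theta B$, and a fitting drift equal to the forward-curve slope, plus $\kappa$ times the forward rate, plus $\frac{\sigma^2}{2\kappa}(1 - e^{-2\kappa\tau})$. The forward curve and the values of `KAPPA` and `SIGMA` are hypothetical.

```python
import numpy as np

KAPPA, SIGMA = 0.2, 0.01  # hypothetical mean reversion and volatility

def trapezoid(y, x):
    """Trapezoidal rule on a grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def fwd_market(tau):
    """Hypothetical observed forward curve at time 0."""
    return 0.03 + 0.02 * (1.0 - np.exp(-0.5 * tau))

def fwd_market_slope(tau):
    return 0.01 * np.exp(-0.5 * tau)

def theta_hw(tau):
    """Fitting drift for constant kappa and sigma (t = 0)."""
    convexity = SIGMA**2 / (2.0 * KAPPA) * (1.0 - np.exp(-2.0 * KAPPA * tau))
    return fwd_market_slope(tau) + KAPPA * fwd_market(tau) + convexity

def B_hw(s, T):
    return (1.0 - np.exp(-KAPPA * (T - s))) / KAPPA

def model_price_hw(T, n=20001):
    """Hull & White zero price exp(A - B r0), with r0 = f(0, 0)."""
    s = np.linspace(0.0, T, n)
    b = B_hw(s, T)
    A = 0.5 * SIGMA**2 * trapezoid(b**2, s) - trapezoid(theta_hw(s) * b, s)
    return np.exp(A - B_hw(0.0, T) * fwd_market(0.0))

def market_price(T, n=20001):
    s = np.linspace(0.0, T, n)
    return np.exp(-trapezoid(fwd_market(s), s))

gaps_hw = [abs(model_price_hw(T) - market_price(T)) for T in (1.0, 5.0, 10.0)]
print(gaps_hw)
```

The extra parameter leaves the fit of the initial curve untouched while changing how volatility is distributed across maturities, which is what matters for derivatives other than bonds.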

12.7 The Heath-Jarrow-Morton framework


12.7.1 Framework
The bond price representation in Eq. (12.10),
\[ P(\tau,T) = e^{-\int_\tau^T f(\tau,\ell)\,d\ell}, \quad \text{all } \tau \in [t,T], \tag{12.70} \]
underlies the modeling approach started by Heath, Jarrow and Morton (1992) (HJM, henceforth). Given Eq. (12.70), this approach takes as a primitive the stochastic evolution of the entire
structure of forward rates, not only the special case of the short-term rate, $r_t = \lim_{\ell \downarrow t} f(t,\ell) \equiv f(t,t)$.
The goal is to start with Eq. (12.70), take the initial observed forward rates $(f(t,\ell))_{\ell \in [t,T]}$
as given, and, then, find the no-arb, cross-equation restrictions on the stochastic behavior of
$(f(\tau,\ell))_{\tau \in (t,\ell]}$, for any $\ell \in [t,T]$.
By construction, the HJM approach allows for a perfect fit of the initial term-structure. This
point can be illustrated quite simply, as the bond price $P(\tau,T)$ is,
\[
\begin{aligned}
P(\tau,T) &= \frac{P(t,T)}{P(t,\tau)} \cdot \frac{P(t,\tau)}{P(t,T)} \cdot P(\tau,T) \\
&= \frac{P(t,T)}{P(t,\tau)} \exp\left(-\int_\tau^T f(\tau,\ell)\,d\ell + \int_t^T f(t,\ell)\,d\ell - \int_t^\tau f(t,\ell)\,d\ell\right) \\
&= \frac{P(t,T)}{P(t,\tau)} \exp\left(-\int_\tau^T \left(f(\tau,\ell) - f(t,\ell)\right) d\ell\right).
\end{aligned}
\tag{12.71}
\]
The key points of the HJM methodology are (i) to take the current forward rates $f(t,\ell)$
as given (i.e., equal to those in the market) and, then, (ii) to model the future forward rate
movements, $f(\tau,\ell) - f(t,\ell)$.
Therefore, the HJM methodology takes the current term-structure as perfectly fitted, as we
observe both $P(t,T)$ and $P(t,\tau)$. In contrast, the approach to interest rate modeling in Section
12.4 is to model the current bond price $P(t,T)$ through assumptions regarding the dynamics
of the short-term rate. Instead, fitting the initial term-structure is critical for market making
purposes, as we explained in the previous section and in Chapter 11.
Finally, note that the bond price representation in Eq. (12.71) leads to a modeling perspective
that is the continuous-time counterpart to that underlying the discrete time Ho & Lee model
(see Chapter 11). Indeed, below we shall show that the continuous-time version of the Ho &
Lee model (see Eq. (12.65)) is a special case of the HJM framework.
12.7.2 The model
12.7.2.1 Primitives

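We assume information is Brownian, such that for any given $T$, the instantaneous forward rate,
$(f(\tau,T))_{\tau \in [t,T]}$, satisfies,
\[ d_\tau f(\tau,T) = \alpha(\tau,T)\,d\tau + \sigma(\tau,T)\,dW_\tau, \quad \tau \in (t,T], \tag{12.72} \]
for some adapted processes $\alpha$ and $\sigma$, and where $f(t,T)$ is given. The solution to Eq. (12.72) is:
\[ f(\tau,T) = f(t,T) + \int_t^\tau \alpha(s,T)\,ds + \int_t^\tau \sigma(s,T)\,dW_s, \quad \tau \in (t,T]. \tag{12.73} \]
Note that in this model, $W$ doesn't depend on $T$. String models, analyzed in Section
12.7.5, allow us to index $W$ by $T$, in a sense to be discussed below.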

12.7.2.2 No-arb restrictions

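The next step is to derive restrictions on $\alpha$ that rule out arbitrage. Let $X_\tau = -\int_\tau^T f(\tau,\ell)\,d\ell$.
We have
\[ dX_\tau = f(\tau,\tau)\,d\tau - \int_\tau^T \left(d_\tau f(\tau,\ell)\right) d\ell = \left[r_\tau - \alpha^I(\tau,T)\right] d\tau - \sigma^I(\tau,T)\,dW_\tau, \]
where the second equality follows by Eq. (12.72) and the following definitions:
\[ \alpha^I(\tau,T) \equiv \int_\tau^T \alpha(\tau,\ell)\,d\ell, \qquad \sigma^I(\tau,T) \equiv \int_\tau^T \sigma(\tau,\ell)\,d\ell. \]
By Eq. (12.70), $P(\tau,T) = e^{X_\tau}$. By Itô's lemma,
\[ \frac{dP(\tau,T)}{P(\tau,T)} = \left[r_\tau - \alpha^I(\tau,T) + \frac{1}{2}\left\|\sigma^I(\tau,T)\right\|^2\right] d\tau - \sigma^I(\tau,T)\,dW_\tau. \]
By the FTAP, we have that:
\[ \frac{dP(\tau,T)}{P(\tau,T)} = \left[r_\tau - \alpha^I(\tau,T) + \frac{1}{2}\left\|\sigma^I(\tau,T)\right\|^2 + \sigma^I(\tau,T)\lambda_\tau\right] d\tau - \sigma^I(\tau,T)\,d\tilde{W}_\tau, \]
where $\tilde{W}_\tau = W_\tau + \int_t^\tau \lambda_s\,ds$ is a $Q$-Brownian motion, and $\lambda$ satisfies:
\[ \alpha^I(\tau,T) = \frac{1}{2}\left\|\sigma^I(\tau,T)\right\|^2 + \sigma^I(\tau,T)\lambda_\tau. \tag{12.74} \]
Differentiating the previous relation with respect to $T$ gives us the arbitrage restriction that
we were looking for:
\[ \alpha(\tau,T) = \sigma(\tau,T) \int_\tau^T \sigma(\tau,\ell)^\top d\ell + \sigma(\tau,T)\lambda_\tau. \tag{12.75} \]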
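Under the risk-neutral probability (zero risk premium), the restriction says that the forward-rate drift equals the forward-rate volatility times its own integral over maturities. The sketch below evaluates this integral numerically and compares it with the closed form for an exponentially decaying volatility, $\sigma e^{-\kappa(T-\tau)}$. This volatility function and the values of `SIGMA` and `KAPPA` are hypothetical choices made for illustration.

```python
import numpy as np

SIGMA, KAPPA = 0.01, 0.2  # hypothetical volatility level and decay rate

def vol(tau, T):
    """Hypothetical scalar forward-rate volatility sigma(tau, T)."""
    return SIGMA * np.exp(-KAPPA * (T - tau))

def hjm_drift(tau, T, n=20001):
    """Risk-neutral drift: sigma(tau,T) times the integral of sigma(tau,l) over [tau,T]."""
    ell = np.linspace(tau, T, n)
    v = vol(tau, ell)
    integral = float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(ell)))
    return vol(tau, T) * integral

def closed_form_drift(tau, T):
    """Same drift, integrating the exponential volatility analytically."""
    x = T - tau
    return SIGMA**2 / KAPPA * np.exp(-KAPPA * x) * (1.0 - np.exp(-KAPPA * x))

drift_num = hjm_drift(1.0, 6.0)
drift_exact = closed_form_drift(1.0, 6.0)
print(drift_num, drift_exact)
```

Once the volatility function is chosen, the drift is no longer a free input: this is the sense in which the restriction rules out arbitrage.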

12.7.3 The dynamics of the short-term rate


By Eq. (12.73), the short-term rate satises
Z
(
)= ( )+
(

Di erentiating with respect to yields

Z
= 2( ) + ( ) +
2(
where

2(

)=

2(

+
Z

)> + (
{z
2(

2(

) (

)> +
}

(12.76)

+ (

(12.77)

(12.78)

Eq. (12.76) reveals that the short-term rate is in general non-Markov. A special case of Eq.
(12.76) is the Ho and Lee (1986) model, where $\sigma(\tau,T) = \sigma$, a constant, such that, by Eq.
(12.75), $\alpha(\tau,T) = \sigma^2(T-\tau) + \sigma\lambda_\tau$, consistently with Eq. (12.64). The Hull and White (1990)
model is dealt with in the next section.
Note that regarding the main objective of these models, pricing, we do not need to be
concerned with estimating any risk-premium, i.e., $\lambda(\cdot)$. We only need to consider the risk-neutral dynamics of the left corner in the forward rate surface, that is, those of the short-term
rate $r$. By Eq. (12.77) and Girsanov theorem, these are given by
Z
Z

+ ( )
= 2( ) +
2( ) +
)
(12.79)
2(
where $\alpha_2(\cdot,\cdot)$ is defined in Eq. (12.78), and $\tilde{W}$ denotes as usual a Brownian motion under the
risk-neutral probability.
Eq. (12.79) can be easily simulated for the purpose of the evaluation of exotic products.

12.7.4 Embedding

Naturally, HJM models are not distinct from the short-term rate models of Section 12.4. Under
embeddability conditions, HJM can be turned into short-term rate models, a property known
as universality of HJM models.
12.7.4.1 Markovianity

Under which conditions do HJM models predict the short-term rate to be Markov? This question
naturally links to the early literature reviewed in Section 12.4, where the whole yield curve is
driven by a scalar Markov process, the short-term rate. Carverhill (1994) and Ritchken and
Sankarasubramanian (1995) study conditions under which the original state vector can be
enlarged such that the resulting augmented state vector is Markov and at the same time,
includes the short-term rate as a component. The resulting model quite resembles some of the
short-term rate models surveyed in Section 12.4. In these models, the short-term rate is not
Markov, yet it is part of a system that is Markov. We now illustrate these points within the
simple Markov scalar case.
Assume the forward-rate volatility is deterministic and takes the following form:
(

=
=
=

2(

)+

2(

)+

2(

)+

1(

) 2 ( ) all

(12.80)

satises

By Eq. (12.79), then,

)=

Z
Z

2(

0
2(

0
2(

2(

)
+
2( )

2(

0
2(

)
(
2 )

1(

2(

1(

+ (

)
)

+ (
)

+ (

Done. This is Markov. Precisely, the condition in Eq. (12.80) ensures the HJM model predicts
the short-term rate is Markov. Mean reversion, then, obtains assuming that 20 ( ) 0 for all
. For example, take to be a constant, and:
1(

)=

2(

)=

R
(
)
(
)
, and the price volatility is
(
) = 1
. This is
such that ( ) =
the Hull-White model discussed in Section 12.4, of which the Ho and Lee model is a particular
case, namely for = 0.
12.7.4.2 Short-term rate reductions

We prove everything in the Markov case. Let the short-term rate be solution to:
= (

+ (

where is a -Brownian motion, and is some risk-neutralized drift function. The rational
bond price function is (
), and the forward rate implied by the model is:
(

ln

)=

By Itos lemma,
=
But for (

1
+
2

) to be consistent with the solution to Eq. (12.73), it must be the case that

) + (

) (

) (

)+

1
(
2

)2

(12.81)

and
(

)= (

(12.82)

In particular, the last condition can only be satisfied if the short-term rate model under consideration is of the perfectly fitting type.
12.7.5 Stochastic string shocks models
The first papers are Kennedy (1994, 1997), Goldstein (2000) and Santa-Clara and Sornette
(2001). Heaney and Cheng (1984) are also very useful to read.
12.7.5.1 Stochastic singularity

Let

)=[

)] in Eq. (12.72). For any

1)

2 )]

1)

2,

2)

=1

and,
(

2)

1)

By replacing this result into Eq. (12.75),


Z
(
) =
(
) (
=

k (

)k k (

2 )]

)>

(
(
1)
1 )k k (

=1

k (

+ (
)k (

2)
2 )k

(12.83)

)
)

+ (

One drawback of this model is that the correlation matrix of any $(N+M)$-dimensional vector
of forward rates is degenerate for $M \geq 1$. Stochastic string models overcome this difficulty by
modeling the correlation structure $\rho(\tau,T_1,T_2)$ for all $T_1$ and $T_2$ in an independent way, rather
than implying it from a given $N$-factor model (as in Eq. (12.83)). In other terms, within the
HJM methodology, one uses the functions $\sigma_i$ to model both volatility and correlation structure
of forward rates. The outcome might not be a good model, in practice. Instead, stochastic string
models have two separate functions with which to model volatility and correlation.
The starting point is a model where the forward rate is solution to,
(
where the string

)=

+ (

satises the following ve properties:



(i) For all ,

) is continuous in

(ii) For all

) is continuous in ;

(iii)

) is a -martingale and, hence, a local martingale i.e.

(iv)

(v)

)) =
1)

)) = 0;

;
(

2 ))

2)

(say).

Properties (iii), (iv) and (v) make Markovian. The functional form for is crucially important to guarantee this property. Given the previous properties, one can easily derive a key
property of forward rates. We have
p
( (
)) = (
)
(

2)

1)

2 ))

1)

2)

1)

2)

2)

2)

As claimed before, we now have two separate functions with which to model volatility and
correlation.
12.7.5.2 No-arbitrage restrictions

Similarly as in the HJM-Brownian case, let


Z

= ( )
( ) =

where as usual,
(
(

)
=
)
=

1
2

(
(

( (

. But

. We have
Z
[ ( )

) = exp (

)]

). Therefore,

)
1
)+
2

))

Next, suppose that the pricing kernel

satises:
Z
(

1)

2)

2)

where T denotes the set of all risks spanned by the string , and
family of unit risk-premia.
In absence of arbitrage,

0 = [ ( )] =
drift
+ drift
+
By exploiting the dynamics of
Z Z
1
(
)=
2

is the corresponding

and ,
(

1)

2)


2)



where

=
=

)=

) (

) (

) (

) (

with respect to

By di erentiating
(

we obtain,

) (

+ (

(12.84)

A proof of Eq. (12.84) is in the Appendix.

12.8 Interest rate derivatives


This section provides the main definitions and an evaluation framework for the main interest
rate derivatives, such as bond options, callable and puttable bonds, interest rate swaps, caps,
floors and swaptions.
We begin by addressing a simple fundamental question (in Section 12.8.1). How come interest
rate derivatives are so pervasive, when in fact, we know very well that the short-term rate
displays little volatility? We explain this apparent puzzle by proposing that the bulk of interest
rate volatility occurs for deals relating to long maturities. Granted, the short-term rate is not
very volatile in the short term. However, it is very persistent. This persistence can be at the
origin of the wild volatility of interest rate deals that have longer maturities.
The remaining sections develop evaluation models of interest rate derivatives, which mainly
rely on models of the short-term rate, and their perfectly fitting extensions. Section 12.9 contains details of the so-called market model, which is an HJM-style model intensively used by
practitioners to evaluate these derivatives.
12.8.1 Fixed income market volatility and the persistence of the short-term rate
The implied vol on bond options is typically high for large maturities, in fact comparable to
that on stocks. Why is it that this implied vol is so large, when in fact, the volatility of the
short-term rate is one order of magnitude less than that in equity markets? We tackle this issue
by analyzing the realized volatility, not implied volatility. To anticipate, the answer is that
the short-term rate is very persistent, and it is a risk for the long-run, pretty much in the
same spirit of the explanations attempting to rationalize the equity premium puzzle reviewed
in Chapter 8.
Define, first, the term-structure of volatility. It is the function, $T \mapsto \mathrm{Vol}(R(T))$, where $R(T)$
is the spot rate for the maturity $T$, and $\mathrm{Vol}(R(T))$ is the standard deviation of this spot rate.
By the definition of $R(T)$, the term-structure of volatility can also be written as the function
\[ T \mapsto \mathrm{Vol}\left(-\frac{1}{T}\ln P(T)\right), \]
where $P(T)$ is the price of a zero with maturity equal to $T$. It is instructive to see what this
volatility looks like, for a concrete model. Consider the Vasicek model in Eq. (12.32). We know
from Eq. (12.37), that the yields predicted by this model are:
\[ R(T) = -\frac{A(T)}{T} + \frac{B(T)}{T}\,r, \qquad B(T) = \frac{1 - e^{-\kappa T}}{\kappa}, \]
for some function $A(T)$. Therefore, we have that,
\[ \mathrm{Vol}(R(T)) = \frac{B(T)}{T}\,\mathrm{Vol}(r), \tag{12.85} \]
where $\mathrm{Vol}(r) = \sigma/\sqrt{2\kappa}$ is the long-run volatility of the short-term rate. For example, if $\kappa = 0.2$
and $\sigma = 0.03$, then $\mathrm{Vol}(r) \approx 4.7\%$. Given the previous values for $\kappa$ and $\sigma$, Figure 12.8 depicts
the term-structure of volatility, i.e. Eq. (12.85).

[Figure: Vol(R) plotted against maturity (years).]

FIGURE 12.8. The term-structure of volatility predicted by the Vasicek model.

Figure 12.8 illustrates how the term-structure of volatility decreases over the maturity of the
zero, attaining its maximum at $\mathrm{Vol}(r) \approx 4.7\%$. This is natural, as the yield curve in this model
flattens out, converging towards a constant long-term value, the asymptotic interest rate, as we
say sometimes.
Despite this, the volatility of bond returns can be much higher, as we now illustrate. We need
to figure out the dynamics of the bond price, for the Vasicek model. By Itô's lemma,
( )
= ( )
( )

+(

Therefore, the volatility of bond returns is,
\[ \mathrm{Vol}\left(\frac{dP}{P}\right) = \sigma B(T). \tag{12.86} \]

Compare Eq. (12.86) with Eq. (12.85). The main difference between these two equations is
that the right hand side of Eq. (12.85) is divided by $T$, which makes $\mathrm{Vol}(R(T))$ decreasing in $T$.
(Otherwise, $\mathrm{Vol}(r)$ and $\sigma$ have roughly the same order of magnitude.) The main point is that
the yield $R(T)$ is, simply, an average return achieved once a bond is purchased and held until
its expiry. This average return is progressively less volatile as time to maturity gets larger, and
becomes constant, eventually. The return $dP/P$ does, instead, measure the capital gains achieved
while trading the bond. The volatility of these capital gains increases with time to maturity. Even
if $\sigma$ is very small, the bond return volatility in Eq. (12.86) can be quite high. Suppose, for
example, that $\kappa$ is close to zero, in which case $\mathrm{Vol}(dP/P) \approx \sigma T$, which is 15% for a five year zero.
These properties are illustrated by Figure 12.9, which depicts Eq. (12.86) when the parameter
values are $\kappa = 0.2$ and $\sigma = 0.03$.
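The two volatility curves can be tabulated directly. The sketch below assumes the standard Vasicek loading $B(T) = (1 - e^{-\kappa T})/\kappa$, a yield volatility equal to $B(T)/T$ times the long-run volatility $\sigma/\sqrt{2\kappa}$, and a bond return volatility equal to $\sigma B(T)$. The function names are illustrative; the parameter values are those quoted in the text.

```python
import numpy as np

KAPPA, SIGMA = 0.2, 0.03  # parameter values quoted in the text

def B(T, kappa=KAPPA):
    """Vasicek loading; expm1 keeps it accurate when kappa * T is tiny."""
    return -np.expm1(-kappa * T) / kappa

def yield_vol(T, kappa=KAPPA, sigma=SIGMA):
    """Volatility of the spot rate with maturity T."""
    return B(T, kappa) / T * sigma / np.sqrt(2.0 * kappa)

def bond_return_vol(T, kappa=KAPPA, sigma=SIGMA):
    """Instantaneous volatility of the return on a zero with maturity T."""
    return sigma * B(T, kappa)

long_run_vol = SIGMA / np.sqrt(2.0 * KAPPA)
for T in (0.25, 1.0, 5.0, 10.0, 30.0):
    print(T, yield_vol(T), bond_return_vol(T))

# Near-infinite persistence: the bond return volatility approaches sigma * T.
near_unit_root = bond_return_vol(5.0, kappa=1e-8)
print(long_run_vol, near_unit_root)
```

Yield volatility falls with maturity while bond return volatility rises, and the latter approaches $\sigma T$ as mean reversion vanishes.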
[Figure: Vol(dP/P) plotted against maturity (years). Annotation: theoretical upper bound, corresponding to infinite persistence.]

FIGURE 12.9. The dashed line depicts the bond return volatility, $\mathrm{Vol}(dP/P)$, arising when
the persistence parameter $\kappa = 0$, and the solid line is the bond return volatility for $\kappa = 0.2$.

The high persistence of the short-term rate, as measured by a low value of $\kappa$, makes long
maturity bond returns quite volatile. Intuitively, this high persistence implies that a shock in
the short-term rate has long lasting effects on the future path of the short-term rate. This
makes the short-term rate very volatile in the long run, which makes the value of long maturity
zeros very volatile as a result. Intuitively, interest rates exhibit inertia: (i) it takes a number
of shocks to move interest rates away from their equilibrium paths and so, short-term bonds
are not volatile; and (ii) it takes time for interest rates to absorb shocks and so, medium- and
long-term bonds are volatile. For example, Figure 12.10 depicts the dynamics of the three month
rate and those of the three months into five years forward swap rate, an interest rate that refers
to relatively higher maturities, as explained in Section 12.8.7 below (see Eq. (12.106)). The
forward swap rate is orders of magnitude more volatile than the short-term rate.


FIGURE 12.10. Interest rate volatility increases with maturity.

These facts are confirmed by the implicit (not implied) option-based volatility. In Section
12.8.4, Eq. (12.97), we show that this volatility is,
s
(
)
2 (
)1
1
=
Vol
Vol
2 (
)
As gets small, Vol tends to (
), which increases with the bonds time to maturity
left at its expiration,
.
The previous reasoning does, of course, still hold in the more realistic case of a three-factor
model, such as that in Eqs. (12.49). In that case, as explained,
is large and is small:
the short-term rate is quite persistent because it mean-reverts, quickly, to a persistent process,
which we denoted as . Naturally, in such as a three-factor model, Eq. (12.86) does not hold
anymore, as we should add two more volatility components, related to stochastic volatility, ,
and the persistent process . However, the bond return volatility would be boosted by the high
persistence of .
12.8.2 Hypothetical continuous payoffs
Interest rate derivatives could be priced in a very elegant fashion once we assume payo s are
paid out continuously. Let denote the price of any such derivative, and be the instantaneous
payo paid by it, a function of calendar time and . Consider any model of the short-term rate
in Section 12.4, and to simplify, assume that = 1, such that
in Eq. (12.25) carries all
information. By the FTAP, is solution to the following partial di erential equation:
+

1
2

, for all (

R[

(12.87)


subject to the appropriate boundary conditions.


In Eq. (12.87), is some risk-neutralized drift function of , with the payo
term that
needs to be added to the expected basis point appreciation of under ,
+ + 12 2 ,
such that overall, derivative yields
, which prevents arbitrage. While the payo
could
approximate many of the interest rate derivatives payo s dealt with below, standard market
practice typically relies on payo s dened in terms of LIBOR-type discretely-compounded rates
and, intermediate payments obviously only take place discretely, not continuously.
We now explain how to price interest rate derivatives based
on more realistic assumptions than the continuous payoffs leading to the evaluation equation
(12.87). We begin by introducing probably the most famous framework of analysis regarding
interest rate derivative evaluation, based on the so-called forward probabilities.
12.8.3 Forward martingale probabilities
We know from Chapter 4 that asset prices normalized by the money market account are martingales, yet the money market account is not the only numéraire in markets where interest rates are random. Chapter 4 also explains that we may use the price of zero-coupon bonds as an alternative numéraire, which leads to the notion of forward probabilities (see Section 4.4). This section develops this notion more closely, and relies on it to demonstrate results applying to the dynamics of interest rates, not only prices, which will be quite useful whilst pricing interest rate derivatives.
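Consider a forward contract agreed at time $t$, and let $F_t(T)$ be the $T$-forward price for a claim $V_T$ at $T$. That is, and consistently with notation in Chapter 10, $F_t(T)$ is the price agreed at $t$, which will be paid at $T$ for delivery of the claim at $T$, such that the agreement is worthless at $t$. Clearly,
\[
0 = E_t\left[e^{-\int_t^T r(u)du}\left(V_T - F_t(T)\right)\right].
\]
But since $F_t(T)$ is known at time $t$,
\[
E_t\left[e^{-\int_t^T r(u)du}\,V_T\right] = F_t(T)\,E_t\left[e^{-\int_t^T r(u)du}\right]. \tag{12.88}
\]
For example, assume $V$ is the price of a traded asset, such that $E_t(e^{-\int_t^T r(u)du}V_T) = V_t$: Eq. (12.88) then collapses to the well-known formula $F_t(T)P(t,T) = V_t$, as explained in Chapter 10. Yet, and as also explained in Chapter 10, entering the forward contract originated at $t$, at a later date $\tau > t$, costs. To calculate the marking-to-market of the forward at time $\tau$, note that the final payoff at time $T$ is $V_T - F_t(T)$. Discounting this payoff at $\tau \in [t,T]$ delivers $P(\tau,T)[F_\tau(T) - F_t(T)]$.
Next, let us elaborate on Eq. (12.88). We can use the basic bond pricing equation (12.2) in Section 12.2, and rearrange terms in Eq. (12.88), to obtain:
\[
F_t(T) = E_t\left[\frac{e^{-\int_t^T r(u)du}}{P(t,T)}\,V_T\right] = E_t\left(\eta_T(T)\,V_T\right), \quad \text{where } \eta_T(T) \equiv \frac{e^{-\int_t^T r(u)du}}{E_t\left(e^{-\int_t^T r(u)du}\right)}. \tag{12.89}
\]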



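Eq. (12.89) suggests that we can define a new probability $Q_F^T$, as follows,
\[
\left.\frac{dQ_F^T}{dQ}\right|_{\mathcal{F}_T} = \eta_T(T) \equiv \frac{e^{-\int_t^T r(u)du}}{E_t\left(e^{-\int_t^T r(u)du}\right)}. \tag{12.90}
\]
Naturally, $E_t[\eta_T(T)] = 1$. Moreover, if the short-term rate process is deterministic, $\eta_T(T)$ equals one and $Q$ and $Q_F^T$ are the same.
In terms of this new probability $Q_F^T$, the forward price $F_t(T)$ is:
\[
F_t(T) = E_t\left(\eta_T(T)V_T\right) = \int \eta_T(T)\,V_T\,dQ = \int V_T\,dQ_F^T = E_{Q_F^T}(V_T), \tag{12.91}
\]
where $E_{Q_F^T}(\cdot)$ is conditional expectation at $t$ under $Q_F^T$. For reasons clarified in a moment, $Q_F^T$ is referred to as the $T$-forward probability. Clearly, $F_T(T) = V_T$. Therefore, Eq. (12.91) is, also,
\[
F_t(T) = E_{Q_F^T}\left(F_T(T)\right).
\]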
That is, forward prices are martingales under the forward probability, whence the expression "forward martingale probability", which we shall shorten to "forward probability" to simplify the presentation. Naturally, and as usual, futures prices are martingales under the risk-neutral probability, not the forward, as explained in Chapter 10.
The forward probability is a useful tool, which helps pricing interest-rate derivatives, as we
shall explain in detail below. It was introduced by Geman (1989) and Jamshidian (1989), and
further analyzed by Geman, El Karoui and Rochet (1995). The appendix provides additional
details. Appendix 2 relates forward prices to their certainty equivalent, and Appendix 3 deals
with additional technical details. We now rely on this probability to facilitate the calculation
of options on bonds.
12.8.4 European options on bonds
12.8.4.1 A bond option pricing formula

Let $T$ be the expiration date of a European call option on a zero-coupon bond, and $S > T$ the expiration date of the bond. We consider a simple one-factor model of the short-term rate, such that the price of a zero is $P(r(t),t,S)$, with the usual notation. Consider the price of an option on this bond, maturing at $T$ and with strike equal to $K$. It equals:
\[
C^b(r(t),t,T,S) = E_t\left[e^{-\int_t^T r(\tau)d\tau}\left(P(r(T),T,S) - K\right)^+\right]. \tag{12.92}
\]
Finding a closed-form solution for this price looks formidable. Note indeed that $C^b$ is solution to a partial differential equation, subject to the boundary condition $C^b(r,T,T,S) = (P(r,T,S) - K)^+$, where $P(r,T,S)$ is also solution to another partial differential equation. Relying on the forward probability allows simplifying this problem.
Let us, then, elaborate on Eq. (12.92). Note that the main issue we encounter is that the payoff $(P(r(T),T,S) - K)^+$ depends on $r(T)$, and yet the discounting factor, $e^{-\int_t^T r(\tau)d\tau}$, would also obviously depend on the realization of the short-term rate. As discussed in Chapter 4, it is a general issue arising whilst evaluating fixed-income instruments, because interest rates are obviously random in this context. These issues are overcome by turning the risk-neutral expectation in (12.92) into the forward.


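Let $I_{exe}$ be the indicator function of the event that the option is exercised, i.e. $I_{exe} \equiv I_{P(r(T),T,S) \ge K}$. We have:
\begin{align*}
C^b(r(t),t,T,S) &= E_t\left[e^{-\int_t^T r(\tau)d\tau}P(r(T),T,S)\,I_{exe}\right] - K\,E_t\left[e^{-\int_t^T r(\tau)d\tau}I_{exe}\right] \\
&= P(r(t),t,S)\,E_{Q_F^S}(I_{exe}) - K\,P(r(t),t,T)\,E_{Q_F^T}(I_{exe}) \\
&= P(r(t),t,S)\,Q_F^S\left[P(r(T),T,S) \ge K\right] - K\,P(r(t),t,T)\,Q_F^T\left[P(r(T),T,S) \ge K\right], \tag{12.93}
\end{align*}
where the second equality follows by a standard calculation,23 $Q_F^T$ is the $T$-forward probability, $Q_F^S$ is the $S$-forward probability and, finally, $E_{Q_F^S}(\cdot)$ is the expectation taken under the $S$-forward martingale probability. Section 12.8.4 explains how the two probabilities in Eq. (12.93) are computed in the case of a specific model.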
Eq. (12.93) is the bond option counterpart to the Black & Scholes formula, in that: (i) the underlying asset is a zero-coupon bond expiring at $S$, the current price of which, $P(r(t),t,S)$, multiplies the probability $Q_F^S[\cdot]$; and (ii) the present value of the strike price, $K\,P(r(t),t,T)$, multiplies a second probability, $Q_F^T[\cdot]$. Below, we provide an important instance in which the probabilities in Eq. (12.93) can be calculated in closed-form, based on the assumption that the short-term rate is Gaussian. We actually present two models with a closed-form solution, one developed in a seminal paper by Jamshidian (1989), and the second one, simply, its perfectly fitting extension. We shall primarily deal with call options, although the pricing of put options will be easily derived through a put-call parity, given next.
12.8.4.2 A put-call parity arising in government bond markets

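Consider the identity,
\[
\left(K - P(r(T),T,S)\right)^+ \equiv \left(P(r(T),T,S) - K\right)^+ + K - P(r(T),T,S).
\]
Taking the risk-neutral, discounted expectations of both sides of this equation leaves,
\begin{align*}
E_t\left[e^{-\int_t^T r(\tau)d\tau}\left(K - P(r(T),T,S)\right)^+\right] &= E_t\left[e^{-\int_t^T r(\tau)d\tau}\left(P(r(T),T,S) - K\right)^+\right] + K\,P(r(t),t,T) - E_t\left[e^{-\int_t^T r(\tau)d\tau}P(r(T),T,S)\right] \\
&= E_t\left[e^{-\int_t^T r(\tau)d\tau}\left(P(r(T),T,S) - K\right)^+\right] + K\,P(r(t),t,T) - P(r(t),t,S),
\end{align*}
where the last equality follows by the same argument leading to Eq. (12.93). Therefore, we have the put-call parity relation:
\[
\mathrm{Put}\left(t,T;P(t,S),K\right) = \mathrm{Call}\left(t,T;P(t,S),K\right) + K\,P(t,T) - P(t,S), \tag{12.94}
\]
where $\mathrm{Put}(t,T;P(t,S),K)$ is the price of a European put written on a zero expiring at time $S$, expiring at time $T$, and struck at $K$, and $\mathrm{Call}(\cdot)$ denotes the corresponding call price.

23 By the Law of Iterated Expectations, we have, similarly as in Section 12.2.2.2,
\[
E_t\left[e^{-\int_t^T r(\tau)d\tau}P(r(T),T,S)\,I_{exe}\right] = E_t\left[e^{-\int_t^T r(\tau)d\tau}\,I_{exe}\,E_T\left(e^{-\int_T^S r(\tau)d\tau}\right)\right] = E_t\left[e^{-\int_t^S r(\tau)d\tau}\,I_{exe}\right].
\]
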

12.8.4.3 Jamshidian & Vasicek model

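Suppose that the short-term rate is solution to the Vasicek model considered in Section 12.4 (see Eq. (12.32)), such that under the risk-neutral probability,
\[
dr(\tau) = \kappa\left(\bar{\theta} - r(\tau)\right)d\tau + \sigma\,d\tilde{W}(\tau),
\]
where $\tilde{W}$ is a $Q$-Brownian motion, $\bar{\theta}$ is the risk-adjusted long-run mean of the short-term rate, and remaining notation is as in Section 12.4.4.1. As we know from Section 12.4, Eq. (12.36), the price of a zero predicted by the model is $P(r(t),t,T) = e^{A(t,T) - B(t,T)r(t)}$, for some function $A$, and $B(t,T) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-t)}\right)$. In Section 12.4, it was also argued that the price of a European option on a zero is,
\begin{align*}
C^b(r(t),t,T,S) &= E_t\left[e^{-\int_t^T r(\tau)d\tau}\left(P(r(T),T,S) - K\right)^+\right] \\
&= P(r(t),t,S)\,Q_F^S\left[P(r(T),T,S) \ge K\right] - K\,P(r(t),t,T)\,Q_F^T\left[P(r(T),T,S) \ge K\right], \tag{12.95}
\end{align*}
where $Q_F^T$ denotes the $T$-forward probability.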
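In Appendix 8, we show that the two probabilities in Eq. (12.95) can be evaluated by the changes of numéraire described in Section 12.8.3, such that the solution for $P(r(T),T,S)$ is:
\begin{align*}
P(r(T),T,S) &= \frac{P(r(t),t,S)}{P(r(t),t,T)}\exp\left(\frac{1}{2}v^2 - \sigma B(T,S)\int_t^T e^{-\kappa(T-\tau)}dW^{Q_F^S}(\tau)\right) \quad \text{under } Q_F^S, \\
P(r(T),T,S) &= \frac{P(r(t),t,S)}{P(r(t),t,T)}\exp\left(-\frac{1}{2}v^2 - \sigma B(T,S)\int_t^T e^{-\kappa(T-\tau)}dW^{Q_F^T}(\tau)\right) \quad \text{under } Q_F^T, \tag{12.96}
\end{align*}
where $W^{Q_F^T}$ is a Brownian motion under the forward probability $Q_F^T$, and $v^2 = \sigma^2 B(T,S)^2\int_t^T e^{-2\kappa(T-\tau)}d\tau$. Therefore, simple algebra now reveals that:
\[
Q_F^S\left[P(r(T),T,S) \ge K\right] = \Phi(d_1), \qquad Q_F^T\left[P(r(T),T,S) \ge K\right] = \Phi(d_1 - v),
\]
where $\Phi$ denotes the standard normal distribution function and
\[
d_1 = \frac{\ln\left[\frac{P(r(t),t,S)}{K\,P(r(t),t,T)}\right] + \frac{1}{2}v^2}{v}, \qquad v^2 = \sigma^2\,\frac{1 - e^{-2\kappa(T-t)}}{2\kappa}\,B(T,S)^2. \tag{12.97}
\]
It is a very elegant formula. It resembles Black & Scholes, although the inputs to the volatility function link to both the instantaneous volatility, $\sigma$, and the speed of mean-reversion, $\kappa$, of the short-term rate. We provided the economic interpretation of this dependence in Section 12.8.1.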
12.8.4.4 Perfectly fitting extension

Brigo and Mercurio (2006) survey a number of perfectly fitting models that go well beyond that in the previous section. The simplest relies on the Hull and White (1990) model in Eq. (12.66) of Section 12.6.3. Note that while, formally, the solution to Eq. (12.92) is the same as in the previous section, the value of more complex derivatives depends on whether or not we use a perfectly fitting extension, as Section 12.8.6 explains in further detail.


12.8.5 Callable and puttable bonds

This section relies on a continuous-time model to illustrate a few key properties of callable and puttable bonds. As explained in Chapter 11 (Section 11.8.1), callable bonds are assets that give the issuer the right to buy them back at certain times and predetermined prices; puttable bonds, instead, give the investor the right to sell them back to the issuer at certain times and strikes. Chapter 11 explains the pricing mechanisms of these assets within a discrete-time framework in which the option exercise is of the American type, relying on binomial trees. This section only considers European-style options (and zero coupon bonds), but leads to clear predictions and analytical evaluation formulae. For simplicity, we consider non-defaultable, and zero coupon, bonds.
Consider, first, callable bonds maturing at $S$, and let $K$ be the strike at which they can be called. Suppose that the date of exercise, if any, is some future time $T < S$. Repeating some of the reasoning in Chapter 11, assume exercise, in which case the issuer can buy its bonds back at $K$ and re-issue a zero-coupon bond at better market conditions, $P$, where $P$ denotes as usual the price of a non-callable bond. The difference, $P - K$, is just a net gain for the issuer. Therefore, the callable bond is worth just $K$ when $P \ge K$. Instead, if $P < K$, the issuer does not have any incentive to exercise and, then, the value of the callable bond is just that of a non-callable bond. Therefore, the callable bond is worth $P$ when $P < K$. To sum up, the value at $T$ of a callable bond is $\min\{P(r(T),T,S), K\}$. It is easy to see that,
\[
\min\{P, K\} = P - \max\{P - K, 0\}.
\]

Therefore, we see that the price of a callable bond with maturity date $S$ equals the price of a non-callable bond with the same maturity date $S$, minus the value to call the bond, which equals the price of a hypothetical option on the non-callable bond, struck at $K$.
We can apply these insights to price a callable bond in a concrete example. Consider, for example, the short-term rate in the Vasicek model. Then, if the short-term rate is $r$ at time $t$, the value as of time $t$ of the non-defaultable zero coupon bond maturing at time $S$, callable at time $T < S$, at a strike price equal to $K$, is,
\[
P^{callable}(r,t,T,S) = P(r,t,S) - \mathrm{Call}\left(t,T;P(t,S),K\right), \tag{12.98}
\]
where $P(r,t,S)$ is the value of the non-callable zero maturing at time $S$, and $\mathrm{Call}(t,T;P(t,S),K)$ is the value of a call option on the non-callable $S$-zero, maturing at time $T$ and having a strike price equal to $K$.
Eq. (12.98) shows that the presence of the option to call the bond raises the cost of capital
for the issuer.
In the context of the Vasicek model, the solution to $\mathrm{Call}(t,T;P(t,S),K)$ in Eq. (12.98) is given by Jamshidian's (1989) formula in Eq. (12.95), which we now use below. Figure 12.11 depicts the behavior of the price of the callable bond in Eq. (12.98), $P^{callable}(r,0,T,S)$, as a function of the short-term rate, $r$, when the exercise price is $K = 0.65$, the option maturity is $T = 0.5$, and the bond maturity is $S = 10$. Finally, to evaluate Eq. (12.98), we make use of the closed-form solution in
Eq. (12.36), and the parameter values = 0 2, = 0 06, = 0 03, = 1 7146 10 2 .

[Figure 12.11: bond price (vertical axis, from 0.50 to 0.70) plotted against the short-term rate (horizontal axis, from 0.00 to 0.10).]

FIGURE 12.11. Negative convexity. Solid line: the price of a callable bond. Dashed line: the price of a non-callable bond. The price of a callable bond exhibits negative convexity with respect to the short-term rate.

As Figure 12.11 illustrates, the convexity of the non-callable bond price is destroyed by the convexity of the price of the option embedded in the callable bond. Intuitively, as the short-term rate lowers, callable and non-callable bond prices increase. However, the price of callable bonds increases less because, as the short-term rate decreases, bond prices increase and, then, the probability that the issuer will exercise the option increases. As a result, the risk-neutral distribution of the callable bond price becomes markedly shifted towards the strike price, $K = 0.65$, implying a progressively lower decay rate for the bond price as the short-term rate gets small.
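The negative convexity just described can be checked numerically. The sketch below prices the callable zero as in Eq. (12.98) within the Vasicek model and compares second differences of callable and non-callable prices with respect to the short-term rate. Function names are hypothetical, and the parameter values ($\kappa = 0.2$, $\bar{\theta} = 0.06$, $\sigma = 0.03$) are assumptions made for illustration, not necessarily those underlying the figure.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def B(kappa, tau):
    """Vasicek loading (1 - exp(-kappa * tau)) / kappa."""
    return (1.0 - math.exp(-kappa * tau)) / kappa


def zero_price(r, tau, kappa, theta_bar, sigma):
    """Vasicek zero price for time to maturity tau (risk-neutral parameters)."""
    b = B(kappa, tau)
    a = (b - tau) * (theta_bar - sigma**2 / (2.0 * kappa**2)) - sigma**2 * b**2 / (4.0 * kappa)
    return math.exp(a - b * r)


def bond_call(r, t, T, S, K, kappa, theta_bar, sigma):
    """Jamshidian call on a zero maturing at S, expiring at T, struck at K."""
    p_T = zero_price(r, T - t, kappa, theta_bar, sigma)
    p_S = zero_price(r, S - t, kappa, theta_bar, sigma)
    v = sigma * math.sqrt((1.0 - math.exp(-2.0 * kappa * (T - t))) / (2.0 * kappa)) * B(kappa, S - T)
    d1 = (math.log(p_S / (K * p_T)) + 0.5 * v * v) / v
    return p_S * norm_cdf(d1) - K * p_T * norm_cdf(d1 - v)


def callable_price(r, t, T, S, K, kappa, theta_bar, sigma):
    """Callable zero: non-callable zero minus the embedded call, as in Eq. (12.98)."""
    return zero_price(r, S - t, kappa, theta_bar, sigma) - bond_call(r, t, T, S, K, kappa, theta_bar, sigma)


def second_difference(f, x, h=1e-3):
    """Central second difference, a proxy for convexity with respect to x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    par = dict(kappa=0.2, theta_bar=0.06, sigma=0.03)  # assumed values
    for r in (0.01, 0.03, 0.05, 0.08):
        plain = zero_price(r, 10.0, **par)
        called = callable_price(r, 0.0, 0.5, 10.0, 0.65, **par)
        print(r, round(plain, 4), round(called, 4))
```

Under these assumed parameters, the second difference of the callable price around $r = 0.03$, where the embedded option is close to the money, is negative, while that of the non-callable price is positive.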
What is the duration of a callable bond? Roughly, a five year bond with fixed coupons issued when interest rates are relatively high might resemble, so to speak, a three year conventional bond, as a likely decrease in the interest rates would lead the bond-issuer to redeem its debt at the strike price. To formalize this intuition, we determine the stochastic duration of the callable bond predicted by this model, using Eq. (12.31). For the Vasicek model, the semi-elasticity of the non-callable bond price with respect to $r$ is $B(t,S) = \frac{1}{\kappa}\left(1 - e^{-\kappa(S-t)}\right)$; its inverse with respect to time-to-maturity is
\[
B^{-1}(x) = -\frac{1}{\kappa}\ln\left(1 - \kappa x\right).
\]
Therefore, the stochastic duration for the callable bond predicted by the Vasicek model is, by Eq. (12.31):
\[
D(r,t) = -\frac{1}{\kappa}\ln\left(1 + \kappa\,\frac{P_r^{callable}(r,t,T,S)}{P^{callable}(r,t,T,S)}\right),
\]
where subscripts denote partial derivatives.
[In progress]
Next, we proceed with pricing the puttable bond. As explained in the previous chapter, Section 11.8, the payoff at the expiration of the bondholders' right to tender the bonds is:
\[
\max\{P, K\} = P + \max\{K - P, 0\},
\]
where $P$ is the price of a non-puttable bond. We can use, again, the Vasicek model to price the previous payoff. The price at $t$ of a non-defaultable zero-coupon bond maturing at time $S$, puttable at time $T < S$, at a strike price equal to $K$, when the short-term rate is $r$, is:
\[
P^{puttable}(r,t,T,S) = P(r,t,S) + \mathrm{Put}\left(t,T;P(t,S),K\right) = \mathrm{Call}\left(t,T;P(t,S),K\right) + K\,P(r,t,T),
\]
where $P(r,t,S)$ is the value of the non-puttable zero maturing at time $S$; $\mathrm{Put}(t,T;P(t,S),K)$ is the value of a put option on the non-puttable zero maturing at $S$, expiring at $T$ and struck at $K$; and the second equality follows by the put-call parity of Eq. (12.94), with $\mathrm{Call}(t,T;P(t,S),K)$ defined as in Eq. (12.98).
[In progress]
12.8.6 Options on fixed coupon bonds
For simplicity, we shall ignore issues regarding coupon accruals, and assume the expiration date of these options occurs at any of the reset dates. Therefore, the payoff of an option maturing at $T_0$ on a fixed coupon bond paying off the coupons $C_i$ at dates $T_1, \dots, T_n$ is:
\[
\left(P_{fcb}(r(T_0),T_0,T_n) - K\right)^+ = \left(P(r(T_0),T_0,T_n) + \sum_{i=1}^n C_i\,P(r(T_0),T_0,T_i) - K\right)^+. \tag{12.99}
\]
Evaluating the expectation of the payoff in Eq. (12.99) is somewhat problematic: the maximum between zero and a sum is obviously not, in general, the same as the sum of the maxima between zero and each element of the sum. Even with a model in which bond prices are log-normal, the sum of log-normals is not log-normal. However, this issue can be dealt with through a very well-known trick, described next.
Consider any of the models of the short-term rate reviewed in Section 12.4, in which the price of a zero is some function $P(r,t,T)$. Assume that $P(r,t,T)$ is decreasing in $r$,24 such that (under additional conditions) there exists a unique value of $r$, say $r^*(K)$, which solves the following equation:
\[
P(r^*,T_0,T_n) + \sum_{i=1}^n C_i\,P(r^*,T_0,T_i) = K. \tag{12.100}
\]
The payoff in Eq. (12.99) can then be written as:
\[
\left(\sum_{i=1}^n a_i\,P(r(T_0),T_0,T_i) - K\right)^+ = \left(\sum_{i=1}^n a_i\left[P(r(T_0),T_0,T_i) - P(r^*,T_0,T_i)\right]\right)^+,
\]
where $a_i = C_i$, $i = 1, \dots, n-1$, and $a_n = 1 + C_n$.
Next, note that the terms $P(r(T_0),T_0,T_i) - P(r^*,T_0,T_i)$ all have the same sign for all $i$.25 Therefore, the payoff in Eq. (12.99) is
\[
\left(\sum_{i=1}^n a_i\,P(r(T_0),T_0,T_i) - K\right)^+ = \sum_{i=1}^n a_i\left(P(r(T_0),T_0,T_i) - K_i(r^*)\right)^+, \qquad K_i(r^*) \equiv P(r^*,T_0,T_i). \tag{12.101}
\]

24 Bond prices are indeed always decreasing in the short-term rate in all one-factor stationary, Markov models of the short-term rate. However, this is not a general property in multi-factor models (see Mele, 2003).
25 Suppose that $P(r(T_0),T_0,T_1) > P(r^*,T_0,T_1)$. By Eq. (??), $r(T_0) < r^*$. Hence $P(r(T_0),T_0,T_2) > P(r^*,T_0,T_2)$, etc.



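Each term of the sum in Eq. (12.101) can be evaluated as an option on a pure discount bond with strike price equal to $K_i(r^*) = P(r^*,T_0,T_i)$, where the threshold $r^*$ is found numerically. The device to reduce the problem of an option on a fixed coupon bond to a problem involving the sum of options on zero coupon bonds was invented by Jamshidian (1989).
The price of the call on the fixed coupon bond is, therefore,
\[
\mathrm{Call}\left(t,T_0;P_{fcb}(t,T_n),K\right) = \sum_{i=1}^n a_i\,\mathrm{Call}\left(t,T_0;P(t,T_i),K_i(r^*)\right), \tag{12.102}
\]
where $r^*$ solves Eq. (12.100), and
\[
\mathrm{Call}\left(t,T_0;P(t,T_i),K_i(r^*)\right) = P(t,T_i)\,\Phi(d_{1,i}) - K_i(r^*)\,P(t,T_0)\,\Phi(d_{1,i} - v_i),
\]
\[
d_{1,i} = \frac{\ln\left[\frac{P(t,T_i)}{K_i(r^*)\,P(t,T_0)}\right] + \frac{1}{2}v_i^2}{v_i}, \qquad v_i = \sigma\sqrt{\frac{1 - e^{-2\kappa(T_0 - t)}}{2\kappa}}\,B(T_0,T_i).
\]
The price of a put can then be determined through the put-call parity in Eq. (12.94).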
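The decomposition in Eqs. (12.100) to (12.102) can be sketched as follows within the Vasicek model. The code is a minimal illustration under stated assumptions: the function names, the bisection bracket for $r^*$, and all parameter values are hypothetical, and coupons are taken to be cash amounts per unit of face value paid at the dates $T_1, \dots, T_n$.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def B(kappa, tau):
    """Vasicek loading (1 - exp(-kappa * tau)) / kappa."""
    return (1.0 - math.exp(-kappa * tau)) / kappa


def zero_price(r, tau, kappa, theta_bar, sigma):
    """Vasicek zero price for time to maturity tau (risk-neutral parameters)."""
    b = B(kappa, tau)
    a = (b - tau) * (theta_bar - sigma**2 / (2.0 * kappa**2)) - sigma**2 * b**2 / (4.0 * kappa)
    return math.exp(a - b * r)


def bond_call(r, t, T, S, K, kappa, theta_bar, sigma):
    """Jamshidian call on a zero maturing at S, expiring at T, struck at K."""
    p_T = zero_price(r, T - t, kappa, theta_bar, sigma)
    p_S = zero_price(r, S - t, kappa, theta_bar, sigma)
    v = sigma * math.sqrt((1.0 - math.exp(-2.0 * kappa * (T - t))) / (2.0 * kappa)) * B(kappa, S - T)
    d1 = (math.log(p_S / (K * p_T)) + 0.5 * v * v) / v
    return p_S * norm_cdf(d1) - K * p_T * norm_cdf(d1 - v)


def coupon_bond_call(r, t, T0, dates, coupons, K, kappa, theta_bar, sigma):
    """Call expiring at T0 on a coupon bond; returns (price, r_star, strikes)."""
    weights = list(coupons)
    weights[-1] += 1.0  # principal is repaid at the last date

    def bond_value(x):
        # Coupon bond value at T0 when the short-term rate equals x
        return sum(w * zero_price(x, Ti - T0, kappa, theta_bar, sigma)
                   for w, Ti in zip(weights, dates))

    lo, hi = -1.0, 1.0  # assumed bracket for r_star
    if not (bond_value(lo) > K > bond_value(hi)):
        raise ValueError("r_star is not bracketed by [-1, 1]")
    for _ in range(200):  # bisection: bond_value is decreasing in x
        mid = 0.5 * (lo + hi)
        if bond_value(mid) > K:
            lo = mid
        else:
            hi = mid
    r_star = 0.5 * (lo + hi)
    strikes = [zero_price(r_star, Ti - T0, kappa, theta_bar, sigma) for Ti in dates]
    price = sum(w * bond_call(r, t, T0, Ti, Ki, kappa, theta_bar, sigma)
                for w, Ti, Ki in zip(weights, dates, strikes))
    return price, r_star, strikes


if __name__ == "__main__":
    par = dict(kappa=0.2, theta_bar=0.06, sigma=0.03)  # assumed values
    dates = [2.0, 3.0, 4.0, 5.0, 6.0]
    coupons = [0.05] * 5
    print(coupon_bond_call(0.03, 0.0, 1.0, dates, coupons, 1.0, **par))
```

Bisection is used here only because it is transparent; any root finder would do, given that the coupon bond value is monotone in the short-term rate in this one-factor setting.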
Why are perfectly fitting models so important, in practice? Suppose that in Eq. (12.100), the critical value $r^*$ is determined through Vasicek's model. This assumption is attractive because it leads to evaluate the payoff in Eq. (12.101) through Jamshidian's formula of Section 12.8.4. However, this way to proceed does not ensure that the yield curve is perfectly fitted.
The natural alternative is to use the corresponding perfectly fitting extension, the Hull and White model in Section 12.8.4, i.e. Eq. (12.66), and use this price to calibrate $r^*$ in Eq. (12.100). Note, now, the importance of a perfectly fitting model. As mentioned in Section 12.8.4, both Jamshidian and its perfectly fitting extension agree regarding the price of an option on a zero. However, Jamshidian and its perfectly fitting extension would assign different values to options on coupon bearing bonds, because they would lead to different values for $r^*$ in Eq. (12.100) and, hence, different values for the fictitious strikes $P(r^*,T_0,T_i)$ in Eq. (12.101).
12.8.7 Interest rate swaps
A Savings and Loan (S&L, henceforth) is an institution that extends mortgage, car and personal loans to individual members, financed through savings. From the 1980s through the beginning of the 1990s, these forms of cooperative ventures entered into a deep and persistent crisis, leading to a painful Government bailout of about $125b under the George H.W. Bush administration. There are many causes of this crisis, but one of them was certainly the rise in short-term rates arising as a result of inflation and the attempts at fighting against it, the so-called Monetary Experiment mentioned in Section 12.4.5. But banking is risky precisely because it involves lending at horizons longer than those relating to borrowing, and S&L banking was not an exception to such modus operandi. Certainly, interest rate swaps could have helped in coping with the inversion of the yield curve of the time. We now examine the pricing of this derivative in detail.
12.8.7.1 Forward rate agreements again

Interest rate swaps are, in a sense, baskets of forward rate agreements. Consider a forward rate agreement in which the fixed rate does not clear its value at origination. Denote its value at origination with $\mathrm{FRA}(t,T,S;K)$, where $S - T$ is the debt-servicing period, and $K$ is the fixed rate, viz
\begin{align*}
\mathrm{FRA}(t,T,S;K) &= E_t\left[e^{-\int_t^S r(\tau)d\tau}(S-T)\left(L(T,S) - K\right)\right] \\
&= E_t\left[e^{-\int_t^T r(\tau)d\tau}P(T,S)(S-T)\left(L(T,S) - K\right)\right] \\
&= E_t\left[e^{-\int_t^T r(\tau)d\tau}\left(1 - \left(1 + (S-T)K\right)P(T,S)\right)\right] \\
&= P(t,T) - \left(1 + (S-T)K\right)P(t,S), \tag{12.103}
\end{align*}
where the third line holds by the definition of $L$ and the fourth follows by Eq. (12.5) given in Section 12.2.
Alternatively, note that the LIBOR, $L(T,S)$, while known at $T$, is only paid off at $S$, such that the value of $1 + (S-T)L(T,S)$ (to be delivered at $S$) is simply one at $T$ and, obviously, $P(t,T)$ at $t$. That is, the value of $(S-T)L(T,S)$ (to be delivered at time $S$) is $P(t,T) - P(t,S)$ at $t$, whence the fourth equality.26
Finally, replace the basic definition of the forward rate in Eq. (12.6) into Eq. (12.103), which leaves:
\[
\mathrm{FRA}(t,T,S;K) = (S-T)\left(F(t,T,S) - K\right)P(t,S). \tag{12.104}
\]
As is clear, FRA can take on any sign, and is exactly zero when $K = F(t,T,S)$, where $F(t,T,S)$ solves Eq. (12.6). Interest rate swaps are those where payment exchanges repeatedly occur over a given time horizon known as the swap tenor, as explained below.
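As a quick numerical check of Eqs. (12.103) and (12.104), the sketch below values a forward rate agreement in two ways from a pair of discount factors. The function names and the discount factors are hypothetical inputs chosen for illustration.

```python
def forward_rate(p_T, p_S, T, S):
    """Simply-compounded forward rate F(t,T,S) implied by the zeros P(t,T) and P(t,S)."""
    return (p_T / p_S - 1.0) / (S - T)


def fra_value(p_T, p_S, T, S, K):
    """Value of the FRA as in the last line of Eq. (12.103)."""
    return p_T - (1.0 + (S - T) * K) * p_S


def fra_value_from_forward(p_T, p_S, T, S, K):
    """Value of the FRA as in Eq. (12.104)."""
    return (S - T) * (forward_rate(p_T, p_S, T, S) - K) * p_S


if __name__ == "__main__":
    p_T, p_S = 0.97, 0.95  # assumed discount factors for T = 1.0 and S = 1.5
    print(forward_rate(p_T, p_S, 1.0, 1.5))
    print(fra_value(p_T, p_S, 1.0, 1.5, 0.03))
    print(fra_value_from_forward(p_T, p_S, 1.0, 1.5, 0.03))
```

The two valuation functions agree for any fixed rate, and both return zero when the fixed rate equals the forward rate.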
12.8.7.2 Forward starting swaps

An interest rate swap is simply an exchange of interest rate payments. One counterparty exchanges a fixed against a floating interest rate payment. The floating payment is typically a short-term interest rate. For example, the counterparty receiving a floating interest rate payment has good (or only) access to markets for variable interest rates, but wishes to pay fixed interest rates. Alternatively, the counterparty receiving a floating interest rate wants to hedge against changes in short-term rates, as might have been the case for S&L institutions during the 1980s. The counterparty receiving a floating interest rate payment and paying a fixed interest rate $K_{irs}$ has a payoff equal to,
\[
\delta_{i-1}\left(L(T_{i-1},T_i) - K_{irs}\right)
\]
at time $T_i$, $i = 1, \dots, n$, where $\delta_{i-1} = T_i - T_{i-1}$ as usual.
Each of these payments is really a FRA, and can be evaluated as in the previous section. By convention, we say that the swap payer is the counterparty who pays the fixed interest rate $K_{irs}$, and that the swap receiver is the counterparty receiving the fixed interest rate $K_{irs}$.
With a dedicated interest rate swap of this kind, a S&L institution would have locked in the yield curve: specifically, in this stylized example, the payoff for the financial institution would be $\delta_{i-1}\left(L^{long}(T_{i-1}) - L(T_{i-1},T_i)\right) + \delta_{i-1}\left(L(T_{i-1},T_i) - K_{irs}\right) = \delta_{i-1}\left(L^{long}(T_{i-1}) - K_{irs}\right)$, where $L^{long}(T_{i-1})$ is the interest rate gained over long-term assets. Naturally, if short-term interest rates had to go down, relative to $K_{irs}$, a S&L institution would not have benefited from the increased long-term/short-term spread, $\delta_{i-1}\left(L^{long}(T_{i-1}) - L(T_{i-1},T_i)\right)$. But clearly insuring against yield curve inversions is the thing to do, if yield curve inversions lead to bankruptcy and bankruptcy is costly. We shall see, below, that other products exist, such as caps or swaptions, which insure against the upside while at the same time freeing up the downside.

26 We are assuming, as is standard, that settlements occur at $S$. When settlements occur at $T$, the value of $1 + (S-T)L(T,S)$ at $T$ is obviously higher than one, and the calculations leading to the last line of Eq. (12.103) and the model-free expression for $F$ would not go through. The technical issue is a convexity effect by which a payoff of $1 + (S-T)L(T,S)$ at time $T$ is equivalent to a payoff of $(1 + (S-T)L(T,S))^2$ at time $S$. Brigo and Mercurio (2006, Chapter 13) and Veronesi (2010, Chapter 21) explain the standard market practice to deal with this issue.

The value at $t$ of a forward starting interest rate swap payer (i.e. a swap starting at $T_0 > t$), $p_{irs}(t)$ say, is:
\[
p_{irs}(t) = \sum_{i=1}^n E_t\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\delta_{i-1}\left(L(T_{i-1},T_i) - K_{irs}\right)\right] = \sum_{i=1}^n \mathrm{FRA}\left(t,T_{i-1},T_i;K_{irs}\right), \tag{12.105}
\]
where by Eq. (12.104),
\[
\mathrm{FRA}\left(t,T_{i-1},T_i;K_{irs}\right) = \delta_{i-1}\left(F(t,T_{i-1},T_i) - K_{irs}\right)P(t,T_i).
\]
The forward swap rate $R_{sw}$ is the value of $K_{irs}$ such that $p_{irs}(t) = 0$. Simple calculations yield:
\[
R_{sw}(t) = \frac{\sum_{i=1}^n \delta_{i-1}F(t,T_{i-1},T_i)\,P(t,T_i)}{\sum_{i=1}^n \delta_{i-1}P(t,T_i)} = \frac{P(t,T_0) - P(t,T_n)}{\sum_{i=1}^n \delta_{i-1}P(t,T_i)}, \tag{12.106}
\]
where the last equality holds by Eq. (12.6) in Section 12.2: $\delta_{i-1}F(t,T_{i-1},T_i)\,P(t,T_i) = P(t,T_{i-1}) - P(t,T_i)$.27 This expression collapses to the par coupon rate derived in Section 11.2.2.2 of Chapter 11, once we set $t = T_0$. That is, the spot swap rate is a par yield.
The first equality of Eq. (12.106) indicates that the forward swap rate is a linear combination of forward rates. This property has been used by Rebonato (1998) to deal with approximations of swaption values based on the market model (see Section 12.9.4), as discussed at length by Brigo and Mercurio (2006, pp. 247-249).
By plugging the expression for the forward swap rate in Eq. (12.106) into Eq. (12.105), we obtain the following intuitive expression for the forward swap payer:
\[
p_{irs}(t) = \sum_{i=1}^n \delta_{i-1}F(t,T_{i-1},T_i)\,P(t,T_i) - K_{irs}\sum_{i=1}^n \delta_{i-1}P(t,T_i) = \mathrm{PVBP}_t(T_1,\dots,T_n)\left(R_{sw}(t) - K_{irs}\right), \tag{12.107}
\]
where $\mathrm{PVBP}_t(T_1,\dots,T_n) \equiv \sum_{i=1}^n \delta_{i-1}P(t,T_i)$ is the so-called swap Present Value of the Basis Point, i.e. the present value impact of one basis point move in the forward swap rate at $t$.
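Eqs. (12.105) to (12.107) only require a set of discount factors. The following sketch computes the PVBP, the forward swap rate and the value of a payer swap, and can be used to verify that the sum of FRAs in Eq. (12.105) coincides with the compact expression in Eq. (12.107). Function names and the flat 4% continuously compounded curve in the example are assumptions made for illustration.

```python
import math


def pvbp(dfs, times):
    """Sum of delta_{i-1} * P(t, T_i), i = 1..n, with dfs[i] = P(t, times[i])."""
    return sum((times[i] - times[i - 1]) * dfs[i] for i in range(1, len(times)))


def forward_swap_rate(dfs, times):
    """Forward swap rate (P(t,T_0) - P(t,T_n)) / PVBP, as in Eq. (12.106)."""
    return (dfs[0] - dfs[-1]) / pvbp(dfs, times)


def payer_swap_value(dfs, times, k_irs):
    """Payer swap value as a sum of FRAs, as in Eq. (12.105)."""
    total = 0.0
    for i in range(1, len(times)):
        delta = times[i] - times[i - 1]
        fwd = (dfs[i - 1] / dfs[i] - 1.0) / delta  # forward rate F(t, T_{i-1}, T_i)
        total += delta * (fwd - k_irs) * dfs[i]
    return total


if __name__ == "__main__":
    times = [1.0, 1.5, 2.0, 2.5, 3.0]             # T_0, ..., T_n (assumed)
    dfs = [math.exp(-0.04 * T) for T in times]    # assumed flat curve
    rate = forward_swap_rate(dfs, times)
    print(rate, pvbp(dfs, times))
    print(payer_swap_value(dfs, times, 0.03))
    print(pvbp(dfs, times) * (rate - 0.03))       # same value, via Eq. (12.107)
```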
12.8.7.3 Marking to market

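While a forward starting swap is costless at origination, its marking to market updates are calculated as follows. Suppose we enter at $\tau$ into a forward starting swap originated at time $t < \tau$. The value of this position is,
\begin{align*}
p_{irs}^{m\text{-}t\text{-}m}(\tau) &= \sum_{i=1}^n E_\tau\left[e^{-\int_\tau^{T_i} r(u)du}\,\delta_{i-1}\left(L(T_{i-1},T_i) - R_{sw}(t)\right)\right] \\
&= \sum_{i=1}^n E_\tau\left[e^{-\int_\tau^{T_i} r(u)du}\,\delta_{i-1}\left(L(T_{i-1},T_i) - R_{sw}(\tau) + R_{sw}(\tau) - R_{sw}(t)\right)\right] \\
&= \mathrm{PVBP}_\tau(T_1,\dots,T_n)\left(R_{sw}(\tau) - R_{sw}(t)\right),
\end{align*}
where the third equality follows by the definition of a forward swap at $\tau$.

27 To cast this problem in terms of continuous time swap exchanges and, then, PDEs, we set $p_{irs}(r,T) \equiv 0$ as a boundary condition, and $\Pi(r) = r - k$, where $k$ plays the same role as $K_{irs}$ above. Then, if the bond price $P(r,\tau,s)$ is solution to Eq. (12.87), the following function, $p_{irs}(r,\tau) = 1 - P(r,\tau,T) - k\int_\tau^T P(r,\tau,s)ds$, does also satisfy Eq. (12.87).
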


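12.8.8 Caps & floors
A cap works as an interest rate swap, with the exception that the exchange of interest rate payments takes place only if actual interest rates are higher than $K$. A cap protects against upward movements in the interest rates, while freeing up the downside. By going long a cap, the S&L institution in the example of the previous section, then, would benefit from the downside in the short-term interest rates through a cap on them, literally. Precisely, a cap is made up of caplets. The payoff as of time $T_i$ of a caplet is:
\[
\delta_{i-1}\left(L(T_{i-1},T_i) - K\right)^+, \quad i = 1, \dots, n.
\]
Floors are defined in a similar way. They are baskets of single floorlets that pay off $\delta_{i-1}\left(K - L(T_{i-1},T_i)\right)^+$ at time $T_i$, $i = 1, \dots, n$.
We will only discuss caps. By the FTAP, the value $p_{cap}$ of a cap as of time $t$ is:
\[
p_{cap}(t) = \sum_{i=1}^n E_t\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\delta_{i-1}\left(L(T_{i-1},T_i) - K\right)^+\right]. \tag{12.108}
\]
Analytical solutions to this problem can be found relying on models of the short-term rate. First, we use the standard definition of simply compounded rates given in Section 12.2 (see Eq. (12.1)), viz $\delta_{i-1}L(T_{i-1},T_i) = \frac{1}{P(T_{i-1},T_i)} - 1$, and rewrite the caplet payoff as follows:
\[
\delta_{i-1}\left(L(T_{i-1},T_i) - K\right)^+ = \frac{1}{P(T_{i-1},T_i)}\left(1 - (1 + \delta_{i-1}K)\,P(T_{i-1},T_i)\right)^+.
\]
We have,
\begin{align*}
p_{cap}(t) &= \sum_{i=1}^n E_t\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\frac{1}{P(T_{i-1},T_i)}\left(1 - (1 + \delta_{i-1}K)\,P(T_{i-1},T_i)\right)^+\right] \\
&= \sum_{i=1}^n \frac{1}{K_i}\,E_t\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\left(K_i - P(T_{i-1},T_i)\right)^+\right], \qquad K_i = (1 + \delta_{i-1}K)^{-1}, \tag{12.109}
\end{align*}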
c
by
A. Mele

12.8. Interest rate derivatives

where the last equality follows by a simple calculation.28 For the models of Jamshidian or Hull & White, bond prices are such that the cap price in Eq. (12.109) can be expressed in closed-form. Indeed, Eq. (12.109) makes clear that a cap is a basket of puts on zero coupon bonds, with strikes $K_i$. As such, it can be priced in closed form, using the models in Section 12.8. We have:
\[
p_{cap}(t) = \sum_{i=1}^n \frac{1}{K_i}\,\mathrm{Put}\left(t,T_{i-1};P(t,T_i),K_i\right), \tag{12.110}
\]
where $\mathrm{Put}(\cdot)$ satisfies the put-call parity in Eq. (12.94), and, by the pricing formulae in Section 12.8.4,
\[
\mathrm{Call}\left(t,T_{i-1};P(t,T_i),K_i\right) = P(t,T_i)\,\Phi(d_{1,i}) - K_i\,P(t,T_{i-1})\,\Phi(d_{1,i} - v_i),
\]
\[
d_{1,i} = \frac{\ln\left[\frac{P(t,T_i)}{K_i\,P(t,T_{i-1})}\right] + \frac{1}{2}v_i^2}{v_i}, \qquad v_i^2 = \sigma^2\,\frac{1 - e^{-2\kappa(T_{i-1} - t)}}{2\kappa}\,B(T_{i-1},T_i)^2. \tag{12.111}
\]
Naturally, caps on interest rates, which are nothing but baskets of calls, are portfolios of puts on fixed coupon bonds, due to the inverse relation between prices and interest rates.29
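Eq. (12.110) translates into a short routine once a bond option formula is available. The sketch below prices a cap as a basket of puts on zeros within the Vasicek model, and a floor as the corresponding basket of calls. Function names and parameter values are hypothetical; the first reset date $T_0$ is assumed to be strictly later than $t$, so that every option in the basket has positive volatility.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def B(kappa, tau):
    """Vasicek loading (1 - exp(-kappa * tau)) / kappa."""
    return (1.0 - math.exp(-kappa * tau)) / kappa


def zero_price(r, tau, kappa, theta_bar, sigma):
    """Vasicek zero price for time to maturity tau (risk-neutral parameters)."""
    b = B(kappa, tau)
    a = (b - tau) * (theta_bar - sigma**2 / (2.0 * kappa**2)) - sigma**2 * b**2 / (4.0 * kappa)
    return math.exp(a - b * r)


def bond_call(r, t, T, S, K, kappa, theta_bar, sigma):
    """Call on a zero maturing at S, expiring at T, struck at K, as in Eq. (12.111)."""
    p_T = zero_price(r, T - t, kappa, theta_bar, sigma)
    p_S = zero_price(r, S - t, kappa, theta_bar, sigma)
    v = sigma * math.sqrt((1.0 - math.exp(-2.0 * kappa * (T - t))) / (2.0 * kappa)) * B(kappa, S - T)
    d1 = (math.log(p_S / (K * p_T)) + 0.5 * v * v) / v
    return p_S * norm_cdf(d1) - K * p_T * norm_cdf(d1 - v)


def bond_put(r, t, T, S, K, kappa, theta_bar, sigma):
    """Put from the put-call parity of Eq. (12.94)."""
    p_T = zero_price(r, T - t, kappa, theta_bar, sigma)
    p_S = zero_price(r, S - t, kappa, theta_bar, sigma)
    return bond_call(r, t, T, S, K, kappa, theta_bar, sigma) + K * p_T - p_S


def cap_price(r, t, resets, K, kappa, theta_bar, sigma):
    """Cap with dates resets = [T_0, ..., T_n], T_0 > t, as a basket of puts (Eq. 12.110)."""
    total = 0.0
    for i in range(1, len(resets)):
        k_i = 1.0 / (1.0 + (resets[i] - resets[i - 1]) * K)
        total += bond_put(r, t, resets[i - 1], resets[i], k_i, kappa, theta_bar, sigma) / k_i
    return total


def floor_price(r, t, resets, K, kappa, theta_bar, sigma):
    """Floor on the same dates, as the corresponding basket of calls on zeros."""
    total = 0.0
    for i in range(1, len(resets)):
        k_i = 1.0 / (1.0 + (resets[i] - resets[i - 1]) * K)
        total += bond_call(r, t, resets[i - 1], resets[i], k_i, kappa, theta_bar, sigma) / k_i
    return total


if __name__ == "__main__":
    par = dict(kappa=0.2, theta_bar=0.06, sigma=0.03)  # assumed values
    resets = [0.5, 1.0, 1.5, 2.0]
    print(cap_price(0.03, 0.0, resets, 0.04, **par))
    print(floor_price(0.03, 0.0, resets, 0.04, **par))
```

A useful check is the cap-floor parity: the difference between the cap and the floor with the same strike equals the value of the payer swap in Eq. (12.105) with fixed rate $K$.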
12.8.9 Swaptions
Let us elaborate further on the example of the S&L institution in the previous sections. The benefit for a S&L institution of buying caps is to be protected against upward movements in the short-term rates while ensuring the downside is freed up. These benefits arise, so to speak, period per period, in that a cap is a basket of options with different maturities. A swaption works differently, in that the optionality kicks in all together.
Suppose that at time $t$, the S&L institution is still concerned about future inversions of the yield curve and, therefore, anticipates it might need to go long a swap payer at some future date. At the same time, the institution might fear that in the future, swap rates will be lower relative to some reference strike. Swaptions allow to free up such a downside risk, being options to enter a swap contract on a future date. Let $T_0$ be the maturity date of this option. Then, come time $T_0$, the payoff for a payer swaption is the maximum between zero and the value of a payer interest rate swap at $T_0$, $p_{irs}(T_0)$, viz
\[
\left(p_{irs}(T_0)\right)^+ = \left(\sum_{i=1}^n \mathrm{FRA}\left(T_0,T_{i-1},T_i;K_{irs}\right)\right)^+ = \left(\sum_{i=1}^n \delta_{i-1}\left(F(T_0,T_{i-1},T_i) - K_{irs}\right)P(T_0,T_i)\right)^+. \tag{12.112}
\]

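28 By the law of iterated expectations,
\begin{align*}
E_t\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\frac{\left(1 - (1 + \delta_{i-1}K)\,P(T_{i-1},T_i)\right)^+}{P(T_{i-1},T_i)}\right]
&= E_t\left[E\left(\left.e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\,e^{-\int_{T_{i-1}}^{T_i} r(\tau)d\tau}\,\frac{\left(1 - (1 + \delta_{i-1}K)\,P(T_{i-1},T_i)\right)^+}{P(T_{i-1},T_i)}\,\right|\mathcal{F}_{T_{i-1}}\right)\right] \\
&= E_t\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\,\frac{\left(1 - (1 + \delta_{i-1}K)\,P(T_{i-1},T_i)\right)^+}{P(T_{i-1},T_i)}\,E\left(\left.e^{-\int_{T_{i-1}}^{T_i} r(\tau)d\tau}\,\right|\mathcal{F}_{T_{i-1}}\right)\right] \\
&= E_t\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\left(1 - (1 + \delta_{i-1}K)\,P(T_{i-1},T_i)\right)^+\right].
\end{align*}
29 We might also price caps and floors through the partial differential equation (12.87), after setting $\Pi(r) = (r - k)^+$ (caps) and $\Pi(r) = (k - r)^+$ (floors), for some strike $k$. However, this type of contract, where payoffs are paid continuously in time, is highly stylized, and does not exist in the markets.
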
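By the FTAP, the value of the payer swaption at time $t$ is:
\begin{align*}
p_{swpn}(t) &= E_t\left[e^{-\int_t^{T_0} r(\tau)d\tau}\left(\sum_{i=1}^n \delta_{i-1}\left(F(T_0,T_{i-1},T_i) - K_{irs}\right)P(T_0,T_i)\right)^+\right] \\
&= E_t\left[e^{-\int_t^{T_0} r(\tau)d\tau}\left(1 - P(T_0,T_n) - K_{irs}\sum_{i=1}^n \delta_{i-1}P(T_0,T_i)\right)^+\right], \tag{12.113}
\end{align*}
where we used the relation $\delta_{i-1}F(T_0,T_{i-1},T_i) = \frac{P(T_0,T_{i-1})}{P(T_0,T_i)} - 1$.
Eq. (12.113) is the expression for the price of a put option on a fixed coupon bond struck at one. Therefore, we can price this contract in closed-form, through the models in Section 12.8.4, similarly to what we did in the previous section while pricing caps. We have:
\[
p_{swpn}(t) = \mathrm{Put}\left(t,T_0;P_{fcb}(t,T_n),1\right),
\]
where $\mathrm{Put}(\cdot)$ satisfies the put-call parity in Eq. (12.94). By the pricing formulae in Section 12.8.4,
\[
\mathrm{Call}\left(t,T_0;P_{fcb}(t,T_n),1\right) = K_{irs}\sum_{i=1}^n \delta_{i-1}\,\mathrm{Call}\left(t,T_0;P(t,T_i),K_i(r^*)\right) + \mathrm{Call}\left(t,T_0;P(t,T_n),K_n(r^*)\right),
\]
where $\mathrm{Call}(t,T_0;P(t,T_i),K_i(r^*))$ is as in Eq. (12.111), with $K_i(r^*) = P(r^*,T_0,T_i)$, and $r^*$ solution to Eq. (12.100) for $K = 1$.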

12.9 Market models


12.9.1 Models and market practice
The previous section illustrates that models of the short-term rate (along with their perfectly fitting extensions) lead to closed-form expressions for the price of important interest rate sensitive products. Yet practitioners evaluate caps, floors or swaptions through Black's (1976) formula, as explained in detail below. The assumption underlying market practice is that the simply-compounded forward rate is lognormally distributed. As it turns out, the analytically tractable (Gaussian) short-term rate models are not consistent with this assumption. Clearly, the (Gaussian) Vasicek model does not predict that the simply-compounded forward rates are Geometric Brownian motions.30
How to address these issues while relying on a non-Markovian HJM? Some qualifications are in order. A practical difficulty with HJM is that instantaneous forward rates are not observed, which at first sight seems to be a hindrance to realistic pricing of caps, floors and swaptions. Brace, Gatarek and Musiela (1997), Jamshidian (1997) and Miltersen, Sandmann and Sondermann (1997) address this issue, and note that the HJM framework can be somehow forced to produce models ready to be used consistently with market practice.
The key feature of the market models is the emphasis on the dynamics of the simply-compounded forward rates. One additional, and technical, assumption is that these simply-compounded forward rates are lognormal under the risk-neutral probability $Q$. That is, given a non-decreasing sequence of reset times $\{T_i\}_{i=0,1,\dots}$, each simply-compounded rate, $F_i$, is solution to the following stochastic differential equation:31
\[
\frac{dF_i(\tau)}{F_i(\tau)} = m_i(\tau)\,d\tau + \gamma_i(\tau)\,d\tilde{W}(\tau), \quad i = 0, \dots, n-1, \tag{12.114}
\]
where to simplify notation, we have set $F_i(\tau) \equiv F(\tau,T_i,T_{i+1})$, and $m_i$ and $\gamma_i$ are some deterministic functions of time ($\gamma_i$ is vector valued). From a mathematical point of view, the assumption that $F_i$ follows Eq. (12.114) is innocuous.32

30 Indeed, $1 + \delta_i F_i(\tau) = \frac{P(\tau,T_i)}{P(\tau,T_{i+1})} = \exp\left[\Delta A_i(\tau) - \Delta B_i(\tau)\,r(\tau)\right]$, where $\Delta A_i(\tau) = A(\tau,T_i) - A(\tau,T_{i+1})$, and $\Delta B_i(\tau) = B(\tau,T_i) - B(\tau,T_{i+1})$. Hence, $F_i(\tau)$ is not a Geometric Brownian motion, despite the fact that the short-term rate is Gaussian and, hence, the bond price is log-normal. Black 76 cannot be applied in this context.

As we shall show, this simple framework allows us to use the simple Black's (1976) formula to price caps and floors. However, we need to emphasize that there is nothing wrong with the short-term rate models analyzed in previous sections. The real advance of the so-called market model is to give a rigorous foundation to the standard market practice of pricing caps, floors and swaptions by means of Black's (1976) formula.
12.9.2 Simply-compounded forward rate dynamics, and no-arb restrictions
By the definition of the simply-compounded forward rates in Eq. (12.6),
\[
\ln\left[\frac{P(\tau,T_i)}{P(\tau,T_{i+1})}\right] = \ln\left(1 + \delta_i F_i(\tau)\right). \tag{12.115}
\]
The logic we follow, now, is the same as that underlying the HJM representation of Section 12.7. We wish to express the volatility of bond prices in terms of the volatility of forward rates. To achieve this task, we first assume that bond prices are driven by Brownian motions and expand the L.H.S. of Eq. (12.115) (step 1). Then, we expand the R.H.S. of Eq. (12.115) (step 2). Finally, we identify the two diffusion terms derived from the previous two steps (step 3).
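Step 1: Let $P_i \equiv P(\tau,T_i)$, and assume that under the risk-neutral probability $Q$, $P_i$ is solution to:
\[
\frac{dP_i}{P_i} = r\,d\tau + \sigma_{bi}\,d\tilde{W}.
\]
In terms of the HJM framework in Section 12.7,
\[
\sigma_{bi}(\tau) = -\sigma^I(\tau,T_i) = -\int_\tau^{T_i}\sigma(\tau,\ell)\,d\ell, \tag{12.116}
\]
where $\sigma(\tau,\ell)$ is the instantaneous volatility of the instantaneous $\ell$-forward rate as of time $\tau$. By Itô's lemma,
\[
d\ln\left[\frac{P(\tau,T_i)}{P(\tau,T_{i+1})}\right] = -\frac{1}{2}\left(\|\sigma_{bi}\|^2 - \|\sigma_{b,i+1}\|^2\right)d\tau + \left(\sigma_{bi} - \sigma_{b,i+1}\right)d\tilde{W}. \tag{12.117}
\]

31 Brace, Gatarek and Musiela (1997) formulate a model in terms of the spot simply-compounded LIBOR interest rates. Because $L(T_i,T_{i+1}) = F_i(T_i)$, the two derivations are essentially the same.
32 It is well-known that lognormal instantaneous forward rates are problematic as they imply the money market account is ill-behaved. Sandmann and Sondermann (1997) provide a succinct overview on how this problem is handled with simply-compounded forward rates.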

695

c
by
A. Mele

12.9. Market models


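Step 2: Applying Itô's lemma to $\ln(1+\delta_iF_i)$, and using Eq. (12.114), yields:

$$\begin{aligned} d\ln(1+\delta_iF_i) &= \frac{\delta_i}{1+\delta_iF_i}\,dF_i - \frac{1}{2}\frac{\delta_i^2}{(1+\delta_iF_i)^2}\,(dF_i)^2 \\ &= \left[\frac{\delta_i m_iF_i}{1+\delta_iF_i} - \frac{1}{2}\frac{\delta_i^2F_i^2\|\gamma_i\|^2}{(1+\delta_iF_i)^2}\right]d\tau + \frac{\delta_iF_i}{1+\delta_iF_i}\,\gamma_i\,d\tilde{W}. \end{aligned} \tag{12.118}$$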

Step 3: By Eq. (12.115), the diffusion terms in Eqs. (12.117) and (12.118) have to be the same. Therefore,

$$\sigma_{bi}(\tau) - \sigma_{b,i+1}(\tau) = \frac{\delta_iF_i(\tau)}{1+\delta_iF_i(\tau)}\,\gamma_i(\tau),\quad \tau\in[t,T_i].$$

By summing over $i$, we get the following no-arbitrage restriction applying to the volatility of the bond prices:

$$\sigma_{bi}(\tau) - \sigma_{b0}(\tau) = -\sum_{j=0}^{i-1}\frac{\delta_jF_j(\tau)}{1+\delta_jF_j(\tau)}\,\gamma_j(\tau). \tag{12.119}$$

Eq. (12.119) is, thus, a restriction to the general HJM framework. In other words, assume the instantaneous forward rates are as in Eq. (12.72) of Section 12.7. As shown in Section 12.7, the bond price volatility is, then, given by Eq. (12.116). But if we also assume that simply-compounded forward rates are solution to Eq. (12.114), then, the bond price volatility needs to equal that in Eq. (12.119). Comparing Eq. (12.116) with Eq. (12.119) produces,

$$\int_{T_0}^{T_i}\sigma(\tau,\ell)\,d\ell = \sum_{j=0}^{i-1}\frac{\delta_jF_j(\tau)}{1+\delta_jF_j(\tau)}\,\gamma_j(\tau).$$

The practical interest in restricting the forward-rate volatility dynamics in this way lies in the possibility of obtaining closed-form solutions for some of the interest rate derivatives surveyed in Section 12.8.
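
The restriction in Eq. (12.119) can be illustrated numerically. The following is a minimal sketch, assuming a single Brownian factor (so that each forward-rate volatility is a scalar) and hypothetical input values; the function name `bond_vol_spreads` is ours and does not appear in the text.

```python
def bond_vol_spreads(forwards, accruals, gammas):
    """Return [sigma_b1 - sigma_b0, ..., sigma_bn - sigma_b0] implied by Eq. (12.119).

    forwards[j] = F_j, accruals[j] = delta_j, gammas[j] = gamma_j (scalar, one factor:
    an assumption made here only to keep the example short).
    """
    spreads = []
    running = 0.0
    for f, delta, gamma in zip(forwards, accruals, gammas):
        # Each reset date adds the term -delta_j F_j / (1 + delta_j F_j) * gamma_j.
        running -= delta * f / (1.0 + delta * f) * gamma
        spreads.append(running)
    return spreads


# Hypothetical semi-annual forward curve with 20% lognormal volatilities.
example = bond_vol_spreads([0.03, 0.035, 0.04], [0.5, 0.5, 0.5], [0.2, 0.2, 0.2])
print(example)
```

With positive forward rates and volatilities, the spreads are negative and decreasing in maturity, that is, longer bonds load more negatively on the shock than the bond maturing at $T_0$.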
12.9.3 Applications to derivative evaluation
12.9.3.1 Forward rates as martingales

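Forward rates are martingales under the forward probability. Regarding the continuously compounded notion, we have, by the definition of the instantaneous forward rate, the usual change of probability and standard regularity conditions,

$$f(t,T) = -\frac{\partial\ln P(t,T)}{\partial T} = -\frac{\partial P(t,T)/\partial T}{P(t,T)} = \mathbb{E}\left[\frac{e^{-\int_t^Tr(\tau)d\tau}}{P(t,T)}\,r(T)\right] = \mathbb{E}\left[\eta_T(T)\,r(T)\right] = \mathbb{E}_{Q_F^T}\left[r(T)\right] = \mathbb{E}_{Q_F^T}\left[f(T,T)\right],$$

where the last equality holds as $r(T) = f(T,T)$, as usual.

Similarly, and regarding simply-compounded forward rates, we have that for a given sequence of dates $\{T_i\}_{i=0,1,\cdots}$,

$$F(\tau,T_i,T_{i+1}) = \mathbb{E}_{Q_F^{T_{i+1}}}\left[F(T_i,T_i,T_{i+1})\right] = \mathbb{E}_{Q_F^{T_{i+1}}}\left[L(T_i)\right], \tag{12.120}$$

where the second equality follows by Eq. (12.7), i.e. $F(T_i,T_i,T_{i+1}) = L(T_i)$. To show Eq. (12.120), note that by definition, the simply-compounded forward rate $F(t,T,S)$ satisfies,

$$F(t,T,S):\ 0 = \mathbb{E}\left[e^{-\int_t^Sr(\tau)d\tau}\left(L(T) - F(t,T,S)\right)\right],$$

as explained in Section 12.7.1. That is, by rearranging terms,

$$F(t,T,S) = \mathbb{E}\left[\frac{e^{-\int_t^Sr(\tau)d\tau}}{P(t,S)}\,L(T)\right] = \mathbb{E}_{Q_F^S}\left[L(T)\right]. \tag{12.121}$$

We use this martingale property whilst pricing caps and floors.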


12.9.3.2 Caps & floors

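We provide analytical results for the price of caps only. We have:

$$p_{\mathrm{cap}}(t) = \sum_{i=1}^n\mathbb{E}\left[e^{-\int_t^{T_i}r(\tau)d\tau}\,\delta_{i-1}\left(F(T_{i-1},T_{i-1},T_i)-K\right)^+\right] = \sum_{i=1}^n\delta_{i-1}P(t,T_i)\,\mathbb{E}_{Q_F^{T_i}}\left[\left(F(T_{i-1},T_{i-1},T_i)-K\right)^+\right], \tag{12.122}$$

where $\mathbb{E}_{Q_F^{T_i}}(\cdot)$ denotes, as usual, the expectation taken under the $T_i$-forward martingale probability $Q_F^{T_i}$; the first equality is Eq. (12.108); and the second equality follows by a change of probability, from the risk-neutral to the forward.

A key point explained above is that $F_{i-1}(\tau)\equiv F(\tau,T_{i-1},T_i)$ is a martingale under $Q_F^{T_i}$ for $\tau\in[t,T_{i-1}]$ (see Eq. (12.121)), such that by Eq. (12.114), $F_{i-1}$ is solution to:

$$\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau)\,dW^{Q_F^{T_i}}(\tau),\quad \tau\in[t,T_{i-1}],\quad i=1,\cdots,n,$$

under $Q_F^{T_i}$. Therefore, the cap price in Eq. (12.122) reduces to Black's (1976) formula discussed in Chapter 10 (see Section 10.4.4 and Appendix 2 to Chapter 10), once we assume $\gamma_{i-1}$ is deterministic:

$$\mathbb{E}_{Q_F^{T_i}}\left[\left(F(T_{i-1},T_{i-1},T_i)-K\right)^+\right] = F_{i-1}(t)\,\Phi(d_{1,i-1}) - K\,\Phi(d_{1,i-1}-s_{i-1}), \tag{12.123}$$

where

$$d_{1,i-1} = \frac{\ln\frac{F_{i-1}(t)}{K}+\frac{1}{2}s_{i-1}^2}{s_{i-1}},\qquad s_{i-1}^2 = \int_t^{T_{i-1}}\|\gamma_{i-1}(\tau)\|^2\,d\tau.$$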
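
The cap formula in Eqs. (12.122)-(12.123) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the forward-rate volatility is taken to be a constant scalar `gamma` (so that the integrated volatility is `gamma * sqrt(T_{i-1} - t)`), and the function names and input values are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_call(forward, strike, total_vol):
    """Undiscounted Black (1976) value F*Phi(d1) - K*Phi(d1 - s),
    where s is the integrated volatility up to the reset date."""
    if total_vol <= 0.0:
        return max(forward - strike, 0.0)
    d1 = (math.log(forward / strike) + 0.5 * total_vol ** 2) / total_vol
    return forward * norm_cdf(d1) - strike * norm_cdf(d1 - total_vol)


def cap_price(forwards, discounts, accruals, resets, strike, gamma):
    """Sum of caplets, each evaluated under its own forward probability.

    forwards[i-1]  = F_{i-1}(t),   discounts[i-1] = P(t, T_i),
    accruals[i-1]  = delta_{i-1},  resets[i-1]    = T_{i-1} - t,
    gamma          = constant scalar volatility (an assumption of this sketch).
    """
    total = 0.0
    for f, p, delta, tau in zip(forwards, discounts, accruals, resets):
        total += delta * p * black76_call(f, strike, gamma * math.sqrt(tau))
    return total


# Hypothetical three-caplet cap struck at 3.5%.
price = cap_price([0.03, 0.035, 0.04], [0.985, 0.968, 0.949],
                  [0.5, 0.5, 0.5], [0.5, 1.0, 1.5], 0.035, 0.2)
print(price)
```

Setting `gamma` to zero collapses each caplet to its discounted intrinsic value, which is a convenient sanity check.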



12.9.3.3 Swaptions

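By Eq. (12.107), the payoff of a payer swaption expiring at time $T_0$ is:

$$\left(p_{\mathrm{irs}}(T_0)\right)^+ = \left[\mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)\left(R_{\mathrm{sw}}(T_0)-K_{\mathrm{irs}}\right)\right]^+ = \mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)\left(R_{\mathrm{sw}}(T_0)-K_{\mathrm{irs}}\right)^+.$$

Therefore, by the FTAP, and a change of probability,

$$p_{\mathrm{swpn}}(t) = \mathbb{E}\left[e^{-\int_t^{T_0}r(\tau)d\tau}\,\mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)\left(R_{\mathrm{sw}}(T_0)-K_{\mathrm{irs}}\right)^+\right] = \mathrm{PVBP}_t(T_1,\cdots,T_n)\,\mathbb{E}_{Q_{\mathrm{sw}}}\left[\left(R_{\mathrm{sw}}(T_0)-K_{\mathrm{irs}}\right)^+\right], \tag{12.124}$$

where $\mathbb{E}_{Q_{\mathrm{sw}}}$ denotes the expectation taken under the so-called forward swap probability, defined by:

$$\left.\frac{dQ_{\mathrm{sw}}}{dQ}\right|_{\mathcal{F}_{T_0}} = e^{-\int_t^{T_0}r(\tau)d\tau}\,\frac{\mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)}{\mathrm{PVBP}_t(T_1,\cdots,T_n)}.$$

It is easy to see that $\mathbb{E}\left[\left.\frac{dQ_{\mathrm{sw}}}{dQ}\right|_{\mathcal{F}_{T_0}}\right] = 1$, by using the definition of $\mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)$, the pricing equation, $P(t,T_i) = \mathbb{E}\left[e^{-\int_t^{T_0}r(\tau)d\tau}P(T_0,T_i)\right]$, as anticipated in Chapter 4. As also mentioned in Chapter 4, $Q_{\mathrm{sw}}$ is also sometimes referred to as annuity probability.

The key point underlying this change of probability is that the forward swap rate $R_{\mathrm{sw}}$ is a $Q_{\mathrm{sw}}$-martingale,³³ and clearly, positive. Therefore, it must satisfy:

$$\frac{dR_{\mathrm{sw}}(\tau)}{R_{\mathrm{sw}}(\tau)} = \gamma_{\mathrm{sw}}(\tau)\,dW_{\mathrm{sw}}(\tau),\quad \tau\in[t,T_0], \tag{12.125}$$

where $W_{\mathrm{sw}}$ is a $Q_{\mathrm{sw}}$-Brownian motion, and $\gamma_{\mathrm{sw}}(\tau)$ is adapted.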


If the vol $\gamma_{\mathrm{sw}}(\tau)$ in Eq. (12.125) is deterministic, we can use Black 76 to price the payer swaption in Eq. (12.124) in closed-form. We have:

$$p_{\mathrm{swpn}}(t) = \mathrm{PVBP}_t(T_1,\cdots,T_n)\cdot\mathrm{Black76}\left(R_{\mathrm{sw}}(t);T_0,K_{\mathrm{irs}},\sqrt{V}\right), \tag{12.126}$$

where $\mathrm{Black76}(\cdot)$ is given by Black's (1976) formula:

$$\mathrm{Black76}\left(R_{\mathrm{sw}}(t);T_0,K_{\mathrm{irs}},\sqrt{V}\right) = R_{\mathrm{sw}}(t)\,\Phi(d_t) - K_{\mathrm{irs}}\,\Phi\left(d_t-\sqrt{V}\right),\quad d_t = \frac{\ln\frac{R_{\mathrm{sw}}(t)}{K_{\mathrm{irs}}}+\frac{1}{2}V}{\sqrt{V}},\quad V = \int_t^{T_0}\gamma_{\mathrm{sw}}(\tau)^2\,d\tau.$$

12.9.3.4 Inconsistencies

If the forward rate is solution to Eq. (12.114), $\gamma_{\mathrm{sw}}$ cannot be deterministic. Unfortunately, if forward swap rates are lognormal, then, Eq. (12.114) does not hold. Therefore, we may use Black's formula to price either caps or swaptions, not both. This might limit the importance of market models. A couple of tricks seem to work in practice. The best known is based on a suggestion by Rebonato (1998), to replace the true pricing problem with an approximating pricing problem where $\gamma_{\mathrm{sw}}$ is deterministic. That works in practice, but in a world with stochastic volatility, we should expect that trick to generate unstable results in periods experiencing highly volatile volatility. See, also, Rebonato (1999) for an essay on related issues. The next section suggests using numerical approximations based on Monte Carlo techniques.

33 By Eq. (12.106), and one change of measure,
$$\mathbb{E}_{Q_{\mathrm{sw}}}\left[R_{\mathrm{sw}}(T_0)\right] = \mathbb{E}_{Q_{\mathrm{sw}}}\left[\frac{P(T_0,T_0)-P(T_0,T_n)}{\mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)}\right] = \frac{\mathbb{E}\left[e^{-\int_t^{T_0}r(\tau)d\tau}\left(P(T_0,T_0)-P(T_0,T_n)\right)\right]}{\mathrm{PVBP}_t(T_1,\cdots,T_n)} = \frac{P(t,T_0)-P(t,T_n)}{\mathrm{PVBP}_t(T_1,\cdots,T_n)} = R_{\mathrm{sw}}(t).$$
12.9.3.5 Numerical approximations

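Suppose forward rates are lognormal. Then, we can price caps using Black's formula and we proceed to price swaptions relying on the general HJM framework, as summarized by its restrictions in Eq. (12.119), and Monte Carlo integration, as follows. By a change of probability,

$$p_{\mathrm{swpn}}(t) = \mathbb{E}\left[e^{-\int_t^{T_0}r(\tau)d\tau}\left(\sum_{i=1}^n\delta_{i-1}\left(F_{i-1}(T_0)-K\right)P(T_0,T_i)\right)^+\right] = P(t,T_0)\,\mathbb{E}_{Q_F^{T_0}}\left[\left(\sum_{i=1}^n\delta_{i-1}\left(F_{i-1}(T_0)-K\right)P(T_0,T_i)\right)^+\right],$$

where $F_{i-1}(T_0)$, $i=1,\cdots,n$, can be simulated under $Q_F^{T_0}$.

Details are as follows. We know that $F_{i-1}(\tau)$ is solution to,

$$\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau)\,dW^{Q_F^{T_i}}(\tau), \tag{12.127}$$

and that by results in Appendix 3,

$$\begin{aligned} dW^{Q_F^{T_i}}(\tau) &= dW^{Q_F^{T_0}}(\tau) - \left[\sigma_{bi}(\tau)-\sigma_{b0}(\tau)\right]^\top d\tau \\ &= dW^{Q_F^{T_0}}(\tau) + \sum_{j=0}^{i-1}\frac{\delta_jF_j(\tau)}{1+\delta_jF_j(\tau)}\,\gamma_j(\tau)^\top d\tau, \end{aligned}$$

where the second line follows from Eq. (12.119). Replacing this into Eq. (12.127) leaves:

$$\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau)\sum_{j=0}^{i-1}\frac{\delta_jF_j(\tau)}{1+\delta_jF_j(\tau)}\,\gamma_j(\tau)^\top d\tau + \gamma_{i-1}(\tau)\,dW^{Q_F^{T_0}}(\tau),\quad i=1,\cdots,n.$$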
These can easily be simulated with the methods described in any standard textbook of this
kind, such as that of Kloeden and Platen (1992).
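
The simulation scheme just described can be sketched as follows. This is a minimal illustration, not a production implementation: it assumes one Brownian factor with constant scalar volatilities, uses a simple Euler scheme on the logarithm of each forward rate under the $T_0$-forward probability, and relies on hypothetical inputs. The function name `simulate_swaption` and all parameter names are ours.

```python
import math
import random


def simulate_swaption(f0, deltas, gammas, strike, p_t_T0, expiry,
                      n_steps=50, n_paths=2000, seed=0):
    """Monte Carlo payer swaption price under the T_0-forward probability.

    f0[j] = F_j(t), deltas[j] = delta_j, gammas[j] = gamma_j (constant scalars,
    a one-factor assumption), p_t_T0 = P(t, T_0), expiry = T_0 - t.
    """
    rng = random.Random(seed)
    dt = expiry / n_steps
    n = len(f0)
    payoff_sum = 0.0
    for _ in range(n_paths):
        f = list(f0)
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            cum = 0.0
            new_f = []
            for j in range(n):
                # Drift of F_j: gamma_j * sum_{l<=j} delta_l F_l / (1 + delta_l F_l) * gamma_l
                cum += deltas[j] * f[j] / (1.0 + deltas[j] * f[j]) * gammas[j]
                drift = gammas[j] * cum
                new_f.append(f[j] * math.exp((drift - 0.5 * gammas[j] ** 2) * dt
                                             + gammas[j] * dw))
            f = new_f
        # Payoff at T_0: (sum_i delta_{i-1} (F_{i-1} - K) P(T_0, T_i))^+,
        # with P(T_0, T_i) rebuilt from the simulated forward rates.
        disc, value = 1.0, 0.0
        for j in range(n):
            disc /= 1.0 + deltas[j] * f[j]
            value += deltas[j] * (f[j] - strike) * disc
        payoff_sum += max(value, 0.0)
    return p_t_T0 * payoff_sum / n_paths


print(simulate_swaption([0.03, 0.035, 0.04], [0.5] * 3, [0.2] * 3, 0.035, 0.97, 1.0))
```

The number of paths and time steps are kept small here for readability; in practice both would be increased and a variance-reduction technique would typically be added.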

12.10 Volatility surfaces


12.10.1 Implied volatilities
12.10.1.1 Caps & floors

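Market practice quoting conventions rely on volatility surfaces stemming from the market models of the previous section, rather than those in Sections 12.8.7-12.8.9. The models in Sections 12.8.7-12.8.9 could actually be exploited to produce volatility surfaces, albeit indirectly, after calibration of their two parameters, as Eq. (12.110) indicates. However, it is easier to provide volatility surfaces in the first place, through the models of this section. Quite simply, practitioners use Eq. (12.123) and quote volatilities such that the market price of a cap equals the value predicted by Eq. (12.123) using the desired implied volatility. In Eq. (12.123), the volatility input is $\sqrt{\int_t^{T_{i-1}}\gamma_{i-1}(\tau)^2d\tau}$ for some $\gamma_{i-1}(\tau)$, although, then, practitioners simply quote the value of $\sigma$ that satisfies:

$$\sigma:\quad p^{\$}_{\mathrm{cap}}(t;T_n) = \sum_{i=1}^n\delta_{i-1}P(t,T_i)\,\mathrm{Black76}\left(F_{i-1}(t);T_{i-1},K,\sigma\right),$$

where $p^{\$}_{\mathrm{cap}}(t;T_n)$ is the market price of the cap, and:

$$\mathrm{Black76}\left(F_{i-1}(t);T_{i-1},K,\sigma\right) = F_{i-1}(t)\,\Phi(d_{i-1}) - K\,\Phi\left(d_{i-1}-\sigma\sqrt{T_{i-1}-t}\right),\quad d_{i-1} = \frac{\ln\frac{F_{i-1}(t)}{K}+\frac{1}{2}\sigma^2(T_{i-1}-t)}{\sigma\sqrt{T_{i-1}-t}}.$$

Given the quoted volatilities $\bar\sigma(T_n)$, we can bootstrap $\sigma(T_{i-1})$, i.e. we can recursively solve for $\sigma(T_{i-1})$, as follows:

$$0 = \sum_{i=1}^n\delta_{i-1}P(t,T_i)\left[\mathrm{Black76}\left(F_{i-1}(t);T_{i-1},K,\bar\sigma(T_n)\right) - \mathrm{Black76}\left(F_{i-1}(t);T_{i-1},K,\sigma(T_{i-1})\right)\right],\quad n=1,\cdots,N,$$

where $T_N$ is the latest available maturity. The values of $\sigma(T_{i-1})$ amount to what is typically referred to as the term structure of caps volatilities.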
12.10.1.2 Swaptions

As for swaptions, the situation is much simpler. The market practice is to quote swaptions through standard implied vols, i.e. those vols IV such that, once inserted into Eq. (12.126), deliver the swaption market price:

$$p_{\mathrm{swpn}}(t) = \mathrm{PVBP}_t(T_1,\cdots,T_n)\cdot\mathrm{Black76}\left(R_{\mathrm{sw}}(t);T_0,K_{\mathrm{irs}},\mathrm{IV}\right).$$
12.10.2 Local volatilities and SABR models


How to evaluate illiquid derivatives while making sure that the existing products are fitted without error? When the existing products are derivatives, we can apply the same methodology relying on local volatilities put forward in Chapter 10 (see Section 10.7).
Local volatility models suffer from a drawback pointed out by Hagan, Kumar, Lesniewski and Woodward (2002), which we now illustrate in the case of the modeling of swaptions. For simplicity, assume that the local volatility does not depend on calendar time, and denote it with $\sigma_{\mathrm{loc}}(R_n)$, where $R_n$ is the forward swap rate at time $\tau$ for a given tenor $n$. Hagan and Woodward (1999) show that for any maturity, the implied Black's volatility of a European-style option is:

$$\sigma_{\mathrm{iv}}(R_n,K) = \sigma_{\mathrm{loc}}\left(\tfrac{1}{2}(R_n+K)\right)\left[1+\frac{1}{24}\frac{\sigma''_{\mathrm{loc}}\left(\tfrac{1}{2}(R_n+K)\right)}{\sigma_{\mathrm{loc}}\left(\tfrac{1}{2}(R_n+K)\right)}(R_n-K)^2+\cdots\right],$$

where the omitted terms are likely to be numerically negligible for practical purposes.


What are the dynamics of the market smile implied by this local volatility model? To illustrate, consider what happens to the first term in the previous expansion, say $\bar\sigma_{\mathrm{iv}}(R_n,K)$, when the forward increases from $R_n$ to $R_n+\Delta R$,

$$\bar\sigma_{\mathrm{iv}}(R_n+\Delta R,K) = \sigma_{\mathrm{loc}}\left(\tfrac{1}{2}(R_n+\Delta R+K)\right) = \bar\sigma_{\mathrm{iv}}(R_n,K+\Delta R).$$

In other words, provided $\sigma_{\mathrm{loc}}$ is decreasing, the local volatility model predicts that as the forward $R_n$ increases, the skew moves to the left, which might contradict market behavior. For example, let us assume the local volatility function is, $\sigma_{\mathrm{loc}}(R) = 0.04\cdot R^{-1/2}$. The left panel of Figure 12.12 plots the implied volatility $\sigma_{\mathrm{iv}}(R_n,K)$ for $R_n = 3\%$ (solid line) and $R_n = 4\%$ (dashed line).
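
The leftward shift of the skew can be checked numerically with the local volatility function of the example. The sketch below only evaluates the leading term of the expansion; the function names are ours.

```python
def sigma_loc(rate):
    """Local volatility function of the example: 0.04 * R^(-1/2)."""
    return 0.04 * rate ** -0.5


def iv_leading_term(forward, strike):
    """Leading term of the implied volatility expansion: sigma_loc at the
    midpoint between the forward and the strike."""
    return sigma_loc(0.5 * (forward + strike))


shift = 0.01  # the forward moves from 3% to 4%
for strike in (0.02, 0.03, 0.04):
    after_forward_move = iv_leading_term(0.03 + shift, strike)
    old_curve_at_higher_strike = iv_leading_term(0.03, strike + shift)
    # The two numbers coincide: the new smile is the old one shifted to the left.
    print(strike, round(after_forward_move, 4), round(old_curve_at_higher_strike, 4))
```

Because the function depends only on the sum of the forward and the strike, raising the forward by a given amount has the same effect as reading the old curve at a strike raised by the same amount.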
[Figure 12.12: two panels, each plotting implied volatility (in %) against the strike (in %). Left panel: local vol model, curves for $R_n = 3\%$ and $R_n = 4\%$. Right panel: SABR model, curves for $R_n = 3\%$, $4\%$ and $5\%$.]

FIGURE 12.12. The left panel depicts the approximated implied volatility $\sigma_{\mathrm{iv}}(R_n,K)$ predicted by a local volatility model with $\sigma_{\mathrm{loc}}(R) = 0.04\cdot R^{-1/2}$. Solid and dashed lines correspond to values of $R_n$ equal to 3% and 4%, respectively. The right panel depicts the approximated implied volatility $\sigma_{\mathrm{iv}}(R_n,K;\alpha)$ in Eq. (12.129) predicted by the SABR model in Eq. (12.128), with $\alpha = 0.02$, $\beta = 0.5$, $\rho = -0.5$, $\nu = 0.5$, and $T = 1$. Solid, dashed and dotted lines correspond to values of $R_n$ equal to 3%, 4% and 5%, respectively.

Hagan, Kumar, Lesniewski and Woodward (2002) (HKLW in the sequel) consider a richer model, which they call SABR for "Stochastic $\alpha\beta\rho$," in which $R_n$ satisfies,

$$\begin{cases} dR_n(\tau) = \alpha(\tau)\,R_n(\tau)^\beta\,dW_1(\tau) \\ d\alpha(\tau) = \nu\,\alpha(\tau)\left(\rho\,dW_1(\tau)+\sqrt{1-\rho^2}\,dW_2(\tau)\right) \end{cases} \tag{12.128}$$

where $W_1$, $W_2$ are two standard Brownian motions under the market probability, $\beta$, $\nu$ and $\rho$ are constants, and $\alpha(t)$ is interpreted as the initial condition for the unobserved stochastic volatility component of the forward. Note that the model allows the forward and its volatility to be conditionally correlated with instantaneous correlation equal to $\rho$.
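HKLW show that the implied volatility predicted by this model is,

$$\sigma_{\mathrm{iv}}(R_n,K;\alpha) = \frac{\alpha\left\{1+\left[\frac{(1-\beta)^2}{24}\frac{\alpha^2}{(R_nK)^{1-\beta}}+\frac{\rho\beta\nu\alpha}{4(R_nK)^{(1-\beta)/2}}+\frac{2-3\rho^2}{24}\nu^2\right]T+\cdots\right\}}{(R_nK)^{(1-\beta)/2}\left[1+\frac{(1-\beta)^2}{24}\ln^2\frac{R_n}{K}+\frac{(1-\beta)^4}{1920}\ln^4\frac{R_n}{K}+\cdots\right]}\cdot\frac{z}{x(z)}, \tag{12.129}$$

where $T$ is the time to the option expiry, and,

$$z = \frac{\nu}{\alpha}(R_nK)^{(1-\beta)/2}\ln\frac{R_n}{K},\qquad x(z) = \ln\frac{\sqrt{1-2\rho z+z^2}+z-\rho}{1-\rho}.$$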
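
Eq. (12.129) is straightforward to code. The following sketch implements the expansion truncated at the displayed terms, using the parameter values reported for Figure 12.12 as an illustration; the function name and the treatment of the at-the-money limit (where $z/x(z)$ is set to one) are our choices.

```python
import math


def sabr_implied_vol(forward, strike, expiry, alpha, beta, rho, nu):
    """Implied Black volatility from the HKLW expansion in Eq. (12.129),
    truncated at the terms displayed in the text."""
    one_b = 1.0 - beta
    log_fk = math.log(forward / strike)
    fk_pow = (forward * strike) ** (0.5 * one_b)
    denom = fk_pow * (1.0 + one_b ** 2 / 24.0 * log_fk ** 2
                      + one_b ** 4 / 1920.0 * log_fk ** 4)
    z = nu / alpha * fk_pow * log_fk
    if abs(z) < 1e-12:
        z_over_x = 1.0  # at-the-money limit of z / x(z)
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    correction = 1.0 + (one_b ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
                        + 0.25 * rho * beta * nu * alpha / fk_pow
                        + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2) * expiry
    return alpha / denom * z_over_x * correction


# Parameter values of Figure 12.12: alpha = 0.02, beta = 0.5, rho = -0.5, nu = 0.5, T = 1.
for forward in (0.03, 0.04, 0.05):
    atm = sabr_implied_vol(forward, forward, 1.0, 0.02, 0.5, -0.5, 0.5)
    print(forward, round(100.0 * atm, 2))  # at-the-money volatility, in %
```

Evaluating the function at the money for increasing values of the forward traces out the at-the-money volatility curve, which is decreasing for these parameter values.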
The right panel of Figure 12.12 depicts the approximated implied volatility predicted by the SABR model obtained with hypothetical parameter values. The model can fix the counterfactual behavior of the skew predicted by a local vol model: as $R_n$ increases, the implied volatility shifts to the right while at the same time generating a downward-sloping "backbone," defined as the curve traced by the at-the-money volatility as the forward varies.
The reason for a downward-sloping backbone is the coefficient $\beta<1$. HKLW also show the origins of the skew (i.e., the asymmetric smile) predicted by their model, due to (i) a coefficient $\beta<1$, which makes the instantaneous volatility in Eq. (12.128), $\alpha R_n^{\beta-1}$, decreasing in $R_n$, and (ii) a $\rho<0$, which makes the transition density of the log-changes in $R_n$ skewed towards the left, as in classical explanations of Heston (1993) given in the equity case (see Chapter 10). Finally, the volatility of volatility parameter $\nu$ helps determine the curvature of the skew. The implied volatility shifts up as $\alpha$ increases: option prices increase with volatility in this model (see Chapter 10), and so does implied volatility.
The SABR model is widely used in market practice, especially for modeling the swaption skew. Note, however, that the model does not allow for a perfect matching of all available swaption prices, which by construction the local volatility model can, at least theoretically. Finally, the comparative statics exercise in Figure 12.12 regards a change in $R_n$. Because $R_n$ is correlated with volatility, an alternative comparative statics exercise is one in which both the forward and volatility change in accordance with their assumed correlation (see Bartlett, 2006). Figure 12.13 shows that in this case with negative correlation, an increase in the forward accompanied by a decrease in the volatility (consistent with the negative correlation) implies the skew shifts toward the left although the backbone (defined above) is still downward sloping.


[Figure 12.13: two panels, each plotting implied volatility (in %) against the strike (in %). Left panel: local vol model, curves for $R_n = 3\%$ and $R_n = 4\%$. Right panel: SABR model, curves for $R_n = 3\%$, $3.5\%$ and $4\%$.]

FIGURE 12.13. The left panel is the same as the left panel in Figure 12.12. The right panel depicts the approximated implied volatility $\sigma_{\mathrm{iv}}(R_n,K;\alpha)$ in Eq. (12.129) predicted by the SABR model in Eq. (12.128), obtained with the parameter values used in Figure 12.12. Solid, dashed and dotted lines correspond to values of $R_n$ equal to 3%, 3.5% and 4%, respectively. The values of $\alpha$ in correspondence of the three values of $R_n$ are $\alpha = 0.02$, and then, two increments obtained consistently with the negative correlation in Eq. (12.128), $\Delta\alpha = \rho\,\nu\,R_n^{-\beta}\,\Delta R_n$.


12.11 Appendix 1: The FTAP for bond prices


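We assume that available for trading are $m$ pure discount bonds with prices $P_i\equiv P(\tau,T_i)$ satisfying:

$$\frac{dP_i}{P_i} = \mu_{bi}\,d\tau + \sigma_{bi}\,dW,\quad i=1,\cdots,m, \tag{12A.1}$$

where $W$ is a Brownian motion in $\mathbb{R}^d$, and $\mu_{bi}$ and $\sigma_{bi}$ ($\sigma_{bi}$ is vector-valued) are such that there exists a strong solution to the previous system. The value $V$ of a self-financing portfolio in these $m$ bonds and a money market account satisfies:

$$dV = \left[\pi^\top(\mu_b-\mathbf{1}_mr)+rV\right]d\tau + \pi^\top\sigma_b\,dW,$$

where $\pi$ is some portfolio, $\mathbf{1}_m$ is an $m$-dimensional vector of ones, and

$$\mu_b = [\mu_{b1},\cdots,\mu_{bm}]^\top,\qquad \sigma_b = [\sigma_{b1}^\top,\cdots,\sigma_{bm}^\top]^\top.$$

Now suppose there exists a portfolio $\pi$ such that $\pi^\top\sigma_b = 0$. This is an arbitrage opportunity if there exist events for which at some time, $\pi^\top(\mu_b-\mathbf{1}_mr)\neq 0$. (Use $\pi$ as usual, when $\pi^\top(\mu_b-\mathbf{1}_mr)>0$, and $-\pi$ when $\pi^\top(\mu_b-\mathbf{1}_mr)<0$: the drift of $V$ will then be appreciating at a deterministic rate that is strictly greater than $r$.) Therefore, arbitrage is ruled out if:

$$\pi^\top(\mu_b-\mathbf{1}_mr) = 0\quad\text{whenever}\quad\pi^\top\sigma_b = 0.$$

In other terms, there is no arbitrage as soon as every vector in the null space of $\sigma_b^\top$ is orthogonal to $\mu_b-\mathbf{1}_mr$, or when there exists a $\lambda$ in $\mathbb{R}^d$ and satisfying a few integrability conditions, and such that $\mu_b-\mathbf{1}_mr = \sigma_b\lambda$, or

$$\mu_{bi}-r = \sigma_{bi}\lambda,\quad i=1,\cdots,m. \tag{12A.2}$$

In this case,

$$\frac{dP_i}{P_i} = (r+\sigma_{bi}\lambda)\,d\tau + \sigma_{bi}\,dW,\quad i=1,\cdots,m.$$

Next, define $\tilde{W} = W+\int\lambda\,d\tau$, $\frac{dQ}{dP} = \exp\left(-\int\lambda^\top dW-\frac{1}{2}\int\|\lambda\|^2d\tau\right)$. It is easy to show, now, that by Girsanov's theorem the discounted bond price is a martingale under $Q$. Indeed, define for a generic $T$, and $\tau\in[t,T]$:

$$g(\tau)\equiv e^{-\int_t^\tau r(u)du}\,P(\tau,T).$$

By Girsanov's theorem, and Itô's lemma,

$$\frac{dg}{g} = \sigma_b\,d\tilde{W},\quad\text{under } Q.$$

Therefore, for all $\tau\in[t,T]$, $g(\tau) = \mathbb{E}_\tau(g(T))$, implying that:

$$g(\tau)\equiv e^{-\int_t^\tau r(u)du}P(\tau,T) = \mathbb{E}_\tau(g(T)) = \mathbb{E}_\tau\Big[e^{-\int_t^Tr(u)du}\underbrace{P(T,T)}_{=1}\Big] = \mathbb{E}_\tau\left[e^{-\int_t^Tr(u)du}\right],$$

or

$$P(\tau,T) = \mathbb{E}_\tau\left[e^{-\int_\tau^Tr(u)du}\right],\quad\text{all }\tau\in[t,T],$$

which is Eq. (12.2).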


Note that no assumptions have been made regarding the number of zero-coupon bonds, $m$. For example, suppose that $m<d$, and that there are no other assets traded in this market. Then, there exists an infinite number of risk-neutral probabilities $Q$. If $m=d$, there exists one and only one risk-neutral probability $Q$. If $m>d$, there exists one and only one risk-neutral probability but then, the various bond prices have to satisfy some basic no-arbitrage restrictions. For example, take $m=2$ and $d=1$, such that Eq. (12A.2) reduces to,

$$\frac{\mu_{b1}-r}{\sigma_{b1}} = \frac{\mu_{b2}-r}{\sigma_{b2}} = \lambda.$$

In other terms, the Sharpe ratio of any two bonds must be identical. Eq. (12A.2) is used several times in this chapter. In Section 12.4, the market primitive is the short-term rate, solution of a multidimensional diffusion process, and $\mu_{bi}$ and $\sigma_{bi}$ are derived via Itô's lemma. In Section 12.7, $\mu_{bi}$ and $\sigma_{bi}$ are restricted by a model for the forward rates.


12.12 Appendix 2: Certainty equivalent interpretation of forward prices


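Multiply both sides of the pricing equation (12.2) by the amount $S(T)$:

$$P(t,T)\,S(T) = \mathbb{E}\left[e^{-\int_t^Tr(\tau)d\tau}\right]S(T).$$

Suppose momentarily that $S(T)$ is known at $t$. In this case, we have:

$$P(t,T)\,S(T) = \mathbb{E}\left[e^{-\int_t^Tr(\tau)d\tau}\,S(T)\right].$$

But in the context of this chapter, $S(T)$ is random. Define then its certainty equivalent by the number $\bar{S}(T)$ that solves:

$$P(t,T)\,\bar{S}(T) = \mathbb{E}\left[e^{-\int_t^Tr(\tau)d\tau}\,S(T)\right],$$

or

$$\bar{S}(T) = \mathbb{E}\left(\eta_T(T)\,S(T)\right), \tag{12A.3}$$

where $\eta_T(T)$ has been defined in (12.90).

Comparing Eq. (12A.3) with Eq. (12.89) reveals that forward prices can be interpreted in terms of the previously defined certainty equivalent.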


12.13 Appendix 3: Additional results on forward probabilities


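Eq. (12.90) defines $\eta_T(T)$ as:

$$\eta_T(T) = \frac{e^{-\int_t^Tr(\tau)d\tau}}{\mathbb{E}\left[e^{-\int_t^Tr(\tau)d\tau}\right]}.$$

More generally, define a density process as:

$$\eta_T(\tau) = \frac{e^{-\int_t^\tau r(u)du}\,P(\tau,T)}{\mathbb{E}\left[e^{-\int_t^Tr(u)du}\right]},\quad\tau\in[t,T].$$

By the FTAP, the numerator of $\eta_T(\tau)$ is a martingale under $Q$ (see Appendix 1 to this chapter). Therefore, $\mathbb{E}\left[\left.\frac{dQ_F^T}{dQ}\right|\mathcal{F}_\tau\right] = \mathbb{E}[\eta_T(T)|\mathcal{F}_\tau] = \eta_T(\tau)$ all $\tau\in[t,T]$, and in particular, $\eta_T(t) = 1$.

We demonstrate these claims under a slightly different angle. Let us consider the price dynamics of a zero-coupon bond in Eq. (12A.1), $P(\tau,T)$,

$$\frac{dP(\tau,T)}{P(\tau,T)} = \mu_b(\tau,T)\,d\tau + \sigma_b(\tau,T)\,dW(\tau),$$

where $\mu_b(\tau,T)$ and $\sigma_b(\tau,T)$ denote the drift and volatility of the bond expiring at $T$. Under the risk-neutral probability $Q$,

$$\frac{dP(\tau,T)}{P(\tau,T)} = r(\tau)\,d\tau + \sigma_b(\tau,T)\,d\tilde{W}(\tau),$$

where $\tilde{W} = W+\int\lambda\,d\tau$ is a $Q$-Brownian motion. By Itô's lemma,

$$\frac{d\eta_T(\tau)}{\eta_T(\tau)} = \sigma_b(\tau,T)\,d\tilde{W}(\tau),\quad\eta_T(t) = 1.$$

The solution is:

$$\eta_T(\tau) = \exp\left[-\frac{1}{2}\int_t^\tau\|\sigma_b(u,T)\|^2du+\int_t^\tau\sigma_b(u,T)\,d\tilde{W}(u)\right].$$

Under the usual integrability conditions, we can use Girsanov's theorem and conclude that

$$W^{Q_F^T}(\tau) = \tilde{W}(\tau)-\int_t^\tau\sigma_b(u,T)^\top du \tag{12A.4}$$

is a Brownian motion under the $T$-forward probability $Q_F^T$. Assuming for example that the driving state variable is the short-term rate, we have that the drift of the same short-term rate is lower under $Q_F^T$ and that of the bond price is higher, due to negative bond price volatility, $\sigma_b<0$.

Finally, note that for all $i$ and non-decreasing sequences of dates $\{T_i\}_{i=0,1,\cdots}$,

$$W^{Q_F^{T_i}}(\tau) = \tilde{W}(\tau)-\int_t^\tau\sigma_b(u,T_i)^\top du,\quad i=0,1,\cdots$$

Therefore,

$$W^{Q_F^{T_i}}(\tau) = W^{Q_F^{T_{i-1}}}(\tau)-\int_t^\tau\left[\sigma_b(u,T_i)-\sigma_b(u,T_{i-1})\right]^\top du,\quad i=1,2,\cdots \tag{12A.5}$$

is a Brownian motion under the $T_i$-forward martingale probability $Q_F^{T_i}$. Eqs. (12A.5) and (12A.4) are used in Section 12.8 on interest rate derivatives.
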

12.14 Appendix 4: Principal components analysis


Principal component analysis transforms the original data into a set of uncorrelated variables, the principal components, with variances arranged in descending order. Consider the following program,

$$\max_{C_1}\left[\mathrm{var}(Y_1)\right]\quad\text{s.t.}\quad C_1^\top C_1 = 1,$$

where $\mathrm{var}(Y_1) = C_1^\top\Sigma C_1$, and the constraint is an identification constraint. The first order conditions lead to,

$$(\Sigma-\lambda I)\,C_1 = 0,$$

where $\lambda$ is a Lagrange multiplier. The previous condition tells us that $\lambda$ must be one eigenvalue of the matrix $\Sigma$, and that $C_1$ must be the corresponding eigenvector. Moreover, we have $\mathrm{var}(Y_1) = C_1^\top\Sigma C_1 = \lambda$, which is clearly maximized by the largest eigenvalue. Suppose that the eigenvalues of $\Sigma$ are distinct, and let us arrange them in descending order, i.e. $\lambda_1>\cdots>\lambda_p$. Then,

$$\mathrm{var}(Y_1) = \lambda_1.$$

Therefore, the first principal component is $Y_1 = C_1^\top(R-\bar{R})$, where $C_1$ is the eigenvector corresponding to the largest eigenvalue, $\lambda_1$.

Next, consider the second principal component. The program is, now,

$$\max_{C_2}\left[\mathrm{var}(Y_2)\right]\quad\text{s.t.}\quad C_2^\top C_2 = 1\ \text{and}\ C_2^\top C_1 = 0,$$

where $\mathrm{var}(Y_2) = C_2^\top\Sigma C_2$. The first constraint, $C_2^\top C_2 = 1$, is the usual identification constraint. The second constraint, $C_2^\top C_1 = 0$, is needed to ensure that $Y_1$ and $Y_2$ are orthogonal, i.e. $\mathrm{cov}(Y_1,Y_2) = 0$. The first order conditions for this problem are,

$$0 = 2\Sigma C_2-2\lambda C_2-\nu C_1,$$

where $\lambda$ is the Lagrange multiplier associated with the first constraint, and $\nu$ is the Lagrange multiplier associated with the second constraint. By pre-multiplying the first order conditions by $C_1^\top$,

$$0 = 2C_1^\top\Sigma C_2-\nu,$$

where we have used the two constraints $C_1^\top C_2 = 0$ and $C_1^\top C_1 = 1$. Because $C_1^\top\Sigma = \lambda_1C_1^\top$, we have $C_1^\top\Sigma C_2 = \lambda_1C_1^\top C_2 = 0$. Hence, $\nu = 0$. So the first order conditions can be rewritten as,

$$(\Sigma-\lambda I)\,C_2 = 0.$$

The solution is now $\lambda_2$, and $C_2$ is the eigenvector corresponding to $\lambda_2$. (Indeed, this time we cannot choose $\lambda_1$ as this choice would imply that $Y_2 = C_1^\top(R-\bar{R})$, implying that $\mathrm{cov}(Y_1,Y_2)\neq 0$.) It follows that $\mathrm{var}(Y_2) = \lambda_2$.

In general, we have,

$$\mathrm{var}(Y_i) = \lambda_i,\quad i=1,\cdots,p.$$

Let $\Lambda$ be the diagonal matrix with the eigenvalues $\lambda_i$ on the diagonal. By the spectral decomposition of $\Sigma$, $\Sigma = C\Lambda C^\top$, and by the orthonormality of $C$, $C^\top C = I$, we have that $C^\top\Sigma C = \Lambda$ and, hence,

$$\sum_{i=1}^p\mathrm{var}(R_i) = \mathrm{Tr}(\Sigma) = \mathrm{Tr}\left(\Sigma CC^\top\right) = \mathrm{Tr}\left(C^\top\Sigma C\right) = \mathrm{Tr}(\Lambda).$$

Hence, Eq. (12.23) follows.
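
The two maximization programs above can be mimicked numerically. The sketch below finds the first two eigenpairs of a covariance matrix by power iteration followed by deflation, a technique chosen here only because it needs no external library; in practice one would call a standard symmetric eigenvalue routine. The function names and the example matrix are hypothetical.

```python
import math


def mat_vec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def leading_eigenpair(a, n_iter=500):
    """Largest eigenvalue and unit-norm eigenvector of a symmetric
    positive semi-definite matrix, by power iteration."""
    n = len(a)
    v = [float(k + 1) for k in range(n)]          # arbitrary starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
    return eigenvalue, v


def first_two_components(cov):
    """First two eigenpairs: the second is obtained after removing
    (deflating) the contribution of the first from the matrix."""
    lam1, c1 = leading_eigenpair(cov)
    n = len(cov)
    deflated = [[cov[i][j] - lam1 * c1[i] * c1[j] for j in range(n)] for i in range(n)]
    lam2, c2 = leading_eigenpair(deflated)
    return (lam1, c1), (lam2, c2)


(lam1, c1), (lam2, c2) = first_two_components([[2.0, 1.0], [1.0, 2.0]])
print(lam1, lam2)  # variances of the first two principal components
```

For this two-dimensional example the two eigenvalues add up to the trace of the covariance matrix, which is the property used in the last step of the derivation.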


12.15 Appendix 5: A few analytical details regarding the Hull and White
model
As in the Ho and Lee model, the instantaneous forward rate (
) predicted by the Hull and White
model is as in Eq. (12.62), where functions 2 and 2 can be easily computed from Eqs. (12.67) and
(12.68) as:
2(

)=

2(

2(

2(

)=

Therefore, the instantaneous forward rate (


) predicted by the Hull and White model is obtained
by replacing the previous equations in Eq. (12.62). The result is then equated to the observed forward
rate $ ( ) so as to obtain:
2 Z
2
(
)
(
)
(
)
1
(
)
=
+
+
$
2 2
By di erentiating the previous equation with respect to , and rearranging terms,
Z

2
(
)
(
)
(
)
(
=
(
)
+
+
+
1
$

2
2
(
)
(
)
(
=
1
1
(
)
+
+
(
)
+
$
$
2 2
which reduces to Eq. (12.69) after using simple algebra.


12.16 Appendix 6: Expectation theory and embedding in selected models


A. Expectation theory
Assume that
( ) =

and

() =

(12A.6)

where and are constants. We derive the dynamics of , compare them with , and formulate some
basic claims regarding the expectation theory. We have:
Z
= ( )+
( ) + (
)
where
(

)= (

Hence,

+ (

1
2

)+

)+

Finally,
= (
and since

|F ) =

1
2

)+

)+

)+ (

)+

,
(

|F ) = (

)+

1
2

( ). As shown in the following


Even with
0, this model does not guarantee that ( | F )
example, this is due to the nonstationary nature of the volatility function. Indeed, suppose, next, that
instead of Eq. (12A.6), we have that
(
where

and

)=

exp(

)) and

() =

are constants. In this case, we have:


Z
Z
= ( )+
( ) +

2 (

where
(

)=

Finally,
(

|F ) = (

)+

= (

)+

Therefore, it is sufficient to have a risk-premium such that


(
In other words,
that ( | F )

|F )

, to generate the prediction that:

) for any .

0 is a necessary condition, not sufficient. Notice that when


).

= 0, it always holds



B. Embedding

We now embed the Ho and Lee model in Section 12.6.2 in the HJM format. In the Ho and Lee model,
=
where is a

-Brownian motion. By Eq. (12.62) in Section 12.6,


(

where

2(

)=

1 2
(
2

) and

) =
)

2(

)=
2(

2(

2(

) = 1. Therefore, by Eqs. (12.81),

12 (

)+

=
)+

12 (

) +

2(

Next, we embed the Vasicek model in Section 12.6 into the HJM format. The Vasicek model is:
=(
where is a

-Brownian motion. By results in Section 12.4,


(
R
2
) =
(
)

) . By Eqs. (12.81),

2(

where

= 1 1

)=

2(

2(

)=
2(

)=

)+

12 (

)+

2(

2(

)
) ,

2(

12 (

and

;
2

) =

) +(

2(

)=

Naturally, this model can never be embedded within a HJM model because it is not of the perfectly fitting type. In practice, condition (12.82) can never hold in the simple Vasicek model. However, the model is embeddable once the relevant drift parameter is turned into an infinite dimensional parameter à la Hull and White (see Section 12.4).


12.17 Appendix 7: Additional results on string models


We prove Eq. (12.84). We have,
(
Di erentiation of the
Z

)=

2)

1
2

R
(

(
1)

2)

2)

+
(

term is straightforward. Moreover,


Z
(
(
)+
2) 2 = (
Z
= (
)
(
)( (
Z
=2 (
)
(
) (

), where

2)

2)

)+
)

))


12.18 Appendix 8: Changes of numeraire and Jamshidian's (1989) formula


Consider the following change-of-numeraire arithmetics. Let
=
We have:

{
2

(
(

Next, we apply this result to the process (


)
derive the solution of (
) at for Vasicek, viz
(

(
(

)
=
)

)
),

+(

under

) under

(12A.7)

as well as under

. We aim to

as well as under

This solution is needed to calculate the two probabilities in Eq. (12.95).


By Itos lemma, and the fact that
=
, we have that under the risk-neutral probability
(
(
By applying Eq. (12A.7) to (
(
(

)
=
)

)
=
)

),
)

) (

[ (

)]

(12A.8)

Next, we change probability by hinging upon the tools of Appendix 3. We have,


+

)
into Eq. (12A.8),

is a Brownian motion under the -forward martingale probability. Replace then


then integrate, and obtain:
(
(
(
(

)
=
)
)
=
)

(
(

(
(
(
)
(
)

)
=
)
)
=
)

1
2

1
2

[ (

[ (

Rearranging terms gives Eqs. (12.96) in the main text.

)]2

)]2

[ (

[ (

)]

)]

References
Aït-Sahalia, Y. (1996): Testing Continuous-Time Models of the Spot Interest Rate. Review
of Financial Studies 9, 385-426.
Ahn, C.-M. and H.E. Thompson (1988): Jump-Diffusion Processes and the Term Structure
of Interest Rates. Journal of Finance 43, 155-174.
Ang, A. and M. Piazzesi (2003): A No-Arbitrage Vector Autoregression of Term Structure
Dynamics with Macroeconomic and Latent Variables. Journal of Monetary Economics
50, 745-787.
Balduzzi, P., S. R. Das, S. Foresi and R. K. Sundaram (1996): A Simple Approach to Three
Factor Affine Term Structure Models. Journal of Fixed Income 6, 43-53.
Bartlett, B. (2006): Hedging Under SABR Model. Wilmott Magazine July/August, 68-70.
Black, F. (1976): The Pricing of Commodity Contracts. Journal of Financial Economics 3,
167-179.
Black, F. and M. Scholes (1973): The Pricing of Options and Corporate Liabilities. Journal
of Political Economy 81, 637-659.
Brace, A., D. Gatarek and M. Musiela (1997): The Market Model of Interest Rate Dynamics.
Mathematical Finance 7, 127-155.
Brigo, D. and F. Mercurio (2006): Interest Rate Models - Theory and Practice, with Smile,
Inflation and Credit. Springer Verlag Finance (2nd Edition).
Brunnermeier, M. (2009): Deciphering the Liquidity and Credit Crunch 2007-08. Journal of
Economic Perspectives 23, 77-100.
Carverhill, A. (1994): When is the Short-Rate Markovian? Mathematical Finance 4, 305-312.
Cochrane, J. H. and M. Piazzesi (2005): Bond Risk Premia. American Economic Review 95,
138-160.
Collin-Dufresne, P. and R. S. Goldstein (2002): Do Bonds Span the Fixed-Income Markets?
Theory and Evidence for Unspanned Stochastic Volatility. Journal of Finance 57, 1685-1729.
Conley, T. G., L. P. Hansen, E. G. J. Luttmer and J. A. Scheinkman (1997): Short-Term
Interest Rates as Subordinated Diffusions. Review of Financial Studies 10, 525-577.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1979): Duration and the Measurement of Basis
Risk. Journal of Business 52, 51-61.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985): A Theory of the Term Structure of Interest
Rates. Econometrica 53, 385-407.
Dai, Q. and K. J. Singleton (2000): Specification Analysis of Affine Term Structure Models.
Journal of Finance 55, 1943-1978.

Diebold, F.X. and C. Li (2006): Forecasting the Term Structure of Government Bond Yields.
Journal of Econometrics 130, 337-364.
Duffie, D. and R. Kan (1996): A Yield-Factor Model of Interest Rates. Mathematical Finance
6, 379-406.
Duffie, D. and K. J. Singleton (1999): Modeling Term Structures of Defaultable Bonds.
Review of Financial Studies 12, 687-720.
Estrella, A. and G. Hardouvelis (1991): The Term Structure as a Predictor of Real Economic
Activity. Journal of Finance 46, 555-76.
Fama, E. F. and R. R. Bliss (1987): The Information in Long-Maturity Forward Rates.
American Economic Review 77, 680-692.
Fong, H. G. and O. A. Vasicek (1991): Fixed Income Volatility Management. The Journal
of Portfolio Management (Summer), 41-46.
Geman, H. (1989): The Importance of the Forward Neutral Probability in a Stochastic Approach to Interest Rates. Unpublished working paper, ESSEC.
Geman H., N. El Karoui and J. C. Rochet (1995): Changes of Numeraire, Changes of Probability Measures and Pricing of Options. Journal of Applied Probability 32, 443-458.
Goldstein, R. S. (2000): The Term Structure of Interest Rates as a Random Field. Review
of Financial Studies 13, 365-384.
Hagan, P. S. and D. E. Woodward (1999): Equivalent Black Volatilities. Applied Mathematical Finance 6, 147-157.
Hagan, P. S., D. Kumar, A. S. Lesniewski, and D. E. Woodward (2002): Managing Smile
Risk. Wilmott Magazine, September, 84-108.
Harvey, C. R. (1991): The Term Structure and World Economic Growth. Journal of Fixed
Income 1, 4-17.
Harvey, C. R. (1991): The Term Structure Forecasts Economic Growth. Financial Analysts
Journal May/June 6-8.
Heaney, W. J. and P. L. Cheng (1984): Continuous Maturity Diversification of Default-Free
Bond Portfolios and a Generalization of Efficient Diversification. Journal of Finance 39,
1101-1117.
Heath, D., R. Jarrow and A. Morton (1992): Bond Pricing and the Term-Structure of Interest
Rates: a New Methodology for Contingent Claim Valuation. Econometrica 60, 77-105.
Heston, S. L. (1993): A Closed Form Solution for Options with Stochastic Volatility with
Applications to Bond and Currency Options. Review of Financial Studies 6, 327-344.
Ho, T. S. Y. and S.-B. Lee (1986): Term Structure Movements and the Pricing of Interest
Rate Contingent Claims. Journal of Finance 41, 1011-1029.

Hördahl, P., O. Tristani and D. Vestin (2006): A Joint Econometric Model of Macroeconomic
and Term Structure Dynamics. Journal of Econometrics 131, 405-444.
Hull, J. (2003): Options, Futures, and Other Derivatives. Prentice Hall. 5th edition (International Edition).
Hull, J. and A. White (1990): Pricing Interest Rate Derivative Securities. Review of Financial
Studies 3, 573-592.
Jamshidian, F. (1989): An Exact Bond Option Pricing Formula. Journal of Finance 44,
205-209.
Jamshidian, F. (1997): Libor and Swap Market Models and Measures. Finance and Stochastics 1, 293-330.
Jöreskog, K. G. (1967): Some Contributions to Maximum Likelihood Factor Analysis. Psychometrika 32, 443-482.
Karlin, S. and H. M. Taylor (1981): A Second Course in Stochastic Processes. San Diego:
Academic Press.
Kennedy, D. P. (1994): The Term Structure of Interest Rates as a Gaussian Random Field.
Mathematical Finance 4, 247-258.
Kennedy, D. P. (1997): Characterizing Gaussian Models of the Term Structure of Interest
Rates. Mathematical Finance 7, 107-118.
Kessel, R. A. (1965): The Cyclical Behavior of the Term Structure of Interest Rates. National
Bureau of Economic Research Occasional Paper No. 91.
Kloeden, P. and E. Platen (1992): Numerical Solution of Stochastic Differential Equations.
Berlin: Springer Verlag.
Knez, P. J., R. Litterman and J. Scheinkman (1994): Explorations into Factors Explaining
Money Market Returns. Journal of Finance 49, 1861-1882.
Lamberton, D. and B. Lapeyre (1997): Introduction au Calcul Stochastique Appliqué à la
Finance. Paris: Ellipses.
Langetieg, T. (1980): A Multivariate Model of the Term Structure of Interest Rates. Journal
of Finance 35, 71-97.
Laurent, R. D. (1988): An Interest Rate-Based Indicator of Monetary Policy. Federal Reserve
Bank of Chicago Economic Perspectives 12, 3-14.
Laurent, R. D. (1989): Testing the Spread. Federal Reserve Bank of Chicago Economic
Perspectives 13, 22-34.
Litterman, R. and J. Scheinkman (1991): Common Factors Affecting Bond Returns. Journal
of Fixed Income 1, 54-61.
Litterman, R., J. Scheinkman, and L. Weiss (1991): Volatility and the Yield Curve. Journal
of Fixed Income 1, 49-53.
Longstaff, F. A. and E. S. Schwartz (1992): Interest Rate Volatility and the Term Structure:
A Two-Factor General Equilibrium Model. Journal of Finance 47, 1259-1282.
Mele, A. (2003): Fundamental Properties of Bond Prices in Models of the Short-Term Rate.
Review of Financial Studies 16, 679-716.
Mele, A. and F. Fornari (2000): Stochastic Volatility in Financial Markets: Crossing the Bridge
to Continuous Time. Boston: Kluwer Academic Publishers.
Mele, A. and O. Obayashi (2015): The Price of Fixed Income Market Volatility. Springer Verlag
Finance (forthcoming).
Merton, R. C. (1973): Theory of Rational Option Pricing. Bell Journal of Economics and
Management Science 4, 141-183.
Miltersen, K., K. Sandmann and D. Sondermann (1997): Closed Form Solutions for Term
Structure Derivatives with Lognormal Interest Rate. Journal of Finance 52, 409-430.
Nelson, C.R. and A.F. Siegel (1987): Parsimonious Modeling of Yield Curves. Journal of
Business 60, 473-489.
Rebonato, R. (1998): Interest Rate Option Models. Wiley.
Rebonato, R. (1999): Volatility and Correlation. Wiley.
Ritchken, P. and L. Sankarasubramanian (1995): Volatility Structure of Forward Rates and
the Dynamics of the Term Structure. Mathematical Finance 5, 55-72.
Sandmann, K. and D. Sondermann (1997): A Note on the Stability of Lognormal Interest
Rate Models and the Pricing of Eurodollar Futures. Mathematical Finance 7, 119-125.
Santa-Clara, P. and D. Sornette (2001): The Dynamics of the Forward Interest Rate Curve
with Stochastic String Shocks. Review of Financial Studies 14, 149-185.
Stanton, R. (1997): A Nonparametric Model of Term Structure Dynamics and the Market
Price of Interest Rate Risk. Journal of Finance 52, 1973-2002.
Stock, J. H. and M. W. Watson (1989): New Indexes of Coincident and Leading Economic
Indicators. In: Blanchard, O. J. and S. Fischer (Eds.): NBER Macroeconomics Annual
1989, MIT Press, 352-394.
Stock, J. H. and M. W. Watson (2003): Forecasting Output and Inflation: The Role of Asset
Prices. Journal of Economic Literature 41, 788-829.
Vasicek, O. (1977): An Equilibrium Characterization of the Term Structure. Journal of
Financial Economics 5, 177-188.
Veronesi, P. (2010): Fixed Income Securities: Valuation, Risk and Risk Management. John
Wiley and Sons.


13
Risky debt and credit derivatives

13.1 Introduction
This chapter deals with the pricing of securities that carry credit risk. It examines the main
conceptual approaches to deal with credit risk as well as how this risk can be transferred through
dedicated credit derivatives. It is instructive to review the historical reasons leading up to the
creation and trading of these derivatives. The next subsection contains such a succinct account;
Section 13.1.2 provides a roadmap to this chapter.
[In progress]
13.1.1 A brief history of credit risk and financial innovation
During the mid 1980s, a market began to develop for the first interest rate derivatives
reviewed in the previous chapter. This market grew to such an extent that, already during the
late 1980s, the appetite for these derivatives proliferated and led to additional and fairly
complex products, arising through innovation and competition. This is natural: financial
innovation is relatively easy to imitate, which pushes banks towards increasing creativeness,
as creativeness is needed to keep the innovator's initial competitive advantage as long as possible.
The early 1990s were extraordinary years. On the one hand, interest rates were low amid
concerns that the U.S. economy had not yet recovered from the 1991 recession. On the other hand,
capital market volatility was quite muted. Low interest rates and low capital market volatility are
natural drivers for the introduction of new derivatives that aim to boost investors'
returns (see footnote 1). But the financial turmoil of 1994 brought a sudden change in the interest
rate climate, and some of these products produced large losses. These losses triggered a call for
regulation from public opinion and certain policy makers, even while the ISDA (International
Swaps and Derivatives Association) argued that more regulation would destroy market
creativity.
Regulatory pressures would vanish by the mid 1990s, when the market started to innovate
again amid a general consensus that derivative risks could be controlled through market
discipline, not regulation. Swap markets recovered. They did so slowly though, as these derivatives
were already at the end of the innovation cycle: the process of imitation had led swap-related
derivatives to become a mass product, with profit margins having been eroded in the meantime.
The market was ready for a new major innovation wave.
Footnote 1: Examples of products introduced at the time are LIBOR squared, inverse floaters, or
power options, sponsored by JPMorgan.
Credit risk was the next innovation stream. Global institutions (e.g., JPMorgan, Credit Suisse,
Bankers Trust) soon realized that borrower default was a source of substantial risk
that could be conveniently re-allocated through dedicated derivatives. Just as classic
derivatives transfer market risk, credit derivatives could transfer credit risk. Institutions such
as JPMorgan had additional motivations to innovate in this space, given the vast pools of loans
held on their books: importantly, these loans required large capital reserves and were therefore
expensive.
One solution was to proceed with securitization. Securitization is a process by which some
illiquid assets (say some loans) are gathered (packaged) into a common pool that backs the
issuance of new securities, which are designed to be more liquid thanks to packaging and to
credit and liquidity enhancements. These new securities are, in fact, derivatives written on the
initial illiquid assets. Two leading examples of this process are the securitization of
mortgages and of receivables. Financial institutions find the securitization process attractive, as they
can carve out certain items from their balance sheet, thus boosting their return on investments;
moreover, by securitizing assets, less capital is needed to meet capital requirement standards.
For example, the accounts receivable of a corporation may be used to back the issue of
commercial paper known as asset-backed commercial paper. A well-functioning securitization system
is a way (not the only way) to transfer and trade credit risk.
Global institutions would then repackage loans into derivatives, in a way that default risk
and/or part of the securitized loans could be transferred to outside investors. Note that credit
derivatives were also a regulatory mitigation device, partly useful as a response to regulation.
The underlying ideas were (i) to turn loans into derivatives that could be sold, and (ii) to create
new insurance products such as credit default swaps. At the very beginning, derivatives were just
designed to have single loans as the underlying. Afterwards, the idea emerged to create
structures organized in bundles of derivatives, with cash flows indexed to baskets of loans: the ancestors
of collateralized debt obligations (CDOs). For example, JPMorgan created Bistro (Broad
Index Secured Trust Offering), a structure relying on a variety of assets, ranging from
corporate debt to student loans; ABN-Amro created similar structures (Heineken and Amstel).
During this innovation process, competition increased and profit margins fell again, leading to
renewed motivation for additional innovation.


Year    US ABS          Global CDO    US Agency MBS    US Agency CMO
        (Outstanding)   (Issuance)    (Issuance)       (Issuance)
2000         1084             68            474              586
2001         1230             78           1086             1455
2002         1381             83           1447             2019
2003         1507             87           2131             2762
2004         1814            157           1015             1379
2005         2111            251            983             1345
2006         2700            455            923             1240
2007         2945            430           1189             1471
2008         2599             62           1169             1339
2009         2326              4           1734             2022
2010         2034              7           1420             1885

In billions US dollars. Source: Sifma.

[Figure. Left panel: year-to-year house price changes for Europe, the U.S. and the U.K., 1985-2010.
Right panel: ABX AAA and ABX BB indexes, 2006-2009.]
Left hand side panel: U.S. and European House Price Change (year-to-year, in percentage). Right
hand side panel: Indexes of CDS on U.S. Mortgage-Related Securities Prices. Source: IMF, Global
Financial Stability Report, April 2008.


The response to increased competition was the creation of structured products referenced
to riskier assets. Importantly, during the mid 1990s, derivatives teams began to interact with
teams managing loans extended to borrowers with poor credit history: subprime mortgages.
Subprime loans would begin to be securitized and structured into CDOs with global financial
institutions involved (e.g., Merrill Lynch or UBS). The mortgage banking model had shifted
from one of "buy to hold" to one of "originate and distribute." The subprime crisis erupted in
2007, and sent the global financial system into nearly two years of turmoil, in a situation where
the payments and settlement system was jeopardized.
The mechanics of the 2007-2008 crisis are well-understood. While low interest rates helped
sustain the boom in the housing market, the originate-and-distribute model had operated in a
way that lending standards were not in line with market expectations (see Section 13.6.2). The
subprime mortgage market would sink amid increasing interest rates and rapidly decelerating
housing prices. The shadow banking system, which helped sustain the originate-and-distribute
model, was actually an important piece of the crisis: the uncertainties related to the entities
financing this system triggered a sharp liquidity dry-up and, then, a credit crunch, followed by
a drop in real economic activity, which in turn magnified the credit crunch, in a spiral.
[In progress]
13.1.2 Plan of the chapter
This chapter reviews conceptual approaches to the pricing of defaultable securities as well as the
basic mechanics of derivatives that transfer credit risk. Section 13.2 reviews classical irrelevance
results: the capital structure does not matter for the value of a firm. This result should serve as
a reminder for some of the subsequent developments in this chapter. For example, Section 13.3
deals with structural approaches to debt evaluation, relying on assumptions that are at times
consistent and at other times inconsistent with those underlying classical irrelevance results.
Section 13.3 also deals with reduced-form approaches by which default risk is modeled as an
exogenously given event. Section 13.4 reviews the main credit derivatives that aim to re-allocate
credit risk, such as credit default swaps or securitized obligations.
These lectures have never really dealt with issues regarding risk-management. Section 13.5
contains both introductory discussions regarding risk-management in general and some details
of credit risk management. It also discusses regulatory developments over the relatively recent
history. Section 13.6 discusses a few more details regarding the 2007-08 financial crisis, which
originally erupted when losses mounted in credit markets, and then spread to the overall
economy. It is an exemplary case study (very well utilized in the literature) that illustrates how
a shock in capital markets can affect global developments over a vicious circle: an important
instance of endogenous risk of the type defined and discussed in general terms in Chapter 8 of
these lectures.

13.2 The classics: Modigliani-Miller irrelevance results


Modigliani and Miller (1958) consider an economy where firms can be sorted by the expected
returns of their shares, according to the sector, or "class," they belong to. Let $\bar{X}$ be the constant,
expected profit paid off by each firm within sector $k$, and $S_U$ be the price of an unlevered
firm's share. Under standard conditions, we have that
$S_U = \sum_{t=1}^{\infty} (1+\rho_k)^{-t}\,\bar{X} = \bar{X}/\rho_k$, where $\rho_k$ is the
risk-adjusted discount rate prevailing in sector $k$, such that the return on equity (ROE) for the
unlevered firm is,
\[ \rho_k = \frac{\bar{X}}{S_U}, \]
a constant for all the unlevered firms belonging to sector $k$. Naturally, the value of the firm is
equal to the value of equity, $V_U = S_U$, say. Next, consider a levered firm operating within the
$k$-th sector. This firm issues debt with nominal value equal to $D$, such that its value, denoted as
$V_L$, equals the sum of equity and debt, $V_L = S_L + D$. In the absence of any market frictions,
we have the following irrelevance result:
Theorem 13.1 (Modigliani & Miller theorem). In the absence of arbitrage and frictions, the
market value of any firm is independent of its capital structure and is given by discounting its
expected profits at the discount rate appropriate to its class:
$V_j = \bar{X}/\rho_k$, for any firm $j \in \{U, L\}$ in class $k$.
In other words, the return on investment (ROI), defined as $\rho_k = \bar{X}/V_j$, is the same for two firms
that earn the same expected profit $\bar{X}$, regardless of the capital structure. Naturally, the ROE
and ROI are the same for the unlevered firm.
The proof of Theorem 13.1 can proceed by applying the modern tools reviewed in Chapters
2 through 4. For the sake of completeness, we use the original Modigliani and Miller arguments,
which are very simple. Consider two firms: a first, unlevered and a second, levered. They both
earn the same expected profit, $\bar{X}$. Suppose we purchase shares of the unlevered firm and
borrow in proportion to the debt issued by the levered firm. In the absence of arbitrage or
any frictions, the value of this portfolio should equal the value of the shares of the levered firm,
which is possible only if the values of the levered and the unlevered firm are the same.
Mathematically, given an arbitrary $\alpha \in (0,1)$, we do the following trade: (i) we buy an
amount $\alpha V_L = \alpha (S_L + D)$ of shares of the unlevered firm, i.e. a fraction
$\alpha V_L / V_U$ of that firm; (ii) we sell $\alpha S_L$ worth of shares of the levered firm. These
two trades make the balance of the position worth $-\alpha V_L + \alpha S_L = -\alpha D$, and so (iii) we
borrow $\alpha D$ at the interest rate $r$, to make this initial position worthless. This portfolio yields:
(i) $+\alpha \frac{V_L}{V_U}\bar{X}$, due to the purchase of the shares of the unlevered firm,
(ii) $-\alpha(\bar{X} - rD)$, due to the sale of the shares of the levered firm, which of course has
to pay interest on its debt, and (iii) $-\alpha r D$, arising to honour the debt we are making to
build up the worthless portfolio. Summing up, the profits are
\[ \alpha \frac{V_L}{V_U}\bar{X} - \alpha(\bar{X} - rD) - \alpha r D = \alpha \bar{X}\left(\frac{V_L}{V_U} - 1\right). \]
If $V_L > V_U$, we have an arbitrage opportunity as we may make money out of a worthless
portfolio, and if $V_L < V_U$, we have an arbitrage as well, as we could reverse the positions of the
worthless portfolio. So we need to have that $V_L = V_U = S_U = \bar{X}/\rho_k$.
[As mentioned, Theorem 13.1 can be proved through the modern tools in Chapters 2 through
4]
We have: $\text{ROI} = \bar{X}/V_L$ and $\text{ROE} = (\bar{X} - rD)/S_L$. Therefore,
\[ \text{ROE} = \frac{\text{ROI}\,(S_L + D) - rD}{S_L} = \text{ROI} + (\text{ROI} - r)\,\frac{D}{S_L}. \]
If the financial conditions of the firm do not affect the interest rate on debt, the ROE is
increasing in the leverage ratio, $D/S_L$, provided $\text{ROI} > r$. This situation arises when the arbitrage
arguments underlying Theorem 13.1 assume no-arbitrage trades can be implemented with a
cost of borrowing money equal to that of the firm. In the presence of market frictions such
as asymmetric information between borrowers and lenders, this need not be the case. For
example, debt markets might be concerned about the size of the leverage ratio. Assume, for
example, that $r = r(\ell)$, where $\ell = D/S_L$, and in particular that $r(\ell) = 0.03\,\ell$. Then, we have that:
$\text{ROE} = \text{ROI} + (\text{ROI} - 0.03\,\ell)\,\ell$. The picture below depicts the behavior of ROE as a function
of $\ell$, assuming that ROI = 5% and that the risk-free rate in case of no such frictions is $r = 3\%$.

ROE

0.09

0.08

0.07

0.06

0.05

0.04

0.03

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

Leverage ratio

The solid line depicts the ROE for a rm sustaining a cost of debt independent of the
leverage ratio, with ROI = 5% and = 2%. The dashed line is the ROE for a rm that
has a cost of debt increasing in the leverage ratio , ( ) = 0 03 .

Consider the firm with cost of capital depending on the current leverage ratio, $\ell = D/S_L$. For a low
level of $\ell$, the ROE increases with $\ell$, so as to magnify the difference $\text{ROI} - 0.03\,\ell$ through the
multiplying effect $(\text{ROI} - 0.03\,\ell)\,\ell$. However, for higher leverage ratios, the difference $\text{ROI} - 0.03\,\ell$
becomes thinner and thinner, and an increase in $\ell$ then leads to marginally lower ROE. In this
example, there is an interior value for the leverage ratio that maximizes the ROE, which is,
approximately, $\ell = 0.83$.
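The leverage example can be checked numerically. The following is a minimal sketch, assuming ROI = 5%, a flat cost of debt of 3%, and the leverage-dependent cost of debt equal to 0.03 times the leverage ratio; the function and variable names are illustrative and are not part of the text.

```python
def roe(leverage, roi, cost_of_debt):
    """ROE = ROI + (ROI - r(l)) * l, where l is the leverage ratio D/S_L."""
    return roi + (roi - cost_of_debt(leverage)) * leverage


ROI = 0.05


def flat_cost(leverage):
    # Cost of debt independent of leverage (assumed 3%).
    return 0.03


def rising_cost(leverage):
    # Cost of debt increasing in leverage, r(l) = 0.03 * l.
    return 0.03 * leverage


# Evaluate both cases on a grid of leverage ratios between 0 and 2.
grid = [i / 1000 for i in range(2001)]
roe_flat = [roe(l, ROI, flat_cost) for l in grid]
roe_rising = [roe(l, ROI, rising_cost) for l in grid]

# Leverage ratio maximizing ROE when the cost of debt rises with leverage.
best_leverage = max(grid, key=lambda l: roe(l, ROI, rising_cost))
best_roe = roe(best_leverage, ROI, rising_cost)
print(f"ROE-maximizing leverage: {best_leverage:.3f}, ROE there: {best_roe:.4f}")
```

With a flat cost of debt the ROE rises linearly with leverage, whereas with the rising cost of debt the grid search should recover the interior maximum near 0.83, which is ROI / (2 x 0.03) from the first-order condition.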

13.3 Conceptual approaches to valuation of defaultable securities


13.3.1 Firm value, or structural, approaches
Structural approaches rely on the structure of the firm: shares and bonds are viewed as derivatives
written on the firm's asset value. We begin by reviewing Merton's model, which stems from a
Modigliani-Miller world, where, as we know, the value of the firm is not affected by leverage.
Note from the outset the limitation of such a model: it is a model with which we evaluate debt,
assuming a world where leverage does not even affect the value of the firm that is issuing the debt!
Consider the following stylized balance sheet.

Assets ($A$)    |  Equity ($E$)  (Shares)
                |  Debt ($D$)    (Bonds)

Therefore, we have the accounting identity: Assets = Equity + Debt, or
\[ A = E + D. \]
When debt expires, debt-holders receive the minimum between the nominal value of debt and
the value of the assets the firm can liquidate to honour debt. Debt-holders are senior claimants.
Equity holders are juniors, i.e., they are residual claimants to the firm's assets.
We use these basic insights to illustrate the first approach to the modeling of the
risk-structure of interest rates: the Merton-KMV approach. In this approach, equity is the same as
a European call option written on the firm's assets, with expiration equal to the debt expiration,
and strike equal to the nominal value of debt. The current value of debt equals the value of the
assets minus the value of equity, i.e. the value of a risk-free discount bond minus the value of
a put option on the firm with strike price equal to the nominal value of debt, as shown by Eq.
(13.3) below.
Merton (1974) uses the Black and Scholes (1973) formula to derive the price of debt. The
main assumption underlying this model is that the assets of the firm can be traded, and that
their value $A_t$ satisfies (see footnote 2)
\[ \frac{dA_t}{A_t} = r\,dt + \sigma\,dW_t, \tag{13.1} \]
where $W_t$ is a Brownian motion under the risk-neutral probability, $\sigma$ is the instantaneous
standard deviation, and $r$ is the short-term rate on riskless bonds.
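Let $N$ be the nominal value of debt, $T$ be the time of expiration of debt, and $D_t$ the debt value
as of time $t \le T$. As argued earlier, shareholders are long a European call option, and the
bond-holders receive the remaining part of the asset value. Mathematically,
\[ D_T = \begin{cases} A_T & \text{if the firm defaults, i.e. } A_T < N \\ N & \text{if the firm is solvent, i.e. } A_T \ge N \end{cases} \]
We can decompose the firm asset value at time $T$ into the sum of the value of equity and the
value of debt at $T$:
\[ D_T = \min\{A_T, N\} = A_T - \underbrace{\max\{A_T - N, 0\}}_{\text{Equity at } T} \tag{13.2} \]
Note, also, that,
\[ D_T = \min\{A_T, N\} = N - \underbrace{\max\{N - A_T, 0\}}_{\text{Put on the firm}} \tag{13.3} \]
That is, credit risk raises the cost of capital.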
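The two decompositions of the terminal debt payoff can be verified on a few asset values. This is a small sketch, assuming a nominal debt of 1 as in the figures of this section; the function names are illustrative.

```python
def debt_payoff(a_T, n):
    """Debt at maturity: bond-holders receive min(A_T, N)."""
    return min(a_T, n)


def equity_payoff(a_T, n):
    """Equity at maturity: a call on the assets struck at N."""
    return max(a_T - n, 0.0)


def put_payoff(a_T, n):
    """Put on the firm's assets struck at N."""
    return max(n - a_T, 0.0)


N = 1.0  # nominal value of debt (assumed)
asset_values = [0.0, 0.4, 0.8, 1.0, 1.3, 2.2]

for a_T in asset_values:
    d = debt_payoff(a_T, N)
    # Debt = assets minus equity, and debt = riskless payoff minus put.
    via_equity = a_T - equity_payoff(a_T, N)
    via_put = N - put_payoff(a_T, N)
    print(f"A_T={a_T:.1f}  debt={d:.1f}  A_T-equity={via_equity:.1f}  N-put={via_put:.1f}")
```

The last two columns coincide with the debt payoff for every asset value, which is all that Eqs. (13.2) and (13.3) state at maturity.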


A word on convexity, and risk-taking behavior. Convexity: Managers have incentives to invest
in risky assets, as the terminal payoff to them is increasing in the firm asset volatility, $\sigma$.
Concavity: The value of debt, instead, is decreasing in the firm asset volatility, as we shall show
in detail in the next section.
Footnote 2: Eq. (13.1) could be generalized to one in which
$dA_t = (rA_t - \delta_t)\,dt + \sigma A_t\,dW_t$, where $\delta_t$ is the instantaneous cash flow to the
firm. This would make the firm value equal to $A_0 = E\int_0^\infty e^{-rt}\delta_t\,dt$. For example, one could take
$\delta_t$ to be a geometric Brownian motion with parameters $\mu$ and $\sigma$, in which case
$A_t = (r-\mu)^{-1}\delta_t$, forever, but we're just ignoring this complication.

[Figure: payoffs at maturity (vertical axis, 0 to 1.2) against the firm asset value A_T (horizontal axis, 0 to 2.2).]
FIGURE 13.1. Dashed line: the value of equity at the debt maturity, $T$, $\max\{A_T - N, 0\}$,
plotted as a function of the firm asset value, $A_T$. Solid line: the value of debt at maturity,
$\min\{A_T, N\}$ as a function of $A_T$. Nominal value of debt is fixed to $N = 1$.

13.3.1.1 Merton

The current value of the bonds equals the current value of the assets, $A_0$, minus the current
value of equity. The current value of equity can be obtained through the Black & Scholes formula,
as equity is a European call option on the firm, struck at $N$. By Eq. (13.2), and standard
risk-neutral evaluation, the current value of debt, $D_0$, is,
\[ D_0 = A_0\,\Phi(-d_1) + N e^{-rT}\,\Phi(d_1 - \sigma\sqrt{T}), \qquad d_1 = \frac{\ln(A_0/N) + (r + \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}, \tag{13.4} \]
where $\Phi(\cdot)$ denotes the distribution function of a standard normal variable (see footnote 3).
Footnote 3: For the details, note that $e^{-rT}E(A_T \mid A_0) = A_0$ and, then, by Eq. (13.2),
\[ D_0 = e^{-rT}E[D_T \mid A_0] = A_0 - e^{-rT}E[\max\{A_T - N, 0\} \mid A_0] = A_0 - \left[A_0\Phi(d_1) - Ne^{-rT}\Phi(d_1 - \sigma\sqrt{T})\right], \]
where the last equality follows by the Black & Scholes formula. Eq. (13.4) follows after rearranging terms in the previous equation.

[Figure: bond value (vertical axis, 0 to 1) against the current asset value A_0 (horizontal axis, 0 to 2).]
FIGURE 13.2. Solid line: the no-arbitrage bound, $\min\{A_0, N\}$, depicted as a function
of $A_0$, when the nominal value of debt is fixed to $N = 1$. Dashed line: the bond value
predicted by the Merton's model when $T = 1$, $r = 3\%$ and $\sigma = 20\%$, annualized. Dotted
line: same as the dashed line, but with a higher asset volatility, $\sigma = 40\%$.

Bond prices are decreasing in the asset volatility as bad outcomes are exaggerated on the
downside, due to the concavity properties depicted in Figure 13.1. Note that the property that
the term structure of interest rates increases with the volatility of the fundamentals is quite
sharp here. In Chapter 12 (Section 12.3.4.1), it was argued that the relation between the yield
curve and the volatility of the fundamentals (i.e. the volatility of the short-term rate) was quite
complex, as it depends on which of two effects dominates: a convexity and a risk-premium
effect. In bad times, it should be the risk-premium effect to dominate, thereby leading to a
positive link between the volatility of the fundamentals and the yield curve. In good times,
a convexity effect would lead the yield curve to be negatively related to the volatility of the
fundamentals. Instead, the prediction in this section is quite neat: the term structure of interest
rates always increases with the volatility of the fundamentals. Naturally, this prediction relies
on a channel that is completely distinct from the risk-premium channel discussed in Chapter
12.
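The debt valuation of Merton's model and its sensitivity to asset volatility can be checked with a few lines of code. The sketch below assumes the parameter values used in the figures of this section (nominal debt 1, maturity 1 year, short-term rate 3%, asset value 1.1); the function names are illustrative, and the normal distribution function is built from the standard library's `math.erf`.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def d1_d2(a0, n, r, sigma, T):
    d1 = (log(a0 / n) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return d1, d1 - sigma * sqrt(T)


def bs_call(a0, n, r, sigma, T):
    """Equity value: European call on the assets, struck at N."""
    d1, d2 = d1_d2(a0, n, r, sigma, T)
    return a0 * norm_cdf(d1) - n * exp(-r * T) * norm_cdf(d2)


def bs_put(a0, n, r, sigma, T):
    """European put on the assets, struck at N."""
    d1, d2 = d1_d2(a0, n, r, sigma, T)
    return n * exp(-r * T) * norm_cdf(-d2) - a0 * norm_cdf(-d1)


def merton_debt(a0, n, r, sigma, T):
    """Debt value: assets in default states plus discounted N in survival states."""
    d1, d2 = d1_d2(a0, n, r, sigma, T)
    return a0 * norm_cdf(-d1) + n * exp(-r * T) * norm_cdf(d2)


A0, N, R, T = 1.1, 1.0, 0.03, 1.0
debt_low_vol = merton_debt(A0, N, R, 0.20, T)
debt_high_vol = merton_debt(A0, N, R, 0.40, T)
print(f"Debt value, sigma=20%: {debt_low_vol:.4f}")
print(f"Debt value, sigma=40%: {debt_high_vol:.4f}")
print(f"Assets minus equity:   {A0 - bs_call(A0, N, R, 0.20, T):.4f}")
print(f"Riskless bond minus put: {N * exp(-R * T) - bs_put(A0, N, R, 0.20, T):.4f}")
```

The last two printed values should agree with the first one, reflecting the two equivalent readings of debt as assets minus a call and as a riskless bond minus a put, while the debt value with 40% volatility should be lower than that with 20%.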
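The term structure of interest rates is defined as usual as:
\[ R_0 \equiv -\frac{1}{T}\ln\frac{D_0}{N} = r + s_0, \quad\text{where}\quad s_0 = -\frac{1}{T}\ln\left[\frac{A_0}{Ne^{-rT}}\,\Phi(-d_1) + \Phi(d_1 - \sigma\sqrt{T})\right] \tag{13.5} \]
We usually refer to $s_0$ as the term-spread for a given fixed maturity, and to the mapping from
maturities to spreads as the risk-structure of interest rates.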
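A minimal sketch of the spread calculation follows, assuming nominal debt 1, a short-term rate of 3% and asset volatility of 20%; the function names are illustrative. It evaluates the spread directly from the expression in brackets and, as a cross-check, from the yield on the bond.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def merton_debt(a0, n, r, sigma, T):
    """Current debt value in Merton's model."""
    d1 = (log(a0 / n) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return a0 * norm_cdf(-d1) + n * exp(-r * T) * norm_cdf(d2)


def merton_spread(a0, n, r, sigma, T):
    """Credit spread: yield on the defaultable bond in excess of r."""
    d1 = (log(a0 / n) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    inverse_leverage = a0 / (n * exp(-r * T))
    return -log(inverse_leverage * norm_cdf(-d1) + norm_cdf(d2)) / T


N, R, SIGMA = 1.0, 0.03, 0.20
maturities = [0.01, 0.25, 1.0, 5.0, 10.0]

for a0 in (1.1, 1.2, 1.3, 0.9):
    spreads_bp = [1e4 * merton_spread(a0, N, R, SIGMA, T) for T in maturities]
    row = "  ".join(f"{s:9.2f}" for s in spreads_bp)
    print(f"A0={a0:.1f}  spreads (bp) at T={maturities}: {row}")
```

For asset values above the nominal debt, the very short maturity spread should be negligible, while for an asset value below the nominal debt the spread should become very large as maturity shrinks.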
Figure 13.3.1 depicts the spread predicted by this model. Credit spreads shrink to zero as
time-to-maturity becomes smaller and smaller. This property of the model stands in sharp
contrast with the empirical behavior of credit spreads, which are high even for short-maturity
bonds. This property arises because the model is driven by Brownian motions, which
have continuous sample paths, such that given a firm asset value $A_0 > N$, the probability of
bankruptcy, arising when $A_t$ hits $N$ from above, approaches zero very fast as time-to-maturity
goes to zero. Because credit spreads reflect default probabilities, as explained in detail below
(see Eq. (13.9)), credit spreads shrink to zero quickly as time-to-maturity approaches zero.
Naturally, one might end up with credit spreads sufficiently high at short maturities, by
assuming the firm asset value is sufficiently small. For example, in Figure 13.3.1, credit spreads
are high at short maturities, when $A_0 = 1.1$. However, even with $A_0 = 1.1$, credit spreads are
still zero at very short maturities. More fundamentally, requiring such a small value for $A_0$ is
problematic. Firms with such a low asset value would command a much higher spread than
that in Figure 13.3.1. All in all, the Brownian motion model in this section lacks some source of
risk driving the behavior of short-term spreads. In Section 13.3.2, we will show that this issue
can be addressed assuming that firms' default can be triggered by jumps.

[Figure: spread in basis points (vertical axis, ticks at 100, 200, 300) against time to maturity (horizontal axis).]
FIGURE 13.3.1. The term structure of spreads, $s_0$, in basis points, predicted by Merton's
model, obtained with initial asset values $A_0 = 1.1$ (solid line), $A_0 = 1.2$ (dashed line),
and $A_0 = 1.3$ (dotted line). The short-term rate is $r = 3\%$, and asset volatility is $\sigma = 0.20$.
Nominal debt $N = 1$.

Naturally, the term-structure of credit spreads has a rather different shape, when the current
firm asset value is below $N$, as depicted in Figure 13.3.2. In this case, the probability the firm
defaults is close to one when time to maturity is close to zero, such that the spreads would then
be arbitrarily large as we get closer and closer to maturity. For visualization purposes, Figure
13.3.2 is truncated to only include values of the spreads for maturities higher than one year.

[Figure: spread in basis points (vertical axis, 500 to 3000) against time to maturity (horizontal axis, starting at 1 year).]
FIGURE 13.3.2. The term structure of spreads, $s_0$, in basis points, predicted by Merton's
model, obtained with initial asset values $A_0 = 0.9$ (solid line), $A_0 = 0.8$ (dashed line),
and $A_0 = 0.7$ (dotted line). The short-term rate is $r = 3\%$, and asset volatility is $\sigma = 0.20$.
Nominal debt $N = 1$.

What is the asymptotic behavior of the spread predicted by this model? If $r > \frac{1}{2}\sigma^2$, then,
as $T \to \infty$, the probability of survival for the firm, which we shall show to be
$\Phi(d_1 - \sigma\sqrt{T})$ below (see Eq. (13.7)), approaches one, $\Phi(d_1 - \sigma\sqrt{T}) \to 1$.
That is, the firm asset value is expected to grow so large that default will never occur, such that
the bond becomes riskless and $s_0 \approx -\frac{1}{T}\ln 1 = 0$. Intuitively, when
$r > \frac{1}{2}\sigma^2$, the asset volatility is so small, that the exponential trend for $A_t$ will
make it unlikely that the firm asset value will fall below the constant value $N$. In other words,
Merton's model predicts that in the long-run, things can only go well for the firm, a view quite
opposite to that leading to positive spreads for long maturities.
To summarize, Merton's model predicts that short-term spreads are zero and long-term
spreads are likely so. Intensity models, such as those analyzed in Section 13.3.3, help mitigate
these counterfactual features. Intuitively, an intensity model is one where a firm can default
at any time, thereby leading to positive short-term spreads. If the probability of default is
time-varying and mean-reverting, long-term spreads might then be positive, especially when the
current default probability is so low that it is expected to increase to high levels, due to
mean-reversion.
13.3.1.2 Assessing credit quality

We introduce a useful summary statistic to measure credit quality, called distance-to-default
(under Q). We can use the previous model to estimate the likelihood of default for a given firm.
First, we develop Eq. (13.2),
\[ D_T = \min\{A_T, N\} = A_T\,I\{A_T < N\} + N\,I\{A_T \ge N\}, \]
where $I\{E\}$ is the indicator function, i.e. $I\{E\} = 1$ if the event $E$ is true and $I\{E\} = 0$ if the event
$E$ is false. Second, we have,
\[ D_0 = e^{-rT}E(D_T) = e^{-rT}\left[E\left(A_T\,I\{A_T < N\}\right) + N\,E\left(I\{A_T \ge N\}\right)\right]
 = e^{-rT}\left[E(A_T \mid \text{Default})\,Q(\text{Default}) + N\,Q(\text{Survival})\right], \tag{13.6} \]
where $E(A_T \mid \text{Default})$ is the expected firm asset value given the event of default, $Q(\text{Default})$ is
the probability of default, and $Q(\text{Survival}) = 1 - Q(\text{Default})$ is the probability the firm does
not default. The last equality follows by the Law of Iterated Expectations,
\[ E\left(A_T\,I\{A_T < N\}\right) = E\left[E\left(A_T\,I\{A_T < N\} \mid I\{A_T < N\}\right)\right]
 = E\left[I\{A_T < N\}\,E\left(A_T \mid I\{A_T < N\}\right)\right] = E\left(I\{A_T < N\}\right)E(A_T \mid \text{Default}). \]
Comparing Eq. (13.6) and Eq. (13.4) reveals that for Merton's model,
\[ Q(\text{Survival}) = \Phi(d_2), \tag{13.7} \]
where for obvious reasons, the quantity,
\[ d_2 = \frac{\ln(A_0/N) + (r - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}} \tag{13.8} \]
is called distance-to-default. Distance to default is a useful summary statistic, providing an
intuitive gauge of how far the firm is from defaulting: the higher $d_2$, the higher the
(risk-adjusted) probability of surviving, $\Phi(d_2)$. The higher the current firm asset value $A_0$ is, the less
likely it is the firm will default at $T$.
By Eq. (13.1), we have that $E(\ln A_T \mid A_0) = \ln A_0 + (r - \frac{1}{2}\sigma^2)T$, so Eq. (13.8) tells us that
distance-to-default is simply the difference $E(\ln A_T \mid A_0) - \ln N$, normalized by the standard
deviation of the assets over the life of debt. Some, then, might prefer to use the slightly different
formula,
\[ \text{Distance-to-default} = \frac{\text{asset mkt value} - \text{default value}}{\text{asset mkt value} \times \text{asset volatility}} \]
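Distance-to-default and the implied risk-adjusted survival probability are straightforward to compute. The sketch below assumes an asset value of 1.1, nominal debt 1 and a short-term rate of 3%; the function names are illustrative.

```python
from math import erf, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def distance_to_default(a0, n, r, sigma, T):
    """Expected log asset value minus log nominal debt, per unit of asset risk."""
    return (log(a0 / n) + (r - 0.5 * sigma**2) * T) / (sigma * sqrt(T))


def survival_probability(a0, n, r, sigma, T):
    """Risk-adjusted probability that assets exceed nominal debt at T."""
    return norm_cdf(distance_to_default(a0, n, r, sigma, T))


A0, N, R = 1.1, 1.0, 0.03

# Survival probability against asset volatility, for three debt maturities.
for sigma in (0.10, 0.20, 0.30):
    probs = [survival_probability(A0, N, R, sigma, T) for T in (0.5, 1.0, 2.0)]
    print(f"sigma={sigma:.2f}  Q(Survival) at T=0.5,1,2: "
          + ", ".join(f"{p:.4f}" for p in probs))

# Survival probability against maturity, with a low asset volatility.
for T in (0.5, 4.0, 30.0):
    print(f"sigma=0.10  T={T:>4}: Q(Survival)={survival_probability(A0, N, R, 0.10, T):.4f}")
```

The first loop shows survival probabilities falling with asset volatility at each maturity; the second illustrates that, with low volatility, the survival probability first falls and then rises again with maturity.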
How does the probability of survival for a given rm relate to debt maturity or asset volatility?
In Figure 13.4, the probability of survival decreases with, (i) debt maturity and (ii) asset
volatility.

[Figure: probability of survival (vertical axis, 0.5 to 1.0) against asset volatility sigma (horizontal axis, 0 to 0.4).]
FIGURE 13.4. Probability of survival for a given firm predicted by the Merton's model,
$\Phi(d_2)$, depicted as a function of the asset volatility, $\sigma$. Firm asset value is fixed at $A_0 = 1.1$,
and plotted are survival probabilities for bonds maturing at $T = 0.5$ years (solid line),
$T = 1$ year (dashed line) and $T = 2$ years (dotted line). The short-term rate is $r = 3\%$.
Nominal debt $N = 1$.

Property (i) is not a general property, though. For example, we already pointed out that for
large $T$, the probability of survival is close to one as soon as $r > \frac{1}{2}\sigma^2$, a condition ensuring the
firm asset value grows so large to ensure default becomes unlikely, eventually. The next picture
shows that for a fixed $\sigma$, such that $r > \frac{1}{2}\sigma^2$, the probability of survival is non-monotonic in $T$.

[Figure: probability of survival (vertical axis, 0.86 to 1.00) against time-to-maturity in years (horizontal axis, 0 to 30).]
Probability of survival for a given firm predicted by the Merton's model, $\Phi(d_2)$, depicted
as a function of time-to-maturity, when the firm asset value is fixed at $A_0 = 1.1$, and asset
volatility is $\sigma = 0.10$. The short-term rate is $r = 3\%$. Nominal debt $N = 1$.

This property arises for the following reason. Assuming that $A_0 > N$ and $r > \frac{1}{2}\sigma^2$, the first
term of $d_2$ in Eq. (13.8), $\ln(A_0/N)/(\sigma\sqrt{T})$, is decreasing in $T$, whereas the second,
$(r - \frac{1}{2}\sigma^2)\sqrt{T}/\sigma$, is increasing. When $T$ is small,
the first term (and its sensitivity to $T$) dominates, such that distance-to-default decreases with
maturity. But for $T$ large, the second term of $d_2$ dominates, and distance-to-default becomes
eventually large. Non-monotonicities arise even at finite maturities, once we consider low values
of $A_0$, in which case the relation between maturity and probability of survival can be increasing
or decreasing, according to the values of $\sigma$, as shown in Figure 13.5. Intuitively, when $A_0 \approx N$,
the probability of survival is:
\[ Q(\text{Survival}) = \Phi(d_2), \quad\text{with}\quad d_2 = \frac{\ln(A_0/N) + (r - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}} \approx \frac{r - \frac{1}{2}\sigma^2}{\sigma}\sqrt{T}, \]
such that the survival probability decreases in $T$ for large $\sigma$ although then it increases in $T$ for
small $\sigma$. The intuition underlying this property is that for large $\sigma$, the probability the firm asset
value will end up below $N$ from $A_0 \approx N$ can only increase with time to maturity, $T$. Analytically,
$E(\ln A_T \mid A_0) = \ln A_0 + (r - \frac{1}{2}\sigma^2)T \approx \ln N + (r - \frac{1}{2}\sigma^2)T$, such that,
when $r < \frac{1}{2}\sigma^2$, the probability the firm asset value will be below $N$ does indeed increase with $T$.
[Figure: probability of survival (vertical axis, 0.5 to 1.0) against asset volatility sigma (horizontal axis, 0 to 0.4).]
FIGURE 13.5. Probability of survival for a given firm predicted by the Merton's model,
$\Phi(d_2)$, depicted as a function of the asset volatility, $\sigma$. The firm asset value is fixed at
$A_0 = 1.01$, and plotted are survival probabilities for bonds maturing at $T = 0.5$ years
(solid line), $T = 1$ year (dashed line) and $T = 2$ years (dotted line). The short-term rate is
$r = 3\%$. Nominal debt $N = 1$.

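A final useful concept is that of loss-given-default (under Q), denoted as LGD in the sequel.
Comparing Eq. (13.6) with Eq. (13.4) reveals another property of Merton's model,
\[ E(A_T \mid \text{Default}) = \frac{A_0 e^{rT}\,\Phi(-d_1)}{Q(\text{Default})} = A_0 e^{rT}\,\frac{\Phi(-d_1)}{\Phi(-d_2)} = E(A_T)\,\frac{\Phi(-d_1)}{\Phi(-d_2)}. \]
Recovery rates are defined as the fraction of the bond value the bond-holders expect to obtain
at maturity and in the event of default:
\[ \text{Rec} \equiv \frac{E(A_T \mid \text{Default})}{N} = \frac{E(A_T)}{N}\,\frac{\Phi(-d_1)}{\Phi(-d_2)}. \]
Loss-given-default is defined as the fraction of the bond value the bond-holders expect to lose
at maturity and in the event of default, i.e., $\text{LGD} = 1 - \text{Rec}$. Finally, by Eq. (13.5), we can
write,
\[ s_0 = -\frac{1}{T}\ln\left[\frac{A_0}{Ne^{-rT}}\,\Phi(-d_1) + \Phi(d_2)\right]
 = -\frac{1}{T}\ln\left(\text{Rec}\cdot Q(\text{Default}) + Q(\text{Survival})\right)
 \approx \frac{1}{T}\left[\text{LGD}\cdot Q(\text{Default})\right]. \tag{13.9} \]
The approximation holds because $\text{Rec}\cdot Q(\text{Default}) + Q(\text{Survival}) = 1 - \text{LGD}\cdot Q(\text{Default})$,
and $-\ln(1-x) \approx x$ for small $x$.
This is actually a general formula, which goes beyond Merton's model. It can easily be
obtained through Eq. (13.6).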

An important note. Previously, we defined survival probabilities, distance-to-default, and loss-given-default, under the risk-adjusted probability Q. To calculate the same objects under the true probability P, we replace $r$ with the asset growth rate under the physical probability, $\mu$, in the formulae for the survival probabilities, $\Phi(d_2)$, distance-to-default, $d_2$, and loss-given-default. However, it is hard to estimate $\mu$ for many single names. Moody's KMV EDF™ are based on dynamic structural models like these, although the details are not publicly known. Finally, we could use historical data about default frequencies to estimate the probability that a given single name within a certain industry will default. These frequencies are based on samples of firms that have defaulted in the past, with similar characteristics to those of the firm under evaluation (in terms, for instance, of distance-to-default).
How to estimate $A_t$ and $\sigma$? One algorithm is to start with some $\sigma$ equal to the volatility of equity returns, say $\sigma^{(0)}$, and use Merton's formula for equity, to extract $A_t^{(0)}$ for each date $t \in \{1, \dots, T\}$, where $T$ is the sample size. Then, use $A_t^{(0)}$ to compute the standard deviation of $\ln(A_t^{(0)}/A_{t-1}^{(0)})$. This gives say $\sigma^{(1)}$, which can be used as the new input into Merton's formula to extract say $A_t^{(1)}$. We obtain a sequence of $(A_t^{(n)})_{t=1}^{T}$ and $\sigma^{(n)}$, and we stop for $n$ sufficiently large, according to some criterion.
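The iteration just described can be sketched in a few lines of Python. This is only an illustration under assumptions that are ours, not the text's: equity is treated as a call on the assets with a constant horizon `T`, the equity formula is inverted by bisection, volatilities are annualized with a sampling interval `dt`, and the stopping criterion is a tolerance on successive volatility estimates. All function names are hypothetical, and the data are simulated.

```python
import math
import random
import statistics

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_equity(A, N, r, sigma, T):
    """Equity as a call on the assets with strike N (Merton's formula)."""
    d1 = (math.log(A / N) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return A * norm_cdf(d1) - N * math.exp(-r * T) * norm_cdf(d2)

def implied_assets(E, N, r, sigma, T, steps=60):
    """Invert the equity formula for A by bisection (equity increases with A)."""
    lo, hi = E, E + N  # no-arbitrage bracket when r >= 0: E <= A <= E + N
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if merton_equity(mid, N, r, sigma, T) < E:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def annualized_vol(path, dt):
    """Annualized standard deviation of the log-changes of a path."""
    rets = [math.log(b / a) for a, b in zip(path, path[1:])]
    return statistics.stdev(rets) / math.sqrt(dt)

def estimate_asset_vol(equity, N, r, T, dt, tol=1e-6, max_iter=100):
    """Iterate sigma^(n) -> A_t^(n) -> sigma^(n+1) until the change is below tol."""
    sigma = annualized_vol(equity, dt)  # sigma^(0): volatility of equity returns
    for _ in range(max_iter):
        assets = [implied_assets(E, N, r, sigma, T) for E in equity]
        new_sigma = annualized_vol(assets, dt)
        if abs(new_sigma - sigma) < tol:
            return new_sigma, assets
        sigma = new_sigma
    return sigma, assets

# Simulated data (assumption): daily asset values with true volatility 20%.
random.seed(0)
true_sigma, r, N, T, dt = 0.20, 0.03, 100.0, 1.0, 1.0 / 250
A = [150.0]
for _ in range(250):
    z = random.gauss(0.0, 1.0)
    A.append(A[-1] * math.exp((r - 0.5 * true_sigma ** 2) * dt
                              + true_sigma * math.sqrt(dt) * z))
equity = [merton_equity(a, N, r, true_sigma, T) for a in A]

sigma_hat, assets_hat = estimate_asset_vol(equity, N, r, T, dt)
print(f"equity vol = {annualized_vol(equity, dt):.3f}, asset vol = {sigma_hat:.3f}")
```

Because equity is a levered claim, the starting value $\sigma^{(0)}$ overstates asset volatility, and the iteration brings the estimate back towards the volatility used in the simulation.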
13.3.1.3 One example

Assume a firm has asset value $A_0 = 110$, and that the asset value volatility is $\sigma = 30\%$, annualized. The safe interest rate is $r = 2\%$, annualized, and the expected growth rate of the asset value is $\mu = 5\%$, annualized. The firm has outstanding debt with nominal value $N = 100$, which expires in two years.
First, we compute the distance-to-default implied by Merton's model, which is,

$$\text{D-t-D} = \frac{\ln\frac{A_0}{N} + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}} = \frac{\ln(1.1) + \left(0.02 - \frac{1}{2}\,0.3^2\right)\times 2}{0.3\sqrt{2}} = 0.10680.$$

Accordingly, the probability of default is,

$$1 - \Phi(0.10680) = 1 - 0.54253 = 0.45747.$$

We can compute the same probability, under the physical probability, by simply replacing $r = 2\%$ with $\mu = 5\%$, in the formula for D-t-D. We have,

$$\text{D-t-D}^{\text{physical}} = \frac{\ln\frac{A_0}{N} + \left(\mu - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}} = \frac{\ln(1.1) + \left(0.05 - \frac{1}{2}\,0.3^2\right)\times 2}{0.3\sqrt{2}} = 0.24822.$$

Therefore, the probability of default under the physical distribution is,

$$1 - \Phi(\text{D-t-D}^{\text{physical}}) = 1 - \Phi(0.24822) = 1 - 0.59802 = 0.40198.$$

It is, of course, lower under the physical probability than under the risk-neutral probability, due to the larger asset growth rate, $\mu > r$.
Finally, we can compute the spread on this bond, which is given by:

$$\text{Spread} = -\frac{1}{T}\ln\left(\frac{A_0}{N}e^{rT}\,\Phi(-d_1) + \Phi(d_2)\right),$$

where $d_2 = \text{D-t-D}$, and $d_1 = d_2 + \sigma\sqrt{T}$. So we have,

$$\text{Spread} = -\frac{1}{2}\ln\left(1.1\,e^{0.02\times 2}\,\Phi\left(-\left(0.10680 + 0.30\sqrt{2}\right)\right) + \Phi(0.10680)\right) = -\frac{1}{2}\ln\left(1.1\,e^{0.02\times 2}\times 0.29769 + 0.54253\right) = 6.20\%.$$
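The numbers in this example can be reproduced with a few lines of Python. The sketch below only restates the formulae of this subsection; the function and variable names are ours, and the normal distribution function is built from `math.erf`.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def distance_to_default(A0, N, drift, sigma, T):
    """d2 with a generic drift: r under Q, mu under P."""
    return (math.log(A0 / N) + (drift - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))

# Inputs of the example in the text.
A0, N, sigma, r, mu, T = 110.0, 100.0, 0.30, 0.02, 0.05, 2.0

dtd_q = distance_to_default(A0, N, r, sigma, T)   # risk-neutral D-t-D
dtd_p = distance_to_default(A0, N, mu, sigma, T)  # physical D-t-D
pd_q = 1.0 - norm_cdf(dtd_q)                      # risk-neutral default probability
pd_p = 1.0 - norm_cdf(dtd_p)                      # physical default probability

d1 = dtd_q + sigma * math.sqrt(T)
spread = -math.log(A0 / N * math.exp(r * T) * norm_cdf(-d1) + norm_cdf(dtd_q)) / T

print(f"D-t-D (Q) = {dtd_q:.5f}, PD (Q) = {pd_q:.5f}")
print(f"D-t-D (P) = {dtd_p:.5f}, PD (P) = {pd_p:.5f}")
print(f"Spread    = {spread:.2%}")
```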

13.3.1.4 Stocks and bonds

On the 5th of August 2011, the rating agency Standard & Poor's downgraded the US debt from AAA to AA+ for the first time in history. The US and global equity markets sank (the DJIA lost nearly 6%) on the first trading day (Monday the 8th) following the announcement. Somewhat paradoxically, US Treasuries rallied on the very same day, a phenomenon many commentators described as a flight-to-quality response to a quite unique event. The reason this rally seems paradoxical is that the downgrade regarded, obviously, US debt! Moreover, at the time, the US debt/GDP ratio was hovering at about 100%, a fact that weakens the case for US Treasuries as safe-haven assets.
But additional arguments made US Treasuries a safe haven. First, some of the signals leading to the downgrade regarded the political gaming about an increase in the debt ceiling, a gaming that could have led to delinquencies.4 However, this gaming and its effects were arguably transitory. Moreover, AA+ debt is accepted as collateral in many transactions, and considered to be high grade debt in most investment mandates. Furthermore, an issue at the time was to speculate whether other rating agencies (Moody's and Fitch) would proceed with similar downgrades of US debt that could have made the 05-08 decision more solidly grounded, so to speak. Finally, the Standard & Poor's decision was not totally unexpected, as rumours about it had started to circulate months earlier.
However, flight-to-quality does not seem to be an exhaustive explanation for the US Treasury rally during these events. First, during the period around the 5th of August, the US was flooded with bad news regarding the economic fundamentals, with many leading indicators reaching levels historically consistent with recessions. This news made it likely (at the time) that the US economy would spiral towards a second recession in less than four years, i.e., right after that related to the subprime crisis (discussed in Section 13.6). As we know from Chapter 12, recession fears translate into an expectation that future short rates will be lowered as a result of the FED attempt to stimulate growth, whence the Treasury rally.
Therefore, the rally of US Treasuries at the time might merely reflect the expectation hypothesis that future rates would fall. In this period of bad news (which also included adverse developments regarding the debt crisis in Europe), the Standard & Poor's downgrade might have come as yet another negative signal about the general health of the US system, which would further deteriorate the general investment climate. Therefore, a rally in US Treasuries could be explained by both flight-to-quality effects (i.e., an uncertainty premium about the occurrence of possible disorderly tail events) and the market expectations about the FED response to particularly severe economic developments.
The model in this section suggests one additional potential channel conducive to the equity crash and the rally in US debt after the Standard & Poor's downgrade decision. What happens to the price of a stock and a bond, after an adverse shock hits the fundamentals? According to Merton's model, they both fall after a decrease in $A_0$. But which of the two prices will drop more? After all, debt is less risky than stocks (due to subordination), and bad news about the fundamentals should affect stocks more than bonds.

4 In the US, the Congress has to approve an increase in debt capacity.
The next picture depicts the price of bonds and stocks predicted by Merton's model. Naturally, Merton's model is a very raw approximation to the events we are discussing in this section. These events relate to sovereign debt, not firms' debt! At the same time, this model can shed some light on these events. It predicts that bond prices do not move too much when the probability of default is small (i.e. when $A_0$ is large enough), which might roughly correspond to the situation where an agency announces a name to be downgraded from AAA to AA+. Instead, stock prices fall, and substantially, due to convexity, following an increase in the probability of default (which occurs when $A_0$ falls in Merton's model).
The solid line depicts the value of the bond and the dashed line depicts that of the stock, as predicted by Merton's model, plotted against the current asset value, $A_0$, when the nominal value of debt is $N = 1$, and $T = 1$, $r = 3\%$ and $\sigma = 20\%$, annualized.
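The asymmetry between the two securities can be checked numerically. The sketch below uses the parameter values of the picture and compares how bond and stock values react to the same drop in the asset value when default is unlikely; the size of the shock (from 2.0 to 1.8) and the function names are our own illustrative choices.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_bond(A0, N, r, sigma, T):
    """Value of zero-coupon risky debt in Merton's model."""
    d1 = (math.log(A0 / N) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return A0 * norm_cdf(-d1) + N * math.exp(-r * T) * norm_cdf(d2)

def merton_stock(A0, N, r, sigma, T):
    """Equity is the residual claim: assets minus debt."""
    return A0 - merton_bond(A0, N, r, sigma, T)

N, T, r, sigma = 1.0, 1.0, 0.03, 0.20

# Adverse shock to fundamentals while default remains unlikely.
bond_drop = merton_bond(2.0, N, r, sigma, T) - merton_bond(1.8, N, r, sigma, T)
stock_drop = merton_stock(2.0, N, r, sigma, T) - merton_stock(1.8, N, r, sigma, T)

print(f"bond falls by  {bond_drop:.5f}")
print(f"stock falls by {stock_drop:.5f}")
```

With these inputs the bond barely moves, whereas the stock absorbs almost the entire fall in the asset value.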

While Merton's model obviously does not regard the joint behavior of stock and sovereign bond prices, it makes a sharp prediction regarding stock and bond prices for a firm. It is an open issue as to whether these predictions would also apply to market developments related to sovereigns. Naturally, the informal arguments of this section do not aim to rule out additional explanations around the 05-08 events, such as flight-to-quality effects and the expectation hypothesis. Rather, they suggest one additional hypothesis: even absent flight-to-quality effects or the expectation hypothesis, bond prices should not substantially fall, once the probability of default for a name still remains very small.
13.3.1.5 First passage

The timing of default can be triggered by some exogenously specified events. For example, default occurs if the value of the assets hits some exogenous lower bound even before the expiration of debt. These models are known as first passage models, because they rely on mathematical techniques that solve for the probability distribution of the first time the asset value hits some exogenous barrier, as in Black and Cox (1976). It is an interesting case of how asset evaluation theories can develop. The evaluation methods in this section mostly rely on the option-pricing toolkit available since Black & Scholes and Merton. However, the first passage approach by Black and Cox has inspired further work on the design and pricing of derivatives with barriers (i.e. barrier options), such as down (or up-) and-out and down (or up-) and-in options.
13.3.1.6 Strategic defaulting

The timing of default can actually be endogenous. We now analyze a simple model in which equity holders choose a defaulting barrier (i.e. the firm asset value that triggers bankruptcy) so as to maximize equity value. Naturally, strategic defaulting cannot arise under the assumptions underlying the Modigliani-Miller theorem. This section analyzes the following mechanism. The firm issues debt and needs to ensure coupon payments for this debt. Issuing debt adds value as debt acts as a tax-shielding device. However, issuing debt exposes the firm to default, which triggers bankruptcy costs (e.g., legal expenses). The presence of bankruptcy costs prevents the firm from only issuing debt. Shareholders finance debt coupons by continuously raising equity capital whenever the firm cash flows are not sufficient to honour the coupon payments. The firm cash flows include both asset performance and tax benefits associated with the issuance of debt.
Now, there is a possibility of bankruptcy open to the firm, which occurs (endogenously) when the equity holders consider that the value of the assets is too small to warrant them a lifetime positive expected return. Specifically, in bad times, when the assets perform poorly, shareholders do not necessarily liquidate the firm, because there might be chances that the assets could perform better in the future. However, should the assets value become small enough, the equity holders will stop paying the coupons and will liquidate the firm. Naturally, equity holders choose the value of the asset that triggers bankruptcy to maximize the value of equity.
The model we analyze is developed by Leland (1994, Section VI.B), and extended to one with finite maturity debt by Leland and Toft (1996). Anderson and Sundaresan (1996) consider cases of debt re-negotiation. [In progress: Literature review far from being completed] Leland's model considers liquidation of the firm as a strategic choice of the equity holders, as explained. In fact, the US bankruptcy code includes both a liquidation process (Chapter 7) and a reorganization process (Chapter 11), but Leland's model only considers the firm's liquidation at bankruptcy. Broadie, Chernov and Sundaresan (2007) generalize this setting to one where the firm may choose to default through a reorganization process, in which case no equity is issued to honour debt services as in the model analyzed in this section.
Formally, the terms leading to strategic defaulting are as follows. First, to model the firm cash flows, we relax the assumption that the value of the assets, $A_t$, is solution to Eq. (13.1). Instead, we assume that the firm instantaneous cash flows are equal to a constant fraction $\delta$ of the assets, $\delta A_t$, such that, and generalizing Eq. (13.1),5

$$\frac{dA_t}{A_t} = (r - \delta)\,dt + \sigma\,dW_t.$$

Second, debt is infinitely lived, in that it pays off an instantaneous coupon equal to $C$, forever, conditionally upon survival; in the absence of default risk, the value of debt would simply equal $C/r$. Third, tax benefits are assumed to be proportional to the coupon, $\tau C$. Fourth, there are bankruptcy costs: if the firm defaults at $A = A_B$, recovery is $(1 - \alpha)A_B$. Equity holders choose $A_B$
. Naturally,
0.
5 In particular, it is straightforward to check that the evaluation formula in Footnote 2 of this chapter collapses to $A_t$, once the instantaneous cash flows equal $\delta A_t$.

The value of debt is a function of the firm asset value, $A$, say $D(A)$. Moreover, the firm finances the net cost of the coupon by issuing additional equity, as explained above, and until the equity value is zero, i.e. until $A = A_B$, as seen below. That is, bankruptcy occurs when the firm cannot meet the instantaneous coupon payments. Under the risk-neutral probability, the value of debt satisfies:

$$\underbrace{\frac{\mathbb{E}[dD(A) \mid A]}{dt}}_{=\text{Expected capital gains}} + \underbrace{C}_{=\text{coupon}} = rD(A). \tag{13.10}$$

Due to Itô's lemma, Eq. (13.10) is an ordinary differential equation, subject to the following boundary conditions. First, at bankruptcy, $D(A_B) = (1 - \alpha)A_B$. Second, for large $A$, debt is substantially riskless, i.e. $\lim_{A \to \infty} D(A) = C/r$. The solution to (13.10), which also satisfies these boundary conditions, is:

$$D(A) = \left(1 - p_B(A)\right)\frac{C}{r} + p_B(A)\,(1 - \alpha)A_B, \tag{13.11}$$

where

$$p_B(A) \equiv \left(\frac{A}{A_B}\right)^{-\gamma}, \tag{13.12}$$

and $\gamma$ is a positive constant.6

We interpret $p_B(A)$ as the present value of \$1, contingent on future bankruptcy, as further explained in Appendix 1. Accordingly, $\left(1 - p_B(A)\right)\frac{C}{r}$ is the expected present value of the coupon payments up to bankruptcy.

The total benefits arising from tax shielding are

$$TB(A) = \left(1 - p_B(A)\right)\frac{\tau C}{r},$$

and the present value of bankruptcy costs is,

$$BC(A) = p_B(A)\,\alpha A_B.$$

Denoting equity with $E(A)$, we have that

$$E(A) + D(A) = \text{Firm value} = A + TB(A) - BC(A).$$

Summing up,

$$E(A) = A - \left(1 - p_B(A)\right)(1 - \tau)\frac{C}{r} - p_B(A)\,A_B.$$

Equity equals (i) the firm asset value, $A$; minus (ii) the present value of debt contingent on no-bankruptcy, net of tax benefits, $\left(1 - p_B(A)\right)(1 - \tau)\frac{C}{r}$; minus (iii) the present value of debt contingent on bankruptcy, and net of bankruptcy costs, $p_B(A)A_B$. The second term decreases with the default boundary, $A_B$ or, equivalently, $p_B(A)$. The third term, instead, increases with $A_B$. So the time equity-holders wait before declaring bankruptcy, which is inversely related to $A_B$, affects these two terms in opposite ways.
6 Notably, $\gamma = \sigma^{-2}\left(r - \delta - \frac{1}{2}\sigma^2 + \sqrt{\left(r - \delta - \frac{1}{2}\sigma^2\right)^2 + 2\sigma^2 r}\right)$.

Equity-holders choose $A_B$ to maximize the value of equity. The solution is a default boundary, $A_B$, such that the value of equity does not change for small changes in the value of the assets around $A_B$, or $A_B : E'(A)|_{A = A_B} = 0$, a smooth pasting condition, as argued below. The result is:

$$A_B = (1 - \tau)\,\frac{C}{r}\,\frac{\gamma}{1 + \gamma}. \tag{13.13}$$
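Similarly as in the American option case, the value of the option to wait can be shown to be increasing with uncertainty, $\sigma^2$. Finally, it is easy to check that this solution for7 $A_B$ does maximize the value $E(A; A_B)$ in that $A_B : 0 = \partial E(A; A_B)/\partial A_B$. In other words, the equity holders have access to an instantaneous dividend given by $\delta A_t - C + \tau C$, such that their optimization program is

$$E(A) = \sup_{T_B}\,\mathbb{E}\left[\int_0^{T_B} e^{-rt}\left(\delta A_t - (1 - \tau)C\right)dt\right],$$

where the supremum is taken over the bankruptcy time, $T_B$.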
The solution to this real option problem is that described by Eq. (13.13). That is, the shareholders are willing to accept some temporary negative values of the dividends (and in this case, they would inject new equity capital to make sure debt is honoured), although then they would declare bankruptcy as soon as the assets value reaches the threshold level $A_B$ in (13.13) such that the expected net present value of the dividends, and, hence, equity, $E$, is zero.
How is it that tax shielding incentives do not seem to affect the existence of a solution to this problem? That is, the default boundary, $A_B$, is well-defined even with $\tau = 0$. In fact, if $\tau = 0$, there are no reasons to issue debt in the first place: with $\tau = 0$, equity value is negative at bankruptcy level $A_B$ in Eq. (13.13). In fact, when $\tau > 0$, there is a level of leverage that maximizes the value of the firm, according to simulations reported in Leland (1994). Finally, note that the solution for $A_B$ is independent of $\alpha$. However, as noted, without bankruptcy costs, the firm would only issue debt.
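The default boundary and the smooth pasting condition can be verified numerically. The sketch below is only an illustration: the symbols ($C$ for the coupon, $\tau$ for the tax rate, $\alpha$ for bankruptcy costs, $\delta$ for the payout rate, $\gamma$ for the exponent of the bankruptcy claim) follow Leland's usual notation, the function names are ours, and the parameter values are arbitrary.

```python
import math

def leland_gamma(r, delta, sigma):
    """Positive root gamma such that A**(-gamma) solves the homogeneous ODE."""
    m = r - delta - 0.5 * sigma ** 2
    return (m + math.sqrt(m * m + 2.0 * sigma ** 2 * r)) / sigma ** 2

def default_boundary(C, r, tau, gamma):
    """Optimal default boundary: (1 - tau) * (C / r) * gamma / (1 + gamma)."""
    return (1.0 - tau) * (C / r) * gamma / (1.0 + gamma)

def p_B(A, A_B, gamma):
    """Present value of $1 paid at bankruptcy."""
    return (A / A_B) ** (-gamma)

def debt_value(A, A_B, C, r, alpha, gamma):
    p = p_B(A, A_B, gamma)
    return (1.0 - p) * C / r + p * (1.0 - alpha) * A_B

def equity_value(A, A_B, C, r, tau, gamma):
    p = p_B(A, A_B, gamma)
    return A - (1.0 - p) * (1.0 - tau) * C / r - p * A_B

# Illustrative parameter values (assumptions, not taken from the text).
r, delta, sigma, C, tau, alpha = 0.05, 0.02, 0.20, 5.0, 0.35, 0.50

g = leland_gamma(r, delta, sigma)
A_B = default_boundary(C, r, tau, g)

# Smooth pasting: equity is zero and flat at the boundary.
h = 1e-4
slope_at_boundary = (equity_value(A_B + h, A_B, C, r, tau, g)
                     - equity_value(A_B, A_B, C, r, tau, g)) / h

print(f"gamma = {g:.5f}, A_B = {A_B:.3f}")
print(f"E(A_B) = {equity_value(A_B, A_B, C, r, tau, g):.2e}, E'(A_B) ~ {slope_at_boundary:.2e}")
```

Note that `default_boundary` does not take `alpha` as an argument, in line with the remark that the boundary is independent of bankruptcy costs.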
13.3.1.7 Pros and cons of structural approaches to risky debt assessment

Pros. First, they allow us to think about more complicated structures or instruments easily (e.g., convertibles, as we see in the next section). Second, they lead to simple yet consistent relations between different securities issued by the same name. Structural approaches were very useful for theoretical research during the 1990s.
Cons. The firm asset value and asset volatility are not observed, so we must rely on calibration/estimation methods. Bond prices generated by the model $\neq$ market prices. These models are a bit difficult to use in practice, for trading or hedging purposes, as we know that in this case we need theoretical prices that exactly match market prices. Finally, how do we go for sovereign issuers?
Most important. Structural models predict unrealistically low short-term spreads: see, e.g., Figure 13.3. The intuition is that diffusion processes are smooth: the probability of default tends to zero as time to maturity approaches zero, because default cannot just jump in an unexpected way. This is not exactly what we observe. Jumps seem to be a more realistic device for modeling spreads, and will be introduced in Section 13.3.3.
7 More formally, it is easy to see that the firm value is maximized by setting
as small as possible. This maximization process
would imply that ( )
0 for all
(1
)
0. But limited liability requires that ( )
0 for all
.
,

(1
) , then, 00 ( )
It is easy to see that if
consistent with positive equity value is
: 0 ( )| =

0. Thus, and because


= 0.


( )|

= 0, the lowest possible value of


13.3.2 The structural approach in practice: the pricing of convertible bonds


Convertible bonds offer bond-holders the option to convert their bonds into shares of the firm.8 Chapter 11 (Section 11.8.2) provides an introductory discussion of these assets, along with a numerical example of how to price them through a binomial tree. This section analyzes convertible bonds in a continuous time model of the firm's capital structure. We assume the option to convert can be exercised at any time up to maturity. By definition, the face value of the convertible is
Face value = $1 CR CP
(13.14)
where CR is the conversion ratio, i.e. the number of shares this face value converts into, and CP is the conversion price, i.e. the stock price implicitly defined by Eq. (13.14).
Typically, the bond is like any other fixed income instrument, with coupon payments, callable features, and credit risk. Callable features are almost invariably embedded into this type of contract. The parity, or conversion value, is the value of the bond if the bond-holders decide to convert. It is defined as

$$\text{CV} = \text{CR} \times S,$$

where $S$ is the price of the common share. Not only is the convertible bond price affected by interest rates, credit risk, or timing risk; it is also affected by the movements of the underlying stock price, due to a positive probability the bond might become a share in the future: convertible bonds are hybrid instruments. The embedded option offers the bond-holders the possibility to obtain equity returns (not just bond returns) in good times, while offering protection against the downside. As mentioned in Chapter 11 (Section 11.8), convertible bonds are usually callable as well. In this case, bond-holders are usually given the right to convert the bonds, once they are called. The rationale behind callability is to induce the bond-holder to convert the bond earlier.
Convertible bonds are useful to trade volatility. The simplest example of convertible arbitrage is going long a convertible and shorting a Treasury, which is the same as going long an option on the firm. This is useful when there are no available options on the firm to trade, and/or when these are very illiquid.
Pricing convertible bonds is a topic that has been intensively studied, theoretically. Ingersoll (1977) provides the first theoretical insights into the pricing of convertible and callable bonds. Let us define the dilution factor, denoted as $\kappa$, as the fraction of common equity that would be held by the convertible bondowners if the entire issue was converted. If there are $n_{\text{out}}$ shares outstanding, and the convertible bond can be exchanged for $n$ shares, then, in aggregate,

$$\kappa = \frac{n}{n + n_{\text{out}}}.$$
We assume that the market value of the firm is equal to the value of its assets, $A$, which is a Geometric Brownian motion, as in Eq. (13.1). Let $B^{\text{conv}}(A, T; N)$ be the aggregate value of the convertible bond with time to maturity $T$ and face value $N$. To simplify the presentation, we do not consider callability issues. However, we shall provide some intuition about this issue later. Let us assume that the stocks and the convertible bonds are the only two claims in the capital structure of the firm. Since, after conversion, only the stocks will remain, then, the post-conversion value of the convertible bonds is simply the conversion value of the convertible, i.e. $\kappa A$. Moreover, we have that for any $T \geq 0$,

$$\kappa A \leq B^{\text{conv}}(A, T; N) \leq A. \tag{13.15}$$

The first inequality in (13.15) is simple to understand. Indeed, suppose that $B^{\text{conv}}(A, T; N) < \kappa A$. Then, we can purchase the convertibles, convert them into shares and, finally, sell the shares for $\kappa A$. The second inequality follows by the limited liability of the equity holders, and the Modigliani-Miller theorem.

8 Strictly speaking, the option embedded into this kind of asset is a warrant, not an option. A warrant gives the holder a right to purchase new shares, i.e. shares issued by the firm.
At maturity,

$$B^{\text{conv}}(A, 0; N) = \min\{A, \max\{N, \kappa A\}\}. \tag{13.16}$$

Indeed, $\max\{N, \kappa A\}$ is the value of the convertible, in case of no-default. Then, $\min\{A, \max\{N, \kappa A\}\}$ is what the firm will pay to the bond-holders: $A$ in case of default, and $\max\{N, \kappa A\}$ in case of no-default.
We can re-express the terminal payoff in Eq. (13.16) in a manner that allows for a better understanding of the issues underlying the exercise of the convertibles. In particular, we have that,

$$B^{\text{conv}}(A, 0; N) = \min\{A, \max\{N, \kappa A\}\} = \max\{\kappa A, \min\{A, N\}\}. \tag{13.17}$$

Indeed, let $\min\{A, N\}$ be the payment the firm is ready to supply if the bond-holders do not convert. Then, $\max\{\kappa A, \min\{A, N\}\}$ is obviously the payoff profile to the bond-holders.
The terminal payoff in Eq. (13.17) illustrates very clearly that convertible bonds embed an option to convert, on top of the plain vanilla non-convertible bond. Intuitively, at maturity, a non-convertible bond is worth $\min\{A, N\}$, and the option to convert is either worthless (in case of non conversion) or worth $\kappa A - N$ (in case of conversion), i.e. it is $\max\{\kappa A - N, 0\}$. This intuition is confirmed, mathematically, as we have that:

$$\max\{\kappa A, \min\{A, N\}\} = \min\{A, N\} + \max\{\kappa A - N, 0\}.$$

Therefore, the value of the terminal payoff is, by Eq. (13.17),

$$B^{\text{conv}}(A, 0; N) = \min\{A, N\} + \max\{\kappa A - N, 0\}. \tag{13.18}$$

One can show that it is not optimal to exercise the option to convert before maturity. Therefore, to price the convertible bond, we only need to deal with the terminal payoff in Eq. (13.18).
Eq. (13.18) shows that the current value of the convertible bond is the sum of the value of a straight bond plus the value of $\kappa$ options on the firm with strike price equal to $N/\kappa$. Accordingly, let $B(A, T; N)$ and $W(A, T; N/\kappa)$ be the prices of the straight bond and the option on the firm. We have,

$$B^{\text{conv}}(A, T; N) = B(A, T; N) + \kappa\,W(A, T; N/\kappa). \tag{13.19}$$

We may use Merton's (1974) model to find the price of the straight bond, $B(A, T; N)$. By the results in Section 13.2, it is:

$$B(A, T; N) = A\,\Phi(-d_1) + N e^{-rT}\,\Phi\left(d_1 - \sigma\sqrt{T}\right), \quad d_1 = \frac{\ln(A/N) + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \tag{13.20}$$

where $\sigma$ is the instantaneous volatility of the assets, $r$ is the (constant) instantaneous short-term rate, and $\Phi$ is the cumulative distribution of a standard normal. Similarly, we may use the Black-Scholes formula to compute the function $W$:

$$W(A, T; N/\kappa) = A\,\Phi(\hat{d}_1) - \frac{N}{\kappa}e^{-rT}\,\Phi\left(\hat{d}_1 - \sigma\sqrt{T}\right), \quad \hat{d}_1 = \frac{\ln(\kappa A/N) + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}. \tag{13.21}$$

Eq. (13.20) reveals the intuitive property that as $A$ gets large, $B(A, T; N) \to N e^{-rT}$: the probability of default gets extremely tiny as the value of the assets gets large. Moreover, the Black-Scholes formula, Eq. (13.21), suggests that $W(A, T; N/\kappa) \approx A - \frac{N}{\kappa}e^{-rT}$ as $A$ gets large. Therefore, by Eq. (13.19), we have that, for large $A$, $B^{\text{conv}}(A, T; N) \approx N e^{-rT} + \kappa\left(A - \frac{N}{\kappa}e^{-rT}\right) = \kappa A$. Eq. (13.20) also shows that for small values of $A$, $B^{\text{conv}}(A, T; N) \to 0$. To sum-up, the value of the convertible bond is less than the value of the firm, $A$, and larger than the conversion value, $\kappa A$. Moreover, it approaches $\kappa A$, as the value of the firm gets large. Figure 13.6 depicts the price of the convertible bond as function of the value of the firm, as predicted by Eq. (13.19), for a particular example. It is possible to show that the value of a callable convertible bond is between the value of the straight and that of the convertible.

FIGURE 13.6. The value of convertible and straight bonds as a function of the current asset value, $A$, when the short-term rate $r = 3\%$, the asset volatility $\sigma = 0.20$, time to maturity $T = 3$ years, the dilution factor $\kappa = 30\%$, and nominal debt $N = 1$. The solid line depicts the value of the convertible bond. The dashed line starting from the origin, and flattening out to the constant $N e^{-rT} = 0.91393$, is the value of the straight bond. The two dashed straight lines starting from the origin are the no-arbitrage bounds $A$ and $\kappa A$ in Eq. (13.15).
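The decomposition of the convertible into a straight bond plus options on the firm is easy to evaluate numerically. The sketch below uses the parameter values of Figure 13.6 and checks the no-arbitrage bounds on a small grid of asset values; the grid, the function names and the symbol `kappa` for the dilution factor are our own choices.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def straight_bond(A, N, r, sigma, T):
    """Merton's value of a zero-coupon defaultable bond with face value N."""
    d1 = (math.log(A / N) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return A * norm_cdf(-d1) + N * math.exp(-r * T) * norm_cdf(d1 - sigma * math.sqrt(T))

def call_on_firm(A, K, r, sigma, T):
    """Black-Scholes value of a call on the firm assets with strike K."""
    d1 = (math.log(A / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return A * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d1 - sigma * math.sqrt(T))

def convertible_bond(A, N, kappa, r, sigma, T):
    """Straight bond plus kappa calls on the firm struck at N / kappa."""
    return straight_bond(A, N, r, sigma, T) + kappa * call_on_firm(A, N / kappa, r, sigma, T)

# Parameter values of Figure 13.6.
r, sigma, T, kappa, N = 0.03, 0.20, 3.0, 0.30, 1.0

for A in (0.5, 1.0, 2.0, 5.0, 10.0):
    conv = convertible_bond(A, N, kappa, r, sigma, T)
    print(f"A = {A:5.1f}  straight = {straight_bond(A, N, r, sigma, T):.4f}  "
          f"convertible = {conv:.4f}  bounds = [{kappa * A:.4f}, {A:.4f}]")
```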

13.3.3 Reduced form approaches: rare events, or intensity, models


Default often displays a few striking features. It arrives unexpectedly, it is rare, and causes discontinuous price changes. The structural models in the previous section do not accommodate these features because diffusion processes are continuous, as explained in Chapter 4. As a result, passage times are known, locally, so to speak. This feature is responsible for the low short-term spreads these models predict.
13.3.3.1 Poisson-driven defaults

As an alternative to Brownian motions, we can model defaults, by assuming their arrival is a Poisson process, of the kind introduced in Chapter 4. Suppose we count the number of times some event happens. Denote with $N_t$ the corresponding counting process, as in Figure 13.7.

FIGURE 13.7. Sample path of the counting process $N_t$, with jumps at times $t_0$, $t_1$, $t_2$ and $t_3$; default occurs at $t_0$.

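The time of default is simply the first time $N_t$ jumps, e.g. $t_0$ in Figure 13.7. More in detail, assume we chop a given interval $[0, T]$ in $n$ pieces, and consider each resulting interval $\Delta t = T/n$. Assume that the jump probability over each of these small intervals of time $\Delta t$ is proportional to $\Delta t$, with proportionality factor equal to $\lambda$,

$$\Pr\{\text{One jump over } \Delta t\} = \lambda\,\Delta t. \tag{13.22}$$

Assume the number of jumps over the $n$ intervals follows a binomial distribution:

$$\Pr\{k \text{ jumps over } [0, T]\} = \binom{n}{k}(\lambda\,\Delta t)^k (1 - \lambda\,\Delta t)^{n-k}, \quad \text{where } \Delta t = \frac{T}{n}.$$

For $n$ large, or, equivalently, for small intervals $\Delta t$,

$$\Pr\{k \text{ jumps over } [0, T]\} \approx \frac{(\lambda T)^k e^{-\lambda T}}{k!}.$$

We rely on these heuristic calculations and derive a few basic properties of default. We have,

$$\Pr\{\text{Survival by } T\} = \Pr\{0 \text{ jumps over } [0, T]\} = e^{-\lambda T},$$

$$\Pr\{\text{Default by } T\} = \Pr\{\text{at least one jump over } [0, T]\} = 1 - \Pr\{\text{Survival by } T\} = 1 - e^{-\lambda T},$$

$$\Pr\{\text{Default occurs at some } t \in (T, T + dT)\} = \frac{\partial}{\partial T}\Pr\{\text{Default by } T\}\,dT = \lambda e^{-\lambda T}\,dT.$$

Note that the expected time to default equals $1/\lambda$.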
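The heuristic passage from the binomial to the Poisson distribution can be checked numerically. The sketch below compares the two for a fine partition of the interval; the intensity, the horizon and the number of pieces are illustrative values of our own, as are the function names.

```python
import math

def binomial_jump_prob(k, lam, T, n):
    """Probability of k jumps over [0, T] with n pieces, each jumping w.p. lam * T / n."""
    p = lam * T / n
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_jump_prob(k, lam, T):
    """Poisson limit of the binomial probability as n grows large."""
    return (lam * T) ** k * math.exp(-lam * T) / math.factorial(k)

def survival_prob(lam, T):
    """Probability of no jump (no default) by time T."""
    return math.exp(-lam * T)

lam, T, n = 0.04, 5.0, 10_000  # illustrative values

for k in range(3):
    print(f"k = {k}: binomial = {binomial_jump_prob(k, lam, T, n):.6f}, "
          f"Poisson = {poisson_jump_prob(k, lam, T):.6f}")
print(f"Pr(default by T) = {1.0 - survival_prob(lam, T):.6f}")
```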


We can use these probabilities to value debt subject to default risk. Consider a simple case, namely when default can occur over any $t \in (0, T]$, but default implications do only occur at $T$, in that at $T$, the bond is equal to either the recovery value (in case of default over any $(0, T]$) or the nominal value (in case of survival). In this case we have, by Eq. (13.6), that:

$$D_0 = e^{-rT}\left[\text{Rec}\cdot Q(\text{Default}) + N\cdot Q(\text{Survival})\right], \tag{13.23}$$

where Rec is the expected recovery value of the asset. Using the probabilities predicted by the Poisson model, we obtain:

$$D_0 = e^{-rT}\left[\text{Rec}\left(1 - e^{-\lambda T}\right) + N e^{-\lambda T}\right]. \tag{13.24}$$

Appendix 2 supplies an alternative derivation of Eq. (13.24).


13.3.3.2 Predicted spreads

The implications for the spreads corresponding to small maturities $T$ can be easily seen after some approximations,

$$\text{Spread} = -\frac{1}{T}\ln\frac{D_0}{N} - r \approx \lambda\left(1 - \frac{\text{Rec}}{N}\right) = \lambda\cdot\text{LGD}.$$

Note that in contrast to the structural models reviewed in Section 13.3.1, the spread is not zero when $T$ is small. Rather, it is given by the expected default loss per period, defined as the instantaneous probability of default times LGD,

$$\text{Short-Term Spread} = \lambda\cdot\text{LGD}.$$

Therefore, models with jumps have the potential to explain the empirical behavior of credit spreads at short maturities discussed in Section 13.3.1. As explained, structural models, being typically driven by Brownian motions, cannot lead to positive spreads at very short maturities, as they imply that the probability of default decays quickly as time-to-maturity goes to zero. Instead, models with jumps predict a possibility the firm can experience a sudden death: default can occur with positive probability at any time, even when the debt is about to expire.
A theoretical model of Duffie and Lando (2001) shows how a structural model of the firm can lead to positive short-term spreads, once we assume incomplete information and learning about the asset value. In their model, learning takes place with some delay, which leaves investors concerned about what they really know about the firm asset value. It is this concern that leads to positive credit spreads in their model, to an extent comparable to that generated by a jump process.
Figure 13.8.1 depicts the behavior of the spread predicted by the model at all maturities, given by,

$$\text{Spread} = -\frac{1}{T}\ln\frac{D_0}{N} - r = -\frac{1}{T}\ln\left[\frac{\text{Rec}}{N}\left(1 - e^{-\lambda T}\right) + e^{-\lambda T}\right].$$

FIGURE 13.8.1. The term structure of bond spreads (in basis points) implied by an intensity model, with recovery rate equal to 40% and intensity equal to $\lambda = 0.04$, implying an expected time-to-default equal to $\lambda^{-1} = 25$ years.

In this example, spreads are decreasing in time-to-maturity. Eventually, as time to maturity gets large, the bond becomes somehow certain to default, with the unusual feature of delivering, for sure, some recovery rate at some point: the bond is certain to deliver the recovery rate. Indeed, in Appendix 2, we show that if the recovery value of the bond is not constant, but shrinks exponentially to zero as $\text{Rec}\cdot e^{-\beta T}$, for two constants Rec and $\beta$, then, asymptotically, the spread is:

$$\lim_{T \to \infty} s(T) = \min\{\lambda, \beta\}. \tag{13.25}$$

The interpretation of $\beta$ does not link to discounting. Rather, $\beta$ might be referred to as a recovery dissipation rate due to unfolding of time, in the following sense. As time unfolds, the likelihood of occurrence of bad events increases, which leads the expected recovery to worsen. Eq. (13.25) shows that if the dissipation rate is sufficiently large, term spreads can be increasing, as we discuss more comprehensively in a moment.
An instance leading to such an expected recovery rate is one where the recovery value of the bond equals Rec, if the firm defaults at any time $t$, and provided a hidden risk does not materialize, namely the risk that the firm will not distribute any recovery value at all, in case of bankruptcy. If this risk is independent of bankruptcy, and Poisson, with instantaneous risk-neutral probability $\beta$, the expected recovery is precisely $\text{Rec}\cdot e^{-\beta T}$. This is indeed a quite simple way to model stochastic recovery rates.
Figure 13.8.2 plots the term-structure of spreads predicted by this model, obtained with the same parameter values used to produce the spreads in Figure 13.8.1, and utilizing three values for the dissipation rate: $\kappa = 0.05$, $0.03$ and $0.013$. Naturally, instantaneous spreads (i.e. those corresponding to time to maturity equal to zero) are $(1 - \bar{R})\lambda = 240$ (in basis points) in all cases.
When $\kappa > \lambda$, long-term maturity spreads are always higher than short, by Eq. (13.25). When $\kappa < \lambda$, instead, spreads for large maturities can be either higher or lower than short, according to whether $\kappa$ is higher or lower than $(1 - \bar{R})\lambda$. In this particular example, when $\kappa = 0.05$, long-term spreads equal 400 basis points, i.e. the default intensity, $\lambda$. When $\kappa = 0.03$, which is lower than $\lambda = 0.04$, they are higher than short-term spreads, and when $\kappa = 0.013$, they are lower. In fact, when $\kappa = 0.013$, the term structure of the spreads is even hump-shaped, a feature not visible from the picture. As is clear, this very simple model predicts features of both short-term and long-term spreads that Merton's model in Section 13.3.1.1 cannot, realistically.
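The limiting behavior in Eq. (13.25) can be checked numerically. The sketch below is only an illustration, and assumes (this is an assumption, not a statement of Appendix 2) that the recovery is paid at maturity, so that the bond price relative to a default-free bond is $\bar{R}e^{-\kappa T}(1 - e^{-\lambda T}) + e^{-\lambda T}$. The function name `spread` and its argument names are hypothetical.

```python
import math

def spread(maturity, lam=0.04, rec_bar=0.40, kappa=0.05):
    """Spread over the default-free yield, continuously compounded.

    Assumption: recovery rec_bar * exp(-kappa * maturity) is paid at maturity
    if default occurred at any time before maturity.
    """
    survival = math.exp(-lam * maturity)
    recovery = rec_bar * math.exp(-kappa * maturity)
    relative_price = recovery * (1.0 - survival) + survival
    return -math.log(relative_price) / maturity

for kappa in (0.05, 0.03, 0.013):
    short = 1e4 * spread(1e-6, kappa=kappa)    # close to (1 - rec_bar) * lam
    long_ = 1e4 * spread(5000.0, kappa=kappa)  # close to min(lam, kappa)
    print(f"kappa={kappa}: short-end {short:.1f} bp, long-end {long_:.1f} bp")
```

Under these assumptions the short end is 240 basis points in all three cases, while the long end approaches $\min\{\lambda, \kappa\}$, and the case $\kappa = 0.013$ rises slightly above 240 basis points at intermediate maturities before declining.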

[Figure 13.8.2: vertical axis, Spread (basis points); horizontal axis, Time to maturity.]

FIGURE 13.8.2. The term structure of bond spreads (in basis points) implied by an intensity model with recovery rate equal to $0.40\,e^{-\kappa T}$, where $T$ is time to maturity and $\kappa$ is the recovery dissipation rate, taken to equal $\kappa = 0.05$ (solid line), $\kappa = 0.03$ (dashed line), and $\kappa = 0.013$ (dotted line). The instantaneous probability of default is taken to equal $\lambda = 0.04$, implying an expected time-to-default equal to $\lambda^{-1} = 25$ years.

We can interpret this behavior of term spreads as follows. Suppose we are in good times, when $\lambda$ is small relative to $\kappa$. Precisely because we are in good times, we might expect things to change adversely in the future, which is captured by a large value of $\kappa$. In this case, the term structure of spreads is increasing. Instead, in bad times, when $\lambda$ is large compared to $\kappa$, we might expect future times to improve, which we might model by assuming $\kappa$ is small.
Figure 13.8.2 shows that, in this second case, long maturity spreads are smaller than in good times. Naturally, we would expect that spreads should increase for any maturity in bad times, although this property is not captured by the numerical examples in Figure 13.8.2, where we fix $\lambda = 0.04$. The point of this exercise is to show that the slope of the term structure of the spreads lowers as we enter bad times, when we only consider changes in the dissipation rate, $\kappa$. Allowing for a countercyclical $\lambda$ would reinforce the conclusions of this exercise. While these conclusions rely on comparative statics, Section 13.5.5.5 shows that they still hold in a dynamic context, where the intensity follows a mean-reverting continuous-time model.
13.3.3.3 One example

Naturally, the intensity, $\lambda$, is the risk-neutral instantaneous probability of default, not the physical probability of default, $\lambda^{P}$ say. The ratio $\lambda/\lambda^{P}$ is generally larger than one. Its inverse, $\lambda^{P}/\lambda$, is an indicator of the risk-appetite in the credit market. Similarly, LGD is an expectation under the risk-neutral probability, and should contain useful indications about market participants' risk appetite.

Assume that under the risk-neutral probability, the instantaneous intensity of default for a given firm is $\lambda = 4\%$, annualized, and that under the physical, the instantaneous probability of default for the same firm is $\lambda^{P} = 2\%$, annualized. From here, we can compute the probability of survival of the firm within 5 years, under both probabilities. They are:
$$e^{-5\times 0.04} = 0.81873, \qquad e^{-5\times 0.02} = 0.90484.$$
Naturally, the probability of survival is lower under the risk-neutral probability.


Next, assume that the spread on a 5 year bond with face value $N = 1$, equals 3%. What is the implied expected recovery rate from this spread? We have,
$$P_0 = e^{-5r}\left[\text{Rec}\cdot Q(\text{Default}) + Q(\text{Survival})\right] = e^{-5r}\left[\text{Rec}\cdot(1 - 0.81873) + 1\cdot 0.81873\right].$$
The spread is,
$$s_0 = 3\% = -\frac{1}{5}\ln\frac{\text{Rec}\cdot(1 - 0.81873) + 1\cdot 0.81873}{1}.$$
Solving for Rec, gives, Rec = 23.16%.
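The arithmetic of this example can be reproduced with a few lines of code. The following is a minimal sketch: the function names are hypothetical, and the inversion formula simply rearranges the spread equation above for a bond with unit face value.

```python
import math

def survival_probability(intensity, years):
    """Probability of no default over `years` with a constant intensity."""
    return math.exp(-intensity * years)

def implied_recovery(spread, survival, years):
    """Recovery rate backed out from exp(-spread * years) = Rec * (1 - surv) + surv."""
    relative_price = math.exp(-spread * years)
    return (relative_price - survival) / (1.0 - survival)

q_surv = survival_probability(0.04, 5)   # risk-neutral intensity of 4%
p_surv = survival_probability(0.02, 5)   # physical intensity of 2%
rec = implied_recovery(0.03, q_surv, 5)  # spread of 3% on the 5 year bond

print(f"risk-neutral survival: {q_surv:.5f}")
print(f"physical survival:     {p_surv:.5f}")
print(f"implied recovery rate: {100 * rec:.2f}%")
```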

13.3.4 Ratings

In practice, corporate debt is rated by rating agencies, such as Moody's and Standard and Poor's. Depending on the rating, corporate debt may be either investment grade or non-investment grade (junk). Moody's ratings range from Aaa to C. Standard and Poor's range from AAA to D. One can compute the probability of migrations based on past experience, i.e. transition probabilities. Consider, for example, the following table:

One year rating transition probabilities (%), S&P's 1981-1991

From \ To     AAA      AA       A     BBB      BB       B     CCC       D
AAA          89.1    9.63    0.78    0.19     0.3       0       0       0
AA           0.86    90.1    7.47    0.99    0.29    0.29       0       0
A            0.09    2.91   88.94    6.49    1.01    0.45       0    0.09
BBB          0.06    0.43    6.56   84.27    6.44     1.6    0.18    0.45
BB           0.04    0.22    0.79    7.19   77.64   10.43    1.27    2.41
B               0    0.19    0.31    0.66    5.17   82.46    4.35    6.85
CCC             0       0    1.16    1.16    2.03    7.54   64.93   23.19
D               0       0       0       0       0       0       0     100

TABLE 13.1
13.3.4.1 Foundations

A natural approach is to assess credit risk by making reference to probabilities of default built up on transition probabilities like those in Table 13.1.
Such an approach, also known as a migration approach, is somewhat less drastic than that based on rare events, and hopefully more realistic. However, it is also technically more complex than the intensity approach of the previous section. We present the foundations of this approach, leaving some details to the Appendix.
At time $t$, there exist several rating classes, $K$ say, with the rating denoted as $\text{Rat}_t$,
$$\text{Rat}_t \in \{1, 2, \dots, K\}.$$
Transition probabilities of rating from time $t$ to time $T$ are,
$$P(T-t)_{ij} \equiv \Pr(\text{Rat}_T = j \mid \text{Rat}_t = i).$$
We can build a Markov chain from here, by assuming that $P(T-t)_{ij}$ only depends on $T - t$. Finally, we must have that,
$$P(T-t)_{ij} \ge 0 \quad \text{and} \quad \sum_{j=1}^{K} P(T-t)_{ij} = 1.$$
For example, the probability of transition from rating $\text{Rat}_t = i$ to rating $\text{Rat}_{t+1} = j$ in one year is $P(1)_{ij}$. Table 13.1 contains one possible example of $P(1)_{ij}$. The probability of transition from rating $\text{Rat}_t = i$ to rating $\text{Rat}_{t+2} = j$ in two years is $P(2)_{ij}$, and is obtained as follows,
$$P(2)_{ij} = \sum_{k=1}^{K} \underbrace{P(1)_{ik}}_{\Pr(\text{transition from } i \text{ to } k \text{ in one year})} \; \underbrace{P(1)_{kj}}_{\Pr(\text{transition from } k \text{ to } j \text{ in one further year})}.$$
More generally, we have, $P(T) = P(1)^{T}$, where $P(T)$ is the matrix with elements $\{P(T)_{ij}\}$.
For example, the probability transition matrix in Table 13.1 is (entries in percent),
$$P(1) = \begin{pmatrix}
89.1 & 9.63 & 0.78 & 0.19 & 0.3 & 0 & 0 & 0 \\
0.86 & 90.1 & 7.47 & 0.99 & 0.29 & 0.29 & 0 & 0 \\
0.09 & 2.91 & 88.94 & 6.49 & 1.01 & 0.45 & 0 & 0.09 \\
0.06 & 0.43 & 6.56 & 84.27 & 6.44 & 1.6 & 0.18 & 0.45 \\
0.04 & 0.22 & 0.79 & 7.19 & 77.64 & 10.43 & 1.27 & 2.41 \\
0 & 0.19 & 0.31 & 0.66 & 5.17 & 82.46 & 4.35 & 6.85 \\
0 & 0 & 1.16 & 1.16 & 2.03 & 7.54 & 64.93 & 23.19 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 100
\end{pmatrix}$$

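The 15 year transition matrix is (entries in percent):
$$P(15) = \begin{pmatrix}
20.01 & 35.82 & 23.91 & 9.92 & 4.05 & 3.06 & 0.43 & 2.66 \\
3.38 & 30.28 & 32.71 & 15.91 & 6.38 & 5.11 & 0.77 & 5.34 \\
1.17 & 13.12 & 34.21 & 21.93 & 9.69 & 8.01 & 1.29 & 10.33 \\
0.64 & 6.76 & 22.21 & 22.40 & 12.42 & 11.93 & 2.09 & 21.39 \\
0.33 & 3.22 & 10.71 & 13.616 & 11.36 & 14.68 & 2.78 & 43.16 \\
0.14 & 1.65 & 5.01 & 6.75 & 7.48 & 13.17 & 2.64 & 63.04 \\
0 & 1.08 & 3.54 & 3.90 & 3.51 & 5.60 & 1.22 & 81.02 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 100
\end{pmatrix}$$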
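The multi-year matrix is simply a power of the one-year matrix, which is easy to compute. The sketch below uses the entries of Table 13.1 and plain Python lists; the helper names are hypothetical, and the row rescaling is an assumption made here because the published rows sum to 100% only up to rounding.

```python
RATINGS = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]

# One-year transition probabilities of Table 13.1, in percent (rows: from, columns: to).
P1_PERCENT = [
    [89.10, 9.63, 0.78, 0.19, 0.30, 0.00, 0.00, 0.00],
    [0.86, 90.10, 7.47, 0.99, 0.29, 0.29, 0.00, 0.00],
    [0.09, 2.91, 88.94, 6.49, 1.01, 0.45, 0.00, 0.09],
    [0.06, 0.43, 6.56, 84.27, 6.44, 1.60, 0.18, 0.45],
    [0.04, 0.22, 0.79, 7.19, 77.64, 10.43, 1.27, 2.41],
    [0.00, 0.19, 0.31, 0.66, 5.17, 82.46, 4.35, 6.85],
    [0.00, 0.00, 1.16, 1.16, 2.03, 7.54, 64.93, 23.19],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 100.00],
]

def normalize(rows):
    """Rescale each row so that it sums to one (assumption: fixes rounding)."""
    return [[x / sum(row) for x in row] for row in rows]

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(m, periods):
    """m multiplied by itself `periods` times, starting from the identity."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(periods):
        result = matmul(result, m)
    return result

P1 = normalize(P1_PERCENT)
P15 = matpow(P1, 15)

for name, row in zip(RATINGS, P15):
    print(f"{name:>4}: " + " ".join(f"{100 * x:6.2f}" for x in row))
```

The last column of the output collects the cumulative 15 year default probabilities by initial rating; small differences from the matrix above can arise from the rescaling of the rows.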

13.3.4.2 Evaluation

The previous probabilities, $\{P(T)_{ij}\}$, are meant to be taken under the physical world, not the risk-neutral. They can be used for risk-management purposes, but certainly not for pricing. Indeed, historical default rates are too low to explain the price of defaultable securities. A natural explanation relies on the presence of risk-premia. To use migration data for pricing, it is vital to implement a number of steps.
The first step is to clean up the data. For example, it might be that downgrades from class $i$ to class $i + 2$ are more frequent than downgrades from class $i$ to class $i + 1$, an occurrence which we wish to smooth. Moreover, we would need to remove zero entries: although some rating events did not happen in the past, they might well occur in the future. Finally, we need to add positive risk-premia to the previous smoothed data, to recover realistic asset prices.
As for the pricing details, the migration model relies on the assumption that there are $K$ classes of assets. Each single asset may migrate from one class to another. Because evaluation is a dynamic business, we cannot evaluate defaultable securities within a given class without simultaneously evaluating the defaultable securities in the remaining classes. For example, there could be a chance that a given asset will mutate into a different one over the next year (i.e. one belonging to another rating class). Therefore, the price of this asset, today, needs to reflect the price of the asset in the other classes where it can possibly migrate. As a result, we must simultaneously solve for all the asset prices in all the rating classes. This approach, developed by Jarrow, Lando and Turnbull (1997), is quite complex and is given a succinct account in the Appendix.
Consider a simple case, arising when default can occur at any time before maturity $T$, but default implications arise only at maturity, in that the bond pays a recovery value only at $T$, should the firm default at any time prior to $T$. Let $Q_i(T)$ denote the risk-neutral probability the firm defaults, by time $T$, given it belongs to rating $i$ at time $0$. By Eq. (13.6), the price at time $0$ of a bond with unit face value issued by this firm, $P_{0,i}(T)$, is
$$P_{0,i}(T) = e^{-rT}\left[\text{Rec}\cdot Q_i(T) + \left(1 - Q_i(T)\right)\right].$$
The risk neutral probabilities, $Q_i(T)$, must be found using migration frequencies such as those in Table 13.1, which we must clean up and correct with appropriate risk-premia as discussed.
13.3.4.3 One example

Consider the following transition matrix:

                       To
                 A       B      Def
         A      0.9     0.07    0.03
From     B      0.15    0.75    0.10
         Def    0       0       1
where Def denotes the state of default. What is the probability that a name A will remain name
A in two years? What is the probability that a name A will default in two years?
Consider the following two year transition matrix:
$$Q(2) = \underbrace{\begin{pmatrix} 0.90 & 0.07 & 0.03 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1 \end{pmatrix}}_{Q(1)} \underbrace{\begin{pmatrix} 0.90 & 0.07 & 0.03 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1 \end{pmatrix}}_{Q(1)}$$
such that:
$$\Pr\{\text{A is A in 2 years}\} = 0.90\times 0.90 + 0.07\times 0.15 + 0.03\times 0 = 0.8205$$
and
$$\Pr\{\text{A defaults in 2 years}\} = 0.90\times 0.03 + 0.07\times 0.10 + 0.03\times 1 = 0.064$$
In general, we have that:
$$Q(2)_{ij} = \sum_{k=1}^{3} Q(1)_{ik}\,Q(1)_{kj}$$
and for any $T$,
$$Q(T) = Q(1)^{T} = \begin{pmatrix} 0.90 & 0.07 & 0.03 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1 \end{pmatrix}^{T}$$

Next, consider the following transition matrix, under the risk-neutral probability:

                       To
                 A       B      Def
         A      0.80    0.20    0
From     B      0.15    0.75    0.10
         Def    0       0       1

From here, we may easily compute, again, the (risk-neutral) probability A will default in two years, and the probability B will default in two years. We have,
$$Q(2) = \underbrace{\begin{pmatrix} 0.80 & 0.20 & 0 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1 \end{pmatrix}}_{Q(1)} \underbrace{\begin{pmatrix} 0.80 & 0.20 & 0 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1 \end{pmatrix}}_{Q(1)}$$
such that:
$$\Pr\{\text{A defaults in 2 years}\} = Q(2)_{13} = 0.80\times 0 + 0.20\times 0.10 + 0\times 1 = 0.02$$
(multiply first row by the third column), and,
$$\Pr\{\text{B defaults in 2 years}\} = Q(2)_{23} = 0.15\times 0 + 0.75\times 0.10 + 0.10\times 1 = 0.175$$
(multiply second row by the third column).


Finally, suppose that the bonds issued by both A and B mature in two years. Furthermore, assume that if these two bonds default, they pay off the same recovery rate, equal to 30%, and only at the end of the second period. From here, we can compute the credit spreads for the two bonds. We have,
$$\text{Price A} = 0.30\times 0.02 + (1 - 0.02) = 0.986, \qquad \text{Spread A} = -\frac{1}{2}\ln(0.986) = 7.0495\times 10^{-3}$$
and,
$$\text{Price B} = 0.30\times 0.175 + (1 - 0.175) = 0.8775, \qquad \text{Spread B} = -\frac{1}{2}\ln(0.8775) = 6.5339\times 10^{-2}$$
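The whole example can be replicated by squaring the two one-year matrices and reading off the relevant entries. This is a minimal sketch with hypothetical helper names; as in the text, prices are expressed relative to a default-free bond, and recovery is paid at the end of the second year.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# States are ordered as A, B, Def.
P1 = [[0.90, 0.07, 0.03], [0.15, 0.75, 0.10], [0.0, 0.0, 1.0]]  # first matrix
Q1 = [[0.80, 0.20, 0.00], [0.15, 0.75, 0.10], [0.0, 0.0, 1.0]]  # risk-neutral

P2 = matmul(P1, P1)
Q2 = matmul(Q1, Q1)

def price_and_spread(default_prob, recovery=0.30, years=2):
    """Relative bond price and spread when recovery is paid at maturity."""
    price = recovery * default_prob + (1.0 - default_prob)
    return price, -math.log(price) / years

price_a, spread_a = price_and_spread(Q2[0][2])
price_b, spread_b = price_and_spread(Q2[1][2])

print(f"Pr(A is A in 2 years)     = {P2[0][0]:.4f}")
print(f"Pr(A defaults in 2 years) = {P2[0][2]:.4f}")
print(f"Price A = {price_a:.4f}, Spread A = {spread_a:.6f}")
print(f"Price B = {price_b:.4f}, Spread B = {spread_b:.6f}")
```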

13.4 Credit derivatives and structured products based thereon


13.4.1 Options and spreads
13.4.1.1 Total Return Swaps

In a total return swap (TRS, henceforth), one party, who owns some asset (the asset underlying the TRS), receives from the counterparty payments based on a mutually agreed rate, either fixed or variable, and makes payments to the counterparty based on the return of the underlying asset, which includes both the income it generates and any capital gains. The underlying asset can be a loan, a bond, an equity index, or a basket of assets. The interest payments are typically based on the LIBOR plus a spread. Consider the following example. Party A receives LIBOR + fixed spread equal to 3%. Party B receives the total return of the S&P 500 on a principal amount of $1 million. If the LIBOR is 7% and the S&P 500 is up by 12%, A pays B 12% and B pays A 7% + 3%. By netting, A pays B $20,000, i.e. $1 million × (12% − 10%).
While TRS are usually categorized as credit derivatives, they combine both market risk and credit risk. The main benefit from going long a TRS is that the party with the asset on the balance sheet buys protection against loss in value. The main benefit from shorting a TRS is that it allows the counterparty to receive the payoffs of the underlying without necessarily having to put this underlying in the balance sheet. Hedge funds find it quite convenient to short a TRS, as this allows them to have views with limited collateral upfront. The market for TRS is over-the-counter and market participants include institutions only.
13.4.1.2 Spread Options

Spread Options (SO, henceforth) are options written on the difference between two indexes. For example, let $S_1(t)$ and $S_2(t)$ be the prices of two assets at time $t$. The payoff promised by a SO entered at some time $t < T$, might be $\max\{S_1(T) - S_2(T) - K, 0\}$, where $K$ is the strike of the SO. A SO can be written on the spread between two rates of returns too. Importantly, a SO can be written on the spread between the yield of a corporate bond and the yield of a Treasury bond. Examples include: (i) NOB spread (notes - bonds), which are spreads between maturities; (ii) Spreads between quality levels, such as the TED spread (treasury bills - Eurodollars); (iii) MOB spreads, i.e. the difference between municipal bonds and treasury bonds. More generally, the definition of a SO has now been extended to include payoffs written as a linear combination of indexes, interest rates and yields.


13.4.1.3 Credit spread options

Credit spread options (CSO, henceforth) are options where the payoff is the difference between (i) the spread between two reference securities (say Italian Government bonds and US Government bonds having the same maturity, or the spread between some stock return in excess of the LIBOR, or two credit instruments), and (ii) a given strike spread, for a certain maturity date. A CSO may be an American or a European option. CSOs thus allow investors to hedge against, or take specific views about, changes in credit spreads. For example, an investor, while bullish on Italian bonds, might hedge against the uncertain outcome of a political election, which could trigger a widening of short-term spreads of Italian versus US bonds. The investor, then, might go long a CSO, with time to maturity around the days of the political election, where the underlying are the Italian and US Government bonds expiring in ten years, say. A possible payoff to the CSO holder might be proportional to $(\text{ITA-US}_T - K)$, where $\text{ITA-US}_T$ is the ten year Italian-US spread in three months, and $K$ is the strike spread.
13.4.2 Credit Default Swaps
13.4.2.1 Single name swaps

TRS provide protection against a general loss in asset value, which could be triggered by either market or credit risk, although market risk is obviously more likely than credit risk to kick in. Credit Default Swaps (CDS, henceforth) differ from TRS insofar as they provide protection against a credit event.
The premium, assumed to be paid quarterly, on a CDS contract agreed at time $t$, is obtained by equating the expected discounted value of the protection (the floating protection leg), to the expected discounted value of all the premiums paid over the life of the contract (the fixed premium leg), i.e. at dates $T_1 < T_2 < \cdots < T_{4N}$, where $T_i = T_0 + \frac{i}{4}$, and $N$ is the number of years the CDS refers to. The discounted expected floating protection leg is:
$$\text{Protection}_t = \sum_{i=1}^{4N} e^{-r(T_i - t)}\,\text{LGD}_t(T_i)\,\Pr\{\text{Default} \in (T_{i-1}, T_i)\}$$
and the discounted expected fixed leg is:
$$\text{Premium}_t = \sum_{i=1}^{4N} e^{-r(T_i - t)}\,\text{CDS}_t(N)\,\Pr\{\text{Survival at } T_i\}$$
where $r$ is the (constant) risk free rate, $\text{CDS}_t(N)$ is the premium paid every quarter, prevailing at time $t$, and $\text{LGD}_t(T_i)$ is the LGD at time $T_i$, which for simplicity is assumed to be constant, i.e. known at time $t$.
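Equating $\text{Premium}_t$ and $\text{Protection}_t$, and solving for $\text{CDS}_t(N)$, leaves:
$$\text{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(T_i - t)}\,\text{LGD}_t(T_i)\,\Pr\{\text{Default} \in (T_{i-1}, T_i)\}}{\sum_{i=1}^{4N} e^{-r(T_i - t)}\,\Pr\{\text{Survival at } T_i\}} \qquad (13.26)$$
It is a forward premium, as long as $t < T_0$. We assume that if the obligor defaults prior to the start date, $T_0$, the contract is terminated.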
At first glance, the previous derivation might look actuarial, although it is not. The reason is that the probabilities in Eq. (13.26) are risk-neutral probabilities. As such, they are, obviously, the same as those we use to price the bonds underlying the CDS contract. Therefore, there are no-arbitrage relations that link bond prices to CDS premiums, which shall be emphasized later on (see Section 13.4.5.4). This point illustrates in a remarkable way one key difference between finance and insurance. Even if in insurance, one may end up pricing some products through risk-adjusted probabilities, finance is where we typically end up having many more traded risks than in insurance, and these risks are tightly related through no-arbitrage restrictions.
Eq. (13.26) is a general formula we can use, once we have a model determining the risk-neutral probability of default. In this chapter, we implement Eq. (13.26) through a reduced-form approach, which will allow us to find the quarterly premium (or spread) $\text{CDS}_t(N)$ quite easily, as follows.
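We have, denoting again with $\lambda$ the instantaneous probability of default, that $\Pr\{\text{Survival at } T_i\} = e^{-\lambda(T_i - t)}$, and that $\Pr\{\text{Default at any } \tau \in (T_{i-1}, T_i)\} = e^{-\lambda(T_{i-1} - t)} - e^{-\lambda(T_i - t)}$. Intuitively, if the name survives at $T_i$ (event $S_i$), it must necessarily have survived at $T_{i-1}$ (event $S_{i-1}$), but the converse is not true: $S_i \subseteq S_{i-1}$, and the complement of $S_i$ in $S_{i-1}$ is nothing but the event of default between $T_{i-1}$ and $T_i$.$^{9}$ Substituting the previous probabilities into Eq. (13.26), we find that:
$$\text{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-(r+\lambda)(T_i - t)}\left(e^{\lambda(T_i - T_{i-1})} - 1\right)\text{LGD}_t(T_i)}{\sum_{i=1}^{4N} e^{-(r+\lambda)(T_i - t)}} \qquad (13.27)$$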
The denominator in the RHS of Eq. (13.27) is the defaultable-PVBP (Present Value of a Basis Point), in perfect analogy with the expressions for the forward swap rate given in Chapter 12. Assuming that $\text{LGD}_t(T_i)$ is constant and equal to LGD for each $T_i$, then, for a generic $\Delta = T_i - T_{i-1}$, Eq. (13.27) can be simplified to,
$$\text{CDS}_t(N) = \text{LGD}\left(e^{\lambda\Delta} - 1\right) \qquad (13.28)$$
That is approximately, for small $\Delta$,
$$\frac{1}{\Delta}\,\text{CDS}_t(N) \approx \text{LGD}\times\lambda \quad \text{(expected losses per unit of time)} \qquad (13.29)$$
Naturally, $\lambda$ is the risk-neutral instantaneous probability of default for the security.
Note that in this simple model, the CDS premiums for a fixed maturity are constant over time, as a result of the assumption that the intensity of default, $\lambda$, and the short-term rate, $r$, are constant. Moreover, note that Eq. (13.29) shows that the CDS premium is approximately the same as the instantaneous spread of a defaultable bond, as explained in Section 13.2. This property is to be expected, because a purchase of a defaultable bond and protection on it amounts to a synthetic default-free bond. Therefore, there must be a no-arbitrage relation between CDS spreads and defaultable bond spreads, as anticipated earlier.
In practice, the approximation in (13.29) does not hold for longer maturities, due to the unrealistic assumptions underlying it ($\lambda$ is constant, LGD is constant, $r$ is constant, etc.). On the contrary, we often observe CDS spreads increasing with maturity, as further explained in Section 13.4.5.4. Indeed, we may take interesting views. For example, buying CDS for two years and selling CDS for three is a view that default will not occur between the second and the third year from now.
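A short numerical check may help relate Eqs. (13.26), (13.28) and (13.29). The sketch below evaluates the ratio in Eq. (13.26) with a constant intensity and a constant loss-given-default, and compares it with the closed form. The function name and arguments are hypothetical, and the interest rate of 3% is an arbitrary illustrative value.

```python
import math

def cds_premium(intensity, rate, lgd, years, start=0.0, now=0.0):
    """Quarterly premium from Eq. (13.26) with constant intensity and LGD.

    Payment dates are start + i/4, i = 1, ..., 4 * years, seen from time `now`.
    """
    dates = [start + i / 4.0 for i in range(4 * years + 1)]
    protection = 0.0
    annuity = 0.0
    for prev, date in zip(dates[:-1], dates[1:]):
        discount = math.exp(-rate * (date - now))
        surv_prev = math.exp(-intensity * (prev - now))
        surv_date = math.exp(-intensity * (date - now))
        protection += discount * lgd * (surv_prev - surv_date)  # floating leg
        annuity += discount * surv_date                         # fixed leg per unit premium
    return protection / annuity

lam, lgd, delta = 0.04, 0.60, 0.25
premium = cds_premium(lam, rate=0.03, lgd=lgd, years=5)
print(f"Eq. (13.26): {premium:.6f}")
print(f"Eq. (13.28): {lgd * (math.exp(lam * delta) - 1.0):.6f}")
print(f"Eq. (13.29): {lgd * lam * delta:.6f}")
```

With these inputs, the premium does not depend on the interest rate, the maturity, or the forward start date, which is the sense in which premiums are constant in this simple model.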
9 Mathematically, we have that $\Pr\{\text{Default at any } \tau \in (T_{i-1}, T_i)\} = \int_{T_{i-1}}^{T_i} \Pr\{\text{Default at } \tau\}\,d\tau$, where $\Pr\{\text{Default at } \tau\} = \lambda e^{-\lambda(\tau - t)}$.

13.4.2.2 Marking to market

Suppose you go long a CDS, meaning that at time $t$, you commit to a swap agreement: you pay $\text{CDS}_t(N)$ at time $T_i$ if the name survives by time $T_i$, and receive $\text{LGD}_t(T_i)$, if default occurs in the time interval $[T_{i-1}, T_i]$, for $4N$ time intervals. Each swap payoff, the CDS-let so to speak, is:
$$\text{cds}_i(t) \equiv \text{LGD}_t(T_i)\,\mathbb{I}\{\text{Default} \in (T_{i-1}, T_i)\} - \text{CDS}_t(N)\,\mathbb{I}\{\text{Survival at } T_i\} \qquad (13.30)$$
such that
$$\text{CDS}_t(N): \quad 0 = \sum_{i=1}^{4N} e^{-r(T_i - t)}\,E_t\left(\text{cds}_i(t)\right)$$
where $E_t$ denotes the expectation conditional upon the information set at time $t$, taken under the risk-neutral probability. The solution to this equation is just that in Eq. (13.27).
What happens to the value of this contract at any subsequent time $\tau \in (t, T_0)$? The marking to market value of the CDS is the present value of the risk-neutral expectation of the single swaps payments $\text{cds}_i(t)$ in Eq. (13.30), consistently with the explanations in Section 10.4.6 of Chapter 10. So the marking to market value of the CDS at $\tau$ is,
$$\begin{aligned}
\text{MtM}_\tau(N) &= \sum_{i=1}^{4N} e^{-r(T_i - \tau)}\,E_\tau\left(\text{cds}_i(t)\right) \\
&= \sum_{i=1}^{4N} e^{-(r+\lambda)(T_i - \tau)}\left(e^{\lambda(T_i - T_{i-1})} - 1\right)\text{LGD}_\tau(T_i) - \text{CDS}_t(N)\sum_{i=1}^{4N} e^{-(r+\lambda)(T_i - \tau)} \\
&= \left[\text{CDS}_\tau(N) - \text{CDS}_t(N)\right]\sum_{i=1}^{4N} e^{-(r+\lambda)(T_i - \tau)}
\end{aligned}$$
where the last line follows by the definition of $\text{CDS}_\tau(N)$, i.e. by setting $t = \tau$ in Eq. (13.27). These results perfectly match those in Chapter 12 regarding forward swap rates.
Note that in this model, marking-to-market is deterministic, because CDS premiums for a fixed maturity are constant over time, due to the fact that both $\lambda$ and $r$ are constant. Marking-to-market is actually zero once we assume that loss-given-default is constant, as Eq. (13.28) shows that CDS spreads do not change in this case.
13.4.2.3 CDS on indexes, and options based thereon

A CDS index is a basket of credit entities in which the protection buyer pays the same premium on all the names in the index, until a fixed expiration date. Credit events are typically bound to bankruptcy or delinquencies. After a credit event, the entity is removed from the index and the contract goes through with a reduced notional amount, until expiration, as explained in more detail below.
While CDS on single names are over-the-counter, CDS indexes are standardized and at the time of writing give rise to relatively more liquid markets, as historical data on bid-ask spreads show. In fact, it can be cheaper to hedge a portfolio of CDS or bonds with a CDS index than it would be to buy many CDS to achieve a similar effect. There exist two main indices: (i) CDX index, which contains North American and Emerging Market companies; and (ii) iTraxx index, which contains companies from the rest of the world.
Credit default swaptions are options to enter a CDS, typically a CDS index. Consider, first, swaptions on single names. A payer swaption gives the right to buy protection at some future date at some CDS fixed strike, and a receiver swaption gives the right to sell protection. If default of the name occurs prior to the swaption maturity, the contract is terminated. Note that evaluating credit default swaptions is trivial in the pricing context of Section 13.6.3.1, because CDS premiums move deterministically over time when the intensity of default and the short-term rate are both constant. Section 13.6.3.9 hinges upon a continuous-time model of stochastic intensity rates, and supplies an evaluation framework for these products.
Credit default index swaptions work differently. Firstly, as noted, at inception, a credit default index swap (CDIS) is referenced to a number of fixed companies chosen by a market maker, each carrying a given weight. Secondly, buyers of CDIS are typically those who provide protection to market makers: they stand ready to pay a predetermined loss-given-default for any default that occurs before maturity, which is constant and identical for all reference entities in the index. In exchange, the market makers pay the CDIS buyer a periodic fixed premium, the credit default index spread. After a default takes place, the nominal value of the CDIS is reduced by one, and no replacement of the defaulted firm would take place, as further explained in Section 13.6.3.9.
13.4.2.4 Disentangling default probability from risk-aversion

The following picture, taken from Fender and Hordahl (2007), illustrates the behavior of the
credit market risk appetite before the 2007 credit market turmoil.

FIGURE 13.9. Antonio Mele does not claim any copyright on this picture, which is taken
from Fender and Hordahl (2007). The picture has been put here for illustrative purposes
only, and permission to the authors shall be duly asked before the book will be published.

How did the authors estimate the price of risk? Consider the expected losses under the actuarial, or physical probability for a given security. The counterpart to Eq. (13.29), under the physical probability, is:
$$\text{Expected Losses} \approx \text{LGD}\times\lambda^{P}$$
where $\lambda^{P}$ is the physical instantaneous probability of default for a given security. Assume that LGD is constant, to simplify. If investors require compensation for default events, the actuarial losses should be less than the CDS spread, i.e. $\text{Expected Losses} < \text{CDS}$, or, $\lambda^{P} < \lambda$.
The risk-premium is defined as the difference between the CDS premium and the actuarial losses, Expected Losses,
$$\text{Risk-Premium} = \text{CDS} - \text{Expected Losses} \approx \text{LGD}\times\left(\lambda - \lambda^{P}\right).$$
The price of risk in Figure 13.9 is defined as the ratio of the CDS spread over Expected Losses,
$$\text{Price-of-Risk} = \frac{\text{CDS}}{\text{Expected Losses}} \approx \frac{\lambda}{\lambda^{P}}.$$
Early references to estimation methods are Duffie et al. (2005) and Amato (2005). Typically, Expected Losses are proxied by Moody's KMV's Expected Default Frequencies (EDFs™), obtained through fully specified structural models for credit risk. The next pictures are taken from Amato (2005). As we can see, during the 2003-2005 period, credit spreads were very low, which in turn gave incentives to CDO issuers to look for illiquid and relatively more complex assets to put as collateral, which led to the issuance of CDO relying on ABS such as MBS, or CDO$^2$, explained below.

FIGURE 13.10. Antonio Mele does not claim any copyright on this picture, which is
taken from Amato (2005). The picture has been put here for illustrative purposes only,
and permission to the author shall be duly asked before the book will be published.


FIGURE 13.11. Antonio Mele does not claim any copyright on this picture, which is
taken from Amato (2005). The picture has been put here for illustrative purposes only,
and permission to the author shall be duly asked before the book will be published.

The following picture illustrates the behavior of CDS indexes during approximately 20 years
before the 2007-2009 credit market turmoil.

FIGURE 13.12. Valuation of Financial Instruments Based on Implied Probability of Default. Antonio Mele does not claim any copyright on this picture, which is taken from
IMF (2008). The picture has been put here for illustrative purposes only, and permission
to the authors shall be duly asked before the book will be published.



13.4.2.5 Continuous time

We may relax the assumption the instantaneous intensity of default, $\lambda$, is constant. This intensity is defined under the risk-neutral probability and can change either because the intensity of default under the physical probability changes or because risk-appetite changes, or both. We examine the asset pricing implications of time-varying intensities, by exploring how probabilities of survival change in a simple setting, where we do not single out the reasons leading to variations in $\lambda$.
First, we assume the instantaneous probability of default can only change discretely, giving rise to random intensities $\lambda_t$, meaning that $\lambda_t$ is the intensity of default in the time interval $[t-1, t]$. Let $\mathcal{F}_t$ be the information set as of time $t$. We assume that $\lambda_t$ is $\mathcal{F}_t$-measurable. What is the probability of survival of any given name in this case? We have, by Bayes's theorem,
$$\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\} = \frac{\Pr\{\text{Surv at } t\}}{\Pr\{\text{Surv at } t-1\}} \qquad (13.31)$$
By a repeated use of Eq. (13.31),
$$\Pr\{\text{Surv at } T\} = \Pr\{\text{Surv at } T \mid \text{Surv at } T-1\}\,\Pr\{\text{Surv at } T-1\} = \prod_{t=1}^{T}\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\} \qquad (13.32)$$
So we are left with finding $\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\}$. Consider the following arguments. If $\lambda_t$ was not random and fixed at some $\bar{\lambda}$, then, $\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\} = e^{-\bar{\lambda}}$. When $\lambda_t$ is random, $e^{-\lambda_t}$ is the probability of survival, conditioned upon some particular value the intensity could possibly take. Heuristically, then, $\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\} = \sum_{s \in S} e^{-\lambda_t(s)}\Pr\{s\}$, where $\lambda_t(s)$ is, so to speak, the value $\lambda_t$ would take in state $s$, $\Pr\{s\}$ is the likelihood that state $s$ occurs and, finally, $S$ is the set of all possible states, as illustrated by Figure 13.13.

[Figure 13.13: two-state tree. Nature (n) first draws the state with probabilities Pr1 and Pr2; each state then branches into default and survival.]

FIGURE 13.13. This picture illustrates the determination of the probability of survival in the case of random default intensities going over one period and two states. At the beginning of period $t$, nature draws the event defining the intensity of default, which is either $\lambda_t(1)$ with probability $\Pr\{1\}$, or $\lambda_t(2)$ with probability $\Pr\{2\} = 1 - \Pr\{1\}$. Then, the two paths leading to survival have probability of occurrence equal to $\Pr\{1\}e^{-\lambda_t(1)}$ and $\Pr\{2\}e^{-\lambda_t(2)}$, such that the total probability of survival equals $\Pr\{1\}e^{-\lambda_t(1)} + \Pr\{2\}e^{-\lambda_t(2)}$.

Therefore, $\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\} = E\left(e^{-\lambda_t} \mid \mathcal{F}_{t-1}\right)$, where $E$ denotes the expectation taken under the risk-neutral probability. Inserting this result into Eq. (13.32), and using the Law of Iterated Expectations, leaves:
$$\Pr\{\text{Surv at } T\} = E\left(e^{-\sum_{t=1}^{T}\lambda_t}\right)$$
Under regularity conditions, we can easily extend the previous result to a continuous time setting. For example, we may assume that the risk-neutral default intensity, $\lambda_t$, is solution to:
$$d\lambda_t = \kappa\left(\bar{\lambda} - \lambda_t\right)dt + \sigma\sqrt{\lambda_t}\,dW_t, \qquad \lambda_0 = \lambda \qquad (13.33)$$
where $W_t$ is a standard Brownian motion under the risk-neutral probability, and $\kappa$, $\bar{\lambda}$ and $\sigma$ are three positive constants. This is the same as the Cox, Ingersoll and Ross (1985) (CIR) model of the short-term rate reviewed in Chapter 12. Therefore, under the parameter restrictions in Chapter 12, $\lambda_t$ is always positive, and
$$P_{\text{surv}}(\lambda, T) \equiv \Pr\{\text{Surv at } T\} = E\left(e^{-\int_0^T \lambda_t\,dt}\right) \qquad (13.34)$$
Eq. (13.34) is, formally, the same as the Feynman-Kac representation of a solution to a partial differential equation, solved by a bond price in the Cox, Ingersoll and Ross (CIR) (1985) model of the previous chapter (Section 12.4.3.3). In other words, the survival probability in Eqs. (13.33)-(13.34) is mathematically the same as the price of a zero coupon bond in the CIR model. Therefore, the closed-form solution for $P_{\text{surv}}(\lambda, T)$ is:
$$P_{\text{surv}}(\lambda, T) = A(T)\,e^{-B(T)\lambda}, \qquad A(T) = \left(\frac{2\gamma\,e^{\frac{1}{2}(\kappa+\gamma)T}}{(\kappa+\gamma)\left(e^{\gamma T} - 1\right) + 2\gamma}\right)^{\frac{2\kappa\bar{\lambda}}{\sigma^2}},$$
$$B(T) = \frac{2\left(e^{\gamma T} - 1\right)}{(\kappa+\gamma)\left(e^{\gamma T} - 1\right) + 2\gamma}, \qquad \gamma = \sqrt{\kappa^2 + 2\sigma^2} \qquad (13.35)$$
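The closed form is straightforward to code. The sketch below implements the standard CIR zero coupon bond formula, read as a survival probability; the function and argument names (`speed`, `mean`, `vol`) are hypothetical labels for the three constants in Eq. (13.33), and the volatility values are illustrative assumptions rather than the values used for the figures. As a sanity check, when volatility is close to zero the intensity follows a deterministic path, and the formula should collapse to the exponential of minus the integrated intensity.

```python
import math

def cir_survival(lam0, horizon, speed, mean, vol):
    """Survival probability E[exp(-integral of intensity)] under CIR dynamics."""
    gamma = math.sqrt(speed ** 2 + 2.0 * vol ** 2)
    growth = math.exp(gamma * horizon) - 1.0
    denom = (speed + gamma) * growth + 2.0 * gamma
    # Work with log A(T) to avoid raising a number close to one to a huge power.
    log_a = (2.0 * speed * mean / vol ** 2) * (
        math.log(2.0 * gamma) + 0.5 * (speed + gamma) * horizon - math.log(denom)
    )
    b = 2.0 * growth / denom
    return math.exp(log_a - b * lam0)

def deterministic_survival(lam0, horizon, speed, mean):
    """Survival probability when the intensity mean-reverts without noise."""
    integral = mean * horizon + (lam0 - mean) * (1.0 - math.exp(-speed * horizon)) / speed
    return math.exp(-integral)

for lam0 in (0.04, 0.02):
    noisy = cir_survival(lam0, 10.0, speed=0.30, mean=0.04, vol=0.10)
    quiet = deterministic_survival(lam0, 10.0, speed=0.30, mean=0.04)
    print(f"lam0={lam0}: CIR survival {noisy:.4f}, deterministic {quiet:.4f}")
```

By Jensen's inequality, the survival probability with a random intensity is never below the deterministic one with the same expected path, which is what the two columns of the output illustrate.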
More generally, we can build up a whole family of models with a closed-form solution, the affine class reviewed in Chapter 12, by assuming that:
$$\lambda_t = \alpha_0 + \alpha_1 \cdot y_t \qquad (13.36)$$
where $\alpha_0$ is a constant, $\alpha_1$ is a vector of constants, and $y_t$ is a multivariate jump-diffusion process, with drift and diffusion terms as in Section 12.4.6 of Chapter 12. This model is interesting, as we can judiciously choose the components of $y_t$ which we suppose may affect the default intensity. For example, some of them could be unobservable, and others could be observable, and relate, say, to the business cycle or even the structure of the firm.
Given any solution for the survival probability predicted by any of these affine models when $\lambda_0 = \lambda$, $P_{\text{surv}}(\lambda, T)$ say, we can easily compute
$$\Pr\{\text{Default} \in (T_{i-1}, T_i)\} = P_{\text{surv}}(\lambda, T_{i-1}) - P_{\text{surv}}(\lambda, T_i) \qquad (13.37)$$

We can look at the bond spreads and the CDS spreads implied by this modeling choice. In Appendix 3, we show the price of a defaultable pure discount bond expiring in $N$ years is:
$$P(\lambda, N) = e^{-rN}P_{\text{surv}}(\lambda, N) + \int_0^N e^{-r\tau}\,\Pr\{\text{Default} \in d\tau\}\,\text{Rec}(\tau) \qquad (13.38)$$
where $\text{Rec}(\tau)$ denotes the recovery value in case of default, supposed to be known. This evaluation result is, naturally, consistent with a similar derivation provided in Section 12.4.7 of Chapter 12, although in this chapter we are emphasizing more survival arguments.
As for the forward CDS spreads, we have, by Eq. (13.26),
$$\text{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(T_i - t)}\,\text{LGD}_t(T_i)\left[P_{\text{surv}}(\lambda, T_{i-1} - t) - P_{\text{surv}}(\lambda, T_i - t)\right]}{\sum_{i=1}^{4N} e^{-r(T_i - t)}\,P_{\text{surv}}(\lambda, T_i - t)}$$
where $N$ is, again, the number of years the CDS refers to, and $T_i = T_0 + \frac{i}{4}$.
Assume the short-term rate, $r$, is zero, and that loss-given-default is constant and equal to LGD. Then, as shown in Appendix 3, the price of a defaultable pure discount bond, $P(\lambda, N)$, and the CDS premium, obtained from the forward once we set $t = T_0 = 0$, $\text{CDS}_0(N)$, are given by:
$$P(\lambda, N) = 1 - \text{LGD}\left(1 - P_{\text{surv}}(\lambda, N)\right), \qquad \text{CDS}_0(N) = \text{LGD}\,\frac{1 - P_{\text{surv}}(\lambda, N)}{\sum_{i=1}^{4N} P_{\text{surv}}(\lambda, T_i)} \qquad (13.39)$$
Figure 13.14 depicts the bond spread, $-\frac{1}{N}\ln P(\lambda, N)$, and the annualized credit default


q
spreads, 4CDS0 ( ), with parameters in Eq. (13.33) xed at = 0 30, = 0 04 and = 12 ,
and LGD = 0.60, and two values of the current intensity: $\lambda = \bar\lambda = 0.04$, and $\lambda = 0.02$. Assuming LGD is constant is not plausible, empirically. Instead, we know LGD moves procyclically,
although it does not exhibit strong business cycle features, for sovereigns. Regarding sovereigns,
the size of the country and debt distribution seem to be by far more important.
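To make Eq. (13.39) concrete, the sketch below computes the bond spread and the quarterly CDS premium from any survival function, under the same assumptions as in the text (zero short-term rate, constant LGD, quarterly premium dates). The function names are illustrative, and the constant-intensity survival curve used in the example is only a stand-in for the affine model behind Figure 13.14.

```python
import math

def bond_spread(surv, T, lgd):
    """Spread -ln(P)/T on a defaultable zero, with P = 1 - LGD * (1 - P_surv(T))."""
    price = 1.0 - lgd * (1.0 - surv(T))
    return -math.log(price) / T

def cds_premium(surv, T, lgd, freq=4):
    """Per-period CDS premium: LGD * (1 - P_surv(T)) / sum_i P_surv(T_i), T_i = i / freq."""
    n_dates = int(round(freq * T))
    annuity = sum(surv(i / freq) for i in range(1, n_dates + 1))
    return lgd * (1.0 - surv(T)) / annuity

# Stand-in survival curve (an assumption): constant intensity of 4% per year.
const_surv = lambda t: math.exp(-0.04 * t)

spread_bond = bond_spread(const_surv, 5.0, 0.60)
spread_cds = 4.0 * cds_premium(const_surv, 5.0, 0.60)  # annualized
```

With a constant intensity, the annualized CDS spread collapses to `4 * LGD * (exp(lam / 4) - 1)` for any maturity, and it lies above the bond spread, in line with the ranking of the two curves discussed in the text.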
[Figure 13.14: two panels, "Spreads, in basis points, for average default intensity" (left) and "Spreads, in basis points, for low default intensity" (right). Each panel plots bond spreads and annualized CDS spreads against maturity, from 0 to 10 years.]

FIGURE 13.14. Spreads on bonds and CDS predicted by the affine model in Eq. (13.33). The left panel depicts the spreads when the current default intensity equals the long-run mean, $\lambda = \bar\lambda = 0.04$. The right panel depicts the spreads in good times, i.e., when the current intensity of default takes a low value, $\lambda = 0.02$. In each case the recovery rate equals 40%.

The mechanism is that given the mean-reverting behavior of $\lambda$, good times are likely followed
by bad, such that when $\lambda = 0.02$, we expect default rates to rise in the future. Spreads increase
with maturity as a result. Moreover, bond spreads are approximately equal to CDS spreads at
short maturities. At longer maturities, the two spreads diverge, with CDS spreads, $4\,\mathrm{CDS}_0(T)$,
dominating bond spreads, $-\frac{1}{T}\ln P(\lambda, T)$. Moreover, the two curves are decreasing in time
to maturity even when the current value of the intensity equals the long-run one, $\bar\lambda$. This
property is due to the assumption that recovery rates are constant, as explained in the constant
intensity case dealt with in Section 13.3.3.2. In Appendix 5, we provide additional details and
explanations regarding these properties.
13.4.2.6 A trading strategy

Bond prices and CDS spreads are driven by the same state variable, the default intensity, and
so they are restricted to lie on some space, to be consistent with no-arbitrage. To illustrate,
consider, first, the simple case where the default intensity is constant, such that CDS spreads
are given by Eq. (13.27). Given this model, we can look at the market data for CDS spreads,
and infer the risk-neutral intensity, as in the picture below.
[Figure: "Inferring risk-neutral intensity from CDS market data". Model-based CDS spreads, in basis points (vertical axis, from 50 to 350), plotted against the default intensity (horizontal axis, from 0.01 to 0.05).]

In this picture, the CDS spreads predicted by Eq. (13.27) are depicted as a function of the
risk-neutral intensity, $\lambda$, assuming $T = 5$ years, LGD = 0.60 and the short-term rate is zero.
For example, if we had to observe a CDS premium equal to 200 basis points, we would infer a
risk-neutral intensity approximately equal to $\lambda = 0.033$. The key point is this same $\lambda$ should
also be pricing the zero, such that for $T = 5$, $P(\lambda) = 1 - \mathrm{LGD}\,(1 - e^{-\lambda T}) = 0.90874$, and
so we go long both the bond and the CDS if the bond market price is less than 0.90874. The
rationale behind this strategy is that the intensity implied by the current CDS premium is too
low to justify the bond price, so bond prices and/or intensity (and, hence, CDS premiums)
should increase. The exactly opposite reasoning applies when the bond price is higher than
0.90874.
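The numbers in this example can be reproduced with a few lines of code. The sketch below assumes, as in the text, a zero short-term rate, quarterly premium dates and a constant intensity, in which case the annualized CDS spread reduces to `4 * LGD * (exp(lam / 4) - 1)` and can be inverted in closed form. The function names are illustrative.

```python
import math

def implied_intensity(annual_cds_spread, lgd, freq=4):
    """Invert annual_spread = freq * LGD * (exp(lam / freq) - 1) for lam."""
    return freq * math.log(1.0 + annual_cds_spread / (freq * lgd))

def zero_price(lam, T, lgd):
    """Defaultable zero with zero short-term rate: 1 - LGD * (1 - exp(-lam * T))."""
    return 1.0 - lgd * (1.0 - math.exp(-lam * T))

lam_hat = implied_intensity(0.02, 0.60)      # CDS premium of 200 basis points
fair_price = zero_price(0.033, 5.0, 0.60)    # bond price at the rounded intensity
```

A bond trading below `fair_price` is cheap relative to the CDS market, which is the case where the text suggests going long the bond and buying protection.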

The previous example relies on a constant default intensity; however, the same strategy
applies when default intensities are stochastic. The picture below shows the no-arbitrage restrictions between bond spreads and CDS spreads, obtained with the same parameter values as
those in Figure 13.14, and current intensity values ranging from 0.005 to 0.05. It also provides
indications of a strategy aiming to exploit deviations from theoretical parity.
[Figure: "No-arbitrage restrictions between bond spreads and CDS spreads". Model-based bond spreads, in basis points (vertical axis, from 100 to 240), plotted against model-based CDS spreads, in basis points (horizontal axis, from 100 to 260). Region above the curve: bond prices low (long), CDS spreads low (buy protection). Region below the curve: bond prices high (short), CDS spreads high (sell protection).]

13.4.2.7 Hazard rates

In a pricing context, the relevant probabilities of survival are obviously conditioned upon the
time of evaluation, time 0 say. For example, the probability of default in Eq. (13.37) is only conditioned to the information we have at time zero. More generally, the probability of defaulting
in the time interval ( 1 ), conditional upon survival at time
1 , is:
Pr{Default

)| Survival at } =

surv

surv

For example, for =


) small, and
1 , and (
1
approximation to this conditional probability:
Pr{Default

1)

(13.40)

deterministic, consider the following

surv

)| Survival at }

surv

default

surv

)
(

1)

)
(
1)

default

(
)

1)

with straightforward notation. The previous expressions are known as hazard rates. They
coincide with $\lambda(T)$, when $\lambda$ is deterministic. If $\lambda$ is not deterministic, simple computations
lead to:
(13.41)
Pr{Default ( + )| Survival at } = E ( )


where

is a new probability, with Radon-Nikodym derivative given by:


R

0
=

)
surv (

(13.42)

Accordingly, under the new probability, the state variables in Eq. (13.36) follow a diffusion process, with a
drift process tilted, due to this change of probability. For example, in the simple setting of Eq.
(13.33), we have that, for a fixed $T$,
= (B0
B0 =
where

B1 ( )
B1 ( ) =

(0 ]

() as in Eq. (13.35),

(13.43)

is a Brownian motion under

Pr{Default

. Therefore, by Eq. (13.41), and computations,

Z
R
( )
B ( )
0 1
+ B0
( )
)| Survival at } =
()
()
0

Appendix 5 provides a proof of these results, which, to the best of our knowledge, are developed
here for the first time.
13.4.2.8 Extracting probabilities of default from market data

Markets obviously convey information about default probabilities, which could be extracted
under a number of assumptions. To illustrate, assume zero recovery and that both the short-term rate and the default intensity are continuous-time Markov and independent of each other.
Then, the price of a defaultable zero is $P_{\mathrm{def}}(T) = P(T)\, P_{\mathrm{surv}}(T)$, where $P_{\mathrm{def}}(T)$ and
$P(T)$ are the prices of a defaultable and a non-defaultable zero. Therefore, we can read the
risk-neutral probability of survival from the defaultable/non-defaultable price ratio:

$$
P_{\mathrm{surv}}(T) = \frac{P_{\mathrm{def}}(T)}{P(T)}
\tag{13.44}
$$

Naturally, surviving until some $T_2$ implies having survived until any $T_1 < T_2$ and, then, from $T_1$
to $T_2$. Therefore, $P_{\mathrm{surv}}(T_2) = P_{\mathrm{surv}}(T_1)\, P_{\mathrm{surv}}(T_1, T_2)$, where $P_{\mathrm{surv}}(T_1, T_2)$ is the
risk-neutral probability of survival between $T_1$ and $T_2$. Using Eq. (13.44), then, we can extract
this probability, as follows:

$$
P_{\mathrm{surv}}(T_1, T_2) = \frac{P_{\mathrm{def}}(T_2)}{P_{\mathrm{def}}(T_1)} \cdot \frac{P(T_1)}{P(T_2)}
$$

The previous example relies on the simplifying assumption of a zero recovery rate, but it
can be generalized to the case where the recovery rate is nonzero. However, in this case, an
identification issue arises, as prices would convey information about both default probabilities
and recovery rates.
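The extraction in Eq. (13.44) amounts to taking ratios of zero prices. A minimal sketch follows, under the zero-recovery and independence assumptions stated above; the function names and the flat-curve numbers in the example are hypothetical and only serve as a check.

```python
import math

def survival_prob(p_def, p_riskfree):
    """Risk-neutral survival probability to T from prices of zeros expiring at T."""
    return p_def / p_riskfree

def forward_survival(p_def_1, p_riskfree_1, p_def_2, p_riskfree_2):
    """Risk-neutral probability of surviving between T1 and T2 (T1 < T2)."""
    return (p_def_2 / p_def_1) * (p_riskfree_1 / p_riskfree_2)

# Hypothetical flat curves: short-term rate 3%, default intensity 2%.
riskfree = lambda T: math.exp(-0.03 * T)
defaultable = lambda T: math.exp(-(0.03 + 0.02) * T)

surv_5y = survival_prob(defaultable(5.0), riskfree(5.0))
surv_2y_to_5y = forward_survival(defaultable(2.0), riskfree(2.0),
                                 defaultable(5.0), riskfree(5.0))
```

In this example the extracted probabilities are `exp(-0.02 * 5)` and `exp(-0.02 * 3)`, as they should be when the intensity is constant.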
13.4.2.9 Pricing credit default swaptions
Swaptions on single names

With stochastic intensity rates, we can think about the pricing of the credit default swaptions
that we briefly mentioned in Section 13.4.2.3. We, now, actually assume that both default
intensities and the short-term rate are stochastic: we assume that the short-term rate is a
diffusion process, and that default arrives as a Cox process with intensity adapted to $r$.10
Consider the following definition of a default swap in Section 13.4.2.1: a contract whereby
a party stands ready to pay his counterparty a loss determined by a credit event for a given
flow of premiums, referred to as CDS premiums. At this level, we are not assuming that this
swap is worthless at origination. We assume that loss-given-default is constant and equal to
LGD. Denote the CDS premium, or coupon, agreed at time with ,
0 . Note that like
in Section 13.4.2.3, the contract we are dealing with is a forward default swap. We assume that
the contract is terminated should the underlying obligor default prior to the start date, 0 .
In this contract, the protection buyer commits to a swap agreement whereby it pays
at
time , if the name survives by time , and receives LGD, if default occurs in the time interval
[ 1 ], for 4 time intervals. Each swap payo is:
LGD I{Default

cds ( )

I{Survival

)}

at

(13.45)

where I{} is the indicator function.


The value of the default swap agreed at time is,
DS =

4
X

=1

cds ( ) = LGD

(13.46)

where E denotes the risk-neutral expectation, taken conditional upon the information set at
time , and,

I{

0 4

]}

4
X

=1

I{Survival

at

(13.47)

The interpretation of 0 is that of the value of one dollar paid o the rst time after default,
provided default occurs prior to the maturity of the default swap, 4 . Instead, 1 is the value
of an annuity of one dollar paid at the dates 1 2 4 , until default or maturity of the
default swap, whichever occurs rst. In other words, 1 is the value of a basket of defaultable
bonds with zero recovery valuea defaultable present value of the basis point.
The forward default spread is the value of
such that DS = 0. It is:
CDS (

) = LGD

(13.48)

such that the value of the default swap at time , agreed at time
DS ( ), can be expressed as:
DS (

) = LGD

(CDS (

, and denoted as

(13.49)

Note that the derivation leading to Eq. (13.49) generalizes that underlying the marks-to-market
updates in Section 13.4.2.2.
10 Section 13.4.2.5 shows how to generalize Eq. (13.26) to allow for stochastic intensities. It is easy to generalize further while
allowing for stochastic interest rates.


Next, note that for any F -measurable random variable


adapted to ( ) [ ] and satisfying enough regularity conditions, we have that for xed , and by the Law of Iterated
Expectations,
R

I{Survival at }

R

I{Survival at } F
=E E

=E

E I{Survival at } F
R

( + )
(13.50)

=E
where F is the information set at time , which includes the path of the short-term rate only.
Dene the probability sc through the Radon-Nikodym derivative:

R
sc
1
( + )
(13.51)
=
1

where $Q$ is the risk-neutral probability. It is easy to see that $Q_{\mathrm{sc}}$ does indeed integrate to one.11
Following Schönbucher (2003, p. 180), we refer to $Q_{\mathrm{sc}}$ as the survival contingent probability.
Chapter 4 provides foundations on changes of probability, with this probability being a special
case of a general framework.
We can also show that for any
0,
R

( + )
(13.52)
0 = E
0

Indeed, 0 is the value of a basket of securities paying o contingent upon default not having
occurred prior to time , with
0 , in which case the value drops to zero. Following derivations
in Chapter 12 of these lectures, we have that
(0
is the
0
0 +
0 ) = 0, where
innitesimal generator for di usions, whence Eq. (13.52). By a similar reasoning, and prior to
, 1 in Eq. (13.47) satises the same partial di erential equation satised by 0 , whence the
claim that sc integrates to one.
Therefore, given the denition of sc in Eq. (13.51) and the martingale property of 0 in
Eq. (13.52), we have that the forward default spread in Eq. (13.48) is a martingale under the
survival contingent probability:
Esc (CDS (
R
=E

))
( +

1
1

CDS (

) =E

( +

0
1

LGD = CDS (

where Esc denotes the time- conditional expectation taken under the survival contingent probability, and where we have used the pricing equation (13.52) and the denition of CDS ( ) in
Eq. (13.48).
11 We could complete the arguments while relying on Schönbucher (2003, Chapter 7), Lando (2004, Chapter 5) and Chapter 12
of these lectures (see below).


We evaluate default swaptions by relying on the survival contingent probability. By Eq.


(13.49), the payo of a swaption payer is,
DS (

) = I{Survival

at

(CDS (

)+

. Relying on the property in Eq. (13.50) leaves:


R

DS ( )
E
R

+
( + )
= 1 Esc (CDS (
)
)
=E
1 (CDS (

for a strike

)+

We know that $\mathrm{CDS}_\tau(T)$ is a martingale under the survival contingent probability. Let $W$
be a Brownian motion under $Q_{\mathrm{sc}}$. Assume that

$$
\frac{d\,\mathrm{CDS}_\tau(T)}{\mathrm{CDS}_\tau(T)} = \sigma\, dW(\tau)
$$

where $\sigma$ is the volatility parameter, a constant. We can apply Black (1976) to obtain evaluation
formulae in this environment.
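For illustration, the Black (1976) formula in this setting takes the familiar form of an annuity factor times a call on the forward spread. The sketch below assumes the forward CDS spread is lognormal with constant volatility under the survival contingent probability; the argument `annuity` stands for the current value of the defaultable annuity used as numéraire, and all names are illustrative rather than taken from the text.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cds_payer_swaption(annuity, fwd_spread, strike, vol, expiry):
    """Black (1976) value: annuity * (F * N(d1) - K * N(d2))."""
    if vol <= 0.0 or expiry <= 0.0:
        # Degenerate case: the option is worth its intrinsic value.
        return annuity * max(fwd_spread - strike, 0.0)
    total_vol = vol * math.sqrt(expiry)
    d1 = (math.log(fwd_spread / strike) + 0.5 * total_vol ** 2) / total_vol
    d2 = d1 - total_vol
    return annuity * (fwd_spread * norm_cdf(d1) - strike * norm_cdf(d2))

# Hypothetical inputs: annuity of 4, forward spread and strike of 200 basis
# points, 40% volatility, one year to expiry.
atm_price = cds_payer_swaption(4.0, 0.02, 0.02, 0.40, 1.0)
```

The knock-out feature, i.e. the contract being terminated if default occurs before the start date, is reflected in the annuity factor, which is a defaultable claim.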
Swaptions on CDS indexes

Consider, rst, a CDS index, as succinctly described in Section 13.4.2.3. Let be the number
of names in the index decided at time . Each name has notional value equal to 1 , the same
loss-given-default LGD and the same default intensity . Denote with D ( 1 ) the number
of names having defaulted over the time interval ( 1 ),
D(

X
=1

I{Def

)}

where I{Def ( 1 )} is the indicator of the event that the -th name defaults over the time
interval ( 1 ). Dene the following swap payo , occurring at time , and generalizing that
in Eq. (13.45) holding for single names,

!
1X
1
1
D( 1 )
(13.53)
cdx ( ) LGD D ( 1 )
=0

where D ( 1 0 ) denotes the number of defaults occurred over the time interval ( 0 ). The rst
term of cdx ( ) is the loss in the index occurring at time , paid o by the protection seller,
whereas the second term is the protection premium, which equals the constant premium
times the outstanding notional.12
The value of the protection leg minus that of the premium leg over the life of the index is
obtained as:
4
!
X R
cdx ( ) = LGD 0
1
for
(13.54)
DSX = E
0
=1

12 According to standard market practice, the loss in the index would actually occur as soon as obligors default, without any need
to wait until the end of any of the time intervals. However, we cast the discussion in terms of a different timing convention, as
this makes the nature of the swap transaction in Eq. (13.53) transparent.


where 0 and 1 are as in Eqs. (13.47), and can be interpreted as the values of securities indexed
on default events of a hypothetical representative name.
A CDS index at the time of origination
0 is, then, simply, the value of
0 in Eq. (13.54),
which makes DSX 0 = 0, viz
0

) = LGD

Next, consider a forward starting credit default index, which is an index starting at time 0 , as
before, but decided at some point prior to 0 , say at time . Clearly, the value of the protection
leg minus that of the premium leg over the life of the index is the same as that in Eq. (13.54),
for a generic
( 0 ),
0 . Moreover, in Appendix 7, we show that for any time
DSX (

0)

4
X
=1

where

cdx ( ) =

(LGD

(13.55)

denotes the outstanding notional,


=

1X
=1

I{Surv

()

at }

(13.56)

Finally, an index default swaption payer with strike , gives the holder the option to enter
a CDS index as a protection buyer with an index strike spread equal to . Upon exercise,
the protection buyer would also receive a front-end protection, dened as the losses occurring
from the option origination to the exercise date. Let be the option origination and = 0 the
maturity of the swaption. The front end protection is,
= LGD 1 D ( ), where D ( ) is
the number of defaults occurred over the time interval ( ). In Appendix 6, we show that the
value of the front-end protection is,
R

1
F
= LGD
=E
( (
)
))
(13.57)
D( ) (
)+
def (
where ( ) and def ( ) denote the price of a non-defaultable and a defaultable zero expiring at time , with zero recovery value and default intensity equal to that of the representative
rm, . The underlying of a default swaption payer equals DSX ( ) + . Accordingly, we
can dene the loss-adjusted forward default swap index, as DSX ( ) DSX ( ) + F , and
nd the value of CDS ( ) such that newly issued forwards are worthless, DSX (
) = 0,
denoted as CDX ( ), which is,
CDX (

) = LGD

(13.58)

such that,
DSX (

)=

(CDX (

We wish to use
eraire such that CDX ( ) is a martingale under a suitable
1 as a num
probability,13 similarly as for the probability sc in Eq. (13.51). Dene the probability sc
13 A technical issue with the definition of CDX ( ) in Eq. (13.58) relates to a denominator problem: the possibility of a total
collapse of the index,
= 0. The occurrence of such an event has been taken into account by Rutkowski and Armstrong (2009)
and Morini and Brigo (2011).



through the Radon-Nikodym derivative:

sc
=


(13.59)

The probability sc is the index counterpart to sc in Eq. (13.51). For simplicity, we shall
keep on referring to sc as the survival contingent probability. Appendix 7 contains a proof
that sc does indeed integrate to one. It also shows that CDX ( ) in Eq. (13.58) is a martingale
under sc . The price of a swaption payer with strike
is, for any
[ ],
R

sc
SW (
) E
)
)+ =
)
)+
1 (CDX (
1 E (CDX (
sc () denotes the time conditional expectation under the survival contingent probabilwhere E
ity sc in Eq. (13.59). We know CDX ( ) is a martingale under sc . We can use Black (1976)
to evaluate the previous expression, once we assume that under sc , CDX ( ) is a geometric
Brownian motion with constant volatility.
[Explain the post Big-Bang corrections]
13.4.3 Collateralized Debt Obligations (CDOs)
13.4.3.1 A crash description of securitization

From a historical perspective, an important input to the process of securitization relates to financial innovation put forward by the US government during the 1980s. Until the 1970s, the
financial system used to live in a "buy to hold" system, in which banks making loans to
businesses or individuals would typically hold the loans in their balance sheets. During the
1970s, another trend began, where the Government National Mortgage Association (GNMA
or Ginnie Mae) would buy the mortgages from banks to incentivize them to extend more
loans, thereby making houses accessible to families. The second step would then be for GNMA
to sell securities based on the cash flows generated by these mortgages. Securitization would
then begin to take on a higher level when the Federal National Mortgage Association (Fannie
Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) would securitize
the assets through "tranching." Once the tranching model was initially developed, investment
banks applied this same idea to other kinds of assets, such as corporate bonds, student loans,
small business loans, automobile loans, etc.
How does tranching work? Tranching relies on CDOs, which are securitized shares in pools of
assets. Collateral assets include loans or debt instruments. A CDO may be a collateralized loan
obligation (CLO) or collateralized bond obligation (CBO) according to whether it relies only
on loans or bonds, respectively. CDO investors bear the credit risk of the collateral. Multiple
tranches of securities are issued by the CDO, offering investors various maturity and credit risk
characteristics. Tranches are categorized as senior, mezzanine, and subordinated, or junior, or
equity, according to their degree of credit risk. If there are defaults or the CDO's collateral
otherwise underperforms, scheduled payments to senior tranches take precedence over those to
mezzanine tranches, and scheduled payments to mezzanine tranches take precedence over those
to junior tranches. Typically, senior tranches are rated, with ratings of A to AAA. Mezzanine
tranches are also rated, typically with ratings of B to BBB. In principle, these ratings should reflect both
the credit quality of the collateral and the protection a given tranche is given by the tranches
subordinated to it.

CDOs are part of a more complex securitization process, which could also involve the inclusion
of assets of different nature. The stylized example in the diagram below illustrates this process.
In a first step, subprime mortgages are securitized; in a second step, a CDO is created out of
the securitized subprime mortgages and additional Asset Backed Securities (ABS); in a third
step, the structuring process involves creating seniority rules.

[Diagram. Step 1: monthly payments from subprime mortgages flow into an Asset Backed Security (ABS), the subprime ABS, which is sold to ABS investors. Steps 2 and 3: subprime ABS, together with ABS relating to other forms of collateral (e.g. corporate debt), are pooled into a Collateralized Debt Obligation (CDO), which is sold to CDO investors.]

Investors in CDO senior tranches include banks and pension funds, which might benefit from
the expertise of the asset managers, and risk-return profiles difficult to find in the market.
Investors in junior tranches are hedge funds searching for highly risky investment opportunities
that, at the same time, are quite rewarding and certainly unavailable in the market. Additional
investors in junior tranches were dedicated off-balance-sheet entities such as SIVs, conduits,
and SIV-lites, which will be reviewed in Section 13.4.7.
Typical CDO underwriters are investment banks. They work closely with the asset manager
to create the right debt/equity ratio and perform collateral quality tests. They liaise with
law firms and create the special purpose vehicle (possibly in some tax haven) that will
purchase the assets and issue the tranches, price the various tranches, and obviously find the
investors. Fees to underwriters are generous due to the complexity of the CDOs.14
Involved in the structuring process are also (i) the trustee and collateral administrator, who
distribute noteholder reports, check compliance and execute priority of payments; (ii) accountants, who perform due diligence on the CDO's collateral pool, verifying for example credit
ratings for each asset; and (iii) rating agencies, which we shall discuss in the next subsection.
The economics behind structured finance is interesting. An originator may have private information about the quality of certain assets and/or a comparative advantage in evaluating
these assets relative to other market participants. If the originator intends to sell some of its
assets, an adverse selection problem arises: because investors do not know the true quality of
the assets, they will demand a premium to purchase them or, even worse, a market might fail
to arise.
Structured finance helps originators mitigate this problem. First, by pooling the assets, diversification benefits can be achieved. Second, tranching allows relatively poorly informed investors
to access senior tranches, and be relatively protected from default. In the process, the originator
or arranger may retain subordinated exposure to alleviate investors' concerns about incentive
compatibility. The following scheme summarizes the structuring process.
14 According to Thomson Financial, top underwriters in 2006 were: Bear Stearns, Merrill Lynch, Wachovia, Citigroup, Deutsche
Bank, and Bank of America Securities.

Source: Committee on the Global Financial System: "The role of ratings in structured
finance: issues and implications," January 2005.
13.4.3.2 The role of rating agencies

Structured finance has always been a rated market. Issuers of structured instruments had a
natural appetite for a rating to occur at a scale comparable with that applying to debt: the
main reason is that rating should facilitate the sale of these products to investors bound by
ratings-based constraints defined by their investment mandates.
However, the involvement of rating agencies in the delivery of their opinion about credit
risk differs from that related to traditional bonds. As regards traditional instruments, rating
agencies simply aim to assess the risk of default, which they take as given. As regards
structured finance transactions, rating agencies play a much more ex-ante, reverse engineering
role. A tranche rating reflects a view about both the credit risk of the asset pool and the extent
of credit support to be provided. These two elements are organized to reverse engineer the
tranche rating targeted by the deal's arrangers. Deal origination involves rating agencies in
the structuring process.
13.4.3.3 Types of CDOs

In practice, CDOs are considerably more complex than the stylized examples outlined earlier.
We have a number of cases. We say that a CDO is static, if it holds the same set of assets.
Instead, a CDO is managed, if the asset manager is allowed to change the composition of assets.
If the claims to the CDO arise from the cash flows originated by the assets, we have a cash-flow CDO. If the claims to the CDO arise from the cash flows originated by the assets and/or
active asset management, we have a market-value CDO. CDOs can also be created to carve out
balance sheets, in which case we have balance-sheet CDOs. Moreover, and interestingly, CDOs
can be created (i) to achieve investment grade bonds through a pool of noninvestment grade
bonds, and (ii) to create riskier securities than those in the asset pool. In these cases, we have
arbitrage CDOs. Naturally, arbitrage CDOs do not give rise to any arbitrage opportunity.
These instruments merely reshuffle risk and returns of the assets in the pool, as illustrated
by the examples in the next section. Arbitrage CDOs differ from balance-sheet CDOs, because
issuers of arbitrage CDOs do not necessarily hold the underlying collateral in advance, which
is obviously the case for issuers of balance-sheet CDOs. Therefore, the assets to be put into
an arbitrage CDO pool have to be reasonably liquid.
Furthermore, we have synthetic CDOs, which are exposed to a pool of assets that are not
strictly owned or in the asset pool, typically through CDS underwriting. Like a cash-flow CDO,
the vehicle receives payments (the premiums), which are then transferred to the tranche holders.
Naturally, there can be default events, which are also passed through to the investors, according
to the prespecified seniority rules. A synthetic CDO is funded, if the relevant tranche holders are
to pay in the case of a credit event related to the assets the CDO is exposed to. Typically,
some funding is made available at the very time of investment. At maturity, the investor receives
a payoff equal to the funding minus the realized losses. Junior tranches are typically funded,
and senior are typically not. However, senior tranche investors might have to make payments
in the unlikely event losses were ever to erode their tranches.
Finally, we have hybrid CDOs, which are partly cash-flow CDOs and partly synthetic CDOs.
In a single-tranche CDO, the entire CDO is structured to accommodate the specific needs of a
small group of investors, with some remaining tranche held by the dealer. And we have CDO$^2$,
where a large portion of the assets in the pool are tranches from other CDOs; or more generally,
CDO$^n$.
13.4.3.4 Pricing

CDOs repackage cash flows from a set of assets. We provide simple examples to show how to
price this repackaging process. We begin with a simple example, taken from McDonald (2006, p.
583), which we further elaborate. Suppose we have three one-year bonds with face value = 100.
For each of these bonds, the risk-neutral probabilities of default equal 10% and the recovery
rates are 40%. The safe interest rate for one year is 6%. So each bond price equals,

$$
e^{-0.06}\,\bigl(\underbrace{0.10}_{\text{Def Prob}} \cdot 40 + \underbrace{0.90}_{\text{Surv Prob}} \cdot 100\bigr) = 88.526
$$

The yield is, naturally, $\ln\frac{100}{88.526} = 12.19\%$.
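The arithmetic of this example can be checked with a short script. The sketch below prices a one-year defaultable bond as the discounted risk-neutral expected payoff and computes its continuously compounded yield; the function names are illustrative.

```python
import math

def risky_bond_price(p_default, recovery, face, r):
    """Discounted risk-neutral expected payoff of a one-year defaultable bond."""
    expected_payoff = p_default * recovery + (1.0 - p_default) * face
    return math.exp(-r) * expected_payoff

def log_yield(face, price):
    """Continuously compounded one-year yield."""
    return math.log(face / price)

price = risky_bond_price(0.10, 40.0, 100.0, 0.06)
bond_yield = log_yield(100.0, price)
```

Here `price` is approximately 88.526 and `bond_yield` approximately 12.19%, as in the text.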


A CDO can restructure the payments promised by the three bonds in a way that transforms
the riskiness and attractiveness of the initial assets. Consider the following example:

Asset Pool               CDO claims
Face Value = 300         Senior tranche = 140
                         Mezzanine tranche = 90
                         Junior tranche = 70


In this example, each tranche receives the minimum between (i) the nominal value claimed by
the tranche and (ii) what is left available to the tranche after having satisfied the other tranches
by order of seniority.
Let $N_i$ be the nominal values claimed by the tranches, so that $N_1 = 140$, $N_2 = 90$ and
$N_3 = 70$. Let $\pi$ be the realized payoff of the asset pool, defined as,

$$
\pi = \text{No of Defaults} \cdot 40 + \underbrace{(3 - \text{No of Defaults})}_{\text{No of surviving bonds}} \cdot 100
$$

Naturally, $\pi$ is random because the number of defaults is random. At the expiration,

(i) The senior tranche receives the minimum between $N_1$ and $\pi$. For example, if only one bond
defaults, $\pi = 240$, and the senior tranche receives 140. If, however, three bonds default,
$\pi = 120$, which is less than the senior tranche nominal value, and the senior tranche then
receives 120. So a quite severe loss is needed to erode the senior tranche claims.

(ii) The mezzanine tranche receives the minimum between $N_2$ and the left-over from the
senior tranche.

(iii) Finally, at the expiration, the junior tranche receives the minimum between $N_3$ and the
left-over from the senior and mezzanine tranches.

In general, the payoff to tranche no. $i$ is,

$$
\pi_i = \min\{\text{Left-over}_i,\ N_i\}
$$

where $\text{Left-over}_i$ denotes the left-over from previous tranches, and up to tranche $i - 1$,

$$
\text{Left-over}_i \equiv \max\Bigl\{\pi - \sum_{j=1}^{i-1} N_j,\ 0\Bigr\}
$$

That is,

$$
\pi_i = \min\Bigl\{\max\Bigl\{\pi - \sum_{j=1}^{i-1} N_j,\ 0\Bigr\},\ N_i\Bigr\}
$$

All we need now is a model for the risk-neutral probability of default for each firm. Initially,
we assume the default events are independent across firms. We assume a binomial distribution,

$$
\Pr(\text{No of Defaults} = k) = \binom{3}{k} p^k (1-p)^{3-k}, \qquad p = 10\%, \qquad k \in \{0, 1, 2, 3\}
$$

leading to the payoffs in the following table:
Payoffs to CDO tranches, and prices: with independent defaults

Defaults   Pr(Defaults)   $\pi$: pool payoff (1)   $\pi_1$: Senior   $\pi_2$: Mezzanine   $\pi_3$: Junior
0          0.729          300                      140               90                  70
1          0.243          240                      140               90                  10
2          0.027          180                      140               40                  0
3          0.001          120                      120               0                   0
Price                                              131.8281994       83.40266709         50.34673197
Yield                                              0.060142867       0.076129382         0.329561531

(1) $\pi$: pool payoff = Def*40+(3-Def)*100. N1 = 140, N2 = 90, N3 = 70.


The price of each tranche is computed as the tranche payoff, averaged across states, discounted
at the safe interest rate. For example, the price of the mezzanine tranche is,

$$
\text{Price Mezzanine} = e^{-0.06}\,(0.729 \cdot 90 + 0.243 \cdot 90 + 0.027 \cdot 40 + 0.001 \cdot 0) = 83.403
$$

Its yield is, Yield Mezzanine $= \ln\frac{90}{83.403} = 7.61\%$. Naturally, the sum of the three bond prices,
$88.526 \cdot 3 = 265.58$, is equal to the total value of the three tranches, $131.828 + 83.403 + 50.347 = 265.58$. As anticipated, a CDO is a mere re-packaging device. It doesn't add or destroy value.
It merely redistributes risks (and returns).
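The table and the prices above can be reproduced with a small script. The sketch below implements the seniority rule (each tranche receives the minimum between its nominal value and what is left after more senior tranches) and averages the payoffs over the binomial distribution of defaults; function and variable names are illustrative.

```python
import math

def tranche_payoffs(pool_payoff, notionals):
    """Allocate the pool payoff to tranches, listed from most senior to most junior."""
    payoffs, senior_claims = [], 0.0
    for notional in notionals:
        left_over = max(pool_payoff - senior_claims, 0.0)
        payoffs.append(min(left_over, notional))
        senior_claims += notional
    return payoffs

def price_tranches(default_probs, notionals, r=0.06, recovery=40.0, face=100.0):
    """Prices and yields, given default_probs[k] = Pr(number of defaults = k)."""
    n_bonds = len(default_probs) - 1
    expected = [0.0] * len(notionals)
    for k, prob in enumerate(default_probs):
        pool_payoff = k * recovery + (n_bonds - k) * face
        for i, payoff in enumerate(tranche_payoffs(pool_payoff, notionals)):
            expected[i] += prob * payoff
    prices = [math.exp(-r) * e for e in expected]
    yields = [math.log(n / p) for n, p in zip(notionals, prices)]
    return prices, yields

p = 0.10
binomial = [math.comb(3, k) * p ** k * (1 - p) ** (3 - k) for k in range(4)]
prices, yields = price_tranches(binomial, [140.0, 90.0, 70.0])
```

The sum of `prices` equals three times the price of a single bond, which is the sense in which the CDO neither adds nor destroys value.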
The assumption that defaults among names are uncorrelated is unrealistic, as argued in Section
13.5.4. We now remove this assumption. First, what happens in the special case where default
events are perfectly correlated? In this case, either the three firms all default (with probability
0.10) or none defaults (with probability 0.90), and we have the situation summarized by the
table below.

Payoffs to CDO tranches, and prices: with perfectly correlated defaults

Defaults   Pr(Defaults)   $\pi$: pool payoff (1)   $\pi_1$: Senior   $\pi_2$: Mezzanine   $\pi_3$: Junior
0          0.9            300                      140               90                  70
1          0              NA                       NA                NA                  NA
2          0              NA                       NA                NA                  NA
3          0.1            120                      120               0                   0
Price                                              129.9635056       76.28292722         59.33116562
Yield                                              0.074388737       0.165360516         0.165360516

(1) $\pi$: pool payoff = Def*40+(3-Def)*100. N1 = 140, N2 = 90, N3 = 70.

Note that mezzanine and junior tranches now yield the same, because they each pay off either
their nominal value or zero in exactly the same states of nature. In other words, default clustering implies that good times are really good, in that the probability to have no defaults is
now 90%, much higher than the 72.9% arising when the correlation of defaults is zero.
The previous cases (with independent or perfectly correlated defaults) are extreme. What
happens when defaults are only imperfectly correlated? In this case, the pricing of tranches
is more complex, and requires a model of default correlations. We use the so-called Gaussian
copulae, reviewed in Appendixes 7 and 8, and simulations. Figure 13.15 illustrates how the
yield on each tranche changes as a result of a change in the default correlation underlying the
assets in the CDO.
[Figure: "Yields on CDO tranches". Yield (vertical axis) against default correlation (horizontal
axis) for the Junior, Mezzanine and Senior tranches.]

FIGURE 13.15. Yields on the three CDO tranches, as functions of the default correlation
among the assets in the structure, with probability of default for each name $p = 10\%$. The
thick, horizontal line is the yield on each securitized asset.
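
A simulation of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the code behind the figure: we use a one-factor Gaussian copula in which `rho` is the correlation between the latent Gaussian variables (it coincides with the default correlation only at 0 and 1), a hypothetical number of simulations, and the same waterfall, recovery and safe rate as in the example above.

```python
import math
import random
from statistics import NormalDist

R, FACE, RECOVERY = 0.06, 100.0, 40.0
NOMINALS = [140.0, 90.0, 70.0]      # senior, mezzanine, junior


def simulate_tranche_yields(p, rho, n_sims=20000, seed=0):
    """One-factor Gaussian copula: name i defaults when
    sqrt(rho)*F + sqrt(1-rho)*e_i < Phi^{-1}(p). Here rho is the
    correlation of the latent variables (an assumption of this sketch)."""
    rng = random.Random(seed)
    barrier = NormalDist().inv_cdf(p)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_sims):
        factor = rng.gauss(0.0, 1.0)
        defaults = sum(a * factor + b * rng.gauss(0.0, 1.0) < barrier
                       for _ in range(3))
        pool = defaults * RECOVERY + (3 - defaults) * FACE
        for k, nominal in enumerate(NOMINALS):
            paid = min(pool, nominal)   # sequential waterfall
            totals[k] += paid
            pool -= paid
    prices = [math.exp(-R) * t / n_sims for t in totals]
    return [math.log(n / pr) for n, pr in zip(NOMINALS, prices)]


for rho in (0.0, 0.3, 0.6, 0.9):
    senior, mezz, junior = simulate_tranche_yields(0.10, rho)
    print(f"rho={rho:.1f}  senior={senior:.4f}  mezz={mezz:.4f}  junior={junior:.4f}")
```

At `rho = 0` the simulated yields should be close to those of the independent case computed earlier, up to Monte Carlo error.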
Arbitrage CDOs

Figure 13.15 illustrates how arbitrage CDOs work. The CDO has three assets yielding the same,
12.19% (the horizontal line in the picture). However, by restructuring the asset base through a
CDO, we can create claims (Senior and Mezzanine tranches) that yield less than 12.19%, as they
are considerably less risky than the asset base. Such an excess return, $(12.19\% - \text{Yield}_{\text{tranche}})$,
with tranche $\in$ {Senior, Mezzanine}, is made available to the Junior tranche/equity
holders, once management fees and expenses are accounted for. Note that such a redistribution
of risk works quite effectively as long as the default correlation is relatively low. As the
default correlation in the asset base increases, the situation may change dramatically, with the
mezzanine tranche becoming more risky and, then, yielding a higher expected return. Finally,
Figure 13.16 depicts the output of a comparative statics exercise where we increase $p$ from 10% to
20%. The yields are obviously higher for each tranche, and the three assets now yield 18.78%,
reflecting the higher marginal probability of default for each of the securities in the pool, $p$.
Correlation assumptions

In Figures 13.15 and 13.16, the yield on the junior tranche decreases with default correlation.
This happens because we are assuming that the probability of default is fixed, at $p = 10\%$ (say),
for each default correlation $\rho$. As $\rho$ increases, the probability of clustering events increases,
which makes the Senior and Mezzanine tranches relatively less valuable and, correspondingly,
the Junior tranche more valuable. A more appropriate model is one in which $p$ increases as
$\rho$ increases, to capture the fact that in bad times, both default correlation and probability of
defaults increase, as these two things are intimately connected by, e.g., some common business
cycle factors.

[Figure: "Yields on CDO tranches". Yield (vertical axis) against default correlation (horizontal
axis) for the Junior, Mezzanine and Senior tranches.]

FIGURE 13.16. Yields on the three CDO tranches, as functions of the default correlation
among the assets in the structure, with probability of default for each name $p = 20\%$. The
thick, horizontal line is the yield on each securitized asset.

Addressing the correlation assumption

We relax the assumption that the probability of default, $p$, and the default correlation, $\rho$,
are independent. We assume that $p$ and $\rho$ are tied through the relation
$\rho = 3.8116\,\ln(p + 1)$, and let $p$ vary from 0.10 to 0.30, such that $\rho$ varies from 0.3633 to 1.
The situation now changes dramatically. Figure 13.17 depicts the results, which show how
modeling might substantially affect effective pricing. First, and naturally, the yield on each
securitized asset is increasing in $\rho$ because $\rho$ is also increasing in the probability of default.
Second, the Junior tranche has a yield that increases over a wide spectrum of values for the
default correlation, $\rho$. Note that the yield on the Junior tranche bends back to lower values as the
default correlation gets close to one, reflecting the fact that default clustering makes this tranche
quite valuable in good times, as explained earlier.

[Figure: "Yields on CDO tranches". Yield (vertical axis) against default correlation (horizontal
axis) for the Junior, Mezzanine and Senior tranches.]

FIGURE 13.17. Yields on the three CDO tranches, as functions of the default correlation
among the assets in the structure, with probability of default $p$ and default correlation $\rho$
related by $\rho = 3.8116\,\ln(p + 1)$, $p \in [0.10, 0.30]$. The thick curved line depicts the yield
on each securitized asset.
13.4.3.5 Nth to default

In this contract, the owner of the 1st to default bears the risk of the first default that occurs in
the asset pool:

$$\text{Payoff} = \Pr(\text{No of Defaults} \geq 1) \times 40 + \Pr(\text{No of Defaults} < 1) \times 100.$$

Likewise, the owner of the 2nd to default bears the risk of the second default that occurs in the
asset pool:

$$\text{Payoff} = \Pr(\text{No of Defaults} \geq 2) \times 40 + \Pr(\text{No of Defaults} < 2) \times 100.$$

Finally, the owner of the 3rd to default bears the risk of the third default that occurs in the
asset pool:

$$\text{Payoff} = \Pr(\text{No of Defaults} = 3) \times 40 + \Pr(\text{No of Defaults} < 3) \times 100.$$

Let us assume that default correlation is zero for simplicity. We have previously computed
the relevant probabilities as:

$$\Pr(\text{No of Defaults} \geq 1) = 0.243 + 0.027 + 0.001 = 0.271,$$
$$\Pr(\text{No of Defaults} \geq 2) = 0.027 + 0.001 = 0.028,$$
$$\Pr(\text{No of Defaults} = 3) = 0.001.$$

Thus, we have the following prices,

$$\text{Price}_{\text{1st-to-default}} = e^{-0.06}\,[0.271 \times 40 + (1 - 0.271) \times 100] = 78.863,$$
$$\text{Price}_{\text{2nd-to-default}} = e^{-0.06}\,[0.028 \times 40 + (1 - 0.028) \times 100] = 92.594,$$
$$\text{Price}_{\text{3rd-to-default}} = e^{-0.06}\,[0.001 \times 40 + (1 - 0.001) \times 100] = 94.120.$$

From here, we can compute the yields as follows: $\text{Yield}_{\text{1st-to-def}} = -\ln(78.863/100) = 23.74\%$,
$\text{Yield}_{\text{2nd-to-def}} = -\ln(92.594/100) = 7.69\%$, and $\text{Yield}_{\text{3rd-to-def}} = -\ln(94.120/100) = 6.06\%$.
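
The three prices and yields can be checked numerically. The sketch below assumes, as in the text, three names with a 10% default probability each, zero default correlation, recovery of 40 per 100 of face value, and a safe rate of 6%; the function name is ours.

```python
import math
from math import comb

R, FACE, RECOVERY, P, N = 0.06, 100.0, 40.0, 0.10, 3

# Pr(number of defaults = d) under independence
pr_exact = [comb(N, d) * P**d * (1 - P)**(N - d) for d in range(N + 1)]


def nth_to_default(n):
    """Price and yield of the n-th to default claim (zero default correlation)."""
    pr_hit = sum(pr_exact[n:])            # Pr(No of Defaults >= n)
    expected = pr_hit * RECOVERY + (1 - pr_hit) * FACE
    price = math.exp(-R) * expected
    return price, -math.log(price / FACE)


for n in (1, 2, 3):
    price, yld = nth_to_default(n)
    print(f"{n}-th to default: price = {price:.3f}, yield = {yld:.2%}")
```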
13.4.3.6 One numerical example of a stylized structured product
A. Defaultable bonds

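Suppose we observe the following risk-structure of spreads, related to two bonds maturing in
two years:

$$\text{Spread}_A(2\text{ years}) = 1.5\%, \qquad \text{Spread}_B(2\text{ years}) = 2.5\%,$$

where A and B denote the rating classes the bond issuers belong to. Assume that the one-year
transition rating matrix, defined under the risk-neutral probability, is:

                 To
From      A      B      Def
A         0.7    0.3    0
B         0.3    0.5    0.2
Def       0      0      1

where Def denotes default. We assume that in the event of default, the recovery value of the
bond is paid off at the end of the second period. We want to determine the expected recovery
rates for the two bonds, and which expected recovery rate is the largest. We have:

$$\frac{P_i}{P_0} = \text{Rec}_i\,Q_i(2) + (1 - Q_i(2)), \qquad i \in \{A, B\},$$

where $P_i$ is the price of the bond issued by $i$, $P_0$ is the price of the corresponding default-free
bond, $\text{Rec}_i$ is the expected recovery rate, and $Q_i(2)$ is the probability that $i$ defaults within two
years. Therefore,

$$\text{Spread}_A(2\text{ years}) = 1.5\% = -\frac{1}{2}\ln\left[\text{Rec}_A\,Q_A(2) + (1 - Q_A(2))\right] \qquad (13.60)$$
$$\text{Spread}_B(2\text{ years}) = 2.5\% = -\frac{1}{2}\ln\left[\text{Rec}_B\,Q_B(2) + (1 - Q_B(2))\right] \qquad (13.61)$$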

We have to find $Q_A(2)$ and $Q_B(2)$. The transition matrix for two years is,

$$Q(2) = \begin{pmatrix} 0.7 & 0.3 & 0 \\ 0.3 & 0.5 & 0.2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 & 0 \\ 0.3 & 0.5 & 0.2 \\ 0 & 0 & 1 \end{pmatrix},$$

such that,

$$\Pr\{A \text{ defaults in 2 years}\} = Q_A(2) = 0.70 \times 0 + 0.30 \times 0.20 + 0 \times 1 = 0.06,$$
$$\Pr\{B \text{ defaults in 2 years}\} = Q_B(2) = 0.20 \times 1 + 0.50 \times 0.20 + 0.30 \times 0 = 0.20 + 0.10 = 0.30.$$

Hence, using Eqs. (13.60)-(13.61), we have

$$\text{Spread}_A(2\text{ years}) = 1.5\% = -\frac{1}{2}\ln\left[\text{Rec}_A \times 0.06 + (1 - 0.06)\right],$$
$$\text{Spread}_B(2\text{ years}) = 2.5\% = -\frac{1}{2}\ln\left[\text{Rec}_B \times 0.30 + (1 - 0.30)\right].$$

Solving yields,

$$\text{Rec}_A = 50.7\%, \qquad \text{Rec}_B = 83.7\%.$$

The expected recovery rate for the second bond is the largest. This is because the probability
firm B defaults is much larger than the probability firm A defaults and yet the two spreads are
relatively close to each other. So to rationalize the two spreads, we need a large recovery rate
for the second bond.
What would happen to the two credit spreads, once we assume that the recovery rates are
the same, and equal to 50%? This question sheds additional light on the previous findings. If
the recovery rates are the same and both equal 50%,

$$\text{Spread}_A(2\text{ years}) = -\frac{1}{2}\ln\left[0.50\,Q_A(2) + (1 - Q_A(2))\right],$$
$$\text{Spread}_B(2\text{ years}) = -\frac{1}{2}\ln\left[0.50\,Q_B(2) + (1 - Q_B(2))\right].$$

Then, using the previously computed transition probabilities for two years, we obtain:

$$\text{Spread}_A(2\text{ years}) = 1.52\%, \qquad \text{Spread}_B(2\text{ years}) = 8.13\%.$$

When the recovery rates are the same, the spread on the second bond diverges substantially
from that on the first bond.
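
The steps of this example (squaring the transition matrix, reading off the two-year default probabilities, inverting the spread formula) can be sketched in code. The helper names below are ours, and the sketch assumes the spread formula $-\frac{1}{2}\ln[\text{Rec}\,Q(2) + (1 - Q(2))]$ used above.

```python
import math

Q1 = [[0.7, 0.3, 0.0],    # from A
      [0.3, 0.5, 0.2],    # from B
      [0.0, 0.0, 1.0]]    # from Def (absorbing)


def matmul(x, y):
    """Plain matrix product for small lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


Q2 = matmul(Q1, Q1)
q_a, q_b = Q2[0][2], Q2[1][2]          # two-year default probabilities


def implied_recovery(spread, q, years=2):
    """Invert spread = -(1/years) * ln(Rec*q + 1 - q) for Rec."""
    return (math.exp(-spread * years) - (1 - q)) / q


def spread(rec, q, years=2):
    """Credit spread implied by recovery rate rec and default probability q."""
    return -math.log(rec * q + 1 - q) / years


rec_a, rec_b = implied_recovery(0.015, q_a), implied_recovery(0.025, q_b)
print(round(q_a, 2), round(q_b, 2))
print(f"{rec_a:.1%} {rec_b:.1%}")
print(f"{spread(0.5, q_a):.2%} {spread(0.5, q_b):.2%}")
```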
B. Collateralized debt obligations

Let us keep on using the same framework as before, but use different figures, so as to figure out
the implications for CDO pricing. Consider the following one-year transition matrix, under the
risk-neutral probability:

                 To
From      A      B      Def
A         0.7    0.3    0
B         0.1    0.6    0.3
Def       0      0      1

where Def denotes default. Consider (i) one one-year bond issued by a company rated A, and
(ii) three one-year bonds issued by a company rated B. All bonds have face value equal to 100.
We assume that the recovery values in case of default of all these bonds are the same, and equal
to 50. Finally, we assume the safe interest rate is zero.
Consider a collateralized debt obligation (CDO, in the sequel), which gathers the previous
four bonds. Therefore, the CDO has nominal value of 400, and pays off in one year. The CDO
has (i) a senior tranche, with nominal value equal to 150; (ii) a mezzanine tranche, with nominal
value equal to $N_1$; and (iii) a junior tranche, with nominal value equal to $N_2$. We assume that
the structure is such that $N_1 \geq 100$.
First, we determine the price and yields on all the four bonds. Since the safe interest rate is
zero, and the company rated A is safe, up to the next year, the price of the A bond is 100, and
its yield is zero. As for the three bonds rated B, we have:

$$\text{Price} = 50 \times 0.3 + 100 \times 0.7 = 85.0, \qquad \text{Yield} = -\ln 0.85 = 16.25\%.$$

Second, we determine the yield on the junior tranche, and derive the yield on the mezzanine,
as a function of its nominal value $N_1$. To determine the yield on the tranches, we need to figure
out the following table:
No Def   Pr     $\pi$    $\pi_0$   $\pi_1$   $\pi_2$
0        0.7    400      150       $N_1$     $N_2$
1        0      na       na        na        na
2        0      na       na        na        na
3        0.3    250      150       100       0
4        0      na       na        na        na

where No Def denotes the number of defaults, Pr is the probability of No Def, $\pi$ is the pool
payoff, defined as,

$$\pi = \text{No Def} \times 50 + (4 - \text{No Def}) \times 100,$$

and, finally: $\pi_0$ is the payoff to the senior tranche, $\pi_1$ is the payoff to the mezzanine tranche,
and $\pi_2$ is the payoff to the junior tranche. Therefore, we have:
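$$\text{price mezzanine} = 0.70\,N_1 + 0.30 \times 100, \qquad \text{price junior} = 0.70\,N_2,$$

such that:

$$\text{Yield mezzanine} = -\ln\frac{0.70\,N_1 + 0.30 \times 100}{N_1} = -\ln\left(0.70 + 0.30\,\frac{100}{N_1}\right),$$
$$\text{Yield junior} = -\ln\frac{0.70\,N_2}{N_2} = -\ln 0.70 = 35.67\%.$$

Naturally, we need to have that Yield mezzanine $\leq$ Yield junior. It is simple to show this
relation: it suffices to note that,

$$\text{Yield junior} = -\ln(0.70) \geq -\ln\left(0.70 + 0.30\,\frac{100}{N_1}\right) = \text{Yield mezzanine}.$$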

A reverse engineering question is now to determine which nominal value of the mezzanine
tranche, $N_1$, is needed to ensure that the yield on the mezzanine tranche is equal to or greater
than the yields on the bonds issued by the company with credit rating B. The answer is
$N_1 = 200$, for in this case, the mezzanine tranche would have the same payoff structure as the
bond rated B: it would deliver (i) the face value, in the event the company rated B does not
default; and (ii) half of its nominal value, 100, in the event the company rated B does default.
Since the mezzanine yield is increasing in $N_1$, larger nominal values deliver higher yields.

Finally, we ask which nominal value of the mezzanine tranche, $N_1$, is needed to ensure that
the yield on the mezzanine is equal to 18%, and what the corresponding nominal value of
the junior tranche, $N_2$, is. To address these issues, we first require that:

$$\text{Yield mezzanine} = -\ln\frac{0.70\,N_1 + 0.30 \times 100}{N_1} = 18\%.$$

Solving for $N_1$ yields $N_1 = 221.78$. Therefore,

$$N_2 = \underbrace{400}_{\text{Nominal value}} - \underbrace{150}_{\text{senior}} - 221.78 = 28.22.$$
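
The reverse engineering step can be sketched numerically. The code below assumes the figures of this example (zero safe rate, probability 0.30 that the three B bonds default together, mezzanine payoff of 100 in that state); the function names are ours.

```python
import math

P_NO_DEFAULT, P_DEFAULT = 0.70, 0.30
POOL, SENIOR, MEZZ_IN_DEFAULT = 400.0, 150.0, 100.0


def mezzanine_yield(n1):
    """Yield on the mezzanine tranche with nominal n1 >= 100 (zero safe rate)."""
    price = P_NO_DEFAULT * n1 + P_DEFAULT * MEZZ_IN_DEFAULT
    return -math.log(price / n1)


def mezzanine_nominal(target_yield):
    """Invert exp(-y) = 0.70 + 0.30*100/n1 for n1."""
    return P_DEFAULT * MEZZ_IN_DEFAULT / (math.exp(-target_yield) - P_NO_DEFAULT)


n1 = mezzanine_nominal(0.18)
n2 = POOL - SENIOR - n1
print(round(mezzanine_yield(200.0), 4), round(n1, 2), round(n2, 2))
```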

13.5 Foundations of risk-management


We begin with the standard concepts applying to equity, and then show how these concepts
can be used to deal with a few fundamental credit products.
13.5.1 Value at Risk (VaR)
We first review Value at Risk (VaR) in general. VaR is a method of assessing risk that uses
statistical techniques, and is useful for the supervision and management of financial risks. It
originated as a reaction to the financial disasters of the early 1990s involving Orange County,
Barings, Metallgesellschaft, Daiwa, etc. VaR measures the worst expected loss over a given
horizon under normal market conditions at a given confidence level.
Consider the following definition:

Definition I: We are $100(1-\alpha)\%$ certain that a given portfolio will not suffer a loss larger
than $W$ dollars over the next $N$ weeks, i.e. $\Pr(\text{Loss} > W) = \alpha$. That is, VaR $= W$.



Equivalently, note that

$$\frac{\text{Loss}}{V_0} = -\frac{\Delta V}{V_0} = -\,\text{portfolio return},$$

where $\Delta V$ denotes the change in value of the portfolio over the next $N$ weeks, and $V_0$ is the
current value of the portfolio. Hence,

$$\alpha = \Pr(\text{Loss} > \text{VaR}) = \Pr\left(-\frac{\Delta V}{V_0} > \frac{\text{VaR}}{V_0}\right).$$

This formulation leads us to the following alternative definition:

Definition II: We are $100(1-\alpha)\%$ certain that a given portfolio will not experience a relative
loss larger than $\text{VaR}/V_0$ over the next $N$ weeks.

So in practice, we shall have to find the relative loss, $\ell$, for a given confidence $\alpha$, as follows:

$$\alpha = \Pr\left(-\frac{\Delta V}{V_0} > \ell\right), \qquad \text{where } \ell = \frac{\text{VaR}}{V_0}.$$

The corresponding VaR is just VaR $= \ell\,V_0$. For example, suppose that the portfolio return
over the next 2 weeks, $\Delta V/V_0$, is normally distributed with mean zero and unit variance. We know
that $0.01 = \Pr(\Delta V/V_0 < -2.32)$. Hence, VaR $= 2.32\,V_0$.

[Figure: density of the portfolio return, with the 1% tail and the point VaR/V marked.]

We are 99% certain that our portfolio will not suffer a loss larger than 2.32 times its
current value over the next 2 weeks. Equivalently, we are 99% certain that our portfolio will not
experience a relative loss larger than 2.32 over the next 2 weeks.
As a second example, note that the previous assumption about the portfolio return was
extreme. Assume, instead, that the portfolio return over the next 2 weeks, $\Delta V/V_0$, is normally
distributed with mean zero and variance $\sigma^2 = \frac{2}{52}\sigma^2_{\text{year}}$, where $\sigma^2_{\text{year}}$ is the annualized
variance. We assume that $\sigma^2_{\text{year}} = 0.15^2$. We have to re-scale the previous formulas, as follows.
First, we introduce a variable $\epsilon \sim N(0,1)$, i.e. $\epsilon$ is normally distributed with mean zero and
variance equal to one. So we can write,

$$\frac{\Delta V}{V_0} = \sigma\,\epsilon,$$

and, hence,

$$0.01 = \Pr(\epsilon < -2.32) = \Pr\left(\frac{\Delta V}{V_0} < -2.32\,\sigma\right),$$

whence VaR $= 2.32\,\sigma V_0$. We know the annualized variance, $\sigma^2_{\text{year}} = 0.15^2$, from which we
can derive the two-week variance, $\sigma^2 = \frac{2}{52}\sigma^2_{\text{year}} \approx 0.03^2$, and, hence,
$\text{VaR}/V_0 = 2.32\,\sigma = 2.32 \times 0.03 \approx 7\%$. That is, we are 99% certain that our portfolio will not
suffer a loss larger than 7% of its current value over the next 2 weeks. Equivalently, we are 99%
certain that our portfolio will not experience a relative loss larger than 7% over the next 2 weeks.
More generally, we may assume the portfolio return over the next 2 weeks, $\Delta V/V_0$, is normally
distributed with mean $\mu$ and variance $\sigma^2$. In this case,

$$\epsilon = \frac{\Delta V/V_0 - \mu}{\sigma} \sim N(0,1),$$

and, hence,

$$0.01 = \Pr(\epsilon < -2.32) = \Pr\left(\frac{\Delta V}{V_0} < -(2.32\,\sigma - \mu)\right),$$

whence VaR $= (2.32\,\sigma - \mu)\,V_0$. In practice, $\mu$ is very small if the horizon is as short as two
weeks.
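
The normal VaR calculation can be sketched as follows. The function name and its arguments are ours; the sketch uses the exact normal quantile (about 2.326 at the 1% tail) rather than the rounded 2.32 used above, so the result differs slightly in the last digits.

```python
import math
from statistics import NormalDist


def normal_var(value, sigma, mu=0.0, tail=0.01):
    """VaR = (z*sigma - mu) * value, with z the (1 - tail) normal quantile."""
    z = NormalDist().inv_cdf(1.0 - tail)
    return (z * sigma - mu) * value


sigma_year = 0.15
sigma_2w = sigma_year * math.sqrt(2.0 / 52.0)   # two-week standard deviation
relative_var = normal_var(1.0, sigma_2w)        # VaR per unit of portfolio value
print(f"sigma_2w = {sigma_2w:.4f}, VaR/V0 = {relative_var:.2%}")
```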

13.5.1.1 Challenges to VaR

The challenges to VaR relate to distributional assumptions, nonlinearities, and conceptual difficulties.


Distributional assumptions

The assumption that data are generated by a normal distribution does not describe asset
returns well. In previous chapters of this Part and Part II of these Lectures, we explain that
we need ARCH effects, stochastic volatility and multifactor models. More generally, data can
exhibit changes in regimes, nonlinearities and fat tails. Fat tails are particularly important to
understand, since the tails are what we're interested in after all. More generally, it is quite
challenging to understand what the data generating process is, especially in so far as we consider
portfolios of assets. Asset returns and volatilities are typically correlated, with correlation rising
in bad times: correlation is stochastic.
We may make distributional assumptions, but then these assumptions have to be carefully
assessed through, for example, backtesting (to be explained below). We may proceed with
nonparametric methods, and this is indeed a promising avenue, but with its caveats.
How do nonparametric methods work? These methods rely on an old idea, which is to
estimate the data distribution through histograms. These histograms can be readily used to
compute VaR. This approach is nonparametric in nature, as it does not rely on any model.
A more refined method replaces rough histograms with smoothed histograms, as follows.
Suppose we have access to a time series of data $x_t$, which are drawn from a certain probability
law, with density $f(x)$. We may define the following estimate of the density $f(x)$,

$$\hat f_T(x) = \frac{1}{Th}\sum_{t=1}^{T} K\left(\frac{x - x_t}{h}\right),$$

where $T$ is the sample size, and $K$ is some symmetric function integrating to one. We may
think of $\hat f_T(x)$ as a smoothed histogram, with window bin equal to $h$. It is possible to show
that as $T$ goes to infinity and $h$ goes to zero at a certain rate, $\hat f_T(x)$ converges in probability
to $f(x)$, for all $x$. But we are not done, since there are no obvious rules to choose $K$ and $h$.
The choice of $h$ is notoriously difficult. Unfortunately, the bias, $\hat f_T(x) - f(x)$, tends to be
large exactly on the tails of $f(x)$, which do represent the region we're interested in. In general,
we can use Monte Carlo simulations out of a smoothed density like this to compute VaR.
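
A minimal sketch of this procedure is given below. It assumes a Gaussian kernel, a bandwidth chosen by Silverman's rule of thumb (one common choice, not one prescribed by the text), and a hypothetical fat-tailed sample of returns generated for illustration only. Drawing from a Gaussian-kernel density amounts to picking an observation at random and adding Gaussian noise with standard deviation `h`.

```python
import math
import random


def kernel_density(x, data, h):
    """Smoothed histogram: (1/(T*h)) * sum_t K((x - x_t)/h), Gaussian K."""
    total = sum(math.exp(-0.5 * ((x - xt) / h) ** 2) for xt in data)
    return total / (len(data) * h * math.sqrt(2.0 * math.pi))


def simulated_var(data, h, tail=0.01, n_sims=50000, seed=1):
    """Draw from the Gaussian-kernel density (a random observation plus
    h times a standard normal) and read off the loss quantile."""
    rng = random.Random(seed)
    draws = sorted(rng.choice(data) + h * rng.gauss(0.0, 1.0)
                   for _ in range(n_sims))
    return -draws[int(tail * n_sims)]


# Hypothetical return sample: fat-tailed mixture of two normals
rng = random.Random(0)
returns = [rng.gauss(0.0, 0.05 if rng.random() < 0.1 else 0.02)
           for _ in range(1000)]
sd = math.sqrt(sum(r * r for r in returns) / len(returns))   # dispersion about zero
h = 1.06 * sd * len(returns) ** (-0.2)     # Silverman's rule of thumb
print(f"h = {h:.4f}, 99% VaR/V0 = {simulated_var(returns, h):.4f}")
```

The estimate is sensitive to `h` precisely in the tail, which is the caveat raised above.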
Nonlinearities

Finally, portfolios of assets can behave in a nonlinear fashion, especially when the portfolio
contains derivatives. In general, the value of a portfolio including $M$ assets is,

$$V = \sum_{i=1}^{M} n_i S_i,$$

where $n_i$ is the number of units of the $i$-th asset in the portfolio, and $S_i$ is the price of the
$i$-th asset in the portfolio. Holding $n_i$ constant, the portfolio return is simply a weighted
average of all the asset returns,

$$\frac{\Delta V}{V} = \sum_{i=1}^{M} \frac{n_i\,\Delta S_i}{V} = \sum_{i=1}^{M} \omega_i\,\frac{\Delta S_i}{S_i}, \qquad \omega_i \equiv \frac{n_i S_i}{V},$$

where the variations relate to any time interval. Often, the prices $S_i$ are rational functions of
the state variables, or are interlinked through arbitrage restrictions. For example, we use factors
to determine the risk associated with fixed income securities. When the horizon of the VaR is
large, it is unlikely that $n_i$ is constant. Typically, we shall need to resort to numerical methods,
based, for example, on Monte Carlo simulations. So all in all, we need to have a careful
understanding of the derivatives in the book, and proceed with backtesting and stress testing.
VaR as an appropriate measure of risk

There are technical difficulties with the very definition of VaR, which rests on weak
statistical-theoretic foundations. VaR tells us that 1% of the time, losses will exceed the VaR figure, but
it does not tell us the size of the loss. So we need to compute the expected shortfall. Any
risk measure should enjoy a number of sensible properties. Artzner et al. (1999) have noted a
number of such properties, and showed that VaR does not enjoy the so-called subadditivity property,
according to which the sum of the risk measures for any two portfolios should be no smaller than the
risk measure for the sum of the two portfolios. Expected shortfall, by contrast, does satisfy
the subadditivity property.
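
A small numerical illustration may help. The example below is hypothetical and not taken from the text: two independent loans, each losing 100 with probability 4%, evaluated at the 95% level. The functions `var` and `expected_shortfall` are our own helpers for discrete loss distributions.

```python
def var(dist, level):
    """Smallest loss l such that Pr(Loss <= l) >= level.
    dist is a list of (loss, probability) pairs."""
    cumulative = 0.0
    for loss, prob in sorted(dist):
        cumulative += prob
        if cumulative >= level - 1e-12:
            return loss
    return max(loss for loss, _ in dist)


def expected_shortfall(dist, level):
    """Average loss over the worst (1 - level) probability mass."""
    tail, remaining, total = 1.0 - level, 1.0 - level, 0.0
    for loss, prob in sorted(dist, reverse=True):
        used = min(prob, remaining)
        total += used * loss
        remaining -= used
        if remaining <= 1e-12:
            break
    return total / tail


# Hypothetical example: two independent loans, each losing 100 with probability 4%
loan = [(0.0, 0.96), (100.0, 0.04)]
portfolio = [(0.0, 0.96 ** 2), (100.0, 2 * 0.96 * 0.04), (200.0, 0.04 ** 2)]

print(var(loan, 0.95), var(portfolio, 0.95))
print(expected_shortfall(loan, 0.95), expected_shortfall(portfolio, 0.95))
```

In this example the VaR of the combined portfolio exceeds the sum of the two stand-alone VaRs, whereas the expected shortfall of the portfolio stays below the sum of the stand-alone expected shortfalls.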
13.5.2 Backtesting
How well would the VaR estimate have performed in the past? How often did the loss in a given
sample exceed the reference-period 99% VaR? If the exceptions occur more than 1% of the
time, there is evidence that the models leading to VaR estimates are misspecified, a nice
word for saying bad models.
The mechanics of backtesting are as follows. Suppose the models leading to the VaR are
good. By construction, the probability the VaR number is exceeded in any reference period
is $\alpha$, where $\alpha$ is the coverage rate for the VaR. Next, we go to our sample, which we assume
comprises $n$ days, and let $m$ be the number of days the VaR is exceeded. We wish to test
whether the number of exceptions we observe in the sample conforms to the expected number
of exceptions based on the VaR. For example, it might be that the number of exceptions we
have observed, $m$, is larger than the expected number of exceptions, $\alpha n$. We want to make
sure this circumstance arose due to sample variability, rather than model misspecification. A
simple one-tail test is described below.
Let us compute the probability that in $n$ days, the VaR is exceeded for $m$ or more days.
Assuming exceptions are binomially distributed, this probability is,

$$\Pi = \sum_{k=m}^{n} \frac{n!}{k!\,(n-k)!}\,\alpha^k (1-\alpha)^{n-k}.$$

Then, we can say the following. If $\Pi \leq 5\%$ (say), we reject the hypothesis that the probability
of exceptions is $\alpha$ at the 5% level: the models we're using are misspecified. If $\Pi > 5\%$ (say),
we cannot reject the hypothesis that the probability of exceptions is $\alpha$ at the 5% level: we can't
say the models we're using are misspecified. This test is reviewed in more detail by Hull (2007,
p. 208). Other tests are reviewed by Christoffersen (2003, p. 184).
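
The one-tail test can be sketched in a few lines. The sample below (600 days, a 99% VaR, and 9 or 12 exceptions) is a hypothetical illustration, and the function name is ours.

```python
from math import comb


def exceedance_pvalue(n_days, n_exceptions, alpha):
    """Pr(at least n_exceptions exceptions in n_days) under a binomial
    law with exception probability alpha."""
    return sum(comb(n_days, k) * alpha**k * (1 - alpha)**(n_days - k)
               for k in range(n_exceptions, n_days + 1))


# Hypothetical sample: 600 days, 99% VaR, so 6 exceptions are expected
for m in (9, 12):
    pi = exceedance_pvalue(600, m, 0.01)
    verdict = "reject" if pi <= 0.05 else "cannot reject"
    print(f"{m} exceptions: probability = {pi:.4f} -> {verdict} at the 5% level")
```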
13.5.3 Stress testing
Stress testing is a technique through which we generate artificial data from a range of possible
scenarios. Stress scenarios help cover a range of factors that can create extraordinary losses
or gains in trading portfolios, or make the control of risk in those portfolios very difficult.
These factors include low-probability events in all major types of risks, including the various
components of credit, market, and operational risks. Stress scenarios need to shed light on the
impact of such events on positions that display both linear and nonlinear price characteristics
(i.e. options and instruments that have options-like characteristics).
Possible scenarios include simulating (i) shocks that, although rare or even absent from the
historical database at hand, are likely to happen anyway; and (ii) shocks leading to structural
breaks and/or smooth transitions in the data generating mechanism. One possible example is to
set the percentage changes in all market variables in the portfolio equal to the worst percentage
changes having occurred in ten days in a row during the subprime crisis of 2007-2008.
This example on the subprime crisis is related to the historical simulation approach to generate
scenarios. This approach can be explained through a single formula. Let $v_i$ be the value
of some market variable in day $i$ in our sample, where $i = 0, \ldots, n$ (say). We can generate $n$
scenarios for the next day, $n + 1$, as follows.
(i) The first scenario is that in which each variable grows by the same proportion it grew at
time 1, $v_{n+1} = v_n \frac{v_1}{v_0}$.
(ii) The second scenario is that in which each variable grows by the same proportion it grew at
time 2, $v_{n+1} = v_n \frac{v_2}{v_1}$.
(iii) ...
(iv) The $i$-th scenario is that in which each variable grows by the same proportion it grew at
time $i$, $v_{n+1} = v_n \frac{v_i}{v_{i-1}}$.
(v) The scenarios are generated for all the market variables, which would give us an artificial
multivariate sample of $n$ observations. We can use this sample for many things, including
VaR.
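
The scenario construction above can be sketched as follows for a single market variable. The price history, the holding and the 5% tail are hypothetical, chosen only to illustrate the mechanics; the function names are ours.

```python
def historical_scenarios(history):
    """history = [v_0, ..., v_n]; scenario i is v_n * v_i / v_{i-1}."""
    last = history[-1]
    return [last * history[i] / history[i - 1] for i in range(1, len(history))]


def historical_var(history, holding, tail=0.05):
    """Loss quantile over the scenarios for a position of `holding` units."""
    current = holding * history[-1]
    losses = sorted(current - holding * v for v in historical_scenarios(history))
    index = min(len(losses) - 1, int((1.0 - tail) * len(losses)))
    return losses[index]


# Hypothetical price history of a single market variable
prices = [100.0, 102.0, 99.0, 101.0, 100.0, 97.0, 98.0, 103.0, 104.0, 100.0, 101.0]
scenarios = historical_scenarios(prices)
print([round(v, 2) for v in scenarios])
print(round(historical_var(prices, holding=10), 2))
```

With several market variables, the same day index `i` would be applied to all of them at once, so that each scenario preserves the historical co-movement across variables.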
13.5.4 Credit risk and VaR
We can use the tools in Section 13.2 to assess the likelihood of default for a given name. The
important thing to do is to use the physical probability of default, not the risk-neutral one. The
risk-neutral probability of default is likely to be larger than the physical one. Therefore, using
the risk-neutral probability leads to overly conservative estimates.
VaR for credit risks poses delicate issues as well. The key issue is the presence of default
correlation. In practice, defaults among names or loans are likely to be correlated, for many
reasons. First, there might be direct relationships or, more generally, network effects, among
names. Second, firms' performance could be driven by common economic conditions, as in the
one factor model which we now describe. This one factor model, developed by Vasicek (1987),
is at the heart of Basel II. In the appendix, we provide additional technical details about how
this model is related to a modeling tool known as copulae functions. We now proceed to develop
this model in an intuitive manner. Let us define the following variable:

$$z_i = \sqrt{\rho}\,F + \sqrt{1 - \rho}\,\epsilon_i, \qquad (13.62)$$

where $F$ is a common factor among the names in the portfolio, $\epsilon_i$ is an idiosyncratic term,
and $F \sim N(0,1)$, $\epsilon_i \sim N(0,1)$. As we explain in the Appendix, $\rho \geq 0$ is meant to capture the
default correlation among the names.
Next, assume that the physical probability each firm defaults by time $T$, say $P(T)$, is the same
for each firm within the same class of risk, and given by,

$$P(T) = \Phi(\zeta_{\text{PD}}) \equiv \text{PD},$$

where PD is the probability of default, and $\Phi$ is the cumulative distribution of a standard
normal variable. That is, by time $T$, each firm defaults any time that,

$$z_i < \zeta_{\text{PD}} \equiv \Phi^{-1}(\text{PD}),$$

where $\Phi^{-1}$ denotes the inverse of $\Phi$. One economic interpretation of Eq. (13.62) is that $z_i$ is
the value of a firm and that the firm defaults whenever this value hits some exogenously given
barrier $\zeta_{\text{PD}}$.
Conditionally upon the realization of the macroeconomic factor $F$, the probability of default
for each firm is,

$$p(F) \equiv \Pr(\text{Default}\,|\,F) = \Phi\left(\frac{\Phi^{-1}(\text{PD}) - \sqrt{\rho}\,F}{\sqrt{1 - \rho}}\right). \qquad (13.63)$$

By the law of large numbers, this is quite a good approximation to the default rate for a portfolio
of a large number of assets falling within the same class of risk.
We see that this conditional probability is decreasing in $F$: the larger the level of the common
macroeconomic factor, the smaller the probability each firm defaults. Hence, we can fix a value
of $F$ such that the default rate $\Pr(\text{Default}\,|\,F)$ is what we want. Note that the probability that
$F$ is larger than $-\Phi^{-1}(x)$ is just $x$. Formally,

$$\Pr\left(F > -\Phi^{-1}(x)\right) = \Pr\left(F < \Phi^{-1}(x)\right) = \Phi\left(\Phi^{-1}(x)\right) = x.$$

Then, with probability $x$, the default rate will not exceed

$$\text{VaR}_{\text{Credit Risk}}(x) = \Phi\left(\frac{\Phi^{-1}(\text{PD}) + \sqrt{\rho}\,\Phi^{-1}(x)}{\sqrt{1 - \rho}}\right).$$

It is easy to see that $\text{VaR}_{\text{Credit Risk}}(x)$ increases with $\rho$. Basel II sets $x = 0.999$ and, accordingly,
it imposes a capital requirement equal to,

$$\text{LGD} \times \left(\text{VaR}_{\text{Credit Risk}}(0.999) - \text{PD}\right) \times \text{Maturity adjustment}.$$

The reason Basel II requires the term $\text{VaR}_{\text{Credit Risk}}(0.999) - \text{PD}$, rather than just $\text{VaR}_{\text{Credit Risk}}(0.999)$,
is that what is really needed here is the excess of the 99.9% worst case loss over the
expected idiosyncratic loss, PD. Well functioning capital markets should already discount the
idiosyncratic losses.
Finally, Basel II requires banks to compute $\rho$ through a formula in which $\rho$ is inversely related
to PD. The formula is based on empirical research (see Lopez, 2004): for a firm which becomes
less creditworthy, the PD increases and its probability of default becomes less affected by market
conditions. Basel II also requires banks to compute a maturity adjustment factor that takes into
account that the longer the maturity, the more likely it is that a given name might eventually migrate
towards a more risky asset class.
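
The one factor formulas can be sketched as follows. The inputs (PD = 2%, a correlation of 0.1, LGD = 60%, a maturity adjustment of one) are hypothetical and chosen only for illustration; they are not the Basel II prescriptions for the correlation or the maturity adjustment, and the function names are ours.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def conditional_pd(pd, rho, factor):
    """Eq. (13.63): default probability given the common factor."""
    return N01.cdf((N01.inv_cdf(pd) - math.sqrt(rho) * factor) / math.sqrt(1 - rho))


def credit_var(pd, rho, x=0.999):
    """Default rate not exceeded with probability x."""
    return N01.cdf((N01.inv_cdf(pd) + math.sqrt(rho) * N01.inv_cdf(x))
                   / math.sqrt(1 - rho))


def capital_requirement(pd, rho, lgd, maturity_adjustment=1.0, x=0.999):
    """LGD * (VaR(x) - PD) * maturity adjustment, per unit of exposure."""
    return lgd * (credit_var(pd, rho, x) - pd) * maturity_adjustment


# Hypothetical inputs: PD = 2%, rho = 0.1, LGD = 60%
print(round(credit_var(0.02, 0.1), 4), round(capital_requirement(0.02, 0.1, 0.6), 4))
```

By construction, `credit_var(pd, rho, x)` equals `conditional_pd` evaluated at the factor level $-\Phi^{-1}(x)$.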
The previous model can be further elaborated. We ask: (i) What is the unconditional probability
of defaults, and (ii) what is the density function of the fraction of defaulting loans?
First, note that conditionally upon the realization of the macroeconomic factor $F$, defaults
are obviously independent, being then driven by the idiosyncratic terms $\epsilon_i$ in Eq. (13.62). Given
$N$ loans, and the realization of the macroeconomic factor $F$, these defaults are binomially
distributed as:

$$\Pr(\text{No of defaults} = j\,|\,F) = \binom{N}{j}\,p(F)^j\,(1 - p(F))^{N-j},$$

where $p(F)$ is as in Eq. (13.63). Therefore, the unconditional probability of defaults is:

$$\Pr(\text{No of defaults} = j) = \int_{-\infty}^{\infty} \Pr(\text{No of defaults} = j\,|\,F)\,\phi(F)\,dF,$$

where $\phi$ denotes the standard normal density. This formula provides a valuable tool of analysis in
risk-management. It can be shown that VaR levels increase with the correlation $\rho$.
Next, let $\omega$ denote the fraction of defaulting loans. For a large portfolio of loans, $\omega = p(F)$,
such that:

$$\Pr(\omega \leq \bar\omega) = \int_{-\infty}^{\infty} \Pr(\omega \leq \bar\omega\,|\,F)\,\phi(F)\,dF = \int_{-\infty}^{\infty} \mathbb{I}_{\{F \geq -F^*\}}\,\phi(F)\,dF = \Phi(F^*), \qquad (13.64)$$

where $\mathbb{I}$ denotes the indicator function, and $F^*$ satisfies, by Eq. (13.63),

$$\bar\omega = p(-F^*) = \Phi\left(\frac{\Phi^{-1}(\text{PD}) + \sqrt{\rho}\,F^*}{\sqrt{1 - \rho}}\right).$$

Solving for $F^*$ leaves:

$$F^* = \frac{\sqrt{1 - \rho}\,\Phi^{-1}(\bar\omega) - \Phi^{-1}(\text{PD})}{\sqrt{\rho}}.$$

It determines the threshold value, $-F^*$, that the macroeconomic factor must exceed to guarantee
a frequency of defaults less than $\bar\omega$. Replacing $F^*$ into Eq. (13.64) delivers the cumulative
distribution function for $\omega$. The density function $f(\omega)$ for the frequency of defaults is then:

$$f(\omega) = \sqrt{\frac{1 - \rho}{\rho}}\,\exp\left\{\frac{1}{2}\left(\Phi^{-1}(\omega)\right)^2 - \frac{1}{2\rho}\left(\sqrt{1 - \rho}\,\Phi^{-1}(\omega) - \Phi^{-1}(\text{PD})\right)^2\right\}.$$
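
The distribution of the default rate in a large portfolio can be checked numerically. The sketch below uses hypothetical inputs (PD = 2%, correlation 0.1) and a simple midpoint rule; it verifies that the density integrates to the cumulative distribution function and that the mean default rate equals PD. The function names are ours.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def default_rate_cdf(w, pd, rho):
    """Pr(default rate <= w) in a large homogeneous portfolio."""
    arg = (math.sqrt(1 - rho) * N01.inv_cdf(w) - N01.inv_cdf(pd)) / math.sqrt(rho)
    return N01.cdf(arg)


def default_rate_density(w, pd, rho):
    """Density of the default rate, i.e. the derivative of the cdf above."""
    q = N01.inv_cdf(w)
    inner = math.sqrt(1 - rho) * q - N01.inv_cdf(pd)
    return math.sqrt((1 - rho) / rho) * math.exp(0.5 * q * q - inner * inner / (2 * rho))


# Hypothetical inputs: PD = 2%, rho = 0.1
pd, rho = 0.02, 0.1
step = 1e-4
grid = [step * (k + 0.5) for k in range(int(0.5 / step))]   # midpoints on (0, 0.5)
area = sum(default_rate_density(w, pd, rho) for w in grid) * step
mean = sum(w * default_rate_density(w, pd, rho) for w in grid) * step
print(round(area, 4), round(default_rate_cdf(0.5, pd, rho), 4), round(mean, 4))
```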

This model can be generalized to one where the asset value of the firm, $z_i$, is given by a
multifactor model of the form,

$$z_i = \sum_{j=1}^{K} a_{ij} F_j + b_i\,\epsilon_i,$$

and accordingly interpret the event of default as that occurring as soon as $z_i < \zeta_{\text{PD},i}$, where
$\zeta_{\text{PD},i}$ now can change across firms, similarly as explained by Schönbucher (2003, Chapter 10).
13.5.5 Expected shortfall and measures of systemic risk
This subsection is important, also in light of the following section.
[In progress]

13.6 Procyclicality, credit crunches and quantitative easing


How is it that a relatively small loss in the banking system triggered by credit derivatives
(about one trillion USD) has the power to lead to a spectacular financial crisis such as that
experienced over the years 2007-2008, with spillover effects to the real economy, and one of the
deepest recessions after World War II? One explanation is procyclicality, the situation studied
in Chapter 8 of these lectures, in which a trend in asset prices feeds an automatic mechanism,
which, in turn, reinforces the asset price trend, thereby creating a feedback loop between the
asset price trend and the mechanism. The equilibria in these markets can be drastically different
from those occurring absent any automatic mechanisms; possibly, no equilibrium could
arise at all in the first place. The mechanism is automatic in that, once implemented, it is not
under the discretion of any decision maker. One example of procyclicality includes the developments
leading to the Black Monday crash of October 19th, 1987, most likely linked to program trading,
as discussed in Chapter 10. The flash crash of May 6th, 2010 is the high-tech counterpart to
the 1987 crash (see also Chapter 10).
A further instance of procyclicality, which is more closely connected to the topics studied
in this chapter, relates to the amplification of business cycles determined by capital market
frictions: in bad times, agency problems might entail an increase in the cost of external funds,
thereby leading to a decrease in the availability of these funds and, then, to an amplification
of shocks occurring in the real sectors of the economy. A leading example of this amplification
mechanism is the financial accelerator hypothesis. Due to asymmetric information, financial
intermediaries agree on lending plans based on the collateral made available by borrowers.
The financial accelerator hypothesis holds that in bad times, financial intermediaries reduce
their funding activities as collateral values are also reduced in bad times. The ensuing lending
shrinkage contributes to further depress real economic activity and, hence, assets and collateral
values, feeding a vicious feedback loop. Bernanke, Gertler and Gilchrist (1999) present a unified
view of how agency problems make funding opportunities depend on firms' collateral.¹⁵
Fisher (1933) is one of the earliest proponents of these procyclicality issues, in his attempt
to explain the origins of the Great Depression through a debt-deflation spiral. In an economy
with highly levered firms, such as that of the US during the 1930s, a negative productivity
shock leads to bankruptcy of a fraction of these firms, which generates less investment and,
hence, depresses aggregate demand and creates deflation. In turn, deflation boosts the real
value of debt borne by firms, increasing the firms' burden and leading a higher fraction of these
firms to default. Such a debt-deflation spiral results in a deterioration of the balance sheet of
financial intermediaries (banks obviously bleed money as their borrowers default) and in a
default contagion, from firms to financial intermediaries. As a result, financial intermediation
shrinks and the vicious feedback loop might go on for a considerably long period.¹⁶
[Explain the connections with the credit view and the previous footnote on Friedman and
Schwartz (1963)]
This section provides a discussion of these procyclicality problems, arising through the balance
sheets of financial intermediaries. Section 13.6.1 is an overview of the extant regulatory framework,
which is useful whilst framing procyclicality issues dealt with later in this section. Section
13.6.2 reviews a few institutional facts surrounding the 2007 subprime turmoil. Section 13.6.3
develops a few models where the amplification of small shocks occurs because financial intermediaries
have concerns over the structure of their books. Thus, following a negative shock
affecting the assets in the balance sheet, banks need to restore their Tier 1 and Tier 2 capital
(in short, their top tier capital) and leverage ratios. Since they cannot raise fresh capital in
the short-run, they cash in by selling some of their assets. These sales create a vicious feedback
loop where banks sell assets, contributing to a further drop in the value of these assets, triggering
further sales into a depressed market. We may have situations where this loop leads to
a complete market dry-up, which is even more likely to occur in the presence of capital market
frictions, where some initially moderately low liquidity frictions can turn into spots of liquidity
black holes.
Even absent such extreme situations, the equilibria in these markets can be those where an
initial small loss in the banking system is amplied, to an extent determining a very substantial
lending shrinkage, a credit crunch. Section 13.6.4 discusses the policy that monetary authorities
have implemented in their attempt to mitigate the credit crunch originating from the 2007
subprime crisis. The standard policy action against a recession is to target low interest rates
in the interbank markets for mandatory reserves. However, the cost of capital that matters
to a recovery in the economic activity is that faced by rms whilst demanding new funds to
banks (through loans) and/or the market (through issuance of corporate bonds). This cost can
be substantially higher than the interest rates targeted by the monetary authority, due to the
credit crunch. Quantitative easing is an unconventional policy action, where the monetary
authority engages in the purchase of some of the assets held by banks (including the most
illiquid ones), so as to give banks incentives to start lending again.
15 Borio, Furfine and Lowe (2001) explain that the agents' misperception of risk might constitute
an additional amplification mechanism. For example, the credit/GDP ratio might be procyclical
because financial intermediaries under-estimate risk in good times, and over-estimate risk in bad
times, thereby lending too much in good times and too little in bad times.
16 This view of the Great Depression was challenged by Friedman and Schwartz (1963), who proposed
a monetary view instead. According to this view, the causes of the prolonged recession and the
banking crises over the 1930s need to be linked to a non-accommodating monetary policy. Friedman
and Schwartz examine the US economy from the Civil War through 1960, and find a statistical
relation between monetary policy and developments in the real macroeconomic aggregates: an
expansionary monetary policy is associated with an expansion of the real economy. Friedman and
Schwartz find that this linkage is particularly strong over the 1930s, and go further, suggesting
a causality from monetary policy to developments in the real economy. According to them, the only
role banks might have played over the crisis was their contribution to the shrinkage in money
supply through a lower money multiplier, defined in Section 13.6.3.
13.6.1 Regulatory framework
Banks have to set up capital buffers to guarantee the debt they issue against their risky activities
of lending and investing. The Basel Committee on Banking Supervision (BCBS)17 drafts accords
aiming to create an international standard for the capital necessary to cope with these risks,
together with rigorous tools for risk measurement and management. Quite simply, the greater
the risk a bank is exposed to, the greater the amount of capital the bank needs to hold to
safeguard its solvency, in the interest of overall economic stability. The main issue, then, is to
correctly measure this risk.
The first accord of 1988, known as Basel I, focussed on minimal capital requirements to cope
with credit risk, and was enforced by law by the Group of Ten in 1992 and then by more than
100 countries. It relied on the so-called Cooke ratio (after Peter Cooke of the Bank of England),
a minimum capital adequacy standard of 8% of the total risk-weighted assets. The accord was
quite coarse, in that it considered five broad classes of credit risk with which to weigh the assets,
and did not discriminate by credit quality within each class. For example, corporate loans
had 100% weightings and loans to OECD countries had zero weightings, independently of the
ratings of the borrowing entities.
The first amendment to Basel I occurred in 1996, and aimed to include tools to cope with
market risk. In 1999, a first consultative paper was drafted on a new accord, known as Basel II.
One of the main issues under reform was the one-size-fits-all approach of Basel I, that is, the fact
that default risk could be substantially lower for some of the assets within the same class of risk
in the banks' accounts. For example, banks could have securitized the loans with default risk
lower than that implied by the flat rate within the same class, and held those loans with higher
default risk. This might have led to an increase in the overall riskiness of financial institutions.
In 1998, the Federal Reserve Chairman Alan Greenspan pointed to the existence of incentives
left to banks to implement regulatory arbitrage:
"Banks arbitrage away inappropriately high capital requirements on their safest assets by
removing these assets from the balance sheet via securitization. The issue is not solely
whether capital requirements on the banks' residual risk in the securitized assets are
appropriate. We should also be concerned with the sufficiency of regulatory capital
requirements on the assets remaining on the book. In the extreme, such cherry picking
would leave on the balance sheet only those assets for which economic capital allocations
are greater than the 8 percent regulatory standard." [Greenspan, 1998, p. 166]

There is a consensus that Basel II did indeed considerably mitigate these issues, by paying
more attention to risk-sensitivity by means of a more precise set of indications about classes of
risk and, also, distinguishing among credit risk, market risk and even operational risks.18
Moreover, the Basel II accords aimed at a flexible supervisory system whereby banks could choose
to measure and manage risks by following a standardized or an internal rating approach.
The standardized approach would rely on the ratings supplied by dedicated rating agencies
approved by national supervisors. Capital incentives would also be available to move towards more
advanced, internal approaches, where banks calculate weightings through proprietary models.
However, moving from mechanical rules to proprietary ones requires particular care as
to the quality of the models, which have to be approved by national supervisors.19 Therefore,
Basel II encourages banks to improve their risk measurement and management systems.
17 The BCBS is a committee of banking supervisory authorities established by the central bank
governors of the Group of Ten countries in 1975. It consists of senior representatives of bank
supervisory authorities and central banks from Belgium, Canada, France, Germany, Italy, Japan,
Luxembourg, the Netherlands, Spain, Sweden, Switzerland, the United Kingdom, and the United
States. It usually meets at its permanent Secretariat, located at the Bank for International
Settlements in Basel.
18 Operational risk is defined as the risk of losses resulting from inadequate or failed internal
processes, people and systems, or external events. Examples of operational risk include two famous
cases of rogue trading: Nick Leeson, who in 1995 led Barings Bank to bankruptcy, through a loss of
$1.3bn, and Jérôme Kerviel, who in 2008 led Société Générale to a loss of €5bn.
Basel II relies on three pillars, which impose increasingly stringent rules on capital
adequacy:
Pillar I: Minimal capital requirements. It defines a set of basic rules for measuring credit
risk, market risk and operational risks (both standardized and internal).
Pillar II: Supervisory review of capital adequacy. It increases the role of banking supervisors,
by setting rules of conduct national supervisors should maintain to ensure that
banks have capital adequacy over and above the minimal capital requirements of Pillar I.
Pillar III: Market disclosure. It increases the role of market discipline and disclosure,
by relying on the publication of information by banks, relating to issues such as risk
measurements, risk-rating processes, or risk-management systems.
Pillar I defines a clear separation of three types of risk: (a) credit risk; (b) operational risk
and (c) market risk. The calculation rules rely on the concept of total risk-weighted assets
(Total RWA), defined as
$$\text{Total RWA} = \text{Cr} + \text{MnO},$$
where
Cr = Risk-weighted assets for credit risk,
MnO = Assets weighted for market and operational risk
    = (Capital requirements for market and operational risk) $\times\ 12.5$, with $12.5 = 1/0.08$.

For example, the capital requirements for market risk can be determined through dedicated
VaR models, such as (and, possibly, more sophisticated versions of) those surveyed in Section
13.5. The (total) minimum capital requirements are taken to be 8% of the total risk-weighted
assets,
$$\frac{\text{Regulatory capital}}{\text{Total RWA}} \geq 8\%.$$
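The arithmetic behind the Total RWA and the 8% rule can be illustrated with a minimal Python sketch. The function names (`total_rwa`, `capital_ratio`) and all balance-sheet figures are hypothetical, chosen only to make the numbers easy to follow; they are not taken from the Basel texts.

```python
# Minimal sketch of the Total RWA and capital-ratio arithmetic described above.
# All balance-sheet figures are hypothetical.

MIN_CAPITAL_RATIO = 0.08  # the 8% minimum on total risk-weighted assets


def total_rwa(credit_rwa: float, market_op_capital: float) -> float:
    """Total RWA = Cr + MnO, with MnO = capital requirement x 12.5 (= 1/0.08)."""
    mno = market_op_capital / MIN_CAPITAL_RATIO
    return credit_rwa + mno


def capital_ratio(regulatory_capital: float, rwa: float) -> float:
    """Regulatory capital as a fraction of Total RWA."""
    return regulatory_capital / rwa


credit_rwa = 800.0         # hypothetical risk-weighted assets for credit risk
market_op_capital = 16.0   # hypothetical capital charge for market and operational risk
regulatory_capital = 90.0  # hypothetical regulatory capital

rwa = total_rwa(credit_rwa, market_op_capital)
ratio = capital_ratio(regulatory_capital, rwa)
print(f"Total RWA: {rwa:.1f}")
print(f"Capital ratio: {ratio:.2%}")
print(f"Meets the 8% minimum: {ratio >= MIN_CAPITAL_RATIO}")
```

Multiplying the market and operational capital charge by 12.5 simply converts it into an asset-equivalent amount, so that a single 8% rule can be applied to the sum.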
One immediate issue arising with Basel II is its heavy reliance on credit rating agencies as
regards the standardized approach to credit risk. This approach might be misguided
due to conflicts of interest between credit rating agencies and the firms whose debt these
agencies rate. At the time of writing, rating agencies are mostly unregulated, with the quality
of credit risk estimates being obviously observable only with lags and, importantly, too late should
serious mispricing take place.
19 Note that within the internal rating approach, banks are not allowed to use internal models of
credit risk. Banks that have received supervisory approval to use the internal approach may rely on
their own internal estimates of risk components in determining the capital requirement for a given
exposure. The risk components include: (i) measures of the probability of default, (ii) loss given
default, (iii) exposure at default, (iv) effective maturity.

A second issue is procyclicality: in bad times, banks reduce lending, exacerbating the
current economic developments, which makes banks reduce lending even further, in a vicious
circle. Sections 13.6.3 and 13.6.4 deal with procyclicality issues. The danger of procyclicality
in a regulatory context is that in bad times, when risks intensify, the banking system is given
an additional burden due to regulatory capital, which might lead to a further lending shrinkage.
Basel III introduces a regulatory device that mitigates cyclicality by requiring banks to
build up capital buffers in good times with which to cope in adverse times.
After a number of additional consultative papers, national regulators indicated to the Financial
Stability Institute (FSI)20 that they would implement Basel II by 2015. While the European Union
was implementing Basel II through its EU Capital Adequacy Directive (CAD III), the global
financial crisis following the 2007 subprime events led to a re-thinking of Basel II, and to a new
set of rules, known as Basel III. The main innovations of Basel III are summarized by the
following points: (i) new capital requirements, such as those summarized in the table below, as
well as a new mandatory capital conservation buffer of 2.5% of the Total RWA, to face economic
stress; (ii) new rules that allow national supervisors to require banks to set up capital of up to
2.5% of the Total RWA in times of high credit growth (countercyclical capital buffers); and
(iii) a target for the leverage ratio, defined as the ratio of Tier 1 capital (i.e. equity plus
reserves minus intangible assets) over total assets net of intangible assets, to be at least 3%
($\frac{\text{Tier 1}}{\text{Assets}} \geq 3\% \iff \text{Lev} \equiv \frac{\text{NonTier 1}}{\text{Tier 1}} \leq \frac{1}{0.03} - 1$),
as well as additional liquidity ratios. The following table summarizes the main differences in
capital requirements that Basel III introduces against Basel II.
Capital requirements as a % of the Total RWA
                                 Basel II    Basel III
Common equity                    2%          4.5%
Tier 1                           4%          6%
Total capital                    8%          8%
Common equity (conservation)     -           2.5%
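As an illustration of how these requirements combine, the following Python sketch checks a hypothetical bank against the Basel III column of the table and against the 3% leverage target. The bank's figures and the names `basel3_checks` and `max_leverage` are assumptions made for this example; it is also assumed here that the 2.5% conservation buffer is added to the 4.5% common equity minimum, and that `total_assets` is already net of intangible assets.

```python
# Sketch: Basel III minima from the table above plus the 3% leverage target.
# The bank's figures are hypothetical; total_assets is taken net of intangibles.

BASEL3_MINIMA = {"common_equity": 0.045, "tier1": 0.06, "total_capital": 0.08}
CONSERVATION_BUFFER = 0.025   # assumed to be met with common equity
MIN_TIER1_TO_ASSETS = 0.03


def max_leverage(min_tier1_to_assets: float = MIN_TIER1_TO_ASSETS) -> float:
    """Largest NonTier1/Tier1 consistent with Tier1/Assets >= the given minimum."""
    return 1.0 / min_tier1_to_assets - 1.0


def basel3_checks(common_equity, tier1, total_capital, rwa, total_assets):
    """Return a dict of pass/fail flags, one per requirement."""
    non_tier1 = total_assets - tier1
    return {
        "common_equity_incl_buffer": common_equity / rwa
        >= BASEL3_MINIMA["common_equity"] + CONSERVATION_BUFFER,
        "tier1": tier1 / rwa >= BASEL3_MINIMA["tier1"],
        "total_capital": total_capital / rwa >= BASEL3_MINIMA["total_capital"],
        # The next two conditions are two ways of writing the same constraint.
        "tier1_to_assets": tier1 / total_assets >= MIN_TIER1_TO_ASSETS,
        "leverage": non_tier1 / tier1 <= max_leverage(),
    }


checks = basel3_checks(
    common_equity=75.0, tier1=85.0, total_capital=100.0,
    rwa=1000.0, total_assets=2500.0,
)
for name, passed in checks.items():
    print(f"{name}: {'ok' if passed else 'breach'}")
print(f"Maximum NonTier1/Tier1: {max_leverage():.2f}")
```

The last two flags always agree, which is the equivalence between the 3% floor on Tier 1 over assets and the cap on non-Tier 1 funding per unit of Tier 1 capital.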
13.6.2 The 2007 subprime crisis
The 2007 subprime crisis developed due to a mixture of coincident factors. One factor
certainly regards the institutional details through which MBS securities were traded: a shadow
banking system that escaped the official financial community.
A second factor was model misspecification, that is, the fact that the evaluation framework
for these securities relied on unrealistic assumptions such as stability of delinquencies, reliance
on expected (linear) actuarial losses (not tail risk losses), and small risk-aversion or liquidity
adjustments. For example, the picture below shows delinquencies were actually traveling fast
over the relatively newly created subprime mortgages; in retrospect, this piece of information
could have helped predict that a crisis in the mortgage market was about to arrive. Another
dimension of model misspecification was a reliance on an inappropriate rating mapping system,
by which rating agencies tended to rate structured products relying on MBS by using the
rating system they had in place for corporations. Finally, additional elements such as default
risk correlation were not duly taken into account while calibrating the models.
20 The FSI, headquartered at the Bank for International Settlements in Basel, was established in
1999 in response to the Asian crisis of 1997.

This section pulls all these elements together while providing a succinct account of the subprime
crisis. The crisis erupted while MBS derivatives were producing losses in a market where
the identity of the players was unclear as these products had been channelled through the
off-balance-sheet vehicles of the shadow banking system.
[Figure: Subprime mortgage delinquencies by vintage year (60+ day delinquencies, in % of balance), with one curve per vintage: 2000, 2003, 2004, 2005, 2006 and 2007. Source: IMF, Global Financial Stability Report, April 2008.]

13.6.2.1 Off-balance-sheet entities: SIV, conduits, and SIV-lites

On the funding side, a typical SIV (Structured Investment Vehicle) issues long-maturity notes.
On the asset side, a SIV typically relies on assets that are more complex than those conduits rely
on. SIVs tended to be more leveraged than conduits. Please remember: SPV = Special Purpose
Vehicle, i.e. a vehicle that organizes securitization of assets; SIV = Structured Investment
Vehicle, i.e. a fund that manages asset backed securities. In a sense, SIVs are virtual banks, in
that they borrow through low-interest securities and invest in longer term securities yielding
large rewards (and risk), as we discuss below. SIVs and conduits typically have an open-ended
lifespan.
SIV-lites are less conservatively managed and are structured with greater leverage. Their
portfolios are not much diversified, and are much smaller in size than those of SIVs. SIV-lites had a
finite lifespan, with a one-off issuance vehicle. They were greatly exposed to the U.S. subprime
market, more so than SIVs.
Off-balance-sheet entities borrow in the shorter term, typically through commercial paper or
auction rate securities with average maturity of 90 days, as well as medium term notes with
average maturity of a year. They purchase long-maturity debt, such as financial corporate bonds
or asset-backed securities, which is high-yielding. Naturally, the profits made by these entities
are paid to the capital note holders, and the investment managers. The capital note holders
are, of course, the first-loss investors.
The obvious risk incurred by these entities is solvency, a risk that materializes when the value
of long-term assets falls below the value of short-term liabilities. This risk has great chances
to materialize when the pricing of the assets is informal, as argued below. A second risk
is funding liquidity, the risk related to duration mismatch: refinancing occurs at a short-term
frequency, but if short-term market conditions are bad, the entities need to sell the assets into
a depressed market. To cope with this risk, the sponsoring banks would grant credit lines.21
13.6.2.2 Credit ratings and model misspecification

The role of credit ratings was crucial to determine the riskiness of MBS-related derivatives, as
explained in Section 13.4.3. Credit ratings were endogenous in the securitization and tranching
process: they would not be applied to an exogenously given tranching scheme; rather, they
would be used to determine the riskiness of the tranching scheme. Another important point is
that many of these MBS securitized assets were illiquid, which did not facilitate pricing. Thus,
the pricing of structured products would rely on the pricing of products that were similarly
rated and for which quotes were available. For example, the price of AAA ABX subindices
would be used to estimate the value of AAA-rated tranches of MBS. Or, the price of BBB
subindices would be used to value BBB-rated MBS tranches. This is the mapping role credit
ratings played for the pricing of customized or illiquid structured credit products.
Yet it is well-known that the risk profile of structured products differs from that of corporate
bonds. Even if a tranche has the same expected loss as an otherwise similar corporate bond,
unexpected loss or tail risk can be much larger than that for corporate bonds given the
complexity of the product (see the Matryoshka, or Russian doll, scheme below). All in all, it would be
misleading to extrapolate structured products ratings from corporate bonds ratings. Typically,
corporate bond ratings only capture the first moments of the distribution. Finally, credit rating
inertia for bonds does not necessarily work for structured products. Rating deterioration for
structured products can travel very fast.22
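The point that two instruments can share the same expected loss and yet differ widely in unexpected loss can be made concrete with a toy Python calculation. The two loss distributions below are purely hypothetical: a "bond" losing 40% with probability 2.5%, and a thin "tranche" that is wiped out with probability 1%. They are not calibrated to any actual product.

```python
import math

# Toy comparison of two hypothetical loss distributions with equal expected loss.
# Each distribution is a list of (loss as a fraction of notional, probability).


def loss_stats(dist):
    """Return (expected loss, standard deviation of loss, worst-case loss)."""
    mean = sum(loss * prob for loss, prob in dist)
    var = sum(prob * (loss - mean) ** 2 for loss, prob in dist)
    worst = max(loss for loss, prob in dist if prob > 0)
    return mean, math.sqrt(var), worst


bond = [(0.0, 0.975), (0.40, 0.025)]    # hypothetical corporate bond
tranche = [(0.0, 0.99), (1.00, 0.01)]   # hypothetical thin tranche, all-or-nothing

for name, dist in [("bond", bond), ("tranche", tranche)]:
    el, sd, worst = loss_stats(dist)
    print(f"{name}: expected loss {el:.2%}, std dev {sd:.2%}, worst case {worst:.0%}")
```

A rating that summarizes only the expected loss would treat the two instruments alike, although the dispersion and the worst-case loss of the second are markedly larger.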

Two additional fundamental aspects contributed to the meltdown. First, there was an erosion
in lending standards: statistical models were based on historically low mortgage default
and delinquency rates that arose in a credit environment with tight credit standards. Second,
there were correlation issues: past data suggested a quite weak correlation between regional
mortgages, which made investors perceive a sense of diversification. However, the housing
market grinding to a halt turned out to be a nation-wide phenomenon.
21 Typical sponsors at the time were Citibank ($100bn), JP Morgan Chase ($77bn), Bank of America
($60bn). In the European Union: HBOS ($42bn), ABN Amro ($40bn), HSBC ($32bn).
22 See IMF 2008 report.
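Why correlation matters for the perceived diversification can be sketched with a standard formula, used here only as an illustration: for n exchangeable loans, each defaulting with probability p and with pairwise default correlation rho, the variance of the portfolio default rate is p(1-p)[rho + (1-rho)/n]. The function name and the parameter values below are hypothetical.

```python
import math

# Standard deviation of the default rate of a pool of n exchangeable loans,
# each defaulting with probability p, with pairwise default correlation rho.
# Parameter values are hypothetical.


def default_rate_std(p: float, rho: float, n: int) -> float:
    variance = p * (1.0 - p) * (rho + (1.0 - rho) / n)
    return math.sqrt(variance)


p = 0.05
for rho in (0.0, 0.05, 0.20):
    for n in (100, 1000, 100000):
        sd = default_rate_std(p, rho, n)
        print(f"rho={rho:.2f}, n={n:>6}: std dev of default rate = {sd:.4f}")
```

With zero correlation the dispersion vanishes as the pool grows, whereas with positive correlation it is bounded below by the square root of p(1-p)rho, no matter how many regional mortgages are pooled.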
13.6.2.3 The meltdown

One crucial element of the crisis was the market fear of contagion from the rising level of
defaults in subprime underlying instruments, many of which were incorporated in complex
products. Fears of contagion concerned safer tranches as well. They came from the investors'
understanding that the pricing models were misspecified, and their lack of trust vis-à-vis the
rating agencies.
Banks were affected for a number of reasons: (i) they had invested in subprime securities
directly; (ii) they had provided credit lines to SIVs (indebted through commercial paper) and
conduits that held these securities, thereby creating a shadow banking system, which escaped
accounting and supervision rules; and (iii) this very same shadow system generated banks' loss
of confidence in the ability of their counterparties to meet their contractual obligations. So
the Asset Backed Commercial Paper market dried up, triggering credit lines. The result was a
sell-off of anything related to structured finance, from junk to AAA, which led to a complete
liquidity black hole, and a severe reappraisal of structured finance.
The reappraisal of structured finance determined severe writedowns, also due to a liquidity
black hole fueled by a difficult repricing. Indeed, in the absence of a liquid market, writedowns
largely rely on marking-to-model. But investors began not to trust the models and the rating
process leading to them. Meanwhile, credit agencies proceeded to severe downgrades, confirming
the investors' beliefs that previous ratings were based on misspecified assumptions, a quite
self-reinforcing mechanism. These events escalated to a complete dry-up in September-October
2008, partly restored by painful bank bail-outs and recapitalizations.
[In progress, explain Lehman's experience]
13.6.3 Top tier capital ratio targets and endogenous volatility
Treating volatility or credit risk as exogenous could be a good approximation whilst living
in good times. The quality of this approximation deteriorates in times of crisis. The implicit
assumption made in many instances of this and previous chapters is that one's own actions,
based on a volatility forecast, do not affect future volatility, just like forecasting weather does
not influence future weather. Arguably, the actions of many heterogeneous market participants
should tend to cancel each other out during periods of calm. However, market participants
tend to cluster their decision rules in periods of crisis. The literature on this endogenous risk
is quite fascinating, as are the surveys in Shin (2010), or the recent modeling framework put
forward by Danielsson, Shin and Zigrand (2011).
This section develops a simple model of endogenous risk, where markets can be destabilized
by one instance of procyclicality, arising because financial institutions need to comply with a
given top tier capital (i.e. the capital comprising Tier 1 and Tier 2, using the terminology of
Section 13.6.1) ratio. After a negative shock in the value of the assets on the balance sheet, a
financial institution needs to restore its top tier capital ratio. In the short-run, the institution
can only restore this ratio through asset sales. Because every institution is doing the same,
these asset sales have a market impact, collectively, determining a further fall in the value of
the assets, and so on. The final outcome is an increased volatility of the risky asset price, as well
as a disproportionate asset sell-off, if compared to the initial shocks triggering it. The model
is useful to think about the subprime events in 2007, as well as the ensuing credit crunch and
the new solutions that monetary authorities have experimented with to help mitigate these adverse
developments, as we shall discuss.
13.6.3.1 Model

We consider a model with many identical financial institutions complying with regulation or,
more generally, concerned with a pre-specified target of top tier capital ratio against risky
assets. Each institution has the following balance sheet.
Balance sheet, Time 1
Assets                         Liabilities
$A$ (risky assets)             $D$ (debt)
$C$ (cash and reserves)        $E$ (equity)

The notation is a bit more elaborate than that used in Section 13.3: whilst we still define
$E$ as equity (including past retained earnings) and $D$ as debt, we now define $A$ as the value
of risky assets, no matter how liquid these can be. Moreover, we define $C$ to equal cash and
reserves. We suppose the 8% Cooke ratio is in place, or $E/A \geq 0.08$ and, to simplify, let
$E = 0.08A$. Note that a top tier capital rule does not determine a leverage rule: there are,
obviously, many leverage ratios, $D/E$, consistent with a given top tier capital ratio $E/A$.
In fact, the new Basel III explicitly considers leverage ratios, thereby innovating upon Basel II,
as discussed in Section 13.6.1. We shall deal with the procyclicality induced by this new rule
later in this section.
Next, assume that some exogenous shock takes place, which makes the value of the risky
assets decrease by some amount, $\Delta A$, after which each institution would have the following
balance sheet.
Balance sheet, Time 2
Assets                         Liabilities
$A - \Delta A$                 $D$
$C$                            $E - \Delta A$

But each institution has to comply with its top tier ratio target. Therefore, at time 2, the
new top tier capital, $E - \Delta A$, must be at least 8% of the risky assets, $A - \Delta A$, or,
$$E - \Delta A = 0.08\,(A - \Delta A). \tag{13.65}$$
At time 1, the financial institution had set $E = 0.08A$. Therefore, Eq. (13.65) cannot hold, as
a simple computation reveals: substituting $E = 0.08A$ into Eq. (13.65) leaves
$\Delta A = 0.08\,\Delta A$, which fails for any $\Delta A > 0$. The intuition behind this
impossibility is simple. As the value of the risky assets falls, the value of equity falls by a
larger percentage than that of the risky assets, due to leverage: the value of risky assets falls
by $\Delta A / A$, whereas the value of equity falls by a larger percentage, $\Delta A / E$, due to
$E < A$.
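A small numerical illustration may help. The sketch below takes a hypothetical institution that starts exactly at the 8% target (equity equal to 8% of risky assets), applies a 1% fall in the value of its risky assets, and reports how far the capital ratio falls short at time 2. The variable names and the figures are assumptions made for the example.

```python
# Hypothetical institution starting exactly at the 8% target, E = 0.08 * A.
kappa = 0.08      # top tier capital ratio against risky assets
A = 100.0         # value of risky assets at time 1
E = kappa * A     # equity at time 1
dA = 1.0          # fall in the value of the risky assets (the shock)

asset_drop = dA / A                       # percentage fall in risky assets
equity_drop = dA / E                      # percentage fall in equity
ratio_after_shock = (E - dA) / (A - dA)   # capital ratio at time 2
shortfall = kappa * (A - dA) - (E - dA)   # capital missing relative to the target

print(f"Risky assets fall by {asset_drop:.2%}, equity falls by {equity_drop:.2%}")
print(f"Capital ratio after the shock: {ratio_after_shock:.4%}")
print(f"Capital shortfall relative to the 8% target: {shortfall:.2f}")
```

The same absolute loss is absorbed entirely by equity, which is why the percentage fall in equity is 1/0.08 = 12.5 times that of the risky assets.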
Two solutions are available to the financial institution: (i) to inject fresh capital; (ii) to sell
some of the risky assets. The first solution is not quite viable in the short-run. Let us analyze
the second solution. We are looking for some quantity of the risky asset to sell, such that the
reduction in value of the assets, say $\Delta A_s$, is able to meet the top tier capital ratio
target. In terms of the balance sheet, we have the following situation.
Balance sheet, Time 3
Assets                              Liabilities
$A - \Delta A - \Delta A_s$         $D$
$C + \Delta A_s$                    $E - \Delta A$

How much of the risky asset value should the financial institution precisely get rid of? To
maintain the new top tier ratio, $\Delta A_s$ must satisfy:
$$E - \Delta A = 0.08\,(A - \Delta A - \Delta A_s).$$
Using $E = 0.08A$, and solving for $\Delta A_s / A$ yields,
$$\frac{\Delta A_s}{A} = \frac{1 - 8\%}{8\%}\,\frac{\Delta A}{A} = 11.5\,\frac{\Delta A}{A}.$$
That is, roughly, the amount of risky assets to sell is proportional to the percentage loss in their
value. In general,
$$\frac{\Delta A_s}{A} = \frac{1-\kappa}{\kappa}\,\frac{\Delta A}{A}, \tag{13.66}$$
where $\kappa$ denotes the top tier capital ratio against risky assets. This result is intuitive:
following a negative shock affecting the value of the risky assets, the amount of asset sales is
decreasing in the pre-specified top tier capital ratio, $\kappa$, because the closer $\kappa$ is
to 100%, the easier the adjustment is to maintain the same $\kappa$. Eq. (13.66) would end the
description of this market, if the financial institutions had no price impact.
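The size of the asset sales needed to restore the ratio can be checked numerically. The sketch below writes kappa for the top tier capital ratio and dA for the fall in the value of risky assets; the function name `required_sales` and the balance-sheet figures are hypothetical.

```python
def required_sales(dA: float, kappa: float) -> float:
    """Asset sales restoring the capital ratio kappa after a loss dA, absent price impact."""
    return (1.0 - kappa) / kappa * dA


# Hypothetical institution starting at E = kappa * A.
kappa, A, dA = 0.08, 100.0, 1.0
E = kappa * A

dA_s = required_sales(dA, kappa)
ratio_after_sales = (E - dA) / (A - dA - dA_s)

print(f"Loss on risky assets: {dA:.2f}; required sales: {dA_s:.2f}")
print(f"Capital ratio after the sales: {ratio_after_sales:.4%}")

# The required sales per unit of loss shrink as the target ratio rises.
for k in (0.04, 0.08, 0.10, 0.50):
    print(f"kappa = {k:.2f}: sales per unit of loss = {required_sales(1.0, k):.2f}")
```

Note that the sales leave equity unchanged: they swap risky assets for cash, shrinking the denominator of the ratio until the target is met again.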
We assume that financial institutions have a market impact, collectively: while the behavior
of one single institution cannot affect the price of the risky asset, many institutions doing the
same thing at the same time more likely could, thereby creating price pressures, with the price
of assets falling and triggering new sales into a depressed market, over a vicious feedback loop.
Is there an equilibrium for this loop? The answer relies on the way we think of selling pressure.
We model selling pressure by assuming that there is a continuum of financial institutions, and
that the asset value changes according to:
$$\frac{\Delta A}{A} = \epsilon + \lambda\,\frac{\Delta A_s}{A}, \tag{13.67}$$
where $\lambda$ is the price impact, as we say in market microstructure, surveyed in Chapter 9. In
words, the percentage change in the asset value is the sum of two components: (i) the initial
shock, $\epsilon$, and (ii) a selling pressure component, which will pump up endogenous volatility,
as we shall show. Naturally, if the market was perfectly liquid, $\lambda = 0$. Moreover, in a world
where financial institutions had no concerns about top tier capital rules, we would have
$\Delta A_s = 0$ in the first place, and so:
$$\frac{\Delta A}{A} = \epsilon. \tag{13.68}$$
Note that this solution is also that arising when the market is perfectly liquid, $\lambda = 0$.
However, we assume the existence of price pressure, as formalized by Eq. (13.67), determined by the
concern financial institutions have to comply with a top tier capital rule, leading them to a sale
$\Delta A_s / A$ satisfying Eq. (13.66). The loop we have created is, then, the following. After an
initial shock affecting the risky asset value, $\epsilon$, financial institutions sell risky assets,
to an extent proportional to the percentage change in the asset value, according to Eq. (13.66). In
turn, the sell-off entails a further percentage drop in the asset value, as determined by Eq. (13.67).
An equilibrium, provided it exists, is a fixed point to this feedback-loop, i.e. a situation where
the loop stops because the drops in the risky asset value do not happen anymore, and the asset
sell-offs are interrupted as a result, thereby providing no reasons for the asset value to fall any
further.
To find this fixed point, we replace Eq. (13.66) into Eq. (13.67) and solve for $\Delta A / A$,
leaving:
$$\frac{\Delta A}{A} = \frac{\kappa}{\kappa - \lambda(1-\kappa)}\,\epsilon, \tag{13.69}$$
where $\kappa$ is the top tier capital ratio against risky assets, $\lambda$ the price impact and
$\epsilon$ the initial shock. Note that in order for this equilibrium to exist, we need that
$\lambda$, the slope of the line $\Delta A_s/A \mapsto \Delta A/A$ in Eq. (13.67), be less than
$\kappa/(1-\kappa)$, the slope of the line $\Delta A_s/A \mapsto \Delta A/A$ in Eq. (13.66), or that
$\lambda(1-\kappa) < \kappa$.
Intuitively, if the price impact was too large, the feedback from asset sales to the asset value
drops would create a perverse spiral such that the market would collapse.
Eq. (13.69) shows, crucially, that the shock multiplier,
$(\Delta A/A)/\epsilon = \kappa/(\kappa - \lambda(1-\kappa))$, exceeds 1 whenever $\lambda > 0$.
When financial institutions have a concern over top tier capital ratios, the ultimate change in the
asset value resulting from an initial shock is larger than that we would have observed otherwise,
say in Eq. (13.68). For example, with $\kappa = 0.08$, assuming a price impact $\lambda = 0.05$
implies a multiplier of about 2.4. Naturally, these effects become less important as the market
becomes more liquid, and do not matter anymore
in the limit case where the market is perfectly liquid, $\lambda = 0$.
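The shock multiplier and the existence condition can be tabulated with a few lines of Python. In this sketch kappa stands for the top tier capital ratio and lam for the price impact; the function name `shock_multiplier` and the grid of values are illustrative assumptions. The function returns `None` when the price impact is at or above kappa / (1 - kappa), where no equilibrium exists.

```python
def shock_multiplier(kappa: float, lam: float):
    """Equilibrium ratio of the total asset value drop to the initial shock.

    Returns None when lam * (1 - kappa) >= kappa, i.e. when no equilibrium exists.
    """
    denom = kappa - lam * (1.0 - kappa)
    if denom <= 0.0:
        return None
    return kappa / denom


kappa = 0.08
print(f"Largest admissible price impact: {kappa / (1.0 - kappa):.4f}")
for lam in (0.0, 0.02, 0.05, 0.08, 0.10):
    m = shock_multiplier(kappa, lam)
    label = "no equilibrium" if m is None else f"{m:.3f}"
    print(f"lam = {lam:.2f}: multiplier = {label}")
```

The multiplier grows without bound as the price impact approaches its upper limit, which is one way to read the market collapse mentioned in the text.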
What is the amount of asset sales resulting from this loop? Replacing Eq. (13.69) into Eq.
(13.66) yields:
$$\frac{\Delta A_s}{A} = \frac{1-\kappa}{\kappa - \lambda(1-\kappa)}\,\epsilon. \tag{13.70}$$
Note that the feedback loop might exert quite substantial effects on the amount of asset sales.
Assuming a top tier capital ratio $\kappa = 0.08$, the multiplier $(\Delta A_s/A)/\epsilon$ would
equal 11.5 in the absence of feedback, $\lambda = 0$, as previously noted. The same multiplier more
than doubles in the presence of feedbacks and a price impact of just $\lambda = 0.05$, attaining a
value of 27.1.
This model thus formalizes the idea that even a small shock affecting the risky assets held by
financial institutions might lead to large sale adjustments and price corrections, similarly to
the developments inherent in the subprime events described in the previous section. In the model,
the concerns financial institutions have about top tier capital ratios lead them to substantial
asset sales in response to a shock, which are even more amplified in the presence of feedback
effects induced by liquidity frictions.
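The feedback loop can also be simulated round by round, which makes the fixed point visible. The sketch below alternates between the sales rule (sales equal to (1 - kappa) / kappa times the percentage drop) and the linear price-impact rule (drop equal to the initial shock eps plus lam times the sales), and compares the result with the closed-form multipliers. The function name, the number of rounds and the parameter values are assumptions made for illustration.

```python
def run_feedback_loop(eps: float, kappa: float, lam: float, rounds: int = 200):
    """Iterate the sales rule and the linear price-impact rule for a given number of rounds.

    Returns the percentage drop in the asset value and the sales (as a fraction of A).
    """
    drop = eps      # round 0: only the initial shock
    sales = 0.0
    for _ in range(rounds):
        sales = (1.0 - kappa) / kappa * drop   # sales needed to restore the ratio
        drop = eps + lam * sales               # price impact of those sales
    return drop, sales


eps, kappa, lam = 0.01, 0.08, 0.05
drop, sales = run_feedback_loop(eps, kappa, lam)

denom = kappa - lam * (1.0 - kappa)
drop_closed_form = kappa / denom * eps
sales_closed_form = (1.0 - kappa) / denom * eps

print(f"Simulated drop:  {drop:.6f}  (closed form {drop_closed_form:.6f})")
print(f"Simulated sales: {sales:.6f}  (closed form {sales_closed_form:.6f})")
print(f"Sales multiplier: {sales / eps:.2f}, versus {(1.0 - kappa) / kappa:.2f} without feedback")
```

Each round adds a geometrically shrinking increment with ratio lam * (1 - kappa) / kappa, so the iteration converges precisely when that ratio is below one, which is the existence condition of the model.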
13.6.3.2 Multiple equilibria and market break-ups

In the model we analyze, an equilibrium exists under parameter restrictions that are
independent of the realization of the initial shock, $\epsilon$. As noted, we simply need that the
denominators of the multipliers $(\Delta A/A)/\epsilon$ and $(\Delta A_s/A)/\epsilon$ in Eqs. (13.69)
and (13.70) be strictly positive. We now present a
variant of the model, where an equilibrium fails to exist when the initial shock is sufficiently
large. We simply assume that the price pressure is nonlinear, differently from the linearity
assumption underlying Eq. (13.67). It is:
$$\frac{\Delta A}{A} = \epsilon + \lambda\left(\frac{\Delta A_s}{A}\right)^2. \tag{13.71}$$
The quadratic term in Eq. (13.71) formalizes the idea that the price impact of asset sales does
not matter too much when the asset sales are limited, but becomes disproportionately high
when the asset sales are at a large scale. This convexity translates into a non-linearity of the
resulting feedback loop. Replacing Eq. (13.71) into Eq. (13.66), we find that in any equilibrium,
the amount of asset sales is solution to the following quadratic equation
$$0 = f\left(\frac{\Delta A_s}{A}\right) \equiv \frac{\Delta A_s}{A} - \frac{1-\kappa}{\kappa}\,\epsilon - \lambda\,\frac{1-\kappa}{\kappa}\left(\frac{\Delta A_s}{A}\right)^2, \tag{13.72}$$
where $\kappa$ is the top tier capital ratio against risky assets and $\lambda$ the price impact.
We hypothesize the market is hit by a series of shocks. Initially, we assume that = 0, such
S
that two equilibria are possible: one, where the sell-o is just zero,
= 0; and another, where
the sell-o is (1 ) . We assume the sell-o is zero. Then, we assume that a rst positive shock
S
hits the market, with = 1%. The solid line in the next picture is the graph of
in this
S
S
:
= 0. We assume that the market
case. There are two equilibria, corresponding to
coordinates towards the leftmost one.
[Figure: graph of $f$ (vertical axis) against the sell-off (horizontal axis).]

The three curves depict the graph of $f$ in Eq. (13.72), when the initial shock is $\epsilon = 1\%$ (solid line), $\epsilon = 2\%$ (dashed line), and $\epsilon = 5\%$ (dotted line), obtained assuming a top tier capital ratio against risky assets $\kappa = 8\%$, and a price impact $\gamma = 0.05$. The equilibrium sales are those where the curves intersect the horizontal axis, if any.

As the risky asset value is hit by one additional shock, say with $\epsilon = 2\%$, the graph of $f$ shifts to the South-East, to the dashed line, and the asset sales increase as a result, still being the leftmost zero of $f$. The market collapses when the shock is $\epsilon = 5\%$, leading the graph of $f$ to a further shift to the South-East, where no equilibria are left at all.
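
As a rough numerical check of the three cases above, the following sketch solves the quadratic equation for the parameter values quoted in the picture. It assumes the coefficient form $0 = -\frac{1-\kappa}{\kappa}\epsilon + s - \frac{1-\kappa}{\kappa}\gamma s^2$; the names `kappa`, `gamma`, `eps` and the function `sell_off_equilibria` are labels chosen here for illustration, not notation taken from the text.

```python
import math


def sell_off_equilibria(eps, kappa=0.08, gamma=0.05):
    """Real roots s of 0 = -a*eps + s - a*gamma*s**2, with a = (1 - kappa)/kappa.

    The coefficient form is an assumption of this sketch.
    Returns the roots in increasing order, or an empty list if none exist.
    """
    a = (1.0 - kappa) / kappa
    # Discriminant of the equivalent equation a*gamma*s^2 - s + a*eps = 0.
    disc = 1.0 - 4.0 * a * a * gamma * eps
    if disc < 0.0:
        return []  # the curve never crosses the horizontal axis
    root = math.sqrt(disc)
    return [(1.0 - root) / (2.0 * a * gamma), (1.0 + root) / (2.0 * a * gamma)]


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.02, 0.05):
        print(eps, sell_off_equilibria(eps))
```

Under these assumptions, the leftmost root moves to the right as the shock rises from 1% to 2%, and no real root is left when the shock is 5%.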
13.6.3.3 Deleveraging

The assumption made so far is that following a shock affecting the assets in the balance sheet, financial institutions sell additional assets to the extent their top tier capital ratios are restored. This section investigates additional adjustments, aiming to preserve leverage ratio targets. Denote the leverage ratio as $L$. The timing of the shock and banks' reaction is as in Section 13.6.3.1, with the exception that we now have one additional constraint: at time 3, financial institutions also wish to call portions of their debt, $\Delta^{\ell}$ say, so as to comply with leverage ratio targets, and achieve the following balance sheet.
Balance sheet, Time 3
under a deleveraging scenario with deep liquidity buffers


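The term $\Delta^{s}$ is the usual sell-off needed to maintain top tier capital targets. The financial institutions also target an amount of deleveraging $\Delta^{\ell}$ that keeps their leverage ratio at $L$ after the asset value drop $\Delta A$, or:

$\Delta^{\ell} = L\,\Delta A$    (13.73)

and we initially assume that while doing so, they do not exhaust their liquidity buffers,

$C + \Delta^{s} \ge \Delta^{\ell}$    (13.74)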

Note that under the condition in (13.74), deleveraging does not have a price impact because it would imply banks are simply using cash to repay portions of their debt. In this case, $\Delta^{s}$ is the same as that in Eq. (13.70). If, instead, $C + \Delta^{s}$ is not large enough, financial institutions would have to sell additional assets, $\Delta^{s}_{\ell}$ say, so as to have sufficient cash with which to meet leverage ratio targets. Precisely, if the inequality in (13.74) does not hold, we need to have that, at least,

$\Delta^{s}_{\ell} : \Delta^{s}_{\ell} = \Delta^{\ell} - (\Delta^{s} + C)$    (13.75)

such that the balance sheet faced by the financial institutions at time 3 would be as in the following alternative scenario:
Balance sheet, Time 3
under a deleveraging scenario leading to exhausting liquidity buffers
To determine the feedback effects of the asset sales, $\Delta^{s}$ and $\Delta^{s}_{\ell}$, replace the top tier capital ratio condition, Eq. (13.66), and the leverage condition, Eq. (13.73), into Eq. (13.75), leaving,

$\Delta^{s}_{\ell} = \frac{(1+L)\kappa - 1}{\kappa}\Delta A - C$    (13.76)

Note that this expression is positive by assumption: we are assuming that over the deleveraging process, banks would exhaust their liquidity buffers to the extent the condition in (13.74) does not hold. Such a situation arises precisely due to a high leverage, $L$. For example, assuming $\kappa = 0.08$, the loading for $\Delta A$ in Eq. (13.76) is positive only when $L > 11.5$. We would need to observe values of $L$ larger than 11.5 (and state-dependent, i.e. depending on the realization of $\epsilon$), in order for the condition in (13.74) to break down.
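
The step leading to Eq. (13.76), and the threshold of 11.5, can be sketched as follows. The symbols are assumptions of this sketch rather than notation confirmed by the text: $\Delta A$ is the asset value drop, $\Delta^{s} = \frac{1-\kappa}{\kappa}\Delta A$ is the sell-off implied by the top tier capital ratio condition, $\Delta^{\ell} = L\,\Delta A$ is the deleveraging target, and $C$ denotes banks' cash and reserves.

```latex
\begin{align*}
\Delta^{s}_{\ell}
  &= \Delta^{\ell} - \left(\Delta^{s} + C\right)
   = L\,\Delta A - \frac{1-\kappa}{\kappa}\,\Delta A - C
   = \frac{(1+L)\kappa - 1}{\kappa}\,\Delta A - C, \\[4pt]
(1+L)\kappa - 1 > 0
  &\iff L > \frac{1-\kappa}{\kappa} = \frac{0.92}{0.08} = 11.5 .
\end{align*}
```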
We determine an equilibrium for the feedback loop in this market. Replace Eq. (13.76) into Eq. (13.67), evaluated when the asset sales amount to $\Delta^{s}_{\ell}$, i.e. $\Delta A = \epsilon + \gamma\Delta^{s}_{\ell}$, and solve for both the asset value drop and sales,

$\Delta A = \frac{\kappa(\epsilon - \gamma C)}{\kappa - \gamma((1+L)\kappa - 1)}$,  $\Delta^{s}_{\ell} = \frac{((1+L)\kappa - 1)\epsilon - \kappa C}{\kappa - \gamma((1+L)\kappa - 1)}$    (13.77)

To ensure an equilibrium exists, we need that $\gamma$, the slope of the line $\Delta^{s}_{\ell} \mapsto \Delta A$ in Eq. (13.67), be less than $\frac{\kappa}{(1+L)\kappa - 1}$, the slope of the line $\Delta^{s}_{\ell} \mapsto \Delta A$ in Eq. (13.76), or that the denominators in Eqs. (13.77) be strictly positive.
Finally, note that there is a third possibility available to banks: to sell additional assets even when the condition in (13.74) holds. This possibility is relevant when the financial institutions
also have concerns over maintaining a certain level of liquidity buffers, some of them possibly being mandatory. For example, if the liquidity target is at least $C$, we replace the inequality in (13.74) with the stricter inequality, $\Delta^{s} - \Delta^{\ell} \ge 0$. The solutions for the asset value drop and sales are the same as those in Eqs. (13.77), but with the terms involving $C$ being dropped, $\Delta A = \frac{\kappa}{\kappa - \gamma((1+L)\kappa - 1)}\epsilon$, and $\Delta^{s}_{\ell} = \frac{(1+L)\kappa - 1}{\kappa - \gamma((1+L)\kappa - 1)}\epsilon$. The next picture depicts the two multipliers of $\epsilon$ for $\Delta A$ and $\Delta^{s}_{\ell}$ arising in this case, assuming the top tier capital ratio target is $\kappa = 0.08$, and the price impact of asset sales is $\gamma = 0.05$.

[Figure: two panels, each plotted against Leverage, L (horizontal axis, from 12 to 28). Left panel: Shock. Right panel: Sales.]

This picture depicts the graph of the two multipliers of $\epsilon$ in Eqs. (13.77), as a function of banks' leverage, $L$, when the top tier capital ratio is $\kappa = 0.08$, and the price impact of banks' sell-offs is $\gamma = 0.05$. When banks have concerns over maintaining cash and reserves at their level before the shock, $C$, the overall asset value drop and sales each equal the corresponding multiplier times $\epsilon$. The left panel depicts the multiplier for the asset value drop and the right panel depicts that for the asset sales.

Note that given the assumption that $\kappa = 0.08$, the relevant range of variation for leverage is that for $L > 11.5$, as we are studying situations where deleveraging leads to partial exhaustion of liquidity buffers. The effects can be quite substantial.
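
To give an order of magnitude for these effects, the sketch below evaluates the two multipliers of the shock when the liquidity terms are dropped. It assumes the forms $\kappa/(\kappa - \gamma((1+L)\kappa - 1))$ for the asset value drop and $((1+L)\kappa - 1)/(\kappa - \gamma((1+L)\kappa - 1))$ for the asset sales; these expressions, and the names used in the code, are assumptions of this sketch.

```python
def deleveraging_multipliers(leverage, kappa=0.08, gamma=0.05):
    """Return (multiplier for the asset value drop, multiplier for asset sales).

    Assumed forms (hypothetical notation):
        loading = (1 + L) * kappa - 1
        drop    = kappa   / (kappa - gamma * loading)
        sales   = loading / (kappa - gamma * loading)
    """
    loading = (1.0 + leverage) * kappa - 1.0
    denominator = kappa - gamma * loading
    if loading <= 0.0:
        raise ValueError("leverage too low: liquidity buffers are not exhausted")
    if denominator <= 0.0:
        raise ValueError("leverage too high: no equilibrium exists")
    return kappa / denominator, loading / denominator


if __name__ == "__main__":
    for leverage in range(12, 30, 4):
        drop, sales = deleveraging_multipliers(leverage)
        print(f"L = {leverage:2d}: drop multiplier {drop:6.2f}, sales multiplier {sales:6.2f}")
```

With $\kappa = 0.08$ and $\gamma = 0.05$, the function is defined only for leverage strictly between 11.5 and 31.5 under these assumptions, and both multipliers grow quickly as leverage approaches the upper bound.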
13.6.4 Credit crunches and quantitative easing
[Show pictures of (i) the balance sheet of FED, ECB and BoE over the last years. (ii) the FED
fund rate, and EONIA, (iii) the corporate spreads]
13.6.4.1 The money multiplier

Loans make deposits! The well-known mechanics underlying the creation of money relies on the standard money multiplier, whereby new deposits made available to the banking system are partially used to extend new loans, which generate further deposits, and so on. Mathematically, the supply of money, say M1 aggregates, includes cash held by the public plus deposits, $M = C_p + D$, with straightforward notation. Instead, the monetary base, or high-powered money, is made of cash held by the public plus banks' cash and reserves, $H = C_p + C$, where
$C$ denotes cash and reserves held by banks, as in the previous sections. Note the leakage banks create over the circuit of money creation: because banks face liquidity needs possibly arising in the short-term (vis-à-vis their clients and/or other banks), and possibly need to maintain mandatory reserves with the central banks of the countries where they operate, they hoard $C$, which escapes the loans-make-deposits loop.
We assume that the ratios $C_p/D$ and $C/D$ are constant and equal to $c$ and $r$, respectively, such that money supply equals a money multiplier times monetary base, viz

$M = \frac{1+c}{c+r} H$    (13.78)

The value of $r$ depends on a variety of factors, such as: (i) the discount rate at which a central bank lends money to banks; (ii) the interbank rates such as the LIBOR; (iii) the level of mandatory reserves banks have to keep with their central banks; (iv) the interbank rate for resources to allocate to mandatory reserves (the Federal Funds rate in the US); (v) the risk/return tradeoff prevailing in other markets. Clearly, the value of $r$ increases with the values of the items from (i) to (iv), and decreases when the tradeoff in (v) compares more favourably to banks.
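
The loans-make-deposits loop behind Eq. (13.78) can be illustrated with a short sketch. It assumes that, in each round, the public splits incoming funds between cash and deposits in the proportion $c$ to 1, and that banks hoard a fraction $r$ of deposits and lend the rest; the function names and the numerical values are hypothetical.

```python
def money_multiplier(c, r):
    """Closed-form multiplier (1 + c) / (c + r), as in Eq. (13.78)."""
    return (1.0 + c) / (c + r)


def money_from_lending_rounds(base, c, r, rounds=10_000):
    """Accumulate cash plus deposits over successive lending rounds."""
    money = 0.0
    funds = base  # funds reaching the public in the current round
    for _ in range(rounds):
        money += funds                  # these funds are held as cash plus deposits
        deposits = funds / (1.0 + c)    # c units of cash are kept per unit of deposits
        funds = (1.0 - r) * deposits    # banks hoard r per unit of deposits, lend the rest
    return money


if __name__ == "__main__":
    base, c, r = 100.0, 0.10, 0.10
    print(money_from_lending_rounds(base, c, r), money_multiplier(c, r) * base)
```

The two printed numbers should agree up to rounding, which is the geometric-series argument behind the multiplier: each round returns a fraction $(1-r)/(1+c)$ of the previous round's funds to the loop.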
13.6.4.2 Policy actions

Conventional policy strategies consist of actions aiming to affect the value of $H$ and $r$. For example, central banks can expand $H$ through open market operations, by purchasing short-term Government bonds. Note that this action likely affects $r$ as well, as the opportunity costs of holding excess reserves increase as markets are flooded with more and more liquidity. There are, obviously, limits to this action, arising when short-term interest rates get close to zero. Consider the 2007 subprime events. We know that following a shock affecting the banks' books, the overall adjustments can be quite substantial. Consider, for example, the model in the previous section, where banks have concerns over the top tier capital ratio. After the shock takes place, banks aim for a shrinkage in the asset value equal to $\Delta^{s}$. Moreover, the shrinkage can be even more substantial, $\Delta^{s} + \Delta^{s}_{\ell}$, in markets where banks are highly levered, and concerned about not increasing their leverage even more as a result of the shock. The model is silent about which particular assets (liquid or not) or loans are involved in the shrinkage plans, $\Delta^{s}$ or $\Delta^{s} + \Delta^{s}_{\ell}$.
We define a credit crunch as the situation where banks decide to cut on corporate loans and bonds: they hold more reserves, instead of lending money to the real sector.
A quite mechanical response to a credit crunch is to increase $H$ through, say, open market operations. Put simply, a credit crunch entails a higher value of $r$, among other things. Monetary aggregates, $M$, are destroyed as a result, but can be restored through an injection of high-powered money. Precisely, by Eq. (13.78), the expansion of monetary base needed to maintain the same supply of money, $M$, is $\Delta H = \frac{M}{1+c}\Delta r$, where $\Delta r$ is the increase in $r$ determined by the credit crunch. This policy action is quite fundamental as it helps keep interest rates low, yet it may not be enough when the credit crunch is so severe as to lead to very substantial shrinkages in economic activity, as we now explain.
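
A minimal sketch of this offsetting injection follows, assuming the multiplier form $M = \frac{1+c}{c+r}H$ and the expansion $\Delta H = \frac{M}{1+c}\Delta r$. The function names and the numbers are illustrative assumptions, not values from the text.

```python
def money_supply(base, c, r):
    """Money supply implied by the multiplier (1 + c) / (c + r)."""
    return (1.0 + c) / (c + r) * base


def base_expansion_to_offset(money, c, delta_r):
    """Expansion of the monetary base that keeps money supply unchanged
    when banks' reserve ratio rises by delta_r."""
    return money * delta_r / (1.0 + c)


if __name__ == "__main__":
    base, c, r, delta_r = 100.0, 0.10, 0.10, 0.05
    money = money_supply(base, c, r)
    injection = base_expansion_to_offset(money, c, delta_r)
    # After the credit crunch and the injection, money supply is restored.
    print(money, money_supply(base + injection, c, r + delta_r))
```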
The effects of a credit crunch on the real sector of the economy are quite obvious, with an increase in the cost of capital and a subsequent shrinkage in economic activity. For example,
the recession following the subprime events was spectacular, with industrial production falling
by approximately 13% on a yearly basis in March 2009, the highest drop since World War II.
The policy action was equally impressive: in less than two years after the subprime events, the
FED was capable of pushing short-term rates close to zero although during those periods, it
was already clear that this policy would not be likely to prevent an even deeper recession. All in all, even if the Federal Funds rate and short-term rates on safe assets were close to zero, the cost of capital firms had to bear was quite substantial as a result of the credit crunch.23 Note that at that time, the credit crunch was also exacerbated by a freeze in the interbank lending market, arising from concerns financial institutions had about counterparty risk.24
The events following the 2007 turmoil can be described as those of a liquidity trap, where banks hold abundant liquidity and short-term rates are close to zero. Note that the nature of this liquidity trap is different from the standard Keynesian liquidity trap, as formalized in the Appendix of Chapter 1: the Keynesian trap arises when money demand is flat as a result of the expectations investors have that future interest rates can only increase. In this case, agents simply absorb any liquidity injections made by the monetary authority, and interest rates remain trapped at some minimum rate, coinciding with the lowest, shadow interest rate below which no investor is ready to bet on a further decrease. The liquidity trap we are analyzing is different, and stems from the mere and mechanical circumstance that money supply is so abundant as to have made short-term rates close to zero in the first place. However, in both cases, the economy is trapped in that a further increase in the monetary base would have no effects on short-term interest rates.
What was the initial policy reaction to this liquidity trap? Note that a further issue arising within the case we analyze in this section is that the liquidity trap is accompanied by a surge in corporate spreads, as a result of the credit crunch. Quantitative easing is a policy action that aims to reverse the shrinkage in credit supply, and possibly reduce these corporate spreads, so as to mitigate the adverse effects of the credit crunch on real economic activity. Consider the following balance sheets. The balance sheet on the left-hand side is that arising after financial institutions have reacted to a shock in their asset value, as formalized in previous sections. The portion of the sell-off that includes corporate loans and bonds is what we are terming credit crunch.
Balance sheets, Time 3 and beyond

The effects of quantitative easing can be seen through the balance sheet on the right-hand side. Banks can purchase new corporate bonds or extend new loans (securitized loans), which the central bank immediately purchases, leading to the left arrow of the previous diagram. The ideal, spontaneous, resolution of a credit crunch is when financial institutions are willing to extend corporate loans or purchase corporate bonds, to restore their credit shrinkage, at least partially, by some amount. However, we know this resolution cannot occur as long as the value of the assets remains depressed. The central bank could then step in to purchase the assets the banking system dislikes, to an extent equal to at least that amount, so as to leave banks with the excess reserves they wish and comply with their top tier capital ratio targets. This action is the essence of quantitative easing, with assets typically involved being ABS and even long-term
23 A high cost of capital to firms might occur during a credit crunch, as a result of one additional effect: a credit crunch leads to a contraction in aggregate demand, which makes defaults more likely.
24 A shrinkage in economic activity suggests a natural extension of the models in the previous section, with one additional element of procyclicality: as the real economy plummets as a result of the credit crunch, the value of the assets decreases even more, thereby deepening the credit crunch, over a vicious feedback loop. Note that this feedback mechanism would be, quite naturally, part of the financial accelerator hypothesis, although distinct from the mechanism mentioned at the beginning of this section, and surveyed by Bernanke, Gertler and Gilchrist (1999). In the version we suggest, the credit crunch is determined by concerns financial intermediaries have about their top tier capital ratios and leverage exposures, rather than agency problems occurring over their relationships with clients. We do not examine this additional source of procyclicality to keep the analysis as simple as we can.

bonds. Its effects include both (i) an increase in the monetary base, equal to the extent of the liquidity injection, and (ii) higher incentives given to banks to refinance the real sector, due to the liquidity buffers supplied by the central bank.

13.7 Appendix 1: Present values contingent on future bankruptcy


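The value of debt in Leland's (1994) model can be written as:

$D(A) = E\left[\int_0^{T_B} e^{-rt} C\,dt\right] + E\left[e^{-rT_B}\right](1-\alpha)A_B$    (13A.1)

where $T_B$ is the time at which the firm is liquidated. Eq. (13A.1) simply says that the value of debt equals the expected coupon payments plus the expected liquidation value of the bond. We have:

$E\left[e^{-rT_B}\right] = \int_0^{\infty} e^{-rt} f(t; A, A_B)\,dt \equiv p_B(A)$    (13A.2)

where $f(t; A, A_B)$ denotes the density of the first passage time from $A$ to $A_B$. It can be shown that $p_B(A)$ is exactly as in Eq. (13.12) of the main text. Similarly,

$E\left[\int_0^{T_B} e^{-rt} C\,dt\right] = \int_0^{\infty}\left(\int_0^{t} C e^{-ru}\,du\right) f(t; A, A_B)\,dt = \frac{C}{r}\int_0^{\infty}\left(1 - e^{-rt}\right) f(t; A, A_B)\,dt = \frac{C}{r}\left(1 - p_B(A)\right)$    (13A.3)

Replacing Eqs. (13A.2)-(13A.3) into Eq. (13A.1) yields Eq. (13.11).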

13.8 Appendix 2: Proof of selected results


Alternative derivation of Eq. (13.24). Under the risk-neutral probability, the expected change
of any bond price must equal zero when the safe short-term rate is zero,
()

+ (Rec

( )) =

=0

with

( )=

where the rst term, ( ) , reects the change in the bond price arising from the mere passage of time,
and (Rec
( )) is the expected change in the bond price, arising from the event of default, i.e. the
probability of a sudden default arrival, , times the consequent jump in the bond price, Rec
( ).
The solution to the previous equation is,
Z
Rec
+
(0) =
| {z }
0

=Pr{Default at }

which is Eq. (13.24).
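
As a numerical illustration of this argument, the sketch below integrates the zero-rate pricing equation backward from maturity and compares it with the closed-form solution. It assumes a constant default intensity `lam`, a bond with unit face value, and a recovery `rec` expressed as a fraction of face value paid at default; these conventions and all names are assumptions of this sketch.

```python
import math


def bond_price_closed_form(lam, rec, maturity):
    """Survival probability plus recovery times default probability (zero rates)."""
    survival = math.exp(-lam * maturity)
    return survival + rec * (1.0 - survival)


def bond_price_backward_euler_steps(lam, rec, maturity, steps=200_000):
    """Integrate dP/dt + lam * (rec - P) = 0 backward from P(maturity) = 1."""
    dt = maturity / steps
    price = 1.0
    for _ in range(steps):
        # dP/dt = lam * (P - rec), hence P(t - dt) is about P(t) - lam * (P(t) - rec) * dt.
        price -= lam * (price - rec) * dt
    return price


def spread(lam, rec, maturity):
    """Continuously compounded spread over a zero safe rate."""
    return -math.log(bond_price_closed_form(lam, rec, maturity)) / maturity


if __name__ == "__main__":
    lam, rec, maturity = 0.03, 0.40, 5.0
    print(bond_price_closed_form(lam, rec, maturity),
          bond_price_backward_euler_steps(lam, rec, maturity),
          spread(lam, rec, maturity))
```

For very short maturities the spread computed in this way approaches `lam * (1 - rec)`, the intensity times the loss given default.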

Proof of Eq. (13.25). The spread is given by:

Rec 1
1
ln
( )=
With

= 1, and Rec =
1
ln
( )=

or equivalently,

( )=
Therefore, if

ln

, then, lim

, we have,

1
+

( ) = , and if

ln

ln

, lim

( )= .

+1
)

13.9 Appendix 3: Transition probability matrices and pricing


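Consider the matrix of transition probabilities $P(\Delta t)$ over a small time interval $\Delta t$, with elements $P_{ij}(\Delta t)$, and write,

$P_{ij}(\Delta t) = \lambda_{ij}\Delta t$ for $i \neq j$, and $P_{ii}(\Delta t) = 1 + \lambda_{ii}\Delta t$    (13A.4)

We are defining the constants $\lambda_{ij}$ as if they were the counterparts of the intensity of the Poisson process in Eq. (13.22). Accordingly, these constants are simply interpreted as the instantaneous probabilities of migration from rating $i$ to rating $j$ over the time interval $\Delta t$. Naturally, for each $i$, we have that $\sum_{j=1}^{N} P_{ij}(\Delta t) = 1$, and using this into Eq. (13A.4), we obtain,

$\lambda_{ii} = -\sum_{j=1, j \neq i}^{N} \lambda_{ij}$    (13A.5)

The matrix $\Lambda$ containing the elements $\lambda_{ij}$ defined in Eqs. (13A.4) and (13A.5) is called the generating matrix.
Next, let us rewrite Eq. (13A.4) in matrix form, $P(\Delta t) = I + \Lambda\Delta t$. Suppose we have a time interval $[0, T]$, which we chop into $n$ pieces, so as to have $\Delta t = T/n$. We have, $P(T) = P(\Delta t)^n = (I + \Lambda T/n)^n$. For large $n$,

$P(T) = \exp(\Lambda T)$    (13A.6)

the matrix exponential, defined as, $\exp(\Lambda T) \equiv \sum_{n=0}^{\infty} \frac{(\Lambda T)^n}{n!}$.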
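
The limit behind Eq. (13A.6) can be checked numerically. The sketch below compares the compounded product $(I + \Lambda T/n)^n$, for a large $n$, with the truncated power series of the matrix exponential. The three-state generating matrix (two ratings plus an absorbing default state) is a made-up example, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transition_by_compounding(gen, horizon, doublings=20):
    """(I + gen * horizon / n)^n with n = 2**doublings, by repeated squaring."""
    n = len(gen)
    steps = 2 ** doublings
    p = [[(1.0 if i == j else 0.0) + gen[i][j] * horizon / steps for j in range(n)]
         for i in range(n)]
    for _ in range(doublings):
        p = mat_mul(p, p)
    return p


def transition_by_series(gen, horizon, terms=40):
    """Truncated power series of exp(gen * horizon)."""
    n = len(gen)
    scaled = [[gen[i][j] * horizon for j in range(n)] for i in range(n)]
    result, term = identity(n), identity(n)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]  # (gen*T)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Hypothetical generating matrix: each row sums to zero, last state is default.
GENERATOR = [[-0.11, 0.10, 0.01],
             [0.05, -0.15, 0.10],
             [0.00, 0.00, 0.00]]

if __name__ == "__main__":
    for row in transition_by_series(GENERATOR, 5.0):
        print([round(x, 4) for x in row])
```

Because each row of the generating matrix sums to zero, each row of the resulting transition matrix sums to one.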
To evaluate derivatives written on states, we proceed as follows. Suppose $F_i$ is the price of the derivative in state $i \in \{1, \cdots, N\}$. Suppose the Markov chain is the only source of uncertainty relevant for the evaluation of this derivative. Then,
+[

where {1
}, with the usual conditional probabilities. In words, the instantaneous change
, is the sum of two components: one,
, related to the mere passage of
in the derivative value,
], related to the discrete change arising from a change in the rating.
time, and the other, [
Suppose that = 0. Then,
(

=0=

)=

=1

6=

with the appropriate boundary conditions.


As an example, consider defaultable bonds. In this case, we may be looking for pricing functions
having the following form,
(
and then solve for

(
0

0=
=

)=

), for all
X
0
+

{1

)+1

}. Naturally, we have

[ (

6=

6=

)]
0

6=


which holds if and only if,
X
0
=
(
6=

That is,
(13A.6).

)=

X
6=

X
6=

6=

, which solved through the appropriate boundary conditions, yields precisely Eq.

13.10 Appendix 4: Bond spreads in markets with stochastic default intensity


We derive Eq. (13.38), by relying on the pricing formulae of Chapter 12. If the short-term rate is constant, the price of a defaultable bond derived in Section 13.4.7 of Chapter 12 can easily be extended to, with the notation of the present chapter,
R
Z

R
0
0
E0
E0
Rec ( )
(13A.7)
(
)=
+
0
|
{z
}
=Pr{Default (

+ )}

The term indicated inside the integral of the second term, is indeed the density of default time at ,
because,

R
0
(
)
=
1
E
default by time
such that by di erentiating with respect to , yields, under the appropriate regularity conditions, that
Pr{Default ( + )} is just the term indicated in Eq. (13A.7). So Eq. (13.38) follows. Naturally,
Pr{Default

Replacing this into Eq. (13A.7),


R
0
(
)=
E0
=1

LGD 1

+ Rec

)} =

surv (

surv (

(1

surv (

LGD)

)
surv (

where the second equality follows by integration by parts and the assumption of constant recovery
rates. Setting = 0, produces Eq. (13.39).

Appendix 5: Bond and CDS spreads


We provide some analytical details regarding the behavior of bond and CDS spreads depicted in Figure 13.14. First, we show that it holds, approximately, that bond spreads are dominated by CDS spreads. We have:
1

ln

)=

ln [1

LGD
LGD

LGD (1
surv (

1
4

surv (

1
P4

surv (

=1

))]

surv (

= 4 CDS0 ( )
Second, we show that approximately, bonds and CDS spreads are bounded away by a function
decreasing in time to maturity once the current is close to its long-term average under the riskneutral probability, . We illustrate this property while relying on arguments similar to those utilized
in Chapter
12 to address a related topic (see Section 12.3.4). For bond spreads, since E0 ( ) =
, we have, approximately, that:
+
1

ln

)=

LGD (1

1
ln 1 LGD 1

1
ln 1 LGD 1

ln [1

LGD
= LGD

0 E0 (

surv (

E0

))]

0 E0 (

) 1

Therefore, even if = , bond spreads are bounded away by a function decreasing in . Naturally, this property does not mechanically imply that bond spreads are decreasing in
too, although
the existence of such a bounding function helps this happening. As for the CDS spreads, we have,
approximately that:
4 CDS0 ( ) = LGD

1
4

1
P4

surv

=1

surv (

)
)

LGD

surv (

surv (

)
)

such that for = , CDS0 ( ) is bounded away by a function decreasing in


arguments made to for the bond spreads.

ln

surv

, due to the same

13.11 Appendix 6: Conditional probabilities of survival


We prove Eqs. (13.41)-(13.43). First, for (
by

) small, the numerator in Eq. (13.40) can be replaced

surv (

E0

and rescaled by . Regularity conditions under which we can perform this di erentiation can be found
in a related context developed in Mele (2003). Eqs. (13.41)-(13.42) follow.
As for Eq. (13.43), the proof follows the same lines of reasoning as that in Appendix 3 of Chapter 12. That is, we can define a density process,
R

R

0
)
surv (
F

R
( )=
) E
surv (

0
E
It is easy to show that the drift of

surv

( )
=
( )

is

, such that by Itos lemma,


( Vol (

surv (

)))

where,
Vol (

surv (

))

surv (
surv (

) p

where the second line follows by the closed-form expression of


a Brownian motion under
, where
p
=
+ (
)
and Eq. (13.43) follows.

=
surv

in Eq. (13.35). Therefore,

is

13.12 Appendix 7: Details regarding CDS index swaps and swaptions


Proof of Eq. (13.55). Note that the expectation of the rst term in Eq. (13.53), conditional on the
information set , for
0 is, now, for = 1 4 ,
LGD

1X

=1

= LGD

1X
=1

I{Surv

I{Surv
at

}E

at } I{Def

I{Def

)}

)}

= LGD

I{Def

)}

(13A.8)

where the last equality follows by the denition of the outstanding notional value in Eq. (13.56), and
the fact that the expectation in the rst equality is the same for each name , due to the assumption
that the index names have the same credit quality. Summing over the reset dates, = 1 4 ,
delivers the rst term in Eq. (13.55). The second term in Eq. (13.55) follows by elaborating the time
expectation of the second term in Eq. (13.53),
E

=E

I{Surv

at

I{Surv

at } I{ Surv at

|Surv at }

= I{Surv

at } E

I{ Surv

at

|Surv at }

and summing over the reset dates and all names, and using the fact that all names have the same
default intensities.
Proof of Eq. (13.57). The derivation of Eq. (13A.8) relies on default events occurring after the
swap origination, i.e. over the reset dates, after = 0 . In evaluating the front-end protection, we need
to price securities that pay o over defaults possibly occurring over the life of the swaption, i.e. before
time = 0 . We have,
F

=E

1
= LGD E
1

= LGD D (
= LGD

D(

X
I{Def

=1

) + LGD

1X

I{Surv

( (

=1

)+

)} + I{Surv

at

at } E

def

} I{Def

))

)}

I{ Surv

at

|Surv at }

where the third equality holds by the assumption that the names have the same credit quality, and
( + )
) and def (
)=E (
). Note that the rst term in the brackets
(
)=E (
of the second equality is, obviously, always zero, when the timing of possible defaults does not overlap
with the evaluation horizon, as for Eq. (13A.8).
Proofs regarding the survival contingent probability. We show that sc in Eq. (13.59)
F
in Eq. (13.58), is a martingale under
does integrate to one, and that CDX ( ) = LGD 01 +
1

sc . We rely on results that generalize the following equality:


E (

1X

)=E

=1

1X
=1

I{Surv

I{Surv

at

at

}E

h
E I{ Surv

at

|Surv at

} F

For example, regarding the survival contingent probability sc , we have that, under regularity
conditions,

R

E
=E E
1
1 F
R

( + )
=
E
1
=

where the second equality follows by Eq. (13.50) and the third by the denition of sc in Eq. (13.51).
sc () the time conditional expectation
As for the martingale property of CDX ( ) under sc , let E
operator under the the survival contingent probability sc . We have, using the denition of sc in Eq.
(13.59),

F
0
sc
sc
sc

E (CDX ( )) = LGD E
+E
1
1
R

R
F
1
0
1
sc
+E
= LGD E
1
1
1
1
R

1
1
F
E
E
+
= LGD
0
1

= LGD

0
1

+
1

where the last equality follows by the Law of Iterated Expectations and Eq. (13.52),

R

=E E
E
0
0 F
R

( + )
=E
0
=

13.13 Appendix 8: Modeling correlation with copulae functions


A. Statistical independence and correlation
Two random variables are always uncorrelated, provided they are independently distributed. Yet
there might be situations where two random variables are not correlated and still exhibit statistical
dependence. As an example, suppose a random variable relates to another, , through = 3 , for
},
some constant , and can take on 2 + 1 values, P{
1 0 1
1 P
1
3+
3 = 0 and yet,
)
(
)
(
)
and Pr { } = 2 1+1 . Then, we have that Cov (
=1
=1
and are obviously dependent. This example might be interpreted, economically, as one where and
are two returns on two asset classes. These two returns are not correlated, overall. Yet they comove in the same direction in both very bad and in very good times. This appendix is a succinct introduction to copulae, which are an important tool to cope with these issues.
Consider two random variables $Y_1$ and $Y_2$. We may relate $Y_1$ to another random variable $X_1$ and we may relate $Y_2$ to a second random variable $X_2$, on a percentile-to-percentile basis, viz

$F_i(y_i) = G_i(x_i)$,  $i = 1, 2$    (13A.8)

where $F_i$ are the cumulative marginal distributions of $Y_i$, and $G_i$ are the cumulative marginal distributions of $X_i$. That is, for each $y_i$, we look for the value of $x_i$ such that the percentiles arising through the mapping in Eq. (13A.8) are the same. Then, we may assume that $X_1$ and $X_2$ have a joint distribution and model the correlation between $Y_1$ and $Y_2$ through the correlation between $X_1$ and $X_2$. This indirect way to model the correlation between $Y_1$ and $Y_2$ is particularly helpful. It might be used to model the correlation of default times, as in the main text of this chapter. We now explain.

B. Copulae functions
We begin with the simple case of two random variables. This simple case shall be generalized to the multivariate one with a mere change in notation. Given two uniform random variables $U_1$ and $U_2$, consider the function $C(u_1, u_2) = \Pr(U_1 \le u_1, U_2 \le u_2)$, which is the joint cumulative distribution of the two uniforms. A copula function is any such function $C$, with the property of being capable of aggregating the marginals $F_i$ into a summary of them, in the following natural way:

$C(F_1(y_1), F_2(y_2)) = F(y_1, y_2)$    (13A.9)

where $F(y_1, y_2)$ is the joint distribution of $(Y_1, Y_2)$. Thus, a copula function is simply a cumulative bivariate distribution function, as $F_1(Y_1)$ and $F_2(Y_2)$ are obviously uniformly distributed. To prove Eq. (13A.9), note that

$C(F_1(y_1), F_2(y_2)) = \Pr(U_1 \le F_1(y_1), U_2 \le F_2(y_2))$
$= \Pr(F_1^{-1}(U_1) \le y_1, F_2^{-1}(U_2) \le y_2)$
$= \Pr(Y_1 \le y_1, Y_2 \le y_2)$
$= F(y_1, y_2)$    (13A.10)

That is, a copula function evaluated at the marginals $F_1(y_1)$ and $F_2(y_2)$ returns the joint distribution $F(y_1, y_2)$. In fact, Sklar (1959) proves that, conversely, any multivariate distribution function $F$ can be represented through some copula function.
The best known copula function is the Gaussian copula, which has the following form:

$C(u_1, u_2) = \Phi(\Phi_1^{-1}(u_1), \Phi_2^{-1}(u_2))$    (13A.11)

where $\Phi$ denotes the joint cumulative Normal distribution, and $\Phi_i$ denotes marginal cumulative Normal distributions. So we have,

$F(y_1, y_2) = C(F_1(y_1), F_2(y_2)) = \Phi(\Phi_1^{-1}(F_1(y_1)), \Phi_2^{-1}(F_2(y_2)))$    (13A.12)

where the first equality follows by Eq. (13A.10) and the second equality follows by Eq. (13A.11).
As an example, we may interpret $Y_1$ and $Y_2$ as the times by which two names default. A simple assumption is to set:

$F_i(y_i) = \Phi_i(x_i)$,  $i = 1, 2$    (13A.13)

for two random variables $X_i$ that are stretched as explained in Part A of this appendix. By replacing Eq. (13A.13) into Eq. (13A.12),

$F(y_1, y_2) = \Phi(x_1, x_2)$

This reasoning can be easily generalized to the $n$-dimensional case, where:

$F(y_1, \cdots, y_n) = C(F_1(y_1), \cdots, F_n(y_n)) = \Phi(x_1, \cdots, x_n)$, where $x_i : F_i(y_i) = \Phi_i(x_i)$

We use this approach to model default correlation among names, as explained in the main text, and in the next appendix.
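
A minimal sketch of this percentile-to-percentile mapping for two default times follows. It draws two correlated standard normal variables, maps them into uniforms through the normal distribution function, and inverts an assumed marginal distribution for each default time. The exponential marginals, the parameter values and the function names are assumptions made purely for illustration.

```python
import math
import random
from statistics import NormalDist


def correlated_default_times(rho, lam1, lam2, n, seed=0):
    """Sample n pairs of default times linked through a Gaussian copula.

    Marginals are assumed exponential with intensities lam1 and lam2.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Percentiles of the normals; capped to keep the logarithm finite.
        u1 = min(std_normal.cdf(x1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(x2), 1.0 - 1e-12)
        # Invert the exponential distribution F(y) = 1 - exp(-lam * y).
        pairs.append((-math.log(1.0 - u1) / lam1, -math.log(1.0 - u2) / lam2))
    return pairs


def sample_correlation(pairs):
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    v1 = sum((p[0] - m1) ** 2 for p in pairs) / n
    v2 = sum((p[1] - m2) ** 2 for p in pairs) / n
    return cov / math.sqrt(v1 * v2)


if __name__ == "__main__":
    draws = correlated_default_times(rho=0.8, lam1=0.1, lam2=0.1, n=20_000)
    print(sample_correlation(draws))
```

The marginal distribution of each default time is unaffected by `rho`; only the dependence between the two times changes.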

13.14 Appendix 9: Details on CDO pricing with imperfect correlation


We follow the copula approach to price the stylized CDOs in the main text of this chapter. For each name, create the following random variable,

$z_i = \sqrt{\rho}\,F + \sqrt{1-\rho}\,\epsilon_i$,  $i = 1, 2, 3$    (13A.14)

where $F$ is a common factor among the three names, $\epsilon_i$ is an idiosyncratic term, and $F \sim N(0, 1)$, $\epsilon_i \sim N(0, 1)$. Finally, $\rho \ge 0$ is meant to capture the default correlation among the names, as follows. Assume that the risk-neutral probability each firm defaults, by $T$, is given by,

$Q(T) = \Phi(z_{0.10}) = 10\%$

where $\Phi$ is the cumulative distribution of a standard normal variable. That is, by time $T$, each firm defaults any time that,

$z_i < \Phi^{-1}(10\%) \equiv z_{0.10}$

Therefore, $\rho$ is the default correlation among the assets in the CDO.

We can now simulate Eq. (13A.14), build up payoffs for each simulation, and price the tranches by just averaging over the simulations, as explained below. Naturally, the same simulation technique can be used to price tranches on CDOs with an arbitrary number of assets. Precisely, simulate Eq. (13A.14), and obtain values $z_{i,s}$, $s = 1, \cdots, S$, where $S$ is the number of simulations and $i = 1, 2, 3$. At simulation no. $s$, we have $\{z_{1,s}, z_{2,s}, z_{3,s}\}$. We use the previously simulated values as follows:

For each simulation $s$, count the number of defaults across the three names, defined as the number of times that $z_{i,s} < z_{0.10}$, for $i = 1, 2, 3$. Denote the number of defaults as of simulation $s$ with $\mathrm{Def}_s$.

For each simulation $s$, compute the total realized payoff of the asset pool, defined as,

$\Pi_s = \mathrm{Def}_s \cdot 40 + (3 - \mathrm{Def}_s) \cdot 100$

For each simulation $s$, compute recursively the payoffs to each tranche,

$\Pi_{i,s} = \min\left\{\max\left\{\Pi_s - \sum_{j=1}^{i-1} N_j,\, 0\right\},\, N_i\right\}$

where $N_i$ is the nominal value of each tranche ($N_1 = 140$, $N_2 = 90$, $N_3 = 70$).

Estimate the price of each tranche by averaging across the simulations,

Price Senior $= \frac{1}{S}\sum_{s=1}^{S}\Pi_{1,s}$,  Price Mezzanine $= \frac{1}{S}\sum_{s=1}^{S}\Pi_{2,s}$,  Price Junior $= \frac{1}{S}\sum_{s=1}^{S}\Pi_{3,s}$

Note, the previous computations have to be performed under the risk-neutral probability $Q$. Using the physical probability in the previous algorithm can only lead to something useful for risk-management and VaR calculations at best.
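
The steps above can be sketched as follows. The sketch assumes three names, a one-factor structure with correlation parameter `rho`, a 10% risk-neutral default probability, a payoff of 40 for a defaulted name and 100 for a surviving one, tranche nominals of 140, 90 and 70 paid in order of seniority, and no discounting, in line with the simple averaging described in the text. The function name and its arguments are hypothetical.

```python
import math
import random
from statistics import NormalDist


def price_tranches(rho, n_sims=50_000, seed=0, p_default=0.10,
                   default_payoff=40.0, survival_payoff=100.0,
                   nominals=(140.0, 90.0, 70.0)):
    """Monte Carlo tranche prices (senior first) and the average pool payoff."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(p_default)  # a name defaults when z < threshold
    tranche_totals = [0.0] * len(nominals)
    pool_total = 0.0
    for _ in range(n_sims):
        common = rng.gauss(0.0, 1.0)  # factor shared by the three names
        defaults = 0
        for _name in range(3):
            z = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            if z < threshold:
                defaults += 1
        pool = defaults * default_payoff + (3 - defaults) * survival_payoff
        pool_total += pool
        paid_to_seniors = 0.0
        for i, nominal in enumerate(nominals):
            # Each tranche receives what is left after more senior nominals, up to its own.
            tranche_totals[i] += min(max(pool - paid_to_seniors, 0.0), nominal)
            paid_to_seniors += nominal
    return [total / n_sims for total in tranche_totals], pool_total / n_sims


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.9):
        prices, pool = price_tranches(rho)
        print(rho, [round(p, 2) for p in prices], round(pool, 2))
```

Because the nominals add up to the maximum pool payoff, the three tranche prices sum to the average pool payoff in every run; what changes with `rho` is how that value is split across tranches.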
Note, this model can be generalized to a multifactor model where,

$z_i = \beta_{i1}F_1 + \cdots + \beta_{id}F_d + \sqrt{1 - \beta_{i1}^2 - \cdots - \beta_{id}^2}\,\epsilon_i$

with obvious notation.

References
Amato, J. D. (2005): Risk Aversion and Risk Premia in the CDS Market. BIS Quarterly
Review, September, 55-68.
Anderson, R. W. and S. Sundaresan (1996): Design and Valuation of Debt Contracts. Review
of Financial Studies 9, 37-68.
Artzner, P., F. Delbaen, J.-M. Eber, and D. Heath (1999): Coherent Measures of Risk.
Mathematical Finance 9, 203-228.
Bernanke, B. S., M. Gertler and S. Gilchrist (1999): The Financial Accelerator in a Quantitative Business Cycle Framework. In J. B. Taylor and M. Woodford (Eds.): Handbook of
Macroeconomics, Vol. 1C, Chapter 21, 1341-1393.
Berndt, A., R. Douglas, D. Duffie, M. Ferguson and D. Schranz (2005): Measuring Default Risk-Premia from Default Swap Rates and EDFs. BIS Working Papers no. 173.
Black, F. (1976): The Pricing of Commodity Contracts. Journal of Financial Economics 3,
167-179.
Black, F. and J. Cox (1976): Valuing Corporate Securities: Some Effects of Bond Indenture Provisions. Journal of Finance 31, 351-367.
Black, F. and M. Scholes (1973): The Pricing of Options and Corporate Liabilities. Journal
of Political Economy 81, 637-659.
Borio, C., C. Furne, and P. Lowe (2001): Procyclicality of the Financial System and Financial
Stability: Issues and Policy Options. Bank for International Settlements working paper
no. 1.
Broadie, M., M. Chernov and S. Sundaresan (2007): Optimal Debt and Equity Values in the
Presence of Chapter 7 and Chapter 11. Journal of Finance 62, 1341-1377.
Christo ersen, P. F. (2003): Elements of Financial Risk Management. Academic Press.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985): A Theory of the Term Structure of Interest
Rates. Econometrica 53, 385-407.
Danielsson, J., Shin, H. S. and J.-P. Zigrand (2011): Balance Sheet Capacity and Endogenous
Risk. Working paper London School of Economics and Princeton.
Du e, D. and D. Lando (2001): Term Structure of Credit Spreads with Incomplete Accounting Information. Econometrica 69, 633-664.
Fender, I. and P. Hordahl (2007): Overview: Credit Retrenchement Triggers Liquidity Squeeze.
BIS Quarterly Review (September), 1-16.
Fisher, I. (1933): The Debt-Deation Theory of Great Depressions. Econometrica 1, 337-57.
Friedman, M. and A. J. Schwartz (1963): A Monetary History of the United States: 1867-1960.
Princeton, NJ: Princeton University Press.
814

13.14. Appendix 9: Details on CDO pricing with imperfect correlation

c
by
A. Mele

Greenspan, A. (1998): The Role of Capital in Optimal Banking Supervision and Regulation.
FRBNY Economic Policy Review, October, 163-168.
Hull, J. C. (2007): Risk Management and Financial Institutions. Pearson Education International.
Ingersoll, J. E. (1977): A Contingent-Claims Valuation of Convertible Securities. Journal of
Financial Economics 5, 289-321.
International Monetary Fund, (2008): Global Financial Stability Report. April 2008.
Jamshidian, F. (1989): An Exact Bond Option Pricing Formula. Journal of Finance 44,
205-209.
Jarrow, R. A., D. Lando and S. M. Turnbull (1997): A Markov Model for the Term-Structure
of Credit Risk Spreads. Review of Financial Studies 10, 481-523.
Jorion, Ph. (2008): Value at Risk. New York: McGraw Hill.
Lando, David, 2004. Credit Risk ModelingTheory and Applications. Princeton: Princeton
University Press.
Leland, H. E. (1994): Corporate Debt Value, Bond Covenants and Optimal Capital Structure. Journal of Finance 49, 1213-1252.
Leland, H. E. and K. B. Toft (1994): Optimal Capital Structure, Endogenous Bankruptcy,
and the Term Structure of Credit Spreads. Journal of Finance 51, 987-1019.
Lopez, J. (2004): The Empirical Relationship Between Average Asset Correlation, Firm Probability of Default and Asset Size. Journal of Financial Intermediation 13, 265-283.
McDonald, R. L. (2006): Derivatives Markets, Boston: Pearson International Edition.
Mele, A. (2003): Fundamental Properties of Bond Prices in Models of the Short-Term Rate.
Review of Financial Studies 16, 679-716.
Merton, R. C. (1974): On the Pricing of Corporate Debt: The Risk-Structure of Interest
Rates. Journal of Finance 29, 449-470.
Modigliani, F. and M. Miller (1958): The Cost of Capital, Corporation Finance and the
Theory of Investment. American Economic Review 48, 261-297.
Morini, M. and D. Brigo (2011): No-Armageddon Arbitrage-Free Equivalent Measure for
Index Options in a Credit Crisis. Mathematical Finance 21, 573-593.
Rutkowski, M. and A. Armstrong (2009): Valuation of Credit Default Swaptions and Credit
Default Index Swaptions. International Journal of Theoretical and Applied Finance 12,
1027-1053.
Schonbucher, Ph J., 2003. Credit Risk Pricing ModelsModels, Pricing and Implementation.
Chichester, UK: Wiley Finance.
815

13.14. Appendix 9: Details on CDO pricing with imperfect correlation

c
by
A. Mele

Shin, H. S. (2010): Risk and Liquidity. Clarendon Lectures in Finance, Oxford University
Press.
Sklar, A. (1959): Fonction de Repartition a` dimensions et Leurs Marges. Publications de
lInstitut Statistique de lUniversite de Paris 8, 229-231.
Vasicek, O. (1987): Probability of Loss on Loan Portfolio. Working paper KMV, published
in: Risk (December 2002) under the title Loan Portfolio Value.

816

You might also like