Lectures in Financial Economics

Lecture Notes in Financial Economics
© by Antonio Mele
London School of Economics & Political Science
November 2010
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
I Foundations 13
1 The classic capital asset pricing model 14

1.1 Portfolio selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.1.1 The wealth constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.1.2 Portfolio choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1.3 Without the safe asset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.4 The market portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2 The CAPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 The APT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.1 A first derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 The APT with idiosyncratic risk and a large number of assets . . . . . . 23
1.3.3 Empirical evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4 Appendix 1: Some analytical details for portfolio choice . . . . . . . . . . . . . . 26
1.4.1 The primal program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4.2 The dual program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5 Appendix 2: The market portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.5.1 The tangent portfolio is the market portfolio . . . . . . . . . . . . . . . . 29
1.5.2 Tangency condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.6 Appendix 3: An alternative derivation of the SML . . . . . . . . . . . . . . . . . 31
1.7 Appendix 4: Broader definitions of risk - Rothschild and Stiglitz theory . . . . . 32
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2 The CAPM in general equilibrium 35

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 The static general equilibrium in a nutshell . . . . . . . . . . . . . . . . . . . . . 35

2.2.1 Walras’ Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.2 Competitive equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.3 Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3 Time and uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4 Financial assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5 Absence of arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5.1 How to price a financial asset? . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5.2 The Land of Cockaigne . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.6 Equivalent martingales and equilibrium . . . . . . . . . . . . . . . . . . . . . . . 48
2.6.1 The rational expectations assumption . . . . . . . . . . . . . . . . . . . . 49
2.6.2 Stochastic discount factors . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6.3 Optimality and equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.7 Consumption-CAPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.7.1 The risk premium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.7.2 The beta relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.7.3 CCAPM & CAPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.8 Infinite horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.9 Further topics on incomplete markets . . . . . . . . . . . . . . . . . . . . . . . . 56
2.9.1 Nominal assets and real indeterminacy of the equilibrium . . . . . . . . . 56
2.9.2 Nonneutrality of money . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.10 Appendix 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.11 Appendix 2: Proofs of selected results . . . . . . . . . . . . . . . . . . . . . . . . 59
2.12 Appendix 3: The multicommodity case . . . . . . . . . . . . . . . . . . . . . . . 62
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3 Infinite horizon economies 65

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2 Consumption-based asset evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.1 Recursive plans: introduction . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.2 The marginalist argument . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.2.3 Lucas’ model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.2.4 Arrow-Debreu state prices, the CCAPM and the CAPM . . . . . . . . . 69
3.3 Production: foundational issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3.1 Decentralized economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3.2 Centralized economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3.3 Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3.4 Stochastic economies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.4 Production-based asset pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.4.1 Firms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.4.2 Consumers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4.3 Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.5 Money, production, asset prices, and overlapping generations models . . . . . . . 83
3.5.1 Introduction: endowment economies . . . . . . . . . . . . . . . . . . . . . 83

3.5.2 Diamond’s model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.5.3 Money . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.5.4 Money in a model with real shocks . . . . . . . . . . . . . . . . . . . . . 90
3.6 Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.6.1 Models with productive capital . . . . . . . . . . . . . . . . . . . . . . . 91
3.6.2 Models with money . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.7 Appendix 1: Finite difference equations, with economic applications . . . . . . . 95
3.8 Appendix 2: Neoclassic growth model - continuous time . . . . . . . . . . . . . . 99
3.8.1 Convergence results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.8.2 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4 Continuous time models 104

4.1 Lambdas and betas in continuous time . . . . . . . . . . . . . . . . . . . . . . . 104
4.1.1 The pricing equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.1.2 Expected returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.1.3 Expected returns and risk-adjusted discount rates . . . . . . . . . . . . . 105
4.2 An introduction to continuous time methods in finance . . . . . . . . . . . . . . 106
4.2.1 Partial differential equations and Feynman-Kac probabilistic representations of the solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.2.2 The Girsanov theorem with applications to finance . . . . . . . . . . . . 110
4.3 An introduction to arbitrage and equilibrium in continuous time models . . . . . 112
4.3.1 A “reduced-form” economy . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.3.2 Preferences and equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.3.3 Bubbles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.3.4 Reflecting barriers and absence of arbitrage . . . . . . . . . . . . . . . . 117
4.4 Martingales and arbitrage in a diffusion model . . . . . . . . . . . . . . . . . . . 118
4.4.1 The information framework . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.4.2 Viability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.4.3 Market completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.5 Equilibrium with a representative agent . . . . . . . . . . . . . . . . . . . . . . . 122
4.5.1 Consumption and portfolio choices: martingale approaches . . . . . . . . 122
4.5.2 The older, Merton’s approach: dynamic programming . . . . . . . . . . . 125
4.5.3 Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.5.4 Continuous-time Consumption-CAPM . . . . . . . . . . . . . . . . . . . 127
4.6 Market imperfections and portfolio choice . . . . . . . . . . . . . . . . . . . . . 127
4.7 Jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.7.1 Poisson jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.7.2 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.7.3 Properties and related distributions . . . . . . . . . . . . . . . . . . . . . 130
4.7.4 Some asset pricing implications . . . . . . . . . . . . . . . . . . . . . . . 131
4.7.5 An option pricing formula . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.8 Continuous-time Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

4.9 Appendix 1: Convergence issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.10 Appendix 2: An introduction to stochastic calculus for finance . . . . . . . . . . 135
4.10.1 Stochastic integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.10.2 Stochastic differential equations . . . . . . . . . . . . . . . . . . . . . . . 144
4.11 Appendix 3: Proof of selected results . . . . . . . . . . . . . . . . . . . . . . . . 150
4.11.1 Proof of Theorem 4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.11.2 Proof of Eq. (4.48). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.11.3 Walras’s consistency tests . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.12 Appendix 4: The Green’s function . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.12.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.12.2 The PDE connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.13 Appendix 5: Portfolio constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.14 Appendix 6: Models with final consumption only . . . . . . . . . . . . . . . . . . 156
4.15 Appendix 7: Further topics on jumps . . . . . . . . . . . . . . . . . . . . . . . . 158
4.15.1 The Radon-Nikodym derivative . . . . . . . . . . . . . . . . . . . . . . . 158
4.15.2 Arbitrage restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.15.3 State price density: introduction . . . . . . . . . . . . . . . . . . . . . . . 159
4.15.4 State price density: general case . . . . . . . . . . . . . . . . . . . . . . . 160
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5 Taking models to data 163

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.2 Data generating processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.2.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.2.2 Restrictions on the DGP . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
5.2.3 Parameter estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.2.4 Basic properties of density functions . . . . . . . . . . . . . . . . . . . . 165
5.2.5 The Cramer-Rao lower bound . . . . . . . . . . . . . . . . . . . . . . . . 166
5.3 Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.3.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.3.2 Factorizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.3.3 Asymptotic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.4 M-estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.5 Pseudo, or quasi, maximum likelihood . . . . . . . . . . . . . . . . . . . . . . . 170
5.6 GMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.7 Simulation-based estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
5.7.1 Three simulation-based estimators . . . . . . . . . . . . . . . . . . . . . . 175
5.7.2 Asymptotic normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.7.3 A fourth simulation-based estimator: Simulated maximum likelihood . . 180
5.7.4 Advances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
5.7.5 In practice? Latent factors and identification . . . . . . . . . . . . . . . . 181
5.8 Asset pricing, prediction functions, and statistical inference . . . . . . . . . . . . 182

5.10 Appendix 2: Collected notions and results . . . . . . . . . . . . . . . . . . . . . 187
5.11 Appendix 3: Theory for maximum likelihood estimation . . . . . . . . . . . . . . 190
5.12 Appendix 4: Dependent processes . . . . . . . . . . . . . . . . . . . . . . . . . . 191
5.12.1 Weak dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
5.12.2 The central limit theorem for martingale differences . . . . . . . . . . . . 191
5.12.3 Applications to maximum likelihood . . . . . . . . . . . . . . . . . . . . 191
5.13 Appendix 5: Proof of Theorem 5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . 193
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
II Asset pricing and reality 197

6 On kernels and puzzles 198
6.1 A single factor model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.1.1 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.1.2 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.2 The equity premium puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.3 Hansen-Jagannathan cup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
6.4 Multifactor extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.4.1 Exponential affine pricing kernels . . . . . . . . . . . . . . . . . . . . . . 204
6.4.2 Lognormal returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.5 Pricing kernels and Sharpe ratios . . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.5.1 Market portfolios and pricing kernels . . . . . . . . . . . . . . . . . . . . 207
6.5.2 Pricing kernel bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.5.3 The Roll’s critique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.6 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
7 The stock market 216

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.2 The empirical evidence: bird’s eye view . . . . . . . . . . . . . . . . . . . . . . . 216
7.3 Volatility: a business cycle perspective . . . . . . . . . . . . . . . . . . . . . . . 222
7.3.1 Volatility cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7.3.2 Understanding the empirical evidence . . . . . . . . . . . . . . . . . . . . 224
7.3.3 What to do with stock market volatility? . . . . . . . . . . . . . . . . . . 229
7.3.4 What did we learn? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
7.4 Rational stock market fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.4.1 A decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.4.2 Asset prices and state variables . . . . . . . . . . . . . . . . . . . . . . . 235
7.4.3 Volatility, options and convexity . . . . . . . . . . . . . . . . . . . . . . . 237
7.5 Two economies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.5.1 External habit formation . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
7.5.2 Large price swings as a learning induced phenomenon . . . . . . . . . . . 244

7.6 The cross section of stock returns and volatilities . . . . . . . . . . . . . . . . . 248
7.7 Appendix 1: Calibration of the tree in Section 7.3 . . . . . . . . . . . . . . . . . 249
7.8 Appendix 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
7.8.1 Markov pricing kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
7.8.2 Arrow-Debreu PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
7.9 Appendix 3: The maximum principle . . . . . . . . . . . . . . . . . . . . . . . . 253
7.10 Appendix 4: Dynamic stochastic dominance . . . . . . . . . . . . . . . . . . . . 255
7.11 Appendix 5: Proofs of selected results . . . . . . . . . . . . . . . . . . . . . . . . 256
7.12 Appendix 6: Convexity of bond prices revisited . . . . . . . . . . . . . . . . . . . 258
7.13 Appendix 7: External habit formation in continuous time . . . . . . . . . . . . . 259
7.14 Appendix 8: Simulation of discrete-time pricing models . . . . . . . . . . . . . . 260
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
8 Tackling the puzzles 265

8.1 Non-expected utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
8.1.1 The recursive formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 265
8.1.2 Testable restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
8.1.3 Equilibrium risk premiums and interest rates . . . . . . . . . . . . . . . . 267
8.1.4 Campbell-Shiller approximation . . . . . . . . . . . . . . . . . . . . . . . 268
8.1.5 Risks for the long-run . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
8.2 Limited stock market participation . . . . . . . . . . . . . . . . . . . . . . . . . 269
8.3 “Catching up with the Joneses” in a heterogeneous agents economy . . . . . . . 270
8.4 Volatility, and leverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
8.4.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
8.5 Appendix 1: Non-expected utility . . . . . . . . . . . . . . . . . . . . . . . . . . 278
8.5.1 Detailed derivation of optimality conditions and selected relations . . . . 278
8.5.2 Details for the risks for the long-run . . . . . . . . . . . . . . . . . . . . 280
8.5.3 Continuous time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
8.6 Appendix 2: Economies with heterogeneous agents . . . . . . . . . . . . . . . . . 282
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
9 Information and other market frictions 287

9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
9.2 Prelude: imperfect information in macroeconomics . . . . . . . . . . . . . . . . . 287
9.3 Grossman-Stiglitz paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.4 Noisy rational expectations equilibrium . . . . . . . . . . . . . . . . . . . . . . . 290
9.4.1 Differential information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.4.2 Asymmetric information . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.4.3 Information acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.5 Strategic trading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.6 Dealers markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.7 Noise traders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.8 Demand-based derivative prices . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

9.8.1 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.8.2 Preferred habitat and the yield curve . . . . . . . . . . . . . . . . . . . . 290
9.9 Over-the-counter markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
III Applied asset pricing theory 292

10 Options and volatility 293
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
10.2 Forwards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
10.2.1 Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
10.2.2 Forwards as a means to borrow money . . . . . . . . . . . . . . . . . . . 293
10.2.3 A pricing formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
10.2.4 Forwards and volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
10.3 Options: no-arb bounds, convexity and hedging . . . . . . . . . . . . . . . . . . 294
10.4 Evaluation and hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
10.4.1 Spanning and cloning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
10.4.2 Black & Scholes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
10.4.3 Surprising cancellations and “preference-free” formulae . . . . . . . . . . 302
10.4.4 Hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
10.4.5 Endogenous volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
10.4.6 Properties of options in diffusive models . . . . . . . . . . . . . . . . . . 304
10.5 Stochastic volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.5.1 Statistical models of changing volatility . . . . . . . . . . . . . . . . . . . 306
10.5.2 Implied volatility and smiles . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.5.3 Stochastic volatility and market incompleteness . . . . . . . . . . . . . . 309
10.5.4 Trading volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
10.5.5 Pricing formulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
10.6 Local volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
10.6.1 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
10.6.2 How does it work? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.7 Variance swaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
10.7.1 Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
10.7.2 Forward volatility trading . . . . . . . . . . . . . . . . . . . . . . . . . . 319
10.7.3 Marking to market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
10.7.4 Stochastic interest rates . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
10.7.5 Hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
10.8 American options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
10.8.1 Real options theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
10.8.2 Perpetual puts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
10.8.3 Perpetual calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
10.9 A few exotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

10.10 Market imperfections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
10.11 Appendix 1: Additional details on Black & Scholes . . . . . . . . . . . . . . . . 325
10.11.1 The original arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
10.11.2 Delta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
10.12 Appendix 2: Stochastic volatility . . . . . . . . . . . . . . . . . . . . . . . . . . 326
10.12.1 Proof of the Hull and White (1987) equation . . . . . . . . . . . . . . . . 326
10.12.2 Simple smile analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
10.13 Appendix 3: Local volatility and volatility contracts . . . . . . . . . . . . . . . . 327
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11 Interest rates 333

11.1 Prices and interest rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
11.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
11.1.2 Markets and interest rate conventions . . . . . . . . . . . . . . . . . . . . 334
11.1.3 The yield curve and forward rates . . . . . . . . . . . . . . . . . . . . . . 336
11.1.4 The expectation theory, and stylized facts of US term structure . . . . . 339
11.1.5 Forward martingale probabilities . . . . . . . . . . . . . . . . . . . . . . 342
11.2 Common factors affecting the yield curve . . . . . . . . . . . . . . . . . . . . . . 344
11.2.1 Methodological details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
11.2.2 The empirical facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
11.3 Models of the short-term rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
11.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
11.3.2 The basic bond pricing equation . . . . . . . . . . . . . . . . . . . . . . . 348
11.3.3 Some famous univariate short-term rate models . . . . . . . . . . . . . . 351
11.3.4 Multifactor models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
11.3.5 Affine and quadratic term-structure models . . . . . . . . . . . . . . . . 358
11.3.6 Short-term rates as jump-diffusion processes . . . . . . . . . . . . . . . . 360
11.3.7 Estimation strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
11.4 No-arbitrage models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.4.1 Fitting the yield-curve, perfectly . . . . . . . . . . . . . . . . . . . . . . . 365
11.4.2 Ho & Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
11.4.3 Hull & White . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
11.4.4 Critiques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
11.5 The Heath-Jarrow-Morton framework . . . . . . . . . . . . . . . . . . . . . . . . 368
11.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
11.5.2 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
11.5.3 The dynamics of the short-term rate . . . . . . . . . . . . . . . . . . . . 370
11.5.4 Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
11.6 Stochastic string shocks models . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
11.6.1 Addressing stochastic singularity . . . . . . . . . . . . . . . . . . . . . . 372
11.6.2 No-arbitrage restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
11.7 Interest rate derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
11.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

11.7.2 The put-call parity in fixed income markets . . . . . . . . . . . . . . . . 375
11.7.3 European options on bonds . . . . . . . . . . . . . . . . . . . . . . . . . 375
11.7.4 Related fixed income products . . . . . . . . . . . . . . . . . . . . . . . . 379
11.7.5 Market models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
11.8 Appendix 1: The FTAP for bond prices . . . . . . . . . . . . . . . . . . . . . . . 391
11.9 Appendix 2: Certainty equivalent interpretation of forward prices . . . . . . . . 393
11.10 Appendix 3: Additional results on T-forward martingale probabilities . . . . . . 394
11.11 Appendix 4: Principal components analysis . . . . . . . . . . . . . . . . . . . . . 395
11.12 Appendix 5: A few analytics for the Hull and White model . . . . . . . . . . . . 396
11.13 Appendix 6: Expectation theory and embedding in selected models . . . . . . . 397
11.14 Appendix 7: Additional results on string models . . . . . . . . . . . . . . . . . . 399
11.15 Appendix 8: Changes of numéraire . . . . . . . . . . . . . . . . . . . . . . . . . 400
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
12 Risky debt and credit derivatives 406

12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
12.2 The classics: Modigliani-Miller irrelevance results . . . . . . . . . . . . . . . . . 406
12.3 Conceptual approaches to valuation of defaultable securities . . . . . . . . . . . 408
12.3.1 Firm’s value, or structural, approaches . . . . . . . . . . . . . . . . . . . 408
12.3.2 Reduced form approaches: rare events, or intensity, models . . . . . . . . 417
12.3.3 Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
12.4 Derivatives on corporate assets . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
12.4.1 Callable and puttable bonds . . . . . . . . . . . . . . . . . . . . . . . . . 425
12.4.2 Convertibles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
12.5 (Credit-) risk shifting derivatives and structured products . . . . . . . . . . . . . 429
12.5.1 Securitization, and a brief history of credit risk and financial innovation . 429
12.5.2 Total Return Swaps (TRS) . . . . . . . . . . . . . . . . . . . . . . . . . . 432
12.5.3 Spread Options (SOs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
12.5.4 Credit spread options (CSOs) . . . . . . . . . . . . . . . . . . . . . . . . 433
12.5.5 Credit Default Swaps (CDS) . . . . . . . . . . . . . . . . . . . . . . . . . 433
12.5.6 Collateralized Debt Obligations (CDOs) . . . . . . . . . . . . . . . . . . 444
12.5.7 One stylized numerical example of a structured product . . . . . . . . . . 453
12.6 A few hints on the risk-management practice . . . . . . . . . . . . . . . . . . . . 460
12.6.1 Value at Risk (VaR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
12.6.2 Backtesting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
12.6.3 Stress testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
12.6.4 Credit risk and VaR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
12.7 Appendix 1: Present values contingent on future bankruptcies . . . . . . . . . . 468
12.9 Appendix 3: Details on transition probability matrixes and pricing . . . . . . . . 470
12.10 Appendix 4: Derivation of bond spreads with stochastic default intensity . . . . 472
12.11 Appendix 5: Conditional probabilities of survival . . . . . . . . . . . . . . . . . . 473
12.12 Appendix 6: Modeling correlation with copulae functions . . . . . . . . . . . . . 474

12.13 Appendix 7: Details on CDO pricing with imperfect correlation . . . . . . . . . 476
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
13 Financial engineering and fixed income securities 479

13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
13.1.1 Relative pricing in fixed income markets . . . . . . . . . . . . . . . . . . 479
13.1.2 Complexity of fixed income securities . . . . . . . . . . . . . . . . . . . . 479
13.1.3 Many evaluation paradigms . . . . . . . . . . . . . . . . . . . . . . . . . 480
13.2 Bootstrapping and curve fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
13.2.1 Extracting zeros from bond prices . . . . . . . . . . . . . . . . . . . . . . 480
13.2.2 Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
13.2.3 Curve fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
13.3 Duration, convexity and asset liability management . . . . . . . . . . . . . . . . 483
13.3.1 Duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
13.3.2 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
13.3.3 Duration and asset-liability management . . . . . . . . . . . . . . . . . . 485
13.4 Foundational issues on interest rate modeling . . . . . . . . . . . . . . . . . . . 492
13.4.1 Tree representation of the short-term rate . . . . . . . . . . . . . . . . . 493
13.4.2 Tree pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
13.5 The Ho and Lee model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
13.5.1 The tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
13.5.2 The price movements and the martingale restriction . . . . . . . . . . . . 509
13.5.3 The recombining condition . . . . . . . . . . . . . . . . . . . . . . . . . . 510
13.5.4 Calibration of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
13.5.5 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
13.6 Beyond Ho and Lee: Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
13.6.1 Arrow-Debreu securities . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
13.6.2 The algorithm in two examples . . . . . . . . . . . . . . . . . . . . . . . 518
13.7 Coping with credit risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
13.7.1 Callable bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
13.7.2 Convertible bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
13.8 Appendix 1: Proof of Eq. (13.9) . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
13.9 Appendix 2: Proof of Eq. (13.24) . . . . . . . . . . . . . . . . . . . . . . . . . . 528
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
“Many of the models in the literature are not general equilibrium models in my sense. Of
those that are, most are intermediate in scope: broader than examples, but much narrower
than the full general equilibrium model. They are narrower, not for carefully-spelled-out
economic reasons, but for reasons of convenience. I don’t know what to do with models
like that, especially when the designer says he imposed restrictions to simplify the model
or to make it more likely that conventional data will lead to reject it. The full general
equilibrium model is about as simple as a model can be: we need only a few equations to
describe it, and each is easy to understand. The restrictions usually strike me as extreme.
When we reject a restricted version of the general equilibrium model, we are not rejecting
the general equilibrium model itself. So why bother testing the restricted version?”
Fischer Black, 1995, p. 4, Exploring General Equilibrium, The MIT Press.
Preface
The present Lecture Notes in Financial Economics are based on my teaching notes for advanced
undergraduate and graduate courses in financial economics, macroeconomic dynamics, financial
econometrics and financial engineering. Part I, “Foundations,” develops the fundamental tools
of analysis used in Part II and Part III. These tools span such disparate topics as classical
portfolio selection, dynamic consumption- and production-based asset pricing, in both discrete
and continuous time, the intricacies underlying incomplete markets and some other market
imperfections and, finally, econometric tools comprising maximum likelihood, methods of
moments, and the relatively more modern simulation-based inference methods. Part II, “Asset
pricing and reality,” is about identifying the main empirical facts in finance and the challenges
they pose to financial economists: from excess price volatility and countercyclical stock market
volatility, to cross-sectional puzzles such as the value premium. This second part reviews the
main models aiming to take these puzzles on board. Part III, “Applied asset pricing theory,”
aims to do just this: to use the main tools in Part I to cope with the main challenges occurring
in actual capital markets, arising from option pricing and trading, interest rate modeling, and
credit risk and their associated derivatives. In a sense, Part II is about the big puzzles we face
in fundamental research, while Part III is about how to live within our current and certainly
unsatisfactory paradigms, so as to cope with demand for intellectual expertise.
These notes are still underground. The economic motivation and intuition are not always
developed as deeply as they deserve, some derivations are inelegant, and sometimes the English
is a bit informal. Moreover, I still have to include material on asset pricing with asymmetric
information, monetary models of asset prices, bubbles, asset price implications of overlapping
generations models, and financial frictions. Finally, I need to include more extensive surveys for
each topic I cover, especially in Part II. I plan to revise these notes to fill these gaps. Meanwhile,
any comments on this version are more than welcome.
Antonio Mele
November 2010
Part I
Foundations
1
The classic capital asset pricing model
1.1 Portfolio selection

An investor is concerned with choosing a number of assets to include in his portfolio. Which
weight must each asset bear for the investor to maximize some utility criterion? This section
deals with this problem when our investor maximizes a mean-variance criterion, as in the seminal
approach of Markowitz (1952). First, we derive the wealth constraint. Second, we illustrate the
main results of the model, with and without a safe asset. Third, we introduce the notion of
market portfolio.
1.1.1 The wealth constraint

The choice space comprises m risky assets, and some safe asset. Let S = [S1 , · · ·, Sm ] be the
risky assets price vector, and let S0 be the price of the riskless asset. We wish to evaluate the
value of a portfolio that contains all these assets. Let θ = [θ1 , · · ·, θ m ], where θi is the number
of the i-th risky asset, and let θ0 be the number of the riskless assets, in this portfolio. The
initial wealth is, w = S0 θ0 + S · θ. Terminal wealth is w+ = x0 θ0 + x · θ, where x0 is the payoff
promised by the riskless asset, and x = [x1 , · · ·, xm ] is the vector of the payoffs pertaining to
the risky assets, i.e. xi is the payoff of the i-th asset.
The following pieces of notation considerably simplify the presentation. Let R ≡ x0 /S0 , and
R̃i ≡ xi /Si . In words, R is the gross interest rate obtained by investing in a safe asset, and R̃i is
the gross return obtained by investing in the i-th risky asset. Accordingly, we define r ≡ R − 1
as the safe interest rate; b̃ = [b̃1 , · · ·, b̃m ], where b̃i ≡ R̃i − 1 is the rate of return on the i-th
asset; and b ≡ E(b̃), the vector of the expected returns on the risky assets. Finally, we let
π = [π 1 , · · ·, π m ], where π i ≡ θi Si is the wealth invested in the i-th asset. We have,
$$w^+ = x_0\theta_0 + \sum_{i=1}^m x_i\theta_i \equiv R\pi_0 + \sum_{i=1}^m \tilde R_i\pi_i \quad\text{and}\quad w = \pi_0 + \sum_{i=1}^m \pi_i. \tag{1.1}$$
Combining the two expressions for w+ and w, we obtain, after a few simple computations,
w + = π ⊤ (R̃ − 1m R) + Rw = π ⊤ (b − 1m r) + Rw + π ⊤ (b̃ − b).
We use the decomposition, b̃ − b = a · ũ, where a is an m × d “volatility” matrix, with m ≤ d,

and ũ is a random vector with expectation zero and variance-covariance matrix equal to the
identity matrix. With this decomposition, we can rewrite the budget constraint in Eq. (1.1) as
follows:
w+ = π ⊤ (b − 1m r) + Rw + π⊤ aũ. (1.2)
We now use Eq. (1.2) to compute the expected return and the variance of the portfolio value.
We have,

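$$E[w^+(\pi)] = \pi^\top(b - 1_m r) + Rw \quad\text{and}\quad \operatorname{var}[w^+(\pi)] = \pi^\top\Sigma\pi, \tag{1.3}$$
where Σ ≡ aa⊤ . Let σi² ≡ Σii . We assume that Σ has full rank, and that,
$$\sigma_i^2 > \sigma_j^2 \;\Rightarrow\; b_i > b_j \quad\text{for all } i, j,$$
which implies that r < minj (bj ).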
1.1.2 Portfolio choice

We assume that the investor maximizes the expected return on his portfolio, given a certain
level of the variance of the portfolio’s value, which we set equal to w² · vp². So we use Eq. (1.3)
to set up the following program
$$\hat\pi(v_p) = \arg\max_{\pi\in\mathbb{R}^m} E[w^+(\pi)] \quad\text{s.t.}\quad \operatorname{var}[w^+(\pi)] = w^2\cdot v_p^2. \tag*{[1.P1]}$$
The first order conditions for [1.P1] are,
π̂ (vp ) = (2ν)−1 Σ−1 (b − 1m r) and π̂ ⊤ Σπ̂ = w2 · vp2 ,
where ν is a Lagrange multiplier for the variance constraint. By plugging the first condition
into the second, we obtain, (2ν)⁻¹ = ∓ w · vp /√Sh, where
$$Sh \equiv (b - 1_m r)^\top\Sigma^{-1}(b - 1_m r), \tag{1.4}$$
is the Sharpe market performance. To ensure efficiency, we take the positive solution. Substitut-
ing the positive solution for (2ν)−1 into the first order condition, we obtain that the portfolio
that solves [1.P1] is
$$\frac{\hat\pi(v_p)}{w} \equiv \frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\cdot v_p. \tag{1.5}$$
We are now ready to calculate the value of [1.P1], E [w+ (π̂ (vp ))] and, hence, the expected
portfolio return, defined as,
$$\mu_p(v_p) \equiv \frac{E[w^+(\hat\pi(v_p))] - w}{w} = r + \sqrt{Sh}\cdot v_p, \tag{1.6}$$
where the last equality follows by simple computations. Eq. (1.6) describes what is known as
the Capital Market Line (CML).
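The steps leading to Eqs. (1.4)-(1.6) can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes two risky assets with the parameter values used in Figure 1.1, plus a hypothetical correlation ρ = 0.5 and a hypothetical safe rate r = 0.05. The helper names (inv2, matvec, dot, cml_weights) are illustrative only.

```python
import math

# Hypothetical inputs (assumptions for illustration, not taken from the text).
b = [0.10, 0.15]                  # expected returns b
s1, s2, rho = 0.20, 0.25, 0.5     # volatilities and correlation
r = 0.05                          # safe interest rate
Sigma = [[s1 * s1, rho * s1 * s2],
         [rho * s1 * s2, s2 * s2]]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matvec(M, x):
    """Matrix-vector product."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def dot(x, y):
    """Inner product of two vectors."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

excess = [b_i - r for b_i in b]               # b - 1_m r
Sinv_excess = matvec(inv2(Sigma), excess)     # Sigma^{-1} (b - 1_m r)
Sh = dot(excess, Sinv_excess)                 # Eq. (1.4)

def cml_weights(vp):
    """Risky-asset weights pi_hat / w of Eq. (1.5) for a target volatility vp."""
    return [x * vp / math.sqrt(Sh) for x in Sinv_excess]

vp = 0.10
weights = cml_weights(vp)
mu_from_weights = r + dot(weights, excess)        # (E[w+] - w) / w, from Eq. (1.3)
mu_from_cml = r + math.sqrt(Sh) * vp              # Eq. (1.6)
variance = dot(weights, matvec(Sigma, weights))   # should equal vp ** 2

print(mu_from_weights, mu_from_cml, variance, vp ** 2)
```

The two expected returns coincide and the variance constraint binds, which is what "simple computations" refers to in the text. The weights need not sum to one, since the remainder of wealth is held in the safe asset.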
1.1.3 Without the safe asset

Next, let us suppose the investor’s choice space does not include the riskless asset. In this case,
his current wealth is $w = \sum_{i=1}^m \pi_i$, and his terminal wealth is $w^+ = \sum_{i=1}^m \tilde R_i\pi_i$. By the definition
of b̃i ≡ R̃i − 1, and by a few simple computations,
$$w^+ = \sum_{i=1}^m \tilde b_i\pi_i + \sum_{i=1}^m \pi_i = \pi^\top b + w + \pi^\top a\tilde u, \tag{1.7}$$
where a and ũ are as defined in Eq. (1.2). We can use Eq. (1.7) to compute the expected
return and the variance of the portfolio value, which are:
$$E[w^+(\pi)] = \pi^\top b + w, \ \text{where } w = \pi^\top 1_m, \quad\text{and}\quad \operatorname{var}[w^+(\pi)] = \pi^\top\Sigma\pi. \tag{1.8}$$
The program our investor solves, now, is:
$$\hat\pi(v_p) = \arg\max_{\pi\in\mathbb{R}^m} E[w^+(\pi)] \quad\text{s.t.}\quad \operatorname{var}[w^+(\pi)] = w^2\cdot v_p^2 \ \text{ and } \ w = \pi^\top 1_m. \tag*{[1.P2]}$$
In the appendix, we show that provided αγ − β² > 0 (a second order condition), the solution
to [1.P2] is,
$$\frac{\hat\pi(v_p)}{w} = \frac{\gamma\mu_p(v_p) - \beta}{\alpha\gamma - \beta^2}\,\Sigma^{-1}b + \frac{\alpha - \beta\mu_p(v_p)}{\alpha\gamma - \beta^2}\,\Sigma^{-1}1_m, \tag{1.9}$$
where α ≡ b⊤Σ⁻¹b, β ≡ 1m⊤Σ⁻¹b and γ ≡ 1m⊤Σ⁻¹1m , and µp (vp ) is the expected portfolio return,
defined as in Eq. (1.6). In the appendix, we also show that,
$$v_p^2 = \frac{1}{\gamma}\left[1 + \frac{1}{\alpha\gamma - \beta^2}\big(\gamma\mu_p(v_p) - \beta\big)^2\right]. \tag{1.10}$$
Therefore, the global minimum variance portfolio achieves a variance equal to vp² = γ⁻¹ and an
expected return equal to µp = β/γ.
Note that for each vp , there are two values of µp (vp ) that solve Eq. (1.10). The optimal choice
for our investor is that with the highest µp . We define the efficient portfolio frontier as the set
of values (vp , µp ) that solve Eq. (1.10) with the highest µp . It has the following expression,

$$\mu_p(v_p) = \frac{\beta}{\gamma} + \frac{1}{\gamma}\sqrt{\big(\gamma v_p^2 - 1\big)\big(\alpha\gamma - \beta^2\big)}. \tag{1.11}$$
Clearly, the efficient portfolio frontier is an increasing and concave function of vp . It can be
interpreted as a sort of “production function,” one that produces “expected returns” through
inputs of “levels of risk” (see, e.g., Figure 1.1). The choice of which portfolio has effectively to
be selected depends on the investor’s preference toward risk.
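Eqs. (1.9)-(1.11) can be cross-checked against each other numerically. The sketch below is illustrative only: it assumes two risky assets with the Figure 1.1 parameters and a hypothetical correlation ρ = 0.5, and the function names are not from the text. It builds the frontier portfolio for a target expected return, and verifies that the weights sum to one, deliver the target return, and have the variance predicted by Eq. (1.10).

```python
import math

# Hypothetical inputs (assumptions for illustration).
b = [0.10, 0.15]
s1, s2, rho = 0.20, 0.25, 0.5
Sigma = [[s1 * s1, rho * s1 * s2],
         [rho * s1 * s2, s2 * s2]]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

ones = [1.0, 1.0]
Sinv_b = matvec(inv2(Sigma), b)       # Sigma^{-1} b
Sinv_1 = matvec(inv2(Sigma), ones)    # Sigma^{-1} 1_m
alpha = dot(b, Sinv_b)
beta = dot(ones, Sinv_b)
gamma = dot(ones, Sinv_1)
D = alpha * gamma - beta ** 2         # must be positive (second order condition)

def frontier_weights(mu):
    """pi_hat / w from Eq. (1.9) for a target expected return mu."""
    return [((gamma * mu - beta) * sb + (alpha - beta * mu) * so) / D
            for sb, so in zip(Sinv_b, Sinv_1)]

def frontier_variance(mu):
    """v_p^2 from Eq. (1.10)."""
    return (1.0 + (gamma * mu - beta) ** 2 / D) / gamma

def efficient_return(vp):
    """mu_p(v_p) from Eq. (1.11), the upper branch of the frontier."""
    return beta / gamma + math.sqrt((gamma * vp ** 2 - 1.0) * D) / gamma

mu = 0.13                              # target return above beta / gamma
weights = frontier_weights(mu)
vp = math.sqrt(frontier_variance(mu))
print(weights, vp, efficient_return(vp), beta / gamma, math.sqrt(1.0 / gamma))
```

The last two printed numbers are the expected return and volatility of the global minimum variance portfolio. Targets below β/γ lie on the inefficient branch, where Eq. (1.11) does not return the target.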
Example 1.1. Let the number of risky assets m = 2. In this case, we do not need to
optimize anything, as the budget constraint, π1 /w + π2 /w = 1, pins down a unique relation between
the portfolio expected return and the variance of the portfolio’s value. So we simply have,
µp = (E [w+ (π)] − w)/w = (π1 /w) b1 + (π2 /w) b2 , or,
$$\begin{cases} \mu_p = b_1 + (b_2 - b_1)\dfrac{\pi_2}{w} \\[2mm] v_p^2 = \left(1 - \dfrac{\pi_2}{w}\right)^2\sigma_1^2 + 2\left(1 - \dfrac{\pi_2}{w}\right)\dfrac{\pi_2}{w}\,\sigma_{12} + \left(\dfrac{\pi_2}{w}\right)^2\sigma_2^2 \end{cases}$$
[Figure 1.1: expected return µp (vertical axis) plotted against volatility vp (horizontal axis).]
FIGURE 1.1. From top to bottom: portfolio frontiers corresponding to ρ = −1, −0.5, 0, 0.5, 1.
Parameters are set to b1 = 0.10, b2 = 0.15, σ1 = 0.20, σ2 = 0.25. For each portfolio frontier, the
efficient portfolio frontier includes those portfolios which yield the lowest volatility for a given
expected return.
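whence:
$$v_p^2 = \frac{1}{(b_2 - b_1)^2}\left[(b_2 - \mu_p)^2\sigma_1^2 + 2(b_2 - \mu_p)(\mu_p - b_1)\rho\sigma_1\sigma_2 + (\mu_p - b_1)^2\sigma_2^2\right].$$
When ρ = 1,
$$\mu_p = b_1 + \frac{(b_1 - b_2)(\sigma_1 - v_p)}{\sigma_2 - \sigma_1}.$$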
In the general case, diversification pays when the asset returns are not perfectly positively
correlated (see Figure 1.1). As Figure 1.1 reveals, it is even possible to obtain a portfolio that
is less risky than the least risky asset. Moreover, risk can be zeroed when ρ = −1, which
corresponds to π1 /w = σ2 /(σ1 + σ2 ) and π2 /w = σ1 /(σ1 + σ2 ). (By contrast, the weights
π1 /w = σ2 /(σ2 − σ1 ) and π2 /w = −σ1 /(σ2 − σ1 ), which involve a short position in the second
asset, zero risk when ρ = 1.)
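The two-asset formulas of Example 1.1 are easy to explore numerically. The sketch below uses the Figure 1.1 parameters; the function name two_asset and the variable names are illustrative assumptions. With ρ = −1 the portfolio variance is ((1 − x2)σ1 − x2σ2)², which vanishes at x2 = σ1/(σ1 + σ2); with ρ = 1 it is ((1 − x2)σ1 + x2σ2)², which vanishes only with a short position, x2 = −σ1/(σ2 − σ1).

```python
import math

# Parameters of Figure 1.1.
b1, b2, s1, s2 = 0.10, 0.15, 0.20, 0.25

def two_asset(x2, rho):
    """Return (mu_p, v_p) when a share x2 = pi_2 / w of wealth is in asset 2."""
    x1 = 1.0 - x2
    mu = b1 + (b2 - b1) * x2
    var = x1 ** 2 * s1 ** 2 + 2.0 * x1 * x2 * rho * s1 * s2 + x2 ** 2 * s2 ** 2
    return mu, math.sqrt(max(var, 0.0))   # guard against tiny negative rounding

# rho = -1: risk is zeroed with long positions in both assets.
x2_neg = s1 / (s1 + s2)
mu_neg, v_neg = two_asset(x2_neg, -1.0)

# rho = +1: risk is zeroed only by shorting asset 2.
x2_pos = -s1 / (s2 - s1)
mu_pos, v_pos = two_asset(x2_pos, 1.0)

# rho = 0: the minimum variance mix is already less risky than either asset.
x2_zero = s1 ** 2 / (s1 ** 2 + s2 ** 2)
mu_zero, v_zero = two_asset(x2_zero, 0.0)

print(v_neg, v_pos, v_zero)
```

In this parametrization the riskless mix under ρ = 1 shorts the asset with the higher expected return, so its (certain) return is below b1.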
Let us return to the general case. The portfolio in Eq. (1.9) can be decomposed into two
components, as follows:

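$$\frac{\hat\pi(v_p)}{w} = \ell(v_p)\frac{\pi_d}{w} + [1 - \ell(v_p)]\frac{\pi_g}{w}, \qquad \ell(v_p) \equiv \frac{\beta\,(\mu_p(v_p)\gamma - \beta)}{\alpha\gamma - \beta^2},$$
where
$$\frac{\pi_d}{w} \equiv \frac{\Sigma^{-1}b}{\beta}, \qquad \frac{\pi_g}{w} \equiv \frac{\Sigma^{-1}1_m}{\gamma}.$$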
Hence, we see that πg /w is the global minimum variance portfolio, for we know from Eq. (1.10)
that the minimum variance occurs at (vp , µp ) = (√(1/γ), β/γ), in which case ℓ (vp ) = 0.¹ More
generally, we can span any portfolio on the frontier by just choosing a convex combination of
πd /w and πg /w, with weight equal to ℓ (vp ). It’s a mutual fund separation theorem.
1.1.4 The market portfolio

The market portfolio is the portfolio at which the CML in Eq. (1.6) and the efficient portfolio
frontier in Eq. (1.11) intersect. In fact, the market portfolio is the point at which the CML is
tangent to the efficient portfolio frontier. For this reason, the market portfolio is also referred
to as the “tangent” portfolio. In Figure 1.2, the market portfolio corresponds to the point M
(the portfolio with volatility equal to vM and expected return equal to µM ), at which the CML
is tangent to the efficient portfolio frontier, AMC.²
As Figure 1.2 illustrates, the CML dominates the efficient portfolio frontier AMC. This is
because the CML is the value of the investor’s problem, [1.P1], obtained using all the risky
assets and the riskless asset, and the efficient portfolio frontier is the value of the investor’s
problem, [1.P2], obtained using only the risky assets.³ For the very same reason, the CML
and the efficient portfolio frontier can only be tangent to each other. For suppose not. Then,
there would exist a point on the efficient portfolio frontier that dominates some portfolio on the
CML, a contradiction. Likewise, the CML must have a portfolio in common with the efficient
portfolio frontier: the portfolio that does not include the safe asset. Below, we shall use this
insight to characterize, analytically, the market portfolio.
Why is the market portfolio so called? Figure 1.2 reveals that any portfolio on the
CML can be obtained as a combination of the safe asset and the market portfolio M (a portfolio
containing only the risky assets). An investor with high risk-aversion would like to choose a
point such as Q, say. An investor with low risk-aversion would like to choose a point such as P ,
say. But no matter how risk averse an individual is, the optimal solution for him is to choose
a combination of the safe asset and the market portfolio M . Thus, the market portfolio plays
an instrumental role. It obviously does not depend on the risk attitudes of any investor: it is a
mere convex combination of all the existing assets in the economy. Instead, the optimal course
of action for any investor is to use those proportions of this portfolio that make his overall
exposure to risk consistent with his risk appetite. It’s a two fund separation theorem.
The equilibrium implications, then, are as follows. As we have explained, any portfolio can be
attained by lending or borrowing funds in zero net supply, and investing in the portfolio M. In
equilibrium, then, every investor must hold some proportions of M. But since, in aggregate, there
is no net borrowing or lending, all investors must have portfolio holdings that sum up to the
market portfolio, which is therefore the value-weighted portfolio of all the existing
assets in the economy. This argument is formally developed in the appendix.
1 It is easy to show that the covariance of the global minimum variance portfolio with any other portfolio equals γ⁻¹.
2 The existence of the market portfolio requires a restriction on r, derived in Eq. (1.12) below.
3 Figure 1.2 also depicts the dotted line MZ, which is the value of the investor’s problem when he invests a proportion higher
than 100% in the market portfolio, leveraged at an interest rate for borrowing higher than the interest rate for lending. In this case,
the CML coincides with rM, up to the point M. From M onwards, the CML coincides with the higher of MZ and MA.
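FIGURE 1.2. [Figure not reproduced. Labels recoverable from the plot: the CML, the frontier
points A, M and C, the points P and Z, the safe rate r, and the market portfolio coordinates vM
and µM .]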
We turn to characterize the market portfolio. We need to assume that the interest rate is
sufficiently low to allow the CML to be tangent to the efficient portfolio frontier. The technical
condition that ensures this is that the return on the safe asset be less than the expected return
on the global minimum variance portfolio, viz.
$$r < \frac{\beta}{\gamma}. \tag{1.12}$$
Let πM be the market portfolio. To identify πM , we note that it belongs to AMC if πM⊤ 1m = w,
where πM also belongs to the CML and, therefore, by Eq. (1.5), is such that:
$$\frac{\pi_M}{w} = \frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\cdot v_M. \tag{1.13}$$
Therefore, we must be looking for the value vM that solves
$$w = 1_m^\top\pi_M = w\cdot 1_m^\top\frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\cdot v_M,$$
i.e.
$$v_M = \frac{\sqrt{Sh}}{\beta - \gamma r}. \tag{1.14}$$
Then, we plug this value of vM into the expression for πM in Eq. (1.13) and obtain,⁴
$$\frac{\pi_M}{w} = \frac{1}{\beta - \gamma r}\,\Sigma^{-1}(b - 1_m r). \tag{1.15}$$
4 While the market portfolio depends on r, this portfolio obviously does not include any share in the safe asset.
Naturally, the market portfolio belongs to the efficient portfolio frontier. Indeed, on the
one hand, the market portfolio cannot be above the efficient portfolio frontier, as this would
contradict the efficiency of the AMC curve, which is obtained by investing in the risky assets
only; on the other hand, the market portfolio cannot be below the efficient portfolio frontier, for
by construction, it belongs to the CML which, as shown before, dominates the efficient portfolio
frontier. In the appendix, we confirm, analytically, that the market portfolio does indeed enjoy
the tangency condition.
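A small numerical sketch may help fix ideas about Eqs. (1.12)-(1.15). It assumes two risky assets with the Figure 1.1 parameters, a hypothetical correlation ρ = 0.5 and a hypothetical safe rate r = 0.05; none of these values, nor the helper names, come from the text. With only two risky assets every fully invested portfolio lies on the frontier hyperbola, so the frontier check below is automatic here; it becomes informative with m > 2.

```python
import math

# Hypothetical inputs (assumptions for illustration).
b = [0.10, 0.15]
s1, s2, rho = 0.20, 0.25, 0.5
r = 0.05
Sigma = [[s1 * s1, rho * s1 * s2],
         [rho * s1 * s2, s2 * s2]]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

ones = [1.0, 1.0]
Sinv = inv2(Sigma)
alpha = dot(b, matvec(Sinv, b))
beta = dot(ones, matvec(Sinv, b))
gamma = dot(ones, matvec(Sinv, ones))
assert r < beta / gamma                    # Eq. (1.12) must hold

excess = [b_i - r for b_i in b]
Sinv_excess = matvec(Sinv, excess)
Sh = dot(excess, Sinv_excess)              # Eq. (1.4)

v_M = math.sqrt(Sh) / (beta - gamma * r)   # Eq. (1.14)
pi_M = [x / (beta - gamma * r) for x in Sinv_excess]   # Eq. (1.15)
mu_M = dot(pi_M, b)

var_M = dot(pi_M, matvec(Sigma, pi_M))     # should equal v_M ** 2
on_cml = r + math.sqrt(Sh) * v_M           # Eq. (1.6) evaluated at v_M
on_frontier = (1.0 + (gamma * mu_M - beta) ** 2
               / (alpha * gamma - beta ** 2)) / gamma  # Eq. (1.10) at mu_M

print(pi_M, mu_M, v_M)
```

The weights sum to one, confirming that the market portfolio contains no position in the safe asset.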
1.2 The CAPM

The Capital Asset Pricing Model (CAPM) provides an asset evaluation formula. In this section,
we derive the CAPM through arguments that have the same flavor as the original derivation of
Sharpe (1964). The first step is the creation of a portfolio including a proportion α of wealth
invested in any asset i and the remaining proportion 1 − α invested in the market portfolio.
Mathematically, we are considering an α-parametrized portfolio, with expected return and
volatility given by:
$$\tilde\mu_p \equiv \alpha b_i + (1 - \alpha)\mu_M, \qquad \tilde v_p \equiv \sqrt{(1 - \alpha)^2\sigma_M^2 + 2(1 - \alpha)\alpha\sigma_{iM} + \alpha^2\sigma_i^2}, \tag{1.16}$$
where we have defined σM ≡ vM . Clearly, the market portfolio, M , belongs to the α-parametrized
portfolio. By Example 1.1, the curve in (1.16) has the same shape as the curve A′ M i in
Figure 1.3. The curve A′ M i lies below the efficient portfolio frontier AMC. This is because
the efficient portfolio frontier is obtained by optimizing a mean-variance criterion over all the
existing assets and, hence, dominates any portfolio that only comprises the two assets i and
M . Suppose, for example, that the A′ M i curve intersects the AM C curve; then, a feasible
combination of assets (including a proportion α of the i-th asset and a proportion 1 − α of the
market portfolio) would dominate AMC, a contradiction, given that AM C is the most efficient
feasible combination of all the assets. Therefore, the curve A′ M i is tangent to the efficient
portfolio frontier AMC at M , which in turn, as we know, is tangent to the CML at M .
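Let us equate, then, the two slopes of the A′ M i curve and the efficient portfolio frontier
AMC at M . We shall show that this condition provides a restriction on the expected return bi
on any asset i. Because (1.16) is, mathematically, an α-parametrized curve, we may compute
its slope at M through the computation of dµ̃p /dα and dṽp /dα, at α = 0. We have,
$$\frac{d\tilde\mu_p}{d\alpha} = b_i - \mu_M, \qquad \left.\frac{d\tilde v_p}{d\alpha}\right|_{\alpha=0} = \left.\frac{-(1 - \alpha)\sigma_M^2 + (1 - 2\alpha)\sigma_{iM} + \alpha\sigma_i^2}{\tilde v_p}\right|_{\alpha=0} = \frac{1}{\sigma_M}\left(\sigma_{iM} - \sigma_M^2\right).$$
Therefore,
$$\left.\frac{d\tilde\mu_p(\alpha)}{d\tilde v_p(\alpha)}\right|_{\alpha=0} = \frac{b_i - \mu_M}{\frac{1}{\sigma_M}\left(\sigma_{iM} - \sigma_M^2\right)}. \tag{1.17}$$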
On the other hand, the slope of the CML is (µM − r)/σM which, equated to the slope in Eq.
(1.17), yields,
$$b_i - r = \beta_i(\mu_M - r), \qquad \beta_i \equiv \frac{\sigma_{iM}}{v_M^2}, \quad i = 1, \cdots, m. \tag{1.18}$$
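FIGURE 1.3. [Figure not reproduced. Labels recoverable from the plot: the CML, the frontier
points A, M and C, the curve through A′ and i, the safe rate r, and the market portfolio
coordinates vM and µM .]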
Eq. (1.18) is the celebrated Security Market Line (SML). The appendix provides an alternative
derivation of the SML. Assets with β i > 1 are called “aggressive” assets. Assets with β i < 1
are called “conservative” assets.
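The SML can be verified numerically once the market portfolio of Eq. (1.15) is available. The following sketch is illustrative: it assumes two risky assets with the Figure 1.1 parameters, a hypothetical correlation ρ = 0.5 and a hypothetical safe rate r = 0.05. It computes each asset's covariance with the market, σiM, forms βi = σiM/vM², and checks that bi − r = βi(µM − r) for every asset.

```python
# Hypothetical inputs (assumptions for illustration).
b = [0.10, 0.15]
s1, s2, rho = 0.20, 0.25, 0.5
r = 0.05
Sigma = [[s1 * s1, rho * s1 * s2],
         [rho * s1 * s2, s2 * s2]]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

ones = [1.0, 1.0]
excess = [b_i - r for b_i in b]
Sinv_excess = matvec(inv2(Sigma), excess)
scale = dot(ones, Sinv_excess)               # equals beta - gamma * r
pi_M = [x / scale for x in Sinv_excess]      # market portfolio, Eq. (1.15)

mu_M = dot(pi_M, b)                          # expected market return
var_M = dot(pi_M, matvec(Sigma, pi_M))       # v_M^2
cov_iM = matvec(Sigma, pi_M)                 # sigma_iM for each asset i
betas = [c / var_M for c in cov_iM]          # beta_i of Eq. (1.18)

# Gap between the two sides of the SML for each asset (zero up to rounding).
sml_gap = [(b_i - r) - beta_i * (mu_M - r) for b_i, beta_i in zip(b, betas)]
print(betas, sml_gap)
```

As a side check, the market-weighted average of the betas equals one, i.e. the market portfolio has a beta of one with itself.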
Note, the SML can be interpreted as a projection of the excess return on asset i (i.e. b̃i − r)
on the excess returns on the market portfolio (i.e. b̃M − r). In other words,
b̃i − r = β i (b̃M − r) + εi , i = 1, · · ·, m. (1.19)
The previous relation leads to the following decomposition of the volatility (or risk) related to
the i-th asset return:
$$\sigma_i^2 = \beta_i^2 v_M^2 + \operatorname{var}(\varepsilon_i), \quad i = 1, \cdots, m.$$
The quantity βi² vM² is usually referred to as systematic risk. The quantity var (εi ) ≥ 0, instead,
is what we term idiosyncratic risk. In the next section, we shall show that idiosyncratic risk
can be eliminated through a “well-diversified” portfolio - roughly, a portfolio that contains a
large number of assets. Naturally, economic theory does not tell us anything substantial about
how important idiosyncratic risk is for any particular asset.
The CAPM can be usefully interpreted within a classical hedging framework. Suppose we
hold an asset that delivers a return equal to z̃, perhaps a nontradable asset. We wish to
hedge against movements of this asset by purchasing a portfolio containing a proportion α
in the market portfolio, and a proportion 1 − α in a safe asset. The hedging criterion
we wish to use is the variance of the overall exposure of the position, which we minimize by
minα var[z̃ − ((1 − α) r + αb̃M )]. It is straightforward to show that the solution to this basic
problem is, α̂ ≡ βz̃ ≡ cov(z̃, b̃M )/vM². That is, the proportion to hold is simply the beta of the
asset to hedge with the market portfolio.
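The hedge ratio follows from a one-line first order condition. A sketch of the computation, using only the definitions above and the fact that the safe return r is non-random and so drops out of the variance:

```latex
\begin{aligned}
\operatorname{var}\big[\tilde z - (1-\alpha)r - \alpha\tilde b_M\big]
  &= \operatorname{var}(\tilde z) - 2\alpha\operatorname{cov}(\tilde z,\tilde b_M) + \alpha^2 v_M^2, \\
0 &= -2\operatorname{cov}(\tilde z,\tilde b_M) + 2\hat\alpha\, v_M^2
  \quad\Longrightarrow\quad
  \hat\alpha = \frac{\operatorname{cov}(\tilde z,\tilde b_M)}{v_M^2}.
\end{aligned}
```

The second derivative with respect to α is 2vM² > 0, so the stationary point is indeed a minimum.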
The CAPM is a model for the required return for any asset and so, it is a very first tool we
can use to evaluate risky projects. Let
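$$V = \text{value of a project} = \frac{E(C^+)}{1 + r_C},$$
where C+ is future cash flow and rC is the risk-adjusted discount rate for this project. We have:
$$\begin{aligned}
\frac{E(C^+)}{V} &= 1 + r_C \\
&= 1 + r + \beta_C(\mu_M - r) \\
&= 1 + r + \frac{\operatorname{cov}\left(\frac{C^+}{V} - 1, \tilde x_M\right)}{v_M^2}(\mu_M - r) \\
&= 1 + r + \frac{1}{V}\frac{\operatorname{cov}(C^+, \tilde x_M)}{v_M^2}(\mu_M - r) \\
&= 1 + r + \frac{1}{V}\frac{\lambda}{v_M}\operatorname{cov}(C^+, \tilde x_M),
\end{aligned}$$
where λ ≡ (µM − r)/vM , the unit market risk-premium.
Rearranging terms in the previous equation leaves:
$$V = \frac{E(C^+) - \frac{\lambda}{v_M}\operatorname{cov}(C^+, \tilde x_M)}{1 + r}. \tag{1.20}$$
The certainty equivalent C̄ is defined as:
$$\bar C : \ V = \frac{E(C^+)}{1 + r_C} = \frac{\bar C}{1 + r},$$
or,
$$\bar C = (1 + r)V,$$
and using Eq. (1.20),
$$\bar C = E(C^+) - \frac{\lambda}{v_M}\operatorname{cov}(C^+, \tilde x_M).$$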
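To see Eq. (1.20) at work, consider a small discrete example. Everything in the sketch below is a hypothetical assumption made for illustration: a three-state economy, the state probabilities, the market returns, the project cash flows and the safe rate. The code values the project through the certainty equivalent, backs out the risk-adjusted rate rC = E(C+)/V − 1, and checks that it agrees with the CAPM rate r + βC(µM − r).

```python
import math

# Hypothetical three-state economy (all numbers are illustrative assumptions).
probs = [0.3, 0.4, 0.3]
market = [-0.10, 0.08, 0.25]     # market return x_M in each state
cash = [80.0, 100.0, 130.0]      # project cash flow C+ in each state
r = 0.03                         # safe interest rate

def mean(x):
    """Expectation under the state probabilities."""
    return sum(p * v for p, v in zip(probs, x))

def cov(x, y):
    """Covariance under the state probabilities."""
    mx, my = mean(x), mean(y)
    return sum(p * (a - mx) * (c - my) for p, a, c in zip(probs, x, y))

mu_M = mean(market)
var_M = cov(market, market)
v_M = math.sqrt(var_M)
lam = (mu_M - r) / v_M                         # unit market risk-premium

certainty_equivalent = mean(cash) - (lam / v_M) * cov(cash, market)
V = certainty_equivalent / (1.0 + r)           # Eq. (1.20)

r_C = mean(cash) / V - 1.0                     # implied risk-adjusted rate
project_returns = [c / V - 1.0 for c in cash]
beta_C = cov(project_returns, market) / var_M  # project beta
r_C_capm = r + beta_C * (mu_M - r)

print(V, certainty_equivalent, r_C, r_C_capm)
```

Because the cash flows covary positively with the market in this example, the certainty equivalent lies below the expected cash flow and the risk-adjusted rate exceeds the safe rate.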
1.3 The APT

1.3.1 A first derivation
Suppose that the m asset returns we observe are generated by the following linear factor model,
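$$\underset{m\times 1}{\tilde b} = \underset{m\times 1}{a} + \underset{m\times k}{B}\cdot\underset{k\times 1}{f} \equiv a + \operatorname{cov}(\tilde b, f)[\operatorname{var}(f)]^{-1}\cdot f, \tag{1.21}$$
where a and B are a vector and a matrix of constants, and f is a k-dimensional vector of factors
supposed to affect the asset returns, with k ≤ m. Let us normalize [var(f)]⁻¹ = Ik×k , so that
B = cov(b̃, f ). With this normalization, we have,
$$\tilde b = a + \begin{bmatrix}\operatorname{cov}(\tilde b_1, f) \\ \vdots \\ \operatorname{cov}(\tilde b_m, f)\end{bmatrix}\cdot f = a + \begin{bmatrix}\sum_{j=1}^k \operatorname{cov}(\tilde b_1, f_j)f_j \\ \vdots \\ \sum_{j=1}^k \operatorname{cov}(\tilde b_m, f_j)f_j\end{bmatrix}.$$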
Next, let us consider a portfolio π including the m risky assets. The return of this portfolio
is,
$$\pi^\top\tilde b = \pi^\top a + \pi^\top Bf,$$
where as usual, π⊤ 1m = 1. An arbitrage opportunity arises if there exists some portfolio π
such that the return on the portfolio is certain, and different from the safe interest rate r, i.e. if
∃π : π⊤ B = 0 and π⊤ a ≠ r. Mathematically, this is ruled out whenever ∃λ ∈ Rk : a = Bλ + 1m r.
Substituting this relation into Eq. (1.21) leaves,
$$\tilde b = 1_m r + B\lambda + Bf = 1_m r + \operatorname{cov}(\tilde b, f)\lambda + \operatorname{cov}(\tilde b, f)f.$$
Taking the expectation,
$$b_i = r + (B\lambda)_i = r + \sum_{j=1}^k \underbrace{\operatorname{cov}(\tilde b_i, f_j)}_{\equiv\beta_{i,j}}\lambda_j, \quad i = 1, \cdots, m. \tag{1.22}$$
The APT collapses to the CAPM, once we assume that the only factor affecting the returns
is the market portfolio. To show this, we must normalize the market portfolio return so that its
variance equals one, consistently with Eq. (1.22). So let r̃M be the normalized market return,
defined as r̃M ≡ vM⁻¹ b̃M , so that var(r̃M ) = 1. We have,
$$\tilde b_i = a + \beta_i\tilde r_M, \quad i = 1, \cdots, m,$$
where βi = cov(b̃i , r̃M ) = vM⁻¹ cov(b̃i , b̃M ). Then, we have,
$$b_i = r + \beta_i\lambda, \quad i = 1, \cdots, m. \tag{1.23}$$
In particular, βM = cov(b̃M , r̃M ) = vM⁻¹ var(b̃M ) = vM , and so, by Eq. (1.23),
$$\lambda = \frac{b_M - r}{v_M},$$
which is known as the Sharpe ratio for the market portfolio, or the market price of risk.
By replacing βi = vM⁻¹ cov(b̃i , b̃M ) and the expression for λ above into Eq. (1.23), we obtain,
$$b_i = r + \frac{\operatorname{cov}(\tilde b_i, \tilde b_M)}{v_M^2}(b_M - r), \quad i = 1, \cdots, m.$$
This is simply the SML in Eq. (1.18).
1.3.2 The APT with idiosyncratic risk and a large number of assets
[Ross (1976), and Connor (1984), Huberman (1983).]
How can idiosyncratic risk be eliminated? Consider, for example, Eq. (1.19). Intuitively, we
may form portfolios with a large number of assets, so as to make idiosyncratic risk negligible, by
the law of large numbers. But would the beta-relation still hold, in this case? More generally,
would the APT relation in Eq. (1.22) still be valid? The answer is in the affirmative, although
it deserves some qualifications.
Consider the APT equation (1.21), and “add” a vector of idiosyncratic returns, ε, which are
independent of f , and have mean zero and variance σ 2ε :
b̃ = a + B · f + ε.
We wish to show that in the absence of arbitrage, to be defined below, it must be that the
number of assets such that Eq. (1.22) does not hold, N (m) say, is bounded as m gets large,
i.e.:
$$|a_i - ((B\lambda)_i + r)| > 0, \quad i = 1, \cdots, N(m), \tag{1.24}$$
where
$$\lim_{m\to\infty} N(m) < \infty. \tag{1.25}$$
In other words, we wish to show that in a “large” market, Eq. (1.22) does indeed hold for most
of the assets, an approach close to that in Huang and Litzenberger (1988, p. 106-108).
By the same arguments leading to Eq. (1.1), the wealth generated by a portfolio of the assets
satisfying (1.24), $w^+_{N(m)}$ say, is,
$$w^+_{N(m)} = \pi_{N(m)}^\top\left(a_{N(m)} - 1_{N(m)}r\right) + Rw_{N(m)} + \pi_{N(m)}^\top\left(B_{N(m)}f + \varepsilon_{N(m)}\right),$$
where aN , BN and εN are (i) the vector of the expected returns, (ii) the return volatility (or
factor exposures) matrix and (iii) the vector of idiosyncratic return components affecting these
assets, and, finally, πN and wN are the portfolio and the initial wealth invested in these assets.
In this context, we may define an arbitrage as the portfolio πN(m) that in the limit, as the
number of all the existing assets m gets large, is riskless and yet delivers an expected return
strictly larger than the safe interest rate, viz.
$$\lim_{m\to\infty}\frac{E[w^+_{N(m)}]}{w_{N(m)}} > R, \quad\text{and}\quad \lim_{m\to\infty}\operatorname{var}[w^+_{N(m)}] = 0. \tag{1.26}$$
We want to show that this situation does not arise, under the condition in (1.25), thereby
establishing that the linear APT relation in Eq. (1.22) is valid for most of the assets, in a large
market.
So suppose the linear relation, aN − 1N r = BN λ, doesn’t hold. Then, there exists a portfolio
π such that,
$$\pi^\top B_N = 0 \quad\text{and}\quad \pi^\top(a_N - 1_N r) \neq 0. \tag{1.27}$$
Consider the portfolio:
$$\hat\pi_N = \frac{1}{N}\cdot\operatorname{sign}\left[\pi^\top(a_N - 1_N r)\right]\cdot\pi,$$
where π is as in (1.27). With this portfolio we have, clearly, that
$E[w_N^+] = \hat\pi_N^\top(a_N - 1_N r) + Rw_N > Rw_N$, for each N, and even for N large. That is,
$\lim_{m\to\infty} E[w^+_{N(m)}]/w_{N(m)} > R$, which is the first condition in (1.26). As regards the second
condition in (1.26), we have that
$$\operatorname{var}[w_N^+] = \hat\pi_N^\top\left(B_N B_N^\top + \sigma_\varepsilon^2 I_{N\times N}\right)\hat\pi_N = \sigma_\varepsilon^2\,\hat\pi_N^\top\hat\pi_N,$$
where the second equality follows by the first relation in (1.27). Clearly,
$\lim_{m\to\infty}\operatorname{var}[w^+_{N(m)}] = 0$ as N (m) → ∞. Hence, in the absence of arbitrage, the condition in
(1.25) must hold.
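The limiting argument can be illustrated with a stylized numerical sketch. All the ingredients below are hypothetical assumptions: a single factor with unit exposure for every asset, a pricing error delta on every other asset (so that the linear relation fails for all N assets), and an alternating long-short portfolio π, which is orthogonal to the factor exposures when N is even. The function name limiting_arbitrage is illustrative.

```python
def limiting_arbitrage(N, delta=0.02, lam=0.05, sigma_eps=0.30):
    """Moments of pi_hat_N for N mispriced assets (N assumed even)."""
    B = [1.0] * N                                   # unit exposure to one factor
    # Expected excess returns a_N - 1_N r: factor premium plus a pricing
    # error delta on every other asset, so a_N - 1_N r = B_N lambda fails.
    excess = [lam * B[i] + (delta if i % 2 == 0 else 0.0) for i in range(N)]
    pi = [1.0 if i % 2 == 0 else -1.0 for i in range(N)]   # pi' B_N = 0
    raw_gain = sum(p * e for p, e in zip(pi, excess))      # pi'(a_N - 1_N r)
    sign = 1.0 if raw_gain > 0 else -1.0
    pi_hat = [sign * p / N for p in pi]                    # scaled portfolio
    exposure = sum(p * bb for p, bb in zip(pi_hat, B))     # factor exposure
    expected_gain = sum(p * e for p, e in zip(pi_hat, excess))
    variance = sigma_eps ** 2 * sum(p * p for p in pi_hat) # sigma_eps^2 pi'pi
    return exposure, expected_gain, variance

for N in (10, 100, 1000):
    print(N, limiting_arbitrage(N))
```

In this sketch the expected gain stays at delta/2 for every N, while the variance equals sigma_eps²/N and vanishes as N grows. This is the limiting arbitrage that the condition in (1.25) rules out.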
1.3.3 Empirical evidence

Fama-MacBeth. Economic forces driving asset returns.
1.4 Appendix 1: Some analytical details for portfolio choice

We derive Eq. (1.9), which provides the solution for the portfolio choice when the choice space does not
include a safe asset. We derive the solution by proceeding with two programs: (i) the primal program
[1.P2] in the main text, which consists in maximizing the portfolio expected return, given a certain level
of the variance of the portfolio’s value; and (ii) a dual program, to be introduced below, by which one
minimizes the variance of the portfolio’s value, given a certain level of the portfolio expected return.
1.4.1 The primal program

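Given Eq. (1.8), the Lagrangian function associated with [1.P2] is,
$$L = \pi^\top b + w - \nu_1\left(\pi^\top\Sigma\pi - w^2\cdot v_p^2\right) - \nu_2\left(\pi^\top 1_m - w\right),$$
where ν1 and ν2 are two Lagrange multipliers. The first order conditions are,
$$\hat\pi = \frac{1}{2\nu_1}\Sigma^{-1}(b - \nu_2 1_m), \quad \hat\pi^\top\Sigma\hat\pi = w^2\cdot v_p^2, \quad \hat\pi^\top 1_m = w. \tag{1A.1}$$
Using the first and the third conditions, we obtain,
$$w = 1_m^\top\hat\pi = \frac{1}{2\nu_1}\Big(\underbrace{1_m^\top\Sigma^{-1}b}_{\equiv\beta} - \nu_2\underbrace{1_m^\top\Sigma^{-1}1_m}_{\equiv\gamma}\Big) \equiv \frac{1}{2\nu_1}(\beta - \nu_2\gamma).$$
We can solve for ν2 , obtaining,
$$\nu_2 = \frac{\beta - 2w\nu_1}{\gamma}.$$
Replacing the solution for ν2 into the first condition in (1A.1) leaves,
$$\hat\pi = \frac{w}{\gamma}\Sigma^{-1}1_m + \frac{1}{2\nu_1}\Sigma^{-1}\left(b - \frac{\beta}{\gamma}1_m\right). \tag{1A.2}$$
Next, we derive the value of the program [1.P2]. We have,
$$E[w^+(\hat\pi)] - w = \hat\pi^\top b = \frac{w}{\gamma}\underbrace{1_m^\top\Sigma^{-1}b}_{\equiv\beta} + \frac{1}{2\nu_1}\Big(\underbrace{b^\top\Sigma^{-1}b}_{\equiv\alpha} - \frac{\beta}{\gamma}\underbrace{1_m^\top\Sigma^{-1}b}_{\equiv\beta}\Big) = \frac{w}{\gamma}\beta + \frac{1}{2\nu_1}\left(\alpha - \frac{\beta^2}{\gamma}\right). \tag{1A.3}$$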
≡β ≡α ≡β
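It is easy to check that
$$\begin{aligned}
\operatorname{var}[w^+(\hat\pi)] &= w^2\cdot v_p^2 = \hat\pi^\top\Sigma\hat\pi \\
&= \left[\frac{w}{\gamma}1_m^\top\Sigma^{-1} + \frac{1}{2\nu_1}\left(b^\top - \frac{\beta}{\gamma}1_m^\top\right)\Sigma^{-1}\right]\left[\frac{w}{\gamma}1_m + \frac{1}{2\nu_1}\left(b - \frac{\beta}{\gamma}1_m\right)\right] \\
&= \frac{w^2}{\gamma} + \left(\frac{1}{2\nu_1}\right)^2\left(\alpha - \frac{\beta^2}{\gamma}\right).
\end{aligned} \tag{1A.4}$$
Let us gather Eqs. (1A.3) and (1A.4),
$$\begin{cases}
\mu_p(v_p) \equiv \dfrac{E[w^+(\hat\pi)] - w}{w} = \dfrac{\beta}{\gamma} + \dfrac{1}{2\nu_1 w}\left(\alpha - \dfrac{\beta^2}{\gamma}\right) \\[3mm]
v_p^2 = \dfrac{1}{\gamma} + \left(\dfrac{1}{2\nu_1 w}\right)^2\left(\alpha - \dfrac{\beta^2}{\gamma}\right)
\end{cases} \tag{1A.5}$$
where we have emphasized the dependence of µp on vp , which arises through the presence of the
Lagrange multiplier ν1 .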
Let us rewrite the first equation in (1A.5) as follows,
$$\frac{1}{2\nu_1 w} = \left(\alpha\gamma - \beta^2\right)^{-1}\left(\gamma\mu_p(v_p) - \beta\right). \tag{1A.6}$$
We can use this expression for ν1 to express π̂ in Eq. (1A.2) in terms of the portfolio expected return,
µp (vp ). We have,
$$\frac{\hat\pi}{w} = \frac{\Sigma^{-1}1_m}{\gamma} + \left(\alpha\gamma - \beta^2\right)^{-1}\left(\gamma\mu_p(v_p) - \beta\right)\left(\Sigma^{-1}b - \frac{\Sigma^{-1}\beta}{\gamma}1_m\right).$$
By rearranging terms in the previous equation, we obtain Eq. (1.9) in the main text.
Finally, we substitute Eq. (1A.6) into the second equation in (1A.5), and obtain:
$$v_p^2 = \frac{1}{\gamma}\left[1 + \left(\alpha\gamma - \beta^2\right)^{-1}\left(\gamma\mu_p(v_p) - \beta\right)^2\right],$$
which is Eq. (1.10) in the main text. Note, also, that the second condition in (1A.5) reveals that,
$$\left(\frac{1}{2\nu_1 w}\right)^2 = \frac{\gamma v_p^2 - 1}{\alpha\gamma - \beta^2}.$$
Given that αγ − β² > 0, the previous equation confirms the properties of the global minimum variance
portfolio stated in the main text.
1.4.2 The dual program

We now solve the dual program, defined as follows,
+
w (π)
π̂ = arg minm var s.t. E w+ (π) = Ep and w = π⊤ 1m , [1A.P2-dual]
π∈R w
for some constant Ep . The first order conditions are
π̂ ν 1 w −1 ν 2 w −1
= Σ b+ Σ 1m ; π̂⊤ b = Ep − w ; w = π̂⊤ 1m ; (1A.7)
w 2 2
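where $\nu_1$ and $\nu_2$ are two Lagrange multipliers. By replacing the first condition in (1A.7) into the second one,
$$E_p - w = \hat\pi^\top b = w^2\Big(\frac{\nu_1}{2}\underbrace{b^\top\Sigma^{-1}b}_{\equiv\alpha} + \frac{\nu_2}{2}\underbrace{\mathbf{1}_m^\top\Sigma^{-1}b}_{\equiv\beta}\Big) \equiv w^2\left(\frac{\nu_1}{2}\alpha + \frac{\nu_2}{2}\beta\right). \tag{1A.8}$$
By replacing the first condition in (1A.7) into the third one,
$$w = \hat\pi^\top\mathbf{1}_m = w^2\Big(\frac{\nu_1}{2}\underbrace{b^\top\Sigma^{-1}\mathbf{1}_m}_{\equiv\beta} + \frac{\nu_2}{2}\underbrace{\mathbf{1}_m^\top\Sigma^{-1}\mathbf{1}_m}_{\equiv\gamma}\Big) \equiv w^2\left(\frac{\nu_1}{2}\beta + \frac{\nu_2}{2}\gamma\right). \tag{1A.9}$$
Next, let $\mu_p \equiv \frac{E_p - w}{w}$. By Eqs. (1A.8) and (1A.9), the solutions for $\nu_1$ and $\nu_2$ are,
$$\frac{\nu_1 w}{2} = \frac{\mu_p\gamma - \beta}{\alpha\gamma - \beta^2}; \qquad \frac{\nu_2 w}{2} = \frac{\alpha - \beta\mu_p}{\alpha\gamma - \beta^2}.$$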
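Therefore, the solution for the portfolio in Eq. (1A.7) is,
$$\frac{\hat\pi}{w} = \frac{\gamma\mu_p - \beta}{\alpha\gamma - \beta^2}\Sigma^{-1}b + \frac{\alpha - \beta\mu_p}{\alpha\gamma - \beta^2}\Sigma^{-1}\mathbf{1}_m.$$
Finally, the value of the program is,
$$\operatorname{var}\left[\frac{w^+(\hat\pi)}{w}\right] = \frac{1}{w^2}\hat\pi^\top\Sigma\hat\pi = \frac{1}{w}\hat\pi^\top b\,\frac{\mu_p\gamma - \beta}{\alpha\gamma - \beta^2} + \frac{1}{w}\hat\pi^\top\mathbf{1}_m\frac{\alpha - \mu_p\beta}{\alpha\gamma - \beta^2} = \frac{\gamma\mu_p^2 - 2\beta\mu_p + \alpha}{\alpha\gamma - \beta^2} = \frac{(\gamma\mu_p - \beta)^2}{(\alpha\gamma - \beta^2)\gamma} + \frac{1}{\gamma},$$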
which is exactly Eq. (1.10) in the main text.
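The closed-form solution of the dual program can be checked numerically. The following is a minimal sketch, assuming three hypothetical assets: the numbers in `b` and `Sigma`, and the helper names `solve`, `dot`, `frontier_weights` and `variance`, are illustrative and do not come from the text. It builds the relative weights $\hat\pi/w$ for a target $\mu_p$ and compares their variance with the closed-form frontier expression.

```python
def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [x - f * z for x, z in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


# Hypothetical inputs: expected net returns b and covariance matrix Sigma.
b = [0.05, 0.08, 0.12]
Sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
ones = [1.0, 1.0, 1.0]

Sinv_b = solve(Sigma, b)       # Sigma^{-1} b
Sinv_1 = solve(Sigma, ones)    # Sigma^{-1} 1_m
alpha, beta, gamma = dot(b, Sinv_b), dot(ones, Sinv_b), dot(ones, Sinv_1)
D = alpha * gamma - beta ** 2  # strictly positive when Sigma is positive definite


def frontier_weights(mu_p):
    """Relative weights pi_hat / w solving the dual program for target mu_p."""
    c_b = (gamma * mu_p - beta) / D
    c_1 = (alpha - beta * mu_p) / D
    return [c_b * x + c_1 * z for x, z in zip(Sinv_b, Sinv_1)]


def variance(pi):
    return dot(pi, [dot(row, pi) for row in Sigma])


mu_p = 0.10
pi = frontier_weights(mu_p)
# Closed-form value of the program: (1/gamma) * [1 + (gamma*mu_p - beta)^2 / D]
v2_closed = (1.0 + (gamma * mu_p - beta) ** 2 / D) / gamma
print(pi, variance(pi), v2_closed)
```

With more than two assets the two constraints do not pin down the portfolio, so moving the weights in any direction that preserves both the budget and the expected return should only raise the variance.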
1.5 Appendix 2: The market portfolio

1.5.1 The tangent portfolio is the market portfolio
Let us define the market capitalization for any asset i as the value of all the assets i that are outstanding in the market, viz.
$$Cap_i \equiv \bar\theta_i S_i, \quad i = 1, \cdots, m,$$
where $\bar\theta_i$ is the number of assets i outstanding in the market. The market capitalization of all the assets is simply
$$Cap_M \equiv \sum_{i=1}^m Cap_i.$$
The market portfolio, then, is the portfolio with relative weights given by,
$$\bar\pi_{M,i} \equiv \frac{Cap_i}{Cap_M}, \quad i = 1, \cdots, m.$$
Next, suppose there are N investors and that each investor j has wealth $w_j$, which he invests in two funds, a safe asset and the tangent portfolio. Let $w_j^f$ be the wealth investor j invests in the safe asset and $w_j - w_j^f$ the remaining wealth the investor invests in the tangent portfolio. The tangent portfolio is defined as $\bar\pi_T \equiv \frac{\pi_T}{w_j}$, for some $\pi_T$ solution to [P2], and is obviously independent of $w_j$ (see Eq. (1.15) in the main text). The equilibrium in the stock market requires that
$$Cap_M\cdot\bar\pi_M = \sum_{j=1}^N\left(w_j - w_j^f\right)\bar\pi_T = \sum_{j=1}^N w_j\cdot\bar\pi_T = Cap_M\cdot\bar\pi_T,$$
where the second equality follows because the safe asset is in zero net supply and, hence, $\sum_{j=1}^N w_j^f = 0$; and the third equality holds because all the wealth in the economy is invested in stocks, in equilibrium.
1.5.2 Tangency condition

We check that the CML and the efficient portfolio frontier have the same slope at the market portfolio. Let us impose the following tangency condition of the CML with the efficient portfolio frontier in Figure 1.2, AMC, at the point M:
$$\sqrt{Sh} = \frac{\alpha\gamma - \beta^2}{\gamma\mu_M - \beta}\,v_M. \tag{1A.10}$$
The left hand side of this equation is the slope of the CML, obtained through Eq. (1.6). The right hand side is the slope of the efficient portfolio frontier, obtained by differentiating $\mu_p(v)$ in the expression for the portfolio frontier in Eq. (1.11), and setting $v = v_M$ in
$$\frac{d\mu_p(v)}{dv} = \sqrt{\left(\gamma v^2 - 1\right)^{-1}\left(\alpha\gamma - \beta^2\right)}\;v = \frac{\alpha\gamma - \beta^2}{\gamma\mu_p(v) - \beta}\,v,$$
where the second equality follows, again, by Eq. (1.11). By Eqs. (1A.10) and (1.14), we need to show that,
$$\frac{\gamma\mu_M - \beta}{\alpha\gamma - \beta^2} = \frac{1}{\beta - \gamma r}.$$
By plugging $\mu_M = r + \sqrt{Sh}\cdot v_M$ into the previous equality and rearranging terms,
$$v_M = \frac{\sqrt{Sh}}{\beta - \gamma r},$$
where we have made use of the equality $Sh = \alpha - 2\beta r + \gamma r^2$, obtained by elaborating on the definition of the Sharpe market performance $Sh$ given in Eq. (1.4). This is indeed the standard deviation of the market portfolio given in Eq. (1.14).
1.6 Appendix 3: An alternative derivation of the SML

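The vector of covariances of the m asset returns with the market portfolio is:
$$\operatorname{cov}(\tilde x, \tilde x_M) = \operatorname{cov}\left(\tilde x, \tilde x\cdot\frac{\pi_M}{w}\right) = \Sigma\frac{\pi_M}{w} = \frac{1}{\beta - \gamma r}\left(b - \mathbf{1}_m r\right), \tag{1A.11}$$
where we have used the expression for the market portfolio given in Eq. (1.15). Next, premultiply the previous equation by $\frac{\pi_M^\top}{w}$ to obtain:
$$v_M^2 = \frac{\pi_M^\top}{w}\Sigma\frac{\pi_M}{w} = \frac{\pi_M^\top}{w}\frac{1}{\beta - \gamma r}\left(b - \mathbf{1}_m r\right) = \frac{1}{(\beta - \gamma r)^2}Sh, \tag{1A.12}$$
or $v_M = \frac{\sqrt{Sh}}{\beta - \gamma r}$, which confirms Eq. (1.14).
Let us rewrite Eq. (1A.11) component by component. That is, for $i = 1, \cdots, m$,
$$\sigma_{iM} \equiv \operatorname{cov}(\tilde x_i, \tilde x_M) = \frac{1}{\beta - \gamma r}(b_i - r) = \frac{v_M}{\sqrt{Sh}}(b_i - r) = \frac{v_M^2}{\mu_M - r}(b_i - r),$$
where the last two equalities follow by Eq. (1A.12) and by the relation, $\sqrt{Sh} = \frac{\mu_M - r}{v_M}$. By rearranging terms, we obtain Eq. (1.18).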
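The relations in this appendix can be verified on a small example. The sketch below assumes three hypothetical assets and a hypothetical safe rate `r` (all numbers and helper names are illustrative, not taken from the text). It takes the market portfolio weights to be $\Sigma^{-1}(b - \mathbf{1}_m r)/(\beta - \gamma r)$, as implied by Eq. (1A.11), and checks the expression for $v_M^2$ together with the rearranged covariance relation $b_i - r = \frac{\sigma_{iM}}{v_M^2}(\mu_M - r)$.

```python
def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [x - f * z for x, z in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


# Hypothetical inputs (assumed so that beta - gamma * r > 0).
b = [0.05, 0.08, 0.12]
Sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
r = 0.02
ones = [1.0] * len(b)

Sinv_b, Sinv_1 = solve(Sigma, b), solve(Sigma, ones)
alpha, beta, gamma = dot(b, Sinv_b), dot(ones, Sinv_b), dot(ones, Sinv_1)
Sh = alpha - 2 * beta * r + gamma * r ** 2

# Market (tangent) portfolio in relative weights, pi_M / w.
pi_M = [x / (beta - gamma * r) for x in solve(Sigma, [bi - r for bi in b])]

cov_M = [dot(row, pi_M) for row in Sigma]   # sigma_iM, i = 1..m
v2_M = dot(pi_M, cov_M)                     # variance of the market portfolio
mu_M = dot(pi_M, b)                         # expected return on the market

# Gap between b_i - r and (sigma_iM / v_M^2) * (mu_M - r); should be ~0.
sml_gap = [(bi - r) - s / v2_M * (mu_M - r) for bi, s in zip(b, cov_M)]
print(pi_M, v2_M, sml_gap)
```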
1.7 Appendix 4: Broader definitions of risk - Rothschild and Stiglitz theory

The relevant papers are Rothschild and Stiglitz (1970, 1971). As for notation, any variable with a tilde is a random variable. Let us consider the following definition of stochastic dominance:
Definition A.1 (First-order stochastic dominance). $\tilde x_2$ dominates $\tilde x_1$ if, for each utility function $u$ satisfying $u' \ge 0$, we also have that $E[u(\tilde x_2)] \ge E[u(\tilde x_1)]$.
We have:
Theorem A.2. The following statements are equivalent:

a) x̃2 dominates x̃1 , or E [u (x̃2 )] ≥ E [u (x̃1 )];
b) ∃ random variable η > 0 : x̃2 = x̃1 + η;
c) ∀x > 0, F1 (x) ≥ F2 (x).
Proof. We provide the proof when the support is compact, say $[a, b]$. First, we show that b) ⇒ c). We have: $\forall t_0 \in [a, b]$, $F_1(t_0) \equiv \Pr(\tilde x_1 \le t_0) = \Pr(\tilde x_2 \le t_0 + \eta) \ge \Pr(\tilde x_2 \le t_0) \equiv F_2(t_0)$. Next, we show that c) ⇒ a). By integrating by parts,
$$E[u(x)] = \int_a^b u(x)\,dF(x) = u(b) - \int_a^b u'(x)F(x)\,dx,$$
where we have used the fact that: $F(a) = 0$ and $F(b) = 1$. Therefore,
$$E[u(\tilde x_2)] - E[u(\tilde x_1)] = \int_a^b u'(x)\left[F_1(x) - F_2(x)\right]dx,$$
which is nonnegative because $u' \ge 0$ and $F_1 \ge F_2$. Finally, it is easy to show that a) ⇒ b).
Next, we turn to the definition of “increasing risk”:
Definition A.3. x̃1 is more risky than x̃2 if, for each function u satisfying u′′ < 0, we have also
that E [u (x̃1 )] ≤ E [u (x̃2 )] for x̃1 and x̃2 having the same mean.
This definition of “increasing risk” does not rely on the sign of $u'$. Furthermore, if $\operatorname{var}(\tilde x_1) > \operatorname{var}(\tilde x_2)$, $\tilde x_1$ is not necessarily more risky than $\tilde x_2$, according to the previous definition. The standard counterexample is the following one. Let $\tilde x_2 = 1$ w.p. 0.8, and 100 w.p. 0.2. Let $\tilde x_1 = 10$ w.p. 0.99, and 1090 w.p. 0.01. We have, $E(\tilde x_1) = E(\tilde x_2) = 20.8$, but $\operatorname{var}(\tilde x_1) = 11547.36$ and $\operatorname{var}(\tilde x_2) = 1568.16$. However, consider $u(x) = \log x$. Then, $E(\log(\tilde x_1)) = 2.35 > E(\log(\tilde x_2)) = 0.92$. It is easily seen that in this particular example, the distribution function $F_1$ of $\tilde x_1$ “intersects” $F_2$, so that the integral condition in part b) of Theorem A.4 below fails.
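The moments of the two lotteries in the counterexample can be recomputed directly. The snippet below is a small sketch (the helper name `moments` is illustrative) that evaluates the mean, the variance and the expected log utility of each two-point distribution.

```python
import math


def moments(outcomes, probs):
    """Return (mean, variance, E[log x]) of a discrete random variable."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    elog = sum(p * math.log(x) for x, p in zip(outcomes, probs))
    return mean, var, elog


# x1 = 10 w.p. 0.99 and 1090 w.p. 0.01; x2 = 1 w.p. 0.8 and 100 w.p. 0.2
mean1, var1, elog1 = moments([10, 1090], [0.99, 0.01])
mean2, var2, elog2 = moments([1, 100], [0.8, 0.2])

print(mean1, var1, elog1)
print(mean2, var2, elog2)
```

The first lottery has the larger variance, yet a log-utility agent prefers it, which is the point of the counterexample: a variance ranking alone does not deliver a ranking by riskiness.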
Theorem A.4. The following statements are equivalent:

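a) $\tilde x_1$ is more risky than $\tilde x_2$;
b) $\tilde x_1$ has more weight in the tails than $\tilde x_2$, i.e. $\forall t$, $\int_{-\infty}^t\left[F_1(x) - F_2(x)\right]dx \ge 0$;
c) $\tilde x_1$ is a mean preserving spread of $\tilde x_2$, i.e. there exists a random variable $\epsilon$ such that $\tilde x_1$ has the same distribution as $\tilde x_2 + \epsilon$, and $E(\epsilon \mid \tilde x_2 = x_2) = 0$.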
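Proof. Let us begin with c) ⇒ a). By Jensen's inequality for the concave function $u$, we have,
$$
\begin{aligned}
E[u(\tilde x_1)] &= E[u(\tilde x_2 + \epsilon)] \\
&= E\left[E\left(u(\tilde x_2 + \epsilon) \mid \tilde x_2 = x_2\right)\right] \\
&\le E\left[u\left(E(\tilde x_2 + \epsilon \mid \tilde x_2 = x_2)\right)\right] \\
&= E\left[u\left(E(\tilde x_2 \mid \tilde x_2 = x_2)\right)\right] \\
&= E[u(\tilde x_2)].
\end{aligned}
$$
As regards a) ⇒ b), we have that:
$$
\begin{aligned}
E[u(\tilde x_1)] - E[u(\tilde x_2)] &= \int_a^b u(x)\left[f_1(x) - f_2(x)\right]dx \\
&= u(x)\left[F_1(x) - F_2(x)\right]\Big|_a^b - \int_a^b u'(x)\left[F_1(x) - F_2(x)\right]dx \\
&= -\int_a^b u'(x)\left[F_1(x) - F_2(x)\right]dx \\
&= -\left\{u'(x)\left[\bar F_1(x) - \bar F_2(x)\right]\Big|_a^b - \int_a^b u''(x)\left[\bar F_1(x) - \bar F_2(x)\right]dx\right\} \\
&= \int_a^b u''(x)\left[\bar F_1(x) - \bar F_2(x)\right]dx - u'(b)\left[\bar F_1(b) - \bar F_2(b)\right],
\end{aligned}
$$
where $\bar F_i(x) = \int_a^x F_i(u)\,du$. Because $\tilde x_1$ and $\tilde x_2$ have the same mean, and $\int_a^b F_i(x)\,dx = b - E(\tilde x_i)$, we have $\bar F_1(b) = \bar F_2(b)$, so that the last term vanishes. Now, $\tilde x_1$ is more risky than $\tilde x_2$ means that $E[u(\tilde x_1)] \le E[u(\tilde x_2)]$ for $u'' < 0$. By the previous relation, then, $\bar F_1(x) \ge \bar F_2(x)$. Finally, see Rothschild and Stiglitz (1970) p. 238 for the proof of b) ⇒ c).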
References
Connor, G. (1984): “A Unified Beta Pricing Theory.” Journal of Economic Theory 34, 13-31.
Huang, C-f. and R.H. Litzenberger (1988): Foundations for Financial Economics. New York: North-Holland.
Huberman, G. (1983): “A Simplified Approach to Arbitrage Pricing Theory.” Journal of Economic Theory 28, 183-191.
Markowitz, H. (1952): “Portfolio Selection.” Journal of Finance 7, 77-91.
Ross, S. (1976): “Arbitrage Theory of Capital Asset Pricing.” Journal of Economic Theory 13, 341-360.
Rothschild, M. and J. Stiglitz (1970): “Increasing Risk: I. A Definition.” Journal of Economic Theory 2, 225-243.
Rothschild, M. and J. Stiglitz (1971): “Increasing Risk: II. Its Economic Consequences.” Journal of Economic Theory 3, 66-84.
Sharpe, W. F. (1964): “Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk.” Journal of Finance 19, 425-442.
2 The CAPM in general equilibrium
2.1 Introduction
This chapter develops the general equilibrium foundations of the CAPM, within a framework that abstracts from the production sphere of the economy. For this reason, we usually refer to the resulting model as the “Consumption-CAPM.” First, we review the static model of general equilibrium, without uncertainty. Then, we illustrate the economic rationale behind the existence of financial assets in an uncertain world. Finally, we derive the Consumption-CAPM.
2.2 The static general equilibrium in a nutshell

We consider an economy with n agents and m commodities. Let $w_{ij}$ denote the amount of the i-th commodity the j-th agent is endowed with, and let $w_j = [w_{1j}, \cdots, w_{mj}]$. Let the price vector be $p = [p_1, \cdots, p_m]$, where $p_i$ is the price of the i-th commodity. Let $w_i = \sum_{j=1}^n w_{ij}$ be the total endowment of the i-th commodity in the economy, and $W = [w_1, \cdots, w_m]$ the corresponding endowments bundle in the economy.
The j-th agent has utility function $u_j(c_{1j}, \cdots, c_{mj})$, where $(c_{ij})_{i=1}^m$ denotes his consumption bundle. We assume the following standard conditions for the utility functions $u_j$:
Assumption 2.1 (Preferences). The utility functions $u_j$ satisfy the following properties: (i) Monotonicity; (ii) Continuity; and (iii) Quasi-concavity: if $u_j(x) \ge u_j(y)$ then, $\forall\alpha \in (0, 1)$, $u_j(\alpha x + (1 - \alpha)y) > u_j(y)$ or, $\frac{\partial u_j}{\partial c_{ij}}(c_{1j}, \cdots, c_{mj}) \ge 0$ and $\frac{\partial^2 u_j}{\partial c_{ij}^2}(c_{1j}, \cdots, c_{mj}) \le 0$.
Let $B_j(p_1, \cdots, p_m) = \{(c_{1j}, \cdots, c_{mj}) : \sum_{i=1}^m p_i c_{ij} \le \sum_{i=1}^m p_i w_{ij} \equiv R_j\}$, a bounded, closed and convex set, hence a compact set. Each agent maximizes his utility function subject to the budget constraint:
$$\max_{\{c_{ij}\}} u_j(c_{1j}, \cdots, c_{mj}) \quad\text{subject to}\quad (c_{1j}, \cdots, c_{mj}) \in B_j(p_1, \cdots, p_m). \tag*{[P1]}$$
This problem certainly has a solution, for $B_j$ is a compact set and, by Assumption 2.1, $u_j$ is continuous, and a continuous function attains its maximum on a compact set. Moreover, the Appendix shows that this maximum is unique.
The first order conditions to [P1] are, for each agent j,
$$
\begin{cases}
\dfrac{\partial u_j/\partial c_{1j}}{p_1} = \dfrac{\partial u_j/\partial c_{2j}}{p_2} = \cdots = \dfrac{\partial u_j/\partial c_{mj}}{p_m} \\[2ex]
\sum_{i=1}^m p_i c_{ij} = \sum_{i=1}^m p_i w_{ij}
\end{cases} \tag{2.1}
$$
These conditions form a system of m equations with m unknowns. Let us denote the solution to this system with $[\hat c_{1j}(p, w_j), \cdots, \hat c_{mj}(p, w_j)]$. The total demand for the i-th commodity is,
$$\hat c_i(p, w) = \sum_{j=1}^n \hat c_{ij}(p, w_j), \quad i = 1, \cdots, m.$$
We emphasize that the economy we consider in this chapter is one that completely abstracts from production. Here, prices are the key determinants of how resources are allocated in the end. The perspective is, of course, radically different from that taken by the Classical school (Ricardo, Marx and Sraffa), for which prices and resource allocation cannot be disentangled from the production side of the economy. In the next chapter and more advanced parts of the lectures, we consider the asset pricing implications of production, following the Neoclassical perspective.
2.2.1 Walras’ Law

Let us plug the demand functions of the j-th agent into the constraint of [P1], to obtain,
$$\forall p, \quad 0 = \sum_{i=1}^m p_i\left(\hat c_{ij}(p, w_j) - w_{ij}\right). \tag{2.2}$$
Next, define the total excess demand for the i-th commodity as $e_i(p, w) \equiv \hat c_i(p, w) - w_i$. By aggregating the budget constraint across all the agents,
$$\forall p, \quad 0 = \sum_{j=1}^n\sum_{i=1}^m p_i\left(\hat c_{ij}(p, w_j) - w_{ij}\right) = \sum_{i=1}^m p_i e_i(p, w).$$
The previous equality is the celebrated Walras’ law.

Next, multiply p by λ ∈ R++ . Since the constraint to [P1] does not change, the excess demand
functions are the same, for each value of λ. In other words, the excess demand functions are
homogeneous of degree zero in the prices, or ei (λp, w) = ei (p, w), i = 1, · · ·, m. This property of
the excess demand functions is also referred to as absence of monetary illusion.
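Both properties can be illustrated in a small exchange economy. The sketch below assumes log (Cobb-Douglas) utilities $u_j = \sum_i a_{ij}\log c_{ij}$ with $\sum_i a_{ij} = 1$, for which demand is $\hat c_{ij} = a_{ij}\,(p\cdot w_j)/p_i$; the utility specification, the numbers and the function name `excess_demand` are assumptions made for illustration only. It evaluates excess demands at an arbitrary (non-equilibrium) price vector and checks Walras' law and homogeneity of degree zero.

```python
def excess_demand(p, a, w):
    """Excess demands e_i(p, w) in a log-utility exchange economy.

    a[j][i]: expenditure share of agent j on good i (rows sum to one);
    w[j][i]: endowment of good i held by agent j.
    """
    e = []
    for i in range(len(p)):
        demand = sum(a_j[i] * sum(pk * wk for pk, wk in zip(p, w_j)) / p[i]
                     for a_j, w_j in zip(a, w))
        e.append(demand - sum(w_j[i] for w_j in w))
    return e


# Hypothetical economy: two agents, three goods.
a = [[0.2, 0.3, 0.5],
     [0.6, 0.1, 0.3]]
w = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]

p = [1.0, 2.5, 0.7]                      # arbitrary strictly positive prices
e = excess_demand(p, a, w)
walras = sum(pi * ei for pi, ei in zip(p, e))      # value of excess demand
e_scaled = excess_demand([3.0 * pi for pi in p], a, w)

print(e, walras, e_scaled)
```

The value of aggregate excess demand is zero even though individual markets do not clear at these prices, and scaling all prices by the same factor leaves every excess demand unchanged.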
2.2.2 Competitive equilibrium

A competitive equilibrium is a vector $\bar p$ in $\mathbb{R}^m_+$ such that $e_i(\bar p, w) \le 0$ for all $i = 1, \cdots, m$, with at least one component of $\bar p$ being strictly positive. Furthermore, if there exists a $j : e_j(\bar p, w) < 0$, then $\bar p_j = 0$.
2.2.2.1 Back to Walras’ law
Walras’ law holds by the mere aggregation of the agents’ constraints. But the agents’ constraints are accounting identities. In particular, Walras’ law holds for any price vector and, a fortiori, it holds for the equilibrium price vector,
$$0 = \sum_{i=1}^m \bar p_i e_i(\bar p, w) = \sum_{i=1}^{m-1}\bar p_i e_i(\bar p, w) + \bar p_m e_m(\bar p, w). \tag{2.3}$$
Now suppose that the first m−1 markets are in equilibrium, or ei (p̄, w) ≤ 0, for i = 1, ···, m−1.
By the definition of an equilibrium, we have that sign (ei (p̄, w)) p̄i = 0. Therefore, by Eq. (2.3),
we conclude that if m − 1 markets are in equilibrium, then, the remaining market is also in
equilibrium.
2.2.2.2 The notion of numéraire
The excess demand functions are homogeneous of degree zero. Walras’ law implies that if m − 1
markets are in equilibrium, then, the m-th remaining market is also in equilibrium. We wish
to link these two results. A first remark is that by Walras’ law, the equations that define a
competitive equilibrium are not independent. Once m − 1 of these equations are satisfied, the
m-th remaining equation is also satisfied. In other words, there are m − 1 independent relations
and m unknowns in the equations that define a competitive equilibrium. So, there exists an
infinity of solutions.
Suppose, then, that we choose the m-th price to be a sort of exogenous datum. The result is that we obtain a system of $m - 1$ equations with $m - 1$ unknowns. Provided it exists, such a solution is a function $f$ of the m-th price, $\bar p_i = f_i(\bar p_m)$, $i = 1, \cdots, m - 1$. Then, we may refer to the m-th commodity as the numéraire. In other words, general equilibrium can only determine a structure of relative prices. The scale of these relative prices depends on the price level of the numéraire. It is easily checked that if the functions $f_i$ are homogeneous of degree one, multiplying $p_m$ by a strictly positive number $\lambda$ does not change the relative price structure. Indeed, by the equilibrium condition, for all $i = 1, \cdots, m$,
$$
\begin{aligned}
0 \ge e_i(\bar p_1, \bar p_2, \cdots, \lambda\bar p_m, w) &= e_i(f_1(\lambda\bar p_m), f_2(\lambda\bar p_m), \cdots, \lambda\bar p_m, w) \\
&= e_i(\lambda\bar p_1, \lambda\bar p_2, \cdots, \lambda\bar p_m, w) = e_i(\bar p_1, \bar p_2, \cdots, \bar p_m, w),
\end{aligned}
$$
where the second equality is due to the homogeneity property of the functions $f_i$, and the last equality holds because the excess demand functions $e_i$ are homogeneous of degree zero. In particular, by defining relative prices as $\hat p_j = p_j / p_m$, one has that $p_j = \hat p_j\cdot p_m$ is a function that is homogeneous of degree one. In other words, if $\lambda \equiv \bar p_m^{-1}$, then,
$$0 \ge e_i(\bar p_1, \cdots, \bar p_m, w) = e_i(\lambda\bar p_1, \cdots, \lambda\bar p_m, w) \equiv e_i\left(\frac{\bar p_1}{\bar p_m}, \cdots, 1, w\right).$$
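The role of the numéraire can be made concrete in a two-good economy. The sketch below again assumes log utilities, with good 2 as the numéraire; the closed-form price $\bar p_1 = \bar p_2\,\sum_j a_j w_{j2}\big/\sum_j (1 - a_j) w_{j1}$ follows from clearing the market for good 1 under that assumption, and all names and numbers are hypothetical.

```python
def equilibrium_prices(a1, w, p_num=1.0):
    """Equilibrium prices [p1, p2] with good 2 as numeraire, p2 = p_num.

    a1[j]: expenditure share of agent j on good 1 (log utility assumed);
    w[j] = [endowment of good 1, endowment of good 2] for agent j.
    """
    num = sum(s * w_j[1] for s, w_j in zip(a1, w))
    den = sum((1.0 - s) * w_j[0] for s, w_j in zip(a1, w))
    return [p_num * num / den, p_num]


def excess_demand(p, a1, w):
    """Excess demands for the two goods at prices p."""
    incomes = [p[0] * w_j[0] + p[1] * w_j[1] for w_j in w]
    e1 = sum(s * m / p[0] for s, m in zip(a1, incomes)) - sum(w_j[0] for w_j in w)
    e2 = sum((1.0 - s) * m / p[1] for s, m in zip(a1, incomes)) - sum(w_j[1] for w_j in w)
    return [e1, e2]


a1 = [0.3, 0.6]                      # hypothetical expenditure shares on good 1
w = [[1.0, 2.0], [3.0, 1.0]]         # hypothetical endowments

p = equilibrium_prices(a1, w)                     # numeraire price set to 1
p_scaled = equilibrium_prices(a1, w, p_num=5.0)   # numeraire price set to 5

print(p, excess_demand(p, a1, w))
print(p_scaled, excess_demand(p_scaled, a1, w))
```

Only the market for good 1 is cleared explicitly; the market for good 2 then clears by Walras' law, and changing the price of the numéraire rescales $\bar p_1$ one for one while leaving the relative price unchanged.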
2.2.3 Optimality
Let $c^j = (c_{1j}, \cdots, c_{mj})$ be the allocation to agent j, $j = 1, \cdots, n$. The following definition is the well-known concept of a desirable resource allocation in a society, according to Pareto.
Definition 2.2 (Pareto optimum). An allocation $\bar c = (\bar c^1, \cdots, \bar c^n)$ is a Pareto optimum if $\sum_{j=1}^n(\bar c^j - w^j) \le 0$ and there is no $c = (c^1, \cdots, c^n)$ such that $u_j(c^j) \ge u_j(\bar c^j)$, $j = 1, \cdots, n$, with strict inequality for at least one agent.
We have the following fundamental result:
Theorem 2.3 (First welfare theorem). Every competitive equilibrium is a Pareto optimum.
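Proof. Let us suppose on the contrary that $\bar c$ is an equilibrium but not a Pareto optimum. Then, there exists a feasible $c$ such that $u_j(c^j) \ge u_j(\bar c^j)$ for all j and $u_{j^*}(c^{j^*}) > u_{j^*}(\bar c^{j^*})$, for some $j^*$. Because $\bar c^{j^*}$ is optimal for agent $j^*$, $c^{j^*} \notin B_{j^*}(\bar p)$, or $\bar p c^{j^*} > \bar p w^{j^*}$. For every other agent, monotonicity implies $\bar p c^j \ge \bar p w^j$, since a strictly cheaper bundle that is at least as good as $\bar c^j$ would leave income to buy a strictly preferred one. By aggregating: $\bar p\sum_{j=1}^n c^j > \bar p\sum_{j=1}^n w^j$, which is unfeasible. It follows that no such $c$ exists, so $\bar c$ is a Pareto optimum.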
Next, we show that any Pareto optimal allocation can be “decentralized.” That is, corre-
sponding to a given Pareto optimum c̄, there exist ways of redistributing endowments around,
and a price vector p̄ : p̄c̄ = p̄w, which is an equilibrium for the initial set of resources.
Theorem 2.4 (Second welfare theorem). Every Pareto optimum can be decentralized.
Proof. In the appendix.
The previous theorem can be interpreted as one that supports an equilibrium with transfer payments. For any given Pareto optimum $\bar c^j$, a social planner can always give $\bar p w^j$ to each agent (with $\bar p\bar c^j = \bar p w^j$, where $w^j$ is chosen by the planner), and agents choose $\bar c^j$. Figure 2.1 illustrates such a decentralization procedure within the Edgeworth box. Suppose that the objective is to achieve $\bar c$. Given an initial allocation $w$ chosen by the planner, each agent is given $\bar p w^j$. Under laissez faire, $\bar c$ will obtain. In other words, agents are given a constraint of the form $\bar p c^j = \bar p w^j$. If $w^j$ and $\bar p$ are chosen so as to induce each agent to choose $\bar c^j$, then $\bar p$ is a supporting equilibrium price. In this case, the marginal rates of substitution are identical across agents, as established by the following celebrated result:
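Theorem 2.5 (Characterization of Pareto optima: I). A feasible allocation $\bar c = (\bar c^1, \cdots, \bar c^n)$ is a Pareto optimum if and only if there exists a $\tilde\phi \in \mathbb{R}^{m-1}_{++}$ such that
$$\tilde\nabla u_j = \tilde\phi, \quad j = 1, \cdots, n, \quad\text{where}\quad \tilde\nabla u_j \equiv \left(\frac{\partial u_j/\partial c_{2j}}{\partial u_j/\partial c_{1j}}, \cdots, \frac{\partial u_j/\partial c_{mj}}{\partial u_j/\partial c_{1j}}\right). \tag{2.4}$$
Proof. A Pareto optimum satisfies:
$$
\bar c \in \arg\max_{c\in\mathbb{R}^{m\cdot n}_+} u_1(c^1) \quad\text{subject to}\quad
\begin{cases}
u_j(c^j) \ge \bar u_j, \ j = 2, \cdots, n & (\lambda_j, \ j = 2, \cdots, n) \\[1ex]
\sum_{j=1}^n(c^j - w^j) \le 0 & (\phi_i, \ i = 1, \cdots, m)
\end{cases}
$$
The Lagrangian function associated with this program is
$$L = u_1(c^1) + \sum_{j=2}^n\lambda_j\left(u_j(c^j) - \bar u_j\right) - \sum_{i=1}^m\phi_i\sum_{j=1}^n\left(c_{ij} - w_{ij}\right),$$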
FIGURE 2.1. Decentralizing a Pareto optimum
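and the first order conditions are
$$
\begin{cases}
\dfrac{\partial u_1}{\partial c_{11}} = \phi_1 \\
\quad\cdots \\
\dfrac{\partial u_1}{\partial c_{m1}} = \phi_m
\end{cases}
$$
and, for $j = 2, \cdots, n$,
$$
\begin{cases}
\lambda_j\dfrac{\partial u_j}{\partial c_{1j}} = \phi_1 \\
\quad\cdots \\
\lambda_j\dfrac{\partial u_j}{\partial c_{mj}} = \phi_m
\end{cases}
$$
In both systems of equations, divide each equation by the first, obtaining exactly Eq. (2.4), with $\tilde\phi = \left(\frac{\phi_2}{\phi_1}, \cdots, \frac{\phi_m}{\phi_1}\right)$. The converse is straightforward.
There is a simple and appealing interpretation of the Kuhn-Tucker multipliers $\phi$ on the constraints of Theorem 2.5. Note that by Eq. (2.1), in the competitive equilibrium,
$$\tilde\nabla u_j = \tilde p \equiv \left(\frac{p_2}{p_1}, \cdots, \frac{p_m}{p_1}\right).$$
But because a competitive equilibrium is also a Pareto optimum, then, by Theorem 2.5,
$$\tilde\nabla u_j = \tilde\phi \equiv \left(\frac{\phi_2}{\phi_1}, \cdots, \frac{\phi_m}{\phi_1}\right).$$
Hence, $\tilde\phi$ represents the vector of relative, shadow prices arising within the centralized allocation process.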
We provide a further characterization of Pareto optimal allocations.
Theorem 2.6 (Characterization of Pareto optima: II). A feasible allocation $\bar c = (\bar c^1, \cdots, \bar c^n)$ is a Pareto optimum if and only if there exists $\ell > 0$ such that $\bar c$ is a solution to the following program:
$$u(w, \ell) = \max_{c^1, \cdots, c^n}\sum_{j=1}^n\ell_j u_j(c^j) \quad\text{subject to}\quad \sum_{j=1}^n c^j \le w \quad (\psi_i, \ i = 1, \cdots, m) \tag*{[P2]}$$
Proof. The if part is simple and at the same time instructive. Let us solve the program in [P2]. The Lagrangian is,
$$L = \sum_{j=1}^n\ell_j u_j(c^j) - \sum_{i=1}^m\psi_i\sum_{j=1}^n\left(c_{ij} - w_{ij}\right),$$
and the first order conditions are, for $j = 1, \cdots, n$,
$$\ell_j\nabla u_j = \psi \equiv (\psi_1, \cdots, \psi_m)^\top, \qquad \nabla u_j \equiv \left(\frac{\partial u_j}{\partial c_{1j}}, \cdots, \frac{\partial u_j}{\partial c_{mj}}\right)^\top. \tag{2.5}$$
That is, $\tilde\nabla u_j$ equals the same vector of constants for all the agents, just as in Theorem 2.5. The converse to this theorem follows by an application of the usual separating theorem, as in Duffie (2001, Chapter 1).
Note, if $\ell_1 = 1$ and $\ell_j = \lambda_j$ for $j = 2, \cdots, n$, then, $\psi_i = \phi_i$ ($i = 1, \cdots, m$) and so the first order conditions in Theorems 2.5 and 2.6 would lead to the same allocation. More generally, we have:
Theorem 2.7 (Centralization of competitive equilibrium through Pareto weightings). The outcome of any competitive equilibrium can be obtained through a central planner who maximizes the program in [P2], with a system of social weights equal to $\ell_j = 1/\kappa_j$, where $\kappa_j$ is the marginal utility of income for agent j.
So agents with high marginal utility of income, for a given price vector, will receive little social weight in the centralized planner allocation procedure. This result is particularly useful when studying financial markets in economies with heterogeneous agents. Theorem 2.7 is also a point of reference from which to start when studying asset prices in a world of incomplete markets. Chapter 8 contains several examples of these applications.
Proof of Theorem 2.7. In the competitive equilibrium,
$$\nabla u_j = \kappa_j p, \qquad p \equiv (p_1, \cdots, p_m), \tag{2.6}$$
where $\kappa_j$ are the Lagrange multipliers for the agents’ budget constraints, so that $\kappa_j$ is agent j’s marginal utility of income:
$$\kappa_j = \frac{\partial}{\partial m_j}u_j\left(\hat c_{1j}(p, w_{1j}, \cdots, w_{mj}), \cdots, \hat c_{mj}(p, w_{1j}, \cdots, w_{mj})\right), \qquad m_j \equiv \sum_{i=1}^m p_i w_{ij}.$$
By comparing the competitive equilibrium solution in Eq. (2.6) with the Pareto optimality property of the equilibrium in Eq. (2.5), we deduce that a competitive equilibrium $(\bar c, p)$ can be implemented by a social planner acting as in Theorem 2.6, when $\ell_j = 1/\kappa_j$, in which case it also follows that, necessarily, $\psi = p$, by the resources constraint, $\sum_{j=1}^n c^j \le w$, that has to hold both in the competitive economy and the centralized one.
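Theorem 2.7 can be illustrated in a two-agent, two-good economy. The sketch below assumes log utilities $u_j = \sum_i a_{ij}\log c_{ij}$ with $\sum_i a_{ij} = 1$, for which demand is $a_{ij}m_j/p_i$ and the marginal utility of income is $\kappa_j = 1/m_j$; these functional forms and all numbers are assumptions for illustration. It computes the competitive allocation, then solves the planner's first order conditions $\ell_j a_{ij}/c_{ij} = \psi_i$ with weights $\ell_j = 1/\kappa_j$, and compares the two.

```python
# Hypothetical economy: a[j][i] are expenditure shares, w[j][i] endowments.
a = [[0.3, 0.7],
     [0.6, 0.4]]
w = [[1.0, 2.0],
     [3.0, 1.0]]
goods = range(2)

# Competitive equilibrium, with good 2 as numeraire (log-utility closed form).
p1 = (sum(a_j[0] * w_j[1] for a_j, w_j in zip(a, w))
      / sum((1.0 - a_j[0]) * w_j[0] for a_j, w_j in zip(a, w)))
p = [p1, 1.0]
income = [sum(p[i] * w_j[i] for i in goods) for w_j in w]        # m_j
c_market = [[a_j[i] * m_j / p[i] for i in goods] for a_j, m_j in zip(a, income)]

# Marginal utility of income under log utility, and the implied social weights.
kappa = [1.0 / m_j for m_j in income]
ell = [1.0 / k for k in kappa]

# Planner: l_j * a_ij / c_ij = psi_i and sum_j c_ij = total endowment of good i.
total = [sum(w_j[i] for w_j in w) for i in goods]
psi = [sum(l * a_j[i] for l, a_j in zip(ell, a)) / total[i] for i in goods]
c_planner = [[l * a_j[i] / psi[i] for i in goods] for l, a_j in zip(ell, a)]

print(p, psi)
print(c_market)
print(c_planner)
```

Under these assumptions the planner's multipliers coincide with the equilibrium prices, and the agent with the larger income (lower marginal utility of income) receives the larger social weight.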
2.3 Time and uncertainty

“A commodity is characterized by its physical properties, the date and the place at which
it will be available.”
Gerard Debreu (1959, Chapter 2)
General equilibrium theory can be used to study a variety of fields, by making an appropriate
use of the previous definition - from the theory of international commerce to finance. To deal
with uncertainty, Debreu (1959, Chapter 7) extended the previous definition, by emphasizing
that a commodity should be described through a list of physical properties, with the structure
of dates and places replaced by some event structure. The following example illustrates the
difference between two contracts underlying delivery of corn arising under conditions of certainty
(case A) and uncertainty (case B):
A The first agent will deliver 5000 tons of corn of a specified type to the second agent, who
will accept the delivery at date t and in place ℓ.
B The first agent will deliver 5000 tons of corn of a specified type to the second agent, who
will accept the delivery in place ℓ and in the event st at time t. If st does not occur at
time t, no delivery will take place.
In both cases, the contract is paid at the time it is actually agreed.

The model of the previous section can be used to deal with contracts containing statements such as that in case B above. For example, consider a two-period economy. Suppose that in the second period, $s_n$ mutually exclusive and collectively exhaustive states of nature may occur. Then, we may recover the model of the previous section, once we replace m (the number of commodities described by physical properties, dates and places) with $m^*$, where $m^* = s_n\cdot m$. With $m^*$ replacing m, the competitive equilibrium in this economy is defined as the competitive equilibrium in the economy of the previous section.
The important assumption underlying the previous simplifying trick is that markets exist where commodities for all states of nature are traded. Such “contingent” markets are complete in that a market is open for every commodity in all states of nature. Therefore, the agents may implement any feasible action plan and, therefore, the resource allocation is Pareto-optimal. The presumed existence of $s_n\cdot m$ contingent markets is, however, a very strong assumption. We now show how the presence of financial assets helps us mitigate this assumption.
2.4 Financial assets

What role might be played by financial assets in an uncertain world? Arrow (1953) developed the following interpretation. Rather than signing commodity-based contracts that are contingent on the realization of events, the agents might wish to sign contracts generating payoffs that are contingent on the realization of events. The payoffs delivered by the assets in the various states of the world could then be collected and used to satisfy the needs related to the consumption plans.
The simplest financial asset is the so-called Arrow-Debreu asset, i.e. an asset that pays off some amount of numéraire in the state of nature s if the state s prevails in the future, and nil otherwise. More generally, a financial asset is a function $x : S \to \mathbb{R}$, where S is the set of all future events. Then, let m be the number of financial assets. To link financial assets to commodities, we note that if the state of nature s occurs, then any agent could use the payoff $x_i(s)$ promised by the i-th asset $A_i$ to finance net transactions on the commodity markets, viz.
$$p(s)\cdot e(s) = \sum_{i=1}^m\theta_i x_i(s), \quad \forall s \in S, \tag{2.7}$$
where p(s) and e(s) denote some vectors of prices and excess demands related to the commodi-
ties, contingent on the realization of state s, and θi is the number of assets i held by the agent.
In other words, the role of financial assets, here, is to transfer value from a state of nature to
another to finance state-contingent consumption.
Unfortunately, Eq. (2.7) does not hold, in general. A condition is that the number of assets, m, be sufficiently high to let each agent cope with the number of future events in S, $s_n$. Market completeness merely reduces to a size problem - the assets have to be sufficiently diverse to span all possible events in the future. Indeed, we shall show that if no payoffs are perfectly correlated, then markets are complete if and only if $m = s_n$. Note, also, that this reduces the dimension of our original problem, for we are then considering a competitive equilibrium in $s_n + m$ markets, instead of a competitive equilibrium in $s_n\cdot m$ markets.
2.5 Absence of arbitrage

2.5.1 How to price a financial asset?
Consider an economy in which uncertainty is resolved through the realization of the event: “Tomorrow it will rain.” A decision maker, a hypothetical Mr Law, must implement the following contingent plan: if tomorrow is sunny, he will need $c_s > 0$ units of money, to buy sun-glasses; if tomorrow it rains, Mr Law will need $c_r > 0$ units of money, to buy an umbrella. Mr Law has access to a financial market on which m assets are traded. He builds up a portfolio $\theta$ aimed to reproduce the structure of payments that he will need tomorrow:
$$
\begin{cases}
\sum_{i=1}^m\theta_i S_i\left(1 + x_i(r)\right) = c_r \\[1ex]
\sum_{i=1}^m\theta_i S_i\left(1 + x_i(s)\right) = c_s
\end{cases} \tag{2.8}
$$
where $S_i$ is the price of the i-th asset, $\theta_i$ is the number of assets to put in the portfolio, and $x_i(r)$ and $x_i(s)$ are the net returns of asset i in the two states of nature, which of course are known by Mr Law. For now, we do not need to assume anything as regards the resources needed to buy the assets, but we shall come back to this issue below (see Remark 2.8). Finally, and remarkably, we are not making any assumption regarding Mr Law’s preferences.
Eqs. (2.8) form a system of two equations with m unknowns $(\theta_1, \cdots, \theta_m)$. If $m < 2$, no perfect hedging strategy is possible - that is, the system (2.8) cannot be solved to obtain the desired pair $(c_i)_{i=r,s}$. In this case, markets are incomplete. More generally, we may consider an economy with $s_n$ states of nature, in which markets are complete if and only if Mr Law has access to $s_n$ assets. More precisely, let us define the following “payoff matrix,”
$$
X = \begin{bmatrix}
S_1\left(1 + x_1(s_1)\right) & \cdots & S_m\left(1 + x_m(s_1)\right) \\
\vdots & & \vdots \\
S_1\left(1 + x_1(s_n)\right) & \cdots & S_m\left(1 + x_m(s_n)\right)
\end{bmatrix},
$$
where $x_i(s_j)$ is the net return on the i-th asset in the state $s_j$. Then, to implement any state contingent consumption plan $c \in \mathbb{R}^{s_n}$, Mr Law has to be able to solve the following system,
$$c = X\cdot\theta,$$
where $\theta \in \mathbb{R}^m$ is the portfolio. A unique solution to the previous system exists if $\operatorname{rank}(X) = s_n = m$, and is given by $\hat\theta = X^{-1}c$. Consider, for example, the previous case, in which $s_n = 2$.
Let us assume that $m = 2$, for any additional assets would be redundant here. Then, we have,
$$
\begin{cases}
\hat\theta_1 = \dfrac{(1 + x_2(r))c_s - (1 + x_2(s))c_r}{S_1\left[(1 + x_1(s))(1 + x_2(r)) - (1 + x_1(r))(1 + x_2(s))\right]} \\[3ex]
\hat\theta_2 = \dfrac{(1 + x_1(s))c_r - (1 + x_1(r))c_s}{S_2\left[(1 + x_1(s))(1 + x_2(r)) - (1 + x_1(r))(1 + x_2(s))\right]}
\end{cases}
$$
Finally, assume that the second asset is safe, or that it yields the same return in the two states of nature: $x_2(r) = x_2(s) \equiv r$. Let $x_s = x_1(s)$ and $x_r = x_1(r)$. Then, the pair $(\hat\theta_1, \hat\theta_2)$ can be rewritten as,
$$\hat\theta_1 = \frac{c_s - c_r}{S_1(x_s - x_r)}, \qquad \hat\theta_2 = \frac{(1 + x_s)c_r - (1 + x_r)c_s}{S_2(1 + r)(x_s - x_r)}.$$
As is clear, the issues we are dealing with relate to the replication of random variables. Here, the random variable is a state contingent consumption plan $(c_i)_{i=r,s}$, where $c_r$ and $c_s$ are known, which we want to replicate for hedging purposes. (Mr Law will need to buy either a pair of sun-glasses or an umbrella, tomorrow.)
In the previous two-state example, two assets with independent payoffs are able to generate
any two-state variable. The next step, now, is to understand what happens when we assume
that there exists a third asset, A say, that delivers the same random variable (ci )i=r,s we can
obtain by using the previous pair (θ̂1 , θ̂2 ).
We claim that if the current price of the third asset A is H, then, it must be that,
$$H = V \equiv \hat\theta_1 S_1 + \hat\theta_2 S_2, \tag{2.9}$$
for the financial market to be free of arbitrage opportunities, to be defined informally below. Indeed, if $V < H$, we can buy $\hat\theta$ and sell at the same time the third asset A. The result is a sure profit, or an arbitrage opportunity, equal to $H - V$, for $\hat\theta$ generates $c_r$ if tomorrow it rains and $c_s$ if tomorrow it does not rain. In both cases, the portfolio $\hat\theta$ generates the payments that are necessary to honour the contract commitments related to the selling of A. By a symmetric argument, the inequality $V > H$ would also generate an arbitrage opportunity. Hence, Eq. (2.9) must hold true.
It remains to compute the right hand side of Eq. (2.9), which in turn leads to an evaluation formula for the asset A. By substituting the expressions for $(\hat\theta_1, \hat\theta_2)$ into Eq. (2.9), we have:
$$H = \frac{1}{1 + r}\left[P^* c_s + (1 - P^*)c_r\right], \qquad P^* = \frac{r - x_r}{x_s - x_r}. \tag{2.10}$$
Importantly, then, H can be understood as the discounted (by 1 + r) expectation of payoffs
promised by A, taken under some “artificial” probability P ∗ .
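The replication argument can be checked with numbers. In the sketch below all inputs (prices, returns, the safe rate and the payments $c_s$, $c_r$) are hypothetical, and the function name `replicate` is illustrative. The code builds the replicating portfolio from the closed-form expressions for $(\hat\theta_1, \hat\theta_2)$ with a safe second asset, verifies the payoffs in the two states, and compares the cost $V$ of the portfolio with the discounted expectation computed under $P^* = (r - x_r)/(x_s - x_r)$, which is the probability implied by substituting the portfolio into $V$.

```python
def replicate(S1, S2, x_s, x_r, r, c_s, c_r):
    """Portfolio (theta1, theta2) paying c_s if sunny and c_r if rainy.

    Asset 1 is risky (net return x_s or x_r); asset 2 is safe (net return r).
    """
    theta1 = (c_s - c_r) / (S1 * (x_s - x_r))
    theta2 = ((1 + x_s) * c_r - (1 + x_r) * c_s) / (S2 * (1 + r) * (x_s - x_r))
    return theta1, theta2


# Hypothetical market data and payments.
S1, S2 = 100.0, 1.0
x_s, x_r, r = 0.10, -0.05, 0.02
c_s, c_r = 5.0, 20.0

theta1, theta2 = replicate(S1, S2, x_s, x_r, r, c_s, c_r)

# Payoffs of the portfolio in the two states of nature.
payoff_s = theta1 * S1 * (1 + x_s) + theta2 * S2 * (1 + r)
payoff_r = theta1 * S1 * (1 + x_r) + theta2 * S2 * (1 + r)

V = theta1 * S1 + theta2 * S2                 # cost of the replicating portfolio
P_star = (r - x_r) / (x_s - x_r)              # "artificial" probability of sun
H = (P_star * c_s + (1 - P_star) * c_r) / (1 + r)

print(theta1, theta2, payoff_s, payoff_r, V, H, P_star)
```

Note that $P^*$ lies strictly between zero and one only when $x_r < r < x_s$, that is, when neither the risky asset nor the safe asset dominates the other.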
Remark 2.8. In this introductory example, the asset A can be priced without making reference to any agent’s preferences. The key observation to obtain this result is that the payoffs promised by A can be obtained through the portfolio $\hat\theta$. This fact does not, of course, mean that any agent should use this portfolio. For example, it may be the case that Mr Law is so poor that his budget constraint would not even allow him to implement the portfolio $\hat\theta$. The point underlying the previous example is that the portfolio $\hat\theta$ could be used to construct an arbitrage opportunity, arising when Eq. (2.9) does not hold. In this case, any penniless agent could implement the arbitrage described above.
The next step is to extend the results in Eq. (2.10) to a dynamic setting. Suppose that an additional day is available for trading, with the same uncertainty structure: the day after tomorrow, the asset A will pay off $c_{ss}$ if it is sunny (provided the previous day was sunny), and $c_{rs}$ if it is sunny (provided the previous day was rainy), with $c_{sr}$ and $c_{rr}$ denoting the corresponding payoffs when the second day is rainy. By using the same arguments leading to Eq. (2.10), we obtain that:
$$H = \frac{1}{(1 + r)^2}\left[P^{*2}c_{ss} + P^*(1 - P^*)c_{sr} + (1 - P^*)P^* c_{rs} + (1 - P^*)^2 c_{rr}\right].$$
Finally, by extending the same reasoning to T trading days,
$$H = \frac{1}{(1 + r)^T}E^*(c_T), \tag{2.11}$$
where $E^*$ denotes the expectation taken under the probability $P^*$.

The key assumption we used to derive Eq. (2.11) is that markets are complete at each trading day. True, at the beginning of the trading period Mr Law faced $2^T$ mutually exclusive possible states of nature that would occur at the T-th date, which would seem to imply that we would need $2^T$ assets to replicate the asset A. However, we have just seen that to price A, we only need 2 assets and T trading days. To emphasize this fact, we say that the structure of assets and transaction dates makes the markets dynamically complete in the previous example. The presence of dynamically complete markets allows one to implement dynamic trading strategies aimed at replicating the value of the asset A, period by period. Naturally, the asset A could be priced without any assumption about the preferences of any agent, due to the assumption of dynamically complete markets.
2.5.2 The Land of Cockaigne

We provide a precise definition of the notion of absence of arbitrage opportunities, as well
as a connection between this notion and the notion and properties of the competitive equi-
librium described in Section 2.2. For simplicity, we consider a multistate economy with only
one commodity. The extension to the multicommodities case is dealt with very briefly in the
appendix.
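Let $v_i(\omega_s)$ be the payoff of asset i in the state $\omega_s$, i = 1, ···, m and s = 1, ···, d. Consider the
payoff matrix:
$$V \equiv \begin{bmatrix} v_1(\omega_1) & \cdots & v_m(\omega_1) \\ \vdots & \ddots & \vdots \\ v_1(\omega_d) & \cdots & v_m(\omega_d) \end{bmatrix}.$$
Let $v_{si} \equiv v_i(\omega_s)$, $v_{s,\cdot} \equiv [v_{s1}, \cdots, v_{sm}]$, $v_{\cdot,i} \equiv [v_{1i}, \cdots, v_{di}]^\top$. We assume that rank(V) = m ≤ d.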
The budget constraint of each agent has the form:
$$\begin{cases} c_0 - w_0 = -S\theta = -\sum_{i=1}^{m} S_i\theta_i \\ c_s - w_s = v_{s,\cdot}\,\theta = \sum_{i=1}^{m} v_{si}\theta_i, \quad s = 1, \cdots, d \end{cases}$$
Let $x^1 = [x_1, \cdots, x_d]^\top$. The second constraint can be written as:
$$c^1 - w^1 = V\theta.$$
We define an arbitrage opportunity as a portfolio that has a negative value at the first period,
and a positive value in at least one state of world in the second period, or a positive value in
all states of the world in the second period and a nonpositive value in the first period.
Notation: for all $x \in \mathbb{R}^m$, x > 0 means that at least one component of x is strictly positive while
the other components of x are nonnegative. x ≫ 0 means that all components of x are strictly
positive.
Definition 2.9. An arbitrage opportunity is a strategy θ that either yields [1] Vθ ≥ 0 with
an initial investment Sθ < 0, or produces [2] Vθ > 0 with an initial investment
Sθ ≤ 0.
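
Definition 2.9 can be checked mechanically for any given portfolio. The sketch below is only an illustration: the function name `is_arbitrage` and the numerical example are assumptions, with V stored as a list of rows (one row per state).

```python
def is_arbitrage(V, S, theta):
    """Check Definition 2.9 for a portfolio theta.

    V: d x m payoff matrix, as a list of d rows (states) of m asset payoffs.
    S: list of m asset prices.  theta: list of m holdings.
    """
    cost = sum(s_i * t_i for s_i, t_i in zip(S, theta))              # S theta
    payoffs = [sum(v * t for v, t in zip(row, theta)) for row in V]  # V theta
    nonneg = all(x >= 0 for x in payoffs)       # V theta >= 0 in every state
    some_pos = any(x > 0 for x in payoffs)      # strictly positive in some state
    return (nonneg and cost < 0) or (nonneg and some_pos and cost <= 0)

# Hypothetical market: asset 2 pays weakly more than asset 1 but has the same price.
V = [[1, 3],
     [1, 1]]
S = [1, 1]
long_short = is_arbitrage(V, S, [-1, 1])   # sell asset 1, buy asset 2
buy_bond = is_arbitrage(V, S, [1, 0])      # costs 1 today, so not an arbitrage
```

Checking a single θ is easy; establishing that no θ qualifies is the harder question, answered by Theorem 2.10 below.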
As we shall show below (Theorem 2.11), an arbitrage opportunity cannot exist in a competitive
equilibrium, for the agents’ program would not be well defined in this case. Introduce,
then, the (d + 1) × m matrix,
$$W = \begin{bmatrix} -S \\ V \end{bmatrix},$$
the vector subspace of $\mathbb{R}^{d+1}$,
$$\langle W \rangle = \left\{ z \in \mathbb{R}^{d+1} : z = W\theta,\ \theta \in \mathbb{R}^m \right\},$$
and, finally, the null space of W,
$$\langle W \rangle^{\perp} = \left\{ x \in \mathbb{R}^{d+1} : xW = 0_m \right\}.$$
The economic interpretation of the vector subspace $\langle W \rangle$ is that of the excess demand space for
all the states of nature, generated by the “wealth transfers” produced by the investments in the
assets. Naturally, $\langle W \rangle$ and $\langle W \rangle^{\perp}$ are orthogonal, as $\langle W \rangle^{\perp} = \{ x \in \mathbb{R}^{d+1} : xz = 0 \text{ for all } z \in \langle W \rangle \}$.
Mathematically, the assumption that there are no arbitrage opportunities is equivalent to the
following condition,
$$\langle W \rangle \cap \mathbb{R}^{d+1}_{+} = \{0\}. \tag{2.12}$$
The interpretation of (2.12) is in fact very simple. In the absence of arbitrage opportunities,
there should be no portfolios generating “wealth transfers” that are nonnegative and strictly
positive in at least one state, i.e. ∄θ : Wθ > 0. Hence, $\langle W \rangle$ and the positive orthant $\mathbb{R}^{d+1}_{+}$
cannot intersect, except at the origin.
[1] Vθ ≥ 0 means that $[V\theta]_j \geq 0$, j = 1, ···, d, i.e. it allows for $[V\theta]_j = 0$, j = 1, ···, d.
[2] Vθ > 0 means $[V\theta]_j \geq 0$, j = 1, ···, d, with at least one j for which $[V\theta]_j > 0$.
The following result provides a general characterization of how the no-arbitrage condition in
(2.12) restricts the price of all the assets in the economy.
Theorem 2.10. There are no arbitrage opportunities if and only if there exists a $\phi \in \mathbb{R}^d_{++}$ such that
$S = \phi^\top V$. If m = d, φ is unique, and if m < d, $\dim\{\phi \in \mathbb{R}^d_{++} : S = \phi^\top V\} = d - m$.
Proof. In the appendix.
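
To make the statement concrete, here is a minimal sketch for the complete markets case d = m = 2, where $S = \phi^\top V$ is a 2 × 2 linear system that can be solved by Cramer's rule. The helper name `state_prices_2x2` and all numbers are hypothetical; the point is only that strictly positive state prices signal the absence of arbitrage, while a negative component signals mispricing.

```python
def state_prices_2x2(V, S):
    """Solve S = phi^T V for phi when d = m = 2 (Cramer's rule on V^T phi = S^T)."""
    (v11, v12), (v21, v22) = V          # rows are states, columns are assets
    det = v11 * v22 - v12 * v21
    if det == 0:
        raise ValueError("V is singular: the two assets do not complete the market")
    phi1 = (S[0] * v22 - S[1] * v21) / det
    phi2 = (v11 * S[1] - v12 * S[0]) / det
    return phi1, phi2

V = [[1.0, 2.0],    # state 1: bond pays 1, risky asset pays 2
     [1.0, 0.5]]    # state 2: bond pays 1, risky asset pays 0.5

phi_ok = state_prices_2x2(V, [0.9, 1.0])    # both components positive
phi_bad = state_prices_2x2(V, [0.9, 2.0])   # risky asset overpriced
no_arbitrage = all(p > 0 for p in phi_ok)
mispriced = any(p <= 0 for p in phi_bad)
```

With S = (0.9, 2.0) the risky asset costs more than twice the bond although it never pays more than twice as much, so the implied price of state 2 turns negative.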
The previous theorem provides the foundations for many developments in financial economics.
To provide its intuition, let us pre-multiply the second constraint by $\phi^\top$, obtaining,
$$\phi^\top (c^1 - w^1) = \phi^\top V\theta = S\theta = -(c_0 - w_0),$$
where the second equality follows by Theorem 2.10, and the third equality is due to the first
period budget constraint. Critically, then, Theorem 2.10 shows that in the absence of arbitrage
opportunities, each agent has access to the following budget constraint,
$$0 = c_0 - w_0 + \phi^\top (c^1 - w^1) = c_0 - w_0 + \sum_{s=1}^{d} \phi_s (c_s - w_s), \quad \text{with } c^1 - w^1 \in \langle V \rangle. \tag{2.13}$$
The budget constraints in (2.13) reveal that φ can be interpreted as the vector of prices of
the commodity in the future d states of nature, and that the numéraire in this economy is
the first-period consumption. We usually refer to φ as the state price vector, or Arrow-Debreu
state price vector. However, it would be misleading to say that the budget constraint in (2.13)
is the one we are used to seeing in the static Arrow-Debreu type model of Section 2.2. In fact, the
Arrow-Debreu economy of Section 2.2 obtains when m = d, in which case $\langle V \rangle = \mathbb{R}^d$ in (2.13).
This is the complete markets case which, according to Theorem 2.10, also implies the
remarkable property that there exists a unique φ that is compatible with the asset prices we
observe.
The situation is radically different if m < d. In other terms, $\langle V \rangle$ is the subspace of excess
demands agents have access to in the second period and can be “smaller” than $\mathbb{R}^d$ if markets
are incomplete. Indeed, $\langle V \rangle$ is the subspace generated by the payoffs obtained by the portfolio
choices made in the first period,
$$\langle V \rangle = \left\{ e \in \mathbb{R}^d : e = V\theta,\ \theta \in \mathbb{R}^m \right\}.$$
Consider, for example, the case d = 2 and m = 1. In this case, $\langle V \rangle = \{ e \in \mathbb{R}^2 : e = V\theta,\ \theta \in \mathbb{R} \}$,
with $V = V_1$, where $V_1 = \binom{v_1}{v_2}$ say, and $\dim \langle V \rangle = 1$, as illustrated by Figure 2.2.
Next, suppose we open a new market for a second financial asset with payoffs given by $V_2 = \binom{v_3}{v_4}$.
Then, m = 2, $V = \begin{pmatrix} v_1 & v_3 \\ v_2 & v_4 \end{pmatrix}$, and
$$\langle V \rangle = \left\{ e \in \mathbb{R}^2 : e = \binom{\theta_1 v_1 + \theta_2 v_3}{\theta_1 v_2 + \theta_2 v_4},\ \theta \in \mathbb{R}^2 \right\},$$
i.e. $\langle V \rangle = \mathbb{R}^2$, given that rank(V) = 2. As
a result, we can now generate any excess demand in $\mathbb{R}^2$, just as in the Arrow-Debreu economy
of Section 2.2. To generate any excess demand, we multiply the payoff vector $V_1$ by $\theta_1$ and the
payoff vector $V_2$ by $\theta_2$. For example, suppose we wish to generate the payoff vector
$V_4$ in Figure 2.3. Then, we choose some $\theta_1 > 1$ and $\theta_2 < 1$. (The exact values of $\theta_1$ and $\theta_2$
are obtained by solving a linear system.) In Figure 2.3, the payoff vector $V_3$ is obtained with
$\theta_1 = \theta_2 = 1$.
To summarize, if markets are complete, then $\langle V \rangle = \mathbb{R}^d$. If markets are incomplete, $\langle V \rangle$
is only a proper subspace of $\mathbb{R}^d$, which makes the agents’ choice space smaller than in the complete
markets case.
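
The linear system mentioned in the two-asset example can be solved explicitly. The following sketch uses hypothetical payoff vectors, chosen so that one target plays the role of $V_3 = V_1 + V_2$ and another the role of a vector like $V_4$; the function name is an assumption.

```python
def replicate_2x2(V1, V2, target):
    """Solve theta1 * V1 + theta2 * V2 = target by Cramer's rule (d = m = 2)."""
    det = V1[0] * V2[1] - V2[0] * V1[1]
    if det == 0:
        raise ValueError("V1 and V2 are collinear: the span is only a line")
    theta1 = (target[0] * V2[1] - V2[0] * target[1]) / det
    theta2 = (V1[0] * target[1] - target[0] * V1[1]) / det
    return theta1, theta2

V1 = (2, 1)          # hypothetical payoffs of the first asset in states 1 and 2
V2 = (1, 2)          # hypothetical payoffs of the second asset
theta_V3 = replicate_2x2(V1, V2, (3, 3))      # V3 = V1 + V2
theta_V4 = replicate_2x2(V1, V2, (4, 2.5))    # tilted towards the first asset
```

With a single asset (m = 1) the same target would be attainable only if it happened to lie on the line spanned by $V_1$, which is the incomplete markets situation of Figure 2.2.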
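[FIGURE 2.2. Incomplete markets, d = 2, m = 1. The span $\langle V \rangle$ is a line in the $(v_1, v_2)$ plane.]
[FIGURE 2.3. Complete markets, $\langle V \rangle = \mathbb{R}^2$. The plot shows the payoff vectors $V_1 = (v_1, v_2)$ and $V_2 = (v_3, v_4)$, together with $V_3$ and $V_4$.]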
We now present a fundamental result, about the “viability of the model.” Define the second
period consumption $c^1_j \equiv [c_{1j}, \cdots, c_{dj}]^\top$, where $c_{sj}$ is the second-period consumption in state s,
and let,
$$(\hat c_{0j}, \hat c^1_j) \in \arg\max_{c_{0j},\, c^1_j} \left\{ u_j(c_{0j}) + \beta_j E(\nu_j(c^1_j)) \right\}, \quad \text{subject to } \begin{cases} c_{0j} - w_{0j} = -S\theta_j \\ c^1_j - w^1_j = V\theta_j \end{cases} \qquad [\text{P3}]$$
where uj and ν j are utility functions, both satisfying Assumption 2.1. Naturally, we could use
more general formulations of utilities than that in [P3], and in fact we shall in more advanced
parts of this book. For the sake of this introductory chapter, we only consider additive utility.
We have:
Theorem 2.11. The program [P3] has a solution if and only if there are no arbitrage oppor-
tunities.
Proof. Let us suppose on the contrary that the program [P3] has a solution $(\hat c_{0j}, \hat c^1_j, \hat\theta_j)$, but
that there exists a θ : Wθ > 0. The program constraint is, with straightforward notation,
$\hat c_j = w_j + W\hat\theta_j$. Then, we may define a portfolio $\theta_j = \hat\theta_j + \theta$, such that $c_j = w_j + W(\hat\theta_j + \theta) =
\hat c_j + W\theta > \hat c_j$, which contradicts the optimality of $\hat c_j$, as utility is increasing in consumption. For the converse, note that the absence of
arbitrage opportunities implies that $\exists \phi \in \mathbb{R}^d_{++} : S = \phi^\top V$, which leads to the budget constraint
in (2.13), for a given φ. This budget constraint is clearly a closed subset of the compact budget
constraint $B_j$ in [P1] (in fact, it is $B_j$ restricted to $\langle V \rangle$). Therefore, it is a compact set and,
hence, the program [P3] has a solution, as a continuous function attains its maximum on a
compact set.
2.6 Equivalent martingales and equilibrium

We provide the definition of an equilibrium with financial markets, when the financial assets
are in zero net supply.
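Definition 2.12. An equilibrium is given by allocations and prices $\{(\hat c_{0j})_{j=1}^{n}, ((\hat c_{sj})_{j=1}^{n})_{s=1}^{d},
(\hat S_i)_{i=1}^{m}\} \in \mathbb{R}^n_+ \times \mathbb{R}^{nd}_+ \times \mathbb{R}^m_+$, where the allocations are solutions of the program [P3] and satisfy:
$$0 = \sum_{j=1}^{n} (\hat c_{0j} - w_{0j}), \qquad 0 = \sum_{j=1}^{n} (\hat c_{sj} - w_{sj}) \ \ (s = 1, \cdots, d), \qquad 0 = \sum_{j=1}^{n} \theta_{ij} \ \ (i = 1, \cdots, m).$$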
We now express demand functions in terms of the stochastic discount factor, and then look
for an equilibrium by looking for the stochastic discount factor that clears the commodity
markets. By Walras’ law, this also implies the equilibrium on the financial market. Indeed, by
aggregating the agents’ constraints in the second period,
$$\sum_{j=1}^{n} (c^1_j - w^1_j) = V \sum_{j=1}^{n} \theta_j,$$
so that, since rank(V) = m, clearing of the second-period commodity markets implies $\sum_{j=1}^{n} \theta_j = 0_m$.
For simplicity, we also assume that u′j (x) > 0, u′′j (x) < 0 ∀x > 0 and limx→0 u′j (x) = ∞,
limx→∞ u′j (x) = 0 and that ν j satisfies the same properties.
2.6.1 The rational expectations assumption

Lucas, Radner, Green. Every agent correctly anticipates the equilibrium price in each state of
nature.
[Consider for example the models with asymmetric information that we will see later in these
lectures. At some point we will have to compute $E(\tilde v \mid p(\tilde y) = p)$. That is, the equilibrium is a
pricing function which takes some values $p(\tilde y)$ depending on the state of nature. In this kind of
model, $\lambda\theta_I(p(\tilde y), \tilde y) + (1 - \lambda)\theta_U(p(\tilde y), \tilde y) + \tilde y = 0$, and we look for a solution $p(\tilde y)$ satisfying
this equation.]
2.6.2 Stochastic discount factors

Theorem 2.10 states that in the absence of arbitrage opportunities,
$$S_i = \phi^\top v_{\cdot,i} = \sum_{s=1}^{d} \phi_s v_{s,i}, \quad i = 1, \cdots, m. \tag{2.14}$$
Let us assume that the first asset is a safe asset, i.e. $v_{s,1} = 1$ ∀s. Then, we have
$$S_1 \equiv \frac{1}{1+r} = \sum_{s=1}^{d} \phi_s. \tag{2.15}$$
Eq. (2.15) confirms the economic interpretation of the state prices in (2.13). Recall, the states
of nature are exhaustive and mutually exclusive. Therefore, $\phi_s$ can be interpreted as the price
to be paid today for obtaining, for sure, one unit of numéraire, tomorrow, in state s. This is
indeed the economic interpretation of the budget constraint in (2.13). Eq. (2.15) is consistent with
it, as it says that the prices of all these rights sum up to the price of a pure discount bond, i.e. an
asset that yields one unit of numéraire, tomorrow, for sure.
Eq. (2.15) can be elaborated to provide us with a second interpretation of the state prices in
Theorem 2.10. Define,
$$P^*_s \equiv (1 + r)\phi_s,$$
which satisfies, by construction,
$$\sum_{s=1}^{d} P^*_s = 1.$$
Therefore, we can interpret $P^* \equiv (P^*_s)_{s=1}^{d}$ as a probability distribution. Moreover, replacing
$P^*$ in Eq. (2.14) leaves,
$$S_i = \frac{1}{1+r} \sum_{s=1}^{d} P^*_s v_{s,i} = \frac{1}{1+r} E^{P^*}(v_{\cdot,i}), \quad i = 1, \cdots, m. \tag{2.16}$$
Eq. (2.16) confirms Eq. (2.10), obtained in the introductory example of Section 2.5. It says
that the price of any asset is the expectation of its future payoffs, taken under the proba-
bility P ∗ , discounted at the risk-free interest rate r. For this reason, we usually refer to the
probability P ∗ as the risk-neutral probability. Eq. (2.16) can be extended to a dynamic con-
text, as we shall see in later chapters. Intuitively, consider an asset that distributes dividends
in every period, let S (t) be its price at time t, and D (t) the dividend paid off at time t.
Then, the “payoff” it promises for the next period is S(t + 1) + D(t + 1). By Eq. (2.16),
$S(t) = (1 + r)^{-1} E^{P^*}(S(t + 1) + D(t + 1))$ or, by rearranging terms,
$$E^{P^*}\left[ \frac{S(t+1) + D(t+1) - S(t)}{S(t)} \right] = r. \tag{2.17}$$
That is, the expected return on the asset under $P^*$ equals the safe interest rate, r. In a dynamic
context, the risk-neutral probability $P^*$ is also referred to as the risk-neutral martingale measure,
or equivalent martingale measure, for the following reason. Define a money market account as
an asset with value evolving over time as $M(t) \equiv (1 + r)^t$. Then, Eq. (2.17) can be rewritten
as $S(t)/M(t) = E^{P^*}[(S(t + 1) + D(t + 1))/M(t + 1)]$. This shows that if D(t + 1) = 0 for
all t, then the discounted process S(t)/M(t) is a martingale under $P^*$.
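
As a small numerical check of Eqs. (2.15)-(2.17), the sketch below starts from a hypothetical state price vector, recovers the interest rate and the risk-neutral probabilities, and verifies that the expected return under $P^*$ equals r. Function names and numbers are assumptions made for the example.

```python
def risk_neutral_measure(phi):
    """Return (r, P*) implied by state prices phi, following Eq. (2.15)."""
    bond_price = sum(phi)                          # S_1 = 1 / (1 + r)
    r = 1.0 / bond_price - 1.0
    p_star = [(1.0 + r) * phi_s for phi_s in phi]  # P*_s = (1 + r) phi_s
    return r, p_star

def price_under_p_star(payoffs, phi):
    """Discounted risk-neutral expectation of the payoffs, as in Eq. (2.16)."""
    r, p_star = risk_neutral_measure(phi)
    return sum(p * v for p, v in zip(p_star, payoffs)) / (1.0 + r)

phi = [0.4, 0.5]          # hypothetical state prices
payoffs = [2.0, 0.5]      # hypothetical payoffs of a risky asset
r, p_star = risk_neutral_measure(phi)
S = price_under_p_star(payoffs, phi)
expected_return = sum(p * v for p, v in zip(p_star, payoffs)) / S - 1.0
```

The price coincides with the state-price valuation 0.4 · 2 + 0.5 · 0.5 = 1.05, and the expected return under $P^*$ equals r = 1/0.9 − 1, whatever the payoffs are.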
Next, let us replace $P^*$ into the budget constraint in (2.13), to obtain, for $(c^1 - w^1) \in \langle V \rangle$,
$$0 = c_0 - w_0 + \sum_{s=1}^{d} \phi_s (c_s - w_s) = c_0 - w_0 + \frac{1}{1+r} \sum_{s=1}^{d} P^*_s (c_s - w_s) = c_0 - w_0 + \frac{1}{1+r} E^{P^*}(c^1 - w^1). \tag{2.18}$$
For reasons developed below, it is also useful to derive an alternative representation of the
budget constraint, in terms of the objective probability P (say). Let us introduce, first, the
ratio η, defined through,
$$P^*_s = \eta_s P_s, \quad s = 1, \cdots, d.$$
The ratio $\eta_s$ indicates how far apart $P^*$ and P are. We assume $\eta_s$ is strictly positive, which means
that $P^*$ and P are equivalent measures, i.e. they agree on which sets have zero probability. Finally,
let us introduce the stochastic discount factor, $m = (m_s)_{s=1}^{d}$, defined as,
$$m_s \equiv (1 + r)^{-1} \eta_s.$$
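We have,
$$\frac{1}{1+r} E^{P^*}(c^1 - w^1) = \frac{1}{1+r} \sum_{s=1}^{d} P^*_s (c_s - w_s) = \sum_{s=1}^{d} \underbrace{\frac{1}{1+r}\eta_s}_{= m_s} (c_s - w_s) P_s = E\left[ m \cdot (c^1 - w^1) \right].$$
Hence, we can rewrite Eq. (2.18) as,
$$0 = c_0 - w_0 + E\left[ m \cdot (c^1 - w^1) \right], \quad c^1 - w^1 \in \langle V \rangle.$$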
Similarly, by replacing the stochastic discount factor m into Eq. (2.16) we obtain,
$$S_i = \frac{1}{1+r} E^{P^*}(v_{\cdot,i}) = E(m \cdot v_{\cdot,i}), \quad i = 1, \cdots, m. \tag{2.19}$$
Naturally, despite all such different ways to express budget constraints and asset prices, the
key of the model is still φ,
$$m_s = (1 + r)^{-1} \eta_s = (1 + r)^{-1} \frac{P^*_s}{P_s} = \frac{\phi_s}{P_s},$$
which can be recovered, once we solve for the equilibrium stochastic discount factor m, as we
shall illustrate in the next section.
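
The equivalence between pricing with state prices and pricing with the stochastic discount factor can be checked on a toy example. The probabilities, state prices and payoffs below are hypothetical, as are the function names.

```python
def stochastic_discount_factor(phi, P):
    """m_s = phi_s / P_s, which equals (1 + r)^{-1} P*_s / P_s."""
    return [phi_s / p_s for phi_s, p_s in zip(phi, P)]

def price_with_sdf(payoffs, m, P):
    """S_i = E(m * v_i) under the objective probability P, as in Eq. (2.19)."""
    return sum(p_s * m_s * v_s for p_s, m_s, v_s in zip(P, m, payoffs))

phi = [0.4, 0.5]      # hypothetical state prices
P = [0.6, 0.4]        # hypothetical objective probabilities
payoffs = [2.0, 0.5]  # hypothetical asset payoffs

m = stochastic_discount_factor(phi, P)
price_sdf = price_with_sdf(payoffs, m, P)
price_phi = sum(phi_s * v_s for phi_s, v_s in zip(phi, payoffs))
bond_price = price_with_sdf([1.0, 1.0], m, P)   # E(m) = 1 / (1 + r)
```

The objective probabilities drop out of the price: changing P while keeping φ fixed changes m state by state but leaves every asset price unchanged.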
2.6.3 Optimality and equilibrium

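We have argued that in the absence of arbitrage opportunities, the program of any agent j is
$$\max_{(c_{0j},\, c^1_j)} \left\{ u_j(c_{0j}) + \beta_j \cdot E(\nu_j(c^1_j)) \right\} \quad \text{subject to } 0 = c_{0j} - w_{0j} + E\left[ m \cdot (c^1_j - w^1_j) \right], \quad c^1_j - w^1_j \in \langle V \rangle. \qquad [\text{P4}]$$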
2.6.3.1 Complete markets and risk sharing
In the complete markets case, $\langle V \rangle = \mathbb{R}^d$, so that the first order conditions to the program [P4]
are,
$$u_j'(\hat c_{0j}) = \lambda_j, \qquad \beta_j \nu_j'(\hat c_{sj}) = \lambda_j m_s, \quad s = 1, \cdots, d,$$
where λj is a Lagrange multiplier. So, really, the properties of this model are the same as those
of the static model in Section 2.2. Formally, the complete markets economy in this section is the
same as the static economy in Section 2.2, once we set m = d, where m is the dimension of the
commodity space, in Section 2.2, and ps = φs , where ps is the price of the s-th commodity in
Section 2.2, with p1 = 1 (the numéraire), and φs is the Arrow-Debreu state price in the unified
budget constraint of Eq. (2.18).
These simple observations have profound implications: an economy subject to uncertainty can
be understood through a static model, in the presence of complete markets! Under the conditions
stated in Section 2.2, even complicated models with heterogeneous agents, with potentially
interesting asset pricing implications, and still, apparently, so hopelessly difficult to analyze,
can actually be “centralized,” through a dedicated design of Pareto’s weights, as formalized
in Theorem 2.7. We can actually do much more. First, this centralization property is easily
extended to a dynamic context, as we shall see in more advanced parts of these lectures (see
Chapter 8), provided markets satisfy the property of being dynamically complete, a property
explained in the next two chapters. Second, the assumption that agents can exchange Arrow-Debreu
securities for all future states of the world is clearly unrealistic: markets are quite likely to
be incomplete, which is one possible reason why financial innovation is so pervasive in practice. Yet
the theory about centralization can be extended to an incomplete markets setting, through a
system of “stochastic Pareto weights,” as we discuss in detail in Chapter 8. For now, let us
proceed with the next simple and fundamental steps.
To illustrate the equilibrium implications of the first order conditions in a simple case, consider
an economy with a single agent. In this economy, the first order conditions immediately lead to
the following stochastic discount factor,
$$m_s = \beta \frac{\nu'(w_s)}{u'(w_0)}.$$
The economic interpretation of this stochastic discount factor is the following. In the autarchic
state,
$$- \left. \frac{dc_0}{dc_s} \right|_{c_0 = w_0,\, c_s = w_s} = \beta \frac{\nu'(w_s)}{u'(w_0)} P_s = m_s P_s = \phi_s$$
is the present consumption the agent is willing to give up at t = 0, in order to obtain
one additional unit of consumption at time t = 1, in state s. In other words, $\phi_s$ is the price, in terms of the
present consumption numéraire, of one additional unit of consumption at time t = 1 and state
s. So it is a state price such that the agent is happy to consume his own endowment, without
any incentives to trade in the financial markets. The risk-neutral probability is,
$$P^*_s = \eta_s P_s = (1 + r)\, m_s P_s = (1 + r)\, \beta \frac{\nu'(w_s)}{u'(w_0)} P_s.$$
By the first order conditions, and the pure discount bond evaluation formula, it is easily checked
that $1 = \sum_{s=1}^{d} P^*_s$. Moreover,
$$\frac{P^*_s}{P_s} = m_s (1 + r) = m_s \left[ \beta E\left( \frac{\nu'(w_s)}{u'(w_0)} \right) \right]^{-1} = m_s \beta^{-1} \frac{u'(w_0)}{E[\nu'(w_s)]} = \frac{\nu'(w_s)}{E[\nu'(w_s)]},$$
where the second equality follows by the pure discount bond evaluation formula: $\frac{1}{1+r} = E(m)$.
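
The single-agent formulas can be evaluated numerically once a utility function is chosen. The sketch below assumes CRRA marginal utilities, $u'(c) = \nu'(c) = c^{-\gamma}$, which is a choice made for the example and not something imposed in the text; the endowments and parameters are hypothetical.

```python
def representative_agent_kernel(beta, gamma, w0, w1, P):
    """m_s = beta * nu'(w_s) / u'(w_0), assuming u'(c) = nu'(c) = c**(-gamma)."""
    m = [beta * (w_s / w0) ** (-gamma) for w_s in w1]
    phi = [m_s * p_s for m_s, p_s in zip(m, P)]     # state prices phi_s = m_s P_s
    r = 1.0 / sum(phi) - 1.0                         # from 1 / (1 + r) = E(m)
    p_star = [(1.0 + r) * phi_s for phi_s in phi]    # risk-neutral probabilities
    return m, phi, r, p_star

P = [0.5, 0.5]            # objective probabilities of the good and bad state
w1 = [1.2, 0.9]           # hypothetical endowments in the good and bad state
m, phi, r, p_star = representative_agent_kernel(beta=0.95, gamma=2.0,
                                                w0=1.0, w1=w1, P=P)

# Ratio P*_s / P_s should equal nu'(w_s) / E[nu'(w_s)].
marg = [w_s ** (-2.0) for w_s in w1]
mean_marg = sum(p * x for p, x in zip(P, marg))
ratios = [ps / p for ps, p in zip(p_star, P)]
```

In this example the risk-neutral probability of the bad state exceeds its objective probability, because marginal utility is higher when the endowment is low.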
In the multi-agent case, the situation is similar as soon as markets are complete. Indeed,
consider the first order conditions of each agent,
$$\beta_j \frac{\nu_j'(\hat c_{sj})}{u_j'(\hat c_{0j})} = m_s, \quad s = 1, \cdots, d, \quad j = 1, \cdots, n.$$
The previous relation reveals that as soon as markets are complete, agents must have the same
marginal rate of substitution, in equilibrium. This is because by Theorem 2.10, the state price
vector φ is unique if and only if markets are complete, which then implies uniqueness of $m_s = \phi_s / P_s$
and, hence, the fact that each marginal rate of substitution $\beta_j \nu_j'(\hat c_{sj}) / u_j'(\hat c_{0j})$ is independent of j. In this
case, the equilibrium allocation is clearly a Pareto optimum, by the discussion at the beginning
of this section, and Theorem 2.5.
The result that agents have the same marginal rate of substitution for each state of the world
is known as risk sharing. It means that, given an initial endowment distribution among the
agents, the market mechanism, through a system of complete securities markets, is such that
consumption risk is shifted around the economy, so that it is borne by the agents most willing to
take it. For example, suppose that two agents 1 and 2 have the same discount rate, and utility
functions $u_j = \nu_j$, with CRRA given by $\eta_1$ and $\eta_2$, where $\eta_1 < \eta_2$. Then, $Gr_{s1} = (Gr_{s2})^{\eta_2/\eta_1}$,
where $Gr_{si}$ is consumption growth for the i-th agent in state s. In good times, when $Gr_{s2} > 1$,
the more risk-averse agent experiences, ex-post, a lower consumption growth rate, $Gr_{s2} < Gr_{s1}$.
In bad times, however, when $Gr_{s2} < 1$, the more risk-averse agent experiences, ex-post, a higher
consumption growth rate, $Gr_{s2} > Gr_{s1}$. In other words, capital markets, when complete, operate
in such a way as to have the more risk-averse agent face a less volatile consumption growth.
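
The relation between the two consumption growth rates follows in one step from the first order conditions. As a sketch, assume marginal utilities of the CRRA form $\nu_j'(c) = c^{-\eta_j}$ and write $Gr_{sj} \equiv \hat c_{sj} / \hat c_{0j}$ (this notation for the growth rate is an assumption consistent with the text). With a common discount factor β,

```latex
\beta\, Gr_{s1}^{-\eta_1} \;=\; m_s \;=\; \beta\, Gr_{s2}^{-\eta_2}
\quad\Longrightarrow\quad
Gr_{s1} \;=\; \left( Gr_{s2} \right)^{\eta_2 / \eta_1}.
```

For instance, with the purely illustrative values $\eta_1 = 1$ and $\eta_2 = 2$, a growth rate $Gr_{s2} = 1.1$ for the more risk-averse agent corresponds to $Gr_{s1} = 1.21$, while $Gr_{s2} = 0.9$ corresponds to $Gr_{s1} = 0.81$.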
2.6.3.2 Incomplete markets
If markets are incomplete, marginal rates of substitution cannot be equal, among agents, except
perhaps on a set of endowments distribution with measure zero. The best outcome in this case,
is a set of equilibria called constrained Pareto optima, i.e. constrained by ... the states of nature.
As it turns out, there might not even exist constrained Pareto optima in multiperiod economies
with incomplete markets–except perhaps those arising on a set of endowments distributions
with zero measure.
When markets are incomplete, the state price vector φ is not unique. That is, suppose that
$\phi^\top$ is an equilibrium state price. Then, all the elements of
$$\Phi = \left\{ \phi' \in \mathbb{R}^d_{++} : (\phi' - \phi)^\top V = 0 \right\} \tag{2.20}$$
are also equilibrium state prices: there exists an infinity of equilibrium state prices that are
consistent with absence of arbitrage opportunities. In other words, there exists an infinity of
equilibrium state prices guaranteeing the same observable asset price vector S, for $\phi'^\top V =
\phi^\top V = S$.
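
A small example may help to visualize the set Φ. With d = 3 states and m = 2 assets, Theorem 2.10 implies a one-dimensional family of state prices. In the sketch below, the payoff matrix, the baseline φ and the direction x (chosen so that $x^\top V = 0$) are all hypothetical.

```python
def prices(phi, V):
    """S = phi^T V: price each asset (column of V) with the state prices phi."""
    n_assets = len(V[0])
    return [sum(phi_s * row[i] for phi_s, row in zip(phi, V))
            for i in range(n_assets)]

V = [[1.0, 2.0],     # d = 3 states (rows), m = 2 assets (columns)
     [1.0, 1.0],
     [1.0, 0.5]]
phi = [0.3, 0.3, 0.3]                 # one strictly positive state price vector
x = [1.0, -3.0, 2.0]                  # direction satisfying x^T V = 0
phi_alt = [p + 0.05 * x_s for p, x_s in zip(phi, x)]   # another element of Phi

S = prices(phi, V)
S_alt = prices(phi_alt, V)
```

Both vectors are strictly positive and deliver the same asset prices, so observing S alone cannot discriminate between them.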
How do we proceed in this case? Introduce the following budget constraint:
$$C = \left\{ c \in \mathbb{R}^d_{++} : 0 = c_0 - w_0 + \phi^\top (c^1 - w^1),\ c^1 - w^1 \in \langle V \rangle,\ \forall \phi \in \mathbb{R}^d_{++} : S = \phi^\top V \right\}. \tag{2.21}$$
This budget constraint, and the previous reasoning about the set Φ in (2.20), show that in the
context of incomplete markets, there exist many constraints to take care of, and the previous
“martingale methods” do not apply.
Yet let $\mathrm{Val}(P_I)$ be the value of the following program in the incomplete markets at hand:
$$\max_{c \in C} \left\{ u_j(c_{0j}) + \beta_j E(\nu_j(c^1_j)) \right\}. \qquad [P_I]$$
Consider, next, the following constraint:
$$C_\phi = \left\{ c \in \mathbb{R}^d_{++} : 0 = c_0 - w_0 + \phi^\top (c^1 - w^1),\ (c^1 - w^1) \in \mathbb{R}^d, \text{ for some given } \phi \in \mathbb{R}^d_{++} : S = \phi^\top V \right\},$$
and let $\mathrm{Val}(P_\phi)$ be the value of the program in some abstract complete markets case:
$$\max_{c \in C_\phi} \left\{ u_j(c_{0j}) + \beta_j E(\nu_j(c^1_j)) \right\}. \qquad [P_\phi]$$
Clearly, we have $\mathrm{Val}(P_I) \leq \mathrm{Val}(P_\phi)$ for all φ, for the constraint in the incomplete markets
case, C, is more stringent than that in any complete market setting, $C_\phi$: the solution to the
program in the incomplete markets case $[P_I]$ must satisfy the budget constraints in C, formed
using all of the possible Arrow-Debreu state prices (including the Arrow-Debreu state price φ
given in $C_\phi$), as the constraint of Eq. (2.21) shows. Moreover, $(c^1 - w^1) \in \langle V \rangle$. These remarks
suggest defining the following “min-max” Arrow-Debreu state price:
$$\phi^* = \arg\min_{\phi \in \Phi} \mathrm{Val}(P_\phi).$$
The natural question is to know whether
Val (PI ) = Val (Pφ∗ ) . (2.22)
This is indeed the case, given some regularity conditions. For the characterization of φ∗ , suppose
there exists φ̂ : Val (PI ) = Val(Pφ̂ ). Then, φ̂ = φ∗ . Indeed, suppose the contrary, i.e. there exists
φ′ : Val(Pφ′ ) < Val(Pφ̂ ). Then, we would have,
Val (PI ) ≤ Val(Pφ′ ) < Val(Pφ̂ ) = Val (PI ) ,
a contradiction. Note, again, this is a characterization result about φ∗ , not an existence proof.
But as mentioned earlier, Eq. (2.22) holds true, as shown in a dynamic setting by He and
Pearson (1991). Chapter 4 provides general guidance about an even more general approach to
solving problems of this kind, arising in a broader context of market imperfections, including
incomplete markets as a special case.
2.6.3.3 Computation of the equilibrium
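The first order conditions satisfied by any agent’s program are:
$$\hat c_{0j} = I_j(\lambda_j), \qquad \hat c_{sj} = H_j\left( \beta_j^{-1} \lambda_j m_s \right), \tag{2.23}$$
where $I_j$ and $H_j$ denote the inverse functions of $u_j'$ and $\nu_j'$. By the assumptions we made on $u_j$
and $\nu_j$, $I_j$ and $H_j$ inherit the same properties of $u_j'$ and $\nu_j'$. By replacing these functions into
the constraint,
$$0 = \hat c_{0j} - w_{0j} + E\left[ m \cdot (\hat c^1_j - w^1_j) \right] = I_j(\lambda_j) - w_{0j} + \sum_{s=1}^{d} P_s m_s \cdot \left[ H_j(\beta_j^{-1} \lambda_j m_s) - w_{sj} \right].$$
Define the function $z_j$ through,
$$z_j(\lambda_j) \equiv I_j(\lambda_j) + E\left[ m H_j(\beta_j^{-1} \lambda_j m) \right] = w_{0j} + E(m \cdot w^1_j).$$
We see that $\lim_{x \to 0} z_j(x) = \infty$, $\lim_{x \to \infty} z_j(x) = 0$ and $z_j'(x) < 0$. Therefore, there exists a unique
solution for $\lambda_j$:
$$\lambda_j \equiv \Lambda_j\left[ w_{0j} + E(m \cdot w^1_j) \right],$$
where $\Lambda_j(\cdot)$ denotes the inverse function of $z_j$. By replacing back into Eqs. (2.23), we obtain:
$$\hat c_{0j} = I_j\left( \Lambda_j\left( w_{0j} + E(m \cdot w^1_j) \right) \right), \qquad \hat c_{sj} = H_j\left( \beta_j^{-1} m_s \Lambda_j\left( w_{0j} + E(m \cdot w^1_j) \right) \right).$$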
It remains to compute the general equilibrium. The kernel m must be determined. This means
that we have d unknowns ($m_s$, s = 1, ···, d). We have d + 1 equilibrium conditions (holding in
the d + 1 markets). By Walras’ law, only d of these are independent. Consider the equilibrium
conditions in the d markets at the second period:
$$g_s\left( m_s ; (m_{s'})_{s' \neq s} \right) \equiv \sum_{j=1}^{n} H_j\left( \beta_j^{-1} m_s \Lambda_j\left( w_{0j} + E(m \cdot w^1_j) \right) \right) = \sum_{j=1}^{n} w_{sj} \equiv w_s, \quad s = 1, \cdots, d.$$
These conditions determine the kernel $(m_s)_{s=1}^{d}$, from which prices and equilibrium
allocations can be computed. Finally, once the optimal $\hat c_s$ are computed, for s = 0, 1, ···, d, the portfolio $\hat\theta$
generating them can be inferred through $\hat\theta = V^{-1}(\hat c^1 - w^1)$.
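
The first step of this procedure, solving $z_j(\lambda_j) = w_{0j} + E(m \cdot w^1_j)$ for a given kernel m, is easy to carry out numerically. The sketch below assumes CRRA marginal utilities $u_j'(c) = \nu_j'(c) = c^{-\gamma}$, so that $I_j(x) = H_j(x) = x^{-1/\gamma}$, and uses bisection, which is justified by the monotonicity of $z_j$. The utility specification, the function names and all numbers are assumptions for illustration; the kernel m is taken as given rather than solved for in equilibrium.

```python
def solve_multiplier(wealth, beta, gamma, m, P, lo=1e-12, hi=1e12, iters=200):
    """Find lambda_j such that z_j(lambda_j) = wealth, by bisection on a log scale."""
    def inv(x):                       # I_j = H_j under the CRRA assumption
        return x ** (-1.0 / gamma)

    def z(lam):                       # z_j(lam) = I_j(lam) + E[m H_j(lam m / beta)]
        return inv(lam) + sum(p * ms * inv(lam * ms / beta) for p, ms in zip(P, m))

    for _ in range(iters):
        mid = (lo * hi) ** 0.5        # geometric midpoint of the bracket
        if z(mid) > wealth:
            lo = mid                  # z is decreasing: lambda must be larger
        else:
            hi = mid
    return (lo * hi) ** 0.5

def optimal_plan(w0, w1, beta, gamma, m, P):
    """Return (lambda_j, c_0j, [c_sj]) for a given pricing kernel m."""
    wealth = w0 + sum(p * ms * w for p, ms, w in zip(P, m, w1))
    lam = solve_multiplier(wealth, beta, gamma, m, P)
    c0 = lam ** (-1.0 / gamma)
    c1 = [(lam * ms / beta) ** (-1.0 / gamma) for ms in m]
    return lam, c0, c1

P, m = [0.5, 0.5], [0.7, 1.2]         # hypothetical probabilities and kernel
lam, c0, c1 = optimal_plan(w0=1.0, w1=[1.5, 0.5], beta=0.95, gamma=1.0, m=m, P=P)
spent = c0 + sum(p * ms * c for p, ms, c in zip(P, m, c1))
```

With log utility (γ = 1) the multiplier has the closed form $\lambda_j = (1 + \beta_j) / [w_{0j} + E(m \cdot w^1_j)]$, which provides a check on the bisection. The equilibrium kernel would then be found by adjusting m until the d second-period markets clear.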
2.7 Consumption-CAPM
Consider the pricing equation (2.19). It states that for every asset with gross return $\tilde R \equiv
S^{-1} \cdot \text{payoff}$,
$$1 = E(m \cdot \tilde R), \tag{2.24}$$
where m is some pricing kernel.
In the previous section, we learnt that in a complete markets economy, equilibrium leads to
the following identification of the pricing kernel,
$$m_s = \beta \frac{\nu'(w_s)}{u'(w_0)}.$$
For a riskless asset, $1 = E(m \cdot R)$. Combining this equality with Eq. (2.24) leaves $E[m \cdot
(\tilde R - R)] = 0$. By rearranging terms,
$$E(\tilde R) = R - \frac{\mathrm{cov}(\nu'(w^+), \tilde R)}{E[\nu'(w^+)]}. \tag{2.25}$$
2.7.1 The risk premium

Eq. (2.25) can be rewritten as,
$$E(\tilde R) - R = - \frac{\mathrm{cov}(m, \tilde R)}{E(m)} = -R \cdot \mathrm{cov}(m, \tilde R). \tag{2.26}$$
The risk-premium to invest in the asset is high for securities which pay high returns when
consumption is high (i.e. when we don’t need high returns) and low returns when consumption
is low (i.e. when we need high returns).
All in all, the price is $p = E(m \cdot \text{payoff}) = E(m) E(\text{payoff}) + \mathrm{cov}(m, \text{payoff}) = R^{-1} E(\text{payoff})
- \text{Premium}$, where $\text{Premium} = -\mathrm{cov}(m, \text{payoff})$, the first term being a pure discounting effect.
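
Eq. (2.26) can be verified on a two-state example. The probabilities, the kernel and the payoff below are hypothetical, chosen so that the asset pays more in the state where the kernel is low (the "good" state).

```python
def expectation(x, P):
    return sum(p * x_s for p, x_s in zip(P, x))

def covariance(x, y, P):
    ex, ey = expectation(x, P), expectation(y, P)
    return sum(p * (a - ex) * (b - ey) for p, a, b in zip(P, x, y))

P = [0.5, 0.5]                 # hypothetical objective probabilities
m = [0.7, 1.2]                 # hypothetical kernel: low in the good state
payoff = [2.0, 0.5]            # pays more in the good state

price = expectation([a * b for a, b in zip(m, payoff)], P)   # p = E(m * payoff)
R_risky = [x / price for x in payoff]                        # gross return
R_safe = 1.0 / expectation(m, P)                             # from 1 = E(m) R
premium_lhs = expectation(R_risky, P) - R_safe               # E(R~) - R
premium_rhs = -R_safe * covariance(m, R_risky, P)            # -R cov(m, R~)
```

Because the return covaries negatively with m, the premium is positive: the asset pays off mostly when an extra unit of consumption is least valuable.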
2.7.2 The beta relation

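Suppose there is a $\tilde R^m$ such that
$$\tilde R^m = -\gamma^{-1} \cdot \nu'(w_s), \quad \text{all } s.$$
In this case,
$$E(\tilde R) = R + \frac{\gamma \cdot \mathrm{cov}(\tilde R^m, \tilde R)}{E[\nu'(w^+)]} \quad \text{and} \quad E(\tilde R^m) = R + \frac{\gamma \cdot \mathrm{var}(\tilde R^m)}{E[\nu'(w^+)]}.$$
These relations can be combined to yield,
$$E(\tilde R) - R = \beta \cdot [E(\tilde R^m) - R], \qquad \beta \equiv \frac{\mathrm{cov}(\tilde R^m, \tilde R)}{\mathrm{var}(\tilde R^m)}.$$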
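
The intermediate steps can be spelled out as follows. This is only a sketch of the algebra, relying on the assumption $\nu'(w_s) = -\gamma \tilde R^m$ made above and on Eq. (2.25):

```latex
\begin{aligned}
\mathrm{cov}\big(\nu'(w^+), \tilde R\big) &= -\gamma\, \mathrm{cov}\big(\tilde R^m, \tilde R\big)
&&\Longrightarrow\quad
E(\tilde R) - R = \frac{\gamma\, \mathrm{cov}(\tilde R^m, \tilde R)}{E[\nu'(w^+)]}, \\
\text{and, setting } \tilde R = \tilde R^m, \quad
E(\tilde R^m) - R &= \frac{\gamma\, \mathrm{var}(\tilde R^m)}{E[\nu'(w^+)]}
&&\Longrightarrow\quad
\frac{E(\tilde R) - R}{E(\tilde R^m) - R} = \frac{\mathrm{cov}(\tilde R^m, \tilde R)}{\mathrm{var}(\tilde R^m)}.
\end{aligned}
```

The unknown scale factor $\gamma / E[\nu'(w^+)]$ cancels in the ratio, which is why the beta relation involves only moments of returns.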
2.7.3 CCAPM & CAPM

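Let $\tilde R^p$ be the portfolio return which is the most highly correlated with the pricing kernel m.
We have,
$$E(\tilde R^p) - R = -R \cdot \mathrm{cov}(m, \tilde R^p). \tag{2.27}$$
Using Eqs. (2.26) and (2.27),
$$\frac{E(\tilde R) - R}{E(\tilde R^p) - R} = \frac{\mathrm{cov}(m, \tilde R)}{\mathrm{cov}(m, \tilde R^p)},$$
and by rearranging terms,
$$E(\tilde R) - R = \frac{\beta_{\tilde R, m}}{\beta_{\tilde R^p, m}} [E(\tilde R^p) - R] \qquad [\text{CCAPM}].$$
If $\tilde R^p$ is perfectly correlated with m, i.e. if there exists γ : $\tilde R^p = -\gamma m$, then
$$\beta_{\tilde R, m} = -\gamma \frac{\mathrm{cov}(\tilde R^p, \tilde R)}{\mathrm{var}(\tilde R^p)} \quad \text{and} \quad \beta_{\tilde R^p, m} = -\gamma$$
and then
$$E(\tilde R) - R = \beta_{\tilde R, \tilde R^p} [E(\tilde R^p) - R] \qquad [\text{CAPM}].$$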
This is not the only way the CAPM obtains. As we shall explain in Chapter 6, the CAPM also
obtains through the so-called “maximum correlation portfolio,” which is the portfolio that is
the most highly correlated with the pricing kernel m.
2.8 Infinite horizon

We consider d states of nature and m = d Arrow securities. We write a unified budget
constraint, as in the valuation equilibria approach of Debreu (1954).
We have,
$$\begin{cases} p_0 (c_0 - w_0) = -S^{(0)} \theta^{(0)} = -\sum_{i=1}^{m} S^{(0)}_i \theta^{(0)}_i \\ p^1_s (c^1_s - w^1_s) = \theta^{(0)}_s, \quad s = 1, \cdots, d \end{cases}$$
or,
$$p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)}_i p^1_i (c^1_i - w^1_i) = 0.$$
The previous relation holds in a two-period economy. In a multiperiod economy, in the second
period (as in the following periods) agents save indefinitely for the future. In the appendix,
we show that,
$$0 = E\left[ \sum_{t=0}^{\infty} m_{0,t} \cdot p^t (c^t - w^t) \right], \tag{2.28}$$
where $m_{0,t}$ are the state prices. From the perspective of time 0, at time t there exist $d^t$ states
of nature and, thus, $d^t$ possible prices.
2.9 Further topics on incomplete markets

2.9.1 Nominal assets and real indeterminacy of the equilibrium
The equilibrium is a set of prices $(\hat p, \hat S) \in \mathbb{R}^{m \cdot (d+1)}_{++} \times \mathbb{R}^{a}_{++}$ such that:
$$0 = \sum_{j=1}^{n} e_{0j}(\hat p, \hat S), \qquad 0 = \sum_{j=1}^{n} e_{1j}(\hat p, \hat S), \qquad 0 = \sum_{j=1}^{n} \theta_j(\hat p, \hat S),$$
where the previous functions are the results of optimal plans of the agents. This system has
m · (d + 1) + a equations and m · (d + 1) + a unknowns, where a ≤ d. Let us aggregate the
constraints of the agents,
$$p_0 \sum_{j=1}^{n} e_{0j} = -S \sum_{j=1}^{n} \theta_j, \qquad p_1 \sum_{j=1}^{n} e_{1j} = B \sum_{j=1}^{n} \theta_j.$$
Suppose the financial markets clearing condition is satisfied, i.e. $\sum_{j=1}^{n} \theta_j = 0$. Then,
$$\begin{cases} 0 = p_0 \sum_{j=1}^{n} e_{0j} \equiv p_0 e_0 = \sum_{\ell=1}^{m} p_0^{(\ell)} e_0^{(\ell)} \\ 0_d = p_1 \sum_{j=1}^{n} e_{1j} \equiv p_1 e_1 = \left[ \sum_{\ell=1}^{m} p_1^{(\ell)}(\omega_1) e_1^{(\ell)}(\omega_1), \cdots, \sum_{\ell=1}^{m} p_1^{(\ell)}(\omega_d) e_1^{(\ell)}(\omega_d) \right]^\top \end{cases}$$
Therefore, there is one redundant equation for each state of nature, or d + 1 redundant
equations, in total. As a result, the equilibrium has fewer independent equations (m · (d + 1) − 1)
than unknowns (m · (d + 1) + d), i.e., an indeterminacy degree equal to d + 1. This result does not
rely on whether markets are complete or not. In a sense, it is not even an indeterminacy result
when markets are complete, as we may always assume agents would organize the exchanges
at the beginning. In this case, only the suitably normalized Arrow-Debreu state prices would
matter for agents.
The previous indeterminacy can be reduced to d − 1, as we may use two additional homogeneity
relations. To pin down these relations, let us consider the budget constraint of each agent
j,
$$p_0 e_{0j} = -S\theta_j, \qquad p_1 e_{1j} = B\theta_j.$$
The first-period constraint is still the same if we multiply the spot price vector $p_0$ and the
financial price vector S by a positive constant, λ (say). In other words, if $(\hat p_0, \hat p_1, \hat S)$ is an
equilibrium, then $(\lambda \hat p_0, \hat p_1, \lambda \hat S)$ is also an equilibrium, which delivers a first homogeneity relation.
To derive the second homogeneity relation, we multiply the spot prices of the second period by
a positive constant, λ, and increase at the same time the first period agents’ purchasing power,
by dividing each asset price by the same constant, as follows:
$$p_0 e_{0j} = -\frac{S}{\lambda} \lambda\theta_j, \qquad \lambda p_1 e_{1j} = B \lambda\theta_j.$$
Therefore, if $(\hat p_0, \hat p_1, \hat S)$ is an equilibrium, then $(\hat p_0, \lambda \hat p_1, \hat S / \lambda)$ is also an equilibrium.
2.9.2 Nonneutrality of money

The previous indeterminacy arises because financial contracts are nominal, i.e. the asset payoffs
are expressed in terms of some unité de compte that, among other things, we did not make
precise. Such an indeterminacy vanishes if we were to consider real contracts, i.e. contracts
with payoffs expressed in terms of the goods. To show this, note that in the presence of real
contracts, the agents’ constraints are
$$\begin{cases} p_0 e_{0j} = -S\theta_j \\ p_1(\omega_s) e_{1j}(\omega_s) = p_1(\omega_s) A_s \theta_j, \quad s = 1, \cdots, d \end{cases}$$
where $A_s = [A^1_s, \cdots, A^a_s]$ is the m × a matrix of the real payoffs. The previous constraint
now reveals how to “recover” d + 1 homogeneity relations. For each strictly positive vector
$\lambda = [\lambda_0, \lambda_1, \cdots, \lambda_d]$, we have that if $[\hat p_0, S, p_1(\omega_1), \cdots, p_1(\omega_s), \cdots, p_1(\omega_d)]$ is an equilibrium, then
$[\lambda_0 \hat p_0, \lambda_0 S, p_1(\omega_1), \cdots, p_1(\omega_s), \cdots, p_1(\omega_d)]$ is also an equilibrium, and so is
$[\hat p_0, S, p_1(\omega_1), \cdots, \lambda_s p_1(\omega_s), \cdots, p_1(\omega_d)]$, for $\lambda_s$, s = 1, ···, d.
As is clear, the distinction between nominal and real assets has a precise meaning when
one considers a multi-commodity economy. Even in this case, however, such a distinction is
not very interesting without a suitable introduction of a unité de compte. These considerations
led Magill and Quinzii (1992) to solve the indeterminacy while still remaining in a framework
with nominal assets. They simply propose to introduce money as a means of exchange. The
indeterminacy can then be resolved by “fixing” the prices via the d + 1 equations defining the
money market equilibrium in all states of nature:
$$M_s = p_s \cdot \sum_{j=1}^{n} w_{sj}, \quad s = 0, 1, \cdots, d.$$
Magill and Quinzii showed that the monetary policy $(M_s)_{s=0}^{d}$ is generically nonneutral.
2.10 Appendix 1
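In this appendix we prove that the program [P1] has a unique maximum. Indeed, suppose on the
contrary that we have two maxima:
$$\bar c = (\bar c_{1j}, \cdots, \bar c_{mj}) \quad \text{and} \quad \underline{c} = (\underline{c}_{1j}, \cdots, \underline{c}_{mj}).$$
These two maxima would satisfy $u_j(\bar c) = u_j(\underline{c})$, with $\sum_{i=1}^{m} p_i \bar c_{ij} = \sum_{i=1}^{m} p_i \underline{c}_{ij} = R_j$. To check that this
claim is correct, suppose on the contrary that $\sum_{i=1}^{m} p_i \underline{c}_{ij} < R_j$. Then, the consumption bundle,
$$\tilde c = (\underline{c}_{1j} + \varepsilon, \cdots, \underline{c}_{mj}), \quad \varepsilon > 0,$$
would be preferred to $\underline{c}$, by Assumption 2.1, and, at the same time, it would hold that, for sufficiently
small ε,
$$\sum_{i=1}^{m} p_i \tilde c_{ij} = \varepsilon p_1 + \sum_{i=1}^{m} p_i \underline{c}_{ij} < R_j.$$
[Indeed, let $A \equiv \sum_{i=1}^{m} p_i \underline{c}_{ij}$. Then $A < R_j \Rightarrow \exists \varepsilon > 0 : A + \varepsilon p_1 < R_j$. E.g., $\varepsilon p_1 = R_j - A - \eta$, η > 0. The
condition is then: $\exists \eta > 0 : R_j - A > \eta$.] Hence, $\tilde c$ would be feasible in [P1] and strictly preferred to $\underline{c}$, thereby contradicting
the optimality of $\underline{c}$. Therefore, the existence of two optima would imply a full use of resources. Next,
consider a point y lying between $\bar c$ and $\underline{c}$, viz $y = \alpha \bar c + (1 - \alpha)\underline{c}$, α ∈ (0, 1). By Assumption 2.1,
$$u_j(y) = u_j\left( \alpha \bar c + (1 - \alpha)\underline{c} \right) > u_j(\bar c) = u_j(\underline{c}).$$
Moreover,
$$\sum_{i=1}^{m} p_i y_i = \sum_{i=1}^{m} p_i \left( \alpha \bar c_{ij} + (1 - \alpha)\underline{c}_{ij} \right) = \alpha \sum_{i=1}^{m} p_i \bar c_{ij} + \sum_{i=1}^{m} p_i \underline{c}_{ij} - \alpha \sum_{i=1}^{m} p_i \underline{c}_{ij} = \alpha R_j + R_j - \alpha R_j = R_j.$$
Hence, $y \in B_j(p)$ and is also strictly preferred to $\bar c$ and $\underline{c}$, which means that $\bar c$ and $\underline{c}$ are not optima,
as initially conjectured. This establishes uniqueness of the solution to [P1].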
2.11 Appendix 2: Proofs of selected results

We first provide a useful result, a well-known theorem on separation of two convex sets. We use this
theorem to deal with the proof of the second welfare theorem (Theorem 2.4) and the existence of state
prices tying up all asset prices together (Theorem 2.10). A final proof we provide in this appendix is
that of Eq. (2.28).
Minkowski’s separation theorem. Let $A$ and $B$ be two non-empty convex subsets of $\mathbb{R}^d$. If $A$
is closed, $B$ is compact and $A \cap B = \emptyset$, then there exists a $\phi \in \mathbb{R}^d$ and two real numbers $d_1, d_2$ such
that:
$$a^\top \phi \le d_1 < d_2 \le b^\top \phi, \qquad \forall a \in A,\ \forall b \in B.$$
We are now ready to prove Theorems 2.4 and 2.10.

Proof of Theorem 2.4. Let $\bar{c}$ be a Pareto optimum and $\tilde{B}_j = \{c^j : u_j(c^j) > u_j(\bar{c}^j)\}$. Let us
consider the two sets $\tilde{B} = \bigcup_{j=1}^{n} \tilde{B}_j$ and $A = \{(c^j)_{j=1}^{n} : c^j \ge 0\ \forall j,\ \sum_{j=1}^{n} c^j = w\}$. $A$ is the set of all
possible combinations of feasible allocations. By the definition of a Pareto optimum, there are no
elements in $A$ that are simultaneously in $\tilde{B}$, or $A \cap \tilde{B} = \emptyset$. In particular, this is true for all compact
subsets $B$ of $\tilde{B}$, or $A \cap B = \emptyset$. Because $A$ is closed, then, by Minkowski’s separation theorem,
there exists a $p \in \mathbb{R}^m$ and two distinct numbers $d_1, d_2$ such that
$$p^\top a \le d_1 < d_2 \le p^\top b, \qquad \forall a \in A,\ \forall b \in B.$$
This means that for all allocations $(c^j)_{j=1}^{n}$ preferred to $\bar{c}$, we have:
$$p^\top \sum_{j=1}^{n} w^j < p^\top \sum_{j=1}^{n} c^j,$$
or, by replacing $\sum_{j=1}^{n} w^j$ with $\sum_{j=1}^{n} \bar{c}^j$,
$$p^\top \sum_{j=1}^{n} \bar{c}^j < p^\top \sum_{j=1}^{n} c^j. \qquad (2A.1)$$
Next we show that $p > 0$. Let $\bar{c}_i = \sum_{j=1}^{n} \bar{c}_{ij}$, $i = 1, \cdots, m$, and partition $\bar{c} = (\bar{c}_1, \cdots, \bar{c}_m)$. Let us apply
the inequality in (2A.1) to $\bar{c} \in A$ and, for $\mu > 0$, to $c = (\bar{c}_1 + \mu, \cdots, \bar{c}_m) \in B$. We have $p_1 \mu > 0$, or
$p_1 > 0$. By reiterating the argument, $p_i > 0$ for all $i$. Finally, we choose $c^j = \bar{c}^j + 1_m \frac{\epsilon}{n}$, $j = 2, \cdots, n$,
$\epsilon > 0$ in (2A.1), which gives $p^\top \bar{c}^1 < p^\top c^1 + p^\top 1_m \epsilon$ or,
$$p^\top \bar{c}^1 < p^\top c^1,$$
for $\epsilon$ sufficiently small. That is, $u_1(c^1) > u_1(\bar{c}^1) \Rightarrow p^\top c^1 > p^\top \bar{c}^1$. Hence, $\bar{c}^1 =
\arg\max_{c^1} u_1(c^1)$ s.t. $p^\top c^1 = p^\top \bar{c}^1$. By symmetry, $\bar{c}^j = \arg\max_{c^j} u_j(c^j)$ s.t. $p^\top c^j = p^\top \bar{c}^j$ for all $j$.
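Proof of Theorem 2.10. The condition in (2.12) holds for any compact subset of $\mathbb{R}^{d+1}_{+}$, and
therefore it holds when it is restricted to the unit simplex $S^d$ in $\mathbb{R}^{d+1}_{+}$, which does not contain the origin:
$$\langle W \rangle \cap S^d = \emptyset.$$
By Minkowski’s separation theorem, $\exists \tilde{\phi} \in \mathbb{R}^{d+1} : w^\top \tilde{\phi} \le d_1 < d_2 \le \sigma^\top \tilde{\phi}$, $w \in \langle W \rangle$, $\sigma \in S^d$.
By walking along the simplex boundaries, one finds that $d_1 < \tilde{\phi}_s$, $s = 0, 1, \cdots, d$. On the other hand,
$0 \in \langle W \rangle$, which reveals that $d_1 \ge 0$, and $\tilde{\phi} \in \mathbb{R}^{d+1}_{++}$. Next we show that $w^\top \tilde{\phi} = 0$ for all $w \in \langle W \rangle$. Assume the contrary,
i.e. $\exists w^* \in \langle W \rangle$ that satisfies at the same time $w^{*\top} \tilde{\phi} \ne 0$. In this case, there would be a real number $\epsilon$
with $\mathrm{sign}(\epsilon) = \mathrm{sign}(w^{*\top} \tilde{\phi})$ such that $\epsilon w^* \in \langle W \rangle$ and $\epsilon w^{*\top} \tilde{\phi} > d_2$, a contradiction. Therefore, we have
$$0 = \tilde{\phi}^\top W \theta = \tilde{\phi}^\top \begin{pmatrix} -S \\ V \end{pmatrix} \theta = \left(-\tilde{\phi}_0 S + \tilde{\phi}_{(d)}^\top V\right)\theta, \qquad \forall \theta \in \mathbb{R}^m,$$
where $\tilde{\phi}_{(d)}$ contains the last $d$ components of $\tilde{\phi}$. Whence $S = \phi^\top V$, where $\phi^\top = \left(\frac{\tilde{\phi}_1}{\tilde{\phi}_0}, \cdots, \frac{\tilde{\phi}_d}{\tilde{\phi}_0}\right)$.
The proof of the converse is immediate (hint: multiply by θ): shown in further notes.
The proof of the second part is the following one. We have that “each point of $\mathbb{R}^{d+1}$ is equal to
each point of $\langle W \rangle$ plus each point of $\langle W \rangle^{\perp}$,” or $\dim \langle W \rangle + \dim \langle W \rangle^{\perp} = d + 1$. Since $\dim \langle W \rangle =
\mathrm{rank}(W)$, $\dim \langle W \rangle^{\perp} = d + 1 - \dim \langle W \rangle$, and since $S = \phi^\top V$ in the absence of arbitrage opportunities,
$\dim \langle W \rangle = \dim \langle V \rangle = m$, whence:
$$\dim \langle W \rangle^{\perp} = d - m + 1.$$
In other terms, before we showed that $\exists \tilde{\phi} : \tilde{\phi}^\top W = 0$, or $\tilde{\phi} \in \langle W \rangle^{\perp}$. Whence $\dim \langle W \rangle^{\perp} \ge 1$ in
the absence of arbitrage opportunities. The previous relation provides more information. Specifically,
$\dim \langle W \rangle^{\perp} = 1$ if and only if $d = m$. In this case, $\dim\{\tilde{\phi} \in \mathbb{R}^{d+1}_{+} : \tilde{\phi}^\top W = 0\} = 1$, which means that
the relation $-\tilde{\phi}_0 S + \tilde{\phi}_{(d)}^\top V = 0$ also holds true for $\tilde{\phi}^* = \tilde{\phi} \cdot \lambda$, for every positive scalar $\lambda$, but there are no
other possible candidates. Therefore, $\phi^\top = \left(\frac{\tilde{\phi}_1}{\tilde{\phi}_0}, \cdots, \frac{\tilde{\phi}_d}{\tilde{\phi}_0}\right)$ does not depend on $\lambda$, and then it is unique.
By a similar reasoning, $\dim\{\tilde{\phi} \in \mathbb{R}^{d+1}_{+} : \tilde{\phi}^\top W = 0\} = d - m + 1 \Rightarrow \dim\{\phi \in \mathbb{R}^{d}_{++} : S = \phi^\top V\} = d - m$.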
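The dimension count in the second part of the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the payoff matrix `V`, the state-price vector `phi` and the sizes d = 3, m = 2 are illustrative assumptions, and NumPy is assumed to be available. It builds W by stacking −S on top of V and confirms that the orthogonal complement of its column span has dimension d − m + 1.

```python
import numpy as np

# Hypothetical example: d = 3 states, m = 2 assets (incomplete markets).
V = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])            # d x m payoff matrix (assumed)
phi = np.array([0.3, 0.3, 0.3])       # assumed strictly positive state prices
S = phi @ V                           # arbitrage-free asset prices, S = phi' V

d, m = V.shape
W = np.vstack([-S, V])                # (d + 1) x m matrix W = (-S ; V)

rank_W = np.linalg.matrix_rank(W)     # dimension of the column span of W
dim_W_perp = (d + 1) - rank_W         # dimension of its orthogonal complement

phi_tilde = np.concatenate([[1.0], phi])   # (1, phi) should be orthogonal to W
residual = phi_tilde @ W                   # equals -S + phi' V

print(rank_W, dim_W_perp, np.allclose(residual, 0.0))
```

With d = m the same computation would return a one-dimensional complement, which is the case in which the normalized state-price vector is unique.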

Proof of Eq. (2.28). Let $S^{(2)(\ell)}_{s',s}$ be the price at t = 2 in state s′ if the state in t = 1 was s, for the
Arrow security promising 1 unit of numéraire in state ℓ at t = 3. Let $S^{(2)}_{s',s} = [S^{(2)(1)}_{s',s}, \cdots, S^{(2)(m)}_{s',s}]$. Let
$\theta^{(1)(s)}_{i}$ be the quantity purchased at t = 1 in state i of Arrow securities promising 1 unit of numéraire
if s at t = 2. Let $p^2_{s,i}$ be the price of the good at t = 2 in state s if the previous state at t = 1 was i.
Let $S^{(0)(i)}$ and $S^{(1)(i)}_{s}$ correspond to $S^{(2)(\ell)}_{s',s}$; $S^{(0)}$ and $S^{(1)}_{s}$ correspond to $S^{(2)}_{s',s}$.
The budget constraint is
$$\begin{cases} p_0 (c_0 - w_0) = -S^{(0)} \theta^{(0)} = -\sum_{i=1}^{m} S^{(0)(i)} \theta^{(0)(i)} \\[4pt] p^1_s (c^1_s - w^1_s) = \theta^{(0)(s)} - S^{(1)}_s \theta^{(1)}_s = \theta^{(0)(s)} - \sum_{i=1}^{m} S^{(1)(i)}_s \theta^{(1)(i)}_s, \qquad s = 1, \cdots, d, \end{cases}$$
where $S^{(1)(i)}_s$ is the price to be paid at time 1 and in state s, for an Arrow security giving 1 unit of
numéraire if the state at time 2 is i.
By replacing the second equation of the budget constraint into the first one:
$$\begin{aligned} p_0 (c_0 - w_0) &= -\sum_{i=1}^{m} S^{(0)(i)} \left[ p^1_i (c^1_i - w^1_i) + S^{(1)}_i \theta^{(1)}_i \right] \\ \Longleftrightarrow \quad 0 &= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i (c^1_i - w^1_i) + \sum_{i=1}^{m} S^{(0)(i)} S^{(1)}_i \theta^{(1)}_i \\ &= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i (c^1_i - w^1_i) + \sum_{i=1}^{m} S^{(0)(i)} \sum_{j=1}^{m} S^{(1)(j)}_i \theta^{(1)(j)}_i \\ &= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i (c^1_i - w^1_i) + \sum_{i=1}^{m} \sum_{j=1}^{m} S^{(0)(i)} S^{(1)(j)}_i \theta^{(1)(j)}_i. \end{aligned}$$
At time 2,
$$p^2_{s,i} (c^2_{s,i} - w^2_{s,i}) = \theta^{(1)(s)}_i - S^{(2)}_{s,i} \theta^{(2)}_{s,i} = \theta^{(1)(s)}_i - \sum_{\ell=1}^{m} S^{(2)(\ell)}_{s,i} \theta^{(2)(\ell)}_{s,i}, \qquad s = 1, \cdots, d.$$
Here $S^{(2)}_{s,i}$ is the price vector, to be paid at time 2 in state s if the previous state was i, for the Arrow
securities expiring at time 3. The other symbols have a similar interpretation.
By plugging the time-2 constraint into the last line of the previous display,
$$\begin{aligned} 0 &= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i (c^1_i - w^1_i) + \sum_{i=1}^{m} \sum_{j=1}^{m} S^{(0)(i)} S^{(1)(j)}_i \left[ p^2_{j,i} (c^2_{j,i} - w^2_{j,i}) + S^{(2)}_{j,i} \theta^{(2)}_{j,i} \right] \\ &= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i (c^1_i - w^1_i) + \sum_{i=1}^{m} \sum_{j=1}^{m} S^{(0)(i)} S^{(1)(j)}_i p^2_{j,i} (c^2_{j,i} - w^2_{j,i}) \\ &\quad + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{\ell=1}^{m} S^{(0)(i)} S^{(1)(j)}_i S^{(2)(\ell)}_{j,i} \theta^{(2)(\ell)}_{j,i}. \end{aligned}$$
In the absence of arbitrage opportunities, $\exists \phi_{t+1,s'} \in \mathbb{R}^{d}_{++}$, the state prices vector for t + 1 if the
state in t is s′, such that:
$$S^{(t)(\ell)}_{s',s} = \phi'_{t+1,s'} \cdot e_\ell, \qquad \ell = 1, \cdots, m,$$
where $e_\ell \in \mathbb{R}^{d}_{+}$ and has all zeros except in the ℓ-th component which is 1. Next, we restate the
previous relation in terms of the kernel $m_{t+1,s'} = (m^{(\ell)}_{t+1,s'})_{\ell=1}^{d}$ and the probability distribution $P_{t+1,s'} =
(P^{(\ell)}_{t+1,s'})_{\ell=1}^{d}$ of the events in t + 1 when the state in t is s′:
$$S^{(t)(\ell)}_{s',s} = m^{(\ell)}_{t+1,s'} \cdot P^{(\ell)}_{t+1,s'}, \qquad \ell = 1, \cdots, m.$$
By replacing into the iterated budget constraint above, and imposing the transversality condition:
$$\sum_{\ell_1=1}^{m} \sum_{\ell_2=1}^{m} \sum_{\ell_3=1}^{m} \sum_{\ell_4=1}^{m} \cdots \sum_{\ell_t=1}^{m} \cdots\ S^{(0)(\ell_1)} S^{(1)(\ell_2)}_{\ell_1} S^{(2)(\ell_3)}_{\ell_2,\ell_1} S^{(3)(\ell_4)}_{\ell_3,\ell_2} \cdots S^{(t-1)(\ell_t)}_{\ell_{t-1},\ell_{t-2}} \cdots \underset{t \to \infty}{\longrightarrow} 0,$$
we get Eq. (2.28).
2.12 Appendix 3: The multicommodity case

The multicommodity case is interesting, but at the same time is extremely delicate to deal with when
markets are incomplete. While standard regularity conditions ensure the existence of an equilibrium
in the static and complete markets case, only “generic” existence results are available for the incomplete
markets cases. Hart (1974) built up well-chosen examples in which there exist sets of endowments
distributions for which no equilibrium can exist. However, Duffie and Shafer (1985) showed that such
sets have zero measure, which justifies the terminology of “generic” existence.
Here we only provide a derivation of the constraints. $m_1$ commodities are traded in the first period and $m_2$ in the second.
The states of nature in the second period are d, and the number of traded assets is a. The first period
budget constraint is:
$$p_0 e_{0j} = -S\theta_j, \qquad e_{0j} \equiv c_{0j} - w_{0j},$$
where $p_0 = (p_0^{(1)}, \cdots, p_0^{(m_1)})$ is the first period price vector, $e_{0j} = (e_{0j}^{(1)}, \cdots, e_{0j}^{(m_1)})'$ is the first period
excess demands vector, $S = (S_1, \cdots, S_a)$ is the financial asset price vector, and $\theta_j = (\theta_{1j}, \cdots, \theta_{aj})'$ is
the vector of assets quantities that agent j buys at the first period.
The second period budget constraint is,
$$E_1\, p_1' = B \cdot \theta_j,$$
where
$$\underset{d \times d \cdot m_2}{E_1} = \begin{pmatrix} e_1(\omega_1) & 0 & \cdots & 0 \\ 0 & e_1(\omega_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e_1(\omega_d) \end{pmatrix}, \qquad \text{each block being } 1 \times m_2,$$
is the matrix of excess demands, $p_1 = (p_1(\omega_1), \cdots, p_1(\omega_d))$, with each $p_1(\omega_i)$ being $m_2 \times 1$, is the matrix of spot prices, and
$$\underset{d \times a}{B} = \begin{pmatrix} v_1(\omega_1) & \cdots & v_a(\omega_1) \\ \vdots & & \vdots \\ v_1(\omega_d) & \cdots & v_a(\omega_d) \end{pmatrix}$$
is the payoffs matrix. We can rewrite the second period constraint as $p_1 e_{1j} = B \cdot \theta_j$, where $e_{1j}$ is
defined similarly as $e_{0j}$, and $p_1 e_{1j} \equiv (p_1(\omega_1)e_{1j}(\omega_1), \cdots, p_1(\omega_d)e_{1j}(\omega_d))'$. The budget constraints are
then,
$$p_0 e_{0j} = -S\theta_j, \qquad p_1 e_{1j} = B\theta_j.$$
Now suppose that markets are complete, i.e., a = d and B can be inverted. The second constraint
is then: $\theta_j = B^{-1} p_1 e_{1j}$. Consider without loss of generality Arrow securities, or B = I. We have
$\theta_j = p_1 e_{1j}$, and by replacing into the first constraint,
$$\begin{aligned} 0 &= p_0 e_{0j} + S\theta_j \\ &= p_0 e_{0j} + S p_1 e_{1j} \\ &= p_0 e_{0j} + S \cdot (p_1(\omega_1)e_{1j}(\omega_1), \cdots, p_1(\omega_d)e_{1j}(\omega_d))' \\ &= p_0 e_{0j} + \sum_{i=1}^{d} S_i \cdot p_1(\omega_i) e_{1j}(\omega_i) \\ &= \sum_{h=1}^{m_1} p_0^{(h)} e_{0j}^{(h)} + \sum_{i=1}^{d} \sum_{\ell=1}^{m_2} S_i \cdot p_1^{(\ell)}(\omega_i) e_{1j}^{(\ell)}(\omega_i) \\ &= \sum_{h=1}^{m_1} p_0^{(h)} e_{0j}^{(h)} + \sum_{i=1}^{d} \sum_{\ell=1}^{m_2} \tilde{p}_1^{(\ell)}(\omega_i) e_{1j}^{(\ell)}(\omega_i), \end{aligned}$$
where $\tilde{p}_1^{(\ell)}(\omega_i) \equiv S_i \cdot p_1^{(\ell)}(\omega_i)$. The price to be paid today to obtain a good ℓ in state i is equal
to the price of an Arrow asset written for state i multiplied by the spot price $p_1^{(\ell)}(\omega_i)$ of this good in this
state; here the Arrow-Debreu state price is $\tilde{p}_1^{(\ell)}(\omega_i)$. The general equilibrium can be analyzed by making
reference to such state prices. From now on, we simplify and set $m_1 = m_2 \equiv m$. Then we are left with
determining m(d + 1) equilibrium prices, i.e. $p_0 = (p_0^{(1)}, \cdots, p_0^{(m)})$, $\tilde{p}_1(\omega_1) = (\tilde{p}_1^{(1)}(\omega_1), \cdots, \tilde{p}_1^{(m)}(\omega_1))$,
$\cdots$, $\tilde{p}_1(\omega_d) = (\tilde{p}_1^{(1)}(\omega_d), \cdots, \tilde{p}_1^{(m)}(\omega_d))$. By exactly the same arguments of the previous chapter, there
exists one degree of indeterminacy. Therefore, there are only m(d + 1) − 1 relations that can determine
the m(d + 1) prices. (Price normalization can be done by letting one of the first period commodities
be the numéraire.) On the other hand, in the initial economy we have to determine m(d + 1) + d prices
$(\hat{p}, \hat{S}) \in \mathbb{R}^{m \cdot (d+1)}_{++} \times \mathbb{R}^{d}_{++}$ which are the solution to the system:
$$\sum_{j=1}^{n} e_{0j}(\hat{p}, \hat{S}) = 0, \qquad \sum_{j=1}^{n} e_{1j}(\hat{p}, \hat{S}) = 0, \qquad \sum_{j=1}^{n} \theta_j(\hat{p}, \hat{S}) = 0,$$
where the previous functions are obtained as solutions to the agents’ programs. When we solve for
Arrow-Debreu prices, in a second step we have to determine m(d + 1) + d prices starting from the
knowledge of m(d + 1) − 1 relations defining the Arrow-Debreu prices, which implies a price
indeterminacy of the initial economy equal to d + 1. In fact, it is possible to show that the degree of
indeterminacy is only d − 1.
References
Arrow, K. J. (1953): “Le rôle des valeurs boursières pour la répartition la meilleure des
risques.” Econométrie 41-48. CNRS, Paris. Translated and reprinted in 1964: “The Role
of Securities in the Optimal Allocation of Risk-Bearing.” Review of Economic Studies 31,
91-96.
Debreu, G. (1954): “Valuation Equilibrium and Pareto Optimum.” Proceedings of the National
Academy of Sciences 40, 588-592.
Debreu, G. (1959): Theory of Value: An Axiomatic Analysis of Economic Equilibrium. New
Haven: Yale University Press.
Duffie, D. (2001): Dynamic Asset Pricing Theory. Princeton: Princeton University Press.
Duffie, D. and W. Shafer (1985): “Equilibrium in Incomplete Markets: I. A Basic Model of
Generic Existence.” Journal of Mathematical Economics 13, 285-300.
Hart, O. (1974): “On the Existence of Equilibrium in a Securities Model.” Journal of Economic
Theory 9, 293-311.
He, H. and N. Pearson (1991): “Consumption and Portfolio Policies with Incomplete Markets
and Short-Sales Constraints: The Infinite Dimensional Case.” Journal of Economic Theory
54, 259-304.
3
Infinite horizon economies
3.1 Introduction
We study asset prices in multiperiod economies, where agents either live forever, and have access
to a set of complete markets, or belong to overlapping generations. We consider models without
and with production, without and with money, and develop the fundamental tools we need in
subsequent chapters, to analyze financial frictions, bubbles and sunspots in capital markets.
3.2 Consumption-based asset evaluation

3.2.1 Recursive plans: introduction
We consider a simple, benchmark case, arising in the absence of any risks for a decision maker.
Consider an agent endowed with initial wealth equal to $w_0$, who solves the following problem:
$$V(w_0) \equiv \max_{(c_t)_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c_t) \quad \text{s.t.}\quad w_{t+1} = (w_t - c_t)R_{t+1}, \quad (R_t)_{t=0}^{\infty} \text{ given} \qquad \text{[3.P1]}$$
The previous problem can be reformulated in a recursive format:
$$V(w_t) = \max_{c_t} \left[u(c_t) + \beta V(w_{t+1})\right] \quad \text{s.t.}\quad w_{t+1} = (w_t - c_t)R_{t+1}. \qquad (3.1)$$
By replacing the wealth constraint into the maximand, it is easily checked that the first-order
condition for c leads to $u'(c_t) = \beta V'(w_{t+1})R_{t+1}$. Therefore, the consumption policy is a function
of both wealth and the interest rate, which for the sake of simplicity we denote as $c(w_t)$. The value
function and the first-order condition, then, can be written as:
V (wt ) = u (c(wt )) + βV ((wt − c (wt )) Rt+1 ) , u′ (c (wt )) = βV ′ ((wt − c (wt )) Rt+1 ) Rt+1 .
By differentiating the value function, and using the first-order condition,
V ′ (wt ) = u′ (c (wt )) c′ (wt ) + βV ′ ((wt − c (wt )) Rt+1 ) (1 − c′ (wt )) Rt+1 = u′ (c (wt )) .

Therefore, $V'(w_{t+1}) = u'(c(w_{t+1}))$ too, and by substituting back into the first-order condition,
$$\beta \frac{u'(c(w_{t+1}))}{u'(c(w_t))} = \frac{1}{R_{t+1}}. \qquad (3.2)$$
The economic intuition underlying Eq. (3.2) is the same as that we saw in the two-period
economy analyzed in Chapter 2. Eq. (3.2) says that the present consumption I give up, at t,
to obtain one additional unit of consumption at t + 1, has to equal the price of a pure discount bond issued at t and
expiring the next period, along an optimal consumption path.
We can arrive at the very same conclusions, following an alternative approach, based on
Lagrange multipliers. This approach is useful when dealing with more intricate issues relating
to production economies or economies with financial frictions, as we shall see in this and further
chapters. So consider the constraint in program [3.P1]. Savings at time t are $sav_t \equiv w_t - c_t$.
Using this definition, the constraint in [3.P1] is: $c_{t+1} + sav_{t+1} = R_{t+1}\,sav_t$, with $sav_{-1} = w_0$,
given. Consider the program,
$$L(sav_{-1}) \equiv \max_{(c_t, sav_t)_{t=0}^{\infty}} \sum_{t=0}^{\infty} \left[\beta^t u(c_t) - \lambda_t \left(c_t + sav_t - R_t\,sav_{t-1}\right)\right],$$
where $\lambda_t$ is a sequence of Lagrange multipliers associated to these constraints. The first-order condition for consumption $c_t$ is
$\beta^t u'(c_t) = \lambda_t$, and the first-order condition for savings $sav_t$ leads to: $\lambda_t = \lambda_{t+1}R_{t+1}$. Putting all
together yields precisely Eq. (3.2). Note that the same program can be cast, and solved, in a
recursive format,
$$L(sav_{t-1}) = \max_{c_t, sav_t, \lambda_t} \left[u(c_t) - \lambda_t \left(c_t + sav_t - R_t\,sav_{t-1}\right) + \beta L(sav_t)\right].$$
The first-order conditions for consumption and savings are $u'(c_t) = \lambda_t$ and $\lambda_t = \beta L'(sav_t)$,
respectively. Replacing the first-order condition for $\lambda_t$, i.e. the budget constraint, and
differentiating $L(sav_{t-1})$, leaves $L'(sav_{t-1}) = \lambda_t R_t = \beta L'(sav_t) R_t$. These conditions lead to Eq. (3.2).
As a simple example, consider the case of a logarithmic utility function, $u(c) = \ln c$. Let us
guess that the value function is $V(w_t) \equiv V(w_t; R_t) = a_t + b \ln w_t$. The first-order condition,
together with $V'(w) = u'(c(w))$, then yields $c(w) = b^{-1}w$. By Eq. (3.2), then, $w_{t+1} = \beta w_t R_{t+1}$. Comparing the right hand
side of this equation with the right hand side of the constraint in the program [3.P1], leaves
$c(w_t) = (1 - \beta)w_t$; in other terms, $b = (1 - \beta)^{-1}$.¹
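The logarithmic example lends itself to a quick numerical check. The sketch below is only an illustration: the discount factor, the return sequence and the initial wealth are arbitrary assumed values. It simulates wealth under the policy that consumes the fraction 1 − β of wealth and verifies that the resulting path satisfies both the Euler equation for log utility and the wealth recursion with growth factor β times the gross return.

```python
beta = 0.95                        # assumed discount factor
R = [1.02, 1.05, 0.99, 1.03]       # assumed gross returns R_1, ..., R_4
w0 = 10.0                          # assumed initial wealth

wealth, consumption = [w0], []
for R_next in R:
    c = (1.0 - beta) * wealth[-1]                 # candidate policy c(w) = (1 - beta) w
    consumption.append(c)
    wealth.append((wealth[-1] - c) * R_next)      # wealth constraint of [3.P1]
consumption.append((1.0 - beta) * wealth[-1])

# Euler equation with u'(c) = 1/c: beta * c_t / c_{t+1} - 1 / R_{t+1} should be zero
euler_gaps = [beta * consumption[t] / consumption[t + 1] - 1.0 / R[t]
              for t in range(len(R))]
# Implied wealth dynamics: w_{t+1} - beta * w_t * R_{t+1} should be zero
wealth_gaps = [wealth[t + 1] - beta * wealth[t] * R[t] for t in range(len(R))]

print(max(abs(g) for g in euler_gaps), max(abs(g) for g in wealth_gaps))
```

Both gaps are zero up to floating-point rounding for any positive return sequence, which reflects the fact that the log-utility consumption rule does not depend on future returns.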
Next, we introduce uncertainty.
3.2.2 The marginalist argument

Consider the following thought experiment. At time t, I give up a small quantity of
consumption equal to $\Delta c_t$. The reduction in the (current) utility is, then, equal to $\beta^t u'(c_t)\Delta c_t$.
But by investing $\Delta c_t$ in a safe asset, I can have access to $\Delta c_{t+1} = R_{t+1}\Delta c_t$ additional units
of consumption at time t + 1. These additional consumption units lead to an expected utility
gain equal to $\beta^{t+1}E(u'(c_{t+1})\Delta c_{t+1})$. If $c_t$ and $c_{t+1}$ are part of an optimal consumption plan,
I should be left with no incentives to implement these intertemporal consumption transfers.
Therefore, along an optimal consumption plan, any reductions and gains in the welfare of the
type considered above need to be identical:
$$u'(c_t) = \beta E\left(u'(c_{t+1})R_{t+1}\right).$$
This relation generalizes Eq. (3.2). Next, suppose that at time t, $\Delta c_t$ can be invested in a risky
asset whose price is $S_t$. I can buy $\Delta c_t / S_t$ units of this asset. Come time t + 1, I could sell the
asset for $S_{t+1}$, pocket its dividend $D_{t+1}$, if any, and finance additional units of consumption equal
to $\Delta c_{t+1} = (\Delta c_t / S_t) \cdot (S_{t+1} + D_{t+1})$. The reduction in the current utility is $\beta^t u'(c_t)\Delta c_t$, and
the boost in the expected utility at time t + 1 is $\beta^{t+1}E(u'(c_{t+1})\Delta c_{t+1})$. Again, if I am following
an optimal consumption policy, the incentives for this kind of intertemporal transfers should
not exist. Therefore, the celebrated Lucas asset pricing equation holds:
$$u'(c_t) = \beta E\left[u'(c_{t+1})\frac{S_{t+1} + D_{t+1}}{S_t}\right]. \qquad (3.3)$$
We now derive Eq. (3.3) through dynamic programming methods, which are essential, once we
wish to work through more complex models such as those including financial frictions.

¹ To pin down the coefficient series $a_t$, use the definition of the value function, $V(w_t; R_t) \equiv u(c(w_t)) + \beta V(w_{t+1}; R_{t+1})$. Plugging
$V(w; R_t) = a_t + b \ln w$ and $c(w) = (1 - \beta)w$ into this definition leaves $a_t = \ln(1 - \beta) + \beta a_{t+1} + \frac{\beta}{1-\beta}\ln(\beta R_{t+1})$. If
R is constant, $a_t$ is also constant, and equal to $\left(\ln(1 - \beta) + \frac{\beta}{1-\beta}\ln(\beta R)\right)/(1 - \beta)$.
3.2.3 Lucas’ model

3.2.3.1 The optimality condition
We consider markets for m “trees,” and assume that the only source of risk stems from the
dividends related to these trees: D = (D1 , · · ·, Dm ). We assume D is a Markov process and
denote its conditional distribution function with P ( Dt+1 | Dt ). A representative agent solves
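the following program:
$$V(\theta_t) = \max_{(c_{t+i}, \theta_{t+i})_{i=0}^{\infty}} E\left[\left.\sum_{i=0}^{\infty} \beta^i u(c_{t+i})\,\right|\,\mathcal{F}_t\right] \quad \text{s.t.}\quad c_t + S_t\theta_{t+1} = (S_t + D_t)\theta_t \qquad \text{[3.P2]}$$
where $\theta_{t+1} \in \mathbb{R}^m$ is $\mathcal{F}_t$-measurable, that is, $\theta_{t+1}$ needs to be chosen at time t. We can solve the
program [3.P2], using the same recursive approach in Section 3.2.1, once due account is made
of uncertainty. The Bellman equation is:
$$V(\theta_t, D_t) = \max_{c_t, \theta_{t+1}} E\left[\left.u(c_t) + \beta V(\theta_{t+1}, D_{t+1})\,\right|\,\mathcal{F}_t\right] \quad \text{s.t.}\quad c_t + S_t\theta_{t+1} = (S_t + D_t)\theta_t.$$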
Similarly as we did for Eq. (3.1), let us replace the budget constraint into the maximand. The
following first-order condition holds for $\theta_i$:
$$0 = E\left[-u'\left((S_t + D_t)\theta_t - S_t\theta_{t+1}\right)S_{i,t} + \beta V_{1i}(\theta_{t+1}, D_{t+1})\right], \qquad (3.4)$$
where the subscript in the value function on the right hand side denotes a partial derivative:
$V_{1i}(\theta, D) = \partial V(\theta, D)/\partial\theta_i$. The optimal policy, $\theta_{t+1}$, is a function of the current state, $(\theta_t, D_t)$,
say $\theta_{t+1} = T(\theta_t, D_t)$. Differentiating the value function with respect to $\theta_i$, and using the
previous first-order condition, leaves:
$$\begin{aligned} V_{1i}(\theta_t, D_t) &= E\left[u'(c_t)\left(S_{i,t} + D_{i,t} - \sum_{j=1}^{m} S_{j,t}T_{1ji}(\theta_t, D_t)\right) + \beta\sum_{j=1}^{m} V_{1j}(\theta_{t+1}, D_{t+1})T_{1ji}(\theta_t, D_t)\right] \\ &= u'(c_t)(S_{i,t} + D_{i,t}), \end{aligned}$$
where we have defined $T_{1ji}(\theta_t, D_t) = \partial T_j(\theta, D)/\partial\theta_i$ and $T_j$ is the j-th component of the vector
T. Substituting this result into Eq. (3.4) yields precisely the Lucas equation (3.3), holding for
each asset i:
$$u'(c_t) = \beta E\left[u'(c_{t+1})\frac{S_{i,t+1} + D_{i,t+1}}{S_{i,t}}\right]. \qquad (3.5)$$
3.2.3.2 Rational expectations equilibrium
The asset market clears when for each t, $\theta_t = 1_m$ and $\theta_t^{(0)} = 0$, where $\theta^{(0)}$ denotes the amount
of the riskless asset. By the budget constraint, then, the market for goods also clears, $c_t =
\sum_{i=1}^{m} D_{it} \equiv \bar{D}_t$. A rational expectation equilibrium is a sequence of asset prices $(S_t)_{t=0}^{\infty}$ such that
the optimality condition in Eq. (3.5) holds, the markets clear, $c_t = \bar{D}_t$, and each asset price is
a function of the state, $S_{i,t} = S_i(D_t)$ say. All in all,
$$u'(\bar{D}_t)S_i(D_t) = \beta\int u'(\bar{D}_{t+1})\left[S_i(D_{t+1}) + D_{i,t+1}\right]dP(D_{t+1} \mid D_t). \qquad (3.6)$$
This is a functional equation in $S_i(\cdot)$. Let us focus, first, on the IID case: $P(D_{t+1} \mid D_t) =
P(D_{t+1})$.
IID shocks
Eq. (3.6) simplifies to:
$$u'(\bar{D}_t)S_i(D_t) = \beta\int u'(\bar{D}_{t+1})\left[S_i(D_{t+1}) + D_{i,t+1}\right]dP(D_{t+1}).$$
Note that the right hand side of this equation is independent of $D_t$. Therefore, $u'(\bar{D}_t)S_i(D_t)$
equals some constant $\kappa_i$ (say), which we can easily find by substituting it back into the previous
equation, leaving:
$$\kappa_i = \frac{\beta}{1 - \beta}\int u'(\bar{D}_{t+1})D_{i,t+1}\,dP(D_{t+1}).$$
The solution for $S_i(D)$ is then:
$$S_i(D_t) = \frac{\kappa_i}{u'(\bar{D}_t)}.$$
Note, the elasticity of the price to dividend equals $-\frac{u''(\bar{D})}{u'(\bar{D})}D_i$, which collapses to relative
risk-aversion, once we assume only one tree exists, as it is customary. For example, if relative
risk-aversion is constant and equal to η,
$$S(D_t) = \kappa \cdot D_t^{\eta}, \qquad \kappa \equiv \frac{\beta}{1 - \beta}\int D^{1-\eta}\,dP(D).$$
Figure 3.1 depicts the behavior of the asset price function S(D), under the assumption that
κ is not increasing in η.
Only when the representative agents are risk-neutral, η = 0, does the asset price collapse to
the constant $\beta(1 - \beta)^{-1}E(D)$.
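The closed-form price in the IID case with constant relative risk-aversion can be verified directly against the pricing equation. The sketch below is an illustration under assumed inputs: a three-point dividend distribution, β = 0.95, and a hypothetical helper `price_function` that is not part of the original text. It checks that the product of marginal utility and price is the same constant κ in every state, and that the price is flat when η = 0.

```python
import numpy as np

beta = 0.95                                   # assumed discount factor
D = np.array([0.8, 1.0, 1.2])                 # assumed dividend support (single tree)
prob = np.array([0.25, 0.5, 0.25])            # assumed IID probabilities

def price_function(eta):
    """Return (kappa, S) with S(D) = kappa * D**eta for CRRA coefficient eta."""
    kappa = beta / (1.0 - beta) * np.sum(prob * D ** (1.0 - eta))
    return kappa, kappa * D ** eta

eta = 2.0
kappa, S = price_function(eta)
marginal_utility = D ** (-eta)                # u'(D) under CRRA utility

lhs = marginal_utility * S                    # u'(D_t) S(D_t), state by state
rhs = beta * np.sum(prob * marginal_utility * (S + D))   # does not depend on D_t

kappa_rn, S_rn = price_function(0.0)          # risk-neutral benchmark
print(lhs, rhs, S_rn)
```

In the risk-neutral case the price vector is constant across states and equal to β/(1 − β) times the expected dividend, as stated in the text.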
FIGURE 3.1. The asset pricing function $S(D_t)$ in the IID case and constant relative risk-aversion,
equal to η. [Plot of $S(D_t)$ against $D_t$ for 0 < η < 1, η = 1 and η > 1, with the level $\beta(1 - \beta)^{-1}$ and the point $D_t = 1$ marked on the axes.]
Dependent shocks
Define $g_i(D) \equiv u'(\bar{D})S_i(D)$ and $h_i(D) \equiv \beta\int u'(\bar{D}_{t+1})D_{i,t+1}\,dP(D_{t+1} \mid D)$. In terms of these new
functions, Eq. (3.6) is:
$$g_i(D) = h_i(D) + \beta\int g_i(D_{t+1})\,dP(D_{t+1} \mid D).$$
It is a functional equation in $g_i$, which admits a unique solution, under the
conditions contained in the celebrated Blackwell’s theorem below:
Theorem 3.1. Let B(X) be the Banach space of continuous bounded real functions on $X \subseteq \mathbb{R}^n$
endowed with the norm $\|f\| = \sup_X |f|$, $f \in B(X)$. Introduce an operator $T : B(X) \to B(X)$
with the following properties:
(i) T is monotone: $\forall x \in X$ and $f_1, f_2 \in B(X)$, $f_1(x) \le f_2(x) \Rightarrow T[f_1](x) \le T[f_2](x)$;
(ii) $\exists\beta \in (0, 1)$ such that $\forall x \in X$, $f \in B(X)$ and $c \ge 0$: $T[f + c](x) \le T[f](x) + \beta c$.
Then, T is a β-contraction and, $\forall f_0 \in B(X)$, it has a unique fixed point $\lim_{\tau\to\infty}T^{\tau}[f_0] = f =
T[f]$.
So let us introduce the following operator:
$$T[g_i](D) = h_i(D) + \beta\int g_i(D')\,dP(D' \mid D).$$
The existence of $g_i$ and, hence, $S_i$, relies on the existence of a fixed point of T: $g_i = T[g_i]$.
It is easily checked that conditions (i) and (ii) in Theorem 3.1 hold here. To establish that
$T : B(D) \to B(D)$ as well, it is sufficient to show that $h_i \in B(D)$. A sufficient condition given
by Lucas (1978) is that u is bounded, by a constant $\bar{u}$.²
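The fixed-point argument can be illustrated on a finite state space, where the integral becomes a matrix product. The following sketch is an assumption-laden example rather than the general proof: it uses a hypothetical two-state Markov chain for a single tree, CRRA marginal utility, and NumPy. It iterates the operator T from an arbitrary starting point and compares the limit with the closed-form solution of the linear system.

```python
import numpy as np

beta, eta = 0.95, 2.0                         # assumed parameters
D = np.array([0.9, 1.1])                      # assumed dividend states (single tree)
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])                    # assumed transition matrix, rows sum to 1
mu = D ** (-eta)                              # u'(D) under CRRA utility (assumption)

h = beta * P @ (mu * D)                       # h(D) = beta * E[u'(D') D' | D]

def T(g):
    """Operator T[g](D) = h(D) + beta * E[g(D') | D] on the finite state space."""
    return h + beta * P @ g

g = np.zeros_like(D)                          # arbitrary starting point f_0
for _ in range(2000):
    g_new = T(g)
    converged = np.max(np.abs(g_new - g)) < 1e-13
    g = g_new
    if converged:
        break

g_exact = np.linalg.solve(np.eye(D.size) - beta * P, h)   # solves g = h + beta P g
S = g / mu                                    # price function S(D) = g(D) / u'(D)

# Contraction property on two arbitrary functions g1 and g2
g1, g2 = np.array([1.0, 2.0]), np.array([3.0, -1.0])
contraction_ratio = np.max(np.abs(T(g1) - T(g2))) / np.max(np.abs(g1 - g2))
print(g, g_exact, S, contraction_ratio)
```

The ratio printed at the end is bounded above by β because each row of the transition matrix sums to one, which is the discrete counterpart of condition (ii).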
3.2.4 Arrow-Debreu state prices, the CCAPM and the CAPM

Let us consider the case of a single tree. We have the following consumption-based asset pricing
equation:
$$S_t = E_t\left[m_{t+1}(S_{t+1} + D_{t+1})\right], \qquad m_{t+1} \equiv \beta\frac{u'(D_{t+1})}{u'(D_t)}.$$
2 In this case, concavity of u implies that for each D, 0 = u (0) ≤ u (D) + u′ (D) (−D) ≤ ū − Du′ (D), which implies that
for each D, Du′ (D) ≤ ū and, hence, hi (D) ≤ β ū. Then, it is possible to show that the solution is in B(D), which implies that
T : B(D) → B(D).
By using the same arguments as those in Section 2.6 of the previous chapter, we can show that
the Radon-Nikodym derivative of the risk-neutral probability, $P^*$, with respect to P, is:
$$\frac{dP^*}{dP}(D_{t+1} \mid D_t) = \frac{u'(D_{t+1})}{E\left[u'(D_{t+1}) \mid D_t\right]}.$$
In the Lucas model, then, the Arrow-Debreu state-price density is:
$$d\tilde{P}^*(D_{t+1} \mid D_t) = dP^*(D_{t+1} \mid D_t)\,R_t^{-1}.$$
It is the price to pay, in state $D_t$, to obtain one unit of the good the next period in state $D_{t+1}$.
Finally, define the gross return $\tilde{R}$ as, $\tilde{R}_{t+1} \equiv \frac{S_{t+1} + D_{t+1}}{S_t}$. Then, all the considerations made in
Section 2.7 of the previous chapter, are also valid here.
3.3 Production: foundational issues

In the economy of the previous section, the asset “reward” is an exogenous datum. In this
chapter, we lay down the foundations for the analysis of production-based economies, where
firms maximize their value and set dividends endogenously. In these economies, production and
capital accumulation are endogenous. In this section, we review the foundational issues that
arise in economies with productive capital. In the next section, we develop the asset pricing
implications of these economies, in the absence of frictions. In Part II, we extend the framework
in this and the next section, and examine the asset price implications deriving from financial
frictions.
3.3.1 Decentralized economy

A continuum of identical firms in (0, 1) have access to capital and labor markets, and the
following technology: $(K, N) \mapsto Y(K, N)$, where $Y_i(K, N) > 0$, $Y_{ii}(K, N) < 0$, $\lim_{K\to 0^+}Y_1(K, N) =
\lim_{N\to 0^+}Y_2(K, N) = \infty$, $\lim_{K\to\infty}Y_1(K, N) = \lim_{N\to\infty}Y_2(K, N) = 0$, and subscripts denote
partial derivatives. We assume Y is homogeneous of degree one, i.e. $Y(\lambda K, \lambda N) = \lambda Y(K, N)$
for all λ > 0. Per capita production is $y(k) \equiv Y(K/N, 1)$, where $k \equiv K/N$ is per-capita
capital. Population growth can be non-zero, i.e. N satisfies $N_t/N_{t-1} = (1 + n)$. Firms purchase
capital and labor at prices $R = Y_1(K, N)$ and $w = Y_2(K, N)$. We have,
$$R = y'(k), \qquad w = y(k) - ky'(k).$$
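The per-capita expressions for factor prices follow from homogeneity of degree one, and they can be confirmed numerically for a specific technology. The sketch below assumes a Cobb-Douglas production function with capital share 0.3, which is an illustrative choice not made in the text, and uses central finite differences through a hypothetical helper `derivative`. It also checks that factor payments exhaust output.

```python
def Y(K, N, alpha=0.3):
    """Cobb-Douglas technology (an assumed functional form), homogeneous of degree one."""
    return K ** alpha * N ** (1.0 - alpha)

def y(k, alpha=0.3):
    """Per-capita production y(k) = Y(K/N, 1)."""
    return Y(k, 1.0, alpha)

def derivative(f, x, step=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + step) - f(x - step)) / (2.0 * step)

K, N = 4.0, 2.0                                # assumed input levels
k = K / N

R = derivative(lambda x: Y(x, N), K)           # R = Y_1(K, N)
w = derivative(lambda x: Y(K, x), N)           # w = Y_2(K, N)
R_per_capita = derivative(y, k)                # y'(k)
w_per_capita = y(k) - k * R_per_capita         # y(k) - k y'(k)

print(R, R_per_capita, w, w_per_capita, R * K + w * N - Y(K, N))
```

The last printed number is the gap between total factor payments and output, which is zero up to the finite-difference error by Euler's theorem for homogeneous functions.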
The $N_t$ consumers live forever. We assume each consumer offers inelastically one unit of labor,
and, for now, that $N_0 = 1$ and n = 0. The resource constraint for the consumer is:
$$c_t + s_t = R_t s_{t-1} + w_t N_t, \qquad N_t \equiv 1, \quad t = 1, 2, \cdots. \qquad (3.7)$$
At each time t − 1, the consumer saves st−1 units of capital, which he lends to the firm. At time
t, the consumer receives the gross return on savings from the firm, Rt st−1 , where Rt = y ′ (kt ),
plus the wage receipts wt Nt . Then, he uses these resources to consume ct and lend st to the
firm. At time zero,
c0 + s0 = V0 ≡ Y1 (K0 , N0 )K0 + w0 N0 , N0 ≡ 1.
Following the approach developed in Chapter 2, we can write down a single budget constraint,
obtained iterating Eq. (3.7):
$$0 = c_0 + \sum_{t=1}^{T}\frac{c_t - w_t N_t}{\prod_{i=1}^{t}R_i} + \frac{s_T}{\prod_{i=1}^{T}R_i} - V_0,$$
and imposing the transversality condition:
$$\lim_{T\to\infty} s_T\prod_{i=1}^{T}R_i^{-1} = 0, \qquad (3.8)$$
so as to have:
$$\max_{(c_t)_{t=0}^{\infty}}\sum_{t=0}^{\infty}\beta^t u(c_t), \quad \text{s.t.}\quad V_0 = c_0 + \sum_{t=1}^{\infty}\frac{c_t - w_t N_t}{\prod_{i=1}^{t}R_i}. \qquad \text{[3.P3]}$$
The economic interpretation of the transversality condition (3.8) is the following. The
first-order conditions of the program [3.P3] are:
$$\beta^t u'(c_t) = l\,\frac{1}{\prod_{i=1}^{t}R_i}, \qquad (3.9)$$
where l is a Lagrange multiplier. In equilibrium, current savings equal next period capital, or
$k_{t+1} = s_t$. Therefore, Eq. (3.8) is:
$$\lim_{T\to\infty}\beta^T u'(c_T)k_{T+1} = 0. \qquad (3.10)$$
That is, the economic value of capital is capital weighted by discounted marginal utility, which
needs to be zero, eventually.
The first-order condition (3.9) leads to the usual optimality condition in Eq. (3.2), where this
time, $R_{t+1} = y'(k_{t+1})$. In this economy, an equilibrium is a sequence $((\hat{c}, \hat{k})_t)_{t=0}^{\infty}$ satisfying
$$\begin{cases} k_{t+1} = y(k_t) - c_t \\[4pt] \beta\dfrac{u'(c_{t+1})}{u'(c_t)} = \dfrac{1}{y'(k_{t+1})} \end{cases} \qquad (3.11)$$
and the transversality condition in Eq. (3.10). The first equation in this system is simply this:
capital available for producing the next period, $k_{t+1}$, is equal to savings, $s_t \equiv y(k_t) - c_t$.
3.3.2 Centralized economy

The market solution in (3.11) can be implemented by a social planner, who solves the following
program:
$$V(k_0) \equiv \max_{(c_t, k_{t+1})_{t=0}^{\infty}}\sum_{t=0}^{\infty}\beta^t u(c_t) \quad \text{s.t.}\quad k_{t+1} = y(k_t) - c_t, \quad k_0 \text{ given} \qquad \text{[3.P4]}$$
under the further transversality condition in Eq. (3.10).
The program in [3.P4] is easily solved. Replacing the constraint into the utility
function, and taking derivatives with respect to $k_t$, leads directly to the second equation in (3.11).
Alternatively, let us introduce the Lagrangian,
$$L(k_0) = \max_{(c_t, k_{t+1})_{t=0}^{\infty}}\sum_{t=0}^{\infty}\left[\beta^t u(c_t) - \lambda_t\left(k_{t+1} - y(k_t) + c_t\right)\right].$$
The first-order condition with respect to consumption is $\lambda_t = \beta^t u'(c_t)$, and the condition for
capital is $\lambda_{t-1} = \lambda_t y'(k_t)$. Putting these conditions together leads to the second equation in
(3.11). The same argument can be made, following a recursive approach. We have:
$$L(k_t) = \max_{c_t, k_{t+1}, \lambda_t}\left[u(c_t) - \lambda_t\left(k_{t+1} - y(k_t) + c_t\right) + \beta L(k_{t+1})\right].$$
The first-order condition for consumption is $\lambda_t = u'(c_t)$, and that for capital is $\lambda_t = \beta L'(k_{t+1})$.
Replacing the first-order condition for $\lambda_t$ (i.e., the constraint in program [3.P4]), and
differentiating with respect to $k_t$, yields $L'(k_t) = \lambda_t y'(k_t) = \beta L'(k_{t+1})y'(k_t)$. These three conditions lead,
again, to the second equation in (3.11).
Finally, consider the Bellman equation:
$$V(k_t) = \max_{c_t}\left[u(c_t) + \beta V(k_{t+1})\right], \quad \text{s.t.}\quad k_{t+1} = y(k_t) - c_t.$$
The first-order condition leads to $u'(c_t) = \beta V'(y(k_t) - c_t)$. Let us denote the policy with
$c_t = c(k_t)$. In terms of the policy function c, the value function and the first-order conditions
are:
$$V(k_t) = u(c(k_t)) + \beta V(y(k_t) - c(k_t)), \qquad u'(c(k_t)) = \beta V'(y(k_t) - c(k_t)).$$
By differentiating the value function:
$$V'(k_t) = u'(c(k_t))c'(k_t) + \beta V'(y(k_t) - c(k_t))\left(y'(k_t) - c'(k_t)\right) = u'(c(k_t))y'(k_t).$$
By replacing back into the first-order condition, we obtain the second equation in (3.11).
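The Bellman equation can also be solved numerically by iterating on the value function over a discrete capital grid. The sketch below is a rough illustration under assumptions that are not in this section: logarithmic utility, a power production function with exponent 0.3, β = 0.95, and a 400-point grid. Under these functional forms the optimal next-period capital is known in closed form to be βγ times current output, so the numerical policy can be compared against it.

```python
import numpy as np

beta, gamma = 0.95, 0.3                        # assumed parameter values
grid = np.linspace(0.02, 0.40, 400)            # capital grid around the steady state

# consumption[i, j] = y(k_i) - k_j: consumption when moving from k_i to k_j
consumption = grid[:, None] ** gamma - grid[None, :]
utility = np.where(consumption > 0.0,
                   np.log(np.maximum(consumption, 1e-300)),
                   -np.inf)                    # infeasible choices get -infinity

V = np.zeros(grid.size)
for _ in range(1000):
    V_new = np.max(utility + beta * V[None, :], axis=1)   # Bellman operator
    converged = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if converged:
        break

policy_index = np.argmax(utility + beta * V[None, :], axis=1)
k_next = grid[policy_index]                    # numerical policy k_{t+1}(k_t)
k_next_exact = beta * gamma * grid ** gamma    # closed form under log utility
max_policy_error = np.max(np.abs(k_next - k_next_exact))

print(max_policy_error)
```

The discrepancy is of the order of the grid spacing, since the numerical policy is restricted to grid points; a finer grid or interpolation would reduce it.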
3.3.3 Dynamics
We study the dynamics of the system in (3.11) in a small neighborhood of the stationary state,
defined as the pair (c, k), solution to:
$$c = y(k) - k, \qquad \beta = \frac{1}{y'(k)}.$$
A first-order expansion of each equation in (3.11) around its stationary state, yields the
following linear system:
$$\begin{pmatrix}\hat{k}_{t+1} \\ \hat{c}_{t+1}\end{pmatrix} = A\begin{pmatrix}\hat{k}_t \\ \hat{c}_t\end{pmatrix}, \qquad A \equiv \begin{pmatrix} y'(k) & -1 \\[4pt] -\dfrac{u'(c)}{u''(c)}y''(k) & 1 + \beta\dfrac{u'(c)}{u''(c)}y''(k)\end{pmatrix}, \qquad (3.12)$$
where $\hat{k}_t \equiv k_t - k$ and $\hat{c}_t \equiv c_t - c$ denote deviations from the stationary state.
The solution to this system is obtained with the tools reviewed in Appendix 1 of this chapter.
It is:
$$\hat{k}_t = v_{11}\kappa_1\lambda_1^t + v_{12}\kappa_2\lambda_2^t, \qquad \hat{c}_t = v_{21}\kappa_1\lambda_1^t + v_{22}\kappa_2\lambda_2^t, \qquad (3.13)$$
where: $\kappa_i$ are constants that depend on the initial state, $\lambda_i$ are the eigenvalues of A, and $\binom{v_{11}}{v_{21}}$,
$\binom{v_{12}}{v_{22}}$ are the eigenvectors associated with $\lambda_i$. In Appendix 1, we show that $\lambda_1 \in (0, 1)$ and
$\lambda_2 > 1$. The proof we provide in the appendix is important, as it illustrates precisely how the
neoclassical model reviewed in this section, needs to be modified to induce indeterminacy in
the dynamics of capital and consumption. A critical step in that proof relies on the assumption
of diminishing returns, i.e. $y''(k) < 0$.
Let us return to the equations in (3.13). First, we need to rule out an explosive behavior
of $\hat{k}_t$ and $\hat{c}_t$, for otherwise we would contradict (i) that (c, k) is a stationary point, and (ii)
the optimality of the trajectories. Since $\lambda_2 > 1$, the only possibility is to “lock” the initial
state $(\hat{k}_0, \hat{c}_0)$ in such a way that $\kappa_2 = 0$, which yields the following set of initial conditions:
$\hat{k}_0 = v_{11}\kappa_1$ and $\hat{c}_0 = v_{21}\kappa_1$, or $\frac{\hat{c}_0}{\hat{k}_0} = \frac{v_{21}}{v_{11}}$.³ Therefore, the set of initial points that ensure a
non-explosive path must lie on the line $c_0 = c + \frac{v_{21}}{v_{11}}(k_0 - k)$. Since k is a predetermined variable,
there exists one, and only one, value of $c_0$, which ensures a non-explosive path of the system
around its steady state, as Figure 3.2 illustrates. In this figure, $k^*$ is defined as the solution of
$1 = y'(k^*) \Leftrightarrow k^* = (y')^{-1}[1]$, and $k = (y')^{-1}[\beta^{-1}]$.
FIGURE 3.2. [Plot in the $(k_t, c_t)$ plane showing the locus $c = y(k) - k$ and the saddlepath $c_0 = c + (v_{21}/v_{11})(k_0 - k)$, with $k_0$, $k$ and $k^*$ marked on the horizontal axis.]
The usual word of caution is in order. A linear approximation might turn out to be misleading.
We develop one example where the dynamics of the system could be quite different from those
analyzed here, when we start away from the stationary state. Let $y(k) = k^{\gamma}$, $u(c) = \ln c$. It is
easy to show that the exact solution is:
$$c_t = (1 - \beta\gamma)k_t^{\gamma}, \qquad k_{t+1} = \beta\gamma k_t^{\gamma}.$$
Figure 3.3 depicts the nonlinear manifold associated with this system, and its linear
approximation. For example, let β = 0.99 and γ = 0.3. Then, the (linear) saddlepath is, approximately,
$$c_t = c + 0.7101\,(k_t - k), \qquad \text{where: } c = (1 - \gamma\beta)k^{\gamma}, \quad k = (\gamma\beta)^{1/(1-\gamma)},$$
where $\hat{k}_t = \lambda_1\hat{k}_{t-1}$, and $\lambda_1 = 0.3$.
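The numbers in this example can be reproduced from the linearized system. The sketch below, which assumes NumPy and uses the parameter values quoted in the text, builds the matrix of the linear system for the power technology and logarithmic utility, extracts the eigenvalue inside the unit circle, and computes the slope of the saddlepath as the ratio of the two components of the associated eigenvector.

```python
import numpy as np

beta, gamma = 0.99, 0.3                          # values used in the text

k = (gamma * beta) ** (1.0 / (1.0 - gamma))      # stationary capital
c = (1.0 - gamma * beta) * k ** gamma            # stationary consumption

y1 = gamma * k ** (gamma - 1.0)                  # y'(k), equal to 1 / beta
y2 = gamma * (gamma - 1.0) * k ** (gamma - 2.0)  # y''(k), negative
ratio = -c                                       # u'(c) / u''(c) for u(c) = ln c

# Matrix of the linearized system in deviations from the stationary state
A = np.array([[y1, -1.0],
              [-ratio * y2, 1.0 + beta * ratio * y2]])

eigenvalues, eigenvectors = np.linalg.eig(A)
stable = np.argmin(np.abs(eigenvalues))          # root inside the unit circle
lambda_1 = eigenvalues[stable].real
slope = (eigenvectors[1, stable] / eigenvectors[0, stable]).real   # v21 / v11

print(round(lambda_1, 4), round(slope, 4))
```

In this special case the stable root coincides with γ and the slope equals 1/β − γ, which matches the derivative of the exact consumption policy at the stationary state, so the linear saddlepath is tangent to the nonlinear stable manifold there.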
³ In fact, Appendix 1 shows that the converse is also true, i.e. $\frac{\hat{c}_0}{\hat{k}_0} = \frac{v_{21}}{v_{11}} \Rightarrow \kappa_2 = 0$.
FIGURE 3.3. [Plot in the $(k_t, c_t)$ plane showing the nonlinear stable manifold, its linear approximation, and the steady state.]
3.3.4 Stochastic economies

“Real business cycle theory is the application of general equilibrium theory to the quan-
titative analysis of business cycle fluctuations.” Edward Prescott (1991, p. 3)
“The Kydland and Prescott model is a complete markets set-up, in which equilibrium and
optimal allocations are equivalent. When it was introduced, it seemed to many–myself
included–to be much too narrow a framework to be useful in thinking about cyclical
issues.” Robert Lucas (1994, p. 184)
In its simplest version, real business cycle theory is an extension of the neoclassical model
of Section 3.3.3, in which random productivity shocks are added. The engine of fluctuations,
then, comes from the real sphere of the economy. This approach is in contrast with the Lucas
approach of the 1970s, based on information and money, where fluctuations arise due to infor-
mation delays with which agents discover the nature of a shock (real or monetary). As further
reviewed in Chapter 9, the Lucas information-theoretic approach has been, instead, more suc-
cessful in inspiring work on the formation of asset prices, leading to the development of market
microstructure theory and, more generally, to information driven explanations of asset prices.
Despite the remarkable switch in the economic motivation, the paradigm underlying real
business cycle theory is the same as the information-based approach of Lucas, as it relies on
rational expectations: macroeconomic fluctuations and, then, as we shall explain, asset price
fluctuations, stem from the optimal response of the agents vis-à-vis exogenous shocks: agents
implement action plans that are state-contingent, i.e. they decide to consume, to work and to
invest according to the history of shocks as well as the present shocks they observe.
3.3.4.1 Basic model
We consider an economy with complete markets and no frictions, such that its equilibrium
allocations are Pareto-optimal. To characterize these allocations, we implement them through
the following program of a social planner:
$$ V(k_0, s_0) = \max_{(c_t)_{t=0}^{\infty}} E\left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right], \tag{3.14} $$
subject to a capital accumulation constraint, with capital depreciation. Let It denote new
investment. It is:
It = Kt+1 − (1 − δ) Kt . (3.15)
The capital available for production at time t, Kt , is chosen at time t − 1. During time t, a portion δKt of this capital is
lost, due to depreciation. Therefore, the productive system is left with (1 − δ) Kt units
of capital. The capital available for the next period, Kt+1 , equals the capital already in place, (1 − δ) Kt ,
plus new investment, It , which is exactly Eq. (3.15).
Next, normalize population to one, such that Kt = kt . The goods market clearing
condition is:
$$ \tilde{y}(k_t, s_t) = c_t + I_t, $$
where ỹ(kt , st ) is the production function, which is Ft -measurable, and s is the source of
randomness, the engine for random fluctuations of the endogenous variables. Substituting
Eq. (3.15) into the market clearing condition gives:
$$ k_{t+1} = \tilde{y}(k_t, s_t) - c_t + (1 - \delta)\, k_t. \tag{3.16} $$
So the planner maximizes the utility in Eq. (3.14), under the capital accumulation constraint
in Eq. (3.16).
We assume that ỹ (kt , st ) ≡ st y (kt ), where y is as in Section 3.2, and $(s_t)_{t=0}^{\infty}$ is the solution to:
$$ s_{t+1} = s_t^{\rho}\, \epsilon_{t+1}, \tag{3.17} $$

where ρ ∈ (0, 1), and $(\epsilon_t)_{t=0}^{\infty}$ is an IID sequence with support such that st ≥ 0. In this economy, every
asset is priced as in the Lucas model of the previous section. Therefore, the gross return on
savings, $s_{t+1} y'(k_{t+1}) + 1 - \delta$, satisfies:
$$ u'(c_t) = \beta E_t\left[ u'(c_{t+1}) \left( s_{t+1} y'(k_{t+1}) + 1 - \delta \right) \right]. \tag{3.18} $$
A rational expectations equilibrium is a stochastic process $(c_t, k_t)_{t=0}^{\infty}$ satisfying Eq. (3.16) and the
Euler equation in (3.18), for given k0 and s0 .
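A minimal numerical sketch of such an equilibrium follows. It is not part of the original notes and rests on assumptions made only for illustration: log utility, $y(k) = k^{\gamma}$, full depreciation (δ = 1), lognormal shocks, and the candidate policy $c_t = (1 - \beta\gamma) s_t k_t^{\gamma}$, the stochastic analogue of the exact solution of Section 3.3.3. Under this policy the term inside the expectation in (3.18) is already known at time t, so the Euler equation can be checked path by path.

```python
import numpy as np

# Illustrative parameters; delta = 1 (full depreciation) is an assumption of this sketch.
beta, gamma, rho, delta = 0.99, 0.3, 0.9, 1.0
T = 200
rng = np.random.default_rng(0)
eps = np.exp(0.01 * rng.standard_normal(T + 1))  # positive IID shocks (assumed lognormal)

s = np.empty(T + 1)
k = np.empty(T + 1)
c = np.empty(T + 1)
s[0], k[0] = 1.0, 0.1
for t in range(T + 1):
    c[t] = (1.0 - beta * gamma) * s[t] * k[t] ** gamma                     # candidate policy
    if t < T:
        k[t + 1] = s[t] * k[t] ** gamma - c[t] + (1.0 - delta) * k[t]      # Eq. (3.16)
        s[t + 1] = s[t] ** rho * eps[t + 1]                                # Eq. (3.17)

# Euler equation (3.18) with u(c) = ln c:
#   1/c_t = beta * E_t[ (1/c_{t+1}) * (s_{t+1} * y'(k_{t+1}) + 1 - delta) ].
# Under the candidate policy the bracketed term equals gamma / ((1 - beta*gamma) * k_{t+1}),
# which is known at time t, so no numerical expectation is needed.
inside = (1.0 / c[1:]) * (s[1:] * gamma * k[1:] ** (gamma - 1.0) + 1.0 - delta)
euler_residual = beta * inside * c[:-1] - 1.0
print(np.max(np.abs(euler_residual)))
```

The residual is zero up to rounding error, which confirms that the candidate policy satisfies Eqs. (3.16)-(3.18) in this special case. For δ < 1 no such closed form is available, which is why the text proceeds by linearization.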
We show the existence of a saddlepoint path for the linearized version of Eqs. (3.16)-(3.17)-
(3.18), which implies determinacy of the stochastic (linearized) equilibrium.⁴ We study the
behavior of (c, k, s)t in a neighborhood of ε ≡ E(εt ). Let (c, k, s) be consumption, capital and
productivity shock, corresponding to ε, obtained by replacing ε into Eqs. (3.16)-(3.17)-(3.18), and
assuming no uncertainty takes place:
$$ c = s\, y(k) - \delta k, \qquad s = \epsilon^{\frac{1}{1-\rho}}, \qquad \beta = \frac{1}{s\, y'(k) + 1 - \delta}. $$
A first-order approximation to Eqs. (3.16)-(3.17)-(3.18) around (k, c, s), leaves:
ẑt+1 = Φẑt + Rut+1 , (3.19)
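where we have defined $\hat{x}_t \equiv \frac{x_t - x}{x}$, and $\hat{z}_t = (\hat{k}_t, \hat{c}_t, \hat{s}_t)^{\top}$, $u_t = (u_{c,t}, u_{s,t})^{\top}$, $u_{c,t} = \hat{c}_t - E_{t-1}(\hat{c}_t)$,
$u_{s,t} = \hat{s}_t - E_{t-1}(\hat{s}_t) = \hat{\epsilon}_t$, and, finally,
$$ \Phi = \begin{pmatrix}
\beta^{-1} & -\frac{c}{k} & \frac{s\, y(k)}{k} \\
-\frac{u'(c)}{c\, u''(c)}\, s k\, y''(k) & 1 + \frac{\beta u'(c)}{u''(c)}\, s\, y''(k) & -\frac{\beta u'(c)}{c\, u''(c)}\, s \left( s\, y(k)\, y''(k) + \rho\, y'(k) \right) \\
0 & 0 & \rho
\end{pmatrix}, \qquad
R = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}. $$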
⁴ A stochastic equilibrium is the situation where there is a stationary measure (definition: $p(+) = \int \pi(+/-)\, dp(-)$, where π is
the transition measure) generating $(c_t, k_t)_{t=1}^{\infty}$.
Let us consider the characteristic equation:

$$ 0 = \det(\Phi - \lambda I) = (\rho - \lambda) \left[ \lambda^2 - \left( \beta^{-1} + 1 + \beta \frac{u'(c)}{u''(c)}\, s\, y''(k) \right) \lambda + \beta^{-1} \right]. $$
A solution is λ1 = ρ. By the same arguments produced for the deterministic case of Section
3.3.3 (see Appendix 1), one finds that λ2 ∈ (0, 1) and λ3 > 1.⁵ As for the deterministic case in
Section 3.3.3, we can diagonalize the system by rewriting Φ = P ΛP −1 , where Λ is a diagonal
matrix that has the eigenvalues of Φ on the diagonal, and P is a matrix of the eigenvectors
associated to the roots of Φ. The system in (3.19) is, then:
ŷt+1 = Λŷt + wt+1 , (3.20)
where ŷt ≡ P −1 ẑt and wt ≡ P −1 Rut . The third equation of this system is:
ŷ3,t+1 = λ3 ŷ3t + w3,t+1 , (3.21)
and ŷ3 explodes unless ŷ3t = 0 for all t, which is only possible when w3t = 0 for all t.⁶
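The eigenvalue structure can be illustrated numerically. The sketch below is not from the original notes: it assumes log utility, $y(k) = k^{\gamma}$, a mean shock normalized so that s = 1, and illustrative parameter values; `build_phi` is a hypothetical helper that fills in the matrix Φ of Eq. (3.19) at the steady state.

```python
import numpy as np

def build_phi(beta=0.99, gamma=0.3, delta=0.1, rho=0.9):
    """Matrix Phi of Eq. (3.19), assuming u(c) = ln c, y(k) = k**gamma and s = 1."""
    s = 1.0
    # Steady state: beta * (s * y'(k) + 1 - delta) = 1 and c = s * y(k) - delta * k.
    k = (gamma / (1.0 / beta - 1.0 + delta)) ** (1.0 / (1.0 - gamma))
    y = k ** gamma
    y1 = gamma * k ** (gamma - 1.0)                    # y'(k)
    y2 = gamma * (gamma - 1.0) * k ** (gamma - 2.0)    # y''(k) < 0
    c = s * y - delta * k
    u1, u2 = 1.0 / c, -1.0 / c ** 2                    # u'(c), u''(c) for log utility
    return np.array([
        [1.0 / beta, -c / k, s * y / k],
        [-u1 / (c * u2) * s * k * y2,
         1.0 + beta * u1 / u2 * s * y2,
         -beta * u1 / (c * u2) * s * (s * y * y2 + rho * y1)],
        [0.0, 0.0, rho],
    ])

phi = build_phi()
eigvals = np.sort(np.linalg.eigvals(phi).real)
n_stable = int(np.sum(np.abs(eigvals) < 1.0))
print(eigvals, n_stable)
```

In this parametrization one root equals ρ, a second lies in (0, 1) and the third exceeds one, so the convergent subspace has dimension two.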
The condition that ŷ3t ≡ 0 carries an interesting economic interpretation: it tells us that the
only sources of uncertainty in this system can stem from shocks to the fundamentals, or that
there can not be extraneous sources of noise, or “sunspots.” The reasons for this are easy to
explain. Let ŷt = P −1 ẑt ≡ Πẑt . We have:
0 = ŷ3t = π 31 k̂t + π 32 ĉt + π 33 ŝt . (3.22)
Eq. (3.22) shows that the three state variables, k̂t , ĉt and ŝt , are mutually linked through a
two-dimensional plane. This plane is the saddlepoint of the economy, where the state variables
do exhibit a stable behavior, and is formally defined as:
$$ S = \left\{ x \in \mathbb{R}^3 : \pi_{3.}\, x = 0 \right\}, \qquad \pi_{3.} = (\pi_{31}, \pi_{32}, \pi_{33}). $$
Furthermore, Eq. (3.22) implies that a linear relation exists between the two expectational
errors:
$$ \text{For all } t, \quad u_{c,t} = -\frac{\pi_{33}}{\pi_{32}}\, u_{s,t} \quad \text{(“no-sunspots”)}. \tag{3.23} $$
Eq. (3.23) is a “no-sunspots” condition: it says that the expectational error on consumption
cannot be independent of the expectational shock on the fundamentals of the economy, which in
this simple economy is the technology shock. In other words, the technology shock is the only
source of uncertainty we have assumed in this economy, and any remaining expectational
error must be perfectly correlated with the expectational shock in technology: there are
no sunspots.
The manifold S carries, mathematically, the same meaning as the stable relation depicted in
Figure 3.2, for the deterministic case. In this section, S is the convergent subspace, with dim(S) = 2,
⁵ The linearized model in this section has state variables expressed in growth rates. However, we can always reformulate this
model in terms of first differences, by pre- and post-multiplying Φ by appropriate normalizing matrices. As an example, if G is the
3 × 3 matrix that has 1/k, 1/c and 1/s on its diagonal, (3.19) can be written as: $E(z_{t+1} - z) = G^{-1} \Phi G \cdot (z_t - z)$, where zt = (kt , ct , st ),
and we would arrive at the same conclusions. It is tedious but easy to check that the model in this section collapses to that in
Section 3.3.3, once we set εt = 1, for each t, and s0 = 1.
⁶ In other words, Eq. (3.21) implies that $\hat{y}_{3t} = \lambda_3^{-T} E_t(\hat{y}_{3,t+T})$ for all T . Because λ3 > 1, this relation holds only when
ŷ3t = 0 for all t.
which is the number of roots with modulus less than one. In other words, in this economy with
two predetermined variables, k̂0 and ŝ0 , there exists one, and only one, value of ĉ0 in S, which
ensures stability, and is given by $\hat{c}_0 = -\frac{\pi_{31} \hat{k}_0 + \pi_{33} \hat{s}_0}{\pi_{32}}$. This reasoning generalizes the one we made
for the deterministic case in Section 3.3.3, and is generalized further in Appendix 1.
The solution to the linearized model can be computed by generalizing the reasoning for the
deterministic case. First, by Eq. (3.20), ŷ is:
$$ \hat{y}_{it} = \lambda_i^t\, \hat{y}_{i0} + \zeta_{it}, \qquad \zeta_{it} \equiv \sum_{j=0}^{t-1} \lambda_i^j\, w_{i,t-j}, $$
which implies the solution for ẑ is:
$$ \hat{z}_t = P \hat{y}_t = (v_1\ v_2\ v_3)\, \hat{y}_t = \sum_{i=1}^{3} v_i\, \hat{y}_{it} = \sum_{i=1}^{3} v_i\, \hat{y}_{i0}\, \lambda_i^t + \sum_{i=1}^{3} v_i\, \zeta_{it}. $$
To pin down the components of ŷ0 , note that ẑ0 = P ŷ0 ⇒ ŷ0 = P −1 ẑ0 ≡ Πẑ0 . The stability
condition then requires that the state variables be in S, or $\hat{y}_0^{(3)} \equiv \hat{y}_{30} = 0$, which we now use to
implement the solution. We have:
$$ \hat{z}_t = v_1 \lambda_1^t \hat{y}_{10} + v_2 \lambda_2^t \hat{y}_{20} + v_3 \lambda_3^t \hat{y}_{30} + v_1 \zeta_{1t} + v_2 \zeta_{2t} + v_3 \zeta_{3t}. $$
The term $v_3 \lambda_3^t \hat{y}_{30}$ is zero, because ŷ30 = 0. Moreover, we have that
$\zeta_{3t} = \sum_{j=0}^{t-1} \lambda_3^j\, w_{3,t-j}$, and since w3,t = 0 for all t, then ζ 3t = 0 as well. Therefore, the solution for
ẑt is:
$$ \hat{z}_t = v_1 \lambda_1^t \hat{y}_{10} + v_2 \lambda_2^t \hat{y}_{20} + v_1 \zeta_{1t} + v_2 \zeta_{2t}. $$
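The construction of the stable solution can be sketched as follows. This is an illustration under assumptions that are not in the original notes: log utility, $y(k) = k^{\gamma}$, s = 1, normally distributed shocks of small size, and arbitrary initial deviations. The sketch picks ĉ0 on the plane S, imposes the no-sunspots condition (3.23) on the consumption error, and simulates Eq. (3.19).

```python
import numpy as np

# Illustrative parametrization (assumed): u(c) = ln c, y(k) = k**gamma, s = 1.
beta, gamma, delta, rho = 0.99, 0.3, 0.1, 0.9
k = (gamma / (1.0 / beta - 1.0 + delta)) ** (1.0 / (1.0 - gamma))
y, y1, y2 = k ** gamma, gamma * k ** (gamma - 1.0), gamma * (gamma - 1.0) * k ** (gamma - 2.0)
c = y - delta * k

# With log utility, u'/(c u'') = -1 and u'/u'' = -c, which simplifies the second row of Phi.
Phi = np.array([[1.0 / beta, -c / k, y / k],
                [k * y2, 1.0 - beta * c * y2, beta * (y * y2 + rho * y1)],
                [0.0, 0.0, rho]])
R = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

lam, P = np.linalg.eig(Phi)
lam, P = lam.real, P.real
Pi = np.linalg.inv(P)
pi3 = Pi[np.argmax(lam)]          # row of P^{-1} associated with the explosive root

rng = np.random.default_rng(1)
T = 60
z = np.empty((T + 1, 3))          # columns: k_hat, c_hat, s_hat
k0, s0 = 0.05, -0.02              # predetermined initial deviations (arbitrary)
z[0] = [k0, -(pi3[0] * k0 + pi3[2] * s0) / pi3[1], s0]   # c_hat_0 on the plane S
for t in range(T):
    u_s = 0.01 * rng.standard_normal()
    u_c = -(pi3[2] / pi3[1]) * u_s                       # no-sunspots condition (3.23)
    z[t + 1] = Phi @ z[t] + R @ np.array([u_c, u_s])     # Eq. (3.19)

y3 = z @ pi3                      # explosive component, which should stay at zero
print(np.max(np.abs(y3)), np.max(np.abs(z)))
```

The explosive component stays at zero up to rounding error and the simulated deviations remain bounded. Choosing any other ĉ0, or an expectational error that violates (3.23), would let the root λ3 > 1 take over.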
3.3.4.2 Frictions, indeterminacy and sunspots
In the neoclassical model that we are analyzing, the equilibrium is determinate. As explained,
this property arises because the number of predetermined variables equals the dimension of the
convergent subspace of the economy. If we managed to increase the dimension of the converging
subspace, the equilibrium would be indeterminate, as further formalized in Appendix 1. As it
turns out, indeterminacy goes hand in hand with sunspots, the expectational shocks extraneous
to those in the economic fundamentals, as we discussed earlier, just after Eq. (3.23).
Introducing sunspots in macroeconomics has been an approach pursued in detail by Farmer
in a series of articles (see Farmer, 1998, for an introductory account of this approach). The
idea is quite interesting, as we know that the basic real business cycle model of this section
needs many extensions in order not to be rejected, empirically, as originally shown by Watson
(1993). In other words, the basic model in this section offers little room for a rich propagation
mechanism, as it entirely relies on impulses, the productivity shocks, which “we hardly read
about in the Wall Street Journal,” as provocatively put by King and Rebelo (1999). Sunspots
offer an interesting route to enrich the propagation mechanism, although their asset pricing
implications in terms of the model analyzed in this section, have not been explored yet.
In a series of articles, David Cass showed that a Pareto-optimal economy cannot harbour
sunspot equilibria. On the other hand, any market imperfection has the potential to be a
source of sunspots. The typical example is the presence of incomplete markets. The neoclassical
model analyzed in this section can not generate sunspots, as it relies on a system of perfectly
competitive markets and absence of any sort of frictions. To introduce sunspots in the economy
of this section, we need to think about some deviation from optimality. Two possibilities ana-
lyzed in the literature are the presence of imperfect competition and/or externality effects. We
provide an example of these effects, by working out the deterministic economy in Section 3.3.3.
(Generalizations to the stochastic economy in this section are easy, although more cumbersome.)
How is it that a deterministic economy might generate “stochastic outcomes,” that is, out-
comes driven by shocks entirely unrelated to the fundamentals of the economy? Let us imagine
this can be possible. Then, both optimal consumption and capital accumulation in Section
3.3.3 are necessarily random processes. The system in (3.12), then, must be rewritten in an
expectation format,
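$$ E_t \begin{pmatrix} \hat{k}_{t+1} \\ \hat{c}_{t+1} \end{pmatrix} = A \begin{pmatrix} \hat{k}_t \\ \hat{c}_t \end{pmatrix}. $$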
Next, let us introduce the expectational error process uc,t ≡ ĉt − Et−1 (ĉt ), which we plug back
into the previous system, to obtain:

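$$ \begin{pmatrix} \hat{k}_{t+1} \\ \hat{c}_{t+1} \end{pmatrix} = A \begin{pmatrix} \hat{k}_t \\ \hat{c}_t \end{pmatrix} + \begin{pmatrix} 0 \\ u_{c,t+1} \end{pmatrix}. $$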
Naturally, we still have λ1 ∈ (0, 1) and λ2 > 1, as in Section 3.3.3. Therefore, we decompose A
as P ΛP −1 , and have:
ŷt+1 = Λŷt + P −1 (0 uc,t+1 )⊤ .
Moreover, for $\hat{y}_{2t} = \lambda_2^{-T} E_t(\hat{y}_{2,t+T})$ to hold for all T , we need to have ŷ2t = 0, for all t. Therefore,
the second element of the vector P −1 (0 uc,t+1 )⊤ must be zero, or, for all t,
0 = π 22 uc,t ⇐⇒ 0 = uc,t .
There is no room for expectational errors and, hence, sunspots, in this model. The fact that
λ2 > 1 implies the dimension of the saddlepoint does not exceed the number of predetermined
variables. So a viable route to pursue here is to look for economies such that the saddlepoint
has a dimension larger than one, i.e. such that λ2 < 1. In these economies, indeterminacy
and sunspots will be two sides of the same coin. As shown in the appendix, the reasons for
which λ2 > 1 relate to the classical assumptions about the shape of the utility function u and
the production function y. We now modify the production function, to see the effect on the
eigenvalues of A.
[Economy with increasing returns]
[Asset pricing implications in further chapters]
3.4 Production-based asset pricing

3.4.1 Firms
For each firm, capital accumulation does satisfy the identity in Eq. (3.15), reproduced here for
convenience:
Kt+1 = (1 − δ) Kt + It . (3.24)
The additional assumption we make is that capital adjustment is costly: investing It per unit
of capital already in place, Kt , entails a cost $\phi(I_t / K_t)$, expressed in terms of the price of the final
good, which we take to be the numéraire, thereby allowing the investment goods to differ from
the final good the firm produces. An investment of It , then, leads to a cost $\phi(I_t / K_t)\, K_t$, such
that the profit the firm makes at time t is,
$$ D(K_t, I_t) \equiv \tilde{y}(K_t, N(K_t)) - w_t N(K_t) - p_t I_t - \phi\left( \frac{I_t}{K_t} \right) K_t, \tag{3.25} $$
where ỹ (Kt , Nt ) is the firm’s production at time t, obtained with capital Kt and labor Nt , and
subject to the same random productivity shocks as those in Section 3.3.4, wt is the real wage,
N (K) is the labor demand schedule, solution to the optimality condition, ỹN (Kt , N (Kt )) = wt
for all t, and pt is the real price of the investment goods, or uninstalled capital. Finally, the
adjustment-cost function satisfies φ ≥ 0, φ′ ≥ 0, φ′′ ≥ 0. In words, capital adjustment becomes
increasingly costly the faster it is made. Naturally, φ is zero in the absence of adjustment costs.
What is the value of the profit, from the perspective of time zero? This question can be
answered, by utilizing the Arrow-Debreu state prices introduced in Chapter 2. At time t, and
in state s, the profit Dt (s) (say) is worth,
φ0,t (s) D (Kt (s) , It (s)) = m0,t (s) Dt (Kt (s) , It (s)) P0,t (s) ,
with the same notation as in Chapter 2.
3.4.1.1 The value of the firm
We assume that in each period, the firm distributes all the profits it makes, and that for a given
capital K0 , it maximizes its cum-dividend value,
$$ V_c(K_0) = \max_{(K_t, I_{t-1})_{t=1}^{\infty}} \left[ D(K_0, I_0) + E\left( \sum_{t=1}^{\infty} m_{0,t}\, D(K_t, I_t) \right) \right], $$
subject to the capital accumulation in Eq. (3.24).

The value of the firm at time t, Vc (Kt ), can be found recursively, through the Bellman
equation,
$$ V_c(K_t) = \max_{I_t} \left[ D(K_t, I_t) + E\left( m_{t+1} V_c(K_{t+1}) \right) \right], $$
where the expectation is taken with respect to the information set as of time t. The first-order
conditions for It lead to,
−DI (Kt , It ) = E [mt+1 Vc′ (Kt+1 )] . (3.26)
That is, along the optimal capital accumulation path, the marginal cost of new installed capital
at time t, −DI , must equal the expected marginal return on the investment, i.e. the expected
value of the marginal contribution of capital to the value of the firm at time t + 1, Vc′ (Kt+1 ).
By Eq. (3.26), optimal investment is a function I (Kt ), and the value of the firm satisfies,
Vc (Kt ) = D (Kt , I (Kt )) + E [mt+1 Vc (Kt+1 )] .
Differentiating the value function in the previous equation, with respect to Kt , and using Eq.
(3.26), yields the following envelope condition:
Vc′ (Kt ) = DK (Kt , I (Kt )) + DI (Kt , I (Kt )) I ′ (Kt ) + E [mt+1 Vc′ (Kt+1 ) ((1 − δ) + I ′ (Kt ))]
= DK (Kt , I (Kt )) − (1 − δ) DI (Kt , I (Kt )) .
Replacing this expression for the marginal value of capital back into Eq. (3.26) leaves:
−DI (Kt , I (Kt )) = E [mt+1 (DK (Kt+1 , I (Kt+1 )) − (1 − δ) DI (Kt+1 , I (Kt+1 )))] . (3.27)
Along the optimal capital accumulation path, the marginal cost of new installed capital
at time t, which by Eq. (3.26) is the expected marginal return on the investment, equals the
expected value of (i) the very same marginal cost at time t+1, corrected for capital depreciation,
(1 − δ), and (ii) capital productivity, net of adjustment costs. Analytically,

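$$ D_K(K_t, I(K_t)) \equiv \tilde{y}_K(K_t, N(K_t)) - w_t N'(K_t) - \frac{\partial}{\partial K_t} \left[ \phi\left( \frac{I_t}{K_t} \right) K_t \right], $$
$$ -D_I(K_t, I(K_t)) \equiv p_t + \phi'\left( \frac{I_t}{K_t} \right). $$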
We now proceed to introduce a fundamental concept in investment theory.

3.4.1.2 q theory
Tobin’s marginal q is defined as the ratio of the expected marginal value of an additional
unit of capital over its replacement cost:
$$ TQ_t \equiv \text{Tobin's marginal } q \equiv \frac{E\left[ m_{t+1} V_c'(K_{t+1}) \right]}{p_t}. $$
It is easy to see that the numerator, E [mt+1 Vc′ (Kt+1 )], is simply the shadow price of installed
capital. Consider the Lagrangian at time t,
$$ L(K_t) = \max_{I_t, K_{t+1}, q_t} \left[ D(K_t, I_t) - q_t \left( K_{t+1} - (1 - \delta) K_t - I_t \right) + E\left( m_{t+1} L(K_{t+1}) \right) \right], \tag{3.28} $$
which, integrated, gives rise to the value of the firm:

$$ L(K_0) = \max_{(I_t, K_{t+1}, q_t)_{t=0}^{\infty}} \left[ V_c(K_0) - E\left( \sum_{t=0}^{\infty} m_{0,t}\, q_t \left( K_{t+1} - (1 - \delta) K_t - I_t \right) \right) \right]. $$
The first-order condition for investment, It , is qt = −DI (Kt , It ), and that for capital, Kt+1 ,
is qt = E (mt+1 L′ (Kt+1 )). By Eq. (3.26), then, L′ (Kt+1 ) = Vc′ (Kt+1 ) and, therefore, qt is the
expected marginal return on the investment, that is, the shadow price of installed capital.
Therefore, Tobin’s marginal q is the ratio of the shadow price of installed capital to its replace-
ment cost:
$$ TQ_t = \frac{q_t}{p_t}. $$
Next, replace the first-order condition for qt , i.e. Eq. (3.24), into Eq. (3.28), differentiate L (Kt )
with respect to Kt , and use the first-order condition for Kt+1 , obtaining, L′ (Kt ) = DK (Kt , It )+
qt (1 − δ). These conditions imply that qt satisfies the valuation equation (3.27):
qt = E [mt+1 (DK (Kt+1 , It+1 ) + (1 − δ) qt+1 )] , (3.29)
and therefore that:

$$ q_t = p_t + \phi'\left( \frac{I_t}{K_t} \right). \tag{3.30} $$
The shadow price of installed capital, qt , has to equal the marginal cost of new installed capital,
and is larger than the price of uninstalled capital, pt . This is natural: to install new capital requires
some (marginal) adjustment costs, which add to the “raw” price of uninstalled capital, pt .
Therefore, in the presence of adjustment costs, Tobin’s marginal q is larger than one.
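Eq. (3.29) can be solved forward, leaving:
$$ q_t = E\left[ \sum_{s=1}^{\infty} (1 - \delta)^{s-1}\, m_{t,t+s}\, D_K(K_{t+s}, I_{t+s}) \right], $$
where $m_{t,t+s}$ denotes the discount factor between time t and time t + s.
The shadow price of installed capital is worth the sum of all its future marginal net productivity,
discounted by the stochastic discount factor and adjusted for depreciation. Moreover, Eq. (3.30) can be inverted for It /Kt , to deliver:
$$ \frac{I_t}{K_t} = \phi'^{-1}(q_t - p_t), \tag{3.31} $$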
where φ′−1 denotes the inverse of φ′ , and is increasing, since φ′ is increasing. Given Kt , and the
fact that Kt+1 is predetermined, the firm evaluates qt through Eq. (3.29), and then determines
the level of new investments through Eq. (3.31). These investments are increasing in the dif-
ference between the shadow price of installed capital, qt , and that of uninstalled capital, pt , as
originally assumed by Tobin (1969).
In the absence of adjustment costs, when qt = pt , Eq. (3.29) delivers the usual condition,
1 = E [mt (ỹK (Kt , N (Kt )) + (1 − δ))| Ft−1 ] .
Empirically, however, the marginal productivity of capital, ỹK (Kt , N (Kt )), is not volatile
enough, to rationalize asset returns. Moreover, as we argue in a moment, Tobin’s marginal
q can be approximated by market-to-book ratios, which are typically time-varying. Therefore,
adjustment costs are important for asset pricing.
A difficulty with Tobin’s marginal q is that it is hard to estimate. Yet in the special
case we are analyzing in this section, where firms act competitively and have access to a
homogeneous production function and adjustment costs, Tobin’s marginal q can be proxied by
the market-to-book ratio of a given firm. Let V (Kt ) denote the ex-dividend value of the firm,
which is its stock market value, since it nets out the dividend it pays to its holder in the current
period. It is:
V (Kt ) ≡ Vc (Kt ) − D (Kt , I (Kt )) = E [mt+1 Vc (Kt+1 )] .
Tobin’s average q is defined as the ratio of the stock market value of the firm over the
replacement cost of the capital:
$$ \text{Tobin's average } q \equiv \frac{\text{Stock Mkt Value of the Firm}}{\text{Replacement Cost of Capital}} = \frac{V(K_t)}{p_t K_{t+1}}. $$
The next result was originally obtained by Hayashi (1982) in a continuous-time setting.
Theorem 3.2. Tobin’s marginal q and average q coincide. That is, we have,
V (Kt ) = qt Kt+1 .
Proof. By the homogeneity properties of the production function and the adjustment costs,
D (Kt , It ) = DK (Kt , It ) Kt + DI (Kt , It ) It .

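Therefore, the ex-dividend value of the firm is:
$$ \begin{aligned}
V(K_0, I_0) &= E\left[ \sum_{t=1}^{\infty} m_{0,t}\, D(K_t, I_t) \right] \\
&= E\left[ \sum_{t=1}^{\infty} m_{0,t} \left( D_K(K_t, I_t) - (1 - \delta) D_I(K_t, I_t) \right) K_t \right] + E\left[ \sum_{t=1}^{\infty} m_{0,t}\, D_I(K_t, I_t)\, K_{t+1} \right],
\end{aligned} $$
where the second line follows by Eq. (3.24). By Eq. (3.27), and the law of iterated expectations,
$$ E\left[ \sum_{t=1}^{\infty} m_{0,t} \left( D_K(K_t, I_t) - (1 - \delta) D_I(K_t, I_t) \right) K_t \right] = -D_I(K_0, I_0)\, K_1 - E\left[ \sum_{t=1}^{\infty} m_{0,t}\, K_{t+1}\, D_I(K_t, I_t) \right]. $$
Hence, V (K0 , I0 ) = −DI (K0 , I0 ) K1 = q0 K1 .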
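The homogeneity property used at the start of the proof, D = DK K + DI I, can be checked numerically. The sketch below is an illustration under assumptions that are not in the original notes: a Cobb-Douglas technology $\tilde{y} = s K^{\alpha} N^{1-\alpha}$, labor chosen so that the marginal product of labor equals the wage, quadratic adjustment costs $\phi(x) = x^2 / (2\kappa)$, and arbitrary parameter values. Partial derivatives are approximated by central finite differences.

```python
def profit(K, I, s=1.0, w=1.0, p=1.0, alpha=0.3, kappa=2.0):
    """Profit D(K, I) of Eq. (3.25) under the assumed functional forms."""
    # Labor demand: (1 - alpha) * s * K**alpha * N**(-alpha) = w, so N is linear in K.
    N = K * ((1.0 - alpha) * s / w) ** (1.0 / alpha)
    output = s * K ** alpha * N ** (1.0 - alpha)
    adjustment_cost = 0.5 / kappa * (I / K) ** 2 * K   # phi(I/K) * K
    return output - w * N - p * I - adjustment_cost

def partials(K, I, h=1e-5):
    """Central finite-difference approximations of D_K and D_I."""
    D_K = (profit(K + h, I) - profit(K - h, I)) / (2.0 * h)
    D_I = (profit(K, I + h) - profit(K, I - h)) / (2.0 * h)
    return D_K, D_I

K, I = 10.0, 1.5
D_K, D_I = partials(K, I)
euler_gap = profit(K, I) - (D_K * K + D_I * I)   # zero if D is homogeneous of degree one
marginal_cost = -D_I                              # should equal p + phi'(I/K) = 1 + (I/K)/kappa
print(euler_gap, marginal_cost)
```

The gap vanishes up to numerical error, and the marginal cost of investment matches pt + φ′(It /Kt ). With decreasing returns in (K, N) or a non-homogeneous adjustment cost the gap would not vanish, and marginal and average q would differ.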
This result, in conjunction with that in Eq. (3.30), provides a simple rule of thumb for
investment decisions. Consider, for example, the case of quadratic adjustment costs, where
$\phi(x) = \frac{1}{2} \kappa^{-1} x^2$, for some κ > 0. Then, Eq. (3.31) is:
$$ I_t = \kappa\, (q_t - p_t)\, K_t = \kappa \left( \frac{\text{Stock Mkt Value of the Firm}}{\text{Replacement Cost of Capital}} - 1 \right) p_t K_t, $$
where the second equality follows by Theorem 3.2. Thus, according to q theory, we expect firms
with a market value larger than the cost of reproducing their capital to grow, and firms which
are not worth the cost of reproducing their capital to shrink. This basic observation provides
a first benchmark for assessing the future development of firms.
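The rule of thumb can be written as a small function. This is only a sketch: the function names and the numbers are hypothetical, chosen for illustration, and the quadratic adjustment cost is the one assumed in the example above.

```python
def investment_from_q(q, p, K, kappa):
    """Eq. (3.31) with phi(x) = x**2 / (2 * kappa): I = kappa * (q - p) * K."""
    return kappa * (q - p) * K

def investment_from_market_value(V, p, K, K_next, kappa):
    """Same rule, with marginal q proxied by average q = V / (p * K_next)."""
    average_q = V / (p * K_next)
    return kappa * (average_q - 1.0) * p * K

# Hypothetical numbers, for illustration only.
p, K, kappa, delta, q = 1.0, 100.0, 2.0, 0.1, 1.1
I = investment_from_q(q, p, K, kappa)
K_next = (1.0 - delta) * K + I      # capital accumulation, Eq. (3.24)
V = q * K_next                      # Theorem 3.2: V(K_t) = q_t * K_{t+1}
I_check = investment_from_market_value(V, p, K, K_next, kappa)
print(I, I_check)
```

Both routes give the same investment, since the market-to-book ratio V/(p K_next) equals q/p under Theorem 3.2. A firm with q below p would have negative investment under this rule.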
3.4.2 Consumers
We now generalize the budget constraint obtained in the program [3.P3], to the uncertainty
case. We claim that in this case, the relevant budget constraint is,
$$ V_0 = c_0 + E\left[ \sum_{t=1}^{\infty} m_{0,t}\, (c_t - w_t N_t) \right]. \tag{3.32} $$
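We have: ct + St θt+1 = (St + Dt ) θt + wt Nt and, then:
$$ \begin{aligned}
E\left[ \sum_{t=1}^{\infty} m_{0,t}\, (c_t - w_t N_t) \right] &= E\left[ \sum_{t=1}^{\infty} m_{0,t}\, (S_t + D_t)\, \theta_t \right] - E\left[ \sum_{t=1}^{\infty} m_{0,t}\, S_t\, \theta_{t+1} \right] \\
&= E\left[ \sum_{t=1}^{\infty} \frac{m_{0,t}}{m_{t-1,t}}\, E_{t-1}\left( m_{t-1,t}\, (S_t + D_t) \right) \theta_t \right] - E\left[ \sum_{t=2}^{\infty} m_{0,t-1}\, S_{t-1}\, \theta_t \right] \\
&= E\left[ \sum_{t=1}^{\infty} m_{0,t-1}\, S_{t-1}\, \theta_t \right] - E\left[ \sum_{t=2}^{\infty} m_{0,t-1}\, S_{t-1}\, \theta_t \right] \\
&= S_0 \theta_1 = V_0 - c_0,
\end{aligned} $$
where the third line follows by the properties of the discount factor, $\frac{m_{0,t}}{m_{t-1,t}} = m_{0,t-1}$ and $m_t \equiv m_{t-1,t}$.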
Therefore, the program consumers solve is:

$$ \max_{(c_t)_{t=0}^{\infty}} E\left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right], \quad \text{s.t. Eq. (3.32).} $$
We now have two optimality conditions, one intertemporal and another, intratemporal:
$$ m_{t+1} = \beta\, \frac{u_1(c_{t+1}, N_{t+1})}{u_1(c_t, N_t)} \ \text{(intertemporal)}; \qquad w_t = -\frac{u_2(c_t, N_t)}{u_1(c_t, N_t)} \ \text{(intratemporal)}. $$
3.4.3 Equilibrium
For all t,
$$ \tilde{y}(K_t, N_t) = c_t + p_t I_t + \phi\left( \frac{I_t}{K_t} \right) K_t. \tag{3.33} $$
It is easily seen that the condition θt = 1 in the financial market, implies that ct = Dt + wt Nt ,
which, upon substitution of the profits in Eq. (3.25), delivers the equilibrium condition in Eq.
(3.33). Implicit in this reasoning is the idea that the adjustment costs are not paid to anyone. They
represent, so to speak, capital losses incurred along the way of growth.
3.5 Money, production, asset prices, and overlapping generations models

3.5.1 Introduction: endowment economies
3.5.1.1 A deterministic model
We initially assume the population is constant, and made up of one young and one old. The
young agent maximizes his intertemporal utility subject to his budget constraint:
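$$ \max_{(c_{1t},\, c_{2,t+1})} \left[ u(c_{1t}) + \beta u(c_{2,t+1}) \right] \quad \text{subject to} \quad \begin{cases} sav_t + c_{1t} = w_{1t} \\ c_{2,t+1} = sav_t R_{t+1} + w_{2,t+1} \end{cases} \tag{3.P5} $$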
where w1t and w2,t+1 are the endowments the agent receives at his young and old age.
The agent born at time t − 1, then, faces the constraints: savt−1 + c1,t−1 = w1,t−1 and c2t =
savt−1 Rt + w2t . By combining his second period constraint with the first period constraint of
the agent born at time t,
savt−1 Rt + wt = savt + c1t + c2t , wt ≡ w1t + w2t . (3.34)
The equilibrium in the intergenerational lending market is, naturally:
savt = 0, (3.35)

and implies that the goods market is also in equilibrium, in that $w_t = \sum_{i=1}^{2} c_{i,t}$, for all t.
Therefore, we can analyze the model, by just analyzing the autarkic equilibrium.
As Figure 3.4 illustrates, the first-order condition for the program [3.P5] requires that the
slope of the indifference curve be equal to the slope of the lifetime budget constraint, c2,t+1 =
−Rt+1 c1,t + Rt+1 w1t + w2,t+1 , and leads to:
$$ \beta\, \frac{u'(c_{2,t+1})}{u'(c_{1,t})} = \frac{1}{R_{t+1}}. \tag{3.36} $$
FIGURE 3.4. The lifetime budget constraint, c2,t+1 = −Rt+1 c1,t + Rt+1 w1t + w2,t+1 , through the endowment point (w1,t , w2,t+1 ) (axes: c1,t , c2,t+1 ).
The equilibrium, then, is a sequence of gross returns Rt satisfying Eqs. (3.34), (3.35) and (3.36),
or:
$$ b_t \equiv \frac{1}{R_{t+1}} = \beta\, \frac{u'(w_{2,t+1})}{u'(w_{1t})}. \tag{3.37} $$
In this relation, bt is the shadow price of a bond issued at t, and promising one unit of numéraire
at t + 1: the sequence of prices, bt , satisfying Eq. (3.37), is such that agents are happy with not
being able to lend and borrow, intergenerationally.
The previous model is easy to extend to the case where agents are heterogeneous. The program
each agent j solves is, now:
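$$ \max_{(c_{1j,t},\, c_{2j,t+1})} \left[ u_j(c_{1j,t}) + \beta_j u_j(c_{2j,t+1}) \right] \quad \text{subject to} \quad \begin{cases} sav_{j,t} + c_{1j,t} = w_{1j,t} \\ c_{2j,t+1} = sav_{j,t} R_{t+1} + w_{2j,t+1} \end{cases} $$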
with obvious notation. The first-order condition is, for all time t and agent j,
$$ \beta_j\, \frac{u_j'(c_{2j,t+1})}{u_j'(c_{1j,t})} = \frac{1}{R_{t+1}} \equiv b_t, $$
and the equilibrium is a sequence of bond prices bt satisfying the previous relation and the
equilibrium in the intrageneration lending market:
$$ \sum_{j=1}^{J} sav_{j,t} = 0, \tag{3.38} $$
where J denotes the constant number of agents in each generation.

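To illustrate, suppose agents have all the same utility, uj (c) = ln c, and discount rate, β j = β.
Then,
$$ c_{1j,t} = \frac{1}{1+\beta} \left( w_{1j,t} + \frac{w_{2j,t+1}}{R_{t+1}} \right), \quad c_{2j,t+1} = \frac{\beta}{1+\beta} \left( R_{t+1} w_{1j,t} + w_{2j,t+1} \right), \quad sav_{j,t} = \frac{1}{1+\beta} \left( \beta w_{1j,t} - \frac{w_{2j,t+1}}{R_{t+1}} \right), $$
and using the equilibrium condition in Eq. (3.38),
$$ b_t = \frac{1}{R_{t+1}} = \frac{\beta \sum_{j=1}^{J} w_{1j,t}}{\sum_{j=1}^{J} w_{2j,t+1}}. \tag{3.39} $$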
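The log-utility example can be checked with a few lines of code. The endowments and the discount factor below are hypothetical, chosen only to illustrate that individual savings differ across agents while aggregate savings are zero at the bond price in Eq. (3.39).

```python
import numpy as np

def savings(R, w1, w2, beta):
    """Optimal savings of each agent under log utility."""
    return (beta * w1 - w2 / R) / (1.0 + beta)

def bond_price(w1, w2, beta):
    """Equilibrium bond price of Eq. (3.39)."""
    return beta * np.sum(w1) / np.sum(w2)

# Hypothetical endowments for J = 3 agents.
beta = 0.95
w1 = np.array([1.0, 2.0, 0.5])   # endowments when young
w2 = np.array([0.5, 0.4, 1.5])   # endowments when old

b = bond_price(w1, w2, beta)
R = 1.0 / b
sav = savings(R, w1, w2, beta)
print(b, sav, sav.sum())
```

Agents with relatively large endowments when young lend, the others borrow, and the lending market clears. With a single agent (J = 1), savings are zero and the price reduces to Eq. (3.37).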
3.5.1.2 A tree in a stochastic economy
Suppose, next, that we introduce a tree, which yields a stochastic dividend Dt in each period.
Each agent solves the following program:
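$$ \max_{(c_{1t},\, c_{2,t+1})} \left[ u(c_{1t}) + \beta E\left( u(c_{2,t+1}) \mid F_t \right) \right] \quad \text{subject to} \quad \begin{cases} S_t \theta_t + c_{1t} = w_{1t} \\ c_{2,t+1} = (S_{t+1} + D_{t+1})\, \theta_t + w_{2,t+1} \end{cases} \tag{3.P6} $$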
where St denotes the asset price and θ the units of the asset the agent chooses in his young age.
The agent born at time t − 1 faces the constraints St−1 θt−1 + c1,t−1 = w1,t−1 and w2t + (St +
Dt )θt−1 = c2,t . By combining the second period constraint of the agent born at time t − 1 with
the first period constraint of the agent born at time t,
(St + Dt ) θ t−1 − St θt + wt = c1,t + c2,t .
The clearing condition in the asset market, θt = 1, implies that the market for goods also clears,
for all t: Dt + w1t + w2t = c1,t + c2,t . A characterization of the solution to the program [3.P6]
can be obtained by eliminating c from the constraint,
$$ \max_{\theta} \left[ u(w_{1t} - S_t \theta) + \beta E\left( u\left( (S_{t+1} + D_{t+1})\, \theta + w_{2,t+1} \right) \mid F_t \right) \right]. $$
The equilibrium is one where θ t = 1, implying that (i) c1t = w1t − St and (ii) c2,t+1 = St+1 +
Dt+1 + w2,t+1 . Using (i) and (ii), the first-order condition for the program [3.P6] leads to:
u′ (w1t − St ) St = βE [u′ (St+1 + Dt+1 + w2,t+1 ) (St+1 + Dt+1 )| Ft ] .
Consider, for example, the case where u (c) = ln c, and set R̃t+1 = (St+1 + Dt+1 ) /St . We
have:
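$$ \frac{1}{w_{1t} - sav_t^*} = \beta E\left[ \frac{1}{sav_t^* \tilde{R}_{t+1} + w_{2,t+1}}\, \tilde{R}_{t+1} \,\Big|\, F_t \right], \quad \text{where } sav_t^* \equiv S_t \theta_t, \ \theta_t = 1. \tag{3.40} $$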
In a deterministic setting,
$$ \frac{1}{w_{1t} - sav_t} = \beta\, \frac{1}{sav_t R_{t+1} + w_{2,t+1}}\, R_{t+1}, \quad \text{where } sav_t = 0, \tag{3.41} $$
which leads to the equilibrium bond price in Eq. (3.39). Eqs. (3.40) and (3.41) are formally
equivalent. Their fundamental difference is that in the tree economy, savings have to stay
positive, as the tree must be held by the young agent, in equilibrium: sav∗t ≡ St ≥ 0. In an
economy without a tree, instead, the interest rate, Rt , has to be such that savings are zero for
all t, savt = 0.
Eq. (3.40) can be solved explicitly for the price of the tree, St , once we assume w2t = 0 for
all t. In the absence of a tree, we cannot assume endowments are zero in the old age, since
the autarkic economy in this case would be such that the old generation would not consume
anything. In the presence of a tree, instead, this assumption is innocuous, conceptually, as the
autarkic equilibrium in this case is such that the old generation could consume the fruits of the
tree, as well as the proceeds arising from selling the tree to the young generation. Solving
Eq. (3.40) for St when w2t = 0, then, leads to a price for the tree, equal to:
$$ S_t = \frac{\beta}{1+\beta}\, w_{1t}. $$
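A quick numerical check of this price follows. It is a sketch under assumptions that are not in the original notes: the next-period payoff St+1 + Dt+1 is represented by hypothetical lognormal draws taken as given, and Eq. (3.40) is solved for St by bisection. When w2,t+1 = 0 the return drops out of the expectation, so the distribution of the payoff does not matter for the price.

```python
import numpy as np

def foc_residual(S, w1, w2, beta, payoff):
    """Left side minus right side of Eq. (3.40), with sav* = S and theta = 1."""
    R = payoff / S                      # gross return (S_{t+1} + D_{t+1}) / S_t
    return 1.0 / (w1 - S) - beta * np.mean(R / (S * R + w2))

def solve_price(w1, w2, beta, payoff, tol=1e-12):
    """Bisection on S in (0, w1); the residual is increasing in S."""
    lo, hi = 1e-9, w1 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc_residual(mid, w1, w2, beta, payoff) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
payoff = rng.lognormal(mean=0.0, sigma=0.3, size=1000)   # hypothetical payoff draws
w1, beta = 2.0, 0.95

S_numeric = solve_price(w1, 0.0, beta, payoff)
S_closed = beta / (1.0 + beta) * w1
S_with_old_age_income = solve_price(w1, 0.5, beta, payoff)
print(S_numeric, S_closed, S_with_old_age_income)
```

The bisection recovers the closed-form price when w2 = 0. With a positive old-age endowment, the young agent has less need to save and the price of the tree is lower.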
3.5.2 Diamond’s model

3.5.3 Money
We consider a version of the previous model with endowment (not with capital), and assume
that agents can now transfer value through a piece of paper, interpreted as money. The young
agent, then, maximizes his intertemporal utility, subject to a new budget constraint:

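$$ \max_{(c_{1t},\, c_{2,t+1})} \left[ u(c_{1t}) + \beta u(c_{2,t+1}) \right] \quad \text{subject to} \quad \begin{cases} \dfrac{m_t}{p_t} + c_{1t} = w_{1t} \\ c_{2,t+1} = \dfrac{m_t}{p_{t+1}} + w_{2,t+1} \end{cases} \tag{3.P7} $$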
where mt is the amount of money he holds at time t, and pt is the price of the consumption
good as of time t.
Let
$$ sav_t \equiv \frac{m_t}{p_t}, \qquad R_{t+1} \equiv \frac{p_t}{p_{t+1}}. \tag{3.42} $$
Then, the budget constraint for program [3.P7] is formally identical to that for program [3.P5].
The difference is that in the monetary economy of this section, the young agent may wish to
transfer value over time, by saving money, earning a gross “interest rate” equal to the rate
of deflation: the lower the price level the next period, the higher the purchasing power of the
money he transfers from the young to the old age. Naturally, then, by aggregating the budget
constraints of the young and the old generation, we obtain, formally, Eq. (3.34), where now,
savt and Rt+1 are as in (3.42). However, in the setting of this section, savt is not necessarily
zero, as money can be transferred from a generation to another one. In equilibrium, $sav_t = \bar{m}_t / p_t$,
where m̄t denotes money supply. Therefore, the real value of money is strictly positive, if the
equilibrium price pt stays bounded over time, which might actually occur, as we shall study
below. As we see, the role of money as a medium for transferring value, is, in this context,
similar to that of a tree in the stochastic overlapping generations economy of Section 3.5.1.2.
Substituting the equilibrium savings $sav_t = \bar{m}_t / p_t$ and $R_{t+1} = p_t / p_{t+1}$ into Eq. (3.34), we obtain,
m̄t−1 = m̄t + pt (c1t + c2t − wt ), which used again in Eq. (3.34), delivers,
$$ sav_{t-1} R_t = sav_t - \frac{\Delta \bar{m}_t}{p_t}. \tag{3.43} $$
We need a law of motion for money creation. We assume that:⁷ $\Delta \bar{m}_t / \bar{m}_{t-1} = \mu_t$, for some bounded
sequence µt . Replacing this into Eq. (3.43) leaves:
$$ (1 + \mu_t)\, sav_{t-1} R_t = sav_t. \tag{3.44} $$
The last relation can be obtained even more simply, noting that by definition, $(1 + \mu_t)\, \frac{\bar{m}_{t-1}}{p_{t-1}}\, \frac{p_{t-1}}{p_t} = \frac{\bar{m}_t}{p_t}$.
The previous relation can be generalized when population grows. Suppose that at time t,
Nt individuals are born, and that $N_t / N_{t-1} = (1 + n)$, for some constant n. Let money supply be
given by Mt ≡ Nt m̄t , and assume that for all t, $\Delta M_t / M_{t-1} = \mu_t$. Then, by a reasoning similar to that
leading to Eq. (3.44),
$$ \frac{1 + \mu_t}{1 + n}\, sav(R_t)\, R_t = sav(R_{t+1}), \tag{3.45} $$
⁷ In this section, we assume that money transfers are made to the young generation: the money the young generation has to
absorb is that from the old generation, m̄t−1 , and that created by the “central bank,” µt m̄t−1 . One might consider an alternative
model in which transfers are made to the old.
where now, we have set the real savings equal to a function of the interest rate, savt−1 ≡ sav (Rt ),
as it should be, by the solution to the program [3.P7].
Next, suppose that µt is independent of R, and that limt→∞ µt = µ, say, a constant. Eq.
(3.45) leads to two stationary equilibria:
(a) $R = \frac{1+n}{1+\mu}$. This stationary equilibrium relates to the “golden rule,” once we set µ = 0, as
we shall see in the next section. For µ ≠ 0, the price is, in this stationary equilibrium,
$p_t = \left( \frac{1+\mu}{1+n} \right)^t p_0$. Then, we have: (i) $\frac{\bar{m}_t}{p_t} = \frac{M_t}{N_t p_t} = \frac{M_0}{N_0 p_0}$, and (ii) $\frac{\bar{m}_t}{p_{t+1}} = \frac{M_0}{N_0 p_0} \frac{1+n}{1+\mu}$. All in all, the
agents’ budget constraints are bounded and the real value of money is strictly positive.
In this stationary equilibrium, agents “trust” money.
(b) Ra : sav (Ra ) = 0. This stationary equilibrium relates to an autarkic state. Generally, we
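have that $R_a < R$: prices increase more rapidly than per-capita money stocks. Analytically,
$R_a < R \iff \frac{p_{t+1}}{p_t} > \frac{1+\mu}{1+n} = \frac{M_{t+1}/M_t}{N_{t+1}/N_t} = \frac{\bar{m}_{t+1}}{\bar{m}_t} \iff \frac{\bar{m}_{t+1}}{p_{t+1}} < \frac{\bar{m}_t}{p_t}$, whence $\lim_{t\to\infty} \frac{\bar{m}_t}{p_t} = 0$.
As for $\frac{\bar{m}_t}{p_{t+1}}$, we have that $\frac{\bar{m}_t}{p_{t+1}} = \frac{\bar{m}_t}{p_t} R_a < \frac{\bar{m}_t}{p_t} R = \frac{\bar{m}_t}{p_t} \frac{1+n}{1+\mu}$, and since $\lim_{t\to\infty} \frac{\bar{m}_t}{p_t} = 0$, then
$\lim_{t\to\infty} \frac{\bar{m}_t}{p_{t+1}} = 0$. In this stationary equilibrium, agents do not “trust” money.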
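The monetary stationary equilibrium (a) can be illustrated numerically. The following is only a minimal sketch: the function name, the parameter values and the normalizations p0 = M0 = N0 = 1 are assumptions made for the illustration, not part of the model's calibration. It builds the paths of money, population and prices, and reports per-capita real balances and the gross return.

```python
def monetary_steady_state(mu, n, p0=1.0, M0=1.0, N0=1.0, periods=6):
    """Paths in the monetary stationary equilibrium (a).

    mu: constant money growth rate, n: population growth rate.
    p0, M0, N0 are hypothetical initial conditions (normalized to one).
    """
    path = []
    for t in range(periods):
        M = M0 * (1.0 + mu) ** t                      # aggregate money supply M_t
        N = N0 * (1.0 + n) ** t                       # size of generation t
        p = p0 * ((1.0 + mu) / (1.0 + n)) ** t        # price level p_t
        p_next = p0 * ((1.0 + mu) / (1.0 + n)) ** (t + 1)
        path.append({
            "t": t,
            "p": p,
            "real_balances": M / (N * p),             # per-capita real balances
            "R": p / p_next,                          # gross return on money
        })
    return path


path = monetary_steady_state(mu=0.05, n=0.02)
for row in path:
    print(row)
```

Under these assumptions, per-capita real balances stay at M0/(N0 p0) and the gross return equals (1+n)/(1+µ) in every period, which is the sense in which agents “trust” money in this state.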
If sav(·) is differentiable and sav′(·) ≠ 0, the dynamics of $(R_t)_{t=0}^{\infty}$ can be studied through the
slope,
\[ \frac{dR_{t+1}}{dR_t} = \frac{\mathrm{sav}'(R_t) R_t + \mathrm{sav}(R_t)}{\mathrm{sav}'(R_{t+1})} \, \frac{1 + \mu_t}{1 + n}. \tag{3.46} \]
There are three cases:
(i) sav′ (R) > 0. Gross substitutability: the income effect is dominated by the substitution
effect.
(ii) sav′ (R) = 0. Income and substitution effects compensate each other.
(iii) sav′ (R) < 0. Complementarity: the income effect dominates the substitution effect.
An example of gross substitutability was provided during the presentation of the introductory
examples of the present section (log utility functions). The second case can be obtained with
the same examples after imposing that agents have no endowments in the second period. The
equilibrium is seriously compromised in this case, however. Another example is obtained with
Cobb-Douglas utility functions: $u(c_{1t}, c_{2,t+1}) = c_{1t}^{l_1} \cdot c_{2,t+1}^{l_2}$, which generates a real savings function
\[ \mathrm{sav}(R_{t+1}) = \frac{\frac{l_2}{l_1} w_{1t} - \frac{w_{2,t+1}}{R_{t+1}}}{1 + \frac{l_2}{l_1}}, \]
the derivative of which is nil when one assumes that $w_{2,t} = 0$ for all t,
which also implies $\frac{\bar{m}_t}{p_t} = \mathrm{sav}_t = \frac{1}{\nu} w_{1t}$, $\nu \equiv \frac{l_1 + l_2}{l_2}$, and, by reorganizing,
\[ \bar{m}_t \nu = p_t w_{1t}, \]
an equation supporting the view of the Quantity Theory of money. In this case, the sequence
of gross returns is $R_{t+1} = \frac{p_t}{p_{t+1}} = \frac{\bar{m}_t}{\bar{m}_{t+1}} \frac{w_{1,t+1}}{w_{1,t}}$, or
\[ R_{t+1} = \frac{(1 + n) \cdot (1 + g_{t+1})}{1 + \mu_{t+1}}, \]
where $g_{t+1}$ denotes the growth rate of endowments of the young between time t and time t + 1. The
inflation factor $R_t^{-1}$ is equal to the monetary creation factor corrected for the growth rate
of the economy.
FIGURE 3.5. η = 2
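Another example is $u(c_{1t}, c_{2,t+1}) = \left( l\, c_{1t}^{(\eta-1)/\eta} + (1 - l)\, c_{2,t+1}^{(\eta-1)/\eta} \right)^{\eta/(\eta-1)}$. Note that
$\lim_{\eta \to 1} u(c_{1t}, c_{2,t+1}) = c_{1t}^{l} \cdot c_{2,t+1}^{1-l}$, Cobb-Douglas. We have
\[
\begin{cases}
c_{1t} = \dfrac{R_{t+1} w_{1t} + w_{2,t+1}}{R_{t+1} + K^{\eta} R_{t+1}^{\eta}}, \qquad K \equiv \dfrac{1-l}{l} \\[2ex]
c_{2,t+1} = \dfrac{R_{t+1} w_{1t} + w_{2,t+1}}{1 + K^{-\eta} R_{t+1}^{1-\eta}} \\[2ex]
\mathrm{sav}_t = \dfrac{K^{\eta} R_{t+1}^{\eta} w_{1t} - w_{2,t+1}}{R_{t+1} + K^{\eta} R_{t+1}^{\eta}}
\end{cases}
\]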
To simplify, suppose that K = 1, and 0 = w2t = µt = n, and w1t = w1,t+1 ∀t. It is easily
checked that
sign (sav′ (R)) = sign (η − 1) .
The interest factor dynamics is:
\[ R_{t+1} = \left( R_t^{-\eta} + R_t^{-1} - 1 \right)^{1/(1-\eta)}. \tag{3.47} \]
The stationary equilibria are the solutions of
\[ R^{1-\eta} = R^{-\eta} + R^{-1} - 1, \]
and one immediately verifies the existence of the monetary state R = 1.
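The map in Eq. (3.47) is easy to iterate numerically. The following is a minimal sketch under the simplifying assumptions just made (K = 1, no second-period endowment, µ = n = 0); the function names, the starting value 0.9 and the finite-difference step are illustrative assumptions. It checks that R = 1 is a fixed point, approximates the slope of the map at R = 1, and iterates from a point to the left of the monetary state.

```python
def ces_map(R, eta):
    """One step of Eq. (3.47); returns None where the map is not defined."""
    base = R ** (-eta) + R ** (-1.0) - 1.0
    if base <= 0.0:
        return None
    return base ** (1.0 / (1.0 - eta))


def iterate(R0, eta, steps):
    """Forward path of the interest factor, stopped if the map is undefined."""
    path = [R0]
    for _ in range(steps):
        nxt = ces_map(path[-1], eta)
        if nxt is None:
            break
        path.append(nxt)
    return path


def slope_at_monetary_state(eta, h=1e-6):
    """Central finite-difference approximation of dR_{t+1}/dR_t at R = 1."""
    return (ces_map(1.0 + h, eta) - ces_map(1.0 - h, eta)) / (2.0 * h)


path = iterate(0.9, eta=2.0, steps=8)
print(path)
print(slope_at_monetary_state(2.0))
```

With η = 2 and a starting point slightly below one, the path moves away from the monetary state and decreases towards the autarkic state, while the numerical slope at R = 1 is close to (η + 1)/(η − 1).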

When η > 1, the dynamics are quite simple to study: in general, one has Ra = 0 and R = 1,
and Ra is stable and R is unstable. Figure 3.5 illustrates this situation in the case η = 2.
When η < 1, the situation is more delicate. In this case, Ra is generically ill-defined, and
R = 1 is not necessarily stable. One can observe dynamics converging towards R, or even the
emergence of more or less “regular” cycles. In this respect, it is important to examine the slope
of the map in (3.47) in correspondence with R = 1:
\[ \left. \frac{dR_{t+1}}{dR_t} \right|_{R_{t+1} = R_t = 1} = \frac{\eta + 1}{\eta - 1}. \]
FIGURE 3.6. Gross substitutability. (The map $R_t \mapsto R_{t+1}$, with the autarchic state A at $R_a$ and the monetary state M at R.)
Here are some hints concerning the general case. Figure 3.6 depicts the shape of the map
$R_t \mapsto R_{t+1}$ in the case of gross substitutability (in fact, the following arguments can also be
adapted verbatim to the complementarity case whenever $\forall x$, $\frac{\mathrm{sav}'(x)\, x}{\mathrm{sav}(x)} < -1$: indeed, in this case
$\frac{dR_{t+1}}{dR_t} > 0$ since the numerator is negative and the denominator is also negative by assumption.
Such a case does not have any significant economic content, however). This is an increasing
function since the slope $\frac{dR_{t+1}}{dR_t} = \frac{\mathrm{sav}'(R_t) R_t + \mathrm{sav}(R_t)}{\mathrm{sav}'(R_{t+1})} \frac{1+\mu_t}{1+n} > 0$. In addition, the slope (3.46) computed
in correspondence with the monetary state $R = \frac{1+n}{1+\mu}$ is:
\[ \left. \frac{dR_{t+1}}{dR_t} \right|_{R_{t+1} = R_t = R} = 1 + \frac{\mathrm{sav}(R)}{R\, \mathrm{sav}'(R)}. \]
This is always greater than 1 if sav′ (.) > 0, and in this case the monetary state M is unstable
and the autarchic state A is stable. In particular, all paths starting from the right of M are
unstable. They imply an increasing sequence of R, i.e. a decreasing sequence of p. This cannot
be an equilibrium because it contradicts the budget constraints (in fact, there would not
be a solution to the agents’ programs). It is necessary that the economy starts between A and M ,
but we are not endowed with additional pieces of information: there is a continuum of points
R1 ∈ [Ra , R) which are candidates for the beginning of the equilibrium sequence. Contrary to
the models of the previous sections, here we have indeterminacy of the equilibrium, which is
parametrized by p0 .
Is the autarchic state the only possible stable configuration? The answer is no. It is sufficient
that the map $R_t \mapsto R_{t+1}$ bends backwards and that $\left. \frac{dR_{t+1}}{dR_t} \right|_{R_{t+1}=R_t=R} < -1$ to make M a stable
state. A condition for the curve to bend backward is that $\frac{\mathrm{sav}'(R) R}{\mathrm{sav}(R)} > -1$, and the condition for
$\left. \frac{dR_{t+1}}{dR_t} \right|_{R_{t+1}=R_t=R} < -1$ to hold is that $\frac{\mathrm{sav}'(R) R}{\mathrm{sav}(R)} > -\frac{1}{2}$. If $\frac{\mathrm{sav}'(R) R}{\mathrm{sav}(R)} > -\frac{1}{2}$, M is attained starting
from a sufficiently small neighborhood of M . Figure 3.7 shows the emergence of a cycle of
order two,8 in which $R_{**} = \frac{R^2}{R_*}$.9 Notice that in this case, the dynamics of the system has been
analyzed in a backward-looking manner, not in a forward-looking manner. The reason is that
there is an indeterminacy of the forward-looking dynamics, and it is thus necessary to analyze
8 There are more complicated situations in which cycles of order 3 may exist, whence the emergence of what is known as “chaotic”
trajectories.
9 Here is the proof. Starting from relation (3.45), we have that for a 2-cycle,
\[ \frac{1+\mu}{1+n} R_* \, s(R_*) = s(R_{**}), \qquad \frac{1+\mu}{1+n} R_{**} \, s(R_{**}) = s(R_*). \]
By multiplying the two sides of these equations, and recalling that $R = \frac{1+n}{1+\mu}$, one recovers the desired result.
FIGURE 3.7. (The map $R_t \mapsto R_{t+1}$ around the monetary state M, with the points R*, R and R** marked on the $R_t$ axis.)
the system dynamics in a backward-looking manner ... In any case, the condition sav′ (R) < 0
is not appealing from an economic standpoint.
3.5.4 Money in a model with real shocks

The Lucas (1972) model10 is the first to address issues concerning the neutrality of money in a
context of overlapping generations with stochastic features. Here we present a very simplified
version of the model.11
The agent works during the first period and consumes in the second period: this is, of course,
nothing but a pleasant simplification. Disutility of work is −v. We suppose that v′ (x) > 0,
v ′′ (x) > 0, ∀x ∈ R+ . The second period consumption utility is u, and has the usual properties.
The program is:
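\[
\max_{\{n, c\}} \left\{ -v(n_t) + \beta E\left[ u(c_{t+1}) \mid \mathcal{F}_t \right] \right\}
\quad \text{s.t.} \quad
\begin{cases}
m = p_t y_t, \quad y_t = \epsilon_t n_t \\
p_{t+1} c_{t+1} = m
\end{cases}
\]
where m denotes money, y denotes the production obtained by means of labour n, and $\{\epsilon_t\}_{t=0,1,\cdots}$
is a sequence of strictly positive shocks affecting labour productivity. The price level is p.
Let us rewrite the program by using the single constraint $c_{t+1} = R_{t+1} \epsilon_t n_t$, $R_{t+1} \equiv \frac{p_t}{p_{t+1}}$,
\[ \max_{n} \left\{ -v(n_t) + \beta E\left[ u(R_{t+1} \epsilon_t n_t) \mid \mathcal{F}_t \right] \right\}. \]
The first-order condition is $v'(n_t) = \beta E\left( u'(R_{t+1} \epsilon_t n_t) R_{t+1} \epsilon_t \mid \mathcal{F}_t \right)$ ⇔
\[ v'(n_t) \frac{1}{p_t \epsilon_t} = \beta E\left[ u'(R_{t+1} \epsilon_t n_t) \frac{1}{p_{t+1}} \,\Big|\, \mathcal{F}_t \right]. \]
At the equilibrium, $y_t = c_t \Rightarrow c_t = \epsilon_t \cdot n_t$, and
\[ c_{t+1} = \frac{m}{p_{t+1}} = \epsilon_{t+1} \cdot n_{t+1}. \]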
10 Lucas, R.E., Jr. (1972): “Expectations and the Neutrality of Money,” J. Econ. Theory, 4, 103-124.
11 Parts of this simplified version of the model are taken from Stokey and Lucas (1989, p. 504): Stokey, N.L. and R.E. Lucas (with
E.C. Prescott) (1989): Recursive Methods in Economic Dynamics, Harvard University Press.
By replacing the previous relation into the first-order condition, and simplifying,
\[ v'(n_t) n_t = \beta E\left[ u'(\epsilon_{t+1} \cdot n_{t+1}) (\epsilon_{t+1} \cdot n_{t+1}) \mid \mathcal{F}_t \right]. \tag{3.48} \]
As in the model of section 3.6, the rational expectation assumption consists in regarding all the
model’s variables as functions of the state variables. Here, the states of nature are generated
by ε, and we have:
\[ n = n(\epsilon). \]
By plugging n(ε) into (3.48) we get:
\[ v'(n(\epsilon))\, n(\epsilon) = \beta \int_{\mathrm{supp}(\epsilon)} u'\!\left( \epsilon^+ n(\epsilon^+) \right) \epsilon^+ n(\epsilon^+) \, dP(\epsilon^+ \mid \epsilon). \tag{3.49} \]
A rational expectations equilibrium is a statistical sequence n(ε) satisfying the functional
equation (3.49).
Eq. (3.49) considerably simplifies when shocks are i.i.d.: $P(\epsilon^+ \mid \epsilon) = P(\epsilon^+)$,
\[ v'(n(\epsilon))\, n(\epsilon) = \beta \int_{\mathrm{supp}(\epsilon)} u'\!\left( \epsilon^+ n(\epsilon^+) \right) \epsilon^+ n(\epsilon^+) \, dP(\epsilon^+). \tag{3.50} \]
In this case, the r.h.s. of the previous relationship does not depend on ε, which implies that
the l.h.s. does not depend on ε either. Therefore, the only candidate for the solution for n is
a constant n̄:12
\[ n(\epsilon) = \bar{n}, \quad \forall \epsilon. \]
Provided such an n̄ exists, this is a result on money neutrality. More precisely, relation (3.50)
can be written as:
\[ v'(\bar{n}) = \beta \int_{\mathrm{supp}(\epsilon)} u'(\epsilon^+ \bar{n})\, \epsilon^+ \, dP(\epsilon^+), \]
and it is always possible to impose reasonable conditions on v and u that ensure existence and
uniqueness of a strictly positive solution for n̄, as in the following example.
Example. $v(x) = \frac{1}{2} x^2$ and $u(x) = \ln x$. The solution is $\bar{n} = \sqrt{\beta}$, $y(\epsilon) = \epsilon \sqrt{\beta}$ and $p(\epsilon) = \frac{m}{\epsilon \sqrt{\beta}}$.
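The constant-labour condition for i.i.d. shocks can also be solved numerically. The sketch below is only an illustration under assumptions that go beyond the example above: it keeps v(x) = x²/2, but replaces log utility with a CRRA marginal utility u′(x) = x^(−γ) (γ = 1 recovers the log case), and uses a hypothetical three-point shock distribution. The function name and the bisection bounds are likewise assumptions.

```python
import math


def solve_constant_labour(beta, shocks, probs, gamma=1.0,
                          lo=1e-8, hi=1e3, tol=1e-12):
    """Solve v'(n) = beta * E[u'(eps * n) * eps] for a constant n by bisection.

    Assumes v(x) = x**2 / 2 (so v'(n) = n) and u'(x) = x**(-gamma).
    shocks and probs describe a discrete i.i.d. distribution for eps.
    """
    def excess(n):
        rhs = beta * sum(p * (e * n) ** (-gamma) * e
                         for e, p in zip(shocks, probs))
        return n - rhs          # increasing in n, so bisection applies

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


shocks = [0.8, 1.0, 1.2]        # hypothetical productivity levels
probs = [0.3, 0.4, 0.3]         # hypothetical probabilities
n_log = solve_constant_labour(0.96, shocks, probs, gamma=1.0)
print(n_log, math.sqrt(0.96))
```

With γ = 1 the numerical solution should coincide with √β whatever the shock distribution, which is the neutrality result of the example; for other values of γ the solution depends on the distribution of ε but remains independent of the realized shock.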
Exercise. Extend the previous model when the money supply follows the stochastic process:
$\frac{\Delta m_t}{m_{t-1}} = \mu_t$, where $\{\mu_t\}_{t=0,1,\cdots}$ is an i.i.d. sequence of shocks.
3.6 Optimality
3.6.1 Models with productive capital
The starting point is the relation
Kt+1 = St = Y (Kt , Nt ) − Ct , K0 given.
12 A rigorous proof that n(ε) = n̄, ∀ε is as follows. Let’s suppose the contrary, i.e. there exists a point ε0 and a neighborhood
of ε0 such that either n(ε0 + A) > n(ε0) or n(ε0 + A) < n(ε0), where the constant A > 0. Let’s consider the first case (the proof
of the second case being entirely analogous). Since the r.h.s. of (3.50) is constant for all ε, we have, v′(n(ε0 + A)) · n(ε0 + A) =
v′(n(ε0)) · n(ε0) ≤ v′(n(ε0 + A)) · n(ε0), where the inequality is due to the assumption that v′′ > 0 always holds. We have thus
shown that, v′(n(ε0 + A)) · [n(ε0 + A) − n(ε0)] ≤ 0. Now v′ > 0 always holds, so that n(ε0 + A) ≤ n(ε0), a contradiction with
the assumption n(ε0 + A) > n(ε0).
Dividing both sides by Nt one gets:
\[ k_{t+1} = \frac{1}{1+n} \left( y(k_t) - c_t \right), \quad k_0 \text{ given and: } \inf k_t \geq 0, \ \sup k_t < \infty. \tag{3.51} \]
The stationary state of this economy is:
c = y(k) − (1 + n)k,
and we see that the per-capita consumption attains its maximum at:
k̄ : y ′ (k̄) = 1 + n.
This is the golden rule.
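As a quick numerical illustration of the golden rule, the sketch below assumes a Cobb-Douglas technology y(k) = k^α, which is an assumption made here for concreteness (the text leaves y general), together with hypothetical parameter values. It computes the capital stock at which y′(k) = 1 + n and compares stationary consumption there with nearby capital levels.

```python
def golden_rule_capital(alpha, n):
    """Capital per head solving y'(k) = 1 + n when y(k) = k**alpha (assumed)."""
    return (alpha / (1.0 + n)) ** (1.0 / (1.0 - alpha))


def stationary_consumption(k, alpha, n):
    """Stationary per-capita consumption c = y(k) - (1 + n) k."""
    return k ** alpha - (1.0 + n) * k


alpha, n = 0.3, 0.02            # hypothetical parameter values
k_bar = golden_rule_capital(alpha, n)
c_bar = stationary_consumption(k_bar, alpha, n)
print(k_bar, c_bar)
```

Under these assumptions, stationary consumption is lower both slightly below and slightly above the golden-rule capital stock, consistent with a maximum at that point.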
If y′(k) < 1 + n, it is always possible to increase per-capita consumption at the stationary
state; indeed, since y(k) is given, one can decrease k and obtain dc = −(1 + n)dk > 0, and in
the next periods, one has dc = (y′(k) − (1 + n)) dk > 0. We wish to provide a formal proof of
these facts along the entire capital accumulation path of the economy. Notice that the foregoing
results are valid whatever the structure of the economy is (e.g., a finite number of agents living
forever or overlapping generations). As an example, in the overlapping generations case one can
interpret relation (3.51) as the one describing the capital accumulation path in Diamond’s
model once $c_t$ is interpreted as: $c_t \equiv \frac{C_t}{N_t} = c_{1t} + \frac{c_{2,t+1}}{1+n}$.
The notion of efficiency that we use is the following one:13 a path $\{(k, c)_t\}_{t=0}^{\infty}$ is consumption-inefficient
if there exists another path $\{(\tilde{k}, \tilde{c})_t\}_{t=0}^{\infty}$ satisfying (3.51) and such that, for all t,
$\tilde{c}_t \geq c_t$, with at least one strict inequality.
We now present a weaker version of thm. 1 p. 161 of Tirole (1988), but easier to show.14
Theorem 3.2 ((weak version of the) Cass-Malinvaud theory). (a) A path $\{(k, c)_t\}_{t=0}^{\infty}$ is
consumption efficient if $\frac{y'(k_t)}{1+n} \geq 1$ ∀t. (b) A path $\{(k, c)_t\}_{t=0}^{\infty}$ is consumption inefficient if
$\frac{y'(k_t)}{1+n} < 1$ ∀t.
Proof. (a) Let $\tilde{k}_t = k_t + \epsilon_t$, t = 0, 1, · · ·, an alternative consumption efficient path. Since $k_0$
is given, $\epsilon_0 = 0$. Furthermore, by relation (3.51) one has that
\[ (1 + n) \cdot (\tilde{k}_{t+1} - k_{t+1}) = y(\tilde{k}_t) - y(k_t) - (\tilde{c}_t - c_t), \]
and because $\tilde{k}$ has been supposed to be efficient, $\tilde{c}_t \geq c_t$ with at least one strict inequality over
the ts. Therefore, by concavity of y,
\[
\begin{aligned}
0 &\leq y(\tilde{k}_t) - y(k_t) - (1 + n)(\tilde{k}_{t+1} - k_{t+1}) \\
  &= y(\tilde{k}_t) - y(k_t) - (1 + n)\epsilon_{t+1} \\
  &< y(k_t) + y'(k_t)\epsilon_t - y(k_t) - (1 + n)\epsilon_{t+1},
\end{aligned}
\]
or
\[ \epsilon_{t+1} < \frac{y'(k_t)}{1+n} \epsilon_t. \]
13 Tirole, J. (1988): “Efficacité intertemporelle, transferts intergénérationnels et formation du prix des actifs: une introduction,”
in: Mélanges économiques. Essais en l’honneur de Edmond Malinvaud. Paris: Editions Economica & Editions EHESS, p. 157-185.
14 The proof we present here appears in Touzé, V. (1999): Financement de la sécurité sociale et équilibre entre les générations,
unpublished PhD dissertation Univ. Paris X Nanterre.

FIGURE 3.8. Non-necessity of the conditions of thm. 3.2 in the model with a representative agent. (Plot of $y'(k_t)$ against $k_t$, with the level $\beta^{-1}$ and the points k and k* marked.)
Evaluating the previous inequality at t = 0 yields $\epsilon_1 < \frac{y'(k_0)}{1+n} \epsilon_0$, and since $\epsilon_0 = 0$, one has that
$\epsilon_1 < 0$. Since $\frac{y'(k_t)}{1+n} \geq 1$ ∀t, $\epsilon_t \to -\infty$ as $t \to \infty$, which contradicts (3.51).
(b) The proof is nearly identical to the one of part (a) with the obvious exception that
$\liminf \epsilon_t > -\infty$ here. Furthermore, note that there are infinitely many such sequences that
allow for efficiency improvements.
Are actual economies dynamically efficient? To address this issue, Abel et al. (1989)15 modified
somewhat the previous setup to include uncertainty, and conclude that the US economy
does satisfy their dynamic efficiency requirements.
The conditions of the previous theorem are somewhat restrictive. As an example, let us take the
model of section 3.2 and fix, as in section 3.2, n = 0 to simplify. As long as $k_0 < k = (y')^{-1}[\beta^{-1}]$,
per-capita capital is such that $y'(k_t) > 1$ ∀t since the dynamics here is of the saddlepoint type
and then monotone (see figure 3.8). Therefore, the conditions of the theorem are fulfilled. Such
conditions also hold when $k_0 \in [k, k^*]$, again by the monotone dynamics of $k_t$. Nevertheless, the
conditions of the theorem do not hold anymore when $k_0 > k^*$ and yet, the capital accumulation
path is still efficient! While it is possible to show this with the tools of the valuation equilibria
of Debreu (1954), here we provide the proof with the same tools used to show thm. 3.2. Indeed,
let
\[ \tau = \inf \{t : k_t \leq k^*\} = \inf \{t : y'(k_t) \geq 1\}. \]
We see that τ < ∞, and since the dynamics is monotone,
\[
\begin{cases}
y'(k_t) < 1, & t = 0, 1, \cdots, \tau - 1 \\
y'(k_t) \geq 1, & t = \tau, \tau + 1, \cdots
\end{cases}
\]
By using again the same arguments used to show thm. 3.2, we see that since τ is finite, $-\infty < \epsilon_{\tau+1} < 0$.
From τ onwards, an explosive sequence ε starts unfolding, and $\epsilon_t \to -\infty$ as $t \to \infty$.
3.6.2 Models with money

The decentralized economy is characterized by the presence of money. Here we are interested
in first best optima i.e., optima that a social planner may choose by acting directly on agents
15 Abel, A.B., N.G. Mankiw, L.H. Summers and R.J. Zeckhauser (1989): “Assessing Dynamic Efficiency: Theory and Evidence,”
Review Econ. Studies, 56, 1-20.

consumptions.16 Let us first analyze the stationary state $R = \frac{1+n}{1+\mu}$ and show that it corresponds
to the stationary state in which consumptions and endowments are constants, and agents’ utility
is maximized when µ = 0. Indeed, here the social planner allocates resources without caring
about monetary phenomena, and the only constraint is the following “natural” constraint:
\[ w_n \equiv w_1 + \frac{w_2}{1+n} = c_1 + \frac{c_2}{1+n}, \]
in which case the utility of the “stationary agent” is:
\[ u(c_1, c_2) = u\left( w_n - \frac{c_2}{1+n}, \, c_2 \right). \]
The first-order condition is $\frac{u_{c_2}}{u_{c_1}} = \frac{1}{1+n}$. In the market equilibrium, the first-order condition is
$\frac{u_{c_2}}{u_{c_1}} = \frac{1}{R}$, which means that in the market equilibrium, the golden rule is attained at R if and
only if µ = 0, as claimed in section 3.?
The convergence of the optimal policy of the social planner towards the golden rule can be
verified as follows. The social planner solves the program:
\[
\max \sum_{t=0}^{\infty} \vartheta^t \cdot u^{(t)}(c_{1t}, c_{2,t+1})
\quad \text{s.t.} \quad
w_{nt} \equiv w_{1t} + \frac{w_{2t}}{1+n} = c_{1t} + \frac{c_{2t}}{1+n},
\]
or $\max \sum_{t=0}^{\infty} \vartheta^t \cdot u^{(t)}\left( w_{nt} - \frac{c_{2t}}{1+n}, \, c_{2,t+1} \right)$. Here the notation $u^{(t)}(\cdot)$ has been used to stress the fact
that endowments may change from one generation to another generation, and ϑ is the weighting
coefficient of generations that is used by the planner. The first-order conditions,
\[ \frac{u^{(t-1)}_{c_2}}{u^{(t)}_{c_1}} = \frac{\vartheta}{1+n}, \]
lead towards the modified Golden Rule at the stationary state (modified by ϑ).
16 In our terminology, a second best optimum is the one in which the social planner makes the thought experiment to let the
market “play” first (with money) and then parametrizes such virtual equilibria by µt . The resulting indirect utility functions are
expressed in terms of such µt s and after creating an aggregator of such indirect utility functions, the social planner maximises such
an aggregator with respect to µt .
3.7 Appendix 1: Finite difference equations, with economic applications

Let $z_0 \in \mathbb{R}^d$, and consider the following linear system of finite difference equations:
\[ z_{t+1} = A \cdot z_t, \quad t = 0, 1, \cdots, \tag{3A1.1} \]
where A is a conformable matrix. The solution is,
\[ z_t = v_1 \kappa_1 \lambda_1^t + \cdots + v_d \kappa_d \lambda_d^t, \]
where $\lambda_i$ and $v_i$ are the eigenvalues and the corresponding eigenvectors of A, and $\kappa_i$ are constants
which will be determined below.
The classical method of proof is based on the so-called diagonalization of system (3A1.1). Let us
consider the system of characteristic equations for A, $(A - \lambda_i I) v_i = 0_{d \times 1}$, $\lambda_i$ scalar and $v_i$ a column
vector d × 1, i = 1, · · ·, d, or in matrix form, $AP = P\Lambda$, where $P = (v_1, \cdots, v_d)$ and Λ is a diagonal
matrix with $\lambda_i$ on the diagonal. By post-multiplying by $P^{-1}$ one gets the decomposition17
\[ A = P \Lambda P^{-1}. \tag{3A1.2} \]
By replacing (3A1.2) into (3A1.1), $P^{-1} z_{t+1} = \Lambda \cdot P^{-1} z_t$, or
\[ y_{t+1} = \Lambda \cdot y_t, \quad y_t \equiv P^{-1} z_t. \]
The solution for y is:
\[ y_{it} = \kappa_i \lambda_i^t, \]
and the solution for z is: $z_t = P y_t = (v_1, \cdots, v_d) y_t = \sum_{i=1}^{d} v_i y_{it} = \sum_{i=1}^{d} v_i \kappa_i \lambda_i^t$.
To determine the vector of the constants $\kappa = (\kappa_1, \cdots, \kappa_d)^\top$, we first evaluate the solution at t = 0,
\[ z_0 = (v_1, \cdots, v_d) \kappa = P \kappa, \]
whence
\[ \hat{\kappa} \equiv \kappa(P) = P^{-1} z_0, \]
where the columns of P are vectors in the space of the eigenvectors. Naturally, there is an infinity of
such P s, but the previous formula shows how κ(P ) must “adjust” to guarantee the stability of the
solution with respect to changes of P .
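The diagonalization argument can be checked numerically. The following is a minimal sketch using NumPy; the matrix A and the initial condition z0 are hypothetical values chosen so that the eigenvalues are real and distinct, and the function name is an assumption. It computes κ = P⁻¹z0 and evaluates the closed-form solution, which can then be compared with direct iteration of the system.

```python
import numpy as np


def solve_linear_system(A, z0, t):
    """Closed-form solution z_t = sum_i v_i * kappa_i * lambda_i**t."""
    lam, P = np.linalg.eig(A)        # columns of P are the eigenvectors v_i
    kappa = np.linalg.solve(P, z0)   # kappa = P^{-1} z0
    return (P * kappa) @ lam ** t    # scale column i by kappa_i, weight by lambda_i^t


A = np.array([[0.5, 0.2],
              [0.1, 1.2]])           # hypothetical matrix with real eigenvalues
z0 = np.array([1.0, -0.5])           # hypothetical initial condition
z5 = solve_linear_system(A, z0, 5)
direct = np.linalg.matrix_power(A, 5) @ z0
print(z5, direct)
```

The two vectors should agree up to rounding error. A different normalization of the eigenvectors returned by the routine changes κ but not the product of each eigenvector with its constant, which is the “adjustment” of κ(P) mentioned above.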
3A.1 Example. d = 2. Let us suppose that λ1 ∈ (0, 1), λ2 > 1. The system is unstable in correspondence
with any initial condition but a set of zero measure. This set gives rise to the so-called
saddlepoint path. Let us compute its coordinates. The strategy consists in finding the set of initial
conditions for which κ2 = 0. Let us evaluate the solution at t = 0,
\[
\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = z_0 = P \kappa = (v_1, v_2) \begin{pmatrix} \kappa_1 \\ \kappa_2 \end{pmatrix}
= \begin{pmatrix} v_{11} \kappa_1 + v_{12} \kappa_2 \\ v_{21} \kappa_1 + v_{22} \kappa_2 \end{pmatrix},
\]
where we have set $z = (x, y)^\top$. By replacing the second equation into the first one and solving for κ2,
\[ \kappa_2 = \frac{v_{11} y_0 - v_{21} x_0}{v_{11} v_{22} - v_{12} v_{21}}. \]
This is zero when
\[ y_0 = \frac{v_{21}}{v_{11}} x_0. \]
FIGURE 3.9. (The saddlepoint path $y = (v_{21}/v_{11})\, x$ in the (x, y) plane, with the initial point $(x_0, y_0)$.)
Here the saddlepoint is a line with slope equal to the ratio of the components of the eigenvector
associated with the root with modulus less than one. The situation is represented in figure 3.9, where
the “divergent” line has as equation $y_0 = \frac{v_{22}}{v_{12}} x_0$, and corresponds to the case κ1 = 0.
The economic content of the saddlepoint is the following one: if x is a predetermined variable, y
must “jump” to $y_0 = \frac{v_{21}}{v_{11}} x_0$ to make the system display a non-explosive behavior. Notice that there is
a major conceptual difficulty when the system includes two predetermined variables, since in this case
there are generically no stable solutions. Such a possibility is unusual in economics, however.
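3A.2 Example. The previous example can be generated by the neoclassic growth model. In section
3.2.3, we showed that in a small neighborhood of the stationary values k, c, the dynamics of $(\hat{k}_t, \hat{c}_t)_t$
(deviations of capital and consumption from their respective stationary values k, c) is:
\[ \begin{pmatrix} \hat{k}_{t+1} \\ \hat{c}_{t+1} \end{pmatrix} = A \begin{pmatrix} \hat{k}_t \\ \hat{c}_t \end{pmatrix} \]
where
\[
A \equiv \begin{pmatrix}
y'(k) & -1 \\
-\dfrac{u'(c)}{u''(c)} y''(k) & 1 + \beta \dfrac{u'(c)}{u''(c)} y''(k)
\end{pmatrix}, \quad \beta \in (0, 1).
\]
By using the relationship βy′(k) = 1, and the conditions imposed on u and y, we have
- $\det(A) = y'(k) = \beta^{-1} > 1$;
- $\mathrm{tr}(A) = \beta^{-1} + 1 + \beta \frac{u'(c)}{u''(c)} y''(k) > 1 + \det(A)$.
The two eigenvalues are solutions of a quadratic equation, and are: $\lambda_{1/2} = \frac{\mathrm{tr}(A) \mp \sqrt{\mathrm{tr}(A)^2 - 4 \det(A)}}{2}$. Now,
\[
\begin{aligned}
a &\equiv \mathrm{tr}(A)^2 - 4 \det(A) \\
  &= \left( \beta^{-1} + 1 + \beta \frac{u'(c)}{u''(c)} y''(k) \right)^2 - 4 \beta^{-1} \\
  &> \left( \beta^{-1} + 1 \right)^2 - 4 \beta^{-1} \\
  &= \left( 1 - \beta^{-1} \right)^2 \\
  &> 0.
\end{aligned}
\]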
17 The previous decomposition is known as the spectral decomposition if P ⊤ = P −1 . When it is not possible to diagonalize A,
one may make reference to the canonical transformation of Jordan.

It follows that $\lambda_2 = \frac{\mathrm{tr}(A)}{2} + \frac{\sqrt{a}}{2} > \frac{1 + \det(A)}{2} + \frac{\sqrt{a}}{2} > 1 + \frac{\sqrt{a}}{2} > 1$.
To show that λ1 ∈ (0, 1), notice first that since det(A) > 0, $2\lambda_1 = \mathrm{tr}(A) - \sqrt{\mathrm{tr}(A)^2 - 4 \det(A)} > 0$.
It remains to be shown that λ1 < 1. But $\lambda_1 < 1 \iff \mathrm{tr}(A) - \sqrt{\mathrm{tr}(A)^2 - 4 \det(A)} < 2$,
or $(\mathrm{tr}(A) - 2)^2 < \mathrm{tr}(A)^2 - 4 \det(A)$, which can be confirmed to be always
true by very simple calculations.
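The bounds on the two roots can be checked on a numerical example. In the sketch below, the single number g stands for the product (u′(c)/u′′(c))·y′′(k), which is positive under the usual assumptions on u and y; the values of β and g, as well as the function name, are hypothetical and serve only as an illustration.

```python
import math


def growth_model_eigenvalues(beta, g):
    """Roots of the linearized neoclassical growth model.

    g is a stand-in for (u'(c)/u''(c)) * y''(k) > 0 (hypothetical value).
    Uses det(A) = 1/beta and tr(A) = 1/beta + 1 + beta * g.
    """
    det = 1.0 / beta
    tr = 1.0 / beta + 1.0 + beta * g
    root = math.sqrt(tr ** 2 - 4.0 * det)
    return (tr - root) / 2.0, (tr + root) / 2.0


lam1, lam2 = growth_model_eigenvalues(beta=0.95, g=0.3)
print(lam1, lam2)
```

For these values the smaller root lies in (0, 1) and the larger one exceeds 1, so the stationary state is a saddlepoint, as the inequalities above imply for any admissible β and g.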
Next, we wish to generalize the previous examples to the case d > 2. The counterpart of the
saddlepoint seen before is called the convergent, or stable subspace: it is the locus of points for which
the solution does not explode. (In the case of nonlinear systems, such a convergent subspace is termed
convergent, or stable manifold. In this appendix we only study linear systems.)
Let $\Pi \equiv P^{-1}$, and rewrite the system determining the solution for κ:
\[ \hat{\kappa} = \Pi z_0. \]
We suppose that the elements of z and matrix A have been reordered in such a way that ∃s : $|\lambda_i| < 1$,
for i = 1, · · ·, s and $|\lambda_i| > 1$ for i = s + 1, · · ·, d. Then we partition Π in such a way that:
\[ \hat{\kappa} = \begin{pmatrix} \underset{s \times d}{\Pi_s} \\ \underset{(d-s) \times d}{\Pi_u} \end{pmatrix} z_0. \]
As in example (3A.1), the objective is to make the system “stay prisoner” of the convergent space,
which requires that
\[ \hat{\kappa}_{s+1} = \cdots = \hat{\kappa}_d = 0, \]
or, by exploiting the previous system,
\[ \begin{pmatrix} \hat{\kappa}_{s+1} \\ \vdots \\ \hat{\kappa}_d \end{pmatrix} = \underset{(d-s) \times d}{\Pi_u} z_0 = 0_{(d-s) \times 1}. \]
Let d ≡ k + k∗ (k free and k∗ predetermined), and partition $\Pi_u$ and $z_0$ in such a way to distinguish
the predetermined from the free variables:
\[
0_{(d-s) \times 1} = \underset{(d-s) \times d}{\Pi_u} z_0
= \begin{pmatrix} \underset{(d-s) \times k}{\Pi_u^{(1)}} & \underset{(d-s) \times k^*}{\Pi_u^{(2)}} \end{pmatrix}
\begin{pmatrix} \underset{k \times 1}{z_0^{\mathrm{free}}} \\ \underset{k^* \times 1}{z_0^{\mathrm{pre}}} \end{pmatrix}
= \Pi_u^{(1)} z_0^{\mathrm{free}} + \Pi_u^{(2)} z_0^{\mathrm{pre}},
\]
or,
\[ \underset{(d-s) \times k}{\Pi_u^{(1)}} \; \underset{k \times 1}{z_0^{\mathrm{free}}} = - \underset{(d-s) \times k^*}{\Pi_u^{(2)}} \; \underset{k^* \times 1}{z_0^{\mathrm{pre}}}. \]
The previous system has d − s equations and k unknowns (the components of $z_0^{\mathrm{free}}$): this is so because
$z_0^{\mathrm{pre}}$ is known (it is the k∗-dimensional vector of the predetermined variables) and $\Pi_u^{(1)}$, $\Pi_u^{(2)}$ are
primitive data of the economy (they depend on A). We assume that the (d − s) × k matrix $\Pi_u^{(1)}$ has full rank.
Therefore, there are three cases: 1) s = k∗ ; 2) s < k∗ ; and 3) s > k∗ . Before analyzing these cases,
let us mention a word on terminology. We shall refer to s as the dimension of the convergent subspace
(S). The reason is the following one. Consider the solution:
\[ z_t = v_1 \hat{\kappa}_1 \lambda_1^t + \cdots + v_s \hat{\kappa}_s \lambda_s^t + v_{s+1} \hat{\kappa}_{s+1} \lambda_{s+1}^t + \cdots + v_d \hat{\kappa}_d \lambda_d^t. \]
In order to be in S, it must be the case that
\[ \hat{\kappa}_{s+1} = \cdots = \hat{\kappa}_d = 0, \]
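in which case the solution reduces to:
\[ \underset{d \times 1}{z_t} = v_1 \hat{\kappa}_1 \lambda_1^t + \cdots + v_s \hat{\kappa}_s \lambda_s^t = (v_1 \hat{\kappa}_1, \cdots, v_s \hat{\kappa}_s) \cdot \left( \lambda_1^t, \cdots, \lambda_s^t \right)^\top, \]
i.e.,
\[ \underset{d \times 1}{z_t} = \underset{d \times s}{\hat{V}} \cdot \underset{s \times 1}{\hat{\lambda}_t}, \]
where $\hat{V} \equiv (v_1 \hat{\kappa}_1, \cdots, v_s \hat{\kappa}_s)$ and $\hat{\lambda}_t \equiv \left( \lambda_1^t, \cdots, \lambda_s^t \right)^\top$. Now for each t, introduce the vector subspace:
\[ \langle \hat{V} \rangle_t \equiv \{ z_t \in \mathbb{R}^d : z_t = \hat{V} \cdot \hat{\lambda}_t, \ \hat{\lambda}_t \in \mathbb{R}^s \}. \]
Clearly, for each t, $\dim \langle \hat{V} \rangle_t = \mathrm{rank}(\hat{V}) = s$, and we are done.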

We analyze these three cases below.
(i) d − s = k, or s = k∗ . The dimension of the divergent subspace is equal to the number of the free
variables or, the dimension of the convergent subspace is equal to the number of predetermined
variables. In this case, the system is determined. The previous conditions are easy to interpret.
The predetermined variables identify one and only one point in the convergent space, which
allows us to compute the only possible jump that the free variables can make for the system
to remain in the convergent space: $z_0^{\mathrm{free}} = -\left( \Pi_u^{(1)} \right)^{-1} \Pi_u^{(2)} z_0^{\mathrm{pre}}$. This is exactly
the case of the previous examples, in which d = 2, k = 1, and the predetermined variable was
x: there x0 identified one and only one point in the saddlepoint path, and starting from such a
point, there was one and only one y0 guaranteeing that the system does not explode.
(ii) d − s > k, or s < k∗ . There are generically no solutions in the convergent space. This case was
already mentioned at the end of example 3A.1.
(iii) d − s < k, or s > k∗ . There exists an infinite number of solutions in the convergent space, and
such a phenomenon is typically referred to as indeterminacy. In the previous example, s = 1,
and this case may emerge only in the absence of predetermined variables. This is also the case
in which sunspots may arise.
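The determinate case (i) lends itself to a short numerical sketch. The code below assumes NumPy; the function name, the index arguments, and the matrix (built from the linearized growth model with hypothetical values β = 0.95 and (u′/u′′)·y′′ = 0.3) are all assumptions made for the illustration. It extracts the rows of Π = P⁻¹ associated with the unstable roots, splits them into the columns of free and predetermined variables, and solves for the jump of the free variables.

```python
import numpy as np


def saddle_jump(A, z0_pre, pre_idx, free_idx):
    """Jump of the free variables keeping z_t in the convergent subspace.

    pre_idx / free_idx: positions of predetermined and free variables in z.
    Only the determinate case d - s = k is handled.
    """
    lam, P = np.linalg.eig(A)
    Pi = np.linalg.inv(P)                    # Pi = P^{-1}
    Pi_u = Pi[np.abs(lam) > 1.0, :]          # rows tied to the unstable roots
    Pi_u_free = Pi_u[:, free_idx]            # Pi_u^(1)
    Pi_u_pre = Pi_u[:, pre_idx]              # Pi_u^(2)
    if Pi_u_free.shape[0] != Pi_u_free.shape[1]:
        raise ValueError("not the determinate case d - s = k")
    return -np.linalg.solve(Pi_u_free, Pi_u_pre @ z0_pre)


beta, g = 0.95, 0.3                          # hypothetical values
A = np.array([[1.0 / beta, -1.0],
              [-g, 1.0 + beta * g]])         # z = (capital, consumption) deviations
x0 = np.array([1.0])                         # predetermined capital deviation
y0 = saddle_jump(A, x0, pre_idx=[0], free_idx=[1])
z0 = np.array([x0[0], y0[0]])
print(y0, np.linalg.matrix_power(A, 20) @ z0)
```

In this two-dimensional example the jump should coincide with the saddlepoint slope v21/v11 times x0, and iterating the system from the resulting initial condition should bring the state close to the origin rather than letting it explode.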
3.8 Appendix 2: Neoclassic growth model - continuous time

3.8.1 Convergence results
Consider chopping time in the population growth law as,
\[ N_{hk} - N_{h(k-1)} = \bar{n} \cdot h \cdot N_{h(k-1)}, \quad k = 1, \cdots, \ell, \]
where n̄ is an instantaneous rate, and $\ell = \frac{t}{h}$ is the number of subperiods in which we have chopped a
given time period t. The solution is $N_{h\ell} = (1 + \bar{n} h)^{\ell} N_0$, or
\[ N_t = (1 + \bar{n} \cdot h)^{t/h} N_0. \]
By taking limits:
\[ N(t) = \lim_{h \downarrow 0} (1 + \bar{n} \cdot h)^{t/h} N(0) = e^{\bar{n} t} N(0). \]
On the other hand, an exact discretization yields:
\[ N(t - \Delta) = e^{\bar{n}(t - \Delta)} N(0) \iff \frac{N(t)}{N(t - \Delta)} = e^{\bar{n} \Delta} \equiv 1 + n_\Delta \iff \bar{n} = \frac{1}{\Delta} \ln(1 + n_\Delta). \]
E.g., ∆ = 1, $n_\Delta = n_1 \equiv n$: $\bar{n} = \ln(1 + n)$.
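The limit and the exact discretization above can be checked with a few lines of code. This is a minimal sketch; the function names, the growth rate of 2% per period and the horizon t = 10 are hypothetical choices made for the illustration.

```python
import math


def discrete_population(n_bar, t, h, N0=1.0):
    """Population after time t when growth is compounded over steps of length h."""
    return (1.0 + n_bar * h) ** (t / h) * N0


def instantaneous_rate(n_delta, delta=1.0):
    """Instantaneous rate consistent with growth n_delta over an interval delta."""
    return math.log(1.0 + n_delta) / delta


n_bar = instantaneous_rate(0.02)             # n = 2% per period (hypothetical)
exact = math.exp(n_bar * 10.0)               # continuous-time population at t = 10
errors = [abs(discrete_population(n_bar, 10.0, h) - exact)
          for h in (1.0, 0.1, 0.01)]
print(n_bar, errors)
```

The approximation error shrinks as the step h gets smaller, and the instantaneous rate reproduces the discrete growth factor exactly over one period, since exp(n̄) = 1 + n.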
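Now let’s try to do the same thing for the capital accumulation law:
\[ K_{h(k+1)} = \left( 1 - \bar{\delta} \cdot h \right) K_{hk} + I_{h(k+1)} \cdot h, \quad k = 0, \cdots, \ell - 1, \]
where δ̄ is an instantaneous rate.
By iterating as in the population growth case we get:
\[ K_{h\ell} = \left( 1 - \bar{\delta} \cdot h \right)^{\ell} K_0 + \sum_{j=1}^{\ell} \left( 1 - \bar{\delta} \cdot h \right)^{\ell - j} I_{hj} \cdot h, \quad \ell = \frac{t}{h}, \]
or
\[ K_t = \left( 1 - \bar{\delta} \cdot h \right)^{t/h} K_0 + \sum_{j=1}^{t/h} \left( 1 - \bar{\delta} \cdot h \right)^{t/h - j} I_{hj} \cdot h. \]
As h ↓ 0 we get:
\[ K(t) = e^{-\bar{\delta} t} K_0 + e^{-\bar{\delta} t} \int_0^t e^{\bar{\delta} u} I(u) \, du, \]
or in differential form:
\[ \dot{K}(t) = -\bar{\delta} K(t) + I(t), \]
and starting from the IS equation:
\[ Y(t) = C(t) + I(t), \]
we obtain the capital accumulation law:
\[ \dot{K}(t) = Y(t) - C(t) - \bar{\delta} K(t). \]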

Discretization issues
An exact discretization gives:
\[ K(t + 1) = e^{-\bar{\delta}} K(t) + e^{-\bar{\delta}(t+1)} \int_t^{t+1} e^{\bar{\delta} u} I(u) \, du. \]
By identifying with the standard capital accumulation law in the discrete time setting:
\[ K_{t+1} = (1 - \delta) K_t + I_t, \]
we get:
\[ \bar{\delta} = \ln \frac{1}{(1 - \delta)}. \]
It follows that
\[ \delta \in (0, 1) \Rightarrow \bar{\delta} > 0 \quad \text{and} \quad \delta = 0 \Rightarrow \bar{\delta} = 0. \]
Hence, while δ can take on only values on [0, 1), δ̄ can take on values on the entire nonnegative real line.
An important restriction arises in the continuous time model when we note that:
\[ \lim_{\delta \to 1^-} \ln \frac{1}{(1 - \delta)} = \infty. \]
It is impossible to think about a “maximal rate of capital depreciation” in a continuous time model
because this would imply an infinite depreciation rate!
Finally, substitute δ into the exact discretization (?.?):
\[ K(t + 1) = (1 - \delta) K(t) + e^{-\bar{\delta}(t+1)} \int_t^{t+1} e^{\bar{\delta} u} I(u) \, du, \]
so that we have to interpret investments in t + 1 as $e^{-\bar{\delta}(t+1)} \int_t^{t+1} e^{\bar{\delta} u} I(u) \, du$.
Per capita dynamics
Consider dividing the capital accumulation equation by N(t):
\[ \frac{\dot{K}(t)}{N(t)} = \frac{F(K(t), N(t))}{N(t)} - c(t) - \bar{\delta} k(t) = y(k(t)) - c(t) - \bar{\delta} k(t). \]
By plugging the relationship $\dot{k} = \frac{d}{dt} \left( K(t) / N(t) \right) = \dot{K}(t) / N(t) - \bar{n} k(t)$ into the previous equation we
get:
\[ \dot{k} = y(k(t)) - c(t) - \left( \bar{\delta} + \bar{n} \right) \cdot k(t). \]
This is the constraint we use in the problem of the following subsection.
3.8.2 The model

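We consider directly the social planner problem. The program is:
\[
\begin{cases}
\max_{c} \displaystyle\int_0^{\infty} e^{-\rho t} u(c(t)) \, dt \\[1.5ex]
\text{s.t. } \dot{k}(t) = y(k(t)) - c(t) - \left( \bar{\delta} + \bar{n} \right) \cdot k(t)
\end{cases}
\tag{3A2.1}
\]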
where all variables are expressed in per-capita terms. We suppose that there is no capital depreciation
(in the discrete time model, we supposed a total capital depreciation). More general results can be
obtained with just a change in notation.
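The Hamiltonian is,
\[ H(t) = u(c(t)) + \lambda(t) \left[ y(k(t)) - c(t) - \left( \bar{\delta} + \bar{n} \right) \cdot k(t) \right], \]
where λ is a co-state variable.
As explained in Appendix 4 of the present chapter, the first-order conditions for this problem are:
\[
\begin{cases}
0 = \dfrac{\partial H}{\partial c}(t) \iff \lambda(t) = u'(c(t)) \\[1.5ex]
\dfrac{\partial H}{\partial \lambda}(t) = \dot{k}(t) \\[1.5ex]
\dfrac{\partial H}{\partial k}(t) = -\dot{\lambda}(t) + \rho \lambda(t) \iff \dot{\lambda}(t) = \left[ \rho + \bar{\delta} + \bar{n} - y'(k(t)) \right] \lambda(t)
\end{cases}
\tag{3A2.2}
\]
By differentiating the first equation in (3A2.2) we get:
\[ \dot{\lambda}(t) = \frac{u''(c(t))}{u'(c(t))} \, \dot{c}(t) \, \lambda(t). \]
By identifying with the help of the last equation in (3A2.2),
\[ \dot{c}(t) = \frac{u'(c(t))}{u''(c(t))} \left[ \rho + \bar{\delta} + \bar{n} - y'(k(t)) \right]. \tag{3A2.3} \]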
The equilibrium is the solution of the system consisting of the constraint of (3A2.1), and (3A2.3).
As in section 3.2.3, here we analyze the equilibrium dynamics of the system in a small neighborhood of
the stationary state.18 Denote the stationary state as the solution (c, k) of the constraint of program
(3A2.1), and (3A2.3) when ċ(t) = k̇(t) = 0,
\[
\begin{cases}
c = y(k) - \left( \bar{\delta} + \bar{n} \right) k \\
\rho + \bar{\delta} + \bar{n} = y'(k)
\end{cases}
\]
Warning! These are instantaneous figures, so do not worry if they are not such that
y′(k) ≥ 1 + n! A first-order approximation of both sides of the constraint of program (3A2.1) and
(3A2.3) near (c, k) yields:
\[
\begin{cases}
\dot{c}(t) = -\dfrac{u'(c)}{u''(c)} y''(k) \, (k(t) - k) \\[1.5ex]
\dot{k}(t) = \rho \cdot (k(t) - k) - (c(t) - c)
\end{cases}
\]
where we used the equality ρ + δ̄ + n̄ = y′(k). By setting x(t) ≡ c(t) − c and y(t) ≡ k(t) − k the previous
system can be rewritten as:
\[ \dot{z}(t) = A \cdot z(t), \tag{3A2.4} \]
where $z \equiv (x, y)^\top$, and
\[ A \equiv \begin{pmatrix} 0 & -\dfrac{u'(c)}{u''(c)} y''(k) \\ -1 & \rho \end{pmatrix}. \]
Warning! There must be some mistake somewhere. Let us diagonalize system (3A2.4) by
setting $A = P \Lambda P^{-1}$, where P and Λ have the same meaning as in the previous appendix. We have:
\[ \dot{\nu}(t) = \Lambda \cdot \nu(t), \]
18 In addition to the theoretical results that are available in the literature, the general case can also be treated numerically with
the tools surveyed by Judd (1998).

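where $\nu \equiv P^{-1} z$.
The eigenvalues are solutions of the following quadratic equation:
\[ 0 = \lambda^2 - \rho \lambda - \frac{u'(c)}{u''(c)} y''(k). \tag{3A2.5} \]
We see that λ1 < 0 < λ2 , and $\lambda_1 \equiv \frac{\rho}{2} - \frac{1}{2} \sqrt{\rho^2 + 4 \frac{u'(c)}{u''(c)} y''(k)}$. The solution for ν(t) is:
\[ \nu_i(t) = \kappa_i e^{\lambda_i t}, \quad i = 1, 2, \]
whence
\[ z(t) = P \cdot \nu(t) = v_1 \kappa_1 e^{\lambda_1 t} + v_2 \kappa_2 e^{\lambda_2 t}, \]
where the $v_i$s are 2 × 1 vectors. We have,
\[
\begin{cases}
x(t) = v_{11} \kappa_1 e^{\lambda_1 t} + v_{12} \kappa_2 e^{\lambda_2 t} \\
y(t) = v_{21} \kappa_1 e^{\lambda_1 t} + v_{22} \kappa_2 e^{\lambda_2 t}
\end{cases}
\]
Let us evaluate this solution in t = 0,
\[ \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = P \kappa = \begin{pmatrix} v_1 & v_2 \end{pmatrix} \begin{pmatrix} \kappa_1 \\ \kappa_2 \end{pmatrix}. \]
By repeating verbatim the reasoning of the previous appendix,
\[ \kappa_2 = 0 \iff \frac{y(0)}{x(0)} = \frac{v_{21}}{v_{11}}. \]
As in the discrete time model, the saddlepoint path is located along a line that has as a slope
the ratio of the components of the eigenvector associated with the negative root. We can explicitly
compute such a ratio. By definition, A · v1 = λ1 v1 ⇔
\[
\begin{cases}
-\dfrac{u'(c)}{u''(c)} y''(k) \, v_{21} = \lambda_1 v_{11} \\[1.5ex]
-v_{11} + \rho v_{21} = \lambda_1 v_{21}
\end{cases}
\]
i.e., $\frac{v_{21}}{v_{11}} = -\frac{\lambda_1}{\frac{u'(c)}{u''(c)} y''(k)}$ and simultaneously, $\frac{v_{21}}{v_{11}} = \frac{1}{\rho - \lambda_1}$, which can be verified with the help of (3A2.5).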
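The two expressions for the saddlepoint slope can be cross-checked numerically. In the following sketch, which assumes NumPy, the number g stands for (u′(c)/u′′(c))·y′′(k) > 0, and both g and ρ are hypothetical values; the function name is also an assumption. The code builds the matrix A of system (3A2.4), picks the eigenvector associated with the negative root, and computes the ratio of its components.

```python
import numpy as np


def saddle_slope(rho, g):
    """Negative root and saddlepoint slope v21/v11 of the linearized system.

    g is a stand-in for (u'(c)/u''(c)) * y''(k) > 0 (hypothetical value).
    """
    A = np.array([[0.0, -g],
                  [-1.0, rho]])
    lam, P = np.linalg.eig(A)
    i = int(np.argmin(lam))                  # index of the negative root lambda_1
    return lam[i], P[1, i] / P[0, i]


lam1, slope = saddle_slope(rho=0.04, g=0.25)
print(lam1, slope)
```

Under these assumptions the negative root matches the closed form ρ/2 − ½·sqrt(ρ² + 4g), and the slope agrees with both 1/(ρ − λ1) and −λ1/g, which are equal by the quadratic equation (3A2.5).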
References
Farmer, R. (1998): The Macroeconomics of Self-Fulfilling Prophecies. Boston: MIT Press.
Hayashi, F. (1982): “Tobin’s Marginal q and Average q: A Neoclassical Interpretation.” Econometrica 50, 213-224.
Kamihigashi, T. (1996): “Real Business Cycles and Sunspot Fluctuations are Observationally Equivalent.” Journal of Monetary Economics 37, 105-117.
King, R. G. and S. T. Rebelo (1999): “Resuscitating Real Business Cycles.” In: J. B. Taylor and M. Woodford (Editors): Handbook of Macroeconomics, Elsevier.
Lucas, R. E. Jr. (1978): “Asset Prices in an Exchange Economy.” Econometrica 46, 1429-1445.
Lucas, R. E. Jr. (1994): “Money and Macroeconomics.” In: General Equilibrium 40th Anniversary Conference, CORE DP no. 9482, 184-187.
Prescott, E. (1991): “Real Business Cycle Theory: What Have We Learned?” Revista de Analisis Economico 6, 3-19.
Tobin, J. (1969): “A General Equilibrium Approach to Monetary Policy.” Journal of Money, Credit and Banking 1, 15-29.
Watson, M. (1993): “Measures of Fit for Calibrated Models.” Journal of Political Economy 101, 1011-1041.
4 Continuous time models
4.1 Lambdas and betas in continuous time

4.1.1 The pricing equation
Let St be the price of a long lived asset as of time t, and let Dt be the dividend paid by the asset
at time t. In the previous chapters, we learned that in the absence of arbitrage opportunities,
St = Et [mt+1 (St+1 + Dt+1 )] , (4.1)
where Et is the conditional expectation given the information set at time t, and mt+1 is the
usual stochastic discount factor.
Next, let us introduce the pricing kernel, or state-price process,
ξ t+1 = mt+1 ξ t , ξ 0 = 1.
In terms of the pricing kernel ξ, Eq. (4.1) is,
$$0 = E_t\left(\xi_{t+1} S_{t+1} - S_t \xi_t\right) + E_t\left(\xi_{t+1} D_{t+1}\right).$$
For small trading periods h, this is,
$$0 = E_t\left(\xi_{t+h} S_{t+h} - S_t \xi_t\right) + E_t\left(\xi_{t+h} D_{t+h}\, h\right).$$
As h ↓ 0,
$$0 = E_t\left[d\left(\xi(t) S(t)\right)\right] + \xi(t) D(t)\, dt. \tag{4.2}$$
Eq. (4.2) can now be integrated to yield,
$$\xi(t) S(t) = E_t\left[\int_t^T \xi(u) D(u)\, du\right] + E_t\left[\xi(T) S(T)\right].$$
Finally, let us assume that $\lim_{T\to\infty} E_t[\xi(T) S(T)] = 0$. Then, provided it exists, the price $S_t$
of an infinitely lived asset satisfies,
$$\xi(t) S(t) = E_t\left[\int_t^\infty \xi(u) D(u)\, du\right]. \tag{4.3}$$
4.1.2 Expected returns

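Let us elaborate on Eq. (4.2). We have,
$$d(\xi S) = S\, d\xi + \xi\, dS + d\xi\, dS = \xi S \left(\frac{d\xi}{\xi} + \frac{dS}{S} + \frac{d\xi}{\xi}\frac{dS}{S}\right).$$
By replacing this expansion into Eq. (4.2) we obtain,
$$E_t\left(\frac{dS}{S}\right) + \frac{D}{S}\, dt = -E_t\left(\frac{d\xi}{\xi}\right) - E_t\left(\frac{d\xi}{\xi}\frac{dS}{S}\right). \tag{4.4}$$
This evaluation equation holds for any asset and, hence, for the assets that do not distribute
dividends and are locally riskless, i.e. $D = 0$ and $E_t\left(\frac{d\xi}{\xi}\frac{dS_0}{S_0}\right) = 0$, where $S_0(t)$ is the price of
these locally riskless assets, supposed to satisfy $\frac{dS_0(t)}{S_0(t)} = r(t)\, dt$, for some short term rate process
$r_t$. By Eq. (4.4), then,
$$E_t\left(\frac{d\xi}{\xi}\right) = -r(t)\, dt.$$
Replacing this into Eq. (4.4) leaves the following representation for the expected returns, $E_t\left(\frac{dS}{S}\right) + \frac{D}{S}\, dt$,
$$E_t\left(\frac{dS}{S}\right) + \frac{D}{S}\, dt = r\, dt - E_t\left(\frac{d\xi}{\xi}\frac{dS}{S}\right). \tag{4.5}$$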
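In a diffusion setting, Eq. (4.5) gives rise to a partial differential equation. Moreover, in a
diffusion setting,
$$\frac{d\xi}{\xi} = -r\, dt - \lambda \cdot dW,$$
where W is a vector Brownian motion, and λ is the vector of unit risk-premia. Naturally, the
price of the asset, S, is driven by the same Brownian motions driving r and ξ. We have,
$$E_t\left(\frac{d\xi}{\xi}\frac{dS}{S}\right) = -\mathrm{Vol}\left(\frac{dS}{S}\right) \cdot \lambda\, dt,$$
which leaves,
$$E_t\left(\frac{dS}{S}\right) + \frac{D}{S}\, dt = r\, dt + \underbrace{\mathrm{Vol}\left(\frac{dS}{S}\right)}_{\text{“betas”}} \cdot \underbrace{\lambda}_{\text{“lambdas”}}\, dt.$$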
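As a small numerical illustration of the "betas times lambdas" decomposition, the sketch below computes the instantaneous expected return as the short-term rate plus the inner product of the return volatility vector and the vector of unit risk-premia. The function name and the numbers are illustrative assumptions, not values taken from the text.

```python
def expected_return(r, vol, lam):
    """Instantaneous expected return: r + Vol(dS/S) . lambda.

    vol: exposures of the return to each Brownian motion (the "betas").
    lam: unit risk-premia for each Brownian motion (the "lambdas").
    """
    if len(vol) != len(lam):
        raise ValueError("vol and lam must have the same dimension")
    return r + sum(v * l for v, l in zip(vol, lam))

# Two-factor example with hypothetical numbers.
print(expected_return(0.02, [0.20, 0.10], [0.30, 0.10]))
```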
4.1.3 Expected returns and risk-adjusted discount rates

The difference between expected returns and risk-adjusted discount rates is subtle. If dividends
and asset prices are driven by only one factor, expected returns and risk-adjusted discount rates
are the same. Otherwise, we have to make a distinction. To illustrate the issue, let us make a
simplification. Suppose that the price of the asset S takes the following form,
S (y, D) = p (y) D, (4.6)
where y is a vector of state variables that are suggested by economic theory. In other words, we
assume that the price-dividend ratio p is independent of the dividends D. Indeed, this “scale-
invariant” property of asset prices arises in many model economies, as we shall discuss in detail
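in the second part of these lectures. By Eq. (4.6) and Itô's lemma,
$$\frac{dS}{S} = \frac{dp}{p} + \frac{dD}{D} + \frac{dp}{p}\frac{dD}{D},$$
where the last term is of order dt and, hence, does not contribute to the covariation of dS/S with dξ/ξ. Replacing the previous expansion into Eq. (4.5) leaves,
$$E_t\left(\frac{dS}{S}\right) + \frac{D}{S}\, dt = \mathrm{Disc}\, dt - E_t\left(\frac{dp}{p}\frac{d\xi}{\xi}\right), \tag{4.7}$$
where we define the “risk adjusted discount rate”, Disc, to be:
$$\mathrm{Disc} = r - E_t\left(\frac{dD}{D}\frac{d\xi}{\xi}\right) \Big/ dt.$$
If the price-dividend ratio, p, is constant, the “risk adjusted discount rate” Disc has the usual
interpretation. It equals the safe interest rate r, plus the premium $-E_t\left(\frac{dD}{D}\frac{d\xi}{\xi}\right) / dt$ that arises
to compensate the agents for the fluctuations of the uncertain flow of future dividends. This
premium satisfies,
$$-E_t\left(\frac{dD}{D}\frac{d\xi}{\xi}\right) = \underbrace{\mathrm{Vol}\left(\frac{dD}{D}\right)}_{\text{“cash-flow betas”}} \cdot \underbrace{\lambda}_{\text{“lambdas”}}\, dt.$$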
In this case, expected returns and risk-adjusted discount rates are the same thing, as in the
simple one-factor Lucas economy of Section 4.3.
If, instead, the price-dividend ratio is not constant, the last term in Eq. (4.7) introduces a
wedge between expected returns and risk-adjusted discount rates. As we shall see, the risk-
adjusted discount rates play an important role in explaining returns volatility, i.e. the beta
related to the fluctuations of the price-dividend ratio. Intuitively, this is because risk-adjusted
discount rates affect prices through rational evaluation and, hence, price-dividend ratios and
price-dividend ratios volatility. To illustrate these properties, note that Eq. (4.3) can be rewritten as,
$$p(y(t)) = E\left[\left.\int_t^\infty \frac{D^*(\tau)}{D(t)}\, e^{-\int_t^\tau \mathrm{Disc}(y(u))\, du}\, d\tau\, \right|\, y(t)\right], \tag{4.8}$$
where the expectation is taken under the risk-neutral probability, but the expected dividend
growth $\frac{D^*(\tau)}{D(t)}$ is not risk-adjusted (that is, $E\left(\frac{D^*(\tau)}{D(t)}\right) = e^{g_0(\tau - t)}$). Eq. (4.8) reveals that risk-adjusted
discount rates play an important role in shaping the price function p and, hence, the volatility
of the price-dividend ratio p. These points are developed in detail in Chapter 7.
4.2 An introduction to continuous time methods in finance

4.2.1 Partial differential equations and Feynman-Kac probabilistic representations of the
solution
4.2.1.1 Background: Black & Scholes
Why are partial differential equations so important in finance? Suppose that the price of a stock
follows a geometric Brownian motion:
$$\frac{dS(t)}{S(t)} = \mu\, dt + \sigma\, dW(t), \quad \mu, \sigma > 0,$$
and that there exists a riskless accounting technology making spare money evolve as:
$$\frac{dB(t)}{B(t)} = r\, dt,$$
where r > 0. Finally, suppose that there exists another asset, a “call option,” which gives rise to
a payoff equal to (S (T ) − K)+ at some future date T , where K is the “strike,” or exercise price
of the option. Let c (t) , t ∈ [0, T ], be the price process of the option. We wish to figure out what
this price looks like by formulating as few assumptions as possible. We ignore dividend issues,
and assume there are no transaction costs, and rule out any other forms of frictions. We assume
rational expectations, that is, there exists a function $f \in C^{1,2}([0,T] \times \mathbb{R}_{++})$ such that $c(t) = f(t, S(t))$.
By the previous assumption and Itô’s lemma,
$$dc = \left(Lf + \frac{\partial f}{\partial t}\right) dt + f_S\, \sigma S\, dW,$$
where $Lf = \frac{1}{2}\sigma^2 S^2 f_{SS} + \mu S f_S$ and subscripts denote partial derivatives. Next, we create the
following portfolio: α quantities of the risky asset and β quantities of the riskless accounting
technology. You will learn that for any portfolio strategy to be “self-financed” and for the value
V of the resulting portfolio, V (t) = α (t) S (t) + β (t) B (t), to be well-defined, it must be the
case that:
$$dV(t) = \alpha(t)\, dS(t) + \beta(t)\, dB(t) = \left(\alpha(t)\mu S(t) + r\beta(t)B(t)\right) dt + \alpha(t)\sigma S(t)\, dW(t).$$
Next, set $V_0 = c_0$ and find $\hat\alpha$, $\hat\beta$ such that the drift and diffusion terms of V and c are the same.
The diffusion terms match with $\hat\alpha = f_S$. Replacing this into the previous stochastic differential equation,
we have:
$$dV(t) = \left(\mu S f_S + r\beta B\right) dt + f_S\, \sigma S\, dW.$$
Now find $\hat\beta$ such that drift(V) = drift(c), which after simple calculations is:
$$\hat\beta = \frac{Lf + \frac{\partial f}{\partial t} - \mu f_S S}{rB}.$$
Since $V_0 = c_0$ and the previous $\hat\alpha$, $\hat\beta$ make drifts and diffusion terms of V and c the same, then,
by the Unique Decomposition Property for stochastic differential equations stated in Appendix
1, we have that V(t) = c(t), or:
$$f = c = \hat{V} \equiv \hat\alpha S + \hat\beta B = f_S S + \frac{Lf + \frac{\partial f}{\partial t} - \mu f_S S}{r}.$$
By the definition of L and rearranging,
$$\frac{\partial f}{\partial t} + \frac{1}{2}\sigma^2 S^2 f_{SS} + r S f_S - r f = 0 \quad \forall (t, S) \in [0, T) \times \mathbb{R}_{++}, \tag{4.9}$$
with the “boundary condition” f (T, S) = (S − K)+ , ∀S ∈ R++ . This is an example of a Partial
Differential Equation. The “unknown” is a function f, which must be such that, once it and its
partial derivatives are plugged into the left hand side of Eq. (4.9), we obtain zero. Moreover,
the same function must satisfy the boundary condition. The solution to this problem is the celebrated
Black and Scholes (1973) formula.
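To make the statement concrete, the sketch below evaluates the standard closed-form Black-Scholes call price, which the text refers to but does not display, and checks by central finite differences that it approximately satisfies Eq. (4.9). The function names and step sizes are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, S, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call at time t < T."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def pde_residual(t, S, K, r, sigma, T, ht=1e-3, hs=1e-2):
    """Left hand side of Eq. (4.9), with derivatives by central differences."""
    f = lambda tt, ss: bs_call(tt, ss, K, r, sigma, T)
    f_t = (f(t + ht, S) - f(t - ht, S)) / (2.0 * ht)
    f_S = (f(t, S + hs) - f(t, S - hs)) / (2.0 * hs)
    f_SS = (f(t, S + hs) - 2.0 * f(t, S) + f(t, S - hs)) / hs ** 2
    return f_t + 0.5 * sigma ** 2 * S ** 2 * f_SS + r * S * f_S - r * f(t, S)

print(bs_call(0.0, 100.0, 100.0, 0.05, 0.2, 1.0))
print(pde_residual(0.5, 110.0, 100.0, 0.05, 0.2, 1.0))
```

The residual is not exactly zero because the derivatives are approximated numerically, but it should be small relative to the individual terms of the equation.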
4.2.1.2 Absence of arbitrage opportunities
Suppose that c0 > f0 = α̂0 S0 + β̂ 0 B0 . Then sell the option for c0 , invest α̂0 S0 + β̂ 0 B0 , follow
the (α̂ (t) , β̂ (t))-trading strategy until time T and at time T , obtain (S (T ) − K)+ from the
portfolio, which is exactly what is due to the buyer of the option. This generates a riskless
profit = c0 − f0 without further expenses in the future (recall, the α̂, β̂ strategy is self-financing
and there are no transaction costs). This is an arbitrage opportunity.
Suppose then the opposite, i.e. that c0 < f0 = α̂0 S0 + β̂0 B0. Then buy the option for c0 and
hence obtain a claim to (S (T ) − K)+ at T. At the same time, “short-sell” the portfolio for f0 = α̂0 S0 +
β̂0 B0, and subsequently update the short-selling through the strategy α̂, β̂. The short-selling
position is (−α̂ (t))S (t) + (−β̂ (t))B (t) for all t. At time T, the same short-selling position is
(−α̂ (T ))S (T ) + (−β̂ (T ))B (T ) = −(α̂ (T ) S (T ) + β̂ (T ) B (T )) = −(S (T ) − K)+. This amount
of money is exactly the payoff of the option purchased at time zero. Thus use the option payoff
to close the short-selling position. The whole strategy generates a riskless profit = f0 − c0 without
further expenses in the future. This is an arbitrage opportunity.
Therefore, arbitrage opportunities are ruled out only with c0 = f0. Note, the previous
argument does not hinge upon the existence of a market for the option during the life of the
option.
4.2.1.3 Some technical definitions
The Black-Scholes equation (4.9) is a typical (in fact the first) example of partial differential
equations in finance. It leads to an equation of the so-called parabolic type, as we shall explain
soon. More generally, let us be given,
a0 + a1 Ft + a2 FS + a3 FSS + a4 Ftt + a5 FSt = 0,
subject to some boundary condition. This partial differential equation is called: (i) elliptic,
if $a_5^2 - 4a_3a_4 < 0$; (ii) parabolic, if $a_5^2 - 4a_3a_4 = 0$; (iii) hyperbolic, if $a_5^2 - 4a_3a_4 > 0$. The
typical partial differential equations arising in finance are of the parabolic type. For example,
the equation satisfied by the Black-Scholes function $F = e^{-rt}f$ is parabolic, as in this case $a_4 = a_5 = 0$. The following section explains how to provide
a probabilistic representation to these parabolic partial differential equations.
4.2.1.4 Partial differential equations and Feynman-Kac probabilistic representations
The typical situation encountered in finance is when a function F , typically the price of some
asset, is solution to a parabolic partial differential equation:
1
−r (x, t) F (x, t) + Ft (x, t) + µ (x, t) Fx (x, t) + σ 2 (x, t) Fxx (x, t) = 0, ∀(t, x) ∈ [0, T ) × R
2
(4.10)
with the boundary condition, F (x, T ) = g (x, T ), ∀x ∈ R, where the function g is interpreted
as the final payoff.
Somewhat surprisingly, define an SDE that has the drift and diffusion µ and σ in Eq. (4.10),
$$dZ(t) = \mu(Z(t), t)\, dt + \sigma(Z(t), t)\, dW(t), \quad Z_0 = x, \tag{4.11}$$
where W(t) is a Brownian motion. Under regularity conditions on µ, σ, r, the solution F to Eq.
(4.10) is:
$$F(x, t) = E\left[e^{-\int_0^T r(Z(s), s)\, ds}\, g(Z(T), T)\right], \tag{4.12}$$
where Z is solution to Eq. (4.11), and the expectation is taken with respect to the distribu-
tion of Z in Eq. (4.11). As a technical remark, note that the existence of the Feynman-Kac
representation does not ensure per se the existence of a solution to a given partial differential
equation.
The Feynman-Kac representation of the solution to partial differential equations is quite
useful. First, computing expectations is generally both easier and more intuitive than finding
a solution to partial differential equations through trial and error. Second, except for specific
cases, the solution for asset prices is unknown, and a natural way to cope with this problem is
to go for Monte-Carlo methods (i.e. approximating the expectation in Eq. (4.12) through simulations
and using the law of large numbers). Finally, the Feynman-Kac representation theorem
is useful for some theoretical reasons we shall see later in this chapter.
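The Monte-Carlo route can be sketched as follows for the Black-Scholes call, where the expectation in Eq. (4.12) is taken with drift rS, diffusion σS, a constant short-term rate and payoff g(S) = (S − K)+. This is a minimal illustration under those assumptions; the function name, the sample size and the seed are arbitrary choices.

```python
import math
import random

def mc_call_price(S0, K, r, sigma, T, n_paths=200_000, seed=0):
    """Monte-Carlo estimate of exp(-rT) * E[(S(T) - K)^+] under dS/S = r dt + sigma dW.

    The terminal price is sampled exactly, using the lognormal solution
    of the geometric Brownian motion, so no time discretization is needed.
    """
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_T = S0 * math.exp(drift + vol * z)
        total += max(s_T - K, 0.0)
    return math.exp(-r * T) * total / n_paths

estimate = mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0)
print(estimate)
```

By the law of large numbers, the estimate approaches the closed-form Black-Scholes value as the number of simulated paths grows; with a finite sample it carries a sampling error of order one over the square root of the number of paths.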
4.2.1.5 A few heuristic proofs
It is beyond the purpose of this book to develop a detailed proof of the Feynman-Kac representation
theorem. In addition to Karatzas and Shreve (1991, p.366), an excellent source is
still Friedman (1975), which relaxes many sufficient conditions given in Karatzas and Shreve
through appropriate localizations of linear and growth conditions. The heuristic proof provided
below covers the slightly more general case in which
∂f
Lf + + q − rf = 0, (4.13)
∂t
with some boundary condition. Here q is some function $q(x, t) \equiv q_t$. The typical role of q is the
one of instantaneous dividend rate promised by the asset. As usual, $Lf = \mu f_x + \frac{1}{2}\sigma^2 f_{xx}$.
So suppose there exists a solution to Eq. (4.13). To see what a Feynman-Kac representation
of such a solution looks like in this case, define
$$y(t) \equiv e^{-\int_0^t r(u)\, du} f(t) + \int_0^t e^{-\int_0^u r(s)\, ds}\, q(u)\, du,$$
where again, f(t) = f(t, z(t)), and:
$$dz(t) = \mu(t)\, dt + \sigma(t)\, dW(t), \quad z_0 = x.$$
By Itô’s lemma,
$$\begin{aligned}
dy &= e^{-\int_0^t r(u)\, du}\, q\, dt - r e^{-\int_0^t r(u)\, du} f\, dt + e^{-\int_0^t r(u)\, du}\, df \\
&= e^{-\int_0^t r(u)\, du}\, q\, dt - r e^{-\int_0^t r(u)\, du} f\, dt + e^{-\int_0^t r(u)\, du} \left[\left(Lf + \frac{\partial f}{\partial t}\right) dt + f_z \sigma\, dW\right] \\
&= e^{-\int_0^t r(u)\, du} \Big[\underbrace{\left(Lf + \frac{\partial f}{\partial t} + q - rf\right)}_{=0} dt + f_z \sigma\, dW\Big] \\
&= e^{-\int_0^t r(u)\, du} f_z \sigma\, dW.
\end{aligned}$$
Therefore $y(T) = y_0 + \int_0^T e^{-\int_0^t r(u)\, du} f_z \sigma\, dW(t)$. Assuming $\sigma f_z \in H^2$, then, y is a martingale, viz $y_0 = E(y(T))$. We have,
$$y(T) = e^{-\int_0^T r(t)\, dt} f(T) + \int_0^T e^{-\int_0^u r(s)\, ds}\, q(u)\, du, \quad \text{and} \quad y_0 = f_0.$$
Hence,
$$f_0 = y_0 = E(y(T)) = E\left[e^{-\int_0^T r(t)\, dt} f(T)\right] + E\left[\int_0^T e^{-\int_0^u r(s)\, ds}\, q(u)\, du\right].$$
4.2.2 The Girsanov theorem with applications to finance

4.2.2.1 Motivation
Consider again the Black-Scholes partial differential equation:
$$\frac{\partial f}{\partial t} + Lf - rf = 0, \quad \forall (t, S) \in [0, T] \times \mathbb{R}_{++},$$
with boundary condition, $f(T, S) = (S - K)^+$ for all $S \in \mathbb{R}_{++}$, where $Lf = \frac{1}{2}\sigma^2 S^2 f_{SS} + rS f_S$,
and subscripts denote partial derivatives. By the Feynman-Kac theorem,
$$f(t) = f(t, S(t)) = e^{-r(T-t)} E_Q\left[(S(T) - K)^+\right],$$
where the expectation $E_Q$ is taken with respect to the probability Q, say, on the σ-field generated
by S(t), and S(t) is solution to
$$\frac{dS(t)}{S(t)} = r\, dt + \sigma\, d\tilde{W}(t),$$
where W̃ is a Brownian motion defined under Q. For obvious reasons, such a probability measure
is usually referred to as risk-neutral measure, as we shall explain in detail further in this chapter.
As it turns out, the probability Q is related to the physical probability P on the σ-field
generated by S(t), where S(t) is solution to:
$$\frac{dS(t)}{S(t)} = \mu\, dt + \sigma\, dW(t),$$
and W is now a Brownian motion defined under P. To see heuristically that this is true, let us
consider the following equalities:
$$\begin{aligned}
E_Q\left[(S(T) - K)^+\right] &= \int [S(T)(\omega) - K]^+\, Q(d\omega) \\
&= \int \xi(T)(\omega)\, [S(T)(\omega) - K]^+\, P(d\omega) \\
&\equiv E_P\left[\xi(T)\, (S(T) - K)^+\right],
\end{aligned}$$
where $\xi(T) \equiv \frac{Q(d\omega)}{P(d\omega)}$, and ξ(t) is solution to:
$$\frac{d\xi(t)}{\xi(t)} = -\lambda\, dW(t), \quad \xi_0 = 1,$$
and λ is necessarily equal to $\lambda = \frac{\mu - r}{\sigma}$. To show this, let $y(t) \equiv \xi(t) f(t)$. We have, $f_0 = e^{-rT} E_P\left[\xi(T)(S(T) - K)^+\right]$. Therefore, and because $y(T) = \xi(T)(S(T) - K)^+$, we have that:
$$y_0 = f_0 = e^{-rT} E_P(y(T)).$$
That is, $e^{-rt} y(t)$ is a P-martingale. Now we have, by Itô’s lemma:
$$\begin{aligned}
d\left(e^{-rt} y\right) &= -r e^{-rt} y\, dt + e^{-rt}\, dy \\
&= e^{-rt} \xi \left[\frac{\partial f}{\partial t} + \frac{1}{2}\sigma^2 S^2 f_{SS} + \mu S f_S - \lambda \sigma S f_S - rf\right] dt + e^{-rt} \xi \left(\sigma S f_S - \lambda f\right) dW.
\end{aligned}$$
Under the usual pathwise integrability conditions, this is a martingale when,
$$0 = \frac{\partial f}{\partial t} + \frac{1}{2}\sigma^2 S^2 f_{SS} + \mu S f_S - \lambda \sigma S f_S - rf. \tag{4.14}$$
On the other hand, we know that f is solution of the Black-Scholes partial differential equation:
$$0 = \frac{\partial f}{\partial t} + \frac{1}{2}\sigma^2 S^2 f_{SS} + r S f_S - rf. \tag{4.15}$$
Comparing Eq. (4.14) with Eq. (4.15) reveals that the representation $E_P\left[\xi(T)(S(T) - K)^+\right]$
is possible with $\lambda = \frac{\mu - r}{\sigma}$, as originally claimed. λ has the simple interpretation of unit risk-premium
for investing in stocks.
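The change of probability can be checked numerically. The sketch below evaluates $e^{-rT} E_P[\xi(T)(S(T) - K)^+]$ by trapezoidal quadrature over W(T), which is normal with mean zero and variance T under P, with λ = (µ − r)/σ. If the argument above is right, the result should not depend on µ and should match the Black-Scholes price. Function and parameter names are illustrative assumptions.

```python
import math

def price_under_physical_measure(S0, K, r, mu, sigma, T, n=20_000, width=10.0):
    """exp(-rT) * E_P[xi(T) * (S(T) - K)^+], by trapezoidal quadrature over W(T)."""
    lam = (mu - r) / sigma                      # unit risk-premium
    a, b = -width * math.sqrt(T), width * math.sqrt(T)
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        w = a + i * h
        density = math.exp(-0.5 * w * w / T) / math.sqrt(2.0 * math.pi * T)
        xi = math.exp(-lam * w - 0.5 * lam * lam * T)   # Radon-Nikodym derivative
        s_T = S0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * xi * max(s_T - K, 0.0) * density
    return math.exp(-r * T) * total * h

# Two different physical drifts give the same price.
print(price_under_physical_measure(100.0, 100.0, 0.05, 0.12, 0.2, 1.0))
print(price_under_physical_measure(100.0, 100.0, 0.05, 0.03, 0.2, 1.0))
```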
The point of the previous computations is that it looks as if we could start from the
original probability space under which
$$\frac{dS}{S} = \mu\, dt + \sigma\, dW, \tag{4.16}$$
and, then, we could define a new Brownian motion $d\tilde{W} = dW + \lambda\, dt$, such that Eq. (4.16) can
be written as, $\frac{dS}{S} = (\mu - \lambda\sigma)\, dt + \sigma\, d\tilde{W} = r\, dt + \sigma\, d\tilde{W}$, under some new probability space. And
vice-versa. We formalize this idea in the next subsection, although the following clarification
is in order. The definition of Brownian motion you were originally given obviously depends on
the underlying probability measure P. As an example, for the definition of the independent,
stationary increments of a Brownian motion and for its Gaussian distribution, we must know
the probability measure on the σ-field F. Usually, we do not pay attention to this fact, although
this very same fact can be crucial as the previous example demonstrates.
4.2.2.2 The theorem
Let W(t) be a P-Brownian motion, and λ be some measurable process satisfying the so-called
Novikov’s condition: $E\left[\exp\left(\frac{1}{2}\int_0^T \lambda(t)^2\, dt\right)\right] < \infty$. Then there exists another probability measure
Q equivalent to P with the following properties:
(i) Radon-Nikodym derivative, $\frac{dQ}{dP} = \xi(T) = \exp\left(-\frac{1}{2}\int_0^T \lambda(t)^2\, dt - \int_0^T \lambda(t)\, dW(t)\right)$.¹
(ii) $\tilde{W}(t) = W(t) + \int_0^t \lambda(u)\, du$ is a Q-Brownian motion.
To grasp intuition, consider the following example. Suppose that some random variable x is
standard normal: $P(dx) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right) dx$, and that, next, we tilt that distribution by a
factor $\xi(x) = \exp\left(-\frac{1}{2}\lambda^2 - \lambda x\right)$. Precisely, the new random variable has distribution, $Q(dx) =
\xi(x) P(dx) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2 - \frac{1}{2}\lambda^2 - \lambda x\right) dx = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\tilde{x}^2\right) d\tilde{x}$, where $\tilde{x} = x + \lambda$. Note
that the new density Q is still normal with unit variance. Yet under this new probability, it
is $\tilde{x} = x + \lambda$ that has zero expectation. In other words, we have that under Q, $\tilde{x}$ is standard
normal, or that alternatively, x is normal with unit variance but mean −λ.
1 This is short-hand notation for the more rigorous definition: $\nu_1(A) = \int_A \xi(\omega)\, \nu_2(d\omega)$ ∀A ∈ F for any two measures $\nu_1$ and $\nu_2$.
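The Gaussian tilting example can be verified by direct numerical integration. The following minimal sketch integrates the tilted density ξ(x)P(dx) on a grid and returns its total mass, mean and variance. The function name and grid settings are illustrative assumptions.

```python
import math

def tilted_moments(lam, n=40_000, width=12.0):
    """Total mass, mean and variance of Q(dx) = xi(x) P(dx), by the trapezoidal rule.

    P is standard normal and xi(x) = exp(-lam^2 / 2 - lam * x).
    """
    a, b = -width, width
    h = (b - a) / n
    mass = first = second = 0.0
    for i in range(n + 1):
        x = a + i * h
        p = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        xi = math.exp(-0.5 * lam * lam - lam * x)
        weight = 0.5 if i in (0, n) else 1.0
        q = weight * xi * p * h
        mass += q
        first += x * q
        second += x * x * q
    mean = first / mass
    variance = second / mass - mean ** 2
    return mass, mean, variance

print(tilted_moments(0.5))
```

For λ = 0.5 the mass should be close to one, the mean close to −0.5 and the variance close to one: the tilt shifts the mean and leaves the variance unchanged.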
The fact that changing probability does not lead to change volatility is a well known fact in
continuous-time finance where asset prices are driven by Brownian motions. This property does
not need to hold in other models, or even in discrete time settings. The typical counterexample
is that of a binomial distribution, as in the infinite horizon tree model of Chapter 7, and in all
the trees dealt with in Chapter 13.
We conclude this section by discussing a few technical details. The Novikov condition is
needed for a variety of reasons. Technically, we need it to ensure that $E\left(\frac{dQ}{dP}\right) = E(\xi(T)) =
E\left[\exp\left(-\frac{1}{2}\int_0^T \lambda(t)^2\, dt - \int_0^T \lambda(t)\, dW(t)\right)\right] = 1 \Leftrightarrow \int dQ = 1$. This condition rules out extremely
ill-behaved λs which could not allow the equality $\int dQ = 1$ to hold. Thus, it ensures Q is
indeed a probability. We may also define the Radon-Nikodym density process of $\frac{dQ}{dP}$. First,
some intuition. Suppose we have a claim X(T) at time T. Heuristically, we have
$$E_Q[X(T)] = \int X(T)\, dQ = \int X(T)\, \frac{dQ}{dP}\, dP = E_P\left[\frac{dQ}{dP}\, X(T)\right] = E_P[\xi(T) X(T)].$$
Similarly, we can “update” the previous formula as time unfolds. The formula to use is,
$$E_Q[X(T) \mid F(t)] = \xi(t)^{-1} E_P[\xi(T) X(T) \mid F(t)],$$
where
$$\frac{d\xi(t)}{\xi(t)} = -\lambda(t)\, dW(t), \quad \xi_0 = 1.$$
4.3 An introduction to arbitrage and equilibrium in continuous time models

4.3.1 A “reduced-form” economy
Let us consider the Lucas (1978) model with one tree and a perishable good taken as the
numéraire. The Appendix shows that in continuous time, the wealth process of a representative
agent at time τ, V(τ), is solution to,
$$dV(\tau) = \left(\frac{dS(\tau)}{S(\tau)} + \frac{D(\tau)}{S(\tau)}\, d\tau - r\, d\tau\right) \pi(\tau) + rV\, d\tau - c(\tau)\, d\tau, \tag{4.17}$$
where D(τ) is the dividend process, $\pi \equiv S\theta^{(1)}$, and $\theta^{(1)}$ is the number of trees in the portfolio
of the representative agent.
We assume that the dividend process, D(τ), is solution to the following stochastic differential
equation,
$$\frac{dD}{D} = \mu_D\, d\tau + \sigma_D\, dW,$$
for two positive constants $\mu_D$ and $\sigma_D$. Under rational expectations, the price function S is such
that S = S(D). By Itô’s lemma,
$$\frac{dS}{S} = \mu_S\, d\tau + \sigma_S\, dW,$$
where
$$\mu_S = \frac{\mu_D D S'(D) + \frac{1}{2}\sigma_D^2 D^2 S''(D)}{S(D)}; \quad \sigma_S = \frac{\sigma_D D S'(D)}{S(D)}.$$
Then, by Eq. (4.17), the value of wealth satisfies,
$$dV = \left[\pi\left(\mu_S + \frac{D}{S} - r\right) + rV - c\right] d\tau + \pi \sigma_S\, dW.$$
Below, we shall show that in the absence of arbitrage, there must be some process λ, the “unit
risk-premium”, such that,
$$\mu_S + \frac{D}{S} - r = \lambda \sigma_S. \tag{4.18}$$
Let us assume that the short-term rate, r, and the risk-premium, λ, are both constant. Below,
we shall show that such an assumption is compatible with a general equilibrium economy. By
the definition of $\mu_S$ and $\sigma_S$, Eq. (4.18) can be written as,
$$0 = \frac{1}{2}\sigma_D^2 D^2 S''(D) + (\mu_D - \lambda\sigma_D)\, D S'(D) - r S(D) + D. \tag{4.19}$$
Eq. (4.19) is a second order differential equation. Its solution, provided it exists, is the
rational price of the asset. To solve Eq. (4.19), we initially assume that the solution, $S_F$ say,
takes the following simple form,
$$S_F(D) = K \cdot D, \tag{4.20}$$
where K is a constant to be determined. Next, we verify that this is indeed one solution to
Eq. (4.19). Indeed, if Eq. (4.20) holds, then plugging this guess and its derivatives into Eq.
(4.19) leaves $K = (r - \mu_D + \lambda\sigma_D)^{-1}$ and, hence,
$$S_F(D) = \frac{1}{r + \lambda\sigma_D - \mu_D}\, D. \tag{4.21}$$
This is a Gordon-type formula. It merely states that prices are risk-adjusted expectations of
future expected dividends, where the risk-adjusted discount rate is given by r + λσ D . Hence,
in a comparative statics sense, stock prices are inversely related to the risk-premium, a quite
intuitive conclusion.
Eq. (4.21) can be thought of as the Feynman-Kac representation of the solution to Eq. (4.19), viz
$$S_F(D(t)) = E_t\left[\int_t^\infty e^{-r(\tau - t)} D(\tau)\, d\tau\right], \tag{4.22}$$
where $E_t[\cdot]$ is the conditional expectation taken under the risk neutral probability Q (say), the
dividend process follows,
$$\frac{dD}{D} = (\mu_D - \lambda\sigma_D)\, d\tau + \sigma_D\, d\tilde{W},$$
and $\tilde{W}(\tau) = W(\tau) + \lambda(\tau - t)$ is another standard Brownian motion defined under Q. Formally,
the true probability, P, and the risk-neutral probability, Q, are tied up by the Radon-Nikodym
derivative,
$$\eta = \frac{dQ}{dP} = e^{-\lambda(W(\tau) - W(t)) - \frac{1}{2}\lambda^2(\tau - t)}. \tag{4.23}$$
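The equivalence between the Gordon-type formula (4.21) and the present-value representation (4.22) can be illustrated numerically. Under Q, the expected dividend at horizon s is D e^{(µ_D − λσ_D)s}, so the expectation in (4.22) reduces to a deterministic integral. The sketch below compares that integral, truncated at a long horizon, with the closed form. Function names, parameter values and the truncation horizon are illustrative assumptions.

```python
import math

def gordon_price(D, r, lam, mu_D, sigma_D):
    """Closed form (4.21): D / (r + lam * sigma_D - mu_D)."""
    k = r + lam * sigma_D - mu_D
    if k <= 0.0:
        raise ValueError("need r + lam * sigma_D - mu_D > 0 for a finite price")
    return D / k

def integrated_price(D, r, lam, mu_D, sigma_D, horizon=2000.0, n=200_000):
    """Trapezoidal approximation of the integral in (4.22), truncated at `horizon`."""
    growth = mu_D - lam * sigma_D           # risk-neutral dividend growth
    h = horizon / n
    total = 0.0
    for i in range(n + 1):
        s = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * D * math.exp((growth - r) * s)
    return total * h

print(gordon_price(1.0, 0.03, 0.3, 0.02, 0.1))
print(integrated_price(1.0, 0.03, 0.3, 0.02, 0.1))
```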
4.3.2 Preferences and equilibrium

The previous results do not let us see how precisely preferences affect asset prices. In Eq. (4.21),
the asset price is related to the interest rate, r, and the risk-premium, λ. In equilibrium, the
agents’ preferences affect the interest rate and the risk-premium. However, such an impact can
have a non-linear pattern. For example, when the risk-aversion is low, a small change of risk-
aversion can make the interest rate and the risk-premium change in the same direction. If the
risk-aversion is high, the effects may be different, as the interest rate reflects a variety of factors,
including precautionary motives.
To illustrate these points, let us rewrite, first, Eq. (4.17) under the risk-neutral probability
Q. We have,
dV = (rV − c) dτ + πσ S dW̃ . (4.24)
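We assume that the following transversality condition holds,
$$\lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} V(\tau)\right] = 0. \tag{4.25}$$
By integrating Eq. (4.24), and using the previous transversality condition,
$$V(t) = E_t\left[\int_t^\infty e^{-r(\tau - t)} c(\tau)\, d\tau\right]. \tag{4.26}$$
Comparing Eq. (4.22) with Eq. (4.26) reveals that the equilibrium in the real markets, D = c,
also implies that S = V. Next, rewrite (4.26) as,
$$V(t) = E_t\left[\int_t^\infty e^{-r(\tau - t)} c(\tau)\, d\tau\right] = E_t\left[\int_t^\infty m_t(\tau)\, c(\tau)\, d\tau\right],$$
where the first expectation is taken under Q, the second is taken under P, and
$$m_t(\tau) \equiv \frac{\xi(\tau)}{\xi(t)} = e^{-\left(r + \frac{1}{2}\lambda^2\right)(\tau - t) - \lambda(W(\tau) - W(t))}.$$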
We assume that a representative agent solves the following intertemporal optimization problem,
$$\max_c E_t\left[\int_t^\infty e^{-\rho(\tau - t)} u(c(\tau))\, d\tau\right] \quad \text{s.t.} \quad V(t) = E_t\left[\int_t^\infty m_t(\tau)\, c(\tau)\, d\tau\right] \tag{P1}$$
for some instantaneous utility function u(c) and some subjective discount rate ρ.
To solve the program [P1], we form the Lagrangean
$$L = E_t\left[\int_t^\infty e^{-\rho(\tau - t)} u(c(\tau))\, d\tau\right] + \ell \cdot \left(V(t) - E_t\left[\int_t^\infty m_t(\tau)\, c(\tau)\, d\tau\right]\right),$$
where ℓ is a Lagrange multiplier. The first order conditions are,
$$e^{-\rho(\tau - t)} u'(c(\tau)) = \ell \cdot m_t(\tau).$$
Moreover, by the equilibrium condition, c = D, and the definition of $m_t(\tau)$,
$$u'(D(\tau)) = \ell \cdot e^{-\left(r + \frac{1}{2}\lambda^2 - \rho\right)(\tau - t) - \lambda(W(\tau) - W(t))}. \tag{4.27}$$
That is, by Itô’s lemma,
$$\frac{du'(D)}{u'(D)} = \left[\frac{u''(D) D}{u'(D)}\, \mu_D + \frac{1}{2}\sigma_D^2 D^2\, \frac{u'''(D)}{u'(D)}\right] d\tau + \frac{u''(D) D}{u'(D)}\, \sigma_D\, dW. \tag{4.28}$$
Next, let us define the right hand side of Eq. (4.27) as $U(\tau) \equiv \ell \cdot e^{-\left(r + \frac{1}{2}\lambda^2 - \rho\right)(\tau - t) - \lambda(W(\tau) - W(t))}$.
By Itô’s lemma, again,
$$\frac{dU}{U} = (\rho - r)\, d\tau - \lambda\, dW. \tag{4.29}$$
By Eq. (4.27), drift and volatility components of Eq. (4.28) and Eq. (4.29) have to be the
same. This is possible if
$$r = \rho - \frac{u''(D) D}{u'(D)}\, \mu_D - \frac{1}{2}\sigma_D^2 D^2\, \frac{u'''(D)}{u'(D)}; \quad \text{and} \quad \lambda = -\frac{u''(D) D}{u'(D)}\, \sigma_D.$$
Let us assume that λ is constant. After integrating the second of these relations two times, we
obtain that besides some irrelevant integration constant,
$$u(D) = \frac{D^{1-\eta} - 1}{1 - \eta}, \quad \eta \equiv \frac{\lambda}{\sigma_D},$$
where η is the CRRA. Hence, under CRRA preferences we have that,
$$r = \rho + \eta\mu_D - \frac{\eta(\eta + 1)}{2}\sigma_D^2, \quad \lambda = \eta\sigma_D.$$
Finally, replacing these expressions for the short-term rate and the risk-premium into Eq.
(4.21) leaves,
$$S(D) = \frac{1}{\rho - (1 - \eta)\left(\mu_D - \frac{1}{2}\eta\sigma_D^2\right)}\, D,$$
provided the following condition holds true:
$$\rho > (1 - \eta)\left(\mu_D - \frac{1}{2}\eta\sigma_D^2\right). \tag{4.30}$$
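The algebra linking the CRRA interest rate and risk-premium to the equilibrium price-dividend ratio can be checked with a few lines of arithmetic. The sketch below computes r and λ from the preference and dividend parameters, then compares the price-dividend ratio obtained from the Gordon-type formula (4.21) with the one written directly in terms of ρ, η, µ_D and σ_D. The function name and the parameter values are illustrative assumptions.

```python
def crra_equilibrium(rho, eta, mu_D, sigma_D):
    """Equilibrium short-term rate, risk-premium and price-dividend ratio under CRRA."""
    r = rho + eta * mu_D - 0.5 * eta * (eta + 1.0) * sigma_D ** 2
    lam = eta * sigma_D
    discount = rho - (1.0 - eta) * (mu_D - 0.5 * eta * sigma_D ** 2)
    if discount <= 0.0:
        raise ValueError("condition (4.30) is violated")
    pd_from_gordon = 1.0 / (r + lam * sigma_D - mu_D)   # via Eq. (4.21)
    pd_closed_form = 1.0 / discount                      # in terms of preferences
    return r, lam, pd_from_gordon, pd_closed_form

print(crra_equilibrium(rho=0.04, eta=2.0, mu_D=0.02, sigma_D=0.1))
```

With these hypothetical inputs, both price-dividend ratios equal 20, the interest rate is 5% and the unit risk-premium is 0.2.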
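We are only left to check that the transversality condition (4.25) holds at the equilibrium
S = V. We have that under the previous inequality,
$$\begin{aligned}
\lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} V(\tau)\right] &= \lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} S(\tau)\right] \\
&= \lim_{\tau \to \infty} E_t\left[m_t(\tau) S(\tau)\right] \\
&= \lim_{\tau \to \infty} E_t\left[e^{-\left(r + \frac{1}{2}\lambda^2\right)(\tau - t) - \lambda(W(\tau) - W(t))} S(\tau)\right] \\
&= S(t) \lim_{\tau \to \infty} E_t\left[e^{\left(\mu_D - \frac{1}{2}\sigma_D^2 - r - \frac{1}{2}\lambda^2\right)(\tau - t) + (\sigma_D - \lambda)(W(\tau) - W(t))}\right] \\
&= S(t) \lim_{\tau \to \infty} e^{-(r - \mu_D + \sigma_D\lambda)(\tau - t)} \\
&= S(t) \lim_{\tau \to \infty} e^{-\left[\rho - (1 - \eta)\left(\mu_D - \frac{1}{2}\eta\sigma_D^2\right)\right](\tau - t)} \\
&= 0. \qquad (4.31)
\end{aligned}$$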
4.3.3 Bubbles
The transversality condition in Eq. (4.25) is often referred to as a no-bubble condition. To
illustrate the reasons underlying this definition, note that Eq. (4.19) admits an infinite number
of solutions. Each of these solutions takes the following form,
$$S(D) = KD + AD^\delta, \quad K, A, \delta \text{ constants}. \tag{4.32}$$
Indeed, plugging Eq. (4.32) into Eq. (4.19) reveals that Eq. (4.32) holds if and only if the
following conditions hold true:
$$0 = K(r + \lambda\sigma_D - \mu_D) - 1, \quad \text{and} \quad 0 = \delta(\mu_D - \lambda\sigma_D) + \frac{1}{2}\delta(\delta - 1)\sigma_D^2 - r. \tag{4.33}$$
The first condition implies that K equals the price-dividend ratio in Eq. (4.21), i.e. $K = S_F(D)/D$.
The second condition leads to a quadratic equation in δ, with the two solutions,
$\delta_1 < 0$ and $\delta_2 > 0$.
Therefore, the asset price function takes the following form:
$$S(D) = S_F(D) + A_1 D^{\delta_1} + A_2 D^{\delta_2}.$$
It satisfies:
$$\lim_{D \to 0} S(D) = \mp\infty \text{ if } A_1 \lessgtr 0, \quad \lim_{D \to 0} S(D) = 0 \text{ if } A_1 = 0.$$
To rule out an explosive behavior of the price as the dividend level, D, gets small, we must set
$A_1 = 0$, which leaves,
$$S(D) = S_F(D) + B(D), \quad B(D) \equiv A_2 D^{\delta_2}. \tag{4.34}$$
The component, SF (D), is the fundamental value of the asset, as by Eq. (4.22), it is the
risk-adjusted present value of the expected dividends. The second component, B (D), is simply
the difference between the market value of the asset, S (D), and the fundamental value, SF (D).
Hence, it is a bubble.
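The structure of the solutions can be checked numerically. The sketch below computes the two roots of the quadratic in Eq. (4.33) and evaluates the right hand side of Eq. (4.19) at S(D) = KD + AD^δ, using the analytical derivatives of this guess. The residual should vanish for either root and for any constant A. Function names and parameter values are illustrative assumptions.

```python
import math

def bubble_exponents(r, lam, mu_D, sigma_D):
    """Roots delta1 < 0 < delta2 of 0 = delta*(mu_D - lam*sigma_D) + 0.5*delta*(delta-1)*sigma_D^2 - r."""
    b = mu_D - lam * sigma_D - 0.5 * sigma_D ** 2
    disc = math.sqrt(b ** 2 + 2.0 * sigma_D ** 2 * r)
    return (-b - disc) / sigma_D ** 2, (-b + disc) / sigma_D ** 2

def ode_residual(D, A, delta, r, lam, mu_D, sigma_D):
    """Right hand side of Eq. (4.19) evaluated at S(D) = K*D + A*D**delta."""
    K = 1.0 / (r + lam * sigma_D - mu_D)
    S = K * D + A * D ** delta
    S1 = K + A * delta * D ** (delta - 1.0)                 # S'(D)
    S2 = A * delta * (delta - 1.0) * D ** (delta - 2.0)     # S''(D)
    return 0.5 * sigma_D ** 2 * D ** 2 * S2 + (mu_D - lam * sigma_D) * D * S1 - r * S + D

d1, d2 = bubble_exponents(0.03, 0.3, 0.02, 0.1)
print(d1, d2)
print(ode_residual(2.0, 0.5, d2, 0.03, 0.3, 0.02, 0.1))
```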
We seek conditions under which Eq. (4.34) satisfies the transversality condition in Eq. (4.25).
We have,
$$\lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} S(\tau)\right] = \lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} S_F(D(\tau))\right] + \lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} B(D(\tau))\right].$$
By Eq. (4.31), the fundamental value of the asset satisfies the transversality condition, under
the condition given in Eq. (4.30). As regards the bubble, we have,
$$\begin{aligned}
\lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} B(D(\tau))\right] &= A_2 \cdot \lim_{\tau \to \infty} E_t\left[e^{-r(\tau - t)} D(\tau)^{\delta_2}\right] \\
&= A_2 \cdot D(t)^{\delta_2} \cdot \lim_{\tau \to \infty} e^{\left(\delta_2(\mu_D - \lambda\sigma_D) + \frac{1}{2}\delta_2(\delta_2 - 1)\sigma_D^2 - r\right)(\tau - t)} \\
&= A_2 \cdot D(t)^{\delta_2}, \qquad (4.35)
\end{aligned}$$
where the last line holds as $\delta_2$ satisfies the second condition in Eq. (4.33). Therefore, the bubble
cannot satisfy the transversality condition, except in the trivial case in which $A_2 = 0$. In other
words, in this economy, the transversality condition in Eq. (4.25) holds if and only if there are
no bubbles.
4.3.4 Reflecting barriers and absence of arbitrage

Next, suppose that insofar as the dividend D(τ) fluctuates above a certain level $\underline{D} > 0$, everything
goes as in the previous section but that, as soon as the dividend level hits a “barrier”
$\underline{D}$, it is “reflected” back with probability one. In this case, we say that the dividend follows a
process with reflecting barriers. How does the price behave in the presence of such a barrier?
First, if the dividend is above the barrier, $D > \underline{D}$, the price is still as in Eq. (4.32),
$$S(D) = \frac{1}{r - \mu_D + \lambda\sigma_D}\, D + A_1 D^{\delta_1} + A_2 D^{\delta_2}.$$
Second, and as in the previous section, we need to set $A_2 = 0$ to satisfy the transversality condition
in Eq. (4.25) (see Eq. (4.35)). However, in the new context of this section, we do not need to set
$A_1 = 0$. Rather, this constant is needed to pin down the behavior of the price function S(D)
in the neighborhood of the barrier $\underline{D}$.
We claim that the following smooth pasting condition must hold in the neighborhood of $\underline{D}$,
$$S'(\underline{D}) = 0. \tag{4.36}$$
This condition is in fact a no-arbitrage condition. Indeed, after hitting the barrier $\underline{D}$, the dividend
is reflected back for the part exceeding $\underline{D}$. Since the reflection takes place with probability
one, the asset is locally riskless at the barrier $\underline{D}$. However, the dynamics of the asset price is,
$$\frac{dS}{S} = \mu_S\, d\tau + \underbrace{\frac{\sigma_D D S'}{S}}_{\sigma_S}\, dW.$$
Therefore, the local risklessness of the asset at $\underline{D}$ is ensured if $S'(\underline{D}) = 0$. [Warning: We need
to add some local time component here.] Furthermore, rewrite Eq. (4.18) as,
$$\mu_S + \frac{D}{S} - r = \lambda\sigma_S = \lambda\, \frac{\sigma_D D S'(D)}{S(D)}.$$
If $D = \underline{D}$ then, by Eq. (4.36), $S'(\underline{D}) = 0$. Therefore,
$$\mu_S + \frac{D}{S} = r.$$
This relation tells us that holding the asset during the reflection guarantees a total return
equal to the short-term rate. This is because during the reflection, the asset is locally riskless
and, hence, arbitrage opportunities are ruled out only if holding the asset makes us earn no
more than the safe interest rate, r. Indeed, by plugging the previous relation into the wealth equation (4.17),
and using the condition that $\sigma_S = 0$, we obtain that
$$dV = \left[\pi\left(\mu_S + \frac{D}{S} - r\right) + rV - c\right] d\tau + \pi\sigma_S\, dW = (rV - c)\, d\tau.$$
This example illustrates how the relation in Eq. (4.18) works to preclude arbitrage opportunities.
Finally, we solve the model. We have, $K \equiv S_F(D)/D$, and
$$0 = S'(\underline{D}) = K + \delta_1 A_1 \underline{D}^{\delta_1 - 1}; \quad Q \equiv S(\underline{D}) = K\underline{D} + A_1 \underline{D}^{\delta_1},$$
where the second condition is the value matching condition, which needs to be imposed to
ensure continuity of the pricing function with respect to D and, hence, absence of arbitrage.
The previous system can be solved to yield²
$$Q = \frac{1 - \delta_1}{-\delta_1}\, K\underline{D} \quad \text{and} \quad A_1 = \frac{K}{-\delta_1}\, \underline{D}^{1 - \delta_1}.$$
Note, the price is an increasing and convex function of the fundamentals, D.
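The solution of the smooth pasting and value matching system can be verified directly. In the sketch below, `D_low` is an illustrative name for the barrier, and K and δ1 are taken as given inputs (hypothetical numbers). The code computes A1 and Q from the stated formulas and exposes the price and its slope so that both conditions can be checked.

```python
def barrier_solution(K, delta1, D_low):
    """Constants A1 and Q implied by smooth pasting and value matching at the barrier D_low."""
    if delta1 >= 0.0:
        raise ValueError("delta1 must be the negative root")
    A1 = K * D_low ** (1.0 - delta1) / (-delta1)
    Q = (1.0 - delta1) / (-delta1) * K * D_low
    return A1, Q

def price(D, K, A1, delta1):
    """S(D) = K*D + A1*D**delta1, valid for D at or above the barrier."""
    return K * D + A1 * D ** delta1

def price_slope(D, K, A1, delta1):
    """S'(D) = K + delta1*A1*D**(delta1 - 1)."""
    return K + delta1 * A1 * D ** (delta1 - 1.0)

K, delta1, D_low = 25.0, -1.3723, 1.0      # hypothetical inputs
A1, Q = barrier_solution(K, delta1, D_low)
print(A1, Q)
print(price_slope(D_low, K, A1, delta1))   # approximately zero: smooth pasting
print(price(D_low, K, A1, delta1) - Q)     # approximately zero: value matching
```

Above the barrier the slope is positive and increasing in D, in line with the remark that the price is increasing and convex in the fundamentals.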
4.4 Martingales and arbitrage in a diffusion model

4.4.1 The information framework
We still consider a Lucas’ type economy, but consider a finite horizon T < ∞. The primitives
include a probability space (Ω, F , P ). Let W be a standard Brownian motion in Rd . Define
F = {F (t)}t∈[0,T ] as the P -augmentation of the natural filtration F W (τ ) = σ (W (s), s ≤ τ )
generated by W , with F = F (T ).
We consider m trees and an accumulation factor. These assets, and other “inside money”
assets (i.e., assets in zero net supply) to be introduced later, are exchanged without frictions.
The trees entitle their holders to receive the usual fruits, or dividends, $D_i = \{D_i(\tau)\}_{\tau \in [t,T]}$, $i = 1, \cdots, m$, which
are positive F(τ)-adapted bounded processes. Fruits are the numéraire. Let $S_+ = \{S_+(\tau) =
(S_0(\tau), \cdots, S_m(\tau))^\top\}_{\tau \in [t,T]}$ be the positive F(τ)-adapted asset price process. The accumulation
factor does not distribute dividends. Its price satisfies:
$$S_0(\tau) = \exp\left(\int_t^\tau r(u)\, du\right),$$
where r(τ) is an F(τ)-adapted process satisfying $E\left(\int_t^T r(\tau)\, d\tau\right) < \infty$. We assume the dynamics of
the last components of $S_+$, i.e. $S \equiv (S_1, \cdots, S_m)^\top$, satisfy:
$$dS_i(\tau) = S_i(\tau)\hat{a}_i(\tau)\, d\tau + S_i(\tau)\sigma_i(\tau)\, dW(\tau), \quad i = 1, \cdots, m, \tag{4.37}$$
where $\hat{a}_i(\tau)$ and $\sigma_i(\tau)$ are processes satisfying the same properties as r, with $\sigma_i(\tau) \in \mathbb{R}^d$. We
assume that rank(σ(τ; ω)) = m ≤ d a.s., where $\sigma(\tau) \equiv (\sigma_1(\tau), \cdots, \sigma_m(\tau))^\top$.
We assume that $D_i$ is the solution to
\[
dD_i(\tau) = D_i(\tau) a_{D_i}(\tau)\, d\tau + D_i(\tau) \sigma_{D_i}(\tau)\, dW(\tau),
\]
where $a_{D_i}(\tau)$ and $\sigma_{D_i}(\tau)$ are $\mathcal{F}(\tau)$-adapted, with $\sigma_{D_i} \in \mathbb{R}^d$.

A strategy is a predictable process in $\mathbb{R}^{m+1}$, denoted as $\theta = \{\theta(\tau) = (\theta_0(\tau), \cdots, \theta_m(\tau))^\top\}_{\tau \in [t,T]}$, and satisfying $E\left( \int_t^T \|\theta(\tau)\|^2 d\tau \right) < \infty$. The value of a strategy, net of dividends, is $V \equiv S_+ \cdot \theta$, where $S_+$ is a row vector. By generalizing Section 4.4.1, we say a strategy is self-financing if its value, $V$, is the solution to:
\[
dV = \left[ \pi^\top (a - 1_m r) + V r - c \right] dt + \pi^\top \sigma\, dW, \tag{4.38}
\]
2 In this model, we take the barrier D as given. In other contexts, we might be interested in "controlling" the dividend D in such a way that, as soon as the price, q, hits a level Q, the dividend level D is activated to induce the price q to increase. The solution for Q reveals that this situation is possible when $D = \frac{-\delta_1}{1 - \delta_1} K^{-1} Q$, where Q is an exogenously given constant.
where $1_m$ is an $m$-dimensional vector of ones, $\pi \equiv (\pi_1, \cdots, \pi_m)^\top$, $\pi_i \equiv \theta_i S_i$, $i = 1, \cdots, m$, and $a \equiv \left( \hat{a}_1 + \frac{D_1}{S_1}, \cdots, \hat{a}_m + \frac{D_m}{S_m} \right)^\top$. The solution to the previous equation is, for each $\tau \in [t, T]$,
\[
\frac{V^{x,\pi,c}(\tau)}{S_0(\tau)} = x - \int_t^\tau \frac{c(u)}{S_0(u)}\, du + \int_t^\tau \frac{\pi^\top(u) \left( a(u) - 1_m r(u) \right)}{S_0(u)}\, du + \int_t^\tau \frac{\pi^\top(u) \sigma(u)}{S_0(u)}\, dW(u), \tag{4.39}
\]
where x denotes the initial wealth. We require V to be strictly positive.
4.4.2 Viability
Let $\bar{g}_i = \frac{S_i}{S_0} + \bar{z}_i$, $i = 1, \cdots, m$, where $d\bar{z}_i = \frac{1}{S_0} dz_i$ and $z_i(\tau) = \int_t^\tau D_i(u)\, du$. Let us generalize the definition of the risk-neutral probability in Eq. (4.23), and introduce the set $\mathcal{Q}$ of risk-neutral, or equivalent martingale, probabilities, defined as:
\[
\mathcal{Q} \equiv \{ Q \approx P : \bar{g}_i \text{ is a } Q\text{-martingale} \} .
\]
The aim of this section is to show the equivalent of Theorem 2.8 in Chapter 2: $\mathcal{Q}$ is not empty if and only if there are no arbitrage opportunities.
Associated to every $\mathcal{F}(t)$-adapted process $\{\lambda(t)\}_{t \in [0,T]}$ satisfying some basic regularity conditions (essentially, Novikov's condition), the process
\[
W_0(\tau) = W(\tau) + \int_t^\tau \lambda(u)\, du, \quad \tau \in [t, T], \tag{4.40}
\]
is a standard Brownian motion under a probability Q which is equivalent to P, with Radon-Nikodym derivative equal to,
\[
\eta(T) \equiv \frac{dQ}{dP} = \exp\left( -\frac{1}{2} \int_t^T \|\lambda(\tau)\|^2 d\tau - \int_t^T \lambda^\top(\tau)\, dW(\tau) \right). \tag{4.41}
\]
The process $\{\eta(\tau)\}_{\tau \in [t,T]}$ is a martingale under P. This result is the celebrated Girsanov's theorem.
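Now let us rewrite Eq. (4.37) under such a new probability by plugging $W_0$ in it. Under Q,
\[
dS_i(\tau) = S_i(\tau) \left( \hat{a}_i(\tau) - \sigma_i(\tau) \lambda(\tau) \right) d\tau + S_i(\tau) \sigma_i(\tau)\, dW_0(\tau), \quad i = 1, \cdots, m.
\]
We also have
\[
d\bar{g}_i(\tau) = d\left( \frac{S_i}{S_0} \right)(\tau) + d\bar{z}_i(\tau) = \frac{S_i(\tau)}{S_0(\tau)} \left[ \left( a_i(\tau) - r(\tau) \right) d\tau + \sigma_i(\tau)\, dW(\tau) \right] .
\]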
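For $\bar{g}_i$ to be a Q-martingale, i.e. for
\[
S_i(\tau) = E^Q \left[ \left. \frac{S_0(\tau)}{S_0(T)} S_i(T) + \int_\tau^T \frac{S_0(\tau)}{S_0(s)} D_i(s)\, ds \,\right|\, \mathcal{F}(\tau) \right], \quad i = 1, \cdots, m, \tag{4.42}
\]
to hold, it is necessary and sufficient that $a_i - \sigma_i \lambda = r$, $i = 1, \cdots, m$, or
\[
a(\tau) - 1_m r(\tau) = \sigma(\tau) \lambda(\tau) . \tag{4.43}
\]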
Therefore, by Eqs. (4.38), (4.40) and (4.43), we have that, for $\tau \in [t, T]$,
\[
\frac{V^{x,\pi,c}(\tau)}{S_0(\tau)} = x - \int_t^\tau \frac{c(u)}{S_0(u)}\, du + \int_t^\tau \frac{\pi^\top(u) \sigma(u)}{S_0(u)}\, dW_0(u). \tag{4.44}
\]
Consider the following definition:
Definition 4.1 (Arbitrage opportunity). A portfolio π is an arbitrage opportunity if $V^{x,\pi,0}(t) \le S_0^{-1}(T)\, V^{x,\pi,0}(T)$ and $\Pr\left( S_0^{-1}(T)\, V^{x,\pi,0}(T) - x > 0 \right) > 0$.
We have:
Theorem 4.2. There are no arbitrage opportunities if and only if Q is not empty.
A proof of this theorem is in the Appendix. The if part follows easily, by Eq. (4.44). The only if part is more elaborate, but its basic structure can be understood as follows. By Girsanov's theorem, the statement "absence of arbitrage opportunities ⇒ ∃Q ∈ $\mathcal{Q}$" is equivalent to "absence of arbitrage opportunities ⇒ ∃λ satisfying Eq. (4.43)." If Eq. (4.43) didn't hold, one could implement an arbitrage, by finding a nonzero π such that $\pi^\top \sigma = 0$ and $\pi^\top (a - 1_m r) \neq 0$. One could then use π when $\pi^\top (a - 1_m r) > 0$ and −π when $\pi^\top (a - 1_m r) < 0$, and obtain an appreciation rate of V greater than r in spite of having zeroed uncertainty through $\pi^\top \sigma = 0$. If Eq. (4.43) holds, such an arbitrage opportunity would never occur, as in this case, for each π, $\pi^\top (a - 1_m r) = \pi^\top \sigma \lambda$.
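Let
\[
\langle \sigma^\top \rangle^\perp \equiv \left\{ x \in L^2_{t,T,m} : \sigma^\top x = 0_d \right\}
\]
and
\[
\langle \sigma \rangle \equiv \left\{ z \in L^2_{t,T,m} : z = \sigma u, \text{ for } u \in L^2_{t,T,d} \right\} .
\]
Then, we may formalize the previous reasoning as follows. The excess return vector, $a - 1_m r$, must be orthogonal to all vectors in $\langle \sigma^\top \rangle^\perp$, and since $\langle \sigma \rangle$ and $\langle \sigma^\top \rangle^\perp$ are orthogonal, $a - 1_m r \in \langle \sigma \rangle$, or $\exists \lambda \in L^2_{t,T,d} : a - 1_m r = \sigma \lambda$ (see footnote 3).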
4.4.3 Market completeness

Let Y ∈ L2 (Ω, F, P ). Consider the following definition:
Definition 4.3 (Market completeness). Markets are dynamically complete if for each ran-
dom variable Y ∈ L2 (Ω, F , P ), we can find a portfolio process π : V x,π,0 (T ) = Y a.s.
The previous definition is the natural continuous-time counterpart to that we gave in the
discrete-time case (see Chapter 2). In analogy with the conclusions in Chapter 2, we shall prove
that in continuous-time, markets are dynamically complete if and only if (i) m = d and (ii) the
price volatility matrix of the available assets (primitives and derivatives) is nonsingular. We shall
provide a sketch of the proof for the sufficiency part of this statement (see, e.g., Karatzas (1997
pp. 8-9) for the converse), which relates to the existence of fully spanning dynamic strategies.
So given a Y ∈ L2 (Ω, F , P ), let m = d and suppose the volatility matrix σ is nonsingular. Let
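3 To see that $\langle \sigma \rangle$ and $\langle \sigma^\top \rangle^\perp$ are orthogonal spaces, note that:
\begin{align*}
\left\{ x \in L^2_{t,T,m} : x^\top z = 0,\ z \in \langle \sigma \rangle \right\}
&= \left\{ x \in L^2_{t,T,m} : x^\top \sigma u = 0,\ u \in L^2_{t,T,d} \right\} \\
&= \left\{ x \in L^2_{t,T,m} : x^\top \sigma = 0_d \right\} \\
&= \left\{ x \in L^2_{t,T,m} : \sigma^\top x = 0_d \right\} \\
&\equiv \langle \sigma^\top \rangle^\perp .
\end{align*}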
us consider the Q-martingale:
\[
M(\tau) \equiv E^Q \left[ \left. S_0(T)^{-1} \cdot Y \,\right|\, \mathcal{F}(\tau) \right] . \tag{4.45}
\]
By the representation theorem of continuous local martingales as stochastic integrals with respect to Brownian motions (e.g., Karatzas and Shreve (1991) (thm. 4.2 p. 170)), there exists $\varphi \in L^2_{0,T,d}(\Omega, \mathcal{F}, Q)$ such that M can be written as:
\[
M(\tau) = M(t) + \int_t^\tau \varphi^\top(u)\, dW_0(u).
\]
We wish to find a portfolio process π such that the discounted wealth process, net of consumption, $S_0^{-1}(\tau)\, V^{x,\pi,0}(\tau)$, equals $M(\tau)$ under P (or, equivalently, under Q) a.s. By Eq. (4.44),
\[
\frac{V^{x,\pi,0}(\tau)}{S_0(\tau)} = x + \int_t^\tau \frac{\pi^\top(u) \sigma(u)}{S_0(u)}\, dW_0(u),
\]
and so, by identifying, the portfolio we are looking for is $\hat{\pi}^\top = S_0 \varphi^\top \sigma^{-1}$. Set, then, $x = M(t)$. Then, $M(\tau) = S_0^{-1}(\tau)\, V^{M(t),\hat{\pi},0}(\tau)$, and in particular, $M(T) = S_0^{-1}(T)\, V^{M(t),\hat{\pi},0}(T)$ a.s. By comparing with Eq. (4.45), $V^{M(t),\hat{\pi},0}(T) = Y$.
Armed with this result, we can now easily state:
Theorem 4.4. Q is a singleton if and only if markets are complete.
Proof. There exists a unique $\lambda : a - 1_m r = \sigma \lambda \iff m = d$. The result follows by Girsanov's theorem.
When markets are incomplete, there is an infinity of risk-neutral probabilities belonging to $\mathcal{Q}$. Absence of arbitrage does not allow us to "recover" a unique risk-neutral probability, just as in the discrete time model of Chapter 2. One could make use of general equilibrium arguments, but in this case we go beyond the edge of knowledge, although we shall see something in Part II of these lectures on "Asset pricing and reality."
The next results provide a further representation of the set of risk-neutral probabilities $\mathcal{Q}$ in the incomplete markets case. Let $L^2_{0,T,d}(\Omega, \mathcal{F}, P)$ be the space of all $\mathcal{F}(t)$-adapted processes x in $\mathbb{R}^d$ satisfying $0 < \int_0^T \|x(u)\|^2 du < \infty$, and define,
\[
\langle \sigma \rangle^\perp \equiv \left\{ x \in L^2_{0,T,d}(\Omega, \mathcal{F}, P) : \sigma(t) x(t) = 0_m \text{ a.s.} \right\},
\]
where $0_m$ is a vector of zeros in $\mathbb{R}^m$. Let
\[
\hat{\lambda} = \sigma^\top \left( \sigma \sigma^\top \right)^{-1} (a - 1_m r) .
\]
Under the usual regularity conditions, $\hat{\lambda}$ can be interpreted as the process of unit risk-premia. In fact, all processes belonging to the set:
\[
Z = \left\{ \lambda : \lambda(t) = \hat{\lambda}(t) + \eta(t),\ \eta \in \langle \sigma \rangle^\perp \right\}
\]
are bounded and, hence, can be interpreted as unit risk-premia processes. More precisely, define the Radon-Nikodym derivative of Q with respect to P on $\mathcal{F}(T)$:
\[
\hat{\eta}(T) \equiv \frac{dQ}{dP} = \exp\left( -\frac{1}{2} \int_0^T \|\hat{\lambda}(t)\|^2 dt - \int_0^T \hat{\lambda}^\top(t)\, dW(t) \right),
\]
and the density process of all Q ≈ P on $(\Omega, \mathcal{F})$,
\[
\eta(t) = \hat{\eta}(t) \cdot \exp\left( -\frac{1}{2} \int_0^t \|\eta(u)\|^2 du - \int_0^t \eta^\top(u)\, dW(u) \right), \quad t \in [0, T],
\]
a strictly positive P-martingale. We have the following result, which follows, for example, from He and Pearson (1991, Proposition 1 p. 271) or Shreve (1991, Lemma 3.4 p. 429):
Proposition 4.5. $Q \in \mathcal{Q}$ if and only if it is of the form: $Q(A) = E(1_A \eta(T))$, $\forall A \in \mathcal{F}(T)$.
To summarize, we have that $\dim(\langle \sigma \rangle^\perp) = d - m$. The previous result shows quite clearly that market incompleteness implies the existence of an infinity of risk-neutral probabilities. Such a result was shown in great generality by Harrison and Pliska (1983) (see footnote 4).
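The multiplicity of unit risk-premia under incomplete markets can be illustrated numerically. The following is a minimal sketch, assuming one risky asset driven by two Brownian motions (m = 1, d = 2) with constant coefficients; the function names and the numbers are hypothetical and only serve the illustration. It computes the minimum-norm solution $\hat{\lambda} = \sigma^\top(\sigma\sigma^\top)^{-1}(a - r)$ and checks that adding any multiple of a vector orthogonal to σ leaves the restriction $a - r = \sigma\lambda$ satisfied.

```python
import math

def min_norm_lambda(sigma, excess):
    """lambda_hat = sigma^T (sigma sigma^T)^{-1} (a - r) for a single asset (m = 1).

    sigma  : sequence of d volatility loadings (the 1 x d volatility "matrix")
    excess : scalar excess return a - r
    """
    s2 = sum(s * s for s in sigma)          # sigma sigma^T is a scalar when m = 1
    return [s * excess / s2 for s in sigma]

def orthogonal_direction(sigma):
    """A vector eta with sigma . eta = 0 (valid for d = 2 only)."""
    return [-sigma[1], sigma[0]]

def restriction_error(sigma, lam, excess):
    """sigma . lambda - (a - r); zero when Eq. (4.43) holds."""
    return sum(s * l for s, l in zip(sigma, lam)) - excess

def norm(x):
    return math.sqrt(sum(v * v for v in x))

# Hypothetical numbers: volatility loadings and excess return.
sigma = (0.2, 0.1)
excess = 0.05
lam_hat = min_norm_lambda(sigma, excess)
eta = orthogonal_direction(sigma)

for kappa in (-1.0, 0.0, 2.5):
    lam = [lh + kappa * e for lh, e in zip(lam_hat, eta)]
    print(kappa, restriction_error(sigma, lam, excess), norm(lam))
```

Every value of kappa yields an admissible λ, which is the one-dimensional family (d − m = 1) referred to in the text; the choice kappa = 0 gives the smallest norm.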
4.5 Equilibrium with a representative agent

4.5.1 Consumption and portfolio choices: martingale approaches
For now, we assume that markets are complete, m = d, and that there are no portfolio constraints or any other frictions. We consider the problem of an agent who maximizes the expected utility from his consumption flows, u(·), plus the expected utility from terminal wealth, U(·), under the constraint in Eq. (4.39) (see footnote 5):
\[
J(0, V_0) = \max_{(\pi, c, v)} E\left[ U\left( V^{x,\pi,c}(T) \right) + \int_t^T u(c(\tau))\, d\tau \right], \quad \text{s.t. Eq. (4.39) holds.}
\]
The first approach to solve this problem was introduced by Merton, which we shall see later. We wish to present another approach, which makes use of Arrow-Debreu state prices, similarly as in Chapter 2. Our first task is to derive a budget constraint paralleling the budget constraint in Chapter 2:
\[
0 = c^0 - w^0 + E\left[ m \cdot \left( c^1 - w^1 \right) \right], \tag{4.46}
\]
where $c^{\cdot}$ and $w^{\cdot}$ are consumption and endowments, and m is the stochastic discount factor. In Chapter 2, such a budget constraint arises after having multiplied the initial budget constraint by the Arrow-Debreu state prices,
\[
\phi_s = m_s \cdot P_s, \quad m_s \equiv (1 + r)^{-1} \eta_s, \quad \eta_s = \frac{Q_s}{P_s},
\]
and after "having taken the sum over all the states of nature". We wish to apply the same logic here. First, we define Arrow-Debreu state price densities:
\[
\phi_{t,T} \equiv m_{t,T} \cdot dP, \quad m_{t,T} = S_0(T)^{-1} \eta(T), \quad \eta(T) = \frac{dQ}{dP} . \tag{4.47}
\]
As in the finite state space of Chapter 2, we multiply the budget constraint in Eq. (4.39) by
these Arrow-Debreu densities, and then, we “take the integral over all states of nature.” The
4 The so-called Föllmer and Schweizer (1991) measure, or minimal equivalent martingale measure, is defined as: P̂ ∗ (A) ≡
E(1A ξ̂ (T )), for each A ∈ F(T ).
5 Moreover, we assume that the agent only considers the choice space in which the control functions satisfy the elementary
Markov property and belong to L20,T ,m (Ω, F, P ) and L20,T ,1 (Ω, F, P ).

original problem, one with an infinity of trajectory constraints, will then be reduced to one with only one constraint, just as for the budget constraint in Eq. (4.46). Accordingly, multiply both sides in Eq. (4.39) by $\phi_{0,T} = S_0(T)^{-1} \cdot dQ$, and rearrange terms, to obtain:
\[
0 = \left[ \frac{V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{c(u)}{S_0(u)}\, du - x \right] dQ - \left[ \int_t^T \frac{\pi^\top (a - 1_m r)(u)}{S_0(u)}\, du + \int_t^T \frac{(\pi^\top \sigma)(u)}{S_0(u)}\, dW(u) \right] dQ .
\]
Next, take the integral over all states of nature. By Girsanov's theorem, the second bracket integrates to zero under Q, so that
\[
0 = E^Q\left[ \frac{V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{c(u)}{S_0(u)}\, du - x \right] .
\]
We can now retrieve the budget constraint under the probability P. We have, by a change of measure and computations in the Appendix, that:
\[
x = E^Q\left[ \frac{V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{c(u)}{S_0(u)}\, du \right] = E\left[ m_{t,T} \cdot V^{x,\pi,c}(T) + \int_t^T m_{t,u} \cdot c(u)\, du \right] . \tag{4.48}
\]
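So the program is,
\begin{align*}
J(t, x) &= \max_{(c, v)} E\left[ e^{-\rho (T - t)} U\left( V^{x,\pi,c}(T) \right) + \int_t^T u(\tau, c(\tau))\, d\tau \right], \\
\text{s.t. } x &= E\left[ m_{t,T} \cdot V^{x,\pi,c}(T) + \int_t^T m_{t,\tau} \cdot c(\tau)\, d\tau \right] .
\end{align*}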
Because of its emphasis on the equivalent martingale measure, this approach to solve the original
problem is known as relying on martingale methods. Critically, market completeness is needed
to use these methods, as in this case, there is one and only one Arrow-Debreu density process.
However, the same martingale methods can be applied in the presence of portfolio constraints
(which include incomplete markets as a special case) too, although in a slightly modified manner,
as we shall see in Section 4.6.
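To solve the problem, consider the Lagrangean,
\[
\max_{(c, v)} E\left[ \int_t^T \left[ u(\tau, c(\tau)) - \psi \cdot m_{t,\tau} \cdot c(\tau) \right] d\tau + U(v) - \psi \cdot m_{t,T} \cdot v + \psi \cdot x \right],
\]
where ψ is the constraint's multiplier, and by Eqs. (4.41) and (4.47),
\[
m_{t,\tau} = \exp\left( -\int_t^\tau \left( r(u) + \frac{1}{2} \|\lambda(u)\|^2 \right) du - \int_t^\tau \lambda^\top(u)\, dW(u) \right) . \tag{4.49}
\]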
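The first order conditions are:
\[
u_c(\tau, c(\tau)) = \psi \cdot m_{t,\tau}, \text{ for } \tau \in [t, T), \quad \text{and} \quad U'\left( V^{x,\pi,c}(T) \right) = \psi \cdot m_{t,T} . \tag{4.50}
\]
To compute the portfolio-consumption policy, note that for $c(\tau) \equiv 0$, the proof is just that leading to Theorem 4.4. In the general case, define,
\[
M(\tau) \equiv E^Q\left[ \left. S_0(T)^{-1} \cdot \hat{v} + \int_t^T S_0(u)^{-1} \hat{c}(u)\, du \,\right|\, \mathcal{F}(\tau) \right] .
\]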
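Notice that:
\[
M(\tau) = E^Q\left[ \left. S_0^{-1}(T) \cdot \hat{v} + \int_t^T S_0(u)^{-1} \hat{c}(u)\, du \,\right|\, \mathcal{F}(\tau) \right] = E\left[ \left. m_{t,T} \cdot \hat{v} + \int_t^T m_{t,u} \cdot \hat{c}(u)\, du \,\right|\, \mathcal{F}(\tau) \right] .
\]
By the predictable representation theorem, ∃φ such that:
\[
M(\tau) = M(t) + \int_t^\tau \phi^\top(u)\, dW(u).
\]
Consider the process $\{ m_{t,\tau} V^{x,\pi,c}(\tau) \}_{\tau \in [t,T]}$. By Itô's lemma,
\[
m_{t,\tau} V^{x,\pi,c}(\tau) + \int_t^\tau m_{t,u} \cdot c(u)\, du = x + \int_t^\tau m_{t,u} \cdot \left( \pi^\top \sigma - V^{x,\pi,c} \lambda^\top \right)(u)\, dW(u).
\]
By identifying,
\[
\pi^\top(\tau) = \left[ V^{x,\pi,c}(\tau)\, \lambda^\top(\tau) + \frac{\phi^\top(\tau)}{m_{t,\tau}} \right] \sigma^{-1}(\tau), \tag{4.51}
\]
where $V^{x,\pi,c}(\tau)$ can be computed from the constraint:
\[
V^{x,\pi,c}(\tau) = E\left[ \left. m_{\tau,T} \cdot v + \int_\tau^T m_{\tau,u} \cdot c(u)\, du \,\right|\, \mathcal{F}(\tau) \right],
\]
once the optimal trajectory of c has been computed.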

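As an example, let $U(v) = \ln v$ and $u(x) = \ln x$. By the first order conditions (4.50), $\frac{1}{\hat{c}(\tau)} = \psi \cdot m_{t,\tau}$ and $\frac{1}{\hat{v}} = \psi \cdot m_{t,T}$. By plugging these conditions into the constraint, one obtains the solution for the Lagrange multiplier: $\psi = \frac{T + 1}{x}$. By replacing this back into the previous first order conditions, one eventually obtains: $\hat{c}(\tau) = \frac{x}{T + 1} \frac{1}{m_{t,\tau}}$ and $\hat{v} = \frac{x}{T + 1} \frac{1}{m_{t,T}}$. As regards the portfolio process, one has that:
\[
M(\tau) = E\left[ \left. m_{t,T} \cdot \hat{v} + \int_t^T m_{t,u}\, \hat{c}(u)\, du \,\right|\, \mathcal{F}(\tau) \right] = x,
\]
which shows that φ = 0 in the representation of Eq. (4.51). So by replacing φ = 0 into (4.51),
\[
\pi^\top(\tau) = V^{x,\pi,\hat{c}}(\tau)\, \lambda^\top(\tau)\, \sigma^{-1}(\tau) .
\]
We can compute $V^{x,\pi,\hat{c}}$ in (4.14) by using $\hat{c}$:
\[
V^{x,\pi,\hat{c}}(\tau) = \frac{x}{T + 1} E\left[ \left. \frac{m_{\tau,T}}{m_{t,T}} + \int_\tau^T \frac{m_{\tau,u}}{m_{t,u}}\, du \,\right|\, \mathcal{F}(\tau) \right] = \frac{x}{m_{t,\tau}} \frac{T + 1 - (\tau - t)}{T + 1},
\]
where we used the property that m satisfies: $m_{t,a} \cdot m_{a,b} = m_{t,b}$, $t \le a \le b$. The solution is:
\[
\pi^\top(\tau) = \frac{x}{m_{t,\tau}} \frac{T + 1 - (\tau - t)}{T + 1}\, \lambda^\top(\tau)\, \sigma^{-1}(\tau),
\]
whence, by taking into account the relation $a - 1_m r = \sigma \lambda$,
\[
\pi(\tau) = \frac{x}{m_{t,\tau}} \frac{T + 1 - (\tau - t)}{T + 1} \left[ (\sigma \sigma^\top)^{-1} (a - 1_m r) \right](\tau).
\]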
4.5.2 The older, Merton’s approach: dynamic programming

Merton's approach derives optimal consumption and portfolio policies through Bellman's dynamic programming. Let us see how it works in the infinite horizon case. The problem the agent faces is:
\begin{align*}
J(V(t)) &= \max_{c} E\left[ \int_t^\infty e^{-\rho (\tau - t)} u(c(\tau))\, d\tau \right] \\
\text{s.t. } dV &= \left[ \pi^\top (a - 1_m r) + rV - c \right] d\tau + \pi^\top \sigma\, dW .
\end{align*}
Under regularity conditions,
\[
0 = \max_{c} E\left[ u(c) + J'(V) \left( \pi^\top (a - 1_m r) + rV - c \right) + \frac{1}{2} J''(V)\, \pi^\top \sigma \sigma^\top \pi - \rho J(V) \right] . \tag{4.52}
\]
The first order conditions lead to:
\[
u'(c) = J'(V) \quad \text{and} \quad \pi = \frac{-J'(V)}{J''(V)} \left( \sigma \sigma^\top \right)^{-1} (a - 1_m r) . \tag{4.53}
\]
Plugging these expressions back into the Bellman Equation (4.52) leaves:
\[
0 = u(c) + J'(V) \left[ \frac{-J'(V)}{J''(V)} \cdot Sh + rV - c \right] + \frac{1}{2} J''(V) \left( \frac{-J'(V)}{J''(V)} \right)^2 Sh - \rho J(V), \tag{4.54}
\]
where:
\[
Sh \equiv (a - 1_m r)^\top (\sigma \sigma^\top)^{-1} (a - 1_m r),
\]
with $\lim_{T \to \infty} e^{-\rho (T - t)} E\left[ J(V(T)) \right] = 0$.

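As an example, consider the CRRA utility $u(c) = (c^{1-\eta} - 1)/(1 - \eta)$. Conjecture that:
\[
J(x) = A\, \frac{x^{1-\eta} - B}{1 - \eta},
\]
where A, B are constants to be determined. Using the first condition in (4.53) leaves $c = A^{-1/\eta} V$. By plugging this expression into Eq. (4.54), and using the conjectured analytical form of J, we obtain:
\[
0 = A V^{1-\eta} \left[ \frac{\eta}{1 - \eta} A^{-1/\eta} + \frac{1}{2} \frac{Sh}{\eta} + r - \frac{\rho}{1 - \eta} \right] - \frac{1}{1 - \eta} (1 - \rho A B) .
\]
This equation must hold for every V. Therefore
\[
A = \left[ \frac{\rho - r(1 - \eta)}{\eta} - \frac{(1 - \eta) Sh}{2 \eta^2} \right]^{-\eta}, \quad B = \frac{1}{\rho} \left[ \frac{\rho - r(1 - \eta)}{\eta} - \frac{(1 - \eta) Sh}{2 \eta^2} \right]^{\eta} .
\]
Clearly, $\lim_{\eta \to 1} J(V) = \rho^{-1} \ln V$.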
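The constants A and B can be checked numerically by plugging the conjectured value function back into Eq. (4.54). The sketch below does this under hypothetical parameter values (ρ, r, Sh, η are illustrative and not taken from the text), and assumes the bracketed term defining A is positive, as the formula requires; the function names are introduced here only for the illustration.

```python
def merton_crra_constants(rho, r, sh, eta):
    """Constants A and B of the conjectured J(x) = A (x^(1-eta) - B) / (1 - eta)."""
    base = (rho - r * (1.0 - eta)) / eta - (1.0 - eta) * sh / (2.0 * eta ** 2)
    if base <= 0.0:
        raise ValueError("the bracketed term must be positive for A to be well defined")
    a_const = base ** (-eta)
    b_const = base ** eta / rho
    return a_const, b_const

def bellman_residual(wealth, rho, r, sh, eta):
    """Right-hand side of Eq. (4.54) evaluated at the conjectured solution."""
    a_const, b_const = merton_crra_constants(rho, r, sh, eta)
    c = a_const ** (-1.0 / eta) * wealth                     # optimal consumption
    u = (c ** (1.0 - eta) - 1.0) / (1.0 - eta)               # CRRA felicity
    j = a_const * (wealth ** (1.0 - eta) - b_const) / (1.0 - eta)
    j1 = a_const * wealth ** (-eta)                          # J'(V)
    j2 = -eta * a_const * wealth ** (-eta - 1.0)             # J''(V)
    ratio = -j1 / j2                                         # equals V / eta
    return u + j1 * (ratio * sh + r * wealth - c) + 0.5 * j2 * ratio ** 2 * sh - rho * j

# Hypothetical parameters: rho = 5%, r = 2%, Sh = 0.16, eta = 3.
for v in (0.5, 1.0, 2.0, 10.0):
    print(v, bellman_residual(v, rho=0.05, r=0.02, sh=0.16, eta=3.0))
```

The residual should vanish, up to floating-point error, at every wealth level, which is the sense in which the equation "must hold for every V."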

4.5.3 Equilibrium
In a complete markets setting, an equilibrium is (i) a consumption plan satisfying the first order conditions (4.50); (ii) a portfolio process having the form in Eq. (4.51), and (iii) the following market clearing conditions:
\begin{align}
c(\tau) &= D(\tau) \equiv \sum_{i=1}^m D_i(\tau), \text{ for } \tau \in [t, T), \quad q(T) \equiv \sum_{i=1}^m S_i(T) \tag{4.55} \\
\theta_0(\tau) &= 0, \quad \pi(\tau) = S(\tau), \text{ for } \tau \in [t, T] . \tag{4.56}
\end{align}
We now derive equilibrium allocations and Arrow-Debreu state price densities. First, note that the dividend process, D, satisfies:
\[
dD(\tau) = a_D(\tau) D(\tau)\, d\tau + \sigma_D(\tau) D(\tau)\, dW(\tau),
\]
where $a_D D \equiv \sum_{i=1}^m a_{D_i} D_i$ and $\sigma_D D \equiv \sum_{i=1}^m \sigma_{D_i} D_i$.
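We have:
\begin{align}
d \ln u_c(\tau, D(\tau)) &= d \ln u_c(\tau, c(\tau)) \notag \\
&= d \ln m_{t,\tau} \notag \\
&= -\left( r(\tau) + \frac{1}{2} \|\lambda(\tau)\|^2 \right) dt - \lambda^\top(\tau)\, dW(\tau), \tag{4.57}
\end{align}
where the first equality holds in an equilibrium, the second equality follows by the first order conditions in (4.50), and the third equality is true by the definition of $m_{t,\tau}$ in Eq. (4.49). Finally, by Itô's lemma, $\ln u_c(\tau, D(\tau))$ is the solution to:
\[
d \ln u_c = \left[ \frac{u_{\tau c}}{u_c} + a_D D \frac{u_{cc}}{u_c} + \frac{1}{2} \sigma_D^2 D^2 \left( \frac{u_{ccc}}{u_c} - \left( \frac{u_{cc}}{u_c} \right)^2 \right) \right] dt + \frac{u_{cc}}{u_c} D \sigma_D\, dW. \tag{4.58}
\]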
By identifying drifts and diffusion terms in Eqs. (4.57)-(4.58), we obtain, after a few simplifications, the expression for the equilibrium short term rate and the prices of risk:
\begin{align*}
r(\tau) &= -\left[ \frac{u_{\tau c}(\tau, D(\tau))}{u_c(\tau, D(\tau))} + a_D(\tau) D(\tau) \frac{u_{cc}(\tau, D(\tau))}{u_c(\tau, D(\tau))} + \frac{1}{2} \sigma_D(\tau)^2 D(\tau)^2 \frac{u_{ccc}(\tau, D(\tau))}{u_c(\tau, D(\tau))} \right] \\
\lambda^\top(\tau) &= -\frac{u_{cc}(\tau, D(\tau))}{u_c(\tau, D(\tau))}\, \sigma_D(\tau)\, D(\tau) .
\end{align*}
For example, consider the CRRA utility function $u(\tau, c) = e^{-(\tau - t) \rho} (c^{1-\eta} - 1)/(1 - \eta)$, with m = 1. Then,
\[
r(\tau) = \rho + \eta \cdot a_D(\tau) - \frac{1}{2} \eta (\eta + 1) \sigma_D(\tau)^2, \quad \lambda(\tau) = \eta\, \sigma_D(\tau) .
\]
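As a quick consistency check of the CRRA example, the sketch below evaluates the general expressions for r and λ using the CRRA derivative ratios ($u_{\tau c}/u_c = -\rho$, $u_{cc}/u_c = -\eta/D$, $u_{ccc}/u_c = \eta(\eta+1)/D^2$) and compares them with the closed forms. It assumes a single tree (m = 1) and scalar coefficients; the parameter values and function names are hypothetical.

```python
def general_rate_and_price_of_risk(u_tc_over_uc, u_cc_over_uc, u_ccc_over_uc,
                                   a_d, sigma_d, dividend):
    """Short rate and price of risk from the general formulas (scalar case)."""
    r = -(u_tc_over_uc
          + a_d * dividend * u_cc_over_uc
          + 0.5 * sigma_d ** 2 * dividend ** 2 * u_ccc_over_uc)
    lam = -u_cc_over_uc * sigma_d * dividend
    return r, lam

def crra_rate_and_price_of_risk(rho, eta, a_d, sigma_d):
    """Closed forms for CRRA utility: r = rho + eta a_D - 0.5 eta (eta+1) sigma_D^2."""
    r = rho + eta * a_d - 0.5 * eta * (eta + 1.0) * sigma_d ** 2
    lam = eta * sigma_d
    return r, lam

# Hypothetical inputs: 2% impatience, risk aversion 2, 2% dividend growth, 3% volatility.
rho, eta, a_d, sigma_d, dividend = 0.02, 2.0, 0.02, 0.03, 1.7
general = general_rate_and_price_of_risk(
    u_tc_over_uc=-rho,
    u_cc_over_uc=-eta / dividend,
    u_ccc_over_uc=eta * (eta + 1.0) / dividend ** 2,
    a_d=a_d, sigma_d=sigma_d, dividend=dividend)
closed_form = crra_rate_and_price_of_risk(rho, eta, a_d, sigma_d)
print(general, closed_form)
```

The dividend level drops out of both expressions, as expected for a homothetic utility function.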
Appendix 2 performs Walras’s consistency tests: Eq. (4.55) ⇐⇒ Eq. (4.56).

4.5.4 Continuous-time Consumption-CAPM

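By Eq. (4.42),
\begin{align*}
S_i(\tau) &= E^Q\left[ \left. \frac{S_0(\tau)}{S_0(T)} S_i(T) + \int_\tau^T \frac{S_0(\tau)}{S_0(s)} D_i(s)\, ds \,\right|\, \mathcal{F}(\tau) \right] \\
&= E\left[ \left. \frac{m_{t,T}}{m_{t,\tau}} S_i(T) + \int_\tau^T \frac{m_{t,s}}{m_{t,\tau}} D_i(s)\, ds \,\right|\, \mathcal{F}(\tau) \right],
\end{align*}
where the second line follows by the same arguments leading to Eq. (4.48). Replacing the first order condition in (4.50), and the equilibrium conditions in Eq. (4.55), we obtain the consumption CAPM evaluation of each asset:
\[
S_i(\tau) = E\left[ \left. \frac{u'(q(T))}{u'(D(\tau))} S_i(T) + \int_\tau^T \frac{u'(D(s))}{u'(D(\tau))} D_i(s)\, ds \,\right|\, \mathcal{F}(\tau) \right], \quad i = 0, 1, \cdots, m.
\]
As an example, consider a pure discount bond, with price b. We have that its dividend is zero and that b(T) = 1. Therefore,
\[
b(\tau) = E\left[ \left. \frac{u'(q(T))}{u'(D(\tau))} \,\right|\, \mathcal{F}(\tau) \right] = E\left[ \left. \frac{m_{t,T}}{m_{t,\tau}} \,\right|\, \mathcal{F}(\tau) \right],
\]
where $m_{t,\tau}$ is as in Eq. (4.49).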
4.6 Market imperfections and portfolio choice

The setup is as in Section 4.4, where we fix m = d. To allow for frictions such as market incompleteness or short sale constraints, we assume that the vector of normalized portfolio shares in the risky assets, $p(t) \equiv \pi(t)/V^{x,\pi,c}(t)$, is constrained to lie in a closed convex set $K \subset \mathbb{R}^d$.
We follow the approach put forward by Cvitanić and Karatzas (1992), which consists in
“embedding” the constrained portfolio choice of the investor in a set of unconstrained portfolio
optimization problems. Under regularity conditions that we shall not deal with in these lectures,
it is shown that in this set of unconstrained problems, there exists one, which happens to be the
solution to the original constrained portfolio problem. So the constrained portfolio problem is
solved, once we solve for the unconstrained, which we can do through the martingale methods in
Section 4.4. This approach is closely related to the discrete time minimax probability mentioned
in Chapter 2. It is a systematic approach to consumption and portfolio policies in a context of
constrained portfolio choices, and generalizes results from He and Pearson (1991).
The starting point is the definition of the support function,
\[
\zeta(\nu) = \sup_{p \in K} \left( -p^\top \nu \right), \quad \nu \in \mathbb{R}^d, \tag{4.59}
\]
and its effective domain,
\[
\tilde{K} = \{ \nu \in \mathbb{R}^d : \zeta(\nu) < \infty \} .
\]
The role of the support function ζ is to "tilt" the dynamics of the price system in Section 4.4, as follows:
\[
\frac{dS_0(t)}{S_0(t)} = r^\nu(t)\, dt, \quad \frac{dS_i(t)}{S_i(t)} = \hat{a}_i^\nu(t)\, dt + \sigma_i(t)\, dW(t) \quad (i = 1, \cdots, d) \tag{4.60}
\]
where:
\[
r^\nu \equiv r + \zeta(\nu), \quad \hat{a}_i^\nu \equiv \hat{a}_i + \nu_i + \zeta(\nu),
\]
and $\hat{a}_i$ is as in Section 4.4.
The main result is as follows. Denote with Val(x; K) the value of the problem faced by an investor subject to a portfolio constraint $K \subset \mathbb{R}^d$, when his initial wealth is x. Let $\mathrm{Val}_\nu(x)$ be the corresponding value of the problem faced by an unconstrained investor in the market (4.60). Clearly, this value is just $\mathrm{Val}_0(x)$ for the market considered in Sections 4.4 and 4.5. Moreover, for each $\nu \in \mathbb{R}^d$, the unconstrained program the investor faces in the market (4.60) can be solved through martingale methods, using the unique risk-neutral probability $Q^\nu$, equivalent to P, with Radon-Nikodym derivative equal to,
\[
\eta^\nu(T) \equiv \frac{dQ^\nu}{dP} = \eta^0(T) \exp\left( -\int_0^T \left( \sigma^{-1}(t) \nu(t) \right)^\top dW(t) - \frac{1}{2} \int_0^T \left\| \sigma^{-1}(t) \nu(t) \right\|^2 dt \right) . \tag{4.61}
\]
Then, under regularity conditions, we have that:
\[
\mathrm{Val}(x; K) = \inf_{\nu \in \tilde{K}} \left( \mathrm{Val}_\nu(x) \right), \tag{4.62}
\]
and optimal consumption and portfolio choices for this unconstrained problem are exactly those chosen by the investor constrained to have p ∈ K. Appendix 4 provides an informal sketch of the arguments leading to Eq. (4.62).
Examples of the support function ζ in Eq. (4.59) are: the unconstrained case, $K = \mathbb{R}^d$, in which case $\tilde{K} = \{0\}$ and ζ = 0 on $\tilde{K}$; prohibition of short-selling, $K = [0, \infty)^d$, in which case $\tilde{K} = K$ and ζ = 0 on $\tilde{K}$; and incomplete markets, $K = \{ p \in \mathbb{R}^d : p_{M+1} = \cdots = p_d = 0 \}$ (i.e. only the first M assets can be traded), in which case $\tilde{K} = \{ \nu \in \mathbb{R}^d : \nu_1 = \cdots = \nu_M = 0 \}$ and ζ = 0 on $\tilde{K}$.
In the context of log-utility functions, we have that,
\[
\hat{\nu} = \arg\min_{\nu \in \tilde{K}} \left( 2 \zeta(\nu) + \left\| \lambda + \sigma^{-1} \nu \right\|^2 \right),
\]
where $\lambda = \sigma^{-1} (a - 1_d r)$. Applications of this will be worked out in Part II on "Asset pricing and reality."
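To see how the minimization works in the simplest case, consider a sketch with one risky asset (d = 1), constant coefficients, and a short-selling prohibition, $K = [0, \infty)$, so that $\tilde{K} = [0, \infty)$ and ζ = 0 on $\tilde{K}$. The code below is only an illustration under these assumptions: the function name is hypothetical, and the portfolio share in the tilted market is taken to be the unconstrained log-utility share of Section 4.5.1 applied to the tilted coefficients, $p = (a - r + \nu)/\sigma^2$.

```python
def log_investor_no_short_sales(a, r, sigma):
    """Minimizer nu_hat and the implied portfolio share for d = 1, K = [0, inf).

    With zeta = 0 on K_tilde = [0, inf), nu_hat minimizes (lambda + nu / sigma)^2
    over nu >= 0, where lambda = (a - r) / sigma.
    """
    lam = (a - r) / sigma
    nu_hat = max(0.0, -sigma * lam)          # the constraint binds only when a < r
    lam_tilted = lam + nu_hat / sigma        # unit risk premium in the tilted market
    share = lam_tilted / sigma               # log-investor share: (a - r + nu_hat) / sigma^2
    return nu_hat, share

# Hypothetical numbers: a positive and a negative risk premium.
print(log_investor_no_short_sales(a=0.08, r=0.03, sigma=0.2))
print(log_investor_no_short_sales(a=0.01, r=0.03, sigma=0.2))
```

With a positive risk premium the constraint is slack and $\hat{\nu} = 0$; with a negative one, $\hat{\nu}$ raises the tilted drift just enough for the investor to hold nothing, rather than a short position, in the risky asset.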
4.7 Jumps
Brownian motions are well suited to model the price behavior of liquid assets or assets issued by
names or Governments not subject to default risk. There is, however, a fair amount of interest in
modeling discontinuous changes in asset prices. Fixed income instruments may undergo liquidity
dry-ups, or even default, causing price discontinuities that we wish to model. This section is
an introduction to Poisson models, a class of processes that is particularly useful in addressing
these issues.
4.7.1 Poisson jumps

Let (t, T) be a given interval, and consider events in that interval which display the following properties:
(i) The random numbers of event arrivals on any disjoint time intervals of (t, T) are independent.
(ii) Given two arbitrary disjoint but equal time intervals in (t, T), the probability of a given random number of event arrivals is the same in each interval.
(iii) The probability that at least two events occur simultaneously in any time interval is zero.
Next, let $P_k(\tau - t)$ be the probability that k events arrive during the time interval τ − t. We make use of the previous three properties to determine the functional form of $P_k(\tau - t)$. First, $P_0$ must satisfy:
\[
P_0(\tau + d\tau - t) = P_0(\tau - t)\, P_0(d\tau), \tag{4.63}
\]
and we impose
\[
P_0(0) = 1, \quad P_k(0) = 0 \text{ for } k \ge 1. \tag{4.64}
\]
Eq. (4.63) and the first condition in (4.64) are satisfied by $P_0(\tau) = e^{-v\tau}$, for some constant v, which we take to be positive, so as to ensure that $P_0 \in [0, 1]$. Furthermore, we have that:
\[
\begin{cases}
P_1(\tau + d\tau - t) = P_0(\tau - t)\, P_1(d\tau) + P_1(\tau - t)\, P_0(d\tau) \\
\quad \vdots \\
P_k(\tau + d\tau - t) = P_{k-1}(\tau - t)\, P_1(d\tau) + P_k(\tau - t)\, P_0(d\tau) \\
\quad \vdots
\end{cases} \tag{4.65}
\]
The first equation in (4.65) can be rearranged as follows:
\[
\frac{P_1(\tau + d\tau - t) - P_1(\tau - t)}{d\tau} = -\frac{1 - P_0(d\tau)}{d\tau} P_1(\tau - t) + \frac{P_1(d\tau)}{d\tau} P_0(\tau - t) .
\]
For small dτ, $P_1(d\tau) \approx 1 - P_0(d\tau)$ and $P_0(d\tau) = 1 - v\, d\tau + O(d\tau^2) \approx 1 - v\, d\tau$. Therefore, $P_1'(\tau - t) = -v P_1(\tau - t) + v P_0(\tau - t)$. By a similar reasoning,
\[
P_k'(\tau - t) = -v P_k(\tau - t) + v P_{k-1}(\tau - t) .
\]
The solution to this equation is:
\[
P_k(\tau - t) = \frac{v^k (\tau - t)^k}{k!} e^{-v (\tau - t)} .
\]
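The claim that the Poisson probabilities solve the system of differential equations can be checked numerically. The following is a minimal sketch, assuming a constant intensity and a simple explicit Euler scheme (a choice made here for illustration, not one made in the text); it integrates $P_0' = -vP_0$ and $P_k' = -vP_k + vP_{k-1}$ from the initial conditions (4.64) and compares the result with the closed form.

```python
import math

def poisson_probability(k, v, length):
    """Closed form P_k = (v * length)^k / k! * exp(-v * length)."""
    a = v * length
    return a ** k / math.factorial(k) * math.exp(-a)

def integrate_arrival_odes(v, length, k_max, steps=20_000):
    """Explicit Euler integration of P_0' = -v P_0, P_k' = -v P_k + v P_{k-1}."""
    h = length / steps
    p = [1.0] + [0.0] * k_max                # initial conditions (4.64)
    for _ in range(steps):
        updated = [p[0] - h * v * p[0]]
        for k in range(1, k_max + 1):
            updated.append(p[k] + h * (-v * p[k] + v * p[k - 1]))
        p = updated
    return p

# Hypothetical intensity and interval length.
v, length = 1.5, 2.0
numerical = integrate_arrival_odes(v, length, k_max=4)
for k, value in enumerate(numerical):
    print(k, value, poisson_probability(k, v, length))
```

The two columns agree up to the discretization error of the Euler scheme, which shrinks as the number of steps grows.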
4.7.2 Interpretation
A Poisson model is one of rare events. Moreover,
\[
E(\text{event arrival in } d\tau) = P_1(d\tau) = v\, d\tau .
\]
For this reason, we usually refer to the parameter v as the intensity of event arrivals.
To provide additional intuition about the mathematics of rare events, consider the expression for the probability of k "arrivals" in n trials, predicted by a binomial distribution:
\[
P_{n,k} = \binom{n}{k} p^k q^{n-k} = \frac{n!}{k!\, (n - k)!}\, p^k q^{n-k}, \quad p, q > 0, \quad p + q = 1,
\]
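where p is the probability of arrival for each trial. We want to model the probability p as a function of n, with the feature that $\lim_{n \to \infty} p(n) = 0$, so as to make each arrival "rare." One possible choice is $p(n) = \frac{a}{n}$, for some constant a > 0. Under this assumption, we have:
\begin{align*}
P_{n,k} &= \frac{n!}{k!\, (n - k)!}\, p(n)^k (1 - p(n))^{n-k} \\
&= \frac{n!}{k!\, (n - k)!} \left( \frac{a}{n} \right)^k \left( 1 - \frac{a}{n} \right)^{n-k} \\
&= \frac{n!}{k!\, (n - k)!} \left( \frac{a}{n} \right)^k \left( 1 - \frac{a}{n} \right)^n \left( 1 - \frac{a}{n} \right)^{-k} \\
&= \frac{n!}{n^k (n - k)!} \frac{a^k}{k!} \left( 1 - \frac{a}{n} \right)^n \left( 1 - \frac{a}{n} \right)^{-k} \\
&= \underbrace{\frac{n}{n} \cdot \frac{n - 1}{n} \cdots \frac{n - k + 1}{n}}_{k \text{ times}} \frac{a^k}{k!} \left( 1 - \frac{a}{n} \right)^n \left( 1 - \frac{a}{n} \right)^{-k},
\end{align*}
leaving,
\[
\lim_{n \to \infty} P_{n,k} \equiv P_k = \frac{a^k}{k!} e^{-a} .
\]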
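The convergence of the binomial probabilities to the Poisson ones can also be seen numerically. The sketch below, with a hypothetical value of a and illustrative function names, computes the largest gap between $P_{n,k}$ with $p(n) = a/n$ and the limit $P_k$ over the first few values of k, for increasing n.

```python
import math

def binomial_probability(n, k, p):
    """P_{n,k} = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_limit(k, a):
    """P_k = a^k / k! * exp(-a)."""
    return a ** k / math.factorial(k) * math.exp(-a)

def largest_gap(n, a, k_max=10):
    """Largest absolute difference between P_{n,k} and P_k for k = 0, ..., k_max."""
    return max(abs(binomial_probability(n, k, a / n) - poisson_limit(k, a))
               for k in range(k_max + 1))

# Hypothetical choice a = 2: the gap shrinks as the number of trials grows.
for n in (10, 100, 1_000, 10_000):
    print(n, largest_gap(n, a=2.0))
```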
Next, we split the interval (τ − t) into n subintervals of length $\frac{\tau - t}{n}$, and then make the probability of one arrival in each sub-interval proportional to each sub-interval length, as illustrated in Figure 4.1,
\[
p(n) = v\, \frac{\tau - t}{n} \equiv \frac{a}{n}, \quad a \equiv v (\tau - t).
\]
The Poisson model in the previous section is thus the one we consider here, with n → ∞, which is continuous-time, as each sub-interval in Figure 4.1 shrinks to dτ. The probability that there is one arrival in dτ is v dτ, which is also the expected number of events in dτ, as shown below:
\begin{align*}
E(\#\text{ arrivals in } d\tau) &= \Pr(\text{one arrival in } d\tau) \times \text{one arrival} + \Pr(\text{zero arrivals in } d\tau) \times \text{zero arrivals} \\
&= \Pr(\text{one arrival in } d\tau) \times 1 + \Pr(\text{zero arrivals in } d\tau) \times 0 \\
&= v\, d\tau .
\end{align*}
The heuristic construction in this section opens the way to how we can simulate Poisson processes. We can just simulate a Uniform random variable U(0, 1), with the continuous-time process being approximated by Y, where:
\[
Y = \begin{cases} 0 & \text{if } 0 \le U < 1 - vh \\ 1 & \text{if } 1 - vh \le U < 1 \end{cases}
\]
where h is a discretization interval.
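The simulation scheme just described can be sketched as follows. This is a minimal illustration, assuming a constant intensity v and a step h small enough that vh < 1; the function names and the numbers are hypothetical. Each step draws a uniform variable and records an arrival when it falls in [1 − vh, 1).

```python
import random

def simulate_arrival_count(v, horizon, h, rng):
    """Number of arrivals over [0, horizon], one Bernoulli draw Y per step of length h."""
    n_steps = int(round(horizon / h))
    count = 0
    for _ in range(n_steps):
        u = rng.random()                     # U ~ Uniform[0, 1)
        if u >= 1.0 - v * h:                 # Y = 1 with probability v * h
            count += 1
    return count

def average_arrival_count(v, horizon, h, n_paths, seed=0):
    """Monte Carlo average of the arrival count; should be close to v * horizon."""
    rng = random.Random(seed)
    total = sum(simulate_arrival_count(v, horizon, h, rng) for _ in range(n_paths))
    return total / n_paths

# Hypothetical inputs: intensity 2 per unit of time, unit horizon, step 0.01.
print(average_arrival_count(v=2.0, horizon=1.0, h=0.01, n_paths=2_000, seed=1))
```

The average count should be close to v(τ − t), the Poisson mean, with a Monte Carlo error that falls as the number of simulated paths grows.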
4.7.3 Properties and related distributions

We check that $P_k$ is a probability. We have:
\[
\sum_{k=0}^\infty P_k = e^{-a} \sum_{k=0}^\infty \frac{a^k}{k!} = 1,
\]
since $\sum_{k=0}^\infty \frac{a^k}{k!}$ is the Maclaurin expansion of $e^a$. Second, we compute the mean,
\[
\text{Mean} = \sum_{k=0}^\infty k \cdot P_k = e^{-a} \sum_{k=0}^\infty k \cdot \frac{a^k}{k!} = a.
\]
FIGURE 4.1. Heuristic construction of a Poisson process from a binomial distribution. [The interval from t to τ is split into n subintervals, each of length $n^{-1}(\tau - t)$.]
A related distribution is the exponential (or Erlang) distribution. Remember, the probability of zero arrivals in τ − t predicted by the Poisson model is $P_0(\tau - t) = e^{-v (\tau - t)}$, from which it follows that:
\[
G(\tau - t) \equiv 1 - P_0(\tau - t) = 1 - e^{-v (\tau - t)}
\]
is the probability of at least one arrival in τ − t. The function G can also be interpreted as the probability that the first arrival occurred before τ, starting from t. The density function of G is:
\[
g(\tau - t) = \frac{\partial}{\partial \tau} G(\tau - t) = v e^{-v (\tau - t)} .
\]
The first two moments of the exponential distribution are:
\[
\text{Mean} = \int_0^\infty x\, v e^{-vx}\, dx = v^{-1}, \quad \text{Variance} = \int_0^\infty \left( x - v^{-1} \right)^2 v e^{-vx}\, dx = v^{-2} .
\]
The expected time of the first arrival, starting from t, equals $v^{-1}$. More generally, $v^{-1}$ can be interpreted as the average time from one arrival to the next (see footnote 6).
A more general distribution than the exponential is the Gamma distribution with density:
\[
g_\gamma(\tau - t) = v e^{-v (\tau - t)} \frac{[v (\tau - t)]^{\gamma - 1}}{(\gamma - 1)!} .
\]
The exponential distribution obtains when γ = 1.
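The moments of the exponential waiting time can be verified by simulation. The sketch below assumes a constant intensity and uses inverse-transform sampling, i.e. it solves $U = G(x) = 1 - e^{-vx}$ for x; this sampling method and the function names are choices made here for illustration.

```python
import math
import random

def draw_waiting_time(v, rng):
    """Inverse-transform draw: solve U = 1 - exp(-v x) for x."""
    u = rng.random()                         # U in [0, 1), so 1 - U is in (0, 1]
    return -math.log(1.0 - u) / v

def sample_mean_and_variance(v, n_draws, seed=0):
    """Sample mean and variance of n_draws exponential waiting times."""
    rng = random.Random(seed)
    draws = [draw_waiting_time(v, rng) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    variance = sum((x - mean) ** 2 for x in draws) / n_draws
    return mean, variance

# Hypothetical intensity v = 2: theory gives mean 1/v = 0.5 and variance 1/v^2 = 0.25.
print(sample_mean_and_variance(v=2.0, n_draws=20_000, seed=1))
```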
4.7.4 Some asset pricing implications

This section is a short introduction to modeling asset prices as being driven by Brownian motions and jump processes. We model jumps by interpreting the "arrivals" in the previous sections as those events upon which a certain random variable experiences a jump of size $\mathcal{S}$, where $\mathcal{S}$ is another random variable with a fixed probability measure p. A simple model is:
\[
dS(\tau) = b(S(\tau))\, d\tau + \sigma(S(\tau))\, dW(\tau) + \ell(S(\tau)) \cdot \mathcal{S} \cdot dZ(\tau), \tag{4.66}
\]
where b, σ, ℓ are given functions (with σ > 0), W is a standard Brownian motion, and Z is a Poisson process with intensity equal to v, i.e.
6 Suppose arrivals are generated by Poisson processes, and consider the random variable "time interval elapsing from one arrival to the next one." Let τ′ be the instant at which the last arrival occurred. Then, the probability that the time τ − τ′ which will elapse from the last arrival to the next is less than ∆ is the same as the probability that, during a time interval of length ∆, there is at least one arrival.
(i) $\Pr(Z(t) = 0) = 1$.
(ii) $\forall t \le \tau_0 < \tau_1 < \cdots < \tau_N < \infty$, $Z(\tau_0)$ and $Z(\tau_k) - Z(\tau_{k-1})$ are independent for each k = 1, · · ·, N.
(iii) ∀τ > t, Z(τ) − Z(t) is a random variable with Poisson distribution and expected value v(τ − t), i.e.:
\[
\Pr(Z(\tau) - Z(t) = k) = \frac{v^k (\tau - t)^k}{k!} e^{-v (\tau - t)} .
\]
In this framework, k is the number of jumps over the time interval τ − t (see footnote 7). From this, we have that $\Pr(Z(\tau) - Z(t) = 1) = v (\tau - t) e^{-v (\tau - t)}$ and, for τ − t small,
\[
\Pr(dZ(\tau) = 1) \equiv \Pr\left( \left. Z(\tau) - Z(t) \right|_{\tau \to t} = 1 \right) = \left. v (\tau - t) e^{-v (\tau - t)} \right|_{\tau \to t} \simeq v\, d\tau .
\]
More generally, the process $\{Z(\tau) - v (\tau - t)\}_{\tau \ge t}$ is a martingale.

Armed with these preliminary facts, we can provide a heuristic derivation of Itô's lemma for jump-diffusion processes. Consider any function f with enough regularity conditions, a rational function of time and S in Eq. (4.66), i.e. $f(\tau) \equiv f(S(\tau), \tau)$. Consider the following expansion of f:
\begin{align*}
df(\tau) &= \left( \frac{\partial}{\partial \tau} + L \right) f(S(\tau), \tau)\, d\tau + f_S(S(\tau), \tau)\, \sigma(S(\tau))\, dW(\tau) \\
&\quad + \left[ f\left( S(\tau) + \ell(S(\tau)) \cdot \mathcal{S}, \tau \right) - f(S(\tau), \tau) \right] \cdot dZ(\tau).
\end{align*}
The first two terms are the usual Itô's lemma terms, with $\frac{\partial}{\partial \tau} \cdot + L \cdot$ denoting the infinitesimal generator for diffusions. The third term accounts for jumps. If there are no jumps from time $\tau_-$ to time τ (where $d\tau = \tau - \tau_-$), then dZ(τ) = 0. If there is a jump, then dZ(τ) = 1, and in this case f, as a "rational" function, also needs to jump instantaneously to $f(S(\tau) + \ell(S(\tau)) \cdot \mathcal{S}, \tau)$. The jump will be exactly $f(S(\tau) + \ell(S(\tau)) \cdot \mathcal{S}, \tau) - f(S(\tau), \tau)$, where $\mathcal{S}$ is another random variable with a fixed probability measure. Clearly, if f(S, τ) = S, we are back to the initial jump-diffusion model in Eq. (4.66).
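To derive the infinitesimal generator for jump-diffusions, $L^J f$ say, note that:
\begin{align*}
E(df) &= \left( \frac{\partial}{\partial \tau} + L \right) f\, d\tau + E\left[ \left( f(S + \ell \mathcal{S}, \tau) - f(S, \tau) \right) \cdot dZ(\tau) \right] \\
&= \left( \frac{\partial}{\partial \tau} + L \right) f\, d\tau + E\left[ \left( f(S + \ell \mathcal{S}, \tau) - f(S, \tau) \right) \cdot v \cdot d\tau \right],
\end{align*}
or
\[
L^J f = L f + v \cdot \int_{\mathrm{supp}(\mathcal{S})} \left[ f(S + \ell \mathcal{S}, \tau) - f(S, \tau) \right] p(d\mathcal{S}),
\]
where $\mathrm{supp}(\mathcal{S})$ denotes the support of $\mathcal{S}$. Therefore, the infinitesimal generator for jump-diffusions is simply $\left( \frac{\partial}{\partial \tau} + L^J \right) f$.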
7 For simplicity, we take v to be constant. If v is a deterministic function of time, we have that
\[
\Pr(Z(\tau) - Z(t) = k) = \frac{\left( \int_t^\tau v(u)\, du \right)^k}{k!} \exp\left( -\int_t^\tau v(u)\, du \right), \quad k = 0, 1, \cdots
\]
and there is also the possibility to model v as a function of the state, v = v(q), for example, which leads to Cox processes.
4.7.5 An option pricing formula

Merton (1976, JFE), Bates (1988, working paper), Naik and Lee (1990, RFS) are the seminal
papers.
4.8 Continuous-time Markov chains

Needed to model credit risk.
4.9 Appendix 1: Convergence issues

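We have,
\[
c_t + S_t \theta^{(1)}_{t+1} + b_t \theta^{(2)}_{t+1} = (S_t + D_t)\, \theta^{(1)}_t + b_t \theta^{(2)}_t \equiv V_t + D_t \theta^{(1)}_t,
\]
where $V_t \equiv S_t \theta^{(1)}_t + b_t \theta^{(2)}_t$ is wealth net of dividends. We have,
\begin{align*}
V_t - V_{t-1} &= S_t \theta^{(1)}_t + b_t \theta^{(2)}_t - V_{t-1} \\
&= S_t \theta^{(1)}_t + b_t \theta^{(2)}_t - \left( c_{t-1} + S_{t-1} \theta^{(1)}_t + b_{t-1} \theta^{(2)}_t - D_{t-1} \theta^{(1)}_{t-1} \right) \\
&= (S_t - S_{t-1})\, \theta^{(1)}_t + (b_t - b_{t-1})\, \theta^{(2)}_t - c_{t-1} + D_{t-1} \theta^{(1)}_{t-1},
\end{align*}
and more generally,
\[
V_t - V_{t-\Delta} = (S_t - S_{t-\Delta})\, \theta^{(1)}_t + (b_t - b_{t-\Delta})\, \theta^{(2)}_t - (c_{t-\Delta} \cdot \Delta) + (D_{t-\Delta} \cdot \Delta)\, \theta^{(1)}_{t-\Delta} .
\]
Now let ∆ ↓ 0 and assume that $\theta^{(1)}$ and $\theta^{(2)}$ are approximately constant between t and t − ∆. We have:
\[
dV(\tau) = \left( dS(\tau) + D(\tau)\, d\tau \right) \theta^{(1)}(\tau) + db(\tau)\, \theta^{(2)}(\tau) - c(\tau)\, d\tau .
\]
Assume that
\[
\frac{db(\tau)}{b(\tau)} = r\, d\tau .
\]
The budget constraint can then be written as:
\begin{align*}
dV(\tau) &= \left( dS(\tau) + D(\tau)\, d\tau \right) \theta^{(1)}(\tau) + r b(\tau)\, \theta^{(2)}(\tau)\, d\tau - c(\tau)\, d\tau \\
&= \left( dS(\tau) + D(\tau)\, d\tau \right) \theta^{(1)}(\tau) + r \left( V - S(\tau)\, \theta^{(1)}(\tau) \right) d\tau - c(\tau)\, d\tau \\
&= \left( dS(\tau) + D(\tau)\, d\tau - r S(\tau)\, d\tau \right) \theta^{(1)}(\tau) + rV\, d\tau - c(\tau)\, d\tau \\
&= \left( \frac{dS(\tau)}{S(\tau)} + \frac{D(\tau)}{S(\tau)}\, d\tau - r\, d\tau \right) \theta^{(1)}(\tau)\, S(\tau) + rV\, d\tau - c(\tau)\, d\tau \\
&= \left( \frac{dS(\tau)}{S(\tau)} + \frac{D(\tau)}{S(\tau)}\, d\tau - r\, d\tau \right) \pi(\tau) + rV\, d\tau - c(\tau)\, d\tau .
\end{align*}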
4.10 Appendix 2: An introduction to stochastic calculus for finance

4.10.1 Stochastic integrals
4.10.1.1 Motivation
Given is a Brownian motion $W(t) \equiv W_t(\omega)$, t ≥ 0, and the associated natural filtration F(t). We aim to give a sense to the “integral”
\[
I_t(\omega) \equiv \int_0^t f(s)\,dW_s(\omega), \tag{4A.1}
\]
where f is a given function. More generally, this appendix aims to explain the sense to give to “integrals” which look like:
\[
I_t(\omega) \equiv \int_0^t g(s;\omega)\,dW_s(\omega),
\]
where g is now a progressively F(t)-measurable function.

The motivation for this aim is that we can build up a class of useful processes from Brownian motions. Let us illustrate. Let (Ω, F, P) be a probability space on which W is a Brownian motion, and let T < ∞. Let us write dW for the increment of W over an infinitesimal amount of time. In some sense, dW(t) equals W(t + ∆t) − W(t) as ∆t → 0. We may think of the “increment” dW(t) as normally distributed: dW(t) ∼ N(0, dt). From here, we may consider some richer processes X (say),
\[
dX_t(\omega) = \mu_t(\omega)\,dt + \sigma_t(\omega)\,dW(t) \tag{4A.2}
\]
for some objects $\mu_t(\cdot)$ and $\sigma_t(\cdot)$ to be defined later. Later on we will call these processes Itô's processes.
The intuition on $\mu_t(\cdot)$ and $\sigma_t(\cdot)$ is as follows. Heuristically, we have that $E[dX_t(\omega)] = E[\mu_t(\omega)]\,dt + \sigma E(dW(t)) = E[\mu_t(\omega)]\,dt$, such that $\mu_t(\cdot)$ is related to the instantaneous expected change of X. So this model is richer than Brownian motion because µ need not be identically zero. This is useful for asset pricing. Think of X as an asset price process: it is hard to imagine that we would be willing to invest if the expected variation of X (that is, the expected capital gain) over some time horizon were just zero. Following the interpretation of X as an asset price, we now compute the variance of dX. We have $\mathrm{var}(dX(t)) = E[dX(t) - E(dX(t))]^2 = E(\sigma\,dW(t))^2$, which turns out to equal $\sigma^2 dt$.
A quite important terminology issue. The “process” µ is called the drift and the “process” σ is
called the diffusion coefficient, or the volatility of X. Clearly, the drift µ determines the trend, and the
volatility determines the noisiness of X around that trend. Both drift and diffusion coefficients need
to be adapted processes, as we shall explain. One example of drift and diffusion coefficients. Assume
that µ ≡ W (t) and σ ≡ 0. In this case, we have that: dX (t) = W (t) dt, which shows that X (t) is still
a truly random process. Here µ is a stochastic process and so is X (t). Its infinitesimal variations can
be predicted. But its further evolution cannot. In finance jargon, we would say that X (t) is locally
riskless in this example.
Let us proceed with a more delicate example, relating to strategies and trading gains. Suppose that a stock price is just a Brownian motion. Assume it does not distribute dividends over some time horizon of interest, and that we hold θ(t) units of it at time t. What are our trading gains from 0 to t? We will see later that the intuitive expression $\int_0^t \theta(s)\,dW_s(\omega)$ is indeed the answer to this question. Under certain conditions, that expression will be called a stochastic integral. But then, why do we insist on modeling asset prices through Brownian motions? As we shall see, Brownian motions are wild in some sense, i.e. they are of unbounded variation on any interval. So why don't we go for smoother processes? The answer is that “smoother” processes would give rise to arbitrage opportunities. Harrison, Pitbladdo and Schaefer (1984) showed that in continuous-time models, asset prices must be “wild.” Intuitively, if stock prices are continuous in time and have finite variation, we could predict them over the immediate future, thus cashing in the capital gains.
Let us mention a few technicalities. We already know W(t) is nowhere differentiable. So the expression in Eq. (4A.2) should only be understood as a shorthand for,
\[
X_t(\omega) = X_0 + \int_0^t \mu_s(\omega)\,ds + \int_0^t \sigma_s(\omega)\,dW_s(\omega).
\]
The question, then, is what the “stochastic integral” $\int_0^t \sigma_s(\omega)\,dW_s(\omega)$ means, and why we need it. In standard calculus, the integral can be defined from its differential. To anticipate, in stochastic calculus this is no longer the case, in that the stochastic integral is the real thing.
In the following sections, we provide short reviews of the ordinary Riemann integral and of the Riemann-Stieltjes integral, and explain why these two approaches to pathwise integration generically fail to provide a solid foundation to the “expression” $I_t(\omega)$ in Eq. (4A.1). To anticipate, the main issue relates to unboundedness of Brownian motions:
\[
\forall\omega\in\Omega, \quad \sup_{\tau}\sum_{i=1}^{n}\left|W_{t_i}(\omega) - W_{t_{i-1}}(\omega)\right| = \infty,
\]
where the supremum is taken over all partitions of [0, T]. We shall state conditions on “how much bounded” the integrator and integrands in $I_t(\omega)$ should be in order for the Riemann-Stieltjes theory to hold. As it turns out, these conditions are unfortunately restrictive in the context of interest here. We shall explain that, in general, no Riemann or Riemann-Stieltjes explanation can be given to “expressions” such as $\int_0^t f(s;\omega)\,dW_s(\omega)$. However, there are still cases where the Riemann-Stieltjes theory works. For example, consider the functions f(t) = 1, or f(t) = t. But in general, the Riemann-Stieltjes theory doesn't work, so we have to attack the problem with a more general approach. Intuitively, we can only consider a probabilistic representation of $I_t(\omega)$.
4.10.1.2 Riemann
Given is x → f(x), x ∈ (0, 1). We consider two standard definitions. First, we define a partition as
τ n : 0 = t0 < t1 < · · · < tn−1 < tn = 1 and ∆i = ti − ti−1 , i = 1, · · ·, n, as in the following picture.
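[Figure: the partition $0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = 1$ of the unit interval, with the first subinterval of length $\Delta_1$ marked.]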
Second, we define an intermediate partition as $\sigma_n$: any collection of values $y_i$ satisfying $t_{i-1} \le y_i < t_i$, i = 1, · · ·, n. Then, for a given partition $\tau_n$ and intermediate partition $\sigma_n$, the Riemann sum is defined as:
\[
S_n(\tau_n, \sigma_n) \equiv \sum_{i=1}^{n} f(y_i)\,\Delta_i.
\]
It's a “weighted average of the values $f(y_i)$.” Next, let $\mathrm{Mesh}(\tau_n) \equiv \max_{i=1,\cdots,n}\Delta_i$. Consider letting $\mathrm{Mesh}(\tau_n) \to 0$ by sending n → ∞. If the limit, $\lim_{n\to\infty} S_n(\tau_n, \sigma_n)$, exists, and is independent of $\tau_n$ and $\sigma_n$, then it is called the Riemann integral of f on (0, 1) and it is written:
\[
\int_0^1 f(t)\,dt.
\]
Two properties are worth mentioning:

1. Linearity: Given two constants $c_1$ and $c_2$, $\int_0^1 \left(c_1 f_1(t) + c_2 f_2(t)\right)dt = c_1\int_0^1 f_1(t)\,dt + c_2\int_0^1 f_2(t)\,dt$.
2. Linearity on adjacent intervals: $\int_0^1 f(t)\,dt = \int_0^a f(t)\,dt + \int_a^1 f(t)\,dt$ for every a ∈ (0, 1).
4.10.1.3 Riemann-Stieltjes
The main idea is to “integrate one function f with respect to another function g.” One standard example relates to the computation of the expectation of a random variable with distribution function g. Heuristically, we have that:
\[
\int_0^1 t\,dg(t) \approx \sum_i t_i\left[g(t_i) - g(t_{i-1})\right].
\]
In general, let us be given two functions f and g. Consider, again, the definitions of $\tau_n$, $\sigma_n$ given earlier, and set: $\Delta g_i = g(t_i) - g(t_{i-1})$, i = 1, · · ·, n. The Riemann-Stieltjes sum is defined as:
\[
S_n(\tau_n, \sigma_n) = \sum_{i=1}^{n} f(y_i)\,\Delta g_i.
\]
Clearly the Riemann sum is a special case obtained with the identity function g(t) = t. Similarly as in the definition of the Riemann integral, here we have that if the limit, $\lim_{n\to\infty} S_n(\tau_n, \sigma_n)$, exists, and is independent of $\tau_n$ and $\sigma_n$, then it is called the Riemann-Stieltjes integral of f with respect to g on (0, 1) and it is written:
\[
\int_0^1 f(t)\,dg(t).
\]
The crucial issue is, can we now use Riemann-Stieltjes theory to define integrals of functions w.r.t. Brownian motions? That is, can we interpret $\int_0^1 f(t)\,dW_t(\omega)$ as a Riemann-Stieltjes integral, path by path, i.e. ∀ω ∈ Ω? The answer is in the negative, except in very special cases. Indeed, a natural example of an integral of functions with respect to Brownian motion is $I_t(\omega) \equiv \int_0^t f(s)\,dW_s(\omega)$. But what does this representation mean? We know that a ω-$W_t$ path is not differentiable. However, the main point here is not even differentiability, but the property of unboundedness of Brownian motions. Let us formalize this reasoning. Consider the following definition:
Definition 4A.1. A real function h on (0, 1) has bounded p-variation, p > 0, if
\[
\sup_{\tau}\sum_{i=1}^{n}\left|h(t_i) - h(t_{i-1})\right|^p < \infty,
\]
where the supremum is taken over all partitions of (0, 1).
We have:
Theorem 4A.2. The Riemann-Stieltjes integral, $\int_0^1 f(t)\,dg(t)$, exists under the following conditions:
(i) Functions f and g don't have discontinuities at the same points.
(ii) f has bounded p-variation and g has bounded q-variation, with $\frac{1}{p} + \frac{1}{q} > 1$, that is, f, g satisfy
\[
\sup_{\tau}\sum_{i=1}^{n}\left|f(t_i) - f(t_{i-1})\right|^p < \infty \quad\text{and}\quad \sup_{\tau}\sum_{i=1}^{n}\left|g(t_i) - g(t_{i-1})\right|^q < \infty, \quad\text{with } \frac{1}{p} + \frac{1}{q} > 1.
\]
Now, it is well-known that almost every ω-$W_t$ path has bounded p-variation for p > 2 and, as expected, unbounded p-variation for p ≤ 2, as further argued below. Consider, then, the integral $\int_0^1 f(t)\,dW_t(\omega)$, and suppose f is differentiable with bounded derivative. By the mean value theorem, there exists a K > 0 such that $|f(t) - f(s)| \le K(t - s)$ for s < t. Therefore, $\sup_\tau\sum_{i=1}^{n}|f(t_i) - f(t_{i-1})| \le K\sum_{i=1}^{n}(t_i - t_{i-1}) = K$. That is, f has bounded p-variation, with p = 1.
By Theorem 4A.2, we now have that for almost every ω-$W_t$ path, the Riemann-Stieltjes integral of f with respect to Brownian motions,
\[
I_t(\omega) \equiv \int_0^t f(s)\,dW_s(\omega),
\]
exists for every deterministic function f which is differentiable with bounded first-order derivative. For example, f(t) = 1, or f(t) = t. We aren't done. Consider $f_t(\omega) = W_t(\omega)$ and, then:
\[
I(W)(\omega) = \int_0^1 W_t(\omega)\,dW_t(\omega).
\]
Let p = 2 + ε, for some ε > 0. Hence p = q = 2 + ε, and so $\frac{1}{p} + \frac{1}{q} = \frac{2}{2+\epsilon} < 1$. The Riemann-Stieltjes theory doesn't work even with this simple example. This is where the theory of Itô's stochastic integrals comes in.
4.10.1.4 A digression on unboundedness of Brownian motions

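Why do Brownian motions display unbounded variation? Consider the “Brownian tree” in the picture below.
[Figure: Brownian tree. Starting from $W_0$, over each time step ∆t the process moves up by +∆h or down by −∆h, each with probability 1/2.]
The time step is ∆t and the space step is ∆h. In the Brownian tree, we must have,
\[
\Delta h = \sqrt{\Delta t}. \tag{4A.3}
\]
Indeed, and heuristically, we have that $\mathrm{var}(\Delta W) = (\Delta h)^2$, which, matched to $\mathrm{var}(\Delta W) = \Delta t$, leaves precisely Eq. (4A.3). Therefore, $E(|\Delta W|) = \Delta h = \sqrt{\Delta t}$. Next, let us chop a time interval of length t into $n \equiv \frac{t}{\Delta t}$ parts. The total expected length traveled by a Brownian motion is,
\[
\frac{t}{\Delta t}\,\Delta h = \frac{t}{\Delta t}\sqrt{\Delta t} \to \infty \quad\text{as } \Delta t \to 0.
\]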
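The heuristic above can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it simulates Gaussian increments (rather than the binomial tree of the picture, so the expected absolute increment carries an extra factor $\sqrt{2/\pi}$) and compares the sum of absolute increments with the sum of squared increments as the partition is refined. The function names and the seeds are illustrative assumptions.

```python
import math
import random


def brownian_increments(n, t=1.0, rng=None):
    """Return n independent N(0, t/n) increments of a Brownian motion on [0, t]."""
    rng = rng or random.Random(0)
    dt = t / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]


def variations(increments):
    """Sum of absolute increments and sum of squared increments along the partition."""
    total = sum(abs(dw) for dw in increments)
    quadratic = sum(dw * dw for dw in increments)
    return total, quadratic


if __name__ == "__main__":
    for n in (100, 10_000, 250_000):
        tv, qv = variations(brownian_increments(n, rng=random.Random(n)))
        print(f"n={n:>7}: sum |dW| = {tv:8.2f}, sum dW^2 = {qv:.4f}")
```

Under these assumptions the first column should grow roughly like $\sqrt{n}$, while the second stays close to t = 1, in line with the quadratic variation result established further below.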
A more substantive proof is, for example, that of Corollary 2.5, p. 25, in Revuz and Yor (1999). A sketch of this proof proceeds as follows. We have:
\[
\sum_i\left|W_{t_i} - W_{t_{i-1}}\right|^2 \le \max_i\left|W_{t_i} - W_{t_{i-1}}\right|\cdot\sum_i\left|W_{t_i} - W_{t_{i-1}}\right|.
\]

Moreover, $\max_i|W_{t_i} - W_{t_{i-1}}|$ converges to zero ∀ω ∈ Ω because W is continuous and, by the Heine-Cantor theorem, continuous functions are uniformly continuous on closed bounded intervals. Then, suppose that W has bounded variation, which would imply that
\[
\forall\omega\in\Omega, \quad \sum_i\left|W_{t_i} - W_{t_{i-1}}\right|^2 \to 0, \quad \mathrm{Mesh}\downarrow 0,
\]
which is impossible. It is impossible because we know that $L_n \equiv \sum_i|W_{t_i} - W_{t_{i-1}}|^2 \xrightarrow{q.m.} t = 1$, as established below, which implies that $\mathrm{plim}_n L_n = 1$, hence there exists a subsequence $n_k$ such that $L_{n_k} \to 1$ almost surely. (Convergence in probability does not imply almost sure convergence, yet it implies that there exists a suitable subsequence $n_k$ along which convergence is almost sure, which is just what we need here.)
4.10.1.5 Itô
Let us begin with a first example, which can help grasp the nature of the issues under study. Consider
\[
I(W)(\omega) = \int_0^1 W_t(\omega)\,dW_t(\omega).
\]
Consider, then, the following Riemann-Stieltjes sum:
\[
S_n = \sum_{i=1}^{n} W_{t_{i-1}}\,\Delta_i W, \quad \Delta_i W = W_{t_i} - W_{t_{i-1}},
\]
where the intermediate partition simply makes use of the left-end points $y_i = t_{i-1}$, i = 1, · · ·, n. Simple computations leave:
\[
S_n = \frac{1}{2}\left[W_t^2 - Q_n(t)\right], \quad Q_n(t) \equiv \sum_{i=1}^{n}(\Delta_i W)^2.
\]
The quantity $Q_n(t)$ is known as the Quadratic Variation, a quite useful concept in financial econometrics. We have
\[
E[Q_n(t)] = \sum_{i=1}^{n} E(\Delta_i W)^2 = \sum_{i=1}^{n}\Delta_i = t.
\]
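Moreover, $\mathrm{var}[(\Delta_i W)^2] = \mathrm{var}\left[\left(\frac{\Delta_i W}{\sqrt{\Delta_i}}\sqrt{\Delta_i}\right)^2\right] = \Delta_i^2\,\mathrm{var}\left[\left(\frac{\Delta_i W}{\sqrt{\Delta_i}}\right)^2\right] = 2\Delta_i^2$, where the last equality follows because $\frac{\Delta_i W}{\sqrt{\Delta_i}} \sim N(0, 1)$, which implies that $\left(\frac{\Delta_i W}{\sqrt{\Delta_i}}\right)^2 \sim \chi^2(1)$. Hence, by the independence of the increments,
\[
\mathrm{var}[Q_n(t)] = \sum_{i=1}^{n}\mathrm{var}\left[(\Delta_i W)^2\right] = 2\sum_{i=1}^{n}\Delta_i^2 \le 2\,\mathrm{Mesh}(\tau_n)\cdot\sum_{i=1}^{n}\Delta_i = 2t\cdot\mathrm{Mesh}(\tau_n) \to 0.
\]
But $\mathrm{var}[Q_n(t)] = E[Q_n(t) - E(Q_n(t))]^2 = E[Q_n(t) - t]^2$. Therefore,
\[
\mathrm{var}[Q_n(t)] = E[Q_n(t) - t]^2 \to 0 \quad t\text{-pointwise}.
\]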
This type of convergence is called convergence in quadratic mean of $Q_n(t)$ to t and it is written $Q_n(t) \xrightarrow{q.m.} t$, as we shall explain in the appendix of the next chapter. By the celebrated Chebyshev's inequality, convergence in quadratic mean implies convergence in probability:
\[
\forall\delta > 0, \quad \Pr\left\{|Q_n(t) - t| > \delta\right\} \le \frac{E[Q_n(t) - t]^2}{\delta^2}.
\]
Issues related to uniform convergence will be dealt with later.
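The two moments of $Q_n(t)$ derived above can be checked by simulation. The sketch below is an illustration only: it draws many independent paths on a given partition and compares the sample mean and variance of $Q_n$ with t and $2\sum_i\Delta_i^2$. The partition $t_i = (i/n)^2$ used in the demo, the number of paths and the seed are arbitrary assumptions.

```python
import math
import random


def quadratic_variation(partition, rng):
    """Simulate Q_n = sum_i (W_{t_i} - W_{t_{i-1}})^2 along one Brownian path."""
    q = 0.0
    for t_prev, t_next in zip(partition, partition[1:]):
        dw = rng.gauss(0.0, math.sqrt(t_next - t_prev))
        q += dw * dw
    return q


def theoretical_variance(partition):
    """var[Q_n] = 2 * sum_i Delta_i^2, as derived in the text."""
    return 2.0 * sum((b - a) ** 2 for a, b in zip(partition, partition[1:]))


def sample_mean_and_variance(partition, n_paths=20_000, seed=0):
    """Monte Carlo estimates of E[Q_n] and var[Q_n]."""
    rng = random.Random(seed)
    draws = [quadratic_variation(partition, rng) for _ in range(n_paths)]
    mean = sum(draws) / n_paths
    var = sum((q - mean) ** 2 for q in draws) / (n_paths - 1)
    return mean, var


if __name__ == "__main__":
    n = 50
    partition = [(i / n) ** 2 for i in range(n + 1)]  # non-uniform partition of [0, 1]
    mean, var = sample_mean_and_variance(partition)
    print(f"sample mean = {mean:.4f} (theory: 1)")
    print(f"sample var  = {var:.5f} (theory: {theoretical_variance(partition):.5f})")
```

Refining the partition shrinks the variance at the rate of the mesh, which is the content of the bound $2t\cdot\mathrm{Mesh}(\tau_n)$.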
To sum up, $\int_0^{\cdot} W_s(\omega)\,dW_s(\omega)$ doesn't exist as a Riemann-Stieltjes integral. Nevertheless, the previous facts suggest that a good definition of it could hinge upon the notion of a mean square limit, viz.
\[
S_n = \sum_{i=1}^{n} W_{t_{i-1}}\,\Delta_i W = \frac{1}{2}\left[W_t^2 - Q_n(t)\right] \xrightarrow{q.m.} \frac{1}{2}\left[W_t^2 - t\right],
\]
or, as we shall explain,
\[
S_n = \sum_{i=1}^{n} W_{t_{i-1}}\,\Delta_i W \xrightarrow{q.m.} \int_0^t W_s\,dW_s,
\]
where $\int_0^t W_s\,dW_s = \frac{1}{2}\left(W_t^2 - t\right)$ has the Itô's sense.
Clearly, $\int_0^t W_s\,dW_s$ does not satisfy the usual Riemann-Stieltjes rule of integration. (For any smooth function f such that f(0) = 0, the Riemann-Stieltjes integral is $\int_0^t f(u)\,df(u) = \frac{1}{2}f(t)^2$.) This doesn't work here because we have yet to see what the chain rule for functions of ω-$W_t$ is. This will lead us to the celebrated Itô's lemma, which shall confirm that $\int_0^t W_s\,dW_s = \frac{1}{2}\left(W_t^2 - t\right)$. This example vividly illustrates that standard integration methods fail. In fact, the timing of the integrands is quite critical. For example, in Riemann integration, the integrand can be evaluated at any point in the interval. If we apply this to the kind of integrals we are studying here we obtain $\lim\sum_i f(W_{t_{i-1}})\left(W_{t_i} - W_{t_{i-1}}\right)$ (for the left boundary) and $\lim\sum_i f(W_{t_i})\left(W_{t_i} - W_{t_{i-1}}\right)$ (for the right boundary). But the two limits do not agree. The expectation of the first is zero (by the law of iterated expectations), while the expectation of the second is not necessarily zero. Finally, Riemann integration theory differs from the integration theory underlying the previous example because of the mode of convergence utilized in the two theories.
A short digression is in order. The so-called Stratonovich stochastic integral selects as points of the intermediate partition the central ones:
\[
\tilde{S}_n = \sum_{i=1}^{n} f(W_{y_i})\,\Delta_i W, \quad y_i = \frac{1}{2}\left(t_{i-1} + t_i\right).
\]
For the Stratonovich integral, the usual Riemann-Stieltjes rule applies, yet the Stratonovich stochastic integral isn't Riemann-Stieltjes.
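The role of the evaluation point can be made concrete with f(W) = W. The sketch below is an illustrative assumption-laden example, not the author's code: it simulates one path on a grid twice as fine as the partition, so that W is available at the midpoints, and forms the left-point (Itô), mid-point (Stratonovich) and right-point sums.

```python
import math
import random


def riemann_type_sums(n, t=1.0, seed=0):
    """Left-, mid- and right-point sums of W dW on a uniform partition with n steps.

    The path is simulated on a grid with 2n steps so that the value of W at the
    midpoint of every subinterval is available.
    """
    rng = random.Random(seed)
    half_step_sd = math.sqrt(t / (2 * n))
    w = [0.0]
    for _ in range(2 * n):
        w.append(w[-1] + rng.gauss(0.0, half_step_sd))
    left = mid = right = 0.0
    for i in range(n):
        w_left, w_mid, w_right = w[2 * i], w[2 * i + 1], w[2 * i + 2]
        dw = w_right - w_left
        left += w_left * dw    # Ito: integrand at the left end point
        mid += w_mid * dw      # Stratonovich: integrand at the midpoint
        right += w_right * dw  # integrand at the right end point
    return left, mid, right, w[-1]


if __name__ == "__main__":
    left, mid, right, w_t = riemann_type_sums(20_000)
    print(f"left  = {left:.4f}, (W_t^2 - t)/2 = {(w_t ** 2 - 1) / 2:.4f}")
    print(f"mid   = {mid:.4f},  W_t^2 / 2      = {w_t ** 2 / 2:.4f}")
    print(f"right = {right:.4f}, (W_t^2 + t)/2 = {(w_t ** 2 + 1) / 2:.4f}")
```

The right-point sum exceeds the left-point sum by exactly $Q_n(t)$, which is why the two limits differ by t, and the mid-point sum reproduces the ordinary rule $\frac{1}{2}W_t^2$.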
4.10.1.6 The Itô’s stochastic integral for simple processes

Let F be the P-augmentation of the filtration of W. Consider [0, T] and partitions $\tau_n : 0 = t_0 < t_1 < \cdots < t_n = T$, and the following definition:
Definition 4A.3 (Simple processes). The process $C = (C_t, t \in [0, T])$ is simple if
(i) There exists a partition $\tau_n$ and a sequence of r.v. $Z_i$, i = 1, · · ·, n, s.t.
\[
C_t = \begin{cases} Z_n, & \text{if } t = T \\ Z_i, & \text{if } t_{i-1} \le t < t_i, \ i = 1, \cdots, n \end{cases}
\]
(ii) The sequence $(Z_i)$ is $F_{t_{i-1}}$-adapted, i = 1, · · ·, n.
(iii) $E(Z_i^2) < \infty$ for all i ($L^2$).
As an example, consider $C_t = W_{t_{n-1}}$, if t = T, and $C_t = W_{t_{i-1}}$, if $t_{i-1} \le t < t_i$, i = 1, · · ·, n. Next, we have:
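Definition 4A.4. The Itô's stochastic integral of a simple process C is,
\[
\begin{aligned}
\int_0^T C_s\,dW_s &= \sum_{i=1}^{n} C_{t_{i-1}}\left(W_{t_i} - W_{t_{i-1}}\right) = \sum_{i=1}^{n} Z_i\left(W_{t_i} - W_{t_{i-1}}\right), \quad\text{on } [0, T] \\
\int_0^t C_s\,dW_s &= \sum_{i=1}^{k-1} C_{t_{i-1}}\left(W_{t_i} - W_{t_{i-1}}\right) + Z_k\left(W_t - W_{t_{k-1}}\right), \quad t \in [t_{k-1}, t_k],
\end{aligned}
\]
with the notation $\sum_{i=1}^{0} m_i \equiv 0$.
It is a Riemann-Stieltjes sum of C with respect to Brownian motions evaluated at left-end points.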
Finally, we proceed with listing a set of useful properties.
Property 4A.P1. $I_t(C) = \int_0^t C_s\,dW_s$, t ∈ [0, T], is a $F_t$-martingale and has expectation equal to zero.
Proof. Let us check that $I_t(C)$ is a $F_t$-martingale. We have to check three conditions: (i) $E|I_t(C)| < \infty$, all t ∈ [0, T]; (ii) $I_t(C)$ is $F_t$-adapted; (iii) $E[I_t(C)\,|\,F_s] = I_s(C)$, s < t. Condition (i) follows by the isometry property to be introduced below. Condition (ii) is trivial. To show (iii), suppose, initially, that $s, t \in [t_{k-1}, t_k]$, s < t. We have:
\[
\begin{aligned}
I_t(C) &= \sum_{i=1}^{k-1} Z_i\left(W_{t_i} - W_{t_{i-1}}\right) + Z_k\left(W_t - W_{t_{k-1}}\right) \\
&= \sum_{i=1}^{k-1} Z_i\left(W_{t_i} - W_{t_{i-1}}\right) + Z_k\left(W_s - W_{t_{k-1}}\right) + Z_k\left(W_t - W_s\right) \\
&= I_s(C) + Z_k\left(W_t - W_s\right),
\end{aligned}
\]
so that
\[
\begin{aligned}
E[I_t(C)\,|\,F_s] &= E[I_s(C)\,|\,F_s] + E[Z_k\left(W_t - W_s\right)|\,F_s] \\
&= I_s(C) + Z_k\,E[\left(W_t - W_s\right)|\,F_s] = I_s(C).
\end{aligned}
\]
The case $s \in [t_{l-1}, t_l]$ and $t \in [t_{k-1}, t_k]$, l < k, is proven similarly. Finally, $I_t(C)$ has zero expectation because it starts from the origin by the definition: $I_0(C) = 0 \Rightarrow E(I_t(C)) = 0$ all t. That is, ∀t, $E[I_t(C)] = E[I_0(C)] = I_0(C) = 0$.
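Property 4A.P2 (Isometry). $E\left[\left(\int_0^t C_s\,dW_s\right)^2\right] = \int_0^t E\left(C_s^2\right)ds$, for all t ∈ [0, T].
Proof. Without loss of generality, set $t = t_k$. We have:
\[
\begin{aligned}
E\left[\left(\int_0^t C_s\,dW_s\right)^2\right] &= E\left[\left(\sum_{i=1}^{k} C_{t_{i-1}}\left(W_{t_i} - W_{t_{i-1}}\right)\right)^2\right] \\
&= E\left[\sum_{i=1}^{k}\sum_{j=1}^{k} C_{t_{i-1}}\left(W_{t_i} - W_{t_{i-1}}\right) C_{t_{j-1}}\left(W_{t_j} - W_{t_{j-1}}\right)\right] \\
&= E\left[\sum_{i=1}^{k} C_{t_{i-1}}^2\left(W_{t_i} - W_{t_{i-1}}\right)^2\right],
\end{aligned}
\]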

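where the last equality follows because $W_{t_i} - W_{t_{i-1}}$ and $W_{t_j} - W_{t_{j-1}}$ are independent for all i ≠ j. Then,
\[
\begin{aligned}
E\left[\left(\int_0^t C_s\,dW_s\right)^2\right] &= E\left[\sum_{i=1}^{k} C_{t_{i-1}}^2\left(W_{t_i} - W_{t_{i-1}}\right)^2\right] \\
&= E\left[\sum_{i=1}^{k} E\left(C_{t_{i-1}}^2\left(W_{t_i} - W_{t_{i-1}}\right)^2\Big|\,F_{t_{i-1}}\right)\right] \\
&= E\left[\sum_{i=1}^{k} E\left(C_{t_{i-1}}^2\Big|\,F_{t_{i-1}}\right)\left(t_i - t_{i-1}\right)\right] \\
&= \sum_{i=1}^{k} E\left(C_{t_{i-1}}^2\right)\left(t_i - t_{i-1}\right) \\
&= \int_0^t E\left(C_s^2\right)ds.
\end{aligned}
\]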
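Properties 4A.P1 and 4A.P2 can be illustrated by Monte Carlo for the simple process of the earlier example, $C_s = W_{t_{i-1}}$ on $[t_{i-1}, t_i)$. The sketch below is only an illustration under assumed settings (uniform partition with n = 10 steps, 20,000 paths, fixed seed); the function names are hypothetical.

```python
import math
import random


def ito_integral_simple(z, increments):
    """Ito integral of a simple process: sum_i Z_i * (W_{t_i} - W_{t_{i-1}})."""
    return sum(z_i * dw for z_i, dw in zip(z, increments))


def simulate_integral_of_w(n, t, rng):
    """One draw of I_t(C) for the simple process C_s = W_{t_{i-1}} on [t_{i-1}, t_i)."""
    dt = t / n
    w, z, increments = 0.0, [], []
    for _ in range(n):
        z.append(w)  # Z_i = W_{t_{i-1}} is known at the start of the interval
        dw = rng.gauss(0.0, math.sqrt(dt))
        increments.append(dw)
        w += dw
    return ito_integral_simple(z, increments)


def monte_carlo_moments(n=10, t=1.0, n_paths=20_000, seed=0):
    """Sample mean and sample second moment of I_t(C)."""
    rng = random.Random(seed)
    draws = [simulate_integral_of_w(n, t, rng) for _ in range(n_paths)]
    mean = sum(draws) / n_paths
    second_moment = sum(x * x for x in draws) / n_paths
    return mean, second_moment


def isometry_rhs(n=10, t=1.0):
    """sum_i E[W_{t_{i-1}}^2] * (t_i - t_{i-1}) = sum_i t_{i-1} * dt on a uniform partition."""
    dt = t / n
    return sum(i * dt * dt for i in range(n))


if __name__ == "__main__":
    mean, second_moment = monte_carlo_moments()
    print(f"E[I]   ~ {mean:.4f} (theory: 0)")
    print(f"E[I^2] ~ {second_moment:.4f} (theory: {isometry_rhs():.4f})")
```

With n = 10 and t = 1 the right-hand side of the isometry equals $\sum_{i=0}^{9} i/100 = 0.45$, which the sample second moment should approach as the number of paths grows.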
Property 4A.P3 (Linearity and linearity on adjacent intervals).
Property 4A.P4. It (C) has continuous ω-paths.
4.10.1.7 The general Itô’s stochastic integral

We now consider a more general class of integrands: F-adapted processes $C_t$, t ∈ [0, T], satisfying $\int_0^T E\left(C_s^2\right)ds < \infty$, that is, $C \in L^2(P\otimes dt)$. We denote this class by $H^2$. The condition is obviously satisfied by simple processes, although we are now moving to continuous time. Clearly, $H^2$ is a closed linear subspace of $L^2(P\otimes dt)$. So let $\|\cdot\|_{L^2(P\otimes dt)}$ be the norm of $L^2(P\otimes dt)$. Let $H_0^2$ be the subset of $H^2$ consisting of all simple processes. We now outline how to construct the stochastic integral, in four steps.
Step 1: ($H_0^2$ is dense in $H^2$). For any $C \in H^2$, there exists a sequence of simple processes $C^{(n)}$ s.t. $\left\|C - C^{(n)}\right\|_{L^2(P\otimes dt)} \to 0$, i.e. $\int_0^T E\left(C_s - C_s^{(n)}\right)^2 ds \to 0$.
Step 2: By step 1, $\left\{C^{(n)}\right\}$ is a Cauchy sequence in $L^2(P\otimes dt)$. By the isometry property of the Itô's integral for simple processes,
\[
\left\|I_T(C^{(n)}) - I_T(C^{(n')})\right\|_{L^2(P)} = \left\|C^{(n)} - C^{(n')}\right\|_{L^2(P\otimes dt)}.
\]
Therefore, $\left\{I_T(C^{(n)})\right\}$ is a Cauchy sequence in $L^2(P)$. Now it is well-known that $L^2(P)$ is complete, and so $I_T(C^{(n)})$ must converge to some element of $L^2(P)$, denoted as $I_T(C)$.
Step 3: This limit is called the Itô's stochastic integral of C, and is written as
\[
I_T(C) = \int_0^T C_s\,dW_s.
\]
Finally, the limit is well-defined: if there is another sequence $C_*^{(n)}$ such that $\left\|C - C_*^{(n)}\right\|_{L^2(P\otimes dt)} \to 0$, then $\lim I_T(C_*^{(n)}) = \lim I_T(C^{(n)}) = I_T(C)$ in the $L^2(P)$ norm.
Step 4: (Itô's integral as a process) We wish to create a whole “continuum” of Itô's integrals at a single glance. Step 3 is not enough because we need uniform convergence on [0, T]. To show that this is feasible lies beyond the aim of these introductory lectures. The final result is: for any $C \in H^2$, there exists a process $(I_t, t \in [0, T])$ which is a continuous $F_t$-martingale s.t.
\[
I_t = \int_0^t C_s\,dW_s, \quad t \in [0, T], \quad P\otimes dt\text{-a.s.}
\]
To summarize, then, let $\theta \in H^2$. The stochastic integral $I_t(\theta) = \int_0^t \theta_s\,dW_s$ satisfies the following properties: (i) Continuous sample paths, and $I_t(\theta)$ is a $F_t$-martingale; (ii) Expectation equal to zero; (iii) Itô's isometry on $H^2$, i.e. $E\left[\int_0^t \theta_s\,dW_s\right]^2 = \int_0^t E\left(\theta_s^2\right)ds < \infty$, t ∈ [0, T], hence $E\left[\int_0^t \theta_s\,dW_s\right]^2 \le E\left[\int_0^T \theta_s\,dW_s\right]^2 = \int_0^T E\left(\theta_s^2\right)ds < \infty$; (iv) Linearity and linearity on adjacent intervals.
A few remarks are in order. If $\theta \in H^2$, then X, the solution to $dX_t = \theta_t\,dW_t$, is a martingale. If $\theta \notin H^2$, but $\theta \in L^2$, X is, instead, called a local martingale. The converse is the Martingale Representation Theorem. This theorem states that if X is a $F_t$-martingale, then there exists a $\theta \in H^2$ such that $dX_t = \theta_t\,dW_t$. This result is utilized in the main text of this chapter, where it helps us tell whether we live in a world with complete or incomplete markets. Moreover, in continuous-time finance, θ is often a portfolio strategy. It must be in $H^2$ to avoid doubling strategies, which are a kind of arbitrage opportunity (at least in the absence of frictions such as short-selling constraints). Assume, for example, that an asset price is W, and that this asset does not distribute dividends from 0 to T. Then dW is the instantaneous gain from holding one unit of this asset. The condition $\theta \in H^2$ implies that these strategies cannot become arbitrarily large according to the $H^2$ criterion. Moreover, the previous properties of $I_t(\theta) = \int_0^t \theta_s\,dW_s$ suggest that the “cumulative” gain process $G_t = G_0 + I_t(\theta)$ is a martingale (not only a “local” martingale). Therefore, no investor expects to make profits from investing in this asset.
4.10.1.8 Itô’s lemma: Introduction

We develop, heuristically, a basic version of Itô's lemma, with its most general version stated further in this appendix. Let f : R → R be twice continuously differentiable. We have:
\[
f(W_t) = f(W_0) + \int_0^t f'(W_s)\,dW_s + \frac{1}{2}\int_0^t f''(W_s)\,ds, \tag{4A.4}
\]
where the first integral is an Itô's stochastic integral, and the second one is a Riemann integral. For example, let $f(x) = x^2$. Then,
\[
W_t^2 = 2\int_0^t W_s\,dW_s + \int_0^t ds \iff \int_0^t W_s\,dW_s = \frac{1}{2}\left(W_t^2 - t\right). \tag{4A.5}
\]
To provide a sketchy proof of Eq. (4A.4), note that:
\[
f(W_t) - f(W_0) = \sum_{i=0}^{k-1}\left[f\left(W_{t_{i+1}}\right) - f\left(W_{t_i}\right)\right].
\]
By Taylor,
\[
f\left(W_{t_{i+1}}\right) - f\left(W_{t_i}\right) = f'\left(W_{t_i}\right)\left(W_{t_{i+1}} - W_{t_i}\right) + \frac{1}{2}f''(\xi_i)\left(W_{t_{i+1}} - W_{t_i}\right)^2,
\]

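where $\min\left(W_{t_i}, W_{t_{i+1}}\right) < \xi_i < \max\left(W_{t_i}, W_{t_{i+1}}\right)$, as in the figure below. Because W is continuous, $\xi_i(\omega) = W_{\tau_i(\omega)}$ for some $\tau_i(\omega)$: $t_i \le \tau_i(\omega) \le t_{i+1}$.
[Figure: a path of W between $t_i$ and $t_{i+1}$, with the intermediate value $\xi_i$, lying between $W_{t_i}$ and $W_{t_{i+1}}$, attained at time $\tau_i$.]
Therefore,
\[
f(W_t) - f(W_0) = \sum_{i=0}^{k-1} f'\left(W_{t_i}\right)\left(W_{t_{i+1}} - W_{t_i}\right) + \frac{1}{2}\sum_{i=0}^{k-1} f''\left(W_{\tau_i}\right)\left(W_{t_{i+1}} - W_{t_i}\right)^2.
\]
We have
\[
\sum_{i=0}^{k-1} f''\left(W_{\tau_i}\right)\left(W_{t_{i+1}} - W_{t_i}\right)^2 \approx \sum_{i=0}^{k-1} f''\left(W_{\tau_i}\right)\left(t_{i+1} - t_i\right).
\]
Finally,
\[
\begin{aligned}
\sum_i f'\left(W_{t_i}\right)\left(W_{t_{i+1}} - W_{t_i}\right) &\to \int f'(W_s)\,dW_s \\
\sum_i f''\left(W_{\tau_i}\right)\left(t_{i+1} - t_i\right) &\to \int f''(W_s)\,ds
\end{aligned}
\]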
More technical details in order of descending difficulty can be found in Karatzas and Shreve (1991),
Arnold (1974), Steele (2001) and Mikosch (1998).
Let us reconsider the example in Eq. (4A.5). By the stochastic integral theorem, $\int_0^t W_s\,dW_s$ is a martingale. This is confirmed by Eq. (4A.5). According to Eq. (4A.5),
\[
\int_0^t W_s\,dW_s = \frac{1}{2}\left(W_t^2 - t\right),
\]
and $W_t^2 - t$ is indeed a martingale, since $E\left(W_t^2\right) = t$ for all t.
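Eq. (4A.4) can also be checked numerically along a simulated path. The sketch below uses f(x) = sin(x), an arbitrary choice made for illustration, so that f'(x) = cos(x) and f''(x) = −sin(x); it accumulates the two sums appearing in the sketch of the proof, with the second derivative evaluated at the left end point rather than at $\tau_i$, and compares them with $f(W_t) - f(W_0)$. The number of steps and the seed are assumptions.

```python
import math
import random


def ito_lemma_check(n=50_000, t=1.0, seed=0):
    """Discretised check of f(W_t) - f(W_0) = int f'(W) dW + (1/2) int f''(W) ds for f = sin."""
    rng = random.Random(seed)
    dt = t / n
    w = 0.0
    stochastic_part = 0.0  # sum_i f'(W_{t_i}) (W_{t_{i+1}} - W_{t_i})
    drift_part = 0.0       # (1/2) sum_i f''(W_{t_i}) (t_{i+1} - t_i)
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        stochastic_part += math.cos(w) * dw
        drift_part += 0.5 * (-math.sin(w)) * dt
        w += dw
    lhs = math.sin(w) - math.sin(0.0)
    return lhs, stochastic_part, drift_part


if __name__ == "__main__":
    lhs, stochastic_part, drift_part = ito_lemma_check()
    print(f"f(W_t) - f(W_0)            = {lhs:.4f}")
    print(f"stochastic + drift parts   = {stochastic_part + drift_part:.4f}")
    print(f"stochastic part alone      = {stochastic_part:.4f}")
```

Dropping the second-order term leaves a discrepancy that does not vanish as the partition is refined, which is the sense in which the ordinary chain rule fails for functions of W.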
4.10.2 Stochastic differential equations

4.10.2.1 Background
Consider the differential equation:
dxt = µ (t, xt ) dt, x0 = x,
for some function µ. Randomness can be introduced via an additional “noise term”:
dxt = µ (t, xt ) dt + σ (t, xt ) dWt , x0 = x.

We already know that a ω-$W_t$ path is not differentiable, so this is only a short-hand notation for:
\[
x_t = x_0 + \int_0^t \mu(s, x_s)\,ds + \int_0^t \sigma(s, x_s)\,dW_s, \tag{4A.6}
\]
where the first integral is a Riemann integral and the second integral is an Itô's stochastic integral.
We have the following definitions. First, we say that an Itô’s process is,
dxt (ω) = µt (ω) dt + σt (ω) dWt , x0 = x.
Moreover, we say that an Itô’s diffusion process is,
dx (t) = µ (t, x (t)) dt + σ (t, x (t)) dW (t) , x0 = x.
It is known that an Itô’s diffusion process is a Markov process. The previous equation is also called a
stochastic differential equation (SDE). In a SDE, µ and σ “depend” on ω only through x. Finally, we
say that a time-homogeneous diffusion process is,
dx (t) = µ (x (t)) dt + σ (x (t)) dW (t) , x0 = x.
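Paths of a diffusion such as those defined above can be approximated on a time grid. The following is a minimal sketch using the Euler-Maruyama discretisation, a standard numerical scheme that is not discussed in the text; the function name, the mean-reverting drift and the constant volatility in the demo are all illustrative assumptions.

```python
import math
import random


def euler_maruyama(mu, sigma, x0, t_end, n_steps, rng):
    """Approximate one path of dx = mu(t, x) dt + sigma(t, x) dW on [0, t_end]."""
    dt = t_end / n_steps
    t, x = 0.0, x0
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))        # Brownian increment over [t, t + dt]
        x = x + mu(t, x) * dt + sigma(t, x) * dw  # coefficients evaluated at the left end point
        t += dt
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical time-homogeneous example: mean reversion towards 1, constant volatility.
    rng = random.Random(0)
    path = euler_maruyama(lambda t, x: 2.0 * (1.0 - x), lambda t, x: 0.3, 0.0, 1.0, 1_000, rng)
    print(f"x(1) along this path: {path[-1]:.4f}")
```

Evaluating µ and σ at the left end point of each step mirrors the left-point convention of the Itô integral in Eq. (4A.6).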
There is a beautiful property, called the Unique Decomposition Property, that is used to price financial derivatives through replication arguments, as explained in the main text. Suppose we were given two processes x and y with $x_0 = y_0$, and that:
\[
dx_t = \mu^x_t\,dt + \sigma^x_t\,dW_t \quad\text{and}\quad dy_t = \mu^y_t\,dt + \sigma^y_t\,dW_t.
\]
Then $x_t = y_t$ almost surely if and only if $\mu^x_t = \mu^y_t$ and $\sigma^x_t = \sigma^y_t$ almost everywhere, in the sense that
\[
E\left[\int_0^T\left|\mu^x_t - \mu^y_t\right|dt\right] = E\left[\int_0^T\left|\sigma^x_t - \sigma^y_t\right|dt\right] = 0.
\]
4.10.2.2 Basic definitions, properties and regularity conditions

How do we know whether the various integrals given before are well-defined? As an example, the Itô's integral representation $\int_0^t \sigma(s, x_s)\,dW_s$ works if σ is $F_t$-adapted and $\int_0^t E\left[\sigma(s, x_s)^2\right]ds < \infty$. But how can we be sure that these two basic conditions are satisfied if we don't yet know the solution x? And, above all, what is a solution to a SDE? We have two concepts of such a solution, strong and weak.
Definition 4A.5. (Strong solution to a SDE) A strong solution to Eq. (4A.6) is a stochastic process $x = (x_t, t \in [0, T])$ such that:
(i) x is $F_t$-adapted.
(ii) The integrals in Eq. (4A.6) are well-defined in the Riemann's and Itô's sense and Eq. (4A.6) holds $P\otimes dt$-almost surely.
(iii) $E\left[\int_0^T |x_s|^2\,ds\right] < \infty$.
In other words, the definition of a strong solution requires that a Brownian motion be “given in
advance,” and that the solution xt constructed from it be then Ft -adapted.
Next, suppose, instead, that we were only given $x_0$ and the functions σ(t, x) and µ(t, x), and that we were asked to find a pair of processes $(\tilde{x}, \tilde{W})$ on some probability space $(\Omega, \tilde{F}, P)$ such that Eq. (4A.6) holds with $\tilde{x}$ being $\tilde{F}_t$-adapted on some space, not necessarily the one in Eq. (4A.6). (Clearly such a $\tilde{x}$ need not be $F_t$-adapted.) In this case $(\tilde{x}, \tilde{W})$ is called a weak solution on $(\Omega, \tilde{F}, P)$. In the case of a weak solution, we are given $x_0$, µ, σ and then “we have to find” two things: a Brownian motion $\tilde{W}$ and a $\tilde{F}_t$-adapted process $\tilde{x}$ such that $\tilde{x}_t = x_0 + \int_0^t \mu(s, \tilde{x}_s)\,ds + \int_0^t \sigma(s, \tilde{x}_s)\,d\tilde{W}_s$ holds $P\otimes dt$-almost surely. Clearly, a strong solution is also weak, but the converse is not true. Consider the following example.
Example 4A.6. (Tanaka equation) Let x (t) satisfy:
dx (t) = sign(x (t))dW (t) , x0 = 0. (4A.7)
This equation has no strong solutions. To see this, define
\[
y(t) = \int_0^t \mathrm{sign}(\hat{W}(s))\,d\hat{W}(s), \tag{4A.8}
\]
where Ŵ is a Brownian motion. It can be shown that y (t) is G (t)-measurable, where G (t) is the
σ-algebra generated by |Ŵ (t)|. Clearly G (t) ⊂ F̂ (t), where F̂ (t) is the σ-algebra generated by Ŵ (t).
Therefore, the σ-algebra generated by y (t) is also strictly contained in F̂ (t). Armed with this result,
we can easily show that there are no strong solutions to Eq. (4A.7). To show this, suppose the contrary.
There is a theorem saying that x(t) would then be a Brownian motion. On the other hand, Eq. (4A.7) can also be written
\[
dW(t) = \mathrm{sign}(x(t))\,dx(t), \quad x_0 = 0,
\]
or
\[
W(t) = \int_0^t \mathrm{sign}(x(s))\,dx(s).
\]
By the same reasoning produced to show that the σ-algebra generated by y (t) is strictly contained
in F̂ (t) in Eq. (4A.8), we conclude that the σ-algebra generated by W (t) is strictly contained in the
σ-algebra generated by x (t). But this contradicts that x (t) is a strong solution to Eq. (4A.7).
Clearly we must be able to impose some conditions enabling one to distinguish weak from strong solutions. However, the only focus of the following is to provide regularity conditions ensuring existence and uniqueness for the restrictive case of strong solutions, which is the case of interest in continuous-time finance. We need two restrictions on µ and σ. For a given function f, we say that it satisfies a Lipschitz condition in x if there exists a constant L such that for all $(x, y) \in R^d\times R^d$,
\[
\left\|f(x, t) - f(y, t)\right\| \le L\left\|x - y\right\| \quad\text{uniformly in } t,
\]
where $\|A\| \equiv \sqrt{\mathrm{Tr}(AA^\top)}$. In other words, f cannot change too widely. We also say f satisfies a growth condition in x if there exists a constant G such that for all $x \in R^d$,
\[
\left\|f(x, t)\right\|^2 \le G\left(1 + \|x\|^2\right) \quad\text{uniformly in } t.
\]
That is, f cannot grow too much.

Next, we turn to the concepts of existence and uniqueness of a solution to a stochastic differential equation. We say that the solution is unique if, whenever $x^{(1)}_t(\omega)$ and $x^{(2)}_t(\omega)$ are both strong solutions to Eq. (4A.6), then $x^{(1)}_t(\omega) = x^{(2)}_t(\omega)$ $P\otimes dt$-a.s. We have:
Theorem 4A.7. Suppose that µ, σ satisfy Lipschitz and growth conditions in x. Then there exists a unique Itô's process x satisfying Eq. (4A.6), which is continuous, adapted and Markov.
Consider the following stochastic differential equation:
\[
dx(t) = \mu\left(a - x(t)\right)dt + \sigma\sqrt{x(t)}\,dW(t),
\]
for some constants µ, a, σ. This is the so-called square-root process utilized to model equity volatility (see Chapter 10), the short-term rate (see Chapter 11) or instantaneous probabilities of default of debt issuers (see Chapter 12). The point here, for now, is that the diffusion component does not satisfy the conditions in Theorem 4A.7. Yet it is possible to show that under suitable parameter restrictions there exists a strong solution. Incidentally, an explicit solution to this simple equation is still unknown.
What about uniqueness of the solution? It is well-known that if µ, σ are locally Lipschitz continuous in x, then strong uniqueness holds. But even for ordinary differential equations, a local Lipschitz condition is not necessarily enough to guarantee global existence (i.e. for all t) of a solution. For example, the equation
\[
\frac{dx(t)}{dt} = \mu(x(t)) \equiv x^2(t), \quad x_0 = 1,
\]
has as unique solution:
\[
x(t) = \frac{1}{1 - t}, \quad 0 \le t < 1.
\]
Yet it is impossible to find a global solution, i.e. one defined for all t. This is exactly the kind of pathology ruled out by linear-growth conditions. More generally, linear-growth conditions ensure that the solution is unique and that $|x_t(\omega)|$ doesn't explode in finite time. Naturally, Lipschitz and growth conditions are only sufficient conditions to guarantee the previous conclusions.
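The explosion in the example above is easy to see numerically. The sketch below is illustrative only: it integrates $dx/dt = x^2$, $x_0 = 1$, with an explicit Euler scheme (an assumption of this example, with an arbitrary number of steps) and compares the result with the exact solution 1/(1 − t) as the horizon approaches 1.

```python
def euler_ode(f, x0, t_end, n_steps):
    """Explicit Euler scheme for dx/dt = f(x), x(0) = x0, on [0, t_end]."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x += f(x) * dt
    return x


def exact_solution(t):
    """x(t) = 1 / (1 - t), valid only for 0 <= t < 1."""
    return 1.0 / (1.0 - t)


if __name__ == "__main__":
    for t_end in (0.5, 0.9, 0.99, 0.999):
        approx = euler_ode(lambda x: x * x, 1.0, t_end, 200_000)
        print(f"t = {t_end}: Euler = {approx:10.2f}, exact = {exact_solution(t_end):10.2f}")
```

The drift $x^2$ is locally Lipschitz but violates the linear-growth bound, which is why the solution exists and is unique only up to t = 1.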
A final remark. The uniqueness concept used here refers to strong or pathwise uniqueness. There
are also definitions of weak uniqueness to mean that any two solutions (weak or strong) have the same
finite-dimensional distributions. For example, the Tanaka’s equation introduced earlier has no strong
solution, yet it can be shown that it has a (weakly) unique weak solution.
4.10.2.3 Itô’s lemma

Itô’s lemma is a fundamental tool of analysis in continuous-time finance. It helps to build up new
processes from given processes. Two examples might clarify.
(i) A share price is certainly a function of its dividend process. If the dividend process is solution
to some SDE, then the asset price is a solution to another SDE. Which SDE? Itô’s lemma will
give us the answer.
(ii) Derivative products, reviewed in the third part of the book, are financial instruments the value of which depends on some underlying factors (hence the terminology “derivative”). In other words, derivative prices are functions of these factors. Then, if the factors are solutions to SDEs, derivative prices are also solutions to SDEs. Once again, Itô's lemma will provide us with the right SDE.
Naturally, the functional form linking the dividend process (or the factors) to the asset prices is
unknown. But in situations of interest, simple no-arbitrage restrictions will help to pin down such a
functional form.
Let us proceed with a few preliminary heuristic considerations. A useful heuristic definition is that the increments of a Brownian motion, dW(t), can be thought of as being equal to W(t + ∆t) − W(t) as ∆t → 0. We may think of the “increments” dW(t) as being normally distributed, dW(t) ∼ N(0, dt). Heuristically, indeed, ∆W(t) ≡ W(t + ∆t) − W(t) ∼ N(0, ∆t). But then, by the previous normality property of ∆W(t),
\[
E[\Delta W(t)] = 0 \ \text{ and } \ E\left[(\Delta W(t))^2\right] = \Delta t, \ \text{ hence } \ \mathrm{var}[\Delta W(t)] = \Delta t, \ \text{ and } \ \mathrm{var}\left[(\Delta W(t))^2\right] = 2(\Delta t)^2,
\]
where the last equality follows by the properties of $\chi^2$ distributions.

The point of the previous computations is that for small ∆t, the variance of $(\Delta W(t))^2$, which is proportional to $(\Delta t)^2$, is negligible compared to its expectation, which is ∆t. Heuristically, $Q_n(t) \xrightarrow{q.m.} t$ and $(dW(t))^2 \equiv Q_n(dt) \xrightarrow{q.m.} dt$. These heuristic considerations lead to the following celebrated table.
Itô’s multiplication table

(dt)n = 0 for n > 1
dt · dW = 0
(dW )2 = dt
(dW )n = 0 for n > 2
dW1 dW2 = 0 for two independent Brownian motions
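The entries of the table can be read as statements about sums over a fine partition. The sketch below, an illustration under assumed settings (uniform partition of [0, 1], 100,000 steps, fixed seed), accumulates each product over the partition: only the sum of $(dW)^2$ survives, and it is close to t.

```python
import math
import random


def multiplication_table_sums(n=100_000, t=1.0, seed=0):
    """Sum the products in Ito's multiplication table over a uniform partition of [0, t]."""
    rng = random.Random(seed)
    dt = t / n
    sd = math.sqrt(dt)
    sums = {"dt*dt": 0.0, "dt*dW": 0.0, "dW*dW": 0.0, "dW^3": 0.0, "dW1*dW2": 0.0}
    for _ in range(n):
        dw1 = rng.gauss(0.0, sd)
        dw2 = rng.gauss(0.0, sd)  # increment of a second, independent Brownian motion
        sums["dt*dt"] += dt * dt
        sums["dt*dW"] += dt * dw1
        sums["dW*dW"] += dw1 * dw1
        sums["dW^3"] += dw1 ** 3
        sums["dW1*dW2"] += dw1 * dw2
    return sums


if __name__ == "__main__":
    for name, value in multiplication_table_sums().items():
        print(f"sum of {name:8s} = {value: .6f}")
```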
We now heuristically derive Itô’s lemma by relying on this table. Let $x(t)$ be the solution to
\[
dx(t) = \mu(t)\,dt + \sigma(t)\,dW(t),
\]
and suppose we are given a function $f(x,t)$, which we assume to be differentiable in $(x,t)$ as many times as we shall need below. We expand $f$ as follows:
\[
df(x,t) = f_t(x,t)\,dt + f_x(x,t)\,dx + \frac{1}{2} f_{xx}(x,t)\,(dx)^2 + \text{Remainder},
\]
where the remainder contains the terms in $(dt)^2$ and $dt \cdot dx$, and all the terms of order higher than two. For reasons which will be clear in a moment, we discard it. We have,
\[
\begin{aligned}
df &= f_t\,dt + f_x\,dx + \frac{1}{2} f_{xx}\,(dx)^2 \\
&= f_t\,dt + f_x\,(\mu\,dt + \sigma\,dW) + \frac{1}{2} f_{xx}\,(\mu\,dt + \sigma\,dW)^2 \\
&= f_t\,dt + f_x \mu\,dt + f_x \sigma\,dW + \frac{1}{2} f_{xx}\left[\mu^2 (dt)^2 + \sigma^2 (dW)^2 + 2\mu\sigma\,(dt \cdot dW)\right].
\end{aligned}
\]
By Itô’s multiplication table,
\[
\begin{aligned}
df &= f_t\,dt + f_x \mu\,dt + f_x \sigma\,dW + \frac{1}{2} f_{xx}\left[\mu^2 (dt)^2 + \sigma^2 (dW)^2 + 2\mu\sigma\,(dt \cdot dW)\right] \\
&= f_t\,dt + f_x \mu\,dt + f_x \sigma\,dW + \frac{1}{2} f_{xx}\left[0 + \sigma^2 \cdot dt + 0\right].
\end{aligned}
\]
By rearranging terms,
\[
df(x,t) = \left[f_t(x,t) + f_x(x,t)\,\mu + \frac{1}{2} f_{xx}(x,t)\,\sigma^2\right] dt + f_x(x,t)\,\sigma\,dW,
\]
and the remainder is also zero by Itô’s multiplication table. This is Itô’s lemma.
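A quick numerical check may help fix ideas. The sketch below is an illustration under our own assumptions, not part of the derivation: we take the hypothetical process $dS = \mu S\,dt + \sigma S\,dW$ and the function $f(S) = \ln S$, for which Itô’s lemma predicts $d\ln S = (\mu - \frac{1}{2}\sigma^2)\,dt + \sigma\,dW$. We simulate $S$ with an Euler scheme and compare $\ln S(t)$ with the prediction, and with the “naive” chain rule that drops the second-order term.

```python
import math
import random


def check_ito_log(mu=0.05, sigma=0.2, t=1.0, n=10_000, seed=1):
    """Compare ln S(t) along an Euler path with the Ito-lemma prediction.

    S follows dS = mu*S dt + sigma*S dW (a hypothetical example process).
    With f(S) = ln S, Ito's lemma gives
        d ln S = (mu - sigma^2 / 2) dt + sigma dW.
    """
    rng = random.Random(seed)
    dt = t / n
    s, w = 1.0, 0.0
    for _ in range(n):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        s += mu * s * dt + sigma * s * dw   # Euler step for S
        w += dw                             # running Brownian path
    simulated = math.log(s)
    ito = (mu - 0.5 * sigma ** 2) * t + sigma * w
    naive = mu * t + sigma * w              # chain rule without the Ito term
    return simulated, ito, naive


if __name__ == "__main__":
    print(check_ito_log())
```

With these parameter values, the naive chain rule misses the simulated value by roughly $\frac{1}{2}\sigma^2 t = 0.02$, whereas the Itô prediction differs from it only by discretization error.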
Naturally, Itô’s lemma also holds when $x$ is a multidimensional process. A heuristic derivation can be obtained by applying Itô’s multiplication table to the following expansion:
\[
df(x,t) = f_t\,dt + f_x\,dx + \frac{1}{2} \sum_{i,j} f_{x_i x_j}\,dx_i\,dx_j.
\]
Then, we have:
Theorem 4A.8. (Itô’s lemma, multidimensional) Let us be given a multidimensional process $x \in \mathbb{R}^n$, solution to,
\[
dx(t) = \mu(x(t),t)\,dt + \sigma(x(t),t)\,dW(t), \tag{4A.9}
\]
where $\mu$ is in $\mathbb{R}^n$, $\sigma$ is in $\mathbb{R}^{n \times d}$ and $W$ is a $d$-dimensional vector of independent Brownian motions. Moreover, let us be given a function $f(x,t)$ which is twice differentiable in $x$ and differentiable in $t$. Then $f$ is an Itô process, solution to:
\[
df(x(t),t) = Lf(x(t),t)\,dt + f_x(x(t),t)\,\sigma(x(t),t)\,dW(t),
\]
or more formally,
\[
f(x(t),t) = f(x_0,0) + \int_0^t Lf(x(s),s)\,ds + \int_0^t f_x(x(s),s)\,\sigma(x(s),s)\,dW(s), \tag{4A.10}
\]
where
\[
Lf(x,t) = f_t(x,t) + f_x(x,t)\,\mu + \frac{1}{2} \mathrm{Tr}\left[\sigma\sigma^\top f_{xx}(x,t)\right],
\]
and $f_x(x,t)$ and $f_{xx}(x,t)$ are the gradient and Hessian of $f$ with respect to $x$.
Note that by Eq. (4A.10), and provided $f_x \sigma \in H^2$, $f$ is a martingale whenever $Lf(x,t) = 0$, for all $x, t$. Moreover, as a matter of terminology, the operator $Af(x,t) = f_x(x,t)\,\mu + \frac{1}{2} \mathrm{Tr}\left[\sigma\sigma^\top f_{xx}(x,t)\right]$ is usually referred to as the infinitesimal generator of the diffusion process in Eq. (4A.9).
4.11 Appendix 3: Proof of selected results

4.11.1 Proof of Theorem 4.2
As mentioned in the main text, by Girsanov’s theorem, $\mathcal{Q}$ is non-empty if and only if Eq. (4.43) holds true. Therefore, the proof relies on Eq. (4.43).
If part. With $c \equiv 0$, Eq. (4.44) is:
\[
\frac{V^{x,\pi,0}(\tau)}{S_0(\tau)} = x + \int_t^\tau \frac{\pi^\top(u)\,\sigma(u)}{S_0(u)}\,dW_0(u), \quad \tau \in [t,T],
\]
which implies $x = E^Q_t\left[S_0(T)^{-1} V^{x,\pi,0}(T)\right]$. An arbitrage opportunity is $V^{x,\pi,0}(t) \le S_0(T)^{-1} V^{x,\pi,0}(T)$ a.s., which combined with the previous equality leaves $V^{x,\pi,0}(t) = S_0(T)^{-1} V^{x,\pi,0}(T)$ $Q$-a.s. (if a r.v. $\tilde{y} \ge 0$ and $E_t(\tilde{y}) = 0$, then $\tilde{y} = 0$ a.s.) and, hence, $P$-a.s. The last equality is in contradiction with $\Pr\left\{S_0(T)^{-1} V^{x,\pi,0}(T) - x > 0\right\} > 0$, as required by Definition 4.3.
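Only if part. We combine portions of proofs in Karatzas (1997, thm. 0.2.4 pp. 6-7) and Øksendal (1998, thm. 12.1.8b, pp. 256-257). We let:
\[
\begin{aligned}
Z(\tau) &= \{\omega \in \Omega : \text{Eq. (4.43) has no solutions}\} \\
&= \{\omega \in \Omega : a(\tau;\omega) - 1_m r(\tau;\omega) \notin \langle\sigma\rangle\} \\
&= \left\{\omega \in \Omega : \exists \pi(\tau;\omega) : \pi(\tau;\omega)^\top \sigma(\tau;\omega) = 0 \text{ and } \pi(\tau;\omega)^\top \left(a(\tau;\omega) - 1_m r(\tau;\omega)\right) \neq 0\right\},
\end{aligned}
\]
and consider the following portfolio,
\[
\hat{\pi}(\tau;\omega) =
\begin{cases}
k \cdot \mathrm{sign}\left[\pi(\tau;\omega)^\top \left(a(\tau;\omega) - 1_m r(\tau;\omega)\right)\right] \cdot \pi(\tau;\omega) & \text{for } \omega \in Z(\tau) \\
0 & \text{for } \omega \notin Z(\tau)
\end{cases}
\]
Clearly $\hat{\pi}$ is $(\tau;\omega)$-measurable, and generates, by Eq. (4.39),
\[
\begin{aligned}
\frac{V^{x,\hat{\pi},0}(\tau)}{S_0(\tau)} &= x + \int_t^\tau \frac{\hat{\pi}(u)^\top \left(a(u) - 1_m r(u)\right)}{S_0(u)}\, I_{Z(u)}\,du + \int_t^\tau \frac{\hat{\pi}^\top(u)\,\sigma(u)}{S_0(u)}\, I_{Z(u)}\,dW(u) \\
&= x + \int_t^\tau \frac{\hat{\pi}(u)^\top \left(a(u) - 1_m r(u)\right)}{S_0(u)}\, I_{Z(u)}\,du \\
&\ge x,
\end{aligned}
\]
where the stochastic integral vanishes because $\hat{\pi}^\top \sigma = 0$ on $Z(u)$, and the inequality holds because the sign correction makes the integrand nonnegative. So the market has no arbitrage only if $I_{Z(u)} = 0$, i.e. only if Eq. (4.43) has at least one solution.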
4.11.2 Proof of Eq. (4.48).

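We have:
\[
\begin{aligned}
x &= E^Q\left[\frac{V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{c(u)}{S_0(u)}\,du\right] \\
&= \eta(t)^{-1} \cdot E\left[\frac{\eta(T)\,V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{\eta(T)\,c(u)}{S_0(u)}\,du\right] \\
&= \eta(t)^{-1} \cdot E\left[\frac{\eta(T)\,V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T E\left(\left.\frac{\eta(T)\,c(u)}{S_0(u)}\right| \mathcal{F}(u)\right) du\right] \\
&= \eta(t)^{-1} \cdot E\left[\frac{\eta(T)\,V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{E\left(\eta(T) \mid \mathcal{F}(u)\right) c(u)}{S_0(u)}\,du\right] \\
&= \eta(t)^{-1} \cdot E\left[\frac{\eta(T)\,V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{\eta(u)\,c(u)}{S_0(u)}\,du\right] \\
&= E\left[m_{t,T} \cdot V^{x,\pi,c}(T) + \int_t^T m_{t,u} \cdot c(u)\,du\right],
\end{aligned}
\]
where we used the fact that $c$ is adapted, the law of iterated expectations, the martingale property of $\eta$, and the definition of $m_{0,t}$.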
4.11.3 Walras’s consistency tests

First, we show that Eq. (4.55) ⇒ Eq. (4.56). To gain intuition about the proof, consider the two-period economy of Chapter 2. In that economy, absence of arbitrage opportunities implies that $\exists \phi \in \mathbb{R}^d : \phi^\top (c^1 - w^1) = S\theta = -(c_0 - w_0)$, whence $c_s = w_s$, $s = 0, \cdots, d \iff \theta = 0_m$. In the model of this chapter, absence of arbitrage opportunities implies that there exists a unique $Q \in \mathcal{Q}$ such that:
\[
\frac{V^{x,\pi,c}(\tau)}{S_0(\tau)} \equiv \frac{\theta_0(\tau) S_0(\tau) + \pi^\top(\tau) 1_m}{S_0(\tau)} = 1_m^\top S(t) + \int_t^\tau \frac{\pi^\top \sigma(u)}{S_0(u)}\,dW_0(u) - \int_t^\tau \frac{c(u)}{S_0(u)}\,du.
\]
That is,
\[
\begin{aligned}
&\frac{\theta_0(\tau) S_0(\tau) + \left(\pi^\top(\tau) - S^\top(\tau)\right) 1_m}{S_0(\tau)} + \frac{S^\top(\tau) 1_m}{S_0(\tau)} \\
&\quad = 1_m^\top S(t) + \int_t^\tau \frac{\left(\pi^\top(u) - S^\top(u)\right) \sigma(u)}{S_0(u)}\,dW_0(u) - \int_t^\tau \frac{c(u)}{S_0(u)}\,du + \int_t^\tau \frac{S^\top(u)\,\sigma(u)}{S_0(u)}\,dW_0(u).
\end{aligned}
\]
Plugging the solution $\frac{S_i(\tau)}{S_0(\tau)} = S_i(t) + \int_t^\tau S_0^{-1}(u)\,S_i(u)\,\sigma_i(u)\,dW_0(u) - \int_t^\tau S_0^{-1}(u)\,D_i(u)\,du$ in the previous relation,
\[
\frac{\theta_0(T) S_0(T) + \left(\pi^\top(T) - S^\top(T)\right) 1_m}{S_0(T)} = \int_t^T \frac{\pi^\top(u) - S^\top(u)}{S_0(u)}\,\sigma(u)\,dW_0(u) + \int_t^T \frac{D(u) - c(u)}{S_0(u)}\,du. \tag{4A.11}
\]
When Eq. (4.55) holds, we have that $V^{x,\pi,c}(T) = \theta_0(T) S_0(T) + \pi^\top(T) 1_m = q(T) = S^\top(T) 1_m$, and $D = c$, and Eq. (4A.11) becomes:
\[
0 = x(T) \equiv \int_t^T \frac{\pi^\top(u) - S^\top(u)}{S_0(u)}\,\sigma(u)\,dW_0(u),
\]
a martingale starting at zero, satisfying:
\[
dx(\tau) = \frac{\pi^\top(\tau) - S^\top(\tau)}{S_0(\tau)}\,\sigma(\tau)\,dW_0(\tau) = 0.
\]
Since $\ker(\sigma) = \{\emptyset\}$, we have that $\pi(\tau) = S(\tau)$ a.s. for $\tau \in [t,T]$. It is easily checked that this implies $\theta_0(T) = 0$ $P$-a.s. and that, in fact, $\theta_0(\tau) = 0$ a.s.
Next, we show that Eq. (4.56) ⇒ Eq. (4.55). When Eq. (4.56) holds, Eq. (4A.11) becomes:
\[
0 = y(T) \equiv \int_t^T \frac{D(u) - c(u)}{S_0(u)}\,du,
\]
a martingale starting at zero. We conclude by the same arguments used in the proof of the previous part.
4.12 Appendix 4: The Green’s function

4.12.1 Setup
In Section 4.6, it is shown that in frictionless markets, the value of a security as of time $\tau$ is:
\[
V(x(\tau),\tau) = E\left[\frac{m_{t,T}}{m_{t,\tau}}\,V(x(T),T) + \int_\tau^T \frac{m_{t,s}}{m_{t,\tau}}\,h(x(s),s)\,ds\right], \tag{4A.12}
\]
where $m_{t,\tau}$ is the stochastic discount factor,
\[
m_{t,\tau} = \frac{\eta(t,\tau)}{S_0(\tau)} = \frac{1}{S_0(\tau)} \cdot \left.\frac{dQ}{dP}\right|_{\mathcal{F}(\tau)}.
\]
The Arrow-Debreu state price density is:
\[
\phi_{t,T} = m_{t,T}\,dP = \frac{S_0(t)}{S_0(T)}\,dQ.
\]
Our aim is to characterize this density in terms of partial differential equations. By the same reasoning produced in Section 4.6, Eq. (4A.12) can be rewritten as:
\[
V(x(\tau),\tau) = E^Q\left[a(\tau,T)\,V(x(T),T) + \int_\tau^T a(\tau,s)\,h(x(s),s)\,ds\right], \quad a(t',t'') \equiv \frac{S_0(t')}{S_0(t'')}. \tag{4A.13}
\]
Next, consider the state vector, $y(u) \equiv (a(\tau,u), x(u))$, $\tau \le u \le T$, and let $q(y(t') \mid y(\tau))$ be the risk-neutral density of $y$. We have,
\[
\begin{aligned}
V(x(\tau),\tau) &= E^Q\left[a(\tau,T)\,V(x(T),T) + \int_\tau^T a(\tau,s)\,h(x(s),s)\,ds\right] \\
&= \int a(\tau,T)\,V(x(T),T)\,q(y(T) \mid y(\tau))\,dy(T) + \int_\tau^T \int a(\tau,s)\,h(x(s),s)\,q(y(s) \mid y(\tau))\,dy(s)\,ds.
\end{aligned}
\]
If $V(x(T),T)$ and $a(\tau,T)$ are independent,
\[
\int a(\tau,T)\,V(x(T),T)\,q(y(T) \mid y(\tau))\,dy(T) = \int_X G(\tau,T)\,V(x(T),T)\,dx(T),
\]
where:
\[
G(\tau,T) \equiv \int_A a(\tau,T)\,q(y(T) \mid y(\tau))\,dy(T).
\]
Assuming the same for $h$,
\[
V(x(\tau),\tau) = \int_X G(\tau,T)\,V(x(T),T)\,dx(T) + \int_\tau^T \int_X G(\tau,s)\,h(x(s),s)\,dx(s)\,ds.
\]
The function $G$ is known as the Green’s function:
\[
G(t,\ell) \equiv G(x,t;\xi,\ell) = \int_A a(t,\ell)\,q(y(\ell) \mid y(t))\,da.
\]
It is the value in state $x \in \mathbb{R}^d$ as of time $t$ of a unit of numéraire at $\ell > t$ if future states lie in a neighborhood (in $\mathbb{R}^d$) of $\xi$. It is thus the Arrow-Debreu state-price density.
For example, a pure discount bond has $V(x,T) = 1$ $\forall x$, and $h(x,s) = 0$ $\forall x, s$, and
\[
V(x(\tau),\tau) = \int_X G(x(\tau),\tau;\xi,T)\,d\xi, \quad \text{with } \lim_{\tau \uparrow T} G(x(\tau),\tau;\xi,T) = \delta(x(\tau) - \xi),
\]
where $\delta$ is the Dirac delta.

4.12.2 The PDE connection

We show that the Green’s function satisfies the same partial differential equation (PDE) satisfied by the security price, but with a different boundary condition, and with the instantaneous dividend taken out. We have:
\[
V(x(t),t) = \int_X G(x(t),t;\xi(T),T)\,V(\xi(T),T)\,d\xi(T) + \int_t^T \int_X G(x(t),t;\xi(s),s)\,h(\xi(s),s)\,d\xi(s)\,ds. \tag{4A.14}
\]
Consider the scalar case. By Eq. (4A.13), and the Feynman-Kac connection between PDEs and conditional expectations reviewed in Section 4.2, we have that under regularity conditions, $V$ is solution to:
\[
0 = V_t + \mu V_x + \frac{1}{2}\sigma^2 V_{xx} - rV + h, \tag{4A.15}
\]
where $\mu$ is the risk-neutral drift of $x$. Next, take the following partial derivatives of $V(x,t)$ in Eq. (4A.14):
\[
\begin{aligned}
V_t &= \int_X G_t V\,d\xi - \int_X \delta(x - \xi)\,h\,d\xi + \int_t^T \int_X G_t h\,d\xi\,ds = \int_X G_t V\,d\xi - h + \int_t^T \int_X G_t h\,d\xi\,ds \\
V_x &= \int_X G_x V\,d\xi + \int_t^T \int_X G_x h\,d\xi\,ds \\
V_{xx} &= \int_X G_{xx} V\,d\xi + \int_t^T \int_X G_{xx} h\,d\xi\,ds
\end{aligned}
\]
and replace them into Eq. (4A.15) to obtain:
\[
\begin{aligned}
0 &= \int_X \left[G_t + \mu G_x + \frac{1}{2}\sigma^2 G_{xx} - rG\right] V(\xi(T),T)\,d\xi(T) \\
&\quad + \int_t^T \int_X \left[G_t + \mu G_x + \frac{1}{2}\sigma^2 G_{xx} - rG\right] h(\xi(s),s)\,d\xi(s)\,ds.
\end{aligned}
\]
This shows that $G$ is solution to
\[
0 = G_t + \mu G_x + \frac{1}{2}\sigma^2 G_{xx} - rG, \quad \text{with } \lim_{t \uparrow T} G(x,t;\xi,T) = \delta(x - \xi).
\]
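As an illustration, consider the special case where $\mu$, $\sigma$ and $r$ are constant. This is our own simplifying assumption, made only to obtain a closed form: the risk-neutral transition density of $x$ is then Gaussian, and the Green’s function is that density discounted at the rate $r$. The sketch below (all function names and parameter values are hypothetical) checks by finite differences that this candidate solves the PDE above, and that integrating it over $\xi$ returns the price of a pure discount bond, $e^{-r(T-t)}$.

```python
import math


def green(x, t, xi, T, mu=0.03, sigma=0.3, r=0.02):
    """Green's function for constant mu, sigma, r (an illustrative assumption)."""
    s = T - t
    var = sigma ** 2 * s
    density = math.exp(-(xi - x - mu * s) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return math.exp(-r * s) * density


def pde_residual(x, t, xi, T, mu=0.03, sigma=0.3, r=0.02, h=1e-3):
    """Central-difference estimate of G_t + mu*G_x + 0.5*sigma^2*G_xx - r*G."""
    def g(a, b):
        return green(a, b, xi, T, mu, sigma, r)

    g_t = (g(x, t + h) - g(x, t - h)) / (2.0 * h)
    g_x = (g(x + h, t) - g(x - h, t)) / (2.0 * h)
    g_xx = (g(x + h, t) - 2.0 * g(x, t) + g(x - h, t)) / h ** 2
    return g_t + mu * g_x + 0.5 * sigma ** 2 * g_xx - r * g(x, t)


def bond_price(x, t, T, mu=0.03, sigma=0.3, r=0.02, n=4000, width=8.0):
    """Integrate G over xi (midpoint rule) to price a pure discount bond."""
    s = T - t
    half_range = width * sigma * math.sqrt(s)
    lo = x + mu * s - half_range
    step = 2.0 * half_range / n
    return sum(green(x, t, lo + (i + 0.5) * step, T, mu, sigma, r) for i in range(n)) * step


if __name__ == "__main__":
    print(pde_residual(0.1, 0.0, 0.25, 1.0))           # close to zero
    print(bond_price(0.1, 0.0, 1.0), math.exp(-0.02))  # the two should agree
```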
4.13 Appendix 5: Portfolio constraints

We are looking for a portfolio-consumption policy $(p_{\hat{\nu}}, c_{\hat{\nu}})$ such that
\[
\mathrm{Val}(x;K) = E\left[\int_0^T u(t, c_{\hat{\nu}}(t))\,dt + U\left(V^{x,p_{\hat{\nu}},c_{\hat{\nu}}}(T)\right)\right] \equiv \mathrm{Val}_{\hat{\nu}}(x), \tag{4A.16}
\]
and $p_{\hat{\nu}}(t) \in K$ for all $t \in [0,T]$.

Note that because $K$ contains the origin, the support function $\zeta$ in Eq. (4.59) satisfies $\zeta(\nu) \ge 0$ for each $\nu \in \tilde{K}$. Moreover, an intuitive and important property of $\zeta$ is that,
\[
p \in K \iff \zeta(\nu) + p^\top \nu \ge 0, \quad \forall \nu \in \tilde{K}. \tag{4A.17}
\]
Next, define the standard Brownian motion under the probability $Q_\nu$, defined through the Radon-Nikodym derivative in Eq. (4.61):
\[
W_\nu(t) = W(t) + \int_0^t \left[\lambda(u) + \sigma^{-1}(u)\,\nu(u)\right] du \equiv W_0(t) + \int_0^t \sigma^{-1}(u)\,\nu(u)\,du,
\]
where $\lambda = \sigma^{-1}(a - 1_d r)$, and $W_0$ is the usual Brownian motion under the risk-neutral probability in a market without any frictions. If the price system is as in Eqs. (4.60), then, for any unconstrained portfolio-consumption $(p, c)$, the dynamics of wealth, $V_\nu^{x,p,c}$ say, are easily seen to be:
\[
dV_\nu^{x,p,c} = \left[\left(p^\top \nu + \zeta(\nu)\right) V_\nu^{x,p,c} + r V_\nu^{x,p,c} - c\right] dt + V_\nu^{x,p,c}\,p^\top \sigma\,dW_0.
\]
So we have that under $Q_0$,
\[
\frac{V_\nu^{x,p,c}(T)}{S_0(T)} + \int_0^T \frac{c(t)}{S_0(t)}\,dt = x + \int_0^T \frac{V_\nu^{x,p,c}(t)}{S_0(t)} \left[p^\top(t)\,\nu(t) + \zeta(\nu(t))\right] dt + \int_0^T \frac{V_\nu^{x,p,c}(t)}{S_0(t)}\,p^\top(t)\,\sigma(t)\,dW_0(t).
\]
Therefore, for any normalized portfolio-consumption $(p, c)$, we have that the wealth difference, $\Delta(t) \equiv \frac{V_\nu^{x,p,c}(t) - V^{x,p,c}(t)}{S_0(t)}$, satisfies:
\[
d\Delta(t) = \underbrace{\frac{V_\nu^{x,p,c}(t)}{S_0(t)} \left[p^\top(t)\,\nu(t) + \zeta(\nu(t))\right]}_{\equiv\, m(t)} dt + \Delta(t)\,p^\top(t)\,\sigma(t)\,dW(t), \quad \Delta(0) = 0.
\]
Next, consider the simpler equation,
\[
d\bar{\Delta}(t) = \bar{\Delta}(t)\,p^\top(t)\,\sigma(t)\,dW(t), \quad \bar{\Delta}(0) = 0. \tag{4A.18}
\]
Because $m(t) \ge 0$ by Eq. (4A.17), then, by a comparison theorem (e.g., Karatzas and Shreve (1991, p. 291-295)), $\Delta(t) \ge \bar{\Delta}(t) = 0$, where the last equality follows because the solution to Eq. (4A.18) is $\bar{\Delta}(t) = \bar{\Delta}(0)\,L(t)$, for some positive process $L(t)$. Therefore, we have,
\[
V_\nu^{x,p,c}(t) \ge V^{x,p,c}(t), \quad \text{with an equality if } \zeta(\nu(t)) + p^\top \nu(t) = 0 \text{ for all } t. \tag{4A.19}
\]
Finally, suppose there is a constrained portfolio-consumption pair $(p_{\hat{\nu}}, c_{\hat{\nu}})$, such that
\[
\zeta(\hat{\nu}(t)) + p_{\hat{\nu}}^\top(t)\,\hat{\nu}(t) = 0. \tag{4A.20}
\]

Naturally, we have that $\mathrm{Val}(x;K) \le \mathrm{Val}_\nu(x)$ for all $\nu$ and, hence,
\[
\mathrm{Val}(x;K) \le \inf_{\nu \in \tilde{K}} \left(\mathrm{Val}_\nu(x)\right). \tag{4A.21}
\]
Moreover, we have,
\[
\begin{aligned}
\mathrm{Val}(x;K) &= E\left[\int_0^T u(t, c(t))\,dt + U\left(V^{x,p,c}(T)\right)\right], \quad p(t) \in K \\
&\ge E\left[\int_0^T u(t, c_{\hat{\nu}}(t))\,dt + U\left(V^{x,p_{\hat{\nu}},c_{\hat{\nu}}}(T)\right)\right] \\
&= E\left[\int_0^T u(t, c_{\hat{\nu}}(t))\,dt + U\left(V_{\hat{\nu}}^{x,p_{\hat{\nu}},c_{\hat{\nu}}}(T)\right)\right] \\
&= \mathrm{Val}_{\hat{\nu}}(x),
\end{aligned} \tag{4A.22}
\]
where the second line follows because the value of the constrained problem is, of course, the largest we may have once we consider any arbitrary constrained portfolio-consumption pair $(p_{\hat{\nu}}, c_{\hat{\nu}})$. The third line follows by Eqs. (4A.20) and (4A.19). The fourth line is the definition of $\mathrm{Val}_\nu(x)$. Combining (4A.21) with (4A.22) leaves,
\[
\mathrm{Val}(x;K) = \mathrm{Val}_{\hat{\nu}}(x).
\]
The converse, namely “if there exists a $\hat{\nu} \in \tilde{K}$ that minimizes $\mathrm{Val}_{\hat{\nu}}(x)$, then the corresponding portfolio-consumption process $(p_{\hat{\nu}}, c_{\hat{\nu}})$ is optimal for the constrained problem,” is also true, but its arguments (even informal ones) are omitted here.
4.14 Appendix 6: Models with final consumption only

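Sometimes, we may be interested in models with consumption taking place at the end of the period only. Let $\bar{S} = (S^{(0)}, S)$ and $\bar{\theta} = (\theta^{(0)}, \theta)^\top$, where $\theta$ and $S$ are both $m$-dimensional. Define, as usual, wealth as of time $t$ as $V_t \equiv \bar{S}_t \bar{\theta}_t$. There are no dividends. A self-financing strategy $\bar{\theta}$ satisfies,
\[
\bar{S}_t \bar{\theta}_{t+1} = \bar{S}_t \bar{\theta}_t \equiv V_t, \quad t = 1, \cdots, T.
\]
Therefore,
\[
\begin{aligned}
V_t &= \bar{S}_t \bar{\theta}_t + \bar{S}_{t-1} \bar{\theta}_{t-1} - \bar{S}_{t-1} \bar{\theta}_{t-1} \\
&= \bar{S}_t \bar{\theta}_t + \bar{S}_{t-1} \bar{\theta}_{t-1} - \bar{S}_{t-1} \bar{\theta}_t \quad \text{(because } \bar{\theta} \text{ is self-financing)} \\
&= V_{t-1} + \Delta\bar{S}_t \bar{\theta}_t, \quad \Delta\bar{S}_t \equiv \bar{S}_t - \bar{S}_{t-1}, \quad t = 1, \cdots, T,
\end{aligned}
\]
or,
\[
V_t = V_1 + \sum_{n=1}^t \Delta\bar{S}_n \bar{\theta}_n.
\]
Next, suppose that
\[
\Delta S_t^{(0)} = r_t S_{t-1}^{(0)}, \quad t = 1, \cdots, T,
\]
with $\{r_t\}_{t=1}^T$ given and to be defined more precisely below. The term $\Delta\bar{S}_t \bar{\theta}_t$ can then be rewritten as:
\[
\begin{aligned}
\Delta\bar{S}_t \bar{\theta}_t &= \Delta S_t^{(0)} \theta_t^{(0)} + \Delta S_t \theta_t \\
&= r_t S_{t-1}^{(0)} \theta_t^{(0)} + \Delta S_t \theta_t \\
&= r_t S_{t-1}^{(0)} \theta_t^{(0)} + r_t S_{t-1} \theta_t - r_t S_{t-1} \theta_t + \Delta S_t \theta_t \\
&= r_t \bar{S}_{t-1} \bar{\theta}_t - r_t S_{t-1} \theta_t + \Delta S_t \theta_t \\
&= r_t \bar{S}_{t-1} \bar{\theta}_{t-1} - r_t S_{t-1} \theta_t + \Delta S_t \theta_t \quad \text{(because } \bar{\theta} \text{ is self-financing)} \\
&= r_t V_{t-1} - r_t S_{t-1} \theta_t + \Delta S_t \theta_t,
\end{aligned}
\]
and we obtain
\[
V_t = (1 + r_t) V_{t-1} - r_t S_{t-1} \theta_t + \Delta S_t \theta_t,
\]
or,
\[
V_t = V_1 + \sum_{n=1}^t \left(r_n V_{n-1} - r_n S_{n-1} \theta_n + \Delta S_n \theta_n\right).
\]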
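The one-step recursion $V_t = (1 + r_t) V_{t-1} - r_t S_{t-1} \theta_t + \Delta S_t \theta_t$ can be checked on a small example. The sketch below uses made-up prices, interest rates and holdings of a single risky asset (all hypothetical): it computes wealth directly as the value of the portfolio, with the holdings of the safe asset pinned down by the self-financing condition, and compares it with the wealth delivered by the recursion.

```python
def wealth_paths(bond, stock, theta, rates, v0):
    """Track wealth two ways for a self-financing strategy.

    bond[t], stock[t]: prices at t = 0..T; theta[t]: units of the risky
    asset held over (t-1, t]; rates[t]: safe return over (t-1, t].
    Index 0 of theta and rates is unused. All inputs are hypothetical.
    """
    direct, recursive = [v0], [v0]
    for t in range(1, len(bond)):
        # Self-financing: wealth at t-1 is fully reinvested in the new portfolio.
        theta0 = (direct[-1] - stock[t - 1] * theta[t]) / bond[t - 1]
        direct.append(bond[t] * theta0 + stock[t] * theta[t])
        # Recursion: V_t = (1 + r_t) V_{t-1} - r_t S_{t-1} theta_t + dS_t theta_t
        d_stock = stock[t] - stock[t - 1]
        recursive.append((1.0 + rates[t]) * recursive[-1]
                         - rates[t] * stock[t - 1] * theta[t]
                         + d_stock * theta[t])
    return direct, recursive


rates = [None, 0.01, 0.02, 0.015]
bond = [1.0]
for r in rates[1:]:
    bond.append(bond[-1] * (1.0 + r))      # Delta S0_t = r_t * S0_{t-1}
stock = [10.0, 11.0, 9.5, 10.4]
theta = [None, 2.0, -1.0, 3.5]
direct, recursive = wealth_paths(bond, stock, theta, rates, v0=100.0)

if __name__ == "__main__":
    for a, b in zip(direct, recursive):
        print(a, b)                        # the two columns should coincide
```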
Next, consider “small” time intervals. In the limit, we obtain:
\[
dV(t) = r(t) V(t)\,dt - r(t) S(t) \theta(t)\,dt + dS(t)\,\theta(t).
\]
Such an equation can also be arrived at by noticing that current wealth is nothing but initial wealth plus gains from trade accumulated up to now:
\[
V(t) = V(0) + \int_0^t d\bar{S}(u)\,\bar{\theta}(u),
\]
which is equivalent to
\[
\begin{aligned}
dV(t) &= d\bar{S}(t)\,\bar{\theta}(t) \\
&= dS_0(t)\,\theta_0(t) + dS(t)\,\theta(t) \\
&= r(t) S_0(t) \theta_0(t)\,dt + dS(t)\,\theta(t) \\
&= r(t) \left(V(t) - S(t)\theta(t)\right) dt + dS(t)\,\theta(t) \\
&= r(t) V(t)\,dt - r(t) S(t) \theta(t)\,dt + dS(t)\,\theta(t).
\end{aligned}
\]
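Now consider the sequence of problems of terminal wealth maximization:
\[
\text{For } t = 1, \cdots, T, \quad P_t :
\begin{cases}
\max_{\theta_t} E\left[u(V(T)) \mid \mathcal{F}_{t-1}\right], \\
\text{s.t. } V_t = (1 + r_t) V_{t-1} - r_t S_{t-1} \theta_t + \Delta S_t \theta_t
\end{cases}
\]
Even if markets are incomplete, agents can solve the sequence of problems $\{P_t\}_{t=1}^T$ as time unfolds. Each problem can be written as:
\[
\max_{\theta_t} E\left[\left. u\left(V_1 + \sum_{t=1}^T \left(r_t V_{t-1} - r_t S_{t-1} \theta_t + \Delta S_t \theta_t\right)\right) \right| \mathcal{F}_{t-1}\right].
\]
The FOC for $t = 1$ is:
\[
E\left[u'(V(T)) \left(S_1 - (1 + r_0) S_0\right) \mid \mathcal{F}_0\right] = 0,
\]
whence
\[
S_0 = (1 + r_0)^{-1}\,\frac{E\left[u'(V(T)) \cdot S_1 \mid \mathcal{F}_0\right]}{E\left[u'(V(T)) \mid \mathcal{F}_0\right]}.
\]
In general,
\[
S_t = (1 + r_t)^{-1}\,\frac{E\left[u'(V(T)) \cdot S_{t+1} \mid \mathcal{F}_t\right]}{E\left[u'(V(T)) \mid \mathcal{F}_t\right]}, \quad t = 0, \cdots, T - 1.
\]
The previous relations suggest that we can define a martingale measure $Q$ for the discounted price process by defining
\[
\left.\frac{dQ}{dP}\right|_{\mathcal{F}_t} = \frac{u'(V(T))}{E\left[u'(V(T)) \mid \mathcal{F}_t\right]}.
\]
Connections with the CAPM. It is easy to show that:
\[
E(\tilde{r}_{t+1}) - r_t = -\mathrm{cov}\left(\frac{u'(V(T))}{E\left[u'(V(T)) \mid \mathcal{F}_t\right]},\ \tilde{r}_{t+1}\right),
\]
where $\tilde{r}_{t+1} \equiv (S_{t+1} - S_t)/S_t$. Indeed, dividing the pricing relation by $S_t$ gives $1 + r_t = E\left[\left.\frac{u'(V(T))}{E[u'(V(T)) \mid \mathcal{F}_t]}\,(1 + \tilde{r}_{t+1}) \right| \mathcal{F}_t\right]$, and the pricing kernel has conditional mean one, so that the expectation of the product equals $1 + E(\tilde{r}_{t+1})$ plus the covariance.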
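A small discrete-state example may help check the pricing relation and the sign of the risk premium, $E(\tilde{r}_{t+1}) - r_t = -\mathrm{cov}(\xi, \tilde{r}_{t+1})$, where $\xi \equiv u'(V(T))/E[u'(V(T)) \mid \mathcal{F}_t]$. The sketch below is only an illustration: the three states, their probabilities, the marginal utilities and the payoffs are all made-up numbers, and the function name is hypothetical.

```python
def pricing_check(probs, marg_util, payoff, r):
    """Price a one-period payoff with the kernel xi = u'(V(T)) / E[u'(V(T))].

    probs, marg_util and payoff list state probabilities, marginal
    utilities and next-period prices, state by state (hypothetical numbers).
    Returns the price, the expected excess return and cov(xi, return).
    """
    mean_mu = sum(p * m for p, m in zip(probs, marg_util))
    xi = [m / mean_mu for m in marg_util]                 # E[xi] = 1
    price = sum(p * k * s for p, k, s in zip(probs, xi, payoff)) / (1.0 + r)
    rets = [s / price - 1.0 for s in payoff]
    mean_ret = sum(p * x for p, x in zip(probs, rets))
    cov = sum(p * (k - 1.0) * (x - mean_ret) for p, k, x in zip(probs, xi, rets))
    return price, mean_ret - r, cov


price, premium, cov = pricing_check(
    probs=[0.3, 0.4, 0.3],
    marg_util=[1.5, 1.0, 0.6],     # high marginal utility in the bad state
    payoff=[8.0, 10.0, 13.0],      # the asset pays little in the bad state
    r=0.02,
)

if __name__ == "__main__":
    print(price, premium, -cov)    # the last two numbers should coincide
```

In this example, the asset pays off poorly precisely when marginal utility is high, so the covariance is negative and the risk premium is positive.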
4.15 Appendix 7: Further topics on jumps

4.15.1 The Radon-Nikodym derivative
To derive heuristically the Radon-Nikodym derivative, consider the jump times $0 < \tau_1 < \tau_2 < \cdots < \tau_n = T$. The probability of a jump in a neighborhood of $\tau_i$ is $v(\tau_i)\,d\tau$, and to find what happens under the risk-neutral probability or, in general, under any equivalent measure, we just write $v^Q(\tau_i)\,d\tau$ under measure $Q$, and set $v^Q = v\lambda^J$. Clearly, the probability of no-jumps between any two adjacent random points $\tau_{i-1}$ and $\tau_i$ and a jump at time $\tau_{i-1}$ is, for $i \ge 2$, proportional to
\[
v(\tau_{i-1})\,e^{-\int_{\tau_{i-1}}^{\tau_i} v(u)\,du} \quad \text{under the probability } P,
\]
and to
\[
v^Q(\tau_{i-1})\,e^{-\int_{\tau_{i-1}}^{\tau_i} v^Q(u)\,du} = v(\tau_{i-1})\,\lambda^J(\tau_{i-1})\,e^{-\int_{\tau_{i-1}}^{\tau_i} v(u)\lambda^J(u)\,du} \quad \text{under the probability } Q.
\]
As explained in Section 4.7, these are in fact densities of time intervals elapsing from one arrival to the next one.
Next, let $A$ be the event of marks at times $\tau_1, \tau_2, \cdots, \tau_n$. The Radon-Nikodym derivative is the likelihood ratio of the two probabilities $Q$ and $P$ of $A$:
\[
\frac{Q(A)}{P(A)} = \frac{e^{-\int_t^{\tau_1} v(u)\lambda^J(u)\,du} \cdot v(\tau_1)\lambda^J(\tau_1)\,e^{-\int_{\tau_1}^{\tau_2} v(u)\lambda^J(u)\,du} \cdot v(\tau_2)\lambda^J(\tau_2)\,e^{-\int_{\tau_2}^{\tau_3} v(u)\lambda^J(u)\,du} \cdots}{e^{-\int_t^{\tau_1} v(u)\,du} \cdot v(\tau_1)\,e^{-\int_{\tau_1}^{\tau_2} v(u)\,du} \cdot v(\tau_2)\,e^{-\int_{\tau_2}^{\tau_3} v(u)\,du} \cdots},
\]
where we have used the fact that, given that at $\tau_0 = t$ there are no-jumps, the probability of no-jumps from $t$ to $\tau_1$ is $e^{-\int_t^{\tau_1} v(u)\,du}$ under $P$ and $e^{-\int_t^{\tau_1} v(u)\lambda^J(u)\,du}$ under $Q$, respectively. Simple algebra then yields,
\[
\begin{aligned}
\frac{Q(A)}{P(A)} &= \lambda^J(\tau_1) \cdot \lambda^J(\tau_2) \cdot e^{-\int_t^{\tau_1} v(u)(\lambda^J(u) - 1)\,du} \cdot e^{-\int_{\tau_1}^{\tau_2} v(u)(\lambda^J(u) - 1)\,du} \cdot e^{-\int_{\tau_2}^{\tau_3} v(u)(\lambda^J(u) - 1)\,du} \cdots \\
&= \prod_{i=1}^n \lambda^J(\tau_i) \cdot e^{-\int_t^{\tau_n} v(u)(\lambda^J(u) - 1)\,du} \\
&= \exp\left[\ln\left(\prod_{i=1}^n \lambda^J(\tau_i) \cdot e^{-\int_t^{\tau_n} v(u)(\lambda^J(u) - 1)\,du}\right)\right] \\
&= \exp\left[\sum_{i=1}^n \ln \lambda^J(\tau_i) - \int_t^{\tau_n} v(u)\left(\lambda^J(u) - 1\right) du\right] \\
&= \exp\left[\int_t^T \ln \lambda^J(u)\,dZ(u) - \int_t^T v(u)\left(\lambda^J(u) - 1\right) du\right],
\end{aligned}
\]
where the last equality follows from the definition of the Stieltjes integral.
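The likelihood ratio can be checked in the simplest case, where $v$ and $\lambda^J$ are constant. This is our own simplifying assumption: the product of the $\lambda^J(\tau_i)$ then reduces to $(\lambda^J)^n$, and the number of jumps over $[t, T]$ is Poisson distributed, with mean $v(T - t)$ under $P$. The sketch below (hypothetical names and parameter values) verifies that reweighting the $P$-probabilities by the ratio delivers a Poisson law with intensity $v\lambda^J$, and that the ratio has mean one under $P$.

```python
import math


def poisson_pmf(n, mean):
    """Probability of n events for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def likelihood_ratio(n, v, lam_j, horizon):
    """Q(A)/P(A) for n jumps on [t, T] with constant v and lambda^J.

    Constant intensities are an illustrative assumption: the product of
    lambda^J over the jump times becomes lam_j ** n, and the integral
    becomes v * (lam_j - 1) * horizon.
    """
    return lam_j ** n * math.exp(-v * (lam_j - 1.0) * horizon)


v, lam_j, horizon = 2.0, 1.5, 1.0
# Reweighting the P-law of the jump count should give a Poisson law
# with intensity v * lam_j, and the ratio should average to one under P.
max_gap = max(
    abs(likelihood_ratio(n, v, lam_j, horizon) * poisson_pmf(n, v * horizon)
        - poisson_pmf(n, v * lam_j * horizon))
    for n in range(40)
)
mean_ratio = sum(
    likelihood_ratio(n, v, lam_j, horizon) * poisson_pmf(n, v * horizon)
    for n in range(60)
)

if __name__ == "__main__":
    print(max_gap, mean_ratio)
```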
The previous results can be used to say something substantive from an economic standpoint. But first, we need to simplify both presentation and notation. We have:
Definition (Doléans-Dade exponential semimartingale): Let $M$ be a martingale. The unique solution to the equation:
\[
L(\tau) = 1 + \int_t^\tau L(u)\,dM(u),
\]
is called the Doléans-Dade exponential semimartingale and is denoted as $\mathcal{E}(M)$.

4.15.2 Arbitrage restrictions

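As in the main text, let now $S$ be the price of a primitive asset, solution to:
\[
\begin{aligned}
\frac{dS}{S} &= b\,d\tau + \sigma\,dW + \ell \mathcal{S}\,dZ \\
&= b\,d\tau + \sigma\,dW + \ell \mathcal{S}\,(dZ - v\,d\tau) + \ell \mathcal{S} v\,d\tau \\
&= (b + \ell \mathcal{S} v)\,d\tau + \sigma\,dW + \ell \mathcal{S}\,(dZ - v\,d\tau).
\end{aligned}
\]
Next, define
\[
d\tilde{Z} = dZ - v^Q\,d\tau \quad \left(v^Q = v\lambda^J\right); \qquad d\tilde{W} = dW + \lambda\,d\tau.
\]
Both $\tilde{Z}$ and $\tilde{W}$ are $Q$-martingales. We have:
\[
\frac{dS}{S} = \left(b + \ell \mathcal{S} v^Q - \sigma\lambda\right) d\tau + \sigma\,d\tilde{W} + \ell \mathcal{S}\,d\tilde{Z}.
\]
The characterization of the equivalent martingale measure for the discounted price is given by the following Radon-Nikodym density of $Q$ with respect to $P$:
\[
\frac{dQ}{dP} = \mathcal{E}\left(-\int_t^T \lambda(\tau)\,dW(\tau) + \int_t^T \left(\lambda^J(\tau) - 1\right)\left(dZ(\tau) - v(\tau)\,d\tau\right)\right),
\]
where $\mathcal{E}(\cdot)$ is the Doléans-Dade exponential semimartingale, and so:
\[
b = r + \sigma\lambda - \ell v^Q E_{\mathcal{S}}(\mathcal{S}) = r + \sigma\lambda - \ell v \lambda^J E_{\mathcal{S}}(\mathcal{S}).
\]
Clearly, markets are incomplete here. It is possible to show that if $\mathcal{S}$ is deterministic, a representative agent with utility function $u(x) = \frac{x^{1-\eta} - 1}{1 - \eta}$ makes $\lambda^J(\mathcal{S}) = (1 + \mathcal{S})^{-\eta}$.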
4.15.3 State price density: introduction

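We have:
\[
L(T) = \exp\left(-\int_t^T v(\tau)\left(\lambda^J(\tau) - 1\right) d\tau + \int_t^T \ln \lambda^J(\tau)\,dZ(\tau)\right).
\]
The objective here is to use Itô’s lemma for jump processes to express $L$ in differential form. Define the jump process $y$ as:
\[
y(\tau) \equiv -\int_t^\tau v(u)\left(\lambda^J(u) - 1\right) du + \int_t^\tau \ln \lambda^J(u)\,dZ(u).
\]
In terms of $y$, $L$ is $L(\tau) = l(y(\tau))$ with $l(y) = e^y$. We have:
\[
\begin{aligned}
dL(\tau) &= -e^{y(\tau)}\,v(\tau)\left(\lambda^J(\tau) - 1\right) d\tau + \left(e^{y(\tau) + \text{jump}} - e^{y(\tau)}\right) dZ(\tau) \\
&= -e^{y(\tau)}\,v(\tau)\left(\lambda^J(\tau) - 1\right) d\tau + e^{y(\tau)}\left(e^{\ln \lambda^J(\tau)} - 1\right) dZ(\tau)
\end{aligned}
\]
or,
\[
\frac{dL(\tau)}{L(\tau)} = -v(\tau)\left(\lambda^J(\tau) - 1\right) d\tau + \left(\lambda^J(\tau) - 1\right) dZ(\tau) = \left(\lambda^J(\tau) - 1\right)\left(dZ(\tau) - v(\tau)\,d\tau\right).
\]
The general case (with stochastic distribution) is covered in the following subsection.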
4.15.4 State price density: general case

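Suppose the primitive is:
\[
dx(\tau) = \mu(x(\tau_-))\,d\tau + \sigma(x(\tau_-))\,dW(\tau) + dZ(\tau),
\]
and that $u$ is the price of a derivative. Introduce the $P$-martingale,
\[
dM(\tau) = dZ(\tau) - v(x(\tau))\,d\tau.
\]
By Itô’s lemma for jump-diffusion processes,
\[
\begin{aligned}
\frac{du(x(\tau),\tau)}{u(x(\tau_-),\tau)} &= \mu^u(x(\tau_-),\tau)\,d\tau + \sigma^u(x(\tau_-),\tau)\,dW(\tau) + J^u(\Delta x,\tau)\,dZ(\tau) \\
&= \left(\mu^u(x(\tau_-),\tau) + v(x(\tau_-))\,J^u(\Delta x,\tau)\right) d\tau + \sigma^u(x(\tau_-),\tau)\,dW(\tau) + J^u(\Delta x,\tau)\,dM(\tau),
\end{aligned}
\]
where $\mu^u = \left[\left(\frac{\partial}{\partial t} + L\right) u\right] / u$, $\sigma^u = \left(\frac{\partial u}{\partial x} \cdot \sigma\right) / u$, $\frac{\partial}{\partial t} + L$ is the generator for pure diffusion processes and, finally:
\[
J^u(\Delta x,\tau) \equiv \frac{u(x(\tau),\tau) - u(x(\tau_-),\tau)}{u(x(\tau_-),\tau)}.
\]
Next, generalize the steps made in Section 4.15.2, and let
\[
d\tilde{W} = dW + \lambda\,d\tau; \qquad d\tilde{Z} = dZ - v^Q\,d\tau.
\]
The objective is to find restrictions on both $\lambda$ and $v^Q$ such that both $\tilde{W}$ and $\tilde{Z}$ are $Q$-martingales. Below, we show that there is a precise connection between $v^Q$ and $J^\eta$, where $J^\eta$ is the jump component in the differential representation of $\eta$:
\[
\frac{d\eta(\tau)}{\eta(\tau_-)} = -\lambda(x(\tau_-))\,dW(\tau) + J^\eta(\Delta x,\tau)\,dM(\tau), \quad \eta(t) = 1.
\]
The relationship is
\[
v^Q = v\left(1 + J^\eta\right),
\]
and a proof of these facts is provided below. What has to be noted here is that, in this case,
\[
\frac{d\eta(\tau)}{\eta(\tau_-)} = -\lambda(x(\tau_-))\,dW(\tau) + \left(\lambda^J - 1\right) dM(\tau), \quad \eta(t) = 1,
\]
which clearly generalizes what was stated in the previous subsection.

We then have:
\[
\begin{aligned}
\frac{du}{u} &= \left(\mu^u + vJ^u\right) d\tau + \sigma^u\,dW + J^u\left(dZ - v\,d\tau\right) \\
&= \left(\mu^u + v^Q J^u - \sigma^u\lambda\right) d\tau + \sigma^u\,d\tilde{W} + J^u\,d\tilde{Z} \\
&= \left(\mu^u + v\left(1 + J^\eta\right) J^u - \sigma^u\lambda\right) d\tau + \sigma^u\,d\tilde{W} + J^u\,d\tilde{Z}.
\end{aligned}
\]
Finally, by the $Q$-martingale property of the discounted $u$,
\[
\mu^u - r = \sigma^u\lambda - v^Q \cdot E_{\Delta x}(J^u) = \sigma^u\lambda - v \cdot E_{\Delta x}\left\{\left(1 + J^\eta\right) J^u\right\},
\]
where $E_{\Delta x}$ is taken with respect to the jump-size distribution, which is the same under $Q$ and $P$.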
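Proof that $v^Q = v(1 + J^\eta)$. As usual, the state-price density $\eta$ has to be a $P$-martingale in order to be able to price bonds (in addition to all other assets). In addition, $\eta$ clearly “depends” on $W$ and $Z$. Therefore, it satisfies:
\[
\frac{d\eta(\tau)}{\eta(\tau_-)} = -\lambda(x(\tau_-))\,dW(\tau) + J^\eta(\Delta x,\tau)\,dM(\tau), \quad \eta(t) = 1.
\]
We wish to find $v^Q$ in $d\tilde{Z} = dZ - v^Q\,d\tau$ such that $\tilde{Z}$ is a $Q$-martingale, viz
\[
\tilde{Z}(t) = E^Q_t[\tilde{Z}(T)],
\]
i.e.,
\[
E^Q_t[\tilde{Z}(T)] = \frac{E_t\left[\eta(T) \cdot \tilde{Z}(T)\right]}{\eta(t)} = \tilde{Z}(t) \iff \eta(t)\tilde{Z}(t) = E_t[\eta(T)\tilde{Z}(T)],
\]
i.e., $\eta(t)\tilde{Z}(t)$ is a $P$-martingale.
By Itô’s lemma,
\[
\begin{aligned}
d(\eta\tilde{Z}) &= d\eta \cdot \tilde{Z} + \eta \cdot d\tilde{Z} + d\eta \cdot d\tilde{Z} \\
&= d\eta \cdot \tilde{Z} + \eta\left(dZ - v^Q\,d\tau\right) + d\eta \cdot d\tilde{Z} \\
&= d\eta \cdot \tilde{Z} + \eta\,\big[\underbrace{dZ - v\,d\tau}_{dM} + \left(v - v^Q\right) d\tau\big] + d\eta \cdot d\tilde{Z} \\
&= d\eta \cdot \tilde{Z} + \eta \cdot dM + \eta\left(v - v^Q\right) d\tau + d\eta \cdot d\tilde{Z}.
\end{aligned}
\]
Because $\eta$, $M$ and $\eta\tilde{Z}$ are $P$-martingales,
\[
\forall T, \quad 0 = E\left[\int_t^T \eta(\tau) \cdot \left(v(\tau) - v^Q(\tau)\right) d\tau + \int_t^T d\eta(\tau) \cdot d\tilde{Z}(\tau)\right].
\]
But
\[
d\eta \cdot d\tilde{Z} = \eta\left(-\lambda\,dW + J^\eta\,dM\right)\left(dZ - v^Q\,d\tau\right) = \eta\left[-\lambda\,dW + J^\eta\left(dZ - v\,d\tau\right)\right]\left(dZ - v^Q\,d\tau\right),
\]
and since $(dZ)^2 = dZ$,
\[
E(d\eta \cdot d\tilde{Z}) = \eta \cdot J^\eta v \cdot d\tau,
\]
and the previous condition collapses to:
\[
\forall T, \quad 0 = E\left[\int_t^T \eta(\tau) \cdot \left(v(\tau) - v^Q(\tau) + J^\eta(\Delta x)\,v(\tau)\right) d\tau\right],
\]
which implies
\[
v^Q(\tau) = v(\tau)\left(1 + J^\eta(\Delta x)\right), \quad \text{a.s.}
\]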

References
Arnold, L. (1974): Stochastic Differential Equations: Theory and Applications, New York:
Wiley.
Black, F. and M. Scholes (1973): “The Pricing of Options and Corporate Liabilities.” Journal
of Political Economy 81, 637-659.
Cvitanić, J. and I. Karatzas (1992): “Convex Duality in Constrained Portfolio Optimization.”
Annals of Applied Probability 2, 767-818.
Föllmer, H. and M. Schweizer (1991): “Hedging of Contingent Claims under Incomplete Infor-
mation.” In: Davis, M. and R. Elliott (Editors): Applied Stochastic Analysis. New York:
Gordon & Breach, 389-414.
Friedman, A. (1975): Stochastic Differential Equations and Applications (Vol. I). New York:
Academic Press.
Harrison, J.M. and S. Pliska (1983): “A Stochastic Calculus Model of Continuous Trading:
Complete Markets.” Stochastic Processes and Their Applications 15, 313-316.
Harrison, J.M., R. Pitbladdo and S.M. Schaefer (1984): “Continuous Price Processes in Frictionless Markets Have Infinite Variation.” Journal of Business 57, 353-365.
He, H. and N. Pearson (1991): “Consumption and Portfolio Policies with Incomplete Markets
and Short-Sales Constraints: The Infinite Dimensional Case.” Journal of Economic Theory
54, 259-304.
Karatzas, I. and S.E. Shreve (1991): Brownian Motion and Stochastic Calculus. New York:
Springer Verlag.
Mikosch, T. (1998): Elementary Stochastic Calculus with Finance in View. Singapore: World
Scientific.
Revuz, D. and M. Yor (1999): Continuous Martingales and Brownian Motion. New York:
Springer Verlag.
Shreve, S. (1991): “A Control Theorist’s View of Asset Pricing.” In: Davis, M. and R. Elliott
(Editors): Applied Stochastic Analysis. New York: Gordon & Breach, 415-445.
Steele, J.M. (2001): Stochastic Calculus and Financial Applications. New York: Springer-
Verlag.
5
Taking models to data
5.1 Introduction
This chapter surveys methods to estimate and test dynamic models of asset prices. It begins
with foundational issues on identification, specification and testing. Then, it surveys classical
estimation and testing methodologies such as the Method of Moments, where the number of
moment conditions equals the dimension of the parameter vector (Pearson, 1894); Maximum
Likelihood (ML) (Gauss, 1816; Fisher, 1912); the Generalized Method of Moments (GMM),
where the number of moment conditions exceeds the dimension of the parameter vector, leading
to the minimum chi-squared (Neyman and Pearson, 1928; Hansen, 1982); and, finally, the recent
developments relying on simulations, which aim to implement ML and GMM estimation for
models that are analytically quite complex, but that can be simulated. The chapter concludes
with an illustration of how joint estimation of fundamentals and asset prices in arbitrage-free
models can lead to statistical efficiency, asymptotically.
5.2 Data generating processes

5.2.1 Basics
Given is a multidimensional stochastic process $y_t$, a data generating process (DGP). While we do not know the probability distribution underlying $y_t$, we use the available data to get insights into its nature. A few definitions. A DGP is a conditional law, say the law of $y_t$ given the set of past values $\underline{y}_{t-1} = \{y_{t-1}, y_{t-2}, \cdots\}$, and some exogenous variable $z$, with $\underline{z}_t = \{z_t, z_{t-1}, z_{t-2}, \cdots\}$,
\[
\text{DGP}: \ \ell_0(y_t \mid x_t),
\]
where $x_t = (\underline{y}_{t-1}, \underline{z}_t)$, and $\ell_0$ denotes the conditional density of the data, the true law. Then, we have three basic definitions. First, we define a parametric model as a set of conditional laws for $y_t$, indexed by a parameter vector $\theta \in \Theta \subseteq \mathbb{R}^p$,
\[
(M) = \{\ell(y_t \mid x_t; \theta),\ \theta \in \Theta \subseteq \mathbb{R}^p\}.
\]
Second, we say that the model $(M)$ is well-specified if,
\[
\exists \theta_0 \in \Theta : \ell(y_t \mid x_t; \theta_0) = \ell_0(y_t \mid x_t).
\]
Third, we say that the model $(M)$ is identifiable if $\theta_0$ is unique. The main goal of this chapter is to review tools aimed at drawing inference about the true parameter $\theta_0$, given the observations.
5.2.2 Restrictions on the DGP

The previous definition of DGP is too rich to be of practical relevance. This chapter deals
with estimation methods applying to DGPs satisfying a few restrictions. Two fundamental
restrictions are usually imposed on the DGP:
• Restrictions on the heterogeneity of the stochastic process, which lead to stationary ran-
dom processes.
• Restrictions on the memory of the stochastic process, which pave the way to ergodic
processes.
5.2.2.1 Stationarity
Stationary processes describe phenomena leading to long run equilibria, in some statistical sense: as time unfolds, the probability generating the observations settles down to some “long-run” probability density, a time invariant probability. As Chapter 3 explains, in the early 1980s, theorists began to define a long-run equilibrium as a well-defined, stationary probability distribution generating economic outcomes. We have two notions of stationarity: (i) Strong, or strict, stationarity. Definition: Homogeneity in law; (ii) Weak stationarity, or stationarity of order p. Definition: Homogeneity in moments.
Even with stationary DGP, there might be situations where the number of parameters to
be estimated increases with the sample size. As an example, consider two stochastic processes:
one, for which cov(yt , yt+τ ) = τ 2 ; and another, for which cov(yt , yt+τ ) = exp (− |τ |). In both
cases, the DGP is stationary. Yet for the first process, the dependence increases with τ , and
for the second, the dependence decreases with τ . As this simple example reveals, a stationary
stochastic process may have “long memory.” “Ergodicity” further restricts DGP, so as to make
this memory play a more limited role.
5.2.2.2 Ergodicity
We shall deal with DGPs where the dependence between $y_{t_1}$ and $y_{t_2}$ decreases with $|t_2 - t_1|$. To introduce some concepts and notation, say two events $A$ and $B$ are independent when $P(A \cap B) = P(A)P(B)$. A stochastic process is asymptotically independent if, for some function $\beta_\tau$,
\[
\beta_\tau \ge \left|F(y_{t_1}, \cdots, y_{t_n}, y_{t_1+\tau}, \cdots, y_{t_n+\tau}) - F(y_{t_1}, \cdots, y_{t_n})\,F(y_{t_1+\tau}, \cdots, y_{t_n+\tau})\right|,
\]
we also have that $\lim_{\tau \to \infty} \beta_\tau = 0$. A stochastic process is $p$-dependent if $\beta_\tau = 0$ for all $\tau > p$. A stochastic process is asymptotically uncorrelated if there exists $\rho_\tau$ such that for all $t$, $\rho_\tau \ge \mathrm{cov}(y_t, y_{t+\tau}) / \sqrt{\mathrm{var}(y_t) \cdot \mathrm{var}(y_{t+\tau})}$, and that $0 \le \rho_\tau \le 1$ with $\sum_{\tau=0}^\infty \rho_\tau < \infty$. For example, $\rho_\tau = \tau^{-(1+\delta)}$, $\delta > 0$, in which case $\rho_\tau \downarrow 0$ as $\tau \uparrow \infty$.
Let $\mathcal{B}_1^t$ denote the $\sigma$-algebra generated by $\{y_1, \cdots, y_t\}$ and $A \in \mathcal{B}_{-\infty}^t$, $B \in \mathcal{B}_{t+\tau}^\infty$, and define:
\[
\alpha_\tau = \sup_{A,B} \left|P(A \cap B) - P(A)P(B)\right|, \qquad \varphi_\tau = \sup_{A,B} \left|P(B \mid A) - P(B)\right|, \quad P(A) > 0.
\]
We say that (i) $y$ is strongly mixing, or $\alpha$-mixing, if $\lim_{\tau \to \infty} \alpha_\tau = 0$; (ii) $y$ is uniformly mixing if $\lim_{\tau \to \infty} \varphi_\tau = 0$. Clearly, a uniformly mixing process is also strongly mixing. A second order stationary process is ergodic if $\lim_{T \to \infty} \sum_{\tau=1}^T \mathrm{cov}(y_t, y_{t+\tau}) < \infty$. If a second order stationary process is strongly mixing, it is also ergodic.
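To make these notions concrete, consider a Gaussian AR(1) process, $y_t - m = \phi(y_{t-1} - m) + \epsilon_t$ with $|\phi| < 1$ and unit-variance shocks. This is our own choice of example, not one made in the text. Its autocovariances, $\phi^\tau/(1 - \phi^2)$, decay geometrically and are summable, and the time average of one long path settles close to the population mean $m$. The sketch below (hypothetical names, parameter values and seed) illustrates both facts; the path is started at the mean, which is an approximation to a draw from the stationary distribution.

```python
import random


def ar1_time_average(phi=0.5, mean=1.0, n=20_000, seed=2):
    """Time average of an AR(1) path: y_t - mean = phi*(y_{t-1} - mean) + e_t."""
    rng = random.Random(seed)
    y, total = mean, 0.0                  # start the path at its mean
    for _ in range(n):
        y = mean + phi * (y - mean) + rng.gauss(0.0, 1.0)
        total += y
    return total / n


def autocov_sum(phi=0.5, terms=200):
    """Partial sum of cov(y_t, y_{t+tau}) = phi**tau / (1 - phi**2), tau = 1..terms."""
    return sum(phi ** tau / (1.0 - phi ** 2) for tau in range(1, terms + 1))


if __name__ == "__main__":
    print(ar1_time_average())   # close to the population mean, 1.0
    print(autocov_sum())        # close to (phi / (1 - phi)) / (1 - phi**2) = 4/3
```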
5.2.3 Parameter estimators

Consider an estimator of the parameter vector $\theta$ of the model,
\[
(M) = \{\ell(y_t \mid x_t; \theta),\ \theta \in \Theta \subset \mathbb{R}^p\}.
\]
Naturally, any estimator necessarily depends on the sample size, which we write as $\hat{\theta}_T \equiv t_T(y)$. Of a given estimator $\hat{\theta}_T$, we say that it is:
• Correct, or unbiased, if $E(\hat{\theta}_T) = \theta_0$. The difference $E(\hat{\theta}_T) - \theta_0$ is called distortion, or bias.
• Weakly consistent if $\mathrm{plim}\,\hat{\theta}_T = \theta_0$, and strongly consistent if $\hat{\theta}_T \xrightarrow{a.s.} \theta_0$.
Finally, an estimator $\hat{\theta}_T^{(1)}$ is more efficient than another estimator $\hat{\theta}_T^{(2)}$ if, for any vector of constants $c$, we have that $c^\top \cdot \mathrm{var}(\hat{\theta}_T^{(1)}) \cdot c < c^\top \cdot \mathrm{var}(\hat{\theta}_T^{(2)}) \cdot c$.
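The notion of bias can be illustrated with a textbook case that is not taken from the text: estimating the variance $\theta(1 - \theta)$ of a Bernoulli($\theta$) variable. Because a sample of size $T$ can only take $2^T$ values, the expectation of an estimator can be computed exactly by enumeration. The sketch below (hypothetical function names) shows that dividing the sum of squared deviations by $T$ yields a biased estimator, with expectation $\frac{T-1}{T}\theta(1 - \theta)$, while dividing by $T - 1$ removes the bias.

```python
from itertools import product


def expected_value(estimator, theta, size):
    """Exact expectation of an estimator over all Bernoulli(theta) samples."""
    total = 0.0
    for sample in product((0, 1), repeat=size):
        prob = 1.0
        for y in sample:
            prob *= theta if y == 1 else 1.0 - theta
        total += prob * estimator(sample)
    return total


def var_biased(sample):
    """Sum of squared deviations from the sample mean, divided by T."""
    m = sum(sample) / len(sample)
    return sum((y - m) ** 2 for y in sample) / len(sample)


def var_unbiased(sample):
    """Same sum of squares, divided by T - 1."""
    return var_biased(sample) * len(sample) / (len(sample) - 1)


if __name__ == "__main__":
    theta, size = 0.3, 5
    print(theta * (1 - theta))                       # true variance
    print(expected_value(var_biased, theta, size))   # (T - 1) / T times the true variance
    print(expected_value(var_unbiased, theta, size)) # the true variance
```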
5.2.4 Basic properties of density functions

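We have $T$ observations $y_1^T = \{y_1, \cdots, y_T\}$. Suppose these observations are the realization of a $T$-dimensional random variable with joint density, $f(\tilde{y}_1, \cdots, \tilde{y}_T; \theta) = f(\tilde{y}_1^T; \theta)$. We have momentarily put tildes on $y_i$, to emphasize that we view each $\tilde{y}_i$ as a random variable.[1] However, to ease notation, from now on, we write $y_i$ instead of $\tilde{y}_i$. By construction, $\int f(y \mid \theta)\,dy \equiv \int \cdots \int f(y_1^T \mid \theta)\,dy_1^T = 1$ or,
\[
\forall \theta \in \Theta, \quad \int f(y; \theta)\,dy = 1.
\]
Now suppose that the support of $y$ doesn’t depend on $\theta$. Under regularity conditions,
\[
\nabla_\theta \int f(y; \theta)\,dy = \int \nabla_\theta f(y; \theta)\,dy = 0_p,
\]
where $0_p$ is a column vector of zeros in $\mathbb{R}^p$. Moreover, for all $\theta \in \Theta$,
\[
0_p = \int \nabla_\theta f(y; \theta)\,dy = E_\theta\left[\nabla_\theta \ln f(y; \theta)\right]. \tag{5.1}
\]
Finally, we have,
\[
\begin{aligned}
0_{p \times p} &= \nabla_\theta \int \left[\nabla_\theta \ln f(y; \theta)\right] f(y; \theta)\,dy \\
&= \int \left[\nabla_{\theta\theta} \ln f(y; \theta)\right] f(y; \theta)\,dy + \int \left|\nabla_\theta \ln f(y; \theta)\right|_2 f(y; \theta)\,dy,
\end{aligned}
\]
where $|x|_2$ denotes the outer product, i.e. $|x|_2 = x \cdot x^\top$. Hence, by Eq. (5.1),
\[
E_\theta\left[\nabla_{\theta\theta} \ln f(y; \theta)\right] = -E_\theta \left|\nabla_\theta \ln f(y; \theta)\right|_2 = -\mathrm{var}_\theta\left[\nabla_\theta \ln f(y; \theta)\right] \equiv -\mathcal{J}(\theta), \quad \forall \theta \in \Theta.
\]
The matrix $\mathcal{J}$ is known as Fisher’s information matrix.
[1] Therefore, we follow a classical perspective. A Bayesian statistician would view the sample as given. We do not review Bayesian methods in this chapter.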
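These identities can be verified exactly in a simple case. As an illustration of our own, take a single Bernoulli($\theta$) observation, for which $\ln f(y; \theta) = y \ln \theta + (1 - y) \ln(1 - \theta)$. The sketch below (the function name is hypothetical) computes the expected score, the expected squared score and minus the expected Hessian by summing over the two outcomes: the first is zero, as in Eq. (5.1), and the other two coincide with $\mathcal{J}(\theta) = 1/(\theta(1 - \theta))$.

```python
def bernoulli_information(theta):
    """Return E[score], E[score^2] and -E[Hessian] for one Bernoulli(theta) draw."""
    outcomes = ((1, theta), (0, 1.0 - theta))            # (y, probability)

    def score(y):
        # First derivative of ln f(y; theta) with respect to theta.
        return y / theta - (1 - y) / (1.0 - theta)

    def hessian(y):
        # Second derivative of ln f(y; theta) with respect to theta.
        return -y / theta ** 2 - (1 - y) / (1.0 - theta) ** 2

    mean_score = sum(p * score(y) for y, p in outcomes)
    mean_sq = sum(p * score(y) ** 2 for y, p in outcomes)
    neg_mean_hess = -sum(p * hessian(y) for y, p in outcomes)
    return mean_score, mean_sq, neg_mean_hess


if __name__ == "__main__":
    print(bernoulli_information(0.3))
```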

5.2.5 The Cramer-Rao lower bound

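Let $t(y)$ be some unbiased estimator of $\theta$, and set the dimension of the parameter space to $p = 1$. We have,
\[
E[t(y)] = \int t(y)\,f(y; \theta)\,dy.
\]
Under regularity conditions,
\[
\nabla_\theta E[t(y)] = \int t(y)\left[\nabla_\theta \ln f(y; \theta)\right] f(y; \theta)\,dy = \mathrm{cov}\left(t(y), \nabla_\theta \ln f(y; \theta)\right),
\]
where the last equality holds because the score has zero mean, by Eq. (5.1). By the Cauchy-Schwarz inequality, $\left[\mathrm{cov}\left(t(y), \nabla_\theta \ln f(y; \theta)\right)\right]^2 \le \mathrm{var}[t(y)] \cdot \mathrm{var}\left[\nabla_\theta \ln f(y; \theta)\right]$. Therefore,
\[
\left[\nabla_\theta E(t(y))\right]^2 \le \mathrm{var}[t(y)] \cdot \mathrm{var}\left[\nabla_\theta \ln f(y; \theta)\right] = -\mathrm{var}[t(y)] \cdot E\left[\nabla_{\theta\theta} \ln f(y; \theta)\right].
\]
But if $t(y)$ is unbiased, or $E[t(y)] = \theta$, then $\nabla_\theta E[t(y)] = 1$, and
\[
\mathrm{var}[t(y)] \ge \left[-E\left(\nabla_{\theta\theta} \ln f(y; \theta)\right)\right]^{-1} \equiv \mathcal{J}(\theta)^{-1}.
\]
This is the celebrated Cramér-Rao bound. The same result holds in the multidimensional case, through a mere change in notation (see, e.g., Amemiya, 1985, p. 14-17).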
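A case where the bound is attained exactly can be worked out by hand and checked numerically. The example is ours, not the text’s: for $T$ independent Bernoulli($\theta$) draws, the information of the whole sample is $T/(\theta(1 - \theta))$, and the sample mean is an unbiased estimator with variance $\theta(1 - \theta)/T$. The sketch below (hypothetical function names) computes the variance of the sample mean exactly from the binomial distribution and compares it with the bound.

```python
import math


def sample_mean_variance(theta, size):
    """Exact variance of the sample mean of `size` Bernoulli(theta) draws."""
    var = 0.0
    for k in range(size + 1):
        # Probability of observing k ones out of `size` draws.
        prob = math.comb(size, k) * theta ** k * (1.0 - theta) ** (size - k)
        var += prob * (k / size - theta) ** 2
    return var


def cramer_rao_bound(theta, size):
    """Inverse Fisher information of the whole sample, 1 / (size * J(theta))."""
    info_one_draw = 1.0 / (theta * (1.0 - theta))
    return 1.0 / (size * info_one_draw)


if __name__ == "__main__":
    print(sample_mean_variance(0.3, 20), cramer_rao_bound(0.3, 20))
```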
5.3 Maximum likelihood estimation

5.3.1 Basics

The density of the data, $f(y_1^T \mid \theta)$, maps every possible sample and every value of $\theta$ into positive numbers, the “likelihood” of occurrence of any given sample, given the parameter $\theta$: $\mathbb{R}^{nT} \times \Theta \to \mathbb{R}_+$. We trace the joint density of the entire sample through a thought experiment, in which we change the sample $y_1^T$. So the sample is viewed as the realization of a random variable, a view opposite to the Bayesian perspective. We ask: Which value of $\theta$ makes the sample we observed the most likely to have occurred? We introduce the “likelihood function,” $L(\theta \mid y_1^T) \equiv f(y_1^T; \theta)$. It is the function $\theta \mapsto f(y; \theta)$ for $y_1^T$ given and equal to $\bar{y}$, say:
\[
L(\theta \mid \bar{y}) \equiv f(\bar{y}; \theta).
\]
Then, we maximize $L(\theta \mid y_1^T)$ with respect to $\theta$. That is, we look for the value of $\theta$ that maximizes the probability of observing the sample we have actually observed. The resulting estimator is called the maximum likelihood estimator (MLE). As we shall see, the MLE attains the Cramer-Rao lower bound, provided the model is not misspecified.
5.3.2 Factorizations
Consider a series of events $\{A_i\}$. In the Appendix, we show that,
\[ \Pr\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} \Pr\left(A_i \,\Big|\, \bigcap_{j=1}^{i-1} A_j\right). \tag{5.2} \]
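By Eq. (5.2), then, the MLE satisfies:
\[ \hat\theta_T = \arg\max_{\theta\in\Theta} L_T(\theta) = \arg\max_{\theta\in\Theta} \frac{1}{T} \ln L_T(\theta), \]
where, assuming IID data,
\[ \ln L_T(\theta) \equiv \ln \prod_{t=1}^{T} f(y_t \mid y_1^{t-1};\theta) = \sum_{t=1}^{T} \ln f(y_t \mid y_1^{t-1};\theta) = \sum_{t=1}^{T} \ln f(y_t;\theta) \equiv \sum_{t=1}^{T} \ell_t(\theta), \tag{5.3} \]
and $\ell_t(\theta)$ is the “log-likelihood” of a single observation.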
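To make the maximization of Eq. (5.3) concrete, here is a minimal sketch that maximizes the log-likelihood of IID data by Newton-Raphson. It assumes an exponential density with rate $\theta$, for which the MLE is known in closed form ($1/\bar{y}$) and can be used as a check; the model, starting value and function names are illustrative assumptions.

```python
import numpy as np

def log_likelihood(theta, y):
    """ln L_T(theta) = sum_t ln f(y_t; theta) for the assumed exponential model."""
    return np.sum(np.log(theta) - theta * y)

def newton_mle(y, theta_init=1.0, tol=1e-12, max_iter=100):
    """Newton-Raphson iterations on the first order condition of ln L_T."""
    theta = theta_init
    T = y.size
    for _ in range(max_iter):
        grad = T / theta - y.sum()      # derivative of ln L_T
        hess = -T / theta**2            # second derivative of ln L_T
        theta_new = theta - grad / hess
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

rng = np.random.default_rng(0)
theta0 = 2.0                            # assumed true parameter
y = rng.exponential(scale=1.0 / theta0, size=5_000)

theta_hat = newton_mle(y)
closed_form = 1.0 / y.mean()            # analytical MLE for this model
print(theta_hat, closed_form)
```

The starting value matters for Newton's method; here it is chosen close enough to the solution for the iterations to converge.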
5.3.3 Asymptotic properties

We consider the i.i.d. case only, as in Eq. (5.3). Moreover, we provide heuristic arguments,
leaving more rigorous proofs and general results to the Appendix.
5.3.3.1 The limiting problem
The MLE satisfies the following first order conditions,
0p = ∇θ ln LT (θ)|θ=θ̂T ≡ ∇θ ln LT (θ̂T ).
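Consider a Taylor expansion of the first order conditions around $\theta_0$,
\[ 0_p = \nabla_\theta \ln L_T(\hat\theta_T) \overset{d}{=} \nabla_\theta \ln L_T(\theta_0) + \nabla_{\theta\theta} \ln L_T(\theta_0)(\hat\theta_T - \theta_0), \tag{5.4} \]
where the notation $x_T \overset{d}{=} y_T$ means that the difference $x_T - y_T = o_p(1)$, and $\theta_0$ is defined as the solution to the limiting problem,
\[ \theta_0 = \arg\max_{\theta\in\Theta} \left[\lim_{T\to\infty} \frac{1}{T} \ln L_T(\theta)\right] = \arg\max_{\theta\in\Theta} \left[E(\ell(\theta))\right], \]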
and, finally, ℓ satisfies regularity conditions needed to ensure that,
θ0 : E [∇θ ℓ (θ 0 )] = 0p .
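To show that this is indeed the solution, suppose $\theta_0$ is identified; that is, for $\theta, \theta_0 \in \Theta$, $f(y \mid \theta) = f(y \mid \theta_0)$ implies $\theta = \theta_0$. Suppose, further, that for each $\theta \in \Theta$, $E_{\theta_0}[\ln f(y \mid \theta)] < \infty$. Then, we have that $\theta_0 = \arg\max_{\theta\in\Theta} E_{\theta_0}[\ln f(y \mid \theta)]$, and this value of $\theta$ is unique. The proof is, indeed, very simple. By Jensen's inequality, for each $\theta \ne \theta_0$,
\[
E_{\theta_0}\left[-\ln \frac{f(y \mid \theta)}{f(y \mid \theta_0)}\right] > -\ln E_{\theta_0}\left[\frac{f(y \mid \theta)}{f(y \mid \theta_0)}\right]
= -\ln \int \frac{f(y \mid \theta)}{f(y \mid \theta_0)} f(y \mid \theta_0)\, dy
= -\ln \int f(y \mid \theta)\, dy = 0,
\]
so that $E_{\theta_0}[\ln f(y \mid \theta_0)] > E_{\theta_0}[\ln f(y \mid \theta)]$ for each $\theta \ne \theta_0$.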
5.3.3.2 Consistency and asymptotic normality

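Provided the model is well-specified, we have that $\hat\theta_T \xrightarrow{p} \theta_0$ and even $\hat\theta_T \xrightarrow{a.s.} \theta_0$, under regularity conditions. One example of conditions required to obtain weak consistency is that the following uniform weak law of large numbers holds: for every $\varepsilon > 0$,
\[ \lim_{T\to\infty} \Pr\left(\sup_{\theta\in\Theta} \left|\ell_T(\theta) - E(\ell(\theta))\right| > \varepsilon\right) = 0, \]
where $\ell_T(\theta) \equiv \frac{1}{T} \ln L_T(\theta)$ is the sample average of the log-likelihood.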
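Next, consider again the asymptotic expansion in Eq. (5.4), which can be elaborated, so as to have,
\[
\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} -\left[\frac{1}{T} \nabla_{\theta\theta} \ln L_T(\theta_0)\right]^{-1} \frac{1}{\sqrt{T}} \nabla_\theta \ln L_T(\theta_0)
= -\left[\frac{1}{T} \sum_{t=1}^{T} \nabla_{\theta\theta} \ell_t(\theta_0)\right]^{-1} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0).
\]
By the law of large numbers reviewed in the Appendix (weak law no. 1),
\[ \frac{1}{T} \sum_{t=1}^{T} \nabla_{\theta\theta} \ell_t(\theta_0) \xrightarrow{p} E_{\theta_0}\left[\nabla_{\theta\theta} \ell_t(\theta_0)\right] = -J(\theta_0). \]
Therefore, asymptotically,
\[ \sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} J(\theta_0)^{-1} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0). \]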
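We also have,
\[ \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0) \xrightarrow{d} N(0, J(\theta_0)). \]
Indeed, let $\overline{\nabla_\theta \ell}(\theta_0)_T = \frac{1}{T} \sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0)$, and note that $E(\nabla_\theta \ell_t(\theta_0)) = 0$. Then, by the central limit theorem reviewed in the Appendix:
\[
\frac{\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0)}{\sqrt{\mathrm{var}[\nabla_\theta \ell_t(\theta_0)]}}
= \frac{\sqrt{T}\left(\overline{\nabla_\theta \ell}(\theta_0)_T - E(\nabla_\theta \ell_t(\theta_0))\right)}{\sqrt{\mathrm{var}[\nabla_\theta \ell_t(\theta_0)]}},
\]
where, for each $t$, $\mathrm{var}[\nabla_\theta \ell_t(\theta_0)] = J(\theta_0)$.

Finally, by Slutsky's theorem reviewed in the Appendix,
\[ \sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} N\left(0, J(\theta_0)^{-1}\right). \]
Therefore, the ML estimator attains the Cramer-Rao lower bound.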
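The asymptotic result can be illustrated by a small Monte Carlo experiment. The sketch below again assumes an exponential model with rate $\theta_0$ (an illustrative choice, with $J(\theta_0) = 1/\theta_0^2$), and compares the simulated variance of $\sqrt{T}(\hat\theta_T - \theta_0)$ with $J(\theta_0)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, T, n_rep = 2.0, 500, 2_000      # assumed design of the experiment

# Each row is one sample of size T; the exponential MLE is 1 / sample mean.
samples = rng.exponential(scale=1.0 / theta0, size=(n_rep, T))
theta_hat = 1.0 / samples.mean(axis=1)

z = np.sqrt(T) * (theta_hat - theta0)   # scaled estimation errors
fisher_info = 1.0 / theta0**2           # J(theta0) for this model
print(z.mean(), z.var(), 1.0 / fisher_info)
```

With these settings the simulated variance should be close to $J(\theta_0)^{-1} = 4$, up to simulation noise and a small finite-sample bias.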

5.4 M-estimators
Consider a function $g$ of the unknown parameters $\theta$. Given a function $\Psi$, an M-estimator of the function $g(\theta)$ is the solution to,
\[ \max_{g\in G} \sum_{t=1}^{T} \Psi(x_t, y_t; g), \]
where $y$ and $x$ are as in Section 5.2.1. We assume that a solution to this problem exists, that it is interior and that it is unique. Let us denote the M-estimator with $\hat{g}_T(x_1^T, y_1^T)$. Naturally, the M-estimator satisfies the following first order conditions,
\[ 0 = \frac{1}{T} \sum_{t=1}^{T} \nabla_g \Psi\left(y_t, x_t; \hat{g}_T(x_1^T, y_1^T)\right). \]
To simplify the presentation, we assume that $(x, y)$ are independent in time, and that they have the same law. By the law of large numbers,
\[ \frac{1}{T} \sum_{t=1}^{T} \Psi(y_t, x_t; g) \xrightarrow{p} \iint \Psi(y, x; g)\, dF(x, y) = \iint \Psi(y, x; g)\, dF(y \mid x)\, dZ(x) \equiv E_x E_0\left[\Psi(y, x; g)\right], \]
where $E_0$ is the expectation operator taken with respect to the true conditional law of $y$ given $x$ and $E_x$ is the expectation operator taken with respect to the true marginal law of $x$. The limit problem is,
\[ g_\infty = g_\infty(\theta_0) = \arg\max_{g\in G} E_x E_0\left[\Psi(y, x; g)\right]. \]
Under standard regularity conditions,² there exists a sequence of M-estimators $\hat{g}_T(x, y)$ converging a.s. to $g_\infty = g_\infty(\theta_0)$. Under additional regularity conditions, the M-estimator is also asymptotically normal:

Theorem 5.1: Let $I \equiv E_x E_0\left\{\nabla_g \Psi(y, x; g_\infty(\theta_0)) \left[\nabla_g \Psi(y, x; g_\infty(\theta_0))\right]^\top\right\}$ and assume that the matrix $J \equiv E_x E_0\left[-\nabla_{gg} \Psi(y, x; g)\right]$ exists and has an inverse. We have,
\[ \sqrt{T}\left(\hat{g}_T - g_\infty(\theta_0)\right) \xrightarrow{d} N\left(0, J^{-1} I J^{-1}\right). \]
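Sketch of the proof. The M-estimator satisfies the following first order conditions,
\[
0 = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_g \Psi(y_t, x_t; \hat{g}_T)
\overset{d}{=} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_g \Psi(y_t, x_t; g_\infty) + \sqrt{T} \left[\frac{1}{T} \sum_{t=1}^{T} \nabla_{gg} \Psi(y_t, x_t; g_\infty)\right] \cdot (\hat{g}_T - g_\infty).
\]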
2 $G$ is compact; $\Psi$ is continuous with respect to $g$ and integrable with respect to the true law, for each $g$; $\frac{1}{T} \sum_{t=1}^{T} \Psi(y_t, x_t; g) \xrightarrow{a.s.} E_x E_0[\Psi(y, x; g)]$ uniformly on $G$; the limit problem has a unique solution $g_\infty = g_\infty(\theta_0)$.
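By rearranging terms,
\[
\begin{aligned}
\sqrt{T}(\hat{g}_T - g_\infty) &\overset{d}{=} -\left[\frac{1}{T} \sum_{t=1}^{T} \nabla_{gg} \Psi(y_t, x_t; g_\infty)\right]^{-1} \cdot \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_g \Psi(y_t, x_t; g_\infty) \\
&\overset{d}{=} \left[E_x E_0\left(-\nabla_{gg} \Psi(y, x; g)\right)\right]^{-1} \cdot \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_g \Psi(y_t, x_t; g_\infty) \\
&\overset{d}{=} J^{-1} \cdot \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_g \Psi(y_t, x_t; g_\infty).
\end{aligned}
\]
By the limiting problem, $E_x E_0[\nabla_g \Psi(y, x; g_\infty)] = 0$. Then, $\mathrm{var}(\nabla_g \Psi) = E\left\{\nabla_g \Psi \cdot [\nabla_g \Psi]^\top\right\} = I$, and, then,
\[ \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_g \Psi(y_t, x_t; g_\infty) \xrightarrow{d} N(0, I). \]
The result follows by Slutsky's theorem and the symmetry of $J$.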
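One simple example of an M-estimator is the Nonlinear Least Squares estimator,
\[ \hat\theta_T = \arg\min_{\theta\in\Theta} \sum_{t=1}^{T} \left[y_t - m(x_t;\theta)\right]^2, \]
for some function $m$. In this case, $\Psi(x, y;\theta) = -[y - m(x;\theta)]^2$, since M-estimators are defined above as maximizers.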
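A minimal sketch of the Nonlinear Least Squares estimator follows. The regression function $m(x;\theta) = a e^{bx}$, the simulated data and the Gauss-Newton solver are illustrative assumptions made here, not choices taken from the notes.

```python
import numpy as np

def m(x, theta):
    """Hypothetical regression function m(x; theta) = a * exp(b * x)."""
    a, b = theta
    return a * np.exp(b * x)

def jacobian(x, theta):
    """Derivatives of m with respect to (a, b), one row per observation."""
    a, b = theta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])

def nlls_gauss_newton(x, y, theta_init, n_iter=50):
    """Minimize sum_t [y_t - m(x_t; theta)]^2 by Gauss-Newton steps."""
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(n_iter):
        resid = y - m(x, theta)
        # Each step is a linear least squares problem: Jacobian * step ~ residuals.
        step, *_ = np.linalg.lstsq(jacobian(x, theta), resid, rcond=None)
        theta = theta + step
    return theta

rng = np.random.default_rng(0)
theta_true = np.array([2.0, 0.5])       # assumed true parameters
x = rng.uniform(0.0, 2.0, size=500)
y = m(x, theta_true) + 0.1 * rng.standard_normal(500)

theta_hat = nlls_gauss_newton(x, y, theta_init=[1.5, 0.4])
print(theta_hat)
```

Gauss-Newton is only one possible optimizer; it is used here because each step is an ordinary least squares problem, and it needs a reasonable starting value.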
5.5 Pseudo, or quasi, maximum likelihood

The maximum likelihood estimator is an M-estimator: set Ψ = ln L, the log-likelihood function.
Indeed, assume the model is well-specified, in which case J = I, which confirms we are back
to the MLE.
Next, suppose that we implement the MLE to estimate a model, when in fact the model is misspecified in that the true DGP $\ell_0(y_t \mid x_t)$ does not belong to the family of laws spanned by our model,
\[ \ell_0(y_t \mid x_t) \notin (M) = \left\{f(y_t \mid x_t;\theta),\ \theta \in \Theta\right\}. \]
Suppose we insist on maximizing $\Psi = \ln L$, where $L = \prod_t f(y_t \mid x_t;\theta)$. In this case,
\[ \sqrt{T}(\hat\theta_T - \theta_0^*) \xrightarrow{d} N\left(0, J^{-1} I J^{-1}\right), \]
where $\theta_0^*$ is the “pseudo-true” value,³ and
\[ J = -E_x E_0\left[\nabla_{\theta\theta} \ln f(y_t \mid y_{t-1};\theta_0^*)\right], \quad I = E_x E_0\left\{\nabla_\theta \ln f(y_t \mid y_{t-1};\theta_0^*) \cdot \left[\nabla_\theta \ln f(y_t \mid y_{t-1};\theta_0^*)\right]^\top\right\}. \]
In the presence of specification errors, $J \ne I$. Comparing the two estimated matrices can therefore help detect specification errors. Finally, note that in this general case, the variance-covariance
3 That is, $\theta_0^*$ is, clearly, the solution to some misspecified limiting problem. This $\theta_0^*$ has an appealing interpretation in terms of some entropy distance minimizer.
matrix $J^{-1} I J^{-1}$ depends on the unknown law of $(y_t, x_t)$. To assess the precision of the estimates of $\hat{g}_T$, one needs to estimate such a variance-covariance matrix. A common practice is to use the following a.s. consistent estimators,
\[ \hat{J} = -\frac{1}{T} \sum_{t=1}^{T} \nabla_{gg} \Psi(y_t, x_t; \hat{g}_T), \quad \text{and} \quad \hat{I} = \frac{1}{T} \sum_{t=1}^{T} \nabla_g \Psi(y_t, x_t; \hat{g}_T) \left[\nabla_g \Psi(y_t, x_t; \hat{g}_T)\right]^\top. \]
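The two estimators can be combined into a "sandwich" variance estimate, as in the following sketch. The example is an assumption made for illustration: a location parameter is estimated by pseudo-ML under a unit-variance Gaussian density, while the data actually have variance 4, so that $J \ne I$.

```python
import numpy as np

def sandwich_variance(scores, hessians):
    """
    scores   : (T, p) array of per-observation gradients of Psi at the estimate
    hessians : (T, p, p) array of per-observation Hessians of Psi at the estimate
    Returns the estimate of J^{-1} I J^{-1}, together with J_hat and I_hat.
    """
    T = scores.shape[0]
    J_hat = -hessians.mean(axis=0)
    I_hat = scores.T @ scores / T
    J_inv = np.linalg.inv(J_hat)
    return J_inv @ I_hat @ J_inv, J_hat, I_hat

rng = np.random.default_rng(2)
y = rng.normal(loc=1.0, scale=2.0, size=50_000)   # true variance is 4, not 1

# Pseudo log-likelihood: Psi = -0.5 * (y - theta)^2, maximized at the sample mean.
theta_hat = y.mean()
scores = (y - theta_hat).reshape(-1, 1)
hessians = -np.ones((y.size, 1, 1))

V, J_hat, I_hat = sandwich_variance(scores, hessians)
print(V[0, 0], J_hat[0, 0], I_hat[0, 0])
```

Here $\hat{J} = 1$ while $\hat{I}$ is close to 4, so the sandwich variance is close to 4, whereas the naive $\hat{J}^{-1}$ would understate it.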
5.6 GMM
Economic theory often places restrictions on models that have the following format,
E [h (yt ; θ0 )] = 0q , (5.5)
where $h: \mathbb{R}^n \times \Theta \to \mathbb{R}^q$, $\theta_0$ is the true parameter vector, $y_t$ is the $n$-dimensional vector of the observable variables and $\Theta \subseteq \mathbb{R}^p$. Typically, then, the MLE cannot be used to estimate $\theta_0$. Moreover, MLE requires specifying a density function. Hansen (1982) proposed the following Generalized Method of Moments (GMM) estimation procedure. Consider the sample counterpart to the population moment in Eq. (5.5),
\[ \bar{h}(y_1^T;\theta) = \frac{1}{T} \sum_{t=1}^{T} h(y_t;\theta), \tag{5.6} \]
where we have rewritten $h$ as a function of the parameter vector $\theta \in \Theta$. The basic idea of GMM is to find a $\theta$ which makes $\bar{h}(y_1^T;\theta)$ as close as possible to zero. Precisely, we have,
Definition (GMM estimator): The GMM estimator is the sequence $\hat\theta_T$ satisfying,
\[ \hat\theta_T = \arg\min_{\theta\in\Theta\subseteq\mathbb{R}^p} \underbrace{\bar{h}(y_1^T;\theta)^\top}_{1\times q} \cdot \underbrace{W_T}_{q\times q} \cdot \underbrace{\bar{h}(y_1^T;\theta)}_{q\times 1}, \]
where $\{W_T\}$ is a sequence of weighting matrices, with elements that may depend on the observations.
When p = q, we say the GMM is just-identified, and is, simply, the MM, satisfying:
θ̂ T : h̄(y1T ; θ̂T ) = 0q .
When p < q, we say the GMM estimator imposes overidentifying restrictions.

We analyze the i.i.d. case only. Under regularity conditions, there exists a matrix $W_T$ that minimizes the asymptotic variance of the GMM estimator, which satisfies asymptotically,
\[ W = \lim_{T\to\infty} \left[T \cdot E\left(\bar{h}(y_1^T;\hat\theta_T) \cdot \bar{h}(y_1^T;\hat\theta_T)^\top\right)\right]^{-1} \equiv \Sigma_0^{-1}. \tag{5.7} \]
An estimator of $\Sigma_0$ can be:
\[ \Sigma_T = \frac{1}{T} \sum_{t=1}^{T} \left[h(y_t;\hat\theta_T) \cdot h(y_t;\hat\theta_T)^\top\right]. \]
Note that $\hat\theta_T$ depends on the weighting matrix $\Sigma_T$, and the weighting matrix $\Sigma_T$ depends on $\hat\theta_T$. Therefore, we need to implement an iterative procedure. The more one iterates, the less the final outcome is likely to depend on the initial weighting matrix $\Sigma_T^{(0)}$. For example, one can start with $\Sigma_T^{(0)} = I_q$.
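The iterative procedure can be sketched as follows. The example is an assumption made for illustration: the mean $\mu$ of an exponential law is estimated from $q = 2$ moment conditions, $E[y - \mu] = 0$ and $E[y^2 - 2\mu^2] = 0$, with $p = 1$. The Gauss-Newton inner solver and all function names are hypothetical choices. Note that the code stores the Jacobian as a $q \times p$ matrix, the transpose of the $p \times q$ convention used in the text.

```python
import numpy as np

def moments(y, mu):
    """h(y_t; mu): two moment conditions for an exponential law with mean mu."""
    return np.column_stack([y - mu, y**2 - 2.0 * mu**2])     # shape (T, q)

def jacobian(mu):
    """Derivative of h_bar with respect to mu, stored as a (q x p) matrix."""
    return np.array([[-1.0], [-4.0 * mu]])

def gmm_step(y, W, mu_init, n_iter=50):
    """Minimize h_bar' W h_bar for a given W by Gauss-Newton iterations."""
    mu = mu_init
    for _ in range(n_iter):
        h_bar = moments(y, mu).mean(axis=0)
        D = jacobian(mu)
        mu = mu - np.linalg.solve(D.T @ W @ D, D.T @ W @ h_bar).item()
    return mu

def iterated_gmm(y, n_rounds=5):
    """Alternate between estimating mu and updating the weighting matrix."""
    W = np.eye(2)                        # start from Sigma_T^(0) = I_q
    mu = y.mean()
    for _ in range(n_rounds):
        mu = gmm_step(y, W, mu)
        h = moments(y, mu)
        Sigma_T = h.T @ h / y.size       # estimator of Sigma_0
        W = np.linalg.inv(Sigma_T)
    return mu, W

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.5, size=20_000)    # assumed true mean 1.5
mu_hat, W_hat = iterated_gmm(y)

h_bar = moments(y, mu_hat).mean(axis=0)
C_T = y.size * h_bar @ W_hat @ h_bar           # overidentification statistic
print(mu_hat, C_T)
```

The last line also computes the quadratic form $T\,\bar{h}^\top \Sigma_T^{-1} \bar{h}$ at the estimate, which is the statistic used for testing overidentifying restrictions.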
We have:
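Theorem 5.2: Suppose we are given a sequence of GMM estimators $\hat\theta_T$ with weighting matrix as in Eq. (5.7), and such that $\hat\theta_T \xrightarrow{p} \theta_0$. We have,
\[ \sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} N\left(0_p, \left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1}\right), \quad \text{where } h_\theta \equiv \nabla_\theta h(y;\theta_0). \]
Sketch of the proof: The assumption that $\hat\theta_T \xrightarrow{p} \theta_0$ is easy to check under mild regularity conditions. Moreover, the GMM satisfies,
\[ 0_p = \underbrace{\nabla_\theta \bar{h}(y_1^T;\hat\theta_T)}_{p\times q}\ \underbrace{\Sigma_T^{-1}}_{q\times q}\ \underbrace{\bar{h}(y_1^T;\hat\theta_T)}_{q\times 1}. \tag{5.8} \]
Eq. (5.8) confirms that if $p = q$, the GMM satisfies $\hat\theta_T: \bar{h}(y_1^T;\hat\theta_T) = 0$. Indeed, $\nabla_\theta \bar{h}\, \Sigma_T^{-1}$ is full-rank with $p = q$, and Eq. (5.8) can only be satisfied with $\bar{h} = 0$. In the general case, $q > p$, we have,
\[ \underbrace{\sqrt{T}\, \bar{h}(y_1^T;\hat\theta_T)}_{q\times 1} = \underbrace{\sqrt{T}\, \bar{h}(y_1^T;\theta_0)}_{q\times 1} + \underbrace{\left[\nabla_\theta \bar{h}(y_1^T;\theta_0)\right]^\top}_{q\times p} \sqrt{T}(\hat\theta_T - \theta_0) + o_p(1). \]
By premultiplying both sides of the previous equality by $\nabla_\theta \bar{h}(y_1^T;\hat\theta_T)\, \Sigma_T^{-1}$,
\[
\begin{aligned}
&\sqrt{T}\, \nabla_\theta \bar{h}(y_1^T;\hat\theta_T)\, \Sigma_T^{-1} \cdot \bar{h}(y_1^T;\hat\theta_T) \\
&\quad = \sqrt{T}\, \nabla_\theta \bar{h}(y_1^T;\hat\theta_T)\, \Sigma_T^{-1} \cdot \bar{h}(y_1^T;\theta_0) + \nabla_\theta \bar{h}(y_1^T;\hat\theta_T)\, \Sigma_T^{-1} \cdot \left[\nabla_\theta \bar{h}(y_1^T;\theta_0)\right]^\top \sqrt{T}(\hat\theta_T - \theta_0) + o_p(1).
\end{aligned}
\]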
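The l.h.s. of this equality is zero by the first order conditions in Eq. (5.8). By rearranging terms,
\[
\begin{aligned}
\sqrt{T}(\hat\theta_T - \theta_0)
&\overset{d}{=} -\left[\nabla_\theta \bar{h}(y_1^T;\theta_0)\, \Sigma_T^{-1}\, \nabla_\theta \bar{h}(y_1^T;\theta_0)^\top\right]^{-1} \nabla_\theta \bar{h}(y_1^T;\hat\theta_T)\, \Sigma_T^{-1} \cdot \sqrt{T}\, \bar{h}(y_1^T;\theta_0) \\
&= -\left[\frac{1}{T} \sum_{t=1}^{T} \nabla_\theta h(y_t;\hat\theta_T)\, \Sigma_T^{-1}\, \frac{1}{T} \sum_{t=1}^{T} \left[\nabla_\theta h(y_t;\hat\theta_T)\right]^\top\right]^{-1} \frac{1}{T} \sum_{t=1}^{T} \nabla_\theta h(y_t;\hat\theta_T)\, \Sigma_T^{-1}\, \sqrt{T}\, \bar{h}(y_1^T;\theta_0) \\
&\overset{d}{=} -\left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1} E(h_\theta)\, \Sigma_0^{-1} \cdot \frac{1}{\sqrt{T}} \sum_{t=1}^{T} h(y_t;\theta_0).
\end{aligned}
\]
We have: $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} h(y_t;\theta_0) \xrightarrow{d} N(E(h), \mathrm{var}(h))$, where, by Eq. (5.5), $E(h) = 0$, and $\mathrm{var}(h) = E(h \cdot h^\top) = \Sigma_0$. Hence:
\[ \frac{1}{\sqrt{T}} \sum_{t=1}^{T} h(y_t;\theta_0) \xrightarrow{d} N(0, \Sigma_0). \]
Therefore, $\sqrt{T}(\hat\theta_T - \theta_0)$ is asymptotically normal with expectation $0_p$, and variance,
\[
\left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1} E(h_\theta)\, \Sigma_0^{-1}\, \Sigma_0\, \Sigma_0^{-1\top} E(h_\theta)^\top \left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1\top}
= \left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1}.
\]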
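A widely used global specification test is that of the celebrated “overidentifying restrictions.” Consider the following intuitive result:
\[ \sqrt{T}\, \bar{h}(y_1^T;\theta_0)^\top\, \Sigma_0^{-1}\, \sqrt{T}\, \bar{h}(y_1^T;\theta_0) \xrightarrow{d} \chi^2(q). \]
Would we be expecting the same, if we were to replace the true parameter $\theta_0$ with the GMM estimator $\hat\theta_T$, which is, anyway, a consistent estimator for $\theta_0$? The answer is no. Define:
\[ C_T = \sqrt{T}\, \bar{h}(y_1^T;\hat\theta_T)^\top\, \Sigma_T^{-1} \cdot \sqrt{T}\, \bar{h}(y_1^T;\hat\theta_T). \]
We have,
\[
\begin{aligned}
\sqrt{T}\, \bar{h}(y_1^T;\hat\theta_T)
&\overset{d}{=} \sqrt{T}\, \bar{h}(y_1^T;\theta_0) + \left[\nabla_\theta \bar{h}(y_1^T;\theta_0)\right]^\top \sqrt{T}(\hat\theta_T - \theta_0) \\
&\overset{d}{=} \sqrt{T}\, \bar{h}(y_1^T;\theta_0) - \left[\nabla_\theta \bar{h}(y_1^T;\theta_0)\right]^\top \left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1} E(h_\theta)\, \Sigma_0^{-1} \cdot \sqrt{T}\, \bar{h}(y_1^T;\theta_0) \\
&\overset{d}{=} \sqrt{T}\, \bar{h}(y_1^T;\theta_0) - E(h_\theta)^\top \left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1} E(h_\theta)\, \Sigma_0^{-1} \cdot \sqrt{T}\, \bar{h}(y_1^T;\theta_0) \\
&= \underbrace{(I_q - P_q)}_{q\times q}\ \underbrace{\sqrt{T}\, \bar{h}(y_1^T;\theta_0)}_{q\times 1},
\end{aligned}
\]
and
\[ P_q \equiv E(h_\theta)^\top \left[E(h_\theta)\, \Sigma_0^{-1}\, E(h_\theta)^\top\right]^{-1} E(h_\theta)\, \Sigma_0^{-1} \]
is the orthogonal projector onto the space generated by the columns of $E(h_\theta)^\top$, with respect to the inner product $\Sigma_0^{-1}$. Thus, we have shown that,
\[ C_T \overset{d}{=} \sqrt{T}\, \bar{h}(y_1^T;\theta_0)^\top (I_q - P_q)^\top\, \Sigma_T^{-1}\, (I_q - P_q)\, \sqrt{T}\, \bar{h}(y_1^T;\theta_0). \]
But,
\[ \sqrt{T}\, \bar{h}(y_1^T;\theta_0) \xrightarrow{d} N(0, \Sigma_0), \]
and, by a classical result,
\[ C_T \xrightarrow{d} \chi^2(q - p). \]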
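The "classical result" behind the $\chi^2(q - p)$ limit can be checked with a few lines of linear algebra. The sketch below uses randomly generated matrices as stand-ins for $E(h_\theta)$ (stored as $p \times q$) and $\Sigma_0$; these inputs are purely illustrative. It verifies that $P_q$ is idempotent, and that the quadratic form in a standardized Gaussian vector has an idempotent matrix of rank $q - p$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 5

D = rng.normal(size=(p, q))                  # stand-in for E(h_theta), p x q
A = rng.normal(size=(q, q))
Sigma0 = A @ A.T + q * np.eye(q)             # a symmetric positive definite Sigma_0
Sigma0_inv = np.linalg.inv(Sigma0)

# Projector P_q as defined in the text.
P = D.T @ np.linalg.inv(D @ Sigma0_inv @ D.T) @ D @ Sigma0_inv
I_q = np.eye(q)

# Write sqrt(T) h_bar = L u with u ~ N(0, I_q) and Sigma_0 = L L'.
# Then C_T behaves like u' M u, with M given below.
L = np.linalg.cholesky(Sigma0)
M = L.T @ (I_q - P).T @ Sigma0_inv @ (I_q - P) @ L

print(np.allclose(P @ P, P), np.allclose(M @ M, M), np.trace(M))
```

Since $M$ is symmetric and idempotent with trace $q - p$, the quadratic form $u^\top M u$ is distributed as $\chi^2(q - p)$.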
Hansen and Singleton (1982, 1983) started the literature on the estimation and testing of dynamic asset pricing models within a fully articulated rational expectations framework. Consider the classical system of Euler equations arising in the Lucas tree,
\[ E\left[\beta\, \frac{u'(c_{t+1})}{u'(c_t)}\, (1 + r_{i,t+1}) - 1 \,\Big|\, F_t\right] = 0, \quad i = 1, \cdots, m, \]
where $u$ is the utility function of the representative agent, $r_i$ is the return on asset $i$, $\beta$ is the time-discount factor, $F_t$ is the information set as of time $t$, and $m$ is the number of assets. Consider the CRRA utility function, $u(x) = x^{1-\eta} / (1 - \eta)$. If the model is well-specified, then, there exist some $\beta_0$ and $\eta_0$ such that:
\[ E\left[\beta_0 \left(\frac{c_{t+1}}{c_t}\right)^{-\eta_0} (1 + r_{i,t+1}) - 1 \,\Big|\, F_t\right] = 0, \quad i = 1, \cdots, m. \]
To sum up, the dimension of the parameter vector is $p = 2$. To estimate the true parameter vector $\theta_0 \equiv (\beta_0, \eta_0)$, we may build up a system of orthogonality conditions. This system can be based on projecting observable variables predicted by the model onto other variables, some “instruments” included in the information set $F_t$:
\[ E\left[h(y_t;\theta_0)\right] = 0, \]
where, for some vector of $z$ instruments, say, $In_t = [i_{1,t}, \cdots, i_{z,t}]^\top$,
\[
\underbrace{h(y_t;\theta)}_{q\times 1} =
\begin{pmatrix}
\left[\beta \left(\frac{c_{t+1}}{c_t}\right)^{-\eta} (1 + r_{1,t+1}) - 1\right] \cdot In_t \\
\vdots \\
\left[\beta \left(\frac{c_{t+1}}{c_t}\right)^{-\eta} (1 + r_{m,t+1}) - 1\right] \cdot In_t
\end{pmatrix},
\quad q = m \cdot z. \tag{5.9}
\]
The instruments used to produce the orthogonality restrictions may include constants, past values of consumption growth, $\frac{c_{t+1}}{c_t}$, or even past returns.
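The moment function in Eq. (5.9) can be coded in a few lines. In the sketch below, the array layout, the function name `euler_moments` and the synthetic data are assumptions made for illustration; the synthetic returns are built so that the Euler equations hold exactly at $(\beta_0, \eta_0)$, which gives a simple check.

```python
import numpy as np

def euler_moments(beta, eta, cons_growth, gross_returns, instruments):
    """
    cons_growth   : (T,)   consumption growth c_{t+1} / c_t
    gross_returns : (T, m) gross returns 1 + r_{i,t+1}
    instruments   : (T, z) instruments In_t, known at time t
    Returns a (T, m*z) array whose t-th row is h(y_t; theta) in Eq. (5.9).
    """
    sdf = beta * cons_growth ** (-eta)                  # pricing kernel under CRRA
    errors = sdf[:, None] * gross_returns - 1.0         # (T, m) Euler equation errors
    h = errors[:, :, None] * instruments[:, None, :]    # (T, m, z) errors times instruments
    return h.reshape(h.shape[0], -1)

rng = np.random.default_rng(0)
T, m_assets, z = 200, 3, 2
beta0, eta0 = 0.97, 2.0                                 # assumed true parameters

growth = np.exp(rng.normal(0.02, 0.02, size=T))
# Synthetic returns that price exactly at (beta0, eta0), for checking purposes only.
returns = np.repeat((1.0 / (beta0 * growth ** (-eta0)))[:, None], m_assets, axis=1)
instruments = np.column_stack([np.ones(T), rng.normal(size=T)])

h_true = euler_moments(beta0, eta0, growth, returns, instruments)
h_other = euler_moments(0.90, 5.0, growth, returns, instruments)
print(h_true.shape, np.abs(h_true).max(), np.abs(h_other.mean(axis=0)).max())
```

Averaging the rows of the returned array over $t$ gives $\bar{h}(y_1^T;\theta)$ in Eq. (5.6), which can then be passed to a GMM objective.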
5.7 Simulation-based estimators

Ideally, MLE should be the preferred estimation method of parametric Markov models, as it leads to first-order efficiency. Yet economic theory places restrictions that make these models problematic to estimate through ML. In these cases, GMM is a natural estimation method. But GMM can be infeasible as well, in situations of interest. Assume, for example, that the data generating process is not i.i.d. Instead, data are generated by the transition function,
\[ y_{t+1} = H(y_t, \epsilon_{t+1};\theta_0), \tag{5.10} \]
where $H: \mathbb{R}^n \times \mathbb{R}^d \times \Theta \to \mathbb{R}^n$, and $\epsilon_t$ is a vector of i.i.d. disturbances in $\mathbb{R}^d$. Assume the econometrician knows the function $H$. Let $z_t = (y_t, y_{t-1}, \cdots, y_{t-l+1})$, $l < \infty$. In many cases of interest, the function $\bar{h}$ in Eq. (5.6) can be written as,
\[ \bar{h}(y_1^T;\theta) = \frac{1}{T} \sum_{t=1}^{T} \underbrace{\left[f_t^* - E(f(z_t,\theta))\right]}_{\equiv\, h(z_t,\theta)}, \tag{5.11} \]
where,
\[ f_t^* = f(z_t,\theta_0), \]
is a vector-valued moment function, or “observation function,” a function that summarizes satisfactorily the data, so to speak. Consider, for example, Eq. (5.9) without the instruments $In_t$, where $f_t^* = (1 + r_{i,t+1})^{-1}$ and $E(f(z_t,\theta)) = \beta E\left(\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}\right)$. Once we identify consumption growth with $y_{t+1}$, $y_{t+1} = \ln \frac{c_{t+1}}{c_t}$, and take the transition law in Eq. (5.10) to be log-normally distributed, as in some basic models we shall see in Part II of these lectures, we can compute $E(f(z_t,\theta))$ in closed form. Needless to say, the GMM estimator is infeasible if we are not able to compute the expectation $E(f(z_t,\theta))$ in closed form, for each $\theta$. Simulation-based methods can make the method of moments feasible in this case.
5.7.1 Three simulation-based estimators

The basic idea underlying simulation-based methods is quite simple. While the moment conditions are too complex to be evaluated analytically, the model in Eq. (5.10) can be simulated. Accordingly, draw $\epsilon_t$ from its distribution, and save the simulated values $\hat\epsilon_t$. Compute recursively,
\[ y_{t+1}^\theta = H(y_t^\theta, \hat\epsilon_{t+1}, \theta), \]
and create simulated moment functions as follows,
\[ f_t^\theta \equiv f(z_t^\theta, \theta). \]
Consider the following parameter estimator,
\[ \theta_T = \arg\min_{\theta\in\Theta} G_T(\theta)^\top W_T\, G_T(\theta), \tag{5.12} \]
where $W_T$ is some weighting matrix, $G_T(\theta)$ is the simulated counterpart to $\bar{h}$ in Eq. (5.11),
\[ G_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \left(f_t^* - \frac{1}{S(T)} \sum_{s=1}^{S(T)} f_s^\theta\right), \]
and $S(T)$ is the simulated sample size, which we write as a function of the sample size $T$, for the purpose of the asymptotic theory.
The estimator $\theta_T$, also known as the Simulated Method of Moments (SMM) estimator, aims to match the sample properties of the actual and simulated processes $f_t^*$ and $f_t^\theta$. It was introduced in a series of works, by McFadden (1989), Pakes and Pollard (1989), Lee and Ingram (1991) and Duffie and Singleton (1993). The simulated pseudo-maximum likelihood method of Laroque and Salanié (1989, 1993, 1994) can also be interpreted as an SMM estimator.
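A minimal sketch of the SMM estimator in Eq. (5.12) follows. The model is an assumption made for illustration: a Gaussian AR(1), $y_{t+1} = \rho y_t + \epsilon_{t+1}$, with moment function $f(z_t) = (y_t^2, y_t y_{t-1})$, identity weighting matrix, and a grid search in place of a numerical optimizer. The simulated shocks are drawn once and reused for every trial value of $\rho$, as described above.

```python
import numpy as np

def simulate_ar1(rho, eps):
    """Iterate y_{t+1} = rho * y_t + eps_{t+1}, starting from zero."""
    y = np.empty(eps.size)
    prev = 0.0
    for t in range(eps.size):
        prev = rho * prev + eps[t]
        y[t] = prev
    return y

def moment_fn(y):
    """Observation function f(z_t) = (y_t^2, y_t * y_{t-1}), one row per date."""
    return np.column_stack([y[1:] ** 2, y[1:] * y[:-1]])

def smm_objective(rho, f_star_mean, eps_sim, W):
    """G_T(rho)' W G_T(rho), with G_T the gap between data and simulated moments."""
    f_sim_mean = moment_fn(simulate_ar1(rho, eps_sim)).mean(axis=0)
    G = f_star_mean - f_sim_mean
    return G @ W @ G

rng = np.random.default_rng(3)
rho0, T, S = 0.6, 4_000, 20_000               # assumed true value and sample sizes
y_obs = simulate_ar1(rho0, rng.standard_normal(T))   # plays the role of the data
eps_sim = rng.standard_normal(S)              # saved draws, reused for every rho

f_star_mean = moment_fn(y_obs).mean(axis=0)
W = np.eye(2)
grid = np.linspace(-0.9, 0.9, 91)
values = [smm_objective(r, f_star_mean, eps_sim, W) for r in grid]
rho_smm = grid[int(np.argmin(values))]
print(rho_smm)
```

Holding the simulated shocks fixed across parameter values keeps the objective function smooth in $\rho$, which matters when a gradient-based optimizer is used instead of a grid.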
A second simulation-based estimator relies on the indirect inference principle (IIP), and was proposed by Gouriéroux, Monfort and Renault (1993) and Smith (1993). Instead of minimizing the distance of some moment conditions, the IIP relies on minimizing the distance between the parameters of an auxiliary, possibly misspecified model, estimated on the actual data and on model-simulated data. For example, consider the following auxiliary parameter estimator,
\[ \beta_T = \arg\max_{\beta} \ln L(y_1^T;\beta), \tag{5.13} \]
where $L$ is the likelihood of some possibly misspecified model. Consider simulating $S$ times the process $y_t$ in Eq. (5.10), and computing,
\[ \beta_T^s(\theta) = \arg\max_{\beta} \ln L(y^s(\theta)_1^T;\beta), \quad s = 1, \cdots, S, \]
where $y^s(\theta)_1^T = (y_t^{\theta,s})_{t=1}^{T}$ are the simulated variables (for $s = 1, \cdots, S$) when the parameter vector is $\theta$. The IIP-based estimator is defined similarly as $\theta_T$ in Eq. (5.12), but with the function $G_T$ given by,
\[ G_T(\theta) = \beta_T - \frac{1}{S} \sum_{s=1}^{S} \beta_T^s(\theta). \tag{5.14} \]
The diagram in Figure 5.1 illustrates the main ideas underlying the IIP.
[Figure 5.1: flow diagram. Upper branch: model $y_t = H(y_{t-1}, \varepsilon_t;\theta)$, then model-simulated data $\tilde{y}(\theta) = (\tilde{y}_1(\theta), \cdots, \tilde{y}_T(\theta))$, then estimation of an auxiliary model on model-simulated data, yielding the auxiliary parameter estimates $\tilde\beta_T(\theta)$. Lower branch: observed data $y = (y_1, \cdots, y_T)$, then estimation of the same auxiliary model on observed data, yielding the auxiliary parameter estimates $\beta_T$. The two branches meet in the indirect inference estimator, $\hat\theta_T \in \arg\min_{\theta\in\Theta} \|\tilde\beta_T(\theta) - \beta_T\|_\Omega$.]
FIGURE 5.1. The Indirect Inference principle. Given the true model $y_t = H(y_{t-1}, \epsilon_t;\theta)$, an estimator of $\theta$ based on the indirect inference principle ($\hat\theta_T$ say) makes the parameters of some auxiliary model $\tilde\beta_T(\hat\theta_T)$ as close as possible to the parameters $\beta_T$ of the same auxiliary model estimated on the observations. That is, $\hat\theta_T = \arg\min_{\theta\in\Theta} \|\tilde\beta_T(\theta) - \beta_T\|_\Omega$, for some norm $\Omega$.
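The indirect inference principle can be sketched on a textbook case. The choices below are assumptions made for illustration: the true model is an MA(1), $y_t = \epsilon_t + \theta \epsilon_{t-1}$, the auxiliary model is a Gaussian AR(1) whose coefficient is estimated by OLS (its conditional pseudo-ML estimate), and the minimization of $G_T(\theta)^2$, with $G_T$ as in Eq. (5.14), is done on a grid.

```python
import numpy as np

def simulate_ma1(theta, eps):
    """y_t = eps_t + theta * eps_{t-1}, built from a vector of shocks."""
    return eps[1:] + theta * eps[:-1]

def auxiliary_estimate(y):
    """OLS slope of y_t on y_{t-1}: the estimate of the auxiliary AR(1) coefficient."""
    return (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])

rng = np.random.default_rng(4)
theta0, T, S = 0.5, 5_000, 10                 # assumed true value, sample size, simulations

y_obs = simulate_ma1(theta0, rng.standard_normal(T + 1))   # plays the role of the data
beta_T = auxiliary_estimate(y_obs)            # auxiliary estimate on observed data

eps_sim = rng.standard_normal((S, T + 1))     # shocks saved once, reused for every theta

def G_T(theta):
    """beta_T minus the average auxiliary estimate over the S simulated paths."""
    beta_sim = [auxiliary_estimate(simulate_ma1(theta, eps_sim[s])) for s in range(S)]
    return beta_T - np.mean(beta_sim)

grid = np.linspace(-0.9, 0.9, 181)
theta_ii = grid[int(np.argmin([G_T(th) ** 2 for th in grid]))]
print(beta_T, theta_ii)
```

The auxiliary AR(1) is misspecified for MA(1) data, yet its coefficient is a monotone function of $\theta$ on the grid used here, which is what allows $\theta$ to be recovered.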
Finally, Gallant and Tauchen (1996) propose a simulation-based estimation method they label efficient method of moments (EMM). Their estimator sets,
\[ G_T(\theta, \beta_T) = \frac{1}{N} \sum_{n=1}^{N} \frac{\partial}{\partial\beta} \ln f\left(y_n^\theta \mid z_{n-1}^\theta; \beta_T\right), \]
where $\frac{\partial}{\partial\beta} \ln f(y \mid z;\beta)$ is the score of some auxiliary model $f$, also known as the score generator, $\beta_T$ is the Pseudo ML estimator of the auxiliary model, and $(y_n^\theta)_{n=1}^{N}$ is a long simulation (i.e. $N$ is very large) of Eq. (5.10), with parameter vector set equal to $\theta$. Finally, the weighting matrix $W_T$ in Eq. (5.12) is taken to be any matrix $I_T^{-1}$ such that $I_T$ converges in probability to:
\[ I = E\left[\left|\frac{\partial}{\partial\beta} \ln f(y_2 \mid z_1;\beta)\right|_2\right]. \tag{5.15} \]
To motivate this choice of $G_T(\theta)$, note that the auxiliary score, $\frac{\partial}{\partial\beta} \ln f(y_t \mid z_{t-1};\beta_T)$, satisfies the following first order conditions:
\[ \frac{1}{T} \sum_{t=1}^{T} \frac{\partial}{\partial\beta} \ln f(y_t \mid z_{t-1};\beta_T) = 0, \]
which is the sample equivalent of
\[ E\left[\frac{\partial}{\partial\beta} \ln f(y_2 \mid z_1;\beta^*)\right] = 0, \]
for some $\beta^*$. Likewise, we must have that with $\theta = \theta_0$, $G_T(\theta_0, \beta_T) = 0$, for large $N$. All in all, we want to find a stochastic process $H(y_t, \epsilon_{t+1}; \cdot)$ in Eq. (5.10), or a parameter vector $\theta$, such that the expectation of the score of the auxiliary model is zero, a defining property of the score, which holds even when the model is misspecified.
5.7.2 Asymptotic normality

We show, heuristically, how asymptotic normality obtains for the three estimators of Section
5.7.1, and then, define conditions under which asymptotic efficiency might obtain for the EMM.
5.7.2.1 SMM
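Let,
\[ \Sigma_0 = \sum_{j=-\infty}^{\infty} E\left[\left(f_t^* - E(f_t^*)\right) \left(f_{t-j}^* - E(f_{t-j}^*)\right)^\top\right], \]
and suppose that
\[ W_T \xrightarrow{p} W_0 = \Sigma_0^{-1}. \]
We now demonstrate that under this condition, as $T \to \infty$ and $S(T) \to \infty$,
\[ \sqrt{T}(\theta_T - \theta_0) \xrightarrow{d} N\left(0_p, (1 + \tau) \left(D_0^\top \Sigma_0^{-1} D_0\right)^{-1}\right), \tag{5.16} \]
where $\tau = \lim_{T\to\infty} \frac{T}{S(T)}$, $D_0 = E(\nabla_\theta G_\infty(\theta_0)) = E\left(\nabla_\theta f_\infty^{\theta_0}\right)$, and the notation $G_\infty$ means that $G_\cdot$ is drawn from its stationary distribution.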
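Indeed, the first order conditions satisfied by the SMM in Eq. (5.12) are,
\[ 0_p = \left[\nabla_\theta G_T(\theta_T)\right]^\top W_T\, G_T(\theta_T) = \left[\nabla_\theta G_T(\theta_T)\right]^\top W_T \cdot \left[G_T(\theta_0) + \nabla_\theta G_T(\theta_0)(\theta_T - \theta_0)\right] + o_p(1). \]
That is,
\[
\begin{aligned}
\sqrt{T}(\theta_T - \theta_0) &\overset{d}{=} -\left[\left[\nabla_\theta G_T(\theta_T)\right]^\top W_T\, \nabla_\theta G_T(\theta_0)\right]^{-1} \left[\nabla_\theta G_T(\theta_T)\right]^\top W_T \cdot \sqrt{T}\, G_T(\theta_0) \\
&\overset{d}{=} -\left(D_0^\top W_0 D_0\right)^{-1} D_0^\top W_0 \cdot \sqrt{T}\, G_T(\theta_0) \\
&= -\left(D_0^\top \Sigma_0^{-1} D_0\right)^{-1} D_0^\top \Sigma_0^{-1} \cdot \sqrt{T}\, G_T(\theta_0).
\end{aligned} \tag{5.17}
\]
We have,
\[
\begin{aligned}
\sqrt{T}\, G_T(\theta_0) &= \sqrt{T} \cdot \frac{1}{T} \sum_{t=1}^{T} \left(f_t^* - \frac{1}{S(T)} \sum_{s=1}^{S(T)} f_s^{\theta_0}\right) \\
&= \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left(f_t^* - E(f_\infty^*)\right) - \frac{\sqrt{T}}{\sqrt{S(T)}} \cdot \frac{1}{\sqrt{S(T)}} \sum_{s=1}^{S(T)} \left(f_s^{\theta_0} - E\left(f_\infty^{\theta_0}\right)\right) \\
&\xrightarrow{d} N\left(0, (1 + \tau) \Sigma_0\right),
\end{aligned}
\]
where we have used the fact that $E(f_\infty^*) = E\left(f_\infty^{\theta_0}\right)$. Using this result in Eq. (5.17) produces the convergence in Eq. (5.16). If $\tau = \lim_{T\to\infty} \frac{T}{S(T)} = 0$ (i.e. if the number of simulations grows faster than the sample size), the SMM estimator is as efficient as the GMM estimator. Finally, and obviously, we need that $\tau = \lim_{T\to\infty} \frac{T}{S(T)} < \infty$: the number of simulations $S(T)$ cannot grow more slowly than the sample size.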
5.7.2.2 Indirect inference
The IIP-based estimator works slightly differently. For this estimator, even if the number of simulations $S$ is fixed, asymptotic normality obtains without requiring $S$ to go to infinity faster than the sample size. Basically, what really matters here is that $ST$ goes to infinity.
By Eq. (5.17), and the discussion in Section 5.7.1, we know that asymptotically, the first order conditions satisfied by the IIP-based estimator are,
\[ \sqrt{T}(\theta_T - \theta_0) \overset{d}{=} -\left(D_0^\top W_0 D_0\right)^{-1} D_0^\top W_0 \cdot \sqrt{T}\, G_T(\theta_0), \]
where $G_T$ is as in Eq. (5.14), $D_0 = \nabla_\theta \beta(\theta)$, and $\beta(\theta)$ is the solution to the limiting problem corresponding to the estimator in Eq. (5.13), viz
\[ \beta(\theta) = \arg\max_{\beta} \left[\lim_{T\to\infty} \frac{1}{T} \ln L(y_1^T;\beta)\right]. \]
We need to find the distribution of $G_T$ in Eq. (5.14). We have,
\[
\begin{aligned}
\sqrt{T}\, G_T(\theta_0) &= \frac{1}{S} \sum_{s=1}^{S} \sqrt{T} \left(\beta_T - \beta_T^s(\theta_0)\right) \\
&= \frac{1}{S} \sum_{s=1}^{S} \sqrt{T} \left[(\beta_T - \beta_0) - \left(\beta_T^s(\theta_0) - \beta_0\right)\right] \\
&= \sqrt{T}(\beta_T - \beta_0) - \frac{1}{S} \sum_{s=1}^{S} \sqrt{T} \left(\beta_T^s(\theta_0) - \beta_0\right),
\end{aligned}
\]
where $\beta_0 = \beta(\theta_0)$. Hence, given the independence of the sample and the simulations,
\[ \sqrt{T}\, G_T(\theta_0) \xrightarrow{d} N\left(0, \left(1 + \frac{1}{S}\right) \cdot \mathrm{Asy.Var}\left(\sqrt{T} \beta_T\right)\right). \]
That is, asymptotically $S$ can be fixed with respect to $T$.
5.7.2.3 Efficient method of moments
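We have,
\[ \theta_T = \arg\min_{\theta} G_T(\theta, \beta_T)^\top W_T\, G_T(\theta, \beta_T), \quad G_T(\theta, \beta_T) = \frac{1}{N} \sum_{n=1}^{N} \frac{\partial}{\partial\beta} \ln f\left(y_n^\theta \mid z_{n-1}^\theta; \beta_T\right). \]
The first order conditions are:
\[
\begin{aligned}
0 &= \nabla_\theta G_T(\theta_T, \beta_T)^\top W_T\, G_T(\theta_T, \beta_T) \\
&\overset{d}{=} \nabla_\theta G_T(\theta_0, \beta_T)^\top W_T \left(G_T(\theta_0, \beta_T) + \nabla_\theta G_T(\theta_0, \beta_T)(\theta_T - \theta_0)\right),
\end{aligned}
\]
or
\[ \sqrt{T}(\theta_T - \theta_0) \overset{d}{=} -\left[\nabla_\theta G_T(\theta_0, \beta_T)^\top W_T\, \nabla_\theta G_T(\theta_0, \beta_T)\right]^{-1} \nabla_\theta G_T(\theta_0, \beta_T)^\top W_T\, \sqrt{T}\, G_T(\theta_0, \beta_T). \]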
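We have, for some $\beta^*$,
\[ \sqrt{T}\, G_T(\theta_0, \beta_T) \overset{d}{=} J \sqrt{T}(\beta_T - \beta^*) \xrightarrow{d} N(0, I), \]
where $J = E\left[\frac{\partial^2}{\partial\beta \partial\beta^\top} \ln f(y_2 \mid y_1;\beta)\right]$ and $I$ is as in Eq. (5.15). Hence,
\[ \sqrt{T}(\theta_T - \theta_0) \xrightarrow{d} N(0, V), \]
where,
\[ V = \left[\nabla_\theta G^\top W\, \nabla_\theta G\right]^{-1} \nabla_\theta G^\top W I W^\top \nabla_\theta G \left[\nabla_\theta G^\top W\, \nabla_\theta G\right]^{-1}. \]
With $W = I^{-1}$, this variance collapses to,
\[ V = \left[\nabla_\theta G^\top I^{-1}\, \nabla_\theta G\right]^{-1}. \tag{5.18} \]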
5.7.2.4 Spanning scores
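This section provides a heuristic discussion of the conditions under which the EMM achieves the Cramer-Rao lower bound. Consider the following definition, which is similar to that in Tauchen (1997). Of a given span of moment conditions $s_f$, say that of the EMM, we say that it also spans the true score if,
\[ \mathrm{var}(s \mid s_f) = 0, \tag{5.19} \]
where $s$ denotes the true score. From Eq. (5.18), we know that the asymptotic variance of the EMM, say $\mathrm{var}_{EMM}$, satisfies:
\[ \mathrm{var}_{EMM}^{-1} \equiv V^{-1} = \nabla_\theta G^\top\, \mathrm{var}(s_f)^{-1}\, \nabla_\theta G. \]
By the linear projection,
\[ s = B s_f + \epsilon, \quad B = \mathrm{cov}(s, s_f)\, \mathrm{var}(s_f)^{-1}, \]
we have,
\[ \mathrm{var}_{MLE}^{-1} = \mathrm{var}(s) = B\, \mathrm{var}(s_f)\, B^\top + \mathrm{var}(s \mid s_f) = \mathrm{cov}(s, s_f)\, \mathrm{var}(s_f)^{-1}\, \mathrm{cov}(s, s_f)^\top + \mathrm{var}(s \mid s_f), \tag{5.20} \]
where $\mathrm{var}_{MLE}$ denotes the asymptotic variance of the MLE. We claim that:
\[ \mathrm{cov}(s, s_f)^\top = \nabla_\theta G. \tag{5.21} \]
Indeed, under regularity conditions,
\[
\begin{aligned}
\nabla_\theta G(\theta_0, \beta^*) &= \frac{\partial}{\partial\theta} \int \frac{\partial}{\partial\beta} \ln f(y;\beta^*)\, p(y, \theta)\, dy\, \bigg|_{\theta=\theta_0} \\
&= \int \frac{\partial}{\partial\beta} \ln f(y;\beta^*)\, \frac{\partial}{\partial\theta} p(y, \theta_0)\, dy \\
&= \int \frac{\partial}{\partial\beta} \ln f(y;\beta^*) \left[\frac{\partial}{\partial\theta} \ln p(y, \theta_0)\right] p(y, \theta_0)\, dy \\
&= \mathrm{cov}(s, s_f)^\top,
\end{aligned}
\]
where $p(y, \theta_0)$ is the true density. Next, replace Eq. (5.21) into Eq. (5.20),
\[ \mathrm{var}_{MLE}^{-1} = \nabla_\theta G^\top\, \mathrm{var}(s_f)^{-1}\, \nabla_\theta G + \mathrm{var}(s \mid s_f) = \mathrm{var}_{EMM}^{-1} + \mathrm{var}(s \mid s_f). \]
Therefore, the EMM estimator achieves the Cramer-Rao lower bound under the spanning condition in Eq. (5.19).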
5.7.3 A fourth simulation-based estimator: Simulated maximum likelihood

Estimating the parameters of stochastic differential equations is a recurrent theme in empirical
finance. Consider a continuous time model,
dy (τ ) = b (y (τ ) ; θ) dτ + Σ (y (τ ) ; θ) dW (τ ) , (5.22)
where W (τ ) is a Brownian motion and b and Σ are two functions guaranteeing a strong solution
to Eq. (5.22). Except in special cases (e.g., the affine models reviewed in Chapter 11), the
likelihood function of the data generated by this process is unknown. We can then use one of
the three estimators we have presented in section 5.7.1. Alternatively, we might use simulated
maximum likelihood, a method introduced in finance by Santa-Clara (1995) (see, also, Brandt
and Santa-Clara, 2002). We only provide the idea of the method, not the asymptotic theory.
Suppose, then, that we observe discretely sampled data generated by Eq. (5.22): $y_0, y_1, \cdots, y_t, \cdots, y_T$, where $T$ is the sample size. We need to know the transition density, say $p(y_{t+1} \mid y_t;\theta)$, to implement maximum likelihood, which we assume we do not know. Consider, then, the Euler approximation to Eq. (5.22),
\[ y_{(k+1)/n} = y_{k/n} + b\left(y_{k/n};\theta\right) \frac{1}{n} + \Sigma\left(y_{k/n};\theta\right) \sqrt{\frac{1}{n}}\, \epsilon_{k+1}, \tag{5.23} \]
where $\epsilon_k$ is a sequence of i.i.d. random variables with expectation zero and unit variance. This stochastic process is defined at the dates $\frac{k}{n}$, for $k$ integer. Let $[Tn]$ denote the integer part of $Tn$, and for $k = 1, \cdots, [Tn]$, set
\[ \hat{y}_\tau^{(n)} = y_{k/n}, \quad \text{if } \frac{k}{n} \le \tau < \frac{k+1}{n}. \]
In other words, we are chopping the time interval between two observations, $[t, t+1]$, into $n$ pieces, and then take $n$ to be large. We know that $\hat{y}_t^{(n)} \Rightarrow y(t)$ as $n \to \infty$, where $\Rightarrow$ denotes “weak convergence,” or “convergence in distribution,” meaning that all finite dimensional distributions of $\hat{y}_t^{(n)}$ converge to those of $y(t)$ as $n \to \infty$. The idea underlying simulated maximum likelihood, then, is to estimate the transition density, $p(y_{t+1} \mid y_t;\theta)$, through simulations of Eq. (5.23), performed using a large value of $n$. Note, we cannot guarantee the transition density is recovered by simulating Eq. (5.23), not even for a large value of $n$. We can only perform an imperfect simulation of Eq. (5.23).
The likelihood function is,
\[ L = p(y_0;\theta) \prod_{t=0}^{T-1} p(y_{t+1} \mid y_t;\theta), \]
where $p(y_0;\theta)$ denotes the marginal density of the first observation, $y_0$.

Let $p_n(y' \mid y;\theta)$ be the transition density of the data generated by Eq. (5.23). Then, if $\epsilon$ is normally distributed,
\[ p_n\left(y_{(k+1)/n} \mid y_{k/n};\theta\right) = \varphi\left(y_{(k+1)/n};\ y_{k/n} + b\left(y_{k/n};\theta\right) \frac{1}{n};\ \Sigma^2\left(y_{k/n};\theta\right) \frac{1}{n}\right), \]
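where $\varphi(u;\mu;\sigma^2)$ denotes the Gaussian density with mean $\mu$ and variance $\sigma^2$. Moreover, we have, approximately,
\[
\begin{aligned}
p(y_{t+1} \mid y_t;\theta) &= \int p_n(y_{t+1} \mid x;\theta)\, p_n(x \mid y_t;\theta)\, dx \\
&= \int \varphi\left(y_{t+1};\ x + b(x;\theta) \frac{1}{n};\ \Sigma^2(x;\theta) \frac{1}{n}\right) p_n(x \mid y_t;\theta)\, dx,
\end{aligned}
\]
where we have set $x = y_{t+1-\frac{1}{n}}$. We may, now, draw values of $x$ from $p_n(x \mid y_t;\theta)$, as explained in a moment, and estimate $p_n(y_{t+1} \mid y_t;\theta)$ through:
\[ p^{n,S}(y_{t+1} \mid y_t;\theta) \equiv \frac{1}{S} \sum_{j=1}^{S} \varphi\left(y_{t+1};\ \tilde{x}^j + b\left(\tilde{x}^j;\theta\right) \frac{1}{n};\ \Sigma^2\left(\tilde{x}^j;\theta\right) \frac{1}{n}\right), \]
where $\tilde{x}^j$ is obtained by iterating Eq. (5.23) from time $t$ to time $t + 1 - \frac{1}{n}$. Under regularity conditions, we have that for all $\theta \in \Theta$, $\sup_{y', y} \left|p^{n,S}(y' \mid y;\theta) - p(y' \mid y;\theta)\right| \to 0$ as $n$ and $S$ get large, with $\frac{\sqrt{n}}{S} \to 0$.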
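The estimator $p^{n,S}$ can be sketched as follows. The test case is an assumption made for illustration: an Ornstein-Uhlenbeck process, $dy = \kappa(\mu - y)d\tau + \sigma dW$, whose exact transition density over a unit interval is Gaussian and can be used as a benchmark. Function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_pdf(u, mean, var):
    """Gaussian density with the given mean and variance, evaluated at u."""
    return np.exp(-0.5 * (u - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def simulated_transition_density(y_next, y_now, drift, diffusion, n, S, rng):
    """
    Estimate p(y_next | y_now) over a unit time interval:
    iterate the Euler scheme n - 1 times for S paths, then average the
    Gaussian density of the last Euler step, evaluated at y_next.
    """
    dt = 1.0 / n
    x = np.full(S, float(y_now))
    for _ in range(n - 1):
        x = x + drift(x) * dt + diffusion(x) * np.sqrt(dt) * rng.standard_normal(S)
    return gaussian_pdf(y_next, x + drift(x) * dt, diffusion(x) ** 2 * dt).mean()

kappa, mu, sigma = 0.5, 0.0, 1.0              # assumed parameter values
drift = lambda x: kappa * (mu - x)
diffusion = lambda x: sigma * np.ones_like(x)

rng = np.random.default_rng(5)
p_sim = simulated_transition_density(0.5, 1.0, drift, diffusion, n=50, S=20_000, rng=rng)

# Exact Ornstein-Uhlenbeck transition density over one unit of time, for comparison.
exact_mean = mu + (1.0 - mu) * np.exp(-kappa)
exact_var = sigma ** 2 * (1.0 - np.exp(-2.0 * kappa)) / (2.0 * kappa)
p_exact = gaussian_pdf(0.5, exact_mean, exact_var)
print(p_sim, p_exact)
```

In an estimation exercise this function would be evaluated at each pair $(y_t, y_{t+1})$ of the sample and for each trial value of $\theta$, with the random draws held fixed across values of $\theta$.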
5.7.4 Advances
The three estimators that we have examined in Sections 5.7.1-5.7.2 are general-purpose, but in general, they do not lead to asymptotic efficiency, unless the true score belongs to the span of the moment conditions, as explained in Section 5.7.2.4. There exist other simulation-based methods, which aim to approximate the likelihood function through simulations (e.g., Lee, 1995; Hajivassiliou and McFadden, 1998): for example, the simulated maximum likelihood estimator in Section 5.7.3 can be used to estimate the parameters of stochastic differential equations. While methods based on simulated likelihood lead to asymptotically efficient estimators, they address specific estimation problems, just as the example of Section 5.7.3 illustrates.
There exist estimators that are both general purpose and that can lead to asymptotic efficiency. Fermanian and Salanié (2004) consider an estimator that relies on approximating the likelihood function through kernel estimates obtained by simulating the model of interest. Carrasco, Chernov, Florens and Ghysels (2007) rely on a “continuum of moment conditions” matching model-based (simulated) characteristic functions to data-based characteristic functions. Altissimo and Mele (2009) propose an estimator based on a continuum of moment conditions, which minimizes a certain distance between conditional densities estimated with the true data and conditional densities estimated with data simulated from the model, where both conditional densities are estimated through kernel methods.
5.7.5 In practice? Latent factors and identification

The estimation theory of this section does not rule out the situation where some of the variables in Eq. (5.10) are unobservable. The principle to follow is very simple: one applies any of the methods we have discussed to those variables simulated out of Eq. (5.10) that correspond to the observed ones. For example, we may want to estimate the following model of the short-term rate $r(\tau)$, discussed at length in Chapter 11:
\[
\begin{cases}
dr(\tau) = \kappa_r\left(\bar{r} - r(\tau)\right) d\tau + \sqrt{v(\tau)}\, dW_1(\tau) \\
dv(\tau) = \kappa_v\left(\bar{v} - v(\tau)\right) d\tau + \xi \sqrt{v(\tau)}\, dW_2(\tau)
\end{cases} \tag{5.24}
\]
where $v(\tau)$ is the instantaneous, stochastic variance of the short-term rate, $W_1$ and $W_2$ are two standard Brownian motions, and the parameter vector of interest is $\theta = [\kappa_r\ \bar{r}\ \kappa_v\ \bar{v}\ \xi]$. Let us pick one of the methods discussed so far, say indirect inference. The logical steps to follow, then, are (i) to simulate Eqs. (5.24), and (ii) to fit an auxiliary model to the short-term rate data simulated out of Eqs. (5.24), choosing $\theta$ so that it is as close as possible to the very same auxiliary model fitted on true data. Note, in doing so, we just have to neglect the volatility data simulated out of Eqs. (5.24), as these data are obviously unobservable.
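Step (i) can be sketched with an Euler discretization of Eqs. (5.24), as below. The scheme, the truncation of the variance at zero inside the square roots (a common device to keep the discretized variance usable, not something prescribed by the notes), and the parameter values are all assumptions made for illustration. Only the short-term rate path is returned, since the variance is treated as unobservable.

```python
import numpy as np

def simulate_short_rate(theta, T, n, r0, v0, rng):
    """
    Euler simulation of Eqs. (5.24) with n steps per unit of time.
    theta = (kappa_r, r_bar, kappa_v, v_bar, xi).
    Returns r at the integer dates 0, 1, ..., T; the variance path is discarded.
    """
    kappa_r, r_bar, kappa_v, v_bar, xi = theta
    dt = 1.0 / n
    r, v = r0, v0
    r_obs = np.empty(T + 1)
    r_obs[0] = r0
    for t in range(T):
        for _ in range(n):
            v_pos = max(v, 0.0)                          # truncation: an assumption
            dW1, dW2 = np.sqrt(dt) * rng.standard_normal(2)
            r = r + kappa_r * (r_bar - r) * dt + np.sqrt(v_pos) * dW1
            v = v + kappa_v * (v_bar - v) * dt + xi * np.sqrt(v_pos) * dW2
        r_obs[t + 1] = r
    return r_obs

rng = np.random.default_rng(6)
theta = (0.5, 0.05, 1.0, 0.0004, 0.02)                   # hypothetical parameter values
r_path = simulate_short_rate(theta, T=100, n=20, r0=0.05, v0=0.0004, rng=rng)
print(r_path[:5])
```

The simulated path `r_path` is what an auxiliary model would be fitted to in step (ii), for each trial value of $\theta$.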
The question arises, therefore, as to whether the auxiliary model one chooses is rich enough to allow identifying the model's parameter vector $\theta$. There might be many combinations of unobserved random processes $v(\tau)$ that are consistent with the likelihood of any given auxiliary model. So which auxiliary model to fit, in practice? Gallant and Tauchen (1996) asked this question a long time ago. Needless to mention, there are no general answers to this question. Very simply, one requires the model to be identifiable, which is likely to happen once the auxiliary model is “rich enough.” In an impressive series of applied work, Gallant and Tauchen and their co-authors have proposed semi-nonparametric score generators, as a way to get as close as possible to a “rich” model. Intuitively, by increasing the order of Hermite expansions, semi-nonparametric scores might converge to the true ones. Alternatively, one might use a continuum of moment conditions, as explained in Section 5.7.4. For example, the nonparametric density estimates in Altissimo and Mele (2009) converge to the true ones, once the bandwidth parameters used to smooth out these estimates are sufficiently small. In the next section, we provide a discussion of how asset prices might help convey information about unobserved processes and lead to statistical efficiency.
5.8 Asset pricing, prediction functions, and statistical inference

We develop conditions that ensure the feasibility of estimation methods in a context where
an unobservable multidimensional process is estimated in conjunction with prediction
functions suggested by asset pricing models.4 We assume that the data generating process is a
multidimensional, partially observed diffusion process, solution to,
dy (τ ) = b (y (τ ) ; θ) dτ + Σ (y (τ ) ; θ) dW (τ ) , (5.25)
where W is a multidimensional process and (b, Σ) satisfy some regularity conditions we single
out below. We analyze situations where the original partially observed system in Eq. (5.25)
can be estimated by augmenting it with a number of observable deterministic functions of the
state. In many situations, such deterministic functions are suggested by asset pricing theories
in a natural way. Typical examples include the price of derivatives or, in general, any functional
of asset prices (such as asset returns, bond yields, implied volatilities).
The idea of using asset pricing predictions to improve the fit of models with unobservable
factors has been explored by, among others, Christensen (1992), Pastorello, Renault and Touzi
(2000), Chernov and Ghysels (2000), Singleton (2001), and Pastorello, Patilea and Renault
(2003).
We consider a standard Markov pricing setting. For fixed t ≥ 0, we let M be the expiration
date of a contingent claim with rational price process $c = \{c(y(\tau), M - \tau)\}_{\tau \in [t,M)}$, and let
$\{z(y(\tau))\}_{\tau \in [t,M]}$ and $\Pi(y)$ be the associated intermediate payoff process and final payoff function,
4 This section is based on an unpublished appendix of Altissimo and Mele (2009).

respectively. Let ∂/ ∂τ +L be the usual infinitesimal generator of the system in Eq. (5.25), taken
under the risk-neutral probability. Then, as we saw in Chapter 4, we have that in a frictionless,
arbitrage-free market, c is the solution to the following partial differential equation:

 ∂
0= + L − R c(y, M − τ ) + z(y), ∀(y, τ ) ∈ Y × [t, M)
∂τ (5.26)

c(y, 0) = Π(y), ∀y ∈ Y
where R ≡ R(y) is the short-term rate. We call prediction function any continuous and twice
differentiable function c(y; M − τ) that solves the partial differential equation and boundary
condition in (5.26). Typical examples of contingent claims with prices satisfying (5.26) are
derivatives.
Next, we augment the system in Eq. (5.25) with d − q prediction functions, where q denotes
the number of the observable variables in Eq. (5.25). Precisely, we let:
\[
C(\tau) \equiv \left( c(y(\tau), M_1 - \tau), \cdots, c(y(\tau), M_{d-q} - \tau) \right), \quad \tau \in [t, M_1],
\]
where $\{M_i\}_{i=1}^{d-q}$ is an increasing sequence of fixed maturity dates. Furthermore, we define the
measurable vector valued function:
\[
\phi(y(\tau); \theta, \gamma) \equiv \left( y^o(\tau), C(y(\tau)) \right), \quad \tau \in [t, M_1], \ (\theta, \gamma) \in \Theta \times \Gamma, \tag{5.27}
\]
where $y^o(\tau)$ denotes the vector of observable variables in Eq. (5.25), and $\Gamma \subset \mathbb{R}^{p_\gamma}$ is a compact
parameter set containing additional parameters. These new parameters arise from the change of
measure leading to the pricing model in Eq. (5.27), and are now part of our estimation problem.
We assume that the pricing model in Eq. (5.27) is correctly specified. That is, all contingent
claim prices in the economy are taken to be generated by the prediction function c(y, M − τ) for
some $(\theta_0, \gamma_0) \in \Theta \times \Gamma$. For simplicity, we also consider a stylized situation in which all contingent
claims have the same contractual characteristics specified by C ≡ (z, Π). More generally, one
may define a series of classes of contingent claims $\{C_j\}_{j=1}^{J}$, where the class of contingent claims
j has provisions specified by $C_j \equiv (z_j, \Pi_j)$. As an example, assets belonging to the class $C_1$ can
be European options, and assets belonging to the class $C_2$ can be bonds. The number of prediction
functions that we would introduce in this case would be equal to $d - q = \sum_{j=1}^{J} M^j$, where $M^j$
is the number of prediction functions within class of assets j. To keep the presentation simple,
we do not consider such a more general situation.
The objective is to define estimators of the parameter vector $(\theta_0, \gamma_0)$, under which
observations were generated. We want to use any of the simulation methods reviewed in Section
5.7 to produce an estimator of $(\theta_0, \gamma_0)$. The idea, as usual, is to make the finite dimensional
distributions of φ implied by the pricing model in Eq. (5.27) and the fundamentals in Eq. (5.26)
as close as possible to the sample counterparts of φ. Let $\Phi \subseteq \mathbb{R}^d$ be the domain on which φ
takes values. As illustrated by Figure 5.2, we want to move from the unfeasible domain Y of
the original state variables in Eq. (5.25) (observables and not) to the domain Φ on which only
observable variables take values. Ideally, we would like to implement such a change in domain
in order to recover as much information as possible about the original unobserved process in
(5.25). Clearly, φ is fully revealing whenever it is globally invertible. However, we will show that
estimation is feasible even when φ is only locally one-to-one.
An important feature of the theory in this section is that it does not hinge upon the avail-
ability of contingent prices data covering the same sample period covered by the observables
[Figure 5.2: diagram of the mapping $\phi(y; \theta_0, \gamma_0)$ from the domain $Y$ to the domain $\Phi$, and of its inverse $\phi^{-1}(y; \theta_0, \gamma_0)$ from $\Phi$ back to $Y$.]
FIGURE 5.2. Asset pricing, the Markov property, and statistical efficiency. $Y$ is the domain on which
the partially observed primitive state process $y \equiv (y^o \ y^u)^\top$ takes values, $\Phi$ is the domain on which
the observed system $\phi \equiv (y^o \ C(y))^\top$ takes values in Markovian economies, and $C(y)$ is a contingent
claim price process in $\mathbb{R}^{d-q^*}$. Let $\phi^c = (y^o, c(y, \ell_1), \cdots, c(y, \ell_{d-q^*}))$, where $\{c(y, \ell_j)\}_{j=1}^{d-q^*}$ forms an
intertemporal cohort of contingent claim prices, as in Definition 5.3. If the local restrictions of φ are
one-to-one and onto, statistical inference about θ and γ can be made, using information about the price
of derivative contracts, $\phi^c$. If φ is also globally invertible, statistical inference can lead to first-order
asymptotic efficiency, once conditioned upon $\phi^c$.
in Eq. (5.25). First, the price of a given contingent claim is typically not available for a long
sample period. As an example, available option data often include option prices with a life span
smaller than the usual sample span of the underlying asset prices. By contrast, it is common
to observe long time series of option prices having the same maturity. Second, the price of a
single contingent claim depends on the time-to-maturity of the claim; therefore, it does not
satisfy the stationarity assumptions maintained in this section. To address these issues, we deal
with data on assets having the same characteristics at each point in time. Precisely, consider
the data generated by the following random processes:
Definition 5.3. (Intertemporal (ℓ, N)-cohort of contingent claim prices) Given a prediction
function c(y; M − τ) and an N-dimensional vector $\ell \equiv (\ell_1, \cdots, \ell_N)$ of fixed times-to-maturity,
an intertemporal (ℓ, N)-cohort of contingent claim prices is any collection of contingent claim
price processes $c(\tau, \ell) \equiv (c(y(\tau), \ell_1), \cdots, c(y(\tau), \ell_N))$ (τ ≥ 0) generated by the pricing model
(5.27).
Consider for example a sample realization of three-month at-the-money option prices, or
a sample realization of six-month zero-coupon bond prices. Long sequences such as the ones
in these examples are common to observe. If these sequences were generated by the pricing
model in Eq. (5.27), as in Definition 5.3, they would be deterministic functions of y, and hence
stationary. We now develop conditions ensuring both feasibility and first-order efficiency of the
class of simulation-based estimators, as applied to this kind of data. Let ā denote the matrix
having the first q rows of Σ, the diffusion matrix in Eq. (5.25). Let ∇C denote the Jacobian of
C with respect to y. We have:
Theorem 5.4. (Asset pricing and Cramer-Rao lower bound) Suppose we observe an
intertemporal (ℓ, d−q)-cohort of contingent claim prices c(τ, ℓ), and that there exist prediction functions
C in $\mathbb{R}^{d-q}$ with the property that for $\theta = \theta_0$ and $\gamma = \gamma_0$,
\[
\det \begin{pmatrix} \bar{a}(\tau) \cdot \Sigma(\tau)^{-1} \\ \nabla C(\tau) \end{pmatrix} \neq 0, \quad P \otimes d\tau\text{-a.s., all } \tau \in [t, t+1], \tag{5.28}
\]
where C satisfies the initial condition $C(t) = c(t, \ell) \equiv (c(y(t), \ell_1), \cdots, c(y(t), \ell_{d-q}))$. Let $\phi^c_t =
(y^o(t), c(y(t), \ell_1), \cdots, c(y(t), \ell_{d-q}))$. Then, any simulation-based estimator applied to $\phi^c_t$ is feasible.
Moreover, assume $\phi^c_t$ is also Markov. Then, any estimator with a span of moment conditions
for $\phi^c_t$ that also spans the true score attains the Cramer-Rao lower bound, with respect to the
fields generated by $\phi^c_t$.
According to Theorem 5.4, any estimator is feasible, whenever φ is locally invertible for a
time span equal to the sampling interval. As Figure 5.2 illustrates, condition (5.28) is satisfied
whenever φ is locally one-to-one and onto.5 If φ is also globally invertible for the same time
span, φc is Markov. The last part of this theorem says that in this case, any estimator is
asymptotically efficient. We emphasize that this conclusion is about first-order efficiency in the
joint estimation of θ and γ given the observations on φc .
Naturally, condition (5.28) does not ensure that φ is globally one-to-one and onto: φ might
have many locally invertible restrictions.6 In practice, φ might fail to be globally invertible
because monotonicity properties of φ may break down in multidimensional diffusion models.
For example, in models with stochastic volatility, option prices can be decreasing in the
underlying asset price (see Bergman, Grundy and Wiener, 1996). In models of the yield curve with
stochastic volatility, to cite a second example, medium-long term bond prices can be increasing
in the short-term rate (see Mele, 2003). These cases might arise as there is no guarantee that
the solution to a stochastic differential system is nondecreasing in the initial condition of one
of its components, which is, instead, always true in the scalar case.
When all components of the vector $y^o$ represent the prices of assets actively traded in frictionless
markets, (5.28) corresponds to a condition ensuring market completeness in the sense of Harrison
and Pliska (1983). As an example, condition (5.28) for Heston’s (1993) model is $\partial c / \partial \sigma \neq 0$,
$P \otimes d\tau$-a.s., where σ denotes the instantaneous volatility of the price process. This condition is
satisfied by Heston’s model. In fact, Romano and Touzi (1997) showed that within a fairly
general class of stochastic volatility models, option prices are always strictly increasing in σ
whenever they are convex in Q. Theorem 5.4 can be used to implement efficient estimators in
other complex multidimensional models. Consider for example a three-factor model of the yield
curve, with state vector (r, σ, ℓ), where r is the short-term rate and σ, ℓ are additional
factors (such as, say, instantaneous short-term rate volatility and a central tendency factor). Let
$u^{(i)} = u(r(\tau), \sigma(\tau), \ell(\tau); M_i - \tau)$ be the time τ rational price of a pure discount bond expiring
at $M_i \geq \tau$, i = 1, 2, and take $M_1 < M_2$. Let $\phi \equiv (r, u^{(1)}, u^{(2)})$. Condition (5.28) for this model
is then,
\[
u^{(1)}_\sigma u^{(2)}_\ell - u^{(1)}_\ell u^{(2)}_\sigma \neq 0, \quad P \otimes d\tau\text{-a.s., } \tau \in [t, t+1], \tag{5.29}
\]
where subscripts denote partial derivatives. It is easily checked that this same condition must be
satisfied by models with correlated Brownian motions and by yet more general models. Classes
of models of the short-term rate for which condition (5.29) holds are more intricate to identify
than in the European option pricing case seen above (see Mele, 2003).
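For the three-factor example, the condition can be traced back to the Jacobian of φ. The following is a sketch of that computation, assuming, as footnote 5 states, that condition (5.28) amounts to a non-vanishing Jacobian determinant of φ with respect to the state (r, σ, ℓ):

```latex
\[
J\phi =
\begin{pmatrix}
1 & 0 & 0 \\
u^{(1)}_r & u^{(1)}_\sigma & u^{(1)}_\ell \\
u^{(2)}_r & u^{(2)}_\sigma & u^{(2)}_\ell
\end{pmatrix},
\qquad
\det J\phi = u^{(1)}_\sigma u^{(2)}_\ell - u^{(1)}_\ell u^{(2)}_\sigma .
\]
```

Expanding along the first row leaves only the 2 × 2 minor in the unobserved factors (σ, ℓ), which is why the derivatives with respect to r drop out.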
5 Local invertibility of φ means that for every y ∈ Y, there exists an open set $Y_*$ containing y such that the restriction of φ to
$Y_*$ is invertible. Let Jφ denote the Jacobian of φ. Then, we have that φ is locally invertible on $Y_*$ if $\det J\phi \neq 0$ on $Y_*$, which is
condition (5.28).
6 As an example, consider the mapping $\mathbb{R}^2 \to \mathbb{R}^2$ defined as $\phi(y_1, y_2) = (e^{y_1} \cos y_2, e^{y_1} \sin y_2)$. The Jacobian satisfies
$\det J\phi(y_1, y_2) = e^{2 y_1}$, yet φ is 2π-periodic with respect to $y_2$. For example, φ(0, 2π) = φ(0, 0).
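The mapping in footnote 6 can be checked numerically. The snippet below is a minimal sketch in Python (function names are ours, chosen for illustration): it evaluates the Jacobian determinant at an arbitrary point and confirms that two distinct points of the domain share the same image.

```python
import math

def phi(y1, y2):
    """The mapping of footnote 6: (e^{y1} cos y2, e^{y1} sin y2)."""
    return (math.exp(y1) * math.cos(y2), math.exp(y1) * math.sin(y2))

def det_jacobian(y1, y2):
    """Determinant of the Jacobian of phi, from its analytical partial derivatives."""
    a = math.exp(y1) * math.cos(y2)    # d phi_1 / d y1
    b = -math.exp(y1) * math.sin(y2)   # d phi_1 / d y2
    c = math.exp(y1) * math.sin(y2)    # d phi_2 / d y1
    d = math.exp(y1) * math.cos(y2)    # d phi_2 / d y2
    return a * d - b * c

# Local invertibility: the determinant equals e^{2 y1}, which never vanishes.
det_at_point = det_jacobian(0.3, 1.1)

# No global invertibility: (0, 0) and (0, 2*pi) are mapped to the same point.
p0 = phi(0.0, 0.0)
p1 = phi(0.0, 2.0 * math.pi)
same_image = all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(p0, p1))
```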

Proof of Eq. (5.2). We have: $P(A_1 \cap A_2) = P(A_1) \cdot P(A_2 | A_1)$. Consider the event $E \equiv A_1 \cap A_2$.
We still have,
\[
\Pr(A_3 | A_1 \cap A_2) = \Pr(A_3 | E) = \frac{\Pr(A_3 \cap E)}{\Pr(E)} = \frac{\Pr(A_3 \cap A_1 \cap A_2)}{\Pr(A_1 \cap A_2)} .
\]
That is,
\[
\Pr\left( \bigcap_{i=1}^{3} A_i \right) = \Pr(A_1 \cap A_2) \cdot \Pr(A_3 | A_1 \cap A_2) = \Pr(A_1) \cdot \Pr(A_2 | A_1) \cdot \Pr(A_3 | A_1 \cap A_2) .
\]
Continuing, we obtain Eq. (5.2).
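The factorization for three events can be verified exactly on a small finite sample space. The sketch below uses two fair dice and three events chosen purely for illustration; exact rational arithmetic avoids any rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def pr_cond(event, given):
    """Conditional probability Pr(event | given)."""
    return pr(lambda w: event(w) and given(w)) / pr(given)

A1 = lambda w: w[0] % 2 == 0       # first die is even
A2 = lambda w: w[0] + w[1] >= 7    # the sum is at least 7
A3 = lambda w: w[1] > w[0]         # second die exceeds the first

lhs = pr(lambda w: A1(w) and A2(w) and A3(w))
rhs = pr(A1) * pr_cond(A2, A1) * pr_cond(A3, lambda w: A1(w) and A2(w))
```

Both sides equal 1/9 for these events.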
5.10 Appendix 2: Collected notions and results

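Convergence in probability. A sequence of random vectors $\{x_T\}$ converges in probability to the
random vector $\tilde{x}$ if for each $\epsilon > 0$, $\delta > 0$ and each $i = 1, 2, \cdots, N$, there exists a $T_{\epsilon,\delta}$ such that for
every $T \geq T_{\epsilon,\delta}$,
\[
\Pr\left( |x_{Ti} - \tilde{x}_i| > \delta \right) < \epsilon .
\]
This is succinctly written as $x_T \overset{p}{\to} \tilde{x}$, or $\operatorname{plim} x_T = \bar{x}$, if $\tilde{x} \equiv \bar{x}$, a constant.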
Convergence in probability generalizes the standard notion of a limit of a deterministic sequence.
Of a deterministic sequence $x_T$, we say it converges to some limit $\bar{x}$ if, for each κ > 0, there exists a $T_\kappa$
such that for each $T \geq T_\kappa$ we have that $|x_T - \bar{x}| < \kappa$. Convergence in probability can also be restated as saying
that, for each δ > 0:
\[
\lim_{T \to \infty} \Pr\left( |x_{Ti} - \tilde{x}_i| > \delta \right) = 0 .
\]
The following is a stronger notion of convergence:
Almost sure convergence. A sequence of random vectors $\{x_T\}$ converges almost surely to the
random vector $\tilde{x}$ if, for each $i = 1, 2, \cdots, N$, we have:
\[
\Pr\left( \omega : x_{Ti}(\omega) \to \tilde{x}_i \right) = 1,
\]
where ω denotes the entire random sequence $x_{Ti}$. This is succinctly written as $x_T \overset{a.s.}{\to} \tilde{x}$.
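Almost sure convergence implies convergence in probability. Convergence in probability means
that for each $\epsilon > 0$, $\lim_{T \to \infty} \Pr(\omega : |x_{Ti}(\omega) - \tilde{x}_i| < \epsilon) = 1$. Almost sure convergence requires that
$\Pr(\lim_{T \to \infty} x_{Ti} = \tilde{x}_i) = 1$ or that, for each δ > 0,
\[
\lim_{T' \to \infty} \Pr\left( \sup_{T \geq T'} |x_{Ti} - \tilde{x}_i| > \delta \right) = \lim_{T' \to \infty} \Pr\left( \bigcup_{T \geq T'} \left\{ |x_{Ti} - \tilde{x}_i| > \delta \right\} \right) = 0 .
\]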
Next, assume that the second order moments of all $x_i$ are finite. We have:
Convergence in quadratic mean. A sequence of random vectors $\{x_T\}$ converges in quadratic
mean to the random vector $\tilde{x}$ if for each $i = 1, 2, \cdots, N$, we have:
\[
\lim_{T \to \infty} E\left[ (x_{Ti} - \tilde{x}_i)^2 \right] = 0 .
\]
This is succinctly written as $x_T \overset{q.m.}{\to} \tilde{x}$.
Remark. By Chebyshev’s inequality,
\[
\Pr\left( |x_{Ti} - \tilde{x}_i| > \delta \right) \leq \frac{E[(x_{Ti} - \tilde{x}_i)^2]}{\delta^2},
\]
which shows that convergence in quadratic mean implies convergence in probability.
We now turn to a weaker notion of convergence:
Convergence in distribution. Let $\{f_T(\cdot)\}_T$ be the sequence of probability distributions (that is,
$f_T(x) = \Pr(x_T \leq x)$) of the sequence of the random vectors $\{x_T\}$. Let $\tilde{x}$ be a random vector with
probability distribution f(x). A sequence $\{x_T\}$ converges in distribution to $\tilde{x}$ if, for each $i = 1, 2, \cdots, N$,
we have:
\[
\lim_{T \to \infty} f_T(x) = f(x) .
\]
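This is succinctly written as $x_T \overset{d}{\to} \tilde{x}$.
The following two results are useful for the purposes of this chapter:
Slutzky’s theorem. If $y_T \overset{p}{\to} \bar{y}$ and $x_T \overset{d}{\to} \tilde{x}$, then:
\[
y_T \cdot x_T \overset{d}{\to} \bar{y} \cdot \tilde{x} .
\]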
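Cramer-Wold device. Let λ be an N-dimensional vector of constants. We have:
\[
x_T \overset{d}{\to} \tilde{x} \iff \lambda^\top \cdot x_T \overset{d}{\to} \lambda^\top \cdot \tilde{x} \ \text{ for every } \lambda .
\]
The following example illustrates the Cramer-Wold device. If $\lambda^\top \cdot x_T \overset{d}{\to} N(0; \lambda^\top \Sigma \lambda)$ for every λ, then $x_T \overset{d}{\to}
N(0; \Sigma)$.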
We now state two laws about convergence in probability.
Weak law (No. 1) (Khinchine). Let $\{x_T\}$ be an i.i.d. sequence satisfying $E(x_T) = \mu < \infty$ ∀T. We
have:
\[
\bar{x}_T \equiv \frac{1}{T} \sum_{t=1}^{T} x_t \overset{p}{\to} \mu .
\]
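Weak law (No. 2) (Chebyshev). Let $\{x_T\}$ be a sequence independent but not identically distributed,
satisfying $E(x_T) = \mu_T < \infty$ and $E[(x_T - \mu_T)^2] = \sigma_T^2 < \infty$. If $\lim_{T \to \infty} \frac{1}{T^2} \sum_{t=1}^{T} \sigma_t^2 = 0$, then:
\[
\bar{x}_T \equiv \frac{1}{T} \sum_{t=1}^{T} x_t \overset{p}{\to} \bar{\mu}_T \equiv \frac{1}{T} \sum_{t=1}^{T} \mu_t .
\]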
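The weak law lends itself to a quick simulation. The sketch below is an illustration only, with draws from a uniform distribution on (0, 1) chosen for convenience (so that μ = 0.5): it tracks the running sample mean and its distance from μ as T grows.

```python
import random

def sample_mean_path(n, seed=0):
    """Running sample means of n i.i.d. uniform(0, 1) draws (population mean 0.5)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for t in range(1, n + 1):
        total += rng.random()
        means.append(total / t)
    return means

means = sample_mean_path(100_000)
err_at_100 = abs(means[99] - 0.5)       # distance from mu after 100 draws
err_at_100000 = abs(means[-1] - 0.5)    # distance from mu after 100,000 draws
```

With T = 100,000 the standard deviation of the sample mean is below 0.001, so `err_at_100000` is expected to be far smaller than typical values of `err_at_100`.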
We now state and provide a proof of the central limit theorem in a simple setting.
Central Limit Theorem. Let $\{x_T\}$ be an i.i.d. sequence, satisfying $E(x_T) = \mu < \infty$ and $E[(x_T - \mu)^2]
= \sigma^2 < \infty$ ∀T. Let $\bar{x}_T \equiv \frac{1}{T} \sum_{t=1}^{T} x_t$. We have,
\[
\frac{\sqrt{T} (\bar{x}_T - \mu)}{\sigma} \overset{d}{\to} N(0, 1) .
\]
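The multidimensional version of this theorem requires a mere change in notation. For the proof, the
classic method relies on characteristic functions. Let:
\[
\varphi(t) \equiv E\left( e^{itx} \right) = \int e^{itx} f(x) dx, \quad i \equiv \sqrt{-1} .
\]
We have $\left. \frac{\partial^r}{\partial t^r} \varphi(t) \right|_{t=0} = i^r m^{(r)}$, where $m^{(r)}$ is the r-th order moment. By a Taylor expansion,
\[
\varphi(t) = \varphi(0) + \left. \frac{\partial}{\partial t} \varphi(t) \right|_{t=0} t + \frac{1}{2} \left. \frac{\partial^2}{\partial t^2} \varphi(t) \right|_{t=0} t^2 + \cdots = 1 + i m^{(1)} t - m^{(2)} \frac{1}{2} t^2 + \cdots .
\]
Next, let $\bar{x}_T = \frac{1}{T} \sum_{t=1}^{T} x_t$, and consider the random variable,
\[
Y_T \equiv \frac{\sqrt{T} (\bar{x}_T - \mu)}{\sigma} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \frac{x_t - \mu}{\sigma} .
\]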
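The characteristic function of $Y_T$ is the product of the characteristic functions of $a_t \equiv \frac{x_t - \mu}{\sqrt{T} \sigma}$, which are
all the same: $\varphi_{Y_T}(t) = (\varphi_a(t))^T$, where $\varphi_a(t) = 1 - \frac{t^2}{2T} + \cdots$. Therefore,
\[
\varphi_{Y_T}(t) = \left( \varphi\left( \frac{t}{\sqrt{T}} \right) \right)^T = \left( 1 - \frac{1}{2} \frac{t^2}{T} + o\left( T^{-1} \right) \right)^T .
\]
Clearly, $\lim_{T \to \infty} \varphi_{Y_T}(t) = e^{-\frac{1}{2} t^2}$, which is the characteristic function of a standard Gaussian variable.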
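The last limit can be illustrated numerically. The following sketch drops the o(1/T) remainder, so it only checks the leading term of the expansion; the value t = 1.5 is arbitrary.

```python
import math

def cf_approx(t, T):
    """Leading term of the characteristic function of Y_T: (1 - t^2 / (2T))^T."""
    return (1.0 - t * t / (2.0 * T)) ** T

t = 1.5
target = math.exp(-0.5 * t * t)   # characteristic function of N(0, 1) at t
errors = [abs(cf_approx(t, T) - target) for T in (10, 100, 1000, 10000)]
```

The entries of `errors` shrink roughly in proportion to 1/T, in line with the o(1/T) remainder in the expansion.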
5.11 Appendix 3: Theory for maximum likelihood estimation

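Assume that $\hat{\theta}_T \overset{a.s.}{\to} \theta_0$, that $H(y, \theta) \equiv \nabla_{\theta\theta} \ln L(\theta | y)$ exists and is continuous in θ uniformly in y,
and that we can differentiate twice inside the integral $\int L(\theta | y) dy = 1$. We have:
\[
s_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \nabla_\theta \ln L(\theta | y_t) .
\]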
Consider the c-parametrized curves $\theta(c) = c(\theta_0 - \hat{\theta}_T) + \hat{\theta}_T$ where, for all $c \in (0, 1)^p$ and θ ∈ Θ, cθ
denotes a vector in Θ where the ith element is $c^{(i)} \theta^{(i)}$. By the mean value theorem, there exists
then a $c^*$ in $(0, 1)^p$ such that we have almost surely:
\[
s_T(\hat{\theta}_T) = s_T(\theta_0) + H_T(\theta^*_T) \cdot (\hat{\theta}_T - \theta_0),
\]
where $\theta^*_T \equiv \theta(c^*)$ and:
\[
H_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} H(\theta | y_t) .
\]
The first order conditions tell us that $s_T(\hat{\theta}_T) = 0$. Hence,
\[
0 = s_T(\theta_0) + H_T(\theta^*_T) \cdot (\hat{\theta}_T - \theta_0) .
\]
We also have that:
\[
|H_T(\theta^*_T) - H_T(\theta_0)| \leq \frac{1}{T} \sum_{t=1}^{T} |H(\theta^*_T | y_t) - H(\theta_0 | y_t)| \leq \sup_y |H(\theta^*_T | y) - H(\theta_0 | y)| , \tag{5A.1}
\]
where the supremum is taken over the set of all the observations. Since $\hat{\theta}_T \overset{a.s.}{\to} \theta_0$, we also have that
$\theta^*_T \overset{a.s.}{\to} \theta_0$. Moreover, by the law of large numbers,
\[
H_T(\theta_0) = \frac{1}{T} \sum_{t=1}^{T} H(\theta_0 | y_t) \overset{p}{\to} E[H(\theta_0 | y_t)] = -J(\theta_0) . \tag{5A.2}
\]
Since H is continuous in θ uniformly in y, the inequality in (5A.1) and (5A.2) together imply that:
\[
H_T(\theta^*_T) \overset{a.s.}{\to} -J(\theta_0) .
\]
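Therefore, as T → ∞,
\[
\sqrt{T} (\hat{\theta}_T - \theta_0) = -H_T^{-1}(\theta_0) \cdot \sqrt{T} s_T(\theta_0) = J^{-1} \cdot \sqrt{T} s_T(\theta_0) .
\]
By the central limit theorem, and $E(s_T) = 0$, the score, $s_T(\theta_0) = \frac{1}{T} \sum_{t=1}^{T} s(\theta_0, y_t)$, is such that
\[
\sqrt{T} \cdot s_T(\theta_0) \overset{d}{\to} N\left( 0, \operatorname{var}(s(\theta_0, y_t)) \right),
\]
where
\[
\operatorname{var}(s(\theta_0, y_t)) = J .
\]
The result follows by Slutzky’s theorem and the symmetry of J.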
Finally, one should show the existence of a sequence $\hat{\theta}_T$ converging a.s. to $\theta_0$. Proofs of this type
of convergence can be found in Amemiya (1985), or in Newey and McFadden (1994).
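The asymptotic variance J⁻¹ can be illustrated by Monte Carlo in a model where everything is available in closed form. The example below is ours, not taken from the text: for an exponential density θ e^{−θy}, the score is 1/θ − y, the Hessian is −1/θ², so J = 1/θ² and the MLE is the reciprocal of the sample mean. The sample size, number of replications and seed are arbitrary.

```python
import math
import random
import statistics

def mle_exponential(sample):
    """MLE of the rate of an exponential density: reciprocal of the sample mean."""
    return len(sample) / sum(sample)

def scaled_errors(theta0, T, n_rep, seed=0):
    """Monte Carlo draws of sqrt(T) * (theta_hat - theta0)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        sample = [rng.expovariate(theta0) for _ in range(T)]
        out.append(math.sqrt(T) * (mle_exponential(sample) - theta0))
    return out

theta0 = 2.0
z = scaled_errors(theta0, T=400, n_rep=2000)
mc_variance = statistics.pvariance(z)   # simulated variance of sqrt(T)(theta_hat - theta0)
asy_variance = theta0 ** 2              # J^{-1} = theta0^2 for this model
```

The simulated variance should be close to θ0² = 4, up to Monte Carlo noise and a small finite-sample bias.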
5.12 Appendix 4: Dependent processes

5.12.1 Weak dependence
Let $\sigma_T^2 = \operatorname{var}\left( \sum_{t=1}^{T} x_t \right)$, and assume that $\sigma_T^2 = O(T)$, and that $\sigma_T^{-2} = O\left( T^{-1} \right)$. If
\[
\sigma_T^{-1} \sum_{t=1}^{T} (x_t - E(x_t)) \overset{d}{\to} N(0, 1),
\]
we say that $\{x_t\}$ is weakly dependent. Of a process, we say it is “nonergodic” when it exhibits such a
strong dependence that it does not even satisfy the law of large numbers.
• Stationarity
• Weak dependence
• Ergodicity
5.12.2 The central limit theorem for martingale differences

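Let $x_t$ be a martingale difference sequence with $E(x_t^2) = \sigma_t^2 < \infty$ for all t, and define $\bar{x}_T \equiv \frac{1}{T} \sum_{t=1}^{T} x_t$,
and $\bar{\sigma}_T^2 \equiv \frac{1}{T} \sum_{t=1}^{T} \sigma_t^2$. Let,
\[
\forall \epsilon > 0, \ \lim_{T \to \infty} \frac{1}{T \bar{\sigma}_T^2} \sum_{t=1}^{T} x_t^2 I_{|x_t| \geq \epsilon \cdot T \cdot \bar{\sigma}_T^2} = 0, \quad \text{and} \quad \frac{1}{T} \sum_{t=1}^{T} x_t^2 - \bar{\sigma}_T^2 \overset{p}{\to} 0 .
\]
Under the previous conditions,
\[
\frac{\sqrt{T} \cdot \bar{x}_T}{\bar{\sigma}_T} \overset{d}{\to} N(0, 1) .
\]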
5.12.3 Applications to maximum likelihood

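We use the central limit theorem for martingale differences to prove asymptotic normality of the MLE,
in the case of weakly dependent processes. We have,
\[
\ln L_T(\theta) = \sum_{t=1}^{T} \ell_t(\theta), \quad \ell_t(\theta) \equiv \ell(\theta; y_t | x_t) .
\]
The MLE satisfies the following first order conditions,
\[
0_p = \left. \nabla_\theta \ln L_T(\theta) \right|_{\theta = \hat{\theta}_T} \overset{d}{=} \sum_{t=1}^{T} \left. \nabla_\theta \ell_t(\theta) \right|_{\theta = \theta_0} + \sum_{t=1}^{T} \left. \nabla_{\theta\theta} \ell_t(\theta) \right|_{\theta = \theta_0} (\hat{\theta}_T - \theta_0),
\]
whence
\[
\sqrt{T} (\hat{\theta}_T - \theta_0) \overset{d}{=} - \left[ \frac{1}{T} \sum_{t=1}^{T} \nabla_{\theta\theta} \ell_t(\theta_0) \right]^{-1} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0) . \tag{5A.3}
\]
We have:
\[
E_{\theta_0} \left[ \nabla_\theta \ell_{t+1}(\theta_0) | F_t \right] = 0_p,
\]
which shows that $\partial \ell_t(\theta_0) / \partial \theta$ is a martingale difference. Naturally, here we also have that:
\[
E_{\theta_0} \left( |\nabla_\theta \ell_{t+1}(\theta_0)|^2 \,|\, F_t \right) = -E_{\theta_0} \left( \nabla_{\theta\theta} \ell_{t+1}(\theta_0) | F_t \right) \equiv J_t(\theta_0) .
\]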

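Next, for a given constant $c \in \mathbb{R}^p$, let:
\[
x_t \equiv c^\top \nabla_\theta \ell_t(\theta_0) .
\]
Clearly, $x_t$ is also a martingale difference. Furthermore, since $J_t(\theta_0)$ is the conditional second moment of the score,
\[
E_{\theta_0} \left( x_{t+1}^2 | F_t \right) = c^\top J_t(\theta_0) c,
\]
and because $x_t$ is a martingale difference, $E(x_t x_{t-i}) = E[E(x_t \cdot x_{t-i} | F_{t-i})] = E[E(x_t | F_{t-i}) \cdot x_{t-i}] =
0$, for all i ≥ 1. That is, $x_t$ and $x_{t-i}$ are mutually uncorrelated. It follows that,
\[
\begin{aligned}
\operatorname{var}\left( \sum_{t=1}^{T} x_t \right) &= \sum_{t=1}^{T} E\left( x_t^2 \right) \\
&= \sum_{t=1}^{T} c^\top E_{\theta_0} \left( |\nabla_\theta \ell_t(\theta_0)|^2 \right) c \\
&= \sum_{t=1}^{T} c^\top E_{\theta_0} \left[ E_{\theta_0} \left( |\nabla_\theta \ell_t(\theta_0)|^2 \,|\, F_{t-1} \right) \right] c \\
&= \sum_{t=1}^{T} c^\top E_{\theta_0} \left[ J_{t-1}(\theta_0) \right] c \\
&= c^\top \left[ \sum_{t=1}^{T} E_{\theta_0} \left( J_{t-1}(\theta_0) \right) \right] c .
\end{aligned}
\]
Next, define:
\[
\bar{x}_T \equiv \frac{1}{T} \sum_{t=1}^{T} x_t \quad \text{and} \quad \bar{\sigma}_T^2 \equiv \frac{1}{T} \sum_{t=1}^{T} E\left( x_t^2 \right) = c^\top \left[ \frac{1}{T} \sum_{t=1}^{T} E_{\theta_0} \left( J_{t-1}(\theta_0) \right) \right] c .
\]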
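Under the conditions underlying the central limit theorem for martingale differences provided
earlier, to be spelled out below,
\[
\frac{\sqrt{T} \bar{x}_T}{\bar{\sigma}_T} \overset{d}{\to} N(0, 1) .
\]
By the Cramer-Wold device,
\[
\left[ \frac{1}{T} \sum_{t=1}^{T} E_{\theta_0} \left( J_{t-1}(\theta_0) \right) \right]^{-1/2} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0) \overset{d}{\to} N(0, I_p) .
\]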
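The conditions that need to be satisfied are,
\[
\frac{1}{T} \sum_{t=1}^{T} \nabla_{\theta\theta} \ell_t(\theta_0) + \frac{1}{T} \sum_{t=1}^{T} E_{\theta_0} \left[ J_{t-1}(\theta_0) \right] \overset{p}{\to} 0, \quad \text{and} \quad \operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} E_{\theta_0} \left[ J_{t-1}(\theta_0) \right] \equiv J_\infty(\theta_0) ,
\]
where the first condition uses the definition $J_t(\theta_0) \equiv -E_{\theta_0}(\nabla_{\theta\theta} \ell_{t+1}(\theta_0) | F_t)$.
Under the previous conditions, it follows from Eq. (5A.3) that,
\[
\sqrt{T} (\hat{\theta}_T - \theta_0) \overset{d}{\to} N\left( 0_p, J_\infty(\theta_0)^{-1} \right) .
\]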
5.13 Appendix 5: Proof of Theorem 5.4

Let $\pi_t \equiv \pi_t \left( \phi(y(t+1), M - (t+1) 1_{d-q}) \,|\, \phi(y(t), M - t 1_{d-q}) \right)$ denote the transition density of
\[
\phi(y(t), M - t 1_{d-q}) \equiv \phi(y(t)) \equiv \left( y^o(t), c(y(t), M_1 - t), \cdots, c(y(t), M_{d-q} - t) \right),
\]
where we have emphasized the dependence of φ on the time-to-maturity vector:
\[
M - t 1_{d-q} \equiv (M_1 - t, \cdots, M_{d-q} - t) .
\]
By Σ(τ) full rank $P \otimes d\tau$-a.s., and Itô’s lemma, φ satisfies, for τ ∈ [t, t + 1],
\[
\begin{cases}
dy^o(\tau) = b^o(\tau) d\tau + F(\tau) \Sigma(\tau) dW(\tau) \\
dc(\tau) = b^c(\tau) d\tau + \nabla c(\tau) \Sigma(\tau) dW(\tau)
\end{cases}
\]
where $b^o$ and $b^c$ are, respectively, q-dimensional and (d − q)-dimensional measurable functions, and
$F(\tau) \equiv \bar{a}(\tau) \cdot \Sigma(\tau)^{-1}$ $P \otimes d\tau$-a.s. Under condition (5.28), $\pi_t$ is not degenerate. Furthermore, $C(y(t); \ell) \equiv
C(t)$ is deterministic in $\ell \equiv (\ell_1, \cdots, \ell_{d-q})$. That is, for all $(\bar{c}, \bar{c}^+) \in \mathbb{R}^d \times \mathbb{R}^d$, there exists a function μ
such that for any neighborhood $N(\bar{c}^+)$ of $\bar{c}^+$, there exists another neighborhood $N(\mu(\bar{c}^+))$ of $\mu(\bar{c}^+)$
such that,
\[
\begin{aligned}
& \left\{ \omega \in \Omega : \phi(y(t+1), M - (t+1) 1_{d-q}) \in N(\bar{c}^+) \,\middle|\, \phi(y(t), M - t 1_{d-q}) = \bar{c} \right\} \\
&= \left\{ \omega \in \Omega : \left( y^o(t+1), c(y(t+1), M_1 - t), \cdots, c(y(t+1), M_{d-q} - t) \right) \in N(\mu(\bar{c}^+)) \,\middle|\, \phi(y(t), M - t 1_{d-q}) = \bar{c} \right\} \\
&= \left\{ \omega \in \Omega : \left( y^o(t+1), c(y(t+1), M_1 - t), \cdots, c(y(t+1), M_{d-q} - t) \right) \in N(\mu(\bar{c}^+)) \,\middle|\, \left( y^o(t), c(y(t), M_1 - t), \cdots, c(y(t), M_{d-q} - t) \right) = \bar{c} \right\}
\end{aligned}
\]
where the last equality follows by the definition of φ. In particular, the transition laws of $\phi^c_t$ given
$\phi^c_{t-1}$ are not degenerate; and $\phi^c_t$ is stationary. The feasibility of simulation-based method of moments
estimation is proved. The efficiency claim follows by the Markov property of φ, and the usual score
martingale difference argument.
References
Altissimo, F. and A. Mele (2009): “Simulated Nonparametric Estimation of Dynamic Models.”
Review of Economic Studies 76, 413-450.
Amemiya, T. (1985): Advanced Econometrics. Cambridge, Mass.: Harvard University Press.
Bergman, Y. Z., B. D. Grundy, and Z. Wiener (1996): “General Properties of Option Prices.”
Journal of Finance 51, 1573-1610.
Brandt, M. and P. Santa-Clara (2002): “Simulated Likelihood Estimation of Diffusions with an
Application to Exchange Rate Dynamics in Incomplete Markets.” Journal of Financial
Economics 63, 161-210.
Carrasco, M., M. Chernov, J.-P. Florens and E. Ghysels (2007): “Efficient Estimation of Gen-
eral Dynamic Models with a Continuum of Moment Conditions.” Journal of Econometrics
140, 529-573.
Chernov, M. and E. Ghysels (2000): “A Study towards a Unified Approach to the Joint Esti-
mation of Objective and Risk-Neutral Measures for the Purpose of Options Valuation.”
Journal of Financial Economics 56, 407-458.
Christensen, B. J. (1992): “Asset Prices and the Empirical Martingale Model.” Working paper,
New York University.
Duffie, D. and K. J. Singleton (1993): “Simulated Moments Estimation of Markov Models of
Asset Prices.” Econometrica 61, 929-952.
Fermanian, J.-D. and B. Salanié (2004): “A Nonparametric Simulated Maximum Likelihood
Estimation Method.” Econometric Theory 20, 701-734.
Fisher, R. A. (1912): “On an Absolute Criterion for Fitting Frequency Curves.” Messenger of
Mathematics 41, 155-157.
Gallant, A. R. and G. Tauchen (1996): “Which Moments to Match?” Econometric Theory 12,
657-681.
Gauss, C. F. (1816): “Bestimmung der Genauigkeit der Beobachtungen.” Zeitschrift für
Astronomie und Verwandte Wissenschaften 1, 185-196.
Gouriéroux, C., A. Monfort and E. Renault (1993): “Indirect Inference.” Journal of Applied
Econometrics 8, S85-S118.
Hajivassiliou, V. and D. McFadden (1998): “The Method of Simulated Scores for the Estima-
tion of Limited-Dependent Variable Models.” Econometrica 66, 863-896.
Hansen, L. P. (1982): “Large Sample Properties of Generalized Method of Moments
Estimators.” Econometrica 50, 1029-1054.
Hansen, L. P. and K. J. Singleton (1982): “Generalized Instrumental Variables Estimation of
Nonlinear Rational Expectations Models.” Econometrica 50, 1269-1286.
Hansen, L. P. and K. J. Singleton (1983): “Stochastic Consumption, Risk Aversion, and the
Temporal Behavior of Asset Returns.” Journal of Political Economy 91, 249-265.
Harrison, J. M. and S. R. Pliska (1983): “A Stochastic Calculus Model of Continuous Trading:
Complete Markets.” Stochastic Processes and their Applications 15, 313-316.
Heston, S. (1993): “A Closed-Form Solution for Options with Stochastic Volatility with Ap-
plications to Bond and Currency Options.” Review of Financial Studies 6, 327-343.
Laroque, G. and B. Salanié (1989): “Estimation of Multimarket Fix-Price Models: An
Application of Pseudo-Maximum Likelihood Methods.” Econometrica 57, 831-860.
Laroque, G. and B. Salanié (1993): “Simulation-Based Estimation of Models with Lagged
Latent Variables.” Journal of Applied Econometrics 8, S119-S133.
Laroque, G. and B. Salanié (1994): “Estimating the Canonical Disequilibrium Model: Asymp-
totic Theory and Finite Sample Properties.” Journal of Econometrics 62, 165-210.
Lee, B-S. and B. F. Ingram (1991): “Simulation Estimation of Time-Series Models.” Journal
of Econometrics 47, 197-207.
Lee, L. F. (1995): “Asymptotic Bias in Simulated Maximum Likelihood Estimation of Discrete
Choice Models.” Econometric Theory 11, 437-483.
McFadden, D. (1989): “A Method of Simulated Moments for Estimation of Discrete Response
Models without Numerical Integration.” Econometrica 57, 995-1026.
Mele, A. (2003): “Fundamental Properties of Bond Prices in Models of the Short-Term Rate.”
Review of Financial Studies 16, 679-716.
Newey, W. K. and D. L. McFadden (1994): “Large Sample Estimation and Hypothesis Test-
ing.” In: Engle, R. F. and D. L. McFadden (Editors): Handbook of Econometrics, Vol. 4,
Chapter 36, 2111-2245. Amsterdam: Elsevier.
Neyman, J. and E. S. Pearson (1928): “On the Use and Interpretation of Certain Test Criteria
for Purposes of Statistical Inference.” Biometrika 20A, 175-240, 263-294.
Pakes, A. and D. Pollard (1989): “Simulation and the Asymptotics of Optimization Estima-
tors.” Econometrica 57, 1027-1057.
Pastorello, S., E. Renault and N. Touzi (2000): “Statistical Inference for Random-Variance
Option Pricing.” Journal of Business and Economic Statistics 18, 358-367.
Pastorello, S., V. Patilea, and E. Renault (2003): “Iterative and Recursive Estimation in
Structural Non Adaptive Models.” Journal of Business and Economic Statistics 21, 449-
509.
Pearson, K. (1894): “Contributions to the Mathematical Theory of Evolution.” Philosophical
Transactions of the Royal Society of London, Series A 185, 71-78.
Romano, M. and N. Touzi (1997): “Contingent Claims and Market Completeness in a Stochas-
tic Volatility Model.” Mathematical Finance 7, 399-412.
Santa-Clara, P. (1995): “Simulated Likelihood Estimation of Diffusions With an Application
to the Short Term Interest Rate.” Ph.D. dissertation, INSEAD.
Singleton, K. J. (2001): “Estimation of Affine Asset Pricing Models Using the Empirical Char-
acteristic Function.” Journal of Econometrics 102, 111-141.
Smith, A. (1993): “Estimating Nonlinear Time Series Models Using Simulated Vector Autore-
gressions.” Journal of Applied Econometrics 8, S63-S84.
Tauchen, G. (1997): “New Minimum Chi-Square Methods in Empirical Finance.” In D. Kreps
and K. Wallis (Editors): Advances in Econometrics, 7th World Congress, Econometrics
Society Monographs, Vol. III. Cambridge UK: Cambridge University Press, 279-317.
Part II
Asset pricing and reality
6
On kernels and puzzles
This chapter discusses theoretical restrictions that can be used to perform statistical validation
of asset pricing models. We reconsider Lucas’ model, and impose more structure on the data
generating process. We present a simple setting which allows us to obtain closed-form solutions.
We then discuss how the model’s predictions can be used to test the validity of the model.
6.1 A single factor model

6.1.1 The model
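There is a representative agent with CRRA utility, viz. $u(x) = x^{1-\eta} / (1 - \eta)$. Cum-dividend
gross returns $(S_t + D_t) / S_{t-1}$ are generated by:
\[
\begin{cases}
\ln(S_t + D_t) = \ln S_{t-1} + \mu_S - \frac{1}{2} \sigma_S^2 + \epsilon_{S,t} \\[4pt]
\ln D_t = \ln D_{t-1} + \mu_D - \frac{1}{2} \sigma_D^2 + \epsilon_{D,t}
\end{cases}
\tag{6.1}
\]
where
\[
\begin{pmatrix} \epsilon_{S,t} \\ \epsilon_{D,t} \end{pmatrix} \sim NID\left( 0_2 ; \begin{pmatrix} \sigma_S^2 & \sigma_{SD} \\ \sigma_{SD} & \sigma_D^2 \end{pmatrix} \right) .
\]
Given the stochastic price and dividend process in (6.1), we now derive restrictions between
the various coefficients $\mu_S$, $\mu_D$, $\sigma_S^2$, $\sigma_D^2$ and $\sigma_{SD}$ that are imposed by economic theory. The
cum-dividend process in (6.1) has been assumed so for analytical purposes only.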
By standard consumption-based asset pricing theory (see Part I),
\[
S_t = E\left[ \beta \frac{u'(D_{t+1})}{u'(D_t)} (S_{t+1} + D_{t+1}) \,\middle|\, F_t \right],
\]
with the usual notation. By the preferences assumption,
\[
1 = E\left( e^{Z_{t+1} + Q_{t+1}} \,\middle|\, F_t \right), \tag{6.2}
\]
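where
\[
Z_{t+1} = \ln\left( \beta \left( \frac{D_{t+1}}{D_t} \right)^{-\eta} \right) ; \quad Q_{t+1} = \ln\left( \frac{S_{t+1} + D_{t+1}}{S_t} \right) .
\]
In fact, Eq. (6.2) holds for any asset. In particular, it holds for a one-period bond with price
$S^b_t \equiv b_t$, $S^b_{t+1} \equiv 1$ and $D^b_{t+1} \equiv 0$. Define, $Q^b_{t+1} \equiv \ln\left( b_t^{-1} \right) \equiv \ln R_t$. By replacing this into Eq.
(6.2), one gets $R_t^{-1} = E\left( e^{Z_{t+1}} \,|\, F_t \right)$. We are left with the following system:
\[
\begin{cases}
\dfrac{1}{R_t} = E\left( e^{Z_{t+1}} \,\middle|\, F_t \right) \\[6pt]
1 = E\left( e^{Z_{t+1} + Q_{t+1}} \,\middle|\, F_t \right)
\end{cases}
\tag{6.3}
\]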
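To obtain closed-form solutions, we will need to use the following result:
Lemma 6.1: Let Z be conditionally normally distributed. Then, for any γ ∈ R,
\[
E\left( e^{-\gamma Z_{t+1}} \,\middle|\, F_t \right) = e^{-\gamma E(Z_{t+1} | F_t) + \frac{1}{2} \gamma^2 \operatorname{var}(Z_{t+1} | F_t)}
\]
\[
\operatorname{var}\left( e^{-\gamma Z_{t+1}} \,\middle|\, F_t \right) = e^{-2 \gamma E(Z_{t+1} | F_t) + 2 \gamma^2 \operatorname{var}(Z_{t+1} | F_t)} \left( 1 - e^{-\gamma^2 \operatorname{var}(Z_{t+1} | F_t)} \right)
\]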
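Lemma 6.1 can be checked by numerical integration against the Gaussian density. The snippet below is a sketch under illustrative assumptions: the mean, variance and γ are arbitrary numbers, and the variance is written in the standard log-normal form exp(−2γm + 2γ²v)(1 − exp(−γ²v)), where m and v are the conditional mean and variance of Z.

```python
import math

def normal_expectation(g, mean, var, n=20001, width=12.0):
    """Trapezoid rule for E[g(Z)], Z ~ N(mean, var), over mean +/- width std devs."""
    sd = math.sqrt(var)
    lo, hi = mean - width * sd, mean + width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        z = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        density = math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * g(z) * density
    return total * h

mean, var, gamma = 0.03, 0.04, 2.0   # illustrative values only

# Moments of exp(-gamma * Z) by quadrature.
m1 = normal_expectation(lambda z: math.exp(-gamma * z), mean, var)
m2 = normal_expectation(lambda z: math.exp(-2.0 * gamma * z), mean, var)
numeric_var = m2 - m1 ** 2

# Closed forms from the lemma.
closed_mean = math.exp(-gamma * mean + 0.5 * gamma ** 2 * var)
closed_var = math.exp(-2.0 * gamma * mean + 2.0 * gamma ** 2 * var) * (1.0 - math.exp(-gamma ** 2 * var))
```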
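By the definition of Z, Eq. (6.1), and Lemma 6.1,
\[
\frac{1}{R_t} = E\left( e^{Z_{t+1}} \,\middle|\, F_t \right) = e^{E[Z_{t+1} | F_t] + \frac{1}{2} \operatorname{var}[Z_{t+1} | F_t]} = e^{\ln \beta - \eta \left( \mu_D - \frac{1}{2} \sigma_D^2 \right) + \frac{1}{2} \eta^2 \sigma_D^2} .
\]
The equilibrium interest rate thus satisfies,
\[
\ln R_t = -\ln \beta + \eta \mu_D - \frac{\eta (\eta + 1)}{2} \sigma_D^2, \quad \text{a constant.} \tag{6.4}
\]
The $\eta \mu_D$ term reflects “intertemporal substitution” effects; the last term reflects
“precautionary” motives.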
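The second equation in (6.3) can be written as,
\[
1 = E\left[ \exp(Z_{t+1} + Q_{t+1}) \,|\, F_t \right] = e^{\ln \beta - \eta \left( \mu_D - \frac{1}{2} \sigma_D^2 \right) + \mu_S - \frac{1}{2} \sigma_S^2} \cdot E\left( e^{\tilde{n}_{t+1}} \,\middle|\, F_t \right),
\]
where $\tilde{n}_{t+1} \equiv \epsilon_{S,t+1} - \eta \epsilon_{D,t+1} \sim N(0, \sigma_S^2 + \eta^2 \sigma_D^2 - 2 \eta \sigma_{SD})$. The above expectation can be computed
through Lemma 6.1. The result is,
\[
0 = \underbrace{\ln \beta - \eta \mu_D + \frac{\eta (\eta + 1)}{2} \sigma_D^2}_{-\ln R_t} + \mu_S - \eta \sigma_{SD} .
\]
By defining $R_t \equiv e^{r_t}$, and rearranging terms,
\[
\underbrace{\mu_S - r}_{\text{risk premium}} = \eta \sigma_{SD} .
\]
To sum up,
\[
\begin{cases}
\mu_S = r + \eta \sigma_{SD} \\[4pt]
r_t = -\ln \beta + \eta \mu_D - \dfrac{\eta (\eta + 1)}{2} \sigma_D^2
\end{cases}
\]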
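The two summary formulae are easy to evaluate. The sketch below codes them directly; the function name and the parameter values are hypothetical and chosen only to show orders of magnitude, with σ_SD set equal to σ_D² for simplicity.

```python
import math

def equilibrium(beta, eta, mu_d, sigma_d, sigma_sd):
    """Short-term rate, risk premium and expected return implied by the model."""
    r = -math.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1.0) * sigma_d ** 2
    premium = eta * sigma_sd          # mu_S - r = eta * sigma_SD
    return r, premium, r + premium

# Illustrative inputs only (not calibrated to data).
r, premium, mu_s = equilibrium(beta=0.98, eta=3.0, mu_d=0.02, sigma_d=0.03, sigma_sd=0.03 ** 2)
```

With these inputs the short-term rate is about 7.5% and the risk premium is 0.27%: for moderate risk aversion, a low dividend volatility translates into a high interest rate and a small premium.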
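Let us compute other interesting objects. The expected gross return on the risky asset is,
\[
E\left( \frac{S_{t+1} + D_{t+1}}{S_t} \,\middle|\, F_t \right) = e^{\mu_S - \frac{1}{2} \sigma_S^2} \cdot E\left[ e^{\epsilon_{S,t+1}} \,|\, F_t \right] = e^{\mu_S} = e^{r + \eta \sigma_{SD}} .
\]
Therefore, if $\sigma_{SD} > 0$, then $E[(S_{t+1} + D_{t+1}) / S_t \,|\, F_t] > E[b_t^{-1} \,|\, F_t]$, as expected.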
Next, we test the internal consistency of the model. The coefficients of the model must satisfy
some restrictions. In particular, the asset price volatility must be determined endogenously.
We first conjecture that the following “no-sunspots” condition holds,
\[
\epsilon_{S,t} = \epsilon_{D,t} . \tag{6.5}
\]
We will demonstrate below that this is indeed the case. Under the previous condition,
\[
\mu_S = r + \lambda \sigma_D ; \quad \lambda \equiv \eta \sigma_D ,
\]
and
\[
Z_{t+1} = -\left( r + \frac{1}{2} \lambda^2 \right) - \lambda u_{D,t+1} ; \quad u_{D,t+1} \equiv \frac{\epsilon_{D,t+1}}{\sigma_D} .
\]
Under condition (6.5), we have a very instructive way to write the pricing kernel. Precisely,
define recursively,
\[
m_{t+1} = \frac{\xi_{t+1}}{\xi_t} \equiv \exp(Z_{t+1}) ; \quad \xi_0 = 1 .
\]
This is reminiscent of the continuous time representation of Arrow-Debreu state prices (see
Chapter 4).
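Next, let’s iterate the asset price equation (6.2),

\[ \begin{aligned} S_t &= E\left[ \left( \prod_{i=1}^{n} e^{Z_{t+i}} \right) \cdot S_{t+n} \,\middle|\, F_t \right] + \sum_{i=1}^{n} E\left[ \left( \prod_{j=1}^{i} e^{Z_{t+j}} \right) \cdot D_{t+i} \,\middle|\, F_t \right] \\ &= E\left( \frac{\xi_{t+n}}{\xi_t} \cdot S_{t+n} \,\middle|\, F_t \right) + \sum_{i=1}^{n} E\left( \frac{\xi_{t+i}}{\xi_t} \cdot D_{t+i} \,\middle|\, F_t \right). \end{aligned} \]

By letting $n \to \infty$ and assuming no-bubbles, we get:
\[ S_t = \sum_{i=1}^{\infty} E\left( \frac{\xi_{t+i}}{\xi_t} \cdot D_{t+i} \,\middle|\, F_t \right). \tag{6.6} \]
The expectation is, by Lemma 6.1,
\[ E\left( \frac{\xi_{t+i}}{\xi_t} \cdot D_{t+i} \,\middle|\, F_t \right) = E\left( e^{\sum_{j=1}^{i} Z_{t+j}} \cdot D_{t+i} \,\middle|\, F_t \right) = D_t e^{(\mu_D - r - \sigma_D \lambda) i}. \]
Suppose that the “risk-adjusted” discount rate $r + \sigma_D \lambda$ is higher than the growth rate of the economy, viz.
\[ r + \sigma_D \lambda > \mu_D \iff k \equiv e^{\mu_D - r - \sigma_D \lambda} < 1. \]
Under this condition, the summation in Eq. (6.6) converges, and we obtain:
\[ \frac{S_t}{D_t} = \frac{k}{1 - k}. \tag{6.7} \]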
This is a version of the celebrated Gordon’s formula. It predicts that price-dividend ratios are
constant, a counterfactual feature addressed in Chapter 8.
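As a rough numerical illustration of Eq. (6.7), the sketch below computes k and the price-dividend ratio and compares the closed form with a long truncation of the sum in Eq. (6.6). The values of µ_D, σ_D and β are the ones quoted in the caption of Figure 6.1; the choice η = 2 and the function name are assumptions made only for this example.

```python
import math

def price_dividend_ratio(beta, eta, mu_d, sigma_d):
    """Return (S/D, k) from Eqs. (6.4) and (6.7) under the no-sunspots condition."""
    r = -math.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1.0) * sigma_d**2  # Eq. (6.4)
    lam = eta * sigma_d                                                     # unit risk premium
    k = math.exp(mu_d - r - sigma_d * lam)
    if k >= 1.0:
        # Convergence of Eq. (6.6) requires r + sigma_D * lambda > mu_D.
        raise ValueError("k >= 1: the discounted dividend sum does not converge")
    return k / (1.0 - k), k

# eta = 2 is a hypothetical value; the other inputs come from the Figure 6.1 caption.
ratio, k = price_dividend_ratio(beta=0.95, eta=2.0, mu_d=0.0183, sigma_d=0.0328)

# Truncated version of the infinite sum in Eq. (6.6), divided by D_t.
truncated_sum = sum(k**i for i in range(1, 2001))
print(ratio, truncated_sum)
```

The explicit check on k mirrors the convergence condition stated in the text; without it the closed form would return a meaningless negative ratio.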
To find the final restrictions of the model, notice that Eq. (6.7) and the second equation in (13.26) imply that
\[ \ln(S_t + D_t) - \ln S_{t-1} = -\ln k + \mu_D - \frac{1}{2} \sigma_D^2 + \epsilon_{D,t}. \]
By the first equation in (13.26),
\[ \begin{cases} \mu_S - \dfrac{1}{2} \sigma_S^2 = \mu_D - \dfrac{1}{2} \sigma_D^2 - \ln k \\[4pt] \epsilon_{S,t} = \epsilon_{D,t}, \quad \forall t \end{cases} \]
The second condition confirms condition (6.5). It also reveals that $\sigma_S^2 = \sigma_{SD} = \sigma_D^2$. Replacing this into the first condition delivers back $\mu_S = \mu_D - \ln k = r + \sigma_D \lambda$.
6.1.2 Extensions
In Chapter 3 we showed that in an i.i.d. environment, prices are convex (resp. concave) in the
dividend rate whenever η > 1 (resp. η < 1). The pricing formula (6.7) reveals that in a dynamic
environment, such a property is lost. In this formula, prices are always linear in the dividends’
rate. It would be possible to show with the techniques developed in the next chapter that in
a dynamic context, convexity properties of the price function would be inherited by properties
of the dividend process in the following sense: if the expected dividend growth under the risk-
neutral measure is a convex (resp. concave) function of the initial dividend rate, then prices are
convex (resp. concave) in the initial dividend rate. In the model analyzed here, the expected
dividend growth under the risk-neutral measure is linear in the dividends’ rate, and this explains
the linear formula (6.7).
6.2 The equity premium puzzle

“Average excess returns on the US stock market [the equity premium] is too high to be
easily explained by standard asset pricing models.” Mehra and Prescott
To be consistent with data, the equity premium,

\[ \mu_S - r = \lambda \sigma_D, \qquad \lambda = \eta \sigma_D, \]
must be “high” enough: as regards US data, approximately an annualized 6%. If the asset we are trying to price is literally a consumption claim, then $\sigma_D$ is consumption volatility, which is very low (approximately 3%). To make $\mu_S - r$ high, one needs very “high” values of η (let’s say η ≃ 30). But assuming η = 30 doesn’t seem to be plausible. This is the equity premium puzzle originally raised by Mehra and Prescott (1985).
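The arithmetic behind the puzzle can be made explicit by inverting $\mu_S - r = \eta \sigma_D^2$ for η. The snippet below is a back-of-the-envelope sketch that takes the round numbers quoted above (a 6% premium, 3% volatility) at face value; the function name is hypothetical. With these inputs the implied η is roughly 67, which is of the same order as, and even larger than, the illustrative η ≃ 30.

```python
def implied_risk_aversion(equity_premium, sigma_d):
    """Solve mu_S - r = eta * sigma_D**2 for eta."""
    return equity_premium / sigma_d**2

# Round numbers quoted in the text: 6% premium, 3% consumption volatility.
eta_needed = implied_risk_aversion(equity_premium=0.06, sigma_d=0.03)

# Premium generated by a more conventional (assumed) risk aversion of 10.
premium_at_eta_10 = 10.0 * 0.03**2
print(eta_needed, premium_at_eta_10)
```

With η = 10 the model delivers a premium of less than 1%, far below the 6% target.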
Even if we dismiss the idea that η = 30 is implausible, another puzzle arises: the interest rate
puzzle. As we showed in Eq. (6.4), very high values of η can make the interest rate very high
(see Figure 6.1).
In the next section, we show how this failure of the model can be “detected” with a general
methodology that can be applied to a variety of related models - more general models.
[Figure 6.1: plot of r(η) against η (horizontal axis, “eta”, from 10 to 40; vertical axis from −0.1 to 0.1).]

FIGURE 6.1. The risk-free rate puzzle: the two curves depict the graph $\eta \mapsto r(\eta) = -\ln \beta + 0.0183 \cdot \eta - (0.0328)^2 \cdot \frac{\eta (\eta + 1)}{2}$, with β = 0.95 (solid line) and β = 1.05 (dashed line). Even if risk aversion were to be as high as η = 30, the equilibrium short-term rate would behave counterfactually, reaching a level as high as 10%. In order for r to be lower when η is high, it might be required that β > 1.
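The two curves described in the caption can be tabulated directly from the formula. The sketch below uses only the numbers given in the caption (µ_D = 0.0183, σ_D = 0.0328, β = 0.95 or 1.05); the function name and the grid of η values are choices made for this example.

```python
import math

def risk_free_rate(eta, beta, mu_d=0.0183, sigma_d=0.0328):
    """r(eta) = -ln(beta) + mu_D * eta - sigma_D^2 * eta * (eta + 1) / 2, as in Eq. (6.4)."""
    return -math.log(beta) + mu_d * eta - sigma_d**2 * eta * (eta + 1.0) / 2.0

# Tabulate both curves of Figure 6.1 on a coarse grid.
for eta in (1, 10, 20, 30, 40):
    solid = risk_free_rate(eta, beta=0.95)
    dashed = risk_free_rate(eta, beta=1.05)
    print(f"eta={eta:>2}  r(beta=0.95)={solid:.4f}  r(beta=1.05)={dashed:.4f}")
```

At η = 30 the β = 0.95 curve is close to 10%, as stated in the caption, while β = 1.05 brings the rate back to about zero.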
6.3 Hansen-Jagannathan cup

Suppose there are n risky assets. The n asset pricing equations for these assets are,
\[ 1 = E\left[ m_{t+1} (1 + R_{j,t+1}) \mid F_t \right], \qquad j = 1, \dots, n. \]
By taking the unconditional expectation of the previous equation, and defining $R_t = (R_{1,t}, \dots, R_{n,t})^\top$,
\[ 1_n = E\left[ m_t (1_n + R_t) \right]. \]
Let $\bar m \equiv E(m_t)$. We create a family of stochastic discount factors $m_t^*$ parametrized by $\bar m$ by projecting m onto the asset returns,
\[ \mathrm{Proj}(m \mid 1_n + R_t) \equiv m_t^*(\bar m) = \bar m + \underbrace{[R_t - E(R_t)]^\top}_{1 \times n} \underbrace{\beta_{\bar m}}_{n \times 1}, \]
where¹
\[ \underbrace{\beta_{\bar m}}_{n \times 1} = \underbrace{\Sigma^{-1}}_{n \times n} \underbrace{\mathrm{cov}(m, 1_n + R_t)}_{n \times 1} = \Sigma^{-1} \left[ 1_n - \bar m E(1_n + R_t) \right], \]
and $\Sigma \equiv E\left[ (R_t - E(R_t)) (R_t - E(R_t))^\top \right]$. As shown in the appendix, we also have that,
\[ 1_n = E\left[ m_t^*(\bar m) \cdot (1_n + R_t) \right]. \]
We have,
\[ \mathrm{var}(m_t^*(\bar m)) = \beta_{\bar m}^\top \Sigma \beta_{\bar m} = \left( 1_n - \bar m E(1_n + R_t) \right)^\top \Sigma^{-1} \left( 1_n - \bar m E(1_n + R_t) \right). \]
This is the celebrated Hansen-Jagannathan “cup” (Hansen and Jagannathan (1991)). The interest of this object lies in the following theorem.

¹ We have, $\mathrm{cov}(m, 1_n + R_t) = E[m (1_n + R_t)] - E(m) E(1_n + R_t) = 1_n - \bar m E(1_n + R_t)$.
Theorem 6.2: Among all stochastic discount factors with fixed expectation $\bar m$, $m_t^*(\bar m)$ is the one with the smallest variance.

Proof: Consider another discount factor indexed by $\bar m$, i.e. $m_t(\bar m)$. Naturally, $m_t(\bar m)$ satisfies $1_n = E[m_t(\bar m)(1_n + R_t)]$. And since it also holds that $1_n = E[m_t^*(\bar m)(1_n + R_t)]$, we deduce that
\[ \begin{aligned} 0_n &= E\left[ (m_t(\bar m) - m_t^*(\bar m)) (1_n + R_t) \right] \\ &= E\left\{ [m_t(\bar m) - m_t^*(\bar m)] \left[ (1_n + E(R_t)) + (R_t - E(R_t)) \right] \right\} \\ &= E\left\{ [m_t(\bar m) - m_t^*(\bar m)] [R_t - E(R_t)] \right\} \\ &= \mathrm{cov}\left[ m_t(\bar m) - m_t^*(\bar m), R_t \right], \end{aligned} \]
where the third line follows from the fact that $E[m_t(\bar m)] = E[m_t^*(\bar m)] = \bar m$, and the fourth line follows because $E[m_t(\bar m) - m_t^*(\bar m)] = 0$. But $m_t^*(\bar m)$ is a linear combination of $R_t$. By the previous equation, it must then be the case that,
\[ 0 = \mathrm{cov}\left[ m_t(\bar m) - m_t^*(\bar m), m_t^*(\bar m) \right]. \]
Hence,
\[ \begin{aligned} \mathrm{var}[m_t(\bar m)] &= \mathrm{var}\left[ m_t^*(\bar m) + m_t(\bar m) - m_t^*(\bar m) \right] \\ &= \mathrm{var}[m_t^*(\bar m)] + \mathrm{var}[m_t(\bar m) - m_t^*(\bar m)] + 2 \cdot \mathrm{cov}\left[ m_t(\bar m) - m_t^*(\bar m), m_t^*(\bar m) \right] \\ &= \mathrm{var}[m_t^*(\bar m)] + \mathrm{var}[m_t(\bar m) - m_t^*(\bar m)] \\ &\geq \mathrm{var}[m_t^*(\bar m)]. \end{aligned} \]
The previous bound can be improved by using conditioning information as in Gallant, Hansen and Tauchen (1990) and the relatively more recent work by Ferson and Siegel (2003). Moreover, these bounds typically display a finite sample bias: they typically overstate the true bounds and thus they reject too often a given model. Finite sample corrections are considered by Ferson and Siegel (2003).
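To make the construction of $m_t^*(\bar m)$ concrete, here is a minimal sketch that replaces population moments with sample moments. The return panel is synthetic (drawn from a normal distribution with arbitrary mean and volatility), the value $\bar m = 0.97$ is hypothetical, and no finite-sample correction of the kind mentioned above is attempted.

```python
import numpy as np

def hj_minimum_variance_sdf(returns, m_bar):
    """Sample analogue of m*(m_bar) = m_bar + (R - E R)' beta and of its variance."""
    R = np.asarray(returns, dtype=float)            # T x n matrix of net returns
    mean_R = R.mean(axis=0)
    dev = R - mean_R
    sigma = dev.T @ dev / R.shape[0]                # sample covariance matrix Sigma
    pricing_gap = 1.0 - m_bar * (1.0 + mean_R)      # 1_n - m_bar * E(1_n + R)
    beta = np.linalg.solve(sigma, pricing_gap)      # beta = Sigma^{-1} * pricing_gap
    m_star = m_bar + dev @ beta
    variance = pricing_gap @ beta                   # beta' Sigma beta
    return m_star, beta, variance

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
sample_returns = 0.05 + 0.15 * rng.standard_normal((500, 3))
m_star, beta, variance = hj_minimum_variance_sdf(sample_returns, m_bar=0.97)

# In-sample check of 1_n = E[m* (1_n + R)].
pricing_check = (m_star[:, None] * (1.0 + sample_returns)).mean(axis=0)
print(pricing_check, np.sqrt(variance))
```

Repeating the calculation on a grid of $\bar m$ values and plotting $\sqrt{\mathrm{var}(m^*)}$ against $\bar m$ traces out the cup.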
For example, let us consider an application of the Hansen-Jagannathan testing methodology to the model in Section 6.1. That model has the following stochastic discount factor,

\[ m_{t+1} = \frac{\xi_{t+1}}{\xi_t} = \exp(Z_{t+1}); \qquad Z_{t+1} = -\left( r + \frac{1}{2} \lambda^2 \right) - \lambda u_{D,t+1}; \qquad u_{D,t+1} \equiv \frac{\epsilon_{D,t+1}}{\sigma_D}. \]
First, we have to compute the first two moments of the stochastic discount factor. By Lemma 6.1 we have,
\[ \bar m = E(m_t) = e^{-r} \quad \text{and} \quad \bar\sigma_m = \sqrt{\mathrm{var}(m_t(\bar m))} = e^{-r + \frac{1}{2} \lambda^2} \sqrt{1 - e^{-\lambda^2}} \tag{6.8} \]
where
\[ r = -\ln \beta + \eta \mu_D - \frac{\eta (\eta + 1)}{2} \sigma_D^2 \quad \text{and} \quad \lambda = \eta \sigma_D. \]
For given $\mu_D$ and $\sigma_D^2$, system (6.8) forms an η-parametrized curve in the $(\bar m, \bar\sigma_m)$ space. The objective is to see whether there are plausible values of η for which such an η-parametrized curve enters the Hansen-Jagannathan cup. Typically, this is not the case. Rather, one has the situation depicted in Figure 6.2 below.
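The η-parametrized curve of system (6.8) is straightforward to tabulate. The sketch below assumes the parameter values from the caption of Figure 6.1 (β = 0.95, µ_D = 0.0183, σ_D = 0.0328); the function name and the grid of η values are illustrative choices.

```python
import math

def lucas_sdf_moments(eta, beta=0.95, mu_d=0.0183, sigma_d=0.0328):
    """Return (m_bar, sigma_m) from system (6.8) for a given risk aversion eta."""
    r = -math.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1.0) * sigma_d**2
    lam = eta * sigma_d
    m_bar = math.exp(-r)
    sigma_m = math.exp(-r + 0.5 * lam**2) * math.sqrt(1.0 - math.exp(-lam**2))
    return m_bar, sigma_m

# Points of the curve in the (m_bar, sigma_m) plane.
curve = [(eta, *lucas_sdf_moments(eta)) for eta in (1, 5, 10, 20, 30)]
for eta, m_bar, sigma_m in curve:
    print(f"eta={eta:>2}  m_bar={m_bar:.4f}  sigma_m={sigma_m:.4f}")
```

Low values of η produce points with almost no kernel volatility, which is why the curve typically stays outside the cup.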
The general message is that models can be consistent with the data only if their pricing kernels are highly volatile (for a fixed $\bar m$). This suggests dismissing the idea of a representative agent with CRRA utility function, and considering instead models with heterogeneous agents (by generalizing some ideas in Constantinides and Duffie (1996)); and/or models with more realistic preferences, such as for example the habit preferences considered in Campbell and Cochrane (1999); and/or combinations of these. These things will be analyzed in depth in the next chapter.
6.4 Multifactor extensions

A natural way to increase the variance of the pricing kernel is to increase the number of factors.
We consider two possibilities: one in which returns are normally distributed, and one in which
returns are lognormally distributed.
6.4.1 Exponential affine pricing kernels

Consider again the simple model in Section 6.1. In this section, we shall make a different assumption regarding the returns distributions. But we shall maintain the hypothesis that the pricing kernel satisfies an exponential-Gaussian type structure,

\[ m_{t+1} = \exp(Z_{t+1}); \qquad Z_{t+1} = -\left( r + \frac{1}{2} \lambda^2 \right) - \lambda u_{D,t+1}; \qquad u_{D,t+1} \sim \mathrm{NID}(0, 1), \]
where r and λ are some constants. We have,
\[ 1 = E(m_{t+1} \cdot \tilde R_{t+1}) = E(m_{t+1}) E(\tilde R_{t+1}) + \mathrm{cov}(m_{t+1}, \tilde R_{t+1}), \qquad \tilde R_{t+1} \equiv \frac{S_{t+1} + D_{t+1}}{S_t}. \]
By rearranging terms,² and using the fact that $E(m_{t+1}) = R^{-1}$,
\[ E(\tilde R_{t+1}) - R = -R \cdot \mathrm{cov}(m_{t+1}, \tilde R_{t+1}). \tag{6.9} \]
The following result is useful:
Lemma 6.3 (Stein’s lemma): Suppose that two random variables x and y are jointly normal. Then,
\[ \mathrm{cov}[g(x), y] = E[g'(x)] \cdot \mathrm{cov}(x, y), \]
for any function $g : E(|g'(x)|) < \infty$.
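Stein’s lemma is easy to illustrate by simulation for the case used below, g(x) = exp(x), for which g′ = g. The sketch is a Monte Carlo check under assumed, purely illustrative parameters (x with standard deviation 0.5, y loading 0.5 on x plus independent noise); the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 400_000

# Jointly normal pair (x, y): var(x) = 0.25 and cov(x, y) = 0.5 * 0.25 = 0.125.
x = 0.5 * rng.standard_normal(n_draws)
y = 0.5 * x + 0.3 * rng.standard_normal(n_draws)

# Left-hand side of the lemma: cov[g(x), y] with g = exp.
lhs = np.cov(np.exp(x), y)[0, 1]

# Right-hand side: E[g'(x)] * cov(x, y), again with g' = exp.
rhs = np.exp(x).mean() * np.cov(x, y)[0, 1]

# Population value: E[exp(x)] = exp(var(x) / 2) = exp(0.125).
exact = np.exp(0.125) * 0.125
print(lhs, rhs, exact)
```

The three numbers agree up to simulation error, which shrinks as `n_draws` grows.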
We now suppose that $\tilde R$ is normally distributed. This assumption is inconsistent with the model in Section 6.1. In the model of Section 6.1, $\tilde R$ is lognormally distributed in equilibrium because $\ln \tilde R = \mu_S - \frac{1}{2} \sigma_S^2 + \epsilon_S$, with $\epsilon_S$ normal. But let’s explore the asset pricing implications of this tilting assumption. Because $\tilde R_{t+1}$ and $Z_{t+1}$ are normal, and $m_{t+1} = m(Z_{t+1}) = \exp(Z_{t+1})$, we may apply Lemma 6.3 and obtain,
\[ \mathrm{cov}(m_{t+1}, \tilde R_{t+1}) = E[m'(Z_{t+1})] \cdot \mathrm{cov}(Z_{t+1}, \tilde R_{t+1}) = -\lambda R^{-1} \cdot \mathrm{cov}(u_{D,t+1}, \tilde R_{t+1}). \]
Replacing this into Eq. (6.9),
\[ E(\tilde R_{t+1}) - R = \lambda \cdot \mathrm{cov}(u_{D,t+1}, \tilde R_{t+1}). \]

² With a portfolio return that is perfectly correlated with m, we have:
\[ E_t(\tilde R^M_{t+1}) - \frac{1}{E_t(m_{t+1})} = -\frac{\sigma_t(m_{t+1})}{E_t(m_{t+1})} \sigma_t(\tilde R^M_{t+1}). \]
In more general setups than the ones considered in this introductory example, both $\sigma_t(m_{t+1}) / E_t(m_{t+1})$ and $\sigma_t(\tilde R^M_{t+1})$ should be time-varying.
We wish to extend the previous observations to more general situations. Clearly, the pricing kernel is some function of K factors, $m(\epsilon_{1t}, \dots, \epsilon_{Kt})$. A particularly convenient analytical assumption is to make m exponential-affine and the factors $(\epsilon_{i,t})_{i=1}^K$ normal, as in the following definition:
Definition 6.4 (EAPK: Exponential Affine Pricing Kernel): Let,
\[ Z_t \equiv \phi_0 + \sum_{i=1}^{K} \phi_i \epsilon_{i,t}. \]
An EAPK is a function
\[ m_t = m(Z_t) = \exp(Z_t). \]
If $(\epsilon_{i,t})_{i=1}^K$ are jointly normal, and each $\epsilon_{i,t}$ has mean zero and variance $\sigma_i^2$, $i = 1, \dots, K$, the EAPK is called a Normal EAPK (NEAPK).
In the previous definition, we assumed that each $\epsilon_{i,t}$ has mean zero. This entails no loss of generality insofar as $\phi_0 \neq 0$, since any non-zero mean of the factors can be absorbed into the constant $\phi_0$.
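Now suppose that $\tilde R$ is normally distributed. By Lemma 6.3 and the NEAPK structure,
\[ \mathrm{cov}(m_{t+1}, \tilde R_{t+1}) = \mathrm{cov}[\exp(Z_{t+1}), \tilde R_{t+1}] = R^{-1} \mathrm{cov}(Z_{t+1}, \tilde R_{t+1}) = R^{-1} \sum_{i=1}^{K} \phi_i \, \mathrm{cov}(\epsilon_{i,t+1}, \tilde R_{t+1}). \]
Replacing this into Eq. (6.9) leaves the linear factor representation,
\[ E(\tilde R_{t+1}) - R = -\sum_{i=1}^{K} \phi_i \underbrace{\mathrm{cov}(\epsilon_{i,t+1}, \tilde R_{t+1})}_{\text{“betas”}}. \tag{6.10} \]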
We have thus shown the following result:
Proposition 6.5: Suppose that $\tilde R$ is normally distributed. Then, NEAPK ⇒ linear factor representation for asset returns.
The APT representation in Eq. (6.10) is close to one result in Cochrane (1996).³ Cochrane (1996) assumed that m has a linear structure, i.e. $m(Z_t) = Z_t$ where $Z_t$ is as in Definition 6.4. This assumption implies that $\mathrm{cov}(m_{t+1}, \tilde R_{t+1}) = \sum_{i=1}^{K} \phi_i \, \mathrm{cov}(\epsilon_{i,t+1}, \tilde R_{t+1})$. Replacing this into Eq. (6.9),
\[ E(\tilde R_{t+1}) - R = -R \sum_{i=1}^{K} \phi_i \, \mathrm{cov}(\epsilon_{i,t+1}, \tilde R_{t+1}), \qquad \text{where } R = \frac{1}{E(m)} = \frac{1}{\phi_0}. \]
The advantage of using NEAPKs is that the pricing kernel is automatically guaranteed to be strictly positive, a condition needed to rule out arbitrage opportunities.

³ To recall why Eq. (6.10) is indeed an APT equation, suppose that $\tilde R$ is an n-(column) vector of returns and that $\tilde R = a + bf$, where f is a K-(column) vector with zero mean and unit variance and a, b are some given vector and matrix with appropriate dimension. Then clearly, $b = \mathrm{cov}(\tilde R, f)$. A portfolio π delivers $\pi^\top \tilde R = \pi^\top a + \pi^\top \mathrm{cov}(\tilde R, f) f$. An arbitrage opportunity is: $\exists \pi : \pi^\top \mathrm{cov}(\tilde R, f) = 0$ and $\pi^\top a \neq r$. To rule that out, we may show as in Part I of these Lectures that there must exist a K-(column) vector λ s.t. $a = \mathrm{cov}(\tilde R, f) \lambda + r$. This implies $\tilde R = a + bf = r + \mathrm{cov}(\tilde R, f) \lambda + bf$. That is, $E(\tilde R) = r + \mathrm{cov}(\tilde R, f) \lambda$.
6.4.2 Lognormal returns

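Next, we assume that $\tilde R$ is lognormally distributed, and that NEAPK holds. We have,
\[ 1 = E\left( m_{t+1} \cdot \tilde R_{t+1} \right) \iff e^{-\phi_0} = E\left[ e^{\sum_{i=1}^{K} \phi_i \epsilon_{i,t+1}} \cdot \tilde R_{t+1} \right]. \tag{6.11} \]
Consider first the case K = 1 and let $y_t = \ln \tilde R_t$ be normally distributed. The previous equation can be written as,
\[ e^{-\phi_0} = E\left[ e^{\phi_1 \epsilon_{t+1} + y_{t+1}} \right] = e^{E(y_{t+1}) + \frac{1}{2} \left( \phi_1^2 \sigma_\epsilon^2 + \sigma_y^2 + 2 \phi_1 \sigma_{\epsilon y} \right)}. \]
This is,
\[ E(y_{t+1}) = -\left[ \phi_0 + \frac{1}{2} \left( \phi_1^2 \sigma_\epsilon^2 + \sigma_y^2 + 2 \phi_1 \sigma_{\epsilon y} \right) \right]. \]
By applying the pricing equation (6.11) to a bond price,
\[ e^{-\phi_0} = E\left[ e^{\phi_1 \epsilon_{t+1}} e^{\ln R_{t+1}} \right] = e^{\ln R_{t+1} + \frac{1}{2} \phi_1^2 \sigma_\epsilon^2}, \]
and then
\[ \ln R_{t+1} = -\left( \phi_0 + \frac{1}{2} \phi_1^2 \sigma_\epsilon^2 \right). \]
The expected excess return is,
\[ E(y_{t+1}) - \ln R_{t+1} + \frac{1}{2} \sigma_y^2 = -\phi_1 \sigma_{\epsilon y}. \]
This equation reveals how to derive the simple theory in Section 6.1 in an alternate way. Apart from Jensen’s inequality effects ($\frac{1}{2} \sigma_y^2$), this is indeed the Lucas model of Section 6.1 once $\phi_1 = -\eta$. As is clear, this is a poor model because we are constrained to explain returns with only one “stochastic discount-factor parameter” (i.e. with $\phi_1$).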
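The mapping back to Section 6.1 can be traced with a few lines of arithmetic. In the sketch below, the single factor is taken to be $\epsilon_D$, with $\phi_1 = -\eta$ and $\phi_0 = -(r + \frac{1}{2} \lambda^2)$, which is how the kernel of Section 6.1 reads in the notation of Definition 6.4 under the no-sunspots condition. The numerical inputs (β = 0.95, η = 2, µ_D = 0.0183, σ_D = 0.0328) and the function name are assumptions for illustration.

```python
import math

def single_factor_lognormal(phi0, phi1, var_eps, var_y, cov_eps_y):
    """K = 1 case: log riskless rate, expected log return and expected excess return."""
    log_riskless = -(phi0 + 0.5 * phi1**2 * var_eps)
    mean_log_return = -(phi0 + 0.5 * (phi1**2 * var_eps + var_y + 2.0 * phi1 * cov_eps_y))
    premium = mean_log_return - log_riskless + 0.5 * var_y
    return log_riskless, mean_log_return, premium

# Illustrative Lucas-model inputs.
beta, eta, mu_d, sigma_d = 0.95, 2.0, 0.0183, 0.0328
r = -math.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1.0) * sigma_d**2   # Eq. (6.4)
lam = eta * sigma_d

# No-sunspots case: the return shock equals the dividend shock, so all second moments are sigma_D^2.
log_riskless, mean_log_return, premium = single_factor_lognormal(
    phi0=-(r + 0.5 * lam**2), phi1=-eta,
    var_eps=sigma_d**2, var_y=sigma_d**2, cov_eps_y=sigma_d**2,
)
print(log_riskless, premium)
```

The function recovers the interest rate of Eq. (6.4) and the premium $\eta \sigma_D^2$ of Section 6.1.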
Next consider the general case. Assume as usual that dividends are as in (13.26). To find the price function in terms of the state variable ǫ, we may proceed as in Section 6.1. In the absence of bubbles,
\[ S_t = \sum_{i=1}^{\infty} E\left( \frac{\xi_{t+i}}{\xi_t} \cdot D_{t+i} \right) = D_t \cdot \sum_{i=1}^{\infty} e^{\left( \mu_D + \phi_0 + \frac{1}{2} \sum_{j=1}^{K} \phi_j (\phi_j \sigma_j^2 + 2 \sigma_{j,D}) \right) \cdot i}, \qquad \sigma_{j,D} \equiv \mathrm{cov}(\epsilon_j, \epsilon_D). \]
Thus, if
\[ \hat k \equiv \mu_D + \phi_0 + \frac{1}{2} \sum_{j=1}^{K} \phi_j \left( \phi_j \sigma_j^2 + 2 \sigma_{j,D} \right) < 0, \]
then the geometric sum converges and
\[ \frac{S_t}{D_t} = \frac{e^{\hat k}}{1 - e^{\hat k}}. \]
Even in this multi-factor setting, price-dividend ratios are constant, which is counterfactual. Note that the various parameters can be calibrated so as to make the pricing kernel satisfy the Hansen-Jagannathan theoretical test conditions in Section 6.3. But the resulting model always makes the boring prediction that price-dividend ratios are constant. This multifactor model doesn’t work even if the variance of the implied pricing kernel is high and lies inside the Hansen-Jagannathan cup. Living inside the cup doesn’t necessarily imply that the resulting model is a good one. We need other theoretical test conditions. The next chapter develops such theoretical test conditions (When are price-dividend ratios procyclical? When is returns volatility countercyclical? Etc.).
6.5 Pricing kernels and Sharpe ratios

6.5.1 Market portfolios and pricing kernels
This section clarifies issues pertaining to the correlation between the market portfolio and the pricing kernel in a given market. Can this correlation ever be perfect? In general, this cannot be the case (see, also, Cecchetti, Lam, and Mark, 1994). Let $r^e_{i,t+1} = \tilde R_{i,t+1} - R_{t+1}$ be the excess return on a risky asset. By standard arguments,
\[ 0 = E_t\left( m_{t+1} r^e_{i,t+1} \right) = E_t(m_{t+1}) E_t\left( r^e_{i,t+1} \right) + \rho_{i,t} \cdot \sqrt{\mathrm{Var}_t(m_{t+1})} \cdot \sqrt{\mathrm{Var}_t\left( r^e_{i,t+1} \right)}, \]
where $\rho_{i,t} = \mathrm{corr}_t(m_{t+1}, r^e_{i,t+1})$. Hence, the Sharpe ratio, $\mathcal{S} \equiv E_t(r^e_{i,t+1}) / \sqrt{\mathrm{Var}_t(r^e_{i,t+1})}$, satisfies:
\[ |\mathcal{S}| \leq \frac{\sqrt{\mathrm{Var}_t(m_{t+1})}}{E_t(m_{t+1})} = \sqrt{\mathrm{Var}_t(m_{t+1})} \cdot R_{t+1}. \tag{6.12} \]
The highest possible Sharpe ratio is bounded. The equality holds as soon as the returns, $R^M_{t+1}$ say, are perfectly conditionally negatively correlated with the pricing kernel, $\rho_{M,t} = -1$. In this case, we say the portfolio generating these returns is a β-CAPM generating portfolio. Should such a portfolio really exist, we might call it market portfolio. The reason is that a feasible and attainable portfolio lying on the kernel volatility bounds is clearly mean-variance efficient. Let’s elaborate. As explained in the context of the static model of Chapter 1, the Sharpe ratio, $\mathcal{S}$, equals the slope of the Capital Market Line, and bears the interpretation of unit market risk-premium. If $\rho_{M,t} = -1$, then, by Eq. (6.12), the slope of the Capital Market Line reduces to $\sqrt{\mathrm{Var}_t(m_{t+1})} / E_t(m_{t+1})$. For example, for the Lucas model in Section 6.1,
\[ \frac{\sqrt{\mathrm{Var}_t(m_{t+1})}}{E_t(m_{t+1})} = \sqrt{e^{\eta^2 \sigma_D^2} - 1} \approx \eta \sigma_D. \]
In Section 6.1, we also explained that $(\mu_S - r) / \sigma_D = \eta \sigma_D$, which is only approximately true, according to the previous relation: the asset returns considered in the simple model of Section 6.1 are, then, simply not β-CAPM generating. For example, suppose that in the economy of Section 6.1, there is a single risky asset, which we would naturally refer to as “market portfolio.” Yet this asset wouldn’t be β-CAPM generating. Indeed, the model of Section 6.1 implies that $E(\tilde R) = e^{\mu_S}$, $R = e^{-\ln \beta + \eta \left( \mu_D - \frac{1}{2} \sigma_D^2 \right) - \frac{1}{2} \eta^2 \sigma_D^2}$, and $\mathrm{var}(\tilde R) = e^{2 \mu_S} \left( e^{\sigma_D^2} - 1 \right)$ and, hence, that:
\[ \mathcal{S} = \frac{1 - e^{-\eta \sigma_D^2}}{\sqrt{e^{\sigma_D^2} - 1}}. \]
Indeed, by simple computations,
\[ \rho = -\frac{1 - e^{-\eta \sigma_D^2}}{\sqrt{e^{\eta^2 \sigma_D^2} - 1} \sqrt{e^{\sigma_D^2} - 1}}, \]
which is not precisely “−1” although it is close to it for low values of $\sigma_D$. However, consumption claims are not acting as market portfolios in the sense of Chapter 2. If that consumption claim is very highly correlated with the pricing kernel, then it is also a good approximation to the β-CAPM generating portfolio. But as the previous simple example demonstrates, that is only an approximation. To summarize, the fact that everyone is using an asset (or in general a portfolio in a 2-fund separation context) doesn’t imply that the resulting return is perfectly correlated with the pricing kernel. In other terms, a market portfolio is not necessarily β-CAPM generating.⁴
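The gap between the kernel volatility bound and the Sharpe ratio of the consumption claim in the Lucas model can be quantified with the three closed forms above. The sketch below is illustrative only: σ_D = 0.0328 is the value quoted in the caption of Figure 6.1, the values of η are hypothetical, and the function name is a choice made for this example.

```python
import math

def lucas_sharpe_diagnostics(eta, sigma_d):
    """Return (bound, sharpe, rho) for the single-asset Lucas economy of Section 6.1."""
    # Kernel volatility bound sqrt(Var(m)) / E(m).
    bound = math.sqrt(math.exp(eta**2 * sigma_d**2) - 1.0)
    # Sharpe ratio of the consumption claim.
    sharpe = (1.0 - math.exp(-eta * sigma_d**2)) / math.sqrt(math.exp(sigma_d**2) - 1.0)
    # Correlation between the kernel and the return: rho = -sharpe / bound.
    rho = -sharpe / bound
    return bound, sharpe, rho

bound, sharpe, rho = lucas_sharpe_diagnostics(eta=10.0, sigma_d=0.0328)
_, _, rho_low_eta = lucas_sharpe_diagnostics(eta=1.0, sigma_d=0.0328)
print(bound, sharpe, rho, rho_low_eta)
```

The correlation is always strictly above −1, and it moves further away from −1 as η increases.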
Finally, we describe a further complication: a β-CAPM generating portfolio is not necessarily the tangency portfolio. We show the existence of another portfolio producing the same β-pricing relationship as the tangency portfolio. For reasons developed below, such a portfolio is usually referred to as the maximum correlation portfolio.
Let $\bar R = \frac{1}{E(m)}$. By the CCAPM in Chapter 2,
\[ E(R^i) - \bar R = \frac{\beta_{R^i, m}}{\beta_{R^p, m}} \left[ E(R^p) - \bar R \right], \]
where $R^p$ is a portfolio return. Next, let $R^p = R^m \equiv \frac{m}{E(m^2)}$, which is clearly perfectly correlated with the pricing kernel. By results in Chapter 2,
\[ E(R^i) - \bar R = \beta_{R^i, R^m} \left[ E(R^m) - \bar R \right]. \]
This is not yet the β-representation of the CAPM, because we have yet to show that there is a way to construct $R^m$ as a portfolio return. In fact, there is a natural choice: pick $m = m^*$, where $m^*$ is the minimum-variance kernel leading to the Hansen-Jagannathan bounds. Since $m^*$ is linear in all asset returns, $R^{m^*}$ can be thought of as a return that can be obtained by investing in all assets. Furthermore, in the appendix we show that $R^{m^*}$ satisfies,
\[ 1 = E\left( m \cdot R^{m^*} \right). \]
Where is this portfolio located? As shown in the appendix, there is no portfolio yielding the same expected return with lower variance (i.e., $R^{m^*}$ is mean-variance efficient). In addition, in the appendix we show that,
\[ E\left( R^{m^*} \right) - 1 = \frac{r - Sh}{1 + Sh} = r - \frac{1 + r}{1 + Sh} Sh < r. \]
Mean-variance efficiency of $R^{m^*}$ and the previous inequality imply that this portfolio lies in the lower branch of the mean-variance efficient portfolios. And this is so because this portfolio is positively correlated with the true pricing kernel. Naturally, the fact that this portfolio is β-CAPM generating doesn’t necessarily imply that it is also perfectly correlated with the true pricing kernel. As shown in the appendix, $R^{m^*}$ has only the maximum possible correlation with all possible m. Perfect correlation occurs exactly in correspondence of the pricing kernel $m = m^*$ (i.e. when the economy exhibits a pricing kernel exactly equal to $m^*$).

⁴ As is well-known, things are the same in economies with one agent with quadratic utility. This fact can be seen at work in the previous formulae (just take η = −1). You should also be able to show this claim with more general quadratic utility functions, as in Chapter 2.

Proof that $R^{m^*}$ is β-CAPM generating. The relations $1 = E(m^* R^i)$ and $1 = E(m^* R^{m^*})$ imply
\[ \begin{aligned} E(R^i) - R &= -R \cdot \mathrm{cov}\left( m^*, R^i \right) \\ E(R^{m^*}) - R &= -R \cdot \mathrm{cov}\left( m^*, R^{m^*} \right) \end{aligned} \]
and,
\[ \frac{E(R^i) - R}{E(R^{m^*}) - R} = \frac{\mathrm{cov}(m^*, R^i)}{\mathrm{cov}(m^*, R^{m^*})}. \]
By construction, $R^{m^*}$ is perfectly correlated with $m^*$. Precisely, $R^{m^*} = m^* / E(m^{*2}) \equiv \gamma^{-1} m^*$, $\gamma \equiv E(m^{*2})$. Therefore,
\[ \frac{\mathrm{cov}(m^*, R^i)}{\mathrm{cov}(m^*, R^{m^*})} = \frac{\mathrm{cov}\left( \gamma R^{m^*}, R^i \right)}{\mathrm{cov}\left( \gamma R^{m^*}, R^{m^*} \right)} = \frac{\gamma \cdot \mathrm{cov}\left( R^{m^*}, R^i \right)}{\gamma \cdot \mathrm{var}\left( R^{m^*} \right)} = \beta_{R^i, R^{m^*}}. \]
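The β-representation just derived holds exactly in sample when population moments are replaced with sample moments, which makes it easy to check numerically. The sketch below works with synthetic gross returns (drawn from a normal distribution, purely for illustration) and an assumed $\bar m = 0.96$; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
gross = 1.05 + 0.15 * rng.standard_normal((1000, 4))   # synthetic gross returns R^i
m_bar = 0.96                                            # assumed value of E(m)

# Minimum-variance kernel m* built from sample moments of gross returns.
mean_gross = gross.mean(axis=0)
dev = gross - mean_gross
sigma = dev.T @ dev / gross.shape[0]
m_star = m_bar + dev @ np.linalg.solve(sigma, 1.0 - m_bar * mean_gross)

# Maximum correlation portfolio return R^{m*} = m* / E(m*^2) and riskless gross return R = 1 / E(m).
r_mstar = m_star / np.mean(m_star**2)
riskless = 1.0 / m_bar

# Betas of each asset with respect to R^{m*}.
betas = (dev * (r_mstar - r_mstar.mean())[:, None]).mean(axis=0) / r_mstar.var()

lhs = mean_gross - riskless                   # E(R^i) - R
rhs = betas * (r_mstar.mean() - riskless)     # beta * [E(R^{m*}) - R]
print(lhs, rhs)
```

In this sample the mean of `r_mstar` lies below the riskless gross return, consistent with the portfolio sitting on the lower branch of the frontier.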
6.5.2 Pricing kernel bounds

Figure 6.2 depicts a typical situation for the simple neoclassical asset pricing model. The points are generated by the Lucas model for different values of η. The model has to be such that points lie above the observed Sharpe ratio ($\sigma(m) / E(m) \geq$ greatest Sharpe ratio ever observed in the data, i.e. the Sharpe ratio on the market portfolio) and inside the Hansen-Jagannathan bounds. Typically, very high values of η are required to enter the Hansen-Jagannathan bounds.
There is a beautiful connection between these things and the familiar mean-variance portfolio frontier described in Chapter 1. As shown in Figure 6.3, every asset or portfolio must lie inside the wedged region bounded by two straight lines with slopes $\mp \sigma(m) / E(m)$. This is so because, for any asset (or portfolio) that is priced with a kernel m, we have that
\[ \left| E(R^i) - R \right| \leq \frac{\sigma(m)}{E(m)} \cdot \sigma\left( R^i \right). \]
As seen in the previous section, the equality is only achieved by asset (or portfolio) returns that are perfectly correlated with m. The point here is that a tangency portfolio such as T doesn’t necessarily attain the kernel volatility bounds. Also, there is no reason for a market portfolio to lie on the kernel volatility bound. In the simple Lucas-Breeden economy considered in the previous section, for example, the (only existing) asset has a Sharpe ratio that doesn’t lie on the kernel volatility bounds. In a sense, the CCAPM doesn’t necessarily imply the CAPM, i.e. there is not necessarily an asset acting at the same time as a market portfolio and β-CAPM generating that is also priced consistently with the true kernel of the economy. These conditions simultaneously hold if the (candidate) market portfolio is perfectly negatively correlated with the true kernel of the economy, but this is very particular (it is in this sense that one may say that the CAPM is a particular case of the CCAPM). A good research question is to find conditions on families of kernels consistent with the previous considerations.

[Figure 6.2: Hansen-Jagannathan bounds and the Sharpe ratio, with E(m) on the horizontal axis and σ(m) on the vertical axis.]
On the other hand, we know that there exists another portfolio, the maximum correlation portfolio, that is also β-CAPM generating. In other terms, if $\exists R^* : R^* = -\gamma m$, for some positive constant γ, then the β-CAPM representation holds, but this doesn’t necessarily mean that $R^*$ is also a market portfolio. More generally, if there is a return $R^*$ that is β-CAPM generating, then
\[ \rho_{i,R^*} = \frac{\rho_{i,m}}{\rho_{R^*,m}}, \quad \text{all } i. \tag{6.13} \]
Therefore, we don’t need an asset or portfolio return that is perfectly correlated with m to make the CCAPM shrink to the CAPM. In other terms, the existence of an asset return that is perfectly negatively correlated with the price kernel is a sufficient condition for the CCAPM to shrink to the CAPM, not a necessary condition. The proof of Eq. (6.13) is easy. By the CCAPM,
\[ E(R^i) - R = -\rho_{i,m} \frac{\sigma(m)}{E(m)} \sigma(R^i); \quad \text{and} \quad E(R^*) - R = -\rho_{R^*,m} \frac{\sigma(m)}{E(m)} \sigma(R^*). \]
That is,
\[ \frac{E(R^i) - R}{E(R^*) - R} = \frac{\rho_{i,m}}{\rho_{R^*,m}} \frac{\sigma(R^i)}{\sigma(R^*)}. \tag{6.14} \]
But if $R^*$ is β-CAPM generating,
\[ \frac{E(R^i) - R}{E(R^*) - R} = \frac{\mathrm{cov}(R^i, R^*)}{\sigma(R^*)^2} = \rho_{i,R^*} \frac{\sigma(R^i)}{\sigma(R^*)}. \tag{6.15} \]
Comparing Eq. (6.14) with Eq. (6.15) produces (6.13).

[Figure 6.3: mean-variance diagram with σ(R) on the horizontal axis and E(R) on the vertical axis, showing the kernel volatility bounds starting at 1 / E(m), the frontier of mean-variance efficient portfolios, the tangency portfolio, and the maximum correlation portfolio.]
A final thought. Many recent applied research papers have important results but also a surprising motivation. They often state that because we observe time-varying Sharpe ratios on (proxies of) the market portfolio, one should also model the market risk-premium $\sqrt{\mathrm{Var}_t(m_{t+1})} / E_t(m_{t+1})$ as time-varying. However, this is not a rigorous motivation. The Sharpe ratio of the market portfolio is generally less than $\sqrt{\mathrm{Var}_t(m_{t+1})} / E_t(m_{t+1})$, which is only a bound. From a strictly theoretical point of view, time variation in $\sqrt{\mathrm{Var}_t(m_{t+1})} / E_t(m_{t+1})$ is neither a necessary nor a sufficient condition to observe time-varying Sharpe ratios. Figure 6.3 illustrates this point.
6.5.3 The Roll’s critique

In applications and tests of the CAPM, proxies of the market portfolio such as the S&P 500 are used. However, the market portfolio is unobservable, and this prompted Roll (1977) to point out that the CAPM is inherently untestable. The argument is that a tangency portfolio (or in general, a β-CAPM generating portfolio) always exists. So, even if the CAPM is wrong, the proxy of the market portfolio will incorrectly support the model if such a proxy is more or less the same as the tangency portfolio. On the other hand, if the proxy is not mean-variance efficient, the CAPM can be rejected even if the CAPM is right. All in all, any test of the CAPM is a joint test of the model itself and of the closeness of the proxy to the market portfolio.
6.6 Appendix
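Proof of the Equation, $1_n = E[m_t^*(\bar m) \cdot (1_n + R_t)]$. We have,
\[ \begin{aligned} E\left[ m_t^*(\bar m) \cdot (1_n + R_t) \right] &= E\left[ \left( \bar m + (R_t - E(R_t))^\top \beta_{\bar m} \right) (1_n + R_t) \right] \\ &= \bar m E(1_n + R_t) + E\left[ (R_t - E(R_t))^\top \beta_{\bar m} (1_n + R_t) \right] \\ &= \bar m E(1_n + R_t) + E\left[ (1_n + R_t) (R_t - E(R_t))^\top \right] \beta_{\bar m} \\ &= \bar m E(1_n + R_t) + E\left[ \left( (1_n + E(R_t)) + (R_t - E(R_t)) \right) (R_t - E(R_t))^\top \right] \beta_{\bar m} \\ &= \bar m E(1_n + R_t) + E\left[ (R_t - E(R_t)) (R_t - E(R_t))^\top \right] \beta_{\bar m} \\ &= \bar m E(1_n + R_t) + \Sigma \beta_{\bar m} \\ &= \bar m E(1_n + R_t) + 1_n - \bar m E(1_n + R_t), \end{aligned} \]
where the last line follows by the definition of $\beta_{\bar m}$.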

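Proof that $R^{m^*}$ can be generated by a feasible portfolio

Proof of the Equation, $1 = E(m \cdot R^{m^*})$. We have,
\[ E(m \cdot R^{m^*}) = \frac{1}{E[(m^*)^2]} E(m \cdot m^*), \]
where
\[ \begin{aligned} E(m \cdot m^*) &= \bar m^2 + E\left[ m (R_t - E(R_t))^\top \beta_{\bar m} \right] \\ &= \bar m^2 + E\left[ m (1_n + R_t)^\top \right] \beta_{\bar m} - E\left[ m (1_n + E(R_t))^\top \right] \beta_{\bar m} \\ &= \bar m^2 + 1_n^\top \beta_{\bar m} - E(m) [1_n + E(R_t)]^\top \beta_{\bar m} \\ &= \bar m^2 + \left[ 1_n - \bar m (1_n + E(R_t)) \right]^\top \beta_{\bar m} \\ &= \bar m^2 + \left[ 1_n - \bar m (1_n + E(R_t)) \right]^\top \Sigma^{-1} \left[ 1_n - \bar m (1_n + E(R_t)) \right] \\ &= \bar m^2 + \mathrm{var}(m^*), \end{aligned} \]
where the last line is due to the definition of $m^*$. Since $E[(m^*)^2] = \bar m^2 + \mathrm{var}(m^*)$, the claim follows.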

Proof that $R^{m^*}$ is mean-variance efficient. Let $p = (p_0, p_1, \dots, p_n)^\top$ be the vector of n + 1 portfolio weights (here $p_i \equiv \pi_i / w$ is the portfolio weight of asset i, i = 0, 1, ···, n). We have,
\[ p^\top 1_{n+1} = 1. \]
The returns we consider are $r_t = \left( \bar m^{-1} - 1, r_{1,t}, \dots, r_{n,t} \right)^\top$. We denote our “benchmark” portfolio return as $r_{bt} = R^{m^*} - 1$. Next, we build up an arbitrary portfolio yielding the same expected return $E(r_{bt})$ and then we show that this has a variance greater than the variance of $r_{bt}$. Since this portfolio is arbitrary, the proof will be complete. Let $r_{pt} = p^\top r_t$ such that $E(r_{pt}) = E(r_{bt})$. We have:
\[ \begin{aligned} \mathrm{cov}(r_{bt}, r_{pt} - r_{bt}) &= E\left[ r_{bt} \cdot (r_{pt} - r_{bt}) \right] \\ &= E\left[ R_{bt} \cdot (R_{pt} - R_{bt}) \right] \\ &= E(R_{bt} \cdot R_{pt}) - E\left( R_{bt}^2 \right) \\ &= \frac{1}{E(m^{*2})} E\left[ m^* \left( 1 + p^\top r_t \right) \right] - \frac{1}{[E(m^{*2})]^2} E\left( m^{*2} \right) \\ &= \frac{1}{E(m^{*2})} \left\{ p^\top E\left[ m^* (1_{n+1} + r_t) \right] - 1 \right\} \\ &= 0. \end{aligned} \]
The first line follows by construction since $E(r_{pt}) = E(r_{bt})$. The last line follows because
\[ p^\top E\left[ m^* (1_{n+1} + r_t) \right] = p^\top 1_{n+1} = 1. \]
Given this, the claim follows directly from the fact that
\[ \mathrm{var}(R_{pt}) = \mathrm{var}\left[ R_{bt} + (R_{pt} - R_{bt}) \right] = \mathrm{var}(R_{bt}) + \mathrm{var}(R_{pt} - R_{bt}) \geq \mathrm{var}(R_{bt}). \]
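Proof of the Equation, $E(R^{m^*}) - 1 = r - \frac{1 + r}{1 + Sh} Sh$. We have,
\[ E(R^{m^*}) - 1 = \frac{\bar m}{E[(m^*)^2]} - 1. \]
In terms of the notation introduced in Section 6.8, $m^*$ is:
\[ m^* = \bar m + (a \epsilon)^\top \beta_{\bar m}, \qquad \beta_{\bar m} = \sigma^{-1} \left( 1_n - \bar m \{ 1_n + b \} \right). \]
We have,
\[ \begin{aligned} E[(m^*)^2] &= E\left[ \left( \bar m + (a \epsilon)^\top \beta_{\bar m} \right)^2 \right] \\ &= \bar m^2 + E\left[ \left( (a \epsilon)^\top \beta_{\bar m} \right)^2 \right] \\ &= \bar m^2 + E\left[ (a \epsilon)^\top \beta_{\bar m} \cdot (a \epsilon)^\top \beta_{\bar m} \right] \\ &= \bar m^2 + E\left[ \beta_{\bar m}^\top a \epsilon \epsilon^\top a^\top \beta_{\bar m} \right] \\ &= \bar m^2 + \beta_{\bar m}^\top \cdot \sigma \cdot \beta_{\bar m} \\ &= \bar m^2 + \left[ 1_n^\top - \bar m \left( 1_n^\top + b^\top \right) \right] \sigma^{-1} \left[ 1_n - \bar m (1_n + b) \right] \\ &= \bar m^2 + 1_n^\top \sigma^{-1} 1_n - \bar m \left( 1_n^\top \sigma^{-1} 1_n + 1_n^\top \sigma^{-1} b \right) \\ &\quad - \bar m \left\{ 1_n^\top \sigma^{-1} 1_n + b^\top \sigma^{-1} 1_n - \bar m \left( 1_n^\top \sigma^{-1} 1_n + b^\top \sigma^{-1} 1_n + 1_n^\top \sigma^{-1} b + b^\top \sigma^{-1} b \right) \right\}. \end{aligned} \]
Again in terms of the notation of Section 6.8 ($\gamma \equiv 1_n^\top \sigma^{-1} 1_n$ and $\beta \equiv 1_n^\top \sigma^{-1} b$), this is:
\[ E\left[ (m^*)^2 \right] = \gamma - 2 \bar m (\gamma + \beta) + \bar m^2 \left( 1 + \gamma + 2 \beta + b^\top \sigma^{-1} b \right). \]
The expected return is thus,
\[ E\left( R^{m^*} \right) - 1 = \frac{E(m^*)}{E\left[ (m^*)^2 \right]} - 1 = \frac{\bar m - \gamma + 2 \bar m (\gamma + \beta) - \bar m^2 \left( 1 + \gamma + 2 \beta + b^\top \sigma^{-1} b \right)}{\gamma - 2 \bar m (\gamma + \beta) + \bar m^2 \left( 1 + \gamma + 2 \beta + b^\top \sigma^{-1} b \right)}. \]
Now recall two definitions:
\[ \bar m = \frac{1}{1 + r}; \qquad Sh = (b - 1_n r)^\top \sigma^{-1} (b - 1_n r) = b^\top \sigma^{-1} b - 2 \beta r + \gamma r^2. \]
In terms of r and Sh, we have,
\[ \begin{aligned} E\left( R^{m^*} \right) - 1 &= \frac{E(m^*)}{E\left[ (m^*)^2 \right]} - 1 \\ &= -\frac{\gamma (1 + r)^2 - (1 + r)(1 + 2 \gamma + 2 \beta) + 1 + \gamma + 2 \beta + b^\top \sigma^{-1} b}{\gamma (1 + r)^2 - (1 + r)(2 \gamma + 2 \beta) + 1 + \gamma + 2 \beta + b^\top \sigma^{-1} b} \\ &= \frac{r - Sh}{1 + Sh} \\ &= r - \frac{1 + r}{1 + Sh} Sh \\ &< r. \end{aligned} \]
This is positive if $r - Sh > 0$, i.e. if $b^\top \sigma^{-1} b - (2 \beta + 1) r + \gamma r^2 < 0$, which is possible for sufficiently low (or sufficiently high) values of r.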
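Proof that $R^{m^*}$ is the m-maximum correlation portfolio. We have to show that for any price kernel m, $|\mathrm{corr}(m, R_{bt})| \geq |\mathrm{corr}(m, R_{pt})|$. Define an ℓ-parametrized portfolio such that:
\[ E\left[ (1 - \ell) R^o + \ell R_{pt} \right] = E(R_{bt}), \qquad R^o \equiv \bar m^{-1}. \]
We have
\[ \begin{aligned} \mathrm{corr}(m, R_{pt}) &= \mathrm{corr}\left[ m, (1 - \ell) R^o + \ell R_{pt} \right] \\ &= \mathrm{corr}\left[ m, R_{bt} + \left( (1 - \ell) R^o + \ell R_{pt} - R_{bt} \right) \right] \\ &= \frac{\mathrm{cov}(m, R_{bt}) + \mathrm{cov}\left( m, (1 - \ell) R^o + \ell R_{pt} - R_{bt} \right)}{\sigma(m) \cdot \sqrt{\mathrm{var}\left( (1 - \ell) R^o + \ell R_{pt} \right)}} \\ &= \frac{\mathrm{cov}(m, R_{bt})}{\sigma(m) \cdot \sqrt{\mathrm{var}\left( (1 - \ell) R^o + \ell R_{pt} \right)}}. \end{aligned} \]
The first line follows because $(1 - \ell) R^o + \ell R_{pt}$ is a nonstochastic affine translation of $R_{pt}$. The last equality follows because
\[ \begin{aligned} \mathrm{cov}\left( m, (1 - \ell) R^o + \ell R_{pt} - R_{bt} \right) &= E\left[ m \cdot \left( (1 - \ell) R^o + \ell R_{pt} - R_{bt} \right) \right] \\ &= (1 - \ell) \cdot \underbrace{E(m R^o)}_{=1} + \ell \cdot \underbrace{E(m R_{pt})}_{=1} - \underbrace{E(m \cdot R_{bt})}_{=1} \\ &= 0, \end{aligned} \]
where the first line follows because $E\left( (1 - \ell) R^o + \ell R_{pt} \right) = E(R_{bt})$.
Therefore,
\[ \mathrm{corr}(m, R_{pt}) = \frac{\mathrm{cov}(m, R_{bt})}{\sigma(m) \cdot \sqrt{\mathrm{var}\left( (1 - \ell) R^o + \ell R_{pt} \right)}} \leq \frac{\mathrm{cov}(m, R_{bt})}{\sigma(m) \cdot \sqrt{\mathrm{var}(R_{bt})}} = \mathrm{corr}(m, R_{bt}), \]
where the inequality follows because $R_{bt}$ is mean-variance efficient (i.e. ∄ feasible portfolios with the same expected return as $R_{bt}$ and variance less than $\mathrm{var}(R_{bt})$), and then $\mathrm{var}\left( (1 - \ell) R^o + \ell R_{pt} \right) \geq \mathrm{var}(R_{bt})$, all $R_{pt}$.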
References
Campbell, J. Y. and J. Cochrane (1999): “By Force of Habit: A Consumption-Based Expla-
nation of Aggregate Stock Market Behavior.” Journal of Political Economy 107, 205-251.
Cecchetti, S., Lam, P-S. and N. C. Mark (1994): “Testing Volatility Restrictions on Intertem-
poral Rates of Substitution Implied by Euler Equations and Asset Returns.” Journal of
Finance 49, 123-152.
Cochrane, J. (1996): “A Cross-Sectional Test of an Investment-Based Asset Pricing Model.”
Journal of Political Economy 104, 572-621.
Constantinides, G. M. and D. Duffie (1996): “Asset Pricing with Heterogeneous Consumers.”
Journal of Political Economy 104, 219-40.
Ferson, W. E. and A. F. Siegel (2003): “Stochastic Discount Factor Bounds with Conditioning
Information.” Review of Financial Studies 16, 567-595.
Gallant, R. A., L. P. Hansen and G. Tauchen (1990): “Using the Conditional Moments of
Asset Payoffs to Infer the Volatility of Intertemporal Marginal Rates of Substitution.”
Journal of Econometrics 45, 141-179.
Hansen, L. P. and R. Jagannathan (1991): “Implications of Security Market Data for Models
of Dynamic Economies.” Journal of Political Economy 99, 225-262.
Mehra, R. and E. C. Prescott (1985): “The Equity Premium: A Puzzle.” Journal of Monetary
Economics 15, 145-161.
Roll, R. (1977): “A Critique of the Asset Pricing Theory’s Tests Part I: On Past and Potential
Testability of the Theory.” Journal of Financial Economics 4, 129-176.
7
The stock market
7.1 Introduction
This chapter documents a few empirical regularities affecting the aggregate stock market be-
havior but also some properties arising at a disaggregated level. It also points to general issues
about what we need to do with the neoclassical asset pricing model in Part I of these Lectures,
so as to address these empirical puzzles. Section 7.2 provides a succinct overview of the main
empirical regularities of aggregate stock market fluctuations. For example, we shall explain
that price-dividend ratios and stock returns are procyclical, and that stock volatility and risk-
premiums are both time-varying and countercyclical. Section 7.3 analyzes in deeper detail the
empirical behavior of aggregate stock market volatility, and puts forward some explanations
for it. Section 7.4 develops a framework of analysis aiming to explore the extent to which the
empirical behavior of price-dividend ratios, stock returns, risk-premiums and volatility can be
rationalized within the neoclassical framework. Section 7.5 provides two examples of economies,
which illustrate the predictions in Section 7.4: one economy, with habit formation, and a sec-
ond, with uncertain fundamentals and a learning process about them. Section 7.6 concludes the
chapter, and surveys the properties of the stock market at a disaggregated level.
7.2 The empirical evidence: bird’s eye view

Aggregate stock market fluctuations are intimately related to the business cycle. The evidence
is striking and well-known (see, e.g., the survey in Campbell, 2003), although the emphasis in
this section is to streamline how these fluctuations relate to general macroeconomic conditions.
We use data sampled at a monthly frequency, covering the period from January 1948 through
December 2002. We compute ex-post, yearly returns at month t as $\sum_{i=1}^{12} \tilde{R}_{t-i}$, where
$\tilde{R}_t = \ln\left(\frac{S_t + D_t}{S_{t-1}}\right)$, $S_t$ is the S&P Composite index as of month t, and $D_t$ is the aggregate dividend,
as calculated by Robert Shiller. Table 7.1 provides basic statistics for both raw data such as
P/D ratios, P/E ratios and ex-post returns, and stock volatility and expected returns. Stock
volatility is computed as:

$$\mathrm{Vol}_t \equiv \sqrt{6\pi} \cdot \bar{\sigma}_t, \qquad \bar{\sigma}_t \equiv \frac{1}{12} \sum_{i=1}^{12} \left| \tilde{R}_{t+1-i} - R_{t+1-i} \right|, \qquad (7.1)$$

where $R_t$ is the risk-free rate, taken to be the one month bill return. The rationale behind this
calculation is as follows. First, $\bar{\sigma}_t$ is an estimate of the average volatility occurring over the last
12 months. We annualize $\bar{\sigma}_t$ by multiplying it by $\sqrt{12}$. The term $\sqrt{6\pi}$ arises for the following
reason. If we assume that a given return R = σu, where σ is a positive constant and u is a
standard normal, then $E(|R|) = \sigma\sqrt{2/\pi}$. The definition of $\mathrm{Vol}_t$ in Eq. (7.1), then, follows
by multiplying $\sqrt{12}\,\bar{\sigma}_t$ by $\sqrt{\pi/2}$. This correction term, $\sqrt{\pi/2}$, has been suggested by Schwert
(1989a) in a related context.
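The volatility calculation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rolling_abs_vol`, the sample numbers, and the generalization of the scaling factor to an arbitrary window length are not part of the text. With a 12-month window the factor reduces to the $\sqrt{6\pi}$ discussed above.

```python
import math

def rolling_abs_vol(excess_returns, window=12):
    """Annualized volatility from trailing absolute excess returns.

    For each month t with a full trailing window, computes
    sqrt(window * pi / 2) * mean(|x|). With window = 12 the factor
    equals sqrt(6 * pi), i.e. sqrt(12) times the sqrt(pi / 2) correction.
    """
    scale = math.sqrt(window * math.pi / 2.0)
    vols = []
    for t in range(window - 1, len(excess_returns)):
        chunk = excess_returns[t - window + 1 : t + 1]
        sigma_bar = sum(abs(x) for x in chunk) / window  # average absolute excess return
        vols.append(scale * sigma_bar)
    return vols

# Hypothetical monthly excess returns, in percent (illustrative numbers only).
sample = [1.2, -0.8, 2.5, -3.1, 0.4, 1.9, -1.5, 0.7, -2.2, 3.0, -0.6, 1.1, -4.0]
vols = rolling_abs_vol(sample)
print([round(v, 2) for v in vols])
```

Thirteen monthly observations give two volatility readings, one for each month in which a full 12-month window is available.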
Expected returns are computed through the Fama and French (1989) predictive regressions
of $\tilde{R}_t$ on the default premium, the term premium and the previously defined return volatility, $\mathrm{Vol}_t$.
With the exception of the P/D and P/E ratios, all figures are annualized percent.
We note the first main set of stylized facts:
Fact I. P/D, P/E ratios and ex-post returns are procyclical, although variations in
the business cycle conditions do not seem to be the only driving force for them.
For example, Figure 7.1 reveals that price-dividend ratios decline during all of the economic
slowdowns, as signaled by the recession indicator calculated by the National Bureau of Economic
Research (NBER)–the NBER recessions. At the same time, during NBER expansions, price-
dividend ratios seem to be driven by additional factors not necessarily related to the business
cycle. For example, during the “roaring” 1960s, price-dividend ratios experienced two major
drops with the same magnitude as the decline at the very beginning of the “chaotic” 1970s.
Ex-post returns follow approximately the same pattern, although they are more volatile than
price-dividend ratios (see Figure 7.2).
What about the first two conditional moments of asset returns?
Fact II. Stock volatility and expected returns are countercyclical. However, busi-
ness cycle conditions do not seem to be the only forces explaining the swings of
these variables.
Figures 7.3 through 7.5 are suggestive. For example, Figure 7.4 depicts the statistical relation
between stock volatility and the industrial production growth rate over the last sixty years,
which shows that stock volatility is largely countercyclical, being larger in bad times than in
good.1 There are, of course, exceptions. For example, stock volatility rocketed to almost 23%
during the 1987 crash, a crash occurring during one of the most enduring post-war expansion
periods. Countercyclical volatility is a stylized fact extensively discussed in Sections 7.3 and 7.4.
In those sections, we shall learn that within the neoclassical modeling framework, this property
is likely to arise as soon as the volatility of changes in P/D ratios is countercyclical. Table 7.1
reveals, then, that P/D ratio variations are more volatile in bad times than in good.
A third set of stylized facts relates to the asymmetric behavior of the previous variables over
the business cycle:
1 The predictive regressions in Figures 7.4, 7.5 and 7.7 are obtained through least absolute deviations regressions, a technique
known to be more robust to the presence of outliers than ordinary least squares (see Bloomfield and Steiger, 1983).
Fact III. Changes in P/D ratios and expected returns behave asymmetrically over
the business cycle: the deepest variations in these variables occur during the
contractionary phases of the business cycle.
During recessions, these variables move more than they do in good times. As an example, not
only are expected returns countercyclical: on average, expected returns increase more during
NBER recessions than they decrease during NBER expansions. Similarly, not only are P/D
ratios procyclical: on average, P/D ratios increase less during NBER expansions than they
decrease during NBER recessions. Moreover, this asymmetric behavior is, quantitatively, quite
pronounced. Consider, for example, the changes in the P/D ratios: on average, their percentage
(negative) changes during recessions are nearly twice as large as the percentage (positive) changes
during expansions. Sections 7.3 and 7.4 aim to provide explanations of these facts within neoclassical
models, and develop theoretical test conditions that the very same models would have to satisfy
in order to be consistent with these facts.
                              total              NBER expansions    NBER recessions
                              average  std dev   average  std dev   average  std dev
P/D ratio                     31.99    15.88     33.21    15.79     26.20    14.89
P/E ratio                     15.79     6.89     16.36     6.62     13.04     7.46
ln(P/D_{t+1} / P/D_t)          2.01    12.13      3.95    10.81     −7.28    16.79
one year returns               8.59    15.86     12.41    13.04     −9.45    15.49
real risk-free rate            1.02     2.48      1.03     2.43      0.97     2.69
excess return volatility      11.34     3.89     10.80     3.59     13.91     4.15
expected returns               8.36     3.49      8.09     3.29      9.62     4.10
TABLE 7.1. Data are sampled monthly and cover the period from January 1948 through December
2002. With the exception of the P/D and P/E ratio levels, all figures are annualized percent.
[Figure 7.1: time-series plot of the P/D ratio and the P/E ratio, 1954-2002.]
FIGURE 7.1. P/D and P/E ratios
[Figure 7.2: time-series plot, 1954-2002.]
FIGURE 7.2. Monthly smoothed excess returns (%)
[Figure 7.3: time-series plot, 1954-2002.]
FIGURE 7.3. Stock market volatility
[Figure 7.4: two panels. Left panel, "Return volatility and industrial production": return volatility (annualized, percent) against the industrial production growth rate (percent). Right panel, "Predictive regression": predicted volatility (annualized, percent) against the industrial production growth rate (percent).]
FIGURE 7.4. Stock volatility and business cycle conditions. The left panel plots stock
volatility, $\mathrm{Vol}_t$, against yearly (deseasoned) industrial production average growth rates,
computed as $\mathrm{IP}_t \equiv \frac{1}{12} \sum_{i=1}^{12} \mathrm{Ind}_{t+1-i}$, where $\mathrm{Ind}_t$ is the real, seasonally adjusted
industrial production growth as of month t. The right panel depicts the prediction of the
static least absolute deviations regression: $\mathrm{Vol}_t = 12.01 - 5.57 \cdot \mathrm{IP}_t + 2.06 \cdot \mathrm{IP}_t^2 + w_t$, where
$w_t$ is a residual term, and the robust standard errors of the three coefficients are (0.16), (0.33)
and (0.35), respectively. The data span the period from January 1948 to December 2002.
[Figure 7.5: two panels. Left panel, "Expected Returns & Industrial Production": expected stock returns (annualized, %) against the industrial production growth rate (%). Right panel, "Predictive regression": predicted expected returns (annualized, %) against the industrial production growth rate (%).]
FIGURE 7.5. The left-hand side of this picture plots estimates of the expected returns
(annualized, percent) ($\hat{E}_t$ say) against yearly (deseasoned) industrial production average
growth rates, computed as $\mathrm{IP}_t \equiv \frac{1}{12} \sum_{i=1}^{12} \mathrm{Ind}_{t+1-i}$, where $\mathrm{Ind}_t$ is the real, seasonally
adjusted industrial production growth as of month t. Expected returns are estimated
through the predictive regression of S&P returns on the default premium, the term premium
and return volatility, $\mathrm{Vol}_t$. The right-hand side of this picture depicts the prediction of the
static least absolute deviations regression: $\hat{E}_t = 8.56 - 4.05 \cdot \mathrm{IP}_t + 1.18 \cdot \mathrm{IP}_t^2 + w_t$, where
$w_t$ is a residual term, and the robust standard errors of the three coefficients are (0.15), (0.30)
and (0.31), respectively. Data are sampled monthly, and span the period from January 1948 to December 2002.
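To read the two fitted quadratics reported in the captions of Figures 7.4 and 7.5, it can help to evaluate them at a few growth rates. The sketch below takes the point estimates from those captions as given. The function name and the chosen growth rates (-1, 0 and 1 percent) are illustrative assumptions, and the calculation ignores estimation uncertainty.

```python
def fitted_quadratic(ip, intercept, slope, curvature):
    """Value of intercept + slope * IP + curvature * IP^2 at growth rate ip."""
    return intercept + slope * ip + curvature * ip ** 2

# Point estimates as reported in the captions of Figures 7.4 and 7.5.
VOL_COEFS = (12.01, -5.57, 2.06)   # predicted stock volatility
ER_COEFS = (8.56, -4.05, 1.18)     # predicted expected returns

for ip in (-1.0, 0.0, 1.0):
    vol = fitted_quadratic(ip, *VOL_COEFS)
    er = fitted_quadratic(ip, *ER_COEFS)
    print(f"IP growth {ip:+.1f}%: volatility {vol:.2f}, expected return {er:.2f}")

# Asymmetry of the fitted volatility curve around zero growth.
rise = fitted_quadratic(-1.0, *VOL_COEFS) - fitted_quadratic(0.0, *VOL_COEFS)
fall = fitted_quadratic(0.0, *VOL_COEFS) - fitted_quadratic(1.0, *VOL_COEFS)
```

On these estimates, predicted volatility is about 19.64 at a growth rate of -1 percent, 12.01 at zero, and about 8.50 at +1 percent, so the fitted curve rises more on the downside than it falls on the upside. This is consistent with the asymmetry summarized in Fact III.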
Fact I brings a quite intuitive consequence: price-dividend ratios might convey information
relating to future returns. After all, expansions are followed by recessions. Therefore, in good
times, the stock market predicts that in the future, returns will be negative. Define the excess
return as $\tilde{R}^e_t \equiv \tilde{R}_t - R_t$, and consider the following regressions,

$$\tilde{R}^e_{t+n} = a_n + b_n \times \mathrm{P/D}_t + u_{n,t}, \qquad n \ge 1,$$

where u is a residual term. Typically, then, the estimates of $b_n$ are significantly negative, and
the $R^2$ for these regressions increases with n. In turn, the previous regressions imply that
$E[\tilde{R}^e_{t+n} \mid \mathrm{P/D}_t] = a_n + b_n \times \mathrm{P/D}_t$. They thus suggest that price-dividend ratios are driven by
expected excess returns. In this restrictive sense, countercyclical expected returns (Fact II) and
procyclical price-dividend ratios (Fact I) might be two sides of the same coin.
Moreover, an apparently puzzling feature is that price-dividend ratios do not predict future
dividend growth. Let $g_t \equiv \ln(D_t / D_{t-1})$. In regressions taking the following format,

$$g_{t+n} = a_n + b_n \times \mathrm{P/D}_t + u_{n,t}, \qquad n \ge 1,$$

the predictive content of price-dividend ratios is poor, and estimates of bn might often come
with a wrong sign.
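A minimal sketch of how predictive regressions of this type can be set up is given below. It uses plain ordinary least squares on one regressor, pairing the price-dividend ratio at t with the left-hand side variable at t+n. The function names and the synthetic data are hypothetical, and no claim is made that this reproduces the estimation procedure behind the results quoted in the text.

```python
def ols_fit(x, y):
    """Univariate OLS of y on x. Returns (intercept, slope, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

def predictive_regression(lhs, pd_ratio, horizon):
    """Regress lhs_{t+horizon} on P/D_t for horizon >= 1."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    x = pd_ratio[:-horizon]   # P/D_t
    y = lhs[horizon:]         # left-hand side variable dated t + horizon
    return ols_fit(x, y)

# Synthetic example: next-period excess returns fall linearly with today's P/D.
pd_series = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
excess = [0.0] + [10.0 - 0.2 * pd for pd in pd_series[:-1]]
a1, b1, r2 = predictive_regression(excess, pd_series, horizon=1)
```

The same function applies to the dividend growth regressions by passing $g_t$ as the left-hand side series.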
The previous regressions thus suggest that: (i) price-dividend ratios are driven by time-varying
expected returns (i.e. by time-varying risk-premiums); and (ii) the role played by expected
dividend growth is somewhat limited. As we shall see later in this chapter, this view can be
challenged along several dimensions. First, it seems that expected earnings growth does help
predict price-dividend ratios. Second, the fact that expected dividend growth does not seem to
affect price-dividend ratios can be a property to be expected in equilibrium.
Naturally, because expected returns and stock volatility are both strongly countercyclical,
they are positively related at the business cycle frequency considered in this chapter, as
illustrated by Figure 7.6 below.
[Figure 7.6: scatter plot, "Expected Returns & Stock Volatility": expected returns (annualized, %) against stock volatility (annualized, %).]
FIGURE 7.6.
7.3 Volatility: a business cycle perspective

A prominent feature of the U.S. stock market is the close connection between aggregate stock
volatility and business cycle developments, as Figure 7.4 vividly illustrates. Understanding
the origins and implications of these facts is extremely relevant to policy makers. Indeed, if
stock market volatility is countercyclical, it must necessarily be encoding information about
the development of the business cycle. Policy makers could then attempt to extract the
signals that stock volatility conveys about the development of the business cycle.
This section accomplishes three tasks. First, it delivers more details about stylized facts
relating stock volatility, expected returns and P/D ratios over the business cycle (in Section
7.3.1). Second, it provides a few preliminary theoretical explanations of these facts (in Section
7.3.2). Third, it investigates whether stock volatility contains any useful information about the
business cycle development (in Section 7.3.3). There are other exciting topics left out of
this section. For example, we do not tackle statistical issues related to volatility measurement
(see, e.g., Andersen, Bollerslev and Diebold, 2002, for a survey on the many available statis-
tical techniques to estimate volatility). Nor do we consider the role of volatility in applied
asset evaluation: Chapter 10, instead, provides details about how time-varying volatility affects
derivative pricing. At a more fundamental level, the focus of this section is to explore the extent
to which stock market volatility movements can be given a wider business cycle perspective,
and to highlight some of the rational mechanisms underlying them.
7.3.1 Volatility cycles

Why is stock market volatility related to the business cycle? Financial economists seem to
have overlooked this issue for decades. A notable exception is an early contribution by Schwert
(1989a,b), who demonstrates how difficult it is to explain low frequency fluctuations in stock
market volatility through low frequency variation in the volatility of other macroeconomic
variables. A natural exercise at this juncture is to look into the statistical properties of industrial
production volatility and check whether this correlates with stock volatility. Accordingly, we
compute industrial production volatility as $\mathrm{Vol}_{G,t} \equiv \frac{1}{\sqrt{12}} \sum_{i=1}^{12} |G_{t+1-i}|$, where $G_t$ is the real,
seasonally adjusted industrial production growth rate as of month t, similarly as in Eq. (7.1).
Figure 7.7 plots stock volatility against the volatility of industrial production growth, and does
not reveal any statistically discernible pattern between these two variables. These results are in
striking contrast with those available from Figure 7.4, where, instead, stock volatility exhibits a
quite clear countercyclical behavior. More in detail, Table 7.1 reveals that stock market volatility
is almost 30% higher during NBER recessions than during NBER expansions.
In fact, Schwert also shows that stock volatility is countercyclical. The main focus of this
section is to provide a few explanations for this seemingly puzzling evidence, in support of the
view that stock market volatility relates to the business cycle, although not precisely to the
volatility of other macroeconomic variables.
A seemingly separate, yet very well-known, stylized fact is that risk-premiums (i.e. the
investors’ expected return from investing in the stock market) are countercyclical (see, e.g., Fama and
French, 1989, and Ferson and Harvey, 1991), as summarized by Fact II. Particularly important
is also Fact III, that expected returns fall much less during expansions than they rise
during recessions. Using post-war data, we find that compared to an average of 8.36%, the
expected returns increase by nearly 19% during recessions and drop by a mere 3% during NBER
expansions (see Table 7.1). A final stylized fact relates to the behavior of the price-dividend
ratios over the business cycle. Table 7.1 reveals that price-dividend ratios are not only
procyclical. Over the last fifty years at least, price-dividend ratio movements in the US have also
been asymmetric over the business cycle: downward changes occurring during recessions have
been more severe than upward movements occurring during expansions. Table 7.1 suggests that
price-dividend ratios fluctuate nearly two times more in recessions than in expansions.
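The magnitudes quoted in this subsection can be traced back to the averages in Table 7.1 with simple ratios. The sketch below is only an arithmetic check on the table entries. The dictionary layout and the helper `pct_diff` are illustrative assumptions.

```python
# Averages taken from Table 7.1 (annualized percent).
TABLE_7_1 = {
    "excess return volatility": {"total": 11.34, "expansions": 10.80, "recessions": 13.91},
    "expected returns": {"total": 8.36, "expansions": 8.09, "recessions": 9.62},
    "P/D log changes": {"total": 2.01, "expansions": 3.95, "recessions": -7.28},
}

def pct_diff(value, base):
    """Percentage difference of value relative to base."""
    return 100.0 * (value / base - 1.0)

vol = TABLE_7_1["excess return volatility"]
er = TABLE_7_1["expected returns"]
pd_chg = TABLE_7_1["P/D log changes"]

vol_gap = pct_diff(vol["recessions"], vol["expansions"])        # recessions vs expansions
er_rec_vs_exp = pct_diff(er["recessions"], er["expansions"])    # recessions vs expansions
er_rec_vs_avg = pct_diff(er["recessions"], er["total"])         # recessions vs full sample
er_exp_vs_avg = pct_diff(er["expansions"], er["total"])         # expansions vs full sample
pd_ratio_of_moves = abs(pd_chg["recessions"]) / pd_chg["expansions"]

print(round(vol_gap, 1), round(er_rec_vs_exp, 1), round(er_rec_vs_avg, 1),
      round(er_exp_vs_avg, 1), round(pd_ratio_of_moves, 2))
```

On these table entries, stock volatility is about 28.8% higher in recessions than in expansions. Expected returns in recessions are about 18.9% above their expansion level and about 15.1% above the full-sample average, while expected returns in expansions are about 3.2% below that average. The average log change in the P/D ratio during recessions is about 1.84 times the average change during expansions, in absolute value.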
How can we rationalize these facts? A simple possibility is that the economy is frequently hit
by shocks that display the same qualitative behavior as return volatility, expected returns and
price-dividend ratios. However, the empirical evidence summarized in Figure 7.7 suggests this
channel is unlikely. Another possibility is that the economy reacts to shocks, thanks to some
mechanism endogenously related to the investors’ maximizing behavior, which then activates
the previous phenomena. The next section puts forward explanations for countercyclical stock
volatility, which rely on such endogenous mechanisms. Section 7.3.3, instead, provides addi-
tional empirical results about cyclical properties of stock volatility. The motivation is simple:
because stock volatility is countercyclical, it might contain useful information about ongoing
business cycle developments. The section, then, aims to provide some answers to the following
questions: (i) Do macroeconomic factors help explain the dynamics of stock market volatility?
(ii) Conversely, what is the predictive content stock market volatility brings about the devel-
opment of the business cycle? (iii) Finally, how does “risk-adjusted” volatility relate to the
business cycle?
[Figure 7.7: two panels. Left panel, "Return volatility and industrial production volatility": return volatility (annualized, percent) against industrial production volatility (annualized, percent). Right panel, "Predictive regression": predicted return volatility (annualized, percent) against industrial production volatility (annualized, percent).]
FIGURE 7.7. Return volatility and industrial production volatility. The left panel plots
stock volatility, $\mathrm{Vol}_t$, against industrial production volatility, $\mathrm{Vol}_{G,t}$. The right panel of
the picture depicts the prediction of the static least absolute deviations regression:
$\mathrm{Vol}_t = 12.28 - 0.83 \cdot \mathrm{Vol}_{G,t} - 0.12 \cdot \mathrm{Vol}_{G,t}^2 + w_t$, where w is a residual term, and the standard errors
of the three coefficients are (0.83), (0.47) and (0.05), respectively. The data span the period from
January 1948 to December 2002.
7.3.2 Understanding the empirical evidence

This section has two aims. First, it develops a simple example of an economy where
countercyclical volatility arises in conjunction with the property that investors’ required returns are
(i) countercyclical, and (ii) asymmetrically related to business cycle developments: an economy, that
is, where risk-premiums increase more in bad times than they decrease in good, as suggested
by the evidence in Table 7.1. Second, the section reviews additional plausible explanations for
countercyclical volatility, where large price swings might relate to the investors’ process of
learning about the fundamentals of the economy. The aim of this section is to introduce some of
the main explanations of aggregate stock market fluctuations, which will be developed in greater
depth in the remaining parts of this and the following chapters.
7.3.2.1 Fluctuating compensation for risk
In frictionless markets, the price of a long-lived security and, hence, the aggregate stock market,
is simply the risk-adjusted discounted expectation of the future dividends stream. Other things
being equal, this price increases as the expected return from holding the asset and hence, the
risk-premium, decreases. According to this mechanism, asset prices and price-dividend ratios
are pro-cyclical because risk-adjusted discount rates are countercyclical.
Next, let us develop the intuition about why a countercyclical and asymmetric behavior of
the risk-premium might lead to countercyclical volatility. Assume, first, that risk-premiums
are countercyclical and that they decrease less in good times than they increase in bad times,
consistently with the empirical evidence discussed in the previous section. Next, suppose the
economy enters a boom, in which case we expect risk-premiums to decrease and asset prices to
increase, on average, as illustrated by Figure 7.8. The critical point is that during the boom,
the economy is hit by shocks to the fundamentals, which make risk-premiums and asset prices
change. However, risk-premiums and, hence, asset prices, do not change as much as they would
during a recession, since we are assuming that they behave asymmetrically over the business cycle. Then
eventually, the boom ends and a recession begins. As the economy enters the recession,
risk-premiums increase and asset prices decrease. Yet now, the shocks hitting the economy make
risk-premiums and, hence, prices, change more than they did during the boom. Once
again, the reason for this asymmetric behavior is our assumption that risk-premiums
change asymmetrically over the business cycle.
[Figure 7.8: two panels plotted against Y. Left panel: risk-adjusted discount rates. Right panel: price-dividend ratio. Each curve is labeled with its "good times" and "bad times" regions.]
FIGURE 7.8. Countercyclical risk-premiums and stock volatility.
The empirical evidence in Table 7.1 is supportive of the channel described above: expected
returns seem to move more during recessions than during expansions. Figure 7.9 connects such
an asymmetric behavior of the expected returns with short-run macroeconomic fluctuations. It
depicts how expected returns relate to monthly industrial production growth, according to
whether the U.S. economy is in a booming or a recessionary phase.
[Figure 7.9: two panels. Left panel, "Expected returns & Short-Run Industrial Production Growth": expected returns (annualized, %) against monthly IP growth (%). Right panel, "Predictive regression": predicted expected returns (annualized, %) against monthly IP growth (%).]
FIGURE 7.9. This picture is as Figure 7.5, except that it uses monthly IP growth. The
predictive regression depicts the prediction of the ordinary least squares regression:
$\hat{E} = 8.299 - I_{\text{recession}} \cdot 1.006 \cdot \mathrm{Ind} - I_{\text{expansion}} \cdot 0.169 \cdot \mathrm{Ind} + w$, where $I_{\text{recession}}$ (resp. $I_{\text{expansion}}$) is
the indicator function taking the value one if the economy is in an NBER-recession (resp.
expansion) episode and zero otherwise, w is a residual term, and the standard errors of the
three coefficients are (0.155), (0.339) and (0.153), respectively.
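The regime-dependent regression in the caption of Figure 7.9 can be read as two lines sharing an intercept. The short sketch below evaluates it using the reported point estimates. The function name and the one-percentage-point shock are illustrative assumptions.

```python
INTERCEPT = 8.299
SLOPE_RECESSION = -1.006   # coefficient on Ind when the recession indicator is one
SLOPE_EXPANSION = -0.169   # coefficient on Ind when the expansion indicator is one

def fitted_expected_return(ind_growth, in_recession):
    """Fitted expected return (annualized, %) given monthly IP growth (%)."""
    slope = SLOPE_RECESSION if in_recession else SLOPE_EXPANSION
    return INTERCEPT + slope * ind_growth

# Response to a one-percentage-point drop in monthly IP growth, by regime.
rec = fitted_expected_return(-1.0, in_recession=True)
exp = fitted_expected_return(-1.0, in_recession=False)
sensitivity_ratio = SLOPE_RECESSION / SLOPE_EXPANSION
print(round(rec, 3), round(exp, 3), round(sensitivity_ratio, 2))
```

On these estimates, the fitted expected return responds roughly six times more strongly to industrial production growth in recessions than in expansions.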
To summarize, if risk-premiums are more volatile during recessions than booms, asset prices
and, hence, price-dividend ratios are more responsive to changes in economic conditions in bad
times than in good, thereby leading to countercyclical volatility. These effects are precisely those
we observe, as explained. The next section develops theoretical foundations for these facts,
for a fairly general class of models with rational expectations, based on Mele (2007). A key
result is that countercyclical volatility is likely to arise in many models, provided the previous
asymmetry in discounting is sufficiently strong. More precisely, if the asymmetry in discounting
is sufficiently strong, then the price-dividend ratio is an increasing and concave function of
some variables tracking the business cycle conditions. It is this concavity feature that makes
stock volatility increase on the downside. Under similar conditions, models with external habit
formation predict countercyclical stock volatility by the same arguments (see, for example,
Campbell and Cochrane, 1999; Menzly, Santos and Veronesi, 2004; Mele, 2007). Brunnermeier
and Nagel (2007) find that US investors do not change the composition of their risky asset
holdings in response to changes in wealth. The authors interpret this finding as evidence against
external habit formation. Naturally, time-varying risk-premiums do not exclusively emerge in models
with external habit formation. Barberis, Huang and Santos (2001) develop a theory distinct
from habit formation that leads to time-varying risk-premiums.
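The role of concavity can be illustrated numerically. The sketch below uses a purely hypothetical price-dividend function, p(y) = 30 + 10 ln(y), chosen only because it is increasing and concave in a state variable y. It is not the function implied by Mele (2007) or by any model cited above, and the parameter values are arbitrary. The exercise applies the same proportional shock to y in a low state and in a high state and compares the percentage response of the price-dividend ratio.

```python
import math

def pd_ratio(y, level=30.0, slope=10.0):
    """Hypothetical increasing and concave price-dividend function of y > 0."""
    return level + slope * math.log(y)

def pct_response(y, shock=0.10):
    """Absolute percentage change in the P/D ratio after y falls by `shock`."""
    before = pd_ratio(y)
    after = pd_ratio(y * (1.0 - shock))
    return 100.0 * abs(after / before - 1.0)

bad_times = pct_response(0.5)    # low value of the state variable
good_times = pct_response(2.0)   # high value of the state variable
print(round(bad_times, 2), round(good_times, 2))
```

Under this assumed functional form, the same 10% shock to the state variable moves the price-dividend ratio by a larger percentage when y is low than when y is high, which is the sense in which volatility increases on the downside.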
These theoretical explanations for countercyclical volatility, which are further developed in
the next section, hold within a fairly general continuous-time framework. Although their proofs
might be technical, the intuition is precisely that illustrated by Figure 7.8. This section, then,
aims to provide a quantitative illustration of these results, which hinges upon a simple analytical
framework, relying on a tree. This model is very simple, but it can be solved analytically and,
as shown below, is able to reproduce some of the main stylized features of the actual aggregate
stock market behavior.
We consider an infinite horizon economy with a representative investor who in equilibrium
consumes (state by state) all the dividends promised by some asset. We assume that there exists
a safe asset elastically supplied such that the safe interest rate is some constant r > 0. In the
initial state, a dividend process takes a unit value (see Figure 7.10). In the second period, the
dividend equals either $e^{-\delta}$ ($\delta > 0$) with probability p (the bad state) or $e^{\delta}$ with probability 1 − p
(the good state). In the initial state, the investor’s coefficient of constant relative risk-aversion
(CRRA) is $\eta > 0$. In the good (resp., the bad) state, the investor’s CRRA is $\eta_G$ (resp., $\eta_B$)
> 0. In the third period, the investor receives the final payoffs in Figure 7.10, where $M_S$ is the
price of a claim to all future dividends, discounted at a CRRA $\eta_S$, with S ∈ {G, B, GB} and
$\eta_{GB} = \eta$ (the “hybrid” state). This model is thus one with constant expected dividend growth,
but random risk-aversion.
[Figure 7.10: three-period tree. The dividend equals 1 at the initial node. It then moves to $e^{\delta}$ (good state) or to $e^{-\delta}$ (bad state, with probability p and risk-neutral probability q). Further movements from the good state are governed by the risk-neutral probability $q_G$, and those from the bad state by $q_B$. The terminal payoffs are $e^{2\delta} + M_G$ (upper node), $1 + M_{GB}$ (central node) and $e^{-2\delta} + M_B$ (lower node).]
FIGURE 7.10. A tree model of random risk-aversion and countercyclical volatility. The
dividend process takes a unit value at the initial node. With probability p, the dividend
then decreases to $e^{-\delta}$ in the bad state. The corresponding risk-neutral probability is
denoted as q. The risk-neutral probability of further dividend movements differs according
to whether the economy is in the good or bad state (i.e. $q_G$ or $q_B$). At the end of the tree,
the investor receives the dividends plus the right to the stream of all future dividends. In
the upper node, this right is worth $M_G$ (the evaluation obtained through the risk-neutral
probability $q_G$). In the central node it is worth $M_{GB}$ (the evaluation obtained through
the risk-neutral probability q). In the lower node it is worth $M_B$ (the evaluation obtained
through the risk-neutral probability $q_B$). The safe interest rate is taken to be constant.
The model is calibrated using the same U.S. data as in Table 7.1, and calibration results are
in Table 7.2. (Appendix A provides all technical details about the solution and the calibration
of the model.) In spite of its overly simplifying assumptions, the model does reproduce volatility
swings similar to those we observe in the data, although in the bad state of the world, it might
overstate the expected returns levels by a few percentage points. Importantly, this calibration
exercise illustrates in an exemplary manner the asymmetric feature of expected returns and
risk aversion. In this simple experiment, both expected returns and risk-aversion increase much
more in bad times than they decrease in good times.
                              Data
                              expansions   average   recessions
P/D ratio                     33.21        31.99     26.20
excess return volatility      10.80        11.34     13.91

                              Model calibration
                              good state   average   bad state
P/D ratio                     32.50        31.81     28.15
excess return volatility       7.29         8.20     13.03
risk-adjusted rate             8.95         9.07      9.71
expected returns              10.16        11.46     18.42
implied risk-aversion         13.69        13.89     14.96
TABLE 7.2. Infinite horizon model. This table reports calibration results for the infinite
horizon tree model in Figure 7.10. The expected returns and excess return volatility predicted
by the model are computed using log-returns. The risk-adjusted rate is computed as
$r + \hat{\sigma}_D \lambda_S$, where: r is the continuously compounded riskless rate; $\hat{\sigma}_D$ is the dividend volatility;
$\lambda_S$ is the Sharpe ratio on gross returns in state S, computed as $\lambda_S \equiv (q_S - p) / \sqrt{p(1-p)}$
for S = G (the good state) and S = B (the bad state); p is the probability of the bad state;
and $q_S$ is the state dependent risk-adjusted probability of a bad state (for S ∈ {G, B}).
Implied risk-aversion is the coefficient of relative risk aversion $\eta_S$ in the good state (S = G)
and in the bad state (S = B), implied by the calibrated model. The figures in the “average”
column are the averages of the corresponding values in the good and bad states taken under
the probability p = 0.158.
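The quantities defined in the caption of Table 7.2 lend themselves to a quick numerical check. The sketch below recomputes the "average" column from the good-state and bad-state values under p = 0.158, and codes the Sharpe ratio and risk-adjusted rate formulas as stated in the caption. The function names are illustrative assumptions. The calibrated values of $q_S$, r and $\hat{\sigma}_D$ are not reported in this section, so those two formulas are provided without being evaluated on the calibration.

```python
import math

P_BAD = 0.158  # probability of the bad state, as stated in the caption of Table 7.2

# (good state, bad state, reported average) from the model calibration panel.
MODEL_ROWS = {
    "P/D ratio": (32.50, 28.15, 31.81),
    "excess return volatility": (7.29, 13.03, 8.20),
    "risk-adjusted rate": (8.95, 9.71, 9.07),
    "expected returns": (10.16, 18.42, 11.46),
    "implied risk-aversion": (13.69, 14.96, 13.89),
}

def state_average(good, bad, p_bad=P_BAD):
    """Average of the good-state and bad-state values under probability p_bad."""
    return (1.0 - p_bad) * good + p_bad * bad

def sharpe_ratio(q_s, p):
    """lambda_S = (q_S - p) / sqrt(p * (1 - p)), as in the caption of Table 7.2."""
    return (q_s - p) / math.sqrt(p * (1.0 - p))

def risk_adjusted_rate(r, sigma_d, lam):
    """Risk-adjusted rate r + sigma_D * lambda_S."""
    return r + sigma_d * lam

for name, (good, bad, reported) in MODEL_ROWS.items():
    print(f"{name}: computed {state_average(good, bad):.3f}, reported {reported:.2f}")
```

The recomputed averages agree with the reported column up to rounding of the table entries. The Sharpe ratio formula is zero when the risk-adjusted probability of the bad state equals p, and positive when $q_S$ exceeds p.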
7.3.2.2 Alternative stock market volatility channels
Rational explanations of stock market fluctuations must necessarily rely on some underlying
state variable affecting the investors’ decision environment. Two natural ways to accomplish
this task are obtained through the introduction of (i) time-varying risk-premiums; and (ii) time-
varying expected dividend growth. The previous tree model is one simple example addressing
the first extension. More substantive examples of models predicting time-varying risk-premiums
are the habit formation models mentioned in Section 7.3, and in Section 7.5 below.
Models addressing the second extension have also been produced. For example, Veronesi
(1999, 2000) and Brennan and Xia (2001) have proposed models in which stock market volatility
fluctuates as a result of a learning induced phenomenon. In these models, the growth rate of
the economy is unknown and investors attempt to infer it from a variety of public signals. This
inference process makes asset prices also depend on the investors’ guesses about the dividends
growth rate, and thus induces high return volatility. (In Veronesi, 1999, stock market volatility
is also countercyclical.)
Finally, Bansal and Yaron (2004) formulate a model in which expected dividend growth is
affected by some unobservable factor. This model, which will be discussed in detail in the next
chapter, is also capable of generating countercyclical stock volatility. This property follows from the
model’s assumption that the volatilities of dividend growth and consumption are
countercyclical. In contrast, in models with time-varying risk-premiums (such as the previous tree model),
countercyclical stock market volatility emerges without the need to impose similar features
on the fundamentals of the economy. Remarkably, in models with time-varying risk-premiums,
countercyclical stock market volatility can be endogenously induced by rational fluctuations in
the price-dividend ratio.
7.3.3 What to do with stock market volatility?

Both data and theory suggest that stock market volatility has a quite pronounced business
cycle pattern. A natural purpose at this juncture is to exploit these patterns to perform some
basic forecasting exercises. We consider three in-sample exercises. First, we forecast stock mar-
ket volatility from past macroeconomic data (six month inflation, and six month industrial
production growth). Second, we forecast industrial production growth from past stock market
volatility. Third, we forecast the VIX index, an index of the risk-adjusted expectation of fu-
ture volatility, from macroeconomic data, and attempt to measure the volatility risk-premium,
which is the excess amount of money a risk-averse investor is willing to pay to avoid the risk of
volatility fluctuations.
7.3.3.1 Macroeconomic constituents of stock market volatility
Table 7.3 reports the results for the first forecasting exercise. Volatility is positively related to
past growth. This is easy to understand. Bad times are followed by good times. Precisely, in
my sample, high growth is inevitably followed by low growth. Since stock market volatility is
countercyclical, high growth is followed by high stock market volatility. Stock market volatility
is also related to past inflation, but in a more complex manner. Note that once we control for
past values of volatility, the results remain highly significant. Figure 7.11 (top panel) depicts
stock market volatility and its in-sample forecasts when the regression model is fed with past
macroeconomic data only. This fit can even be improved through the joint use of both past
volatility and macroeconomic factors. Nevertheless, it is remarkable that the fit from using
past macro information alone is roughly 60% better than that from using past volatility alone
(adjusted R2 of 26.01 against 16.38 in Table 7.3). These results are somewhat in contrast with
those reported in Schwert (1989). The key issue here, however, is that stock market volatility
is being predicted within a longer time-horizon perspective.
[Figure 7.11. Top panel: "Forecasting stock market volatility". Bottom panel: "Forecasting economic activity". Horizontal axes: years, 1948 to 2000.]
FIGURE 7.11. Forecasts. The top panel depicts stock market volatility (solid line) and
stock market volatility forecasts obtained through the sole use of the macroeconomic
indicators in Table 7.3 (dashed line). The bottom panel depicts 6-month moving average
industrial production growth (solid line) and its forecasts based on the 6th regression in
Table 7.4 (dashed line).
The previous findings, while certainly informal and preliminary, suggest that relating stock
market volatility to macroeconomic factors might be a fertile avenue of research. The main ques-
tion is, of course, how precisely stock market volatility should relate to past macroeconomic
factors? Indeed, the previous linear regressions capture mere statistical relations between stock
market volatility and macroeconomic factors. Yet, in the absence of arbitrage opportunities,
stock market volatility is certainly related to how the price responds to shocks in the funda-
mentals and, hence, the macroeconomic conditions. Therefore, there should exist a no-arbitrage
nexus between stock market volatility and macroeconomic factors. Corradi, Distaso and Mele
(2010) pursue this topic in detail and build up a no-arbitrage model, which reproduces the
previous predictability results. More recently, Paye (2010) documents that there is evidence of
Granger causality from past values of several macroeconomic variables to stock volatility, in out
of sample experiments, although we are still not able to exploit the relation linking these very
same macroeconomic variables to stock volatility, for forecasting purposes. It is an important
result, as it points to the possibility that in the future, alternative data sets might do a better
job than the data set Paye is using. The distinction between Granger causality and practical
forecasting accuracy is subtle. A set of variables might well affect the probability distribution of
stock volatility, which is the definition of Granger causality. At the same time, estimating, say,
a linear regression linking past macroeconomic variables to stock volatility might not necessarily
perform well. Intuitively, this relation can be subject to parameter estimation error, which
increases the uncertainty surrounding the forecasts. Such an uncertainty might overwhelm the
gain from the bias reduction achieved by a well-specified model, that is, one without omitted
variables (here, the macroeconomic variables). Statistical tests for Granger causality may rely
on Clark and West (2007), and tests for forecasting accuracy may hinge upon Giacomini and White (2006).
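The distinction between predictive content and forecasting accuracy can be illustrated with a small Monte Carlo sketch. The data-generating process below is purely hypothetical and is not the model in the text: a predictor x (playing the role of a past macroeconomic variable) truly affects y (playing the role of volatility) through a small coefficient `beta`, and the slope is estimated on a short sample. All names and parameter values are illustrative assumptions.

```python
import random

def simulate_forecast_mse(beta=0.05, n_train=20, n_rep=20000, seed=0):
    """Average out-of-sample squared forecast errors of two models.

    Hypothetical DGP (an assumption for illustration only):
        y = beta * x + e,  with x and e independent N(0, 1).
    Since beta != 0, x does affect the distribution of y.
    """
    rng = random.Random(seed)
    se_unrestricted = 0.0  # forecast uses the OLS slope estimated on n_train points
    se_restricted = 0.0    # forecast omits x altogether (forecast = 0)
    for _ in range(n_rep):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
        # OLS slope in a regression without intercept: sum(x*y) / sum(x^2)
        b_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        # One fresh observation to forecast
        x_new = rng.gauss(0.0, 1.0)
        y_new = beta * x_new + rng.gauss(0.0, 1.0)
        se_unrestricted += (y_new - b_hat * x_new) ** 2
        se_restricted += y_new ** 2
    return se_unrestricted / n_rep, se_restricted / n_rep

mse_u, mse_r = simulate_forecast_mse()
print(f"MSE with estimated slope: {mse_u:.4f}; MSE omitting x: {mse_r:.4f}")
```

In this sketch the omitted-variable forecast carries a small bias (of order beta squared), whereas the estimated model carries estimation noise of order 1/n_train. With a short sample the second effect dominates, so the correctly specified model forecasts worse on average.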
                          Past                            Future
Const.          6.92      7.76      2.48      Const.          8.28
Growth_{t−12}   -         0.29∗     1.67      Growth_{t+12}   0.21∗
Growth_{t−24}   -         0.74      1.09      Growth_{t+24}   1.62
Growth_{t−36}   -         2.17      2.44      Growth_{t+36}   −0.02∗
Growth_{t−48}   -         1.77      1.91      Growth_{t+48}   0.12∗
Infl_{t−12}     -         10.44     8.05      Infl_{t+12}     3.55
Infl_{t−24}     -         −5.96     −5.49     Infl_{t+24}     −0.81∗
Infl_{t−36}     -         −1.42∗    −0.97     Infl_{t+36}     −0.54∗
Infl_{t−48}     -         3.73      3.31      Infl_{t+48}     4.33
Vol_{t−12}      0.43      -         0.37
Vol_{t−24}      −0.17     -         −0.09
Vol_{t−36}      0.02∗     -         0.09
Vol_{t−48}      0.12      -         0.09
R2              16.38     26.01     34.52     R2              12.70
TABLE 7.3. Forecasting stock market volatility with economic activity. The first part of
this table (“Past”) reports ordinary least squares coefficient estimates in linear regressions
of volatility on past six-month industrial production growth, past six-month inflation,
and past stock volatility. Growth_{t−12} is the long-run industrial production growth at time
t − 12, etc. Time units are months. The second part of the table (“Future”) is similar, but
it contains coefficient estimates in linear regressions of volatility on future industrial
production growth and future inflation. Starred figures are not statistically distinguishable
from zero at the 95% level. R2 is the adjusted R2, in percentage.
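The structure of the "Past" regressions in Table 7.3 (a constant plus predictors lagged 12, 24, 36 and 48 months, with an adjusted R2) can be sketched as follows. The data are simulated and the coefficients are hypothetical; the helper names `ols`, `adjusted_r2` and `lagged_design` are introduced here for illustration only and do not reproduce the estimates in the table.

```python
import random

def ols(y, X):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [sum(X[t][i] * y[t] for t in range(n))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def adjusted_r2(y, X, b):
    """Adjusted R2, where k counts all regressors including the constant."""
    n, k = len(X), len(X[0])
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))

def lagged_design(predictors, lags, n_obs):
    """Rows [1, s1[t-l1], s1[t-l2], ..., s2[t-l1], ...] for t = max(lags), ..., n_obs - 1."""
    start = max(lags)
    rows = [[1.0] + [s[t - lag] for s in predictors for lag in lags]
            for t in range(start, n_obs)]
    return rows, start

# Simulated monthly data (hypothetical): volatility loads on growth at lag 12
# and on inflation at lag 24, plus noise.
rng = random.Random(1)
T = 300
growth = [rng.gauss(0.0, 1.0) for _ in range(T)]
infl = [rng.gauss(0.0, 1.0) for _ in range(T)]
vol = [0.0] * T
for t in range(48, T):
    vol[t] = 5.0 + 2.0 * growth[t - 12] - 1.0 * infl[t - 24] + rng.gauss(0.0, 0.5)

X, start = lagged_design([growth, infl], (12, 24, 36, 48), T)
y = vol[start:]
b = ols(y, X)
r2_adj = adjusted_r2(y, X, b)
print([round(v, 2) for v in b], round(100 * r2_adj, 2))
```

The coefficient vector is ordered as constant, four growth lags, four inflation lags, so `b[1]` and `b[6]` are the estimates of the two nonzero loadings in the simulated data.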
7.3.3.2 Macroeconomic implications of stock market volatility
Does stock market volatility also anticipate the business cycle? Fornari and Mele (2010) have
tackled this issue, and concluded that stock volatility can help predict the business cycle. This
issue is quite a delicate one. The fact that stock volatility is countercyclical does not
necessarily imply that it anticipates real economic activity. And even if it does, it remains
to be seen whether sustained stock market volatility really creates the premises for future
economic slowdowns. Post hoc ergo propter hoc? Does aggregate stock market volatility affect
investment decisions in the real sector of the economy? Or, rather, does volatility merely help
predict the business cycle? The policy implications of these issues are quite obvious. If volatility
merely anticipates, without affecting, the business cycle, there is little policy makers can do about it,
although of course its forecasting power is interesting per se. This theme is still unexplored.
Table 7.4 reports results from regressing long-run growth on macroeconomic variables
and return volatility (only R2s are reported). The volatility concept we use is purely related
to volatility induced by price-dividend fluctuations (i.e. it is not related to dividend growth
volatility). We find that the predictive power of traditional macroeconomic variables is
considerably enhanced (almost doubled) with the inclusion of this new volatility concept and the
price-dividend ratio. According to Figure 7.11 (bottom panel), stock market volatility does help
predict the business cycle. Fornari and Mele (2010) contain details on the forecasting
performance of a new block, including stock market volatility and the slope of the yield curve.
They show this block is quite successful and outperforms traditional models based on financial
variables, both in sample and out of sample.
Predictors R2
(i) P/D Volatility 10.81
(ii) P/D ratio 15.57
(iii) P/D Volatility, P/D ratio 20.98
(iv) Growth, Inflation 21.20
(v) Growth, Inflation, P/D volatility 34.29
(vi) Growth, Inflation, P/D volatility, P/D ratio 41.76
TABLE 7.4. Forecasting economic activity with stock market volatility. This table reports
the R2 (adjusted, in percentage) from six linear regressions of 6 month moving average
industrial production growth on to the listed set of predicting variables. Inflation is also
6 month moving average inflation. The regressor lags are 6 months, and 1, 2 and 3 years.
P/D volatility is defined as a 12 month moving average of $\mathrm{abs}\left(\log\left(\frac{1+P/D_{t+1}}{P/D_t}\right)\right)$, where abs(·)
denotes the absolute value, and P/D is the price-dividend ratio.
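The P/D volatility measure defined in the caption of Table 7.4 can be computed along the following lines. This is a minimal sketch: the function name `pd_volatility` is hypothetical, and the use of a trailing (backward-looking) window is an assumption, since the caption does not specify how the moving average is aligned.

```python
import math

def pd_volatility(pd, window=12):
    """Moving average of abs(log((1 + P/D_{t+1}) / P/D_t)).

    pd     : list of price-dividend ratios, one per month
    window : number of months in the moving average (12 in Table 7.4)
    Returns one value per month for which a full trailing window is available.
    """
    abs_log = [abs(math.log((1.0 + pd[t + 1]) / pd[t]))
               for t in range(len(pd) - 1)]
    return [sum(abs_log[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(abs_log))]

# Usage with a made-up, constant price-dividend ratio of 1:
flat = pd_volatility([1.0] * 14)
print(flat)
```

With a constant ratio equal to 1, each term is log(2), so the moving average is log(2) at every date; this only checks the mechanics and says nothing about realistic magnitudes.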
7.3.3.3 Risk-adjusted volatility

Volatility trading
An important innovation for volatility trading was the introduction of “variance swaps”
at the beginning of the 2000s. Variance swaps are contracts that allow investors to trade future
realized variance against a fixed swap rate. They make it possible to take pure views about
volatility movements, without incurring the price-dependency issues that arise from trading
volatility through straddles, as we shall explain in detail in Chapter 10, which shall also explain
the trading rationale underlying these contracts. All in all, the payoff guaranteed to the buyer of
a swap equals the difference between the realized volatility over the life of the contract and a
fixed swap rate. Entering this contract at origination costs nothing. Therefore, the fixed swap
rate is equal to the expectation of the future realized volatility under the risk-neutral probability.
In September 2003, the CBOE started to calculate the VIX index in a way that makes this index
equal to such a risk-neutral expectation. The strength of this new index is that although it
deals with risk-neutral expectations, it is nonparametric: it does not rely on any model of
stochastic volatility. Precisely, it is based on a basket of all the available option prices, relying
on the seminal work by Demeterfi et al. (1999), Bakshi and Madan (2000), Britten-Jones and
Neuberger (2000), and Carr and Madan (2001).
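The zero-cost argument that pins down the fixed swap rate can be checked in a toy setting. The sketch below assumes a hypothetical two-state risk-neutral distribution for the realized quantity; the state values, probabilities and function names are illustrative assumptions, not market data or the CBOE methodology.

```python
def fair_swap_rate(q_probs, realized):
    """Risk-neutral expectation of the realized quantity (variance or volatility)."""
    return sum(q * rv for q, rv in zip(q_probs, realized))

def swap_value_at_origination(q_probs, realized, swap_rate, discount=1.0):
    """Discounted risk-neutral expectation of the payoff (realized - swap_rate)."""
    return discount * sum(q * (rv - swap_rate) for q, rv in zip(q_probs, realized))

# Hypothetical example: realized variance is 0.02 or 0.06 with equal
# risk-neutral probabilities.
q = [0.5, 0.5]
states = [0.02, 0.06]
k = fair_swap_rate(q, states)
value = swap_value_at_origination(q, states, k)
print(k, value)
```

Setting the swap rate equal to the risk-neutral expectation is what makes the contract worth zero at origination, whatever the discount factor.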
Business cycle determinants of volatility trading
Figure 7.12 (top panel) depicts the VIX index, along with predictions obtained through a
parametric model. The predicting model is based on the regression of the VIX index on the same
macroeconomic variables considered in the previous sections: inflation and growth. Table 7.5
reports the estimation results, which reveal how important the contribution of macroeconomic
factors is in explaining the dynamics of the VIX.
                          Past                            Future
Const.          2.60∗     30.15     3.03∗     Const.          25.53
Growth_{t−1}    -         −5.12     0.51∗     Growth_{t+1}    25.53
Growth_{t−12}   -         −3.69∗    −0.35∗    Growth_{t+12}   −5.58
Growth_{t−24}   -         4.91      3.69      Growth_{t+24}   −8.34
Growth_{t−36}   -         11.19     4.33      Growth_{t+36}   −9.67
Infl_{t−1}      -         −26.96    −9.14∗    Infl_{t+1}      1.04∗
Infl_{t−12}     -         −22.62    −1.89∗    Infl_{t+12}     −24.11
Infl_{t−24}     -         −1.59∗    5.85∗     Infl_{t+24}     9.32∗
Infl_{t−36}     -         −6.02∗    −2.56∗    Infl_{t+36}     20.71
VIX_{t−1}       0.72      -         0.55
VIX_{t−12}      0.18      -         0.14∗
VIX_{t−24}      −0.06∗    -         −0.01∗
VIX_{t−36}      0.02∗     -         0.12∗
R2              66.87     54.12     71.03     R2              55.04
TABLE 7.5. Forecasting the VIX index with economic activity. The first part of this table
(“Past”) reports ordinary least squares coefficient estimates in linear regressions of the VIX
index on past long-run industrial production growth (defined in Figure 1), past long-run
inflation (defined similarly as in Figure 1), and past long-run volatility. Growth_{t−12}
is the long-run industrial production growth at time t − 12, etc. Time units are months.
The second part of the table (“Future”) is similar, but it contains coefficient estimates in
linear regressions of the VIX index on future long-run industrial production growth and
future long-run inflation. Starred figures are not statistically distinguishable from zero at
the 95% level. R2 is the adjusted R2, in percentage.
Figure 7.12 (bottom panel) depicts the volatility risk-premium, defined as the difference
between the expectation of future volatility under the risk-neutral and the physical probability.
We estimated the risk-neutral expectation as the predicting part of the linear regression of the
VIX index on the macroeconomic factors (inflation and growth only), which is the dashed line in
Figure 7.12, top panel. We estimated expected volatility as the predicting part of an AR(1) model
fitted to the volatility depicted in Figure 7.11 (top panel). As we see, volatility risk-premiums are
indeed strongly countercyclical. Once again, the results in these pictures are suggestive, but they
represent mere statistical relations. The model considered by Corradi, Distaso and Mele (2010)
has the strength to make these statistical relations emerge as a result of a fully articulated
no-arbitrage model.
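The two-step construction of the volatility risk-premium described above (a risk-neutral forecast minus an AR(1) forecast under the physical probability) can be sketched as follows. The function names and the numbers in the usage example are hypothetical; in particular, the risk-neutral forecast is simply taken as given here rather than estimated from a VIX regression.

```python
def fit_ar1(x):
    """OLS estimates (a, b) in x[t+1] = a + b * x[t] + error."""
    lag, lead = x[:-1], x[1:]
    n = len(lag)
    mean_lag, mean_lead = sum(lag) / n, sum(lead) / n
    b = (sum((u - mean_lag) * (v - mean_lead) for u, v in zip(lag, lead))
         / sum((u - mean_lag) ** 2 for u in lag))
    a = mean_lead - b * mean_lag
    return a, b

def volatility_risk_premium(risk_neutral_forecast, vol, a, b):
    """Risk-neutral forecast minus the AR(1) forecast a + b * vol[t], date by date."""
    physical_forecast = [a + b * v for v in vol]
    return [q - p for q, p in zip(risk_neutral_forecast, physical_forecast)]

# Usage on a made-up noiseless series obeying x[t+1] = 2 + 0.5 * x[t]:
series = [10.0]
for _ in range(10):
    series.append(2.0 + 0.5 * series[-1])
a_hat, b_hat = fit_ar1(series)
premium = volatility_risk_premium([12.0], [10.0], a_hat, b_hat)
print(a_hat, b_hat, premium)
```

In this noiseless example the AR(1) fit recovers the generating coefficients, so the physical forecast at a volatility level of 10 is 7 and the premium against a risk-neutral forecast of 12 is 5.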
[Figure 7.12. Top panel: "Forecasting the VIX index" (horizontal axis: years, 1990 to 2006). Bottom panel: "Volatility risk-premium" (horizontal axis: years, 1993 to 2006).]
FIGURE 7.12. Forecasting the VIX Index, and the volatility risk-premium. The top panel
depicts the VIX index (solid line) and the VIX forecasts obtained through the sole use
of the macroeconomic indicators in Table 7.5 (dashed line). The bottom panel plots the
volatility risk premium, defined as the difference between the one-month ahead volatility
forecast calculated under the risk-neutral probability and the one-month ahead volatility
forecast calculated under the physical probability.
7.3.4 What did we learn?

Stock market volatility is higher in bad times than in good times. Explaining this basic fact
is challenging. Indeed, economists know very well how to model risk-premiums and how these
premiums should relate to the business cycle. We feel more embarrassed when we come to
explain volatility. The ambition in this short essay is to explain that countercyclical volatility
can be made consistent with the prediction of the neoclassical model of asset pricing, in
which asset prices are (risk-adjusted) expectations of future dividends. One condition activating
countercyclical volatility is very simple: risk-premiums must swing sharply as the economy
moves away from good states, just as the data seem to suggest.
The focus in this section was on fluctuations in stock market volatility, not on the average
levels of stock volatility and risk-premiums, nor on whether these levels can be made consistent
with plausible levels of investors’ risk-aversion. These two themes are as controversial as many
topics at the intersection between financial economics and macroeconomics (see, e.g., Campbell,
2003; Mehra and Prescott, 2003). However, a simple model relying on a tree suggests the
neoclassical model might deliver results explaining how volatility switches across states.
Finally, this section investigates whether these
theoretical insights have some additional empirical content. Three empirical issues have been
explored, which are currently being investigated in the literature. It was explained that (i) stock
volatility can be forecast through macroeconomic variables; (ii) stock market volatility does
contain relevant information related to business cycle developments; and (iii) volatility trading
does relate to the business cycle and that volatility risk-premiums are strongly countercyclical.
7.4 Rational stock market fluctuations

We aim to explain the stylized facts in Section 7.2 through the neoclassical model, and relying
on a framework of analysis as general as possible. This section draws on Mele (2005, 2007).
7.4.1 A decomposition
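We have:
\[
\ln \tilde{R}_{t+1} \equiv \ln \frac{S_{t+1} + D_{t+1}}{S_t} = g_{t+1} + \ln \frac{p_{t+1} + 1}{p_t},
\quad \text{where } g_t \equiv \ln \frac{D_t}{D_{t-1}} \text{ and } p_t \equiv \frac{S_t}{D_t}.
\]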
This decomposition reveals that the properties of asset returns can be understood through
those of the dividend growth, gt , and the price-dividend ratio pt . The empirical evidence in
Section 7.2 suggests that explanations based on rational evaluation should exhibit at least two
features. First, we need volatile price-dividend ratios. Second, we need price-dividend ratios
to be, on average, more volatile in bad times than in good. Let us consider, for example, a
model where asset prices are affected by some key state variables related to the business cycle
conditions (as in the habit models of Section 7.5). A basic property we should require from
this particular model is that the price-dividend ratio be increasing and concave in the state
variables related to the business cycle conditions, as explained in Section 7.3.2. In particular,
such a concavity property ensures stock volatility increases on the downside, which is the very
definition of countercyclical volatility.
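The decomposition of the log-return into dividend growth and a price-dividend term is an identity, which can be verified numerically. The prices and dividends below are made-up numbers, and the function names are hypothetical.

```python
import math

def log_return(s0, s1, d1):
    """ln of the gross return (S_{t+1} + D_{t+1}) / S_t."""
    return math.log((s1 + d1) / s0)

def decomposed_log_return(s0, s1, d0, d1):
    """g_{t+1} + ln((p_{t+1} + 1) / p_t), with g = ln(D_{t+1}/D_t) and p = S/D."""
    g1 = math.log(d1 / d0)
    p0, p1 = s0 / d0, s1 / d1
    return g1 + math.log((p1 + 1.0) / p0)

lhs = log_return(100.0, 104.0, 3.1)
rhs = decomposed_log_return(100.0, 104.0, 3.0, 3.1)
print(lhs, rhs)
```

The two numbers coincide up to floating-point rounding, whatever positive inputs are used.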
The ultimate aim in this section is to search for classes of models ensuring this and related
properties. A word of caution is needed at this juncture. Gordon’s model in Chapter 6
predicts that price-dividend ratios are constant, a counterfactual feature, pointing to the need for
multifactor models. At the same time, multifactor models might not necessarily lead to the
properties we observe in the data. For example, the previous chapter has shown how we can
build up models where: (i) we can arbitrarily increase the variance of the pricing kernel by
adding more and more factors, and (ii) price-dividend ratios, unfortunately,
are still constant. What is needed, then, is to impose discipline on how to increase the dimension
of a model.
7.4.2 Asset prices and state variables

7.4.2.1 A multifactor model
Consider the following model in reduced-form, where an asset price, Si say, is a twice-differentiable
function of a number of factors,
Si = Si (y), y ∈ Rd , i = 1, · · ·, m,
and y = [y1 , · · ·, yd ]⊤ is the vector of factors deemed to affect asset prices. We assume asset
i pays off an instantaneous dividend rate Di = Di (y), where y is a multidimensional diffusion
process:
dy (t) = ϕ (y (t)) dt + v (y (t)) dW (t) ,
where ϕ is d-valued, v is d × d valued, and W is a d-dimensional Brownian motion. We assume
the number of assets does not exceed the number of factors, m ≤ d, consistently with the
framework in Chapter 4. By Itô’s lemma:
\[
\frac{dS_i}{S_i} = \frac{L S_i}{S_i}\,dt + \underbrace{\frac{\nabla S_i}{S_i}}_{1\times d}\;\underbrace{v}_{d\times d}\;dW, \tag{7.2}
\]
where LSi is the infinitesimal operator. Let r (t) be the instantaneous short-term rate. By the
FTAP, we have that under regularity conditions, there exists a measurable d-vector process λ,
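the vector of unit prices of risk associated with the fluctuations of the factors, such that,
\[
\begin{pmatrix}
\dfrac{L S_1}{S_1} - r + \dfrac{D_1}{S_1} \\
\vdots \\
\dfrac{L S_m}{S_m} - r + \dfrac{D_m}{S_m}
\end{pmatrix}
= \underbrace{\sigma}_{m\times d}\;\underbrace{\lambda}_{d\times 1},
\quad \text{where } \sigma =
\begin{pmatrix}
\dfrac{\nabla S_1}{S_1} \\
\vdots \\
\dfrac{\nabla S_m}{S_m}
\end{pmatrix} \cdot v. \tag{7.3}
\]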
We restrict this economy to be Markov, merely to simplify, and suppose that,
r (t) ≡ r (y (t)) and λ (t) ≡ λ (y (t)) . (7.4)
In the appendix, we provide examples of pricing kernels such that the short-term rate and
risk-premiums have the same functional form as in Eqs. (7.4).
Eqs. (7.3) add up to a system of m uncoupled partial differential equations, and the solution is
one no-arbitrage price system, assuming no bubbles. This section takes a reverse-engineering
approach, a search for the primitives ϕ, v, r and λ, such that the asset prices in (7.3) exhibit
some properties given in advance, such as those surveyed in the previous section. As an example,
Eq. (7.2) predicts the volatility of stock i is,
\[
\mathrm{Vol}_t\left(\frac{dS_i(t)}{S_i(t)}\right) \equiv \left\| \frac{\nabla S_i(y(t))}{S_i(t)}\, v(y(t)) \right\|,
\]
which is, typically, time-varying. One fundamental question, then, is: which restrictions do we
need to impose on ϕ, v, r and λ, so as to ensure that $\mathrm{Vol}_t$ is countercyclical? The next section
introduces a simple model potentially apt to address this and related questions, through
time-variation in the expected returns or in expected dividend growth.
7.4.2.2 A canonical economy
We consider a pure exchange economy endowed with a flow of a single consumption good, which
equals the dividend paid by a single long-lived asset. Let d = 2, such that the dividend, D, and
some state variable, y, are solutions to:
\[
\begin{cases}
\dfrac{dD(\tau)}{D(\tau)} = m(y(\tau))\,d\tau + \sigma_0\,dW_1(\tau) \\[2ex]
dy(\tau) = \varphi(y(\tau))\,d\tau + v_1(y(\tau))\,dW_1(\tau) + v_2(y(\tau))\,dW_2(\tau)
\end{cases} \tag{7.5}
\]
where W1 and W2 are standard Brownian motions. By the Feynman-Kac representation theorem
reviewed in Chapter 4, Eqs. (7.3) imply that under regularity conditions, the asset price satisfies:
\[
S(D, y) = \int_0^\infty C(D, y, \tau)\,d\tau, \quad
C(D, y, \tau) \equiv \mathbb{E}\left[ \exp\left( -\int_0^\tau r(D(t), y(t))\,dt \right) \cdot D(\tau) \,\middle|\, D, y \right], \tag{7.6}
\]
where $\mathbb{E}$ is the expectation operator taken under the risk-neutral probability Q, under which the
drifts m and ϕ in Eqs. (7.5) have to be replaced by $\hat{m}(y) \equiv m(y) - \sigma_0 \lambda_1(y)$ and
$\hat{\varphi}(y) \equiv \varphi(y) - v_1(y)\lambda_1(y) - v_2(y)\lambda_2(y)$.² Additional details about this economy are in Appendix 2.
7.4.3 Volatility, options and convexity

7.4.3.1 Issues
This section analyzes asset price properties, which can be streamlined into three categories: (i)
“monotonicity,” (ii) “convexity,” and (iii) “dynamic stochastic dominance” properties.
(i) Monotonicity. Consider a model where the asset price is: S (D, y) = D · p (y), for some
positive function p ∈ C²(Y). By Itô’s lemma, stock volatility is $\mathrm{Vol}(D) + \frac{p'(y)}{p(y)}\mathrm{Vol}(y)$, where
Vol (D) > 0 is consumption growth volatility and Vol (y) has a similar interpretation. As
explained in Chapter 6, the volatility in the data is too high to be explained by
consumption volatility. Additional state variables may increase return volatility. In this
simple example, the state variable y inflates volatility if p is increasing in y, p′ > 0. Such a
monotonicity property is also important for a purely theoretical reason, as it would ensure
asset volatility is strictly positive, a crucial condition guaranteeing that the agents’ budget
constraints are well-defined.
(ii.1) Negative convexity. Next, suppose that y is a state variable related to the business cycle
conditions. If S(D, y) = D · p(y) and Vol(y) is constant, stock volatility, then, is countercyclical
whenever p is concave in y. Second-order properties of the price-dividend ratio
are, then, critical to the understanding of time variation in returns volatility, as illustrated
by Figure 7.8.
(ii.2) Convexity. Alternatively, suppose that expected dividend growth is positively affected by
some state variable g. If p is increasing and convex in y ≡ g, price-dividend ratios would
typically display “overreaction” to small changes in g. The empirical relevance of this point
was first recognized by Barsky and De Long (1990, 1993). More recently, Veronesi (1999)
addressed similar convexity issues by means of a fully articulated equilibrium model of
learning.
(iii) Dynamic stochastic dominance. An old issue in financial economics is the relation between
long-lived asset prices and the volatility of fundamentals (see, e.g., Malkiel, 1979; Pindyck,
1984; Poterba and Summers, 1985; Abel, 1988; Barsky, 1989). The standard focus of the
literature has been the link between dividend (or consumption) volatility and stock prices.
A further question is the relation between the volatility of additional state variables (such
as the dividend growth rate) and stock prices.
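Items (i) and (ii.1) above can be illustrated numerically. The sketch below evaluates the volatility expression Vol(D) + (p′(y)/p(y)) · Vol(y) for a hypothetical increasing and concave price-dividend function, p(y) = √y, with constant Vol(y). The functional form and all parameter values are assumptions made for illustration; they are not taken from any model in the text.

```python
import math

def stock_volatility(p, y, vol_d, vol_y, h=1e-6):
    """Vol(D) + p'(y) / p(y) * Vol(y), with p' from a central finite difference."""
    dp = (p(y + h) - p(y - h)) / (2.0 * h)
    return vol_d + dp / p(y) * vol_y

# Hypothetical inputs: consumption volatility 3%, constant Vol(y) = 0.10.
states = [0.5, 1.0, 2.0]  # low y = bad times, high y = good times
vols = [stock_volatility(math.sqrt, y, vol_d=0.03, vol_y=0.10) for y in states]
print([round(v, 4) for v in vols])
```

Because p is increasing, volatility exceeds consumption volatility in every state; because p is concave, p′/p falls as y rises, so volatility is higher in bad states (low y) than in good ones.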
The next section provides a characterization of these properties, by extending some insights
in the option pricing literature. This literature attempts to explain the qualitative behavior of
contingent claim price functions with as few assumptions as possible. Unfortunately, some of
the conceptual foundations in this literature are not well-suited to pursue the purposes of this
chapter. As an example, many available results are based on the assumption that at least one
2 See, for example, Huang and Pagès (1992, Theorem 3 p. 53) and Wang (1993, Lemma 1, p. 202), for regularity conditions
underlying the Feynman-Kac theorem in infinite horizon settings; and Huang and Pagès (1992, Proposition 1, p. 41) for regularity
conditions ensuring that the Girsanov’s theorem holds in infinite horizon settings.
state variable is tradable. This is not the case of the “European-type option” pricing problem
(7.6). The next section, then, introduces an abstract asset pricing problem that is appropriate
to these purposes, and that encompasses existing results in the option pricing literature.
7.4.3.2 General properties of models
Consider a two-period, risk-neutral economy, where there is a right to receive a cash premium
ψ at the second period. Assume that interest rates are zero, and that the cash premium is a
function of some random variable x̃, viz ψ = ψ(x̃). Finally, let c̄ ≡ E[ψ(x̃)] be the price of this
right. What is the relation between the volatility of x̃ and c̄? By classical second-order stochastic
dominance arguments, as reviewed in Appendix 4 to Chapter 1, c̄ is inversely related to mean
preserving spreads in x̃, provided ψ is concave. Intuitively, a concave function “exaggerates”
poor realizations of x̃ and “dampens” the favorable ones.
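The two-period argument can be checked with a small discrete example. The two distributions below have the same mean, and the second is a mean-preserving spread of the first; the outcomes, probabilities and payoff functions are hypothetical choices made for illustration.

```python
import math

def price(psi, outcomes, probs):
    """Price of the right under risk-neutrality and zero interest rates: E[psi(x)]."""
    return sum(q * psi(x) for x, q in zip(outcomes, probs))

probs = [0.5, 0.5]
narrow = [0.9, 1.1]  # mean 1
wide = [0.5, 1.5]    # mean 1, a mean-preserving spread of `narrow`

concave_narrow = price(math.sqrt, narrow, probs)
concave_wide = price(math.sqrt, wide, probs)
convex_narrow = price(lambda x: x ** 2, narrow, probs)
convex_wide = price(lambda x: x ** 2, wide, probs)
print(concave_narrow, concave_wide, convex_narrow, convex_wide)
```

The price falls after the spread when the payoff is concave and rises when it is convex, as the second-order stochastic dominance argument predicts.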
Do stochastic dominance properties still hold in a dynamic setting? Consider a multiperiod,
continuous time extension of the previous risk-neutral environment. Assume the cash premium
ψ is paid off at some future date T , and that x̃ = x (T ), where X = x (τ ), x(0) = x, is some
underlying state process. If the yield curve is flat at zero, c (x) ≡ E[ψ(x(T ))| x] is the price
of the right. Clearly, the pricing problem, E[ψ(x(T ))| x], is different from the pricing problem,
E[ψ(x̃)]. However, there are analogies. First, if X is a proportional process, i.e. one for which
the risk-neutral distribution of $x(T)/x$ is independent of x, then,
\[
c(x) = E\left[\psi(x \cdot G(T))\right], \quad G(T) \equiv \frac{x(T)}{x}, \quad x > 0.
\]
As this simple formula reveals, standard stochastic dominance arguments still apply: c decreases
(resp. increases) after a mean-preserving spread in G whenever ψ is concave (resp. convex),
consistently with the prediction of the Black and Scholes (1973) formula. This point was first
made by Jagannathan (1984, p. 429-430). In two independent papers, Bergman, Grundy and
Wiener (1996) and El Karoui, Jeanblanc-Picqué and Shreve (1998) generalize these results
to any diffusion process, i.e., not necessarily a proportional process. Bajeux-Besnainou and
Rochet (1996, Section 5) and Romano and Touzi (1997) contain further extensions pertaining
to stochastic volatility models.3 These extensions rely on the assumption X is the price of a
traded asset that does not pay dividends. This assumption is crucial, as it makes the risk-
neutraliz drift X proportional to x. As a result, c inherits convexity properties of ψ, as in the
proportional process case. As shown below, the presence of nontradable state variables might
make interesting nonlinearites emerge. As an example, Proposition 7.1 reveals that convexity
of ψ is neither a necessary or a sufficient condition for convexity of c.4 Furthermore, “dynamic”
stochastic dominance properties are more intricate than in the classical second order stochastic
dominance theory, as revealed by Proposition 7.1.
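For the proportional-process case, the Black and Scholes (1973) call price with a zero interest rate gives a concrete check that a convex payoff becomes more valuable as volatility rises. This is a minimal sketch using the standard closed-form formula; the function names and inputs are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call_zero_rate(x, strike, sigma, maturity):
    """Black-Scholes call price when the interest rate is zero."""
    d1 = (math.log(x / strike) + 0.5 * sigma ** 2 * maturity) / (sigma * math.sqrt(maturity))
    d2 = d1 - sigma * math.sqrt(maturity)
    return x * norm_cdf(d1) - strike * norm_cdf(d2)

low_vol = bs_call_zero_rate(100.0, 100.0, 0.2, 1.0)
high_vol = bs_call_zero_rate(100.0, 100.0, 0.3, 1.0)
print(round(low_vol, 4), round(high_vol, 4))
```

The call payoff max(x − K, 0) is convex in x, and the price at 30% volatility exceeds the price at 20%, consistently with the mean-preserving spread argument.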
To substantiate these claims, introduce the following pricing problem.
3 The proofs in these two articles are markedly distinct but are both based on price function convexity. An alternate proof
directly based on payoff function convexity can be obtained through a direct application of Hajek’s (1985) theorem. This theorem
states that if ψ is increasing and convex, and X1 and X2 are two diffusion processes (both starting off from the same origin) with
integrable drifts m1 and m2 and volatilities a1 and a2 , then E[ψ(x1 (T ))] ≤ E[ψ(x2 (T ))] whenever m1 (τ ) ≤ m2 (τ ) and a1 (τ ) ≤ a2 (τ )
for all τ ∈ (0, ∞). Note that this approach is more general than the approach in Bergman, Grundy and Wiener (1996) and El
Karoui, Jeanblanc-Picqué and Shreve (1998) insofar as it allows for shifts in both m and a. As we argue below, both shifts are
important to account for when X is nontradable.
4 Kijima (2002) produces a counterexample in which convexity of option prices may break down, in the presence of convex payoff
functions. His counterexample is based on an extension of the Black-Scholes model where due to the presence of dividends, the
underlying asset price has a concave drift function. Among other things, the proof of Proposition 7.1 reveals the origins of this
counterexample.
Canonical pricing problem. Let X be the solution to:
dx(τ ) = b (x(τ )) dτ + a (x(τ )) dW̄ (τ ),
where W̄ is a multidimensional P̄ -Brownian motion (for some P̄ ), and b, a are some given
functions. Let ψ and ρ be two twice continuously differentiable positive functions, and define
\[
c(x, T) \equiv \bar{E}\left[ \exp\left( -\int_0^T \rho(x(t))\,dt \right) \cdot \psi(x(T)) \,\middle|\, x \right] \tag{7.7}
\]
to be the price of an asset which promises to pay ψ(x(T )) at time T .
In this pricing problem, X can be the price of a traded asset. In this case b(x) = xρ(x). If in
addition, ρ′ = 0, the problem collapses to the classical European option pricing problem with
constant discount rate. If instead, X is not a traded risk, b(x) = b0 (x)−a(x)λ(x), where b0 is the
physical drift function of X and λ is a risk-premium. The previous framework then encompasses
a number of additional cases. As an example, set ψ(x) = x. Then, one may 1) interpret X as a
consumption process; 2) restrict a long-lived asset price S to be driven by consumption only,
and set $S = \int_0^\infty c(x, \tau)\,d\tau$. As another example, set ψ(x) = 1 and ρ(x) = x. Then, c is a
zero-coupon bond price as predicted by a simple univariate short-term rate model. The importance
of these specific cases will be clarified in the following sections.
In the appendix (see Proposition 7.A.1), we provide a result linking the volatility of the state
variable x to the price c. We now characterize slope ($c_x$) and convexity ($c_{xx}$) properties of c.
We have:
Proposition 7.1. The following statements are true:

(i ) If ψ ′ > 0, then c is increasing in x whenever ρ′ ≤ 0. Furthermore, if ψ ′ = 0, then c is
decreasing (resp. increasing) whenever ρ′ > 0 (resp. < 0).
(ii ) If ψ ′′ ≤ 0 (resp. ψ ′′ ≥ 0) and c is increasing (resp. decreasing) in x, then c is concave
(resp. convex ) in x whenever b′′ < 2ρ′ (resp. b′′ > 2ρ′ ) and ρ′′ ≥ 0 (resp. ρ′′ ≤ 0). Finally, if
b′′ = 2ρ′ , c is concave (resp. convex ) whenever ψ ′′ < 0 (resp. > 0) and ρ′′ ≥ 0 (resp. ≤ 0).
Proposition 7.1-(i) generalizes previous monotonicity results obtained by Bergman, Grundy
and Wiener (1996). By the so-called “no-crossing property” of a diffusion, X is not decreasing
in its initial condition x. Therefore, c inherits the same monotonicity features of ψ if discounting
does not operate adversely. This simple observation allows us to address monotonicity properties
of long-lived asset prices, as we shall see in Section 7.5.
Proposition 7.1-(ii) generalizes a number of existing results on option price convexity. First,
assume that ρ is constant and that X is the price of a traded asset. In this case, ρ′ = b′′ = 0. The
last part of Proposition 7.1-(ii) then says that convexity of ψ propagates to convexity of c. This
result reproduces the findings in the literature surveyed earlier. Proposition 7.1-(ii) characterizes
option price convexity within more general contingent claims models. As an example, suppose
that ψ ′′ = ρ′ = 0, and that X is not a traded risk. Then, Proposition 7.1-(ii) reveals that c
inherits the same convexity properties of the drift of X. As a final example, Proposition 7.1-(ii)
extends a result in Mele (2003) relating to bond pricing: let ψ(x) = 1 and ρ(x) = x. Accordingly,
c is the price of a zero-coupon bond predicted by a short-term rate model such as those we shall
deal with in Chapter 11. By Proposition 7.1-(ii), then, c is convex in x whenever b′′ (x) < 2 (see
Appendix 6 for further details and intuition on this bounding number). In analyzing properties
of asset prices with non-traded fundamentals, such as stock prices, both discounting and drift
nonlinearities might come to play a prominent role. We now turn to illustrate the gist of the
proofs underlying Proposition 7.1, by developing an example.
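The bond pricing case (ψ(x) = 1, ρ(x) = x) can be illustrated with the Vasicek model, a standard short-term rate model that the text does not single out and that is used here only as an assumed example. Its risk-neutral drift, b(x) = κ(θ − x), is linear, so that b′′ = 0 < 2, and its zero-coupon bond price is available in closed form. The parameter values below are arbitrary.

```python
import math

def vasicek_bond(r, maturity, kappa=0.5, theta=0.05, sigma=0.02):
    """Zero-coupon bond price when dr = kappa * (theta - r) dt + sigma dW under Q."""
    b = (1.0 - math.exp(-kappa * maturity)) / kappa
    ln_a = ((theta - sigma ** 2 / (2.0 * kappa ** 2)) * (b - maturity)
            - sigma ** 2 * b ** 2 / (4.0 * kappa))
    return math.exp(ln_a - b * r)

# Slope and convexity in the short-term rate, by finite differences.
h = 0.01
p_down, p_mid, p_up = (vasicek_bond(0.05 - h, 5.0),
                       vasicek_bond(0.05, 5.0),
                       vasicek_bond(0.05 + h, 5.0))
second_difference = p_up + p_down - 2.0 * p_mid
print(p_mid, second_difference)
```

The price is decreasing in the short-term rate and the second difference is positive, so the price function is convex in this example, consistently with the condition b′′(x) < 2 stated above.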
[Gabaix (2009) —> Linearity-generating processes, quadratic drifts]
7.4.3.3 A “macro-asset” option
We discuss the general lines of the proof of Proposition 7.1 by developing the following example
of a “macro-asset” option. Let c (t) be the aggregate consumption process. The owner of the
option has the right to receive a payoff ψ(c (T )), ψ ∈ C 2 , at some date T , where ψ is increasing
and convex. We assume that c (t) is solution to:
\[
\frac{dc(t)}{c(t)} = g(t)\,dt,
\]
where the consumption growth rate g (t) satisfies
\[
dg(t) = \phi(g(t))\,dt + \xi(g(t))\,dW(t),
\]
where φ and ξ are some well-behaved functions, and W is a standard Brownian motion. Let
p(c, g, t) be the rational price of the option when the state of the economy as of time t ∈ [0, T ]
is c (t) = c and g (t) = g. Let p ∈ C 2,2,1 . Assume that interest rates are constant and that all
agents are risk-neutral.
By the usual connection between partial differential equations and conditional expectations,
the price p(c, g, t) is solution to the following partial differential equation:
\[
0 = p_3 + g c\,p_c + \tfrac{1}{2}\xi^2 p_{gg} + \phi\,p_g - r p, \quad \text{for all } c, g \text{ and } t \in [0, T), \tag{7.8}
\]
with boundary condition p(c, g, T ) = ψ(c), all c, g, where subscripts denote partial differentiation,
and $p_3(c, g, t) \equiv \frac{\partial}{\partial t} p(c, g, t)$. Monotonicity properties of the price function p(c, g, t) with
respect to c and g can be understood through two approaches. The first approach, based on
the no-crossing property for diffusion processes, proceeds as follows. We have:
\[
p(c, g, t) = e^{-r(T-t)}\, E\left[ \psi\left( c(t) \cdot e^{\int_t^T g(u)\,du} \right) \,\middle|\, c(t) = c,\ g(t) = g \right]. \tag{7.9}
\]
Since ψ is increasing, p is increasing in c as well. Furthermore, the no-crossing property of g
implies that g (t) is increasing in the initial condition g (0). Therefore, p(c, g) is also increasing
in g. (The no-crossing property might also be (trivially) used to conclude about monotonicity
of p with respect to c.)
To study convexity properties of the price function p with respect to g, we can differentiate
Eq. (7.8), and its boundary condition, with respect to c, and find that w ≡ pc is solution to:
\[ 0 = w_3 + g c\,w_c + \tfrac{1}{2}\xi^2 w_{gg} + \phi\,w_g - (r - g) w, \quad \text{for all } c, g \text{ and } t \in [0, T), \]
with boundary condition w(c, g, T ) = ψ ′ (c), all c, g. The Feynman-Kac representation of the
solution to the previous equation is:
\[ p_c(c, g, t) = e^{-r(T-t)}\, E\left[ e^{\int_t^T g(u)\,du} \cdot \psi'(c(T)) \,\middle|\, c(t) = c,\ g(t) = g \right], \tag{7.10} \]
which is positive, by the assumption that ψ′ > 0, thus confirming the monotonicity results
obtained through the previous no-crossing arguments. So when is p(c, g, t) convex in g? By
differentiating Eq. (7.8) with respect to g, one obtains that u ≡ pg is solution to:
\[ 0 = u_3 + g c\,u_c + \tfrac{1}{2}\xi^2 u_{gg} + \left(\phi + \tfrac{1}{2}(\xi^2)'\right) u_g - (r - \phi') u + c\,p_c, \quad \text{for all } c, g \text{ and } t \in [0, T), \tag{7.11} \]
with boundary condition u(c, g, T) = 0, all c, g. By the Feynman-Kac representation theorem,
\[ p_g(c, g, t) = E\left[ \int_t^T e^{-r(u-t) + \int_t^u \phi'(g(s))\,ds}\, c(u) \cdot p_c(c(u), g(u), u)\,du \,\middle|\, c(t) = c,\ g(t) = g \right]. \]
By Eq. (7.10), $p_c > 0$. Hence, p is increasing in g. We can now apply Proposition 7.1 and
conclude that p is strictly convex in g whenever the drift function of g is weakly convex.
Indeed, by differentiating Eq. (7.11) with respect to g, we obtain that $v \equiv p_{gg}$ is solution to:
\[ 0 = v_3 + g c\,v_c + \tfrac{1}{2}\xi^2 v_{gg} + \left(\phi + (\xi^2)'\right) v_g - \left(r - 2\phi' - \tfrac{1}{2}(\xi^2)''\right) v + k, \quad \text{for all } c, g \text{ and } t \in [0, T), \tag{7.12} \]
where
\[ k(c, g, t) \equiv 2c\,p_{cg}(c, g, t) + \phi''(g)\,p_g(c, g, t), \tag{7.13} \]
and boundary condition v(c, g, T) = 0, all c, g. By a direct differentiation of Eq. (7.10), and the
assumption that ψ is increasing and convex, we have that $p_{cg} > 0$. Furthermore, $p_g > 0$. Therefore, k(c, g, t) > 0
whenever φ″(g) ≥ 0. By the Feynman-Kac theorem, p is then convex in g whenever φ″(g) ≥ 0.
These conclusions would hold even if we were given a concave payoff function, say ψ(c) = ln c.
In this case, Eq. (7.10) implies that $p_c(c, g, t) = e^{-r(T-t)} \frac{1}{c}$, such that the function k in Eq. (7.13)
collapses to k(c, g, t) = φ″(g) p_g(c, g, t). That is, the price function is convex (resp. concave) in
g whenever φ is convex (resp. concave) in g. As a result, the price is linear in g whenever φ″ = 0, as
can easily be verified by replacing ψ(c) = ln c into Eq. (7.9), leaving:
\[ p(c, g, t) = e^{-r(T-t)} \ln c + e^{-r(T-t)}\, E\left[ \int_t^T g(u)\,du \,\middle|\, g(t) = g \right]. \]
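The linearity claim can be checked numerically in one special case. The sketch below assumes a linear (hence φ″ = 0) drift φ(g) = κ(ḡ − g), which is not specified in the text; under this assumption E[g(u) | g(t) = g] = ḡ + (g − ḡ)e^{−κ(u−t)}, so the integral in the last formula has a closed form. The names `kappa`, `g_bar` and all parameter values are illustrative.

```python
import math

def log_payoff_option_price(c, g, tau, r, kappa, g_bar):
    """Price of the claim paying ln c(T), with tau = T - t.

    Assumes the (hypothetical) linear drift phi(g) = kappa * (g_bar - g),
    for which E[int_t^T g(u) du | g] is available in closed form.
    """
    expected_growth = g_bar * tau + (g - g_bar) * (1.0 - math.exp(-kappa * tau)) / kappa
    return math.exp(-r * tau) * (math.log(c) + expected_growth)

def second_difference(f, x, h):
    """Central second difference, a numerical proxy for curvature."""
    return f(x + h) - 2.0 * f(x) + f(x - h)

# Illustrative parameter values (assumptions, not taken from the text).
price_in_g = lambda g: log_payoff_option_price(c=1.0, g=g, tau=5.0, r=0.02, kappa=0.3, g_bar=0.02)

curvature = second_difference(price_in_g, 0.02, 0.01)   # zero up to rounding: price is linear in g
slope = (price_in_g(0.03) - price_in_g(0.01)) / 0.02    # positive: price is increasing in g
```

With a convex or concave drift the second difference would no longer vanish, in line with the sign of k in Eq. (7.13).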
7.5 Two economies

Asset price predictions necessarily rely upon assumptions related to both the pricing kernel
(i.e. interest rates and risk-premiums) and the statistical distribution of dividend growth. This
section hinges upon the results of the previous section, and explores whether we may look for
both: (i) pricing kernels consistent with given dynamic properties of asset prices and dividends,
as illustrated by the two solid arrows; and (ii) properties of dividends consistent with given
dynamic properties of asset prices and pricing kernels, as illustrated by the two dashed arrows.
We consider two types of economies: one, where changes in the variables of economic interest
are determined by cyclical variations in the discount rates; and a second, where the very same
changes are determined by time-varying expected dividend growth.
[FIGURE 7.13. Diagram relating the dividends distribution and the pricing kernel (1. interest rates; 2. risk-premium) to the dynamic properties of asset prices (1. expected returns; 2. returns volatility).]
7.5.1 External habit formation

We might think of time-varying risk-premiums as a natural engine of asset price
fluctuations. Indeed, within the neoclassical asset pricing framework, the properties of
asset prices must necessarily inherit those of the risk-premiums, when dividends are
IID, as illustrated by the diagram in Figure 7.13. The Campbell and Cochrane (1999) model of
external habit formation is certainly one of the best-known attempts at explaining some
of the empirical features outlined in Section 7.2, through the channel of time-varying
risk-premiums. Consider an infinite horizon, complete markets economy, where a representative
agent has undiscounted instantaneous utility:
\[ u(c, x) = \frac{(c - x)^{1-\eta} - 1}{1 - \eta}, \tag{7.14} \]
where c denotes consumption and x is a time-varying habit, or exogenous “subsistence level”.
The total endowment process D(τ) satisfies,
\[ \frac{dD(\tau)}{D(\tau)} = g_0\,d\tau + \sigma_0\,dW(\tau). \tag{7.15} \]
In equilibrium, C = Z. Let s ≡ (D − x)/D, the “surplus consumption ratio”. By assumption,
s(τ) is solution to:
\[ ds(\tau) = s(\tau)\left[ (1 - \phi)(\bar{s} - \ln s(\tau)) + \tfrac{1}{2}\sigma_0^2\, l(s(\tau))^2 \right] d\tau + \sigma_0\, s(\tau)\, l(s(\tau))\,dW(\tau), \tag{7.16} \]
where l is a positive function given in Appendix 7.2. This function l turns out to be decreasing
in s; and convex in s for the empirically relevant range of variation of s. The Sharpe ratio
predicted by the model is:
\[ \lambda(s) = \eta\sigma_0 (1 + l(s)), \tag{7.17} \]
a decreasing function in s (see Appendix 2 for details on the derivation). Finally, Campbell and
Cochrane choose l so as to make the short-term rate constant.
The economic interpretation of the model is simple. Consider first the instantaneous utility
in (7.14). It is readily seen that $\mathrm{CRRA} = \eta s^{-1}$. That is, risk aversion is countercyclical in this
model. Intuitively, during economic downturns, the surplus consumption ratio s decreases and
agents become more risk-averse. As a result, prices decrease and expected returns increase. It is
a very nice mechanism. Finally, the model generates realistic risk premiums. Intuitively, the
pricing kernel is
\[ e^{-\rho t} \left( \frac{s(t)}{s(0)} \right)^{-\eta} \left( \frac{D(t)}{D(0)} \right)^{-\eta}, \]
and is more volatile than just $e^{-\rho t} \left( \frac{D(t)}{D(0)} \right)^{-\eta}$. But it is still
a high risk-aversion economy. Barberis, Huang and Santos (2001) have a similar mechanism,
based on alternative preferences.
By Eq. (7.16), the log of s is a mean-reverting process. By taking logs, we are sure that
s remains positive. Moreover, ln s is also conditionally heteroskedastic since its instantaneous
volatility is σ 0 l. Because l is decreasing in s and s is clearly procyclical, the volatility of ln s is
countercyclical. This feature is responsible for many interesting properties of the model, such as
countercyclical returns volatility.
Finally, the Sharpe ratio λ in Eq. (7.17) is made up of two components. The first is ησ 0 ,
which coincides with the Sharpe ratio predicted by the standard Gordon model. The second
is ησ_0 l(s), and arises as a compensation related to the stochastic fluctuations of x = D(1 − s). By
the functional form of l picked by the authors, λ is therefore countercyclical. Combined with
a high φ, this assumption leads to slowly-varying, countercyclical expected returns. Finally,
numerical simulations of the model lead the authors to conclude that the price-dividend ratio
is concave in s. In Appendix 6.8, we describe a simple algorithm that one may use to solve this
and related models numerically, and in discrete time.
We now proceed to clarify the theoretical link between convexity of l and concavity of the
price-dividend ratio in this and related models. We aim at writing the price-dividend ratio in
the format of the Canonical Pricing Problem of Section 7.4, and then appeal to Proposition 7.1.
The starting point is the evaluation formula in Eq. (7.6). Note that interest rates are constant
in the Campbell-Cochrane model. Yet to gain in generality, it will be assumed they are state
dependent, although only a function of s. Therefore, Eq. (7.6) predicts that the price-dividend
ratio is:
\[ p(D, s) \equiv \frac{S(D, s)}{D} = \int_0^\infty \frac{C(D, s, \tau)}{D}\,d\tau = \int_0^\infty E\left[ e^{-\int_0^\tau r(s(u))\,du} \cdot \frac{D(\tau)}{D} \,\middle|\, D, s \right] d\tau. \tag{7.18} \]
To compute the inner expectation, one has to figure out the dynamics of D under the risk-
neutral probability measure. By the Girsanov theorem,
\[ \frac{D(\tau)}{D} = e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)} \cdot e^{g_0 \tau - \int_0^\tau \sigma_0 \lambda(s(u))\,du}, \]
where Ŵ is a Brownian motion under the risk-neutral probability. By replacing this into Eq.
(7.18), and noticing that the price-dividend ratio is now independent of D, p (s) ≡ p (D, s), we
have:
\[ p(s) = \int_0^\infty e^{g_0 \tau} \cdot E\left[ e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)} \cdot e^{-\int_0^\tau \mathrm{Disc}(s(u))\,du} \,\middle|\, D, s \right] d\tau, \tag{7.19} \]
where
\[ \mathrm{Disc}(s) \equiv r(s) + \sigma_0 \lambda(s) \]
is the “risk-adjusted” discount rate. Note also, that under the risk-neutral probability measure,
\[ ds(\tau) = \hat{\varphi}(s(\tau))\,d\tau + v(s(\tau))\,d\hat{W}(\tau), \]
where $\hat{\varphi}(s) = \varphi(s) - v(s)\lambda(s)$, $\varphi(s) = s[(1 - \phi)(\bar{s} - \ln s) + \frac{1}{2}\sigma_0^2 l(s)^2]$ and $v(s) = \sigma_0 s\, l(s)$.
To obtain a neat formula, we should also get rid of the $e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)}$ term. Intuitively, this
term arises because consumption and habit are correlated. A convenient change of measure will
do the job. Precisely, define a new probability measure P̄ (say) through the Radon-Nikodym
derivative $d\bar{P} / d\hat{P} = e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)}$. Under this new probability measure, the price-dividend
ratio p(s) satisfies,
\[ p(s) = \int_0^\infty e^{g_0 \tau} \cdot \bar{E}\left[ e^{-\int_0^\tau \mathrm{Disc}(s(u))\,du} \,\middle|\, s \right] d\tau, \tag{7.20} \]
and
ds(τ ) = ϕ̄ (s(τ )) dτ + v(s (τ ))dW̄ (τ ),
where W̄ (τ ) = Ŵ (τ ) − σ 0 τ is a P̄ -Brownian motion, and ϕ̄ (s) = ϕ (s) − v (s) λ (s) + σ 0 v (s).
The inner expectation in Eq. (7.20) comes in exactly the same format as in the canonical
pricing problem of Section 7.4. Therefore, we can apply Proposition 7.1, and make the following
conclusions:
(i) Suppose that risk-adjusted discount rates are countercyclical, viz $\frac{d}{ds}\mathrm{Disc}(s) \le 0$. Then
price-dividend ratios are procyclical, viz $\frac{d}{ds} p(s) > 0$.
(ii) Suppose that price-dividend ratios are procyclical. Then price-dividend ratios are concave
in s whenever risk-adjusted discount rates are convex in s, viz $\frac{d^2}{ds^2}\mathrm{Disc}(s) > 0$, and
$\frac{d^2}{ds^2}\bar{\varphi}(s) \le 2\frac{d}{ds}\mathrm{Disc}(s)$.
So we have found joint restrictions on the primitives such that the pricing function p is
consistent with properties given in advance. What is the economic interpretation related to
the convexity of risk-adjusted discount rates? If price-dividend ratios are concave in some state
variable Y tracking the business cycle conditions, stock volatility increases on the downside,
and is thus countercyclical, as illustrated by Figure 7.6. According to the previous predictions,
price-dividend ratios are concave in Y whenever risk-adjusted discount rates are decreasing and
sufficiently convex in Y . The economic significance of convexity in this context is that in good
times, risk-adjusted discount rates are substantially constant. As a result, the evaluation of
future dividends does not vary too much, and price-dividend ratios remain relatively constant.
In bad times, however, risk-adjusted discount rates increase sharply, thus making price-dividend
ratios more responsive to changes in the economic conditions.
One defect of this model is that the variables of interest are all driven by the same state
variable, s. For this reason, the correlation between consumption growth and stock returns is,
conditionally, one, whereas in the data, it is much less. Naturally, the correlation predicted by
this model is less than one, unconditionally, but still too high, if compared with that in the
data.
7.5.2 Large price swings as a learning induced phenomenon

Time variation in stock volatility may also arise as a result of the agents’ learning process about
the economic fundamentals. In models along these lines, public signals about the fundamentals
hit the market, and agents make inferences about them, thereby creating new state variables
driving price fluctuations, which relate to the agents' own guesses about the economic fundamentals.
Timmermann (1993, 1996) provides models with exogenous discount rates, where learning
effects increase stock volatility over and above what we might observe in a world without
uncertainty about the fundamentals, and hence without learning effects. Brennan and Xia (2001) generalize these
models to a stochastic general equilibrium. Veronesi (1999) provides a rational expectations
model with learning about the fundamentals, with quite nonlinear learning effects. This section
provides details about the mechanisms through which learning affects asset prices in general,
and stock volatility in particular.
Suppose consumption D is generated by D = θ + w, where θ and w are independently distributed,
with p ≡ Pr(θ = A) = 1 − Pr(θ = −A), and Pr(w = A) = Pr(w = −A) = 1/2.
Suppose that the “state” θ is unobserved. How would we update our prior probability p of
the “good” state upon observation of D? A simple application of Bayes' Theorem gives
the posterior probabilities Pr(θ = A | D_i) displayed in Table 7.3. Considered as a random variable
defined over observable states D_i, the posterior probability Pr(θ = A | D_i) has expectation
E[Pr(θ = A | D)] = p and variance var[Pr(θ = A | D)] = (1/2) p(1 − p). Clearly, this variance is
zero exactly where there is a degenerate prior on the state. More generally, it is a ∩-shaped
function of the a priori probability p of the good state. Since the “filter”, g ≡ E(θ | D), is
linear in Pr(θ = A | D), the same qualitative conclusions are also valid for g.
                              D_i (observable state)
                          D_1 = 2A     D_2 = 0     D_3 = −2A
Pr(D_i)                   (1/2) p      1/2         (1/2)(1 − p)
Pr(θ = A | D = D_i)       1            p           0
TABLE 7.3. Randomization of the posterior probabilities Pr(θ = A | D).
To understand in detail how we computed the values in Table 7.3, let us recall Bayes' Theorem.
Let (E_i)_i be a partition of the state space Ω. (This partition can be finite or uncountable,
i.e. the set of indexes i can be finite or uncountable - it really doesn’t matter.) Then Bayes'
Theorem says that,
\[ \Pr(E_i \mid F) = \Pr(E_i) \cdot \frac{\Pr(F \mid E_i)}{\Pr(F)} = \Pr(E_i) \cdot \frac{\Pr(F \mid E_i)}{\sum_j \Pr(F \mid E_j) \Pr(E_j)}. \tag{7.21} \]
By applying Eq. (7.21) to our example,
\[ \Pr(\theta = A \mid D = D_1) = \Pr(\theta = A)\, \frac{\Pr(D = D_1 \mid \theta = A)}{\Pr(D = D_1)} = p\, \frac{\Pr(D = D_1 \mid \theta = A)}{\Pr(D = D_1)}. \]
But Pr(D = D_1 | θ = A) = Pr(w = D_1 − A) = Pr(w = A) = 1/2. On the other hand, we have
that Pr(D = D_1) = (1/2) p. This leaves Pr(θ = A | D = D_1) = 1. It’s trivial, but one proceeds
similarly to compute the other probabilities.
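The whole table, together with the mean and variance of the posterior, can be reproduced by brute-force enumeration. The following is a minimal sketch using exact rational arithmetic; D, θ and w are measured in units of A, the prior p = 3/10 is an arbitrary illustrative choice in (0, 1), and the function name is hypothetical.

```python
from fractions import Fraction

def posterior_table(p):
    """Return {D_i: (Pr(D_i), Pr(theta = A | D_i))}, with D in units of A.

    Assumes 0 < p < 1 so that every observable state has positive probability.
    """
    half = Fraction(1, 2)
    prior = {1: p, -1: 1 - p}      # theta = A or -A
    noise = {1: half, -1: half}    # w = A or -A
    joint = {}                     # joint[(d, theta)] = Pr(D = d, theta)
    for theta, p_theta in prior.items():
        for w, p_w in noise.items():
            key = (theta + w, theta)
            joint[key] = joint.get(key, 0) + p_theta * p_w
    table = {}
    for d in (2, 0, -2):
        pr_d = sum(v for (dd, _), v in joint.items() if dd == d)
        # Bayes' theorem: Pr(theta = A | D = d) = Pr(D = d, theta = A) / Pr(D = d)
        table[d] = (pr_d, joint.get((d, 1), 0) / pr_d)
    return table

p = Fraction(3, 10)
table = posterior_table(p)
mean = sum(pr_d * post for pr_d, post in table.values())
variance = sum(pr_d * (post - mean) ** 2 for pr_d, post in table.values())
# mean equals p and variance equals p * (1 - p) / 2, as stated in the text.
```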
The previous example conveys the main ideas underlying nonlinear filtering. However, it leads
to a nonlinear filter, g, differing from those we usually see in the literature (see, e.g., Chapters
8 and 9 in Liptser and Shiryaev, 2001a). In the literature, the instantaneous variance of the
posterior probability changes, dπ say, is, typically, proportional to $\pi^2 (1 - \pi)^2$, not to π(1 − π).
As we now heuristically demonstrate, the distinction is merely technical. Precisely, it is due to
the assumption that w is a discrete random variable. Indeed, assume that w has zero mean
and unit variance, and that it is absolutely continuous with arbitrary density function φ. Let
π(D) ≡ Pr ( θ = A| D ∈ dD). By an application of the Bayesian learning mechanism in Eq.
(7.21),
\[ \pi(D) = \Pr(\theta = A) \cdot \frac{\Pr(D \in dD \mid \theta = A)}{\Pr(D \in dD \mid \theta = A) \Pr(\theta = A) + \Pr(D \in dD \mid \theta = -A) \Pr(\theta = -A)}. \]
But Pr(D ∈ dD | θ = A) = Pr(w = D − A) = φ(D − A) and similarly, Pr(D ∈ dD | θ = −A) =
Pr(w = D + A) = φ(D + A). Simple computations then leave,
\[ \pi(D) - p = p(1 - p)\, \frac{\phi(D - A) - \phi(D + A)}{p\,\phi(D - A) + (1 - p)\,\phi(D + A)}. \tag{7.22} \]
That is, the variance of the “probability changes” π(D) − p is proportional to $p^2 (1 - p)^2$.
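For completeness, the “simple computations” behind Eq. (7.22) amount to putting π(D) − p over a common denominator, using only the Bayes formula above and the density φ of w:

```latex
\begin{align*}
\pi(D) - p
  &= \frac{p\,\phi(D-A)}{p\,\phi(D-A) + (1-p)\,\phi(D+A)} - p \\
  &= \frac{p\,\phi(D-A) - p^2\,\phi(D-A) - p(1-p)\,\phi(D+A)}{p\,\phi(D-A) + (1-p)\,\phi(D+A)} \\
  &= p(1-p)\,\frac{\phi(D-A) - \phi(D+A)}{p\,\phi(D-A) + (1-p)\,\phi(D+A)} .
\end{align*}
```

The prefactor p(1 − p) multiplies the whole probability change, which is why its variance scales with p²(1 − p)².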
To add more structure to the problem, we now assume that w is Brownian motion and set
A ≡ Adτ . Let D0 ≡ D(0) = 0. In Appendix 5, we show that by an application of Itô’s lemma
to π(D),
dπ(τ ) = 2A · π(τ )(1 − π(τ ))dW (τ ), π(D0 ) ≡ p, (7.23)
where dW (τ ) ≡ dD(τ ) − g(τ )dτ and g(τ ) ≡ E ( θ| D (τ )) = [Aπ(τ ) − A(1 − π(τ ))]. Naturally,
this construction is heuristic. Nevertheless, the result is correct.⁵ Importantly, it is possible to
show that W is a Brownian motion with respect to the agents’ information set σ(D(t), t ≤ τ).⁶
Therefore, the equilibrium in the original economy with incomplete information is isomorphic
in its pricing implications to the equilibrium in a full information economy in which,
\[ \begin{cases} dD(\tau) = [g(\tau) - \lambda(\tau)\sigma_0]\,d\tau + \sigma_0\,d\hat{W}(\tau) \\ dg(\tau) = -\lambda(\tau)\,v(g(\tau))\,d\tau + v(g(\tau))\,d\hat{W}(\tau) \end{cases} \tag{7.24} \]
where Ŵ is a Q-Brownian motion, λ is a risk-premium process, $v(g) \equiv (A - g)(g + A)/\sigma_0$ and
$\sigma_0 \equiv 1$.⁷ In fact, if the variance per unit of time of w is $\sigma_0^2$, eqs. (7.24) hold for any $\sigma_0 > 0$.⁸
Furthermore, a similar result would hold had drift and diffusion of Z been assumed to be
proportional to D. In this case, (Z, G) would be solution to
\[ \begin{cases} \dfrac{dD(\tau)}{D(\tau)} = [g(\tau) - \sigma_0 \lambda]\,d\tau + \sigma_0\,d\hat{W}(\tau) \\ dg(\tau) = \varphi(g(\tau))\,d\tau + v(g(\tau))\,d\hat{W}(\tau) \end{cases} \tag{7.25} \]
where ϕ = −λv and v is as before. In all cases, the instantaneous volatility of G is ∩-shaped.
Under positive risk-aversion, this makes the risk-neutralized drift of G a convex function of g.
The economic implications of this result are very important, and will be analyzed with the help
of Proposition 7.1.
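The ∩-shape and the convexity claim can be verified in two lines. The sketch below takes σ_0 = 1 as in Eq. (7.24), uses g = Aπ − A(1 − π) = A(2π − 1) from the definition of the filter, and assumes a constant λ > 0 for the convexity step (the text allows λ to be a general process):

```latex
\begin{align*}
dg &= 2A\,d\pi = 4A^2\,\pi(1-\pi)\,dW, \\
(A - g)(g + A) &= \bigl(2A(1-\pi)\bigr)\bigl(2A\pi\bigr) = 4A^2\,\pi(1-\pi) = v(g), \\
v''(g) &= -\frac{2}{\sigma_0} < 0
  \quad\Longrightarrow\quad
  \frac{d^2}{dg^2}\bigl(-\lambda\,v(g)\bigr) = \frac{2\lambda}{\sigma_0} > 0 .
\end{align*}
```

So the diffusion coefficient of g implied by Eq. (7.23) coincides with v(g), it vanishes at the boundaries g = ±A, and the risk-neutralized drift −λv(g) is convex.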
⁵ See, for example, Liptser and Shiryaev (2001a) (theorem 8.1 p. 318; and example 1 p. 371).
⁶ See Liptser and Shiryaev (2001a) (theorem 7.12 p. 273).
⁷ Such an isomorphic property has been pointed out for the first time by Veronesi (1999) in a related model.
⁸ More precisely, we have $dW(\tau) = \sigma_0^{-1}\left( dz(\tau) - E\left( \theta \mid z(t)_{t \le \tau} \right) d\tau \right) = \sigma_0^{-1}\left( dz(\tau) - g(\tau)\,d\tau \right)$.
System (7.24) is related to a model studied by Veronesi (1999) (see also David, 1997, for a
related model). This model regards an infinite horizon economy with a representative agent
with CARA = γ. This agent observes realizations of Z generated by:
dD(τ ) = θdτ + σ 0 dw1 (τ ), (7.26)
where w_1 is a Brownian motion, and θ is a two-state ($\bar{\theta}$, $\underline{\theta}$) Markov chain. θ is unobserved,
and the agent implements a Bayesian procedure to learn whether she lives in the “good” state
$\bar{\theta} > \underline{\theta}$. All in all, such a Bayesian procedure is similar to the one in Eq. (7.21). Therefore, it
would be relatively simple to show that all equilibrium prices in this economy are isomorphic
to all equilibrium prices in an economy in which (Z, G) are solution to:
\[ \begin{cases} dD(\tau) = \left( g(\tau) - \gamma\sigma_0^2 \right) d\tau + \sigma_0\,d\hat{W}(\tau) \\ dg(\tau) = \left( k(\bar{g} - g(\tau)) - \gamma\sigma_0\,v(g(\tau)) \right) d\tau + v(g(\tau))\,d\hat{W}(\tau) \end{cases} \]
where Ŵ is a $P^0$-Brownian motion, $v(g) = (\bar{\theta} - g)(g - \underline{\theta})/\sigma_0$, and k, ḡ are some positive constants.
Veronesi (1999) also assumed that the riskless asset is infinitely elastically supplied, and therefore
that the interest rate r is a constant. It is instructive to examine the price implications
of the resulting economy. In terms of the representation in Eq. (7.6), this model predicts that
$S(D, g) = \int_0^\infty C(D, g, \tau)\,d\tau$, where
\[ C(D, g, \tau) = e^{-r\tau}\left( D - \sigma_0^2 \gamma \tau \right) + D(g, \tau), \quad \text{and} \quad D(g, \tau) \equiv e^{-r\tau} \int_0^\tau E[g(u) \mid g]\,du, \quad \tau \ge 0. \tag{7.27} \]
We may now apply Proposition 7.1 to study convexity properties of D(g, τ). Precisely, function
E [g(u)| g] is a special case of the auxiliary pricing function (7.7) (namely, for ρ ≡ 1 and
ψ(g) = g). By Proposition 7.1-b), E [g(u)| g] is convex in g whenever the drift of G in (7.24)
is convex. This condition is automatically guaranteed by γ > 0. Technically, Proposition 7.1
implies that the conditional expectation of a diffusion process inherits the very same second
order properties (concavity, linearity, and convexity) of the drift function.
The economic implications of this result are striking. In this economy prices are convex in
the expected dividend growth. This means that in good times, prices may well rocket to very
high values with relatively small movements in the underlying fundamentals.
The economic interpretation of this convexity property is that risk-aversion correction is nil
during extreme situations (i.e. when the dividend growth rate is at its boundaries), and it is
the highest during relatively more “normal” situations. More formally, the risk-adjusted drift
of g is $\hat{\varphi}(g) = \varphi(g) - \gamma\sigma_0 v(g)$, and it is convex in g because v is concave in g.
Finally, we examine model (7.25). Notice that this model has been obtained as
a result of a specific learning mechanism. Yet alternative learning mechanisms can lead to a
model having the same structure, but with different coefficients ϕ and v. For example, a model
related to the Brennan and Xia (2001) information structure is one in which a single infinitely lived
agent observes Z, where Z is solution to:
\[ \frac{dD(\tau)}{D(\tau)} = \hat{g}(\tau)\,d\tau + \sigma_0\,dw_1(\tau), \]
where $\hat{G} = \{\hat{g}(\tau)\}_{\tau > 0}$ is unobserved, but now it does not evolve on a countable number of
states. Rather, it follows an Ornstein-Uhlenbeck process:
\[ d\hat{g}(\tau) = k(\bar{g} - \hat{g}(\tau))\,d\tau + \sigma_1\,dw_1(\tau) + \sigma_2\,dw_2(\tau) \]
where ḡ, σ_1 and σ_2 are positive constants. Suppose now that the agent implements a learning
procedure similar as before. If she has a Gaussian prior on ĝ(0) with variance $\gamma_*^2$ (defined below),
the nonarbitrage price takes the form S(D, g), where (Z, G) are now solution to Eq. (7A.13),
with $m_0(D, g) = gD$, $\sigma(D) = \sigma_0 D$, $\varphi_0(D, g) = k(\bar{g} - g)$, $v_2 = 0$, and $v_1 \equiv v_1(\gamma_*) = (\sigma_1 + \frac{1}{\sigma_0}\gamma_*)^2$,
where $\gamma_*$ is the positive solution to $v_1(\gamma) = \sigma_1^2 + \sigma_2^2 - 2k\gamma$.⁹
Finally, models making expected consumption another observed diffusion may be of interest
in their own right (see for example, Campbell (2003) and Bansal and Yaron (2004)).
Now let’s analyze these models. Once again, we may make use of Proposition 7.1. We need
to set the problem in terms of the notation of the canonical pricing problem in Section 7.4. To
simplify the exposition, we suppose that λ is constant. By the same kind of reasoning leading
to Eq. (7.20), one finds that the price-dividend ratio is independent of D here too, and is given
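by function p below,
\[ p(g) = \int_0^\infty \bar{E}\left[ e^{\int_0^\tau [g(u) - r(g(u))]\,du - \sigma_0 \lambda \tau} \,\middle|\, g \right] d\tau, \tag{7.28} \]
where,
\[ \begin{cases} \dfrac{dD(\tau)}{D(\tau)} = [g(\tau) - \sigma_0 \lambda]\,d\tau + \sigma_0\,d\bar{W}(\tau) \\ dg(\tau) = [\varphi(g(\tau)) + \sigma_0 v(g(\tau))]\,d\tau + v(g(\tau))\,d\bar{W}(\tau) \end{cases} \]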
and W̄ (τ ) = Ŵ (τ ) − σ 0 τ is a P̄ -Brownian motion. Under regularity conditions, monotonicity
and convexity properties are inherited by the inner expectation in Eq. (7.28). Precisely, in the
notation of the canonical pricing problem,
\[ \rho(g) = -g + R(g) + \sigma_0 \lambda \quad \text{and} \quad b(g) = \varphi_0(g) + (\sigma_0 - \lambda)\,v(g), \]
where ϕ_0 is the drift of g under the physical probability measure. Therefore,
(i) The price-dividend ratio is increasing in the dividend growth rate whenever $\frac{d}{dg} R(g) < 1$.
(ii) Suppose that the price-dividend ratio is increasing in the dividend growth rate. Then it is
convex whenever $\frac{d^2}{dg^2} R(g) > 0$, and $\frac{d^2}{dg^2}\left[ \varphi_0(g) + (\sigma_0 - \lambda)\,v(g) \right] \ge -2 + 2\frac{d}{dg} R(g)$.
For example, if the riskless rate is constant (because for example the riskless asset is infinitely elastically
supplied), then the price-dividend ratio is always increasing and it is convex whenever,
\[ \frac{d^2}{dg^2}\left[ \varphi_0(g) + (\sigma_0 - \lambda)\,v(g) \right] \ge -2. \]
The reader can now use these conditions to check predictions made by all models with stochastic
dividend growth presented before.
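As one such check, consider the learning models above with a constant riskless rate. The sketch assumes a physical drift ϕ_0 that is linear in g (so ϕ_0″ = 0, which covers both ϕ_0 = 0 and ϕ_0 = k(ḡ − g)), a constant λ, and the quadratic diffusion coefficient v(g) = (θ̄ − g)(g − θ̲)/σ_0:

```latex
\begin{align*}
v''(g) &= -\frac{2}{\sigma_0}, \\
\frac{d^2}{dg^2}\bigl[\varphi_0(g) + (\sigma_0 - \lambda)\,v(g)\bigr]
  &= -\frac{2(\sigma_0 - \lambda)}{\sigma_0}
   = -2 + \frac{2\lambda}{\sigma_0}
   \;\ge\; -2
  \quad\Longleftrightarrow\quad \lambda \ge 0 .
\end{align*}
```

Under these assumptions the convexity condition holds for any nonnegative risk premium λ, so the price-dividend ratio is increasing and convex in g.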
7.6 The cross section of stock returns and volatilities
⁹ In their article, Brennan and Xia considered a slightly more general model in which consumption and dividends differ. They
obtain a reduced-form model which is identical to the one in this example. In the calibrated model, Brennan and Xia found that
the variance of the filtered ĝ is higher than the variance of the expected dividend growth in an economy with complete information.
The results on γ_* in this example can be obtained through an application of theorem 12.1 in Liptser and Shiryaev (2001) (Vol. II, p.
22). They generalize results in Gennotte (1986) and are a special case of results in Detemple (1986). Neither Gennotte nor Detemple
emphasized the impact of learning on the pricing function.
7.7 Appendix 1: Calibration of the tree in Section 7.3

Solution and calibration of the model. The initial step of the calibration reported in Table
2 involves estimating the two parameters p and δ of the dividend process. Let G be the dividend
gross growth rate, computed at a yearly frequency. We calibrate p and δ by a perfect matching of the
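model’s expected dividend growth, $\mu_D \equiv E(G) = p e^{-\delta} + (1 - p) e^{\delta}$, and the model’s dividend variance,
$\sigma_D^2 \equiv \mathrm{var}(G) = \left( e^{\delta} - e^{-\delta} \right)^2 p (1 - p)$, to their sample counterparts $\hat{\mu}_D = 1.0594$ and $\hat{\sigma}_D = 0.0602$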
obtained on US aggregate dividend data. The result is (p, δ) = (0.158, 0.082). Given these calibrated
values of (p, δ), we fix r = 1.0%, and proceed to calibrate the probabilities q, qB and qG .
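The two moment conditions can be solved in closed form, which gives a quick way to check the calibration. The sketch below is one possible implementation, not necessarily the procedure used for the reported numbers: eliminating p from the two conditions gives (e^δ − µ)(µ − e^{−δ}) = σ², i.e. 2 cosh(δ) = (1 + µ² + σ²)/µ. The function name is hypothetical.

```python
import math

def calibrate_two_point_growth(mu, sigma):
    """Match E(G) = mu and sd(G) = sigma when G = exp(-delta) with prob. p, exp(delta) with prob. 1 - p."""
    # From the two moment conditions: 2 * cosh(delta) = (1 + mu^2 + sigma^2) / mu.
    delta = math.acosh((1.0 + mu ** 2 + sigma ** 2) / (2.0 * mu))
    up, down = math.exp(delta), math.exp(-delta)
    # The mean condition then pins down the probability of the low-growth state.
    p = (up - mu) / (up - down)
    return p, delta

# Sample moments quoted in the text.
p, delta = calibrate_two_point_growth(1.0594, 0.0602)

# Sanity check: recompute the model moments from the calibrated pair.
mu_check = p * math.exp(-delta) + (1.0 - p) * math.exp(delta)
sd_check = (math.exp(delta) - math.exp(-delta)) * math.sqrt(p * (1.0 - p))
```

With the rounded sample moments above, the solution should land close to the reported pair (0.158, 0.082); small differences in the third decimal are to be expected from rounding of the inputs.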
To calibrate (q, qB , qG ), we need an explicit expression for all the payoffs at each node. By standard
risk-neutral evaluation, we obtain a closed form solution for the price of the claim M_S, as follows. For
each state S ∈ {G, B, GB}, M_S is solution to,
\[ \frac{M_S}{D_S} = e^{-r} E_S\left[ \frac{D'_S}{D_S} + \frac{M'_S}{D'_S} \frac{D'_S}{D_S} \right], \tag{7A.1} \]
where E_S(·) is the expectation taken under the risk-neutral probability q_S in state S, S ∈ {G, B, GB},
and $q_{GB} = q$, $D_G = e^{2\delta}$, $D_B = e^{-2\delta}$, $D_{GB} = 1$, and $D'_S$ and $M'_S$ are the dividend and the price of the
claim as of the next period. Since risk-aversion is constant from the third period on, the price-dividend
ratio is constant as well, from the third period on, which implies that $\frac{M_S}{D_S} = \frac{M'_S}{D'_S}$. Using this equality
in Eq. (7A.1), and solving for M_S, yields,
\[ M_S = D_S\, \frac{q_S e^{-\delta} + (1 - q_S) e^{\delta}}{e^{r} - \left[ q_S e^{-\delta} + (1 - q_S) e^{\delta} \right]}, \quad S \in \{G, B, GB\}. \tag{7A.2} \]
We calibrate (q_G, q_B, q_GB = q) to make the “hybrid” price-dividend (P/D henceforth) ratio M_GB,
the “good” P/D ratio $M_G / e^{2\delta}$ and the “bad” P/D ratio $M_B / e^{-2\delta}$ in Eq. (7A.2) perfectly match the average
P/D ratio, the average P/D ratio during NBER expansion periods, and the average P/D ratio during
NBER recession periods (i.e. 31.99, 33.21 and 26.20, from Table 7.1). Given (p, δ, r, q, q_B, q_G), we
compute the P/D ratios in states G and B. For example, the price of the asset in state B is,
$P_B = e^{-r}\left[ q_B \left( e^{-2\delta} + M_B \right) + (1 - q_B)(1 + M_{GB}) \right]$. Given P_B, we compute the log-return in the bad state as
$\log(\tilde{\Pi} / P_B)$, where either $\tilde{\Pi} = e^{-2\delta} + M_B$ with probability p, or $\tilde{\Pi} = 1 + M_{GB}$ with probability 1 − p.
Then, we compute the return volatility in state B. The P/D ratios, the expected log-return and
return volatility in state G are computed similarly. (Please notice that volatilities under p and under
{qS }S∈{G,B,GB} are not the same.)
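Because Eq. (7A.2) gives the P/D ratio M_S/D_S as a function of q_S alone, each risk-neutral probability can be backed out from its target P/D ratio in closed form. The following is a minimal sketch of that inversion, using the values δ = 0.082 and r = 1% and the three P/D targets quoted in the text; the function names are hypothetical and no claim is made that the output reproduces the published table to the last digit.

```python
import math

def pd_ratio(q, delta, r):
    """P/D ratio M_S / D_S implied by Eq. (7A.2) for a risk-neutral probability q."""
    x = q * math.exp(-delta) + (1.0 - q) * math.exp(delta)  # risk-neutral expected gross growth
    return x / (math.exp(r) - x)

def q_from_pd(pd, delta, r):
    """Invert Eq. (7A.2): the risk-neutral probability that reproduces a target P/D ratio."""
    x = math.exp(r) * pd / (1.0 + pd)
    return (math.exp(delta) - x) / (math.exp(delta) - math.exp(-delta))

delta, r = 0.082, 0.01
targets = {"GB": 31.99, "G": 33.21, "B": 26.20}   # average, expansion, recession P/D ratios
q = {state: q_from_pd(pd, delta, r) for state, pd in targets.items()}
# A lower P/D target requires a higher risk-neutral probability of the low-growth outcome.
```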
Next, we recover the risk-aversion parameter η_S in the three states S ∈ {G, B, GB} implied by the
previously calibrated probabilities q_B, q_G and q = q_GB. As we shall show below, the relevant formula
to use is,
\[ \frac{q_S}{p} = \frac{e^{\eta_S \delta}}{p\,e^{\eta_S \delta} + (1 - p)\,e^{-\eta_S \delta}}, \quad S \in \{G, B, GB\}. \tag{7A.3} \]
The values for the “implied” risk-aversion parameter in Table 7.2 are obtained by inverting Eq. (7A.3)
for η_S, given the calibrated values of (p, δ, q_S).
Finally, we compute the risk-adjusted discount rate as r + σ̂D λS , where λS is the Sharpe ratio,
which we shall show below to equal,
\[ \lambda_S = \frac{q_S - p}{\sqrt{p(1 - p)}}, \quad S \in \{G, B, GB\}. \tag{7A.4} \]
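Both formulas are easy to evaluate. Eq. (7A.3) can be inverted explicitly, since it rearranges to e^{2η_S δ} = q_S(1 − p)/(p(1 − q_S)). The sketch below reads the denominator of Eq. (7A.4) as the standard deviation √(p(1 − p)) of the two-point distribution, as in its proof; the value `q_example` is purely illustrative and the function names are hypothetical.

```python
import math

def implied_risk_aversion(q, p, delta):
    """Solve Eq. (7A.3) for eta, using exp(2 * eta * delta) = q * (1 - p) / (p * (1 - q))."""
    return math.log(q * (1.0 - p) / (p * (1.0 - q))) / (2.0 * delta)

def sharpe_ratio(q, p):
    """Eq. (7A.4), with sqrt(p * (1 - p)) in the denominator."""
    return (q - p) / math.sqrt(p * (1.0 - p))

p, delta = 0.158, 0.082   # calibrated values quoted in the text
q_example = 0.65          # hypothetical risk-neutral probability, for illustration only

eta = implied_risk_aversion(q_example, p, delta)
lam = sharpe_ratio(q_example, p)

# Consistency check: plug eta back into the right-hand side of Eq. (7A.3).
rhs = math.exp(eta * delta) / (p * math.exp(eta * delta) + (1.0 - p) * math.exp(-eta * delta))
```

When q equals p the implied risk aversion and the Sharpe ratio are both zero, which is the risk-neutral benchmark.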
Proof of Eq. (7A.3). We only provide the derivation of the risk-neutral probability q_B, since
the proofs for the expressions of the risk-neutral probabilities q_G and q = q_GB are nearly identical. In
equilibrium, the Euler equation for the stock price at the “bad” node is,
\[ P_B = \beta E\left[ \frac{u'_B(\tilde{D}_S)}{u'_B(e^{-\delta})} \left( \tilde{D}_S + M_S \right) \right] = \beta E\left[ \tilde{G}_S^{-\eta_B} \left( \tilde{D}_S + M_S \right) \right], \quad S \in \{B, GB\}, \tag{7A.5} \]
where: (i) β is the discount factor; (ii) the utility function for consumption C is state dependent and equal
to $u_B(C) = C^{1-\eta_B} / (1 - \eta_B)$; (iii) E(·) is the expectation taken under the probability p; and (iv)
the dividend $\tilde{D}_S$ and the gross dividend growth rate $\tilde{G}_S$ are either $\tilde{D}_B = e^{-2\delta}$ and $\tilde{G}_B = \frac{e^{-2\delta}}{e^{-\delta}} = e^{-\delta}$
with probability p, or $\tilde{D}_{GB} = 1$ and $\tilde{G}_{GB} = \frac{1}{e^{-\delta}} = e^{\delta}$ with probability 1 − p.
The model we set up assumes that the safe asset is elastically supplied or, equivalently, that there exists
a storage technology with a fixed rate of return equal to r = 1%. Let us derive the agent’s private
evaluation of this asset. The Euler equation for the safe asset is,
\[ e^{-r_B} = \beta E\left[ \tilde{G}_S^{-\eta_B} \right] = \beta \sum_{S \in \{B, GB\}} p_S\, \tilde{G}_S^{-\eta_B}, \tag{7A.6} \]
where the safe interest rate, r_B, is state dependent, p_B = p and p_GB = 1 − p. Therefore,
\[ q_B = \beta e^{r_B} p\, \tilde{G}_B^{-\eta_B}, \quad 1 - q_B = \beta e^{r_B} (1 - p)\, \tilde{G}_{GB}^{-\eta_B} \tag{7A.7} \]
is a probability distribution. In fact, by plugging q_B and 1 − q_B into Eq. (7A.5), one sees that it
is the risk-neutral probability distribution. To obtain Eq. (7A.3), note that by Eq. (7A.6), $\beta e^{r_B} = 1 / E[\tilde{G}_S^{-\eta_B}]$,
which replaced into Eq. (7A.7) yields,
\[ \frac{q_B}{p} = \frac{\tilde{G}_B^{-\eta_B}}{E\left[ \tilde{G}_S^{-\eta_B} \right]}. \]
Eq. (7A.3) follows by the definition of $\tilde{G}_S$ given above.
Proof of Eq. (7A.4). Let $e^{\mu}$ be the gross expected return of the risky asset. The asset return can
take two values: $e^{R_\ell}$ with probability p, and $e^{R_h}$ with probability 1 − p, and $R_h > R_\ell$. Therefore, for
each state, we have that:
\[ e^{\mu} = p\,e^{R_\ell} + (1 - p)\,e^{R_h}, \quad e^{r} = q\,e^{R_\ell} + (1 - q)\,e^{R_h}, \tag{7A.8} \]
where we have omitted the dependence on the state S to alleviate the presentation. The standard
deviation of the asset return is $\mathrm{Std}_R = \left( e^{R_h} - e^{R_\ell} \right) \sqrt{p(1 - p)}$. The Sharpe ratio is defined as
\[ \lambda = \frac{e^{\mu} - e^{r}}{\mathrm{Std}_R}. \]
By subtracting the two equations in (7A.8),
\[ q = p + \frac{e^{\mu} - e^{r}}{\left( e^{R_h} - e^{R_\ell} \right) \sqrt{p(1 - p)}} \sqrt{p(1 - p)} = p + \lambda \sqrt{p(1 - p)}, \]
from which Eq. (7A.4) follows immediately. Note, also, that in terms of this definition of the Sharpe
ratio, the risk-neutral expectation of the dividend growth is $\mathbb{E}(G) = E(G) - \lambda\sigma_D$, where $\mathbb{E}$ denotes the expectation under q.
7.8 Appendix 2
7.8.1 Markov pricing kernels
Let
\[ \xi(\tau) \equiv \xi(D(\tau), y(\tau), \tau) = e^{-\int_0^\tau \delta(D(s), y(s))\,ds}\, \Upsilon(D(\tau), y(\tau)), \tag{7A.9} \]
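for some bounded positive function δ, and some positive function $\Upsilon(D, y) \in C^{2,2}(Z \times Y)$. By the
assumed functional form for ξ, and Itô’s lemma,
\begin{align*}
R(D, y) &= \delta(D, y) - \frac{L\Upsilon(D, y)}{\Upsilon(D, y)} \\
\lambda_1(D, y) &= -\sigma_0 D \frac{\partial}{\partial D} \ln \Upsilon(D, y) - v_1(D, y) \frac{\partial}{\partial y} \ln \Upsilon(D, y) \\
\lambda_2(D, y) &= -v_2(D, y) \frac{\partial}{\partial y} \ln \Upsilon(D, y)
\end{align*}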
Example A1 below is an important special case of this setting.
Example A1 (Infinite horizon, complete markets economy.) Consider an infinite horizon, complete
markets economy in which total consumption Z is solution to Eq. (7A.13), with v2 ≡ 0. Let a (single)
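agent’s program be:
\[ \max\, E\left[ \int_0^\infty e^{-\delta\tau} u(c(\tau), x(\tau))\,d\tau \right] \quad \text{s.t.} \quad V_0 = E\left[ \int_0^\infty \xi(\tau) c(\tau)\,d\tau \right], \quad V_0 > 0, \]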
0 0
where δ > 0, the instantaneous utility u is continuous and thrice continuously differentiable in its
arguments, and x is solution to
dx(τ ) = β(D(τ ), g(τ ), x(τ ))dτ + γ(D(τ ), g(τ ), x(τ ))dW1 (τ ).
In equilibrium, C = Z, where C is optimal consumption. In terms of the representation in (7A.9), we
have that δ(D, x) = δ, and $\Upsilon(D(\tau), x(\tau)) = u_1(D(\tau), x(\tau)) / u_1(D(0), x(0))$. Consequently, λ_2 = 0,
\begin{align*}
R(D, g, x) &= \delta - \frac{u_{11}(D, x)}{u_1(D, x)} m_0(D, g) - \frac{u_{12}(D, x)}{u_1(D, x)} \beta(D, g, x) \\
&\quad - \frac{1}{2} \sigma(D, g)^2 \frac{u_{111}(D, x)}{u_1(D, x)} - \frac{1}{2} \gamma(D, g, x)^2 \frac{u_{122}(D, x)}{u_1(D, x)} - \gamma(D, g, x)\,\sigma(D, g) \frac{u_{112}(D, x)}{u_1(D, x)} \tag{7A.10} \\
\lambda(D, g, x) &= -\frac{u_{11}(D, x)}{u_1(D, x)} \sigma(D, g) - \frac{u_{12}(D, x)}{u_1(D, x)} \gamma(D, g, x). \tag{7A.11}
\end{align*}
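As an illustration of Eq. (7A.11), the habit-formation Sharpe ratio in Eq. (7.17) can be recovered directly. The sketch below uses the utility function in Eq. (7.14), σ(D) = σ_0 D from Eq. (7.15), and x = D(1 − s) with the diffusion coefficient of s taken from Eq. (7.16); the expression for γ is obtained here by applying Itô’s lemma to x = D(1 − s) and is a derived step, not a formula quoted from the text:

```latex
\begin{align*}
u_1 &= (D - x)^{-\eta}, \qquad
u_{11} = -\eta\,(D - x)^{-\eta - 1}, \qquad
u_{12} = \eta\,(D - x)^{-\eta - 1}, \qquad D - x = sD, \\
\gamma &= (1 - s)\,\sigma_0 D - D\,\sigma_0\,s\,l(s) = \sigma_0 D\,\bigl(1 - s - s\,l(s)\bigr), \\
\lambda &= \frac{\eta}{sD}\,\sigma_0 D - \frac{\eta}{sD}\,\sigma_0 D\,\bigl(1 - s - s\,l(s)\bigr)
         = \frac{\eta\,\sigma_0}{s}\,\bigl(s + s\,l(s)\bigr)
         = \eta\,\sigma_0\,\bigl(1 + l(s)\bigr).
\end{align*}
```

The first term, ησ_0/s, is the compensation for dividend risk at the local risk aversion η/s, and the second term nets out the part of that risk absorbed by the habit.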
7.8.2 Arrow-Debreu PDEs

By Eq. (7.3), we know that S is solution to,
\[ LS + D = rS + (S_D \sigma_0 D + S_y v_1)\,\lambda_1 + S_y v_2 \lambda_2, \quad \forall (D, y) \in Z \times Y. \tag{7A.12} \]
Under regularity conditions, the Feynman-Kac representation of the solution to Eq. (7A.12) is exactly
Eq. (7.6). Naturally, Eq. (7.6) can also be rewritten under the physical measure. We have:
\[ C(D, y, \tau) = \mathbb{E}\left[ \exp\left( -\int_0^\tau r(y(t))\,dt \right) \cdot D(\tau) \,\middle|\, D, y \right] = E\left[ m(\tau) \cdot D(\tau) \mid D, y \right], \]
where $\mathbb{E}$ is the risk-neutral expectation, E is the expectation under the physical measure, and m is the
stochastic discount factor: $m(\tau) = \frac{\xi(\tau)}{\xi(0)}$, ξ(0) = 1. Given the previous assumptions, ξ
necessarily satisfies,
\[ \frac{d\xi(\tau)}{\xi(\tau)} = -r(y(\tau))\,d\tau - \lambda_1(y(\tau))\,dW_1(\tau) - \lambda_2(y(\tau))\,dW_2(\tau). \tag{7A.13} \]
Eq. (7A.12) can also be derived through the undiscounted “Arrow-Debreu adjusted” asset price
process, defined as:
w(D, y) ≡ Υ(D, y) · S(D, y).
By results in Section 7.4.2, we know that the following price representation holds true:
\[ S(\tau)\,\xi(\tau) = E\left[ \int_\tau^\infty \xi(s) D(s)\,ds \right], \quad \tau \ge 0. \]
Under regularity conditions, the previous equation can then be understood as the unique Feynman-Kac
stochastic representation of the solution to the following partial differential equation
Lw(D, y) + f(D, y) = δ(D, y)w(D, y), ∀(D, y) ∈ Z × Y,

where f ≡ ΥD. Eq. (7A.12) then follows by the definition of $Lw(\tau) \equiv \left. \frac{d}{ds} E[\Upsilon S] \right|_{s=\tau}$.
7.9 Appendix 3: The maximum principle

Suppose we are given the differential equation:
\[
\frac{dx(\tau)}{d\tau} = \phi(\tau), \quad \tau \in (t, T),
\]
where φ satisfies some basic regularity conditions (essentially an integrability condition: see below).
Suppose we know that
x(T ) = 0,
and that
sign (φ (τ )) = constant on τ ∈ (t, T ).
We wish to determine the sign of x(t). Under the previous assumptions on x(T ) and the sign of φ, we
have that:
sign (x(t)) = − sign (φ) .
The proof of this basic result can be grasped very simply from Figure 7A.1, and it also follows easily
analytically. We obviously have,
\[
0 = x(T) = x(t) + \int_t^T \phi(\tau)\,d\tau \quad \Leftrightarrow \quad x(t) = -\int_t^T \phi(\tau)\,d\tau.
\]
Next, suppose that,

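\[
\frac{dx(\tau)}{d\tau} = \phi(\tau), \quad \tau \in (t, T),
\]
where
\[
x(\tau) = f(y(\tau), \tau), \quad \tau \in (t, T),
\]
and
\[
\frac{dy(\tau)}{d\tau} = D(\tau), \quad \tau \in (t, T).
\]
With enough regularity conditions on φ, f, D, we have that
\[
\frac{dx}{d\tau} = \left(\frac{\partial}{\partial\tau} + L\right) f, \quad \tau \in (t, T),
\]
where $Lf = \frac{\partial f}{\partial y} \cdot D$. Therefore,
\[
\left(\frac{\partial}{\partial\tau} + L\right) f = \phi, \quad \tau \in (t, T), \tag{7A.14}
\]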
and the previous conclusions hold here as well: if f(y, T ) = 0 ∀y, and sign(φ(τ )) = constant on
τ ∈ (t, T ), then,
sign (f (t)) = − sign (φ) .
Again, this is so because
\[
f(y(t), t) = -\int_t^T \phi(\tau)\,d\tau.
\]
Such results can be extended in a straightforward manner in the case of stochastic differential
equations. Consider the more elaborate operator-theoretic format version of (7A.14) which typically
emerges in many asset pricing problems with Brownian information:

\[
0 = \left(\frac{\partial}{\partial\tau} + L - k\right) u + \zeta, \quad \tau \in (t, T). \tag{7A.15}
\]
[Figure: two panels plotting x(τ) against τ on (t, T), each ending at x(T) = 0. With φ < 0, x(t) lies above x(T); with φ > 0, x(t) lies below x(T).]
FIGURE 7A.1. Illustration of the maximum principle for ordinary differential equations
Let
\[
y(\tau) \equiv e^{-\int_t^\tau k(u)\,du}\, u(\tau) + \int_t^\tau e^{-\int_t^u k(s)\,ds}\, \zeta(u)\,du.
\]
We claim that if (7A.15) holds, then y is a martingale under some regularity conditions. Indeed,
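\[
\begin{aligned}
dy(\tau) &= -k(\tau) e^{-\int_t^\tau k(u)\,du}\, u(\tau)\,d\tau + e^{-\int_t^\tau k(u)\,du}\, du(\tau) + e^{-\int_t^\tau k(u)\,du}\, \zeta(\tau)\,d\tau \\
&= -k(\tau) e^{-\int_t^\tau k(u)\,du}\, u(\tau)\,d\tau + e^{-\int_t^\tau k(u)\,du} \left(\frac{\partial}{\partial\tau} + L\right) u(\tau)\,d\tau + e^{-\int_t^\tau k(u)\,du}\, \zeta(\tau)\,d\tau + \text{local martingale} \\
&= e^{-\int_t^\tau k(u)\,du} \left[-k(\tau) u(\tau) + \left(\frac{\partial}{\partial\tau} + L\right) u(\tau) + \zeta(\tau)\right] d\tau + \text{local martingale} \\
&= \text{local martingale},
\end{aligned}
\]
because $\left(\frac{\partial}{\partial\tau} + L - k\right) u + \zeta = 0$.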
Suppose that y is also a martingale. Then

\[
y(t) = u(t) = E[y(T)] = E\left[e^{-\int_t^T k(u)\,du}\, u(T)\right] + E\left[\int_t^T e^{-\int_t^u k(s)\,ds}\, \zeta(u)\,du\right],
\]
and starting from this relationship, you can adapt the previous reasoning on deterministic differential
equations to the stochastic differential case. The case with jumps is entirely analogous.
7.10 Appendix 4: Dynamic stochastic dominance

We have:
Proposition 7.A.1. (Dynamic Stochastic Dominance) Consider two economies A and B with two
fundamental volatilities $a_A$ and $a_B$, and let $\pi_i(x) \equiv a_i(x) \cdot \lambda_i(x)$ and $\rho_i(x)$ (i = A, B) be the corresponding
risk-premium and discount rate. If $a_A > a_B$, the price $c^A$ in economy A is lower than the price
$c^B$ in economy B whenever for all (x, τ) ∈ R × [0, T],
\[
V(x, \tau) \equiv -[\rho_A(x) - \rho_B(x)]\, c^B(x, \tau) - [\pi_A(x) - \pi_B(x)]\, c^B_x(x, \tau) + \frac{1}{2}\left[a_A^2(x) - a_B^2(x)\right] c^B_{xx}(x, \tau) < 0. \tag{7A.16}
\]
If X is the price of a traded asset, πA = πB . If in addition ρ is constant, c is decreasing (increasing)

in volatility whenever it is concave (convex) in x. This phenomenon is tightly related to the “convexity
effect” discussed earlier. If X is not a traded risk, two additional effects are activated. The first one
reflects a discounting adjustment, and is apparent through the first term in the definition of V . The
second effect reflects risk-premiums adjustments and corresponds to the second term in the definition
of V . Both signs at which these two terms show up in Eq. (7A.16) are intuitive.
7.11 Appendix 5: Proofs of selected results

Proof of proposition 7.A.1. The function $c(x, T - s) \equiv E\left[\left.\exp\left(-\int_s^T \rho(x(t))\,dt\right) \cdot \psi(x(T)) \,\right|\, x(s) = x\right]$
is solution to the following partial differential equation:
\[
\begin{cases}
0 = -c_2(x, T - s) + L^* c(x, T - s) - \rho(x) c(x, T - s), & \forall (x, s) \in \mathbb{R} \times [0, T) \\
c(x, 0) = \psi(x), & \forall x \in \mathbb{R}
\end{cases}
\tag{7A.17}
\]
where $L^* c(x, u) = \frac{1}{2} a(x)^2 c_{xx}(x, u) + b(x) c_x(x, u)$ and subscripts denote partial derivatives. Clearly, $c^A$
and $c^B$ are both solutions to the partial differential equation (7A.17), but with different coefficients.
Let $b_A(x) \equiv b_0(x) - \pi_A(x)$. The price difference $\Delta c(x, \tau) \equiv c^A(x, \tau) - c^B(x, \tau)$ is solution to the
following partial differential equation: ∀(x, s) ∈ R × [0, T),
\[
0 = -\Delta c_2(x, T - s) + \frac{1}{2} a_A(x)^2 \Delta c_{xx}(x, T - s) + b_A(x) \Delta c_x(x, T - s) - \rho_A(x) \Delta c(x, T - s) + V(x, T - s),
\]
with Δc(x, 0) = 0 for all x ∈ R, and V is as in Eq. (7A.16) of the proposition. The result follows by
the maximum principle for partial differential equations.
Proof of proposition 7.1. By differentiating twice the partial differential equation (7A.17) with
respect to x, we find that $c^{(1)}(x, \tau) \equiv c_x(x, \tau)$ and $c^{(2)}(x, \tau) \equiv c_{xx}(x, \tau)$ are solutions to the following
partial differential equations: $\forall (x, s) \in \mathbb{R}_{++} \times [0, T)$,
\[
\begin{aligned}
0 = &-c^{(1)}_2(x, T - s) + \frac{1}{2} a(x)^2 c^{(1)}_{xx}(x, T - s) + \left[b(x) + \frac{1}{2}(a(x)^2)'\right] c^{(1)}_x(x, T - s) \\
&- \left[\rho(x) - b'(x)\right] c^{(1)}(x, T - s) - \rho'(x) c(x, T - s),
\end{aligned}
\]
with $c^{(1)}(x, 0) = \psi'(x)$ ∀x ∈ R, and ∀(x, s) ∈ R × [0, T),
\[
\begin{aligned}
0 = &-c^{(2)}_2(x, T - s) + \frac{1}{2} a(x)^2 c^{(2)}_{xx}(x, T - s) + \left[b(x) + (a(x)^2)'\right] c^{(2)}_x(x, T - s) \\
&- \left[\rho(x) - 2b'(x) - \frac{1}{2}(a(x)^2)''\right] c^{(2)}(x, T - s) \\
&- \left[2\rho'(x) - b''(x)\right] c^{(1)}(x, T - s) - \rho''(x) c(x, T - s),
\end{aligned}
\]
with $c^{(2)}(x, 0) = \psi''(x)$ ∀x ∈ R. By the maximum principle for partial differential equations, $c^{(1)}(x, T - s) > 0$
(resp. < 0) ∀(x, s) ∈ R × [0, T) whenever ψ′(x) > 0 (resp. < 0) and ρ′(x) < 0 (resp. > 0) ∀x ∈ R.
This completes the proof of part a) of the proposition. The proof of part b) is obtained similarly.
Derivation of Eq. (7.23). We have,
\[
dD = g\,d\tau + dW,
\]
and, by Eq. (7.22),
\[
\pi(D) = \frac{p\,\phi(D - A)}{p\,\phi(D - A) + (1 - p)\,\phi(D + A)} = \frac{p}{p + (1 - p) e^{-2AD}},
\]
where the second equality follows by the Gaussian distribution assumption $\phi(x) \propto e^{-\frac{1}{2}x^2}$, and
straightforward simplifications. By simple computations,
\[
\frac{1 - \pi(D)}{\pi(D)} = \frac{(1 - p) e^{-2AD}}{p}, \quad \pi'(D) = 2A\pi(D)^2\, \frac{(1 - p) e^{-2AD}}{p}, \quad \pi''(D) = 2A\pi'(D)\left[1 - 2\pi(D)\right]. \tag{7A.18}
\]
By construction,
g = π (D) A + [1 − π (D)] (−A) = A [2π (D) − 1] .
Therefore, by Itô’s lemma,
\[
d\pi = \pi'\,dD + \frac{1}{2}\pi''\,d\tau = \pi'\,dD + A\pi'(1 - 2\pi)\,d\tau = \pi'\left[g + A(1 - 2\pi)\right] d\tau + \pi'\,dW = \pi'\,dW.
\]
By using the relations in (7A.18) once again,
dπ = 2Aπ (1 − π) dW.
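The martingale property of π can be checked numerically. The sketch below is only an illustration: the function names and the parameter values (A, p, D, and the step h) are assumptions, not taken from the text. It compares π′ from finite differences with 2Aπ(1 − π), and verifies that the drift π′g + ½π″ is numerically zero.

```python
import math

def posterior(D, A, p):
    """pi(D) = p / (p + (1 - p) * exp(-2 A D)), as in the derivation above."""
    return p / (p + (1.0 - p) * math.exp(-2.0 * A * D))

def posterior_diffusion(D, A, p):
    """Closed-form diffusion coefficient of pi: 2 A pi (1 - pi)."""
    pi = posterior(D, A, p)
    return 2.0 * A * pi * (1.0 - pi)

def posterior_drift(D, A, p, h=1e-4):
    """Drift pi' g + 0.5 pi'' with g = A (2 pi - 1), using central finite differences."""
    up, mid, down = posterior(D + h, A, p), posterior(D, A, p), posterior(D - h, A, p)
    first = (up - down) / (2.0 * h)
    second = (up - 2.0 * mid + down) / h ** 2
    g = A * (2.0 * mid - 1.0)
    return first * g + 0.5 * second

# Hypothetical parameter values, chosen only for illustration.
A, p, D = 0.5, 0.3, 0.2
h = 1e-4
slope_fd = (posterior(D + h, A, p) - posterior(D - h, A, p)) / (2.0 * h)
print(slope_fd, posterior_diffusion(D, A, p), posterior_drift(D, A, p))
```

The first two printed numbers should agree closely and the third should be close to zero, up to finite-difference error.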
7.12 Appendix 6: Convexity of bond prices revisited

Consider a short-term rate process r(τ ) (say), and let u(r0 , T ) be the price of a bond expiring at time
T when the current short-term rate is r0:
\[
u(r_0, T) = E\left[\left.\exp\left(-\int_0^T r(\tau)\,d\tau\right) \,\right|\, r_0\right].
\]
As pointed out in Section 7.6, Proposition 7.1-(ii) implies that in scalar diffusion models of the short-term
rate, such as those we shall deal with in Chapter 11, we have that $u_{11}(r_0, T) > 0$ whenever b′′ < 2,
where b is the risk-neutralized drift of r. This result, obtained by Mele (2003), can be proved through the
Feynman-Kac representation of the second partial derivative $u_{11}$, and a similar proof can be used to show Proposition
7.1-(ii). This appendix provides a more intuitive derivation under a set of simplifying assumptions. By
Eq. (6) p. 685 in Mele (2003),
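\[
u_{11}(r_0, T) = E\left\{\left[\left(\int_0^T \frac{\partial r}{\partial r_0}(\tau)\,d\tau\right)^2 - \int_0^T \frac{\partial^2 r}{\partial r_0^2}(\tau)\,d\tau\right] \exp\left(-\int_0^T r(\tau)\,d\tau\right)\right\}.
\]
Hence $u_{11}(r_0, T) > 0$ whenever
\[
\int_0^T \frac{\partial^2 r}{\partial r_0^2}(\tau)\,d\tau < \left(\int_0^T \frac{\partial r}{\partial r_0}(\tau)\,d\tau\right)^2. \tag{7A.19}
\]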
To keep the presentation as simple as possible, we assume that r is solution to:
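\[
dr(\tau) = b(r(\tau))\,d\tau + a_0 r(\tau)\,dW(\tau),
\]
where $a_0$ is a constant. We have,
\[
\frac{\partial r}{\partial r_0}(\tau) = \exp\left(\int_0^\tau b'(r(u))\,du - \frac{1}{2} a_0^2 \tau + a_0 W(\tau)\right),
\]
and
\[
\frac{\partial^2 r}{\partial r_0^2}(\tau) = \frac{\partial r(\tau)}{\partial r_0} \left[\int_0^\tau b''(r(u))\, \frac{\partial r(u)}{\partial r_0}\,du\right].
\]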
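Therefore, if b′′ < 0, then $\partial^2 r(\tau)/\partial r_0^2 < 0$, and by inequality (7A.19), $u_{11} > 0$. But this result can
be improved considerably. Precisely, suppose that b′′ < 2 (instead of simply assuming that b′′ < 0).
By the previous equality, and because $\partial r/\partial r_0 > 0$,
\[
\frac{\partial^2 r}{\partial r_0^2}(\tau) < 2\, \frac{\partial r(\tau)}{\partial r_0} \int_0^\tau \frac{\partial r(u)}{\partial r_0}\,du,
\]
and consequently,
\[
\int_0^T \frac{\partial^2 r}{\partial r_0^2}(\tau)\,d\tau < 2 \int_0^T \frac{\partial r(\tau)}{\partial r_0} \left(\int_0^\tau \frac{\partial r(u)}{\partial r_0}\,du\right) d\tau = \left(\int_0^T \frac{\partial r(u)}{\partial r_0}\,du\right)^2,
\]
which is inequality (7A.19).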
7.13 Appendix 7: External habit formation in continuous time

In their original article, Campbell and Cochrane considered a discrete-time model in which consump-
tion is a Gaussian process. The diffusion limit of their model is simply Eq. (8.22) given in the main
text. By example A1, Eq. (7A.11),

\[
\lambda(D, x) = \frac{\eta}{s}\left[\sigma_0 - \frac{1}{D}\gamma(D, x)\right]. \tag{7A.20}
\]
To find the diffusion function γ of x, notice that x = D(1 − s), where s is solution to Eq. (7.16). By
Itô’s lemma, then, γ = [1 − s − s l(s)] Dσ0. Finally, we substitute this function into (7A.20), and obtain
λ(s) = ησ0 [1 + l(s)], as we claimed in the main text. (This result holds approximately in the original
discrete time framework.) Finally, the real interest rate is found by an application of formula (7A.10),
\[
R(s) = \delta + \eta\left(g_0 - \frac{1}{2}\sigma_0^2\right) + \eta(1 - \phi)(\bar{s} - \ln s) - \frac{1}{2}\eta^2 \sigma_0^2 \left[1 + l(s)\right]^2.
\]
Campbell and Cochrane choose l so as to make the real interest rate constant. They took
$l(s) = \bar{S}^{-1}\sqrt{1 + 2(\bar{s} - \ln s)} - 1$, where $\bar{S} = \sigma_0\sqrt{\eta/(1 - \phi)} = \exp(\bar{s})$, which leaves
$R = \delta + \eta\left(g_0 - \frac{1}{2}\sigma_0^2\right) - \frac{1}{2}\eta(1 - \phi)$.
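That this choice of l makes R constant can be confirmed numerically. The following is a minimal sketch: the function names and all parameter values are illustrative assumptions, not a calibration from the text. It evaluates R(s) at several surplus ratios s and compares it with the constant δ + η(g0 − ½σ0²) − ½η(1 − φ).

```python
import math

def steady_state_surplus(eta, sigma0, phi):
    """S_bar = sigma0 * sqrt(eta / (1 - phi)), so that s_bar = ln(S_bar)."""
    return sigma0 * math.sqrt(eta / (1.0 - phi))

def sensitivity(s, eta, sigma0, phi):
    """l(s) = sqrt(1 + 2 (s_bar - ln s)) / S_bar - 1; requires 1 + 2 (s_bar - ln s) >= 0."""
    S_bar = steady_state_surplus(eta, sigma0, phi)
    return math.sqrt(1.0 + 2.0 * (math.log(S_bar) - math.log(s))) / S_bar - 1.0

def real_rate(s, delta, eta, g0, sigma0, phi):
    """R(s) from the formula above, with l(s) given by `sensitivity`."""
    s_bar = math.log(steady_state_surplus(eta, sigma0, phi))
    l = sensitivity(s, eta, sigma0, phi)
    return (delta + eta * (g0 - 0.5 * sigma0 ** 2)
            + eta * (1.0 - phi) * (s_bar - math.log(s))
            - 0.5 * eta ** 2 * sigma0 ** 2 * (1.0 + l) ** 2)

# Hypothetical parameter values, for illustration only.
delta, eta, g0, sigma0, phi = 0.10, 2.0, 0.019, 0.015, 0.87
constant_rate = delta + eta * (g0 - 0.5 * sigma0 ** 2) - 0.5 * eta * (1.0 - phi)
rates = [real_rate(s, delta, eta, g0, sigma0, phi) for s in (0.02, 0.04, 0.0588)]
print(rates, constant_rate)
```

The two s-dependent terms of R(s) cancel exactly, which is why every entry of `rates` coincides with `constant_rate`.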
7.14 Appendix 8: Simulation of discrete-time pricing models

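The pricing equation is
\[
S = E\left[m \cdot (S' + D')\right], \quad m = \beta\, \frac{u_c(D', x')}{u_c(D, x)} = \beta \left(\frac{s'}{s}\right)^{-\eta} \left(\frac{D'}{D}\right)^{-\eta}.
\]
Hence, the price-dividend ratio p ≡ S/D satisfies:
\[
p = E\left[m\, \frac{D'}{D}\, (1 + p')\right], \quad \frac{D'}{D} = e^{g_0 + w}.
\]
This is a functional equation having the form,
\[
p(s) = E\left[\left. g(s', s) \left(1 + p(s')\right) \,\right|\, s\right], \quad g(s', s) = \beta \left(\frac{s'}{s}\right)^{-\eta} \left(\frac{D'}{D}\right)^{1 - \eta}.
\]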
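A numerical solution can be implemented as follows. Create a grid and define $p_j = p(s_j)$, j = 1, ···, N,
for some N. We have,
\[
\begin{bmatrix} p_1 \\ \vdots \\ p_N \end{bmatrix} =
\begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix} +
\begin{bmatrix} a_{11} & \cdots & a_{N1} \\ \vdots & \ddots & \vdots \\ a_{1N} & \cdots & a_{NN} \end{bmatrix}
\begin{bmatrix} p_1 \\ \vdots \\ p_N \end{bmatrix},
\]
\[
b_i = \sum_{j=1}^{N} a_{ji}, \quad a_{ji} = g_{ji} \cdot p_{ji}, \quad g_{ji} = g(s_j, s_i), \quad p_{ji} = \Pr(s_j \,|\, s_i) \cdot \Delta s,
\]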
where ∆s is the integration step; s1 = smin , sN = smax ; smin and smax are the boundaries in the
approximation; and Pr ( sj | si ) is the transition density from state i to state j - in this case, a Gaussian
transition density. Let p = [p1 · · · pN ]⊤ , b = [b1 · · · bN ]⊤ , and let A be a matrix with elements aji .
The solution is,
\[
p = (I - A)^{-1} b. \tag{7A.21}
\]
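The linear system above can be sketched in a few lines of standard-library Python. The code below is an illustration under stated assumptions: `solve_pd_ratio`, `solve_linear`, and the callables `g` and `density` are hypothetical names, and the user must supply the kernel g(s', s) and the transition density Pr(s' | s) for the model at hand. Row i of the weight matrix collects the terms $a_{ji}$, j = 1, ···, N, so the system solved is exactly $p_i = b_i + \sum_j a_{ji} p_j$.

```python
def solve_linear(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def solve_pd_ratio(grid, g, density):
    """Price-dividend ratios on an equally spaced grid.

    g(s_next, s) and density(s_next, s) are user-supplied (assumed) callables.
    """
    n = len(grid)
    ds = grid[1] - grid[0]  # integration step
    # weights[i][j] = g(s_j, s_i) * Pr(s_j | s_i) * ds, i.e. the text's a_{ji}
    weights = [[g(sj, si) * density(sj, si) * ds for sj in grid] for si in grid]
    b = [sum(row) for row in weights]
    lhs = [[(1.0 if i == j else 0.0) - weights[i][j] for j in range(n)]
           for i in range(n)]
    return solve_linear(lhs, b)

# Toy check: constant kernel 0.9 and a uniform density on the grid,
# for which the fixed point is p = 0.9 / (1 - 0.9) = 9 in every state.
grid = [0.1 * k for k in range(1, 6)]
p = solve_pd_ratio(grid, lambda s_next, s: 0.9, lambda s_next, s: 2.0)
print(p)
```

In an actual application the Gaussian transition density mentioned in the text would replace the uniform toy density, and the grid would need to be wide enough for the truncated density to integrate to approximately one.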
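The model can be simulated in the following manner. Let $\underline{s}$ and $\bar{s}$ be the boundaries of the underlying
state process. Fix $\Delta s = (\bar{s} - \underline{s})/N$. Draw states. Suppose state $s^*$ is drawn. Then,
1. If $\min(s^* - \underline{s}, \bar{s} - s^*) = s^* - \underline{s}$, let k be the smallest integer close to $(s^* - \underline{s})/\Delta s$. Let $s_{\min} = s^* - k\Delta s$,
and $s_{\max} = s_{\min} + N \cdot \Delta s$.
2. If $\min(s^* - \underline{s}, \bar{s} - s^*) = \bar{s} - s^*$, let k be the biggest integer close to $(s^* - \underline{s})/\Delta s$. Let $s_{\max} = s^* + k\Delta s$,
and $s_{\min} = s_{\max} - N \cdot \Delta s$.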
The previous algorithm avoids interpolations. Importantly, it ensures that during the simulations,
p is computed in correspondence of exactly the state s∗ that is drawn. Precisely, once s∗ is drawn,
1 ) create the corresponding grid s1 = smin , s2 = smin + ∆s, · · ·, sN = smax according to the previous
rules; 2 ) compute the solution from Eq. (7A.21). In this way, one has p (s∗ ) at hand - the simulated
P/D ratio when state s∗ is drawn.
References
Abel, A. B. (1988): “Stock Prices under Time-Varying Dividend Risk: An Exact Solution
in an Infinite-Horizon General Equilibrium Model.” Journal of Monetary Economics 22,
375-393.
Andersen, T. G., T. Bollerslev and F. X. Diebold (2002): “Parametric and Nonparametric

Volatility Measurement.” Forthcoming in Aı̈t-Sahalia, Y. and L. P. Hansen (Eds.): Hand-
book of Financial Econometrics.
Bakshi, G. and D. Madan (2000): “Spanning and Derivative Security Evaluation.” Journal of
Financial Economics 55, 205-238.
Bajeux-Besnainou, I. and J.-C. Rochet (1996): “Dynamic Spanning: Are Options an Appro-
priate Instrument?” Mathematical Finance 6, 1-16.
Bansal, R. and A. Yaron (2004): “Risks for the Long Run: A Potential Resolution of Asset
Pricing Puzzles.” Journal of Finance 59, 1481-1509.
Barberis, N., M. Huang and T. Santos (2001): “Prospect Theory and Asset Prices.” Quarterly
Journal of Economics 116, 1-53.
Barsky, R. B. (1989): “Why Don’t the Prices of Stocks and Bonds Move Together?” American
Economic Review 79, 1132-1145.
Barsky, R. B. and J. B. De Long (1990): “Bull and Bear Markets in the Twentieth Century.”
Journal of Economic History 50, 265-281.
Barsky, R. B. and J. B. De Long (1993): “Why Does the Stock Market Fluctuate?” Quarterly
Journal of Economics 108, 291-311.
Bloomfield, P. and Steiger, W. L. (1983): Least Absolute Deviations. Boston: Birkhäuser.
Brennan, M. J. and Y. Xia (2001): “Stock Price Volatility and Equity Premium.” Journal of
Monetary Economics 47, 249-283.
Britten-Jones, M. and A. Neuberger (2000): “Option Prices, Implied Price Processes and
Stochastic Volatility.” Journal of Finance 55, 839-866.
Brunnermeier, M. K. and S. Nagel (2007): “Do Wealth Fluctuations Generate Time-Varying

Risk Aversion? Micro-Evidence on Individuals’ Asset Allocation.” Forthcoming in Amer-
ican Economic Review.
Campbell, J. Y. (2003): “Consumption-Based Asset Pricing.” In: Constantinides, G.M., M.

Harris and R. M. Stulz (Editors): Handbook of the Economics of Finance (Volume 1B:
Chapter 13), 803-887.
Campbell, J. Y., and J. H. Cochrane (1999): “By Force of Habit: A Consumption-Based

Explanation of Aggregate Stock Market Behavior.” Journal of Political Economy 107,
205-251.
Carr, P. and D. Madan (2001): “Optimal Positioning in Derivative Securities.” Quantitative

Finance 1, 19-37.
Clark, T.E. and K.D. West (2007): “Approximately Normal Tests for Equal Predictive Accu-
racy in Nested Models.” Journal of Econometrics 138, 291-311.
Corradi, V., W. Distaso and A. Mele (2010): “Macroeconomic Determinants of Stock Market
Volatility and Volatility Risk-Premia.” Working paper University of Warwick, Imperial
College, and London School of Economics.
David, A. (1997): “Fluctuating Confidence in Stock Markets: Implications for Returns and
Volatility.” Journal of Financial and Quantitative Analysis 32, 427-462.
Demeterfi, K., E. Derman, M. Kamal and J. Zou (1999): “A Guide to Volatility and Variance
Swaps.” Journal of Derivatives 6, 9-32.
Detemple, J. B. (1986): “Asset Pricing in a Production Economy with Incomplete Informa-

tion.” Journal of Finance 41, 383-391.
El Karoui, N., M. Jeanblanc-Picqué and S. E. Shreve (1998): “Robustness of the Black and
Scholes Formula.” Mathematical Finance 8, 93-126.
Fama, E. F. and K. R. French (1989): “Business Conditions and Expected Returns on Stocks
and Bonds.” Journal of Financial Economics 25, 23-49.
Ferson, W. E. and C. R. Harvey (1991): “The Variation of Economic Risk Premiums.” Journal
Fornari, F. and A. Mele (2010): “Financial Volatility and Real Economic Activity.” Working
paper European Central Bank and London School of Economics.
Gennotte, G. (1986): “Optimal Portfolio Choice Under Incomplete Information.” Journal of

Finance 41, 733-746.
Huang, C.-F. and Pagès, H. (1992): “Optimal Consumption and Portfolio Policies with an
Infinite Horizon: Existence and Convergence.” Annals of Applied Probability 2, 36-64.
Hajek, B. (1985): “Mean Stochastic Comparison of Diffusions.” Zeitschrift fur Wahrschein-

lichkeitstheorie und Verwandte Gebiete 68, 315-329.
Gabaix, X. (2009): “Linearity-Generating Processes: A Modelling Tool Yielding Closed Forms

for Asset Prices.” WP NYU.
Giacomini, R. and H. White (2006): “Tests of Conditional Predictive Ability.” Econometrica

74, 1545-1578.
Jagannathan, R. (1984): “Call Options and the Risk of Underlying Securities.” Journal of
Kijima, M. (2002): “Monotonicity and Convexity of Option Prices Revisited.” Mathematical

Finance 12, 411-426.
Liptser, R. S. and A. N. Shiryaev (2001): Statistics of Random Processes. Berlin, Springer-

Verlag. [2001a: Vol. I (General Theory). 2001b: Vol. II (Applications).]
Malkiel, B. (1979): “The Capital Formation Problem in the United States.” Journal of Finance
34, 291-306.
Mehra, R. and E. C. Prescott (2003): “The Equity Premium in Retrospect.” In Constantinides,

G.M., M. Harris and R. M. Stulz (Editors): Handbook of the Economics of Finance (Vol-
ume 1B, chapter 14), 889-938.
Mele, A. (2005): “Rational Stock Market Fluctuations.” WP FMG-LSE.
Mele, A. (2007): “Asymmetric Stock Market Volatility and the Cyclical Behavior of Expected
Returns.” Journal of Financial Economics 86, 446-478.
Menzly, L., T. Santos and P. Veronesi (2004): “Understanding Predictability.” Journal of

Political Economy 111, 1, 1-47.
Pindyck, R. (1984): “Risk, Inflation and the Stock Market.” American Economic Review 74,
335-351.
Paye, B.P. (2010): “Do Macroeconomic Variables Forecast Aggregate Stock Market Volatil-
ity?” Working Paper, Rice University.
Poterba, J. and L. Summers (1985): “The Persistence of Volatility and Stock Market Fluctu-
ations.” American Economic Review 75, 1142-1151.
Rothschild, M. and J. E. Stiglitz (1970): “Increasing Risk: I. A Definition.” Journal of Eco-

Schwert, G. W. (1989a): “Why Does Stock Market Volatility Change Over Time?” Journal of
Finance 44, 1115-1153.
Schwert, G.W. (1989b): “Business Cycles, Financial Crises and Stock Volatility.” Carnegie-
Rochester Conference Series on Public Policy 31, 83-125.
Timmermann, A. (1993): “How Learning in Financial Markets Generates Excess Volatility

and Predictability in Stock Prices.” Quarterly Journal of Economics 108, 1135-1145.
Timmermann, A. (1996): “Excess Volatility and Return Predictability of Stock Returns in

Autoregressive Dividend Models with Learning.” Review of Economic Studies 63, 523-
577.
Veronesi, P. (1999): “Stock Market Overreaction to Bad News in Good Times: A Rational
Expectations Equilibrium Model.” Review of Financial Studies 12, 975-1007.
Veronesi, P. (2000): “How Does Information Quality Affect Stock Returns?” Journal of Finance
55, 807-837.
Wang, S. (1993): “The Integrability Problem of Asset Prices.” Journal of Economic Theory
59, 199-213.
8 Tackling the puzzles
8.1 Non-expected utility

The standard intertemporal additive separable utility function confounds intertemporal substi-
tution effects and attitudes towards risk. This fact is problematic. Epstein and Zin (1989, 1991)
and Weil (1989) consider a class of recursive, but not necessarily expected utility, preferences.
In this section, we present some details of this approach, without insisting on the theoretic un-
derpinnings, which the reader will find in Epstein and Zin (1989). We provide a basic definition
and derivation of this class of preferences, and a heuristic analysis of the resulting asset pricing
properties.
8.1.1 The recursive formulation

Let utility as of time t be vt . We take vt to be:
vt = W (ct , v̂t+1 ) ,
where W is the aggregator and v̂t+1 is the certainty-equivalent utility at t + 1 defined as,
h (v̂t+1 ) = E [h (vt+1 )] ,
and h is a von Neumann-Morgenstern utility function. That is, the certainty equivalent depends
on some agent’s risk-attitudes encoded into h. Therefore,

\[
v_t = W\left(c_t, h^{-1}\left[E\left(h(v_{t+1})\right)\right]\right).
\]
The analytical example used in the asset pricing literature is,

\[
W(c, \hat{v}) = \left(c^\rho + e^{-\delta}\hat{v}^\rho\right)^{1/\rho} \quad \text{and} \quad h(\hat{v}) = \hat{v}^{1 - \eta}, \tag{8.1}
\]
for three positive constants ρ, η and δ. In this formulation, risk-attitudes for static wealth
gambles still have the classical CRRA flavor. More precisely, we say that η is the RRA for
static wealth gambles and $\psi \equiv (1 - \rho)^{-1}$ is the intertemporal elasticity of substitution (IES,
henceforth). We have,
\[
\hat{v}_{t+1} = h^{-1}\left[E\left(h(v_{t+1})\right)\right] = h^{-1}\left[E\left(v_{t+1}^{1 - \eta}\right)\right] = \left[E\left(v_{t+1}^{1 - \eta}\right)\right]^{\frac{1}{1 - \eta}}.
\]
Naturally, in the absence of uncertainty, $v_t^\rho = c_t^\rho + e^{-\delta} v_{t+1}^\rho$, which clearly reveals ψ is the IES.
The parametrization for the aggregator in Eq. (8.1) implies that:
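\[
v_t = \left[c_t^\rho + e^{-\delta}\left(E\left(v_{t+1}^{1 - \eta}\right)\right)^{\frac{\rho}{1 - \eta}}\right]^{1/\rho}. \tag{8.2}
\]
This collapses to the standard intertemporal additively separable case when ρ = 1 − η ⇔
RRA = IES$^{-1}$. Indeed, it is straightforward to show that in this case,
\[
v_t = \left[E\left(\sum_{n=0}^{\infty} e^{-\delta n} c_{t+n}^{1 - \eta}\right)\right]^{\frac{1}{1 - \eta}}.
\]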
Let us go back to Eq. (8.2). The function V = v 1−η / (1 − η) is obviously ordinally equivalent
to v, and satisfies,
\[
V_t = \frac{1}{1 - \eta}\left[c_t^\rho + e^{-\delta}\left((1 - \eta) E(V_{t+1})\right)^{\frac{\rho}{1 - \eta}}\right]^{\frac{1 - \eta}{\rho}}. \tag{8.3}
\]
The previous formulation makes even more transparent that these utils collapse to standard
intertemporal additive utils as soon as RRA = IES−1 .
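The collapse to additive utility can be illustrated with a single step of the recursion in Eq. (8.2). The sketch below is only indicative: the function name `ez_step`, the two-state distribution of next-period utility, and the parameter values are assumptions made for the example. With ρ = 1 − η, the quantity $v_t^{1-\eta}$ should equal $c_t^{1-\eta} + e^{-\delta} E(v_{t+1}^{1-\eta})$.

```python
import math

def ez_step(c_now, v_next, probs, rho, eta, delta):
    """One step of Eq. (8.2): v_t = [c^rho + e^{-delta} (E v_{t+1}^{1-eta})^{rho/(1-eta)}]^{1/rho}."""
    expected_power = sum(p * v ** (1.0 - eta) for p, v in zip(probs, v_next))
    inner = c_now ** rho + math.exp(-delta) * expected_power ** (rho / (1.0 - eta))
    return inner ** (1.0 / rho)

# Hypothetical two-state example with RRA = 1/IES, i.e. rho = 1 - eta.
eta, delta = 0.5, 0.05
rho = 1.0 - eta
c_now, v_next, probs = 1.0, [0.8, 1.3], [0.4, 0.6]

v_now = ez_step(c_now, v_next, probs, rho, eta, delta)
additive = c_now ** (1.0 - eta) + math.exp(-delta) * sum(
    p * v ** (1.0 - eta) for p, v in zip(probs, v_next))
print(v_now ** (1.0 - eta), additive)
```

Choosing ρ different from 1 − η in the same example breaks the equality, which is the sense in which the recursion separates risk aversion from intertemporal substitution.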
8.1.2 Testable restrictions

Let us define cum-dividend wealth as $x_t \equiv \sum_{i=1}^{m} (P_{it} + D_{it})\, \theta_{it}$. In the Appendix, we show that
$x_t$ evolves as follows:
\[
x_{t+1} = (x_t - c_t)\, \omega_t^\top (1_m + r_{t+1}) \equiv (x_t - c_t)(1 + r_{M,t+1}), \tag{8.4}
\]
where ω is the vector of proportions of wealth invested in the m assets, $r_{t+1}$ is the vector of asset
returns, with any component i being equal to $r_{it+1} \equiv (P_{it+1} + D_{it+1} - P_{it})/P_{it}$, and $r_{M,t}$ is the return on
the market portfolio, defined as,
\[
r_{M,t+1} = \sum_{i=1}^{m} r_{it+1}\, \omega_{it}, \quad \omega_{it} \equiv \frac{P_{it}\theta_{it+1}}{\sum_i P_{it}\theta_{it+1}},
\]
where $P_{it}$ and $D_{it}$ are the price and the dividend of asset i at time t.
Let us consider a Markov economy in which the underlying state is some process y. We
consider stationary consumption and investment plans. Accordingly, let the stationary util be
a function V (x, y) when current wealth is x and the state is y. By Eq. (8.3),
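\[
V(x, y) = \max_{c, \omega} W\left(c, E(V(x', y'))\right) \equiv \max_{c, \omega} \frac{1}{1 - \eta}\left[c^\rho + e^{-\delta}\left((1 - \eta) E(V(x', y'))\right)^{\frac{\rho}{1 - \eta}}\right]^{\frac{1 - \eta}{\rho}}. \tag{8.5}
\]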
In the Appendix, we show that the first order conditions for the representative agent lead to
the following Euler equation,
\[
E\left[m(x, y; x', y')\,(1 + r_i(y'))\right] = 1, \quad i = 1, \cdots, m, \tag{8.6}
\]
where the stochastic discount factor m is,
\[
m(x, y; x', y') = e^{-\delta\theta}\left(\frac{c(x', y')}{c(x, y)}\right)^{-\frac{\theta}{\psi}} \left(1 + r_M(y')\right)^{\theta - 1}, \quad \theta \equiv \frac{1 - \eta}{1 - \frac{1}{\psi}}.
\]
This pricing kernel has the interesting property of being affected by the market portfolio
return, $r_M$, at least as soon as $\eta \neq \frac{1}{\psi}$. Potentially, then, the pricing kernel may inherit the excess
volatility of market returns quite naturally, although this statement needs to
be further qualified, as discussed below.
8.1.3 Equilibrium risk premiums and interest rates

So the Euler equation is,
\[
E\left[e^{-\delta\theta}\left(\frac{c'}{c}\right)^{-\frac{\theta}{\psi}} \left(1 + r'_M\right)^{\theta - 1} \left(1 + r'_i\right)\right] = 1. \tag{8.7}
\]
Eq. (8.7) obviously holds for the market portfolio and the risk-free asset. Therefore, taking
logs in Eq. (8.7) for i = M, and for the risk-free asset, i = 0 say, yields the following conditions:
\[
0 = \ln E\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\ln\frac{c'}{c} + \theta R_M\right)\right], \quad R_M = \ln(1 + r'_M), \tag{8.8}
\]
and,
\[
-R_f = -\ln(1 + r_0) = \ln E\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\ln\frac{c'}{c} + (\theta - 1) R_M\right)\right]. \tag{8.9}
\]
Next, suppose that consumption growth, $\ln(c'/c)$, and the market portfolio return, $R_M$, are
jointly normally distributed. In the appendix, we show that the expected excess return on the
market portfolio is given by,
\[
E(R_M) - R_f + \frac{1}{2}\sigma^2_{R_M} = \frac{\theta}{\psi}\sigma_{R_M, c} + (1 - \theta)\sigma^2_{R_M}, \tag{8.10}
\]
where $\sigma^2_{R_M} = var(R_M)$ and $\sigma_{R_M, c} = cov(R_M, \ln(c'/c))$, and the term $\frac{1}{2}\sigma^2_{R_M}$ on the left hand
side is a Jensen’s inequality term. Note, Eq. (8.10) is a mixture of the Consumption CAPM
(for the part $\frac{\theta}{\psi}\sigma_{R_M, c}$) and the CAPM (for the part $(1 - \theta)\sigma^2_{R_M}$).
The risk-free rate is given by,
\[
R_f = \delta + \frac{1}{\psi} E\left[\ln\frac{c'}{c}\right] - \frac{1}{2}(1 - \theta)\sigma^2_{R_M} - \frac{1}{2}\frac{\theta}{\psi^2}\sigma^2_c, \tag{8.11}
\]
where $\sigma^2_c = var(\ln(c'/c))$.

Eqs. (8.10) and (8.11) can be elaborated further. In equilibrium, the asset price and, hence,
the return, is certainly related to consumption volatility. Precisely, let us assume that,
\[
\sigma^2_{R_M} = \sigma^2_c + \sigma^2_*, \quad \sigma_{R_M, c} = \sigma^2_c, \tag{8.12}
\]
where σ 2∗ is a positive constant that may arise when the asset return is driven by some additional
state variable. (This is the case, for example, in the Bansal and Yaron (2004) model described
below.) Under the assumption that the asset return volatility is as in Eq. (8.12), the equity
premium in Eq. (8.10) is:
\[
E(R_M) - R_f + \frac{1}{2}\sigma^2_{R_M} = \eta\sigma^2_c + (1 - \theta)\sigma^2_* = \eta\sigma^2_c + \frac{\eta - \frac{1}{\psi}}{1 - \frac{1}{\psi}}\sigma^2_*.
\]
Disentangling risk-aversion from intertemporal substitution is not enough for the equity pre-
mium puzzle to be resolved. To raise the equity premium, we need that σ 2∗ > 0, meaning that
additional state variables are needed, to drive variation of asset returns. At the same time, the
volatility of these state variables has the power to affect asset returns only when risk-aversion
is distinct from the inverse of the IES. As an example, suppose that σ 2∗ does not depend on η
and ψ and that ψ > 1. Then, the equity premium increases with σ 2∗ whenever η > ψ −1 . In other
words, these state variables have the power to affect the equity premium insofar as they enter
the pricing kernel, which happens with the Epstein-Zin-Weil preferences.
Next, let us derive the risk-free rate. Assume that $E[\ln(c'/c)] = g_0 - \frac{1}{2}\sigma^2_c$, where $g_0$ is the
expected consumption growth, a constant. Furthermore, use the assumptions in Eq. (8.12) to
obtain that the risk-free rate in Eq. (8.11) is,
\[
R_f = \delta + \frac{1}{\psi} g_0 - \frac{1}{2}\eta\left(1 + \frac{1}{\psi}\right)\sigma^2_c - \frac{1}{2}\frac{\eta - \frac{1}{\psi}}{1 - \frac{1}{\psi}}\sigma^2_*.
\]
As we can see, we may increase the level of relative risk-aversion, η, without substantially
affecting the level of the risk-free rate, Rf . This is because the effects of η on Rf are of a
second-order importance (they multiply variances, which are orders of magnitude less than the
expected consumption growth, g0 ).
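The algebra linking Eqs. (8.10)-(8.12) to the two closed forms above can be checked with a short script. This is a sketch under assumptions: the function names and the parameter values are hypothetical and serve only to compare the general expressions with their simplified counterparts.

```python
def theta_coef(eta, psi):
    """theta = (1 - eta) / (1 - 1/psi)."""
    return (1.0 - eta) / (1.0 - 1.0 / psi)

def equity_premium(eta, psi, var_c, var_star):
    """Right hand side of Eq. (8.10) under the assumptions in Eq. (8.12)."""
    th = theta_coef(eta, psi)
    var_rm, cov_rm_c = var_c + var_star, var_c
    return th / psi * cov_rm_c + (1.0 - th) * var_rm

def risk_free_rate(delta, g0, eta, psi, var_c, var_star):
    """Eq. (8.11) with E[ln(c'/c)] = g0 - var_c / 2 and Eq. (8.12)."""
    th = theta_coef(eta, psi)
    var_rm = var_c + var_star
    return (delta + (g0 - 0.5 * var_c) / psi
            - 0.5 * (1.0 - th) * var_rm - 0.5 * th / psi ** 2 * var_c)

# Hypothetical parameter values, for illustration only.
delta, g0, eta, psi = 0.02, 0.02, 10.0, 1.5
var_c, var_star = 0.02 ** 2, 0.15 ** 2

slope = (eta - 1.0 / psi) / (1.0 - 1.0 / psi)
premium_closed = eta * var_c + slope * var_star
rate_closed = (delta + g0 / psi - 0.5 * eta * (1.0 + 1.0 / psi) * var_c
               - 0.5 * slope * var_star)
print(equity_premium(eta, psi, var_c, var_star), premium_closed)
print(risk_free_rate(delta, g0, eta, psi, var_c, var_star), rate_closed)
```

Setting `var_star` to zero in this example reduces the premium to η times the consumption variance, in line with the remark that separating risk aversion from the IES does not by itself raise the equity premium.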
8.1.4 Campbell-Shiller approximation

Consider the definition of the return on the market portfolio,
\[
R_{M,t+1} = \ln\frac{P_{t+1} + C_{t+1}}{P_t} = \ln\frac{e^{z_{t+1}} + 1}{e^{z_t}} + g_{t+1} \equiv f(z_{t+1}, z_t) + g_{t+1},
\]
where $P_t$ is the value of the market portfolio, $g_{t+1} = \ln\frac{C_{t+1}}{C_t}$ is the aggregate dividend growth,
and $z_t = \ln\frac{P_t}{C_t}$ is the log of the aggregate price-dividend ratio. A first-order linear approximation
of $f(z_{t+1}, z_t)$ around the “average” level of z yields,
\[
R_{M,t+1} \approx \kappa_0 + \kappa_1 z_{t+1} - z_t + g_{t+1}, \tag{8.13}
\]
where $\kappa_0 = \ln\frac{e^{\bar{z}} + 1}{e^{\bar{z}}} + \frac{\bar{z}}{e^{\bar{z}} + 1}$, $\kappa_1 = \frac{e^{\bar{z}}}{e^{\bar{z}} + 1}$, and $\bar{z}$ is the average level of the log price-dividend ratio.
Typically, κ1 ≈ 0.997 for US data.
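A quick way to gauge the quality of the approximation is to compare f with its linearization. The sketch below uses hypothetical function names and an arbitrary expansion point; it is not calibrated to US data.

```python
import math

def cs_coefficients(z_bar):
    """kappa_0 and kappa_1 of the linearization around z_bar."""
    kappa1 = math.exp(z_bar) / (math.exp(z_bar) + 1.0)
    kappa0 = (math.log((math.exp(z_bar) + 1.0) / math.exp(z_bar))
              + z_bar / (math.exp(z_bar) + 1.0))
    return kappa0, kappa1

def f_exact(z_next, z_now):
    """f(z', z) = ln((e^{z'} + 1) / e^{z})."""
    return math.log(math.exp(z_next) + 1.0) - z_now

def f_linear(z_next, z_now, z_bar):
    """First-order approximation kappa_0 + kappa_1 z' - z."""
    kappa0, kappa1 = cs_coefficients(z_bar)
    return kappa0 + kappa1 * z_next - z_now

# Hypothetical expansion point: a price-dividend ratio of 25.
z_bar = math.log(25.0)
print(cs_coefficients(z_bar))
print(f_exact(z_bar + 0.1, z_bar), f_linear(z_bar + 0.1, z_bar, z_bar))
```

The two expressions coincide at the expansion point, and the gap grows with the squared distance of $z_{t+1}$ from $\bar{z}$.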
Campbell and Shiller (1988) derive Eq. (8.13) and show how useful it is to address a number
of questions. We now use Eq. (8.13) to illustrate how non-expected utility may be used to
address the equity premium puzzle.
8.1.5 Risks for the long-run

Bansal and Yaron (2004) argue that persistence in the expected consumption growth may
explain the equity premium puzzle. To understand the potential for this, let us assume that
268
8.2. Limited stock market participation c
by A. Mele
consumption growth is solution to,
Ct+1 1
gt+1 = ln = g0 − σ 2c + xt + ǫt+1 , ǫt+1 ∼ N 0, σ 2c , (8.14)
Ct 2
where xt is a “small” persistent component of consumption growth, solution to,

xt+1 = ρxt + ηt+1 , η t+1 ∼ N 0, σ 2x . (8.15)
To find an approximate solution to the log of the price-dividend ratio, replace the Campbell-
Shiller approximation in Eq. (8.13) into the Euler equation (8.8) for the market portfolio,

θ Ct+1
0 = ln E exp −δθ − ln + θ (κ0 + κ1 zt+1 − zt + gt+1 ) . (8.16)
ψ Ct
Conjecture that the log of the price-dividend ratio takes the simple form, zt = a0 + a1 xt , where
a0 and a1 are two coefficients to be determined. Substituting this guess into Eq. (8.16), and
identifying terms, leaves:
1 − ψ1
zt = a0 + xt , (8.17)
1 − κ1 ρ
for some constant a0 in the Appendix. Next, use RM,t+1 ≈ κ0 +κ1 zt+1 −zt +gt+1 (or alternatively,
the stochastic discount factor) to compute σ 2∗ , volatility, risk-premium, etc. [In progress]
8.2 Limited stock market participation

Basak and Cuoco (1998) consider a model with two agents. One of these agents does not
invest in the stock market, and has logarithmic instantaneous utility, un (c) = ln c. From his
perspective, markets are incomplete. The second agent, instead, invests in the stock market,
and has instantaneous utility equal to up (c) = (c1−η − 1)/ (1 − η). Both agents are infinitely
lived.
Clearly, in this economy, the competitive equilibrium is Pareto inefficient. Yet, Basak and
Cuoco show how aggregation obtains in this economy. Let ĉi (τ ) be the general equilibrium
allocation of agent i, i = p, n. The first order conditions for the two agents’ intertemporal
consumption plans are:
\[
u'_p(\hat{c}_p(\tau)) = w_p e^{\delta\tau}\xi(\tau); \quad \hat{c}_n(\tau)^{-1} = w_n e^{\delta\tau - \int_0^\tau R(s)\,ds}, \tag{8.18}
\]
where $w_p$, $w_n$ are two constants, and ξ is the usual pricing kernel process, solution to,
\[
\frac{d\xi(\tau)}{\xi(\tau)} = -R(\tau)\,d\tau - \lambda(\tau) \cdot dW(\tau). \tag{8.19}
\]
Let
\[
u(D, x) \equiv \max_{c_p + c_n = D}\left[u_p(c_p) + x \cdot u_n(c_n)\right],
\]
where
\[
x \equiv \frac{u'_p(\hat{c}_p)}{u'_n(\hat{c}_n)} = u'_p(\hat{c}_p)\,\hat{c}_n \tag{8.20}
\]
is a stochastic social weight. By the definition of ξ, x(τ) is solution to,
\[
dx(\tau) = -x(\tau)\lambda(\tau)\,dW(\tau), \tag{8.21}
\]
where λ is the unit risk-premium, which as shown in the appendix equals,
\[
\lambda(s) = \eta\sigma_0 s^{-1}, \quad \text{where } s(\tau) \equiv \hat{c}_p(\tau)/D(\tau).
\]
Then, the equilibrium price system in this economy is supported by a fictitious representa-
tive agent with utility u (D, x). Intuitively, the representative agent “allocations” satisfy, by
construction,
\[
\frac{u'_p(c^*_p(\tau))}{u'_n(c^*_n(\tau))} = \frac{u'_p(\hat{c}_p(\tau))}{u'_n(\hat{c}_n(\tau))} = x(\tau),
\]
where starred allocations are the representative agent’s “allocations”. In other words, the trick
underlying this approach is to find a stochastic social weight process x(τ) such that the first
order conditions of the representative agent lead to the market allocations. This is shown more
rigorously in the Appendix.
Guvenen (2005) makes an interesting extension of the Basak and Cuoco model. He considers
two agents in which only the “rich” invests in the stock market, and is such that $IES_{rich} >
IES_{poor}$. He shows that for the rich, a low IES is needed to match the equity premium. However,
US data show that the rich have a high IES, which cannot match the equity premium. (Guvenen
considers an extension of the model in which we can disentangle IES and CRRA for the rich.)
8.3 “Catching up with the Joneses” in a heterogeneous agents economy

Chan and Kogan (2002) study an economy with heterogeneous agents and “catching up with the
Joneses” preferences. In this economy, there is a continuum of agents indexed by a parameter
η ∈ [1, ∞) in the instantaneous utility function,
\[
u_\eta(c, x) = \frac{(c/x)^{1 - \eta}}{1 - \eta},
\]
where c is consumption, and x is the “standard of living of others”, to be defined below.

The total endowment in the economy, D, follows a geometric Brownian motion,
\[
\frac{dD(\tau)}{D(\tau)} = g_0\,d\tau + \sigma_0\,dW(\tau). \tag{8.22}
\]
By assumption, the standard of living of others, x(τ), is a weighted geometric average of the
past realizations of the aggregate consumption D, viz
\[
\ln x(\tau) = \ln x(0)\, e^{-\theta\tau} + \theta\int_0^\tau e^{-\theta(\tau - s)} \ln D(s)\,ds, \quad \text{with } \theta > 0.
\]
Therefore, x(τ) satisfies,
\[
dx(\tau) = \theta s(\tau) x(\tau)\,d\tau, \quad \text{where } s(\tau) \equiv \ln\left(D(\tau)/x(\tau)\right). \tag{8.23}
\]
By Eqs. (8.22) and (8.23), s(τ) is solution to,
\[
ds(\tau) = \left[g_0 - \frac{1}{2}\sigma_0^2 - \theta s(\tau)\right] d\tau + \sigma_0\,dW(\tau).
\]
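The dynamics of s are those of a mean-reverting (Ornstein-Uhlenbeck type) process with long-run mean (g0 − ½σ0²)/θ. A minimal Euler discretization, given here only as an illustrative sketch with hypothetical names and parameter values, is:

```python
import math
import random

def simulate_s(s0, g0, sigma0, theta, dt, n_steps, rng):
    """Euler scheme for ds = (g0 - sigma0^2 / 2 - theta * s) dtau + sigma0 dW."""
    path = [s0]
    for _ in range(n_steps):
        drift = g0 - 0.5 * sigma0 ** 2 - theta * path[-1]
        shock = sigma0 * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + drift * dt + shock)
    return path

def long_run_mean(g0, sigma0, theta):
    """Level at which the drift of s vanishes."""
    return (g0 - 0.5 * sigma0 ** 2) / theta

# Hypothetical parameter values, for illustration only.
rng = random.Random(0)
path = simulate_s(s0=0.0, g0=0.02, sigma0=0.03, theta=0.1, dt=0.01, n_steps=5000, rng=rng)
print(path[-1], long_run_mean(0.02, 0.03, 0.1))
```

With σ0 set to zero the simulated path converges deterministically to the long-run mean, which gives a simple sanity check of the discretization.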
In this economy with complete markets, the equilibrium price process is the same as the price
process in an economy with a representative agent with the following utility function,
\[
u(D, x) \equiv \max_{c_\eta} \int_1^\infty u_\eta(c_\eta, x)\, f(\eta)\,d\eta \quad \text{s.t.} \quad \int_1^\infty c_\eta\,d\eta = D, \tag{P1}
\]
where $f(\eta)^{-1}$ is the marginal utility of income for agent η. (See Chapter 2 in Part I, for the
theoretical foundations of this program.)
In the appendix, we show that the solution to the static program [P1] leads to the following
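expression for the utility function u(D, x),
\[
u(D, x) = \int_1^\infty \frac{1}{1 - \eta}\, f(\eta)^{\frac{1}{\eta}}\, V(s)^{\frac{\eta - 1}{\eta}}\,d\eta,
\]
where V is a Lagrange multiplier, which satisfies,
\[
e^s = \int_1^\infty f(\eta)^{\frac{1}{\eta}}\, V(s)^{-\frac{1}{\eta}}\,d\eta.
\]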
The appendix also shows that the unit risk-premium predicted by this model is,
\[
\lambda(s) = \sigma_0\, \frac{\exp(s)}{\int_1^\infty \frac{1}{\eta}\, f(\eta)^{\frac{1}{\eta}}\, V(s)^{-\frac{1}{\eta}}\,d\eta}. \tag{8.24}
\]
This economy collapses to an otherwise identical homogeneous economy once the social weighting
function is $f(\eta) = \delta(\eta - \eta_0)$, the Dirac mass at $\eta_0$. In this case, $\lambda(s) = \sigma_0\eta_0$, a constant.
A crucial assumption in this model is that the standard of living x is a process with bounded
variation (see Eq. (8.23)). By this assumption, the standard of living of others is not a risk which
agents require to be compensated for. The unit risk-premium in Eq. (8.24) is driven by s through
nonlinearities induced by agents’ heterogeneity. By calibrating their model to US data, Chan
and Kogan find that the risk-premium, λ(s), is decreasing and convex in s.¹ The mechanism
at the heart of this result is an endogenous wealth redistribution in the economy. Clearly, the
less risk-averse individuals put a higher proportion of their wealth in risky assets, compared to
the more risk-averse agents. In the poor states of the world, stock prices decrease, the wealth
of the less risk-averse lowers more than that of the more risk-averse agents, which reduces the
fraction of wealth held by the less risk-averse individuals in the whole economy. Thus, in bad
times, the contribution of these less risk-averse individuals to aggregate risk-aversion decreases
and, hence, the aggregate risk-aversion increases in the economy.
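To see the mechanism at work, Eq. (8.24) can be evaluated numerically once the continuum of agents is replaced by a finite set of types, so that the integrals become sums. This discretization, the function names and the weights below are assumptions made only for illustration; the sketch solves for the multiplier V by bisection and then evaluates λ (s).

```python
import math

def solve_multiplier(s, etas, weights):
    """Solve exp(s) = sum_i f_i^(1/eta_i) * V^(-1/eta_i) for V (bisection on ln V)."""
    target = math.exp(s)

    def excess(log_v):
        total = sum(f ** (1.0 / eta) * math.exp(-log_v / eta)
                    for eta, f in zip(etas, weights))
        return total - target  # strictly decreasing in log_v

    lo, hi = -50.0, 50.0
    while excess(lo) < 0.0:
        lo -= 50.0
    while excess(hi) > 0.0:
        hi += 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def unit_risk_premium(s, etas, weights, sigma0):
    """Discrete-type analogue of Eq. (8.24)."""
    v = solve_multiplier(s, etas, weights)
    denom = sum((1.0 / eta) * f ** (1.0 / eta) * v ** (-1.0 / eta)
                for eta, f in zip(etas, weights))
    return sigma0 * math.exp(s) / denom

# Homogeneous economy: lambda collapses to sigma0 * eta0.
lam_homog = unit_risk_premium(0.2, [3.0], [1.0], sigma0=0.03)
# Two hypothetical types: lambda falls as s rises.
lam_two = [unit_risk_premium(s, [1.0, 5.0], [1.0, 1.0], sigma0=0.03)
           for s in (-1.0, 0.0, 1.0)]
```

In the two-type case, the risk-premium lies between σ0 times the smallest and σ0 times the largest risk-aversion, and moves toward the lower bound as s increases, consistent with the wealth redistribution argument above.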
Footnote 1: Their numerical results also reveal that, in their model, the log of the price-dividend ratio is increasing and concave in s. Finally, their Lemma 5 (p. 1281) establishes that in a homogeneous economy, the price-dividend ratio is increasing and convex in s.

8.4 Volatility, and leverage

Can firm leverage be responsible for a sustained stock volatility? Can leverage explain countercyclical stock volatility? We already know, from the previous chapter, that ex-post stock returns are high in good times, whence stock volatility is negatively related to ex-post returns. According to the leverage effect hypothesis, the mechanism for such a negative relation between stock returns and volatility is that a negative shock to a share price makes the debt/equity ratio increase. As a result, the firm becomes riskier, and stock volatility increases. It is often argued that empirically, the leverage effect is too weak. Most of the contributions to these issues are empirical (e.g., Black, 1976; Christie, 1982; Schwert, 1989a,b; Nelson, 1991). Naturally, another possibility is that stock volatility and returns are negatively related for reasons unrelated to the leverage effect. For example, stock volatility can be countercyclical because agents’ preferences and beliefs, combined with macroeconomic conditions, lead precisely to this property, as in the models discussed in Chapter 7 and in the previous sections.
Alternatively, countercyclical volatility might arise as a result of a combined effect of the properties of the previous models, and leverage. A difficulty is that in many empirical studies, tests of the leverage effect hypothesis are performed without regard to a well specified economic model. Gallmeyer, Aydemir and Hollifield (2007) show that the reasoning underlying this hypothesis can be made rigorous. They formulate a general equilibrium model with levered firms, which they realistically calibrate, to disentangle leverage effects from “real” effects such as habit formation. They make use of a stochastic discount factor known to price assets fairly well, and conclude that leverage does indeed have little effect in general equilibrium. This section develops a variant of their model, whose sole merit is that it admits a closed-form solution.
8.4.1 Model
8.4.1.1 Primitives
Exogenous aggregate output follows:
\[ \frac{d\delta(t)}{\delta(t)} = \mu_\delta\,dt + \sigma_\delta\,dW(t), \]
where σδ > 0 and µδ are two constants. Many households maximize,
\[ E\left[\int_0^{\infty} e^{-\rho t}\ln\left(c(t) - x(t)\right)dt\right], \]
where x (t) is external habit and relative risk-aversion is 1/s(t), where s(t) = (c(t) − x(t))/c(t) is the surplus consumption ratio, which in equilibrium equals (δ(t) − x(t))/δ(t).
For the representative firm, we have that its value is:
\[ V = E + D, \]
where E is equity and D is debt. Let the maturity of debt be denoted as Td, which will be 10 years. The payoffs of the firm are such that δ (t) = δE (t) + δD (t), with obvious notation.
To obtain closed-form solutions, assume the surplus consumption ratio, s (t) = (δ(t) − x(t))/δ(t), is a continuous-time autoregressive process, with the inverse 1/s(t) solution to,
\[ d\left(\frac{1}{s(t)}\right) = \beta\left(\frac{1}{\bar{s}} - \frac{1}{s(t)}\right)dt - \omega\left(\frac{1}{s(t)} - \frac{1}{\upsilon}\right)\sigma_\delta\,dW(t), \]
for some constants β, s̄, ω and υ. It can be shown that if ω is small enough, then, 0 < s (t) < υ. Menzly, Santos and Veronesi (2004) show that this modeling trick leads to closed-form solutions in models with external habit formation. It is a trick quite similar to that used by Ljungqvist and Uhlig (2000) to model productivity shocks.
8.4.1.2 Model’s predictions
We now show that a calibration of the model leads to the following results: (i) price/dividend
ratios and price of debt are procyclical; (ii) return volatility is countercyclical; (iii) the leverage
ratio is countercyclical; (iv) the contribution of leverage to equity returns volatility is quite
limited.
8.4.1.3 Variation in volatility
Equity volatility is,
\[ \mathrm{vol}\left(\frac{dE}{E}\right) = \sigma_V + (\sigma_V - \sigma_D)\,\frac{D}{E}. \]
Let T = ∞, and assume debt services are δ D = qδ, for some q ∈ (0, 1). Maturity of debt Td = 10
years. For the aggregate consumption claim, we have:
P/D ratio = p (s (t)) = a + bs (t)
for some constants a, b given below. There is a similar expression for debt.
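From this, we may easily deduce volatility. We have,
\[ \sigma_V(s(t)) = \sigma_\delta + \frac{b}{a + b\,s(t)}\,\mathrm{vol}(ds(t)), \qquad \sigma_D(s(t)) = \sigma_\delta + \frac{b_{T_d}}{a_{T_d} + b_{T_d}\,s(t)}\,\mathrm{vol}(ds(t)), \]
\[ \mathrm{vol}(ds(t)) = \omega\sigma_\delta\,s(t)\left(1 - \frac{1}{\upsilon}\,s(t)\right), \]
where,
\[ a_{T_d} = \frac{1 - e^{-(\rho+\beta)T_d}}{\rho+\beta}, \qquad b_{T_d} = \frac{\beta\left(1 - e^{-\rho T_d}\right) + \rho\left(e^{-(\rho+\beta)T_d} - e^{-\rho T_d}\right)}{\rho\,(\rho+\beta)\,\bar{s}}, \qquad a = \lim_{l\to\infty} a_l, \quad b = \lim_{l\to\infty} b_l. \]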
8.4.1.4 Equity volatility: a decomposition formula
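Equity volatility is,
\[ \mathrm{vol}\left(\frac{dE(t)}{E(t)}\right) = \sigma_\delta + \underbrace{\frac{b}{a + b\,s(t)}}_{\text{endog. P/D fluct.}}\cdot\underbrace{\sigma_\delta\,\omega\,s(t)\left(1 - \frac{1}{\upsilon}\,s(t)\right)}_{=\,\mathrm{vol}(ds(t))} + \underbrace{\left(\frac{b}{a + b\,s(t)} - \frac{b_{T_d}}{a_{T_d} + b_{T_d}\,s(t)}\right)}_{\text{leverage multiplier}}\cdot\underbrace{\sigma_\delta\,\omega\,s(t)\left(1 - \frac{1}{\upsilon}\,s(t)\right)}_{=\,\mathrm{vol}(ds(t))}\cdot\frac{D(t)}{E(t)}. \tag{8.25} \]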
Note that the leverage ratio, D(t)/E(t), is endogenous and equal to,
\[ \frac{D(t)}{E(t)} = \frac{\left(a_{T_d} + b_{T_d}\,s(t)\right)q}{a + b\,s(t) - \left(a_{T_d} + b_{T_d}\,s(t)\right)q}, \]
so we can only “see” what happens to D(t)/E(t) and vol(dE(t)/E(t)) as the surplus s (t) changes.
We calibrate the model using the values in the following Table, which are similar to those in Menzly, Santos and Veronesi (2004).

σδ     ρ      β      s̄      ω     υ      q
0.01   0.04   0.15   0.03   40    0.05   0.60
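To anticipate, much of the action in this model is activated by the large swings in the price-dividend ratio, p′(s(t))/p(s(t)) = b/(a + bs(t)). Precisely, we have:
\[ \underbrace{\mathrm{vol}\left(\frac{dE(t)}{E(t)}\right)}_{\approx\,0.15} = \underbrace{\sigma_\delta}_{=\,0.01} + \underbrace{\frac{b}{a + b\,s(t)}}_{\text{endog. P/D fluct.}\,\approx\,26.31}\cdot\underbrace{\sigma_\delta\,\omega\,s(t)\left(1 - \frac{1}{\upsilon}\,s(t)\right)}_{=\,\mathrm{vol}(ds(t))\,\approx\,5\cdot 10^{-3}} + \underbrace{\left(\frac{b}{a + b\,s(t)} - \frac{b_{T_d}}{a_{T_d} + b_{T_d}\,s(t)}\right)}_{\text{leverage multiplier}\,\approx\,11.08}\cdot\underbrace{\sigma_\delta\,\omega\,s(t)\left(1 - \frac{1}{\upsilon}\,s(t)\right)}_{=\,\mathrm{vol}(ds(t))\,\approx\,5\cdot 10^{-3}}\cdot\underbrace{\frac{D(t)}{E(t)}}_{\approx\,0.24}. \]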
These computations might suggest that by playing with debt maturity, one may obtain a
greater leverage contribution to volatility. However, this is not the case, as we shall show.
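The magnitudes quoted in the decomposition can be reproduced directly from the closed-form expressions for a, b, aTd and bTd and the calibration table. The sketch below does so; the figures in the text appear to correspond to s (t) = s̄ = 0.03 and Td = 10, which is an inference from the numbers rather than something stated explicitly, and the function names are illustrative.

```python
import math

# Calibration from the table in the text.
SIGMA_DELTA, RHO, BETA, S_BAR, OMEGA, UPSILON, Q = 0.01, 0.04, 0.15, 0.03, 40.0, 0.05, 0.60

def a_coef(maturity):
    """a_T; the aggregate consumption claim corresponds to maturity = inf."""
    if math.isinf(maturity):
        return 1.0 / (RHO + BETA)
    return (1.0 - math.exp(-(RHO + BETA) * maturity)) / (RHO + BETA)

def b_coef(maturity):
    """b_T; the aggregate consumption claim corresponds to maturity = inf."""
    scale = RHO * (RHO + BETA) * S_BAR
    if math.isinf(maturity):
        return BETA / scale
    num = (BETA * (1.0 - math.exp(-RHO * maturity))
           + RHO * (math.exp(-(RHO + BETA) * maturity) - math.exp(-RHO * maturity)))
    return num / scale

def decompose_equity_vol(s, debt_maturity=10.0):
    """Terms of the decomposition in Eq. (8.25) at surplus ratio s."""
    a, b = a_coef(math.inf), b_coef(math.inf)
    a_d, b_d = a_coef(debt_maturity), b_coef(debt_maturity)
    vol_ds = OMEGA * SIGMA_DELTA * s * (1.0 - s / UPSILON)
    pd_term = b / (a + b * s)
    multiplier = pd_term - b_d / (a_d + b_d * s)
    leverage = (a_d + b_d * s) * Q / (a + b * s - (a_d + b_d * s) * Q)
    total = SIGMA_DELTA + pd_term * vol_ds + multiplier * vol_ds * leverage
    return {"vol_ds": vol_ds, "pd_term": pd_term, "multiplier": multiplier,
            "leverage": leverage, "total": total}

terms = decompose_equity_vol(s=S_BAR)
```

Under this reading, the leverage term contributes roughly 0.013 to a total volatility of about 0.15, which is the sense in which the contribution of leverage is limited.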
[Figure 8.1: P/D ratio (vertical axis) against the surplus ratio (horizontal axis).]
FIGURE 8.1. Price-dividend ratio for the aggregate consumption claim.
[Figure 8.2: Equity volatility (vertical axis) against the surplus ratio (horizontal axis).]
FIGURE 8.2. Equity volatility for Td = 10. The solid line is total volatility. The top dashed line is the contribution from “unlevered” volatility to total volatility, σV. The bottom dashed line is the contribution from “levered” volatility to total volatility, (σV − σD) D/E.
What is the statistical relation between the leverage ratio D/E and return volatility that we should expect to find in the data?
[Figure 8.3: Total volatility (vertical axis) against the leverage ratio (horizontal axis).]
FIGURE 8.3. Leverage and equity volatility: a “naked” eye view.
Note that this is not a causal relation: both leverage and equity volatility are driven by the same state variable, the surplus consumption ratio.
The effects of debt maturity on the leverage effect are quite limited. Indeed, as debt maturity decreases, the leverage multiplier increases. However, the leverage ratio D/E shrinks to zero as maturity shrinks to zero. The overall effect is given by the third term on the right hand side of Eq. (8.25).
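The two opposing forces can be tabulated by evaluating the leverage multiplier and the leverage ratio at several debt maturities. The sketch below does this at s (t) = 0.03 with the calibration of the text; the maturities chosen and the function name are illustrative assumptions.

```python
import math

SIGMA_DELTA, RHO, BETA, S_BAR, OMEGA, UPSILON, Q = 0.01, 0.04, 0.15, 0.03, 40.0, 0.05, 0.60

def leverage_terms(debt_maturity, s=0.03):
    """Return (leverage multiplier, D/E, leverage volatility) for a given debt maturity."""
    scale = RHO * (RHO + BETA) * S_BAR
    a = 1.0 / (RHO + BETA)           # limit of a_T as T goes to infinity
    b = BETA / scale                 # limit of b_T as T goes to infinity
    a_d = (1.0 - math.exp(-(RHO + BETA) * debt_maturity)) / (RHO + BETA)
    b_d = (BETA * (1.0 - math.exp(-RHO * debt_maturity))
           + RHO * (math.exp(-(RHO + BETA) * debt_maturity)
                    - math.exp(-RHO * debt_maturity))) / scale
    vol_ds = OMEGA * SIGMA_DELTA * s * (1.0 - s / UPSILON)
    multiplier = b / (a + b * s) - b_d / (a_d + b_d * s)
    leverage = (a_d + b_d * s) * Q / (a + b * s - (a_d + b_d * s) * Q)
    return multiplier, leverage, multiplier * vol_ds * leverage

maturities = [1.0, 5.0, 10.0, 30.0]
rows = [leverage_terms(t) for t in maturities]
for t, (mult, lev, lev_vol) in zip(maturities, rows):
    print(f"Td={t:5.1f}  multiplier={mult:6.2f}  D/E={lev:5.3f}  lev. vol.={lev_vol:6.4f}")
```

The multiplier falls with maturity while D/E rises, so their product, scaled by vol(ds), remains small across maturities.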
[Figure 8.4: Leverage volatility (vertical axis) against debt maturity in years (horizontal axis).]
FIGURE 8.4. Leverage volatility at the steady state expectation, s (t) = 0.03.
8.4.1.5 The role of no-bankruptcy, and some model’s implications
In the previous model, there was no role for bankruptcy. Let us consider bankruptcy in a simple setting. Consider a two-date economy, and suppose that the value of the firm in one year is,
\[ \tilde{V} = \begin{cases} V_{bad} < \text{Nominal debt}, & \text{with probability } p \\ V_{good} > \text{Nominal debt}, & \text{with probability } 1 - p \end{cases} \]
We assume risk-neutrality and that there are no bankruptcy costs. Let R̃ = (S̃1 − S0)/S0 be the equity return, where S̃1 is the equity value at the second period. Then, we have that vol(R̃) = sqrt(p/(1 − p)).
For example, if p = 2%, then vol(R̃) = 14%!
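The volatility formula can be checked by building the two-point distribution of the equity return explicitly. The sketch below assumes, beyond what the text states, a zero interest rate and limited liability (equity is worth nothing in the bad state); the firm value and debt level are hypothetical and drop out of the result.

```python
import math

def equity_return_vol(p, v_good=120.0, nominal_debt=100.0):
    """Standard deviation of the equity return in the two-state bankruptcy example."""
    payoff_good = v_good - nominal_debt     # equity payoff if the firm survives
    payoff_bad = 0.0                        # assumed: limited liability in bankruptcy
    s0 = p * payoff_bad + (1.0 - p) * payoff_good   # risk-neutral price, zero rate assumed
    outcomes = [(payoff_bad / s0 - 1.0, p), (payoff_good / s0 - 1.0, 1.0 - p)]
    mean = sum(r * w for r, w in outcomes)
    variance = sum(w * (r - mean) ** 2 for r, w in outcomes)
    return math.sqrt(variance)

vol_two_percent = equity_return_vol(0.02)
```

The result does not depend on v_good or nominal_debt, since the return takes the value −1 with probability p and p/(1 − p) otherwise.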
8.5 Appendix 1: Non-expected utility

8.5.1 Detailed derivation of optimality conditions and selected relations
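Derivation of Eq. (8.4). We have,
\[ \begin{aligned} x_{t+1} &= \sum_i \left(P_{it+1} + D_{it+1}\right)\theta_{it+1} \\ &= \sum_i \left(P_{it+1} + D_{it+1} - P_{it}\right)\theta_{it+1} + \sum_i P_{it}\,\theta_{it+1} \\ &= \left(1 + \sum_i \frac{P_{it+1} + D_{it+1} - P_{it}}{P_{it}}\cdot\frac{P_{it}\,\theta_{it+1}}{\sum_j P_{jt}\,\theta_{jt+1}}\right)\sum_i P_{it}\,\theta_{it+1} \\ &= \left(1 + \sum_i r_{it+1}\,\omega_{it}\right)\left(x_t - c_t\right), \end{aligned} \]
where the last line follows by the standard budget constraint c_t + Σ_i P_{it} θ_{it+1} = x_t, the definition of r_{it+1} and the definition of ω_{it} given in the main text.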
Optimality. Consider Eq. (8.5). The first order condition for c yields,
\[ W_1\left(c, E\left(V(x', y')\right)\right) = W_2\left(c, E\left(V(x', y')\right)\right)\cdot E\left(V_1(x', y')\left(1 + r_M(y')\right)\right), \tag{8A.1} \]
where subscripts denote partial derivatives. Thus, optimal consumption is some function c (x, y). Hence,
\[ x' = \left(x - c(x, y)\right)\left(1 + r_M(y')\right). \]
We have,
\[ V(x, y) = W\left(c(x, y), E\left(V(x', y')\right)\right). \]
By differentiating the value function with respect to x,
\[ V_1(x, y) = W_1\left(c(x, y), E\left(V(x', y')\right)\right)c_1(x, y) + W_2\left(c(x, y), E\left(V(x', y')\right)\right)E\left(V_1(x', y')\left(1 + r_M(y')\right)\right)\left(1 - c_1(x, y)\right). \]
By replacing Eq. (8A.1) into the previous equation we get the Envelope Equation for this dynamic programming problem,
\[ V_1(x, y) = W_1\left(c(x, y), E\left(V(x', y')\right)\right). \tag{8A.2} \]
By replacing Eq. (8A.2) into Eq. (8A.1), and rearranging terms,
\[ E\left[\frac{W_2\left(c(x, y), \nu(x, y)\right)}{W_1\left(c(x, y), \nu(x, y)\right)}\,W_1\left(c(x', y'), \nu(x', y')\right)\left(1 + r_M(y')\right)\right] = 1, \qquad \nu(x, y) \equiv E\left(V(x', y')\right). \]
Below, we show that by a similar argument the same Euler equation applies to any asset i,
\[ E\left[\frac{W_2\left(c(x, y), \nu(x, y)\right)}{W_1\left(c(x, y), \nu(x, y)\right)}\,W_1\left(c(x', y'), \nu(x', y')\right)\left(1 + r_i(y')\right)\right] = 1, \qquad i = 1, \cdots, m. \tag{8A.3} \]
Derivation of Eq. (8A.3). We have,
\[ V(x, y) = \max_{c, \omega} W\left(c, E\left(V(x', y')\right)\right) = \max_{\theta'} W\left(x - \sum_i P_i\,\theta'_i,\; E\left(V(x', y')\right)\right); \qquad x' = \sum_i \left(P'_i + D'_i\right)\theta'_i. \]
The set of first order conditions is,
\[ \theta'_i: \quad 0 = -W_1(\cdot)\,P_i + W_2(\cdot)\,E\left(V_1(x', y')\left(P'_i + D'_i\right)\right), \qquad i = 1, \cdots, m. \]
Optimal consumption is c (x, y). Let ν (x, y) ≡ E (V (x′, y′)), as in the main text. By replacing Eq. (8A.2) into the previous equation,
\[ E\left[\frac{W_2\left(c(x, y), \nu(x, y)\right)}{W_1\left(c(x, y), \nu(x, y)\right)}\,W_1\left(c(x', y'), \nu(x', y')\right)\frac{P'_i + D'_i}{P_i}\right] = 1, \qquad i = 1, \cdots, m. \]
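Derivation of Eq. (8.6). We need to compute explicitly the stochastic discount factor in Eq. (8A.3),
\[ m\left(x, y; x', y'\right) = \frac{W_2\left(c(x, y), \nu(x, y)\right)}{W_1\left(c(x, y), \nu(x, y)\right)}\,W_1\left(c(x', y'), \nu(x', y')\right). \]
We have,
\[ W(c, \nu) = \frac{1}{1-\eta}\left[c^{\rho} + e^{-\delta}\left((1-\eta)\,\nu\right)^{\frac{\rho}{1-\eta}}\right]^{\frac{1-\eta}{\rho}}. \]
From this, it follows that,
\[ W_1(c, \nu) = \left[c^{\rho} + e^{-\delta}\left((1-\eta)\,\nu\right)^{\frac{\rho}{1-\eta}}\right]^{\frac{1-\eta}{\rho}-1} c^{\rho-1}, \]
\[ W_2(c, \nu) = \left[c^{\rho} + e^{-\delta}\left((1-\eta)\,\nu\right)^{\frac{\rho}{1-\eta}}\right]^{\frac{1-\eta}{\rho}-1} e^{-\delta}\left((1-\eta)\,\nu\right)^{\frac{\rho}{1-\eta}-1}, \]
and,
\[ W_1(c', \nu') = \left[c'^{\rho} + e^{-\delta}\left((1-\eta)\,\nu'\right)^{\frac{\rho}{1-\eta}}\right]^{\frac{1-\eta}{\rho}-1} c'^{\rho-1} = W(c', \nu')^{\frac{1-\eta-\rho}{1-\eta}}\,(1-\eta)^{\frac{1-\eta-\rho}{1-\eta}}\,c'^{\rho-1}, \tag{8A.4} \]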
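where ν′ ≡ ν (x′, y′). Therefore,
\[ m\left(x, y; x', y'\right) = \frac{W_2(c, \nu)}{W_1(c, \nu)}\,W_1(c', \nu') = e^{-\delta}\left(\frac{\nu}{W(c', \nu')}\right)^{\frac{\rho}{1-\eta}-1}\left(\frac{c'}{c}\right)^{\rho-1}. \]
Along any optimal consumption path, V (x, y) = W (c (x, y), ν (x, y)). Therefore,
\[ m\left(x, y; x', y'\right) = e^{-\delta}\left(\frac{E\left(V(x', y')\right)}{V(x', y')}\right)^{\frac{\rho}{1-\eta}-1}\left(\frac{c'}{c}\right)^{\rho-1}. \tag{8A.5} \]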
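The algebra leading from the aggregator W to the closed form of the stochastic discount factor can be checked numerically. The sketch below codes W, W1 and W2 as given above, compares W1 with a finite-difference derivative, and compares the product (W2/W1)·W1(c′, ν′) with the closed-form expression. All numerical values are hypothetical; with η > 1 the continuation utilities are negative, so that (1 − η)ν is positive.

```python
import math

RHO, ETA, DELTA = 0.5, 2.0, 0.05   # hypothetical preference parameters

def inner(c, nu):
    return c ** RHO + math.exp(-DELTA) * ((1.0 - ETA) * nu) ** (RHO / (1.0 - ETA))

def W(c, nu):
    return inner(c, nu) ** ((1.0 - ETA) / RHO) / (1.0 - ETA)

def W1(c, nu):
    return inner(c, nu) ** ((1.0 - ETA) / RHO - 1.0) * c ** (RHO - 1.0)

def W2(c, nu):
    return (inner(c, nu) ** ((1.0 - ETA) / RHO - 1.0) * math.exp(-DELTA)
            * ((1.0 - ETA) * nu) ** (RHO / (1.0 - ETA) - 1.0))

def sdf_direct(c, nu, c_next, nu_next):
    """m = W2(c, nu) / W1(c, nu) * W1(c', nu')."""
    return W2(c, nu) / W1(c, nu) * W1(c_next, nu_next)

def sdf_closed_form(c, nu, c_next, nu_next):
    """m = e^{-delta} (nu / W(c', nu'))^{rho/(1-eta) - 1} (c'/c)^{rho - 1}."""
    return (math.exp(-DELTA) * (nu / W(c_next, nu_next)) ** (RHO / (1.0 - ETA) - 1.0)
            * (c_next / c) ** (RHO - 1.0))

c, nu, c_next, nu_next = 1.0, -0.8, 1.1, -0.7
h = 1e-6
fd_W1 = (W(c + h, nu) - W(c - h, nu)) / (2.0 * h)    # central difference in c
fd_W2 = (W(c, nu + h) - W(c, nu - h)) / (2.0 * h)    # central difference in nu
m_direct = sdf_direct(c, nu, c_next, nu_next)
m_closed = sdf_closed_form(c, nu, c_next, nu_next)
```

Replacing ν and W (c′, ν′) by E (V (x′, y′)) and V (x′, y′) along an optimal path then gives Eq. (8A.5).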
We are left with evaluating the term E(V(x′, y′))/V(x′, y′). The conjecture to make is that v (x, y) = b (y)^{1/(1−η)} x, for some function b. From this, it follows that V (x, y) = b (y) x^{1−η}/(1 − η). We have,
\[ V_1(x, y) = W_1\left(c(x, y), E\left(V(x', y')\right)\right) = W(c, \nu)^{\frac{1-\eta-\rho}{1-\eta}}\,(1-\eta)^{\frac{1-\eta-\rho}{1-\eta}}\,c^{\rho-1} = V(x, y)^{\frac{1-\eta-\rho}{1-\eta}}\,(1-\eta)^{\frac{1-\eta-\rho}{1-\eta}}\,c^{\rho-1}, \]
where the first equality follows by Eq. (8A.2), the second equality follows by Eq. (8A.4), and the last equality follows by optimality. By making use of the conjecture on V, and rearranging terms,
\[ c(x, y) = a(y)\,x, \qquad a(y) \equiv b(y)^{\frac{\rho}{(1-\eta)(\rho-1)}}. \tag{8A.6} \]
Hence, V (x′, y′) = b (y′) x′^{1−η}/(1 − η), where
\[ x' = \left(1 - a(y)\right)x\left(1 + r_M(y')\right), \tag{8A.7} \]
and, with ψ ≡ b,
\[ \frac{E\left(V(x', y')\right)}{V(x', y')} = \frac{E\left[\psi(y')\left(1 + r_M(y')\right)^{1-\eta}\right]}{\psi(y')\left(1 + r_M(y')\right)^{1-\eta}}. \tag{8A.8} \]
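Along any optimal path, V (x, y) = W (c (x, y), E (V (x′, y′))). By plugging in W (from Eq. (8.5)) and the conjecture for V,
\[ E\left[\psi(y')\left(1 + r_M(y')\right)^{1-\eta}\right] = \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left(\frac{a(y)}{1 - a(y)}\right)^{\frac{(1-\eta)(\rho-1)}{\rho}}. \tag{8A.9} \]
Moreover,
\[ \psi(y')\left(1 + r_M(y')\right)^{1-\eta} = \left[a(y')\left(1 + r_M(y')\right)^{\frac{\rho}{\rho-1}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}}. \tag{8A.10} \]
By plugging Eqs. (8A.9)-(8A.10) into Eq. (8A.8),
\[ \begin{aligned} \frac{E\left(V(x', y')\right)}{V(x', y')} &= \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left[\frac{a(y)}{\left(1 - a(y)\right)a(y')\left(1 + r_M(y')\right)^{\frac{\rho}{\rho-1}}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}} \\ &= \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left[\left(\frac{c'}{c}\right)^{-1}\frac{x'}{\left(1 - a(y)\right)x\left(1 + r_M(y')\right)^{\frac{\rho}{\rho-1}}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}} \\ &= \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left[\left(\frac{c'}{c}\right)^{-1}\frac{1}{\left(1 + r_M(y')\right)^{\frac{1}{\rho-1}}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}}, \end{aligned} \]
where the first equality follows by Eq. (8A.6), and the second equality follows by Eq. (8A.7). The result follows by replacing this into Eq. (8A.5).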
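Proof of Eqs. (8.10) and (8.11). By using the standard property that ln E(e^ỹ) = E(ỹ) + ½ var(ỹ), for ỹ normally distributed, in Eq. (8.8), we obtain,
\[ \begin{aligned} 0 &= \ln E\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\ln\frac{c'}{c} + \theta R_M\right)\right] \\ &= -\delta\theta - \frac{\theta}{\psi}E\left(\ln\frac{c'}{c}\right) + \theta E(R_M) + \frac{1}{2}\left[\left(\frac{\theta}{\psi}\right)^2\sigma_c^2 + \theta^2\sigma_{R_M}^2 - 2\,\frac{\theta^2}{\psi}\,\sigma_{R_M,c}\right]. \end{aligned} \tag{8A.11} \]
We do the same in Eq. (8.9), and obtain,
\[ R_f = \delta\theta + \frac{\theta}{\psi}E\left(\ln\frac{c'}{c}\right) - (\theta-1)E(R_M) - \frac{1}{2}\left[\left(\frac{\theta}{\psi}\right)^2\sigma_c^2 + (\theta-1)^2\sigma_{R_M}^2 - 2\,\frac{\theta(\theta-1)}{\psi}\,\sigma_{R_M,c}\right]. \tag{8A.12} \]
By replacing Eq. (8A.12) into Eq. (8A.11), we obtain Eq. (8.10) in the main text.
To obtain the risk-free rate Rf in Eq. (8.11), we replace the expression for E(RM) in Eq. (8.10) into Eq. (8A.12).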
8.5.2 Details for the risks for the long run

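Proof of Eq. (8.17). By substituting the guess zt = a0 + a1 xt into Eq. (8.16),
\[ \begin{aligned} 0 &= \left(\kappa_0 - (1-\kappa_1)\,a_0 - \delta\right)\theta + \ln E_t\left[\exp\left(-\frac{\theta}{\psi}\ln\frac{C_{t+1}}{C_t} + \theta\kappa_1 a_1 x_{t+1} - \theta a_1 x_t + \theta g_{t+1}\right)\right] \\ &= \theta\left[\kappa_0 - (1-\kappa_1)\,a_0 - \delta + \left(1 - \frac{1}{\psi}\right)\left(g_0 - \frac{1}{2}\sigma_c^2\right)\right] + \ln E_t\left[\exp\left(\theta\left(1 - \frac{1}{\psi}\right)\epsilon_t + \theta\kappa_1 a_1 \eta_t\right)\right] \\ &\quad + \left[(\kappa_1\rho - 1)\,a_1 + 1 - \frac{1}{\psi}\right]x_t \\ &\equiv \mathrm{const}_1 + \mathrm{const}_2\cdot x_t, \end{aligned} \]
where the second equality follows by Eqs. (8.14) and (8.15). Note, then, that this equality can only hold if the two constants, const1 and const2, are both zero. Imposing const2 = 0 yields,
\[ a_1 = \frac{1 - \frac{1}{\psi}}{1 - \kappa_1\rho}, \]
as in Eq. (8.17) in the main text. Imposing const1 = 0, and using the solution for a1, yields the solution for the constant a0.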
8.5.3 Continuous time

Duffie and Epstein (1992a,b) extend the framework of non-expected utility to continuous time. Heuristically, the continuation utility is the continuous time limit of,
\[ v_t = \left[c_t^{\rho}\,\Delta t + e^{-\delta\Delta t}\left(E\left(v_{t+\Delta t}^{1-\eta}\right)\right)^{\frac{\rho}{1-\eta}}\right]^{1/\rho}. \]
Continuation utility vt solves the following stochastic differential equation,
\[ dv_t = \left[-f(c_t, v_t) - \frac{1}{2}A(v_t)\,\sigma_{vt}^2\right]dt + \sigma_{vt}\,dB_t, \quad \text{with } v_T = 0. \]
Now, (f, A) is the aggregator, with A being a variance multiplier, placing a penalty proportional to the utility variance, σvt². The aggregator (f, A) corresponds somehow to the aggregator (W, v̂) of the discrete time case.
The solution to the previous “stochastic differential utility” is:
\[ v_t = E\left[\int_t^{T}\left(f(c_s, v_s) + \frac{1}{2}A(v_s)\,\sigma_{vs}^2\right)ds\right], \]
which collapses to the standard additive utility case once f (c, v) = u (c) − βv and A = 0.
8.6 Appendix 2: Economies with heterogenous agents

Restricted stock market participation (Basak and Cuoco (1998)). We first show that Eq. (8.21) holds true. Indeed, by the definition of the stochastic social weight in Eq. (8.20), we have that
\[ x(\tau) = u'_p\left(\hat{c}_p(\tau)\right)\hat{c}_n(\tau) = \frac{w_p}{w_n}\,\xi(\tau)\,e^{\int_0^{\tau}R(s)\,ds}, \]
where the second equality follows by the first order conditions in Eq. (8.18). Eq. (8.21) follows by the previous expression for x and the dynamics for the pricing kernel in Eq. (8.19).
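By Chapter 7 (Appendix 1), the unit risk premium λ satisfies,
\[ \lambda(D, x) = -\frac{u_{11}(D, x)}{u_1(D, x)}\,\sigma_0 D + \frac{u_{12}(D, x)\,x}{u_1(D, x)}\,\lambda(D, x). \]
This is:
\[ \begin{aligned} \lambda(D, x) &= -\frac{u_1(D, x)\,u_{11}(D, x)}{u_1(D, x) - u_{12}(D, x)\,x}\cdot\frac{\sigma_0 D}{u_1(D, x)} \\ &= -\frac{u''_a(\hat{c}_a)}{u_1(D, x)}\,\sigma_0 D \\ &= -\frac{u''_a(\hat{c}_a)\,\hat{c}_a}{u'_a(\hat{c}_a)}\,\sigma_0\,s^{-1}, \end{aligned} \]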
where the second line follows by Basak and Cuoco (identity (33), p. 331) and the third line follows by the definition of u(D, x) and s. The Sharpe ratio reported in the main text follows by the definition of ua. The interest rate is also found through Chapter 7 (Appendix 1). We have,
\[ R(s) = \delta + \frac{\eta\,g_0}{\eta - (\eta-1)\,s} - \frac{1}{2}\,\frac{\eta\,(\eta+1)\,\sigma_0^2}{s\left(\eta - (\eta-1)\,s\right)}. \]
Finally, by applying Itô’s lemma to s = ĉa/D, and using the optimality conditions for agent a, we find that drift and diffusion functions of s are given by:
\[ \phi(s) = g_0\,\frac{(1-\eta)(1-s)}{\eta + (1-\eta)\,s}\,s - \frac{1}{2}\,\frac{(\eta+1)\,\sigma_0^2}{\eta + (1-\eta)\,s} + \frac{1}{2}\,\frac{(\eta+1)\,\sigma_0^2}{s} + \sigma_0^2\,(s - 1), \]
and ξ(s) = σ0 (1 − s).
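One way to sanity check the drift φ (s) against the interest rate R (s) is the following. Suppose, as an assumption not restated in this appendix, that the restricted agent has logarithmic utility and invests only in the riskless asset, so that his consumption (1 − s) D has no diffusion and grows at the rate R − δ. Itô's lemma then implies ξ (s) = σ0 (1 − s) and φ (s) = (1 − s)(g0 − σ0² − (R (s) − δ)). The sketch below compares this expression with the formula for φ (s), with its last term read as σ0² (s − 1), as dimensional consistency requires. Function names and parameter values are illustrative.

```python
def short_rate(s, eta, g0, sigma0, delta):
    """R(s) as given in the text."""
    d = eta - (eta - 1.0) * s
    return delta + eta * g0 / d - 0.5 * eta * (eta + 1.0) * sigma0 ** 2 / (s * d)

def drift_s(s, eta, g0, sigma0):
    """phi(s), with the last term taken to be sigma0^2 * (s - 1)."""
    d = eta + (1.0 - eta) * s
    return (g0 * (1.0 - eta) * (1.0 - s) * s / d
            - 0.5 * (eta + 1.0) * sigma0 ** 2 / d
            + 0.5 * (eta + 1.0) * sigma0 ** 2 / s
            + sigma0 ** 2 * (s - 1.0))

def drift_s_implied(s, eta, g0, sigma0, delta):
    """Drift implied by the assumed riskless consumption growth of the restricted agent."""
    return (1.0 - s) * (g0 - sigma0 ** 2 - (short_rate(s, eta, g0, sigma0, delta) - delta))

# Hypothetical parameter values.
params = dict(eta=3.0, g0=0.02, sigma0=0.03)
gaps = [abs(drift_s(s, **params) - drift_s_implied(s, delta=0.02, **params))
        for s in (0.2, 0.5, 0.9)]
```

Under the stated assumption the two expressions agree up to rounding error, which supports the reading of the garbled terms adopted here.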
Economies with a continuum of agents. Next, we assume each agent faces a system of complete markets, in which case the equilibrium can be computed along the lines of Huang (1987), an extension of the classical approach described in Chapter 2 of Part I of these Lectures. We consider a continuum of agents indexed by an instantaneous utility function ua (c, x), where c is consumption, a is some parameter belonging to some set A, and x is some variable. For example, x is the “standard of living of others” in the Chan and Kogan (2002) model. The equilibrium allocation is, of course, Pareto efficient, because we are assuming each agent a ∈ A faces a system of complete markets. By the second welfare theorem, then, we know that for each Pareto allocation (ca)a∈A, there exists a social weighting function f such that the Pareto allocation can be “implemented” by means of the following program,
\[ u(D, x) = \max_{c_a}\int_{a\in A} u_a(c_a, x)\,f(a)\,da, \quad \text{s.t.} \quad \int_{a\in A} c_a\,da = D, \qquad \text{[8A.Soc-Pl]} \]
where D is the aggregate endowment in the economy. Then, the equilibrium price system can be computed as the Arrow-Debreu state price density in an economy with a single agent endowed with the aggregate endowment D, instantaneous utility function u (c, x), and where for a ∈ A, the social weighting function f (a) equals the reciprocal of the marginal utility of income of the agent a.
The practical merit of this approach is that while the marginal utility of income is unobservable, the Arrow-Debreu state price density constructed in this way depends on the “infinite dimensional parameter”, f, which can be calibrated to reproduce the main quantitative features of consumption and asset price data. We now apply this approach and derive the equilibrium conditions in the Chan and Kogan (2002) model.
“Catching up with the Joneses” (Chan and Kogan (2002)). In this model, markets are complete, and we have that A = [1, ∞) and uη (cη, x) = (cη/x)^{1−η}/(1 − η). The static optimization problem for the social planner in [8A.Soc-Pl] can be written as,
\[ u(D, x) = \max_{c_\eta}\int_1^{\infty}\frac{(c_\eta/x)^{1-\eta}}{1-\eta}\,f(\eta)\,d\eta, \quad \text{s.t.} \quad \int_1^{\infty}(c_\eta/x)\,d\eta = D/x. \tag{8A.13} \]
The first order conditions for this problem lead to,
\[ (c_\eta/x)^{-\eta}\,f(\eta) = V(D/x), \tag{8A.14} \]
where V is a Lagrange multiplier, a function of the aggregate endowment D, normalized by x. It is determined by the equation,
\[ \int_1^{\infty} V(D/x)^{-\frac{1}{\eta}}\,f(\eta)^{\frac{1}{\eta}}\,d\eta = D/x, \]
which is obtained by replacing Eq. (8A.14) into the budget constraint of the social planner.
The general equilibrium allocations and prices can be obtained by setting f (η) equal to the reciprocal of the marginal utility of income for agent η. Then, the expression for the unit risk-premium in Eq. (8.24) follows by,
\[ \lambda(s) = -\left.\frac{\partial^2 u(D, x)}{\partial D^2}\right/\frac{\partial u(D, x)}{\partial D}\;\sigma_0 D, \]
and lengthy computations, after setting D/x = e^s. The short-term rate can be computed by calculating the expectation of the pricing kernel in this fictitious representative agent economy.
It is instructive to compare the first order conditions of the social planner in Eq. (8A.14) with those in the decentralized economy. Since markets are complete, we have that the first order conditions in the decentralized economy satisfy:
\[ e^{-\delta t}\left(c_\eta(t)/x(t)\right)^{-\eta} = \kappa(\eta)\,\xi(t)\,x(t), \tag{8A.15} \]
where κ (η) is the marginal utility of income for agent η, and ξ (t) is the usual pricing kernel.
By aggregating the market equilibrium allocations in Eq. (8A.15),
\[ D(t) = \int_1^{\infty} c_\eta(t)\,d\eta = x(t)\int_1^{\infty}\left[e^{\delta t}\,\xi(t)\,x(t)\right]^{-\frac{1}{\eta}}\kappa(\eta)^{-\frac{1}{\eta}}\,d\eta. \]
By aggregating the social weighted allocations in Eq. (8A.14), with f = κ^{−1},
\[ D(t) = x(t)\int_1^{\infty} V\left(D(t)/x(t)\right)^{-\frac{1}{\eta}}\kappa(\eta)^{-\frac{1}{\eta}}\,d\eta. \]
Hence, it must be that,
\[ x(t)^{-1}\,V\left(D(t)/x(t)\right) = e^{\delta t}\,\xi(t). \tag{8A.16} \]
That is, if f = κ^{−1}, then, Eq. (8A.16) holds. The converse to this result is easy to obtain: eliminating (cη/x)^{−η} from Eq. (8A.14) and Eq. (8A.15) leaves:
\[ \frac{e^{\delta t}\,x(t)\,\xi(t)}{V\left(D(t)/x(t)\right)} = \frac{1}{f(\eta)\,\kappa(\eta)}, \qquad (t, \eta) \in [0, \infty)\times[1, \infty). \]
Hence if Eq. (8A.16) holds, then, f = κ^{−1}.

To summarize, the equilibrium allocations and prices can be “centralized” through the social planner program in (8A.13), with f = κ^{−1}.
References
Bansal, R. and A. Yaron (2004): “Risks for the Long Run: A Potential Resolution of Asset
Pricing Puzzles.” Journal of Finance 59, 1481-1509.
Basak, S. and D. Cuoco (1998): “An Equilibrium Model with Restricted Stock Market Par-
ticipation.” Review of Financial Studies 11, 309-341.
Black, F. (1976): “Studies of Stock Price Volatility Changes.” Proceedings of the 1976 Meeting
of the American Statistical Association, 177-81.
Campbell, J. and R. Shiller (1988): “The Dividend-Price Ratio and Expectations of Future
Dividends and Discount Factors.” Review of Financial Studies 1, 195-228.
Chan, Y.L. and L. Kogan (2002): “Catching Up with the Joneses: Heterogeneous Preferences
and the Dynamics of Asset Prices.” Journal of Political Economy 110, 1255-1285.
Christie, A.A. (1982): “The Stochastic Behavior of Common Stock Variances: Value, Leverage,
and Interest Rate Effects.” Journal of Financial Economics 10, 407-432.
Duffie, D. and L.G. Epstein (1992a): “Asset Pricing with Stochastic Differential Utility.” Re-
view of Financial Studies 5, 411-436.
Duffie, D. and L.G. Epstein (with C. Skiadas) (1992b): “Stochastic Differential Utility.” Econo-
metrica 60, 353-394.
Epstein, L.G. and S.E. Zin (1989): “Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: A Theoretical Framework.” Econometrica 57, 937-969.
Epstein, L.G. and S.E. Zin (1991): “Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: An Empirical Analysis.” Journal of Political Economy
99, 263-286.
Gallmeyer, M., Aydemir, A.C. and B. Hollifield (2007): “Financial Leverage and the Leverage
Effect: A Market and a Firm Analysis.” working paper Carnegie Mellon.
Guvenen, F. (2005): “A Parsimonious Macroeconomic Model for Asset Pricing: Habit forma-
tion or Cross-Sectional Heterogeneity.” Working paper, University of Rochester.
Huang, C.-f. (1987): “An Intertemporal General Equilibrium Asset Pricing Model: the Case
of Diffusion Information.” Econometrica 55, 117-142.
Ljungqvist, L. and H. Uhlig (2000): “Tax Policy and Aggregate Demand Management under
Catching Up with the Joneses.” American Economic Review 90, 356-366.
Menzly, L., T. Santos and P. Veronesi (2004): “Understanding Predictability.” Journal of
Political Economy 112, 1, 1-47.
Nelson, D.B. (1991): “Conditional Heteroskedasticity in Asset Returns: A New Approach.”
Econometrica 59, 347-370.
Schwert, G. W. (1989a): “Why Does Stock Market Volatility Change Over Time?” Journal of
Finance 44, 1115-1153.
Schwert, G.W. (1989b): “Business Cycles, Financial Crises and Stock Volatility.” Carnegie-
Rochester Conference Series on Public Policy 31, 83-125.
Weil, Ph. (1989): “The Equity Premium Puzzle and the Risk-Free Rate Puzzle.” Journal of
Monetary Economics 24, 401-421.
9
Information and other market frictions
9.1 Introduction
The assumption that agents have imperfect information about the fundamentals of the economy was first used by Phelps (1970) and Lucas (1972), to explain the relation between monetary policy and the business cycle. This information-based approach to the business cycle, summarized in Lucas (1981), was, in fact, abandoned in favour of the real business cycle theory, reviewed in Chapter 3, partly because imperfect information cannot be considered as the sole engine of macroeconomic fluctuations. Instead, it is widely acknowledged that the merit of Lucas’ approach was the introduction of a systematic way of thinking about fluctuations, in a context of rational expectations. Moreover, his information approach has inspired work in financial economics, where imperfect information is likely to play a quite fundamental role. In Section 9.2, we provide a succinct account of the Lucas framework, and solve a model relying on a simplified version of Lucas (1973), following the perspective we think a finance theorist would typically have. It is useful to present this model, as it is very simple and, at the same time, gives us a big picture of where imperfect information can lead us, in general. Sections 9.3 through 9.7 review the many models in financial economics that have been used to explain the price formation mechanism in contexts with imperfect information, be it asymmetric or differential, as we shall make precise below.
Sections 9.8 and 9.9 conclude this chapter, and present additional market frictions that are potentially apt to explain certain features in the asset price formation process.
9.2 Prelude: imperfect information in macroeconomics

There are n islands, where n goods are produced. Let y_i^s denote log-production supplied in the i-th island. (All prices and quantities are in logs, in this section.) It is assumed that this supply is set so as to equal the expected wedge of the price in the island, pi, over the average price in the economy, p,
\[ y_i^s = E\left(p_i - p \mid p_i\right), \quad \text{where } p = \frac{1}{n}\sum_{j=1}^{n} p_j. \]
The previous equation can be easily derived, once we assume p is common knowledge, as for example in the model of monopolistic competition of Blanchard and Kiyotaki (1987). If, instead, p is not common knowledge, it is more problematic to derive the exact functional form assumed for y_i^s, although this describes a quite plausible decision mechanism.
Information is disseminated differentially, not asymmetrically, in that producers in the i-th island do not know the price in the remaining islands, and guess economic developments in the other islands with the same precision. We assume and, later, verify, that all variables, exogenous and endogenous, are normally distributed. Under this presumption, we shall show, the price index p gathers all the available information in the economy efficiently, i.e. it is a sufficient statistic for all that information.
We have, by the Projection theorem,
\[ y_i^s = E\left(p_i - p \mid p_i\right) \equiv \beta\left(p_i - E(p)\right), \]
where we have used the fact that information is symmetrically disseminated and, then, (i) the expectation E (pi) = E (pj) = E (p) for every i and j, and (ii) both the numerator and denominator of the ratio, β ≡ cov(pi − p, pi)/var(pi), are the same across all islands. This coefficient will be determined below, as a result of the equilibrium.
Aggregating across all islands yields the celebrated Lucas supply equation:
\[ y^s \equiv \frac{1}{n}\sum_{j=1}^{n} y_j^s = \beta\left(p - E(p)\right). \tag{9.1} \]
Next, assume the demand for the good produced in the i-th island is given by:
\[ y_i^d = m - p + u_i - \theta\left(p_i - p\right), \quad \text{where } u_i \sim N\left(0, \sigma_u^2\right), \]
where money is
\[ m = E(m) + \varepsilon, \quad \text{where } \varepsilon \sim N\left(0, \sigma_\varepsilon^2\right). \tag{9.2} \]
Finally, we assume that E (ui ε) = 0, and that the ui are sectoral shocks, in that Σ_{j=1}^{n} uj = 0. The functional form assumed for the demand function, y_i^d, can be easily derived, assuming the goods in the islands are imperfect substitutes, as for example in Blanchard and Kiyotaki (1987).
In this context, the equilibrium price in the islands plays two roles. A first, standard role, is to clear the market in each island, being such that y_i^s = y_i^d, or:
\[ \beta\left(p_i - E(p)\right) = m - p + u_i - \theta\left(p_i - p\right), \quad \text{for all } i. \tag{9.3} \]
Its second role is to convey information about the two shocks, the macroeconomic, monetary shock, ε, and the real shocks in all the islands, uj, j = 1, · · · , n. Let us assume, then, that the only real shock that matters for the price in the i-th island is ui. Below, we shall verify this conjecture holds, in equilibrium. Then, the price is a function pi = P (ε, ui), which we conjecture to be affine in ε and ui, viz
\[ P(\varepsilon, u_i) = a + b\,\varepsilon + c\,u_i, \tag{9.4} \]
where the coefficients a, b and c have to be determined, in equilibrium. Under these conditions, the average price is a function p = P̄ (ε), equal to:
\[ \bar{P}(\varepsilon) = a + b\,\varepsilon. \tag{9.5} \]
Let us replace Eqs. (9.4), (9.5) and (9.2) into Eq. (9.3). By rearranging terms, we obtain:
\[ 0 = \left(\beta b + b - 1\right)\varepsilon + \left(\beta c + \theta c - 1\right)u_i + a - E(m). \]
This equation has to hold for all ε and ui. Therefore,
\[ a = E(m), \]
and the coefficients for ε and ui must both equal zero, leading to the following expressions for b and c:
\[ b = \frac{1}{1+\beta}, \qquad c = \frac{1}{\theta+\beta}. \tag{9.6} \]
We are left with determining β, which given Eqs. (9.4)-(9.5), and Eq. (9.6), is easily shown to equal:
\[ \beta = \frac{\sigma_u^2}{\sigma_u^2 + \left(\frac{\theta+\beta}{1+\beta}\right)^2\sigma_\varepsilon^2}. \tag{9.7} \]
The positive fixed point to this equation, which is easily shown to exist, delivers β, which can
then be replaced back into Eqs. (9.6), to yield the solutions for b and c, which are both positive.
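The fixed point in Eq. (9.7) has no simple closed form, but it is easy to compute. The sketch below uses bisection on [0, 1], where the right hand side of Eq. (9.7) minus β changes sign whenever θ > 0 and both variances are positive; the function name and the parameter values are hypothetical.

```python
def solve_lucas_equilibrium(theta, var_u, var_eps):
    """Return (beta, b, c) solving Eqs. (9.6)-(9.7) by bisection on beta in [0, 1]."""

    def gap(beta):
        ratio = (theta + beta) / (1.0 + beta)
        return beta - var_u / (var_u + ratio ** 2 * var_eps)

    lo, hi = 0.0, 1.0          # gap(0) < 0 and gap(1) > 0 under the stated assumptions
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    b = 1.0 / (1.0 + beta)
    c = 1.0 / (theta + beta)
    return beta, b, c

# Hypothetical parameter values.
beta, b, c = solve_lucas_equilibrium(theta=2.0, var_u=1.0, var_eps=1.0)
# beta must coincide with cov(p_i - p, p_i) / var(p_i) implied by Eqs. (9.4)-(9.5).
beta_from_moments = c ** 2 * 1.0 / (b ** 2 * 1.0 + c ** 2 * 1.0)
```

Raising the variance of the monetary shock relative to that of the real shocks lowers β, so that producers respond less to a given price surprise.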
We can now figure out the implications of this equilibrium. Replacing Eqs. (9.4)-(9.5) into the Lucas supply equation (9.1) leaves:
\[ y^s = \beta b\,\varepsilon. \]
This is Lucas’ celebrated neutrality result. Anticipated monetary policy, E (m), does not affect the equilibrium outcome, y^s. Instead, it is the monetary shock that affects y^s. Agents in any one island do not observe the price in the remaining islands and, hence, the aggregate price level, p. Therefore, they are unable to tell whether an increase in the price of the good they produce, pi, is due to a real shock, ui, or to a monetary shock, ε. In other words, they cannot disentangle a monetary shock from a real shock. If the agents were informed about the real shocks in the other islands, they would of course infer ε, and a monetary shock would not exert any effect on the equilibrium production. Formally, in equilibrium, the price difference is pi − p = c ui, which does not depend on ε, a standard “dichotomy” prediction reminiscent of classical theory. But pi − p is not observed, as p is not observed. Instead, the producers in the i-th island can only guess pi − E ( p| pi ) = bε + c ui, which co-varies positively with the observed price, pi, cov (pi − p, pi) = c² σu². This covariance is zero precisely when we remove the assumption of imperfect knowledge about the real shocks, so that σu² = 0, in which case β = 0. By contrast, with imperfect knowledge, producers act so as to compensate for their partial lack of knowledge, and produce to the maximum extent they can justify, on the basis of the positive statistical co-movements, cov (pi − p, pi) > 0. Note, if E (m) = m−1, i.e. money supply in the previous period, then from Eq. (9.5), the inflation rate is p − p−1 = bε + (1 − b) ε−1. Therefore, output and inflation are positively correlated, and generate a Phillips curve, which policy makers cannot exploit anyway, as anticipated monetary policy, E (m), is rationally “factored out,” and does not affect output. This is the essence of the Lucas critique (Lucas, 1977).
In the next sections, we present a number of models that work through a similar mechanism.
Why should we ever purchase an asset from anyone else who insists on selling it to the
market? Trading seems to be a difficult phenomenon to explain in a world with imperfect
information. Yet trading does occur, if imperfect information has the same nature as that of
the Phelps-Lucas model. Agents might well be imperfectly informed about the nature of, say,
unusually high market orders. For example, huge sell orders might arrive at the market either
because the asset is a lemon or because the agents selling it are hit by a liquidity shock. In
the models of this section, an equilibrium with rational expectations exists precisely because of
this “noise” (liquidity, in this example): there is a chance that the sell order arrives at the market
simply because the agents selling it are hit by a liquidity shock. Imperfectly informed agents,
therefore, might be willing to buy, if it is in their interest to do so.
9.3 Grossman-Stiglitz paradox

9.4 Noisy rational expectations equilibrium
9.4.1 Differential information
Hellwig (1980). Diamond and Verrecchia (1981).
9.4.2 Asymmetric information

Grossman and Stiglitz (1980).
9.4.3 Information acquisition

9.5 Strategic trading
Kyle (1985). Foster and Viswanathan (1996).
9.6 Dealers markets

Glosten and Milgrom (1985).
9.7 Noise traders

De Long, Shleifer, Summers and Waldmann (1990).
9.8 Demand-based derivative prices

9.8.1 Options
Gârleanu, Pedersen and Poteshman (2007).
9.8.2 Preferred habitat and the yield curve

Vayanos and Vila (2007), Greenwood and Vayanos (2008).
9.9 Over-the-counter markets

Duffie, Gârleanu and Pedersen (2005, 2007).
References
Blanchard, O. and N. Kiyotaki (1987): “Monopolistic Competition and the Effects of Aggregate
Demand.” American Economic Review 77, 647-666.
Lucas, R. E., Jr. (1972): “Expectations and the Neutrality of Money.” Journal of Economic
Theory 4, 103-124.
Lucas, R. E., Jr. (1973): “Some International Evidence on Output-Inflation Tradeoffs.” American
Economic Review 63, 326-334.
Lucas, R. E., Jr. (1977): “Econometric Policy Evaluation: A Critique.” Carnegie-Rochester
Conference Series on Public Policy 1, 19-46.
Lucas, R. E., Jr. (1981): Studies in Business-Cycle Theory. Boston, MIT Press.
Phelps, E. S. (1970): “Introduction.” In: Phelps, E. S. (Editor): Microeconomic Foundations
of Employment and Inflation Theory, New York: W. W. Norton.
Part III
Applied asset pricing theory
10
Options and volatility
10.1 Introduction
This chapter is under construction. Will include material on forwards, exotics, evaluation
through trees and calibration. Will cover details on how to deal with market imperfections.
Will have to improve the presentation.
10.2 Forwards
10.2.1 Pricing
Forwards can be synthesized as follows. Let $P(t,T)$ be the price at time $t$ of a zero-coupon bond
expiring at time $T$. Assuming the short-term rate $r$ is constant, we have $P(t,T) = e^{-r(T-t)}$. At time $t$,
borrow $P(t,T) F^*$ and buy a stock with market price $S(t)$, where $F^*$ is such that $P(t,T) F^* - S(t) = 0$.
Then, the payoff of this portfolio at time $T$ is $S(T) - F^*$. But the portfolio is worthless at time $t$, so this
trade is the same as a long forward. Therefore, we have $F^* = F(t)$ (say), where:
$$F(t) = S(t)\, e^{r(T-t)}. \tag{10.1}$$
Forwards are insensitive to volatility, in general, although they might be sensitive to it under some
circumstances clarified below.
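The replication argument behind Eq. (10.1) can be checked numerically. The following is a minimal sketch, assuming a constant short-term rate; the function names (`bond_price`, `forward_price`, `synthetic_forward`) and the numerical inputs are ours and purely illustrative.

```python
import math

def bond_price(r, tau):
    """Price P(t, T) of a zero-coupon bond when the short-term rate r is constant."""
    return math.exp(-r * tau)

def forward_price(spot, r, tau):
    """Forward price F(t) = S(t) * exp(r * (T - t)), as in Eq. (10.1)."""
    return spot * math.exp(r * tau)

def synthetic_forward(spot, r, tau, terminal_spot):
    """Borrow P(t, T) * F and buy one share; return (cost at t, payoff at T)."""
    f = forward_price(spot, r, tau)
    cost_today = spot - bond_price(r, tau) * f   # share bought minus amount borrowed
    payoff_at_T = terminal_spot - f              # sell the share, repay the debt F
    return cost_today, payoff_at_T

cost, payoff = synthetic_forward(spot=100.0, r=0.05, tau=1.0, terminal_spot=110.0)
print(forward_price(100.0, 0.05, 1.0), cost, payoff)
```

The cost today is zero by construction, which is what pins down the forward price.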
10.2.2 Forwards as a means to borrow money

Forward contracts can be used to borrow money. We can do the following: (i) go long a forward,
which at time $T$ delivers the payoff $S(T) - F$; (ii) short-sell the underlying asset, which at
time $T$ will give rise to a payoff of $-S(T)$. So, (i) and (ii) are such that now, we have access to $S(t)$
dollars, due to (ii), and at time $T$, we need to pay $F$, as the two payoffs
resulting from (i) and (ii) sum to $-F$. By Eq. (10.1), this is tantamount to borrowing money at the
interest rate $r$.
10.2.3 A pricing formula

Consider, again, a contract similar to that in Section 10.2.1, where at time $T$, the payoff is given
by $S(T) - K$, for some constant $K$. We know that the current value of this payoff is:
$$e^{-r(T-t)} E_t\left(S(T) - K\right) = S(t) - e^{-r(T-t)} K, \tag{10.2}$$
which is zero for $K = F^*$. We want to consider the situation where such a current value is not
necessarily zero, and show that the previous expression is a special case of a quite important
pricing formula. Consider a payoff at time $T$ equal to $S(T) - K$, provided the stock price at
$T$ is at least as large as some nonnegative constant $\ell \geq 0$,
$$\left(S(T) - K\right) I_{S(T) \geq \ell}.$$
For $\ell = 0$, this payoff is just that of a forward, and for $\ell = K$, the payoff is that of a European
call. To price this payoff, we proceed as follows:
$$
\begin{aligned}
e^{-r(T-t)} E_t\left[\left(S(T) - K\right) I_{S(T) \geq \ell}\right]
&= S(t)\, E_t\left[\eta_t(T)\, I_{S(T) \geq \ell}\right] - e^{-r(T-t)} K\, E_t\left[I_{S(T) \geq \ell}\right] \\
&= S(t)\, \hat{E}_t\left[I_{S(T) \geq \ell}\right] - e^{-r(T-t)} K\, E_t\left[I_{S(T) \geq \ell}\right] \\
&= S(t) \cdot \hat{Q}_t\left(S(T) \geq \ell\right) - e^{-r(T-t)} K \cdot Q_t\left(S(T) \geq \ell\right),
\end{aligned} \tag{10.3}
$$
where $\eta_t(T) \equiv e^{-r(T-t)} S(T) / S(t)$, $Q_t$ is the risk-neutral probability given the information at time $t$,
$\hat{Q}_t$ is a new probability, with Radon-Nikodym derivative given by
$$\frac{d\hat{Q}_t}{dQ_t} = \eta_t(T), \tag{10.4}$$
and, finally, $E_t$ denotes the expectation under $Q_t$, and $\hat{E}_t$ the expectation under $\hat{Q}_t$. Naturally,
Eq. (10.3) collapses to Eq. (10.2), once $\ell = 0$, and to Black-Scholes, once we take $S(t)$ to be a
geometric Brownian motion, and $\ell = K$. It is a general formula quite useful in models where,
for example, the volatility of the underlying asset return is not constant, as in the models of
Section 10.5.
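To see how Eq. (10.3) nests both the forward and the Black-Scholes call, here is a minimal sketch under the assumption that $S$ is a geometric Brownian motion with constant volatility `sigma` and constant rate `r`. Under that assumption the two probabilities are Normal distribution values evaluated at the threshold $\ell$; the function name `truncated_claim_value` and the parameter values are ours.

```python
import math
from statistics import NormalDist

def truncated_claim_value(spot, strike, level, r, sigma, tau):
    """Value at t of (S(T) - K) * 1{S(T) >= level} when S is a geometric Brownian motion.

    Implements S(t) * Qhat(S(T) >= level) - exp(-r * tau) * K * Q(S(T) >= level).
    """
    if level <= 0.0:
        prob_hat = prob_q = 1.0                  # the event S(T) >= 0 is certain
    else:
        vol = sigma * math.sqrt(tau)
        d1 = (math.log(spot / level) + (r + 0.5 * sigma**2) * tau) / vol
        prob_hat = NormalDist().cdf(d1)          # probability under Qhat
        prob_q = NormalDist().cdf(d1 - vol)      # probability under Q
    return spot * prob_hat - math.exp(-r * tau) * strike * prob_q

# level = 0 gives the value in Eq. (10.2); level = K gives the European call
print(truncated_claim_value(100.0, 100.0, 0.0, 0.05, 0.2, 1.0))
print(truncated_claim_value(100.0, 100.0, 100.0, 0.05, 0.2, 1.0))
```

Intermediate values of `level` price claims that pay the forward payoff only above a threshold, which is where the formula earns its generality.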
10.2.4 Forwards and volatility

10.3 Options: no-arb bounds, convexity and hedging
A European call (put) option is a contract by which the buyer has the right, but not the
obligation, to buy (sell) a given asset at some price, called the strike, or exercise price, at some
future date. Let $C$ and $p$ be the prices of the call and the put option. Let $S$ be the price of
the asset underlying the contract, and $K$ and $T$ be the exercise price and the expiration date.
Finally, let $t$ be the current evaluation time. The following relations hold true,
$$
C(T) = \begin{cases} 0 & \text{if } S(T) \leq K \\ S(T) - K & \text{if } S(T) > K \end{cases}
\qquad
p(T) = \begin{cases} K - S(T) & \text{if } S(T) \leq K \\ 0 & \text{if } S(T) > K \end{cases}
$$
or more succinctly, $C(T) = (S(T) - K)^+$ and $p(T) = (K - S(T))^+$.

Figure 10.1 depicts the net profits generated by portfolios including one asset, a share, say,
and/or one option written on the very same share. To simplify the presentation, we take the
short-term rate $r = 0$. The first rows in Figure 10.1 illustrate that the exposure to losses
generated by longing or shorting a share drops by purchasing an appropriate option. Consider,
for example, the first row, which depicts the two net profits related to (i) longing a
share and (ii) longing a call option written on the share. Both cases generate positive net profits
when $S(T)$ is high. However, the call option provides “protection” when $S(T)$ is low, provided
$C(t) < S(t)$, which is indeed a no-arbitrage condition we demonstrate later. It is this
insurance feature that makes the option economically valuable.
The prices of the call and put options are intimately related by the put-call parity. Let $P(t,T)$
be the time $t$ price of a zero-coupon bond maturing at time $T > t$. We have:
Theorem 10.1 (Put-call parity). Consider a put and a call option with the same exercise
price K and the same expiration date T . Their prices p(t) and C(t) satisfy, p (t) = C (t) −
S (t) + KP (t, T ).
Proof. Consider two portfolios: (A) Long one call, short one underlying asset, and invest
KP (t, T ); (B) Long one put. The table below gives the value of the two portfolios at time t
and at time T .
                 Value at t                    Value at T
                                               S(T) ≤ K          S(T) > K
Portfolio A      C(t) − S(t) + K P(t,T)        −S(T) + K         S(T) − K − S(T) + K = 0
Portfolio B      p(t)                          K − S(T)          0
The two portfolios have the same value in each state of nature at time T . Therefore, their values
at time t must be identical to rule out arbitrage.
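The state-by-state comparison in the proof can be reproduced directly. The following is a minimal sketch; the function names and the grid of terminal prices are ours and purely illustrative.

```python
def portfolio_a_payoff(s_T, strike):
    """Long one call, short one share, and hold bonds paying K at time T."""
    return max(s_T - strike, 0.0) - s_T + strike

def portfolio_b_payoff(s_T, strike):
    """Long one put."""
    return max(strike - s_T, 0.0)

def put_from_call(call, spot, strike, bond):
    """Put price implied by put-call parity, p(t) = C(t) - S(t) + K * P(t, T)."""
    return call - spot + strike * bond

strike = 100.0
for s_T in (0.0, 50.0, 100.0, 150.0):
    print(s_T, portfolio_a_payoff(s_T, strike), portfolio_b_payoff(s_T, strike))
```

The two payoff columns coincide for every terminal price, which is all the no-arbitrage argument needs.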
[Figure 10.1 about here. Six panels plot net profits at expiration against S(T): buy share,
π(S)_T = S(T) − S(t); buy call, π(c)_T = c(T) − c(t); short-sell share, π(−S)_T = S(t) − S(T);
buy put, π(p)_T = p(T) − p(t); sell call, π(−c)_T = c(t) − c(T); sell put, π(−p)_T = p(t) − p(T).]
FIGURE 10.1.
By the put-call parity, properties of European put prices can mechanically be deduced from
those of the corresponding call prices. From now on, we focus our discussion on European calls.
The following result gathers a few basic properties of call prices occurring before the expiration
date.
Theorem 10.2. The call price C (t) = C (S (t) ; K; T − t) satisfies the following properties:
(i) C (S (t) ; K; T − t) ≥ 0; (ii) C (S (t) ; K; T − t) ≥ S(t)−KP (t, T ); and (iii) C (S (t) ; K; T − t)
≤ S (t).
Proof. Part (i) holds because $\Pr\{C(S(T); K; 0) > 0\} > 0$, which implies that $C$ must be
nonnegative at time $t$ to preclude arbitrage opportunities. As regards Part (ii), consider two
portfolios: Portfolio A, buy one call; and Portfolio B, buy one underlying asset and issue debt
for an amount of $KP(t,T)$. The table below gives the value of the two portfolios at time $t$ and
at time $T$.
                 Value at t                    Value at T
                                               S(T) ≤ K          S(T) > K
Portfolio A      C(t)                          0                 S(T) − K
Portfolio B      S(t) − K P(t,T)               S(T) − K          S(T) − K
At time $T$, Portfolio A weakly dominates Portfolio B, since $S(T) - K \leq 0$ whenever $S(T) \leq K$.
Therefore, in the absence of arbitrage, the value of Portfolio A must dominate the value of
Portfolio B at time $t$. To show Part (iii), suppose the
contrary, i.e. $C(t) > S(t)$, which is an arbitrage opportunity. Indeed, at time $t$, we could sell
$m$ options ($m$ large) and buy $m$ units of the underlying asset, thus making a sure profit equal to
$m \cdot (C(t) - S(t))$. At time $T$, the option will be exercised if $S(T) > K$, in which case we shall
sell the underlying assets and obtain $m \cdot K$. If $S(T) \leq K$, the option will not be exercised, and
we will still hold the assets, or sell them and obtain $m \cdot S(T)$. In both cases, the cash flow at
time $T$ is nonnegative.
Theorem 10.2 can be summarized as follows:
max {0, S (t) − KP (t, T )} ≤ C (S (t) ; K; T − t) ≤ S (t) . (10.5)
Eq. (10.5), then, leads to the next result:
Theorem 10.3. We have, (i) $\lim_{S \to 0} C(S; K; T-t) = 0$; (ii) $\lim_{K \to 0} C(S; K; T-t) = S$;
(iii) $\lim_{T \to \infty} C(S; K; T-t) = S$.
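[Figure 10.2 about here. Top panel: the call price c(t) plotted against S(t), with the lines AA and
BB, a 45° slope, and the point K b(t,T) marked on the horizontal axis. Bottom panel: two plots of
c(t) against S(t).]
FIGURE 10.2.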
The previous results provide basic arbitrage bounds that option prices satisfy. Consider the top panel of
Figure 10.2. Eq. (10.5) tells us that $C(t)$ must lie between the AA and the BB lines. Moreover,
by Theorem 10.3(i), $C(t)$ is zero when the price of the asset underlying the contract is zero.
Finally, by Eq. (10.5), the option price goes to infinity as the price of the underlying asset gets
large; but because $C$ cannot lie outside the region bounded by the AA and the BB
lines, $C$ will go to infinity by “sliding up” through the BB line.
How does the option price behave within AA and BB? We cannot tell. Given the boundary
behavior of the option price $C(t)$, we can only say that provided $C(t)$ is convex in $S(t)$, then it
is also increasing in $S$. In this case, $C(t)$ would behave as in the left-hand side of the bottom
panel of Figure 10.2. This case seems to be the most relevant, empirically. It is predicted by
the celebrated Black and Scholes (1973) formula reviewed in Section 10.4. However, this is
not a general property of option prices. Bergman, Grundy and Wiener (1996) show that in
one-dimensional diffusive models, the price of a contingent claim written on a tradable asset
is convex in the underlying asset price if the payoff of the claim is convex in the underlying
asset price (as in the case of a European call option). In our context, the boundary conditions
guarantee that the price of the option is then increasing and convex in the price of the underlying
asset. However, Bergman, Grundy and Wiener provide several counter-examples in which the
price of a call option can be decreasing over some range of the price of the asset underlying
the option contract. These counter-examples include models with jumps, or the models with
stochastic volatility that we shall describe later in this chapter. Therefore, there are no reasons
to exclude that the option price could behave as in the right-hand side of the bottom
panel of Figure 10.2. [We have seen some of these things in Chapter 7, actually, so I don’t need
to overlap too much here, on the contrary, need to show how the thing seen in Chapter 7 can
be used here. Moreover, I have to mention this is a general qualitative thing, and that I have a
more technical treatment in Section 10.5.]
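[Figure 10.3 about here. The call price c(t) plotted against S(t) for three maturity dates, T1, T2
and T3, together with a 45° line and the point K b(t,T) on the horizontal axis.]
FIGURE 10.3.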
By convexity, the option is unlikely to be exercised when $S$ is small. Therefore, changes in
the price of the underlying asset produce little effect on the call price, $C$. However, the option
is likely to be exercised when $S$ is large. In fact, a given percentage increase in $S$ is then
followed by an even higher percentage increase in $C$: the elasticity of the option price with
respect to the asset price is larger than one, $\epsilon \equiv \frac{dC}{dS} \cdot \frac{S}{C} > 1$, since for an increasing and convex
function that is zero at the origin, the first-order derivative is always higher than the slope of the
secant through the origin, $C/S$.
In other words, option returns are more volatile than the returns on the underlying asset. A
final property is that call options are also “wasting assets,” in that their value decreases over
time, as illustrated by a hypothetical example in Figure 10.3, which plots the option price
arising for three maturity dates, $T_1 > T_2 > T_3$.
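Both properties can be illustrated in the Black and Scholes special case reviewed in Section 10.4. The following is a minimal sketch, assuming constant volatility and interest rate; the function names and parameter values are ours.

```python
import math
from statistics import NormalDist

def bs_call_price_and_delta(spot, strike, r, sigma, tau):
    """Black-Scholes call price and its derivative with respect to the spot price."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * tau) / vol
    cdf = NormalDist().cdf
    price = spot * cdf(d1) - strike * math.exp(-r * tau) * cdf(d1 - vol)
    return price, cdf(d1)

def call_elasticity(spot, strike, r, sigma, tau):
    """Elasticity (dC/dS) * (S / C) of the call price with respect to the asset price."""
    price, delta = bs_call_price_and_delta(spot, strike, r, sigma, tau)
    return delta * spot / price

for spot in (80.0, 100.0, 120.0):
    print(spot, round(call_elasticity(spot, 100.0, 0.05, 0.2, 1.0), 2))

# Call prices for three times to expiration, longest first
prices = [bs_call_price_and_delta(100.0, 100.0, 0.05, 0.2, tau)[0] for tau in (1.0, 0.5, 0.25)]
print([round(p, 2) for p in prices])
```

In this special case the elasticity exceeds one at every spot price tried, and the price falls as the time to expiration shrinks.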
The previous properties illustrate in a simple way the general principles underlying a portfolio
aimed to “mimic” the option price. For example, investment banks sell options that they want
to hedge against, to avoid the exposure to losses illustrated in Figure 10.1. As emphasized
further in Section 10.4.4, hedging is important when the only objective is to receive fees from
the sale of derivatives. Then, at the very least, the portfolio that “mimics” the option price
must exhibit the previous properties. For example, suppose we wish our portfolio to exhibit the
behavior in the left-hand side of the bottom panel of Figure 10.2, which, as we argued, is the
most relevant, empirically. We require the portfolio to exhibit a number of properties.
(p-i) The portfolio value, $V$, must be increasing in the underlying asset price, $S$.
(p-ii) The sensitivity of the portfolio value with respect to the underlying asset price must be
strictly positive and bounded by one, $0 < \frac{dV}{dS} < 1$.
(p-iii) The elasticity of the portfolio value with respect to the underlying asset price must be
strictly greater than one, $\frac{dV}{dS} \cdot \frac{S}{V} > 1$.
The previous properties hold under the following conditions:
(c-i) The portfolio includes the asset underlying the option contract.
(c-ii) The number of assets underlying the option contract is less than one.
(c-iii) The portfolio includes debt to create a sufficiently large elasticity. Indeed, let $V = \theta S - D$,
where $\theta$ is the number of assets underlying the option contract, with $\theta \in (0, 1)$, and $D$ is
debt. Then, $\frac{dV}{dS} = \theta$ and, for $V > 0$, $\frac{dV}{dS} \cdot \frac{S}{V} = \theta \cdot \frac{S}{V} > 1 \Leftrightarrow \theta S > V = \theta S - D$,
which holds if and only if $D > 0$.
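Condition (c-iii) is easy to verify with numbers. The following is a minimal sketch, assuming a positive portfolio value; the function name `levered_elasticity` and the inputs are ours.

```python
def levered_elasticity(theta, spot, debt):
    """Elasticity (dV/dS) * (S / V) of the portfolio V = theta * S - D, for V > 0."""
    value = theta * spot - debt
    if value <= 0.0:
        raise ValueError("this sketch assumes a positive portfolio value")
    return theta * spot / value

print(levered_elasticity(theta=0.6, spot=100.0, debt=0.0))    # no debt
print(levered_elasticity(theta=0.6, spot=100.0, debt=50.0))   # with debt
```

Without debt the elasticity equals one regardless of `theta`; it is the debt that pushes it above one.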
In fact, the hedging problem is dynamic in nature, and we would expect θ to be a function
of the underlying asset price, S, and time to expiration. Therefore, we require the portfolio to
display the following additional property:
(p-iv) The number of assets underlying the option contract must increase with $S$. Moreover,
when $S$ is low, the value of the portfolio must be virtually insensitive to changes in
$S$. When $S$ is high, the portfolio must include mainly the assets underlying the option
contract, to make the portfolio value “slide up” through the BB line in Figure 10.2.
The previous property holds under the following condition:
(c-iv) $\theta$ is an increasing function of $S$, with $\lim_{S \to 0} \theta(S) = 0$ and $\lim_{S \to \infty} \theta(S) = 1$.
Finally, the purchase of the option does not entail any additional inflows or outflows until time
to expiration. Therefore, we require that the “mimicking” portfolio display a similar property:
(p-v) The portfolio must be implemented as follows: (i) any purchase of the asset underlying
the option contract must be financed by issuing new debt; and (ii) the proceeds of any sale of the
asset underlying the option contract must be used to shrink the existing debt.
The previous property of the portfolio just says that the portfolio has to be self-financing, in
the sense described in the first Part of these lectures.
(c-v) The portfolio is implemented through a self-financing strategy.
We now proceed to add more structure to the problem.
10.4 Evaluation and hedging

We consider a continuous-time model in which asset prices are driven by a $d$-dimensional Brownian
motion $W$.^1 We consider a multivariate state process
$$dY^{(h)}(t) = \varphi_h(y(t))\, dt + \sum_{j=1}^{d} \ell_{hj}(y(t))\, dW^{(j)}(t),$$
for some functions $\varphi_h$ and $\ell_{hj}$, satisfying the usual regularity conditions.
The price of the primitive assets satisfies the regularity conditions in Chapter 4. The value of
a portfolio strategy, $V$, is $V(t) = \theta(t) \cdot S_+(t)$. We consider a self-financing portfolio. Therefore,
$V$ is solution to
$$dV(t) = \left[\pi(t)^\top \left(\mu(t) - 1_m r(t)\right) + r(t) V(t) - C(t)\right] dt + \pi(t)^\top \sigma(t)\, dW(t), \tag{10.6}$$
where $\pi \equiv (\pi^{(1)}, ..., \pi^{(m)})^\top$, $\pi^{(i)} \equiv \theta^{(i)} S^{(i)}$, $\mu \equiv (\mu^{(1)}, ..., \mu^{(m)})^\top$, $S^{(i)}$ is the price of the $i$-th asset,
$\mu^{(i)}$ is its drift and $\sigma(t)$ is the volatility matrix of the price process. We impose that $V$ satisfy
the same regularity conditions in Chapter 4.
^1 As usual, we let $\{\mathcal{F}(t)\}_{t \in [0,T]}$ be the $P$-augmentation of the natural filtration
$\mathcal{F}^W(t) = \sigma(W(s), s \leq t)$ generated by $W$, with $\mathcal{F} = \mathcal{F}(T)$.
10.4.1 Spanning and cloning

A set of securities “spans” a given vector space, say a set of payoffs, if any point in that space can
be generated by a linear combination of the security prices. The set of payoffs may include those
promised by a contingent claim, say e.g. that promised by a European call, or final consumption,
as in Harrison and Kreps (1979) and Duffie and Huang (1985) (see Chapter 4). Chapter 4 relies
on this “spanning” property to solve for consumption-portfolio choices through martingale
techniques. This section emphasizes how spanning helps define replicating strategies that
lead to pricing “redundant” assets.
Similarly as in Chapter 4, let $V^{x,\pi}(t)$ denote the solution to Eq. (10.6) when the initial wealth
is $x$, the portfolio policy is $\pi$, and the intermediate consumption is $C \equiv 0$. We say that the
portfolio policy $\pi$ spans $\mathcal{F}(T)$ if $V^{x,\pi}(T) = \tilde{X}$ almost surely, where $\tilde{X}$ is any square-integrable
$\mathcal{F}(T)$-measurable random variable.
Chapter 4 characterizes security spanning using the risk-adjusted probability, $Q$. In this
chapter, we work out the implications of security spanning under the physical probability, $P$.
In a diffusion environment, asset prices are semimartingales under $P$. More generally, consider
the following representation of a $\mathcal{F}(t)$-$P$ semimartingale,
$$dA(t) = dF(t) + \tilde{\gamma}(t)\, dW(t), \tag{10.7}$$
where $F$ is a process with finite variation, and $\tilde{\gamma} \in L^2_{0,T,d}(\Omega, \mathcal{F}, P)$. We wish to replicate $A$
through a portfolio. First, then, we must look for a portfolio $\pi$ satisfying
$$\tilde{\gamma}(t) = \pi^\top(t)\, \sigma(t). \tag{10.8}$$
Second, we equate the drift of $V$ to the drift of $F$, obtaining,
$$\frac{dF(t)}{dt} = \pi(t)^\top \left(\mu(t) - 1_m r(t)\right) + r(t) V(t) = \pi(t)^\top \left(\mu(t) - 1_m r(t)\right) + r(t) F(t). \tag{10.9}$$
The second equality holds because if drift and diffusion terms of $F$ and $V$ are identical, then
$F(t) = V(t)$.
Clearly, if $m < d$, there are no solutions for $\pi$ in Eq. (10.8). The economic interpretation is that
in this case, the number of assets is so small that we cannot create a portfolio able to replicate
all possible events in the future. Mathematically, if $m < d$, then $V^{x,\pi}(T) \in M \subset L^2(\Omega, \mathcal{F}, P)$.
As Chapter 4 emphasizes, there is also a converse to this result, which motivates the definition
of market incompleteness given in Chapter 4 (Definition 4.5).
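Let $H(t)$ be the price of a European call option, which we take to be rationally formed, in that
$H(t) = C(t, y(t))$, for some $C \in C^{1,2}\left([0,T) \times \mathbb{R}^k\right)$. By Itô’s lemma,
$$dC = \bar{\mu}_C C\, dt + (C_Y \cdot J)\, dW,$$
where $\bar{\mu}_C C = \frac{\partial C}{\partial t} + \sum_{l=1}^{k} \frac{\partial C}{\partial y_l} \varphi_l(t, y) + \frac{1}{2} \sum_{l,j=1}^{k} \frac{\partial^2 C}{\partial y_l \partial y_j} \mathrm{cov}(y_l, y_j)$; $C_Y$ is $1 \times d$, and $J$ is $d \times d$.
Finally,
$$C(T, y) = \tilde{X} \in L^2(\Omega, \mathcal{F}, P).$$
In this context, $\bar{\mu}_C C$ and $C_Y \cdot J$ are the same as $dF/dt$ and $\tilde{\gamma}$ in Eqs. (10.7) and (10.9). In
particular, we identify the volatility in Eq. (10.8) as $C_Y J = \pi^\top \sigma$.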
10.4.2 Black & Scholes

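Let $m = d = 1$, and suppose the only state variable is the price of a stock, and that $\varphi(s) = \mu s$,
$\mathrm{cov}(s) = \sigma^2 s^2$, and $\mu$ and $\sigma^2$ are constants. Then $C_Y \cdot J = C_S \sigma S$, $\pi = C_S S$, and by Eq. (10.9),
$$\frac{\partial C}{\partial t} + \frac{\partial C}{\partial S} \mu S + \frac{1}{2} \frac{\partial^2 C}{\partial S^2} \sigma^2 S^2 = \pi (\mu - r) + rC = \frac{\partial C}{\partial S} S (\mu - r) + rC, \tag{10.10}$$
subject to the boundary condition, $C(T, s) = (s - K)^+$. The solution is, then, the celebrated
Black and Scholes (1973) formula,
$$C(t, S) = S \Phi(d_1) - K e^{-r(T-t)} \Phi\left(d_1 - \sigma \sqrt{T-t}\right), \quad d_1 = \frac{\ln\left(\frac{S}{K}\right) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma \sqrt{T-t}}, \tag{10.11}$$
where $\Phi$ denotes the cumulative Normal distribution.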
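Eq. (10.11) is straightforward to evaluate. The following is a minimal sketch using only the Python standard library; the function name `black_scholes_call` and the parameter values are ours, and `tau` stands for $T - t > 0$. The last lines check the price against the no-arbitrage bounds in Eq. (10.5).

```python
import math
from statistics import NormalDist

def black_scholes_call(spot, strike, r, sigma, tau):
    """Black and Scholes (1973) call price, Eq. (10.11), with tau = T - t > 0."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * tau) / vol
    cdf = NormalDist().cdf                      # standard Normal cumulative distribution
    return spot * cdf(d1) - strike * math.exp(-r * tau) * cdf(d1 - vol)

price = black_scholes_call(spot=100.0, strike=100.0, r=0.05, sigma=0.2, tau=1.0)
lower_bound = max(0.0, 100.0 - 100.0 * math.exp(-0.05 * 1.0))
print(round(price, 4), lower_bound <= price <= 100.0)
```

Note that the drift of the stock price is not an argument of the function, in line with the formula.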

Note that Eq. (10.11) holds even without requiring that a market exists for the option,^2 or that
the pricing function $C(t, S)$ is differentiable. As it turns out, the option price is differentiable,
but this can be shown to be a result, not just an assumption. Indeed, let us define the function
$C(t, S)$ that solves Eq. (10.10), with boundary condition $C(T, S) = (S - K)^+$. Note, we are
not assuming this function is the option price. Rather, we shall show this is the option price.
Consider a self-financed portfolio of bonds and stocks, with $\pi = C_S S$. Its value satisfies,
$$dV = \left[C_S S (\mu - r) + rV\right] dt + C_S \sigma S\, dW.$$
Moreover, by Itô’s lemma, $C(t, S)$ is solution to
$$dC = \left[C_t + \mu S C_S + \frac{1}{2} \sigma^2 S^2 C_{SS}\right] dt + C_S \sigma S\, dW.$$
By subtracting the previous two equations,
$$dV - dC = \Big[\underbrace{-C_t - rS C_S - \frac{1}{2} \sigma^2 S^2 C_{SS}}_{= -rC} + rV\Big] dt = r (V - C)\, dt.$$
Hence, we have that $V(\tau) - C(\tau, S(\tau)) = [V(0) - C(0, S(0))] \exp(r\tau)$, for all $\tau \in [0, T]$.
Next, assume that $V(0) = C(0, S(0))$. Then, $V(\tau) = C(\tau, S(\tau))$ and $V(T) = C(T, S(T)) =
(S(T) - K)^+$. That is, the portfolio $\pi = C_S S$ replicates the payoff underlying the option
contract. Therefore, by absence of arbitrage, $V(\tau)$ equals the market price of the option. But
$V(\tau) = C(\tau, S(\tau))$, so the function $C$ is indeed the option price.
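The replication argument holds in continuous time. A discrete-time simulation gives a feel for it: the following is a minimal sketch, assuming the stock follows a geometric Brownian motion with drift `mu` under the physical probability, and that the portfolio is rebalanced `n_steps` times. All names and parameter values are ours; the hedging error is not exactly zero because rebalancing is discrete.

```python
import math
import random
from statistics import NormalDist

CDF = NormalDist().cdf

def bs_call_and_delta(spot, strike, r, sigma, tau):
    """Black-Scholes call price and delta C_S, for tau = T - t > 0."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * tau) / vol
    price = spot * CDF(d1) - strike * math.exp(-r * tau) * CDF(d1 - vol)
    return price, CDF(d1)

def hedging_error(spot, strike, r, mu, sigma, T, n_steps, rng):
    """Terminal value of the self-financed portfolio minus the call payoff."""
    dt = T / n_steps
    value, _ = bs_call_and_delta(spot, strike, r, sigma, T)   # V(0) = C(0, S(0))
    for k in range(n_steps):
        _, delta = bs_call_and_delta(spot, strike, r, sigma, T - k * dt)
        shock = rng.gauss(0.0, 1.0)
        new_spot = spot * math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * shock)
        # shares earn the price change; the residual (possibly negative) cash earns r
        value += delta * (new_spot - spot) + r * (value - delta * spot) * dt
        spot = new_spot
    return value - max(spot - strike, 0.0)

rng = random.Random(0)
errors = [hedging_error(100.0, 100.0, 0.05, 0.10, 0.2, 1.0, 500, rng) for _ in range(50)]
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(round(mean_abs_error, 3))
```

The average absolute error should be small relative to the initial option price, and it shrinks as `n_steps` grows.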
10.4.3 Surprising cancellations and “preference-free” formulae

Due to what Heston (1993a) (p. 933) terms “a surprising cancellation”, the constant µ doesn’t
show up in the final formula. Heston (1993a) shows that this property is not robust to modifica-
tions in the assumptions for the underlying asset price process. Gamma processes, incomplete
markets.
2 The original derivation of Black and Scholes (1973) and Merton (1973) relies on the assumption that an option market exists
(see the Appendix to this chapter).

10.4.4 Hedging
The “cloning” arguments suggest themselves as a mechanism to replicate derivative instruments
through the assets underlying the derivative contract. Why do derivatives need to
be replicated, in practice? Because most of them are dealt with by investment banks, which
simply act as financial intermediaries, trading derivatives on behalf of third parties, being
compensated through fees. Suppose, for instance, an investment bank receives an order to sell a
put. Then, the bank would want to hedge against this put, by creating a replicating portfolio
such that the value of this portfolio is the same as the final payoff the investment bank has to
pay to its buyer to honour its sale. So hedging is needed to replicate the final payoffs required
to honour the contracts giving rise to these payoffs.
Naturally, investment banks can undertake speculative trading activities, aimed at taking
views, such as those described in Section 10.5.4 below, in which case hedging does not have to
be implemented, in general. However, even in this case, hedging might be required to isolate the
particular views a trading desk of the bank is taking. For example, Section 10.5.4 will explain
that to express the view that equity volatility will rise, say, we cannot simply buy European
options, because option prices respond both to volatility and to the price of the asset underlying
the option. A better solution, then, is to go long an option, delta-hedged through Black-Scholes.
10.4.5 Endogenous volatility

Hedges and crashes. The presence of delta-hedging can lead to financial turmoil. Brady com-
mission.
Gamma is always positive for long calls and puts, as these contracts have positive convexity,
as illustrated by Figure 10.1. Naturally, short calls and puts have negative gamma. In order
for the statement “when gamma is negative, delta hedging involves buying on the way up and
selling on the way down” to be true, we also have to consider whether the delta is positive
or not (that is, whether the derivative price is increasing or decreasing in the underlying asset
price). So we have four cases:
(i) Positive gamma: Buying on the way up and selling on the way down.
(i.1) Positive delta (as in long call). Positive delta means buying the assets needed to
implement the hedging portfolio. When the price of these assets is up, then,
the delta is also up, which implies we need to keep on buying even more of the assets
underlying the hedging portfolio. On the other hand, when prices are down, the delta
is also down, which implies holding less of the assets underlying the hedging portfolio,
thereby leading us to sell some of these assets precisely when the market is down.
(i.2) Negative delta (as in long put). Negative delta means selling the assets comprising
the hedging portfolio. In this case, delta is up when prices are up. However,
this now simply means that we need to sell less! That is, we need to buy back some
of the assets underlying the hedging portfolio. When, instead, prices are down, delta
is also down, which means we need to sell even more into a depressed market.
(ii) Negative gamma: Buying on the way down and selling on the way up.
(ii.1) Positive delta (as in short put). We are buying assets to implement the hedging
portfolio. Negative gamma now means that as soon as the price of these assets goes
up (resp. down), we need to buy less (resp. buy more), so we sell when prices go up
and buy when prices go down.
(ii.2) Negative delta (as in short call). We are selling assets comprising the hedging port-
folio. Negative gamma, here, means that as the price of these assets goes up (resp.
down), we need to sell more (resp. sell less), so once again, we sell when prices go up
and buy when prices go down.
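The four cases can be tabulated by computing how the number of shares in the replicating portfolio changes after a price move. The following is a minimal sketch in the Black-Scholes special case, assuming no dividends so that the put delta equals the call delta minus one; the function names, the dictionary keys and the default parameter values are ours.

```python
import math
from statistics import NormalDist

def call_delta(spot, strike, r, sigma, tau):
    """Black-Scholes delta of a European call."""
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return NormalDist().cdf(d1)

def replication_trade(position, spot, new_spot, strike=100.0, r=0.0, sigma=0.2, tau=0.5):
    """Change in the number of shares held by a portfolio replicating `position`."""
    def holding(s):
        c_delta = call_delta(s, strike, r, sigma, tau)
        return {"long call": c_delta, "long put": c_delta - 1.0,
                "short put": 1.0 - c_delta, "short call": -c_delta}[position]
    return holding(new_spot) - holding(spot)

for position in ("long call", "long put", "short put", "short call"):
    up = replication_trade(position, 100.0, 105.0)
    down = replication_trade(position, 100.0, 95.0)
    print(position, "buy" if up > 0 else "sell", "on the way up,",
          "buy" if down > 0 else "sell", "on the way down")
```

Replicating a positive-gamma position means adding shares as the price rises, and replicating a negative-gamma position means the opposite, whatever the sign of delta.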
10.4.6 Properties of options in diffusive models

10.4.6.1 Price reaction to random changes in the state variables
We now derive some general properties of option prices arising in the context of diffusion
processes. The discussion in this section hinges upon the seminal contribution of Bergman,
Grundy and Wiener (1996). At the same time, Eqs. (10.14), (10.15) and (10.16) below can be
seen as particular cases of general results provided in Chapter 7.
We take as primitive the price of a share, solution to:
$$\frac{dS(t)}{S(t)} = \mu(S(t))\, dt + \sigma(S(t))\, dW(t), \quad \sigma(s) = \frac{\sqrt{2 v(s)}}{s}, \tag{10.12}$$
and develop some properties of a European-style option price at time $t$, denoted as $C(S(t), t, T)$,
where $T$ is the expiration date. Let the payoff of the option be the function $\psi(S)$, where $\psi$ satisfies
$\psi'(S) > 0$. In the absence of arbitrage, $C$ satisfies the following partial differential equation
$$
\begin{cases}
0 = C_t + C_S\, rS + C_{SS}\, v(S) - rC & \text{for all } (\tau, S) \in [t, T) \times \mathbb{R}_{++} \\
C(S, T, T) = \psi(S) & \text{for all } S \in \mathbb{R}_{++}
\end{cases} \tag{10.13}
$$
Let us differentiate the previous partial differential equation with respect to $S$. The terms
$rC_S$ and $-rC_S$ cancel, and the result is that $H \equiv C_S$ satisfies another partial differential equation,
$$
\begin{cases}
0 = H_t + \left(rS + v'(S)\right) H_S + H_{SS}\, v(S) & \text{for all } (\tau, S) \in [t, T) \times \mathbb{R}_{++} \\
H(S, T, T) = \psi'(S) > 0 & \text{for all } S \in \mathbb{R}_{++}
\end{cases} \tag{10.14}
$$
By technical results for partial differential equations reviewed in Chapter 7 (Appendix 1), we
have that $H(S, \tau, T) > 0$ for all $(\tau, S) \in [t, T] \times \mathbb{R}_{++}$. That is, in the scalar diffusion setting,
the option price is always increasing in the underlying asset price.
Next, let us tilt the asset price volatility: consider two markets A and B with prices $(C^i, S^i)_{i=A,B}$,
with the asset price volatility being larger in market A than in market B, viz
$$\frac{dS^i(\tau)}{S^i(\tau)} = r\, d\tau + \sigma^i\left(S^i(\tau)\right) d\hat{W}(\tau), \quad i = A, B,$$
where $\hat{W}$ is Brownian motion under the risk-neutral probability, $\sigma^i$ is as $\sigma$ in Eq. (10.12), with
$v^i$ defined accordingly, and $\sigma^A(s) > \sigma^B(s)$, for all $s$. It is easy to see that the price difference,
$\Delta C \equiv C^A - C^B$, satisfies,
$$
\begin{cases}
0 = \Delta C_\tau + rS\, \Delta C_S + \Delta C_{SS} \cdot v^A(S) - r \Delta C + \left(v^A(S) - v^B(S)\right) C^B_{SS} & \text{for all } (\tau, S) \in [t, T) \times \mathbb{R}_{++} \\
\Delta C(S, T, T) = 0 & \text{for all } S
\end{cases} \tag{10.15}
$$
By the same results reviewed in Chapter 7 (Appendix 1), used to analyze Eq. (10.14), we have
that $\Delta C > 0$ whenever $C^B_{SS} > 0$. Therefore, it follows that if option prices are convex in the
underlying asset price, then they are also always increasing in the volatility of the underlying
asset price. Volatility changes are a mean-preserving spread in this context. We are left to show
that $C_{SS} > 0$. Let us differentiate Eq. (10.14) with respect to $S$. The result is that $Z \equiv H_S =
C_{SS}$ satisfies the following partial differential equation,
$$
\begin{cases}
0 = Z_\tau + \left(rS + 2v'(S)\right) Z_S + Z_{SS}\, v(S) + \left(r + v''(S)\right) Z & \text{for all } (\tau, S) \in [t, T) \times \mathbb{R}_{++} \\
Z(S, T, T) = \psi''(S) & \text{for all } S \in \mathbb{R}_{++}
\end{cases} \tag{10.16}
$$
By the usual results in Chapter 7 (Appendix 1), we have, again, that $Z(S, \tau, T) > 0$ for all
$(\tau, S) \in [t, T] \times \mathbb{R}_{++}$, whenever $\psi''(S) > 0$ $\forall S \in \mathbb{R}_{++}$. That is, in the scalar diffusion setting, the
option price is always convex in the underlying asset price if the terminal payoff is convex in the
underlying asset price. In other terms, the convexity of the terminal payoff propagates to the
convexity of the pricing function. Therefore, if the terminal payoff is convex in the underlying
asset price, then the option price is always increasing in the volatility of the underlying asset
price.
10.4.6.2 Passage of time
Sometimes we claim that options are “wasting” assets, in that their value decreases over time,
due to a decrease in the value of optionality. For call options, this is definitely true, at least
within diffusive models. By the first equation in (10.13), we have:
$$C_t = -rC(\epsilon - 1) - C_{SS}\, v(S) < 0, \tag{10.17}$$
where $\epsilon \equiv C_S \frac{S}{C}$ is the elasticity of the option price with respect to the asset price, which
for a call option is larger than one, as noted in Section 10.3. However, for a put option, this
elasticity is negative, and can make the right hand side of Eq. (10.17) change sign, especially
for deep in-the-money options.
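The sign of the time derivative can be explored numerically in the Black-Scholes special case. The following is a minimal sketch, assuming constant volatility and rate and using a central finite difference in calendar time; the function names and parameter values are ours. In this special case, the sign change for puts shows up when the put is deep in the money.

```python
import math
from statistics import NormalDist

def bs_call(spot, strike, r, sigma, tau):
    """Black-Scholes call price, with tau = T - t > 0."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * tau) / vol
    cdf = NormalDist().cdf
    return spot * cdf(d1) - strike * math.exp(-r * tau) * cdf(d1 - vol)

def bs_put(spot, strike, r, sigma, tau):
    """European put price obtained from the call through put-call parity."""
    return bs_call(spot, strike, r, sigma, tau) - spot + strike * math.exp(-r * tau)

def calendar_time_derivative(price_fn, spot, strike, r, sigma, tau, h=1e-5):
    """Central-difference estimate of the derivative with respect to calendar time t.

    A later t means a shorter time to expiration tau = T - t, hence the ordering below.
    """
    later = price_fn(spot, strike, r, sigma, tau - h)
    earlier = price_fn(spot, strike, r, sigma, tau + h)
    return (later - earlier) / (2.0 * h)

for spot in (50.0, 100.0, 150.0):
    c_t = calendar_time_derivative(bs_call, spot, 100.0, 0.05, 0.2, 1.0)
    p_t = calendar_time_derivative(bs_put, spot, 100.0, 0.05, 0.2, 1.0)
    print(spot, round(c_t, 4), round(p_t, 4))
```

With a strike of 100, the call loses value with the passage of time at all three spot prices, while the put gains value over time when the spot is 50.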
10.4.6.3 Recovering risk-neutral probabilities
Consider the price of a European call,
$$C(S(t), t, T; K) = P(t, T) \int_0^\infty \left[S(T) - K\right]^+ dQ\left(S(T) \mid S(t)\right) = P(t, T) \int_K^\infty (x - K)\, q\left(x \mid S(t)\right) dx,$$
where $Q$ is the risk-neutral probability and $q(x^+ \mid x)\, dx \equiv dQ(x^+ \mid x)$. Assuming that
$\lim_{x \to \infty} x\, q(x \mid S) = 0$, and differentiating with respect to $K$ leaves:
$$e^{r(T-t)} \frac{\partial C(S(t), t, T; K)}{\partial K} = -\int_K^\infty q\left(x \mid S(t)\right) dx.$$
Let us differentiate again,
$$e^{r(T-t)} \frac{\partial^2 C(S(t), t, T; K)}{\partial K^2} = q\left(K \mid S(t)\right). \tag{10.18}$$
Eq. (10.18) allows one to “recover” the risk-neutral density using option prices. The Arrow-
Debreu state density, $AD(S^+ = u \mid S(t))$, is the discounted risk-neutral density, and is given by,
$$AD\left(S^+ = u \mid S(t)\right) = e^{-r(T-t)}\, q\left(S^+ \mid S(t)\right)\Big|_{S^+ = u} = \frac{\partial^2 C(S(t), t, T; K)}{\partial K^2}\Big|_{K = u}.$$
These results are quite useful in applied work. They also help deal with the pricing of volatility
contracts reviewed in Section 10.6, as explained in Appendix 3.
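Eq. (10.18) can be tried out on prices generated by a known model. The following is a minimal sketch, assuming Black-Scholes call prices, so that the recovered density can be compared with the lognormal risk-neutral density; the second derivative is approximated by central differences with a strike spacing `dk`. All names and parameter values are ours. With market data, one would replace `bs_call` with observed or interpolated call prices.

```python
import math
from statistics import NormalDist

def bs_call(spot, strike, r, sigma, tau):
    """Black-Scholes call price, used here only to generate option prices."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * tau) / vol
    cdf = NormalDist().cdf
    return spot * cdf(d1) - strike * math.exp(-r * tau) * cdf(d1 - vol)

def implied_density(strike, spot, r, sigma, tau, dk=0.01):
    """Eq. (10.18): q(K | S(t)) = exp(r * tau) * d^2 C / dK^2, by central differences."""
    c_up = bs_call(spot, strike + dk, r, sigma, tau)
    c_mid = bs_call(spot, strike, r, sigma, tau)
    c_down = bs_call(spot, strike - dk, r, sigma, tau)
    return math.exp(r * tau) * (c_up - 2.0 * c_mid + c_down) / dk**2

def lognormal_density(x, spot, r, sigma, tau):
    """Risk-neutral density of S(T) when S is a geometric Brownian motion."""
    mean = math.log(spot) + (r - 0.5 * sigma**2) * tau
    return NormalDist(mean, sigma * math.sqrt(tau)).pdf(math.log(x)) / x

for k in (80.0, 100.0, 120.0):
    print(k, implied_density(k, 100.0, 0.05, 0.2, 1.0), lognormal_density(k, 100.0, 0.05, 0.2, 1.0))
```

The two columns should agree up to the finite-difference approximation error.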
10.5 Stochastic volatility

10.5.1 Statistical models of changing volatility
A prominent step in empirical finance was the understanding that financial returns exhibit
both temporal dependence in their second order moments and heavy-peaked and tailed
distributions. This empirical feature of financial returns has been known at least since the
seminal work of Mandelbrot (1963) and Fama (1965). However, it was only with the introduction of
the autoregressive conditionally heteroscedastic (ARCH) model of Engle (1982) and Bollerslev (1986)
that econometric models of changing volatility began to be intensively fitted to data.
An ARCH model works as follows. Let $\{y_t\}_{t=1}^{N}$ be a record of observations on some asset
returns. That is, $y_t = \ln(S_t/S_{t-1})$, where $S_t$ is the asset price, and where we are ignoring
dividend issues. The empirical evidence suggests that the dynamics of $y_t$ are well-described by the
following model:
$$y_t = a + \epsilon_t, \quad \epsilon_t|F_{t-1} \sim N(0, \sigma_t^2), \quad \sigma_t^2 = w + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \qquad (10.19)$$
where a, w, α and β are parameters and Ft denotes the information set as of time t. This model
is known as the GARCH(1,1) model (Generalized ARCH). It was introduced by Bollerslev
(1986), and collapses to the ARCH(1) model introduced by Engle (1982) once we set β = 0.
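The recursion in Eq. (10.19) can be sketched in a few lines of code. The example below simulates a GARCH(1,1) path; it is a minimal illustration, not an estimation routine. The function name `simulate_garch11`, the choice of starting the recursion at the unconditional variance $w/(1-\alpha-\beta)$ (which requires $\alpha+\beta<1$), and the parameter values are assumptions made here for illustration.

```python
import math
import random

def simulate_garch11(n, a, w, alpha, beta, seed=0):
    """Simulate y_t = a + eps_t, sigma2_t = w + alpha*eps_{t-1}^2 + beta*sigma2_{t-1}."""
    if alpha + beta >= 1.0:
        raise ValueError("this sketch starts at the unconditional variance: needs alpha + beta < 1")
    rng = random.Random(seed)
    sigma2 = w / (1.0 - alpha - beta)          # assumed starting value
    eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
    y, variances = [a + eps], [sigma2]
    for _ in range(n - 1):
        # sigma2_t is known given the information available at t-1
        sigma2 = w + alpha * eps ** 2 + beta * sigma2
        # eps_t | F_{t-1} ~ N(0, sigma2_t)
        eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        y.append(a + eps)
        variances.append(sigma2)
    return y, variances

if __name__ == "__main__":
    returns, variances = simulate_garch11(1000, a=0.0, w=1e-6, alpha=0.05, beta=0.90)
    print(max(variances), min(variances))
```

Setting `beta=0.0` gives a path from the ARCH(1) model, in line with the remark above.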
ARCH models have played a prominent role in the analysis of many aspects of financial
econometrics, such as the term structure of interest rates, the pricing of options, or the presence
of time varying risk premiums in the foreign exchange market. The classic survey is that in
Bollerslev, Engle and Nelson (1994).
The quintessence of ARCH models is to make volatility dependent on the variability of past
observations. An alternative formulation, initiated by Taylor (1986), makes volatility driven
by some unobserved components. This formulation gives rise to the stochastic volatility model.
Consider, for example, the following stochastic volatility model,
$$y_t = a + \epsilon_t, \quad \epsilon_t|F_{t-1} \sim N(0, \sigma_t^2);$$
$$\ln\sigma_t^2 = w + \alpha\ln\epsilon_{t-1}^2 + \beta\ln\sigma_{t-1}^2 + \eta_t; \quad \eta_t|F_{t-1} \sim N(0, \sigma_\eta^2),$$
where $a$, $w$, $\alpha$, $\beta$ and $\sigma_\eta^2$ are parameters. The main difference between this model and the
GARCH(1,1) model in Eq. (10.19) is that the volatility as of time $t$, $\sigma_t^2$, is not predetermined
by the past forecast error, $\epsilon_{t-1}$. Rather, this volatility depends on the realization of the stochastic
volatility shock $\eta_t$ at time $t$. This makes the stochastic volatility model considerably richer than
a simple ARCH model. As for the ARCH models, SV models have also been intensively used,
especially following the progress accomplished in the corresponding estimation techniques. The
seminal contributions related to the estimation of this kind of models are mentioned in Mele
and Fornari (2000). Early contributions that relate changes in volatility of asset returns to
economic intuition include Clark (1973) and Tauchen and Pitts (1983), who assume that a
stochastic process of information arrival generates a random number of intraday changes of the
asset price.
10.5.2 Implied volatility and smiles

Parallel to the empirical research into asset returns volatility, practitioners and academics re-
alized that the assumption of constant volatility underlying the Black and Scholes (1973) and
Merton (1973) formulae was too restrictive. The Black-Scholes model assumes that the price of
the asset underlying the option contract follows a geometric Brownian motion,
$$\frac{dS(t)}{S(t)} = \mu\,dt + \sigma\,dW(t),$$
where W is a Brownian motion, and µ, σ are constants. As explained earlier, σ is the only
parameter to enter the Black-Scholes-Merton formulae.
The assumption that σ is constant is inconsistent with the empirical evidence reviewed in
the previous section. This assumption is also inconsistent with the empirical evidence on the
cross-section of option prices. Let $C_{BS}(S(t), t; K, T, \sigma)$ be the option price predicted by the
Black-Scholes formula, when the stock price is $S(t)$, the option contract has a strike price equal to $K$,
and the maturity is $T$, and let the market price be $C_t^{\$}(K, T)$. Then, empirically, the “implied
volatility”, i.e. the value of $\sigma$ that equates the Black-Scholes formula to the market price of the
option, IV say,
$$C_{BS}(S(t), t; K, T, \mathrm{IV}) = C_t^{\$}(K, T) \qquad (10.20)$$
depends on the “moneyness of the option,” defined as,
$$mo \equiv \frac{S(t)\,e^{r(T-t)}}{K},$$
where $r$ is the short-term rate, $K$ is the strike of the option, and $T$ is the maturity date of
the option contract. By the results in Section 10.4.1, we know the Black-Scholes option price
is strictly increasing in $\sigma$. Therefore, the previous definition makes sense, in that there exists
a unique value IV such that Eq. (10.20) holds true. In fact, the market practice is to quote
options in terms of implied volatilities, not prices. Moreover, this same implied volatility relates
to both the call and the put option prices. Consider the put-call parity in Theorem 10.1,
$$P_t(K, T) = C_t(K, T) - S(t) + Ke^{-r(T-t)}.$$
Naturally, for each $\sigma$, this same equation must necessarily hold for the Black-Scholes model,
i.e. $P_{BS}(S(t), t; K, T, \sigma) = C_{BS}(S(t), t; K, T, \sigma) - S(t) + Ke^{-r(T-t)}$. Subtracting this equation
from the previous one gives $P_t - P_{BS} = C_t - C_{BS}$, so that the value of $\sigma$ that zeroes one side
also zeroes the other: the implied volatilities of a call and a put option with the same strike and
maturity are the same.
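The definition in Eq. (10.20) and the put-call parity argument can be illustrated with a short numerical sketch. Because the Black-Scholes price is strictly increasing in $\sigma$, a simple bisection is enough to invert it. The function names, the bisection bracket `[lo, hi]` and the parameter values below are illustrative assumptions; practical implementations typically use faster root-finders.

```python
import math
from statistics import NormalDist

_norm = NormalDist()

def bs_call(S, K, r, tau, sigma):
    """Black-Scholes call price."""
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / sd
    return S * _norm.cdf(d1) - K * math.exp(-r * tau) * _norm.cdf(d1 - sd)

def implied_vol_call(price, S, K, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Solve C_BS(sigma) = price by bisection; works because C_BS is increasing in sigma."""
    if not bs_call(S, K, r, tau, lo) <= price <= bs_call(S, K, r, tau, hi):
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, tau, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def implied_vol_put(price, S, K, r, tau):
    """Convert the put into a call through put-call parity, then invert the call formula."""
    call_price = price + S - K * math.exp(-r * tau)
    return implied_vol_call(call_price, S, K, r, tau)

if __name__ == "__main__":
    S, K, r, tau, sigma = 100.0, 110.0, 0.03, 0.75, 0.25
    c = bs_call(S, K, r, tau, sigma)
    p = c - S + K * math.exp(-r * tau)      # put price implied by parity
    print(implied_vol_call(c, S, K, r, tau), implied_vol_put(p, S, K, r, tau))
```

Both inversions return the same number, which is the point made in the text: a call and a put with the same strike and maturity share one implied volatility.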
The crucial empirical point is that the IV exhibits a pattern. Before 1987, it did not display
a clear pattern, or a ∪-shaped pattern in $1/mo$ at best, a “smile.” After the 1987 crash, the smile
turned into a “smirk,” also referred to as “volatility skew.”
What are the origins of this empirical regularity? One plausible explanation is that options (be
they calls or puts) that are deep-in-the-money and options (be they calls or puts) that are
deep-out-of-the-money are relatively less liquid and therefore command a liquidity risk-premium.
Since the Black-Scholes option price is increasing in volatility, the implied volatility is, then,
∪-shaped in $1/mo$.
A second explanation relates to the Black-Scholes assumption that asset returns are log-
normally distributed. This assumption may not be correct, as the market might be pricing using
an alternative distribution. One possibility is that such an alternative distribution puts more
weight on the tails, as a result of the market fears about the occurrence of extreme outcomes.
For example, the market might fear the stock price will decrease under a certain level, say K.
As a result, the market density should then have a left tail thicker than that of the log-normal
density, for values of S < K. This implies that the probability deep-out-of-the-money puts (i.e.,
those with low strike prices) will be exercised is higher under the market density than under the
log-normal density. In other words, the volatility needed to price deep-out-of-the-money puts is
larger than that needed to price at-the-money calls and puts.
At the other extreme, if the market fears that the stock price will be above some K̄, then, the
market density should exhibit a right tail thicker than that of the log-normal density, for values
of S > K̄, which implies a larger probability (compared to the log-normal) that deep-out-of-the-
money calls (i.e., those with high strike prices) will be exercised. Then, the implied volatility
needed to price deep-out-of-the-money calls is larger than that needed to price at-the-money
calls and puts. The second effect has disappeared since the 1987 crash, leaving the “smirk.”
Ball and Roma (1994) and Renault and Touzi (1996) were the first to note that a smile effect
arises when the asset return exhibits stochastic volatility. In continuous time,3
$$\frac{dS(t)}{S(t)} = \mu\,dt + \sigma(t)\,dW(t), \qquad d\sigma^2(t) = b(S(t), \sigma(t))\,dt + a(S(t), \sigma(t))\,dW^\sigma(t), \qquad (10.21)$$
where $W^\sigma$ is another Brownian motion, and $b$ and $a$ are some functions satisfying the usual
regularity conditions. Let us suppose that Eqs. (10.21) constitute the data
generating process. Then, the fundamental theorem of asset pricing (FTAP, henceforth) tells
us that there is a probability equivalent to $P$, $Q$ say (the risk-neutral probability), such that
the rational option price $C(S(t), \sigma^2(t), t, T)$ is given by,
$$C(S(t), \sigma^2(t), t, T) = e^{-r(T-t)}\, E\left[(S(T) - K)^+ \,\middle|\, S(t), \sigma^2(t)\right],$$
where $E[\cdot]$ is the expectation taken under the probability $Q$. Next, if we continue to assume
that option prices are really given by the previous formula, then inverting the Black-Scholes
formula produces a “constant” volatility that is ∪-shaped with respect to $K$.
The first option pricing models with stochastic volatility were developed by Hull and White
(1987), Scott (1987) and Wiggins (1987). Explicit solutions have always proved hard to derive.
If we exclude the approximate solution provided by Hull and White (1987) or the analytical
solution provided by Heston (1993b),4 we typically need to derive the option price through
some numerical methods based on Monte Carlo simulation or the numerical solution to partial
differential equations.
In addition to these important computational details, models with stochastic volatility raise
serious economic concerns. Typically, the presence of stochastic volatility generates market
incompleteness. As we pointed out earlier, market incompleteness means that we cannot hedge
against future contingencies. In our context, market incompleteness arises because the number
of the assets available for trading (one) is less than the number of sources of risk (i.e. the two
Brownian motions).5 In our option pricing problem, there are no portfolios including only the underlying
asset and a money market account that could replicate the value of the option at the expiration
date. Precisely, let $C$ be the rationally formed price at time $\tau$, i.e. $C(\tau) = C(S(\tau), \sigma^2(\tau), \tau, T)$,
where $\sigma^2(\tau)$ is driven by a Brownian motion $W^\sigma$, which is different from $W$. The value of the
portfolio that only includes the underlying asset is only driven by the Brownian motion driving
the underlying asset price, i.e. it does not include $W^\sigma$. Therefore, the value of the portfolio does
not factor in all the random fluctuations that move the return volatility, $\sigma^2(\tau)$. Instead, the
option price depends on this return volatility as we have assumed that the option price, $C(\tau)$,
is rationally formed, i.e. $C(\tau) = C(S(\tau), \sigma^2(\tau), \tau, T)$.

3 In an important paper, Nelson (1990) shows that under regularity conditions, the GARCH(1,1) model converges in distribution
to the solution of the following stochastic differential equation:
$$d\sigma^2(t) = (\omega - \varphi\sigma^2(t))\,dt + \psi\sigma^2(t)\,dW^\sigma(t),$$
where $W^\sigma$ is a standard Brownian motion, and $\omega$, $\varphi$, and $\psi$ are parameters. See Mele and Fornari (2000) (Chapter 2) for additional
results on this type of convergence. Corradi (2000) develops a critique related to the conditions underlying these convergence results.
4 Heston’s solution relies on the assumption that stochastic volatility is a linear mean-reverting “square-root” process. In
a square root process, the instantaneous variance of the process is proportional to the level reached by that process: in model
(10.21), for instance, $a(S, \sigma) = a \cdot \sigma$, where $a$ is a constant. In this case, it is possible to show that the characteristic function is
exponential-affine in the state variables. Given a closed-form solution for the characteristic function, the option price is
obtained through standard Fourier methods.
In other words, trading with only the underlying asset cannot lead to a perfect replication of
the option price, C. In turn, remember, a perfect replication of C is the condition we need to
obtain a unique preference-free price for the option, as explained in a general context in Chapter
4. To summarize, the presence of stochastic volatility has two closely related consequences:6
• There is an infinity of option prices that are consistent with the requirement that there
are no arbitrage opportunities.
• Perfect hedging strategies are impossible. Instead, we might use either (i) a strategy that
is not self-financed, but that allows for a perfect replication of the claim,
or (ii) a self-financed strategy for some misspecified model. In case (i), the strategy leads
to a hedging cost process. In case (ii), the strategy leads to a tracking error process, but
there can be situations in which the claim can be “super-replicated”, as we explain below.
10.5.3 Stochastic volatility and market incompleteness

Let us suppose that the asset price is solution to Eqs. (10.21). To simplify, we assume that $W$
and $W^\sigma$ are independent. Since $C$ is rationally formed, $C(\tau) = C(S(\tau), \sigma^2(\tau), \tau, T)$. By Itô’s
lemma,
$$dC = \left(\frac{\partial C}{\partial t} + \mu SC_S + bC_{\sigma^2} + \frac{1}{2}\sigma^2S^2C_{SS} + \frac{1}{2}a^2C_{\sigma^2\sigma^2}\right)d\tau + \sigma SC_S\,dW + aC_{\sigma^2}\,dW^\sigma.$$
Next, let us consider a self-financed portfolio that includes (i) one call, (ii) −α shares, and
(iii) −β units of the money market account (MMA, henceforth). The value of this portfolio is
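$V = C - \alpha S - \beta P$, and satisfies
$$dV = dC - \alpha\,dS - \beta\,dP$$
$$= \left(\frac{\partial C}{\partial t} + \mu S(C_S - \alpha) + bC_{\sigma^2} + \frac{1}{2}\sigma^2S^2C_{SS} + \frac{1}{2}a^2C_{\sigma^2\sigma^2} - r\beta P\right)d\tau + \sigma S(C_S - \alpha)\,dW + aC_{\sigma^2}\,dW^\sigma.$$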
As is clear, only when $a = 0$ could we zero the volatility of the portfolio value. In this case,
we could set $\alpha = C_S$ and $\beta P = C - \alpha S - V$, leaving
$$dV = \left(\frac{\partial C}{\partial t} + bC_{\sigma^2} + \frac{1}{2}\sigma^2S^2C_{SS} - rC + rSC_S + rV\right)d\tau,$$
where we have used the relation $\beta P = C - \alpha S - V$. The previous equation shows that the portfolio is
locally riskless. Therefore, by the FTAP, its drift must equal $rV$,
$$\frac{\partial C}{\partial t} + bC_{\sigma^2} + \frac{1}{2}\sigma^2S^2C_{SS} - rC + rSC_S + rV = rV,$$
that is, the terms other than $rV$ on the left hand side sum to zero.
The previous equation generalizes the Black-Scholes equation to the case in which volatility
is time-varying and non-stochastic, as a result of the assumption that $a = 0$. If $a \neq 0$, return
volatility is stochastic and, hence, there are no hedging portfolios to use to derive a unique
option price. However, we still have the possibility to characterize the price of the option.

5 Naturally, markets can be “completed” by the presence of the option. However, in this case the option price is not preference
free.
6 The mere presence of stochastic volatility is not necessarily a source of market incompleteness. Mele (1998) (p. 88) considers
a “circular” market with m asset prices, in which (i) the asset price no. i exhibits stochastic volatility, and (ii) this stochastic
volatility is driven by the Brownian motion driving the (i − 1)-th asset price. Therefore, in this market, each asset price is solution
to the Eqs. (10.21) and yet, by the previous circular structure, markets are complete.
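Indeed, consider a self-financed portfolio with (i) two calls with different strike prices and
maturity dates (with weights 1 and $\gamma$), (ii) $-\alpha$ shares, and (iii) $-\beta$ units of the MMA. We
denote the price processes of these two calls with $C^1$ and $C^2$. The value of this portfolio is
$V = C^1 + \gamma C^2 - \alpha S - \beta P$, and satisfies,
$$dV = dC^1 + \gamma\,dC^2 - \alpha\,dS - \beta\,dP$$
$$= \left(LC^1 + \gamma LC^2 - \alpha\mu S - r\beta P\right)d\tau + \sigma S\left(C^1_S + \gamma C^2_S - \alpha\right)dW + a\left(C^1_{\sigma^2} + \gamma C^2_{\sigma^2}\right)dW^\sigma,$$
where $LC^i \equiv \frac{\partial C^i}{\partial t} + \mu SC^i_S + bC^i_{\sigma^2} + \frac{1}{2}\sigma^2S^2C^i_{SS} + \frac{1}{2}a^2C^i_{\sigma^2\sigma^2}$, for $i = 1, 2$. In this context, risk can
be eliminated. Indeed, set
$$\gamma = -\frac{C^1_{\sigma^2}}{C^2_{\sigma^2}} \quad \text{and} \quad \alpha = C^1_S + \gamma C^2_S.$$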
The value of this portfolio is solution to,
$$dV = \left(LC^1 + \gamma LC^2 - \alpha\mu S + rV + \alpha rS - rC^1 - \gamma rC^2\right)d\tau.$$
Therefore, by the FTAP,
$$0 = LC^1 + \gamma LC^2 - \alpha\mu S + \alpha rS - rC^1 - \gamma rC^2$$
$$= \left[LC^1 - rC^1 - C^1_S(\mu S - rS)\right] + \gamma\left[LC^2 - rC^2 - C^2_S(\mu S - rS)\right],$$
where the second equality follows by the definition of $\alpha$, and by rearranging terms. Finally, by
using the definition of $\gamma$, and by rearranging terms,
$$\frac{LC^1 - rC^1 - C^1_S(\mu S - rS)}{C^1_{\sigma^2}} = \frac{LC^2 - rC^2 - C^2_S(\mu S - rS)}{C^2_{\sigma^2}}. \qquad (10.22)$$
These ratios agree. So they must be equal to some process $a \cdot \Lambda^\sigma$ (say) independent of both the
strike prices and the maturity of the options. Therefore, we obtain that,
$$\frac{\partial C}{\partial t} + rSC_S + \left[b - a\Lambda^\sigma\right]C_{\sigma^2} + \frac{1}{2}\sigma^2S^2C_{SS} + \frac{1}{2}a^2C_{\sigma^2\sigma^2} = rC. \qquad (10.23)$$
The economic interpretation of $\Lambda^\sigma$ is that of the unit risk-premium required to face the risk
of stochastic fluctuations in the return volatility. The problem is that the requirement of absence of
arbitrage opportunities does not suffice to recover a unique $\Lambda^\sigma$. In other words, by the
Feynman-Kac stochastic representation of a solution to a PDE, we have that the solution to Eq. (10.23)
is,
$$C(S(t), \sigma^2(t), t, T) = e^{-r(T-t)}\, E_{Q^\Lambda}\left[(S(T) - K)^+ \,\middle|\, S(t), \sigma^2(t)\right], \qquad (10.24)$$
where $Q^\Lambda$ is a risk-neutral probability, one for each admissible $\Lambda^\sigma$.

Eqs. (10.22) and (10.23) can be interpreted as APT relations. Indeed, let us define the unit
risk-premium related to the fluctuations of the asset price, $\lambda = (\mu - r)/\sigma$. Then, Eq. (10.22)
or Eq. (10.23) imply that,
$$\frac{LC}{C} = \frac{1}{d\tau}E\left(\frac{dC}{C}\right) = r + \underbrace{\frac{C_S}{C}\,\sigma S}_{\equiv\,\beta_S} \cdot \lambda + \underbrace{\frac{C_{\sigma^2}}{C}\,a}_{\equiv\,\beta_{\sigma^2}} \cdot \Lambda^\sigma,$$
where $\beta_S$ is the beta related to the volatility of the option price induced by fluctuations in
the stock price, $S$, and $\beta_{\sigma^2}$ is the beta related to the volatility of the option price induced by
fluctuations in the return volatility.
10.5.4 Trading volatility

Buying volatility is a strategy relying on the expectation that future volatility will increase. There
are option-based trading strategies that allow us to take views about volatility, such as straddles
or strangles. A straddle is a portfolio of one call option and one put option that have the same
strike price and the same maturity. A strangle is the same as a straddle, with the difference
that the strike of the call differs from that of the put. In 1995, the 233-year-old Barings Bank
collapsed, because of the famous short straddle Nick Leeson was implementing on the Nikkei
Index. A short straddle is, of course, a view that volatility will not rise. However, in January 1995,
a violent earthquake made the Nikkei index crash by almost 7% in a week. The straddle was
“naked,” i.e. delta-hedged, at most, which led to losses that Leeson was not only unable to absorb,
but that he also amplified, given that he insisted on the view that the Index would stabilize. The Index
did not.7
To understand straddles, and the reason why they are not necessarily the best way to take
views about volatility, consider the simplest strategy, where one buys an option and hedges it
through the Black and Scholes formula.8 Suppose we live in a world with stochastic volatility,
and purchase a call at time $t$, with market price equal to $C(S_t, \sigma_t^2, t)$; we are assuming the
world moves exactly as in Eqs. (10.21). Build up a self-financed portfolio with value $V_t$,
$$V_t = a_tS_t + b_tB_t, \qquad (10.25)$$
where $B_t = e^{rt}$ is the money market account,
$$V_0 = C(S_0, \sigma_0^2, 0), \qquad a_t = \Delta_{BS}(S_t, t; \mathrm{IV}_0),$$
and $\mathrm{IV}_0$ is the Black-Scholes implied volatility as of time $t = 0$, i.e. the time at which we are
to take a view on future volatility.
7 Losses from shorting straddles might considerably be reduced, through an additional portfolio comprising: (i) an out-of-the
money put, which pays exactly when the underlying goes down, and (ii) an out-of-the money call, which pays when the underlying
goes up. Combining this portfolio with a short-straddle leads to what is known as butterfly spread. Alternatives to straddles are
calendar spreads, which are portfolios long one call with maturity T1 and short one call with maturity T2 , where T1 < T2 , and
where the two calls have the same strike price. If the underlying asset price does not move too much, then, the calendar spread
value drops, because the price decay due to the passage of time (see Section 10.4.6.2) is more severe for the call with lower time to
maturity. Naturally, if the price of the underlying increases, the value of the calendar spread increases as well, due to the positions
in the two call options.
8 The following arguments also apply to the hypothetical situation where an investment bank, say, purchases an option for
mere market-making purposes, and then tries to hedge it through Black-Scholes. It is, however, an unrealistic situation, as
investment banks hedge through books, not through the single units adding up to the books.
Consider, first, the following heuristic arguments. Assume the short-term rate, $r$, is zero and
that $\mu$ is also zero. Assume we live in the Black-Scholes world, where volatility is constant.
However, there might be periods where realized volatility, say $(\Delta S_t/S_t)^2$, is larger than $\mathrm{IV}_0^2$. What
is the daily (say) profit and loss (P&L, henceforth) of call options valued at $\Pi_t$? Since $\mu = r = 0$,
we have, approximately,
$$\mathrm{P\&L}_t = \Delta\Pi_t = \Theta_t\Delta t + \frac{1}{2}\Gamma_t(\Delta S_t)^2 = -\frac{1}{2}\Gamma_tS_t^2\,\mathrm{IV}_0^2\,\Delta t + \frac{1}{2}\Gamma_t(\Delta S_t)^2 = \frac{1}{2}\Gamma_tS_t^2\left[\left(\frac{\Delta S_t}{S_t}\right)^2 - \mathrm{IV}_0^2\,\Delta t\right],$$
where $\Theta = \frac{\partial\Pi}{\partial t}$, $\Gamma = \frac{\partial^2\Pi}{\partial S^2}$, the Gamma, and the third equality follows by a well-known property
of the Black-Scholes pricing equation. Aggregating the daily P&L until the maturity of the
option, we obtain:
$$\mathrm{P\&L}_T = \frac{1}{2}\sum_{t=1}^{T}\Gamma_tS_t^2\left[\left(\frac{\Delta S_t}{S_t}\right)^2 - \mathrm{IV}_0^2\,\Delta t\right]. \qquad (10.26)$$
Hence, a portfolio of options is a quite basic way to have views about the movements of future
volatility, $(\Delta S_t/S_t)^2$. It may lead to difficulties, however, as described below. Moreover, the P&L
in Eq. (10.26) should, also, consist of a term like $\Delta_tS_t\frac{\Delta S_t}{S_t}$, where $\Delta_t$ is the delta $\frac{\partial\Pi}{\partial S}$: the realized
appreciation rate for the asset price should matter, in general. We may safely neglect this
term here, as this is on average small, due to the assumption that $\mu = 0$. But if $\mu > 0$, this
additional term contributes positively to the P&L, when $\Delta_t > 0$, and negatively otherwise. It
is natural: call option prices, for example, can go up because volatility goes up or because the
underlying stock prices go up. To isolate pure views about volatility, we need to hedge, as in
the continuous-time case analyzed below.
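So consider a general situation where volatility is not constant, such that the model is misspecified.
El Karoui, Jeanblanc-Picqué and Shreve (1998) make the following observation. Consider
the value of the self-financed portfolio in Eq. (10.25). Because this portfolio is self-financed,
$$dV_t = a_t\,dS_t + rb_tB_t\,dt = rV_t\,dt + a_t\left(dS_t - rS_t\,dt\right) = \left[rV_t + a_t(\mu - r)S_t\right]dt + a_t\sigma_tS_t\,dW_t.$$
Moreover, by Itô’s lemma,
$$dC_{BS}(S_t, t; K, T, \mathrm{IV}_0) = \left(\frac{\partial C_{BS}}{\partial t} + \mu S_t\frac{\partial C_{BS}}{\partial S} + \frac{1}{2}\sigma_t^2S_t^2\frac{\partial^2C_{BS}}{\partial S^2}\right)dt + \sigma_tS_t\frac{\partial C_{BS}}{\partial S}\,dW_t,$$
where,
$$\frac{\partial C_{BS}}{\partial t} + \mu S_t\frac{\partial C_{BS}}{\partial S} + \frac{1}{2}\sigma_t^2S_t^2\frac{\partial^2C_{BS}}{\partial S^2}$$
$$= \underbrace{\frac{\partial C_{BS}}{\partial t} + rS_t\frac{\partial C_{BS}}{\partial S} + \frac{1}{2}\mathrm{IV}_0^2S_t^2\frac{\partial^2C_{BS}}{\partial S^2}}_{\equiv\, rC_{BS}} + (\mu - r)S_t\frac{\partial C_{BS}}{\partial S} + \frac{1}{2}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\frac{\partial^2C_{BS}}{\partial S^2}$$
$$= rC_{BS} + (\mu - r)S_t\frac{\partial C_{BS}}{\partial S} + \frac{1}{2}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\frac{\partial^2C_{BS}}{\partial S^2}.$$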
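Therefore, the tracking error, or $\mathrm{P\&L}_t$, defined as the difference between the Black-Scholes price
and the portfolio value,
$$\mathrm{P\&L}_t \equiv C_{BS}(S_t, t; K, T, \mathrm{IV}_0) - V_t,$$
satisfies,
$$d\,\mathrm{P\&L}_t = \left[r\,\mathrm{P\&L}_t + \frac{1}{2}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\frac{\partial^2C_{BS}}{\partial S^2}\right]dt.$$
At maturity $T$:
$$\mathrm{P\&L}_T \equiv C_{BS}(S_T, T; K, T, \mathrm{IV}_0) - V_T = \max\{S_T - K, 0\} - V_T$$
$$= \frac{1}{2}e^{rT}\int_0^T e^{-rt}\left(\sigma_t^2 - \mathrm{IV}_0^2\right)S_t^2\frac{\partial^2C_{BS}}{\partial S^2}\,dt. \qquad (10.27)$$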
This expression is the “neat” version of Eq. (10.26). It is possible to show that a (delta-hedged)
straddle strategy leads to twice the expression in Eq. (10.27), with the second partial of the
straddle replacing the Black-Scholes Gamma. Eq. (10.27) has the following implications. We
know the Black-Scholes price is convex. Hence, Eq. (10.27) tells us that even if we do not exactly
know the law of movement for volatility, but still hold the view it will increase in the future,
we could: (i) buy a call option; (ii) short the Black-Scholes replicating portfolio. The P&L in
Eq. (10.27) shows that this strategy leads to positive profits. Naturally, this is not an arbitrage
opportunity. The critical assumption is that volatility will increase.
Eq. (10.27) shows a disturbing feature. Even if the volatility $\sigma_t$ is larger than $\mathrm{IV}_0$ for most of
the time, the final P&L may not necessarily lead to a profit. The reason is that each volatility
view, $\sigma_t^2 - \mathrm{IV}_0^2$, is weighted by the “Dollar Gamma,” $S_t^2\frac{\partial^2C_{BS}}{\partial S^2}$. It may be that “bad” realizations of
the volatility views, i.e. $\sigma_t^2 < \mathrm{IV}_0^2$, occur precisely when the Dollar Gamma is large. This feature
is known as “price-dependency.” Moreover, the strategy is costly, as it relies on expensive
∆-hedging. (Naturally, this issue does not apply to straddles.) Volatility contracts overcome these
difficulties, and are described in Section 10.6 below.
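The tracking error in Eq. (10.27) can be checked numerically with a small Monte Carlo sketch. The example below is only illustrative and rests on assumptions not made in the text: $r = \mu = 0$, a realized volatility that is constant at `true_vol` (rather than stochastic), discrete rebalancing over `n_steps` dates, and hypothetical function names and parameter values. For each path it returns both the P&L of being long the call and short the Black-Scholes hedge computed at $\mathrm{IV}_0$, and the discretised Dollar-Gamma integral on the right hand side of Eq. (10.27).

```python
import math
import random
from statistics import NormalDist

_norm = NormalDist()

def _d1(S, K, tau, vol):
    sd = vol * math.sqrt(tau)
    return math.log(S / K) / sd + 0.5 * sd, sd

def bs_call(S, K, tau, vol):
    """Black-Scholes call price with r = 0."""
    d1, sd = _d1(S, K, tau, vol)
    return S * _norm.cdf(d1) - K * _norm.cdf(d1 - sd)

def bs_delta(S, K, tau, vol):
    return _norm.cdf(_d1(S, K, tau, vol)[0])

def bs_gamma(S, K, tau, vol):
    d1, sd = _d1(S, K, tau, vol)
    return _norm.pdf(d1) / (S * sd)

def hedged_pnl(S0, K, T, iv0, true_vol, n_steps, rng):
    """One path: (payoff - V_T, discretised right hand side of Eq. (10.27)), r = mu = 0."""
    dt = T / n_steps
    S, V, gamma_term = S0, bs_call(S0, K, T, iv0), 0.0   # V_0 = price at IV_0
    for k in range(n_steps):
        tau = T - k * dt
        a = bs_delta(S, K, tau, iv0)                      # hedge ratio uses IV_0, not true_vol
        gamma_term += 0.5 * (true_vol ** 2 - iv0 ** 2) * S ** 2 * bs_gamma(S, K, tau, iv0) * dt
        z = rng.gauss(0.0, 1.0)
        S_next = S * math.exp(-0.5 * true_vol ** 2 * dt + true_vol * math.sqrt(dt) * z)
        V += a * (S_next - S)                             # with r = 0 the cash account earns nothing
        S = S_next
    return max(S - K, 0.0) - V, gamma_term

def average_pnl(n_paths=3000, n_steps=100, seed=0, S0=100.0, K=100.0, T=0.25,
                iv0=0.2, true_vol=0.3):
    """Average the two quantities over simulated paths."""
    rng = random.Random(seed)
    pnl_sum = gamma_sum = 0.0
    for _ in range(n_paths):
        pnl, gamma_term = hedged_pnl(S0, K, T, iv0, true_vol, n_steps, rng)
        pnl_sum += pnl
        gamma_sum += gamma_term
    return pnl_sum / n_paths, gamma_sum / n_paths

if __name__ == "__main__":
    print(average_pnl())
```

Under these assumptions, the average P&L should be close to the difference between the Black-Scholes prices computed at `true_vol` and at `iv0`, while individual paths differ widely depending on how long the price stays near the strike, where the Dollar Gamma is largest. This is the price-dependency discussed above.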
10.5.5 Pricing formulae

10.5.5.1 The first pricing formula: Hull and White (1987)
Hull and White (1987) derive the first pricing formula in the stochastic volatility literature.
They assume that the return volatility is independent of the asset price, and show that,
$$C(S(t), \sigma^2(t), t, T) = E_{\tilde{V}}\left[BS(S(t), t, T; \tilde{V})\right],$$
where $BS(S(t), t, T; \tilde{V})$ is the Black-Scholes formula obtained by replacing the constant $\sigma^2$ with
$\tilde{V}$, and
$$\tilde{V} = \frac{1}{T-t}\int_t^T\sigma^2(\tau)\,d\tau.$$
This formula tells us that the option price is simply the Black-Scholes formula averaged over
all the possible “values” taken by the future average volatility $\tilde{V}$. A proof of this equation is
given in the appendix.9
9 The result does not hold in the general case in which the asset price and volatility are correlated. However, Romano and Touzi
(1997) prove that a similar result holds in such a more general case.
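The Hull and White formula lends itself to a simple Monte Carlo sketch: simulate paths of the variance, compute the average variance $\tilde{V}$ along each path, and average the corresponding Black-Scholes prices. The example below assumes, for illustration only, that the variance follows a square-root process under the risk-neutral probability, independent of the asset price, discretised with a truncated Euler scheme. The function names, the discretisation and the parameter values are hypothetical choices, not taken from Hull and White (1987).

```python
import math
import random
from statistics import NormalDist

_norm = NormalDist()

def bs_call_var(S, K, r, tau, var):
    """Black-Scholes call written in terms of the variance (sigma squared)."""
    sd = math.sqrt(var * tau)
    d1 = (math.log(S / K) + (r + 0.5 * var) * tau) / sd
    return S * _norm.cdf(d1) - K * math.exp(-r * tau) * _norm.cdf(d1 - sd)

def hull_white_call(S, K, r, tau, v0, kappa, omega, xi,
                    n_paths=2000, n_steps=100, seed=0):
    """Average of Black-Scholes prices over simulated values of the mean variance V-tilde.

    Assumed variance dynamics: dv = kappa*(omega - v) dt + xi*sqrt(v) dW, independent of S.
    """
    rng = random.Random(seed)
    dt = tau / n_steps
    total = 0.0
    for _ in range(n_paths):
        v, integral = v0, 0.0
        for _ in range(n_steps):
            v_pos = max(v, 0.0)                  # truncation keeps the square root well defined
            integral += v_pos * dt
            v += kappa * (omega - v_pos) * dt + xi * math.sqrt(v_pos * dt) * rng.gauss(0.0, 1.0)
        avg_var = max(integral / tau, 1e-12)     # V-tilde, floored to avoid a zero variance
        total += bs_call_var(S, K, r, tau, avg_var)
    return total / n_paths

if __name__ == "__main__":
    print(hull_white_call(100.0, 100.0, 0.02, 0.5, v0=0.04, kappa=2.0, omega=0.04, xi=0.3))
```

When `xi = 0` and `v0 = omega`, the variance is constant and the function collapses to the Black-Scholes price, which provides a quick sanity check.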
10.5.5.2 Heston (1993)
The most celebrated formula is Heston’s (1993b), which holds when the return volatility is a
square-root process:
$$d\ln S(t) = \left(r - \frac{1}{2}\sigma^2(t)\right)dt + \sigma(t)\,dW(t), \qquad d\sigma^2(t) = \kappa\left(\omega - \sigma^2(t)\right)dt + \xi\sigma(t)\,dW^\sigma(t), \qquad (10.28)$$
where $W$ and $W^\sigma$ are two correlated Brownian motions, with correlation $\rho$.
It is instructive to go through a derivation of this formula, as this reveals some general
properties of option prices. Let us develop the price of a call option, similarly to what we have
done for Eq. (10.3), as follows:
$$e^{-r(T-t)}E_t(S(T) - K)^+ = e^{-r(T-t)}\int_0^\infty\!\!\int_0^\infty(S(T) - K)^+\,q_t(S(T), \sigma^2(T))\,dS(T)\,d\sigma^2(T)$$
$$= e^{-r(T-t)}\int_0^\infty I_{S(T)\geq K}\,S(T)\,q_t^m(S(T))\,dS(T) - e^{-r(T-t)}K\int_0^\infty I_{S(T)\geq K}\,q_t^m(S(T))\,dS(T)$$
$$= S(t)\int_0^\infty I_{S(T)\geq K}\,\hat{q}_t^m(S(T))\,dS(T) - e^{-r(T-t)}K\int_0^\infty I_{S(T)\geq K}\,q_t^m(S(T))\,dS(T)$$
$$= S(t)\cdot\hat{Q}_t(S(T)\geq K) - e^{-r(T-t)}K\cdot Q_t(S(T)\geq K), \qquad (10.29)$$
where $q_t(S(T), \sigma^2(T))$ is the risk-neutral joint density of the stock price and variance at $T$,
$q_t^m(S(T))$ is the risk-neutral marginal density of the stock price at $T$, and $\hat{q}_t^m(S(T))$ is
a new marginal density of the stock price at $T$, with Radon-Nikodym derivative with respect
to $q_t^m(S(T))$ given by the expression in Eq. (10.4):
$$\eta_t(T) = \frac{\hat{q}_t^m(S(T))}{q_t^m(S(T))} = \frac{e^{-r(T-t)}S(T)}{S(t)},$$
and finally, $\hat{Q}_t(S(T)\geq K)$ and $Q_t(S(T)\geq K)$ are two probabilities with densities $\hat{q}_t^m$ and $q_t^m$,
respectively. All these densities and probabilities are conditional upon the information at time
$t$.
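It is easy to see that the state process, $\eta_\tau(T)$, is solution to:
$$\frac{d\eta_\tau(T)}{\eta_\tau(T)} = -\left(-\sigma(\tau)\right)dW(\tau),$$
such that the stock price is solution to:
$$d\ln S(t) = \left(r + \frac{1}{2}\sigma^2(t)\right)dt + \sigma(t)\,d\hat{W}(t), \qquad d\sigma^2(t) = \kappa\left(\omega - \sigma^2(t)\right)dt + \xi\sigma(t)\,dW^\sigma(t), \qquad (10.30)$$
under $\hat{q}_t^m$.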
Let $x \equiv \ln S$. In the Black-Scholes case, $\sigma^2(t)$ is a constant, and the two probabilities,
$\hat{Q}_t(x(T)\geq\ln K)$ and $Q_t(x(T)\geq\ln K)$, can be expressed in closed-form, using Eq. (10.30)
and Eq. (10.28), respectively, leading to the celebrated formula in Eq. (10.11).
In Heston’s model, the two probabilities, $P_1(x(t), \sigma^2(t), t) \equiv \hat{Q}_t(x(T)\geq\ln K)$ and
$P_2(x(t), \sigma^2(t), t) \equiv Q_t(x(T)\geq\ln K)$, are solutions to:
$$\hat{L}P_1(x, \sigma^2, t) = 0, \qquad LP_2(x, \sigma^2, t) = 0, \qquad (10.31)$$
with the same boundary condition $P_j(x, \sigma^2, T) = I_{x\geq\ln K}$, $j = 1, 2$, and where $\hat{L}$ and $L$ are the
infinitesimal generators associated to Eq. (10.30) and Eq. (10.28). While these probabilities are
not known in closed-form, their characteristic functions are exponential affine in
$x$ and $\sigma^2$. Precisely, let the two characteristic functions be defined as:
$$f_1(x, \sigma^2, t; \phi) = \hat{E}_t\left(e^{i\phi x(T)}\right), \qquad f_2(x, \sigma^2, t; \phi) = E_t\left(e^{i\phi x(T)}\right), \qquad i = \sqrt{-1},$$
where $\hat{E}_t$ denotes the expectation taken with respect to $\hat{q}_t^m$, and $E_t$ denotes the conditional
expectation taken against $q_t^m$.
The two functions $f_j$ satisfy the same partial differential equations (10.31), but they can be
solved in closed-form, because their boundary conditions are simply $f_j(x, \sigma^2, T; \phi) = e^{i\phi x}$. Indeed,
a fundamental definition is that a model is affine if its characteristic function is exponential-
affine in its state variables. Affine models were already in use to analyze the term structure of
interest rates, since at least Vasicek (1977) and Cox, Ingersoll and Ross (1985), as discussed in
Chapter 11. Heston’s model is the counterpart to those models in the option pricing domain.
The solution for the two characteristic functions is given by:
$$f_j(x, \sigma^2, t; \phi) = e^{C_j(T-t;\phi) + D_j(T-t;\phi)\sigma^2 + i\phi x},$$
where
$$C_j(T-t;\phi) = r\phi i\,(T-t) + \frac{\kappa\omega}{\xi^2}\left[\left(b_j - \rho\xi\phi i + d_j\right)(T-t) - 2\ln\left(\frac{1 - g_je^{d_j(T-t)}}{1 - g_j}\right)\right],$$
$$D_j(T-t;\phi) = \frac{b_j - \rho\xi\phi i + d_j}{\xi^2}\cdot\frac{1 - e^{d_j(T-t)}}{1 - g_je^{d_j(T-t)}},$$
$$g_j = \frac{b_j - \rho\xi\phi i + d_j}{b_j - \rho\xi\phi i - d_j}, \qquad d_j = \sqrt{\left(b_j - \rho\xi\phi i\right)^2 - \xi^2\left(2u_j\phi i - \phi^2\right)},$$
$$b_1 = \kappa - \xi\rho, \quad b_2 = \kappa, \quad u_1 = \frac{1}{2}, \quad u_2 = -\frac{1}{2},$$
such that,
$$P_j(x(t), \sigma^2(t), t) = \frac{1}{2} + \frac{1}{\pi}\int_0^\infty\mathrm{Re}\left[\frac{e^{-i\phi\ln K}f_j(x(t), \sigma^2(t), t; \phi)}{i\phi}\right]d\phi.$$
[Provide a small technical Appendix on inversions of characteristic functions] Replacing these
two probabilities into Eq. (10.29) yields the celebrated Heston’s formula.
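The steps leading to Heston’s formula can be sketched numerically as follows. The code evaluates the two characteristic functions with the expressions for $C_j$, $D_j$, $g_j$ and $d_j$ given above, computes $P_1$ and $P_2$ through a plain midpoint rule for the inversion integral, and combines them as in Eq. (10.29). It is a minimal sketch under stated assumptions: the truncation point `phi_max`, the number of integration nodes `n`, the function names and the parameter values are arbitrary illustrative choices, and this direct form of the characteristic function is known to be numerically delicate for long maturities (because of the complex logarithm) and for very small $\xi$.

```python
import cmath
import math
from statistics import NormalDist

_norm = NormalDist()

def heston_prob(j, S, K, r, tau, v, kappa, omega, xi, rho, n=4000, phi_max=100.0):
    """P_j via inversion of the characteristic function f_j (midpoint rule on [0, phi_max])."""
    u = 0.5 if j == 1 else -0.5
    b = kappa - rho * xi if j == 1 else kappa
    x, log_k = math.log(S), math.log(K)
    dphi = phi_max / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * dphi
        iphi = 1j * phi
        d = cmath.sqrt((b - rho * xi * iphi) ** 2 - xi ** 2 * (2.0 * u * iphi - phi ** 2))
        g = (b - rho * xi * iphi + d) / (b - rho * xi * iphi - d)
        e = cmath.exp(d * tau)
        C = r * iphi * tau + kappa * omega / xi ** 2 * (
            (b - rho * xi * iphi + d) * tau - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g)))
        D = (b - rho * xi * iphi + d) / xi ** 2 * (1.0 - e) / (1.0 - g * e)
        f = cmath.exp(C + D * v + iphi * x)
        total += (cmath.exp(-iphi * log_k) * f / iphi).real * dphi
    return 0.5 + total / math.pi

def heston_call(S, K, r, tau, v, kappa, omega, xi, rho):
    """Eq. (10.29): S * P_1 - K * exp(-r*tau) * P_2."""
    p1 = heston_prob(1, S, K, r, tau, v, kappa, omega, xi, rho)
    p2 = heston_prob(2, S, K, r, tau, v, kappa, omega, xi, rho)
    return S * p1 - K * math.exp(-r * tau) * p2

def bs_call(S, K, r, tau, sigma):
    """Black-Scholes price, used as a benchmark when volatility is nearly constant."""
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / sd
    return S * _norm.cdf(d1) - K * math.exp(-r * tau) * _norm.cdf(d1 - sd)

if __name__ == "__main__":
    print(heston_call(100.0, 100.0, 0.03, 0.5, v=0.04, kappa=2.0, omega=0.04, xi=0.3, rho=-0.5))
```

A simple check of the implementation is that, for a small volatility of variance $\xi$ and $\sigma^2(t) = \omega$, the price should be close to the Black-Scholes price computed with $\sigma = \sqrt{\omega}$.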
10.6 Local volatility

10.6.1 Issues
Stochastic volatility models might provide interesting explanations for the smile effect, as
discussed in Section 10.5.2. However, the very same models cannot allow for a perfect fit of the
smile. Towards the end of the 1980s and the beginning of the 1990s, a modeling approach emerged
to cope with issues relating to a perfect fit of the yield curve. As reviewed in Chapters 11 and
13, this approach is needed, as it makes the pricing of interest rate derivatives rely on models
where the underlying assets in the books of the banks, bonds for instance, are priced without
any error, as for the simple European options reviewed in this chapter. In 1993 and 1994,
Derman & Kani, Dupire and Rubinstein [cite exact references] came up with a technology that
could be applied to options on tradable assets.
Why is it important to exactly fit the structure of already existing plain vanilla options?
Banks trade both plain vanilla and less liquid, or “exotic” derivatives. Suppose we wish to price
exotic derivatives. We want to make sure that the model we use to price the illiquid option
predicts plain vanilla option prices that are identical to those at which we are trading. How can we
trust a model that is not even able to pin down all outstanding contracts? A model like this
could give rise to arbitrage opportunities to unscrupulous users.
10.6.2 How does it work?

As usual in this context, we model asset prices under the risk-neutral probability. Accordingly,
let Ŵ be a Brownian motion under the risk-neutral probability, and E the expectation operator
under the risk-neutral probability. The logical steps leading to pricing take place as follows:
(i) We take as given the prices of a set of actively traded European options. Let K and T be
strikes and time-to-maturity of these liquid options. We aim to match the model to the
data:
C$ (K, T ) = C (K, T ) , K, T varying, (10.32)
where C$ (K, T ) are market data, and C (K, T ) are the model’s prediction.
(ii) Is it mathematically possible to consider a diffusive model for the stock price such that
the initial collection of European option prices, $C^{\$}(K, T)$, is predicted without
errors by the resulting model, as in Eq. (10.32)? The answer is in the affirmative. Consider
a diffusion process for the stock price:
$$\frac{dS_t}{S_t} = r\,dt + \sigma(S_t, t)\,d\hat{W}_t.$$
The only function to “calibrate” to make Eq. (10.32) hold is the volatility function,
$\sigma(S_t, t)$.
(iii) The Appendix shows that Eq. (10.32) holds when $\sigma(S_t, t) = \sigma_{loc}(S_t, t)$, where:
$$\sigma_{loc}(K, T) = \sqrt{2\,\frac{\dfrac{\partial C(K, T)}{\partial T} + rK\dfrac{\partial C(K, T)}{\partial K}}{K^2\dfrac{\partial^2C(K, T)}{\partial K^2}}}. \qquad (10.33)$$
The function $\sigma_{loc}(S, t)$ is referred to as “local volatility.”
(iv) Finally, we can price the illiquid options through numerical methods, say via simulations.
In the simulations, we use
$$\frac{dS_t}{S_t} = r\,dt + \sigma_{loc}(S_t, t)\,d\hat{W}_t.$$
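Steps (iii) and (iv) above can be sketched as follows. The code approximates the derivatives in Eq. (10.33) with central finite differences on a given call price surface, and then simulates the stock price with the resulting local volatility. It is a minimal sketch under stated assumptions: the call surface is supplied as a smooth function `call(K, T)` (here generated, for illustration only, by the Black-Scholes formula, in which case the local volatility should come out flat), the simulation uses a log-Euler step, and the function names, step sizes `hK`, `hT` and parameter values are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

_norm = NormalDist()

def bs_call(S, K, r, T, sigma):
    """Black-Scholes call price; used only to build an illustrative price surface."""
    sd = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / sd
    return S * _norm.cdf(d1) - K * math.exp(-r * T) * _norm.cdf(d1 - sd)

def local_vol(call, K, T, r, hK=0.5, hT=1e-3):
    """Eq. (10.33) with central finite differences; `call(K, T)` returns a call price."""
    c_T = (call(K, T + hT) - call(K, T - hT)) / (2.0 * hT)
    c_K = (call(K + hK, T) - call(K - hK, T)) / (2.0 * hK)
    c_KK = (call(K + hK, T) - 2.0 * call(K, T) + call(K - hK, T)) / hK ** 2
    return math.sqrt(2.0 * (c_T + r * K * c_K) / (K ** 2 * c_KK))

def simulate_terminal_price(S0, r, T, sigma_loc, n_steps, rng):
    """Step (iv): simulate dS/S = r dt + sigma_loc(S, t) dW with a log-Euler scheme."""
    dt = T / n_steps
    S = S0
    for k in range(n_steps):
        vol = sigma_loc(S, k * dt)
        S *= math.exp((r - 0.5 * vol ** 2) * dt + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return S

if __name__ == "__main__":
    S0, r, sigma = 100.0, 0.03, 0.2
    surface = lambda K, T: bs_call(S0, K, r, T, sigma)
    # Local volatility read off the surface at (K, T) = (S, t); floor t to stay inside the grid
    sigma_loc = lambda S, t: local_vol(surface, S, max(t, 0.05), r)
    rng = random.Random(0)
    print(local_vol(surface, 90.0, 1.0, r), simulate_terminal_price(S0, r, 1.0, sigma_loc, 50, rng))
```

With market data, the surface $C^{\$}(K, T)$ is only observed on a coarse grid, so in practice the derivatives are taken on an interpolated, arbitrage-free surface rather than on raw quotes.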
Empirically, the local volatility surface, $\sigma_{\mathrm{loc}}(S,t)$, is typically decreasing in S for fixed t, a
phenomenon known as the Black-Christie-Nelson leverage effect (Black, 1976; Christie, 1982;
Nelson, 1991), discussed in Section 10.5 and also in Chapter 8. [Also, explain this in Section
10.5 while introducing ARCH models] This fact might lead one to assume from the outset that
$\sigma(x,t) = x^{\alpha} f(t)$, for some function f and some constant α < 0, a simplification leading to the
so-called CEV (Constant Elasticity of Variance) model. Practitioners are increasingly relying on
the so-called SABR model, which combines “local vols” with “stoch vol,” as follows:
\[ \begin{cases} \dfrac{dS_t}{S_t} = r\,dt + \sigma(S_t,t)\cdot v_t\cdot d\hat{W}_t \\[1ex] dv_t = \phi(v_t)\,dt + \psi(v_t)\,d\hat{W}^{v}_t \end{cases} \tag{10.34} \]
where $\hat{W}^{v}$ is another Brownian motion, and φ, ψ are some functions. [Provide references.] The
appendix shows that in this specific case, the initial structure of European option prices is
pinned down by:
\[ \tilde{\sigma}_{\mathrm{loc}}(K,T) = \frac{\sigma_{\mathrm{loc}}(K,T)}{\sqrt{E\left(v_T^{2}\,\middle|\,S_T = K\right)}}, \tag{10.35} \]
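where $\sigma_{\mathrm{loc}}(K,T)$ is the same as in Eq. (10.33). For this model, we simulate
\[ \begin{cases} \dfrac{dS_t}{S_t} = r\,dt + \tilde{\sigma}_{\mathrm{loc}}(S_t,t)\cdot v_t\cdot d\hat{W}_t \\[1ex] dv_t = \phi(v_t)\,dt + \psi(v_t)\,d\hat{W}^{v}_t \end{cases} \]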
10.7 Variance swaps

How much volatility do we expect to prevail in the future, after controlling for risk? The answer
to this question has long been conjectured to be the volatility implied by at-the-money options.
In fact, it is not. Expected volatility, adjusted for risk, is a weighted average of implied
volatilities of a continuum of options, as explained below. This is not mere academic purism. Knowing
expected volatility under the risk-neutral probability allows one to trade assets with payoffs linked
to future realized volatility, known as variance swaps. In fact, in September 2003, the Chicago
Board Options Exchange (CBOE) changed its volatility index VIX to approximate the variance
swap rate of the S&P 500 index return (for 30 days), as in Eq. (10.38) below. In March 2004,
the CBOE launched the CBOE Futures Exchange for trading futures on the new VIX. Options
on VIX are also available for trading.
There are a number of compelling reasons explaining the interest investors may have in
these contracts. One is, undeniably, the possibility of taking views about developments
in stock market volatility, without incurring the price-dependency issues pointed out in
Section 10.5.4. Passive fund managers might also find these contracts useful, as in times of
high volatility, tracking errors widen and, then, index tracking performance deteriorates. Hedge
funds might find this type of contract attractive as well, as they invest in “relative value”
strategies, attempting to profit from temporary price discrepancies. In times of high volatility,
price discrepancies typically widen, and volatility contracts help these institutions hedge against
these events.
10.7.1 Pricing
Let us consider the following price process St under the risk-neutral probability:
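\[ \frac{dS_t}{S_t} = r\,dt + \sigma_t\,d\hat{W}_t, \]
where $\sigma_t$ is $\mathcal{F}_t$-adapted: i.e. $\mathcal{F}_t$ can be larger than $\mathcal{F}^{S}_t \equiv \sigma(S_\tau : \tau \le t)$. Then,
\[ e^{-r(T-t)}E\left(\sigma_T^{2}\right) = 2\int_0^{\infty}\frac{\dfrac{\partial C(K,T)}{\partial T} + rK\dfrac{\partial C(K,T)}{\partial K}}{K^{2}}\,dK. \tag{10.36} \]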
Next, let us define the realized “integrated” variance within the time interval [T1 , T2 ], with
T1 > t:
\[ \operatorname{var}(T_1,T_2) \equiv \int_{T_1}^{T_2}\sigma_u^{2}\,du. \]
Let us, then, compute the risk-neutral expectation of such a “realized” variance. If r = 0, then,
by Eq. (10.36),
\[ E\left[\operatorname{var}(T_1,T_2)\right] = 2\int_0^{\infty}\frac{C_t(K,T_2) - C_t(K,T_1)}{K^{2}}\,dK, \tag{10.37} \]
where Ct (K, T ) is the price as of time t of a call option expiring at T and struck at K. A proof
of Eq. (10.37) is in the Appendix.
In the general case where r > 0, we have, for T1 = t, T2 ≡ T ,
\[ E\left[\operatorname{var}(t,T)\right] = 2e^{r(T-t)}\left[\int_0^{F(t)}\frac{P_t(K,T)}{K^{2}}\,dK + \int_{F(t)}^{\infty}\frac{C_t(K,T)}{K^{2}}\,dK\right], \tag{10.38} \]
where F (t) is the forward price: F (t) = er(T −t) S (t), and Pt (K, T ) is the price as of time t of
a put option expiring at T and struck at K. A proof of Eq. (10.38) is in the Appendix.
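To see Eq. (10.38) at work, the sketch below evaluates its right hand side numerically when all option prices come from the Black-Scholes formula with a constant volatility σ, in which case the left hand side is known to equal σ²(T − t). The flat smile, the truncation of the strike range, the change of variable K = e^k and the trapezoidal rule are all assumptions made for the illustration; an index such as the VIX would instead sum over a finite set of traded strikes.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price(S, K, r, sigma, tau, kind):
    """Black-Scholes price of a European call or put with time-to-maturity tau."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    call = S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    if kind == "call":
        return call
    return call - S + K * math.exp(-r * tau)  # put, by put-call parity

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def variance_swap_rate(S, r, sigma, tau, width=2.5, n=2000):
    """Right hand side of Eq. (10.38), integrating over log-strikes k = ln K."""
    F = S * math.exp(r * tau)   # forward price F(t)
    kF = math.log(F)
    # With K = e^k, dK / K^2 = e^{-k} dk; strikes are truncated at ln F +/- width.
    puts = trapezoid(
        lambda k: bs_price(S, math.exp(k), r, sigma, tau, "put") * math.exp(-k),
        kF - width, kF, n)
    calls = trapezoid(
        lambda k: bs_price(S, math.exp(k), r, sigma, tau, "call") * math.exp(-k),
        kF, kF + width, n)
    return 2.0 * math.exp(r * tau) * (puts + calls)

rate = variance_swap_rate(100.0, 0.02, 0.2, 1.0)  # hypothetical inputs
```

With these inputs, `rate` should be close to σ²(T − t) = 0.04, which illustrates that the strip of out-of-the-money options with weights proportional to 1/K² recovers expected integrated variance.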
As mentioned, the new VIX index is just an approximation to E [var (t, T )], where the
approximation arises due to the finite number of out-of-the-money options underlying Eq. (10.38).
The VIX index, then, can be used to price and, then, trade variance swaps. A variance swap
is a contract that has zero value at entry (at t). At maturity T , the buyer of the swap receives,
\[ \pi^{\mathrm{var}}_T = \left(\operatorname{var}(t,T) - \text{var-p}(t,T)\right)\times\text{Notional}, \tag{10.39} \]
where var-p (t, T ) is the swap rate agreed at t, and paid off at time T . Therefore, this contract
is really a forward, not a swap. If r is deterministic,
\[ \text{var-p}(t,T) = E\left[\operatorname{var}(t,T)\right], \]
where E [var (t, T )] is given by Eq. (10.38). Therefore, Eq. (10.38) is used to evaluate these variance
swaps. Finally, it is worth mentioning that in practice these contracts rely on some notion of
realized volatility computed from discretely sampled returns, as a continuous record of returns is
obviously unavailable. Sometimes it is said that variance swaps are profitable to protection
sellers, because “The derivative house has the statistical edge,” meaning that realized variance
from t to T , say, is generally lower than expected future variance under the risk-neutral
probability, reflecting variance risk-premiums.
How is expected variance in Eq. (10.38) related to the skew? Derman et al. [Goldman Sachs
note, provide reference] show that
<
1
E [var (t, T )] ≈ σ atm 1 + 2 (T − t) · Skew.
T −t
[Work in progress]
10.7.2 Forward volatility trading

Let us consider the following example of structured volatility trading. Suppose we hold the view
that market volatility will rise in one year's time, to an extent that is inefficiently priced in by
the term structure of the currently traded variance swaps. Precisely, our view is that the spot
price of the variance swap in one year will exceed the “implied forward variance swap price,”
i.e.
\[ \text{var-p}(1,2) > \text{var-p}(0,2) - \text{var-p}(0,1). \tag{10.40} \]
To implement a trade consistent with this view, we may proceed as follows:
(i) long a two year variance swap, struck at var-p (0, 2), with notional one
(ii) short a one year variance swap, struck at var-p (0, 1), with notional $e^{-r}$
[10.Pfolio.1]
Obviously, this strategy costs nothing at time zero.
The strategy in [10.Pfolio.1] generates profits whenever Eq. (10.40) holds true. Indeed, sup-
pose Eq. (10.40) holds true at time 1. Then, come time 1, we can short another one year variance
swap, struck at var-p (1, 2). Intuitively, we do so because “we bought it cheap,” according to
Eq. (10.40). Shorting this variance swap at time 1 generates the following payoff at time 2:
π 1 (2) ≡ var-p (1, 2) − var (1, 2) . (10.41)
Moreover, the two year variance swap we went long at time zero (component (i) of [10.Pfolio.1])
gives rise to the following payoff at time 2:
π 2 (2) ≡ var (0, 2) − var-p (0, 2) . (10.42)
Adding Eq. (10.41) and Eq. (10.42), and using the relation, var (0, 2) = var (0, 1) + var (1, 2),
leads to:
π (2) ≡ π 1 (2) + π 2 (2) = var-p (1, 2) + var (0, 1) − var-p (0, 2) .
Finally, the one year variance swap with notional $e^{-r}$ we shorted at time zero (component (ii)
of [10.Pfolio.1]) leads to the following payoff at time 1:
\[ \pi(1) \equiv \left(\text{var-p}(0,1) - \operatorname{var}(0,1)\right)e^{-r}. \tag{10.43} \]
Investing π (1) for a further year at the safe interest rate delivers $\pi(1)e^{r}$ at time 2, such that
the total profits at time 2 are:
\[ \pi_{\mathrm{tot}} \equiv \pi(2) + \pi(1)e^{r} = \text{var-p}(1,2) - \text{var-p}(0,2) + \text{var-p}(0,1) > 0, \tag{10.44} \]
where the inequality follows by Eq. (10.40).
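The bookkeeping in Eqs. (10.41)-(10.44) can be sketched in a few lines of code. The function name and the numerical swap rates and realized variances used below are hypothetical, chosen only so that Eq. (10.40) holds; the point of the exercise is that the total profit does not depend on the realized variances.

```python
import math

def forward_vol_trade_profit(varp_01, varp_02, varp_12, var_01, var_12, r):
    """Time-2 profit of [10.Pfolio.1] plus the one year swap shorted at time 1."""
    pi_1_at_2 = varp_12 - var_12                  # Eq. (10.41): short swap entered at time 1
    pi_2_at_2 = (var_01 + var_12) - varp_02       # Eq. (10.42): long two year swap
    pi_at_1 = (varp_01 - var_01) * math.exp(-r)   # Eq. (10.43): short one year swap
    return pi_1_at_2 + pi_2_at_2 + pi_at_1 * math.exp(r)   # Eq. (10.44)

# Hypothetical swap rates satisfying Eq. (10.40): 0.06 > 0.09 - 0.04.
varp_01, varp_02, varp_12, r = 0.04, 0.09, 0.06, 0.03

# Two very different realized-variance scenarios give the same profit.
calm = forward_vol_trade_profit(varp_01, varp_02, varp_12, var_01=0.02, var_12=0.03, r=r)
wild = forward_vol_trade_profit(varp_01, varp_02, varp_12, var_01=0.10, var_12=0.20, r=r)
```

Both `calm` and `wild` equal var-p (1, 2) − var-p (0, 2) + var-p (0, 1) = 0.01, up to rounding, because the realized variances cancel out of the sum.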
10.7.3 Marking to market

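Suppose a variance contract expiring at time T is issued at time t, when it is costless. How much
is this contract worth at time τ ∈ (t, T )? Let us take the time τ risk-neutral discounted expectation
of $\pi^{\mathrm{var}}_T$ in Eq. (10.39),
\[ \begin{aligned} \frac{e^{-r(T-\tau)}E_\tau\left(\pi^{\mathrm{var}}_T\right)}{\text{Notional}} &= e^{-r(T-\tau)}E_\tau\left(\operatorname{var}(t,\tau) + \operatorname{var}(\tau,T) - \text{var-p}(t,T)\right) \\ &= e^{-r(T-\tau)}\left(\operatorname{var}(t,\tau) + \text{var-p}(\tau,T) - \text{var-p}(t,T)\right), \end{aligned} \tag{10.45} \]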
where Eτ denotes the risk-neutral expectation conditional upon the information available at
time τ .
Marking to market suggests an alternative way to implement the forward volatility trading
exercise of the previous section. Suppose, again, that we hold the view that markets for volatility
will make Eq. (10.40) hold true at time 1, and, accordingly, consider the strategy in [10.Pfolio.1].
If Eq. (10.40) holds true at time 1, then we may close position (i) in [10.Pfolio.1] at time
1. By Eq. (10.45), the market value of the two year variance swap we were long at time 0 is,
\[ \hat{\pi}(1) \equiv \left(\operatorname{var}(0,1) + \text{var-p}(1,2) - \text{var-p}(0,2)\right)e^{-r}. \tag{10.46} \]
Adding $\hat{\pi}(1)$ to π (1) in Eq. (10.43) leads to a total profit of $e^{-r}\pi_{\mathrm{tot}}$ at time 1, where $\pi_{\mathrm{tot}}$ is as
in Eq. (10.44).
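The marking-to-market rule of Eq. (10.45) and the resulting time-1 profit can be sketched as follows. The helper name `mark_to_market` and all numerical inputs are hypothetical; the check simply confirms that closing the two year swap at time 1 delivers the same profit as in Eq. (10.44), discounted by one year.

```python
import math

def mark_to_market(var_realized, varp_remaining, varp_strike, r, time_left, notional=1.0):
    """Eq. (10.45): value at time tau of a variance swap struck at varp_strike."""
    return notional * math.exp(-r * time_left) * (var_realized + varp_remaining - varp_strike)

# Hypothetical inputs, with Eq. (10.40) holding: 0.06 > 0.09 - 0.04.
varp_01, varp_02, varp_12, var_01, r = 0.04, 0.09, 0.06, 0.05, 0.03

pi_hat_1 = mark_to_market(var_01, varp_12, varp_02, r, time_left=1.0)   # Eq. (10.46)
pi_1 = (varp_01 - var_01) * math.exp(-r)                                # Eq. (10.43)
pi_tot = varp_12 - varp_02 + varp_01                                    # Eq. (10.44)
profit_at_1 = pi_hat_1 + pi_1
```

Here `profit_at_1` coincides with `math.exp(-r) * pi_tot`, whatever value is assigned to the realized variance `var_01`.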
10.7.4 Stochastic interest rates

When interest rates are stochastic, but still independent of volatility, the expressions given for
the contract and indexes are still the same, with the bond price P (t, T ) replacing $e^{-r(T-t)}$, as
mentioned in Remark A1 in Appendix 3. However, the forward volatility trading strategy in
[10.Pfolio.1] should be modified. For example, we might use the following strategy:
(i) long a two year variance swap, struck at var-p (0, 2), with notional one
(ii) short a one year variance swap, struck at var-p (0, 1), with notional $P(0,2)/P(0,1)$
If, come time 1, Eq. (10.40) holds true, we may, then, liquidate (i), thereby accessing the payoff
relating to (ii), for a total payoff equal to:
\[ \begin{aligned} &\left(\operatorname{var}(0,1) + \text{var-p}(1,2) - \text{var-p}(0,2)\right)P(1,2) + \left(\text{var-p}(0,1) - \operatorname{var}(0,1)\right)\frac{P(0,2)}{P(0,1)} \\ &\quad = \left(\text{var-p}(1,2) - \text{var-p}(0,2) + \text{var-p}(0,1)\right)P(1,2) + \left(\operatorname{var}(0,1) - \text{var-p}(0,1)\right)\left(P(1,2) - \frac{P(0,2)}{P(0,1)}\right), \end{aligned} \]
where the first term on the left hand side arises by the liquidation of (i) and by Eq. (10.46),
and the second term on the left hand side arises by (ii). By Eq. (10.40), the first term on the
right hand side is positive. If the short-term interest rate were deterministic, $P(1,2) = P(0,2)/P(0,1)$,
and the second term on the right hand side would be zero. When interest rates are stochastic,
the second term can take on any sign, although its absolute value should be quite low
compared to the first term on the right hand side.
10.7.5 Hedging
A financial institution might be merely interested in intermediating the contract, which then
needs to be hedged against. Suppose, for example, that the financial institution sells protection
at time t, thereby promising to pay the realized integrated variance var (t, T ) at time T . We
want to replicate this integrated variance. By Itô’s lemma:
\[ \operatorname{var}(t,T) = 2\int_t^{T}\frac{1}{S_u}\,dS_u - 2\ln\frac{S_T}{S_t} = 2\left[\int_t^{T}\frac{1}{S_u}\,dS_u - r(T-t)\right] - 2\ln\frac{F_T}{F_t}. \tag{10.47} \]
The first term can be replicated by continuously rebalancing a stock position so that it is
always long $\theta_t = 2/S_t$ shares of the stock, adjusted for time value of money. More precisely, we
consider a self-financed portfolio $(\theta_\tau, \psi_\tau)$, such that its value satisfies:
\[ V_\tau = \theta_\tau S_\tau + \psi_\tau M_\tau, \]
where $M_\tau$ denotes the money market account. We choose:
\[ \hat{\theta}_\tau = \frac{1}{S_\tau}\frac{M_\tau}{M_T}, \qquad \hat{\psi}_\tau = \left(\int_t^{\tau}\frac{1}{S_u}\,dS_u - 1 - r(\tau-t)\right)\frac{1}{M_T}. \tag{10.48} \]
It is easy to see that
\[ \hat{V}_\tau = \left(\int_t^{\tau}\frac{1}{S_u}\,dS_u - r(\tau-t)\right)\frac{M_\tau}{M_T}, \tag{10.49} \]
such that: (i) $\hat{V}_t = 0$, and (ii) $\hat{V}_T = \int_t^{T}\frac{1}{S_u}\,dS_u - r(T-t)$. In the appendix, we show that $(\hat{\theta}_\tau, \hat{\psi}_\tau)$
is self-financed. The bottom line is that we can hedge the first term in Eq. (10.47) through a
self-financed portfolio that costs nothing at time t. This portfolio is simply $(2\hat{\theta}_\tau, 2\hat{\psi}_\tau)$.
To replicate the second term in Eq. (10.47), the payoff of the so-called log-contract, note
that, by Eq. (10A.9) in the Appendix,
\[ -2\ln\frac{F_T}{F_t} = -\frac{2}{F_t}\left(F_T - F_t\right) + 2\left[\int_0^{F_t}\left(K - S_T\right)^{+}\frac{1}{K^{2}}\,dK + \int_{F_t}^{\infty}\left(S_T - K\right)^{+}\frac{1}{K^{2}}\,dK\right]. \]
Therefore, the log-contract can be replicated by shorting $2/F_t$ units of forwards, which are of
course costless at time t, and going long a continuum of out-of-the-money options with weights
$2/K^{2}$, which cost
\[ 2\int_0^{F_t}\frac{P_t(K,T)}{K^{2}}\,dK + 2\int_{F_t}^{\infty}\frac{C_t(K,T)}{K^{2}}\,dK = e^{-r(T-t)}E\left[\operatorname{var}(t,T)\right], \]
where the equality follows by Eq. (10.38). We borrow $e^{-r(T-t)}E[\operatorname{var}(t,T)]$ to purchase these
options, and once this is done, we are guaranteed var (t, T ) is replicated at time T , as we now
have replicated both the first term and the second term in Eq. (10.47). Finally, come time T ,
we pay back the loan, worth E [var (t, T )], and receive a payoff equal to var (t, T ) − E [var (t, T )],
due to the sale of insurance. Since var (t, T ) is replicated, no additional funds are needed at
time T .
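The first equality in Eq. (10.47) can be illustrated by simulation. The sketch below discretizes one path of a geometric Brownian motion with constant volatility, so that var (t, T ) = σ²(T − t) is known in advance, and compares it with the discrete counterpart of the dynamic stock position and the log-contract. The constant volatility, the step count, the seed and the function name are assumptions made for the illustration; with discrete rebalancing the match is only approximate.

```python
import math
import random

def simulate_replication(S0=100.0, r=0.02, sigma=0.2, T=1.0, n=50_000, seed=0):
    """Discrete-time check of var(t,T) = 2 * int (1/S_u) dS_u - 2 * ln(S_T / S_t)."""
    rng = random.Random(seed)
    dt = T / n
    S = S0
    stock_leg = 0.0   # running sum approximating the integral of (1/S_u) dS_u
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        S_next = S * math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        stock_leg += (S_next - S) / S
        S = S_next
    replicated = 2.0 * stock_leg - 2.0 * math.log(S / S0)
    integrated_variance = sigma ** 2 * T   # known exactly because sigma is constant
    return replicated, integrated_variance

replicated, target = simulate_replication()
```

The two numbers returned should agree up to a small discretization error, which shrinks as the number of rebalancing dates `n` grows.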
10.8 American options

10.8.1 Real options theory
The option can be exercised at any time before the expiry date, T . When the option is exercised,
it yields a payoff equal to a function of the underlying asset price, say ψ (S (t)). Let $C_t$ be the
price of an American option as of time t. In discrete time, we have:
\[ C_t = \max\left\{\psi(S_t),\; e^{-r\Delta t}E\left[C_{t+\Delta t}\right]\right\}. \]
We suppose that the nature of the option, summarized by the payoff $\psi(S_t)$, is such that there
are two regions, a stopping region and a continuation region, defined as follows:
(i) Stopping region, where time-to-maturity and the price of the asset underlying the option
are such that it is optimal to exercise, $C_t = \max\{\psi(S_t),\, e^{-r\Delta t}E[C_{t+\Delta t}]\} = \psi(S_t)$, in
which case, of course, $C_t \ge e^{-r\Delta t}E[C_{t+\Delta t}]$. By rearranging terms,
\[ 0 \ge e^{-r\Delta t}\,\frac{E\left[C_{t+\Delta t}\right] - C_t}{\Delta t} - \frac{1 - e^{-r\Delta t}}{\Delta t}\,C_t. \]
The expected return on the option under the risk-neutral probability is less than that on
a bank deposit, which further clarifies why it is optimal to exercise early. Naturally, the
fact that the option is yielding less than the safe interest rate is not an arbitrage: we could
not short the derivative, as no one else would be willing to buy it, given that holding it is
not optimal.
(ii) Continuation region, where time-to-maturity and the price of the asset underlying the option
are such that it is optimal to wait, $C_t = \max\{\psi(S_t),\, e^{-r\Delta t}E[C_{t+\Delta t}]\} = e^{-r\Delta t}E[C_{t+\Delta t}]$,
or
\[ 0 = e^{-r\Delta t}\,\frac{E\left[C_{t+\Delta t}\right] - C_t}{\Delta t} - \frac{1 - e^{-r\Delta t}}{\Delta t}\,C_t. \]
The expected return on the option under the risk-neutral probability is the same as that
on a bank deposit.
Note that the existence of these two regions is not guaranteed. For example, we shall see
that it is never optimal to exercise early American calls written on assets that do not distribute
dividends. When the two regions are, instead, well-defined, they define an exercise “envelope,” a
function of the asset price underlying the option and time-to-maturity. It is a “free boundary”
problem: we need to find a boundary that triggers some action, in this case, exercising the
option, and the boundary is free in that it is not given in advance as in the case of, say, the
barrier options of the following section.
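The discrete-time recursion $C_t = \max\{\psi(S_t), e^{-r\Delta t}E[C_{t+\Delta t}]\}$ can be sketched on a binomial tree. The Cox-Ross-Rubinstein parametrization of the up and down moves used below is an assumption of this example (the text does not commit to any particular discretization), and so are the function name and the parameter values.

```python
import math

def crr_price(S0, K, r, sigma, T, steps, payoff, american=True):
    """Backward induction C_t = max{psi(S_t), exp(-r dt) E[C_{t+dt}]} on a binomial tree."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))     # up factor (assumed CRR choice)
    d = 1.0 / u                             # down factor
    q = (math.exp(r * dt) - d) / (u - d)    # risk-neutral probability of an up move
    disc = math.exp(-r * dt)
    # Option values at expiry; node j has experienced j up moves.
    values = [payoff(S0 * u ** j * d ** (steps - j)) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            if american:
                S = S0 * u ** j * d ** (n - j)
                values[j] = max(payoff(S), cont)   # stop or continue
            else:
                values[j] = cont                   # European: always continue
    return values[0]

K = 100.0
put = lambda S: max(K - S, 0.0)
call = lambda S: max(S - K, 0.0)
args = (100.0, K, 0.05, 0.2, 1.0, 200)   # hypothetical S0, K, r, sigma, T, steps

american_put = crr_price(*args, put, american=True)
european_put = crr_price(*args, put, american=False)
american_call = crr_price(*args, call, american=True)
european_call = crr_price(*args, call, american=False)
```

On this tree the American put is worth strictly more than its European counterpart, reflecting a non-empty stopping region, whereas the American and European call prices coincide, consistent with the remark that early exercise is never optimal for calls on assets that do not distribute dividends.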
This problem can be quite complex, but it sometimes simplifies for those derivatives with an
infinite expiry date, T . This simplification arises as in this case, the option price and, hence,
the envelope, only depend on the underlying asset price. Under this assumption, and the
assumption that the price of the asset underlying the option is a geometric Brownian motion
with volatility parameter σ, we have that the option price satisfies, in the limit ∆t → 0:
\[ \text{Stopping region: } L[C] - rC \le 0 \ \text{ and } \ C = \psi(S) \tag{10.50} \]
\[ \text{Continuation region: } L[C] - rC = 0 \tag{10.51} \]
where $L[C] = \frac{1}{2}\sigma^{2}S^{2}C_{SS} + rSC_{S}$. To Eqs. (10.50)-(10.51), we have to add a number of conditions,
discussed in the two examples in the subsections below.
10.8.2 Perpetual puts

Consider an American perpetual put, where ψ (S) = (K − S)+ , and the price p is, accordingly,
a function of the underlying asset price S only. This price satisfies Eqs. (10.50)-(10.51), with
some additional conditions and qualifications. First, we assume, and later verify, that there
exists a value for the asset price, the free boundary, S∗ say, such that, it is optimal to exercise
the option whenever S < S∗ . In other terms, Eqs. (10.50)-(10.51) can be written as:
Stopping region (S ≤ S∗ ): p (S∗ ) = K − S∗ (10.52)
Continuation region (S > S∗ ): L [p] − rp = 0 (10.53)
where K is the strike price of the option. Eq. (10.52) is, then, a “value-matching” condition, as
explained in Chapter 4 in a related context. It ensures that the pricing function p is continuous
as we move from the continuation towards the stopping region.
Second, we require the following boundary condition:
\[ \lim_{S\to\infty} p(S) = 0. \tag{10.54} \]
That is, as the asset price gets large, the value of the put option needs to approach zero, as the
probability the derivative is ever exercised becomes negligible.
Finally, the pricing function, p (S), satisfies the following “smooth-pasting” condition,
obtained after taking the derivative in Eq. (10.52), as also explained in Chapter 4:
\[ p_S(S_*) = -1. \tag{10.55} \]
We conjecture that in the continuation region, the pricing function p that solves Eq. (10.53) has
the form $p(S) = AS^{\gamma}$, for two constants A and γ. Plugging this guess into Eq. (10.53) reveals
that actually, the pricing function satisfying it has the following form:
\[ p(S) = A_{+}S^{\gamma_{+}} + A_{-}S^{\gamma_{-}}, \tag{10.56} \]
where $A_{+}$ and $A_{-}$ are two constants, to be pinned down, $\gamma_{+} = 1$ and $\gamma_{-} = -\frac{2r}{\sigma^{2}}$. To satisfy
the boundary condition in Eq. (10.54), we need that $A_{+} = 0$, which leaves $p(S) = A_{-}S^{\gamma_{-}}$.
Evaluating this function at $S_*$, as in Eq. (10.52), and using the smooth pasting condition in
Eq. (10.55), yields:
\[ \begin{cases} p(S_*) = A_{-}S_*^{\gamma_{-}} = K - S_* \\ p_S(S_*) = \gamma_{-}A_{-}S_*^{\gamma_{-}-1} = -1 \end{cases} \tag{10.57} \]
The endogenous variables of this system are the two constants $A_{-}$ and $S_*$. We have:
\[ S_* = \frac{2r}{2r + \sigma^{2}}\,K, \tag{10.58} \]
and $A_{-} = (K - S_*)\,S_*^{-\gamma_{-}}$, such that
\[ p(S) = (K - S_*)\left(\frac{S}{S_*}\right)^{\gamma_{-}}. \]
A few comments are in order. First, Eq. (10.58) shows that the value to wait increases with
σ², as the exercise boundary S∗ falls when σ² rises. Second, when the short-term rate is zero,
S∗ = 0, meaning it is never optimal to exercise at any positive asset price; in this limit,
γ− = 0 and the pricing function flattens out to p (S) = K, the largest payoff the put can ever
deliver, rather than the option being worthless. Intuitively, in the stopping region, the expected return on the
option under the risk-neutral probability is less than that on a bank deposit. When r = 0, this
expected return is negative, which destroys the time-value of money argument underpinning
early exercise.
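The closed-form solution of the perpetual put can be checked numerically. The sketch below codes the boundary in Eq. (10.58) and the pricing function that follows it, and then verifies value matching, smooth pasting and the continuation-region equation (10.53) by finite differences. The parameter values, the step sizes and the function name are hypothetical.

```python
def perpetual_put(K, r, sigma):
    """Free boundary S* of Eq. (10.58) and the pricing function p(S)."""
    gamma_minus = -2.0 * r / sigma ** 2
    S_star = 2.0 * r / (2.0 * r + sigma ** 2) * K
    def price(S):
        if S <= S_star:
            return K - S                                        # stopping region
        return (K - S_star) * (S / S_star) ** gamma_minus       # continuation region
    return S_star, price

K, r, sigma = 100.0, 0.05, 0.3   # hypothetical inputs
S_star, p = perpetual_put(K, r, sigma)

# Value matching, Eq. (10.52), and smooth pasting, Eq. (10.55), from the right of S*.
h = 1e-3
value_gap = p(S_star) - (K - S_star)
right_slope = (p(S_star + h) - p(S_star)) / h

# Continuation-region equation (10.53), L[p] - r p = 0, checked at S = 80.
S, k = 80.0, 1e-2
p_S = (p(S + k) - p(S - k)) / (2.0 * k)
p_SS = (p(S + k) - 2.0 * p(S) + p(S - k)) / k ** 2
ode_residual = 0.5 * sigma ** 2 * S ** 2 * p_SS + r * S * p_S - r * p(S)
```

With these inputs the boundary is S∗ = 0.1/0.19 × 100, roughly 52.63; `value_gap` and `ode_residual` are zero up to numerical error, and `right_slope` is close to −1.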
10.8.3 Perpetual calls

As anticipated, not every payoff gives rise to well-defined stopping and continuation regions, such
as those in Eqs. (10.50)-(10.51). For call options, where $\psi(S) = (S - K)^{+}$, it is never optimal
to exercise early, when the underlying assets do not pay dividends. To illustrate, we follow the
same reasoning as in the previous subsection, and find that the call price, c (S), has the same
functional form as in Eq. (10.56), with the same values of $\gamma_{-}$ and $\gamma_{+}$. However, it satisfies the
boundary condition $\lim_{S\to 0} c(S) = 0$, rather than $\lim_{S\to\infty} c(S) = 0$, as the put price does in Eq.
(10.54). Therefore, we must have that $c(S) = A_{+}S^{\gamma_{+}}$, where, recall, $\gamma_{+} = 1$. The counterparts
to the two Eqs. (10.57), then, are $c(S_*) = A_{+}S_* = S_* - K$ and $c_S(S_*) = A_{+} = 1$. These two
conditions cannot hold at the same time for K > 0, such that the
option price fails to satisfy the smooth pasting condition.
[With dividends]
10.9 A few exotics

10.10 Market imperfections
10.11 Appendix 1: Additional details on Black & Scholes

10.11.1 The original arguments
The original arguments in Black and Scholes (1973) and Merton (1973) rely on the assumption the
option is already traded. Let dS/ S = µdτ + σdW . Create a self-financing portfolio of $\bar{n}_S$ units of the
underlying asset and $n_C$ units of the European call option, where $\bar{n}_S$ is an arbitrary number. Such a
portfolio is worth $V = \bar{n}_S S + n_C C$ and since it is self-financing it satisfies:
\[ \begin{aligned} dV &= \bar{n}_S\,dS + n_C\,dC \\ &= \bar{n}_S\,dS + n_C\left[C_S\,dS + \left(C_\tau + \tfrac{1}{2}\sigma^{2}S^{2}C_{SS}\right)d\tau\right] \\ &= \left(\bar{n}_S + n_C C_S\right)dS + n_C\left(C_\tau + \tfrac{1}{2}\sigma^{2}S^{2}C_{SS}\right)d\tau \end{aligned} \]
where the second line follows from Itô’s lemma. Therefore, the portfolio is locally riskless whenever
\[ n_C = -\bar{n}_S\,\frac{1}{C_S}, \]
in which case V must appreciate at the r-rate
\[ \frac{dV}{V} = \frac{n_C\left(C_\tau + \tfrac{1}{2}\sigma^{2}S^{2}C_{SS}\right)d\tau}{\bar{n}_S S + n_C C} = \frac{-\frac{1}{C_S}\left(C_\tau + \tfrac{1}{2}\sigma^{2}S^{2}C_{SS}\right)}{S - \frac{1}{C_S}C}\,d\tau = r\,d\tau. \]
The last equality, plus the boundary condition, lead to the Black-Scholes partial differential equation.
10.11.2 Delta
We have:
\[ \frac{\partial BS}{\partial S} = N(d_1). \tag{10A.1} \]
Indeed, the Black-Scholes formula is homogeneous of degree one in S and K, that is, BS(λS, λK) =
λBS(S, K). Therefore, by Euler’s theorem,
\[ BS(S,K) = \frac{\partial BS}{\partial S}\,S + \frac{\partial BS}{\partial K}\,K, \]
and Eq. (10A.1) then follows by identifying terms in the Black-Scholes formula.
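Both Eq. (10A.1) and the Euler decomposition can be verified numerically with central finite differences. The parameter values and step size below are hypothetical, and the helper names are introduced only for this illustration.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1(S, K, r, sigma, tau):
    return (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price, BS(S, K)."""
    a = d1(S, K, r, sigma, tau)
    b = a - sigma * math.sqrt(tau)
    return S * norm_cdf(a) - K * math.exp(-r * tau) * norm_cdf(b)

S, K, r, sigma, tau, h = 100.0, 95.0, 0.03, 0.2, 0.5, 1e-3   # hypothetical inputs

# Finite-difference partial derivatives with respect to S and K.
dBS_dS = (bs_call(S + h, K, r, sigma, tau) - bs_call(S - h, K, r, sigma, tau)) / (2.0 * h)
dBS_dK = (bs_call(S, K + h, r, sigma, tau) - bs_call(S, K - h, r, sigma, tau)) / (2.0 * h)

delta_closed_form = norm_cdf(d1(S, K, r, sigma, tau))   # Eq. (10A.1)
euler_sum = dBS_dS * S + dBS_dK * K                     # Euler's theorem
price = bs_call(S, K, r, sigma, tau)
```

The finite-difference delta `dBS_dS` should match `delta_closed_form`, and `euler_sum` should reproduce `price`, both up to numerical error.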
10.12 Appendix 2: Stochastic volatility

10.12.1 Proof of the Hull and White (1987) equation
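By the law of iterated expectations, Eq. (10.24) can be written as:
\[ \begin{aligned} C\left(S(t),\sigma^{2}(t),t,T\right) &= e^{-r(T-t)}E\left\{\left[S(T)-K\right]^{+}\,\middle|\,S(t),\sigma^{2}(t)\right\} \\ &= E\left\{E\left[e^{-r(T-t)}\left[S(T)-K\right]^{+}\,\middle|\,S(t),\left\{\sigma^{2}(\tau)\right\}_{\tau\in[t,T]}\right]\,\middle|\,S(t),\sigma^{2}(t)\right\} \\ &= E\left[BS\left(S(t),t,T;\tilde{V}\right)\,\middle|\,S(t),\sigma^{2}(t)\right] \\ &= E\left[BS\left(S(t),t,T;\tilde{V}\right)\,\middle|\,\sigma^{2}(t)\right] \\ &= \int BS\left(S(t),t,T;\tilde{V}\right)\Pr\left(\tilde{V}\,\middle|\,\sigma^{2}(t)\right)d\tilde{V} \\ &\equiv E_{\tilde{V}}\left[BS\left(S(t),t,T;\tilde{V}\right)\right], \end{aligned} \tag{10A.2} \]
where $\Pr(\tilde{V}\,|\,\sigma^{2}(t))$ is the density of $\tilde{V}$ conditional on the current volatility value $\sigma^{2}(t)$.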
In other terms, the price of an option on an asset with stochastic volatility is the expectation of
the Black-Scholes formula over the distribution of the average (random) volatility $\tilde{V}$. To better
understand this result, all we have to understand is that conditionally on the volatility path $\{\sigma^{2}(\tau)\}_{\tau\in[t,T]}$,
$\ln\frac{S(T)}{S(t)}$ is normally distributed under the risk-neutral probability measure. To see this, note that
under the risk-neutral probability measure,
\[ \ln\frac{S(T)}{S(t)} = r(T-t) - \frac{1}{2}\int_t^{T}\sigma^{2}(\tau)\,d\tau + \int_t^{T}\sigma(\tau)\,dW(\tau). \]
Therefore, conditionally upon the volatility path $\{\sigma(\tau)\}_{\tau\in[t,T]}$,
\[ E\left[\ln\frac{S(T)}{S(t)}\right] = r(T-t) - \frac{1}{2}(T-t)\tilde{V} \quad\text{and}\quad \operatorname{var}\left[\ln\frac{S(T)}{S(t)}\right] = \int_t^{T}\sigma^{2}(\tau)\,d\tau = (T-t)\tilde{V}. \]
This shows the claim. It also shows that the Black-Scholes formula can be applied to compute the
inner expectation of the second line of Eq. (10A.2). And this produces the third line of Eq. (10A.2).
The fourth line is trivial to obtain. Given the result of the third line, the only thing that matters in
the remaining conditional distribution is the conditional probability $\Pr(\tilde{V}\,|\,\sigma^{2}(t))$, and we are done.
10.12.2 Simple smile analytics
10.13 Appendix 3: Local volatility and volatility contracts

In all the proofs to follow, all expectations are taken to be expectations conditional on Ft . However,
to simplify notation, we simply write E ( ·| ·) ≡ E ( ·| ·, Ft ).
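Proof of Eqs. (10.35) and (10.36). We first derive Eq. (10.35), a result encompassing Eq.
(10.33). By assumption,
\[ \frac{dS_t}{S_t} = r\,dt + \sigma_t\,d\hat{W}_t, \]
where $\sigma_t$ is some $\mathcal{F}_t$-adapted process. For example, $\sigma_t \equiv \sigma(S_t,t)\cdot v_t$, all t, where $v_t$ is solution to the
2nd equation in (10.34). Next, by assumption we are observing a set of option prices C (K, T ) with a
continuum of strikes K and maturities T . We have,
\[ C(K,T) = e^{-r(T-t)}E\left(S_T-K\right)^{+}, \tag{10A.3} \]
and
\[ \frac{\partial}{\partial K}C(K,T) = -e^{-r(T-t)}E\left(\mathbb{I}_{S_T\ge K}\right). \tag{10A.4} \]
For fixed K,
\[ d_T\left(S_T-K\right)^{+} = \left[\mathbb{I}_{S_T\ge K}\,rS_T + \frac{1}{2}\delta\left(S_T-K\right)\sigma_T^{2}S_T^{2}\right]dT + \mathbb{I}_{S_T\ge K}\,\sigma_T S_T\,d\hat{W}_T, \]
where δ is the Dirac’s delta. Hence, by the decomposition $(S_T-K)^{+} + K\,\mathbb{I}_{S_T\ge K} = S_T\,\mathbb{I}_{S_T\ge K}$,
\[ \frac{dE\left(S_T-K\right)^{+}}{dT} = r\left[E\left(S_T-K\right)^{+} + K\,E\left(\mathbb{I}_{S_T\ge K}\right)\right] + \frac{1}{2}E\left[\delta\left(S_T-K\right)\sigma_T^{2}S_T^{2}\right]. \]
By multiplying throughout by $e^{-r(T-t)}$, and using (10A.3)-(10A.4),
\[ e^{-r(T-t)}\frac{dE\left(S_T-K\right)^{+}}{dT} = r\left[C(K,T) - K\frac{\partial C(K,T)}{\partial K}\right] + \frac{1}{2}e^{-r(T-t)}E\left[\delta\left(S_T-K\right)\sigma_T^{2}S_T^{2}\right]. \tag{10A.5} \]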
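We have,
\[ \begin{aligned} E\left[\delta\left(S_T-K\right)\sigma_T^{2}S_T^{2}\right] &= \iint\delta\left(S_T-K\right)\sigma_T^{2}S_T^{2}\,\underbrace{\phi_T\left(\sigma_T\,\middle|\,S_T\right)\phi_T\left(S_T\right)}_{\equiv\ \text{joint density of }(\sigma_T,S_T)}\,dS_T\,d\sigma_T \\ &= \int\sigma_T^{2}\left[\int\delta\left(S_T-K\right)S_T^{2}\,\phi_T\left(S_T\right)\phi_T\left(\sigma_T\,\middle|\,S_T\right)dS_T\right]d\sigma_T \\ &= K^{2}\phi_T(K)\int\sigma_T^{2}\,\phi_T\left(\sigma_T\,\middle|\,S_T=K\right)d\sigma_T \\ &\equiv K^{2}\phi_T(K)\,E\left[\sigma_T^{2}\,\middle|\,S_T=K\right]. \end{aligned} \]
By replacing this result into Eq. (10A.5), and using the famous relation
\[ \frac{\partial^{2}C(K,T)}{\partial K^{2}} = e^{-r(T-t)}\phi_T(K) \tag{10A.6} \]
(which easily follows by differentiating once again Eq. (10A.4)), we obtain
\[ e^{-r(T-t)}\frac{dE\left(S_T-K\right)^{+}}{dT} = r\left[C(K,T) - K\frac{\partial C(K,T)}{\partial K}\right] + \frac{1}{2}K^{2}\frac{\partial^{2}C(K,T)}{\partial K^{2}}E\left[\sigma_T^{2}\,\middle|\,S_T=K\right]. \tag{10A.7} \]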
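We also have,
\[ \frac{\partial}{\partial T}C(K,T) = -rC(K,T) + e^{-r(T-t)}\frac{\partial E\left(S_T-K\right)^{+}}{\partial T}. \]
Therefore, by replacing the previous equality into Eq. (10A.7), and by rearranging terms,
\[ \frac{\partial}{\partial T}C(K,T) = -rK\frac{\partial C(K,T)}{\partial K} + \frac{1}{2}K^{2}\frac{\partial^{2}C(K,T)}{\partial K^{2}}E\left[\sigma_T^{2}\,\middle|\,S_T=K\right]. \]
That is,
\[ E\left[\sigma_T^{2}\,\middle|\,S_T=K\right] = 2\,\frac{\dfrac{\partial C(K,T)}{\partial T} + rK\dfrac{\partial C(K,T)}{\partial K}}{K^{2}\dfrac{\partial^{2}C(K,T)}{\partial K^{2}}} \equiv \sigma_{\mathrm{loc}}(K,T)^{2}. \tag{10A.8} \]
As an example, let $\sigma_t \equiv \sigma(S_t,t)\cdot v_t$, where $v_t$ is solution to the 2nd equation in (10.34). Then,
\[ \begin{aligned} \sigma_{\mathrm{loc}}(K,T)^{2} &= E\left[\sigma_T^{2}\,\middle|\,S_T=K\right] \\ &= E\left[\sigma(S_T,T)^{2}\cdot v_T^{2}\,\middle|\,S_T=K\right] \\ &= \sigma(K,T)^{2}\,E\left[v_T^{2}\,\middle|\,S_T=K\right] \\ &\equiv \tilde{\sigma}_{\mathrm{loc}}(K,T)^{2}\,E\left[v_T^{2}\,\middle|\,S_T=K\right], \end{aligned} \]
which proves Eq. (10.35).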

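Next, we prove Eq. (10.36). We have,
\[ \begin{aligned} E\left(\sigma_T^{2}\right) &= \int_0^{\infty}E\left[\sigma_T^{2}\,\middle|\,S_T=K\right]\phi_T(K)\,dK \\ &= 2\int_0^{\infty}\frac{\dfrac{\partial C(K,T)}{\partial T} + rK\dfrac{\partial C(K,T)}{\partial K}}{K^{2}\dfrac{\partial^{2}C(K,T)}{\partial K^{2}}}\,\phi_T(K)\,dK \\ &= 2e^{r(T-t)}\int_0^{\infty}\frac{\dfrac{\partial C(K,T)}{\partial T} + rK\dfrac{\partial C(K,T)}{\partial K}}{K^{2}}\,dK, \end{aligned} \]
where the 2nd line follows by Eq. (10A.8), and the third line follows by Eq. (10A.6). This proves Eq.
(10.36).
Proof of Eq. (10.37). If r = 0, Eq. (10.36) collapses to,
\[ E\left(\sigma_T^{2}\right) = 2\int_0^{\infty}\frac{\dfrac{\partial C(K,T)}{\partial T}}{K^{2}}\,dK. \]
Then, we have,
\[ E\left[\operatorname{var}(T_1,T_2)\right] = \int_{T_1}^{T_2}E\left(\sigma_u^{2}\right)du = 2\int_0^{\infty}\frac{1}{K^{2}}\left[\int_{T_1}^{T_2}\frac{\partial C(K,u)}{\partial T}\,du\right]dK = 2\int_0^{\infty}\frac{C(K,T_2) - C(K,T_1)}{K^{2}}\,dK. \]
Proof of Eq. (10.38). By the standard Taylor expansion with remainder, we have that for any
function f smooth enough,
\[ f(x) = f(x_0) + f'(x_0)(x-x_0) + \int_{x_0}^{x}(x-t)\,f''(t)\,dt. \]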
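Let $F_t$ be the forward price, $F_t = e^{r(T-t)}S_t$. By applying this formula to $\ln F_T$, with the
integration variable in the remainder term denoted by s to avoid confusion with time t,
\[ \begin{aligned} \ln F_T &= \ln F_t + \frac{1}{F_t}\left(F_T-F_t\right) - \int_{F_t}^{F_T}\left(F_T-s\right)\frac{1}{s^{2}}\,ds \\ &= \ln F_t + \frac{1}{F_t}\left(F_T-F_t\right) - \int_0^{F_t}\left(K-F_T\right)^{+}\frac{1}{K^{2}}\,dK - \int_{F_t}^{\infty}\left(F_T-K\right)^{+}\frac{1}{K^{2}}\,dK \\ &= \ln F_t + \frac{1}{F_t}\left(F_T-F_t\right) - \int_0^{F_t}\left(K-S_T\right)^{+}\frac{1}{K^{2}}\,dK - \int_{F_t}^{\infty}\left(S_T-K\right)^{+}\frac{1}{K^{2}}\,dK, \end{aligned} \tag{10A.9} \]
where the second equality follows because $\int_{x_0}^{x}(x-s)\frac{1}{s^{2}}\,ds = \int_0^{x_0}(s-x)^{+}\frac{1}{s^{2}}\,ds + \int_{x_0}^{\infty}(x-s)^{+}\frac{1}{s^{2}}\,ds$, and
the third equality follows because the forward price at T satisfies $F_T = S_T$. Hence, by $E(F_T) = F_t$,
\[ -E\left[\ln\frac{F_T}{F_t}\right] = e^{r(T-t)}\left[\int_0^{F_t}\frac{P_t(K,T)}{K^{2}}\,dK + \int_{F_t}^{\infty}\frac{C_t(K,T)}{K^{2}}\,dK\right]. \tag{10A.10} \]
On the other hand, by Itô’s lemma,
\[ E\left[\int_t^{T}\sigma_u^{2}\,du\right] = -2E\left[\ln\frac{F_T}{F_t}\right]. \tag{10A.11} \]
Replacing Eq. (10A.11) into Eq. (10A.10) yields Eq. (10.38).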
Remark A1. The previous results hold when the short-term rate is constant. The case of
stochastic interest rates is easily dealt with, when they are independent of the asset price. In this case,
the factor $e^{r(T-t)}$ in Eq. (10A.10) is replaced by the inverse of the bond price:
\[ -E\left[\ln\frac{F_T}{F_t}\right] = \frac{1}{P(t,T)}\left[\int_0^{F_t}\frac{P_t(K,T)}{K^{2}}\,dK + \int_{F_t}^{\infty}\frac{C_t(K,T)}{K^{2}}\,dK\right], \]
where P (t, T ) is the price of a zero at time t and expiring at time T . If interest rates and asset
prices are not independent, the variance contracts examined in this chapter cannot be expressed in a
model-free format.
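Remark A2. For simplicity, let r = 0. The proof in this appendix reveals that if
$\frac{dC(K,T)}{dT} = \frac{dE(S_T-K)^{+}}{dT}$, then, volatility must be restricted in a way to make
$\sigma^{2} = 2\frac{\partial C(K,T)}{\partial T}\Big/\left(K^{2}\frac{\partial^{2}C(K,T)}{\partial K^{2}}\right)$. We
show the converse is true. The Fokker-Planck equation for the risk-neutral density is:
\[ \frac{1}{2}\frac{\partial^{2}}{\partial x^{2}}\left(x^{2}\sigma^{2}\phi\right) = \frac{\partial}{\partial t}\phi, \qquad t, x \text{ forward.} \]
For simplicity, we may ignore those ill-posedness issues related to Eq. (10A.6), dealt with in Tikhonov
and Arsenin (1977), and then, we have that $\phi = \frac{\partial^{2}C}{\partial x^{2}}$. Replacing
$\sigma^{2} = 2\frac{\partial C(x,T)}{\partial T}\Big/\left(x^{2}\frac{\partial^{2}C(x,T)}{\partial x^{2}}\right)$ into the
Fokker-Planck equation leaves:
\[ \frac{\partial^{2}}{\partial x^{2}}\left[\frac{\dfrac{\partial C(x,T)}{\partial T}}{\dfrac{\partial^{2}C(x,T)}{\partial x^{2}}}\,\phi\right] = \frac{\partial}{\partial t}\phi. \]
This equation is satisfied by $\phi = \frac{\partial^{2}C}{\partial x^{2}}$.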
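Proof that $(\hat{\theta}_\tau, \hat{\psi}_\tau)$ in Eq. (10.48) is self-financed. For a portfolio strategy to be self-financed,
we need to have $\psi_\tau M_\tau = V_\tau - \theta_\tau S_\tau$ and $dV_\tau = \theta_\tau\,dS_\tau + \psi_\tau\,dM_\tau$, or:
\[ dV_\tau = \theta_\tau S_\tau\frac{dS_\tau}{S_\tau} + \psi_\tau M_\tau\frac{dM_\tau}{M_\tau} = \theta_\tau S_\tau\left(\frac{dS_\tau}{S_\tau} - r\,d\tau\right) + rV_\tau\,d\tau, \tag{10A.12} \]
where the second equality follows by $\psi_\tau M_\tau = V_\tau - \theta_\tau S_\tau$. With $(\hat{\theta}_\tau, \hat{\psi}_\tau)$, we have that:
\[ \begin{aligned} d\hat{V}_\tau &= \hat{\theta}_\tau\,dS_\tau + \hat{\psi}_\tau\,dM_\tau \\ &= \frac{M_\tau}{M_T}\frac{dS_\tau}{S_\tau} + \hat{\psi}_\tau M_\tau\,r\,d\tau \\ &= \frac{M_\tau}{M_T}\frac{dS_\tau}{S_\tau} + \left(\hat{V}_\tau - \frac{M_\tau}{M_T}\right)r\,d\tau \\ &= \frac{M_\tau}{M_T}\left(\frac{dS_\tau}{S_\tau} - r\,d\tau\right) + r\hat{V}_\tau\,d\tau, \end{aligned} \tag{10A.13} \]
where we have used the portfolio weights in Eq. (10.48) and the expression for the portfolio value $\hat{V}$
in Eq. (10.49). Eq. (10A.13) is the same as Eq. (10A.12), once we use the portfolio weight $\hat{\theta}_\tau$ in Eq.
(10.48). Therefore, $(\hat{\theta}_\tau, \hat{\psi}_\tau)$ is self-financed.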
References
Ball, C.A. and A. Roma (1994): “Stochastic Volatility Option Pricing.” Journal of Financial
and Quantitative Analysis 29, 589-607.
Black, F. (1976): “Studies of Stock Price Volatility Changes.” Proceedings of the 1976 Meeting
of the American Statistical Association, 177-81.
Bollerslev, T. (1986): “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of

Econometrics 31, 307-327.
Bollerslev, T., Engle, R. and D. Nelson (1994): “ARCH Models.” In: McFadden, D. and R.
Engle (Editors): Handbook of Econometrics (Volume 4), 2959-3038. Amsterdam, North-
Holland
Christie, A.A. (1982): “The Stochastic Behavior of Common Stock Variances: Value, Leverage,
and Interest Rate Effects.” Journal of Financial Economics 10, 407-432.
Clark, P. K. (1973): “A Subordinated Stochastic Process Model with Fixed Variance for Spec-
ulative Prices.” Econometrica 41, 135-156.
Corradi, V. (2000): “Reconsidering the Continuous Time Limit of the GARCH(1,1) Process.”
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985): “A Theory of the Term Structure of Interest
Rates.” Econometrica 53, 385-407.
El Karoui, N., M. Jeanblanc-Picqué and S. Shreve (1998): “Robustness of the Black and
Scholes Formula.” Mathematical Finance 8, 93-126.
Engle, R.F. (1982): “Autoregressive Conditional Heteroskedasticity with Estimates of the Vari-
ance of United Kingdom Inflation.” Econometrica 50, 987-1008.
Fama, E. (1965): “The Behaviour of Stock Market Prices.” Journal of Business 38, 34-105.
Gallmeyer, M., Aydemir, A.C. and B. Hollifield (2007): “Financial Leverage and the Leverage
Effect: A Market and a Firm Analysis.” working paper Carnegie Mellon.
Heston, S.L. (1993a): “Invisible Parameters in Option Prices.” Journal of Finance 48, 933-947.
Heston, S.L. (1993b): “A Closed Form Solution for Options with Stochastic Volatility with
Applications to Bond and Currency Options.” Review of Financial Studies 6, 327-344.
Hull, J. and A. White (1987): “The Pricing of Options with Stochastic Volatilities.” Journal
of Finance 42, 281-300.
Mandelbrot, B. (1963): “The Variation of Certain Speculative Prices.” Journal of Business

36, 394-419.
Mele, A. (1998): Dynamiques non linéaires, volatilité et équilibre. Paris: Editions Economica.
Mele, A. and F. Fornari (2000): Stochastic Volatility in Financial Markets. Crossing the Bridge
to Continuous Time. Boston: Kluwer Academic Publishers.
Merton, R. (1973): “Theory of Rational Option Pricing.” Bell Journal of Economics and
Management Science 4, 637-654.
Nelson, D.B. (1990): “ARCH Models as Diffusion Approximations.” Journal of Econometrics

45, 7-38.
Renault, E. (1997): “Econometric Models of Option Pricing Errors.” In: Kreps, D., Wallis, K.
(Editors): Advances in Economics and Econometrics (Volume 3), 223-278. Cambridge:
Cambridge University Press.
Scott, L. (1987): “Option Pricing when the Variance Changes Randomly: Theory, Estimation,
and an Application.” Journal of Financial and Quantitative Analysis 22, 419-438.
Tauchen, G. and M. Pitts (1983): “The Price Variability-Volume Relationship on Speculative

Markets.” Econometrica 51, 485-505.
Taylor, S. (1986): Modeling Financial Time Series. Chichester, UK: Wiley.
Tikhonov, A. N. and V. Y. Arsenin (1977): Solutions to Ill-Posed Problems. Wiley, New York.
Vasicek, O. (1977): “An Equilibrium Characterization of the Term Structure.” Journal of

Wiggins, J. (1987): “Option Values and Stochastic Volatility: Theory and Empirical Esti-
mates.” Journal of Financial Economics 19, 351-372.
11
Interest rates
11.1 Prices and interest rates

11.1.1 Introduction
A pure-discount, or zero-coupon, bond is a contract that guarantees one unit of numéraire at
some maturity date. Apart from isolated exceptions, we only consider pure-discount bonds,
which we will then simply call bonds, to simplify the exposition. Also, with the exception of
Section 11.3.6, we assume no default risk. Default risk is, instead, more systematically dealt
with in the next chapter.
Let [t, T ] be a fixed time interval, and for each τ ∈ [t, T ], let P (τ , T ) be the price as of
time τ of a bond maturing at T > t. The information in this chapter is Brownian, except
for the jump-diffusion models in Section 11.3.6. The price of a bond in this chapter, then, is
driven by some multidimensional diffusion process {y (τ )}τ ≥t , which we emphasize by writing
P (y (τ ) , τ , T ) ≡ P (τ , T ). As an example, y can be a scalar diffusion, and r = y can be the
short-term rate. In this particular example, bond prices are driven by short-term rate movements
through the bond pricing function P (r, τ , T ). The exact functional form of the pricing function
is determined by (i) the assumptions made as regards the short-term rate dynamics and (ii)
the Fundamental Theorem of Asset Pricing (henceforth, FTAP). The bond pricing function in
the general multidimensional case is obtained following the same route. Models of this kind are
presented in Section 11.3.
A second class of models is one where bond prices cannot be expressed as a function of
any state variable. Rather, current bond prices are taken as primitives, and forward rates
(i.e., interest rates prevailing today for borrowing in the future) are multidimensional diffusion
processes. There is a relation linking bond prices to forward rates. The FTAP restricts the
dynamic behavior of future bond prices and forward rates. Models belonging to this second
class are analyzed in Section 11.4.
The aim of this chapter is to develop the simplest foundations of the previously described
two approaches to interest rate modeling. In the next section, we provide definitions of interest
rates and markets. Section 11.1.3 develops the two basic representations of bond prices: one in
terms of the short-term rate; and the other in terms of forward rates. Section 11.1.5 develops
the foundations of the so-called forward martingale probability, which is a probability measure
under which forward interest rates are martingales. It is an important tool of analysis. [ ... ]
11.1.2 Markets and interest rate conventions

11.1.2.1 Markets for interest rates
There are three main types of markets for interest rates: (i) LIBOR; (ii) Treasury rate; (iii)
Repo rate (or repurchase agreement rate).
LIBOR (London Interbank Offered Rate) and other interbank rates
Many large financial institutions trade deposits with each other, in a given currency, for
maturities ranging from overnight to one year. The LIBOR is the rate at which these financial
institutions are willing to lend, on average. It is an average indicative quote of the interbank
lending market. It is calculated by Thomson Reuters for ten currencies, and published daily by
the British Bankers’ Association. By contrast, the LIBID (London Interbank Bid Rate) is the rate
that these financial institutions are prepared to pay to borrow money, on average. Normally,
LIBID < LIBOR. The LIBOR is a fundamental point of reference for financial institutions,
which look at it as an opportunity cost of capital. Moreover, many fixed income instruments
are indexed to the LIBOR: forward rate agreements, interest rate swaps, or variable mortgage
rates.
The LIBOR is distinct from the US Federal Funds rate. Banks have to maintain reserves with
the Federal Reserve to partially back deposits and to clear financial transactions. Federal Funds
transactions are loans from banks with excess reserves at the Fed, which earn no interest, to
banks with reserve deficiencies. The Federal Funds rate is the overnight rate at which banks lend
these reserves to each other. The Federal Funds rate is affected by the Federal Reserve Bank of
New York (FRBNY), which aims to make it lie within a range of the target rate decided by the
governors at Federal Open Market Committee meetings. This range is “maintained” through
open market operations.
Treasury rate
This is the rate at which a given Government can borrow in a given currency.
Repo rate (or repurchase agreement rate)
A Repo agreement is a contract by which one counterparty sells some assets to another one,
with the obligation to buy these assets back at some future date. The assets act as collateral.
The rate at which such a transaction is made is the repo rate. One-day repo agreements give
rise to overnight repos. Longer-term agreements give rise to term repos.
Spreads
Interest rate spreads isolate interesting pieces of information, as they remove components
common to the interest rates generating the spreads, which we might not be interested in. An
important example involves the overnight index swap (OIS) rate, which is the swap rate in a swap
agreement of fixed against variable interest rate payments, where the variable payments are
indexed to an overnight reference rate, typically an average unsecured interbank overnight
rate, such as the Federal Funds rate in the US, SONIA in the UK or EONIA in the Euro area.
(See Section 11.7.5 for definitions of swaps and swap rates.) An interesting indicator, then, is
the “3-month LIBOR − 3-month OIS” spread, also known as the LIBOR-OIS spread. Because
payments relating to overnight rates are not subject to default risk, and the overnight rate
is “anchored” to monetary policy, the LIBOR-OIS spread is capable of isolating credit views
about financial institutions. It is generally flat, although it reached record-high levels during
the 2007 subprime crisis (see Figure 11.1). By contrast, the so-called TED (Treasury bill rate
− Eurodollar LIBOR) spread also captures “flight to quality” effects occurring during times
of crisis, when Treasury bonds are considered particularly valuable. For this “flight to quality”
reason, the TED spread might fail to isolate views about developments in the interbank market.
FIGURE 11.1. Antonio Mele does not claim any copyright on this picture, which is taken from Brun-
nermeier (2009). The picture has been put here for illustrative purposes only, and permission to the
author shall be duly asked before the book will be published.
On a historical note, the Federal Funds rate has been the object of much empirical research.
In an attempt to explain how the “credit view” contributes to growth more than Friedman’s
monetary view, Bernanke and Blinder (1992) show that the Federal Funds rate makes the
predictive power of M1 growth insignificant. This finding initially generated enthusiasm about the
ability of this rate to explain short-run aggregate fluctuations. However, as surveyed for example
by Stock and Watson (2003), the explanatory power of the Federal Funds rate evaporates once
we condition on the term spread, a fact we comment on in Section 11.1.4 below.
11.1.2.2 Mathematical definitions of interest rates
The simplest definition is that of simply-compounded interest rates. A simply-compounded
interest rate at time τ , for the time interval [τ , T ], is defined as the solution L to the following
equation:
\[
P(\tau,T) = \frac{1}{1+(T-\tau)\,L(\tau,T)}. \tag{11.1}
\]
This definition is intuitive, and is the most widely used in market practice. As an example,
LIBOR rates are computed in this way. In this case, P (τ , T ) is generally interpreted as the initial
amount of money to invest at time τ to obtain $1 at time T .
Given L(τ , T ), the short-term rate process r is obtained as:
\[
r(\tau) \equiv \lim_{T\downarrow\tau} L(\tau,T).
\]
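As a small numerical illustration of Eq. (11.1), the following sketch converts a bond price into a simply-compounded rate and back. The function names and the quoted price of 0.99 are hypothetical, chosen only for illustration.

```python
def simple_rate(price: float, tau: float, T: float) -> float:
    """Solve Eq. (11.1) for L(tau, T), given the bond price P(tau, T)."""
    if T <= tau:
        raise ValueError("maturity T must be later than tau")
    return (1.0 / price - 1.0) / (T - tau)


def price_from_simple_rate(L: float, tau: float, T: float) -> float:
    """Eq. (11.1): price at tau of a bond paying one unit at T."""
    return 1.0 / (1.0 + (T - tau) * L)


# Hypothetical quote: a three-month bond (T - tau = 0.25 years) trading at 0.99.
L_3m = simple_rate(0.99, 0.0, 0.25)
print(f"simply-compounded 3-month rate: {L_3m:.4%}")
```

Shrinking T − τ towards zero in `simple_rate`, for a given family of bond prices, would approximate the short-term rate r(τ) defined above.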
11.1.3 The yield curve and forward rates

11.1.3.1 The yield curve
The yield-to-maturity is defined to be the function R (t, T ) such that:
\[
P(t,T) \equiv e^{-(T-t)\,R(t,T)}. \tag{11.2}
\]
It is a sort of “average rate” for investing from time t to time T > t. The function T → R (t, T )
is called the yield curve, or the term structure of interest rates.
A related, and widely used, concept is the par yield curve. Let B (t, T ) be the current
price of a coupon-bearing bond. This bond pays off the principal of $1 at expiry T , as well as a
known sequence of coupons C (t, T ) at t + 1, t + 2, · · · , T , such that, in the absence of arbitrage
and any other frictions, its price is:
\[
B(t,T) = C(t,T)\sum_{i=1}^{T-t} P(t,t+i) + P(t,T).
\]
Note that C (t, T ) is fixed at time t. A par bond, then, is one such that B (t, T ) = 100%, and
the par yield curve is the resulting sequence of the coupon rates C (t, T ), for T varying, viz
\[
C(t,T) = \frac{B(t,T) - P(t,T)}{\sum_{i=1}^{T-t} P(t,t+i)}, \qquad B(t,T) = 1. \tag{11.3}
\]
In other words, the coupon rates C (t, T ) have to “adjust” to make the market happy to have
the coupon-bearing bond quote at par, B (t, T ) = 1.
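The par-yield formula in Eq. (11.3) can be checked numerically. The sketch below is only illustrative: the function names and the zero-coupon prices are hypothetical, and coupons are assumed to be paid once per period, as in the text.

```python
def par_yield(discount_factors):
    """Coupon rate C(t, T) in Eq. (11.3) that sets B(t, T) = 1.

    discount_factors[i - 1] holds P(t, t + i), for i = 1, ..., T - t.
    """
    annuity = sum(discount_factors)
    return (1.0 - discount_factors[-1]) / annuity


def coupon_bond_price(coupon, discount_factors):
    """Price B(t, T) of a bond paying `coupon` each period plus $1 at expiry."""
    return coupon * sum(discount_factors) + discount_factors[-1]


# Hypothetical zero prices for maturities of 1, 2 and 3 years.
zeros = [0.97, 0.93, 0.89]
c = par_yield(zeros)
print(f"3-year par yield: {c:.4%}")
print(f"price of the par bond: {coupon_bond_price(c, zeros):.6f}")
```

With a flat, annually compounded curve, P(t, t + i) = (1 + y)^(−i), the same function returns y, which is a convenient sanity check.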
11.1.3.2 A first representation of bond prices
Let Q be a risk-neutral probability. Let E [·] denote the expectation under Q. By
the FTAP, there are no arbitrage opportunities if and only if P (τ , T ) satisfies, for all τ ∈ [t, T ],
\[
P(\tau,T) = E\left[ e^{-\int_\tau^T r(\ell)\,d\ell} \right]. \tag{11.4}
\]
A sketch of the if-part (there is no arbitrage if bond prices are as in Eq. (11.4)) is in Appendix
1. The proof is quite standard, and in fact similar to those encountered in the first part of the
book. Yet we provide it again because it highlights the specific issues relating to interest rate
modeling.
11.1.3.3 Forward rates, and a second representation of bond prices
In a forward rate agreement (FRA, henceforth), two counterparties agree that the interest
rate on a given principal in a future time-interval [T, S] will be fixed at some level K. Let
the principal be normalized to one. The FRA works as follows: at time T , the first counter-
party receives $1 from the second counterparty; at time S > T , the first counterparty pays
back $ [1 + 1 · (S − T ) K] to the second counterparty. The amount K is agreed upon at time
t. Therefore, the FRA makes it possible to lock-in future interest rates. We consider simply
compounded interest rates because this is the standard market practice.
The amount K for which the current value of the FRA is zero is called the simply-compounded
forward rate as of time t for the time-interval [T, S], and is usually denoted as F (t, T, S). A
simple argument can be used to express F (t, T, S) in terms of bond prices. Consider the following
portfolio implemented at time t. Long one bond maturing at T and short P (t, T )/ P (t, S) bonds
maturing at S, for the time period [t, S]. The initial cost of this portfolio is zero because,
\[
-P(t,T) + \frac{P(t,T)}{P(t,S)}\,P(t,S) = 0.
\]
At time T , the portfolio yields $1 (originated from the bond purchased at time t). At time S,
P (t, T )/ P (t, S) bonds maturing at S (that were shorted at t) must be purchased. But at time
S, the cost of purchasing P (t, T )/ P (t, S) bonds maturing at S is obviously $ P (t, T )/ P (t, S).
The portfolio, therefore, is acting as a FRA: it pays $1 at time T , and −$ P (t, T )/ P (t, S) at
time S. In addition, the portfolio costs nothing at time t. Therefore, the interest rate implicitly
paid in the time-interval [T, S] must be equal to the forward rate F (t, T, S), and we have:
\[
\frac{P(t,T)}{P(t,S)} = 1 + (S-T)\,F(t,T,S). \tag{11.5}
\]
Clearly, the forward rate agreed at T for the time interval [T, S] is the simply-compounded rate
applying to the same period:
\[
F(T,T,S) = L(T,S). \tag{11.6}
\]
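The replication argument behind Eq. (11.5) can be traced with a few lines of code. This is a minimal sketch: the bond prices are hypothetical, and `forward_rate` is an illustrative name.

```python
def forward_rate(P_T: float, P_S: float, T: float, S: float) -> float:
    """Simply-compounded forward rate F(t, T, S) implied by Eq. (11.5)."""
    if S <= T:
        raise ValueError("S must be later than T")
    return (P_T / P_S - 1.0) / (S - T)


# Hypothetical prices at time t of bonds maturing at T = 1 and S = 1.5 years.
P_T, P_S, T, S = 0.96, 0.94, 1.0, 1.5
F = forward_rate(P_T, P_S, T, S)

# Replicating portfolio: long one T-bond, short P_T / P_S units of the S-bond.
short_units = P_T / P_S
initial_cost = P_T - short_units * P_S   # zero, up to rounding
repayment_at_S = short_units             # each shorted S-bond costs $1 at S

print(f"F(t, T, S) = {F:.4%}")
print(f"initial cost = {initial_cost:.2e}, repayment at S = {repayment_at_S:.6f}")
```

The repayment at S equals 1 + (S − T) F(t, T, S), which is exactly the cash flow of a FRA struck at the forward rate.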
Next, we derive the value of the FRA in the general case for which K ≠ F (t, T, S). Consider
the following trade. At time t, enter a FRA for the time-interval [T, S] as a future borrower.
Come time T , honour the FRA by borrowing $1 for the time-interval [T, S] at a cost of K.
Then, lend this very same $1 at the random interest rate L(T, S). The time S payoff deriving
from this trade is:
(S − T ) [L (T, S) − K] .
The value of the FRA, which we denote as IRS(t, T, S; K), is the current market value of this
future, random payoff. By the FTAP,
\[
\begin{aligned}
\mathrm{IRS}(t,T,S;K) &= E\left[ e^{-\int_t^S r(\tau)d\tau}\,(S-T)\,[L(T,S)-K] \right] \\
&= (S-T)\,E\left[ e^{-\int_t^S r(\tau)d\tau}\,L(T,S) \right] - (S-T)\,P(t,S)\,K \\
&= E\left[ \frac{e^{-\int_t^S r(\tau)d\tau}}{P(T,S)} \right] - [1+(S-T)K]\,P(t,S) \\
&= P(t,T) - [1+(S-T)K]\,P(t,S),
\end{aligned} \tag{11.7}
\]
where the third equality holds by the definition of L in Eq. (11.1), and the fourth equality follows
from the following relation (see footnote 1):
\[
P(t,T) = E\left[ \frac{e^{-\int_t^S r(\tau)d\tau}}{P(T,S)} \right]. \tag{11.8}
\]
Finally, by substituting Eq. (11.5) into Eq. (11.7),
IRS (t, T, S; K) = (S − T ) [F (t, T, S) − K] P (t, S) . (11.9)
As is clear, IRS can take on any sign, and is exactly zero when K = F (t, T, S), where
F (t, T, S) solves Eq. (11.5).
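The equivalence between the last line of Eq. (11.7) and Eq. (11.9) is easy to verify numerically. The sketch below uses hypothetical bond prices and illustrative function names.

```python
def fra_value_replication(P_T, P_S, T, S, K):
    """Last line of Eq. (11.7): P(t, T) - [1 + (S - T) K] P(t, S)."""
    return P_T - (1.0 + (S - T) * K) * P_S


def fra_value_forward(P_T, P_S, T, S, K):
    """Eq. (11.9): (S - T) [F(t, T, S) - K] P(t, S), with F from Eq. (11.5)."""
    F = (P_T / P_S - 1.0) / (S - T)
    return (S - T) * (F - K) * P_S


# Hypothetical inputs: bond prices for T = 1 and S = 1.5, and a few strikes K.
P_T, P_S, T, S = 0.96, 0.94, 1.0, 1.5
for K in (0.02, 0.04, 0.06):
    v1 = fra_value_replication(P_T, P_S, T, S, K)
    v2 = fra_value_forward(P_T, P_S, T, S, K)
    print(f"K = {K:.2%}: value = {v1:+.6f} (replication), {v2:+.6f} (forward)")
```

The value changes sign as K crosses the forward rate, in line with the remark that IRS can take on any sign.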
A useful remark. Comparing the second line in Eq. (11.7) with Eq. (11.9) reveals that:
\[
F(t,T,S) = E\left[ \frac{e^{-\int_t^S r(\tau)d\tau}}{P(t,S)}\,L(T,S) \right].
\]
That is, forward rates are not unbiased expectations of future interest rates, not even under
the risk-neutral probability. We shall return to this point in Section 11.1.4.2.
Bond prices can be expressed in terms of these forward interest rates, namely in terms of the
“instantaneous” forward rates. First, rearrange terms in Eq. (11.5) so as to obtain:
\[
F(t,T,S) = -\frac{P(t,S) - P(t,T)}{(S-T)\,P(t,S)}.
\]
The instantaneous forward rate f (t, T ) is defined as
\[
f(t,T) \equiv \lim_{S\downarrow T} F(t,T,S) = -\frac{\partial \ln P(t,T)}{\partial T}. \tag{11.10}
\]
It can be interpreted as the marginal rate of return from committing a bond investment for
an additional instant. To express bond prices in terms of f , integrate Eq. (11.10),
$f(t,\ell) = -\partial \ln P(t,\ell)/\partial \ell$, with respect to the maturity date ℓ, use the condition
that P (t, t) = 1, and obtain:
\[
P(t,T) = e^{-\int_t^T f(t,\ell)\,d\ell}. \tag{11.11}
\]
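To see Eqs. (11.10) and (11.11) at work, the sketch below starts from a hypothetical yield curve, computes instantaneous forward rates by finite differences, and recovers bond prices by numerical integration. The curve `yield_curve`, the step sizes and the function names are assumptions made for illustration only.

```python
import math


def yield_curve(m: float) -> float:
    """Hypothetical yield R(t, t + m) as a function of time-to-maturity m."""
    return 0.03 + 0.02 * (1.0 - math.exp(-0.5 * m))


def bond_price(m: float) -> float:
    """Eq. (11.2): P(t, t + m) = exp(-m R(t, t + m))."""
    return math.exp(-m * yield_curve(m))


def inst_forward(m: float, h: float = 1e-5) -> float:
    """Eq. (11.10): f = -d ln P / dT, via a central finite difference."""
    return -(math.log(bond_price(m + h)) - math.log(bond_price(m - h))) / (2.0 * h)


def price_from_forwards(m: float, steps: int = 2000) -> float:
    """Eq. (11.11): P = exp(-integral of f), via the trapezoidal rule."""
    dx = m / steps
    vals = [inst_forward(i * dx) for i in range(steps + 1)]
    integral = dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.exp(-integral)


for m in (1.0, 5.0, 10.0):
    print(f"m = {m:4.1f}: P = {bond_price(m):.6f}, "
          f"P from forwards = {price_from_forwards(m):.6f}, f = {inst_forward(m):.4%}")
```

The two price columns agree up to discretization error, which is the content of Eq. (11.11).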
11.1.3.4 The “marginal revenue” nature of forward rates
Comparing Eq. (11.2) with Eq. (11.11) yields:
\[
R(t,T) = \frac{1}{T-t}\int_t^T f(t,\tau)\,d\tau. \tag{11.12}
\]
Differentiating Eq. (11.12) with respect to T yields:
\[
\frac{\partial R(t,T)}{\partial T} = \frac{1}{T-t}\left[ f(t,T) - R(t,T) \right].
\]
1 To show that Eq. (11.8) holds, suppose that at time t, $P (t, T ) are invested in a bond maturing at time T . At time T , this
investment will obviously pay off $1. And at time T , $1 can be further rolled over into a bond maturing at time S, thus yielding
$ 1/ P (T, S) at time S. Therefore, it is always possible to invest $P (t, T ) at time t and obtain a “payoff” of $ 1/ P (T, S) at time
S. By the FTAP, there are no arbitrage opportunities if and only if Eq. (11.8) holds true. Alternatively, use the law of iterated
expectations to obtain
\[
E\left[ \frac{e^{-\int_t^S r(\tau)d\tau}}{P(T,S)} \right]
= E\left[ \frac{e^{-\int_t^T r(\tau)d\tau}\; E\left[ \left. e^{-\int_T^S r(\tau)d\tau} \,\right|\, \mathcal{F}(T) \right]}{P(T,S)} \right]
= P(t,T).
\]
This relation underscores very clearly the “marginal revenue” nature of forward rates. As with
marginal and average cost in the basic theory of production, we have that: (i) if f (t, T ) < R(t, T ),
the yield curve R(t, T ) is decreasing at T ; (ii) if f(t, T ) = R(t, T ), the yield curve R(t, T ) is
stationary at T ; (iii) if f(t, T ) > R(t, T ), the yield curve R(t, T ) is increasing at T .
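The three cases above can be illustrated with a hump-shaped forward curve. In the sketch below, the forward curve `fwd` is a hypothetical choice, and the yield `yld` is its average as in Eq. (11.12), computed in closed form for this particular curve.

```python
import math


def fwd(m: float) -> float:
    """Hypothetical hump-shaped instantaneous forward curve f(t, t + m)."""
    return 0.02 + 0.04 * m * math.exp(-m)


def yld(m: float) -> float:
    """R(t, t + m) from Eq. (11.12): average of fwd over [0, m], in closed form."""
    return 0.02 + 0.04 * (1.0 - (1.0 + m) * math.exp(-m)) / m


def yld_slope(m: float, h: float = 1e-6) -> float:
    """Numerical derivative of the yield curve with respect to maturity."""
    return (yld(m + h) - yld(m - h)) / (2.0 * h)


for m in (0.5, 1.0, 2.0, 4.0):
    gap = fwd(m) - yld(m)
    print(f"m = {m:3.1f}: f - R = {gap:+.5f}, dR/dT = {yld_slope(m):+.5f}, "
          f"(f - R)/m = {gap / m:+.5f}")
```

The yield curve rises while the forward rate lies above it and falls once the forward rate drops below it, just as an average cost curve behaves relative to marginal cost.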
11.1.4 The expectation theory, and stylized facts of US term structure

11.1.4.1 The expectation hypothesis, and bond returns predictability
The expectation theory holds that forward rates equal expected future short-term rates, or
\[
f(t,T) = E[r(T)],
\]
where E [·] denotes expectation under the physical probability. By Eq. (11.12), then, the
expectation theory implies that,
\[
R(t,T) = \frac{1}{T-t}\int_t^T E[r(\tau)]\,d\tau. \tag{11.13}
\]
The question whether f (t, T ) is higher than E [r (T )] is an old one. A possibility is that
risk-averse investors induce f (t, T ) to be higher than the short-term rate they expect to prevail at
T , viz,
\[
f(t,T) \ge E[r(T)]. \tag{11.14}
\]
By Jensen’s inequality,
\[
e^{-\int_t^T f(t,\tau)d\tau} \equiv P(t,T) = E\left[ e^{-\int_t^T r(\tau)d\tau} \right] \ge e^{-\int_t^T E[r(\tau)]d\tau}.
\]
By taking logs,
\[
\int_t^T E[r(\tau)]\,d\tau \ge \int_t^T f(t,\tau)\,d\tau.
\]
Therefore, in a risk-neutral market, the inequality in (11.14) cannot hold. This inequality is
related to the Hicks-Keynesian normal backwardation hypothesis (see footnote 2). According to Hicks, firms
tend to demand long-term funds while fund suppliers prefer to lend at shorter maturity dates.
The market is cleared by intermediaries who demand a liquidity premium to be compensated
for their risky activity of borrowing at short maturities and lending at long maturities. Finally,
a recurrent definition. The difference, $R(t,T) - \frac{1}{T-t}\int_t^T E[r(\tau)]\,d\tau$, is usually referred to as the yield
term-premium.
What does the empirical evidence suggest about the expectation hypothesis? Denote the
continuously compounded return on a zero expiring at some date T as $r^T_{t+1} = \ln\frac{P(t+1,T)}{P(t,T)}$, and
the excess return as $\hat r^T_{t+1} = r^T_{t+1} - R(t,t+1)$, where R (t, t + 1) = − ln P (t, t + 1). Finally, the
forward rate is: $f^T_t = -\ln\frac{P(t,T)}{P(t,T-1)}$. It is easy to see that the log-excess returns can be expressed
as:
\[
\hat r^T_{t+1} = -\left[ R(t+1,T) - R(t,T) \right](T-t-1) + \left[ R(t,T) - R(t,t+1) \right],
\]
such that, taking expectations conditional on the information available at time t,
\[
E_t\left[ R(t+1,T) \right] - R(t,T) = -\frac{1}{T-t-1}E_t\left(\hat r^T_{t+1}\right) + \frac{1}{T-t-1}\left[ R(t,T) - R(t,t+1) \right].
\]
2 According to the normal backwardation (contango) hypothesis, forward prices are lower (higher) than future expected spot
prices. Here the normal backwardation hypothesis is formulated with respect to interest rates.
The expectation hypothesis implies that the risk-premium $E_t(\hat r^T_{t+1}) = 0$, and to test for it,
we can run the following regression:
\[
R(t+1,T) - R(t,T) = \alpha_T + \beta_T\,\frac{1}{T-t-1}\left[ R(t,T) - R(t,t+1) \right] + \text{Residual}_t,
\]
and test for the null of α_T = 0 and β_T = 1. A widely known empirical feature of US data is
that the estimates of β_T are typically negative for all maturities T , and somewhat increasing
with T in absolute value. In fact, Fama and Bliss (1987) show that the risk-premium $E_t(\hat r^T_{t+1})$
relates to the forward spreads, defined as $f^T_t - R(t,t+1)$, in that regressing
\[
\hat r^T_{t+1} = \alpha_T + \beta_T\left[ f^T_t - R(t,t+1) \right] + \text{Residual}_t,
\]
delivers statistically significant and positive values of β_T for many maturities T .
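The mechanics of the first regression can be sketched in a setting where the expectation hypothesis holds by construction. Below, the one-period rate follows a hypothetical deterministic path, so that there is no risk-premium and the regression should return α_T = 0 and β_T = 1 up to rounding. This is only a check of the algebra under that assumption, not a replication of any empirical result; with actual US data, the text reports negative slope estimates.

```python
import math


def short_rate(s: int) -> float:
    """Hypothetical deterministic one-period rate r_s (an assumption)."""
    return 0.03 + 0.01 * math.sin(0.3 * s)


def zero_yield(t: int, T: int) -> float:
    """R(t, T) when P(t, T) = exp(-(r_t + ... + r_{T-1}))."""
    return sum(short_rate(s) for s in range(t, T)) / (T - t)


def ols(x, y):
    """Intercept and slope of a univariate least-squares regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


n = 5  # time-to-maturity T - t, in periods
x, y = [], []
for t in range(200):
    T = t + n
    # Regressor: scaled yield spread; regressand: change in the long yield.
    x.append((zero_yield(t, T) - zero_yield(t, t + 1)) / (n - 1))
    y.append(zero_yield(t + 1, T) - zero_yield(t, T))

alpha, beta = ols(x, y)
print(f"alpha = {alpha:.2e}, beta = {beta:.6f}")
```

Replacing the deterministic path with observed yields, suitably aligned by maturity, would turn the same `ols` call into the test described in the text.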

Cochrane and Piazzesi (2005) go one step further and consider the following regressions:
\[
\hat r^T_{t+1} = \alpha_T + \beta_{1T}\,R(t,t+1) + \sum_{j=2}^{5}\beta_{j,T}\,f^j_t + \text{Residual}_t,
\]
and document a “tent shape” for the estimates of the coefficients $(\beta_{j,T})_{j=1}^{5}$, for bond maturities
T ∈ {1, · · · , 5}, and where t is in years so as to make returns calculated on a yearly basis. They
document that this tent shape is robust to estimating a factor model, in that the shape persists in
the estimates of the coefficients $(b_{j,T})_{j=1}^{5}$ in:
\[
\hat r^T_{t+1} = \alpha_T + \beta_{1T}\,Z_t + \text{Residual}_t, \qquad Z_t = b_{1T}\,R(t,t+1) + \sum_{j=2}^{5} b_{j,T}\,f^j_t,
\]
where Z_t is the common factor among the bond maturities T ∈ {1, · · · , 5}. Moreover, they
argue that using the traditional factors known to explain movements in the yield curve (see
Section 11.2) does not destroy the predictive power of their factors, in sample.
11.1.4.2 The yield curve and the business cycle
There is a simple prediction about the shape of the yield curve that we can make. By Jensen’s
inequality, $e^{-(T-t)R(t,T)} \equiv P(t,T) = E[e^{-\int_t^T r(\tau)d\tau}] \ge e^{-\int_t^T E[r(\tau)]d\tau}$. Therefore, the yield curve
satisfies: $R(t,T) \le \frac{1}{T-t}\int_t^T E[r(\tau)]\,d\tau$. For example, suppose that the short-term rate is a
martingale under the risk-neutral probability, viz E[r(τ )] = r(t). Then, the yield curve is bound to
satisfy: R (t, T ) ≤ r (t). That is, the yield curve does not rise above the current short-term rate
at any time-to-maturity. Positively sloped yield curves, then, arise because the short-term rate
is not a martingale under the risk-neutral probability, which happens because of two fundamental,
and not necessarily mutually exclusive, reasons: (i) interest rates are expected to increase, (ii)
investors are risk-averse. On average, the US yield curve is upward sloping at maturities from
three up to ten years.
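The bound R(t, T) ≤ r(t) can be illustrated with a deliberately stylized example. In the sketch below, which is an assumption made purely for illustration, the short-term rate moves once, immediately after t, to r(t) + σ or r(t) − σ with equal risk-neutral probability and then stays constant, so that E[r(τ)] = r(t) at all horizons.

```python
import math


def yield_two_point(r0: float, sigma: float, m: float) -> float:
    """Yield R(t, t + m) when the rate is r0 + sigma or r0 - sigma, each with
    risk-neutral probability 1/2, over the whole period (a stylized martingale)."""
    price = 0.5 * (math.exp(-(r0 + sigma) * m) + math.exp(-(r0 - sigma) * m))
    return -math.log(price) / m


r0, sigma = 0.04, 0.02  # hypothetical values
maturities = (1, 5, 10, 30)
yields = [yield_two_point(r0, sigma, m) for m in maturities]
for m, y in zip(maturities, yields):
    print(f"m = {m:2d}: R = {y:.4%} (short-term rate: {r0:.2%})")
```

In this example the yield equals r(t) minus ln cosh(σm)/m, a convexity term that grows with maturity, so the curve slopes downward even though rates are not expected to change.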
There exists strong empirical evidence since at least Kessel (1965) or, later, Laurent (1988,
1989), Stock and Watson (1989), Estrella and Hardouvelis (1991) and Harvey (1991, 1993),
that inverted yield curves predict recessions with a lead time of about one to two years. The
explanations for such a statistical fact hinge upon both (i) the conduct of monetary policy
and the expectations about it, and (ii) the risk-premiums agents require to invest in long-term
bonds. We discuss these two points below.
(i) The monetary channel :
(i.1) During expansions, monetary policy tends to be restrictive, to prevent the economy
from heating up. At the height of an expansion, then, short-term yields go up.
(i.2) Moreover, during recessions, monetary policy tends to keep interest rates low. At the
height of an expansion, agents might be anticipating an incoming recession and, then,
expecting central banks to lower future interest rates. Therefore, at the height of an
expansion, future interest rates might be expected to fall. The expectation hypothesis
in Eq. (11.13) would then predict the slope of the yield curve to decrease. In the
previous subsection, we have just learnt that the expectation hypothesis does not
hold, empirically. Bond markets command risk-premiums. However, a risk-premium
channel would reinforce the conclusion that the slope of the yield curve decreases
during expansions, as argued in the next point.
(ii) The risk-premium channel : From Chapter 7, we know that risk-premiums are counter-
cyclical, being high during recessions and low during expansions. The conditional equity
premium is countercyclical, and so is the long-bond premium. In fact, long-term yield and
equity expected returns are likely to be driven by the same state variables affecting the
pricing kernel of the economy.3
To sum up, we have that on the one hand, countercyclical monetary policy might be responsi-
ble for the negative price pressure on short-term bonds. On the other hand, expectations about
countercyclical monetary policy as well as procyclical risk-appetite might be responsible for a
positive price pressure on long-term bonds. These price pressures, we have argued, should occur
at the height of an expansion. But the sample data we have are those where expansions are
followed by recessions. Whence, the statistical facts about the predictive content of the yield
curve.
11.1.4.3 Additional stylized facts about the US yield curve
There are three additional features of data, which need to be noted.
(i) Yields are highly correlated (say three year yields with four year yields, with five year
yields, etc.), and suggest the existence of common factors driving all of them, discussed
in Section 11.2 below.
(ii) Yields are also very persistent, and this persistence bears important consequences to
derivative pricing, as explained in Section 11.7.4.
(iii) The term-structure of unconditional volatility is downward sloping, a feature Section
11.7.4 attempts to rationalize.
3 That long-term bonds and the stock market are acknowledged to be tightly related is witnessed by the market practitioners’ rule
of thumb, whereby a stock market correction, such as a crash, is deemed to be imminent when the spread, 30-year bond yield −
earnings-price ratio, is larger than 3%. Normally, this spread is around 1% or 2%, which is zero, once corrected for inflation. This
spread was indeed larger than 3% in 1987 and in 1997.
11.1.5 Forward martingale probabilities

11.1.5.1 Definition
Let ϕ(t, T ) be the T -forward price of a claim S(T ) at T . That is, ϕ(t, T ) is the price agreed
at t, that will be paid at T for delivery of the claim at T . Nothing has to be paid at t. By the
FTAP, there are no arbitrage opportunities if and only if:
\[
0 = E\left[ e^{-\int_t^T r(u)du}\cdot\left( S(T) - \phi(t,T) \right) \right].
\]
But since ϕ(t, T ) is known at time t,
\[
E\left[ e^{-\int_t^T r(u)du}\cdot S(T) \right] = \phi(t,T)\cdot E\left[ e^{-\int_t^T r(u)du} \right].
\]
Now use the bond pricing equation (11.4), and rearrange terms in the previous equality, to
obtain
\[
\phi(t,T) = E\left[ \frac{e^{-\int_t^T r(u)du}}{P(t,T)}\cdot S(T) \right] = E\left[ \eta_T(T)\cdot S(T) \right], \tag{11.15}
\]
where (see footnote 4)
\[
\eta_T(T) \equiv \frac{e^{-\int_t^T r(u)du}}{P(t,T)}.
\]
Eq. (11.15) suggests that we can define a new probability $Q^T_F$, as follows,
\[
\eta_T(T) = \frac{dQ^T_F}{dQ} \equiv \frac{e^{-\int_t^T r(u)du}}{E\left[ e^{-\int_t^T r(u)du} \right]}. \tag{11.16}
\]
Naturally, E[η_T (T )] = 1. Moreover, if the short-term rate process is deterministic, η_T (T ) equals
one and Q and $Q^T_F$ are the same.
In terms of this new probability $Q^T_F$, the forward price ϕ(t, T ) is:
\[
\phi(t,T) = E\left[ \eta_T(T)\cdot S(T) \right] = \int \left[ \eta_T(T)\cdot S(T) \right] dQ = \int S(T)\,dQ^T_F = E_{Q^T_F}\left[ S(T) \right], \tag{11.17}
\]
where $E_{Q^T_F}[\cdot]$ denotes the expectation taken under $Q^T_F$. For reasons that will be clear in a
moment, $Q^T_F$ is referred to as the T -forward martingale probability. The forward martingale
probability is a practical tool to price interest-rate derivatives, as we shall explain in Section
11.7. It was introduced by Geman (1989) and Jamshidian (1989), and further analyzed by
Geman, El Karoui and Rochet (1995). The appendix provides additional details: Appendix
2 relates forward prices to their certainty equivalent, and Appendix 3 illustrates additional
technicalities about the forward martingale probability.
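A finite-state example may help fix ideas about the change of probability in Eq. (11.16). The sketch below assumes three states with hypothetical risk-neutral probabilities, discount factors and payoffs; `forward_measure` is an illustrative name.

```python
def forward_measure(q, discount):
    """Return the bond price P(t, T), the density eta_T(T) of Eq. (11.16)
    state by state, and the T-forward probabilities q_i * eta_i."""
    P = sum(qi * di for qi, di in zip(q, discount))
    eta = [di / P for di in discount]
    q_fwd = [qi * ei for qi, ei in zip(q, eta)]
    return P, eta, q_fwd


# Hypothetical three-state example (all numbers are illustrative).
q = [0.3, 0.4, 0.3]             # risk-neutral probabilities
discount = [0.97, 0.95, 0.92]   # realizations of exp(-int_t^T r(u) du)
payoff = [90.0, 100.0, 115.0]   # realizations of S(T)

P, eta, q_fwd = forward_measure(q, discount)

# Eq. (11.15): forward price as a discounted expectation under Q.
fwd_price_Q = sum(qi * di * si for qi, di, si in zip(q, discount, payoff)) / P
# Eq. (11.17): the same forward price as a plain expectation under Q_F^T.
fwd_price_QF = sum(qf * si for qf, si in zip(q_fwd, payoff))

print(f"P(t, T) = {P:.4f}")
print(f"forward price under Q: {fwd_price_Q:.6f}, under Q_F^T: {fwd_price_QF:.6f}")
```

With a deterministic discount factor, the density equals one in every state and the two probabilities coincide, as noted in the text.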
4 As an example, suppose that S is the price process of a traded asset. By the FTAP, there are no arbitrage opportunities if and
only if $e^{-\int_t^\tau r(u)du} S(\tau)$ is a Q-martingale. In this case, $E[e^{-\int_t^T r(u)du} S(T)] = S(t)$, and Eq. (11.15) collapses to the well-known
formula: ϕ(t, T )P (t, T ) = S(t). As is also well-known, entering at a later date τ > t the forward contract established at t generally
comes at a cost. Apply the FTAP to prove that the value of a forward contract as of time τ ∈ [t, T ] is given by
P (τ , T ) · [ϕ(τ , T ) − ϕ(t, T )]. [Hint: Notice that the final payoff is S(T ) − ϕ(t, T ) and that the discount has to be made at time τ .]
11.1.5.2 Martingale properties

Forward prices
Clearly, ϕ(T, T ) = S(T ). Therefore, Eq. (11.17) is, also,
\[
\phi(t,T) = E_{Q^T_F}\left[ \phi(T,T) \right].
\]
Forward rates
Forward rates display a similar property:
\[
f(t,T) = E_{Q^T_F}\left[ r(T) \right] = E_{Q^T_F}\left[ f(T,T) \right], \tag{11.18}
\]
where the last equality holds as r(t) = f (t, t). The proof is also simple. We have,
\[
\begin{aligned}
f(t,T) &= -\frac{\partial \ln P(t,T)}{\partial T} \\
&= -\frac{\partial P(t,T)}{\partial T}\Big/ P(t,T) \\
&= E\left[ \frac{e^{-\int_t^T r(\tau)d\tau}}{P(t,T)}\cdot r(T) \right] \\
&= E\left[ \eta_T(T)\cdot r(T) \right] \\
&= E_{Q^T_F}\left[ r(T) \right].
\end{aligned}
\]
Finally, the simply-compounded forward rate satisfies the same property: given a sequence
of dates $\{T_i\}_{i=0,1,\cdots}$,
\[
F(\tau,T_i,T_{i+1}) = E_{Q^{T_{i+1}}_F}\left[ L(T_i,T_{i+1}) \right] = E_{Q^{T_{i+1}}_F}\left[ F(T_i,T_i,T_{i+1}) \right], \quad \tau\in[t,T_i], \tag{11.19}
\]
where the second equality follows by Eq. (11.6). To show Eq. (11.19), note that by definition,
the simply-compounded forward rate F (t, T, S) satisfies:
\[
\mathrm{IRS}(t,T,S;F(t,T,S)) = 0,
\]
where IRS(t, T, S; K) is the value as of time t of a FRA struck at K for the time-interval [T, S].
By rearranging terms in the second equality of Eq. (11.7),
\[
F(t,T,S)\,P(t,S) = E\left[ e^{-\int_t^S r(\tau)d\tau}\,L(T,S) \right].
\]
By the definition of η_S (S),
\[
F(t,T,S) = E_{Q^S_F}\left[ L(T,S) \right].
\]
These relations show that it is only under the forward martingale probability that the
expectation theory holds true. Consider, for instance, Eq. (11.18). We have,
\[
\begin{aligned}
f(t,T) &= E_{Q^T_F}\left[ r(T) \right] = E_Q\left[ \eta_T(T)\,r(T) \right] \\
&= \underbrace{E_Q\left[ \eta_T(T) \right]}_{=1} E_Q\left[ r(T) \right] + \mathrm{cov}_Q\left[ \eta_T(T), r(T) \right] \\
&= E\left[ r(T) \right] + \mathrm{cov}\left[ \mathrm{Ker}(T), r(T) \right] + \mathrm{cov}_Q\left[ \eta_T(T), r(T) \right],
\end{aligned}
\]
where Ker(T ) denotes the pricing kernel in the economy, and the unsubscripted E and cov in the
last line are taken under the physical probability. That is, forward rates in general
deviate from the future expected spot rates because of risk-aversion corrections (the second
term in the last equality) and because interest rates are stochastic (the third term in the last
equality).
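The role of the covariance term under Q can be seen in a finite-state sketch. The example below covers only the step from the forward martingale probability to the risk-neutral probability; the risk-aversion term would additionally require a pricing kernel, which is not modeled here. All numbers and function names are hypothetical.

```python
def mean(p, x):
    """Expectation of x under the probabilities p."""
    return sum(pi * xi for pi, xi in zip(p, x))


def cov(p, x, y):
    """Covariance of x and y under the probabilities p."""
    mx, my = mean(p, x), mean(p, y)
    return sum(pi * (xi - mx) * (yi - my) for pi, xi, yi in zip(p, x, y))


# Hypothetical states: the discount factor is low when r(T) is high.
q = [0.3, 0.4, 0.3]             # risk-neutral probabilities
discount = [0.97, 0.95, 0.92]   # realizations of exp(-int_t^T r(u) du)
r_T = [0.02, 0.04, 0.07]        # realizations of r(T)

P = mean(q, discount)
eta = [d / P for d in discount]

f_forward = sum(qi * ei * ri for qi, ei, ri in zip(q, eta, r_T))  # E_Q[eta r(T)]
f_decomposed = mean(q, r_T) + cov(q, eta, r_T)                    # E_Q[r] + cov_Q

print(f"f(t, T)            = {f_forward:.6f}")
print(f"E_Q[r(T)] + cov_Q  = {f_decomposed:.6f}")
print(f"E_Q[r(T)]          = {mean(q, r_T):.6f}")
```

Because the density η is low precisely in the high-rate states, the covariance is negative in this example, and the forward rate lies below the risk-neutral expectation of the future short-term rate.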
11.2 Common factors affecting the yield curve

Which systematic risks affect the entire term-structure of interest rates? How many factors are
needed to explain the variation of the yield curve? The standard “duration hedging” practice,
reviewed in detail in Chapter 13, relies on the idea that most of the variation of the yield curve
is successfully captured by a single factor that produces parallel shifts in the yield curve. How
reliable is this idea, in practice?
Litterman and Scheinkman (1991) demonstrate that most of the variation (more than 95%)
of the term-structure of interest rates can be attributed to the variation of three unobservable
factors, which they label (i) a “level” factor, (ii) a “steepness” (or “slope”) factor, and (iii)
a “curvature” factor. To disentangle these three factors, the authors make an unconditional
analysis based on a fixed-factor model. Succinctly, this methodology can be described as follows.
Suppose that p returns computed from bond prices at p different maturities are generated by
a linear factor structure, with a fixed number k of factors,
\[
\underset{p\times 1}{R_t} = \underset{p\times 1}{\bar R} + \underset{p\times k}{B}\;\underset{k\times 1}{F_t} + \underset{p\times 1}{\epsilon_t}, \tag{11.20}
\]
where R_t is the vector of returns, F_t is the zero-mean vector of common factors affecting the
returns, R̄ is the vector of unconditional expected returns, ε_t is a vector
of idiosyncratic components of the return generating process, and B is a matrix containing the
factor loadings. Each row of B contains the factor loadings for all the common factors affecting
a given return, i.e. the sensitivities of a given return with respect to a change of the factors.
Each column of B contains the term-structure of factor loadings, i.e. how a change of a given
factor affects the term-structure of excess returns.
11.2.1 Methodological details

Estimating the model in Eq. (11.20) leads to econometric challenges, mainly because the vec-
tor of factors Ft is unobservable.5 However, there exists a simple method, known as principal
components analysis (PCA, henceforth), which leads to empirical results qualitatively similar
to those holding for the general model in Eq. (11.20). We discuss these empirical results in the
next subsection. We now describe the main methodological issues arising within PCA.
The main idea underlying PCA is to transform the original p correlated variables R into a set
of new uncorrelated variables, the principal components. These principal components are linear
combinations of the original variables, and are arranged in order of decreasing importance: the
first principal component accounts for as much as possible of the variation in the original data,
the second for as much as possible of the remaining variation, and so on. Mathematically, we are
looking for p linear combinations of the demeaned excess returns,
\[
Y_i = C_i^\top\left( R - \bar R \right), \quad i = 1,\cdots,p, \tag{11.21}
\]
5 Suppose that in Eq. (11.20), F ∼ N (0, I), and that ε ∼ N (0, Ψ), where Ψ is diagonal. Then, R ∼ N (R̄, Σ), where Σ = BB⊤ + Ψ.
The assumptions that F ∼ N (0, I) and that Ψ is diagonal are necessary to identify the model, but not sufficient. Indeed, any
orthogonal rotation of the factors yields a new set of factors which also satisfies Eq. (11.20). Precisely, let T be an orthonormal
matrix. Then, (BT )(BT )⊤ = BT T⊤B⊤ = BB⊤. Hence, the factor loadings B and BT have the same ability to generate the matrix
Σ. To obtain a unique solution, one needs to impose extra constraints on B. For example, Jöreskog (1967) develops a maximum
likelihood approach in which the log-likelihood function is $-\frac{1}{2}N\left[ \ln|\Sigma| + \mathrm{Tr}\left( S\Sigma^{-1} \right) \right]$, where S is the sample covariance matrix of
R, and the constraint is that B⊤ΨB be diagonal with elements arranged in descending order. The algorithm is: (i) for a given Ψ,
maximize the log-likelihood with respect to B, under the constraint that B⊤ΨB be diagonal with elements arranged in descending
order, thereby obtaining B̂; (ii) given B̂, maximize the log-likelihood with respect to Ψ, thereby obtaining Ψ̂, which is fed back into
step (i), etc. Knez, Litterman and Scheinkman (1994) describe this approach in their paper. Note that the identification device they
describe at p. 1869 (Step 3) roughly corresponds to the requirement that B⊤ΨB be diagonal with elements arranged in descending
order. Such a constraint is clearly related to principal component analysis.
such that, for p vectors C_i⊤ of dimension 1 × p, (i) the new variables Y_i are uncorrelated, and (ii)
their variances are arranged in decreasing order. The logic behind PCA is to ascertain whether
a few components of Y = [Y_1 · · · Y_p ]⊤ account for the bulk of variability of the original data.
Let $C^\top = [C_1 \cdots C_p]^\top$ be the p × p matrix whose i-th row is C_i⊤, such that we can write
Eq. (11.21) in matrix format,
\[
Y_t = C^\top\left( R_t - \bar R \right) \quad\text{or, by inverting,}\quad R_t - \bar R = \left( C^\top \right)^{-1} Y_t. \tag{11.22}
\]
Next, suppose that the vector Y^(k) = [Y_1 · · · Y_k ]⊤ accounts for most of the variability in the
original data (see footnote 6), and let $C^{\top(k)}$ denote a p × k matrix extracted from the matrix
$(C^\top)^{-1}$ through the first k columns of $(C^\top)^{-1}$. Since the components of Y^(k) are uncorrelated
and they are deemed largely responsible for the variability of the original data, it is natural to
“disregard” the last p − k components of Y in Eq. (11.22),
\[
\underset{p\times 1}{R_t - \bar R} \approx \underset{p\times k}{C^{\top(k)}}\;\underset{k\times 1}{Y_t^{(k)}}.
\]
If the vector $Y_t^{(k)}$ really accounts for most of the movements of R_t , the previous approximation
to Eq. (11.22) should be fairly good.
Let us make more precise what the concept of variability is in the context of PCA. Suppose
that the variance-covariance matrix of the returns, Σ, has p distinct eigenvalues, ordered from
the highest to the lowest, as follows: λ1 > · · · > λp . Then, the vector Ci in Eq. (11.21) is the
eigenvector corresponding to the i-th eigenvalue. Moreover,
var (Yi ) = λi , i = 1, · · · , p.
Finally, we have that
$$R_{PCA} = \frac{\sum_{i=1}^{k}\mathrm{var}(Y_i)}{\sum_{i=1}^{p}\mathrm{var}(R_i)} = \frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{p}\lambda_i}. \qquad (11.23)$$
(Appendix 4 provides technical details and proofs of the previous formulae.) It is in the sense
of Eq. (11.23) that in the context of PCA, we say that the first k principal components account
for RPCA % of the total variation of the data.
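The PCA steps above can be sketched numerically. The following is a minimal illustration, assuming synthetic data driven by a single common "level" shock (the data, the loadings and the function name `pca_explained` are hypothetical and not part of the text). It computes the eigenvalues of the sample covariance matrix, the components Y, and the ratio in Eq. (11.23).

```python
import numpy as np

def pca_explained(returns, k):
    """Eigen-decompose the sample covariance of `returns` (T x p) and
    return (eigenvalues in descending order, C, Y, R_PCA for the first k)."""
    R = np.asarray(returns, dtype=float)
    demeaned = R - R.mean(axis=0)              # R_t - R_bar, one row per date
    sigma = np.cov(demeaned, rowvar=False)     # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder: highest eigenvalue first
    lam = eigvals[order]
    C = eigvecs[:, order]                      # column i is the eigenvector C_i
    Y = demeaned @ C                           # row t holds (C' (R_t - R_bar))'
    r_pca = lam[:k].sum() / lam.sum()          # ratio in Eq. (11.23)
    return lam, C, Y, r_pca

# Synthetic data (an assumption for illustration only): one common shock
# with loadings that decline across five hypothetical maturities, plus noise.
rng = np.random.default_rng(0)
T, p = 2000, 5
level = rng.normal(size=(T, 1))
loadings = np.array([[1.0, 0.9, 0.8, 0.7, 0.6]])
returns = level @ loadings + 0.2 * rng.normal(size=(T, p))

lam, C, Y, r_pca = pca_explained(returns, k=1)
print("eigenvalues:", np.round(lam, 4))
print("share explained by the first component:", round(float(r_pca), 4))
```

Because C is orthogonal here, inverting C ⊤ amounts to a transpose, so keeping the first k columns of C reproduces the truncated approximation to Eq. (11.22).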
11.2.2 The empirical facts

The striking feature of the empirical results uncovered by Litterman and Scheinkman (1991)
is that they have been confirmed to hold across a number of countries and sample periods.
Moreover, the economic nature of these results is the same, independently of whether the
statistical analysis relies on a rigorous factor analysis of the model in Eq. (11.20), or a more
back-of-envelope computation based on PCA. Finally, the empirical results that hold for bond
returns are qualitatively similar to those that hold for bond yields.
6 There are no rigorous criteria to say what “most of the variability” means in this context. Instead, a likelihood-ratio test is
most informative in the context of the estimation of Eq. (11.20) by means of the methods explained in the previous footnote.
[Figure: three panels, labeled Level, Slope and Curvature.]
FIGURE 11.2. Changes in the term-structure of interest rates generated by changes in the “level,”
“slope” and “curvature” factors.
Figure 11.2 visualizes the effects that the three factors have on the movements of the term-
structure of interest rates.
• The first factor is called a “level” factor as its changes lead to parallel shifts in the term-
structure of interest rates. Thus, this “level” factor produces essentially the same effects
on the term-structure as those underlying the “duration hedging” portfolio practice. This
factor explains approximately 80% of the total variation of the yield curve.
• The second factor is called a “steepness” factor as its variations induce changes in the
slope of the term-structure of interest rates. After a shock in this steepness factor, the
short-end and the long-end of the yield curve move in opposite directions. The movements
of this factor explain approximately 15% of the total variation of the yield curve.
• The third factor is called a “curvature” factor as its changes lead to changes in the
curvature of the yield curve. That is, following a shock in the curvature factor, the middle
of the yield curve and both the short-end and the long-end of the yield curve move in
opposite directions. This curvature factor accounts for approximately 5% of the total
variation of the yield curve.
Understanding the origins of these three factors is still a challenge to financial economists and
macroeconomists. For example, macroeconomists explain that central banks affect the short-
end of the yield curve, e.g. by inducing variations in the Federal Funds rate in the US. However, the
Federal Reserve decisions rest on the current macroeconomic conditions. Therefore, we should
expect that the short-end of the yield-curve is related to the development of macroeconomic
factors. Instead, the development of the long-end of the yield curve should largely depend on the
market average expectation and risk-aversion surrounding future interest rates and economic
conditions. Financial economists, then, should expect to see the long-end of the yield curve as
being driven by expectations of future economic activity, and by risk-aversion. Indeed, Ang and
Piazzesi (2003) demonstrate that macroeconomic factors such as inflation and real economic
activity are able to explain movements at the short-end and the middle of the yield curve.
Interestingly, they show that the long-end of the yield curve is driven by unobservable factors.
However, it is not clear whether such unobservable factors are driven by time-varying risk-
aversion or changing expectations. The compelling lesson, in general, is that models of the yield
curve driven by only one factor are likely to be misspecified, due to the complexity of roles
played by many institutions participating in the fixed income markets, and the links with the
macroeconomy that decisions taken by these institutions have.
11.3 Models of the short-term rate

The short-term rate represents the velocity at which “locally” riskless investments appreciate
over the next instant. This velocity, or growth rate, is of course not a traded asset. What is
traded is a bond and/or a money market account (MMA).
11.3.1 Introduction
The fundamental bond pricing equation in Eq. (11.4),
$$P(t,T) = E\left[e^{-\int_t^T r(u)\,du}\right], \qquad (11.24)$$
suggests modeling the arbitrage-free bond price P by using as an input an exogenously given
short-term rate process r. In the Brownian information structure considered in this chapter, r
would then be the solution to a stochastic differential equation. As an example,
dr(τ ) = b(r(τ ), τ )dτ + a(r(τ ), τ )dW (τ ), τ ∈ (t, T ], (11.25)
where b and a are well-behaved functions guaranteeing the existence of a strong-form solution
to the previous equation.
This approach to modeling interest rates was the first to emerge, after the seminal papers of
Merton (1973) (in a footnote!) and Vasicek (1977). This section illustrates the main modeling
and empirical challenges related to this approach. We examine one-factor “models of the short-
term rate,” such as that in Eqs. (11.24)-(11.25), and also multifactor models, where the short-
term rate is a function of a number of factors, r (τ ) = R(y (τ )), where R is some function and
y is solution to a multivariate diffusion process.
Two fundamental issues for the model’s users are that the models they deal with be (i)
fast to compute, and (ii) accurate. As regards the first point, the obvious target would be to
look for models with a closed form solution, such as for example, the so-called “affine” models
(see Section 11.3.6). The second point is more subtle. Indeed, “perfect” accuracy can never be
achieved with models such as that in Eqs. (11.24)-(11.25) - even when this model is extended
to a multifactor diffusion. After all, the model in Eqs. (11.24)-(11.25) can only be taken as it
really is - a model of determination of the observed yield curve. As such the model in Eqs.
(11.24)-(11.25) cannot exactly fit the observed term structure of interest rates.
As we shall explain, the requirement to exactly fit the initial term-structure of interest rates
is important when the concern of the model’s user is the pricing of options or other derivatives
written on the bonds. And the good news is that such a perfect fit can be obtained indeed, once
we “augment” Eq. (11.25) with an infinite dimensional parameter calibrated to the observed
term-structure. The bad news is that such a calibration leads to some “intertemporal inconsistencies,”
which we shall duly explain in a moment.
The models leading to perfect accuracy are often referred to as “no-arbitrage” models. These
models work by making the short-term rate process exactly pin down the term-structure we
observe at a given instant. The intertemporal inconsistencies arise because the parameters of
the short-term rate pinning down the term structure today, say, are likely to differ from the
very same parameters as of tomorrow. Clearly, this methodology goes to the opposite extreme
of the original approach, where the short-term rate is the input of all subsequent movements
of the term-structure of interest rates. This original approach is consistent with the rational
expectations paradigm that permeates modern economic analysis: economically admissible, i.e.
no-arbitrage, bond prices move as a result of random changes in the state variables. Economists
try to explain broad phenomena with the help of a few inputs, in line with the principle of scientific reduction.
Practitioners, instead, implement models to solve pricing problems where bond prices have to
match market data. In these models used by practitioners, it is the derivatives written on these
bonds that “move” in reaction to changes in the underlying fundamentals, not bond prices,
which instead are perfectly fitted, as we shall see. Both activities are important, and the choice
of the “right” model to use rests on the ultimate objective of the model’s user.
11.3.2 The basic bond pricing equation

11.3.2.1 A first derivation
Suppose bond prices are solutions to the following stochastic differential equation:
$$\frac{dP_i}{P_i} = \mu_{bi}\,d\tau + \sigma_{bi}\,dW, \qquad (11.26)$$
where W is a standard Brownian motion in Rd , µbi and σ bi are some progressively measurable
functions (σ bi is vector-valued), and Pi ≡ P (τ , Ti ). The exact functional form of µbi and σ bi
is not given exogenously, unlike in the Black-Scholes (BS) case. Rather, it is endogenous and
must be found as a part of the equilibrium.
As shown in Appendix 1, the price system in (11.26) is arbitrage-free if and only if
µbi = r + σ bi λ, (11.27)
for some Rd -dimensional process λ satisfying some basic regularity conditions. The meaning of
(11.27) can be understood by replacing it into Eq. (11.26), and obtaining:
$$\frac{dP_i}{P_i} = \left(r + \sigma_{bi}\lambda\right)d\tau + \sigma_{bi}\,dW.$$
The previous equation tells us that the growth rate of Pi is the short-term rate plus a term-
premium equal to σ bi λ. In the bond market, there are no obvious economic arguments enabling
us to sign term-premia. Empirical evidence suggests that term-premia did take both signs over
the last twenty years. But term-premia would be zero in a risk-neutral world. In other terms,
bond prices are solutions to:
$$\frac{dP_i}{P_i} = r\,d\tau + \sigma_{bi}\,d\tilde{W},$$
where $\tilde{W} = W + \int \lambda\,d\tau$ is a Q-Brownian motion and Q is the risk-neutral probability.
To derive Eq. (11.27) with the help of a specific version of theory developed in Appendix 1,
we now work out the case d = 1. Consider two bonds, and the dynamics of the value V of a
self-financed portfolio in these two bonds and a money market account:
$$dV = \left[\pi_1\left(\mu_{b1} - r\right) + \pi_2\left(\mu_{b2} - r\right) + rV\right]d\tau + \left(\pi_1\sigma_{b1} + \pi_2\sigma_{b2}\right)dW,$$

where π i is wealth invested in bond maturing at Ti : π i = θi Pi . We can zero uncertainty by setting
$$\pi_1 = -\frac{\sigma_{b2}}{\sigma_{b1}}\,\pi_2.$$
By replacing this into the dynamics of V ,
$$dV = \left[-\frac{\mu_{b1} - r}{\sigma_{b1}}\,\sigma_{b2} + \left(\mu_{b2} - r\right)\right]\pi_2\,d\tau + rV\,d\tau.$$
Notice that π 2 can always be chosen so as to make the value of this portfolio appreciate at a
rate strictly greater than r. It is sufficient to set:
$$\mathrm{sign}(\pi_2) = \mathrm{sign}\left[-\frac{\mu_{b1} - r}{\sigma_{b1}}\,\sigma_{b2} + \left(\mu_{b2} - r\right)\right].$$
Therefore, to rule out arbitrage opportunities, it must be the case that:
$$\frac{\mu_{b1} - r}{\sigma_{b1}} = \frac{\mu_{b2} - r}{\sigma_{b2}}.$$
The previous relation tells us that the Sharpe ratio for any two bonds has to equal a process λ,
say, and Eq. (11.27) immediately follows. Clearly, this function, λ, does not depend on either of
the two maturity dates, T1 or T2 . Since T1 and T2 are arbitrary, then, λ is independent of time
to maturity, T . This is natural, as λ is the unit price of risk agents require to be compensated for
the fluctuations of the short-term rate, and it must be independent of the assets they trade on,
i.e. the maturity.
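The argument for the case d = 1 can be checked with a few lines of arithmetic. The sketch below uses purely illustrative numbers and a hypothetical helper, `hedged_portfolio`, neither of which comes from the text. It builds the zero-uncertainty position in two bonds and reports the drift it earns in excess of rV: the excess is non-zero when the two Sharpe ratios differ and vanishes when they coincide.

```python
def hedged_portfolio(mu1, mu2, sigma1, sigma2, r, pi2):
    """Zero-uncertainty position in two bonds (case d = 1).

    pi1 is chosen as -(sigma2 / sigma1) * pi2, so the dW term cancels.
    Returns (drift in excess of r*V, diffusion coefficient).
    """
    pi1 = -(sigma2 / sigma1) * pi2
    diffusion = pi1 * sigma1 + pi2 * sigma2
    excess_drift = pi1 * (mu1 - r) + pi2 * (mu2 - r)
    return excess_drift, diffusion

# Illustrative numbers (assumptions): Sharpe ratios 0.2 and 0.3.
r = 0.03
excess, diffusion = hedged_portfolio(mu1=0.04, mu2=0.06,
                                     sigma1=0.05, sigma2=0.10, r=r, pi2=1.0)
print("unequal Sharpe ratios: excess drift =", excess, "diffusion =", diffusion)

# With equal Sharpe ratios (0.2 for both bonds) the riskless excess drift vanishes.
excess_eq, diffusion_eq = hedged_portfolio(mu1=0.04, mu2=0.05,
                                           sigma1=0.05, sigma2=0.10, r=r, pi2=1.0)
print("equal Sharpe ratios:   excess drift =", excess_eq, "diffusion =", diffusion_eq)
```

Flipping the sign of `pi2` flips the sign of the excess drift, which is why a locally riskless profit is always available unless the two Sharpe ratios are equal.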
In models of the short-term rate such as (11.25), the two functions µbi and σ bi in Eq. (11.26)
can be determined through Itô’s lemma. Let P (r, τ , T ) be the rational bond price function, i.e.,
the price as of time τ of a bond maturing at T when the state at τ is r. Since r is solution to
(11.25), Itô’s lemma then implies that:

$$dP = \left(\frac{\partial P}{\partial\tau} + bP_r + \frac{1}{2}a^{2}P_{rr}\right)d\tau + aP_r\,dW,$$
where subscripts denote partial derivatives.
Comparing this equation with Eq. (11.26) then reveals that:
$$\mu_b P = \frac{\partial P}{\partial\tau} + bP_r + \frac{1}{2}a^{2}P_{rr}, \qquad \sigma_b P = aP_r.$$
Now replace these functions into Eq. (11.27) to obtain that the bond price satisfies the following
partial differential equation (PDE, henceforth):
$$\frac{\partial P}{\partial\tau} + bP_r + \frac{1}{2}a^{2}P_{rr} = rP + \lambda aP_r, \quad \text{for all } (r,\tau)\in\mathbb{R}_{++}\times[t,T), \qquad (11.28)$$
with the boundary condition P (r, T, T ) = 1 for all r ∈ R++ .
Eq. (11.28) shows that the bond price, P , depends on both the drift of the short-term rate, b,
and the risk-aversion correction, λ. This circumstance occurs as the initial asset market structure
is incomplete, in the following sense. In the Black-Scholes model, the option is redundant, given
the initial market structure. In the context we analyze here, the short-term rate r is not a
traded asset. In other words, the initial market structure has one untraded risk (r) and zero
assets: the factor generating uncertainty in the economy, r, is not traded. Therefore, the drift
of the short-term rate cannot be equal to r · r = r² under the risk-neutral probability, but
rather b − λa, thereby leading to Eq. (11.28). Therefore, the bond price depends on the specific
functional forms b, a and λ.
While this kind of dependence might be seen as a kind of hindrance to practitioners, it can
also be viewed as a good piece of news. Indeed, information about agents’ risk-appetite λ can be
backed out, after having estimated the two functions (b, a). In turn, information about agents’
risk-appetite can, for example, help central bankers to take decisions about the interest rates
to set.
By specifying the drift and diffusion functions b and a, and by identifying the risk-premium
λ, the PDE in Eq. (11.28) can explicitly be solved, either analytically or numerically. Choices
concerning the exact functional form of b, a and λ are often made on the basis of either ana-
lytical or empirical reasons. In the next section, we will examine the first, famous short-term
rate models where b, a and λ have a particularly simple form. We will discuss the analytical
advantages of these models, but we will also highlight the major empirical problems associ-
ated with these models. In Section 11.3.4 we provide a very succinct description of models
exhibiting jump (and default) phenomena. In Section 11.3.5, we introduce multifactor models:
we will explain why we need such more complex models, and show that even in this more
complex case, arbitrage-free bond prices are still solutions to PDEs such as (11.28). In Section
11.3.6, we will present a class of analytically tractable multidimensional models, known as affine
models. We will discuss their historical origins, and highlight their importance as regards the
econometric estimation of bond pricing models. Finally, Section 11.3.7 presents the “perfectly
fitting” models, and Appendix 5 provides a few technical details about the solution of one of
these models.
11.3.2.2 Derivation based on duration
The idea, here, is to replicate the price of a bond expiring at some time T1 , say P 1 ≡ P (r, τ , T1 ),
with a self-financed portfolio comprising a money market account and a second bond expiring
at time T2 > T1 . The value of the self-financed portfolio is V = ∆ · P 2 + M , where ∆ is the
number of bonds maturing at T2 to be put in the portfolio, P 2 = P (r, τ , T2 ), and M is the
amount of resources put in the money market account. Since the portfolio is self-financed, we
have, by the usual arguments, that,

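$$dV = \Delta\cdot dP^{2} + dM = \left(\Delta\cdot LP^{2} + rM\right)d\tau + \Delta\cdot aP^{2}_{r}\,dW, \qquad (11.29)$$
where $LP^{2} = \frac{\partial P^{2}}{\partial\tau} + bP^{2}_{r} + \frac{1}{2}a^{2}P^{2}_{rr}$. And, obviously,
$$dP^{1} = LP^{1}\,d\tau + aP^{1}_{r}\,dW. \qquad (11.30)$$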
Let the initial value of the portfolio match the bond price. Then, comparing the diffusive terms
in Eq. (11.29) and Eq. (11.30), we find the delta to be:
$$\hat{\Delta} = \frac{\partial P(r,\tau,T_1)/\partial r}{\partial P(r,\tau,T_2)/\partial r}.$$
Comparing the drift terms in Eq. (11.29) and Eq. (11.30),

$$LP^{1} = \Delta\cdot LP^{2} + rM = \Delta\cdot LP^{2} + r\left(V - \Delta P^{2}\right) = \Delta\cdot LP^{2} + r\left(P^{1} - \Delta P^{2}\right),$$
where the last equality follows as we are using the values (∆, M ) such that the portfolio matches the
value of the first bond. Rearranging terms yields LP 1 − rP 1 = ∆ · (LP 2 − rP 2 ), and evaluating
this for $\Delta = \hat{\Delta}$,
$$\frac{LP^{1} - rP^{1}}{P^{1}_{r}} = \frac{LP^{2} - rP^{2}}{P^{2}_{r}} \equiv \Lambda \equiv \lambda a,$$
for some Λ and λ independent of the maturity dates T1 and T2 .
The delta, $\hat{\Delta}$, can be interpreted as the ratio of the durations of the two bonds, as explained
in Chapter 13.
11.3.3 Some famous univariate short-term rate models

11.3.3.1 Vasicek and CIR
Vasicek’s (1977) model is to be considered the seminal contribution to the literature. It assumes
the short-term rate is solution to:
dr(τ ) = κ (r̄ − r (τ )) dτ + σdW (τ ), τ ∈ (t, T ], (11.31)
where r̄, κ and σ are positive constants. This model generalizes that of Merton (1973), where
the drift was µdτ for some constant µ > 0. The intuition behind Eq. (11.31) is simple. Suppose,
first, that σ = 0. In this case, the solution is:
$$r(\tau) = \bar{r} + e^{-\kappa(\tau - t)}\left(r(t) - \bar{r}\right).$$
The previous equation reveals that if the current level of the short-term rate r(t) = r̄, it will be
“locked-in” at r̄ forever. If, instead, r(t) < r̄, then, for all τ > t, r(τ ) < r̄ too, but |r(τ ) − r̄| will
eventually shrink to zero as τ → ∞. An analogous property holds when r(t) > r̄. In all cases,
the “speed” of convergence of r to its “long-term” value r̄ is determined by κ: the higher is κ,
the higher is the speed of convergence to r̄. In other terms, r̄ is the long-term value towards
which r tends to converge, and κ determines the speed of such a convergence.
Eq. (11.31) generalizes the previous ideas to the stochastic differential case. It can be shown
that a “solution” to Eq. (11.31) can be written in the following format:
$$r(\tau) = \bar{r} + e^{-\kappa(\tau - t)}\left(r(t) - \bar{r}\right) + \sigma e^{-\kappa\tau}\int_t^{\tau} e^{\kappa s}\,dW(s),$$
where the integral is understood in the Itô sense. The interpretation of this solution is
similar to the one given above. The short-term rate tends to a sort of “central tendency” r̄.
Actually, it will have the tendency to fluctuate around it. In other terms, there is always the
tendency for shocks to be absorbed with a speed dictated by the value of κ. In this case, the
short-term rate process r is said to exhibit a mean-reverting behavior. Precisely, it can be shown
that the expected future value of r is given by the solution given above for the deterministic
case, viz
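$$E\left[r(\tau)\,|\,r(t)\right] = \bar{r} + e^{-\kappa(\tau - t)}\left(r(t) - \bar{r}\right).$$
Moreover, the variance of the value taken by r at time τ is:
$$\mathrm{var}\left[r(\tau)\,|\,r(t)\right] = \frac{\sigma^{2}}{2\kappa}\left[1 - e^{-2\kappa(\tau - t)}\right].$$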
Finally, it can be shown that r is normally distributed (with expectation and variance given by
the two functions given above).
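The conditional mean and variance above can be checked by simulation. The sketch below is an illustration under assumed parameter values (the numbers and the function names are hypothetical). It draws the short-term rate step by step from the Gaussian transition law implied by the two moment formulas, then compares the sample moments at the horizon with the formulas evaluated over the whole horizon.

```python
import numpy as np

def vasicek_moments(r0, kappa, rbar, sigma, h):
    """Conditional mean and variance of r(t + h) given r(t) = r0."""
    mean = rbar + np.exp(-kappa * h) * (r0 - rbar)
    var = sigma**2 / (2.0 * kappa) * (1.0 - np.exp(-2.0 * kappa * h))
    return mean, var

def simulate_vasicek(r0, kappa, rbar, sigma, horizon, n_steps, n_paths, seed=0):
    """Simulate r(t + horizon) by chaining exact Gaussian transitions."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    r = np.full(n_paths, r0, dtype=float)
    for _ in range(n_steps):
        mean, var = vasicek_moments(r, kappa, rbar, sigma, dt)
        r = mean + np.sqrt(var) * rng.standard_normal(n_paths)
    return r

# Assumed parameter values, for illustration only.
r0, kappa, rbar, sigma, horizon = 0.02, 0.5, 0.06, 0.02, 2.0
terminal = simulate_vasicek(r0, kappa, rbar, sigma, horizon,
                            n_steps=20, n_paths=100_000)
mean_th, var_th = vasicek_moments(r0, kappa, rbar, sigma, horizon)
print("mean:     simulated", terminal.mean(), "formula", mean_th)
print("variance: simulated", terminal.var(), "formula", var_th)
print("share of negative rates:", (terminal < 0).mean())
```

The last line reports how often the simulated rate ends below zero, which is possible because the distribution is Gaussian.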
The previous properties of r are certainly instructive. Yet the main objective here is to find
the price of a bond. As it turns out, the assumption that the risk premium process λ is a
constant allows one to obtain a closed-form solution. Indeed, replace this constant and the
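functions b(r) = κ (r̄ − r) and a(r) = σ into the PDE (11.28), and let $r^{*} \equiv \bar{r} - \frac{\lambda\sigma}{\kappa}$. The result
is that the bond price P is solution to the following partial differential equation:
$$0 = \frac{\partial P}{\partial\tau} + \kappa\left(r^{*} - r\right)P_r + \frac{1}{2}\sigma^{2}P_{rr} - rP, \quad \text{for all } (r,\tau)\in\mathbb{R}\times[t,T), \qquad (11.32)$$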
with the usual boundary condition. Intuitively, κ (r∗ − r) is the drift of the short-term rate
under Q, which is higher than under P for λ < 0, reflecting higher Arrow-Debreu state prices
for the bad states of the world arising when interest rates are high.
It is instructive to see how this kind of PDE can be solved. Guess a solution of the form:
$$P(r,\tau,T) = e^{A(\tau,T) - B(\tau,T)\cdot r}, \qquad (11.33)$$
where A and B have to be found. The boundary condition is P (r, T, T ) = 1, which implies that
the two functions A and B must satisfy:
A(T, T ) = 0 and B(T, T ) = 0. (11.34)
Now suppose that the guess is true. By differentiating Eq. (11.33), $\frac{\partial P}{\partial\tau} = (A_1 - B_1 r)P$, $P_r = -PB$
and $P_{rr} = PB^{2}$, where $A_1(\tau,T) \equiv \partial A(\tau,T)/\partial\tau$ and $B_1(\tau,T) \equiv \partial B(\tau,T)/\partial\tau$. By replacing
these partial derivatives into the PDE (11.32) we get:
$$0 = A_1 - \kappa r^{*}B + \frac{1}{2}\sigma^{2}B^{2} + \left(\kappa B - B_1 - 1\right)r, \quad \text{for all } (r,\tau)\in\mathbb{R}_{++}\times[t,T).$$
This implies that for all τ ∈ [t, T ),
$$0 = A_1 - \kappa r^{*}B + \frac{1}{2}\sigma^{2}B^{2}, \qquad 0 = \kappa B - B_1 - 1,$$
subject to the boundary conditions (11.34). The solutions are
$$B(\tau,T) = \frac{1}{\kappa}\left[1 - e^{-\kappa(T-\tau)}\right], \qquad A(\tau,T) = \frac{1}{2}\sigma^{2}\int_{\tau}^{T}B(s,T)^{2}\,ds - \kappa r^{*}\int_{\tau}^{T}B(s,T)\,ds.$$
By the definition of the yield curve given in Section 11.1 (see Eq. (11.2)),
$$R(t,T) \equiv -\frac{\ln P(r,t,T)}{T-t} = \frac{-A(t,T)}{T-t} + \frac{B(t,T)}{T-t}\,r.$$
It is possible to show the existence of a finite “asymptotic” spot rate, i.e. $\lim_{T\to\infty} R(t,T) = \lim_{T\to\infty}\frac{-A(t,T)}{T-t} < \infty$.
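As a numerical companion to the Vasicek solution, the sketch below evaluates A, B, the bond price and the yield for assumed parameter values. The two integrals of B are written in closed form; these expressions are worked out here by direct integration of B and are not quoted from the text, so they should be read as an assumption to be checked. The limiting yield r* − σ²/(2κ²) printed at the end follows from those same expressions.

```python
import math

def vasicek_B(m, kappa):
    """B as a function of time to maturity m = T - tau."""
    return (1.0 - math.exp(-kappa * m)) / kappa

def vasicek_A(m, kappa, r_star, sigma):
    """A = 0.5*sigma^2 * int B^2 ds - kappa*r_star * int B ds over [tau, T].

    Closed forms of the integrals (derived here, not quoted from the text):
      int B   = (m - B) / kappa
      int B^2 = (m - B) / kappa^2 - B^2 / (2*kappa)
    """
    B = vasicek_B(m, kappa)
    int_B = (m - B) / kappa
    int_B2 = (m - B) / kappa**2 - B**2 / (2.0 * kappa)
    return 0.5 * sigma**2 * int_B2 - kappa * r_star * int_B

def vasicek_price(r, m, kappa, rbar, sigma, lam):
    """Bond price exp(A - B*r), with r_star = rbar - lam*sigma/kappa."""
    r_star = rbar - lam * sigma / kappa
    return math.exp(vasicek_A(m, kappa, r_star, sigma) - vasicek_B(m, kappa) * r)

def vasicek_yield(r, m, kappa, rbar, sigma, lam):
    """Spot rate -ln(P) / (T - t)."""
    return -math.log(vasicek_price(r, m, kappa, rbar, sigma, lam)) / m

# Assumed parameter values, for illustration only.
kappa, rbar, sigma, lam, r = 0.5, 0.06, 0.02, -0.1, 0.04
for m in (1.0, 5.0, 10.0, 30.0):
    print(m, vasicek_price(r, m, kappa, rbar, sigma, lam),
          vasicek_yield(r, m, kappa, rbar, sigma, lam))

r_star = rbar - lam * sigma / kappa
print("limiting yield from the closed forms:", r_star - sigma**2 / (2.0 * kappa**2))
```

A quick sanity check is that the price equals one at zero time to maturity and that the yield at very long maturities approaches the printed limit.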
The model has a number of features capable of matching some empirical facts, such as a
few typical shapes of the yield-curve. However, this model is known to suffer from two main
drawbacks. The first drawback is that the short-term rate is Gaussian and, hence, can take on
negative values with positive probability. That is a counterfactual feature of the model. However,
it should be stressed that from a practical standpoint, this feature is largely irrelevant. If σ is
low compared to r̄, this probability is really very small. However, interest rate derivatives are
nonlinear objects and a small modeling error may result in serious mispricing, as pointed out
a long time ago by Dybvig [cite reference]. The second drawback, related to the first, is that the
short-term rate volatility is independent of the level of the short-term rate. It is well-known
that short-term rate changes become more and more volatile as the level of the short-term rate
increases. In the empirical literature, this phenomenon is usually referred to as the level-effect.
The model proposed by Cox, Ingersoll and Ross (1985) (CIR, henceforth) addresses these
two drawbacks at once, as it assumes that the short-term rate is solution to,

$$dr(\tau) = \kappa\left(\bar{r} - r(\tau)\right)d\tau + \sigma\sqrt{r(\tau)}\,dW(\tau), \quad \tau\in(t,T].$$
The CIR model is also referred to as a “square-root” process to emphasize that the diffusion
function is proportional to the square-root of r. This feature makes the model address the level-
effect phenomenon. Moreover, this property prevents r from taking negative values. Intuitively,
when r wanders just above zero, it is pulled back to the strictly positive region at a strength
of the order dr = κr̄dτ .7 The transition density of r is noncentral chi-square. The stationary
density of r is a gamma distribution. The expected value is as in Vasicek.8 However, the variance
is different, although its exact expression is really not important here.
CIR formulated a set of assumptions on the primitives of the economy (e.g., preferences) that
led to a risk-premium function $\lambda = \ell\sqrt{r}$, where ℓ is a constant. By replacing this, b(r) = κ (r̄ − r)
and $a(r) = \sigma\sqrt{r}$ into the PDE (11.28), one gets (similarly as in the Vasicek model), that the
bond price function takes the form in Eq. (11.33), but with functions A and B satisfying the
following differential equations:
$$0 = A_1 - \kappa\bar{r}B, \qquad 0 = -B_1 + (\kappa + \ell\sigma)B + \frac{1}{2}\sigma^{2}B^{2} - 1,$$
subject to the boundary conditions in Eq. (11.34).
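The two CIR differential equations can be integrated numerically. The sketch below is an illustration with assumed parameter values and hypothetical function names. It rewrites the equations in terms of time to maturity m = T − τ, so that dB/dm = 1 − (κ + ℓσ)B − ½σ²B² and dA/dm = −κr̄B with A(0) = B(0) = 0, and steps them forward with a classical fourth-order Runge-Kutta scheme. For comparison it also evaluates the well-known closed-form expression for B in the CIR model, which is quoted here from the standard literature rather than derived in the text above.

```python
import math

def cir_AB_numeric(m, kappa, rbar, sigma, ell, n_steps=2000):
    """Integrate the CIR equations for A and B in time to maturity m."""
    k_tilde = kappa + ell * sigma

    def dB(B):                      # dB/dm, from 0 = -B1 + k_tilde*B + 0.5*sigma^2*B^2 - 1
        return 1.0 - k_tilde * B - 0.5 * sigma**2 * B**2

    def dA(B):                      # dA/dm, from 0 = A1 - kappa*rbar*B
        return -kappa * rbar * B

    h = m / n_steps
    A, B = 0.0, 0.0                 # boundary conditions (11.34)
    for _ in range(n_steps):        # classical RK4 on the pair (A, B)
        b1 = dB(B)
        b2 = dB(B + 0.5 * h * b1)
        b3 = dB(B + 0.5 * h * b2)
        b4 = dB(B + h * b3)
        a1 = dA(B)
        a2 = dA(B + 0.5 * h * b1)
        a3 = dA(B + 0.5 * h * b2)
        a4 = dA(B + h * b3)
        A += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        B += h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return A, B

def cir_B_closed_form(m, kappa, sigma, ell):
    """Standard closed-form B for CIR (quoted for comparison)."""
    k_tilde = kappa + ell * sigma
    gamma = math.sqrt(k_tilde**2 + 2.0 * sigma**2)
    e = math.exp(gamma * m) - 1.0
    return 2.0 * e / ((gamma + k_tilde) * e + 2.0 * gamma)

def cir_price(r, m, kappa, rbar, sigma, ell):
    """Bond price exp(A - B*r) using the numerically integrated A and B."""
    A, B = cir_AB_numeric(m, kappa, rbar, sigma, ell)
    return math.exp(A - B * r)

# Assumed parameter values, for illustration only.
kappa, rbar, sigma, ell = 0.5, 0.06, 0.1, -0.2
A_num, B_num = cir_AB_numeric(5.0, kappa, rbar, sigma, ell)
print("B numeric:", B_num, " B closed form:", cir_B_closed_form(5.0, kappa, sigma, ell))
print("price of a 5-year bond at r = 4%:", cir_price(0.04, 5.0, kappa, rbar, sigma, ell))
```

Solving the equations numerically is a design choice made for transparency: the same routine works unchanged if the Riccati equation for B is altered, whereas a closed form would have to be derived again.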
In their article, CIR also showed how to compute options on bonds. They even provided
hints on how to “invert the term-structure,” a popular technique that we describe in detail in
Section 11.3.6. For all these features, the CIR model and paper have been used in the industry
for many years. And many of the more modern models are mere multidimensional extensions
of the basic CIR model. (See Section 11.3.6).
11.3.3.2 Nonlinear drifts
An important issue is the analytical tractability of a given model. As demonstrated earlier,

models such as Vasicek and CIR admit a closed-form solution. Among other things, this is
because these models have a linear drift. Is evidence consistent with this linear assumption?
What does empirical evidence suggest as regards mean reversion of the short-term rate?
Such an empirical issue is subject to controversy. In the mid 1990s, three papers by Aït-Sahalia
(1996), Conley et al. (1997) and Stanton (1997) produced evidence that mean-reverting
behavior is nonlinear. As an example, Conley et al. (1997) estimated a drift function of the
following form:
$$b(r) = \beta_0 + \beta_1 r + \beta_2 r^{2} + \beta_3 r^{-1},$$
7 This is only intuition. The exact condition under which the zero boundary is unattainable by r is $\kappa\bar{r} > \frac{1}{2}\sigma^{2}$. See Karlin and
Taylor (1981, vol II chapter 15) for a general analysis of attainability of boundaries for scalar diffusion processes.
8 The expected value of linear mean-reverting processes is always as in Vasicek, independently of the functional form of the
diffusion coefficient. This property follows by a direct application of a general result for diffusion processes given in Chapter 6
(Appendix A).
which is reproduced in Figure 11.3 below (Panel A). Similar results were obtained in the other
papers. To grasp the phenomena underlying this nonlinear drift, Figure 11.3 (Panel B) also
contrasts the nonlinear shape in Panel A with a linear drift shape that can be obtained by
fitting the CIR model to the same data set (US data: daily data from 1981 to 1996).
[Figure: Panel A and Panel B, each plotting the drift against the short-term rate r.]
FIGURE 11.3. Nonlinear mean reversion?
The importance of the nonlinear effects in Figure 11.3 is related to the convexity effects in
Mele (2003). Mele (2003) showed that bond prices may be concave in the short-term rate if the
risk-neutralized drift function is sufficiently convex. While the results in this Figure relate to
the physical drift functions, the point is nevertheless important, as risk-premium terms would
have to look very strange to completely destroy the nonlinearities of the short-term rate under the
physical probability.
The main lesson is that under the “nonlinear drift dynamics,” the short-term rate behaves in
a way that is at least roughly comparable with how it would behave under the “linear drift
dynamics.” However, the behavior at the extremes is dramatically different. As the short-term
rate moves to the extremes, it is pulled back to the “center” in a very abrupt way. At the
moment, it is not clear whether these preliminary empirical results are reliable or not. New
econometric techniques are currently being developed to address this and related issues.
One possibility is that such single factor models of the short-term rate are simply misspecified.
For example, there is strong empirical evidence that the volatility of the short-term rate is time-
varying, as we shall discuss in the next section. Moreover, the term-structure implications of
a single factor model are counterfactual, since we know that a single factor cannot explain
the entire variation of the yield curve, as we explained in Section 11.2. We now describe more
realistic models driven by more than one factor.
11.3.4 Multifactor models

The empirical evidence reviewed in Section 11.2 suggests that one-factor models cannot explain
the entire variation of the term-structure of interest rates. Factor analysis suggests we need at
least three factors. In this section, we succinctly review the advances made in the literature to
address this important empirical issue.
11.3.4.1 Stochastic volatility
In the CIR model, the instantaneous short-term rate volatility is stochastic, as it depends on the
level of the short-term rate, which is obviously stochastic. However, empirical evidence suggests
that the short-term rate volatility depends on some additional factors. A natural extension of
the CIR model is one where the instantaneous volatility of the short-term rate depends on (i)
the level of the short-term rate, similarly as in the CIR model, and (ii) some additional random
component. Such an additional random component is what we shall refer to as the “stochastic
volatility” of the short-term rate. It is the term-structure counterpart to the stochastic volatility
extension of the Black and Scholes (1973) model (see Chapter 10).
Fong and Vasicek (1991) wrote the first paper in which the volatility of the short-term rate
is stochastic. They consider the following model:
$$\begin{aligned} dr(\tau) &= \kappa_r\left(\bar{r} - r(\tau)\right)d\tau + \sqrt{v(\tau)}\,r(\tau)^{\gamma}\,dW_1(\tau) \\ dv(\tau) &= \kappa_v\left(\bar{v} - v(\tau)\right)d\tau + \xi_v\sqrt{v(\tau)}\,dW_2(\tau) \end{aligned} \qquad (11.35)$$
where κr , r̄, κv , v̄ and ξ v are constants, and [W1 W2 ] is a vector Brownian motion. To obtain
a closed-form solution, Fong and Vasicek set γ = 0. The authors also make assumptions about
risk aversion corrections. Namely, they assume that the unit-risk-premia for the stochastic fluctuations
of the short-term rate, λr , and the short-term rate volatility, λv , are both proportional
to $\sqrt{v(\tau)}$, and then they find a closed-form solution for the bond price as of time t and maturing
at time T , P (r (t) , v (t) , T − t).
Longstaff and Schwartz (1992) propose another model of the short-term rate where the
volatility of the short-term rate is stochastic. The remarkable feature of their model is that
it is a general equilibrium model. Naturally, the Longstaff & Schwartz model predicts, like the
Fong-Vasicek model, that the bond price is a function of both the short-term rate and its
instantaneous volatility.
Note, then, the important feature of these models. The pricing function, P (r (t) , v (t) , T − t)
and, hence, the yield curve R (r (t) , v (t) , T − t) ≡ − (T − t)−1 ln P (r (t) , v (t) , T − t), depends
on the level of the short-term rate, r (t), and one additional factor, the instantaneous variance
of the short-term rate, v (t). Hence, these models predict that we now have two factors that
help explain the term-structure of interest rates, R (r (t) , v (t) , T − t).
What is the relation between the volatility of the short-term rate and the term-structure
of interest rates? Does this volatility help “track” one of the factors driving the variations of
the yield curve? Consider, first, the basic Vasicek (1977) model. While this model assumes the
short-term rate volatility is constant, it can still be used to develop intuition about models with
stochastic volatility, such as the Fong and Vasicek (1991) model of Eqs. (11.35). For the
Vasicek model, then,
$$\frac{\partial R(r(t),T-t)}{\partial\sigma} = -\frac{1}{T-t}\left[\sigma\int_t^T B(T-s)^{2}\,ds + \lambda\int_t^T B(T-s)\,ds\right], \qquad (11.36)$$
where $B(T-s) = \frac{1}{\kappa}\left[1 - e^{-\kappa(T-s)}\right]$. Eq. (11.36) shows that if λ ≥ 0, the term-structure is
decreasing in short-term rate volatility. That is, bond prices increase in σ, a conclusion paralleling
that for options, where option prices are increasing in the volatility of the asset price. As
explained in Chapter 10, this property arises through the optionality of the contract–say the
convexity of a European call price with respect to the asset price.
Interesting properties arise when λ < 0, which is an empirically relevant case.9 In this case,
the sign of $\partial R(t,T)/\partial\sigma$ is determined by both “convexity” and “slope” effects. “Convexity” effects,
those relating to the second partial $\frac{\partial^{2}P(r,T-t)}{\partial r^{2}} = P(r,T-t)\,B(T-t)^{2}$, arise through the term
$\sigma\int_t^T B(T-s)^{2}\,ds$. “Slope” effects, those relating to $\frac{\partial P(r,T-t)}{\partial r} = -P(r,T-t)\,B(T-t)$, arise,
instead, through the term $\int_t^T B(T-s)\,ds$. If λ is negative, and large in absolute value, slope
effects can dominate convexity effects, and the term-structure can actually increase in σ. For
intermediate values of λ, the term-structure can be both increasing and decreasing in σ. At short
maturities, the convexity effects in Eq. (11.36) are typically dominated by slope effects, and
the short-end of the term-structure can be increasing in σ. At longer maturity dates, however,
convexity effects are more important and, sometimes, dominate slope effects.
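The interplay between convexity and slope effects in Eq. (11.36) can be tabulated directly. The sketch below uses assumed parameter values and a hypothetical function name, and evaluates the two integrals of B in closed form (expressions worked out here by direct integration, not quoted from the text). It returns the sensitivity of the yield to σ together with its convexity and slope components.

```python
import math

def yield_sensitivity_to_sigma(m, kappa, sigma, lam):
    """Evaluate Eq. (11.36) at time to maturity m = T - t.

    Returns (dR/dsigma, convexity term, slope term), where
      convexity = sigma * int_t^T B(T - s)^2 ds
      slope     = lam   * int_t^T B(T - s)   ds
    """
    B = (1.0 - math.exp(-kappa * m)) / kappa
    int_B = (m - B) / kappa                               # closed form (derived here)
    int_B2 = (m - B) / kappa**2 - B**2 / (2.0 * kappa)    # closed form (derived here)
    convexity = sigma * int_B2
    slope = lam * int_B
    return -(convexity + slope) / m, convexity, slope

# Assumed parameter values, for illustration only.
kappa, sigma = 0.5, 0.02
for lam in (0.1, -0.01, -0.2):
    for m in (0.25, 5.0, 30.0):
        sens, conv, slope = yield_sensitivity_to_sigma(m, kappa, sigma, lam)
        print(f"lambda={lam:+.2f}  maturity={m:5.2f}  dR/dsigma={sens:+.6f}")
```

With these illustrative numbers, a non-negative λ gives a negative sensitivity at every maturity, an intermediate negative λ gives a positive sensitivity at the short end and a negative one at the long end, and a large negative λ gives a positive sensitivity throughout.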
To develop further intuition, consider the following binomial example. In the next period, the
short-term rate is either r+ = r + d or r+ = r − d with equal probability, where r is the current
interest rate level and d > 0. The price of a two-period bond is P (r, d) = m(r, d)/ (1 + r),
where m(r, d) = E [1/ (1 + r+ )] is the expected discount factor of the next period. By Jensen’s
inequality, m(r, d) > 1/ (1 + E [r+ ]) = 1/ (1 + r) = m(r, 0). Therefore, two-period bond prices
increase upon activation of randomness. More generally, two-period bond prices are always
increasing in the “volatility” parameter d in this example (see Figure 11.4). Again, this property
relates to an insight of Jagannathan (1984, p. 429-430) that in a two-period economy with
identical initial underlying asset prices, a terminal underlying asset price ỹ is a mean preserving
spread of another terminal underlying asset price x̃ (in the Rothschild and Stiglitz (1970) sense)
if and only if the price of a call option on ỹ is higher than the price of a call option on x̃. This
is because if ỹ is a mean preserving spread of x̃, then E [f(ỹ)] > E [f (x̃)] for f increasing and
convex.10
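The Jensen's inequality argument in the binomial example can be checked numerically. The following is a minimal sketch in Python; the function names and the parameter values (r = 5% and a few values of d) are illustrative assumptions, not part of the original example.

```python
def expected_discount_factor(r, d):
    """m(r, d) = E[1 / (1 + r+)], where r+ = r + d or r - d with probability 1/2 each."""
    return 0.5 * (1.0 / (1.0 + r + d) + 1.0 / (1.0 + r - d))


def two_period_price(r, d):
    """Two-period bond price P(r, d) = m(r, d) / (1 + r)."""
    return expected_discount_factor(r, d) / (1.0 + r)


if __name__ == "__main__":
    r = 0.05  # hypothetical current short-term rate
    for d in (0.0, 0.01, 0.02, 0.04):
        print(f"d = {d:.2f}   m(r, d) = {expected_discount_factor(r, d):.6f}"
              f"   P(r, d) = {two_period_price(r, d):.6f}")
```

Since m(r, d) = (1 + r) / ((1 + r)² − d²) in this additive setting, the printed prices should rise with d for any admissible choice of r and d.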
These properties arise as we assumed the expected short-term rate is independent of d. In an
alternative setting, say a multiplicative setting where either r+ = r (1 + d) or r+ = r/ (1 + d)
with equal probability, bond prices are decreasing in volatility at short maturities and increasing
in volatility at longer maturities, as originally pointed out by Litterman, Scheinkman and Weiss
(1991). It’s because expected future interest rates increase over time at a strength positively
related to d. That is, the expected variation of the short-term rate is increasing in the volatility
of the short-term rate, d, a property that can be re-interpreted as one arising in an economy
with risk-averse agents. At short maturity dates, such an effect dominates the convexity effect
illustrated in Figure 11.4. At longer maturity dates, the convexity effect dominates.
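The multiplicative case can be explored with a small recombining tree. The sketch below is only an illustration under stated assumptions: the tree construction, the function names and the parameter values are hypothetical, and discounting is done with the one-period rate prevailing at each node. It shows that the expected next-period rate exceeds r and lets the reader inspect how prices respond to d at different maturities.

```python
def expected_next_rate(r, d):
    """E[r+] when r+ = r (1 + d) or r / (1 + d) with probability 1/2 each."""
    return 0.5 * (r * (1.0 + d) + r / (1.0 + d))


def bond_price(r, d, n):
    """Price of a zero maturing in n periods on a recombining multiplicative tree.

    At date k, the node with u up-moves has short-term rate r (1 + d)^(2u - k).
    """
    values = [1.0] * (n + 1)  # payoff at maturity, one entry per number of up-moves
    for k in range(n - 1, -1, -1):  # backward induction over dates n-1, ..., 0
        values = [
            0.5 * (values[u + 1] + values[u]) / (1.0 + r * (1.0 + d) ** (2 * u - k))
            for u in range(k + 1)
        ]
    return values[0]


if __name__ == "__main__":
    r = 0.05  # hypothetical current short-term rate
    print("E[r+] with d = 0.10:", expected_next_rate(r, 0.10))
    for n in (2, 10, 30):
        row = ", ".join(f"d={d:.1f}: {bond_price(r, d, n):.6f}" for d in (0.0, 0.1, 0.3))
        print(f"maturity {n:2d} periods  ->  {row}")
```

With d = 0 the tree collapses to a flat rate, so the price reduces to (1 + r)^(−n), which is a convenient sanity check.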
More generally, then, and as regards the term-structure, volatility changes do not represent
a mean-preserving spread for the risk-neutral distribution, as Eq. (11.36) illustrates for the Vasicek
model. In a world with complete markets, as in the Black-Scholes one, the asset underlying
the contract is traded. As regards interest rates, the situation differs, for the very simple reason
that the short-term rate is not a traded asset. Therefore, the risk-neutral drift of the short-term rate
does in general depend on the short-term volatility through some risk adjustment; in Vasicek,
for example, this dependence is channeled through the risk-premium parameter λ. And while
the previous conclusions rely on comparative statics for a constant volatility model, they illustrate
the more general situation of stochastic volatility. Mele (2003) shows that in more complex
9 In this simple model, the assumption that λ < 0 is reasonable, as we observe positive risk-premia more often than negative
risk-premia. But in this very same model, ur < 0, which together with λ < 0, ensures that term-premia are positive.
10 In our case, let $\tilde m_d(i^+) = 1/(1 + i^+)$ denote the random discount factor when $i^+ = i \mp d$. We have that $x \mapsto -\tilde m_d(x)$
is increasing and concave and, hence, $E[-\tilde m_{d''}(x)] < E[-\tilde m_{d'}(x)] \Leftrightarrow d' < d''$, which is what is demonstrated in Figure 11.4. In
Jagannathan (1984), f is increasing and convex, and so we must have: E [f (ỹ)] > E [f (x̃)] ⇔ ỹ is riskier than, or a mean preserving
spread of, x̃.
stochastic volatility cases, provided the risk-premium required to bear the interest rate risk is
negative, and sufficiently large in absolute value, slope effects dominate convexity effects at any
finite maturity date, thus making bond prices decrease with volatility at any arbitrary maturity
date.
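[Figure 11.4: the discount factor 1/(1 + r+) plotted against r+, with the points a, b, B, A marked on the curve, the horizontal axis marked at r − d′, r − d, r, r + d, r + d′, and the midpoints m(r, d′) = (a + A)/2 and m(r, d) = (b + B)/2.]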
FIGURE 11.4. If the risk-neutralized interest rate of the next period is either r+ = r + d or
r+ = r − d with equal probability, the random discount factor 1/ (1 + r+ ) is either B or b with
equal probability. Hence m(r, d) = E [1/ (1 + r+ )] is the midpoint of bB. Similarly, if volatility
is d′ > d, m(r, d′ ) is the midpoint of aA. Since ab > BA, it follows that m(r, d′ ) > m(r, d).
Therefore, the two-period bond price P (r, d) = m(r, d)/ (1 + r) satisfies: P (r, d′ ) > P (r, d) for
d′ > d.
What are the implications of these conclusions in terms of the classical factor analysis of
the term-structure reviewed in Section 11.2? Clearly, the very short-end of the yield curve is not
affected by movements of the volatility, as $\lim_{T \to t} R(r(t), v(t), T - t) = r(t)$, for all possible
values of v (t). Also, in these models, we have that $\lim_{T \to \infty} R(r(t), v(t), T - t) = \bar R$, where
R̄ is a constant and, hence, independent of v (t). Therefore, movements in the short-term
volatility can only produce their effects on the middle of the yield curve. For example, if the
risk-premium required to bear the interest rate risk is negative and sufficiently large, an upward
movement in v (t) can produce an effect on the yield curve qualitatively similar to that depicted
in Figure 11.2 (“Curvature” panel), and would thus roughly mimic the “curvature” factor that
we reviewed in Section 11.2.
11.3.4.2 Three-factor models
We need at least three factors to explain the entire variation in the yield-curve. A model where
the interest rate volatility is stochastic may be far from being exhaustive in this respect. A
natural extension is a model where the drift of the short-term rate contains some predictable
component, r̄ (τ ), which acts as a third factor, as in the following model:

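$$
\begin{cases}
dr(\tau) = \kappa_r\,(\bar r(\tau) - r(\tau))\,d\tau + \sqrt{v(\tau)}\,r(\tau)^{\gamma}\,dW_1(\tau) \\
dv(\tau) = \kappa_v\,(\bar v - v(\tau))\,d\tau + \xi_v\sqrt{v(\tau)}\,dW_2(\tau) \\
d\bar r(\tau) = \kappa_{\bar r}\,(\bar\imath - \bar r(\tau))\,d\tau + \xi_{\bar r}\sqrt{\bar r(\tau)}\,dW_3(\tau)
\end{cases}
\tag{11.37}
$$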
where $\kappa_r$, $\gamma$, $\kappa_v$, $\bar v$, $\xi_v$, $\kappa_{\bar r}$, $\bar\imath$ and $\xi_{\bar r}$ are constants, and $[W_1\ W_2\ W_3]$ is a vector Brownian motion.
Balduzzi et al. (1996) develop the first model for which the drift of the short-term rate
changes stochastically, as in Eqs. (11.37). Dai and Singleton (2000) estimate a number of models
that generalize that in Eqs. (11.37) (See Section 11.3.7 for details on the estimation strategy).
The term-structure implications of these models can be understood very simply. First, under
regularity conditions about the risk-premia, the yield curve is R (r (t) , r̄ (t) , v (t) , T − t) ≡
− (T − t)−1 ln P (r (t) , r̄ (t) , v (t) , T − t). Second, and intuitively, changes in the new factor r̄ (t)
should primarily affect the long-end of the yield curve. This is because empirically, the usual
finding is that the short-term rate reverts relatively quickly to the long-term factor r̄ (τ ) (i.e. κr
is relatively large), where r̄ (τ ) mean-reverts slowly (i.e. κr̄ is relatively low). This mechanism
makes the short-term rate quite persistent anyway. Ultimately, then, the slow mean-reversion of
r̄ (τ ) means that changes in r̄ (τ ) last for the relevant part of the term-structure we are usually
interested in (i.e. up to 30 years), despite the fact that limT →∞ R (r (t) , r̄ (t) , v (t) , T − t) is
independent of the movements of the three factors r (t), r̄ (t) and v (t).
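The persistence mechanism just described can be illustrated with the conditional expectation of the future short-term rate implied by the linear drifts of r and r̄ in Eqs. (11.37). The sketch below is an assumption-laden illustration: it takes the drifts as written, assumes the stochastic integrals are martingales and that κr ≠ κr̄, and uses hypothetical parameter values. It computes expected future short-term rates, not yields, so risk-premia and convexity are ignored.

```python
import math


def expected_rate_loadings(kappa_r, kappa_rbar, h):
    """Loadings of E_t[r(t+h)] - ibar on (r(t) - ibar) and on (rbar(t) - ibar).

    They solve dE[r]/dh = kappa_r (E[rbar] - E[r]) and
    dE[rbar]/dh = kappa_rbar (ibar - E[rbar]), assuming kappa_r != kappa_rbar.
    """
    on_r = math.exp(-kappa_r * h)
    on_rbar = kappa_r / (kappa_r - kappa_rbar) * (
        math.exp(-kappa_rbar * h) - math.exp(-kappa_r * h)
    )
    return on_r, on_rbar


if __name__ == "__main__":
    kappa_r, kappa_rbar = 2.0, 0.1  # hypothetical: fast reversion of r, slow reversion of rbar
    for h in (0.25, 1.0, 5.0, 10.0, 30.0):
        on_r, on_rbar = expected_rate_loadings(kappa_r, kappa_rbar, h)
        print(f"horizon {h:5.2f}y   loading on r: {on_r:.4f}   loading on rbar: {on_rbar:.4f}")
```

With a large κr and a small κr̄, the loading on r(t) dies off within a few years, whereas the loading on r̄(t) remains sizeable at horizons relevant for the long end of the curve, and both vanish as the horizon grows without bound.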
However, it is difficult to see how to reconcile such a behavior of the long-end of the yield
curve with the existence of any of the factors discussed in Section 11.2. First, the short-term rate
cannot be taken as a “level factor,” since we know its effects die off relatively quickly. Instead, a
joint change in both the short-term rate, r (t), and the “long-term” rate, r̄ (t), would really be
needed to mimic the “Level” panel of Figure 11.2 in Section 11.2. However, this interpretation
is at odds with the assumption that the factors discussed in Section 11.2 are uncorrelated!
Moreover, and crucially, the empirical results in Dai and Singleton reveal that, if anything, r (t) and
r̄ (t) are negatively correlated.
Finally, to emphasize how exacerbated these puzzles are, consider the effects of changes in
the short-term rate r (t). We know that the long-end of the term-structure is not affected by
movements of the short-term rate. Hence, the short-term rate acts as a “steepness” factor, as
in Figure 11.2 (“Slope” panel). However, this interpretation is restrictive, as factor analysis
reveals that the short-end and the long-end of the yield curve move in opposite directions after
a change in the steepness factor. Here, instead, a change in the short-term rate only modifies
the short-end (and, perhaps, the middle) of the yield curve and, hence, does not produce any
variation at the long end of the curve.
11.3.4.3 Unspanned stochastic volatility
Unspanned stochastic volatility arises when

$$
\frac{\partial}{\partial v} P(r(t), \bar r(t), v(t), T - t) = 0.
$$
The hypothesis that fixed income markets have unspanned stochastic volatility has been put
forward by Collin-Dufresne and Goldstein (2002). Mele (2003) provides conditions under which
this occurs.
[In progress]
11.3.5 Affine and quadratic term-structure models

11.3.5.1 Affine
The Vasicek and CIR models predict that the bond price is exponential-affine in the short-
term rate r. This property is the expression of a general phenomenon. Indeed, it is possible
to show that bond prices are exponential-affine in r if, and only if, the functions b and $a^2$ are
affine in r. Models that satisfy these conditions are known as affine models. More generally,
these basic results extend to multifactor models, where bond prices are exponential-affine in
the state variables.11 In these models, the short-term rate is a function r (y) such that
r (y) = r0 + r1 · y,
where r0 is a constant, r1 is a vector, and y is a multidimensional diffusion in $\mathbb{R}^n$, the solution to
$$
dy(\tau) = \kappa\,(\mu - y(\tau))\,d\tau + \Sigma V(y(\tau))\,dW(\tau), \tag{11.38}
$$
where W is a d-dimensional Brownian motion, Σ is a full rank n × d matrix, and V is a full
rank d × d diagonal matrix with elements,
$$
V(y)_{(ii)} = \sqrt{\alpha_i + \beta_i^{\top} y}, \quad i = 1, \cdots, d, \tag{11.39}
$$
for some scalars αi and vectors β i . Langetieg (1980) develops the first multifactor model of this
kind, in which β i = 0.
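Next, let $V^{-}(y)$ be a d × d diagonal matrix with elements
$$
V^{-}(y)_{(ii)} =
\begin{cases}
\dfrac{1}{V(y)_{(ii)}} & \text{if } \Pr\{V(y(t))_{(ii)} > 0 \text{ all } t\} = 1 \\
0 & \text{otherwise}
\end{cases}
$$
and set,
$$
\Lambda(y) = V(y)\,\lambda_1 + V^{-}(y)\,\lambda_2\, y, \tag{11.40}
$$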
for some d-dimensional vector λ1 and some d × n matrix λ2 . Duffie and Kan (1996) explained
in a comprehensive way the benefit of this model. In their formulation λ2 = 0d×n , and the bond
price is exponential-affine in the state variables y. That is, the price of the zero has the following
functional form,
P (y, T − t) = exp [A (T − t) + B (T − t) · y] , (11.41)
for some functions A and B of time to maturity, T − t (B is vector-valued), such that A (0) = 0
and B (0)(i) = 0.
The more general functional form for Λ in Eq. (11.40) has been suggested by Duffee (2002).
Duffee noticed that in bond markets, risk-premiums, defined as $\Lambda(y) V(y) = V^2(y)\lambda_1 + \lambda_2 y$,
are related not only to the volatility of fundamentals, but also to the level of the fundamentals,
which justifies the inclusion of the additional term λ2 y. In this case, bond prices still have
an exponential affine form, just as in Eq. (11.41). When λ2 = 0d×n , we say that the model
is “completely affine,” and “essentially affine,” otherwise. The clear advantage of these affine
models, then, is that they considerably simplify statistical inference, as explained in Section
11.3.7 below.
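As a concrete instance of the exponential-affine form in Eq. (11.41), consider the one-factor Vasicek case. The sketch below assumes risk-neutral dynamics dr = κ(θ − r)dτ + σdW̃ and uses the standard closed-form coefficients; the symbols κ, θ, σ, the function names and the numerical values are assumptions made for illustration. In the sign convention of Eq. (11.41), the loading on r is the negative of the function `vasicek_B` below.

```python
import math


def vasicek_B(kappa, tau):
    """Loading such that P = exp[A(tau) - B(tau) r]."""
    return (1.0 - math.exp(-kappa * tau)) / kappa


def vasicek_A(kappa, theta, sigma, tau):
    """Intercept A(tau) of the log bond price, with A(0) = 0."""
    B = vasicek_B(kappa, tau)
    return (theta - sigma ** 2 / (2.0 * kappa ** 2)) * (B - tau) - sigma ** 2 * B ** 2 / (4.0 * kappa)


def vasicek_price(r, kappa, theta, sigma, tau):
    """Zero-coupon bond price, exponential-affine in the short-term rate r."""
    return math.exp(vasicek_A(kappa, theta, sigma, tau) - vasicek_B(kappa, tau) * r)


def vasicek_yield(r, kappa, theta, sigma, tau):
    """Continuously compounded yield, affine in r (requires tau > 0)."""
    return -math.log(vasicek_price(r, kappa, theta, sigma, tau)) / tau


if __name__ == "__main__":
    kappa, theta, sigma = 0.5, 0.05, 0.02  # hypothetical risk-neutral parameters
    for tau in (0.5, 2.0, 10.0, 30.0):
        print(f"tau = {tau:5.1f}   yield at r = 3%: {vasicek_yield(0.03, kappa, theta, sigma, tau):.5f}")
```

Because the yield is affine in r, the difference between two yields of the same maturity computed at two values of r equals B(τ)/τ times the difference in r, and the long-maturity yield approaches θ − σ²/(2κ²) regardless of r.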
Ang and Piazzesi (2003) (AP, henceforth) and Hördahl, Tristani and Vestin (2006) (HTS,
henceforth) introduce “no-arbitrage” regressions, to model the relations linking macroeconomic
variables to the yield curve. In their models, the factors are taken to be a discrete-time version
of Eq. (11.38), where some components of y are observable, and others are unobservable. The
observables relate to macroeconomic factors such as inflation or industrial production. The
11 More generally, we say that affine models are those that make the characteristic function exponential-affine in the state variables.
In the case of the multifactor interest rate models of the previous section, this condition is equivalent to the condition that bond
prices are exponential affine in the state variables.
authors, then, study how all these factors affect the yield curve, predicted by a pricing equation
such as that in Eq. (11.41). While HTS have a structural model of the macroeconomy, AP have
a reduced-form model.
Reduced-form models can be exposed to the critique that some of the parameters are not
“variation-free.” [Explain what variation-free parameters are, in mathematical statistics] For
example, in the simple Lucas economy of Part I, we know that the short-term rate is
$r = \rho + \eta\mu + \frac{1}{2}\sigma^2\eta\,(1 - \eta)$, so by “tilting” η (risk-aversion), we should also have a change in the
interest rate. This simple example shows that the parameters related to risk-aversion correction
in Eq. (11.40) are not free to be “tilted,” in that tilting them has an effect on the parameters of
the factor dynamics in Eq. (11.38). At the same time, reduced-form models offer a great deal of
flexibility, as they do not restrict, so to speak, the model to track any market or economy such as
the Lucas economy, say. Moreover, we can always find a theoretical market supporting the no-arb
market underlying the reduced-form model. No-arb regressions such as those in AP give the
data the power to say which parameter constellations make the model likely to perform, without
imposing theoretical restrictions which the data might, then, be likely to reject. For example,
the Lucas model, while clearly illustrating that some of the parameters are not variation-free,
can be simply wrong, and might impose unreasonable restrictions on the data. For no-arb
models, instead, cross-equation restrictions arise through the weaker requirement of absence
of arbitrage opportunities.
11.3.5.2 Quadratic
Affine models are known to impose tight conditions on the structure of the volatility of the
state variables. These restrictions arise to keep the square root in Eq. (11.39) real valued. But
these constraints may hinder the actual performance of the models. There exists another class
of models, known as quadratic models, that partially overcome these difficulties.
11.3.6 Short-term rates as jump-diffusion processes

Seminal contribution (extension of the CIR general equilibrium model to jump-diffusions): Ahn
and Gao (1988, JF). Suppose that the short-term rate is a jump-diffusion process:
$$
dr(\tau) = b_J(r(\tau))\,d\tau + a(r(\tau))\,dW(\tau) + \ell(r(\tau)) \cdot S \cdot dZ(\tau),
$$
where the previous equation is written under the risk-neutral probability, and $b_J$ is thus a jump-
adjusted risk-neutral drift. For all $(r, \tau) \in \mathbb{R}_{++} \times [t, T)$, the bond price P (r, τ , T ) is then the
solution to,
$$
0 = \left(\frac{\partial}{\partial\tau} + L - r\right) P(r, \tau, T) + v^{Q} \int_{\mathrm{supp}(S)} \left[P(r + \ell S, \tau, T) - P(r, \tau, T)\right] p(dS), \tag{11.42}
$$
and P (r, T, T ) = 1 ∀r ∈ R++ .

This is because, as usual, $\{\exp(-\int_t^{\tau} r(u)\,du)\,P(r, \tau, T)\}_{\tau \in [t,T]}$ must be a martingale under the
risk-neutral probability in order to prevent arbitrage opportunities.12 Also, we can model the
presence of different quality (or “types”) of jumps, and the previous formula becomes:
$$
0 = \left(\frac{\partial}{\partial\tau} + L - r\right) P(r, \tau, T) + \sum_{j=1}^{N} v_j^{Q} \int_{\mathrm{supp}(S)} \left[P(r + \ell S, \tau, T) - P(r, \tau, T)\right] p_j(dS),
$$
12 Just use $y(\tau) \equiv b(\tau)^{-1} u(r(\tau), \tau, T)$, where b solves $db(\tau) = r(\tau)\,b(\tau)\,d\tau$ (in differential form), for the connection between Eq.
(11.42) and martingales.

where N is the number of jump types, but here for simplicity we just set N = 1.
As regards the risk-neutral distribution, the important thing as usual is to identify the risk-
premia. Here we simply have:
$$
v^{Q} = v \cdot \lambda_J,
$$
where v is the intensity of the short-term rate jump under the physical distribution, and λJ is
the risk-premium demanded by agents to be compensated for the presence of jumps.13
Bonds subject to default-risk can be modeled through partial differential equations. This is
particularly the case when default is considered as an exogenously given rare event modeled as
a Poisson process. This is the so-called “reduced-form” approach. Precisely, assume that the
event of default at each instant of time is a Poisson process Z with intensity v, and assume
that in the event of default at point τ , the holder of the bond receives a recovery payment
P̄ (τ ) which can be a deterministic function of time (e.g., a constant) or more generally, a
σ (r(s) : t ≤ s ≤ τ )-adapted process satisfying some basic regularity conditions.
Next, let τ̂ be the random default time, and let’s create an auxiliary state variable g with
the following features:
$$
g =
\begin{cases}
0 & \text{if } t \le \tau < \hat\tau \\
1 & \text{otherwise}
\end{cases}
$$
The relevant information for an investor is thus given by the following risk-neutral dynamics:
$$
\begin{cases}
dr(\tau) = b(r(\tau))\,d\tau + a(r(\tau))\,dW(\tau) \\
dg(\tau) = S \cdot dN(\tau), \quad \text{where } S \equiv 1 \text{ with probability one}
\end{cases}
\tag{11.43}
$$
Denote the rational bond price function as P (r, g, τ , T ), τ ∈ [t, T ]. It is assumed that ∀τ ∈
[t, T ] and ∀v ∈ (0, ∞), P (r, 1, τ , T ) = P̄ (τ ) < P (r, 0, τ , T ) a.s. As shown below, such an
assumption, plus the assumption that P̄ (τ ; v′ ) ≥ P̄ (τ ; v) ⇔ v ′ ≥ v, is sufficient to guarantee
that default-free bond prices are higher than defaultable bond prices.
By the usual absence of arbitrage opportunities arguments, the following equation is satisfied
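by the pre-default bond price $P(r, 0, \tau, T) = P^{\mathrm{pre}}(r, \tau, T)$:
$$
\begin{aligned}
0 &= \left(\frac{\partial}{\partial\tau} + L - r\right) P(r, 0, \tau, T) + v(r) \cdot \left[P(r, 1, \tau, T) - P(r, 0, \tau, T)\right] \\
  &= \left(\frac{\partial}{\partial\tau} + L - (r + v(r))\right) P(r, 0, \tau, T) + v(r)\,\bar P(\tau), \quad \tau \in [t, T),
\end{aligned}
\tag{11.44}
$$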
with the usual boundary condition P (r, 0, T, T ) = 1.

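The solution for the pre-default bond price is:
$$
\begin{aligned}
P^{\mathrm{pre}}(x, t, T) &= E^{*}\left[\exp\left(-\int_t^T (r(\tau) + v(r(\tau)))\,d\tau\right)\right] \\
&\quad + E^{*}\left[\int_t^T \exp\left(-\int_t^{\tau} (r(u) + v(r(u)))\,du\right) \cdot v(r(\tau))\,\bar P(\tau)\,d\tau\right],
\end{aligned}
$$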
where E∗ [·] is the expectation taken with reference to only the first equation of system (11.43).
This coincides with Duffie and Singleton (1999, Eq. (10) p. 696) when we define a percentage
13 Further details on changes of measures for jump-type processes can be found in Brémaud (1981).
loss process l in [0, 1] so as to have P̄ = (1 − l) · P . Indeed, inserting P̄ = (1 − l) · P into Eq.
(11.44) gives:
$$
0 = \left(\frac{\partial}{\partial\tau} + L - (r + l(\tau)\,v(r))\right) P(r, 0, \tau, T), \quad \forall (r, \tau) \in \mathbb{R}_{++} \times [t, T),
$$
with the usual boundary condition, the solution of which is:
$$
P^{\mathrm{pre}}(x, t, T) = E^{*}\left[\exp\left(-\int_t^T (r(\tau) + l(\tau) \cdot v(r(\tau)))\,d\tau\right)\right].
$$
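To fix ideas, the two pre-default pricing formulae can be evaluated in closed form in the special case where the short-term rate, the default intensity, the recovery payment and the loss rate are all constant. This special case, the function names and the numbers below are assumptions made for illustration only; with constant coefficients, the expectations reduce to elementary integrals.

```python
import math


def predefault_price_fixed_recovery(r, v, recovery, tau):
    """Pre-default price with constant r, intensity v and recovery payment (needs r + v > 0).

    exp(-(r + v) tau) + v * recovery * (1 - exp(-(r + v) tau)) / (r + v)
    """
    a = r + v
    survival = math.exp(-a * tau)
    return survival + v * recovery * (1.0 - survival) / a


def predefault_price_fractional_loss(r, v, loss, tau):
    """Pre-default price when recovery is (1 - loss) times the pre-default value."""
    return math.exp(-(r + loss * v) * tau)


if __name__ == "__main__":
    r, tau, recovery = 0.03, 5.0, 0.4  # hypothetical values
    for v in (0.0, 0.01, 0.05, 0.10):
        print(f"v = {v:.2f}   fixed recovery: {predefault_price_fixed_recovery(r, v, recovery, tau):.5f}"
              f"   fractional loss (l = 0.6): {predefault_price_fractional_loss(r, v, 0.6, tau):.5f}")
```

In this example the recovery payment is below the default-free price, and the printed prices fall as the intensity v increases; with v = 0 both formulae return the default-free price exp(−rτ).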
To validate the claim that the bond price is decreasing in v, consider two models A and B
where the default-intensities are $v^A$ and $v^B$, and assume that the coefficients of L don’t depend
on default-intensity. The pre-default bond price function in economy i is $P^i(r, \tau, T)$, i = A, B,
and satisfies:
$$
0 = \left(\frac{\partial}{\partial\tau} + L - r\right) P^i + v^i \cdot (\bar P^i - P^i), \quad i = A, B,
$$
with the usual boundary condition. Subtracting these two equations and rearranging terms
reveals that the price difference $\Delta P(r, \tau, T) \equiv P^A(r, \tau, T) - P^B(r, \tau, T)$ satisfies, ∀(r, τ ) ∈
R++ × [t, T ),
$$
0 = \left(\frac{\partial}{\partial\tau} + L - (r + v^A)\right) \Delta P(r, \tau, T) + (v^A - v^B) \cdot \left[\bar P^B(\tau) - P^B(r, \tau, T)\right] + v^A \left[\bar P^A(\tau) - \bar P^B(\tau)\right],
$$
with ∆P (r, T, T ) = 0, ∀r ∈ R++ . Given the previous assumptions, the proof is complete by an
application of the maximum principle (see Appendix ? in Chapter 6).
11.3.7 Estimation strategies

Let r (t) be the short-term rate process, solution to the following stochastic differential equation,
$$
dr(t) = \kappa\,(\mu - r(t))\,dt + \sqrt{v(t)}\,r(t)^{\eta}\,dW(t), \quad t \ge 0, \tag{11.45}
$$
where W (t) is a standard Brownian motion, and κ, µ and η are three positive constants. Suppose,
also, that the instantaneous volatility process $\sqrt{v(t)}\,r(t)^{\eta}$ is such that v (t) is solution
to,
$$
dv(t) = \beta\,(\alpha - v(t))\,dt + \xi\,v(t)^{\vartheta}\left(\rho\,dW(t) + \sqrt{1 - \rho^2}\,dU(t)\right), \quad t \ge 0, \tag{11.46}
$$
where U (t) is another standard Brownian motion; β, α, ξ and ϑ are four positive constants,
and ρ is a constant such that |ρ| < 1.
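A simple way to get a feel for the model in Eqs. (11.45)-(11.46) is to simulate it. The sketch below uses an Euler discretization and takes the instantaneous volatility of r to be √v(t)·r(t)^η, which is the reading under which the model is affine when η = 0 and ϑ = 1/2. The truncation of r and v at zero inside the diffusion terms, the function name and the parameter values are assumptions made for this illustration; they are not part of the model.

```python
import math
import random


def simulate_sv_short_rate(kappa, mu, eta, beta, alpha, xi, vartheta, rho,
                           r0, v0, dt, n_steps, seed=0):
    """Euler scheme for (r, v); returns two lists of length n_steps + 1."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    r, v = r0, v0
    rates, variances = [r], [v]
    for _ in range(n_steps):
        dW = sqrt_dt * rng.gauss(0.0, 1.0)
        dU = sqrt_dt * rng.gauss(0.0, 1.0)
        r_pos, v_pos = max(r, 0.0), max(v, 0.0)  # truncation keeps roots and powers real
        dZ = rho * dW + math.sqrt(1.0 - rho ** 2) * dU  # shock to v, correlated with dW
        r_next = r + kappa * (mu - r) * dt + math.sqrt(v_pos) * r_pos ** eta * dW
        v_next = v + beta * (alpha - v) * dt + xi * v_pos ** vartheta * dZ
        r, v = r_next, v_next
        rates.append(r)
        variances.append(v)
    return rates, variances


if __name__ == "__main__":
    # Hypothetical parameter values, daily steps over roughly ten years.
    rates, variances = simulate_sv_short_rate(
        kappa=0.5, mu=0.05, eta=0.5, beta=1.0, alpha=0.01, xi=0.1,
        vartheta=0.5, rho=0.3, r0=0.05, v0=0.01, dt=1.0 / 252.0, n_steps=2520, seed=1)
    print("last r:", rates[-1], " last v:", variances[-1])
```

Setting ξ = 0 and v0 = α switches stochastic volatility off, which is the restriction that makes the model amenable to exact likelihood methods when η is 0 or 1/2.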
11.3.7.1 The level effect
Which empirical regularities would the short-term rate model in Eqs. (11.45)-(11.46) address?
Which sign of the correlation coefficient ρ would be consistent with historical episodes such as
the Monetary Experiment of the Federal Reserve System between October 1979 and October
1982?
The short-term rate model in Eqs. (11.45)-(11.46) would address two empirical regularities.
(i) The volatility of the short-term rate is not constant over time. Rather, it seems to be
driven by an additional source of randomness. All in all, the short-term rate process seems
to be generated by the stochastic volatility model in Eqs. (11.45)-(11.46), in which the
volatility component v (t) is driven by a source of randomness only partially correlated
with the source of randomness driving the short-term rate process itself.
(ii) The volatility of the short-term rate is increasing in the level of the short-term rate. This
phenomenon is known as the “level effect.” Perhaps, periods of high interest rates arise
because of erratic liquidity. (Erratic liquidity would command a high risk-premium and so
a high LIBOR rate say.) But precisely because of erratic liquidity, interest rates are also
very volatile. The short-term rate model in Eqs. (11.45)-(11.46) is a very useful reduced
form able to capture these effects through the two parameters: η and ρ. If the parameter
η is greater than zero, the instantaneous interest rate volatility increases with the level of
the interest rate. If the “correlation” coefficient ρ > 0, the interest rate volatility is also
partly related to the sources of interest rate volatility not directly related to the level of
the interest rate.
During the Monetary Experiment, the Fed decided to target money supply, rather than
interest rates. So the high volatility of money demand mechanically translated to high interest
rate volatility as a result of market clearing. Moreover, the quantity of monetary base was
kept deliberately low to fight against inflation. So the US experienced both high interest rate
volatility and high interest rates (see, for example, Andersen and Lund, 1997, for an empirical
study). Moreover, nominal interest rates may be high because they compensate
for high inflation volatility, and not only for high inflation. There is no empirical study about the
issues related to the sign of the correlation coefficient ρ. Here is a suggestion. Rolling-window
estimates suggesting that the level of ρ changed a lot around the Monetary Experiment would
mean that the bulk of interest rate volatility was not entirely due to the mechanical effects
related to the Fed's behavior.
11.3.7.2 The simplest estimation case
Next, suppose we wish to estimate the parameter vector θ = [κ, µ, η, β, α, ξ, ϑ, ρ]⊤ of the model
in Eqs. (11.45)-(11.46). Under which circumstances would Maximum Likelihood be a feasible
estimation method?
The ML estimator would be feasible under two sets of conditions. First, the model in Eqs.
(11.45)-(11.46) should not have stochastic volatility at all, viz, β = ξ = 0; in this case, the
short-term rate would be solution to,
$$
dr(t) = \kappa\,(\mu - r(t))\,dt + \bar\sigma\,r(t)^{\eta}\,dW(t), \quad t \ge 0,
$$
where σ̄ is now a constant. Second, the value of the elasticity parameter η is important. If η = 0,
the short-term rate process is the Gaussian one proposed by Vasicek (1977). If η = 1/2, we obtain
the square-root process of Cox, Ingersoll and Ross (1985). In the Vasicek case, the transition
density of r is Gaussian, and in the CIR case, the transition density of r is a noncentral chi-square.
So in both the Vasicek and CIR cases, we may write down the likelihood function of the
diffusion process. Therefore, ML estimation is possible in these two cases.
In the more general case, we have to resort to the simulation methods described in Chapter 5.
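For the Vasicek case (η = 0), the Gaussian transition density makes ML estimation especially simple: over a sampling interval ∆, r(t+∆) given r(t) is normal with mean µ + (r(t) − µ)e^(−κ∆) and variance σ̄²(1 − e^(−2κ∆))/(2κ), so the conditional ML estimator can be obtained from a first-order autoregression. The sketch below is an illustration under these assumptions; the function names, the sampling frequency and the parameter values are hypothetical, and the closed form requires an estimated autoregressive slope strictly between 0 and 1.

```python
import math
import random


def simulate_vasicek(kappa, mu, sigma, r0, dt, n, seed=0):
    """Draw n exact transitions of dr = kappa (mu - r) dt + sigma dW."""
    rng = random.Random(seed)
    b = math.exp(-kappa * dt)
    sd = sigma * math.sqrt((1.0 - b * b) / (2.0 * kappa))
    path = [r0]
    for _ in range(n):
        path.append(mu + (path[-1] - mu) * b + sd * rng.gauss(0.0, 1.0))
    return path


def vasicek_loglik(path, kappa, mu, sigma, dt):
    """Log-likelihood of the path, conditional on its first observation."""
    b = math.exp(-kappa * dt)
    var = sigma ** 2 * (1.0 - b * b) / (2.0 * kappa)
    ll = 0.0
    for x, y in zip(path[:-1], path[1:]):
        mean = mu + (x - mu) * b
        ll += -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)
    return ll


def vasicek_mle(path, dt):
    """Conditional ML estimates of (kappa, mu, sigma) from the implied AR(1) regression."""
    x, y = path[:-1], path[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # estimate of exp(-kappa dt)
    a = my - b * mx                    # estimate of mu (1 - exp(-kappa dt))
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n
    kappa = -math.log(b) / dt
    mu = a / (1.0 - b)
    sigma = math.sqrt(2.0 * kappa * s2 / (1.0 - b * b))
    return kappa, mu, sigma


if __name__ == "__main__":
    dt = 1.0 / 12.0  # hypothetical monthly sampling
    path = simulate_vasicek(kappa=2.0, mu=0.05, sigma=0.02, r0=0.05, dt=dt, n=20000, seed=1)
    print("estimates (kappa, mu, sigma):", vasicek_mle(path, dt))
```

The CIR case can be handled along the same lines, with the noncentral chi-square transition density replacing the Gaussian one, although the likelihood then has to be maximized numerically.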
11.3.7.3 More general models
Estimating the model in Eqs. (11.45)-(11.46) is certainly instructive. Yet a more important
question is to examine the term-structure implications of this model. More generally, how would
the estimation procedure outlined in the previous subsection change if the task is to estimate
a Markov model of the term-structure of interest rates? There are three steps.
Step 1
Collect data on the term structure of interest rates. We will need to use data on two maturities
(say a time series of riskless 6 months and 5 year interest rates).
Step 2
Let us consider a model of the yield curve. We have, as usual:

$$
P^{j}(r(t), v(t)) \equiv P(r(t), v(t), N_j - t) = E\left[\left. e^{-\int_t^{N_j} r(s)\,ds} \,\right|\, r(t), v(t)\right], \tag{11.47}
$$
where E (.) is the conditional expectation taken under the risk-neutral probability, and Nj is a
sequence of expiration dates. Naturally, the previous formula relies on some assumptions about
risk-aversion correction. Some of these assumptions may be of a reduced-form nature; others
may rely on the specification of preferences, beliefs, markets and technology. But we do not
need to be more precise at this level of generality. In turn, these assumptions entail that the
pricing formula in Eq. (11.47) depends on some additional risk-adjustment parameter vector,
say λ. Precisely, the Radon-Nikodym derivative of the risk-neutral probability with respect to
the physical probability is given by $\exp\left(-\frac{1}{2}\int \|\Lambda(t)\|^2\,dt - \int \Lambda(t)^{\top} dZ(t)\right)$, where $Z = [W\ U]^{\top}$,
W and U are the two Brownian motions in Eqs. (11.45)-(11.46), and Λ (t) is some process
adapted to Z, which is taken to be of the form Λ (t) ≡ Λm (r (t) , v (t) ; λ), for some vector
valued function Λm and some parameter vector λ. The function Λm makes risk-adjustment
corrections dependent on the current value of the state vector (r (t) , v (t)), and thus makes the
model Markov.
So the estimation problem is actually one in which we have to estimate both the “physical”
parameter vector θ = [κ, µ, η, β, α, ξ, ϑ, ρ]⊤ and the “risk-adjustment” parameter vector λ.
Next, compute interest rates corresponding to two maturities,
$$
R^{j}(r(t), v(t); \theta, \lambda) = -\frac{1}{N_j} \ln P^{j}(r(t), v(t)), \quad j = 1, 2, \tag{11.48}
$$
where the bond prices are computed through Eq. (11.47), and where the notation $R^{j}(r, v; \theta, \lambda)$
emphasizes that the theoretical term-structure depends on the parameter vector (θ, λ). We can
now use the data ($R^{j}_{\$}$ say) and the model predictions about the data ($R^{j}$), create moment conditions,
and proceed to estimate the parameter vector (θ, λ) through some method of moments
(provided the moments are enough to make (θ, λ) identifiable). But there are two difficulties.
The first difficulty is that the volatility process v (t) is not observable by the econometrician.
We can use inference methods based on simulations to cope with this issue. Very simply, we
simulate Eqs. (11.45) and (11.46) and apply moment conditions or auxiliary models to observ-
able variables, as explained in Chapter 5. For example, we simulate Eqs. (11.45)-(11.46) for a
given value of the parameter vector (θ, λ). For each simulation, compute a time series of interest
rates Rj from Eq. (11.48). Then, we use these simulated data to create moment conditions or
fit some auxiliary model to these artificial data that is as close as possible to the very same
auxiliary model fit to real data: the parameter estimator is the value of (θ, λ) which minimizes
some norm of these moment conditions obtained through the simulations, with any of the methods
explained in Chapter 5. According to Theorem 5.4 in Chapter 5, fitting a sufficiently rich
auxiliary model should result in a quite efficient estimator.
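The logic of a simulation-based criterion can be sketched as follows. This is a deliberately stripped-down illustration, not the procedure of Chapter 5: the toy simulator is a constant-volatility Euler scheme standing in for the full model, the three moments and the identity weighting are arbitrary choices, and all names and values are hypothetical. The same seed is reused across parameter values (common random numbers), so that the criterion is a smooth function of the parameters.

```python
import math
import random


def simulate_rates(params, n, seed):
    """Toy simulator: monthly Euler steps of dr = kappa (mu - r) dt + sigma dW, started at mu."""
    kappa, mu, sigma = params
    rng = random.Random(seed)
    dt = 1.0 / 12.0
    r, out = mu, []
    for _ in range(n):
        r += kappa * (mu - r) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        out.append(r)
    return out


def moments(x):
    """Mean, variance and first-order autocovariance of a series."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    acov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / n
    return [mean, var, acov]


def smm_objective(params, data_moments, n_sim, seed):
    """Squared distance (identity weighting) between simulated and data moments."""
    sim_moments = moments(simulate_rates(params, n_sim, seed))
    return sum((s - d) ** 2 for s, d in zip(sim_moments, data_moments))


if __name__ == "__main__":
    # "Data" generated from hypothetical true parameters with one seed ...
    data_moments = moments(simulate_rates((2.0, 0.05, 0.02), 5000, seed=11))
    # ... and a crude grid search over mu only, with simulations drawn from another seed.
    grid = [0.03, 0.04, 0.05, 0.06, 0.07]
    best = min(grid, key=lambda mu: smm_objective((2.0, mu, 0.02), data_moments, 5000, seed=99))
    print("grid value of mu minimizing the criterion:", best)
```

In an actual application the simulator would return model-implied yields as in Eq. (11.48), the minimization would run over the whole vector (θ, λ), and the moments would be weighted, since they differ by orders of magnitude.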
A second difficulty is that the bond pricing formula in Eq. (11.47) does not generally admit
a closed-form, an issue we can address using affine models, as explained next.
Step 3
The use of affine models would considerably simplify the analysis. Affine models place restric-
tions on the data generating process in Eqs. (11.45)-(11.46) and in the risk-aversion corrections
in Eq. (11.47) in such a way that the term structure in Eq. (11.48) is,
$$
R^{j}(r(t), v(t); \theta, \lambda) = A(j; \theta, \lambda) + B(j; \theta, \lambda) \cdot y(t), \quad j = 1, 2,
$$
where A (j; θ, λ) and B (j; θ, λ) are some functions of the maturity Nj (B is vector valued),
and generally depend on the parameter vector (θ, λ); and finally the state vector is y = [r v]⊤ .
(Namely, an affine model obtains once η = 0, ϑ = 1/2, and the function Λm is affine.) So once
Eqs. (11.45)-(11.46) are simulated, the computation of a time series of yields Rj is
straightforward.14
11.4 No-arbitrage models

11.4.1 Fitting the yield-curve, perfectly
For derivative trading purposes, we do not really wish to explain the term structure. Rather, we
wish to take it as given. Consider, for instance, a European option written on a bond. We may
find it unsatisfactory to have a model that only “explains” the bond price. A model’s mistake
on the bond price is likely to generate a huge option price mistake. How can we trust an option
pricing model that is not even able to pin down the value of the underlying asset price? To
illustrate these points, denote with P (r(τ ), τ , S) the rational price process of a zero coupon
bond maturing at some time S. What is the price of a European option written on this bond,
struck at K and expiring at T < S? By the FTAP, there are no arbitrage opportunities if and
only if the option price C b is:
T !
C b (r(t), t, T, S) = E e− t r(τ )dτ · (P (r(T ), T, S) − K)+ .
As an example, in affine models, P is lognormal whenever r is normally distributed. This happens
precisely for the Vasicek model. The intuition developed for the Black and Scholes (1973)
(BS) formula suggests that in this case, the previous expectation is a nonlinear function of the
current bond price P (r(t), t, T ). This claim cannot be shown with the simple risk-neutral tools
used to show the BS formula. One of the troubles is due to the presence of the $e^{-\int_t^T r(\tau)\,d\tau}$ term
inside the brackets, which is obviously unknown at the time of evaluation t. But the problem is
tractable, thanks to the forward martingale probability introduced in Section 11.2.4. Precisely,
14 Dai and Singleton (2000) implement this estimation strategy, although they make use of swap rates data. The models they
consider predict theoretical values for the swap rates, obtained through the formula in Eq. (11.85) of Section 11.7.4.4 below, where
the bond prices in that formula are replaced by the pricing functions predicted by the models. Dai & Singleton consider three
rates predicted by their models: two swap rates (with tenors of two and ten years), plus the six month Libor rate, $-\frac{1}{2}\ln P\left(t, t + \frac{1}{2}\right)$,
where P is the pricing function predicted by the models they consider.
let $1_{ex}$ be the indicator of all events s.t. the option is exercised, i.e., that P (r(T ), T, S) ≥ K.
We have:
$$
\begin{aligned}
C^b(r(t), t, T, S)
&= E\left[e^{-\int_t^T r(\tau)\,d\tau} P(r(T), T, S) \cdot 1_{ex}\right] - K \cdot E\left[e^{-\int_t^T r(\tau)\,d\tau} \cdot 1_{ex}\right] \\
&= P(r(t), t, S) \cdot E\left[\frac{e^{-\int_t^S r(\tau)\,d\tau}}{P(r(t), t, S)} \cdot 1_{ex}\right] - K P(r(t), t, T) \cdot E\left[\frac{e^{-\int_t^T r(\tau)\,d\tau}}{P(r(t), t, T)} \cdot 1_{ex}\right] \\
&= P(r(t), t, S) \cdot E_{Q_F^S}\left[1_{ex}\right] - K P(r(t), t, T) \cdot E_{Q_F^T}\left[1_{ex}\right] \\
&= P(r(t), t, S) \cdot Q_F^S\left[P(r(T), T, S) \ge K\right] - K P(r(t), t, T) \cdot Q_F^T\left[P(r(T), T, S) \ge K\right],
\end{aligned}
\tag{11.49}
$$
where the first term in the second equality follows by an argument nearly identical to that
produced in Section 11.1 (see footnote 2);15 $Q_F^i$ (i = T, S) is the i-forward probability; and
finally, $E_{Q_F^i}[\cdot]$ is the expectation taken under the i-forward martingale probability (see Section
11.1 for more details).
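For the Vasicek model, the two forward probabilities in Eq. (11.49) are known to be Gaussian, and the option price takes a Black-Scholes-like form that depends on the short-term rate only through the two bond prices. The sketch below implements that standard closed form as an illustration; the parameterization (mean-reversion speed κ and volatility σ of the short-term rate), the function names and the numerical inputs are assumptions, and the bond prices are simply passed in as arguments.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def vasicek_bond_call(P_T, P_S, K, kappa, sigma, T_minus_t, S_minus_T):
    """European call expiring at T, struck at K, on a zero maturing at S > T.

    P_T and P_S are the time-t prices of the zeros maturing at T and S.
    """
    # Standard deviation of ln P(T, S) under the forward probabilities.
    sigma_P = (sigma / kappa) * (1.0 - math.exp(-kappa * S_minus_T)) * math.sqrt(
        (1.0 - math.exp(-2.0 * kappa * T_minus_t)) / (2.0 * kappa))
    d1 = math.log(P_S / (K * P_T)) / sigma_P + 0.5 * sigma_P
    d2 = d1 - sigma_P
    # norm_cdf(d1) and norm_cdf(d2) play the role of the S- and T-forward
    # exercise probabilities in Eq. (11.49).
    return P_S * norm_cdf(d1) - K * P_T * norm_cdf(d2)


if __name__ == "__main__":
    # Hypothetical inputs: one-year option on a zero with four more years to run.
    price = vasicek_bond_call(P_T=0.95, P_S=0.80, K=0.85, kappa=0.5, sigma=0.02,
                              T_minus_t=1.0, S_minus_T=4.0)
    print("call price:", price)
```

A deep in-the-money option is exercised almost surely under both forward probabilities, so its price approaches P(t, S) − K·P(t, T), which provides a quick check of the implementation.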
Section 11.7 explains how the two probabilities in Eq. (11.49) are computed. The important
issue, now, is to emphasize that the bond option price does depend on the theoretical bond
prices P (r(t), t, T ) and P (r(t), t, S), which, in turn, generally differ from the current, observed market
prices. Theoretical prices are simply the output of a rational expectations model. This fact is not
a source of concern to those who wish to predict future term-structure movements with the help
of a few, key state variables, as in the multifactor models discussed earlier. Practitioners
pricing a bond option, however, need the pricing model to match the yield curve perfectly
at the time of evaluation. The aim of this section is to introduce a class
of models that fit the yield curve without errors, which we call “perfectly fitting models”: these
models predict prices of bonds expiring at some date S, which are of course random at time
T < S, but also exactly equal to the current market bond prices (at time t). Finally, these prices
must, of course, be arbitrage-free. As we show, these conditions can be met by augmenting the
models seen in the previous sections with a set of “infinite dimensional parameters.” We do not
develop a general model-building principle, however. Rather, we discuss two specific yet famous
such models: the Ho and Lee (1986) model, and one generalization of it, introduced by Hull
and White (1990).
A final remark. In Section 11.7, we will show that at least for the Vasicek model, Eq. (11.49)
does not explicitly depend on r because it only “depends” on P (r(t), t, T ) and P (r(t), t, S). So
why do we look for perfectly fitting models in the first place? Wouldn’t it be enough, then,
to just replace the theoretical prices P (r(t), t, T ) and P (r(t), t, S) with the market values, say
$P^{\$}(t, T)$ and $P^{\$}(t, S)$? This way, the model is perfectly fitting. Apart from being logically
inconsistent (you would have a model predicting something generically different from prices),
this way of proceeding also has practical drawbacks. Section 11.7 shows that option pricing
formulae for European options might well agree “in notation” with those relating to perfectly
fitting models. However, Section 11.7.3 explains that as we move towards more complex interest
15 By the Law of Iterated Expectations,

T T S S

E e− t r(τ )dτ P (r(T ), T, S)1ex = E e− t r(τ)dτ
1ex E e− T r(τ )dτ F (T ) = E e− t r(τ )dτ 1ex .
366
by A. Mele
rate derivatives products, such as options on coupon bonds and swaption contracts, the situation
gets dramatically different. Finally, it can be the case that some maturity dates are actually
not traded at some point in time. For example, it may happen that P $ (t, T ) is not observed
and that we could still be interested in pricing more “exotic” or less liquid bonds or options
on these bonds. An intuitive procedure to deal with this this difficulty is to “interpolate” the
traded maturities. In fact, the objective of perfectly fitting models is to allow for such an
“interpolation” while preserving absence of arbitrage opportunities.
11.4.2 Ho & Lee

The original Ho and Lee (1986) model is in discrete time and is analyzed in Chapter 13, along
with other models. The model below represents the “diffusion limit” of the original Ho & Lee
model:
$$dr(\tau) = \theta(\tau)\,d\tau + \sigma\,d\tilde{W}(\tau), \quad \tau \ge t, \tag{11.50}$$
where $\tilde{W}$ is a Q-Brownian motion, $\sigma$ is a constant, and $\theta(\tau)$ is an “infinite
dimensional” parameter introduced to pin down the initial, observed term structure. The time of
evaluation is $t$. We refer to $\theta(\tau)$ as an “infinite dimensional” parameter because
$\theta(\tau)$ is a function of calendar time $\tau \ge t$. Crucially, we assume that this function
is known at the time of evaluation $t$.
Clearly, Eq. (11.50) gives rise to an affine model. Therefore, the bond price takes the following
form,
$$P(r(\tau),\tau,T) = e^{A(\tau,T) - B(\tau,T)\cdot r(\tau)}, \tag{11.51}$$
for two functions $A$ and $B$ to be determined below. It is easy to show that,
$$A(\tau,T) = \int_\tau^T \theta(s)\,(s-T)\,ds + \frac{1}{6}\sigma^2 (T-\tau)^3, \qquad B(\tau,T) = T-\tau.$$
Let $f_{\$}(t,\tau)$ denote the instantaneous, observed forward rate. Matching the instantaneous
forward rate $f(\tau,T)$ predicted by the model to $f_{\$}(\tau,T)$ yields:
$$f_{\$}(\tau,T) = f(\tau,T) = -\frac{\partial \ln P(r(\tau),\tau,T)}{\partial T} = \int_\tau^T \theta(s)\,ds - \frac{1}{2}\sigma^2 (T-\tau)^2 + r(\tau). \tag{11.52}$$
Because $P(t,T) = \exp\left(-\int_t^T f(t,\tau)\,d\tau\right)$, the drift term $\theta(s)$ satisfying
Eq. (11.52) also guarantees an exact fit of the yield curve. Differentiating the previous equation
with respect to $T$ leaves $\theta(T) = \frac{\partial}{\partial T} f_{\$}(\tau,T) + \sigma^2 (T-\tau)$, or:
$$\theta(\tau) = \frac{\partial}{\partial \tau} f_{\$}(t,\tau) + \sigma^2 (\tau - t). \tag{11.53}$$
To check that θ is indeed the solution we were looking for, we substitute Eq. (11.53) into Eq.
(11.52) and verify that Eq. (11.52) holds as an identity.
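The check just described can also be carried out numerically. The following is only a minimal sketch: the observed forward curve `f_obs`, its derivative `df_obs` and the parameter values are hypothetical choices made for illustration, not taken from the text. It evaluates the right hand side of Eq. (11.52) at τ = t, where r(t) = f$(t, t), using θ from Eq. (11.53), and compares it with the assumed curve.

```python
import math

SIGMA = 0.01   # short-term rate volatility (illustrative value)
T0 = 0.0       # time of evaluation t

def f_obs(T):
    """Hypothetical observed forward curve f_$(t, T); an assumption for illustration."""
    return 0.03 + 0.02 * (1.0 - math.exp(-0.5 * (T - T0)))

def df_obs(T):
    """Derivative of the hypothetical forward curve with respect to maturity T."""
    return 0.01 * math.exp(-0.5 * (T - T0))

def theta(s):
    """Ho and Lee drift implied by Eq. (11.53)."""
    return df_obs(s) + SIGMA ** 2 * (s - T0)

def model_forward(T, n=2000):
    """Right hand side of Eq. (11.52) at tau = t; theta is integrated by the trapezoidal rule."""
    h = (T - T0) / n
    integral = 0.5 * (theta(T0) + theta(T))
    for k in range(1, n):
        integral += theta(T0 + k * h)
    integral *= h
    r_t = f_obs(T0)  # the short-term rate equals the forward rate at zero horizon
    return integral - 0.5 * SIGMA ** 2 * (T - T0) ** 2 + r_t

for T in (1.0, 5.0, 10.0):
    print(T, f_obs(T), model_forward(T))
```

Up to the discretization error of the integral, the two printed columns coincide for every maturity, which is the sense in which the model fits the initial curve without error.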
[Develop connections with the HJM approach introduced in Section 11.5 below]
11.4.3 Hull & White

Hull and White (1990) consider the following model:
$$dr(\tau) = \kappa\left(\frac{\theta(\tau)}{\kappa} - r(\tau)\right)d\tau + \sigma\,d\tilde{W}(\tau), \tag{11.54}$$
where $\tilde{W}$ is a Q-Brownian motion, and $\kappa$, $\sigma$ are constants. The model generalizes
the Ho and Lee (1986) model in Eq. (11.50) and the Vasicek (1977) model in Eq. (11.31). In the
original formulation of Hull and White, $\kappa$ and $\sigma$ are both time-varying, but the main
points of this model can be learnt by working out this particularly simple case.
Eq. (11.54) also gives rise to an affine model. Therefore, the solution for the bond price is
given by Eq. (11.51). It is easy to show that the functions $A$ and $B$ are given by
$$A(\tau,T) = \frac{1}{2}\sigma^2 \int_\tau^T B(s,T)^2\,ds - \int_\tau^T \theta(s)\,B(s,T)\,ds, \tag{11.55}$$
and
$$B(\tau,T) = \frac{1}{\kappa}\left[1 - e^{-\kappa(T-\tau)}\right]. \tag{11.56}$$
By reiterating the same reasoning used to show Eq. (11.53), one shows that the solution for
θ is:
$$\theta(\tau) = \frac{\partial}{\partial \tau} f_{\$}(t,\tau) + \kappa f_{\$}(t,\tau) + \frac{\sigma^2}{2\kappa}\left[1 - e^{-2\kappa(\tau - t)}\right]. \tag{11.57}$$
A proof of this result is in Appendix 5.
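A quick way to see that Eq. (11.57) nests Eq. (11.53) is to let κ shrink to zero. The sketch below does this numerically; the forward curve `f_obs`, its derivative `df_obs` and all parameter values are hypothetical and serve only as an illustration.

```python
import math

def f_obs(T):
    """Hypothetical observed forward curve f_$(t, T) with t = 0; an assumption for illustration."""
    return 0.03 + 0.02 * (1.0 - math.exp(-0.5 * T))

def df_obs(T):
    """Derivative of the hypothetical forward curve with respect to maturity."""
    return 0.01 * math.exp(-0.5 * T)

def theta_hull_white(tau, t, kappa, sigma):
    """Drift function of Eq. (11.57); expm1 keeps 1 - exp(-x) accurate for small kappa."""
    return (df_obs(tau) + kappa * f_obs(tau)
            - sigma ** 2 / (2.0 * kappa) * math.expm1(-2.0 * kappa * (tau - t)))

def theta_ho_lee(tau, t, sigma):
    """Drift function of Eq. (11.53)."""
    return df_obs(tau) + sigma ** 2 * (tau - t)

for kappa in (0.5, 0.05, 1e-6):
    print(kappa, theta_hull_white(3.0, 0.0, kappa, 0.01), theta_ho_lee(3.0, 0.0, 0.01))
```

As κ becomes small, the Hull and White drift approaches the Ho and Lee drift, consistently with the fact that Eq. (11.54) reduces to Eq. (11.50) when κ = 0.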
Why did we need to go for this more complex model? After all, the Ho & Lee model is
already able to pin down the entire yield curve. The answer is that in practice, investment
banks typically price a large variety of derivatives. The yield curve is not the only thing to be
fit exactly. Rather, it is only the starting point. In general, the more flexible a given perfectly
fitting model is, the more successful it is at pricing more complex derivatives.
11.4.4 Critiques
As we shall see in Section 11.7, closed-form solutions for options, caps, floors and swaptions
on bond prices are easy to implement as soon as we assume the short-term rate is Gaussian.
However, closed-form solutions that are consistent with standard market practice are generally
not available. A class of models known as “market models” overcomes this difficulty. These
models are a particular case of a general class of models introduced by Heath, Jarrow and
Morton (1992), examined in Section 11.5.
- Intertemporal inconsistencies: θ functions have to be re-calibrated every single day. (As Eq.
(11.53) demonstrates, at time t, θ(τ) depends on the slope of f$, which can change every day.)
This kind of problem is present in HJM-type models.
- Stochastic string shocks models.
11.5 The Heath-Jarrow-Morton framework

11.5.1 Motivation
The bond price representation in Eq. (11.5),
$$P(\tau,T) = e^{-\int_\tau^T f(\tau,\ell)\,d\ell}, \quad \text{all } \tau \in [t,T], \tag{11.58}$$
underlies the modeling approach started by Heath, Jarrow and Morton (1992) (HJM, henceforth).
Given Eq. (11.58), this approach takes as a primitive the $\tau$-stochastic evolution of the
entire structure of forward rates, not only the special case of the short-term rate,
$r(t) = \lim_{\ell \downarrow t} f(t,\ell) \equiv f(t,t)$. So given Eq. (11.58) and the initial, observed
structure of forward rates $\{f(t,\ell)\}_{\ell \in [t,T]}$, no-arbitrage “cross-equations” restrictions
determine the stochastic behavior of $\{f(\tau,\ell)\}_{\tau \in (t,\ell]}$ for any $\ell \in [t,T]$.
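By construction, the HJM approach allows for a perfect fit of the initial term-structure. This
point can be illustrated quite simply, as the bond price $P(\tau,T)$ is,
$$\begin{aligned}
P(\tau,T) &= e^{-\int_\tau^T f(\tau,\ell)\,d\ell} \\
&= \frac{P(t,T)}{P(t,\tau)} \cdot \frac{P(t,\tau)}{P(t,T)}\, e^{-\int_\tau^T f(\tau,\ell)\,d\ell} \\
&= \frac{P(t,T)}{P(t,\tau)} \cdot e^{-\int_t^\tau f(t,\ell)\,d\ell + \int_t^T f(t,\ell)\,d\ell - \int_\tau^T f(\tau,\ell)\,d\ell} \\
&= \frac{P(t,T)}{P(t,\tau)} \cdot e^{\int_\tau^T f(t,\ell)\,d\ell - \int_\tau^T f(\tau,\ell)\,d\ell} \\
&= \frac{P(t,T)}{P(t,\tau)} \cdot e^{-\int_\tau^T \left[f(\tau,\ell) - f(t,\ell)\right]d\ell}.
\end{aligned}$$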
The key point of the HJM methodology is to take the current forward rates structure $f(t,\ell)$ as
given, and to model the future forward rate movements,
$$f(\tau,\ell) - f(t,\ell).$$
Therefore, the HJM methodology takes the current term-structure as given and, hence, perfectly
fitted, as we observe both $P(t,T)$ and $P(t,\tau)$. In contrast, the initial approach to interest
rate modeling is to model the current bond price $P(t,T)$ through a model for the short-term
rate (as illustrated in Section 11.3), which, for this reason, does not fit the initial term structure.
As explained in the previous section, fitting the initial term-structure is an important issue when
the objective is to price interest-rate derivatives.
11.5.2 The model

11.5.2.1 Primitives
Because the primitive is still a Brownian information structure, once we want to model future
movements of $\{f(\tau,T)\}_{\tau \in [t,T]}$, we also have to accept that for every $T$,
$\{f(\tau,T)\}_{\tau \in [t,T]}$ is $F(\tau)$-adapted. There thus exist functionals $\alpha$ and $\sigma$
such that, for a given $T$,
$$d_\tau f(\tau,T) = \alpha(\tau,T)\,d\tau + \sigma(\tau,T)\,dW(\tau), \quad \tau \in (t,T], \tag{11.59}$$
where $f(t,T)$ is given. The solution to Eq. (11.59) is:
$$f(\tau,T) = f(t,T) + \int_t^\tau \alpha(s,T)\,ds + \int_t^\tau \sigma(s,T)\,dW(s), \quad \tau \in (t,T]. \tag{11.60}$$
In other terms, $W$ “doesn’t depend” on $T$. In some sense, however, we may also want to “index”
$W$ by $T$. The so-called stochastic string models are capable of doing that, and are discussed in
Section 11.6.
11.5.2.2 No-arb restrictions
The next step is to derive restrictions on $\alpha$ that are consistent with absence of arbitrage
opportunities. Let $X(\tau) \equiv -\int_\tau^T f(\tau,\ell)\,d\ell$. We have
$$dX(\tau) = f(\tau,\tau)\,d\tau - \int_\tau^T \left(d_\tau f(\tau,\ell)\right)d\ell = \left[r(\tau) - \alpha^I(\tau,T)\right]d\tau - \sigma^I(\tau,T)\,dW(\tau),$$
where
$$\alpha^I(\tau,T) \equiv \int_\tau^T \alpha(\tau,\ell)\,d\ell, \qquad \sigma^I(\tau,T) \equiv \int_\tau^T \sigma(\tau,\ell)\,d\ell.$$
By Eq. (11.58), $P = e^X$. By Itô’s lemma,
$$\frac{d_\tau P(\tau,T)}{P(\tau,T)} = \left[r(\tau) - \alpha^I(\tau,T) + \frac{1}{2}\left\|\sigma^I(\tau,T)\right\|^2\right]d\tau - \sigma^I(\tau,T)\,dW(\tau).$$
By the FTAP, there are no arbitrage opportunities if and only if
$$\frac{d_\tau P(\tau,T)}{P(\tau,T)} = \left[r(\tau) - \alpha^I(\tau,T) + \frac{1}{2}\left\|\sigma^I(\tau,T)\right\|^2 + \sigma^I(\tau,T)\lambda(\tau)\right]d\tau - \sigma^I(\tau,T)\,d\tilde{W}(\tau),$$
where $\tilde{W}(\tau) = W(\tau) + \int_t^\tau \lambda(s)\,ds$ is a Q-Brownian motion, and $\lambda$ satisfies:
$$\alpha^I(\tau,T) = \frac{1}{2}\left\|\sigma^I(\tau,T)\right\|^2 + \sigma^I(\tau,T)\lambda(\tau). \tag{11.61}$$
Differentiating the previous relation with respect to $T$ gives the arbitrage restriction that
we were looking for:
$$\alpha(\tau,T) = \sigma(\tau,T)\int_\tau^T \sigma(\tau,\ell)^\top d\ell + \sigma(\tau,T)\lambda(\tau). \tag{11.62}$$
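The step from Eq. (11.61) to Eq. (11.62) can be checked numerically in a scalar example. The sketch below assumes, purely for illustration, an exponentially damped volatility σ(τ, T) = σ0·exp(−κ(T − τ)) with hypothetical parameter values and λ = 0. It compares the drift given by Eq. (11.62) with a finite-difference derivative, with respect to T, of the right hand side of Eq. (11.61).

```python
import math

SIGMA0, KAPPA = 0.01, 0.2   # illustrative parameters, not taken from the text

def sigma(tau, T):
    """Assumed scalar forward-rate volatility, damped in time to maturity."""
    return SIGMA0 * math.exp(-KAPPA * (T - tau))

def sigma_int(tau, T, n=2000):
    """sigma^I(tau, T): integral of sigma(tau, l) over l in [tau, T], trapezoidal rule."""
    h = (T - tau) / n
    total = 0.5 * (sigma(tau, tau) + sigma(tau, T))
    for k in range(1, n):
        total += sigma(tau, tau + k * h)
    return total * h

def alpha_11_62(tau, T):
    """Drift restriction of Eq. (11.62) with lambda = 0."""
    return sigma(tau, T) * sigma_int(tau, T)

def alpha_from_11_61(tau, T, eps=1e-4):
    """Central finite difference in T of 0.5 * sigma^I(tau, T)^2, i.e. Eq. (11.61) with lambda = 0."""
    up = 0.5 * sigma_int(tau, T + eps) ** 2
    down = 0.5 * sigma_int(tau, T - eps) ** 2
    return (up - down) / (2.0 * eps)

x = 5.0  # time to maturity T - tau
closed_form = SIGMA0 ** 2 * math.exp(-KAPPA * x) * (1.0 - math.exp(-KAPPA * x)) / KAPPA
print(alpha_11_62(0.0, x), alpha_from_11_61(0.0, x), closed_form)
```

The three numbers agree up to discretization error. For this particular volatility the integral is available in closed form, which is what the last expression uses.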
11.5.3 The dynamics of the short-term rate

By Eq. (11.60), the short-term rate satisfies:
$$r(\tau) \equiv f(\tau,\tau) = f(t,\tau) + \int_t^\tau \alpha(s,\tau)\,ds + \int_t^\tau \sigma(s,\tau)\,dW(s), \quad \tau \in (t,T]. \tag{11.63}$$
Differentiating with respect to $\tau$ yields
$$dr(\tau) = \left[f_2(t,\tau) + \sigma(\tau,\tau)\lambda(\tau) + \int_t^\tau \alpha_2(s,\tau)\,ds + \int_t^\tau \sigma_2(s,\tau)\,dW(s)\right]d\tau + \sigma(\tau,\tau)\,dW(\tau),$$
where the subscript 2 denotes a partial derivative with respect to the second argument, and
$$\alpha_2(s,\tau) = \sigma_2(s,\tau)\int_s^\tau \sigma(s,\ell)^\top d\ell + \sigma(s,\tau)\sigma(s,\tau)^\top + \sigma_2(s,\tau)\lambda(s).$$
As is clear, the short-term rate is in general non-Markov. However, the short-term rate can be
“risk-neutralized,” and used to price exotics through simulations. A special case of Eq. (11.63)
is the Ho and Lee model, where $\sigma(s,\tau) = \sigma$, a constant, such that, by Eq. (11.62),
$\alpha(s,\tau) = \sigma^2(\tau - s) + \sigma\lambda(s)$.

11.5.4 Embedding
At first glance, it might be guessed that HJM models are quite distinct from the models of the
short-term rate introduced in Section 11.3. However, there exist “embeddability” conditions
turning HJM into short-term rate models, and vice versa, a property known as “universality”
of HJM models.
11.5.4.1 Markovianity
One natural question to ask is whether there are conditions under which HJM-type models
predict the short-term rate to be a Markov process. The question is natural insofar as it relates
to the early literature where the whole yield curve was assumed to be driven by a scalar Markov
process: the short-term rate. The answer to this question is in the contribution of Carverhill
(1994). Another important contribution in this area is due to Ritchken and Sankarasubramanian
(1995), who studied conditions under which it is possible to enlarge the original state vector in
such a manner that the resulting “augmented” state vector is Markov and, at the same time,
includes the short-term rate as a component. The resulting model closely resembles some of the
short-term rate models surveyed in Section 11.3. In these models, the short-term rate is not
Markov, yet it is part of a system that is Markov. Here we only consider the simple Markov
scalar case.
Assume the forward-rate volatility structure is deterministic and takes the following form:
$$\sigma(t,T) = g_1(t)\,g_2(T) \quad \text{all } t, T. \tag{11.64}$$
By Eq. (11.63), $r$ is then:
$$r(\tau) = f(t,\tau) + \int_t^\tau \alpha(s,\tau)\,ds + g_2(\tau)\cdot\int_t^\tau g_1(s)\,dW(s), \quad \tau \in (t,T].$$
Also, $r$ is solution to:
$$\begin{aligned}
dr(\tau) &= \left[f_2(t,\tau) + \sigma(\tau,\tau)\lambda(\tau) + \int_t^\tau \alpha_2(s,\tau)\,ds + g_2'(\tau)\int_t^\tau g_1(s)\,dW(s)\right]d\tau + \sigma(\tau,\tau)\,dW(\tau) \\
&= \left[f_2(t,\tau) + \sigma(\tau,\tau)\lambda(\tau) + \int_t^\tau \alpha_2(s,\tau)\,ds + \frac{g_2'(\tau)}{g_2(\tau)}\,g_2(\tau)\int_t^\tau g_1(s)\,dW(s)\right]d\tau + \sigma(\tau,\tau)\,dW(\tau) \\
&= \left[f_2(t,\tau) + \sigma(\tau,\tau)\lambda(\tau) + \int_t^\tau \alpha_2(s,\tau)\,ds + \frac{g_2'(\tau)}{g_2(\tau)}\left(r(\tau) - f(t,\tau) - \int_t^\tau \alpha(s,\tau)\,ds\right)\right]d\tau + \sigma(\tau,\tau)\,dW(\tau).
\end{aligned}$$
The drift and diffusion in the last line depend only on $\tau$ and $r(\tau)$, so $r$ is Markov.
Precisely, the condition in Eq. (11.64) ensures that the HJM model predicts a Markov short-term
rate. Mean reversion, then, obtains assuming that $g_2'(T) < 0$ for all $T$. For example, take
$\lambda$ to be a constant, and:
$$g_1(t) = \sigma \cdot e^{\kappa t}, \quad \sigma > 0, \qquad g_2(t) = e^{-\kappa t}, \quad \kappa \ge 0.$$
This is the Hull-White model discussed in Section 11.4.3, and of course, the Ho and Lee model
obtains in the special case $\kappa = 0$.
11.5.4.2 Short-term rate reductions
We prove everything in the Markov case. Let the short-term rate be solution to:
$$dr(\tau) = \bar{b}(\tau,r(\tau))\,d\tau + a(\tau,r(\tau))\,d\tilde{W}(\tau),$$
where $\tilde{W}$ is a Q-Brownian motion, and $\bar{b}$ is some risk-neutralized drift function. The
rational bond price function is $P(r(t),t,T) = E\left[e^{-\int_t^T r(\tau)\,d\tau}\right]$. The forward rate
implied by this model is:
$$f(r(t),t,T) = -\frac{\partial}{\partial T}\ln P(r(t),t,T).$$
By Itô’s lemma,
$$df = \left[\frac{\partial}{\partial t}f + \bar{b}f_r + \frac{1}{2}a^2 f_{rr}\right]d\tau + a f_r\,d\tilde{W}.$$
But for $f(r,t,T)$ to be consistent with the solution to Eq. (11.60), it must be the case that
$$\begin{aligned}
\alpha(t,T) - \sigma(t,T)\lambda(t) &= \frac{\partial}{\partial t}f(r,t,T) + \bar{b}(t,r)f_r(r,t,T) + \frac{1}{2}a(t,r)^2 f_{rr}(r,t,T) \\
\sigma(t,T) &= a(t,r)\,f_r(r,t,T)
\end{aligned} \tag{11.65}$$
and
$$f(t,T) = f(r,t,T). \tag{11.66}$$
In particular, the last condition can only be satisfied if the short-term rate model under
consideration is of the perfectly fitting type.
11.6 Stochastic string shocks models

The first papers are Kennedy (1994, 1997), Goldstein (2000) and Santa-Clara and Sornette
(2001). Heaney and Cheng (1984) are also very useful to read.
11.6.1 Addressing stochastic singularity

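Let $\sigma(\tau,T) = [\sigma_1(\tau,T), \cdots, \sigma_N(\tau,T)]$ in Eq. (11.59). For any $T_1 < T_2$,
$$E\left[df(\tau,T_1)\,df(\tau,T_2)\right] = \sum_{i=1}^N \sigma_i(\tau,T_1)\,\sigma_i(\tau,T_2)\,d\tau,$$
and,
$$c(\tau,T_1,T_2) \equiv \mathrm{corr}\left[df(\tau,T_1),\,df(\tau,T_2)\right] = \frac{\sum_{i=1}^N \sigma_i(\tau,T_1)\,\sigma_i(\tau,T_2)}{\left\|\sigma(\tau,T_1)\right\| \cdot \left\|\sigma(\tau,T_2)\right\|}. \tag{11.67}$$
Substituting this result into Eq. (11.62),
$$\begin{aligned}
\alpha(\tau,T) &= \sigma(\tau,T)\cdot\int_\tau^T \sigma(\tau,\ell)^\top d\ell + \sigma(\tau,T)\lambda(\tau) \\
&= \int_\tau^T \left\|\sigma(\tau,\ell)\right\| \left\|\sigma(\tau,T)\right\| c(\tau,\ell,T)\,d\ell + \sigma(\tau,T)\lambda(\tau).
\end{aligned}$$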
One drawback of this model is that the correlation matrix of any (N + M)-dimensional vector
of forward rates is degenerate for M ≥ 1. Stochastic string models overcome this difficulty by
modeling the correlation structure $c(\tau,\tau_1,\tau_2)$ independently, for all $\tau_1$ and $\tau_2$,
rather than implying it from a given N-factor model (as in Eq. (11.67)). In other terms, the HJM
methodology uses the functions $\sigma_i$ to accommodate both the volatility and the correlation
structure of forward rates. This is unlikely to be a good model in practice. As we will now see,
stochastic string models have two separate functions with which to model volatility and correlation.
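The degeneracy can be made concrete with a small computation. The sketch below assumes a hypothetical two-factor volatility vector (N = 2, with made-up functional forms and values) and builds the correlation matrix implied by Eq. (11.67) for three forward rates (M = 1). Its determinant is zero up to rounding error, whereas any two of the forward rates have a nondegenerate correlation matrix.

```python
import math

def factor_vols(T):
    """Hypothetical two-factor volatility vector [sigma_1(T), sigma_2(T)]; an assumption for illustration."""
    return (0.01, 0.008 * math.exp(-0.3 * T))

def corr(T1, T2):
    """Correlation between forward rate changes implied by Eq. (11.67)."""
    a, b = factor_vols(T1), factor_vols(T2)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

maturities = (1.0, 5.0, 10.0)   # N + M = 3 forward rates driven by N = 2 factors
c = [[corr(Ti, Tj) for Tj in maturities] for Ti in maturities]
det_three = det3(c)
det_two = c[0][0] * c[1][1] - c[0][1] * c[1][0]
print(det_three, det_two)
```

The first printed number is numerically zero: three forward rates driven by two Brownian motions are perfectly collinear, which is the stochastic singularity that the string shocks are meant to remove.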
The starting point is a model where the forward rate is solution to,
$$d_\tau f(\tau,T) = \alpha(\tau,T)\,d\tau + \sigma(\tau,T)\,d_\tau Z(\tau,T),$$
where the string $Z$ satisfies the following five properties:
(i) For all $\tau$, $Z(\tau,T)$ is continuous in $T$;
(ii) For all $T$, $Z(\tau,T)$ is continuous in $\tau$;
(iii) $Z(\tau,T)$ is a $\tau$-martingale and, hence, a local martingale, i.e. $E[d_\tau Z(\tau,T)] = 0$;
(iv) $\mathrm{var}[d_\tau Z(\tau,T)] = d\tau$;
(v) $\mathrm{cov}[d_\tau Z(\tau,T_1),\,d_\tau Z(\tau,T_2)] = \psi(T_1,T_2)\,d\tau$ (say).
Properties (iii), (iv) and (v) make $Z$ Markovian. The functional form for $\psi$ is crucially
important to guarantee this property. Given the previous properties, we can deduce a key property
of the forward rates. We have,
$$\mathrm{var}\left[df(\tau,T)\right] = \sigma^2(\tau,T)\,d\tau,$$
$$c(\tau,T_1,T_2) \equiv \mathrm{corr}\left[df(\tau,T_1),\,df(\tau,T_2)\right] = \frac{\sigma(\tau,T_1)\,\sigma(\tau,T_2)\,\psi(T_1,T_2)}{\sigma(\tau,T_1)\,\sigma(\tau,T_2)} = \psi(T_1,T_2).$$
As claimed before, we now have two separate functions with which to model volatility and
correlation.
11.6.2 No-arbitrage restrictions

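Similarly as in the HJM-Brownian case, let $X(\tau) \equiv -\int_\tau^T f(\tau,\ell)\,d\ell$. We have,
$$dX(\tau) = f(\tau,\tau)\,d\tau - \int_\tau^T d_\tau f(\tau,\ell)\,d\ell = \left[r(\tau) - \alpha^I(\tau,T)\right]d\tau - \int_\tau^T \left[\sigma(\tau,\ell)\,d_\tau Z(\tau,\ell)\right]d\ell,$$
where as usual, $\alpha^I(\tau,T) \equiv \int_\tau^T \alpha(\tau,\ell)\,d\ell$. But $P(\tau,T) = \exp(X(\tau))$. Therefore,
$$\begin{aligned}
\frac{dP(\tau,T)}{P(\tau,T)} &= dX(\tau) + \frac{1}{2}\mathrm{var}\left[dX(\tau)\right] \\
&= \left[r(\tau) - \alpha^I(\tau,T) + \frac{1}{2}\int_\tau^T\!\!\int_\tau^T \sigma(\tau,\ell_1)\,\sigma(\tau,\ell_2)\,\psi(\ell_1,\ell_2)\,d\ell_1 d\ell_2\right]d\tau \\
&\quad - \int_\tau^T \left[\sigma(\tau,\ell)\,d_\tau Z(\tau,\ell)\right]d\ell.
\end{aligned}$$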
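Next, suppose that the pricing kernel $\xi$ satisfies:
$$\frac{d\xi(\tau)}{\xi(\tau)} = -r(\tau)\,d\tau - \int_{\mathbb{T}} \phi(\tau,T)\,d_\tau Z(\tau,T)\,dT,$$
where $\mathbb{T}$ denotes the set of all “risks” spanned by the string $Z$, and $\phi$ is the corresponding
family of “unit risk-premia.”
By absence of arbitrage opportunities,
$$0 = E\left[d(P\xi)\right] = E\left[P\xi \cdot \left(\mathrm{drift}\left(\frac{dP}{P}\right) + \mathrm{drift}\left(\frac{d\xi}{\xi}\right) + \mathrm{cov}\left(\frac{dP}{P},\frac{d\xi}{\xi}\right)\right)\right].$$
By exploiting the dynamics of $P$ and $\xi$,
$$\alpha^I(\tau,T) = \frac{1}{2}\int_\tau^T\!\!\int_\tau^T \sigma(\tau,\ell_1)\,\sigma(\tau,\ell_2)\,\psi(\ell_1,\ell_2)\,d\ell_1 d\ell_2 + \mathrm{cov}\left(\frac{dP}{P},\frac{d\xi}{\xi}\right),$$
where
$$\begin{aligned}
\mathrm{cov}\left(\frac{dP}{P},\frac{d\xi}{\xi}\right) &= E\left[\int_{\mathbb{T}} \phi(\tau,S)\,d_\tau Z(\tau,S)\,dS \cdot \int_\tau^T \sigma(\tau,\ell)\,d_\tau Z(\tau,\ell)\,d\ell\right] \\
&= \int_\tau^T\!\!\int_{\mathbb{T}} \phi(\tau,S)\,\sigma(\tau,\ell)\,\psi(S,\ell)\,dS\,d\ell.
\end{aligned}$$
By differentiating $\alpha^I$ with respect to $T$ we obtain,
$$\alpha(\tau,T) = \int_\tau^T \sigma(\tau,\ell)\,\sigma(\tau,T)\,\psi(\ell,T)\,d\ell + \sigma(\tau,T)\int_{\mathbb{T}} \phi(\tau,S)\,\psi(S,T)\,dS. \tag{11.68}$$
A proof of Eq. (11.68) is in the Appendix.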
11.7 Interest rate derivatives

11.7.1 Introduction
Options on bonds, caps and swaptions are the main interest rate derivatives traded in the
market. The purpose of this section is to price these assets. In principle, the pricing problem
could be solved very elegantly. Let $w$ denote the value of any such instrument, and $\pi$ be the
instantaneous payoff process paid by it. Consider any model of the short-term rate considered
in Section 11.3. To simplify, assume that $d = 1$, and that all uncertainty is subsumed by the
short-term rate process in Eq. (11.25). By the FTAP, $w$ is then the solution to the following
partial differential equation:
$$0 = \frac{\partial w}{\partial \tau} + \bar{b}w_r + \frac{1}{2}a^2 w_{rr} + \pi - rw, \quad \text{for all } (r,\tau) \in \mathbb{R}_{++} \times [t,T) \tag{11.69}$$
subject to some appropriate boundary conditions. In the previous PDE, $\bar{b}$ is some risk-neutralized
drift function of the short-term rate. The additional $\pi$ term arises because to the average
instantaneous increase rate of the derivative, viz
$\frac{\partial w}{\partial \tau} + \bar{b}w_r + \frac{1}{2}a^2 w_{rr}$, one has to add its payoff $\pi$. The
sum of these two terms must equal $rw$ to avoid arbitrage opportunities. In many applications
considered below, the payoff $\pi$ can be approximated by a function of the short-term rate itself,
$\pi(r)$. However, such an approximation is at odds with standard practice. Market participants
define the payoffs of interest-rate derivatives in terms of LIBOR discretely-compounded rates.
Moreover, intermediate payments do not occur continuously, only discretely. The aim of this
section is to present models that are more realistic than those emanating from Eq. (11.69).
The next section introduces notation to cope expeditiously with the pricing of these interest
rate derivatives. Section 11.7.3 shows how to price options within the Gaussian models discussed
in Section 11.3. Section 11.7.4 provides precise definitions of the remaining most important
fixed-income instruments: fixed coupon bonds, floating rate bonds, interest rate swaps, caps,
floors and swaptions. It also provides exact solutions based on short-term rate models. Finally,
Section 11.7.5 presents the “market model,” which is an HJM-style model intensively used by
practitioners.
11.7.2 The put-call parity in fixed income markets

Consider the identity,
$$[K - P(T,S)]^+ \equiv [P(T,S) - K]^+ + K - P(T,S).$$
Taking risk-neutral, discounted expectations of both sides of this equation leaves,
$$\begin{aligned}
E\left[e^{-\int_t^T r(\tau)d\tau}\,(K - P(T,S))^+\right]
&= E\left[e^{-\int_t^T r(\tau)d\tau}\,(P(T,S) - K)^+\right] + P(t,T)\,K - E\left[e^{-\int_t^T r(\tau)d\tau}\,P(T,S)\right] \\
&= E\left[e^{-\int_t^T r(\tau)d\tau}\,(P(T,S) - K)^+\right] + P(t,T)\,K - P(t,S),
\end{aligned}$$
where the last equality follows by the same argument leading to Eq. (11.49). Therefore, we have
the put-call parity relation:
$$\mathrm{Put}(t,T;P(t,S),K) = \mathrm{Call}(t,T;P(t,S),K) + P(t,T)\,K - P(t,S), \tag{11.70}$$
where $\mathrm{Put}(t,T;P(t,S),K)$ is the price of a European put, expiring at time $T < S$ and
struck at $K$, written on a zero expiring at time $S$, and $\mathrm{Call}(\cdot)$ denotes the
corresponding call price.
11.7.3 European options on bonds

Let $T$ be the expiration date of a European call option on a bond and $S > T$ be the expiration
date of the bond. We consider a simple model of the short-term rate with $d = 1$, and a rational
bond pricing function of the form $P(\tau) \equiv P(r,\tau,S)$. We also consider a rational option
price function $C^b(\tau) \equiv C^b(r,\tau,T,S)$. By the FTAP, there are no arbitrage opportunities
if and only if,
$$C^b(t) = E\left[e^{-\int_t^T r(\tau)d\tau}\,(P(r(T),T,S) - K)^+\right], \tag{11.71}$$
where $K$ is the strike of the option. In terms of PDEs, $C^b$ is solution to Eq. (11.69) with
$\pi \equiv 0$ and boundary condition $C^b(r,T,T,S) = (P(r,T,S) - K)^+$, where $P(r,\tau,S)$ is also
the solution to Eq. (11.69) with $\pi \equiv 0$, but with boundary condition $P(r,S,S) = 1$. In terms
of PDEs, the situation seems hopeless. As we show below, the problem can be considerably
simplified with the help of the $T$-forward martingale probability introduced in Section 11.1. In
fact, we shall show that under the assumption that the short-term rate is a Gaussian process,
Eq. (11.71) has a closed-form expression. We now present two models enabling this. The first one
was developed in a seminal paper by Jamshidian (1989), and the second one is, simply, its
perfectly fitting extension.
11.7.3.1 Jamshidian & Vasicek
Suppose that the short-term rate is solution to the Vasicek model considered in Section 11.3
(see Eq. (11.31)):
$$dr(\tau) = \kappa\left(r^* - r(\tau)\right)d\tau + \sigma\,d\tilde{W},$$
where $\tilde{W}$ is a Q-Brownian motion and $r^* \equiv \bar{r} - \frac{\lambda\sigma}{\kappa}$. As shown in
Section 11.3, Eq. (11.33), the bond price is:
$$P(r(\tau),\tau,S) = e^{A(\tau,S) - B(\tau,S)\,r(\tau)},$$
for some function $A$, and for $B(t,T) = \frac{1}{\kappa}\left[1 - e^{-\kappa(T-t)}\right]$ (see Eq. (11.56)).
In Section 11.4, Eq. (11.49), it was also shown that
$$E\left[e^{-\int_t^T r(\tau)d\tau}\,(P(r(T),T,S) - K)^+\right] = P(r(t),t,S)\cdot Q_F^S\left[P(r(T),T,S) \ge K\right] - K\,P(r(t),t,T)\cdot Q_F^T\left[P(r(T),T,S) \ge K\right], \tag{11.72}$$
where $Q_F^T$ denotes the $T$-forward martingale probability (see Section 11.1.4).
In Appendix 8, we show that the two probabilities in Eq. (11.72) can be evaluated by the
changes of numéraire described in Section 11.1.4, such that the solution for $P(r(T),T,S)$ is:
$$\begin{aligned}
P(r(T),T,S) &= \frac{P(r,t,S)}{P(r,t,T)}\, e^{-\frac{1}{2}\sigma^2\int_t^T [B(\tau,S) - B(\tau,T)]^2 d\tau \,-\, \sigma\int_t^T [B(\tau,S) - B(\tau,T)]\,dW^{Q_F^T}(\tau)} \quad \text{under } Q_F^T \\
P(r(T),T,S) &= \frac{P(r,t,S)}{P(r,t,T)}\, e^{\frac{1}{2}\sigma^2\int_t^T [B(\tau,S) - B(\tau,T)]^2 d\tau \,-\, \sigma\int_t^T [B(\tau,S) - B(\tau,T)]\,dW^{Q_F^S}(\tau)} \quad \text{under } Q_F^S
\end{aligned} \tag{11.73}$$
where $W^{Q_F^T}$ is a Brownian motion under the forward probability $Q_F^T$, and similarly for
$W^{Q_F^S}$. Therefore, simple algebra now reveals that:
$$Q_F^S\left[P(T,S) \ge K\right] = \Phi(d_1), \qquad Q_F^T\left[P(T,S) \ge K\right] = \Phi(d_1 - v), \qquad d_1 = \frac{\ln\left[\frac{P(r(t),t,S)}{K\,P(r(t),t,T)}\right] + \frac{1}{2}v^2}{v},$$
where
$$v^2 = \sigma^2\int_t^T \left[B(\tau,S) - B(\tau,T)\right]^2 d\tau = \sigma^2\,\frac{1 - e^{-2\kappa(T-t)}}{2\kappa}\,B(T,S)^2. \tag{11.74}$$
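The formulae above translate directly into code. The following minimal sketch takes the two zero prices P(t, T) and P(t, S) as inputs, so that the function A is not needed; all numerical values are hypothetical and chosen only for illustration. It evaluates the call price implied by Eqs. (11.72) and (11.74) and checks it against the put-call parity in Eq. (11.70).

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def B(kappa, horizon):
    """B function of Eq. (11.56), as a function of time to maturity."""
    return (1.0 - math.exp(-kappa * horizon)) / kappa

def option_vol(sigma, kappa, T_minus_t, S_minus_T):
    """v in Eq. (11.74)."""
    scale = (1.0 - math.exp(-2.0 * kappa * T_minus_t)) / (2.0 * kappa)
    return sigma * math.sqrt(scale) * B(kappa, S_minus_T)

def bond_call(P_tS, P_tT, K, v):
    """Call on a zero: P(t,S) * Phi(d1) - K * P(t,T) * Phi(d1 - v)."""
    d1 = (math.log(P_tS / (K * P_tT)) + 0.5 * v * v) / v
    return P_tS * norm_cdf(d1) - K * P_tT * norm_cdf(d1 - v)

def bond_put(P_tS, P_tT, K, v):
    """Put on a zero, from the complementary forward probabilities."""
    d1 = (math.log(P_tS / (K * P_tT)) + 0.5 * v * v) / v
    return K * P_tT * norm_cdf(-(d1 - v)) - P_tS * norm_cdf(-d1)

# Illustrative inputs: option expiring in 1 year on a zero maturing in 5 years.
P_tT, P_tS, K = 0.95, 0.80, 0.84
v = option_vol(sigma=0.03, kappa=0.2, T_minus_t=1.0, S_minus_T=4.0)
call = bond_call(P_tS, P_tT, K, v)
put = bond_put(P_tS, P_tT, K, v)
print(v, call, put, call + P_tT * K - P_tS)
```

The last two printed numbers coincide, as required by Eq. (11.70). Because only the two zero prices enter, the same function applies unchanged when the prices are those of a perfectly fitting model.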
11.7.3.2 Perfectly fitting extension
We now consider the perfectly fitting extension of the previous results. Namely, we consider
model (11.54) in Section 11.4, viz
$$dr(\tau) = \left(\theta(\tau) - \kappa r(\tau)\right)d\tau + \sigma\,d\tilde{W}(\tau),$$
where $\theta(\tau)$ is now the infinite dimensional parameter that is used to “invert the term-structure.”
The solution to Eq. (11.71) is the same as in the previous section. However, in Section 11.7.4
we shall argue that the advantage of using such a perfectly fitting extension arises as soon as
one is concerned with the evaluation of more complex options on fixed coupon bonds.
11.7.3.3 Bond price volatility and the persistence of the short-term rate
The implied volatility of options on bonds is typically very large, in fact comparable to that of
options on stocks. Why is this implied volatility so large when, in fact, the volatility of the
short-term rate is one order of magnitude less than that of stock markets? The answer is that the
short-term rate is very persistent, and it is “a risk for the long-run,” pretty much in the same
spirit as the explanations of the equity premium puzzle reviewed in Chapter 7. To make this point
precise, define, first, the term-structure of volatility. It is the function
$\tau \mapsto \mathrm{Vol}(R(\tau))$, where $R(\tau)$ is the spot rate for the maturity $\tau$, and
$\mathrm{Vol}(R(\tau))$ is the standard deviation of this spot rate. By the definition of $R(\tau)$,
the term-structure of volatility can also be written as the function
$$\tau \mapsto \mathrm{Vol}\left(-\frac{1}{\tau}\ln P(\tau)\right),$$
where $P(\tau)$ is the price of a zero with maturity equal to $\tau$. It is instructive to see what this
volatility looks like for a concrete model. Consider again the Vasicek model. This model assumes
that the short-term rate is solution to,
$$dr_t = \kappa\left(\mu - r_t\right)dt + \sigma\,dW_t,$$
where $W_t$ is a Brownian motion, and $\kappa$, $\mu$ and $\sigma$ are three positive constants. By previous
results given in this chapter, we know that for this model,
$$R(\tau) = \frac{A(\tau)}{\tau} + \frac{1}{\tau}B(\tau)\,r, \qquad B(\tau) = \frac{1 - e^{-\kappa\tau}}{\kappa},$$
for some function $A(\tau)$. Therefore, we have that,
$$\mathrm{Vol}\left[R(\tau)\right] = \frac{1}{\tau}B(\tau)\,\mathrm{Vol}_\infty(r), \tag{11.75}$$
where $\mathrm{Vol}_\infty(r)$ is the “ergodic” volatility of the short-term rate, defined as
$\mathrm{Vol}_\infty(r) = \sqrt{\sigma^2/(2\kappa)}$. For example, if $\kappa = 0.2$ and $\sigma = 0.03$, then
$\mathrm{Vol}_\infty(r) \approx 4.7\%$. Given the previous values for $\kappa$ and $\sigma$, the picture below
depicts the term-structure of volatility, i.e. Eq. (11.75).
[Figure: term-structure of volatility, Vol(R), plotted against maturity (years), for maturities from 0 to 5 years.]
As we can see, the term-structure of volatility is decreasing in the maturity of the zero, and
attains its maximum, $\mathrm{Vol}_\infty(r) \approx 4.7\%$, as the maturity shrinks to zero. This is
natural, as the yield curve in this model flattens out, converging towards a constant long-term
value, the asymptotic interest rate, as we sometimes say.
Despite this, the volatility of bond returns can be much higher, as we now illustrate. We need
to figure out what the dynamics of the bond price are for the Vasicek model. By Itô’s lemma,
$$\frac{dP(\tau)}{P(\tau)} = [\cdots]\,dt + \left[-\sigma \cdot B(\tau)\right]dW_t.$$
Therefore, the volatility of bond returns is,
$$\mathrm{Vol}\left(\frac{dP}{P}\right) = \sigma B(\tau). \tag{11.76}$$
Compare Eq. (11.76) with Eq. (11.75). The main difference between the two equations is
that the right hand side of Eq. (11.75) is divided by $\tau$, which makes $\mathrm{Vol}[R(\tau)]$
decreasing in $\tau$. (Otherwise, $\mathrm{Vol}_\infty(r)$ and $\sigma$ have roughly the same order of
magnitude.) The point is, indeed, that the yield, $R(\tau)$, is simply the average return we would
obtain were we to decide not to sell the bond until its expiry. This average return is, of course,
progressively less volatile as time to maturity gets large, and it eventually becomes a constant.
The return $\frac{dP}{P}$ instead measures the capital gains we may obtain by trading the bond, and
tends to be more and more volatile as time to maturity gets large. Indeed, even if $\sigma$ is very
small, the volatility of bond returns, $\mathrm{Vol}\left(\frac{dP}{P}\right)$, can be quite high. For
example, if $\kappa$ is close to zero, then $\mathrm{Vol}\left(\frac{dP}{P}\right) \approx \sigma \cdot \tau$,
which, with $\sigma = 0.03$, is 15% for a 5Y zero. This fact is illustrated by the next picture, which
depicts Eq. (11.76), evaluated at the previous parameter values, $\kappa = 0.2$ and $\sigma = 0.03$.
[Figure: volatility of bond returns, Vol(dP/P), plotted against maturity (years), for maturities from 0 to 8 years.]
Intuitively, it is the high persistence of the short-term rate (measured by the low value of $\kappa$)
that makes the bond price so volatile at long maturities. High persistence in the short-term rate
means that a shock to the short-term rate is embedded for a long time in the future path of the
short-term rate, that is, it has persistent consequences. This makes the short-term rate very
volatile in the long run, which makes the value of the long maturity zero very volatile as well.
These facts can be seen at work by analyzing the option-based volatility in Eq. (11.74). In that
case, we have that as $\kappa$ gets small, $v$ tends to $\sigma \times \sqrt{T-t} \times (S-T)$, so
that the implied volatility equals $\sigma \times (S-T)$, which increases with the bond’s time to
maturity left at the option’s expiration, $S - T$.
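The two volatility curves can be tabulated with a few lines of code. This is a minimal sketch using the parameter values quoted in the text, κ = 0.2 and σ = 0.03; the function names are hypothetical.

```python
import math

KAPPA, SIGMA = 0.2, 0.03   # parameter values used in the text

def B(tau):
    """B(tau) = (1 - exp(-kappa * tau)) / kappa."""
    return (1.0 - math.exp(-KAPPA * tau)) / KAPPA

def vol_ergodic():
    """Ergodic volatility of the short-term rate, sqrt(sigma^2 / (2 * kappa))."""
    return math.sqrt(SIGMA ** 2 / (2.0 * KAPPA))

def vol_yield(tau):
    """Term-structure of volatility, Eq. (11.75)."""
    return B(tau) / tau * vol_ergodic()

def vol_bond_return(tau):
    """Volatility of bond returns, Eq. (11.76)."""
    return SIGMA * B(tau)

for tau in (0.5, 1.0, 2.0, 5.0, 8.0):
    print(tau, round(vol_yield(tau), 4), round(vol_bond_return(tau), 4))
```

The first column of results falls with maturity while the second rises, and the second stays below the κ → 0 bound σ·τ at every maturity.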
The previous reasoning does, of course, still hold in the more realistic case of a three-factor
model, such as that in Eqs. (11.37). In that case, as explained, $\kappa_r$ is large and $\kappa_{\bar{r}}$
is small: the short-term rate is quite persistent because it mean-reverts, quickly, to a persistent
process, which we denoted as $\bar{r}(\tau)$. Naturally, in such a three-factor model, Eq. (11.76) does
not hold anymore, as we should add two more volatility components, related to stochastic
volatility, $v(\tau)$, and the persistent process $\bar{r}(\tau)$. However, the bond return volatility
would be boosted by the high persistence of $\bar{r}(\tau)$.
11.7.4 Related fixed income products

11.7.4.1 Fixed coupon bonds
Given a set of dates $\{T_i\}_{i=0}^n$, a fixed coupon bond pays off a fixed coupon $c_i$ at $T_i$,
$i = 1, \cdots, n$, and one unit of numéraire at time $T_n$. Ideally, one generic coupon at time $T_i$
pays off for the time-interval $T_i - T_{i-1}$. It is assumed that the various coupons are known at
time $t < T_0$. By the FTAP, the value of a fixed coupon bond is
$$P_{\mathrm{fcb}}(t,T_n) = P(t,T_n) + \sum_{i=1}^n c_i\,P(t,T_i).$$
11.7.4.2 Floating rate bonds
A floating rate bond works as a fixed coupon bond, with the important exception that the
coupon payments are defined as:
$$c_i = \delta_{i-1}L(T_{i-1},T_i) = \frac{1}{P(T_{i-1},T_i)} - 1, \tag{11.77}$$
where $\delta_i \equiv T_{i+1} - T_i$, and where the second equality is the definition of the
simply-compounded LIBOR rates introduced in Section 11.1 (see Eq. (11.1)). By the FTAP, the price
$p_{\mathrm{frb}}$ as of time $t$ of a floating rate bond is:
$$\begin{aligned}
p_{\mathrm{frb}}(t) &= P(t,T_n) + \sum_{i=1}^n E\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\delta_{i-1}L(T_{i-1},T_i)\right] \\
&= P(t,T_n) + \sum_{i=1}^n E\left[\frac{e^{-\int_t^{T_i} r(\tau)d\tau}}{P(T_{i-1},T_i)}\right] - \sum_{i=1}^n P(t,T_i) \\
&= P(t,T_n) + \sum_{i=1}^n P(t,T_{i-1}) - \sum_{i=1}^n P(t,T_i) \\
&= P(t,T_0),
\end{aligned}$$
where the second line follows from Eq. (11.77) and the third line from Eq. (11.8) given in Section
11.1.
The same result can be obtained by assuming an economy where the floating rate bond
continuously pays off the instantaneous short-term rate $r$. Let $T_0 = t$ for simplicity. In this
case, $p_{\mathrm{frb}}$ is solution to the PDE (11.69), with $\pi(r) = r$, and boundary condition
$p_{\mathrm{frb}}(T) = 1$. As can be verified, $p_{\mathrm{frb}} = 1$, all $r$ and $\tau$, is indeed solution to
the PDE (11.69).
11.7.4.3 Options on fixed coupon bonds
The payoff of an option maturing at $T_0$ on a fixed coupon bond paying off at dates $T_1, \cdots, T_n$
is given by:
$$\left[P_{\mathrm{fcb}}(T_0,T_n) - K\right]^+ = \left[P(T_0,T_n) + \sum_{i=1}^n c_i\,P(T_0,T_i) - K\right]^+. \tag{11.78}$$
At first glance, the expectation of the payoff in Eq. (11.78) seems very difficult to evaluate.
Indeed, even if we end up with a model that predicts bond prices at time $T_0$, $P(T_0,T_i)$, to be
lognormal, we know that the sum of lognormals is not lognormal. This issue can be dealt with
in an elegant manner. Suppose we wish to model the bond price $P(t,T)$ through any one of
the models of the short-term rate reviewed in Section 11.3. In this case, the pricing function is
obviously $P(t,T) = P(r,t,T)$. Assume, further, that
$$\text{For all } t, T, \quad \frac{\partial P(r,t,T)}{\partial r} < 0, \tag{11.79}$$
and that
$$\text{For all } t, T, \quad \lim_{r \to 0} P(r,t,T) > K \quad \text{and} \quad \lim_{r \to \infty} P(r,t,T) = 0. \tag{11.80}$$
Under conditions (11.79) and (11.80), there is one and only one value of $r$, say $r^*$, that solves
the following equation:
$$P(r^*,T_0,T_n) + \sum_{i=1}^n c_i\,P(r^*,T_0,T_i) = K. \tag{11.81}$$
Then, the payoff in Eq. (11.78) can be written as:
$$\left[\sum_{i=1}^n \bar{c}_i\,P(r(T_0),T_0,T_i) - K\right]^+ = \left[\sum_{i=1}^n \bar{c}_i\left(P(r(T_0),T_0,T_i) - P(r^*,T_0,T_i)\right)\right]^+,$$
where $\bar{c}_i = c_i$, $i = 1, \cdots, n-1$, and $\bar{c}_n = 1 + c_n$.

Next, note that by condition (11.79), the terms $P(r(T_0),T_0,T_i) - P(r^*,T_0,T_i)$ have the same
sign for all $i$ (see footnote 16). Therefore, the payoff in Eq. (11.78) is,
$$\left[\sum_{i=1}^n \bar{c}_i\,P(r(T_0),T_0,T_i) - K\right]^+ = \sum_{i=1}^n \bar{c}_i\left[P(r(T_0),T_0,T_i) - P(r^*,T_0,T_i)\right]^+. \tag{11.82}$$
Next, note that each term of the sum in Eq. (11.82) can be evaluated as an option on a
pure discount bond with strike price equal to $P(r^*,T_0,T_i)$. Typically, the threshold $r^*$ must be
found with some numerical method. The device to reduce the problem of an option on a fixed
coupon bond to a problem involving the sum of options on zero coupon bonds was invented by
Jamshidian (1989) (see footnote 17). The price of the call on the fixed coupon bond is, therefore,
$$\mathrm{Call}\left(t,T_0;P_{\mathrm{fcb}}(t,T_n),K,v\right) = \sum_{i=1}^n \bar{c}_i\,\mathrm{Call}\left(t,T_0;P(t,T_i),P(r^*,T_0,T_i),v_i\right), \tag{11.83}$$
16 Suppose that $P(r(T_0),T_0,T_1) > P(r^*,T_0,T_1)$. By Eq. (11.79), $r(T_0) < r^*$. Hence $P(r(T_0),T_0,T_2) > P(r^*,T_0,T_2)$, etc.
17 The conditions in Eqs. (11.79) and (11.80) hold within the Vasicek model that Jamshidian considered in his paper. In fact,
the condition in Eq. (11.79) holds for all one-factor stationary, Markov models of the short-term rate. However, the condition in
Eq. (11.79) is not a general property of bond prices in multi-factor models (see Mele (2003)).
where r∗ solves Eq. (11.81), and,
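$$
\mathrm{Call}(t,T_0;P_i,P_i^*,v_i) = P_i \Phi(d_{1,i}) - P_i^* P(t,T_0) \Phi(d_{1,i} - v_i),
$$
$$
d_{1,i} = \frac{\ln\frac{P_i}{P_i^* P(t,T_0)} + \frac{1}{2} v_i^2}{v_i}, \quad
v_i = \sigma \sqrt{\frac{1 - e^{-2\kappa(T_0 - t)}}{2\kappa}}\, B(T_0,T_i), \quad
B(t,T) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-t)}\right).
$$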
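The decomposition in Eqs. (11.81)-(11.83) can be sketched numerically as follows. This is only an illustration under stated assumptions: the bond pricing function is the textbook Vasicek formula with an assumed risk-neutral long-run mean `theta`, the root r* is found by plain bisection on a hypothetical bracket, and all parameter values are made up.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def B(t, T, kappa):
    """B(t, T) = (1 - exp(-kappa (T - t))) / kappa, as in the text."""
    return (1.0 - math.exp(-kappa * (T - t))) / kappa

def bond(r, t, T, kappa, theta, sigma):
    """Vasicek price P(r, t, T); theta is an assumed risk-neutral mean."""
    b = B(t, T, kappa)
    a = (theta - sigma**2 / (2 * kappa**2)) * (b - (T - t)) - sigma**2 * b**2 / (4 * kappa)
    return math.exp(a - b * r)

def critical_rate(K, T0, dates, cbar, kappa, theta, sigma, lo=-1.0, hi=5.0):
    """Solve Eq. (11.81) for r* by bisection; the l.h.s. is decreasing in r."""
    def excess(r):
        return sum(c * bond(r, T0, Ti, kappa, theta, sigma)
                   for c, Ti in zip(cbar, dates)) - K
    if not (excess(lo) > 0.0 > excess(hi)):
        raise ValueError("bracket [lo, hi] does not contain r*")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def zero_call(P_i, P_star, P_T0, v):
    """Call on a zero with strike P_star, as in the formula below Eq. (11.83)."""
    d1 = (math.log(P_i / (P_star * P_T0)) + 0.5 * v * v) / v
    return P_i * Phi(d1) - P_star * P_T0 * Phi(d1 - v)

def coupon_bond_call(r, t, T0, dates, coupons, K, kappa, theta, sigma):
    """Eq. (11.83): sum of calls on zeros, each struck at P(r*, T0, Ti)."""
    cbar = list(coupons)
    cbar[-1] += 1.0                      # c-bar_n = 1 + c_n
    r_star = critical_rate(K, T0, dates, cbar, kappa, theta, sigma)
    P_T0 = bond(r, t, T0, kappa, theta, sigma)
    vol0 = sigma * math.sqrt((1.0 - math.exp(-2 * kappa * (T0 - t))) / (2 * kappa))
    price = 0.0
    for c, Ti in zip(cbar, dates):
        v_i = vol0 * B(T0, Ti, kappa)
        price += c * zero_call(bond(r, t, Ti, kappa, theta, sigma),
                               bond(r_star, T0, Ti, kappa, theta, sigma), P_T0, v_i)
    return price, r_star

# Illustrative inputs (not taken from the text).
kappa, theta, sigma = 0.5, 0.05, 0.02
dates, coupons = [2.0, 3.0, 4.0], [0.05, 0.05, 0.05]
price, r_star = coupon_bond_call(0.04, 0.0, 1.0, dates, coupons, 1.0, kappa, theta, sigma)
```

Bisection is used only because the left-hand side of Eq. (11.81) is monotone in r under condition (11.79); any bracketing root finder would do.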
Why are perfectly fitting models so important in practice? Suppose that in Eq. (11.81), the
critical value $r^*$ is computed by means of the Vasicek model. This assumption is attractive
because it allows us to evaluate the payoff in Eq. (11.82) with Jamshidian's formula of Section
11.7.2. However, proceeding this way does not ensure that the yield curve is perfectly fitted.
The natural alternative is to use the corresponding perfectly fitting extension, as in Eq. (11.83).
Such a perfectly fitting extension gives rise to a zero-coupon bond option price that
is exactly equal to the one obtained through Jamshidian's formula. Things differ, though,
as far as options on coupon-bearing bonds are concerned. Indeed, by using the perfectly
fitting model (11.54), one obtains bond prices such that the solution $r^*$ in Eq. (11.81) can be radically
different from the one obtained when bond prices are computed with the simple Vasicek model.
11.7.4.4 Interest rate swaps
A Savings and Loan (S&L, henceforth) is an institution that makes mortgage, car and personal
loans to individual members, financed through savings. From the 1980s through the beginning
of the 1990s, these cooperative ventures entered into a deep and persistent crisis, leading
to a painful Government bailout of about $125b under the George H.W. Bush administration.
There are many causes of this crisis, but one of them was certainly the rise in short-term rates
arising as a result of inflation and the attempts at fighting it, the so-called Monetary
experiment mentioned in Section 11.3.7. But banking is risky precisely because it involves
lending at horizons longer than those relating to borrowing, and S&L “banking” was not an
exception to such modus operandi. Certainly, interest rate swaps could have helped in coping
with the inversion of the yield curve of the time.
An interest rate swap is simply an exchange of interest rate payments. Typically, one coun-
terparty exchanges a fixed against a floating interest rate payment. The floating payment is
typically a short-term interest rate. For example, the counterparty receiving a floating interest
rate payment has “good” (or only) access to markets for “variable” interest rates, but wishes to
pay fixed interest rates. Alternatively, the counterparty receiving a floating interest rate wants
to hedge itself against changes in short-term rates, as it might have been the case for S&L
institutions during the 1980s. The counterparty receiving a floating interest rate payment and
paying a fixed interest rate Kirs has a payoff equal to,
$$
\delta_{i-1}\left[L(T_{i-1},T_i) - K_{irs}\right]
$$
at time $T_i$, $i = 1,\cdots,n$. Each of these payments is really a FRA, and can be evaluated as in Section
11.1. By convention, we say that the swap payer is the counterparty who pays the fixed interest
rate $K_{irs}$, and that the swap receiver is the counterparty receiving the fixed interest rate $K_{irs}$.
With a dedicated interest rate swap of this kind, an S&L institution would have locked in the
yield curve: at time $T_i$, the payoff for the financial institution is, in this stylized example,
$$
\delta_{i-1}\left[L^{long}(T_{i-1},T_i) - L(T_{i-1},T_i)\right] + \delta_{i-1}\left[L(T_{i-1},T_i) - K_{irs}\right] = \delta_{i-1}\left[L^{long}(T_{i-1},T_i) - K_{irs}\right],
$$
where $L^{long}(T_{i-1},T_i)$ is the interest rate gained over long-term assets. Naturally, if short-term
interest rates were to go down relative to $K_{irs}$, an S&L institution would not have benefited from the
increased long-term/short-term spread, $\delta_{i-1}\left[L^{long}(T_{i-1},T_i) - L(T_{i-1},T_i)\right]$. But clearly insuring
against yield curve inversions is the thing to do, if yield curve inversions lead to bankruptcy and
bankruptcy is costly. We shall see, below, that other products exist, such as caps or swaptions,
which insure against the upside while at the same time freeing up the downside.
By the FTAP, the price as of time t of an interest rate swap payer, pirs (t), say, is:
$$
p_{irs}(t) = \sum_{i=1}^{n} \mathbb{E}\left[e^{-\int_t^{T_i} r(\tau)d\tau}\, \delta_{i-1}\left(L(T_{i-1},T_i) - K_{irs}\right)\right] = \sum_{i=1}^{n} \mathrm{IRS}(t,T_{i-1},T_i;K_{irs}), \tag{11.84}
$$
where IRS is the value of a FRA and, by Eq. (11.9) in Section 11.1, is:
$$
\mathrm{IRS}(t,T_{i-1},T_i;K_{irs}) = \delta_{i-1}\left[F(t,T_{i-1},T_i) - K_{irs}\right] P(t,T_i).
$$
The forward swap rate $R_{swap}$ is the value of $K_{irs}$ such that $p_{irs}(t) = 0$. Simple computations
yield:
$$
R_{swap}(t) = \frac{\sum_{i=1}^{n} \delta_{i-1} F(t,T_{i-1},T_i) P(t,T_i)}{\sum_{i=1}^{n} \delta_{i-1} P(t,T_i)} = \frac{P(t,T_0) - P(t,T_n)}{\sum_{i=1}^{n} \delta_{i-1} P(t,T_i)}, \tag{11.85}
$$
where the last equality is due to Eq. (11.5) in Section 11.1: $\delta_{i-1} F(t,T_{i-1},T_i) P(t,T_i) = P(t,T_{i-1}) - P(t,T_i)$.18 This expression is quite similar to the par coupon rate in Eq. (11.3).
By plugging the expression for the forward swap rate in Eq. (11.85) into Eq. (11.84), we
obtain the following intuitive expression for the swap payer:
$$
\begin{aligned}
p_{irs}(t) &= \sum_{i=1}^{n} \delta_{i-1} F(t,T_{i-1},T_i) P(t,T_i) - K_{irs} \sum_{i=1}^{n} \delta_{i-1} P(t,T_i) \\
&= \sum_{i=1}^{n} \delta_{i-1} P(t,T_i) \left(R_{swap}(t) - K_{irs}\right) \\
&\equiv \mathrm{PVBP}_t(T_1,\cdots,T_n) \left(R_{swap}(t) - K_{irs}\right),
\end{aligned} \tag{11.86}
$$
where $\mathrm{PVBP}_t(T_1,\cdots,T_n)$ is the so-called swap’s “Present Value of the Basis Point” (see, e.g.,
Brigo and Mercurio, 2006), i.e. the present value impact of a one basis point move in the forward
swap rate at $t$.
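Eqs. (11.85) and (11.86) only require a vector of discount factors. The short sketch below, with made-up discount factors and accrual periods, computes the PVBP, the forward swap rate and the payer swap value; the function names are illustrative choices, not notation from the text.

```python
def pvbp(deltas, discounts):
    """PVBP_t = sum_i delta_{i-1} P(t, T_i); discounts = [P(t,T_0), ..., P(t,T_n)]."""
    return sum(d * p for d, p in zip(deltas, discounts[1:]))

def forward_swap_rate(deltas, discounts):
    """Eq. (11.85): (P(t,T_0) - P(t,T_n)) / PVBP_t."""
    return (discounts[0] - discounts[-1]) / pvbp(deltas, discounts)

def payer_swap_value(deltas, discounts, k_irs):
    """Eq. (11.86): PVBP_t * (R_swap(t) - K_irs)."""
    return pvbp(deltas, discounts) * (forward_swap_rate(deltas, discounts) - k_irs)

def forward_rates(deltas, discounts):
    """F(t, T_{i-1}, T_i) implied by delta F P(t,T_i) = P(t,T_{i-1}) - P(t,T_i)."""
    return [(p0 / p1 - 1.0) / d
            for d, p0, p1 in zip(deltas, discounts[:-1], discounts[1:])]

# Hypothetical semi-annual swap with four payment dates.
deltas = [0.5, 0.5, 0.5, 0.5]
discounts = [0.98, 0.96, 0.94, 0.92, 0.90]
R = forward_swap_rate(deltas, discounts)
value = payer_swap_value(deltas, discounts, 0.03)
```

By construction the payer swap is worth zero when the fixed rate equals `R`, and its value coincides with the sum of the FRA values in Eq. (11.84).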
11.7.4.5 Caps & floors
A cap works as an interest rate swap, with the important exception that the exchange of interest
rate payments takes place only if actual interest rates are higher than $K$. A cap protects against
upward movements of the interest rates, freeing up the downside. By going long a cap, the S&L
institution in the example of the previous section, then, would benefit from the downside in
the short-term interest rates through a cap on them, literally. Precisely, a cap is made up of
caplets. The payoff as of time $T_i$ of a caplet is:
$$
\delta_{i-1}\left[L(T_{i-1},T_i) - K\right]^+, \quad i = 1,\cdots,n.
$$
Floors are defined in a similar way, with a single floorlet paying off,
$$
\delta_{i-1}\left[K - L(T_{i-1},T_i)\right]^+
$$
at time $T_i$, $i = 1,\cdots,n$.
18 To cast this problem in terms of continuous time swap exchanges and, then, PDEs, we set $p_{irs}(T) \equiv 0$ as a boundary condition,
and $\pi(r) = r - k$, where $k$ plays the same role as $K_{irs}$ above. Then, if the bond price $P(\tau)$ is solution to Eq. (11.69), the following
function, $p_{irs}(\tau) = 1 - P(\tau) - k\int_\tau^T P(s)ds$, does also satisfy Eq. (11.69).
We will only focus on caps. By the FTAP, the value pcap of a cap as of time t is:
$$
p_{cap}(t) = \sum_{i=1}^{n} \mathbb{E}\left[e^{-\int_t^{T_i} r(\tau)d\tau}\, \delta_{i-1}\left(L(T_{i-1},T_i) - K\right)^+\right]. \tag{11.87}
$$
We can develop explicit solutions to this problem, relying upon models of the short-term
rate. First, we use the standard definition of simply compounded rates given in Section 11.1
(see Eq. (11.1)), viz $\delta_{i-1} L(T_{i-1},T_i) = \frac{1}{P(T_{i-1},T_i)} - 1$, and rewrite the caplet payoff as follows:
$$
\left[\delta_{i-1} L(T_{i-1},T_i) - \delta_{i-1} K\right]^+ = \frac{1}{P(T_{i-1},T_i)}\left[1 - (1 + \delta_{i-1} K) P(T_{i-1},T_i)\right]^+.
$$
We have,
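$$
\begin{aligned}
p_{cap}(t) &= \sum_{i=1}^{n} \mathbb{E}\left[\frac{e^{-\int_t^{T_i} r(\tau)d\tau}}{P(T_{i-1},T_i)} \left(1 - (1 + \delta_{i-1} K) P(T_{i-1},T_i)\right)^+\right] \\
&= \sum_{i=1}^{n} \frac{1}{K_i}\, \mathbb{E}\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau} \left(K_i - P(T_{i-1},T_i)\right)^+\right], \quad K_i = (1 + \delta_{i-1} K)^{-1},
\end{aligned} \tag{11.88}
$$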
where the last equality follows by a simple computation.19 For the models of Jamshidian or
in Hull & White, bond prices are such that the cap price in Eq. (11.88) can be expressed in
closed-form. Indeed, Eq. (11.88) makes clear a cap is a basket of puts on zero coupon bonds,
with strikes Ki . As such, it can be priced in closed form, using the models in Sections 11.7.4.1
and 11.7.4.2. We have:
$$
p_{cap}(t) = \sum_{i=1}^{n} \frac{1}{K_i}\, \mathrm{Put}(t,T_{i-1};P(t,T_i),K_i,v), \tag{11.89}
$$
where Put (·) satisfies the put-call parity in Eq. (11.70), and, by the pricing formulae in Section
11.7.4.1,
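$$
\begin{aligned}
\mathrm{Call}(t,T_{i-1};P(t,T_i),K_i,v) &= P(t,T_i) \Phi(d_{1,i}) - K_i P(t,T_{i-1}) \Phi(d_{1,i} - v), \\
d_{1,i} &= \frac{\ln\frac{P(t,T_i)}{K_i P(t,T_{i-1})} + \frac{1}{2} v^2}{v}, \quad
v = \sigma \sqrt{\frac{1 - e^{-2\kappa(T_{i-1} - t)}}{2\kappa}}\, B(T_{i-1},T_i), \quad
B(t,T) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-t)}\right).
\end{aligned} \tag{11.90}
$$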
Naturally, caps on interest rates, which are nothing but baskets of calls on rates, are portfolios of puts
on zero coupon bonds, due to the inverse relation between prices and interest rates.20
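19 By the law of iterated expectations, and setting $X_i \equiv \left(1 - (1 + \delta_{i-1} K) P(T_{i-1},T_i)\right)^+ = K_i^{-1}\left(K_i - P(T_{i-1},T_i)\right)^+$,
$$
\begin{aligned}
\mathbb{E}\left[\frac{e^{-\int_t^{T_i} r(\tau)d\tau}}{P(T_{i-1},T_i)} X_i\right]
&= \mathbb{E}\left[\mathbb{E}\left(\left.\frac{e^{-\int_t^{T_i} r(\tau)d\tau}}{P(T_{i-1},T_i)} X_i \right| F(T_{i-1})\right)\right] \\
&= \mathbb{E}\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\, \frac{X_i}{P(T_{i-1},T_i)}\, \mathbb{E}\left(\left. e^{-\int_{T_{i-1}}^{T_i} r(\tau)d\tau} \right| F(T_{i-1})\right)\right] \\
&= \mathbb{E}\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau} X_i\right],
\end{aligned}
$$
where the last equality holds because the inner conditional expectation equals $P(T_{i-1},T_i)$.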
20 We might also price caps and floors through the partial differential equation (11.69), after setting π (r) = (r − k)+ (caps) and
π (r) = (k − r)+ (floors), for some strike k. However, this type of contracts, where payoffs are paid continuously in time, is highly
stylized, and does not exist in the markets.
11.7.4.6 Swaptions
Let us proceed with the example of the S&L institution in the previous sections. The benefit for
an S&L institution that is long caps is to be protected against upward movements in the short-term
rates while keeping the downside free. These benefits arise, so to speak, period by period,
in that a cap is a basket of options with different maturities. A swaption works differently, in
that the optionality kicks in “all together.” Suppose that at time $t$, the S&L institution is still
concerned about future inversions of the yield curve and, therefore, anticipates it might need
to go long a payer swap at some future date. At the same time, the institution might
fear that in the future, swap rates will be lower relative to some reference strike. Swaptions
allow the institution to free up such a downside risk, in that they simply are options to enter a swap contract
on a future date. Let the maturity date of this option be $T_0$. Then, at time $T_0$, the payoff for
a payer swaption is the maximum between zero and the value of a payer interest rate swap at
$T_0$, $p_{irs}(T_0)$, viz
$$
\left(p_{irs}(T_0)\right)^+ = \left[\sum_{i=1}^{n} \mathrm{IRS}(T_0,T_{i-1},T_i;K_{irs})\right]^+ = \left[\sum_{i=1}^{n} \delta_{i-1}\left(F(T_0,T_{i-1},T_i) - K_{irs}\right) P(T_0,T_i)\right]^+. \tag{11.91}
$$
By the FTAP, the value of the payer swaption at time t is:
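$$
\begin{aligned}
p_{swaption}(t) &= \mathbb{E}\left[e^{-\int_t^{T_0} r(\tau)d\tau} \left(\sum_{i=1}^{n} \delta_{i-1}\left(F(T_0,T_{i-1},T_i) - K_{irs}\right) P(T_0,T_i)\right)^+\right] \\
&= \mathbb{E}\left[e^{-\int_t^{T_0} r(\tau)d\tau} \left(1 - P(T_0,T_n) - K_{irs} \sum_{i=1}^{n} \delta_{i-1} P(T_0,T_i)\right)^+\right],
\end{aligned} \tag{11.92}
$$
where we used the relation $\delta_{i-1} F(T_0,T_{i-1},T_i) = \frac{P(T_0,T_{i-1})}{P(T_0,T_i)} - 1$.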
Eq. (11.92) is the expression for the price of a put option on a fixed coupon bond struck at
one. Therefore, we can price this contract in closed-form, through the models in Sections 11.7.4.1
and 11.7.4.2, similarly to what we did in the previous section for pricing caps. We have:
$$
p_{swaption}(t) = \mathrm{Put}(t,T_0;P_{fcb}(t,T_n),1,v),
$$
where $\mathrm{Put}(\cdot)$ satisfies the put-call parity in Eq. (11.70). By the pricing formulae in Section
11.7.4.1,
$$
\mathrm{Call}(t,T_0;P_{fcb}(t,T_n),1,v) = K_{irs} \sum_{i=1}^{n} \delta_{i-1}\, \mathrm{Call}(t,T_0;P(t,T_i),P_i^*,v) + \mathrm{Call}(t,T_0;P(t,T_n),P_n^*,v),
$$
where $\mathrm{Call}(t,T_0;P(t,T_i),P_i^*,v)$ is as in Eq. (11.90), with $P_i^* = P(r^*,T_0,T_i)$, and $r^*$ solution
to Eq. (11.81) for $K = 1$.
11.7.5 Market models

11.7.5.1 Models and market practice
As illustrated in the previous sections, models of the short-term rate can be used to obtain
closed-form solutions of virtually every important interest rate derivative product. The typical
examples are the Vasicek’s model and its perfectly fitting extension. Yet practitioners evaluate
caps through the Black’s (1976) formula. The assumption underlying the market practice is that
the simply-compounded forward rate is lognormally distributed. As it turns out, the analytically
tractable (Gaussian) short-term rate models are not consistent with this assumption. Clearly,
the (Gaussian) Vasicek’s model does not predict that the simply-compounded forward rates are
Geometric Brownian motions.21
Is it possible to address these issues through a non-Markovian HJM model? The answer is in
the affirmative, although some qualifications are necessary. A practical difficulty with HJM
is that instantaneous forward rates are not observed, which at first sight seems to be a
hindrance to realistic pricing of caps and swaptions, such an important portion of the interest
rate derivative markets. This point has been addressed by Brace, Gatarek and Musiela (1997),
Jamshidian (1997) and Miltersen, Sandmann and Sondermann (1997), who observed that the
HJM framework can be somehow “forced” to produce models ready to be used consistently
with the market practice. The key feature of these models is the emphasis on the dynamics of
the simply-compounded forward rates. One additional, and technical, assumption is that these
simply-compounded forward rates are lognormal under the risk-neutral probability Q. That is,
given a non-decreasing sequence of reset times $\{T_i\}_{i=0,1,\cdots}$, each simply-compounded rate, $F_i$, is
solution to the following stochastic differential equation:22
$$
\frac{dF_i(\tau)}{F_i(\tau)} = m_i(\tau)d\tau + \gamma_i(\tau)d\tilde{W}(\tau), \quad \tau \in [t,T_i], \quad i = 0,\cdots,n-1, \tag{11.93}
$$
where to simplify notation, we have set $F_i(\tau) \equiv F(\tau,T_i,T_{i+1})$, and $m_i$ and $\gamma_i$ are some
deterministic functions of time ($\gamma_i$ is vector valued). From a mathematical point of view, the
assumption that $F_i$ follows Eq. (11.93) is innocuous.23
As we shall show, this simple framework allows us to use the simple Black’s (1976) formula
to price caps and floors. However, we need to emphasize that there is nothing wrong with the
short-term rate models analyzed in previous sections. The real advance of the so-called market
model is to give a rigorous foundation to the standard market practice of pricing caps and floors
by means of Black’s (1976) formula.
11.7.5.2 Simply-compounded forward rate dynamics, and no-arb restrictions
By the definition of the simply-compounded forward rates in Eq. (11.5),

$$
\ln\frac{P(\tau,T_i)}{P(\tau,T_{i+1})} = \ln\left[1 + \delta_i F_i(\tau)\right]. \tag{11.94}
$$
The logic we follow, now, is the same as that underlying the HJM representation of Section
11.4. We wish to express the volatility of bond prices in terms of the volatility of forward rates.
To achieve this task, we first assume that bond prices are driven by Brownian motions and
expand the l.h.s. of Eq. (11.94) (step 1). Then, we expand the r.h.s. of Eq. (11.94) (step 2).
Finally, we identify the two diffusion terms derived from the previous two steps (step 3).
21 Indeed, $1 + \delta_i F_i(\tau) = \frac{P(\tau,T_i)}{P(\tau,T_{i+1})} = \exp\left[\Delta A_i(\tau) - \Delta B_i(\tau) r(\tau)\right]$, where $\Delta A_i(\tau) = A(\tau,T_i) - A(\tau,T_{i+1})$, and $\Delta B_i(\tau) =
B(\tau,T_i) - B(\tau,T_{i+1})$. Hence, $F_i(\tau)$ is not a Geometric Brownian motion, despite the fact that the short-term rate $r$ is Gaussian
and, hence, the bond price is log-normal. Black ’76 cannot be applied in this context.
22 Brace, Gatarek and Musiela (1997) derived their model by specifying the dynamics of the spot simply-compounded Libor
interest rates. Since Fi (Ti ) = L(Ti ) (see Eq. (??)), the two derivations are essentially the same.
23 It is well-known that lognormal instantaneous forward rates create mathematical problems to the money market account (see, for
example, Sandmann and Sondermann (1997) for a succinct overview on how this problem is easily handled with simply-compounded
forward rates).
Step 1: Let $P_i \equiv P(\tau,T_i)$, and assume that under the risk-neutral probability Q, $P_i$ is solution
to:
$$
\frac{dP_i}{P_i} = r\,d\tau + \sigma_{bi}\,d\tilde{W}.
$$
In terms of the HJM framework in Section 11.4,
$$
\sigma_{bi}(\tau) = -\sigma^I(\tau,T_i) = -\int_\tau^{T_i} \sigma(\tau,\ell)d\ell, \tag{11.95}
$$
where $\sigma(\tau,\ell)$ is the instantaneous volatility of the instantaneous $\ell$-forward rate as of time
$\tau$. By Itô’s lemma,
$$
d\ln\frac{P(\tau,T_i)}{P(\tau,T_{i+1})} = -\frac{1}{2}\left(\|\sigma_{bi}\|^2 - \|\sigma_{b,i+1}\|^2\right)d\tau + \left(\sigma_{bi} - \sigma_{b,i+1}\right)d\tilde{W}. \tag{11.96}
$$
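Step 2: Applying Itô’s lemma to $\ln[1 + \delta_i F_i(\tau)]$, and using Eq. (11.93), yields:
$$
\begin{aligned}
d\ln\left[1 + \delta_i F_i(\tau)\right] &= \frac{\delta_i}{1 + \delta_i F_i}\,dF_i - \frac{1}{2}\frac{\delta_i^2}{(1 + \delta_i F_i)^2}\,(dF_i)^2 \\
&= \left[\frac{\delta_i m_i F_i}{1 + \delta_i F_i} - \frac{1}{2}\frac{\delta_i^2 F_i^2 \|\gamma_i\|^2}{(1 + \delta_i F_i)^2}\right]d\tau + \frac{\delta_i F_i}{1 + \delta_i F_i}\,\gamma_i\,d\tilde{W}.
\end{aligned} \tag{11.97}
$$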
Step 3: By Eq. (11.94), the diffusion terms in Eqs. (11.96) and (11.97) have to be the same.
Therefore,
$$
\sigma_{bi}(\tau) - \sigma_{b,i+1}(\tau) = \frac{\delta_i F_i(\tau)}{1 + \delta_i F_i(\tau)}\,\gamma_i(\tau), \quad \tau \in [t,T_i].
$$
By summing over $i$, we get the following no-arbitrage restriction applying to the volatility
of the bond prices:
$$
\sigma_{bi}(\tau) - \sigma_{b,0}(\tau) = -\sum_{j=0}^{i-1} \frac{\delta_j F_j(\tau)}{1 + \delta_j F_j(\tau)}\,\gamma_j(\tau). \tag{11.98}
$$
As is clear, Eq. (11.98) is merely a restriction to the general HJM framework. In other
words, assume the instantaneous forward rates are as in Eq. (11.59) of Section 11.4. As we
demonstrated in Section 11.4, then, the bond price volatility is given by Eq. (11.95). But if we
also assume that simply-compounded forward rates are solution to Eq. (11.93), then, the bond
price volatility is also equal to Eq. (11.98). Comparing Eq. (11.95) with Eq. (11.98) produces,
$$
\int_{T_0}^{T_i} \sigma(\tau,\ell)d\ell = \sum_{j=0}^{i-1} \frac{\delta_j F_j(\tau)}{1 + \delta_j F_j(\tau)}\,\gamma_j(\tau).
$$
The practical interest to restrict the forward-rate volatility dynamics in this way lies in the
possibility to obtain closed-form solutions for some of the interest rates derivatives surveyed in
Section 11.7.3.
11.7.5.3 Pricing formulae

Caps & floors
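We provide analytical results for the price of caps only. We have:
$$
\begin{aligned}
p_{cap}(t) &= \sum_{i=1}^{n} \mathbb{E}\left[e^{-\int_t^{T_i} r(\tau)d\tau}\, \delta_{i-1}\left(L(T_{i-1},T_i) - K\right)^+\right] \\
&= \sum_{i=1}^{n} \mathbb{E}\left[e^{-\int_t^{T_i} r(\tau)d\tau}\, \delta_{i-1}\left(F(T_{i-1},T_{i-1},T_i) - K\right)^+\right] \\
&= \sum_{i=1}^{n} \delta_{i-1} P(t,T_i) \cdot \mathbb{E}_{Q_F^{T_i}}\left[F(T_{i-1},T_{i-1},T_i) - K\right]^+,
\end{aligned} \tag{11.99}
$$
where $\mathbb{E}_{Q_F^{T_i}}[\cdot]$ denotes, as usual, the expectation taken under the $T_i$-forward martingale
probability $Q_F^{T_i}$; the first equality is Eq. (11.87); the second uses $L(T_{i-1},T_i) = F(T_{i-1},T_{i-1},T_i)$; and the last equality has been obtained through
the usual change of probability technique introduced in Section 11.1.4.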
The key point is that
$$
F_{i-1}(\tau) \equiv F(\tau,T_{i-1},T_i), \quad \tau \in [t,T_{i-1}], \quad \text{is a martingale under } Q_F^{T_i}.
$$
A proof of this statement was given in Section 11.1. By Eq. (11.93), this means that $F_{i-1}(\tau)$ is
solution to:
$$
\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau)\,dW^{Q_F^{T_i}}(\tau), \quad \tau \in [t,T_{i-1}], \quad i = 1,\cdots,n,
$$
under $Q_F^{T_i}$. Therefore, the cap price in Eq. (11.99) reduces to that of Black (1976), once we
assume $\gamma$ is deterministic:
$$
\mathbb{E}_{Q_F^{T_i}}\left[F(T_{i-1},T_{i-1},T_i) - K\right]^+ = F_{i-1}(t) \Phi(d_{1,i-1}) - K \Phi(d_{1,i-1} - s_i), \tag{11.100}
$$
where
$$
d_{1,i-1} = \frac{\ln\frac{F_{i-1}(t)}{K} + \frac{1}{2} s_i^2}{s_i}, \quad s_i^2 = \int_t^{T_{i-1}} \|\gamma_{i-1}(\tau)\|^2 d\tau.
$$
A derivation of the Black’s formula is provided in Appendix 8.
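As a minimal numerical sketch of Eqs. (11.99) and (11.100), the code below prices a cap as a sum of Black caplets. It assumes, for illustration only, that each $\gamma_{i-1}$ is a scalar constant, so that $s_i = \gamma_{i-1}\sqrt{T_{i-1} - t}$, and it uses made-up forward rates and discount factors.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black76(F, K, s):
    """Eq. (11.100): F Phi(d1) - K Phi(d1 - s), with s the total volatility."""
    if s <= 0.0:
        return max(F - K, 0.0)           # degenerate case: no uncertainty left
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return F * Phi(d1) - K * Phi(d1 - s)

def cap_price(forwards, deltas, discounts, resets, gammas, K, t=0.0):
    """Eq. (11.99) with constant scalar volatilities (an assumption of this sketch).

    forwards[i] = F_{i-1}(t), discounts[i] = P(t, T_i), resets[i] = T_{i-1}.
    """
    total = 0.0
    for F, d, P, T_reset, g in zip(forwards, deltas, discounts, resets, gammas):
        s = g * math.sqrt(T_reset - t)
        total += d * P * black76(F, K, s)
    return total

# Hypothetical three-caplet cap struck at 3%.
forwards = [0.030, 0.032, 0.034]
deltas = [0.5, 0.5, 0.5]
discounts = [0.970, 0.955, 0.940]
resets = [0.5, 1.0, 1.5]
gammas = [0.20, 0.22, 0.21]
cap = cap_price(forwards, deltas, discounts, resets, gammas, K=0.03)
```

Note that `black76` returns an undiscounted expectation; discounting enters through the factors $P(t,T_i)$ in `cap_price`, exactly as in Eq. (11.99).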
Swaptions
By Eq. (11.86), the payoff of a payer swaption expiring at time $T_0$ is:
$$
\left[p_{irs}(T_0)\right]^+ = \mathrm{PVBP}_{T_0}(T_1,\cdots,T_n) \left(R_{swap}(T_0) - K_{irs}\right)^+, \quad \mathrm{PVBP}_{T_0}(T_1,\cdots,T_n) = \sum_{i=1}^{n} \delta_{i-1} P(T_0,T_i).
$$
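Therefore, by the FTAP, and a change of measure,
$$
\begin{aligned}
p_{swaption}(t) &= \mathbb{E}\left[e^{-\int_t^{T_0} r(\tau)d\tau}\, \mathrm{PVBP}_{T_0}(T_1,\cdots,T_n) \left(R_{swap}(T_0) - K_{irs}\right)^+\right] \\
&= \mathrm{PVBP}_t(T_1,\cdots,T_n) \cdot \mathbb{E}_{Q_{swap}}\left(R_{swap}(T_0) - K_{irs}\right)^+,
\end{aligned} \tag{11.101}
$$
where $\mathbb{E}_{Q_{swap}}$ denotes the expectation taken under the so-called forward swap probability, defined
by:
$$
\left.\frac{dQ_{swap}}{dQ}\right|_{F_{T_0}} = e^{-\int_t^{T_0} r(\tau)d\tau}\, \frac{\mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)}{\mathrm{PVBP}_t(T_1,\cdots,T_n)}.
$$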

It is easy to see that $\mathbb{E}\left[\left.\frac{dQ_{swap}}{dQ}\right|_{F_{T_0}}\right] = 1$, by using the definition of $\mathrm{PVBP}_{T_0}(T_1,\cdots,T_n)$, and
the pricing equation, $P(t,T_i) = \mathbb{E}\left[e^{-\int_t^{T_0} r(\tau)d\tau} P(T_0,T_i)\right]$. The key point underlying this change
of measure is that the forward swap rate $R_{swap}$ is a $Q_{swap}$-martingale.24 And naturally, it is
positive. Therefore, it must satisfy:
$$
\frac{dR_{swap}(\tau)}{R_{swap}(\tau)} = \gamma_{swap}(\tau)\,dW_{swap}(\tau), \quad \tau \in [t,T_0], \tag{11.102}
$$
where $W_{swap}$ is a $Q_{swap}$-Brownian motion, and $\gamma_{swap}(\tau)$ is adapted.

If the volatility $\gamma_{swap}(\tau)$ in Eq. (11.102) is deterministic, we can use Black 76 to price the
payer swaption in Eq. (11.101) in closed-form. We have:
$$
p_{swaption}(t) = \mathrm{PVBP}_t(T_1,\cdots,T_n) \cdot \mathrm{Black76}(R_{swap}(t);T_0,K_{irs},\bar{V}), \tag{11.103}
$$
where $\mathrm{Black76}(\cdot)$ is given by Black’s (1976) formula:
$$
\mathrm{Black76}(R_{swap}(t);T_0,K_{irs},\bar{V}) = R_{swap}(t)\,\Phi(d_t) - K_{irs}\,\Phi\left(d_t - \sqrt{\bar{V}}\right),
$$
$$
d_t = \frac{\ln\frac{R_{swap}(t)}{K_{irs}} + \frac{1}{2}\bar{V}}{\sqrt{\bar{V}}}, \quad \bar{V} = \int_t^{T_0} \gamma_{swap}(\tau)^2 d\tau.
$$
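Eq. (11.103) is straightforward to code once the PVBP and the forward swap rate are available. The sketch below also returns a receiver swaption value through the corresponding put expression, which is a standard counterpart rather than a formula stated in the text, so that the two can be checked against the swap value in Eq. (11.86). Inputs are made up.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black76_swaption(R, K, V_bar, pvbp, payer=True):
    """Eq. (11.103); V_bar is the integrated variance of the forward swap rate.

    The receiver branch is the put counterpart, added here as an assumption
    in order to check payer-receiver parity.
    """
    sq = math.sqrt(V_bar)
    d = (math.log(R / K) + 0.5 * V_bar) / sq
    if payer:
        return pvbp * (R * Phi(d) - K * Phi(d - sq))
    return pvbp * (K * Phi(sq - d) - R * Phi(-d))

# Hypothetical inputs: 1-year option, constant gamma_swap = 20%.
R_swap, K_irs, pvbp_t = 0.043, 0.04, 1.86
V_bar = 0.20**2 * 1.0
payer = black76_swaption(R_swap, K_irs, V_bar, pvbp_t, payer=True)
receiver = black76_swaption(R_swap, K_irs, V_bar, pvbp_t, payer=False)
```

The difference `payer - receiver` equals `pvbp_t * (R_swap - K_irs)`, the value of the forward-starting payer swap.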
Inconsistencies
Unfortunately, if forward rates are solutions to Eq. (11.93), $\gamma_{swap}$ cannot be deterministic. Conversely, if
forward swap rates are lognormal, then Eq. (11.93) cannot hold. Therefore, we may use Black’s
formula to price either caps or swaptions, not both. This might limit the importance of market
models. There are a couple of tricks that seem to work in practice. The best known is based on a suggestion
by Rebonato (1998), to replace the true pricing problem with an approximating pricing problem
where $\gamma_{swap}$ is deterministic. That works in practice, but in a world with stochastic volatility,
we should expect that trick to generate unstable prices in periods experiencing highly volatile
volatility. See, also, Rebonato (1999) for an essay on related issues. The next section suggests
using numerical approximations based on Monte Carlo techniques.
11.7.5.4 Numerical approximations
Suppose forward rates are lognormal. Then, we can price caps using Black’s formula. As for
swaptions, Monte Carlo integration should be implemented as follows. By a change of measure,
$$
\begin{aligned}
p_{swaption}(t) &= \mathbb{E}\left[e^{-\int_t^{T_0} r(\tau)d\tau} \left(\sum_{i=1}^{n} \delta_{i-1}\left(F(T_0,T_{i-1},T_i) - K\right) P(T_0,T_i)\right)^+\right] \\
&= P(t,T_0)\, \mathbb{E}_{Q_F^{T_0}}\left[\sum_{i=1}^{n} \delta_{i-1}\left(F(T_0,T_{i-1},T_i) - K\right) P(T_0,T_i)\right]^+,
\end{aligned}
$$
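24 By Eq. (11.85), and one change of measure,
$$
\mathbb{E}_{Q_{swap}}\left[R_{swap}(\tau)\right] = \mathbb{E}_{Q_{swap}}\left[\frac{P(\tau,T_0) - P(\tau,T_n)}{\mathrm{PVBP}_\tau(T_1,\cdots,T_n)}\right] = \mathbb{E}\left[\frac{e^{-\int_t^{\tau} r(u)du}\left(P(\tau,T_0) - P(\tau,T_n)\right)}{\mathrm{PVBP}_t(T_1,\cdots,T_n)}\right] = \frac{P(t,T_0) - P(t,T_n)}{\mathrm{PVBP}_t(T_1,\cdots,T_n)} = R_{swap}(t).
$$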
where $F(T_0,T_{i-1},T_i)$, $i = 1,\cdots,n$, can be simulated under $Q_F^{T_0}$.

Details are as follows. We know that
$$
\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau)\,dW^{Q_F^{T_i}}(\tau). \tag{11.104}
$$
By results in Appendix 3, we also know that:
$$
\begin{aligned}
dW^{Q_F^{T_i}}(\tau) &= dW^{Q_F^{T_0}}(\tau) - \left[\sigma_{bi}(\tau) - \sigma_{b0}(\tau)\right]d\tau \\
&= dW^{Q_F^{T_0}}(\tau) + \sum_{j=0}^{i-1} \frac{\delta_j F_j(\tau)}{1 + \delta_j F_j(\tau)}\,\gamma_j(\tau)\,d\tau,
\end{aligned}
$$
where the second line follows from Eq. (11.98) in the main text. Replacing this into Eq. (11.104)
leaves:
$$
\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau) \sum_{j=0}^{i-1} \frac{\delta_j F_j(\tau)}{1 + \delta_j F_j(\tau)}\,\gamma_j(\tau)\,d\tau + \gamma_{i-1}(\tau)\,dW^{Q_F^{T_0}}(\tau), \quad i = 1,\cdots,n.
$$
These can easily be simulated with the methods described in any standard textbook such as
Kloeden and Platen (1992).
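A minimal simulation sketch of the scheme above follows. It is only illustrative and rests on several assumptions not made in the text: a single Brownian factor, constant scalar volatilities $\gamma_j$, an Euler discretization applied to $\ln F_j$ with the drift frozen over each step, $t = 0$, and made-up inputs. Bond prices at $T_0$ are rebuilt from the simulated forward rates through $P(T_0,T_i) = \prod_{j=0}^{i-1}(1 + \delta_j F_j(T_0))^{-1}$.

```python
import numpy as np

def mc_payer_swaption(F0, deltas, gammas, K, T0, P0_T0,
                      n_paths=50_000, n_steps=50, seed=0):
    """Monte Carlo payer swaption price under Q^{T0} (one-factor sketch)."""
    rng = np.random.default_rng(seed)
    F0, deltas, gammas = (np.asarray(x, dtype=float) for x in (F0, deltas, gammas))
    dt = T0 / n_steps
    log_f = np.tile(np.log(F0), (n_paths, 1))      # column k holds ln F_k
    for _ in range(n_steps):
        f = np.exp(log_f)
        # delta_j F_j gamma_j / (1 + delta_j F_j), then partial sums over j <= k
        weights = deltas * f / (1.0 + deltas * f) * gammas
        drift = gammas * np.cumsum(weights, axis=1)
        z = rng.standard_normal((n_paths, 1))      # one shock shared by all rates
        log_f += (drift - 0.5 * gammas**2) * dt + gammas * np.sqrt(dt) * z
    f = np.exp(log_f)
    bonds = np.cumprod(1.0 / (1.0 + deltas * f), axis=1)   # P(T0, T_i), i = 1..n
    payoff = np.maximum(np.sum(deltas * (f - K) * bonds, axis=1), 0.0)
    return P0_T0 * payoff.mean()

# Hypothetical inputs: four semi-annual forward rates, option expiring at T0 = 1.
F0 = np.array([0.030, 0.032, 0.034, 0.035])
deltas = np.full(4, 0.5)
gammas = np.array([0.20, 0.20, 0.18, 0.18])
price = mc_payer_swaption(F0, deltas, gammas, K=0.033, T0=1.0, P0_T0=0.97)
```

A useful sanity check is the case $K = 0$, where the payoff telescopes to $1 - P(T_0,T_n)$ and the price must be close to $P(t,T_0) - P(t,T_n)$ up to simulation and discretization error.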
11.7.5.5 Volatility surfaces
Caps & floors
The market practice relies on the models of this section, rather than those of Sections 11.7.4.1-
11.7.4.2, in providing volatility surfaces. In the models of Sections 11.7.4.1-11.7.4.2, volatility
surfaces might be produced, but only indirectly, after calibration of the two parameters $\kappa$ and
$\sigma$, as Eq. (11.89) indicates. It is easier, however, to provide volatility surfaces in the first place,
through the models of this section. Quite simply, traders use Eq. (11.100) and quote volatilities
such that the market price of a cap equals the value predicted by Eq. (11.100) using the
desired implied volatility $s_i$. In Eq. (11.100),
$$
s_i = \sqrt{T_{i-1} - t} \cdot \gamma(i),
$$
for some $\gamma(i)$, although traders simply quote the single value $\hat{\gamma}_n$ that satisfies:
$$
\hat{\gamma}_n : \quad p^{\$}_{cap}(t;n) = \sum_{i=1}^{n} \delta_{i-1} P(t,T_i) \cdot \mathrm{Black76}(F_{i-1}(t);K,\hat{s}_{i,n}),
$$
where $p^{\$}_{cap}(t;n)$ is the market price of the cap, and:
$$
\mathrm{Black76}(F_{i-1}(t);K,\hat{s}_{i,n}) = F_{i-1}(t)\,\Phi\left(d^n_{1,i-1}\right) - K\,\Phi\left(d^n_{1,i-1} - \hat{s}_{i,n}\right),
$$
$$
d^n_{1,i-1} = \frac{\ln\frac{F_{i-1}(t)}{K} + \frac{1}{2}\hat{s}_{i,n}^2}{\hat{s}_{i,n}}, \quad \hat{s}_{i,n} = \sqrt{T_{i-1} - t} \cdot \hat{\gamma}_n.
$$
Given $\hat{\gamma}_n$, we can bootstrap $\hat{\gamma}(i)$, i.e. we can recursively solve for $\hat{\gamma}(i)$, as follows:
$$
0 = \sum_{i=1}^{n} \delta_{i-1} P(t,T_i) \cdot \left[\mathrm{Black76}(F_{i-1}(t);K,\hat{s}_{i,n}) - \mathrm{Black76}(F_{i-1}(t);K,\hat{s}_i)\right], \quad n = 1,\cdots,N,
$$
where $N$ is the latest available maturity, and $\hat{s}_i = \sqrt{T_{i-1} - t} \cdot \hat{\gamma}(i)$. The values of $\hat{\gamma}(i)$ constitute
what is known as the term structure of caps volatilities.
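The recursion above can be sketched as follows: for each $n$, the caplets $i < n$ are valued at the already bootstrapped $\hat{\gamma}(i)$, and $\hat{\gamma}(n)$ is the volatility that makes the last caplet absorb the residual cap value. The bisection bracket, the helper names and all market inputs are assumptions made for this illustration, with $t = 0$.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black76(F, K, s):
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return F * Phi(d1) - K * Phi(d1 - s)

def caplet(i, gamma, F, K, delta, P, reset):
    """Caplet i (0-based) valued with s = gamma * sqrt(T_{i-1} - t), t = 0."""
    return delta[i] * P[i] * black76(F[i], K, gamma * math.sqrt(reset[i]))

def solve_increasing(f, lo=1e-6, hi=5.0, n_iter=100):
    """Bisection root of an increasing function on a hypothetical bracket."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bootstrap(flat_vols, F, K, delta, P, reset):
    """Recover gamma-hat(i) from the quoted flat volatilities gamma-hat_n."""
    spot = []
    for n, g_flat in enumerate(flat_vols, start=1):
        cap_n = sum(caplet(i, g_flat, F, K, delta, P, reset) for i in range(n))
        known = sum(caplet(i, spot[i], F, K, delta, P, reset) for i in range(n - 1))
        residual, last = cap_n - known, n - 1
        spot.append(solve_increasing(
            lambda g: caplet(last, g, F, K, delta, P, reset) - residual))
    return spot

# Hypothetical market data; flat vols are generated from known caplet vols
# so that the bootstrap can be checked against them.
F, K = [0.030, 0.032, 0.034], 0.03
delta, P, reset = [0.5] * 3, [0.970, 0.955, 0.940], [0.5, 1.0, 1.5]
true_spot = [0.20, 0.25, 0.22]
flat = []
for n in range(1, 4):
    target = sum(caplet(i, true_spot[i], F, K, delta, P, reset) for i in range(n))
    flat.append(solve_increasing(
        lambda g: sum(caplet(i, g, F, K, delta, P, reset) for i in range(n)) - target))
recovered = bootstrap(flat, F, K, delta, P, reset)
```

Bisection works here because each Black price is increasing in its volatility argument; a Newton step based on vega would converge faster but needs a safeguard for deep out-of-the-money caplets.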
Swaptions
As for swaptions, the situation is much simpler. The market practice is to quote swaptions
through standard implied vols, i.e. those vols $IV_t$ that, once inserted into Eq. (11.103),
deliver the swaption market price:
$$
p_{swaption}(t) = \mathrm{PVBP}_t(T_1,\cdots,T_n) \cdot \mathrm{Black76}(R_{swap}(t);T_0,K_{irs},IV_t).
$$
11.8 Appendix 1: The FTAP for bond prices

Suppose there exist $m$ pure discount bond prices $\{\{P_i \equiv P(\tau,T_i)\}_{i=1}^{m}\}_{\tau\in[t,T]}$ satisfying:
$$
\frac{dP_i}{P_i} = \mu_{bi} \cdot d\tau + \sigma_{bi} \cdot dW, \quad i = 1,\cdots,m, \tag{11A.1}
$$
where $W$ is a Brownian motion in $\mathbb{R}^d$, and $\mu_{bi}$ and $\sigma_{bi}$ are progressively $F(\tau)$-measurable functions
guaranteeing the existence of a strong solution to the previous system ($\sigma_{bi}$ is vector-valued). The value
process $V$ of a self-financing portfolio in these $m$ bonds and a money market technology satisfies:
$$
dV = \left[\pi^\top(\mu_b - 1_m r) + rV\right]d\tau + \pi^\top\sigma_b\,dW,
$$
where $\pi$ is some portfolio, $1_m$ is an $m$-dimensional vector of ones, and
$$
\mu_b = (\mu_{b1},\cdots,\mu_{bm})^\top, \quad \sigma_b = (\sigma_{b1},\cdots,\sigma_{bm})^\top.
$$
Next, suppose that there exists a portfolio $\pi$ such that $\pi^\top\sigma_b = 0$. This is an arbitrage opportunity if
there exist events for which, at some time, $\pi^\top(\mu_b - 1_m r) \neq 0$ (use $\pi$ when $\pi^\top(\mu_b - 1_m r) > 0$, and $-\pi$ when
$\pi^\top(\mu_b - 1_m r) < 0$: $V$ will then be appreciating at a deterministic rate that is strictly greater
than $r$). Therefore, arbitrage opportunities are ruled out if:
$$
\pi^\top(\mu_b - 1_m r) = 0 \quad \text{whenever} \quad \pi^\top\sigma_b = 0.
$$
In other terms, arbitrage opportunities are ruled out when every vector in the null space of $\sigma_b$ is
orthogonal to $\mu_b - 1_m r$, or when there exists a $\lambda$ taking values in $\mathbb{R}^d$ satisfying some basic integrability
conditions, and such that
$$
\mu_b - 1_m r = \sigma_b \lambda
$$
or,
$$
\mu_{bi} - r = \sigma_{bi}\lambda, \quad i = 1,\cdots,m. \tag{11A.2}
$$
In this case,
$$
\frac{dP_i}{P_i} = (r + \sigma_{bi}\lambda) \cdot d\tau + \sigma_{bi} \cdot dW, \quad i = 1,\cdots,m.
$$
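Now define $\tilde{W} = W + \int \lambda\,d\tau$, $\frac{dQ}{dP} = \exp\left(-\int_t^T \lambda^\top dW - \frac{1}{2}\int_t^T \|\lambda\|^2 d\tau\right)$. The Q-martingale property of
the “normalized” bond price processes now easily follows by Girsanov’s theorem. Indeed, define for a
generic $i$, $P(\tau,T) \equiv P(\tau,T_i) \equiv P_i$, and:
$$
g(\tau) \equiv e^{-\int_t^{\tau} r(u)du} \cdot P(\tau,T), \quad \tau \in [t,T].
$$
By Girsanov’s theorem, and an application of Itô’s lemma,
$$
\frac{dg}{g} = \sigma_{bi} \cdot d\tilde{W}, \quad \text{under } Q.
$$
Therefore, for all $\tau \in [t,T]$, $g(\tau) = \mathbb{E}[g(T)]$, with the expectation taken under Q conditionally upon $F(\tau)$, implying that:
$$
g(\tau) \equiv e^{-\int_t^{\tau} r(u)du} \cdot P(\tau,T) = \mathbb{E}[g(T)] = \mathbb{E}\Big[e^{-\int_t^{T} r(u)du} \cdot \underbrace{P(T,T)}_{=1}\Big] = \mathbb{E}\left[e^{-\int_t^{T} r(u)du}\right],
$$
or
$$
P(\tau,T) = e^{\int_t^{\tau} r(u)du} \cdot \mathbb{E}\left[e^{-\int_t^{T} r(u)du}\right] = \mathbb{E}\left[e^{-\int_\tau^{T} r(u)du}\right], \quad \text{all } \tau \in [t,T],
$$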
which is Eq. (11.4).

Notice that no assumption has been made on $m$. The previous result holds for all $m$, be it less
or greater than $d$. Suppose, for example, that there are no other traded assets in the economy. Then,
if $m < d$, there exists an infinite number of risk-neutral probabilities Q. If $m = d$, there exists one and
only one risk-neutral probability Q. If $m > d$, there exists one and only one risk-neutral probability
but then, the various bond prices have to satisfy some basic no-arbitrage restrictions. As an example,
take $m = 2$ and $d = 1$. Eq. (11A.2) then becomes
$$
\frac{\mu_{b1} - r}{\sigma_{b1}} = \lambda = \frac{\mu_{b2} - r}{\sigma_{b2}}.
$$
In other terms, the Sharpe ratio of any two bonds must be identical. Relation (11A.2) will be used
several times in this chapter.
• In Section 11.3, it is assumed that the primitive of the economy is the short-term rate, solution
of a multidimensional diffusion process, and µbi and σbi will be derived via Itô’s lemma.
• In Section 11.4, µbi and σbi are restricted through a model describing the evolution of the forward
rates.
11.9 Appendix 2: Certainty equivalent interpretation of forward prices

Multiply both sides of the bond pricing equation (11.4) by the amount $S(T)$:
$$
P(t,T) \cdot S(T) = \mathbb{E}\left[e^{-\int_t^{T} r(\tau)d\tau}\right] \cdot S(T).
$$
Suppose momentarily that $S(T)$ is known at time $t$. In this case, we have:
$$
P(t,T) \cdot S(T) = \mathbb{E}\left[e^{-\int_t^{T} r(\tau)d\tau} \cdot S(T)\right].
$$
But in the applications we have in mind, $S(T)$ is random. Define then its certainty equivalent by the
number $\bar{S}(T)$ that solves:
$$
P(t,T) \cdot \bar{S}(T) = \mathbb{E}\left[e^{-\int_t^{T} r(\tau)d\tau} \cdot S(T)\right],
$$
or
$$
\bar{S}(T) = \mathbb{E}\left[\eta_T(T) \cdot S(T)\right], \tag{11A.3}
$$
where $\eta_T(T)$ has been defined in (11.16).
Comparing Eq. (11A.3) with Eq. (11.15) reveals that forward prices can be interpreted in terms of
the previously defined certainty equivalent.
11.10 Appendix 3: Additional results on T -forward martingale probabilities

Eq. (11.16) defines $\eta_T(T)$ as:
$$
\eta_T(T) = \frac{e^{-\int_t^{T} r(\tau)d\tau} \cdot 1}{\mathbb{E}\left[e^{-\int_t^{T} r(\tau)d\tau}\right]}.
$$
More generally, we can define a density process as:
$$
\eta_T(\tau) \equiv \frac{e^{-\int_t^{\tau} r(u)du} \cdot P(\tau,T)}{\mathbb{E}\left[e^{-\int_t^{T} r(\tau)d\tau}\right]}, \quad \tau \in [t,T].
$$
By the FTAP, $\{\exp(-\int_t^{\tau} r(u)du) \cdot P(\tau,T)\}_{\tau\in[t,T]}$ is a Q-martingale (see Appendix 1 to this chapter).
Therefore, $\mathbb{E}\left[\left.\frac{dQ_F^T}{dQ}\right| F_\tau\right] = \mathbb{E}\left[\eta_T(T) \mid F_\tau\right] = \eta_T(\tau)$ for all $\tau \in [t,T]$, and in particular, $\eta_T(t) = 1$. We now
verify this and, at the same time, derive a representation of $\eta_T(\tau)$ that
can be used to find “forward premia.”
We begin with the dynamic representation (11A.1) given for a generic bond price $i$, $P(\tau,T) \equiv
P(\tau,T_i) \equiv P_i$:
$$
\frac{dP}{P} = \mu \cdot d\tau + \sigma \cdot dW,
$$
where we have defined $\mu \equiv \mu_{bi}$ and $\sigma \equiv \sigma_{bi}$.
Under the risk-neutral probability Q,
$$
\frac{dP}{P} = r \cdot d\tau + \sigma \cdot d\tilde{W},
$$
where $\tilde{W} = W + \int \lambda\,d\tau$ is a Q-Brownian motion.
By Itô’s lemma,
$$
\frac{d\eta_T(\tau)}{\eta_T(\tau)} = -\left[-\sigma(\tau,T)\right] \cdot d\tilde{W}(\tau), \quad \eta_T(t) = 1.
$$
The solution is:
$$
\eta_T(\tau) = \exp\left[-\frac{1}{2}\int_t^{\tau} \|\sigma(u,T)\|^2 du - \int_t^{\tau} \left(-\sigma(u,T)\right) \cdot d\tilde{W}(u)\right].
$$
Under the usual integrability conditions, we can now use Girsanov’s theorem and conclude that
$$
W^{Q_F^T}(\tau) \equiv \tilde{W}(\tau) + \int_t^{\tau} \left(-\sigma(u,T)^\top\right) du \tag{11A.4}
$$
is a Brownian motion under the $T$-forward martingale probability $Q_F^T$.

Finally, note that for all integers $i$ and non-decreasing sequences of dates $\{T_i\}_{i=0,1,\cdots}$,
$$
W^{Q_F^{T_i}}(\tau) = \tilde{W}(\tau) + \int_t^{\tau} \left(-\sigma(u,T_i)^\top\right) du, \quad i = 0,1,\cdots.
$$
Therefore,
$$
W^{Q_F^{T_i}}(\tau) = W^{Q_F^{T_{i-1}}}(\tau) - \int_t^{\tau} \left[\sigma(u,T_i)^\top - \sigma(u,T_{i-1})^\top\right] du, \quad i = 1,2,\cdots, \tag{11A.5}
$$
is a Brownian motion under the $T_i$-forward martingale probability $Q_F^{T_i}$. Eqs. (11A.5) and (11A.4) are
used in Section 11.7 on interest rate derivatives.
11.11 Appendix 4: Principal components analysis

Principal component analysis transforms the original data into a set of uncorrelated variables, the
principal components, with variances arranged in descending order. Consider the following program,
$$
\max_{C_1}\left[\mathrm{var}(Y_1)\right] \quad \text{s.t. } C_1^\top C_1 = 1,
$$
where var (Y1 ) = C1⊤ ΣC1 , and the constraint is an identification constraint. The first order conditions
lead to,
(Σ − λI) C1 = 0,
where λ is a Lagrange multiplier. The previous condition tells us that λ must be one eigenvalue of
the matrix Σ, and that C1 must be the corresponding eigenvector. Moreover, we have var (Y1 ) =
C1⊤ ΣC1 = λ which is clearly maximized by the largest eigenvalue. Suppose that the eigenvalues of Σ
are distinct, and let us arrange them in descending order, i.e. λ1 > · · · > λp . Then,
var (Y1 ) = λ1 .

Therefore, the first principal component is $Y_1 = C_1^\top(R - \bar{R})$, where $C_1$ is the eigenvector corresponding
to the largest eigenvalue, $\lambda_1$.
Next, consider the second principal component. The program is, now,
$$
\max_{C_2}\left[\mathrm{var}(Y_2)\right] \quad \text{s.t. } C_2^\top C_2 = 1 \text{ and } C_2^\top C_1 = 0,
$$
where var (Y2 ) = C2⊤ ΣC2 . The first constraint, C2⊤ C2 = 1, is the usual identification constraint. The
second constraint, C2⊤ C1 = 0, is needed to ensure that Y1 and Y2 are orthogonal, i.e. E (Y1 Y2 ) = 0.
The first order conditions for this problem are,
0 = ΣC2 − λC2 − νC1
where λ is the Lagrange multiplier associated with the first constraint, and ν is the Lagrange multiplier
associated with the second constraint. By pre-multiplying the first order conditions by C1⊤ ,
0 = C1⊤ ΣC2 − ν,
where we have used the two constraints $C_1^\top C_2 = 0$ and $C_1^\top C_1 = 1$. Since $\Sigma$ is symmetric and
$\Sigma C_1 = \lambda_1 C_1$, we have $C_1^\top \Sigma C_2 = \lambda_1 C_1^\top C_2 = 0$, by the constraint
$C_1^\top C_2 = 0$. Hence, $\nu = 0$. So the first order conditions can be rewritten as,
$$
(\Sigma - \lambda I) C_2 = 0.
$$
The solution is now $\lambda_2$, and $C_2$ is the eigenvector corresponding to $\lambda_2$. (Indeed, this time we cannot
choose $\lambda_1$, as this choice would imply that $Y_2 = C_1^\top(R - \bar{R})$, implying that $E(Y_1 Y_2) \neq 0$.) It follows
that $\mathrm{var}(Y_2) = \lambda_2$.
In general, we have,
var (Yi ) = λi , i = 1, · · · , p.
Let Λ be the diagonal matrix with the eigenvalues λi on the diagonal. By the spectral decomposition
of Σ, Σ = CΛC ⊤ , and by the orthonormality of C, C ⊤ C = I, we have that C ⊤ ΣC = Λ and, hence,
$$\sum_{i=1}^{p} \mathrm{var}(R_i) = \mathrm{Tr}(\Sigma) = \mathrm{Tr}(\Sigma C C^\top) = \mathrm{Tr}(C^\top \Sigma C) = \mathrm{Tr}(\Lambda).$$
Hence, Eq. (11.23) follows.
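
As a numerical illustration of the derivation above, the following minimal Python sketch extracts the eigenvalues of a small covariance matrix and checks that they sum to Tr(Σ). The matrix `sigma_example` and all helper names are hypothetical, chosen only for illustration. Power iteration with deflation is used merely to keep the example free of external libraries; it is not a method prescribed by the text.

```python
import math

def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_eigenpair(sigma, iters=2000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(sigma)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(sigma, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(sigma, v)), v

def principal_components(sigma):
    """Eigenpairs (lambda_i, C_i) in descending order of lambda_i, found by deflation."""
    n = len(sigma)
    work = [row[:] for row in sigma]
    pairs = []
    for _ in range(n):
        lam, c = top_eigenpair(work)
        pairs.append((lam, c))
        # Remove the component just found, so the next pass finds the next eigenvalue.
        work = [[work[i][j] - lam * c[i] * c[j] for j in range(n)] for i in range(n)]
    return pairs

# Hypothetical covariance matrix with distinct eigenvalues.
sigma_example = [[4.0, 2.0, 0.6], [2.0, 2.0, 0.5], [0.6, 0.5, 1.0]]
pairs = principal_components(sigma_example)
eigenvalues = [lam for lam, _ in pairs]
trace = sum(sigma_example[i][i] for i in range(3))
print(eigenvalues, sum(eigenvalues), trace)
```

The variance of the i-th principal component is the i-th entry of `eigenvalues`, and the last two printed numbers should agree up to rounding.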

11.12 Appendix 5: A few analytics for the Hull and White model
As in the Ho and Lee model, the instantaneous forward rate f(τ , T ) predicted by the Hull and White
model is as in Eq. (11.52), where functions A2 and B2 can be easily computed from Eqs. (11.55) and
(11.56) as:
$$A_2(\tau,T) = \sigma^2 \int_\tau^T B(s,T)\,B_2(s,T)\,ds - \int_\tau^T \theta(s)\,B_2(s,T)\,ds, \qquad B_2(\tau,T) = e^{-\kappa(T-\tau)}.$$
Therefore, the instantaneous forward rate f (τ , T ) predicted by the Hull and White model is obtained
by replacing the previous equations in Eq. (11.52). The result is then equated to the observed forward
rate f$ (t, τ ) so as to obtain:
$$f_\$(t,\tau) = -\frac{\sigma^2}{2\kappa^2}\left(1 - e^{-\kappa(\tau-t)}\right)^2 + \int_t^\tau \theta(s)\,e^{-\kappa(\tau-s)}\,ds + e^{-\kappa(\tau-t)}\,r(t).$$
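By differentiating the previous equation with respect to τ , and rearranging terms,
$$\begin{aligned}
\theta(\tau) &= \frac{\partial}{\partial\tau} f_\$(t,\tau) + \frac{\sigma^2}{\kappa}\left(1 - e^{-\kappa(\tau-t)}\right)e^{-\kappa(\tau-t)} + \kappa\left[\int_t^\tau \theta(s)\,e^{-\kappa(\tau-s)}\,ds + e^{-\kappa(\tau-t)}\,r(t)\right] \\
&= \frac{\partial}{\partial\tau} f_\$(t,\tau) + \frac{\sigma^2}{\kappa}\left(1 - e^{-\kappa(\tau-t)}\right)e^{-\kappa(\tau-t)} + \kappa\left[f_\$(t,\tau) + \frac{\sigma^2}{2\kappa^2}\left(1 - e^{-\kappa(\tau-t)}\right)^2\right],
\end{aligned}$$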
which reduces to Eq. (11.57) after using simple algebra.
11.13 Appendix 6: Expectation theory and embedding in selected models

A. Expectation theory
Suppose that
σ(·, ·) = σ and λ(·) = λ, (11A.6)
where σ and λ are constants. We derive the dynamics of r and compare them with f to deduce
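something about the expectation theory. We have:
$$r(\tau) = f(t,\tau) + \int_t^\tau \alpha(s,\tau)\,ds + \sigma\left(W(\tau) - W(t)\right),$$
where
$$\alpha(\tau,T) = \sigma(\tau,T)\int_\tau^T \sigma(\tau,\ell)\,d\ell + \sigma(\tau,T)\lambda(\tau) = \sigma^2(T-\tau) + \sigma\lambda.$$
Hence,
$$\int_t^\tau \alpha(s,\tau)\,ds = \frac{1}{2}\sigma^2(\tau-t)^2 + \sigma\lambda(\tau-t).$$
Finally,
$$r(\tau) = f(t,\tau) + \frac{1}{2}\sigma^2(\tau-t)^2 + \sigma\lambda(\tau-t) + \sigma\left(W(\tau) - W(t)\right),$$
and since E ( W (τ )| F(t)) = W (t),
$$E[\,r(\tau)\,|\,F(t)\,] = f(t,\tau) + \frac{1}{2}\sigma^2(\tau-t)^2 + \sigma\lambda(\tau-t).$$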
Even with λ < 0, this model is not able to always generate E[ r(τ )| F(t)] < f (t, τ ). As shown in the
following exercise, this is due to the nonstationary nature of the volatility function. Indeed, suppose,
next, that instead of Eq. (11A.6), we have that
σ(t, T ) = σ · exp(−γ(T − t)) and λ(·) = λ,
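where σ, γ and λ are constants. In this case, we have:
$$r(\tau) = f(t,\tau) + \int_t^\tau \alpha(s,\tau)\,ds + \sigma\int_t^\tau e^{-\gamma(\tau-s)}\,dW(s),$$
where
$$\alpha(s,\tau) = \sigma^2 e^{-\gamma(\tau-s)}\int_s^\tau e^{-\gamma(\ell-s)}\,d\ell + \sigma\lambda e^{-\gamma(\tau-s)} = \frac{\sigma^2}{\gamma}\left[e^{-\gamma(\tau-s)} - e^{-2\gamma(\tau-s)}\right] + \sigma\lambda e^{-\gamma(\tau-s)}.$$
Finally,
$$E[\,r(\tau)\,|\,F(t)\,] = f(t,\tau) + \int_t^\tau \alpha(s,\tau)\,ds = f(t,\tau) + \frac{\sigma}{\gamma}\left(1 - e^{-\gamma(\tau-t)}\right)\left[\frac{\sigma}{2\gamma}\left(1 - e^{-\gamma(\tau-t)}\right) + \lambda\right].$$
Therefore, it is sufficient to have a risk-premium such that $-\lambda > \frac{\sigma}{2\gamma}$, to generate the prediction that:
E [ r(τ )| F(t)] < f(t, τ ) for any τ .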
In other words, λ < 0 is a necessary condition, not sufficient. Notice that when λ = 0, it always holds
that E ( r(τ )| F(t)) > f(t, τ ).
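
The sign of E[r(τ)|F(t)] − f(t, τ) in the exponential-volatility case can be checked numerically. The sketch below evaluates the closed-form gap derived above; the function name and the parameter values are hypothetical, chosen so that σ/(2γ) = 0.01.

```python
import math

def expected_rate_gap(sigma, gamma, lam, horizon):
    """E[r(tau)|F(t)] - f(t,tau) when sigma(t,T) = sigma*exp(-gamma*(T-t)); horizon = tau - t."""
    decay = 1.0 - math.exp(-gamma * horizon)
    return (sigma / gamma) * decay * (sigma / (2.0 * gamma) * decay + lam)

sigma, gamma = 0.01, 0.5          # illustrative values, sigma/(2*gamma) = 0.01
horizons = [0.1, 1.0, 5.0, 20.0]
for lam in (-0.02, -0.005, 0.0):  # lam = -0.02 satisfies -lam > sigma/(2*gamma)
    print(lam, [expected_rate_gap(sigma, gamma, lam, h) for h in horizons])
```

With λ = −0.02 the gap is negative at every horizon, with λ = 0 it is positive, and with λ = −0.005 (negative, but not negative enough) the sign changes with the horizon.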
B. Embedding
We now embed the Ho and Lee model in Section 11.5.2 in the HJM format. In the Ho and Lee model,
dr(τ ) = θ(τ )dτ + σdW̃ (τ ),
where W̃ is a Q-Brownian motion. By Eq. (11.52) in Section 11.4,
$$f(r,t,T) = -A_2(t,T) + B_2(t,T)\,r,$$
where $-A_2(t,T) = \int_t^T \theta(s)\,ds - \frac{1}{2}\sigma^2(T-t)^2$ and $B_2(t,T) = 1$. Therefore, by Eqs. (11.65),
$$\sigma(t,T) = B_2(t,T)\cdot\sigma = \sigma,$$
$$\alpha(t,T) - \sigma(t,T)\lambda(t) = -A_{12}(t,T) + B_{12}(t,T)\,r + B_2(t,T)\,\theta(t) = \sigma^2(T-t).$$
Next, we embed the Vasicek model in Section 11.4 in the HJM format. The Vasicek model is:
dr(τ ) = (θ − κr(τ ))dτ + σdW̃ (τ ),
where W̃ is a Q-Brownian motion. Results from Section 11.3 imply that:
$$f(r,t,T) = -A_2(t,T) + B_2(t,T)\,r,$$
where $-A_2(t,T) = -\sigma^2\int_t^T B(s,T)\,B_2(s,T)\,ds + \theta\int_t^T B_2(s,T)\,ds$, $B_2(t,T) = e^{-\kappa(T-t)}$ and
$B(t,T) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-t)}\right)$. By Eqs. (11.65),
$$\sigma(t,T) = \sigma\cdot B_2(t,T) = \sigma\cdot e^{-\kappa(T-t)};$$
$$\alpha(t,T) - \sigma(t,T)\lambda(t) = -A_{12}(t,T) + B_{12}(t,T)\,r + (\theta - \kappa r)\,B_2(t,T) = \frac{\sigma^2}{\kappa}\left(1 - e^{-\kappa(T-t)}\right)e^{-\kappa(T-t)}.$$
Naturally, this model can never be embedded within a HJM model because it is not of the perfectly
fitting type. In practice, condition (11.66) can never hold in the simple Vasicek model. However, the
model is embeddable once θ is turned into an infinite dimensional parameter à la Hull and White (see
Section 11.3).
11.14 Appendix 7: Additional results on string models

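Here we prove Eq. (11.68). We have, $\alpha^{I}(\tau,T) = \frac{1}{2}\int_\tau^T g(\tau,T,\ell_2)\,d\ell_2 + \mathrm{cov}\left(\frac{dP}{P}, \frac{d\xi}{\xi}\right)$, where
$$g(\tau,T,\ell_2) \equiv \int_\tau^T \sigma(\tau,\ell_1)\,\sigma(\tau,\ell_2)\,\psi(\ell_1,\ell_2)\,d\ell_1.$$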
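Differentiation of the cov term is straightforward. Moreover,
$$\begin{aligned}
\frac{\partial}{\partial T}\int_\tau^T g(\tau,T,\ell_2)\,d\ell_2 &= g(\tau,T,T) + \int_\tau^T \frac{\partial g(\tau,T,\ell_2)}{\partial T}\,d\ell_2 \\
&= \sigma(\tau,T)\int_\tau^T \sigma(\tau,x)\left[\psi(x,T) + \psi(T,x)\right]dx \\
&= 2\sigma(\tau,T)\int_\tau^T \sigma(\tau,x)\,\psi(x,T)\,dx.
\end{aligned}$$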
11.15 Appendix 8: Changes of numéraire

A. Jamshidian (1989)
Consider the following change-of-numéraire result. Let
$$\frac{dA}{A} = \mu_A\,d\tau + \sigma_A\,dW,$$
and consider a similar process B with coefficients µB and σB . We have:
$$\frac{d(A/B)}{A/B} = \left(\mu_A - \mu_B + \sigma_B^2 - \sigma_A\sigma_B\right)d\tau + \left(\sigma_A - \sigma_B\right)dW. \tag{11A.7}$$
We apply this result to the process $y(\tau,S) \equiv \frac{P(\tau,S)}{P(\tau,T)}$, under $Q_F^S$ as well as under $Q_F^T$. The objective
is to obtain the solution as of time T of y (τ , S) viz
$$y(T,S) \equiv \frac{P(T,S)}{P(T,T)} = P(T,S) \quad \text{under } Q_F^S \text{ as well as under } Q_F^T.$$
This allows us to calculate the two probabilities in Eq. (11.72).

By Itô’s lemma, the PDE (11.32) and the fact that $P_r = -BP$,
$$\frac{dP(\tau,x)}{P(\tau,x)} = r\,d\tau - \sigma B(\tau,x)\,d\tilde{W}(\tau), \quad x \ge T.$$
By applying Eq. (11A.7) to y(τ , S),
$$\frac{dy(\tau,S)}{y(\tau,S)} = \sigma^2\left[B(\tau,T)^2 - B(\tau,T)B(\tau,S)\right]d\tau - \sigma\left[B(\tau,S) - B(\tau,T)\right]d\tilde{W}(\tau). \tag{11A.8}$$
All we need to do now is to change measure with the tools of Appendix 3. We have that:
$$dW^{Q_F^x}(\tau) = d\tilde{W}(\tau) + \sigma B(\tau,x)\,d\tau$$
is a Brownian motion under the x-forward martingale probability. Replace then $W^{Q_F^x}$ into Eq. (11A.8),
then integrate, and obtain:
$$\frac{y(T,S)}{y(t,S)} = P(T,S)\,\frac{P(t,T)}{P(t,S)} = \exp\left(-\frac{1}{2}\sigma^2\int_t^T\left[B(\tau,S) - B(\tau,T)\right]^2 d\tau - \sigma\int_t^T\left[B(\tau,S) - B(\tau,T)\right]dW^{Q_F^T}(\tau)\right),$$
$$\frac{y(T,S)}{y(t,S)} = P(T,S)\,\frac{P(t,T)}{P(t,S)} = \exp\left(\frac{1}{2}\sigma^2\int_t^T\left[B(\tau,S) - B(\tau,T)\right]^2 d\tau - \sigma\int_t^T\left[B(\tau,S) - B(\tau,T)\right]dW^{Q_F^S}(\tau)\right).$$
Rearranging terms gives Eqs. (11.73) in the main text.
B. Black (1976)
To prove Eq. (11.100), we need to evaluate the following expectation:
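$$E\left[x(T) - K\right]^+,$$
where
$$x(T) = x(t)\exp\left(-\frac{1}{2}\int_t^T \gamma(\tau)^2\,d\tau + \int_t^T \gamma(\tau)\,d\tilde{W}(\tau)\right). \tag{11A.9}$$
Let $1_{ex}$ be the indicator of all events s.t. x(T ) ≥ K. We have
$$\begin{aligned}
E\left[x(T) - K\right]^+ &= E\left[x(T)\cdot 1_{ex}\right] - K\cdot E\left[1_{ex}\right] \\
&= x(t)\cdot E\left[\frac{x(T)}{x(t)}\cdot 1_{ex}\right] - K\cdot E\left[1_{ex}\right] \\
&= x(t)\cdot E_{Q^x}\left[1_{ex}\right] - K\cdot E\left[1_{ex}\right] \\
&= x(t)\cdot Q^x\left(x(T) \ge K\right) - K\cdot Q\left(x(T) \ge K\right),
\end{aligned}$$
where the probability $Q^x$ is defined as:
$$\frac{dQ^x}{dQ} = \frac{x(T)}{x(t)} = \exp\left(-\frac{1}{2}\int_t^T \gamma(\tau)^2\,d\tau + \int_t^T \gamma(\tau)\,d\tilde{W}(\tau)\right),$$
a Q-martingale starting at one. Under $Q^x$,
$$dW^x(\tau) = d\tilde{W}(\tau) - \gamma(\tau)\,d\tau$$
is a Brownian motion, and x in Eq. (11A.9) can be written as:
$$x(T) = x(t)\exp\left(\frac{1}{2}\int_t^T \gamma(\tau)^2\,d\tau + \int_t^T \gamma(\tau)\,dW^x(\tau)\right).$$
It is straightforward that Q (x(T ) ≥ K) = Φ(d2 ) and $Q^x$ (x(T ) ≥ K) = Φ(d1 ), where
$$d_{2/1} = \frac{\ln\frac{x(t)}{K} \mp \frac{1}{2}\int_t^T \gamma(\tau)^2\,d\tau}{\sqrt{\int_t^T \gamma(\tau)^2\,d\tau}}.$$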
Applying this to $E_{Q_F^{T_i}}\left[F_{i-1}(T_{i-1}) - K\right]^+$ gives the formulae of the text.
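
The resulting expectation, x(t)Φ(d1) − KΦ(d2), is simple to evaluate once the integrated variance is known. The following Python sketch is one way to do it under stated assumptions: the function names are hypothetical, γ is passed as a callable, and the integral of γ(τ)² is approximated with a midpoint rule, which is a choice of this example and not of the text.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call_expectation(x_t, strike, gamma, t, T, steps=1000):
    """E[x(T) - K]^+ = x(t)*Phi(d1) - K*Phi(d2), with deterministic volatility gamma(tau)."""
    h = (T - t) / steps
    # Midpoint approximation of the integrated variance, int_t^T gamma(tau)^2 dtau.
    total_var = sum(gamma(t + (i + 0.5) * h) ** 2 for i in range(steps)) * h
    vol = math.sqrt(total_var)
    d1 = (math.log(x_t / strike) + 0.5 * total_var) / vol
    d2 = d1 - vol
    return x_t * norm_cdf(d1) - strike * norm_cdf(d2)

# Illustrative inputs: constant and time-varying volatility functions.
print(black_call_expectation(1.0, 1.0, lambda tau: 0.2, 0.0, 1.0))
print(black_call_expectation(1.0, 1.0, lambda tau: 0.1 + 0.1 * tau, 0.0, 1.0))
```

Discounting and the choice of the forward measure are left out on purpose, as they depend on the specific application in the main text.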
References
Aı̈t-Sahalia, Y. (1996): “Testing Continuous-Time Models of the Spot Interest Rate.” Review
of Financial Studies 9, 385-426.
Andersen, T. G., and J. Lund (1997): “Estimating Continuous-Time Stochastic Volatility
Models of the Short-Term Interest Rate.” Journal of Econometrics 77, 343-377.
Ang, A. and M. Piazzesi (2003): “A No-Arbitrage Vector Autoregression of Term Structure
Dynamics with Macroeconomic and Latent Variables.” Journal of Monetary Economics
50, 745-787.
Balduzzi, P., S. R. Das, S. Foresi and R. K. Sundaram (1996): “A Simple Approach to Three
Factor Affine Term Structure Models.” Journal of Fixed Income 6, 43-53.
Bernanke, B. S. and A. Blinder (1992): “The Federal Funds Rate and the Channels of Monetary
Transmission.” American Economic Review 82, 901-921.
Black, F. (1976): “The Pricing of Commodity Contracts.” Journal of Financial Economics 3,
167-179.
Brace, A., D. Gatarek and M. Musiela (1997): “The Market Model of Interest Rate Dynamics.”
Mathematical Finance 7, 127-155.
Brémaud, P. (1981): Point Processes and Queues: Martingale Dynamics. Berlin: Springer Ver-
lag.
Brigo, D. and F. Mercurio (2006): Interest Rate Models–Theory and Practice, with Smile,
Inflation and Credit. Springer Verlag (2nd Edition).
Brunnermeier, M. (2009): “Deciphering the Liquidity and Credit Crunch 2007-08.” Journal of
Economic Perspectives 23, 77-100.
Carverhill, A. (1994): “When is the Short-Rate Markovian?” Mathematical Finance 4, 305-312.
Cochrane, J. H. and M. Piazzesi (2005): “Bond Risk Premia.” American Economic Review 95,
138-160.
Collin-Dufresne, P. and R. S. Goldstein (2002): “Do Bonds Span the Fixed-Income Markets?
Theory and Evidence for Unspanned Stochastic Volatility.” Journal of Finance 57, 1685-
1729.
Conley, T. G., L. P. Hansen, E. G. J. Luttmer and J. A. Scheinkman (1997): “Short-Term
Interest Rates as Subordinated Diffusions.” Review of Financial Studies 10, 525-577.
Dai, Q. and K. J. Singleton (2000): “Specification Analysis of Affine Term Structure Models.”
Duffie, D. and R. Kan (1996): “A Yield-Factor Model of Interest Rates.” Mathematical Finance
6, 379-406.
Duffie, D. and K. J. Singleton (1999): “Modeling Term Structures of Defaultable Bonds.”

Estrella, A. and G. Hardouvelis (1991): “The Term Structure as a Predictor of Real Economic
Activity.” Journal of Finance 46, 555-76.
Fama, E. F. and R. R. Bliss (1987): “The Information in Long-Maturity Forward Rates.”
American Economic Review 77, 680-692.
Fong, H. G. and O. A. Vasicek (1991): “Fixed Income Volatility Management.” The Journal
of Portfolio Management (Summer), 41-46.
Geman, H. (1989): “The Importance of the Forward Neutral Probability in a Stochastic Ap-
proach to Interest Rates.” Unpublished working paper, ESSEC.
Geman H., N. El Karoui and J. C. Rochet (1995): “Changes of Numeraire, Changes of Prob-
ability Measures and Pricing of Options.” Journal of Applied Probability 32, 443-458.
Goldstein, R. S. (2000): “The Term Structure of Interest Rates as a Random Field.” Review
Harvey, C. R. (1991): “The Term Structure and World Economic Growth.” Journal of Fixed
Income 1, 4-17.
Harvey, C. R. (1991): “The Term Structure Forecasts Economic Growth.” Financial Analysts
Journal May/June 6-8.
Heaney, W. J. and P. L. Cheng (1984): “Continuous Maturity Diversification of Default-Free
Bond Portfolios and a Generalization of Efficient Diversification.” Journal of Finance 39,
1101-1117.
Heath, D., R. Jarrow and A. Morton (1992): “Bond Pricing and the Term-Structure of Interest
Rates: a New Methodology for Contingent Claim Valuation.” Econometrica 60, 77-105.
Ho, T. S. Y. and S.-B. Lee (1986): “Term Structure Movements and the Pricing of Interest
Rate Contingent Claims.” Journal of Finance 41, 1011-1029.
Hördahl, P., O. Tristani and D. Vestin (2006): “A Joint Econometric Model of Macroeconomic
and Term Structure Dynamics.” Journal of Econometrics 131, 405-444.
Hull, J. C. (2003): Options, Futures, and Other Derivatives. Prentice Hall. 5th edition (Inter-
national Edition).
Hull, J. C. and A. White (1990): “Pricing Interest Rate Derivative Securities.” Review of
Financial Studies 3, 573-592.
Jagannathan, R. (1984): “Call Options and the Risk of Underlying Securities.” Journal of
Jamshidian, F. (1989): “An Exact Bond Option Pricing Formula.” Journal of Finance 44,
205-209.
Jamshidian, F. (1997): “Libor and Swap Market Models and Measures.” Finance and Stochas-
tics 1, 293-330.
Jöreskog, K. G. (1967): “Some Contributions to Maximum Likelihood Factor Analysis.”
Psychometrika 32, 443-482.
Karlin, S. and H. M. Taylor (1981): A Second Course in Stochastic Processes. San Diego:
Academic Press.
Kennedy, D. P. (1994): “The Term Structure of Interest Rates as a Gaussian Random Field.”
Kennedy, D. P. (1997): “Characterizing Gaussian Models of the Term Structure of Interest
Rates.” Mathematical Finance 7, 107-118.
Kessel, R. A. (1965): “The Cyclical Behavior of the Term Structure of Interest Rates.” National
Bureau of Economic Research Occasional Paper No. 91.
Kloeden, P. and E. Platen (1992): Numerical Solution of Stochastic Differential Equations.
Berlin: Springer Verlag.
Knez, P. J., R. Litterman and J. Scheinkman (1994): “Explorations into Factors Explaining
Money Market Returns.” Journal of Finance 49, 1861-1882.
Langetieg, T. (1980): “A Multivariate Model of the Term Structure of Interest Rates.” Journal
of Finance 35, 71-97.
Laurent, R. D. (1988): “An Interest Rate-Based Indicator of Monetary Policy.” Federal Reserve
Bank of Chicago Economic Perspectives 12, 3-14.
Laurent, R. D. (1989): “Testing the Spread.” Federal Reserve Bank of Chicago Economic
Perspectives 13, 22-34.
Litterman, R. and J. Scheinkman (1991): “Common Factors Affecting Bond Returns.” Journal
of Fixed Income 1, 54-61.
Litterman, R., J. Scheinkman, and L. Weiss (1991): “Volatility and the Yield Curve.” Journal
of Fixed Income 1, 49-53.
Longstaff, F. A. and E. S. Schwartz (1992): “Interest Rate Volatility and the Term Structure:
A Two-Factor General Equilibrium Model.” Journal of Finance 47, 1259-1282.
Mele, A. and F. Fornari (2000): Stochastic Volatility in Financial Markets: Crossing the Bridge
to Continuous Time. Boston: Kluwer Academic Publishers.
Merton, R. C. (1973): “Theory of Rational Option Pricing.” Bell Journal of Economics and
Management Science 4, 141-183.
Miltersen, K., K. Sandmann and D. Sondermann (1997): “Closed Form Solutions for Term
Structure Derivatives with Lognormal Interest Rate.” Journal of Finance 52, 409-430.
Rebonato, R. (1998): Interest Rate Option Models. Wiley.
Rebonato, R. (1999): Volatility and Correlation. Wiley.
Ritchken, P. and L. Sankarasubramanian (1995): “Volatility Structure of Forward Rates and
the Dynamics of the Term Structure.” Mathematical Finance 5, 55-72.
Rothschild, M. and J. E. Stiglitz (1970): “Increasing Risk: I. A Definition.” Journal of Eco-

Sandmann, K. and D. Sondermann (1997): “A Note on the Stability of Lognormal Interest
Rate Models and the Pricing of Eurodollar Futures.” Mathematical Finance 7, 119-125.
Santa-Clara, P. and D. Sornette (2001): “The Dynamics of the Forward Interest Rate Curve
with Stochastic String Shocks.” Review of Financial Studies 14, 149-185.
Stanton, R. (1997): “A Nonparametric Model of Term Structure Dynamics and the Market
Price of Interest Rate Risk.” Journal of Finance 52, 1973-2002.
Stock, J. H. and M. W. Watson (1989): “New Indexes of Coincident and Leading Economic
Indicators.” In: Blanchard, O. J. and S. Fischer (Eds.): NBER Macroeconomics Annual
1989, MIT Press, 352-394.
Stock, J. H. and M. W. Watson (2003): “Forecasting Output and Inflation: The Role of Asset
Prices,” Journal of Economic Literature 41, 788-829.

12
Risky debt and credit derivatives
12.1 Introduction
12.2 The classics: Modigliani-Miller irrelevance results
Firms are divided into equivalent returns classes. Returns are perfectly correlated within the
same class. Let π be the constant expected profit paid off by each firm within class k, and
EU be the price of an unlevered firm’s share. Under the conditions reviewed in Chapter 7, we
have that $E_U = \sum_{t=1}^{\infty}(1+\rho_k)^{-t}\,\pi$, where ρk is the risk-adjusted discount rate prevailing in
sector k, such that the return on equity (ROE) for the unlevered firm is,
$$\rho_k = \frac{\pi}{E_U},$$
a constant for all unlevered firms belonging to the asset class k. Naturally, the value of the firm
is equal to the value of equity, VU = EU , say. Next, let us introduce debt: for any arbitrary firm
in the k-th sector that issues D nominal value of debt, we have that its value, denoted as VL ,
is the sum of equity and debt, VL = EL + D. [Assumptions: ]
We have:
Theorem 12.1 (Modigliani & Miller theorem). In the absence of arbitrage and frictions, the
market value of any firm is independent of its capital structure and is given by capitalizing its
expected dividend at the discount rate appropriate to its class: $V_j = \frac{\pi}{\rho_k}$, for any firm j ∈ {U, L}
in class k.
In other words, the return on investment (ROI), defined as $\rho_k = \frac{\pi}{V_j}$, is the same for two firms
that earn the same expected profit, π, and that only differ as regards their capital structure.
Naturally, the ROE and ROI are the same for the unlevered firm.
The proof of Theorem 12.1 can be obtained with the modern tools reviewed in Chapters 2
through 4, but for the sake of completeness, we produce the arguments in Modigliani and Miller
(1958), as these are very simple. Consider two firms: a first, unlevered and a second, levered.
They both earn the same expected profit, π. Suppose we purchase the shares of the unlevered
firm and borrow the same amount of money issued by the levered firm. In the absence of
arbitrage or any frictions, the value of this portfolio should equal the value of the levered firm,
which is possible as soon as the values of the levered and the unlevered firm are the same.
Mathematically, given an arbitrary α ∈ (0, 1), we do the following trade: (i) we buy
$N_U = \alpha\frac{E_L + D}{E_U} = \alpha\frac{V_L}{V_U}$ shares of the unlevered firm; (ii) we sell $N_L = \alpha$ shares of the levered firm. These
two trades make the balance of the position worth $-N_U E_U + N_L E_L = -\alpha D$, and so (iii) we
borrow αD at the interest rate r, to make this initial position worthless. This portfolio yields:
(i) $+N_U\pi$, due to the purchase of the shares of the unlevered firm, (ii) $-N_L(\pi - rD)$, due
to the sale of the shares of the levered firm, which of course has to pay interest on its debt,
and (iii) $-r\alpha D$, arising to honour the debt we are making to build up the worthless portfolio.
Summing up, $N_U\pi - N_L(\pi - rD) - r\alpha D = \alpha\left(\frac{V_L}{V_U} - 1\right)\pi$. If VL > VU , we have an arbitrage
opportunity as we may make money out of a worthless portfolio, and if VL < VU , we have an
arbitrage as well, as we could reverse the positions of the worthless portfolio. So we need to
have that $V_L = V_U = E_U = \frac{\pi}{\rho_k}$.
[As mentioned, Theorem 12.1 can be proved through the modern tools in Chapters 2 through
4]
We have: π = ROI · V . Therefore,
$$\mathrm{ROE} = \frac{\pi - rD}{E} = \frac{\mathrm{ROI}\cdot(E + D) - rD}{E} = \mathrm{ROI} + (\mathrm{ROI} - r)\,\frac{D}{E}.$$
If the financial conditions of the firm do not affect the interest rate on debt, then, the ROE is
increasing in the leverage ratio, D/E, provided ROI > r. This situation arises when the arbitrage
arguments underlying Theorem 12.1 assume no-arbitrage trades can be implemented with a
cost of borrowing money equal to that of the firm. In the presence of market frictions such
as asymmetric information between borrowers and lenders, this need not be the case. For
example, debt markets might be concerned about the size of the leverage ratio. Assume, for
example, that r = f (ℓ), where ℓ = D/E, and in particular that f (ℓ) = 0.03ℓ. Then, we have that:
ROE = ROI + (ROI − 0.03ℓ) ℓ. The picture below depicts the behavior of ROE as a function
of ℓ, assuming that ROI = 5% and that the risk-free rate in case of no such frictions is r = 3%.
[Figure: ROE (vertical axis) plotted against the leverage ratio (horizontal axis).]
The solid line depicts the ROE for a firm sustaining a cost of debt independent of the
leverage ratio, with ROI = 5% and r = 3%. The dashed line is the ROE for a firm that
has a cost of debt increasing in the leverage ratio ℓ, r (ℓ) = 0.03ℓ.
Consider the firm with cost of capital depending on the current leverage ratio, ℓ. For a low
level of ℓ, the ROE increases with ℓ, so as to magnify the difference ROI − 0.03ℓ through the
multiplying effect (ROI − 0.03ℓ) ℓ. However, for higher leverage ratios, the difference ROI−0.03ℓ
becomes thinner and thinner, and an increase in ℓ then leads to a marginally lower ROE. In this
example, there is an interior value for the leverage ratio that maximizes the ROE, which is,
approximately, ℓ = 0.83.
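
The interior maximum can be verified with a few lines of Python. The sketch below uses the example's values (ROI = 5%, f(ℓ) = 0.03ℓ); the function names and the parameter `slope` are hypothetical labels introduced here.

```python
def roe(leverage, roi=0.05, slope=0.03):
    """ROE = ROI + (ROI - r(l)) * l, with a cost of debt r(l) = slope * l."""
    return roi + (roi - slope * leverage) * leverage

def roe_maximizing_leverage(roi=0.05, slope=0.03):
    """First-order condition of the quadratic above: roi - 2*slope*l = 0."""
    return roi / (2.0 * slope)

best = roe_maximizing_leverage()
print(best, roe(best))
```

Since ROE is a concave quadratic in ℓ, the first-order condition ROI − 0.06ℓ = 0 identifies the global maximum, ℓ = 0.05/0.06 ≈ 0.83.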
12.3 Conceptual approaches to valuation of defaultable securities

12.3.1 Firm’s value, or structural, approaches
Relies on the structure of the firm. Shares and bonds as derivatives on the firm’s assets.
Stylized balance sheet

Assets (A)   |   Equity (E) (Shares)
             |   Debt (D) (Bonds)
Therefore, we have the accounting identity: Assets = Equity + Debt, or
A = E + D.
At the time of debt expiration, debtholders receive the minimum between the debt nominal value
and the value of the assets the firm can liquidate to honour the debt obligation. Debtholders are
senior claimants. Equity holders are residual claimants to the firm’s assets, i.e. junior claimants.
We can use these basic insights to illustrate the first model about the risk-structure of interest
rates, the Merton - KMV approach. Equity is like a European call option written on the firm’s
assets, with expiration equal to the debt expiration, and strike equal to the nominal value of
debt. Current value of debt equals the value of the assets minus the value of equity, i.e. the
value of a risk-free discount bond minus the value of a put option on the firm with strike price
equal to the nominal value of debt, as shown by Eq. (12.3) below.
Merton (1974) uses the Black and Scholes (1973) formula to derive the price of debt. The
main assumption underlying this model is that the assets of the firm can be traded, and that
their value At satisfies,1
$$\frac{dA_t}{A_t} = r\,dt + \sigma\,d\tilde{W}_t, \tag{12.1}$$
where W̃t is a Brownian motion under the risk-neutral probability, σ is the instantaneous
standard deviation, and r is the short-term rate on riskless bonds.
Let N be the nominal value of debt, T be the time of expiration of debt, and Dt the debt value as
of time t ≤ T . As argued earlier, shareholders are long a European call option on the assets and
are the residual claimants, whereas bondholders receive, at T , the minimum between the assets
value and N. Mathematically,
$$D_T = \begin{cases} A_T, & \text{if the firm defaults, i.e. } A_T < N \\ N, & \text{if the firm is solvent, i.e. } A_T \ge N \end{cases}$$
We can decompose the assets value at time T , into the sum of the value of equity and the value
of debt, at time T ,
$$D_T = \min\{A_T, N\} = A_T - \underbrace{\max\{A_T - N, 0\}}_{\equiv\ \text{Equity at } T}. \tag{12.2}$$
Note, also, that,
$$D_T = \min\{A_T, N\} = N - \underbrace{\max\{N - A_T, 0\}}_{\equiv\ \text{Put on the firm}}. \tag{12.3}$$
That is, credit risk raises the cost of capital.
[Figure 12.1 plot; horizontal axis: A_T.]
1 Eq. (12.1) could be generalized to one in which $dA_t = (rA_t - \delta_t)\,dt + \sigma A_t\,d\tilde{W}_t$, where $\delta_t$ is the instantaneous cash flow to the
firm. This would make the firm value equal to $A_0 = E\left[\int_0^\infty e^{-rt}\delta_t\,dt\right]$. For example, one could take $\delta_t$ to be a geometric Brownian
motion with parameters g and σ, in which case $A_t = (r - g)^{-1}\delta_t$, forever, but we’re just ignoring this complication.
FIGURE 12.1. Dashed line: the value of equity at the debt maturity, T , max {AT − N, 0},
plotted as a function of the value of assets, AT . Solid line: the value of debt at maturity,
min {AT , N} as a function of AT . Nominal value of debt is fixed to N = 1.
A word on convexity, and risk-taking behavior. Convexity: Managers have incentives to invest
in risky assets, as the terminal payoff to them is increasing in the assets volatility, σ. Concavity:
The value of debt, instead, is decreasing in the assets volatility.
12.3.1.1 Merton
The current value of the bonds equals the current value of the assets, A0 , minus the current
value of equity. The current value of equity can be obtained through the Black & Scholes formula,
as equity is a European call option on the firm, struck at N. By Eq. (12.2), and standard
risk-neutral evaluation, then, the current value of debt, D0 , is,
$$D_0 = A_0\,\Phi(-d_1) + N e^{-rT}\,\Phi(d_2), \quad d_1 = \frac{\ln(A_0/N) + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \quad d_2 = d_1 - \sigma\sqrt{T}, \tag{12.4}$$
where Φ (·) denotes the distribution function of a standard normal variable.2
[Figure 12.2 plot; horizontal axis: A_0.]
FIGURE 12.2. Solid line: the no-arbitrage bound, min {A0 , N}, depicted as a function
of A0 , when the nominal value of debt is fixed to N = 1. Dashed line: the bond value
predicted by the Merton’s model when T = 1, r = 3% and σ = 20%, annualized. Dotted
line: same as the dashed line, but with a larger asset volatility, σ = 40%.
Bond prices are decreasing in the asset volatility as bad outcomes are exaggerated on the
downside, due to the concavity properties depicted in Figure 12.1.
2 For the details, note that $D_0 = e^{-rT}E[\,D_T\,|\,A_0\,]$ and, then, by Eq. (12.2),
$$D_0 = e^{-rT}E(A_T\,|\,A_0) - e^{-rT}E[\max\{A_T - N, 0\}\,|\,A_0] = A_0 - \left[A_0\,\Phi(d_1) - N e^{-rT}\,\Phi(d_2)\right],$$
where the last equality follows by the Black & Scholes formula. Eq. (12.4) follows after rearranging terms in the previous equation.
The risk-structure of interest rates is obtained with the standard formula for continuously
compounded interest rates as,
$$R = -\frac{1}{T}\ln\frac{D_0}{N} = r + \text{Spread},$$
where the term-spread or, simply, the spread, is
$$\text{Spread} = -\frac{1}{T}\ln\left[\frac{A_0}{N}\,e^{rT}\,\Phi(-d_1) + \Phi(d_2)\right]. \tag{12.5}$$
Figure 12.3 depicts the spread predicted by this model. Credit spreads shrink to zero as time-
to-maturity becomes smaller and smaller. This property of the model is in sharp contrast with
the empirical behavior of credit spreads, which are high even for short-maturity bonds. This
property arises because the model is driven by Brownian motions, which have continuous
sample paths, such that given an assets value A > N, the probability of bankruptcy, arising
when A hits N from above, approaches zero very fast as time-to-maturity goes to zero. Because
credit spreads reflect, of course, default probabilities, as we shall explain below (see Eq. (12.8)),
credit spreads, then, shrink to zero quickly as time-to-maturity approaches zero.
Naturally, one might end up with credit spreads sufficiently high at short-maturities, by
assuming the assets value is sufficiently small. For example, in Figure 12.3, credit spreads are
“high” at short maturities, when A = 1.1. However, even with A = 1.1, credit spreads are
still zero at very short maturities. More fundamentally, requiring such a small value for A is
problematic. Firms with such a low assets value would command a much higher spread than
that in Figure 12.3. All in all, the Brownian motion model in this section lacks some source
of risk driving the behavior of short-term spreads. In Section 12.3.2, we will show that this
problem can successfully be addressed assuming the firm’s default can be triggered by “jumps.”
[Figure 12.3 plot; vertical axis: Spread; horizontal axis: Time to maturity.]
FIGURE 12.3. The term structure of spreads, s0 , obtained with initial asset values A0 =
1.1 (solid line), A0 = 1.2 (dashed line), and A0 = 1.3 (dotted line). The short-term rate,
r = 3%, and asset volatility is σ = 0.20. Nominal debt N = 1.
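
The debt value in Eq. (12.4) and the spread in Eq. (12.5) are straightforward to compute. The following Python sketch is a minimal implementation under the parameter values of Figure 12.3; the function names are hypothetical, and the normal distribution function is built from `math.erf`.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_debt(a0, n, r, sigma, t):
    """Current value of debt, Eq. (12.4)."""
    d1 = (math.log(a0 / n) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return a0 * norm_cdf(-d1) + n * math.exp(-r * t) * norm_cdf(d2)

def merton_spread(a0, n, r, sigma, t):
    """Yield on the bond, -(1/T) ln(D0/N), minus the short-term rate r."""
    return -math.log(merton_debt(a0, n, r, sigma, t) / n) / t - r

# Term structure of spreads for the three asset values used in Figure 12.3.
for a0 in (1.1, 1.2, 1.3):
    print(a0, [round(merton_spread(a0, 1.0, 0.03, 0.20, t), 5) for t in (0.01, 0.5, 1.0, 2.0, 5.0)])
```

The first maturity, T = 0.01, illustrates the point made in the text: even for A0 = 1.1 the spread is essentially zero at very short maturities.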
We can introduce a useful summary statistic for credit quality: distance-to-default (under
Q). We can use the previous model to estimate the likelihood of default for a given firm. First,
we develop Eq. (12.2),
$$D_T = \min\{A_T, N\} = A_T\cdot I_{\{A_T < N\}} + N\cdot I_{\{A_T \ge N\}},$$
where $I_{\{E\}}$ is the indicator function, i.e. $I_{\{E\}} = 1$ if the event E is true and $I_{\{E\}} = 0$ if the event
E is false. Second, we have,
$$\begin{aligned}
D_0 &= e^{-rT}E(D_T) \\
&= e^{-rT}\left[E\left(A_T\cdot I_{\{A_T < N\}}\right) + N\cdot E\left(I_{\{A_T \ge N\}}\right)\right] \\
&= e^{-rT}\left[E(A_T\,|\,\text{Default})\,Q(\text{Default}) + N\cdot Q(\text{Survival})\right],
\end{aligned} \tag{12.6}$$
where E ( AT | Default) is the expected asset value given the event of default, Q (Default) is the
probability of default, and Q (Survival) = 1 − Q (Default) is the probability the firm does not
default.
Comparing Eq. (12.6) with Eq. (12.4) reveals that for the Merton’s model,
Q (Survival) = Φ (d2 ) .
[Figure 12.4 plot; vertical axis: Pr(surv); horizontal axis: sigma.]
FIGURE 12.4. Probability of survival for a given firm predicted by the Merton’s model,
Φ (d2 ), depicted as a function of the asset volatility, σ. Assets value is fixed at A0 = 1.1,
and plotted are survival probabilities for bonds maturing at T = 0.5 years (solid line),
T = 1 year (dashed line) and T = 2 years (dotted line). The short-term rate, r = 3%.
Nominal debt N = 1.
The probability of survival decreases with (i) debt maturity and (ii) the asset volatility.
Property (i) is not a general property, though. With lower values of A0 , the relation between
maturity and probability of survival can be increasing or decreasing, according to the values of
σ, as shown in Figure 12.5. Intuitively, when A0 ≈ N, the probability of survival is:
$$Q(\text{Survival}) = \Phi(d_2), \quad \text{with } d_2 = \frac{\ln\frac{A_0}{N} + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}} \approx \frac{r - \frac{1}{2}\sigma^2}{\sigma}\sqrt{T},$$
such that the survival probability decreases in T for large σ (when $r < \frac{1}{2}\sigma^2$), although it increases
in T for small σ (when $r > \frac{1}{2}\sigma^2$). The intuition underlying this property is that for large σ, the
probability the asset value will end up below N from A0 ≈ N can only increase with time to
maturity, T . Analytically, $E(\ln A_T\,|\,A_0) = \ln A_0 + \left(r - \frac{1}{2}\sigma^2\right)T \approx \ln N + \left(r - \frac{1}{2}\sigma^2\right)T$, such that the
probability the assets value will be above N is, indeed, approximately Q (Survival).
[Figure 12.5 plot; vertical axis: Pr(surv); horizontal axis: sigma.]
FIGURE 12.5. Probability of survival for a given firm predicted by the Merton’s model,
Φ (d2 ), depicted as a function of the asset volatility, σ. Assets value is fixed at A0 = 1.01,
and plotted are survival probabilities for bonds maturing at T = 0.5 years (solid line),
T = 1 year (dashed line) and T = 2 years (dotted line). The short-term rate, r = 3%.
Nominal debt N = 1.
The summary statistic, distance-to-default, is defined as,
$$d_2 = \frac{\ln(A_0/N) + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}. \tag{12.7}$$
It is a very intuitive measure of how far the firm is from defaulting, as we know the (risk-
adjusted) probability of surviving is Φ (d2 ), which is increasing in d2 . The larger the current
asset value A0 is, the less likely it is the firm will default at T . Distance-to-default is decreasing
in the assets volatility σ, as illustrated earlier by Figure 12.4.
By Eq. (12.1), we have that $E(\ln A_T\,|\,A_0) = \ln A_0 + \left(r - \frac{1}{2}\sigma^2\right)T$, so Eq. (12.7) tells us that
distance-to-default is simply the difference E ( ln AT | A0 ) − ln N, normalized by the standard
deviation of the log-assets value over the life of debt, $\sigma\sqrt{T}$.
Some prefer to use the slightly different formula,
$$\text{Distance-to-default} = \frac{\text{Mkt value of Assets} - \text{Default value}}{\text{Mkt value of Assets} \times \text{Asset volatility}}.$$
Another useful concept is Loss-given-default, under Q. Comparing Eq. (12.6) with Eq. (12.4)
reveals another property of the Merton’s model,
$$E(A_T\,|\,\text{Default}) = \frac{A_0 e^{rT}\,\Phi(-d_1)}{Q(\text{Default})} = A_0 e^{rT}\,\frac{\Phi(-d_1)}{\Phi(-d_2)} = E(A_T)\,\frac{\Phi(-d_1)}{\Phi(-d_2)} \le E(A_T).$$
Recovery rates are defined as the fraction of the bond value the bondholders expect to obtain
in the event of default, at maturity:
$$\text{Recovery Rate} = \frac{E(A_T\,|\,\text{Default})}{N} = \frac{A_0}{N}\,e^{rT}\,\frac{\Phi(-d_1)}{\Phi(-d_2)}.$$
Loss-given-default is defined as the fraction of the bond value the bondholders expect to lose
in the event of default, at maturity:
Loss-given-default = 1 − Recovery Rate.
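Finally, by Eq. (12.5), we can write,
$$\begin{aligned}
s_0 &= -\frac{1}{T}\ln\left[\frac{A_0}{N}\,e^{rT}\,\Phi(-d_1) + \Phi(d_2)\right] \\
&= -\frac{1}{T}\ln\left(\text{Recovery Rate}\cdot Q(\text{Default}) + Q(\text{Survival})\right) \\
&\approx \frac{1}{T}\left[\text{Loss-given-default}\cdot Q(\text{Default})\right].
\end{aligned} \tag{12.8}$$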
This is actually a general formula, which goes through beyond the Merton’s model. It can
easily be obtained through Eq. (12.6).
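
The quality of the approximation in Eq. (12.8) can be gauged numerically. The sketch below computes the risk-neutral default probability, the recovery rate, and both the exact and the approximate spread in the Merton's model; the function name, the dictionary keys and the parameter values are hypothetical choices made for this illustration.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_credit_stats(a0, n, r, sigma, t):
    """Default probability, recovery rate and spreads under Q in the Merton model."""
    d1 = (math.log(a0 / n) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    q_default = norm_cdf(-d2)
    recovery = (a0 / n) * math.exp(r * t) * norm_cdf(-d1) / q_default
    lgd = 1.0 - recovery
    exact = -math.log(recovery * q_default + (1.0 - q_default)) / t
    approx = lgd * q_default / t  # first-order approximation, Eq. (12.8)
    return {"q_default": q_default, "recovery": recovery, "lgd": lgd,
            "spread_exact": exact, "spread_approx": approx}

print(merton_credit_stats(1.1, 1.0, 0.03, 0.20, 1.0))
```

The approximation relies on ln(1 − x) ≈ −x, so it understates the exact spread and deteriorates as Loss-given-default × Q(Default) grows.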
What is the asymptotic behavior of the survival probabilities, and then the spreads? If $r > \frac{1}{2}\sigma^2$,
then, as T → ∞, Φ (d2 ) → 1, that is, the assets value is expected to grow so much that
default will never occur, such that the bond becomes riskless and $s_0 \to -\frac{1}{T}\ln\Phi(d_2) \to 0$.
An important note. Survival probabilities, distance-to-default, and loss-given-default were previously
defined under the risk-adjusted probability Q. To calculate the same objects under the
true probability P, we replace r with the asset growth rate under the physical probability, µ, in
the formulae for the survival probabilities, Φ (d2 ), distance-to-default, d2 , and loss-given-default.
However, it is hard to estimate µ for many single names. Moody's KMV EDF™ measures are based
on dynamic structural models like these, although the details are not publicly known. Finally,
we could use historical data about default frequencies to estimate the probability that a given
single name within a certain industry will default. These frequencies are based on samples of
firms that have defaulted in the past, with similar characteristics to those of the firm under
evaluation (in terms, for instance, of distance-to-default).
How to estimate $A_t$ and σ? One algorithm is to start with some σ equal to the volatility
of equity returns, say $\sigma^{(0)}$, and use Merton's formula for equity, to extract $A_t^{(0)}$ for each date
$t \in \{1, \cdots, T\}$, where T is the sample size. Then, use $A_t^{(0)}$ to compute the standard deviation of
$\ln(A_t^{(0)}/A_{t-1}^{(0)})$. This gives say $\sigma^{(1)}$, which can be used as the new input to Merton's formula
to extract say $A_t^{(1)}$. We obtain a sequence of $(A_t^{(i)})_{t=1}^{T}$ and $\sigma^{(i)}$, and we stop for i sufficiently
large, according to some criterion.
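The iteration just described can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used by any particular vendor: equity is valued as a European call on the assets with a constant time to debt maturity `tau` at every date, the inversion for the assets is done by bisection, log-changes are annualized with a time step `dt`, and the stopping criterion is a small change in the volatility estimate. All function and parameter names are hypothetical.

```python
import math
import statistics

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_equity(A, sigma, N, r, tau):
    """Equity value as a European call on the assets (Merton's formula)."""
    d1 = (math.log(A / N) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return A * norm_cdf(d1) - N * math.exp(-r * tau) * norm_cdf(d2)

def implied_assets(E, sigma, N, r, tau):
    """Invert Merton's formula for A by bisection (equity is increasing in A)."""
    lo, hi = E, E + N  # for r >= 0, the assets lie between E and E + N
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if merton_equity(mid, sigma, N, r, tau) > E:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def annualized_vol(series, dt):
    """Standard deviation of log-changes, scaled to annual units (assumed convention)."""
    rets = [math.log(b / a) for a, b in zip(series, series[1:])]
    return statistics.pstdev(rets) / math.sqrt(dt)

def estimate_assets_and_vol(equity, N, r, tau, dt, tol=1e-8, max_iter=500):
    """Iterate sigma^(i) -> A^(i) -> sigma^(i+1) until the volatility settles."""
    sigma = annualized_vol(equity, dt)            # sigma^(0): equity volatility
    assets = list(equity)
    for _ in range(max_iter):
        assets = [implied_assets(E, sigma, N, r, tau) for E in equity]
        new_sigma = annualized_vol(assets, dt)    # sigma^(i+1)
        if abs(new_sigma - sigma) < tol:          # assumed stopping criterion
            return assets, new_sigma
        sigma = new_sigma
    return assets, sigma

# Synthetic check: asset path with known volatility 30%, daily observations.
sigma_true, dt = 0.30, 1.0 / 252
x = sigma_true * math.sqrt(dt)
true_assets = [110.0 * math.exp(x * (i % 2)) for i in range(21)]
equity = [merton_equity(A, sigma_true, 100.0, 0.02, 1.0) for A in true_assets]
assets_hat, sigma_hat = estimate_assets_and_vol(equity, 100.0, 0.02, 1.0, dt)
print(round(sigma_hat, 4))
```

The starting value, the equity volatility, is much larger than the asset volatility because equity is a levered claim; the iteration then works its way down to the asset volatility.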
12.3.1.2 One example
Assume the assets value of a given firm is A0 = 110, and that the instantaneous volatility of the
assets value growth is σ = 30%, annualized. The safe interest rate is r = 2%, annualized, and
the expected growth rate of the assets value is µ = 5%, annualized. The firm has outstanding
debt with nominal value N = 100, which expires in two years.
First, we compute the distance-to-default implied by Merton's model, which is,
$$\text{D-t-D} = \frac{\ln\frac{A_0}{N} + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}} = \frac{\ln(1.1) + \left(0.02 - \frac{1}{2}0.3^2\right)\cdot 2}{0.3\sqrt{2}} = 0.10680.$$
Accordingly, the probability of default is,
1 − Φ (0.10680) = 1 − 0.54253 = 0.45747.
We can compute the same probability, under the physical measure, by simply replacing
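r = 2% with µ = 5%, in the formula for D-t-D. We have,
$$\text{D-t-D}_{\text{physical}} = \frac{\ln\frac{A_0}{N} + \left(\mu - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}} = \frac{\ln(1.1) + \left(0.05 - \frac{1}{2}0.3^2\right)\cdot 2}{0.3\sqrt{2}} = 0.24822.$$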
Therefore, the probability of default under the physical distribution is,
1 − Φphysical (0.24822) = 1 − 0.59802 = 0.40198.
It is, of course, lower under the physical probability than under the risk-neutral probability,
due to the larger asset growth rate, µ > r.
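Finally, we can compute the spread on this bond, which is given by:
$$\text{Spread} = -\frac{1}{T}\ln\left[\frac{A_0}{N}e^{rT}\Phi(-d_1) + \Phi(d_2)\right],$$
where $d_2 = \text{D-t-D}$, and $d_1 = d_2 + \sigma\sqrt{T}$. So we have,
$$\begin{aligned}
\text{Spread} &= -\frac{1}{2}\ln\left[1.1e^{0.02 \times 2}\,\Phi\left(-\left(0.10680 + 0.30\sqrt{2}\right)\right) + \Phi(0.10680)\right] \\
&= -\frac{1}{2}\ln\left[1.1e^{0.02 \times 2} \times 0.29769 + 0.54253\right] \\
&= 6.20\%.
\end{aligned}$$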
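The numbers in this example can be reproduced with a few lines of code. The sketch below uses only the inputs stated in the example; the function names are hypothetical, and the normal distribution function is built from the error function in the standard library.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def distance_to_default(A0, N, drift, sigma, T):
    """d2 in Merton's model; drift is r under Q, or mu under P."""
    return (math.log(A0 / N) + (drift - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))

def merton_spread(A0, N, r, sigma, T):
    """Spread s0 = -(1/T) ln[(A0/N) e^{rT} Phi(-d1) + Phi(d2)]."""
    d2 = distance_to_default(A0, N, r, sigma, T)
    d1 = d2 + sigma * math.sqrt(T)
    return -math.log(A0 / N * math.exp(r * T) * norm_cdf(-d1) + norm_cdf(d2)) / T

A0, N, r, mu, sigma, T = 110.0, 100.0, 0.02, 0.05, 0.30, 2.0

dtd_q = distance_to_default(A0, N, r, sigma, T)    # risk-neutral
dtd_p = distance_to_default(A0, N, mu, sigma, T)   # physical
pd_q = 1.0 - norm_cdf(dtd_q)
pd_p = 1.0 - norm_cdf(dtd_p)
spread = merton_spread(A0, N, r, sigma, T)

print(f"D-t-D (Q) = {dtd_q:.5f}, default probability (Q) = {pd_q:.5f}")
print(f"D-t-D (P) = {dtd_p:.5f}, default probability (P) = {pd_p:.5f}")
print(f"Spread = {spread:.2%}")
```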
12.3.1.3 First passage
The timing of default can be triggered by some exogenously specified events. For example,
default occurs if the value of the assets hits some exogenously given lower bound even before the
expiration of debt. These models are known as “first passage” models, because they rely on
mathematical techniques that solve for the probability distribution of the first time the asset value hits some
exogenous “barrier,” as in Black and Cox (1976).
12.3.1.4 Strategic defaulting
The timing of default can be endogenous. Managers choose the defaulting barrier (i.e. the
asset value that triggers bankruptcy) so as to maximize the firm’s value. Naturally, strategic
defaulting works if the assumptions underlying the Modigliani-Miller theorem do not hold. The
mechanism is the following: on the one hand, debt is a tax-shielding device. On the other hand,
issuing too much debt increases the likelihood of default, which triggers bankruptcy costs. The
first effect raises the value of the firm while the second, decreases the value of the firm. Equity
holders choose the value of the asset that triggers bankruptcy to maximize the value of equity.
Leland (1994): Long-term debt. Leland and Toft (1996): Extension to finite maturity debt.
Anderson and Sundaresan (1996): Debt re-negotiation.
Leland's model considers liquidation of the firm as a strategic choice of the equity
holders. In fact, the US bankruptcy code includes both a liquidation process (Chapter 7) and
a reorganization process (Chapter 11), but Leland's model only considers the firm's liquidation at
bankruptcy. Broadie, Chernov and Sundaresan (2007) generalize this setting to one where the
firm may choose to default through a reorganization process, in which case no equity is issued
to honour debt services, i.e. coupon payments, as is instead the case in Leland, as we now
explain.
The terms leading to the strategic defaulting in Leland's model are as follows. First, the value
of the assets, $A_t$, is the solution, as usual, to Eq. (12.1). Second, debt is infinitely lived, in that it
pays off an instantaneous coupon equal to $C\,dt$; in the absence of default risk, then, debt would
simply equal $C/r$. Third, tax benefits are assumed to be proportional to the coupon, $\tau C\,dt$.
Fourth, there are bankruptcy costs: if the firm defaults at $A = A^B$, recovery is $(1-\alpha)A^B$.
Equity holders choose $A^B$. Naturally, $A^B < A_0$.
The value of debt is a function of the assets value, A, say D (A). Moreover, the firm finances
the net cost of the coupon C by issuing additional equity, until the equity value is zero, i.e.
until $A = A^B$, as seen below. Therefore, under the risk-neutral probability, the value of debt
satisfies:
$$\underbrace{\left.\frac{d}{dT}E\left[D(A_T)\mid A_0\right]\right|_{T=t}}_{=\text{Expected capital gains}} + \underbrace{C}_{=\text{coupon}} = rD(A_t).$$
By Itô's lemma, this is an ordinary differential equation, subject to the following boundary
conditions. First, at bankruptcy, $D(A^B) = (1-\alpha)A^B$. Second, for large A, debt is substantially
riskless, i.e. $\lim_{A\to\infty}D(A) = \frac{C}{r}$.
The solution to this is,
$$D(A) = \frac{C}{r}\left(1 - p_B(A)\right) + p_B(A)(1-\alpha)A^B, \qquad (12.9)$$
where
$$p_B(A) \equiv \left(\frac{A^B}{A}\right)^{2r/\sigma^2}. \qquad (12.10)$$
Note, we may interpret pB (A) as the present value of $1, contingent on future bankruptcy, as
further explained in Appendix 1. Accordingly, (1 − pB (A)) /r is the expected present value of
the coupon payments up to bankruptcy.
The total benefits arising from tax shielding are,
$$TB(A) = \left(1 - p_B(A)\right)\tau\frac{C}{r},$$
and the present value of bankruptcy costs is,
$$BC(A) = p_B(A)\,\alpha A^B.$$
We have,
$$\text{Value of the firm} = \text{Equity} + \text{Debt} = \text{Value of Assets}\ (= A) + TB(A) - BC(A).$$
Summing up,
$$E(A) \equiv \text{Equity} = A - \left(1 - p_B(A)\right)(1-\tau)\frac{C}{r} - p_B(A)A^B.$$
Equity equals (i) the value of the assets, A; minus (ii) the present value of debt contingent on
no-bankruptcy net of tax benefits, $(1 - p_B(A))(1-\tau)\frac{C}{r}$; minus (iii) the present value of debt
contingent on bankruptcy net of bankruptcy costs, $p_B(A)A^B$. The second term decreases with
the default boundary, $A^B$ or, equivalently, $p_B(A)$. The third term, instead, increases with $A^B$.
So the time equityholders wait before declaring bankruptcy, which is inversely related to $A^B$,
affects in opposite ways the last two terms. Equityholders choose $A^B$ to maximize the value of
equity. Their solution is a default boundary, $A^B$, such that the value of equity does not change
for small changes in the value of the assets A around $A^B$, or $A^B: E'(A)|_{A=A^B} = 0$, a smooth
pasting condition. The result is,
$$A^B = (1-\tau)\frac{C}{r + \frac{1}{2}\sigma^2}.$$
Similarly as in the American option case, the value of the option to wait increases with uncertainty,
$\sigma^2$. Finally, and consistent with the real option theory, it is easy to check that this value
$A^B$ does maximize the value $E(A; A^B) \equiv E(A)$ in that $A^B: \partial E(A; A^B)/\partial A^B = 0$.
How is it that tax shielding does not seem to affect the existence of a solution to this problem?
That is, the default boundary, AB , still exists, even with τ = 0. This issue is easily resolved.
If τ = 0, there are no reasons to issue debt in the first place, as no tax benefits would flow
to the firm, thereby increasing its value! In fact, when τ > 0, there is a value of leverage that
maximizes the value of the firm, according to simulations reported in Leland (1994).
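The closed-form expressions of this section are easy to check numerically. The sketch below evaluates Eqs. (12.9) and (12.10), the tax benefits, bankruptcy costs and equity, and verifies the smooth pasting condition by finite differences. The parameter values are hypothetical, chosen only for illustration, and the function name is not from the text.

```python
def leland(A, C, r, sigma, tau, alpha, A_B=None):
    """Debt, equity, tax benefits and bankruptcy costs in Leland's model.

    If A_B is None, the equity-maximizing default boundary is used.
    """
    if A_B is None:
        A_B = (1.0 - tau) * C / (r + 0.5 * sigma ** 2)
    p_B = (A_B / A) ** (2.0 * r / sigma ** 2)                    # Eq. (12.10)
    debt = (1.0 - p_B) * C / r + p_B * (1.0 - alpha) * A_B       # Eq. (12.9)
    tax_benefits = (1.0 - p_B) * tau * C / r
    bankruptcy_costs = p_B * alpha * A_B
    equity = A - (1.0 - p_B) * (1.0 - tau) * C / r - p_B * A_B
    return {"A_B": A_B, "p_B": p_B, "debt": debt, "equity": equity,
            "TB": tax_benefits, "BC": bankruptcy_costs}

# Hypothetical inputs: coupon 5, r = 5%, sigma = 20%, tax rate 35%, alpha = 50%.
params = dict(C=5.0, r=0.05, sigma=0.20, tau=0.35, alpha=0.50)
out = leland(100.0, **params)
A_B = out["A_B"]

# Smooth pasting: E'(A) = 0 at A = A_B (one-sided finite difference).
h = 1e-5
slope_at_boundary = (leland(A_B + h, A_B=A_B, **params)["equity"]
                     - leland(A_B, A_B=A_B, **params)["equity"]) / h

print(f"A_B = {A_B:.4f}, equity = {out['equity']:.4f}, debt = {out['debt']:.4f}")
print(f"E'(A_B) is approximately {slope_at_boundary:.2e}")
```

Evaluating equity at A = 100 for boundaries slightly above or below the optimal one gives a lower value, in line with the first-order condition with respect to the default boundary.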
12.3.1.5 Pros and cons of structural approaches to risky debt assessment
Pros. First, they allow us to think about more complicated structures or instruments easily (e.g.,
convertibles). Second, they lead to simple yet consistent relations between different securities
issued by the same name. Structural approaches were very useful for theoretical research in the
1990s.
Cons. The firm's asset value and asset volatility are not observed, so one must rely on
calibration/estimation methods. Bond prices generated by the model ≠ market prices. These models
are a bit difficult to use in practice, for trading or hedging purposes, as we know that in this
case we need theoretical prices that exactly match market prices. Finally, how do we deal with
sovereign issuers?
Most important. Structural models predict unrealistically low short-term spreads: see, e.g.,
Figure 12.3. The intuition is that diffusion processes are smooth: the probability of default tends
to zero as time to maturity approaches zero, because default cannot just jump in an unexpected
way. This is not exactly what we observe. Jumps seem to be a more realistic device for modeling
spreads.
12.3.2 Reduced form approaches: rare events, or intensity, models

Default often displays a few strong characteristics. It arrives unexpectedly, it is rare, and it
causes discontinuous price changes. The structural models in the previous section do not
accommodate these features: diffusion processes are continuous, so passage times are known,
“locally.” This feature is responsible for the low short-term spreads.
12.3.2.1 Poisson-driven defaults
We model the arrival of defaults through the Poisson processes introduced in Chapter 4, as
follows. Suppose we “count” the number of times some event happens. Denote with Nt the
corresponding “counting process.”
[Figure: a sample path of the counting process Nt against time, with jump times t0, t1, t2, t3; default occurs at t0.]
FIGURE 12.6.
Default time is simply defined as t0 , i.e. the first time Nt “jumps,” as in Figure 12.6. So
assume we chop a given interval [0, T ] in n pieces, and consider each resulting interval ∆t = T/n.
Assume jump probability over each of these small intervals of time ∆t is proportional to ∆t,
with proportionality factor equal to λ,
p ≡ Pr {One jump over ∆t} = λ∆t. (12.11)
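Assume the number of jumps over the n intervals follows a binomial distribution:
$$\Pr\{k \text{ jumps over } [0,T]\} = \binom{n}{k}p^k(1-p)^{n-k}, \quad \text{where } p = \lambda\frac{T}{n}.$$
For n large (or, equivalently, for small intervals ∆t),
$$\Pr\{k \text{ jumps over } [0,T]\} \approx \frac{(\lambda T)^k}{k!}\left(1 - \frac{\lambda T}{n}\right)^{n-k} \approx \frac{(\lambda T)^k}{k!}e^{-\lambda T}.$$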
We can use the previous basic computations to come up with a few fundamental properties
for the distribution of default. We have,
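$$\begin{aligned}
\Pr\{\text{Survival}\} &= \Pr\{0 \text{ jumps over } [0,T]\} = e^{-\lambda T} \\
\Pr\{\text{Default}\} &= 1 - \Pr\{\text{Survival}\} = 1 - e^{-\lambda T} \\
\Pr\{\text{Default occurs at some } t\} &= \lambda e^{-\lambda t}\,dt \\
\text{Expected Time-to-Default} &= \frac{1}{\lambda}
\end{aligned}$$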
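The passage from the binomial to the Poisson distribution can be illustrated numerically. The following sketch compares the two probabilities for a large number of subintervals n; the values λ = 0.04 and T = 5 are chosen only for illustration, and the function names are hypothetical.

```python
import math

def binomial_jump_prob(k, lam, T, n):
    """Pr{k jumps over [0, T]} with n subintervals, each with jump probability lam*T/n."""
    p = lam * T / n
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_jump_prob(k, lam, T):
    """Limit of the binomial probability as n grows large."""
    return (lam * T) ** k / math.factorial(k) * math.exp(-lam * T)

lam, T, n = 0.04, 5.0, 10_000
for k in range(4):
    b = binomial_jump_prob(k, lam, T, n)
    q = poisson_jump_prob(k, lam, T)
    print(f"k = {k}: binomial = {b:.6f}, Poisson = {q:.6f}")

survival = poisson_jump_prob(0, lam, T)   # e^{-lam*T}
default = 1.0 - survival
```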
We can now use these probabilities to assess the value of debt subject to default risk. Consider
Eq. (12.6):
$$D_0 = e^{-rT}\underbrace{\left[\text{Rec}\cdot Q(\text{Default}) + N\cdot Q(\text{Survival})\right]}_{\equiv B_0},$$
where Rec is the expected recovery value of the asset. Using the probabilities predicted by the
Poisson model, we obtain:
$$B_0 = \text{Rec}\cdot\left(1 - e^{-\lambda T}\right) + N\cdot e^{-\lambda T}. \qquad (12.12)$$
The Appendix supplies an alternative derivation of Eq. (12.12).
12.3.2.2 Predicted spreads
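The implications for spreads, for small maturities T , are easily seen, after some innocuous
approximations,
$$\text{Spread} = -\frac{1}{T}\ln\frac{B_0}{N} \approx -\frac{1}{T}\left(\frac{B_0}{N} - 1\right) = \frac{1}{T}\left(1 - e^{-\lambda T}\right)\cdot\text{Loss-given-default}.$$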
Note, for T small, and in contrast to the structural models reviewed in Section 12.3.1, the
spread is not zero. Rather, it is given by the expected default loss per period, defined as the
instantaneous probability of default times loss-given-default,
Short-Term Spread = λ · Loss-given-default.
Therefore, models with jumps are able to rationalize the empirical behavior of credit spreads
at short maturities, discussed in Section 12.3.1. As explained, structural models, which are
typically driven by Brownian motions, cannot lead to positive short-term spreads, as they imply that the
probability of default decays quickly as time-to-maturity goes to zero. Instead, in models with
jumps, there is always a possibility of “sudden death” for the firm: at any instant of time, and
even when the debt is about to expire, default can occur with positive probability, and this
fact is, then, reflected by positive short-term spreads. A theoretical model of Duffie and Lando
(2001) shows how a structural model of the firm can lead to positive short-term spreads, once
we assume incomplete information and learning about the assets value. In their model, learning
takes place with some delay, which leaves investors concerned about what they really know
about the firm’s asset value. It is this concern that leads to positive credit spreads in their
model.
Figure 12.7 depicts the behavior of the spread predicted by the model at all maturities, given
by,
$$\text{Spread} = -\frac{1}{T}\ln\frac{B_0}{N} = -\frac{1}{T}\ln\left[\frac{\text{Rec}}{N}\cdot\left(1 - e^{-\lambda T}\right) + e^{-\lambda T}\right].$$
[Figure: spread (vertical axis, in basis points, from 231 to 240) against time to maturity (horizontal axis, from 0 to 5 years).]
FIGURE 12.7. The term structure of bond spreads (in basis points) implied by an intensity
model, with recovery rate equal to 40% and intensity equal to λ = 0.04, implying an
expected time-to-default equal to λ−1 = 25 years.
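The term structure in Figure 12.7 can be traced out directly from the spread formula above. The sketch below uses the parameters stated in the caption (recovery rate 40%, λ = 0.04); the function name is hypothetical.

```python
import math

def intensity_spread(T, lam, recovery_rate):
    """Spread implied by Eq. (12.12), with Rec/N equal to recovery_rate."""
    survival = math.exp(-lam * T)
    return -math.log(recovery_rate * (1.0 - survival) + survival) / T

lam, recovery_rate = 0.04, 0.40
maturities = [0.25, 1.0, 2.0, 3.0, 4.0, 5.0]
spreads_bp = [1e4 * intensity_spread(T, lam, recovery_rate) for T in maturities]

for T, s in zip(maturities, spreads_bp):
    print(f"T = {T:4.2f} years: spread = {s:6.2f} bp")

# Short-term limit: lam * loss-given-default = 0.04 * 0.60 = 240 basis points.
short_term_bp = 1e4 * intensity_spread(1e-4, lam, recovery_rate)
```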
The spread is a decreasing function of time-to-maturity. Eventually, as time to maturity gets large, the
bond becomes, so to speak, certain to default, with the unusual feature that it delivers, for sure,
the recovery rate at some point. Indeed,
in Appendix 1, we show that if the recovery value of the bond is not constant, but shrinks
exponentially to zero as $\text{Rec}_T \equiv R\cdot e^{-\kappa T}$, for two constants R and κ, then, asymptotically, the
spread is
$$\lim_{T\to\infty}s(T) = \begin{cases}\lambda, & \text{if } \kappa \ge \lambda \\ \kappa, & \text{if } \kappa \le \lambda\end{cases} \qquad (12.13)$$
A few additional issues. λ is the risk-neutral instantaneous probability of default, not the
physical probability of default, λ∗ say. The ratio λ/λ∗ is generally larger than one. Its inverse,
λ∗ /λ, is an indicator of the risk-appetite in the credit market. Similarly, loss-given-default is
an expectation under the risk-neutral probability, and should contain useful indications about
market participants' risk appetite.
Assume that under the risk-neutral probability, the instantaneous intensity of default for a
given firm is λ = 4%, annualized, and that under the physical probability, the instantaneous
probability of default for the same firm is λ∗ = 2%, annualized. From here, we can compute the
probability of survival of the firm within 5 years, under both probabilities. They are:
$$e^{-5\lambda} = e^{-5 \times 0.04} = 0.81873, \qquad e^{-5\lambda^*} = e^{-5 \times 0.02} = 0.90484.$$
Naturally, the probability of survival is lower under the risk-neutral measure.

Next, assume that the spread on a 5 year bond with face value N = 1, equals 3%. What is
the implied expected recovery rate from this spread? We have,
$$D_0 = e^{-rT}\left[\text{Rec}\cdot Q(\text{Default}) + N\cdot Q(\text{Survival})\right] = e^{-rT}\left[\text{Rec}\times(1 - 0.81873) + 1\times 0.81873\right].$$

The spread is,
$$s_0 = 3\% = -\frac{1}{5}\ln\frac{\text{Rec}\times(1 - 0.81873) + 1\times 0.81873}{1}.$$
Solving for Rec, gives, Rec = 23.16%.
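The calculations in this example can be replicated as follows. The sketch inverts the spread formula for Rec; the variable names are hypothetical, and all inputs are those stated in the example.

```python
import math

lam_q, lam_p = 0.04, 0.02   # risk-neutral and physical intensities
T, N, spread = 5.0, 1.0, 0.03

surv_q = math.exp(-lam_q * T)   # survival probability under Q
surv_p = math.exp(-lam_p * T)   # survival probability under P

# s0 = -(1/T) ln[(Rec*(1 - surv_q) + N*surv_q) / N], solved for Rec:
rec = N * (math.exp(-spread * T) - surv_q) / (1.0 - surv_q)

print(f"Survival (Q) = {surv_q:.5f}, survival (P) = {surv_p:.5f}")
print(f"Implied recovery = {rec:.2%}")
```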
12.3.3 Ratings
In practice, corporate debt is rated by rating agencies, such as Moody's and Standard and Poor's.
Depending on the rating, corporate debt may be either investment grade or non-investment
grade (“junk”). Moody's ratings range from Aaa to C. Standard and Poor's ratings range from AAA to
D. One can compute the probability of “migrations” based on past experience: these are the transition
probabilities. Consider, for example, the following table:
One year rating transition probabilities (%), S&P's 1981-1991

From \ To   AAA     AA      A       BBB     BB      B       CCC     D
AAA         89.1    9.63    0.78    0.19    0.3     0       0       0
AA          0.86    90.1    7.47    0.99    0.29    0.29    0       0
A           0.09    2.91    88.94   6.49    1.01    0.45    0       0.09
BBB         0.06    0.43    6.56    84.27   6.44    1.6     0.18    0.45
BB          0.04    0.22    0.79    7.19    77.64   10.43   1.27    2.41
B           0       0.19    0.31    0.66    5.17    82.46   4.35    6.85
CCC         0       0       1.16    1.16    2.03    7.54    64.93   23.19
D           0       0       0       0       0       0       0       100

TABLE 12.1
12.3.3.1 Foundations
A natural approach, then, is to assess credit risk by making reference to probabilities of default
built up on transition probabilities like those in Table 12.1.
Such an approach, also known as a migration approach, is somewhat less drastic than that
based on rare events, and hopefully more realistic. However, it is also technically more complex
than the intensity approach of the previous section. We provide the most foundational issues
related to this approach, leaving some details in the Appendix.
At time t, there exist several rating classes, N say, denoted as $\text{Rat}_t$,
$$\text{Rat}_t \in \{1, 2, \cdots, N\}.$$
Transition probabilities of rating from time t to time T are,
$$P(T-t)_{ij} \equiv \Pr(\text{Rat}_T = j \mid \text{Rat}_t = i), \quad i, j \le N.$$
We can build a Markov chain from here, by assuming that $P(T-t)_{ij}$ only depends on T − t.
Finally, we must have that,
$$P(T-t)_{ij} \ge 0 \quad\text{and}\quad \sum_{j=1}^{N}P(T-t)_{ij} = 1.$$
For example, the probability of transition from rating $\text{Rat}_t = i$ to rating $\text{Rat}_{t+1} = j$ in one
year is $P(1)_{ij}$. Table 12.1 contains one possible example of $P(1)_{ij}$. The probability of transition
from rating $\text{Rat}_t = i$ to rating $\text{Rat}_{t+2} = j$ in two years is $P(2)_{ij}$, and is obtained as follows,
$$P(2)_{ij} = \sum_{k=1}^{N}\underbrace{P(1)_{ik}}_{\Pr(\text{transition from } i \text{ to } k \text{ in one year})}\cdot\underbrace{P(1)_{kj}}_{\Pr(\text{transition from } k \text{ to } j \text{ in one further year})}$$
More generally, we have, $P(T) = P(1)^T$, where $P(T)$ is the matrix with elements $\{P(T)_{ij}\}$.
For example, the probability transition matrix P in Table 12.1 is (entries in percent),
$$P = \begin{pmatrix}
89.1 & 9.63 & 0.78 & 0.19 & 0.3 & 0 & 0 & 0 \\
0.86 & 90.1 & 7.47 & 0.99 & 0.29 & 0.29 & 0 & 0 \\
0.09 & 2.91 & 88.94 & 6.49 & 1.01 & 0.45 & 0 & 0.09 \\
0.06 & 0.43 & 6.56 & 84.27 & 6.44 & 1.6 & 0.18 & 0.45 \\
0.04 & 0.22 & 0.79 & 7.19 & 77.64 & 10.43 & 1.27 & 2.41 \\
0 & 0.19 & 0.31 & 0.66 & 5.17 & 82.46 & 4.35 & 6.85 \\
0 & 0 & 1.16 & 1.16 & 2.03 & 7.54 & 64.93 & 23.19 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 100
\end{pmatrix}$$
The 15 year transition matrix is:
$$P(15) \approx \begin{pmatrix}
20.01 & 35.82 & 23.91 & 9.92 & 4.05 & 3.06 & 0.43 & 2.66 \\
3.38 & 30.28 & 32.71 & 15.91 & 6.38 & 5.11 & 0.77 & 5.34 \\
1.17 & 13.12 & 34.21 & 21.93 & 9.69 & 8.01 & 1.29 & 10.33 \\
0.64 & 6.76 & 22.21 & 22.40 & 12.42 & 11.93 & 2.09 & 21.39 \\
0.33 & 3.22 & 10.71 & 13.616 & 11.36 & 14.68 & 2.78 & 43.16 \\
0.14 & 1.65 & 5.01 & 6.75 & 7.48 & 13.17 & 2.64 & 63.04 \\
0 & 1.08 & 3.54 & 3.90 & 3.51 & 5.60 & 1.22 & 81.02 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 100
\end{pmatrix}$$
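A multi-year transition matrix of this kind can be obtained by raising the one-year matrix to the relevant power. The sketch below does so for Table 12.1 with plain Python lists; the helper names are hypothetical. Because the rows of Table 12.1 are rounded and do not sum exactly to 100, the rows of the resulting matrix sum to one only approximately.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mat_pow(P, T):
    """P raised to the integer power T, by repeated multiplication."""
    n = len(P)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(T):
        result = mat_mul(result, P)
    return result

ratings = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]
table_12_1 = [  # one-year transition probabilities, in percent
    [89.1, 9.63, 0.78, 0.19, 0.3, 0, 0, 0],
    [0.86, 90.1, 7.47, 0.99, 0.29, 0.29, 0, 0],
    [0.09, 2.91, 88.94, 6.49, 1.01, 0.45, 0, 0.09],
    [0.06, 0.43, 6.56, 84.27, 6.44, 1.6, 0.18, 0.45],
    [0.04, 0.22, 0.79, 7.19, 77.64, 10.43, 1.27, 2.41],
    [0, 0.19, 0.31, 0.66, 5.17, 82.46, 4.35, 6.85],
    [0, 0, 1.16, 1.16, 2.03, 7.54, 64.93, 23.19],
    [0, 0, 0, 0, 0, 0, 0, 100],
]
P1 = [[x / 100.0 for x in row] for row in table_12_1]
P15 = mat_pow(P1, 15)

# 15-year default probabilities by initial rating (last column), in percent.
for name, row in zip(ratings, P15):
    print(f"{name:>3}: {100 * row[-1]:6.2f}")
```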
12.3.3.2 Evaluation
The previous probabilities, {P (T )ij }, are meant to be taken under the physical world, not
the risk-neutral world. They can be used for risk-management purposes, but certainly not for
pricing. Indeed, historical default rates are too low to explain the price of defaultable securities.
A natural explanation relies on the presence of risk-premia. To use migration data for pricing,
it is vital to implement a number of steps.
First, clean up the data – smoothing. For example, it might well be that downgrades from
class i to class i + 2 are more frequent than downgrades from class i to class i + 1. Moreover,
remove zero entries: although some rating event did not happen in the past, they might well
occur in the future. Second, add positive risk-premia to the previous smoothed data so as to
obtain realistic asset prices.
As regards pricing, according to the migration model, there are N classes of assets. Each
single asset may migrate from one class to another. Because evaluation is a dynamic business,
we cannot evaluate defaultable securities within a given asset class without simultaneously
evaluating the defaultable securities in the remaining classes. For example, there could be a
chance that a given asset will “mutate” into a different one in the next year. Given this, the
price of this asset, today, must reflect the price of the asset in the other classes where it can
possibly migrate. Hence, we must simultaneously solve for all the asset prices in all the rating
classes. This approach, developed by Jarrow, Lando and Turnbull (1997), is quite complex and
is given a succinct account in the Appendix.
Consider the simplest case, which arises when the expected recovery rate is zero. In this case,
by Eq. (12.6),
$$\frac{D_{0,i}}{N} = e^{-rT}\left(1 - Q_i(T-t)\right),$$
where $Q_i(T-t)$ is the risk-neutral probability the firm defaults, by time T , given it belongs
to rating i at time t.
More generally, by Eq. (12.6),
$$\frac{D_{0,i}}{N} = e^{-rT}\left[\frac{\text{Rec}}{N}Q_i(T-t) + \left(1 - Q_i(T-t)\right)\right].$$
The risk neutral probabilities, $Q_i(T-t)$, must be found using migration frequencies such as
those in Table 12.1, which we must “clean up” and correct with appropriate risk-premia, as
discussed.
Consider the following transition matrix:

From \ To   A      B      Def
A           0.90   0.07   0.03
B           0.15   0.75   0.10
Def         0      0      1

where Def denotes the state of default. What is the probability that a name A will remain name
A in two years? What is the probability that a name A will default in two years?
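Consider the following two year transition matrix:
$$Q(2) = \underbrace{\begin{pmatrix}0.90 & 0.07 & 0.03 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1\end{pmatrix}}_{\equiv Q(1)}\cdot\underbrace{\begin{pmatrix}0.90 & 0.07 & 0.03 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1\end{pmatrix}}_{\equiv Q(1)},$$
such that:
$$\Pr\{A \text{ is } A \text{ in 2 years}\} = \underbrace{0.90 \times 0.90}_{A\to A\to A} + \underbrace{0.07 \times 0.15}_{A\to B\to A} + \underbrace{0.03 \times 0}_{A\to\text{Def}\to A} = 0.8205,$$
and
$$\Pr\{A \text{ defaults in 2 years}\} = \underbrace{0.90 \times 0.03}_{A\to A\to\text{Def}} + \underbrace{0.07 \times 0.10}_{A\to B\to\text{Def}} + \underbrace{0.03 \times 1}_{A\to\text{Def}\to\text{Def}} = 0.064.$$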
In general, we have that:
$$Q(2)_{ij} = \sum_{k=1}^{3}Q(1)_{ik}\cdot Q(1)_{kj},$$
and for any T ,
$$Q(T) = Q(1)^T = \begin{pmatrix}0.90 & 0.07 & 0.03 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1\end{pmatrix}^T.$$
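As a check on the two-year probabilities computed above, the one-year matrix can be multiplied by itself. The sketch below does this with plain Python lists; the helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mat_pow(Q, T):
    """Q raised to the integer power T, by repeated multiplication."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(T):
        result = mat_mul(result, Q)
    return result

# States are ordered as A, B, Def.
Q1 = [[0.90, 0.07, 0.03],
      [0.15, 0.75, 0.10],
      [0.00, 0.00, 1.00]]
Q2 = mat_pow(Q1, 2)

prob_A_stays_A = Q2[0][0]    # first row, first column
prob_A_defaults = Q2[0][2]   # first row, third column
print(f"Pr(A is A in 2 years)     = {prob_A_stays_A:.4f}")
print(f"Pr(A defaults in 2 years) = {prob_A_defaults:.4f}")
```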
Next, consider the following transition matrix, under the risk-neutral probability:

From \ To   A      B      Def
A           0.80   0.20   0
B           0.15   0.75   0.10
Def         0      0      1

From here, we may easily compute, again, the (risk-neutral) probability A will default in two
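years, and the probability B will default in two years. We have,
$$Q(2) = \underbrace{\begin{pmatrix}0.80 & 0.20 & 0 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1\end{pmatrix}}_{\equiv Q(1)}\cdot\underbrace{\begin{pmatrix}0.80 & 0.20 & 0 \\ 0.15 & 0.75 & 0.10 \\ 0 & 0 & 1\end{pmatrix}}_{\equiv Q(1)},$$
such that:
$$\Pr\{A \text{ defaults in 2 years}\} = Q(2)_{13} = 0.80 \times 0 + 0.20 \times 0.10 + 0 \times 1 = 0.02$$
(multiply first row by the third column), and,
$$\Pr\{B \text{ defaults in 2 years}\} = Q(2)_{23} = \underbrace{0.15 \times 0}_{B\to A\to\text{Def}} + \underbrace{0.75 \times 0.10}_{B\to B\to\text{Def}} + \underbrace{0.10 \times 1}_{B\to\text{Def}\to\text{Def}} = 0.175$$
(multiply second row by the third column).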
Finally, suppose that the bonds issued by both A and B mature in two years. Furthermore,
assume that if these two bonds default, they pay off the same recovery rate, equal to 30%, and
only at the end of the second period. From here, we can compute the credit spreads for the two
bonds. We have,
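$$e^{rT} \times \text{Price A} = 0.30 \times 0.02 + (1 - 0.02) = 0.986 \;\Rightarrow\; \text{Spread A} = -\frac{1}{2}\ln(0.986) = 7.0495 \times 10^{-3},$$
and,
$$e^{rT} \times \text{Price B} = 0.30 \times 0.175 + (1 - 0.175) = 0.8775 \;\Rightarrow\; \text{Spread B} = -\frac{1}{2}\ln(0.8775) = 6.5339 \times 10^{-2}.$$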
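The whole chain of calculations in this example, from the risk-neutral transition matrix to the two spreads, can be sketched in a few lines. The function names are hypothetical; the inputs are those of the example (two-year maturity, recovery rate of 30% paid at the end of the second period, face value normalized to one).

```python
import math

def two_year_default_prob(Q1, i, default_state=2):
    """Element (i, default_state) of Q(1) * Q(1): row i times the default column."""
    return sum(Q1[i][k] * Q1[k][default_state] for k in range(len(Q1)))

def spread_from_default_prob(q_default, recovery_rate, T):
    """Spread = -(1/T) ln[recovery_rate * Q(Default) + Q(Survival)]."""
    return -math.log(recovery_rate * q_default + (1.0 - q_default)) / T

# Risk-neutral one-year matrix; states are ordered as A, B, Def.
Q1 = [[0.80, 0.20, 0.00],
      [0.15, 0.75, 0.10],
      [0.00, 0.00, 1.00]]

q_A = two_year_default_prob(Q1, 0)
q_B = two_year_default_prob(Q1, 1)
spread_A = spread_from_default_prob(q_A, 0.30, 2.0)
spread_B = spread_from_default_prob(q_B, 0.30, 2.0)

print(f"Q(A defaults in 2 years) = {q_A:.3f}, spread A = {spread_A:.4%}")
print(f"Q(B defaults in 2 years) = {q_B:.3f}, spread B = {spread_B:.4%}")
```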
12.4 Derivatives on corporate assets

12.4.1 Callable and puttable bonds
Callable bonds are corporate bonds that give the issuer the right to buy them back at certain
times and predetermined prices. Puttable bonds, on the other hand, give the investor the right
to sell them back to the issuer at a certain strike price. Let us focus attention on callable
bonds. The next chapter illustrates how to price these assets through trees. In this section, we
illustrate some useful properties of callable bonds, with the help of a few simple points. For
simplicity, we consider a callable, non-defaultable bond.
Let K be the strike price of the callable bond maturing at time S, and suppose that the date
of exercise, if any, is some future time T < S. Then, the payoff of the callable bond at time T
is worth min {K, P }, where P is the price of a non-callable bond. Indeed, if K < P , then, the
issuer can buy its bonds back at K and re-issue the same bond at better market conditions, P .
The difference, P − K, is just a net gain for the issuer. So the callable bond is worth just K
when K < P . Instead, if K > P , the issuer does not have any incentives to exercise and, then,
the value of the callable bond is just that of a non-callable bond. Therefore, the callable bond
is worth P when K > P .
It is easy to see that,
min {P, K} = P − max {P − K, 0} .
Therefore, we see that the price of a callable bond with maturity date S, equals the price of
a non-callable bond with the same maturity date S, minus the value to call the bond, which
equals the price of a hypothetical option on the non-callable bond, struck at K.
We can apply these insights to price a callable bond in a concrete example. Consider, for
example, the short-term rate in the Vasicek’s model discussed in Chapter 11. Then, if the
short-term rate is r (t) at time t, the value as of time t of the non-defaultable zero coupon bond
maturing at time S, callable at time T < S, at a strike price equal to K, is,
P callable (r (t) , t, T, S) = P (r (t) , t, S) − C b (r(t), t, T, S) , (12.14)
where P (r (t) , t, S) is the value of the non-callable zero maturing at time S, and C b (r(t), t, T, S)
is the value of a call option on the non-callable S-zero, maturing at time T and having a strike
price equal to K.
Eq. (12.14) shows that the presence of the option to call the bond raises the cost of capital
for the issuer.
In the context of the Vasicek’s model, the solution to C b (r(t), t, T, S) in Eq. (12.14) is given
by the Jamshidian’s (1989) formula given in Chapter 11, which we now use below. Figure 12.8
below depicts the behavior of the price of the callable bond in Eq. (12.14), P callable (r, 0, T, S),
as a function of the short-term rate, r, when the exercise price, K = 0.65, and S = 10, T = 0.5,
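κ = 0.2, σ = 0.03, θ̄ = 0.06 ∗ κ − λ, where λ, the unit risk-premium, equals $-1.7146 \times 10^{-2}$. (See footnote 3.)
Footnote 3: To evaluate Eq. (12.14), we make use of the closed-form solution for the bond price, given by $P(r,t,T) = e^{A(T-t) - B(T-t)\cdot r}$,
where the functions A and B are given by $A(T-t) = -\left(T - t - \frac{1-e^{-\kappa(T-t)}}{\kappa}\right)\bar{r} - \frac{\sigma^2}{4\kappa^3}\left(1-e^{-\kappa(T-t)}\right)^2$, $\bar{r} = \frac{1}{\kappa}\bar{\theta} - \frac{1}{2}\frac{\sigma^2}{\kappa^2}$ and
$B(T-t) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-t)}\right)$.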
[Figure: bond price (vertical axis, from 0.50 to 0.70) against the short-term rate (horizontal axis, from 0.00 to 0.10).]
FIGURE 12.8. “Negative convexity.” Solid line: the price of a callable bond. Dashed line:
the price of a non-callable bond. The price of a callable bond exhibits negative convexity
with respect to the short-term rate.
As Figure 12.8 illustrates, the convexity of the non-callable bond price is destroyed by the
convexity of the price of the option embedded in the callable bond. Intuitively, as the short-term
rate gets small, callable and non-callable bond prices increase. However, the price of callable
bonds increases less because as the short-term rate decreases, bond prices increase and then, the
probability the issuer will exercise the option, at maturity, increases. This makes the risk-neutral
distribution of the callable bond price markedly shifted towards the value of the strike price,
K = 0.65, which entails a progressively lower decay rate for the bond price as the short-term
rate gets small.
12.4.2 Convertibles
We only consider corporate convertible bonds. Convertible bonds offer bondholders the option
to convert their bonds into shares of the firm. The option to convert can be exercized at any
time up to maturity. How do they work? By definition, the face value of the convertible is,
Face value = $1 ≡ CR ∗ CP, (12.15)
where CR is the conversion ratio, i.e. the number of shares this face value converts into, and
CP is the conversion price, i.e. the stock price implicitly defined by Eq. (12.15).
Typically, the bond is like any other fixed income instrument, with coupon payments, callable
features, credit risk, etc. Callable features are almost invariably embedded into this type of
contract. The parity, or conversion value, is the value of the bond if the bondholders decide
to convert. It is defined as,
CV = CR ∗ S,
where S is the price of the common share. Not only is the convertible bond price affected by
interest rates, credit risk, timing risk, etc. This price is also affected by the movements of the
underlying stock price. This is quite natural as there is a positive probability that the bond
will “become” a share in the future. To emphasize this, we also say that convertible bonds
are hybrid instruments. The embedded option offers the bondholders the possibility to obtain
equity returns (not just bond returns) in good times, while offering protection against the
downside. As mentioned earlier, convertible bonds are almost always callable. The holder can
always convert the bond, once it has been called. The rationale behind callability is to induce
the bondholder to convert the bond earlier.
The pricing problem of convertible bonds has been intensively studied, theoretically. Inger-
soll (1977) provides the first theoretical article which lays down the foundations to rational
evaluation of convertible callable bonds. Let us define the dilution factor, denoted as γ, as the
fraction of common equity that would be held by the convertible bondowners if the entire issue
was converted. If there are nout shares outstanding, and the convertible bond can be exchanged
for n shares, then, in aggregate,
n
γ= .
n + nout
Let V be the market value of the firm and B conv (V, τ ; N ) the aggregate value of the convertible
bond with time to maturity τ and balloon payment N. To simplify the presentation, we do not
consider callability issues. However, we shall provide some intuition about this issue later. Let us
assume that the stocks and the convertible bonds are the only two claims in the capital structure
of the firm. Since, after conversion, only the stocks will remain, then, the post-conversion value
of the convertible bonds is simply the conversion value of the convertible, i.e. γV . Moreover,
we have, for any τ ≥ 0,
γV ≤ B conv (V, τ ; N) ≤ V. (12.16)
The first inequality in (12.16) is simple to understand. Indeed, suppose that B conv (V, τ ; N) <
γV . Then, we can purchase the convertibles, convert them into shares and, finally, sell the shares
for γV . The second inequality follows by the limited liability of equity holders, and the
Modigliani-Miller theorem.
At maturity, we have that,
B conv (V, 0; N) = min {V, max {N, γV }} . (12.17)
Indeed, B̄ ≡ max {N, γV } is the value of the convertible, in case of no-default. Then, min{V, B̄}
is what the firm will pay, to the bondholders: V in case of default, and B̄ in case of no-default.
It is possible to show that it is never optimal to exercize the option to convert before maturity.
Therefore, to price the convertible bond, we only need to be concerned with the risk-neutral
evaluation of the terminal payoff in Eq. (12.17).
We can re-express the terminal payoff in Eq. (12.17) in a manner that allows a better un-
derstanding of the issues underlying the exercise of the convertibles. In particular, we have
that,
B conv (V, 0; N ) = min {V, max {N, γV }} = max {γV, min {V, N }} . (12.18)
Indeed, let B̂ ≡ min{V, N}, which is what the firm is ready to pay, to the bondholders, if the
bondholders do not exercise the option to convert. Then, max{γV, B̂} is obviously the payoff
profile to the bondholders.
The terminal payoff in Eq. (12.18) illustrates very clearly that convertible bonds embed an
option to convert - on top of the plain vanilla non-convertible bond. Intuitively, at maturity, a
non-convertible bond is worth min {V, N }, and the option to convert is either worthless (in case
of non conversion) or γV − N (in case of conversion), i.e. it is max {γV − N, 0}. This intuition
is confirmed, mathematically, as we have that:

max {γV, min {V, N }} = min {V, N} + max {γV − N, 0} .
Therefore, the value of the terminal payoff is, by Eq. (12.18),
B conv (V, 0; N) = min {V, N } + γ max {V − N/γ, 0} . (12.19)
Eq. (12.19) shows that the current value of the convertible bond is the sum of the value of
a “straight” bond plus the value of γ options on the firm with strike price equal to N/γ.
Accordingly, let B (V, τ ; N ) and W (V, τ ; N/γ) be the prices of the straight bond and the option
on the firm. We have,
B conv (V, τ ; N) = B (V, τ ; N ) + γW (V, τ ; N/γ) . (12.20)
We may use Merton's (1974) model to find the price of the straight bond, B (V, τ ; N). By
the results in Section 12.2, it is:
$$B(V,\tau;N) = V\Phi(-d_1) + Ne^{-r\tau}\Phi\left(d_1 - \sigma\sqrt{\tau}\right), \quad d_1 = \frac{\ln(V/N) + \left(r + \frac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}, \qquad (12.21)$$
where σ is the instantaneous volatility of the asset value, r is the (constant) instantaneous
short-term rate, and Φ is the cumulative distribution of a standard normal. Similarly, we may
use the Black-Scholes model to compute the function W .
Eq. (12.21) reveals the intuitive property that as V gets large, $B(V,\tau;N) \approx N e^{-r\tau}$: the
probability of default gets extremely tiny as the value of the assets gets large. Moreover, the
Black-Scholes model suggests that $W(V,\tau;N/\gamma) \approx V - e^{-r\tau}N/\gamma$ as V gets large. Therefore, by
Eq. (12.20), we have that, for large V, $B^{\mathrm{conv}}(V,\tau;N) \approx N e^{-r\tau} + \gamma\left(V - e^{-r\tau}N/\gamma\right) = \gamma V$. Eq.
(12.21) also shows that for small values of V, $B^{\mathrm{conv}}(V,\tau;N) \approx 0$.
To sum up, the value of the convertible bond is less than the value of the firm, V , and larger
than the conversion value, γV . Moreover, it approaches γV , as the value of the firm gets large.
Figure 12.9 below illustrates the shape of the convertible bond price, as a function of the value
of the firm.
[Figure: convertible bond value and straight bond value plotted against the value of the firm, V,
with reference levels γV and Ne^{−rτ}.]
FIGURE 12.9. The value of a convertible bond
The value of a callable convertible bond is between the value of the straight bond and the
value of the convertible bond.
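The decomposition in Eqs. (12.20)-(12.21) can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the parameter values (N = 100, γ = 0.25, r = 3%, σ = 30%, τ = 5) are hypothetical and chosen for illustration only. It prices the straight bond with Eq. (12.21), the option on the firm with the Black-Scholes formula, and prints the resulting convertible bond value next to the conversion value γV.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Cumulative distribution function of a standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def straight_bond(V, tau, N, r, sigma):
    """Straight bond value in Merton's (1974) model, Eq. (12.21)."""
    d1 = (log(V / N) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return V * norm_cdf(-d1) + N * exp(-r * tau) * norm_cdf(d1 - sigma * sqrt(tau))


def call_on_firm(V, tau, K, r, sigma):
    """Black-Scholes value of a European call on the firm value, strike K."""
    d1 = (log(V / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return V * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d1 - sigma * sqrt(tau))


def convertible_bond(V, tau, N, gamma, r, sigma):
    """Convertible bond value, Eq. (12.20): straight bond plus gamma calls struck at N/gamma."""
    return straight_bond(V, tau, N, r, sigma) + gamma * call_on_firm(V, tau, N / gamma, r, sigma)


# Hypothetical parameter values, for illustration only.
N, gamma, r, sigma, tau = 100.0, 0.25, 0.03, 0.30, 5.0
for V in (50.0, 150.0, 400.0, 2000.0):
    conv = convertible_bond(V, tau, N, gamma, r, sigma)
    print(f"V={V:8.1f}  straight={straight_bond(V, tau, N, r, sigma):8.3f}  "
          f"convertible={conv:9.3f}  gamma*V={gamma * V:8.3f}")
```

Under these assumed inputs, the printed values should lie between γV and V, and approach γV as V grows, in line with the bounds discussed above.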
12.5 (Credit-) risk shifting derivatives and structured products

12.5.1 Securitization, and a brief history of credit risk and financial innovation
Securitization is a process by which some illiquid assets are transformed into a package of securi-
ties backed by these assets, through packaging, credit and liquidity enhancements. Two leading
examples are: (i) mortgages and (ii) receivables. Financial institutions find the securitization
process attractive, as they can carve out certain items in their balance sheet, thus boosting
their return on investments or simply because by securitizing assets, less capital is needed to
meet capital-requirements standards. For example, the accounts receivables of a corporation
may be used to back the issue of commercial paper known as asset-backed commercial paper.
Securitization is a way (not the only way) to trade/transfer credit risk, in principle.
What are the origins of credit derivatives and financial innovation? The first interest rate
derivatives were created around the mid 1980s. In the late 1980s the business proliferated and
became fairly complex. But financial innovation is easy to imitate, which led banks to become
increasingly creative, so as to exploit their initial competitive advantage as long as possible.
During the early 1990s, just after the 1991 recession, interest rates were quite low, and the
volatility of capital markets extraordinarily low. Derivatives, then, could be used as devices
to boost investors’ returns. JPMorgan introduced many structures such as “LIBOR squared,”
“inverse floaters,” “power options,” “convexity forwards,” etc.
However, after the 1994 financial turmoil, the interest rate climate suddenly changed, and
many of the derivatives contracts produced large losses. There was a call for regulation by public
opinion and certain policy makers. At the same time, the International Swaps and Derivatives
Association argued that more regulation would destroy market creativity. These regulatory
pressures vanished by the mid 1990s.
During the mid 1990s, the markets started to innovate, again, with a view that risks could
be assessed, and controlled, through market discipline, rather than through regulation. Swaps
markets recovered. They did so slowly, though, as these products were already at the end of the
innovation cycle. They had been massively imitated, and margins for profits had consequently
been eroded. They had become, so to speak, a “mass-product.” The markets, then, were ready
for a new major innovation wave. The natural innovation had to do with credit. Market players
such as JPMorgan, Credit Suisse, Bankers Trust soon realized that borrower defaulting was a
source of substantial risk, which could be conveniently reallocated through the use of dedicated
derivatives. Credit risks could be transferred in pretty much the same way as market risks
can be transferred through the underwriting of options written on stocks, or on interest rates.
JPMorgan had serious motivations to innovate, as its books contained vast pools of loans, which
could be used as practical material to experiment with. Importantly, these loans required too
many reserves and were consequently expensive.
The main idea, then, was to repackage the loans into derivatives, in a way that default risk,
part of the securitized loans, or both, could be sold to outside investors. In a sense,
then, credit derivatives were also a regulatory mitigation device, partly useful as a response to
regulation. The idea was simple: turn loans into derivatives that could be sold, and/or create
new insurance products such as credit default swaps. At the very beginning, derivatives were just
designed to have single loans as underlying. Afterwards, the idea emerged to create structures
organized in derivatives bundles, with cash flow indexed in some way to baskets of loans–the
ancestors to collateralized debt obligations. For example, JPMorgan created “Bistro” (Broad
Index Secured Trust Offering), a structure relying on a variety of assets, ranging from corporate
debt to student loans. ABN AMRO created similar structures, “Heineken” and “Amstel”. But
then, competition increased and profit margins fell, which triggered the need for new innovation.
As explained in Section 12.4.7, these products were channeled through off-balance-sheet vehicles
escaping national supervision, a sort of “shadow banking system.” Basel II was not yet in place.
The response to increased competition was the creation of structured products having riskier
and riskier assets. For example, in the mid 1990s, derivatives teams began to interact with
teams managing loans extended to borrowers with poor credit history, that is, subprime mortgages. As
a result, subprime loans began to be securitized and then structured into CDOs. JPMorgan was
not the leader in the creation of these products, compared to other institutions such as Merrill
Lynch or UBS. Ironically, JPMorgan itself bought Bear Stearns in the spring of 2008.
The subprime turmoil arose out of mechanics that are by now well-understood. First, there
was a boom, sustained by (i) low interest rates and house price appreciation and (ii) a business
model that changed from “buy to hold” to “originate and distribute,” as explained earlier.
After the boom came the bust, caused by increasing interest rates and falling housing prices.
Evaluation models, if any, relied on the assumption that delinquencies would remain the same, and
small risk-aversion adjustments to the calculation of expected actuarial losses were made (if
any). The pictures below show that this was not true and that, in fact, the pieces of information
emanating from those simple pictures could have helped predict the crisis. Finally, correlation
issues were simply ignored or, at best, badly calibrated.
Section 12.4.7 provides a more systematic analysis of these issues, but it is instructive to
discuss now some of the causes leading to the bust and the 2007 crisis. One of them
is certainly related to “model misspecification,” or an inappropriate rating “mapping” system,
by which rating agencies tended to “transplant” the rating system for corporations to
structured products relying on MBS. A second difficulty was the presence of a true “shadow
banking system” escaping the attention of the official financial community. The dynamics of
the crisis were a sharp liquidity dry-up, then a credit crunch, followed by a drop in
real economic activity, which further fed the credit crunch, etc. In that context, it is quite difficult
to draw the line between liquidity squeezes and solvency issues.
Mortgage Delinquencies by Vintage Year (60+ day delinquencies, in percent of balance).

Source: IMF, Global Financial Stability Report, April 2008.
Left hand side panel: U.S. and European House Price Changes. Right hand side panel:
U.S. Mortgage-Related Securities Prices. Source: IMF, Global Financial Stability Report,
April 2008.
12.5.2 Total Return Swaps (TRS)

In a total return swap, or TRS, one party (who owns some asset, the asset underlying the
TRS) receives from the counterparty payments based on a mutually agreed rate, either fixed or
variable, and makes payments to the counterparty based on the return of the underlying asset,
which includes both the income it generates and any capital gains. The underlying asset can
be a loan, a bond, an equity index, or a basket of assets. The interest payments are typically
based on the LIBOR plus a spread. Consider the following example. Party A receives LIBOR
+ fixed spread equal to 3%. Party B receives the total return of the S&P 500 on a principal
amount of $1 million. If the LIBOR is 7% and the S&P 500 is up by 12%, A pays B 12% and
B pays A 7% + 3%. By netting, A pays B $20,000, i.e. $1 million × (12% − 10%).
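The netting arithmetic in this example can be written out as a short sketch. The function name and its sign convention (a positive result is a payment from A, the total-return payer, to B) are assumptions made here for illustration and are not part of the original text.

```python
def trs_net_payment(notional, asset_return, libor, spread):
    """Net payment from the total-return payer (A) to the receiver (B).

    A pays the asset's total return; B pays LIBOR plus the fixed spread.
    A negative result means that B makes a net payment to A.
    """
    return notional * (asset_return - (libor + spread))


# The example in the text: $1 million notional, S&P 500 up 12%, LIBOR 7%, spread 3%.
net = trs_net_payment(1_000_000, 0.12, 0.07, 0.03)
print(f"A pays B: {net:,.2f}")

# A hypothetical down-market case: if the index falls by 5%, B pays A.
net_down = trs_net_payment(1_000_000, -0.05, 0.07, 0.03)
print(f"A pays B: {net_down:,.2f}")
```

The second call illustrates why the party holding the asset is protected against a loss in value: when the return is negative, the net payment flows from B to A.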
While TRS are usually categorized as credit derivatives, they combine both market risk and
credit risk. The benefit of going long a TRS is that the party with the asset
on the balance sheet buys protection against loss in value. On the other hand, shorting a TRS
allows the counterparty to receive the payoffs guaranteed by the asset without necessarily having
to put it on the balance sheet. Hedge funds find it quite convenient to short a TRS, as this allows
them to take views with limited collateral upfront. The market for TRS is over-the-counter and
market participants include institutions only.
12.5.3 Spread Options (SOs)

In general, SOs are options written on the difference between two indexes. For example, let
S1 (T ) and S2 (T ) be the prices of two assets at time T . The payoff promised by a SO entered
at some time t < T , might be max {S1 (T ) − S2 (T ) − K, 0}, where K is the strike of the SO.
A SO can be written on the spread between two rates of returns too. Importantly, a SO can be
written on the spread between the yield of a corporate bond and the yield of a Treasury bond.
Examples include: (i) NOB spreads (notes − bonds), which are spreads between maturities; (ii)
spreads between quality levels, such as the TED spread (treasury bills − Eurodollars); (iii)
MOB spreads, i.e. the difference between municipal bonds and treasury bonds. More generally,
the definition of a SO has now been extended to include payoffs written as a linear combination
of indexes, interest rates and yields.
12.5.4 Credit spread options (CSOs)

In a CSO, the payoff is the difference between (i) the spread between two reference securities
(say Italian Government bonds and US Government bonds having the same maturity, or the
spread between the share on xyz and LIBOR, or two credit instruments), and (ii) a given strike
spread, for a certain maturity date. It may be an American or European option. CSOs thus
allow investors to hedge against, or take specific views about, changes in credit spreads. For example,
an investor, while bullish on Italian bonds, might hedge against the uncertain outcome of a
political election, which could trigger a widening of short-term spreads of Italian versus US bonds.
The investor, then, may go long a CSO, with time to maturity around the days of the political
election, where the underlying are the Italian and US Government bonds expiring in ten years, say.
A possible payoff to the CSO holder can be proportional to (ITA/US − K), where ITA/US is
the ten-year Italian-US spread in three months, and K is the strike spread.
12.5.5 Credit Default Swaps (CDS)

12.5.5.1 CDS on single names
CDS differ from TRS insofar as they provide protection against a credit event. TRS, instead,
provide protection against a loss in asset value, which could be triggered by either market or
credit risk, although it is obviously more often market risk than credit risk that kicks in.
The premium, assumed to be paid quarterly, on a CDS contract at time t, is obtained by
equating the expected discounted value of the premium paid over the life of the contract, i.e.
at dates $t < t_1 < t_2 < \cdots < t_M$, where $t_i = t + \frac{i}{4}$, $M = 4 \cdot N$, and N is the number of years the
CDS refers to,
$$\mathrm{Premium}_t = \sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{CDS}_t(N)\,\Pr\{\text{Survival at } t_i\},$$
with the expected discounted value of the protection,
$$\mathrm{Protection}_t = \sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{LGD}(t_i)\,\Pr\{\text{Default} \in (t_{i-1}, t_i)\},$$
where r is the (constant) risk free rate, CDSt (N ) is the premium paid every quarter, prevailing
at time t, and LGD (ti ) is the Loss-Given-Default at time ti , which for simplicity is assumed to
be constant, i.e. known at time t.
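Equating $\mathrm{Premium}_t$ and $\mathrm{Protection}_t$, and solving for $\mathrm{CDS}_t$, leaves:
$$\mathrm{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{LGD}(t_i)\,\Pr\{\text{Default} \in (t_{i-1}, t_i)\}}{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \Pr\{\text{Survival at } t_i\}}. \tag{12.22}$$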
At first glance, the previous derivation might look “actuarial,” although it actually is not.
The reason is that the probabilities in Eq. (12.22) are risk-neutral probabilities. As such, they
are, obviously, the same as those we use to price the bonds underlying the CDS contract.
Therefore, there must be no-arbitrage relations linking bond prices to CDS premiums, which shall
be emphasized later on (see Section 12.4.5.4). This point illustrates in a remarkable way one key
difference between finance and insurance. Even if in insurance, one may end up pricing some
products through risk-adjusted probabilities, finance is where we typically end up having many
more traded risks than in insurance, and these risks are tightly related to no-arbitrage restrictions.
Eq. (12.22) is a general formula we can use, once we have a model determining the risk-neutral
probability of default. In this chapter, we implement Eq. (12.22) through a reduced-form
approach, which will allow us to find the quarterly premium (or spread) $\mathrm{CDS}_t(N)$
quite easily, as follows.
We have, denoting again with λ the instantaneous probability of default, that $\Pr\{\text{Survival at } t_i\} = e^{-\lambda(t_i - t)}$,
and that $\Pr\{\text{Default at any } z \in (t_{i-1}, t_i)\} = e^{-\lambda(t_{i-1} - t)} - e^{-\lambda(t_i - t)}$. Intuitively, if
the name survives at $t_i$ (event $E_i$), it must necessarily have survived at $t_{i-1}$ (event $E_{i-1}$), but
the converse is not true: $E_i \subset E_{i-1}$, and the complement of $E_i$ to $E_{i-1}$ is nothing but the event
of default between $t_{i-1}$ and $t_i$.⁴ Substituting the previous probabilities into Eq. (12.22), we find
that:
$$\mathrm{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{LGD}(t_i) \left(e^{-\lambda(t_{i-1} - t)} - e^{-\lambda(t_i - t)}\right)}{\sum_{i=1}^{4N} e^{-(r+\lambda)(t_i - t)}}. \tag{12.23}$$
For example, if $\mathrm{LGD}(t_i)$ is constant and equal to LGD for each $t_i$, then, for $\Delta t = t_i - t_{i-1} = \frac{1}{4}$,
$$\mathrm{CDS}_t(N) \approx \lambda \cdot \mathrm{LGD} \cdot \Delta t \equiv (\text{expected losses per unit of time}) \cdot \Delta t, \tag{12.24}$$
where the approximation is obtained by making $e^{-\lambda(t_{i-1} - t)} - e^{-\lambda(t_i - t)} \approx \lambda e^{-\lambda(t_i - t)} \Delta t$. Naturally,
λ is the risk-neutral instantaneous probability of default for the security.
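To see how tight the approximation in Eq. (12.24) is, one can evaluate Eq. (12.23) directly. The sketch below is illustrative only: the function name and the inputs (λ = 0.04, LGD = 0.60, r = 3%, N = 5) are hypothetical. It also uses the fact that, with constant LGD and quarterly dates, each term in the numerator of Eq. (12.23) equals $\mathrm{LGD}\,(e^{\lambda \Delta t} - 1)$ times the corresponding term in the denominator, so that the ratio collapses to $\mathrm{LGD}\,(e^{\lambda \Delta t} - 1)$.

```python
from math import exp


def cds_premium(lam, lgd, r, years, t=0.0):
    """Quarterly CDS premium from Eq. (12.23), with constant intensity and constant LGD."""
    dates = [t + i / 4 for i in range(4 * years + 1)]  # t_0 = t, t_1, ..., t_{4N}
    protection = sum(
        exp(-r * (ti - t)) * lgd * (exp(-lam * (t_prev - t)) - exp(-lam * (ti - t)))
        for t_prev, ti in zip(dates[:-1], dates[1:])
    )
    annuity = sum(exp(-(r + lam) * (ti - t)) for ti in dates[1:])
    return protection / annuity


# Hypothetical inputs, for illustration only.
lam, lgd, r, years = 0.04, 0.60, 0.03, 5
exact = cds_premium(lam, lgd, r, years)
approx = lam * lgd * 0.25  # Eq. (12.24)
print(f"Eq. (12.23): {1e4 * exact:.3f} bp per quarter")
print(f"Eq. (12.24): {1e4 * approx:.3f} bp per quarter")
print(f"closed form: {1e4 * lgd * (exp(lam * 0.25) - 1):.3f} bp per quarter")
```

With these inputs the two expressions differ by well under one percent of the premium, which is why Eq. (12.24) is a convenient rule of thumb when the intensity is small.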
Note, Eq. (12.24) shows that the CDS premium is approximately the same as the instantaneous
spread of a defaultable bond, as explained in Section 12.2. This property is to be
expected, so to speak, as a purchase of a defaultable bond and protection on it is nothing
but a synthetic default-free bond. Therefore, there must be a no-arbitrage relation between
CDS spreads and defaultable bond spreads, as we anticipated earlier. However, in general, Eq.
(12.24) does not hold, as the assumptions made to achieve it (λ is constant, LGD is constant,
r is constant, etc.) are quite unrealistic. On the contrary, we often observe CDS spread curves
that increase with maturity, as we shall explain in more analytical detail in Section 12.4.5.4.
Indeed, we may take interesting views. For example, buying a 2Y CDS and selling a 3Y CDS
is a view that default will not occur between the second and the third year from now.
⁴ Mathematically, we have that $\Pr\{\text{Default at any } z \in (t_{i-1}, t_i)\} = \int_{t_{i-1}}^{t_i} \Pr\{\text{Default at } z\}\,dz$, where
$\Pr\{\text{Default at } z\} = \lambda e^{-\lambda(z - t)}\,dz$.
12.5.5.2 CDS on indexes
A CDS index is a basket of credit entities in which the protection buyer pays the same premium,
called the fixed rate, on all the names in the index. Credit events are typically bound
to bankruptcy or delinquencies. After a credit event, the entity is removed from the index and
the contract goes through, although with a reduced notional amount, until expiration. While
CDS on single names are over-the-counter, CDS indexes are completely standardized and can
be more liquid, as historical data on bid-ask spreads show. In fact, it can be cheaper to hedge
a portfolio of CDS or bonds with a CDS index than it would be to buy many CDS to achieve
a similar effect. There exist two main indices: (i) the CDX index, which contains North American
and Emerging Market companies; and (ii) the iTraxx index, which contains companies from the
rest of the world.
12.5.5.3 Disentangling default probability from risk-aversion
The following picture, taken from Fender and Hördahl (2007), illustrates the behavior of the
credit market risk appetite before the 2007 credit market turmoil.
FIGURE 12.10. Antonio Mele does not claim any copyright on this picture, which is taken
from Fender and Hördahl (2007). The picture has been put here for illustrative purposes
only, and permission to the authors shall be duly asked before the book will be published.
How did the authors estimate the price of risk? Consider the expected losses under the
actuarial, or physical probability for a given security. The counterpart to Eq. (12.24), under the
physical probability, is:
$$\text{Expected Losses}^P \equiv \lambda^P \cdot \mathrm{LGD} \cdot \Delta t,$$
where $\lambda^P$ is the physical instantaneous probability of default for a given security. Assume that
LGD is constant, to simplify. If the investors require compensation for the default event, then,
the actuarial losses should be less than the CDS spread, i.e. $\text{Expected Losses}^P < \mathrm{CDS}$, or,
$$\lambda > \lambda^P.$$
The risk-premium is defined as the difference between the CDS premium and the actuarial
losses, $\text{Expected Losses}^P$,
$$\text{Risk-Premium} = \left(\lambda - \lambda^P\right) \cdot \mathrm{LGD} \cdot \Delta t.$$
The price of risk in Figure 12.10 is defined as the ratio of the CDS spread over $\text{Expected Losses}^P$,
$$\text{Price-of-Risk} = \frac{\lambda}{\lambda^P}.$$
Early references to estimation methods are Duffie et al. (2005) and Amato (2005). Typically,
$\text{Expected Losses}^P$ are proxied by Moody’s KMV Expected Default Frequencies (EDFs™),
obtained through fully specified structural models for credit risk. The next pictures are taken
from Amato (2005). As we can see, during the 2003-2005 period, credit spreads were very low,
which in turn gave incentives to CDO issuers to look for illiquid and relatively more complex
assets to put as collateral, which led to the issuance of CDOs relying on ABS such as MBS, or
CDO², explained below.
FIGURE 12.11. Antonio Mele does not claim any copyright on this picture, which is
taken from Amato (2005). The picture has been put here for illustrative purposes only,
and permission to the author shall be duly asked before the book will be published.
FIGURE 12.12. Antonio Mele does not claim any copyright on this picture, which is
taken from Amato (2005). The picture has been put here for illustrative purposes only,
and permission to the author shall be duly asked before the book will be published.
The following picture illustrates the behavior of CDS indexes during approximately 20 years
before the 2007-2009 credit market turmoil.
FIGURE 12.13. Valuation of Financial Instruments Based on Implied Probability of Default.
Antonio Mele does not claim any copyright on this picture, which is taken from
IMF (2008). The picture has been put here for illustrative purposes only, and permission
to the authors shall be duly asked before the book will be published.
12.5.5.4 Continuous time
We may relax the assumption that the instantaneous intensity of default, λ, is constant. This intensity
is defined under the risk-neutral probability and can change either because the intensity
of default under the physical probability changes or because risk-appetite changes, or both.
We aim to examine the asset pricing implications of time-varying intensities, by exploring how
probabilities of survival change in a simple setting, where we do not single out the reasons
leading to variations in λ.
First, we assume the instantaneous probability of default can only change at discrete times,
giving rise to random intensities λt , meaning that λt is the intensity of default in the time interval
[t − 1, t]. Let Ft be the information set as of time t. We assume that λt is Ft -measurable. What
is, then, the probability of survival of any given name in this case? We have, by Bayes’s theorem,
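$$\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\} = \frac{\Pr\{\text{Surv at } t\}}{\Pr\{\text{Surv at } t-1\}}. \tag{12.25}$$
By a repeated use of Eq. (12.25),
$$\begin{aligned}
\Pr\{\text{Surv at } t\} &= \Pr\{\text{Surv at } t \mid \text{Surv at } t-1\}\,\Pr\{\text{Surv at } t-1\} \\
&= \cdots \\
&= \prod_{n=1}^{t} \Pr\{\text{Surv at } n \mid \text{Surv at } n-1\}.
\end{aligned} \tag{12.26}$$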
So we are left with finding $\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\}$. Consider the following arguments.
If $\lambda_n$ was not random and fixed at some $\bar{\lambda}_n$, then, $\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\} = e^{-\bar{\lambda}_n}$.
When $\lambda_n$ is random, $e^{-\lambda_n}$ is the probability of survival, conditioned upon some particular
value the intensity could possibly take. Heuristically, then,
$$\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\} = \sum_{s \in S} e^{-\lambda_n(s)} \Pr\{s\},$$
where $\lambda_n(s)$ is, so to speak, the value $\lambda_n$ would take in state s, $\Pr\{s\}$ is
the likelihood that state s occurs and, finally, S is the set of all possible states, as illustrated
by the figure below.
[Figure: two-state tree. With probability Pr{1} the intensity is λn(1), and with probability Pr{2}
it is λn(2); each branch then leads to either default or survival.]
This picture illustrates the case of two states of intensities. At the beginning of period n, nature
draws the event defining the intensity of the default, which is $\lambda_n(1)$ with probability Pr{1} and
$\lambda_n(2)$ with probability Pr{2} = 1 − Pr{1}. Then, the two paths leading to survival have probability
$\Pr\{1\}\,e^{-\lambda_n(1)}$ and probability $\Pr\{2\}\,e^{-\lambda_n(2)}$ of occurrence, such that the probability of survival equals
$\Pr\{1\}\,e^{-\lambda_n(1)} + \Pr\{2\}\,e^{-\lambda_n(2)}$.

Therefore, $\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\} = E\left(\left. e^{-\lambda_n} \right| F_{n-1}\right)$, where E denotes the expectation
taken under the risk-neutral probability. Inserting this result into Eq. (12.26), and using the
Law of Iterated Expectations, leaves:
$$\Pr\{\text{Surv at } t\} = E\left(e^{-\sum_{n=1}^{t} \lambda_n}\right).$$
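A small enumeration can make this result concrete. The sketch below is an illustration under an extra assumption that is not in the text: the intensity is drawn independently each period from the same two states, so that every conditional survival probability in Eq. (12.26) equals $\sum_{s} e^{-\lambda(s)} \Pr\{s\}$. The state probabilities and intensities are hypothetical. The code compares the product in Eq. (12.26) with the expectation of $e^{-\sum_n \lambda_n}$ computed over all paths, and with the value obtained by plugging in the average intensity.

```python
from itertools import product
from math import exp, prod

# Hypothetical two-state example: each entry is (Pr{s}, lambda(s)).
# Assumption: the state is drawn independently in every period.
states = [(0.6, 0.02), (0.4, 0.08)]
t = 4  # number of periods

# Product of one-period survival probabilities, as in Eq. (12.26).
one_period_survival = sum(p * exp(-lam) for p, lam in states)
survival_product = one_period_survival ** t

# Expectation of exp(-sum of intensities), enumerating all 2**t intensity paths.
survival_expectation = 0.0
for path in product(states, repeat=t):
    path_probability = prod(p for p, _ in path)
    cumulative_intensity = sum(lam for _, lam in path)
    survival_expectation += path_probability * exp(-cumulative_intensity)

# Survival probability obtained by ignoring randomness and using the mean intensity.
mean_intensity = sum(p * lam for p, lam in states)
survival_at_mean = exp(-t * mean_intensity)

print(f"product of conditional survival probabilities: {survival_product:.6f}")
print(f"E[exp(-sum of intensities)]:                   {survival_expectation:.6f}")
print(f"exp(-t * mean intensity):                      {survival_at_mean:.6f}")
```

The first two numbers coincide, while the third is smaller: by Jensen's inequality, randomness in the intensity raises the survival probability relative to a constant intensity fixed at its mean.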
Under regularity conditions, we can easily extend the previous result to a continuous time
setting. For example, we may assume that the risk-neutral default intensity, λ(t), is the solution
to:
$$d\lambda(t) = \phi\left(\bar{\lambda} - \lambda(t)\right)dt + \sigma\sqrt{\lambda(t)}\,dW(t), \quad \lambda(0) = \lambda, \tag{12.27}$$
where W is a standard Brownian motion under the risk-neutral probability, and φ, λ̄ and σ are
three positive constants. This is the same as the Cox, Ingersoll and Ross (1985) (CIR) model
of the short-term rate reviewed in Chapter 11. Therefore, under the parameter restrictions in
Chapter 11, λ (t) is always positive, and
$$P_{\mathrm{surv}}(\lambda, t) \equiv \Pr\{\text{Surv at } t\} = E\left(e^{-\int_0^t \lambda(s)\,ds}\right). \tag{12.28}$$
Eq. (12.28) is, formally, the same as the Feynman-Kac representation of a solution to a PDE,
solved by a bond price in the CIR model. In other words, the model for the survival probability
in Eqs. (12.27)-(12.28) has the same mathematical structure as that leading to the price of a
bond in the CIR model. Therefore, a closed-form solution is available for Psurv (λ, t). It is given
by:
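$$P_{\mathrm{surv}}(\lambda, N) = \Phi(N)\,e^{-B(N)\lambda},$$
$$\Phi(N) = \left(\frac{2\gamma\,e^{\frac{1}{2}(\phi+\gamma)N}}{(\phi+\gamma)\left(e^{\gamma N} - 1\right) + 2\gamma}\right)^{\frac{2\phi\bar{\lambda}}{\sigma^2}}, \quad B(N) = \frac{2\left(e^{\gamma N} - 1\right)}{(\phi+\gamma)\left(e^{\gamma N} - 1\right) + 2\gamma}, \quad \gamma = \sqrt{\phi^2 + 2\sigma^2}. \tag{12.29}$$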
More generally, we can build up a whole family of models with a closed-form solution, the
affine class reviewed in Chapter 11, by just assuming that:
λ (t) = λ0 + λ1 · y (t) , (12.30)
where λ0 is a constant, λ1 is a vector of constants, and y is a multivariate jump-diffusion process,

with drift and diffusion terms as in Section 11.3.6 of Chapter 11. This model is interesting, as
we can judiciously choose the components of y (t) which we suppose may affect the default
intensity. For example, some of them could be unobservable, and others could be observable,
and relate, say, to the business cycle or even the structure of the firm.
So given any solution for the survival probability predicted by any of these affine models
when y (0) = y, Psurv (y, t) say, we can easily compute
Pr{Default ∈ (ti−1 , ti )} = Psurv (y, ti−1 ) − Psurv (y, ti ) . (12.31)
We can then look at the bond spreads and the CDS spreads implied by this modeling choice.
In Appendix 3, we show the price of a defaultable pure discount bond expiring in N years is:
$$P(y, N) = e^{-rN} P_{\mathrm{surv}}(y, N) + \int_0^N e^{-rt}\,\Pr\{\text{Default} \in dt\}\,\mathrm{Rec}(t)\,dt, \tag{12.32}$$
where Rec (t) denotes the recovery value in case of default, supposed to be known. This eval-
uation result is, naturally, consistent with a similar derivation provided in Section 11.3.7 of
Chapter 11, although in this chapter we are emphasizing more “survival arguments.”
As for the CDS spreads, we have, by Eq. (12.22),
$$\mathrm{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)}\,\mathrm{LGD}(t_i)\left[P_{\mathrm{surv}}(y, t_{i-1}) - P_{\mathrm{surv}}(y, t_i)\right]}{\sum_{i=1}^{4N} e^{-r(t_i - t)}\,P_{\mathrm{surv}}(y, t_i)},$$
where N is, again, the number of years the CDS refers to, and $t_i = t + \frac{i}{4}$.
Assume the short-term rate, r, is zero, and that loss-given-default is constant and equal to
LGD. Then, as shown in Appendix 3, the price of a defaultable pure discount bond, P (λ, N ),
and the current CDS premium, CDS0 (N ), are given by:
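$$P(\lambda, N) = 1 - \mathrm{LGD} \cdot \left(1 - P_{\mathrm{surv}}(\lambda, N)\right), \quad \mathrm{CDS}_0(N) = \mathrm{LGD} \cdot \frac{1 - P_{\mathrm{surv}}(\lambda, N)}{\sum_{i=1}^{4N} P_{\mathrm{surv}}(\lambda, t_i)}. \tag{12.33}$$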
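Figure 12.14 depicts the bond spreads, $-\frac{1}{N} \ln P(\lambda, N)$, and the annualized credit default
spreads, $4 \times \mathrm{CDS}_0(N)$, when the parameters in Eq. (12.27) are φ = 0.25, λ̄ = 0.04 and $\sigma = \sqrt{\bar{\lambda}}$,
with loss-given-default LGD = 0.60, and two values of the current intensity: λ = λ̄ = 0.04,
and λ = 0.02. Assuming that LGD is constant is not plausible, empirically. Instead, we know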
and λ = 0.02. Assuming that LGD is constant is not plausible, empirically. Instead, we know
LGD moves countercyclically for most names, although it does not exhibit strong business cycle
features, for sovereigns. For sovereigns, the size of the country and debt distribution seem to
be by far more important.
[Figure, two panels: “Spreads, in basis points, for average default intensity” (left) and “Spreads,
in basis points, for low default intensity” (right). Each panel plots bond spreads and annualized
CDS spreads against maturity, from 0 to 10 years.]
FIGURE 12.14. Spreads on bonds and CDS predicted by the affine model in Eq. (12.27).
The left panel depicts the spreads when the current default intensity equals the long-run
mean, λ = λ̄ = 0.04. The right panel depicts the spreads in good times, i.e., when the
current intensity of default takes a low value, λ = 0.02. In each case the recovery rate
equals 40%.
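The spreads in Figure 12.14 can be approximated numerically with a short sketch built on Eqs. (12.29) and (12.33), with a zero short-term rate and the parameter values stated above. The function names are introduced here for illustration and are not part of the original notes.

```python
from math import exp, log, sqrt

# Parameter values stated in the text for Figure 12.14.
PHI, LAM_BAR, LGD = 0.25, 0.04, 0.60
SIGMA = sqrt(LAM_BAR)


def p_surv(lam, n):
    """Risk-neutral survival probability over n years, Eq. (12.29)."""
    g = sqrt(PHI ** 2 + 2 * SIGMA ** 2)
    denom = (PHI + g) * (exp(g * n) - 1) + 2 * g
    a = (2 * g * exp(0.5 * (PHI + g) * n) / denom) ** (2 * PHI * LAM_BAR / SIGMA ** 2)
    b = 2 * (exp(g * n) - 1) / denom
    return a * exp(-b * lam)


def bond_spread(lam, years):
    """Spread on a defaultable zero, -ln P / N, with P as in Eq. (12.33) and r = 0."""
    price = 1 - LGD * (1 - p_surv(lam, years))
    return -log(price) / years


def cds_spread_annualized(lam, years):
    """Annualized CDS spread, 4 x CDS_0(N), from Eq. (12.33) with quarterly payment dates."""
    annuity = sum(p_surv(lam, i / 4) for i in range(1, round(4 * years) + 1))
    return 4 * LGD * (1 - p_surv(lam, years)) / annuity


for lam in (0.04, 0.02):
    for years in (1, 5, 10):
        print(f"lambda={lam:.2f}  N={years:2d}  "
              f"bond={1e4 * bond_spread(lam, years):6.1f} bp  "
              f"CDS={1e4 * cds_spread_annualized(lam, years):6.1f} bp")
```

With the current intensity at its long-run mean, both computed curves slope downward; with the low current intensity, they slope upward, and in both cases the annualized CDS spread lies above the bond spread at longer maturities.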
The mechanism is that good times are followed by bad, and so when λ = 0.02, we expect default
rates to rise in the future. As a consequence, spreads are increasing in maturity. Moreover,
we easily see that bond spreads are approximately equal to CDS spreads at short maturities.
At longer maturities, the two spreads diverge, with CDS spreads, $4 \times \mathrm{CDS}_0(N)$, dominating
bond spreads, $-\frac{1}{N} \ln P(\lambda, N)$. Moreover, we have that the two curves are decreasing in time to
maturity even when the current value of the intensity equals the long-run one, λ̄.
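Where do these two properties originate from? The first one follows because we have, approximately,
$$\begin{aligned}
-\frac{1}{N} \ln P(\lambda, N) &= -\frac{1}{N} \ln\left[1 - \mathrm{LGD} \cdot \left(1 - P_{\mathrm{surv}}(\lambda, N)\right)\right] \\
&\approx \mathrm{LGD} \cdot \frac{1 - P_{\mathrm{surv}}(\lambda, N)}{N} \\
&\le \mathrm{LGD} \cdot \frac{1 - P_{\mathrm{surv}}(\lambda, N)}{\frac{1}{4}\sum_{i=1}^{4N} P_{\mathrm{surv}}(\lambda, t_i)} \\
&= 4 \times \mathrm{CDS}_0(N).
\end{aligned}$$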
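As regards the second property, it is a convexity effect. We can tackle this issue using arguments
similar to those we made for another topic in Chapter 11, Section 11.3.4. For the bond
spreads, since $E(\lambda(s)) = \bar{\lambda} + e^{-\phi s}\left(\lambda - \bar{\lambda}\right)$, we have, approximately,
$$\begin{aligned}
-\frac{1}{N} \ln P(\lambda, N) &= -\frac{1}{N} \ln\left[1 - \mathrm{LGD} \cdot \left(1 - P_{\mathrm{surv}}(\lambda, N)\right)\right] \\
&= -\frac{1}{N} \ln\left[1 - \mathrm{LGD} \cdot \left(1 - E\left(e^{-\int_0^N \lambda(s)\,ds}\right)\right)\right] \\
&\le -\frac{1}{N} \ln\left[1 - \mathrm{LGD} \cdot \left(1 - e^{-\int_0^N E(\lambda(s))\,ds}\right)\right] \\
&\approx \mathrm{LGD} \cdot \frac{1 - e^{-\int_0^N E(\lambda(s))\,ds}}{N} \\
&= \mathrm{LGD} \cdot \frac{1 - e^{-\bar{\lambda}N - \left(\lambda - \bar{\lambda}\right)\frac{1 - e^{-\phi N}}{\phi}}}{N},
\end{aligned}$$
so that even if λ = λ̄, bond spreads are bounded above by a decreasing function (in N).
Of course, this does not necessarily mean that bond spreads have to be decreasing as well, but the
bounding function helps this happen. As for the CDS spreads, we have, approximately: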
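$$4 \times \mathrm{CDS}_0(N) = \mathrm{LGD} \cdot \frac{1 - P_{\mathrm{surv}}(\lambda, N)}{\frac{1}{4}\sum_{i=1}^{4N} P_{\mathrm{surv}}(\lambda, t_i)} \le \mathrm{LGD} \cdot \frac{1 - P_{\mathrm{surv}}(\lambda, N)}{N \cdot P_{\mathrm{surv}}(\lambda, N)} \approx -\frac{1}{N} \ln P_{\mathrm{surv}}(\lambda, N),$$
such that for λ = λ̄, $\mathrm{CDS}_0(N)$ is bounded above by a decreasing function (in N), by the same
arguments made as regards the bond spreads, $-\frac{1}{N} \ln P(\lambda, N)$.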
12.5.5.5 A trading strategy
Bond prices and CDS spreads are driven by the same state variable, the default intensity, and
so they are restricted to lie on some space, to be consistent with no-arbitrage. To illustrate,
consider, first, the simple case where the default intensity is constant, such that CDS spreads
are given by Eq. (12.23). Given this model, we can look at the market data for CDS spreads,
and infer the risk-neutral intensity, as in the picture below.
[Figure: “Inferring risk-neutral intensity from CDS market data.” Vertical axis: CDS spreads,
model-based, in basis points (0 to 350). Horizontal axis: default intensity (0 to 0.05).]
In this picture, the CDS spreads predicted by Eq. (12.23) are depicted as a function of the
risk-neutral intensity, λ, assuming N = 5 years, LGD = 0.60 and the short-term rate r is zero.
For example, if we observed a CDS spread equal to 200 basis points, we would infer a value of
λ approximately equal to λ̂ = 0.033. The key point is that this very same λ̂ should be pricing the
zero as well, such that for N = 5,
$$P(N) = 1 - \mathrm{LGD} \cdot \left(1 - e^{-\hat{\lambda}N}\right) = 0.90874,$$
and so we might go long (short) the zero if its market price is lower (higher) than 0.90874.
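The two steps of this strategy, inverting the CDS spread for the intensity and then pricing the zero, can be sketched as follows. The bisection routine, the function names and the market price used in the last lines are assumptions made here for illustration; the model inputs (N = 5, LGD = 0.60, r = 0, a spread of 200 basis points, read as an annualized spread) are those of the example.

```python
from math import exp

LGD, YEARS = 0.60, 5  # values used in the text, with a zero short-term rate


def annualized_cds(lam):
    """Annualized CDS spread implied by Eq. (12.23) with r = 0 and quarterly payments."""
    protection = LGD * (1 - exp(-lam * YEARS))
    annuity = sum(exp(-lam * i / 4) for i in range(1, 4 * YEARS + 1))
    return 4 * protection / annuity


def implied_intensity(spread, lo=1e-8, hi=1.0, tol=1e-12):
    """Bisection on the intensity; the model spread is increasing in lambda."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if annualized_cds(mid) < spread:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def zero_price(lam):
    """Price of the defaultable zero with r = 0 and constant intensity."""
    return 1 - LGD * (1 - exp(-lam * YEARS))


lam_hat = implied_intensity(0.0200)  # observed spread: 200 basis points
print(f"implied intensity: {lam_hat:.4f}")
print(f"model price of the zero at lambda = 0.033: {zero_price(0.033):.5f}")

market_price = 0.9000  # hypothetical market quote
action = "long" if market_price < zero_price(lam_hat) else "short"
print(f"market price {market_price:.4f}: go {action} the zero")
```

The implied intensity rounds to the 0.033 read off the picture, and the zero priced at that rounded value reproduces the 0.90874 quoted in the text.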
Naturally, this example is based on the unrealistic assumption that the default intensity is
constant. But the same strategy can be used in the more general case where default intensities
are stochastic. In this case, bond prices and CDS spreads should also be restricted, by no-
arbitrage. The picture below shows the restrictions between bond spreads and CDS spreads,
obtained with the same parameter values as those used to produce Figure 12.14, and values of
current default intensities ranging from nearly zero up to 0.05.
[Figure: “No-arb restrictions between bond spreads and CDS spreads.” Vertical axis: bond spreads,
model-based, in basis points (80 to 260). Horizontal axis: CDS spreads, model-based, in basis
points (80 to 260).]
12.5.5.6 Hazard rates
In a pricing context, the relevant probabilities of survival are obviously conditioned upon the
time of evaluation, time 0 say. For example, the probability of default in Eq. (12.31) is only
conditioned on the information we have at time zero. More generally, the probability of defaulting
in the time interval $(t_{i-1}, t_i)$, conditional upon survival at time $t \le t_{i-1}$, is:
$$\Pr\{\text{Default} \in (t_{i-1}, t_i) \mid \text{Survival at } t\} = \frac{P_{\mathrm{surv}}(y, t_{i-1}) - P_{\mathrm{surv}}(y, t_i)}{P_{\mathrm{surv}}(y, t)}. \tag{12.34}$$
For example, for $t = t_{i-1}$, and $(t_{i-1}, t_i)$ small, and λ deterministic, a simple approximation to
this conditional probability can be,
$$\begin{aligned}
\Pr\{\text{Default} \in (t_{i-1}, t_i) \mid \text{Survival at } t\} &\approx -\frac{\frac{\partial}{\partial t} P_{\mathrm{surv}}(y, t)}{P_{\mathrm{surv}}(y, t)}\,(t_i - t_{i-1}) \\
&\equiv \frac{p_{\mathrm{default}}(y, t)}{1 - P_{\mathrm{default}}(y, t)}\,(t_i - t_{i-1}) \\
&= \lambda(t)\,(t_i - t_{i-1}),
\end{aligned}$$
with straightforward notation. The previous expressions are known as hazard rates. They coincide
with λ(t)dt, when λ(t) is deterministic. If λ(t) is not deterministic, simple computations
lead to:
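$$\Pr\{\text{Default} \in (t, t + dt) \mid \text{Survival at } t\} = E_{Q_\lambda}\left[\lambda(t)\right]dt, \tag{12.35}$$
where $Q_\lambda$ is a new probability, with Radon-Nikodym derivative given by:
$$\left.\frac{dQ_\lambda}{dQ}\right|_{F_0} = \frac{e^{-\int_0^t \lambda(s)\,ds}}{P_{\mathrm{surv}}(\lambda, t)}. \tag{12.36}$$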
Accordingly, under Qλ , the state variables in Eq. (12.30) follow a diffusion process, with a drift
process tilted, due to this change of measure. For example, in the simple setting of Eq. (12.27),
we have that, for a fixed t,

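$$
d\lambda(s) = \left(B_0 - B_1^t(s)\,\lambda(s)\right) ds + \sigma\sqrt{\lambda(s)}\,dW^\lambda(s), \quad s \in (0, t], \quad \lambda(0) = \lambda, \tag{12.37}
$$
$$
B_0 = \phi\bar{\lambda}, \qquad B_1^t(s) = \phi + B(t - s)\,\sigma^2, \qquad B(\cdot) \text{ as in Eq. (12.29)},
$$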
where Wλ is a Brownian motion under Qλ . Therefore, by Eq. (12.35), and computations,
$$
\Pr\{\text{Default} \in (t, t + dt) \mid \text{Survival at } t\} = \left[\frac{\lambda}{G(t)} + B_0 \int_0^t \frac{G(s)}{G(t)}\,ds\right] dt, \qquad G(x) \equiv e^{\int_0^x B_1^t(u)\,du}.
$$
Appendix 5 provides a proof of these results, which to the best of our knowledge, are developed
here for the first time.
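As a quick numerical check of the deterministic case, the sketch below compares the conditional default probability of Eq. (12.34) with the hazard-rate approximation λ · (ti − ti−1). It assumes a constant intensity, so that the survival probability is e^(−λt); this functional form, the value of λ and the length of the interval are illustrative assumptions, not taken from the text.

```python
import math

def survival_prob(lam: float, t: float) -> float:
    """Survival probability up to time t under a constant intensity lam (illustrative assumption)."""
    return math.exp(-lam * t)

def conditional_default_prob(lam: float, t: float, t_prev: float, t_next: float) -> float:
    """Eq. (12.34): probability of default in (t_prev, t_next), given survival at t <= t_prev."""
    return (survival_prob(lam, t_prev) - survival_prob(lam, t_next)) / survival_prob(lam, t)

lam = 0.03                              # hypothetical constant default intensity
t_prev, t_next = 2.0, 2.0 + 1 / 252     # a short interval starting two years from now
exact = conditional_default_prob(lam, t_prev, t_prev, t_next)
approx = lam * (t_next - t_prev)        # hazard-rate approximation
print(f"exact = {exact:.10f}, approximation = {approx:.10f}")
```

The two numbers differ only by a term of order (λ · (ti − ti−1))², which is why the approximation is adequate for small intervals.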
12.5.5.7 Extracting probabilities of default from market data
Market data obviously convey information about probabilities of default, which might be ex-
tracted from these data, under a number of assumptions. To illustrate this possibility in a simple
case, assume that the recovery rate is zero, and that the short-term rate and the instantaneous
probability of default are both continuous time Markov and independent of each other. Then,
the price of a defaultable zero is: Pdef (λ, N ) = P (N) · Psurv (λ, N), where Pdef (λ, N ) is the price
of a defaultable zero and P (N ) is the price of a non-defaultable zero. Therefore, we can read
the risk-neutral probability of survival from the defaultable/non-defaultable bond price ratio:
$$
P_{\text{surv}}(\lambda, N) = \frac{P_{\text{def}}(\lambda, N)}{P(N)}. \tag{12.38}
$$
Naturally, surviving until some time N2 means having survived until some time N1 < N2 and
having survived from N1 to N2 . Therefore, Psurv (λ, N2 ) = Psurv (λ, N1 ) · Psurv (λ, N1 , N2 ), where
Psurv (λ, N1 , N2 ) is the risk-neutral probability of survival between N1 and N2 . Using Eq. (12.38),
then, we can extract this probability, as follows:
$$
P_{\text{surv}}(\lambda, N_1, N_2) = \frac{P_{\text{def}}(\lambda, N_2)}{P_{\text{def}}(\lambda, N_1)} \cdot \frac{P(N_1)}{P(N_2)}.
$$
The previous example relies on the simplifying assumption of a zero recovery rate, but it can
be generalized to the case where the recovery rate is nonzero. But in this case, bond prices would
convey information about both probabilities of default and recovery rates, an identification issue
to be dealt with.
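The extraction of survival probabilities in the zero-recovery case can be sketched as follows. The bond prices below are hypothetical numbers chosen for illustration, and the function names are ours.

```python
def survival_prob(p_def: float, p_safe: float) -> float:
    """Eq. (12.38): risk-neutral survival probability up to N, assuming zero recovery."""
    return p_def / p_safe

def forward_survival_prob(p_def_n1: float, p_safe_n1: float,
                          p_def_n2: float, p_safe_n2: float) -> float:
    """Risk-neutral probability of surviving between N1 and N2."""
    return (p_def_n2 / p_def_n1) * (p_safe_n1 / p_safe_n2)

# Hypothetical zero prices per unit of face value, for maturities N1 = 1 and N2 = 2 years.
p_safe = {1: 0.96, 2: 0.92}   # non-defaultable zeros
p_def = {1: 0.93, 2: 0.86}    # defaultable zeros

surv_1 = survival_prob(p_def[1], p_safe[1])
surv_2 = survival_prob(p_def[2], p_safe[2])
fwd_1_2 = forward_survival_prob(p_def[1], p_safe[1], p_def[2], p_safe[2])
print(f"P_surv(1) = {surv_1:.4f}, P_surv(2) = {surv_2:.4f}, P_surv(1, 2) = {fwd_1_2:.4f}")
```

By construction, the product of the one-year survival probability and the forward survival probability returns the two-year survival probability.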
12.5.6 Collateralized Debt Obligations (CDOs)

12.5.6.1 A crash description
CDOs are securitized shares in pools of assets. Collateral assets include loans or debt instru-
ments. A CDO may be a collateralized loan obligation (CLO) or collateralized bond obligation
(CBO) according to whether it relies only on loans or bonds, respectively. CDO investors bear
the credit risk of the collateral. Multiple tranches of securities are issued by the CDO, offering
investors various maturity and credit risk characteristics. Tranches are categorized as senior,
mezzanine, and subordinated, or junior, or equity, according to their degree of credit risk. If
there are defaults or the CDO’s collateral otherwise underperforms, scheduled payments to
senior tranches take precedence over those of mezzanine tranches, and scheduled payments to
mezzanine tranches take precedence over those to junior tranches. Typically, senior tranches
are rated, with ratings of A to AAA. Mezzanine tranches are also rated, typically with ratings of B to
BBB. In principle, these ratings should reflect both the credit quality of the collateral and the
protection a given tranche receives from the tranches subordinated to it. CDOs are part of a
more general securitization process, and can also include mortgages, as in the stylized example
below.
(i) In a first step, subprime mortgages are securitized, as illustrated below:
[Diagram: several subprime mortgages make monthly payments into an Asset Backed Security (ABS), whose claims are sold to ABS investors.]
(ii) In a second step, a CDO is created, out of the securitized subprime mortgages and additional
Asset Backed Securities (ABS):
[Diagram: subprime ABS, together with ABS relating to other forms of collateral (e.g. corporate debt), are pooled into a Collateralized Debt Obligation (CDO), whose claims are sold to CDO investors.]
(iii) In a third, and final step, the structuring process involves creating seniority rules.
Investors in CDO senior tranches include banks and pension funds, which might benefit from
the expertise of the asset managers, and from risk-return profiles that are difficult to find in the market.
Investors in junior tranches are hedge funds searching for highly risky investment opportunities
that are, at the same time, quite rewarding and certainly unavailable in the market. Additional
investors in junior tranches were dedicated off-balance-sheet entities such as “SIV,” “conduits,”
and “SIV-lites,” which will be reviewed in Section 12.4.7.
Underwriters of CDOs are typically investment banks. They work closely with the asset
manager, create the “right” debt/equity ratio, and perform collateral quality tests. They
liaise with law firms and create the special purpose vehicle (possibly in some tax haven)
that will purchase the assets and issue the tranches, price the various tranches, and obviously
find the investors. Fees to underwriters are very generous, due to the complexity of the CDOs.
According to Thomson Financial, top underwriters in 2006 were: Bear Stearns, Merrill Lynch,
Wachovia, Citigroup, Deutsche Bank, and Bank of America Securities.
Involved in the structuring process are also (i) trustee and collateral administrator, who dis-
tribute noteholder reports, check compliance and execute priority of payments; (ii) accountants,
who perform due diligence on the CDO’s collateral pool, verifying for example credit ratings for
each asset; and (iii) rating agencies, which we shall discuss in the next subsection.
The economics behind structured finance is quite interesting. An originator may have private
information about the quality of certain assets and/or a comparative advantage in evaluating
these assets relative to other market participants. If the originator wishes to sell some of its
assets, an adverse selection problem will arise: because investors do not know the true quality of
the assets, they will demand a premium to purchase them or even worse, a market might fail to
arise. Structured finance helps originators mitigate this problem. First, by pooling the assets,
diversification benefits can be achieved. Second, tranching allows relatively poorly informed
investors to access senior tranches, and be relatively protected from default. In the process, the
originator or arranger may retain subordinated exposure to alleviate investors’ concerns about
incentive compatibility. The following scheme summarizes the structuring process.
Source: Committee on the Global Financial System: “The role of ratings in structured
finance: issues and implications,” January 2005.
12.5.6.2 The role of rating agencies
Structured finance has always been a “rated” market. Issuers of structured instruments had
a natural appetite for ratings on a scale comparable to that available for bonds.
The main reason was that this would facilitate the sale of these products to investors bound by
ratings-based constraints defined by their investment mandates.
However, the way rating agencies deliver their opinion about credit
risk differs from that related to traditional bonds. As regards traditional instruments, rating
agencies simply aim to assess the risk of default, which they take as given. As regards
structured finance transactions, rating agencies play a much more ex-ante, reverse engineering
role. A tranche rating reflects a view about both the credit risk of the asset pool and the extent
of credit support to be provided. These two elements are organized to reverse engineer the
tranche rating targeted by the deal’s arrangers. Deal origination thus involves rating agencies
in the structuring process.
12.5.6.3 Types of CDOs
In practice, CDOs are considerably more complex than the stylized examples outlined earlier.
We have a number of cases. We say that a CDO is static, if it holds the same set of assets.
Instead, a CDO is managed, if the asset manager is allowed to change the composition of assets.
If the claims to the CDO arise from the cash flows originated by the assets, we have a cash-
flow CDO. If the claims to the CDO arise from the cash flows originated by the assets and/or
active asset management, we have a market-value CDO. CDOs can also be created to carve out
balance sheets, in which case we have balance-sheet CDOs. Moreover, and interestingly, CDOs
can be created (i) to achieve investment grade bonds through a pool of noninvestment grade
bonds, and (ii) to create riskier securities than those in the asset pool. In these cases, we have
arbitrage CDOs. Naturally, “arbitrage” CDOs do not give rise to any arbitrage opportunity.
These instruments merely “reshuffle” risk and returns of the assets in the pool, as we shall see
in the next section. Typically, then, arbitrage CDOs differ from balance sheet CDOs, because of
course, issuers of arbitrage CDOs do not necessarily hold the underlying collateral in advance,
which is obviously the case for issuers of balance-sheet CDOs. Therefore, the assets to be put
into an arbitrage CDO pool have to be reasonably liquid.
Furthermore, we have synthetic CDOs, which are exposed to a pool of assets that are not
strictly owned or in the asset pool, typically through CDS underwriting. Like a cash-flow CDO,
the vehicle receives payments (the premium), which are then transferred to the tranche holders.
Naturally, there can be default events, which are also passed through to the investors, according
to the prespecified seniority rules. A synthetic CDO is funded, if the relevant tranche holders are
to pay in the case of a credit event related to the assets the CDO is exposed to. Typically,
some funding is made available at the very time of investment. At maturity, the investor receives
a payoff equal to the funding minus the realized losses. Junior tranches are typically funded,
and senior tranches are typically not. However, senior tranche investors might have to make payments
in the unlikely event that losses erode their tranches.
Finally, we have hybrid CDOs, which are partly cash-flow CDOs and partly synthetic CDOs.
In a single-tranche CDO, the entire CDO is structured to accommodate the specific needs of a
small group of investors, with some remaining tranche held by the dealer. And we have CDO$^2$,
where a large portion of the assets in the pool are tranches from other CDOs; or more generally,
CDO$^n$.
12.5.6.4 Pricing
CDOs repackage cash flows from a set of assets. We provide simple examples to show how to
price this repackaging process. We begin with a simple example, taken from McDonald (2006, p.
583), which we further elaborate. Suppose we have three one-year bonds with face value = 100.
For each of these bonds, the risk-neutral probabilities of default equal 10% and the recovery
rates are 40. The safe interest rate for one year is 6%. So each bond price equals,
$$
b = e^{-0.06} \cdot \big( \underbrace{0.10}_{\equiv \text{Def. Prob}} \cdot 40 + \underbrace{0.90}_{\equiv \text{Surv. Prob}} \cdot 100 \big) = 88.526.
$$
The yield is, naturally, $-\ln\frac{b}{100} = 12.19\%$.
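The bond price and yield just computed can be reproduced with a few lines of code. This is a minimal sketch using the figures of the example; the function names are ours.

```python
import math

def risky_bond_price(face: float, recovery: float, p_default: float,
                     r: float, maturity: float = 1.0) -> float:
    """Discounted risk-neutral expected payoff of a defaultable zero with fixed recovery."""
    expected_payoff = p_default * recovery + (1 - p_default) * face
    return math.exp(-r * maturity) * expected_payoff

def bond_yield(price: float, face: float, maturity: float = 1.0) -> float:
    """Continuously compounded yield implied by the price."""
    return -math.log(price / face) / maturity

b = risky_bond_price(face=100.0, recovery=40.0, p_default=0.10, r=0.06)
y = bond_yield(b, face=100.0)
print(f"price = {b:.3f}, yield = {y:.2%}")
```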
A CDO can restructure the payments promised by the three bonds in a way that transforms
the riskiness and attractiveness of the initial assets. Consider the following example:
Asset pool (face value = 300)  →  CDO claims:
    Senior tranche    = 140
    Mezzanine tranche =  90
    Junior tranche    =  70

In this example, each tranche receives the minimum between (i) the nominal value claimed by
the tranche and (ii) what is left available to the tranche after having satisfied the other tranches
by order of seniority.
Let Ni be the nominal values claimed by the tranches, so that N1 = 140, N2 = 90 and
N3 = 70. Let π̃ be the realized payoff of the asset pool, defined as,
$$
\tilde{\pi} = \text{No. of Defaults} \cdot 40 + \underbrace{(3 - \text{No. of Defaults})}_{\equiv \text{No. of surviving bonds}} \cdot 100.
$$
Naturally, π̃ is random because the number of defaults is random. At the expiration,
(i) the senior tranche receives the minimum between N1 and π̃. For example, if only one
bond defaults, π̃ = 240, and the senior tranche receives 140. If, however, three bonds
default, then, π̃ = 120, which is less than the senior tranche nominal value, and the senior
tranche then receives 120. So a quite severe loss is needed to erode the senior tranche
claims.
(ii) The mezzanine tranche receives the minimum between N2 and the “left-over” from the
senior tranche.
(iii) Finally, at the expiration, the junior tranche receives the minimum between N3 and the
“left-over” from the senior and mezzanine tranches.
More generally, tranche no. i receives,
$$
\pi_i = \min\{\text{Left-over from previous tranches up to tranche } i-1,\; N_i\},
$$
where
$$
\text{Left-over from previous tranches up to tranche } i-1 = \max\Big\{\tilde{\pi} - \sum_{k=1}^{i-1} \pi_k,\; 0\Big\}.
$$
Synthetically,
$$
\pi_i = \min\Big\{\max\Big\{\tilde{\pi} - \sum_{k=1}^{i-1} \pi_k,\; 0\Big\},\; N_i\Big\}.
$$
All we need, now, is to model the risk-neutral probability of default for each firm. Initially, we
assume the default events are independent across firms. Assume a binomial distribution,
$$
\Pr(\text{No. of Defaults} = k) = \binom{3}{k} p^k (1 - p)^{3-k}, \qquad p = 10\%, \quad k \in \{0, 1, 2, 3\}.
$$
We can then derive the following payoff structure.
Payoffs to CDO tranches, and prices: with independent defaults

Defaults   Pr(Defaults)   π: pool payoff (1)   π1: Senior    π2: Mezzanine   π3: Junior
0          0.729          300                  140           90              70
1          0.243          240                  140           90              10
2          0.027          180                  140           40              0
3          0.001          120                  120           0               0
Price                                          131.8281994   83.40266709     50.34673197
Yield                                          0.060142867   0.076129382     0.329561531

(1) π: pool payoff = Def*40 + (3 − Def)*100, with N1 = 140, N2 = 90, N3 = 70.
The price of each tranche is computed as the tranche payoff, averaged across states, discounted
at the safe interest rate. For example, the price of the mezzanine tranche is,
$$
\text{Price Mezzanine} = e^{-0.06}\,(0.729 \cdot 90 + 0.243 \cdot 90 + 0.027 \cdot 40 + 0.001 \cdot 0) = 83.403.
$$
Its yield is, Yield Mezzanine $= -\ln\frac{83.403}{90} = 7.61\%$. Naturally, the sum of the three bond prices,
88.526 × 3 = 265.58, is equal to the total value of the three tranches, 131.828 + 83.403 + 50.347 =
265.58. As anticipated, a CDO is a mere re-packaging device. It doesn’t add or destroy value.
It merely redistributes risks (and returns).
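The waterfall and the pricing under independent defaults can be sketched in code as follows. The figures are those of the example (three bonds, p = 10%, recovery 40, safe rate 6%); the function and variable names are ours.

```python
import math
from math import comb

RECOVERY, FACE, N_BONDS = 40.0, 100.0, 3
NOMINALS = [140.0, 90.0, 70.0]          # senior, mezzanine, junior
R, P = 0.06, 0.10                       # safe rate, risk-neutral default probability

def pool_payoff(n_defaults: int) -> float:
    """Realized payoff of the asset pool, given the number of defaults."""
    return n_defaults * RECOVERY + (N_BONDS - n_defaults) * FACE

def tranche_payoffs(pool: float, nominals: list) -> list:
    """Each tranche receives min(left-over after the more senior tranches, its nominal)."""
    payoffs, left_over = [], pool
    for nominal in nominals:
        paid = min(max(left_over, 0.0), nominal)
        payoffs.append(paid)
        left_over -= paid
    return payoffs

def default_count_probs(p: float, n: int = N_BONDS) -> list:
    """Binomial probabilities of k = 0, ..., n defaults under independence."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def price_tranches(probs: list) -> list:
    """Discounted expected payoff of each tranche."""
    expected = [0.0] * len(NOMINALS)
    for k, prob in enumerate(probs):
        for i, payoff in enumerate(tranche_payoffs(pool_payoff(k), NOMINALS)):
            expected[i] += prob * payoff
    return [math.exp(-R) * e for e in expected]

prices = price_tranches(default_count_probs(P))
yields = [-math.log(price / nominal) for price, nominal in zip(prices, NOMINALS)]
for name, price, yld in zip(("Senior", "Mezzanine", "Junior"), prices, yields):
    print(f"{name:10s} price = {price:8.3f}  yield = {yld:.4f}")
```

Summing `prices` returns the value of the three underlying bonds, which is the re-packaging property discussed above.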
The assumption that defaults among names are uncorrelated is unrealistic, as argued in Section
12.5.4. We now remove this assumption. First, what happens in the special case where default
events are perfectly correlated? In this case, either the three firms all default (with probability
0.10) or none defaults (with probability 0.90), and we have the situation summarized below,
Payoffs to CDO tranches, and prices: with perfectly correlated defaults

Defaults   Pr(Defaults)   π: pool payoff (1)   π1: Senior    π2: Mezzanine   π3: Junior
0          0.9            300                  140           90              70
1          0              NA                   NA            NA              NA
2          0              NA                   NA            NA              NA
3          0.1            120                  120           0               0
Price                                          129.9635056   76.28292722     59.33116562
Yield                                          0.074388737   0.165360516     0.165360516

(1) π: pool payoff = Def*40 + (3 − Def)*100, with N1 = 140, N2 = 90, N3 = 70.
Note, now, that mezzanine and junior tranches yield the same as they each pay off either their
nominal value or zero in exactly the same states of nature.
The previous cases (with independent and perfectly correlated defaults) are extreme. It is
by far more relevant to see what happens when defaults are only imperfectly correlated. When
defaults are imperfectly correlated, there are no simple tables to use to come up with tranche
pricing. Instead, one might make use of simulations, described succinctly in the Appendix.
Figure 12.15 below, obtained through Monte Carlo simulations, illustrates how the yield on
each tranche changes as a result of a change in the default correlation underlying the assets in
the CDO.
[Figure: Yields on CDO tranches (Junior, Mezzanine, Senior). Horizontal axis: default correlation, from 0 to 1. Vertical axis: yield.]
FIGURE 12.15. Yields on the three CDO tranches, as functions of the default correlation
among the assets in the structure, with probability of default for each name p = 10%. The
thick, horizontal, line is the yield on each securitized asset.
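To get a feel for the intermediate cases without running simulations, one can use a simple closed-form device, sketched below. It is not the simulation method described in the Appendix: it is an assumption made here for illustration. With probability ρ the three names share a single default draw, and with probability 1 − ρ they default independently. Under this assumption the pairwise correlation between default indicators equals ρ, and the two tables above are recovered at ρ = 0 and ρ = 1.

```python
import math
from math import comb

NOMINALS = (140.0, 90.0, 70.0)   # senior, mezzanine, junior
R = 0.06                         # safe interest rate

def waterfall(pool: float) -> list:
    """Tranche payoffs by order of seniority."""
    out, left = [], pool
    for nominal in NOMINALS:
        paid = min(max(left, 0.0), nominal)
        out.append(paid)
        left -= paid
    return out

def default_count_probs(p: float, rho: float, n: int = 3) -> list:
    """Illustrative mixture: weight rho on perfectly correlated defaults, 1 - rho on independent ones."""
    indep = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    comonotone = [1 - p] + [0.0] * (n - 1) + [p]
    return [rho * c + (1 - rho) * i for c, i in zip(comonotone, indep)]

def tranche_yields(p: float, rho: float) -> list:
    """Yields on the senior, mezzanine and junior tranches under the mixture assumption."""
    probs = default_count_probs(p, rho)
    yields = []
    for i, nominal in enumerate(NOMINALS):
        expected = sum(prob * waterfall(k * 40.0 + (3 - k) * 100.0)[i]
                       for k, prob in enumerate(probs))
        yields.append(-math.log(math.exp(-R) * expected / nominal))
    return yields

for rho in (0.0, 0.5, 1.0):
    print(rho, [round(y, 4) for y in tranche_yields(0.10, rho)])
```

Because the mixture is linear in ρ, tranche prices interpolate linearly between the two extreme cases; a simulation-based model of dependence would generally produce a different shape in between.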
“Arbitrage” CDOs
Figure 12.15 illustrates how arbitrage CDOs work. The CDO has three assets yielding the same,
12.19% (the horizontal line in the picture). However, by restructuring the asset base through a
CDO, we can create claims (Senior and Mezzanine tranches) that yield less than 12.19%, as they
are considerably less risky than the asset base. Such an excess return, (12.19% − Yield_tranche),
with tranche ∈ {Senior, Mezzanine}, is “made available” to the Junior tranche/equity holders,
once we account for management fees and expenses. Note, the previous redistribution of
risk always works when the default correlation is relatively low. As the default correlation in
the asset base increases, the situation may change dramatically, as we now illustrate. Figure
12.16 below provides some comparative statics, with p = 20% instead of p = 10%. The yields are
obviously larger for each tranche, and the three assets now yield 18.78%, reflecting the higher p.
Correlation assumptions
In Figures 12.15 and 12.16, the yield on the junior tranche decreases with default correlation.
This happens because we are assuming that the probability of default is fixed at p = 10% for
each default correlation ρ (say). As ρ increases, the probability of clustering events increases,
which makes the Senior and Mezzanine tranches relatively less valuable and, correspondingly,
the Junior tranches more valuable. A more appropriate model is one in which p increases as
ρ increases, to capture the fact that in bad times, both default correlation and probability of
defaults increase as these two things are intimately related (by, e.g., some common business
cycle factor).

[Figure: Yields on CDO tranches (Junior, Mezzanine, Senior). Horizontal axis: default correlation, from 0 to 1. Vertical axis: yield.]
FIGURE 12.16. Yields on the three CDO tranches, as functions of the default correlation
among the assets in the structure, with probability of default for each name p = 20%. The
thick, horizontal, line is the yield on each securitized asset.
Addressing the correlation assumption
Relax the assumption that the probability of default, p, and the default correlation, ρ, are
independent. For simplicity, assume that ρ = 3.8116 ∗ ln (p + 1), and let p vary from 0.10 to
0.30, so that ρ varies from 0.3633 to 1. The situation, then, changes dramatically. Figure
12.17 depicts the results, which show how modeling assumptions might substantially affect pricing.
First, and naturally, the yield on each securitized asset is increasing in ρ because ρ is, itself,
increasing in the probability of default. Second, the Junior tranche has a yield that increases
over a wide spectrum of values for the default correlation, ρ.

[Figure: Yields on CDO tranches (Junior, Mezzanine, Senior). Horizontal axis: default correlation, from 0.4 to 1. Vertical axis: yield.]
FIGURE 12.17. Yields on the three CDO tranches, as functions of the default correlation
among the assets in the structure, with probability of default and default correlation
related by ρ = 3.8116 ∗ ln (p + 1), p ∈ [0.10, 0.30]. The thick curve depicts the yield
on each securitized asset.
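The link between ρ and p, and its effect on the yield of each securitized asset, can be tabulated directly. The sketch below uses the relation ρ = 3.8116 ∗ ln (p + 1) and the bond parameters of the earlier example (recovery 40, face value 100, safe rate 6%); the grid of values for p and the function names are our own choices.

```python
import math

def rho_from_p(p: float) -> float:
    """Assumed link between default correlation and default probability."""
    return 3.8116 * math.log(1 + p)

def asset_yield(p: float, recovery: float = 40.0, face: float = 100.0, r: float = 0.06) -> float:
    """Yield on a one-year defaultable bond with risk-neutral default probability p."""
    price = math.exp(-r) * (p * recovery + (1 - p) * face)
    return -math.log(price / face)

grid = [0.10 + 0.05 * i for i in range(5)]   # p = 0.10, 0.15, ..., 0.30
for p in grid:
    print(f"p = {p:.2f}  rho = {rho_from_p(p):.4f}  asset yield = {asset_yield(p):.4f}")
```

Since both ρ and the asset yield increase with p, the yield on each securitized asset is increasing in ρ along this curve.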
12.5.6.5 Nth to default
In this contract, the owner of the 1st to default bears the risk of the first default that occurs in
the asset pool:
Payoff = Pr(No. of Defaults ≥ 1) ∗ 40 + Pr(No. of Defaults < 1) ∗ 100.
Likewise, the owner of the 2nd to default bears the risk of the second default that occurs in the
asset pool:
Payoff = Pr(No. of Defaults ≥ 2) ∗ 40 + Pr(No. of Defaults < 2) ∗ 100.
Finally, the owner of the 3rd to default bears the risk of the third default that occurs in the
asset pool:
Payoff = Pr(No. of Defaults = 3) ∗ 40 + Pr(No. of Defaults < 3) ∗ 100.
Let us assume, for simplicity, that default correlation is zero. The relevant probabilities
follow from those computed earlier:
Pr(No. of Defaults ≥ 1) = 0.243 + 0.027 + 0.001 = 0.271
Pr(No. of Defaults ≥ 2) = 0.027 + 0.001 = 0.028
Pr(No. of Defaults = 3) = 0.001
Thus, we have the following prices,
$$
\begin{aligned}
\text{Price}_{\text{1st-to-default}} &= e^{-0.06} \cdot [0.271 \cdot 40 + (1 - 0.271) \cdot 100] = 78.863 \\
\text{Price}_{\text{2nd-to-default}} &= e^{-0.06} \cdot [0.028 \cdot 40 + (1 - 0.028) \cdot 100] = 92.594 \\
\text{Price}_{\text{3rd-to-default}} &= e^{-0.06} \cdot [0.001 \cdot 40 + (1 - 0.001) \cdot 100] = 94.120
\end{aligned}
$$
From here, we can compute the yields as follows, Yield_1st-to-def = −ln (78.863/100) = 23.74%,
Yield_2nd-to-def = −ln (92.594/100) = 7.69%, and Yield_3rd-to-def = −ln (94.120/100) = 6.06%.
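The three prices and yields can be reproduced as follows, under the same assumptions (independent defaults, p = 10%, recovery 40, safe rate 6%). The function names are ours.

```python
import math
from math import comb

P, R, RECOVERY, FACE, N_BONDS = 0.10, 0.06, 40.0, 100.0, 3

def prob_at_least(n: int) -> float:
    """Probability of at least n defaults among N_BONDS independent names."""
    return sum(comb(N_BONDS, k) * P**k * (1 - P)**(N_BONDS - k)
               for k in range(n, N_BONDS + 1))

def nth_to_default_price(n: int) -> float:
    """Discounted expected payoff: recovery if the n-th default occurs, face value otherwise."""
    q = prob_at_least(n)
    return math.exp(-R) * (q * RECOVERY + (1 - q) * FACE)

for n in (1, 2, 3):
    price = nth_to_default_price(n)
    print(f"n = {n}: price = {price:.3f}, yield = {-math.log(price / FACE):.4f}")
```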
12.5.7 One stylized numerical example of a structured product

A. Defaultable bonds
Suppose we observe the following risk-structure of spreads, related to two bonds maturing in
two years:
SpreadA (2 years) = 1.5%, SpreadB (2 years) = 2.5%,
where A and B denote the rating classes the bond issuers belong to. Assume that the one-year
transition rating matrix, defined under the risk-neutral probability, is:
              To
From     A      B      Def
A        0.7    0.3    0
B        0.3    0.5    0.2
Def      0      0      1
where “Def” denotes default. We assume that in the event of default, the recovery value of the
bond is paid off at the end of the second period. We want to determine the expected recovery
rates for the two bonds, and which expected recovery rate is the largest. We have:

$$
e^{rT}\,\frac{D_{0,i}}{N} = Q_i(2)\,\frac{\text{Rec}_i}{N} + (1 - Q_i(2)), \qquad i \in \{A, B\}.
$$
Therefore,

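$$
\text{Spread}_A(2\text{ years}) = 1.5\% = -\frac{1}{2}\ln\left(Q_A(2)\,\frac{\text{Rec}_A}{N} + (1 - Q_A(2))\right) \tag{12.39}
$$
$$
\text{Spread}_B(2\text{ years}) = 2.5\% = -\frac{1}{2}\ln\left(Q_B(2)\,\frac{\text{Rec}_B}{N} + (1 - Q_B(2))\right) \tag{12.40}
$$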
We have to find QA (2) and QB (2). The transition matrix for two years is,
  
$$
Q(2) = \begin{pmatrix} 0.7 & 0.3 & 0 \\ 0.3 & 0.5 & 0.2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 & 0 \\ 0.3 & 0.5 & 0.2 \\ 0 & 0 & 1 \end{pmatrix},
$$
such that,
$$
\Pr\{A \text{ defaults in 2 years}\} = Q_A(2) = 0.70 \cdot 0 + 0.30 \cdot 0.20 + 0 \cdot 1 = 0.06
$$
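$$
\Pr\{B \text{ defaults in 2 years}\} = Q_B(2) = \underbrace{0.20 \cdot 1}_{B \to \text{Def} \to \text{Def}} + \underbrace{0.50 \cdot 0.20}_{B \to B \to \text{Def}} + \underbrace{0.30 \cdot 0}_{B \to A \to \text{Def}} = 0.20 + 0.10 = 0.30.
$$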
Hence, using Eqs. (12.39)-(12.40), we have

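$$
\text{Spread}_A(2\text{ years}) = 1.5\% = -\frac{1}{2}\ln\left(0.06\,\frac{\text{Rec}_A}{N} + (1 - 0.06)\right)
$$
$$
\text{Spread}_B(2\text{ years}) = 2.5\% = -\frac{1}{2}\ln\left(0.30\,\frac{\text{Rec}_B}{N} + (1 - 0.30)\right)
$$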
Solving, yields,
$$
\frac{\text{Rec}_A}{N} = 50.7\%, \qquad \frac{\text{Rec}_B}{N} = 83.7\%.
$$
The expected recovery rate for the second bond is the largest. This is because the probability
firm B defaults is much larger than the probability firm A defaults and yet the two spreads are
relatively close to each other. So to rationalize the two spreads, we need a large recovery rate
for the second bond.
What would happen to the two credit spreads, then, once we assume that the recovery rates
are the same, and equal to 50%? This question sheds additional light on the previous findings.
If the recovery rates are the same and both equal 50%,
$$
\text{Spread}_A(2\text{ years}) = -\frac{1}{2}\ln\left[0.50\,Q_A(2) + (1 - Q_A(2))\right]
$$
$$
\text{Spread}_B(2\text{ years}) = -\frac{1}{2}\ln\left[0.50\,Q_B(2) + (1 - Q_B(2))\right]
$$
Then, using the previously computed transition probabilities for two years, we obtain:
SpreadA (2 years) = 1.52%, SpreadB (2 years) = 8.13%.
When the recovery rates are the same, the spread on the second bond diverges substantially
from that on the first bond.
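The computations of this example can be checked with the short sketch below: it squares the one-year transition matrix, reads off the two-year default probabilities, inverts Eqs. (12.39)-(12.40) for the recovery rates, and recomputes the spreads under a common 50% recovery rate. The helper names are ours.

```python
import math

# One-year risk-neutral transition matrix; states ordered as A, B, Def.
Q1 = [[0.7, 0.3, 0.0],
      [0.3, 0.5, 0.2],
      [0.0, 0.0, 1.0]]

def matmul(a: list, b: list) -> list:
    """Plain matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

Q2 = matmul(Q1, Q1)
q_a, q_b = Q2[0][2], Q2[1][2]            # two-year default probabilities

def implied_recovery(spread: float, q_default: float, maturity: float = 2.0) -> float:
    """Invert spread = -(1/T) ln(q * Rec/N + 1 - q) for the recovery rate Rec/N."""
    return (math.exp(-spread * maturity) - (1 - q_default)) / q_default

def spread(recovery_rate: float, q_default: float, maturity: float = 2.0) -> float:
    """Credit spread implied by a default probability and a recovery rate."""
    return -math.log(q_default * recovery_rate + 1 - q_default) / maturity

print(f"Q_A(2) = {q_a:.2f}, Q_B(2) = {q_b:.2f}")
print(f"Rec_A/N = {implied_recovery(0.015, q_a):.3f}, Rec_B/N = {implied_recovery(0.025, q_b):.3f}")
print(f"Spreads with 50% recovery: A = {spread(0.5, q_a):.4f}, B = {spread(0.5, q_b):.4f}")
```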
B. Collateralized debt obligations
Let us keep on using the same framework as before, but use different figures, so as to figure out
the implications for CDOs pricing. Consider the following one year transition matrix, under the
risk-neutral probability:
              To
From     A      B      Def
A        0.7    0.3    0
B        0.1    0.6    0.3
Def      0      0      1
where “Def” denotes default. Consider (i) one one-year bond issued by a company rated A, and
(ii) three one-year bonds issued by a company rated B. All bonds have face value equal to 100.
We assume that the recovery values in case of default of all these bonds are the same, and equal
to 50. Finally, we assume the safe interest rate is zero.
Consider a collateralized debt obligation (CDO, in the sequel), which gathers the previous
four bonds. Therefore, the CDO has nominal value of 400, and pays off in one year. The CDO
has (i) a senior tranche, with nominal value equal to 150; (ii) a mezzanine tranche, with nominal
value equal to N1 ; and (iii) a junior tranche, with nominal value equal to N2 . We assume that
the structure is such that N1 > 100.
First, we determine the price and yields on all the four bonds. Since the safe interest rate is
zero, and the company rated A is safe, up to the next year, the price of the A bond is 100, and
its yield is zero. As for the three bonds rated B, we have:
$$
P_B = 50 \cdot 0.3 + 100 \cdot 0.7 = 85.0, \qquad \text{Yield}_B = -\ln 0.85 = 16.25\%.
$$
Second, we determine the yield on the junior tranche, and derive the yield on the mezzanine,
as a function of its nominal value N1 . To determine the yield on the tranches, we need to figure
out the following table:
No Def   Pr     Π      π0     π1     π2
0        0.7    400    150    N1     N2
1        0      NA     NA     NA     NA
2        0      NA     NA     NA     NA
3        0.3    250    150    100    0
4        0      NA     NA     NA     NA
where No Def denotes the number of defaults, Pr is the probability of No Def, Π is the pool
payoff, defined as,
Π = No Def ∗ 50 + (4 − No Def) ∗ 100,
and, finally: π 0 is the payoff to the senior tranche, π 1 is the payoff to the mezzanine tranche,
and, π 2 is the payoff to the junior tranche. Therefore, we have:
price mezzanine = 0.70 ∗ N1 + 0.30 ∗ 100, price junior = 0.70 ∗ N2 ,
such that:

$$
\text{Yield mezzanine} = -\ln\frac{0.70 \cdot N_1 + 0.30 \cdot 100}{N_1} = -\ln\left(0.70 + 0.30 \cdot \frac{100}{N_1}\right)
$$
$$
\text{Yield junior} = -\ln\frac{0.70 \cdot N_2}{N_2} = 35.67\%.
$$
N2
Naturally, we need to have that Yield mezzanine < Yield junior. It is simple to show this
relation: it suffices to note that,

$$
\text{Yield junior} = -\ln(0.70) > -\ln\left(0.70 + 0.30 \cdot \frac{100}{N_1}\right) = \text{Yield mezzanine}.
$$
A reverse engineering question is, now, to determine which nominal value of the mezzanine
tranche N1 is needed to ensure that the yield on the mezzanine tranche is equal to or greater
than the yields on the bonds issued by the company with credit rating B. The answer is
N1 = 200, for in this case, the mezzanine tranche would have the same payoff structure as the
bond rated B: it would deliver (i) the face value, in the event the company rated B does not
default; and (ii) half of its nominal value, 100, in the event the company rated B does default.
Finally, we ask which nominal value of the mezzanine tranche N1 is needed, to ensure that
the yield on the mezzanine is equal to 18%? And what is the corresponding nominal value of
the junior tranche, N2 ? To address these issues, we first want that:

$$
\text{Yield mezzanine} = -\ln\frac{0.70 \cdot N_1 + 0.30 \cdot 100}{N_1} = 18\%.
$$
Solving for N1 yields, N1 = 221.78. Therefore, N2 = 400 − Nominal value senior − N1 =
400 − 150 − 221.78 = 28.22.
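The reverse engineering step amounts to inverting the mezzanine yield formula for N1. A minimal sketch, with the figures of the example (zero safe rate, probability 0.70 of no default, mezzanine payoff of 100 in the default state) and function names of our own:

```python
import math

def mezzanine_yield(n1: float, p_no_default: float = 0.70,
                    payoff_in_default: float = 100.0) -> float:
    """Yield on the mezzanine tranche as a function of its nominal value N1 (zero safe rate)."""
    price = p_no_default * n1 + (1 - p_no_default) * payoff_in_default
    return -math.log(price / n1)

def mezzanine_nominal(target_yield: float, p_no_default: float = 0.70,
                      payoff_in_default: float = 100.0) -> float:
    """Invert yield = -ln(p + (1 - p) * payoff / N1) for N1."""
    return (1 - p_no_default) * payoff_in_default / (math.exp(-target_yield) - p_no_default)

n1 = mezzanine_nominal(0.18)
n2 = 400 - 150 - n1                      # junior nominal, given a senior nominal of 150
print(f"N1 = {n1:.2f}, N2 = {n2:.2f}")
print(f"Yield with N1 = 200: {mezzanine_yield(200.0):.4f}")
```

The inversion is only meaningful for target yields below −ln 0.70, the yield on the junior tranche, since the mezzanine yield approaches that level as N1 grows.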
12.5.7.1 The 2007 subprime crisis

Issuance data
European and U.S. Structured Credit Issuance. Source: IMF, Global Financial Stability
Report, April 2008.
Outstanding U.S. Subprime issuance. Source: IMF, Global Financial Stability Report,
April 2008.
Off-balance-sheet entities: “SIV,” “conduits,” and “SIV-lites”
[b. circa 1985]
On the funding side, a typical SIV (Structured Investment Vehicle) issues long-maturity
notes. On the asset side, a SIV typically relies on assets that are more complex than those
conduits rely on. SIVs tended to be more leveraged than conduits. Please remember: SPV =
Special Purpose Vehicle, i.e. a vehicle that organizes securitization of assets; SIV = Structured
Investment Vehicle, i.e. a fund that manages asset backed securities. In a sense, SIVs were virtual
banks, as they used to borrow through low-interest securities and invest the money in longer term
securities yielding large rewards (and risk), as we discuss below. SIVs and conduits typically
had an open-ended lifespan.
SIV-lites are less conservatively managed, and structured with greater leverage. Their port-
folios are not much diversified, and are much smaller in size than SIVs. SIV-lites had a finite
lifespan, with a one-off issuance vehicle. They were greatly exposed to the U.S. subprime market,
more so than SIVs.
Off-balance-sheet entities borrow in the shorter term, typically through commercial paper or
auction rate securities with average maturity of 90 days, as well as medium term notes with
average maturity of a year. They purchase long-maturity debt, such as financial corporate bonds
or asset-backed securities, which is high-yielding. Naturally, the profits made by these entities
are paid to the capital note holders, and the investment managers. The capital note holders
are, of course, the first-loss investors.
The obvious risk incurred by these entities relates to solvency, which is at stake when long-term
asset values fall below the value of short-term liabilities. This risk is likely to
materialize when the pricing of the assets is “informal,” as argued below. A second risk relates
to funding liquidity, which is the risk related to duration mismatch: refinancing occurs short-
term, but if the short-term market conditions are bad, the entities need to sell the assets into a
depressed market. To cope with this risk, the sponsoring banks would grant credit lines. Typical
sponsors were: Citibank ($100bn), JP Morgan Chase ($77bn), Bank of America ($60bn). In the
European Union: HBOS ($42bn), ABN Amro ($40bn), HSBC ($32bn).
The 2007 meltdown
The first obvious issue to think about relates to pricing and the role played by credit ratings.
Being illiquid, the pricing of structured credit products used to rely on that of similarly rated
comparable products for which quotations were available. For example, the price of AAA ABX
subindices would be used to estimate the values of AAA-rated tranches of MBS. Or, the price of
BBB subindices would be used to value BBB-rated MBS tranches. This is the “mapping role”
credit ratings played for the pricing of customized or illiquid structured credit products. How-
ever, it is well-known that the risk profile of structured products differs from that of corporate
bonds. Even if a tranche has the same expected loss as an otherwise similar corporate bond,
unexpected loss or tail risk can be much larger than that for corporate bonds. Therefore, it is
misleading to extrapolate structured products ratings from corporate bonds ratings. Typically,
ratings used to capture only the first moments of the distribution. Moreover, credit rating inertia
for bonds does not necessarily work for structured products, as illustrated in the picture
below.
Two additional fundamental aspects contributed to the meltdown. First, there was an erosion
in lending standards: statistical models were based on historically low mortgage default
and delinquency rates that arose in a credit environment with tight credit standards. Second,
there were correlation issues: past data suggested a quite weak correlation between regional
mortgages, which gave investors a sense of “diversification.” However, the housing
downturn turned out to be a nation-wide phenomenon.
The mechanics of the crisis started with fears of contagion from the rising level of defaults
in subprime underlying instruments, many of which were incorporated in complex products.
The fears of contagion related to safer tranches as well. They came from the investors’
understanding that the pricing models were misspecified, and their lack of trust vis-à-vis the rating
agencies. Banks, on the other side, were affected for a number of reasons: (i) they had invested
in subprime securities directly; (ii) they had provided credit lines to SIV (indebted through
commercial paper) and conduits that held these securities, thereby creating a shadow bank-
ing system, which escaped accounting and supervision rules; and (iii) this very same shadow
system generated banks’ loss of confidence in the ability of their counterparties to meet their
contractual obligations. So the Asset Backed Commercial Paper market dried up, triggering
credit lines. The result was a sell-off of anything related to structured finance, from junk to
AAA, which led to a complete “liquidity black hole,” and a severe reappraisal of structured
finance.
In turn, the reappraisal of structured finance determined severe writedowns, arising in part
through the “liquidity black hole,” i.e. by the market participants’ expectations. Repricing was
difficult indeed. In the absence of a liquid market, writedowns have to rely on marking to model.
But investors did not trust the models and the rating process leading to them! Meanwhile,
credit agencies proceeded to severe downgrades, confirming the investors’ beliefs that ratings
were not entirely appropriate, a quite self-reinforcing mechanism. These events escalated to
a complete dry up in September-October 2008, partly restored by painful bank bail-outs and
recapitalizations.
12.6 A few hints on the risk-management practice

12.6.1 Value at Risk (VaR)
We begin with a general review of Value at Risk (VaR). VaR is a method of assessing risk that
relies on statistical techniques, and is useful for the supervision and management of financial
risks. It originated as a reaction to the financial disasters of the early 1990s, involving Orange
County, Barings, Metallgesellschaft, Daiwa, etc.
Definition I: VaR measures the worst expected loss over a given horizon under normal market
conditions at a given confidence level.
Definition II: We are 100 · (1 − p)% certain that a given portfolio will not suffer a loss larger
than $W over the next N weeks, Pr (Loss < −W ) = p. That is, $VaRp = $W .
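Equivalently, note that
\[
\frac{\text{Loss}}{V_0} = \frac{\Delta V}{V_0} = \text{portfolio return},
\]
where $\Delta V$ denotes the change in value of the portfolio over the next N weeks, and $V_0$ is the
current value of the portfolio. Hence,
\[
p = \Pr\left(\text{Loss} < -\text{VaR}_p\right) = \Pr\left(\frac{\Delta V}{V_0} < -\frac{\text{VaR}_p}{V_0}\right).
\]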
This formulation leads us to the following alternative definition:
Definition III: We are 100 · (1 − p)% certain that a given portfolio will not experience a relative
loss larger than $\text{VaR}_p / V_0$ over the next N weeks.
So in practice, we shall have to find the relative loss, $\ell_p$, for a given confidence p, as follows:
\[
p = \Pr\left(\frac{\Delta V}{V_0} < -\ell_p\right), \quad \text{where } \ell_p = \frac{\text{VaR}_p}{V_0}.
\]
The corresponding $\text{VaR}_p$ is just
\[
\text{VaR}_p = \ell_p \cdot V_0.
\]
For example, suppose that the portfolio return over the next 2 weeks, $\Delta V / V_0$, is normally
distributed with mean zero and unit variance. We know that $0.01 = \Pr\left(\Delta V / V_0 < -2.32\right)$, hence,
$\text{VaR}_p = 2.32 \cdot V_0$.
[Figure: standard normal density of the portfolio return ∆V/V0, plotted from −3 to 3; the 1% left tail lies beyond −VaR/V0.]
We are 99% certain that our portfolio will not suffer a loss larger than 2.32 times its
current value over the next 2 weeks. Equivalently, we are 99% certain that our portfolio will
not experience a relative loss larger than 2.32 over the next 2 weeks.
As a second example, note that the previous assumption about the portfolio return was
extreme. Assume, instead, that the portfolio return over the next 2 weeks, $\Delta V / V_0$, is normally
distributed with mean zero and variance $\sigma^2 = \frac{2}{52}\sigma^2_{\text{year}}$, where $\sigma^2_{\text{year}}$ is the annualized
variance. We assume that $\sigma^2_{\text{year}} = 0.15^2$. We have to re-scale the previous formulas, as follows.
First, we introduce a variable $\tilde{\epsilon} \sim N(0, 1)$, i.e. $\tilde{\epsilon}$ is normally distributed with mean zero and
unit variance. So we can write,
\[
\frac{\Delta V}{V_0} \overset{d}{=} \tilde{\epsilon} \cdot \sigma \sim N\left(0, \sigma^2\right),
\]
and, hence,
\[
0.01 = \Pr\left(\tilde{\epsilon} < -2.32\right) = \Pr\left(\Delta V < -2.32 \cdot V_0 \cdot \sigma\right),
\]
whence, $\text{VaR}_p = 2.32 \cdot V_0 \cdot \sigma$. We know the annualized variance, $\sigma^2_{\text{year}} = 0.15^2$, from which we
can derive the two-week variance, $\sigma^2 = \frac{2}{52}\sigma^2_{\text{year}} \approx 0.03^2$, i.e. a two-week standard deviation
$\sigma \approx 0.03$, and, hence, $\text{VaR}_p / V_0 = 2.32 \cdot \sigma = 2.32 \cdot 0.03 \approx 7\%$. That is, we are 99% certain that
our portfolio will not suffer a loss larger than 7% of its current value over the next 2 weeks.
Equivalently, we are 99% certain that our portfolio will not experience a relative loss larger
than 7% over the next 2 weeks.
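More generally, we may assume the portfolio return over the next 2 weeks, $\Delta V / V_0$, is normally
distributed with mean $\mu$ and variance $\sigma^2$. In this case,
\[
\frac{\Delta V}{V_0} \overset{d}{=} \mu + \tilde{\epsilon} \cdot \sigma \sim N\left(\mu, \sigma^2\right),
\]
and, hence,
\[
0.01 = \Pr\left(\tilde{\epsilon} < -2.32\right) = \Pr\left(\Delta V < -V_0 \cdot (2.32 \cdot \sigma - \mu)\right),
\]
whence, $\text{VaR}_p = V_0 \cdot (2.32 \cdot \sigma - \mu)$. In practice, $\mu$ is very small if the horizon is as short as two
weeks.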
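The normal VaR formula above can be sketched in a few lines of Python. This is only an illustration: the function name `normal_var` and its arguments are assumptions made here, not part of the original notes. The exact 1% quantile of the standard normal is about 2.33, which the text rounds to 2.32.

```python
from math import sqrt
from statistics import NormalDist


def normal_var(v0, sigma, mu=0.0, p=0.01):
    """VaR_p = V0 * (z * sigma - mu), where z = -Phi^{-1}(p).

    Assumes the portfolio return over the horizon is N(mu, sigma^2).
    """
    z = -NormalDist().inv_cdf(p)  # about 2.33 for p = 0.01
    return v0 * (z * sigma - mu)


# Second example of the text: annualized volatility of 15%, two-week horizon.
sigma_year = 0.15
sigma_2w = sigma_year * sqrt(2 / 52)       # two-week standard deviation
relative_var = normal_var(1.0, sigma_2w)   # VaR per unit of portfolio value
print(f"two-week sigma = {sigma_2w:.4f}, VaR/V0 = {relative_var:.4f}")
```

Setting `mu` to a small positive number lowers the VaR slightly, in line with the remark that the mean matters little over short horizons.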
12.6.1.1 Challenges to VaR
The challenges to VaR relate to distributional assumptions, nonlinearities, and conceptual difficulties.

Distributional assumptions
The assumption that data are generated by a normal distribution does not describe asset
returns well. Chapters 10 and 11 explain that we need ARCH effects, stochastic volatility and
multifactor models. More generally, data can exhibit changes in regimes, nonlinearities and fat
tails. Fat tails are particularly important to understand, since the tails are precisely what VaR
is concerned with. It is also quite challenging to understand what the data generating
process is, especially when we consider portfolios of assets. Asset returns and volatilities
are typically correlated, with correlation rising in bad times: correlation is stochastic.
We may make distributional assumptions but, then, these assumptions have to be carefully
assessed through, for example, backtesting (to be explained below). We may proceed with
nonparametric methods, and this is indeed a promising avenue, but with its caveats.
How do nonparametric methods work? These methods rely on an old idea, which is to
estimate the data distribution through histograms. These histograms, then, can be readily used
to compute VaR. This approach is nonparametric in nature, as it does not rely on any model.
A more refined method replaces “rough” histograms with “smoothed” histograms, as follows.
Suppose we have access to a time series of data $x_n$, which are drawn from a certain probability
law, with density $f(x)$. We may define the following estimate of the density $f(x)$,
\[
\hat{f}_N(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{\lambda} K\left(\frac{x - x_n}{\lambda}\right),
\]
where N is the sample size, and K is some symmetric function integrating to one. We may
think of $\hat{f}_N(x)$ as a smoothed histogram, with bin width equal to $\lambda$. It is possible to show
that as N goes to infinity and $\lambda$ goes to zero at a certain rate, $\hat{f}_N(x)$ converges “in probability”
to $f(x)$, for all x. But we are not done, since there are no obvious rules for choosing $\lambda$ and K.
The choice of $\lambda$ is notoriously difficult. Unfortunately, the “bias,” $\hat{f}_N(x) - f(x)$, tends to be
large exactly on the tails of $f(x)$, which are precisely the region we're interested in. In general,
we can use Monte Carlo simulations out of a smoothed density like this to compute VaR.
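The smoothed-histogram estimator, and the Monte Carlo step that turns it into a VaR figure, might be sketched as follows. The Gaussian kernel, the rule-of-thumb bandwidth and the simulated data are all assumptions made for illustration; the text prescribes none of them.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density: symmetric and integrating to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, data, lam):
    """Smoothed histogram: (1/N) * sum_n (1/lam) * K((x - x_n) / lam)."""
    return sum(gaussian_kernel((x - xn) / lam) for xn in data) / (len(data) * lam)


def smoothed_var(data, lam, p=0.01, draws=50_000, seed=0):
    """Relative VaR from Monte Carlo draws out of the smoothed density.

    With a Gaussian kernel, one draw is a randomly picked observation
    plus lam times a standard normal shock.
    """
    rng = random.Random(seed)
    sims = sorted(rng.choice(data) + lam * rng.gauss(0.0, 1.0) for _ in range(draws))
    return -sims[int(p * draws)]  # minus the empirical p-quantile


# Hypothetical data: 500 simulated two-week returns with 3% volatility.
rng = random.Random(42)
returns = [rng.gauss(0.0, 0.03) for _ in range(500)]
sd = math.sqrt(sum(r * r for r in returns) / len(returns))  # mean taken as zero
lam = 1.06 * sd * len(returns) ** (-0.2)  # rule-of-thumb bandwidth (an assumption)
print(f"density at 0: {kernel_density(0.0, returns, lam):.2f}")
print(f"99% relative VaR: {smoothed_var(returns, lam):.4f}")
```

Changing `lam` moves the estimated tail noticeably, which is one way to see why the bandwidth choice matters most exactly where VaR is computed.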
Nonlinearities
Finally, portfolios of assets can behave in a nonlinear fashion, especially when the portfolio
contains derivatives. In general, the value of a portfolio including M assets is,
\[
P = \sum_{i=1}^{M} \alpha_i S_i,
\]
where $\alpha_i$ is the number of units of the i-th asset in the portfolio, and $S_i$ is the price of the i-th asset
in the portfolio. Holding $\alpha_i$ constant, the portfolio return is simply a weighted
average of all the asset returns,
\[
\Delta P_T \equiv P_T - P_t = \sum_{i=1}^{M} \alpha_i \Delta S_i \iff \frac{\Delta P}{P} = \sum_{i=1}^{M} \frac{\alpha_i S_i}{P} \frac{\Delta S_i}{S_i},
\]
where the variations relate to any time interval. Often, the prices $S_i$ are rational functions of
the state variables, or are interlinked through arbitrage restrictions. For example, we may use
factors to determine the risk associated with fixed income securities. When the horizon of the
VaR is large, it is unlikely that $\alpha_i$ is constant. Typically, we shall need to go for numerical
methods, based, for example, on Monte Carlo simulations. So all in all, we need to have a careful
understanding of the derivatives in the book, and proceed with backtesting and stress testing.
VaR as an appropriate measure of risk
There are technical difficulties related to the very definition of VaR, which lacks a fully
satisfactory statistical-theoretic foundation. VaR tells us that 1% of the time, losses will exceed
the VaR figure, but it does not tell us the size of the loss. So we need to compute the expected
shortfall. Any risk measure should enjoy a number of sensible properties. Artzner et al. (1999)
have noted a number of such properties, and showed that VaR does not enjoy the so-called
subadditivity property, according to which the sum of the risk measures for any two portfolios
should be no smaller than the risk measure for the sum of the two portfolios. Expected
shortfall, by contrast, does satisfy the subadditivity property.
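A standard textbook-style illustration of the failure of subadditivity can be checked numerically. The two-bond setup, the 4% default probability and the 95% confidence level below are hypothetical numbers chosen for this sketch; they do not come from the text.

```python
def value_at_risk(dist, p):
    """Smallest loss level L such that Pr(loss > L) <= p.

    dist is a list of (loss, probability) pairs.
    """
    for level in sorted({loss for loss, _ in dist}):
        tail = sum(q for loss, q in dist if loss > level)
        if tail <= p + 1e-12:
            return level


def expected_shortfall(dist, p):
    """Average loss over the worst p-fraction of outcomes."""
    remaining, total = p, 0.0
    for loss, q in sorted(dist, reverse=True):
        take = min(q, remaining)
        total += take * loss
        remaining -= take
        if remaining <= 0.0:
            break
    return total / p


p = 0.05
# One bond: loses 100 with probability 4%, nothing otherwise.
bond = [(0.0, 0.96), (100.0, 0.04)]
# Portfolio of two such bonds with independent defaults.
two_bonds = [(0.0, 0.96**2), (100.0, 2 * 0.96 * 0.04), (200.0, 0.04**2)]

print("VaR: each bond", value_at_risk(bond, p), "| portfolio", value_at_risk(two_bonds, p))
print("ES:  each bond", expected_shortfall(bond, p), "| portfolio", expected_shortfall(two_bonds, p))
```

In this example each bond has a 95% VaR of zero, while the two-bond portfolio has a strictly positive VaR, so the sum of the stand-alone VaRs understates the portfolio VaR. The expected shortfall of the portfolio stays below the sum of the stand-alone expected shortfalls.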
12.6.2 Backtesting
How well would the VaR estimate have performed in the past? How often did the loss in a given
sample exceed the reference-period 99% VaR? If the exceptions occur more than 1% of the
time, there is evidence that the models leading to VaR estimates are “misspecified,” a nice
word for saying “bad” models.
The mechanics of backtesting are as follows. Suppose the models leading to the VaR are
“good”. By construction, the probability the VaR number is exceeded in any reference period
is p, where p is the coverage rate for the VaR. Next, we go to our sample, which we assume
comprises N days, and let M be the number of days the VaR is exceeded. We wish to test
whether the number of exceptions we observe in the sample “conforms” to the expected number
of exceptions based on the VaR. For example, it might be that the number of exceptions we
have observed, M, is larger than the expected number of exceptions, p · N . We want to assess
whether this circumstance arose due to sample variability, rather than model misspecification. A
simple one-tail test is described below.
Let us compute the probability that in N days, the VaR is exceeded for M or more days.
Assuming exceptions are binomially distributed, this probability is,
\[
\Pi_p = \sum_{k=M}^{N} \frac{N!}{k!\,(N-k)!}\, p^k (1-p)^{N-k}.
\]
Then, we can say the following. If Πp ≤ 5% (say), we reject the hypothesis that the probability
of exceptions is p at the 5% level–the models we’re using are misspecified. If Πp > 5% (say),
we cannot reject the hypothesis that the probability of exceptions is p at the 5% level–we can’t
say the models we’re using are misspecified. This test is reviewed in more detail by Hull (2007,
p. 208). Other tests are reviewed by Christoffersen (2003, p. 184).
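The one-tail test can be sketched directly from the binomial formula. The function names and the sample figures (250 trading days, 8 or 3 exceptions) are hypothetical and serve only to illustrate the decision rule.

```python
from math import comb


def exceedance_tail_prob(n_days, n_exceptions, p):
    """Pr(at least n_exceptions exceptions in n_days) under Binomial(n_days, p)."""
    return sum(
        comb(n_days, k) * p**k * (1.0 - p) ** (n_days - k)
        for k in range(n_exceptions, n_days + 1)
    )


def reject_model(n_days, n_exceptions, p, level=0.05):
    """True if the hypothesis 'exception probability is p' is rejected."""
    return exceedance_tail_prob(n_days, n_exceptions, p) <= level


# Hypothetical backtest of a 99% VaR over 250 days (expected exceptions: 2.5).
for m in (3, 8):
    prob = exceedance_tail_prob(250, m, 0.01)
    print(f"M = {m}: tail probability = {prob:.4f}, reject = {reject_model(250, m, 0.01)}")
```

With these inputs, three exceptions are compatible with sample variability, whereas eight exceptions lead to rejection at the 5% level.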
12.6.3 Stress testing

Stress testing is a technique through which we generate artificial data from a range of possible
scenarios. Stress scenarios help cover a range of factors that can create extraordinary losses
or gains in trading portfolios, or make the control of risk in those portfolios very difficult.
These factors include low-probability events in all major types of risks, including the various
components of credit, market, and operational risks. Stress scenarios need to shed light on the
impact of such events on positions that display both linear and nonlinear price characteristics
(i.e. options and instruments that have options-like characteristics).
Possible scenarios include simulating (i) shocks that, although rare or even absent from the
historical database at hand, are likely to happen anyway; and (ii) shocks leading to structural
breaks and/or smooth transition in the data generating mechanism. One possible example is to
set the percentage changes in all market variables in the portfolio equal to the worst percentage
changes having occurred in ten days in a row during the subprime crisis of 2007-2008.
This example on the subprime crisis is related to the historical simulation approach to generating
scenarios. This approach can be explained through a single formula. Let $v_t$ be the value
of some market variable on day t in our sample, where t = 0, · · · , T (say). We can generate T
scenarios for the next day, T + 1, as follows.
(i) The first scenario is that in which each variable grows at the same rate as it grew at
time 1,
\[
v_{T+1} = v_T \cdot \frac{v_1}{v_0}.
\]
(ii) The second scenario is that in which each variable grows at the same rate as it grew at
time 2,
\[
v_{T+1} = v_T \cdot \frac{v_2}{v_1}.
\]
(iii) · · ·
(iv) The T -th scenario is that in which each variable grows at the same rate as it grew at
time T ,
\[
v_{T+1} = v_T \cdot \frac{v_T}{v_{T-1}}.
\]
(v) The T scenarios are generated for all the market variables, which gives us an artificial
multivariate sample of T observations. We can use this sample for many things, including
VaR.
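The scenario formula above can be sketched for a single market variable as follows. The price history is made up for illustration, and the quantile convention used in `scenario_var` is one simple choice among several; with several variables, the same day t would be applied to all of them at once.

```python
def historical_scenarios(values):
    """Scenario t for day T+1: v_T * v_t / v_{t-1}, for t = 1, ..., T."""
    v_last = values[-1]
    return [v_last * values[t] / values[t - 1] for t in range(1, len(values))]


def scenario_var(values, p=0.01):
    """VaR read off the scenario losses v_T - v_{T+1} as an empirical quantile."""
    losses = sorted(values[-1] - s for s in historical_scenarios(values))
    index = min(int((1.0 - p) * len(losses)), len(losses) - 1)
    return losses[index]


# Hypothetical history of one market variable, v_0, ..., v_5.
history = [100.0, 102.0, 99.0, 101.0, 97.0, 98.0]
scenarios = historical_scenarios(history)
print("scenarios for day T+1:", [round(s, 2) for s in scenarios])
print("VaR from scenarios:", round(scenario_var(history), 2))
```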
12.6.4 Credit risk and VaR

We can use the tools in Section 12.2 to assess the likelihood of default for a given name. The
important thing to do is to use the physical probability of default, not the risk neutral one. The
risk neutral probability of default is likely to be larger than the physical one. Therefore, using
the risk neutral probability leads to overly conservative estimates.
VaR for credit risks poses delicate issues as well. The key issue is the presence of default
correlation. In practice, defaults among names or loans are likely to be correlated, for many
reasons. First, there might be direct relationships or, more generally, network effects, among
names. Second, firms' performance could be driven by common economic conditions, as in the
one factor model which we now describe. This one factor model, developed by Vasicek (1987),
is at the heart of Basel II. In the appendix, we provide additional technical details about how
this model is related to a modeling tool known as copulae functions. We now proceed to develop
this model in an intuitive manner. Let us define the following variable:
\[
z_i = \sqrt{\rho}\, F + \sqrt{1-\rho}\, \epsilon_i, \tag{12.41}
\]
where F is a common factor among the names in the portfolio, $\epsilon_i$ is an idiosyncratic term,
and $F \sim N(0, 1)$, $\epsilon_i \sim N(0, 1)$. As we explain in the Appendix, $\rho \geq 0$ is meant to capture the
default correlation among the names.
Next, assume that the physical probability each firm defaults, by T , say P (T ), is the same
for each firm within the same class of risk, and given by,
P (T ) = Φ (ζ PD ) ≡ PD,
where PD is the probability of default, and Φ is the cumulative distribution of a standard

normal variable. That is, by time T , each firm defaults any time that,
zi < ζ PD ≡ Φ−1 (PD) ,
where Φ−1 denotes the inverse of Φ. One economic interpretation of Eq. (12.41) is that zi is the
value of a firm and, then, the firm defaults whenever this value hits some exogenously given
barrier ζ PD .
Conditionally upon the realization of the macroeconomic factor F , the probability of default
for each firm is,
\[
p(F) \equiv \Pr\left(\text{Default} \mid F\right) = \Phi\left(\frac{\Phi^{-1}(\text{PD}) - \sqrt{\rho}\, F}{\sqrt{1-\rho}}\right). \tag{12.42}
\]
By the law of large numbers, this is quite a good approximation to the default rate for a portfolio
of a large number of assets falling within the same class of risk.
We see that this conditional probability is decreasing in F : the larger the level of the common
macroeconomic factor, the smaller the probability each firm defaults. Hence, we can fix a value
of F such that the default rate, $\Pr(\text{Default} \mid F)$, is what we want. Note that the probability
that F is larger than $-\Phi^{-1}(x)$ is just x. Formally,
\[
\Pr\left(F > -\Phi^{-1}(x)\right) = \Pr\left(-F < \Phi^{-1}(x)\right) = \Phi\left(\Phi^{-1}(x)\right) = x.
\]
Then, with probability x, the default rate will not exceed
\[
\text{VaR}_{\text{Credit Risk}}(x) = \Phi\left(\frac{\Phi^{-1}(\text{PD}) + \sqrt{\rho}\, \Phi^{-1}(x)}{\sqrt{1-\rho}}\right).
\]
It is easy to see that VaRCredit Risk (x) increases with ρ. Basel II sets x = 0.999 and, accordingly,
it imposes a capital requirement equal to,
Loss-given-default ∗ [VaRCredit Risk (0.999) − PD] ∗ Maturity adjustment.
The reason Basel II requires the term VaRCredit Risk (0.999) − PD, rather than just
VaRCredit Risk (0.999), is that what is really needed here is the capital covering the 99.9%
worst-case loss in excess of the expected idiosyncratic loss, PD. Well functioning capital
markets should already discount the idiosyncratic losses.
Finally, Basel II requires banks to compute ρ through a formula in which ρ is inversely related
to PD. The formula is based on empirical research (see Lopez, 2004): for a firm which becomes
less creditworthy, the PD increases and its probability of default becomes less affected by market
conditions. Basel II requires banks to compute a maturity adjustment factor that takes into
account that the longer the maturity the more likely it is a given name might eventually migrate
towards a more risky asset class.
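The one-factor formulas above lend themselves to a short numerical sketch. The inputs below (PD of 1%, ρ of 20%, loss-given-default of 45%) are hypothetical, and the maturity adjustment is simply set to one; the actual Basel II formulas for ρ and the maturity adjustment are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_pd(pd, rho, f):
    """Eq. (12.42): default probability given the common factor F = f."""
    return STD_NORMAL.cdf((STD_NORMAL.inv_cdf(pd) - sqrt(rho) * f) / sqrt(1.0 - rho))


def credit_var(pd, rho, x=0.999):
    """Default rate that is not exceeded with probability x."""
    num = STD_NORMAL.inv_cdf(pd) + sqrt(rho) * STD_NORMAL.inv_cdf(x)
    return STD_NORMAL.cdf(num / sqrt(1.0 - rho))


def capital_requirement(pd, rho, lgd, maturity_adjustment=1.0, x=0.999):
    """Loss-given-default * [VaR(x) - PD] * maturity adjustment."""
    return lgd * (credit_var(pd, rho, x) - pd) * maturity_adjustment


pd_, rho_, lgd_ = 0.01, 0.20, 0.45  # hypothetical inputs
print(f"99.9% worst-case default rate: {credit_var(pd_, rho_):.4f}")
print(f"capital per unit of exposure:  {capital_requirement(pd_, rho_, lgd_):.4f}")
```

By construction, `credit_var(pd, rho, x)` equals `conditional_pd` evaluated at the factor level $-\Phi^{-1}(x)$, which is the bad-state realization that occurs with probability 1 − x.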
The previous model can be further elaborated. We ask: (i) What is the unconditional
probability of defaults, and (ii) what is the density function of the fraction of defaulting loans?
First, note that conditionally upon the realization of the macroeconomic factor F , defaults
are obviously independent, being then driven by the idiosyncratic terms $\epsilon_i$ in Eq. (12.41). Given
N loans, and the realization of the macroeconomic factor F , these defaults are binomially
distributed as:
\[
\Pr\left(\text{No of defaults} = n \mid F\right) = \binom{N}{n}\, p(F)^n \left(1 - p(F)\right)^{N-n},
\]
where p (F ) is as in Eq. (12.42). Therefore, the unconditional probability of n defaults is:
\[
\Pr\left(\text{No of defaults} = n\right) = \int_{-\infty}^{\infty} \Pr\left(\text{No of defaults} = n \mid F\right) \phi(F)\, dF,
\]
where $\phi$ denotes the standard normal density. This formula provides a valuable tool of analysis in
risk-management. It can be shown that VaR levels increase with the correlation $\rho$.
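The integral above has no closed form, but it is easy to approximate numerically. The sketch below uses a plain trapezoid rule on a truncated range of F; the grid, the truncation at ±8 and the portfolio inputs (10 loans, PD of 5%) are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_pd(pd, rho, f):
    """Eq. (12.42): default probability given the common factor F = f."""
    return STD_NORMAL.cdf((STD_NORMAL.inv_cdf(pd) - sqrt(rho) * f) / sqrt(1.0 - rho))


def prob_n_defaults(n, n_loans, pd, rho, grid=2001, f_max=8.0):
    """Unconditional Pr(n defaults): trapezoid rule over F in [-f_max, f_max]."""
    step = 2.0 * f_max / (grid - 1)
    total = 0.0
    for j in range(grid):
        f = -f_max + j * step
        p = conditional_pd(pd, rho, f)
        weight = 0.5 if j in (0, grid - 1) else 1.0
        binom = comb(n_loans, n) * p**n * (1.0 - p) ** (n_loans - n)
        total += weight * binom * STD_NORMAL.pdf(f)
    return total * step


def tail_prob(n_min, n_loans, pd, rho):
    """Pr(at least n_min defaults)."""
    return sum(prob_n_defaults(n, n_loans, pd, rho) for n in range(n_min, n_loans + 1))


# Hypothetical portfolio: 10 loans, PD = 5%, with and without correlation.
for rho in (0.0, 0.3):
    print(f"rho = {rho}: Pr(3 or more defaults) = {tail_prob(3, 10, 0.05, rho):.4f}")
```

The expected number of defaults is the same for every ρ, but a positive ρ shifts probability mass towards many simultaneous defaults, which is the sense in which VaR levels increase with correlation.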
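Next, let $\omega$ denote the fraction of defaulting loans. For a large portfolio of loans, $\omega = p(F)$,
such that:
\[
\Pr\left(\omega \leq x\right) = \int_{-\infty}^{\infty} \Pr\left(\omega \leq x \mid F\right) \phi(F)\, dF = \int_{-\infty}^{\infty} \mathbb{I}_{p(F) \leq x}\, \phi(F)\, dF = \Phi\left(F^*\right), \tag{12.43}
\]
where $\mathbb{I}$ denotes the indicator function, and $F^*$ satisfies, by Eq. (12.42),
\[
x = p\left(-F^*\right) = \Phi\left(\frac{\Phi^{-1}(\text{PD}) + \sqrt{\rho}\, F^*}{\sqrt{1-\rho}}\right).
\]
Solving for $F^*$ leaves:
\[
F^* = \frac{\sqrt{1-\rho}\, \Phi^{-1}(x) - \Phi^{-1}(\text{PD})}{\sqrt{\rho}}.
\]
That is, $-F^*$ is the threshold value of the macroeconomic factor above which the frequency of
defaults $\omega$ is less than x. Replacing $F^*$ into Eq. (12.43) delivers the cumulative distribution function
for $\omega$. The density function $f(x)$ for the frequency of defaults is then:
\[
f(x) = \sqrt{\frac{1-\rho}{\rho}}\, \exp\left(\frac{1}{2}\left(\Phi^{-1}(x)\right)^2 - \frac{1}{2\rho}\left(\sqrt{1-\rho}\, \Phi^{-1}(x) - \Phi^{-1}(\text{PD})\right)^2\right).
\]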
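As a sanity check on the two formulas above, the density can be compared with a numerical derivative of the cumulative distribution function. The function names and the inputs (PD of 2%, ρ of 20%) are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def default_fraction_cdf(x, pd, rho):
    """Pr(omega <= x) = Phi(F*), as in Eq. (12.43)."""
    f_star = (sqrt(1.0 - rho) * STD_NORMAL.inv_cdf(x) - STD_NORMAL.inv_cdf(pd)) / sqrt(rho)
    return STD_NORMAL.cdf(f_star)


def default_fraction_pdf(x, pd, rho):
    """Density of the fraction of defaulting loans in a large portfolio."""
    q = STD_NORMAL.inv_cdf(x)
    d = sqrt(1.0 - rho) * q - STD_NORMAL.inv_cdf(pd)
    return sqrt((1.0 - rho) / rho) * exp(0.5 * q * q - d * d / (2.0 * rho))


pd_, rho_, x_, h = 0.02, 0.20, 0.05, 1e-5  # hypothetical inputs
numerical = (default_fraction_cdf(x_ + h, pd_, rho_) - default_fraction_cdf(x_ - h, pd_, rho_)) / (2 * h)
print(f"closed-form density: {default_fraction_pdf(x_, pd_, rho_):.6f}")
print(f"finite difference:   {numerical:.6f}")
```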
12.7 Appendix 1: Present values contingent on future bankruptcies

The value of debt in Leland's (1994) model can be written as:
\[
D(A) = E\left[\int_0^{T_B} e^{-rs} C\, ds\right] + E\left[e^{-rT_B} (1-\alpha) A^B\right], \tag{12A.1}
\]
where $T_B$ is the time at which the firm is liquidated. Eq. (12A.1) simply says that the value of debt
equals the expected coupon payments plus the expected liquidation value of the bond. We have:
\[
E\left[e^{-rT_B}\right] = \int_0^{\infty} e^{-rt} f\left(t; A, A^B\right) dt \equiv p_B(A), \tag{12A.2}
\]
where it can be shown that $p_B(A)$ takes exactly the same form as in Eq. (12.10) of the main text, and
$f\left(t; A, A^B\right)$ denotes the density of the first passage time from $A$ to $A^B$. Similarly,
\[
\begin{aligned}
E\left[\int_0^{T_B} e^{-rs} C\, ds\right] &= C \cdot E\left[\int_0^{T_B} e^{-rs}\, ds\right] \\
&= C \cdot \int_0^{\infty} \left(\int_0^{t} e^{-rs}\, ds\right) f\left(t; A, A^B\right) dt \\
&= C \cdot \int_0^{\infty} \frac{1 - e^{-rt}}{r}\, f\left(t; A, A^B\right) dt \\
&= \frac{C}{r} \cdot \left(1 - p_B(A)\right).
\end{aligned} \tag{12A.3}
\]
Replacing Eqs. (12A.2)-(12A.3) into Eq. (12A.1) yields Eq. (12.9).

Alternative derivation of Eq. (12.12). Under the risk-neutral probability, the expected change
of any bond price must equal zero when the safe short-term rate is zero,
\[
\frac{\partial B(t)}{\partial t} + \lambda \left(\text{Rec} - B(t)\right) = rB = 0, \quad \text{with } B(T) = N,
\]
where the first term, $\partial B(t) / \partial t$, reflects the change in the bond price arising from the mere passage of time,
and $\lambda \left(\text{Rec} - B(t)\right)$ is the expected change in the bond price, arising from the event of default, i.e. the
probability of a sudden default arrival, $\lambda$, times the consequent jump in the bond price, $\text{Rec} - B(t)$.
The solution to the previous equation is,
\[
B(0) = \text{Rec} \cdot \int_0^{T} \underbrace{\lambda e^{-\lambda t}\, dt}_{=\Pr\{\text{Default at } t\}} + N e^{-\lambda T}.
\]
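Proof of Eq. (12.13). The spread is given by:
\[
s(T) = -\frac{1}{T} \ln\left[\frac{\text{Rec}_T \left(1 - e^{-\lambda T}\right) + N e^{-\lambda T}}{N}\right].
\]
With $N = 1$, and $\text{Rec}_T = R \cdot e^{-\kappa T}$, we have,
\[
s(T) = -\frac{1}{T} \ln\left[R e^{-\kappa T} \left(1 - e^{-\lambda T}\right) + e^{-\lambda T}\right] = \lambda - \frac{1}{T} \ln\left[R e^{-(\kappa - \lambda) T} \left(1 - e^{-\lambda T}\right) + 1\right],
\]
or equivalently,
\[
s(T) = -\frac{1}{T} \ln\left[R e^{-\kappa T} \left(1 - e^{-\lambda T}\right) + e^{-\lambda T}\right] = \kappa - \frac{1}{T} \ln\left[R \left(1 - e^{-\lambda T}\right) + e^{-(\lambda - \kappa) T}\right].
\]
Therefore, if $\kappa \geq \lambda$, then $\lim_{T \to \infty} s(T) = \lambda$, and if $\kappa \leq \lambda$, $\lim_{T \to \infty} s(T) = \kappa$.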
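The two limits can be checked numerically by evaluating the spread at a long maturity. The parameter values below are hypothetical and only meant to illustrate the two cases.

```python
from math import exp, log


def spread(T, lam, kappa, R):
    """s(T) with N = 1 and Rec_T = R * exp(-kappa * T)."""
    return -log(R * exp(-kappa * T) * (1.0 - exp(-lam * T)) + exp(-lam * T)) / T


# Case kappa >= lambda: the long-maturity spread approaches lambda.
print(f"kappa = 0.05, lambda = 0.02: s(2000) = {spread(2000.0, 0.02, 0.05, 0.4):.5f}")
# Case kappa <= lambda: the long-maturity spread approaches kappa.
print(f"kappa = 0.01, lambda = 0.02: s(2000) = {spread(2000.0, 0.02, 0.01, 0.4):.5f}")
```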
12.9 Appendix 3: Details on transition probability matrixes and pricing

Consider the matrix $P(T - t)$ for $T - t \equiv \Delta t$, $P(\Delta t)$, and write,
\[
P(\Delta t)_{ij} \equiv
\begin{cases}
1 + \lambda_{ij} \Delta t, & i = j \\
\lambda_{ij} \Delta t, & i \neq j
\end{cases} \tag{12A.4}
\]
We are defining the constants $\lambda_{ij}$ as if they were the counterparts of the intensity of the Poisson process
in Eq. (12.11). Accordingly, these constants are simply interpreted as the instantaneous probabilities
of migration from rating i to rating j over the time interval $\Delta t$. Naturally, for each i, we have that
$\sum_{j=1}^{N} P(\Delta t)_{ij} = 1$, and using this in Eq. (12A.4), we obtain,
\[
\lambda_{ii} = -\sum_{j=1,\, j \neq i}^{N} \lambda_{ij}. \tag{12A.5}
\]
The matrix $\Lambda$ containing the elements $\lambda_{ij}$ defined in Eqs. (12A.4) and (12A.5) is called the generating
matrix.
Next, let us rewrite Eq. (12A.4) in matrix form,
\[
P(\Delta t) = I + \Lambda \Delta t.
\]
Suppose we have a time interval $[0, T]$, which we chop into n pieces, so as to have $\Delta t = T/n$. We have,
\[
P(T) = P(\Delta t)^n = \left(I + \Lambda \frac{T}{n}\right)^n.
\]
For large n,
\[
P(T) = \exp\left(\Lambda T\right), \tag{12A.6}
\]
the matrix exponential, defined as $\exp\left(\Lambda T\right) \equiv \sum_{n=0}^{\infty} \frac{(T \Lambda)^n}{n!}$.
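The passage from Eq. (12A.4) to Eq. (12A.6) can be illustrated numerically by comparing the product $(I + \Lambda T/n)^n$ with the truncated power series of the matrix exponential. The three-state generating matrix below (two ratings plus an absorbing default state) is hypothetical, as are all function names.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transition_product(gen, horizon, n_steps):
    """P(T) approximated by (I + gen * T / n)^n."""
    n = len(gen)
    step = [[(1.0 if i == j else 0.0) + gen[i][j] * horizon / n_steps for j in range(n)]
            for i in range(n)]
    result = identity(n)
    for _ in range(n_steps):
        result = mat_mul(result, step)
    return result


def transition_expm(gen, horizon, n_terms=30):
    """P(T) = exp(gen * T), through the truncated power series."""
    n = len(gen)
    scaled = [[gen[i][j] * horizon for j in range(n)] for i in range(n)]
    term, result = identity(n), identity(n)
    for k in range(1, n_terms):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]  # (T*gen)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Hypothetical generating matrix: rows sum to zero, the last state (default) is absorbing.
generator = [[-0.10, 0.08, 0.02],
             [0.05, -0.15, 0.10],
             [0.00, 0.00, 0.00]]
p_exact = transition_expm(generator, 5.0)
p_approx = transition_product(generator, 5.0, 2000)
for row in p_exact:
    print([round(v, 4) for v in row])
```

Each row of the resulting matrix sums to one, and the product form approaches the exponential as the number of steps grows.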
To evaluate derivatives “written on states,” we proceed as follows. Suppose $F_i$ is the price of a
derivative in state $i \in \{1, \cdots, N\}$. Suppose the Markov chain is the only source of uncertainty relevant for
the evaluation of this derivative. Then,
\[
dF_i = \frac{\partial F_i}{\partial t}\, dt + \left[F_{\tilde{R}} - F_i\right],
\]
where $\tilde{R} \in \{1, \cdots, N\}$, with the usual conditional probabilities. In words, the instantaneous change in
the derivative value, $dF_i$, is the sum of two components: one, $\frac{\partial F_i}{\partial t}\, dt$, related to the mere passage of
time, and the other, $\left[F_{\tilde{R}} - F_i\right]$, related to the discrete change arising from a change in the rating.
Suppose that $r = 0$. Then,
\[
rF_i = 0 = \frac{E\left(dF_i\right)}{dt} = \frac{\partial F_i}{\partial t} + \sum_{j=1}^{N} \lambda_{ij} \left[F_j - F_i\right] = \frac{\partial F_i}{\partial t} + \sum_{j \neq i} \lambda_{ij} \left[F_j - F_i\right],
\]
with the appropriate boundary conditions.

As an example, consider defaultable bonds. In this case, we may be looking for pricing functions
having the following form,
\[
F_i(T - t) = x\, Q_i(T - t) + 1 - Q_i(T - t),
\]
and then solve for $Q_i(T - t)$, for all $i \in \{1, \cdots, N\}$. Naturally, we have
\[
\begin{aligned}
0 &= x Q_i' - Q_i' + \sum_{j \neq i} \lambda_{ij} \left[x \left(Q_j - Q_i\right) - \left(Q_j - Q_i\right)\right] \\
&= x \left[Q_i' + \sum_{j \neq i} \lambda_{ij} \left(Q_j - Q_i\right)\right] - \left[Q_i' + \sum_{j \neq i} \lambda_{ij} \left(Q_j - Q_i\right)\right],
\end{aligned}
\]
which holds if and only if,
\[
Q_i' = -\sum_{j \neq i} \lambda_{ij} \left(Q_j - Q_i\right) = -\sum_{j \neq i} \lambda_{ij} Q_j + \sum_{j \neq i} \lambda_{ij} Q_i = -\left[\sum_{j \neq i} \lambda_{ij} Q_j + \lambda_{ii} Q_i\right].
\]
That is, $Q' = -\Lambda Q$, which, solved through the appropriate boundary conditions, yields precisely Eq.
(12A.6).
12.10 Appendix 4: Derivation of bond spreads with stochastic default intensity

We derive Eq. (12.32), by relying on the pricing formulae of Chapter 11. If the short-term rate is constant,
the price of a defaultable bond derived in Section 11.3.7 of Chapter 11 can easily be extended to, with
the notation of the present chapter,
\[
P(y, N) = e^{-rN} E\left[e^{-\int_0^N \lambda(t)\, dt}\right] + \int_0^N e^{-rt}\, \underbrace{E\left[\lambda(t)\, e^{-\int_0^t \lambda(u)\, du}\right]}_{=\Pr\{\text{Default} \in (t, t+dt)\}} \text{Rec}(t)\, dt. \tag{12A.7}
\]
The term indicated inside the integral of the second term is indeed the density of default time at t,
because,
\[
P_{\text{default by time } t}(\lambda) = 1 - E\left[e^{-\int_0^t \lambda(s)\, ds}\right],
\]
such that differentiating with respect to t yields, under the appropriate regularity conditions, that
Pr{Default ∈ (t, t + dt)} is just the term indicated in Eq. (12A.7). So Eq. (12.32) follows. Naturally,
\[
\Pr\{\text{Default} \in (t, t+dt)\} = -\frac{\partial}{\partial t} P_{\text{surv}}(\lambda, t).
\]
Replacing this into Eq. (12A.7),
\[
\begin{aligned}
P(y, N) &= e^{-rN} E\left[e^{-\int_0^N \lambda(t)\, dt}\right] + \text{Rec} \int_0^N e^{-rt} \left(-\frac{\partial}{\partial t} P_{\text{surv}}(\lambda, t)\right) dt \\
&= 1 - \text{LGD} \left(1 - e^{-rN} P_{\text{surv}}(\lambda, N)\right) - (1 - \text{LGD}) \int_0^N r e^{-rt} P_{\text{surv}}(\lambda, t)\, dt,
\end{aligned}
\]
where the second equality follows by integration by parts and the assumption of constant recovery
rates. Setting r = 0 produces Eq. (12.33).
12.11 Appendix 5: Conditional probabilities of survival

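We prove Eqs. (12.35)-(12.37). First, for $(t_{i-1}, t_i)$ small, the numerator in Eq. (12.34) can be replaced
by
\[
-\frac{\partial}{\partial t} P_{\text{surv}}(\lambda, t) \equiv E\left[\lambda(t)\, e^{-\int_0^t \lambda(s)\, ds}\right],
\]
and rescaled by dt. Regularity conditions under which we can perform this differentiation can be found
in a related context developed in Mele (2003). Eqs. (12.35)-(12.36) follow.
As for Eq. (12.37), the proof follows the same lines of reasoning as that in Appendix 3 of Chapter
11. That is, we can define a density process,
\[
\eta_T(\tau) = \frac{e^{-\int_0^{\tau} \lambda(s)\, ds}\, P_{\text{surv}}\left(\lambda(\tau), \tau, T\right)}{E\left[e^{-\int_0^T \lambda(s)\, ds}\right]}, \quad P_{\text{surv}}\left(\lambda(\tau), \tau, T\right) \equiv E\left[\left. e^{-\int_{\tau}^T \lambda(s)\, ds} \right| F_{\tau}\right].
\]
It is easy to show that the drift of $dP_{\text{surv}} / P_{\text{surv}}$ is $\lambda(\tau)\, d\tau$, such that by Itô's lemma,
\[
\frac{d\eta_T(\tau)}{\eta_T(\tau)} = -\left[-\text{Vol}\left(P_{\text{surv}}\left(\lambda(\tau), \tau, T\right)\right)\right] dW(\tau),
\]
where,
\[
-\text{Vol}\left(P_{\text{surv}}\left(\lambda(\tau), \tau, T\right)\right) \equiv -\frac{\frac{\partial}{\partial \lambda} P_{\text{surv}}\left(\lambda(\tau), \tau, T\right)}{P_{\text{surv}}\left(\lambda(\tau), \tau, T\right)}\, \sigma \sqrt{\lambda(\tau)} = B(T - \tau)\, \sigma \sqrt{\lambda(\tau)},
\]
where the second equality follows by the closed-form expression of $P_{\text{surv}}$ in Eq. (12.29). Therefore, $W_{\lambda}(\tau)$
is a Brownian motion under $Q_{\lambda}$, where
\[
dW_{\lambda}(\tau) = dW(\tau) + B(T - \tau)\, \sigma \sqrt{\lambda(\tau)}\, d\tau,
\]
and Eq. (12.37) follows.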
12.12 Appendix 6: Modeling correlation with copulae functions

A. Statistical independence and correlation
Two random variables are always uncorrelated, provided they are independently distributed. Yet
there might be situations where two random variables are not correlated and still exhibit statistical
dependence. As an example, suppose a random variable y relates to another, x, through $y = k|x|^3$, for
some constant k, and x can take on 2N + 1 values, $x \in \{-x_N, -x_{N-1}, \cdots, -x_1, 0, x_1, \cdots, x_{N-1}, x_N\}$,
with $x_j > 0$ and $\Pr\{x_j\} = \frac{1}{2N+1}$. Then, since x has mean zero, Cov (x, y) is proportional to
\[
\sum_{j=1}^{N} \left(-x_j\right) x_j^3 + \sum_{j=1}^{N} \left(x_j\right) x_j^3 = 0,
\]
and yet, y and x are obviously dependent. This example might be interpreted, economically, as one
where y and x are two returns on two asset classes. These two returns are not correlated, overall. Yet
y is large whenever x is extreme, in both very bad and very good times. This appendix is a succinct
introduction to copulae, which are an important tool to cope with these issues.
Consider two random variables Y1 and Y2 . We may relate Y1 to another random variable Z1 and we
may relate Y2 to a second random variable Z2 , on a percentile-to-percentile basis, viz
Fi (yi ) = Gi (zi ) , i = 1, 2, (12A.8)
where Fi are the cumulative marginal distributions of Yi , and Gi are the cumulative marginal distribu-
tions of Zi . That is, for each yi , we look for the value of zi such that the percentiles arising through the
mapping in Eq. (12A.8) are the same. Then, we may assume that Z1 and Z2 have a joint distribution
and model the correlation between Y1 and Y2 through the correlation between Z1 and Z2 . This indirect
way to model the correlation between Y1 and Y2 is particularly helpful. It might be used to model the
correlation of default times, as in the main text of this chapter. We now explain.
B. Copulae functions
We begin with the simple case of two random variables. This simple case shall be generalized to the
multivariate one with a mere change in notation. Given two uniform random variables $U_1$ and $U_2$,
consider the function $C(u_1, u_2) = \Pr\left(U_1 \leq u_1, U_2 \leq u_2\right)$, which is the joint cumulative distribution of
the two uniforms. A copula function, then, is any such function C, with the property of being capable
of aggregating the marginals $F_i$ into a summary of them, in the following natural way:
\[
C\left(F_1(y_1), F_2(y_2)\right) = F(y_1, y_2), \tag{12A.9}
\]
where $F(y_1, y_2)$ is the joint distribution of $(y_1, y_2)$. Thus, a copula function is simply a cumulative
bivariate distribution function, as $F_1(Y_1)$ and $F_2(Y_2)$ are obviously uniformly distributed. To prove Eq.
(12A.9), note that
\[
\begin{aligned}
C\left(F_1(y_1), F_2(y_2)\right) &= \Pr\left(U_1 \leq F_1(y_1), U_2 \leq F_2(y_2)\right) \\
&= \Pr\left(F_1^{-1}(U_1) \leq y_1, F_2^{-1}(U_2) \leq y_2\right) \\
&= \Pr\left(Y_1 \leq y_1, Y_2 \leq y_2\right) \\
&= F(y_1, y_2).
\end{aligned} \tag{12A.10}
\]
That is, a copula function evaluated at the marginals $F_1(y_1)$ and $F_2(y_2)$ returns the joint distribution
$F(y_1, y_2)$. In fact, Sklar (1959) proves that, conversely, any multivariate distribution function F can
be represented through some copula function.
The best-known copula function is the Gaussian copula, which has the following form:
\[
C(u_1, u_2) = \Phi\left(\Phi_1^{-1}(u_1), \Phi_2^{-1}(u_2)\right), \tag{12A.11}
\]
where $\Phi$ denotes the joint cumulative Normal distribution, and $\Phi_i$ denotes marginal cumulative Normal
distributions. So we have,
\[
F(y_1, y_2) = C\left(F_1(y_1), F_2(y_2)\right) = \Phi\left(\Phi_1^{-1}\left(F_1(y_1)\right), \Phi_2^{-1}\left(F_2(y_2)\right)\right), \tag{12A.12}
\]
where the first equality follows by Eq. (12A.10) and the second equality follows by Eq. (12A.11).
As an example, we may interpret Y1 and Y2 as the times by which two names default. A simple
assumption, then, is to set:
Fi (yi ) = Φi (zi ) , i = 1, 2, (12A.13)
for two random variables Zi that are “stretched” as explained in Part A of this appendix. By replacing
Eq. (12A.13) into Eq. (12A.12),
F (y1 , y2 ) = Φ (z1 , z2 ) .
This reasoning can be easily generalized to the N-dimensional case, where:
F (y1 , · · ·, yN ) = C (F1 (y1 ) , · · ·, FN (yN )) = Φ (z1 , · · ·, zN ) ,
where
zi : Fi (yi ) = Φi (zi ) .
We use this approach to model default correlation among names, as explained in the main text, and
in the next appendix.
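The percentile-to-percentile mapping $F_i(y_i) = \Phi_i(z_i)$ can be illustrated by simulating two default times joined by a Gaussian copula. The exponential marginals, the hazard rates and the correlation value are assumptions made purely for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_default_times(rho, hazards, n_sims=20_000, seed=0):
    """Draw pairs of default times joined by a Gaussian copula.

    Marginals are assumed exponential, F_i(y) = 1 - exp(-hazard_i * y),
    and rho is the correlation between the underlying normals Z1 and Z2.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # F_i(y_i) = Phi(z_i) is equivalent to exp(-hazard_i * y_i) = Phi(-z_i).
        draws.append(tuple(-math.log(STD_NORMAL.cdf(-z)) / h
                           for z, h in zip((z1, z2), hazards)))
    return draws


def joint_default_prob(draws, horizon):
    """Fraction of simulations in which both names default by the horizon."""
    return sum(1 for y1, y2 in draws if y1 <= horizon and y2 <= horizon) / len(draws)


independent = simulate_default_times(0.0, (0.10, 0.10))
correlated = simulate_default_times(0.5, (0.10, 0.10))
print("joint default within 1 year, rho = 0.0:", joint_default_prob(independent, 1.0))
print("joint default within 1 year, rho = 0.5:", joint_default_prob(correlated, 1.0))
```

The marginal distribution of each default time is the same in both runs; only the joint behavior changes with the correlation of the underlying normals.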
12.13 Appendix 7: Details on CDO pricing with imperfect correlation

We follow the copula approach to price the stylized CDOs in the main text of this chapter. For each
name, create the following random variable,
\[
z_i = \sqrt{\rho}\, F + \sqrt{1-\rho}\, \epsilon_i, \quad i = 1, 2, 3, \tag{12A.14}
\]
where F is a common factor among the three names, $\epsilon_i$ is an idiosyncratic term, and $F \sim N(0, 1)$,
$\epsilon_i \sim N(0, 1)$. Finally, $\rho \geq 0$ is meant to capture the default correlation among the names, as follows.
Assume that the risk-neutral probability each firm defaults, by T , is given by,
\[
Q_i(T) = \Phi\left(\zeta_{0.10}\right) \equiv 10\%,
\]
where $\Phi$ is the cumulative distribution of a standard normal variable. That is, by time T , each firm
defaults any time that,
\[
z_i < \zeta_{0.10} \equiv \Phi^{-1}(10\%).
\]
Therefore, $\rho$ is the default correlation among the assets in the CDO.
We can now simulate Eq. (12A.14), build up payoffs for each simulation, and price the tranches
by just averaging over the simulations, as explained below. Naturally, the same simulation technique
can be used to price tranches on CDOs with an arbitrary number of assets. Precisely, simulate Eq.
(12A.14), and obtain values $\tilde{z}_{i,s}$, $s = 1, \cdots, S$, where S is the number of simulations and i = 1, 2, 3.
At simulation no s, we have
\[
\tilde{z}_{1,s},\ \tilde{z}_{2,s},\ \tilde{z}_{3,s}, \quad s \in \{1, \cdots, S\}.
\]
We use the previously simulated values as follows:
• For each simulation s, count the number of defaults across the three names, defined as the
number of times that $\tilde{z}_{i,s} < \zeta_{0.10}$, for i = 1, 2, 3. Denote the number of defaults as of simulation
s with $\text{Def}_s$.
• For each simulation s, compute the total realized payoff of the asset pool, defined as,
\[
\tilde{\pi}_s = \text{Def}_s \cdot 40 + \left(3 - \text{Def}_s\right) \cdot 100.
\]
• For each simulation s, compute recursively the payoffs to each tranche, $\pi_{i,s}$,
\[
\pi_{i,s} = \min\left\{\max\left\{\tilde{\pi}_s - \sum_{k=1}^{i-1} \pi_{k,s},\ 0\right\},\ N_i\right\},
\]
where $N_i$ is the nominal value of each tranche ($N_1 = 140$, $N_2 = 90$, $N_3 = 70$).
• Estimate the price of each tranche by averaging across the simulations,
\[
\text{Price}_{\text{Senior}} = e^{-r} \frac{1}{S} \sum_{s=1}^{S} \pi_{1,s}, \quad \text{Price}_{\text{Mezzanine}} = e^{-r} \frac{1}{S} \sum_{s=1}^{S} \pi_{2,s}, \quad \text{Price}_{\text{Junior}} = e^{-r} \frac{1}{S} \sum_{s=1}^{S} \pi_{3,s}.
\]
Note that the previous computations have to be performed under the risk-neutral probability Q. Using
the probability P in the previous algorithm can only lead to something useful for risk-management
and VaR calculations at best.
Note also that this model can be generalized to a multifactor model where,
\[
z_i = \sqrt{\rho_{i1}}\, F_1 + \cdots + \sqrt{\rho_{id}}\, F_d + \sqrt{1 - \rho_{i1} - \cdots - \rho_{id}}\, \epsilon_i,
\]
with obvious notation.
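The simulation steps of this appendix, for the one-factor model in Eq. (12A.14), can be sketched as follows. The values of ρ and r, the number of simulations and the function name are assumptions made for illustration; the payoffs (100 per surviving name, 40 per defaulted name) and the tranche nominals are those given above.

```python
import math
import random
from statistics import NormalDist


def price_tranches(rho, r, n_sims=20_000, pd=0.10,
                   nominals=(140.0, 90.0, 70.0), seed=0):
    """Monte Carlo prices of the senior, mezzanine and junior tranches."""
    rng = random.Random(seed)
    zeta = NormalDist().inv_cdf(pd)          # default barrier, Phi^{-1}(10%)
    sums = [0.0] * len(nominals)
    for _ in range(n_sims):
        f = rng.gauss(0.0, 1.0)              # common factor F
        defaults = sum(
            1 for _ in range(3)
            if math.sqrt(rho) * f + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) < zeta
        )
        pool = defaults * 40.0 + (3 - defaults) * 100.0   # payoff of the asset pool
        paid = 0.0
        for i, nominal in enumerate(nominals):            # waterfall, senior first
            payoff = min(max(pool - paid, 0.0), nominal)
            sums[i] += payoff
            paid += payoff
    discount = math.exp(-r)
    return [discount * s / n_sims for s in sums]


# Hypothetical inputs: rho = 0.3, r = 2%.
senior, mezzanine, junior = price_tranches(rho=0.3, r=0.02)
print(f"senior = {senior:.2f}, mezzanine = {mezzanine:.2f}, junior = {junior:.2f}")
```

Since the three tranches exhaust the pool, their prices should add up to the discounted expected pool payoff, which here is $e^{-r} \cdot (300 - 60 \cdot 3 \cdot 10\%)$ up to simulation error.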

References
Amato, J. D. (2005): “Risk Aversion and Risk Premia in the CDS Market.” BIS Quarterly
Review, September, 55-68.
Anderson, R. W. and S. Sundaresan (1996): “Design and Valuation of Debt Contracts.” Review
Artzner, P., F. Delbaen, J.-M. Eber, and D. Heath (1999): “Coherent Measures of Risk.”
Berndt, A., R. Douglas, D. Duffie, M. Ferguson and D. Schranz (2005): “Measuring Default
Risk-Premia from Default Swap Rates and EDFs.” BIS Working Papers no. 173.
Black, F. and J. Cox (1976): “Valuing Corporate Securities: Some Effect of Bond Indenture
Provisions.” Journal of Finance 31, 351-367.
Broadie, M., M. Chernov and S. Sundaresan (2007): “Optimal Debt and Equity Values in the
Presence of Chapter 7 and Chapter 11.” Journal of Finance 62, 1341-1377.
Christoffersen, P. F. (2003): Elements of Financial Risk Management. Academic Press.
Duffie, D. and D. Lando (2001): “Term Structure of Credit Spreads with Incomplete Account-
ing Information.” Econometrica 69, 633-664.
Fender, I. and P. Hördahl (2007): “Overview: Credit Retrenchement Triggers Liquidity Squeeze.”
BIS Quarterly Review (September), 1-16.
Hull, J. C. (2007): Risk Management and Financial Institutions. Pearson Education Interna-
tional.
Ingersoll, J. E. (1977): “A Contingent-Claims Valuation of Convertible Securities.” Journal of

International Monetary Fund, (2008): Global Financial Stability Report. April 2008.
Jamshidian, F. (1989): “An Exact Bond Option Pricing Formula.” Journal of Finance 44,
205-209.
Jarrow, R. A., D. Lando and S. M. Turnbull (1997): “A Markov Model for the Term-Structure
of Credit Risk Spreads.” Review of Financial Studies 10, 481-523.
Jorion, Ph. (2008): Value at Risk. New York: McGraw Hill.
Leland, H. E. (1994): “Corporate Debt Value, Bond Covenants and Optimal Capital Struc-
ture.” Journal of Finance 49, 1213-1252.
477
by A. Mele
Leland, H. E. and K. B. Toft (1994): “Optimal Capital Structure, Endogenous Bankruptcy,

and the Term Structure of Credit Spreads.” Journal of Finance 51, 987-1019.
Lopez, J. (2004): “The Empirical Relationship Between Average Asset Correlation, Firm Prob-
ability of Default and Asset Size.” Journal of Financial Intermediation 13, 265-283.
McDonald, R. L. (2006): Derivatives Markets, Boston: Pearson International Edition.
Merton, R. C. (1974): “On the Pricing of Corporate Debt: The Risk-Structure of Interest
Rates.” Journal of Finance 29, 449-470.
Modigliani, F. and M. Miller (1958): “The Cost of Capital, Corporation Finance and the
Theory of Investment.” American Economic Review 48, 261-297.
Sklar, A. (1959): “Fonction de Répartition à N dimensions et Leurs Marges.” Publications de

l’Institut Statistique de l’Université de Paris 8: 229-231.
Vasicek, O. (1987): “Probability of Loss on Loan Portfolio.” Working paper KMV, published
in: Risk (December 2002) under the title “Loan Portfolio Value.”
478
13 Financial engineering and fixed income securities
13.1 Introduction
13.1.1 Relative pricing in fixed income markets
This chapter lays down foundational issues relating to financial engineering for fixed income
securities. Fixed income securities can be particularly complex, as outlined in the previous
two chapters. Many instruments in the fixed income markets differ substantially from those in
the remaining portions of the capital markets. For example, a simple instrument such as a pure
discount bond is very difficult to price. Intuitively, the price of a pure discount bond reflects
the time value of money. It is related to the intertemporal preferences and beliefs of the
market participants, which are unobservable. The situation is different in the case of traditional
“relative pricing,” i.e. when we price a number of assets given the price of some other assets,
while ensuring that there are no arbitrage opportunities “left on the table.” In this case, we
can evaluate derivatives without reference to any preferences or beliefs. The Black & Scholes
formula, for example, is a preference-free formula, although this type of formula or reasoning
cannot be applied exactly to evaluate fixed income securities, as explained below.
13.1.2 Complexity of fixed income securities

The rapid growth in the fixed income markets was also driven by many new instruments that
are substantially more complex than the traditional plain vanilla bonds (i.e. default-free,
non-callable bonds). Examples include defaultable bonds, other instruments related to credit
risk transfers, baskets of fixed income instruments, and callable bonds, where the borrower can
“call” the contract to anticipate the payment of the principal, as we have seen in the previous
chapter.
We have seen that the standard tools of asset evaluation are unlikely to work in this context.
For example, we cannot even hope to “adapt” such models as the Black & Scholes model to
price interest rate derivatives. Indeed, the Black & Scholes model relies on the assumption
of a constant volatility of the asset price underlying the contract. In the context of interest
rate derivatives, instead, the volatility of the underlying asset price depends on the maturity
of the underlying (tends to zero as the maturity goes to zero). More generally, pricing and
hedging interest rate derivatives requires a model that describes the evolution of the entire term
structure of interest rates. Academics and practitioners have proposed a variety of solutions to
this problem, from the mid 80s to the beginning of the 90s. Today, dozens of new methods are
available to price fixed income products. The general principles underlying the APT are still
the same, though.
13.1.3 Many evaluation paradigms

While dozens of new methods are available to price fixed income products, we do not see the
emergence of a “single” model to price all of the extant fixed income products! Market
participants use different models to price interest rate derivatives. Typically, a single investment
bank has a “battery” of different models with which to “fight” in the market. Pieces of this
“battery” may fight for different goals. For example, an investment bank might display a
preference for a certain type of model as a result of (i) its culture and history (see, e.g., the
intellectual legacy of Fischer Black and Emanuel Derman at Goldman Sachs), and (ii) the particular
business the bank is pursuing. For example, we have seen that to price options on interest rates
such as “caps,” we may use the market model, which relies on the “Black 76” formula. However,
using this model implies that we do not have a closed-form solution for the price of “swaptions,”
which can then only be priced through numerical methods. If the “swaptions” business is not
important for the bank, then we may safely adopt the market model.
This chapter presents the main challenges that arise when solving complicated models, while
ensuring that all the products in the books are perfectly fitted.
13.2 Bootstrapping and curve fitting

We start with a standard definition. The yield to maturity ŷ (YTM, henceforth) on a bond is
its rate of return. It is the discount rate that would equate the present value of the stream of
payoffs with its market price,
\[
\hat{y}:\quad B(T) = \sum_{i=1}^{n}\frac{C_{t_i}}{(1+\hat{y})^{t_i}} + \frac{1}{(1+\hat{y})^{T}}. \tag{13.1}
\]
This formula differs from the price formula
\(B(T) = \sum_{i=1}^{n}\frac{C_{t_i}}{[1+r(t_i)]^{t_i}} + \frac{1}{[1+r(T)]^{T}}\), as Eq. (13.1)
uses the same discount rate ŷ to discount all the future payments. Clearly, for zeros we have,
\[
\hat{y} = R(T).
\]
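As an illustration of Eq. (13.1), the following sketch solves for the YTM numerically. It is only one possible approach: the bisection routine, the function names and the sample bond (a 3-year bond with a 5% annual coupon trading at par) are assumptions made for this example, not part of the text.

```python
def bond_price(y, coupons, times):
    """Price of a bond with unit principal when all payoffs are discounted at the single rate y.

    coupons[i] is paid at times[i]; the principal of 1 is paid at times[-1].
    """
    T = times[-1]
    pv_coupons = sum(c / (1.0 + y) ** t for c, t in zip(coupons, times))
    return pv_coupons + 1.0 / (1.0 + y) ** T


def yield_to_maturity(price, coupons, times, lo=-0.99, hi=10.0, tol=1e-12):
    """Solve bond_price(y) = price for y by bisection (the price is decreasing in y)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bond_price(mid, coupons, times) > price:
            lo = mid  # the trial price is too high, so the yield must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical bond: 5% annual coupon, 3 years, trading at par.
ytm_par = yield_to_maturity(1.0, [0.05, 0.05, 0.05], [1, 2, 3])
# A 5-year zero priced at 1.05**-5: its YTM coincides with the 5-year spot rate.
ytm_zero = yield_to_maturity(1.05 ** -5, [0.0], [5])
print(round(ytm_par, 6), round(ytm_zero, 6))
```

For the par bond, the YTM equals the coupon rate; for the zero, it equals the spot rate, in line with ŷ = R(T).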
13.2.1 Extracting zeros from bond prices

In principle, the zeros can be “extracted” from the market price of the bonds, provided there is
a sufficient spread of bonds across maturities. As an example, consider three bonds. The first
bond pays off at T1 , the second bond pays off at T1 , T2 , the third bond pays off at T1 , T2 , T3 . By
no-arbitrage,
    
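\[
\begin{pmatrix} B(T_1)\\ B(T_2)\\ B(T_3)\end{pmatrix}
=
\begin{pmatrix} C_{11}+1 & 0 & 0\\ C_{21} & C_{22}+1 & 0\\ C_{31} & C_{32} & C_{33}+1\end{pmatrix}
\begin{pmatrix} P(T_1)\\ P(T_2)\\ P(T_3)\end{pmatrix},
\]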
for some coupons \(C_{ij}\). Therefore, we can use the observed prices \(B(T_i)\) and the payments
\(C_{ij}\) to calculate the zeros \(P(T_i)\) as,
\[
\begin{pmatrix} P(T_1)\\ P(T_2)\\ P(T_3)\end{pmatrix}
=
\begin{pmatrix} C_{11}+1 & 0 & 0\\ C_{21} & C_{22}+1 & 0\\ C_{31} & C_{32} & C_{33}+1\end{pmatrix}^{-1}
\begin{pmatrix} B(T_1)\\ B(T_2)\\ B(T_3)\end{pmatrix}. \tag{13.2}
\]
The previous procedure can be generalized to the case in which “some maturity is missing.”
The resulting algorithm is known as the bootstrap, which is described next.
13.2.2 Bootstrapping
Bootstrapping proceeds as follows. Let Bi be the price of a bond paying off coupons at the
sequence of dates t1 , t2 , · · · , ti and a principal of $1 at ti . Let Pi be the price of the zero
maturing at ti . Then,
(i) The equation \(B_1 = (C_{11}+1)P_1\) implies that we can extract the zero \(P_1\) as follows,
\(P_1 = \frac{B_1}{1+C_{11}}\).
(ii) Given the equation \((C_{22}+1)P_2 + C_{21}P_1 = B_2\), and the previously computed \(P_1\), we
proceed to extract the zero \(P_2\) as follows, \(P_2 = \frac{B_2 - C_{21}P_1}{C_{22}+1}\).
(iii) In general, we extract the zero \(P_n\) as follows,
\(P_n = \frac{B_n - \sum_{i=1}^{n-1}C_{ni}P_i}{C_{nn}+1}\).
(iv) The previous steps work if we have an ordered sequence of bonds covering all of the maturity
dates. Indeed, the previous procedure boils down to the computation of the solution of
Eq. (13.2). When some of the maturity dates are not available, we replace the required
coupon rate \(C_{ni}\) at time \(t_i\) with a linear interpolation \(\hat{C}_{ni}\) between the
coupon \(C_{n,i-1}\) at time \(t_{i-1}\) and \(C_{n,i+1}\) at time \(t_{i+1}\), as follows,
\[
\hat{C}_{ni} = \frac{t_{i+1}-t_i}{t_{i+1}-t_{i-1}}\,C_{n,i-1} + \frac{t_i-t_{i-1}}{t_{i+1}-t_{i-1}}\,C_{n,i+1}.
\]
The effects of the interpolation should be “visible” near the missing maturities.
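Steps (i)-(iv) can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the three bonds and their coupons are made up, and the prices are generated from known zero prices so that the recursion can be checked.

```python
def interpolate_coupon(t_prev, c_prev, t_next, c_next, t):
    """Linear interpolation of a missing coupon at date t, as in step (iv)."""
    w_prev = (t_next - t) / (t_next - t_prev)
    return w_prev * c_prev + (1.0 - w_prev) * c_next


def bootstrap_zeros(prices, coupons):
    """Forward substitution on the lower-triangular system behind Eq. (13.2).

    prices[n] is the price of the bond maturing at the (n+1)-th date, and
    coupons[n][i] is the coupon that bond pays at the (i+1)-th date. Principal is 1.
    """
    zeros = []
    for n, B_n in enumerate(prices):
        C_n = coupons[n]
        pv_earlier = sum(C_n[i] * zeros[i] for i in range(n))  # coupons before maturity
        zeros.append((B_n - pv_earlier) / (C_n[n] + 1.0))      # step (iii)
    return zeros


# Hypothetical inputs: three bonds with annual coupons of 4%, 5% and 6%.
coupons = [[0.04], [0.05, 0.05], [0.06, 0.06, 0.06]]
true_zeros = [0.95, 0.90, 0.85]
prices = [sum(c * p for c, p in zip(C, true_zeros)) + true_zeros[len(C) - 1]
          for C in coupons]

recovered = bootstrap_zeros(prices, coupons)
print([round(p, 6) for p in recovered])
```

Because the coupon matrix is lower triangular, no matrix inversion is needed: each zero is obtained from the previously extracted ones.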
Consider a sequence of coupon bearing bonds maturing at n with fixed coupon streams \(C_n\).
Then, as explained in Chapter 11, let us define the par yield curve as the sequence of \(C_n\) such
that the price \(B_n\) is “forced” to equal 100%. Therefore, we can “extract” zeros and, then, the
yield curve, from step (iii) above, by just using the recursive formula,
\[
P_n = \frac{B_n - C_n\sum_{i=1}^{n-1}P_i}{C_n+1}, \tag{13.3}
\]
where \(B_n\) = 100%. The following table provides a numerical example.

Coupon    Maturity, n    Zero price    \(\sum_{i=1}^{n}P_i\)    Yield curve*
 6.00%     1             0.9434        0.9434                    6.00%
 7.00%     2             0.8728        1.8162                    7.04%
 8.00%     3             0.7914        2.6076                    8.11%
 9.50%     4             0.6870        3.2946                    9.84%
 9.00%     5             0.6454        3.9400                    9.15%
10.50%     6             0.5306        4.4706                   11.14%
11.00%     7             0.4579        4.9285                   11.81%
11.25%     8             0.4005        5.3290                   12.12%
11.50%     9             0.3472        5.6762                   12.47%
11.75%    10             0.2980        not useful               12.87%
* Discretely compounded
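The table can be reproduced with the recursion in Eq. (13.3). The following is a minimal sketch, assuming annual payment dates and \(B_n = 1\) for every maturity; the function name is hypothetical, and the spot rates are computed with discrete compounding as \(P_n^{-1/n} - 1\).

```python
# Par coupons from the table, for maturities n = 1, ..., 10.
par_coupons = [0.06, 0.07, 0.08, 0.095, 0.09, 0.105, 0.11, 0.1125, 0.115, 0.1175]


def zeros_from_par_yields(par_coupons):
    """Apply Eq. (13.3) with B_n = 1, carrying the running sum of zero prices."""
    zeros, cumulative = [], 0.0
    for C_n in par_coupons:
        P_n = (1.0 - C_n * cumulative) / (C_n + 1.0)
        zeros.append(P_n)
        cumulative += P_n
    return zeros


zeros = zeros_from_par_yields(par_coupons)
# Discretely compounded spot rates implied by the zero prices.
spot = [P ** (-1.0 / n) - 1.0 for n, P in enumerate(zeros, start=1)]

for n, (P, R) in enumerate(zip(zeros, spot), start=1):
    print(n, round(P, 4), f"{100 * R:.2f}%")
```

The printed zero prices and yields should agree with the table up to rounding.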
13.2.3 Curve fitting

We may use statistical techniques, as an alternative to bootstrapping, to cope with situations in
which the number of bonds does not equal the number of maturity dates. Suppose we observe N bonds,
where the i-th bond entitles its holder to receive the coupons \(C_{ij}\), for \(j = 1,\cdots,M_i\).
We assume that the bond prices are observed with errors, or
\[
B(M_i) = \sum_{j=1}^{M_i}C_{ij}P(t_j) + P(t_{M_i}) + \epsilon_i,\quad i = 1,\cdots,N,
\]
where \(\epsilon_i\) is the measurement error for the i-th bond.

We aim to find the curve \(T \mapsto P(T)\) that minimizes the errors, in some statistical sense. The
natural device is to “parametrize” the function P(T), with a number of k parameters, where
k < N. To parametrize the function \(P(t_j)\) for a generic \(t_j\), we can use polynomials, as
originally suggested by McCulloch (1971, 1975),
\[
P(t_j) = 1 + a_1t_j + a_2t_j^2 + \cdots + a_kt_j^k,
\]
where the \(a_i\) are the parameters. Cubic splines are piecewise polynomials up to the third order,
and are very popular. The parameters \(a_i\) can be estimated by minimizing the sum of the squared
errors, \(\sum_{i=1}^{N}\epsilon_i^2\). A well-known pitfall of polynomials is that a high k might
imply that while the polynomial approximation works reasonably well near the observed maturities,
it may exhibit an erratic behavior in between. To avoid this problem, we can use local polynomials,
which are low-order polynomials (typically splines) fitted to non-overlapping subintervals.
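Since P(t) is linear in the parameters \(a_i\), minimizing the sum of squared errors is an ordinary least-squares problem. The sketch below illustrates this with a single global polynomial (not splines). The function names, the small Gaussian-elimination solver and the synthetic bonds are assumptions made for the example; in practice one would rely on a numerical library.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_discount_polynomial(prices, coupons, times, k):
    """Least-squares estimate of a_1..a_k in P(t) = 1 + a_1 t + ... + a_k t^k.

    Bond i pays coupons[i][j] at times[j] and the principal of 1 at its last date.
    """
    X, y = [], []
    for B_i, C_i in zip(prices, coupons):
        t_last = times[len(C_i) - 1]
        X.append([sum(c * times[j] ** m for j, c in enumerate(C_i)) + t_last ** m
                  for m in range(1, k + 1)])
        y.append(B_i - sum(C_i) - 1.0)  # price net of the constant term of P(t)
    # Normal equations (X'X) a = X'y.
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    Xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve_linear(XtX, Xty)


# Synthetic check: prices generated without error from a known quadratic discount function.
def P_true(t):
    return 1.0 - 0.05 * t + 0.001 * t ** 2

times = [1, 2, 3, 4, 5]
coupons = [[0.04] * n for n in range(1, 6)]  # five hypothetical bonds, N = 5 > k = 2
prices = [sum(c * P_true(times[j]) for j, c in enumerate(C)) + P_true(times[len(C) - 1])
          for C in coupons]
a_hat = fit_discount_polynomial(prices, coupons, times, k=2)
print([round(a, 6) for a in a_hat])
```

With error-free prices the fit recovers the coefficients used to generate the data; with noisy prices it returns the least-squares estimates.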
Naturally, we may also want to parametrize the spot rates, R(T), as polynomials. Alternatively,
Nelson and Siegel (1987) propose the following parametrization,
\[
R(T) = \beta_1 + \beta_2\,\frac{1-e^{-\lambda T}}{\lambda T} + \beta_3\left(\frac{1-e^{-\lambda T}}{\lambda T} - e^{-\lambda T}\right),
\]
where \(\beta_i\) and \(\lambda\) are the parameters. These coefficients may be given an
interpretation, in terms of the factors driving the yield curve, reviewed in Chapter 11. The
coefficient \(\beta_1\) governs the level of the yield curve. The coefficient \(\beta_2\) relates
to the slope, as an increase in this coefficient increases short yields more than long yields. The
coefficient \(\beta_3\) shapes the curvature, as an increase in this coefficient has little effect on
very short and very long yields, but increases the middle of the yield curve. Moreover, the
coefficient \(\lambda\) controls the exponential decay of the yield curve: small values of
\(\lambda\) translate to slow decay and can better fit the curve at long maturities; large values
of \(\lambda\), instead, lead to a fast decay, which helps fit the short end of the yield curve.
Finally, \(\lambda\) determines where the loading on \(\beta_3\) achieves its maximum. Diebold and
Li (2006) have used this setting to estimate \(\beta_i\) for each date, and then used these
estimated time series of \(\beta_i\) to forecast future values of \(\beta_i\) through vector
autoregressions and, then, the future yield curve.
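The roles of the three coefficients can be checked numerically. The following sketch evaluates the Nelson and Siegel curve; the parameter values are purely illustrative assumptions, and the function name is hypothetical.

```python
import math


def nelson_siegel(T, beta1, beta2, beta3, lam):
    """Spot rate R(T) under the Nelson-Siegel parametrization, for T > 0."""
    slope_loading = (1.0 - math.exp(-lam * T)) / (lam * T)
    curvature_loading = slope_loading - math.exp(-lam * T)
    return beta1 + beta2 * slope_loading + beta3 * curvature_loading


# Illustrative (made-up) parameters: 5% long-run level, upward-sloping curve, mild hump.
params = dict(beta1=0.05, beta2=-0.02, beta3=0.01, lam=0.5)

short_rate = nelson_siegel(1e-6, **params)    # close to beta1 + beta2
long_rate = nelson_siegel(1000.0, **params)   # close to beta1
for T in (0.25, 1, 2, 5, 10, 30):
    print(T, round(nelson_siegel(T, **params), 5))
```

As T goes to zero the slope loading tends to one and the curvature loading to zero, so the short end is approximately \(\beta_1 + \beta_2\); as T grows, both loadings vanish and the curve converges to \(\beta_1\).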
13.3 Duration, convexity and asset liability management

The risk of holding a default-free bond is that the future bond price is uncertain, due to the
possibility that the spot interest rates could change in the future. In short, we can say that
the risk of a bond is related to the changes in the required bond return, or the YTM. Consider
the definition of the YTM ŷ in Eq. (13.1). Next, consider the following function B(y; T),
\[
B(y;T) = \sum_{i=1}^{n}\frac{C_{t_i}}{(1+y)^{t_i}} + \frac{1}{(1+y)^{T}}.
\]
This function aims to “mimic” how the market price B(T) would behave if the YTM ŷ changed
to some value y. Naturally,
\[
B(\hat{y};T) = B(T).
\]
Motivated by the previous remarks, we can define a measure of risk of the bond based on
the sensitivity of the bond price with respect to changes in y. Economically, we are trying
to answer the following question: What happens to the bond price once we perturb the one
rate ŷ that discounts all the payoffs? Mathematically, this sensitivity is the first partial of the
“bond-pricing” formula B(y; T) with respect to y,
\[
B_y(y;T) = -\frac{1}{1+y}\left[\sum_{i=1}^{n}\frac{t_i\cdot C_{t_i}}{(1+y)^{t_i}} + \frac{T\cdot 1}{(1+y)^{T}}\right],
\]
where the subscript denotes a partial derivative, i.e. \(B_y(y;T) = \frac{\partial}{\partial y}B(y;T)\).
Graphically, this sensitivity measure \(B_y(y;T)\) is the slope of the tangent to the price-yield
relation, as shown in Figure 13.1 below.
13.3.1 Duration
We define the “Macaulay duration” as,
\[
D_{Mac} \equiv \frac{-B_y(y;T)}{B(y;T)}\,(1+y) = \sum_{i=1}^{n}\omega_{t_i}\cdot t_i + \hat{\omega}_T\cdot T,
\]
where
\[
\omega_{t_i} = \frac{C_{t_i}/(1+y)^{t_i}}{B(y;T)},\quad \hat{\omega}_T = \frac{1/(1+y)^{T}}{B(y;T)}.
\]
FIGURE 13.1. The bond price-yield relation (solid line), its first-order approximation (duration) and
its second-order approximation (convexity). Horizontal axis: YTM; vertical axis: bond price.
In words, the Macaulay duration is a weighted average of the payment dates. The weights
\(\omega_{t_i}\) are the discounted coupons at the various payment dates, \(C_{t_i}/(1+y)^{t_i}\),
relative to the current market value of the bond, i.e. the bond price B(y; T) when the YTM is y.
That is, the weights are the proportions of the bond’s present value that are attributable to the
payoff at each date. The weights satisfy \(\sum_{i=1}^{n}\omega_{t_i} + \hat{\omega}_T = 1\).
Therefore, \(D_{Mac} \le T\). The Macaulay duration is a measure of how far in the future the bond
pays off. For zeros, \(D_{Mac} = T\).
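The weighted-average interpretation can be verified numerically. The sketch below computes the Macaulay duration from the weights and compares it with \(-(1+y)B_y/B\), where the derivative is approximated by a central finite difference. The function names and the sample bond are assumptions made for this illustration.

```python
def price(y, coupons, times):
    """B(y; T): coupons[i] paid at times[i], principal of 1 paid at times[-1]."""
    T = times[-1]
    return sum(c / (1 + y) ** t for c, t in zip(coupons, times)) + 1 / (1 + y) ** T


def macaulay_duration(y, coupons, times):
    """Weighted average of the payment dates, with present-value weights."""
    B = price(y, coupons, times)
    T = times[-1]
    weights = [c / (1 + y) ** t / B for c, t in zip(coupons, times)]
    w_T = 1 / (1 + y) ** T / B
    return sum(w * t for w, t in zip(weights, times)) + w_T * T


# Hypothetical 5-year bond with a 6% annual coupon, evaluated at y = 5%.
coupons, times, y = [0.06] * 5, [1, 2, 3, 4, 5], 0.05
d_mac = macaulay_duration(y, coupons, times)

# Cross-check against -(1 + y) * B_y / B, with B_y from a central difference.
h = 1e-6
B_y = (price(y + h, coupons, times) - price(y - h, coupons, times)) / (2 * h)
d_check = -(1 + y) * B_y / price(y, coupons, times)

print(round(d_mac, 4), round(d_check, 4))
```

For a coupon bond the duration is below the maturity, while for a zero it equals the maturity.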
For small y, \(D_{Mac}(y)\) is approximately the semi-elasticity of the bond price with respect to
the YTM. This semi-elasticity is also referred to as “modified duration”:
\[
D \equiv \frac{-B_y}{B} = \frac{D_{Mac}}{1+y}.
\]
A simple computation reveals that the modified duration, D, satisfies:
\(\frac{\partial D}{\partial y} = -\frac{B_{yy}}{B} + \left(\frac{B_y}{B}\right)^2\).
Therefore, the modified duration is decreasing in the YTM when the bond price is sufficiently
convex in the YTM, which is surely the case for long-term maturity dates.
Interestingly, the modified duration is increasing in the YTM when the bond price is concave
in the YTM, a property that arises for callable bonds and mortgage-backed securities (MBS,
henceforth), as explained in Chapters 11 and 12. Intuitively, the incentives to proceed to early
repayments “kick in” as the YTM decreases, which makes the duration of the MBS decrease.
The Macaulay duration for continuously compounded rates is even simpler to compute. First,
define the continuously compounded YTM as the single number x̂ such that
\[
B(\hat{x};T) = \sum_{i=1}^{n}c_{t_i}e^{-\hat{x}\cdot t_i} + e^{-\hat{x}\cdot T},
\]
where B(x̂; T) is the market price of a bond paying off the principal of one at maturity and the
stream of payoffs \(c_{t_i}\). Next, consider the function \(x \mapsto B(x;T)\). Compute the
semi-elasticity of the bond price B(x; T) with respect to the continuously compounded YTM x,
\[
\frac{-B_x(x;T)}{B(x;T)} = \frac{\sum_{i=1}^{n}c_{t_i}t_ie^{-x\cdot t_i} + T\cdot e^{-x\cdot T}}{B(x;T)} = \sum_{i=1}^{n}w_{t_i}\cdot t_i + \hat{w}_T\cdot T,
\]
where \(B_x(x;T) = \frac{\partial B(x;T)}{\partial x}\), \(w_{t_i} = \frac{c_{t_i}e^{-x\cdot t_i}}{B(x;T)}\)
and \(\hat{w}_T = \frac{e^{-x\cdot T}}{B(x;T)}\). Note, the weights are such that
\(\sum_{i=1}^{n}w_{t_i} + \hat{w}_T = 1\). Therefore, the “Macaulay duration” for continuously
compounded rates is equal to the semi-elasticity of the bond price with respect to the continuously
compounded YTM x.\(^{1}\) This result may simplify some calculations.
13.3.2 Convexity
Convexity measures how the sensitivity, \(B_y\), changes with y. Mathematically, convexity is
related to the second partial of the bond price with respect to y, \(B_{yy}\). If the second
partial, \(B_{yy}\), is positive, then, the interest rate sensitivity declines as y increases (see
Figure 13.1). This is because \(\frac{\partial}{\partial y}(-B_y) = -B_{yy} < 0\). Formally,
convexity is defined as,
\[
C \equiv \frac{B_{yy}}{B}.
\]
We may, then, consider the following expansion of the bond price:
\[
\frac{\Delta B}{B} \approx -D\cdot\Delta y + \frac{1}{2}\,C\cdot(\Delta y)^2.
\]
That is, for very “convex securities”, duration alone may not be a safe approximation to the
return on the bond, as also shown in Figure 13.1.
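To see how much the convexity term matters, the following sketch compares the first-order and second-order approximations with the exact return on a 10Y zero after a 2% increase in the YTM. The function names and the chosen bond and shock are assumptions made for this example.

```python
def bond_measures(y, coupons, times):
    """Return price B, modified duration D = -B_y / B and convexity C = B_yy / B."""
    flows = list(zip(coupons, times)) + [(1.0, times[-1])]  # add the principal of 1 at T
    B = sum(c / (1 + y) ** t for c, t in flows)
    B_y = -sum(t * c / (1 + y) ** (t + 1) for c, t in flows)
    B_yy = sum(t * (t + 1) * c / (1 + y) ** (t + 2) for c, t in flows)
    return B, -B_y / B, B_yy / B


def approx_return(y, dy, coupons, times, second_order=True):
    """Approximate relative price change: -D * dy, plus 0.5 * C * dy**2 if requested."""
    _, D, C = bond_measures(y, coupons, times)
    r = -D * dy
    if second_order:
        r += 0.5 * C * dy ** 2
    return r


# Hypothetical case: 10Y zero, YTM moving from 5% to 7%.
y, dy, coupons, times = 0.05, 0.02, [0.0], [10]
exact = (1 + y + dy) ** -10 / (1 + y) ** -10 - 1
first = approx_return(y, dy, coupons, times, second_order=False)
second = approx_return(y, dy, coupons, times)
print(round(exact, 4), round(first, 4), round(second, 4))
```

For a shock of this size, the duration-only approximation overstates the loss, and adding the convexity term brings the approximation much closer to the exact return.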
13.3.3 Duration and asset-liability management

13.3.3.1 Introductory issues
We can use duration to assess how exposed a bond portfolio is to movements in the interest
rates. We can then “immunize” a portfolio of bonds to changes in the interest rates. Duration
is relevant for asset-liability management. For example, pension funds have known streams of
liabilities that must be matched by the assets they hold. In words, the duration of the assets
must equal the duration of the liabilities. In the UK, pension funds must mark-to-market the
liabilities. Therefore, one objective of these funds is to “immunize” their liabilities against
movements in the interest rates.
Alternatively, consider the following basic example. A bank borrows $100 at 2% for a year
and lends this money at 4% for 5 years, where the higher rate compensates for many things
such as risk, the bank’s market power, etc. Assuming that the bank’s borrower does not default,
in the first year, the bank generates profits equal to $(4% − 2%) · 100 = 2, according to its
books. However, the right computation to make should not relate to past market (interest rate)
conditions, but to the current ones. Suppose for example that in one year, the interest rate
for borrowing rises from 2% to 5%. This is of course a bit unrealistic, but it gives the idea
of where the action is. In this case, the market value of the assets is:
\(\frac{100\cdot 1.04^{5}}{1.05^{4}} = 100.09\). The market value of the liabilities is, of course,
100 · 1.02 = 102. The bank’s problem is, of course, a duration mismatch.
\(^{1}\) Mathematically, we could have obtained this result in a straightforward manner, as follows.
Define the bond price function as B(y(x)), where by definition, \(y(x) = e^{x} - 1\). Hence,
\(B_x(y(x)) = B_y(y(x))\,y'(x) = B_y(y(x))\,e^{x} = B_y(y(x))\,(1+y)\). It follows that
\(D_{Mac} = \frac{-B_y(1+y)}{B} = \frac{-B_x}{B}\).

Let us consider a more substantive example, based on asset-liability management for pension
funds. We consider the following extreme example. In 30 years from now, a pension fund is due
to deliver $100,000 to some future retiree. Suppose the current market situation is such that
the yield curve is flat at 4%, such that the market value of this liability is
\(\$100{,}000\cdot(1.04)^{-30} = \$30{,}832\). Accordingly, the would-be retiree invests $30,832 in
the pension fund. So we have the following situation:
Cash        Pensions
$30,832     $30,832
Suppose, now, that the pension fund does not invest this cash. This is of course inefficient, but
it is precisely the point of this simple exercise to see why the strategy is inefficient.
Consider two extreme cases, occurring under two scenarios underlying developments in the
fixed income market. In one week,
(i) Scenario ↑: the yield curve shifts up in parallel to 5%. Accordingly, the value of the liability
for the pension fund is: \(\$100{,}000\cdot(1.05)^{-30} = \$23{,}138\).
Cash   $30,832   |   Profit     $7,694
                 |   Pensions   $23,138
(ii) Scenario ↓: the yield curve shifts down in parallel to 3%. Accordingly, the value of the
liability for the pension fund is: \(\$100{,}000\cdot(1.03)^{-30} = \$41{,}199\).
Cash   $30,832   |   Loss       −$10,367
                 |   Pensions   $41,199
Therefore, a drop in the yield curve results in a loss for the pension fund: when interest rates
go down, the pension fund faces a challenging situation as it has to honour its obligations in
30 years, but the financial market “yields less” than one week ago.
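The figures in the two scenarios follow from simple discounting, as the following sketch shows. The function name is hypothetical, and the cash position is assumed to stay at its initial value because the fund does not invest it.

```python
def liability_value(face, rate, years):
    """Present value of a single payment of `face` due in `years`, on a flat yield curve."""
    return face * (1 + rate) ** (-years)


cash = liability_value(100_000, 0.04, 30)  # amount received from the would-be retiree

results = {}
for label, rate in [("up", 0.05), ("down", 0.03)]:
    liability = liability_value(100_000, rate, 30)
    results[label] = (liability, cash - liability)  # (pensions, profit or loss)
    print(label, round(liability), round(cash - liability))
```

The output should match the liability values and the profit and loss in scenarios (i) and (ii), up to rounding.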
Naturally, the pension fund would face the opposite situation were interest rates to go up.
In some countries, we do not like pension funds to experience volatility. The previous volatility
arises simply because the pension fund receives $30,832, and then just puts this money
“under the pillow.” The most efficient way to kill volatility is, of course, to invest $30,832 in
a 30-year bond as soon as we receive this money, at the market conditions of 4%. This is perfect
hedging! But we do not necessarily have access to such a bond. How do we proceed, then?
We now develop examples that illustrate how to deal systematically with issues relating to
asset-liability management.
13.3.3.2 Hedging
Let us consider a portfolio of two bonds with different durations. Its value is given by,
V = B1 (ŷ1 ) θ1 + B2 (ŷ2 ) θ2 ,
where B1 (ŷ1 ) and B2 (ŷ2 ) are the market value of the bonds, ŷ1 and ŷ2 are the YTM on the
bonds and, finally, θ1 and θ2 are the quantities of bonds in the portfolio. Let us consider a small
change in the two YTM ŷ1 and ŷ2 . We have,
dV = − [D (ŷ1 ) B1 (ŷ1 ) θ1 dŷ1 + D (ŷ2 ) B2 (ŷ2 ) θ2 dŷ2 ] .
The question is: How should we choose θ1 and θ 2 so as to make the value of the portfolio remain
constant after a change in ŷ1 and ŷ2 ?
Let us assume a parallel shift in the term structure of interest rates. In this case,
\(d\hat{y}_1 = d\hat{y}_2\). The portfolio is said to be immunized if its value V does not change as
\(\hat{y}_1\) and \(\hat{y}_2\) change, i.e. dV = 0, which is true when,
\[
\theta_1 = -\frac{D(\hat{y}_2)\,B_2(\hat{y}_2)}{D(\hat{y}_1)\,B_1(\hat{y}_1)}\,\theta_2. \tag{13.4}
\]
A useful interpretation of this portfolio is that we may be holding a bond with some duration,
say we hold θ2 units of the second bond. Given these holdings, we may wish to sell another
bond, possibly with a lower duration, to hedge against movements in the price of the bond we
hold.
Alternatively, we can think of the second asset as a liability the value of which fluctuates after
a change in the interest rates. Then, we may wish to purchase some asset to hedge against the
liability. Mathematically, θ2 < 0 and θ 1 > 0. Moreover, Eq. (13.4) reveals that the number of
assets to hold to hedge against the liability is high if the ratio of the two durations of the assets,
D (ŷ2 )/ D (ŷ1 ), is large. In this case, the hedging position is obviously inefficient. Asset-liability
management, and “immunization”, is costly when we hedge high-duration liabilities with low
duration assets. We now illustrate these cases through a few basic examples.
13.3.3.3 A first example: hedging zeros with zeros
Suppose that we hold one bond, a zero with maturity equal to 5 years. We want to hedge the
risk of this bond through another bond, a zero with maturity equal to 1 year. Let us assume
that the term-structure is flat at 5%, discretely compounded. Then,
\[
B_1(\hat{y}_1) = \frac{1}{1+\hat{y}_1} = \frac{1}{1+0.05} = 0.95238,\quad
D(\hat{y}_1) = \frac{D_{Mac}(\hat{y}_1)}{1+\hat{y}_1} = \frac{1}{1+0.05} = 0.95238,
\]
\[
B_2(\hat{y}_2) = \frac{1}{(1+\hat{y}_2)^{5}} = \frac{1}{(1+0.05)^{5}} = 0.78353,\quad
D(\hat{y}_2) = \frac{D_{Mac}(\hat{y}_2)}{1+\hat{y}_2} = \frac{5}{1+0.05} = 4.7619,
\]
and:
\[
\theta_1 = -\frac{D(\hat{y}_2)\,B_2(\hat{y}_2)}{D(\hat{y}_1)\,B_1(\hat{y}_1)}\,\theta_2
= -\frac{4.7619\cdot 0.78353}{0.95238\cdot 0.95238}\cdot 1 = -4.1135.
\]
That is, to hedge the 5Y zero, we need to short-sell approximately four 1Y zeros. The balance
of this hedging position is,
\[
B_1(\hat{y}_1)\,\theta_1 + B_2(\hat{y}_2)\,\theta_2 = (-4.1135)\cdot 0.95238 + 0.78353 = -3.1341, \tag{13.5}
\]
a quite inefficient hedge.
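The computations of this example can be replicated as follows. The sketch applies Eq. (13.4) to zeros only, for which the modified duration is T/(1 + y); the function names are hypothetical.

```python
def zero_price(y, T):
    """Price of a zero with unit principal, discretely compounded YTM y, maturity T."""
    return (1 + y) ** (-T)


def zero_mod_duration(y, T):
    """Modified duration of a zero: Macaulay duration T divided by (1 + y)."""
    return T / (1 + y)


def hedge_ratio(y1, T1, y2, T2, theta2=1.0):
    """theta_1 from Eq. (13.4), for a position of theta2 units in the second zero."""
    exposure2 = zero_mod_duration(y2, T2) * zero_price(y2, T2)
    exposure1 = zero_mod_duration(y1, T1) * zero_price(y1, T1)
    return -exposure2 / exposure1 * theta2


theta1 = hedge_ratio(0.05, 1, 0.05, 5)                             # units of the 1Y zero
balance = theta1 * zero_price(0.05, 1) + zero_price(0.05, 5)       # left side of Eq. (13.5)
print(round(theta1, 4), round(balance, 4))
```

The two printed numbers correspond to the hedge ratio and to the balance of the hedging position in Eq. (13.5).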

The reason why this is inefficient is clear. Hedging high maturity bonds with short maturity
ones implies we should rebalance too often. Moreover, as time goes on, the sensitivity of the
short-term bonds to changes in the YTM is very small (at the extreme, the price equals face
value plus coupon, at maturity), compared to that of long-term bonds. Therefore, rebalancing
becomes increasingly severe as time unfolds.
Next, we study how the value of this portfolio changes after large changes in the YTM.
By the assumption that the initial term-structure is flat at 5%, ŷ1 = ŷ2 = 5%. Moreover, by
rearranging Eq. (13.5),
B2 (y = 5%) = 4.1135 · B1 (y = 5%) − 3.1341.
The left hand side of this equation is the price of the 5Y bond. The right hand side is the value
of the “replicating” portfolio, which consists of (i) approximately 4 units of the 1Y bond, and
(ii) the balance of the hedging position.
When y ≠ 5%, the previous relation can only hold approximately,
\[
B_2(y) \approx 4.1135\cdot B_1(y) - 3.1341.
\]
Figure 13.2 below plots the left hand side and the right hand side of this relation.
FIGURE 13.2. Dotted line (top): The price of the 5Y zero, \(B_2(y) = \frac{1}{(1+y)^{5}}\), where y is
the YTM. Solid line (bottom): The value of the “replicating” portfolio consisting of (i)
4.1135 units of the 1Y zero, and (ii) the balance of the hedging position, which is equal
to −$3.1341, i.e. \(4.1135\cdot B_1(y) - 3.1341\), where \(B_1(y) = \frac{1}{1+y}\) is the 1Y zero
price. Horizontal axis: YTM, from 0.00 to 0.10.
What is going on? We are hedging the 5Y zero by selling approximately four 1Y zeros. In a
neighborhood of y = 5%, the value of the “synthetic” 5Y zero we sold, 4.1135 · B1 (y) − 3.1341,
behaves as B2 (y). However, the 5Y zero displays more convexity than the “synthetic” bond.
This larger convexity implies that:
• If the interest rates go down, the price of the 5Y zero bond we hold increases more than
the value of the “synthetic” bond we sold. As a result, we make profits.
• If the interest rates go up, the price of the 5Y zero bond we hold decreases less than the
value of the “synthetic” bond we sold. As a result, we make profits.
In all cases, we make profits.\(^{2}\) However, this is not an arbitrage opportunity! The previous
reasoning hinges on the assumption of a parallel shift in the term-structure of interest rates,
that is dŷ1 = dŷ2 , where ŷ1 = spot rate for 1 year, and ŷ2 = spot rate for 5 years. While parallel
shifts in the term-structure seem empirically relevant, they are not the only shifts that are likely
to occur, as we explained in Chapter 11.
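The convexity argument can be checked numerically: under parallel shifts of a flat curve, the 5Y zero is never worth less than the “synthetic” bond. The sketch below is an illustration only; the function name is hypothetical, and it inherits the assumption of parallel shifts discussed above.

```python
def hedged_position_value(y, y0=0.05):
    """Value of holding one 5Y zero against the 'synthetic' bond sold, at the new flat YTM y.

    The hedge is set up at y0: theta units of the 1Y zero plus a fixed balance.
    """
    theta = 5.0 / (1 + y0) ** 4                      # about 4.1135 when y0 = 5%
    balance = (1 + y0) ** -5 - theta / (1 + y0)      # about -3.1341 when y0 = 5%
    synthetic = theta / (1 + y) + balance
    return (1 + y) ** -5 - synthetic


grid = [k / 100 for k in range(0, 11)]               # YTM from 0% to 10%
gains = [hedged_position_value(y) for y in grid]
for y, g in zip(grid, gains):
    print(f"{y:.2f}", round(g, 5))
```

The gain is zero at 5% and positive on both sides, which is the gap between the two lines in Figure 13.2.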
13.3.3.4 Fixed income arbitrage
Swap spread arbitrage is a popular strategy. It was responsible for leading LTCM to a loss of
about $1.6 billion in 1998. The strategy works as follows: (i) enter a swap paying the floating
LIBOR, \(L_t\), and receiving a fixed rate \(\bar{C}\); (ii) short a par Treasury with the same
maturity as the swap, thus paying the fixed coupon rate \(C_T\), and invest the proceeds at the repo
rate \(r_t\). Thus, the payoff of the strategy is the fixed spread to be received,
\(F = \bar{C} - C_T\), and the floating spread to be paid, \(S_t = L_t - r_t\). So we go long or
short this strategy according to whether we view F to be larger or smaller than the average
floating spread \(S_t\) over the strategy horizon. Historically, the spread \(S_t\) has certainly
fluctuated, but has been quite stable on average, so it is a reasonable strategy. The problem is
that, occasionally, \(S_t\) can reach quite large levels.
A second strategy is yield curve arbitrage. For example, go long five-year bonds, and short
two- and ten-year bonds, an implicit view that short-term interest rates will rise and medium-term
interest rates will fall. This “butterfly” strategy is also known as barbell trading, and
will be further illustrated in the next subsection. This might be quite cheap, intellectually, and
not necessarily rewarding.
More sophisticated strategies rely on models, which identify which points of the yield curve
are misaligned from those predicted by the model. The strategy, substantially, is: buy the poor
and short the model-based rich, where the model-based rich is replicated through a portfolio
with cash and the bonds that are well-priced by the model, weighted with the model-based
delta, as in the derivation of the bond pricing formula in Section 11.3.2.2 of Chapter 11.
13.3.3.5 Duration trading: Barbell and bullet hedges
As a second example of duration hedging, consider the “barbell” trading strategy, which is a
way to hedge some liability (a “bullet”) with duration \(D_2\) through two assets with durations
\(D_1\) and \(D_3\), where \(D_1 < D_2 < D_3\). This trading strategy is expected to work when we
expect the yield curve to flatten, with its short end not rising too much. Moreover, investing
in the short-term segment of the yield curve allows one to invest elsewhere relatively rapidly
once the first asset expires, were the bond market to go down.
\(^{2}\) Mathematically, we buy 1 unit of the 5Y zero at \(B_2\) and sell \(\theta_1\) units of the 1Y
zero at \(B_1\) (here \(\theta_1 = 4.1135\) denotes the number of units sold), thereby cashing in
\(\theta_1B_1 - B_2 = 3.1341\). Then, in one month (say), consider what would happen if we had to
reverse the position in Eq. (13.5), i.e. sell the 5Y zero and buy back the 1Y zeros we sold. We
consider three scenarios: (i) The yield curve will be the same as today. In this case, reversing the
position in Eq. (13.5) implies that we shall simply have to pay 3.1341 (assuming the change in value
of the two bonds due to the mere passage of time is small enough). (ii) The yield curve will
experience a positive parallel shift. In this case, the prices of the two zeros will be
\(B_1 - \Delta B_1\) and \(B_2 - \Delta B_2\), where \(\Delta B_1\) and \(\Delta B_2\) are both
positive. Therefore, we shall obtain \(-3.1341 + \theta_1\Delta B_1 - \Delta B_2\), where
\(\theta_1\Delta B_1 - \Delta B_2\) is positive because by convexity, the value of the portfolio
decreases more than the price of the 5Y zero, thus yielding a profit. (iii) The yield curve will
experience a negative parallel shift. In this case, the prices of the two zeros will be
\(B_1 + \Delta B_1\) and \(B_2 + \Delta B_2\), where \(\Delta B_1\) and \(\Delta B_2\) are both
positive. Therefore, we shall obtain \(-3.1341 + \Delta B_2 - \theta_1\Delta B_1\), where
\(\Delta B_2 - \theta_1\Delta B_1\) is positive because by convexity, the value of the portfolio
increases less than the price of the 5Y zero, thus yielding a profit.

Let us consider the previous example, and suppose there is another bond available for trading,
a zero with maturity equal to 10 years. We aim to hedge against movements in the price of the
5Y zero with a portfolio consisting of (i) the 1Y zero and (ii) the 10Y zero. We continue to
assume that the yield curve is flat at 5%, and only consider parallel shifts in the term structure
of interest rates.
Such a “butterfly” trade can be implemented as follows. We look for a portfolio of the 1Y and
10Y zero with the following properties: (i) the market value of the portfolio equals the market
price of the 5Y zero,
\[
B_2(\hat{y}_2) = B_1(\hat{y}_1)\,\theta_1 + B_3(\hat{y}_3)\,\theta_3; \tag{13.6}
\]
and (ii) the duration of the portfolio equals the duration of the 5Y zero,
\[
D(\hat{y}_2)\,B_2(\hat{y}_2) = D(\hat{y}_1)\,B_1(\hat{y}_1)\,\theta_1 + D(\hat{y}_3)\,B_3(\hat{y}_3)\,\theta_3. \tag{13.7}
\]
The solution to Eqs. (13.6) and (13.7) is given by,
\[
\theta_1 = \frac{D(\hat{y}_3) - D(\hat{y}_2)}{D(\hat{y}_3) - D(\hat{y}_1)}\,\frac{B_2(\hat{y}_2)}{B_1(\hat{y}_1)},\quad
\theta_3 = \frac{D(\hat{y}_2) - D(\hat{y}_1)}{D(\hat{y}_3) - D(\hat{y}_1)}\,\frac{B_2(\hat{y}_2)}{B_3(\hat{y}_3)}. \tag{13.8}
\]
By the same computations made in the previous example, we have that \(B_3(\hat{y}_3) = 0.61391\)
and \(D(\hat{y}_3) = 9.5238\). By using the figures in the previous example, we compute \(\theta_1\)
and \(\theta_3\) in Eqs. (13.8) to be
\[
\theta_1 = \frac{9.5238 - 4.7619}{9.5238 - 0.95238}\cdot\frac{0.78353}{0.95238} = 0.45706,\quad
\theta_3 = \frac{4.7619 - 0.95238}{9.5238 - 0.95238}\cdot\frac{0.78353}{0.61391} = 0.56724.
\]
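The barbell weights in Eqs. (13.8) can be computed and checked against the two defining conditions, Eqs. (13.6) and (13.7). The following sketch assumes zeros and a flat curve at 5%, as in the example; the function name and the dictionary layout are assumptions made for the illustration.

```python
def barbell_weights(B, D):
    """theta_1 and theta_3 from Eqs. (13.8); B and D are dicts keyed by 1, 2, 3."""
    theta1 = (D[3] - D[2]) / (D[3] - D[1]) * B[2] / B[1]
    theta3 = (D[2] - D[1]) / (D[3] - D[1]) * B[2] / B[3]
    return theta1, theta3


y = 0.05
maturities = {1: 1, 2: 5, 3: 10}
B = {i: (1 + y) ** (-T) for i, T in maturities.items()}   # zero prices
D = {i: T / (1 + y) for i, T in maturities.items()}       # modified durations of zeros

theta1, theta3 = barbell_weights(B, D)

# Residuals of the value condition (13.6) and of the duration condition (13.7).
value_gap = B[1] * theta1 + B[3] * theta3 - B[2]
duration_gap = D[1] * B[1] * theta1 + D[3] * B[3] * theta3 - D[2] * B[2]
print(round(theta1, 5), round(theta3, 5))
```

Both residuals should be zero up to floating-point error, confirming that the barbell matches the bullet in value and in duration.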
Figure 13.3 depicts the behavior of the bullet price and the market value of the barbell as
we change the YTM. Several comments are in order. First, note that the barbell portfolio is
now more convex than the bullet! Now, large movements in the YTM lead to profits, provided
we maintain the assumption of parallel shifts in the term-structure of interest rates. Second,
the barbell trade is “self-financed.” By construction, the value of the bullet we sell equals the
value of the barbell portfolio. However, the barbell is clearly not an arbitrage opportunity. The
scenario underlying Figure 13.3 relies on the assumption of a parallel shift in the term structure
of interest rates. As we explained in Chapter 11, it is not realistic to simultaneously assume
large and parallel movements in the term-structure of interest rates. Historically, large interest
rate shifts (that is, typically, shifts occurring over large horizons of time) are accompanied by
the occurrence of a variety of shape modifications.
FIGURE 13.3. “Barbell trading.” Dotted line (bottom): The price of the 5Y zero,
\(B_2(y) = \frac{1}{(1+y)^{5}}\), where y is the YTM. Solid line (top): The value of the “barbell”
portfolio consisting of (i) 0.45706 units of the 1Y zero and (ii) 0.56724 units of the 10Y zero,
i.e. \(B_1(y)\cdot 0.45706 + B_3(y)\cdot 0.56724\), where \(B_1(y) = \frac{1}{1+y}\) is the 1Y zero
price and \(B_3(y) = \frac{1}{(1+y)^{10}}\) is the 10Y zero price. Horizontal axis: YTM, from 0.00
to 0.10.
Table 13.1 considers the case of non-parallel shifts in the term-structure. We assume that
the initial term-structure is not flat. Then, we consider two scenarios: (i) A “twist” in the
term-structure, i.e. long-term rates lower than short-term rates; (ii) a “steepening” of the term-
structure.
TABLE 13.1.
Maturity | YTM | Bullet price | Mod. dur. | Barbell value = θ1 B1 (ŷ1 ) + θ3 B3 (ŷ3 )
Initial term-structure | | | | 0.783
1Y | ŷ1 = 4% | B1 (ŷ1 ) = 0.961 | D (ŷ1 ) = 0.961 |
5Y | ŷ2 = 5% | B2 (ŷ2 ) = 0.783 | D (ŷ2 ) = 4.762 |
10Y | ŷ3 = 6% | B3 (ŷ3 ) = 0.558 | D (ŷ3 ) = 9.434 |
“Twist” | | | |
1Y | ŷ1 = 6% | B1 (ŷ1 ) = 0.943 | D (ŷ1 ) = 0.943 |
5Y | ŷ2 = 5% | B2 (ŷ2 ) = 0.783 | D (ŷ2 ) = 4.762 |
10Y | ŷ3 = 4% | B3 (ŷ3 ) = 0.675 | D (ŷ3 ) = 9.615 |
“Steepening” | | | |
1Y | ŷ1 = 4% | B1 (ŷ1 ) = 0.961 | D (ŷ1 ) = 0.961 |
5Y | ŷ2 = 5% | B2 (ŷ2 ) = 0.783 | D (ŷ2 ) = 4.762 |
10Y | ŷ3 = 7% | B3 (ŷ3 ) = 0.508 | D (ŷ3 ) = 9.346 |
We use the portfolio in Eq. (13.8), and find that at the initial term-structure
(ŷ1 = 4%, ŷ2 = 5%, ŷ3 = 6%), θ1 = 0.449 and θ3 = 0.629. We keep this portfolio fixed, and
compute the barbell value, θ1 B1 (ŷ1 ) + θ3 B3 (ŷ3 ), in the two scenarios “twist” and
“steepening.” The convexity of the barbell trade is in fact a bet on long-term bonds and leads
to a profit in the “twist” scenario (since B2 (ŷ2 ) = 0.783 in all cases). That is, by convexity, the
price B3 varies more than the price of shorter maturity zeros, thus leading to profits. However,
note that the barbell bet leads to losses in the “steepening” scenario.
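The scenario analysis can be reproduced numerically. The sketch below is only illustrative: it assumes zero prices of the form (1 + y)^(-T), takes the rounded weights quoted in the text as given, and measures the profit of the trade as the barbell value minus the bullet price; the dictionary and function names are hypothetical.

```python
def zero_price(y, maturity):
    """Price of a zero paying 1 at `maturity`, with discretely compounded YTM y."""
    return (1.0 + y) ** (-maturity)


def barbell_value(theta1, theta3, y1, y3):
    """Value of theta1 units of the 1Y zero plus theta3 units of the 10Y zero."""
    return theta1 * zero_price(y1, 1) + theta3 * zero_price(y3, 10)


# Weights quoted in the text for the initial curve (4%, 5%, 6%), held fixed.
THETA1, THETA3 = 0.449, 0.629

# The 5Y yield stays at 5% in every scenario, so the bullet price is unchanged.
bullet = zero_price(0.05, 5)

# (1Y yield, 10Y yield) in each scenario of Table 13.1.
scenarios = {
    "initial": (0.04, 0.06),
    "twist": (0.06, 0.04),
    "steepening": (0.04, 0.07),
}

pnl = {
    name: barbell_value(THETA1, THETA3, y1, y3) - bullet
    for name, (y1, y3) in scenarios.items()
}
for name, value in pnl.items():
    print(f"{name:>10}: barbell minus bullet = {value:+.4f}")
```

The sign of each entry, rather than its exact size, is the point: the trade is approximately self-financed at the initial curve, gains under the twist and loses under the steepening.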
A caveat. The previous computations should be interpreted with some care, as the value of
the zeros changes over time. Notably, the value of the zeros changes over the horizon after which
we are designing scenarios, even without any changes in the yield curve. However, this effect
is usually minor when the horizon is sufficiently small and, generally, can be factored into the
analysis.
To sum up, duration hedging is a useful tool, although it has some limitations. It is only a
first-order approximation to the price of a bond. A conventional bond is typically strictly convex
in the YTM. Therefore, for large changes in the YTM, we should update the duration-based
hedging ratios. Re-adjustments are in order anyway, since fixed income securities have duration
that decreases over time.
13.3.3.6 Negative convexity
What happens when bond prices have “negative convexity”? In Chapter 12, we saw that the
value of a callable bond can be concave in the short-term rate. A similar feature is displayed
by mortgage-backed-securities (MBS, henceforth), which can now be concave in the YTM! The
reason for this negative convexity is that early repayments are likely to occur as the YTM
decreases, which entails two inextricable consequences: (i) the price of the MBS “increases less”
than a conventional bond price after a decline in the YTM, especially when the YTM is low;
(ii) the duration of the MBS decreases as the YTM decreases.
MBS may be responsible for financial turmoil. The mechanism is well-known. Institutions
that hold MBS typically short conventional bonds for hedging purposes. But the MBS duration
increases as interest rates increase, due to the negative convexity:
$\frac{\partial \text{Duration}}{\partial r} = -\text{Convexity}$.
Therefore, an interest rate increase can lead these institutions to short additional conventional
bonds, which worsens liquidity and leads to a further increase in the interest rates, thereby
feeding a vicious circle. Perli and Sack (2003) estimate that in 2002 and 2003, this mechanism
may have amplified the volatility of the long-term US rates by a factor between 15% and 30%.
13.4 Foundational issues on interest rate modeling

In principle, the classical ideas underlying contingent claim analysis through binomial trees
can be put to work in the context of fixed income instruments. In this context, however, we
need to revise quite a few methodological details. Let us illustrate the general issues. First, let
us review how binomial trees are constructed, in general:
(i) We begin with a probabilistic representation of how the price develops over time, using a
tree-like information structure.
(ii) For example, at the time of evaluation, we observe the state. In the next period, there
can be two mutually exclusive states of the world: (a) the state “up,” occurring with
probability p; and (b) the state “down,” occurring with probability 1 − p.
(iii) After two periods, there can be three mutually exclusive states of the world, as in the
following diagram. We label the tree in this diagram a “recombining” tree, to emphasize
that the “up & down” and the “down & up” nodes are the same.
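Today
    → “up” state (first period, probability p)
        → “up”, “up” (second period, probability p)
        → “up”, “down” (second period, probability 1 − p)
    → “down” state (first period, probability 1 − p)
        → “down”, “up” (second period, probability p); same node as “up”, “down”
        → “down”, “down” (second period, probability 1 − p)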
The previous diagram can be used to price options written on stocks. The stock price unfolds
through the branches of the tree. Then, we figure out the no-arbitrage movements of the option
price along the tree. Suppose, however, we wish to price an option written on a zero, a 3 Year
zero say. Can we apply the same methodology to price the option? The answer is no, and the
reason is that we cannot exogenously “track” the movements of the prices of the zero, as in the
case of the stock price. Instead, after one year, the 3 Year zero becomes a 2 Year zero, i.e. quite
a different asset.
The trick, here, is to model the movements of the yield curve. There are two approaches. In
the first approach, we model the dynamics of the short-term rate, defined as the interest rate
on a loan with maturity equal to the time intervals in the tree. The resulting model, which in
Chapter 11 we called model of the short-term rate, has implications in terms of the movements of
the entire term-structure. This approach gives rise to evaluation formulae in which the current
prices of the zeros predicted by the model are not necessarily equal to the market prices. We
develop this approach in this section. In a second approach, which in Chapter 11 we called
no-arbitrage, or calibration, approach, we model the dynamics of the entire term-structure.
This approach gives rise to option evaluation formulae in which the current prices of the zeros
predicted by the model are equal to the market prices. We develop this approach in the last
sections of this chapter.
13.4.1 Tree representation of the short-term rate

13.4.1.1 Recursive evaluation
Consider a two-period and two-state tree in which the current short-term rate is r. The devel-
opment of the short-term rate is uncertain. That is, the future short-term rate, r̃, is random,
and can take two values: either r+ with probability p, or r− with probability 1 − p. We assume
that r+ > r− . We emphasize p is the physical probability:
r+ ⇒ P (r+ , T )
p
ր
r
1−p ց
r− ⇒ P (r− , T )
Suppose, also, that two zeros with distinct maturities are available for trading. A money mar-
ket accounting technology is also available (MMA, in the sequel). Investing $1 in the MMA
generates $1·(1 + r) in the second period. We aim to derive an evaluation formula for the zero
based on the previous probabilistic model for the short-term rate dynamics. The general idea
is to build up a portfolio that contains one zero and the MMA. We shall make the value of this
portfolio in the second period replicate the value of the zero we wish to price. By no-arbitrage,
then, the value of the portfolio in the first period must equal the value of the zero we wish
to price, and we shall be done. The appendix develops the arguments, and shows that in the
absence of arbitrage, there is a constant λ, such that the following relation holds true:
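\[
E_p[P(\tilde{r}, T)] - (1 + r) P(r, T) =
\underbrace{\frac{\Delta P(\tilde{r}, T)}{\Delta \tilde{r}} \cdot \mathrm{Vol}(\tilde{r} - r)}_{=\ \text{volatility of the price}}
\cdot \underbrace{\lambda}_{=\ \text{unit risk premium}}, \tag{13.9}
\]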
where Vol(r̃ − r) = |r+ − r− |, and Ep [P (r̃, T )] denotes the expectation of the bond price under
the probability p.
As we explained in previous chapters, Eq. (13.9) is an APT relation. It says that the excess
return on the zero equals the volatility of its price multiplied by the unit price of risk. We call
the term,
\[
\frac{\Delta P(\tilde{r}, T)}{\Delta \tilde{r}} \cdot \mathrm{Vol}(\tilde{r} - r),
\]
“price volatility” because it measures the amplitude of the price variation due to changes in the
short-term rate in the future, $\frac{\Delta P(\tilde{r}, T)}{\Delta \tilde{r}}$, i.e. the “price-sensitivity”, where this price sensitivity is
normalized by the volatility of the short-term rate, Vol(r̃ − r).
Eq. (13.9) can now be cast in a format that we can use to make it more “operational”. After
rearranging terms, we obtain:
\[
P(r, T) = \frac{(p - \lambda) P(r^+, T) + [1 - (p - \lambda)] P(r^-, T)}{1 + r} = \frac{E_q[P(\tilde{r}, T)]}{1 + r} \tag{13.10}
\]
where q ≡ p − λ is the risk-neutral probability.
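The rearrangement leading from Eq. (13.9) to Eq. (13.10) can be spelled out as follows. The sketch assumes that the price sensitivity is the difference quotient across the two states, and uses Vol(r̃ − r) = |r+ − r−| together with r+ > r−.

```latex
\begin{align*}
\frac{\Delta P(\tilde{r},T)}{\Delta \tilde{r}} \cdot \mathrm{Vol}(\tilde{r}-r)
  &= \frac{P(r^+,T)-P(r^-,T)}{r^+-r^-}\,\lvert r^+-r^-\rvert
   = P(r^+,T)-P(r^-,T), \\
E_p[P(\tilde{r},T)]
  &= p\,P(r^+,T) + (1-p)\,P(r^-,T), \\
(1+r)\,P(r,T)
  &= E_p[P(\tilde{r},T)] - \lambda\,[P(r^+,T)-P(r^-,T)] \\
  &= (p-\lambda)\,P(r^+,T) + [1-(p-\lambda)]\,P(r^-,T).
\end{align*}
```

Dividing the last line by 1 + r gives the pricing formula, with the weight p − λ playing the role of a probability.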
A few considerations. We “expect” that λ < 0 because bond prices are decreasing in the
short-term rate here. Then, q ≡ p − λ > p (see footnote 3). Hence, the risk-neutral probability of an upward
movement of the short-term rate, q, is higher than the true probability, p. An investor who is long
a bond is concerned by an increase of the short-term rate in the future and, hence, “corrects”
the true probability p by assigning a higher risk-adjusted probability to the “upward” state.
Assume the current short-term rate equals 10%. We know that with (physical) probability p,
the short-term rate as of the next year will increase by 2 percentage points, and with probability
1 − p, it will decrease by 2 percentage points. Finally, with the same probability p, the short-
term rate prevailing from the next year to two years' time will increase by 2 further percentage
points from its previous value in one year's time. Suppose that the probability of an upward
movement is 20% and that the absolute value of the Sharpe ratio is 30%.
Risk-neutral probability
These data suffice to provide an estimate of the risk-neutral probability of an upward movement
of the short-term rate. We simply use the formula, q = p − λ, and obtain q = 20% − (−30%) =
50%.
Pricing zeros
Next, we can price, say, a zero maturing in two years. We can set up the following tree:
Today: r = 10%
    → 1 Year: 12% (probability q = 1/2)
        → 2 Years: 14%
        → 2 Years: 10%
    → 1 Year: 8%
        → 2 Years: 10%
        → 2 Years: 6%
Footnote 3. To be able to interpret q as a probability, we must have that (i) q ≡ p − λ > 0 ⇔ −λ > −p and (ii) q ≡ p − λ < 1 ⇔ −λ < 1 − p.
That is, −λ ∈ (−p, 1 − p).

We can use Eq. (13.10) to “fill-in” each node of the tree. We start from the end of the tree,
where the price of the two years zero is $1, and then use Eq. (13.10) to fill every node, as
illustrated below.
Today: P = 0.8267
    → 1 Year: P = 0.8928 = 1/1.12 (probability q = 1/2)
        → 2 Years: P = 1
        → 2 Years: P = 1
    → 1 Year: P = 0.9259 = 1/1.08
        → 2 Years: P = 1
        → 2 Years: P = 1
The price of the zero, in one year, is simply one divided by one plus the short-term rate prevailing
at the beginning of next year. The price we are looking for is obtained by applying Eq. (13.10), yielding,
\[
\frac{E_q[P(\tilde{r}, 2)]}{1 + r} = \frac{q P(r^+, 2) + (1 - q) P(r^-, 2)}{1 + r}
= \frac{\frac{1}{2}(0.8928) + \frac{1}{2}(0.9259)}{1.10} = 0.8267.
\]
Convexity effects
What is the discretely compounded two-years spot rate? Does it equal 10%? Why or why not?
The two-year spot rate, r (0, 2), satisfies,
\[
0.8267 = \frac{1}{[1 + r(0, 2)]^2} \iff r(0, 2) = \sqrt{\frac{1}{0.8267}} - 1 = 9.98\%.
\]
Even though r = 10% and Eq (r̃) = 10%, the two-year spot rate equals 9.98%. That
is,
\[
0.8267 = \frac{1}{1 + r} E_q\left[\frac{1}{1 + \tilde{r}}\right] > \frac{1}{1 + r} \cdot \frac{1}{1 + E_q(\tilde{r})} = 0.8264.
\]
Prices increase after activation of uncertainty. It’s a convexity effect similar to that we have
explained in Chapter 11 (Section 11.3.5.1, Figure 11.4).
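The convexity effect in this example is easy to verify numerically. The following is a minimal sketch under the assumptions of the example (annual steps, q = 1/2, rates of 12% and 8% next year); the variable names are introduced here for illustration only.

```python
q = 0.5                      # risk-neutral probability of an upward move
r0 = 0.10                    # current one-year rate
r_up, r_down = 0.12, 0.08    # one-year rates prevailing next year

# Prices, one year from now, of the zero maturing in two years.
p_up, p_down = 1 / (1 + r_up), 1 / (1 + r_down)

# Eq. (13.10): discounted risk-neutral expectation of next year's price.
price_2y = (q * p_up + (1 - q) * p_down) / (1 + r0)

# Discretely compounded two-year spot rate implied by that price.
spot_2y = price_2y ** (-0.5) - 1

# Price obtained by discounting at the expected short-term rate instead.
expected_rate = q * r_up + (1 - q) * r_down
price_at_expected_rate = 1 / ((1 + r0) * (1 + expected_rate))

print(f"price = {price_2y:.4f}, spot rate = {spot_2y:.2%}, "
      f"price at expected rate = {price_at_expected_rate:.4f}")
```

The gap between the two prices is Jensen's inequality at work: 1/(1 + r̃) is convex in r̃, so its expectation exceeds the same function evaluated at the expected rate.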
13.4.2 Tree pricing

We can simply generalize the tree to a multiperiod case. We use Eq. (13.10) to evaluate zeros
at all nodes of the tree and maturities. Given q, which can be estimated once we estimate p
and λ, we use recursively Eq. (13.10). Then, we may price options on zeros. The weakness of
the approach is that the initial term structure is predicted with error! Let us illustrate this
approach with a concrete numerical example. Consider the following tree, in which the current
short-term rate for one year is r = 4%.
t = 0: r = 4%
    → t = 1: r = 5%
        → t = 2: r = 6%, P = 1/1.06 = 0.9433 → t = 3: P = 1
        → t = 2: r = 4%, P = 1/1.04 = 0.9615 → t = 3: P = 1
    → t = 1: r = 3%
        → t = 2: r = 4%, P = 1/1.04 = 0.9615 → t = 3: P = 1
        → t = 2: r = 2%, P = 1/1.02 = 0.9804 → t = 3: P = 1
FIGURE 13.5. The dynamics of the short-term rate
At time t = 1, the short-term rate is either 5%, with probability p (the true probability) or
3%, with probability 1 − p. At time t = 2, the short-term rate behaves as follows:
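\[
\text{If at time } t = 1,\ r = 5\%, \text{ then, at time } t = 2,\quad
r = \begin{cases} 6\% & \text{with probability } p \\ 4\% & \text{with probability } 1 - p \end{cases}
\]
\[
\text{If at time } t = 1,\ r = 3\%, \text{ then, at time } t = 2,\quad
r = \begin{cases} 4\% & \text{with probability } p \\ 2\% & \text{with probability } 1 - p \end{cases}
\]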
Also shown in the previous diagram is the price of a hypothetical 3 Year zero, P , at time
t = 3 and at time t = 2. At time t = 3, the expiration date, P = 1 in all states of nature. At
time t = 2, the price P is P (r, T ) = Eq [P (r̃, T )]/ (1 + r) = 1/ (1 + r), for r = 6%, 4% and 2%.
The issue, now, is how to compute the price of the zero at the remaining
nodes. We should use the formula, P (r, T ) = Eq [P (r̃, T )]/ (1 + r) to populate the tree, but
we do not know p, λ, and q. Suppose we “estimate” p and λ. In this case, we compute q as
q = p − λ, as in Eq. (13.10). (For example, p = 20% and λ = −30%, so that q = 50%.) Suppose
that we come up with q = 1/2. Then, the following diagram gives the price of the zero at all the
nodes as of time t = 1, and at the evaluation time t = 0.

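t = 0: r = 4%, P = (1/2 · 0.9070 + 1/2 · 0.9427)/1.04 = 0.8893
    → t = 1: r = 5%, P = (1/2 · 0.9433 + 1/2 · 0.9615)/1.05 = 0.9070 (probability q = 1/2)
        → t = 2: P = 0.9433 → t = 3: P = 1
        → t = 2: P = 0.9615 → t = 3: P = 1
    → t = 1: r = 3%, P = (1/2 · 0.9615 + 1/2 · 0.9804)/1.03 = 0.9427
        → t = 2: P = 0.9615 → t = 3: P = 1
        → t = 2: P = 0.9804 → t = 3: P = 1
(At every node, the upper branch has probability q = 1/2.)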
So the price of the 3 Year zero equals 0.8893. Next, consider a European call option written
on the 3 Year zero, with expiration date equal to 2 and strike price K = 0.95. The following
diagram gives the value of the option predicted by the model at each node of the tree.
t = 0: r = 4%, C = (1/2 · 0.0055 + 1/2 · 0.0203)/1.04 = 0.0124
    → t = 1: r = 5%, C = (1/2 · 0 + 1/2 · 0.0115)/1.05 = 0.0055 (probability q = 1/2)
        → t = 2: P = 0.9433, K = 0.9500, C = max{P − K, 0} = 0
        → t = 2: P = 0.9615, K = 0.9500, C = max{P − K, 0} = 0.0115
    → t = 1: r = 3%, C = (1/2 · 0.0115 + 1/2 · 0.0304)/1.03 = 0.0203
        → t = 2: P = 0.9615, K = 0.9500, C = max{P − K, 0} = 0.0115
        → t = 2: P = 0.9804, K = 0.9500, C = max{P − K, 0} = 0.0304
(At every node, the upper branch has probability q = 1/2.)
The model predicts that the current price of the call option is 0.0124.
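The recursion used to fill the two trees can be written as a short backward-induction routine. This is a sketch only: the function name and the convention of indexing each date's nodes by the number of up-moves j are assumptions made for the illustration, and prices are not rounded at intermediate steps as they are in the text.

```python
def backward_induction(rates, terminal_values, q):
    """Roll values back through a recombining tree with Eq. (13.10).

    rates[t][j] is the short-term rate at time t after j up-moves;
    terminal_values[j] is the claim's value at time len(rates) after j up-moves.
    Returns the list of value layers, from t = 0 up to the terminal date.
    """
    layers = [list(terminal_values)]
    for layer_rates in reversed(rates):
        nxt = layers[0]
        layer = [
            (q * nxt[j + 1] + (1 - q) * nxt[j]) / (1 + r)
            for j, r in enumerate(layer_rates)
        ]
        layers.insert(0, layer)
    return layers


# Short-term rate tree of Figure 13.5; within each date, index j counts up-moves.
rates = [[0.04], [0.03, 0.05], [0.02, 0.04, 0.06]]
q = 0.5

# 3 Year zero: pays 1 at t = 3 in every state.
zero = backward_induction(rates, [1.0] * 4, q)

# European call on the zero, expiring at t = 2, with strike 0.95.
strike = 0.95
payoff = [max(p - strike, 0.0) for p in zero[2]]
call = backward_induction(rates[:2], payoff, q)

print(f"zero price = {zero[0][0]:.4f}, call price = {call[0][0]:.4f}")
```

Because the option expires at t = 2, its tree is simply the first two layers of the rate tree, with the option payoff replacing the bond's face value as the terminal condition.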
13.4.2.1 Calibration
The model we are dealing with predicts that the price of the 3 Year zero is equal to 0.8893.
However, there is no guarantee that this model-implied price equals the market price of the 3
Year zero. Suppose, instead, that the market price of the 3 Year zero, P$ say, equals 0.8700.
What should we do to make the model-implied price of the 3 Year zero equal to the market
price? The question is important: how can we trust an option pricing model that is not even
able to pin down the initial market value of the underlying zero?
To make the model-implied price of the 3 Year zero equal to the market price, P$ = 0.8700,
we cannot take the risk-neutral probability q as given, i.e. independent of the observed price
P$ = 0.8700, as we did before. Rather, we should calibrate the probability q, as follows,
\[
P_\$ = 0.8700 = \frac{1}{1.04} \left[ q \cdot P_1(5\%) + (1 - q) \cdot P_1(3\%) \right] \tag{13.11}
\]
where P1 (5%) and P1 (3%) are the prices of the zero at time t = 1, in the events that the
short-term rate is up to 5% or down to 3%.
The previous equation follows, again, by Eq. (13.10). But here, the unknown is not the
price, which is instead given by the market price. Rather, we are looking for, or calibrating,
the probability q that makes the RHS of Eq. (13.11) equal to its LHS. Naturally, we need to
compute the prices of the zeros P1 (5%) and P1 (3%). These prices can be found by another
application of Eq. (13.10), as follows,
\[
P_1(5\%) = \frac{q \cdot 0.9433 + (1 - q) \cdot 0.9615}{1.05}, \qquad
P_1(3\%) = \frac{q \cdot 0.9615 + (1 - q) \cdot 0.9804}{1.03}.
\]
By replacing the previous expressions for P1 (5%) and P1 (3%) into Eq. (13.11), we obtain,

\[
P_\$ = 0.8700 = \frac{1}{1.04} \left[ q \cdot \frac{q \cdot 0.9433 + (1 - q) \cdot 0.9615}{1.05}
+ (1 - q) \cdot \frac{q \cdot 0.9615 + (1 - q) \cdot 0.9804}{1.03} \right].
\]
This is a quadratic equation in q, which we can easily solve to obtain q = 0.8779. Hence, we
find:
P1 (5%) = 0.9005 and P1 (3%) = 0.9357.
The next diagram depicts the implied binomial tree, i.e. the tree that results after matching
the model-implied price of the 3 Year zero to the market price, P$ = 0.8700.
t = 0: r = 4%, P$ = 0.8700 = [q P1 (5%) + (1 − q) P1 (3%)]/1.04
    → t = 1: r = 5%, P1 (5%) = [q · 0.9433 + (1 − q) · 0.9615]/1.05 = 0.9005 (probability q = 0.8779)
        → t = 2: P = 0.9433 → t = 3: P = 1
        → t = 2: P = 0.9615 → t = 3: P = 1
    → t = 1: r = 3%, P1 (3%) = [q · 0.9615 + (1 − q) · 0.9804]/1.03 = 0.9357
        → t = 2: P = 0.9615 → t = 3: P = 1
        → t = 2: P = 0.9804 → t = 3: P = 1
(At every node, the upper branch has probability q = 0.8779.)
Note how different P1 (5%) and P1 (3%) are from those we found earlier by imposing that
q = 1/2. In the “implied” tree, they are smaller than those obtained with q = 1/2, state by state.
This is because in the implied tree, q = 0.8779. The implied tree puts more weight on those
states of nature in which the short-term rate is high or, equivalently, bond prices are low. We
expect the price of the option on the implied binomial tree to be different from (lower than)
the one we found earlier.
So let’s do the computations by utilizing the implied binomial tree:
t = 0: r = 4%, C = (q · 0.0013 + (1 − q) · 0.0134)/1.04 = 0.0026
    → t = 1: r = 5%, C = [q · 0 + (1 − q) · 0.0115]/1.05 = 0.0013 (probability q = 0.8779)
        → t = 2: P = 0.9433, K = 0.9500, C = max{P − K, 0} = 0
        → t = 2: P = 0.9615, K = 0.9500, C = max{P − K, 0} = 0.0115
    → t = 1: r = 3%, C = [q · 0.0115 + (1 − q) · 0.0304]/1.03 = 0.0134
        → t = 2: P = 0.9615, K = 0.9500, C = max{P − K, 0} = 0.0115
        → t = 2: P = 0.9804, K = 0.9500, C = max{P − K, 0} = 0.0304
(At every node, the upper branch has probability q = 0.8779.)
The computations in the previous diagram reveal that the option price predicted by the
implied binomial tree is 0.0026, which is roughly one fifth of the option price
we found earlier, 0.0124! The interpretation for this result is, again, related to the implied risk-
neutral probability, which is much larger than q = 1/2. The implied tree puts a relatively large
weight on the events in which the short-term rate is high or bond prices are low, which makes
the option price so small.
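The calibration step and the repricing of the option can be sketched in code. The example below takes the rounded t = 2 bond prices quoted in the text as inputs and solves Eq. (13.11) for q by bisection, which is one convenient choice rather than the method used in the text; the function names are hypothetical. Since intermediate values are not rounded here, the option price may differ from the figure in the diagram at the fourth decimal.

```python
# Prices of the 3 Year zero at t = 2, as quoted in the text (r = 6%, 4%, 2%).
P_HIGH, P_MID, P_LOW = 0.9433, 0.9615, 0.9804


def roll_back(q, v_high, v_mid, v_low):
    """Discount t = 2 values back to t = 0 along the tree of Figure 13.5."""
    v_up = (q * v_high + (1 - q) * v_mid) / 1.05     # node with r = 5%
    v_down = (q * v_mid + (1 - q) * v_low) / 1.03    # node with r = 3%
    return (q * v_up + (1 - q) * v_down) / 1.04      # node with r = 4%


def model_price(q):
    """Model-implied price of the 3 Year zero as a function of q."""
    return roll_back(q, P_HIGH, P_MID, P_LOW)


def calibrate_q(market_price, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection on [0, 1]; relies on model_price being decreasing in q."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model_price(mid) > market_price:
            lo = mid   # model price too high: put more weight on high rates
        else:
            hi = mid
    return 0.5 * (lo + hi)


def call_price(q, strike=0.95):
    """European call on the zero, expiring at t = 2."""
    payoffs = [max(p - strike, 0.0) for p in (P_HIGH, P_MID, P_LOW)]
    return roll_back(q, *payoffs)


q_implied = calibrate_q(0.8700)
print(f"implied q = {q_implied:.4f}")
print(f"call with q = 1/2: {call_price(0.5):.4f}")
print(f"call with implied q: {call_price(q_implied):.4f}")
```

The same `roll_back` routine prices both the bond and the option, which makes explicit that the only thing calibration changes is the probability used in the recursion.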
13.4.2.2 Another zero
We are not done. Let us go back to the zero pricing problem, and suppose that we observe the
price of a 2 Year zero, and that this price equals 0.9200, a quite reasonable figure. Is there any
chance that the inputs to the pricing problem related to the 3 Year zero are such that we can
“fit” the 2 Year zero as well? The answer is, of course, no. There is no reason why the
inputs utilized to fit the price of the 3 Year zero should also fit the price of the 2 Year
zero. The 2 Year zero is quite a different asset! Indeed, in the next diagram, we use the inputs
to the pricing problem related to the 3 Year zero, and Eq. (13.10), and find that the price of
the 2 Year zero implied by the price of the 3 Year zero is equal to 0.9178. Unless the market
price happens, by chance, to equal 0.9178, we cannot simultaneously fit the price of the 3 Year
and the 2 Year zeros.

t = 0: P = [q · 0.9523 + (1 − q) · 0.9709]/1.04 = 0.9178
    → t = 1: r = 5%, P1,1 (5%) = 1/1.05 = 0.9523 (probability q = 0.8779) → t = 2: P = 1
    → t = 1: r = 3%, P1,1 (3%) = 1/1.03 = 0.9709 → t = 2: P = 1
To simultaneously fit the price of the 3 Year and the 2 Year zeros, we should implement at
least one of the two strategies: (i) To make the probabilities q time-varying; (ii) To calibrate the
entire structure of the short-term movements in Figure 13.5 and fit the initial term-structure
of market prices. We implement the first of these two strategies in the next subsection. We
develop the second strategy in Section 13.5.
13.4.2.3 Implementing implied binomial trees
We now build up the implied binomial tree in the general case, i.e. when we have several bond
prices to match. Suppose the time interval is six months, so that the short-term rate is for six
months. The current short-term rate is 3.99%, annualized. It can change to either 4.50% or to
4.00%, with equal (physical) probability. Suppose that two zeros are available for trading: a 6M
zero and a 1Y zero, where the current price of the 1Y zero is 0.95974. What is the risk-neutral
probability implied by this tree? This probability must be such that the prices of all the zeros
are matched exactly.
The tree we face is depicted below.
t = 0: r = 3.99%/2
    → t = 0.5: r = 4.50%/2 (probability p = 1/2)
    → t = 0.5: r = 4.00%/2
FIGURE 13.6. The dynamics of the short-term rate: high interest rate scenario
In this tree, p = 1/2 denotes the physical probability. Naturally, the price of a 6M zero at
t = 0 equals $P_\$(0, 0.5) = 1 / \left(1 + \frac{0.0399}{2}\right) = 0.9804$. This price is actually observed. That is, the
current short-term rate, 3.99%, is a mere definition. Next, we proceed to find the no-arbitrage
movements of the 1Y zero, which are displayed below.
t = 0: r = 3.99%/2, P$ (0, 1) = 0.95974
    → t = 0.5: r = 4.50%/2, P (0.5, 1) = 1/(1 + 0.045/2) = 0.9779 (probability p = 1/2) → t = 1: £1
    → t = 0.5: r = 4.00%/2, P (0.5, 1) = 1/(1 + 0.040/2) = 0.9804 → t = 1: £1
Note, the current market price, P$ (0, 1) = 0.95974, is less than the expected price to prevail
tomorrow, discounted at the current interest rate,
\[
\frac{1}{1 + r} E_p[P(0.5, 1)] = \frac{1}{1 + \frac{0.0399}{2}} \left( \frac{1}{2} \cdot 0.9779 + \frac{1}{2} \cdot 0.9804 \right) = 0.9599.
\]
Hence, p = 1/2 cannot be the risk-neutral probability. To find out the risk-neutral probability,
we proceed as follows. In the absence of arbitrage opportunities,
\begin{align*}
P_\$(0, 1) = 0.95974
&= \frac{1}{1 + r} \left[ q P_{up}(0.5, 1) + (1 - q) P_{down}(0.5, 1) \right] \\
&= \frac{1}{1 + \frac{0.0399}{2}} \left[ q \cdot 0.9779 + (1 - q) \cdot 0.9804 \right]
\end{align*}
with obvious notation. This is one equation with one unknown, q. The solution for q is, q = 0.605.
We may now proceed with pricing derivatives. Consider a call option on the 1Y zero, with
expiration date in six months and exercise price equal to 0.9785. Its payoff is as depicted below:
t = 0: r = 3.99%/2, C = ?
    → t = 0.5: P (0.5, 1) = 0.9779, C = max{P (0.5, 1) − K, 0} = 0 (probability q = 0.605) → t = 1: £1
    → t = 0.5: P (0.5, 1) = 0.9804, C = max{P (0.5, 1) − K, 0} = 0.0019 → t = 1: £1
So the option price is, by risk-neutral evaluation,
\[
C = \frac{1}{1 + \frac{0.0399}{2}} \left[ q \cdot 0 + (1 - q) \cdot 0.0019 \right] = 0.9804 \left[ 0.395 \cdot 0.0019 \right] = 7.3579 \times 10^{-4}. \tag{13.12}
\]
What happens when the short-term rate does not evolve as in the diagram of Figure 13.6
but, instead, as in Figure 13.7?
t = 0: r = 3.99%/2
    → t = 0.5: r = 4.4154%/2
    → t = 0.5: r = 4.00%/2
FIGURE 13.7. The dynamics of the short-term rate: low interest rate scenario
The previous tree is one in which the short-term rate in the upper state of the world equals
r = 4.4154%, not 4.50% as in Figure 13.6. It implies that:
\[
P_{up}(0.5, 1) = \frac{1}{1 + \frac{r}{2}} = \frac{1}{1 + \frac{4.4154\%}{2}} = 0.9784.
\]
Then, the risk-neutral probability, q, solves the following pricing equation,
\begin{align*}
P_\$(0, 1) = 0.95974
&= \frac{1}{1 + r} \left[ q P_{up}(0.5, 1) + (1 - q) P_{down}(0.5, 1) \right] \\
&= \frac{1}{1 + \frac{0.0399}{2}} \left[ q \cdot 0.9784 + (1 - q) \cdot 0.9804 \right].
\end{align*}
The solution is, q = 0.756, which is larger than the solution we found earlier using the tree in
Figure 13.6 (i.e., q = 0.605). The option price is, now,
\[
C = \frac{1}{1 + \frac{0.0399}{2}} \left[ q \cdot 0 + (1 - q) \cdot 0.0019 \right] = 0.9804 \left[ 0.244 \cdot 0.0019 \right] = 4.5451 \times 10^{-4}.
\]
Why is this price smaller than that computed in Eq. (13.12)? In the tree of Figure 13.7, the
up-state of the world is, so to speak, less severe than the up-state of the world in the tree of
Figure 13.6. To be able to match the initial price P$ (0, 1) = 0.95974, the model in Figure 13.7
must put more weight on the up-state of the world, i.e. a larger implied risk-neutral probability.
This implies a larger risk-neutral probability that low bond prices will arise in the future and,
hence, a lower option price.
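The comparison between the two views can be reproduced with a few lines of code. This is a minimal sketch: the one-period pricing equation is linear in q, so it is solved in closed form, and the t = 0.5 bond prices are entered rounded to four decimals as in the text (the results are sensitive to that rounding). The function names are hypothetical.

```python
def implied_q(market_price, r0, p_up, p_down, dt=0.5):
    """Risk-neutral probability of the up state that reprices the zero.

    Solves market_price = [q * p_up + (1 - q) * p_down] / (1 + r0 * dt) for q.
    """
    forward_value = market_price * (1 + r0 * dt)
    return (p_down - forward_value) / (p_down - p_up)


def call_value(q, r0, p_up, p_down, strike, dt=0.5):
    """One-period call on the zero, priced by risk-neutral evaluation."""
    payoff_up = max(p_up - strike, 0.0)
    payoff_down = max(p_down - strike, 0.0)
    return (q * payoff_up + (1 - q) * payoff_down) / (1 + r0 * dt)


market_price, r0, strike = 0.95974, 0.0399, 0.9785

# Prices of the 1Y zero at t = 0.5 in the (up, down) states, under each view.
views = {"Figure 13.6": (0.9779, 0.9804), "Figure 13.7": (0.9784, 0.9804)}

results = {}
for name, (p_up, p_down) in views.items():
    q = implied_q(market_price, r0, p_up, p_down)
    results[name] = (q, call_value(q, r0, p_up, p_down, strike))
    print(f"{name}: q = {q:.3f}, call = {results[name][1]:.4e}")
```

Both views reprice the same 1Y zero by construction; what differs is how the implied probability mass is split between the two states, and hence the option value.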
In a segmented market, two investment banks might have different views about the evolution
of the short-term rate (the view in Figure 13.6 and the view in Figure 13.7). The first bank
favours a “high” interest rate scenario, but it is not too risk-averse to that scenario (rup = 4.5%,
q = 0.605). The second bank favours a “mild” interest rate scenario, but it is more risk-
averse to that scenario (rup = 4.4154%, q = 0.756). But then, naturally, both institutions
need to agree on the initial bond price, P$ (0, 1) = 0.95974. The segmentation could arise, for
example, because the clientèle of the first bank and that of the second bank are unlikely to
meet and the prices charged by the banks are not publicly known. In the absence of market
imperfections (and arbitrage), however, the investment banks should agree on the option price
too.
Next, let us add another period to the diagram in Figure 13.6, assuming that the short-term rate
is as in the following diagram:
t = 0: r = 3.99%/2
    → t = 0.5: r = 4.50%/2 (probability q0 = 0.605)
        → t = 1: r = 4.90%/2 (probability q1 = ?)
        → t = 1: r = 4.30%/2
    → t = 0.5: r = 4.00%/2
        → t = 1: r = 4.30%/2 (probability q1 = ?)
        → t = 1: r = 3.90%/2
FIGURE 13.8.
In this tree, q0 is the risk-neutral probability for the first period, and q1 is the risk-neutral
probability for the second period.
We already know that q0 = 0.605. The probability q1 is the risk-neutral probability for the
time-period (0.5, 1), and can be different from q0 . Suppose, also, that an additional zero is
available for trading, a 1.5Y zero. The current price of the 1.5Y zero is P$ (0, 1.5) = 0.9382.
To derive the risk-neutral probability q1 , we proceed as follows. First, we consider the tree
below.
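t = 0: r = 3.99%/2, P$ (0, 1.5) = 0.9382
    → t = 0.5: r = 4.50%/2, PU (0.5, 1.5) = ? (probability q0 = 0.605)
        → t = 1: r = 4.90%/2, P (1, 1.5) = 1/(1 + 0.049/2) = 0.9761 (probability q1 = ?) → t = 1.5: £1
        → t = 1: r = 4.30%/2, P (1, 1.5) = 1/(1 + 0.043/2) = 0.9789 → t = 1.5: £1
    → t = 0.5: r = 4.00%/2, PD (0.5, 1.5) = ?
        → t = 1: r = 4.30%/2, P (1, 1.5) = 1/(1 + 0.043/2) = 0.9789 (probability q1 = ?) → t = 1.5: £1
        → t = 1: r = 3.90%/2, P (1, 1.5) = 1/(1 + 0.039/2) = 0.9808 → t = 1.5: £1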
We need to compute the prices PU (0.5, 1.5) and PD (0.5, 1.5). Once we compute these prices,
we shall use the no-arbitrage property of the zero, and the previously computed q0 = 0.605, to
recover q1 . By the usual no-arbitrage property of the zero, we have that:
\[
P_U(0.5, 1.5) = \frac{1}{1 + \frac{0.045}{2}} \left[ q_1 \cdot 0.9761 + (1 - q_1) \cdot 0.9789 \right] \tag{13.13}
\]
\[
P_D(0.5, 1.5) = \frac{1}{1 + \frac{0.040}{2}} \left[ q_1 \cdot 0.9789 + (1 - q_1) \cdot 0.9808 \right] \tag{13.14}
\]
The problem is that q1 is not known. Therefore, Eqs. (13.13)-(13.14) do not allow us to pin down
the prices PU (0.5, 1.5) and PD (0.5, 1.5). But here is where calibration comes in! We know the
current price of the 1.5Y zero, which is, P$ (0, 1.5) = 0.9382. In the absence of arbitrage,
\[
P_\$(0, 1.5) = 0.9382 = \frac{1}{1 + \frac{0.0399}{2}} \left[ q_0 \cdot P_U(0.5, 1.5) + (1 - q_0) \cdot P_D(0.5, 1.5) \right],
\]
where PU (0.5, 1.5) and PD (0.5, 1.5) are as in Eqs. (13.13)-(13.14), and where q0 = 0.605. So we
have,
\[
0.9382 = \frac{1}{1 + \frac{0.0399}{2}} \left[ 0.605 \cdot P_U(0.5, 1.5) + 0.395 \cdot P_D(0.5, 1.5) \right], \tag{13.15}
\]
where PU (0.5, 1.5) and PD (0.5, 1.5) are as in Eqs. (13.13)-(13.14). Hence, replacing Eqs.
(13.13)-(13.14) into Eq. (13.15) leaves one equation with exactly one unknown, q1 . Solving
yields q1 = 0.8412, which implies that,
PU (0.5, 1.5) = 0.9549, PD (0.5, 1.5) = 0.9600.

So, to sum up, we have the tree below.
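t = 0: r = 3.99%/2, P$ (0, 1.5) = 0.9382
    → t = 0.5: r = 4.50%/2, PU (0.5, 1.5) = 0.9549 (probability q0 = 0.605)
        → t = 1: r = 4.90%/2, P (1, 1.5) = 0.9761 (probability q1 = 0.8418) → t = 1.5: £1
        → t = 1: r = 4.30%/2, P (1, 1.5) = 0.9789 → t = 1.5: £1
    → t = 0.5: r = 4.00%/2, PD (0.5, 1.5) = 0.9600
        → t = 1: r = 4.30%/2, P (1, 1.5) = 0.9789 (probability q1 = 0.8418) → t = 1.5: £1
        → t = 1: r = 3.90%/2, P (1, 1.5) = 0.9808 → t = 1.5: £1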
We are now ready to compute the no-arbitrage price of a call option on the 1.5Y zero, with
expiration date in 1Y and exercise price equal to 0.9800. The price of the option at time t = 0.5
is C = 0 in the up state and C = 0.00012 in the down state, as illustrated below.
t = 0.5, up state: C = 0
    → t = 1: P (1, 1.5) = 0.9761, C = max{P (1, 1.5) − K, 0} = 0 (probability q1 = 0.8418)
    → t = 1: P (1, 1.5) = 0.9789, C = max{P (1, 1.5) − K, 0} = 0
t = 0.5, down state: C = [q1 · 0 + (1 − q1 ) · 0.0008]/(1 + 0.04/2) = 0.00012
    → t = 1: P (1, 1.5) = 0.9789, C = max{P (1, 1.5) − K, 0} = 0 (probability q1 = 0.8418)
    → t = 1: P (1, 1.5) = 0.9808, C = max{P (1, 1.5) − K, 0} = 0.0008
We can now calculate the no-arbitrage price of the 1Y call option on the 1.5Y zero, struck at
K = 0.9800. It is,
\[
C = \frac{1}{1 + \frac{0.0399}{2}} \left[ 0 \cdot q_0 + 0.00012 \cdot (1 - q_0) \right] = 0.9804 \left[ 0.00012 \cdot (1 - 0.605) \right] = 4.647 \times 10^{-5}.
\]
We can use Figure 13.8 to price derivatives, such as, say, a call option on the 1.5Y zero, with
expiration date in six months, and exercise price equal to 0.9580. We have the following tree.
t = 0: r = 3.99%/2, C = ?
    → t = 0.5: PU (0.5, 1.5) = 0.9549, C = max{PU (0.5, 1.5) − K, 0} = 0 (probability q0 = 0.605)
    → t = 0.5: PD (0.5, 1.5) = 0.9600, C = max{PD (0.5, 1.5) − K, 0} = 0.0020
Therefore, the no-arbitrage price of the option is,
\[
C = \frac{1}{1 + \frac{0.0399}{2}} \left[ q_0 \cdot 0 + (1 - q_0) \cdot 0.0020 \right] = 0.9804 \left[ 0.395 \cdot 0.0020 \right] = 7.745 \times 10^{-4}.
\]
13.4.2.4 Summing up
So let’s sum up what we’ve done. Given is the “evolution” of the short-term rate in Figure 13.8,
which we use to recover the two risk-neutral probabilities q0 (for the time span (0, 0.5)) and q1
(for the time span (0.5, 1)), starting from the knowledge of the market prices of two zeros, the
1Y zero and the 1.5Y zero. Precisely, given P$ (0, 1), the price of the 1Y zero, we recover q0 , as
illustrated below:
t = 0: P$ (0, 1)
    → t = 0.5: PU (0.5, 1) (probability q0 ) → t = 1: £1
    → t = 0.5: PD (0.5, 1) → t = 1: £1
This is possible as PU (0.5, 1) and PD (0.5, 1) do not “depend” on q0 and so they are obtained
in a straightforward manner. Given q0 , then, we compute q1 , using P$ (0, 1.5), the price of the
1.5Y zero, as illustrated below:

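t = 0: P$ (0, 1.5)
    → t = 0.5: PU (0.5, 1.5) (probability q̂0 )
        → t = 1: PUU (1, 1.5) (probability q1 ) → t = 1.5: £1
        → t = 1: PUD (1, 1.5) → t = 1.5: £1
    → t = 0.5: PD (0.5, 1.5)
        → t = 1: PUD (1, 1.5) (probability q1 ) → t = 1.5: £1
        → t = 1: PDD (1, 1.5) → t = 1.5: £1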
Again, the risk-neutral probability, q1 , can be recovered because PU U (1, 1.5), PU D (1, 1.5) and
PDD (1, 1.5) do not “depend” on q1 , and are easily obtained. So, given PU U (1, 1.5), PUD (1, 1.5)
and PDD (1, 1.5), we can express PU (0.5, 1.5) and PD (0.5, 1.5) as two (linear) functions of q1 .
Finally, we impose the no-arbitrage property to P$ (0, 1.5), which makes the observed price,
P$ (0, 1.5), a (linear) function of PU (0.5, 1.5) and PD (0.5, 1.5) and, hence, q1 , thereby allowing
us to “recover” q1 .
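The two-step procedure just described can be sketched as follows. The code is illustrative only: it takes the rounded node prices quoted in the text as inputs, and exploits the fact that each pricing equation is affine in the unknown probability, so that evaluating it at 0 and at 1 is enough to solve it. All names are hypothetical. Because the pricing equation is nearly flat in q1 and the inputs are rounded to four decimals, the calibrated q1 is sensitive to rounding and may differ from the figure quoted in the text at the second decimal.

```python
DT = 0.5
R0, R_U, R_D = 0.0399, 0.045, 0.040                # annualized short-term rates
P1Y = {"U": 0.9779, "D": 0.9804}                   # P(0.5, 1), as in the text
P15Y = {"UU": 0.9761, "UD": 0.9789, "DD": 0.9808}  # P(1, 1.5), as in the text


def discount(rate):
    """One-period (six-month) discount factor."""
    return 1.0 / (1.0 + rate * DT)


def solve_affine(f, target):
    """Solve f(x) = target for a function f that is affine in x."""
    f0, f1 = f(0.0), f(1.0)
    return (target - f0) / (f1 - f0)


def price_1y(q0):
    """Model price of the 1Y zero, given q0."""
    return discount(R0) * (q0 * P1Y["U"] + (1 - q0) * P1Y["D"])


def price_15y(q1, q0):
    """Model price of the 1.5Y zero, given q1 and an already calibrated q0."""
    p_u = discount(R_U) * (q1 * P15Y["UU"] + (1 - q1) * P15Y["UD"])
    p_d = discount(R_D) * (q1 * P15Y["UD"] + (1 - q1) * P15Y["DD"])
    return discount(R0) * (q0 * p_u + (1 - q0) * p_d)


# Step 1: the 1Y zero pins down q0. Step 2: given q0, the 1.5Y zero pins down q1.
q0 = solve_affine(price_1y, 0.95974)
q1 = solve_affine(lambda q: price_15y(q, q0), 0.9382)
print(f"q0 = {q0:.3f}, q1 = {q1:.3f}")
```

Adding a further period follows the same pattern: each new market price is an affine function of the one probability not yet calibrated, given all the earlier ones.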
We can continue, and consider an additional time period, as in the tree in Figure 13.9 below.
We can recover q2 , once we are given the market price of a 2Y zero, P$ (0, 2), as follows:
• The prices of the 2Y zero at time t = 1.5 (the filled nodes in Figure 13.9) (say P (1.5, 2))
are easily computed, given an assumption about the numerical values of the short-term
rate in those nodes.
• Then, given the prices P (1.5, 2) at time t = 1.5, and the previously calibrated probabilities
q̂0 and q̂1 , we can express the current market price P$ (0, 2) as a (linear) function of q2 .
Then, we solve for q2 .
[Figure 13.9: tree for the 2Y zero over t = 0, 0.5, 1, 1.5, 2, starting from the current price P$ (0, 2) and paying £1 at every node at t = 2; the branches of the successive periods are labelled q̂0 , q̂1 and q2 .]
FIGURE 13.9.
The calibration can continue. We extend the tree to one period more. Then, we use the
price of one additional zero to “recover” time varying risk-neutral probabilities. An alternative
procedure consists in: (i) fixing the risk-neutral probabilities q to some value at all times (e.g.,
q = 1/2), and (ii) figuring out the “implied” values for the short-term rate in each node of the
tree. The next section develops a systematic approach for implementing this procedure.
13.5 The Ho and Lee model

Ho and Lee (1986) introduced a revolutionary approach to modeling yield curve movements.
Their model is not about an economic theory of determination of the yield curve. Rather,
their approach is to take the yield curve as given, and then to model the movements of the
entire yield curve in order to price interest rate derivatives. As explained in Chapter 11 and in
the previous section, we need to “match” prices, to avoid having derivatives with underlyings
deviating from market prices. In Chapter 11, we discussed the Ho and Lee model in continuous
time, as this allowed to illustrate the general methodology underlying the HJM approach. The
original formulation of the model was, however, in discrete time. In this section, we present the
discrete time version of the model and some of its extensions, as well as the general philosophy
underlying matching the initial yield curve within a discrete time framework, which represents
indeed the industry practice.
The main idea underlying the Ho and Lee model is to model the movements of the yield
curve along a binomial tree, much in the spirit of the Cox, Ross and Rubinstein (1979) tree
representation of the Black and Scholes (1973) model. The main issues can be summarized as
follows. In Black and Scholes (1973) and Cox, Ross and Rubinstein (1979), the asset underlying
the option contract is a traded risk. So the underlying asset price satisfies the martingale
condition. Interest rate derivatives, instead, generally depend on non-traded risks. The mere
presence of boundary conditions induces bond return volatility to be time-varying.
13.5.1 The tree

The price of any zero evolves randomly over time, according to a binomial tree. Let Pj (t, T )
be the price of a pure discount bond as of time t, with time to maturity T − t, after j upstate
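price movements. Let j ∼ B (t, q) be a binomial random variable, with
\[ E(j) = tq, \qquad \mathrm{Var}(j) = tq\,(1-q), \]
where q is the risk-neutral probability of a single upstate movement. Therefore we have,
\[
P_j(t,T) \;\longrightarrow\;
\begin{cases}
P_{j+1}(t+1,T) & \text{with probability } q \\
P_j(t+1,T) & \text{with probability } 1-q
\end{cases}
\]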
That is, if at time t, the number of upstate movements is equal to j then, at time t + 1, the
number of upstate movements can either jump to j + 1, with probability q, or stay at j, with
probability 1 − q. Note also that after one period, the price of any zero is one period closer to
maturity. At maturity, t = T , the price of any zero is worth one unit of numéraire, viz
Pj (T, T ) = 1, for all j and T.
Note that, in the previous tree, it need not hold that Pj (t + 1, T ) < Pj (t, T ). On
the contrary, we would expect that, especially as maturity approaches, Pj (t + 1, T ) >
Pj (t, T ), as the price of the zero needs to converge to par.
13.5.2 The price movements and the martingale restriction

In the absence of arbitrage opportunities, the expected return on the zero at t must equal the
short-term rate, viz Pj (t, T ) = e^{−rj (t)} Eq (P· (t + 1, T )), or
\[ P_j(t,T) = P_j(t,t+1)\,\left[\,q\,P_{j+1}(t+1,T) + (1-q)\,P_j(t+1,T)\,\right], \tag{13.16} \]
where Pj (t, t + 1) = e^{−rj (t)} , and rj (t) is the continuously compounded short-term rate at time
t after j upward movements. We call this condition the martingale restriction.
Let us introduce notation for the movements of the price of any zero along the tree,
\[
\underbrace{\frac{P_{j+1}(t+1,T)}{P_j(t,T)} = \frac{1}{P_j(t,t+1)}\,u(T-t)}_{\text{up at } t}
\quad\text{and}\quad
\underbrace{\frac{P_j(t+1,T)}{P_j(t,T)} = \frac{1}{P_j(t,t+1)}\,d(T-t)}_{\text{down at } t}.
\tag{13.17}
\]
The two functions u (·) and d (·), also called “perturbation functions,” capture the fact that
in the case of uncertainty, the price of the zero can either go up or down with respect to the
risk-free rate of return. In other words, Eqs. (13.17) tell us that the discounted gross return from
going long a bond is:
\[
\underbrace{\frac{P(t+1,T)}{P(t,T)}}_{\text{Gross return}} \cdot \underbrace{P(t,t+1)}_{\text{Discount}} =
\begin{cases}
u(T-t) & \text{with probability } q \\
d(T-t) & \text{with probability } 1-q
\end{cases}
\]
where the two functions u (T − t) and d (T − t) have to be determined endogenously. If there
were no uncertainty, we would have u (T − t) = d (T − t) = 1, for all t ≤ T . In general, we have
that d (T − t) ≤ 1 ≤ u (T − t), as we shall now demonstrate.
One period before the expiration date, i.e. at t = T − 1, our price is certain to jump to one,
with jump size equal to the short-term rate rj (t). Hence, the following boundary condition for
the two functions u (·) and d (·) holds:
u (1) = d (1) = 1. (13.18)
In terms of the two functions u (·) and d (·), the martingale restriction in Eq. (13.16) is,
1 = qu (T − t) + (1 − q) d (T − t) , t ≤ T. (13.19)
This relation is quite familiar as it matches the standard risk-neutral relation for stock prices
in which the short-term rate is tied down to the up and down movements of the stock price.
However, in this context the up and down movements of the zero price depend on the time to maturity
of the zero itself through the two functions u (T − t) and d (T − t), which makes the evaluation
problem more intricate.
13.5.3 The recombining condition

Ho and Lee consider a recombining tree: the price Pj (t, T ) we are looking for depends only
on j, not on the exact sequence of up and down movements leading to j upstate movements.
To summarize, we are looking for two functions u (T − t) and d (T − t) such that (i) the no-
arbitrage condition in Eq. (13.19) holds true and (ii) the tree is recombining. We now elaborate
the arguments that lead to the recombining property of the tree.
                                         Pj+2 (t + 2, T )
                                      ր
                  Pj+1 (t + 1, T )
               ր                      ց
Pj (t, T )                               Pj+1 (t + 2, T )
               ց                      ր
                  Pj (t + 1, T )
                                      ց
                                         Pj (t + 2, T )
The recombining property of the tree implies that the bond price at time t + 2 in the event
of j + 1 jumps, i.e. Pj+1 (t + 2, T ), can be generated by one of the two paths:
(i) The path Pj (t, T ) → Pj+1 (t + 1, T ) → Pj+1 (t + 2, T ) → “up & down”

(ii) The path Pj (t, T ) → Pj (t + 1, T ) → Pj+1 (t + 2, T ) → “down & up”
We can use the two relations in Eqs. (13.17), to figure out the two paths leading to the bond
price at time t + 2 in the event of j + 1 jumps, i.e. Pj+1 (t + 2, T ). We have that along the first
path,
\[
\underbrace{\frac{P_{j+1}(t+1,T)}{P_j(t,T)} = \frac{1}{P_j(t,t+1)}\,u(T-t)}_{\text{up at } t},
\qquad
\underbrace{\frac{P_{j+1}(t+2,T)}{P_{j+1}(t+1,T)} = \frac{1}{P_{j+1}(t+1,t+2)}\,d(T-t-1)}_{\text{down at } t+1},
\]
and along the second path,
\[
\underbrace{\frac{P_j(t+1,T)}{P_j(t,T)} = \frac{1}{P_j(t,t+1)}\,d(T-t)}_{\text{down at } t},
\qquad
\underbrace{\frac{P_{j+1}(t+2,T)}{P_j(t+1,T)} = \frac{1}{P_j(t+1,t+2)}\,u(T-t-1)}_{\text{up at } t+1}.
\]
To sum up:
\[
\begin{aligned}
P_{j+1}(t+2,T) &= \frac{1}{P_{j+1}(t+1,t+2)}\,d(T-t-1)\cdot\underbrace{\frac{1}{P_j(t,t+1)}\,u(T-t)\,P_j(t,T)}_{\equiv\,P_{j+1}(t+1,T)} && \text{(up \& down)} \\
P_{j+1}(t+2,T) &= \frac{1}{P_j(t+1,t+2)}\,u(T-t-1)\cdot\underbrace{\frac{1}{P_j(t,t+1)}\,d(T-t)\,P_j(t,T)}_{\equiv\,P_j(t+1,T)} && \text{(down \& up)}
\end{aligned}
\]
By equating the previous two equations, we obtain,

\[ \frac{u(T-t)}{d(T-t)} = \frac{u(T-t-1)}{d(T-t-1)}\,\frac{P_{j+1}(t+1,t+2)}{P_j(t+1,t+2)}. \tag{13.20} \]
By evaluating Eq. (13.20) at T = t + 2,
\[ \frac{u(2)}{d(2)} = \frac{u(1)}{d(1)}\,\frac{P_{j+1}(t+1,t+2)}{P_j(t+1,t+2)} = \frac{P_{j+1}(t+1,t+2)}{P_j(t+1,t+2)} \equiv \delta^{-1}, \]
where the second equality follows from the boundary condition u (1) = d (1) = 1 in Eq. (13.18), and
we assume that δ is constant. Clearly, 0 < δ ≤ 1. Substituting back into Eq. (13.20),
\[ \frac{u(T-t)}{d(T-t)} = \frac{u(T-t-1)}{d(T-t-1)}\,\delta^{-1}. \]
Therefore, given that u (1) = d (1) = 1,
\[ \frac{u(T-t)}{d(T-t)} = \delta^{-(T-t-1)}. \tag{13.21} \]
Eq. (13.21) gives us the condition under which the tree is recombining. To rule out arbitrage
opportunities, the martingale restriction in Eq. (13.19) must also hold true. Therefore, we have
to solve the following system of two equations (Eq. (13.21) and Eq. (13.19)) with two unknowns
(u (·) and d (·)),
\[
\begin{cases}
u(T-t) = \delta^{-(T-t-1)}\,d(T-t) \\
q\,u(T-t) + (1-q)\,d(T-t) = 1
\end{cases}
\]
The solution to this system is,
\[ u(T-t) = \frac{1}{q + (1-q)\,\delta^{T-t-1}}, \qquad d(T-t) = \frac{\delta^{T-t-1}}{q + (1-q)\,\delta^{T-t-1}}. \tag{13.22} \]
So we have solved the problem. We know how to “populate” the tree. Suppose we know how
to assign values to q and δ. Given q and δ, and an initial bond price P (t, T ), we can use Eqs.
(13.17) to populate the tree, using the solution for u (T − t) and d (T − t) given in Eqs. (13.22).
In this way, we can figure out the exact bond prices to insert in each node of the tree. Once
we have computed the bond prices in each node, we can price interest rate derivatives, i.e.
assets whose payoffs depend on the particular values taken by the bond price on a given set
of nodes. Below, we provide the closed-form solution for the bond price in this model.
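As an illustration, the following minimal Python sketch evaluates the perturbation functions in Eqs. (13.22) and checks, numerically, that they satisfy both the martingale restriction (13.19) and the recombining condition (13.21). The function name `perturbations` is hypothetical, and the inputs q = 1/2 and δ = 0.9802 are merely the illustrative values used in the example of Section 13.5.5.

```python
def perturbations(n, q, delta):
    """Return (u(n), d(n)) of Eq. (13.22), with n = T - t periods to maturity."""
    denom = q + (1.0 - q) * delta ** (n - 1)
    return 1.0 / denom, delta ** (n - 1) / denom


q, delta = 0.5, 0.9802  # illustrative inputs (the values used in Section 13.5.5)
for n in range(1, 6):
    u, d = perturbations(n, q, delta)
    # Martingale restriction, Eq. (13.19): q*u + (1-q)*d = 1
    assert abs(q * u + (1.0 - q) * d - 1.0) < 1e-12
    # Recombining condition, Eq. (13.21): u/d = delta^-(n-1)
    assert abs(u / d - delta ** (-(n - 1))) < 1e-12
    print(f"n={n}  u={u:.6f}  d={d:.6f}")
```

For n = 1 the sketch returns u = d = 1, consistent with the boundary condition (13.18); for n > 1 and δ < 1 it returns d < 1 < u.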
What is the interpretation of δ? We have defined δ to be, δ^{−1} ≡ Pj+1 (t + 1, t + 2) / Pj (t + 1, t + 2), or,
\[ \ln\delta^{-1} = \ln\frac{P_{j+1}(t+1,t+2)}{P_j(t+1,t+2)} = -\left[r_{j+1}(t+1) - r_j(t+1)\right]. \tag{13.23} \]
But we know that conditionally upon time t and (price) jumps equal to j ≤ t, the short-term
rate is binomially distributed, and can take on two values: (i) rj+1 (t + 1) with probability q,
and (ii) rj (t + 1) with probability 1 − q. Then, the conditional variance of the short-term rate is,
\[ \mathrm{var}_t\left[\tilde r(t+1)\right] = q\,(1-q)\,\left[r_{j+1}(t+1) - r_j(t+1)\right]^2, \]
where vart [r̃ (t + 1)] is the conditional variance at time t, of the short-term rate one-period
ahead. Then, we may use Eq. (13.23), and the previous equation, to obtain,
\[ \mathrm{var}_t\left[\tilde r(t+1)\right] = q\,(1-q)\cdot\left(\ln\delta^{-1}\right)^2. \]
That is, δ is a parameter related to the volatility of the short-term rate, which in this basic
model, is constant. In general, δ could be time-varying, although it is then difficult to find
closed-form solutions for the model.
The Appendix shows that the solution to the Ho and Lee model (i.e. with fixed δ), is:
\[ P_j(t,T) = \frac{P(0,T)}{P(0,t)}\,\delta^{(T-t)(t-j)} \prod_{S=t}^{T-1} \frac{q + (1-q)\,\delta^{S-t}}{q + (1-q)\,\delta^{S}}. \tag{13.24} \]
From the perspective of time 0, the price of the zero at t, is only a function of the initial yield
curve, the volatility parameter δ, and of course the risk-neutral probability q.
13.5.4 Calibration of the model

We need to “estimate” the value of δ. We can proceed as follows. Consider Eq. (13.24), and let
T = t + 1. We have,
\[ P_j(t,t+1) = \frac{P(0,t+1)}{P(0,t)}\,\frac{\delta^{t-j}}{q + (1-q)\,\delta^{t}}. \]
The continuously compounded short-term rate predicted by the model is,
\[ r_j(t) \equiv -\ln P_j(t,t+1) = \hat F_t(0) + \ln\left(q + (1-q)\,\delta^{t}\right) - (t-j)\ln\delta, \quad j \le t, \tag{13.25} \]
where F̂t (0) ≡ ln P (0, t) − ln P (0, t + 1). We also have,
\[ r_j(1) - r(0) = \hat F_1(0) - \hat F_0(0) + \ln\left(q + (1-q)\,\delta\right) + \ln\delta^{-1}\cdot(1-j). \]
Hence, the parameter δ can be chosen so that the volatility of the short-term rate predicted by
the model matches exactly the volatility of the short-term rate that we see in the data. Concretely,
we can take δ̂ = exp(− Std (∆r)/ √(q (1 − q))), where Std(∆r) is the standard deviation
of the short-term rate in the data.
Note, then, the interesting feature of the model. The Ho and Lee model doesn’t take any
a priori stance on the dynamics of the short-term rate. Rather, it imposes: (i) the martingale
restriction on bond prices, an economic restriction, Eq. (13.19); and (ii) the simplifying assumption
that the tree is recombining, a technical condition, Eq. (13.21). These two conditions suffice
to tell what to expect from the dynamics of the short-term rate. While deliberately simple, the
Ho and Lee model is quite powerful. The modern approach to interest rate modeling simply
aims to make the Ho and Lee methodology more accurate for practical purposes.
13.5.5 An example
Assume that three zero coupon bonds are available for trading, with current market prices: (i)
P$ (0, 1) = 0.9851 (the price of a 6M zero), (ii) P$ (0, 2) = 0.9685 (the price of a 1Y zero), and
(iii) P$ (0, 3) = 0.9445 (the price of the 1.5Y zero). We know that the price of one-period zero
at time t, in the event of j upward price-jumps from the current date to t, is:
\[ P_j(t,t+1) = \frac{P_\$(0,t+1)}{P_\$(0,t)}\,\frac{\delta^{t-j}}{q + (1-q)\,\delta^{t}}, \quad j \le t, \tag{13.26} \]
where P$ (0, t) is the current market price of a zero expiring at time t, with t equal to six
months, one year and eighteen months, in this example. We assume that q = 1/2 and δ = 0.9802.
13.5.5.1 The dynamics of the short-term rate
We want to determine the evolution of the short-term rate on a recombining tree for as many
periods as we can, given the market price of the zeros we observe. We use Eq. (13.26) to find
the one-period zeros in each node.
• t = 0. We have, trivially, P (0, 1) = P$ (0, 1) = 0.9851.
• t = 1. We have two cases:
— j = 0: P0 (1, 2) = 2 · [P$ (0, 2) / P$ (0, 1)] · δ / (1 + δ) = 0.9733
— j = 1: P1 (1, 2) = 2 · [P$ (0, 2) / P$ (0, 1)] · 1 / (1 + δ) = 0.9930
• t = 2. We have three cases:
— j = 0: P0 (2, 3) = 2 · [P$ (0, 3) / P$ (0, 2)] · δ^2 / (1 + δ^2) = 0.9557
— j = 1: P1 (2, 3) = 2 · [P$ (0, 3) / P$ (0, 2)] · δ / (1 + δ^2) = 0.9750
— j = 2: P2 (2, 3) = 2 · [P$ (0, 3) / P$ (0, 2)] · 1 / (1 + δ^2) = 0.9947
So we face the tree below.
[Tree of one-period zero prices, with q = 1/2 on each branch:]
t = 0                  t = 1                  t = 2
                                              P (2, 3) = 0.9557
                       P (1, 2) = 0.9733
P (0, 1) = 0.9851                             P (2, 3) = 0.9750
                       P (1, 2) = 0.9930
                                              P (2, 3) = 0.9947
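The node values above can be reproduced with a few lines of Python. This is only a sketch of Eq. (13.26): the helper name `one_period_zero` and the list `P0` (with the convention P0[0] = 1 for a zero maturing today) are assumptions made for the illustration.

```python
def one_period_zero(P0, t, j, q, delta):
    """Price at time t of the zero maturing at t+1, after j upward price jumps (Eq. 13.26).

    P0[k] is the current market price of a zero maturing at time k, with P0[0] = 1.
    """
    return P0[t + 1] / P0[t] * delta ** (t - j) / (q + (1.0 - q) * delta ** t)


P0 = [1.0, 0.9851, 0.9685, 0.9445]   # market prices of the 6M, 1Y and 1.5Y zeros
q, delta = 0.5, 0.9802

tree = {(t, j): one_period_zero(P0, t, j, q, delta)
        for t in range(3) for j in range(t + 1)}
for (t, j), price in sorted(tree.items()):
    print(f"t={t}  j={j}  P={price:.4f}")
```

Up to the fourth decimal, the printed values agree with those reported in the tree.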
13.5.5.2 Pricing a coupon bearing bond
Suppose, now, that we want to find the price of some additional bond, e.g., a 1.5Y bond which
pays (semiannually) coupons at 3% of the principal of $1. First, we need to find the value of
this bond in each node of the tree. Note, at each node, we have to figure out (i) the discounted
expectation of its future value (including coupons), and (ii) the current coupons, as illustrated
in the tree below. That is, the convention, here, is that the bond purchased at time t doesn’t
give the owner the right to receive any coupon at time t, only from time t + 1 onwards.
[Tree for the cum-coupon value of the bond, with q = 1/2 on each branch and a final payment of £1.03 in every node at t = 3:]
• t = 2, PU U (2, 3) = 0.9557: 0.03 + PU U (2, 3) · 1.03 = 1.014
• t = 2, PU D (2, 3) = 0.9750: 0.03 + PU D (2, 3) · 1.03 = 1.034
• t = 2, PDD (2, 3) = 0.9947: 0.03 + PDD (2, 3) · 1.03 = 1.054
• t = 1, P (1, 2) = 0.9733: 0.03 + P (1, 2) (1/2 · 1.014 + 1/2 · 1.034) = 1.0267
• t = 1, P (1, 2) = 0.9930: 0.03 + P (1, 2) (1/2 · 1.034 + 1/2 · 1.054) = 1.0667
• t = 0, P (0, 1) = 0.9851
Naturally, the bond does not pay coupons at time zero. Therefore, the current price is,
\[ P = P(0,1)\left(\tfrac{1}{2}\cdot 1.0267 + \tfrac{1}{2}\cdot 1.0667\right) = 0.9851\left(\tfrac{1}{2}\cdot 1.0267 + \tfrac{1}{2}\cdot 1.0667\right) = 1.0311. \]
Naturally, this price could have been obtained, apart from rounding in the intermediate node
values, by simply adding [P$ (0, 1) + P$ (0, 2) + P$ (0, 3)] × 0.03 + P$ (0, 3), although the
results in the tree above are going to matter while pricing derivatives
written on the coupon bearing bond.
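The backward recursion of this subsection can be sketched in Python as follows. The function names and the dictionary layout `V[t][j]` are assumptions made for the illustration; the convention, as in the text, is that no coupon is paid at time 0. Carrying full precision, the tree price coincides with the replication value [P$ (0, 1) + P$ (0, 2) + P$ (0, 3)] × 0.03 + P$ (0, 3), which is about 1.0314; the small gap with respect to 1.0311 reflects the rounding of the node values.

```python
def discount(P0, t, j, q, delta):
    """One-period zero price P_j(t, t+1) from Eq. (13.26)."""
    return P0[t + 1] / P0[t] * delta ** (t - j) / (q + (1.0 - q) * delta ** t)


def coupon_bond_values(P0, q, delta, coupon):
    """Cum-coupon values V[t][j]; V[0][0] is today's price (no coupon is paid at time 0)."""
    n = len(P0) - 1                        # number of periods to maturity
    V = {n: [1.0 + coupon] * (n + 1)}      # principal plus last coupon in every final node
    for t in range(n - 1, -1, -1):
        V[t] = []
        for j in range(t + 1):
            expected = q * V[t + 1][j + 1] + (1.0 - q) * V[t + 1][j]
            rolled_back = discount(P0, t, j, q, delta) * expected
            V[t].append(rolled_back + (coupon if t > 0 else 0.0))
    return V


P0 = [1.0, 0.9851, 0.9685, 0.9445]
V = coupon_bond_values(P0, q=0.5, delta=0.9802, coupon=0.03)
price = V[0][0]
replication = 0.03 * sum(P0[1:]) + P0[3]
print(round(price, 4), round(replication, 4))
```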
13.5.5.3 Pricing European options
Next, we wish to find the price of options, say the price of two call options on the 1.5Y bond
considered in the previous subsection, when the strike price is $1 and the maturities of the
options are 6 months and 1 year. Again, we need to figure out the no-arbitrage movements of
the ex-coupons bond price. (This is because if we purchase the bond today, we are not entitled
to receive any coupon, today. The flow of coupons we are entitled to receive starts from the
next period.) We easily obtain the tree below. We must just subtract the coupon, 0.03, from
each cum-coupons price in each node of the tree. Then, we obtain:
[Tree of ex-coupon bond prices, with q = 1/2 on each branch:]
• t = 2: P = 1.014 − 0.03 = 0.984; P = 1.034 − 0.03 = 1.004; P = 1.054 − 0.03 = 1.024
• t = 1: P = 1.0267 − 0.03 = 0.997; P = 1.0667 − 0.03 = 1.0367
• t = 0: P (0, 1) = 0.9851
We are ready to price the two options. As for the call option on the 1.5Y bond, with 6 months
maturity, and strike price K = $1, we have the following tree:
[Tree for the call with 6 months maturity, with q = 1/2:]
• t = 1, upper node: P = 0.997, C = max{P − K, 0} = 0
• t = 1, lower node: P = 1.0367, C = max{P − K, 0} = 0.0367
• t = 0: P (0, 1) = 0.9851, C = ?
Therefore,

\[ C = 0.9851\left(\tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 0.0367\right) = 1.808\times 10^{-2}. \]
The call option on the 1.5Y bond with 1 year maturity, and strike price K = $1, is dealt
with similarly. We have the following tree:
[Tree for the call with 1 year maturity, with q = 1/2 on each branch:]
• t = 2: P = 0.984, C = max{P − K, 0} = 0
• t = 2: P = 1.004, C = max{P − K, 0} = 0.004
• t = 2: P = 1.024, C = max{P − K, 0} = 0.024
• t = 1, P (1, 2) = 0.9733: C = P (1, 2) (1/2 · 0 + 1/2 · 0.004) = 0.0019
• t = 1, P (1, 2) = 0.9930: C = P (1, 2) (1/2 · 0.004 + 1/2 · 0.024) = 0.014
• t = 0, P (0, 1) = 0.9851: C = ?
Therefore, the price of the option is,

\[ C = P(0,1)\left(\tfrac{1}{2}\cdot 0.0019 + \tfrac{1}{2}\cdot 0.014\right) = 0.9851\left(\tfrac{1}{2}\cdot 0.0019 + \tfrac{1}{2}\cdot 0.014\right) = 7.831\times 10^{-3}. \]
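The two option prices can be checked with the following Python sketch, which rebuilds the ex-coupon bond prices and rolls the option payoffs back to time 0. All function names are hypothetical. Because the sketch does not round the intermediate node values, its output is close to, but not identical with, the figures obtained above.

```python
def discount(P0, t, j, q, delta):
    """One-period zero price P_j(t, t+1) from Eq. (13.26)."""
    return P0[t + 1] / P0[t] * delta ** (t - j) / (q + (1.0 - q) * delta ** t)


def rollback(values, t_from, P0, q, delta):
    """Discounted risk-neutral expectation, from the nodes at time t_from back to time 0."""
    for t in range(t_from - 1, -1, -1):
        values = [discount(P0, t, j, q, delta) * (q * values[j + 1] + (1.0 - q) * values[j])
                  for j in range(t + 1)]
    return values[0]


def ex_coupon_prices(P0, q, delta, coupon, t):
    """Ex-coupon bond prices in the nodes at time t, for 0 < t < maturity."""
    n = len(P0) - 1
    V = [1.0 + coupon] * (n + 1)           # principal plus last coupon at maturity
    for k in range(n - 1, t - 1, -1):      # cum-coupon values, back to time t
        V = [discount(P0, k, j, q, delta) * (q * V[j + 1] + (1.0 - q) * V[j]) + coupon
             for j in range(k + 1)]
    return [v - coupon for v in V]


def call_price(P0, q, delta, coupon, strike, expiry):
    """European call on the ex-coupon bond price, expiring after `expiry` periods."""
    payoff = [max(p - strike, 0.0) for p in ex_coupon_prices(P0, q, delta, coupon, expiry)]
    return rollback(payoff, expiry, P0, q, delta)


P0 = [1.0, 0.9851, 0.9685, 0.9445]
c_6m = call_price(P0, 0.5, 0.9802, coupon=0.03, strike=1.0, expiry=1)
c_1y = call_price(P0, 0.5, 0.9802, coupon=0.03, strike=1.0, expiry=2)
print(round(c_6m, 4), round(c_1y, 4))
```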
13.6 Beyond Ho and Lee: Calibration

The modeling approach in the previous sections imposes no-arbitrage conditions on the price
of the zeros, thereby determining the implied stochastic process for the short-term rate. In this
section, we show how to implement this approach by looking for, or fitting, the “right” short-term
rate process in the first place. Practitioners might prefer to “view” the Ho and Lee model from
the same “calibration” perspective we develop in this section. To illustrate how the calibration
works, we develop three points. First, we review how Arrow-Debreu securities can be put to
work in the very applied context of fixed income security evaluation. We shall see that Arrow-Debreu
securities are conceptually very useful here, as they allow us to turn the martingale restriction
of the previous sections into a set of analytically simpler conditions. Second, we use these
Arrow-Debreu securities to implement a general algorithm to “populate” the short-term rate tree,
while ensuring that the initial term-structure is perfectly fitted. Finally, we apply the previous
algorithm to illustrate how to solve two models, in practice: (i) the Ho and Lee model, and (ii)
the model developed by Black, Derman and Toy (1990).
13.6.1 Arrow-Debreu securities

We know from Chapter 2, that an Arrow-Debreu security is a security that promises to pay $1
in some prespecified state of nature, and zero otherwise. Consider, for example, the diagram
below.
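[Figure: binomial tree starting at node (0, 0), with branch probabilities q and 1 − q. The nodes (s, τ ) and (s − 1, τ ) lead to node (s, τ + 1), where the Arrow-Debreu security pays £1; the security pays £0 in the other nodes at time τ + 1.]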
FIGURE 13.10. In the binomial tree of this section, an Arrow-Debreu security for state s
at time τ + 1 is a security that pays $1 at time τ + 1 in state s, and zero otherwise. This
section aims to show how to recover Arrow-Debreu prices from the price of fixed income
securities.
In this diagram, q is the risk-neutral probability of an upward movement of the short-term

rate. A generic pair (s, τ ) at each node tracks the number of upward movements of the short-
term rate, s, and calendar time, τ , where of course s ≤ τ (since there can only be one possible
short-term rate movement in each period). From now on, let us focus attention on the Arrow-
Debreu security for the state s at time τ + 1.
Let ps (τ ) denote the current price of an Arrow-Debreu security that pays off $1 in state s
at time τ , and zero otherwise. Then, the current market price of a zero that matures at time T
is necessarily,
\[ P_\$(0,T) = \sum_{s=0}^{T} p_s(T). \]
More generally, consider a derivative that pays off Ds (τ ) in node (s, τ ), meaning a dividend
equal to D0 (τ ) in state s = 0, equal to D1 (τ ) in state s = 1, · · · , and equal to Dτ (τ ) in state
s = τ . The price of this asset, denoted as C$ (0, T ), is given by,
\[ C_\$(0,T) = \sum_{\tau=1}^{T}\sum_{s=0}^{\tau} p_s(\tau)\,D_s(\tau). \tag{13.27} \]
Our objective, now, is to “recover” the price of the Arrow-Debreu securities ps (τ ) for all s
and τ , where τ ∈ {1, · · · , T }, from the observation of the initial term-structure of interest rates.
Consider the Arrow-Debreu security that promises to pay $1 in node (s, τ + 1) (see Figure
13.10). Let its value at time τ in state j (j ≤ τ ) be denoted as π j,τ [s, τ + 1]. What is this
value at time τ in all states? The key observation, here, is that in this binomial tree, the node
(s, τ + 1) (the filled circle) can only be reached through the node (s, τ ) and the node
(s − 1, τ ) occurring at time τ (the two empty circles in Figure 13.10). For this reason, at time
τ , the value π j,τ [s, τ + 1] is zero in all the nodes (j, τ ) that are distinct from the empty circles
(s, τ ) and (s − 1, τ ). This is because starting from any node different from these empty circles,
it is impossible to reach the node (s, τ + 1) (the filled circle) where the Arrow-Debreu security
pays off.
So, we are left with finding the values π j,τ [s, τ + 1] in the nodes corresponding to the empty
circles (s, τ ) and (s − 1, τ ), i.e. π s,τ [s, τ + 1] and π s−1,τ [s, τ + 1]. Let rs (τ ) be the continuously
compounded short-term rate in node (s, τ ). Consider the upper node (s, τ ). We have,
\[ \pi_{s,\tau}[s,\tau+1] = e^{-r_s(\tau)}\left[0\cdot q + 1\cdot(1-q)\right] = e^{-r_s(\tau)}\,(1-q). \]
Similarly, in the lower node, (s − 1, τ ),
\[ \pi_{s-1,\tau}[s,\tau+1] = e^{-r_{s-1}(\tau)}\left[1\cdot q + 0\cdot(1-q)\right] = e^{-r_{s-1}(\tau)}\,q. \]
We can think of our Arrow-Debreu security for (s, τ + 1) as a derivative that at time τ ,
delivers the following “payoffs”:
\[
\begin{cases}
\pi_{s,\tau}[s,\tau+1] = e^{-r_s(\tau)}\,(1-q) \\
\pi_{s-1,\tau}[s,\tau+1] = e^{-r_{s-1}(\tau)}\,q \\
\pi_{j,\tau}[s,\tau+1] = 0, \quad \text{for all } j \notin \{s-1,\,s\}
\end{cases}
\tag{13.28}
\]
These “payoffs” are simply the market value of the Arrow-Debreu security for (s, τ + 1), in the
various states occurring at time τ , i.e. the money the holder can make by selling the asset at
time τ , in the various states. Therefore, we can apply Eq. (13.27) to obtain,
\[
\begin{aligned}
p_s(\tau+1) &= \sum_{j=0}^{\tau} p_j(\tau)\,\pi_{j,\tau}[s,\tau+1] \\
&= p_s(\tau)\,\pi_{s,\tau}[s,\tau+1] + p_{s-1}(\tau)\,\pi_{s-1,\tau}[s,\tau+1].
\end{aligned}
\]
By substituting the values in (13.28) into the previous equation, we obtain the
so-called forward equation for the Arrow-Debreu prices,
\[ p_s(\tau+1) = p_s(\tau)\,e^{-r_s(\tau)}\,(1-q) + p_{s-1}(\tau)\,e^{-r_{s-1}(\tau)}\,q. \tag{13.29} \]
13.6.2 The algorithm in two examples

The algorithm aims to populate the interest rate tree by making repeated use of the forward
equation (13.29) and the zero pricing equation
\[ P_\$(0,\tau+1) = \sum_{s=0}^{\tau} p_s(\tau)\,e^{-r_s(\tau)}. \]
The input is, of course, a number of zeros equal to the largest maturity date the tree extends
to. We describe how the algorithm works by developing two concrete examples.
We start with Ho and Lee. We assume continuous compounding, for analytical reasons clarified
below. We assume that the continuously compounded short-term rate is solution to,
\[ r_s(\tau) = r_0(\tau) + \ln\delta^{-1}\cdot s, \tag{13.30} \]
where rs (τ ) is the short-term rate at time τ , in the event of s upward movements of the
short-term rate, and δ is a volatility parameter, i.e. such that
\[ \ln\delta^{-1} = \frac{\mathrm{Std}(\Delta r)}{\sqrt{q\,(1-q)}}, \]
where Std (∆r) is the standard deviation of the short-term rate in the data.4 At time zero, the
price of a zero maturing at time τ + 1 is:
\[ P_\$(0,\tau+1) = \sum_{s=0}^{\tau} p_s(\tau)\,e^{-r_s(\tau)} = e^{-r_0(\tau)}\sum_{s=0}^{\tau}\delta^{s}\,p_s(\tau), \]
where the second equality follows by the assumption that the short-term rate is solution to Eq.
(13.30).
By rearranging terms in the previous equation, we obtain a closed-form expression for the
future short-term rate at time τ , in the event of zero upward movements,
\[ r_0(\tau) = \ln\frac{\sum_{s=0}^{\tau}\delta^{s}\,p_s(\tau)}{P_\$(0,\tau+1)}. \tag{13.31} \]
We use Eq. (13.31) and the forward equation (13.29) to populate the interest rate tree, under
the assumption that q = 1/2. Precisely, the algorithm proceeds as follows:
(i) Given the boundary condition for the Arrow-Debreu price, p0 (0) = 1, compute the initial
value of the short-term rate, r0 (0), using Eq. (13.31), as r0 (0) = ln( 1/ P$ (0, 1)).
(ii) Suppose we know the future value of the short-term rate at time τ − 1, in the event of
no upward movements, i.e. r0 (τ − 1). Then, given the value of r0 (τ − 1), and the price
of the Arrow-Debreu securities ps (τ − 1) for s ≤ τ − 1, compute ps (τ ) for s ≤ τ , through
the forward equation (13.29),
\[ p_s(\tau) = p_s(\tau-1)\,\delta^{s}\,e^{-r_0(\tau-1)}\,(1-q) + p_{s-1}(\tau-1)\,\delta^{s-1}\,e^{-r_0(\tau-1)}\,q, \quad q = \tfrac{1}{2}, \]
where the last equation follows by plugging Eq. (13.30) into Eq. (13.29).
(iii) Given the Arrow-Debreu prices ps (τ ) for s ≤ τ , use Eq. (13.31) to compute the future
value of the short-term rate at time τ , in the event of no upward movements, i.e. r0 (τ ).
(iv) If τ = T , stop. Otherwise, go to (ii).
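Steps (i)-(iv) can be sketched in Python as follows. The function name `calibrate_ho_lee` and the list conventions (P0[k] is the market price of the zero maturing at time k, with P0[0] = 1; p[tau][s] is the Arrow-Debreu price ps (τ )) are assumptions made for the illustration. The inputs are the three zero prices and the value δ = 0.9802 used in Section 13.5.5.

```python
import math


def calibrate_ho_lee(P0, delta, q=0.5):
    """Return (r0, p): r0[tau] is the short-term rate at time tau after zero upward
    rate movements, and p[tau][s] is the Arrow-Debreu price for state s at time tau."""
    n = len(P0) - 1                      # number of zeros supplied
    p = [[1.0]]                          # boundary condition p_0(0) = 1
    r0 = []
    for tau in range(n):
        if tau > 0:                      # step (ii): forward equation (13.29) with (13.30)
            prev, disc = p[tau - 1], math.exp(-r0[tau - 1])
            row = []
            for s in range(tau + 1):
                stay = prev[s] * delta ** s * (1.0 - q) if s < tau else 0.0
                up = prev[s - 1] * delta ** (s - 1) * q if s > 0 else 0.0
                row.append(disc * (stay + up))
            p.append(row)
        # steps (i) and (iii): closed form for r0(tau), Eq. (13.31)
        weighted = sum(delta ** s * p[tau][s] for s in range(tau + 1))
        r0.append(math.log(weighted / P0[tau + 1]))
    return r0, p


P0 = [1.0, 0.9851, 0.9685, 0.9445]
r0, p = calibrate_ho_lee(P0, delta=0.9802)
print([round(r, 4) for r in r0])
print([round(x, 4) for x in p[2]])
```

By construction, the Arrow-Debreu prices at each date sum to the market price of the zero maturing at that date, which provides a simple check on any implementation.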
As a second example, consider the Black, Derman and Toy (1990) model. In this model, the
short-term rate is solution to,
\[ r_s(\tau) = \delta^{s}\,r_0(\tau), \tag{13.32} \]
where δ is, once again, a volatility parameter.5 For computational convenience, this model
assumes that the short-term rate in Eq. (13.32) is discretely compounded. Accordingly, we
rewrite the forward equation (13.29) in terms of discretely compounded rates,
\[ p_s(\tau+1) = p_s(\tau)\,\frac{1}{1 + r_s(\tau)}\,(1-q) + p_{s-1}(\tau)\,\frac{1}{1 + r_{s-1}(\tau)}\,q. \tag{13.33} \]
The algorithm is as follows:
4 Hence, the short-term rate movements that we shall derive do depend on the value of the risk-neutral probability q that we
choose.
5 In its most general form, this model assumes that rs (τ ) = δτ^s r0 (τ ), where δτ is a volatility parameter that varies deterministically
over time. This more general formulation leads to more flexibility, which is useful to fit the term structure of volatility.
(i) Compute the initial value of the short-term rate, r0 (0), as the solution to,
\[ P_\$(0,1) = \frac{1}{1 + r_0(0)}. \]
(ii) Suppose we know the future value of the short-term rate at time τ − 1, in the event of
no upward movements, i.e. r0 (τ − 1). Then, given the value of r0 (τ − 1), and the price
of the Arrow-Debreu securities ps (τ − 1) for s ≤ τ − 1, compute ps (τ ) for s ≤ τ , through
the forward equation (13.33),
\[ p_s(\tau) = p_s(\tau-1)\,\frac{1}{1 + \delta^{s}\,r_0(\tau-1)}\,(1-q) + p_{s-1}(\tau-1)\,\frac{1}{1 + \delta^{s-1}\,r_0(\tau-1)}\,q, \quad q = \tfrac{1}{2}, \]
where the last equation follows by plugging Eq. (13.32) into Eq. (13.33).
(iii) Given the boundary condition p0 (0) = 1, and the Arrow-Debreu prices, ps (τ ) for s ≤ τ ,
use the pricing equation for the zero,
\[ P_\$(0,\tau+1) = \sum_{s=0}^{\tau} p_s(\tau)\,\frac{1}{1 + \delta^{s}\,r_0(\tau)}, \]
to solve, numerically, for the future value of the short-term rate at time τ , in the event
of no upward movements, i.e. r0 (τ ). Note, we did not need this additional step for the
solution of the Ho and Lee model, as the short-term rate r0 (τ ) is known in closed form
in the Ho and Lee model (see Eq. (13.31)).
(iv) If τ = T , stop. Otherwise, go to (ii).
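A minimal Python sketch of steps (i)-(iv) for the Black, Derman and Toy model follows. Several elements are assumptions made for the illustration only: the function name `calibrate_bdt`, the use of bisection as the numerical solver in step (iii), the bracket [0, 1] for r0 (τ ) (which presumes positive per-period rates below 100%), and the value δ = 1.2, which is purely hypothetical and not calibrated to any data.

```python
def calibrate_bdt(P0, delta, q=0.5, tol=1e-12):
    """Return (r0, p) for the model r_s(tau) = delta**s * r0(tau), discretely compounded.

    P0[k] is the market price of the zero maturing at time k, with P0[0] = 1.
    """
    n = len(P0) - 1
    p = [[1.0]]                          # boundary condition p_0(0) = 1
    r0 = [1.0 / P0[1] - 1.0]             # step (i)
    for tau in range(1, n):
        prev, base = p[tau - 1], r0[tau - 1]
        row = []
        for s in range(tau + 1):         # step (ii): forward equation (13.33)
            stay = prev[s] / (1.0 + delta ** s * base) * (1.0 - q) if s < tau else 0.0
            up = prev[s - 1] / (1.0 + delta ** (s - 1) * base) * q if s > 0 else 0.0
            row.append(stay + up)
        p.append(row)
        lo, hi = 0.0, 1.0                # assumed bracket for r0(tau)
        while hi - lo > tol:             # step (iii): bisection on the zero pricing equation
            mid = 0.5 * (lo + hi)
            model = sum(ps / (1.0 + delta ** s * mid) for s, ps in enumerate(row))
            if model > P0[tau + 1]:      # model price too high: the rate must increase
                lo = mid
            else:
                hi = mid
        r0.append(0.5 * (lo + hi))
    return r0, p


P0 = [1.0, 0.9851, 0.9685, 0.9445]
r0_bdt, p_bdt = calibrate_bdt(P0, delta=1.2)   # delta = 1.2 is a hypothetical input
print([round(r, 5) for r in r0_bdt])
```

Bisection is chosen here only because the model price of the zero is monotonically decreasing in r0 (τ ), so that a bracketed root is found reliably; any other one-dimensional root finder would serve.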
13.6.2.1 A numerical example
Consider, again, the Ho and Lee example in Section 13.5.5, where three zeros were traded: (i)
one zero maturing in 6 months, (ii) one zero maturing in 1 year, and (iii) one zero maturing in
1.5 years, with market prices P$ (0, 1) = 0.9851, P$ (0, 2) = 0.9685, P$ (0, 3) = 0.9445. The Ho
and Lee model assumes that,
\[ r_s(\tau) = r_0(\tau) + \ln\delta^{-1}\cdot s. \tag{13.34} \]
We now want to use this equation to find the values of the short-term rate rs (τ ) in each
node, under the assumption that q = 1/2, and that the standard deviation of the short-term rate
is 0.014, annualized.
Remarks on notation. By Eq. (13.25), the short-term rate predicted by the Ho and Lee
model is:
\[ r_j(\tau) = \hat F_\tau(0) + \ln\left(q + (1-q)\,\delta^{\tau}\right) - (\tau - j)\ln\delta, \tag{13.35} \]
where F̂τ (0) is the continuously compounded forward rate, at time zero, for maturity [τ , τ + 1],
and j is the number of upward movements of the bond prices. Naturally, then, s ≡ (τ − j) is
the number of downward movements of the bond prices or, equivalently, the number of upward
movements of the short-term rate. Hence, we may equivalently index the short-term rate by s,
rather than by j, and rewrite Eq. (13.35) as follows:
\[ r_s(\tau) = \underbrace{\hat F_\tau(0) + \ln\left(q + (1-q)\,\delta^{\tau}\right)}_{=\,r_0(\tau)} + \ln\delta^{-1}\cdot s. \]

To find δ, we make reference to the relation, ln δ^{−1} = Std(∆r) / √(q (1 − q)), where q = 1/2 and Std(∆r)
is the standard deviation of the short-term rate, which equals Std(∆r) = 0.014, annualized.
Since each period of the tree lasts six months, the standard deviation per period is 0.014/√2.
Therefore ln δ^{−1} = (0.014/√2) / (1/2) = 0.02 or δ = 0.9802.
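The arithmetic behind δ can be checked with the short Python sketch below. The variable names are hypothetical. Note that, without rounding ln δ^{−1} to 0.02, the sketch returns a value of δ slightly above 0.9802; the difference is immaterial at the precision used in this example.

```python
import math

std_annual = 0.014          # annualized standard deviation of the short-term rate
dt = 0.5                    # length of one period of the tree, in years
q = 0.5                     # risk-neutral probability

std_period = std_annual * math.sqrt(dt)              # equals 0.014 / sqrt(2)
ln_delta_inv = std_period / math.sqrt(q * (1.0 - q))  # ln(1/delta)
delta = math.exp(-ln_delta_inv)
print(round(ln_delta_inv, 4), round(delta, 4))
```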
For the Ho & Lee model, we know the closed-form expression for r0 (τ ),
\[ r_0(\tau) = \ln\frac{\sum_{s=0}^{\tau}\delta^{s}\,p_s(\tau)}{P_\$(0,\tau+1)}, \tag{13.36} \]
where ps (τ ) denotes the price of an Arrow-Debreu security which pays off $1 in state s at time
τ , and zero otherwise. Given the term-structure of prices P$ (0, τ + 1), τ = 0, 1, 2, we “populate”
the tree using Eq. (13.36) and the forward equation for the Arrow-Debreu prices derived in
Section 13.6.2,
\[ p_s(\tau) = \tfrac{1}{2}\,e^{-r_0(\tau-1)}\left[\delta^{s}\,p_s(\tau-1) + \delta^{s-1}\,p_{s-1}(\tau-1)\right], \tag{13.37} \]
with the appropriate boundary conditions.
So we have to compute interest rates and Arrow-Debreu prices for τ = 0, 1, 2.
• τ = 0. Eq. (13.36) is trivial. It leads to,

\[ r_0(0) = \ln\frac{1}{P_\$(0,1)} = 0.015. \]
The forward equation for the Arrow-Debreu prices, Eq. (13.37), is also trivial,
p0 (0) = 1.
• τ = 1. Let us use Eq. (13.37), the forward equation for the Arrow-Debreu prices, to find
p0 (1) and p1 (1). We have two cases:
— s = 0. We have:
\[ p_0(1) = \tfrac{1}{2}\,e^{-r_0(0)}\left[p_0(0) + 0\right] = \tfrac{1}{2}\,e^{-r_0(0)} = 0.4925. \]
The previous relation holds because p0 (1) is the current price of the Arrow-Debreu
security which pays off $1 in state 0 at time 1, as illustrated by the tree below,
[Figure: one-period tree from τ = 0 to τ = 1, with q = 1/2, leading to states s = 1 and s = 0; the security pays £1 in state s = 0.]
— s = 1. By a similar reasoning,
\[ p_1(1) = \tfrac{1}{2}\,e^{-r_0(0)}\left[0 + p_0(0)\right] = \tfrac{1}{2}\,e^{-r_0(0)} = 0.4925. \]
Eq. (13.36) is, now,
\[ r_0(1) = \ln\frac{p_0(1) + \delta\,p_1(1)}{P_\$(0,2)} = \ln\frac{0.4925\cdot(1 + 0.9802)}{0.9685} = 0.0069. \]
Hence, by Eq. (13.34),
\[ r_1(1) = r_0(1) + \ln\delta^{-1} = 0.0069 + 0.02 \approx 0.027. \]
So, to sum up, we have the tree below,
[Tree of the short-term rate, with q = 1/2 on each branch:]
τ = 0                  τ = 1
                       r1 (1) = 0.027
r0 (0) = 0.015
                       r0 (1) = 0.0069
where p0 (1) = p1 (1) = 0.4925.

We now proceed to compute the values of the short-term rate for one further period.
• τ = 2. By Eq. (13.37), the forward equation for the Arrow-Debreu prices, we have the
following three cases:
(s = 0) p0 (2) = (1/2) e^{−r0 (1)} [p0 (1) + 0] = 0.2446
(s = 1) p1 (2) = (1/2) e^{−r0 (1)} [δp1 (1) + p0 (1)] = 0.4843
(s = 2) p2 (2) = (1/2) e^{−r0 (1)} [0 + δp1 (1)] = 0.2397
The tree below further illustrates how to obtain these prices.

[Figure: recombining tree over τ = 0, 1, 2, with states s = 0, 1 at τ = 1 and states s = 0, 1, 2 at τ = 2.]
Consider, for example, p0 (2). It is the price of the Arrow-Debreu security for time 2,
under two consecutive downward movements of the short-term rate. This state can only
be reached through the state s = 0 at time τ = 1. But at state s = 0 at time τ = 1, the
value of the Arrow-Debreu asset is (1/2) e^{−r0 (1)} . Hence, p0 (2) = p0 (1) · (1/2) e^{−r0 (1)} . By a similar
reasoning, we have that p2 (2) = p1 (1) · (1/2) e^{−r1 (1)} = p1 (1) · (1/2) e^{−r0 (1)} δ.
We can now compute the values of the short-term rate for each node. Eq. (13.36) is, now,

\[
\begin{aligned}
r_0(2) &= \ln\frac{p_0(2) + \delta\,p_1(2) + \delta^{2}\,p_2(2)}{P_\$(0,3)} \\
&= \ln\frac{0.2446 + 0.9802\cdot 0.4843 + (0.9802)^{2}\cdot 0.2397}{0.9445} = 0.0054.
\end{aligned}
\]
Hence, by Eq. (13.34),
\[ r_s(2) = r_0(2) + \ln\delta^{-1}\cdot s = 0.0054 + 0.02\cdot s, \quad s = 0, 1, 2. \]
This yields the following values for the short-term rate: r0 (2) = 0.0054, r1 (2) = 0.0253,
and r2 (2) = 0.0452.
To summarize, the implied tree for the short-term rate is given below (with q = 1/2 on each branch).
τ = 0                  τ = 1                  τ = 2
                                              r2 (2) = 0.0452
                       r1 (1) = 0.027
r0 (0) = 0.015                                r1 (2) = 0.0253
                       r0 (1) = 0.0069
                                              r0 (2) = 0.0054
Naturally, the prices P = e^{−r} in the nodes of the previous tree match those calculated in
Section 13.5.5, apart from discrepancies arising due to rounding errors.
13.7 Coping with credit risk

Two examples: callable and convertible bonds.
13.7.1 Callable bonds

For example, to evaluate callable bonds through trees, we follow the methodology described
in this chapter, which we correct for the presence of credit risk.
(i) First, we “populate” a short-term rate tree through one of the models described in this
chapter (say, for example, through the Black, Derman and Toy model).
(ii) Second, we use this tree to find the value of some coupon bearing bond of interest, by
just using the short-term rate process of the previous step.
(iii) Third, we use the results obtained in the second step and build up a tree for the callable
bond. In each node immediately preceding the maturity, we compare the call price with
the non-callable coupon bearing bond price (ex-coupon) and take the minimum of the
two. We add the coupon to this minimum and find, then, the payoff of the callable
bond at the relevant node. This gives us V = min{K, B rolled-back (ex-coupon)} + coupon,
where K is the call price, and B rolled-back (ex-coupon) is the ex-coupon bond price, which
is found from the values of the bond V in the next nodes (by using, as usual, recursive,
backward solution, i.e. the risk-neutral expectation of the future payoffs).
(iv) Fourth, we go backward, discounting the values obtained in the previous steps, V say,
obtaining, for each node, V− = min{K, V } + coupon, etc. Hence, we find the price. If the
callable bond is not subject to default risk, we stop. Otherwise, we proceed to the next
step.
(v) Fifth, we correct for credit risk. The price we found in the fourth step is typically different
from the market price. One issue is that the market price reflects the credit risk of the
firm, and should typically be lower than the price obtained in the fourth step. The trick,
here, is to search for an additional spread to add to the short-term rate process obtained
in the first step, such that the theoretical bond price equals the market price of the bond.
This is done numerically, and of course alters the results obtained in steps 3 and 4.
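Steps (ii)-(iv) can be sketched in Python as follows, ignoring the credit risk correction of step (v). The sketch makes several assumptions that are not in the text: it uses the Ho and Lee discount factors of Eq. (13.26) rather than the Black, Derman and Toy tree, it takes the 1.5Y bond with 3% semiannual coupons of Section 13.5.5 as the underlying, it sets a hypothetical call price K = 1, and it allows the issuer to call at every coupon date but not at time 0.

```python
def discount(P0, t, j, q, delta):
    """One-period zero price P_j(t, t+1) from Eq. (13.26)."""
    return P0[t + 1] / P0[t] * delta ** (t - j) / (q + (1.0 - q) * delta ** t)


def callable_bond_price(P0, q, delta, coupon, call_price):
    """Backward recursion V = min{K, rolled-back ex-coupon value} + coupon in each node."""
    n = len(P0) - 1
    V = [1.0 + coupon] * (n + 1)           # principal plus last coupon at maturity
    for t in range(n - 1, -1, -1):
        rolled = [discount(P0, t, j, q, delta) * (q * V[j + 1] + (1.0 - q) * V[j])
                  for j in range(t + 1)]
        # Assumption: callable at every coupon date t >= 1, not callable at time 0
        V = [min(call_price, b) + coupon for b in rolled] if t > 0 else rolled
    return V[0]


P0 = [1.0, 0.9851, 0.9685, 0.9445]
callable_price = callable_bond_price(P0, 0.5, 0.9802, coupon=0.03, call_price=1.0)
straight_price = callable_bond_price(P0, 0.5, 0.9802, coupon=0.03, call_price=1e9)
print(round(callable_price, 4), round(straight_price, 4))
```

With a call price so high that the option is never exercised, the recursion returns the price of the non-callable bond; with K = 1, the callable bond is worth less, the difference being the value of the issuer's call option.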
At this point, we may price options written on callable bonds. Ho and Lee (2004) (Chapter
8, Section 8.3 p. 274-278) develop a number of useful exercises on the pricing of options on
callable bonds, through tree methods.
13.7.2 Convertible bonds

Recall from Chapter 12, that the parity, or conversion value, is CV = CR × S, where S is the
price of the common share. The evaluation tree features the following three steps.
(i) First, we set the life of the tree equal to the life of the callable convertible bond.
(ii) Second, we assess the evolution of the stock price along the tree, under the risk-neutral
probability. (This is done according to the usual Cox, Ross and Rubinstein (1979) ap-
proach.)
(iii) Third, in each node, we compute the value of the bond as max{CV, min{B, K}}, where
CV is the conversion value, K is the call value, and B is the value of the bond which is
“rolled-back” from the values of the bond in the next nodes (by using, as usual, recur-
sive, backward solution, i.e. the risk-neutral expectation of the future payoffs). That is,
assuming the bondholder does not convert, the value is B ∗ = min {B, K}, where B is the
“rolled-back” value of the bond. Then, the value is max{CV, B ∗ }.
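The three steps above can be sketched as follows. This is a minimal illustration under assumptions
of our own: the convertible pays no coupons, it is callable at every node before maturity, and a
single flat continuously compounded rate r is used for discounting, so that default risk is
ignored. The function name convertible_price and all parameter values are hypothetical.

```python
import math


def convertible_price(s0, sigma, r, maturity, steps, face, conv_ratio, call_price):
    """Zero-coupon callable convertible on a Cox-Ross-Rubinstein tree.

    Node value: max{CV, min{B, K}}, where B is the rolled-back value.
    Simplifying assumption: one flat rate r discounts all cash flows.
    """
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    discount = math.exp(-r * dt)

    def stock(i, j):
        """Stock price at time step i after j up-moves."""
        return s0 * u**j * d**(i - j)

    # At maturity the holder takes the larger of conversion value and face value.
    values = [max(conv_ratio * stock(steps, j), face) for j in range(steps + 1)]
    for i in range(steps - 1, -1, -1):
        new_values = []
        for j in range(i + 1):
            rolled_back = discount * (q * values[j + 1] + (1 - q) * values[j])
            held = min(rolled_back, call_price)  # B* = min{B, K}
            new_values.append(max(conv_ratio * stock(i, j), held))
        values = new_values
    return values[0]


price = convertible_price(s0=50.0, sigma=0.3, r=0.05, maturity=2.0, steps=50,
                          face=100.0, conv_ratio=2.0, call_price=110.0)
```

Two sanity checks follow from the node rule: the price is never below the current conversion
value, and with a zero conversion ratio and a very large call price the function returns the
discounted face value.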
Note that this procedure fills in the nodes once we know the appropriate interest rate. If
the firm were not subject to default risk, we would simply use the riskless interest rate. However,
the firm is obviously subject to default risk. In practice, we proceed as follows. In each node, the
value of the bond is decomposed into two parts: one part, related to the “pure debt component”,
which is discounted at the defaultable interest rate; and one part, related to the “pure equity
component”, which is discounted at the default-free interest rate. Exercise 25.7 in Hull (2003)
(p. 653-654) illustrates a specific example.
13.8 Appendix 1: Proof of Eq. (13.9)

The derivation in this appendix is the discrete-time counterpart to that in Section 11.3.2.2 in Chapter
11.
Let P (r, Ti ) denote the price of a zero with maturity Ti , i = 1, 2, when the interest rate is equal
to r. We wish to replicate a zero with maturity T2 by means of a portfolio that includes a zero with
maturity T1 . Consider the following portfolio: (i) Go long ∆ zeros with maturity T1 and (ii) invest
M in the MMA. Let V0 be the current value of this portfolio. V0 is clearly a function of the current
short-term rate r, and equals,
V0 (r) = ∆ · P (r, T1 ) + M.
In the second period, the value of the portfolio is random, as it depends on the development of the
short-term rate r̃. Precisely, the value of the portfolio in the second period, is
$$
V(\tilde{r}) =
\begin{cases}
V(r^+) = \Delta \cdot P(r^+, T_1) + M \cdot (1 + r), & \text{with probability } p \\
V(r^-) = \Delta \cdot P(r^-, T_1) + M \cdot (1 + r), & \text{with probability } 1 - p
\end{cases}
$$
We also know that in the second period, the value of the second zero is,
$$
P(\tilde{r}, T_2) =
\begin{cases}
P(r^+, T_2), & \text{with probability } p \\
P(r^-, T_2), & \text{with probability } 1 - p
\end{cases}
$$
Next, we select ∆ and M to make the value of the portfolio equal the value of the second zero, in each
state of nature, viz
V (r̃) = P (r̃, T2 ) , in each state.
Mathematically, this is tantamount to solving the following system of two equations with two unknowns
(∆ and M),
$$
\begin{cases}
V(r^+) = \Delta \cdot P(r^+, T_1) + M \cdot (1 + r) = P(r^+, T_2) \\
V(r^-) = \Delta \cdot P(r^-, T_1) + M \cdot (1 + r) = P(r^-, T_2)
\end{cases}
\tag{13A.1}
$$
The solution is,
$$
\hat{\Delta} = \frac{P(r^+, T_2) - P(r^-, T_2)}{P(r^+, T_1) - P(r^-, T_1)}, \qquad
\hat{M} = \frac{P(r^-, T_2) P(r^+, T_1) - P(r^+, T_2) P(r^-, T_1)}{[P(r^+, T_1) - P(r^-, T_1)] (1 + r)}.
$$
By construction, the previous portfolio, $(\hat{\Delta}, \hat{M})$, replicates the value of the second zero in the second
period. But if two assets (the portfolio, and the second zero) yield the same payoffs in each state of
nature, they must be worth the same, in the absence of arbitrage. Therefore, we must have,
$$
V_0(r)\big|_{\Delta = \hat{\Delta},\, M = \hat{M}} = \hat{\Delta} \cdot P(r, T_1) + \hat{M} = P(r, T_2),
$$
or,
$$
(1 + r) \hat{M} = (1 + r) P(r, T_2) - (1 + r) \hat{\Delta} \cdot P(r, T_1). \tag{13A.2}
$$
Next, let us figure out the prediction of the model in terms of the expected return it generates for
the price of the bond maturing at T1, when $(\Delta, M) = (\hat{\Delta}, \hat{M})$. To do this, multiply the first equation in
(13A.1) by p, and multiply the second equation in (13A.1) by 1 − p. Add the result for $\Delta = \hat{\Delta}$, $M = \hat{M}$
to obtain,
$$
\hat{\Delta} \cdot [p P(r^+, T_1) + (1 - p) P(r^-, T_1)] + \hat{M} \cdot (1 + r) = p P(r^+, T_2) + (1 - p) P(r^-, T_2).
$$
Replacing (13A.2) into the previous equation yields,
$$
\hat{\Delta} \cdot \{ [p P(r^+, T_1) + (1 - p) P(r^-, T_1)] - (1 + r) P(r, T_1) \}
= [p P(r^+, T_2) + (1 - p) P(r^-, T_2)] - (1 + r) P(r, T_2).
$$
Finally, replacing the solution for $\hat{\Delta}$ into the previous equation leaves,
$$
\frac{[p P(r^+, T_1) + (1 - p) P(r^-, T_1)] - (1 + r) P(r, T_1)}{P(r^+, T_1) - P(r^-, T_1)}
= \frac{[p P(r^+, T_2) + (1 - p) P(r^-, T_2)] - (1 + r) P(r, T_2)}{P(r^+, T_2) - P(r^-, T_2)}.
$$
The previous equation is easy to interpret. The numerators are the expected excess returns from
holding the assets. They equal Ep [P (r̃, Ti )] − (1 + r) P (r, Ti ), where Ep [P (r̃, Ti )] is what the investors
expect to receive, the next period, by investing £P (r, Ti ) today, in the bond; and (1 + r) P (r, Ti )
is what the investors expect to receive, the next period, by investing £P (r, Ti ) today, in the MMA.
The denominators constitute a measure of volatility related to holding the assets. Then, the previous
equation tells us that the Sharpe ratios, or the unit risk premiums, on the two zeros agree.
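The replication argument can be checked numerically. The following sketch uses made-up prices
(every number below is a hypothetical input, and the function names are ours): it computes the
replicating portfolio from the solution to (13A.1), prices the second zero by absence of arbitrage,
and then evaluates the unit risk premium on each zero.

```python
def replicating_portfolio(p1_up, p1_dn, p2_up, p2_dn, r):
    """Solution (Delta-hat, M-hat) of system (13A.1)."""
    delta = (p2_up - p2_dn) / (p1_up - p1_dn)
    m = (p2_dn * p1_up - p2_up * p1_dn) / ((p1_up - p1_dn) * (1.0 + r))
    return delta, m


def unit_risk_premium(p_up, p_dn, p_now, r, p):
    """Expected excess return divided by the price spread across states."""
    return (p * p_up + (1.0 - p) * p_dn - (1.0 + r) * p_now) / (p_up - p_dn)


# Hypothetical inputs: "up" and "dn" refer to the states r+ and r-.
r, p = 0.05, 0.5
p1_now, p1_up, p1_dn = 0.90, 0.94, 0.96   # zero maturing at T1
p2_up, p2_dn = 0.88, 0.92                 # zero maturing at T2, next period

delta, m = replicating_portfolio(p1_up, p1_dn, p2_up, p2_dn, r)
p2_now = delta * p1_now + m               # no-arbitrage price of the second zero

premium_1 = unit_risk_premium(p1_up, p1_dn, p1_now, r, p)
premium_2 = unit_risk_premium(p2_up, p2_dn, p2_now, r, p)
```

Under these inputs, premium_1 and premium_2 coincide up to rounding error, which is the equality
of unit risk premiums stated above. The probability p is arbitrary here: the equality holds for
any p, because it follows from replication alone.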
Let the Sharpe ratio on any zero be equal to some function λ of the short-term rate r only (and
possibly of calendar time). This function, λ, clearly does not depend on the maturity of the zeros.
Then, we have,
$$
\begin{aligned}
p P(r^+, T_1) + (1 - p) P(r^-, T_1) - (1 + r) P(r, T_1) &= [P(r^+, T_1) - P(r^-, T_1)] \lambda \\
&= \frac{P(r^+, T_1) - P(r^-, T_1)}{r^+ - r^-} \cdot [(r^+ - r^-) \lambda].
\end{aligned}
\tag{13A.3}
$$
We can interpret (r+ − r− ) as a measure of interest rate volatility, and define Vol(r̃ − r) ≡ (r+ − r− ).
Eq. (13.9) follows by rewriting Eq. (13A.3) for a generic maturity date T > 2.
13.9 Appendix 2: Proof of Eq. (13.24)

Consider the equation defining the discretely compounded forward rate (see Chapter 11):
$\frac{P(\tau, T+1)}{P(\tau, T)} = \frac{1}{1 + F_T(\tau)}$, where $F_T(\tau) \equiv F(\tau, T, T+1)$. Iterating this equation leaves:
$$
P(\tau, T) = \prod_{S=\tau}^{T-1} \frac{1}{1 + F_S(\tau)}
= \frac{P(t, T)}{P(t, \tau)} \frac{P(t, \tau)}{P(t, T)} \prod_{S=\tau}^{T-1} \frac{1}{1 + F_S(\tau)}.
$$
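Therefore, since $\frac{P(t, \tau)}{P(t, T)} = \prod_{S=\tau}^{T-1} (1 + F_S(t))$, at any instant of time t : t < τ < T, we have that,
$$
P(\tau, T) = \frac{P(t, T)}{P(t, \tau)} \prod_{S=\tau}^{T-1} \frac{1 + F_S(t)}{1 + F_S(\tau)}. \tag{13A.4}
$$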
Eq. (13A.4) gives us the price of the bond at a future date τ. It reveals that the price P(τ, T) as of
time τ can be expressed as a function of the current bond prices P(t, T) and P(t, τ), and of how forward
rates will change from the current time t to the time τ at which the derivative payoff will be paid,
i.e. $\frac{1 + F_S(t)}{1 + F_S(\tau)}$, for S = τ, ···, T − 1. Hence, once we model the evolution of forward rates, we also have
a model of the future bond price movements, P(τ, T), which we can use to price, at the evaluation
time t, interest rate derivatives, with payoffs depending on the realization of the bond price P(τ, T)
at time τ.
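As a quick numerical illustration of Eq. (13A.4), the sketch below builds bond prices from
one-period forward rates and compares the direct computation of P(τ, T) with the right-hand side
of (13A.4). All forward rates are hypothetical numbers chosen for the example, and the helper
zero_price is ours.

```python
from math import prod


def zero_price(forward_rates):
    """Zero price as the product of 1 / (1 + F_S) over one-period forward rates."""
    return prod(1.0 / (1.0 + f) for f in forward_rates)


t, tau, T = 0, 2, 5
# Hypothetical forward curves: F_S(t) for S = t..T-1 and F_S(tau) for S = tau..T-1.
forwards_t = {0: 0.030, 1: 0.032, 2: 0.035, 3: 0.037, 4: 0.040}
forwards_tau = {2: 0.041, 3: 0.039, 4: 0.044}

p_t_T = zero_price(forwards_t[s] for s in range(t, T))      # P(t, T)
p_t_tau = zero_price(forwards_t[s] for s in range(t, tau))  # P(t, tau)

# Direct computation of P(tau, T) from the forward curve prevailing at tau.
direct = zero_price(forwards_tau[s] for s in range(tau, T))

# Right-hand side of Eq. (13A.4).
ratio = prod((1.0 + forwards_t[s]) / (1.0 + forwards_tau[s]) for s in range(tau, T))
via_13a4 = p_t_T / p_t_tau * ratio
```

The two numbers, direct and via_13a4, agree up to floating-point rounding, whatever forward
curves are supplied.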
To normalize the time-line, we now set t = 0. Redefining τ = t, Eq. (13A.4) then reduces to,
$$
P(t, T) = \frac{P(0, T)}{P(0, t)} \prod_{S=t}^{T-1} \frac{1 + F_S(0)}{1 + F_S(t)}. \tag{13A.5}
$$
It is quite natural, at this juncture, to search for the model’s predictions about the evolution of
future forward rates. Not only is this task theoretically important, it is also relevant as a matter of
the practical implementation of the model. Indeed, if the model’s predictions about the evolution of
future forward rates yield a closed-form solution, the bond price at the future date t, P(t, T), could
be expressed in closed form, which might facilitate the implementation of the model.
Let us introduce some further notation. Let $F_S^j(t)$ be the forward rate as of time t after the
occurrence of j upward movements in the bond price, and let the continuously compounded forward rate
$\hat{F}_S^j(t)$ be defined as,
$$
\hat{F}_S^j(t) \equiv \ln\left(1 + F_S^j(t)\right), \quad j \le t.
$$
By Eq. (13A.5), then,
$$
P_j(t, T) = \frac{P(0, T)}{P(0, t)} \prod_{S=t}^{T-1} \frac{1 + F_S(0)}{1 + F_S^j(t)}
= \frac{P(0, T)}{P(0, t)} e^{-\sum_{S=t}^{T-1} \left(\hat{F}_S^j(t) - \hat{F}_S(0)\right)}. \tag{13A.6}
$$
We have the following important result, which we shall prove later on:
$$
\hat{F}_S^j(t) = \hat{F}_S(0) + \ln \frac{u(S + 1 - t)}{u(S + 1)} - (t - j) \ln \delta, \quad j \le t. \tag{13A.7}
$$
By replacing Eq. (13A.7) into Eq. (13A.6), and using the solution for the perturbation function u (·)
in Eqs. (13.22), we get Eq. (13.24).
So we are left with proving Eq. (13A.7). The proof proceeds by induction. Eq. (13A.7) holds true
for t = 0. Next, suppose that it holds at time t. We wish to show that in this case, Eq. (13A.7) would
also hold at time t + 1. At time t + 1, we have two cases.
Case 1: A positive price jump occurs between time t and time t + 1. In this case,
$$
\begin{aligned}
\hat{F}_S^{j+1}(t + 1) &= \ln \frac{P_{j+1}(t + 1, S)}{P_{j+1}(t + 1, S + 1)} \\
&= \ln \left[ \frac{P_j(t, S)}{P_j(t, t + 1)} u(S - t) \right] - \ln \left[ \frac{P_j(t, S + 1)}{P_j(t, t + 1)} u(S + 1 - t) \right] \\
&= \ln \frac{u(S - t)}{u(S + 1 - t)} + \hat{F}_S^j(t) \\
&= \ln \frac{u(S + 1 - (t + 1))}{u(S + 1)} + \hat{F}_S(0) - [(t + 1) - (j + 1)] \ln \delta,
\end{aligned}
$$
where the first equality and the third follow by the definition of $\hat{F}_S^{j+1}(t)$, the second equality holds by
the definition of the jump in Eq. (13.17), and the fourth equality follows by using Eq. (13A.7). Hence, Eq.
(13A.7) holds at time t + 1 in the occurrence of a positive price jump between time t and time t + 1.
Case 2: A negative price jump occurs between time t and time t + 1. In this case,
$$
\begin{aligned}
\hat{F}_S^j(t + 1) &= \ln \frac{P_j(t + 1, S)}{P_j(t + 1, S + 1)} \\
&= \ln \left[ \frac{P_j(t, S)}{P_j(t, t + 1)} d(S - t) \right] - \ln \left[ \frac{P_j(t, S + 1)}{P_j(t, t + 1)} d(S + 1 - t) \right] \\
&= \ln \frac{d(S - t)}{d(S + 1 - t)} + \hat{F}_S^j(t) \\
&= \ln \left[ \frac{d(S - t) \delta^{-(S - t) + 1}}{d(S + 1 - t) \delta^{-(S + 1 - t) + 1}} \delta^{-1} \right] + \hat{F}_S(0) + \ln \frac{u(S + 1 - t)}{u(S + 1)} - (t - j) \ln \delta \\
&= \ln \frac{u(S - t)}{u(S + 1)} + \hat{F}_S(0) - [(t + 1) - j] \ln \delta,
\end{aligned}
$$
where the first four equalities follow by the same arguments produced in Case 1 (in the fourth, we
also multiply and divide by the same power of δ), and the fifth equality holds by the relation
$u(T) = d(T) \delta^{-(T - 1)}$ in Eq. (13.21), after rearranging terms. Hence, Eq. (13A.7) holds at time t + 1
in the occurrence of a negative price jump between time t and time t + 1.
These two cases reveal that if Eq. (13A.7) holds at time t for any j ≤ t, it also holds at time t + 1,
in each state of nature. By induction, Eq. (13A.7) is therefore true.
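The induction can also be checked numerically along a given path of jumps. The sketch below is
only an illustration under assumptions of our own: the perturbation function u is a hypothetical
choice satisfying u(1) = 1 (it is not taken from Eqs. (13.22)), d is then implied by the relation
u(T) = d(T) δ^{−(T−1)}, the initial curve is flat at 3%, and the values of δ and of the probability
are made up. The jump rule is the one used in Cases 1 and 2.

```python
import math

DELTA = 0.98  # hypothetical value of delta
PROB = 0.5    # hypothetical probability entering the assumed u


def u(T):
    """Assumed perturbation function with u(1) = 1 (illustrative only)."""
    return 1.0 / (PROB + (1.0 - PROB) * DELTA ** (T - 1))


def d(T):
    """Implied by the relation u(T) = d(T) * delta^{-(T-1)}."""
    return u(T) * DELTA ** (T - 1)


def jump(prices, t, up):
    """Move the curve {S: P(t, S)} to time t + 1 after an up or down jump."""
    h = u if up else d
    return {S: prices[S] / prices[t + 1] * h(S - t) for S in prices if S > t}


def log_forward(prices, S):
    """Continuously compounded forward rate ln[P(., S) / P(., S + 1)]."""
    return math.log(prices[S] / prices[S + 1])


horizon = 8
curve0 = {S: 1.03 ** (-S) for S in range(horizon + 1)}
f0 = {S: log_forward(curve0, S) for S in range(horizon)}

curve, j, max_error = curve0, 0, 0.0
for t, up in enumerate([True, False, False, True, True]):
    curve = jump(curve, t, up)
    j += 1 if up else 0
    now = t + 1
    for S in range(now, horizon):
        # Right-hand side of Eq. (13A.7) at time `now` after j up-moves.
        predicted = (f0[S] + math.log(u(S + 1 - now) / u(S + 1))
                     - (now - j) * math.log(DELTA))
        max_error = max(max_error, abs(log_forward(curve, S) - predicted))
```

After the loop, max_error holds the largest gap between the forward rates implied by the simulated
bond prices and those predicted by Eq. (13A.7); it should be of the order of floating-point
rounding, and it depends only on the number of up-moves, not on their order.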
References
Black, F., E. Derman and W. Toy (1990): “A One Factor Model of Interest Rates and its
Application to Treasury Bond Options.” Financial Analysts Journal (January-February),
33-39.
Cox, J. C., S. A. Ross and M. Rubinstein (1979): “Option Pricing: A Simplified Approach.”
Journal of Financial Economics 7, 229-263.
Diebold, F. X. and C. Li (2006): “Forecasting the Term Structure of Government Bond Yields.”
Journal of Econometrics 130, 337-364.
Ho, T. S. Y. and S.-B. Lee (1986): “Term Structure Movements and the Pricing of Interest
Rate Contingent Claims.” Journal of Finance 41, 1011-1029.
Ho, T. S. Y. and S.-B. Lee (2004): The Oxford Guide to Financial Modeling. Oxford University
Press.
Hull, J. C. (2003): Options, Futures, and Other Derivatives. Prentice Hall. 5th edition (Inter-
national Edition).
Hull, J. C. and A. White (1990): “Pricing Interest Rate Derivative Securities.” Review of
Financial Studies 3, 573-592.
McCulloch, J. (1971): “Measuring the Term Structure of Interest Rates.” Journal of Business
44, 19-31.
McCulloch, J. (1975): “The Tax-Adjusted Yield Curve.” Journal of Finance 30, 811-830.
Nelson, C.R. and A.F. Siegel (1987): “Parsimonious Modeling of Yield Curves.” Journal of
Business 60, 473-489.
